Please use this identifier to cite or link to this item:
http://hdl.handle.net/10266/6070
Title: | Sentiment Analysis of Social Media for Hindi Language |
Authors: | Rani, Sujata |
Supervisor: | Kumar, Parteek |
Keywords: | Sentiment;Social Media;Sentence-based;Aspect-based;Hindi Language;Indian Language |
Issue Date: | 11-Jan-2021 |
Abstract: | In recent years, the availability of voluminous data on the web for Indian languages has made it important to analyze this data to retrieve useful information. Given the growth of Indian language content, it is beneficial to use this explosion of data for sentiment analysis. Sentiment analysis has applications in many domains, such as recruitment, education, marketing, policy making, unemployment, fighting riots, and terrorism. This research contributes to the development of a Hindi sentiment analysis system at the aspect, sentence, and document levels. The system is able to perform sentiment analysis of Twitter posts and is available online at www.hindisenti.com. Hindi is an official language of India, belonging to the Indo-Aryan family of languages. It is the 4th most spoken language, with 310 million speakers across the world. In India, Hindi is spoken by a total of 422 million speakers, about 41% of the total population of India. Therefore, there is a need to perform sentiment analysis in the Hindi language so that the opinions of users expressed in Hindi can be easily classified and prove useful to users in decision making. Today, most people share their opinions on social media platforms. This motivated us to explore the field of sentiment analysis on social media for the Hindi language. However, the many differences in language structure between English and Hindi give rise to different challenges when performing sentiment analysis on a text dataset. This research work describes the general process of sentiment analysis at different sentiment levels, i.e., aspect/feature, sentence, and document level. It presents a systematic review of the field of sentiment analysis in general and of Indian languages specifically. The current status of Indian languages in sentiment analysis is classified according to the Indian language families. 
The evolution over time of sentiment analysis research for Indian languages and the sources of the selected publications, chosen on the basis of their relevance, are also described. Further, a taxonomy of Indian languages in sentiment analysis based on techniques, domains, sentiment levels, and classes is presented. This research work will assist researchers in finding the available resources, such as annotated datasets and linguistic pre-processing and lexical resources in Indian languages for sentiment analysis, and will also support them in selecting the most suitable sentiment analysis technique for a specific domain, along with relevant future research directions. This thesis presents the architectures of a sentiment analysis (SA) system for the Hindi language at sentence level and aspect level, using machine learning (ML) and lexicon-based techniques, respectively. To train the ML algorithms, a corpus of reviews and tweets has been collected from online websites and Twitter, respectively. The corpus has been annotated by three native Hindi speakers and validated using the kappa statistic. The experimental results given by different ML algorithms have been measured using the performance measures precision, recall, and F-measure. To further improve the accuracy of the system, a deep learning based convolutional neural network (CNN) has been applied to the corpus of Hindi reviews. The experimental results suggest that properly trained CNNs can outperform the traditional ML algorithms for sentiment classification. At aspect level, sentiment analysis has been performed using a lexicon-based technique. The system has been tested on a dataset of reviews about products and movies in the Hindi language. The proposed system uses two lexical resources, Hindi Dependency Parser (HDP) and Hindi SentiWordNet (HSWN). It follows an efficient aspect extraction process to extract all the relevant aspects, which includes three steps, i.e., extraction of frequent nouns, identification of relevant nouns, and removal of irrelevant nouns. The sentiment nodes are extracted using HSWN. 
The system uses HDP to determine the association between the aspect nodes and sentiment nodes. The system also generates a dependency graph and assigns each sentiment to the aspect whose aspect word lies at the least distance from the sentiment word. This thesis also presents a case study of sentiment analysis for the education domain by performing sentiment analysis of student feedback. The students’ feedback has been collected from Coursera and the SRS of the University using the R language with natural language processing techniques. The sentiments of students have been analyzed in the form of different emotions, such as anger, anticipation, disgust, fear, joy, sadness, surprise, and trust, as well as positive and negative sentiments. Two new emotions, satisfaction and dissatisfaction, are derived from the existing emotions. The direct and indirect assessment methods of course evaluation have been compared, and it has been observed that both methods provide converging evidence of student learning and teaching quality. Thus, this system can help an organization improve student learning and teaching quality. |
URI: | http://hdl.handle.net/10266/6070 |
Appears in Collections: | Doctoral Theses@CSED |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
951403004_SujataRani (1).pdf | | 5.28 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.