Classification of Opinion on Movie Reviews by using Classifiers with 3-gram TF-IDF and SVD Features
Abstract
Feature extraction plays an important role in sentiment analysis (opinion mining) of
issues, customer reviews, products and similar subjects, where the extracted features are
fed to machine learning methods that classify the sentiment. Existing techniques mostly
extract TF-IDF features from the unigram lexicon of the sentiment documents; some use the
term-frequency score of the unigram words as features. In this work, unigrams, two-word
clusters (bigrams) and three-word clusters (trigrams) are generated after filtering the
collected sentiment data. The data are user movie reviews collected manually from various
websites (www.imdb.com, bookmyshow.com, Google user reviews) about the controversy over
the Bollywood movie Padmaavat, in which three different sentiments were found. Many
reviewers are positive about the issue and the release of the movie; some are against the
movie, and their reviews are taken as negative. A very small number of reviews are
neutral, expressing sentiment both in favor of and against the movie. Hence three
categories of reviews are labeled and fed to the proposed opinion mining system. The
unigram, bigram and trigram lexicons are all used to compute the TF-IDF of the reviews,
after which singular value decomposition (SVD) features are generated. Four machine
learning classifiers, K-Nearest Neighbor, Support Vector Machine, Naive Bayes and Decision
Tree, are used for the classification step, and their results are compared. Experimental
results show higher classification accuracy with the proposed feature extraction technique
than with the existing method. Among the classifiers, the decision tree gives the best
accuracy: 0.9272 for positive sentiments, 0.8901 for negative sentiments and 0.9629 for
neutral sentiments.
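
The feature extraction step described above (unigram, bigram and trigram lexicons followed by TF-IDF) can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes whitespace tokenization, a plain log(N/df) IDF without smoothing, and three invented toy reviews; the function names `ngrams` and `tfidf` are hypothetical. The SVD reduction and the four classifiers are not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all word n-grams of length 1..n_max, each joined by spaces."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=3):
    """Return (vocabulary, matrix) with one row of TF-IDF weights per document.

    Assumption: tf = count / total n-grams in the document,
    idf = log(N / df), with no smoothing or normalization.
    """
    counts_per_doc = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    vocab = sorted(set().union(*counts_per_doc))
    n_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = {g: sum(1 for c in counts_per_doc if g in c) for g in vocab}
    matrix = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        row = [
            (counts[g] / total) * math.log(n_docs / df[g]) if total else 0.0
            for g in vocab
        ]
        matrix.append(row)
    return vocab, matrix


# Hypothetical toy reviews, not taken from the collected data set.
reviews = [
    "great movie great acting",
    "boring movie",
    "great visuals but boring story",
]
vocab, matrix = tfidf(reviews)
print(len(vocab), "n-gram features for", len(matrix), "reviews")
```

Each row of `matrix` is the kind of sparse, high-dimensional vector that an SVD step would then compress into a smaller number of dense features before classification.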
Description
Master of Engineering- CSE
