Index Copernicus Value- 56.65
Volume||5||Issue||03||March-2017||Pages-6266-6273||ISSN(e):2321-7545
Website: http://ijsae.in
DOI: http://dx.doi.org/10.18535/ijsre/v5i03.04
New Avenues in Opinion Mining: Considering Dual Sentiment Analysis
Authors
Pankaj R Chandre¹, Rohan Raj², Himanshu Raj³
Dept. of Computer Engineering, Smt. Kashibai Navale College of Engineering, Pune, India
Email Id - [email protected], [email protected]
ABSTRACT
In the modern world, due to the advancement and outreach of technology, ease of access to any kind of product or service is growing immensely. Subjective attitudes (i.e., sentiments) are now expressed about everything from products, news and movies to social networking platforms. The market no longer values only expert opinion; the reviews of the masses have taken on equal importance, as they are the ones using the products and services. For the betterment of products, this input must be understood and the data must be analyzed with proper machine learning techniques along with natural language processing, in order to draw conclusions and comprehend the overall situation. Topic-based text classification built on the bag-of-words model has some fundamental inadequacies, although various algorithms and classifiers (such as naïve Bayes and support vector machines) already analyze sentiments and give categorical feedback as a generic output. The polarity shift problem restricts the performance of these existing models. To address this problem for sentiment classification, dual sentiment analysis (DSA) has been expanded from a two-class to a three-class classification, which also considers the neutral reviews in the dataset for better accuracy and understanding. For each training and test review, a novel data expansion technique is proposed that uses opposite class labels of positive and negative sentiments in one-to-one correspondence for a dual training and dual prediction algorithm. A corpus-based pseudo-antonym dictionary is also proposed, to remove the single-language (English) restriction and to maintain domain consistency, as it pairs up words on the basis of sentiment strength.
Keywords- Natural Language Processing, Bag-of-Words, Machine Learning, Dual Sentiment Analysis,
Opinion mining, Naïve Bayes, Support Vector Machines, Dataset, Polarity shift, Corpus Method
1. INTRODUCTION
Natural language processing, text analysis and computational linguistics are used to identify and extract subjective information in source materials. This is nothing but sentiment analysis, which is widely applied to reviews and social media for a variety of applications, ranging from marketing to customer service. Analyzers are used for polarity identification. Analyzers are of two types: manual (domain-oriented) and automatic (generalized). We use the domain-oriented type in our methodology. In a manual analyzer, a predefined data set exists into which similar or related terms are fed, and the result is then produced. Sentiment analysis is used to classify polarity, and the sentiment analyzer determines whether the opinion expressed is positive, negative or neutral [1]. A model called dual sentiment analysis (DSA) addresses the problem of polarity shift in sentiment classification. We first propose a novel data expansion technique that creates a sentiment-reversed review for every training and test review.
On this basis, we propose a dual training algorithm that makes use of the original and reversed training reviews in pairs for learning a sentiment classifier, and a dual prediction algorithm that classifies a test review by considering the two sides of one review. Sentiment analysis extracts the opinion of the user from a text document, identifying the orientation of the opinions expressed in the text, e.g., "This movie was awesome" (positive sentiment); "This was boring" (negative sentiment). Sentiment analysis and opinion mining, as a special text mining task for determining the subjective attitude (i.e., sentiment) expressed by a text, is becoming a hotspot in the fields of data mining and natural language processing.
Opinions and related concepts such as sentiments, evaluations, attitudes and emotions are the subjects of study of sentiment analysis and opinion mining. The inception and rapid growth of the field coincide with those of social media on the Web, e.g., reviews, forum discussions, blogs, microblogs, Twitter, and social networks, because for the first time in human history we have a huge volume of opinionated data recorded in digital form, and the future seems very promising with the inception and expansion of cloud computing, business analytics, data science and related topics. Since the early 2000s, sentiment analysis has grown, and is still growing, into one of the most active research areas in natural language processing. Data mining, Web mining, and text mining all have wide applications relating to sentiment analysis. The semantic orientation of a review can be positive, negative, or neutral, although some reviews are too ambiguous to ascertain. We examined the effect of valence shifters on classifying the reviews. Three types of valence shifters are scrutinized: negations, intensifiers, and diminishers. Negations are used to reverse the semantic polarity of a particular term, while intensifiers and diminishers are used to increase and decrease, respectively, the degree to which a term is positive or negative.
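As an illustrative sketch only (the lexicon, shifter lists, weights and scoring rule below are hypothetical examples of ours, not taken from this paper), the effect of these three valence shifters on a simple term-counting score can be expressed as follows:

```python
# Illustrative sketch of valence shifting in a term-counting score.
# The lexicon, shifter lists and weights are hypothetical examples.
LEXICON = {"awesome": 1.0, "good": 0.8, "boring": -0.8, "bad": -1.0}
NEGATIONS = {"not", "don't", "never"}      # reverse the polarity of the next sentiment word
INTENSIFIERS = {"very", "extremely"}       # increase its degree
DIMINISHERS = {"somewhat", "slightly"}     # decrease its degree

def score(tokens):
    total, negate, scale = 0.0, False, 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            scale = 1.5
        elif tok in DIMINISHERS:
            scale = 0.5
        elif tok in LEXICON:
            s = LEXICON[tok] * scale
            total += -s if negate else s
            negate, scale = False, 1.0     # a shifter applies only to the next sentiment word here
    return total

print(score("this movie was very good".split()))   # intensified positive -> 1.2
print(score("this movie was not good".split()))    # negated positive -> -0.8
print(score("it was somewhat boring".split()))     # diminished negative -> -0.4
```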
Sentiment classification is a basic task in sentiment analysis: to classify the sentiment (e.g., positive or negative) of a given text. The bag-of-words (BOW) model is typically used for text representation. In the BOW model, a review text is represented by a vector of independent words used to train a sentiment classifier; statistical machine learning algorithms (such as naïve Bayes, the maximum entropy classifier, and support vector machines) are then employed.
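For concreteness, a minimal BOW representation of two short reviews might look like the following sketch (the use of scikit-learn is our assumption; the paper does not prescribe any particular implementation):

```python
# Minimal sketch of the bag-of-words (BOW) representation described above.
# The choice of scikit-learn is ours; the paper does not prescribe a toolkit.
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["This movie was awesome", "This was boring"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)          # each review -> a vector of word counts

print(vectorizer.get_feature_names_out())      # the vocabulary of independent words
print(X.toarray())                             # the BOW vectors fed to a classifier
```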
The organization of this paper is as follows. Section 2 reviews the related work. In Section 3, we present the proposed system, introducing the DSA framework in detail; Section 3 also presents the two methods for constructing an antonym dictionary and discusses the experimental scope. Section 4 finally draws conclusions and outlines directions for future work.
2. RELATED WORK
There are four categories of sentiment analysis: document-level, sentence-level, phrase-level, and aspect-level sentiment analysis.
Phrase/subsentence-level and aspect-level sentiment analysis are affected by complex polarity shift. T. Wilson et al. [13] began with a lexicon of words with established prior polarities and identified the "contextual polarity" of phrases based on some refined annotations. Choi and Cardie [15] use different kinds of negators to improve subsentential sentiment analysis. A supervised model for subsentential sentiment analysis was developed by Nakagawa et al. [10], which ascertains inter-node polarity in the dependency graph. Term-counting and machine learning methods are the two main approaches to document- and sentence-level sentiment classification. In term counting, the orientation of content words is determined from manually collected or external lexical resources [20], [22], and the text is scored by its total orientation score. With machine learning methods, sentiment classification is treated as a statistical problem: a text is represented by a bag-of-words, and supervised machine learning algorithms are then applied as the classifier [21].
Machine learning methods have proved more effective than term-counting methods in much of the sentiment classification literature. With term-counting methods, the sentiment of polarity-shifted words can simply be reversed and then summed up, but this is considerably harder in the bag-of-words model. Das and Chen [23] proposed a method of simply attaching "NOT" to words in the scope of negation, so that in the text "I don't like book", the word "like" becomes a new word "like-NOT". But this showed very little improvement, as reported by Pang et al. [21]. Linguistic features and lexical resources have also been considered to model polarity shift. Syntactic parsing was used to capture valence shifters, which improved term-counting systems significantly but showed only marginal improvement in the case of machine learning systems. Li and Huang [12] put forward splitting each sentence into a polarity-shifted part and a polarity-unshifted part, represented as two bags-of-words. Classification models are then trained on each part and combined to produce the final polarity.
Table 1: An Example of Creating Reversed Training Reviews

                   Review Text                                    Class
Original review    I don't like to eat Chinese. It tastes bad.    Negative
Reversed review    I like to eat Chinese. It tastes good.         Positive
In this paper we extend the previous work in three major aspects. First, a selective data expansion procedure is added. Second, we extend the DSA framework to consider neutral sentiments as well. Lastly, a corpus-based dictionary is constructed to remove external dependencies.
3. PROPOSED SYSTEM
3.1 Data Expansion Technique
In the field of natural language processing and text mining, Agirre and Martinez [24] proposed expanding the amount of labeled data through a Web search using monosemous synonyms or unique expressions in definitions from WordNet for the task of word sense disambiguation. Expanding training data with the help of an external dictionary was put forth by Fujita and Fujino [7]. Rui Xia et al. [1] proposed constructing original and reversed reviews in one-to-one correspondence, so that the dataset is expanded at both the training and the test stage. Rui Xia et al. [1] also proposed a two-step data expansion technique based on an antonym dictionary. After detecting negation, text reversion is performed, in which all sentiment words outside the scope of negation are reversed to their antonyms. Label reversion is then applied to reverse the class label. Table 1 gives an example of this procedure. Given an original training review, "I don't like to eat Chinese. It tastes bad. (class: Negative)", the reversed review is obtained in three steps: 1) the sentiment word "bad" is reversed to its antonym "good"; 2) the negation word "don't" is removed, and since "like" is in the scope of negation, it is not reversed; 3) the class label is reversed from Negative to Positive. Note that in data expansion for the test dataset, we only conduct text reversion. A machine-generated sentiment-reversed review might not be as good as a human-generated sentence, so an adequate grammatical quality should be maintained. Considering this problem, a weight or numerical value could be attached to words on the basis of their sentiment strength.
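A minimal sketch of this two-step reversal is given below. The tiny antonym dictionary, the negation word list and the toy tokenization are simplified assumptions of ours, not the authors' actual implementation:

```python
# Sketch of the data expansion technique: text reversion + label reversion.
# The antonym dictionary, negation list and one-word negation scope are toy assumptions.
ANTONYMS = {"bad": "good", "good": "bad", "delicious": "awful"}
NEGATIONS = {"don't", "not", "never"}

def reverse_review(text, label):
    reversed_tokens, in_negation_scope = [], False
    for raw in text.lower().split():
        tok = raw.strip(".,!?")
        if tok in NEGATIONS:
            in_negation_scope = True            # step 2: drop the negation word itself
            continue
        if in_negation_scope:
            reversed_tokens.append(tok)         # words inside the negation scope are not reversed
            in_negation_scope = False           # toy scope: only the next word
        else:
            reversed_tokens.append(ANTONYMS.get(tok, tok))  # step 1: reverse sentiment words
    # step 3: reverse the class label (skipped for test reviews, which only get text reversion)
    reversed_label = {"Negative": "Positive", "Positive": "Negative"}.get(label, label)
    return " ".join(reversed_tokens), reversed_label

print(reverse_review("I don't like to eat Chinese. It tastes bad.", "Negative"))
# -> ('i like to eat chinese it tastes good', 'Positive'), as in Table 1
```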
3.2 Dual Training
In this stage, two training sets are created. The original training samples are referred to as the 'original training set'. The original training samples are then reversed to their opposites, which form the 'reversed training set'. A one-to-one correspondence is always maintained between the original and reversed reviews. The classifier is designed by maximizing a combination of the likelihoods of the original and reversed training samples. This process is called dual training [1]. We will be using the naïve Bayes classifier to derive the dual training algorithm. Along with naïve Bayes, a logistic regression model and support vector machines will be examined during the experiment. Naïve Bayes maximizes a combined likelihood when training its parameters, whereas logistic regression uses a combined log-likelihood function and SVMs optimize a combined hinge loss function.
First, we determine the training set and test set in the dataset by considering the basic formula of the naïve Bayes theorem:
P(C|x) = P(x|C) · P(C) / P(x)
where P(C) is the prior probability of class C, P(x) is the prior probability of the training data x, P(C|x) is the probability of C given x, and P(x|C) is the probability of x given C.
Second, we convert the data into a frequency table. As the basic formula of the theorem indicates, we need the prior probability of the class as well as of the training data. The class prior is computed by the formula
P(C) = N_C / N
where N is the total number of reviews in the training set and N_C is the number of training reviews belonging to class C.
Next, we compute the conditional probability (likelihood) of each word attribute:
P(x_i|C) = count(x_i, C) / count(C)
where count(x_i, C) is the number of occurrences of word x_i in the training reviews of class C and count(C) is the total number of word occurrences in class C.
Generally, we want the most probable hypothesis given the training data, so we compute the posterior probability and choose the class that maximizes it:
C_MAP = argmax_C P(x1, x2, ..., xn | C) · P(C)
Finally, we determine the class of the test set and then proceed to the prediction stage, so as to classify the reviews under the three-class sentiment classification.
Suppose the example in Table 1 is used as one training sample. If only the original sample ("I don't like to eat Chinese. It tastes bad.") is considered, the feature "like" will be improperly learned as a negative indicator (since the class label is Negative), ignoring the expression of negation. Nevertheless, if the generated opposite sample ("I like to eat Chinese. It tastes good.") is also used for training, "like" will be learned correctly, due to the removal of negation in sample reversion. Therefore, the procedure of dual training can correct some learning errors caused by polarity shift.
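A hedged sketch of dual training in this spirit is shown below. The use of scikit-learn's multinomial naïve Bayes as the base learner and the toy data are our assumptions; the classifier is simply fit on the union of the original and reversed sets, keeping their one-to-one correspondence in the two lists:

```python
# Sketch of dual training: fit one classifier on original + reversed reviews.
# scikit-learn and the toy data are our assumptions, not the authors' code.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

original = [("I don't like to eat Chinese. It tastes bad.", "Negative")]
reversed_ = [("I like to eat Chinese. It tastes good.", "Positive")]   # from Table 1

texts = [t for t, _ in original] + [t for t, _ in reversed_]
labels = [c for _, c in original] + [c for _, c in reversed_]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)     # the likelihood is maximized over both training sets jointly

# After dual training, "like" has been observed under both class labels,
# so it is no longer treated as a purely negative indicator.
print(clf.predict(vectorizer.transform(["I like this restaurant"])))
```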
Fig. 1 Model Architecture
3.3 Dual Prediction
After training the classification model, the original and reversed test samples are used together for prediction. We predict the test sample in terms of positive and negative degrees. Let x and x̃ be the original and reversed test samples. We use x̃ to assist the prediction of x rather than predicting the class of x̃. This process is called dual prediction [1]. Let p(·|x) and p(·|x̃) denote the posterior probabilities of x and x̃ respectively.
When we want to measure how positive a test review x is, we not only consider how positive the original test review is (i.e., p(+|x)), but also consider how negative the reversed test review is (i.e., p(-|x̃)). Conversely, when we measure how negative a test review x is, we consider the probability of x being negative (i.e., p(-|x)), as well as the probability of x̃ being positive (i.e., p(+|x̃)).
The dual prediction function is defined as:
p_d(+|x) = (1 - ρ) · p(+|x) + ρ · p(-|x̃)
p_d(-|x) = (1 - ρ) · p(-|x) + ρ · p(+|x̃)
where ρ (0 ≤ ρ ≤ 1) is a tradeoff parameter weighting the original and reversed predictions.
Let us use the example given in Table 1 again to explain why dual prediction works in addressing the polarity shift problem. This time we assume that "I don't like to eat Chinese" is an original test review, and "I like to eat Chinese" is the reversed test review. In the traditional BOW model, "like" will contribute a high positive score when predicting the overall orientation of the test sample, despite the negation structure "don't like". Hence, it is very likely that the original test review will be misclassified as Positive. In dual prediction, by contrast, due to the removal of negation in the reversed review, "like" this time plays a positive role. Therefore, the probability of the reversed review being classified as Positive will be high. In dual prediction, a weighted combination of the two component predictions is used as the dual prediction output. In this manner, the prediction error on the original test sample can be compensated by the prediction on the reversed test sample. Evidently, this can reduce some prediction errors caused by polarity shift. In the experimental study, we will extract some real examples from our experiments to demonstrate the effectiveness of both dual training and dual prediction.
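A minimal sketch of the dual prediction combination is given below. The posterior values and the tradeoff value rho are placeholders chosen by us for illustration; in practice p(·|x) and p(·|x̃) would come from the dually trained classifier:

```python
# Sketch of dual prediction: weighted combination of the classifier's
# posteriors on the original review x and its reversed counterpart x_tilde.
def dual_predict(p_pos_x, p_neg_x, p_pos_x_tilde, p_neg_x_tilde, rho=0.5):
    # rho is the tradeoff parameter between the original and reversed predictions
    pos = (1 - rho) * p_pos_x + rho * p_neg_x_tilde
    neg = (1 - rho) * p_neg_x + rho * p_pos_x_tilde
    return "Positive" if pos > neg else "Negative"

# Hypothetical posteriors for "I don't like to eat Chinese" (x) and
# its reversal "I like to eat Chinese" (x_tilde):
print(dual_predict(p_pos_x=0.6, p_neg_x=0.4,                # BOW misled by "like"
                   p_pos_x_tilde=0.9, p_neg_x_tilde=0.1))   # reversal is clearly positive
# -> "Negative": the reversed review compensates for the polarity-shift error
```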
3.4 Selective Data Expansion
Our review example in Table 1 has a very distinct sentiment polarity. However, not all cases will be free from ambiguity. This problem could limit the use of all the labeled reviews for data expansion and data training. To tackle it, Rui Xia et al. [1] investigated and subsequently put forth a selective data expansion procedure that selects only a part of the training reviews for data expansion.
Let us use another pair of complex examples to understand the proposed technique.
Review (a): The mobile's processor is fast, and the cost is low. It's easy to play games.
Review (b): The mobile's processor is somewhat smooth, but the cost is a bit high. It's not tough to play games.
Review (a) has a very strong sentiment with a low polarity shift rate. The statement explicitly expresses its view as well as the degree of the sentiment. Hence, both the original and the reversed review are good labeled instances.
In review (b), the sentiment polarity is not as distinct and unambiguous as in review (a). Therefore, creating a reversed review for review (b) is not as warranted as in the case of review (a). For this purpose, Rui Xia et al. [1] proposed a sentiment degree metric for selecting the most sentiment-distinct training reviews for data expansion.
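A hedged sketch of such a selection step is shown below, using classifier confidence on the training review as a stand-in for the sentiment degree metric; this concrete choice, the threshold value and the function names are our assumptions, not necessarily the exact metric of Rui Xia et al. [1]:

```python
# Sketch of selective data expansion: only reverse the training reviews whose
# sentiment is distinct enough, measured here by classifier confidence.
def select_for_expansion(reviews, classifier, vectorizer, threshold=0.8):
    selected = []
    for text, label in reviews:
        # maximum posterior probability as a proxy for sentiment distinctness
        confidence = classifier.predict_proba(vectorizer.transform([text])).max()
        if confidence >= threshold:       # distinct sentiment, low ambiguity
            selected.append((text, label))
    return selected   # only these reviews are reversed and added to the training set
```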
3.5 Positive-Negative-Neutral framework for DSA
The most widely used sentiment analysis technique, polarity classification, classifies reviews as either positive or negative. But there are situations where neutral reviews also exist, and the existing DSA systems are not able to classify neutral reviews. Hence, a system which gives us a three-class sentiment classification is proposed by Rui Xia et al. [1].
Table 2 gives an example of creating the reversed reviews for sentiment-mixed neutral reviews. A neutral review covers two main situations: it may be an objective text which is neither positive nor negative, or it may be an ambiguous statement with mixed positive and negative text projecting a conflicting sentiment. Therefore the reversed review's sentiment is also supposed to remain on neutral ground.
The selective data expansion procedure is still used in this case, i.e., only the labeled data with high posterior probability will be used for data expansion.
Table 2: An Example of Data Expansion for Neutral Reviews

                   Review Text                                                   Class
Original review    This Chinese soup is spicy but it tastes delicious.           Neutral
Reversed review    This Chinese soup is sweet, but it doesn't taste delicious.   Neutral
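In code form, the label-reversion rule of this positive-negative-neutral framework reduces to a simple mapping (a sketch based on our reading of the description and of Table 2 above; text reversion is still applied to the review body):

```python
# Label reversion in the 3-class DSA framework: positive and negative swap,
# while a neutral review stays neutral after text reversion (see Table 2).
REVERSED_LABEL = {"Positive": "Negative", "Negative": "Positive", "Neutral": "Neutral"}

def reverse_label(label):
    return REVERSED_LABEL[label]

print(reverse_label("Neutral"))   # -> "Neutral"
```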
3.6 Pseudo-Antonym Dictionary
The existing DSA system is totally dependent on an external antonym dictionary, i.e., a resource that maps each word to its opposite. There remains the question of how to construct a suitable dictionary for sentiment classification. We can easily obtain various external antonym dictionaries directly from well-defined lexicons such as WordNet (http://wordnet.princeton.edu/) in English; various web-based repository hosting services such as GitHub can also provide abundant resources. These databases categorize English words into synonym sets called synsets, which provide short, general definitions and record the various semantic relations between the synonym sets.
After analyzing any specific synonym set, we have to find its corresponding antonym set from another external source such as a thesaurus. But even this cannot guarantee the domain consistency of our tasks or sample subsets. Hence, a corpus-based method was proposed by Rui Xia et al. [1] to construct a pseudo-antonym dictionary. It uses mutual information computed on the labeled training data to identify the most positive-relevant and most negative-relevant features. These are then grouped, and pairing is done for features that have the same level of sentiment strength, in the way antonym words would be paired.
Table 3: Repository of Words in Sentiment Classification

#Positive     #Negative    #Neutral
(-:           )-:          :|
(^-^)         :-/          :-o
:-*           :*(          8-)
(o:           )o:          :o->-<|:
<3            >:o          Alright
Affordable    Annoying     Average
Marvelous     Pathetic     Yeah
The mutual dependence of any two random variables is measured by mutual information (MI), which is also a feature selection method in text categorization and sentiment classification [14]. The ranking and relevance of the positive and negative groups are calculated by the MI metric. Equally ranked positive-relevant and negative-relevant words are matched as pairs of antonym words, which gives us a pseudo-antonym dictionary. A pseudo-antonym dictionary does not match words on the basis of their exact meaning; rather, it matches them on the basis of the contextual sentiment strength they possess, so the sentiment role of a word supersedes its literal meaning. Such dictionaries are language-independent and domain-adaptive as well. Thus, the DSA model is widely applicable irrespective of the availability of lexical antonym dictionaries across different languages and domains.
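A hedged sketch of this corpus-based pairing follows. The toy corpus, the use of scikit-learn's mutual_info_classif, and the rule for deciding which class a word is relevant to (its relative frequency in each class) are our assumptions; they illustrate the idea of ranking positive-relevant and negative-relevant words by MI and pairing them by rank, not the authors' exact procedure:

```python
# Sketch of building a pseudo-antonym dictionary from labeled training data:
# rank positive-relevant and negative-relevant words by mutual information (MI)
# and pair equally ranked words as pseudo-antonyms.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

texts = ["affordable and marvelous phone", "annoying and pathetic battery",
         "marvelous screen and affordable price", "pathetic support and annoying delays"]
labels = np.array([1, 0, 1, 0])                    # 1 = positive, 0 = negative (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(texts)
words = vec.get_feature_names_out()

mi = mutual_info_classif(X, labels, discrete_features=True)
pos_rate = np.asarray(X[labels == 1].mean(axis=0)).ravel()   # frequency in positive reviews
neg_rate = np.asarray(X[labels == 0].mean(axis=0)).ravel()   # frequency in negative reviews

# Rank positive-relevant and negative-relevant words separately by MI score.
pos_ranked = [words[i] for i in np.argsort(-mi) if pos_rate[i] > neg_rate[i]]
neg_ranked = [words[i] for i in np.argsort(-mi) if neg_rate[i] > pos_rate[i]]

# Words of equal rank are paired as pseudo-antonyms (in both directions).
pseudo_antonyms = {**dict(zip(pos_ranked, neg_ranked)), **dict(zip(neg_ranked, pos_ranked))}
print(pseudo_antonyms)
```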
4. CONCLUSION
In this paper, we focus on creating reversed reviews to assist supervised sentiment classification, which gives us insight into sentiment analysis and opinion mining. Previous experiments demonstrate the effectiveness of the DSA model for polarity classification, where it significantly outperforms several alternatives. The use of a corpus-based pseudo-antonym dictionary offers a very practical approach when we are constrained by limited lexical resources and domain knowledge. We plan to conduct a wide range of practical experiments which will include real-world datasets from different commercial and research-based sources such as Amazon, Princeton University, Flipkart and others. This will help us to explore new methods and possibilities based on the analytics drawn from the experimental results and will also provide a base for our future endeavors. In the future, we plan to integrate this model with real-time web-based applications and various social media to test its diverse applicability. Furthermore, complex polarity shift patterns with more vague meanings and incompatible or conflicting sentiments will be considered.
REFERENCES
1. Rui Xia, Feng Xu, Chengqing Zong, Qianmu Li, Yong Qi, and Tao Li, "Dual Sentiment Analysis: Considering Two Sides of One Review," IEEE Trans. Knowl. Data Eng., vol. 27, no. 8, Aug. 2015.
2. Tian, N., Xu, Y., Li, Y., Abdel-Hafez, A., Josang, A.: Product feature taxonomy learning based on
user reviews. In: WEBIST 2014 10th International Conference on Web Information Systems and
Technologies (2014).
3. R. Xia, T. Wang, X. Hu, S. Li, and C. Zong, “Dual Training and Dual Prediction for Polarity
Classification," Proceedings of the Annual Meeting of the Association for Computational Linguistics
(ACL), pp. 521-525, 2013.
4. R. Xia, T. Wang, X. Hu, S. Li, and C. Zong,“Dual training and dual prediction for polarity
classification,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2013, pp. 521–525.
5. Erik Cambria and Amir Hussain, "Sentic Computing: Techniques, Tools, and Applications," Springer, May 9, 2012.
6. A. Abbasi, S. France, Z. Zhang, and H. Chen, “Selecting attributes for sentiment classification using
feature relation networks," IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 23,
no. 3, pp. 447-462, 2011.
7. S. Fujita and A. Fujino, "Word sense disambiguation by combining labeled data expansion and semi-supervised learning method," Proceedings of the International Joint Conference on Natural Language
Processing (IJCNLP), pp. 676-685, 2011.
8. I. Councill, R. McDonald, and L. Velikovich, "What's Great and what's Not: Learning to Classify the
Scope of Negation for Improved Sentiment Analysis,” Proceedings of the Workshop on negation and
speculation in natural language processing, pp. 51-59, 2010.
9. S. Li, S. Lee, Y. Chen, C. Huang and G. Zhou, “Sentiment Classification and Polarity Shifting,”
Proceedings of the International Conference on Computational Linguistics (COLING), 2010.
10. T. Nakagawa, K. Inui, and S. Kurohashi. “Dependency tree-based sentiment classification using
CRFs with hidden variables,” Proceedings of the Annual Conference of the North American Chapter
of the Association for Computational Linguistics (NAACL), pp. 786-794, 2010.
11. X. Ding, B. Liu and L. Zhang, "Entity discovery and assignment for opinion mining applications,"
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining (KDD), 2009.
12. S. Li and C. Huang, “Sentiment classification considering negation and contrast transition,”
Proceedings of the Pacific Asia Conference on Language, Information and Computation (PACLIC),
2009.
13. T. Wilson, J. Wiebe, and P. Hoffmann, "Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis," Computational Linguistics, vol. 35, no. 3, pp. 399-433, 2009.
14. S. Li, R. Xia, C. Zong and C. Huang, "A framework of feature selection methods for text categorization," Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 692-700, 2009.
15. Y. Choi and C. Cardie, “Learning with Compositional Semantics as Structural Inference for
Subsentential Sentiment Analysis,” Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), pp. 793-801, 2008.
16. Y. Choi and C. Cardie, “Learning with Compositional Semantics as Structural Inference for Sub
sentential Sentiment Analysis, “Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), pp. 793-801, 2008.
17. V. Ng, S. Dasgupta and S. Arifin, “Examining the Role of Linguistic Knowledge Sources in the
Automatic Identification and Classification of Reviews,” Proceedings of the International
Conference on Computational Linguistics and Annual Meeting of the Association for Computational
Linguistics (COLING/ACL), pp. 611-618, 2006.
18. A. Kennedy and D. Inkpen, “Sentiment classification of movie reviews using contextual valence
shifters," Computational Intelligence, vol. 22, pp. 110–125, 2006.
19. M. Gamon, “Sentiment classification on customer feedback data: noisy data, large feature vectors,
and the role of linguistic analysis,” Proceedings of the International Conference on Computational
Linguistics (COLING), pp. 841-847, 2004.
20. P. Turney and M. L. Littman, “Measuring praise and criticism: Inference of semantic orientation from
association,” ACM Transactions on Information Systems (TOIS), vol. 21, no. 4, pp. 315-346, 2003.
21. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: sentiment classification using machine learning
techniques,” Proceedings of the Conference on Empirical Methods in Natural Language Processing
(EMNLP), pp. 79-86, 2002.
22. P. Turney, “Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification
of reviews,” Proceedings of the Annual Meeting of the Association for Computational Linguistics
(ACL), 2002.
23. S. Das and M. Chen, “Yahoo! for Amazon: Extracting market sentiment from stock message boards,”
Proceedings of the Asia Pacific Finance Association Annual Conference, 2001.
24. E. Agirre and D. Martinez, "Exploring automatic word sense disambiguation with decision lists and the Web," Proceedings of the COLING Workshop on Semantic Annotation and Intelligent Content, pp. 11-19, 2000.
25. R. Mihalcea and D. Moldovan, “An automatic method for generating sense tagged corpora”
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 1999.