DETERMINING THE SENTIMENT OF OPINIONS
Presentation by Md Mustafizur Rahman (mr4xb)

OUTLINE
- What is an opinion?
- Problem definition
- Word Sentiment Classifier
- Sentence Sentiment Classifier
- Experimental analysis
- Shortcomings
- Future work

WHAT IS AN OPINION?
- An opinion is a quadruple [Topic, Holder, Claim, Sentiment].
- The Holder believes a Claim about the Topic and in many cases associates a Sentiment with it.
- An opinion may or may not contain a sentiment.
  - e.g. "I believe the world is flat." (sentiment absent)
- A sentiment can be explicit or implicit.
  - e.g. "I like apple." (explicit)
  - e.g. "We should decrease our dependence on oil." (implicit)

PROBLEM DEFINITION
- Opinion = [Topic, Holder, Claim, Sentiment]
- Given
  - a Topic
  - a set of texts about the topic
- Find
  - the sentiments (only positive or negative) about the topic in each sentence
- Identify the people who hold each sentiment.

AUTHORS' APPROACH
- 4 basic stages
  - Calculation of the polarity of sentiment-bearing words (Word Sentiment Classifier)
  - Selection of sentences containing both the topic and a holder
  - Holder-based region identification
  - Combination of these polarities to produce the sentence sentiment (Sentence Sentiment Classifier)

WORD SENTIMENT CLASSIFIER
- To build a classifier we need training data.
- How to generate training data for the word sentiment classifier?
  - Assemble a small set of seed words by hand.
  - The seed word list contains only positive- and negative-polarity words.
  - Then grow this list by adding synonyms and antonyms from WordNet [1].

WORD SENTIMENT CLASSIFIER
WORDNET

WORD SENTIMENT CLASSIFIER
WORDNET (CONTD.)
Figure: An example of the relationship between hyponyms and hypernym [source: Wikipedia]

WORD SENTIMENT CLASSIFIER (CONTD.)
- Initial seed word list
  - Adjectives (15 positive and 19 negative)
  - Verbs (23 positive and 21 negative)
- Final seed word list
  - Adjectives (5880 positive and 6233 negative)
  - Verbs (2840 positive and 3239 negative)
- Some words, e.g.
“great” and “strong”, appear in both the positive and negative categories.

WORD SENTIMENT CLASSIFIER (CONTD.)
- Now we have
  - a set of words
  - a class label (polarity) for each word, either positive or negative
- How to calculate the strength of the sentiment polarity?
  - For a new word $w$ we first compute its synonym set $(syn_1, syn_2, \ldots, syn_n)$ from WordNet.
  - Then we compute $\arg\max_c P(c \mid w)$, which is equivalent to $\arg\max_c P(c \mid syn_1, syn_2, \ldots, syn_n)$.
  - Here $c$ is the sentiment category (positive or negative).

WORD SENTIMENT CLASSIFIER (CONTD.)
- There are two possible ways to calculate $\arg\max_c P(c \mid w)$.
- Approach 1

$$
\begin{aligned}
\arg\max_c P(c \mid w) &= \arg\max_c P(c)\,P(w \mid c) \\
&= \arg\max_c P(c)\,P(syn_1, syn_2, \ldots, syn_n \mid c) \\
&= \arg\max_c P(c) \prod_{k=1}^{m} P(f_k \mid c)^{\,count(f_k,\, synset(w))}
\end{aligned}
$$

- Here $f_k$ is the $k$th feature of category $c$,
- and $count(f_k, synset(w))$ is the total number of occurrences of $f_k$ in the synonym set of $w$.

WORD SENTIMENT CLASSIFIER (CONTD.)
- There are two possible ways to calculate $\arg\max_c P(c \mid w)$.
- Approach 2

$$
\begin{aligned}
\arg\max_c P(c \mid w) &= \arg\max_c P(c)\,P(w \mid c) \\
&= \arg\max_c P(c)\,\frac{\sum_{i=1}^{n} count(syn_i, c)}{count(c)}
\end{aligned}
$$

- Here $count(syn_i, c)$ is the number of occurrences of $w$'s synonym $syn_i$ in the word list of $c$.

WORD SENTIMENT CLASSIFIER (CONTD.)
- The word “amusing”, for example, is classified as carrying primarily positive sentiment, and “blame” as primarily negative.
- “afraid” with strength -0.99 represents strong negativity, while “abysmal” with strength -0.61 represents weaker negativity.

SENTENCE SENTIMENT CLASSIFIER
- Consists of 4 parts:
  - Identification of the Topic in the sentence (i.e. direct matching)
  - Identification of the opinion holder
  - Identification of the region
  - Development of a model to combine sentiments

SENTENCE SENTIMENT CLASSIFIER (CONTD.)
HOLDER IDENTIFICATION
- Assumption
  - Persons and organizations are the only opinion holders.
  - For a sentence with more than one holder, just pick the one closest to the Topic.
- Method
  - BBN's named entity tagger, IdentiFinder [2]
  - A software tool [http://www.bbn.com/technology/speech/identifinder]

SENTENCE SENTIMENT CLASSIFIER (CONTD.)
SENTIMENT REGION IDENTIFICATION
- Where to look for the sentiment?
- Different sentiment regions are proposed:
  - Window 1: Full sentence
  - Window 2: Words between Holder and Topic
  - Window 3: Window 2 ± 2 words
  - Window 4: Window 2 to the end of the sentence

SENTENCE SENTIMENT CLASSIFIER (CONTD.)
CLASSIFICATION MODEL
- 3 different models
- Model 0: $\prod (\text{signs in region})$
  - Signs can be positive or negative.
- Model 1: Harmonic mean of the sentiment in the region

$$
P(c \mid s) = \frac{1}{n(c)} \sum_{i=1}^{n} p(c \mid w_i), \quad \text{if } \arg\max_j p(c_j \mid w_i) = c
$$

SENTENCE SENTIMENT CLASSIFIER (CONTD.)
CLASSIFICATION MODEL
- Model 1 (contd.)
  - $n(c)$ is the number of words in the region whose sentiment category is $c$.
  - $s$ is the sentiment strength.
- Model 2: Geometric mean of the sentiment in the region

$$
P(c \mid s) = 10^{\,n(c)-1} \times \prod_{i=1}^{n} p(c \mid w_i), \quad \text{if } \arg\max_j p(c_j \mid w_i) = c
$$

SYSTEM ARCHITECTURE

EXPERIMENTAL ANALYSIS
- Two sets of experiments, for the
  - Word Sentiment Classifier
  - Sentence Sentiment Classifier

EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
- Dataset
  - Word list from the TOEFL exam
  - A predefined list containing 19748 English adjectives and 8011 English verbs
  - Take the intersection of the above two lists.
  - Finally, randomly take 462 adjectives and 502 verbs.
- Classification of the dataset
  - Human 1 and Human 2 label adjectives.
  - Human 2 and Human 3 label verbs.

EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
- Class labels: Positive, Negative and Neutral
- Measurement type
  - Strict: consider all three class labels
  - Lenient: two class labels, Negative versus Positive merged with Neutral
Table: Inter Human Agreement

EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
Table: Human-Machine Agreement (Small Seed Set)
Table: Human-Machine Agreement (Larger Seed Set)

EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
- Dataset
  - 100 sentences from the DUC 2001 Corpus [3]
  - Topics covered: “illegal alien”, “term limit”, “gun control” and “NAFTA”
- Classification of sentences
  - Two humans classify each sentence into three class labels: positive, negative and N/A.

EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
- Experiment variants
  - Three different models
  - Four different windows
  - Two different word classifier models
  - Manually annotated holder vs. automatic holder
- So in total there are 16 variants each for Model 1 and Model 2, and 8 variants for Model 0.

EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
Table: Results with manually annotated Holder
Table: Results with automatic Holder

EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
- Performance metric
  - Correctness: correct identification of both holder and sentiment
- Best model: Model 0
- Best window: Window 4
- Accuracy
  - 81% accuracy obtained with manually annotated holders
  - 67% accuracy obtained with automatic holders

SHORTCOMINGS
- Only a unigram model is considered. As a result, the model fails for some words that carry both positive and negative sentiment.
  - E.g.: "Term limit really hit at democracy."
- The model cannot infer sentiment from facts. The absence of adjective, verb and noun sentiment words prevents classification.
  - E.g.: "She thinks term limit will give women more opportunities in politics."

FUTURE WORK
- One assumption of this work is that the topic is given. Can we extract the topic automatically?
  - E.g.: Twitter hashtags?
- Sentiment beyond only positive or negative
- Context-dependent sentiment (bi-gram or tri-gram analysis)

REFERENCES
[1] Miller, G.A., R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. 1993. Introduction to WordNet: An On-Line Lexical Database. http://www.cosgi.princeton.edu/~wn.
[2] BBN named entity tagger IdentiFinder. http://www.bbn.com/technology/speech/identifinder
[3] DUC 2001 Corpus. http://www-nlpir.nist.gov/projects/duc/data.html
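
As a supplement to the slides, the word-level scoring of Approach 2 and the sentence-level Models 0 and 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the toy `SEED` lists and `SYNONYMS` table stand in for the WordNet-expanded seed lists and synonym lookups, and the function names are hypothetical. The scores are unnormalized, so they are only comparable across classes for the same word.

```python
from math import prod

# Hypothetical toy seed lists; in the slides these come from WordNet expansion.
# "great" is deliberately in both lists, as the slides note can happen.
SEED = {
    "positive": ["good", "nice", "great", "fine"],
    "negative": ["bad", "poor", "awful", "great"],
}

# Hypothetical synonym sets standing in for WordNet lookups.
SYNONYMS = {
    "excellent": ["great", "good", "fine"],
    "terrible": ["awful", "bad"],
}


def word_scores(word):
    """Approach 2: P(c) * sum_i count(syn_i, c) / count(c) for each class c."""
    total = sum(len(words) for words in SEED.values())
    scores = {}
    for c, words in SEED.items():
        prior = len(words) / total  # P(c), estimated from list sizes
        hits = sum(words.count(syn) for syn in SYNONYMS.get(word, []))
        scores[c] = prior * hits / len(words)
    return scores


def word_polarity(word):
    """Class with the highest Approach 2 score."""
    scores = word_scores(word)
    return max(scores, key=scores.get)


def model0(region):
    """Model 0: product of the signs (+1 / -1) of sentiment words in the region.

    Returns 0 when the region contains no known sentiment word.
    """
    signs = [1 if word_polarity(w) == "positive" else -1
             for w in region if w in SYNONYMS]
    return prod(signs) if signs else 0


def model1(region, c):
    """Model 1: mean of P(c | w_i) over region words whose best class is c."""
    strengths = [word_scores(w)[c] for w in region
                 if w in SYNONYMS and word_polarity(w) == c]
    return sum(strengths) / len(strengths) if strengths else 0.0
```

With these toy lists, `word_polarity("excellent")` picks the positive class even though one of its synonyms ("great") also appears in the negative list, and `model0` flips sign whenever the region contains an odd number of negative words.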
