DETERMINING THE SENTIMENT OF OPINIONS
Presentation by Md Mustafizur Rahman (mr4xb)
OUTLINE
- What is an opinion?
- Problem definition
- Word sentiment classifier
- Sentence sentiment classifier
- Experimental analysis
- Shortcomings
- Future work
WHAT IS AN OPINION?
- An opinion is a quadruple [Topic, Holder, Claim, Sentiment].
  - The Holder believes a Claim about the Topic and, in many cases, associates a Sentiment with it.
- An opinion may or may not contain a sentiment.
  - e.g. "I believe the world is flat." (no sentiment)
- A sentiment can be explicit or implicit.
  - e.g. "I like apple." (explicit)
  - e.g. "We should decrease our dependence on oil." (implicit)
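The quadruple maps naturally onto a small record type. The following is a minimal Python sketch, not taken from the original work: the field names, the use of `None` for an absent sentiment, and the topic strings in the two examples are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Only these values are accepted, following the problem definition
# (positive or negative); None marks an opinion without sentiment (assumed).
ALLOWED_SENTIMENTS = {None, "positive", "negative"}


@dataclass(frozen=True)
class Opinion:
    """An opinion as the quadruple [Topic, Holder, Claim, Sentiment]."""
    topic: str
    holder: str
    claim: str
    sentiment: Optional[str] = None

    def __post_init__(self):
        # Reject labels outside the two-class scheme used in this work.
        if self.sentiment not in ALLOWED_SENTIMENTS:
            raise ValueError(f"unsupported sentiment: {self.sentiment!r}")


# The two explicit examples from the slide (topic strings are illustrative).
flat_world = Opinion(topic="shape of the world", holder="I",
                     claim="the world is flat")
likes_apple = Opinion(topic="apple", holder="I", claim="I like apple",
                      sentiment="positive")
```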
PROBLEM DEFINITION
- Opinion = [Topic, Holder, Claim, Sentiment]
- Given
  - a topic
  - a set of texts about the topic
- Find
  - the sentiment (positive or negative only) expressed about the topic in each sentence
  - the people who hold that sentiment
AUTHORS' APPROACH
Four basic stages:
1. Calculate the polarity of sentiment-bearing words (Word Sentiment Classifier)
2. Select sentences that contain both the topic and a holder
3. Identify the sentiment region based on the holder
4. Combine the word polarities into a sentence sentiment (Sentence Sentiment Classifier)
WORD SENTIMENT CLASSIFIER
- To build a classifier we need training data.
- How can training data for the word sentiment classifier be generated?
  - Assemble a small set of seed words by hand.
  - The seed list contains only words of positive or negative polarity.
  - Then grow this list by adding synonyms and antonyms from WordNet [1].
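The growing step can be sketched as a breadth-first expansion over a thesaurus. This is a minimal Python sketch under stated assumptions: `SYNONYMS` and `ANTONYMS` are a hypothetical toy stand-in for WordNet, synonyms are assumed to inherit the polarity of the word they came from while antonyms take the opposite polarity, and no filtering or stopping rule from the original system is modelled.

```python
from collections import deque

# Hypothetical toy thesaurus standing in for WordNet (not real WordNet data).
SYNONYMS = {"good": ["nice", "fine"], "nice": ["pleasant"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"], "pleasant": ["unpleasant"]}

FLIP = {"positive": "negative", "negative": "positive"}


def grow_seed_list(seeds, synonyms, antonyms):
    """Expand hand-labelled seeds {word: polarity} breadth-first.

    Returns {word: set of polarities}; a set is used because a word can be
    reached with both polarities (as with "great" and "strong").
    """
    labels = {}
    queue = deque(seeds.items())
    seen = set(queue)  # (word, polarity) pairs already queued
    for word, polarity in seeds.items():
        labels.setdefault(word, set()).add(polarity)
    while queue:
        word, polarity = queue.popleft()
        # Synonyms keep the polarity, antonyms flip it (assumed rule).
        neighbours = [(s, polarity) for s in synonyms.get(word, [])]
        neighbours += [(a, FLIP[polarity]) for a in antonyms.get(word, [])]
        for pair in neighbours:
            if pair not in seen:
                seen.add(pair)
                labels.setdefault(pair[0], set()).add(pair[1])
                queue.append(pair)
    return labels
```

Starting from the single seed `{"good": "positive"}`, the toy thesaurus labels "pleasant" positive and "bad", "poor" and "unpleasant" negative.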
WORD SENTIMENT CLASSIFIER
WORDNET

WORD SENTIMENT CLASSIFIER
WORDNET (CONTD.)
Figure: An example of the relationship between hyponyms and a hypernym [source: Wikipedia]
WORD SENTIMENT CLASSIFIER (CONTD.)
- Initial seed word list
  - Adjectives: 15 positive and 19 negative
  - Verbs: 23 positive and 21 negative
- Final word list (after expansion)
  - Adjectives: 5880 positive and 6233 negative
  - Verbs: 2840 positive and 3239 negative
- Some words, e.g. "great" and "strong", appear in both the positive and the negative categories.
WORD SENTIMENT CLASSIFIER (CONTD.)
- Now we have
  - a set of words
  - a class label (polarity) for each word, either positive or negative
- How do we calculate the strength of the sentiment polarity?
  - For a new word w, first compute its synonym set (syn_1, syn_2, …, syn_n) from WordNet.
  - Then compute $\arg\max_c P(c \mid w)$, approximated by $\arg\max_c P(c \mid syn_1, syn_2, \ldots, syn_n)$.
  - Here c is the sentiment category (positive or negative).
WORD SENTIMENT CLASSIFIER (CONTD.)
- There are two possible ways to calculate $\arg\max_c P(c \mid w)$.
- Approach 1:

$$\arg\max_c P(c \mid w) = \arg\max_c P(c)\,P(w \mid c) \approx \arg\max_c P(c)\,P(syn_1, syn_2, \ldots, syn_n \mid c) \approx \arg\max_c P(c) \prod_{k=1}^{m} P(f_k \mid c)^{\mathrm{count}(f_k,\ \mathrm{synset}(w))}$$

  - where f_k is the k-th feature of category c,
  - and count(f_k, synset(w)) is the total number of occurrences of f_k in the synonym set of w.
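Approach 1 can be sketched as a naive-Bayes-style classifier that scores a word by its synonyms. This is a minimal Python sketch, not the authors' code; the following are assumptions: the features f_k are the words found in the synonym sets of each category's training words, P(c) is the share of training words in category c, P(f_k | c) uses add-one smoothing, and `SYNSETS` is a hypothetical toy stand-in for WordNet.

```python
import math
from collections import Counter


class SynsetNaiveBayes:
    """argmax_c P(c) * prod_k P(f_k|c)^count(f_k, synset(w)), in log space."""

    def __init__(self, word_lists, synsets):
        self.synsets = synsets
        total = sum(len(words) for words in word_lists.values())
        self.prior = {c: len(words) / total for c, words in word_lists.items()}
        # Feature counts per category, pooled over the synsets of its words.
        self.counts = {c: Counter() for c in word_lists}
        for c, words in word_lists.items():
            for w in words:
                self.counts[c].update(self.synset(w))
        self.vocab = set().union(*self.counts.values())

    def synset(self, word):
        # Fall back to the word itself if the thesaurus has no entry for it.
        return self.synsets.get(word, [word])

    def log_score(self, word, c):
        # Logs avoid underflow when many small probabilities are multiplied.
        counts = self.counts[c]
        # Add-one smoothing; the extra +1 reserves mass for unseen features.
        denom = sum(counts.values()) + len(self.vocab) + 1
        score = math.log(self.prior[c])
        for f, n in Counter(self.synset(word)).items():
            score += n * math.log((counts[f] + 1) / denom)
        return score

    def classify(self, word):
        return max(self.prior, key=lambda c: self.log_score(word, c))


# Hypothetical toy data standing in for the expanded lists and WordNet.
WORD_LISTS = {"positive": ["good", "happy"], "negative": ["bad", "sad"]}
SYNSETS = {
    "good": ["good", "nice", "fine"],
    "happy": ["happy", "glad", "cheerful"],
    "bad": ["bad", "poor", "awful"],
    "sad": ["sad", "unhappy", "gloomy"],
    "pleasant": ["pleasant", "nice", "glad"],
    "dreadful": ["dreadful", "awful", "gloomy"],
}
```

With this toy data, "pleasant" shares "nice" and "glad" with the positive words and is classified positive, while "dreadful" is classified negative.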
WORD SENTIMENT CLASSIFIER (CONTD.)
- There are two possible ways to calculate $\arg\max_c P(c \mid w)$.
- Approach 2:

$$\arg\max_c P(c \mid w) = \arg\max_c P(c)\,P(w \mid c) \approx \arg\max_c P(c)\,\frac{\sum_{i=1}^{n} \mathrm{count}(syn_i, c)}{\mathrm{count}(c)}$$

  - where count(syn_i, c) is the number of occurrences of w's synonym syn_i in the word list of category c, and count(c) is the number of words in that list.
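Approach 2 needs only membership counts in the category lists. A minimal Python sketch follows; it assumes P(c) is estimated as the share of all listed words that belong to c, and the toy `WORD_LISTS` and `SYNSETS` are hypothetical stand-ins for the expanded seed lists and WordNet.

```python
from collections import Counter


def classify_approach2(word, word_lists, synsets):
    """argmax_c P(c) * sum_i count(syn_i, c) / count(c).

    Returns the winning category and the per-category scores.
    """
    total = sum(len(words) for words in word_lists.values())
    scores = {}
    for c, words in word_lists.items():
        members = Counter(words)          # count(syn_i, c) is members[syn_i]
        prior = len(words) / total        # P(c), an assumed estimate
        hits = sum(members[s] for s in synsets.get(word, [word]))
        scores[c] = prior * hits / len(words)
    return max(scores, key=scores.get), scores


# Hypothetical toy data.
WORD_LISTS = {"positive": ["good", "nice", "fine", "glad"],
              "negative": ["bad", "poor", "awful"]}
SYNSETS = {"pleasant": ["pleasant", "nice", "glad"]}
```

Note that with P(c) estimated this way, P(c)/count(c) equals 1/N for every category, so the rule reduces to choosing the list that contains more of w's synonyms; a different estimate of P(c) would change that.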
WORD SENTIMENT CLASSIFIER (CONTD.)
- The word "amusing", for example, is classified as carrying primarily positive sentiment, and "blame" as primarily negative.
- The absolute value of the score gives the strength of the sentiment: "afraid", with strength -0.99, represents strong negativity, while "abysmal", with strength -0.61, represents weaker negativity.
SENTENCE SENTIMENT CLASSIFIER
Consists of four parts:
1. Identification of the topic in the sentence (by direct matching)
2. Identification of the opinion holder
3. Identification of the sentiment region
4. A model to combine the word sentiments
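Topic identification by direct matching can be sketched with a whole-word, case-insensitive phrase search. This Python sketch is an assumption about what "direct matching" means here; in particular it applies no stemming, so a plural form does not match a singular topic.

```python
import re


def mentions_topic(sentence, topic):
    """Direct matching: True if the topic phrase occurs as whole words.

    Case-insensitive exact phrase matching (assumed); "term limits" does not
    match the topic "term limit" because no stemming is applied.
    """
    pattern = r"\b" + re.escape(topic) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def select_sentences(sentences, topic):
    """Keep only the sentences that mention the topic."""
    return [s for s in sentences if mentions_topic(s, topic)]
```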
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
HOLDER IDENTIFICATION
- Assumptions
  - Only persons and organizations can be opinion holders.
  - If a sentence contains more than one holder, pick the one closest to the topic.
- Method
  - BBN's IdentiFinder named entity tagger [2]
  - A software tool [http://www.bbn.com/technology/speech/identifinder]
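The "closest holder" rule can be sketched as follows. This Python sketch does not call IdentiFinder: the `(text, label, token_index)` tuple format and the `PERSON`/`ORGANIZATION` label names are assumptions about what a named entity tagger might return.

```python
HOLDER_TYPES = {"PERSON", "ORGANIZATION"}  # assumed tagger label names


def pick_holder(entities, topic_index):
    """Return the person/organization entity closest to the topic, or None.

    entities: list of (text, label, token_index) tuples (assumed format);
    topic_index: token position of the topic in the sentence.
    """
    candidates = [e for e in entities if e[1] in HOLDER_TYPES]
    if not candidates:
        return None
    # Distance in tokens; on a tie, min() keeps the earlier entity.
    text, _label, _index = min(candidates, key=lambda e: abs(e[2] - topic_index))
    return text
```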
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
SENTIMENT REGION IDENTIFICATION
- Where should we look for the sentiment?
  - Four candidate sentiment regions are proposed:
    - Window 1: the full sentence
    - Window 2: the words between the holder and the topic
    - Window 3: Window 2 ± 2 words
    - Window 4: from Window 2 to the end of the sentence
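The four windows can be sketched as token slices. In this Python sketch the holder and topic are assumed to be single tokens at known positions, and Window 4 is read as starting where Window 2 starts; both are assumptions rather than details from the slide.

```python
def sentiment_regions(tokens, holder_idx, topic_idx):
    """Return the four candidate sentiment regions as token lists.

    Assumes the holder and the topic are single tokens at the given
    positions; either may come first in the sentence.
    """
    lo, hi = sorted((holder_idx, topic_idx))
    return {
        1: list(tokens),                    # Window 1: full sentence
        2: tokens[lo + 1:hi],               # Window 2: between holder and topic
        3: tokens[max(0, lo - 1):hi + 2],   # Window 3: Window 2 widened by 2 words per side
        4: tokens[lo + 1:],                 # Window 4: Window 2 to end of sentence
    }
```

For the tokens `["Senator", "Smith", "strongly", "opposes", "NAFTA", "for", "now"]` with the holder at position 1 and the topic at position 4, Window 2 is `["strongly", "opposes"]`.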
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
CLASSIFICATION MODEL
- Three different models
- Model 0:

$$\prod (\text{signs in region})$$

  - Signs can be positive or negative.
- Model 1: harmonic mean of the sentiment in the region

$$P(c \mid s) = \frac{1}{n(c)} \sum_{i=1}^{n} P(c \mid w_i), \quad \text{if } \arg\max_j P(c_j \mid w_i) = c$$
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
CLASSIFICATION MODEL
- Model 1 (contd.)
  - n(c) is the number of words in the region whose sentiment category is c.
  - P(c | s) is the sentiment strength of category c for sentence s.
- Model 2: geometric mean of the sentiment in the region

$$P(c \mid s) = 10^{\,n(c)-1} \times \prod_{i=1}^{n} P(c \mid w_i), \quad \text{if } \arg\max_j P(c_j \mid w_i) = c$$
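The three combination models can be sketched side by side. In this Python sketch each region is assumed to be a list of per-word distributions P(c | w_i) for its sentiment-bearing words, and the sentence label is assumed to be the category with the highest P(c | s). Note that, as written, Model 1's formula is an arithmetic average over the n(c) words of category c.

```python
import math

SIGN = {"positive": 1, "negative": -1}


def word_category(probs):
    """Category with the highest P(c|w) for one word (first one wins ties)."""
    return max(probs, key=probs.get)


def model0(region):
    """Model 0: product of the signs of the sentiment words in the region."""
    if not region:
        return None  # no sentiment-bearing word, no decision (assumed)
    sign = math.prod(SIGN[word_category(p)] for p in region)
    return "positive" if sign > 0 else "negative"


def _members(region, c):
    # P(c|w_i) for the words whose own best category is c.
    return [p[c] for p in region if word_category(p) == c]


def _decide(scores):
    # None when no word supports any category (assumed convention).
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


def model1(region, categories=("positive", "negative")):
    """Model 1: (1/n(c)) * sum of P(c|w_i) over the words of category c."""
    scores = {}
    for c in categories:
        m = _members(region, c)
        scores[c] = sum(m) / len(m) if m else 0.0
    return _decide(scores)


def model2(region, categories=("positive", "negative")):
    """Model 2: 10^(n(c)-1) times the product of P(c|w_i) for those words."""
    scores = {}
    for c in categories:
        m = _members(region, c)
        scores[c] = 10 ** (len(m) - 1) * math.prod(m) if m else 0.0
    return _decide(scores)


# Illustrative region: two mostly positive words and one mostly negative word.
region = [{"positive": 0.9, "negative": 0.1},
          {"positive": 0.2, "negative": 0.8},
          {"positive": 0.6, "negative": 0.4}]
```

On this region the models disagree: Model 0 returns "negative" because the signs multiply to -1, Model 1 returns "negative" (0.8 against an average of 0.75), and Model 2 returns "positive" (10 × 0.9 × 0.6 = 5.4 against 0.8).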
SYSTEM ARCHITECTURE
EXPERIMENTAL ANALYSIS
- Two sets of experiments:
  - Word sentiment classifier
  - Sentence sentiment classifier
EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
- Dataset
  - Word list from the TOEFL exam
  - A predefined list containing 19748 English adjectives and 8011 English verbs
  - Take the intersection of the two lists above.
  - Finally, randomly select 462 adjectives and 502 verbs.
- Annotation of the dataset
  - Human 1 and Human 2 label the adjectives.
  - Human 2 and Human 3 label the verbs.
EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
- Class labels: Positive, Negative and Neutral
- Measurement type
  - Strict: all three class labels are considered.
  - Lenient: only two class labels, Negative vs. Positive merged with Neutral.
Table: Inter-Human Agreement
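The strict and lenient measures can be sketched as simple percent agreement under two label mappings. This Python sketch assumes plain percent agreement (no chance correction) and reads "lenient" as merging Neutral into Positive; both are assumptions, not details confirmed by the slide.

```python
def agreement(labels_a, labels_b, lenient=False):
    """Fraction of words on which two annotators assign the same label.

    Strict: positive, negative and neutral are all distinct.
    Lenient: neutral is merged with positive (assumed reading).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    merge = {"neutral": "positive"} if lenient else {}
    same = sum(merge.get(a, a) == merge.get(b, b)
               for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)
```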
EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
Table: Human-Machine Agreement (Small Seed Set)
Table: Human-Machine Agreement (Larger Seed Set)
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
- Dataset
  - 100 sentences from the DUC 2001 corpus [3]
  - Topics covered: "illegal alien", "term limit", "gun control" and "NAFTA"
- Annotation of the sentences
  - Two human annotators classify each sentence into one of three classes: positive, negative and N/A.
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
- Experiment variants
  - Three different models
  - Four different windows
  - Two different word classifier models
  - Manually annotated holder vs. automatically identified holder
- In total: 16 variants each for Model 1 and Model 2 (4 windows × 2 word classifier models × 2 holder settings), and 8 variants for Model 0.
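The variant count for Models 1 and 2 is the Cartesian product of the three remaining factors. In this Python sketch the factor labels are hypothetical, and Model 0's count of 8 is copied from the slide rather than derived, since the slide does not say which factor it drops.

```python
from itertools import product

# Hypothetical labels for the experimental factors listed on the slide.
WINDOWS = ("window1", "window2", "window3", "window4")
WORD_CLASSIFIERS = ("approach1", "approach2")
HOLDER_SOURCES = ("manual", "automatic")

# Every combination of window, word classifier model and holder source.
configs = list(product(WINDOWS, WORD_CLASSIFIERS, HOLDER_SOURCES))

variants_per_model = {
    "model1": len(configs),
    "model2": len(configs),
    "model0": 8,  # taken from the slide, not derived here
}
total_variants = sum(variants_per_model.values())
```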
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
Table: Results with manually annotated holder
Table: Results with automatic holder
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
- Performance metric
  - Correctness: correct identification of both the holder and the sentiment
- Best model: Model 0
- Best window: Window 4
- Accuracy
  - 81% with manually annotated holders
  - 67% with automatically identified holders
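The correctness measure counts a sentence only when both parts of the prediction match. A minimal Python sketch, assuming each sentence's gold and predicted outputs are `(holder, sentiment)` pairs; the holder names below are hypothetical.

```python
def correctness_accuracy(gold, predicted):
    """Share of sentences where both the holder and the sentiment are right.

    gold / predicted: lists of (holder, sentiment) pairs (assumed format).
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


gold = [("Smith", "negative"), ("the League", "positive"), ("Jones", "negative")]
pred = [("Smith", "negative"), ("the League", "negative"), ("Brown", "negative")]
```

Here only the first sentence counts: the second has the right holder but the wrong sentiment, and the third the right sentiment but the wrong holder, giving an accuracy of 1/3.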
SHORTCOMINGS
- Only a unigram model is considered.
  - As a result, the model fails on words that can carry either positive or negative sentiment.
  - e.g. "Term limit really hit at democracy."
- The model cannot infer sentiment from facts.
  - Without an adjective, verb or noun sentiment word, the sentence cannot be classified.
  - e.g. "She thinks term limit will give women more opportunities in politics."
FUTURE WORK
- One assumption of this work is that the topic is given.
  - Can we extract the topic automatically?
  - e.g. Twitter hashtags?
- Sentiment beyond just positive or negative
- Context-dependent sentiment (bigram or trigram analysis)
REFERENCES
[1] Miller, G.A., R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. 1993. Introduction to WordNet: An On-Line Lexical Database. http://www.cosgi.princeton.edu/~wn.
[2] BBN IdentiFinder named entity tagger. http://www.bbn.com/technology/speech/identifinder
[3] DUC 2001 Corpus. http://wwwnlpir.nist.gov/projects/duc/data.html