Sentiment analysis overview in the text area
-- Yuanyuan Liu
Sentiment analysis
• Sentiment analysis (also known as opinion mining) refers to the
use of natural language processing, text analysis and computational
linguistics to identify and extract subjective information in source
materials. Sentiment analysis is widely applied to reviews and social
media for a variety of applications, ranging
from marketing to customer service.
• Generally speaking, sentiment analysis aims to determine the
attitude of a speaker or a writer with respect to some topic or the
overall contextual polarity of a document. The attitude may be his
or her judgment or evaluation (see appraisal theory), affective state
(that is to say, the emotional state of the author when writing), or
the intended emotional communication (that is to say, the
emotional effect the author wishes to have on the reader).
Introduction
• Goal
• Granularity
 Document level
 Paragraph level
 Sentence level
 Feature/aspect level
• Evaluation
 accuracy [precision and recall]
Methods
• knowledge-based techniques (see the sketch after this list)
 classify text by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored.
 assign arbitrary words a probable “affinity” to particular emotions.
• statistical methods
• Machine learning
• hybrid approaches
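A minimal sketch of the knowledge-based idea, assuming a toy lexicon: the word sets below are illustrative stand-ins, not any of the real lexicons listed later.

```python
# Minimal knowledge-based classifier: count unambiguous affect words.
# The word sets are toy stand-ins for a real sentiment lexicon.
POSITIVE = {"happy", "great", "good", "wonderful"}
NEGATIVE = {"sad", "afraid", "bored", "dreadful", "bad"}

def lexicon_polarity(text: str) -> str:
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_polarity("great food but the service was dreadful"))  # -> neutral
```

Note how the mixed sentence comes out neutral: simple word counting cannot tell which aspect each affect word describes, which motivates the aspect-level model later in this deck.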
Measures using ML
• Classifiers
 Naïve Bayes
 Maximum Entropy (MaxEnt)
 Feature-based SVM
 …
• Neural networks
 Recurrent neural network (RNN)
 Convolutional neural network (CNN)
 …
• Deep memory network and attention model
Sentiment Lexicons
• GI (The General Inquirer)
• LIWC (Linguistic Inquiry and Word Count)
• MPQA Subjectivity Cues Lexicon
• Bing Liu Opinion Lexicon
• SentiWordNet
Naïve Bayes
• assign to a given document d the class c* = argmaxc P(c | d)
• Assumption: the fi’s are conditionally independent given d’s class:
 PNB(c | d) = ( P(c) ∏i P(fi | c)^ni(d) ) / P(d)
 where ni(d) is the number of times feature fi appears in d
Naïve Bayes
• Advantages:
 Simple
• Disadvantages:
 Its conditional independence assumption clearly does not hold in real-world situations.
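A from-scratch sketch of the formula above, assuming bag-of-words features fi and add-one smoothing (standard choices the slide does not specify; the training data is a toy stand-in):

```python
import math
from collections import Counter, defaultdict

# Toy training data: (document, class). Illustrative examples, not a real corpus.
train = [("great wonderful movie", "pos"),
         ("happy fun film", "pos"),
         ("dreadful boring movie", "neg"),
         ("sad bad film", "neg")]

class_counts = Counter(c for _, c in train)
word_counts = defaultdict(Counter)        # word_counts[c][w] = count of w in class c
vocab = set()
for doc, c in train:
    for w in doc.split():
        word_counts[c][w] += 1
        vocab.add(w)

def classify(doc: str) -> str:
    # c* = argmax_c P(c) * prod_i P(f_i | c)^{n_i(d)}, computed in log space,
    # with add-one (Laplace) smoothing for unseen words.
    scores = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_p = math.log(class_counts[c] / len(train))
        for w in doc.split():
            log_p += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = log_p
    return max(scores, key=scores.get)

print(classify("wonderful fun movie"))    # expected: pos
```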
MaxEnt
• MaxEnt model:
 PME(c | d) = (1 / Z(d)) exp( ∑i λi,c Fi,c(d, c) )
 where Z(d) is a normalization factor, Fi,c is a feature/class indicator function, and the λi,c are feature-weight parameters
MaxEnt
• Advantages:
 MaxEnt makes no assumptions about the relationships between features, and so might potentially perform better when conditional independence assumptions are not met.
• Disadvantages:
 Computationally expensive to train.
• Tutorial: Adam Berger, http://www.cs.cmu.edu/afs/cs/user/aberger/www/html/tutorial/tutorial.html
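Multinomial logistic regression is the usual implementation of this model; a minimal sketch with scikit-learn over bag-of-words features (toy data again):

```python
# MaxEnt over bag-of-words features is equivalent to multinomial logistic
# regression; minimal scikit-learn sketch (toy data, illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["great wonderful movie", "happy fun film",
        "dreadful boring movie", "sad bad film"]
labels = ["pos", "pos", "neg", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

clf = LogisticRegression(max_iter=1000)   # fits the lambda weights
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["wonderful fun movie"])))  # expected: ['pos']
```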
SVM
• Find a separating hyperplane and maximize the margin.
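The same pipeline works with a max-margin linear SVM as the classifier (toy data, illustrative only):

```python
# Feature-based SVM sketch: linear SVM over bag-of-words features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

docs = ["great wonderful movie", "happy fun film",
        "dreadful boring movie", "sad bad film"]
labels = ["pos", "pos", "neg", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

clf = LinearSVC()   # finds the separating hyperplane with the largest margin
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["dreadful sad movie"])))  # expected: ['neg']
```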
Accuracy comparison
Datasets: movie reviews from the Internet Movie Database (IMDb)
Papers
• Surveys:
 Thumbs up? Sentiment Classification using Machine Learning Techniques (Pang & Lee)
 Opinion Mining and Sentiment Analysis (Pang & Lee)
 Comprehensive Review of Opinion Summarization (Kim et al.)
 New Avenues in Opinion Mining and Sentiment Analysis (Cambria et al.)
RNN
• A recurrent neural network (RNN) is a class of artificial neural
network where connections between units form a directed
cycle. This creates an internal state of the network which
allows it to exhibit dynamic temporal behavior.
• Application:
• handwriting recognition
• speech recognition
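A minimal numpy sketch of the recurrence, assuming an Elman-style cell with a tanh nonlinearity (a conventional choice, not specified on the slide):

```python
# Minimal Elman-style RNN cell: the hidden state h is the "internal state"
# that gives the network dynamic temporal behavior.
import numpy as np

d_in, d_h = 4, 8
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))    # hidden -> hidden (the directed cycle)
b_h = np.zeros(d_h)

def rnn_forward(inputs):
    h = np.zeros(d_h)                  # initial state
    for x_t in inputs:                 # one step per token / time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h                           # final state summarizes the sequence

sequence = rng.normal(size=(5, d_in))  # e.g. 5 word vectors
print(rnn_forward(sequence).shape)     # (8,)
```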
CNN
• A convolutional neural network (CNN,
or ConvNet) is a type of feed-forward artificial
neural network in which the connectivity
pattern between its neurons is inspired by the
organization of the animal visual cortex.
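A minimal numpy sketch of the CNN building block as commonly used for sentence classification: 1-D convolution over word vectors followed by max-pooling (dimensions and pooling are conventional illustrative choices):

```python
# 1-D convolution over word vectors, as in text CNNs (illustrative sketch).
import numpy as np

n, d, width, n_filters = 7, 4, 3, 2    # sentence length, embed dim, window, filters
rng = np.random.default_rng(0)
sentence = rng.normal(size=(n, d))     # one vector per word
filters = rng.normal(size=(n_filters, width, d))

def conv_and_pool(x):
    # Slide each filter over every window of `width` consecutive words,
    # then max-pool over positions: a fixed-size sentence feature vector.
    feats = []
    for f in filters:
        acts = [np.sum(f * x[i:i + width]) for i in range(n - width + 1)]
        feats.append(max(acts))
    return np.array(feats)

print(conv_and_pool(sentence).shape)   # (2,)
```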
Aspect Level Sentiment Classification
with Deep Memory Network
Duyu Tang, Bing Qin, Ting Liu
Motivation
• Drawbacks of conventional neural models
• capture context information in an implicit way, and are
incapable of explicitly exhibiting important context clues of
an aspect.
• expensive computation
• Intuition: only a subset of the context words is needed to infer the sentiment towards an aspect.
• E.g. “ great food but the service was dreadful! ”
Background: Memory network
• Question answering
• Central idea: inference with a long-term memory
component
• Components:
 Memory m: an array of objects
 I: converts input to internal feature representation
 G: updates old memories with new input
 O: generates an output representation given a new input and the current memory state
 R: outputs a response based on the output representation
Background: attention model
• One important property of human perception is
that one does not tend to process a whole scene
in its entirety at once.
• Instead, humans focus attention selectively on
parts of the visual space to acquire information
when and where it is needed, and combine
information from different fixations over time to
build up an internal representation of the scene,
guiding future eye movements and decision
making.
Deep memory network model
• sentence s = {w1, w2, … , wi, … , wn}, where wi is the aspect word
• word embedding matrix L ∈ R^(d×|V|), where d is the dimension of the word vector and |V| is the vocabulary size
• word embedding of wi: ei ∈ R^(d×1), a column of L
• Task: determining the sentiment polarity of sentence s towards the aspect wi
Overview of the approach
Figure 1: An illustration of our deep memory network with three computational
layers (hops) for aspect level sentiment classification
Attention model
• Content attention
• Location attention
Content attention
• Intuition:
 context words do not contribute equally to the
semantic meaning of a sentence
 the importance of a word should be different if we focus on different aspects
Content attention
• Input:
 external memory m = {m1, m2, … , mn}
 aspect vector vaspect
• Output: a weighted sum of the memory pieces,
 vec = ∑i αi mi
 where mi is a piece of memory m, and αi ∈ [0,1] is the weight of mi with ∑i αi = 1
Calculation of αi
• Softmax function:
 αi = exp(gi) / ∑j exp(gj)
• where gi = tanh(Watt [mi ; vaspect] + batt) scores memory piece mi against the aspect, with Watt ∈ R^(1×2d) and batt ∈ R^(1×1)
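A minimal numpy sketch of this step (Watt and batt are randomly initialized here; in the model they are learned):

```python
# Content attention: score each memory piece against the aspect, softmax,
# then return the weighted sum of the memory (illustrative dimensions).
import numpy as np

n, d = 6, 4                                  # context words, embedding dim
rng = np.random.default_rng(0)
m = rng.normal(size=(n, d))                  # external memory: one piece per word
v_aspect = rng.normal(size=d)                # aspect vector
W_att = rng.normal(scale=0.1, size=(1, 2 * d))
b_att = 0.0

def content_attention(m, v_aspect):
    # g_i = tanh(W_att [m_i ; v_aspect] + b_att), then softmax over i
    g = np.array([np.tanh(W_att @ np.concatenate([m_i, v_aspect]) + b_att).item()
                  for m_i in m])
    alpha = np.exp(g) / np.exp(g).sum()      # alpha_i in [0,1], sums to 1
    return alpha @ m                         # weighted sum of memory pieces

print(content_attention(m, v_aspect).shape)  # (4,)
```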
Location attention
• Intuition:
• a context word closer to the aspect should
be more important than a farther one.
Location attention: model 1
• The memory vector mi:
 mi = ei ⊙ vi (element-wise multiplication)
 every element of vi: vi^k = (1 − li/n) − (k/d)(1 − 2 × li/n)
• vi ∈ R^(d×1) is a location vector for word wi;
 n is the sentence length
 k is the hop number
 li is the location of wi
Location attention: model 2
• A simplified version of model 1 that shares the location vector across hops; the memory vector mi:
 mi = ei ⊙ vi, with every element of vi set to 1 − li/n
• vi ∈ R^(d×1) is a location vector for word wi
Location attention: model 3
• The memory vector mi:
 mi = ei + vi
• vi is regarded as a parameter (a learned position embedding)
Location attention: model 4
• The memory vector mi:
 mi = ei ⊙ σ(vi)
• Different from model 3, location representations are regarded as neural gates that control how much of the word's semantics is written into the memory: the learned vi is passed through a sigmoid σ before the element-wise multiplication.
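A numpy sketch of the four memory constructions side by side (shapes illustrative; the "learned" location vectors of models 3 and 4 are randomly initialized stand-ins):

```python
# Side-by-side sketch of the four location-attention memories.
import numpy as np

n, d = 6, 4                             # sentence length, embedding dim
rng = np.random.default_rng(0)
e = rng.normal(size=(n, d))             # word embeddings e_i
v_learned = rng.normal(size=(n, d))     # per-word location parameters (models 3/4)

def model1(i, k):                       # hop-dependent location weighting
    l = i + 1                           # 1-based location of w_i
    w = (1 - l / n) - (k / d) * (1 - 2 * l / n)   # same value for every element
    return e[i] * w

def model2(i):                          # same location weight in every hop
    return e[i] * (1 - (i + 1) / n)

def model3(i):                          # learned position embedding, added
    return e[i] + v_learned[i]

def model4(i):                          # learned position gate, multiplied
    return e[i] * (1 / (1 + np.exp(-v_learned[i])))   # sigmoid gate

print(model1(2, k=1).shape, model2(2).shape, model3(2).shape, model4(2).shape)
```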
The Need for Multiple Hops
• Computational models that are composed of
multiple processing layers have the ability to
learn representations of data with multiple levels
of abstraction.
• In this work, the attention within a single layer is essentially a weighted-average compositional function, which is not powerful enough to handle sophisticated linguistic phenomena such as negation, intensification, and contrast.
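A numpy sketch of the hop stacking shown in Figure 1, assuming each hop's attention output is summed with a linear transform of the hop's input and fed to the next hop (dimensions illustrative, parameters randomly initialized):

```python
# Stacking hops: each hop attends over the memory and combines the result
# with a linear transform of its input, which becomes the next hop's input.
import numpy as np

n, d, K = 6, 4, 3                        # context words, embed dim, hops
rng = np.random.default_rng(0)
m = rng.normal(size=(n, d))              # external memory (context words)
W_att = rng.normal(scale=0.1, size=(1, 2 * d))
W_lin = rng.normal(scale=0.1, size=(d, d))

def attention(m, x):
    g = np.array([np.tanh(W_att @ np.concatenate([m_i, x])).item() for m_i in m])
    alpha = np.exp(g) / np.exp(g).sum()
    return alpha @ m

x = rng.normal(size=d)                   # hop 1 input: the aspect vector
for _ in range(K):                       # K computational layers (hops)
    x = attention(m, x) + W_lin @ x      # output of one hop feeds the next

print(x.shape)                           # final vector fed to the classifier
```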
Aspect level sentiment classification
• Regard the output vector of the last hop as the feature, and feed it to a softmax layer for aspect level sentiment classification.
• Training:
 minimize the cross-entropy error of sentiment classification
 loss function: loss = − ∑(s,a)∈T ∑c∈C Pc^g(s, a) · log Pc(s, a)
 where T is the training set, C the set of sentiment classes, Pc^g the gold (one-hot) distribution, and Pc the predicted distribution
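A minimal sketch of that final step: a softmax over the last hop's output and the cross-entropy for one example (W_s and b_s are illustrative names for the softmax parameters, learned jointly in the real model):

```python
# Softmax classification over the final hop's output vector (illustrative).
import numpy as np

d, n_classes = 4, 3                      # feature dim; e.g. neg / neutral / pos
rng = np.random.default_rng(0)
W_s = rng.normal(scale=0.1, size=(n_classes, d))
b_s = np.zeros(n_classes)
x = rng.normal(size=d)                   # output vector of the last hop

logits = W_s @ x + b_s
probs = np.exp(logits) / np.exp(logits).sum()     # softmax

gold = np.array([0.0, 0.0, 1.0])         # one-hot gold label
loss = -np.sum(gold * np.log(probs))     # cross-entropy for one example
print(probs, loss)
```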
Experiments
• Datasets [from SemEval 2014]
Comparison to other methods
• accuracy
• runtime
Effects of location attention
Visualize Attention Models
Error Analysis
• 1. non-compositional sentiment expression.
• E.g. “dessert was also to die for!”
• 2. complex aspect expression consisting of many
words.
• E.g. “ask for the round corner table next to the large window.”
• 3. sentimental relation between context words such
as negation, comparison and condition.
• E.g. “but dinner here is never disappointing, even if the prices
are a bit over the top”.
Conclusion
• develop deep memory networks that capture
importance of context words for aspect level
sentiment classification.
• leverage both content and location
information.
• using multiple computational layers in the memory network can yield improved performance.
Thanks