The differences between Sentiment Analysis and Artificial Intelligence Driven Emotion
Micah Ainsley Brown, HnD, MBCS
Centiment
[email protected]
Shane Pase, Ph.D
Fielding University
Matthew Price, Ph.D
Fielding University
Tunisha Singleton, Ph.D
Fielding University
Abstract
This paper examines the current state of sentiment analysis: what it is, how it works, and what it is currently used for. The differences between sentiment analysis and artificial intelligence driven emotion are explored. We also discuss how Centiment’s solution differs from other artificial intelligence driven tools in this space, including how the neurological and neurophysiological aspects of the product apply to practical usage, as well as architectural aspects of the solution.
Introduction
Human emotion, at its most basic level, in an example like love, is the experiencing of multiple layers of Cognitive Intimate Imitation: an overlap of recollection and romantic perception. These depths of function are what make us human; they are also the fundamental difference between us and animals. The amygdala, the part of our brain that controls emotion, processes emotion in direct proportion to the orders of magnitude of neural connections the subject brain has. So, in not so many words, the depth of our emotional intelligence depends on our ability to learn: the more a life form can learn and store, the deeper its self-awareness and external emotional perception.
What does this have to do with sentiment analysis and artificial intelligence? Before answering that question, we need to define sentiment analysis, and the nature of this matter requires a deep dive into how we do this. At a high level, sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information from source materials.
A Review of Sentiment Analysis
A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level - whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as "angry", "sad", and "happy". Early work in that area includes Turney and Pang, who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predicting star ratings on either a 3- or a 4-star scale, while Snyder[4] performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale). Even though in most statistical classification methods the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified.
Machine Learning and Sentiment Analysis
First generation machine learning has and is currently being used in sentiment analysis, In fact it can be
proven that specific classifiers such as the Max Entropy and the Support Vector Machines can benefit
from the introduction of a neutral class and improve the overall accuracy of sentiment classification.
There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first
identifying the neutral language, filtering it out and then assessing the rest in terms of positive and
negative sentiments, or it builds a three way classification in one step. This second approach often
involves estimating a probability distribution over all categories (e.g. Naive Bayes classifiers as
implemented by Python's NLTK kit). Whether and how to use a neutral class depends on the nature of the
data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter
the neutral language out and focus on the polarity between positive and negative sentiments. If, in
contrast, the data is mostly neutral with small deviations towards positive and negative affect, this
strategy would make it harder to clearly distinguish between the two poles.
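To make the two strategies concrete, below is a minimal sketch of the one-step, three-way approach using NLTK's Naive Bayes classifier mentioned above. The toy training examples and the bag-of-words feature function are illustrative assumptions rather than a production pipeline.

    from nltk.classify import NaiveBayesClassifier

    def bag_of_words(text):
        # Simple word-presence features; a real system would tokenize and normalize properly.
        return {word.lower(): True for word in text.split()}

    # Hypothetical labelled examples covering all three classes.
    train = [
        ("I love this phone, the screen is fantastic", "positive"),
        ("Terrible battery life and poor support", "negative"),
        ("The package arrived on Tuesday", "neutral"),
        ("Shipping took three days", "neutral"),
    ]

    classifier = NaiveBayesClassifier.train([(bag_of_words(t), label) for t, label in train])
    print(classifier.classify(bag_of_words("the screen is fantastic")))
    # The same model exposes a probability distribution over all three categories:
    print(classifier.prob_classify(bag_of_words("it arrived on Tuesday")).prob("neutral"))

The filtering variant would instead train a binary neutral-versus-polar classifier first and pass only the polar texts to a second positive/negative model.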
A different method for determining sentiment is the use of a scaling system whereby words commonly associated with a negative, neutral or positive sentiment are given an associated number on a -10 to +10 scale (most negative up to most positive). This makes it possible to adjust the sentiment of
a given term relative to its environment (usually on the level of the sentence). When a piece of
unstructured text is analyzed using natural language processing, each concept in the specified
environment is given a score based on the way sentiment words relate to the concept and its associated
score. This allows movement to a more sophisticated understanding of sentiment, because it is now
possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words,
for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score.
Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to
determine the sentiment in a text rather than the overall polarity and strength of the text.
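The following sketch illustrates this scaling approach. The tiny lexicon, intensifier weights and two-token negation window are invented for illustration; real systems use much larger dictionaries and proper sentence-level analysis.

    # Illustrative -10..+10 lexicon with modifier handling (assumed values).
    LEXICON = {"excellent": 8, "good": 4, "mediocre": -2, "awful": -8}
    INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
    NEGATORS = {"not", "never", "no"}

    def score_sentence(sentence):
        tokens = sentence.lower().split()
        total = 0.0
        for i, tok in enumerate(tokens):
            if tok in LEXICON:
                score = float(LEXICON[tok])
                prev = tokens[i - 1] if i > 0 else ""
                if prev in INTENSIFIERS:
                    score *= INTENSIFIERS[prev]
                # Flip the sign if a negator appears within the two preceding tokens.
                if prev in NEGATORS or (i > 1 and tokens[i - 2] in NEGATORS):
                    score *= -1
                total += score
        return total

    print(score_sentence("The food was not good"))            # -4.0
    print(score_sentence("The service was extremely awful"))  # -16.0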
Recent works, such as those of Rosa, Rodríguez and Bressan, detect sentiment variations in accordance with the user's profile. In sentiment analysis it is also important to consider different scores for verb tenses, negative sentences and other constructions, as in the Sentimeter-Br metric.
Subjectivity/Objectivity Identification
This task is commonly defined as classifying a given text (usually a sentence) into one of two classes:
objective or subjective. This problem can sometimes be more difficult than polarity classification. The
subjectivity of words and phrases may depend on their context and an objective document may contain
subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su,[14]
results are largely dependent on the definition of subjectivity used when annotating texts. However,
Pang[15] showed that removing objective sentences from a document before classifying its polarity helped
improve performance.
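As a small illustration of that finding, the sketch below (assuming the TextBlob library is available) drops sentences that score as largely objective before averaging polarity over what remains; the 0.3 subjectivity threshold is an arbitrary assumption.

    from textblob import TextBlob

    def polarity_of_subjective_part(document, threshold=0.3):
        blob = TextBlob(document)
        # Keep only sentences whose estimated subjectivity clears the threshold.
        subjective = [s for s in blob.sentences if s.sentiment.subjectivity >= threshold]
        if not subjective:
            return 0.0  # nothing subjective left to judge
        return sum(s.sentiment.polarity for s in subjective) / len(subjective)

    text = "The phone ships with a 5000 mAh battery. I absolutely love how long it lasts."
    print(polarity_of_subjective_part(text))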
Feature and Aspect-Based Classification
Feature and aspect-based classification refers to determining the opinions or sentiments expressed on
different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank.[16] A feature or
aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service for a
restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is that it can capture nuances about the objects of interest. Different features can generate different sentiment responses; for example, a hotel can have a convenient location but mediocre food. This problem involves
several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining
whether an opinion expressed on each feature/aspect is positive, negative or neutral. The automatic
identification of features can be performed with syntactic methods or with topic modeling. More detailed
discussions about this level of sentiment analysis can be found in Liu's work.
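A deliberately naive sketch of this idea follows: each known aspect term is assigned the polarity of the nearest sentiment word in the review. The aspect and sentiment lexicons are invented for illustration; as noted above, practical systems rely on syntactic methods or topic modeling rather than fixed word lists.

    ASPECTS = {"location", "food", "screen", "battery", "service"}
    SENTIMENT = {"convenient": 1, "great": 1, "mediocre": -1, "poor": -1}

    def aspect_opinions(review):
        tokens = review.lower().replace(",", " ").replace(".", " ").split()
        aspect_positions = [(i, t) for i, t in enumerate(tokens) if t in ASPECTS]
        sentiment_positions = [(i, t) for i, t in enumerate(tokens) if t in SENTIMENT]
        results = {}
        for ai, aspect in aspect_positions:
            if not sentiment_positions:
                continue
            # The nearest sentiment word decides this aspect's polarity.
            _, word = min(sentiment_positions, key=lambda p: abs(p[0] - ai))
            results[aspect] = "positive" if SENTIMENT[word] > 0 else "negative"
        return results

    print(aspect_opinions("The hotel has a convenient location, but mediocre food."))
    # {'location': 'positive', 'food': 'negative'}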
Deep Learning and Sentiment Analysis
Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. Beyond these sits second generation artificial intelligence focused on the use of neural networks, which is where Centiment comes in.
Knowledge-based techniques classify text by affect categories based on the presence of unambiguous
affect words such as happy, sad, afraid, and bored. Some knowledge bases not only list obvious affect
words, but also assign arbitrary words a probable "affinity" to particular emotions. Statistical methods leverage elements of machine learning such as latent semantic analysis, support vector machines, "bag of words" and Semantic Orientation - Pointwise Mutual Information (see Peter Turney's[1] work in this area). More sophisticated methods try to detect the holder of a sentiment (i.e., the person who maintains that affective state) and the target (i.e., the entity about which the affect is felt). To mine the opinion in context and get the feature that has been opinionated, the grammatical relationships of words are used. Grammatical dependency relations are obtained by deep parsing of the text. Hybrid approaches leverage both machine learning and elements of knowledge representation such as ontologies and semantic networks in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.
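For reference, the Semantic Orientation - Pointwise Mutual Information measure from Turney's work mentioned above can be sketched as follows; the co-occurrence counts are hypothetical stand-ins for corpus or search-engine hit counts.

    import math

    def pmi(hits_xy, hits_x, hits_y, total):
        # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
        return math.log2((hits_xy / total) / ((hits_x / total) * (hits_y / total)))

    def semantic_orientation(hits, total):
        # Orientation = PMI(phrase, "excellent") - PMI(phrase, "poor").
        return (pmi(hits["phrase_excellent"], hits["phrase"], hits["excellent"], total)
                - pmi(hits["phrase_poor"], hits["phrase"], hits["poor"], total))

    # Hypothetical hit counts for some candidate phrase:
    counts = {"phrase": 1_000, "excellent": 50_000, "poor": 60_000,
              "phrase_excellent": 120, "phrase_poor": 30}
    print(semantic_orientation(counts, total=10_000_000))  # positive value => positive orientation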
Open source software tools deploy machine learning, statistics, and natural language processing
techniques to automate sentiment analysis on large collections of texts, including web pages, online news,
internet discussion groups, online reviews, web blogs, and social media.[28] Knowledge-based systems, on the other hand, make use of publicly available resources to extract the semantic and affective information associated with natural language concepts. Sentiment analysis can also be performed on visual content, i.e., images and videos. One of the first approaches in this direction is SentiBank,[29] which utilizes an adjective-noun pair representation of visual content.
A human analysis component is required in sentiment analysis, as automated systems are not able to analyze the historical tendencies of the individual commenter or the platform, and often classify expressed sentiment incorrectly. Automation impacts approximately 23% of comments that are correctly classified by humans. However, humans also often disagree, and it is argued that inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.
Sometimes the structure of sentiments and topics is fairly complex. Also, the problem of sentiment analysis is non-monotonic with respect to sentence extension and stop-word substitution (compare THEY would not let my dog stay in this hotel vs. I would not let my dog stay in this hotel). To address this issue a number of rule-based and reasoning-based approaches have been applied to sentiment analysis, including defeasible logic programming.[32] There are also a number of tree traversal rules applied to the syntactic parse tree to extract the topicality of sentiment in an open-domain setting.
Emotion and Sentiment
The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by precision and recall. However, according to research, human raters typically agree 79%[1] of the time (see inter-rater reliability).
Thus, a 70% accurate program is doing nearly as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about any answer.[2] More sophisticated measures can be
applied, but evaluation of sentiment analysis systems remains a complex matter. For sentiment analysis
tasks returning a scale rather than a binary judgment, correlation is a better measure than precision
because it takes into account how close the predicted value is to the target value.
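A small worked illustration of that last point, using an invented set of star-rating predictions: exact-match accuracy penalizes a prediction of 4 stars against a target of 5 exactly as much as a prediction of 1 star, while correlation credits the near miss.

    from statistics import correlation  # available in Python 3.10+

    # Invented gold-standard star ratings and model predictions on a 1-5 scale.
    target    = [5, 4, 1, 2, 3, 5, 1]
    predicted = [4, 4, 1, 1, 3, 5, 2]

    exact = sum(p == t for p, t in zip(predicted, target)) / len(target)
    print(f"exact-match accuracy: {exact:.2f}")                           # ~0.57
    print(f"Pearson correlation:  {correlation(predicted, target):.2f}")  # ~0.92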
The very definition of sentiment analysis points to the difference between sentiment analysis and emotion, which is what Centiment understands. Sentiment analysis is driven by conventional artificial intelligence tools and methods, but it is NOT driven by second generation artificial intelligence; our emotional analysis is. The additional ability of our tool to drill down and find permutational differences in expression means the generalized accuracy rate of 79% is closer to 80-85% emotional correctness in our most recent user tests. Second generation artificial intelligence mostly revolves around the use of artificial neural networks - in our case, the convolutional neural network.
A History of Neural Networks
The NYT covers this history, and Wikipedia does also. For a long time it was assumed, by some of the smartest people in the world and in this field, that the only way for intelligent computers to work like humans was to explicitly program them with every permutation of how they needed to think. Early versions of this led to what is referred to as the AI "winter", a period in which all progress in the field was halted due to the small-mindedness of non-scientists who were tripped up by early mistakes within the field.
For a long period of time (from about 1950 to 2000) there were small advances, but for the most part the field was not only outside the technology mainstream but massively misunderstood. Some of the people who are now on the Google Brain project were viewed as crazy for telling the world what was possible using AI, and they were ostracized from the academic community.
Then something interesting happened: Moore's law kicked in, servers got cheaper and data storage exploded, creating the perfectly fertile ground that had been sought after by many in the community for decades. The horizon of the dream was here.
Then DARPA got involved. Still, for the most part, the general public knew nothing about the field, and artificial intelligence specialists were seen as wacky, despite the fact that AI was slowly seeping into daily life. Within military and academic circles, advancement was being made at a rapid rate. Then came
2007. The iPhone. Voice Control, later to be called Siri.
Artificial intelligence was now in all of our hands, but still, for the most part, the masses did not make the connection. However, the cat was out of the bag at this point. IBM saw this and formed Watson, a group at the massive computing company focused on creating AI tools; Google, Facebook and many other tech companies followed suit.
The fundamental point here: things like clustering, Bayesian inference, the support vector machine and other supervised methods were still being used for much of this, and artificial neural networks had for the most part not been revisited, meaning that there were serious limits on the results that would come out of these products. Then in 2011 Google broke that trend, committing serious corporate resources to Google Brain. Check out the story here. Google were not the only ones to do this, and as the movement picked up steam, it began to rewrite the rules on everything: logistics, translation, finance, even bioinformatics.
That leads us to the Centiment team and our definition of emotion, the way we see it. Most of the field has developed the ability to identify anywhere from around six distinct emotions to around 20. We have made breakthroughs not in increasing the number of understood emotions - although our ability to do that is pretty good - but in the context in which they are understood.
In order to really break new ground in understanding contextual emotion, an entirely new dataset beyond text is needed - in our case that dataset is EEG and fMRI data.
The difference in what we do also lies in where emotion is understood in relation to content - specifically video. Most people in this field attempt to understand the emotional significance of the entire video as a whole. We understand emotion at specific timestamps.
By understanding emotion at specific points in context, you can understand (or at least attempt to create a way to understand) how the person watching the content feels, and the "mood" of the content itself. Deeply understanding whether the consumer’s mood matches the content gets you to the statistically most accurate price per consumer, resulting in more wins per bid and higher conversion rates.
More important than the money: by reconciling all the massive amounts of social data out there with data coming from the human brain, you reduce the noise in EEG/fMRI waveforms and zero in on specific behaviors, making it easier to diagnose mental illnesses and literally moving ergonomics and bioinformatics forward.
Conclusion
So, to look at the differences here, they mostly exist in the nature of the tools used and in the execution. Our emotional analysis is driven by convolutional neural networks, identifying more emotions more accurately across much larger data sets than text alone; the conclusions come from video, social interactions and many other data sources that are cross-referenced with text. This is completely different from sentiment analysis, which is the first generation use of machine learning to understand text, resulting in roughly 79% accuracy - our early tests are showing rates of 80-90% emotional accuracy.