corevalue.net
White Paper
Using Sentiment Analysis for
Gaining Actionable Insights
Sentiment analysis is a growing business trend that allows companies to better understand
their brand, products, and services by analyzing the attitudes, opinions, and emotions
expressed by an online audience.
Author
Olena Domanska
CoreValue Data Science Engineer
Using sentiment analysis for gaining actionable insights
Consumer opinions undoubtedly affect a company’s reputation and should be of high interest
to businesses, as they can prove to be extremely valuable assets. Actionable insights give
businesses an advantage over their competition and help them maintain a competitive edge
in the market.
Today it’s easy for consumers to loudly express their satisfaction and their frustration about
a company or a product through social media, forums, blogs, and review platforms which can
greatly impact public opinion.
Sentiment analysis allows businesses to analyze public opinion about a product or service
in order to unlock the hidden value contained within. This information, when used correctly,
enables them to make better-informed business decisions.
The notion of sentiment analysis
Sentiment analysis (also known as opinion mining) refers to the use of natural language
processing, text analysis and computational linguistics to identify and extract subjective
information found in source materials.
By harnessing the power of sentiment analysis and wrangling all the opinion-related
information it contains, businesses can extract tremendous value and use it to their
advantage. This data mining requires significant effort, however, as it involves various
product/service comparisons, subjectivity and probability defining, emotional components
classification, opinion reasoning, and summarizing. In layman’s terms, the sentiment analysis
engine lurks in social platforms, processes tons of unrestricted data, and derives actionable
insights that are directly related to business results.
Core techniques of sentiment analysis
There are two fundamental ways in which to approach sentiment analysis: supervised and
unsupervised (or lexicon-based).
The lexicon-based approach rests upon the assumption that the contextual sentiment
orientation of the text can be calculated by summing up the sentiment scores of each separate
word or phrase. Essentially, this technique relies on external lexical resources that are
concerned with mapping words to a categorical class (positive, negative, neutral) or numerical
sentiment score. As a result, its effectiveness strongly depends on the quality and adequacy
of the chosen resource.
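The summing idea described above can be sketched in a few lines of Python. This is a minimal illustration only: the six-word `TOY_LEXICON` and the helper names `tokenize`, `lexicon_score`, and `polarity` are hypothetical and stand in for a real lexical resource.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration, not a real resource):
# each word maps to a numeric sentiment score.
TOY_LEXICON = {
    "great": 1, "easy": 1, "amazing": 1,
    "slow": -1, "confusing": -1, "expensive": -1,
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens; words not in the lexicon count as 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))

def polarity(score):
    """Map a numeric score to a categorical class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review = "Great tool, easy to use, but the reports are slow."
score = lexicon_score(review)
print(score, polarity(score))
```

Because every word contributes independently, the result depends entirely on which words the lexicon covers and how it scores them.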
While the obvious advantage of the approach is avoiding the arduous step of labeling training
data, one must also be aware of its limitations. For example, a word associated with a
positive or negative sentiment may have the opposite orientation in a different application
domain, and a sentence containing sentiment words may not express any sentiment at all
(as in interrogative and conditional sentences). Sentences with a sarcastic tone often
invert the polarity of sentiment words, and many sentences without sentiment words
can also imply opinions.
Supervised techniques, on the other hand, work with the notion of training data. Specifically,
training samples and the corresponding output values are entered into the algorithm before
applying it to the actual data set. This enables the algorithm to handle new unknown data in
the future and provide more accurate sentiment classification in specific domains for which it
has been trained. The most common supervised learning methods are Naive Bayes
classification and Support Vector Machines (SVM), although researchers apply many others as
well, including maximum entropy, random forests, neural networks, and regression trees.
Recent work in the area suggests that supervised approaches tend to outperform unsupervised
ones. But is this really true? In this article, we will try to verify this assumption with real data.
Data Science Sentiment Analysis
Sentiment analysis in action
Theory aside, the real questions seem to be, “How effective is sentiment analysis in practice?”
and “Which approach is more accurate: supervised or unsupervised?” To figure this
out, we decided to analyze all the available reviews for HubSpot, one of the popular marketing
automation platforms, from a natural language processing perspective. The script for the
following analysis can be found on GitHub.
To perform our analysis, we began by closely examining the data we collected in order to
discover the most frequently used words, and we built associations between them to identify
clusters and themes within the reviews. We then examined how review topics changed over
time. Finally, we identified the sentiments of consumer opinions by applying unsupervised
and supervised methods in turn, comparing how each performed on our real data.
Exploratory Phase
Here is the sample of data we gathered:
"HubSpot is our main marketing platform. It's currently used to automate our marketing
programs, including email marketing, landing pages, social media and our blog. We also use
the tool to score leads, and automate our lead nurturing process. It's easy to measure the
success of our programs through the reporting. HubSpot is great for automating workflow
emails, creating new campaigns, landing page creation and compiling lists. They have a great
training program for gearing up with HubSpot..."
Each review was scored by reviewers on a scale from 1 to 10. Here is a distribution of these
scores:
As you can see, the distribution is strongly left-skewed with a distinct peak at the highest
value. It is interesting to see how algorithms perform on such unbalanced real data.
To start off, we determined the most frequently used words by building a word cloud with the
help of Wordle. Below, you can find the screenshot that illustrates the results.
After examining the word cloud, we concluded that people mainly discussed the features of
HubSpot’s inbound marketing platform and described them with the words “easy”, “great” and
“amazing”.
Even though word clouds show which words are most popular in reviews, they don’t let us
quantify how often each word occurs. To do this, we built a so-called “document-term
matrix” that shows which terms each review contains and how often they appear. Summing
the occurrences of each term across all reviews, we get the following histogram:
The most frequently used words in the review texts are “hubspot” (2531), “market”
(1418), “use” (1393), and “tool” (690 occurrences). But how are these words connected to
each other? To answer this, we built the following graph, illustrating the associations between
these words. The thicker the line connecting two words, the higher the probability of their
co-occurrence in a review:
We see that the word groups “hubspot”, “lead”, “tool”; “hubspot”, “can”, “help”; and “email”,
“content”, “page” usually appear together. At this point, the content of the reviews starts
to become clearer. Next, we thought it would be interesting to find out which directions of
discussion are hidden in the review texts.
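As a rough illustration of the document-term matrix and the co-occurrence measure used in this phase, here is a minimal Python sketch. The three-review corpus and the names `rows`, `dtm`, `term_totals`, and `cooccurrence` are hypothetical, and no stemming is applied, so the tokens are not reduced to roots such as “market”.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for the collected reviews.
reviews = [
    "HubSpot is a great marketing tool for lead tracking",
    "The email tool and landing page editor are easy to use",
    "HubSpot can help a marketing team score every lead",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())

# One Counter per review: these are the rows of the document-term matrix.
rows = [Counter(tokenize(review)) for review in reviews]
vocabulary = sorted(set().union(*rows))
dtm = [[row[term] for term in vocabulary] for row in rows]

# Column sums of the matrix: how often each term occurs across all reviews.
term_totals = Counter()
for row in rows:
    term_totals.update(row)

def cooccurrence(term_a, term_b):
    """Fraction of reviews that contain both terms."""
    both = sum(1 for row in rows if term_a in row and term_b in row)
    return both / len(rows)

print(term_totals.most_common(5))
print(cooccurrence("hubspot", "lead"))
```

The column totals are what a term-frequency histogram plots, and the pairwise fractions are one simple way to weight the edges of an association graph.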
Topics Identification
To identify topics, we grouped reviews into clusters using the hierarchical clustering
technique, the results of which are shown below (we first computed the agglomerative,
bottom-up clustering of reviews and then constructed the tree with only a few of the
uppermost clusters). It was necessary to pick a threshold level to form the groups, so we
decided on the simplest and most popular solution, which is to inspect the dendrogram.
Hierarchical clustering: the 30 uppermost clusters of reviews
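The bottom-up clustering and the threshold cut can be sketched as follows. The text does not state which distance measure or linkage was used, so cosine distance on raw term counts and average linkage are assumptions here, as are the four toy texts and the function names. The 0.4 threshold is only illustrative for this toy distance scale.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_distance(a, b):
    """1 - cosine similarity of two count vectors (1.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    dists = [cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def agglomerative(texts, threshold):
    """Merge the closest clusters until the smallest distance exceeds the threshold."""
    vectors = [vectorize(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > 1:
        dist, i, j = min(
            (average_linkage(clusters[i], clusters[j], vectors), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break  # cutting the tree at this level
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical toy reviews: two about content tools, two about lead generation.
texts = [
    "email page content blog",
    "email page content website",
    "lead sales marketing tool",
    "lead sales marketing customer",
]
print(agglomerative(texts, threshold=0.4))
```

Raising the threshold merges more clusters and lowering it leaves more of them separate, which is the trade-off that inspecting the dendrogram is meant to settle.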
In our case, a threshold at the 0.4 level seemed to be a reasonable choice. It revealed 3
clusters, depicted by black, green, and red boxes and containing 73, 12, and 15 percent of
reviews, respectively. In other words, 73 out of 100 reviewers discussed the same
topic. But what was that topic? We determined the topics of those clusters based
on probabilistic modeling of term frequency. Below are a few of the most frequently used
words for each topic:
Topic 1: "marketing, hubspot, tool, customer, lead, inbound, sale"
Topic 2: "hubspot, email, social, page, blog, content, website, manage"
Topic 3: "hubspot, time, can, make, help, need, get"
On the basis of the word vectors listed as topics, we inferred the theme of the reviews
within each cluster. For instance, the first group of reviews concentrates on the idea that
HubSpot is a leading marketing tool for increasing customer sales, while the second group
is devoted to discussing HubSpot tools such as the social media and blog post publisher,
landing page creator, content management, and website visitor tracking. Let’s look at how
the topics trend across the three-year time frame:
According to this plot, there were just a few reviews of HubSpot until the middle of 2013,
with topic 1 remaining prevalent for another year. The highest concentration of reviews
occurs at the beginning of 2015, followed by a slowdown that still continues.
We continued our sentiment analysis of the reviews using unsupervised and supervised
approaches, comparing their accuracy.
Sentiment Analysis
We began with the unsupervised (lexicon-based) approach, which estimates a review's
sentiment by counting the occurrences of "positive" and "negative" words listed in
Hu and Liu's "opinion lexicon". The lexicon categorizes around 6,800 words as positive or
negative and is available for download. Other useful resources for lexicon-based
sentiment analysis include the MPQA Subjectivity Lexicon, SentiWordNet, and SenticNet.
To assign a numeric score to each review, we simply subtract the number of negative words
from the number of positive words that occur. A new question arose: “Should we take into
account the length of the review?” Consider the following two reviews. The first is “fantastic”
(one word long, with a sentiment score equal to one), while the second review is several
pages long and expresses both positive and negative thoughts about the product, but its
total score is also equal to one. Obviously, we should rank these reviews differently. One
way to take this peculiarity into account is to normalize the score by the length of the
review. To check this rather than assume it, we calculated scores for HubSpot reviews in
both cases, with and without normalization, and evaluated the sum of squared errors (SSE).
As it turns out, normalization reduced the SSE by a factor of two (see the details on GitHub).
Based on these arguments, we structured our analysis in three steps: count the sentiment
score for each review, normalize the score by the review length, and map the obtained scores
to the interval [1,10] and round them, because every review has a rating score assigned by
reviewers in the range of 1 to 10. As a result, we obtained a 10-class classification
problem (NORMAL formulation). However, sentiment classification is usually formulated as a
two-class classification problem, “positive” versus “negative” (BINARY formulation), where a
review with a rating score from 1 to 4 is considered negative, while a review with a rating
score from 5 to 10 is considered positive. It is also possible to add a “neutral” class and
consider a three-class classification problem, in which a review with a rating score of 1 to 3
is considered negative, 4 to 6 neutral, and 7 to 10 positive (BASIC formulation). We obtained
the following distributions of the reviews’ classes for each of the formulated sentiment
classification problems:
After comparing the distribution of review scores assigned by reviewers with the
distributions of review scores obtained with the lexicon-based approach, we
concluded that the unsupervised (lexicon-based) technique performs well only in the case of
binary classification, where accuracy reaches 0.69. Given that the normal and basic
formulations reach accuracies of only 0.0065 and 0.19 respectively, our analysis
confirms earlier research showing that the accuracy of classification algorithms
depends significantly on the number of classes considered. Taking
into account that, according to a series of experiments with Mechanical Turk, humans only
agree 79% of the time, this algorithm gives a competitive result in the case of two classes. In
quantitative terms, 68.8% of people were classified as positive about the HubSpot
product.
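The three scoring steps and the three problem formulations can be sketched as below. The word lists, the three sample reviews, and the reviewer ratings are hypothetical. The text does not say how scores were mapped to [1,10], so the min-max rescaling over the batch in `to_rating` is an assumption, not the authors' method.

```python
import re

# Hypothetical toy word lists standing in for the positive/negative opinion lexicon.
POSITIVE = {"great", "easy", "amazing", "fantastic"}
NEGATIVE = {"slow", "confusing", "expensive", "buggy"}

def normalized_score(text):
    """Steps 1 and 2: (positive count - negative count) divided by review length."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

def to_rating(score, lo, hi):
    """Step 3 (assumed min-max rescaling): map [lo, hi] to [1, 10] and round."""
    if hi == lo:
        return 1
    return round(1 + 9 * (score - lo) / (hi - lo))

def binary_class(rating):
    """BINARY formulation: 1 to 4 negative, 5 to 10 positive."""
    return "negative" if rating <= 4 else "positive"

def basic_class(rating):
    """BASIC formulation: 1 to 3 negative, 4 to 6 neutral, 7 to 10 positive."""
    if rating <= 3:
        return "negative"
    return "neutral" if rating <= 6 else "positive"

def accuracy(predicted, actual):
    """Share of predictions that match the reviewer-assigned values."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

reviews = ["fantastic", "great tool but slow and confusing reports", "buggy and expensive"]
reviewer_ratings = [9, 6, 2]  # hypothetical ratings given by reviewers

scores = [normalized_score(r) for r in reviews]
predicted = [to_rating(s, min(scores), max(scores)) for s in scores]

print("NORMAL:", accuracy(predicted, reviewer_ratings))
print("BASIC: ", accuracy([basic_class(p) for p in predicted],
                          [basic_class(a) for a in reviewer_ratings]))
print("BINARY:", accuracy([binary_class(p) for p in predicted],
                          [binary_class(a) for a in reviewer_ratings]))
```

Coarser formulations forgive small errors in the predicted rating, which is one reason accuracy tends to rise as the number of classes falls.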
Now let’s consider the supervised approach. We start with a Naive Bayes classifier. The chosen
classifier applies Bayes’ theorem to predict the class of a given text using a number of
previously classified samples of the same type. We divide our reviews into two groups, train
and test, in order to evaluate the accuracy of the method on the test set. As in the case of the
lexicon-based approach, we consider the NORMAL, BASIC, and BINARY formulations. The accuracy
of the method turns out to be 0.63, 0.99, and 1 for the normal, basic, and binary formulations,
respectively. Compared to the unsupervised method, one of the simplest
supervised models, the Naive Bayes classifier, was able to achieve a recall accuracy of up to
100% on our unbalanced data.
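To make the idea concrete, here is a minimal multinomial Naive Bayes classifier written from scratch in Python. It is a sketch rather than the implementation used for the reported results: the `NaiveBayes` class, the add-one (Laplace) smoothing, and the four training reviews are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(class) + sum of log P(word | class), per Bayes' theorem
            logp = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for t in tokens:
                logp += math.log((self.word_counts[label][t] + 1)
                                 / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Hypothetical labeled training reviews.
train_texts = [
    "great tool easy to use",
    "amazing support and great reports",
    "slow and confusing interface",
    "expensive and buggy tool",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("easy and great"))
print(model.predict("slow buggy interface"))
```

Predictions on held-out reviews can then be compared with the reviewer-assigned classes to estimate accuracy on the test set.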
In the R landscape, the fantastic RTextTools package was developed by Timothy P. Jurka
and colleagues for automatic text classification. The package includes nine algorithms for
ensemble classification and is designed to conduct supervised learning in fewer than 10 steps.
For sentiment classification of HubSpot reviews, we chose the following five of the available
algorithms: support vector machine (SVM), maximum entropy (MAXENT), random forest (RF),
classification or regression tree (TREE), and neural networks (NNET), and applied them
to every formulation: NORMAL, BASIC, and BINARY. All of these methods showed almost
100% accuracy in the case of two or three classes and accuracy in the interval [0.59,
0.72] in the case of 10 numerical classes. The comparison of the obtained results can be
observed in the following plot:
Conclusions
In this article, we provided a thorough comparison of unsupervised and supervised
approaches to sentiment analysis using the example of HubSpot platform reviews. Specifically,
we used and evaluated the results from seven different models: the lexicon-based method,
the Naive Bayes classifier, support vector machine, maximum entropy, random forest,
classification tree, and neural network algorithms.
Our examination shows that when data is skewed, both lexicon-based (unsupervised) and
machine learning (supervised) techniques perform very well in terms of
accuracy in the case of binary classification. As expected, the machine learning methodologies
outperformed the lexicon-based method.
Overall, sentiment analysis proves to be a relatively simple and effective tool for extracting
valuable opinion-based information from source data. This creates the potential for further
growth of sentiment analysis by expanding its usage into new areas, much to the benefit of
businesses that harness the power of this method.
About CoreValue
CoreValue, a Software and Technology Services firm headquartered in New Jersey with
Development Labs in Eastern Europe, provides Mobility and traditional Cloud based CRM
implementation services, Mobile applications in Pharmaceutical, Medical, Media and
Telecommunication verticals. Customers trust CoreValue to provide Infrastructure services
utilizing premier staff in Data Science, Data Management, Database Services, Quality
Assurance and traditional development.
CoreValue Services
18 Overlook Ave, Suite 9
Rochelle Park, NJ 07662
908-312-4070