Sentiment Classification of Drug Reviews
Using a Rule-Based Linguistic Approach
Jin-Cheon Na, Wai Yan Min Kyaing, Christopher S.G. Khoo, Schubert Foo,
Yun-Ke Chang, and Yin-Leng Theng
Wee Kim Wee School of Communication and Information
Nanyang Technological University
31 Nanyang Link, Singapore 637718
{tjcna,kymwai,assgkhoo,assfoo,ykchang,tyltheng}@ntu.edu.sg
Abstract. A clause-level sentiment classification algorithm is developed and
applied to drug reviews on a discussion forum. The algorithm adopts a pure
linguistic approach of computing the sentiment of a clause from the prior
sentiment scores assigned to individual words, taking into consideration the
grammatical dependency structure of the clause using sentiment analysis
rules. MetaMap, a medical resource tool, is used to identify various disease
terms in the review documents to utilize domain knowledge for sentiment
classification. Experiment results with 1,000 clauses show the effectiveness of
the proposed approach, and it performed significantly better than baseline
machine learning approaches. Various challenging issues were identified
through error analysis, and we will continue improving our linguistic algorithm.
Keywords: Sentiment Classification, Drug Reviews, Rule-Based Linguistic
Approach.
1 Introduction
With the explosion of Web 2.0 platforms, there are enormous amounts of user-generated content, called social media. Therefore, for the past decade, many
researchers have been studying effective algorithms for sentiment analysis (or
sentiment classification) of user-generated content [7]. Sentiment analysis is a type of
subjectivity analysis which analyzes sentiment in a given textual unit with the
objective of understanding the sentiment polarities (i.e. positive, negative, or neutral)
of the opinions toward various aspects of a subject. It is still considered a very
challenging problem since user-generated content is expressed in varied and complex
ways in natural language. Digital libraries are about new ways of dealing with
knowledge, and researchers are considering the problem of organizing and searching
digital objects, not just by standard metadata fields but also by sentiment polarities [8].
For sentiment analysis, most researchers have worked on general domains (such
as electronic product, movie, and restaurant reviews), but not much on health and
medical domains. Previous studies have shown that this health-related user-generated
content is useful from different points of view. Firstly, users are often looking for
stories from “patients like them” on the Internet, which they cannot always find
among their friends and family [11]. Moreover, studies investigating the impact of
social media on patients have shown that for some diseases and health problems,
online community support can have a positive effect [6]. Because of its novelty as
well as quality and trustworthiness issues, user-generated content of social media in
health and medical domains is underexploited. Therefore, the objective of this paper
is to develop an effective method for sentiment analysis of social media content in
health and medical domains. The sentiment analysis is applied to drug reviews on a
discussion forum. In the following sections, related work is discussed first. Then our
proposed sentiment analysis method and its experiment results are presented and
discussed. Finally, the conclusion is provided.
2 Related Work
Researchers have used various approaches for sentiment analysis [7]. Most of the
early studies were focused on document-level analysis for assigning the sentiment
orientation of a document [9]. However, these document-level sentiment analysis
approaches are less effective when review texts are rather short, and in-depth
sentiment analysis of review texts is required. More recently, researchers have carried
out sentence-level sentiment analysis to examine and extract opinions toward various
aspects of a reviewed subject [4]. In contrast to most studies which focused on
document-level or sentence-level sentiment analysis, our approach uses clause-level
sentiment analysis so that different opinions on multiple aspects expressed in a
sentence can be processed separately in each clause. For instance, the sentence “I like
this drug, but it causes me some drowsiness” has two clauses expressing two aspects:
overall opinion and side effects. Some researchers have studied phrase-level
contextual sentiment analysis, but phrases are often not long enough to contain both
sentiment and feature terms together for detailed analysis [13].
Generally there are two main approaches for sentiment analysis: a machine
learning approach (or a statistical text mining approach) and a linguistic approach (or
a natural language processing approach). Since clauses are quite short and do not
contain many subjective words, the machine learning approach generally suffers from
the data sparseness problem. Also, the machine learning approach cannot handle complex
grammatical relations between words in a clause. Some researchers used various
linguistic features in addition to the bag-of-word (BOW) feature in the machine
learning approach to overcome the limitation of the BOW approach [14], which is
a popular trend in sentence-level sentiment analysis. In this study, we are using
a pure linguistic approach to overcome these weaknesses of the machine learning
approach. The main advantages of a pure linguistic approach are that we can define
sophisticated rules to handle various grammatical relations between words in a
sentence or clause, and new rules based on linguistics can be incrementally added to
the system. Our previous work performed sentiment analysis at the clause-level using
a linguistic approach and focused on movie reviews [12]. This study builds on that
previous work, improving the approach by adding rules for
handling more complex relations between words and adapting it to a new domain,
drug reviews.
3 Sentiment Classification Method
For clause-level sentiment classification of drug reviews, firstly, each sentence is
broken into independent clauses, and their review aspects are determined. In our
study, each clause indicates an independent clause which may include a dependent
clause. We are using six types of aspects related to drugs: overall opinion, effectiveness,
side effects, condition, cost, and dosage. Since automatic clause separation and
aspect detection are themselves challenging problems, clauses and their
aspects are separated and tagged by manual coders. Then, for each clause of the
review text, semantic annotation (e.g., tagging of disorder terms) is performed, and a prior
sentiment score is assigned to each word. Then, for each clause, the grammatical
dependencies are determined using a parser, and the contextual sentiment score is
calculated by traversing the dependency tree based on its clause structure.
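As an illustration of the units this pipeline operates on, the following minimal Python sketch (not the authors' implementation) shows how manually separated clauses and their aspect labels might be represented; the Clause structure and its field names are assumptions made for exposition.

```python
# A minimal sketch of clause records with the six aspect types listed above;
# the Clause structure and field names are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

ASPECTS = {"overall opinion", "effectiveness", "side effects",
           "condition", "cost", "dosage"}

@dataclass
class Clause:
    text: str                       # one independent clause (manually separated)
    aspect: str                     # one of ASPECTS, assigned by a manual coder
    polarity: Optional[str] = None  # gold label: "positive", "negative", or "neutral"

# The sentence "I like this drug, but it causes me some drowsiness"
# yields two clause records with different aspects.
clauses = [
    Clause("I like this drug", "overall opinion"),
    Clause("it causes me some drowsiness", "side effects"),
]
```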
In the first step of this study, we have created a general lexicon (9,630 terms) and a
domain lexicon (10 terms). For the general lexicon construction, firstly, we collected
positive and negative terms (7,611 terms) from the Subjectivity Lexicon (SL) [13]. We
set a prior score of +1 for strongly subjective positive terms and -1 for strongly
subjective negative terms. Also, we set a prior score of +0.5 for weakly subjective
positive terms and -0.5 for weakly subjective negative terms. In addition to SL, we
collected additional terms from SentiWordNet (SWN) [2] that do not occur in
SL. Then, to keep only common terms, we used a general language dictionary from the
12dict project (http://wordlist.sourceforge.net/) to filter out rare terms. After that,
because of the relatively low quality of SWN, three manual coders set prior scores for the
collected 2,019 terms, and conflicting cases were resolved using a heuristic
approach. In addition, we added a small number of domain-specific terms to the
domain lexicon during the development phase. To compensate for the small domain
lexicon, MetaMap [1] is used to tag disorder terms, such as "pain" and "hair loss",
using the Disorders semantic group [3], and such terms are assigned a sentiment score of -1.
The Disorders semantic group contains a list of UMLS (Unified Medical Language System)
semantic types related to disorder terms, such as "Disease or Syndrome" and "Injury
or Poisoning". However, we excluded the "Finding" semantic type from it to reduce
false positive disorder terms.
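A minimal sketch of the prior-score assignment just described, assuming the lexicons have already been loaded as dictionaries and disorder spans have been tagged beforehand (e.g., by MetaMap); the lookup order and the toy lexicon entries are illustrative assumptions, not the authors' code.

```python
# Illustrative prior-score lookup; lexicon contents here are toy values.
GENERAL_LEXICON = {"like": 1.0, "problems": -0.5, "great": 1.0}   # from SL/SWN
DOMAIN_LEXICON = {}        # small set of manually added domain-specific terms

def prior_score(word, disorder_terms=frozenset()):
    """Domain lexicon overrides the general lexicon; disorder terms get -1.0."""
    w = word.lower()
    if w in DOMAIN_LEXICON:                 # highest priority (overrides the general lexicon)
        return DOMAIN_LEXICON[w]
    if w in disorder_terms:                 # tagged via the UMLS Disorders semantic group
        return -1.0
    return GENERAL_LEXICON.get(w, 0.0)      # unknown words are treated as neutral

print(prior_score("pain", disorder_terms={"pain"}))   # -1.0
print(prior_score("like"))                             # 1.0
```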
We have used the Stanford NLP library [5] to process the grammatical relations of
words in a clause. There are 55 Stanford typed dependencies (i.e. grammatical
relations). The Stanford typed dependencies are binary grammatical relations between
two words: a governor and a dependent. For example, the sentence “I like the drug”
has the following dependencies among the words: nsubj[like-2, I-1], det[drug-4, the-3], and dobj[like-2, drug-4]. In the typed dependencies, "nsubj[like-2, I-1]" indicates
that the first word “I” is a nominal subject of the second word “like”. In the
dependency relation, “I” is the dependent (or modifier) word and “like” is the
governor (or head) word. “dobj[like-2, drug-4]” indicates that the fourth word
“drug” is the direct object of the governor “like”. “det[drug-4, the-3]” indicates that
“the” in the third position is a determiner of “drug”.
To calculate a contextual sentiment score of a clause, we have defined general
sentiment analysis rules that utilize the grammatical relations, part-of-speech tags, and
prior sentiment scores of terms in the clause. Table 1 shows a summary of general
rules with example phrases or clauses.
Table 1. A summary of general rules with example phrases or clauses

Phrase Rules
- Adjectival Phrase: the relation between an adverb and an adjective defined by the Adverbial Modifier relation, advmod(adjective, adverb). Examples: "enthusiastically responsive" (+); "extremely disappointed" (-).
- Verb Phrase: the relation between a verb and an adverb defined by the Adverbial Modifier relation, advmod(verb, adverb). Examples: "cheer happily" (+); "fail badly" (-).
- Noun Phrase: the relation between an adjective and a noun phrase defined by the Adjectival Modifier relation, amod(noun, adjective). Examples: "great drug" (+); "big failure" (-).

Conjunct
- Conjunct: the relation between two elements connected by a coordinating conjunction ("and", "or", "but"); handles the conj_and(), conj_or(), and conj_but() relations. Examples: "He is good and honest." (+); "He is bad and dishonest." (-).

Predicate
- Predicate: the relation between a verb phrase and an object / a complement defined by the Direct Object, Indirect Object, and Adjectival Complement relations: dobj(), iobj(), and acomp(). Examples: "provide goodness" (+); "provide problems" (-).
- Clausal Complement Relation: the relation between a verb / an adjective and a clausal complement defined by the Open Clausal Complement relation, xcomp(). Examples: "I love to use this great drug again." (+); "I will advise to throw away the drug." (-).

Clause
- Clause: the relation between a subject and a predicate defined by the Nominal Subject, Clausal Subject, Passive Nominal Subject, and Clausal Passive Subject relations: nsubj(), csubj(), nsubjpass(), and csubjpass(). Examples: "The drug did great." (+); "It performed poorly." (-).

Clause Connectors
- Clause Connector: the relation between two clauses in a sentence defined by the Adverbial Clause Modifier, Clausal Complement, and Purpose Clause Modifier relations: advcl(), ccomp(), and purpcl(). Examples: "He says that the drug works well." (+); "He misuses drugs in order to sleep well." (-).

Negation of Term
- Negation Term: the relation between a negation word (such as not, never, or none) and the word it modifies, defined by the Negation Modifier relation, neg(). Examples: "not lousy" (+); "not good" (-).

Polarity Shifter
- Adjectival Polarity Shifter: the relation between an adjective and a polarity shifter adverb defined by the Adverbial Modifier relation, advmod(adjective, adverb). Examples: "hardly bad" (+); "hardly good" (-).
- Verb Phrase Polarity Shifter: the relation between a verb and a polarity shifter adverb defined by the Adverbial Modifier relation, advmod(verb, adverb). Examples: "hardly fail" (+); "rarely succeed" (-).
- Predicate Polarity Shifter: the relation between a polarity shifter verb and an object / a complement defined by the Direct Object, Adjectival Complement, and Open Clausal Complement relations: dobj(), acomp(), and xcomp(). Examples: "ceased boring" (+); "stopped interesting" (-).

Default Rules
- Default Rules: applied to any undefined relations. Examples: "my tolerance" (+); "the drug's flaws" (-).
Phrase Rules. Phrase rules are used to calculate contextual sentiment scores of phrases,
which can be a subject, an object, or a verb phrase in a clause. Adjectival Phrase rules
handle the relation between an adverb and an adjective defined by the Adverbial
Modifier relation, advmod(adjective, adverb). When the inputs (i.e., an adverb and an
adjective) are of the same sentiment orientation, they tend to intensify each other. The
absolute value of the output should be larger than or equal to the absolute values of the
inputs but not larger than 1. Therefore, the formula +(|Adjective| + (1 - |Adjective|) *
|Adverb|) is applied when the adverb and adjective are both positive. For example, the
adverb "Enthusiastically" in the phrase "Enthusiastically Responsive" intensifies the
positive adjective "Responsive". The prior sentiment scores of the adjective
"Responsive" and the adverb "Enthusiastically" are +0.5 and +1.0, respectively. The
output sentiment score of the phrase "Enthusiastically Responsive" is calculated as
+(0.5 + (1 - 0.5) * 1.0) = +1.0. Similarly, the adverb "Strictly" in the phrase "Strictly
Prohibitive" intensifies the negative adjective "Prohibitive". When the adverb is
positive and the adjective is negative, the adverb also intensifies the adjective (e.g.,
"Extremely Disappointed", -1.0). The formula -(|Adjective| + (1 - |Adjective|) *
|Adverb|) is used when (1) the adverb and adjective are both negative and (2) the
adverb is positive and the adjective is negative. When the adverb is negative and the adjective
is positive, the output is the value of the negative adverb (e.g., "Glaringly Stared", -1.0).
For Verb Phrase and Noun Phrase rules, similar formulas are used.
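A minimal sketch of the Adjectival Phrase combination just described, based on our reading of the formulas above (not the authors' code); the handling of neutral inputs is an assumption.

```python
# Sketch of the Adjectival Phrase rule for advmod(adjective, adverb).
def adjectival_phrase_score(adjective, adverb):
    a, b = abs(adjective), abs(adverb)
    if adjective > 0 and adverb > 0:
        return +(a + (1 - a) * b)      # both positive: intensify toward +1
    if adjective < 0 and adverb != 0:
        return -(a + (1 - a) * b)      # negative adjective: intensify toward -1
    if adjective > 0 and adverb < 0:
        return adverb                  # negative adverb wins over a positive adjective
    return adjective                   # neutral adverb: assumed to keep the adjective score

print(adjectival_phrase_score(+0.5, +1.0))   # "enthusiastically responsive" -> +1.0
print(adjectival_phrase_score(-1.0, +1.0))   # "extremely disappointed"      -> -1.0
```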
Predicate Rules. The predicate comprises all the syntactic components of a clause except its
subject. Predicate rules handle the relation between a verb phrase and an object / a
complement, and their formulas are similar to those of the Adjectival Phrase rules.
Clausal Complement Relation rules handle the relation between a verb / an adjective
and a clausal complement. For instance, for a complex clause with a "to" dependency,
the sentiment score of the “to” clause is intensified when the governor term is
positive, and the sentiment score is negated when the governor term is negative.
Clause Rules. Clause rules handle the relation between a subject and a predicate. As
before, when the inputs are of the same sentiment orientation, they intensify each other.
When the subject is positive and the predicate is negative, the output is the value of the
negative predicate. However, when the subject is negative and the predicate is positive,
the output can be either positive or negative. Therefore, the values of the subject and
the predicate are compared, and if the absolute value of the subject is larger than that
of the predicate, the output becomes the sentiment score of the subject, and vice versa.
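The Clause rule can be sketched as follows; the intensification formula for same-orientation inputs and the treatment of neutral inputs are assumptions carried over from the phrase rules, while the mixed-orientation cases follow the description above.

```python
# Sketch of the Clause rule combining subject and predicate scores (our reading).
def clause_score(subject, predicate):
    if subject == 0:
        return predicate               # neutral subject keeps the predicate score
    if predicate == 0:
        return subject                 # assumption for a neutral predicate
    if (subject > 0) == (predicate > 0):
        sign = 1 if subject > 0 else -1
        s, p = abs(subject), abs(predicate)
        return sign * (s + (1 - s) * p)   # same orientation: intensify
    if subject > 0 and predicate < 0:
        return predicate               # negative predicate dominates
    # negative subject, positive predicate: the stronger absolute score wins
    return subject if abs(subject) > abs(predicate) else predicate

print(clause_score(0.0, +0.5))   # "it completely eliminated stomach problems" -> +0.5
```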
Polarity Shifter Rules. Some negation words are not detected by the Negation
Modifier relation. Therefore, such negation terms, called polarity shifters, should be handled
in other relations. For instance, Adjectival Polarity Shifter rules handle the relation
between an adjective and a polarity shifter adverb in the Adverbial Modifier relation.
In this relation, the polarity shifter adverb, such as "hardly", shifts the original
sentiment orientation of the adjective.
Default Rules. Since the existing rules cannot comprehensively cover all the relations
of words in clauses, Default rules are applied to unmatched phrases. The formulas are
generalized since the output sentiment orientation can vary in such situations. When
both terms are either positive or negative, they intensify each other and the output
maintains their original sentiment orientation. However, when their sentiment
orientations are different, the term with a greater sentiment score is used as the output.
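A corresponding sketch of the Default rule, again based on our reading of the description above; interpreting "greater sentiment score" as the greater absolute value is an assumption.

```python
# Sketch of the Default rule applied to relations not covered by specific rules.
def default_rule_score(governor, dependent):
    if (governor >= 0) == (dependent >= 0):
        sign = -1 if governor < 0 else 1
        g, d = abs(governor), abs(dependent)
        return sign * (g + (1 - g) * d)    # same orientation: intensify, keep the sign
    # different orientations: the term with the greater absolute score wins (assumption)
    return governor if abs(governor) > abs(dependent) else dependent

# "stomach problems": nn(problems:-0.5, stomach:0) -> -0.5, as in the worked example below
print(default_rule_score(-0.5, 0.0))
```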
Additional special rules for handling more complex relations between words are
also defined.
Intensify, Mitigate, Maximize, and Minimize Rules. These special rules are defined
using additional modifier terms including intensifiers (40 terms), mitigators (44
terms), maximizers (43 terms), and minimizers (12 terms). These modifier terms,
collected mainly from [10], are checked before the related relations are applied. For
instance, in advmod(adjective, adverb), the adverb is checked for firing the special
rules. In the Intensify rule, the polarity score of the modified word is doubled, but
limited to a value of ±1 (e.g., "enormously good", +1.0). If the modified word is
neutral, the score becomes +0.5. Conversely, the polarity score of the modified word is
halved in the Mitigate rule (e.g., “slightly better”, +0.5). The polarity score is
maximized to ±1 in the Maximize rule to denote the upper extreme of the intensity
scale (e.g., “totally bad”, -1). In the Minimize rule, the polarity score of the modified
word is severely reduced to account for the lower extreme of the intensity scale (e.g.,
“minimal passion”, 0.25).
Decrease Disorder Rule. This rule is defined to handle “decrease-type verb + disorder
object” cases, such as “It reduces the pain”. In the Direct Object relation, the
decrease-type verb shifts the original sentiment orientation of a disorder term. Since
the disorder term is set to -1.0, the result becomes +1.0. If the general Predicate rules
were applied, the output of the clause would be -1, which is wrong. We have collected
23 decrease-type verbs, such as "reduce", "decrease", and "lessen". Since sentiment
analysis is domain specific, more rules of this kind using medical domain knowledge will
be added in future work.
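A sketch of the Decrease Disorder rule under these assumptions (the verb list shown contains only the three examples named above, not the full list of 23):

```python
# Sketch of the Decrease Disorder rule for "decrease-type verb + disorder object".
DECREASE_VERBS = {"reduce", "decrease", "lessen"}   # partial, illustrative list

def decrease_disorder_score(verb_lemma, disorder_score):
    """Flip the negative score of a disorder term governed by a decrease-type verb."""
    if verb_lemma in DECREASE_VERBS and disorder_score < 0:
        return -disorder_score
    return disorder_score

# "It reduces the pain": dobj(reduces, pain), with "pain" tagged as a disorder (-1.0)
print(decrease_disorder_score("reduce", -1.0))   # +1.0
```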
Positive and Negative Valence Rules. A positive or negative valence term can
determine the overall sentiment polarity of a whole clause or sentence no matter how it is
modified by other terms. These rules check whether the governor word in any relation
matches a positive or negative valence term to trigger the rule. We are using 13
positive valence terms (e.g., "solve") and 8 negative valence terms (e.g., "hate") that
are all verbs. If a positive or negative valence term is found in a clause, its prior
sentiment score becomes the output sentiment score of the clause. If the prior
sentiment score of the valence term is neutral, the output value becomes +0.5 for a positive
valence term and -0.5 for a negative valence term.
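A sketch of this valence override, using only the two example verbs named above (the full lists contain 13 and 8 terms):

```python
# Sketch of the Positive and Negative Valence rules (illustrative term lists).
POSITIVE_VALENCE = {"solve"}
NEGATIVE_VALENCE = {"hate"}

def valence_override(verb_lemma, prior_score):
    """If a valence verb appears, its prior score decides the whole clause."""
    if verb_lemma in POSITIVE_VALENCE:
        return prior_score if prior_score != 0 else +0.5
    if verb_lemma in NEGATIVE_VALENCE:
        return prior_score if prior_score != 0 else -0.5
    return None      # no valence term: fall through to the ordinary rules

print(valence_override("hate", -1.0))   # -1.0, however "hate" is modified
```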
Phrasal Verb Rules. A phrasal verb, such as “back off” and “break up”, is identified
by using the Phrasal Verb Particle relation. In the case of "back off", which is negative,
if "back" and "off" were handled by the Default rules, the result would become positive.
Currently we are using 121 manually collected phrasal verbs.
Contradicting Connectors Rule. Using this rule, dependent clauses having
contradicting connectors, such as although, though, however, and but, are ignored, and
only their main clauses are considered to calculate contextual sentiment scores. For
instance, in the sentence “Although the drug worked well, it gave me a headache”, the
dependent clause “Although the drug worked well” is ignored or neutralized.
Question Rule. This rule is used to detect question sentences or clauses using the
question mark "?". It sets the whole sentence or clause to the neutral value 0.
We use a bottom-up approach in evaluating the rules in the dependency tree, and
leaf nodes are evaluated first and the resultant polarity scores are propagated to upper-level nodes for further evaluation. Since a node in the dependency tree can have
several grammatical relations with its direct child nodes, we defined rule
processing priorities among the relations. Generally, these priorities allow the system
to process phrases (i.e. small components) first in a clause, and use the calculated
phrase scores to process a predicate, and finally the clause score is calculated using
the scores of the subject and the predicate. Now we will see how the system processes
the sentence “it completely eliminated stomach problems” with actual rules. The
dependency tree for the sentence is shown in Figure 1, in which words in the sentence
are nodes and grammatical relations are edge labels. Firstly, the contextual sentiment
score of the object "stomach problems" [nn(problems:-0.5, stomach:0)] at the bottom
level is calculated using the Default rules. The prior sentiment scores of the
nouns "stomach" and "problems" are 0 and -0.5, respectively. Thus, the contextual
sentiment score of the noun phrase "stomach problems" is calculated as -0.5.
Subsequently, the root node "eliminated" is processed with its three child nodes.
Firstly, advmod(eliminated, completely) is processed since it has a higher priority
than the other two relations. The contextual sentiment score of the verb phrase
“completely eliminated” [advmod(eliminated:0, completely:0)] is calculated by using
the Maximize rule since “completely” is a maximize term. The contextual score of the
verb phrase is calculated as +0.5 (the Maximize rule converts the neutral verb term to
+0.5). For the predicate “completely eliminated stomach problems” defined by
dobj(eliminated, problems), the sentiment score is calculated using Predicate Polarity
Shifter rules since “eliminated” is a polarity shifter term. The sentiment scores of the
verb phrase and the object are +0.5 and -0.5, respectively. Thus, it is calculated as (-1.0 * -0.5), which is equal to +0.5 (the polarity shifter verb shifts the original sentiment
orientation of the object). Finally, the score for the clause “it completely eliminated
stomach problems” defined by nsubj(eliminated, it) is calculated by using Clause
rules. Since the prior score of the subject “It” is 0, the final score of the clause
remains +0.5. If advmod(eliminated, completely) were processed right after
dobj(eliminated, problems), we could have the final score of +1. Therefore, we are
still exploring the best rule processing priorities among the relations.
Fig. 1. The dependency tree for the sentence: It completely eliminated stomach problems
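The worked example can be mirrored in a short sketch of the bottom-up traversal with relation priorities; the node and relation representation, the priority table, and the greatly simplified rule dispatcher are all illustrative assumptions, not the authors' implementation.

```python
# Illustrative bottom-up evaluation of a dependency tree with rule priorities,
# mirroring "it completely eliminated stomach problems" from Figure 1.
RELATION_PRIORITY = {"nn": 0, "advmod": 1, "dobj": 2, "nsubj": 3}   # lower = earlier

def score_node(node, priors, children):
    """children: node -> list of (relation, child); priors: prior score per node."""
    score = priors[node]
    for relation, child in sorted(children.get(node, []),
                                  key=lambda rc: RELATION_PRIORITY.get(rc[0], 99)):
        child_score = score_node(child, priors, children)
        score = apply_rule(relation, score, child_score)   # dispatch to a rule
    return score

def apply_rule(relation, governor_score, dependent_score):
    # Greatly simplified stand-in for the full rule set described in this section.
    if relation == "advmod" and governor_score == 0:
        return 0.5                                   # Maximize on a neutral verb
    if relation == "dobj":
        return -1.0 * dependent_score                # Predicate Polarity Shifter
    if relation == "nsubj" and dependent_score == 0:
        return governor_score                        # neutral subject keeps the score
    return governor_score if abs(governor_score) >= abs(dependent_score) else dependent_score

children = {"eliminated": [("advmod", "completely"), ("dobj", "problems"), ("nsubj", "it")],
            "problems": [("nn", "stomach")]}
priors = {"eliminated": 0.0, "completely": 0.0, "problems": -0.5, "stomach": 0.0, "it": 0.0}
print(score_node("eliminated", priors, children))   # +0.5
```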
4 Experiment Results and Discussion
Drug review sentences were collected from the drug review website DrugLib.com
(www.druglib.com) to evaluate the developed algorithm. Firstly, for the algorithm
development, we prepared the development dataset. Two manual coders worked on
the same 200 sentences, separated them into clauses, and tagged each clause as
positive, negative, or neutral with a corresponding aspect. The same 239 clauses
were identified by the two coders; the agreement rate between them was 79%, and
the Cohen's Kappa value was 0.64, which is considered substantial agreement. Then, for
the evaluation dataset, the two coders tagged a total of 1,200 sentences (one coder
tagged 400 sentences, and the other one tagged 800), and we randomly selected a
dataset of 1,000 clauses (500 positive and 500 negative) for sentiment classification.
In addition to our linguistic approach, we evaluated a machine learning approach in
order to provide a benchmark for comparison with our approach. For the machine
learning approach, we used the widely used machine learning algorithm, SVM
(Support Vector Machine). In the first approach (SVM-1), we used BOW and
negation document features for sentiment classification. For the BOW, term
frequency is used for each term. For negation handling, if negation terms, such as
"neither", "never", "no", "non", "nothing", "not", "n't", and "none", occur an odd
number of times in a clause, the negation feature becomes 1; otherwise it becomes 0.
In the second approach (SVM-2), we added an additional linguistic feature, functional
dependencies, to consider grammatical relations between words and overcome the data
sparseness problem. For each of the 55 Stanford typed
dependencies, we used 4 document features as follows: TD(+,+), TD(+,-), TD(-,+),
and TD(-,-). The governor and dependent terms in typed dependencies are converted to
+ or – using the general and domain lexicons (note that + includes neutral) to utilize
prior scores of subjective terms. For instance, for “amod(drug, great)”, we have the
following 4 type dependency features: amod(+,+): 1, amod(+,-): 0, amod(-,+): 0, and
amod(-,-): 0. Even though there are various approaches to using linguistic features as
document features [14], we believe that our approach is competitive enough as
a benchmark for comparison with our linguistic approach. In future work, we plan
to explore various linguistic features used in existing works for further in-depth
comparison with our linguistic approach. Table 2 shows precision, recall, F-score, and
accuracy of the two baseline machine learning approaches and our linguistic
approach. We conducted 10-fold cross validation, and precision, recall, and accuracy
are calculated using the following formulas:
Precision = (number of correctly classified positive (or negative) clauses) / (number of automatically classified positive (or negative) clauses)   (1)

Recall = (number of correctly classified positive (or negative) clauses) / (number of relevant positive (or negative) clauses)   (2)

F-score = 2 × (Precision × Recall) / (Precision + Recall)   (3)

Accuracy = (number of correctly classified positive and negative clauses) / (number of relevant positive and negative clauses)   (4)
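A direct transcription of formulas (1)-(4) into code, using hypothetical toy counts rather than the paper's actual figures:

```python
# Evaluation metrics (1)-(4), computed per polarity class from raw counts.
def precision(correct, classified):
    return correct / classified if classified else 0.0

def recall(correct, relevant):
    return correct / relevant if relevant else 0.0

def f_score(p, r):
    return 2 * p * r / (p + r) if (p + r) else 0.0

def accuracy(correct_pos_and_neg, relevant_pos_and_neg):
    return correct_pos_and_neg / relevant_pos_and_neg

# Hypothetical counts for the positive class of a balanced 1,000-clause set:
p = precision(correct=390, classified=500)
r = recall(correct=390, relevant=500)
print(round(p, 2), round(r, 2), round(f_score(p, r), 2), accuracy(780, 1000))
```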
Table 2. Precision, recall, F-score, and accuracy of sentiment classification methods

SVM-1 (term features: BOW, Negation)
- Positive: Precision 0.72, Recall 0.64, F-Score 0.66
- Negative: Precision 0.69, Recall 0.75, F-Score 0.71
- Accuracy: 0.69 (694/1000)

SVM-2 (term features: BOW, Negation, Type Dependencies)
- Positive: Precision 0.75, Recall 0.70, F-Score 0.71
- Negative: Precision 0.73, Recall 0.76, F-Score 0.74
- Accuracy: 0.73 (731/1000)

Linguistic Approach (no SVM term features)
- Positive: Precision 0.78, Recall 0.80, F-Score 0.79
- Negative: Precision 0.80, Recall 0.76, F-Score 0.78
- Accuracy: 0.78 (784/1000)
As shown in Table 2, the accuracy of the SVM-2 approach is significantly better than that of SVM-1 (two-sided t-test, p <= 0.05) since the additional linguistic feature helps sentiment
classification. Also, our linguistic approach performed significantly better than both
baselines (two-sided t-test, p <= 0.05). Table 3 shows precision, recall, F-score, and
accuracy of our linguistic approach for the six aspects. Accuracy of Overall clauses is
the highest (83%), and accuracy of Dosage clauses is the lowest (64%).
Table 3. Precision, recall, F-score, and accuracy of aspect-level sentiment classification using a
linguistic approach

Overall
- Positive: Precision 0.86, Recall 0.90, F-Score 0.89
- Negative: Precision 0.76, Recall 0.68, F-Score 0.71
- Accuracy: 0.83 (192/230)

Effectiveness
- Positive: Precision 0.88, Recall 0.79, F-Score 0.83
- Negative: Precision 0.58, Recall 0.76, F-Score 0.65
- Accuracy: 0.78 (289/371)

Side Effects
- Positive: Precision 0.38, Recall 0.61, F-Score 0.46
- Negative: Precision 0.92, Recall 0.81, F-Score 0.86
- Accuracy: 0.78 (223/287)

Condition
- Positive: Precision 0.88, Recall 0.94, F-Score 0.88
- Negative: Precision 0.97, Recall 0.77, F-Score 0.84
- Accuracy: 0.80 (39/49)

Dosage
- Positive: Precision 0.63, Recall 0.67, F-Score 0.64
- Negative: Precision 0.78, Recall 0.73, F-Score 0.71
- Accuracy: 0.64 (39/61)

Cost
- Positive: Precision 1, Recall 1, F-Score 1
- Negative: Precision N.A., Recall N.A., F-Score N.A.
- Accuracy: 1.00 (2/2)
The 216 wrongly classified clauses were analyzed, and the sources of the errors were categorized
into six groups based on their nature. Inference Problem Error (35%, 75 clauses) is a
major source of errors because users often share their experiences without using any
subjective or medical terms. For the example clause "it felt like I was wearing a hat", it is a
challenging task for the system to understand the true meaning of the clause.
System Error (27%, 58 clauses) is another major source of errors. Inaccurate parsing
results from the Stanford parser caused the system to trigger irrelevant rules. For instance, in
the subjectless clause "ruined relationships tremendously", the parser could not
correctly identify the Direct Object relation between the verb "ruined" and the noun
"relationships", and that led the system to make a wrong prediction. An incomplete
set of sentiment analysis rules is another concern. The current rules should be
extended to handle various complex expressions in sentences. Negation handling is
another challenging issue since a negation term is meant to negate a specific component in
a clause, such as the verb, object, complement, or following clause. We noticed in many
cases that the system negated a wrong component in the dependency tree of a clause, and
it caused errors. Context Error (17%, 37 clauses) occurs since it is hard to determine the
polarity of individual clauses without knowing the whole context. For example, the
clause "I lost 3 lbs" is positive for users taking slimming pills but negative for cancer
patients. Lexicon Error (9%, 20 clauses) occurs since our system relies strongly
on lexicons. For example, the system prediction for the clause "that works too" is
negative because the word “too” is a negative word in the general lexicon. To solve this
problem, we plan to add a corrected prior score of the term “too” to the domain lexicon
which has a higher priority than the general lexicon when a prior sentiment score is
assigned to each word in clauses. User Text Error (6%, 13 clauses), caused by misspellings
and grammatical mistakes, introduced difficulties for grammatical parsing, prior score
assignment, and disorder term detection. MetaMap Error (6%, 13 clauses) occurs since
MetaMap cannot detect certain disorder terms correctly. To handle the problem, we plan
to prepare our own disorder term list to compensate for MetaMap errors.
5 Conclusion
With the rapid growth of user-generated content on the Internet, sentiment analysis is
becoming important in digital libraries. We have applied sentiment analysis to health
and medical domains, particularly focusing on public opinions on drugs with various
aspects. Experiment results showed the effectiveness of our proposed linguistic
approach, and it performed significantly better than the baseline machine learning
approaches. Various challenging issues were identified through error analysis, and
we plan to continue improving our linguistic algorithm.
References
[1] Aronson, A.R., Lang, F.M.: An Overview of MetaMap: Historical Perspective and
Recent Advances. Journal of the American Medical Informatics Association 17, 229–236
(2010)
[2] Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0: An Enhanced Lexical
Resource for Sentiment Analysis and Opinion Mining. In: The Seventh International
Conference on Language Resources and Evaluation (LREC 2010), pp. 2200–2204 (2010)
[3] Bodenreider, O., McCray, A.T.: Exploring semantic groups through visual approaches.
Journal of Biomedical Informatics 36, 414–432 (2003)
[4] Ding, X., Liu, B., Yu, P.S.: A holistic lexicon-based approach to opinion mining. In: The
International Conference on Web Search and Web Data Mining, pp. 231–240 (2008)
[5] de Marneffe, M.-C., MacCartney, B., Manning, C.D.: Generating typed dependency
parses from phrase structure parses. In: The 5th International Conference on Language
Resources and Evaluation (2006)
[6] Jaloba, A.: The club no one wants to join: Online behaviour on a breast cancer discussion
forum. First Monday 14(7) (2009)
[7] Liu, B.: Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers (2012)
[8] Na, J.-C., Thet, T.T., Khoo, C., Kyaing, W.Y.M.: Visual Sentiment Summarization of
Movie Reviews. In: International Conference on Asian Digital Libraries, ICADL 2011,
Beijing, China, pp. 277–287 (2011)
[9] Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using
machine-learning techniques. In: The Conference on Empirical Methods in Natural
Language Processing, pp. 79–86 (2002)
[10] Quirk, R., Greenbaum, S., Leech, G., Svartvik, J.: A Comprehensive Grammar of the
English Language. Longman (1985)
[11] Sarasohn-Kahn, J.: The Wisdom of Patients: Health Care Meets Online Social Media.
California Healthcare Foundation, Oakland (2008)
[12] Thet, T.T., Na, J.-C., Khoo, C.: Aspect-Based Sentiment Analysis of Movie Reviews on
Discussion Boards. Journal of Information Science 36(6), 823–848 (2010)
[13] Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level
sentiment analysis. In: The Conference on Human Language Technology and Empirical
Methods in Natural Language Processing, pp. 347–354 (2005)
[14] Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing Contextual Polarity: An Exploration of
Features for Phrase-Level Sentiment Analysis. Computational Linguistics 35(3), 399–
433 (2009)