Sentiment Detection
Rik Sarkar (03305048)
Kedar Godbole (03305805)

Outline
- Sentiment detection: the problem statement
- Difficulties in sentiment detection
- Approaches to sentiment detection
- Conclusion
- Project proposal

Problem Statement
- Detect the polarity about a particular topic in a document
- Polarity:
  - Positive
  - Negative
  - Mixed
  - Neutral

Motivation
- Reviews on the Web
- Opinions about a product
- Opinions about the individual aspects of a product
- Movie/book reviews
- Feedback/evaluation forms

Issues
- Reference to multiple objects in the same document
  - The NR70 is trendy. T-Series is fast becoming obsolete.
- Dependence on the context of the document
  - “Unpredictable” plot; “Unpredictable” performance
- Negations have to be captured
  - Monochrome display is not what the user wants

Issues (contd.)
- Metaphors/Similes
  - The metallic body is solid as a rock
- Part-of and Attribute-of relationships
  - The small keypad is inconvenient
- Absence of a polar word
  - How can someone sit through this seminar?

Approaches to Sentiment Detection
- Based on pre-selected sets of words
- Naive Bayes
- Support Vector Machines
- Unsupervised learning
- Enhancement by NLP

An Unsupervised Learning Technique
- Extract phrases from the review based on patterns of POS tags
  - JJ: Adjective
  - RB: Adverb
  - NN: Noun

  First word    Second word
  JJ            NN
  RB            JJ
  JJ            JJ
  NN            JJ

Unsupervised Learning
- Pointwise Mutual Information (PMI) and Semantic Orientation (SO)

$$\mathrm{PMI}(word_1, word_2) = \log \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}$$

$$\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{``excellent''}) - \mathrm{PMI}(phrase, \text{``poor''})$$

Unsupervised Learning
- Determine the Semantic Orientation (SO) of the phrases
- Search on AltaVista

$$\mathrm{SO}(phrase) = \log \frac{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{``excellent''})\, \mathrm{hits}(\text{``poor''})}{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{``poor''})\, \mathrm{hits}(\text{``excellent''})}$$

Unsupervised Learning
- Calculate average semantic orientation of document:

  Extracted phrase          POS tags    Semantic Orientation
  Low fees                  JJ NN        0.333
  Online service            JJ NN        2.780
  Inconveniently located    RB VB       -1.541

  Average Semantic Orientation = 0.524

Need for NLP
- Identifying phrases is not enough; need to identify subject/object
  - The NR70 is trendy. T-Series is fast becoming obsolete.
- Need to identify part-of and attribute-of relationship
  - The battery is long-lasting

Focus of the sentiment
- Feature/attribute terms:
  - BNP (Base Noun Phrases): battery, display, keypad
  - dBNP (Definite Base Noun Phrases): “the display”
  - bBNP (Beginning Definite Base Noun Phrases): “The battery is long-lasting”

Sentiment Analyzer
- Sentiment lexicon database
  - <lexical_entry> <POS> <sent_category>
  - “excellent” JJ +
- Sentiment pattern database
  - <predicate> <sent_category> <target>
  - “I am impressed with the flash capabilities”
  - impress + PP(by;with) target

SA (contd.)
- Identify sentences containing feature terms
- Ternary expressions (T-expressions)
  - +ve/-ve sentiment verbs <target, verb, “”>
  - trans verbs <target, verb, source>
- Binary expressions (B-expressions)
  - <adjective, target>

SA (contd.)
- Identify sentiment phrases within subject, object phrases
- Associating sentiment with the target
  - Based on sentiment patterns
    - “I was impressed by the flash capabilities”
    - “This camera takes excellent pictures”
  - Based on B-expressions
    - “Poor performance in a dark room”

Other issues
- Position of the sentiment words
  - Words at the beginning and end of a review
- Sentiment about the characters in the movie versus sentiment about the actors in the movie (abstraction)
  - “He played the role of a very corrupt politician”
  - “He played the role brilliantly”

Conclusion
- Sentiment detection can be used in areas ranging from marketing research to movie reviews.
- Sentiment detection is a “hard” problem due to context-sensitivity, complex sentences, etc.
- Statistical methods should be augmented with NLP techniques.

References
- Yi, Nasukawa, et al. Sentiment Analyzer: Extracting Sentiments about a Given Topic using NLP Techniques. Proceedings of the Third IEEE International Conference on Data Mining, p. 427, Nov 19-22, 2003.
- Peter D. Turney. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Proceedings of the 40th Annual Meeting of ACL, p. 417-424, 2002.
- Matthew Hurst and Kamal Nigam. Retrieving Topical Sentiments from Online Document Collections. Document Recognition and Retrieval XI, p. 27-34, 2004.

References (contd.)
- B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of the 2002 ACL EMNLP Conference, p. 79-86, 2002.

Project
- Sentiment analyzer for a specific domain
- Given set of features, initial list of polar words
- Learns new polar words from documents analyzed
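
Sketch: the unsupervised semantic-orientation steps in code

The unsupervised technique from the earlier slides (extract two-word phrases by POS-tag pattern, score each phrase with SO from search hit counts, average over the document) might be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the text is assumed to be POS-tagged already, hit counts are supplied by the caller instead of being fetched from a search engine, and the base-2 logarithm and the small smoothing constant are assumptions (the slides only say "log").

```python
import math

# Tag patterns (first word, second word) as listed on the slide.
PATTERNS = {("JJ", "NN"), ("RB", "JJ"), ("JJ", "JJ"), ("NN", "JJ")}


def extract_phrases(tagged):
    """Return two-word phrases whose tag pair matches one of PATTERNS.

    `tagged` is a list of (word, tag) pairs; tagging is assumed done elsewhere.
    """
    return [
        f"{w1} {w2}"
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if (t1, t2) in PATTERNS
    ]


def semantic_orientation(near_excellent, near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) computed from hit counts.

    near_excellent: hits(phrase NEAR "excellent")
    near_poor:      hits(phrase NEAR "poor")
    smoothing is an assumption that avoids division by zero when a
    phrase never co-occurs with one of the reference words.
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def average_so(scores):
    """Average semantic orientation of a document's extracted phrases."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical pre-tagged review fragment.
    tagged = [("low", "JJ"), ("fees", "NN"), ("and", "CC"),
              ("online", "JJ"), ("service", "NN")]
    print(extract_phrases(tagged))

    # Phrase scores taken from the slide's table.
    print(round(average_so([0.333, 2.780, -1.541]), 3))
```

Under this sketch a document would be labelled positive when the average is above zero and negative otherwise; the three scores from the slide's table average to 0.524, matching the value given there.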