Movie Review Mining and
Summarization
Li Zhuang, Feng Jing, and Xiao-Yan Zhu
ACM CIKM 2006
Speaker: Yu-Jiun Liu
Date : 2007/01/10
Outline
 Introduction
 The characteristics of movie review mining
 Definition
 Approach
 Experiment
Introduction
 Reviews are useful to both information
promulgators and readers.
 However, many reviews are lengthy, with
only a few sentences expressing the
author’s opinions.
 Goal: automatically generate a summary of
the reviews.
 Product review vs. movie review
The characteristics of movie review
mining
 Promulgators are likely to comment on
other movie-related elements as well.
 Readers probably want more detail.
 Movie review mining must generate a richer
summary than product review mining.
 A multi-knowledge based approach.
Definition 1
 Movie Feature
 A movie feature is a movie element or a movie-related person that has been commented on.
 According to IMDB, feature classes are divided
into two groups: ELEMENT and PEOPLE.
 ELEMENT: OA, ST (screenplay), SE (special effects), etc.
 PEOPLE: PPR, PDR, PAC, etc.
 Example: “story”, “script”, and “screenplay” belong to
ST class; “actor”, “actress”, and “supporting cast”
belong to PAC class.
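The class lookup described above can be sketched as a small table. This is only an illustration: it contains just the two classes and six keywords named in the example, and the names `FEATURE_CLASSES` and `classify_feature` are hypothetical.

```python
# Illustrative keyword table: only the entries named in the example above.
# FEATURE_CLASSES and classify_feature are hypothetical names.
FEATURE_CLASSES = {
    "ST": {"group": "ELEMENT", "keywords": {"story", "script", "screenplay"}},
    "PAC": {"group": "PEOPLE", "keywords": {"actor", "actress", "supporting cast"}},
}


def classify_feature(word):
    """Return (group, feature class) for a feature keyword, or None if unknown."""
    key = word.lower()
    for feature_class, info in FEATURE_CLASSES.items():
        if key in info["keywords"]:
            return info["group"], feature_class
    return None


print(classify_feature("script"))   # a keyword of the ST class
print(classify_feature("actress"))  # a keyword of the PAC class
```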
Definition 2
 Relevant Opinion of A Feature
 The relevant opinion of a feature is a set of
words or phrases that expresses a positive
(PRO) or negative (CON) opinion on the
feature.
 The polarity of the same opinion word may vary
across domains.
 Example: “predictable” is neutral in a product
review but sounds negative in a movie review.
Definition 3
 Feature-Opinion Pair
 A feature-opinion pair consists of a feature
and a relevant opinion.
 An explicit F-O pair: both the feature and
the opinion appear in the sentence.
 Example: “The movie is excellent.”
 An implicit F-O pair: the feature or the
opinion does not appear in the sentence.
 Example: “When I watched this film, I hoped it
ended as soon as possible.” (no opinion word)
Approach – multi-knowledge based
Keyword list generation
 Build a keyword list to capture main
feature/opinion words in movie reviews.
 Divide the list into two classes: features
and opinions.
Feature Keywords
 The set of feature words converges, so a list
built from statistics on the reviews can cover it.
 Special case: people's names, which appear in multiple formats
(e.g., Liu Yu Jiun; Liu Y.J.; L. Y. Jiun, etc.)
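One way to handle multi-format names is to generate the written variants of each known name and match any of them. The sketch below assumes only the three formats shown in the example; the function names are hypothetical, and real cast lists would need more formats.

```python
def name_variants(full_name):
    """Generate the three written forms shown in the example (an assumption:
    real reviews may use other formats as well)."""
    parts = full_name.split()
    if len(parts) < 2:
        return {full_name}
    variants = {" ".join(parts)}
    # First token kept, the rest abbreviated and joined: "Liu Y.J."
    variants.add(parts[0] + " " + "".join(p[0] + "." for p in parts[1:]))
    # Last token kept, the rest abbreviated and spaced: "L. Y. Jiun"
    variants.add(" ".join(p[0] + "." for p in parts[:-1]) + " " + parts[-1])
    return variants


def find_people(sentence, cast):
    """Return the canonical names from `cast` mentioned in any variant form."""
    return [name for name in cast
            if any(v in sentence for v in name_variants(name))]


print(sorted(name_variants("Liu Yu Jiun")))
print(find_people("Liu Y.J. gave the talk.", ["Liu Yu Jiun"]))
```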
Opinion Keywords
 Statistical results alone are not enough.
 The top 100 positive/negative words are
selected as seeds.
 For each substantive in WordNet, look up
the synsets of its first two
meanings. If one of the seed words is in the
synsets, the substantive is added to the
opinion word list.
 Remaining opinion words with high frequency
are added as domain-specific words.
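The seed-expansion step can be sketched as follows. To stay self-contained, the sketch replaces WordNet with a tiny hand-made dictionary (`TOY_SYNSETS`, entirely made up); the function name and the `max_senses` parameter are also hypothetical. With a real WordNet interface, only the lookup would change.

```python
# Toy stand-in for WordNet: word -> synsets (one set of lemmas per meaning),
# ordered by sense rank. All entries are invented for illustration.
TOY_SYNSETS = {
    "superb": [{"superb", "excellent"}, {"superb", "brilliant"}],
    "dull": [{"dull", "boring"}, {"dull", "blunt"}],
    "table": [{"table", "tabular array"}, {"table", "desk"}],
    "fine": [{"fine", "penalty"}, {"fine", "thin"}, {"fine", "excellent"}],
}


def expand_opinion_words(seeds, synsets, max_senses=2):
    """Add each word whose first `max_senses` synsets contain a seed word."""
    seeds = set(seeds)
    expanded = set(seeds)
    for word, senses in synsets.items():
        # Only the first two meanings are inspected, as described above.
        if any(sense & seeds for sense in senses[:max_senses]):
            expanded.add(word)
    return expanded


opinion_words = expand_opinion_words({"excellent", "boring"}, TOY_SYNSETS)
print(sorted(opinion_words))
```

In this toy data, "fine" is not added because its seed match sits in the third meaning, which the two-meaning limit excludes.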
Mining Explicit F-O Pairs
 In a sentence, use the keyword list to find all feature/opinion
words.
 Use a dependency grammar graph to detect the path
between each feature word and each opinion word.
 Stanford Parser
(http://www-nlp.stanford.edu/software/lex-parser.shtml)
Mining Explicit F-O Pairs II
 Example: “This movie is a masterpiece.”
 Path: “movie (NN) – nsubj – is (VBZ) – dobj – masterpiece (NN)”
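Finding the path between a feature word and an opinion word amounts to a shortest-path search over the dependency graph. The sketch below uses breadth-first search over hand-written edges that mirror the path in the example; the edges are not real parser output, the two `det` edges are assumed, and the function name is hypothetical.

```python
from collections import deque

# (head, relation, dependent) triples written by hand to match the example
# path; the two "det" edges are assumptions, not parser output.
EDGES = [
    ("is", "nsubj", "movie"),
    ("is", "dobj", "masterpiece"),
    ("movie", "det", "This"),
    ("masterpiece", "det", "a"),
]


def dependency_path(edges, source, target):
    """Shortest undirected path as [word, relation, word, ...], or None.
    Words serve as node ids, so each word must occur once in the sentence."""
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((rel, dep))
        graph.setdefault(dep, []).append((rel, head))
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for rel, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [rel, neighbor])
    return None


path = dependency_path(EDGES, "movie", "masterpiece")
print(" - ".join(path))
```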
Mining Implicit F-O Pairs
 This problem is difficult, so only two simple
cases, in which opinion words appear, are handled.
 Very short sentences that appear at the beginning
or end of a review and contain obvious opinion
words.
 Ex: “Great!” → “movie-great” or “film-great”
 Specific mapping from opinion word to feature word.
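The first simple case above can be sketched as a rule over the first and last sentence of a review. The opinion word list, the length threshold `MAX_TOKENS`, and the default feature word are all assumptions made for illustration; the second case (opinion-to-feature mapping) is not covered.

```python
import re

OPINION_WORDS = {"great", "awful"}  # hypothetical opinion keyword list
MAX_TOKENS = 3                      # hypothetical threshold for "very short"
DEFAULT_FEATURE = "movie"           # feature paired with a bare opinion word


def implicit_pairs(review_sentences):
    """Pair opinion words in a very short opening/closing sentence
    with the default feature word."""
    pairs = []
    if not review_sentences:
        return pairs
    # Only the first and last sentence of the review are candidates.
    for index in sorted({0, len(review_sentences) - 1}):
        tokens = re.findall(r"[a-z']+", review_sentences[index].lower())
        if len(tokens) <= MAX_TOKENS:
            pairs.extend((DEFAULT_FEATURE, t) for t in tokens if t in OPINION_WORDS)
    return pairs


review = ["Great!", "The plot follows a detective across two cities.", "I saw it twice."]
print(implicit_pairs(review))
```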

Summary Generation
1. Collect all the sentences that express opinions on
a feature class.
2. Identify the semantic orientation of the relevant
opinion in each sentence.
3. List the organized sentences as the summary.
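The three steps can be sketched as a grouping operation. The sketch assumes the feature class and the PRO/CON orientation of each sentence have already been mined; the sample records and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical mined records: (sentence, feature class, orientation).
# Orientation is assumed to be known already (step 2 is not implemented here).
mined = [
    ("The screenplay is excellent.", "ST", "PRO"),
    ("The story is predictable.", "ST", "CON"),
    ("The supporting cast is great.", "PAC", "PRO"),
]


def build_summary(records):
    """Group sentences by feature class, then by orientation."""
    summary = defaultdict(lambda: {"PRO": [], "CON": []})
    for sentence, feature_class, orientation in records:
        summary[feature_class][orientation].append(sentence)
    return dict(summary)


summary = build_summary(mined)
for feature_class, groups in summary.items():
    print(feature_class)
    for orientation in ("PRO", "CON"):
        for sentence in groups[orientation]:
            print(f"  {orientation}: {sentence}")
```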
Experiments
 Performance measure
Data
 Select 11 movies from the top 250 list of
IMDB.
 For each movie, the first 100 reviews are
downloaded.
 In total, more than 16,000 sentences and
more than 260,000 words.
 Four movie fans were asked to label F-O
pairs and to give the class of each feature
word and each opinion word.
Results
 Use 880 reviews as training data, and
220 reviews as testing data.
Results II