A SURVEY ON SENTIMENT ANALYSIS
CHARU NANDA, BANASTHALI VIDYAPITH, JAIPUR, RAJASTHAN
MOHIT DUA, NIT, KURUKSHETRA, HARYANA
AGENDA

INTRODUCTION

ABOUT NLP

TASKS OF NLP

SENTIMENT ANALYSIS

APPROACHES

OPINION MINING

CLASSES OF SA

LEVELS OF SA

WORK DONE

RELATED WORK

CONCLUSION AND FUTURE SCOPE

REFERENCES
INTRODUCTION
• NLP IS A SUBFIELD OF ARTIFICIAL INTELLIGENCE THAT SUPPORTS
INTERACTION BETWEEN COMPUTERS AND HUMANS USING NATURAL
LANGUAGES.
• SENTIMENT ANALYSIS IS AN APPLICATION OF NLP THAT PROCESSES
LARGE VOLUMES OF HUMAN OPINIONS OR REVIEWS ABOUT PARTICULAR
THINGS OR OBJECTS ON THE WEB AND CATEGORIZES THEM INTO
DIFFERENT CLASSES.
ABOUT NLP
• NATURAL LANGUAGE PROCESSING (NLP) IS A FIELD OF COMPUTER
SCIENCE AND COMPUTATIONAL LINGUISTICS CONCERNED WITH THE
INTERACTION BETWEEN COMPUTER SYSTEMS AND HUMAN BEINGS.
• NLP IS CONSIDERED A SUBFIELD OF ARTIFICIAL INTELLIGENCE (AI)
AND IS RELATED TO HUMAN-COMPUTER INTERACTION.
AI: Theory and development of computer systems able to perform tasks requiring human intelligence.
LINGUISTICS: Study of language and its structure.
NLP: At the intersection of AI and linguistics.
• CURRENT APPROACHES TO NLP ARE BASED ON MACHINE LEARNING, A TYPE OF
ARTIFICIAL INTELLIGENCE THAT USES PATTERNS IN DATA TO IMPROVE
UNDERSTANDING.
• AN ADVANTAGE OF MACHINE LEARNING ALGORITHMS IS THAT THEY
FOCUS ON THE MOST COMMON CASES.
• GIVEN LARGE AMOUNTS OF DATA, SYSTEMS BASED ON
AUTOMATIC LEARNING BECOME MORE ACCURATE.
TASKS OF NLP
1. MACHINE TRANSLATION
2. SENTIMENT ANALYSIS
3. OPTICAL CHARACTER RECOGNITION
4. DISCOURSE ANALYSIS
5. PART-OF-SPEECH TAGGING
6. SPEECH RECOGNITION
7. WORD SENSE DISAMBIGUATION
8. STEMMING
9. QUESTION ANSWERING
10. PARSING…
SENTIMENT ANALYSIS
“PROCESS OF IDENTIFYING AND CATEGORIZING
OPINIONS EXPRESSED IN A PIECE OF TEXT, ESPECIALLY IN
ORDER TO DETERMINE WHETHER THE WRITER'S ATTITUDE
TOWARDS A PARTICULAR TOPIC, PRODUCT, ETC. IS
POSITIVE, NEGATIVE, OR NEUTRAL.”
SENTIMENT ANALYSIS IS THE PROCESS OF EXTRACTING INFORMATION,
USUALLY FROM A SET OF DOCUMENTS SUCH AS ONLINE REVIEWS,
TO DETERMINE THE POLARITY OF OPINIONS ABOUT OBJECTS.
SENTIMENTS APPEAR LIKE:
• I LIKE IT - POSITIVE
• IT WAS HORRIBLE - NEGATIVE
• IT WAS JUST A ONE-TIME WATCH - NEUTRAL
POSITIVE WORDS   NEGATIVE WORDS   NEUTRAL WORDS
Good             Bad              May be
Great            Boring           Not sure
Awesome          Disappointing    It could be
Beautiful        Worst            I don’t know
Superb           Horrible         So so
Marvelous        Unhappy          Maybe yes maybe no
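The word lists above can drive a very simple polarity check. The following is a minimal sketch, not the method of any surveyed paper: it counts positive and negative hits and treats a zero balance as neutral. The function name `classify` and the tokenizing rule are assumptions made for illustration, and the multi-word neutral phrases are left out because the sketch only matches single tokens.

```python
import re

# Single-word entries copied from the word lists above.
POSITIVE = {"good", "great", "awesome", "beautiful", "superb", "marvelous"}
NEGATIVE = {"bad", "boring", "disappointing", "worst", "horrible", "unhappy"}


def classify(text):
    """Return 'POSITIVE', 'NEGATIVE', or 'NEUTRAL' by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"  # no hits, or positive and negative hits cancel out


print(classify("The music was great and the visuals were superb"))
print(classify("It was horrible"))
print(classify("It was just a one-time watch"))
```

A sentence such as "I like it" comes out as neutral here because "like" is not in the lists, which shows how strongly a lexicon method depends on the coverage of its dictionary.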
APPROACHES
EARLIER APPROACHES
1. MEETINGS
2. INTERVIEWS
3. QUESTION ANSWERING SESSIONS
CURRENT APPROACHES
1. BLOGS
2. SOCIAL MEDIA WEBSITES
3. FORUMS
THE APPROACHES TO SENTIMENT ANALYSIS CAN BE CLASSIFIED INTO THREE
CATEGORIES:
1. MACHINE LEARNING, WHICH FOCUSES ON THE DEVELOPMENT OF COMPUTER
PROGRAMS THAT CAN TEACH THEMSELVES TO GROW AND CHANGE WHEN
EXPOSED TO NEW DATA.
2. LEXICON BASED, WHICH RELIES ON A DICTIONARY OF WORDS AND PHRASES
ANNOTATED WITH THEIR SENTIMENT POLARITY.
3. HYBRID, WHICH COMBINES THE TWO APPROACHES ABOVE.
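To make the machine learning category concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. Naive Bayes is chosen only as a common illustrative example; the slides do not name a specific algorithm. The training sentences, the two-class setup (neutral is omitted), and the names `train` and `predict` are all assumptions invented for this example.

```python
import math
from collections import Counter

# Hypothetical toy training data, invented for illustration only.
TRAIN = [
    ("i like it", "positive"),
    ("it is great and awesome", "positive"),
    ("it was horrible", "negative"),
    ("it is boring and disappointing", "negative"),
]


def train(examples):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = {}
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # class prior
        n_words = sum(word_counts[label].values())
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict("it is great", model))
print(predict("it was boring", model))
```

Unlike a fixed word list, the model here is learned entirely from labeled examples, so its behavior changes when the training data changes.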
OPINION MINING
• OPINION MINING IS A SYNONYM OF SENTIMENT ANALYSIS. AS THE NAME
SUGGESTS, IT REFERS TO MINING THE OPINIONS OF PEOPLE ABOUT DIFFERENT
THINGS.
• IT IS A TYPE OF TEXT MINING THAT CLASSIFIES TEXT ACCORDING
TO ITS POLARITY INTO DIFFERENT CLASSES.
• A VOLUMINOUS AMOUNT OF DATA IS PRESENT ON THE WEB AND IS
CONTINUOUSLY INCREASING.
• PROCESSING THIS DATA HAS BECOME A NECESSITY
TODAY.
TYPES OF OPINIONS
• IMPLICIT & EXPLICIT
• REGULAR & COMPARATIVE
• EXPLICIT AND IMPLICIT
EXPLICIT DATA IS INFORMATION THAT IS PROVIDED INTENTIONALLY, SUCH AS
THROUGH REGISTRATION FORMS.
IMPLICIT DATA IS INFORMATION THAT IS NOT PROVIDED INTENTIONALLY BUT IS
GATHERED, EITHER DIRECTLY OR THROUGH ANALYSIS OF EXPLICIT DATA.
• REGULAR AND COMPARATIVE
A REGULAR OPINION IS CONSIDERED THE STANDARD OPINION AND CAN
BE EITHER DIRECT OR INDIRECT.
DIRECT – “THE RESOLUTION OF THIS PHONE IS BRILLIANT.”
INDIRECT – “AFTER I SWITCHED TO THIS PHONE, I LOST ALL MY DATA!”
IN A COMPARATIVE OPINION, ONE EXPRESSES AN OPINION BY COMPARING
SIMILAR ENTITIES RATHER THAN BY STATING IT DIRECTLY AS POSITIVE OR
NEGATIVE.
“WHITE SHIRT IS BETTER THAN BLACK.”
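A comparative opinion names two entities and a preference between them, so it can be represented as a (preferred, other) pair instead of a single polarity label. The sketch below is only an illustration under strong assumptions: it recognizes just the pattern "X is better/worse than Y", and the names `COMPARATIVE` and `parse_comparative` are hypothetical.

```python
import re

# Assumed pattern: "<left> is|are|was|were better|worse than <right>".
COMPARATIVE = re.compile(
    r"^(?P<left>.+?)\s+(?:is|are|was|were)\s+(?P<word>better|worse)"
    r"\s+than\s+(?P<right>.+?)[.!]?$",
    re.IGNORECASE,
)


def parse_comparative(sentence):
    """Return the preferred and the other entity, or None if not comparative."""
    match = COMPARATIVE.match(sentence.strip())
    if match is None:
        return None
    left, right = match.group("left"), match.group("right")
    if match.group("word").lower() == "better":
        return {"preferred": left, "other": right}
    return {"preferred": right, "other": left}


print(parse_comparative("White shirt is better than black."))
print(parse_comparative("The resolution of this phone is brilliant."))
```

The regular opinion in the second call returns `None`, since it expresses polarity about one entity and makes no comparison.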
CLASSES OF SENTIMENT ANALYSIS
• WORK DONE SO FAR SHOWS THAT SENTIMENT ANALYSIS IS
PERFORMED ON SUBJECTIVE REVIEWS
COLLECTED FROM THE INTERNET.
• THESE REVIEWS ARE THEN PROCESSED AND CATEGORIZED INTO THREE
DIFFERENT CLASSES, NAMELY:
• POSITIVE
• NEGATIVE
• NEUTRAL
LEVELS OF SENTIMENT ANALYSIS
Document Level
Sentence Level
Entity and aspect level
• DOCUMENT LEVEL: AT THIS LEVEL, IT IS CHECKED WHETHER THE WHOLE
OPINION DOCUMENT EXPRESSES A POSITIVE OR NEGATIVE SENTIMENT.
• SENTENCE LEVEL: AT THIS LEVEL, THE TASK IS RESTRICTED TO
INDIVIDUAL SENTENCES, AND THE POLARITY OF EACH SENTENCE IS DETERMINED.
• ENTITY AND ASPECT LEVEL: THIS LEVEL GIVES THE MOST
PRECISE SENTIMENT, CAPTURING WHAT THE ABOVE TWO LEVELS DO NOT
CLEARLY DEFINE. IT GIVES A FINE-GRAINED ANALYSIS OF EACH PARTICULAR
ENTITY AND ITS ASPECTS.
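The three levels can be contrasted on one small review. This is a rough sketch under stated assumptions: the tiny word lists, the aspect terms in `ASPECTS`, and the rule "an aspect takes the polarity of the sentence that mentions it" are all simplifications invented for illustration.

```python
import re

POSITIVE = {"good", "great", "brilliant"}
NEGATIVE = {"bad", "horrible", "worst"}
ASPECTS = {"screen", "battery"}  # hypothetical aspect terms for a phone review


def polarity(text):
    """Score a piece of text by counting positive and negative words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "POSITIVE" if score > 0 else "NEGATIVE" if score < 0 else "NEUTRAL"


def split_sentences(document):
    """Naive sentence splitter on ., ! and ?."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]


def document_level(document):
    return polarity(document)  # one label for the whole document


def sentence_level(document):
    return [(s, polarity(s)) for s in split_sentences(document)]


def aspect_level(document):
    result = {}
    for sentence in split_sentences(document):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for aspect in ASPECTS & tokens:
            result[aspect] = polarity(sentence)  # aspect inherits sentence polarity
    return result


review = "The screen is great. The battery is horrible."
print(document_level(review))
print(sentence_level(review))
print(aspect_level(review))
```

On this review the document-level label is neutral because the two opinions cancel, while the sentence and aspect levels recover that the screen is liked and the battery is not.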
WORK DONE
AS MENTIONED EARLIER, THE WORK DONE SO FAR HAS BEEN ON
SUBJECTIVE REVIEWS.
• CONSIDERABLE WORK HAS BEEN DONE ON:
• NEWS
• TWITTER
• AMAZON DATA
• BUSINESS DATA
• LANGUAGES IN WHICH WORK HAS BEEN DONE:
• ENGLISH
• NEPALI
• CHINESE
• BENGALI
• THAI
• MANY DIFFERENT APPROACHES ARE USED FOR PROCESSING TEXT,
SUCH AS MACHINE LEARNING, DOMAIN-KNOWLEDGE-DRIVEN ANALYSIS, AND
STATISTICAL APPROACHES.
• THESE APPROACHES PROVIDE AN EFFECTIVE WAY OF SCRUTINIZING
INFORMATION AND DECIDING THE POLARITY OF THE DATA.
RELATED WORK
• DAS AND HIS TEAM DEVELOPED A SENTIWORDNET FOR THE BENGALI
LANGUAGE CONSISTING OF 35,805 WORDS.
• SHARMA ET AL. USED AN UNSUPERVISED LEXICON METHOD ON TWEETS
TO FIND THEIR POLARITY.
• HUANGFU ET AL. PRESENTED AN IMPROVED CHINESE NEWS SENTIMENT
ANALYSIS METHOD.
• YAN ET AL. PROPOSED AND IMPLEMENTED A TIBETAN SENTENCE SENTIMENT
TENDENCY JUDGMENT SYSTEM BASED ON THE MAXIMUM ENTROPY MODEL.
• JOSHI ET AL. PROPOSED A STRATEGY FOR THE HINDI LANGUAGE. THEY
DEVELOPED A HINDI SENTIWORDNET BY REPLACING WORDS OF ENGLISH
WORDNET WITH THEIR HINDI EQUIVALENTS.
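The replacement idea in the last item, building a lexicon for a new language by mapping scored English entries to their equivalents, can be sketched as a dictionary projection. This is not the authors' implementation: the scores below are invented, the three-entry bilingual dictionary uses transliterated Hindi for readability, and the name `project_lexicon` is hypothetical.

```python
# Hypothetical English entries with (positive, negative) scores; values are invented.
ENGLISH_SCORES = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "beautiful": (0.75, 0.0),
    "marvelous": (0.875, 0.0),  # has no entry in the toy dictionary below
}

# Tiny illustrative English-to-Hindi dictionary (transliterated).
EN_TO_HI = {"good": "accha", "bad": "bura", "beautiful": "sundar"}


def project_lexicon(scores, translations):
    """Carry each English word's scores over to its translated equivalent."""
    projected = {}
    for word, score in scores.items():
        target = translations.get(word)
        if target is not None:  # words without a known equivalent are skipped
            projected[target] = score
    return projected


hindi_lexicon = project_lexicon(ENGLISH_SCORES, EN_TO_HI)
print(hindi_lexicon)
```

Entries with no known equivalent are dropped, so the coverage of the projected lexicon is bounded by the coverage of the bilingual dictionary.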
CONCLUSION
• ON THE BASIS OF THE STUDIES AND SURVEYS DONE TO DATE, IT CAN
BE SAID THAT A VARIETY OF TASKS AND APPROACHES HAVE
BEEN USED, RESULTING IN MANY ACHIEVEMENTS.
• THE BEST CLASSIFICATION METHOD AMONG THE THREE IS
CONSIDERED TO BE THE MACHINE LEARNING APPROACH, WHICH
PREDICTS THE POLARITY OF SENTIMENTS BASED ON THE
DATASET.
• IT CAN BE USED IN MANY DIFFERENT AREAS AND IS EXPECTED
TO GIVE GOOD RESULTS.
• SUCH AREAS INCLUDE POLITICS, FASHION, EDUCATION, AND BUSINESS.
FUTURE SCOPE
• SENTIMENT ANALYSIS PLAYS A VITAL ROLE IN
DETERMINING THE POLARITY OF REVIEWS.
• THE STUDY SHOWED THAT THE AMOUNT OF WORK DONE FOR THE HINDI
LANGUAGE IS VERY SMALL.
• NEW TECHNIQUES COULD ALSO BE USED TO INCREASE THE
ACCURACY OF THE RESULTS AND HENCE GIVE MORE
PRECISE ANSWERS.
REFERENCES
• AMITAVA DAS, SIVAJI BANDOPADAYA, “SENTIWORDNET FOR BANGLA”, KNOWLEDGE
SHARING EVENT-4: TASK, VOLUME 2, 2010.
• AMITAVA DAS, SIVAJI BANDOPADAYA, “SENTIWORDNET FOR INDIAN LANGUAGES”,
PROCEEDINGS OF THE 8TH WORKSHOP ON ASIAN LANGUAGE RESOURCES, PAGES
56-63, BEIJING, CHINA, AUGUST 2010.
• YAKSHI SHARMA, VEENU MANGAT, MANDEEP KAUR, “A PRACTICAL APPROACH TO
SEMANTIC ANALYSIS OF HINDI TWEETS”, 1ST INTERNATIONAL CONFERENCE ON NEXT
GENERATION COMPUTING TECHNOLOGIES (NGCT-2015), DEHRADUN, INDIA, PAGES
677-680, SEPTEMBER 4-5, 2015.
• YU HUANGFU, GUOSHI WU, YU SU, JING LI, PENGFEI SUN, JIE HU, “AN IMPROVED
SENTIMENT ANALYSIS ALGORITHM FOR CHINESE NEWS”, 12TH INTERNATIONAL
CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), PAGES
1366-1371, 2015.
• PURTATA BHOIR, SHILPA KOLTE, “SENTIMENT ANALYSIS OF MOVIE REVIEWS USING
LEXICON APPROACH”, IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL
INTELLIGENCE AND COMPUTING RESEARCH, 2015.
• XIADONG YAN, TAO HUANG, “TIBETAN SENTENCE SENTIMENT ANALYSIS BASED ON
THE MAXIMUM ENTROPY MODEL”, 10TH INTERNATIONAL CONFERENCE ON BROADBAND
AND WIRELESS COMPUTING, COMMUNICATION AND APPLICATION, PAGES 594-597, 2015.
• CHANDAN PRASAD GUPTA, BAL KRISHAN, “DETECTING SENTIMENT ANALYSIS IN NEPALI
TEXTS”, IEEE, PAGES 1-4, 2015.
• ZHONGKAI HU, JIANQING HU, WEIFENG DING, XIAOLIN ZHENG, “REVIEW SENTIMENT
ANALYSIS BASED ON DEEP LEARNING”, 12TH INTERNATIONAL CONFERENCE ON
E-BUSINESS ENGINEERING, IEEE, PAGES 87-94, 2015.
• YAN WAN, HONGZHURUI NIE, TIANGUANG LAN, ZHAHUI WANG, “FINE GRAINED
SENTIMENT ANALYSIS OF ONLINE REVIEWS”, 12TH INTERNATIONAL CONFERENCE ON
FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PAGES 1406-1411, 2015.
• ANDREA SALINCA, “BUSINESS REVIEW CLASSIFICATION EMPLOYING SENTIMENT
ANALYSIS”, 17TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC
ALGORITHMS FOR SCIENTIFIC COMPUTING, PAGES 247-250, 2015.
THANK YOU…