ERASMUS MUNDUS cLINK PROGRESS REPORT
Date: 08/12/2013
Title: Product Reputation Evaluation based on Social Network using Multi Agent System
Name of Student: Umar Farooq
[x] PhD   [ ] Post Doc

Abstract: The World Wide Web has dramatically changed the way people express their opinions about products and services. Customers can post product reviews on different websites and express their emotions in Internet discussion forums, groups, blogs, Twitter etc. A rich source of information about products and services is therefore available on the Internet, and it can be used by both customers and industrial organizations to make decisions. Collecting and organizing this information is a challenging task, so most industrial organizations ignore it. Organizations currently make decisions throughout the product life cycle using a Product Life Cycle Management (PLM) system. We want to incorporate the unstructured information available on social networks and integrate it with the structured information in PLM to build a shared vision. This shared knowledge can be used to make useful decisions throughout the product life cycle. The aim of the thesis is to gather, process and analyze knowledge from the web and combine it with PLM. We propose a multi agent system in which each agent works on a different type of knowledge. Each agent will share knowledge with the others in order to build comprehensive knowledge, which the organization can use to make useful decisions.

Keywords: Product Reputation, Sentiment Analysis, Opinion Mining, Multi Agent System

In order to propose a multi agent system for product reputation evaluation, I will work on the following tasks.
1) Information extraction from Social Network
2) Data Cleaning
3) Sentiment Analysis (I am currently working on this topic)
4) Reputation Evaluation
5) Integration of unstructured data with PLM (Product Knowledge Management)
6) Industrial Case Study

Schedule for 24 Months

Month: April 2013 to September 2013
Topic: Sentiment Analysis
Topic Description: In the first 6 months I will work on sentiment analysis; as I am already working on it, I will continue. I will try to improve the architecture that I have already proposed for sentiment analysis, and I will try to automate it. After automation we will be able to perform experiments to produce results and to check the accuracy of our system. I will first test the system on product reviews, then on Twitter data, and after that on other social networks. During this period I will also work on feature based sentiment analysis, so that we can find the opinions of customers about the features of a product.
Note: Sentiment analysis for products and the proposed architecture are explained in detail in the last section of this document.
Outcomes: One research publication in a journal on sentiment analysis

Month: October 2013 to December 2013
Topic: Information extraction from SN
Topic Description: This work will answer how data about a product can be extracted from social networks (SN) and on the basis of which criteria the data will be extracted. It will review existing methods and try to propose new and efficient methods for information extraction. We will propose an agent architecture that extracts data from different social network sites. We will propose an opinion search engine that takes a product from the user as input and extracts opinionated text from different social networks. This work will address the following research questions.
1. How can reviews of a product be extracted from different review sites (such as amazon.com, eopinion.com etc.)?
2. How can data about a product be extracted from Twitter?
3. How can data be extracted from blogs and forums?
4. How can data be extracted from other SNs?
Outcomes: One or two research publications on information extraction from SN

Month: January 2014 to June 2014
Topic: Cleaning of User Data
Topic Description: User comments contain a lot of noise, so we need to remove the noise before performing sentiment analysis. We will use existing methods to remove noise. Which spelling correction method can be used? Reviews have little noise, whereas discussion forums, blogs and Twitter have a large amount of noise. On Twitter, users express opinions with special characters and symbols, which are called emoticons. We will use existing methods to convert emoticons into text expressions.
Topic: Reputation Evaluation
Topic Description: In this work I will study the current reputation models used for product evaluation and will try to propose a model. Most current product reputation models are for customers, so we need to propose a reputation model for organizations, so that they can learn about the strengths and weaknesses of their products. Some review sites use 5 stars to rank a product; we will also incorporate those models and will try to provide a visualization. This work will address the criteria on which product reputation can be evaluated. We will propose a multi criteria model per source and then integrate these models to evaluate product reputation based on multiple sources (many SNs) and multiple criteria. In this way we will be able to evaluate product reputation based on unstructured data from social networks.
Outcomes: One publication on a product reputation evaluation model

Month: July 2014 to March 2015
Topic: Integration of unstructured data with PLM System
Topic Description: This work will address how the knowledge we have learnt from SN can be integrated with the Product Life Cycle Management System. In other words, how can we integrate the unstructured data coming from SN with the structured data in the PLM System? We will propose a data management policy to manage the flow of data coming from unstructured sources together with the structured source. From ontology engineering we will try to use ontology merging, ontology mapping and consistency analysis.
Outcomes: One publication on integration of unstructured data with PLM System
Topic: Industrial Case Study
Topic Description: An industrial case study will be performed in order to test our system and to establish when, where and for which products it is applicable. This work will answer how the shared understanding about a product can be used in different phases of the product life cycle and how this knowledge can be used to make useful decisions. In this regard the nature of the product plays a key role. For example, for products that follow an evolutionary or prototype model, this shared knowledge can be used in different phases of the product life cycle. For manufactured products it can be used while developing a new product or improving an existing one. It can be very useful in the software industry, because little effort is needed to release a new version of software after taking views from customers.
Outcomes: One publication on using shared knowledge for the improvement of products. One publication on a multi agent system for product reputation evaluation based on SN.
Thesis writing and Defense

Figure 1: High Level Block Diagram of Thesis (blocks shown: Web Information Extraction and Cleaning; Sentiment Analysis; Product Reputation Evaluation; Integration of Unstructured Data with PLM; PLM System)

Sentiment Analysis for Product Reputation

1. Introduction:
People express their opinions about products on the World Wide Web (WWW) in different ways. A customer may post a product review on a product review website or merchant website, or express emotions on blogs, groups, discussion forums and Twitter. Customer views about a product on the WWW are present in two forms: document form and sentence form [4]. Most user generated content on review sites and merchant sites is in document form; however, user views and comments on blogs, discussion forums and Twitter are in sentence form. This information can be used by both customers and industrial organizations to make decisions. For example, a customer who wants to buy a product first checks the reviews of existing customers. Similarly, organizations can use this information to check product reputation, drawbacks, strengths and market trends.

Sentiment analysis, or opinion mining, is used to find whether the opinion of a customer about a product is positive or negative. It is a broad area of natural language processing, computational linguistics and text mining. Sentiment analysis is the study of opinion and emotion expressed in text. S. Das et al. [1] and R. M. Tong [2] introduced sentiment analysis in 2001, and the term opinion mining was first used by Dave et al. [3] in 2003. It is a challenging task to determine whether the emotions expressed in a document or sentence about a product are positive or negative. Many standalone applications have been developed to find sentiment, e.g. for product reviews, news, tweets and blogs [13], [14], [15], [16], [17], some of which are explained in Section 2. Sentiment analysis may be performed at document level or sentence level [4].
At document level it is usually assumed that the document is opinionated, and the overall polarity of the document is classified as positive or negative. At sentence level it is first identified whether the sentence is subjective (opinionated) or objective. If the sentence is subjective, it is then classified as positive or negative.

For practical applications it is not enough to identify whether the opinion about a product is positive or negative. A more detailed analysis is needed in order to know customer likes and dislikes, product strengths and weaknesses, and market trends. This detailed analysis can be used to improve the product life cycle. For this purpose feature based sentiment analysis needs to be performed. A customer may express an opinion about a product or about its features, attributes, characteristics, parts or components. It is therefore necessary to identify the target feature about which the customer expresses an opinion and then determine whether that opinion is positive or negative. A further detailed study is needed to analyze comparative sentences. Sometimes a customer compares two products in a single sentence, and it is necessary to identify which features of which product the customer prefers. Several other detailed analyses can be done in order to build knowledge.

We want to propose a system that extracts data about a product from social networks, cleans it and performs sentiment analysis in order to classify the opinion as positive or negative. A multi criteria and multi-source model will be used to evaluate product reputation. This reputation data will be integrated with the PLM System in order to support decisions during the product life cycle.

2. Related work:
Sentiment analysis has become an emerging and challenging research topic in recent years. Different researchers have worked on different aspects of this issue, and several supervised, unsupervised, machine learning and rule based methods have been proposed.
Many sentiment analysis applications have been developed for product reviews, blogs, forums and Twitter. Current work on sentiment analysis can be classified as sentence, document and feature level sentiment classification [4].

Bing Liu et al. [14] proposed a number of techniques based on data mining and natural language processing to identify the product features on which opinions are expressed and to classify each opinion as positive or negative. Each feature is then ranked according to its frequency to provide a summary to the customer. Other researchers, such as Popescu and Etzioni [15], used the same techniques in a different way to extract features and opinion bearing words. An attribute based sentiment analysis system using a semantic role labeling tool was proposed by Hanxiao Shi et al. [16]. This system provides the customer with a visualization of product features built from all reviews of that product, which helps the customer make decisions about the product. Another method, proposed by Xiaojun Li et al. [18], also uses a semantic role labeling tool to classify opinions about cameras in reviews.

Soo-Min Kim and Eduard Hovy [21] developed a system that takes a topic and text about the topic from the user and automatically finds the opinion holder and the opinion expressed. In this method a collection of positive and negative words is constructed, and synonym and antonym sets are obtained from WordNet. This dictionary of words is then used to find the polarity of each individual word, and the word polarities are combined to find the overall polarity of the sentence.

Mikhail Bautin et al. [19] proposed a method to mine opinions about leaders and countries from international news. News and articles are extracted from online news resources in nine different languages. Text in other languages is translated into English using IBM WebSphere Translation Server (WTS).
The corpus is then used to find opinions about world leaders (such as George Bush and Vladimir Putin) and countries (i.e. Egypt and Israel).

A sentence level sentiment analysis system for product reviews has been proposed by A. Khan et al. [20]. In this method a bag of sentences is constructed from product reviews. Noise is removed from the sentences and spelling corrections are made. Each sentence is then classified as subjective or objective using both machine learning and lexical approaches. The polarity of each subjective sentence is then identified using a lexical dictionary such as SentiWordNet, taking into account the contextual features of each word in the sentence.

Most of these techniques are domain dependent and extract data from a specific source: product review sites, Twitter or blogs. Most of the systems use their own lexicon of opinionated words and perform sentiment analysis at document level. Very few systems perform word sense disambiguation or handle the impact of conjunctions and modifiers on polarity at sentence level. All these systems are standalone applications that are not integrated with other systems; they simply provide a summary to both customers and organizations. We want to propose a sentiment analysis system that incorporates all social networks and performs sentiment analysis at sentence level, taking conjunctions, modifiers and word sense disambiguation into account. This knowledge will then be integrated with the PLM System to construct a product knowledge base, which will be used throughout the product life cycle.

3. Proposed solution for sentiment analysis
We propose a system that uses sentiment analysis to discover the opinions of customers about products and services. Data on social networks is unstructured, so it is first brought into a structured form that a machine can process easily. For this purpose our system first extracts data about a product or service from social networks and stores it in a database.
Our system performs sentiment analysis at sentence level, so each sentence from the database is given to a part of speech (POS) tagger, which identifies the parts of speech used in the sentence. Sentiment bearing parts of speech that carry polarity, such as adjectives, verbs, adverbs and nouns, are given to SentiWordNet [6] in order to find the polarity score of each word. Before the polarity score is obtained, the context in which the word appears is identified. The numeric polarity scores of these parts of speech are combined to obtain the overall polarity of the sentence. If negation words such as not, don’t or cannot are found, the system reverses the polarity of the sentence. If conjunction words such as and, or, but, however etc. are found, the system uses conjunction handling heuristics, and the polarity is calculated according to the conjunction word used and the feature discussed.

The following figure shows the architecture of the proposed system, which consists of the Text Loader, POS Tagger, SentiWordNet, Sentiment Classifier, Word Sense Disambiguation, WordNet and Conjunction Handling Heuristics modules. Each module is explained in the following subsections.

3.1 Text Loader:
This module extracts text about a product or service from social networks and stores it in a database. Views and comments about a product or service may be posted on merchant sites, review sites, blogs, discussion forums and Twitter, so different methods can be used to extract information from these sources. The following methods can be used.
1) If the emotions are available on the merchant website, the organization has full control over the data and can easily access and manage the information.
2) Most review sites allow both customers and organizations to download product reviews.
3) Emotions about a particular topic may be extracted from Twitter using various tools such as Tweet Archivist and Tweet Seeker etc.
4) Twitter also provides an Application Programming Interface (API) that can be used to collect tweets and other related data [9].
5) We can also write our own code using web crawling and web scraping in order to extract data about a product from social networks.
6) If the merchant already has an account on Twitter, it can also download all of its tweets, replies, followers and direct messages.

3.2 Part of Speech (POS) tagger:
A part of speech tagger is used to tag each word in a sentence with its appropriate part of speech. A subjective sentence is a sentence in which a customer expresses an opinion about a product or service. Such a sentence can include verbs, nouns, adjectives, adverbs, negations and conjunctions, all of which are very important in sentiment analysis. Adjectives, verbs and adverbs are the actual sentiment bearing words, and on the basis of these parts of speech we can find the overall polarity expressed in the whole sentence. However, other words such as negations, conjunctions, enhancers and reducers can also modify the polarity, so these must be handled as well. The part of speech tagger is used to identify the different parts of speech. Sentiment bearing words are identified so that we can find their individual polarity scores from SentiWordNet. Negations, conjunctions and modifiers are detected so that the polarity can be changed according to the modifier word used. The following table shows some tags assigned by the part of speech tagger to different words.

Table 1: Tags of POS Tagger
No  Part of Speech  POS Tagger Abbreviation  SentiWordNet Abbreviation
1   Adjective       JJ                       A
2   Verb            VB                       V
3   Adverb          RB                       R
4   Noun            NN                       N
5   Conjunction     CC                       _

3.3 SentiWordNet:
SentiWordNet [6] is a database of opinionated words derived from WordNet [7]. It is publicly available to researchers. In this database each word is associated with polarity scores representing positive and negative sentiment. Most sentiment analysis systems use SentiWordNet to classify sentiment.
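As a minimal sketch of how the two tag sets in Table 1 relate, the following Python snippet maps POS tagger tags to the single-letter codes in the SentiWordNet column of the table. The function names to_sentiwordnet_pos and sentiment_bearing, and the prefix matching (so that tagger variants such as JJR or VBZ fall under JJ and VB), are assumptions made for illustration and are not part of the proposed system.

```python
from typing import List, Optional, Tuple

# Prefixes of POS tagger tags mapped to the codes in Table 1.
# Conjunctions (CC) have no SentiWordNet code, so they are not listed.
POS_TO_SWN = {
    "JJ": "A",  # adjective
    "VB": "V",  # verb
    "RB": "R",  # adverb
    "NN": "N",  # noun
}


def to_sentiwordnet_pos(tag: str) -> Optional[str]:
    """Return the Table 1 code for a tagger tag, or None if the tag has no code."""
    for prefix, code in POS_TO_SWN.items():
        if tag.upper().startswith(prefix):
            return code
    return None


def sentiment_bearing(tagged_sentence: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (word, code) pairs that would be looked up in SentiWordNet."""
    pairs = []
    for word, tag in tagged_sentence:
        code = to_sentiwordnet_pos(tag)
        if code is not None:
            pairs.append((word, code))
    return pairs
```

A tagged sentence is assumed here to be a list of (word, tag) pairs; conjunctions are dropped at this stage because they are handled separately by the conjunction heuristics.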
Our system also uses SentiWordNet to find the polarity of opinionated words. It gives adjectives, verbs, adverbs and nouns to SentiWordNet in order to find their associated polarity scores.

3.4 Word Sense Disambiguation:
A word may be used in many senses. In SentiWordNet a word may therefore have several polarity scores, depending on the context in which it is used. A word may have a positive polarity score in one context and a negative polarity score in another. Therefore, before obtaining a polarity score from SentiWordNet, we need to identify the context in which the word appears. This identification of the context of a word in a sentence is referred to as word sense disambiguation. Most research on sentiment analysis has focused on finding opinion out of context [22]. However, researchers now try to classify opinion by considering the context of the word [20, 22, 23, 25]. We will use the method proposed by A. Khan [20] with some changes. In this method the POS tag pattern of the sentence is matched against all possible senses extracted from WordNet glossaries. The system first looks for an exact match and extracts the polarity score of that sense from SentiWordNet. In the absence of an exact match, the system takes the nearest match and extracts the polarity score of that sense from SentiWordNet. In this way we obtain the polarity score of a word from SentiWordNet according to the context in which it appears.

3.5 Sentiment Classifier
This module is divided into verb sentiment classifier, noun sentiment classifier, adjective sentiment classifier and sentence sentiment classifier sub modules. The verb, noun and adjective sentiment classifiers classify words by the polarity score of the corresponding part of speech using SentiWordNet. The polarity score is a numeric value that can be positive or negative, and it is obtained from the database of opinionated words called SentiWordNet.
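The sense selection of Section 3.4 (exact match first, nearest match otherwise) can be illustrated with a small, self-contained Python sketch. Everything concrete in it is an assumption made for illustration: the toy sense inventory, its POS patterns and its scores are invented rather than taken from WordNet or SentiWordNet, and "nearest" is approximated with the similarity ratio of difflib.SequenceMatcher from the Python standard library, which the method in [20] may not use.

```python
from difflib import SequenceMatcher

# Hypothetical sense inventory: each sense of a word carries the POS tag
# pattern of its gloss and a polarity score. Real values would come from
# WordNet glosses and SentiWordNet; these are made up for the example.
TOY_SENSES = {
    "cheap": [
        {"pattern": ("JJ", "NN"), "score": 0.25},
        {"pattern": ("RB", "JJ", "NN"), "score": -0.5},
    ],
}


def similarity(a, b):
    """Similarity of two tag sequences, between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()


def sense_score(word, sentence_pattern, senses=TOY_SENSES):
    """Return the score of the sense whose pattern equals the sentence
    pattern, or of the most similar pattern if there is no exact match."""
    pattern = tuple(sentence_pattern)
    candidates = senses.get(word, [])
    if not candidates:
        return 0.0  # unknown word: treated as neutral in this sketch
    for sense in candidates:
        if sense["pattern"] == pattern:
            return sense["score"]  # exact match
    nearest = max(candidates, key=lambda s: similarity(s["pattern"], pattern))
    return nearest["score"]  # nearest match
```

The point of the sketch is only the two-step lookup order; a real implementation would replace TOY_SENSES with senses and scores read from WordNet and SentiWordNet.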
However, before obtaining the polarity score we need to identify the context in which the word is used, so that we obtain the right polarity score for that context. The sentence sentiment classifier combines the polarities of adjectives, verbs, adverbs and nouns in order to find the overall polarity of the sentence. In the presence of negation and conjunction, the sentence polarity is readjusted according to the negation and conjunction heuristics.

3.6 Negation
Negations are words that reverse the polarity of a sentence. Words such as no, not, don’t, never, wouldn’t, can’t, doesn’t etc. are very important in finding the polarity. If one of these words is found, our system reverses the polarity of the sentence. For example, if the polarity is 0.45 and a negation word is found, the system reverses it to -0.45.

3.7 Conjunction:
Conjunctions are words that combine words and phrases in a sentence. Examples of conjunction words are AND, OR, BUT, HOWEVER, SIMILARLY etc. Conjunction words have a substantial impact on the overall polarity of a sentence; this issue is addressed by Arun Meena and T. V. Prabhakar [8] and A. Khan [20]. It is a very challenging task to identify the polarity of a whole sentence in the presence of these words. Current research focuses only on classifying the opinion in the presence of a conjunction without considering the feature; researchers try to find the overall sentiment expressed in a sentence. If a single feature of an object is discussed, this method works fine. However, when opinions about more than one feature are expressed, it is not useful to classify the whole sentence as positive or negative. Therefore, if opinions about multiple features are expressed in the presence of a conjunction, our system considers that multiple opinions are expressed in a single sentence and classifies each one separately.
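The rules described so far (combining word scores as in Section 3.5, reversing the sign on negation as in Section 3.6, and scoring each part of a multi-feature sentence separately as in Section 3.7) can be sketched in Python as below. The word scores in TOY_SCORES are invented stand-ins for SentiWordNet values, the negation list is abbreviated, and combining by summation and splitting only on "but" and "however" are simplifying assumptions; the conjunction rules of Meena and Prabhakar [8] are not reproduced here.

```python
import re

# Invented polarity scores standing in for SentiWordNet lookups.
TOY_SCORES = {"good": 0.45, "great": 0.6, "short": -0.3, "poor": -0.5}
NEGATIONS = {"no", "not", "never", "don't", "doesn't", "can't", "cannot", "wouldn't"}
SPLITTING_CONJUNCTIONS = {"but", "however"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower().replace("’", "'"))


def clause_polarity(tokens):
    """Sum the word scores and reverse the sign if a negation word is present."""
    score = sum(TOY_SCORES.get(tok, 0.0) for tok in tokens)
    if any(tok in NEGATIONS for tok in tokens):
        score = -score
    return score


def split_clauses(tokens):
    """Split a token list at the contrastive conjunctions listed above."""
    clauses, current = [], []
    for tok in tokens:
        if tok in SPLITTING_CONJUNCTIONS:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        clauses.append(current)
    return clauses


def sentence_polarities(text):
    """Return one polarity score per clause of the sentence."""
    return [clause_polarity(clause) for clause in split_clauses(tokenize(text))]
```

With these toy scores, sentence_polarities("The camera is not good") yields [-0.45], matching the negation example of Section 3.6, and a sentence joined by "but" yields one score per clause.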
For example, the sentence “The screen of this mobile is good but the battery life is short” uses the conjunction word “but”. In this sentence opinions about the screen and the battery are expressed, so we need to find the opinion about each feature separately. In the presence of a conjunction word, our system first determines whether one feature or more than one feature is discussed in the sentence. If one feature is discussed, our system uses the rules proposed by Meena et al. [8] for conjunctions to find the overall polarity of the sentence. On the other hand, if multiple features are discussed, our system finds the polarity of each phrase separately.

3.8 Feature based sentiment analysis:
In feature based sentiment analysis a more detailed analysis is performed. In this model it is first identified which feature of the product the customer expresses an opinion about, and the opinion is then classified as positive or negative. From this analysis we can identify customer likes and dislikes and the strengths and weaknesses of the product. Customers usually use nouns and noun phrases to talk about product features, although sometimes adjectives are also used. We can therefore extract product features by using these parts of speech. There are two types of feature: frequent and infrequent [26]. Frequent features are those about which many customers express an opinion, whereas features that few customers talk about are called infrequent. Frequent features are easier to identify than infrequent ones because they appear many times. The association mining method [27] can be used to identify all frequent features, as in [26]. Different customers may use different words to talk about the same feature, so we can also obtain the synonym set of each frequent feature from WordNet to build a dictionary of frequent features.

4.
Sentiment analysis of comparative sentences:
A product can be evaluated in two ways: direct appraisal and comparison [4]. In a direct opinion, a single sentence expresses an opinion about a single product. In a comparative sentence, two or more competing products are compared. It is very useful to extract such sentences in order to identify which product the customer prefers and which features of which product the customer likes or dislikes.

5. Competitor Analysis:
Most organizations arrange surveys or customer polls to evaluate their own as well as competitors' products. Competitor analysis can therefore be very useful for learning about competitors. The same sentiment analysis and feature based sentiment analysis can be performed for competitors in order to know the strengths and weaknesses of competing products. This information can also be very useful for improving different phases of the product life cycle.

References:
[1] Sanjiv Das and Mike Chen. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA), 2001.
[2] Richard M. Tong. An operational system for detecting and tracking opinions in on-line discussion. In Proceedings of the Workshop on Operational Text Classification (OTC), 2001.
[3] Kushal Dave, Steve Lawrence, and David M. Pennock. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of WWW, pages 519–528, 2003.
[4] B. Liu. Sentiment Analysis and Subjectivity. Handbook of Natural Language Processing, Second Edition (editors: N. Indurkhya and F. J. Damerau), 2010.
[5] Ainsworth Anthony Bailey. Consumer Awareness and Use of Product Review Websites. Journal of Interactive Advertising, Vol 6, No 1, Fall 2005.
[6] Esuli A., Sebastiani F. (2006). SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. Proceedings from International Conference on Language Resources and Evaluation (LREC), Genoa, 2006.
[7] Miller, G.A., R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. 1993. Introduction to WordNet: An On-Line Lexical Database. http://www.cosgi.princeton.edu/~wn.
[8] Arun Meena and T. V. Prabhakar. Sentence level sentiment analysis in the presence of conjuncts using linguistic analysis. In Advances in Information Retrieval, volume 4425 of Lecture Notes in Computer Science. Springer, 2007.
[9] Haewoon Kwak, Changhyun Lee, Hosung Park, Sue Moon. What is Twitter, a social network or a news media? Proceedings of the 19th International Conference on World Wide Web, April 26-30, 2010, Raleigh, North Carolina, USA.
[10] R. Abdullah, R. Atan and M.A.A. Murad. "MASK-SM: Multi-agent system based knowledge management system to support knowledge sharing of software maintenance knowledge management". Journal of Computer and Information Science, vol. 3, issue 2, 2010.
[11] Vermeulen, S. Bohte, D., Somefun, & Poutré J. L. (2006). "Improving patient activity schedules by multi-agent Pareto appointment exchanging".
[12] D. Choinski, M. Metzger and W. Nocon. "Hybrid multiagent system for knowledge management in distributed control system". Journal of Hybrid Artificial Intelligent Systems, pages 124-131, 2011, Springer.
[13] Hu and Bing Liu. Mining Opinion Features in Customer Reviews. American Association for Artificial Intelligence, 2004.
[14] Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews". KDD’04, August 22–25, 2004, Seattle, Washington, USA, 2004.
[15] Popescu, A.M. and O. Etzioni, 2004. Extracting product features and opinions from reviews. American Association for Artificial Intelligence, 2004.
[16] Shi, H., G. Zhou and P. Qian, 2010. An attribute based sentiment analysis system. Information Technology Journal, 9: 1607-1614.
[17] Kushal Dave, Steve Lawrence, David M. Pennock. "Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews".
[18] Xiaojun Li, Lin Dai, Hanxiao Shi. "Opinion Mining of Camera Reviews Based on Semantic Role Labeling". 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2010).
[19] Mikhail Bautin, Lohit Vijayarenu, and Steven Skiena. International sentiment analysis for news and blogs. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2008.
[20] A. Khan, B. Baharudin, and K. Khan. "Sentiment Classification from Online Customer Reviews Using Lexical Contextual Sentence Structure". Communications in Computer and Information Science, Software Engineering and Computer Systems, Springer Verlag, 2011, pp. 317-331.
[21] Kim, S.-M. & Hovy, E. (2004). Determining the sentiment of opinions. In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), August 23–27, 2004 (pp. 1367–1373). Geneva, Switzerland.
[22] Tamara Martín-Wanton, Alexandra Balahur-Dobrescu, Andres Montoyo-Guijarro, and Aurora Pons-Porrata. 2010. Word sense disambiguation in opinion mining: Pros and cons. In Proc. of CICLing’10, Madrid, Spain.
[23] Wiebe, J., Mihalcea, R.: Word sense and subjectivity. In: 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, The Association for Computational Linguistics (2006) 1065-1072.
[24] Fellbaum, C.: WordNet: an electronic lexical database. MIT Press (1998).
[25] Martín-Wanton, T., Pons-Porrata, A., Montoyo-Guijarro, A., Balahur, A.: Opinion polarity detection: Using word sense disambiguation to determine the polarity of opinions. In: 2nd International Conference on Agents and Artificial Intelligence, Volume 1 - Artificial Intelligence, Valencia, Spain, INSTICC Press (2010) 483-486.
[26] M. Hu and B. Liu. "Mining and summarizing customer reviews". Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 168–177, 2004.
[27] Agrawal, R. & Srikant, R. 1994. Fast algorithm for mining association rules. VLDB’94, 1994.