Survey
International Journal of Advance Foundation And Research In Science & Engineering (IJAFRSE) Volume 2, Special Issue, Vivruti 2016. Impact Factor: 1.036, Science Central Value: 26.54

The Survey: Trend Analysis Using Big Data

Sushant Savita Madhukar Gandhi (PG Scholar)*, Prof. Sachin Bojewar, Asso Professor (VIT)*
[email protected] [email protected]

ABSTRACT

This paper surveys trend analysis and illustrates the significance of its methodologies. We have developed a trend analysis system using big data with the help of the MapReduce framework (Hadoop). Our research draws on Big Data, the MapReduce framework and data mining techniques. We focus on a variety of techniques, approaches and research areas that are helpful and important to the field of trend analysis using Big Data.

Keywords: MapReduce framework, Big data, Data mining techniques

I. INTRODUCTION

There are a large number of social media websites that enable users to contribute, modify and grade content. Users have an opportunity to express their personal opinions about specific topics. Examples of these websites include blogs, forums, product review sites, and social networks. Figures on daily data generation from social websites show the scale: the Like button of Facebook, the world's largest community, is pressed 2.7 billion times every day across the web; 10 million photos are uploaded to Facebook every day; 300 million tweets are sent on Twitter every day; 3000 photos are uploaded to Flickr every minute; and 153 million blogs are posted daily on social sites. These posts may be comments or reviews on brands, products, services, politics, etc., and all of this data arrives in an unstructured format [1].

a. Social Media trend analysis

Social media trend analysis is the process of analyzing a social media dataset and extracting the current trends from it [1].
The first step is social media mining: community analysis or detection, opinion mining and sentiment analysis, product review analysis, blog analysis, etc. Social Big Data is considered to consist of two parts, each represented in a different formalism: the first is social relations and the second is social content. Social relations are represented by social graphs, where nodes represent the users of social media and links represent the relations between them, such as publishing a post, a like or a tag [6]. The second part, social content (also called social text), is the text exchanged between users. Social content is characterized by the linguistic aspects of the different interactions in social media, such as topics, keywords and expressed sentiments [5].

b. Opinion mining and sentiment analysis

Opinion mining and sentiment analysis are two research areas that lie at the intersection of natural language processing, artificial intelligence and text mining. They can be carried out at three different levels: document level, sentence level and aspect level [7]. At the document level, the task is to classify the feeling expressed by the author across the whole text document as positive, negative or neutral [8]. At the sentence level, it is first necessary to determine the subjectivity of the sentence and then determine the polarity (positive, negative or neutral) of subjective sentences [9]. Finally, the aspect level is addressed by first extracting the sentences or documents containing the entities associated with their aspects.

© 2016, IJAFRSE and VIVRUTI 2016 All Rights Reserved www.ijafrse.org

1024 Gigabytes = 1 Terabyte.
1024 Terabytes = 1 Petabyte.
1024 Petabytes = 1 Exabyte.
1024 Exabytes = 1 Zettabyte.
1024 Zettabytes = 1 Yottabyte.

II. METHODOLOGY

a. Big Data

The first term we use is “Big Data”, which can be defined in different ways but is most easily understood through three common characteristics: “Volume”, meaning large data sets; “Variety”, meaning different types of data from countless sources; and “Velocity”, meaning data collected in real time. Big Data requires new forms of processing to enable enhanced decision making, insight discovery and process optimization. For example, 2.7 zettabytes of data exist in the digital universe, and 235 terabytes of data had already been collected by the U.S. Library of Congress by April 2011, which is attributed to business-to-business communication per day alone [2].

b. Map Reduce

The second term is the “MapReduce” framework, as implemented in Apache Hadoop. Using the MapReduce framework we can write applications that process huge amounts of data (Big Data) in parallel, on large clusters of commodity hardware, in a reliable manner. MapReduce is a processing model for distributed computing, and in Hadoop it is based on Java. A MapReduce [3] job splits the input data set into independent chunks, which are then sent to the map and reduce tasks for parallel processing. It has two important functions. The map function, written by the user, takes an input pair and produces a set of intermediate key-value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key and passes them to the reduce function. The reduce function, also written by the user, accepts an intermediate key and a set of values for that key, and merges these values to form a possibly smaller set of values.

c. Core Algorithm

The Mahout clustering algorithms

Canopy algorithm

The Canopy algorithm divides clustering into two phases. The first phase uses a simple distance computing method to divide the data set into overlapping subsets, called canopies.
The second phase uses a precise and rigorous computing method to calculate the distances between data vectors in the same canopy. The difference between the Canopy algorithm and traditional clustering algorithms is that Canopy uses two distance computing methods and applies the rigorous one only to data vectors that share a canopy [4].

K-Means algorithm

K-Means is a widely used clustering algorithm. It randomly selects k objects, each of which represents a cluster center. Each remaining object is assigned to the cluster whose center is nearest to it, and then each cluster center is recalculated. This process is repeated until the clustering criterion function converges [4].

FP-tree algorithm

This algorithm first compresses the input database by creating an FP-tree instance that represents the frequent items. It then divides the compressed database into a set of conditional databases, each associated with one frequent pattern. Finally, each conditional database is mined separately [4].

III. IMPLEMENTATION

Step 1. Preparation phase:
1. The agent reads key words from a dictionary that contains public opinion hot topics and trend key words [4].
2. Using these key words, the agent searches the messages collected in databases from various social sites.
3. All the messages are grouped into database sets.

Step 2. Processing phase:
1. The TF-Gini algorithm is used to select feature words, which are stored in the intermediate text library [4].
2. The Hadoop Mahout text mining algorithms are used to process the public opinion messages. This is the most important step in the system. The mining results are stored in the text feature library.

Step 3. Analysis phase:
1. The visualization tools read the mining results from the text library [4].
2. Based on the results, the visualization tools generate various forms of reports, such as histograms, pie charts and graphs.

IV. CONCLUSION

Public opinion analysis [4] plays an important role in the social networking society. On social networking sites, comments and reviews are the most important way for users to express their views and thoughts. Using this data we can find out which trends are current. Trend analysis is very important and useful in helping the public make their own decisions easily. Hadoop is used as the Big Data processing platform and Mahout as the Big Data mining library.

V. REFERENCES

[1] Guellil, I.; Boukhalfa, K., "Social big data mining: A survey focused on opinion mining and sentiments analysis," in Programming and Systems (ISPS), IEEE 2015 12th International Symposium on, pp. 1-10, 28-30 April 2015.
[2] Hua Fang; Zhaoyang Zhang; Chanpaul Jin Wang; Daneshmand, M.; Chonggang Wang; Honggang Wang, "A survey of big data research," in Network, IEEE, vol. 29, no. 5, pp. 6-9, September-October 2015.
[3] Singh, N.; Agrawal, S., "A review of research on MapReduce scheduling algorithms in Hadoop," in Computing, Communication & Automation (ICCCA), IEEE 2015 International Conference on, pp. 637-642, 15-16 May 2015.
[4] Songtao Shang; Minyong Shi; Wenqian Shang; Zhiguo Hong, "Research on public opinion based on Big Data," in Computer and Information Science (ICIS), 2015 IEEE/ACIS 14th International Conference on, pp. 559-562, June 28 2015-July 1 2015.
[5] Jiliang Tang, Yi Chang, and Huan Liu, "Mining social media with social theories: A survey," SIGKDD Explor. Newsl., 15(2):20-29, June 2014.
[6] Christo Wilson, Alessandra Sala, Krishna PN Puttaswamy, and Ben Y Zhao, "Beyond social graphs: User interactions in online social networks and their implications," ACM Transactions on the Web (TWEB), 6(4):17, 2012.
[7] Mostafa Karamibekr and Ali A Ghorbani, "A structure for opinion in social domains," in Social Computing (SocialCom), 2013 International Conference on, pages 264-271. IEEE, 2013.
[8] Valakunde, N.D.; Patwardhan, M.S., "Multi-aspect and Multi-class Based Document Sentiment Analysis of Educational Data Catering Accreditation Process," in Cloud & Ubiquitous Computing & Emerging Technologies (CUBE), 2013 International Conference on, pp. 188-192, 15-16 Nov. 2013.
[9] Mostafa Karamibekr and Ali A Ghorbani, "Sentence subjectivity analysis in social domains," in Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on, volume 1, pages 268-275. IEEE, 2013.
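
As an illustration of the map and reduce functions described in Section II.b, the following is a minimal single-process sketch in Python. It is not Hadoop code and is not the system described in this paper; the names map_fn, reduce_fn and run_job and the sample messages are hypothetical, chosen only to show how intermediate key-value pairs are grouped by key before being reduced.

```python
from collections import defaultdict


def map_fn(message):
    """Map: emit an intermediate (word, 1) pair for every word in one message."""
    for word in message.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: merge all values that share one key into a single count."""
    return key, sum(values)


def run_job(messages):
    """Run map, group-by-key and reduce in one process (no parallelism)."""
    # Grouping step: collect the intermediate values under their key,
    # which is the role the MapReduce library plays between map and reduce.
    groups = defaultdict(list)
    for message in messages:
        for key, value in map_fn(message):
            groups[key].append(value)
    # Apply the reduce function once per distinct key.
    return dict(reduce_fn(key, values) for key, values in groups.items())


# Hypothetical sample messages, standing in for collected social media text.
messages = ["hadoop trend", "big data trend", "trend analysis"]
counts = run_job(messages)
print(counts["trend"])  # 3
```

In a real Hadoop job the map calls run in parallel over chunks of the input and the grouping happens across the cluster; the sketch keeps only the data flow, so the most frequent keys in counts play the role of candidate trend key words.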