International Journal of Advance Foundation And Research In Science & Engineering (IJAFRSE)
Volume 2, Special Issue , Vivruti 2016. Impact Factor: 1.036, Science Central Value: 26.54
The Survey: Trend Analysis Using Big Data
Sushant Savita Madhukar Gandhi (PG Scholar)*, Prof. Sachin Bojewar, Associate Professor (VIT)*
ABSTRACT
This paper summarizes a survey of trend analysis and illustrates the significance of its methodologies. We have developed a trend analysis system using big data with the help of the MapReduce framework (Hadoop). The work draws on research on Big Data, the MapReduce framework and data mining techniques. We focus on a variety of techniques and approaches, and on the different research areas that are helpful and important to the field of trend analysis using Big Data.
Keywords: MapReduce framework, Big Data, Data mining techniques
I. INTRODUCTION
A large number of social media websites enable users to contribute, modify and rate content, giving them the opportunity to express their personal opinions about specific topics. Examples of such websites include blogs, forums, product review sites and social networks.
Surveys of daily data generation on social websites report that the Like button of Facebook, the world's largest online community, is pressed 2.7 billion times every day across the web, and that 10 million photos are uploaded to Facebook every day. In addition, 300 million tweets are sent on Twitter every day, 3000 photos are uploaded to Flickr every minute, and 153 million blogs are posted daily on social sites. Much of this content consists of comments or reviews on brands, products, services, politics, etc., and all of it arrives in an unstructured format [1].
a. Social media trend analysis
Social media trend analysis is the process of analyzing social media datasets and extracting current trends from them [1]. It builds on social media mining, which covers community analysis or detection, opinion mining and sentiment analysis, product review analysis, blog analysis, etc. Social Big Data is considered to consist of two parts, each represented in a different formalism: social relations and social content. Social relations are represented by social graphs, in which nodes represent the users of social media and links represent the relations between them, such as publishing a post, liking and tagging [6]. The second part, social content (i.e., social text), consists of the texts exchanged between users. Social content is characterized by the linguistic aspects of the different interactions in social media, such as topics, keywords and expressed sentiments [5].
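To make the two-part view of Social Big Data concrete, the following minimal Python sketch stores social relations as a directed graph of typed interactions and social content as a simple bag of words. It is an illustration only, not the system described in this paper; the user names, interaction records and texts are hypothetical.

```python
from collections import defaultdict

# Hypothetical interaction records: (source user, target user, interaction type).
interactions = [("alice", "bob", "like"), ("bob", "carol", "tag"),
                ("alice", "carol", "post"), ("carol", "alice", "like")]

# Hypothetical social content: texts published by each user.
posts = {"alice": ["great phone, love the camera"],
         "bob": ["battery life is poor"],
         "carol": ["love the new camera update"]}

def build_social_graph(records):
    """Social relations: user -> {other user: [interaction types]}."""
    graph = defaultdict(lambda: defaultdict(list))
    for src, dst, kind in records:
        graph[src][dst].append(kind)
    return graph

def keyword_counts(content):
    """Social content: a simple bag-of-words count over all exchanged texts."""
    counts = defaultdict(int)
    for texts in content.values():
        for text in texts:
            for word in text.replace(",", " ").lower().split():
                counts[word] += 1
    return counts

graph = build_social_graph(interactions)
keywords = keyword_counts(posts)
print(dict(graph["alice"]))                  # {'bob': ['like'], 'carol': ['post']}
print(keywords["camera"], keywords["love"])  # 2 2
```

The graph answers relation questions (who interacts with whom, and how), while the word counts are the starting point for the linguistic analysis of topics, keywords and sentiments.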
64 | © 2016, IJAFRSE and VIVRUTI 2016 All Rights Reserved
www.ijafrse.org
b. Opinion mining and sentiment analysis
Opinion mining and sentiment analysis are two research areas that lie between the fields of natural language processing, artificial intelligence and text mining. They operate at three different levels, namely the document level, the sentence level and the aspect level [7]. At the document level, the task is to classify the feeling expressed by the author in the whole text document as positive, negative or neutral [8]. At the sentence level, it is first necessary to determine the subjectivity of each sentence and then to determine the polarity (positive, negative or neutral) of the subjective sentences [9]. Finally, the aspect level is addressed by first extracting the sentences or documents that contain the entities together with their associated aspects.
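A toy lexicon-based sketch can illustrate the three levels. The word list `LEXICON`, the scoring rule (the sign of the summed word scores gives the polarity) and the keyword-matching notion of an "aspect" are assumptions chosen for brevity; the cited works use considerably richer methods.

```python
import re

# Hypothetical toy opinion lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}

def words(text):
    return re.findall(r"[a-z']+", text.lower())

def polarity(text):
    """Sign of the summed lexicon scores: positive, negative or neutral."""
    total = sum(LEXICON.get(w, 0) for w in words(text))
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

def document_level(document):
    """Document level: one polarity for the whole text."""
    return polarity(document)

def sentence_level(document):
    """Sentence level: keep subjective sentences (those containing an opinion word),
    then label the polarity of each one."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [(s, polarity(s)) for s in sentences if any(w in LEXICON for w in words(s))]

def aspect_level(document, aspect):
    """Simplified aspect level: polarities of subjective sentences mentioning the aspect."""
    return [p for s, p in sentence_level(document) if aspect in words(s)]

review = ("The phone arrived on Monday. The camera is great. "
          "The battery is poor and I hate the charger.")
print(document_level(review))          # negative
print(aspect_level(review, "camera"))  # ['positive']
```

The first sentence contains no opinion word, so the sentence level discards it as objective. Note how the document-level label hides the positive opinion about the camera, which the aspect level recovers.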
1024 Gigabytes = 1 Terabyte; 1024 Terabytes = 1 Petabyte; 1024 Petabytes = 1 Exabyte; 1024 Exabytes = 1 Zettabyte; 1024 Zettabytes = 1 Yottabyte.
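The conversion chain above can be applied mechanically. The following is a minimal sketch that assumes the same factor of 1024 between consecutive units; the function name `convert` is illustrative.

```python
# Units in the order listed above; each is 1024 times the previous one.
UNITS = ["Gigabyte", "Terabyte", "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]

def convert(value, from_unit, to_unit):
    """Convert a data size between two of the units using a factor of 1024 per step."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1024 ** steps

print(convert(1, "Zettabyte", "Terabyte"))  # 1073741824
print(convert(2.7, "Zettabyte", "Terabyte"))
```

For example, one zettabyte corresponds to 1024 × 1024 × 1024 terabytes under this convention.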
II. METHODOLOGY
a. Big Data
The first term used here is “Big Data”. It can be defined in different ways, but three features are commonly understood as its characteristics: first, “Volume” (large data sets); second, “Variety” (different types of data from countless sources); and third, “Velocity” (data collected in real time). Big Data requires new forms of processing to enable enhanced decision making, insight discovery and process optimization. For example, 2.7 zettabytes of data exist in the digital universe, and the U.S. Library of Congress had already collected 235 terabytes of data by April 2011, with further data generated every day by business-to-business communication alone [2].
b. MapReduce
The second term is the “MapReduce” framework of Apache Hadoop. MapReduce is a framework with which we can write applications that process huge amounts of data (Big Data) in parallel on large clusters of commodity hardware in a reliable manner. Hadoop's MapReduce is Java-based and runs such programs for distributed computing. A MapReduce [3] job splits the input data set into independent chunks, which are processed in parallel by map tasks whose output is then passed to reduce tasks. The model has two important functions. The map function, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key and passes them to the reduce function. The reduce function, also written by the user, accepts an intermediate key and a set of values for that key, and merges these values to form a possibly smaller set of values.
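The map, group-by-key and reduce steps can be simulated in a single Python process. This sketch is not Hadoop's Java API: `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names, and the classic word count stands in for a real trend analysis job.

```python
from collections import defaultdict
from itertools import chain

def map_fn(_offset, line):
    """User-written map: (offset, line) -> intermediate (word, 1) pairs."""
    return [(word, 1) for word in line.lower().split()]

def reduce_fn(word, counts):
    """User-written reduce: merge all values for one key into a smaller set (one sum)."""
    return [sum(counts)]

def run_mapreduce(records, mapper, reducer, num_chunks=2):
    # Split the input round-robin into chunks; on Hadoop each chunk would feed its own map task.
    chunks = [records[i::num_chunks] for i in range(num_chunks)]
    intermediate = chain.from_iterable(
        mapper(key, value) for chunk in chunks for key, value in chunk)
    # Shuffle: group all intermediate values that share the same intermediate key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce each key group independently (also parallel on a real cluster).
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

lines = list(enumerate(["big data trend", "trend analysis", "big trend"]))
print(run_mapreduce(lines, map_fn, reduce_fn))
# {'analysis': [1], 'big': [2], 'data': [1], 'trend': [3]}
```

Because the user functions only see one record (map) or one key group (reduce) at a time, the framework is free to run them on different machines.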
c. Core Algorithms
The Mahout clustering and frequent pattern mining algorithms
Canopy algorithm
The Canopy algorithm divides clustering into two phases. First, a simple distance computation method is used to divide the data set into overlapping subsets called canopies. Second, a precise and rigorous computation method is used to calculate the distances between the data vectors in the same canopy. The difference between the Canopy algorithm and traditional clustering algorithms is that Canopy uses two distance computation methods and computes precise distances only between data vectors that fall into the same overlapping canopy [4].
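A minimal sketch of the two phases follows. It assumes Manhattan distance as the cheap metric, Euclidean distance as the precise one, and the usual pair of thresholds T1 > T2 (loose and tight). These choices and the function names are illustrative, not Mahout's implementation.

```python
import math
from itertools import combinations

def cheap_distance(a, b):
    """Inexpensive metric for phase 1 (Manhattan distance, an illustrative choice)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def precise_distance(a, b):
    """Precise metric for phase 2 (Euclidean distance)."""
    return math.dist(a, b)

def make_canopies(points, t1, t2):
    """Phase 1: split the data into possibly overlapping canopies; requires t1 > t2."""
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    candidates, canopies = list(points), []
    while candidates:
        center = candidates.pop(0)
        # Every point within the loose threshold t1 joins this canopy.
        canopies.append([center] + [p for p in candidates if cheap_distance(center, p) < t1])
        # Points within the tight threshold t2 may not start a canopy of their own;
        # points between t2 and t1 stay candidates, so canopies can overlap.
        candidates = [p for p in candidates if cheap_distance(center, p) >= t2]
    return canopies

def precise_distances(canopies):
    """Phase 2: compute the precise metric only for pairs that share a canopy."""
    return {(a, b): precise_distance(a, b)
            for members in canopies for a, b in combinations(members, 2)}

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (5, 5)]
canopies = make_canopies(points, t1=4.0, t2=1.5)
print(len(canopies))  # 3
print(len(precise_distances(canopies)), "of", len(points) * (len(points) - 1) // 2, "pairs")  # 4 of 15 pairs
```

The saving comes from phase 2: only 4 of the 15 possible pairs need the expensive metric here.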
K-Means algorithm
The K-Means algorithm is a widely used clustering algorithm. First, k objects are selected at random, each representing an initial cluster center. Each remaining object is assigned to the cluster whose center is at the smallest distance from it, and then each cluster center is recalculated. This process is repeated until the clustering criterion function converges [4].
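The iteration can be sketched as follows. The stopping rule used here (the centers stop changing) is an assumption standing in for "the criterion function converges". Euclidean distance and the fixed random seed are also illustrative choices.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means on a list of numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # randomly select k objects as initial centers
    clusters = []
    for _ in range(max_iter):
        # Assign every object to the cluster whose center is nearest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members (keep the old center if empty).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # assumed stopping rule: centers no longer change
            break
        centers = new_centers
    return centers, clusters

pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Because the initial centers are random, real deployments usually run several restarts, or seed the centers from a cheaper pass such as Canopy.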
FP-tree algorithm
First, this algorithm compresses the input database by creating an FP-tree instance to represent the frequent items. Then it divides the compressed database into a set of conditional databases, each associated with one frequent pattern. Finally, each conditional database is mined separately [4].
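The three steps correspond to the FP-growth approach. The following is a compact pure-Python sketch of it, not Mahout's implementation. Transactions are given as hypothetical `(items, count)` pairs so that the same function can mine the conditional databases recursively.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(transactions, min_support):
    """Compress (items, count) transactions into an FP-tree; return supports and header table."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Keep only frequent items, in descending support order (ties broken by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])  # header links all nodes of an item
            node = node.children[item]
            node.count += count
    return frequent, header

def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) for every itemset with support >= min_support."""
    frequent, header = build_tree(transactions, min_support)
    for item, sup in frequent.items():
        itemset = (item,) + suffix
        yield tuple(sorted(itemset)), sup
        # Conditional database for this item: the prefix path above each of its nodes.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Each conditional database is mined separately (recursively).
        yield from fp_growth(conditional, min_support, itemset)

baskets = [(["bread", "milk"], 1),
           (["bread", "diaper", "beer", "eggs"], 1),
           (["milk", "diaper", "beer", "cola"], 1),
           (["bread", "milk", "diaper", "beer"], 1),
           (["bread", "milk", "diaper", "cola"], 1)]
patterns = dict(fp_growth(baskets, min_support=3))
print(patterns[("beer", "diaper")])  # 3
```

Shared prefixes are stored once in the tree, so the original database never has to be rescanned during mining.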
III. IMPLEMENTATION
Step 1. Preparation phase:
1. The agent reads keywords from a dictionary that contains public opinion hot topics and
trend keywords [4].
2. Using these keywords, the agent searches the messages in the collection of databases
gathered from various social sites.
3. All matching messages are grouped into database sets.
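The preparation phase can be sketched as a keyword search followed by grouping. The keyword dictionary, the sample messages and the function name `group_by_keyword` are hypothetical. Simple whole-word matching is assumed in place of whatever search the agent in [4] performs.

```python
from collections import defaultdict

# Hypothetical dictionary of public opinion hot topics and trend keywords.
KEYWORDS = {"election", "budget", "smartphone"}

# Hypothetical messages collected from various social sites: (site, text).
messages = [
    ("twitter", "Budget debate tonight"),
    ("forum", "Which smartphone has the best camera?"),
    ("blog", "Election results and the new budget"),
    ("twitter", "Lunch was great"),
]

def group_by_keyword(msgs, keywords):
    """Search each message for dictionary keywords and group matches into per-keyword sets."""
    groups = defaultdict(list)
    for site, text in msgs:
        tokens = {t.strip("?.,!").lower() for t in text.split()}
        for kw in keywords & tokens:
            groups[kw].append((site, text))
    return groups

database_sets = group_by_keyword(messages, KEYWORDS)
print(sorted((kw, len(v)) for kw, v in database_sets.items()))
# [('budget', 2), ('election', 1), ('smartphone', 1)]
```

A message that mentions several keywords lands in several sets, and messages with no keyword are dropped before the processing phase.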
Step 2. Processing phase:
1. The TF-Gini algorithm is used to select feature words, which are stored in the
intermediate text library [4].
2. The Hadoop Mahout text mining algorithms process the public opinion messages. This
is the most important step in the system. The mining results are stored in the text
feature library.
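The exact TF-Gini formula from [4] is not given in this paper. As an approximation, the sketch below scores each word w with the Gini-index form Σ_c P(w|c)² · P(c|w)², estimating the probabilities from term frequencies. Two further assumptions apply: the "classes" c are taken to be the topic sets from the preparation phase, and the function names are hypothetical.

```python
from collections import Counter

def gini_scores(labelled_docs):
    """Score each word with sum over classes c of P(w|c)^2 * P(c|w)^2.
    Probabilities are estimated from raw term frequencies (an assumption)."""
    tf = {}  # class label -> Counter of word frequencies in that class
    for label, text in labelled_docs:
        tf.setdefault(label, Counter()).update(text.lower().split())
    totals = {c: sum(counts.values()) for c, counts in tf.items()}
    vocabulary = set().union(*tf.values())
    scores = {}
    for w in vocabulary:
        overall = sum(counts[w] for counts in tf.values())  # frequency of w in all classes
        scores[w] = sum((counts[w] / totals[c]) ** 2 * (counts[w] / overall) ** 2
                        for c, counts in tf.items())
    return scores

def select_feature_words(labelled_docs, top_k):
    """Keep the top_k highest-scoring words (ties broken alphabetically)."""
    scores = gini_scores(labelled_docs)
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

docs = [("budget", "tax budget cut"), ("budget", "budget tax the"),
        ("phone", "camera phone the"), ("phone", "phone battery the")]
print(select_feature_words(docs, 3))  # ['budget', 'phone', 'tax']
```

Words concentrated in one class (such as "tax") score higher than words spread across classes (such as "the"), which is the property a Gini-style feature selector relies on.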
Step 3. Analysis phase:
1. The visualization tools read the mining results from the text library [4].
2. Based on these results, the visualization tools generate various forms of reports, such as
histograms, pie charts and graphs.
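As a stand-in for the visualization tools, a plain-text histogram shows the kind of report generated from the mining results. The counts and the function name `text_histogram` are hypothetical; a real deployment would use a charting library instead.

```python
def text_histogram(counts, width=20):
    """Render {label: count} as text bars; the largest count gets `width` marks."""
    if not counts:
        return ""
    peak = max(counts.values())
    rows = []
    for label, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * round(n * width / peak) if peak else ""
        rows.append(f"{label:<12} {bar} {n}")
    return "\n".join(rows)

# Hypothetical mining results: number of messages per trend keyword.
report = text_histogram({"budget": 120, "election": 60, "smartphone": 30})
print(report)
```

Sorting by descending count puts the dominant trend first, which is usually what a reader of such a report wants to see.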
IV. CONCLUSION
Public opinion analysis [4] plays an important role in the social networking society. Comments and reviews on social networking sites are the most important means by which users express their views and thoughts. Using these data, we can find out which trends are currently emerging. Trend analysis is very important and useful in helping the public make their own decisions easily. Hadoop is used as the Big Data processing platform, and Mahout as the Big Data mining library.
V. REFERENCES
[1] Guellil, I.; Boukhalfa, K., "Social big data mining: A survey focused on opinion mining and sentiments analysis," in 12th International Symposium on Programming and Systems (ISPS), IEEE, pp. 1-10, 28-30 April 2015.
[2] Fang, H.; Zhang, Z.; Wang, C.J.; Daneshmand, M.; Wang, C.; Wang, H., "A survey of big data research," IEEE Network, vol. 29, no. 5, pp. 6-9, September-October 2015.
[3] Singh, N.; Agrawal, S., "A review of research on MapReduce scheduling algorithms in Hadoop," in 2015 International Conference on Computing, Communication & Automation (ICCCA), IEEE, pp. 637-642, 15-16 May 2015.
[4] Shang, S.; Shi, M.; Shang, W.; Hong, Z., "Research on public opinion based on Big Data," in 2015 IEEE/ACIS 14th International Conference on Computer and Information Science (ICIS), pp. 559-562, June 28 2015-July 1 2015.
[5] Tang, J.; Chang, Y.; Liu, H., "Mining social media with social theories: A survey," SIGKDD Explor. Newsl., vol. 15, no. 2, pp. 20-29, June 2014.
[6] Wilson, C.; Sala, A.; Puttaswamy, K.P.N.; Zhao, B.Y., "Beyond social graphs: User interactions in online social networks and their implications," ACM Transactions on the Web (TWEB), vol. 6, no. 4, article 17, 2012.
[7] Karamibekr, M.; Ghorbani, A.A., "A structure for opinion in social domains," in 2013 International Conference on Social Computing (SocialCom), IEEE, pp. 264-271, 2013.
[8] Valakunde, N.D.; Patwardhan, M.S., "Multi-aspect and Multi-class Based Document Sentiment Analysis of Educational Data Catering Accreditation Process," in 2013 International Conference on Cloud & Ubiquitous Computing & Emerging Technologies (CUBE), pp. 188-192, 15-16 Nov. 2013.
[9] Karamibekr, M.; Ghorbani, A.A., "Sentence subjectivity analysis in social domains," in 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), vol. 1, pp. 268-275, 2013.