International Journal of Applied Environmental Sciences
ISSN 0973-6077 Volume 11, Number 1 (2016), pp. 259-266
© Research India Publications
http://www.ripublication.com
A Comprehensive Survey of Approaches Used For
Detecting Events in Twitter
Jaya Sree R.V. (1), Dhanalakshmi G. (2), Poonkuzhali S.M. (3), Sharmila Banu A. (4)
(1) Assistant Professor, Department of Information Technology,
Panimalar Institute of Technology, Varadharajapuram, Poonamalle, Chennai, India.
E-mail: [email protected]
(2) Assistant Professor (Grade-1), Department of Information Technology,
Panimalar Institute of Technology, Varadharajapuram, Poonamalle, Chennai, India.
E-mail: [email protected]
(3) Assistant Professor, Department of Computer Science and Engineering,
Panimalar Institute of Technology, Varadharajapuram, Poonamalle, Chennai, India.
E-mail: [email protected]
(4) Assistant Professor, Department of Information Technology,
Panimalar Institute of Technology, Varadharajapuram, Poonamalle, Chennai, India.
E-mail: [email protected]
Abstract
Microblogs are web applications that act as a broadcast medium in which
people share statuses, information, links, images, videos and opinions in
short messages. They offer a lightweight, easy and fast way for users to
communicate. Some of the prevalent microblogging services are Twitter,
Facebook and Google+. Messages posted on Twitter are called tweets. They
carry information ranging from everyday life events to the latest worldwide
news. Analyzing such a rich source of user-generated data can yield valuable
information and helps to identify events that occurred over space and time.
Event detection from Twitter data poses many new challenges compared with
event detection from traditional media. This paper provides a survey of
various techniques used for detecting events from Twitter.
Keywords: Micro blogging, event detection, event identification, analyzing
social media, twitter data streams
Introduction
Social networking sites, also called microblogs, have been among the fastest
emerging services in recent years and have become a new type of real-time
communication channel. They are very popular because of characteristics such as
immediacy, portability and ease of use. People around the world use such media
to communicate instantly about real-time happenings and to express their
opinions on a common topic publicly through their computers or mobile phones
[14]. The network allows each user to create an identity and share thoughts
with other users. This helps people to build social relationships, to identify
users with the same interests and to view the content posted by other users
[15].
Event detection has been a prominent research topic for a long time [24].
Social media are widely used as a source of information for detecting events
such as traffic congestion, incidents and natural disasters (earthquakes,
storms, floods, fires, etc.). An event is defined as a real-life happening that
occurs over a period of time and space [14]. Event detection in Twitter is used
for various real-time notifications, such as calls for help during a
large-scale disaster and real-time traffic updates.
Twitter is one of the most prevalent microblogs in recent times, producing
more than 400 million tweets by more than 140 million users per day. Twitter
allows users to post messages called tweets, which are no longer than 140
characters. Each user has several followers, who can retweet the user's posts.
Because of the huge user base, the content also varies with the users'
interests and behavior [11].
Unlike other media, tweets provide timely and fine-grained information about
any kind of event. Twitter has also emerged as one of the fastest communication
media for collecting and broadcasting breaking news [3][17][16]. Tweets can be
regarded as a dynamic source of information. Mining or analyzing such a rich
source of data can reveal previously unknown information. It has also become a
powerful tool to monitor terrorist activity and to predict crime [21].
Event detection in traditional media has long been addressed in a research
program called Topic Detection and Tracking (TDT), which determines events
from news articles [1][23]. Event detection from Twitter data poses new
challenges because tweets are unstructured, informal and often meaningless, and
contain abbreviated words. Social media content may also contain polluted and
false data, which affects event detection algorithms.
This article surveys approaches and techniques for event detection. The
techniques are categorized by the type of event, the detection method and the
detection task.
Overview of Twitter
As described in the introduction, Twitter is a fast-growing and popular
microblogging service. Twitter was developed by Biz Stone, Jack Dorsey, Evan
Williams and Noah Glass in March 2006 and launched in July 2006. It has been
ranked as the eighth most popular site. Twitter allows users to post short and
concise messages of any kind, including their thoughts and opinions on
real-time happenings in and around them.
Twitter allows users to create their own profile and connect with other users.
A user can follow any other user without their approval. Twitter imposes no
limit on the number of followers a user may have, but a user can typically
follow only up to 2000 users. Registered users can post and read tweets,
whereas unregistered users can only read them; however, users can set privacy
options to hide their content. A user can read the timeline of other users to
see their chronological streams of tweets.
Twitter allows users to use symbols such as @ and #. A hashtag is any keyword
or phrase preceded by the symbol # and is used to group tweets by topic, so
hashtags collect the tweets of different users on the same subject of interest.
The symbol @ before a username creates a mention or reply link. Users can also
repost the tweets of other users using RT (retweet).
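As a minimal sketch of how these conventions can be read out of raw tweet text, the following Python snippet extracts hashtags, mentions and the retweeted user with regular expressions. The function name parse_tweet, the patterns and the sample tweet are illustrative assumptions, not part of any Twitter API; real tweets would need more careful handling (for example of punctuation and non-ASCII usernames).

```python
import re

# Illustrative patterns (assumptions): a hashtag is '#' plus word characters,
# a mention is '@' plus word characters, a retweet starts with 'RT @user'.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
RETWEET_RE = re.compile(r"^RT\s+@(\w+)")


def parse_tweet(text):
    """Extract hashtags, mentions and the retweeted user from raw tweet text."""
    rt = RETWEET_RE.match(text)
    return {
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(text)],
        "mentions": MENTION_RE.findall(text),
        "retweet_of": rt.group(1) if rt else None,
    }


# Hypothetical sample tweet.
example = parse_tweet("RT @alice: Flooding near the river #Flood #chennai")
print(example)
```

Lower-casing the hashtags is a design choice made here so that #Flood and #flood fall into the same group.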
Twitter provides an API, the Twitter streaming API, that allows users to access
public data and many services programmatically. The content of Twitter is
categorized as follows:
Table 1: Categories of Twitter data
Category              Percentage
Pointless babblers    40%
Conversational        38%
Pass along value      9%
Self-promotion        6%
Spam                  4%
News                  4%
Figure 1: Architecture of Twitter streaming
Twitter messages are posted in a timely manner during disasters and other
emergency situations. Some examples are the bomb blasts in Mumbai in 2008, the
flooding of the Red River Valley in the US and Canada in 2009, and the US
Airways plane crash on the Hudson River in 2009. Twitter users thus act as
information sources or information seekers.
Challenges of analyzing twitter messages:
Twitter contains meaningless messages, the pointless babble category of Table 1,
as well as rumors [5]. The major challenge in event detection from Twitter is
therefore to separate this polluted content from messages about real-world
events.
Types of Events
Events have been classified as temporal events and spatio-temporal events.
Temporal events are related to time series. Spatio-temporal events are events
that occurred over a period of time at a specific location.
Event detection has also been classified as retrospective event detection (RED)
and new event detection (NED) [22, 2]. RED deals with historical events that
occurred in the past, whereas NED deals with events as they occur in live
streams.
Event Detection Algorithms
In Traditional Media:
Event detection algorithms in traditional media can be classified into document
pivot methods and feature pivot methods.
a) Document pivot method
In the document pivot method, events are detected by clustering similar
documents. The similarity of documents is determined from the semantic distance
between them and their textual similarity. In traditional media, event
detection is performed on news documents. The research project TDT focuses on
this issue [1]. TDT (Topic Detection and Tracking) consists of three major tasks:
• Segmentation
• Detection
• Tracking
TDT attempts to identify the important events from the newswire and present
them to the users in a concise manner.
The corpus of news documents is preprocessed and the data are represented in a
specific format for further investigation. The preprocessing step includes stop
word removal, stemming and tokenization. The preprocessed data are represented
in term vector or bag-of-words format. Each term of a document is weighted and
represented in a tf-idf (term frequency-inverse document frequency) matrix
[19]. Another form of data representation is the named entity representation
[12]. Sometimes both representations are combined into a mixed vector [23, 12].
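The representation described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure of any cited system: the stop word list is a toy list, stemming is omitted, the weight is the plain tf × log(N/df) variant, and the three sample documents are invented. Cosine similarity between the resulting vectors is one common way to measure the textual similarity on which document pivot clustering relies.

```python
import math
import re
from collections import Counter

# Toy stop word list (assumption); a real system would use a fuller list.
STOP_WORDS = {"the", "a", "an", "in", "on", "of", "and", "is", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented sample documents: two about the same event, one unrelated.
docs = [
    "Earthquake hits the coastal city",
    "Strong earthquake shakes coastal region",
    "Football final ends in a draw",
]
vecs = tfidf_vectors(docs)
sim_same_event = cosine(vecs[0], vecs[1])
sim_other_event = cosine(vecs[0], vecs[2])
print(sim_same_event, sim_other_event)
```

Documents about the same event share weighted terms and therefore score higher than unrelated documents, which share none.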
b) Feature pivot method
In the feature pivot method, the distribution of words in the corpus is
analyzed and words are grouped together to detect events.
An infinite-state automaton is modeled to identify bursts in a document stream
over a particular period of time [10]. Alternatively, the appearance of
individual words is modeled using a binomial distribution and bursty words are
detected with threshold-based heuristics; such bursty words are then grouped
together to identify bursty events [6]. Both algorithms detect events by
analyzing words in the time domain.
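To make the idea of a binomial word model with a threshold concrete, here is a minimal Python sketch. It is a simplification and not the algorithm of [6]: the number of documents containing a word in the current window is assumed to follow a binomial distribution whose probability is estimated from historical counts, and a word is flagged as bursty when its count exceeds the expected value by k standard deviations. The function names, the value k = 3 and the counts are illustrative assumptions.

```python
import math


def is_bursty(count, n_docs, p_expected, k=3.0):
    """Flag a count that exceeds the binomial mean by more than k std devs."""
    mean = n_docs * p_expected
    std = math.sqrt(n_docs * p_expected * (1.0 - p_expected))
    return count > mean + k * std


def bursty_words(history, current, n_hist_docs, n_cur_docs, k=3.0):
    """history and current map each word to the number of documents containing it."""
    result = []
    for word, count in current.items():
        # Add-one smoothing so that unseen words get a small non-zero probability.
        p = (history.get(word, 0) + 1) / (n_hist_docs + 2)
        if is_bursty(count, n_cur_docs, p, k):
            result.append(word)
    return sorted(result)


# Invented counts: 1000 historical documents, 100 documents in the current window.
history = {"traffic": 100, "earthquake": 2, "coffee": 150}
current = {"traffic": 12, "earthquake": 40, "coffee": 14}
burst = bursty_words(history, current, n_hist_docs=1000, n_cur_docs=100)
print(burst)
```

In this toy data only "earthquake" appears far more often than its history predicts; words that are frequent all the time, such as "coffee", are not flagged.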
Words can also be analyzed in the frequency domain using the discrete Fourier
transform (DFT) [8] and the wavelet transform [9, 7].
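As a rough illustration of frequency-domain analysis, the sketch below applies a plain DFT to the daily frequency of a single word and reports the period with the largest power. It is only an assumption-laden example, not the feature analysis of [8]: the signal is a synthetic four-week series with a weekly cycle, and the function names are hypothetical.

```python
import cmath
import math


def dft_power(signal):
    """Power |X_k|^2 for k = 1 .. n//2 of the mean-centred signal."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    power = {}
    for k in range(1, n // 2 + 1):
        x_k = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                  for t, c in enumerate(centred))
        power[k] = abs(x_k) ** 2
    return power


def dominant_period(signal):
    """Period (in samples) of the frequency component with the largest power."""
    power = dft_power(signal)
    k_best = max(power, key=power.get)
    return len(signal) / k_best


# Synthetic daily counts of one word over 28 days with a weekly cycle (assumption).
weekly = [5 + 4 * math.cos(2 * math.pi * day / 7) for day in range(28)]
period = dominant_period(weekly)
print(period)
```

A word tied to a periodic event concentrates its power at one frequency, whereas a one-off burst spreads its power over many frequencies, which is what makes the spectrum useful for telling the two apart.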
In Twitter:
Events in Twitter messages are generally detected using machine learning
approaches, either supervised or unsupervised. The detection task falls under
one of the two types (NED or RED) described under Types of Events.
Most research emphasizes new event detection (NED), but in recent years
interest in RED has also increased, for example in query-based event retrieval
[13].
The targeted events can be either specified or unspecified. Specified events
include planned social events or known events (e.g., a disaster in a particular
region). In specified event detection, information such as time, venue and
location is known. Unspecified events include unplanned or unknown events
(e.g., tweets about late-breaking news) [20].
Detection of events from the Twitter stream draws on techniques from machine
learning and data mining, natural language processing (NLP), text mining,
information extraction and information retrieval.
Unsupervised learning: As in traditional media, unspecified events in Twitter
are detected with unsupervised learning, using clustering algorithms.
Unsupervised detection of unspecified events is more appropriate for NED than
for RED. Clustering for NED is reported to detect events accurately and
efficiently [20, 4, 17].
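One simple way to cluster a stream for new event detection is single-pass (incremental) clustering, sketched below in Python. This is a generic illustration and not the method of [20], [4] or [17]: each incoming tweet is compared with the existing cluster centroids by cosine similarity over raw term counts, and a tweet whose best similarity falls below a threshold opens a new cluster, which is treated as a candidate new event. The threshold of 0.3 and the sample stream are assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Term counts of a tweet; '#' and '@' are kept as part of the token."""
    return Counter(re.findall(r"[a-z0-9#@]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two Counters of term counts."""
    dot = sum(c * v[t] for t, c in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def online_cluster(tweets, threshold=0.3):
    """Single pass: each tweet joins the closest cluster or opens a new one."""
    centroids, members = [], []
    for i, text in enumerate(tweets):
        vec = bag_of_words(text)
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best].update(vec)      # centroid kept as summed term counts
            members[best].append(i)
        else:
            centroids.append(Counter(vec))   # low similarity: candidate new event
            members.append([i])
    return members


# Invented stream of four tweets about two different happenings.
stream = [
    "earthquake felt in chennai",
    "huge traffic jam on ring road",
    "strong earthquake in chennai just now",
    "traffic jam again on ring road",
]
clusters = online_cluster(stream)
print(clusters)
```

The threshold controls granularity: a higher value splits the stream into more, smaller events, while a lower value merges loosely related tweets.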
Supervised learning: NED for specified events relies on supervised learning.
Many supervised learning algorithms have been proposed for event detection,
among them the Naïve Bayes classifier, support vector machines (SVM) and
decision trees [18].
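As an illustration of the supervised route, the following Python sketch implements a small multinomial Naïve Bayes classifier with add-one smoothing and uses it to label tweets as event-related or not. It is a teaching example under stated assumptions, not the classifier of any cited work: the class name, the two labels and the four training tweets are invented, and a practical system would need far more labeled data and richer features.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z]+", text.lower())

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(self._tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in self._tokens(text):
                if w in self.vocab:                # ignore words unseen in training
                    score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training data with two hypothetical labels.
train_texts = [
    "earthquake shakes city buildings collapse",
    "flood warning issued river rising",
    "having coffee with friends",
    "so bored at home today",
]
train_labels = ["event", "event", "other", "other"]
model = NaiveBayes().fit(train_texts, train_labels)
label = model.predict("river flood near the city")
print(label)
```

Working with log probabilities avoids numerical underflow when many word probabilities are multiplied.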
In some research, supervised classification and unsupervised clustering are
combined for event detection: relevant or important tweets are identified using
classification before the clustering algorithm is applied.
Performance
One of the major issues in event detection from Twitter messages is performance
evaluation. In traditional media, the performance metrics include precision,
recall and F-measure.
Precision is defined as the fraction of detected events that are relevant.
\[ \text{Precision} = \frac{|\{\text{relevant events}\} \cap \{\text{detected events}\}|}{|\{\text{detected events}\}|} \]
Recall is defined as the fraction of relevant events that are detected.
\[ \text{Recall} = \frac{|\{\text{relevant events}\} \cap \{\text{detected events}\}|}{|\{\text{relevant events}\}|} \]
F-measure is the harmonic mean of precision and recall.
Recall is not a suitable measure for noisy data sets and is infeasible to
compute for Twitter messages. Thus precision is the metric used in most of the
existing research work.
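The three metrics can be computed directly from the sets of relevant and detected events. The sketch below follows the set-based definitions given above and uses the usual harmonic-mean form F = 2PR / (P + R); the function name and the event identifiers are illustrative assumptions.

```python
def precision_recall_f1(relevant, detected):
    """Set-based precision, recall and F-measure for detected events."""
    relevant, detected = set(relevant), set(detected)
    hits = len(relevant & detected)            # relevant events that were detected
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical event identifiers.
relevant_events = {"e1", "e2", "e3", "e4"}
detected_events = {"e1", "e2", "e5"}
p, r, f = precision_recall_f1(relevant_events, detected_events)
print(p, r, f)
```

Note that recall needs the full set of relevant events, which is exactly what is hard to obtain for a noisy Twitter stream, whereas precision only needs a relevance judgment for each detected event.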
Conclusion
This paper focuses on event detection from microblogs. Event detection finds
real-life happenings that occurred over a period of time and space. A
microblogging website provides valuable, previously unknown knowledge from
user-generated content. However, tweets contain humdrum messages (meaningless,
polluted and useless). Event detection thus helps to determine relevant
information of specific interest.
This paper provides a brief survey of various approaches used for detecting
events from Twitter messages. We also presented the characteristics of Twitter
and the challenges of analyzing tweets. We discussed the types of event
detection (NED, RED) and their suitable detection techniques. The detection
methods include unsupervised and supervised learning, which use clustering and
classification algorithms respectively. Finally, the performance metrics of
event detection algorithms are also discussed.
References
[1] Allan, J., J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. 1998. Topic
detection and tracking pilot study final report. In Proceedings of the DARPA
Broadcast News Transcription and Understanding Workshop, Lansdowne,
VA, pp. 194–218.
[2] Allan, J., R. Papka, and V. Lavrenko. 1998. Online new event detection and
tracking. In Proceedings of the 21st Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, SIGIR
’98, ACM, New York, NY, pp. 37–45.
[3] Amer-Yahia, S., S. Anjum, A. Ghenai, A. Siddique, S. Abbar, S. Madden, A.
Marcus, and M. El-Haddad. 2012. Maqsa: A system for social analytics on
news. In Proceedings of the 2012 ACM SIGMOD International Conference
on Management of Data, SIGMOD ’12, ACM, New York, NY, pp. 653–656.
[4] Becker, H., M. Naaman, and L. Gravano. 2011a. Beyond trending topics:
Real-world event identification on Twitter. In ICWSM, Barcelona, Spain.
[5] Castillo, C., M. Mendoza, and B. Poblete. 2011. Information credibility on
Twitter. In Proceedings of the 20th International Conference on World Wide
Web, WWW ’11, ACM, New York, NY, pp. 675–684.
[6] Fung, G. P. C., J. Xu Yu, P. S. Yu, and H. Lu. 2005. Parameter free bursty
events detection in text streams. In Proceedings of the 31st International
Conference on Very Large Data Bases, VLDB ’05, VLDB Endowment, pp.
181–192.
[7] Kaiser, G. 1994. A friendly guide to wavelets. Birkhauser Boston Inc.,
Cambridge, MA, USA.
[8] He, Q., K. Chang, and E.-P. Lim. 2007. Analyzing feature trajectories for
event detection. In Proceedings of the 30th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, SIGIR
’07, ACM, New York, NY, pp. 207–214.
[9] Daubechies, I. 1992. Ten lectures on wavelets. Society for Industrial and
Applied Mathematics, Philadelphia, PA, USA.
[10] Kleinberg, J. 2002. Bursty and hierarchical structure in streams. In
Proceedings of the Eighth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD ’02, ACM, New York, NY,
pp. 91–101.
[11] Krishnamurthy, B., P. Gill, and M. Arlitt. 2008. A few chirps about Twitter.
In Proceedings of the First Workshop on Online Social Networks, WOSN
’08, ACM, New York, NY, pp. 19–24.
[12] Kumaran, G., and J. Allan. 2004. Text classification and named entities for
new event detection. In Research and Development in Information Retrieval,
Sheffield, UK, pp. 297–304.
[13] Metzler, D., C. Cai, and E. H. Hovy. 2012. Structured event retrieval over
micro blog archives. In HLT-NAACL, pp. 646–655.
[14] Milstein, S., A. Chowdhury, G. Hochmuth, B. Lorica, and R. Magoulas. 2008.
Twitter and the micro-messaging revolution: Communication, connections,
and immediacy, 140 characters at a time. O’Reilly Media.
[15] Mislove, A., M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee.
2007. Measurement and analysis of online social networks. In Proc. 7th ACM
SIGCOMM Conf. Internet Meas., San Diego, CA, USA, pp. 29–42.
[16] Pak, A., and P. Paroubek. 2010. Twitter as a corpus for sentiment analysis
and opinion mining. In Proceedings of the Seventh International Conference
on Language Resources and Evaluation (LREC’10), European Language
Resources Association (ELRA), Valletta, Malta.
[17] Phuvipadawat, S., and T. Murata. 2010. Breaking news detection and tracking
in Twitter. In IEEE/WIC/ACM International Conference on Web Intelligence
and Intelligent Agent Technology (WI-IAT), Vol. 3, Toronto, ON, pp.
120–123.
[18] Popescu, A. M., M. Pennacchiotti, and D. Paranjpe. 2011. Extracting events
and event descriptions from Twitter. In Proceedings of the 20th International
Conference Companion on World Wide Web, WWW ’11, pp. 105–106.
[19] Salton, G. 1988. Automatic Text Processing: The Transformation, Analysis
and Retrieval of Information by Computer. Addison-Wesley: Boston.
[20] Sankaranarayanan, J., H. Samet, B. E. Teitler, M. D. Lieberman, and J.
Sperling. 2009. TwitterStand: News in tweets. In Proceedings of the 17th
ACM SIGSPATIAL International Conference on Advances in Geographic
Information Systems, GIS ’09, ACM, New York, NY, pp. 42–51.
[21] Wang, X., M. S. Gerber, and D. E. Brown. 2012. Automatic crime prediction
using events extracted from Twitter posts. In Proceedings of the 5th
International Conference on Social Computing, Behavioral-Cultural
Modeling and Prediction, SBP’12. Springer Verlag: Berlin, Heidelberg, pp.
231–238.
[22] Yang, Y., T. Pierce, and J. Carbonell. 1998. A study of retrospective and online event detection. In Proceedings of the 21st Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval,
SIGIR ’98, ACM, New York, NY, pp. 28–36.
[23] Yang, Y., J. G. Carbonell, R. D. Brown, T. Pierce, B. T. Archibald, and X.
Liu. 1999. Learning approaches for detecting and tracking news events. IEEE
Intelligent Systems, 14(4): 32–43.
[24] Yiming Yang, Tom Pierce, and Jaime Carbonell. A study of retrospective and
online event detection. In SIGIR ’98: Proceedings of the 21st annual
international ACM SIGIR conference on Research and development in
information retrieval, pages 28–36, New York, NY, USA, 1998. ACM.