Technology Research Preparation 32144
Assignment IV- Literature Review
Social Spammer Behaviour the NEXT MOVE
(Track Tag System on Social Network Analysis)
Prepared By: Anand A. Chinchore
UTS ID: 11621392
Supervisor: Dr. Guandong Xu
Lecturer and Tutor: Dr. Laurel Evelyn Dyson
Table of Contents
1. Abstract
2. Introduction
3. Research Question
4. Literature Review and Related Work
4.1. Fighting spam on social website
4.2. Framework for unsupervised spam detection in social networking site
4.3. Social Spam Detection
4.4. Mining for geographically disperse communities in social networks
4.5. Crime pattern detection and collaborative mining
5. Develop intuitive techniques - new pattern making
5.1. Behaviour that Track NEXT MOVE
5.2. Research Methods
5.3. Use of concept Survey of Approach
5.4. Objectives
5.4.1. Main objective
5.4.2. Sub-objectives
5.5. Research Contribution
6. Conclusion
7. References
Assignment IV | Technology Research Preparation 32144
1. Abstract
This research concerns spammer detection on social networking websites in support of
cybercrime investigation. Its purpose is to show how the behaviour of a spammer in social
groups on a social networking website can be detected with the help of tags and bookmarks.
Much work has been done on detecting spammers and their posts on social networking
websites. That earlier research helps to identify spam posts by means of the tags and
bookmarks attached to them. The first phase of this research checks those tags and bookmarks
against the available data, using the methods and techniques of past research. Because a
spammer constantly changes techniques, it is very hard to identify what he or she is going to
do next. The second phase of this research identifies the behaviour of a specific spammer from
the data produced by the methods and techniques used in the first phase.
In the last phase, the research may help to predict a spammer's future attacks and his or
her next intention. It may help to predict the future intention, the development of a new idea
(development of a pattern) and possibly the next target of the spammer. The research will
explore a new era of crime detection, using new data mining techniques to prevent attacks in
the cyber world and to assist cybercrime investigation, data mining and social network
analysis. This research aims to open a new door to the virtual world.
2. Introduction
Today, crime and its patterns are increasing on social network websites. More than 80%
of the population in metro cities cannot live without the internet even for an hour and,
specifically, without social networking websites such as Facebook, Twitter, and the chat
groups of Yahoo and Google+. On the positive side, this is one of the greatest developments of
the century: people from every corner of the world are getting closer to each other through
social networking websites. On the other hand, there are thousands of new registrations every
day on social network websites, with both real and fake profiles. Crimes committed through
such websites are increasing: stealing information with the help of fake profiles, accessing
and altering personal information, identity theft, downloading and uploading confidential
information, and financial theft.
As crime has increased, so has the fight against it, with new technologies, techniques
and methodologies. Research in data mining, such as crime pattern detection, analysis of
social networking data with text mining, automatic online monitoring, mining criminal networks
from chat logs, forensic media analysis and much more, has contributed significantly to
cybercrime investigation since the year 2000.
The data mining field began to make this contribution at the start of the 21st century,
when cybercrime was just beginning. ‘Some people say that the actual birth of data mining
happened in London in 1662 when John Graunt wrote natural and political observations made upon
the bills of mortality’ (www.dataminingarticles.com, 2013). Today, a number of data mining
techniques are used to extract patterns from the data sets stored in a computer.
Techniques such as text mining, entity extraction, clustering, association rule mining,
sequential pattern mining, deviation detection, classification, string comparators and social
network analysis are widely used to achieve the desired results, with new algorithms added and
small modifications made as required. Crime investigation using data mining techniques is an
additional contribution to the cybercrime investigation area. When attention is focused on the
data in one system rather than on the whole database, what is overlooked can often be more
important for finding a specific pattern and for increasing the speed and depth of a cybercrime
investigation. Data mining helps to find such patterns with the practices mentioned above.
Data mining techniques combined with new ideas can create a safer and more efficient
environment for crime investigators. Algorithms, methods and theories have contributed
profoundly to the information technology field. Furthermore, protecting data on social network
websites and securing databases enhances security to a certain level, but making it foolproof
is a very tough task. An expert hacker who knows the web code and core programming can bypass
antivirus or firewall protection and get into the web servers, and individuals or companies can
lose a huge amount of data and confidential information in a moment.
Research into new patterns using data mining techniques has helped cybercrime
investigators to spot criminals such as spammers in less time. But investigation needs some new
way of finding behaviour through crime pattern detection. The literature review shows that most
techniques find the same patterns as past research, with only small modifications. I agree that
altering an algorithm or methodology helps to find a new way in research, but it just makes
another route to the same thing. We all know that as technology changes, the human mind
changes. We adapt to new technology as fast as we can. The same is true of crime patterns and
ideas. Researching the behaviour of spammers on social network websites helps crime
investigation to find the behavioural pattern of a spammer and his or her future attacks.
This research specifically uses tags, and later bookmarks, to identify spam posts. After
identification, the research will help to identify the pattern of a specific spammer and his or
her attack pattern. It will trace out the future attack or NEXT TARGET of the spammer.
This research focuses on how to trace spammer behaviour through the tags and bookmarks
that he or she uploads with posts on social websites. It also explores the intention of the
spammer behind a specific action. Knowing this may help to find his or her NEXT MOVE. With the
help of data mining techniques for cybercrime investigation and of past research, this may help
to track the pattern of creating spam and prevent future attacks.
This research may help the cybercrime investigation process to design high-quality
techniques efficiently. It would open a new door in the fields of spammer detection, data
mining and social network analysis.
3. Research Question
This research will focus on the detection of spammers on social networking websites. It
addresses the following questions:
I. Behaviour of the spammer: what is he or she going to do next?
II. Could tracking his or her geolocation, behaviour and next target help to understand the
spammer's intellect?
III. How frequently does the spammer move to the next location for an attack (how many spam
attacks, say, per month)?
IV. Could the frequency of those attacks be used to track the behaviour pattern of making and
launching new spam? (Making new spam means coding and thinking up a new idea of attack.) This
demonstrates the thinking pattern behind every spam.
This research will work on the questions above, which help to find the behaviour of a
specific spammer and his or her NEXT MOVE.
This research holds that "there is a pattern that every spammer holds in his or her head
or memory, from which new spam is developed, and this pattern continues if the spammer is
successful more than 2 times after posting spam, which leads him or her on to the NEXT MOVE".
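Question III can be made concrete with a small calculation. The sketch below is illustrative
only: the record layout (a spammer identifier and the date of each spam post) and the sample
values are assumptions, not data from this research. It counts spam posts per spammer per
calendar month.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records: (spammer_id, date the spam post appeared).
spam_posts = [
    ("s1", date(2013, 1, 5)),
    ("s1", date(2013, 1, 20)),
    ("s1", date(2013, 2, 3)),
    ("s2", date(2013, 1, 9)),
]


def monthly_frequency(posts):
    """Count spam posts per (spammer_id, year, month)."""
    counts = Counter()
    for spammer_id, posted_on in posts:
        counts[(spammer_id, posted_on.year, posted_on.month)] += 1
    return counts


freq = monthly_frequency(spam_posts)
for key in sorted(freq):
    print(key, freq[key])
```

The same counts could later be compared month to month to see whether a spammer's rate of
attack rises, falls or stays steady.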
4. Literature Review and Related Work
Much research has been done on spammer detection, or spammer mining. Some of it focuses
on spam identification, which helps to distinguish real from fake posts on social websites.
Later research went further and helps to identify the spammer with the help of tags and
bookmarks, which may be auto-reproduced, or reproduced and re-reproduced by others. Some
studies specify the spammer's intention behind posting content with those tags on a social
website, such as popularity, likes or clicks. Others find that the intention is to earn money
from clicks. The following is related work by researchers in the area of spammer mining and
detection:
4.1. Fighting spam on social website
This research article surveys approaches to spam and spammers on social websites and the
future challenges. It covers three categories of countermeasure: detection, demotion and
prevention. These had been proposed earlier for email and web spam and also inform this survey.
The aim of the research is to evaluate spam countermeasures for social websites and their
future challenges. It compares and contrasts email spam with web spam to find the key
characteristics that relate to spam (Heymann P., 2007).
The analysis shows that these characteristics substantially change the dynamics of the
adversarial relationship between service provider and spammer. Furthermore, the example of a
social bookmarking system shows that there is only one interaction for content creation,
consisting of a single action in which a user posts a bookmark; a bookmark consists of a URL
and an optional list of tags describing that URL. The system categorises bookmarks and displays
them in one or more interfaces, such as user bookmarks, tag bookmarks, a most-recent list, a
most-popular list and a tag cloud.
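The bookmark structure described above (a URL plus an optional list of tags) can be sketched as
a small data type. This is a minimal illustration, not the survey's own code: the class name
`Bookmark`, the helper names and the sample URLs are all assumptions made for the example. It
derives the data behind two of the interfaces mentioned, the tag cloud and the most-popular
list.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    user: str
    url: str
    tags: list = field(default_factory=list)  # the list of tags is optional


def tag_cloud(bookmarks):
    """Map each tag to the number of bookmarks that carry it."""
    return Counter(tag for b in bookmarks for tag in set(b.tags))


def most_popular(bookmarks, n=3):
    """Return URLs ordered by how many distinct users bookmarked them."""
    users_per_url = {}
    for b in bookmarks:
        users_per_url.setdefault(b.url, set()).add(b.user)
    ranked = sorted(users_per_url.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [url for url, _ in ranked[:n]]


# Hypothetical sample data.
bookmarks = [
    Bookmark("alice", "http://example.org/a", ["python", "tutorial"]),
    Bookmark("bob", "http://example.org/a", ["python"]),
    Bookmark("carol", "http://example.org/b"),
]
cloud = tag_cloud(bookmarks)
top = most_popular(bookmarks)
```

Because every interface is computed from the same single posting action, a spammer who posts
many bookmarks influences all of them at once, which is why the interfaces matter for spam.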
Moreover, identification-based methods help in the detection of spam, which can be done
manually or by pattern-based techniques. Rank-based strategies help in demotion, reducing the
prominence of content that is likely to be spam. Lastly, interface- or limit-based methods help
in prevention, which tries to make it difficult to contribute spam content.
4.2. Framework for unsupervised spam detection in social networking site
This research article processes user-submitted spam reports on social websites for spam
detection. The paper presents a framework based on the HITS web link analysis framework,
together with its instantiated models. The models successively introduce propagation between
messages reported by the same user, messages authored by the same user and messages with
similar content (Bosma M. et al., 2012).
The aim of the research is to present a framework that indicates the likelihood of a
message being spam, based on user spam reports. The researchers believe that an unsupervised
framework such as the one introduced above is more suitable for identifying spam messages than
a static supervised classifier. The unsupervised framework does not depend on the content of
the spam and can therefore detect new types of spam. The proposed framework is instantiated in
three models, based on reports alone, on reports and authors, and on reports, authors and
similar messages, all of which improve spam ranking performance. The similarity-author-reporter
model used in the research also improves performance. The proposed framework is thus divided
into the three models specified above.
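The idea of scoring messages from user reports with a HITS-style iteration can be sketched as
follows. This is a simplified illustration of the general HITS scheme applied to a
reporter-message graph, not the model published by Bosma et al.: it covers only the
"reports alone" case, and the function names and sample reports are assumptions.

```python
import math


def _normalise(scores):
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits_spam_scores(reports, iterations=20):
    """reports: list of (reporter, message) pairs.

    A message scores high when reported by high-scoring reporters, and a
    reporter scores high when the messages they report score high.
    """
    reporters = sorted({r for r, _ in reports})
    messages = sorted({m for _, m in reports})
    reporter_score = {r: 1.0 for r in reporters}
    message_score = {m: 1.0 for m in messages}
    for _ in range(iterations):
        new_msg = {m: 0.0 for m in messages}
        for r, m in reports:
            new_msg[m] += reporter_score[r]
        new_rep = {r: 0.0 for r in reporters}
        for r, m in reports:
            new_rep[r] += new_msg[m]
        message_score = _normalise(new_msg)
        reporter_score = _normalise(new_rep)
    return message_score, reporter_score


# Hypothetical reports: m1 is reported by three users, m3 by one isolated user.
reports = [("u1", "m1"), ("u2", "m1"), ("u3", "m1"), ("u1", "m2"), ("u4", "m3")]
message_score, reporter_score = hits_spam_scores(reports)
```

In this sample the widely reported message m1 ends up ranked above m2 and m3 without the
content of any message being inspected, which is the property the paragraph above attributes to
the unsupervised framework.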
In my conclusion, this research explains a new process model for handling user-submitted
spam reports in unsupervised spam detection. The quality of the research is very good and I
would like to refer to this article in my research. Its literature review draws on high-quality
research.
4.3. Social Spam Detection
This research discusses the motivation for social spam and reports on the automatic
detection of spammers in a social tagging system. Six different features help to achieve 98%
accuracy in detecting social spammers. The research is restricted to social bookmarking
systems. These systems are called broad folksonomies; that is, users provide annotations of
content that is external to the bookmarking system. The success of social bookmarking systems
and the large communities they bind make them an attractive target for spamming (Markines B.
et al., 2009).
As per the findings, the features play an important role. TagSpam relies on the shared,
prevalent vocabulary used to annotate resources, because spammers may use particular tags or
combinations of tags in their posts. TagBlur tracks unrelated posts and links under
high-frequency tags. The remaining features are DomFp for document structure, NumAds for pages
created for the sole purpose of advertising, Plagiarism for the use of copyrighted content and,
finally, ValidLinks for links created for spam or malicious activity.
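A tag-based feature of the TagSpam kind can be sketched as below. This is only loosely modelled
on the idea described above; the exact definition used by Markines et al. may differ. The
assumption here is that a set of posts already labelled as spam or not spam is available, and
all names and sample data are hypothetical.

```python
from collections import defaultdict


def tag_spam_probabilities(labelled_posts):
    """labelled_posts: iterable of (tags, is_spam).

    Returns, for each tag, the share of posts carrying it that are spam.
    """
    spam = defaultdict(int)
    total = defaultdict(int)
    for tags, is_spam in labelled_posts:
        for tag in set(tags):
            total[tag] += 1
            if is_spam:
                spam[tag] += 1
    return {tag: spam[tag] / total[tag] for tag in total}


def tag_spam_score(tags, probabilities, default=0.0):
    """Average spam probability over the distinct tags of one post."""
    tags = set(tags)
    if not tags:
        return 0.0
    return sum(probabilities.get(t, default) for t in tags) / len(tags)


# Hypothetical labelled posts.
labelled = [
    (["free", "money"], True),
    (["free", "music"], False),
    (["python", "music"], False),
]
probs = tag_spam_probabilities(labelled)
score = tag_spam_score(["free", "money"], probs)
```

A post whose tags are mostly seen on spam gets a score close to 1, while unseen tags fall back
to the `default` value.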
In my conclusion, this research achieved its aim to a very high degree. I would like to
refer to this article in my research. A success rate of 98% is very high in the research area
of detecting spammers using social bookmarking systems.
4.4. Mining for geographically disperse communities in social networks
Various data mining techniques are used extensively in criminal analysis and cybercrime
investigation today. They interpret data and help to understand the behavioural pattern of the
crime.
The research explains that the discovery of communities that are geographically disperse
stems from the requirement to identify higher-level organizational structures, such as a
logistics group that provides support to various geographically disperse terrorist cells. The
research applies a variant of Newman-Girvan modularity, known as distance modularity, to this
problem. The authors modified the Louvain algorithm to find partitions of networks that provide
near-optimal solutions for this quantity. They then applied the algorithm to numerous samples
from two real-world social networks and to a terrorism network data set whose nodes have
associated geospatial locations. The experiments show this to be an effective approach and
highlight various practical considerations when applying the algorithm to distance modularity
maximization (Shakarian et al., 2013).
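For orientation, the quantity that such community detection methods try to maximise can be
computed directly. The sketch below implements the standard Newman-Girvan modularity for an
undirected, unweighted graph, Q = sum over communities of (L_c / m) - (d_c / 2m)^2, where m is
the number of edges, L_c the number of edges inside community c and d_c the total degree of its
nodes. It is not the distance modularity variant and not the modified Louvain algorithm of
Shakarian et al.; the function name and the sample graph are assumptions, and the graph is
assumed to have no self-loops or duplicate edges.

```python
def modularity(edges, community):
    """Standard Newman-Girvan modularity.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges whose two ends lie in the same community.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m
    # Expected fraction of such edges if edges were placed at random
    # while keeping every node's degree.
    degree_sum = {}
    for node, k in degree.items():
        label = community[node]
        degree_sum[label] = degree_sum.get(label, 0) + k
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return inside - expected


# Hypothetical graph: two triangles joined by the single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in split}
q_split = modularity(edges, split)
q_single = modularity(edges, single)
```

Splitting the graph into its two triangles gives a higher modularity than treating it as one
community, which is the kind of comparison a Louvain-style search makes repeatedly.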
Some research has successfully demonstrated the use of data mining in the crime
investigation process. Finding terrorists on a social network is a challenging task. This
research gives one direction for crime investigation on social networks. The quality of the
research is very high and it is currently in use for social network analysis and crime
investigation in the U.S. Army. I would certainly like to refer to this article in my research,
as it gives me a direction for finding geographically disperse communities on a social network.
It will help guide me in finding the next targeted location of a specific spammer. The research
includes a good-quality literature review and provides 21 references.
4.5. Crime pattern detection and collaborative mining
As with automatic systems, it is possible to find co-offender network links for crime
pattern detection by using clustering techniques (Nath S. V., 2006). That research detects the
geospatial area of a crime pattern. Criminal groups can be discovered from multiple social
network data sets by using association mining and statistical methods (Fard, A. M. and M.
Ester, 2009). With this research, discovering criminal groups through crime pattern detection
and collaborative mining of social network data became easier. It explores different types of
crime patterns that help to detect groups of criminals on a social network website. I would
certainly like to refer to this research in my report, as it focuses on the geospatial area as
well as the specific crime patterns that I am looking for in spammer detection and spammer
behaviour.
5. Develop intuitive techniques - new pattern making
Developing intuitive techniques for cybercrime investigation is one of the challenging
research problems in data mining. By combining a few existing data mining techniques with new
processes and pattern detection, it is possible to find a new methodology for crime and
cybercrime investigation. The following process will be followed for this research:
First, text mining can detect tags, bookmarks, names, addresses and further information
about the criminal. Secondly, the three categories of potential countermeasures surveyed in the
article used for the survey of approaches can help in recognising tags and bookmarks on social
network websites. They can also help to trace unregistered tags included in posts on social
websites, which reveals fake and real posts and their reproductions.
This research aims to identify spammer behaviour and the spammer's NEXT MOVE. The process
involves identifying the pattern by which a spammer creates spam, which helps to track what he
or she may develop next and whom he or she may target.
5.1. Behaviour that Track NEXT MOVE
Spam can be made using different techniques and methods. Some of them are really simple.
Tags and bookmarks are the most common methods, and they involve many techniques that spammers
use to send spam messages over the internet and on social websites with different intentions.
These could include gaining likes for themselves or for the content posted; using blogs to make
money; embedding content in spam posts for fun or with the intention of stealing the content or
data of other users who open the spam message unknowingly; and posting content with embedded
programs for reproduction, auto-reproduction and re-reproduction, in order to distribute the
content all over the world through social network websites.
This research focuses on the tags used by spammers and their reproduction frequency,
which help to track a spammer's behaviour and future target. The research first examines one
spammer and the content he or she has posted, which reveals his or her ability to program a
spam message and post it on a social network.
With the results for that specific spammer, the research helps to trace the tags and
combinations of tags used in spam by that spammer. This can be done with the research methods
developed by other researchers in the past.
Furthermore, after gathering the tag data we can process it with text mining techniques
to find the pattern and the geographical locations used by the spammer for spam attacks. Text
mining techniques help to find the specific pattern in the normalized and extracted data from
the previous phase, which helps to find the behaviour pattern of the spammer behind the spam
post.
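Finding the combinations of tags that one spammer uses most often can be sketched with simple
co-occurrence counting. This is an illustrative example under stated assumptions, not the
method of this research: the input is assumed to be the tag lists of posts already attributed
to a single spammer, and the function name and sample tags are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_pair_counts(posts):
    """Count how often each unordered pair of tags appears in the same post."""
    counts = Counter()
    for tags in posts:
        # Normalise: trim whitespace, lower-case, drop duplicates, then sort
        # so that every pair is always written in the same order.
        clean = sorted({t.strip().lower() for t in tags if t.strip()})
        counts.update(combinations(clean, 2))
    return counts


# Hypothetical tag lists from three posts by the same spammer.
posts_by_spammer = [
    ["Free", "iPhone", "win"],
    ["free", "win"],
    ["win ", "free", "cash"],
]
pairs = tag_pair_counts(posts_by_spammer)
print(pairs.most_common(1))
```

A pair that recurs across many posts is a candidate signature of that spammer, which can then
be combined with location data when looking for a behaviour pattern.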
5.2. Research Methods
The behaviour pattern includes the following findings:
• Tags and combinations of tags used by the spammer
• Geographical location of the spammer
• Geographical location from which the spam post is accessed, which trails the targeted
audience
• Social network website used for each post
• Method(s) developed behind each spam post
• The developed method(s) and their pattern of creative thinking
5.3. Use of concept Survey of Approach
In my research I prefer to use the survey-of-approaches concept from Paul Heymann's
methodology (2007), with small modifications. Firstly, the study extracts the content related
to spam from the available dataset. Secondly, it identifies the relevant content using text
mining techniques, which help to find the tags, geolocation and targeted audience. Lastly, the
research helps to trace the behaviour of the spammer and his or her NEXT MOVE.
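The three steps above can be laid out as a small pipeline. The sketch below is a toy version
under stated assumptions: each record is assumed to carry a text, a location and an `is_spam`
flag set by an earlier detection step, tags are assumed to appear as hashtags in the text, and
all function and field names are hypothetical.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def extract_spam(dataset):
    """Step 1: keep only the records already flagged as spam."""
    return [rec for rec in dataset if rec["is_spam"]]


def mine_record(rec):
    """Step 2: pull the tags out of the text and carry the location along."""
    return {
        "tags": [t.lower() for t in HASHTAG.findall(rec["text"])],
        "location": rec["location"],
    }


def summarise_behaviour(mined):
    """Step 3: report the most frequent tag and location across the posts."""
    tag_counts = Counter(t for m in mined for t in m["tags"])
    loc_counts = Counter(m["location"] for m in mined)
    return {
        "top_tag": tag_counts.most_common(1)[0][0] if tag_counts else None,
        "top_location": loc_counts.most_common(1)[0][0] if loc_counts else None,
    }


# Hypothetical dataset.
dataset = [
    {"text": "Win a #free phone #Win", "location": "Sydney", "is_spam": True},
    {"text": "Lunch with friends #food", "location": "Perth", "is_spam": False},
    {"text": "#free cash today", "location": "Sydney", "is_spam": True},
]
summary = summarise_behaviour([mine_record(r) for r in extract_spam(dataset)])
```

Keeping the steps as separate functions means the spam filter or the tag extractor can be
replaced by a stronger technique without changing the rest of the pipeline.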
5.4. Objectives
5.4.1. Main objective
1. To identify ways to improve spammer mining techniques in order to derive the
behavioural pattern of the spammer and to detect spam.
5.4.2. Sub-objectives
1. To examine the techniques for creating tags and combinations of tags.
2. To examine the targeted audience pattern.
3. To examine the method(s) developed or designed behind each spam post.
4. To identify the NEXT MOVE of the spammer and his or her pattern.
5.5. Research Contribution
My research will contribute to:
1. Social network websites, in understanding the behaviour of spammers and their posts.
2. Cybercrime investigation departments, enabling them to track the spammer's NEXT MOVE.
3. Spammer mining and social network analysis, through the development of new techniques and
methods.
4. An understanding of the behavioural pattern by which each spammer makes spam messages.
6. Conclusion
Many things have already been invented in data mining and crime investigation. Crime
investigation processes need to be redeveloped and researched for new techniques, processes and
methods to track criminals and stop a crime before it happens. Admittedly, it is hard to
predict a future event from the available data set and criminal information. However,
researchers cannot stop here with the techniques that are available. Every day new crimes and
new crime patterns appear. Criminals post online and vanish in the blink of an eye after the
criminal activity is performed. To notice such incidents, the crime pattern detection system
needs to be updated almost every day.
Spammers can be traced by developing different techniques through highly sophisticated
programming, but identifying the thinking pattern of an individual is hard. It is really hard
to find out what an individual spammer is going to do next. Furthermore, a spammer acts
according to his or her own continually developed crime pattern. On the other hand, it is not
impossible to identify what an individual spammer is thinking of doing next. It just needs a
step-by-step evaluation of the process, the normalized data and data patterns such as the tags
and combinations of tags used in making spam. Moreover, this research holds that if we find the
thinking pattern behind the spammer's behaviour, then we can also track down the spammer's NEXT
MOVE.
Things are already invented; we just have to re-search and re-create. Techniques and
methods are already available. We just have to mould them to the real human mind. Using the
behaviour and thinking of the criminal helps not only to prevent cyber-attacks on social
network websites but also to track the criminal's NEXT MOVE.
7. References
Alguliev, R. M., R. M. Aliguliyev, Nazirova S. A., 2011, 'Classification of Textual E-Mail
Spam Using Data Mining Techniques', Applied Computational Intelligence and Soft Computing,
Institute of Information Technology of Azerbaijan National Academy of Sciences, 9 F, Agayev
Street, Baku 1141, Azerbaijan
Benevenuto, F., G. Magno, Rodrigues T., Almeida V., 2010, 'Detecting spammers on Twitter',
Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference.
Benevenuto, F., T. Rodrigues, Almeida V., Almeida J., Zhang C., Ross K., 2008,
'Identifying video spammers in online social networks', Proceedings of the 4th international
workshop on Adversarial information retrieval on the web, Beijing, China, pp. 45-52.
Bosma, M., E. Meij, Weerkamp W., 2012, 'A Framework for Unsupervised Spam Detection in
Social Networking Sites', Advances in Information Retrieval, Springer Berlin Heidelberg, pp.
364-375.
Benevenuto, F., T. Rodrigues, Almeida V., Almeida J., Goncalves M., 2009,
'Detecting spammers and content promoters in online video social networks', Proceedings of the
32nd international ACM SIGIR conference on Research and development in information
retrieval, Boston, MA, USA, pp. 620-627.
Fard, A. M. and M. Ester, 2009, 'Collaborative Mining in Multiple Social Networks Data for
Criminal Group Discovery', Computational Science and Engineering, 2009. CSE '09.
Lee, K., J. Caverlee, Kamath K. Y., Cheng Z., 2012, 'Detecting collective attention spam',
Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality. Lyon, France, pp.
48-55.
Liu, J.-Y., Y.-X. Zhao, Wang Y-H, Hu L., 2012, 'Spam Short Messages Detection via Mining
Social Networks', Journal of Computer Science and Technology, Vol- 27, pp. 506-514.
Markines, B., C. Cattuto, Menczer F., 2009, 'Social spam detection', Proceedings of the 5th
International Workshop on Adversarial Information Retrieval on the Web, Madrid, Spain, pp.
41-48.
Heymann, P., G. Koutrika, Garcia-Molina H., 2007, 'Fighting Spam on Social Web Sites: A Survey
of Approaches and Future Challenges', IEEE Internet Computing, Vol-11, pp. 36-45.
Shakarian, P., P. Roos, Calahan D., 2013, 'Mining for geographically disperse communities in
social networks by leveraging distance modularity', Proceedings of the 19th ACM SIGKDD
international conference on Knowledge discovery and data mining, Chicago, Illinois, USA, pp.
1402-1409.
Shwu-Min, H., 2009, 'The Behavior and Preferences of Users on Web 2.0 Social Network Sites:
An Empirical Study', Information Technology: New Generations.
Nath, S. V., 2006, 'Crime Pattern Detection Using Data Mining', Web Intelligence and
Intelligent Agent Technology Workshops, 2006, WI-IAT Workshops, IEEE/WIC/ACM.
Stafford, G. and L. L. Yu, 2013, 'An Evaluation of the Effect of Spam on Twitter Trending
Topics', Social Computing (SocialCom).
Tsigkas, O., O. Thonnard, Tzovaras D., 2012, 'Visual spam campaigns analysis using abstract
graphs representation', Proceedings of the Ninth International Symposium on Visualization for
Cyber Security, Seattle, Washington, pp. 64-71.
Wang, A., 2010, 'Detecting Spam Bots in Online Social Networking Sites: A Machine Learning
Approach', Data and Applications Security and Privacy XXIV, Springer Berlin Heidelberg,
Vol-6166, pp. 335-342.
Wei, C., A. Sprague, Warner G., Skjellum A., 2008, 'Mining spam email to identify common
origins for forensic application', Proceedings of the ACM symposium on Applied computing.
Fortaleza, Ceara, Brazil, pp.1433-1437.
Yardi, S., D. Romero, Schoenebeck G., Boyd D., 2009, 'Detecting spam in a Twitter network',
First Monday, vol 15, http://firstmonday.org/ojs/index.php/fm/article/view/2793/2431.