Technology Research Preparation 32144
Assignment IV - Literature Review

Social Spammer Behaviour the NEXT MOVE
(Track Tag System on Social Network Analysis)

Prepared By: Anand A. Chinchore
UTS ID: 11621392
Supervisor: Dr. Guandong Xu
Lecturer and Tutor: Dr. Laurel Evelyn Dyson

Table of Contents
1. Abstract
2. Introduction
3. Research Question
4. Literature Review and Related Work
4.1. Fighting spam on social website
4.2. Framework for unsupervised spam detection in social networking site
4.3. Social Spam Detection
4.4. Mining for geographically disperse communities in social networks
4.5. Crime pattern detection and collaborative mining
5. Develop intuitive techniques - new pattern making
5.1. Behaviour that Track NEXT MOVE
5.2. Research Methods
5.3. Use of concept Survey of Approach
5.4. Objectives
5.4.1. Main objective
5.4.2. Sub-objectives
5.5. Research Contribution
6. Conclusion
7. References

Assignment IV | Technology Research Preparation 32144

1. Abstract

This research concerns the detection of spammers on social networking websites in support of cybercrime investigation. Its purpose is to show how the behaviour of a spammer within social groups on a social networking website can be detected with the help of tags and bookmarks. A lot of work has already been done on detecting spammers and their posts on social networking websites. That previous research helps to identify spam posts with the help of the tags and bookmarks attached to posts on a social network website. The first phase of this research is to check those tags and bookmarks using the available data and the methods and techniques of past research. Because a spammer always changes his or her techniques, it is very hard to identify what he or she is going to do next. The second phase is to identify the behaviour of a specific spammer using the data produced by the methods and techniques applied in the first phase. In the last phase, the research may help to predict a spammer's future attacks and his or her next intention.
It may also help to predict the development of new ideas (new patterns) and the likely next target of the spammer, which is an area for future research. The research will explore a new era of crime detection, using new data mining techniques to prevent attacks in the cyber world and to assist cybercrime investigation, data mining and social network analysis. This research will open a new door to the virtual world.

2. Introduction

Today, crime and its patterns are increasing on social network websites. More than 80% of the population in metro cities cannot live without the internet even for an hour, and specifically not without social networking websites such as Facebook, Twitter, and the chat groups of Yahoo and Google+. On the positive side, this is one of the greatest evolutions of the century: people from every corner of the world are getting closer to each other through such social networking websites. On the other hand, there are thousands of new registrations every day, with both real and fake profiles.

Crime through social networking websites is increasing. It includes stealing information with the help of fake profiles, accessing and altering personal information, identity theft, downloading and uploading confidential information, and financial theft. As crime has increased, so has the fight against it, with new technology, techniques and methodologies. Research in data mining, such as crime pattern detection, analysis of social networking data with text mining, automatic online monitoring, mining criminal networks from chat logs, forensic media analysis and more, has contributed significantly to cybercrime investigation since 2000. The data mining field was brought into this work at the start of the 21st century, when cybercrime was just beginning.
‘Some people say that the actual birth of data mining happened in London in 1662 when John Graunt wrote natural and political observations made upon the bills of mortality’ (www.dataminingarticles.com, 2013). Today, a number of data mining techniques are used to evaluate patterns in data sets stored on a computer. Techniques such as text mining, entity extraction, clustering, association rule mining, sequential pattern mining, deviation detection, classification, string comparators and social network analysis are widely used to achieve the desired results, with the addition of new algorithms and small modifications as required. Crime investigation using data mining techniques is an additional contribution to the cybercrime investigation area. When the focus is on specific data in a system rather than on the whole database, what would otherwise be overlooked can often be the most important for finding a specific pattern and for increasing the speed and depth of a cybercrime investigation. Data mining helps to find such patterns with the practices mentioned above. Data mining techniques combined with new ideas will create a safer and more efficient environment for crime investigators. Algorithms, methods and theories have contributed profoundly to the information technology field.

Furthermore, protecting data on social network websites and securing databases enhances security to a certain level, but making it fool-proof is a very tough task. An expert hacker who knows web code and core programming can bypass antivirus or firewall protection and get into web servers, and individuals or companies can lose a huge amount of data and confidential information in an instant. Research into new patterns using data mining techniques has helped cybercrime investigators to spot criminals such as spammers in less time. But investigation still needs new ways of detecting crime patterns and the behaviour behind them.
According to the literature review, most techniques find the same patterns as past research, with only small modifications. I agree that altering an algorithm or methodology helps to find new ways in research, but it just makes another path to the same thing. We all know that as technology changes, the human mind changes. We adapt to new technology as fast as we can. The same is true of crime patterns and ideas. Researching the behaviour of spammers on social network websites helps crime investigators to find the behavioural pattern of a spammer and his or her future attacks. This research specifically uses tags, and later bookmarks, to identify spam posts. After identification, the research will help to identify the pattern of a specific spammer and his or her attack pattern. It will trace the future attack or NEXT TARGET of the spammer.

This research focuses on how to trace spammer behaviour through the tags and bookmarks he or she uploads with posts on a social website. It also explores the intention of the spammer behind a specific action. This may help to find his or her NEXT MOVE. With the help of data mining techniques for cybercrime investigation and past research, this may help to track the pattern of creating spam and prevent future attacks. This research may help the cybercrime investigation process in designing high-quality techniques in an efficient way. This would be a new door towards the fields of spammer detection, data mining and social network analysis.

3. Research Question

This research will focus on the detection of spammers on social networking websites. The research includes the following questions:

I. Behaviour of the spammer: what is he or she going to do next?
II. Could tracking his or her geo location, behaviour and next target help to understand the spammer's intellect?
III. What is the frequency with which the spammer moves to the next location for an attack (say, how many spam attacks per month)?
IV. Can the frequency of those attacks be used to track the behaviour pattern of making and launching new spam? (Making new spam means coding and thinking up a new idea for an attack.) This demonstrates the thinking pattern behind every spam.

This research will work on the questions above, which help to find the behaviour of a specific spammer and his or her NEXT MOVE. The research believes that “there is a pattern that every spammer holds in his or her head or memory and uses to develop new spam, and this pattern continues if the spammer is successful more than twice after posting spam, after which he or she continues to the NEXT MOVE”.

4. Literature Review and Related Work

A great deal of research has been done on spammer detection or spammer mining. Some of it focuses on spam identification, which helps to distinguish real from fake posts on a social website. Later research went further and helps to identify spammers with the help of tags and bookmarks, which may be auto-reproduced, or reproduced and re-reproduced by others. Some studies specify the spammer's intention behind posting content with those tags on a social website, such as popularity, likes or clicks. Others note that some spammers intend to earn money from clicks. The following is related work by researchers in the spammer mining and detection area:

4.1. Fighting spam on social website

This research article introduces spam and spammers and surveys approaches and future challenges. It surveys three categories of countermeasure: detection, demotion and prevention. These were previously proposed for email and web spam and inform this survey as well. The aim of the research is to evaluate spam countermeasures for social websites and their future challenges.
It compares and contrasts email spam with web spam and identifies the key characteristics related to spam (Heymann P., 2007). Analysis in the research shows that these characteristics substantially change the dynamics of the adversarial relationship between service provider and spammer. Furthermore, in the example of a social bookmarking system, there is only one interaction for content creation, consisting of a single action in which the user posts a bookmark himself or herself; a bookmark may consist of a URL and an optional list of tags describing that URL. The system categorises and displays bookmarks in one or more interfaces, such as user bookmarks, tag bookmarks, the most recent list, the most popular list and the tag cloud.

Moreover, identification-based methods help in the detection of spam, which can be done manually or by pattern matching. Rank-based strategies help in demotion, combating spam by reducing the prominence of content likely to be spam. Lastly, interface- or limit-based methods help in prevention, which tries to make it difficult to contribute spam content.

4.2. Framework for unsupervised spam detection in social networking site

This research article processes spam reports submitted by users on a social website for spam detection. The paper presents a framework based on the HITS web link analysis framework and its instantiated models. The models subsequently introduce propagation between messages reported by the same user, messages authored by the same user and messages with similar content (Bosma M. et al., 2012). The aim of the research is to present a framework that indicates the likelihood of a message being spam based on user spam reports. The researchers believe that an unsupervised framework as introduced above is more suitable for identifying spam messages than a static supervised classifier. An unsupervised framework does not depend on the content of the spam and can therefore detect new types of spam.
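
As an illustration only, the HITS-style propagation between reporters and reported messages described above can be sketched as follows. This is a minimal sketch and not the model of Bosma et al.: the function name hits_spam_scores, the (reporter, message) input format and the fixed number of iterations are assumptions made for this example, and the links between authors and between similar messages are left out.

```python
from collections import defaultdict
from math import sqrt


def _normalise(scores):
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {key: value / norm for key, value in scores.items()}


def hits_spam_scores(reports, iterations=20):
    """Score messages and reporters from (reporter, message) spam reports.

    Hypothetical helper: a message scores highly when it is reported by
    high-scoring reporters, and a reporter scores highly when the messages
    he or she reports score highly (the HITS hub/authority idea).
    """
    reports = list(reports)
    reporter_score = {reporter: 1.0 for reporter, _ in reports}
    message_score = {message: 1.0 for _, message in reports}
    for _ in range(iterations):
        # Message score: sum of the scores of the users who reported it.
        new_message = defaultdict(float)
        for reporter, message in reports:
            new_message[message] += reporter_score[reporter]
        message_score = _normalise(new_message)
        # Reporter score: sum of the scores of the messages he or she reported.
        new_reporter = defaultdict(float)
        for reporter, message in reports:
            new_reporter[reporter] += message_score[message]
        reporter_score = _normalise(new_reporter)
    return message_score, reporter_score
```

Ranking messages by the returned message score gives a spam ranking that uses reports alone and never inspects the message content.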
The proposed solution evaluates the framework based on reports alone, on reports and authors, and on reports, authors and similar messages, all of which improve spam ranking performance. Performance also improves with the similarity-author-reporter model used in the research. The proposed model is divided into the three models specified above.

In my conclusion, this research explains a new process model that uses spam reports submitted by users for unsupervised spam detection. The quality of the research is very good and I would like to refer to this article in my research. The literature review in that research draws on high-quality sources.

4.3. Social Spam Detection

This research discusses the motivation for social spam and reports on the automatic detection of spammers in a social tagging system. Six different features help to achieve 98% accuracy in detecting social spammers. The research is restricted to social bookmarking systems. These systems are called broad folksonomies, that is, users provide annotations of content that is external to the bookmarking system. The success of social bookmarking systems and the large communities they bind make them an attractive target for spamming (Markines B. et al., 2009).

As per the findings, the features play an important role. TagSpam exploits the shared, prevalent vocabulary used to annotate resources, because spammers may use particular tags or combinations of tags in their posts. TagBlur tracks unrelated posts and links gathered under high-frequency tags. The other features are DomFp, for document structure; NumAds, for pages created solely to serve advertisements; Plagiarism, for the use of copyrighted content; and ValidLinks, for links that were created for spam or malicious activity.

In my conclusion, this research achieved its aim at a very high level. I would like to refer to this article in my research.
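
To make the TagSpam idea concrete, the following is a minimal sketch, assuming a small set of posts that have already been labelled as spam or legitimate. The function names and the input format are hypothetical, and the exact definition used by Markines et al. may differ; here a post is scored by the average, over its tags, of the share of labelled posts carrying that tag that were spam.

```python
from collections import Counter


def tag_spam_probabilities(labelled_posts):
    """Estimate, for each tag, the share of labelled posts using it that were spam.

    labelled_posts: iterable of (tags, is_spam) pairs (assumed format).
    """
    total = Counter()
    spam = Counter()
    for tags, is_spam in labelled_posts:
        for tag in set(tags):  # count each tag once per post
            total[tag] += 1
            if is_spam:
                spam[tag] += 1
    return {tag: spam[tag] / total[tag] for tag in total}


def tag_spam_score(tags, probabilities, default=0.0):
    """Average spam probability of a post's tags; unseen tags count as `default`."""
    tags = set(tags)
    if not tags:
        return default
    return sum(probabilities.get(tag, default) for tag in tags) / len(tags)
```

A score close to 1 means the post uses vocabulary seen mostly in spam; the threshold at which a post is flagged is a separate choice that this sketch does not make.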
The 98% success rate is very high for the research area of detecting spammers using social bookmarking systems.

4.4. Mining for geographically disperse communities in social networks

Various data mining techniques are extensively used in criminal analysis and cybercrime investigation today. They interpret data and help to understand the behavioural pattern of a crime. The research explains that the discovery of geographically disperse communities stems from the requirement to identify higher-level organizational structures, such as a logistics group that provides support to various geographically disperse terrorist cells. The research applied a variant of Newman-Girvan modularity, known as distance modularity, to this problem. The authors modified the Louvain algorithm to find partitions of networks that provide near-optimal solutions for this quantity. They then applied the algorithm to numerous samples from two real-world social networks and a terrorism network data set whose nodes have associated geospatial locations. The experiments show this to be an effective approach and highlight various practical considerations when applying the algorithm to distance modularity maximization (Shakarian et al., 2013).

Some research has successfully demonstrated the use of data mining in the crime investigation process. Finding terrorists on a social network is a challenging task. This research gave one direction for crime investigation on social networks. The quality of the research is very high and it is currently in use for social network analysis and crime investigation in the U.S. Army. I would certainly like to refer to this article in my research, as it gave me direction for finding geographically disperse communities on a social network. It will help guide me in finding the next targeted location of a specific spammer.
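
For orientation, the standard Newman-Girvan modularity that the reviewed work starts from can be computed for a given partition as in the sketch below. This is only the standard quantity for an undirected, unweighted graph; the distance-based variant and the modified Louvain search of Shakarian et al. are not reproduced here, and the function name and input format are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Standard Newman-Girvan modularity of a partition.

    edges: list of undirected (u, v) pairs without self-loops or duplicates.
    community: dict mapping every node to its community label.
    Uses Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = defaultdict(int)
    for node, k in degree.items():
        degree_sum[community[node]] += k
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

A partition that groups densely connected nodes together scores higher than one that puts every node in a single community, which always scores zero.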
That research includes a good-quality literature review and provides 21 references.

4.5. Crime pattern detection and collaborative mining

As with automatic systems, it is possible to find co-offender network links for crime pattern detection using clustering techniques (Nath S. V., 2006). That research detects the geo-spatial area of crime patterns. Criminal groups can be discovered from multiple social network data sets using association mining and statistical methods (Fard, A. M. and M. Ester, 2009). With these studies, discovering criminal groups in social network data through crime pattern detection and collaborative mining became easier. They explore different types of crime patterns, which help to detect groups of criminals on a social network website. I would certainly like to refer to these research articles in my report, as they focus on the geo-spatial area as well as on specific crime patterns, which is what I am looking for in spammer detection and spammer behaviour.

5. Develop intuitive techniques - new pattern making

Developing intuitive techniques for cybercrime investigation is one of the challenging research areas in data mining. By combining a few old data mining techniques with new process and pattern detection, it is possible to find new methodology for crime and cybercrime investigation. The following process will be followed in this research:

First, text mining can detect the tags, bookmarks, name, address and further information about the criminal. Secondly, the three categories of potential countermeasures surveyed in the article on the survey of approaches can help in recognising tags and bookmarks on a social network website. They can also help to trace unregistered tags included in posts on a social website, which reveals fake and real posts and their reproductions.
This research aims to identify spammer behaviour and the spammer's NEXT MOVE. The process involves identifying the pattern by which a spammer creates spam, which helps to track what he or she may develop next and whom it will target.

5.1. Behaviour that Track NEXT MOVE

Spam can be made using different techniques and methods. Some of them are really simple. Tags and bookmarks are the most common methods, and they involve many techniques that spammers use to send spam messages over the internet and on social websites with different intentions. These could include gaining likes for oneself or for the content posted, using blogs to make money, embedding content in a spam post for fun or to steal the content or data of other users who access the spam message unknowingly, and posting content that contains embedded programs for reproduction, auto-reproduction and re-reproduction to distribute the content all over the world through social network websites.

This research focuses on the tags used by spammers and their reproduction frequency, which help to track spammer behaviour and future targets. The research first examines one spammer and the content he or she posted, which reveals his or her ability to program spam messages and post them on a social network. Using the results for that specific spammer, the research traces the tags and combinations of tags used in spam by that spammer. This can be done with methods developed by other researchers in the past. Furthermore, after gathering the tag data, we can process it with text mining techniques to find the pattern and the geographical locations used by the spammer for spam attacks. Text mining techniques help to find specific patterns in the normalized, extracted data from the previous phase, which in turn help to find the behaviour pattern of the spammer behind the spam posts.
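
The tracing of tags and tag combinations for one spammer, as described above, can be sketched in a few lines. This is a minimal sketch under stated assumptions: each post is given as a list of tag strings, tags are normalised only by lower-casing, and the function name tag_profile is hypothetical rather than part of any existing method.

```python
from collections import Counter
from itertools import combinations


def tag_profile(posts):
    """Count how often one spammer uses each tag and each pair of tags.

    posts: iterable of tag lists, one list per spam post (assumed format).
    Returns (tag_counts, pair_counts) as Counter objects.
    """
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in posts:
        # Normalise case and drop duplicates within a single post.
        unique = sorted({tag.lower() for tag in tags})
        tag_counts.update(unique)
        # Sorting first means each pair is always stored in the same order.
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts
```

Calling most_common() on either counter lists the tags or tag pairs the spammer reuses most often, which is the raw material for the behaviour pattern discussed here.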
5.2. Research Methods

The behaviour pattern includes the following findings:
• Tags and combinations of tags used by the spammer
• Geographical location of the spammer
• Geographical location of spam post access, which trails the targeted audience
• Social network website used for each post
• Method(s) developed behind each spam post
• Developed method(s) and their pattern of creative thinking

5.3. Use of concept Survey of Approach

In my research I prefer to use the survey-of-approaches concept from Paul Heymann's methodology (2007), with small modifications. Firstly, the study extracts the content related to spam from the available data set. Secondly, it identifies the relevant content using text mining techniques, which help to find the tags, geo location and targeted audience. Lastly, the research helps to trace the behaviour of the spammer and his or her NEXT MOVE.

5.4. Objectives

5.4.1. Main objective
1. To identify ways to improve spammer mining techniques in order to develop the behavioural pattern of the spammer and improve spam detection.

5.4.2. Sub-objectives
1. To examine the techniques for creating tags and combinations of tags.
2. To examine the targeted audience pattern.
3. To examine the method(s) developed or designed behind each spam post.
4. To identify the NEXT MOVE of the spammer and his or her pattern.

5.5. Research Contribution

My research will contribute to:
1. Social network websites, in understanding the behaviour of spammers and their posts.
2. Cybercrime investigation departments, in tracking the spammer's NEXT MOVE.
3. Spammer mining and social network analysis, in developing new techniques and methods.
4. Understanding the behavioural pattern by which each spammer makes spam messages.
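
The behaviour-pattern fields listed in Section 5.2 and the attack-frequency question in Section 3 suggest a simple record per spam post and a per-month count. The following is a minimal sketch only: the SpamPost record, its field names and the monthly_frequency function are hypothetical, and how a post is attributed to a spammer or a location is assumed to be known already.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SpamPost:
    """One spam post with the fields of the behaviour pattern (assumed names)."""
    spammer_id: str
    posted_on: date
    website: str
    tags: tuple
    spammer_location: str = ""
    access_location: str = ""


def monthly_frequency(posts, spammer_id):
    """Count one spammer's posts per (year, month), in chronological order."""
    counts = Counter()
    for post in posts:
        if post.spammer_id == spammer_id:
            counts[(post.posted_on.year, post.posted_on.month)] += 1
    return dict(sorted(counts.items()))
```

Comparing these monthly counts with changes in spammer_location or website between consecutive posts is one way the frequency question could be explored; the sketch itself makes no prediction.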
6. Conclusion

Much has already been invented in data mining and crime investigation. Crime investigation processes need to be redeveloped, and new techniques, processes and methods need to be researched, in order to track criminals and stop crime before it happens. It is hard to predict future events from the available data sets and criminal information. However, researchers cannot stop here with the techniques that are available. Every day new crimes and new crime patterns appear. Criminals post online and vanish in the blink of an eye once the criminal activity is done. To notice such incidents, crime pattern detection systems need to be updated almost every day. Spammers can be traced by developing different techniques through exceptionally capable programming, but identifying the thinking pattern of an individual is hard. It is really hard to find out what an individual spammer is going to do next. Furthermore, a spammer acts according to his or her own developed and continued crime pattern. On the other hand, it is not impossible to identify what an individual spammer is thinking of doing next. It just needs step-by-step evaluation of the process, of normalized data and of data patterns such as the tags and combinations of tags used in making spam. Moreover, this research believes that if we find the thinking pattern of the spammer, then we can also track down the spammer's NEXT MOVE.

Things are already invented; we just have to re-search and re-create. Techniques and methods are already available. We just have to mould them to the real human mind. Using the behaviour and thinking of the criminal not only helps to prevent cyber-attacks on social network websites but also helps to track the criminal's NEXT MOVE.

7. References

Alguliev, R. M., Aliguliyev, R. M., Nazirova, S. A., 2011, 'Classification of Textual E-Mail Spam Using Data Mining Techniques', Applied Computational Intelligence and Soft Computing, Institute of Information Technology of Azerbaijan National Academy of Sciences, 9 F. Agayev Street, Baku 1141, Azerbaijan.

Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V., 2010, 'Detecting spammers on Twitter', Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference.

Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., Zhang, C., Ross, K., 2008, 'Identifying video spammers in online social networks', Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web, Beijing, China, pp. 45-52.

Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., Goncalves, M., 2009, 'Detecting spammers and content promoters in online video social networks', Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, MA, USA, pp. 620-627.

Bosma, M., Meij, E., Weerkamp, W., 2012, 'A Framework for Unsupervised Spam Detection in Social Networking Sites', Advances in Information Retrieval, Springer Berlin Heidelberg, pp. 364-375.

Fard, A. M. and Ester, M., 2009, 'Collaborative Mining in Multiple Social Networks Data for Criminal Group Discovery', Computational Science and Engineering, 2009, CSE '09.

Heymann, P., 2007, 'Fighting Spam on Social Web Sites: A Survey of Approaches and Future Challenges', Vol. 11, pp. 36-45.

Lee, K., Caverlee, J., Kamath, K. Y., Cheng, Z., 2012, 'Detecting collective attention spam', Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality, Lyon, France, pp. 48-55.

Liu, J.-Y., Zhao, Y.-X., Wang, Y.-H., Hu, L., 2012, 'Spam Short Messages Detection via Mining Social Networks', Journal of Computer Science and Technology, Vol. 27, pp. 506-514.

Markines, B., Cattuto, C., Menczer, F., 2009, 'Social spam detection', Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web, Madrid, Spain, pp. 41-48.

Nath, S. V., 2006, 'Crime Pattern Detection Using Data Mining', Web Intelligence and Intelligent Agent Technology Workshops, 2006, WI-IAT Workshops, IEEE/WIC/ACM.

Shakarian, P., Roos, P., Calahan, D., 2013, 'Mining for geographically disperse communities in social networks by leveraging distance modularity', Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, USA, pp. 1402-1409.

Shwu-Min, H., 2009, 'The Behavior and Preferences of Users on Web 2.0 Social Network Sites: An Empirical Study', Information Technology: New Generations.

Stafford, G. and Yu, L. L., 2013, 'An Evaluation of the Effect of Spam on Twitter Trending Topics', Social Computing (SocialCom).

Tsigkas, O., Thonnard, O., Tzovaras, D., 2012, 'Visual spam campaigns analysis using abstract graphs representation', Proceedings of the Ninth International Symposium on Visualization for Cyber Security, Seattle, Washington, pp. 64-71.

Wang, A., 2010, 'Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach', Data and Applications Security and Privacy XXIV, Springer Berlin Heidelberg, Vol. 6166, pp. 335-342.

Wei, C., Sprague, A., Warner, G., Skjellum, A., 2008, 'Mining spam email to identify common origins for forensic application', Proceedings of the ACM Symposium on Applied Computing, Fortaleza, Ceara, Brazil, pp. 1433-1437.

Yardi, S., Romero, D., Schoenebeck, G., Boyd, D., 2009, 'Detecting spam in a Twitter network', Vol. 15, http://firstmonday.org/ojs/index.php/fm/article/view/2793/2431.