International Journal of Soft Computing and Engineering (IJSCE)
ISSN: 2231-2307, Volume-4, Issue-ICCIN-2K14, March 2014
ICCIN-2K14 | January 03-04, 2014, Bhagwan Parshuram Institute of Technology, New Delhi, India

An Analysis of Telecommunication Fraud using Outlier Detection Model based on Similar Coefficient Sum

Daya Gupta, Payal Pahwa, Rajiv Arora

Abstract: In modern times, new technologies have a great impact on people's lives, and this has resulted in an increase in fraud in today's technological environment. Service providers such as telecommunication and banking companies suffer financial losses due to their customers' fraudulent behavior. Telecommunication fraud affects not only the company financially but also individual customers, so detecting fraud is very important. In this paper, we identify customers' fraudulent behavior using an outlier detection model based on the similar coefficient sum. In this approach, frauds are detected by computing the similar coefficient of every two objects and summing these coefficients for each object. The method has been implemented in MATLAB R2010a, and the obtained results show its feasibility and validity.

Keywords: Data Mining; Fraud Detection; Outlier; Similar Coefficient Sum

I. INTRODUCTION

Fraud can be defined as the use of a service with the intention of not being charged for it [1]. A deception made intentionally for selfish gain or to damage other individuals is also called fraud [2]. This definition may vary according to the legal jurisdiction. Fraud is a violation of civil law and is also a criminal offence. It is commonly committed to cheat people out of money and valuables, but some fraud, as in certain scientific fields, aims at prestige and respect rather than monetary gain.
The scale of fraud has risen over the years: a typical organization loses about 5% of its annual revenue to fraud, with a median loss of $160,000 [3]. Frauds committed by owners and executives cost around nine times as much as employee frauds. These frauds usually occur in the banking, manufacturing, government and telecommunications industries. Telecommunications is one of the industries most affected: it loses billions of US dollars annually to fraud in its networks [4] [5] [6] [7] [8] [9]. Beyond the financial loss, fraud results in loss of service, causes distress and reduces customer confidence [9]. Fraud has also considerably affected network operators' revenues, causing losses of about 2 to 6 percent of total revenue. As pointed out in [10], exact estimates are difficult to provide because some frauds are never detected and network operators do not wish to publish their fraud-loss figures.

Increasing competition in the telecommunication industry and rising losses [6] have turned fraud from a problem that network operators could bear into one that dominates the trade and general press [7]. Hence, our concern is to deal with fraud in the telecommunication industry. Researchers have proposed various techniques for detecting fraud, such as classification based on user profiling [11], rule-based methods [12] [13] and expert knowledge applied through thresholds [14].

Manuscript received May 2014.
Daya Gupta, Computer Engineering Department, Delhi Technological University, New Delhi, India.
Payal Pahwa, Computer Engineering Department, BPIT, Guru Gobind Singh Indraprastha University, New Delhi, India.
Rajiv Arora, Research Scholar, Computer Engineering Department, Delhi Technological University, New Delhi, India.
Outlier mining is a well-known data mining technique for pattern recognition. A pattern that does not behave normally is defined as an outlier [15]. A straightforward approach to outlier detection is to define a region representing normal behavior and declare any observation in the data that does not belong to this region an outlier. Compared with statistics-based mining methods, outlier mining has two advantages: first, it can effectively find outliers even when the distribution of the data set is unknown; second, it does not require prior knowledge of the distribution to construct a probabilistic model. This paper investigates outlier mining for finding frauds in telecommunication. Because outlier mining discards irrelevant transactions, it detects fraudulent behavior efficiently. We show results on a telecommunication data set, on which outlier mining achieves a success rate of 90%.

The paper is structured as follows: Section II gives a brief literature review. The similar coefficient sum approach is discussed in Section III. Implementation details and results are discussed in Section IV, and Section V concludes the paper.

II. RELATED WORK

Although research on fraud and its detection has been carried out for many years, and telecommunication companies have spent considerably more money on it than the research community, their efforts have not reached beyond the companies themselves and have not been made accessible to the public research community. The various facets of digital transmission in wireless communication are discussed in [17], which identifies tumbling as a vulnerability of wireless communication to fraud and notes that such fraud could allow fraudsters to access and steal telephone services and digital technology.
This work also discusses clone fraud and attempts to stop cloners. An increase in phone-fraud incidents at corporations and telecommunication companies is reported in [11], which discusses the alliance formed to stop phone fraud, the preventive measures taken by customers and telephone companies to fight fraud, and efforts to foil the cloning of cell phones.

Published By: Blue Eyes Intelligence Engineering & Sciences Publication Pvt. Ltd.

A knowledge-based approach for examining, in real time, the call records obtained from cellular switches is considered in [14]. The authors argue that applying uniform thresholds to all of a carrier's subscribers effectively forces comparison against an imaginary subscriber. Instead, they model every subscriber individually and let each subscriber's profile adapt over time. They also use knowledge about fundamental behavior, for example, suspicious destination numbers. Their analysis component determines whether the alarms, taken together, are significant enough to be reviewed by a human analyst. Hence the system is credited with the ability to detect fraud swiftly and to help analysts concentrate on the most dangerous cases and those most likely to be fraud.

In [18], the authors report their first attempt to detect fraud in a database of simulated calls. A supervised feedforward neural network is used to detect anomalous use. Six distinct user types are simulated stochastically according to their calling patterns. Two sets of features are derived from these data: the first describes recent use and the second describes long-term behavior. Both are statistics of call data accumulated over time windows of different lengths, and they form the input to the neural network.
The classifier's performance on the test data is estimated at 92.5%; as reported in [19], this figure has limited value because the data are simulated, and class-specific accuracy estimates are needed.

Rule-based methods of detecting fraud are presented in [12] [13]. The authors use adaptive rule sets to uncover indicators of fraudulent behavior from a database of cellular calls. These indicators are used to build profiles, which then serve as features for a system that combines evidence from multiple profilers to generate alarms. Rule selection is used to choose a set of rules that covers larger sets of fraudulent cases. These rules in turn constitute monitors, which are pruned by a feature selection method. The outputs of the monitors are weighted together by a learning linear threshold unit. The results are then evaluated with a cost model in which misclassification cost is proportional to time.

User profiling and classification techniques for detecting fraud in mobile communication networks are presented in [20]. The author identifies relevant user groups on the basis of call data and assigns each user to the relevant group. Neural networks and probabilistic models are used to learn usage patterns from the call data, and the author promotes dynamic modeling of behavioral patterns for fraud detection.

None of the research papers identified above applies outlier mining to the available data. As already discussed, this technique can find outliers without prior knowledge about the data set. This distinguishes our work from existing ones and defines the central focus of this paper.

III. FRAUD DETECTION BASED ON OUTLIER MINING

The problem of finding patterns in data that deviate from normal behavior is called outlier detection [21].
Depending on the application domain, these patterns are called outliers, anomalies, exceptions, discordant observations, novelties or noise. Outlier detection is important because its results can reveal critical information on which action can be taken. The similar coefficient sum is an example of an outlier mining technique.

A. Similar Coefficient Sum

The concept of the similar coefficient sum was introduced by Jiang Lingmin [22]. The similar coefficient sum-based outlier detection algorithm judges whether an object is an outlier as follows [22]. Let $X = \{x_1, x_2, \ldots, x_n\}$ be the set of objects to be checked, where every object has $m$ indexes, that is, $x_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$. The data are presented as the data matrix

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1m} \\
x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{pmatrix}
$$

The task is to find the outlier set among these $n$ objects. Before estimating the dispersion degree of the objects in $X$, the similar coefficient $r_{ij}$ between every two objects is computed, giving the similar coefficient matrix $R = (r_{ij})_{n \times n}$. Then

$$
p_i = \sum_{j=1}^{n} r_{ij}
$$

is the sum of the $i$th row of the similar coefficient matrix. The smaller $p_i$ is, the farther object $i$ is from the other objects, which makes object $i$ a candidate item of the outlier set. With $\lambda$ as the threshold, all objects with $\lambda_i \ge \lambda$ form the outlier set. In other words, if $\lambda_i$ of a particular object reaches or exceeds the threshold, that object is considered an outlier and hence a fraud; when it is below the threshold, the object is not considered a fraud. The design process of our approach is discussed in the next subsection.
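A minimal Python sketch of the similar coefficient sum procedure follows. It rests on assumptions that are not taken from [22]: each index is min-max normalized to [0, 1]; the similar coefficient is taken as $r_{ij} = 1 - d(x_i, x_j)/\sqrt{m}$, where $d$ is the Euclidean distance (so $r_{ij} \in [0, 1]$ and $r_{ii} = 1$); and the outlier degree is the hypothetical $\lambda_i = (\max_k p_k - p_i)/(\max_k p_k - \min_k p_k)$, which grows as $p_i$ shrinks. The function names and the default threshold `lam=0.8` are illustrative only, and Python stands in for the MATLAB used in the paper.

```python
import math


def min_max_normalize(rows):
    """Scale each of the m indexes (columns) to [0, 1] so no attribute dominates."""
    m = len(rows[0])
    lo = [min(row[k] for row in rows) for k in range(m)]
    hi = [max(row[k] for row in rows) for k in range(m)]
    return [
        [(row[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0 for k in range(m)]
        for row in rows
    ]


def similar_coefficient_matrix(rows):
    """R = (r_ij) with the ASSUMED coefficient r_ij = 1 - d(x_i, x_j) / sqrt(m)."""
    x = min_max_normalize(rows)
    n, m = len(x), len(x[0])
    max_dist = math.sqrt(m)  # largest Euclidean distance inside the unit hypercube
    return [[1.0 - math.dist(x[i], x[j]) / max_dist for j in range(n)] for i in range(n)]


def similar_coefficient_sums(rows):
    """p_i: the sum of the i-th row of the similar coefficient matrix."""
    return [sum(row) for row in similar_coefficient_matrix(rows)]


def detect_outliers(rows, lam=0.8):
    """Return indexes of objects whose HYPOTHETICAL outlier degree is >= lam."""
    p = similar_coefficient_sums(rows)
    p_min, p_max = min(p), max(p)
    span = (p_max - p_min) or 1.0  # avoid division by zero when all sums are equal
    # Hypothetical degree lambda_i: 1.0 for the smallest p_i, 0.0 for the largest.
    degree = [(p_max - p_i) / span for p_i in p]
    return [i for i, d in enumerate(degree) if d >= lam]


if __name__ == "__main__":
    # Toy single-index data (e.g. call durations); the last object is far from the rest.
    calls = [[1], [2], [3], [2], [100]]
    print(detect_outliers(calls))  # prints [4]
```

Running the script flags the object with index 4, whose row sum $p_4$ is far smaller than the others. Because $r_{ii} = 1$ for every object, the diagonal term shifts all $p_i$ equally and does not change which objects are flagged.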
B. Design Process of Our Approach

The design process for finding outliers based on the similar coefficient sum is as follows. First, we clean the raw call detail records (CDR) on the basis of name, using the data cleansing algorithm discussed in [23], to find duplicates in the data set. We then apply the similar coefficient sum outlier mining approach to each individual customer's records, and evaluate performance using the indicators discussed in [24].

True Positives (TP) are the positive tuples (tuples of the main class of interest) that are correctly detected as fraud, whereas False Positives (FP) are the negative tuples (tuples of the class of non-interest) that are incorrectly detected as fraud. True Negatives (TN) are the negative tuples that are correctly detected as non-fraud, whereas False Negatives (FN) are the positive tuples that are incorrectly detected as non-fraud. The true positive rate is the proportion of positive tuples that are correctly identified, TP/(TP + FN). The false positive rate is the proportion of negative tuples that are incorrectly identified as positive, FP/(FP + TN). The accuracy, or success rate, is the number of correctly classified instances divided by the total number of instances, (TP + TN)/(TP + TN + FP + FN). The process is shown in Fig. 1.

Fig. 1 Design Process of our Approach: Raw Call Details Data (CDR) → Data is cleaned using [23] algorithm → Outlier mining approach is applied on individual customer record → Performance is evaluated

IV. EXPERIMENT AND RESULT ANALYSIS

We have implemented the described approach using MATLAB R2010a.

Fig. 3 TP, TN, FP, FN of above Data Sets (TP 50%, TN 40%, FP 4%, FN 6%)

Fig. 4 TP Rate, FP Rate and Success Rate (%): TP Rate 89, FP Rate 9, Success Rate 90
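The rates in Fig. 4 follow directly from the shares in Fig. 3 and the definitions in Section III-B. Reading Fig. 3 as TP 50%, TN 40%, FP 4% and FN 6% of all tuples (the FP/FN assignment is the one consistent with Fig. 4) and treating these shares as counts per 100 tuples, the short Python sketch below reproduces the arithmetic. The function name `performance_indicators` is illustrative, and Python stands in for the MATLAB used in the paper.

```python
def performance_indicators(tp, tn, fp, fn):
    """Return (TP rate, FP rate, success rate), each as a percentage."""
    tp_rate = 100.0 * tp / (tp + fn)                   # actual frauds that are flagged
    fp_rate = 100.0 * fp / (fp + tn)                   # legitimate tuples wrongly flagged
    success = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # accuracy over all tuples
    return tp_rate, fp_rate, success


if __name__ == "__main__":
    # Fig. 3 shares read as counts per 100 tuples (the FP/FN assignment is an assumption).
    tp_rate, fp_rate, success = performance_indicators(tp=50, tn=40, fp=4, fn=6)
    print(f"TP rate {tp_rate:.1f}%, FP rate {fp_rate:.1f}%, success rate {success:.1f}%")
    # prints: TP rate 89.3%, FP rate 9.1%, success rate 90.0%
```

Rounded to whole percentages, the output matches the 89, 9 and 90 shown in Fig. 4.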
Telecommunication companies generally do not agree to share their customers' data sets with researchers, so we have used data sets available on the internet. Because these data do not contain fraudulent entries, we introduced some fraudulent data with the help of experts from the private-sector telecommunication industry. We cleaned the data on the basis of name using the cleansing algorithm described in [23]. We then divided the data sets customer-wise using IF-then-Else classifiers, as shown in Fig. 2. We took 50 entries of 20 customers to evaluate our approach; owing to space limitations, only a few entries and attributes are shown in the screenshots.

TABLE I CUSTOMER WISE DATASET

| Name | Called Number | Calling Number | Duration |
|------|---------------|----------------|----------|
| N1 | 10612899 | 68373748102 | 2 |
| N1 | 612283725 | 68373748102 | 43 |
| N1 | 613069656 | 68373748102 | 675 |
| N1 | 613481951 | 68373748102 | 33 |

After data cleansing, we apply the similar coefficient sum outlier mining technique to individual customers. In Fig. 2, the third tuple is detected as an outlier. We evaluate our approach using the performance indicators discussed above. The values of TP, TN, FP and FN are shown graphically in Fig. 3, and the TP Rate, FP Rate and Success Rate are shown in Fig. 4. For better efficiency, the TP Rate should be high and the FP Rate low: a high TP Rate requires TP to be high and FN low, and a low FP Rate requires FP to be low relative to TN.

V. CONCLUSIONS

We have proposed an outlier mining approach based on the similar coefficient sum for finding fraud in the telecommunication industry. Outlier mining effectively finds outliers even when the distribution of the data set is unknown. The technique efficiently detects the abnormal behavior of an individual customer on the basis of that customer's past call details. Manual intervention in the proposed algorithm is negligible, resulting in a high degree of automation and accuracy. We have implemented this technique in MATLAB R2010a.

REFERENCES
[1] "Investigating Fraudulent Acts", University of Houston System Administrative Memorandum, http://www.uhsa.uh.edu/samiAM/01C04.htm, 2000.
[2] http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Fraud.html
[3] L. Cortesão, F. Martins, A. Rosa and P. Carvalho, "Fraud Management Systems in Telecommunications: a practical approach", ICT, 2005.
[4] A. B. Davis and S. K. Goyal, "Management of cellular fraud: Knowledge-based detection, classification and prevention", in Proceedings of the 13th International Conference on Artificial Intelligence, Expert Systems and Natural Language, Avignon, France, Vol. 2, pp. 155–164, 1993.
[5] M. Johnson, "Cause and effect of telecoms fraud", Telecommunication (International Edition), 30(12), pp. 80–84, 1996.
[6] T. Parker, "The twists and turns of fraud", Telephony, 231 (supplement issue), pp. 18–21, 1996.
[7] D. O'Shea, "Beating the bugs: Telecom fraud", Telephony, 232(3), p. 24, 1997.
[8] K. A. Pequeno, "Real-time fraud detection: Telecom's next big step", Telecommunications (Americas Edition), 31(5), pp. 59–60, 1997.
[9] P. Hoath, "Telecoms fraud, the gory details", Computer Fraud & Security, 20(1), pp. 10–14, 1998.
[10] P. Barson, S. Field, N. Davey, G. McAskie and R. Frank, "The detection of fraud in mobile phone networks", Neural Network World, 6(4), pp. 477–484, 1996.
[11] "Phone fraud: The battle rages on", Communication News, 33(8), p. 8, 1996.
[12] T. Fawcett and F. Provost, "Combining data mining and machine learning for effective user profiling", in E. Simoudis, J. Han and U. Fayyad (Eds.), Proceedings of the 13th International Conference on Machine Learning, pp. 8–13, 1996.
[13] T. Fawcett and F. Provost, "Adaptive fraud detection", Journal of Data Mining and Knowledge Discovery, 1(3), pp. 291–316, 1997.
[14] A. B. Davis and S. K. Goyal, "Management of cellular fraud: Knowledge based detection, classification and prevention", in Proceedings of the 13th International Conference on Artificial Intelligence, Expert Systems and Natural Language, Vol. 2, pp. 155–164, Avignon, France, 1993.
[15] I. Ben-Gal, "Outlier detection", in O. Maimon and L. Rokach (Eds.), Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers, 2005, ISBN 0-387-24435-2.
[16] H. Kvarnstrom, E. Lundin and E. Jonsson, "Combining fraud and intrusion detection – meeting new requirements", in Proceedings of the 5th Nordic Workshop on Secure IT Systems (NordSec2000), Reykjavik, Iceland, October 2000.
[17] R. A. Shaffer, "Good guys, bad guys and digital", Forbes, 154(4), p. 122, 1994.
[18] P. Barson, S. Field, N. Davey, G. McAskie and R. Frank, "The detection of fraud in mobile phone networks", Neural Network World, 6(4), pp. 477–484, 1996.
[19] S. Field and P. Hobson, "Techniques for telecommunications fraud management", in J. Alspector, R. Goodman and T. X. Brown (Eds.), Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications, Vol. 3, pp. 107–115, Lawrence Erlbaum, New Jersey, USA, 1997.
[20] J. Hollmen, "User Profiling and Classification for Fraud Detection in Mobile Communication Networks", PhD thesis, Helsinki University of Technology, Department of Cognitive and Computer Science and Engineering, Espoo, Finland, 2000.
[21] V. J. Hodge and J. Austin, "A Survey of Outlier Detection Methodologies", Kluwer Academic Publishers, 2004.
[22] Jiang Lingmin, "Clustering Algorithm to Check Outlier Based on Similar Coefficient Sum", Computer Engineering, Vol. 29, pp. 183–185, Nov. 2003.
[23] R. Arora, P. Pahwa and S. Bansal, "Alliance Rules for Data Warehouse Cleansing", International Conference on Signal Processing Systems, IEEE, pp. 743–747, 2009.
[24] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann Publishers, March 2006, ISBN 1-55860-901-6.