International Journal of Soft Computing and Engineering (IJSCE)
ISSN: 2231-2307, Volume-4, Issue-ICCIN-2K14, March 2014
ICCIN-2K14 | January 03-04, 2014
Bhagwan Parshuram Institute of Technology, New Delhi, India
An Analysis of Telecommunication Fraud using
Outlier Detection Model based on Similar
Coefficient Sum
Daya Gupta, Payal Pahwa, Rajiv Arora
Abstract— In modern day, new technologies have a great
impact on people's lives. This has led to an increase in
fraud in today's technological environment. Service providers
such as telecommunication companies and banks
suffer financial losses due to customers'
fraudulent behaviour. Telecommunication fraud not only
affects the company financially but also affects individual
customers. Detecting fraud is therefore very important.
In this paper, we identify customers'
fraudulent behaviour using an outlier detection model based on
similar coefficient sum. In this approach, frauds are detected
by computing the similar coefficient between every two objects
and summing these coefficients for each object. The
method has been implemented in MATLAB R2010a, and the
results obtained show its feasibility and validity.
Keywords—Data Mining;
Fraud Detection; Similar Coefficient Sum; Outlier.
I. INTRODUCTION
Fraud can be described as the use of a service
without the intention of paying for it [1]. A deception made
intentionally for selfish gain or to damage other individuals is
also called fraud [2]. The definition of fraud may vary according
to the legal jurisdiction. It is both a violation of civil
law and a criminal offence. The common purpose of fraud
is to deprive people of their valuables and money,
but some fraud is committed to gain prestige and respect rather
than any monetary gain, as in some fields of science.
The scale of fraud has risen over the
years to about 5% of annual revenue in a typical
organization, with a median loss of $160,000 [3].
Frauds committed by owners and executives are around
9 times more costly than frauds committed by employees. These
frauds usually occur in the following industries: banking,
manufacturing, government and telecommunications.
One of the major impacts of fraud is on the
telecommunications industry. It suffers annual losses
amounting to billions of US dollars because of fraud in
its networks [4] [5] [6] [7] [8] [9]. Beyond the
financial loss, fraud results in loss of service, causes distress
and reduces customer confidence [9]. Fraud has also
considerably affected the revenues of network operators,
causing losses of about 2 to 6 percent of total revenue.
It is also pointed out in [10] that it is difficult to provide
exact estimates, because some frauds are never
detected and network operators do not wish to release
figures on fraud losses publicly.
Manuscript received May 2014
Daya Gupta, Computer Engineering Department Delhi Technological
University, New Delhi, India
Payal Pahwa, Computer Engineering Department, BPIT, Guru Gobind
Singh Indraprastha University New Delhi, India
Rajiv Arora, Research Scholar, Computer Engineering Department,
Delhi Technological University New Delhi, India
With increasing competition in the telecommunication
industry and rising losses [6], fraud
has escalated from a problem that network
operators could bear to one that dominates the trade and
general press [7].
Hence, our concern is to deal with fraud in the
telecommunication industry.
Researchers have proposed various techniques for detecting fraud,
such as classification techniques based on user profiling [11],
rule-based methods [12] [13], and expert knowledge based on
thresholds [14]. Outlier
mining is a well-known technique used for pattern
recognition in data mining. A pattern that does not behave
normally is defined as an outlier [15]. A straightforward
approach to detecting outliers is to define a
region representing normal behavior and to declare any
observation that does not belong to this region
an outlier.
Outlier mining has the following advantages over
mining methods based on statistics. First, it can
effectively find outliers when the distribution of the data
set is unknown. Second, it does not require the
prior knowledge of the distribution law
that statistical methods need in order to construct a
probabilistic model.
This paper investigates outlier mining for
finding fraud in telecommunication. Outlier
mining discards irrelevant transactions and hence
detects fraudulent behavior efficiently. We show results
on a telecommunication data set, on which outlier mining
achieves a success rate of 90%.
The structure of the paper is as follows.
A brief literature review is given in Section
II. The Similar Coefficient Sum approach is discussed in
Section III. The implementation details and results are
discussed in Section IV. Section V concludes the
paper.
II. RELATED WORK
Although research on fraud and its detection has been
carried out for many years, and telecommunication
companies have spent relatively more money on it than
the research community, their results have not reached
beyond the companies themselves and have not been made
accessible to the public research community.
The various aspects of digital transmission in wireless
communication are discussed in [17].
Tumbling is one form of fraud to which wireless communication
is vulnerable [17]. Such fraud could
allow fraudsters to access and steal telephone services
and digital technology. This work also discusses
clone fraud and attempts to stop cloners.
An increase in incidents of phone fraud against corporations
and telecommunication companies is reported in [11]. It
describes the alliance formed to stop phone fraud,
the preventive measures being taken by customers and
telephone companies to fight fraud, and efforts to foil the
cloning of cell phones.
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication Pvt. Ltd.
The use of a knowledge-based approach to examine call
records obtained from cellular switches in real time is
considered in [14]. The authors argue that applying uniform
thresholds to all of a carrier's subscribers effectively forces
comparison against an imaginary subscriber. Instead, they
model every subscriber individually and let the
subscribers' profiles adapt over time. They also use
knowledge about general behavior patterns, for
example, doubtful destination numbers. An analysis
component determines whether the alarms, taken
together, are serious enough to be reviewed by a human
analyst. Hence the system is credited with the ability to
detect fraud swiftly and to help analysts concentrate on
the most dangerous cases and those most likely to be fraud.
A first attempt to detect fraud in a
database of simulated calls is reported in [18]. A supervised
feed-forward neural network is used to detect anomalous use.
Six different user types are simulated stochastically
according to the users' calling patterns. Two sets of features
are derived from this data: the first reflects recent
use and the second reflects long-term behavior.
Both are statistics of call data accumulated over time
windows of different lengths. These features are the input to
the neural network. The classifier's performance is estimated to
be 92.5% on the test data. This estimate has limited value
because the data are simulated, and class-specific
accuracy estimates are needed. This has been reported in [19].
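The two feature sets described for [18] can be illustrated with a minimal sketch. It is not the implementation of [18]: the function name `window_total`, the window lengths of 1 and 30 days, and the call records are all assumptions chosen only to show how statistics accumulated over a short and a long window differ.

```python
def window_total(calls, now, window_days):
    """Total call duration for calls whose day lies in (now - window_days, now].

    `calls` is a list of (day, duration_minutes) pairs. The window lengths
    used below are hypothetical and are not taken from [18].
    """
    return sum(duration for day, duration in calls
               if now - window_days < day <= now)


# Made-up call records for one user: (day number, duration in minutes).
calls = [(1, 5), (10, 7), (28, 3), (30, 120)]

recent_use = window_total(calls, now=30, window_days=1)     # short window
long_term_use = window_total(calls, now=30, window_days=30)  # long window

# The pair would form the classifier input for this user.
features = (recent_use, long_term_use)
print(features)
```

A large share of the long-term total falling inside the short window, as in this made-up data, is the kind of change in usage that such a feature pair lets a classifier see.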
Rule-based methods of detecting fraud are presented in
[12] [13]. The authors use adaptive rule sets to
uncover indicators of fraudulent behavior from a
database of cellular calls. These indicators are used to build
profiles, which then serve as features for a system that
combines the evidence from multiple profilers to generate
alarms. Rule selection is used to choose a set of rules
that covers larger sets of fraudulent cases. These rules
are then used to build monitors, which are in turn pruned by a
feature selection method. The outputs of the monitors are
weighted together by a learning linear threshold unit. The
results are then assessed with a cost model in which
misclassification cost is proportional to time.
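How a linear threshold unit can weight monitor outputs into a single alarm is sketched below. This is an illustration only: the perceptron-style update rule, the binary monitor outputs and the training samples are assumptions, and the training procedure actually used in [12] [13] may differ.

```python
def ltu_output(weights, threshold, monitor_outputs):
    """Raise an alarm (1) when the weighted sum of monitor outputs reaches the threshold."""
    total = sum(w * m for w, m in zip(weights, monitor_outputs))
    return 1 if total >= threshold else 0


def train_ltu(samples, n_monitors, rate=1.0, epochs=10):
    """Perceptron-style training (an assumed learning rule, not taken from [12] [13])."""
    weights = [0.0] * n_monitors
    threshold = 0.0
    for _ in range(epochs):
        for outputs, label in samples:
            error = label - ltu_output(weights, threshold, outputs)
            # Move the weights towards the monitors that fired on a missed
            # fraud case and away from them on a false alarm.
            weights = [w + rate * error * m for w, m in zip(weights, outputs)]
            threshold -= rate * error
    return weights, threshold


# Hypothetical training data: outputs of three monitors and a fraud label.
samples = [
    ((1, 1, 0), 1),
    ((1, 0, 1), 1),
    ((0, 0, 1), 0),
    ((0, 0, 0), 0),
]
weights, threshold = train_ltu(samples, n_monitors=3)
predictions = [ltu_output(weights, threshold, outputs) for outputs, _ in samples]
print(weights, threshold, predictions)
```

On this small separable example the unit learns to rely mainly on the first monitor, which is the only one that fires exclusively on the fraudulent samples.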
User profiling and classification techniques for the
detection of fraud in mobile communication networks are
presented in [20]. The author identifies relevant user groups
on the basis of call data and assigns each user
to the relevant group. Neural networks and
probabilistic models are used to learn usage
patterns from the call data. The
author also promotes dynamic modeling of behavioral
patterns for detecting fraud.
None of the research papers identified above uses
outlier mining on the available data. As
already discussed, this technique makes it easy to
find outliers without prior knowledge about the data
sets. This differentiates our work from
existing work and defines the central focus
of this paper.
III. FRAUD DETECTION BASED ON OUTLIER MINING
The problem of finding patterns in data that deviate from
normal behavior is called outlier detection [21].
Depending on the application domain, these patterns are
called outliers, anomalies, exceptions, discordant
observations, novelties or noise. Outlier detection is
important because its results
can reveal critical information on which
action can be taken. Similar Coefficient Sum is an example of an
outlier mining technique.
A. Similar Coefficient Sum
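The concept of similar coefficient sum was introduced by Jiang
Lingmin [22]. The similar coefficient sum-based outlier
detection algorithm judges whether an object is an outlier
or not, and it is described as follows [22]:
Let X = {x1, x2, x3, …, xn} be a set of objects to be checked,
every object with m indexes, that is, xi = {xi1, xi2, xi3, …,
xim}.
The data matrix is then:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
Now, the outlier set of the n objects is required. Before
estimating the degree of dispersion of the objects in X,
the similar coefficient rij between every two objects xi and xj
is computed first, which gives the similar coefficient matrix, that is
$$R = (r_{ij})_{n \times n} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix}$$
pi is the sum of the ith row of the similar coefficient matrix:
$$p_i = \sum_{j=1}^{n} r_{ij}, \quad i = 1, 2, \ldots, n$$
The smaller pi is, the further object i is from the other objects.
That means object i is a candidate item of the outlier set.
λ is the threshold, and all objects which have λi >= λ are
considered to form the outlier set.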
That is, if the value of a particular object is greater than or
equal to the threshold, the object is considered an outlier
and is flagged as fraud. If the value of the object is less than
the threshold, it is not considered to be fraud. The design
process of our approach is discussed in the next subsection.
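The steps of this subsection can be sketched in code under stated assumptions. The definition of the similar coefficient rij is not reproduced in the text, so cosine similarity is used here purely as a stand-in. The per-object value λi is also not defined in the text, so the sketch assumes a hypothetical score that is 0 for the largest row sum pi and 1 for the smallest. The function names and the example objects are illustrative and are not taken from [22] or from our data.

```python
import math


def similar_coefficient(a, b):
    """ASSUMPTION: cosine similarity stands in for the similar coefficient r_ij."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # undefined for all-zero objects


def similar_coefficient_sums(objects):
    """Return p_i, the sum of row i of the similar coefficient matrix R."""
    n = len(objects)
    r = [[similar_coefficient(objects[i], objects[j]) for j in range(n)]
         for i in range(n)]
    return [sum(row) for row in r]


def outlier_scores(p):
    """HYPOTHETICAL lambda_i: grows from 0 to 1 as p_i falls from max(p) to min(p)."""
    lo, hi = min(p), max(p)
    if hi == lo:
        return [0.0] * len(p)
    return [(hi - value) / (hi - lo) for value in p]


def find_outliers(objects, lam):
    """Indices of objects whose assumed score lambda_i satisfies lambda_i >= lam."""
    scores = outlier_scores(similar_coefficient_sums(objects))
    return [i for i, score in enumerate(scores) if score >= lam]


# Made-up objects with m = 2 indexes each; not data from the paper.
objects = [(2.0, 40.0), (3.0, 45.0), (2.5, 42.0), (60.0, 5.0)]
p = similar_coefficient_sums(objects)
outliers = find_outliers(objects, lam=0.5)
print(p)
print(outliers)
```

In this example the last object has by far the smallest row sum, so it is the only one flagged. Because the assumed score is normalised to the range 0 to 1, the object with the smallest pi always receives a score of 1 whenever the row sums differ, so the threshold has to be chosen with that in mind.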
B. Design Process of Our Approach
The design process of our approach for finding outliers
based on similar coefficient sum is as follows.
First, we clean the raw call detail records (CDR) on the
basis of name, using the data cleansing algorithm discussed
in [23], so that duplicates in the data set are found. We then
apply the similar coefficient sum outlier mining approach
to individual customer records. Finally, the performance is
evaluated using the performance indicators discussed
in [24]. True Positive (TP) represents the positive tuples
(tuples of the main class of interest) that are correctly detected
as fraud, whereas False Positive (FP) represents the
negative tuples (tuples of the class of no interest) that are
incorrectly detected as fraud. True Negative (TN)
represents the negative tuples that are correctly identified as
non-fraud, whereas False Negative (FN) represents the positive
tuples that are incorrectly identified as non-fraud. The true
positive rate is the proportion of positive tuples that are
correctly identified. The false positive rate is the proportion
of negative tuples that are incorrectly identified as fraud. The
accuracy, or success rate, is the number of
correct instances divided by the total number of instances.
The process is shown in Fig.1 below:
[Flowchart: Raw Call Details Data (CDR) -> Data is cleaned using [23] algorithm -> Outlier mining approach is applied on individual customer record -> Performance is evaluated]
Fig.1 Design Process of our Approach
[Pie chart: TP 50%, TN 40%; the FP and FN slices are 4% and 6%]
Fig.3 TP, TN, FP, FN of above Data Sets
[Bar chart, scale 0 to 100: TP Rate 89, FP Rate 9, Success Rate 90]
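The performance indicators defined in this subsection can be written out as a short sketch. The function names are illustrative. The example counts (TP = 50, TN = 40, FP = 4, FN = 6 out of 100 tuples) are an assumption read from the chart labels in this paper: the charts give 4% and 6% for the FP and FN slices, and the assignment FP = 4, FN = 6 is inferred because it reproduces the plotted rates.

```python
def tp_rate(tp, fn):
    """Proportion of positive (fraud) tuples that are correctly identified."""
    return tp / (tp + fn)


def fp_rate(fp, tn):
    """Proportion of negative (non-fraud) tuples incorrectly identified as fraud."""
    return fp / (fp + tn)


def success_rate(tp, tn, fp, fn):
    """Accuracy: correct instances divided by the total number of instances."""
    return (tp + tn) / (tp + tn + fp + fn)


# Assumed counts per 100 tuples (see the note above on FP and FN).
tp, tn, fp, fn = 50, 40, 4, 6

tp_pct = round(100 * tp_rate(tp, fn))
fp_pct = round(100 * fp_rate(fp, tn))
success_pct = round(100 * success_rate(tp, tn, fp, fn))
print(tp_pct, fp_pct, success_pct)
```

With these counts the TP Rate is 50/56, the FP Rate is 4/44 and the success rate is 90/100, which round to 89, 9 and 90.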
IV. EXPERIMENT AND RESULT ANALYSIS
We have implemented the described approach using
MATLAB R2010a. Telecommunication companies do not,
in general, agree to share their customers' data sets
with researchers, so we have used data sets
available on the internet. These data do not contain
fraudulent entries, so we introduced some fraudulent data
with the help of experts from the private-sector
telecommunication industry. We cleaned the data on
the basis of name using the cleansing algorithm
described in [23].
We divide the data sets customer-wise using if-then-else
classifiers, as shown in Fig.2. We have taken 50
entries of 20 customers for evaluating our approach.
Only a few entries and a few attributes are shown in the screen
shots due to space limitations. After data cleansing, we
apply the similar coefficient sum outlier mining technique
to individual customers. In Fig.2, the
third tuple is detected as an outlier. We evaluate our approach
using the performance indicators discussed above.
TABLE I
CUSTOMER WISE DATASET
Name   Called Number   Calling Number   Duration
N1     10612899        68373748102      2
N1     612283725       68373748102      43
N1     613069656       68373748102      675
N1     613481951       68373748102      33
The values of TP, TN, FP and FN are shown graphically in Fig. 3.
The TP Rate, FP Rate and Success Rate are shown in Fig. 4.
For better efficiency, the TP Rate should be high and the FP Rate
must be low. For the TP Rate to be high, TP must be high and FN
must be low; for the FP Rate to be low, FP must be low.
V. CONCLUSIONS
We have proposed an outlier mining approach based on
similar coefficient sum for finding fraud in the
telecommunication industry. Outlier mining effectively
finds outliers when the distribution of the data set is unknown.
This technique detects the abnormal behavior of an individual
customer on the basis of past call details in an
efficient manner. The proposed algorithm needs very little
manual intervention, resulting in a high degree of
automation and accuracy. We have implemented this
technique in MATLAB R2010a.
REFERENCES
[1]. "Investigating Fraudulent Acts, University Of Houston System
Administrative Memorandum",
http://www.uhsa.uh.edu/samiAM/01C04.htm, 2000.
[2]. http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Fraud.html
[3]. L. Cortesão, F. Martins, A. Rosa, P. Carvalho, "Fraud Management
Systems in Telecommunications: a practical approach", ICT, 2005.
[4]. Davis, A. B. and S. K. Goyal (1993). Management of cellular
fraud: Knowledge-based detection, classification and prevention.
In Proceedings of the 13th International Conference on Artificial
Intelligence, Expert Systems and Natural Language, Avignon,
France, Volume 2, pp. 155–164.
[5]. Johnson, M. (1996). Cause and effect of telecoms fraud.
Telecommunication (International Edition) 30(12), 80–84.
[6]. Parker, T. (1996). The twists and turns of fraud. Telephony 231
(supplement issue), 18–21.
[7]. O'Shea, D. (1997). Beating the bugs: Telecom fraud. Telephony
232(3), 24.
[8]. Pequeno, K. A. (1997). Real-time fraud detection: Telecom's next
big step. Telecommunications (Americas Edition) 31(5), 59–60.
[9]. Hoath, P. (1998). Telecoms fraud, the gory details. Computer
Fraud & Security 20(1), 10–14.
[10]. Barson, P., S. Field, N. Davey, G. McAskie, and R. Frank
(1996). The detection of fraud in mobile phone networks. Neural
Network World 6(4), 477–484.
[11]. Phone fraud: The battle rages on. Communication News, 33(8):8,
1996.
[12]. T. Fawcett and F. Provost. Combining data mining and machine
learning for effective user profiling. In E. Simoudis, J. Han, and
U. Fayyad, editors, Proceedings of the 13th International
Conference on Machine Learning, pages 8–13, 1996.
[13]. T. Fawcett and F. Provost. Adaptive fraud detection. Journal of
Data mining and Knowledge Discovery, 1(3):291–316, 1997.
[14]. A.B. Davis and S. K. Goyal. Management of cellular fraud:
Knowledge based detection,
classification and prevention. In
Proceedings of the 13th International Conference on Artificial
Intelligence, Expert Systems and Natural Language, Vol. 2, pages
155–164, Avignon, France, 1993.
[15]. Ben-Gal I., Outlier detection, In: Maimon O. and Rokach L.
(Eds.) Data Mining and Knowledge Discovery Handbook: A
Complete Guide for Practitioners and Researchers, Kluwer
Academic Publishers, 2005, ISBN 0-387-24435-2.
[16]. H. Kvarnstrom, E. Lundin, and E. Jonsson. Combining fraud and
intrusion detection – meeting new requirements –. In Proceedings
of the 5th Nordic Workshop on Secure IT systems
(NordSec2000), Reykjavik, Iceland, October 2000.
[17]. R. A. Shaffer. Good guys, bad guys and digital. Forbes,
154(4):122, 1994.
[18]. P. Barson, S. Field, N. Davey, G. McAskie, and R. Frank. The
detection of fraud in mobile phone networks. Neural Network
World, 6(4):477–484, 1996.
[19]. S. Field and P. Hobson. Techniques for telecommunications fraud
management. In J. Alspector, R. Goodman, and T. X. Brown,
editors, Proceedings of International Workshop on Applications
of Neural Networks to Telecommunications, Vol. 3, pages 107–
115, New Jersey, USA, 1997. Lawrence Erlbaum.
[20]. J. Hollmen. User Profiling and Classification for Fraud Detection
in Mobile Communication Networks. PhD thesis, Helsinki
University of Technology, Department of Cognitive and
Computer Science and Engineering, 2000. Espoo, Finland.
[21]. Victoria J. Hodge and Jim Austin, "A Survey of Outlier Detection
Methodologies", Kluwer Academic Publishers, 2004.
[22]. Jiang Lingmin. Clustering Algorithm to Check Outlier Based on
Similar Coefficient Sum. Computer Engineering. Vol.29, Nov,
2003, pp.183-185.
[23]. Arora R., Pahwa P., Bansal S., "Alliance Rules for Data
Warehouse Cleansing", International Conference on Signal
Processing Systems IEEE, pages 743-747, 2009.
[24]. Han J., Kamber M., "Data Mining: Concepts and Techniques",
2nd ed. ISBN 1-55860-901-6 Morgan Kaufmann Publishers,
March 2006.