CREDIT CARD FRAUD DETECTION BASED ON BEHAVIOR MINING
NIMISHA PHILIP1, SHERLY K.K2
Department of Computer Science & Engineering, Toc H Institute of Science & Technology
email:[email protected]
Department of Information Technology, Toc H Institute of Science & Technology, Ernakulam, Kerala, India
[email protected]
Abstract - Globalization and the increased use of the Internet for online shopping have resulted in a considerable proliferation of
credit card transactions throughout the world. The higher acceptability and convenience of credit cards for purchases have not only given
personal comfort to customers but also attracted a large number of attackers. As a result, credit card payment systems must be
supported by an efficient fraud detection capability to minimize unwanted activities by adversaries. Most of the well-known
algorithms for fraud detection are based on supervised training. Every cardholder has a certain shopping behavior, which
establishes an activity profile for him. Existing fraud detection systems (FDS) try to capture behavioral patterns as rules, which are static. This becomes
ineffective when the cardholder develops new patterns of behavior. Here, we propose an unsupervised method to dynamically profile
the behavior pattern of a customer. Incoming transactions are then compared against the user profile to indicate anomalies,
based on which the corresponding warnings are output. An FP-tree based pattern matching algorithm is used to evaluate how
unusual the new transactions are.
Key words- Fraud detection, adaptive profiling, FP tree, Behavior mining
Fraud detection based on the analysis of the existing purchase data of a cardholder is a promising way
to reduce the rate of successful credit card fraud. Deviation from the cardholder's usual spending
patterns is a potential threat to the system.
I. INTRODUCTION
Due to rapid advancement in electronic commerce technology, the use of credit cards has
dramatically increased. As the credit card becomes the most popular mode of payment for both
online and regular purchases, cases of fraud associated with it are also rising. A few of the
many ways in which money can be stolen from a credit card are phishing, pharming, skimming and
dumpster diving. Hackers and fraudsters are becoming more sophisticated and skillful at
manipulating Internet protocols, web languages and tools to discover any weakness that they can
exploit. As a result, Internet transaction fraud is 12 times higher than in-store fraud.
II. PROBLEMS WITH CREDIT CARD FRAUD DETECTION
One of the biggest problems associated with credit card fraud detection is the lack of both
literature reporting experimental results and real-world data for researchers to perform
experiments on. This is because fraud detection is often associated with sensitive financial
data that is kept confidential for reasons of customer privacy.
Credit-card-based purchases can be categorized into two types: 1) physical card and 2) virtual
card. In a physical-card-based purchase, the cardholder presents his card physically to a
merchant to make a payment. To carry out fraudulent transactions in this kind of purchase, an
attacker has to steal the credit card. If the cardholder does not realize the loss of the card,
it can lead to a substantial financial loss to the credit card company. In the second kind of
purchase, only some important information about a card (card number, expiration date, secure
code) is required to make the payment. Such purchases are normally made on the Internet or over
the telephone. To commit fraud in these types of purchases, a fraudster simply needs to know
the card details. Most of the time, the genuine cardholder is not aware that someone else has
seen or stolen his card information. The only way to detect this kind of fraud is to analyze
the spending patterns on every card and to identify any inconsistency with respect to the
“usual” spending patterns.
A fraud detection system should have the following properties in order to produce good results:
• The system should be able to handle skewed distributions.
• The system should be able to handle noise.
• The system should be able to handle overlapping data.
• The system should be able to adapt itself to new kinds of fraud.
• There is a need for good metrics to evaluate the classifier system.
• The system should take into account the cost of the fraudulent behavior detected and the
cost associated with stopping it.
TIST.Int.J.Sci.Tech.Res.,Vol.1 (2012), 7-12
The example rule day(“Saturday”) ∧ time(“8pm-10pm”) → play(“Xbox contest”) indicates that, of
all transactions of a customer recorded in a time window, 19% (the support) are playing
“Xbox contest” on “Saturday” during “8pm-10pm”. There is a 70% probability (the confidence)
that a transaction happening on “Saturday” during “8pm-10pm” is an Xbox contest.
III. RELATED WORK
Most previous work on fraud detection or anomaly detection was classification based.
Well-known algorithms utilized for fraud detection are NB (naïve Bayesian), proposed by
Elkan et al [10], C4.5, by Quinlan [16], and BP (back-propagation) [2]. According to the
research of Domingos [8], Elkan [9][10] and Witten [8], the NB algorithm is very effective on
many real-world data sets and is extremely efficient in that it learns both linear and
nonlinear relationships directly from the data being modeled, in a linear fashion.
However, when attributes are redundant and not normally distributed, its predictive accuracy
is reduced. C4.5 can not only output accurate predictions but also explain the patterns in the
data as a decision tree and rule set. However, scalability and efficiency problems, such as a
substantial decrease in performance, can occur when C4.5 is applied to large data sets.
Back-propagation neural networks can process a very large number of instances and have a high
tolerance to noisy data, but the BP algorithm requires long training times and extensive
testing and retraining of parameters. The common disadvantage of all these algorithms is that
they rely on supervised training, which requires human involvement to prepare training cases
and test cases to optimize parameters.
To solve this problem, a fraud detection system is proposed to detect fraudulent transactions
in an online system. It dynamically profiles the user's behavior pattern. Since most online
systems contain non-stationary data, the system is able to adjust its detector to keep up with
changes in user behavior.
Brin et al [4] introduced a dynamic itemset counting technique to reduce the number of
database scans. Ozden et al [15] presented cyclic and interesting association rule mining.
Ng et al [14] introduced a constraint-based rule mining technique. Cheung et al [6] presented
an incremental updating technique to discover the association rules in a large-scale database.
Some popular rule generators, for example RL proposed by Clearwater [7] and C4.5 by
Quinlan [16], are based on supervised learning.
An FP-tree (frequent pattern tree) structure and the FP-tree growth algorithm, proposed by
Han [12], are utilized to uncover the hidden association rules from the recent transactions of
a user.
V. FRAUD DETECTION SYSTEM
An online transaction system includes several web applications and services that provide OLTP
(online transaction processing), an FDS (Fraud Detection System), a database storing
transaction data, and a database replication that keeps the performance degradation caused on
OLTP by backend data processing or analysis to a minimum. The FDS is a backend process whose
impact on the front end of the online system is minimized, since it only talks to the
replication database.
The Fraud Detection System consists of three major modules:
(1) The data engine serves as an interface between the replication database and the FDS. It
collects and pre-formats the recent transactions of all individual customers in the online
system.
(2) The rule engine mines the recent transactions to generate a profile for each user, namely
an association rule set stored in an FP-tree.
(3) The rule monitor monitors the new transactions of every user. Any new transaction of a
particular user is compared against the FP-tree for that user to indicate an anomaly.
IV. BASIC IDEA
The association rule was first introduced by Agrawal et al [1][17]. The following is a formal
statement of the problem: Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of literals, called
items. Let $D$ be a set of transactions, where each transaction $T$ is a set of items such that
$T \subseteq I$. Associated with each transaction is a unique identifier, called its TID. A
transaction $T$ contains $X$, a set of some items in $I$, if $X \subseteq T$. An association
rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and
$X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in the transaction set $D$ with
confidence $c$ if $c\%$ of transactions in $D$ that contain $X$ also contain $Y$. The rule
$X \Rightarrow Y$ has support $s$ in the transaction set $D$ if $s\%$ of transactions in $D$
contain $X \cup Y$.
Two important measures for association rules, support and confidence, are interpreted as
follows. Support is the statistical significance of an association rule; a high support is
often desirable. The confidence of a rule indicates the degree of correlation in the dataset
between X and Y and is used as a measure of the rule's strength. A large confidence is often
required for association rules.
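As a minimal sketch of these two measures, the following Python computes the support and confidence of a rule X → Y over a small transaction set. The function names and the toy data are illustrative assumptions, not part of the paper; values are returned as fractions rather than percentages.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y."""
    sx = support(transactions, x)
    if sx == 0:
        return 0.0
    return support(transactions, set(x) | set(y)) / sx


# Hypothetical toy data: each transaction is a set of attribute-value items.
transactions = [
    {"Saturday", "8pm-10pm", "Xbox contest"},
    {"Saturday", "8pm-10pm", "Xbox contest"},
    {"Saturday", "8pm-10pm", "books"},
    {"Sunday", "morning", "groceries"},
]
x = {"Saturday", "8pm-10pm"}
y = {"Xbox contest"}
s = support(transactions, x | y)    # 2 of the 4 transactions contain X and Y
c = confidence(transactions, x, y)  # 2 of the 3 transactions containing X
```

Here the rule holds with support 0.5 and confidence 2/3 on the toy data.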
An example of an association rule:
day(“Saturday”) ∧ time(“8pm-10pm”) → play(“Xbox contest”) [support=19%, confidence=70%].
FIG 1. ARCHITECTURE OF FDS
5. Until no change
A. Data Engine
The data engine collects and pre-formats the most recent transactions of an individual user
from a transaction database; these are analyzed to profile his current behavior. “Recent”
refers to a sliding window, which could be a time window or a transaction count window. For
example, recent transactions could be all the transactions in the past two months, or the most
recent 500 transactions.
A monthly spending sequence is considered, where a month is divided into four weeks and a week
is divided into seven days: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday.
A day is divided into two parts, morning and evening. The IP address of the merchant and the
name of the city where he is located are also considered in capturing the user profile. The
items purchased by the user are also considered. They are classified into electronic items,
clothes, books, groceries, miscellaneous, etc.
Marketing-related statistical studies show that most people exhibit a consistent spending
nature, although the consistency may be disturbed to a certain extent by emergencies or
accidental causes. Each transaction is an itemset, that is, a set of items or attribute-value
pairs. Normally, the transactions that are stored in the issuing bank's database contain many
attributes. Since humans tend to exhibit specific behavioristic profiles, every cardholder can
be represented by a set of patterns containing information about the typical purchase
category, where they shop, in which country, with what merchants, the time since the last
purchase, the amount of money spent, and customer demographic features such as IP address and
city name. The values of the attributes are then converted into categorical variables.
If the user's city matches the merchant's city, the purchase is local. If the user's country
matches the merchant's country, the purchase is national; otherwise it is international.
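The conversion of a raw transaction into categorical items might look like the sketch below. Everything beyond the local/national/international rule is an assumption made for illustration: the noon boundary between morning and evening, the grouping of an IP address by its first two octets (suggested by values such as 139.168 in the example table), and all function and field names.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def purchase_area(user_city, user_country, merchant_city, merchant_country):
    """Classify a purchase as local, national or international."""
    if user_city == merchant_city:
        return "local"
    if user_country == merchant_country:
        return "national"
    return "international"


def group_ip(ip, octets=2):
    """Assumed grouping: keep only the leading octets of an IPv4 address."""
    return ".".join(ip.split(".")[:octets])


def day_part(ts):
    """Assumed split of a day: before noon is morning, otherwise evening."""
    return "morning" if ts.hour < 12 else "evening"


def to_itemset(category, ts, merchant_ip, area, amount_cluster):
    """Build the categorical itemset for one transaction."""
    return (category, DAYS[ts.weekday()], day_part(ts),
            group_ip(merchant_ip), area, amount_cluster)


area = purchase_area("Kochi", "India", "Austin", "USA")
item = to_itemset("electronics", datetime(2012, 3, 3, 20, 15),
                  "139.168.10.7", area, "cm")
```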
An example of recent transactions for a customer for
a typical online transaction system is shown below. Each
transaction includes the attributes such as purchased
product category, time (in weekday), time (in day part),
grouped IP address, the grouped purchased amount and
area of purchase.
TABLE I: Recent Transactions of a Customer
Tid   Transaction data
1     ET,ST,EV,139.168,international,cm
2     ET,ST,MR,202.55,national,cl
3     ET,SU,MR,139.168,international,ch
4     BK,ST,EV,139.168,international,cl
5     CL,ST,EV,139.168,international,cl
The abbreviations: ET-electronic items, BK-books, CL-clothes, GR-groceries,
Misc-miscellaneous, MO-Monday, TUE-Tuesday, WED-Wednesday, THUR-Thursday, FRI-Friday,
ST-Saturday, SU-Sunday, MR-morning, EV-evening, cl-low amount, cm-medium amount,
ch-high amount.
In credit card transaction processing, every card has a certain credit limit, and any
transaction within the available credit limit is a syntactically valid transaction. Credit
cardholders, however, normally carry out transactions for values much lower than the credit
limit of the card. So, spending behavior related to the transaction amount is captured by
clustering the amounts into three clusters, high amount (ch), medium amount (cm) and low
amount (cl), using the k-means clustering algorithm. K-means is an unsupervised learning
algorithm for grouping a given set of data based on similarity in their attribute values.
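As a sketch of how transaction amounts might be clustered into the low (cl), medium (cm) and high (ch) groups with k-means, consider the one-dimensional version below. The paper chooses the initial centers arbitrarily; this sketch assumes a deterministic start from evenly spaced sorted values so that the result is reproducible, and the sample amounts are made up.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Plain k-means on scalar values; returns (centers, assignment)."""
    data = sorted(values)
    # Assumption: start from k evenly spaced values of the sorted data
    # instead of an arbitrary choice, to keep the sketch deterministic.
    centers = [float(data[i * (len(data) - 1) // (k - 1)]) for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # (Re)assign each value to the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda j: abs(v - centers[j])) for v in values
        ]
        if new_assignment == assignment:  # until no change
            break
        assignment = new_assignment
        # Update each cluster mean from its current members.
        for j in range(k):
            members = [v for v, a in zip(values, assignment) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, assignment


def label_amounts(values):
    """Map each amount to cl, cm or ch by the rank of its cluster mean."""
    centers, assignment = kmeans_1d(values, k=3)
    order = sorted(range(3), key=lambda j: centers[j])
    names = {order[0]: "cl", order[1]: "cm", order[2]: "ch"}
    return [names[a] for a in assignment]


amounts = [12, 15, 9, 110, 95, 120, 900, 1050]  # hypothetical amounts
labels = label_amounts(amounts)
```

On this sample the three smallest amounts fall into cl, the three middle ones into cm and the two largest into ch.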
K-means clustering algorithm
Algorithm: k-means. The k-means algorithm for partitioning, where each cluster center is
represented by the mean value of the objects in the cluster.
Input: k, the number of clusters
D, a data set containing n objects
Output: A set of k clusters
Method:
1. Arbitrarily choose k objects from D as the initial cluster centers
2. Repeat steps 3 and 4 until the assignments no longer change
3. (Re)assign each object to the cluster to which the object is the most similar, based on the
mean value of the objects in the cluster
4. Update the cluster means; that is, calculate the mean value of the objects for each cluster
B. Rule Engine
The major responsibility of the rule engine is to adaptively generate association rule sets to
profile user behaviors.
Set min_sup to 60% and min_conf to 65%. The occurrence frequency of an itemset is the number
of transactions that contain the itemset (the support count). If the frequency of an itemset
is greater than or equal to the product of min_sup and the total number of transactions, then
it is a frequent itemset. Since rarely occurring items are filtered out when the frequent
itemsets are generated, rarely occurring behavior such as fraudulent actions is filtered out
as well.
We use the FP-tree (Frequent Pattern tree) growth algorithm proposed by Han [12] to extract
the associations among features from transactions during a certain period in order to profile
the user's behavior. Changes in these patterns may signal fraudulent use, so the FP-tree is
used to look for behavior changes that may imply fraud.
FP-tree structure is used to store compressed,
crucial information about frequent patterns. It consists of
a linked table and a prefix tree, which store quantitative
information about frequent patterns.
Different kinds of frequent items are of different importance when profiling a user or a
system. For example, a matched IP pattern could be more important than a matched time pattern.
A weight function, weight(ti), is therefore used to give different emphasis to the different
item types: when a new transaction is matched against the FP-tree, sim_credit(ti) is increased
by G(s, c) × weight(ti) instead of G(s, c). The weight function is a fixed lookup table, which
maps the different item types to different weights. A neural network is trained on real data
to obtain the optimized lookup table.
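The weighted credit can be illustrated with a few lines of Python. The function G follows the form the paper gives for the rule monitor, G(s, c) = -s × log2(1 + ε - c); the weight values, the item type names and the default ε below are hypothetical placeholders, since the paper obtains its table from a trained neural network and does not list the values.

```python
import math


def g(s, c, eps=0.01):
    """G(s, c) = -s * log2(1 + eps - c), with support s <= confidence c <= 1."""
    return -s * math.log2(1.0 + eps - c)


# Hypothetical lookup table from item type to weight (not from the paper).
WEIGHTS = {"ip": 2.0, "time": 1.0}


def weighted_credit(item_type, s, c, eps=0.01):
    """Credit added to sim_credit for one matched node of the given type."""
    return g(s, c, eps) * WEIGHTS.get(item_type, 1.0)


credit_ip = weighted_credit("ip", 0.5, 1.0, eps=0.25)
credit_time = weighted_credit("time", 0.5, 1.0, eps=0.25)
```

With these placeholder weights, a matched IP item contributes twice the credit of a matched time item for the same support and confidence.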
An FP-tree is constructed from an empty root and a header table. A first scan of the
transactions collects the frequent items and their supports. Branches of the FP-tree are then
inserted into the tree by scanning the transactions a second time. The items in each
transaction are processed in order of descending support, and a branch is created for each
transaction. When two branches share a common prefix, the shared path is merged and the count
of each node on that path is increased by one. To make tree traversal easy, a header table is
built so that each item points to its occurrences in the tree via a chain of node links.
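The two-scan construction can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: the header table is a plain dictionary of node lists instead of a linked chain, ties in support are broken alphabetically (an assumption), and the input rows are the five example transactions of Table I.

```python
import math
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and parent/child links."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_sup):
    n = len(transactions)
    # First scan: support count of every item.
    counts = Counter(item for t in transactions for item in set(t))
    min_count = math.ceil(min_sup * n - 1e-9)  # tolerate float rounding
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = {}  # item -> list of tree nodes holding that item
    # Second scan: insert each transaction as a branch.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1  # shared prefixes are merged, counts add up
            node = child
    return root, header


rows = [
    "ET,ST,EV,139.168,international,cm",
    "ET,ST,MR,202.55,national,cl",
    "ET,SU,MR,139.168,international,ch",
    "BK,ST,EV,139.168,international,cl",
    "CL,ST,EV,139.168,international,cl",
]
root, header = build_fp_tree([r.split(",") for r in rows], min_sup=0.6)
```

With min_sup at 60%, items that occur in fewer than three of the five transactions (for example cm or BK) never enter the tree.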
C. Rule Monitor
The customer's profile is utilized to monitor a new transaction and indicate how unusual it
is. At the same time, the FP-tree is updated adaptively, without human involvement, by
accumulating the occurrences of the attributes in new transactions of an individual customer.
Two techniques are utilized to build the rule monitor. To indicate the anomaly of a new
transaction, an FP-tree based pattern matching algorithm is designed. In addition, an alert
accumulating algorithm is used to lower the false alarm rate and to detect a set of fraudulent
transactions with low suspicious values.
1. FP-tree based pattern matching algorithm
Suppose T = {t1, t2, …, tn} is an incoming transaction. For each frequent item ti, calculate a
similarity credit sim_credit(ti) by the following steps:
Double SimMatch(T) {
    sim = 0.0;
    For each item ti in T {
        If (found headtablelink for ti in the head table) {
            sim_credit = 0.0;
            for each tree node Nij in headtablelink
                if (Pj(ti) ⊆ T)
                    sim_credit += G(Nij.s, Nij.c) * weight(ti);
            sim += sim_credit;
        }
    }
    return sim;
}
For implementation, $G(s, c) = -s \times \log_2(1 + \varepsilon - c)$, where s is the support,
c is the confidence, s ≤ c ≤ 1.0, and ε is a real number used to specify the upper boundary of
the function G. The function is chosen by the intuition that if a new transaction matches a
rule having a larger probability, it is more likely to match the user's behavior, and the
confidence value is emphasized.
sim(T) represents the extent to which a new transaction is comparable to the customer's normal
behavior patterns. It is compared against a set of thresholds to determine the corresponding
fraud likelihood.
2. Alert Accumulating Algorithm
A set of thresholds is set up to fire the corresponding fraud warnings. Technically, it is
possible to misclassify some legal transactions that do not follow the customer's normal
behavior. Because the frequent items are filtered by min support before the FP-tree is
created, the tree is not complete; very unusual patterns are not collected in the tree.
Moreover, a customer could also suddenly change his or her behavior. Another important issue
is that, in order to minimize the fraud detection cost, the purchase amount is a factor in
firing an alarm. For a very small purchase amount, for example $0.50, firing an alarm is not
economical even if the transaction is highly suspicious, since the objective of fraud
detection is to minimize the total cost. However, by using a suspicion threshold for a single
transaction, a sequence of fraudulent transactions with low purchase amounts could be missed.
To solve these problems, accumulate the warnings from a set of new transactions and calculate
the alert value for the set by accumulating their suspicious values. The transactions to be
processed could include all the new transactions since the last FP-tree update, or an expiring
time window can be used to specify the transactions to accumulate. Then, by comparing the
alert sum instead of a single alert against a set of thresholds, a corresponding fraud alarm
is fired. The threshold set is decided by the detection sensitivity specified by the user.
Accumulate all transactions within the specified time window. A simple step function is the
most straightforward expiring function: if the time t of a transaction is later than T2, f(t)
equals 1; otherwise f(t) equals 0.
The fraud alert value is calculated by:
$\mathrm{AlertValue} = \sum_{i=1}^{n-1} s(T_i) \times f(T_i) \times \mathrm{amount}(T_i)$
where Ti is a transaction in the accumulation window, s(Ti) is the suspicious value of Ti,
f(Ti) is the expiring weight of Ti specified by the expiring function, and amount(Ti) is the
money consumed by transaction Ti. It is reasonable that several highly suspicious transactions
with very low consumed points will not fire an alert; by contrast, a single highly suspicious
transaction with very high consumed points could fire an alert.
In the Genuine Markov Chain Module (GMCM) of the transaction simulator, a finite Markov chain
has an associated transition probability matrix (TPM) and initial probability distribution
vector (IPD). Here, the number of states is three, representing the three clusters. This block
generates synthetic transaction amounts for the genuine cardholder. The values associated with
the TPM and IPD can be changed to capture the spending behavior of a cardholder properly.
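A three-state Markov chain of the kind used by the Genuine Markov Chain Module, with a transition probability matrix (TPM) and an initial probability distribution (IPD) over the amount clusters, could be sketched as follows. The probabilities and the amount range attached to each cluster are hypothetical values chosen for illustration; the paper does not list them.

```python
import random

CLUSTERS = ["cl", "cm", "ch"]
# Hypothetical amount range for each cluster (not from the paper).
AMOUNT_RANGE = {"cl": (1, 50), "cm": (50, 500), "ch": (500, 5000)}


def generate_amounts(ipd, tpm, n, rng):
    """Walk the chain for n steps; emit (cluster, amount) at each step."""
    out = []
    state = rng.choices(range(3), weights=ipd)[0]
    for _ in range(n):
        label = CLUSTERS[state]
        low, high = AMOUNT_RANGE[label]
        out.append((label, round(rng.uniform(low, high), 2)))
        # The next state depends only on the current state's TPM row.
        state = rng.choices(range(3), weights=tpm[state])[0]
    return out


ipd = [0.6, 0.3, 0.1]        # hypothetical initial distribution
tpm = [[0.7, 0.2, 0.1],      # hypothetical transition matrix, rows sum to 1
       [0.3, 0.6, 0.1],
       [0.4, 0.4, 0.2]]
sample = generate_amounts(ipd, tpm, 10, random.Random(42))
```

Changing the rows of `tpm` shifts how often the simulated cardholder moves between low, medium and high spending.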
3. Output
The output of the FDS is an alert value indicating the suspicious level of the analyzed
transactions. This value is sent back to a web application. By comparing the alert value to a
set of thresholds, we are able to determine the corresponding reaction to be performed. For
example, if the alert value for a transaction or a set of transactions is below .45, no
warning is given. If the alert value is in the range of .45 to .85, an email is sent to the
user to give a fraud warning. If the alert value is higher than .85, the account is
temporarily locked to prevent further fraud or loss, and the user is informed by an email or a
message to verify the transaction and unlock his or her account. If the user confirms that
this is a fraudulent transaction, further protection should be performed, for example changing
the password. If the user confirms that this is a legal transaction, his account is unlocked
and the transaction is added to the recent transaction set, from which the customer's FP-tree
is updated. The user profile, that is, the FP-tree, can therefore change adaptively to keep up
with changes in the user's behavior.
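The accumulation of alert values and the threshold-based reaction can be sketched together in Python. The thresholds .45 and .85 are the example values from the text; the tuple layout, the function names and the assumption that amounts are already scaled so that the sum is comparable to those thresholds are illustrative only. The sketch sums over every transaction in the window.

```python
def step_expiry(t, t2):
    """Step expiring function: 1 if transaction time t is later than T2, else 0."""
    return 1 if t > t2 else 0


def alert_value(window, t2):
    """Sum s(Ti) * f(Ti) * amount(Ti) over (s, t, amount) tuples in the window."""
    return sum(s * step_expiry(t, t2) * amount for s, t, amount in window)


def reaction(alert, warn=0.45, lock=0.85):
    """Map an alert value to the reaction described in the text."""
    if alert > lock:
        return "lock account and notify user"
    if alert >= warn:
        return "send warning email"
    return "no warning"


# Hypothetical window: (suspicious value, time, scaled amount).
window = [(0.9, 10, 0.5), (0.8, 3, 1.0), (0.2, 12, 1.0)]
value = alert_value(window, t2=5)  # the transaction at time 3 has expired
action = reaction(value)
```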
Fraud Markov Chain Module (FMCM): Similar to the GMCM, this block generates synthetic
fraudulent transaction amounts. The values associated with the TPM and IPD can also be changed
in order to capture the changing behavior of the fraudster.
We use standard performance metrics to analyze the
different test cases. True Positive (TP) is the percentage
of fraudulent transactions identified as fraudulent. False
Positive (FP) is the percentage of genuine transactions
identified as fraudulent. It is important to achieve high TP
along with low FP in a fraud detection system. However,
design constraints are such that any attempt to improve
TP results in higher FP. So we empirically determine the
design parameters of the system and then simulate by
varying inputs.
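The two metrics can be computed from labeled outcomes as in the short sketch below; the function name and the sample labels are illustrative assumptions.

```python
def tp_fp_rates(actual, predicted):
    """Return (TP %, FP %); True marks a fraudulent transaction."""
    frauds = sum(1 for a in actual if a)
    genuine = len(actual) - frauds
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    tp_rate = 100.0 * tp / frauds if frauds else 0.0
    fp_rate = 100.0 * fp / genuine if genuine else 0.0
    return tp_rate, fp_rate


actual = [True, True, False, False, False, False]      # hypothetical labels
predicted = [True, False, True, False, False, False]   # hypothetical output
tp_rate, fp_rate = tp_fp_rates(actual, predicted)
```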
VI. CONCLUSION
A novel fraud detection framework is proposed. An individual user's behavior pattern is
dynamically profiled from the transactions by using a set of association rules. An FP-tree
(frequent pattern tree) structure and the FP-tree growth algorithm are utilized to uncover
these hidden association rules from the recent transactions of the user. The incoming
transactions for that user are then compared against the profile in order to discover
anomalies, based on which the corresponding warnings are output. The FP-tree growth algorithm
has been improved and used for pattern matching. The real-world scenario is captured using a
Markov Modulated Poisson Process. Unsupervised training and self-adjustment to changing user
behavior make the proposed system effective for monitoring online transaction systems and
providing fraud detection and protection.
4. Data Preparation
In real life, any actual credit card database contains fraudulent transactions interspersed
with genuine transactions, and these two types of transactions are generated by two different
parties, namely genuine cardholders and fraudsters. Hence these are independent events with
separate arrival rates. The transaction arrival rate is also an important parameter for
evaluating a fraud detection system. In the credit card fraud detection domain, the occurrence
of fraudulent transactions is very low compared to the number of genuine transactions. Hence,
the mixing of genuine and fraudulent transactions needs to be controlled properly.
However, the existing synthetic data generation models are not capable of controlling
transaction arrival rates or the mixing of the two types of transactions. This real-life
scenario is captured more accurately using a Markov Modulated Poisson Process (MMPP).
Therefore, we have constructed an MMPP-based transaction simulator consisting of three main
components: the Markov Modulated Poisson Process Module (MMPPM), the Genuine Markov Chain
Module (GMCM) and the Fraud Markov Chain Module (FMCM).
VII. REFERENCES
[1]. Agrawal, R. and Srikant, R. (1993): Fast algorithms for mining
association rules. In Proc. of the 20th Intl. Conf. on Very Large
Data Bases, pp.478–499. Santiago, Chile.
[2]. Agrawal, R. and Srikant, R. (1995): Mining sequential patterns. In
Proc. of the International Conference on Data Engineering, 3–14.
Taipei, Taiwan.
[3]. Xu, J., Sung, A. H. and Liu, Q. (2005): Online fraud detection system based on
non-stationery anomaly detection. The International Conference on Security and Management.
[4]. Brin, S., Motwani, R. and Silverstein, C. (1997): Beyond market
basket: generalizing association rules to correlations. In Proc. of
the ACM SIGMOD Intl. Conf. on Management of Data, pp. 265–
276. Tucson, Arizona, USA.
[5]. Chan, P. and Stolfo, S. (1998): Toward scalable learning with non-uniform class and cost
distributions: A case study in credit card fraud detection. Proc. of the Fourth International
Conference on Knowledge Discovery and Data Mining, pp.164–168.
Markov Modulated Poisson Process Module (MMPPM): An MMPP is a doubly stochastic Poisson
process, where the arrival rate is determined by the state of a Markov chain. The MMPPM has
two states: the Genuine state (G) and the Fraud state (F). The Poisson arrival rate takes
discrete values corresponding to each state. Frequent items are captured using the Poisson
process, and the arrival rate vector represents the arrival rate for different weeks. In the G
state, the Poisson arrival rate vector is $\lambda_G$, while in F the arrival rate vector is
$\lambda_F$. $p_{GF}$ is the transition probability from G to F, whereas $p_{FG}$ is the
transition probability from F to G.
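A discrete-time approximation of the two-state MMPP can be sketched in Python as below. This is an illustration under simplifying assumptions, not the authors' simulator: each state uses a single scalar rate instead of a per-week rate vector, the state may switch once per time step, and the Poisson counts are drawn with Knuth's multiplication method, which is adequate for small rates.

```python
import math
import random


def poisson(lam, rng):
    """Sample a Poisson count with mean lam (Knuth's method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_mmpp(lam_g, lam_f, p_gf, p_fg, steps, rng):
    """Return a list of (state, number of arrivals) per time step."""
    state, out = "G", []
    for _ in range(steps):
        lam = lam_g if state == "G" else lam_f
        out.append((state, poisson(lam, rng)))
        # Switch state with probability p_GF (from G) or p_FG (from F).
        if state == "G" and rng.random() < p_gf:
            state = "F"
        elif state == "F" and rng.random() < p_fg:
            state = "G"
    return out


# Hypothetical rates and transition probabilities.
trace = simulate_mmpp(lam_g=5.0, lam_f=1.0, p_gf=0.05, p_fg=0.5,
                      steps=20, rng=random.Random(7))
```

Keeping `p_gf` small relative to `p_fg` keeps fraudulent periods rare and short, in line with the low share of fraudulent transactions noted in the text.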
Genuine Markov Chain Module (GMCM): This block consists of a finite Markov chain with an
associated transition probability matrix (TPM) and initial probability distribution vector
(IPD).
[6]. Cheung, D., Han, J., Ng, V. and Wang, C. (1996): Maintenance of
discovered association rules in large databases: an incremental
updating technique. In Proc. of the Intl. Conf. on Data
Engineering, pp.106–114. New Orleans, Louisiana, USA.
[7]. Clearwater, S. and Provost, F. (1993): RL4: A tool for knowledge-based induction. In Proc.
of the Second International IEEE Conference on Tools for Artificial Intelligence, pp. 24–30.
[8]. Domingos, P. and Pazzani, M. (1996): Beyond independence:
conditions for the optimality of the simple Bayesian classifier, in
Proc. of the 13th Conference on Machine Learning, pp.105–112,
Bari, Italy.
[9]. Elkan, C. (2000): Magical thinking in data mining: lessons from
CoIL challenge 2000, Department of Computer Science and
Engineering, University of California, San Diego, USA.
[10]. Elkan, C. (1997): Naïve Bayesian Learning. Technical Report
CS97–557, Department of Computer Science and Engineering,
University of California, San Diego, USA.
[11]. Han, J. and Kamber, M. (2000): Data Mining: Concepts and
Technique, Morgan Kaufmann; 1st edition.
[12]. Han, J., Pei, J. and Yin, Y. (2000): Mining frequent patterns
without candidate generation. SIGMOD’00, 1–12, Dallas, TX,
USA.
[13]. Kerr, K. and Litan, A. (2002): Online transaction fraud and prevention get more
sophisticated, Gartner. http://www.gartnerg2.com/research/rpt-0102-0013.asp
[14]. Ng, R., Lakshmanan, L., Han, J. and Pang, A. (1998): Exploratory
mining and pruning optimizations of constrained association rules.
In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data,
13–24. Seattle, Washington, USA.
[15]. Ozden, B., Ramaswamy, A. and Silberschatz, A. (1998):
Cyclic association rules. In Proc. of the Intl. Conf. on Data
Engineering, 412–421.
[16]. Quinlan, J. R. (1993): C4.5: Programs for machine learning.
Morgan Kaufmann, San Mateo, CA, USA.
[17]. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules
between sets of items in large databases. In Proc. of the ACM
SIGMOD Conference on Management of Data, Washington D.C.,
May 1993.
[18]. M. Houtsma and A. Swami. Set-oriented mining of association
rules. Research Report RJ 9567, IBM Almaden Research Center,
San Jose, California, October 1993.