Download THE DATA - Leiden Institute of Advanced Computer Science

DATA SCIENCE APPROACH TO CLIENT TRANSACTIONS ANALYSIS IN TELECOMMUNICATION MARKETING
Mitch Bos and Michael Emmerich
Leiden Institute of Advanced Computer Science
[email protected]
OVERVIEW
Quizzes, games, ringtones and adult content are all part of the telecommunication
marketing business, in which thousands of transactions are made every day. But how can
all this data be used to come up with pricing schemes, predict future profits and target
specific markets? Using different data mining techniques, we will analyse these
transactions and try to find patterns that answer this question.
THE DATA
To find relations and patterns for the telecommunication marketing business, data from
the company Telefuture was used. The data consists of millions of payment transactions.
Customers subscribe to a specific service, and the information from their payments is
stored. Every payment is a separate transaction. The parameters of each payment are: the
service, the time of payment, the status of the payment, the country in which the payment
was made, and the amount. The status tells us whether the payment was completed or not.
At the time of writing, not all of the data I want is available to me yet, but it will be
soon. The data will then be processed and ordered so that the analysis techniques can be
applied to it.
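A single payment with the parameters listed above could be represented as follows. This is only a minimal sketch: the field names, types and sample values are assumptions made for illustration and do not reflect Telefuture's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """One payment; field names are hypothetical, not Telefuture's schema."""
    service: str        # service the customer subscribed to
    paid_at: datetime   # time of payment
    completed: bool     # status: was the payment completed?
    country: str        # country in which the payment was made
    amount: float       # amount charged


def completion_rate(transactions):
    """Fraction of payments whose status is 'completed'."""
    if not transactions:
        return 0.0
    return sum(t.completed for t in transactions) / len(transactions)


# Made-up sample records, for illustration only.
sample = [
    Transaction("quiz", datetime(2015, 1, 5, 10, 30), True, "NL", 1.50),
    Transaction("ringtone", datetime(2015, 1, 5, 11, 0), False, "BE", 3.00),
    Transaction("quiz", datetime(2015, 1, 6, 9, 15), True, "NL", 1.50),
    Transaction("game", datetime(2015, 1, 6, 20, 45), True, "DE", 2.00),
]
print(completion_rate(sample))
```

Keeping the payment status as a separate field makes it easy to restrict later analysis to completed payments only.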
RESEARCH QUESTION
Given large data volumes from past transactions of a telecommunication marketing firm:
are there patterns in the transaction data that can be used to determine pricing schemes
for clients? And how can we make the payment process more streamlined and efficient?
RELEVANT WORK
The concept of Association Rule Mining was first introduced by Agrawal, Imielinski and
Swami. It was based on customers who fill a basket with goods in the supermarket. By
analysing these baskets, patterns could be found. This concept is also called Market
Basket Analysis.
PREDICTIVE ANALYSIS
Predictive analysis is a concept that includes a variety of techniques from predictive
modelling, data mining and machine learning. It uses historical data and transactions to
forecast future opportunities.
Neural Networks
A neural network is a powerful analysis technique that mimics the neurons of the
biological brain. Neural networks learn through training: the inputs and the expected
outputs are known, but the mapping from one to the other is not and has to be learned.
EXPLORATORY ANALYSIS
Association Rule Mining
Association Rule Mining was introduced to discover patterns between different products in
a large-scale data set of transactions. The best-known example is the rule:
{onions, potatoes} ⇒ {burgers}
This rule means that a customer who buys onions and potatoes is also likely to buy
hamburger meat. Businesses can use such rules for different marketing and pricing
strategies.
To find only the significant rules in the data set, two constraints are used: a minimum
support and a minimum confidence. The support tells how often a specific itemset appears
in the data set, and the confidence tells how often a specific rule holds in the
transactions that contain its left-hand side.
Agrawal and Srikant later introduced two algorithms that find association rules quickly:
Apriori and AprioriTid. A second approach, the FP-growth (frequent pattern growth)
algorithm, was introduced by Han, Pei and Yin. An advantage over the Apriori algorithm is
that FP-growth reads the data set only twice. This is the algorithm we will use in our
research.
The rules are found in two steps:
1. The minimum support is used to find all of the frequent itemsets.
2. The minimum confidence is used to derive rules from these frequent itemsets.
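The support and confidence constraints and the two steps above can be sketched as follows. For brevity, this sketch enumerates candidate itemsets by brute force rather than using Apriori or FP-growth; the baskets, thresholds and function names are made up for illustration.

```python
from itertools import combinations

# Made-up baskets for illustration; each transaction is a set of items.
transactions = [
    {"onions", "potatoes", "burgers"},
    {"onions", "potatoes", "burgers", "milk"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "burgers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Step 1: keep every itemset whose support reaches `min_support`.

    Brute-force enumeration; Apriori and FP-growth find the same sets
    far more efficiently on large data.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(frozenset(candidate), transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into lhs => rhs, keep confident rules."""
    found = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                # confidence = support(lhs and rhs together) / support(lhs)
                confidence = s / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), confidence))
    return found


frequent = frequent_itemsets(transactions, min_support=0.4)
for lhs, rhs, confidence in rules(frequent, min_confidence=0.6):
    print(sorted(lhs), "=>", sorted(rhs), round(confidence, 2))
```

Looking up `frequent[lhs]` is safe because every subset of a frequent itemset is itself frequent, so it was already stored in step 1.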
Figure: A schema of a neural network.
Fraud Detection
By using predictive analysis, it is possible to identify and track fraudulent
transactions. In fraud detection, scores are given to different transactions and
customers. This way, risky transactions can be identified and filtered out, which at the
same time reduces the company's exposure to risk.
Clustering
Clustering is an analysis method that groups a set of items in such a way that items with
the same characteristics are placed together. This way it might be possible to find
similar characteristics for specific services and to see what the different transactions
have in common.
The concept of clustering cannot be defined precisely, because there are many different
goals for clustering. This is also the reason that there are so many clustering
algorithms.
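As one illustration of grouping items with similar characteristics, the sketch below runs a plain k-means loop on made-up two-dimensional points. The choice of k-means, the features (amount and hour of payment), the data and the starting centroids are all assumptions made for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical features per transaction: (amount, hour of day).
points = [(1.5, 9), (1.6, 10), (1.4, 11), (3.0, 21), (3.2, 22), (2.9, 23)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

In practice the features would need to be scaled first, since here the hour of day dominates the distance simply because its values are larger than the amounts.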
k-means clustering