Data Mining
Andrie Suherman
Major Elements
Steps/Processes
Tools used for data mining
Advantages and Disadvantages
What is Data Mining?
Data mining, often used as a synonym for Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns.
Data mining applies many older computational techniques from statistics, machine learning, and pattern recognition.
Data mining consists of five major elements:
Extract, transform, and load transaction data onto the data warehouse system.
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze the data with application software.
Present the data in a useful format, such as a graph or table.
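The five elements above form a pipeline. The sketch below walks a handful of transactions through it, under stated assumptions: the records, field names, and helper functions (`transform`, `load`, `analyze`, `present`) are hypothetical, and an in-memory SQLite database stands in for the data warehouse and multidimensional database system, which it is not.

```python
import sqlite3

# Hypothetical raw transactions (store, product, amount as text), standing in
# for data extracted from an operational system.
RAW_TRANSACTIONS = [
    ("north", "coffee", "4.50"),
    ("north", "tea", "3.00"),
    ("south", "coffee", "5.25"),
    ("south", "coffee", "n/a"),  # malformed amount, dropped during transform
    ("north", "coffee", "4.75"),
]


def transform(rows):
    """Normalize text fields and convert amounts to floats, skipping bad rows."""
    clean = []
    for store, product, amount in rows:
        try:
            clean.append((store.strip().lower(), product.strip().lower(), float(amount)))
        except ValueError:
            continue  # skip records whose amount cannot be parsed
    return clean


def load(conn, rows):
    """Load cleaned rows into a sales table (an in-memory stand-in for a warehouse)."""
    conn.execute("CREATE TABLE sales (store TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def analyze(conn):
    """Aggregate revenue by store and product."""
    return conn.execute(
        "SELECT store, product, ROUND(SUM(amount), 2) FROM sales "
        "GROUP BY store, product ORDER BY store, product"
    ).fetchall()


def present(summary):
    """Format the aggregated results as a plain-text table."""
    lines = [f"{'store':<8}{'product':<10}{'revenue':>8}"]
    lines += [f"{s:<8}{p:<10}{r:>8.2f}" for s, p, r in summary]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")  # the connection plays the "data access" role
load(conn, transform(RAW_TRANSACTIONS))
summary = analyze(conn)
print(present(summary))
```

A real warehouse would add scheduling, history, and access control; the point here is only the order of the steps.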
Data Mining Goal
The ultimate goal of data mining is prediction. Predictive data mining is the most common type of data mining and the one with the most direct business applications.
Three-Stage Data Mining Process
Stage 1: Exploration. This stage usually starts with data preparation, which may involve cleaning data, transforming data, and selecting subsets of records.
Stage 2: Model building and validation. This stage involves considering various models and choosing the best one based on their predictive performance.
Stage 3: Deployment. This final stage involves taking the model selected as best in the previous stage and applying it to new data in order to generate predictions or estimates of the expected outcome.
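The three stages can be sketched end to end in a few lines. This is a minimal illustration under assumptions: the (advertising spend, sales) data is invented, preparation is reduced to dropping incomplete records, and the two candidate models (a mean baseline and an ordinary least-squares line) are chosen only for brevity, not because the stages require them.

```python
# Hypothetical data: (advertising spend, sales) pairs; one record is missing a value.
RECORDS = [
    (1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (None, 5.0), (4.0, 8.1),
    (5.0, 9.8), (6.0, 12.2), (7.0, 13.9), (8.0, 16.1),
]


# Stage 1: exploration / data preparation (here, only dropping incomplete records).
def prepare(records):
    return [(x, y) for x, y in records if x is not None and y is not None]


# Stage 2: model building. Each candidate is fit on the training data only.
def fit_mean(train):
    """Baseline model: always predict the mean training outcome."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Ordinary least-squares line: y = intercept + slope * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, data):
    """Mean squared error of a model's predictions on held-out data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


data = prepare(RECORDS)
# Simple holdout split (first six train, last two validate); real projects
# usually shuffle first or use cross-validation.
train, validation = data[:6], data[6:]
candidates = {"mean": fit_mean(train), "line": fit_line(train)}
scores = {name: mse(model, validation) for name, model in candidates.items()}
# Validation: keep the candidate with the lowest held-out error.
best_name = min(scores, key=scores.get)
best_model = candidates[best_name]

# Stage 3: deployment, applying the chosen model to new, unseen inputs.
predictions = [best_model(x) for x in [9.0, 10.0]]
print(best_name, [round(p, 2) for p in predictions])
```

Scoring candidates on records they were not fit on is what keeps Stage 2 from simply rewarding the most flexible model.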
Some of the tools used for data mining are:
Artificial neural networks - Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Decision trees - Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.
Rule induction - The extraction of useful if-then rules from data based on statistical significance.
Genetic algorithms - Optimization techniques based on the concepts of genetic combination, mutation, and natural selection.
Nearest neighbor - A classification technique that classifies each record based on the records most similar to it in a historical dataset.
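The nearest neighbor technique in the list above is compact enough to show directly. The sketch below is a k-nearest-neighbor classifier under assumptions: the "historical dataset", its two features, and the risk labels are hypothetical; features are taken to be pre-scaled to [0, 1]; and Euclidean distance is used as the similarity measure, which is one common choice among several.

```python
import math
from collections import Counter

# Hypothetical historical dataset. Each record is (scaled income, debt ratio),
# both assumed pre-scaled to [0, 1] so neither feature dominates the distance.
HISTORY = [
    ((0.80, 0.20), "low_risk"),
    ((0.85, 0.25), "low_risk"),
    ((0.70, 0.30), "low_risk"),
    ((0.30, 0.65), "high_risk"),
    ((0.25, 0.70), "high_risk"),
    ((0.20, 0.55), "high_risk"),
]


def knn_classify(record, history, k=3):
    """Label a record by majority vote among its k most similar historical records."""
    # Similarity is measured with Euclidean distance (an assumption, not a requirement).
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) returns [(label, count)]; on a tie, the label seen first wins.
    return votes.most_common(1)[0][0]


print(knn_classify((0.75, 0.28), HISTORY))  # a new record resembling the low-risk group
print(knn_classify((0.28, 0.60), HISTORY))  # a new record resembling the high-risk group
```

There is no training step: the historical records themselves are the model, which is why prediction cost grows with the size of the dataset.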
Reasons for the growing popularity of data mining
Growing Data Volume
Limitations of Human Analysis
Low Cost of Machine Learning
Advantages
Marketing/Retailing: Data mining can aid direct marketers by providing them with useful and accurate trends in their customers' purchasing behavior.
Banking/Crediting: Data mining can assist financial institutions in areas such as credit reporting and loan information.
Law enforcement: Data mining can help law enforcement identify and apprehend criminal suspects by examining trends in location, crime type, habits, and other patterns of behavior.
Researchers: Data mining can assist researchers by speeding up their data analysis, giving them more time to work on other projects.
Disadvantages
Privacy issues: For example, according to the Washington Post, in 1998 CVS sold its patients' prescription purchase records to a different company. American Express also sold its customers' credit card purchase records to another company.
Security issues: Although companies hold a great deal of personal information about us online, they do not always have sufficient security systems in place to protect that information.
Misuse of information: Some companies prioritize incoming phone calls based on the caller's purchase history. If you have spent a lot of money or bought many products from one company, your call may be answered much sooner. So you should not assume that your call is being answered in the order in which it was received.