Data Mining
BY
JEMINI ISLAM
Data Mining
Outline:
• What is data mining?
• Why use data mining?
• How does data mining work?
• The process of data mining
• Tools of data mining
What is data mining?
Generally, data mining (sometimes called data
or knowledge discovery) is the process of
analyzing data from different perspectives and
summarizing it into useful information. It allows users to
analyze data from many different dimensions or angles,
categorize it, and summarize the relationships identified.
Technically, data mining is the process of finding correlations
or patterns among dozens of fields in large relational
databases.
Cont..
Data Mining, also known as Knowledge-Discovery
in Databases (KDD), is the process of automatically
searching large volumes of data for patterns. Data
Mining is a fairly recent and contemporary topic in
computing. However, Data Mining applies many
older computational techniques from statistics,
machine learning and pattern recognition.
Example of data mining:
A simple example of data mining is its use in a retail
sales department. If a store tracks the purchases of a
customer and notices that a customer buys a lot of
silk shirts, the data mining system will make a
correlation between that customer and silk shirts.
The sales department will look at that information
and may begin direct mail marketing of silk shirts to
that customer, or it may alternatively attempt to get
the customer to buy a wider range of products.
Example Cont..
Another widely used (though hypothetical) example
is that of a very large North American chain of
supermarkets. Through intensive analysis
of the transactions and the goods bought over a
period of time, analysts found that beer and diapers
were often bought together.
Continue..
The grocery chain could use this newly discovered
information in various ways to increase revenue.
For example, they could move the beer display
closer to the diaper display. They could also
place the high-profit diapers next to the high-profit
beer.
Why use data mining?
• Data is one of the most valuable assets for
any corporation - but only if we know how
to reveal valuable knowledge hidden in raw
data. Data mining allows us to extract
diamonds of knowledge from historical data
and predict useful outcomes from it.
Cont..
Data mining can:
• optimize business decisions,
• increase the value of each customer and
communication, and
• improve customer satisfaction with your
services.
How does data mining work?
Data mining creates a link between separate
transaction and analytical systems in large-scale
information technology. It uses various
software to analyze relationships and patterns.
Generally, the following four types of
relationships are sought:
Classification
A task of finding a function that
maps records into one of several
discrete classes. For example, a
restaurant chain could mine
customer purchase data to
determine when customers visit and
what they typically order. This
information could be used to
increase traffic by having daily
specials.
Clustering
Clustering is a task of
identifying groups of records
that are similar between
themselves but different from
the rest of the data. For example,
data can be mined to identify
market segments or consumer
affinities.
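As a rough illustration of how market segments might be found, the sketch below runs a plain k-means loop on one-dimensional data. K-means is only one of many clustering methods and is not named in the slides; the spending figures and the starting centers are invented for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 90, 95, 100, 11, 98]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
print(centers)
print(clusters)
```

The two groups that come out correspond to low-spending and high-spending customers, which is the kind of segment the slide describes.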
Association
Data can be mined to identify associations.
The beer-diaper example is an example of
associative mining.
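The beer-diaper idea can be sketched with two common association measures, support and confidence. These measures are not named in the slides, and the baskets below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; items and counts are made up.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "diapers"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

pairs = pair_counts(baskets)
support = pairs[("beer", "diapers")] / len(baskets)  # share of all baskets with both
print(support)
print(confidence(baskets, "beer", "diapers"))
print(confidence(baskets, "diapers", "beer"))
```

Note that confidence is directional: in this toy data every beer buyer also bought diapers, but not every diaper buyer bought beer.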
Sequential Patterns
Data is mined to anticipate
behavior patterns and trends. For
example, an outdoor equipment
retailer could predict the
likelihood of a backpack being
purchased based on a consumer's
purchase of sleeping bags and
hiking shoes.
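One simple way to estimate such a likelihood is to count, among customers who bought the earlier items, how many went on to buy the later one. The sketch below does this on invented purchase histories; the function name `follow_rate` is hypothetical, and real sequential-pattern mining uses more elaborate algorithms.

```python
# Hypothetical purchase histories, each listed in time order.
histories = [
    ["sleeping bag", "hiking shoes", "backpack"],
    ["hiking shoes", "sleeping bag", "backpack"],
    ["sleeping bag", "hiking shoes", "tent"],
    ["backpack", "sleeping bag"],
    ["hiking shoes", "water bottle"],
]

def follow_rate(histories, prerequisites, target):
    """Among histories containing all prerequisites, the share in which
    the target appears after the last prerequisite."""
    matched = followed = 0
    for history in histories:
        if not all(p in history for p in prerequisites):
            continue
        matched += 1
        last = max(history.index(p) for p in prerequisites)
        if target in history[last + 1:]:
            followed += 1
    return followed / matched if matched else 0.0

rate = follow_rate(histories, ["sleeping bag", "hiking shoes"], "backpack")
print(round(rate, 2))
```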
The process of data mining
The process of data mining consists
of three stages:
(1) the initial exploration,
(2) model building or pattern
identification with validation
or verification, and
(3) deployment (i.e., the application
of the model to new data in order to
generate predictions).
Stage 1: Exploration
This stage usually starts with
data preparation, which may involve
cleaning data, transforming data,
selecting subsets of records and, in the case of data sets with large
numbers of variables ("fields"),
performing some preliminary feature
selection operations to bring the
number of variables to a manageable
range.
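The preparation steps named above can be sketched in a few lines. The records, field names and the "drop constant fields" rule are assumptions made for this example; real projects choose cleaning and feature-selection rules to suit their data.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 34, "income": 52000, "country": "US", "visits": 3},
    {"age": None, "income": 48000, "country": "US", "visits": 5},
    {"age": 51, "income": 61000, "country": "US", "visits": 1},
    {"age": 29, "income": 39000, "country": "US", "visits": 8},
]

# Cleaning: keep copies of the records that have no missing field.
clean = [dict(r) for r in raw if all(v is not None for v in r.values())]

# Transformation: express income in thousands.
for r in clean:
    r["income"] = r["income"] / 1000

# Preliminary feature selection: drop fields that take a single value,
# since a constant field cannot help tell records apart.
fields = [f for f in clean[0] if len({r[f] for r in clean}) > 1]
reduced = [{f: r[f] for f in fields} for r in clean]

print(fields)
print(reduced[0])
```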
Stage 2: Model building and
validation.
This stage involves considering various
models and choosing the best one based on
their predictive performance (i.e., explaining
the variability in question and producing
stable results across samples).
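A minimal sketch of this stage, assuming two candidate models (a constant mean and a least-squares line) and a simple holdout sample for validation. The data points and the choice of mean squared error as the performance measure are assumptions for illustration.

```python
# Hypothetical (x, y) observations, roughly following y = 2x.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1),
        (5, 9.8), (6, 12.2), (7, 13.9), (8, 16.1)]
train, holdout = data[::2], data[1::2]  # alternate records into two samples

def fit_mean(points):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y

def fit_line(points):
    """Candidate 2: ordinary least-squares straight line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return lambda x: my + slope * (x - mx)

def mse(model, points):
    """Mean squared prediction error on the given sample."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

candidates = {"mean": fit_mean, "line": fit_line}
# Fit each candidate on the training sample, score it on the holdout sample.
scores = {name: mse(fit(train), holdout) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best)
```

Scoring on records the model did not see during fitting is what checks for stable results across samples.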
Stage 3: Deployment.
This final stage involves using the model
selected as best in the previous stage and
applying it to new data in order to generate
predictions or estimates of the expected
outcome.
Tools of Data Mining
Artificial Neural
Networks: Non-linear
predictive models that
learn through training
and resemble biological
neural networks in
structure.
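To give a feel for "learning through training", the sketch below trains a single sigmoid neuron, the smallest building block of such a network, on the logical OR function. A real neural network stacks many such units in layers; the learning rate and epoch count here are arbitrary choices for the example.

```python
import math

def predict(weights, bias, inputs):
    """Output of one sigmoid neuron: a value between 0 and 1."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-activation))

def train_neuron(samples, epochs=2000, rate=0.5):
    """Adjust weights after each sample in proportion to the prediction error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Training data for logical OR.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(samples)
outputs = [round(predict(weights, bias, x)) for x, _ in samples]
print(outputs)
```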
Cont..
Genetic algorithms: Optimization techniques
that use processes such as genetic
combination, mutation, and natural selection
in a design based on the concepts of natural
evolution.
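The three processes named above can be seen in a toy genetic algorithm that evolves bit strings. The problem (maximize the number of 1 bits), the population size, the mutation rate and the fixed random seed are all assumptions chosen to keep the sketch small and repeatable.

```python
import random

def evolve(fitness, length=12, population_size=20, generations=40, seed=0):
    """Tiny genetic algorithm over bit strings."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Natural selection: only the fitter half may reproduce.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = [parents[0][:]]  # carry the current best over unchanged
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, length - 1)
            child = mother[:cut] + father[cut:]  # genetic combination
            if rng.random() < 0.2:               # mutation: flip one bit
                i = rng.randrange(length)
                child[i] = 1 - child[i]
            children.append(child)
        population = children
    return max(population, key=fitness)

best = evolve(fitness=sum)  # fitness = number of 1 bits in the string
print(best, sum(best))
```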
Cont..
Decision trees: Tree
shaped structures that
represent sets of
decisions. These
decisions generate rules
for the classification of a
dataset.
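The link between a tree and its rules can be shown with a small hand-written tree. The fields, thresholds and labels below are hypothetical; in practice a tree-learning algorithm would derive them from data.

```python
# A hand-written tree for illustration only.
tree = {
    "field": "income", "threshold": 50,
    "low": {"label": "decline"},
    "high": {
        "field": "years_as_customer", "threshold": 2,
        "low": {"label": "review"},
        "high": {"label": "approve"},
    },
}

def classify(node, record):
    """Follow one decision per level until a leaf label is reached."""
    while "label" not in node:
        branch = "low" if record[node["field"]] < node["threshold"] else "high"
        node = node[branch]
    return node["label"]

def rules(node, conditions=()):
    """List the if-then rule that each root-to-leaf path represents."""
    if "label" in node:
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {node['label']}"]
    f, t = node["field"], node["threshold"]
    return (rules(node["low"], conditions + (f"{f} < {t}",))
            + rules(node["high"], conditions + (f"{f} >= {t}",)))

print(classify(tree, {"income": 72, "years_as_customer": 5}))
for rule in rules(tree):
    print(rule)
```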
Cont..(Tools of Data Mining)
Nearest neighbor
method: A technique that
classifies each record in a
dataset based on a
combination of the classes of
the k record(s) most similar
to it in a historical dataset
(where k ≥ 1). Sometimes
called the k-nearest neighbor
technique.
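A minimal sketch of the method, assuming Euclidean distance as the similarity measure and a majority vote as the way of combining the classes of the k neighbors. The historical records and class names are invented.

```python
import math
from collections import Counter

def knn_classify(history, record, k=3):
    """Label `record` by majority vote among its k nearest historical records."""
    by_distance = sorted(history, key=lambda item: math.dist(item[0], record))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (features, class) pairs.
history = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"), ((7.0, 6.0), "high"), ((6.5, 7.5), "high"),
]

print(knn_classify(history, (1.8, 1.6)))
print(knn_classify(history, (6.2, 6.8), k=1))
```

With k = 1 the record simply takes the class of its single closest neighbor.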
Cont..
Rule induction: The
extraction of useful if-then rules from data
based on statistical
significance.
Tools of Data Mining (Cont..)
Data visualization: The visual interpretation
of complex relationships in multidimensional
data. Graphics tools are used to illustrate data
relationships.
Conclusion:
The concept of Data Mining is becoming
increasingly popular as a business information
management tool where it is expected to reveal
knowledge structures that can guide decisions in
conditions of limited certainty. Today, more and more
companies acknowledge the value of this new
opportunity and use data mining tools and solutions
that help optimize their operations and improve
their customers' bottom line.