Data Mining
By : Tung, Sze Ming ( Leo )
CS 157B
Definition


A class of database applications that analyze
data in a database using tools which look for
trends or anomalies.
IBM is often credited as an early pioneer of data mining.
Purpose


To look for hidden patterns or previously
unknown relationships in a group of data
that can be used to predict future behavior.
Example: Data mining software can help retail
companies find customers with common
interests.
Background Information


Many of the techniques used by today's data
mining tools have been around for many years,
having originated in the artificial intelligence
research of the 1980s and early 1990s.
Data mining tools are only now being applied
to large-scale database systems.
The Need for Data Mining



The amount of raw data stored in corporate
data warehouses is growing rapidly.
The volume and complexity of data that might be
relevant to a specific problem are overwhelming.
Data mining promises to bridge the analytical
gap by giving knowledge workers the tools to
navigate this complex analytical space.
The Need for Data Mining, cont’


The need for information has resulted in the
proliferation of data warehouses that integrate
information from multiple sources to support
decision making.
These warehouses often include data from
external sources, such as customer demographics
and household information.
Approaches to Data Mining




association
sequence-based analysis
clustering
classification
Association



Classic market-basket analysis treats the
purchase of a number of items (for example, the
contents of a shopping basket) as a single transaction
and looks for items that tend to be purchased together.
This information can be used to adjust inventories,
modify floor or shelf layouts, or introduce targeted
promotional activities to increase overall sales or
move specific products.
Example: 80 percent of all transactions in which beer
was purchased also included potato chips.
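A statistic like the 80 percent figure above is usually called the confidence of an association rule. The following is a minimal sketch of how it could be computed, assuming each transaction is represented as a set of item names. The baskets, the function names, and the item names other than beer and potato chips are hypothetical and chosen only for illustration.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`. Returns 0.0 if none contain `antecedent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = [t for t in with_antecedent if consequent in t]
    return len(both) / len(with_antecedent)


# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"beer", "potato chips"},
    {"beer", "potato chips", "salsa"},
    {"beer", "potato chips", "bread"},
    {"beer", "potato chips", "milk"},
    {"beer", "diapers"},
    {"milk", "bread"},
]

# 4 of the 5 baskets with beer also contain potato chips.
beer_to_chips = confidence(baskets, "beer", "potato chips")
print(f"confidence(beer -> potato chips) = {beer_to_chips:.0%}")
```

Real tools also filter rules by support (how often the items appear together at all), so that a rule seen in only a handful of transactions is not over-interpreted.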
Sequence-based analysis


Traditional market-basket analysis deals with a
collection of items as part of a point-in-time
transaction.
Sequence-based analysis ties together purchases
made over time, to identify a typical set of
purchases that might predict the subsequent
purchase of a specific item.
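As a rough illustration of the idea, the sketch below counts which items customers bought before their first purchase of a target item. It assumes each customer's history is a time-ordered list of item names. The histories, the item names, and the function name are hypothetical, and real sequence-mining algorithms are considerably more elaborate.

```python
from collections import Counter


def precursor_counts(histories, target):
    """For each item, count the customers who bought it at some point
    before their first purchase of `target`."""
    counts = Counter()
    for purchases in histories:
        if target not in purchases:
            continue  # this customer never bought the target item
        first = purchases.index(target)
        # Use a set so each customer is counted once per precursor item.
        counts.update(set(purchases[:first]))
    return counts


# Hypothetical purchase histories, oldest purchase first.
histories = [
    ["printer", "paper", "ink cartridge"],
    ["printer", "ink cartridge"],
    ["paper", "printer", "cable", "ink cartridge"],
    ["paper", "stapler"],
]

counts = precursor_counts(histories, "ink cartridge")
for item, n in counts.most_common():
    print(f"{item}: bought earlier by {n} customer(s)")
```

Items with high counts are candidate predictors of the later purchase.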
Clustering



Clustering approaches address segmentation
problems.
These approaches assign records with a large
number of attributes into a relatively small set of
groups or "segments."
Example: Buying habits of multiple population
segments might be compared to determine which
segments to target for a new sales campaign.
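The slides do not name a particular clustering algorithm. As one possible illustration, the sketch below uses plain k-means, assuming each record has already been reduced to a tuple of numeric attributes. The customer data (visits per month, average spend) and the choice of the first k records as starting centroids are assumptions made to keep the example small and repeatable.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means. Returns (centroids, labels).
    Initial centroids are simply the first k points (an assumption
    made for repeatability, not a recommended initialization)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (store visits per month, average spend).
customers = [(1, 10), (12, 95), (2, 12), (11, 90), (1, 15), (13, 100)]

centroids, labels = kmeans(customers, k=2)
for customer, segment in zip(customers, labels):
    print(customer, "-> segment", segment)
```

With real data the attributes would normally be scaled first, since an attribute with a large numeric range otherwise dominates the distance calculation.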
Classification



The most commonly applied data mining technique.
The algorithm uses preclassified examples to
determine the set of parameters required for
proper discrimination.
Example: A classifier capable of identifying
risky loans could be used to aid in the decision
of whether to grant a loan to an individual.
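To make "determining parameters from preclassified examples" concrete, here is a deliberately tiny sketch: a one-attribute classifier whose only parameter is a threshold, chosen to misclassify as few training examples as possible. The attribute (debt-to-income ratio), the training data, and the function names are hypothetical. Practical classifiers use many attributes and more capable models.

```python
def train_threshold(examples):
    """Learn a single threshold from (value, risky) pairs.
    A loan is predicted risky when value >= threshold; the threshold
    chosen is the candidate with the fewest training errors."""
    candidates = sorted({value for value, _ in examples})
    best_threshold, best_errors = None, len(examples) + 1
    for threshold in candidates:
        errors = sum((value >= threshold) != risky for value, risky in examples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict_risky(value, threshold):
    """Apply the learned threshold to a new, unclassified loan."""
    return value >= threshold


# Hypothetical preclassified loans: (debt-to-income ratio, defaulted?).
training = [
    (0.10, False), (0.15, False), (0.22, False), (0.30, False),
    (0.35, True), (0.45, True), (0.50, True),
]

threshold = train_threshold(training)
print("learned threshold:", threshold)
print("ratio 0.40 risky?", predict_risky(0.40, threshold))
print("ratio 0.20 risky?", predict_risky(0.20, threshold))
```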
Issues of Data Mining


Present-day tools are strong but require
significant expertise to implement effectively.
Susceptibility to "dirty" or irrelevant data.
 Inability to "explain" results in human terms.

Issues

Susceptibility to "dirty" or irrelevant data
Today's data mining tools simply take everything
they are given as factual and draw the resulting
conclusions.
Users must take the necessary precautions to
ensure that the data being analyzed is "clean."
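One such precaution is to screen records before they reach the mining tool. The sketch below shows the general shape of such a check, assuming records are dictionaries. The field names, the valid ranges, and the duplicate rule are all hypothetical, and real cleaning rules depend entirely on the data set.

```python
def clean_records(records, required=("customer_id", "age", "amount")):
    """Split records into (clean, rejected) using three simple checks:
    missing required fields, out-of-range values, and duplicate IDs."""
    clean, rejected = [], []
    seen_ids = set()
    for record in records:
        # Check 1: every required field must be present and not None.
        if any(record.get(field) is None for field in required):
            rejected.append(record)
            continue
        # Check 2: values must fall in a plausible range (assumed limits).
        out_of_range = not (0 < record["age"] < 120) or record["amount"] < 0
        # Check 3: keep only the first record seen for each customer ID.
        duplicate = record["customer_id"] in seen_ids
        if out_of_range or duplicate:
            rejected.append(record)
        else:
            seen_ids.add(record["customer_id"])
            clean.append(record)
    return clean, rejected


# Hypothetical raw records.
raw = [
    {"customer_id": 1, "age": 34, "amount": 120.0},
    {"customer_id": 2, "age": None, "amount": 80.0},   # missing age
    {"customer_id": 3, "age": 250, "amount": 15.0},    # implausible age
    {"customer_id": 1, "age": 34, "amount": 120.0},    # duplicate ID
    {"customer_id": 4, "age": 51, "amount": 42.5},
]

clean, rejected = clean_records(raw)
print(len(clean), "clean,", len(rejected), "rejected")
```

Keeping the rejected records, rather than silently dropping them, lets someone review why data was excluded.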

Issues, cont’

Inability to "explain" results in human terms
Many of the tools employed in data mining
analysis use complex mathematical algorithms that
are not easily mapped into human terms.
What good does the information do if you don't
understand it?


The End