DATA MINING
CS157A
Swathi Rangan
A Brief History of Data Mining
• The term “Data Mining” was only
introduced in the 1990s.
• Data Mining's roots trace back along
three family lines: classical statistics,
artificial intelligence, and machine
learning.
• It is a union of historical and recent
developments in statistics, artificial
intelligence, and machine learning.
Data Mining Overview
• Process of automatically searching large
volumes of data for relationships and patterns.
• Discovery of information in terms of patterns or
rules from vast amounts of data.
• Attempts to discover rules and patterns from
data.
• Deals with “knowledge discovery” in databases.
• A valuable tool for business.
Goals Of Data Mining
Prediction
Involves using some variables or fields in the data set to
predict unknown or future values of other variables.
Data Mining can show how certain attributes within the data
will behave in the future.
Ex: Analysis of buying transactions to predict what
customers will buy under certain discounts, how much
sales a store will generate, and whether deleting a
product would yield more profits.
Ex: Credit card company predicting if a person is a good
credit risk by looking at certain known attributes such as
age, income, debts, and past debt repayment history.
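As a rough illustration of prediction, the sketch below fits a straight line to past sales and uses it to estimate sales under a discount that has not been tried yet. The data, the names `history`, `fit_line`, and `predict`, and the choice of a least-squares line are all assumptions made for this example, not something the slides prescribe.

```python
# Hypothetical history: (discount percent, units sold). Invented for illustration.
history = [(0, 100), (5, 110), (10, 120), (15, 130), (20, 140)]

def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Use the fitted line to estimate y for an unseen x."""
    slope, intercept = model
    return slope * x + intercept

model = fit_line(history)
predicted_units = predict(model, 25)  # estimate for a 25% discount
```

A real system would use many more attributes and would check the fit against data held back from training.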
Goals Of Data Mining (cont.)
Identification
Data patterns can be used to identify the
existence of an item, event, or activity.
Ex: Hackers/intruders trying to break into a
system may be identified by the programs
executed, files accessed, etc.
Ex: In biological applications, the existence of a
gene may be identified by certain sequences of
nucleotide symbols in the DNA sequence.
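The DNA example can be sketched as a simple pattern search. The function `find_motif`, the sample sequence, and the use of ATG as the pattern are assumptions for illustration only; real gene identification is far more involved than matching one short motif.

```python
def find_motif(dna, motif):
    """Return every 0-based position where motif occurs in dna (overlaps included)."""
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)
    return positions

sequence = "GGATGCCATGAAATGA"  # hypothetical sequence, invented for this example
hits = find_motif(sequence, "ATG")
```

Here `hits` holds the positions at which the pattern appears, which is the "identification" step in miniature.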
Goals Of Data Mining (cont.)
Classification
• Data Mining can partition data so that different
categories can be identified based on
combinations of parameters.
• Can be done by finding rules that partition given
data into groups.
Ex: Customers in a supermarket can be categorized
into discount-seeking shoppers, shoppers in a
rush, loyal regular shoppers, and infrequent
shoppers.
Ex: A credit card company wants to decide whether
or not to give a credit card to an applicant.
Example
To decide whether or not to give a credit card to an
applicant, the company assigns a creditworthiness level
(such as excellent, good, average, or bad) to current customers.
Rules that map customer attributes to these levels can then
be applied to applicants.
Consider 2 attributes: education level and income.
Rules could take the following form:
∀ person P, P.degree = masters and P.income > 75,000
⇒ P.credit = excellent
∀ person P, P.degree = bachelors or (P.income >= 25,000 and
P.income <= 75,000)
⇒ P.credit = good
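The two rules above can be written directly as code. This is a minimal sketch: the function name `credit_level` and the choice to return `None` when neither rule applies are assumptions, since the slides do not say what happens to applicants outside both rules.

```python
def credit_level(degree, income):
    """Apply the two example rules; return None when neither rule fires."""
    # Rule 1: masters degree and income above 75,000.
    if degree == "masters" and income > 75_000:
        return "excellent"
    # Rule 2: bachelors degree, or income between 25,000 and 75,000 inclusive.
    if degree == "bachelors" or 25_000 <= income <= 75_000:
        return "good"
    # Assumption: applicants matched by no rule are left unclassified.
    return None
```

For example, `credit_level("masters", 80_000)` falls under the first rule, while `credit_level("masters", 50_000)` falls under the second because of the income range.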
Goals Of Data Mining (cont.)
Optimization
• Data Mining can be used to optimize the use of
limited resources such as time, space, money, or
materials, and to maximize output variables such as
sales or profits under a given set of constraints.
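One way to picture optimization under a constraint is to choose which products to stock when shelf space is limited. The sketch below simply tries every combination; the product data, the `capacity` value, and the function `best_selection` are hypothetical, and exhaustive search is only practical for very small inputs.

```python
from itertools import combinations

# Hypothetical products: name -> (shelf space needed, expected profit).
products = {"A": (2, 30), "B": (3, 50), "C": (4, 60), "D": (1, 10)}
capacity = 5  # total shelf space available (assumed)

def best_selection(items, capacity):
    """Return the subset of items with the highest profit that fits in capacity."""
    best_names, best_profit = (), 0
    names = sorted(items)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            space = sum(items[n][0] for n in combo)
            profit = sum(items[n][1] for n in combo)
            if space <= capacity and profit > best_profit:
                best_names, best_profit = combo, profit
    return best_names, best_profit

chosen, total_profit = best_selection(products, capacity)
```

The constraint is the space limit and the output variable being maximized is profit, matching the description on the slide.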
What Data Mining can do
• Enables companies to determine
relationships among “internal” and
“external” factors.
• Predicts cross-sell opportunities and makes
recommendations.
• Segments markets and personalizes
communications.
• Predicts outcomes of future situations.
The Process Of Data Mining
• There are 3 main steps in the Data Mining
process:
– Preparation: data is selected from the
warehouse and “cleansed”.
– Processing: algorithms are used to process
the data. This step uses modeling to make
predictions.
– Analysis: output is evaluated.
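The three steps above can be sketched as three small functions. Everything here is an assumption made for illustration: the records, the function names `prepare`, `process`, and `analyze`, and the very simple threshold model. Note also that the evaluation reuses the training data, which a real analysis would avoid.

```python
# Hypothetical raw records: (income, repaid_loan). None marks a missing value.
raw = [
    (30_000, False), (80_000, True), (None, True),
    (60_000, True), (20_000, False), (45_000, None),
]

def prepare(records):
    """Preparation: cleanse the data by dropping records with missing fields."""
    return [r for r in records if None not in r]

def process(records):
    """Processing: model an income threshold midway between the two class means."""
    repaid = [inc for inc, ok in records if ok]
    defaulted = [inc for inc, ok in records if not ok]
    return (sum(repaid) / len(repaid) + sum(defaulted) / len(defaulted)) / 2

def analyze(records, threshold):
    """Analysis: fraction of records the threshold rule labels correctly."""
    correct = sum((inc >= threshold) == ok for inc, ok in records)
    return correct / len(records)

clean = prepare(raw)
threshold = process(clean)
accuracy = analyze(clean, threshold)
```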
Reasons for growing popularity
• Growing data volume: an enormous amount of
existing and newly generated data requires
processing.
• Limitations of human analysis: humans lack
objectivity when analyzing dependencies in
data.
• Low cost of machine learning: the data mining
process has a lower cost than hiring highly
trained professionals to analyze data.
Data Mining Techniques
• Association rules: discovers interesting
associations between attributes that are
contained in a database.
• Clustering: finds appropriate groupings of
elements for a set of data.
• Sequential patterns: looks for patterns
where one event leads to another later
event.
• Classification: finds rules that assign data
to predefined categories.
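To make the association rule technique more concrete, the sketch below computes support and confidence, two standard measures of how strong a rule such as "bread ⇒ milk" is. The slides do not mention these measures; they, the transactions, and the function names are additions for illustration.

```python
# Hypothetical market-basket transactions, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A rule is usually reported only when both values exceed thresholds chosen by the analyst.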
Applications of Data Mining
• Data Mining is applied in the following areas:
– Prediction of the Stock Market: predicting the future
trends.
– Bankruptcy prediction: prediction based on
computer-generated rules and models.
– Foreign Exchange Market: Data Mining is used to
identify trading rules.
– Fraud Detection: construction of algorithms and
models that will help recognize a variety of fraud
patterns.
References
• http://www.megaputer.com/dm/dm101.php3#whyuse
• http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
• http://www.slais.ubc.ca/people/students/studentprojects/C_Zeller/500WWW/moreon.htm#More%20on%20Data%20Mining
• http://www.unc.edu/~xluan/258/datamining.html
• http://www.ciadvertising.org/student_account/fall_00/adv391k/shmun/privacy.html