Waqas Haider Bangyal
Source Materials
• “Data Mining: Concepts and Techniques”
by Jiawei Han & Micheline Kamber,
Second Edition, Morgan Kaufmann, 2006
• “Data Mining: Introductory and Advanced Topics”,
by Margaret H. Dunham,
Prentice Hall, 2003
What Is Data Mining?
• Data mining is the process of sorting through large
amounts of data and picking out relevant information.
• The extraction of knowledge from data is called data
mining.
• Data mining can also be defined as the exploration
and analysis of large quantities of data in order to
discover meaningful patterns and rules.
• The ultimate goal of data mining is to discover
knowledge.
Data Rich, Information Poor
Motivation
• Lots of data is being collected and warehoused
  - Web data, e-commerce
  - Purchases at department/grocery stores
  - Bank/credit card transactions
• Computers have become cheaper and more powerful
• Data is collected and stored at enormous speeds
(GB/hour)
  - Remote sensors on a satellite
  - Telescopes scanning the skies
Motivation
• Traditional techniques are infeasible for raw data
• Human analysts may take weeks to discover useful
information
• We are drowning in data, but starving for knowledge!
• Data mining may help scientists
  - in classifying and segmenting data
Motivation
• Huge amounts of data are automatically collected.
• Example question: "To which class does this star belong?"
• Such an analysis can no longer be conducted manually.
Why is data mining important?
• Rapid computerization of businesses produces huge
amounts of data
• How can we make the best use of this data?
• A growing realization:
knowledge discovered from data can be used for
competitive advantage.
Evolution of Database Technology
• 1960s:
  - Data collection, database creation, IMS and network DBMS
• 1970s:
  - Relational data model, relational DBMS implementation
• 1980s:
  - RDBMS, advanced data models (extended-relational, OO,
deductive, etc.) and application-oriented DBMS (spatial,
scientific, engineering, etc.)
• 1990s-2000s:
  - Data mining and data warehousing, multimedia databases,
and Web databases
Evolution of Database Technology
Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics
Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM | Static data delivery
Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Dynamic data delivery at record level
Data Warehousing (1990s) | "What were unit sales in New England last March? Drill down to Boston." | Multidimensional databases, data warehouses | Oracle, Pilot | Dynamic data delivery at multiple levels
Data Mining (Emerging Today) | "What's likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry) | Prospective, proactive information delivery
Data Warehouse example
Data Warehouses: Data warehousing is defined as a process of centralized data
management and retrieval.
A data warehouse is a repository of information collected from multiple sources, stored under a
unified schema, and usually residing at a single site.
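As a rough illustration of "multiple sources, unified schema", the sketch below maps records from two sources into one common layout so that a single query can run over both. The source names, field names, and mappings are all hypothetical and chosen only for this example.

```python
# Two hypothetical sources that describe the same kind of fact with different field names.
store_sales = [{"sku": "A1", "qty": 3, "city": "Boston"}]
web_orders = [{"product_id": "A1", "quantity": 2, "ship_city": "Boston"}]

# Hypothetical mappings from each source's fields to the unified schema.
STORE_MAP = {"sku": "product", "qty": "units", "city": "city"}
WEB_MAP = {"product_id": "product", "quantity": "units", "ship_city": "city"}


def to_unified(record, mapping, source):
    """Rename a source record's fields to the unified schema and tag its origin."""
    row = {target: record[field] for field, target in mapping.items()}
    row["source"] = source
    return row


# The "warehouse" here is just one list holding rows from every source.
warehouse = [to_unified(r, STORE_MAP, "store") for r in store_sales]
warehouse += [to_unified(r, WEB_MAP, "web") for r in web_orders]


def total_units(rows, city):
    """Answer one question across all sources at once."""
    return sum(r["units"] for r in rows if r["city"] == city)


boston_units = total_units(warehouse, "Boston")
```

Once every row shares the same field names, a question such as "unit sales in Boston" no longer needs to know which source a row came from.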
The Process of Data Mining
• There are 3 main steps in the data mining process:
  - Preparation: data is selected from the warehouse and “cleansed”.
  - Processing: algorithms are used to process the data. This step uses
modeling to make predictions.
  - Analysis: output is evaluated.
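The three steps can be sketched in a few lines of Python. This is only a toy illustration: the sales records are made up, "cleansing" is reduced to dropping rows with missing values, and a least-squares line stands in for whatever modeling algorithm a real project would use.

```python
# Hypothetical monthly unit-sales records; None marks a missing value.
raw_records = [
    {"month": 1, "units": 100.0},
    {"month": 2, "units": 110.0},
    {"month": 3, "units": None},  # dirty record, removed during preparation
    {"month": 4, "units": 130.0},
    {"month": 5, "units": 140.0},
]


def prepare(records):
    """Preparation: select the fields of interest and drop incomplete rows."""
    return [(r["month"], r["units"]) for r in records if r["units"] is not None]


def fit_line(points):
    """Processing: fit units = slope * month + intercept by least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Analysis: compare the model's output with the known values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


clean = prepare(raw_records)
model = fit_line(clean)
error = mean_abs_error(model, clean)  # evaluated on the training data, for brevity
forecast = model[0] * 6 + model[1]    # predicted units for month 6
```

In practice the analysis step would evaluate the model on data it has not seen, rather than on the records used to fit it.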
Reasons for growing popularity
• Growing data volume:
an enormous amount of existing and newly appearing data
requires processing.
• Limitations of human analysis:
humans can lack objectivity when analyzing data.
• Low cost of machine learning:
the data mining process has a lower cost than
hiring highly trained professionals to analyze data.
Applications of Data Mining
• Data mining is applied in the following areas:
  - Prediction of the stock market: predicting future trends.
  - Bankruptcy prediction: prediction based on computer-generated rules, using models.
  - Foreign exchange market: data mining is used to identify trading rules.
  - Fraud detection: construction of algorithms and models that help
recognize a variety of fraud patterns.
Results of Data Mining Include:
• Forecasting what may happen in the future
• Classifying people or things into groups by
recognizing patterns
• Clustering people or things into groups based on
their attributes
• Associating what events are likely to occur
together
• Sequencing what events are likely to lead to later
events
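To make "associating" concrete, here is a small sketch that measures how often items occur together in a set of transactions. The baskets are invented, and the two measures used (support and confidence) are the standard association-rule measures, not something taken from these slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: count / len(baskets) for pair, count in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(transactions)
bread_then_milk = confidence(transactions, "bread", "milk")
```

A pair with high support and high confidence is a candidate for the kind of "events likely to occur together" finding listed above.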
Data Mining Functions
Two types of model:
• Predictive models predict unknown values based on
known data
• Descriptive models identify patterns in data
• Each type has several sub-categories, each of which
has many algorithms. We won't have time to look at
ALL of them in detail.
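The difference between the two model types can be shown with a toy example. The predictive function below uses a nearest-neighbour rule to label a new value from labelled examples; the descriptive function needs no labels and simply splits the values into two groups at the widest gap. Both techniques and all data values are assumptions chosen for brevity.

```python
def predict_nearest(labeled, value):
    """Predictive: return the label of the closest known example."""
    return min(labeled, key=lambda pair: abs(pair[0] - value))[1]


def split_at_largest_gap(values):
    """Descriptive: split the sorted values into two groups at the widest gap."""
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


# Hypothetical customer spending figures with known labels.
labeled_spend = [(20, "low"), (25, "low"), (80, "high"), (90, "high")]

predicted = predict_nearest(labeled_spend, 70)        # label for an unseen value
groups = split_at_largest_gap([20, 90, 25, 80])       # structure found without labels
```

The predictive call answers "what is the unknown value for this case?", while the descriptive call answers "what structure is already present in the data?".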
Thanks