Introduction to Data Mining
Chapter 1
Chapter 1 Outline
– Background
– Information is Power
– Knowledge is Power
– Data Mining

Introduction
Information is Power
– Relevant
– Right information
– Globalised world
– Vast amount of information available

What is Information?
– A collection of data
– The act of human analysis and interpretation of activities
– Decomposing it into various components and tackling them

What is Knowledge?
– The act of human synthesis and evaluation of information
– Integration of the relevant components to form a relevant whole system

Data Mining Definition I
– The nontrivial extraction of hidden, previously unidentified, and potentially valuable knowledge from data.
– The use of a variety of techniques, such as neural networks, decision trees or standard statistical techniques, to identify nuggets of information or decision-making knowledge in bodies of data, and to extract them in such a way that they can be put to use in areas such as decision support, prediction, forecasting, and estimation.
Data Mining Definition II

Finding hidden information in a
database
Hidden Information
– Number of years of experience
– Great secret recipes
– Success factors
Database Processing vs. Data Mining Processing

Database Processing
Query
– Well defined
– SQL
Data
– Operational data
Output
– Precise
– Subset of database

Data Mining Processing
Query
– Poorly defined
– No precise query language
Data
– Not operational data
Output
– Fuzzy
– Not a subset of database
Query Examples

Database
– Find all credit applicants with the surname Lee.
– Identify customers who have purchased more
than $100,000 in the last year.
– Find all customers who have purchased bread.

Data Mining
– Find all credit applicants who are good credit
risks. (classification)
– Identify customers with similar eating habits.
(clustering)
– Find all items which are frequently purchased
with bread. (association rules)
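
The contrast between the two kinds of question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transactions, the item names and the `MIN_CONFIDENCE` threshold are made up and do not come from the slides, and simple co-occurrence counting stands in for a real association rule algorithm.

```python
from collections import Counter

# Hypothetical toy transactions; the item names are invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

# Database-style query: a precise condition, and the answer is a subset of the data.
bread_baskets = [t for t in transactions if "bread" in t]

# Data-mining-style question: which items are frequently purchased with bread?
# The answer is a derived pattern, not a subset of the stored rows.
MIN_CONFIDENCE = 0.5  # assumed threshold, not taken from the slides

co_counts = Counter(
    item for basket in bread_baskets for item in basket if item != "bread"
)
frequent_with_bread = {
    item: count / len(bread_baskets)  # estimate of P(item | bread)
    for item, count in co_counts.items()
    if count / len(bread_baskets) >= MIN_CONFIDENCE
}
```

The first result, `bread_baskets`, contains only rows already present in the data. The second, `frequent_with_bread`, maps items to estimated confidences, and its content changes with the chosen threshold, which is one sense in which data mining output is "fuzzy".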
Data Mining Models and Tasks
Data Mining vs. KDD
– Knowledge Discovery in Databases (KDD): the process of finding useful information and patterns in data.
– Data Mining: the use of algorithms to extract the information and patterns derived by the KDD process.

KDD Process
Modified from [FPSS96C]
– Selection (Pre-Mining 1): Obtain data from various sources.
– Preprocessing (Pre-Mining 2): Cleanse data.
– Transformation (Pre-Mining 3): Convert to common format. Transform to new format.
– Data Mining: Obtain desired results.
– Interpretation/Evaluation (Post-Mining): Present results to user in meaningful manner.
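
The five steps can be read as a chain of functions, each consuming the output of the previous one. The sketch below is a minimal illustration, not the method from [FPSS96C]: the function names, the record fields (`customer`, `amount`) and the sample data are all hypothetical, and the mining step is a simple per-customer total standing in for a real algorithm.

```python
from collections import defaultdict

def select(sources):
    """Selection: gather raw records from several sources."""
    return [record for source in sources for record in source]

def preprocess(records):
    """Preprocessing: cleanse data by dropping records with missing fields."""
    return [r for r in records if r.get("customer") and r.get("amount") is not None]

def transform(records):
    """Transformation: convert every record to one common format."""
    return [(r["customer"].strip().lower(), float(r["amount"])) for r in records]

def mine(rows):
    """Data mining: total spend per customer (a stand-in for a real algorithm)."""
    totals = defaultdict(float)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(totals)

def interpret(totals):
    """Interpretation/evaluation: present results, largest total first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data from two sources with inconsistent formatting.
store_a = [{"customer": "Lee", "amount": "120.0"}, {"customer": "Kim", "amount": None}]
store_b = [{"customer": " lee ", "amount": 30}, {"customer": "Park", "amount": 75.5}]

report = interpret(mine(transform(preprocess(select([store_a, store_b])))))
```

Keeping each stage as a separate function makes it easy to swap the mining step for another technique without touching the pre-mining or post-mining stages.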
KDD Process Ex: Web Log

Selection:
– Select log data (dates and locations) to use

Preprocessing:
– Remove identifying URLs
– Remove error logs

Transformation:
– Sessionize (sort and group)

Data Mining:
– Identify and count patterns
– Construct data structure

Interpretation/Evaluation:
– Identify and display frequently accessed sequences.

Potential User Applications:
– Cache prediction
– Personalisation
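
The web log example above can be sketched as follows. The log format, the 30-minute `SESSION_GAP` timeout and the `MIN_SUPPORT` threshold are assumptions made for illustration; the slides do not specify them. The selection step is taken as already done, and only error removal is shown for preprocessing.

```python
from collections import Counter

# Hypothetical log entries: (user, timestamp in seconds, page, HTTP status).
log = [
    ("u1", 0, "/home", 200),
    ("u1", 30, "/products", 200),
    ("u1", 60, "/cart", 200),
    ("u2", 10, "/home", 200),
    ("u2", 40, "/products", 200),
    ("u2", 50, "/missing", 404),
    ("u1", 5000, "/home", 200),
    ("u1", 5020, "/products", 200),
]

SESSION_GAP = 1800  # assumed inactivity timeout in seconds
MIN_SUPPORT = 2     # assumed minimum count for a "frequent" sequence

# Preprocessing: remove error entries.
clean = [entry for entry in log if entry[3] < 400]

# Transformation: sessionize (sort by user and time, then group).
clean.sort(key=lambda entry: (entry[0], entry[1]))
sessions = []
last_seen = {}
for user, ts, page, _status in clean:
    # Start a new session for a new user or after a long period of inactivity.
    if user not in last_seen or ts - last_seen[user] > SESSION_GAP:
        sessions.append([])
    sessions[-1].append(page)
    last_seen[user] = ts

# Data mining: identify and count patterns (here, consecutive page pairs).
pair_counts = Counter(pair for s in sessions for pair in zip(s, s[1:]))

# Interpretation/evaluation: keep the frequently accessed sequences.
frequent = [(pair, n) for pair, n in pair_counts.most_common() if n >= MIN_SUPPORT]
```

A cache-prediction application could use `frequent` to prefetch the second page of a pair whenever the first one is requested.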
Data Mining Development
• Relational Data Model
• SQL
• Association Rule Algorithms
• Data Warehousing
• Scalability Techniques
• Similarity Measures
• Hierarchical Clustering
• IR Systems
• Imprecise Queries
• Textual Data
• Web Search Engines
• Bayes Theorem
• Regression Analysis
• EM Algorithm
• K-Means Clustering
• Time Series Analysis
• Algorithm Design Techniques
• Algorithm Analysis
• Data Structures
• Neural Networks
• Decision Tree Algorithms