Knowledge Discovery from Databases (KDD)
A.K.A. Data Mining & by other names as well
Carlo Zaniolo
UCLA CS Dept
What is Data Mining?
Data mining: extraction of interesting (non-trivial, implicit, previously unknown & potentially useful) patterns or knowledge from huge amounts of data.
Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, ...
Why Data Mining?
Explosive growth of available data: the Big-Data Revolution
  Business: Web, e-commerce, transactions, stocks, …
  Science: remote sensing, bioinformatics, scientific simulation, …
  Society and everyone: news, digital cameras, ...
We are drowning in data, but starving for knowledge!
  Knowledge is the key to improving your business and operations
  Data mining tools and techniques automate knowledge discovery from large data sets
DM Applications
E.g.: Marketing products to customers:
1. Find clusters of customers who share the same characteristics: interests, income level, spending habits, etc.
2. Determine customer purchasing patterns over time.
3. Cross-market analysis: find associations/correlations between product sales (and predict on that basis).
4. Profiling: what types of customers buy what products.
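As a rough illustration of the cross-market analysis item, the sketch below counts how often pairs of products appear in the same basket and reports support and confidence for each pair. The baskets, the item names, the function name `pair_rules`, and the `min_support` threshold are all hypothetical and chosen for illustration; real association mining typically uses a dedicated algorithm rather than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; the item names are illustrative only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = fraction of baskets containing both items
    confidence = fraction of baskets with the antecedent that also
                 contain the consequent
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

rules = pair_rules(baskets)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

With these made-up baskets only the bread/milk pair clears the support threshold, which is the kind of association a retailer could use for prediction or product placement.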
DM Applications: Fraud Detection and Security
Approaches: clustering & outlier detection, looking for unusual patterns.
Applications: health care, retail, credit card service, telecomm.
  Auto insurance: ring of collisions
  Money laundering: suspicious monetary transactions
  Medical insurance
    Professional patients, ring of doctors, and ring of references
    Unnecessary or correlated screening tests
  Telecommunications: phone-call fraud
    Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
  Anti-terrorism
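To make "patterns that deviate from an expected norm" concrete, here is a minimal sketch that flags calls whose duration lies far from a customer's historical average, measured in standard deviations (a z-score). It models only call duration; the call records, the field names `destination` and `minutes`, the function name, and the threshold of 3 are assumptions for illustration, not the slides' method.

```python
from statistics import mean, stdev

def flag_unusual_calls(history_minutes, new_calls, threshold=3.0):
    """Return (destination, z_score) for calls far from the usual duration.

    history_minutes: past call durations for one customer (the "norm")
    new_calls: dicts with hypothetical keys "destination" and "minutes"
    """
    mu = mean(history_minutes)
    sigma = stdev(history_minutes)
    if sigma == 0:
        return []  # no variation in history, so a z-score is undefined
    flagged = []
    for call in new_calls:
        z = (call["minutes"] - mu) / sigma
        if abs(z) > threshold:
            flagged.append((call["destination"], round(z, 1)))
    return flagged

# Hypothetical data: ten ordinary calls, then two new ones to check.
history = [3, 5, 4, 6, 5, 4, 5, 6, 4, 5]
new_calls = [
    {"destination": "local", "minutes": 5},
    {"destination": "premium-rate", "minutes": 45},
]
flagged = flag_unusual_calls(history, new_calls)
print(flagged)
```

A fuller phone-call model would also include destination and time of day or week, as the slide lists; the same deviation-from-norm idea applies to each feature.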
New Applications
Software Bug Mining
Graph Mining: e.g. finding social networks
Web Mining
Personalization and recommendations
Mining and Scientific Applications: Biology
Spatio-Temporal and GIS:
  Find geographical clusters.
  Mine for trajectories and travel plans.
Multi-Relational Data Mining
  Mining for knowledge and relationships from multiple tables, as in Inductive Logic Programming.
New Research Topics
Theoretical foundations
Statistical Data Mining
Visual Data Mining
Privacy-Preserving Data Mining
A Historical Perspective
1. Machine Learning (AI)
2. Decision Support Environments: Scalability, Integration, Warehousing, OLAP (DB)
3. Statistical foundation and synergism with other disciplines, e.g., visualization.
4. Mining Streams of sensor & web data
Work plan
Introduction
Core Techniques:
  1. Classification,
  2. Association, and
  3. Clustering
Process and Systems
New Applications and Research Directions
Knowledge Discovery (KDD) Process
Data mining: core of the knowledge discovery process
[Figure: the KDD process, from data sources to knowledge]
Data Sources (transactional & operational data) -> Data Cleaning, Data Integration -> Data Warehouse -> Data Selection & preprocessing -> Task-Specific Data -> Data Mining -> Patterns & Rules -> Auditing -> Useful New Knowledge
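The stages named in the KDD process figure can be sketched as a chain of small functions. This is only a toy illustration: the stage order is read off the figure labels and is an assumption, the records and field names are made up, and the "mining" step is a simple count standing in for a real technique such as classification, association, or clustering.

```python
# Hypothetical raw records from two operational sources.
sales_source = [
    {"customer": "c1", "item": "Milk "},
    {"customer": "c2", "item": "bread"},
    {"customer": "c1", "item": None},  # dirty row: missing item
]
web_source = [
    {"customer": "c3", "item": "milk"},
    {"customer": "c2", "item": "MILK"},
]

def clean(rows):
    """Data cleaning: drop rows with a missing item, normalize the text."""
    return [
        {"customer": r["customer"], "item": r["item"].strip().lower()}
        for r in rows
        if r["item"]
    ]

def integrate(*sources):
    """Data integration: merge cleaned sources into one 'warehouse' list."""
    return [row for source in sources for row in source]

def select(warehouse, items_of_interest):
    """Data selection: keep only the task-specific rows."""
    return [r for r in warehouse if r["item"] in items_of_interest]

def mine(task_data):
    """Data mining (stand-in): count distinct customers per item."""
    buyers = {}
    for r in task_data:
        buyers.setdefault(r["item"], set()).add(r["customer"])
    return {item: len(customers) for item, customers in buyers.items()}

def audit(patterns, min_customers=2):
    """Auditing: keep only patterns backed by enough customers."""
    return {item: n for item, n in patterns.items() if n >= min_customers}

warehouse = integrate(clean(sales_source), clean(web_source))
task_data = select(warehouse, {"milk", "bread"})
knowledge = audit(mine(task_data))
print(knowledge)
```

Keeping each stage a separate function mirrors the figure: any one stage (for example the mining step) can be swapped out without touching the others.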