Mika Klemettinen and Pirjo Moen
University of Helsinki/Dept of CS
Autumn 2001
Course on Data Mining (581550-4)
[Figure: course timeline. Intro/Ass. Rules (24./26.10.), Episodes (30.10.), Text Mining (7.11.), Clustering (14.11.), KDD Process (21.11.), Appl./Summary (28.11.), Home Exam]
Accepted to Autumn 2001 Course
• Arkko Jouko
• Löfström Jaakko
• Sahlberg Mauri
• Asikainen Tomi
• Malinen Johanna
• Saikku Arja
• Aunimo Lili
• Mäkelä Eetu
• Sundman Jonas
• Hyvönen Leena
• Ojala Petri
• Tarvainen Tero
• Johansson Carl
• Palin Kimmo
• Tiihonen Sami
• Jokinen Sakari
• Pasanen Janne
• Tolvanen Juha
• Kerminen Antti
• Pietilä Mikko
• Uusitalo Petri
• Kuokkanen Ville
• Pitkänen Esa
• Vasankari Minna
• Lehmussaari Kari
• Rapiokallio Maarit
• Lehtonen Miro
• Roos Teemu
• Virtanen Otso
Course Organization
Lecturers
Lectures
Course Material
Exercises
Contents
Course Organization
Dr. Mika Klemettinen
• PhD Mika Klemettinen:
– Email: [email protected]
– WWW: http://www.cs.helsinki.fi/u/mklemett/
– Room: B356
– Tel: 050-483 6661
• PhD in January 1999:
– Thesis: A Knowledge Discovery Methodology for Telecommunication Network Alarm Databases
• Data mining and SGML/XML related research at UH/CS (1994-2000) and at Nokia (2000-)
Course Organization
Dr. Pirjo Moen
• PhD Pirjo Moen:
– Email: [email protected]
– WWW: http://www.cs.helsinki.fi/pirjo.moen/
– Room: B350
– Tel: 191 44238
• PhD in February 2000:
– Thesis: Attribute, Event Sequence, and Event Type Similarity Notions for Data Mining
• Data mining related research at UH/CS (1994-)
Course Organization
DM/SGML/XML at UH/CS
• RATI (A structured text database system / Rakenteiset tekstitietokannat), 1988-91
• Data mining from telecommunication alarm data, 1994-97
• Structured and Intelligent Documents (SID), 1995-98
• From Data to Knowledge (FDK), 1995-
• Knowledge worker’s workstation (TYTTI), 2000-02
• DM Group (99), DOREMI Group (00)
Linux was invented here!
Course Organization
NRC in Short
• Nokia is the global leader in digital communication technologies, with around 60 000 employees all over the world
• Nokia Research Center (NRC) has around 1 200 employees in Finland, USA, Japan, China, Germany, Hungary, UK, etc.
• NRC's role is to enhance Nokia's technological competitiveness by exploring and developing new technologies
• Strongly involved in many European Union and national research projects
Course Organization
DM Group at NRC
• Background:
– At the University, Dept of Computer Science: data mining methods and theory of data mining since the late 80's
– Association and episode rule mining, time series similarity, analysis of telecommunication alarm data and web logs, etc.
• Other members include:
– Dr. Heikki Mannila (group leader)
– Dr. Hannu Toivonen
Course Organization
Lectures (1)
• 24.10.-30.11.2001 (12 lectures):
– 7 normal lectures
– 5 seminar-like lectures
• Wed 14-16, Fri 12-14 (A217):
– Wed: normal lecture
– Fri: seminar-like lecture (except for 26.10.)
• Lectures are obligatory:
– Normal lectures: 5/7
– Seminar-like lectures: 4/5
• Lists are circulated
Course Organization
Lectures (2)
• Lecturing language is Finnish, slides are in English:
– Students can also use English
– A foreign student group can be established
• Normal lectures:
– Basics, terminology, standard methods
– Lecturer-driven teaching
• Seminar-like lectures:
– Extensions to the basic methods
– Lecturer gives an introduction
– Student groups give short presentations
Course Organization
Lectures (3)
• Group for seminar (and exercise) work:
– 10 groups à 3 persons, 2 groups/lecture
– Dates are agreed at the beginning of the course
– Articles are given out on the previous week's Wed
• Seminar presentations:
– Presentation as an HTML page (around 3-5 printed pages), due by the start of the seminar:
• Can be either an HTML page or a printable document in PostScript/PDF format
– 30 minutes of presentation
– 5-15 minutes of discussion
– Active participation
Course Organization
Course Material
• Lecture slides
• Original articles
• Seminar presentations
• Book: "Data Mining: Concepts and Techniques" by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, August 2000. 550 pages. ISBN 1-55860-489-8
• Remember to check course website and folder for the material!
Course Organization
Exercises
• Given by Pirjo Moen:
– Email: [email protected]
– Room: B350
– Tel: 191 44238
• 1.11.-29.11.2001 (5 exercises)
• Thu 12-14 (A318)
• Exercises are obligatory:
– Exercises: 4/5
• Lists are circulated
• Discussion is an essential part!
Course Organization
Exercises
• Usually around 3-4 exercises:
– 2-3 "normal" exercises (with subtasks):
• Available due Thu mornings at 9
– 1 group work:
• A practical exercise
• Available due Thu mornings at 9
• A written report (not hand-written!) must be returned at the exercise session
• Group = the seminar presentation group
• Foreign students:
– Return all exercises in written format to Pirjo Moen
Course Organization
Home Exam
• The home exam is given on 28.11.2001
• Must be returned by 21.12.2001 (printed version, not hand-written, not by email)
• Tentatively:
– Course lectures, seminar presentations and
exercises are the material for the exam
– Questions contain both theoretical and
practical issues
– Around 4-6 smaller questions
– Around 1-2 bigger questions
Course Organization
Course Evaluation
• Scale: 1-/3 … 3/3 or rejected
• Grade = home exam + exercises + experiments + group presentations:
– home exam: max 30 points
• (4 X 5p) + (1 X 10p)
– normal exercises (10): max 5 points
• 2: 1p, 4: 2p, 6: 3p, 8: 4p, 10: 5p
– experiments (5): max 15 points
• max 3 points/experiment
– group presentation: max 10 points
Course Organization
Course Evaluation
•
Passing the course: min 30 points
– home exam: min 13 points (max 30 points)
– exercises/experiments: min 8 points (max
20 points)
• at least 3 returned and reported experiments
– group presentation: min 4 points (max 10
points)
•
Remember also the other requirements:
– Attending the lectures (5/7)
– Attending the seminars (4/5)
– Attending the exercises (4/5)
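The point rules above can be checked with a small calculation. This is only an illustrative sketch: the function names are hypothetical, attendance requirements are not modelled, and it assumes that odd exercise counts round down (the slides list points only for 2, 4, 6, 8 and 10 exercises).

```python
def exercise_points(done: int) -> int:
    """Points for normal exercises: 2 done -> 1p, 4 -> 2p, ..., 10 -> 5p.

    Assumption: odd counts round down; the slides list only even counts.
    """
    return min(done, 10) // 2


def passes(home_exam: int, exercises_done: int, experiment_points: int,
           experiments_returned: int, presentation: int) -> bool:
    """Apply the point minimums from the evaluation slides (attendance excluded)."""
    exercise_total = exercise_points(exercises_done) + experiment_points
    total = home_exam + exercise_total + presentation
    return (
        total >= 30                    # overall minimum
        and home_exam >= 13            # home exam minimum (max 30)
        and exercise_total >= 8        # exercises/experiments minimum (max 20)
        and experiments_returned >= 3  # at least 3 reported experiments
        and presentation >= 4          # group presentation minimum (max 10)
    )


print(passes(home_exam=15, exercises_done=8, experiment_points=9,
             experiments_returned=3, presentation=6))
```

Note that the three sub-minimums add up to only 25 points, so meeting each of them is not enough on its own to reach the overall 30-point limit.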
Course Organization
Course Contents (1)
• Module/Week 1:
– What is Data Mining?
– Association rules
– 24.10. normal lecture by Mika
– 26.10. normal lecture by Mika
• Module/Week 2:
– Recurrent patterns
– Episode rules, minimal occurrences
– 31.10. normal lecture by Mika
– 2.11. seminar-like lecture by Pirjo
Course Organization
Course Contents (2)
• Module/Week 3:
– Text mining
– 7.11. normal lecture by Mika
– 9.11. seminar-like lecture by Mika
• Module/Week 4:
– Clustering
– Classification
– Similarity
– 14.11. normal lecture by Pirjo
– 16.11. seminar-like lecture by Mika
Course Organization
Course Contents (3)
• Module/Week 5:
– Knowledge discovery process
– Pre- and postprocessing
– 21.11. normal lecture by Pirjo
– 23.11. seminar-like lecture by Pirjo
• Module/Week 6:
– Data mining tools
– Summary, future
– 28.11. normal lecture by Pirjo
– 30.11. seminar-like lecture by Pirjo
Course Organization / Groups
Group Establishment
• Group is for both seminar and weekly group exercise work
• 10 groups à 3 persons
Get grouped!
Course Organization / Groups
• Group presentation time allocation:
– Fri 2.11.: Group 1, Group 2 (associations)
– Fri 9.11.: Group 3, Group 4 (episodes)
– Fri 16.11.: Group 5, Group 6 (text mining)
– Fri 23.11.: Group 7, Group 8 (clustering)
– Fri 30.11.: Group 9, Group 10 (KDD process)
Course Organization / Groups
• Group 1:
– Asikainen Tomi, Hyvönen Leena
• Group 2:
– Löfström Jaakko, Pitkänen Esa, Tarvainen Tero
• Group 3:
– Jokinen Sakari, Kuokkanen Ville, Tolvanen Juha
• Group 4:
– Lehmussaari Kari, Pietilä Mikko, Uusitalo Petri
• Group 5:
– Johansson Carl, Kerminen Antti, Sundman Jonas
Course Organization / Groups
• Group 6:
– Malinen Johanna, Sahlberg Mauri, Vasankari Minna
• Group 7:
– Arkko Jouko, Ojala Petri, Rapiokallio Maarit
• Group 8:
– Palin Kimmo, Pasanen Janne (, X)
• Group 9:
– Aunimo Lili, Lehtonen Miro, Saikku Arja
• Group 10:
– X, X, X
Introduction to Data Mining (DM)
What? Why?
Applications
KDD Process
DM Views
Major Issues
Computers in 1940s (ENIAC)
Personal Home Network in 2000s
[Figure: home network diagram with several storage devices, a network traffic monitor, and an Internet connection]
Evolution of Database Technology
• 1960s:
– Data collection, database creation, IMS and network DBMS
• 1970s:
– Relational data model, relational DBMS implementation
• 1980s:
– RDBMS, advanced data models (extended-relational, OO,
deductive, etc.) and application-oriented DBMS (spatial,
scientific, engineering, etc.)
• 1990s:
– Data mining and data warehousing, multimedia databases, and
Web technology
Why Data Mining?
• Enormous amounts of
data available:
– Automated data collection
tools and mature database
technology lead to huge
amounts of data stored in
databases, data warehouses
and other information
repositories
– Manual inspection is either
tedious or just impossible
What is Data Mining?
• Ultimately:
– "Extraction of interesting (non-trivial,
implicit, previously unknown, potentially
useful) information or patterns from data in
large databases"
• Often just:
– "Tell something interesting about this data",
"Describe this data"
→ Exploratory, semi-automatic data analysis on large data sets
What is Data Mining?
• Rather established terminology:
– Data mining
• Usually DM is one part of KDD process
– Knowledge discovery in databases (KDD)
• The general term that covers, e.g., data
preprocessing, DM, and post-processing
• Not so often used terms:
– Knowledge extraction, data archeology
• Newest hype:
– Business intelligence, knowledge
management
What is DM Useful for?
Increase the knowledge on which decisions are based
E.g., impact on marketing
The role and importance of KDD and DM have grown rapidly, and are still growing!
But DM is not just marketing...
[Figure: Marketing, Database Marketing, Data Warehousing, KDD & Data Mining]
Potential Applications?
• Database analysis and decision
support:
– Market analysis and management
– Risk analysis and management
– Fraud detection and management
• Other applications:
– Web mining
– Text mining
– etc.
Example (1)
• You are a marketing manager for a
cellular telephone company:
– Customers receive a free phone (worth 150€)
with one-year contract; you pay a sales
commission of 250€ per contract
– Problem: Turnover (after contract expires) is
25%
– Giving a new phone to everyone whose
contract is expiring is very expensive
– Bringing back a customer after quitting is
both difficult and expensive
Example (1)
Yippee!
I won't leave!
• Three months before a
contract expires, predict
which customers will leave:
– If you want to keep a customer
that is predicted to leave, offer
them a new phone
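The economics of this example can be sketched as follows. The phone price, commission and turnover rate come from the slide; everything else is an assumption made for illustration: the churn model and its `precision` and `recall` values are hypothetical, an offered phone is assumed to always keep the customer, and each leaver the model misses is assumed to be replaced by a new customer (phone plus commission).

```python
PHONE_COST = 150   # euros, from the slide
COMMISSION = 250   # euros per new contract, from the slide
CHURN_RATE = 0.25  # turnover after contract expiry, from the slide


def cost_offer_to_all(n_customers: int) -> float:
    """Every customer with an expiring contract gets a new phone."""
    return n_customers * PHONE_COST


def cost_targeted(n_customers: int, precision: float, recall: float) -> float:
    """Only customers flagged by a (hypothetical) churn model get a phone.

    precision: share of flagged customers who really would have left.
    recall: share of real leavers that the model flags.
    """
    leavers = n_customers * CHURN_RATE
    caught = leavers * recall
    flagged = caught / precision   # includes false alarms
    missed = leavers - caught      # assumed lost and replaced
    return flagged * PHONE_COST + missed * (PHONE_COST + COMMISSION)


print(cost_offer_to_all(1000))
print(cost_targeted(1000, precision=0.5, recall=0.8))
```

Under these assumptions the targeted offer is cheaper whenever the model is reasonably accurate; with a poor model the cost of missed leavers dominates.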
Example (2)
Oh, yes!
I love my
Ferrari!
• You are an insurance officer and you should define a suitable monthly payment for an 18-year-old boy who has bought a Ferrari … what to do?
Example (2)
• Analyze all previous customer data
and paid compensations data
• What is the predicted accident
probability based on…
– Driver's gender (male/female) and age
– Car model and age, place of living
– etc.
• If the accident probability is higher than average, set the monthly payment accordingly!
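A minimal sketch of this idea, assuming a tiny made-up customer history: the records, the attribute grouping and the pricing rule (scale the base payment by the ratio of group rate to average rate, never below the base) are all hypothetical and only illustrate the reasoning on the slide.

```python
from collections import defaultdict

# Hypothetical historical records: (age_group, car_type, had_accident)
HISTORY = [
    ("18-25", "sports", True),
    ("18-25", "sports", True),
    ("18-25", "sports", False),
    ("18-25", "family", False),
    ("18-25", "family", True),
    ("26-60", "sports", False),
    ("26-60", "sports", True),
    ("26-60", "family", False),
    ("26-60", "family", False),
    ("26-60", "family", False),
]


def accident_rates(records):
    """Observed accident frequency for each (age_group, car_type) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [accidents, customers]
    for age, car, accident in records:
        counts[(age, car)][0] += int(accident)
        counts[(age, car)][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


def monthly_payment(base, age, car, records):
    """Raise the base payment when the group's rate exceeds the average."""
    overall = sum(accident for _, _, accident in records) / len(records)
    rate = accident_rates(records).get((age, car), overall)
    return base * max(1.0, rate / overall)


print(monthly_payment(100.0, "18-25", "sports", HISTORY))
```

A real model would use far more data and attributes; plain group frequencies are used here only because they make the "higher than average" comparison explicit.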
Example (3)
• You are in a foreign country and
somebody steals or duplicates your
credit card or mobile phone …
• Credit card companies …
– use historical data to build models of
fraudulent behaviour and use data mining to
help identify similar instances
• Phone companies …
– analyze patterns that deviate from an
expected norm (destination, duration, etc.)
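Deviation from an expected norm can be illustrated with a very simple rule: flag a call whose duration lies more than a few standard deviations from the customer's usual behaviour. This is a sketch under assumptions, not how any particular operator does it; the call durations and the threshold of 3 are made up.

```python
from statistics import mean, stdev

# Hypothetical call durations (minutes) from a customer's normal usage
USUAL_CALLS = [3, 5, 4, 6, 5, 4, 5, 3, 6, 4]


def unusual_calls(history, new_calls, threshold=3.0):
    """Return the new calls that deviate strongly from the historical norm."""
    mu = mean(history)
    sigma = stdev(history)
    return [d for d in new_calls if abs(d - mu) > threshold * sigma]


print(unusual_calls(USUAL_CALLS, [5, 4, 95, 6]))
```

The same pattern applies to other attributes mentioned on the slide, such as call destination, once they are turned into numbers or frequencies.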
Example (4)
Excellent surfing
experience!
• Web access logs can be analyzed
for …
– discovering customer preferences
– improving Web site organization
• Similarly …
– all kinds of log information analysis
– user interface/service adaptation
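As a small illustration of web log analysis, the sketch below counts page hits (a rough proxy for preferences) and page-to-page transitions (a hint for site organization). The log format, user ids and paths are hypothetical, and the log is assumed to be ordered by time.

```python
from collections import Counter

# Hypothetical, already parsed access log: (user, requested page)
LOG = [
    ("u1", "/"), ("u1", "/courses"), ("u1", "/courses/dm"),
    ("u2", "/"), ("u2", "/courses"), ("u2", "/staff"),
    ("u3", "/courses"), ("u3", "/courses/dm"),
]


def page_hits(log):
    """How often each page was requested."""
    return Counter(page for _, page in log)


def transitions(log):
    """Count consecutive page pairs within each user's visit."""
    last_page = {}
    pairs = Counter()
    for user, page in log:
        if user in last_page:
            pairs[(last_page[user], page)] += 1
        last_page[user] = page
    return pairs


print(page_hits(LOG).most_common(2))
print(transitions(LOG).most_common(2))
```

Frequent transitions such as "/courses" followed by "/courses/dm" suggest links that deserve a prominent place on the site.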
Knowledge Discovery Process (1)
Learning the domain
Creating a target data set
Data cleaning/preprocessing
Data reduction/projection
Choosing the DM task
Knowledge Discovery Process (2)
Choosing the DM algorithm(s)
Data mining: Search
Pattern evaluation
Knowledge presentation
Use of discovered knowledge
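The steps of the process can be sketched as a toy pipeline. Everything here is an assumption chosen for illustration: the shopping-basket data is made up, the data mining step is simply counting item pairs that occur together, and the evaluation step is a hypothetical minimum-support threshold of 0.5.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data with typical defects (missing values, empty rows)
RAW = [
    {"id": 1, "items": ["bread", "milk", None]},
    {"id": 2, "items": ["bread", "butter", "milk"]},
    {"id": 3, "items": []},
    {"id": 4, "items": ["milk", "bread"]},
    {"id": 5, "items": ["butter"]},
]


def preprocess(rows):
    """Cleaning and projection: drop missing items and empty rows."""
    cleaned = (frozenset(i for i in row["items"] if i) for row in rows)
    return [items for items in cleaned if items]


def mine_pairs(transactions):
    """Data mining step: count how often each item pair occurs together."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(items), 2))
    return counts


def postprocess(counts, n_transactions, min_support=0.5):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {
        pair: count / n_transactions
        for pair, count in counts.items()
        if count / n_transactions >= min_support
    }


transactions = preprocess(RAW)
patterns = postprocess(mine_pairs(transactions), len(transactions))
print(patterns)
```

In practice the steps are iterated: results of the evaluation often send the analyst back to data selection or preprocessing.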
Typical KDD Process
[Figure: typical KDD process. Operational database → time-based selection → raw data → (1) preprocessing → cleaned, verified, focused input data → (2) data mining → results → (3) postprocessing with evaluation of interestingness → selected usable patterns → utilization]
Utilization
Increasing potential to support business decisions (layers from top to bottom):
• Making Decisions
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: Statistical Analysis, Querying and Reporting
• Data Warehouses / Data Marts: OLAP, MDA
• Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles (from top to bottom): End User, Business Analyst, Data Analyst, DBA
The Value Chain
Decision
• Promote product A in region Z
• Mail ads to families of profile P
• Cross-sell service B to clients C
Knowledge
• A quantity Y of product A is used in region Z
• Customers of class Y use x% of C during period D
Information
• X lives in Z
• S is Y years old
• X and S moved
• W has money in Z
Data
• Customer data
• Store data
• Demographical data
• Geographical data
Data Mining Views
• General approaches:
– Descriptive data mining:
• Describe what interesting things can be found in this data!
• Explain this data to me!
– Predictive data mining:
• Based on this and previous data,
tell me what will happen in the
future!
• Show me the future trends!
Data Mining Views
• Views based on …
– Databases to be mined
– Knowledge to be discovered
– Techniques utilized
– Applications adapted
• Let's take a closer look at
these views...
Data Mining Views
Databases to be mined
• Relational
• Transactional
• Object-oriented
• Object-relational
• Active
• Spatial
• Time-series
• Text, XML
• Multi-media
• Heterogeneous
• Legacy
• Inductive
• WWW
• etc.
Data Mining Views
Knowledge to be mined = tasks
• Characterization
• Discrimination
• Association
• Classification
• Clustering
• Trend
• Deviation analysis
• Outlier analysis
• etc.
Data Mining Views
Techniques utilized
• Database-oriented
• Data warehouse (OLAP)
• Machine learning
• Statistics
• Visualization
• Neural networks
• Etc.
Data Mining Views
Applications adapted
• Retail (supermarkets etc.)
• Telecom
• Banking
• Fraud analysis
• DNA mining
• Stock market analysis
• Web mining
• Log data analysis
• etc.
Major Issues in Data Mining
• Mining methodologies and interaction:
– Mining different kinds of knowledge
– Interactive mining of knowledge
– Incorporation of background knowledge
– DM query languages and ad-hoc DM
– Visualization of DM results
– Handling noise and incomplete data
– The interestingness problem
• Performance and scalability:
– Efficiency and scalability of DM algorithms
– Parallel, distributed and incremental mining methods
Major Issues in Data Mining
• Diversity of data types:
– Handling complex types of data
– Mining information from heterogeneous databases (Web etc.)
• Application and integration of discovered knowledge:
– Domain-specific DM tools
– Intelligent query answering and decision making
– Integration of discovered knowledge with existing knowledge
• Protection of data …
– Security
– Integrity
– Privacy
Historical Data Mining Activities
• 1989 IJCAI Workshop
• 1991-1994 KDD Workshops
• 1995-1998 KDD Conferences
• 1998 ACM SIGKDD
• 1999- SIGKDD Conferences
• And many smaller/new DM conferences …
– PAKDD, PKDD
– SIAM-Data Mining, (IEEE) ICDM
– etc.
Useful References on Data Mining
“Standards”
• DM:
– Conferences: KDD, PKDD, PAKDD, ...
– Journals: Data Mining and Knowledge Discovery, CACM
• DM/DB:
– Conferences: ACM-SIGMOD/PODS, VLDB, ...
– Journals: ACM-TODS, J. ACM, IEEE-TKDE, JIIS, ...
• AI/ML:
– Conferences: Machine Learning, AAAI, IJCAI, ...
– Journals: Machine Learning, Artific. Intell., ...
Conclusions
• Data mining: semi-automatic discovery of interesting
patterns from large data sets
• Knowledge discovery is a process:
– Preprocessing
– Data mining
– Postprocessing
• There are different things to be mined, used or utilized:
– Databases (relational, object-oriented, spatial, WWW, …)
– Knowledge (characterization, clustering, association, …)
– Techniques (machine learning, statistics, visualization, …)
– Applications (retail, telecom, Web mining, log analysis, …)
Conclusions
• Module/Week 1:
– What is Data Mining?
– Association rules
– 24.10. normal lecture by Mika
– 26.10. normal lecture by Mika
• Module/Week 2:
– Episode rules, minimal occurrences
– 31.10. normal lecture by Mika
– 2.11. seminar-like lecture by Pirjo
• Module/Week 3:
– Text mining
– 7.11. normal lecture by Mika
– 9.11. seminar-like lecture by Mika
Conclusions
• Module/Week 4:
– Clustering, Classification, Similarity
– 14.11. normal lecture by Pirjo
– 16.11. seminar-like lecture by Mika
• Module/Week 5:
– Knowledge discovery process
– Pre- and postprocessing
– 21.11. normal lecture by Pirjo
– 23.11. seminar-like lecture by Pirjo
• Module/Week 6:
– Data mining tools, Summary, Future
– 28.11. normal lecture by Pirjo
– 30.11. seminar-like lecture by Pirjo
Seminar Presentations
• Seminar presentations:
– Articles are given out on the previous week's Wed
– Presentation as an HTML page (around 3-5 printed pages), due by the start of the seminar:
• Can be either an HTML page or a printable document in PostScript/PDF format
– 30 minutes of presentation
– 5-15 minutes of discussion
– Active participation
Seminar Presentations/Groups 1-2
Quantitative Rules
MINERULE
Seminar 1/2: Quantitative Rules
• R. Srikant, R. Agrawal: "Mining Quantitative Association Rules in
Large Relational Tables", Proc. of the ACM-SIGMOD 1996
Conference on Management of Data, Montreal, Canada, June
1996.
Seminar 2/2: MINERULE
• Rosa Meo, Giuseppe Psaila, Stefano Ceri: "A New SQL-like
Operator for Mining Association Rules". VLDB 1996: 122-133
Introduction to Data Mining (DM)
Thank you for
your attention and
have a nice course!
Thanks to Jiawei Han from Simon Fraser University for his slides
which greatly helped in preparing this lecture! Also thanks to Fosca
Giannotti and Dino Pedreschi from Pisa for their slides.