Informatiseringscentrum
Patterns in Usage Data
Victor Maijer
University of Amsterdam
2 June 2006, Vancouver
Overview
- Introduction
- Data Mining
- Results
- Sakai & DM
- Conclusion
Introduction
• UvA founded in 1632 (Athenaeum Illustre)
• 7 schools (faculties), 1518 study programmes
• 25,000 students, 3,500 employees (2,000 academic staff)
• Blackboard has been our VLE since 1999, with 13,000 users per day
• We run OSP and regard Sakai as a potential successor to Blackboard
Strategic Information

• Stakeholders need strategic information in order to make decisions
• Stakeholders are:
– Instructors
– Administrators
– Management
– Support
– Etc.
Data Warehouse

• Provides an integrated and total view of learning/collaboration systems
• Makes the systems' current and historical information easily available for decision making
• Makes decision-support transactions possible without hindering operational systems
• Presents a flexible and interactive source of strategic information
Architecture
Info for Administrators & Management
Why I went mining
• I had data, a lot of it
• I had done it before
• I wanted to do some fun stuff
Official reason (the one I tell my boss):
• We needed strategic information about how our VLE evolved
What is Data Mining?
• Data mining is the extraction of implicit, previously unknown, and potentially useful information from data.
• Clustering is a data mining technique that applies when instances are to be divided into natural groups.
Example
Course       Documents
ABBA         36
BEATLES      4
COLDPLAY     30
DARKHORSES   2
ELASTICA     24

Group   Members                    Average Docs
A       ABBA, COLDPLAY, ELASTICA   30
B       BEATLES, DARKHORSES        3
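The grouping in this example can be reproduced with a very small clustering routine. The sketch below uses plain k-means on a single feature (documents per course) with two groups. The slides do not say which algorithm produced the example, so k-means is an assumption, and the function name `cluster_courses` is made up for illustration.

```python
# Document counts per course, taken from the example table.
DOCS = {"ABBA": 36, "BEATLES": 4, "COLDPLAY": 30, "DARKHORSES": 2, "ELASTICA": 24}


def cluster_courses(doc_counts, k=2, max_iter=20):
    """Group courses by document count with 1-D k-means (assumes k >= 2)."""
    names = list(doc_counts)
    ordered = sorted(doc_counts.values())
    # Deterministic start: spread the initial centres over the sorted values.
    centres = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each course joins the group with the nearest centre.
        groups = [[] for _ in range(k)]
        for name in names:
            nearest = min(range(k), key=lambda i: abs(doc_counts[name] - centres[i]))
            groups[nearest].append(name)
        # Update step: each centre becomes the mean of its group.
        new_centres = [
            sum(doc_counts[n] for n in g) / len(g) if g else c
            for g, c in zip(groups, centres)
        ]
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres
    return list(zip(groups, centres))


for members, average in cluster_courses(DOCS):
    print(", ".join(members), "average docs:", average)
```

With the five courses above, the two groups it finds have average document counts of 3 and 30, matching groups B and A in the table.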
Procedure
• Determine mining questions
• Determine source (tables)
• Verify by changing items via GUI
• Identify needed output formats for analysis
• Define SQL queries
• Program scripts (Perl)
• Determine which clustering techniques you want to apply
• Analyze (cluster)
‘Weka’ is an excellent open-source Java tool for data mining:
http://www.cs.waikato.ac.nz/ml/weka/
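The middle steps of this procedure (SQL query, script, output format) can be sketched as follows. The original scripts were written in Perl; Python is used here only for illustration. The table `course_content`, its columns, and the sample rows are hypothetical stand-ins for the real VLE database, and ARFF is chosen as the output because it is the plain-text format Weka reads.

```python
import sqlite3


def to_arff(relation, columns, rows):
    """Render query results as ARFF text with numeric attributes only."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in columns]
    lines += ["", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical source table: one row per content item in a course site.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_content (course_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO course_content VALUES (?, ?)",
    [(1, "document"), (1, "document"), (1, "announcement"), (2, "document")],
)

# One output row per course, one numeric column per content type.
query = """
    SELECT SUM(kind = 'document')     AS documents,
           SUM(kind = 'announcement') AS announcements
    FROM course_content
    GROUP BY course_id
    ORDER BY course_id
"""
cursor = conn.execute(query)
columns = [column[0] for column in cursor.description]
arff = to_arff("course_sites", columns, cursor.fetchall())
print(arff)
```

The printed text can be saved with an `.arff` extension and opened in Weka, where a clustering technique is then chosen and applied.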
Domains clustered
• CourseSites and their content
• Users (instructors)
• Sessions (students)
Site clusters

• Basic usage (content + announcements)
• Extended usage

Cluster   Size(%)   N
A         87        1547
B         7         122
C         4         66
D         2         43

[Two charts: tool usage per cluster (Clusters A to D); series: DiscussionFora, Content, Announcement, Gradebook, Tests, Groups]
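The Size(%) column appears to be each cluster's share of all clustered sites, rounded to a whole percent. The slide does not state this, so treat it as an assumption. A short check using the N values from the site cluster table (`size_shares` is a hypothetical helper name):

```python
def size_shares(counts):
    """Return each cluster's share of the total, as a rounded percentage."""
    total = sum(counts.values())
    return {cluster: round(100 * n / total) for cluster, n in counts.items()}


# N per site cluster, as listed on the slide.
site_clusters = {"A": 1547, "B": 122, "C": 66, "D": 43}
print(size_shares(site_clusters))
```

For the site clusters this gives 87, 7, 4 and 2 percent, which agrees with the Size(%) column.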
Content clusters
[Chart: content types per cluster (Clusters A to D); series: Test, Assignment, Document, External Link, Folder]

Cluster   Size(%)   N
A         91        1636
B         3         62
C         3         57
D         3         45
Instructor activity clusters
[Chart: instructor activity per cluster (Clusters A to D); series: Announcements, Content, Dropbox, DiscussionBoard, Gradebook, Test]

Cluster   Size(%)   N
A         88        1443
B         7         115
C         4         61
D         1         15
Student session clusters
[Chart: Clicks and Dur(min) per session for Clusters A, B and C; data labels: 171.45, 63.4, 25.5, 29.7, 8.7, 3.8]

Cluster   Size(%)   N
A         91        1294K
B         6         90K
C         2         32K
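The two session features shown here, clicks and duration in minutes, have to be derived from raw click logs before sessions can be clustered. The slides do not show how this was done. The sketch below assumes a log of (session id, timestamp) pairs and takes duration as the time between a session's first and last click. The log layout, the sample rows, and the function name `session_features` are all illustrative.

```python
from collections import defaultdict
from datetime import datetime


def session_features(events):
    """Return {session_id: (clicks, duration_minutes)} for a click log."""
    stamps = defaultdict(list)
    for session_id, timestamp in events:
        stamps[session_id].append(datetime.fromisoformat(timestamp))
    features = {}
    for session_id, times in stamps.items():
        # Duration runs from the first to the last click in the session.
        duration = (max(times) - min(times)).total_seconds() / 60
        features[session_id] = (len(times), duration)
    return features


# Made-up log rows, for illustration only.
events = [
    ("s1", "2006-03-01T10:00:00"),
    ("s1", "2006-03-01T10:01:30"),
    ("s1", "2006-03-01T10:04:00"),
    ("s2", "2006-03-01T11:00:00"),
]
print(session_features(events))
```

Each session then becomes one instance with two numeric attributes, which is the shape a clustering tool expects.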
Extra
• Female students click significantly more than male students and have significantly longer sessions
• Any ideas?
Sakai & Data mining
• Our UvA pilots were too small to analyze
• Content can be clustered
• Events are difficult to cluster (not enough logging compared to Blackboard)
Implications
• Put rumours into perspective
• Differentiate by user group
– Support
– Functionality
Conclusion
• Methods
– Clustering can be used to discover usage patterns
– You need appropriate hardware for preprocessing and clustering
• Results
– Basic Usage (Documents & Announcements)
– Duration of a session is a couple of minutes
– Extended Usage grows but is limited
• Sakai needs more logging if it wants to compete with Blackboard
• A Sakai warehouse would be nice
Evolvement
[Chart: Users and Usage]