A Decision Support System to
improve e-Learning Environments
Marta Zorrilla, Diego García, Elena Álvarez
University of Cantabria, Spain
Motivation
• Nowadays most universities and educational centres offer e-learning courses
• BUT, due to the lack of face-to-face contact, instructors have real difficulties knowing the rhythm and progress of their students
• E-learning platforms provide some tools to monitor and track student activity
◦ Poor information
◦ Difficult to get a clear vision of each student
MATEP Data warehouse
Goal
• Develop a module to answer questions such as
OLAP
◦ When do students connect to the system?
◦ Do they work online?
◦ How often do they use collaborative tools? Which one?
◦ Who leaves the course and when?
DATA MINING
◦ Knowing students' profiles according to demographic and navigation information
◦ Grouping students according to their style of learning
◦ Which tools do they use together?
◦ Knowing drop-out students' profile
• Instructors can obtain models without requiring data mining knowledge
How to get it
• Defining templates for each question
◦ choose the variables
◦ specify how to obtain them (ETL)
◦ determine the data mining algorithm
◦ establish parameters of the algorithm
• Two main difficulties
◦ Choosing parameters of algorithms
◦ Showing the results in an easy and understandable way (GUI)
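As a rough illustration of what one such template could hold, here is a minimal Python sketch. The slides store templates as XML schemas; the class name `QuestionTemplate`, its field names, the ETL description string and the `clusters` parameter are all assumptions made for this example, not the authors' format.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionTemplate:
    """Hypothetical container for the four parts of a template."""
    question: str
    variables: list[str]          # which variables to use
    etl: str                      # how to obtain them (free-text ETL description)
    algorithm: str                # data mining algorithm to run
    parameters: dict[str, object] = field(default_factory=dict)

    def missing_parts(self) -> list[str]:
        """Return the names of the template parts that are still empty."""
        parts = {
            "variables": self.variables,
            "etl": self.etl,
            "algorithm": self.algorithm,
            "parameters": self.parameters,
        }
        return [name for name, value in parts.items() if not value]


# Variables taken from the student-profile slide; everything else is illustrative.
student_profile = QuestionTemplate(
    question="Student profile",
    variables=["gender", "age", "sessions", "total_time",
               "avg_sessions_week", "avg_time_week"],
    etl="aggregate session logs per student (hypothetical description)",
    algorithm="KMeans",
    parameters={"clusters": 3},
)

print(student_profile.missing_parts())  # an empty list means the template is complete
```

A template with an empty `parameters` dictionary would be reported as incomplete, which mirrors the first difficulty listed above: the parameters are the part an instructor is least able to supply.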
Web Service Architecture
[Figure: web service architecture]
Clients: Blackboard module, Moodle module, ...
Client actions: Choose template, Generate XML file, Send data (XML file), View results, Refine results
Templates DB (XML schemas)
Wrapper: XML schema validation, Data validation, Parameters selection, Data Mining algorithms, Modify parameters, Visualize results
Proposal of patterns: Student profile
◦ Variables: gender, age, number of sessions, time spent, average sessions per week, average time spent per week.
◦ Algorithms:
▪ EM (Expectation-Maximization) / SOM → number of clusters
▪ Kmeans

                 Average   Cluster 0   Cluster 1   Cluster 2
Age              22        24          22          21
Gender           Male      Male        Female      Male
TotalTime        1976      1313        2290        2085
Sessions         128       80          146         141
AvgTimeWeek      115       76          134         122
AvgSessionWeek   7         4           8           7
Instances                  26% (10)    44% (17)    31% (12)
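To make the clustering step concrete, the following is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. It is not the implementation used in the slides, which do not name a toolkit; the function name `kmeans` and the six toy points, read here as (sessions per week, hours per week), are invented for illustration and are not the course data.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance, ties go to the lower index).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break                       # assignments stable: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical toy data: three light users and three heavy users.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

With real variables on different scales (age in years, total time in minutes), the columns would normally be standardised first, otherwise the largest-valued variable dominates the distance. Note that k-means needs the number of clusters up front, which is why the slide pairs it with EM / SOM to estimate that number.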
Proposal of patterns: Resources used together
◦ Variables: sessionID, boolean variable for each resource used in the course.
◦ Algorithms:
▪ Apriori
▪ Borgelt (more efficient and lower number of rules)
DISCUSSION <- NoASSIGNEMENT NoCONTENT (43.9, 71.2)
DISCUSSION <- NoASSIGNEMENT NoCONTENT NoMAIL (32.5, 76.6)
DISCUSSION <- NoASSIGNEMENT NoCONTENT ORGANIZER NoMAIL (21.8, 70.5)
NoASSIGNEMENT <- DISCUSSION NoCONTENT NoMAIL (35.3, 70.5)
NoCONTENT <- DISCUSSION (61.3, 73.2)
NoCONTENT <- DISCUSSION NoASSIGNEMENT (39.7, 78.7)
NoCONTENT <- DISCUSSION NoASSIGNEMENT NoMAIL (32.4, 76.9)
NoCONTENT <- DISCUSSION NoASSIGNEMENT ORGANIZER (26.9, 70.3)
NoCONTENT <- DISCUSSION NoMAIL (49.3, 71.6)
NoMAIL <- (100.0, 80.6)
ORGANIZER <- (100.0, 80.3)
… 57 rules
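Each rule above is followed by two percentages. The rules with an empty antecedent show 100.0 as the first figure, which suggests that the first value is the share of sessions containing the antecedent and the second is the confidence. Under that reading, the two figures can be computed as in the sketch below; the function name `rule_stats` and the five toy sessions are assumptions for illustration, not the course data.

```python
def rule_stats(sessions, antecedent, consequent):
    """Return (antecedent support %, confidence %) for `consequent <- antecedent`.

    `sessions` is a list of sets of item names, one set per session.
    """
    antecedent = frozenset(antecedent)
    body = [s for s in sessions if antecedent <= s]   # sessions matching the antecedent
    if not body:
        return 0.0, 0.0
    hits = sum(1 for s in body if consequent in s)    # ...that also contain the consequent
    return 100 * len(body) / len(sessions), 100 * hits / len(body)


# Hypothetical sessions, using positive and negative items as in the slide.
sessions = [
    {"DISCUSSION", "NoCONTENT"},
    {"DISCUSSION", "NoCONTENT"},
    {"DISCUSSION", "CONTENT"},
    {"NoCONTENT"},
    {"DISCUSSION", "NoCONTENT"},
]

print(rule_stats(sessions, ["DISCUSSION"], "NoCONTENT"))  # (80.0, 75.0)
print(rule_stats(sessions, [], "NoCONTENT"))              # (100.0, 80.0)
```

An empty antecedent matches every session, so its support is always 100 and its confidence is simply the frequency of the consequent, which is the pattern seen in the `NoMAIL <-` and `ORGANIZER <-` rules.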
Proposal of patterns: Session profile
◦ Variables: time spent in session, hits and time spent in content-pages, hits and time spent in collaborative resources (mail, discussion, chat) and in the rest of resources of the course.
◦ Algorithms:
▪ EM (Expectation-Maximization) / SOM → number of clusters
▪ Kmeans
Session profile

                          Average    Cluster 0    Cluster 1   Cluster 2   Cluster 3
SessionTime               14.0658    6.0482       27.0222     51.8116     66.7015
hit_mail                  0.6873     0.6191       2.2519      0.9799      0.6741
hit_discussion            8.9338     7.1086       23.9259     17.4246     16.9726
hit_chat                  0.0625     0.0021       2.3185      0.0402      0.0373
hit_contentpage           1.4481     0.6111       2.363       0.8769      11.5572
hit_assignments           1.1112     0.5813       3.3111      6.2739      1.4975
hit_weblinks              0.0672     0.0184       0.6222      0.0804      0.4428
hit_organizer             2.3489     1.5293       3.9259      3.1206      10.7015
hit_learningobjectives    0.1315     0.0955       0.7333      0.1834      0.301
hit_other                 1.0856     0.7269       5.3333      3.4648      1.5249
time_mail                 0.725      0.5591       4.1852      1.2965      0.9502
time_discussion           3.1068     1.9746       5.8667      9.0101      9.6592
time_chat                 0.0018     0            0.0741      0           0
time_contentpage          4.9652     1.7227       5.8963      3.2613      44.5
time_assignments          2.9017     0.6796       5.3111      27.7412     3.6517
time_weblinks             0.0321     0.0116       0.6296      0.0226      0.0821
time_organizer            0.6511     0.2741       1.1037      0.9573      4.6318
time_learningobjectives   0.0178     0.0148       0.0444      0.0201      0.0423
time_other                1.201      0.501        2.5407      8.3367      1.9254
Instances                            83% (4731)   2% (135)    7% (398)    7% (402)
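One way to read the table is that the Average column is the instance-weighted mean of the cluster centroids, and the percentages are each cluster's share of the sessions. The slides do not state this, so the sketch below is only a consistency check; it uses the SessionTime row and the instance counts from the table, and the helper names `weighted_mean` and `shares` are made up for the example.

```python
# Values copied from the session-profile table (Cluster 0 to Cluster 3).
instances = [4731, 135, 398, 402]
session_time = [6.0482, 27.0222, 51.8116, 66.7015]


def weighted_mean(values, weights):
    """Mean of `values`, each weighted by the size of its cluster."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def shares(weights):
    """Percentage of all instances in each cluster, rounded to whole numbers."""
    total = sum(weights)
    return [round(100 * w / total) for w in weights]


overall = weighted_mean(session_time, instances)
print(round(overall, 4))   # 14.0658, matching the Average column
print(shares(instances))   # [83, 2, 7, 7], matching the Instances row
```

The check also shows why the averages say little on their own: Cluster 0 holds about 83% of the sessions, so the overall figures are dominated by short sessions with few hits.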
Conclusions
• In the instructor's opinion, the models allow her
◦ To gain insight into her students and the use of the course.
◦ To validate or refute hypotheses used in the design of the learning process.
• It is necessary to show the results in a more intuitive way which allows instructors to interpret them easily
Key issues
• Towards Data Mining without parameters:
◦ Choosing algorithms and their parameters
• Data Mining models visualization:
◦ Showing the results in an easy and understandable way
▪ Java 2D/3D.
▪ Wrapper (Matlab, Mathematica, IRIS explorer, Graphviz, …)
Software architecture
[Figure: software architecture, Java EE v.5]
WEB Client: JavaScript / AJAX, DHTML / XML, CSS / XSLT
Application Server: JSP / Servlets, DB Access (ODBC/JDBC), XML Parser (DOM, …), Data mining, Display
Data sources: Blackboard (SQL Server), Moodle (MySQL)
Criteria for association results
• A sufficient number of rules: neither 3 nor 100, but around 10; if the instructor wants more, more specific rules are then generated
• No semantic repetition, that is, the rules must be genuinely different (eggs and rice -> tomato, eggs -> tomato), both with very similar confidence (boost: more than 1% difference between one rule and another)
• A single attribute in the consequent is easier to read
• A small number of attributes in the antecedent (3 to 4), otherwise the rule is very hard to interpret (perhaps in e-learning, whereas in market-basket analysis it could make sense... depending on the initial number of attributes...)
• Not having to set input parameters (support, confidence, boost or lift, ...)
• Borgelt and Balcazar !!! -> use positives only, or positives and negatives; if only positives are used, support and confidence will have to be lowered
• Suggest removing the resource through which the tool is accessed
• Predictive apriori…
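Several of the criteria above can be expressed as a post-filter over the mined rules. The sketch below is one possible reading, not the authors' procedure: it keeps rules with at most 4 antecedent attributes, drops a rule when a more general rule with the same consequent is within 1 percentage point of its confidence, and returns at most 10 rules. The names `Rule`, `rule` and `select_rules` and the threshold defaults are assumptions; the input is the eleven rules listed on the "Resources used together" slide.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    consequent: str
    antecedent: frozenset
    support: float
    confidence: float


def rule(consequent, antecedent, support, confidence):
    """Build a Rule from a space-separated antecedent string."""
    return Rule(consequent, frozenset(antecedent.split()), support, confidence)


def select_rules(rules, max_antecedent=4, min_gain=1.0, limit=10):
    """Filter rules by antecedent size and redundancy, then keep the top `limit`."""
    short = [r for r in rules if len(r.antecedent) <= max_antecedent]

    def redundant(r):
        # Redundant if a strictly more general rule with the same consequent
        # is not beaten by more than `min_gain` confidence points.
        return any(
            g.consequent == r.consequent
            and g.antecedent < r.antecedent
            and r.confidence - g.confidence <= min_gain
            for g in short
        )

    kept = [r for r in short if not redundant(r)]
    kept.sort(key=lambda r: r.confidence, reverse=True)
    return kept[:limit]


RULES = [
    rule("DISCUSSION", "NoASSIGNEMENT NoCONTENT", 43.9, 71.2),
    rule("DISCUSSION", "NoASSIGNEMENT NoCONTENT NoMAIL", 32.5, 76.6),
    rule("DISCUSSION", "NoASSIGNEMENT NoCONTENT ORGANIZER NoMAIL", 21.8, 70.5),
    rule("NoASSIGNEMENT", "DISCUSSION NoCONTENT NoMAIL", 35.3, 70.5),
    rule("NoCONTENT", "DISCUSSION", 61.3, 73.2),
    rule("NoCONTENT", "DISCUSSION NoASSIGNEMENT", 39.7, 78.7),
    rule("NoCONTENT", "DISCUSSION NoASSIGNEMENT NoMAIL", 32.4, 76.9),
    rule("NoCONTENT", "DISCUSSION NoASSIGNEMENT ORGANIZER", 26.9, 70.3),
    rule("NoCONTENT", "DISCUSSION NoMAIL", 49.3, 71.6),
    rule("NoMAIL", "", 100.0, 80.6),
    rule("ORGANIZER", "", 100.0, 80.3),
]

selected = select_rules(RULES)
for r in selected:
    print(r.consequent, "<-", " ".join(sorted(r.antecedent)), r.confidence)
```

On these eleven rules the filter keeps seven; for example, `NoCONTENT <- DISCUSSION NoMAIL` is dropped because the shorter `NoCONTENT <- DISCUSSION` already reaches a higher confidence. The thresholds are still input parameters, so this only moves the parameter-free goal one step further rather than meeting it.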