UNIVERSITÉ DU QUÉBEC À MONTRÉAL
in collaboration with
FREUDENTHAL INSTITUTE OF SCIENCE AND MATHEMATICS EDUCATION
UNIVERSITY OF UTRECHT
DME9200 STAGE DE RECHERCHE II
REPORT
PRESENTED TO
PAUL DRIJVERS
AND
JEAN BÉLANGER
CLAUDIA CORRIVEAU
JUNE 17TH 2010
Table of contents
Introduction
1. Objectives of the internship
1.1 Internship purpose and modalities
1.2 Agreement on the work to do
1.3 The objectives of the internship report
2. Presentation of the Freudenthal Institute
3. Presentation of the research
4. Collecting and analyzing data
4.1 Log files
3.2 Data treatment: data mining
3.2.1 Decision tree
3.2.2 Hierarchical clustering
3.2.3 Correspondence analysis
5. Analysis
5.1 Activity 1
5.2 Activity 2
5.3 Comments and questions
References
ANNEXES
Introduction
As part of the doctoral program [1], students must complete two research internships.
As explained in the course description, the research internships are intended to provide
an opportunity to acquire a wider vision of what it means to do research; they are also
an occasion to interact with researchers and learn from those interactions. I had the
great opportunity to complete one of these two internships at the Freudenthal Institute
for Science and Mathematics Education (Department of Mathematics) in Utrecht,
the Netherlands, from May 25th to June 18th.
1. Objectives of the internship
In this part of the report, I present the purposes of the internship from the university’s
point of view. I then present what I was expected to do and, finally, the objectives of
this report.
1.1 Internship purpose and modalities
The internship is seen as research training through active participation in different parts
of a research project. The student must work 135 hours on a project that differs from his
or her own research project. Before coming to Utrecht, I had to write an internship
proposal describing what I would do during the internship. I will briefly explain what I
was asked to do.
1.2 Agreement on the work to do
I was asked to work on a project concerning the use of ICT tools at upper secondary level
to learn algebraic skills. Four types of expectations were set: in terms of actions,
production, contribution and attendance. In other words, I would have to analyze the data,
show initiative in finding ways to analyze the data and present my contribution in
an exhaustive report.
[1] Doctoral program in education, Université du Québec à Montréal
1.3 The objectives of the internship report
This report has several purposes. The first is to make the work done available to the
researchers involved in the research. It is also an opportunity to link the work done with
both the specific objectives of the planned work and the larger objectives of the
internship (from the university’s point of view).
2. Presentation of the Freudenthal Institute
In the field of Mathematics Education, the Freudenthal Institute for Science and
Mathematics Education is internationally recognized. The Institute was first set up in
1971 by Hans Freudenthal, a mathematician (professor and researcher) who was also
interested in pedagogical issues. The Institute was then called (in English) the Institute
for Development of Mathematics Education. In honor of its founder, it later became the
Freudenthal Institute for Science and Mathematics Education.
A major instruction theory developed by Freudenthal, and still present in the philosophy
of the Institute, is called Realistic Mathematics Education (RME). In this approach, it is
said that, to learn, the student must experience the mathematics. In other words, “in a
stimulating learning environment they [students] should have the opportunity to build up
their own knowledge and understanding” (brochure about the Freudenthal Institute, p. 4).
“Realistic” must not be understood only as real-life problems but as rich, meaningful and
challenging problems that can be solved in different ways.
The Freudenthal Institute is also well known for its research philosophy (and
methodology): Design Research. This form of research is characterized by its
movement from theory to practice and from practice to theory in order to serve both
“worlds”: practice and research. It is also described as an iterative process that consists
of designing, testing and improving a tool or any instructional material.
There are many important research themes: development of mathematical understanding
in childhood, use of technology, professional development, mathematics for students with
special needs, etc. The research I worked on is about the use of ICT tools at upper
secondary level to learn algebraic skills. I will now present this research more precisely.
3. Presentation of the research
The research focuses on the use of ICT tools to develop algebraic skills. There are several
phases to this research. The first could be described as the exploration phase, in which
criteria for a functional ICT tool to develop algebraic skills were identified (and listed)
through a Delphi method. This phase, which is already complete, served to choose a
tool for the rest of the research on the basis of the list of criteria. The tool that
performed best is DME, which is developed by the Institute. To evaluate the use of the
tool, two expert reviews and four one-to-one sessions were conducted. This led to
modifications of the prototype.
The second phase consists of trying out, and eventually refining, the prototype. This phase
could be described as an inductive process in which one wants to learn from the data
collected. There are two goals: to collect and interpret data in order to understand the
development of algebraic skills linked to the use of the tool, and to interpret the
interactions between students and the computer. During the internship, the research was
in this phase.
Eventually, there will be a third phase in which researchers will confirm or reject the
hypotheses made in the second phase. It could be described as the passage from the field
to the laboratory. In other words, trials have been made, and now it is time to confirm
what has been discovered. This leads to more conventional quantitative methods with a
large number of pupils.
4. Collecting and analyzing data
In order to better understand the research, I had to become more familiar with the
tool used to collect the data and with ways to analyze the data. In this section, I will
briefly explain what log files are and what data mining is.
4.1 Log files
Log files are discrete recordings of user actions during the use of software (Guzdial,
1993). They offer the possibility to analyze human-computer interaction either in real
time, so that the software can adjust to what a user has done or give more precise
feedback (Cocea and Weibelzahl, 2006), or afterwards, in a post-hoc analysis. From a
researcher’s point of view, this method of collecting data is interesting because it does
not require a special setting (some ways of collecting data are artificial). Log files are
usually used to characterize types of navigation on the web or within software. The
question to be asked is probably how log files can be used to study a learning process
when a digital tool is used. In a way, they offer the possibility to track every action of
the students and thus to analyze those actions (which can be mistakes, progress, etc.).
According to Cocea and Weibelzahl (2006), log files can record data for a large number
of students and can capture multiple kinds of information.
The main goal when choosing to use log files is to identify functional measures that
can be manipulated by either an individual or a computer. In the DME platform, the log
files record all text entries made by a student, together with the name of the student,
the date, the time, the task id, the step, the expression entered and the feedback
given. As mentioned above, log files are usually used to trace how people navigate, so
often every mouse “click” is recorded so that the navigation can be retraced.
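The record structure described above can be sketched as a small data type. This is only an illustration: the report does not show the actual DME log format, so the tab-separated layout, the names `LogEntry` and `parse_line`, and the sample line are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One logged text entry, with the seven fields listed in the report."""
    student: str
    date: str
    time: str
    task_id: str
    step: int
    expression: str
    feedback: str


def parse_line(line: str) -> LogEntry:
    """Parse one tab-separated line (hypothetical layout, not the real DME format)."""
    student, date, time, task_id, step, expression, feedback = (
        line.rstrip("\n").split("\t")
    )
    return LogEntry(student, date, time, task_id, int(step), expression, feedback)


# Made-up sample line, for illustration only.
sample = "student01\t2010-05-26\t09:14:03\t1.6\t2\t2*x-7 = 0\tfout"
entry = parse_line(sample)
print(entry.task_id, entry.step, entry.feedback)
```

Keeping the step as an integer makes it easy to count steps per task later, which is one of the measures discussed in the analysis.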
As discussed with Christian Bokhove, it would be interesting to have student navigation
logged (using video, for example). It would also be interesting to have a window in
which the student could write remarks about mathematical or other difficulties (such as
the use of the tool). I don’t know if a “help” function exists, but it would be interesting
to have one about the use of the digital tool and to record its use in the log files. It
would also be helpful, from my perspective, to be able to “click” on the feedback and get
information on when this feedback is usually given. It could help the student interpret
the feedback and, if it is logged, it could be used by the researcher to improve the
feedback (one could see when, and which, feedback messages need more interpretation).
The hints on the left side could be hyperlinks and be logged when they are used, so that
the researcher can trace how often they are used. They could be presented as a glossary
of mathematical symbols and objects (especially if the tool is expected to be used for
more than this).
Recommendation for the study:
From a theoretical point of view
• Log files are usually used to evaluate or describe navigation through software.
It would be interesting to explain clearly how log files can also be used to
study a learning process (or the development of algebraic skills in this
particular case).
Comments about DME and log files
• Would it be interesting to have a window in which students could write down their
difficulties? It could be used by the teacher or by the researcher (if it is logged).
• From my point of view, it would be interesting for the student to be able to “click”
on the feedback to get a hint about the cases in which this feedback is given. If those
“clicks” are recorded in the log files, the researcher would be able to know which
feedback messages are not understood and in what cases.
3.2 Data treatment: data mining
Data mining is used to discover new knowledge from databases. According to Liu and
Ruiz (2008), data mining focuses particularly on revealing hidden patterns in large data
sets. Mining the data usually consists of classifying, clustering, characterizing
and finding patterns. Data mining has two functions: one is to find
regularities among data records and the other is to identify relations among variables in
the data that will predict future values of the variables (Liu and Ruiz, 2008). Data mining
can use both “bottom-up” and “top-down” approaches to identify patterns.
Data mining is characterized by:
• Large quantities of data derived from automatic digital recording;
• An association with quantitative methods, although it differs from standard
statistical approaches.
There are two types of data mining. The first is used when there is a variable to explain
(for example the success of a task) and the researcher wants to link this variable with
other variables (for example, the occurrence of a particular feedback usually leads to
success). This is a supervised method (a decision tree, for example). In contrast, if there
is no variable to explain, one may use an unsupervised method of classification
(clustering).
The variables used in data mining can be qualitative (provided they take a finite
number of values) or quantitative.
Recommendation for the study:
3.2.1 Decision tree
Authors: Quinlan (ID3 in 1979 and C4.5 in 1993), Kass (CHAID in 1980).
“It is possible to generate knowledge in the form of decision trees that is capable of
solving difficult problems of practical significance” (Quinlan, 1986, p. 83).
• The decision tree is used when the variable to predict is known;
• It is used to classify, so the data must be recast as a classification problem;
• It consists of finding a good partitioning of the data;
• The goal is to form groups of “people” that are homogeneous with respect to the
variable to predict (or explain);
• These trees are constructed beginning with the root of the tree (the whole sample,
described by the variable to predict) and proceeding down to its leaves;
• It can be used with categorical or discrete variables (continuous variables
must be transformed into discrete ones).
Advantages: very intuitive analysis, very visual, can be done with SPSS (I think).
Disadvantages: the variable to predict must be known; instability, which means that the
choices made at the top of the tree have an effect on the choices made at the bottom of
the tree (Rakotomalala, 2005).
How to construct a decision tree: key questions (Rakotomalala, 2005)
• How to choose a good partitioning?
o To evaluate how well a variable produces a good segmentation, you can use a
chi-squared test or Tschuprow’s t, which you calculate for every variable that
could be used in the tree; you then choose the variable with the highest value.
Of course, this causes instability: the choices made at the beginning of the
tree will have an effect on the choices that have to be made at the bottom of
the tree.
• How to make the “cuts” when you have a continuous numerical variable?
o Usually, the cut is placed halfway between two distinct consecutive values. If
you have three distinct values, then you have two “mid-points”. To choose the
one to use, you calculate a chi-squared test (or Tschuprow’s t) for each
mid-point.
• How to define the size of the tree?
o You can fix a stopping criterion (pre-pruning). You accept or refuse the
segmentation depending on whether the chi-squared (or Tschuprow’s t) is above
or below a threshold. The threshold is formalized with a statistical
hypothesis test.
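The first two questions above can be sketched in a few lines of Python. The report does not spell out the formulas, so this assumes the usual textbook definitions: the chi-squared statistic of a contingency table and Tschuprow's T = sqrt(chi2 / (n * sqrt((r-1)(c-1)))). The function names and the small data set (number of steps and task outcome) are made up for illustration.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-squared statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def tschuprow_t(table):
    """Tschuprow's T; assumes at least two rows and two columns."""
    n = sum(sum(row) for row in table)
    r, c = len(table), len(table[0])
    return sqrt(chi_square(table) / (n * sqrt((r - 1) * (c - 1))))


def candidate_cuts(values):
    """Mid-points between consecutive distinct values of a continuous variable."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def best_cut(values, labels):
    """Return the (cut, T) pair with the highest Tschuprow's T."""
    classes = sorted(set(labels))
    best = None
    for cut in candidate_cuts(values):
        # Row 0 counts the cases below the cut, row 1 the cases above it.
        table = [[0] * len(classes) for _ in range(2)]
        for value, label in zip(values, labels):
            table[0 if value < cut else 1][classes.index(label)] += 1
        score = tschuprow_t(table)
        if best is None or score > best[1]:
            best = (cut, score)
    return best


# Made-up example: number of steps per attempt and the outcome of the task.
steps = [3, 4, 5, 8, 9, 10]
outcomes = ["goed", "goed", "goed", "fout", "fout", "fout"]
cut, score = best_cut(steps, outcomes)
print(cut, score)
```

A pre-pruning rule in the sense of the third question would simply refuse the split when the statistic falls below a chosen threshold.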
Example of use
The variable to predict: the achievement of a task (goed, half or fout). The tree can be
built for one task (all students) or for one student (all tasks).
The variables to be linked: number of steps (has to be cut), crises (yes or no, if it is
done for a student)…
Example of a decision tree

Training data (Refund and Marital Status are categorical, Taxable Income is continuous,
Cheat is the class):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: decision tree (splitting attributes: Refund, MarSt, TaxInc)

Refund = Yes: NO
Refund = No:
    MarSt = Married: NO
    MarSt = Single, Divorced:
        TaxInc < 80K: NO
        TaxInc > 80K: YES

Another example of a decision tree (same training data)

MarSt = Married: NO
MarSt = Single, Divorced:
    Refund = Yes: NO
    Refund = No:
        TaxInc < 80K: NO
        TaxInc > 80K: YES

There could be more than one tree that fits the same data!
Source: http://zoo.cs.yale.edu/classes/cs445/slides/DM_DecisionTree-mod_jdu.ppt
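The two trees of the example can be written as plain functions and checked against the ten training records. This is a sketch for illustration only: the function names are invented, incomes are expressed in thousands, and the 60K income is assumed to belong to record 6, the only record without an income in the extracted table.

```python
# (tid, refund, marital_status, taxable_income_in_thousands, cheat)
records = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),  # income assumed from the stray "60K"
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def tree_refund_first(refund, marital, income):
    """First tree: Refund, then MarSt, then TaxInc."""
    if refund == "Yes":
        return "No"
    if marital == "Married":
        return "No"
    # The slide only gives "< 80K" and "> 80K"; exactly 80K does not occur.
    return "Yes" if income > 80 else "No"


def tree_marital_first(refund, marital, income):
    """Second tree: MarSt, then Refund, then TaxInc."""
    if marital == "Married":
        return "No"
    if refund == "Yes":
        return "No"
    return "Yes" if income > 80 else "No"


for tid, refund, marital, income, cheat in records:
    first = tree_refund_first(refund, marital, income)
    second = tree_marital_first(refund, marital, income)
    print(tid, cheat, first, second)
```

Both functions reproduce the Cheat column for all ten records, which is the point of the remark that more than one tree can fit the same data.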
3.2.2 Hierarchical clustering
Authors: Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). "14.3.12
Hierarchical clustering". The Elements of Statistical Learning (2nd ed.). New
York: Springer. pp. 520–528.
Hierarchical clustering does not require the number of clusters to be specified in
advance. “Instead, they require the user to specify a measure of dissimilarity between
(disjoint) groups of observations, based on the pairwise dissimilarities among the
observations in the two groups. As the name suggests, they produce hierarchical
representations in which the clusters at each level of the hierarchy are created by
merging clusters at the next lower level” (Hastie, Tibshirani and Friedman, 2009, p. 520).
At the lowest level: all individuals are in different classes
At the highest level: all individuals are in the same class
• It is an automatic classification;
• It aims to group individuals into (homogeneous) classes;
• Distance is used as a measure of dissimilarity.
How does one define similarity between clusters (Moore)?
• Finding the minimum distance between points (could be students, could be tasks,
could be feedbacks ???) in clusters (in which case we’re simply doing Euclidean
Minimum Spanning Trees);
• Finding the maximum distance between points in clusters;
• Finding the average distance between points in clusters.
Advantages: no need to know what you are looking for, very visual, can be done
with SPSS (I think).
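The three ways of measuring the distance between two clusters listed above can be sketched as follows. This is an illustration only: the points are made up, the function names are invented, and Euclidean distance is assumed as the dissimilarity between individuals.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_linkage(cluster_a, cluster_b):
    """Minimum distance between a point of one cluster and a point of the other."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)


def complete_linkage(cluster_a, cluster_b):
    """Maximum distance between a point of one cluster and a point of the other."""
    return max(dist(p, q) for p in cluster_a for q in cluster_b)


def average_linkage(cluster_a, cluster_b):
    """Average of all pairwise distances between the two clusters."""
    pairs = [dist(p, q) for p in cluster_a for q in cluster_b]
    return sum(pairs) / len(pairs)


# Made-up clusters of two points each.
a = [(0, 0), (0, 1)]
b = [(3, 0), (4, 0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))
```

The choice matters: the minimum tends to chain clusters together, while the maximum favours compact groups.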
Example of hierarchical clustering
(Taken in full from a tutorial on clustering algorithms, source:
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html)
Let’s now see a simple example: a hierarchical clustering of distances in kilometers
between some Italian cities. The method used is single-linkage.
Input distance matrix (L = 0 for all the clusters):
BA FI MI NA RM TO
BA 0 662 877 255 412 996
FI 662 0 295 468 268 400
MI 877 295 0 754 564 138
NA 255 468 754 0 219 869
RM 412 268 564 219 0 669
TO 996 400 138 869 669 0
The nearest pair of cities is MI and TO, at distance 138. These are merged into a single
cluster called "MI/TO". The level of the new cluster is L(MI/TO) = 138 and the new
sequence number is m = 1.
Then we compute the distance from this new compound object to all other objects. In
single link clustering the rule is that the distance from the compound object to another
object is equal to the shortest distance from any member of the cluster to the outside
object. So the distance from "MI/TO" to RM is chosen to be 564, which is the distance
from MI to RM, and so on.
After merging MI with TO we obtain the following matrix:
       BA   FI   MI/TO  NA   RM
BA     0    662  877    255  412
FI     662  0    295    468  268
MI/TO  877  295  0      754  564
NA     255  468  754    0    219
RM     412  268  564    219  0
min d(i,j) = d(NA,RM) = 219 => merge NA and RM into a new cluster called NA/RM
L(NA/RM) = 219
m = 2
       BA   FI   MI/TO  NA/RM
BA     0    662  877    255
FI     662  0    295    268
MI/TO  877  295  0      564
NA/RM  255  268  564    0
min d(i,j) = d(BA,NA/RM) = 255 => merge BA and NA/RM into a new cluster called
BA/NA/RM
L(BA/NA/RM) = 255
m = 3
          BA/NA/RM  FI   MI/TO
BA/NA/RM  0         268  564
FI        268       0    295
MI/TO     564       295  0
min d(i,j) = d(BA/NA/RM,FI) = 268 => merge BA/NA/RM and FI into a new cluster
called BA/FI/NA/RM
L(BA/FI/NA/RM) = 268
m = 4
             BA/FI/NA/RM  MI/TO
BA/FI/NA/RM  0            295
MI/TO        295          0
Finally, we merge the last two clusters at level 295.
The process is summarized by the following hierarchical tree (merge levels in
parentheses): MI/TO (138), NA/RM (219), BA/NA/RM (255), BA/FI/NA/RM (268), all
cities (295).
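The merge sequence of the worked example can be reproduced with a short single-linkage routine. This is a minimal sketch written for this example, not the tutorial's own code: the function name is invented and only the standard library is used.

```python
cities = ["BA", "FI", "MI", "NA", "RM", "TO"]
rows = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]
# Distance lookup keyed by a pair of city codes.
distance = {
    (a, b): rows[i][j] for i, a in enumerate(cities) for j, b in enumerate(cities)
}


def single_linkage_merges(names, d):
    """Repeatedly merge the two closest clusters; return (cluster name, level) pairs."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: shortest distance between any two members.
                level = min(d[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or level < best[0]:
                    best = (level, i, j)
        level, i, j = best
        merged = tuple(sorted(clusters[i] + clusters[j]))
        merges.append(("/".join(merged), level))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = single_linkage_merges(cities, distance)
for name, level in merges:
    print(name, level)
```

Replacing `min` by `max` in the inner expression would give complete linkage instead, with a different tree.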
3.2.3 Correspondence analysis
Authors: Benzécri (1970 and later)
Correspondence analysis is used with complex tables of numbers. A complex table of
numbers can be replaced by simpler tables that are a good approximation of the complex
one.
How to do a correspondence analysis:
You have a matrix

Program  University  Pre-university  Others  Total
A        13          2               5       20
B        20          2               8       30
C        10          5               5       20
D        7           1               22      30
Total    50          10              40      100

1) You must evaluate what the matrix would be if the data were proportionally
distributed

Program  University   Pre-university  Others  Total
A        20%*50 = 10  …                       20
B        …            …                       30
C                                             20
D                                             30
Total    50           10              40      100
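2) You calculate the difference between the real matrix M and the proportional
matrix M0: M – M0 = D

$$
\begin{pmatrix} 13 & 2 & 5 \\ 20 & 2 & 8 \\ 10 & 5 & 5 \\ 7 & 1 & 22 \end{pmatrix}
-
\begin{pmatrix} 10 & 2 & 8 \\ 15 & 3 & 12 \\ 10 & 2 & 8 \\ 15 & 3 & 12 \end{pmatrix}
=
\begin{pmatrix} 3 & 0 & -3 \\ 5 & -1 & -4 \\ 0 & 3 & -3 \\ -8 & -2 & 10 \end{pmatrix}
$$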
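Steps 1 and 2 can be checked numerically. The sketch below assumes that "proportionally distributed" means that each cell of M0 equals (row total × column total) / grand total, which agrees with the "20%*50 = 10" cell given in the table; the variable names are chosen for illustration.

```python
# Observed table M: rows are programs A to D,
# columns are University, Pre-university, Others.
M = [
    [13, 2, 5],
    [20, 2, 8],
    [10, 5, 5],
    [7, 1, 22],
]

row_totals = [sum(row) for row in M]        # 20, 30, 20, 30
col_totals = [sum(col) for col in zip(*M)]  # 50, 10, 40
n = sum(row_totals)                         # 100

# Step 1: matrix expected if the data were proportionally distributed.
M0 = [[r * c / n for c in col_totals] for r in row_totals]

# Step 2: difference between the real matrix and the proportional one.
D = [[M[i][j] - M0[i][j] for j in range(3)] for i in range(4)]

for row in D:
    print(row)
```

Every row and every column of D sums to zero, since M and M0 share the same totals.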
3) Then you express this difference graphically to be able to analyze it
You want to be able to represent this matrix in a 2D form: you express the matrix
as D = M1 + M2

$$
\begin{pmatrix} 3 & 0 & -3 \\ 5 & -1 & -4 \\ 0 & 3 & -3 \\ -8 & -2 & 10 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & -2 \\ 1 & 1 & -2 \\ 2 & 2 & -4 \\ -4 & -4 & 8 \end{pmatrix}
+
\begin{pmatrix} 2 & -1 & -1 \\ 4 & -2 & -2 \\ -2 & 1 & 1 \\ -4 & 2 & 2 \end{pmatrix}
$$

Then M1 and M2 each have to be expressed as the product of a column vector and a
row vector, so that D = C1 L1 + C2 L2

$$
\begin{pmatrix} 3 & 0 & -3 \\ 5 & -1 & -4 \\ 0 & 3 & -3 \\ -8 & -2 & 10 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 1 \\ 2 \\ -4 \end{pmatrix}
\begin{pmatrix} 1 & 1 & -2 \end{pmatrix}
+
\begin{pmatrix} 1 \\ 2 \\ -1 \\ -2 \end{pmatrix}
\begin{pmatrix} 2 & -1 & -1 \end{pmatrix}
$$

Coordinates used for the map (horizontal axis from C2 and L2, vertical axis from
C1 and L1):

            Horizontal  Vertical
A           1           1
B           2           1
C           -1          2
D           -2          -4
University  2           1
Pre-univ.   -1          1
Others      -1          -2

You analyze the mapping obtained
[Figure: map of the programs A, B, C, D and of University, Pre-univ. and Others
plotted with these coordinates; B and University fall on the same point.]
4) You try to optimize the readability
Scalar product of vectors:
• positive means that they have an affinity
• negative means that they have no affinity
• zero means that there is no particular relation (no more than with the other
variables)
Example: Program C and Pre-university have an affinity;
Program A does not lead much to areas other than university or pre-university;
Program A does not lead more than other programs to pre-university
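The reading rule can be checked on the coordinates of the example. In this sketch the dictionary names and the helper `dot` are invented; the coordinates are those of the map above (horizontal, vertical).

```python
# Coordinates (horizontal, vertical) of the programs and of the destinations.
programs = {"A": (1, 1), "B": (2, 1), "C": (-1, 2), "D": (-2, -4)}
destinations = {
    "University": (2, 1),
    "Pre-university": (-1, 1),
    "Others": (-1, -2),
}


def dot(u, v):
    """Scalar product of two 2D vectors."""
    return u[0] * v[0] + u[1] * v[1]


affinity = {
    (program, destination): dot(p, d)
    for program, p in programs.items()
    for destination, d in destinations.items()
}

print(affinity[("C", "Pre-university")])  # positive: affinity
print(affinity[("A", "Others")])          # negative: no affinity
print(affinity[("A", "Pre-university")])  # zero: no particular relation
```

In this example the scalar products reproduce the entries of the difference matrix D exactly, because D was written as a sum of two column-by-row products.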
Advantages: very visual, can be done with SPSS (I think).
Disadvantages: in your case, a way will have to be found to “pre-process” the data
easily; most of the literature is in French.
5. Analysis
I am now going to present the analysis of the students’ work, made in order to
understand the possible effect of the feedback on their actions and to interpret their
difficulties (linked or not to the use of the tool, linked to the type of task, whether it
is a crisis or not). For the analysis, I examined every entry line of ten students from
task 1.2 to task 2.6. I located the “fout” entries and the places where the steps were
stable or dropping in order to understand the difficulties encountered by the students. I
then marked the difficulties and classified them into two categories: mathematics and use
of the tool. Here are the main difficulties found.
5.1 Activity 1
Mathematical difficulties and errors
• Rewriting generates a lot of difficulties:
o With signs
o With division (especially if the x is on the right side: 2 = 5x, x = 5/2).
• When the common factor is not written in the same order, it seems more difficult
to recognize it and the student wants to repeat what has been done before so, for
example, a student may write that both sides are equal to 0:
o (4*x+2)*sqrt(2+4*x) = 0 of sqrt(4*x+2)*(6*x-2) = 0
• Some calculation mistakes when the student expands
• When students simplify, they sometimes forget a solution (common factor = 0).
Sometimes it can be interpreted as a mathematical difficulty, but it could also be
because of the tool, which needs the two solutions on the same line.
Difficulties with the tool
• Working on one solution at a time generates feedback that can be interpreted as
if the answer was wrong.
• Rewriting on the same line:
o x-5 = 4 of x = 9 of 2*x-7 = 0 of 2*x = 7 of x = 3+((1)/(2))
Here the student is having trouble adapting his way of working to what the tool
requires: see Dao
• Writing a fractional number with the computer: 3 + ½ can become 31/2
• When a quadratic equation has no solution, the feedback is given before the
student has tried to find factors. This leads to blockages (see especially task 1.6)
• BUGS: see Broersen 1.11, Dao 1.8, Diemen 1.6
5.2 Activity 2
Mathematical difficulties and errors
• Rewriting mistake: dividing by something that implies an addition
• Rewriting mistakes: a multiplication is treated as an addition or an addition is
treated as a multiplication.
• Factoring mistake: factoring something that is not a common factor
• Square root mistake: the student tries to change the square root to have an
exponent but uses the exponent -½
• Square root mistake: manipulation of a division of square roots (see Boon)
• Brackets mistake: missing brackets
Difficulties with the tool
• BUG: task 2.6 (feedback is always “goed” even when there is a mistake)
5.3 Comments and questions
• There are a lot of difficulties involving the rewriting of simple equations: are the
manipulations harder on the computer than on paper? One would have to look at
the pre- and post-tests. I also wonder whether the rewriting mistakes could be due to
working on all the solutions at the same time.
• Does this way of working on the solutions (all at the same time) have an effect on
the post-test? Do the students try to imitate what has been done with the tool? In
other words, how do the students solve the equation in the pre-test? How do they do it
with the computer? How do they do it in the post-test? Has the tool helped? In what
way? Has the tool harmed? In what way?
• Could the student work on one solution at a time? Instead of saying that there
are solutions missing, could the feedback say something more positive, like: that is
one of the solutions.
• When a quadratic equation has no solution, the feedback “Deze stap bevat correcte
en niet correcte onderdelen. Verwijder of vervang de delen die niet correct zijn”
[This step contains correct and incorrect parts. Remove or replace the parts that are
not correct] does not help because it is not obvious to the student that there are no
solutions. (Task 1.6)
• What should we think about subtraction mistakes made when rewriting the expression
(and “sending parts of the equation to the other side of the equality”)? Is the
feedback helping or not? It is hard to know. I also wonder whether the feedback makes
the student more vigilant about those actions. Would it be possible to trace this kind
of error?
• When the variable (which is usually x) is written in a different, more complex
form, such as log x, the student is unable to recognize the pattern needed to solve
the equation. (Task 1.11)
• Manipulating the square root while thinking about both the positive and the negative
solution seems to be difficult. Does it have an effect on the strategy used to solve
(expanding, for example)? To know that, we would have to see whether attempts were
made to complete the task without expanding and whether the student then changed
his/her strategy to expanding. It would also be possible to see the evolution (from
task to task) of the strategy used. It would also be interesting to find out whether
there is a relation between the task and the strategy used (for any kind of task:
category 1, 2, 3, 4, etc.). To do that, we would have to code, for every student, the
strategy used for every task. We would also have to find a way to verify the
hypothesis.
• We could also try to find which tasks were difficult and why. For example, task
1.6 wasn’t solved by any student and I think it’s because of the feedback. How can
this be done in a quantitative way?
• Activity 2: Amin tries to find a working strategy: he starts by putting everything
on one side, which then equals zero.
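One possible starting point for the quantitative question about difficult tasks is the share of students whose last logged entry for a task is “goed”. The sketch below is only a suggestion: the tuple layout, the function name and the sample entries are hypothetical and do not come from the DME data.

```python
from collections import defaultdict


def success_rate_per_task(entries):
    """entries: (student, task_id, step, feedback) tuples in chronological order."""
    last_feedback = {}
    for student, task_id, _step, feedback in entries:
        # Later entries overwrite earlier ones, so only the final state is kept.
        last_feedback[(student, task_id)] = feedback

    attempted = defaultdict(int)
    solved = defaultdict(int)
    for (_student, task_id), feedback in last_feedback.items():
        attempted[task_id] += 1
        if feedback == "goed":
            solved[task_id] += 1
    return {task: solved[task] / attempted[task] for task in attempted}


# Made-up entries, for illustration only.
entries = [
    ("s1", "1.2", 1, "half"),
    ("s1", "1.2", 2, "goed"),
    ("s2", "1.2", 1, "goed"),
    ("s1", "1.6", 1, "fout"),
    ("s2", "1.6", 1, "half"),
    ("s2", "1.6", 2, "fout"),
]
rates = success_rate_per_task(entries)
print(rates)
```

A task with a rate close to zero, as task 1.6 appears to be in the qualitative analysis, would then be a candidate for a closer look at its feedback.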
References
Cocea, M. & Weibelzahl, S. (2006). Can Log Files Analysis Estimate Learners’ Level of
Motivation? In M. Schaaf and K. Althoff (Eds.), ABIS 2006: 14th Workshop on
Adaptivity and User Modeling in Interactive Systems, University of Hildesheim,
Germany.
Guzdial, M. (1993). Deriving Software Usage Patterns from Log Files. GVU Technical
Report GIT-GVU-93-41, Georgia Institute of Technology.
Hastie, T., Tibshirani, R. & Friedman, J. (2009). The Elements of Statistical
Learning (2nd ed.). New York: Springer. pp. 520–528.
Liu, X. & Ruiz, M.-E. (2008). Using Data Mining to Predict K–12 Students’ Performance
on Large-Scale Assessment Items Related to Energy. Journal of Research in
Science Teaching, 45(5), 554–573.
Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
Rakotomalala, R. (2005). Arbres de décision. Revue MODULAD, 33, 163-187.
http://www.autonlab.org/tutorials/kmeans11.pdf
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html
http://zoo.cs.yale.edu/classes/cs445/slides/DM_DecisionTree-mod_jdu.ppt
ANNEXES
Horaire
24 mai 2010 (19h à 22h)
 Lecture d’articles préparatoires (à propos du projet de recherche) :
o Bokhove, C., & Drijvers, P. (2010). Digital tools for algebra education:
criteria and evaluation. International journal of computers for
mathematical learning, 15(1), 45-62.
o Bokhove, C., & Drijvers, P. (2010). Digital activities that invite symbol
sense behavior.
ANNEXE A – 1 et 2
25 mai 2010 (9h à 17h10)
 Rencontre avec M. Drijvers à 9h : visite de l’Institut Freudenthal, présentation de
quelques chercheurs;
 Rencontre avec M. Bokhove à 11h : discussion autour de son projet de recherche;
précision des tâches à faire pendant le stage; planification d’une rencontre
(vendredi le 28 mai à 11h30);
 Lecture du plan de recherche ANNEXE B;
 Familiarisation avec le logiciel de « résolution d’équation »;
 Premier contact avec les données : élagage des données.
26 mai 2010 (9h à 16h)
 Lecture d’articles à propos de l’analyse de fichiers log :
o Castillo, V. R. C., Villaflor, K. B. V., Rodriguez, R. L., & Rodrigo, M. M.
T. (2010, March).Modeling student affect and behavior using biometric
readings, log files and low fidelity playbacks. Paper presented at the 10th
Philippine Computing Society Congress, Davao City, Philippines.
o Muehlenbrock, M. (2005, July). Automatic action analysis in an
interactive learning environment Paper presented at the Workshop on
Usage Analysis in Learning Systems at the 12th International Conference
on Artificial Intelligence in Education AIED-2005, Amsterdam, the
Netherlands.
ANNEXE A – 3 et 4

Lecture d’un article à propos du projet de recherche :
o Bokhove, C. (2008). Use of ICT in formative scenarios for algebraic
skills. Paper presented at the International Society for Design and
Development in Education.
ANNEXE A – 5

Liste de toutes les données recueillies dans les fichiers log et les données qui
peuvent en être déduite et leur intérêt pour la recherche, en lien avec les objectifs
(ANNEXE C – Data.doc)
23

Élaboration du document questions et commentaires comme base de discussion à
la rencontre du 28 mai avec M. Bokhove (ANNEXE C – Meeting_28-05-10.doc)
27 mai 2010 (10h-21h)
 Participation à la journée annuelle de l’institution Freudenthal
o Croisière sur le lac…
o Dîner…
28 mai 2010 (9h-15h)
 Manipulation du logiciel
 Écriture d’une réaction au projet (ANNEXE C – Reaction.doc)
 Recherche d’outil d’analyse selon les données : arbre de décision (algorithme
C4.5 ou C5.0 de Quinlan, 1993).
 Rencontre avec Christian Bokhove à 11h30.
May 31, 2010 (8:00-16:00)
• Literature search on log file analysis, data mining,
decision trees… (see the folder Data mining articles).
• Reading of an article on decision trees:
o Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1,
81-106.
• Reading of an article on log files:
o Guzdial, M. (1993). Deriving Software Usage Patterns from Log Files.
• Reading of a tutorial on quantitative data analysis:
o R: A self-learn tutorial
• List of ideas for processing the data (APPENDIX C – possible_classifications.doc)
APPENDIX A – 6, 7 and 8
June 1, 2010 (8:00-16:00)
• Reading of a tutorial on decision trees:
o Rakotomalala, R. (2005). Arbres de décision. Revue MODULAD, 33, 163-187.
• Search for ways to process the data:
o Decision tree
o Clustering (hierarchical)
o Correspondence factor analysis
• Search for documentation on these classification tools
• Familiarization with these classification tools
• Writing of the beginning of the report
• Reading of the brochure that traces the history of the Institute and explains its
philosophy and the direction taken for its research.
June 2, 2010 (8:30-15:00)
• Continued writing of the report
• Familiarization with the SPSS software (SPSS tutorial)
• Attempt to find a (quantitative) way of analyzing the data
• Start of a qualitative analysis of the data
June 3, 2010 (9:00-16:30)
• Preparation for the meeting with Paul Drijvers
• Qualitative analysis of the data
• Meeting with Paul Drijvers
• Writing of a brief summary of the meeting
• Writing of a summary of the analyses (as of June 3)
June 4-5-6, 2010 (9:00-13:00 + 8 h)
• Continued analyses
• Writing of a summary of the analyses (as of June 4-5-6)
June 7, 2010 (9:00-16:30 + 20:00-22:30)
• Continued analyses
• Statistical trials (with the SPSS software)
• Meeting with Christian Bokhove
• Scheduling of a meeting with Ms. Heleen Verhage
June 8, 2010 (8:00-16:00 + 20:00-22:30)
• Continued analyses
• Writing of a summary of the analyses
• Writing of the report
• Planning, with a doctoral student (Aldine), of a round table on Realistic
Mathematics Education
o Drafting of framing questions
o Invitations
June 14, 2010 (9:00-17:00)
• Preparation for the meeting with Heleen Verhage
• 13:00 – Meeting with Heleen Verhage (director of the Freudenthal Institute)
• 15:00 – Meeting with Paul Drijvers
June 15, 2010 (8:30-16:30)
• Preparation for the round table: readings
o van den Heuvel-Panhuizen, M.
• 13:00 – Round table on the theme of Realistic Mathematics Education with
Jaap den Hertog and Aad Goddijn
June 16, 2010 (15:00-18:00)
• Continued analyses
June 17, 2010 (8:00-17:00)
• Continued analyses
• Completion of the internship report
• Preparation of a file for Christian Bokhove and Paul Drijvers
• 15:00 – Lecture by Mr. Paul Drijvers at the Department of Mathematics of
Utrecht University
June 18, 2010 (9:00-16:00)
• Meeting with Christian Bokhove and Paul Drijvers
• End of the internship
Data fields (from the log files or not)
Table 1 presents the data available for the analysis, a description of each variable, the
type of variable and some hints on how they could be used in the analysis.
Table 1: Data fields

| Data fields | Description | Type of variables | Use |
|---|---|---|---|
| Type of task | The four categories: equations with common factor, covering up subexpressions, resisting visual salience in powers of subexpressions, hidden factors | Categorical | |
| Crisis | Whether the task can (not a crisis) or cannot (a crisis) be solved using available knowledge | Categorical: yes or no | |
| Duration 1² | Duration in solving a task | Numerical (continuous) | Linked to the type of task; linked to the crisis |
| Duration 2 | Duration between two actions that occurred | Numerical (continuous) | Linked to the feedback; linked to whether it's a "half" or a "fout" |
| Number of steps | How many steps (including backwards) to solve the task | Numerical (discrete) | Linked to the type of task; linked to the crisis |
| Feedback | | Categorical | |
| Expression | Current state of the equation | | |
| Resolved | Classifies if the equation is solved (goed), quasi-solved (half) or not solved because of a mistake (fout) | Categorical | |

² This duration is an approximation.
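
The duration and step-count fields of Table 1 could be derived from timestamped log entries roughly as follows. This is only a minimal sketch in Python: the record layout (`LogEntry` and its field names) and the function `summarize_task` are hypothetical, since the actual format of the log files is not described here. It also assumes that every logged entry counts as one step and that Duration 1 can be approximated by the time between the first and the last logged action of a task.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    # Hypothetical record layout; the real log files may differ.
    task_id: str
    timestamp: datetime
    expression: str  # current state of the equation
    resolved: str    # "goed", "half" or "fout"


def summarize_task(entries):
    """Derive Duration 1, Duration 2 and Number of steps for one task."""
    ordered = sorted(entries, key=lambda e: e.timestamp)
    # Duration 2: time in seconds between each pair of successive actions.
    gaps = [
        (later.timestamp - earlier.timestamp).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return {
        # Duration 1 (assumed): time from the first to the last logged action.
        "duration_1": sum(gaps),
        "duration_2": gaps,
        # Assumed: one logged entry per step, backward steps included.
        "number_of_steps": len(ordered),
    }


if __name__ == "__main__":
    example = [
        LogEntry("task-1", datetime(2010, 5, 26, 9, 0, 0), "x*(x - 1) = 0", "half"),
        LogEntry("task-1", datetime(2010, 5, 26, 9, 0, 20), "x = 0", "half"),
        LogEntry("task-1", datetime(2010, 5, 26, 9, 0, 50), "x = 0 or x = 1", "goed"),
    ]
    print(summarize_task(example))
```

Keeping the individual gaps in a list, rather than only their sum, leaves room to relate each Duration 2 value to the feedback or to the "half"/"fout" status of the step it follows.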