UNIVERSITÉ DU QUÉBEC À MONTRÉAL
in collaboration with
FREUDENTHAL INSTITUTE OF SCIENCE AND MATHEMATICS EDUCATION
UNIVERSITY OF UTRECHT

DME9200 STAGE DE RECHERCHE II

REPORT PRESENTED TO PAUL DRIJVERS AND JEAN BÉLANGER

CLAUDIA CORRIVEAU

JUNE 17TH 2010

Table of contents

Introduction
1. Objectives of the internship
   1.1 Internship purpose and modalities
   1.2 Agreement on the work to do
   1.3 The objectives of the internship report
2. Presentation of the Freudenthal Institute
3. Presentation of the research
4. Collecting and analyzing data
   4.1 Log files
   3.2 Data treatment: data mining
   3.2.1 Decision tree
   3.2.2 Hierarchical clustering
   3.2.3 Correspondence analysis
5. Analysis
   5.1 Activity 1
   5.2 Activity 2
   5.3 Comments and questions
References
ANNEXES

Introduction

As part of the doctoral program [1], students must complete two research internships. As explained in the course descriptor, the research internships are intended to provide an opportunity to acquire a wider vision of what it means to do research; they are also an occasion to interact with researchers and to learn from those interactions. I had the great chance to complete one of those two internships at the Freudenthal Institute for Science and Mathematics Education (Department of Mathematics), based in Utrecht, Netherlands, from May 25th to June 18th.

1. Objectives of the internship

In this part of the report, I present the purposes of the internship from the university's point of view. I then present what was planned for me to do and, finally, the objectives of this report.

1.1 Internship purpose and modalities

The internship is seen as research training through active participation in different parts of a research project. The student must work 135 hours on a project that differs from his or her own research project. Before coming to Utrecht, I had to write an internship proposal describing what I would do during the internship.
I will briefly explain what I was asked to do.

1.2 Agreement on the work to do

I was asked to work on a project concerning the use of ICT tools at upper secondary level to learn algebraic skills. Four types of expectations were set: in terms of actions, production, contribution and attendance. In other words, I would have to analyze the data, show initiative in finding ways to analyze the data, and present my contribution in an exhaustive report.

[1] Doctoral program in education, Université du Québec à Montréal

1.3 The objectives of the internship report

This report has several purposes. The first is to make the work done available to the researchers involved in the research. It is also an opportunity to link the work done with both the specific objectives of the planned work and the larger objectives of the internship (from the university's point of view).

2. Presentation of the Freudenthal Institute

In the field of mathematics education, the Freudenthal Institute for Science and Mathematics Education is internationally recognized. The Institute was first set up in 1971 by Hans Freudenthal, a mathematician (professor and researcher) who was also interested in pedagogical issues. The Institute was then called (in English) Institute for Development of Mathematics Education. In honor of its founder, it became, through a merger, the Freudenthal Institute for Science and Mathematics Education.

A major instruction theory developed by Freudenthal, and still present in the philosophy of the Institute, is called Realistic Mathematics Education (RME). In this approach, the student must experience the mathematics in order to learn. In other words, “in a stimulating learning environment they [students] should have the opportunity to build up their own knowledge and understanding” (brochure about the Freudenthal Institute, p. 4).
“Realistic” must not be understood only as real-life problems but as rich, meaningful and challenging problems that can be solved in different ways.

The Freudenthal Institute is also well known for its research philosophy (and methodology): design research. This form of research is characterized by its movement from theory to practice and from practice to theory in order to serve both “worlds”: practice and research. It is also described as an iterative process that consists in designing, testing and improving a tool or any material for instruction.

There are many important research themes: development of mathematical understanding in childhood, use of technology, professional development, mathematics for students with special needs, etc. The research I worked on is about the use of ICT tools at upper secondary level to learn algebraic skills. I will now present this research more precisely.

3. Presentation of the research

The research focuses on the use of an ICT tool to develop algebraic skills. There are several phases to this research. The first one could be described as the exploration phase, in which criteria for a functional ICT tool to develop algebraic skills were found (and listed) through a Delphi method. This phase, which is already completed, served to choose, on the basis of the list of criteria, a tool for the rest of the research. The tool that performed best is DME, which is developed by the Institute. To evaluate the use of the tool, two expert reviews and four one-to-ones were conducted. This led to modifications of the prototype.

The second phase consists in trying out, and eventually refining, the prototype. This phase could be described as an induction process in which one wants to learn from the data collected. It has two goals: to collect and interpret data in order to understand the development of algebraic skills linked to the use of the tool, and to interpret the interactions between students and the computer.
During the internship, the research was in that particular phase.

Eventually, there will be a third phase in which researchers will confirm or reject the hypotheses made in the second phase. It could be described as the passage from the field to the laboratory. In other words, trials have been made, and now it is time to confirm what has been discovered. This leads to more conventional quantitative methods with a large number of pupils.

4. Collecting and analyzing data

In order to have a better understanding of the research, I had to become more familiar with the tool used to collect the data and with ways to analyze the data. In this section, I will briefly explain what log files are and what data mining is.

4.1 Log files

Log files are discrete recordings of user actions during the use of software (Guzdial, 1993). They make it possible to analyze human-computer interaction either in real time, so that the software can adjust to what a user has done or give more precise feedback (Cocea and Weibelzahl, 2006), or afterwards, for a post-hoc analysis. From a researcher's point of view, this method of collecting data is interesting because it does not need a special kind of setting (some ways of collecting data are artificial). Log files are usually used to characterize types of navigation on the web or in software. The question to be asked is probably how log files can be used to study a learning process when a digital tool is used. In a way, they offer the possibility to track every action of the students, so those actions (which can be mistakes, progress, etc.) can be analyzed. According to Cocea and Weibelzahl (2006), log files can record data for a large number of students and can capture multiple kinds of information. The main goal when choosing to use log files is to identify functional measures that can be manipulated by either an individual or a computer.
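As a minimal sketch of how such functional measures might be derived from logged actions, the following Python fragment groups entries by student and task and computes a number of steps, a duration, a count of erroneous steps and the last feedback received. The entries are invented, the key names (student, time, task, expression, feedback) are hypothetical, and the feedback labels goed, half and fout are borrowed from the analysis later in this report; the real log format may differ.

```python
from collections import defaultdict
from datetime import datetime

# Invented log entries for illustration only; the keys are hypothetical names
# for the kind of fields a log file may hold.
LOG = [
    {"student": "s01", "time": "2010-05-25 09:00:10", "task": "1.2",
     "expression": "2*x = 7", "feedback": "half"},
    {"student": "s01", "time": "2010-05-25 09:00:55", "task": "1.2",
     "expression": "x = 7/3", "feedback": "fout"},
    {"student": "s01", "time": "2010-05-25 09:01:40", "task": "1.2",
     "expression": "x = 7/2", "feedback": "goed"},
    {"student": "s02", "time": "2010-05-25 09:00:05", "task": "1.2",
     "expression": "x = 7/2", "feedback": "goed"},
]


def summarize(log):
    """Return one record of measures per (student, task) pair."""
    groups = defaultdict(list)
    for entry in log:
        groups[(entry["student"], entry["task"])].append(entry)

    summary = {}
    for key, entries in groups.items():
        # ISO-like timestamps sort correctly as plain strings.
        entries = sorted(entries, key=lambda e: e["time"])
        times = [datetime.strptime(e["time"], "%Y-%m-%d %H:%M:%S")
                 for e in entries]
        summary[key] = {
            "steps": len(entries),
            "duration_s": (times[-1] - times[0]).total_seconds(),
            "errors": sum(e["feedback"] == "fout" for e in entries),
            "final": entries[-1]["feedback"],
        }
    return summary


measures = summarize(LOG)
for key, record in sorted(measures.items()):
    print(key, record)
```

Measures of this kind (steps, durations, error counts) are the sort of variables that the data mining methods discussed in this report take as input.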
In the DME platform, the log files record all text entries made by a student, together with the name of the student, the date, the time, the task id, the step, the expression entered and the feedback given. As mentioned above, log files are usually used to trace people's navigation, so every mouse click is often recorded and the navigation can be retraced. As discussed with Christian Bokhove, it would be interesting to log student navigation (using video, for example). It would also be interesting to have a window in which the student could write remarks about mathematical or other difficulties (such as the use of the tool). I don't know whether a “help” function exists, but it would be interesting to have one about the use of the digital tool and to record its use in the log files. It would also be helpful, from my perspective, to be able to click on the feedback and obtain information on when this feedback is usually given. This could help the student interpret the feedback and, if the clicks are logged, the researcher could use them to improve the feedback (one could see when, and which, feedback messages need more interpretation). The hints on the left side could be hyperlinks whose use is logged, so that the researcher can retrace how often they are used. They could be presented as a glossary of mathematical symbols and objects (especially if the tool is expected to be used beyond this study).

Recommendation for the study:

From a theoretical point of view
Log files are usually used to evaluate or describe navigation through software. It would be interesting to explain clearly how log files can also be used to study a learning process (or, in this particular case, the development of algebraic skills).

Comments about DME and log files
- Would it be interesting to have a window in which students could write their difficulties? It could be used by the teacher or by the researcher (if it is logged).
- From my point of view, it would be interesting for the student to be able to click on the feedback to get a hint about the cases in which this feedback is given. If those clicks are recorded in the log files, the researcher would be able to know which feedback messages are not understood, and in what cases.

3.2 Data treatment: data mining

Data mining is used to discover new knowledge from databases. According to Liu and Ruiz (2008), data mining focuses particularly on revealing hidden patterns in large data sets. Mining the data usually consists in classifying, clustering, characterizing and finding patterns. Data mining serves two functions: one is to find regularities among data records, and the other is to identify relations among variables in the data that will predict future values of the variables (Liu and Ruiz, 2008). Data mining can identify patterns through both “bottom-up” and “top-down” approaches.

Data mining is characterized by:
- Large quantities of data collected automatically in digital form;
- Data mining is often associated with quantitative methods, but it differs from standard statistical approaches.

There are two types of data mining. The first is used when there is a variable to explain (for example, the success of a task) and the researcher wants to link this variable with other variables (for example, the occurrence of a particular feedback usually leads to a success). This is a supervised method (a decision tree, for example). Conversely, if there is no variable to explain, one may use an unsupervised method of classification (clustering). The variables used in data mining can be qualitative (provided, of course, that they take a finite number of values) or quantitative.

Recommendation for the study:

3.2.1 Decision tree

Authors: Quinlan (ID3 in 1979 and C4.5 in 1993), Kass (CHAID in 1980).
It is possible to generate knowledge in the form of decision trees that is capable of solving difficult problems of practical significance (Quinlan, 1986, p. 83).
- The decision tree is used when the variable to predict is known;
- It is used to classify, so the data must be recast as a classification problem;
- It consists in finding a good partitioning of the data;
- The goal is to form groups of “people” that are homogeneous with respect to the variable to predict (or explain);
- These trees are constructed beginning with the root of the tree (all cases, characterized by the variable to predict) and proceeding down to its leaves;
- It can be used with categorical or discrete variables (continuous variables must be transformed into discrete ones).

Advantages: very intuitive analysis, very visual, can be done with SPSS (I think).

Disadvantages: the variable to predict must be known; instability, which means that the choices made at the top of the tree affect the choices made at the bottom of the tree (Rakotomalala, 2005).

How to construct a decision tree: key questions (Rakotomalala, 2005)
- How to choose a good partitioning?
  o To evaluate how relevant a variable is for producing a good segmentation, you can use a chi-2 test or Tschuprow's t, calculated for every variable that could be used in the tree; you choose the variable with the highest value. Of course, this causes instability: the choices made at the beginning of the tree affect the choices that have to be made at the bottom of the tree.
- How to make the “cuts” when you have a continuous numerical variable?
  o Usually, the cut is halfway between two consecutive distinct values. If you have three distinct values, then you have two “mid-points”. To choose the one to use, you calculate a chi-2 test (or Tschuprow's t) for each mid-point.
- How to define the size of the tree?
  o You can fix a stopping criterion (pre-pruning). You accept or refuse the segmentation depending on whether the chi-2 (or Tschuprow's t) is over or under a threshold.
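The choice of a partitioning variable described above can be sketched in Python as follows. This is only an illustration, not the procedure of any particular software: it cross-tabulates each candidate variable against the variable to predict, computes the chi-2 statistic and Tschuprow's t, and keeps the variable with the highest t. The records are the Refund, Marital Status and Cheat columns of the ten-row training table that accompanies the decision tree example in this section; with so few cases the statistics are indicative only. The function names are hypothetical.

```python
from math import sqrt

# (Refund, Marital Status, Cheat) for the ten training records of the example.
# Taxable Income is left out: a continuous variable would first have to be cut.
RECORDS = [
    ("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
    ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
    ("No", "Single", "Yes"),
]
ATTRIBUTES = {"Refund": 0, "Marital Status": 1}
TARGET = 2  # index of the variable to predict (Cheat)


def contingency(records, attr_index, target_index):
    """Cross-tabulate one attribute against the variable to predict."""
    rows = sorted({r[attr_index] for r in records})
    cols = sorted({r[target_index] for r in records})
    table = [[0] * len(cols) for _ in rows]
    for r in records:
        table[rows.index(r[attr_index])][cols.index(r[target_index])] += 1
    return table


def chi_square(table):
    """Pearson chi-2 statistic of a contingency table with non-empty margins."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def tschuprow_t(table):
    """Tschuprow's t: chi-2 normalized by sample size and table dimensions."""
    n = sum(map(sum, table))
    r, c = len(table), len(table[0])
    return sqrt(chi_square(table) / (n * sqrt((r - 1) * (c - 1))))


scores = {name: tschuprow_t(contingency(RECORDS, idx, TARGET))
          for name, idx in ATTRIBUTES.items()}
best = max(scores, key=scores.get)
print(scores, "-> split on", best)
```

A pre-pruning rule would compare the best score (or the corresponding chi-2) with a threshold before accepting the split.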
The threshold is formalized with a statistical hypothesis test.

Example of use
- The variable to predict: the achievement of a task (goed, half or fout); this can be done for a task (all students) or for a student (all tasks).
- The variables to be linked: number of steps (has to be cut), crises (yes or no, if it is done for a student)…

Example of a decision tree

Training data (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class):

Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

Model: decision tree (splitting attributes: Refund, MarSt, TaxInc)

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

Another example of a decision tree (same training data)

MarSt?
  Married          -> NO
  Single, Divorced -> Refund?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!

Source: http://zoo.cs.yale.edu/classes/cs445/slides/DM_DecisionTree-mod_jdu.ppt

3.2.2 Hierarchical clustering

Authors: Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). "14.3.12 Hierarchical clustering". The Elements of Statistical Learning (2nd ed.). New York: Springer. pp. 520–528.

Hierarchical clustering methods do not require the number of clusters to be specified in advance. “Instead, they require the user to specify a measure of dissimilarity between (disjoint) groups of observations, based on the pairwise dissimilarities among the observations in the two groups.
As the name suggests, they produce hierarchical representations in which the clusters at each level of the hierarchy are created by merging clusters at the next lower level” (Hastie, Tibshirani and Friedman, 2009, p. 520).
- At the lowest level: all individuals are in different classes;
- At the highest level: all individuals are in the same class;
- It is an automatic classification;
- It aims to group individuals in (homogeneous) classes;
- The distance is used as a measure of dissimilarity.

How does one define similarity between clusters (Moore)?
- Finding the minimum distance between points in clusters (the points could be students, tasks or feedback?), in which case we are simply doing Euclidean Minimum Spanning Trees;
- Finding the maximum distance between points in clusters;
- Finding the average distance between points in clusters.

Advantages: you do not need to know what you are looking for, very visual, can be done with SPSS (I think).

Example of hierarchical clustering
(Taken in full from a tutorial on clustering algorithms, source: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html)

Let's now see a simple example: a hierarchical clustering of distances in kilometers between some Italian cities. The method used is single-linkage.

Input distance matrix (L = 0 for all the clusters):

       BA    FI    MI    NA    RM    TO
BA      0   662   877   255   412   996
FI    662     0   295   468   268   400
MI    877   295     0   754   564   138
NA    255   468   754     0   219   869
RM    412   268   564   219     0   669
TO    996   400   138   869   669     0

The nearest pair of cities is MI and TO, at distance 138. These are merged into a single cluster called "MI/TO". The level of the new cluster is L(MI/TO) = 138 and the new sequence number is m = 1. Then we compute the distance from this new compound object to all other objects. In single-link clustering the rule is that the distance from the compound object to another object is equal to the shortest distance from any member of the cluster to the outside object.
So the distance from "MI/TO" to RM is chosen to be 564, which is the distance from MI to RM, and so on.

After merging MI with TO we obtain the following matrix:

          BA    FI    MI/TO   NA    RM
BA         0   662    877    255   412
FI       662     0    295    468   268
MI/TO    877   295      0    754   564
NA       255   468    754      0   219
RM       412   268    564    219     0

min d(i,j) = d(NA,RM) = 219 => merge NA and RM into a new cluster called NA/RM
L(NA/RM) = 219
m = 2

          BA    FI    MI/TO   NA/RM
BA         0   662    877     255
FI       662     0    295     268
MI/TO    877   295      0     564
NA/RM    255   268    564       0

min d(i,j) = d(BA,NA/RM) = 255 => merge BA and NA/RM into a new cluster called BA/NA/RM
L(BA/NA/RM) = 255
m = 3

            BA/NA/RM   FI    MI/TO
BA/NA/RM        0     268    564
FI            268       0    295
MI/TO         564     295      0

min d(i,j) = d(BA/NA/RM,FI) = 268 => merge BA/NA/RM and FI into a new cluster called BA/FI/NA/RM
L(BA/FI/NA/RM) = 268
m = 4

               BA/FI/NA/RM   MI/TO
BA/FI/NA/RM         0         295
MI/TO             295           0

Finally, we merge the last two clusters at level 295.

The process is summarized by the following hierarchical tree:

[Figure: hierarchical tree of the merges]

3.2.3 Correspondence analysis

Authors: Benzécri (1970 and later)

Correspondence analysis is used with complex tables of numbers. A complex table of numbers can be replaced by simpler tables that are a good approximation of the complex one.
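The idea of replacing a table by simpler ones can be checked numerically. The sketch below, in plain Python, uses the figures of the worked example in this section (four programs A to D, three destinations): it computes the table expected under proportional distribution, the table of differences, and verifies that the scalar product of the two-dimensional coordinates read from the example gives back each difference. It is a simplified illustration under those assumptions; an actual correspondence analysis would obtain the axes from a decomposition of suitably weighted differences, which is not done here, and the function names are hypothetical.

```python
ROWS = ["A", "B", "C", "D"]
COLS = ["University", "Pre-university", "Others"]

# Observed table M of the worked example (rows: programs, columns: destinations).
M = [[13, 2, 5],
     [20, 2, 8],
     [10, 5, 5],
     [7, 1, 22]]


def expected(table):
    """Table M0 obtained if the data were proportionally distributed."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_tot] for r in row_tot]


def difference(table):
    """D = M - M0: what the observed table adds to proportionality."""
    m0 = expected(table)
    return [[obs - exp for obs, exp in zip(row, row0)]
            for row, row0 in zip(table, m0)]


# Two-dimensional coordinates as read from the worked example.
ROW_COORDS = {"A": (1, 1), "B": (2, 1), "C": (-1, 2), "D": (-2, -4)}
COL_COORDS = {"University": (2, 1), "Pre-university": (-1, 1), "Others": (-1, -2)}


def dot(u, v):
    """Scalar product of two coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))


D = difference(M)
rebuilt = [[dot(ROW_COORDS[r], COL_COORDS[c]) for c in COLS] for r in ROWS]

for name, row, approx in zip(ROWS, D, rebuilt):
    print(name, row, approx)
```

In this example the two axes reproduce the differences exactly, so a positive scalar product between a program and a destination means more students than proportionality would give, and a negative one means fewer.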
How to do a correspondence analysis:

You have a matrix M:

Program   University   Pre-university   Others   Total
A             13             2             5       20
B             20             2             8       30
C             10             5             5       20
D              7             1            22       30
Total         50            10            40      100

1) You must evaluate what the matrix would be if the data were proportionally distributed (M0):

Program   University      Pre-university   Others   Total
A         20%*50 = 10          …              …       20
B              …               …              …       30
C                                                     20
D                                                     30
Total         50              10             40      100

2) You calculate the difference between the real matrix and the new one: M – M0 = D

   13   2   5       10   2   8        3   0  -3
   20   2   8   –   15   3  12   =    5  -1  -4
   10   5   5       10   2   8        0   3  -3
    7   1  22       15   3  12       -8  -2  10

3) Then you express this difference graphically to be able to analyze it.

You want to be able to represent this matrix in a 2D form, so you express the matrix as D = M1 + M2:

    3   0  -3        2  -1  -1        1   1  -2
    5  -1  -4   =    4  -2  -2   +    1   1  -2
    0   3  -3       -2   1   1        2   2  -4
   -8  -2  10       -4   2   2       -4  -4   8

Then M1 and M2 each have to be expressed as the product of a column vector by a row vector, so D = C1 L1 + C2 L2, with

   C1 = (1, 2, -1, -2) as a column,   L1 = (2, -1, -1)
   C2 = (1, 1, 2, -4) as a column,    L2 = (1, 1, -2)

which gives the following coordinates:

                  1     2
A                 1     1
B                 2     1
C                -1     2
D                -2    -4
University        2     1
Pre-univ.        -1     1
Others           -1    -2

You analyze the mapping obtained:

[Figure: two-dimensional map of programs A, B, C and D and of the destinations University, Pre-university and Others, plotted with the coordinates above]

4) You try to optimize the readability.

Scalar product of vectors:
- positive means that they have affinities;
- negative means that they have no affinities;
- zero means that there is no relation (no more than with other variables).

Example:
- Program C and Pre-university have an affinity;
- Program A does not lead much to areas other than university or pre-university;
- Program A does not lead to pre-university more than other programs do.

Advantages: very visual, can be done with SPSS (I think).

Disadvantages: in your case, a way to “pre-work” the data easily will have to be found; most of the literature is in French.

5. Analysis

I now present the analysis of the students' productions, made in order to understand, as far as possible, the effect of the feedback on their actions and to interpret their difficulties (linked or not to the use of the tool, linked to the type of task, whether the task is a crisis or not). For the analysis, I examined every entry line of ten students from task 1.2 to task 2.6. I located the “fout” entries and the places where the steps were stable or dropping in order to understand the difficulties encountered by the students. I then marked the difficulties and classified them into two categories: mathematics and use of the tool. Here are the main difficulties found.

5.1 Activity 1

Mathematical difficulties and errors
- Rewriting generates many difficulties:
  o with signs;
  o with division (especially if the x is on the right side: 2 = 5x, x = 5/2).
- When the common factor is not written in the same order, it seems more difficult to recognize, and the student wants to repeat what has been done; for example, a student may write that both sides are equal to 0:
  o (4*x+2)*sqrt(2+4*x) = 0 of sqrt(4*x+2)*(6*x-2) = 0
- Some calculation mistakes when the student expands.
- When students simplify, they sometimes forget a solution (common factor = 0). This can sometimes be interpreted as a mathematical difficulty, but it could also be caused by the tool, which needs the two solutions on the same line.

Difficulties with the tool
- Working on one solution at a time generates feedback that can be interpreted as if the step were wrong.
- Rewriting on the same line:
  o x-5 = 4 of x = 9 of 2*x-7 = 0 of 2*x = 7 of x = 3+((1)/(2))
  Here the student is having trouble adapting his way of working to what the tool requires: see Dao.
- Writing a fractional number with the computer: 3 + ½ can become 31/2.
- When a quadratic equation has no solution, the feedback reaches the student before he attempts to find factors.
  This leads to blockages (see especially task 1.6).
- BUGS: see Broersen 1.11, Dao 1.8, Diemen 1.6.

5.2 Activity 2

Mathematical difficulties and errors
- Rewriting mistake: dividing by something that involves an addition.
- Rewriting mistakes: a multiplication is treated as an addition, or an addition is treated as a multiplication.
- Factoring mistake: factoring out something that is not a common factor.
- Square root mistake: the student tries to rewrite the square root as an exponent but uses the exponent -½.
- Square root mistake: manipulation of a division of square roots (see Boon).
- Brackets mistake: missing brackets.

Difficulties with the tool
- BUG: task 2.6 (the feedback is always “goed” even when there is a mistake).

5.3 Comments and questions

- There are many difficulties involving the rewriting of simple equations: are the manipulations harder on the computer than on paper? One would have to look at the pre- and post-tests.
- I also wonder whether the rewriting mistakes can be due to working on all the solutions at the same time. Does this way of working on the solutions (all at the same time) have an effect on the post-test? Do the students try to imitate what has been done with the tool? In other words, how do the students solve the equation in the pre-test? How do they do it with the computer? How do they do it in the post-test? Has the tool helped? In what way? Has the tool harmed? In what way?
- Could the student work on one solution at a time? Instead of saying that solutions are missing, could the feedback say something more positive, such as: that is one of the solutions?
- When a quadratic equation does not have any solution, the feedback “Deze stap bevat correcte en niet correcte onderdelen. Verwijder of vervang de delen die niet correct zijn” (“This step contains correct and incorrect parts. Remove or replace the parts that are not correct”) does not help, because it is not obvious to the student that there are no solutions (task 1.6).
- What should we think about subtraction mistakes when rewriting the expression (and “sending parts of the equation to the other side of the equality”)?
  Is the feedback helping or not? It is hard to know. I am also wondering whether the feedback makes the student more vigilant about those actions. Would it be possible to retrace this kind of error?
- When the variable (which is usually x) is written differently, in a more complex form such as log x, the student is unable to recognize the pattern needed to solve the equation (task 1.11).
- Manipulating the square root while thinking about both the positive and the negative solution seems to be difficult. Does it have an effect on the strategy used to solve (expanding, for example)? To know that, we would have to see whether attempts were made to complete the task without expanding and whether the student changed his or her strategy to expanding. It would also be possible to see the evolution (from task to task) of the strategy used.
- It would also be interesting to find out whether there is a relation between the task and the strategy used (for any kind of task: category 1, 2, 3, 4, etc.). To do that, we would have to code, for every student, the strategy used for every task. We would also have to find a way to verify the hypothesis.
- We could also try to find out which tasks were difficult and why. For example, task 1.6 was not solved by any student, and I think it is because of the feedback. How can this be done in a quantitative way?
- Activity 2: Amin tries to find a working strategy: he starts by putting everything on one side, which equals zero.

References

Cocea, M. & Weibelzahl, S. (2006). Can Log Files Analysis Estimate Learners' Level of Motivation? In M. Schaaf and K. Althoff (Eds.), ABIS 2006: 14th Workshop on Adaptivity and User Modeling in Interactive Systems, University of Hildesheim, Germany.
Guzdial, M. (1993). Deriving Software Usage Patterns from Log Files. GVU Technical Report GIT-GVU-93-41, Georgia Institute of Technology.
Hastie, T., Tibshirani, R. & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). New York: Springer. pp. 520–528.
Liu, X. & Ruiz, M.-E. (2008).
  Using Data Mining to Predict K–12 Students' Performance on Large-Scale Assessment Items Related to Energy. Journal of Research in Science Teaching, 45(5), 554–573.
Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
Rakotomalala, R. (2005). Arbres de décision. Revue MODULAD, 33, 163-187.
http://www.autonlab.org/tutorials/kmeans11.pdf
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html
http://zoo.cs.yale.edu/classes/cs445/slides/DM_DecisionTree-mod_jdu.ppt

ANNEXES

Schedule

May 24, 2010 (19:00 to 22:00)
- Reading of preparatory articles (about the research project):
  o Bokhove, C., & Drijvers, P. (2010). Digital tools for algebra education: criteria and evaluation. International Journal of Computers for Mathematical Learning, 15(1), 45-62.
  o Bokhove, C., & Drijvers, P. (2010). Digital activities that invite symbol sense behavior.
  ANNEX A – 1 and 2

May 25, 2010 (9:00 to 17:10)
- Meeting with Mr. Drijvers at 9:00: tour of the Freudenthal Institute, introduction to some researchers;
- Meeting with Mr. Bokhove at 11:00: discussion about his research project; clarification of the tasks to do during the internship; planning of a meeting (Friday, May 28 at 11:30);
- Reading of the research plan (ANNEX B);
- Familiarization with the “equation solving” software;
- First contact with the data: pruning of the data.

May 26, 2010 (9:00 to 16:00)
- Reading of articles about the analysis of log files:
  o Castillo, V. R. C., Villaflor, K. B. V., Rodriguez, R. L., & Rodrigo, M. M. T. (2010, March). Modeling student affect and behavior using biometric readings, log files and low fidelity playbacks. Paper presented at the 10th Philippine Computing Society Congress, Davao City, Philippines.
  o Muehlenbrock, M. (2005, July).
    Automatic action analysis in an interactive learning environment. Paper presented at the Workshop on Usage Analysis in Learning Systems at the 12th International Conference on Artificial Intelligence in Education AIED-2005, Amsterdam, the Netherlands.
  ANNEX A – 3 and 4
- Reading of an article about the research project:
  o Bokhove, C. (2008). Use of ICT in formative scenarios for algebraic skills. Paper presented at the International Society for Design and Development in Education.
  ANNEX A – 5
- List of all the data collected in the log files, of the data that can be deduced from them, and of their interest for the research, in relation to the objectives (ANNEX C – Data.doc)
- Preparation of the questions and comments document as a basis for discussion at the May 28 meeting with Mr. Bokhove (ANNEX C – Meeting_28-05-10.doc)

May 27, 2010 (10:00 to 21:00)
- Participation in the annual day of the Freudenthal Institute
  o Cruise on the lake…
  o Dinner…

May 28, 2010 (9:00 to 15:00)
- Working with the software
- Writing of a reaction to the project (ANNEX C – Reaction.doc)
- Search for an analysis tool suited to the data: decision tree (C4.5 or C5.0 algorithm, Quinlan, 1993).
- Meeting with Christian Bokhove at 11:30.

May 31, 2010 (8:00 to 16:00)
- Literature search about the analysis of log files, data mining, decision trees… (see folder Data mining articles).
- Reading of an article about decision trees:
  o Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
- Reading of an article about log files:
  o Guzdial, M. (1993). Deriving Software Usage Patterns from Log Files.
- Reading of a tutorial for quantitative data analysis:
  o R: A self-learn tutorial
- List of ideas for processing the data (ANNEX C – possible_classifications.doc)
  ANNEX A – 6, 7 and 8

June 1, 2010 (8:00 to 16:00)
- Reading of a tutorial on decision trees:
  o Rakotomalala, R. (2005). Arbres de décision. Revue MODULAD, 33, 163-187.
- Search for ways to process the data:
  o Decision tree
  o (Hierarchical) clustering
  o Correspondence analysis
- Search for documentation about these classification tools
- Familiarization with these classification tools
- Writing of the beginning of the report
- Reading of the brochure that traces the history of the Institute and explains its philosophy and the orientation taken for research.

June 2, 2010 (8:30 to 15:00)
- Continued writing of the report
- Familiarization with the SPSS software (SPSS tutorial)
- Trying to find a (quantitative) way to analyze the data
- Beginning of a qualitative analysis of the data

June 3, 2010 (9:00 to 16:30)
- Preparation of the meeting with Paul Drijvers
- Qualitative analysis of the data
- Meeting with Paul Drijvers
- Writing of a brief summary of the meeting
- Writing of a summary of the analyses (as of June 3)

June 4-5-6, 2010 (9:00 to 13:00, plus 8 hours)
- Continuation of the analyses
- Writing of a summary of the analyses (as of June 4-5-6)

June 7, 2010 (9:00 to 16:30 and 20:00 to 22:30)
- Continuation of the analyses
- Statistical trials (with the SPSS software)
- Meeting with Christian Bokhove
- Planning of a meeting with Ms. Heleen Verhage

June 8, 2010 (8:00 to 16:00 and 20:00 to 22:30)
- Continuation of the analyses
- Writing of a summary of the analyses
- Writing of the report
- Planning, with a doctoral student (Aldine), of a round table on Realistic Mathematics Education
  o Drafting of framing questions
  o Invitations

June 14, 2010 (9:00 to 17:00)
- Preparation of the meeting with Heleen Verhage
- 13:00: Meeting with Heleen Verhage (director of the Freudenthal Institute)
- 15:00: Meeting with Paul Drijvers

June 15, 2010 (8:30 to 16:30)
- Preparation of the round table: readings
  o van den Heuvel-Panhuizen, M.
- 13:00: Round table on the theme of Realistic Mathematics Education with Jaap den Hertog and Aad Goddijn

June 16, 2010 (15:00 to 18:00)
- Continuation of the analyses

June 17, 2010 (8:00 to 17:00)
- Continuation of the analyses
- Completion of the internship report
- Preparation of a file for Christian Bokhove and Paul Drijvers
- 15:00: Lecture by Mr. Paul Drijvers at the Department of Mathematics of the University of Utrecht

June 18, 2010 (9:00 to 16:00)
- Meeting with Christian Bokhove and Paul Drijvers
- End of the internship

Data fields, taken from the log files or not

Table 1 presents the data available for the analysis, a description of each variable, the type of variable and some hints on how they could be used in the analysis.

Table 1: Data fields

Data field        | Description                                                                                                                                  | Type of variable         | Use
------------------|----------------------------------------------------------------------------------------------------------------------------------------------|--------------------------|------------------------------------------------------------------------
Type of task      | The four categories: equations with common factor, covering up subexpressions, resisting visual salience in powers of subexpressions, hidden factors | Categorical              |
Crises            | Whether the task can (not a crisis) or cannot (a crisis) be solved using available knowledge                                                  | Categorical: yes or no   |
Duration 1 [2]    | Duration in solving a task                                                                                                                   | Numerical (continuous)   | Linked to the type of task; linked to the crises
Duration 2        | Duration between two actions that occurred                                                                                                   | Numerical (continuous)   | Linked to the feedback; linked to whether it is a “half” or a “fout”
Number of steps   | How many steps (including backwards) to solve the task                                                                                       | Numerical (discrete)     | Linked to the type of task; linked to the crises
Feedback          |                                                                                                                                              | Categorical              |
Expression        | Current state of the equation                                                                                                                |                          |
Resolved          | Classifies whether the equation is solved (goed), quasi-solved (half) or not solved because of a mistake (fout)                               | Categorical              |

[2] This duration is an approximation.