Some issues and applications in cognitive diagnosis and educational data mining

Brian W. Junker
Department of Statistics
Carnegie Mellon University
[email protected]

Presentation to the International Meeting of the Psychometric Society, Tokyo, Japan, July 2007

Rough Outline
• Cognitive Diagnosis Models (CDM’s) in Psychometrics
  – Models: a partial review
  – What to do when someone comes into my office?
• The ASSISTments Project: Using CDM’s in a learning-embedded assessment system
• Educational Data Mining

What are CDM’s? How are they related?
• Rupp (2007), Fu & Li (2007), Junker (1999), Roussos (1994), and others
• Many definitions try to characterize what the unique challenges are, but…
• A simple definition of CDM: “A latent trait measurement model useful for inferences about cognitive states or processes”

“…Measurement Model useful for inferences about cognitive…”
• Unidimensional Item Response Models
• Multidimensional Item Response Models
  – Compensatory structure (e.g. Reckase, 1985, 1997)
  – Multiplicative structure (e.g. Embretson, 1984, 1997)
  – Task difficulty (LLTM; e.g. Fischer, 1995) vs. person attribute modeling (MIRT; e.g. Reckase, 1997)
• (Constrained) Latent Class Models
  – Macready and Dayton (1977); Haertel (1989); Maris (1999)
• Bayes Net Models
  – Mislevy et al. (e.g. Mislevy, Steinberg, Yan & Almond, 1999)
  – AI and data mining communities (more later…)

Constrained Latent Class Models
• Basic ingredients
  – Xij is data (task/item response)
  – Qjk is design (Q-matrix, skills, KC’s, transfer model…)
  – αik is latent (knowledge state component of examinee) [ αi = (αi1, …, αiK) is a latent class label ]

Constrained Latent Class Models
• The Q matrix is the incidence matrix of a bipartite graph
• All such models look like (one-layer) discrete-node Bayesian networks.

Constrained Latent Class Models
• Relate αik to Xij probabilistically
• Now it looks exactly like an IRT model
• What is the form of Pj(αi)?
  – Conjunctive (many forms!)
  – Disjunctive (less common!)
  – Other??

Two simple conjunctive forms…
• DINA
• Examined by Junker & Sijtsma (2001)
• Antecedents incl. Macready & Dayton (1977); Haertel (1989); Tatsuoka (1983, 1995)
• Natural choice in educational data mining
• Difficult to assign credit/blame for failure

A second simple conjunctive form
• NIDA
• Also examined by Junker & Sijtsma (2001)
• Antecedents incl. Maris’ MCLCM (1999)
• Maybe more readily assign credit/blame

A generalization of NIDA
• RedRUM
• π*j is maximal probability of success
• r*jk is penalty for each attribute not possessed
• Introduced by Hartz (2002); cf. DiBello, Stout & Roussos (1995).

Compensatory & disjunctive forms are also possible
• Weaver & Junker (2004, unpubl.)
• Looks like multidimensional Rasch model
• Plausible for some multi-strategy settings
  – limited proportional reasoning domain
• DINO, NIDO, … (Rupp, 2007)
  – Pathological gambling as in DSM-IV (Templin & Henson, 2006)

A Common Framework
Mixed nonlinear logistic regression models in which
• λ is a coefficient vector;
• h(αi, Qj) is a vector of Qjk-weighted main effects and interactions among latent attributes:
  αikQjk, αik1Qjk1αik2Qjk2, αik1Qjk1αik2Qjk2αik3Qjk3, …
Henson et al. (LCDM, 2007); von Davier (GDM, 2005)

LCDM’s / GDM’s
• Obtain RedRUM, NIDA, DINA, DINO, etc., by constraining the λ’s!
• Weaker constraints on the λ’s: conjunctive/disjunctive blends, etc.
• Potentially powerful
  – unifying framework for many CDM’s
  – exploratory modeling tool

Many general frameworks, model choices and design choices
• Conceptual: Fu & Li (2007); Rupp (2007)
• Extensions: HO-DINA, MS-DINA and others (de la Torre & Douglas, 2004, 2005); Fusion model system (Roussos et al., in press); Bayes Nets (Mislevy et al., 1999)
• Model Families: Henson et al. (2007); von Davier (2005), etc.
What to do when someone comes into my office?
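
The DINA and NIDA slides above name the two conjunctive forms, but their item response functions are not spelled out in this text. The following is a minimal sketch, assuming the usual slip/guess parameterization discussed in Junker & Sijtsma (2001): under DINA an examinee who has every attribute the item requires succeeds with probability 1 - slip and otherwise with probability guess (one slip and one guess per item), while under NIDA slips and guesses act per attribute and are multiplied over the attributes the item requires. The function names, argument names, and numeric values are illustrative assumptions, not taken from the slides.

```python
from math import prod


def dina_prob(alpha, q_row, slip, guess):
    """Success probability for one item under a DINA-style model (sketch).

    alpha : list of 0/1 attribute indicators for one examinee
    q_row : list of 0/1 entries, the item's row of the Q-matrix
    slip, guess : item-level slip and guess probabilities (assumed names)
    """
    # Conjunctive rule: the examinee must have every required attribute.
    has_all = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return (1.0 - slip) if has_all else guess


def nida_prob(alpha, q_row, slips, guesses):
    """Success probability for one item under a NIDA-style model (sketch).

    slips, guesses : attribute-level slip and guess probabilities.
    Each required attribute contributes (1 - slip_k) if possessed,
    guess_k if not; the contributions are multiplied.
    """
    return prod(
        (1.0 - s) if a == 1 else g
        for a, q, s, g in zip(alpha, q_row, slips, guesses)
        if q == 1
    )


if __name__ == "__main__":
    q_row = [1, 0, 1]  # hypothetical item requiring attributes 1 and 3
    for alpha in ([1, 0, 1], [1, 1, 0], [0, 0, 0]):
        print(
            alpha,
            dina_prob(alpha, q_row, slip=0.1, guess=0.2),
            nida_prob(alpha, q_row, slips=[0.1] * 3, guesses=[0.2] * 3),
        )
```

In this sketch DINA gives the same guess-level probability to every examinee who lacks at least one required attribute, which is one way to read the slide's remark that credit or blame is hard to assign; the NIDA form changes with each missing attribute.
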
Example: ASSISTments Project
• Web-based 8th grade mathematics tutoring system
• ASSIST with, and ASSESS, progress toward Massachusetts Comprehensive Assessment System Exam (MCAS)
• Main statistical/measurement goals
  – Predict students’ MCAS scores at end of year
  – Provide feedback to teachers
• Ken Koedinger (Carnegie Mellon), Neil Heffernan (Worcester Polytechnic), & over 50 others at CMU, WPI and Worcester Public Schools

The ASSISTment Tutor
• Main Items: Released MCAS or “morphs”
• Incorrect Main → “Scaffold” Items
  – “One-step” breakdowns of main task
  – Buggy feedback, hints on request, etc.
• Multiple Knowledge Component (Q-matrix) models:
  – 1 IRT θ
  – 5 MCAS math strands
  – 39 MCAS standards
  – 77-106 “expert coded” basic skills
• Goals:
  – Predict MCAS Scores
  – KC Feedback: learned/not learned, etc.

Goal: Predicting MCAS
• The exact content of the MCAS exam is not known until months after it is given
• The ASSISTments themselves are ongoing throughout the school year as students learn (from teachers, from ASSISTment interactions, etc.).
[Figure: % Correct on System per student, plotted against Time (months, Sep through Mar)]

Methods: Predicting MCAS
• Regression approaches [Feng et al., 2006; Anozie & Junker, 2006; Ayers & Junker, 2006/2007]:
  – Percent Correct on Main Questions
  – Percent Correct on Scaffold Questions
  – Rasch proficiency on Main Questions
  – Online metrics (efficiency and help-seeking; e.g.
    Campione et al., 1985; Grigorenko & Sternberg, 1998)
  – Both end-of-year and “month-by-month” models
• Bayes Net (DINA Model) approaches:
  – Predicting KC-coded MCAS questions from Bayes Nets (DINA model) applied to ASSISTments [Pardos et al., 2006];
  – Regression on number of KC’s mastered in DINA model [Anozie, 2006]

Results: Predicting MCAS
10-fold cross-validation using:

Predictors                     df   CV-MAD   CV-RMSE   Remarks
PctCorrMain                     1     7.18      8.65   7 months, main questions only
#KC’s of 77 learned (DINA)      1     6.63      8.62   3 months, mains and scaffolds
Rasch Proficiency               1     5.90      7.18   7 months, main questions only
PctCorrMain + 4 metrics        35     5.46      6.56   7 months; 5 summaries each month
Rasch Profic + 5 metrics        6     5.24      6.46   7 months, main questions only

Results: Predicting MCAS
• Limits of what we can accomplish for prediction
  – Feng et al. (in press) estimate best-possible MAD ≈ 6 from split-half experiments with MCAS
  – Ayers & Junker (2007) reliability calculation suggests approximate bounds 1.05 ≤ MAD ≤ 6.46.
  – Best observed MAD ≈ 5.24
• Tradeoff:
  – Greater model complexity (DINA) can help [Pardos et al., 2006; Anozie, 2006];
  – Accounting for question difficulty (Rasch), plus online metrics, does as well [Ayers & Junker, 2007]

Goal: KC Feedback
• Providing feedback on
  – individual students
  – groups of students
• Multiple KC (Q-matrix) models:
  – 1 IRT θ
  – 5 MCAS math strands
  – 39 MCAS standards
  – 106 “expert coded” basic skills
• Scaffolding: Optimal measures of single KC’s? Optimal tutoring aids?
  – When more than one transfer model is involved, scaffolds fail to line up with at least one of them!
• Use DINA Model, 106 KC’s

Results: KC Feedback
• Average percent of KC’s mastered: 30-40%
• February dip reflects a recording error for main questions
• Monthly split-half cross-val accuracy 68-73% on average

Digression: Learning within DINA
• Current model “wakes up reborn” each month; No data → posterior falls back to prior, ignoring previous response behavior.
• Using last month’s posterior as this month’s prior treats previous response behavior too strongly (exchangeable with present).
• Wenyi Jiang (ongoing, CMU) is looking at incorporating a Markov learning model for each KC in DINA.

Digression: Question & KC Model Characteristics
Main Item: Which graph contains the points in the table?

   X    Y
  -2   -3
  -1   -1
   1    3

Scaffolds:
1. Quadrant of (-2,-3)?
2. Quadrant of (-1,-1)?
3. Quadrant of (1,3)?
4. [Repeat main]
[Figures: posterior boxplots of guess gj and slip sj]

Some questions driven by ASSISTments
• Different KC models for different purposes seem necessary.
  – How deeply meaningful are the KC’s?
• Q-matrix is KC → task design; what about task → examinee design?
  – Henson & Douglas (2005) provide recent developments in KL-based item selection for CDM’s
  – Most settings have designed and undesigned missingness
  – Interactions between assignment design and learning
• How close to right does the CDM have to be?
  – Douglas & Chiu (2007) have started mis-specification studies
  – Perhaps the Henson/von Davier frameworks can help?
  – For ASSISTments and other settings, this is a sparse data model fit question!
• How to design and improve the KC model?

Some options for designing/improving KC model
• Expert Opinion, Iterations
• Rule space method (Tatsuoka 1983, 1995)
• Directly minimizing Σij ||ξij – Xij|| as a function of Q (Barnes 2005, 2006): Boolean regression & variable generation/selection [related: Leenen et al., 2000]
• Learning Factors Analysis (Cen, Koedinger & Junker 2005, 2006): learning curve misfit is a better clue to improving the Q-matrix than static performance misfit

From www.educationaldatamining.org
• Educational Data Mining Workshop, at the 13th International Conference on Artificial Intelligence in Education (AI-ED). Los Angeles, California, USA. July 9, 2007.
• Workshop on Educational Data Mining, at the 7th IEEE International Conference on Advanced Learning Technologies. Niigata, Japan.
  July 18-20, 2007.
• Workshop on Educational Data Mining at the 21st National Conference on Artificial Intelligence (AAAI 2006). Boston, USA. July 16-17, 2006.
• Workshop on Educational Data Mining at the 8th International Conference on Intelligent Tutoring Systems (ITS 2006). Jhongli, Taiwan, 2006.
• Workshop on Educational Data Mining at the 20th National Conf. on Artificial Intelligence (AAAI 2005). Pittsburgh, USA, 2005.

From AAAI 2005
• Evaluating the Feasibility of Learning Student Models from Data
  Anders Jonnson, Jeff Johns, Hasmik Mehranian, Ivon Arroyo, Beverly Woolf, Andrew Barto, Donald Fisher, and Sridhar Mahadevan
• Topic Extraction from Item-Level Grades
  Titus Winters, Christian Shelton, Tom Payne, and Guobiao Mei
• An Educational Data Mining Tool to Browse Tutor-Student Interactions: Time Will Tell!
  Jack Mostow, Joseph Beck, Hao Cen, Andrew Cuneo, Evandro Gouvea, and Cecily Heiner
• A Data Collection Framework for Capturing ITS Data Based on an Agent Communication Standard
  Olga Medvedeva, Girish Chavan, and Rebecca S. Crowley
• Data Mining Patterns of Thought
  Earl Hunt and Tara Madhyastha
• The Q-matrix Method: Mining Student Response Data for Knowledge
  Tiffany Barnes
• Automating Cognitive Model Improvement by A* Search and Logistic Regression
  Hao Cen, Kenneth Koedinger, and Brian Junker
• Looking for Sources of Error in Predicting Student’s Knowledge
  Mingyu Feng, Neil T. Heffernan, and Kenneth R. Koedinger
• Time and Attention: Students, Sessions, and Tasks
  Andrew Arnold, Richard Scheines, Joseph E.
  Beck, and Bill Jerome
• Logging Students’ Model-Based Learning and Inquiry Skills in Science
  Janice Gobert, Paul Horwitz, Barbara Buckley, Amie Mansfield, Edmund Burke, and Dimitry Markman

Educational Data Mining
• Often very clever algorithms & data management, not constrained by quant or measurement traditions
• A strength (open to new approaches)
• A weakness (re-inventing the wheel, failing to see where a well-understood difficulty lies, etc.)

Conclusions? Questions…
• Lots of options for CDM’s, not yet much practical experience beyond “my model worked here”
• Significant design questions remain, and seem to admit quantitative solutions
• Need to be connected to real projects
  – real world constraints
  – real world competitors in EDM
• It would be mutually advantageous to join with EDM and draw EDM (partially?) into our community… Can we do it? Do we want to?

END (references follow)

REFERENCES
• Anozie, N.O. (2006). Investigating the utility of a conjunctive model in Q matrix assessment using monthly student records in an online tutoring system. Proposal to the 2007 Annual Meeting of the National Council on Research in Education.
• Anozie, N.O. & Junker, B.W. (2006). Predicting end-of-year accountability assessment scores from monthly student records in an online tutoring system. American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.
• Anozie, N.O. & Junker, B.W. (2007). Investigating the utility of a conjunctive model in Q matrix assessment using monthly student records in an online tutoring system. Paper presented to the Annual Meeting of the National Council on Research in Education. Chicago, IL.
• Ayers, E. & Junker, B.W. (2006). Do skills combine additively to predict task difficulty in eighth-grade mathematics? American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.
• Ayers, E. & Junker, B.W. (2006).
  IRT modeling of tutor performance to predict end of year exam scores. Submitted for publication.
• Barnes, T. (2005). Q-matrix Method: Mining Student Response Data for Knowledge. In the Proceedings of the AAAI-05 Workshop on Educational Data Mining, Pittsburgh, 2005 (AAAI Technical Report #WS-05-02).
• Barnes, T., Stamper, J., & Madhyastha, T. (2006). Comparative analysis of concept derivation using the q-matrix method and facets. Proceedings of the AAAI 21st National Conference on Artificial Intelligence Educational Data Mining Workshop (AAAI2006), Boston, MA, July 17, 2006.
• Campione, J.C., Brown, A.L., & Bryant, N.R. (1985). Individual differences in learning and memory. In R.J. Sternberg (Ed.), Human abilities: An information-processing approach, 103–126. New York: W.H. Freeman.
• Cen, H., Koedinger, K., & Junker, B. (2005). Automating Cognitive Model Improvement by A* Search and Logistic Regression. In Technical Report (WS-05-02) of the AAAI-05 Workshop on Educational Data Mining, Pittsburgh, 2005.
• Cen, H., Koedinger, K., & Junker, B. (2006). Learning factors analysis: a general method for cognitive model evaluation and improvement. Presented at the Eighth International Conference on Intelligent Tutoring Systems (ITS 2006), Jhongli, Taiwan.
• Cen, H., Koedinger, K., & Junker, B. (2007). Is more practice necessary? Improving learning efficiency with the Cognitive Tutor through educational data mining. Presented at the 13th Annual Conference on Artificial Intelligence in Education (AIED 2007), Los Angeles CA.
• de la Torre, J. & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69, 333-353.
• de la Torre, J. & Douglas, J. A. (2005). Modeling multiple strategies in cognitive diagnosis. Paper presented at the annual meeting of the National Council on Measurement in Education, Montréal, QC, Canada.
• DiBello, L. V., Stout, W. F., & Roussos, L. A. (1995).
  Unified cognitive/psychometric diagnostic assessment likelihood-based classification techniques. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively diagnostic assessment (pp. 361-389). Hillsdale, NJ: Lawrence Erlbaum.
• Douglas, J. A. & Chiu, C.-Y. (2007). Relationships Between Competing Latent Variable Models: Implications of Model Misspecification. Paper presented at the Annual Meeting of the National Council on Research in Education, Chicago IL.
• Embretson, S. E. (1984). A General Latent Trait Model for Response Processes. Psychometrika, 49, 175–186.
• Embretson, S. E. (1997). Multicomponent response models. Chapter 18, pp. 305-322 in van der Linden, W. & Hambleton, R. K. (1997). Handbook of Modern Item Response Theory. New York: Springer.
• Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006). Predicting state test scores better with intelligent tutoring systems: developing metrics to measure assistance required. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp. 31–40.
• Fischer, G. H. (1995). The linear logistic test model. Chapter 8, pp. 131-156 in Fischer, G. H. & Molenaar, I. (1995). Rasch Models: Foundations, Recent Developments, and Applications. New York: Springer.
• Fu, J., & Li, Y. (2007). Cognitively Diagnostic Psychometric Models: An Integrative Review. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago IL.
• Grigorenko, E. L. & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.
• Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301–321.
• Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality.
  Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL.
• Henson, R. A., Templin, J. L., & Willse, J. T. (2007). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Invited paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago IL.
• Henson, R. A., & Douglas, J. (2005). Test Construction for Cognitive Diagnosis. Applied Psychological Measurement, 29, 262–277.
• Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 12, 55-73.
• Leenen, I., Van Mechelen, I., & Gelman, A. (2000). Bayesian probabilistic extensions of a deterministic classification model. Computational Statistics, 15, 355-371.
• Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64, 187-212.
• Mislevy, R.J., Almond, R.G., Yan, D., & Steinberg, L.S. (1999). Bayes nets in educational assessment: Where do the numbers come from? In K.B. Laskey & H. Prade (Eds.), Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (437-446). San Francisco: Morgan Kaufmann.
• Macready, G. B., & Dayton, C. M. (1977). The use of probabilistic models in the assessment of mastery. Journal of Educational Statistics, 33, 379-416.
• Pardos, Z. A., Heffernan, N. T., Anderson, B., & Heffernan, C. L. (2006). Using Fine Grained Skill Models to Fit Student Performance with Bayesian Networks. Workshop in Educational Data Mining held at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan. 2006.
• Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401-412.
• Reckase, M. D. (1997). A linear logistic multidimensional model for dichotomous item response data. In W. J. van der Linden & R. K.
  Hambleton (Eds.), Handbook of modern item response theory (pp. 271-286). New York, NY: Springer-Verlag.
• Roussos, L. (1994). Summary and review of cognitive diagnosis models. Unpublished manuscript.
• Roussos, L., DiBello, L. V., Stout, W., Hartz, S., Henson, R. A., & Templin, J. H. (in press). The fusion model skills diagnosis system. In J. P. Leighton & M. J. Gierl (Eds.), Cognitively diagnostic assessment for education: Theory and practice. Thousand Oaks, CA: SAGE.
• Rupp, A. A. (2007). Unique Characteristics of Cognitive Diagnosis Models. Invited paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago IL.
• Tatsuoka, K. K. (1983). Rule space: an approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354.
• Tatsuoka, K. K. (1995). Architecture of knowledge structures and cognitive diagnosis: a statistical pattern recognition and classification approach. Chapter 14 in Nichols, P. D., Chipman, S. F. & Brennan, R. L. (Eds.) (1995). Cognitively diagnostic assessment. Hillsdale, NJ: Lawrence Erlbaum Associates.
• Templin, J., & Henson, R. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287-305.
• Weaver, R. & Junker, B. W. (2004). Investigating the foundations of a cognitively diagnostic assessment through both traditional psychometric and skills-based measurement models: Advanced Data Analysis Final Report. Unpublished technical report. Pittsburgh, PA: Department of Statistics, Carnegie Mellon University.
• von Davier, M. (2005). A General Diagnostic Model Applied to Language Testing Data. Technical Report RR-05-16. Princeton NJ: Educational Testing Service.