Core Methods in Educational Data Mining
HUDK4050
Fall 2014

Welcome
• Welcome back to the 2nd class session

Administrative Stuff
• Is everyone signed up for class?
• If not, and you want to receive credit, please talk to me after class

Other administrative questions?

Today’s Readings
• First, a no-penalty-or-punishment survey question

Today’s Readings
• Who read the Witten & Frank?
• Who watched the BDE video?

Questions? Comments? Concerns?

What is a prediction model?

What is a regressor?

What are some things you might use a regressor for?
• Bonus points for examples other than those in the BDE video

Let’s do an example
• Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions

Skill          pknow   time   totalactions   numhints
COMPUTESLOPE   0.2     7      3              ?

Which of the variables has the largest impact on numhints? (Assume they are scaled the same.)

However…
• These variables are unlikely to be scaled the same!
• If Pknow is a probability
  – From 0 to 1
• And time is a number of seconds to respond
  – From 0 to infinity
• Then you can’t interpret the weights in a straightforward fashion
• What could you do?

Let’s do another example
• Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions

Skill          pknow   time   totalactions   numhints
COMPUTESLOPE   0.2     2      35             ?

Is this plausible?

What might you want to do if you got this result in a real system?

Transforms
• In the video, we talked about variable transforms
• Who here has transformed a variable (for an actual analysis)?
• What did you transform and why did you do it?

Variable Transformation: EDM versus statistics
• Statistics: fit data better AND avoid violating assumptions
• EDM: fit data better

Why don’t violations of assumptions matter in EDM?
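To make the two Numhints examples concrete, here is a minimal Python sketch that applies the slides' equation to both COMPUTESLOPE rows. The function names (`predict_numhints`, `is_plausible`) are hypothetical, and treating a negative hint count as the sign of an implausible prediction is an assumption about one reasonable sanity check, not the course's prescribed response.

```python
# Coefficients of the example regressor from the slides (no intercept term).
COEFFICIENTS = {"pknow": 0.12, "time": 0.932, "totalactions": -0.11}


def predict_numhints(row):
    """Linear regression prediction: sum of coefficient * feature value."""
    return sum(weight * row[feature] for feature, weight in COEFFICIENTS.items())


def is_plausible(prediction):
    """Assumption: a predicted hint count below zero is treated as implausible."""
    return prediction >= 0


# The two COMPUTESLOPE rows from the slides; time is in seconds.
examples = [
    {"skill": "COMPUTESLOPE", "pknow": 0.2, "time": 7, "totalactions": 3},
    {"skill": "COMPUTESLOPE", "pknow": 0.2, "time": 2, "totalactions": 35},
]

for row in examples:
    prediction = predict_numhints(row)
    label = "plausible" if is_plausible(prediction) else "implausible"
    print(row["skill"], round(prediction, 3), label)
```

With these inputs the first row yields about 6.22 hints, while the second yields about -1.96, a negative number of hints. That is one way to see why the second example should not be taken at face value, even though the arithmetic is correct.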
Interpreting Regression Models
• Example from the video

Example of Caveat
• Let’s graph the relationship between number of graduate students and number of papers per year

Data
[Figure: scatterplot of papers per year (y-axis) against number of graduate students (x-axis)]

Model
• Number of papers = 4 + 2 * (# of grad students) - 0.1 * (# of grad students)^2
• But does that actually mean that (# of grad students)^2 is associated with less publication?
• No!

Example of Caveat
[Figure: same scatterplot of papers per year against number of graduate students]
• (# of grad students)^2 is actually positively correlated with publications!
  – r = 0.46

Example of Caveat
[Figure: same scatterplot of papers per year against number of graduate students]
• The relationship is only in the negative direction when the number of graduate students is already in the model…

How would you deal with this?
• How can we interpret individual features in a comprehensive model?

Other questions, comments, concerns about lecture?

RapidMiner 5.3 exercise
• Go to the course website and download
• Sep10dataset.csv
• Data on the probability that a student error is careless
• Calculated as in (Baker, Corbett, & Aleven, 2008)
• Try to predict from other variables

RapidMiner tasks
• Build regressor to predict P(SLIP|TRIO)
• Look at model goodness
• Look at model
• Look at actual data and refine model
• Look at model goodness
• Build flat cross-validation
• Look at model goodness
• Build student-level cross-validation
• Look at model goodness

Class code will be posted later today

Questions? Comments? Concerns?

Questions about Basic HW 1?

Reminders
• You don’t have to do it perfectly, you just have to do it
• If you run into trouble, feel free to email me or, better yet, use the Moodle discussion forum

Questions? Concerns?

Other questions or comments?

Next Class
• Monday, September 15
• Classification Algorithms
• Baker, R.S. (2014) Big Data and Education. Ch. 1, V3, V4, V5.
• Witten, I.H., Frank, E. (2011) Data Mining: Practical Machine Learning Tools and Techniques. Ch. 4.6, 6.1, 6.2, 6.4
• Basic HW 1 due

The End
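The graduate-students caveat can be reproduced numerically. The sketch below assumes NumPy and uses hypothetical, noise-free data generated directly from the slides' model for 0 to 15 students. It is therefore not the dataset plotted on the slides, and its correlation will not equal the slides' r = 0.46. It fits all three terms together, then correlates the squared term with the outcome on its own.

```python
import numpy as np

# Hypothetical, noise-free data generated from the slides' model:
# papers = 4 + 2*g - 0.1*g^2, for g = 0..15 graduate students.
grad_students = np.arange(16, dtype=float)
papers = 4 + 2 * grad_students - 0.1 * grad_students ** 2

# Fit the full model: intercept, linear term, and squared term together.
design = np.column_stack(
    [np.ones_like(grad_students), grad_students, grad_students ** 2]
)
coefs, *_ = np.linalg.lstsq(design, papers, rcond=None)

# Bivariate correlation of the squared term alone with the outcome.
r_squared_term = np.corrcoef(grad_students ** 2, papers)[0, 1]

print("fitted coefficients (intercept, g, g^2):", np.round(coefs, 3))
print("corr(g^2, papers):", round(r_squared_term, 2))
```

The fitted squared-term coefficient comes back negative (-0.1), yet the squared term on its own is positively correlated with papers. That is the point of the caveat: a coefficient's sign describes a feature's relationship to the outcome given the other terms already in the model.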
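The difference between the flat and student-level cross-validation steps in the RapidMiner tasks comes down to how rows are assigned to folds. This plain-Python sketch is not RapidMiner's operators. It shows only the fold assignment, and the `student` field and toy rows are hypothetical. In flat cross-validation, rows from one student can land in both training and test folds. In student-level cross-validation, each student's rows stay together, so the goodness estimate reflects performance on students the model has not seen.

```python
import random


def flat_folds(rows, k, seed=0):
    """Flat CV: shuffle individual rows and deal them into k folds."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def student_level_folds(rows, k, seed=0):
    """Student-level CV: shuffle students, then keep each student's rows in one fold."""
    students = sorted({row["student"] for row in rows})
    random.Random(seed).shuffle(students)
    fold_of = {student: i % k for i, student in enumerate(students)}
    folds = [[] for _ in range(k)]
    for idx, row in enumerate(rows):
        folds[fold_of[row["student"]]].append(idx)
    return folds


def shared_students(rows, folds, test_fold):
    """Students who appear in both the test fold and the training folds."""
    test_idx = set(folds[test_fold])
    test_students = {rows[i]["student"] for i in test_idx}
    train_students = {rows[i]["student"] for i in range(len(rows)) if i not in test_idx}
    return test_students & train_students


# Hypothetical toy data: 4 students with 5 actions each.
rows = [{"student": f"s{s}", "action": a} for s in range(4) for a in range(5)]

for name, folds in [("flat", flat_folds(rows, 4)),
                    ("student-level", student_level_folds(rows, 4))]:
    leaks = [sorted(shared_students(rows, folds, f)) for f in range(len(folds))]
    print(name, leaks)
```

The student-level assignment always reports no shared students, while the flat assignment will typically show the same students on both sides of the split. That overlap is why flat cross-validation can overestimate how well a model generalizes to new students.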