WHO ARE YOUR AT-RISK STUDENTS? USING DATA MINING TO TARGET INTERVENTION EFFORTS
Lalitha Agnihotri, Ph.D., Senior Systems Analyst, DWH
Alex Ott, Ed.D., Associate Dean, Academic & Enrollment Services
Niyazi Bodur, Ph.D., VP, Information Technology & Infrastructure
New York Institute of Technology
EDUCAUSE Annual Conference, October 16, 2013

Presentation Description and Goals
Learn how to improve targeted intervention by building a model that identifies and classifies at-risk students using data from your own institution.
Gain an understanding of the complete life cycle of the At-Risk Student Identification Model.

Targeted Intervention for At-Risk Students
The goal: early, targeted intervention based on each at-risk student's risk factors, to improve retention.
Rationale for the key elements: early targeted intervention; risk factors for each student.

Before the Model, All We Had Was...
Students At Risk (STAR) Model, Version 1.0
Data sources: admissions data, registration/placement test data, and survey data.
Method: combine all risk variables into one aggregated measure.
[Figure: Version 1.0 report output]

Major Challenges with STAR 1.0
Limited attributes.
Attributes of unknown strength, relevance, or even direction.
All attributes weighted equally.
A static Excel document: getting all the attributes into one place took a big effort.

Data Mining

Data Mining: Classification
Given a collection of records (the training set), each record contains a set of attributes, one of which is the class.
Record layout: Student ID | Attributes | Class
Goal: assign previously unseen records to a class as accurately as possible.
Find a model for the class attribute as a function of the values of the other attributes.
Select the model that performs best.

STAR Model Version 2.0: Data Mining and Automated Tools
1. Built and automated the full dataset in our data warehouse.
2. Used data mining tools (SQL Server Analysis Services) to train multiple dynamic statistical models.
3.
Enterprise solution: SQL (build data) → SSAS (modeling) → DMX (prediction query) → SSRS (report)

Models Trained
Logistic Regression, Naïve Bayes, Neural Network, and Decision Trees, plus an Ensemble combining more than one of these methods.

Data Mining Knowledge Discovery: BIG Picture
Attribute values, charted by whether they favor students returning or students not returning:

Attribute                    Value
Work Hours Per Week          31+
Work Hours Per Week          0
Major Certainty              Not sure
Career Goal Certainty        Not sure
Stafford Loan Amount         >$5,900
Remedial Math                Registered
Completion Plan              Undecided
NYIT Scholarship Amount      >$13,000
Work Hours Per Week          1-10
Developmental Math           Not registered
Career Goal Certainty        Fairly sure
College Reading Strategies   Registered

[Figure: Data Mining Knowledge Discovery, detailed picture]

Model Significance and Results
Version 1.0
  Foundation: Desktop
  Methodology: Manual calculation
  Data: Manual collection; local data
  Number of variables and strengths: Limited; equally weighted
  Discovery: Indication of whether a student is returning or not
Version 2.0
  Foundation: Enterprise
  Methodology: Data mining, combining more than one method
  Data: Enterprise data
  Number of variables and strengths: Fairly high; weights depend on the data mining model
  Discovery: Uncovers the big picture for the university, plus the individual variables for each student, based on the prediction of returning or not

Model                  Recall   Precision   Accuracy
Manual                 34%      42%         59%
Logistic Regression    64%      55%         68%
Neural Network         54%      55%         67%
Naïve Bayes            49%      67%         73%
Decision Trees         39%      72%         72%
Ensemble               75%      56%         69%

So How Did the Model Actually Perform?
Dataset                        Recall   Precision   Accuracy
2011 Fall New Students Data    75%      56%         69%
2012 Fall New Students Data    73%      54%         59%

Key Takeaways
Success depends on a productive partnership between IT and the business side.
Data is the key.
Data mining is a process.
Select attributes based on (retention) research and the particulars of your school.

Questions?
Lalitha Agnihotri, [email protected]
Alexander Ott, [email protected]
Niyazi Bodur, [email protected]
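The recall, precision, and accuracy figures in the results tables follow the standard confusion-matrix definitions, and the Ensemble row combines more than one trained method. The deck does not say how the ensemble combined its members, so the sketch below assumes a plain majority vote. It is plain Python, not SSAS or DMX, and it assumes label 1 means "did not return" (the at-risk class). The function names and the three toy models (model_a, model_b, model_c) with their predictions are hypothetical, not NYIT data.

```python
"""Minimal sketch: score classifiers and a majority-vote ensemble.

Assumptions (not from the original deck): 1 = did not return (at risk),
0 = returned; the ensemble is a simple majority vote; all predictions
are hypothetical toy values.
"""


def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for the positive (at-risk) class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(actual, predicted, positive=1):
    """Return recall, precision, and accuracy as fractions in [0, 1]."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of at-risk students caught
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of flagged students truly at risk
    accuracy = (tp + tn) / len(actual)               # share of all students classified correctly
    return {"recall": recall, "precision": precision, "accuracy": accuracy}


def majority_vote(*model_predictions):
    """Flag a student (1) when more than half the models flag them; ties give 0."""
    return [1 if 2 * sum(votes) > len(votes) else 0
            for votes in zip(*model_predictions)]


def select_best(actual, candidates, metric):
    """Return the name of the candidate with the highest value of `metric`."""
    return max(candidates, key=lambda name: evaluate(actual, candidates[name])[metric])


if __name__ == "__main__":
    actual = [1, 1, 1, 1, 0, 0, 0, 0]
    models = {
        "model_a": [1, 1, 0, 0, 1, 0, 0, 0],
        "model_b": [1, 0, 1, 0, 0, 1, 0, 0],
        "model_c": [1, 1, 1, 0, 0, 0, 0, 1],
    }
    models["ensemble"] = majority_vote(*models.values())
    for name, predicted in models.items():
        print(name, evaluate(actual, predicted))
    print("best by accuracy:", select_best(actual, models, "accuracy"))
```

The metric passed to select_best matters: in the deck's own comparison, the Ensemble has the highest recall (75%), Decision Trees the highest precision (72%), and Naïve Bayes the highest accuracy (73%), so "the model that performs best" depends on which metric an institution decides to prioritize.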