Educational Data Mining to inspect low performance academic areas of the students using ensemble classification

Khawar Shakeel
Department of Computer Science, University of Gujrat, Pakistan
Email: [email protected]

Naveed Anwer Butt
Department of Computer Science, University of Gujrat, Pakistan
Email: [email protected]

Presented by Khawar Shakeel

Outline
Introduction
• What is Educational Data Mining (EDM)?
• What problems can we solve using EDM?
• Stakeholders
Design Goal
Related Work
Design Approach
Experimental Results
Suggestions & Future Work

Educational Data Mining (EDM) - Introduction
Data Mining (DM)?
Data mining is a method for uncovering hidden details in large volumes of raw data. Such methods are applied when the data set is very large and little is known about the data in advance.
Educational Data Mining is a growing research area within Data Mining (DM) that applies statistical methods to education-related data in order to improve the systems and quality of higher education institutions.

Possible Questions to Be Solved by EDM
• How can students' learning behavior be predicted?
• How can students be grouped according to their interests?
• What are the students' strong and weak areas of study?
• How can the students who need more help be identified?
• Which group(s) of students are likely to be dropped or promoted?
• What kinds of educational resources need to be allocated, and why?

EDM - Stakeholders
Administration
Administrators use EDM to ensure that useful resources are allocated to improve institutional education. Faculty and advisors are becoming more proactive in identifying and addressing at-risk students.
Educators
Educators attempt to understand the learning process and the methods they can use to improve their teaching.
Researchers
Researchers focus on developing data mining techniques and evaluating their effectiveness.

Our Study - Design Goal
Design a predictive model capable of exploring the reason(s) for the poor performance of the majority of students in specific course(s) or domains, so that the administration can be informed and take the necessary action.
Main tasks are:
• Extraction of predictable attributes from the data source.
• Identification of the attributes that may determine the learning behavior of the student.
• Construction of a prediction model based on the selected predictor variables using existing ensemble classification algorithms.
• Reporting the findings to the administration.

Previous Work
Although data mining in education is not yet a mature field, a lot of work has been done in this area because of its potential value to educational institutions.

Design Approach
- Data collection from the university's Student Information System
- Data pre-processing
  ◦ Selection
  ◦ Cleaning
  ◦ Transformation
- Development of a model based on ensemble classification algorithms
  ◦ Bagging
  ◦ Boosting (J48 decision tree algorithm as base classifier)
- Useful patterns leading to better decision making

Proposed Design Overview

Design Approach – Data Collection
Secondary data was initially collected through the semester system using the University Information System. The targeted students are from Master’s and BS (HONS) degree programs registered in different departments across all faculties of a public sector university.

Design Approach – Data Introduction
The attributes to be examined are the students' marks in each category for each course: assignment, quiz, presentation, midterm, final subjective, and final objective. The final data set for the model included 3130 instances and 7 variables.

Design Approach – Dataset

Design Approach – Data Preprocessing
Data Selection
Two departments from each faculty were selected. The extracted data comes from the 2008 and 2009 batches of the BS (HONS) programs and the 2010 and 2011 batches of the master's degree programs.
Only academic activity values were recorded as variables; other student information, such as demographic and financial data, was ignored.
Data Cleaning
Records of students with missing marks in any exam category of any course were removed, because such records can lead to biased decisions.

Design Approach – Data Preprocessing
Many kinds of courses are taught at the university: general, elective, compulsory, and core courses. The grade for a course is determined by the grading range into which the obtained marks fall. Here, C, D, and F are considered low grades, where the marks tend to be below 60. These grades are considered low because they tend to affect a student's GPA negatively. Courses with a higher percentage of low grades were selected for analysis, and for the selected courses the data of all students was collected.

Our Approach – Ensemble Classification
Ensemble classification techniques combine several classifiers in order to obtain more reliable results. The most common model-combining approaches in data mining are bagging and boosting.
Bagging uses a voting structure in which n models, generally of the same type, are built. For an unseen instance, each model's prediction is collected, and the class that receives the majority of votes among the models' predictions is assigned.
Boosting is almost the same as bagging; only the model-building stage changes. Instances that are repeatedly misclassified are allowed to contribute to training a greater number of times. There are normally n classifiers, each weighted according to its accuracy. Finally, the class with the maximum total weight is assigned.
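
The two ways of combining predictions described for bagging and boosting can be illustrated with a minimal Python sketch. It covers only the final voting step, not the training of the models. The function names, the class labels "HIGH" and "LOW", and the example predictions and weights are hypothetical and chosen for illustration; they are not taken from the study.

```python
from collections import Counter, defaultdict


def bagging_vote(predictions):
    """Return the class predicted by the most models (simple majority vote)."""
    return Counter(predictions).most_common(1)[0][0]


def boosting_vote(predictions, weights):
    """Return the class whose supporting models carry the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical predictions from five models for one unseen student record.
predictions = ["LOW", "HIGH", "LOW", "HIGH", "HIGH"]
# Hypothetical per-model weights, for example derived from each model's accuracy.
weights = [0.9, 0.3, 0.8, 0.2, 0.4]

bagged = bagging_vote(predictions)             # three of the five votes say HIGH
boosted = boosting_vote(predictions, weights)  # LOW totals 1.7, HIGH totals 0.9
```

With these example inputs the two rules disagree: the unweighted majority picks "HIGH", while the weighted vote picks "LOW" because the two models predicting "LOW" carry most of the weight.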

Results – Basic Statistical Facts
Academic Facts (bar chart): HIGH GRADE versus LOW GRADE values for ASSIGNMENT, QUIZ, PRESENTATION, MID TERM, FINAL SUBJECTIVE, and FINAL OBJECTIVE.

Results – Technical Facts
Algorithm’s Accuracy (bar chart): Boosting 96.4217, Bagging 94.984.

Results – Statistical Summary

Technical Bottom Line
The results show that the boosted tree outperforms the bagged tree when the standard deviations are in a higher range and the data size is small at the same time.

Suggestions
• Summarizing these facts, students need to improve their performance in subjective assessments such as “Mid Term” and “Final Subjective” in order to achieve “High Grades” and be promoted to the next semester. There should also be balance in the evaluation of students at different levels.

Future Work
In future work, other factors related to our research questions, such as financial and behavioral factors, will be included. These may lead to better classification and help answer new real-time questions from the educational environment. More facts can be captured by enlarging the dataset and including other variables that point to new directions in decision making. Other mining techniques can also be applied to discover patterns beyond classification.

Thank You.
[email protected]