Educational Data Mining to inspect low performance
academic areas of the students using ensemble classification
Khawar Shakeel
Department of Computer Science
University of Gujrat, Pakistan
Naveed Anwer Butt
Department of Computer Science
University of Gujrat, Pakistan
Presented by Khawar Shakeel
1
Outline
Introduction
• What is Educational Data Mining (EDM)?
• What problems can we solve using EDM?
• Stakeholders
Design Goal
Related Work
Design Approach
Experimental Results
Suggestions & Future Work
2
Educational Data Mining (EDM) - Introduction
What is Data Mining (DM)?
Data mining is a method for identifying hidden details in huge volumes
of raw data; such methods are applied when the data is large and little
is known about it in advance.
Educational Data Mining is a growing research area of Data Mining (DM)
that applies statistical methods to education-related data in order to
improve the systems and the quality of higher education institutions.
3
Possible Questions to Be Solved by EDM
How to predict students’ learning behavior?
How to group students according to their interests?
What are students’ strong and weak areas of study?
How to identify the students who need more help?
Which group(s) of students are likely to drop out or be promoted?
What kinds of educational resources need to be allocated, and why?
4
EDM - Stakeholders
Administration
Administrators use EDM to ensure that useful resources are allocated
to improve institutional education, and faculty and advisors are
becoming more proactive in identifying and addressing at-risk students.
Educators
Educators attempt to understand the learning process and the methods
they can use to improve their teaching.
Researchers
Researchers focus on developing data mining techniques and evaluating
their effectiveness.
5
Our Study - Design Goal
Design a predictive model capable of exploring the reason(s) for the
poor performance of the majority of students in specific course(s) or
domains, in order to inform the administration of the actions that need
to be taken.
Main tasks are:
Extraction of predictive attributes from the data source.
Identification of the attributes that may determine students’ learning behavior.
Construction of a prediction model based on the selected predictive variables using existing ensemble classification algorithms.
Reporting the findings to the administration.
6
Previous Work
Although data mining in education is not yet a mature field, a lot of
work has been done in this area because of its potential value to
educational institutions.
7
Previous Work
8
Previous Work
9
Design Approach
- Data Collection
Student Information System of the university
- Data Pre-processing
◦ Selection
◦ Cleaning
◦ Transformation
- Development of a model based on ensemble classification algorithms
◦ Bagging
◦ Boosting
(J48 decision tree algorithm as the base classifier)
- Useful patterns leading to better decision making
10
Proposed Design Overview
11
Design Approach – Data Collection
Secondary data was initially collected from the semester system
through the University Information System.
The targeted students are from Master’s and BS (HONS) degree programs
registered in different departments of all faculties of a public sector
university.
12
Design Approach – Data Introduction
The attributes examined are the students’ marks in each assessment
category of a course: assignment, quiz, presentation, midterm, final
subjective and final objective.
The final dataset for the model included 3130 instances and 7
variables.
13
Design Approach – Dataset
14
Design Approach – Data Preprocessing
Data Selection
Two departments were selected from each faculty. The extracted data
covers the 2008 and 2009 batches of the BS (HONS) programs and the
2010 and 2011 batches of the master’s degree programs.
Only academic activity values are recorded as variables; other student
information, such as demographic and financial data, is ignored.
Data Cleaning
Records of students with missing marks in any exam category of any
course were removed, because incomplete records can bias the
resulting decisions.
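The selection and cleaning steps can be sketched as follows. This is a minimal Python illustration, not the authors’ pipeline: the field names in `ACADEMIC_COLUMNS` and the toy records (including a `fee_paid` field standing in for non-academic information) are hypothetical, since the Student Information System schema is not given.

```python
# Hypothetical field names for the six assessment categories listed in the
# slides; the real Student Information System schema is not known.
ACADEMIC_COLUMNS = [
    "assignment", "quiz", "presentation",
    "midterm", "final_subjective", "final_objective",
]


def select_academic(record: dict) -> dict:
    """Data selection: keep only academic activity values and drop
    everything else (e.g. demographic or financial fields)."""
    return {col: record.get(col) for col in ACADEMIC_COLUMNS}


def is_complete(record: dict) -> bool:
    """True if the record has a mark for every assessment category."""
    return all(record.get(col) is not None for col in ACADEMIC_COLUMNS)


def preprocess(records: list) -> list:
    """Select the academic columns, then remove any record with a missing mark."""
    selected = (select_academic(r) for r in records)
    return [r for r in selected if is_complete(r)]


# Toy records: the second one has a missing quiz mark and is removed.
raw = [
    {"assignment": 8, "quiz": 7, "presentation": 9, "midterm": 20,
     "final_subjective": 30, "final_objective": 15, "fee_paid": True},
    {"assignment": 6, "quiz": None, "presentation": 7, "midterm": 15,
     "final_subjective": 22, "final_objective": 10, "fee_paid": False},
]
clean = preprocess(raw)
print(len(clean))  # 1
```

Dropping whole records rather than imputing the missing marks mirrors the cleaning rule described above.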
15
Design Approach – Data Preprocessing
A university teaches many kinds of courses, e.g. general, elective,
compulsory and core courses. A course grade is determined by the
grading range into which the obtained marks fall. Here, C, D and F
grades, which correspond to marks below 60, are treated as low grades
because they tend to affect a student’s GPA negatively.
The courses with a higher percentage of low grades were selected for
analysis, and for the selected courses the data of all students was
collected.
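A minimal sketch of the low-grade rule and the course selection, assuming each result is a `(course, total_marks)` pair with marks out of 100. The course codes and the `min_share` threshold are hypothetical; the slides do not state what percentage of low grades counted as “higher”.

```python
from collections import defaultdict

# C, D and F grades correspond to marks below 60 (out of 100).
LOW_GRADE_CUTOFF = 60


def is_low_grade(total_marks: float) -> bool:
    return total_marks < LOW_GRADE_CUTOFF


def low_grade_share(results):
    """Map each course to the fraction of its results that are low grades.

    `results` is an iterable of (course, total_marks) pairs.
    """
    totals = defaultdict(int)
    lows = defaultdict(int)
    for course, marks in results:
        totals[course] += 1
        if is_low_grade(marks):
            lows[course] += 1
    return {course: lows[course] / totals[course] for course in totals}


def select_courses(results, min_share):
    """Courses whose share of low grades is at least `min_share` (hypothetical threshold)."""
    shares = low_grade_share(results)
    return sorted(course for course, share in shares.items() if share >= min_share)


def students_in(results, courses):
    """For the selected courses, keep the results of all students."""
    wanted = set(courses)
    return [(course, marks) for course, marks in results if course in wanted]


# Toy results with hypothetical course codes.
results = [("CS-101", 45), ("CS-101", 55), ("CS-101", 72),
           ("MA-102", 80), ("MA-102", 58)]
selected = select_courses(results, min_share=0.6)
print(selected)                             # ['CS-101']
print(len(students_in(results, selected)))  # 3
```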
16
Our Approach – Ensemble Classification
17
Our Approach – Ensemble Classification
Ensemble classification techniques combine multiple classifiers in
order to obtain more reliable results. The most common model-combining
approaches in data mining are bagging and boosting.
Bagging has a voting structure in which n models, generally of the same
type, are built, each trained on a bootstrap sample (drawn with
replacement) of the training data. For an unseen instance, every model
makes a prediction, and the class that receives the majority vote among
these predictions is assigned.
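The bagging procedure can be sketched in plain Python. This is an illustration under simplifying assumptions, not the study’s actual setup: a one-split decision stump stands in for the J48 tree, and the toy marks and `HIGH`/`LOW` labels are made up.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Fit a one-split decision stump (a simple stand-in for J48) by trying
    every feature and every observed value as the threshold."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_pred = Counter(left).most_common(1)[0][0]
            right_pred = Counter(right).most_common(1)[0][0] if right else left_pred
            errors = (sum(lab != left_pred for lab in left)
                      + sum(lab != right_pred for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_pred, right_pred)
    _, feat, thr, left_label, right_label = best
    return lambda row: left_label if row[feat] <= thr else right_label


def bagging_fit(X, y, n_models=11, seed=0):
    """Train n models, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Every model votes; the majority class is assigned."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy data: one feature (e.g. a total mark) with made-up HIGH/LOW labels.
X = [[30], [40], [45], [50], [70], [75], [80], [90]]
y = ["LOW"] * 4 + ["HIGH"] * 4
models = bagging_fit(X, y)
print(bagging_predict(models, [20]), bagging_predict(models, [95]))
```

An odd number of models avoids ties between the two classes.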
Boosting is almost the same as bagging; only the model-building stage
changes. Instances that are repeatedly misclassified are allowed to
contribute more heavily when training subsequent models. There are
normally n classifiers, each weighted according to its accuracy, and the
class with the maximum total weight is assigned.
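Boosting can be sketched in the same spirit as two-class AdaBoost, one common boosting algorithm (the slides do not name the exact variant). Assumptions: labels are encoded as +1 (`HIGH`) and -1 (`LOW`), a weighted decision stump replaces J48, and the toy data is made up. Each round reweights the instances so that misclassified ones count more, and each classifier’s vote is weighted by `alpha`, which grows as its weighted error falls.

```python
import math


def stump_predict(row, f, t, polarity):
    """+polarity above the threshold, -polarity at or below it."""
    return polarity if row[f] > t else -polarity


def train_weighted_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, t, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(n_rounds):
        err, f, t, pol = train_weighted_stump(X, y, w)
        if err >= 0.5:
            break  # no better than chance on the weighted data
        err = max(err, 1e-10)  # a perfect stump would make alpha infinite
        alpha = 0.5 * math.log((1 - err) / err)  # classifier weight
        ensemble.append((alpha, f, t, pol))
        if err <= 1e-10:
            break  # training data already fitted perfectly
        # Misclassified instances (yi * h(x) = -1) gain weight, others lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote: the sign of the alpha-weighted sum of stump outputs."""
    score = sum(alpha * stump_predict(row, f, t, pol) for alpha, f, t, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data: +1 = HIGH grade, -1 = LOW grade (made-up marks).
X = [[30], [40], [45], [50], [70], [75], [80], [90]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
model = adaboost_fit(X, y)
print(adaboost_predict(model, [20]), adaboost_predict(model, [95]))  # -1 1
```

Because the toy data is separable by a single threshold, the loop stops after one round; on noisier data, later rounds concentrate on the instances that earlier stumps got wrong.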
18
Results – Basic statistical Facts
19
[Chart: Academic Facts. Marks of High Grade and Low Grade students in each assessment category: Assignment, Quiz, Presentation, Mid Term, Final Subjective, Final Objective]
20
Results – Technical Facts
Algorithm’s Accuracy (%)
Bagging: 94.984
Boosting: 96.4217
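As a quick arithmetic check, and assuming the reported accuracies (Bagging 94.984%, Boosting 96.4217%) are percentages of all 3130 instances, as in a cross-validated evaluation (the slides do not say how accuracy was measured), they correspond to whole numbers of correctly classified instances: 2973 for bagging and 3018 for boosting, a difference of 45 instances.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as the percentage of correctly classified instances."""
    return 100.0 * correct / total


N_INSTANCES = 3130  # size of the final dataset reported in the slides
reported = {"Bagging": 94.984, "Boosting": 96.4217}

for name, acc in reported.items():
    # Assumption: accuracy was measured over all 3130 instances.
    implied_correct = round(acc / 100 * N_INSTANCES)
    print(name, implied_correct, round(accuracy_percent(implied_correct, N_INSTANCES), 4))
# Bagging 2973 94.984
# Boosting 3018 96.4217
```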
21
Results – Statistical Summary
22
Technical Bottom Line
The results indicate that the boosted tree outperforms the bagged tree
when the standard deviations are in a higher range and, at the same
time, the dataset is small.
23
Suggestions
• Summarizing these facts, we conclude that students need to improve
their performance in subjective assessments such as the “Mid Term” and
“Final Subjective” in order to achieve “High Grades” and be promoted
to the next semester. There should also be balance in the evaluation
of students at different levels.
24
Future Work
As future work, other factors related to our research questions, such
as financial and behavioral factors, will be included; these may lead
to better classification and help answer new real-time questions from
the educational environment.
More facts can be captured by enlarging the dataset and including other
variables that open new directions in decision making. Other mining
techniques can also be applied to discover patterns beyond
classification.
25
 Thank You.
26