Imperial Journal of Interdisciplinary Research (IJIR)
Vol-2, Issue-12, 2016
ISSN: 2454-1362
Recognition of Slow Learners Using
Classification Data Mining Techniques
Mukesh Kumar1, Shankar Shambhu2 & Punam Aggarwal3
Chitkara University, HP (INDIA)
Smt. Aruna Asaf Ali Govt. P. G. College, Kalka (INDIA)
Abstract: Educational Data Mining (EDM) is used to predict the future learning behavior of students. It remains an active research topic for researchers who want to obtain better results from student prediction. The results of these techniques help teachers, management, and administrators to draft new rules and policies for improving educational standards and hence overall results and student retention. With this in mind, work has been done to identify the slow learners in a high school class and then provide timely help to them for improving their overall result. Many data mining techniques are available, but we select only those that are most widely used by researchers for result prediction: J48, REPTree, Naive Bayes, SMO, and Multilayer Perceptron. On the collected dataset, the Multilayer Perceptron classification algorithm gives 87.43% accuracy when the whole dataset is used as the training dataset, and SMO and J48 give 69.35% accuracy when 10-fold cross-validation is used.
Keywords: Data Mining, Educational
Prediction, Classification, Clustering.
1. Introduction
Educational data mining (EDM) is one of the applications of data mining. Data mining is used to find hidden patterns in a huge dataset and then apply those patterns to future decision making. Its application is not limited to education but also covers fields like sales, retail, transportation, sports, marketing, etc. In education, data mining techniques are used to predict slow learners, dropouts, under-performers, etc. and hence provide timely help to those students who face problems in education. EDM is also broadly applied to E-learning systems, classroom teaching, MOOC course learning, curriculum redesign analysis, distance education, etc. It is a process of finding knowledge in a database and then applying that knowledge for future improvement. It is also known as KDD (Knowledge Discovery in Databases). Many tools are available in the commercial market for mining purposes, like RapidMiner, WEKA, DBMiner,
interface for all techniques of mining.
Different researchers give their own definitions of education, but the overall question behind each study is the same: how can the overall education system be improved? Many factors affect the education system, such as the student, management, administrators, infrastructure, teachers, teaching methodology, basic facilities for boys and girls (like separate toilets for M/F), transportation facilities, etc. With the help of educational data mining techniques, detailed analyses are performed on these factors to find out which of them affect the education of the student and hence students
2. Liability of data mining in academics
As already noted, EDM plays a significant role in the overall development of education. With its help, the following questions can be answered:
I. Who are the weak students in a particular class?
II. Who is likely to be the dropout in the
III. Which subject students like most in their overall
IV. Which courses most attract the student in
V. Which attributes affect student education and hence performance?
VI. How can we help those students who are slow learners, under-performers, or at risk of dropping out of education?
VII. Most importantly, predicting the result of the student in the final examination.
3. Proposed work for this research
Fig 1. Use of Data Mining in Education setting
Education plays a crucial role in the development of society. If the education system and technology work together, they can bring remarkable growth to society. At present, technologies such as E-learning, MOOC courses, and smart classes have been introduced in schools, and they work well for the overall development of the student. Technologies like data mining have also been introduced in the education sector to predict students' future learning behaviors. With this perspective in mind, work has been done to predict the slow learners in a class and hence provide timely help for improving their final result. The major motives behind this work are:
I. Find a source of data collection for creating a dataset that contains predictive variables.
II. Select the best data mining techniques for the analysis of student performance.
III. After analysis of the dataset, identify those students who are slow learners and need immediate help in their studies.
IV. Observe those variables that most strongly influence the prediction of student academic performance.
V. At the end, compare the predictive results of all the techniques and choose the best classification algorithm for further improvement.
By the end of this paper, all these motives are fulfilled and a brief conclusion is given, which should help further research in this field of data mining.
4. Literature Survey: Background and
prior work in this area
Data mining is widely used in education, yet many researchers are still working on educational data mining techniques for the betterment of education. As already noted, it is a broad field and is not limited to the present topic, the prediction of slow learners in a class.
Fig 2: Data Mining Process to be taken under consideration
Han Jiawei and Micheline Kamber describe Educational Data Mining as a process of knowledge discovery in a huge database, which consists of the cleaning, integration, selection, and transformation of data and the pattern evaluation phases [1].
S. Weiss et al. explain data mining as search techniques that look for valuable information in a huge database and hence apply that information for better decision making [2].
William J. Frawley et al. and Tech. Forecast describe data mining as a detailed process of extracting useful patterns or information previously unknown to the database user. The extracted patterns or information may include association rules between variables, patterns between variables, etc. [3] [4].
S. Pal et al. use the linear regression technique for their analysis and find that factors like mother's education and family income of students affect their academic performance [5].
M. Bala and Dr. D. B. Ojha define EDM as techniques that help in finding unknown facts in a large database, which are impossible to find manually, so that this information can be used effectively in the education setting. It is used to increase the student retention rate, improve educational standards, and help administrators set new rules and regulations for improving educational standards [6].
Ying Zhang et al. note that, to extract useful information, data mining combines machine learning, visualization, and statistical approaches. Many techniques are adopted to collect the data for a dataset, such as questionnaires, feedback forms, interviews, and discussions. After all the data are collected, a dataset is prepared in the format required by the selected tool, and techniques such as Classification, Clustering, Linear Regression, Support Vector Machine, Decision Tree, Naive Bayes, and K-means are then applied. Student learning behaviors, course learning, student retention rate, course suitability, etc. are predicted using data mining techniques [7].
Cortez and Silva took twenty-nine attributes for prediction of the result in Mathematics and Portuguese. They applied data mining algorithms (Decision Tree, Neural Network, Support Vector Machine, and Random Forest) to a dataset of 788 students from two schools in the Alentejo region of Portugal. After analysis, they found that Decision Tree (DT) and Neural Network (NN) had 93% and 91% accuracy, respectively, in predicting the result as a two-class (pass/fail) problem [8].
In another case study, the authors analyze students' data to predict their future learning behaviour and hence the result. They also predict the student result, warn students who are at risk of failure in the final examination, and provide timely help to them [9].
M. Ramaswami et al. used CHAID prediction techniques to analyze the educational outcomes of students in higher secondary education and to find the interrelationships between the variables used for prediction. They used seven different class predictor variables for their experimentation [10].
Lars Schmidt-Thieme et al. applied machine learning algorithms for prediction, the results of which were further used to improve the academic performance of students. To deal with the problem of imbalanced data, they applied three different methods and found satisfactory results. After balancing the datasets, they used SVM for a small dataset and the Decision Tree algorithm for a larger dataset [11].
V. Ramesh et al. applied a survey methodology to build the final dataset with some significant student variables and, with an experimental methodology, tried to find only those variables that influence the final result of the student. They applied SMO, J48, REPTree, Naive Bayes, and Multilayer Perceptron techniques for their experimentation. After analysis, they found that factors like parents' occupation play a very important role in student performance [12].
Applying EDM techniques for knowledge discovery is important for teachers, management, and students, all of whom use this knowledge to improve the education system. Teachers use it to improve their teaching standards, and students use it to improve their learning skills. The management of the institution uses it to improve infrastructure standards, provide basic facilities to students, and support decision making.
5. Data collection and proposed
According to Han Jiawei and Micheline Kamber, EDM software should be developed in such a way that users can analyze student data along different dimensions and can categorize and summarize the desired results [2].
To complete this work, a survey was conducted among students, and a data mining tool was then used for analysis. Here the WEKA tool is used because it is open source software and almost all the data mining techniques are implemented in it. After a detailed survey and discussion with experts, some student-related attributes that most affect academic performance in high schools were selected. These attributes are also known as input variables for the analysis of the dataset.
The data were collected from two different high schools with the help of the survey method and then put into the file format required for the analysis.
Table-1: Selected Attributes of student taken for analysis purpose
Attribute                        Possible values
Types of High School             {Govt, Private, Govt_add}
Types of Education board         {State Board, CBSE, ICSE}
Medium of Instruction            {Hindi, English, Pahari}
Type of School                   {Boys, Girls, Co-education}
Gender of student                {Girls, Boys}
Private Tuition taken            {Yes, No}
Location of the school area      {Urban, Rural}
Internal Grade of student        {A, B, C, D, E, F}
Mobile Phone                     {Yes, No}
Computer at Home                 {Yes, No}
Internet access to student       {Yes, No}
Attendance in the school         {In % age out of attendance taken}
Eligible or Not Eligible         {E, NE}
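As an illustration of how the attributes in Table-1 could be checked and written to a CSV file before analysis, the following is a minimal Python sketch. It is not the procedure used in this work. The column names INT_GRD, INT_ACC and ATND appear in this paper; all other column names (HS_TYPE, BOARD, MEDIUM, SCH_TYPE, GENDER, PVT_TUI, LOC, MOB, COMP, CLASS) and the helper functions are hypothetical placeholders.

```python
import csv
import io

# Nominal attributes and their allowed values, taken from Table-1.
# Only INT_GRD, INT_ACC and ATND are names used in the paper; the
# remaining column names are hypothetical placeholders.
NOMINAL = {
    "HS_TYPE": {"Govt", "Private", "Govt_add"},
    "BOARD": {"State Board", "CBSE", "ICSE"},
    "MEDIUM": {"Hindi", "English", "Pahari"},
    "SCH_TYPE": {"Boys", "Girls", "Co-education"},
    "GENDER": {"Girls", "Boys"},
    "PVT_TUI": {"Yes", "No"},
    "LOC": {"Urban", "Rural"},
    "INT_GRD": {"A", "B", "C", "D", "E", "F"},
    "MOB": {"Yes", "No"},
    "COMP": {"Yes", "No"},
    "INT_ACC": {"Yes", "No"},
    "CLASS": {"E", "NE"},
}
# Column order for the CSV file: attendance (numeric) sits before the class.
COLUMNS = [name for name in NOMINAL if name != "CLASS"] + ["ATND", "CLASS"]


def validate(record):
    """Return a list of problems found in one student record (empty if valid)."""
    problems = []
    for name, allowed in NOMINAL.items():
        if record.get(name) not in allowed:
            problems.append(f"{name}: unexpected value {record.get(name)!r}")
    try:
        attendance = float(record.get("ATND", ""))
        if not 0.0 <= attendance <= 100.0:
            problems.append("ATND: must be a percentage between 0 and 100")
    except (TypeError, ValueError):
        problems.append("ATND: not a number")
    return problems


def to_csv(records):
    """Serialize the valid records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        if not validate(record):
            writer.writerow(record)
    return buffer.getvalue()
```

Records that fail validation are skipped here for brevity; in practice they would more likely be logged and corrected against the survey forms.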
Here, for implementation purposes, the CSV file format is used with the WEKA tool. WEKA is an open source software toolkit and supports most classification, clustering and association rule
6. Implementation of EDM techniques
on dataset
During this phase of work, we first preprocess our dataset with the help of the WEKA Preprocess feature on the tool interface.
Apply Filters on the dataset: For pre-processing of the dataset, filters are applied to remove those attributes that are not expected to be significant for the result prediction. After the filters are applied, the dataset is left with only eight attributes along with the class. The removed attributes are types of High School, types of education board, medium of instruction and type of school.
Find out the High Potential Attribute: After pre-processing of the dataset, the high potential attributes, which critically affect the overall dataset, are found with different attribute selection methods. In WEKA the attribute evaluator algorithms are CorrelationAttributeEval, GainRatioAttributeEval, PrincipalComponents, ReliefFAttributeEval, and SymmetricalUncertAttributeEval, which in turn use different search methods like BestFirst, GreedyStepwise and Ranker. In this work the whole dataset is used as the training dataset for attribute selection, and the 10-fold cross-validation method is not used because the dataset is small. After applying all the attribute evaluator algorithms, INT_GRD, INT_ACC and ATND are found to be the high potential attributes for the classification of the dataset with two class values. Most of the algorithms use the Ranker search to find the high potential attributes in the given dataset. The table below lists each algorithm with its search method and first-ranked attribute.
Table-2: High Potential attributes selection from the dataset
Ranker, BestFirst
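To make the idea of ranking attributes concrete, the following minimal Python sketch scores nominal attributes by gain ratio and sorts them from most to least informative. It is similar in spirit to combining a gain-ratio evaluator with a ranking search, but it is an independent illustration and not WEKA's implementation. The function names and the toy rows are assumptions, and a numeric attribute such as ATND would have to be discretized before it could be scored this way.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(values, labels):
    """Information gain of an attribute divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_attributes(rows, class_name):
    """Return (attribute, score) pairs sorted from most to least informative."""
    labels = [row[class_name] for row in rows]
    scores = {
        name: gain_ratio([row[name] for row in rows], labels)
        for name in rows[0]
        if name != class_name
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy rows (invented for illustration only, not taken from the survey data).
toy_rows = [
    {"INT_GRD": "A", "MOB": "Yes", "CLASS": "E"},
    {"INT_GRD": "A", "MOB": "No", "CLASS": "E"},
    {"INT_GRD": "F", "MOB": "Yes", "CLASS": "NE"},
    {"INT_GRD": "F", "MOB": "No", "CLASS": "NE"},
]
ranking = rank_attributes(toy_rows, "CLASS")
```

In the toy rows the internal grade separates the two classes perfectly while the mobile phone attribute carries no information, so INT_GRD is ranked first.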
At the end of this section, it is clear that only seven attributes in the dataset are useful for the prediction of the class attribute, and the rest of the attributes do not affect the overall result of the
7. Results of implementation
After completing the pre-processing task, the dataset is tested and analyzed with five well-known classification algorithms: Naive Bayes, SMO, J48, REPTree and Multilayer Perceptron. All these algorithms are tested with a 10-fold cross-validation check as well as with the full training data set. The correctly and incorrectly classified instances for the listed algorithms under 10-fold cross-validation are given in the table below:
Table-3: Correctly & Incorrectly Classified Instances using 10 fold validation check
Data Mining Techniques:  Multilayer Perceptron, Naive Bayes
Correctly Classified:    57.2864 %   67.3367 %   69.3467 %   69.3467 %   67.8392 %   67.8392 %   69.3467 %
Incorrectly Classified:  42.7136 %   32.6633 %   30.6533 %   30.6533 %   32.1608 %   32.1608 %   30.6533 %
Under the 10-fold validation check, classification algorithms like J48, SMO and ZeroR perform better than the other algorithms under consideration, with 69.3 percent of instances correctly classified. This is acceptable, as the baseline condition given by the ZeroR algorithm is also 69.3 percent.
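The ZeroR baseline referred to above always predicts the most frequent class, so its accuracy equals the share of the majority class in the data. The minimal Python sketch below shows this arithmetic. The class counts (138 majority and 61 minority instances out of 199) are an assumption chosen only because they reproduce the reported 69.3467 %; they are not stated in the text above, and which class is the majority is likewise assumed.

```python
from collections import Counter


def zero_r(train_labels):
    """ZeroR: always predict the most frequent class seen in training."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Percentage of predictions that match the actual labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Assumed class counts (not given in the paper): 138 + 61 = 199 instances.
labels = ["E"] * 138 + ["NE"] * 61
majority = zero_r(labels)
baseline = accuracy([majority] * len(labels), labels)
print(f"{baseline:.4f} %")  # prints "69.3467 %"
```

Any classifier whose accuracy matches this figure is doing no better on accuracy than always predicting the majority class, which is why ZeroR is a useful reference point.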
Fig - 3: Comparison of classification accuracy with the help of graph
The correctly and incorrectly classified instances for the listed algorithms, using the full dataset as the training dataset, are given in the table below:
Table-4: Correctly & Incorrectly Classified Instances using training data set
Data Mining Techniques:  Multilayer Perceptron, Naive Bayes
Correctly Classified:    87.4372 %   69.3467 %   69.3467 %   69.3467 %   72.3618 %   69.3467 %   69.3467 %
Incorrectly Classified:  12.5628 %   30.6533 %   30.6533 %   30.6533 %   27.6382 %   30.6533 %   30.6533 %
Using the full dataset as the training dataset, the Multilayer Perceptron classification algorithm performs exceptionally well, with 87.43 percent correctly classified instances.
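The two evaluation modes reported in Table-3 and Table-4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation code used in this work: the memorizing classifier and the toy data are invented, and the folds are plain contiguous blocks, whereas WEKA's own cross-validation shuffles and stratifies the folds.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_memorizer(rows, labels):
    """Toy classifier: memorize training rows, else predict the majority class."""
    table = dict(zip(rows, labels))
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: table.get(row, majority)


def training_accuracy(rows, labels):
    """Train and test on the same data (the mode used for Table-4)."""
    model = train_memorizer(rows, labels)
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def cross_val_accuracy(rows, labels, k=10):
    """Train on k-1 folds and test on the held-out fold (the mode of Table-3)."""
    correct = 0
    for fold in k_fold_indices(len(rows), k):
        held_out = set(fold)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_memorizer(train_rows, train_labels)
        correct += sum(model(rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


# Toy data (invented): 20 distinct rows, 14 labelled "E" and 6 labelled "NE".
toy_rows = [(i,) for i in range(20)]
toy_labels = ["E"] * 14 + ["NE"] * 6
train_acc = training_accuracy(toy_rows, toy_labels)
cv_acc = cross_val_accuracy(toy_rows, toy_labels, k=10)
```

On this toy data the memorizer scores 1.0 on its own training data but only 0.7 under cross-validation, which illustrates why accuracy measured on the training set can be higher than accuracy measured on held-out folds.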
Fig - 3: Comparison of classification accuracy with the help of graph
8. Conclusion and future scope
There are a lot of drawbacks in the education system, such as the use of the midterm evaluation system. It is really not
medium of instruction. Maybe these attributes also affect the performance of the student in education.
