International Journal of Conceptions on Computing and Information Technology
Vol. 3, Issue. 3, October’ 2015; ISSN: 2345 - 9808
UPCAnalysis: Predictive Analysis of the Examinees’
Outcome in UPCAT for Rosales National High
School, Philippines
Aldous Val D. Basco, Jerald F. Dacumos, Ma. Kristine A. Dolor, Lian Grace Y. Perez, Jennifer T. Zamora
College of Computer Studies
New Era University
No. 9 Central Avenue New Era, Quezon City, Philippines
{aldousval, iamjfd.z, mariakristine.dolor, liangraceperez}@gmail.com, and [email protected]
As the national university, the University of the Philippines
(UP) holds the crown as the nation's top school (QS, 2013)[4],
and a UP education is the pot of gold that students throughout
the nation yearn for. To be admitted to UP, however, a student
needs to pass the UPCAT, or UP College Admission Test (Sicat et
al., 2009)[2]. Unfortunately, not everybody reaches the end of
the rainbow and gets to the pot of gold. Because of UP's legacy
of excellence, leadership, and honor, and its breed of genuine
"Iskolars ng bayan", it is the top choice among college
entrance examinees.
Abstract— Although students from Rosales National High School
(RNHS) perform well, statistics show that few of them pass the
University of the Philippines College Admission Test (UPCAT).
The UPCAT comprises four subtests: Language Proficiency,
Science, Mathematics, and Reading Comprehension. The scores on
these subtests are combined with the weighted average of final
grades in the first three years of high school to determine
qualification into UP. Using data mining, this study formulates
a forecasting model to help RNHS predict the next UPCAT
passers.
This study compares data mining methods and techniques (Linear
Regression, Decision Tree, and Neural Networks) for predicting
future UPCAT passers. It applies them to data collected from
4th year students in the first, second, and third sections of
Rosales National High School, including their grades from 1st
to 3rd year and some personal information such as Student No.,
Name, Address, and General Weighted Average.
The study aims to determine future UPCAT results for Rosales
National High School. The researcher uses data mining
techniques to help predict the outcome of the University of the
Philippines College Admission Test (UP, 2013)[6]. Data mining
is the computer-assisted process of digging through and
analyzing large sets of data and then extracting the meaning of
the data (Alexander, 2015)[1]. It also predicts behaviors and
future trends, allowing organizations to make proactive,
knowledge-driven decisions. Data mining tools can also answer
business questions that were traditionally too time-consuming
to resolve. It is an interdisciplinary field with contributions
from many areas such as pattern recognition and bioinformatics
(Han and Kamber, 2000)[3]. Other predictive problems include
forecasting and modeling. Modeling is the act of building a
model from data on situations where the answer is known and
then applying the model to other situations where the answers
are not known. The researcher used data on fourth year students
who took the exam from 2008 to 2012. At the end of the study, a
forecasting model will be developed and used to predict whether
a student will pass or fail the UPCAT. The model can be
considered long term because the UPCAT is held once a year.
Keywords- Education, College entrance exam, UPCAT, Data
mining
I. INTRODUCTION
Education is a powerful driver of development and is one of the
strongest instruments for reducing poverty and improving
health, gender equality, peace, and stability (The World Bank,
2015)[5]. Education provides children, youth, and adults with
the knowledge and skills to be active citizens and to fulfill
themselves as individuals. Moreover, it can bring about change
and progress in society. It makes the people of a nation
civilized and well mannered, which is why many students aim to
study in a university with high standards.
A college entrance exam is a standardized aptitude test that
measures overall knowledge in different skill areas, such as
verbal, math, analytical, and writing skills. These tests are
not intended to measure what students have learned in school
but to indicate the performance of the student.
II. OBJECTIVES
This study aims to design a forecasting model that will
predict the outcome for the examinees from Rosales National
High School who will take the University of the Philippines
College Admission Test. It also aims to identify the indicators
to be considered in designing a forecasting model for the
UPCAT, as well as the significant relationships among the
variables. In addition, the study focuses on determining which
data mining technique gives the most accurate result and on the
level of acceptance of the predictive model to be developed.
An entrance examination is conducted by educational
institutions to determine whether prospective students are
qualified to enter. It is also used to determine the
candidate's preparation for a course of study. Research data
show that individually administered aptitude tests have the
following qualities: a) they are excellent predictors of future
scholastic performance, b) they provide ways of comparing an
individual's performance with that of others in the same
situation, c) they provide a profile of strengths and
weaknesses, d) they assess differences among individuals, e)
they uncover hidden talents in individuals, thus improving
their educational opportunities, and f) they serve as valuable
tools for working with the handicapped. GPA has been used as a
predictive factor of student success, alone or in combination
with other selective admission criteria (Stuenkel, 2009).
The study concentrates on high school students' performance to
determine whether a student is likely to pass the UPCAT and
which variables influence exam scores. The researcher uses the
grades and performance of previous UPCAT passers. The study is
limited to designing a forecasting model that helps analyze the
relationship between the performance rating and the dataset
drawn from the records of students who passed the exam from
school years 2008 to 2012 at Rosales National High School and
from the profiles of the current senior students.
With data mining techniques, a cycle is built into the
educational system that consists of forming hypotheses,
testing, and training. Its use can be directed to the various
participants in the educational process in accordance with
their specific needs: a) students; b) professors; and c)
administration and supporting administration.
III. RELATED WORKS
The following information, gathered from reference books,
journals, the internet, and other related materials, helped the
researcher forecast the students' outcome in the UPCAT.
The application of data mining in educational systems can be
directed to support the specific needs of each of the
participants in the educational process. Students can be
recommended additional activities, teaching materials, and
tasks that would favor and improve their learning. Professors
would receive feedback and the possibility to classify students
into groups based on their need for guidance and monitoring, to
find the most frequent mistakes, to find the most effective
actions, etc. Administration and administrative staff will
receive the parameters that will improve system performance
(Romero et al. 2007).
Data mining is the analysis of (often large) observational
data sets to find unsuspected relationships and to summarize
the data in novel ways that are both understandable and useful
to the data owner. Data mining typically deals with data that
have already been collected for some purpose other than the
data mining analysis (for example, they may have been
collected in order to maintain an up-to-date record of all the
transactions in a bank). This means that the objectives of the
data mining exercise play no role in the data collection
strategy. This is one way in which data mining differs from
much of statistics, in which data are often collected by using
efficient strategies to answer specific questions.
Simply stated, data mining refers to extracting or “mining”
knowledge from large amounts of data. The term is actually a
misnomer. Remember that the mining of gold from rocks or sand
is referred to as gold mining rather than rock or sand mining.
Thus, data mining should have been more appropriately named
“knowledge mining from data,” which is unfortunately somewhat
long. “Knowledge mining,” a shorter term, may not reflect the
emphasis on mining from large amounts of data. Nevertheless,
mining is a vivid term characterizing the process that finds a
small set of precious nuggets from a great deal of raw material
(Kamber et al. 2000).
IV. METHODOLOGY OF THE RESEARCH
Figure 1 shows the Architectural Design of the study.
Fig. 1. Architectural Design
Data mining is one component of the exciting area of
machine learning and adaptive computation. The goal of
building computer systems that can adapt to their
environments and learn from their experience has attracted
researchers from many fields, including computer science,
engineering, mathematics, physics, neuroscience, and
cognitive science.
The current senior profile will be divided into two
sections: External and Internal. External data consists of
Name, Address and Student Number. Internal data consists of
the grades from 1st year to 3rd year together with the grades of
UPCAT passers from 2008-2012. After saving the students’
data in a database, it will now be imported for data
preprocessing that will use data mining techniques.
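The split of a senior's profile into external and internal data can be pictured with a small Python sketch. The class names, field names, and sample values below are hypothetical, chosen only to mirror the description above; the study does not specify a schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ExternalData:
    """Identifying information: Name, Address, and Student Number."""
    student_number: str
    name: str
    address: str


@dataclass
class InternalData:
    """Grades from 1st to 3rd year, keyed by year level and then by subject."""
    grades: Dict[int, Dict[str, float]] = field(default_factory=dict)

    def subject_average(self, subject: str) -> float:
        """Average one subject over the year levels in which it appears."""
        values = [by_subject[subject]
                  for by_subject in self.grades.values()
                  if subject in by_subject]
        if not values:
            raise ValueError(f"no grades recorded for {subject!r}")
        return sum(values) / len(values)


@dataclass
class SeniorProfile:
    external: ExternalData
    internal: InternalData


# Made-up record, for illustration only.
sample = SeniorProfile(
    external=ExternalData("0000-0000", "Sample Student", "Sample Address"),
    internal=InternalData({
        1: {"Math": 85.0},
        2: {"Math": 90.0},
        3: {"Math": 95.0},
    }),
)
math_average = sample.internal.subject_average("Math")
```

Keeping the identifying fields apart from the grades makes it straightforward to pass only the internal data to the preprocessing step.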
Data mining (also called knowledge discovery or data discovery)
is the process of analyzing data from different perspectives
and summarizing it into useful information, information that
can be used to increase revenue, cut costs, or both. Data
mining is one of a number of analytical tools for analyzing
data. Linear Regression, Decision Tree, and Neural Networks are
the techniques used in forecasting UPCAT passers in Rosales
National High School. The researcher uses the three data mining
techniques to determine which forecasting model gives the most
accurate result.
Table 1 shows the summary for the Decision Tree in RapidMiner.
The table indicates that 83.64% of the instances were correctly
classified. The Class Precision column, also called positive
predictive value, gives the fraction of retrieved instances
that are relevant, while Class Recall, also called sensitivity,
is the fraction of relevant instances that are retrieved. Both
precision and recall are therefore based on an understanding
and measure of relevance: high precision means that an
algorithm returned substantially more relevant results than
irrelevant ones, while high recall means that it returned most
of the relevant results. As shown in Table 1, the precision in
the first row (Pred. P) is 7/11, or 63.64%, and the precision
in the second row (Pred. F) is 80/93, or 86.02%, which means
the model returned more relevant than irrelevant results for
both classes. The recall for the True P column is 7/20, or
35.00%, and for the True F column 80/84, or 95.24%. The model
therefore retrieves most of the actual failures but only about
a third of the actual passers.
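The figures in Table 1 can be checked with a short Python sketch. The four counts are read from the table (7 and 4 in the Pred. P row, 13 and 80 in the Pred. F row); the variable and function names are illustrative only and are not RapidMiner output.

```python
# Counts from Table 1: rows are predictions, columns are true classes.
pred_p_true_p = 7    # predicted Passed, actually Passed
pred_p_true_f = 4    # predicted Passed, actually Failed
pred_f_true_p = 13   # predicted Failed, actually Passed
pred_f_true_f = 80   # predicted Failed, actually Failed

total = pred_p_true_p + pred_p_true_f + pred_f_true_p + pred_f_true_f

# Precision: of everything predicted as a class, how much really is that class.
precision_p = pred_p_true_p / (pred_p_true_p + pred_p_true_f)
precision_f = pred_f_true_f / (pred_f_true_p + pred_f_true_f)

# Recall: of everything that really is a class, how much was predicted as it.
recall_p = pred_p_true_p / (pred_p_true_p + pred_f_true_p)
recall_f = pred_f_true_f / (pred_p_true_f + pred_f_true_f)

# Accuracy: correctly classified instances over all instances.
accuracy = (pred_p_true_p + pred_f_true_f) / total


def pct(value: float) -> str:
    """Format a fraction as a percentage with two decimals."""
    return f"{100 * value:.2f}%"


for name, value in [("precision P", precision_p), ("precision F", precision_f),
                    ("recall P", recall_p), ("recall F", recall_f),
                    ("accuracy", accuracy)]:
    print(f"{name}: {pct(value)}")
```

Computed this way, the overall accuracy comes from the 87 correctly classified instances out of 104.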
The researcher also used statistical analysis for this study.
In Business Intelligence (BI), statistical analysis involves
collecting and examining every data sample in a set of items
from which samples can be drawn.
Aside from data mining and statistical analysis, the researcher
used the Correlational Research Design, which attempts to
explore relationships in order to make predictions. The
Constructive Research Design is also used in the study because
it is a method that builds an artifact that solves a domain
problem in order to create knowledge about how problems can be
solved.
V. RESULTS AND DISCUSSION
Figure 3 shows the Neural Network Model built with the WEKA
data mining tool. In WEKA, the more attributes or perceptrons
the model used, the more accurate the results it provided.
Fig. 2. Decision Tree from RapidMiner
Fig. 3. Neural Network Model using WEKA
Figure 2 shows the decision tree, in which Math, English, and
Science are the most influential attributes in the model, in
that order. P means Passed and F means Failed. To interpret the
tree, start at the root and read through it until a leaf node
is reached.
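Reading a decision tree from the root down to a leaf can be sketched in Python as follows. The thresholds and the shape of the tree are hypothetical placeholders, not the tree in Fig. 2; only the attribute order (Math, then English, then Science) and the P/F leaf labels follow the description above.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # "P" (Passed) or "F" (Failed)


@dataclass
class Node:
    attribute: str               # subject tested at this node
    threshold: float             # hypothetical split value
    below: Union["Node", Leaf]   # branch taken when grade <= threshold
    above: Union["Node", Leaf]   # branch taken when grade > threshold


def classify(tree: Union[Node, Leaf], grades: Dict[str, float]) -> str:
    """Follow the branches from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        if grades[node.attribute] > node.threshold:
            node = node.above
        else:
            node = node.below
    return node.label


# Hypothetical tree: every threshold here is made up for illustration.
HYPOTHETICAL_TREE = Node(
    "Math", 88.0,
    below=Leaf("F"),
    above=Node(
        "English", 88.0,
        below=Leaf("F"),
        above=Node("Science", 88.0, below=Leaf("F"), above=Leaf("P")),
    ),
)

strong = classify(HYPOTHETICAL_TREE, {"Math": 92.0, "English": 90.0, "Science": 91.0})
weak = classify(HYPOTHETICAL_TREE, {"Math": 80.0, "English": 95.0, "Science": 95.0})
```

Each call tests one attribute per level, so the path taken through the tree is itself the explanation of the prediction.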
TABLE I. SUMMARY TABLE FOR DECISION TREE FROM RapidMiner
               True P    True F    Class Precision
Pred. P        7         4         63.64%
Pred. F        13        80        86.02%
Class recall   35.00%    95.24%
**accuracy: 83.64 +/- (mikro: 83.65%)
Fig. 4. Results of Neural Network using WEKA
Figure 4 illustrates the result of the model. It indicates
that 87.0149% of the instances are correctly classified, so
about 13% of the instances are misclassified. The mean absolute
error of the model is 0.1241, which indicates a low error rate.
Based on the accuracy rate, the 92.7894% efficiency of the
forecasting model for UPCAT passers means that the model can be
expected to give reliable results at that level of efficiency.
In other words, the model is expected to forecast the outcome
correctly for about 9 in 10 students, or roughly 92.79 of every
100 examinees.
TABLE II. COEFFICIENT TABLE FOR MULTIPLE LINEAR REGRESSION USING IBM SPSS
Model: Constant, FIL_AVE, ENG_AVE, MATH_AVE, SCI_AVE
Columns: Unstandardized Coefficients B, Standard Error, T Stat, P-value
19.204 19.934 -0.620 0.293 -0.411 0.431 0.248 0.253 0.382 0.317 0.232 0.424 0.293 0.287
0.371032 0.03884 0.083707 0.223149 0.157988
Dependent Variable: UPG_Converted
VI. CONCLUSION AND DISCUSSION
The significant predictors identified by the forecasting model
are the subjects English, Filipino, Mathematics, and Science,
which is plausible because these subjects correspond to the
areas covered by the UPCAT subtests. As a student's performance
in English, Science, Math, and Filipino increases, the
University Predicted Grade (UPG) increases as well. This means
that the UPG and the four variables are significantly
correlated.
In this research, three data mining algorithms were applied to
the assessment data to predict whether future UPCAT examinees
from Rosales National High School will pass or fail. The best
data mining technique for this study is Multiple Linear
Regression using IBM SPSS, as it provides the most accurate
results. The 92.7894% efficiency of the forecasting model means
that the model can be expected to give reliable results at that
level of efficiency. In other words, the model is expected to
forecast the outcome correctly for about 9 in 10 students, or
roughly 92.79 of every 100 examinees.
In Table II, column 1 shows the predictor variables (Constant,
FIL_AVE, ENG_AVE, MATH_AVE, and SCI_AVE). Multiple Linear
Regression uses several independent variables. With four
predictors, the regression equation is:
$$Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + B_4 X_4 \tag{1}$$
The first row, Constant, is the Y-intercept $B_0$: the height
of the regression line where it crosses the Y-axis, that is,
the value predicted for the dependent variable UPG_Converted
when every predictor is zero. The second column, the B values,
is used to predict the dependent variable from the independent
variables. The B coefficients of FIL_AVE, ENG_AVE, MATH_AVE,
and SCI_AVE indicate the predicted change in UPG_Converted for
every one-unit increase in the corresponding variable. The
efficiency of the forecasting model can be determined using the
standard errors and the formula for the Mean Absolute
Percentage Error.
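As a minimal sketch of how equation (1) turns a student's four subject averages into a predicted UPG_Converted, and of how a mean absolute percentage error could then be computed over a set of forecasts, consider the following Python example. The coefficient values, the sample numbers, and the function names are hypothetical placeholders chosen for illustration; they are not the values in Table II. The mapping of X1 to X4 onto FIL_AVE, ENG_AVE, MATH_AVE, and SCI_AVE is an assumption based on the order of the table rows.

```python
from typing import Sequence

# Hypothetical coefficients B0..B4, for illustration only (not the Table II values).
HYPOTHETICAL_B = [10.0, 0.2, 0.3, 0.25, 0.15]


def predict_upg(fil_ave: float, eng_ave: float, math_ave: float, sci_ave: float,
                coef: Sequence[float] = HYPOTHETICAL_B) -> float:
    """Equation (1): Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X4."""
    predictors = [fil_ave, eng_ave, math_ave, sci_ave]
    return coef[0] + sum(b * x for b, x in zip(coef[1:], predictors))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error: the absolute relative errors are summed,
    divided by the number of points n, and multiplied by 100."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up inputs: one student's averages, and two actual/forecast pairs.
predicted = predict_upg(90.0, 90.0, 90.0, 90.0)
error_pct = mape([100.0, 200.0], [90.0, 220.0])
```

With these placeholder inputs the two forecasts are each off by a tenth of the actual value, so the error measure comes out at ten percent.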
$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \tag{2}$$
where $A_t$ is the actual value and $F_t$ is the forecasted
value. The absolute value in this calculation is summed for
every fitted or forecasted point in time and divided by the
number of fitted points n; multiplying it by 100 makes it a
percentage error.
Future developers should use this model, as the study
formulated a forecasting model that can be extended with more
distinctive attributes to obtain more accurate results and to
help improve students' learning outcomes.
REFERENCES
[1] Doug Alexander, “Data Mining”, unpublished.
[2] Gerardo P. Sicat and Marian Panganiban, “High School Background and Academic Performance”, August 2009, in press.
[3] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques, 2nd edition”, 2000.
[4] QS Quacquarelli Symonds, “Top Universities”, 2013, unpublished.
[5] The World Bank Group, “Education”, 2015, unpublished.
[6] University of the Philippines, 2015.
[7] Novelozo, Diaz, “Predictive Analysis of Examinees outcome in UPCAT for Rosales National High School”, unpublished.