Neural Network Approach In Predicting Academic
Performance
ROSHIDI, DIN
ROBIZAH, SUDIN
School of Information Technology
Northern University of Malaysia
06020, Sintok, Kedah
MALAYSIA
Abstract: - This paper is mainly concerned with academic achievement analysis and the application of neural networks to academic performance prediction. The system is trained to approximate a mathematical function that uses previous students' results to predict future student achievement. A Multi-Layer Perceptron (MLP) is used as the neural network model, with a backpropagation learning algorithm and the hyperbolic tangent and sigmoid functions as transfer functions. The training data are based not only on the students' results but also incorporate some technical indicators.
Key-words: - Academic performance, Prediction, Preprocessing, Multi-Layer Perceptron.
1 Introduction
Excellent student performance has always been a major issue in education. If students' performance can be predicted, proper planning can be undertaken, especially by the school, to ensure better performance in the future. It is about time to find and develop a new mechanism for this complex nonlinear forecasting problem. Forecasting, that is, predicting future events based on historical data [6], is a very common problem in human life, and intelligent forecasting is a very tough task [3]; with neural network models, effective predictive applications can be developed [7]. Many studies have found that neural networks are capable of forecasting and give better results than statistical and traditional analytical methods [1][2][10][12]. In addition, what makes a neural network superior to traditional statistical methods in forecasting is that it is better able to recognize high-level features of a training set, such as intra-correlation or serial correlation, and is well suited to prediction tasks [9][11]. It is believed that neural networks are also applicable in areas related to education.
The goal of this study is to develop a neural network model for forecasting students' Malaysian Certificate of Education (MCE) results. The model is trained using a feed-forward multi-layer perceptron structure with the backpropagation learning algorithm. The study is also intended to examine the applicability of a neural network model to a prediction task. The hypothesis is that the use of neural networks will produce an accurate model of a student's exam score performance based on other exam scores. The study involves a three-stage process. The first stage concerns the decisions made on what the input variables and learning parameters should be. The second stage is training and testing the network. The neural network model is formed from the learning process carried out on previous data: the network is trained by repeatedly taking in each input (example data, or a set of training facts), guessing at its output, and comparing the results to the supplied output [4]. The network is trained using a subset of the data until the average error between the forecast and the actual values is reduced to a minimum [2]. Once a neural network has been adequately trained, it is tested to see how well it has learned. Several important network parameters are identified to form a model that delivers the best performance. The last stage is running the network, which means presenting it with new inputs with no known outputs [4]. This study looks at the possibility of presenting the network with data and determining the resulting behavior of the network.
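As a sketch of this train-until-the-error-is-small cycle, the following loop is illustrative only: the network object and its predict and train_on methods are assumed names, not an interface from the paper.

```python
# Illustrative training loop: present each training fact, compare the
# network's guess with the supplied output, adjust the weights, and stop
# once the average squared error over the training subset is small enough.
def train(network, training_set, max_epochs=1000, tolerance=1e-3):
    for _ in range(max_epochs):
        total_error = 0.0
        for inputs, target in training_set:
            guess = network.predict(inputs)                 # forward pass
            total_error += sum((t - g) ** 2 for t, g in zip(target, guess))
            network.train_on(inputs, target)                # weight update
        if total_error / len(training_set) < tolerance:
            break                                           # adequately trained
    return network
```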
2 Methods
In this study, data were preprocessed and transformed into the type and format accepted by the neural network. Preprocessing makes the data more meaningful [15], and the way the application data are represented and ordered determines the efficiency and possibly the accuracy of the results [16]; preprocessing methods are therefore critical to the neural network's performance. Preprocessing included data cleansing, data representation, data scaling and normalization, data randomization and data segmentation. Only after all of these processes had been carried out were the genuine data ready for training and testing. The minimum amount of data required for training is 10(M + N), where M is the number of inputs and N is the number of outputs. Another rule of thumb is that the number of training cases should be 10 times the number of model weights. The subjects' points (i.e. results), the student gender and the year were defined as the input vectors to the network, and the overall achievements of the students (i.e. the overall grades) were defined as the output vector of the network.
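As an illustration of the first rule of thumb: with the M = 8 input neurons and N = 5 output neurons eventually adopted for this model (see Table 6), at least 10(8 + 5) = 130 training cases would be required.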
The raw data in the training set came in different formats, some of which were not acceptable or understandable by the neural network. Consequently, the inappropriate data had to be transformed into a network-understandable format, in particular numeric form, because representing the data in a meaningful way is an essential stage in the successful application of a neural approach [5][15]. The student gender was converted into a numerical code, 0 representing male and 1 representing female, a form meaningful to a neural network. Since the natural range of the data differed from the network's operating range, the numeric data were normalized or scaled [15] after data representation. Furthermore, the data were converted to numbers so that it was easy for the neural network to do its job, which usually requires only scaling and normalization [8]. In addition, neural networks understand only numbers that fall within the range of the neuron activation function, usually 0 to 1, or -1 to +1 [4]. After that, the data were normalized to make sure all input variables were treated equally and to encourage faster convergence; this is very important to ensure that the statistical distribution of values for each net input and output is roughly uniform. The result (points) obtained for each subject taken by a student was normalized as new x value = (old x value / maximum value); the maximum value was used in the formula to limit the maximum value to unity [15]. From the training set, the year attribute was scaled into the range 0.1 to 0.9. Scaling the variable between 0.1 and 0.9 limits the portion of the sigmoid activation function used in the representation of the variable, in order to avoid "network paralysis" in the training process [15]. The formula used to scale the variable was [0.8/(x_max - x_min)]x + [0.9 - 0.8 x_max/(x_max - x_min)]. Without such adjustment, the data could not be presented to the network and the network might not be able to generalize. To avoid poor results, the training set was then randomized. We split the training set into several segments to ensure that the neural network performed at its best and to ensure generalization [14]. Basically, the genuine data (Data Set A) were segmented as follows:
- 80% of the data was allocated for training
- Another 10% for validation
- And the remaining 10% was for testing
In order to determine the best allocation for data segmentation, a different split was applied to the same data. This data set (referred to as Data Set B) was segmented in such a way that:
- 70% of the data was allocated for training
- Another 20% for validation
- And the remaining 10% was for testing
After that, both data sets (Data Set A and Data Set B) were trained and tested sequentially and randomly. From this, the best result was produced and the best-fitting neural network model could be identified. After obtaining the best-fitting neural network model, the network was ready to run a real prediction. A small group of data from the preprocessed data was presented to the network, and the network was expected to produce a prediction for the student grade. In this operational phase, the network was more efficient, since it was no longer subjected to the iterative process of training.
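The preprocessing steps above can be summarized in a short Python sketch. The record layout, field names and maximum-point parameter below are assumptions for illustration; the gender encoding, the point normalization, the year-scaling formula and the 80/10/10 split of Data Set A follow the text.

```python
import random

def preprocess(records, max_point, year_min, year_max):
    """Turn raw student records into numeric vectors and split them.

    Each record is assumed to look like:
        {"gender": "M" or "F", "year": int, "points": [per-subject points]}
    """
    span = year_max - year_min
    dataset = []
    for r in records:
        gender = 0.0 if r["gender"] == "M" else 1.0   # 0 = male, 1 = female
        # Subject points normalized to at most unity: new_x = old_x / max
        points = [p / max_point for p in r["points"]]
        # Year scaled into [0.1, 0.9]:
        #   [0.8/(x_max - x_min)]x + [0.9 - 0.8*x_max/(x_max - x_min)]
        year = (0.8 / span) * r["year"] + (0.9 - 0.8 * year_max / span)
        dataset.append([gender, year] + points)
    random.shuffle(dataset)                           # data randomization
    # Data Set A segmentation: 80% training, 10% validation, 10% testing
    n = len(dataset)
    return (dataset[: int(0.8 * n)],
            dataset[int(0.8 * n): int(0.9 * n)],
            dataset[int(0.9 * n):])
```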
3 Experimental Result
The input patterns were trained and tested both sequentially and randomly. Table 1 summarizes the results of the learning and testing phases. Data Set A with random testing gave the best result, i.e. the highest percentage of correct testing (76.45%, marked * in Table 1). This shows that the random approach is much better than the sequential approach.

Experimental  Data  Best Two        Average Training  Average Testing
Approach      Set   Hidden Neurons  Result (%)        Result (%)
Sequential    A     4, 9            75.00, 77.61      71.11, 75.11
Sequential    B     4, 6            69.21, 73.21      70.67, 69.33
Random        A     4, 5            75.83, 75.43      76.45*, 76.00
Random        B     5, 8            75.43, 76.95      75.11, 76.00

Table 1: Results from the learning and testing phases for different approaches and data sets
Hence, the two best numbers of hidden neurons were found to be 4 and 5. Of these two values, the better one would be chosen as the most fitting number of hidden neurons for the neural network structure. Networks with 4 and 5 hidden neurons were then tested with 10 different weight initializations; the values obtained are presented in Table 2. The results in Table 2 show that the best number of hidden neurons is 5, since its average percentage of correct testing (75.51%) is higher than that for 4 (72.13%). From this point onwards, the best-fitting neural network model therefore employed 5 hidden neurons in the hidden layer of the three-layered network structure.
         Hidden neurons = 4   Hidden neurons = 5
Weight   Average   Average    Average   Average
Seed     Training  Testing    Training  Testing
         Result    Result     Result    Result
1        75.83     76.45      75.83     76.00
2        75.22     76.00      76.00     77.33
3        76.05     76.89      76.33 *   76.89
4        61.56     60.44      75.61     76.44
5        76.17     77.33      73.56     72.44
6        72.50     73.33      71.89     72.45
7        72.11     71.11      76.22     76.89
8        69.22     68.45      76.11     76.89
9        69.06     68.00      73.50     74.22
10       72.89     73.33      75.17     75.56
AVERAGE  72.06     72.13      75.02     75.51

Table 2: Results from the training and testing phases for different values of weight seed
Experiments were then performed to choose the optimum value of the learning coefficient. Starting at 0.1, the experiments were repeated up to a value of 1.0. The most effective learning coefficient is the one that gave the highest testing percentage with the lowest training percentage.
Learning     Average Training  Average Testing
Coefficient  Result (%)        Result (%)
0.1          75.61             75.56
0.2          75.78             75.56
0.3          75.61             76.00
0.4          75.67             75.56
0.5          75.45             75.56
0.6          76.50             75.56
0.7          76.05             76.00
0.8          76.00             75.56
0.9          75.83             76.00
1.0          72.56             72.44

Table 3: Results from the training and testing phases for different learning coefficients
It can be seen from Table 3 that a learning coefficient of 0.3 was the best choice.
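Such a parameter sweep is straightforward to express in code. In the sketch below, train_and_score is a hypothetical helper that trains the network once with a given learning coefficient and returns its (training %, testing %) pair; the selection rule is the one stated above.

```python
# Hypothetical sweep over learning coefficients 0.1 .. 1.0, picking the
# value with the highest testing percentage, breaking ties by the lowest
# training percentage (this reproduces the choice of 0.3 from Table 3).
def pick_learning_coefficient(train_and_score):
    candidates = [round(0.1 * k, 1) for k in range(1, 11)]
    results = {lc: train_and_score(lc) for lc in candidates}
    return max(results, key=lambda lc: (results[lc][1], -results[lc][0]))
```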
Momentum     Average Training  Average Testing
Coefficient  Result (%)        Result (%)
0.1          75.61             76.00
0.2          75.61             76.00
0.3          75.67             75.56
0.4          75.50             75.56
0.5          75.06             75.56
0.6          76.50             76.00
0.7          75.61             75.56
0.8          75.72             75.56
0.9          77.00             77.33
1.0          69.28             71.11

Table 4: Results from the training and testing phases with different momentum coefficients
From Table 4, we found that the best momentum coefficient for the neural network model is 0.9, with which the network was able to predict 77.33% of the test cases correctly. Training was stopped after 85% of the data had been trained. Table 6 summarizes all the parameter values used to build the best neural network model.
PARAMETER                               RESULT
Problem type                            Prediction
Network architecture                    Multi-layer perceptron with backpropagation learning algorithm
Training pattern                        Data Set A (80% training, 10% validation, 10% testing); random
Number of neurons in the input layer    8
Number of neurons in the output layer   5
Number of hidden layers                 1
Number of neurons in the hidden layer   5
Error descent method                    Steepest descent
Weights updating method                 Update after each pattern
Learning coefficient                    0.3
Momentum coefficient                    0.9
Stopping criteria                       85.00%
Transfer function                       Sigmoid function
Correctness of training                 77.00%
Correctness of testing                  77.33%

Table 6: Overall Network Configuration
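To make the configuration in Table 6 concrete, here is a minimal NumPy sketch of such a network: an 8-5-5 sigmoid MLP trained by steepest descent with per-pattern updates, learning coefficient 0.3 and momentum 0.9. The weight initialization range and the squared-error criterion are assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """8 inputs, one hidden layer of 5 sigmoid neurons, 5 sigmoid outputs."""

    def __init__(self, n_in=8, n_hidden=5, n_out=5, lr=0.3, momentum=0.9, seed=1):
        rng = np.random.default_rng(seed)           # the "weight seed"
        self.W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)
        self.lr, self.mu = lr, momentum
        # Previous updates, kept for the momentum term
        self.dW1, self.db1 = np.zeros_like(self.W1), np.zeros_like(self.b1)
        self.dW2, self.db2 = np.zeros_like(self.W2), np.zeros_like(self.b2)

    def predict(self, x):
        self.h = sigmoid(self.W1 @ x + self.b1)     # hidden activations
        return sigmoid(self.W2 @ self.h + self.b2)  # output activations

    def train_on(self, x, target):
        """One steepest-descent step, updating after each pattern."""
        y = self.predict(x)
        delta2 = (y - target) * y * (1 - y)         # output-layer error term
        delta1 = (self.W2.T @ delta2) * self.h * (1 - self.h)
        self.dW2 = -self.lr * np.outer(delta2, self.h) + self.mu * self.dW2
        self.db2 = -self.lr * delta2 + self.mu * self.db2
        self.dW1 = -self.lr * np.outer(delta1, x) + self.mu * self.dW1
        self.db1 = -self.lr * delta1 + self.mu * self.db1
        self.W2 += self.dW2; self.b2 += self.db2
        self.W1 += self.dW1; self.b1 += self.db1
```

Rerunning such a network over the ten weight seeds of Table 2 then amounts to constructing MLP(seed=s) for s = 1..10 and training each on the same data.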
To test the accuracy of the model, 30 records were fed into the finished model. Six records from each group of input patterns were selected randomly from the total sample data, as depicted in Table 7. Table 7 also shows the total number of correctly predicted grades made by the neural network model compared with the actual grades. The summary shows that the model can easily predict students with grade 1 and grade 2. However, the model does have problems, especially in predicting the General Grade (GG).
MCE    Occurrence of  Occurrence of correct
Grade  actual grade   predicted grade
1      6              6
2      6              6
3      6              2
GG     6              1
Fail   6              3
Total  30             18

Table 7: Summary of actual and correct predicted grades
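The two columns of Table 7 can be computed with a few lines of code; actual and predicted below are assumed to be parallel lists of grade labels.

```python
from collections import Counter

def per_grade_summary(actual, predicted):
    """For each grade, count its occurrences and the correct predictions."""
    occurrences = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {grade: (occurrences[grade], correct[grade]) for grade in occurrences}
```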
Nevertheless, Table 8 shows that the estimated grades from the neural network are very close to the actual grades. In other words, the forecasting output from a neural network is accurate, with an acceptable amount of error. This study shows that a neural network can be trained on students' data to predict their achievement in the examination.
Year  Actual grade  Forecasted grade
1999  3             3
1999  3             3
1999  3             3
1999  Fail          Fail
1999  1             1
1999  Fail          Fail
1999  2             2
1999  Fail          GG
1999  GG            Fail
1999  2             2

Table 8: Comparison of actual and forecasted grades
4 Conclusion and Recommendation
Based on the results, there is enough evidence to suggest that the application of neural networks to predicting students' performance is worth the effort. The grade achieved is based not just on the core subjects but also on the elective subjects taken by the student. Building a neural network model of student performance is not easy, however, due to the large variation among students. In prediction problems like predicting student performance, or even school performance, the choice of relevant inputs is the major issue: obtaining inputs that contain as much relevant information as possible is more difficult than presenting this information to the neural network.
A future research possibility is to include more predictive variables to improve the forecasts further and to determine the grade-forecasting efficiency of a neural network. For instance, factors like the student's ethnicity and religion, or the student's previous achievement in major examinations, could play important roles in determining the student's overall achievement. Another potential area for future research would be to perform the result forecast, using a neural network, for different types of school. Schools are also located in different areas: some are in rural areas and others in urban areas. Therefore these factors, i.e. the type and location of a school, could also contribute to the students' performance.
References:
[1] Adya, M. and Collopy, F., How Effective Are Neural Networks at Forecasting and Prediction? A Review and Evaluation, Journal of Forecasting, Vol. 17, Issue 5-6, 1998, pp. 481-495.
[2] Aiken, M., Competitive Intelligence Through Neural Networks, Competitive Intelligence Review, Vol. 10, Issue 1, 1999, pp. 49-53.
[3] Angstenberger, J., Neural Networks and Their Applications, John Wiley & Sons Ltd, 1996.
[4] Awad, E. M., Building Expert Systems: Principles, Procedures, and Applications, West Publishing Company, 1996.
[5] Beale, R. and Edwards, A. D. N., Neural Networks and Pattern Recognition in Human-Computer Interaction, Ellis Horwood, 1992.
[6] Blum, A., Neural Networks in C++: An Object-Oriented Framework for Building Connectionist Systems, John Wiley & Sons Inc, 1992.
[7] Ding, X., Canu, S., and Denoeux, T., Neural Networks and Their Applications, John Wiley & Sons Ltd, 1996.
[8] Eklund, P., Fuzzy Sets, Neural Networks, and Soft Computing, Van Nostrand Reinhold, 1994.
[9] Huang, Chia-Sheng, An Information Technology Approach to Production and Operations Decision Making in Dairy Processing, Dissertation (PhD), The University of Wisconsin, 1999.
[10] Indro, D. C., Jiang, C. X., Patuwo, B. E., and Zhang, G. P., Predicting Mutual Fund Performance Using Artificial Neural Networks, Omega, Vol. 27, Issue 3, 1999, pp. 373-380.
[11] Law, R., Room Occupancy Rate Forecasting: A Neural Network Approach, International Journal of Contemporary Hospitality Management, Vol. 10, Issue 6, 1998, pp. 234-239.
[12] Mahmood, M. A., Sullivan, G. L., and Tung, R., A New Approach to Evaluating Business Ethics: An Artificial Neural Networks Application, Journal of End User Computing, Vol. 11, Issue 3, 1999, pp. 11-19.
[13] Uysal, M. and El Roubi, M. S., Artificial Neural Networks Versus Multiple Regression in Tourism Demand Analysis, Journal of Travel Research, Vol. 38, Issue 2, 1999, pp. 111-118.
[14] Roberts, S. J. and Penny, W., Neural Networks: Friends or Foes?, Sensor Review, Vol. 17, No. 1, 1997, pp. 64-70.
[15] Tsoukalas, L. H. and Uhrig, R. E., Fuzzy and Neural Approaches in Engineering, John Wiley & Sons Inc, 1997.
[16] Turban, E., Expert Systems and Applied Artificial Intelligence, Macmillan Publishing Company, 1992.