Neural Network Approach in Predicting Academic Performance

ROSHIDI DIN, ROBIZAH SUDIN
School of Information Technology
Northern University of Malaysia
06020 Sintok, Kedah
MALAYSIA

Abstract: This paper is mainly concerned with academic achievement analysis and the application of neural networks to academic performance prediction. The system is trained to approximate a mathematical function from previous students' results in order to predict students' future achievement. A Multi-Layer Perceptron (MLP) is used as the neural network model, with a backpropagation learning algorithm and tangent and sigmoid transfer functions. The training data are based not only on the students' results but also incorporate some technical indicators.

Key-words: Academic performance, Prediction, Preprocessing, Multi-Layer Perceptron.

1 Introduction

Excellent student performance has always been a major issue in education. If students' performance can be predicted, proper planning can be undertaken, especially by the school, to ensure better performance in the future. It is about time to find and develop a new mechanism for this complex nonlinear forecasting problem.

Forecasting is a very common problem in human life: predicting future events based on historical data [6]. Intelligent forecasting is a very tough task [3], but with neural network models effective predictive applications can be developed [7]. Many studies have found that neural networks are capable of forecasting and give better results than statistical and traditional analytical methods [1][2][10][12]. What makes a neural network superior to traditional statistical methods in forecasting is that it is better able to recognize high-level features of a training set, such as intracorrelation or serial correlation [11], and is therefore well suited to prediction tasks [9]. It is believed that neural networks are also applicable in areas related to education.

The goal of this study is to develop a neural network model for forecasting students' Malaysia Certification of Education (MCE) results. The model is trained using a feed-forward multilayer perceptron structure with a backpropagation learning algorithm. The study also examines the applicability of a neural network model to a prediction task. The hypothesis is that the use of neural networks will produce an accurate model of a student's exam score based on other exam scores.

Typically, this study involves a three-stage process. The first stage covers the decisions on what the input variables and learning parameters should be. The second stage is training and testing the network. The neural network model is formed by learning from the historical data: the network is trained by repeatedly taking in each input (example data, or a set of training facts), guessing at its output, and comparing the result with the supplied output [4]. The network is trained on a subset of the data until the average error between the forecast and the actual values is reduced to a minimum [2]. Once the network has been adequately trained, it is tested to see how well it has learned, and several important network parameters are identified to form the model that delivers the best performance. The last stage is running the network, which means presenting it with new inputs that have no known outputs [4]. This study thus looks at the possibility of presenting the network with data and determining the network's resulting behavior.
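To make the guess-compare-adjust training cycle above concrete, the sketch below shows online backpropagation for a one-hidden-layer sigmoid MLP in Python. It is a minimal illustration, not the authors' implementation; the function name, the weight-initialization range, the stopping threshold and the omission of bias terms are all simplifying assumptions of ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, Y, n_hidden=5, lr=0.3, momentum=0.9, epochs=500, seed=1):
    """Train a one-hidden-layer sigmoid MLP by online backpropagation.

    X: (n_samples, n_inputs) array of preprocessed inputs.
    Y: (n_samples, n_outputs) array of targets (e.g. one-hot grades).
    Bias terms are omitted for brevity.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))  # input -> hidden
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, Y.shape[1]))  # hidden -> output
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
    for _ in range(epochs):
        sq_err = 0.0
        for x, y in zip(X, Y):                      # update after each pattern
            h = sigmoid(x @ W1)                     # forward pass
            o = sigmoid(h @ W2)                     # the network's "guess"
            e = y - o                               # compare with supplied output
            sq_err += float(e @ e)
            delta_o = e * o * (1 - o)               # output-layer error signal
            delta_h = (W2 @ delta_o) * h * (1 - h)  # backpropagated to hidden layer
            dW2 = lr * np.outer(h, delta_o) + momentum * dW2
            dW1 = lr * np.outer(x, delta_h) + momentum * dW1
            W2 += dW2
            W1 += dW1
        if sq_err / len(X) < 1e-3:                  # stop once average error is small
            break
    return W1, W2
```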
2 Methods

In this study, the data were preprocessed and transformed into the type and format accepted by the neural network, because preprocessing makes the data more meaningful [15], and the way application data are represented and ordered determines the efficiency, and possibly the accuracy, of the results [16]. Since preprocessing is critical to the neural network's performance, it comprised data cleansing, data representation, data scaling and normalization, data randomization and data segmentation. Only after all of these processes had been carried out were the data truly ready for training and testing. As a rule of thumb, the minimum amount of data required for training is 10(M + N), where M is the number of inputs and N is the number of outputs; another rule of thumb is that the number of training cases should be 10 times the number of model weights.

The subjects' points (i.e. results), the student's gender and the year were defined as the input vector to the network, and the overall achievement of the student (i.e. the overall grade) was defined as the output vector. The raw data in the training set came in different formats, some of them neither acceptable nor understandable by the neural network. Consequently, the inappropriate data had to be transformed into a network-understandable format, in particular numeric form, because representing the data in a meaningful way is an essential stage in the successful application of a neural approach [5][15]. The student's gender was converted into a numerical code, 0 representing male and 1 representing female, a form meaningful to a neural network.

Since the natural range of the data differed from the network's operating range, the numeric data were normalized or scaled [15] after the representation step. Converting the data to numbers makes the neural network's job easy, and this usually takes only the form of scaling and normalization [8]; in addition, neural networks understand only numbers that fall within the range of the neuron activation function, usually 0 to 1 or -1 to +1 [4]. The data were therefore normalized to make sure all input variables were treated equally and to encourage faster convergence. This is very important to ensure that the statistical distribution of values for each net input and output is roughly uniform. The points obtained for each subject taken by the student were normalized as new X value = old X value / maximum value, where dividing by the maximum limits the largest value to unity [15]. The year attribute of the training set was scaled into the range 0.1 to 0.9; scaling between 0.1 and 0.9 limits the portion of the sigmoid activation function used in representing the variable, in order to avoid "network paralysis" during training [15]. The formula used to scale the variable was

    scaled x = [0.8 / (xmax - xmin)] x + [0.9 - 0.8 xmax / (xmax - xmin)]
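The representation and scaling rules above can be illustrated with a few short helper functions. Only the two formulas come from the text; the function names and the example year range are hypothetical.

```python
def encode_gender(gender):
    """Represent gender numerically: 0 for male, 1 for female."""
    return 0 if gender == "male" else 1

def normalize_points(x, x_max):
    """Normalize a subject's points so the largest value is 1:
    new X value = old X value / maximum value."""
    return x / x_max

def scale_year(x, x_min, x_max):
    """Scale the year attribute into [0.1, 0.9] to avoid network paralysis."""
    return (0.8 / (x_max - x_min)) * x + (0.9 - 0.8 * x_max / (x_max - x_min))

# Example (hypothetical range): years 1995..1999 map to 0.1 ... 0.9.
assert abs(scale_year(1995, 1995, 1999) - 0.1) < 1e-9
assert abs(scale_year(1999, 1995, 1999) - 0.9) < 1e-9
```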
Without such adjustment, data could not be presented to the network and the network might not be able to generalize. To avoid poor results, the training set was randomized. The data were also split into several segments to ensure that the neural network performed at its best and generalized well [14]. Basically, the data (Data Set A) were segmented as follows:

- 80% of the data was allocated for training
- another 10% for validation
- and the remaining 10% for testing

In order to determine the best allocation, a second segmentation (referred to as Data Set B) was also applied to the data:

- 70% of the data was allocated for training
- another 20% for validation
- and the remaining 10% for testing

After that, both data sets (Data Set A and Data Set B) were trained and tested sequentially and randomly. From these runs the best result was produced and the best-fitting neural network model was identified. Once the best-fitting model had been obtained, the network was ready to run a real prediction: a small group of records from the preprocessed data was presented to the network, which was expected to produce a prediction of the student's grade. In this operational phase the network was more efficient, since it was no longer subjected to the iterative process of training.

3 Experimental Result

The input patterns were trained and tested both sequentially and randomly. Table 1 summarizes the results of the learning and testing phases. In each cell, the two values correspond to the two best hidden-neuron counts, and * marks the best overall result. Data Set A with random testing gave the best result, i.e. the highest percentage of correct testing, which shows that the random approach is much better than the sequential approach.

Experimental Approach | Data Set | Best Two Hidden Neurons | Average Training Result (%) | Average Testing Result (%)
Sequential | A | 4, 9 | 75.00, 77.61 | 71.11, 75.11
Sequential | B | 4, 6 | 69.21, 73.21 | 70.67, 69.33
Random | A | 4, 5 | 75.83, 75.43 | 76.45*, 76.00
Random | B | 5, 8 | 75.43, 76.95 | 75.11, 76.00

Table 1: Results from the learning and testing phases for the different approaches and data sets

Hence, the two best numbers of hidden neurons were found to be 4 and 5. Of these two values, the better one would be chosen as the number of hidden neurons for the neural network structure. Networks with 4 and 5 hidden neurons were therefore each tested with 10 different weight initializations; the values obtained are presented in Table 2.

Weight Seed | Avg Training (%), 4 neurons | Avg Testing (%), 4 neurons | Avg Training (%), 5 neurons | Avg Testing (%), 5 neurons
1 | 75.83 | 76.45 | 75.83 | 76.00
2 | 75.22 | 76.00 | 76.00 | 77.33
3 | 76.05 | 76.89 | 76.33* | 76.89
4 | 61.56 | 60.44 | 75.61 | 76.44
5 | 76.17 | 77.33 | 73.56 | 72.44
6 | 72.50 | 73.33 | 71.89 | 72.45
7 | 72.11 | 71.11 | 76.22 | 76.89
8 | 69.22 | 68.45 | 76.11 | 76.89
9 | 69.06 | 68.00 | 73.50 | 74.22
10 | 72.89 | 73.33 | 75.17 | 75.56
AVERAGE | 72.06 | 72.13 | 75.02 | 75.51

Table 2: Results from the training and testing phases for different values of the weight seed

The results in Table 2 show that the best number of hidden neurons is 5, since its average percentage of correct testing is higher than that of 4. From this point onwards, the best-fitting neural network model therefore employed 5 hidden neurons in the hidden layer of the three-layer network structure.
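A sweep like the one behind Table 2 could be scripted as follows. This is a hypothetical sketch, not the authors' code: it reuses the train_mlp sketch from the Introduction, and X_train, Y_train, X_test and Y_test are assumed to be the preprocessed, segmented arrays described in Section 2.

```python
import numpy as np

def evaluate(W1, W2, X, Y):
    """Percentage of patterns whose predicted class matches the target class."""
    h = 1.0 / (1.0 + np.exp(-(X @ W1)))
    o = 1.0 / (1.0 + np.exp(-(h @ W2)))
    return 100.0 * float(np.mean(o.argmax(axis=1) == Y.argmax(axis=1)))

# Networks with 4 and 5 hidden neurons, each retrained with weight seeds 1..10.
results = {}
for n_hidden in (4, 5):
    scores = [evaluate(*train_mlp(X_train, Y_train, n_hidden=n_hidden, seed=s),
                       X_test, Y_test)
              for s in range(1, 11)]
    results[n_hidden] = sum(scores) / len(scores)   # average testing result (%)

best_hidden = max(results, key=results.get)         # 5, per Table 2
```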
Experiments were then performed to choose the optimum value of the learning coefficient, starting at 0.1 and repeating the experiment in steps of 0.1 up to 1.0. The most effective learning coefficient is the one that gives the highest testing percentage with the lowest training percentage.

Learning Coefficient | Average Training Result (%) | Average Testing Result (%)
0.1 | 75.61 | 75.56
0.2 | 75.78 | 75.56
0.3 | 75.61 | 76.00
0.4 | 75.67 | 75.56
0.5 | 75.45 | 75.56
0.6 | 76.50 | 75.56
0.7 | 76.05 | 76.00
0.8 | 76.00 | 75.56
0.9 | 75.83 | 76.00
1.0 | 72.56 | 72.44

Table 3: Results from the training and testing phases for different learning coefficients

It can be seen from Table 3 that a learning coefficient of 0.3 was the best choice. The same procedure was repeated for the momentum coefficient.

Momentum Coefficient | Average Training Result (%) | Average Testing Result (%)
0.1 | 75.61 | 76.00
0.2 | 75.61 | 76.00
0.3 | 75.67 | 75.56
0.4 | 75.50 | 75.56
0.5 | 75.06 | 75.56
0.6 | 76.50 | 76.00
0.7 | 75.61 | 75.56
0.8 | 75.72 | 75.56
0.9 | 77.00 | 77.33
1.0 | 69.28 | 71.11

Table 4: Results from the training and testing phases for different momentum coefficients

From Table 4, the best momentum coefficient for the neural network model is 0.9, with which the network was able to predict 77.33% of the test cases correctly. Training should stop after 85% of the data has been trained. Table 6 summarizes all the parameter values used to build the best neural network model.

PARAMETER | RESULT
Problem type | Prediction
Network architecture | Multi-layer perceptron with backpropagation learning algorithm
Training pattern | Data Set A (80%, 10%, 10%: training, validation, testing); random
Number of neurons in the input layer | 8
Number of neurons in the output layer | 5
Number of hidden layers | 1
Number of neurons in the hidden layer | 5
Error descent method | Steepest descent
Weight updating method | Update after each pattern
Learning coefficient | 0.3
Momentum coefficient | 0.9
Stopping criterion | 85.00%
Transfer function | Sigmoid function
Correctness of training | 77.00%
Correctness of testing | 77.33%

Table 6: Overall network configuration
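For readers who would rather reproduce a comparable model with an off-the-shelf library, the configuration in Table 6 maps roughly onto scikit-learn's MLPClassifier as sketched below. The mapping is ours, not the authors' software; in particular, the 85% stopping criterion has no direct counterpart, so max_iter and the default tolerance stand in for it.

```python
from sklearn.neural_network import MLPClassifier

# Approximate equivalent of the Table 6 configuration.
model = MLPClassifier(
    hidden_layer_sizes=(5,),   # one hidden layer with 5 neurons
    activation="logistic",     # sigmoid transfer function
    solver="sgd",              # steepest descent
    batch_size=1,              # weights updated after each pattern
    learning_rate_init=0.3,    # learning coefficient
    momentum=0.9,              # momentum coefficient
    max_iter=1000,             # stand-in for the paper's stopping criterion
    random_state=1,
)
# model.fit(X_train, y_train) would take the 8 preprocessed inputs per record
# and the grade label (1, 2, 3, GG or Fail) as the target.
```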
For testing the accuracy of the model, 30 records were fed into the finished model: six records from each group of input patterns, selected randomly from the total sample, as depicted in Table 7. Table 7 also shows the number of grades correctly predicted by the neural network model in comparison with the actual grades. The summary shows that the model can easily predict students with grade 1 and grade 2; however, it has problems, especially in predicting the General Grade (GG).

MCE Grade | Occurrences of actual grade | Occurrences of correctly predicted grade
1 | 6 | 6
2 | 6 | 6
3 | 6 | 2
GG | 6 | 1
Fail | 6 | 3
Total | 30 | 18

Table 7: Summary of actual and correctly predicted grades

Nevertheless, Table 8 shows that the estimated grades from the neural network are very close to the actual grades; in other words, the forecasting output of the neural network is accurate, with an acceptable amount of error. This study shows that a neural network can be trained on students' data to predict their achievement in the examination.

Year | Actual Grade | Forecasted Grade
1999 | 3 | 3
1999 | 3 | 3
1999 | 3 | 3
1999 | Fail | Fail
1999 | 1 | 1
1999 | Fail | Fail
1999 | 2 | 2
1999 | Fail | GG
1999 | GG | Fail
1999 | 2 | 2

Table 8: Comparison of actual and forecasted grades

4 Conclusion and Recommendation

Based on the results, there is enough evidence to suggest that applying a neural network to predicting students' performance is worth the effort. The grade achieved depends not only on the core subjects but also on the elective subjects taken by the student. Building a neural network model of student performance is, however, not easy, given the large variation among students.

In a prediction problem such as predicting student performance, or even school performance, the choice of relevant inputs is the major issue; obtaining inputs that contain as much relevant information as possible is more difficult than presenting that information to the neural network. A future research possibility is to include more predictive variables, both to improve the forecasts further and to determine the grade-forecasting efficiency of a neural network. For instance, factors such as the student's ethnicity and religion, or the student's previous achievement in major examinations, could play important roles in determining the student's overall achievement. Another potential area for future research would be to forecast results, using a neural network, for different types of school. Schools are also located in different areas, some rural and some urban, so the type and location of the school could likewise contribute to the students' performance.

References:

[1] Adya, M. and Collopy, F., How Effective Are Neural Networks at Forecasting and Prediction? A Review and Evaluation, Journal of Forecasting, Vol. 17, Issue 5-6, 1998, pp. 481-495.
[2] Aiken, M., Competitive Intelligence Through Neural Networks, Competitive Intelligence Review, Vol. 10, Issue 1, 1999, pp. 49-53.
[3] Angstenberger, J., Neural Networks and Their Applications, John Wiley & Sons Ltd, 1996.
[4] Awad, E. M., Building Expert Systems: Principles, Procedures, and Applications, West Publishing Company, 1996.
[5] Beale, R. and Edwards, A. D. N., Neural Networks and Pattern Recognition in Human-Computer Interaction, Ellis Horwood, 1992.
[6] Blum, A., Neural Networks in C++: An Object-Oriented Framework for Building Connectionist Systems, John Wiley & Sons Inc, 1992.
[7] Ding, X., Canu, S., and Denoeux, T., Neural Networks and Their Applications, John Wiley & Sons Ltd, 1996.
[8] Eklund, P., Fuzzy Sets, Neural Networks, and Soft Computing, Van Nostrand Reinhold, 1994.
[9] Huang, Chia-Sheng, An Information Technology Approach to Production and Operations Decision Making in Dairy Processing, PhD dissertation, The University of Wisconsin, 1999.
[10] Indro, D. C., Jiang, C. X., Patuwo, B. E., and Zhang, G. P., Predicting Mutual Fund Performance Using Artificial Neural Networks, Omega, Vol. 27, Issue 3, 1999, pp. 373-380.
[11] Law, R., Room Occupancy Rate Forecasting: A Neural Network Approach, International Journal of Contemporary Hospitality Management, Vol. 10, Issue 6, 1998, pp. 234-239.
[12] Mahmood, M. A., Sullivan, G. L., and Tung, R., A New Approach to Evaluating Business Ethics: An Artificial Neural Networks Application, Journal of End User Computing, Vol. 11, Issue 3, 1999, pp. 11-19.
[13] Uysal, M. and El Roubi, M. S., Artificial Neural Networks Versus Multiple Regression in Tourism Demand Analysis, Journal of Travel Research, Vol. 38, Issue 2, 1999, pp. 111-118.
[14] Roberts, S. J. and Penny, W., Neural Networks: Friends or Foes?, Sensor Review, Vol. 17, No. 1, 1997, pp. 64-70.
[15] Tsoukalas, L. H. and Uhrig, R. E., Fuzzy and Neural Approaches in Engineering, John Wiley & Sons Inc, 1997.
[16] Turban, E., Expert Systems and Applied Artificial Intelligence, Macmillan Publishing Company, 1992.