MS Thesis

IDENTIFY STUDENT’S GRADE AND DROPOUT ISSUE

Submitted By: Mehreen Rehman
Reg#: 671-FBAS/MSCS/S12

Supervisor: Ms. Zakia Jalil, Assistant Professor
Co-Supervisor: Sadia Arshid

Department of Computer Science & Software Engineering,
Faculty of Basic & Applied Sciences,
International Islamic University, Sector H-10, Islamabad
2014

FINAL APPROVAL

It is certified that we have read the project titled “Identify Student’s Grade and Dropout Issue” submitted by Ms. Mehreen Rehman (671-FBAS/MSCS/S12), and that it is accepted in its present form by the Department of Computer Science and Software Engineering, Faculty of Basic & Applied Sciences, International Islamic University Islamabad, as satisfying the thesis requirements for the degree of MS in Computer Science.

Committee

Internal Examiner
Lecturer, Department of Computer Science, International Islamic University, Islamabad.

External Examiner
Lecturer, Department of Computer Science, International Islamic University, Islamabad.

Supervisor
Ms. Zakia Jalil
Assistant Professor, Department of Computer Science, International Islamic University, Islamabad.

______________________________________________________________________________
Identify Student’s Grade and Dropout Issue

A thesis submitted to the Department of Computer Science and Software Engineering, International Islamic University, Islamabad, in partial fulfillment of the requirements for the award of the degree of MS Computer Science.

DECLARATION

I hereby declare that “Identify Student’s Grade and Dropout Issue”, neither as a whole nor as a part thereof, has been copied from any source. I have developed this project and the accompanying report entirely on the basis of my personal efforts, made under the sincere guidance of my supervisor.
No portion of the work presented in this report has been submitted in support of any application for any other degree or qualification of this or any other university or institution of learning.

Mehreen Rehman
671-FBAS/MSCS/S-12

ACKNOWLEDGEMENTS

All praise and gratitude to Almighty Allah, the most merciful and glorious, who granted me the potential to work hard and the perseverance to accomplish this research work.

I would like to extend special thanks to my supervisor, Ms. Zakia Jalil, Assistant Professor, who always provided me with the greatest support and help whenever I needed it throughout my research work. She was readily available for consultation, shared her knowledge with me, and motivated me to complete this research work successfully.

I cannot forget the support of my affectionate parents, who always wished and prayed for my success and provided me with financial and moral support throughout my life. I would also like to thank all my respected teachers, sincere friends and all those people in the faculty who helped me during this research project.

Mehreen Rehman
671-FBAS/MSCS/S-12

PROJECT IN BRIEF

PROJECT TITLE : Identify Student’s Grade and Dropout Issue
UNIVERSITY : Department of Computer Science & Software Engineering, International Islamic University, Islamabad.
UNDERTAKEN BY : Mehreen Rehman, 671-FBAS/MSCS/S12
SUPERVISED BY : Ms. Zakia Jalil, Assistant Professor, Department of Computer Science & Software Engineering, International Islamic University, Islamabad
CO-SUPERVISED BY : Sadia Arshid, Assistant Professor, Department of Computer Science & Software Engineering, International Islamic University, Islamabad
TOOLS USED : WEKA; MS Office 2007 for documentation & presentation
OPERATING SYSTEM : Windows 7 (64-bit)
SYSTEM USED : HP Pavilion dv5, Intel(R) Core(TM) 2 Duo CPU P7350 @ 2.00 GHz, 4 GB RAM
START DATE : May 2014
COMPLETION DATE : Oct 2014

Abstract

Educational data mining is used to study the data available in the educational field and to bring out the knowledge hidden in it; this knowledge can be extracted with data mining techniques. Each university has its own criteria for course evaluation. At International Islamic University, each course is marked out of 100: 40% for internal evaluation and 60% for the final exam. The 40 internal marks consist of quizzes (10), the midterm exam (20) and assignments (10). I will derive variables from the internal data to predict student performance in the final exam. The university's admission merit is based on academic qualification (40%) and an admission test (60%). The admission test is divided into five sections: series/sequences, quantitative, logic, analytical and English. HSSC and entry test results are used to predict which students will drop out. In this research, the classification task is used to identify students' grades and the dropout issue, and the decision tree method is used for data classification. This task extracts knowledge that describes students' performance in the final exam. It helps to identify dropout students and students who need special attention, and allows the teacher to provide appropriate advising.

Keywords: Educational data mining, Data Mining.
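The marking scheme and the admission merit described above are simple weighted sums. The following Python sketch illustrates the arithmetic; the function names and the example marks are hypothetical and are not taken from the university's systems.

```python
# Hypothetical helpers illustrating the marking scheme and admission merit
# described in the abstract; names and example values are illustrative only.

# Maximum marks per component: internal 40 (quiz 10, mid 20, assignment 10) + final 60.
MAX_MARKS = {"quiz": 10, "mid": 20, "assignment": 10, "final": 60}


def course_total(quiz: float, mid: float, assignment: float, final: float) -> float:
    """Return total course marks out of 100 after checking each component's range."""
    marks = {"quiz": quiz, "mid": mid, "assignment": assignment, "final": final}
    for name, value in marks.items():
        if not 0 <= value <= MAX_MARKS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_MARKS[name]}, got {value}")
    internal = quiz + mid + assignment  # continuous evaluation, out of 40
    return internal + final             # out of 100


def admission_merit(academic_percent: float, test_percent: float) -> float:
    """Admission merit: 40% weight on academic qualification, 60% on the admission test."""
    return 0.4 * academic_percent + 0.6 * test_percent


print(course_total(quiz=8, mid=15, assignment=9, final=42))       # 74
print(admission_merit(academic_percent=80.0, test_percent=70.0))  # 74.0
```

The internal total (here 8 + 15 + 9 = 32 out of 40) is the part available before the final exam, which is why it can serve as a predictor of final-exam performance.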
Table of Contents

1. INTRODUCTION
1.1 Data Mining
1.2 Classification Methods
1.3 Clustering
1.4 Prediction
1.5 Association rule
1.6 Neural networks
1.7 Nearest Neighbor Method
1.8 Decision Tree
1.9 Bayesian Classification
1.10 Research Objective
2. Literature Review
2.1 Analyze Students’ Performance
2.2 A Prediction for Student's Performance
2.3 A Prediction for Performance Improvement
2.4 A Prediction for Performance Improvement of Engineering Students
2.5 Classification to Decrease Dropout Rate of Students
2.6 MED to Reduce Dropout Rates
2.7 Improving the Student’s Performance
2.8 Study of Factors Analysis Affecting Academic Achievement
2.9 EDM for Predicting the Performance of Students
2.10 A Prediction of Performer or Underperformer Using Classification
2.11 The Student Performance Analysis and Prediction
2.12 Predicting Student Performance Using ID3 and C4.5
2.13 Predicting Graduate Employment
2.14 Evaluation of Student Performance
2.15 Literature Survey Concept Matrix
3. Problem Statement
4. Proposed Solution
4.1 WEKA
4.2 Data Collection for Identifying Students' Grades
4.3 Data Collection for Identifying the Dropout Issue
4.4 Implementation
5. Experiments
5.1 ID3 Decision Tree
5.2 Data Set
5.3 Data Selection and Transformation
5.4 Implementation of Mining Model
5.5 Decision Tree
5.6 The ID3 Decision Tree
5.7 Impurity Measurement
5.8 Entropy
5.9 Information Gain
5.10 ID3 Algorithm
5.11 C4.5
6. Discussion on Results
6.1 Identifying Student Grades
6.2 Identifying the Dropout Issue
7. Conclusion
8. References

List of Tables

Table 1: Conduction of research
Table 2: Student-related variables
Table 3: Student-related variables
Table 4: Data sets of English file

List of Figures

Figure 1: Data mining
Figure 2: Classification
Figure 3: Clustering
Figure 4: Regression
Figure 5: Association rule
Figure 6: Nearest Neighbor Method
Figure 7: Decision tree
Figure 8: English file in WEKA
Figure 9: C4.5 result
Figure 10: Evaluation on test split
Figure 11: Classifier visualization
Figure 12: Dropout file in WEKA
Figure 13: Evaluation on test split
Figure 14: Run information
Figure 15: C4.5 result
1. INTRODUCTION

1.1 Data Mining

Data mining is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data. Data mining techniques are applied to large volumes of data to discover hidden patterns and relationships that are helpful in decision making. A data mining algorithm is a well-defined procedure that takes data as input and produces output in the form of models or patterns. The term well-defined indicates that the procedure can be precisely encoded as a finite set of rules.

Figure 1: Data mining

The new emerging field [1] called Educational Data Mining (EDM) is concerned with developing methods that discover knowledge from data originating in educational environments. Educational data mining is used to identify and enhance educational processes, which can improve their decision-making process. Key uses of EDM include predicting student performance and studying learning in order to recommend improvements to current educational practice. EDM can be regarded as one of the learning sciences as well as an area of data mining.

1.2 Classification Methods

Classification methods such as decision trees and Bayesian networks can be applied to educational data to predict students' performance in examinations. The prediction helps to identify weak students and to help them score better marks. The ID3 (Iterative Dichotomiser 3), C4.5, CART and ADT (Alternating Decision Tree) decision tree algorithms are applied to students' data to predict their performance in the end-of-year exam.
Figure 2: Classification

1.3 Clustering

Clustering can be described as the identification of similar classes of objects. Using clustering techniques we can further identify dense and sparse regions in object space and discover the overall distribution pattern and correlations among data attributes. Classification can also be used as an effective means of distinguishing groups or classes of objects, but it becomes costly, so clustering can be used as a preprocessing step for attribute subset selection and classification.

Figure 3: Clustering

1.4 Prediction

Regression techniques can be adapted for prediction. Regression analysis can be used to model the relationship between one or more independent variables and dependent variables. In data mining, the independent variables are attributes that are already known, and the response variables are what we want to predict. Unfortunately, many real-world problems are not simply prediction. Therefore, more complex techniques (e.g., logistic regression, decision trees or neural nets) may be necessary to forecast future values. The same model types can often be used for both regression and classification.

Figure 4: Regression

For example, the CART (Classification and Regression Trees) decision tree algorithm can be used to build both classification trees (to classify categorical response variables) and regression trees (to forecast continuous response variables). Neural networks, too, can create both classification and regression models.

1.5 Association rule

Association and correlation analysis is generally used to find frequent itemsets in large data sets. This kind of finding helps businesses make certain decisions, such as catalogue design, cross-marketing and customer shopping behaviour analysis.
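Association rules are commonly scored by support (how often the items occur together) and confidence (how often the consequent holds when the antecedent does). The following Python sketch computes both for a rule X → Y; the "transactions" are invented student-attribute sets used purely for illustration, not data from this thesis.

```python
# Minimal sketch: support and confidence of an association rule X -> Y.
# The transactions below are invented, illustrative data.
from typing import FrozenSet, List


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate P(rhs | lhs) as support(lhs ∪ rhs) / support(lhs)."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0
    return support(lhs | rhs, transactions) / lhs_support


transactions = [
    frozenset({"low_quiz", "low_mid", "fail"}),
    frozenset({"low_quiz", "low_mid", "pass"}),
    frozenset({"high_quiz", "high_mid", "pass"}),
    frozenset({"low_quiz", "low_mid", "fail"}),
]

rule_lhs, rule_rhs = frozenset({"low_quiz", "low_mid"}), frozenset({"fail"})
print(support(rule_lhs | rule_rhs, transactions))     # 0.5
print(confidence(rule_lhs, rule_rhs, transactions))   # about 0.667
```

Here the rule {low_quiz, low_mid} → {fail} holds in two of the three transactions that contain its antecedent, so its confidence is 2/3, a value below one.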
Figure 5: Association rule

Association rule algorithms need to be able to generate rules with confidence values less than one. However, the number of possible association rules for a given dataset is generally very large, and a high proportion of the rules are usually of little (if any) value.

1.6 Neural networks

A neural network is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class labels of the input tuples. Neural networks have a remarkable ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. They are well suited to continuous-valued inputs and outputs. Neural networks are best at identifying patterns or trends in data and are well suited to prediction or forecasting needs.

1.7 Nearest Neighbor Method

The nearest neighbor method classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). It is sometimes called the k-nearest neighbor (k-NN) method.

Figure 6: Nearest Neighbor Method

1.8 Decision Tree

A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision. Decision trees are commonly used for gaining information for the purpose of decision making. A decision tree starts with a root node from which users take actions. From this node, each node is split recursively according to a decision tree learning algorithm.
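The recursive splitting described above can be sketched as a minimal ID3-style learner over categorical attributes: at each node it chooses the attribute with the highest information gain, the criterion ID3 uses. This is an illustrative Python sketch, not the WEKA implementation used in this thesis; the attribute names ("mid", "assignment") and the six toy records are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Record = Dict[str, str]
Tree = Union[str, Dict[str, Dict[str, "Tree"]]]  # leaf label or {attr: {value: subtree}}


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows: List[Record], attr: str, target: str) -> float:
    """Entropy of the target minus the weighted entropy after splitting on `attr`."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def id3(rows: List[Record], attrs: List[str], target: str) -> Tree:
    """Grow the tree recursively, splitting on the attribute with the highest gain."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:   # pure node -> leaf
        return labels[0]
    if not attrs:               # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, remaining, target)
    return {best: branches}


def classify(tree: Tree, record: Record) -> str:
    """Follow branches from the root to a leaf (raises KeyError for unseen values)."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[record[attr]]
    return tree


students = [
    {"mid": "good", "assignment": "yes", "grade": "pass"},
    {"mid": "good", "assignment": "no", "grade": "pass"},
    {"mid": "good", "assignment": "no", "grade": "pass"},
    {"mid": "poor", "assignment": "yes", "grade": "pass"},
    {"mid": "poor", "assignment": "no", "grade": "fail"},
    {"mid": "poor", "assignment": "no", "grade": "fail"},
]
tree = id3(students, ["mid", "assignment"], "grade")
print(tree)
print(classify(tree, {"mid": "poor", "assignment": "yes"}))  # pass
```

On this toy data "mid" has the higher gain (about 0.46 bits versus 0.25 for "assignment"), so it becomes the root, and "assignment" is used only to split the impure "poor" branch.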
The final result is a decision tree in which each branch represents a possible decision scenario and its outcome. The three widely used decision tree learning algorithms are ID3, ASSISTANT and C4.5.

Figure 7: Decision tree

a) ID3

ID3 (Iterative Dichotomiser 3) is a decision tree algorithm introduced in 1986 by Ross Quinlan [1]. ID3 uses the information gain measure to choose the splitting attribute. It only accepts categorical attributes when building a tree model. It does not give accurate results when there is noise, so a preprocessing technique must be used to remove the noise. Continuous attributes can be handled with ID3 either by discretizing them or directly, by considering their values to find the best split point as a threshold on the attribute values. ID3 does not support pruning.

b) C4.5 algorithm

The C4.5 algorithm is a successor to ID3 developed by Ross Quinlan [2]. C4.5 handles both categorical and continuous attributes to build a decision tree. To handle continuous attributes, C4.5 splits the attribute values into two partitions based on a selected threshold, such that all values above the threshold form one child and the remaining values form the other child. It also handles missing attribute values. C4.5 uses the gain ratio as its attribute selection measure, which removes the bias of information gain towards attributes with many outcome values. First, the gain ratio of each attribute is calculated; the root node is the attribute whose gain ratio is the maximum. C4.5 uses pessimistic pruning to remove unnecessary branches from the decision tree and so improve classification accuracy.

1.9 Bayesian Classification

The Naïve Bayes classifier technique is particularly suited to problems where the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.
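To make the idea concrete, the following is a minimal categorical Naive Bayes sketch in Python: it multiplies (as a sum of logs) the class prior by the per-attribute likelihoods under the naive independence assumption and picks the most probable class. Laplace (add-one) smoothing is an assumption added here to avoid zero probabilities; the attribute bands ("hssc", "entry_test") and records are hypothetical, not the thesis's data or its WEKA configuration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def train_naive_bayes(rows: List[Dict[str, str]], target: str):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    attr_values = defaultdict(set)       # attribute -> set of observed values
    for r in rows:
        for attr, value in r.items():
            if attr == target:
                continue
            value_counts[(attr, r[target])][value] += 1
            attr_values[attr].add(value)
    return class_counts, value_counts, attr_values


def predict(model, record: Dict[str, str]) -> str:
    """Return the class maximising log P(c) + sum_i log P(x_i | c), add-one smoothed."""
    class_counts, value_counts, attr_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)                   # log prior
        for attr, value in record.items():
            k = len(attr_values[attr])                    # distinct values of attr
            count = value_counts[(attr, cls)][value]
            score += math.log((count + 1) / (n_cls + k))  # smoothed log likelihood
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


students = [
    {"hssc": "high", "entry_test": "high", "status": "continue"},
    {"hssc": "high", "entry_test": "low", "status": "continue"},
    {"hssc": "low", "entry_test": "high", "status": "continue"},
    {"hssc": "low", "entry_test": "low", "status": "dropout"},
    {"hssc": "low", "entry_test": "low", "status": "dropout"},
]
model = train_naive_bayes(students, "status")
print(predict(model, {"hssc": "low", "entry_test": "low"}))  # dropout
```

For the low/low record the unnormalised scores are 0.6 × 0.4 × 0.4 = 0.096 for "continue" and 0.4 × 0.75 × 0.75 = 0.225 for "dropout", so "dropout" is predicted. Working in log space avoids numerical underflow when there are many attributes.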
Gullible Bayes model distinguishes the attributes of dropout understudies. It demonstrates the likelihood of each one info property for the anticipated state. A Naive Bayesian classifier is a basic probabilistic classifier focused around applying Bayesian hypothesis (from Bayesian detail) with solid (credulous) autonomy presumptions. By the utilization of Bayesian hypothesis we can compos Identify Student’s Grade and Dropout Issue 6 Chapter 1 Introduction 1.10 Research Objective The main objective of this paper is to use data mining methodologies to study students’ performance in the courses. Data mining provides many tasks that could be used to study the student performance. In this research, the classification task will used to evaluate student’s performance and as there are many approaches that are used for data classification, the decision tree method is used here. Information like Student batch (SB), Quiz marks (QM), Mid paper marks (MPM), Assignment Marks (AM), Attendance of Student (ATT), HSSC Marks (HSSC), Entry Test Marks (ETM) and End semester Marks (ESM) were collected from the students’ management system, to predict the performance at final exam. Identify Student’s Grade and Dropout Issue 7 Chapter 8 References 2. Literature Review 2.1 Analyze students’ performance Bharadwaj and Pal [3] acquired the college understudies information like participation, class test, course and task marks from the understudies' past database, to anticipate the execution at the end of the semester. The information set utilized as a part of this study was gotten from VBS Purvanchal University, Jaunpur (Uttar Pradesh) on the inspecting technique for machine Applications bureau obviously MCA (Master of Computer Applications) from session 2007 to 2010. At first size of the information is 50. The principle target was to utilize information mining strategies to study understudy's execution in the courses. 
They were chosen few determined variables yet they can't select the variable mid paper marks. They choose the variable (ASS – Assignment execution) and partition it into two classes: Yes – understudy submitted task, No – Student not submitted task however I think it will separate into three classes: Poor – < 40%, Average – > 40% and < 60%, Good –>60%. To assess understudy's execution they utilize order undertaking choice tree strategy however in choice tree there is no back following so a nearby ideal arrangement can be taken as worldwide arrangement and guidelines. 2.2 A prediction for Student's Performance The principle goal of Abeer and Ibrahim [4] was to utilize information mining procedures to study understudy's execution in end. General gratefulness and Classification undertaking is utilized to foresee the last grade of understudies. The information set utilized within this study was gotten from an understudy's database utilized as a part of one of the instructive establishments, on the inspecting system for Information framework office from session 2005 to 2010. At first size of the information is 1547 records. In full research paper they can't characterize that they utilize weka apparatus for execute yet at end they demonstrate a figure of result in weka. Weka is open source programming that actualizes an expansive accumulation of machine inclining calculations and is broadly utilized as a part of information mining applications. 2.3 A prediction for performance improvement Bhardwaj and Pal [5] led study on the understudy execution based by selecting 300 understudies from 5 distinctive degree school directing BCA (Bachelor of Computer Application) course of Identify Student’s Grade and Dropout Issue 8 Chapter 8 References Dr. R. M. L. Awadh University, Faizabad, India. 
By method for Bayesian order system on 17 characteristics, it was discovered that the components like students‟ review in senior auxiliary exam, living area, medium of educating, moms capability, understudies other propensity, family yearly pay and understudies family status were exceptionally associated with the understudy scholastic execution. They were surrounded to support the low scholarly achievers in advanced education. Bayesian grouping strategy is utilized on understudy database to anticipate the understudies division on the premise of earlier year database however Bayesian classifier request incredible consistency in information so some other system can be taken in attention. Other arrangement errand i.e choice tree system will likewise be use to foresee the understudies division on the premise of earlier year database. They were chosen 14 determined variables. Different variables are additionally be chosen i.e understudies review in High School instruction (HSG), Students review in Senior (SSG), The affirmation sort (Atype). 2.4 A Prediction for Performance Improvement of Engineering Students Surjeet Kumar Yadav and Saurabh Pal [6] led study on the understudy execution based by selecting 90 understudies from 5 distinctive degree school leading BCA (Bachelor of Computer Application) course of Dr. R. M. L. Awadh University, Faizabad, India. By method for choice tree arrangement technique on 17 property, it was observed that the elements like understudies review in senior optional exam, living area, medium of educating, moms capability, understudies other propensity, family yearly wage and understudies family status were profoundly corresponded with the understudy scholastic execution. The goals were confined in order to aid the low scholastic achievers in building. The C4.5, Id3 and CART choice tree calculations are connected on building understudy's information to foresee their execution in the last, most decisive test. 
Other characterization assignments are additionally be connected i.e Bayesian order technique on these 17 traits. 2.5 Classification to Decrease Dropout Rate of Students The primary target of Dr. Saurabh Pal [7] was to utilize information mining philosophies to discover understudies which are prone to drop out their first year of designing. Study lead on the understudy execution based by selecting 165 understudies. The grouping errand is utilized to assess earlier year's understudy dropout information and as there are numerous methodologies that are utilized for information characterization, the Bayesian order technique on 17 properties. Data like imprints in High School, checks in Senior Secondary, understudies family position and so forth were gathered from the understudy's administration framework, to anticipate rundown of understudies who need unique consideration. Identify Student’s Grade and Dropout Issue 9 Chapter 8 References They utilize the Bayesian grouping strategy to discover understudies which are liable to drop out their first year of designing. Anyway the issue with Bayesian hypothesis is it supports the most elevated happening esteem so for utilizing this procedures information ought to be predictable enough to beat this issue. Bayesian classifier request incredible consistency in information so some other strategy can be taken in attention. 2.6 MED to Reduce Dropout Rates Saurabh Pal [8] use information mining philosophies to discover understudies which are liable to drop out their first year of building. In this examination, the order assignment is utilized to assess earlier year's understudy dropout information and as there are numerous methodologies that are utilized for information grouping, the Id3, C4.5, CART and ADT choice tree strategies is utilized here. 
Data like evaluation in High School, review in Senior Secondary, understudy's family pay, folks capability and so forth were gathered from the understudy's administration framework, to foresee rundown of understudies who need unique consideration. The fundamental destination was to utilize information mining techniques to discover understudies which are prone to drop out their first year of designing the Id3, C4.5, CART and ADT choice tree strategies is utilized here. They were chosen 14 determined variables. Different variables are likewise be chosen i.e Students food habit(SFH), Students other habit(SOH) Students family status(FStat), student’s family size(FSize), Family annual income Status (FAIn), Student live in hostelor not(Hos). 2.7 Improving the Student’s Performance In K.shanmuga Priya [9], information characterization and choice tree is utilization which serves to enhance the understudy's execution in a finer manner. Give high certainty to understudies in their studies. To distinguish the understudies who need uncommon prompting or directing by the instructor this gives high caliber of instruction. The information set utilized is gotten from M.sc IT bureau of Information Technology 2009 to 2012 clump, Hindustan College of Arts and Science, Coimbatore. Initial 50 understudies information is taken as example and blunders were evacuated. No instrument and no product is utilization. They were chosen few determined variables however they can't select the variable mid paper marks. They choose the variable (PP – Paper Presentations. Paper presentation is separated into two classes: Yes – understudy partook Presentation, No – Student not took part in Presentation. be that as it may I think it isolate into three classes: Poor , Average and good. 2.8 Study of Factors Analysis Affecting Academic Achievement The point of Pimpa Cheewaprakobkit [10] is to dissect variables influencing scholarly accomplishment expectation of understudies' scholastic execution. 
It is helpful in recognizing Identify Student’s Grade and Dropout Issue 10 Chapter 8 References frail understudies who perform inadequately in their study. The information set included 1,600 understudy records with 22 traits of understudies enrolled between year 2001 and 2011 in a college in Thailand. They utilized WEKA open source information mining instrument to dissect characteristics Two order calculations have been embraced and looked at: the neural system C4.5 choice tree calculation. Three fundamental issue and future work is that Each component has an alternate critical Value different variables or components ought to be considered too Find approaches to exhort and support the at-danger understudies. Future examination ought to stretch the study to investigate the understudies' execution in different projects. 2.9 EDM for Predicting the Performance of Students The extent of Ajay Kumar Pal and Saurabh Pal [11], makes to concentrate the learning find from the understudy database for enhancing the understudy execution. They by information mining systems including a standard learner (Oner), a typical choice tree calculation C4.5 (J48), a neural system (Multilayer Perceptron), and a Nearest Neighbor calculation (Ib1) are utilized. The information set utilized as a part of that study was gotten from distinctive schools on the inspecting technique for B.sc. (Single guys of Science) course of session 2011-12. At first size of the information is 200. They utilized Weka open source programming. 2.10 A prediction of performer or underperformer using classification The extent of Ajay Kumar Pal and Saurabh Pal [12], makes to concentrate the learning find from the understudy database for enhancing the understudy execution. They by information mining systems including a standard learner (Oner), a typical choice tree calculation C4.5 (J48), a neural system (Multilayer Perceptron), and a Nearest Neighbor calculation (Ib1) are utilized. 
The data set used in that study was obtained from different colleges, using a sampling method, for the B.Sc. (Bachelor of Science) course of session 2011-12. The initial size of the data is 200. The Weka open source software was used.
2.11 The Student Performance Analysis and Prediction
Data mining techniques play an important role in data analysis. To build a classification model that can predict student performance, particularly for engineering branches, the research combined a decision tree algorithm with data mining techniques. Many factors may affect the performance of students. In Vivek Kumar Sharma [13], some significant factors were considered while building the decision tree for classifying students according to their attributes (grades). In this paper, four different decision tree algorithms (J48, NBTree, REPTree and SimpleCart) were compared, and the J48 decision tree algorithm was found to be the most suitable algorithm for model building. The cross-validation method and the percentage split method were used to evaluate the efficiency of the different algorithms. The traditional KDD process was used as the methodology. The WEKA (Waikato Environment for Knowledge Analysis) tool was used for analysis and prediction. The results obtained in the study may help to identify weak students, so that the administration can take appropriate action and the success rate of students can be increased.
2.12 Predicting Student Performance using ID3 AND C4.5
Kalpesh Adhatrao [14] analyzed the data of students enrolled in the first year of engineering. This data was obtained from the information provided to the institute by the admitted students.
It includes their full name, gender, application ID, scores in the board examinations of classes X and XII, scores in entrance examinations, category and admission type. They then applied the ID3 and C4.5 algorithms after pruning the dataset, to predict the results of these students in their first semester as precisely as possible. In that project, prediction parameters such as the decision trees generated using RapidMiner are not updated dynamically within the source code. As future work, the authors plan to make the entire implementation dynamic, so that it trains the prediction parameters itself when new training sets are fed into the web application. Also, the implementation does not consider extracurricular activities and other vocational courses completed by students, which the authors believe may have a significant impact on the overall performance of the students. Considering such parameters would result in better prediction accuracy.
2.13 Predicting Graduate Employment
Data mining has been applied in various areas because of its ability to analyze vast amounts of data rapidly. The aim of Bangsuk Jantawan [15] is to build a Graduates Employment Model using the classification task in data mining, and to compare several data mining approaches, such as the Bayesian method and the Tree method. The Bayesian method includes 5 algorithms: AODE, BayesNet, HNB, NaiveBayes and WAODE. The Tree method includes 5 algorithms: BFTree, NBTree, REPTree, ID3 and C4.5. The study uses the classification task in WEKA and compares the results of each algorithm, where several classification models were generated. To validate the generated model, the experiments were conducted using real data collected from graduate profiles at Maejo University in Thailand.
The model is intended to predict whether a graduate was employed, unemployed, or in an undetermined situation.
2.14 Evaluation of Student Performance
In P. Ajith and M.S.S. Sai [16], outlier detection mechanisms are used to detect outliers, which improves the quality of decision making. Outlier analysis is used to detect outliers in the student data. In the proposed system, a clustering technique is implemented together with univariate analysis. Clustering finds groups of objects such that the objects in one group are similar to each other and different from the objects in other groups. During clustering, the large data set is partitioned into clusters. After clustering, the data points that lie outside the clusters are identified and treated as outliers. Identification is carried out using univariate analysis, which is the simplest form of quantitative (statistical) analysis. A basic way of presenting univariate data is to create a frequency distribution of the individual cases. The authors analyze the performance of the undergraduate (UG) students of their college and present the results using the outlier detection mechanism. The analyzed results are presented as histograms based on univariate analysis.
2.15 Literature Survey Concept Matrix
Table 1 below briefly describes, for each surveyed paper, the purpose of the research, the techniques used, the results and outcomes of the proposed solution, the advantages of the proposed method, and its future work.

Publication & Year | Purpose | Technique | Results/Outcome | Advantage
(IJCSIS)-11 | Mining Educational Data to Analyze Students' Performance | Decision tree method | PSM has the highest gain | Helps to improve the division of the student
(WJCAT)-13 | A Prediction for Student's Performance Using Classification Method | ID3 decision tree, Weka tool | Mid term has the highest gain | Reduces the failing ratio
(IJCSIS)-11 | A Prediction for Performance Improvement Using Classification | Bayes classification and MATLAB tool | Other factors affect student performance | Identifies the students who need special attention
(WCSIT)-12 | A Prediction for Performance Improvement of Engineering Students Using Classification | C4.5, ID3 and CART decision tree algorithms and Weka tool | C4.5 has the highest accuracy, 67.778%, compared to the other methods | The model successfully identifies the students who are likely to fail
(IJCSIS)-12 | Mining Educational Data Using Classification to Decrease Dropout Rate of Students | Bayes classification implemented in the Weka tool | Students with mid = hindi do not continue their study | Predicts the list of students who are going to drop their study
(IJIEEB)-12 | Mining Educational Data to Reduce Dropout Rates of Engineering Students | C4.5, ID3, CART and ADT decision tree algorithms implemented in the Weka tool | ID3 can learn effective predictive models from the student dropout data | Produces a short but accurate prediction list of student dropouts
(IJANA)-13 | Improving the Student's Performance Using Educational Data Mining | ID3 decision tree | The attribute OSM has the maximum gain value | Improves students' performance in an efficient way
(IMECS)-13 | Study of Factors Analysis Affecting Academic Achievement of Undergraduate Students in International Program | Performance comparison between Decision Tree and Neural Network models | The Decision Tree model is more accurate than the Neural Network model | The model will be updated and tested to reach a satisfactory level
(IJCIT)-13 | Data Mining Techniques in EDM for Predicting the Performance of Students | Rule learner (OneR), decision tree algorithm C4.5 (J48), neural network (MultiLayer Perceptron) and Nearest Neighbour algorithm (IB1) implemented in WEKA | The Nearest Neighbour algorithm IB1 classifier has the lowest average error | Results show that the predictive models produce a short but accurate prediction list for the students
Table 1 Concept matrix of the surveyed research

3. Problem Statement
In Abeer [2], a few derived variables were selected, but the variable mid marks, which strongly affects the final exam, was not selected. The variable assignment was divided into two classes: Yes – student submitted the assignment, No – student did not submit the assignment. However, I think it should be divided into three classes: Poor (< 40%), Average (> 40% and < 60%) and Good (> 60%). To analyze students' performance, they used the decision tree method of the classification task. However, decision tree induction does no backtracking, so a locally optimal solution can be taken as the global solution, and the rules are inferred from the small data set that was selected. No tool and no software was used. The approach is not able to predict the students' division in the first semester.

4. Proposed Solution
Classification will be performed using a decision tree technique to predict performance at the end of the semester.
The decision tree method will be applied to the student database to predict the students' division on the basis of previous data. The variables that most strongly influence students' performance will be selected. This study will help students and teachers to improve the students' division. It will also help to identify the students who need special attention, so that the failure ratio can be reduced and appropriate action can be taken at the right time. Data analysis and a decision support system are used to evaluate students' grades and to identify dropout students using a classification algorithm (decision tree). To evaluate student grades, attributes of internal data are selected from the student database. To identify dropout students, attributes of the HSSC and entry test results are selected from the student database. The system will be implemented in WEKA.
4.1 WEKA
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes. The data set will be obtained from IIUI. For evaluating students' grades, the data set will come from the Computer Science and Software Engineering department, and the initial size of the data is 200 for each subject. For identifying dropout students, the data set will be obtained from the admission department, and the initial size of the data will be 9217. The data will be stored in different tables, which will be joined into a single table.
4.2 Data collection for identifying students' grades
To predict the students' division in Computer Science and Software Engineering subjects, the following attributes and values are used:
1. Student Batch (present, senior)
2. Quiz marks (poor, average, good)
3. Mid paper marks (A, B+, B, C+, C, D+, D, F)
4. Assignment marks (poor, average, good)
5. Attendance (poor, average, good)
6. End semester marks (A, B+, B, C+, C, D+, D, F)
4.3 Data collection for identifying the dropout issue
To predict the dropout students of International Islamic University, Islamabad (IIUI), the following attributes and values are used:
1. Gender (male, female)
2. HSSC marks (A, B+, B, C+, C, D+, D, F)
3. Series/sequences (average, good)
4. Quantitative (poor, average, good)
5. Logic (average, good)
6. Analytical (poor, average, good)
7. English (poor, average, good, vgood)
8. End semester marks (A, B+, B, C+, C, D+, D, F)
9. Dropout (yes, no)
4.4 Implementation
I have divided the whole implementation into three stages. In the first stage, information about the admitted students was collected. This included the details submitted to the institution at the time of enrolment. In the second stage, irrelevant information was removed from the collected data, and the relevant information was fed into a database. The third stage involved applying the ID3 and C4.5 algorithms to the training data to obtain the decision trees of the algorithms.
5. Experiments
In this chapter we discuss the implementation scenario and the obtained results in detail. This chapter is divided into four parts. In the first part we discuss the data set and give its related statistics. In the second part we briefly discuss the parameter settings for both methods and the reasons for them. In the third part we discuss the results separately for both methods.
5.1 ID3 Decision Tree
The ID3 algorithm was invented by Ross Quinlan. It is a precursor to the C4.5 algorithm. ID3 builds a decision tree from a given data set by classifying the data on its attributes. ID3 follows the principle of Occam's razor: it aims to build a small decision tree.
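The ID3 procedure (entropy, information gain and the recursive construction described in Sections 5.8 to 5.10) can be sketched in a few lines of Python. This is a minimal illustration only. It assumes nominal attributes stored as Python dicts, and the function names and toy rows are hypothetical. It is not the WEKA Id3 implementation used in this work.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, target, attrs):
    """Build a tree as nested dicts {attribute: {value: subtree or class label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:        # Steps 2-3: all examples share one class
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:                    # Step 4: no attributes left, use the majority class
        return majority
    # Step 5(i): choose the attribute with the highest information gain
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    # Step 5(iii): one branch per value seen in rows, so no subset is ever empty
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, target, remaining)
    return tree


if __name__ == "__main__":
    # Hypothetical toy rows, not taken from the thesis data set
    rows = [
        {"Mid": "A", "Quiz": "good", "ESM": "A"},
        {"Mid": "A", "Quiz": "average", "ESM": "A"},
        {"Mid": "F", "Quiz": "good", "ESM": "D"},
        {"Mid": "F", "Quiz": "average", "ESM": "D"},
        {"Mid": "B", "Quiz": "good", "ESM": "B"},
    ]
    print(id3(rows, "ESM", ["Quiz", "Mid"]))
```

On these toy rows, Mid separates the classes perfectly, so the sketch prints {'Mid': {'A': 'A', 'B': 'B', 'F': 'D'}}. Unlike WEKA's Id3, which creates a branch for every declared attribute value and labels unseen ones "null", this sketch only branches on the values that actually occur in the data.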
In an educational system, students' performance can be improved by analyzing the internal assessment and the end semester examination. Internal assessment means the class tests, seminars, attendance and lab practicals conducted by the teacher. Along with the internal assessment, the communication skills and paper presentations of students during their academic years also need to be analyzed to improve students' performance.
5.2 Data set
The data set used in this study was obtained from the Computer Science and Software Engineering department of International Islamic University Islamabad and covers all courses of Batch 2013. The initial size of the data for each subject is 200. Data stored in different tables was joined into a single table, and the variables are derived from this internal dataset. For identifying dropout students, the data set was to be obtained from the admission department, with an initial size of 9217 records. However, the university did not permit the use of its admission data, so I constructed a synthetic (fake) data set myself.
5.3 Data selection and transformation
For identifying student performance, the fields required for the data mining process were selected, and some derived attributes were included. These attributes are given in Table 2.

Attribute | Description | Possible values
Batch | Student batch | {Senior, Present}
Quiz | Quiz marks | {Good, Average, Poor}
Ass | Assignment marks | {Good, Average, Poor}
Mid | Mid grades | A = 80%–100%, B+ = 75%–79%, B = 70%–74%, C+ = 65%–69%, C = 60%–64%, D+ = 55%–59%, D = 50%–54%, F < 50%
ESM | End semester marks | A = 80%–100%, B+ = 75%–79%, B = 70%–74%, C+ = 65%–69%, C = 60%–64%, D+ = 55%–59%, D = 50%–54%, F < 50%
Table 2 Student-related variables

The values of the attributes for student performance are explained as follows.
QUIZ – Marks obtained in quizzes. In each semester two class tests are conducted, and their average is used to calculate the sessional marks. Quiz is split into three classes: Poor < 40%, Average > 40% and < 60%, Good > 60%.
ASS – Assignment performance. In each semester each teacher gives two assignments to the students. Assignment performance is divided into three classes: Poor, Average and Good.
MID – Grades are assigned to all students using the following mapping: A – 80% to 100%, B+ – 75% to 79%, B – 70% to 74%, C+ – 65% to 69%, C – 60% to 64%, D+ – 55% to 59%, D – 50% to 54%, and F – below 50%.
ESM – End semester marks obtained in the semester. Grades are assigned to all students using the same mapping as for MID.

For identifying the dropout issue, the fields required for the data mining process were selected, and some derived attributes were included. These attributes are given in Table 3.

Attribute | Description | Possible values
Gender | Gender | Male, Female
HSSC | HSSC marks | A+ = 80%–100%, A = 70%–79%, B = 60%–69%, C = 50%–59%
ESM | End semester marks | A = 80%–100%, B+ = 75%–79%, B = 70%–74%, C+ = 65%–69%, C = 60%–64%, D+ = 55%–59%, D = 50%–54%, F < 50%
SQ | Sequence | Good > 10, Average < 10
QT | Quantitative | Good > 15, Average > 10 and < 15, Poor < 10
LO | Logic | Good > 15, Average > 10 and < 15, Poor < 10
AT | Analytical | Good > 10, Average < 10
EH | English | Vgood > 25, Good > 20 and < 25, Average > 10 and < 20, Poor < 10
Dropout | Dropout | {Yes, No}
Table 3 Student-related variables for the dropout issue

The values of the attributes for dropout students are explained as follows.
HSSC Marks – Students' grade in Higher Secondary School Certificate (HSSC) education. Grades are assigned to all students using the following mapping: A+ – 80% to 100%, A – 70% to 79%, B – 60% to 69%, C – 50% to 59%.
ESM – End semester marks obtained in the semester. Grades are assigned to all students using the following mapping: A – 80% to 100%, B+ – 75% to 79%, B – 70% to 74%, C+ – 65% to 69%, C – 60% to 64%, D+ – 55% to 59%, D – 50% to 54%, and F – below 50%.
Sequence – Entry test part, with 15 marks in total. It is split into two classes: Good > 10 and Average < 10.
Quantitative – Entry test part. It is split into three classes: Good > 15, Average > 10 and < 15, Poor < 10.
Logic – Entry test part. It is split into three classes: Good > 15, Average > 10 and < 15, Poor < 10.
Analytical – Entry test part. It is split into two classes: Good > 10, Average < 10.
English – Entry test part. It is split into four classes: Vgood > 25, Good > 20 and < 25, Average > 10 and < 20, Poor < 10.
Dropout – Dropout condition, i.e. whether or not the student continues after one year. The possible values are Yes if the student dropped out after one year and No if the student continued the study.
5.4 Implementation of Mining Model
Weka is open source software that implements a large collection of machine learning algorithms and is widely used in data mining applications. The drop.arff file was created from the above data. This file was loaded into the WEKA Explorer. The Classify panel enables the user to apply classification and regression algorithms to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions or the model itself. The algorithm used for classification is Naive Bayes. Under "Test options", 10-fold cross-validation is selected as the evaluation approach. Because there is no separate evaluation data set, cross-validation is necessary to get a reasonable estimate of the accuracy of the generated model. This predictive model provides a way to predict whether a new student will continue to enrol after one year.
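Before the data can be loaded into WEKA, the raw percentages must be discretised into the grade bands of Table 2 and written in ARFF format. The Python sketch below shows one way to do this. It assumes the bands B = 70% to 74% and C = 60% to 64%, and the helper names (to_grade, write_arff), the sample student and the output file name are all hypothetical. It is not the script that was used to create drop.arff.

```python
import os
import tempfile

# Table 2 grade bands as (lower bound in percent, grade), checked from the top down.
GRADE_BANDS = [(80, "A"), (75, "B+"), (70, "B"), (65, "C+"),
               (60, "C"), (55, "D+"), (50, "D")]
GRADES = ["A", "B+", "B", "C+", "C", "D+", "D", "F"]


def to_grade(percent):
    """Map a percentage (0-100) to a letter grade; anything below 50% is F."""
    for lower, grade in GRADE_BANDS:
        if percent >= lower:
            return grade
    return "F"


def quoted(value):
    """Single-quote a nominal value; a conservative choice for values such as B+."""
    return "'" + value + "'"


def write_arff(path, relation, attributes, rows):
    """attributes: list of (name, allowed nominal values); rows: list of dicts."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"@relation {relation}\n\n")
        for name, values in attributes:
            f.write(f"@attribute {name} {{{','.join(quoted(v) for v in values)}}}\n")
        f.write("\n@data\n")
        for row in rows:
            f.write(",".join(quoted(row[name]) for name, _ in attributes) + "\n")


if __name__ == "__main__":
    attributes = [("Batch", ["present", "senior"]),
                  ("Quiz", ["good", "average", "poor"]),
                  ("Mid", GRADES), ("ESM", GRADES)]
    # One hypothetical student: 83% in the mid paper, 76% at the end of semester.
    rows = [{"Batch": "present", "Quiz": "good",
             "Mid": to_grade(83), "ESM": to_grade(76)}]
    path = os.path.join(tempfile.mkdtemp(), "english_example.arff")
    write_arff(path, "english", attributes, rows)
    print(open(path, encoding="utf-8").read())
```

Each nominal value is quoted to avoid deciding case by case which characters need escaping. The declared value lists also fix the order in which WEKA reports the classes.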
5.5 Decision Tree
A decision tree is a tree in which each branch node represents a choice between a number of alternatives and each leaf node represents a decision. Decision trees are commonly used to gain information for decision making. A decision tree starts with a root node, from which users take actions. Starting from this node, each node is split recursively according to a decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible decision scenario and its outcome. Three widely used decision tree learning algorithms are ID3, ASSISTANT and C4.5.
5.6 The ID3 Decision Tree
ID3 is a simple decision tree learning algorithm developed by Ross Quinlan [14]. The basic idea of the ID3 algorithm is to construct the decision tree through a top-down, greedy search of the given sets, testing an attribute at each tree node. To select the attribute that is most useful for classifying a given set, we introduce a metric called information gain. To find an optimal way to classify a learning set, we need to minimize the number of questions asked (i.e. minimize the depth of the tree). We therefore need a function that can measure which questions provide the most balanced split. The information gain metric is such a function.
5.7 Impurity Measurement
The dataset contains a number of attributes, and each attribute has a number of classes (values). Whether the data in the dataset is homogeneous or heterogeneous is measured on the basis of the classes. A table is pure (homogeneous) if it contains only one class. A data table that contains several classes is called heterogeneous or impure. There are several ways to measure the impurity of a table.
The best-known measures are entropy, the Gini index and classification error. Here, entropy is used to calculate the quantitative impurity. The entropy of a pure table is zero, because the probability of its single class is one. Entropy reaches its maximum value when all classes in the dataset have equal probability.
5.8 Entropy
Given probabilities $p_1, p_2, \ldots, p_s$ with $\sum_{i=1}^{s} p_i = 1$, entropy is defined as
$H(p_1, p_2, \ldots, p_s) = -\sum_{i=1}^{s} p_i \log_2 p_i$
Entropy measures the amount of disorder in a given database state. A value of H = 0 identifies a perfectly classified set. In other words, the higher the entropy, the greater the potential to improve the classification.
5.9 Information gain
Information gain increases with the average purity of the subsets that an attribute produces in the given data set. For a set S and an attribute A, it is computed as
$\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)$
where $S_v$ is the subset of S for which A has the value v. This measure is used to determine the best attribute for a particular node in the tree. Selecting a new attribute and partitioning the given examples is repeated for every non-terminal node. An attribute that has already been used higher in the tree is excluded, so each attribute appears at most once along any path through the tree. The process continues for every leaf node until one of the following conditions is met: (i) every attribute has already been included along the path from the root, or (ii) the entropy of the examples at the node is zero, in which case the node becomes a leaf labelled with their class.
5.10 ID3 Algorithm
ID3(Examples, Target_attribute, Attributes)
Step 1: Create a Root node for the tree.
Step 2: If all Examples are positive, return the single-node tree Root with label = +.
Step 3: If all Examples are negative, return the single-node tree Root with label = -.
Step 4: If Attributes is empty, return the single-node tree Root with label = the most common value of Target_attribute in Examples.
Step 5: Otherwise begin:
(i) Let A be the attribute from Attributes that best classifies Examples (highest information gain).
(ii) The decision attribute for Root is A.
(iii) For each possible value Vi of A:
(a) Add a new tree branch below Root, corresponding to the test A = Vi.
(b) Let Examples_Vi be the subset of Examples that have the value Vi for A.
(c) If Examples_Vi is empty, add below this new branch a leaf node with label = the most common value of Target_attribute in Examples; otherwise, add below this new branch the subtree ID3(Examples_Vi, Target_attribute, Attributes - {A}).
Step 6: End.
Step 7: Return Root.
5.11 C4.5
C4.5 is a well-known algorithm used to generate decision trees. It is an extension of the ID3 algorithm that overcomes some of its disadvantages. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier. The C4.5 algorithm made a number of changes to improve the ID3 algorithm [2]. Some of these are:
1. Handling training data with missing attribute values
2. Handling attributes with differing costs
3. Pruning the decision tree after its creation
4. Handling attributes with both discrete and continuous values
Let the training data be a set S = {s1, s2, ...} of already classified samples. Each sample si = (x1, x2, ...) is a vector, where x1, x2, ... represent the attribute values or features of the sample. The training data is augmented with a vector C = (c1, c2, ...), where c1, c2, ... represent the class to which each sample belongs. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples S into subsets enriched in one class or the other [5]. The splitting criterion is the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data.
The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurses on the smaller sublists, again choosing the attribute with the highest normalized information gain.
6 Discussion on Result
6.1 For identifying student grades
The data set used in this study was obtained from the Computer Science and Software Engineering department of International Islamic University Islamabad and covers all courses of Batch 2013. Table 4 is the data set of the English subject, which has 168 samples in total. The samples are divided into three equal parts: two parts are used for training and one part for testing, i.e. 112 samples are used for training and 56 samples for testing.

ID | Batch | Quiz | Ass | Mid | ESM
1 | present | good | good | F | B
2 | present | good | good | B+ | B
3 | present | good | good | A | A
4 | present | good | good | A | A
5 | present | good | good | D | C
6 | present | average | good | C | C+
7 | present | good | good | A | A
8 | present | good | good | B | B
9 | present | average | good | D | C+
10 | present | good | good | A | B+
11 | present | good | good | A | A
12 | present | good | good | B | C+
13 | present | good | good | A | A
14 | present | good | good | D+ | B
15 | present | average | good | B | C+
16 | present | good | average | B+ | B+
17 | present | average | good | B | B
18 | present | good | good | D | C+
19 | present | good | average | C | C
20 | present | average | good | C | B
21 | present | good | good | C | B
22 | present | average | average | F | D
23 | present | good | good | A | A
24 | present | average | average | B | C+
25 | present | average | average | D+ | C+
26 | present | good | good | B+ | B+
27 | present | good | average | A | B
28 | present | good | good | B | B
29 | present | good | average | B | B
30 | present | good | average | C | B
31 | present | good | average | D+ | C+
32 | present | good | good | B | B
33 | present | good | average | C+ | B
34 | present | average | good | B | B
35 | present | good | good | B | B
36 | present | good | good | B | B
37 | present | good | good | A | A
38 | present | average | average | C+ | B
39 | present | good | good | D | B+
40 | present | good | good | B | A
41 | present | good | average | B | B+
42 | present | good | average | A | A
43 | present | good | good | A | B+
44 | present | average | average | B | C+
45 | present | good | good | A | B+
46 | present | good | average | A | B+
47 | present | good | good | A | A
48 | present | good | good | B+ | A
49 | present | good | average | B | B+
50 | present | average | good | C | C
51 | present | good | good | B | B+
52 | present | good | average | D+ | C+
53 | present | good | good | A | A
54 | present | good | good | B+ | B
55 | present | good | average | A | A
56 | present | good | good | A | A
57 | present | good | average | C | B
58 | present | good | good | A | B
59 | present | average | average | B+ | B
60 | present | average | average | B+ | B+
61 | present | average | good | A | B
62 | present | fail | good | B | B
63 | present | good | good | B | A
64 | present | average | average | B+ | B+
65 | present | good | good | B | B+
66 | present | good | good | A | B+
67 | present | average | good | C | B
68 | present | good | average | A | A
69 | present | average | average | F | D
70 | present | good | good | D | C+
71 | present | good | good | D | B+
72 | present | good | good | D | C+
73 | present | good | fail | B | A
74 | present | average | average | F | C+
75 | present | poor | average | D+ | C+
76 | present | good | good | B | B+
77 | present | good | good | B+ | A
78 | present | good | average | C | B
79 | present | average | good | B | C+
80 | present | average | average | D+ | C+
81 | present | fail | average | D | D+
82 | present | average | good | D+ | B
83 | present | good | good | C | B
84 | present | good | good | A | A
85 | present | good | average | B+ | B+
86 | present | average | average | A | B+
87 | present | fail | good | A | B+
88 | present | good | good | A | A
89 | present | good | good | C | C
90 | present | good | good | B+ | B+
91 | present | good | good | B+ | B+
92 | present | good | good | A | B+
93 | present | good | average | B | B+
94 | present | average | good | C | B
95 | present | average | good | A | B+
96 | present | good | good | B | B+
97 | present | good | good | C+ | B
98 | present | good | average | F | D
99 | present | good | good | A | A
100 | present | average | fail | C | C
101 | present | good | good | A | A
102 | present | good | good | A | A
103 | present | good | good | C+ | B
104 | present | good | good | B+ | A
105 | present | good | good | A | A
106 | present | average | average | B+ | B
107 | present | poor | good | C+ | B
108 | present | good | good | A | A
109 | senior | good | average | A | A
110 | present | good | good | B+ | B+
111 | present | good | good | A | B+
112 | present | good | average | A | B+
113 | present | good | good | C | B+
114 | present | good | average | A | B+
115 | present | average | good | A | A
116 | present | average | good | A | B+
117 | present | good | good | B | B+
118 | present | good | good | A | A
119 | present | good | average | A | A
120 | present | good | good | A | A
121 | present | good | average | B | B+
122 | present | average | good | F | C+
123 | present | average | good | B | C+
124 | present | good | good | A | B+
125 | present | average | good | D | C
126 | present | good | average | C+ | C
127 | present | average | good | C+ | B
128 | present | average | good | C+ | B
129 | present | good | average | A | B
130 | present | average | good | C | C+
131 | present | average | average | D+ | C+
132 | present | good | good | A | A
133 | present | good | good | A | B+
134 | present | average | good | B | B
135 | present | good | good | B | C+
136 | present | average | average | B | B
137 | present | good | average | C | C+
138 | present | good | average | C | C+
139 | present | average | good | F | D
140 | present | good | average | A | A
141 | present | average | good | C | B+
142 | present | good | good | A | A
143 | present | good | good | A | A
144 | present | good | good | A | A
145 | present | good | average | D | C
146 | present | good | good | B+ | B+
147 | present | good | average | B | C+
148 | present | good | average | B+ | A
149 | present | good | good | A | A
150 | present | fail | average | B+ | B
151 | present | average | good | D | C
152 | present | average | average | A | B
153 | present | good | good | C | C+
154 | present | good | average | B | B
155 | present | good | good | B+ | B+
156 | present | good | average | A | A
157 | present | good | average | B+ | B+
158 | present | good | average | C | C+
159 | present | good | good | A | A
160 | present | good | good | C | B
161 | present | average | good | B | B
162 | present | average | average | C+ | B
163 | present | average | average | C | C+
164 | present | good | good | B | B+
165 | present | good | good | C | C+
166 | senior | average | fail | C+ | A
167 | senior | good | good | A | B+
168 | senior | good | good | B | B
Table 4 Data set of the English subject

The root node can be determined by calculating the information gain from the given student data set. For this, we first have to calculate the entropy value.
Dataset S is a set of n = 168 samples whose class attribute ESM takes the values "A", "B+", "B", "C+", "C", "D+", "D" and "F". Writing $n_c$ for the number of samples in class c, the entropy is
$H(S) = -\sum_{c} \frac{n_c}{n} \log_2 \frac{n_c}{n}$
where the sum runs over the eight ESM classes. This form (Fig 1) shows the input values of the given data set. From these input values we calculate the Entropy, Gain, Split Information and Gain Ratio for each attribute. Using the entropy value, the gain of an attribute X is calculated as
$\mathrm{Gain}(S, X) = H(S) - \sum_{v \in \mathrm{Values}(X)} \frac{|S_v|}{|S|} H(S_v)$
where $S_v$ is the subset of S for which X = v. Attributes are selected by calculating the Gain Ratio. Before that, we must calculate the Split Information:
$\mathrm{SplitInfo}(S, X) = -\sum_{v \in \mathrm{Values}(X)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$
Using the split information, the Gain Ratio is calculated as
$\mathrm{GainRatio}(S, X) = \frac{\mathrm{Gain}(S, X)}{\mathrm{SplitInfo}(S, X)}$
Fig 2 shows the calculated values of entropy, gain, split information and gain ratio for the given attributes. The attribute Mid has the maximum gain value, so it is the root node of the decision tree. These calculations continue until all the data has been classified or all the given attributes have been used. The WEKA toolkit is a widely used toolkit for machine learning and data mining, originally developed at the University of Waikato in New Zealand.
It contains a large collection of state-of-the-art machine learning and data mining algorithms written in Java. WEKA provides tools for regression, classification, clustering, association rules, visualization and data preprocessing. WEKA has become very popular with academic and industrial researchers, and is also widely used for teaching. To use WEKA, the collected data must be prepared and converted to the ARFF file format so that it is compatible with the WEKA data mining toolbox.

Figure 8 English file in WEKA

Decision tree methods were applied to the data set to build the classification model; the method used is the ID3 decision tree algorithm. After applying the preprocessing and preparation steps, we analysed the data visually to find the distribution of values. Figure 1 shows the chart of ESM values. In the English subject data set there are 41 samples of A, 41 of B+, 44 of B, 28 of C+, 9 of C, 1 of D+ and 4 of D.

Figure 9 C4.5 result

The figure shows the decision tree with Mid at the root node. Mid has the highest gain ratio, which made it the starting node and the best splitting variable. The tree created by the ID3 algorithm was very deep, since it starts with the attribute Mid, which has 8 values. The other attributes that took part in the decision tree were Batch, Quiz and Ass. The ID3 tree showed that all these attributes have an impact on the students' grades, but the most influential attributes were Mid and Quiz.
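The tree-growing procedure behind this result (score every attribute, split on the best one, and repeat on each branch until a node is pure or no attributes remain) can be sketched in Python. This is a minimal textbook ID3 sketch, not the code of WEKA's weka.classifiers.trees.Id3: it scores attributes by information gain (swapping in a gain-ratio score would give C4.5-style selection), and it only creates branches for values that occur in the data, whereas the WEKA output prints such empty branches as null. The five records are hypothetical toy data, not rows of the thesis data set.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def info_gain(rows, attr, target):
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def id3(rows, attrs, target):
    """Return a class label (leaf) or an (attribute, {value: subtree}) node."""
    labels = [row[target] for row in rows]
    # Stop when the node is pure or no attributes are left; predict the majority class.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    branches = {}
    # Only values that occur in these rows get a branch, in first-seen order.
    for value in dict.fromkeys(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, rest, target)
    return (best, branches)


def show(tree, depth=0):
    """Print the tree in the indented 'attr = value: class' style of the WEKA output."""
    if not isinstance(tree, tuple):
        print(tree)
        return
    attr, branches = tree
    for value, sub in branches.items():
        line = "|   " * depth + f"{attr} = {value}"
        if isinstance(sub, tuple):
            print(line)
            show(sub, depth + 1)
        else:
            print(f"{line}: {sub}")


def classify(tree, row):
    """Follow branches down to a leaf; returns None for an unseen value."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches.get(row[attr])
    return tree


# Hypothetical toy records (not taken from the thesis data set).
toy = [
    {"Mid": "A", "Quiz": "good", "ESM": "A"},
    {"Mid": "A", "Quiz": "average", "ESM": "B+"},
    {"Mid": "C", "Quiz": "good", "ESM": "C+"},
    {"Mid": "C", "Quiz": "average", "ESM": "C"},
    {"Mid": "F", "Quiz": "good", "ESM": "D"},
]
tree = id3(toy, ["Mid", "Quiz"], "ESM")
show(tree)
# Mid = A
# |   Quiz = good: A
# |   Quiz = average: B+
# Mid = C
# |   Quiz = good: C+
# |   Quiz = average: C
# Mid = F: D
```

As in the thesis tree, Mid is chosen first on the toy records because it separates the grades better than Quiz does.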
Another indication that can be drawn from the ID3 tree is that students with Mid = “A” to “C+” continue with their studies. Figure 10 shows the result of the ID3 decision tree algorithm. To find the accuracy of the ID3 algorithm, the classification method was applied to the data set: the accuracy of the ID3 algorithm on the test split is 40.35% (23 of 57 test instances classified correctly).

Figure 10 Evaluation on test split

6.1.1 === Run information ===

Scheme: weka.classifiers.trees.Id3
Relation: english3-weka.filters.unsupervised.attribute.Remove-R1
Instances: 168
Attributes: 5
    Batch
    Quiz
    Ass
    Mid
    ESM
Test mode: split 66.0% train, remainder test

6.1.2 === Classifier model (full training set) ===

Id3

Mid = F
|   Quiz = good
|   |   Ass = good: B
|   |   Ass = average: D
|   |   Ass = fail: null
|   Quiz = average
|   |   Ass = good: C+
|   |   Ass = average: D
|   |   Ass = fail: null
|   Quiz = fail: null
|   Quiz = poor: null
Mid = B+
|   Quiz = good
|   |   Ass = good: B+
|   |   Ass = average: B+
|   |   Ass = fail: null
|   Quiz = average: B
|   Quiz = fail: B
|   Quiz = poor: null
Mid = A
|   Quiz = good
|   |   Ass = good
|   |   |   Batch = present: A
|   |   |   Batch = senior: B+
|   |   Ass = average
|   |   |   Batch = present: A
|   |   |   Batch = senior: A
|   |   Ass = fail: null
|   Quiz = average
|   |   Ass = good: B+
|   |   Ass = average: B
|   |   Ass = fail: null
|   Quiz = fail: B+
|   Quiz = poor: null
Mid = D
|   Quiz = good
|   |   Ass = good: C+
|   |   Ass = average: C
|   |   Ass = fail: null
|   Quiz = average: C
|   Quiz = fail: D+
|   Quiz = poor: null
Mid = C
|   Ass = good: B
|   Ass = average
|   |   Quiz = good: B
|   |   Quiz = average: C+
|   |   Quiz = fail: null
|   |   Quiz = poor: null
|   Ass = fail: C
Mid = B
|   Quiz = good
|   |   Ass = good
|   |   |   Batch = present: B+
|   |   |   Batch = senior: B
|   |   Ass = average: B+
|   |   Ass = fail: A
|   Quiz = average
|   |   Ass = good: B
|   |   Ass = average: C+
|   |   Ass = fail: null
|   Quiz = fail: B
|   Quiz = poor: null
Mid = D+
|   Ass = good: B
|   Ass = average: C+
|   Ass = fail: null
Mid = C+
|   Ass = good: B
|   Ass = average
|   |   Quiz = good: B
|   |   Quiz = average: B
|   |   Quiz = fail: null
|   |   Quiz = poor: null
|   Ass = fail: A

Time taken to build model: 0 seconds

6.1.3 === Evaluation on test split ===

=== Summary ===

Correctly Classified Instances          23               40.3509 %
Incorrectly Classified Instances        32               56.1404 %
Kappa statistic                          0.2555
Mean absolute error                      0.1728
Root mean squared error                  0.3361
Relative absolute error                 79.825  %
Root relative squared error            102.2144 %
UnClassified Instances                   2                3.5088 %
Total Number of Instances               57

6.1.4 === Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.417     0.395     0.227       0.417    0.294       0.487      B
                 0.786     0.073     0.786       0.786    0.786       0.903      A
                 0.5       0.038     0.333       0.5      0.4         0.705      C
                 0.2       0.156     0.222       0.2      0.211       0.683      C+
                 0.267     0.075     0.571       0.267    0.364       0.747      B+
                 0         0         0           0        0           0.75       D
                 0         0         0           0        0           0.5        D+
Weighted Avg.    0.418     0.155     0.458       0.418    0.416       0.717

6.1.5 === Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
  5  0  1  5  1  0  0 |  a = B
  2 11  0  0  1  0  0 |  b = A
  1  0  1  0  0  0  0 |  c = C
  6  0  1  2  1  0  0 |  d = C+
  7  3  0  1  4  0  0 |  e = B+
  1  0  0  1  0  0  0 |  f = D
  0  0  0  0  0  0  0 |  g = D+

Figure 11 Classifier visualization

6.2 Identifying the dropout issue

The data set for this study was requested from the admission department of International Islamic University Islamabad, but the university did not allow it to be used. Table 6 shows a sample of the data set, which contains 9216 records. The samples are divided into three parts: two parts are used for training and one part for testing. With the 66% percentage split used in WEKA (see the run information below), 6083 samples are used for training and 3133 for testing.
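The train/test sizes of a percentage split can be reproduced with a short Python sketch. It assumes that the training portion is n × 66 / 100 rounded to the nearest whole record, an assumption chosen because it matches the test-set sizes in the WEKA output (57 of the 168 English records and 3133 of the 9216 dropout records). The shuffle uses Python's random module, so the individual records placed in each part will differ from WEKA's.

```python
import random


def percentage_split(records, train_percent=66, seed=1):
    """Shuffle records and split them into (train, test) lists.

    Assumption: training size = n * train_percent / 100, rounded to the
    nearest integer (integer arithmetic avoids floating-point error).
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    train_size = (len(shuffled) * train_percent + 50) // 100
    return shuffled[:train_size], shuffled[train_size:]


for name, n in [("English", 168), ("dropout", 9216)]:
    train, test = percentage_split(range(n))
    print(f"{name}: {len(train)} train, {len(test)} test")
```

Under this assumption the sketch gives 111/57 for the English data set and 6083/3133 for the dropout data set, rather than an exact two-thirds/one-third partition.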
Gender  HSSC Marks  ESM  sequence  Quantitative  Logic    Analytical  English  droupout
male    A+          A    good      good          good     good        vgood    no
male    A+          A    good      good          good     good        good     no
male    A+          A    good      good          good     good        average  no
male    A+          A    good      good          good     good        poor     yes
male    A+          A    good      good          good     average     vgood    no
male    A+          A    good      good          good     average     good     no
male    A+          A    good      good          good     average     average  no
male    A+          A    good      good          good     average     poor     no
male    A+          A    good      good          average  good        vgood    no
male    A+          A    good      good          average  good        good     yes
male    A+          A    good      good          average  good        average  no
male    A+          A    good      good          average  good        poor     no
male    A+          A    good      good          average  average     vgood    no
male    A+          A    good      good          average  average     good     yes
male    A+          A    good      good          average  average     average  no
male    A+          A    good      good          average  average     poor     no
male    A+          A    good      good          poor     good        vgood    no
male    A+          A    good      good          poor     good        good     yes
male    A+          A    good      good          poor     good        average  no
male    A+          A    good      good          poor     good        poor     no
male    A+          A    good      good          poor     average     vgood    no
male    A+          A    good      good          poor     average     good     no
male    A+          A    good      good          poor     average     average  no
male    A+          A    good      good          poor     average     poor     no
male    A+          A    good      average       good     good        vgood    no
male    A+          A    good      average       good     good        good     no
male    A+          A    good      average       good     good        average  no
male    A+          A    good      average       good     good        poor     no
male    A+          A    good      average       good     average     vgood    no
male    A+          A    good      average       good     average     good     no
male    A+          A    good      average       good     average     average  no
male    A+          A    good      average       good     average     poor     no
male    A+          A    good      average       average  good        vgood    no
male    A+          A    good      average       average  good        good     no
male    A+          A    good      average       average  good        average  no
male    A+          A    good      average       average  good        poor     no
male    A+          A    good      average       average  average     vgood    yes
male    A+          A    good      average       average  average     good     no
male    A+          A    good      average       average  average     average  no
...

Decision tree methods were applied to the data set to build the classification model.
The method used is the ID3 decision tree algorithm. After applying the preprocessing and preparation steps, we analysed the data visually to find the distribution of values. Figure 1 shows the chart of the class label dropout. In the dropout data set there are 6103 samples of no and 3113 samples of yes.

Figure 12 Dropout file in WEKA

6.2.1 === Run information ===

Scheme: weka.classifiers.trees.Id3
Relation: dropout7
Instances: 9216
Attributes: 9
    Gender
    HSSC Marks
    ESM
    sequence
    Quantitative
    Logic
    Analytical
    English
    droupout
Test mode: split 66.0% train, remainder test

Figure 14 Run information

6.2.2 === Classifier model (full training set) ===

Id3

ESM = A
|   English = vgood
|   |   sequence = good
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   |   Logic = poor: no
|   |   |   Quantitative = poor
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average: no
|   |   |   |   Logic = poor
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   sequence = average: no
|   English = good
|   |   Quantitative = good
|   |   |   sequence = good
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average: yes
|   |   |   |   Logic = poor
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   sequence = average: no
|   |   Quantitative = average: no
|   |   Quantitative = poor: no
|   English = average
|   |   Quantitative = good: no
|   |   Quantitative = average: no
|   |   Quantitative = poor
|   |   |   sequence = good: no
|   |   |   sequence = average
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   Logic = poor
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   English = poor
|   |   Logic = good
|   |   |   sequence = good
|   |   |   |   Analytical = good
|   |   |   |   |   Quantitative = good: yes
|   |   |   |   |   Quantitative = average: no
|   |   |   |   |   Quantitative = poor: yes
|   |   |   |   Analytical = average: no
|   |   |   sequence = average: no
|   |   Logic = average
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average
|   |   |   |   sequence = good: no
|   |   |   |   sequence = average
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   Quantitative = poor: no
|   |   Logic = poor
|   |   |   Quantitative = good
|   |   |   |   sequence = good: no
|   |   |   |   sequence = average: yes
|   |   |   Quantitative = average
|   |   |   |   sequence = good
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   sequence = average: yes
|   |   |   Quantitative = poor: no
ESM = B+
|   English = vgood
|   |   Quantitative = good
|   |   |   Logic = good: no
|   |   |   Logic = average
|   |   |   |   sequence = good: no
|   |   |   |   sequence = average
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   Logic = poor
|   |   |   |   sequence = good
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   sequence = average: no
|   |   Quantitative = average: no
|   |   Quantitative = poor: no
|   English = good
|   |   Quantitative = good
|   |   |   Logic = good
|   |   |   |   sequence = good: yes
|   |   |   |   sequence = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   Logic = average
|   |   |   |   sequence = good
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   sequence = average: no
|   |   |   Logic = poor: no
|   |   Quantitative = average
|   |   |   sequence = good: no
|   |   |   sequence = average
|   |   |   |   Analytical = good: no
|   |   |   |   Analytical = average
|   |   |   |   |   Logic = good: yes
|   |   |   |   |   Logic = average: no
|   |   |   |   |   Logic = poor: yes
|   |   Quantitative = poor: no
|   English = average
|   |   Logic = good
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average: no
|   |   |   Quantitative = poor
|   |   |   |   sequence = good
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   |   sequence = average: no
|   |   Logic = average
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average: no
|   |   |   Quantitative = poor
|   |   |   |   Analytical = good: no
|   |   |   |   Analytical = average: yes
|   |   Logic = poor
|   |   |   Quantitative = good
|   |   |   |   sequence = good: no
|   |   |   |   sequence = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   Quantitative = average: no
|   |   |   Quantitative = poor: no
|   English = poor
|   |   Logic = good
|   |   |   sequence = good
|   |   |   |   Quantitative = good: no
|   |   |   |   Quantitative = average
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   |   Quantitative = poor: no
|   |   |   sequence = average
|   |   |   |   Quantitative = good
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   Quantitative = average: no
|   |   |   |   Quantitative = poor
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   Logic = average
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average
|   |   |   |   sequence = good: no
|   |   |   |   sequence = average
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   Quantitative = poor: no
|   |   Logic = poor
|   |   |   Quantitative = good
|   |   |   |   Analytical = good: no
|   |   |   |   Analytical = average: yes
|   |   |   Quantitative = average: no
|   |   |   Quantitative = poor
|   |   |   |   sequence = good: yes
|   |   |   |   sequence = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
ESM = B
|   English = vgood
|   |   Logic = good
|   |   |   Analytical = good
|   |   |   |   sequence = good
|   |   |   |   |   Quantitative = good: yes
|   |   |   |   |   Quantitative = average: no
|   |   |   |   |   Quantitative = poor: no
|   |   |   |   sequence = average
|   |   |   |   |   Quantitative = good: no
|   |   |   |   |   Quantitative = average: yes
|   |   |   |   |   Quantitative = poor: yes
|   |   |   Analytical = average
|   |   |   |   Quantitative = good
|   |   |   |   |   sequence = good: yes
|   |   |   |   |   sequence = average: no
|   |   |   |   Quantitative = average: no
|   |   |   |   Quantitative = poor: no
|   |   Logic = average
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average
|   |   |   |   sequence = good: no
|   |   |   |   sequence = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   Quantitative = poor: no
|   |   Logic = poor: no
|   English = good
|   |   sequence = good: no
|   |   sequence = average
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average
|   |   |   |   Analytical = good: no
|   |   |   |   Analytical = average
|   |   |   |   |   Logic = good: no
|   |   |   |   |   Logic = average: yes
|   |   |   |   |   Logic = poor: yes
|   |   |   Quantitative = poor
|   |   |   |   Logic = good
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   Logic = average: no
|   |   |   |   Logic = poor: no
|   English = average
|   |   sequence = good: no
|   |   sequence = average
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average
|   |   |   |   Logic = good
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   |   Logic = average: no
|   |   |   |   Logic = poor: no
|   |   |   Quantitative = poor
|   |   |   |   Analytical = good
|   |   |   |   |   Logic = good: no
|   |   |   |   |   Logic = average: no
|   |   |   |   |   Logic = poor: yes
|   |   |   |   Analytical = average
|   |   |   |   |   Logic = good: yes
|   |   |   |   |   Logic = average: yes
|   |   |   |   |   Logic = poor: no
|   English = poor
|   |   Analytical = good
|   |   |   Quantitative = good
|   |   |   |   sequence = good
|   |   |   |   |   Logic = good: no
|   |   |   |   |   Logic = average: yes
|   |   |   |   |   Logic = poor: yes
|   |   |   |   sequence = average: no
|   |   |   Quantitative = average
|   |   |   |   sequence = good
|   |   |   |   |   Logic = good: yes
|   |   |   |   |   Logic = average: yes
|   |   |   |   |   Logic = poor: no
|   |   |   |   sequence = average: no
|   |   |   Quantitative = poor
|   |   |   |   sequence = good: no
|   |   |   |   sequence = average: yes
|   |   Analytical = average
|   |   |   sequence = good: no
|   |   |   sequence = average
|   |   |   |   Quantitative = good
|   |   |   |   |   Logic = good: no
|   |   |   |   |   Logic = average: no
|   |   |   |   |   Logic = poor: yes
|   |   |   |   Quantitative = average: no
|   |   |   |   Quantitative = poor
|   |   |   |   |   Logic = good: yes
|   |   |   |   |   Logic = average: yes
|   |   |   |   |   Logic = poor: no
ESM = C+
|   English = vgood
|   |   Quantitative = good
|   |   |   Logic = good: no
|   |   |   Logic = average: no
|   |   |   Logic = poor
|   |   |   |   sequence = good: no
|   |   |   |   sequence = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   Quantitative = average
|   |   |   Logic = good: no
|   |   |   Logic = average: no
|   |   |   Logic = poor: no
|   |   Quantitative = poor: no
|   English = good
|   |   Logic = good
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average: no
|   |   |   Quantitative = poor
|   |   |   |   sequence = good: no
|   |   |   |   sequence = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   Logic = average: no
|   |   Logic = poor
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average
|   |   |   |   sequence = good
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   |   sequence = average: no
|   |   |   Quantitative = poor: no
|   English = average
|   |   Quantitative = good: no
|   |   Quantitative = average: no
|   |   Quantitative = poor
|   |   |   Logic = good: no
|   |   |   Logic = average: no
|   |   |   Logic = poor
|   |   |   |   sequence = good
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   sequence = average: no
|   English = poor: no
ESM = C
|   Analytical = good
|   |   Quantitative = good
|   |   |   English = vgood
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average: no
|   |   |   |   Logic = poor
|   |   |   |   |   sequence = good: no
|   |   |   |   |   sequence = average: yes
|   |   |   English = good: no
|   |   |   English = average: no
|   |   |   English = poor
|   |   |   |   Logic = good
|   |   |   |   |   sequence = good: yes
|   |   |   |   |   sequence = average: no
|   |   |   |   Logic = average: no
|   |   |   |   Logic = poor: no
|   |   Quantitative = average: no
|   |   Quantitative = poor: no
|   Analytical = average
|   |   Quantitative = good
|   |   |   English = vgood
|   |   |   |   Logic = good
|   |   |   |   |   sequence = good: no
|   |   |   |   |   sequence = average: yes
|   |   |   |   Logic = average: no
|   |   |   |   Logic = poor: no
|   |   |   English = good: no
|   |   |   English = average: no
|   |   |   English = poor: no
|   |   Quantitative = average
|   |   |   Logic = good
|   |   |   |   English = vgood
|   |   |   |   |   sequence = good: yes
|   |   |   |   |   sequence = average: no
|   |   |   |   English = good: no
|   |   |   |   English = average: no
|   |   |   |   English = poor
|   |   |   |   |   sequence = good: no
|   |   |   |   |   sequence = average: yes
|   |   |   Logic = average: no
|   |   |   Logic = poor
|   |   |   |   English = vgood: no
|   |   |   |   English = good: no
|   |   |   |   English = average: yes
|   |   |   |   English = poor: no
|   |   Quantitative = poor
|   |   |   English = vgood: no
|   |   |   English = good
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average: yes
|   |   |   |   Logic = poor: no
|   |   |   English = average: no
|   |   |   English = poor: no
ESM = D+
|   sequence = good
|   |   English = vgood
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   |   Logic = poor: no
|   |   |   Quantitative = poor: no
|   |   English = good
|   |   |   Quantitative = good
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   Logic = poor: no
|   |   |   Quantitative = average: no
|   |   |   Quantitative = poor
|   |   |   |   Logic = good
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   |   Logic = average: no
|   |   |   |   Logic = poor: no
|   |   English = average
|   |   |   Quantitative = good
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   Logic = poor
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   Quantitative = average: no
|   |   |   Quantitative = poor
|   |   |   |   Logic = good
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   |   Logic = average: no
|   |   |   |   Logic = poor: no
|   |   English = poor
|   |   |   Quantitative = good
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average: no
|   |   |   |   Logic = poor
|   |   |   |   |   Analytical = good: no
|   |   |   |   |   Analytical = average: yes
|   |   |   Quantitative = average
|   |   |   |   Logic = good: no
|   |   |   |   Logic = average
|   |   |   |   |   Analytical = good: yes
|   |   |   |   |   Analytical = average: no
|   |   |   |   Logic = poor: no
|   |   |   Quantitative = poor: no
|   sequence = average
|   |   Analytical = good
|   |   |   English = vgood
|   |   |   |   Quantitative = good: no
|   |   |   |   Quantitative = average: no
|   |   |   |   Quantitative = poor
|   |   |   |   |   Logic = good: no
|   |   |   |   |   Logic = average: yes
|   |   |   |   |   Logic = poor: no
|   |   |   English = good
|   |   |   |   Quantitative = good
|   |   |   |   |   Logic = good: yes
|   |   |   |   |   Logic = average: yes
|   |   |   |   |   Logic = poor: no
|   |   |   |   Quantitative = average
|   |   |   |   |   Logic = good: no
|   |   |   |   |   Logic = average: no
|   |   |   |   |   Logic = poor: yes
|   |   |   |   Quantitative = poor
|   |   |   |   |   Logic = good: yes
|   |   |   |   |   Logic = average: yes
|   |   |   |   |   Logic = poor: no
|   |   |   English = average
|   |   |   |   Quantitative = good
|   |   |   |   |   Logic = good: yes
|   |   |   |   |   Logic = average: yes
|   |   |   |   |   Logic = poor: no
|   |   |   |   Quantitative = average
|   |   |   |   |   Logic = good: yes
|   |   |   |   |   Logic = average: no
|   |   |   |   |   Logic = poor: yes
|   |   |   |   Quantitative = poor: no
|   |   |   English = poor
|   |   |   |   Logic = good: yes
|   |   |   |   Logic = average
|   |   |   |   |   Quantitative = good: yes
|   |   |   |   |   Quantitative = average: no
|   |   |   |   |   Quantitative = poor: no
|   |   |   |   Logic = poor: no
|   |   Analytical = average
|   |   |   Quantitative = good: no
|   |   |   Quantitative = average
|   |   |   |   English = vgood: no
|   |   |   |   English = good: no
|   |   |   |   English = average: no
|   |   |   |   English = poor
|   |   |   |   |   Logic = good: no
|   |   |   |   |   Logic = average: no
|   |   |   |   |   Logic = poor: yes
|   |   |   Quantitative = poor
|   |   |   |   Logic = good
|   |   |   |   |   English = vgood: no
|   |   |   |   |   English = good: no
|   |   |   |   |   English = average: no
|   |   |   |   |   English = poor: yes
|   |   |   |   Logic = average
|   |   |   |   |   English = vgood: yes
|   |   |   |   |   English = good: yes
|   |   |   |   |   English = average: no
|   |   |   |   |   English = poor: no
|   |   |   |   Logic = poor
|   |   |   |   |   HSSC Marks = A+: no
|   |   |   |   |   HSSC Marks = A: no
|   |   |   |   |   HSSC Marks = B: no
|   |   |   |   |   HSSC Marks = C
|   |   |   |   |   |   English = vgood: no
|   |   |   |   |   |   English = good: no
|   |   |   |   |   |   English = average: no
|   |   |   |   |   |   English = poor
|   |   |   |   |   |   |   Gender = male: no
|   |   |   |   |   |   |   Gender = female: yes
ESM = D: yes
ESM = F: yes

Time taken to build model: 0.02 seconds

6.2.3 === Evaluation on test split ===

=== Summary ===

Correctly Classified Instances        3088               98.5637 %
Incorrectly Classified Instances        45                1.4363 %
Kappa statistic                          0.9678
Mean absolute error                      0.0144
Root mean squared error                  0.1198
Relative absolute error                  3.2148 %
Root relative squared error             25.4112 %
Total Number of Instances             3133

Figure 13 Evaluation on test split

6.2.4 === Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.986     0.015     0.992       0.986    0.989       0.985      no
                 0.985     0.014     0.973       0.985    0.979       0.985      yes
Weighted Avg.    0.986     0.015     0.986       0.986    0.986       0.985

=== Confusion Matrix ===

    a    b   <-- classified as
 2058   29 |    a = no
   16 1030 |    b = yes

Figure 15 C4.5 result

7 Conclusion

In this thesis, classification is applied to a student database to predict students' performance on the basis of the previous semester's data. Among the many approaches used for data classification, the decision tree method is used here. Information such as attendance, mid-term marks, assignment marks, entry test marks and other variables was collected from the student database to predict performance at the end of the semester, and information such as gender, ESM, HSSC marks, entry test marks and other variables was collected from the student database to identify the dropout issue. This study can help students and teachers to improve the students' division. It can also help to identify the students who need special attention, so that the failure ratio can be reduced by taking appropriate action at the right time.

8. References

[1] J. R. Quinlan, “Induction of Decision Trees”, Machine Learning, Vol. 1, pp. 81-106, 1986.
[2] J. R. Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers, Inc., 1992.
[3] Brijesh Kumar Bhardwaj and Saurabh Pal, “Mining Educational Data to Analyze Students’ Performance”, (IJCSIS), Vol. 2, No. 6, 2011.
[4] Abeer Badr El Din Ahmed and Ibrahim Sayed Elaraby, “Data Mining: A Prediction for Student's Performance Using Classification Method”, (WJCAT), 2(2): 43-47, 2014.
[5] Brijesh Kumar Bhardwaj and Saurabh Pal, “Data Mining: A Prediction for Performance Improvement Using Classification”, (IJCSIS), Vol. 9, No. 4, April 2011.
[6] Surjeet Kumar Yadav and Saurabh Pal, “Data Mining: A Prediction for Performance Improvement of Engineering Students Using Classification”, (WCSIT), Vol. 2, No. 2, pp. 51-56, 2012.
[7] Saurabh Pal, “Mining Educational Data Using Classification to Decrease Dropout Rate of Students”, (IJMSE), Vol. 3, No. 5, May 2012.
[8] Saurabh Pal, “Mining Educational Data to Reduce Dropout Rates of Engineering Students”, (IJIEEB), 2012.
[9] K. Shanmuga Priya and A. V. Senthil Kumar, “Improving the Student’s Performance Using Educational Data Mining”, (IJANA), Vol. 4, Issue 4, pp. 1680-1685, 2013.
[10] Pimpa Cheewaprakobkit, “Study of Factors Analysis Affecting Academic Achievement of Undergraduate Students in International Program”, IMECS 2013, March 13-15, 2013.
[11] Ajay Kumar Pal and Saurabh Pal, “Data Mining Techniques in EDM for Predicting the Performance of Students”, (JCSIT), Vol. 2, Issue 6, November 2013.
[12] U. K. Pandey and S. Pal, “Data Mining: A Prediction of Performer or Underperformer Using Classification”, International Journal of Computer Science and Information Technology (IJCSIT), Vol. 2(2), pp. 686-690, ISSN: 0975-9646, 2011.
[13] Mrinal Pandey and Vivek Kumar Sharma, “A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction”, International Journal of Computer Applications (0975-8887), Vol. 61, No. 13, January 2013.
[14] Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao, “Predicting Students’ Performance Using ID3 and C4.5 Classification Algorithms”, International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 3, No. 5, September 2013.
[15] Bangsuk Jantawan and Cheng-Fa Tsai, “The Application of Data Mining to Build Classification Model for Predicting Graduate Employment”, International Journal of Computer Science and Information Security (IJCSIS), Vol. 11, No. 10, October 2013.
[16] P. Ajith, M. S. S. Sai and B. Tejaswi, “Evaluation of Student Performance: An Outlier Detection Perspective”, International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Vol. 2, Issue 2, January 2013.
[17] Margaret H. Dunham, “Data Mining: Introductory and Advanced Topics”.
[18] http://en.wikipedia.org/wiki/Predictive_modelling