Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Volume 11 Number 1 World Institute for Engineering and Technology Education (WIETE) World Transactions on Engineering and Technology Education Editor-in-Chief: Zenon j. Pudlowski World Institute for Engin eering and Technology Education (WIETE) Melbourne, Australia MELBOURNE 2013 World Transactions on Engineering and Technology Education Contents Z.J. Pudlowski Editorial Pozenel, M. Assess ing teamwork in a software engineering capstone course Zhai, H. Ideas by a local college for building the course major, Materials Formation and Control Engineering 13 Zhan, Q., Liang, J. & Gu, Q. Acti vity theory as a framewo rk of information architecture fo r an e-learning system in vocational education 19 Ward, R.B. Leadership and leopards 25 Varona, D., Capretz, L.F. & A multicultural comparison of software engineers 31 Njock Libii, J. Comparing the calculated coefficients of performance of a class of wind turbines that produce p0wer between 330 kW and 7,500 kW 36 Wang, W-C. & Li, J. The design of an open laboratory information management system based upon a browser/server (8 /S) archi tecture 41 Kasih, J., Ayub, M. & Susanto, S. Predicting students' final pass ing results using the Classification and Regression Trees (CART) algorithm 46 6 Raza, A. Index of Authors 50 4 World fransac1ions on Engineering and Technology Ld11ca1ion Vol. II, No. I, 2013 ©2013 WllJL Predicting students' final passing results using the Classification and Regression Trees (CART) algorithm Julianti Kasiht, Mewati Ayubt & Sani Susantot Maranatha Christian University, Bandung, lndonesiat Parahyangan Catholic Un iversity, Bandung, Indones iat ABSTRACT: The aim of this research shares that of a previous research programme that involved a lecturer as an academic advisor helping students to predict their final passing results based on their performance in several subjects in early semesters of their study period [ 1]. Previous research use d discriminant analysis, which is considered to be impractical in this instance. In this research, a more practical method is introduced and applied, based upon the classification of functions of data mining. The classification was performed usi ng software based upon the Classification and Regression Trees (CART) algorithm. INTRODUCTION Ideally, this work is the fulfilment of the authors ' promise to undertake research to provide academic advisors with a more practical way of predicting the final passing results of a student [I]. The passing results in the Indonesian education system are classified into three grades: Extraordinary (Cum Laude), Very Satisfactory and Satisfactory. The arguments for making this kind of prediction are as follows: • First, one of the important aims of higher education in the Republic of Indonesia is to prepare the academic participants (students) to become members of society with the academic and/or professional abilities to enable them to apply/develop/enrich the foundations of knowledge in the sc iences, technology and the arts. Second, to achieve this aim, undergraduate students are assigned to several academic advisors (an informal translation of the Indonesian term Dasen Wali) throughout their years of higher educational studies. Third, academic advisors, who are lecturers, have as their main task the fostering of students ' academic and nonacademic activities. With regard to students ' academic activities, one of the duties of the academic advisor is to help students in setting up their study plans for each semeste r. Fourth, setting up a study plan includes providing guidance for students regarding how many subjects, and which subjects, to undertake. Fifth, through this guidance, students are expected to obtain the best passing results at the end of their undergraduate study [ 1][2]. In the previous research, it was demonstrated that discriminant analysis helped academic advisors in the Faculty of Information Technology, University X in Bandung, West Java, Indonesia, to predict the final passing results of a student based on his/her performance in some subjects in the early stage, i.e. the first four semesters, of a higher education study programme. (Please note that for reasons of confidentiality, the full name of the institution has not been included). This facility enables academic advisors to assist students to set up their study plans for each semester, so that the students perform to their full potential [!]. This work aims to help the academic advisors with a more practical way of predicting the final passing results of a student. In this research, a data mining task called classification was employed. Classification is performed through a technique called Classification and Regression Trees (CART), which diagrammatically is presented in the form of decision trees. This kind of diagram tree serves in a more practical manner compared to the territorial map employed in the previous research [I]. 46 OVERVIEW OF THE BACKGROUN D THEORY The MIT Technology Review chose data mining as one of ten emerging techno logies that will change the world . umerous definitions of data mining are available. However, Larose concisely defined data mining as discovering knowledge in data in the title of one of his publications [3]. Furthermore, classification is one of the six main tasks of data mining; the others being description, estimati on, prediction, clustering and association . Some important points about classification are as follows: In classification, there are two types of variables. The first is a target variable. This variable is categorical, which means it could be partitioned into seve ral classes or categories. Second is a set of predictor variables. The classification task examines a large set of records, called the training data. Each record contains information about the target variable, as well as a set of input or predictor variables. The algorithm applied to carry out the classification task wou ld, firstly, examine the training data set containing both the predictor variab les and the (already classified) target variable. The algorithm learns which combinations of predictor variables are associated with which target variable. Then, the algorithm looks at new records, for which no information about the target variable is available . Based on the classifications in the training data set, the algorithm assigns classifications to the new records . There are two prominent algorithms in the area of classification: the Classification and Regression Trees (CART) algorithm and C4.5 algorithm . These two algorithms result in a decision tree, which is, basically, a collection of decision nodes, connected by branches, extending downward from the root node until terminating in leaf nodes. The CART algorithm produces a binary tree, while the C4.5 algorithm is not restricted to binary splits. The binary feature of the CART algorithm produces a very readable decision tree and for this reason is chosen for this research . The detailed calculation was performed using Clementine I 0.1 data mining software [3] . EXPERIMENT: THE RESULT AND I TERPRET ATI ON As in the previous research programme, the academic transcripts from 146 alumni served as input data, which were available from the first author [I]. From this work, the students ' final passing resul!s are detem1ined by the final marks from the following five subjects: IF I 02 (1ntroduction to Computer Application), IF I 03 (Introduction to Information Technology), IF I 04 (Algorithms and Programming), IF I 05 (Basic Programming) and IF202 (Linear Algebra and Matrices). The final marks of these five subjects take the role of predictor variables. The final marks for a subject are classified into five groups: A (High Distinction), B (Distinction), C (Credit), D (Pass) and E (Fail) with some intermediates such as B+ and C+ . Based on the passing results, the students ' final passing results were classified into three groups: I - Extraordinary (Cum Laude); 2 - Very Satisfactory; and 3 - Satisfactory. The students ' final passing results take the role of target variable . .. l dl~ +CARTt.lodcl - JuliM?i Ckmentirwa10.t [ lie gd·~ JMert Yttw ! Ool s §l.rpefNOOe r:W! a'*' CG!d & ts ,._ ,O ., ~ tt.i t , J1> • !:!flD • *'' CFilS?-OM i Classes ~-"'- ,~<~i: E- 6{1.r.U'oed;:ioJr.!) ~ BllSbJ:s:s~stald'(1 ;. G3o D~l.Jnd~~*i; {:~ ;:ezw,;,.-~ lg ~ l:-J Figure I : The classification model representation in Clementine I 0.1. 47 The Cleme ntine I 0.1 model is re presented in Figure I. In this figure , the icon on : the left side describes the input data, the top right side describes the algorithm employed, in this case, the CART algori thm, and the bottom right side describes the output of the algorithm. The input data are saved in the form of an SPSS worksheet file , a section of which is shown in Figure 2. Dlstrnctlon 0.shnchOrl A 6• A A A c 0 8• D1sllntllOC'I c H1;hDtshnchon 8• O.s1JnC110n Hi;hOtshnc11en A 0.UtnellOn C• Pass A A c A A 8 A B+ 8• A A 01S1inttion 0.shnchon .High01stinC1ion 01s1inchon Distinction O.stinchon c p,., C• A A A 8 8 ~ Oisiintlion Onlinc11on High Distinction Ontinction A A Otstinc1ion 8• A A Distinction c .A ·0isfoc1ion ·C A ·Distinction "High Distinction figure 2: A part of the data fil e. The result of this model m the form of a two-level decision tree is presented in Figure 3. An interpretation of this diagram follows . NodeO n: % Cateoory : • ();stinctlon 67.808 29.452 2-740 99 : 43 • 4 : 100.000 146 : IF103 A: B+ B; C:C+ I Node2 Node1 Cal~!::!: • Distincllon • High Distinction • Pass Total "' 43.077 55.3SS 1.538 44.521 n Category 28 • Ofstinction 36 • High Distinction 1 • Pass 65 Total "' 87.654 8.642 3 .704 55.479 7l 7 3 81 Figure 3: The two-level decision tree output. Node-0 describes the following information: out of 146 alumni, the number of those with the final passing results High Distinction, Distinction an d Pass are, respectively, 43 (29.452%), 99 (67 .808%) and 4 (2. 740%). 48 Node-I branch represents a group of alumni who received an A or B+ grade for the subject IF 103 (Introduction to Information Technol ogy). From the training data , it was predicted that futu re students belonging to this group would finis h their studies with a High Distinction, Di stinction and Pass, in their fina l passing resul ts 55.385%, 43.077% and 1.53 8% probabilities, respectively. A similar interpretation applies to ode-2. The decision tree can be extended into a longer version. For instance, a three-l evel decision tree for this research problem is g iven in Figure 4. N odeO Cat~ori % n 67.808 29.452 2 .740 99 43 4 100.000 146 • Distinction • Htgh D istinction a Pass Total I..:: I IF103 A: B.,. B; C: CT I I Node 1 Node4 Cal'~O[i % C a t~O!)! n 36 • Pass 43.077 55.385 l.538 Total 44.521 65 • Di.stinction • High Distinction 28 1 % • Distinction • High D istinction • Pass Total n 87.654 8 .642 3 .704 71 7 3 55.479 81 l.:: I IF104 A: a.+ B· C· CT ·_f ______ ________ _, .•·---- ---- -··-Node 3 • Node 2 _c_a_teg ~or~y~_ _ _%_ _ _n i • Distinction • High D istinction 27'.273 72-727 12 32 •_ P_a_s_ s _ _ _ _ _o_.o_oo _ _o Total 30.137 44 '--------------' Category % n ~ 76.190 16 ~ 4 : 19.048 _ _ _ __ __ 1 4 .762 : • D istinction : • High D istinction : •_ P_as _-_s _ · Total 14.384 21 ·- --- -------------·----------- --· Figure 4: The three-level decision tree output. CONCLUSIONS AND SUGGESTION FOR FURTHER RESEARCH This research demonstrates how academic advisors can predict the final passing resulcs of students. Compared to the previous research , thi s demonstrates that the prediction can be performed in a more practical manner[!]. In the previous research, it was necessary to first generate the two discriminant functions and a territorial map, followed by the calculation of two discriminant scores for each student. Finally, the two scores were plotted on the map to find the corresponding group, which represents a group of students ' final pass ing results . In this research, all this is represented by a tree diagram, called a decision tree . Further research may take the form of the investigation of the prediction of the final passing results, which mi ght be based on a data mining task called association . Association is performed through a technique called Market Basket Analysis . Such research will be undertaken in the not too distant future . ACKNOW LEDGMENT The authors would like to express their sincerest thanks to Mr Radiant Victor Imbar, former Dean of the Facul ty of Information Technol ogy at Maranatha Christian Uni versity, for hi s support and providing the data from the academ ic transcripts of alumni . REFERENCES l. 2. 3. Kasih, J. and Susanto, S., Predicting students ' final results through discri minant analys is. World Transactions on Engng. and Technol. Educ., 10, 2, 144-147 (20 12). The Republic ofl ndones ia. Government Regulation about Higher Education, 60 (1999). Larose, D.T., Discovering Knowledge in Data: An Introduction to Data Mining. Hoboken, New Jersey, USA: John Wiley & Sons, Inc. (2005). 49 World Transactions on Engineering and Technology Education Association of Ta iwan Engineering College of Engineering & Engineering Tcchnoloh'Y Education and Management (ATE EM) Nonhem Ill inois Taipd. Ta iwan niversity Col lege of Management Hungkuang Un iversity Taichung. Taiwan NYU•pCJly ~'YU K L niversi ty Vijayawada. Guntur Di strict Polytechnic Institute of cw York Uni versity Andhra Pradesh. India New York. USA TechnolOb'Y Academy for Researc h (C-STAR) Chcnnai. India DeKalb, Illinois. US A POLYTECHSIC JSSTITITTE OF Cornmon" ealt h Sc ience and • Russian Association of Engineerin g Universities Moscow, Ru ssia Technological Education Inst itute of Pi raeus Piraeus-Ath ens. Greece Uni\crsi ty ofBots,,ana Gaborone. Botswana Published by: World Institute for Engi neering and Technology Education (WIET E) 34 Hampshire Road, Glen Waverley, Me lbourne, VIC 3150, Australia E-mail: [email protected] Internet: http://www.wiete.com .au 2013 WIETE !SS 1446-2257 Articles published in the World Transactions on Engineering and Technology Education (WTE&T E) have undergone a formal and rigorous process of peerreview by international referees . Thi s Journa l is copyright. Apart !Tom any fair dea ling for the purpose of private study, research, critici sm or review as permitted under the Copyright Act, no part of this Journal may be reproduced by any process without the written permission of the publisher. Re.1ponsibili1y fo r the con/ent.1· of these articles rests upon the authors and not the publisher. Data presented and conclusions de veloped by the authors are for information on~y and are not intended jar use without independent substmuiaring investigarions 011 the pan of !11e porential user.