Download World Transactions on Engineering and Technology Education Volume 11 Number 1

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

K-nearest neighbors algorithm wikipedia , lookup

Transcript
Volume 11 Number 1
World Institute
for Engineering
and Technology
Education (WIETE)
World Transactions on
Engineering and Technology
Education
Editor-in-Chief:
Zenon j. Pudlowski
World Institute for Engin eering and Technology Education (WIETE)
Melbourne, Australia
MELBOURNE 2013
World Transactions on Engineering and Technology Education
Contents
Z.J. Pudlowski
Editorial
Pozenel, M.
Assess ing teamwork in a software engineering capstone course
Zhai, H.
Ideas by a local college for building the course major, Materials Formation
and Control Engineering
13
Zhan, Q., Liang, J. & Gu, Q.
Acti vity theory as a framewo rk of information architecture fo r an e-learning
system in vocational education
19
Ward, R.B.
Leadership and leopards
25
Varona, D., Capretz, L.F. &
A multicultural comparison of software engineers
31
Njock Libii, J.
Comparing the calculated coefficients of performance of a class of wind
turbines that produce p0wer between 330 kW and 7,500 kW
36
Wang, W-C. & Li, J.
The design of an open laboratory information management system based
upon a browser/server (8 /S) archi tecture
41
Kasih, J., Ayub, M. & Susanto, S.
Predicting students' final pass ing results using the Classification and
Regression Trees (CART) algorithm
46
6
Raza, A.
Index of Authors
50
4
World fransac1ions on Engineering and Technology Ld11ca1ion
Vol. II, No. I, 2013
©2013 WllJL
Predicting students' final passing results using the Classification and Regression
Trees (CART) algorithm
Julianti Kasiht, Mewati Ayubt & Sani Susantot
Maranatha Christian University, Bandung, lndonesiat
Parahyangan Catholic Un iversity, Bandung, Indones iat
ABSTRACT: The aim of this research shares that of a previous research programme that involved a lecturer as an
academic advisor helping students to predict their final passing results based on their performance in several subjects in
early semesters of their study period [ 1]. Previous research use d discriminant analysis, which is considered to be
impractical in this instance. In this research, a more practical method is introduced and applied, based upon the
classification of functions of data mining. The classification was performed usi ng software based upon the
Classification and Regression Trees (CART) algorithm.
INTRODUCTION
Ideally, this work is the fulfilment of the authors ' promise to undertake research to provide academic advisors with a
more practical way of predicting the final passing results of a student [I]. The passing results in the Indonesian
education system are classified into three grades: Extraordinary (Cum Laude), Very Satisfactory and Satisfactory.
The arguments for making this kind of prediction are as follows:
•
First, one of the important aims of higher education in the Republic of Indonesia is to prepare the academic
participants (students) to become members of society with the academic and/or professional abilities to enable
them to apply/develop/enrich the foundations of knowledge in the sc iences, technology and the arts.
Second, to achieve this aim, undergraduate students are assigned to several academic advisors (an informal
translation of the Indonesian term Dasen Wali) throughout their years of higher educational studies.
Third, academic advisors, who are lecturers, have as their main task the fostering of students ' academic and nonacademic activities. With regard to students ' academic activities, one of the duties of the academic advisor is to
help students in setting up their study plans for each semeste r.
Fourth, setting up a study plan includes providing guidance for students regarding how many subjects, and which
subjects, to undertake.
Fifth, through this guidance, students are expected to obtain the best passing results at the end of their
undergraduate study [ 1][2].
In the previous research, it was demonstrated that discriminant analysis helped academic advisors in the Faculty of
Information Technology, University X in Bandung, West Java, Indonesia, to predict the final passing results of a
student based on his/her performance in some subjects in the early stage, i.e. the first four semesters, of a higher
education study programme. (Please note that for reasons of confidentiality, the full name of the institution has not been
included). This facility enables academic advisors to assist students to set up their study plans for each semester, so that
the students perform to their full potential [!].
This work aims to help the academic advisors with a more practical way of predicting the final passing results of a
student. In this research, a data mining task called classification was employed. Classification is performed through a
technique called Classification and Regression Trees (CART), which diagrammatically is presented in the form of
decision trees. This kind of diagram tree serves in a more practical manner compared to the territorial map employed in
the previous research [I].
46
OVERVIEW OF THE BACKGROUN D THEORY
The MIT Technology Review chose data mining as one of ten emerging techno logies that will change the world .
umerous definitions of data mining are available. However, Larose concisely defined data mining as discovering
knowledge in data in the title of one of his publications [3]. Furthermore, classification is one of the six main tasks of
data mining; the others being description, estimati on, prediction, clustering and association .
Some important points about classification are as follows:
In classification, there are two types of variables. The first is a target variable. This variable is categorical, which
means it could be partitioned into seve ral classes or categories. Second is a set of predictor variables. The
classification task examines a large set of records, called the training data. Each record contains information about
the target variable, as well as a set of input or predictor variables.
The algorithm applied to carry out the classification task wou ld, firstly, examine the training data set containing
both the predictor variab les and the (already classified) target variable. The algorithm learns which combinations
of predictor variables are associated with which target variable. Then, the algorithm looks at new records, for
which no information about the target variable is available . Based on the classifications in the training data set, the
algorithm assigns classifications to the new records .
There are two prominent algorithms in the area of classification: the Classification and Regression Trees (CART)
algorithm and C4.5 algorithm . These two algorithms result in a decision tree, which is, basically, a collection of
decision nodes, connected by branches, extending downward from the root node until terminating in leaf nodes.
The CART algorithm produces a binary tree, while the C4.5 algorithm is not restricted to binary splits.
The binary feature of the CART algorithm produces a very readable decision tree and for this reason is chosen for
this research . The detailed calculation was performed using Clementine I 0.1 data mining software [3] .
EXPERIMENT: THE RESULT AND I TERPRET ATI ON
As in the previous research programme, the academic transcripts from 146 alumni served as input data, which were
available from the first author [I]. From this work, the students ' final passing resul!s are detem1ined by the final marks
from the following five subjects: IF I 02 (1ntroduction to Computer Application), IF I 03 (Introduction to Information
Technology), IF I 04 (Algorithms and Programming), IF I 05 (Basic Programming) and IF202 (Linear Algebra and
Matrices). The final marks of these five subjects take the role of predictor variables. The final marks for a subject are
classified into five groups: A (High Distinction), B (Distinction), C (Credit), D (Pass) and E (Fail) with some
intermediates such as B+ and C+ .
Based on the passing results, the students ' final passing results were classified into three groups: I - Extraordinary (Cum
Laude); 2 - Very Satisfactory; and 3 - Satisfactory. The students ' final passing results take the role of target variable .
.. l dl~
+CARTt.lodcl - JuliM?i Ckmentirwa10.t
[ lie
gd·~
JMert
Yttw
! Ool s
§l.rpefNOOe
r:W! a'*'
CG!d & ts ,._ ,O ., ~ tt.i t ,
J1> •
!:!flD
•
*''
CFilS?-OM i Classes ~-"'- ,~<~i:
E- 6{1.r.U'oed;:ioJr.!)
~ BllSbJ:s:s~stald'(1
;. G3o D~l.Jnd~~*i;
{:~ ;:ezw,;,.-~
lg ~
l:-J
Figure I : The classification model representation in Clementine I 0.1.
47
The Cleme ntine I 0.1 model is re presented in Figure I. In this figure , the icon on :
the left side describes the input data,
the top right side describes the algorithm employed, in this case, the CART algori thm, and
the bottom right side describes the output of the algorithm.
The input data are saved in the form of an SPSS worksheet file , a section of which is shown in Figure 2.
Dlstrnctlon
0.shnchOrl
A
6•
A
A
A
c
0
8•
D1sllntllOC'I
c
H1;hDtshnchon
8•
O.s1JnC110n
Hi;hOtshnc11en
A
0.UtnellOn
C•
Pass
A
A
c
A
A
8
A
B+
8•
A
A
01S1inttion
0.shnchon
.High01stinC1ion
01s1inchon
Distinction
O.stinchon
c
p,.,
C•
A
A
A
8
8
~
Oisiintlion
Onlinc11on
High Distinction
Ontinction
A
A
Otstinc1ion
8•
A
A
Distinction
c
.A
·0isfoc1ion
·C
A
·Distinction
"High Distinction
figure 2: A part of the data fil e.
The result of this model m the form of a two-level decision tree is presented in Figure 3. An interpretation of this
diagram follows .
NodeO
n:
%
Cateoory
: • ();stinctlon
67.808
29.452
2-740
99 :
43 •
4 :
100.000 146 :
IF103
A: B+
B; C:C+
I
Node2
Node1
Cal~!::!:
• Distincllon
• High Distinction
• Pass
Total
"'
43.077
55.3SS
1.538
44.521
n
Category
28
• Ofstinction
36
• High Distinction
1
• Pass
65
Total
"'
87.654
8.642
3 .704
55.479
7l
7
3
81
Figure 3: The two-level decision tree output.
Node-0 describes the following information: out of 146 alumni, the number of those with the final passing results High
Distinction, Distinction an d Pass are, respectively, 43 (29.452%), 99 (67 .808%) and 4 (2. 740%).
48
Node-I branch represents a group of alumni who received an A or B+ grade for the subject IF 103 (Introduction to
Information Technol ogy). From the training data , it was predicted that futu re students belonging to this group would
finis h their studies with a High Distinction, Di stinction and Pass, in their fina l passing resul ts 55.385%, 43.077% and
1.53 8% probabilities, respectively. A similar interpretation applies to ode-2.
The decision tree can be extended into a longer version. For instance, a three-l evel decision tree for this research
problem is g iven in Figure 4.
N odeO
Cat~ori
%
n
67.808
29.452
2 .740
99
43
4
100.000 146
• Distinction
• Htgh D istinction
a Pass
Total
I..::
I
IF103
A: B.,.
B; C: CT
I
I
Node 1
Node4
Cal'~O[i
%
C a t~O!)!
n
36
• Pass
43.077
55.385
l.538
Total
44.521
65
• Di.stinction
• High Distinction
28
1
%
• Distinction
• High D istinction
• Pass
Total
n
87.654
8 .642
3 .704
71
7
3
55.479
81
l.::
I
IF104
A:
a.+
B· C· CT
·_f ______ ________ _,
.•·---- ---- -··-Node 3
•
Node 2
_c_a_teg
~or~y~_ _ _%_ _ _n i
• Distinction
• High D istinction
27'.273
72-727
12
32
•_ P_a_s_
s _ _ _ _ _o_.o_oo
_ _o
Total
30.137
44
'--------------'
Category
%
n
~
76.190 16 ~
4 :
19.048
_ _ _ __
__ 1
4 .762
: • D istinction
: • High D istinction
: •_ P_as
_-_s _
·
Total
14.384
21
·- --- -------------·----------- --·
Figure 4: The three-level decision tree output.
CONCLUSIONS AND SUGGESTION FOR FURTHER RESEARCH
This research demonstrates how academic advisors can predict the final passing resulcs of students. Compared to the
previous research , thi s demonstrates that the prediction can be performed in a more practical manner[!]. In the previous
research, it was necessary to first generate the two discriminant functions and a territorial map, followed by the
calculation of two discriminant scores for each student. Finally, the two scores were plotted on the map to find the
corresponding group, which represents a group of students ' final pass ing results . In this research, all this is represented
by a tree diagram, called a decision tree .
Further research may take the form of the investigation of the prediction of the final passing results, which mi ght be
based on a data mining task called association . Association is performed through a technique called Market Basket
Analysis . Such research will be undertaken in the not too distant future .
ACKNOW LEDGMENT
The authors would like to express their sincerest thanks to Mr Radiant Victor Imbar, former Dean of the Facul ty of
Information Technol ogy at Maranatha Christian Uni versity, for hi s support and providing the data from the academ ic
transcripts of alumni .
REFERENCES
l.
2.
3.
Kasih, J. and Susanto, S., Predicting students ' final results through discri minant analys is. World Transactions on
Engng. and Technol. Educ., 10, 2, 144-147 (20 12).
The Republic ofl ndones ia. Government Regulation about Higher Education, 60 (1999).
Larose, D.T., Discovering Knowledge in Data: An Introduction to Data Mining. Hoboken, New Jersey, USA: John
Wiley & Sons, Inc. (2005).
49
World Transactions on Engineering and Technology Education
Association of Ta iwan Engineering
College of Engineering &
Engineering Tcchnoloh'Y
Education and Management
(ATE EM)
Nonhem Ill inois
Taipd. Ta iwan
niversity
Col lege of Management
Hungkuang Un iversity
Taichung. Taiwan
NYU•pCJly
~'YU
K L niversi ty
Vijayawada. Guntur Di strict
Polytechnic Institute
of cw York Uni versity
Andhra Pradesh. India
New York. USA
TechnolOb'Y Academy for
Researc h (C-STAR)
Chcnnai. India
DeKalb, Illinois. US A
POLYTECHSIC JSSTITITTE OF
Cornmon" ealt h Sc ience and
•
Russian Association of
Engineerin g Universities
Moscow, Ru ssia
Technological Education Inst itute
of Pi raeus
Piraeus-Ath ens. Greece
Uni\crsi ty ofBots,,ana
Gaborone. Botswana
Published by:
World Institute for Engi neering and Technology Education (WIET E)
34 Hampshire Road, Glen Waverley, Me lbourne, VIC 3150, Australia
E-mail: [email protected]
Internet: http://www.wiete.com .au
2013 WIETE
!SS
1446-2257
Articles published in the World Transactions on Engineering and Technology Education (WTE&T E) have undergone a
formal and rigorous process of peerreview by international referees .
Thi s Journa l is copyright. Apart !Tom any fair dea ling for the purpose of private study, research, critici sm or review as
permitted under the Copyright Act, no part of this Journal may be reproduced by any process without the written
permission of the publisher.
Re.1ponsibili1y fo r the con/ent.1· of these articles rests upon the authors and not the publisher. Data presented and
conclusions de veloped by the authors are for information on~y and are not intended jar use without independent
substmuiaring investigarions 011 the pan of !11e porential user.