International Journal on Advanced Computer Theory and Engineering (IJACTE)
USE OF ASSOCIATION RULE MINING IN HIGHER
SECONDARY EDUCATION IN ODISHA
1 P. Sunil Kumar, 2 Ashok Kumar Panda
1&2 Assistant Professor, IISIT, BPUT, Bhubaneswar, Odisha, India
Email : [email protected]
Abstract - Classroom teaching is accepted as one of the major modes of delivery by educational institutions in India. The communication between teacher and student has a major impact on the student. In particular, the language used for communication plays a vital role in encouraging students to attend classes regularly and thus achieve good results. This paper uses association rule mining to study the effect of language on student regularity by measuring parameters such as support, confidence and other interestingness measures.
Keywords: Data Mining, Association Rule, EDM, Interestingness Measures, Higher Secondary Education.
I. INTRODUCTION
India has one of the largest higher education systems in the world, and has been witnessing healthy growth in its number of institutions and enrollment in the last few decades. However, there has been no significant improvement in the quality of higher education delivery [1].
In Odisha, students from rural areas typically undergo ten years of school education in which teachers use the Oriya language for communication in the majority of the subjects taught. When these students enter college, they suddenly have to learn all subjects in English. The globalization of the Indian economy during 2009 introduced several professional and soft-skill courses, and students are bound to study these courses in English only. Hence the language used for communication has a direct effect on young minds.
Data mining is the process of finding hidden patterns in a large collection of data. It can be used in the educational field to enhance our understanding of the learning process by identifying, extracting and evaluating variables related to the learning process of students, as described by Alaa el-Halees [2]. Han and Kamber [5] describe data mining software that allows users to analyze data from different dimensions and categorize it.
This paper tries to find the association between language and the regularity of students in class.
II. BACKGROUND AND RELATED WORK
Educational data mining (EDM) has emerged as an independent research area in recent years, culminating in 2008 with the establishment of the annual International Conference on Educational Data Mining and the Journal of Educational Data Mining. Romero and Ventura [21] provide a comprehensive survey of EDM from 1995 to 2005. It describes the need for analyzing student data, which can be used by students, educators and administrators.
Z.N. Khan [14] found that girls with high socio-economic status were relatively higher achievers in the science stream, and that boys with low socio-economic status were relatively higher achievers in general.
Madhyastha and Tanimoto [13] investigated the relationship between consistency and student performance with the aim of providing guidelines for scaffolding instruction.
Beck and Mostow [6] and Pechenizkiy et al. [20] discovered which types of pedagogical support are most effective, either overall or for different groups of students or in different situations. McQuiggan et al. [19] detected whether students are experiencing poor self-efficacy. Baker [3] identified students who are off-task. D'Mello et al. [8] studied students who are bored or frustrated. Dekker et al. [7], Romero et al. [22] and Superby et al. [23] found factors that predict student failure or non-retention in college courses.
III. ASSOCIATION RULE MINING
Data mining is the discovery of hidden information found in databases [4] [18]. Dunham [9] categorized the various models and tasks of data mining into two groups: predictive and descriptive. One of the most significant descriptive data mining applications is that of mining association rules. Introduced in 1993 [15], association rule mining is used extensively in the marketing and retail communities, in addition to many other diverse fields [17].
Association rule mining is an important technique that aims at extracting interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data repositories [17].
Cosine: Consider two vectors X and Y and the angle they form when they are placed so that their tails coincide. When this angle nears 0°, the cosine nears 1, i.e. the two vectors are very similar: all their coordinates are pairwise the same. When this angle is 90°, the vectors are perpendicular, the most dissimilar, and the cosine is 0. The usual form given for the cosine of an association rule X⇒Y is P(X,Y) / √(P(X)*P(Y)). The closer cosine(X⇒Y) is to 1, the more transactions containing item X also contain item Y, and vice versa. Conversely, the closer cosine(X⇒Y) is to 0, the more transactions contain item X without containing item Y, and vice versa. Because the total number of transactions cancels out of this ratio, transactions containing neither item X nor item Y have no influence on the result of cosine(X⇒Y). This is known as the null-invariant property. Note also that cosine is a symmetric measure [10, 11].
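The null-invariant property can be checked numerically. The following is a minimal Python sketch, assuming invented transaction counts (they are not taken from the study): it adds transactions that contain neither X nor Y and compares cosine with lift, a measure defined elsewhere in this paper that is not null-invariant. The function names are hypothetical.

```python
import math

def cosine(n_xy, n_x, n_y):
    """Cosine of X => Y from raw counts; the total count is not needed."""
    return n_xy / math.sqrt(n_x * n_y)

def lift(n_xy, n_x, n_y, n_total):
    """Lift of X => Y; unlike cosine, it depends on the total count."""
    return (n_xy / n_total) / ((n_x / n_total) * (n_y / n_total))

# Invented counts: 60 transactions contain X, 50 contain Y, 40 contain both.
n_x, n_y, n_xy = 60, 50, 40

# The second total adds 900 transactions containing neither X nor Y.
for n_total in (100, 1000):
    print(n_total,
          round(cosine(n_xy, n_x, n_y), 3),
          round(lift(n_xy, n_x, n_y, n_total), 3))
```

The cosine stays the same for both totals, while the lift grows as null transactions are added.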
A formal statement of the association rule problem is as follows:
Definition: [16] [12] Let I = {i1, i2, …, im} be a set of m distinct attributes. Let D be a database, where each record (tuple) T has a unique identifier and contains a set of items such that T⊆I. An association rule is an implication of the form X⇒Y, where X,Y ⊆ I are sets of items called itemsets, and X∩Y=ϕ. Here, X is called the antecedent while Y is called the consequent; the rule means X implies Y. Association rules can be classified based on the types of values, the dimensions of data, and the levels of abstraction involved in the rule. If a rule concerns associations between the presence or absence of items, it is called a Boolean association rule, and a dataset consisting of attributes that can assume only binary values (0: absent, 1: present) is called a Boolean database.
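To make the definition concrete, here is a minimal Python sketch of a Boolean database and a rule X⇒Y over it. The five records and the helper names (`count`, `support`, `confidence`) are invented for illustration; support and confidence are computed with their usual meanings (share of records containing all items, and share of records containing X that also contain Y).

```python
# Each record maps an attribute to 0 (absent) or 1 (present).
# These five records are invented for illustration only.
database = [
    {"i1": 1, "i2": 0, "i3": 1},
    {"i1": 1, "i2": 0, "i3": 0},
    {"i1": 0, "i2": 1, "i3": 1},
    {"i1": 1, "i2": 0, "i3": 1},
    {"i1": 0, "i2": 1, "i3": 0},
]

def count(itemset, records):
    """Number of records in which every item of the itemset is present."""
    return sum(all(r[item] == 1 for item in itemset) for r in records)

def support(itemset, records):
    """Fraction of records that contain the whole itemset."""
    return count(itemset, records) / len(records)

def confidence(x, y, records):
    """Fraction of records containing X that also contain Y (rule X => Y)."""
    return count(x | y, records) / count(x, records)

x, y = frozenset({"i1"}), frozenset({"i3"})
assert not (x & y)  # antecedent and consequent must be disjoint

print(support(x | y, database))     # 2 of the 5 records contain i1 and i3
print(confidence(x, y, database))   # 2 of the 3 records with i1 also have i3
```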
Added value: The added value of the rule X⇒Y is denoted by AV(X⇒Y) and measures whether the proportion of transactions containing Y among the transactions containing X is greater than the proportion of transactions containing Y among all transactions. Only if the probability of finding item Y when item X has been found is greater than the probability of finding item Y at all can we say that X and Y are associated and that X implies Y. A positive number indicates that X and Y are related, while a negative number means that the occurrence of X prevents Y from occurring. Added value is closely related to another well-known measure of interest, the lift [10].
IV. DATA MINING TECHNIQUES USED IN THIS PAPER
Lift: Lift is a symmetric measure. A lift well above 1 indicates a strong correlation between X and Y. A lift around 1 says that P(X, Y) = P(X)*P(Y). In terms of probability, this means that the occurrence of X and the occurrence of Y in the same transaction are independent events, hence X and Y are not correlated. It is easy to show that the lift is 1 exactly when added value is 0, the lift is greater than 1 exactly when added value is positive, and the lift is below 1 exactly when added value is negative [10, 11].
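The relation between lift and added value stated above follows in one line from the definitions used in this paper, Lift(X⇒Y) = Confidence(X⇒Y)/P(Y) and AV(X⇒Y) = Confidence(X⇒Y) - P(Y). A short derivation, assuming P(Y) > 0:

```latex
\begin{align*}
\mathrm{AV}(X \Rightarrow Y)
  &= \mathrm{Confidence}(X \Rightarrow Y) - P(Y) \\
  &= P(Y)\left(\frac{\mathrm{Confidence}(X \Rightarrow Y)}{P(Y)} - 1\right) \\
  &= P(Y)\,\bigl(\mathrm{Lift}(X \Rightarrow Y) - 1\bigr)
\end{align*}
```

Since P(Y) is positive, AV(X⇒Y) and Lift(X⇒Y) - 1 always have the same sign, and one is zero exactly when the other is.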
$$\mathrm{Support}(X) = P(X)$$
$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{P(X,Y)}{P(X)}$$
$$\mathrm{Cosine}(X \Rightarrow Y) = \frac{P(X,Y)}{\sqrt{P(X)\,P(Y)}}$$
$$\mathrm{Added\ Value}(X \Rightarrow Y) = \mathrm{Confidence}(X \Rightarrow Y) - P(Y)$$
$$\mathrm{Lift}(X \Rightarrow Y) = \frac{\mathrm{Confidence}(X \Rightarrow Y)}{P(Y)}$$
$$\mathrm{Correlation}(X \Rightarrow Y) = \frac{P(X,Y) - P(X)\,P(Y)}{\sqrt{P(X)\,P(Y)\,(1-P(X))\,(1-P(Y))}}$$
$$\mathrm{Conviction}(X \Rightarrow Y) = \frac{1 - P(Y)}{1 - \mathrm{Confidence}(X \Rightarrow Y)}$$
Correlation: Correlation is a symmetric measure. A correlation around 0 indicates that X and Y are not correlated, a negative figure indicates that X and Y are negatively correlated, and a positive figure indicates that they are positively correlated. Note that the denominator of the division is positive and smaller than 1. Thus, even if the lift is around 1, correlation can still be significantly different from 0 [11].
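The remark that correlation can differ markedly from 0 while the lift stays near 1 can be made explicit by rewriting correlation in terms of lift. This is a sketch derived from the formulas in this section, using P(X,Y) = P(X) P(Y) Lift(X⇒Y); the numeric values in the worked case are invented for illustration.

```latex
\begin{align*}
\mathrm{Correlation}(X \Rightarrow Y)
  &= \frac{P(X)\,P(Y)\,\bigl(\mathrm{Lift}(X \Rightarrow Y) - 1\bigr)}
          {\sqrt{P(X)\,P(Y)\,(1-P(X))\,(1-P(Y))}} \\
  &= \bigl(\mathrm{Lift}(X \Rightarrow Y) - 1\bigr)
     \sqrt{\frac{P(X)\,P(Y)}{(1-P(X))\,(1-P(Y))}}
\end{align*}
% Illustrative case: P(X) = P(Y) = 0.9 and P(X,Y) = 0.85 give
% Lift = 0.85 / 0.81 \approx 1.049, yet
% Correlation = 0.04 / \sqrt{0.81 \cdot 0.01} = 0.04 / 0.09 \approx 0.444.
```

The square-root factor grows when X and Y are both frequent, which is when a lift close to 1 can hide a noticeable correlation.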
Conviction: Conviction is not a symmetric measure. A conviction around 1 says that X and Y are independent, while conviction tends to infinity as conf(X⇒Y) tends to 1. Note that if P(Y) is high then 1-P(Y) is small. In that case, even if conf(X⇒Y) is strong, conviction may be small [10].
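The effect of a frequent consequent on conviction can be seen with two invented cases that share the same confidence. This is a minimal Python sketch of the formula (1 - P(Y)) / (1 - Confidence(X⇒Y)); the function name and the input values are assumptions for illustration, not figures from the study.

```python
import math

def conviction(p_y, conf):
    """Conviction of X => Y from P(Y) and the rule's confidence."""
    if conf == 1:
        return math.inf  # the rule never fails, so conviction is unbounded
    return (1 - p_y) / (1 - conf)

# Same strong confidence (0.96), different frequency of the consequent Y.
print(round(conviction(0.95, 0.96), 2))  # Y very frequent: conviction stays small
print(round(conviction(0.30, 0.96), 2))  # Y rarer: conviction is much larger
```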
These measures are calculated on the test data.
Support: The support (s) for an association rule X⇒Y is the percentage of transactions that contain X∪Y [9].
Confidence: The confidence or strength of an association rule X⇒Y is the ratio of the number of transactions that contain X∪Y to the number of transactions that contain X [9].
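The measures defined in this section can all be computed from three probabilities: P(X), P(Y) and P(X,Y). The following is a minimal Python sketch under that assumption; the function name `interestingness` and the input values are hypothetical and chosen only for illustration. It requires 0 < P(X) < 1 and 0 < P(Y) < 1.

```python
import math

def interestingness(p_x, p_y, p_xy):
    """Seven measures for the rule X => Y from P(X), P(Y) and P(X,Y)."""
    confidence = p_xy / p_x
    return {
        "support": p_xy,
        "confidence": confidence,
        "cosine": p_xy / math.sqrt(p_x * p_y),
        "added_value": confidence - p_y,
        "lift": confidence / p_y,
        "correlation": (p_xy - p_x * p_y)
        / math.sqrt(p_x * p_y * (1 - p_x) * (1 - p_y)),
        # Conviction is unbounded when the rule never fails.
        "conviction": math.inf if confidence == 1
        else (1 - p_y) / (1 - confidence),
    }

# Illustrative probabilities only (not taken from the study's data).
measures = interestingness(p_x=0.5, p_y=0.4, p_xy=0.3)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Swapping the roles of X and Y changes confidence, added value and conviction but leaves cosine, lift and correlation unchanged, which matches the symmetry remarks above.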
V. APPLICATION OF THE MEASURES
In this study, data was collected on the 2012-13 batch of 10+2 Science students of Sri Jaydev College of Education and Technology, affiliated to Utkal University, Odisha, India. These data are analyzed using association rules to find the interest of students in opting for a class teaching language. To do this, the following steps are performed in sequence:
a] Data set: The Boolean data set consists of the following information on each student's opted language of teaching.
Table 1: Data Set
Attributes: Oriya, English, Mixed-Mode, Oriya&Mixed-Mode, English&Mixed-mode
Sample entries: 0 1 1 1 0 1 (Oriya, English, Mixed-Mode); 1 1 (Oriya&Mixed-Mode); 0 0 (English&Mixed-mode)
0: No and 1: Yes
The size of the data set is 120.
b] Data selection and transformation: In this step, data is selected from the above table to extract the information. The college organized classes in three different languages to attract students: Oriya, English and Mixed-mode. The Oriya medium class delivers 80 percent of the lecture in the local language and 20 percent in English. The English medium class delivers more than 90 percent of the lecture in English. The Mixed-mode medium class uses English and Oriya in almost equal ratio. Class notes are provided only in English in each class. As the course is available only in English medium, the college tried to find out the students' interest in opting for a classroom teaching language. The following Venn diagram (Fig. 3) shows the complete attendance picture of students in the different medium classes.
Figure 3: Venn diagram of student attendance in the different medium classes
To this end, the college focused on the attendance register and identified some common interest among the students. On the basis of the available attendance, the college computed the support of each language. Table 2 shows the support level for each language. The Mixed-mode medium class has the highest support, i.e. most of the students participate in this class. The lowest support is for the English medium class.
Table 2: Support analysis
Class Teaching Medium        Support
Oriya                        0.71
English                      0.20
Mixed-mode                   0.83
(Oriya, Mixed-mode)          0.60
(English, Mixed-mode)        0.15
c] Measures and their analysis
Table 3 shows that most of the students (85.7%) who attend the Oriya medium class also join the Mixed-mode medium class; this rule has the highest confidence. 72% of Mixed-mode medium class students also join the Oriya medium class. 75% of English medium class students also join the Mixed-mode medium class. Only 18% of Mixed-mode medium class students join the English medium class.
Table 3: Confidence Analysis
Class Teaching Medium        Confidence
Oriya ⇒ Mixed-mode           0.857
Mixed-mode ⇒ Oriya           0.720
English ⇒ Mixed-mode         0.750
Mixed-mode ⇒ English         0.180
Table 4 gives the cosine analysis values. Cosine is a symmetric measure: a rule and its reverse give the same result. Here it gives the angle between two different class media. The table shows that the Oriya and Mixed-mode medium classes have a smaller angle (38.210°) than the English and Mixed-mode medium classes (68.416°). It can be concluded that the Oriya and Mixed-mode medium classes share more students than the English and Mixed-mode medium classes.
Table 4: Cosine Analysis
Class Teaching Medium        Cosine    Angle (degrees)
Oriya ⇒ Mixed-mode           0.786     38.210
Mixed-mode ⇒ Oriya           0.786     38.210
English ⇒ Mixed-mode         0.367     68.416
Mixed-mode ⇒ English         0.367     68.416
Table 5 shows the added value analysis. Oriya ⇒ Mixed-mode and Mixed-mode ⇒ Oriya have positive values, which shows that they are related to each other: the occurrence of Oriya does not prevent Mixed-mode from occurring, and similarly the occurrence of Mixed-mode does not prevent Oriya from occurring.
English ⇒ Mixed-mode and Mixed-mode ⇒ English have negative values, which shows that the occurrence of English prevents the occurrence of Mixed-mode, and similarly the occurrence of Mixed-mode prevents the occurrence of English.
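As a rough cross-check of the confidence, cosine and added value figures discussed in this section, the measures can be recomputed from the rounded supports reported in Table 2. The following Python sketch assumes only those five rounded values; the helper name `measures` is hypothetical. Because the supports are rounded to two decimals, the results can differ slightly from the values in Tables 3 to 5, but the signs of the added values agree.

```python
import math

# Rounded supports as reported in Table 2.
support = {
    "Oriya": 0.71,
    "English": 0.20,
    "Mixed-mode": 0.83,
    ("Oriya", "Mixed-mode"): 0.60,
    ("English", "Mixed-mode"): 0.15,
}

def measures(x, y):
    """Confidence, cosine angle in degrees, and added value of x => y."""
    # The joint support is stored once per pair, so try both orders.
    p_xy = support.get((x, y), support.get((y, x)))
    conf = p_xy / support[x]
    cos = p_xy / math.sqrt(support[x] * support[y])
    return conf, math.degrees(math.acos(cos)), conf - support[y]

rules = [("Oriya", "Mixed-mode"), ("Mixed-mode", "Oriya"),
         ("English", "Mixed-mode"), ("Mixed-mode", "English")]
for x, y in rules:
    conf, angle, av = measures(x, y)
    print(f"{x} => {y}: confidence={conf:.3f}, "
          f"angle={angle:.1f}, added value={av:+.3f}")
```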
Table 5: Added Value Analysis
Class Teaching Medium        Added Value
Oriya ⇒ Mixed-mode           0.024
Mixed-mode ⇒ Oriya           0.020
English ⇒ Mixed-mode         -0.083
Mixed-mode ⇒ English         -0.020
Table 6 contains the lift analysis. Lift is a symmetric measure. It shows how the occurrence of one item relates to the occurrence of another. Oriya ⇒ Mixed-mode and Mixed-mode ⇒ Oriya have the same value (1.029), which is greater than 1 and shows that the occurrence of the first is positively correlated with the other. English ⇒ Mixed-mode and Mixed-mode ⇒ English also share the same value (0.900), but it is less than 1, which shows that they are negatively correlated.
Table 6: Lift Analysis
Class Teaching Medium        Lift
Oriya ⇒ Mixed-mode           1.029
Mixed-mode ⇒ Oriya           1.029
English ⇒ Mixed-mode         0.900
Mixed-mode ⇒ English         0.900
Table 7 contains the correlation values. Oriya ⇒ Mixed-mode and Mixed-mode ⇒ Oriya have the same positive value, and English ⇒ Mixed-mode and Mixed-mode ⇒ English have the same negative value, because correlation is a symmetric measure. The table shows that Oriya and Mixed-mode are positively correlated, whereas English and Mixed-mode are negatively correlated.
Table 7: Correlation Analysis
Class Teaching Medium        Correlation
Oriya ⇒ Mixed-mode           0.098
Mixed-mode ⇒ Oriya           0.098
English ⇒ Mixed-mode         -0.112
Mixed-mode ⇒ English         -0.112
Table 8 shows the conviction analysis. The highest conviction is found for the association Oriya ⇒ Mixed-mode, with value 1.167. The lowest conviction is found for the association English ⇒ Mixed-mode, with value 0.667.
Table 8: Conviction Analysis
Class Teaching Medium        Conviction
Oriya ⇒ Mixed-mode           1.167
Mixed-mode ⇒ Oriya           1.071
English ⇒ Mixed-mode         0.667
Mixed-mode ⇒ English         0.976
VI. CONCLUSION
Association rules are useful for finding the association between two elements and for showing the interestingness between them. In this paper seven different interestingness measures are used to find the interestingness between two different class media. From the above analysis it can be concluded that the Mixed-mode medium class is preferred over the Oriya and English medium classes. Another conclusion, drawn from the confidence, cosine, added value, lift, correlation and conviction analyses, is that most Oriya medium class students show interest in the Mixed-mode medium class, and English medium class students also show greater interest in the Mixed-mode medium class. So the college has to organize a Mixed-mode medium class to keep the regularity of students at par.
REFERENCES
[1] Higher Education in India: Twelfth Five Year Plan (2012-2017) and Beyond. http://www.ficci.com/publicationage.asp?spid=20168
[2] Alaa el-Halees, "Mining Students Data to Analyze e-Learning Behavior: A Case Study", 2009.
[3] Baker, R.S.J.D., "Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems." In Proceedings of the ACM CHI 2007: Computer-Human Interaction conference, pp. 1059-1068, 2007.
[4] Ming-Syan Chen, Jiawei Han and Philip S. Yu, "Data Mining: An Overview from a Database Perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.
[5] Han J. and Kamber M., "Data Mining: Concepts and Techniques", 2nd edition. The Morgan Kaufmann series in database management systems, Jim Gray series editor, 2006.
[6] Beck, J.E. and Mostow, J., "How who should practice: Using learning decomposition to evaluate the efficacy of different types of practice for different types of students." In Proceedings of the 9th International Conference on Intelligent Tutoring Systems, pp. 353-362, 2008.
[7] Dekker, G., Pechenizkiy, M. and Vleeshouwers, J., "Predicting Students Drop Out: A Case Study." In Proceedings of the International Conference on Educational Data Mining, Cordoba, Spain, T. Barnes, M. Desmarais, C. Romero and S. Ventura Eds., pp. 41-50, 2009.
[8] D'Mello, S.K., Craig, S.D., Witherspoon, A.W., McDaniel, B.T. and Graesser, A.C., "Automatic Detection of Learner's Affect from Conversational Cues." User Modeling and User-Adapted Interaction, vol. 18, pp. 45-80, 2008.
[9] Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics", Pearson Education, ISBN 978-81-7758-785-2, 2006.
[10] Merceron A. and Yacef K., "Interestingness Measures for Association Rules in Educational Data", 2007.
[11] Merceron A. and Yacef K., "Revisiting interestingness of strong symmetric association rules in educational data", Proceedings of the International Workshop on Applying Data Mining in e-Learning, 2007.
[12] David Wai-Lok Cheung, Vincent T. Ng, Ada Wai-Chee Fu and Yongjian Fu, "Efficient Mining of Association Rules in Distributed Databases", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.
[13] Madhyastha, T. and Tanimoto, S., "Student Consistency and Implications for Feedback in Online Assessment Systems." In Proceedings of the 2nd International Conference on Educational Data Mining, pp. 81-90, 2009.
[14] Z. N. Khan, "Scholastic Achievement of Higher Secondary Students in Science Stream", Journal of Social Sciences, Vol. 1, No. 2, pp. 84-87, 2005.
[15] R. Agrawal, T. Imielinski and A. Swami, "Mining Associations between Sets of Items in Massive Databases", Proc. of the ACM-SIGMOD Int'l Conference on Management of Data, Washington D.C., 1993.
[16] V. Umarani and M. Punithavalli, "Sampling based Association Rules Mining - A Recent Overview", (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 02, pp. 314-318, 2010.
[17] V. Umarani and M. Punithavalli, "A Study on Effective Mining of Association Rules from Huge Databases", IJCSR International Journal of Computer Science and Research, Vol. 1, Issue 1, ISSN: 22109668, 2010.
[18] Usama M. Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth, "From Data Mining to Knowledge Discovery: An Overview", Advances in Knowledge Discovery and Data Mining, AAAI Press, pp. 1-34, 1996.
[19] McQuiggan, S., Mott, B. and Lester, J., "Modeling Self-Efficacy in Intelligent Tutoring Systems: An Inductive Approach." User Modeling and User-Adapted Interaction, vol. 18, pp. 81-123, 2008.
[20] Pechenizkiy, M., Calders, T., Vasilyeva, E. and De Bra, P., "Mining the Student Assessment Data: Lessons Drawn from a Small Scale Case Study." In Proceedings of the 1st International Conference on Educational Data Mining, pp. 187-191, 2008.
[21] Romero, C. and Ventura, S., "Educational Data Mining: A Survey from 1995 to 2005." Expert Systems with Applications, vol. 33, pp. 125-146, 2007.
[22] Romero, C., Ventura, S., Espejo, P.G. and Hervas, C., "Data Mining Algorithms to Classify Students." In Proceedings of the 1st International Conference on Educational Data Mining, pp. 8-17, 2008.
[23] Superby, J.F., Vandamme, J.-P. and Meskens, N., "Determination of factors influencing the achievement of the first-year university students using data mining methods." In Proceedings of the Workshop on Educational Data Mining at the 8th International Conference on Intelligent Tutoring Systems (ITS), pp. 37-44, 2006.
ISSN (Print) : 2319 – 2526, Volume-2, Issue-6, 2013