International Journal on Advanced Computer Theory and Engineering (IJACTE)
ISSN (Print) : 2319 – 2526, Volume-2, Issue-6, 2013

USE OF ASSOCIATION RULE MINING IN HIGHER SECONDARY EDUCATION IN ODISHA

¹P. Sunil Kumar, ²Ashok Kumar Panda
¹&²Assistant Professor, IISIT, BPUT, Bhubaneswar, Odisha, India

Abstract - Classroom teaching has been accepted as one of the major modes of delivery by educational institutions in India. The communication between teacher and student has a large impact on the student. In particular, the language used for communication plays a vital role in encouraging students to attend classes sincerely and thus produce good results. This paper uses association rule mining to study the effect of language on student regularity by measuring parameters such as support, confidence and other interestingness measures.

Keywords: Data Mining, Association Rule, EDM, Interestingness Measures, Higher Secondary Education.

I. INTRODUCTION

India has one of the largest higher education systems in the world, and has been witnessing healthy growth in its number of institutions and enrollment in the last few decades. However, there has been no significant improvement in terms of quality of higher education delivery [1].

In Odisha, students from rural areas generally undergo ten years of school education in which teachers use the Oriya language for communication in the majority of the subjects taught. When these students enter college, they suddenly have to learn all subjects in English. The globalization of the Indian economy during 2009 introduced several professional and soft skill courses, and students are bound to study these courses in English only. Hence the language used for communication has a direct effect on young minds.

Data mining is finding hidden patterns in a large collection of data. It can be used in the educational field to enhance our understanding of the learning process by identifying, extracting and evaluating variables related to the learning process of students, as described by Alaa el-Halees [2]. Han and Kamber describe data mining software that allows users to analyze data from different dimensions and categorize it [5].

This paper tries to find the association between language and the regularity of students in the class.

II. BACKGROUND AND RELATED WORK

Educational data mining (EDM) has emerged as an independent research area in recent years, culminating in 2008 with the establishment of the annual International Conference on Educational Data Mining and the Journal of Educational Data Mining. Romero and Ventura [21] provide a comprehensive study of EDM from 1995 to 2005. It describes the need for analyzing student data, which can be used by students, educators and administrators.

Z.N. Khan [14] found that girls with high socio-economic status were relatively higher achievers in the science stream and boys with low socio-economic status were relatively higher achievers in general.

Madhyastha and Tanimoto [13] investigated the relationship between consistency and student performance with the aim of providing guidelines for scaffolding instruction.

Beck and Mostow [6] and Pechenizkiy et al. [20] discovered which types of pedagogical support are most effective, either overall or for different groups of students or in different situations. McQuiggan et al. [19] detected whether students are experiencing poor self-efficacy. Baker [3] identified students who are off-task. D'Mello et al. [8] studied students who are bored or frustrated. Dekker et al. [7], Romero et al. [22] and Superby et al. [23] found factors that predict student failure or non-retention in college courses.

III. ASSOCIATION RULE MINING

Data mining is the discovery of hidden information found in databases [4] [18]. Dunham [9] categorized the various models and tasks of data mining into two groups: predictive and descriptive. One of the most significant descriptive data mining applications is that of mining association rules. Introduced in 1993 [15], association rules are used extensively in the marketing and retail communities in addition to many other diverse fields [17]. Association rule mining is an important technique that aims at extracting interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data repositories [17].

A formal statement of the association rule problem is as follows:

Definition [16] [12]: Let I = {i1, i2, …, im} be a set of m distinct attributes. Let D be a database, where each record (tuple) T has a unique identifier and contains a set of items such that T ⊆ I. An association rule is an implication of the form X⇒Y, where X, Y ⊆ I are sets of items called itemsets, and X∩Y = ∅.

Here, X is called the antecedent while Y is called the consequent; the rule means X implies Y. Association rules can be classified based on the type of values, the dimensions of data, and the levels of abstraction involved in the rule. If a rule concerns associations between the presence or absence of items, it is called a Boolean association rule. A dataset consisting of attributes which can assume only binary values (0: absent, 1: present) is called a Boolean database.

IV. DATA MINING TECHNIQUES USED IN THIS PAPER

Support: The support (s) for an association rule X⇒Y is the percentage of transactions that contain X∪Y [9].

Confidence: The confidence or strength of an association rule X⇒Y is the ratio of the number of transactions that contain X∪Y to the number of transactions that contain X [9].

Cosine: Consider two vectors X and Y and the angle they form when they are placed so that their tails coincide. When this angle nears 0°, the cosine nears 1, i.e. the two vectors are very similar: all their coordinates are pairwise the same. When this angle is 90°, the vectors are perpendicular, the most dissimilar, and the cosine is 0. The usual form given for the cosine of an association rule X⇒Y is P(X,Y)/√(P(X)·P(Y)). The closer cosine(X⇒Y) is to 1, the more transactions containing item X also contain item Y, and vice versa. On the contrary, the closer cosine(X⇒Y) is to 0, the more transactions contain item X without containing item Y, and vice versa. Because the total number of transactions cancels out of this expression, transactions containing neither item X nor item Y have no influence on the result of cosine(X⇒Y). This is known as the null-invariant property. Note also that cosine is a symmetric measure [10, 11].

Added value: The added value of the rule X⇒Y is denoted by AV(X⇒Y) and measures whether the proportion of transactions containing Y among the transactions containing X is greater than the proportion of transactions containing Y among all transactions. Only if the probability of finding item Y when item X has been found is greater than the probability of finding item Y at all can we say that X and Y are associated and that X implies Y. A positive number indicates that X and Y are related, while a negative number means that the occurrence of X prevents Y from occurring. Added value is closely related to another well-known measure of interest, the lift [10].

Lift: Lift is a symmetric measure. A lift well above 1 indicates a strong correlation between X and Y. A lift around 1 says that P(X,Y) = P(X)·P(Y). In terms of probability, this means that the occurrence of X and the occurrence of Y in the same transaction are independent events, hence X and Y are not correlated. It is easy to show that the lift is 1 exactly when the added value is 0, the lift is greater than 1 exactly when the added value is positive, and the lift is below 1 exactly when the added value is negative [10, 11].

Correlation: Correlation is a symmetric measure. A correlation around 0 indicates that X and Y are not correlated, a negative figure indicates that X and Y are negatively correlated, and a positive figure indicates that they are positively correlated. Note that the denominator of the division is positive and smaller than 1, so the absolute value of the correlation is larger than |P(X,Y) − P(X)·P(Y)|. In other words, even if the lift is around 1, the correlation can still be significantly different from 0 [11].

Conviction: Conviction is not a symmetric measure. A conviction around 1 says that X and Y are independent, while conviction tends to infinity as conf(X⇒Y) tends to 1. Note that if P(Y) is high then 1 − P(Y) is small. In that case, even if conf(X⇒Y) is strong, the conviction may be small [10].

The measures are computed as follows:

\[ \mathrm{Support}(X) = P(X) \]
\[ \mathrm{Confidence}(X \Rightarrow Y) = \frac{P(X,Y)}{P(X)} \]
\[ \mathrm{Cosine}(X \Rightarrow Y) = \frac{P(X,Y)}{\sqrt{P(X)\,P(Y)}} \]
\[ \mathrm{Added\ Value}(X \Rightarrow Y) = \mathrm{Confidence}(X \Rightarrow Y) - P(Y) \]
\[ \mathrm{Lift}(X \Rightarrow Y) = \frac{\mathrm{Confidence}(X \Rightarrow Y)}{P(Y)} \]
\[ \mathrm{Correlation}(X \Rightarrow Y) = \frac{P(X,Y) - P(X)\,P(Y)}{\sqrt{P(X)\,P(Y)\,(1 - P(X))\,(1 - P(Y))}} \]
\[ \mathrm{Conviction}(X \Rightarrow Y) = \frac{1 - P(Y)}{1 - \mathrm{Confidence}(X \Rightarrow Y)} \]

These measures are calculated on the test data.

V. APPLICATION OF THE MEASURES

In this study, data was collected about the 2012-13 batch of 10+2 Science students of Sri Jaydev College of Education and Technology, affiliated to Utkal University, Odisha, India. These data are analyzed using association rules to find the interest of students in opting for a class teaching language. To do so, the following steps are performed in sequence:

a] Data set: The Boolean data set consisted of the following information related to each student's opted language of teaching.

Table 1: Data Set
Columns: Oriya | English | Mixed-Mode | Oriya & Mixed-Mode | English & Mixed-Mode
Values (Oriya, English, Mixed-Mode): 0 1 1 1 0 1
Values (Oriya & Mixed-Mode): 1 1
Values (English & Mixed-Mode): 0 0
0: No and 1: Yes

The size of the data set is 120.

b] Data selection and transformation: In this step, data is selected from the above table to extract the information. The college organized the classes in three different languages to attract students: Oriya, English and Mixed-mode. The Oriya medium class delivers 80 percent of the lecture in the local language and 20 percent in English. The English medium class delivers more than 90 percent in English. The Mixed-mode medium class uses English and Oriya in almost equal ratio. Class notes in every class are provided in English only. As the course is available only in English, the college tried to find out the students' interest in the classroom teaching language. The following Venn diagram (fig. 3) shows the complete attendance picture of students in the different medium classes.

Figure 3

To this end the college focused on the attendance register, and identified that there is some common interest among the students. On the basis of the available attendance, the college computed the support of each language. Table 2 shows the support level for each language. The Mixed-mode medium class had the highest percentage of support: most of the students participate in this classroom. The lowest percentage of support is for the English medium class.

Table 2: Support Analysis
Class Teaching Medium        Support
Oriya                        0.71
English                      0.20
Mixed-mode                   0.83
(Oriya, Mixed-mode)          0.60
(English, Mixed-mode)        0.15

c] Measures and their analysis

Table 3 shows that most of the students (85.7%) who attend the Oriya medium class also join the Mixed-mode medium class; this rule has the highest confidence. 72% of Mixed-mode medium class students also join the Oriya medium class. 75% of English medium class students also join the Mixed-mode medium class. Only 18% of Mixed-mode medium class students join the English medium class.

Table 3: Confidence Analysis
Class Teaching Medium        Confidence
Oriya ⇒ Mixed-mode           0.857
Mixed-mode ⇒ Oriya           0.720
English ⇒ Mixed-mode         0.750
Mixed-mode ⇒ English         0.180

Table 4 gives the cosine analysis values. Cosine is a symmetric measure, which means that a pair of sets gives the same result in either direction. Here it expresses the angle between two different class media. The table shows that the Oriya and Mixed-mode medium classes have a lower angle (38.216) than the English and Mixed-mode medium classes. It can be concluded that the Oriya and Mixed-mode medium classes share more students than the English and Mixed-mode medium classes.

Table 4: Cosine Analysis
Class Teaching Medium        Cosine    Angle
Oriya ⇒ Mixed-mode           0.786     38.210
Mixed-mode ⇒ Oriya           0.786     38.210
English ⇒ Mixed-mode         0.367     68.416
Mixed-mode ⇒ English         0.367     68.416

Table 5 shows the added value analysis. In this table, Oriya ⇒ Mixed-mode and Mixed-mode ⇒ Oriya have positive values, which shows that they are related to each other. The occurrence of Oriya does not prevent Mixed-mode from occurring; similarly, the occurrence of Mixed-mode does not prevent Oriya from occurring. English ⇒ Mixed-mode and Mixed-mode ⇒ English have negative values, which shows that the occurrence of English prevents the occurrence of Mixed-mode and, similarly, the occurrence of Mixed-mode prevents the occurrence of English.

Table 5: Added Value Analysis
Class Teaching Medium        Added Value
Oriya ⇒ Mixed-mode           0.024
Mixed-mode ⇒ Oriya           0.020
English ⇒ Mixed-mode         -0.083
Mixed-mode ⇒ English         -0.020

Table 6 contains the lift analysis. Lift is a symmetric measure and relates the occurrence of one item to that of another. In this table, Oriya ⇒ Mixed-mode and Mixed-mode ⇒ Oriya have the same value (1.029), greater than 1, which shows that the occurrence of the first is positively correlated with the other. English ⇒ Mixed-mode and Mixed-mode ⇒ English also share the same value, but it is less than 1, which shows that they are negatively correlated.

Table 6: Lift Analysis
Class Teaching Medium        Lift
Oriya ⇒ Mixed-mode           1.029
Mixed-mode ⇒ Oriya           1.029
English ⇒ Mixed-mode         0.900
Mixed-mode ⇒ English         0.900

Table 7 contains the correlation values. Because correlation is a symmetric measure, Oriya ⇒ Mixed-mode and Mixed-mode ⇒ Oriya have the same positive value, and English ⇒ Mixed-mode and Mixed-mode ⇒ English have the same negative value. The table shows that Oriya and Mixed-mode are positively correlated, whereas English and Mixed-mode are negatively correlated.

Table 7: Correlation Analysis
Class Teaching Medium        Correlation
Oriya ⇒ Mixed-mode           0.098
Mixed-mode ⇒ Oriya           0.098
English ⇒ Mixed-mode         -0.112
Mixed-mode ⇒ English         -0.112

Table 8 shows the conviction analysis. The highest conviction is found for the association Oriya ⇒ Mixed-mode, with value 1.167. The lowest conviction is found for the association English ⇒ Mixed-mode, with value 0.667.

Table 8: Conviction Analysis
Class Teaching Medium        Conviction
Oriya ⇒ Mixed-mode           1.167
Mixed-mode ⇒ Oriya           1.071
English ⇒ Mixed-mode         0.667
Mixed-mode ⇒ English         0.976

VI. CONCLUSION

Association rules are useful for finding the association between two elements and for showing the interestingness between them. In this paper seven different interestingness parameters are used to find the interestingness between two different class media. From the above analysis it can be concluded that the Mixed-mode medium class is preferred over the Oriya and English medium classes.

Another conclusion, drawn from the confidence, cosine, added value, lift, correlation and conviction analyses, is that most of the Oriya medium class students show interest in the Mixed-mode medium class, and English medium class students also show greater interest in the Mixed-mode medium class. So the college has to organize a Mixed-mode medium class to keep the regularity of students at par.

REFERENCES

[1] Higher Education India: Twelfth Five Year Plan (2012-2017) and Beyond. http://www.ficci.com/publicationage.asp?spid=20168
[2] Alaa el-Halees, "Mining Students Data to Analyze e-Learning Behavior: A Case Study", 2009.
[3] Baker, R.S.J.D., "Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems." In Proceedings of the ACM CHI 2007: Computer-Human Interaction conference, pp1059-1068, 2007.
[4] Ming-Syan Chen, Jiawei Han and Philip S. Yu, "Data Mining: An Overview from a Database Perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.
[5] Han J. and Kamber M., "Data Mining: Concepts and Techniques", 2nd edition. The Morgan Kaufmann Series in Data Management Systems, Jim Gray series editor, 2006.
[6] Beck, J.E. and Mostow, J., "How who should practice: Using learning decomposition to evaluate the efficacy of different types of practice for different types of students." In Proceedings of the 9th International Conference on Intelligent Tutoring Systems, pp353-362, 2008.
[7] Dekker, G., Pechenizkiy, M. and Vleeshouwers, J., "Predicting Students Drop Out: A Case Study." In Proceedings of the International Conference on Educational Data Mining, Cordoba, Spain, T. Barnes, M. Desmarais, C. Romero and S. Ventura Eds., pp41-50, 2009.
[8] D'Mello, S.K., Craig, S.D., Witherspoon, A.W., McDaniel, B.T. and Graesser, A.C., "Automatic Detection of Learner's Affect from Conversational Cues." User Modeling and User-Adapted Interaction, vol 18, pp45-80, 2008.
[9] Margret H. Dunham, "Data Mining Introductory and Advanced Topics", Pearson Education, ISBN 978-81-7758-785-2, 2006.
[10] Merceron A. and Yacef K., "Interestingness Measures for Association Rules in Educational Data", 2007.
[11] Merceron A. and Yacef K., "Revisiting interestingness of strong symmetric association rules in educational data", Proceedings of the International Workshop on Applying Data Mining in e-Learning, 2007.
[12] David Wai-Lok Cheung, Vincent T. Ng, Ada Wai-Chee Fu and Yongjian Fu, "Efficient Mining of Association Rules in Distributed Databases", IEEE Transactions on Knowledge and Data Engineering, Vol 8, No 6, pages 866–883, 1996.
[13] Madhyastha, T. and Tanimoto, S., "Student Consistency and Implications for Feedback in Online Assessment Systems." In Proceedings of the 2nd International Conference on Educational Data Mining, pp81-90, 2009.
[14] Z. N. Khan, "Scholastic Achievement of Higher Secondary Students in Science Stream", Journal of Social Sciences, Vol. 1, No. 2, pp84-87, 2005.
[15] R. Agrawal, T. Imielinski, A. Swami, "Mining Associations between Sets of Items in Massive Databases", Proc. of the ACM-SIGMOD Int'l Conference on Management of Data, Washington D.C., 1993.
[16] V. Umarani, Dr. M. Punithavalli, "Sampling based Association Rules Mining - A Recent Overview", (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 02, 314-318, 2010.
[17] V. Umarani, Dr. M. Punithavalli, "A Study on Effective Mining of Association Rules from Huge Databases", IJCSR International Journal of Computer Science and Research, Vol. 1, Issue 1, ISSN: 22109668, 2010.
[18] Usama M. Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth, "From Data Mining to Knowledge Discovery: An Overview", Advances in Knowledge Discovery and Data Mining, AAAI Press, pp 1-34, 1996.
[19] McQuiggan, S., Mott, B. and Lester, J., "Modeling Self-Efficacy in Intelligent Tutoring Systems: An Inductive Approach." User Modeling and User-Adapted Interaction 18, pp81-123, 2008.
[20] Pechenizkiy, M., Calders, T., Vasilyeva, E. and De Bra, P., "Mining the Student Assessment Data: Lessons Drawn from a Small Scale Case Study." In Proceedings of the 1st International Conference on Educational Data Mining, pp187-191, 2008.
[21] Romero, C. and Ventura, S., "Educational Data Mining: A Survey from 1995 to 2005." Expert Systems with Applications 33, 125-146, 2007.
[22] Romero, C., Ventura, S., Espejo, P.G. and Hervas, C., "Data Mining Algorithms to Classify Students." In Proceedings of the 1st International Conference on Educational Data Mining, pp8-17, 2008.
[23] Superby, J.F., Vandamme, J.-P. and Meskens, N., "Determination of factors influencing the achievement of the first-year university students using data mining methods." In Proceedings of the Workshop on Educational Data Mining at the 8th International Conference on Intelligent Tutoring Systems (ITS), pp37-44, 2006.
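As a supplement to the measure definitions in Section IV, the following is a minimal Python sketch of how the seven measures could be computed for a single rule X⇒Y from transaction counts. It is not the authors' implementation. The names `RuleCounts` and `measures` are hypothetical, confidence is taken as P(X,Y)/P(X), and the counts in the example (84, 100 and 72 out of 120) are illustrative assumptions; the paper reports only rounded supports, not raw counts. The sketch also assumes 0 < P(X) < 1 and 0 < P(Y) < 1, since the correlation is undefined otherwise.

```python
from dataclasses import dataclass
from math import acos, degrees, inf, sqrt


@dataclass(frozen=True)
class RuleCounts:
    """Transaction counts needed to score a rule X => Y (hypothetical helper)."""
    n: int     # total number of transactions
    n_x: int   # transactions containing X
    n_y: int   # transactions containing Y
    n_xy: int  # transactions containing both X and Y


def measures(c: RuleCounts) -> dict[str, float]:
    """Return the interestingness measures of Section IV for X => Y.

    Assumes 0 < P(X) < 1 and 0 < P(Y) < 1.
    """
    p_x, p_y, p_xy = c.n_x / c.n, c.n_y / c.n, c.n_xy / c.n
    confidence = p_xy / p_x
    # Clamp to 1.0 to guard acos() against floating-point overshoot.
    cosine = min(1.0, p_xy / sqrt(p_x * p_y))
    return {
        "support": p_xy,
        "confidence": confidence,
        "cosine": cosine,
        "angle_deg": degrees(acos(cosine)),
        "added_value": confidence - p_y,
        "lift": confidence / p_y,
        "correlation": (p_xy - p_x * p_y)
        / sqrt(p_x * p_y * (1 - p_x) * (1 - p_y)),
        # Conviction grows without bound as confidence approaches 1.
        "conviction": inf if confidence == 1 else (1 - p_y) / (1 - confidence),
    }


if __name__ == "__main__":
    # Illustrative counts only (assumed, not taken from the paper).
    rule = RuleCounts(n=120, n_x=84, n_y=100, n_xy=72)
    for name, value in measures(rule).items():
        print(f"{name:12s} {value:.3f}")
```

Swapping `n_x` and `n_y` leaves cosine, lift and correlation unchanged but alters confidence, added value and conviction, which mirrors the symmetric and non-symmetric measures distinguished in Section IV.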