International Journal of Research in Computer and Communication Technology, Vol 3, Issue 10, October - 2014
ISSN (Online) 2278-5841, ISSN (Print) 2320-5156

Analyzing Feedback Patterns Using Data Mining Technique

Shaunak Chheda
Department of Computer Engineering, D.J. Sanghvi College of Engineering, Mumbai, India
Email: [email protected]

Prof. Lynette D’mello
Department of Computer Engineering, D.J. Sanghvi College of Engineering, Mumbai, India
Email: [email protected]

Abstract: This paper presents a data mining technique that can be used to study which courses a student is more likely to be interested in during his graduation. The raw data was collected from feedback forms of an institution offering various courses. We processed the available raw data and performed t-weight calculations to present useful results.

Keywords: Data mining, KDD, Data preprocessing, t-weight.

I. INTRODUCTION
Data mining, a relatively new technique, is applied to large databases to find novel and useful patterns that might otherwise remain unknown. This paper presents a data mining approach to study course preference patterns among students of various departments in a university. The data mining process consists of a series of transformation steps, as shown in Figure 1. It is the overall process of converting raw data into useful information.

I. Data Preprocessing
It deals with processing the raw data, which has been collected from disparate sources, and converting it into a uniform format [2]. The main aim of data preprocessing is to select the data relevant to the data mining task at hand. It consists of the following tasks:
1) Data cleaning
Data cleaning attempts to “clean” the data by filling in missing values, smoothing noisy data, identifying or removing outliers and resolving inconsistencies. Ambiguous data can confuse the mining procedure, resulting in an undesirable output.
2) Data integration
Data is drawn from various sources for analysis.
This involves integrating multiple databases, files, etc. It should be ensured that all the data is represented in a consistent manner, without any redundancies.
3) Data reduction
It deals with eliminating irrelevant data and thereby reducing the size of the data set. It obtains a reduced representation of the data set that is much smaller in volume, yet produces the same analytical result. The original data can be compressed or replaced by an alternative, smaller representation.
4) Data transformation
It converts the data into forms appropriate for mining, so that the resulting mining process may be more efficient and the patterns found easier to understand.

II. Data Mining
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. It is the most essential step, in which intelligent methods are applied to extract useful patterns from the preprocessed data.

Figure 1. KDD (Knowledge Discovery in Databases) Process

III. Data Post-processing
It deals with presenting the information from the data mining phase in a manner that is easier for a user to understand. It consists of:
1) Pattern evaluation
It deals with identifying the truly interesting patterns in the processed data.
2) Knowledge presentation
It uses data visualization and knowledge representation techniques such as graphs and charts to present the mined knowledge to users [2].

II. DESCRIPTION OF DATA
The University comprises three departments, namely P, Q and R. Each department consists of 50 students. The University offers various courses to its students. In this paper, we analyzed and counted the feedback from all the students who attended the respective seminar. A positive feedback was recorded as a “Yes” and a negative feedback as a “No”.
This study looks at the likelihood that a student belonging to a particular department would prefer to take a particular course of interest in the future.

III. DATA PRE-PROCESSING
The data for this study was collected from feedback forms filled in by students of different departments and read into a spreadsheet. This data was then cleaned by removing extra or unnecessary information and integrated into a database. The resulting database contains 150 rows (one for each student) and four attributes (one for the department id of the student and one for each course covered). Table 1 presents a sample of the raw data in the database:

Table 1: Sample raw data
DEPT ID A1 A2 … B1 B2 … C1 C2 …
P Y N Q N Y R N N N N N Y Y Y Y Y Y Y N Y

The University has three major departments, namely P, Q and R, each with a strength of 50 students. The dataset contained only one student whose department was “S”. Since only one student belonged to Department S, we eliminated this row from our data set.

Some students did not submit feedback for a particular course. For any course that lacked feedback from a particular student, we substituted the majority feedback of that student's department for that course. So, if a student from Department Q did not give feedback for course C1, but a majority of the other students from Department Q voted “Y” for that course, we replaced the “No feedback” with a “Y”. The same process was carried out for students who could not attend the course. This made our dataset (or database, as shown in Table 1) ready to be processed or mined. We start our quantitative analysis with an exploratory quantitative analysis tool: t-weight calculations.

IV. QUANTITATIVE CHARACTERISTIC RULE
A logic rule that deals with quantitative information is called a quantitative rule.
t-weights are an exploratory quantitative data analysis tool that presents within-class comparisons. Let $q_a$ be a generalized tuple describing the target class. The t-weight for $q_a$ is the percentage of tuples of the target class from the initial working relation that are covered by $q_a$:

\[ t\_weight = \frac{count(q_a)}{\sum_{i=1}^{n} count(q_i)} \]

where $n$ is the total number of tuples for the target class in the generalized relation, $q_1, q_2, \ldots, q_n$ are the tuples for the target class, and $q_a$ is one of $q_1, q_2, \ldots, q_n$. The range of the t-weight is [0.0, 1.0] or [0%, 100%].

The t-weight rule is expressed in the form:

\[ \forall X,\ target\_class(X) \Rightarrow condition_1(X)[t : w_1] \lor \cdots \lor condition_m(X)[t : w_m] \]

This rule can be read as: if X is in the target class, then there is a probability of $w_i$ that the tuple X satisfies $condition_i$. In this data set, for example, the t-weights measure, for each course, the probability that a student of each class gives a yes feedback or a no feedback. So, for each course, we count the number of yes feedbacks and no feedbacks for each target class. Using the data from Table 1, we generated the t-weights.

Table 2. t-weights – Course C1
DEPT   C1   COUNT   t-weight
P      Y    10      20%
P      N    40      80%
Q      Y    45      90%
Q      N    5       10%
R      Y    15      30%
R      N    35      70%

The t-weights of Table 2 can be converted into logic rules. Let the target class be Dept(D). Then the corresponding characteristic rule in logic form is:
Rule 1: ∀X, Dept(X) = 'P' ⇒ (C1="Y")[t:20%] ∨ (C1="N")[t:80%]
This rule says that if X is in the target class, that is, if a student of the university belongs to Department P, there is a 20% probability that this student gave a “Yes” feedback for course C1, and an 80% probability that this student gave a “No” feedback for course C1.
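As a rough illustration of the calculation above, the following Python sketch recomputes the t-weights of Table 2 from its counts and formats them as characteristic rules. The names `C1_COUNTS`, `t_weights` and `characteristic_rule` are hypothetical and chosen only for this example; the paper does not state which tool was used for the calculation. The rule string uses the ASCII spellings "forall", "=>" and "V" in place of the logic symbols.

```python
# Feedback counts for course C1, copied from Table 2: {department: {answer: count}}.
C1_COUNTS = {
    "P": {"Y": 10, "N": 40},
    "Q": {"Y": 45, "N": 5},
    "R": {"Y": 15, "N": 35},
}


def t_weights(counts):
    """Return {department: {answer: t-weight}}, with t-weights as fractions in [0, 1]."""
    weights = {}
    for dept, by_answer in counts.items():
        # Denominator of the t-weight: sum of count(q_i) over the target class.
        total = sum(by_answer.values())
        weights[dept] = {answer: n / total for answer, n in by_answer.items()}
    return weights


def characteristic_rule(dept, course, dept_weights):
    """Format one department's t-weights as a quantitative characteristic rule."""
    terms = [
        f'({course}="{answer}")[t:{weight:.0%}]'
        for answer, weight in dept_weights.items()
    ]
    return f"forall X, Dept(X)='{dept}' => " + " V ".join(terms)


if __name__ == "__main__":
    weights = t_weights(C1_COUNTS)
    for dept, dept_weights in weights.items():
        print(characteristic_rule(dept, "C1", dept_weights))
```

Run as a script, this prints one rule per department. Replacing `C1_COUNTS` with the counts of another course gives that course's rules in the same way.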
Similarly, the remaining rules that can be generated from Table 2 are:
Rule 2: ∀X, Dept(X) = 'Q' ⇒ (C1="Y")[t:90%] ∨ (C1="N")[t:10%]
Rule 3: ∀X, Dept(X) = 'R' ⇒ (C1="Y")[t:30%] ∨ (C1="N")[t:70%]

Table 3. t-weights – Course C2
DEPT   C2   COUNT   t-weight
P      Y    42      84%
P      N    8       16%
Q      Y    25      50%
Q      N    25      50%
R      Y    12      24%
R      N    38      76%

The t-weights of Table 3 can be converted to logic rules:
Rule 4: ∀X, Dept(X) = 'P' ⇒ (C2="Y")[t:84%] ∨ (C2="N")[t:16%]
Rule 5: ∀X, Dept(X) = 'Q' ⇒ (C2="Y")[t:50%] ∨ (C2="N")[t:50%]
Rule 6: ∀X, Dept(X) = 'R' ⇒ (C2="Y")[t:24%] ∨ (C2="N")[t:76%]

Table 4. t-weights – Course C3
DEPT   C3   COUNT   t-weight
P      Y    6       12%
P      N    44      88%
Q      Y    21      42%
Q      N    29      58%
R      Y    47      94%
R      N    3       6%

The t-weights of Table 4 can be converted to logic rules:
Rule 7: ∀X, Dept(X) = 'P' ⇒ (C3="Y")[t:12%] ∨ (C3="N")[t:88%]
Rule 8: ∀X, Dept(X) = 'Q' ⇒ (C3="Y")[t:42%] ∨ (C3="N")[t:58%]
Rule 9: ∀X, Dept(X) = 'R' ⇒ (C3="Y")[t:94%] ∨ (C3="N")[t:6%]

V. CONCLUSION FOR T-WEIGHTS
From the t-weight rules, we can draw the following conclusions:
For course C1, students of Department P and Department R are more likely to give a no feedback, and students of Department Q are more likely to give a yes feedback.
For course C2, students of Department P are more likely to give a yes feedback, and students of Department R are more likely to give a no feedback. For Department Q, both outcomes are equally likely.
For course C3, students of Department P and Department Q are more likely to give a no feedback, and students of Department R are more likely to give a yes feedback.

VI. CONCLUSION
In this paper, data mining techniques are presented that can be used to study which courses a student belonging to a department would prefer to take during his graduation tenure in the University. We have shown the whole data mining process, from processing input data to preprocessing to presenting information (in the form of rules) and conclusions.
The exploratory data mining technique, t-weights, gave us a picture of what percentage of students from each department gave a positive or a negative feedback for a particular course. This study presents patterns that can be used to determine which courses a student from a particular department would be more interested in taking up during his term. Such results will benefit the students as well as the University in deciding which courses it should offer.

Our future work aims at improving the results obtained by using advanced data mining techniques such as decision tree analysis and association rule mining. We can also use WEKA, a data mining tool that supports several standard data mining tasks. The WEKA workbench contains a collection of visualization tools and algorithms for data analysis, which can be used to provide a better and more efficient representation of the results.

VII. REFERENCES
[1] Sikha Bagui, Dustin Mink, and Patrick Cash, “Data mining techniques to study voting patterns in the US”, Data Science Journal.
[2] Jiawei Han, Micheline Kamber, and Jian Pei, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers.
[3] Oded Maimon and Lior Rokach, “Introduction to knowledge discovery in databases”.