Improving quality of graduate students by data mining

Asst. Prof. Kitsana Waiyamai, Ph.D.
Dept. of Computer Engineering
Faculty of Engineering, Kasetsart University
Bangkok, Thailand

Content

PART I
- Introduction to data mining
- Data mining technique: association rule discovery
- Data mining technique: data classification

PART II
- Improving quality of graduate students by data mining
- Conclusion

What Is Data Mining?

Knowledge Discovery from Data: KDD (Data Mining): The process of nontrivial extraction of patterns from data.

Patterns that are:
• implicit,
• previously unknown, and
• potentially useful

Patterns must be comprehensible for human users.

Knowledge Discovery Process: Iterative & Interactive Process

Labels in the process diagram:
- Mining Objective
- Data sources: Databases, flat files, Complex data, Data Warehouses
- Preprocessing data: Gathering, cleaning and selecting data
- Search for patterns: Data Mining (Neural nets, machine learning, statistics and others)
- Analyst reviews output
- Interpret results
- Report findings
- Take actions based on findings

What kind of data can be mined?

- Relational databases
- Data warehouses
- Transactional databases and Flat files
- Advanced DB systems and information repositories
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series data and temporal data
  - Text databases, multimedia databases
  - Heterogeneous and legacy databases
  - World Wide Web
  - Bioinformatic data

Two modes of data mining

Predictive data mining
- Predict behavior based on historic data
- Use data with known results to build a model that can later be used to explicitly predict values for different data
- Methods: classification, prediction, … etc.

Descriptive data mining
- Describe patterns in existing data that may be used to guide decisions
- Methods: Association rule discovery, Sequence pattern discovery, Clustering, … etc.

Data Mining Techniques

- Data Clustering
- Association rule discovery
- Data Classification
- Outlier detection
- Data regression
- Etc.
Data Classification

Classification is the process of assigning new objects to predefined categories or classes.
- Given a set of labeled records
- Build a model
- Predict labels for future unlabeled records

Example:
Age, Educational background, Annual income, Current debts, Housing location => Making Decision
Degree=“Master” and Income=7500 => Credit=“Excellent”

Three-Step Process of Classification

- Model construction (Training Data, producing the Classifier Model)
- Model evaluation (Testing Data, applied to the Classifier Model)
- Classification (Unseen Data, applied to the Classifier Model)

Data Mining Tools

- ANGOSS KnowledgeStudio
- IBM Intelligent Miner
- Megaputer PolyAnalyst
- SAS Enterprise Miner
- SGI Mineset
- SPSS Clementine
- Many others

More at http://www.kdnuggets.com/software

Data Mining Projects

Checklist:
- Start with well-defined questions
- Define measures of success and failure

Main difficulty: No automation
- Understanding the problem
- Data preparation
- Selection of the right mining methods
- Interpretation

Using Data Mining for Improving Quality of Engineering Graduates

Objective: Discover knowledge from large databases of engineering student records.

Discovered knowledge is useful in:
- Assisting in development of new curricula,
- Improvement of existing curricula,
- Helping students to select the appropriate major

Using a data mining technique to help students in selecting their majors

Motivation:
- A student's choice of major is a very important factor in his/her success.
- Students lack experience and information on each major.

Solution:
- Find the profiles of good students for each major using the student profile database and the course enrollment student databases (10 years)
- Determine the most appropriate major for each student

A Data Mining based Approach for Improving Quality of Engineering Graduates

Components in the architecture diagram: User, Java Servlet, Data Mining Tool, student profile database, course enrollment student databases, SQL Server, DB2.

Data for Data Mining

Student profile database

Stu_code  Sex   Address  Sch_GPA  .....  GPA
37058063  male  Bangkok  2.5      .....  2.3
37058167  male  Songkla  3.4      .....  3.2
........  ....  .......  ......   ....   ....

Course enrollment student databases

Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C+
37058063  403111    1     2537  D
37058063  208111    1     2537  B+
.......   ......    ....  ....  ....

Data preparation for a classification model

The student profile table and the course enrollment table shown above are combined (+) into a single table with one column per course:

Stu_code  Sex   204111  403111  .....  GPA
37058063  male  Medium  Low     .....  2.3
37058167  male  High    High    .....  3.2
.......   ....  ......  ......  .....  ......

Global Classification Model

- A global decision tree determines which majors are appropriate for which students.
- Each internal node represents a test on the student’s profile.
- Each leaf node represents an appropriate major to be selected.

Drawbacks of Global Classification Model

- Low precision (~50%) due to the large number of majors
- The number of students differs between departments => the model cannot correctly predict the best major to be selected.
- The model proposes a single major; a set of possible majors ordered by appropriateness score would be preferred.

Classification Model for Each Major

- A decision tree predicts whether a student is likely to be a good student in a given major.
- Good students are those who graduate within 4 years and rank in the top 40% of a given major.
- Leaf nodes represent two classes: Good and Bad

Advantage of Major’s Classification Model

- Good precision: 80%
- The model predicts the best major to be selected even if the number of students in each major is different
- It proposes a set of possible majors to be selected, ordered by appropriateness score.

Encountered problems
- Database size
- Other factors that could affect student’s decision: Teacher Preference, etc.
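The data preparation step described earlier (joining the student profile table with the course enrollment records so that each course becomes a column) might be sketched as follows. This is only an illustration, not the authors' implementation: the function name `build_training_table`, the dictionary field names, and the `GRADE_LEVEL` mapping are assumptions. The slides show only that C+ maps to Medium and D maps to Low; the remaining grade boundaries are guessed.

```python
from collections import defaultdict

# Hypothetical grade-to-level mapping. The slides confirm only
# C+ -> Medium and D -> Low; every other entry is an assumption.
GRADE_LEVEL = {
    "A": "High", "B+": "High", "B": "High",
    "C+": "Medium", "C": "Medium",
    "D+": "Low", "D": "Low", "F": "Low",
}


def build_training_table(profiles, enrollments):
    """Return one row per student: profile fields plus one column
    per course code holding the discretized grade level."""
    # Collect each student's course levels, keyed by student code.
    levels = defaultdict(dict)
    for rec in enrollments:
        levels[rec["stu_code"]][rec["sub_code"]] = GRADE_LEVEL[rec["grade"]]

    rows = []
    for p in profiles:
        row = {"stu_code": p["stu_code"], "sex": p["sex"]}
        row.update(levels.get(p["stu_code"], {}))  # one column per course
        row["gpa"] = p["gpa"]
        rows.append(row)
    return rows


# Sample records copied from the slide tables.
profiles = [
    {"stu_code": "37058063", "sex": "male", "address": "Bangkok",
     "sch_gpa": 2.5, "gpa": 2.3},
]
enrollments = [
    {"stu_code": "37058063", "sub_code": "204111", "term": 1, "year": 2537, "grade": "C+"},
    {"stu_code": "37058063", "sub_code": "403111", "term": 1, "year": 2537, "grade": "D"},
    {"stu_code": "37058063", "sub_code": "208111", "term": 1, "year": 2537, "grade": "B+"},
]

table = build_training_table(profiles, enrollments)
print(table[0])
```

Discretizing grades into a few levels keeps the number of distinct attribute values small, which suits a decision tree; a real pipeline would also need to decide how to treat repeated courses and missing grades, which the slides do not specify.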
Presentation of Discovered Knowledge

Applying Association rule discovery for Grade prediction

Basket Analysis applied to Education: each student's course results form one "basket", for example:

204111  403111  417167  417168
Medium  High    Medium  Medium

Grade Prediction for the Coming Term

Presentation of Discovered Knowledge

Conclusion & Future works

- Application of data mining in Education
- Use data mining techniques for improving quality of engineering students
- Apply data mining techniques to several other educational domains.
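To make the basket-analysis view of grade prediction concrete, here is a minimal sketch of how the support and confidence of one candidate rule could be computed when each student's (course, level) pairs are treated as a basket. The helper names `support` and `confidence` are illustrative, and only the first basket comes from the slide; the other three are hypothetical data added so the measures have something to count.

```python
def support(transactions, itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among baskets containing the antecedent, the fraction that
    also contain the consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a if n_a else 0.0


# One basket per student: a set of (course code, grade level) items.
transactions = [
    # From the slide.
    frozenset({("204111", "Medium"), ("403111", "High"),
               ("417167", "Medium"), ("417168", "Medium")}),
    # Hypothetical baskets, for illustration only.
    frozenset({("204111", "Medium"), ("403111", "High"),
               ("417167", "Medium"), ("417168", "High")}),
    frozenset({("204111", "Medium"), ("403111", "High"),
               ("417167", "Low"), ("417168", "Medium")}),
    frozenset({("204111", "High"), ("403111", "High"),
               ("417167", "High"), ("417168", "High")}),
]

# Candidate rule: 204111=Medium and 403111=High => 417167=Medium
antecedent = {("204111", "Medium"), ("403111", "High")}
consequent = {("417167", "Medium")}

rule_support = support(transactions, antecedent | consequent)
rule_confidence = confidence(transactions, antecedent, consequent)
print(rule_support, rule_confidence)
```

A rule with high enough support and confidence could then be used to suggest the likely level in a course a student has not yet taken. The thresholds and the mining algorithm actually used are not stated in the slides.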