DATA MINING: DEFINITIONS AND DECISION TREE EXAMPLES
Emily Thomas
Director of Planning and Institutional Research

WHAT IS DATA MINING?
+ Data mining is the discovery of hidden knowledge, unexpected patterns and new rules in large databases.
- Data mining is exploratory. The results lack the protection from spurious conclusions that validates theory-based, hypothesis-driven statistics.

WHY USE DATA MINING?
In the corporate world:
• Large amounts of data are captured in enterprise databases.
• These databases are too large for traditional statistical techniques.
• Identifying patterns in the data can target profitable, or unprofitable, customers.

WHY USE DATA MINING?
In institutional research:
• Large numbers of variables
• We have insufficient time/resources to investigate all the relationships that might be informative.
• Identifying data patterns can shed light on student behavior.

WHY DATA MINING NOW?
• Development of large, integrated enterprise databases
• Development of data mining techniques and software
• Development of simplified user interface

DATA MINING TECHNIQUES
• Decision trees
• Rule induction
• Nearest neighbors
• Exploratory factor analysis
• Stepwise regression
• Neural networks
• Clustering
• Genetic algorithms

DECISION TREE ANALYSIS
CHAID: Chi-squared Automatic Interaction Detector (SPSS AnswerTree)
1. Select significant independent variables
2. Identify category groupings or interval breaks to create groups most different with respect to the dependent variable
3. Select as the primary independent variable the one identifying groups with the most different values of the dependent variable
4. Select additional variables to extend each branch if there are further significant differences

TRANSFER RETENTION RATES
Percent of new full-time Fall 2002 transfers returning in Spring 2003

                     Returned       Left           N
All new transfers    .88            .12        1,258
GPA 0-1.31           80 (.65)       43 (.35)     123
GPA 1.31-3.00        586 (.90)      63 (.10)     649
GPA 3.00-4.00        428 (.94)      25 (.06)     453
No GPA               7 (.21)        26 (.79)      33

4 x 2 contingency table: chi square=214.41, p=.0000

TRANSFER RETENTION RATES FALL 2002-SPRING 2003
Percent returning in Spring 2003: 88% (n=1258)

Fall 2002 GPA:
• 0-1.31: 65% (n=123)
    Age:
    • 13-20: 81% (n=62)
    • 20-48: 50% (n=61)
• 1.31-3.00: 90% (n=649)
• 3.00-4.00: 94% (n=453)
• Missing: 21% (n=33)

SOS 2000: SATISFACTION WITH THE QUALITY OF EDUCATION
Percent rating the quality of education good or excellent: 70% (n=1695)

Self-reported intellectual growth (chi square=418.46):

Growth        Rated good/excellent    n      Share of students
Low/none      .30                     122     7%
Moderate      .56                     560    33%
Large         .79                     689    41%
Very large    .91                     324    19%

VERY LARGE INTELLECTUAL GROWTH
19% of students

What is your overall impression of the quality of education at this college? 70%*
Very large intellectual growth: 91%*

Quality of instruction:
• Dissatisfied: 71%*
• Satisfied: 94%*
• Very satisfied: 97%*

Intellectually stimulated:
• Not always: 93%*
• Always: 100%*

* Percent of students reporting “excellent” or “good” quality of education.

LARGE INTELLECTUAL GROWTH
41% of students
79% of students rated educational quality good or excellent

Satisfied with academic experience:
• Less than half time: 51%*
• About half the time: 72%
• More than half the time: 88%
• Almost always: 94%

Concern for you as an individual:
• Dissatisfied: 59%
• Neutral: 71%
• Satisfied: 94%

Course availability:
• Dissatisfied: 77%
• Neutral: 93%
• Satisfied: 93%

* Percent of students reporting “excellent” or “good” quality of education.

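As an aside on the transfer retention slide, the chi-square statistic reported for the 4 x 2 contingency table can be checked directly from the counts shown there. The sketch below is only an illustration of the Pearson chi-square test that CHAID relies on, not the AnswerTree procedure itself; the function names are hypothetical, and the closed-form tail probability is valid only for exactly 3 degrees of freedom.

```python
import math

# Observed counts (returned, left) by Fall 2002 GPA group, copied from the
# transfer retention table.
observed = {
    "GPA 0-1.31": (80, 43),
    "GPA 1.31-3.00": (586, 63),
    "GPA 3.00-4.00": (428, 25),
    "no GPA": (7, 26),
}


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a dict of row -> counts."""
    rows = list(table.values())
    n_cols = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(n_cols)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for j, count in enumerate(row):
            # Expected count if retention were independent of GPA group.
            expected = row_total * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(rows) - 1) * (n_cols - 1)
    return statistic, dof


def chi_square_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


statistic, dof = pearson_chi_square(observed)
p_value = chi_square_sf_3df(statistic)
print(f"chi square = {statistic:.2f}, df = {dof}, p = {p_value:.4f}")
```

The statistic should agree with the slide's chi square of 214.41 to rounding, and the p-value is far below .0001, which is why the slide shows it as .0000.
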
LOW OR MODERATE INTELLECTUAL GROWTH
40% of students

Intellectual growth:
• None/small: 30%*
• Moderate: 55%

Class size relative to type of course:
• Very dissatisfied: 8%
• Dissatisfied-satisfied: 41%

Satisfied with academic experience:
• Rarely: 31%
• Half the time: 58%
• More than half time: 77%

Sense of belonging:
• Dissatisfied: 18%
• Satisfied: 40%

Quality of instruction:
• Dissatisfied: 65%
• Satisfied: 86%

* Percent of students reporting “excellent” or “good” quality of education.

SOS 2000: SATISFACTION WITH “THIS COLLEGE IN GENERAL”
FACULTY COME TO CLASS WELL PREPARED

Rarely/less than half the time (25% of students):
• Concern for you as an individual
• Condition of campus buildings and grounds
• Condition of residence hall facilities
• Sense of belonging

Half or more than half the time (43% of students):
• Academic experiences [in the classroom]
• Sense of belonging
• Sense of belonging
• Personal safety

Almost always (31% of students):
• Quality of instruction
• Sense of belonging
• Concern for you as an individual
• Attitudes of campus staff

DECISION TREE ADVANTAGES AND DISADVANTAGES
+ Discover unexpected relationships
+ Identify subgroup differences
+ Use categorical or continuous data
+ Accommodate missing data
- Possibly spurious relationships
- Presentation difficulties

BIBLIOGRAPHY
• AnswerTree 2.0: User’s Guide. SPSS, 1998.
• Adriaans, P and D Zantinge (1996). Data Mining. Harlow, England and elsewhere: Addison-Wesley.
• Bordon, VMH (1995). Segmenting Student Markets with a Student Satisfaction and Priorities Survey. Research in Higher Education 16:2, 115-138.
• Neville, PG (1999). “Decision Trees for Predictive Modeling,” SAS Technical Report, The SAS Institute.
• Thomas, EH and N Galambos. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision Tree Analysis. Forthcoming in Research in Higher Education, May 2004.