Mining Text Data for Useful Information in Higher Education
John Zilvinskis
Indiana University

Institutional Researchers Credo
“We have not succeeded in answering all our problems—indeed we sometimes feel we have not completely answered any of them. The answers we have found have only served to raise a whole set of new questions. In some ways we feel that we are as confused as ever, but we think we are confused on a higher level and about more important things.”
Earl C. Kelley, Professor of Secondary Education at Wayne University, 1951

Presentation Overview
1. Describe basic concepts of text mining
2. Invite presentation attendees to ask questions and discuss application of this technology
3. List the differences in text mining software
4. Apply this technique to two real life examples
5. Provide implications and considerations

Raise your hand if…
• You have a general understanding of text mining

Keep your hand up if…
• You have or someone you know has participated in a text mining project
• You have played a significant role in at least one project that used text mining
• You have written code for or worked on several text mining projects

Learning Outcomes
As a result of attending this session, participants will be able to:
• List fundamental methodologies for organizing text data.
• Describe how one could integrate mined text in student learning and performance analytics.
• Compare the differences between text mining software packages.
• Use text mining methods to refine survey questions.

Big Data & Data Mining
Big Data (Laney)
• volume (amount of data)
• velocity (speed of data)
• variety (range of data types and sources)
Data Mining
• Applying algorithms to big data to generate new information

Analytics
• Predictive, Automated, Scale, Real time
• Data mining to create “actionable intelligence” (Campbell, DeBlois, & Oblinger, 2007, p. 42)

Learning v. Student Analytics

Text Mining
“The need to ‘turn text into numbers’ so powerful algorithms can be applied to large document databases” (Miner, Delen, Elder, Fast, Hill, & Nisbet, 2012, p. 30)

Text analytics
• volume (amount of data)
• velocity (speed of data)
• variety (range of data types and sources)

Citation
Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications
Miner, Delen, Elder, Fast, Hill, & Nisbet, 2012

Text Mining Processes
1. Define project and identify data
2. Process data: establish a corpus, pre-process data, extract knowledge
3. Develop models
4. Evaluate results
5. Disseminate results

Extract Knowledge
• Classification
• Clustering
• Association
• Trend analysis

Why Not Qualitative Research?
• Requires extensive resources
• Data must be processed in a timely fashion
• Might not be practical with big data
• Information must integrate with other data

What Kind of Text Can We Mine? For What Purpose Should We Mine?
“Perhaps attendees could share what type of text-based datasets are available to them or which ones they would like to have access to. This may help IR staff recognize what text they have access to and can analyze in addition to learning how they may conduct such analyses.” – AIR Program Reviewer

How Can We Mine Text in IR?
Kind of Data                     For What Purpose
Application essays               Acceptance, enrollment
Written assignments              Likelihood of passing
CMS postings                     Participation
Student blogs                    Change in student major
Course evaluations               Faculty success
Surveys                          Open-ended questions
E-portfolios                     Student success
Early alert, course drop text    Student performance

Software
Freeware
• RapidMiner: easy user interface, inverse document frequencies, some aspects for purchase
• Weka/KEA: applicable to machine learning, some resources
• R: computer science heavy, many online resources
Commercial Software
• Modeler Premium (SPSS, IBM): strong user interface, other analytics tools, easy to use and comprehensive dictionary
• Enterprise Miner (SAS): moderate user interface, comprehensive data manipulation, and integrated clustering function

Classifying Open Ended Responses
• National Survey of Student Engagement
• Experimental item set: leadership
• Formal leadership core item
• 1,482 of 4,836 students listed ‘other’
• Classified 830 (56%) entries

Classifying Open Ended Responses
Position              n      % of other
Tutoring              145    9.8%
Teaching Assistant    87     5.9%
Research Assistant    60     4.0%
Secretary             55     3.7%
Treasurer             57     3.8%
Mentor                54     3.6%
Member                51     3.4%
Editor                25     1.7%

Classifying Open Ended Responses
                      Did Not Complete        Completed
                      Formal Leadership       Formal Leadership
Position              n      %                n      %
Original Option
Resident Assistant    206    34.3%            395    65.7%
Diversity Advocate    28     38.9%            44     61.1%
Judicial Officer      20     37.7%            33     62.3%
President             41     4.6%             846    95.4%
Write-In Other
Tutoring              77     53.1%            68     46.9%
Teaching Assistant    44     50.6%            43     49.4%
Treasurer             13     23.6%            42     76.4%
Editor                5      20.0%            20     80.0%

Clustering E-Portfolio Submissions
• City University of New York (CUNY) Guttman
• High touch, block scheduling, learning communities, summer bridge
• Bill and Melinda Gates grant
• 163 student e-portfolio introductions

Clustering E-Portfolio Submissions
Concept                  Clustered Terms
Family                   family, york, high school, college, child
Learning                 class, teacher, art, math, subject
Everyday                 know, day, love, life
College participation    high school, school, attend, guttman
Gaming                   game, movie, favorite, watch, video
Making friends           shy, person, friend, know, quiet
Recreation               art, basketball, play, sport, travel
Society                  social, worker, work, believe, help
Technology               technology, information, art, health, mind
Business                 guttman, business, manhattan, administration, graduate

Regression of Academic Preparation and Clustered Text Related to Credit Hours
Independent Variable     β        Sig.
SATV                     -0.23    0.02
SATM                     0.22     0.02
WritProf                 0.20     0.02
Age                      0.08     0.31
Connection to family     -0.15    0.06
R²                       0.12

Implications
• Process of automation
• Considering text source
• Weight of sentiment

Considerations
• Theoretical v. A-theoretical
• Ethical considerations
• Creepy treehouse
• Use of language

Thank You
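
Supplementary Sketch: Turning Text into Numbers
The "establish a corpus, pre-process data" steps and the inverse document frequencies mentioned under RapidMiner can be illustrated with a small example. This is only a minimal sketch in plain Python, not the workflow used in the projects above: the three responses, the stop-word list, and the function names (tokenize, term_document_matrix, tf_idf) are all hypothetical and chosen for illustration. It uses raw term counts multiplied by log(N / document frequency), which is one common weighting among several.

```python
import math
import re
from collections import Counter

# Hypothetical write-in responses; not taken from any real survey data.
corpus = [
    "I was a tutor for math",
    "Tutor in the writing center",
    "Treasurer of the math club",
]

# Hypothetical stop-word list; real projects use much longer lists.
STOP_WORDS = {"i", "was", "a", "for", "in", "the", "of"}


def tokenize(text):
    """Pre-process: lowercase, keep alphabetic words, drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (vocabulary, rows of raw term counts), one row per document."""
    tokenized = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*tokenized))
    counts = [[c[term] for term in vocab] for c in tokenized]
    return vocab, counts


def tf_idf(counts):
    """Weight each raw count by log(N / number of documents with the term)."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Every vocabulary term occurs in at least one document, so doc_freq >= 1.
    idf = [math.log(n_docs / df) for df in doc_freq]
    return [[tf * w for tf, w in zip(row, idf)] for row in counts]


vocab, counts = term_document_matrix(corpus)
weights = tf_idf(counts)

print(vocab)
for row in weights:
    print([round(w, 2) for w in row])
```

Each response becomes a row of numbers, one column per term. Terms that appear in fewer responses receive a larger weight than terms shared across many responses, and rows like these are the kind of input that classification or clustering algorithms can work from.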