Mining Text Data for Useful Information in Higher Education
John Zilvinskis
Indiana University
Institutional Researcher's Credo
“We have not succeeded in answering all our
problems—indeed we sometimes feel we have not
completely answered any of them. The answers we
have found have only served to raise a whole set of
new questions. In some ways we feel that we are as
confused as ever, but we think we are confused on a
higher level and about more important things.”
Earl C. Kelley, Professor of Secondary Education at Wayne University, 1951
Presentation Overview
1. Describe basic concepts of text mining
2. Invite presentation attendees to ask questions and discuss application of this technology
3. List the differences in text mining software
4. Apply this technique to two real life examples
5. Provide implications and considerations
Raise your hand if…
You have a general understanding of text mining
Keep your hand up if…
You have participated, or someone you know has participated, in a text mining project
You have played a significant role in at least one project that used text mining
You have written code for or worked on several text mining projects
Learning Outcomes
As a result of attending this session, participants will be able to:
• List fundamental methodologies for organizing text data.
• Describe how one could integrate mined text in student learning and performance analytics.
• Compare the differences between text mining software packages.
• Use text mining methods to refine survey questions.
Big Data & Data Mining
Big Data (Laney)
volume (amount of data)
velocity (speed of data)
variety (range of data types and sources)
Data Mining: applying algorithms to big data to generate new information
Analytics
Predictive, Automated, Scale, Real time
Data mining to create “actionable intelligence” (Campbell, DeBlois, & Oblinger, 2007, p. 42)
Learning v. Student Analytics
Text Mining
“The need to ‘turn text into numbers’ so powerful algorithms can be applied to large document databases” (Miner, Delen, Elder, Fast, Hill, & Nisbet, 2012, p. 30)
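A minimal sketch of what “turning text into numbers” can look like: each document becomes a row of term counts over a shared vocabulary (a document-term matrix). The regex tokenizer and the two example responses are illustrative assumptions, not data from this presentation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # shared, alphabetized vocabulary
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

# Hypothetical open-ended responses.
docs = [
    "I tutor math and love math",
    "I work as a teaching assistant",
]
vocab, matrix = document_term_matrix(docs)
print(vocab)
print(matrix)
```

Each row is a document and each column a term, so classification or clustering algorithms can treat the responses as numeric vectors.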
Text analytics
volume (amount of data)
velocity (speed of data)
variety (range of data types and sources)
Citation
Miner, Delen, Elder, Fast, Hill, & Nisbet (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications.
Text Mining Processes
Define project and identify data
Process data: establish a corpus, pre-process data, extract knowledge
Develop models
Evaluate results
Disseminate results
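The “establish a corpus” and “pre-process data” steps might be sketched as follows. This assumes a corpus keyed by hypothetical response IDs and a deliberately tiny stop-word list; real projects typically use a fuller stop-word list and often add stemming or lemmatization.

```python
import re

# A tiny, illustrative stop-word list (hypothetical; real projects use a fuller list).
STOP_WORDS = {"a", "am", "an", "and", "as", "i", "in", "is", "my", "of", "the", "to"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Establish a corpus: map a (hypothetical) document ID to its raw text.
corpus = {
    "resp_001": "I am a tutor in the Writing Center",
    "resp_002": "Treasurer of the Chess Club",
}

# Pre-process every document in the corpus.
processed = {doc_id: preprocess(text) for doc_id, text in corpus.items()}
print(processed)
```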
Extract Knowledge
Classification
Clustering
Association
Trend analysis
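Of the knowledge-extraction tasks listed, association is perhaps the least familiar. One common form, sketched here under the assumption that each document is already a list of pre-processed terms, scores term pairs by support (the share of documents containing both terms) and confidence (the share of documents containing the first term that also contain the second). The minimum-support threshold and the example documents are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(docs, min_support=0.5):
    """Association rules A -> B between terms that co-occur in documents.

    support(A, B)      = share of documents containing both A and B
    confidence(A -> B) = documents with A and B / documents with A
    """
    n = len(docs)
    term_sets = [set(d) for d in docs]
    term_counts = Counter(t for s in term_sets for t in s)
    pair_counts = Counter(p for s in term_sets for p in combinations(sorted(s), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # skip rare pairs
        rules.append((a, b, support, count / term_counts[a]))
        rules.append((b, a, support, count / term_counts[b]))
    return sorted(rules)

docs = [
    ["high", "school", "college"],
    ["high", "school", "friend"],
    ["college", "family"],
    ["high", "school", "family"],
]
rules = pair_rules(docs)
for a, b, s, c in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```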
Why Not Qualitative Research?
• Requires extensive resources
• Data must be processed in a timely fashion
• Might not be practical with big data
• Information must integrate with other data
What Kind of Text Can We Mine?
For What Purpose Should We Mine?
“Perhaps attendees could share what type of text-based datasets are available to them or which ones they would like to have access to. This may help IR staff recognize what text they have access to and can analyze in addition to learning how they may conduct such analyses.”
– AIR Program Reviewer
How Can We Mine Text in IR?
Kind of Data
Application essays
Written assignments
CMS postings
Student blogs
Course evaluations
Surveys
E-portfolios
Early alert, course drop text
For What Purpose
Acceptance, enrollment
Likelihood of passing
Participation
Change in student major
Faculty success
Open-ended questions
Student success
Student performance
Software
Freeware
RapidMiner – Easy user interface, inverse document frequencies, some features must be purchased
Weka/KEA – Applicable to machine learning, some resources
R – Computer science heavy, many online resources
Commercial Software
Modeler Premium (SPSS, IBM) – Strong user interface, other analytics tools, easy to use, comprehensive dictionary
Enterprise Miner (SAS) – Moderate user interface, comprehensive data manipulation, integrated clustering function
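The “inverse document frequencies” noted for RapidMiner are typically used as TF-IDF weighting, which down-weights terms that appear in nearly every document. A minimal sketch of one common variant, raw term count × ln(N / df), where N is the number of documents and df is the number of documents containing the term. Tools differ in smoothing and normalization, so this is an assumption-level illustration, not RapidMiner's exact formula.

```python
import math
from collections import Counter

def tfidf(docs):
    """Weight each term by raw count * ln(N / document frequency).

    A term that appears in every document gets weight 0; rarer terms weigh more.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

# Hypothetical pre-processed documents.
docs = [
    ["student", "tutor", "math", "math"],
    ["student", "treasurer", "club"],
    ["student", "editor", "newspaper"],
]
w = tfidf(docs)
print(w[0])
```

Here “student” appears in every document and drops to zero, while “math” outweighs “tutor” because it occurs twice.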
Classifying Open-Ended Responses
National Survey of Student Engagement (NSSE)
Experimental item set: leadership
Formal leadership core item
1,482 of 4,836 students listed ‘other’
Classified 830 (56%) of these entries
Classifying Open-Ended Responses
Position              n      % of other
Tutoring              145    9.8%
Teaching Assistant    87     5.9%
Research Assistant    60     4.0%
Secretary             55     3.7%
Treasurer             57     3.8%
Mentor                54     3.6%
Member                51     3.4%
Editor                25     1.7%
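The presentation does not say how the 830 write-ins were classified. One simple approach, sketched here as an assumption, is keyword matching: assign each “other” response to the first category whose keyword it contains, then report each category's count as a percentage of all “other” entries, as in the table above. The keyword rules and sample responses are hypothetical.

```python
from collections import Counter

# Hypothetical keyword rules; substring matching means "tutor" also matches "tutoring".
CATEGORY_KEYWORDS = {
    "Tutoring": ("tutor",),
    "Teaching Assistant": ("teaching assistant",),
    "Research Assistant": ("research assistant",),
    "Treasurer": ("treasurer",),
    "Editor": ("editor",),
}

def classify(response):
    """Return the first category whose keyword appears in the response, else None."""
    text = response.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return None

def tally(responses):
    """Map category -> (count, % of all 'other' responses, one decimal)."""
    counts = Counter(c for c in map(classify, responses) if c is not None)
    total = len(responses)
    return {c: (n, round(100 * n / total, 1)) for c, n in counts.most_common()}

responses = [
    "Peer tutor for chemistry",
    "Math tutoring center",
    "Teaching assistant, Bio 101",
    "Club treasurer",
    "Just a member",
]
result = tally(responses)
classified = sum(n for n, _ in result.values())
print(result)
print(f"classified {classified} of {len(responses)} entries")
```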
Classifying Open-Ended Responses
Position              Did Not Complete        Completed
                      Formal Leadership       Formal Leadership
                      n      %                n      %
Original Option
  Resident Assistant  206    34.3%            395    65.7%
  Diversity Advocate  28     38.9%            44     61.1%
  Judicial Officer    20     37.7%            33     62.3%
  President           41     4.6%             846    95.4%
Write-In Other
  Tutoring            77     53.1%            68     46.9%
  Teaching Assistant  44     50.6%            43     49.4%
  Treasurer           13     23.6%            42     76.4%
  Editor              5      20.0%            20     80.0%
Percentages are within each position (each row sums to 100%).
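The percentages on this slide are row percentages: each position's counts are divided by that position's total, so the two columns sum to 100%. A quick check using the counts reported on the slide (the helper name `row_percentages` is hypothetical):

```python
# Counts from the slide: (did not complete, completed) formal leadership.
counts = {
    "Resident Assistant": (206, 395),
    "Diversity Advocate": (28, 44),
    "Judicial Officer": (20, 33),
    "President": (41, 846),
    "Tutoring": (77, 68),
    "Teaching Assistant": (44, 43),
    "Treasurer": (13, 42),
    "Editor": (5, 20),
}

def row_percentages(did_not, completed):
    """Share of a position's students in each column, rounded to one decimal."""
    total = did_not + completed
    return round(100 * did_not / total, 1), round(100 * completed / total, 1)

table = {position: row_percentages(*pair) for position, pair in counts.items()}
for position, (p_not, p_done) in table.items():
    print(f"{position:<20} {p_not:>5}% {p_done:>5}%")
```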
Clustering E-Portfolio Submissions
Guttman Community College, City University of New York (CUNY)
High touch, block scheduling, learning communities, summer bridge
Bill & Melinda Gates Foundation grant
163 student e-portfolio introductions
Clustering E-Portfolio Submissions
Concept                 Clustered Terms
Family                  family, york, high school, college, child
Learning                class, teacher, art, math, subject
Everyday                know, day, love, life
College participation   high school, school, attend, guttman
Gaming                  game, movie, favorite, watch, video
Making friends          shy, person, friend, know, quiet
Recreation              art, basketball, play, sport, travel
Society                 social, worker, work, believe, help
Technology              technology, information, art, health, mind
Business                guttman, business, manhattan, administration, graduate
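The presentation does not name the clustering algorithm behind these concepts. As a swapped-in illustration only, the sketch below runs plain k-means on length-normalized term-count vectors and lists each cluster's highest-weighted terms, which is one common way a “clustered terms” list is produced. The stop-word list, the seeding rule (first k documents), and the four example introductions are assumptions.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "and", "i", "is", "my", "the", "to"}  # hypothetical, minimal

def vectorize(docs):
    """Unit-length term-count vectors over a shared, sorted vocabulary."""
    counts = [
        Counter(w for w in re.findall(r"[a-z]+", d.lower()) if w not in STOP_WORDS)
        for d in docs
    ]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors

def kmeans(vectors, k, iterations=20):
    """Plain k-means; seeds centroids with the first k vectors for reproducibility."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def top_terms(vocab, centroid, n=3):
    """The n highest-weighted terms in a centroid (ties broken alphabetically)."""
    ranked = sorted(range(len(vocab)), key=lambda i: (-centroid[i], vocab[i]))
    return [vocab[i] for i in ranked[:n]]

docs = [
    "my family and my child",
    "I play video game and watch movie",
    "family is everything to my child",
    "my favorite game is a video game",
]
vocab, vectors = vectorize(docs)
labels, centroids = kmeans(vectors, k=2)
for j, c in enumerate(centroids):
    print(j, top_terms(vocab, c))
```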
Regression of Academic Preparation and Clustered Text Related to Credit Hours
Independent Variable    β        Sig.
SATV                    -0.23    0.02
SATM                    0.22     0.02
WritProf                0.20     0.02
Age                     0.08     0.31
Connection to family    -0.15    0.06
R² = 0.12
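The β column appears to report standardized regression coefficients. A dependency-free sketch of how such coefficients can be estimated: z-score every variable, then solve the ordinary least squares normal equations. The data in the usage lines are hypothetical (a 0/1 flag stands in for membership in the “connection to family” cluster), and the sketch omits the significance tests behind the Sig. column.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def standardize(values):
    """Z-scores using the sample standard deviation."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

def standardized_regression(predictors, outcome):
    """OLS on z-scored variables: returns (standardized betas, R squared).

    Centered variables need no intercept; R^2 = beta' X'y / y'y.
    """
    xs = [standardize(col) for col in predictors]
    y = standardize(outcome)
    k = len(xs)
    xtx = [[sum(p * q for p, q in zip(xs[i], xs[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(xs[i], y)) for i in range(k)]
    betas = solve(xtx, xty)
    r2 = sum(b * s for b, s in zip(betas, xty)) / sum(v * v for v in y)
    return betas, r2

# Hypothetical data: SAT math, a 0/1 family-cluster flag, and credit hours earned.
sat_math = [480, 520, 610, 550, 700, 430, 590, 640]
family_cluster = [1, 0, 0, 1, 0, 1, 1, 0]
credit_hours = [18, 24, 27, 20, 30, 15, 22, 29]
betas, r2 = standardized_regression([sat_math, family_cluster], credit_hours)
print("betas:", [round(b, 2) for b in betas], "R2:", round(r2, 2))
```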
Implications
Process of automation
Considering text source
Weight of sentiment
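For readers unfamiliar with how sentiment gets a numeric weight at all, one simple approach is lexicon-based scoring: look up each word in a dictionary of sentiment weights and average the matches. The five-word lexicon and its weights below are hypothetical; real projects would use a validated sentiment dictionary.

```python
import re

# Hypothetical mini-lexicon mapping words to sentiment weights in [-1, 1].
LEXICON = {"love": 1.0, "favorite": 0.5, "help": 0.5, "shy": -0.5, "hate": -1.0}

def sentiment(text):
    """Mean weight of the lexicon words found in the text; 0.0 if none are found."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment("I love my family and love to help"))
```

How much such a score should count alongside other measures is exactly the kind of judgment this implication points to.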
Considerations
Theoretical v. atheoretical
Ethical considerations
Creepy treehouse
Use of language
Thank You