The Mutually Beneficial Relationship between Big Data Science and Talent Development
Education for Big Data and Big Data for Education:
Towards Integration of Big Data and Education
ChengXiang Zhai (翟成祥)
Department of Computer Science
University of Illinois at Urbana-Champaign
USA
BDSE2016, May 25, 2016, Guiyang, China
1
The Big Data revolution:
“DataScope” enhances human perception
Microscope
Telescope
DataScope
(数据镜)
2
DataScope enables prediction & optimal
decision making
[Figure: Sensors 1 … k observing the real world (including teachers and students) produce non-text data, which is mined jointly with text data to derive multiple predictors (features). These feed a predictive model whose predicted values of real-world variables support decisions that change the world.]
3
Big Data creates both challenges
and opportunities for education
• Challenges for education: Education for Big Data
– Educate many data scientists & engineers quickly and affordably
• Opportunities for education: Big Data for Education
– Leverage Big Data technology to scale up and improve education
• Big Data and education are mutually beneficial → Integration!
– Education supplies workforce for developing innovative Big Data
technology and applications
– Big Data supplies technology for scaling up and improving quality of
education
4
Rest of the talk
1. Education for Big Data
2. Big Data for Education
3. Integration of Big Data and Education
5
Part 1: Education for Big Data
“… (in the next few years) we project a need for 1.5 million
additional analysts in the United States who can analyze data
effectively …”
-- McKinsey Big Data Study, 2012
The need is global …
6
Educating workforce for Big Data
• Question 1: What to teach in Big Data?
PhD, MS, BS in Data Science
• Question 2: How to teach Big Data effectively at large scale with
low cost?
Massive Open Online Courses (MOOCs)
7
What to teach? New degrees in Data Science?
Highly interdisciplinary!
[Figure: a data pipeline with four stages (Acquisition, Aggregation, Analysis, Application), surrounded by groups of related topics:]
• Sensor network, Internet of things, statistical sampling, …
• Databases, information retrieval, NLP, computer vision, …
• Data mining, machine learning, statistical modeling, scalable systems, …
• Cloud computing, artificial intelligence, operations research, human-computer interaction, …
• Application domains: health, medicine, finance, smart city, education, …
8
How to teach? Emergence of Massive Open Online
Courses (MOOCs)
• Many platforms: Coursera, edX, Udacity, Tsinghua University's MOOC platform, …
• Characteristics
– Free/affordable education at large scale on all kinds of topics
– Limited assessment support, but strong online community support
– Partnership with universities
• Early stage of “education revolution” enabled by IT & Big Data
(more later)
9
My experience with MOOCs
• Taught 2 MOOCs in 2015 = CS410 Text Info Systems at UIUC
– Text Retrieval and Search Engines
– Text Mining and Analytics
• Coordinated Data Mining Specialization: 5 courses + Capstone
– Pattern Discovery
– Cluster Analysis
– Text Retrieval
– Text Mining
– Visualization
– Capstone Project
10
Text Retrieval & Text Mining MOOCs
• Each lasted 4 weeks
– Modularized video lectures
– Weekly quizzes
– Programming assignment (open challenge with a leaderboard) with auto
grading
• Enrollment
– ~50,000 signed up
– > 10,000 seriously watched lecture videos
– 1,000~1,500 completed the course
– 700~900 did programming assignments
11
Students are from all over the world!
64,651 Learners
181 Countries
12
The majority of learners are 25~44 years old
13
US, India, and China have most of the learners
14
Most learners have a full-time job and a BS or MS degree
15
Challenges in teaching “big data” at large scale
• General challenges in MOOCs
– Variable student background
– Variable student needs
– Reliability of assessment
• Special challenges to “big data”
– Programming assignments are essential: variable student resources &
background
– Availability of interesting real-world data sets
– Automated grading of programming assignments
16
Self-Sustaining Data Set Annotations & Open Challenge
[Figure: Students annotate a raw data set through an annotation assignment checked by an auto grader. The collected annotations form a test collection, which is then used in an open challenge (competition assignment) scored on a leaderboard, e.g., #1 Team 1 0.81, #2 Team 2 0.75, …]
18
Example of a new data set (for online course retrieval)
High grades → More reliable annotations
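The annotation-and-challenge loop can be illustrated with a minimal sketch. It is not the course's actual pipeline: the grade-weighted vote, the precision-at-k metric, and all names and data below are assumptions chosen only to show how student annotations might become a test collection that scores a leaderboard.

```python
from collections import defaultdict

def aggregate_annotations(annotations, grades):
    """Combine binary relevance labels by a grade-weighted vote (assumed scheme).

    annotations: list of (student, query, doc, label) with label in {0, 1}
    grades: dict student -> grade in [0, 1], used as the vote weight
    Returns dict (query, doc) -> consensus label (ties resolve to 0).
    """
    votes = defaultdict(lambda: [0.0, 0.0])  # [weight for label 0, weight for label 1]
    for student, query, doc, label in annotations:
        votes[(query, doc)][label] += grades.get(student, 0.0)
    return {pair: int(w[1] > w[0]) for pair, w in votes.items()}

def precision_at_k(ranked_docs, query, judgments, k):
    """Fraction of the top-k documents judged relevant; unjudged counts as 0."""
    hits = sum(judgments.get((query, doc), 0) for doc in ranked_docs[:k])
    return hits / k

def leaderboard(submissions, judgments, k=2):
    """submissions: dict team -> dict query -> ranked list of doc ids."""
    scores = {}
    for team, runs in submissions.items():
        per_query = [precision_at_k(docs, q, judgments, k) for q, docs in runs.items()]
        scores[team] = sum(per_query) / len(per_query)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data: three students, one query, three documents.
annotations = [
    ("alice", "q1", "d1", 1), ("bob", "q1", "d1", 0),
    ("alice", "q1", "d2", 0), ("bob", "q1", "d2", 0),
    ("carol", "q1", "d3", 1), ("bob", "q1", "d3", 1),
]
grades = {"alice": 0.95, "bob": 0.60, "carol": 0.80}
judgments = aggregate_annotations(annotations, grades)

submissions = {
    "Team 1": {"q1": ["d1", "d3", "d2"]},
    "Team 2": {"q1": ["d2", "d1", "d3"]},
}
board = leaderboard(submissions, judgments)
```

Weighting votes by grade is one simple way to act on the observation that higher-graded students tend to give more reliable annotations; other aggregation schemes would fit the same loop.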
19
Search Engine Contest: Leaderboard
20
Overall lessons from the MOOCs
• Learners of MOOCs are a different crowd than the on-campus students
– Practical mindset, self-motivated, but less background and less time
– A pre-quiz is necessary for such technical courses (to set realistic expectations)
– Learners form self-supporting online communities
• Short modularized lecture videos are preferred
• Programming assignments are very much appreciated
• Crowdsourcing annotations and open competition worked well →
MOOC goes beyond education to support research!
• Limitations of current MOOCs
– Lack of “individual care” (students don’t all get the help they need)
– Sole reliance on peer grading for sophisticated assignments (unreliable grading &
ineffective feedback to students)
21
Current Trend:
Integration of MOOCs and Traditional Education
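[Figure: education options plotted by quality against scalability. Delivery modes shown: traditional classrooms, flipped/blended classroom, high-engagement component, and MOOC. Cost and credential tiers:]
• HIGH cost: Campus Degree
• LOW cost: Online Degree
• MINIMUM cost: Specialization Certificate
• MINIMUM cost: Course Certificate
• FREE: No Certificate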
22
A new online MOOC-based program: MCS-DS at UIUC
• MCS-DS = Master of Computer Science in Data Science
• Tuition = $20,000
• Courses = MOOCs + High Engagement Components
• Interdisciplinary
– Courses mostly offered by Computer Science Department
   • Data Mining Specialization
   • Cloud Computing Specialization
   • Machine Learning
– Other units include School of Information Science & Statistics Department
23
Part 2: Big Data for Education
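Towards Intelligent MOOC
[Figure: quality plotted against scalability, showing small classrooms, the MOOC, and the scalable intelligent MOOC. “Big Data Technology” moves the MOOC toward the scalable intelligent MOOC by:]
• Automating grading with machine learning
• Automating question answering on forums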
24
Traditional Manual Grading
[Figure: submitted assignments are graded by hand, yielding graded assignments (Grade: 93, 85, …).]
Proposed Automated Grading
[Figure: submitted assignments pass through clustering and a multi-dimensional grade predictor; grade verification and batch grading yield graded assignments, with detailed grading results and performance & behavior analysis supporting improvement.]
25
Preliminary results on grading medical case assignments
are promising [Geigle et al. 2016]
Chase Geigle, ChengXiang Zhai, Duncan Ferguson,
An Exploration of Automated Grading of Complex
Assignments, ACM Learning at Scale 2016.
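The clustering and batch-grading steps can be sketched as follows. This is only an illustration under stated assumptions, not the method of Geigle et al.: it swaps in word-set Jaccard similarity with greedy clustering, omits the multi-dimensional grade predictor and the verification step, and uses hypothetical submissions and function names.

```python
import re

def tokens(text):
    """Lowercased set of alphabetic words in a submission."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_submissions(submissions, threshold=0.5):
    """Greedy single-pass clustering: a submission joins the first cluster
    whose representative (its first member) is similar enough."""
    clusters = []  # each cluster is a list of submission ids
    for sid, text in submissions.items():
        for cluster in clusters:
            representative = submissions[cluster[0]]
            if jaccard(tokens(text), tokens(representative)) >= threshold:
                cluster.append(sid)
                break
        else:
            clusters.append([sid])
    return clusters

def batch_grade(clusters, grade_representative):
    """Obtain one grade per cluster representative and propagate it to the cluster."""
    grades = {}
    for cluster in clusters:
        grade = grade_representative(cluster[0])
        for sid in cluster:
            grades[sid] = grade
    return grades

# Hypothetical submissions; the grades are illustrative values only.
submissions = {
    "s1": "The patient shows signs of renal failure and dehydration",
    "s2": "Patient shows signs of dehydration and renal failure",
    "s3": "Likely a fractured femur requiring surgery",
}
clusters = cluster_submissions(submissions)
human_grades = {"s1": 93, "s3": 85}  # a grader scores one submission per cluster
grades = batch_grade(clusters, human_grades.get)
```

The appeal of this design is that grader effort scales with the number of clusters rather than the number of submissions; a verification step would then sample propagated grades to check them.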
26
Towards Intelligent MOOC: Limitations of Current MOOC
• Instruction materials limited to those pre-defined by an instructor
→ can’t take advantage of useful materials on the Web
• Limited search capability inside a course → can’t easily find the
most relevant video clip or discussion posts about a topic
• No understanding of students → can’t personalize the instruction
and learning experience
• Limited support for collaborative learning → can’t leverage massive
student behavior data to recommend materials for individual
students
• Limited support for interactions with students → can’t engage
students in a natural dialogue
27
Novel Features of an Intelligent MOOC
• Seamless integration of MOOC and Web search → enable students
to learn from the Web
• Concept/Topic search, navigation, and summarization → enable
students to quickly find all materials about a concept or topic
• Dynamic and adaptive student modeling → enable deep
understanding of student state of knowledge
• Lifetime learning from student behavior data → enable effective
support of collaborative learning
• Interactive personalized teaching → enable personalized natural
conversations between students and the system
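Concept/topic search over course materials could start from something as simple as the sketch below. It assumes a plain TF-IDF score over lecture-transcript segments and forum posts; the segment ids, texts, and function names are hypothetical, and a real system would likely use a stronger retrieval model.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercased alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(segments):
    """segments: dict segment_id -> text. Returns per-segment term counts
    and the number of segments containing each term."""
    term_counts = {sid: Counter(tokenize(text)) for sid, text in segments.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq

def search(query, term_counts, doc_freq):
    """Rank segments by the sum of term frequency times inverse document frequency."""
    n = len(term_counts)
    scores = {}
    for sid, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                score += counts[term] * math.log((n + 1) / doc_freq[term])
        if score > 0:
            scores[sid] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical course materials: two video segments and one forum post.
segments = {
    "week1-video2": "The vector space model represents each document as a vector",
    "week2-video1": "Probabilistic retrieval models rank documents by probability of relevance",
    "forum-post-17": "Can someone explain the vector space model and TF-IDF weighting",
}
term_counts, doc_freq = build_index(segments)
results = search("vector space model", term_counts, doc_freq)
```

Indexing video segments and forum posts in one collection is what lets a single query surface both the relevant clip and the related discussion.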
28
Current MOOC
[Figure: a traditional MOOC platform connecting students to MOOC course content, with student records and a MOOC activity log.]
29
An Intelligent MOOC
[Figure: architecture of an intelligent MOOC. Components: student modeler (maintaining a student model), concept recommender, interactive teaching interface, personalized search agent, topic/concept graph generator, concept navigator, and concept/topic search agent. Data sources: MOOC activity log, MOOC course content, and the open Web.]
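One well-known way a student modeler can track a student's state of knowledge from the activity log is Bayesian knowledge tracing. The slides do not name a specific model, so the sketch below is a swapped-in illustration; the parameter values and the answer sequence are assumptions.

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian knowledge tracing step for a single concept.

    p_known: prior probability that the student has mastered the concept
    correct: whether the student answered the latest question correctly
    p_slip:  probability of a wrong answer despite mastery (assumed value)
    p_guess: probability of a right answer without mastery (assumed value)
    p_learn: probability of acquiring mastery after this attempt (assumed value)
    """
    if correct:
        evidence_known = p_known * (1 - p_slip)
        evidence_total = evidence_known + (1 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_total = evidence_known + (1 - p_known) * (1 - p_guess)
    posterior = evidence_known / evidence_total  # Bayes rule on the observed answer
    return posterior + (1 - posterior) * p_learn  # chance of learning on this attempt

# Hypothetical quiz history for one student on one concept.
estimate = 0.3
history = []
for answer in [True, True, False, True]:
    estimate = bkt_update(estimate, answer)
    history.append(estimate)
```

An estimate like this, kept per concept, is the kind of signal a concept recommender or personalized search agent could use to decide what to show next.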
30
Part 3: Integration of Big Data and Education
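[Figure: a cycle linking Big Data and education. Big Data technology is applied to MOOC logs (education Big Data) to build an intelligent MOOC platform; the platform improves the scalability and quality of education; education in turn prepares the people who research and develop Big Data technology.]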
32
Toward a Cloud-based Big Data Virtual Lab
[Figure: a Big Data education system hosting application data sets (App Data 1 … App Data N), Big Data tools (Big Data Tool 1, Big Data Tool 2, …), and log data, with a leaderboard per data set (e.g., #1 Team 1 0.81, #2 Team 2 0.75; #1 Team 1 0.5, #2 Team 2 0.3).]
33
Unification of education, research, and applications!
1. Directly work on industry data sets and problems
→ Remove gap between education & applications
2. Encourage open exploration (research)
→ Remove gap between education & research
3. Well-archived interaction history
→ Reproducibility of research
4. Industry data sets not released to students & researchers
→ Privacy-preserving Big Data education & research
34
Final Thoughts: Education Revolution & Automation
• Big Data and IT enable an education revolution and automation toward
more affordable high-quality education
– IT enables one teacher to teach many more students than before (efficiency)
– Big Data technology would enable an “automated” TA/instructor (scalability)
– An intelligent MOOC would improve the quality of education at low cost
• Implications: Many traditional boundaries will likely disappear!
– No strict distinction between a teacher and a student (everyone learns from
each other)
– No strict distinction between grade levels or age groups (learn at your own
pace)
– No inherent boundaries between different courses (due to high modularization)
– No boundaries of subject areas (due to high modularization)
– No boundaries of institutions (MOOCs unify all institutions!)
35
Thank You!
Questions/Comments?
36