Advanced Methods and Analysis for
the Learning and Social Sciences
PSY505
Spring term, 2012
March 26, 2012
Today’s Class
• Sequential Pattern Mining
Related to
• Association Rule Mining
• MOTIF Extraction
Similarities
• MOTIF Extraction can be seen as a type of
sequential pattern mining
– Though MOTIFs can also be non-sequential, as in
the Shanabrook et al. paper
• Some SPM algorithms find simpler patterns
than MOTIF; other algorithms find more
complex patterns than MOTIF
Similarities
• Some algorithms for Sequential Pattern
Mining are similar to those for Association Rule Mining
Association Rule Mining
• Try to automatically find if-then rules within
the data set
Sequential Pattern Mining
• Try to automatically find temporal patterns
within the data set
ARM Example
• If person X buys diapers,
• Then person X buys beer
• Purchases occur at the same time
SPM Example
• If person X buys the novel Foundation now,
• Then person X buys the novel Second Foundation in a
later transaction
• Conclusion: recommend Second Foundation
to people who have previously purchased
Foundation
SPM Example
• Many customers rent Star Wars, then the
Empire Strikes Back, then Return of the Jedi
• Doesn’t matter if they rent other stuff in between
SPM Example
• Many customers buy flowers, and then buy
diapers AND diaper cream several months
later
SPM Example
• Many learners become confused, then game
the system, then become frustrated, then
complete gaming the system, then become re-engaged
Different Constraints than ARM
• If-then elements do not need to occur in the
same data point
• Instead
– If-then elements should have same user (or other
organizing variable)
– If-elements can be within a certain time window
of each other
– The then-element’s time should be within a certain
window after the if-element times
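The constraints above can be sketched in a few lines. This is a minimal illustration, not a method from the lecture: the function name `rule_fires`, the parameters `if_window` and `then_window`, and the example events are all hypothetical, and times are plain seconds. The "same user" constraint is handled by passing in one user's events at a time.

```python
from itertools import product

def rule_fires(events, if_items, then_item, if_window, then_window):
    """Check one user's events against an if-then rule with time windows.

    events: list of (time_in_seconds, item) for a SINGLE user.
    if_items: non-empty list of items that must all occur.
    Assumption: the then-window is measured from the latest if-element.
    """
    def times(item):
        return [t for t, it in events if it == item]

    then_times = times(then_item)
    # Try every way of picking one occurrence per if-item (brute force).
    for combo in product(*(times(item) for item in if_items)):
        first, last = min(combo), max(combo)
        if last - first > if_window:
            continue  # if-elements are too far apart
        if any(last < t <= last + then_window for t in then_times):
            return True
    return False

# Hypothetical events for one learner.
events = [(0, "CONFUSED"), (20, "GAMING"), (90, "FRUSTRATED")]
print(rule_fires(events, ["CONFUSED", "GAMING"], "FRUSTRATED", 60, 120))  # True
print(rule_fires(events, ["CONFUSED", "GAMING"], "FRUSTRATED", 60, 30))   # False
```

The brute-force search over occurrences is fine for short per-user logs; a real miner would prune far more aggressively.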
Sequential Pattern Mining
• Find all subsequences in data with high
support
• Support calculated as number of sequences
that contain subsequence, divided by total
number of sequences
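As a rough sketch of this definition, the snippet below computes support for the video-rental example. The rental histories are made up for illustration, and the helper names `contains` and `support` are assumptions, not part of any particular library. Each customer's sequence is a list of transactions, and each transaction is a set of items.

```python
def contains(sequence, pattern):
    """True if the pattern's elements occur in order within the sequence.

    Other transactions may occur in between. Each pattern element is a
    set that must be a subset of the transaction it matches.
    """
    pos = 0
    for element in pattern:
        # Greedily advance to the next transaction that holds this element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(sequences, pattern):
    """Number of sequences containing the pattern / total number of sequences."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

# Hypothetical rental histories, one list of transactions per customer.
rentals = [
    [{"Star Wars"}, {"Alien"}, {"Empire Strikes Back"}, {"Return of the Jedi"}],
    [{"Star Wars"}, {"Empire Strikes Back"}, {"Return of the Jedi"}],
    [{"Empire Strikes Back"}, {"Star Wars"}],
    [{"Star Wars"}, {"Return of the Jedi"}],
]
trilogy = [{"Star Wars"}, {"Empire Strikes Back"}, {"Return of the Jedi"}]
print(support(rentals, trilogy))  # 0.5
```

The first customer still counts even though another rental falls between the first two films; the third customer does not, because the order is wrong.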
Sequential Pattern Mining
• What are some subsequences with high
support? (What is their support?)
• Chuck: a, abc, ac, de, cef
• Darlene: af, ab, acd, dabc, ef
• Egoberto: aef, ab, aceh, d, ae
• Francine: a, bc, acf, d, abeg
Questions? Comments?
Algorithms for SPM
GSP (Generalized Sequential Pattern)
• Classic Algorithm
• (Srikant & Agrawal, 1996)
Data pre-processing
• Data transformed from individual actions to
sequences by user
• E.g.
• Bob: {GAMING and BORED, OFF-TASK and
BORED, ON-TASK and BORED, GAMING and
BORED, GAMING and FRUSTRATED, ON-TASK
and BORED}
Data pre-processing
• In some cases, time also included
• E.g.
• Bob: {GAMING and BORED 5:05:20, OFF-TASK
and BORED 5:05:40, ON-TASK and BORED
5:06:00, GAMING and BORED 5:06:20,
GAMING and FRUSTRATED 5:06:40, ON-TASK
and BORED 5:07:00}
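One possible way to do this transformation is sketched below. The log format (user, clock time, labels) and the helper names `to_seconds` and `build_sequences` are assumptions for illustration; the user "Ann" is hypothetical, and only three of Bob's actions are included to keep the example short.

```python
from collections import defaultdict

def to_seconds(clock):
    """Convert an 'H:MM:SS' clock string to seconds since midnight."""
    h, m, s = (int(part) for part in clock.split(":"))
    return h * 3600 + m * 60 + s

def build_sequences(log):
    """Group individual actions into one time-ordered sequence per user.

    log: iterable of (user, clock_string, labels) rows, in any order.
    Returns {user: [(seconds, frozenset_of_labels), ...]} sorted by time.
    """
    by_user = defaultdict(list)
    for user, clock, labels in log:
        by_user[user].append((to_seconds(clock), frozenset(labels)))
    return {user: sorted(rows, key=lambda row: row[0])
            for user, rows in by_user.items()}

# Rows deliberately out of order; the "ANDed" labels share one timestamp.
log = [
    ("Bob", "5:06:00", ["ON-TASK", "BORED"]),
    ("Bob", "5:05:20", ["GAMING", "BORED"]),
    ("Ann", "5:05:30", ["ON-TASK", "CONFUSED"]),
    ("Bob", "5:05:40", ["OFF-TASK", "BORED"]),
]
sequences = build_sequences(log)
for seconds, labels in sequences["Bob"]:
    print(seconds, sorted(labels))
```

Dropping the timestamps afterwards gives the plain ordered form; keeping them allows time-window constraints later.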
Algorithm
• Take the whole set of sequences of length 1
– May include “ANDed” combinations at same time
• Find which sequences of length 1 have support over
pre-chosen threshold
• Compose potential sequences out of pairs of
sequences of length 1 with acceptable support
• Find which sequences of length 2 have support over
pre-chosen threshold
• Compose potential sequences out of triplets of
sequences of length 1 and 2 with acceptable support
• Continue until no new sequences found
Let’s execute the GSP algorithm
• With min support = 50%
• Chuck: a, abc, ac, de, cef
• Darlene: af, ab, acd, dabc, ef
• Egoberto: aef, ab, aceh, d, ae
• Francine: a, bc, acf, d, abeg
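The level-wise procedure can be sketched on the four sequences above. This is a simplified illustration rather than the full GSP algorithm of Srikant & Agrawal: it assumes every pattern element is a single item (no "ANDed" combinations), ignores time constraints, treats "over threshold" as greater than or equal, and builds candidates with an Apriori-style join. The function names are hypothetical.

```python
def contains(sequence, pattern):
    """True if the pattern's items occur in order, one per transaction."""
    pos = 0
    for item in pattern:
        while pos < len(sequence) and item not in sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def frequent_patterns(sequences, min_support):
    """Return {pattern: support} for all patterns meeting min_support."""
    n = len(sequences)

    def keep_frequent(candidates):
        kept = {}
        for pattern in candidates:
            s = sum(contains(seq, pattern) for seq in sequences) / n
            if s >= min_support:
                kept[pattern] = s
        return kept

    items = sorted({i for seq in sequences for trans in seq for i in trans})
    level = keep_frequent([(i,) for i in items])  # length-1 patterns
    result = {}
    while level:  # stop when no new frequent patterns are found
        result.update(level)
        # Join p and q when p without its first item equals q without its last.
        candidates = {p + q[-1:] for p in level for q in level
                      if p[1:] == q[:-1]}
        level = keep_frequent(candidates)
    return result

data = {
    "Chuck": ["a", "abc", "ac", "de", "cef"],
    "Darlene": ["af", "ab", "acd", "dabc", "ef"],
    "Egoberto": ["aef", "ab", "aceh", "d", "ae"],
    "Francine": ["a", "bc", "acf", "d", "abeg"],
}
sequences = [[set(trans) for trans in seq] for seq in data.values()]
result = frequent_patterns(sequences, min_support=0.5)
print(result[("a", "b", "c")])  # 1.0
print(("h",) in result)         # False
```

Here a, then b, then c appears in all four sequences, while h appears only in Egoberto's sequence and is pruned at the first level.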
Other algorithms
• FreeSpan
• PrefixSpan
• Select sub-sets of data to search within
• Faster, but same basic idea as in GSP
Uses in educational domains
Perera et al. (2009)
• What were the three ways that Perera et al.
(2009) used sequential pattern mining?
• What did they learn, and how did they use the
information?
Perera et al. (2009)
1. Overall uses of collaborative tools by groups
2. Sequences of collaborative tool use by different
group members
3. Sequences of access of specific resources by
different group members
• In all cases, they found common patterns and
then looked at how support differed for
successful and unsuccessful groups
Perera et al. (2009):
Important Findings
1. Overall uses of collaborative tools by groups
– Successful groups used ticketing system more
than the wiki; weaker groups used wiki more
– Patterns were particularly strong for group
leaders
Perera et al. (2009):
Important Findings
2. Sequences of collaborative tool use by
different group members
– Successful groups characterized by leader
opening ticket and other student working on
ticket
– Successful groups characterized by students
other than leader opening ticket, and other
students working on ticket
Perera et al. (2009):
Important Findings
3. Sequences of access of specific resources by
different group members
– The best groups had interactions around the
same resource by multiple students
– The poor groups did no work on tickets before
closing them
Zhang et al. (2005)
Romero et al. (2008)
• Analyze students’ paths through learning
resources in order to find and suggest
resources for students
Robinet et al. (2007)
• Mine sequences of student actions in a system
where students are allowed to skip steps
• In order to infer intermediate/implicit steps
during algebraic manipulation
• In other words, if some students have A->B->C
• Infer that A->C has B in the middle
• Aids with choosing remedial feedback
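A toy sketch of this idea follows. It is not the method used by Robinet et al.; it simply counts which state most often sits between two given states in other students' paths. The function name `infer_middle` and the example paths are hypothetical.

```python
from collections import Counter

def infer_middle(paths, start, end):
    """Guess the skipped step between start and end.

    Looks at every consecutive triple x -> y -> z in the observed paths
    and returns the most frequent y with x == start and z == end,
    or None if no student ever showed an intermediate step.
    """
    middles = Counter()
    for path in paths:
        for x, y, z in zip(path, path[1:], path[2:]):
            if x == start and z == end:
                middles[y] += 1
    return middles.most_common(1)[0][0] if middles else None

# Hypothetical student paths; the last student skips a step.
paths = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "D", "C"],
    ["A", "C"],
]
print(infer_middle(paths, "A", "C"))  # B
```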
What else?
• What else could sequential pattern mining be
used for in education?
Asgn. 8
• Solutions
• Let’s look at solutions from
– Sweet
– Mike W.
Asgn. 9
• Questions?
• Comments?
Next Class
• Wednesday, March 28
• 3pm-5pm
• AK232
• Learning Curves
• Readings
• Martin, B., Mitrovic, A., Koedinger, K.R., Mathan, S.
(2011) Evaluating and improving adaptive educational
systems with learning curves. User Modeling and User-Adapted Interaction, 21 (3), 249-283.
• Assignments Due: None
The End