SEQUENTIAL PATTERNS & THE GSP ALGORITHM
BY: JOE CASABONA
INTRO
• What are Sequential Patterns?
• Why don't Association Rules (ARs) suffice?
• The Generalized Sequential Pattern (GSP) Algorithm
o Finding Frequent Sets
o Candidate Generation
o Rule Generation
WHAT ARE SEQUENTIAL PATTERNS?
"Finding statistically relevant patterns between data examples
where the values are delivered in a sequence." [3]
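Sequential patterns are very similar to Association Rules, but here the order of the data matters.
Association rules only look at which items occur together, not at the order in which they occur, so they cannot capture patterns where order is important.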
SEQUENTIAL PATTERN EXAMPLES
In Transaction Processing:
Do customers usually buy a new controller or a game first
after buying an Xbox?
In Text Mining:
The order of words is important for finding linguistic or
language patterns. [1]
OBJECTIVE
Given a set S of input data sequences, find all sequences that
meet a user-specified minimum support, where the support of a
sequence is the fraction of data sequences in S that contain it.
Each such sequence is called a 'frequent sequence' or sequential pattern. [1]
We will use the Generalized Sequential Pattern (GSP) algorithm.
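To make the support definition concrete, here is a minimal Python sketch. It assumes a data sequence is a tuple of elements (transactions), each element a tuple of items; the helper names `contains`, `support` and `itemset_support` are hypothetical, and the purchase histories are made-up toy data mirroring the Xbox question above. The order-blind `itemset_support` is included only to contrast with association-rule style counting.

```python
from typing import Sequence, Tuple

Element = Tuple[str, ...]      # items bought together in one transaction
DataSeq = Tuple[Element, ...]  # one customer's transactions, in time order


def contains(data_seq: DataSeq, pattern: DataSeq) -> bool:
    """True if each element of `pattern` is a subset of a distinct element
    of `data_seq`, with the elements matched in the same order."""
    pos = 0
    for element in pattern:
        # Greedily match the earliest remaining data element covering `element`.
        while pos < len(data_seq) and not set(element) <= set(data_seq[pos]):
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # the next pattern element must match a strictly later element
    return True


def support(db: Sequence[DataSeq], pattern: DataSeq) -> float:
    """Fraction of data sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db) / len(db)


def itemset_support(db: Sequence[DataSeq], items: Sequence[str]) -> float:
    """Hypothetical order-blind (association-rule style) support, for comparison."""
    return sum(set(items) <= {i for e in s for i in e} for s in db) / len(db)


# Hypothetical toy data: one data sequence per customer.
db = [
    (("xbox",), ("controller",), ("game",)),
    (("xbox",), ("game",)),
    (("xbox",), ("game",), ("controller",)),
    (("controller",), ("xbox",)),
]

print(support(db, (("xbox",), ("game",))))          # 0.75
print(support(db, (("xbox",), ("controller",))))    # 0.5
print(itemset_support(db, ["xbox", "controller"]))  # 0.75
```

On this toy data the itemset {xbox, controller} appears for 3 of 4 customers, but only 2 of 4 bought the controller after the Xbox; the fourth customer bought it before.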
GSP
GSP is similar to the Apriori algorithm:
• Find the individual items that meet minSupport (frequent 1-sequences)
• Use them to find frequent 2-sequences
• Continue using frequent k-sequences to find frequent (k+1)-sequences
• Stop when no more frequent sequences are found.
The difference is in candidate generation.
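The level-wise loop can be sketched in Python as follows. This is an illustrative sketch, not GSP itself: to stay short it uses a hypothetical `extend` generator that appends each frequent item to each frequent (k-1)-sequence instead of GSP's join, followed by the Apriori-style prune. Sequences are assumed to be tuples of elements with items sorted inside each element, and `min_count` is an absolute count rather than a fraction.

```python
from typing import Set, Tuple

Seq = Tuple[Tuple[int, ...], ...]  # elements are tuples of sorted items


def contains(data_seq: Seq, pattern: Seq) -> bool:
    """Ordered subsequence test: each pattern element inside a later data element."""
    pos = 0
    for element in pattern:
        while pos < len(data_seq) and not set(element) <= set(data_seq[pos]):
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1
    return True


def frequent(db, candidates, min_count):
    """Keep candidates contained in at least `min_count` data sequences."""
    return {c for c in candidates if sum(contains(s, c) for s in db) >= min_count}


def one_item_deletions(seq: Seq) -> Set[Seq]:
    """All (k-1)-subsequences made by deleting exactly one item."""
    out = set()
    for i, element in enumerate(seq):
        for item in element:
            rest = tuple(x for x in element if x != item)
            out.add(seq[:i] + ((rest,) if rest else ()) + seq[i + 1:])
    return out


def extend(freq_prev, freq_items):
    """Hypothetical simplified generator (NOT GSP's join): append each frequent
    item as a new element, or into the last element if items stay sorted."""
    cands = set()
    for s in freq_prev:
        for item in freq_items:
            cands.add(s + ((item,),))
            if item > s[-1][-1]:
                cands.add(s[:-1] + (s[-1] + (item,),))
    return cands


def levelwise(db, min_count):
    items = sorted({i for s in db for e in s for i in e})
    freq = frequent(db, {((i,),) for i in items}, min_count)  # frequent 1-sequences
    freq_items = sorted(s[0][0] for s in freq)
    result = set(freq)
    while freq:  # stop when a level yields no frequent sequences
        cands = extend(freq, freq_items)
        # Apriori-style prune: every (k-1)-subsequence must be frequent
        cands = {c for c in cands if one_item_deletions(c) <= freq}
        freq = frequent(db, cands, min_count)
        result |= freq
    return result


# Hypothetical toy database of data sequences.
db = [
    ((1,), (2,), (3,)),
    ((1,), (2, 3)),
    ((1, 2), (3,)),
    ((1,), (2,), (3,)),
]
result = levelwise(db, min_count=2)
for pattern in sorted(result, key=lambda p: (sum(len(e) for e in p), p)):
    print(pattern)
```

With `min_count=2` the loop stops after level 3 and finds seven frequent sequences, the longest being <{1} {2} {3}>.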
GSP: CANDIDATE GENERATION
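Input: the set of frequent (k-1)-sequences, F[k-1]
Output: the candidate set of k-sequences, C[k]
How it works:
• Join F[k-1] with F[k-1]: s1 and s2 are joined when dropping the first item of s1 gives the same sequence as dropping the last item of s2. The candidate is s1 extended with the last item of s2, as a new element if that item was its own element in s2, otherwise added to the last element of s1.
• Prune: remove any candidate that has an infrequent (k-1)-subsequence
• Note: the order of elements matters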
CANDIDATE EXAMPLE
F[3] = <{1, 2} {4}>, <{1, 2} {5}>, <{1} {4, 5}>, <{1, 4} {6}>,
<{2} {4, 5}>, <{2} {4} {6}>
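After Join: <{1, 2} {4, 5}>, <{1, 2} {4} {6}>
(<{1, 2} {4}> joins with <{2} {4, 5}> and with <{2} {4} {6}>)
After Prune: <{1, 2} {4, 5}>
(<{1, 2} {4} {6}> is pruned because its subsequence <{1} {4} {6}> is not in F[3])
C[4] = <{1, 2} {4, 5}>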
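The join and prune steps can be checked against this F[3] example with a short Python sketch. It assumes the GSP join condition (s1 joins s2 when dropping the first item of s1 and the last item of s2 give the same sequence) and represents sequences as tuples of sorted-item tuples; the function names are hypothetical, and the special case of joining 1-sequences (k = 2) is not handled.

```python
from typing import Set, Tuple

Seq = Tuple[Tuple[int, ...], ...]  # elements with items sorted inside each element


def drop_first(seq: Seq) -> Seq:
    """Remove the first item of the first element (and the element if it empties)."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]


def drop_last(seq: Seq) -> Seq:
    """Remove the last item of the last element (and the element if it empties)."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())


def join(f_prev: Set[Seq]) -> Set[Seq]:
    """Join F[k-1] with itself. Assumption: k-1 >= 2."""
    cands = set()
    for s1 in f_prev:
        for s2 in f_prev:
            if drop_first(s1) != drop_last(s2):
                continue
            last = s2[-1][-1]
            if len(s2[-1]) == 1:
                # last item of s2 was its own element: append a new element
                cands.add(s1 + ((last,),))
            else:
                # last item shared an element: add it to s1's last element
                cands.add(s1[:-1] + (s1[-1] + (last,),))
    return cands


def one_item_deletions(seq: Seq) -> Set[Seq]:
    """All (k-1)-subsequences made by deleting exactly one item."""
    out = set()
    for i, element in enumerate(seq):
        for item in element:
            rest = tuple(x for x in element if x != item)
            out.add(seq[:i] + ((rest,) if rest else ()) + seq[i + 1:])
    return out


def prune(cands: Set[Seq], f_prev: Set[Seq]) -> Set[Seq]:
    """Drop candidates that have an infrequent (k-1)-subsequence."""
    return {c for c in cands if one_item_deletions(c) <= f_prev}


F3 = {
    ((1, 2), (4,)), ((1, 2), (5,)), ((1,), (4, 5)),
    ((1, 4), (6,)), ((2,), (4, 5)), ((2,), (4,), (6,)),
}
joined = join(F3)
print(sorted(joined))             # [((1, 2), (4,), (6,)), ((1, 2), (4, 5))]
print(sorted(prune(joined, F3)))  # [((1, 2), (4, 5))]
```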
RULE GENERATION
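Generating rules is not the objective of sequential pattern mining, but rules can be derived from the frequent sequences.
Sequential Rules: apply a minimum confidence to rules built from frequent sequences
Label Sequential Rules: replace some items in X with the wildcard *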
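Sequential rule confidence can be sketched in Python as well. This sketch assumes a common definition: for a rule X -> Y, X is a proper subsequence of Y, and the confidence is the fraction of data sequences containing X that also contain Y. The helper `rules_from` is hypothetical and only considers an X made by deleting one item of Y; the data is made-up toy data, and label sequential rules (with *) are not covered.

```python
from typing import List, Sequence, Set, Tuple

Seq = Tuple[Tuple[str, ...], ...]


def contains(data_seq: Seq, pattern: Seq) -> bool:
    """Ordered subsequence test: each pattern element inside a later data element."""
    pos = 0
    for element in pattern:
        while pos < len(data_seq) and not set(element) <= set(data_seq[pos]):
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1
    return True


def one_item_deletions(seq: Seq) -> Set[Seq]:
    """All subsequences made by deleting exactly one item."""
    out = set()
    for i, element in enumerate(seq):
        for item in element:
            rest = tuple(x for x in element if x != item)
            out.add(seq[:i] + ((rest,) if rest else ()) + seq[i + 1:])
    return out


def rule_confidence(db: Sequence[Seq], x: Seq, y: Seq) -> float:
    """Assumed definition: share of data sequences containing x that also contain y."""
    with_x = [s for s in db if contains(s, x)]
    return sum(contains(s, y) for s in with_x) / len(with_x) if with_x else 0.0


def rules_from(db: Sequence[Seq], y: Seq, min_conf: float) -> List[tuple]:
    """Hypothetical helper: rules x -> y where x drops one item of y."""
    rules = []
    for x in sorted(one_item_deletions(y)):
        conf = rule_confidence(db, x, y)
        if conf >= min_conf:
            rules.append((x, y, conf))
    return rules


# Hypothetical toy data: one data sequence per customer.
db = [
    (("xbox",), ("controller",), ("game",)),
    (("xbox",), ("game",)),
    (("xbox",), ("game",), ("controller",)),
    (("controller",), ("xbox",)),
]
print(rule_confidence(db, (("xbox",),), (("xbox",), ("game",))))  # 0.75
print(rules_from(db, (("xbox",), ("game",)), min_conf=0.8))
```

Here the rule <{xbox}> -> <{xbox} {game}> has confidence 0.75, while <{game}> -> <{xbox} {game}> has confidence 1.0, so only the latter passes a 0.8 threshold.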
REFERENCES
[1] Liu, Bing. Web Data Mining, Chapter 2: Association Rules and Sequential Patterns. Springer, December 2006.
[2] "GSP Algorithm." Wikipedia. http://en.wikipedia.org/wiki/GSP_Algorithm (June 3, 2008)
[3] "Sequence Mining." Wikipedia. http://en.wikipedia.org/wiki/Sequence_mining (Oct. 30, 2008)