The Infinite Tree
Jenny Rose Finkel, Trond Grenager, and
Christopher D. Manning
Presented by
Nobuyuki Shimizu
Overview
Task: unsupervised POS tagging.
It is about neither PCFGs nor dependency parsing.
It is an extension of the Infinite Hidden Markov Model (Beal et al. 2002, Teh et al. 2006).
The biggest difference: it uses the dependency structures given by the Penn Treebank.
Overview
• Tree Model
– Finite, Bayesian
• Background
– Dirichlet Process
– Hierarchical Dirichlet Process
• Infinite Tree Model
– Sampling
Tree Model
[Figure: the tree model, which uses dependency tree structures, contrasted with the chain-structured Hidden Markov Model]
Independent Children
Each child's state is generated independently, conditioned on the parent's state.
If every node has at most one child, this reduces to an HMM.
Decoding uses a recursive, tree-structured form of the Viterbi algorithm (sketched below).
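To make the recursion concrete, here is a minimal sketch of tree-structured Viterbi for the independent-children model; this is not the paper's implementation, and the toy tree, transition matrix, emission matrix, and observations are all made up for illustration.

```python
import numpy as np

def viterbi_tree(node, children, trans, emit, obs):
    """Return scores[s]: the max log-probability of the subtree rooted at
    `node`, given that `node` is in hidden state s (independent children)."""
    scores = np.log(emit[:, obs[node]])          # emission score at this node
    for child in children[node]:
        child_scores = viterbi_tree(child, children, trans, emit, obs)
        # each child is maximized independently, given the parent's state
        scores = scores + np.max(np.log(trans) + child_scores[None, :], axis=1)
    return scores

# toy example: root 0 with children 1 and 2; two states, two word types
children = {0: [1, 2], 1: [], 2: []}
trans = np.array([[0.7, 0.3], [0.4, 0.6]])   # trans[parent_state, child_state]
emit = np.array([[0.9, 0.1], [0.2, 0.8]])    # emit[state, word]
obs = {0: 0, 1: 1, 2: 0}
print(viterbi_tree(0, children, trans, emit, obs))
```

With at most one child per node, the loop runs once per node and this collapses to the familiar chain Viterbi recursion.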
Finite Bayesian
For each hidden state index k (k < C, with C the number of states), the state's child-transition and emission distributions are drawn from Dirichlet priors.
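A minimal sketch of this finite Bayesian setup; the state count, vocabulary size, and hyperparameter values below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
C, V = 5, 100          # number of hidden states and vocabulary size (illustrative)
alpha, lam = 1.0, 0.1  # Dirichlet concentration hyperparameters (illustrative)

# For each hidden state k < C, draw that state's child-transition distribution
# and its emission distribution from symmetric Dirichlet priors.
trans = rng.dirichlet(np.full(C, alpha / C), size=C)  # trans[k]: distribution over child states
emit = rng.dirichlet(np.full(V, lam), size=C)         # emit[k]: distribution over words
```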
Review of Dirichlet Process
• Now we go back to the Dirichlet Process.
– Tutorial by Sato
• Finite mixture vs. infinite mixture
– Stick-Breaking Process
• The next few slides are shamelessly stolen from the Dirichlet Process tutorial by Sato.
Finite Mixture Models (1/2)
[Figure: a Gaussian mixture; each component has its own mean and covariance]
Finite Mixture Models (2/2)
[Figure: component parameters θ1 … θ5]
How do we determine K?
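Before moving to the infinite case, a runnable sketch of the finite Gaussian mixture just pictured (all sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 5, 200                          # number of components and data points (illustrative)
weights = rng.dirichlet(np.ones(K))    # mixing proportions
means = rng.normal(0.0, 5.0, size=K)   # component means (the theta_k)
stds = np.ones(K)                      # unit variances, for simplicity

z = rng.choice(K, size=N, p=weights)   # latent component assignments
x = rng.normal(means[z], stds[z])      # observed data
```

The question above is exactly what the choice `K = 5` hides: the Dirichlet Process removes the need to fix K in advance.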
Infinite Mixture Models
For each k = 1, …, ∞, we need a mixing weight πk and component parameters θk.
Graphical Model
[Figure: graphical models of the finite mixture model (hyperparameters α and λ, parameters θ1 … θk in a plate of size k, observations x) and the infinite mixture model (the same structure with the plate size taken to ∞)]
More Formally
• This is a stick-breaking formulation:
G = Σk πk δθk, where each atom θk is drawn from the base distribution G0.
• What is π?
What is π?
• Draw π'k ~ Beta(1, α) and set πk = π'k (1-π'1) … (1-π'k-1).
• This process is written π ~ GEM(α).
• Note: Σk πk = 1.
Stick-breaking Process
• Keep breaking a stick of length 1:
π1 = π'1
π2 = π'2 (1-π'1)
π3 = π'3 (1-π'1)(1-π'2)
and in general πk = π'k (1-π'1) … (1-π'k-1).
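A minimal sketch of this stick-breaking (GEM) construction, truncated at a finite K so that it can actually run:

```python
import numpy as np

def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking: pi_k = pi'_k * (1-pi'_1) ... (1-pi'_{k-1})."""
    pi_prime = rng.beta(1.0, alpha, size=K)   # pi'_k ~ Beta(1, alpha)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - pi_prime)[:-1]))
    return pi_prime * leftover

rng = np.random.default_rng(0)
pi = stick_breaking(alpha=1.0, K=20, rng=rng)
print(pi.sum())  # approaches 1 as the truncation level K grows
```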
Sampling from DP
• If there are K known topics, instead of generating the whole infinite stick, represent only the known elements π1, …, πK.
• The remaining probability mass is πu = 1 - (π1 + … + πK).
• Initially K = 0 and πu = 1.
• For the i-th position, first draw a topic index zi from (π1, …, πK, πu).
– If zi ≤ K, find the co-indexed known topic.
– Else (the draw fell in the remaining mass), create a new topic:
• Draw b ~ Beta(1, α).
• Let πK+1 = b πu and πu ← (1-b) πu.
• Draw the new topic's parameter θK+1 from the base distribution G0.
– Then a word is drawn from the distribution of topic zi.
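A sketch of the lazy representation this slide describes: keep only the weights of topics seen so far plus the leftover mass, and break off a new piece of the stick whenever an unseen topic is drawn. The variable names (and the stand-in for drawing from G0) are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0
pi = []       # weights of the K topics represented so far
pi_u = 1.0    # remaining (unbroken) probability mass
topics = []   # topic parameters theta_k (placeholders here)

for i in range(100):
    # draw a topic index: one of the known topics, or "new" with probability pi_u
    w = np.array(pi + [pi_u])
    z = rng.choice(len(w), p=w / w.sum())
    if z == len(pi):                 # unseen topic: break the remaining stick
        b = rng.beta(1.0, alpha)
        pi.append(b * pi_u)          # the new topic's weight
        pi_u *= 1.0 - b
        topics.append(f"theta_{len(pi)}")  # stand-in for drawing theta ~ G0
    # ... then emit a word from topic z's distribution
```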
Hierarchical Dirichlet Process
• For each group j, have a broken stick πj.
• Drawing every group's stick over a shared top-level stick β allows sharing components between groups.
• Stick-breaking formulation: β ~ GEM(γ) and, for each group j, πj ~ DP(α0, β).
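One common finite approximation of this construction (following the direct formulation of Teh et al. 2006, truncated at K atoms; the hyperparameter values are illustrative): draw the shared stick β, then draw each group's stick from a Dirichlet whose mean is β.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha0, K, J = 1.0, 1.0, 20, 3   # illustrative hyperparameters and sizes

# shared top-level stick: beta ~ GEM(gamma), truncated at K and renormalized
b = rng.beta(1.0, gamma, size=K)
beta = b * np.concatenate(([1.0], np.cumprod(1.0 - b)[:-1]))
beta /= beta.sum()

# each group's stick pi_j | beta ~ Dirichlet(alpha0 * beta):
# every group puts its mass on the same K shared components
pi = rng.dirichlet(alpha0 * beta, size=J)   # shape (J, K)
```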
Sampling for HDP
• Basically, whenever we draw an unseen topic from the group-level measure Gj, go back up to the shared measure G0 and get a topic there.
• Otherwise it's very much like DP sampling.
• Since G0 is discrete, we are likely to draw the same component again, so components are shared across the groups.
• Sharing would not happen if G0 were continuous.
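This "go back up to the top level" procedure is the Chinese restaurant franchise view of HDP sampling. A minimal sketch (not the paper's sampler; the data structures and hyperparameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha0, gamma = 1.0, 1.0
tables = {}       # tables[j]: list of [dish, customer_count] for group j
dish_tables = []  # dish_tables[k]: number of tables (across groups) serving dish k

def seat_customer(j):
    """Seat one customer in restaurant (group) j; return its global dish (topic)."""
    tj = tables.setdefault(j, [])
    # join an existing table w.p. prop. to its size, or open a new one w.p. prop. alpha0
    w = np.array([c for _, c in tj] + [alpha0], dtype=float)
    t = rng.choice(len(w), p=w / w.sum())
    if t < len(tj):
        tj[t][1] += 1
        return tj[t][0]
    # new table: go back up to the top level to pick its dish, reusing dish k
    # w.p. prop. dish_tables[k] or creating a brand-new dish w.p. prop. gamma
    top = np.array(dish_tables + [gamma], dtype=float)
    k = rng.choice(len(top), p=top / top.sum())
    if k == len(dish_tables):
        dish_tables.append(0)
    dish_tables[k] += 1
    tj.append([k, 1])
    return k

print([seat_customer(0) for _ in range(5)], [seat_customer(1) for _ in range(5)])
```

Because the top level is discrete and favors already-popular dishes, different groups keep landing on the same topics.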
Sharing Components
The same will happen for πj+1.
The parameters φ will likely be shared between groups.
(However, the groups are unlikely to assign them the same probabilities.)
Overview
• Tree Model
– Finite, Bayesian
• Background
– Dirichlet Process
– Hierarchical Dirichlet Process
• Infinite Tree Model
– Sampling
Infinite Tree
[Figure: the infinite tree model]
Why HDP?
• The number of states k is infinite, and the dimension of each πk is again infinite.
• Depending on the current node's state, its child's state will have a different distribution.
• Each state label forms a group for its children.
Gibbs Sampling (Direct Assignment)
• Each state k has a local stick πk.
• mjk
– The number of elements of the finite observed portion of πj that correspond to βk.
• njk
– The number of observations with state k whose parent's state is j.
• Alternates among three stages:
sample z, then sample mjk, then sample β.
Sample z
Resample each node's state from its conditional distribution, given the states of its parent and children and the node's observation.

Sample mjk
Resample the auxiliary counts mjk given the state assignments and β.

Sample β
Resample the shared stick β from its Dirichlet conditional given the counts mjk.
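Resampling z depends on the tree-structured conditionals from the paper and is not reproduced here, but the other two stages follow the standard HDP direct-assignment updates (Teh et al. 2006). A runnable sketch, with illustrative counts:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_m_jk(n_jk, alpha0, beta_k):
    """Auxiliary table count m_jk given n_jk observations: observation i
    starts a new table with probability a / (a + i), where a = alpha0 * beta_k."""
    a = alpha0 * beta_k
    return sum(rng.random() < a / (a + i) for i in range(n_jk))

def sample_beta(m, gamma):
    """Shared stick (beta_1..beta_K, beta_u) ~ Dirichlet(m_.1, ..., m_.K, gamma),
    where m_.k sums the table counts m[j, k] over all groups j."""
    return rng.dirichlet(np.append(m.sum(axis=0), gamma))

# toy usage: J = 2 groups, K = 3 states (counts are illustrative)
m = np.array([[3, 1, 2],
              [1, 4, 1]])
print(sample_beta(m, gamma=1.0))
print(sample_m_jk(n_jk=5, alpha0=1.0, beta_k=0.3))
```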
Experiment
• Start with the state labels given by the PTB.
• Only allow refinements of the given POS tags.
• Performance of the parser increased from 85.11% to 87.35%.