Slide 1: The Infinite Tree
Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning
Presented by Nobuyuki Shimizu

Slide 2: Overview
• Task: unsupervised POS tagging.
• It's neither about PCFGs nor about dependency parsing.
• It's an extension of the Infinite Hidden Markov Model (Beal et al. 2002, Teh et al. 2006).
• The biggest difference: it uses the dependency structures given by the Penn Treebank.

Slide 3: Overview
• Tree Model
  – Finite, Bayesian
• Background
  – Dirichlet Process
  – Hierarchical Dirichlet Process
• Infinite Tree Model
  – Sampling

Slide 4: Tree Model vs. Hidden Markov Model
[Figure: side-by-side graphical models of the tree model and the HMM.]
• The tree model uses dependency tree structures, unlike the HMM.

Slide 5: Independent Children
• Children are conditionally independent given the parent's state.
• If each node has at most one child, this is an HMM.
• Decoding uses a recursive form of the Viterbi algorithm (a code sketch follows the transcript).

Slide 6: Finite, Bayesian
[Equations for the finite Bayesian tree model; k < C, where k is the hidden state index and C is the number of hidden states.]

Slide 7: Review of Dirichlet Process
• Now we go back to the Dirichlet Process.
  – Tutorial by Sato
• Finite mixtures vs. infinite mixtures
  – Stick-Breaking Process
• The next few slides are shamelessly stolen from the Dirichlet Process tutorial by Sato.

Slide 8: Finite Mixture Models (1/2)
[Figure: a Gaussian mixture; each component has a mean and a covariance.]

Slide 9: Finite Mixture Models (2/2)
[Figure: component parameters θ1 … θ5.]
• How do we determine K?

Slide 10: Infinite Mixture Models
• For each k = 1..∞, we need a component parameter θk and a mixing weight πk.

Slide 11: Graphical Model
[Figure: plate diagrams of the finite mixture model (α, λ, θk for k components, observations x) and the infinite mixture model, where k → ∞ and the parameters are θ1 … θk.]

Slide 12: More Formally
• This is a Stick-Breaking Formulation.
• What is π?

Slide 13: What is π?
• This process is the stick-breaking construction, written π ~ GEM(α).
• Note: Σk πk = 1 with probability 1.

Slide 14: Stick-Breaking Process
• Keep breaking a stick of length 1:
  π1 = π'1
  π2 = π'2 (1 − π'1)
  π3 = π'3 (1 − π'2)(1 − π'1)
  and in general πk = π'k ∏_{l<k} (1 − π'l), with each π'k ~ Beta(1, α).
  (A code sketch follows the transcript.)

Slide 15: Sampling from DP
• If there are K known topics, instead of generating the whole infinite vector π, represent only the known elements π1, …, πK.
• The remaining probability mass is πu = 1 − Σ_{k≤K} πk.
• Initially K = 0 and πu = 1.
• For the i-th position, first draw a topic index:
  – If it falls in the mass of a known topic k, find the co-indexed topic parameter φk.
  – Else it falls in πu: draw b ~ Beta(1, α), let πK+1 = b·πu and πu ← (1 − b)·πu, and draw a new parameter φK+1 from the base distribution G0.
• Then a word is drawn from F(φk) or F(φK+1).
  (A code sketch follows the transcript.)

Slide 16: Hierarchical Dirichlet Process
• For each group j, have a broken stick πj.
• A shared top-level stick β allows sharing components between groups.
• Stick-Breaking Formulation: β ~ GEM(γ), πj ~ DP(α0, β).
  (A code sketch follows the transcript.)

Slide 17: Sampling for HDP
[Figure.]

Slide 18: Sampling for HDP
• Basically, whenever we draw an unseen topic from the group-level DP, go back up to the top level and get a topic there.
• Otherwise it's very much like DP sampling.
• Since the top-level draw is discrete and we are likely to draw the same component again, components are shared across the groups.
• Sharing would not happen if the top-level draw were continuous.

Slide 19: Sharing Components
[Figure.]

Slide 20: Sharing Components
• The same will happen to πj+1.
• Parameters φ will likely be shared between groups.
• (However, it's not likely that they have the same probability in each group.)

Slide 21: Overview
• Tree Model
  – Finite, Bayesian
• Background
  – Dirichlet Process
  – Hierarchical Dirichlet Process
• Infinite Tree Model
  – Sampling

Slide 22: Infinite Tree
[Figure.]

Slide 23: Why HDP?
• The number of states k is infinite, and the dimension of each πk is again infinite.
• Depending on the current node's state, its child's state has a different distribution.
• Each state label forms a group for its children.

Slide 24: Gibbs Sampling (Direct Assignment)
• Each state k has a local stick πk.
• mjk: the number of elements of the finite observed portion of πj that correspond to βk.
• njk: the number of observations with state k whose parent's state is j.
• Alternates three stages: sample z, then sample mjk, then sample β.

Slide 25: Sample z
[Equations.]

Slide 26: Sample mjk
[Equations.]

Slide 27: Sample β
[Equations.]

Slide 28: Experiment
[Figure.]

Slide 29: Experiment
• Start with the state labels given by the PTB.
• Only allow a refinement of the given POS tags.
• Performance of the parser increased from 85.11% to 87.35%.
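Code sketches

Slide 5 notes that, because children are conditionally independent given the parent's state, Viterbi decoding recurses over the tree and reduces to the familiar chain recursion when every node has at most one child. Below is a minimal sketch of that recursion; all names (tree_viterbi, the toy words/trans/emit tables) are hypothetical illustrations, not the paper's code, and the probabilities are fixed point estimates rather than the paper's Bayesian parameters.

```python
import math

def tree_viterbi(node, state, words, children, trans, emit, memo):
    # Max log-probability of the subtree rooted at `node`, given that
    # `node` carries hidden state `state`.  Children are conditionally
    # independent given the parent's state, so each child is maximized
    # separately; with at most one child per node this is exactly the
    # chain Viterbi recursion (slide 5).
    key = (node, state)
    if key not in memo:
        logp = math.log(emit[state][words[node]])
        for child in children[node]:
            logp += max(
                math.log(trans[state][s]) +
                tree_viterbi(child, s, words, children, trans, emit, memo)
                for s in trans[state])
        memo[key] = logp
    return memo[key]

# Toy dependency tree: word 0 ("runs") heads words 1 ("dog") and 2 ("fast").
words = {0: "runs", 1: "dog", 2: "fast"}
children = {0: [1, 2], 1: [], 2: []}
states = ["N", "V", "ADV"]
trans = {s: {t: 1.0 / len(states) for t in states} for s in states}
emit = {"N":   {"dog": 0.8, "runs": 0.1, "fast": 0.1},
        "V":   {"dog": 0.1, "runs": 0.8, "fast": 0.1},
        "ADV": {"dog": 0.1, "runs": 0.1, "fast": 0.8}}
score, root_state = max(
    (tree_viterbi(0, s, words, children, trans, emit, {}), s) for s in states)
```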
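Slide 14's stick-breaking picture translates directly into a few lines of code. A minimal sketch, assuming a simple truncation at a fixed number of breaks; the function name and truncation scheme are illustrative.

```python
import random

def stick_breaking(alpha, truncation):
    # First `truncation` weights of pi ~ GEM(alpha), slide 14's recursion:
    #   pi'_k ~ Beta(1, alpha),  pi_k = pi'_k * prod_{l<k} (1 - pi'_l).
    remaining = 1.0                             # unbroken part of the stick
    weights = []
    for _ in range(truncation):
        frac = random.betavariate(1.0, alpha)   # pi'_k
        weights.append(remaining * frac)        # pi_k
        remaining *= 1.0 - frac                 # what is left to break
    return weights  # sums to 1 - remaining; the tail is never materialized

pi = stick_breaking(alpha=1.0, truncation=20)
```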
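Slide 15's scheme keeps only the K topics instantiated so far plus the leftover mass πu, and breaks πu whenever a draw lands in it. A minimal sketch of that lazy instantiation, assuming topic parameters are plain numbers drawn from a uniform base distribution; the state-dictionary layout is an assumption of this sketch, not the slide's notation.

```python
import random

def draw_from_dp(state, alpha, base_draw):
    # One draw from G ~ DP(alpha, G0), instantiating atoms only as needed
    # (slide 15).  `state` keeps the known weights pi_1..pi_K, their
    # parameters phi_1..phi_K, and the leftover mass pi_u; `base_draw`
    # samples a fresh parameter from the base distribution G0.
    u = random.random()
    acc = 0.0
    for pi_k, phi_k in zip(state["pi"], state["phi"]):
        acc += pi_k
        if u < acc:
            return phi_k                        # reuse a known topic
    # We landed in pi_u: break off a new piece for topic K+1.
    b = random.betavariate(1.0, alpha)
    state["pi"].append(state["pi_u"] * b)       # pi_{K+1} = b * pi_u
    state["pi_u"] *= 1.0 - b                    # shrink the leftover mass
    state["phi"].append(base_draw())            # phi_{K+1} ~ G0
    return state["phi"][-1]

state = {"pi": [], "phi": [], "pi_u": 1.0}
topics = [draw_from_dp(state, alpha=1.0, base_draw=random.random)
          for _ in range(100)]
```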
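Slides 16 and 18–20 say that each group j gets its own stick πj drawn around the shared global stick β, which is what lets the parameters φk be shared across groups: every πj puts its mass on the same component indices. A sketch of the truncated group-level stick from the HDP stick-breaking construction of Teh et al. (2006), reusing stick_breaking from the GEM sketch above; the function name and truncation are again illustrative.

```python
import random

def group_stick(beta, alpha0):
    # Truncated draw of pi_j ~ DP(alpha0, beta) for one group j, via the
    # HDP stick-breaking construction (Teh et al. 2006):
    #   pi'_jk ~ Beta(alpha0 * beta_k, alpha0 * (1 - sum_{l<=k} beta_l))
    #   pi_jk  = pi'_jk * prod_{l<k} (1 - pi'_jl)
    # Every group's pi_j reuses the SAME component indices k, i.e. the
    # same parameters phi_k -- the sharing shown on slides 18-20.
    pi, remaining, tail = [], 1.0, 1.0
    for beta_k in beta:
        tail -= beta_k                          # global mass beyond index k
        frac = random.betavariate(alpha0 * beta_k,
                                  alpha0 * max(tail, 1e-12))
        pi.append(remaining * frac)
        remaining *= 1.0 - frac
    return pi

beta = stick_breaking(alpha=1.0, truncation=20)  # global stick, sketch above
pi_groups = [group_stick(beta, alpha0=1.0) for _ in range(3)]
```

In the infinite tree model, each parent state plays the role of a group j (slide 23), so a child's state distribution depends on the parent's state while all parents share one inventory of child states.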