The Infinite Tree
Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning
Presented by Nobuyuki Shimizu
(Slide 1)

Slide 2: Overview
• Task: unsupervised POS tagging.
• It is neither about PCFGs nor about dependency parsing.
• It is an extension of the Infinite Hidden Markov Model (Beal et al. 2002; Teh et al. 2006).
• The biggest difference: it uses the dependency structures given by the Penn Treebank.

Slide 3: Overview
• Tree Model: finite, Bayesian
• Background: Dirichlet Process; Hierarchical Dirichlet Process
• Infinite Tree Model: sampling

Slide 4: Tree Model vs. Hidden Markov Model
• The tree model uses dependency tree structures, unlike the HMM.
[Figure: graphical models of the tree model and the HMM, side by side.]

Slide 5: Independent Children
• If each node has at most one child, this reduces to an HMM.
• Inference uses a recursive form of the Viterbi algorithm.

Slide 6: Finite Bayesian Tree Model
• k < C, where k is the hidden state index (and C the number of states).

Slide 7: Review of Dirichlet Process
• We now go back to the Dirichlet process.
• Finite mixtures vs. infinite mixtures; the stick-breaking process.
• The next few slides are shamelessly stolen from the Dirichlet Process tutorial by Sato.

Slide 8: Finite Mixture Models (1/2)
[Figure: a Gaussian mixture; each component has a mean and a covariance.]

Slide 9: Finite Mixture Models (2/2)
• Components θ1 … θ5. How do we determine K?

Slide 10: Infinite Mixture Models
• For each k = 1..∞, we need a component parameter θk.

Slide 11: Graphical Model
[Figure: graphical models of the finite mixture model (α, λ, θ; K components) and the infinite mixture model (α, λ, θ; infinitely many components θ1 … θk …), both generating observations x.]

Slide 12: More Formally
• This is a stick-breaking formulation.
• What is π?
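Since the following slides walk through the stick-breaking construction, a minimal Python sketch may help. This is my own illustration, not code from the talk; the function name and the truncation level are hypothetical choices.

```python
import random

def stick_breaking_weights(alpha, truncation=20, seed=0):
    """Truncated stick-breaking sample of mixture weights:
    pi'_k ~ Beta(1, alpha);  pi_k = pi'_k * prod_{l<k} (1 - pi'_l)."""
    rng = random.Random(seed)
    remaining = 1.0              # length of the stick not yet broken off
    weights = []
    for _ in range(truncation):
        frac = rng.betavariate(1.0, alpha)   # pi'_k: fraction to break off
        weights.append(remaining * frac)     # pi_k: the piece we keep
        remaining *= 1.0 - frac              # what is left of the stick
    return weights

pi = stick_breaking_weights(alpha=1.0)
```

Because the construction is truncated, the returned weights sum to slightly less than one; the leftover mass corresponds to the infinitely many components not yet generated.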
Slide 13: What is π?
• π is an infinite vector of mixture weights; this process is written π ~ GEM(α).
• Note: the weights are positive and sum to one.

Slide 14: Stick-Breaking Process
• Keep breaking a stick of length 1:
  π1 = π'1
  π2 = π'2 (1 − π'1)
  π3 = π'3 (1 − π'2)(1 − π'1)
• In general, πi = π'i ∏_{l<i} (1 − π'l), with each π'i ~ Beta(1, α).

Slide 15: Sampling from DP
• If there are K known topics, instead of generating the whole infinite vector, represent only the K known elements.
• The remaining probability mass stands for all as-yet-unseen topics.
• For the i-th position, first draw a topic: if it is an already-seen topic, find the co-indexed topic parameter; else, draw a new topic parameter from the base distribution and increase K.
• Then a word is drawn from that topic's distribution.

Slide 16: Hierarchical Dirichlet Process
• For each group j, have a broken stick πj.
• A shared top-level stick β allows sharing components between groups.
• Stick-breaking formulation: β ~ GEM(γ), πj ~ DP(α0, β).

Slide 17: Sampling for HDP
[Figure: the HDP sampling scheme.]

Slide 18: Sampling for HDP
• Basically, whenever we draw an unseen topic from πj, go back up to β and get a topic there.
• Otherwise it is very much like DP sampling.
• Since β is discrete and we are likely to draw the same component again, components are shared across the groups.
• Sharing would not happen if β were continuous.

Slide 19: Sharing Components
[Figure.]

Slide 20: Sharing Components
• The same will happen to πj+1: the parameters φ will likely be shared between groups (though not necessarily with the same probabilities).

Slide 21: Overview
• Tree Model: finite, Bayesian
• Background: Dirichlet Process; Hierarchical Dirichlet Process
• Infinite Tree Model: sampling

Slide 22: Infinite Tree

Slide 23: Why HDP?
• The number of states k is infinite, and the dimension of each πk is again infinite.
• Depending on the current node's state, its child's state has a different distribution.
• Each state label forms a group for its children.

Slide 24: Gibbs Sampling (Direct Assignment)
• Each state k has a local stick πk.
• mjk: the number of elements of the finite observed portion of πk that correspond to βk.
• njk: the number of observations with state k whose parent's state is j.
• The sampler alternates three stages: sample z, sample mjk, then sample β.

Slide 25: Sample z

Slide 26: Sample mjk

Slide 27: Sample β

Slide 28: Experiment

Slide 29: Experiment
• Start with the state labels given by the PTB.
• Only allow a refinement of the given POS tags.
• Parser performance increased from 85.11% to 87.35%.
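As a rough companion to the DP sampling slide above (slide 15), here is a minimal Chinese-restaurant-process sketch in Python. The names and defaults are my own, and this is only the topic-assignment part of DP sampling, not the direct-assignment Gibbs sampler over trees that the paper actually uses.

```python
import random

def crp_sample(n, alpha, seed=0):
    """Sample n topic assignments from a Chinese restaurant process:
    item i joins existing topic k with probability counts[k] / (i + alpha),
    or a brand-new topic with probability alpha / (i + alpha)."""
    rng = random.Random(seed)
    counts = []                 # counts[k] = items assigned to topic k so far
    z = []
    for i in range(n):
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1  # join an existing, co-indexed topic
                z.append(k)
                break
        else:
            counts.append(1)    # remaining mass: open a new topic
            z.append(len(counts) - 1)
    return z

z = crp_sample(50, alpha=1.0)
```

The rich-get-richer effect is visible in the counts: topics drawn early tend to accumulate most of the items, which is exactly why a discrete β leads to component sharing in the HDP.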