Some Probability Theory and Computational Models
A short overview

Basic Probability Theory
• We will only use discrete probability spaces over boolean events
• A probability distribution maps each event to a value in [0,1]
  – $P(A)$ is the probability that A is true
  – The fraction of "worlds" in which A holds
• "Possible worlds" interpretation

Axioms
$0 \le P(A) \le 1$
$P(\text{True}) = 1$
$P(\text{False}) = 0$
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
If A and B are disjoint then $P(A \text{ or } B) = P(A) + P(B)$
$P(\text{not } A) = 1 - P(A)$

Conditional Probability and Independence
• $P(A \mid B)$ is the fraction of the worlds in which B is true that also have A true:
  $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
• Chain rule: $P(A \text{ and } B) = P(B) \cdot P(A \mid B)$
• If $P(A \mid B) = P(A)$ then A and B are independent
  – Implies that also $P(B \mid A) = P(B)$
  – And that $P(A \text{ and } B) = P(A) \cdot P(B)$
• Conditional independence: $P(A \mid B, C) = P(A \mid C)$

Bayes Rule
$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$
$P(A \mid B) + P(\text{not } A \mid B) = 1$
$\sum_i P(X = V_i \mid B) = 1$, where the $V_i$ are the possible values of X

Example
• Consider two "language models", one of French and one of English
• Assume that the probability of observing a word w is
  – 0.01 in English text
  – 0.05 in French text
• Assume the numbers of English and French texts are roughly equal
• Given that w is observed, what is the probability that the text is French?

Some Computational Models
• Finite State Machines
• Context Free Grammars
• Probabilistic Variants

Finite State Machines
• States and transitions
• Symbols on transitions
• Acceptors vs. generators

Markov Chains
• Finite State Machines with transitions governed by probabilistic events
  – In conjunction with / instead of external input
• Markovian property: every transition is independent of the past, given the present state
  – The probability of following a path is the product of the probabilities of its individual transitions

Context Free Grammars
• Context Free Grammars are a more natural model for natural language
• Syntax rules are very easy to formulate using CFGs
• Provably more expressive than Finite State Machines
  – E.g. can check for balanced parentheses

Context Free Grammars
• Non-terminals
• Terminals
• Production rules
  – V → w where V is a non-terminal and w is a sequence of terminals and non-terminals

Context Free Grammars
• Can be used as acceptors
• Can be used as a generative model
• Similarly to the case of Finite State Machines
• How long can a string generated by a CFG be?

Stochastic Context Free Grammar
• Non-terminals
• Terminals
• Production rules, each associated with a probability
  – V → w where V is a non-terminal and w is a sequence of terminals and non-terminals
  – Markovian property is typically assumed

Chomsky Normal Form
• Every rule is of the form
  – V → V1 V2 where V, V1, V2 are non-terminals
  – V → t where V is a non-terminal and t is a terminal
• Every (S)CFG can be written in this form (apart from the empty string)
• Makes designing many algorithms easier
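
The French/English example from the Bayes Rule part of the overview can be worked through numerically. The sketch below assumes that "roughly equal" numbers of texts means a prior of exactly 0.5 for each language, and that a text is either French or English; the function and variable names are illustrative and do not come from the slides.

```python
# Likelihoods taken from the example: P(w | English) and P(w | French).
p_w_given_english = 0.01
p_w_given_french = 0.05

# Assumption: "roughly equal" is read as equal priors of 0.5 each.
p_english = 0.5
p_french = 0.5


def posterior_french(p_w_fr, p_w_en, prior_fr, prior_en):
    """Bayes rule: P(French | w) = P(w | French) * P(French) / P(w)."""
    # P(w) is the sum over the two disjoint cases (French text or English text).
    p_w = p_w_fr * prior_fr + p_w_en * prior_en
    return p_w_fr * prior_fr / p_w


p_french_given_w = posterior_french(
    p_w_given_french, p_w_given_english, p_french, p_english
)
print(round(p_french_given_w, 4))  # 0.8333, i.e. 0.025 / 0.03 = 5/6
```

Under these assumptions, observing w raises the probability that the text is French from 0.5 to about 0.83, and the complementary probability that it is English is about 0.17.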
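
The Markov Chains statement that the probability of a path is the product of its transition probabilities can be illustrated with a small sketch. The two-state chain below, its state names, and its numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical two-state chain; each row lists P(next state | current state)
# and sums to 1.
transitions = {
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun": {"rain": 0.4, "sun": 0.6},
}


def path_probability(path, transitions):
    """Probability of following `path`, given that the chain starts in path[0]."""
    prob = 1.0
    # By the Markovian property each step depends only on the current state,
    # so the path probability is the product of the individual transitions.
    for current, nxt in zip(path, path[1:]):
        prob *= transitions[current].get(nxt, 0.0)
    return prob


# sun -> sun -> rain -> rain: 0.6 * 0.4 * 0.7, about 0.168
sun_to_rain = path_probability(["sun", "sun", "rain", "rain"], transitions)
print(sun_to_rain)
```

A transition that is missing from the table is treated as having probability 0, so any path that uses it also has probability 0.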
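
The balanced-parentheses example mentioned under Context Free Grammars can be sketched as an acceptor. The grammar S → ( S ) S | ε is one common choice and is an assumption here, not a grammar given in the slides; it is also not in Chomsky Normal Form, since it uses the empty production.

```python
def parse_s(text, pos=0):
    """Try to derive a prefix of text[pos:] from S -> "(" S ")" S | empty.

    Returns the position after the derived prefix, or None on failure.
    """
    if pos < len(text) and text[pos] == "(":
        # Production S -> "(" S ")" S
        inner_end = parse_s(text, pos + 1)
        if inner_end is None or inner_end >= len(text) or text[inner_end] != ")":
            return None
        return parse_s(text, inner_end + 1)
    # Production S -> empty: consume nothing.
    return pos


def is_balanced(text):
    """Accept exactly the strings that S derives in full."""
    return parse_s(text) == len(text)


print(is_balanced("(()())"))  # True
print(is_balanced("(()"))     # False
```

The recursion in `parse_s` plays the role of the unbounded nesting depth that a Finite State Machine, with its fixed number of states, cannot track.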
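
For the Stochastic Context Free Grammar in Chomsky Normal Form, a minimal sketch of how rule probabilities combine: assuming each rule application is independent of the rest of the derivation, the probability of a derivation tree is the product of the probabilities of the rules it uses. The toy grammar, its probabilities, and the tuple encoding of trees are all hypothetical.

```python
# Hypothetical toy SCFG in Chomsky Normal Form. Keys are (left-hand side,
# right-hand side); probabilities of rules sharing a left-hand side sum to 1.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.5,
    ("NP", ("fish",)): 0.5,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("eats",)): 0.3,
    ("V", ("eats",)): 1.0,
}


def tree_probability(tree):
    """Probability of a derivation tree.

    A tree is a tuple (non_terminal, child, ...), where each child is either
    a subtree (tuple) or a terminal (string).
    """
    head, *children = tree
    # The right-hand side is the sequence of child labels.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = rules[(head, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child)
    return prob


tree = ("S", ("NP", "she"), ("VP", ("V", "eats"), ("NP", "fish")))
derivation_prob = tree_probability(tree)
print(derivation_prob)  # 1.0 * 0.5 * 0.7 * 1.0 * 0.5, about 0.175
```

Because every rule has either two non-terminals or one terminal on its right-hand side, the recursion only ever has to handle those two shapes, which is one way Chomsky Normal Form simplifies algorithms.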
