Some Probability Theory and Computational Models: A Short Overview

Basic Probability Theory
• We will only use discrete probability spaces over Boolean events
• A probability distribution maps each event to a value in [0,1]
  – $P(A)$ is the probability that A is true
  – Equivalently, the fraction of “worlds” in which A holds
• This is the “possible worlds” interpretation

Axioms
• $0 \le P(A) \le 1$
• $P(\text{True}) = 1$
• $P(\text{False}) = 0$
• $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
• If A and B are disjoint then $P(A \text{ or } B) = P(A) + P(B)$
• $P(\text{not } A) = 1 - P(A)$

Conditional Probability and Independence
• $P(A \mid B)$ is the fraction of the worlds in which B is true that also have A true:
  $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
• Chain rule: $P(A \text{ and } B) = P(B) \cdot P(A \mid B)$
• If $P(A \mid B) = P(A)$ then A and B are independent
  – This implies that also $P(B \mid A) = P(B)$
  – And that $P(A \text{ and } B) = P(A) \cdot P(B)$
• Conditional independence: A is independent of B given C if $P(A \mid B, C) = P(A \mid C)$

Bayes Rule
• $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$
• $P(A \mid B) + P(\text{not } A \mid B) = 1$
• $\sum_i P(X = V_i \mid B) = 1$, where the $V_i$ are the possible values of X

Example
• Consider two “language models”, one of French and one of English
• Assume that the probability of observing a word w is
  – 0.01 in English text
  – 0.05 in French text
• Assume the numbers of English and French texts are roughly equal
• Given that w was observed, what is the probability that the text is French?

Some Computational Models
• Finite State Machines
• Context Free Grammars
• Probabilistic Variants

Finite State Machines
• States and transitions
• Symbols on transitions
• Acceptors vs. generators

Markov Chains
• Finite State Machines whose transitions are governed by probabilistic events
  – In conjunction with, or instead of, external input
• Markovian property: every transition is independent of the past, given the present state
  – The probability of following a path is the product of the probabilities of its individual transitions

Context Free Grammars
• Context Free Grammars (CFGs) are a more natural model for natural language
• Syntax rules are very easy to formulate using CFGs
• Provably more expressive than Finite State Machines
  – E.g. a CFG can check for balanced parentheses

Context Free Grammars
• Non-terminals
• Terminals
• Production rules
  – V → w, where V is a non-terminal and w is a sequence of terminals and non-terminals

Context Free Grammars
• Can be used as acceptors
• Can be used as a generative model
  – Similarly to the case of Finite State Machines
• How long can a string generated by a CFG be?

Stochastic Context Free Grammar
• Non-terminals
• Terminals
• Production rules, each associated with a probability
  – V → w, where V is a non-terminal and w is a sequence of terminals and non-terminals
  – The Markovian property is typically assumed

Chomsky Normal Form
• Every rule has one of two forms
  – V → V1 V2, where V, V1, V2 are non-terminals
  – V → t, where V is a non-terminal and t is a terminal
• Every (S)CFG that does not generate the empty string can be written in this form
• Makes designing many algorithms easier
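
The French/English example from the Bayes Rule part can be worked through as follows. This is a sketch under two assumptions that the slides only hint at: "roughly equal" numbers of texts is read as equal priors, $P(F) = P(E) = 0.5$, and French and English are the only two possibilities. The symbols $F$ and $E$ are shorthand introduced here, not notation from the slides.

```latex
\begin{align*}
P(F \mid w) &= \frac{P(w \mid F)\,P(F)}{P(w)}
             = \frac{P(w \mid F)\,P(F)}{P(w \mid F)\,P(F) + P(w \mid E)\,P(E)} \\
            &= \frac{0.05 \cdot 0.5}{0.05 \cdot 0.5 + 0.01 \cdot 0.5}
             = \frac{0.025}{0.030} \approx 0.83
\end{align*}
```

Under the same assumptions $P(E \mid w) \approx 0.17$, so the two posteriors sum to 1, as the rule $P(A \mid B) + P(\text{not } A \mid B) = 1$ requires.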
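
The statement that the probability of a path in a Markov chain is the product of its transition probabilities can be illustrated with a small Python sketch. The two-state chain, its state names and its numbers are hypothetical and chosen only for illustration; the function conditions on the chain already being in the first state of the path.

```python
# Hypothetical two-state chain; states and probabilities are illustrative only.
# TRANSITIONS[s][t] is the probability of moving from state s to state t.
TRANSITIONS = {
    "rain": {"rain": 0.6, "sun": 0.4},
    "sun": {"rain": 0.2, "sun": 0.8},
}


def path_probability(transitions, path):
    """Probability of following `path`, given that the chain starts in path[0].

    By the Markovian property each step depends only on the current state,
    so the path probability is the product of the single-step probabilities.
    """
    prob = 1.0
    for current, nxt in zip(path, path[1:]):
        # A transition that is not listed is treated as impossible.
        prob *= transitions[current].get(nxt, 0.0)
    return prob


# sun -> sun -> rain -> rain uses the steps 0.8, 0.2 and 0.6.
example = path_probability(TRANSITIONS, ["sun", "sun", "rain", "rain"])
print(example)
```

For the path shown, the three steps multiply to roughly 0.096.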
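
A stochastic CFG in Chomsky Normal Form can be used as a generative model, as the following minimal Python sketch suggests. The toy grammar, its probabilities and the helper names (`is_cnf`, `is_normalised`, `generate`) are all hypothetical and not taken from the slides. The sketch assumes the Markovian property, so the probability of a derivation is the product of the probabilities of the rules it uses.

```python
import math
import random

# Hypothetical toy SCFG in Chomsky Normal Form (illustrative only).
# Each non-terminal maps to a list of (right-hand side, probability) pairs.
# A right-hand side is either two non-terminals or one terminal.
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("she",), 0.6), (("fish",), 0.4)],
    "VP": [(("V", "NP"), 0.7), (("swims",), 0.3)],
    "V": [(("eats",), 1.0)],
}


def is_cnf(grammar):
    """Check that every rule is V -> V1 V2 or V -> t."""
    for rules in grammar.values():
        for rhs, _ in rules:
            binary = len(rhs) == 2 and all(s in grammar for s in rhs)
            lexical = len(rhs) == 1 and rhs[0] not in grammar
            if not (binary or lexical):
                return False
    return True


def is_normalised(grammar):
    """Check that the rule probabilities of each non-terminal sum to 1."""
    return all(
        math.isclose(sum(p for _, p in rules), 1.0)
        for rules in grammar.values()
    )


def generate(grammar, symbol, rng):
    """Expand `symbol`; return (list of terminals, probability of the derivation)."""
    rules = grammar[symbol]
    rhs, p = rng.choices(rules, weights=[q for _, q in rules])[0]
    if len(rhs) == 1:
        # Lexical rule V -> t: emit the terminal.
        return [rhs[0]], p
    # Binary rule V -> V1 V2: expand both children independently.
    left, p_left = generate(grammar, rhs[0], rng)
    right, p_right = generate(grammar, rhs[1], rng)
    return left + right, p * p_left * p_right


words, prob = generate(GRAMMAR, "S", random.Random(0))
print(" ".join(words), prob)
```

This toy grammar is not recursive, so every generated string has two or three words. A grammar with a recursive rule such as S → S S can generate strings of unbounded length, which is one way to approach the question of how long a generated string can be.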