Some Probability Theory and Computational Models
A short overview
Basic Probability Theory
• We will only use discrete probability spaces over boolean events
• A probability distribution maps a set of events to [0,1]
– P(A) is the probability that A is true
– The fraction of “worlds” in which A holds
• “Possible worlds” interpretation
Axioms
0 ≤ P(A) ≤ 1
P(True) = 1
P(False) = 0
P(A or B) = P(A) + P(B) − P(A and B)
If A and B are disjoint then P(A or B) = P(A) + P(B)
P(not A) = 1 − P(A)
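A minimal sketch of the “possible worlds” reading of these axioms, assuming a made-up weighted set of worlds over two boolean events (nothing below comes from the slides except the inclusion-exclusion identity itself):

```python
# Toy discrete probability space: a weight for each truth assignment ("world")
# of two boolean events A and B. The weights are invented and sum to 1.
worlds = {(True, True): 0.2, (True, False): 0.3,
          (False, True): 0.1, (False, False): 0.4}

def prob(event):
    """P(event) = total weight of the worlds in which the event holds."""
    return sum(w for world, w in worlds.items() if event(world))

A = lambda world: world[0]
B = lambda world: world[1]

# P(A or B) = P(A) + P(B) - P(A and B)
lhs = prob(lambda w: A(w) or B(w))
rhs = prob(A) + prob(B) - prob(lambda w: A(w) and B(w))
assert abs(lhs - rhs) < 1e-12
print(prob(A), prob(B), lhs)  # 0.5 0.3 0.6
```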
Conditional Probability and Independence
• P(A|B) is the fraction of worlds in which B is true that also have A true:
P(A|B) = P(A and B) / P(B)
• Chain rule: P(A and B) = P(B) · P(A|B)
• If P(A|B) = P(A) then A and B are independent
– Implies that also P(B|A) = P(B)
– And that P(A and B) = P(A) · P(B)
• Conditional independence: P(A|B, C) = P(A|C)
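The same toy worlds model can illustrate conditional probability and the independence test; the weights are again invented for illustration:

```python
worlds = {(True, True): 0.2, (True, False): 0.3,
          (False, True): 0.1, (False, False): 0.4}

def prob(event):
    return sum(w for world, w in worlds.items() if event(world))

def cond_prob(a, b):
    """P(a|b) = P(a and b) / P(b): total weight restricted to b-worlds."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

A = lambda w: w[0]
B = lambda w: w[1]

# Chain rule: P(A and B) = P(B) * P(A|B)
assert abs(prob(lambda w: A(w) and B(w)) - prob(B) * cond_prob(A, B)) < 1e-12

# Independence would require P(A|B) == P(A); here 2/3 != 1/2, so A and B
# are dependent in this particular space.
print(cond_prob(A, B), prob(A))
```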
Bayes Rule
P(A|B) = P(B|A) · P(A) / P(B)
P(A|B) + P(not A|B) = 1
Σᵢ P(X = Vᵢ | B) = 1 (summing over all possible values Vᵢ of a variable X)
Example
• Consider two “language models”, one of French and one of English
• Assume that the probability of observing a word w is
– 0.01 in English text
– 0.05 in French text
• Assume the numbers of English and French texts are roughly equal
• What is the probability that a text containing w is French?
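A worked version of this example, assuming equal priors P(French) = P(English) = 0.5 as the slide suggests:

```python
# Bayes' rule with the slide's numbers.
p_w_given_french, p_w_given_english = 0.05, 0.01
p_french = p_english = 0.5  # "roughly equal" numbers of texts

# P(French | w) = P(w | French) * P(French) / P(w)
p_w = p_w_given_french * p_french + p_w_given_english * p_english
print(p_w_given_french * p_french / p_w)  # 0.8333...
```

With equal priors the priors cancel, so the posterior reduces to 0.05 / (0.05 + 0.01) = 5/6 ≈ 0.83.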
Some Computational Models
• Finite State Machines
• Context Free Grammars
• Probabilistic Variants
Finite State Machines
• States and transitions
• Symbols on transitions
• Acceptors vs. generators
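A minimal acceptor sketch; the particular machine (even number of 'a's over the alphabet {a, b}) is an illustrative choice, not one from the slides:

```python
# States and transitions of a small deterministic acceptor that accepts
# strings over {'a', 'b'} containing an even number of 'a's.
transitions = {
    ("even", "a"): "odd",  ("even", "b"): "even",
    ("odd",  "a"): "even", ("odd",  "b"): "odd",
}
start, accepting = "even", {"even"}

def accepts(s):
    state = start
    for symbol in s:                      # consume one symbol per transition
        state = transitions[(state, symbol)]
    return state in accepting             # acceptor: accept iff final state OK

print(accepts("abba"), accepts("ab"))  # True False
```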
Markov Chains
• Finite State Machines with transitions governed by probabilistic events
– In conjunction with / instead of external input
• Markovian property: every transition is independent of the past, given the present state
– The probability of following a path is the product of the probabilities of the individual transitions
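A sketch of a two-state Markov chain; the states and transition probabilities below are invented for illustration:

```python
import random

# Transition probabilities of a made-up two-state chain.
P = {"sun":  {"sun": 0.8, "rain": 0.2},
     "rain": {"sun": 0.4, "rain": 0.6}}

def path_probability(path):
    """Markov property: the path probability is the product of the
    individual one-step transition probabilities."""
    p = 1.0
    for cur, nxt in zip(path, path[1:]):
        p *= P[cur][nxt]
    return p

def sample_path(start, length):
    """Generate a path by repeatedly sampling the next state given
    only the current one (the past is irrelevant)."""
    path = [start]
    for _ in range(length - 1):
        states = list(P[path[-1]])
        weights = [P[path[-1]][s] for s in states]
        path.append(random.choices(states, weights=weights)[0])
    return path

print(path_probability(["sun", "sun", "rain"]))  # 0.8 * 0.2 = 0.16
print(sample_path("sun", 5))
```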
Context Free Grammars
• Context Free Grammars are a more natural model for Natural Language
• Syntax rules are very easy to formulate using CFGs
• Provably more expressive than Finite State Machines
– E.g., can check for balanced parentheses
Context Free Grammars
• Non-terminals
• Terminals
• Production rules
– V → w, where V is a non-terminal and w is a sequence of terminals and non-terminals
Context Free Grammars
• Can be used as acceptors
• Can be used as a generative model, similarly to Finite State Machines
• How long can a string generated by a CFG be? (see the sketch below)
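A sketch of a CFG used as a generative model, using the balanced-parentheses language mentioned earlier (the uniform choice between alternatives is an arbitrary assumption). Because S is recursive, there is no a priori bound on the length of a generated string:

```python
import random

# Balanced parentheses: S -> ( S ) S | empty
rules = {"S": [["(", "S", ")", "S"], []]}

def generate(symbol="S"):
    """Expand non-terminals recursively; terminals are emitted as-is."""
    if symbol not in rules:             # terminal symbol
        return symbol
    rhs = random.choice(rules[symbol])  # pick an alternative uniformly
    return "".join(generate(s) for s in rhs)

print(generate())  # e.g. '(()())' or '' -- no bound on the length
```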
Stochastic Context Free Grammars
• Non-terminals
• Terminals
• Production rules associated with probabilities
– V → w, where V is a non-terminal and w is a sequence of terminals and non-terminals
– The Markovian property is typically assumed
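Extending the generator above to a stochastic CFG: each rule now carries a probability (the 0.4/0.6 split is made up), and under the Markovian assumption the probability of a derivation is simply the product of the probabilities of the rules used:

```python
import random

# Stochastic version of the balanced-parentheses grammar; each alternative
# right-hand side carries a probability (values invented for illustration).
rules = {"S": [(0.4, ["(", "S", ")", "S"]),
               (0.6, [])]}

def sample(symbol="S"):
    """Sample a derivation and return (string, derivation probability)."""
    if symbol not in rules:                          # terminal symbol
        return symbol, 1.0
    probs = [p for p, _ in rules[symbol]]
    i = random.choices(range(len(probs)), weights=probs)[0]
    p, rhs = rules[symbol][i]
    out = ""
    for sym in rhs:                                  # Markovian: choices multiply
        s, sp = sample(sym)
        out += s
        p *= sp
    return out, p

print(sample())  # e.g. ('()', 0.144) since 0.4 * 0.6 * 0.6 = 0.144
```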
Chomsky Normal Form
• Every rule is of one of two forms:
– V → V1V2, where V, V1, V2 are non-terminals
– V → t, where V is a non-terminal and t is a terminal
• Every (S)CFG can be converted into this form (up to the empty string)
• Makes designing many algorithms easier
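One classic example is the CKY recognition algorithm, which requires the grammar to be in CNF; a minimal sketch with an invented toy grammar:

```python
# CKY recognition: CNF enables the classic O(n^3) dynamic program.
# Toy CNF grammar (invented): S -> A B, A -> 'a', B -> 'b'
binary = {("A", "B"): {"S"}}       # V -> V1 V2 rules, indexed by (V1, V2)
unary = {"a": {"A"}, "b": {"B"}}   # V -> t rules, indexed by t

def cky_accepts(words, start="S"):
    n = len(words)
    # chart[i][j] = set of non-terminals that derive words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(unary.get(w, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                 # try every split point
                for v1 in chart[i][k]:
                    for v2 in chart[k][j]:
                        chart[i][j] |= binary.get((v1, v2), set())
    return start in chart[0][n]

print(cky_accepts(["a", "b"]), cky_accepts(["b", "a"]))  # True False
```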