Some Probability Theory and
Computational models
A short overview
Basic Probability Theory
• We will only use discrete probability spaces
over boolean events
• A probability distribution maps each event
to a value in [0,1]
– P(A) is the probability that A is true
– The fraction of “worlds” in which A holds
• “Possible worlds” interpretation
Axioms
0 <= 𝑃(𝐴) <= 1
𝑃(𝑇𝑟𝑢𝑒) = 1
𝑃(𝐹𝑎𝑙𝑠𝑒) = 0
𝑃(𝐴 𝑜𝑟 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 𝑎𝑛𝑑 𝐵)
If A and B are disjoint then
𝑃(𝐴 𝑜𝑟 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
𝑃(𝑁𝑂𝑇 𝐴) = 1 − 𝑃(𝐴)
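The axioms can be checked numerically under the "possible worlds" reading. The following is a minimal sketch, assuming two boolean events A and B and made-up world weights; the names `worlds` and `prob` are illustrative and not from the slides.

```python
# Each "world" assigns True/False to the boolean events A and B.
# The weights are invented illustration values that sum to 1.
worlds = {
    (True, True): 0.2,
    (True, False): 0.3,
    (False, True): 0.1,
    (False, False): 0.4,
}

def prob(event):
    """Total weight of the worlds in which `event` holds."""
    return sum(w for world, w in worlds.items() if event(*world))

p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_a_and_b = prob(lambda a, b: a and b)
p_a_or_b = prob(lambda a, b: a or b)
p_not_a = prob(lambda a, b: not a)

# Compare both sides of P(A or B) = P(A) + P(B) - P(A and B)
print(p_a_or_b, p_a + p_b - p_a_and_b)
# Compare both sides of P(NOT A) = 1 - P(A)
print(p_not_a, 1 - p_a)
```

Up to floating-point rounding, each printed pair should agree.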
Conditional Probability and
Independence
• 𝑃(𝐴|𝐵) is the fraction of worlds in which B is
true, that also have A true
𝑃(𝐴|𝐵) = 𝑃(𝐴 𝑎𝑛𝑑 𝐵) / 𝑃(𝐵)
• Chain rule: 𝑃(𝐴 𝑎𝑛𝑑 𝐵) = 𝑃(𝐵) ∗ 𝑃(𝐴|𝐵)
• If 𝑃(𝐴|𝐵) = 𝑃(𝐴) then A and B are independent
– Implies that also 𝑃(𝐵|𝐴) = 𝑃(𝐵)
– And that 𝑃(𝐴 𝑎𝑛𝑑 𝐵) = 𝑃(𝐴) ∗ 𝑃(𝐵)
• Conditional independence: 𝑃(𝐴|𝐵, 𝐶) = 𝑃(𝐴|𝐶)
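The definitions of conditional probability and independence can be tried out on a small table of possible worlds. This is only a sketch: the weights are invented and chosen so that A and B happen to be independent, and `cond_prob` is a hypothetical helper name.

```python
import math

# Invented weights for the four worlds over (A, B); they sum to 1.
worlds = {
    (True, True): 0.2,
    (True, False): 0.3,
    (False, True): 0.2,
    (False, False): 0.3,
}

def prob(event):
    """Total weight of the worlds in which `event` holds."""
    return sum(w for world, w in worlds.items() if event(*world))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    p_given = prob(given)
    if p_given == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return prob(lambda a, b: event(a, b) and given(a, b)) / p_given

is_a = lambda a, b: a
is_b = lambda a, b: b

p_a_given_b = cond_prob(is_a, is_b)
p_b_given_a = cond_prob(is_b, is_a)
# A and B are independent if conditioning on B does not change P(A).
independent = math.isclose(p_a_given_b, prob(is_a))
print(p_a_given_b, p_b_given_a, independent)
```

With these weights the check reports independence, and the product rule P(A and B) = P(A) ∗ P(B) holds as well.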
Bayes Rule
𝑃(𝐴|𝐵) = 𝑃(𝐵|𝐴) ∗ 𝑃(𝐴) / 𝑃(𝐵)
𝑃(𝐴|𝐵) + 𝑃(𝑁𝑜𝑡 𝐴|𝐵) = 1
Σ𝑖 𝑃(𝑋 = 𝑉𝑖 | 𝐵) = 1
Example
• Consider two “language models” of French
and English
• Assume that the probability of observing a
word w is
– 0.01 in English text
– 0.05 in French text
• Assume the numbers of English and French
texts are roughly equal
• Given that we observe w, what is the probability
that the text is French?
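One way to work the example is to apply Bayes rule with a prior of 0.5 for each language, which is how "roughly equal" is read here, and to expand P(w) over the two languages. The function name `posterior_french` is illustrative only.

```python
def posterior_french(p_w_given_french, p_w_given_english, prior_french=0.5):
    """P(French | w) via Bayes rule, assuming the text is either French or English."""
    prior_english = 1.0 - prior_french
    # P(w) = P(w | French) * P(French) + P(w | English) * P(English)
    p_w = p_w_given_french * prior_french + p_w_given_english * prior_english
    return p_w_given_french * prior_french / p_w

p = posterior_french(0.05, 0.01)
print(round(p, 3))  # 0.833
```

Under these assumptions the posterior is 0.025 / 0.03 = 5/6, so observing w makes French five times as likely as English.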
Some Computational Models
• Finite State Machines
• Context Free Grammars
• Probabilistic Variants
Finite State Machines
• States and transitions
• Symbols on transitions
• Acceptors vs. generators
Markov Chains
• Finite State Machines with transitions
governed by probabilistic events
– In conjunction with / instead of external input
• Markovian property: Every transition is
independent of the past, given the present
state
– The probability of following a path is the product
of the probabilities of its individual transitions
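The path-probability rule can be sketched in a few lines. The two-state chain below is a hypothetical example with invented transition probabilities, and the probability is computed given that the chain starts in the first state of the path.

```python
# Hypothetical two-state chain; each row of outgoing probabilities sums to 1.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def path_probability(path, transitions):
    """Probability of following `path`, given that the chain starts in path[0]."""
    p = 1.0
    for current, nxt in zip(path, path[1:]):
        # By the Markovian property, each step depends only on the current state.
        p *= transitions[current].get(nxt, 0.0)
    return p

p = path_probability(["sunny", "sunny", "rainy", "rainy"], transitions)
print(p)  # 0.8 * 0.2 * 0.6, up to floating-point rounding
```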
Context Free Grammars
• Context Free Grammars are a more natural model for
Natural Language
• Syntax rules are very easy to formulate using CFGs
• Provably more expressive than Finite State Machines
– E.g. Can check for balanced parentheses
Context Free Grammars
• Non-terminals
• Terminals
• Production rules
– V → w where V is a non-terminal and w is a
sequence of terminals and non-terminals
Context Free Grammars
• Can be used as acceptors
• Can be used as a generative model
• Both uses mirror the acceptor vs. generator distinction for Finite State Machines
• How long can a string generated by a CFG be?
Stochastic Context Free Grammar
• Non-terminals
• Terminals
• Production rules, each associated with a probability
– V → w where V is a non-terminal and w is a
sequence of terminals and non-terminals
– Markovian property is typically assumed: the rule chosen to
expand a non-terminal is independent of the rest of the derivation
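Used as a generator, a stochastic CFG can be sampled by repeatedly choosing a rule for each non-terminal according to its probability; under the Markovian assumption, the probability of the whole derivation is the product of the probabilities of the rules used. The toy grammar and the function name `generate` below are hypothetical, and the grammar is deliberately non-recursive so that sampling always terminates.

```python
import random

# Hypothetical toy grammar: non-terminal -> list of (probability, right-hand side).
# Symbols that appear as keys are non-terminals; everything else is a terminal.
GRAMMAR = {
    "S": [(1.0, ("NP", "VP"))],
    "NP": [(0.6, ("the", "N")), (0.4, ("N",))],
    "VP": [(0.7, ("V", "NP")), (0.3, ("V",))],
    "N": [(0.5, ("dog",)), (0.5, ("cat",))],
    "V": [(0.5, ("sees",)), (0.5, ("sleeps",))],
}

def generate(symbol, rng):
    """Expand `symbol`; return (list of terminals, probability of the derivation)."""
    if symbol not in GRAMMAR:  # terminal: emit it unchanged
        return [symbol], 1.0
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    # Pick one production for this non-terminal according to its probability.
    prob, rhs = rng.choices(rules, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        child_words, child_prob = generate(child, rng)
        words += child_words
        prob *= child_prob  # derivation probability = product of rule probabilities
    return words, prob

rng = random.Random(0)  # fixed seed so repeated runs give the same sample
sentence, p = generate("S", rng)
print(" ".join(sentence), p)
```

A grammar with recursive rules would generate strings of unbounded length, so a practical sampler for such a grammar would likely need a depth limit.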
Chomsky Normal Form
• Every rule is of one of the forms
– V → V1 V2 where V, V1, V2 are non-terminals
– V → t where V is a non-terminal and t is a terminal
• Every (S)CFG can be written in this form
• Makes designing many algorithms easier
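One well-known algorithm that relies on this form is the CYK recognizer, which is not named in the slides and is shown here only as an illustration. Because every rule has either two non-terminals or one terminal on the right, each span of the input can be handled by trying every split into two shorter spans. The grammar below, for strings of the form a^n b^n, is a hypothetical example.

```python
# Hypothetical CNF grammar for a^n b^n (n >= 1):
#   S -> A B | A X,   X -> S B,   A -> a,   B -> b
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
LEXICAL = {"a": {"A"}, "b": {"B"}}

def cyk_accepts(tokens, start="S"):
    """Return True if `tokens` can be derived from `start` (CYK recognition)."""
    n = len(tokens)
    if n == 0:
        return False  # no rule of the two forms above derives the empty string
    # table[i][j] holds the non-terminals that derive tokens[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = set(LEXICAL.get(tok, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                # Combine a left part of `split` tokens with the remaining right part.
                for left in table[i][split - 1]:
                    for right in table[i + split][length - split - 1]:
                        table[i][length - 1] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]

print(cyk_accepts(list("aabb")))  # True
print(cyk_accepts(list("aab")))   # False
```

Like balanced parentheses, a^n b^n requires matching counts, which a Finite State Machine cannot check.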