Advanced Artificial Intelligence
Lecture 4A: Probability Theory Review

Outline
• Axioms of probability
• Product and chain rules
• Bayes' theorem
• Random variables
• PDFs and CDFs
• Expected value and variance

Introduction
• Sample space Ω – set of all possible outcomes of a random experiment
  – Dice roll: {1, 2, 3, 4, 5, 6}
  – Coin toss: {Tails, Heads}
• Event space F – subsets of elements in a sample space
  – Dice roll: {1, 2, 3} or {2, 4, 6}
  – Coin toss: {Tails}
• Probability measure P : F → ℝ

Axioms of Probability
• 0 ≤ P(A) ≤ 1, for all A ∈ F
• P(Ω) = 1
• Frequentist view: P(A) = lim_{n→∞} (number of outcomes in A) / (total number of outcomes)

Examples
• Coin flip
  – P(H)
  – P(T)
  – P(H, H, H)
  – P(x1 = x2 = x3 = x4)
  – P({x1, x2, x3, x4} contains more than 3 heads)

Set Operations
• Let A = {1, 2, 3} and B = {2, 4, 6}
• A ∩ B = {2} and A ∪ B = {1, 2, 3, 4, 6}
• A − B = {1, 3}
• Properties:
  – P(A ∩ B) ≤ min(P(A), P(B))
  – P(A ∪ B) ≤ P(A) + P(B)
  – P(Ω − A) = 1 − P(A)
  – If A ⊆ B, then P(A) ≤ P(B)

Conditional Probability
• P(A | B) = P(A ∩ B) / P(B) = P(A, B) / P(B)
• A and B are independent if P(A | B) = P(A)
• A and B are conditionally independent given C if P(A, B | C) = P(A | C) P(B | C)

Conditional Probability
• Joint probability: P(A1 | A2, …, An) = P(A1, …, An) / P(A2, …, An)
• Product rule: P(A1, …, An) = P(A1 | A2, …, An) P(A2, …, An)
• Chain rule of probability: P(A1, …, An) = ∏_{i=1}^{n} P(Ai | A1, A2, …, Ai−1)

Examples
• Coin flip
  – P(x1 = H) = 1/2
  – P(x2 = H | x1 = H) = 0.9
  – P(x2 = T | x1 = T) = 0.8
  – P(x2 = H) = ?

Conditional Probability
• P(A | B) – probability of A given B
• Example:
  – P(A): probability that school is closed
  – P(B): probability that it snows
  – P(A | B): probability of school closing if it snows
  – P(B | A): probability of snowing if school is closed

Conditional Probability
• P(A | B) = P(A ∩ B) / P(B)
• Example, with A = school is closed and B = it snows:
  P(A, B) = 0.005
  P(B)    = 0.02
  P(A | B) = 0.25

Quiz
• P(D1 = sunny) = 0.9
• P(D2 = sunny | D1 = sunny) = 0.8
• P(D2 = rainy | D1 = sunny) = ?
• P(D2 = sunny | D1 = rainy) = 0.6
• P(D2 = rainy | D1 = rainy) = ?
• P(D2 = sunny) = ?
• P(D3 = sunny) = ?

Joint Probability
• Multiple events: cancer, test result
• P(has cancer, test positive):

  Has cancer?   Test positive?   P(C, TP)
  yes           yes              0.018
  yes           no               0.002
  no            yes              0.196
  no            no               0.784

Joint Probability
• The problem with joint distributions: it takes 2^D − 1 numbers to specify a joint distribution over D binary variables!

Conditional Probability
• Describes the cancer test:
  – P(test positive | has cancer) = 0.9
  – P(test positive | ¬has cancer) = 0.2
• Put this together with the prior probability:
  – P(has cancer) = 0.02
  – P(test negative | has cancer) = 0.1

Conditional Probability
• We have:
  – P(C) = 0.02 and P(¬C) = 0.98
  – P(TP | C) = 0.9 and P(¬TP | C) = 0.1
  – P(TP | ¬C) = 0.2 and P(¬TP | ¬C) = 0.8
• We can now calculate the joint probabilities, e.g. P(TP, C) = P(TP | C) P(C):

  Has cancer?   Test positive?   P(TP, C)
  yes           yes              0.018
  yes           no               0.002
  no            yes              0.196
  no            no               0.784

Conditional Probability
• "Diagnostic" question: how likely is cancer given a positive test?
  P(has cancer | test positive) = ?
• P(C | TP) = P(C, TP) / P(TP) = 0.018 / (0.018 + 0.196) = 0.018 / 0.214 ≈ 0.084

Bayes' Theorem
• We can relate P(A | B) and P(B | A) through Bayes' rule:
  P(A | B) = P(B | A) P(A) / P(B)
• Terminology: P(A | B) is the posterior probability, P(B | A) the likelihood, P(A) the prior probability, and P(B) the normalizing constant
• P(B) can be eliminated with the law of total probability:
  P(B) = Σ_j P(B | A_j) P(A_j)
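The diagnostic calculation above can be checked numerically. A minimal Python sketch (variable names are my own) that applies the law of total probability and then Bayes' rule to the cancer-test numbers:

```python
# Cancer-test example from the slides:
# prior P(C), likelihood P(TP|C), false-positive rate P(TP|~C).
p_c = 0.02
p_tp_given_c = 0.9
p_tp_given_not_c = 0.2

# Law of total probability: P(TP) = P(TP|C)P(C) + P(TP|~C)P(~C)
p_tp = p_tp_given_c * p_c + p_tp_given_not_c * (1 - p_c)

# Bayes' rule: P(C|TP) = P(TP|C)P(C) / P(TP)
p_c_given_tp = p_tp_given_c * p_c / p_tp

print(round(p_tp, 3))          # 0.214
print(round(p_c_given_tp, 3))  # 0.084
```

Note how the posterior (about 8.4%) stays small despite the test's 90% sensitivity: the 2% prior dominates.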
Random Variables
• A random variable is a function X : Ω → ℝ
• Examples:
  – Russian roulette: X = 1 if the gun fires and X = 0 otherwise, with P(X = 1) = 1/6 and P(X = 0) = 5/6
  – X = number of heads in 10 coin tosses

Cumulative Distribution Functions
• Defined as F_X : ℝ → [0, 1] such that F_X(x) = P(X ≤ x)
• Properties:
  – 0 ≤ F_X(x) ≤ 1
  – lim_{x→−∞} F_X(x) = 0
  – lim_{x→∞} F_X(x) = 1
  – x ≤ y ⇒ F_X(x) ≤ F_X(y)

Probability Density Functions
• For discrete random variables, defined as f(x_j) = P(x_{j−1} < X ≤ x_j) = P(x_j)
• Relates to CDFs:
  – F_X(x) = Σ_{x_j ≤ x} f(x_j)
  – f(x_j) = F_X(x_j) − F_X(x_{j−1})

Probability Density Functions
• For continuous random variables, defined as f(x) Δx ≈ P(x < X ≤ x + Δx)
• Relates to CDFs:
  – F_X(x) = ∫_{−∞}^{x} f_X(t) dt
  – f_X(x) = dF_X(x)/dx

Probability Density Functions
• Example: let X be the angular position of a freely spinning pointer on a circle
  – f(x) = 1/(2π) for 0 ≤ x ≤ 2π
  – F(x) = x/(2π) for 0 ≤ x ≤ 2π
  – P(0 ≤ X ≤ π/3) = F(π/3) − F(0) = 1/6
• [Figure: f(x) is constant at 1/(2π) over [0, 2π]; F(x) rises linearly from 0 to 1 over [0, 2π]]

Expectation
• Given a discrete r.v. X and a function g : ℝ → ℝ,
  E[g(X)] = Σ_{x ∈ Val(X)} g(x) f_X(x)
• For a continuous r.v. X and a function g : ℝ → ℝ,
  E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx
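As a quick check of the discrete formula, a short Python sketch (the die example is my own, not from the slides) computing the expected value of a fair six-sided die with g(x) = x and a uniform pmf f(x) = 1/6:

```python
# Discrete expectation E[g(X)] = sum over Val(X) of g(x) * f(x),
# here with g(x) = x and f(x) = 1/6 for each face of a fair die.
outcomes = [1, 2, 3, 4, 5, 6]
f = 1 / 6

e_x = sum(x * f for x in outcomes)       # first moment (mean)
e_x2 = sum(x * x * f for x in outcomes)  # second moment E[X^2]

print(round(e_x, 2))   # 3.5
print(round(e_x2, 4))  # 15.1667 (= 91/6)
```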
Expectation
• The expectation of a r.v. X is known as the first moment, or the mean value, of that variable: μ = E[X]
• Properties:
  – E[a] = a for any constant a ∈ ℝ
  – E[a f(X)] = a E[f(X)] for any constant a ∈ ℝ
  – E[f(X) + g(X)] = E[f(X)] + E[g(X)]

Variance
• For a random variable X, define Var(X) = E[(X − E[X])²]
• Another common form: Var(X) = E[X²] − E[X]²
• Properties:
  – Var(a) = 0 for any constant a ∈ ℝ
  – Var(a f(X)) = a² Var(f(X)) for any constant a ∈ ℝ

Gaussian Distributions
• Let X ~ Normal(μ, σ²); then
  f(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)),
  where μ represents the mean and σ² the variance
• Occurs naturally in many phenomena
  – Noise/error
  – Central limit theorem
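The Gaussian density and the central limit theorem can both be illustrated with the standard library; the sampling parameters below (100 draws per mean, 2000 means, seed 0) are arbitrary choices of mine:

```python
import math
import random
import statistics

# Gaussian pdf f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# The standard normal peaks at x = 0 with density 1/sqrt(2 pi)
print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # 0.3989

# Central limit theorem sketch: means of 100 Uniform(0, 1) draws
# cluster around 0.5 with standard deviation sqrt(1/12)/10 ≈ 0.029,
# even though the underlying distribution is far from Gaussian.
random.seed(0)
means = [statistics.mean(random.random() for _ in range(100)) for _ in range(2000)]
print(round(statistics.mean(means), 2))   # close to 0.5
print(round(statistics.stdev(means), 3))  # close to 0.029
```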