CS/EE 144: The Ideas Behind Our Networked World
A Probability Refresher

Outline
• Probability space
• Random variables
• Expectation
• Indicator random variables
• Conditional probability and conditional expectation
• Important discrete random variables
• Important continuous random variables

Experiment: toss a fair coin three times
What are the outcomes and their probabilities?
[Figure: binary tree of the three tosses, branching H/T at each toss, with leaves HHH, HHT, HTH, HTT, THH, THT, TTH, TTT]
• What is the probability that there are two heads?
• What is the expected number of heads?
• Is the number of heads independent of the number of tails?

Probability space
A probability space consists of a pair (Ω, P):
1. Ω is the set of all possible outcomes.
2. P: Ω → [0, 1] is an assignment of probabilities to outcomes, with Σ_{ω∈Ω} P(ω) = 1.
e.g. three coin tosses:
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}, and P(ω) = 1/8 for every ω ∈ Ω.
Typically we are interested in collections of outcomes, e.g. the outcomes with at least two heads. We refer to such collections as events.
e.g. E = {HHH, HHT, HTH, THH} is the event that there are at least two heads, and P(E) = 1/2.
(Technically, more machinery is needed to define a probability space; you can learn this in a math course on probability theory.)

Useful properties
Partition property: given any partition E₁, E₂, …, Eₙ of Ω,
Σ_{i=1}^{n} P(Eᵢ) = 1
Union bound: for any events E₁, E₂, …, Eₙ,
P(∪_{i=1}^{n} Eᵢ) ≤ Σ_{i=1}^{n} P(Eᵢ)

Independence
Definition: two events E₁ and E₂ are independent ⟺ P(E₁ ∩ E₂) = P(E₁)P(E₂).
e.g. coin toss experiment:
• A is the event that all the tosses show the same face: P(A) = 1/4
• B is the event that there is at most one head: P(B) = 1/2
• C is the event that there is at least one head and at least one tail: P(C) = 3/4
Are A and B independent? Yes: P(A ∩ B) = 1/8 = P(A)P(B).
Are A and C independent? No: P(A ∩ C) = 0 ≠ P(A)P(C).
Are B and C independent? Yes: P(B ∩ C) = 3/8 = P(B)P(C).

Random variables
A random variable X is a function mapping Ω to ℝ, i.e. X: Ω → ℝ.
[Figure: X maps the set of possible outcomes Ω into ℝ]
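As a quick aside, all three independence checks in the coin-toss example above can be verified by exhaustive enumeration, since Ω has only eight equally likely outcomes. A minimal Python sketch (events represented as sets of outcomes, exact arithmetic via `Fraction`):

```python
from itertools import product
from fractions import Fraction

# All 2^3 equally likely outcomes of three fair coin tosses.
omega = [''.join(t) for t in product('HT', repeat=3)]

def prob(event):
    """Probability of an event, given as a set of outcomes (uniform measure)."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if len(set(w)) == 1}       # all tosses show the same face
B = {w for w in omega if w.count('H') <= 1}      # at most one head
C = {w for w in omega if 'H' in w and 'T' in w}  # at least one head and one tail

print(prob(A & B) == prob(A) * prob(B))  # True:  A and B are independent
print(prob(A & C) == prob(A) * prob(C))  # False: A and C are not independent
print(prob(B & C) == prob(B) * prob(C))  # True:  B and C are independent
```

The same `prob` helper answers the slide's other questions, e.g. `prob({w for w in omega if w.count('H') == 2})` gives 3/8.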
e.g.
• X = number of heads in three coin tosses
• Y = number of tails in three coin tosses
• Z = 1 if all three tosses show the same face, and 0 otherwise
We write P(X = k) ≔ P({ω ∈ Ω : X(ω) = k}).
e.g. coin toss experiment:
• A, the event that all the tosses show the same face, is {X = 3} ∪ {X = 0}
• B, the event that there is at most one head, is {X ≤ 1}
• C, the event that there is at least one head and at least one tail, is {1 ≤ X ≤ 2}

Independence of random variables
How might we extend the notion of independence of events to random variables? We might desire this: if random variables X and Y are independent, then for any A ⊂ ℝ and B ⊂ ℝ, the events {X ∈ A} and {Y ∈ B} are independent. It turns out that we do not have to check all possible subsets A and B!
Definition: two random variables X and Y are independent ⟺ for every x and y, the events {X ≤ x} and {Y ≤ y} are independent.

Quiz
Let X and Y be the number of heads and tails, respectively, in the coin toss experiment.
• Are X and Y identical? No. They are different functions!
• Are the probability distributions of X and Y identical? Yes. In other words, "X and Y are identically distributed".
• Are X and Y independent? No. For instance, the events {X = 2} and {Y = 2} are not independent, since we cannot simultaneously have 2 heads and 2 tails.

Discrete vs. continuous random variables
A discrete random variable takes values in ℤ. It is described by a probability mass function (pmf):
P(X = k) = p_X(k)
Properties: p_X(k) ∈ [0, 1] and Σ_{k∈ℤ} p_X(k) = 1.
A continuous random variable takes values in an interval (or a collection of intervals) in ℝ. It is described by a probability density function (pdf):
P(a ≤ X ≤ b) = ∫_a^b f_X(s) ds
Often the cumulative distribution function (cdf) is also used:
F_X(x) ≔ P(X ≤ x) = ∫_{−∞}^{x} f_X(s) ds
Properties: f_X(x) ≥ 0, F_X(−∞) = 0, F_X(+∞) = 1, and f_X(x) = (d/dx) F_X(x).

Expectation
The long-run average of the values taken by the random variable when the same experiment is performed repeatedly. Also called the "mean" or the "first moment".
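The quiz answers above can be confirmed by enumeration, and the same enumeration also computes the expected number of heads directly from the pmf. A minimal Python sketch:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# The 8 equally likely outcomes of three fair coin tosses.
omega = [''.join(t) for t in product('HT', repeat=3)]
p = Fraction(1, 8)  # probability of each outcome

X = {w: w.count('H') for w in omega}  # number of heads
Y = {w: w.count('T') for w in omega}  # number of tails

# Identically distributed: the pmfs of X and Y coincide.
pmf_X = Counter(X.values())  # value -> number of outcomes mapped to it
pmf_Y = Counter(Y.values())
print(pmf_X == pmf_Y)  # True

# Not independent: P(X = 2, Y = 2) = 0, yet P(X = 2) * P(Y = 2) > 0.
joint = sum(p for w in omega if X[w] == 2 and Y[w] == 2)
print(joint == 0 and pmf_X[2] * p * pmf_Y[2] * p > 0)  # True

# Expectation from the pmf: E[X] = sum over k of k * p_X(k).
E_X = sum(k * count * p for k, count in pmf_X.items())
print(E_X)  # 3/2
```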
For a discrete random variable: E[X] = Σ_{k∈ℤ} k·p_X(k)
For a continuous random variable: E[X] = ∫_{−∞}^{+∞} s·f_X(s) ds
If c is a constant, then E[c] = c.
Linearity of expectation: E[aX + bY] = a·E[X] + b·E[Y]
X and Y independent ⟹ E[XY] = E[X]·E[Y]

Indicator random variable
The indicator random variable for the event E is given by
𝟏_E(ω) = 1 if ω ∈ E, and 0 if ω ∉ E
Important fact:
E[𝟏_E] = 1·P(E) + 0·P(Eᶜ) = P(E)
Indicators are tremendously useful for counting expected numbers of occurrences!
Consider the coin toss experiment. Let Eᵢ be the event that the i-th toss results in a head. Then the total number of heads is 𝟏_{E₁} + 𝟏_{E₂} + 𝟏_{E₃}, and the expected number of heads is
E[𝟏_{E₁} + 𝟏_{E₂} + 𝟏_{E₃}] = E[𝟏_{E₁}] + E[𝟏_{E₂}] + E[𝟏_{E₃}]   (linearity)
= P(E₁) + P(E₂) + P(E₃) = 3/2
Note: we did not require the Eᵢ to be independent or identically distributed. The next two exercises give examples.

Exercises
1. A total of r balls are placed, one at a time, into k boxes. Each ball is placed independently into box i with probability pᵢ (with p₁ + p₂ + ⋯ + p_k = 1).
(a) What is the expected number of empty boxes?
(b) What is the expected number of boxes containing at least 2 balls?
2. There are r red balls and b black balls in a box. We take out the balls one at a time, uniformly at random. What is the expected number of red balls that precede all black balls?

Conditional probability
Coin toss experiment: suppose your friend tells you that there is at most one head. What is the probability that all three tosses show the same face?
Definition: given two events A and B such that P(B) > 0, the conditional probability of A given B is
P(A | B) = P(A ∩ B) / P(B)
Events A and B are independent ⟺ P(A | B) = P(A).
Law of total probability: given any partition E₁, E₂, …, Eₙ of Ω with P(Eᵢ) > 0,
P(A) = Σ_{i=1}^{n} P(Eᵢ)·P(A | Eᵢ)
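As an aside on exercise 1(a) above, here is one way (a sketch, not the only approach) to apply the indicator trick: box i is empty with probability (1 − pᵢ)^r, so by linearity the expected number of empty boxes is Σᵢ (1 − pᵢ)^r. The Python check below verifies this against brute-force enumeration for small, hypothetical pᵢ values chosen only for illustration:

```python
from itertools import product
from fractions import Fraction

# Indicator-based answer: E[#empty boxes] = sum_i (1 - p_i)^r,
# since box i receives no ball in r independent placements w.p. (1 - p_i)^r.
def expected_empty(probs, r):
    return sum((1 - pi) ** r for pi in probs)

# Brute-force check: enumerate all k^r placements of r balls
# (hypothetical box probabilities, exact arithmetic via Fraction).
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
r = 3
brute = Fraction(0)
for placement in product(range(len(probs)), repeat=r):
    p_place = Fraction(1)
    for box in placement:
        p_place *= probs[box]
    empty = len(probs) - len(set(placement))  # boxes receiving no ball
    brute += p_place * empty

print(brute == expected_empty(probs, r))  # True
```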
e.g. coin toss experiment:
• A, the event that all the tosses show the same face: P(A) = 1/4
• B, the event that there is at most one head: P(B) = 1/2
• C, the event that there is at least one head and at least one tail: P(C) = 3/4
P(A | B) = P(A ∩ B) / P(B) = (1/8) / (1/2) = 1/4 = P(A)
P(A | C) = P(A ∩ C) / P(C) = 0 / (3/4) = 0 ≠ P(A)

Conditional expectation
The long-run average of the values taken by a random variable, conditioned on the occurrence of a particular event.
For a discrete random variable: E[X | A] = Σ_{k∈ℤ} k·p_X(k | A)
For a continuous random variable: E[X | A] = ∫_{−∞}^{+∞} s·f_X(s | A) ds
X and Y independent ⟹ E[X | Y = y] = E[X]
Law of total expectation: given any partition E₁, E₂, …, Eₙ of Ω with P(Eᵢ) > 0,
E[X] = Σ_{i=1}^{n} P(Eᵢ)·E[X | Eᵢ]

Conditional expectation
The law of total expectation allows us to break a problem into pieces!
A coin has a probability p of showing heads. Let N be a discrete random variable taking non-negative values, independent of our coin tosses. What is the expected number of heads that show up if we toss this coin N times?
Let Xᵢ = 1 if the i-th toss is a head and 0 if it is a tail (an indicator random variable), and let S = X₁ + X₂ + ⋯ + X_N. Then
E[S] = E[ E[S | N] ]   (law of total expectation)
= E[ E[X₁ + X₂ + ⋯ + X_N | N] ]
= E[ E[X₁ | N] + E[X₂ | N] + ⋯ + E[X_N | N] ]   (linearity)
= E[ p + p + ⋯ + p ]   (independence)
= E[Np] = E[N]·p

Exercise
3. A biased coin has a probability p of showing heads when tossed. The coin is tossed repeatedly until a head occurs. What is the expected number of tosses required to obtain a head? Solve this without calculating an infinite sum.

Know the following random variables well
Bernoulli, Binomial, Geometric, Poisson, Uniform, Exponential, Normal/Gaussian:
• when they arise
• their distribution functions
• their moments
• how they relate to each other

Bernoulli random variable
• A biased coin has a probability p of showing heads when tossed. Let X take the value 1 if the outcome is heads and 0 if the outcome is tails.
• Often, the events corresponding to X = 1 and X = 0 are referred to as "success" and "failure".
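The random-sum identity E[S] = E[N]·p derived above can be checked exactly for a small case. A Python sketch (the pmf of N below is a hypothetical example; E[S | N = n] is computed by brute force rather than assumed):

```python
from itertools import product
from fractions import Fraction

# Check E[S] = p * E[N] for a random number N of tosses,
# with a small hypothetical pmf for N and exact arithmetic.
p = Fraction(1, 3)  # probability of heads on each toss
pmf_N = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

E_S = Fraction(0)
for n, P_n in pmf_N.items():
    # E[S | N = n] by brute force over all 2^n toss sequences
    cond = Fraction(0)
    for seq in product((0, 1), repeat=n):  # 1 = head, 0 = tail
        heads = sum(seq)
        prob = p**heads * (1 - p)**(n - heads)
        cond += prob * heads
    E_S += P_n * cond  # law of total expectation

E_N = sum(P_n * n for n, P_n in pmf_N.items())
print(E_S == p * E_N)  # True
```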
p_X(0) = 1 − p,  p_X(1) = p
E[X] = p,  Var(X) = p(1 − p)
• Notation: X ∼ Bernoulli(p)

Binomial random variable
• A biased coin has a probability p of showing heads when tossed. Let X denote the total number of heads obtained over n tosses.
• Equivalently, perform n independent Bernoulli trials X₁, X₂, …, Xₙ and let X = X₁ + X₂ + ⋯ + Xₙ denote the total number of successes.
p_X(k) = (n choose k) pᵏ (1 − p)ⁿ⁻ᵏ,  k = 0, 1, …, n
E[X] = np,  Var(X) = np(1 − p)
• Notation: X ∼ Binomial(n, p)
• Exercise: use indicator variables and linearity of expectation to derive the expectation of the binomial random variable.

Geometric random variable
• A biased coin has a probability p of showing heads when tossed. Suppose the coin is tossed repeatedly until a head occurs. Let X denote the number of tosses required to obtain a head.
• X is also called the waiting time until the first success.
p_X(k) = (1 − p)ᵏ⁻¹ p,  k = 1, 2, 3, …
E[X] = 1/p
• Notation: X ∼ Geometric(p)

Poisson random variable
• Number of phone calls arriving at a call center per minute.
• Number of jumps in a stock price in a given time interval.
• Number of misprints on the front page of a newspaper.
p_X(k) = e^(−λ) λᵏ / k!,  k = 0, 1, 2, …
E[X] = λ
• Notation: X ∼ Poisson(λ)
• Can be used to approximate a Binomial(n, p) when n is very large and p is very small, by setting λ = np.
• Misprints: n is the total number of characters and p is the probability of a character being misprinted.

Uniform random variable
• Let X denote a random variable that takes any value inside [a, b] with equal probability.
f_X(x) = 1/(b − a) if a < x < b, and 0 otherwise
F_X(x) = 0 if x ≤ a;  (x − a)/(b − a) if a < x < b;  1 if x ≥ b
E[X] = (a + b)/2,  Var(X) = (b − a)²/12
• Notation: X ∼ Uniform(a, b)

Exponential random variable
• Time it takes for a server to complete a request.
• Time it takes before your next telephone call.
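The Poisson approximation to the binomial mentioned above is easy to see numerically. A short Python sketch (the values of n and p are illustrative choices, with λ = np):

```python
from math import comb, exp, factorial

# Compare Binomial(n, p) with Poisson(lambda = n*p): n large, p small.
n, p = 1000, 0.002  # illustrative values
lam = n * p         # lambda = 2

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return exp(-lam) * lam**k / factorial(k)

for k in range(6):
    print(k, f"{binom_pmf(k):.5f}", f"{poisson_pmf(k):.5f}")
# The two pmfs agree to roughly three decimal places for every k.
```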
f_X(x) = λe^(−λx) if x ≥ 0, and 0 if x < 0
F_X(x) = 1 − e^(−λx) if x ≥ 0, and 0 if x < 0
E[X] = 1/λ
• Notation: X ∼ Exponential(λ)

Normal/Gaussian random variable
• Velocities of molecules in a gas.
f_X(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))
E[X] = μ,  Var(X) = σ²
• Notation: X ∼ Normal(μ, σ²)
• Arises as a limit of the sample average of independent and identically distributed (i.i.d.) random variables (after suitable centering and scaling), e.g. Binomial(n, p) as n → ∞. This is known as the Central Limit Theorem, which will be presented in a subsequent lecture.

Questions?
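As a closing illustration connecting the uniform and exponential distributions: inverting the exponential cdf turns Uniform(0, 1) samples into Exponential(λ) samples (inverse-transform sampling). A sketch with an illustrative λ and a fixed seed for reproducibility:

```python
import random
from math import log

# Inverse-transform sampling: solving F_X(x) = 1 - e^(-lam*x) = u
# for x gives x = -ln(1 - u)/lam, with u drawn from Uniform(0, 1).
def sample_exponential(lam, rng):
    # 1 - rng.random() lies in (0, 1], so the log is always defined
    return -log(1.0 - rng.random()) / lam

rng = random.Random(0)  # fixed seed so the sketch is reproducible
lam = 2.0               # illustrative rate parameter
samples = [sample_exponential(lam, rng) for _ in range(100_000)]

mean = sum(samples) / len(samples)
print(mean)  # should be close to E[X] = 1/lam = 0.5
```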