CS/EE 144: The ideas behind our networked world
A probability refresher
Outline
• Probability space
• Random variables
• Expectation
• Indicator random variable
• Conditional probability and conditional expectation
• Important discrete random variables
• Important continuous random variables
Experiment: toss a fair coin three times
What are the outcomes and probabilities?

[Tree diagram: each toss branches into H or T, producing eight equally likely outcomes]
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}, \quad P(\omega) = \frac{1}{8} \text{ for each } \omega$$

Probability that there are two heads?
Expected number of heads?
Is the number of heads independent of the number of tails?
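To answer the first two questions by brute force, here is a minimal Python sketch (my addition, not from the slides) that enumerates all eight outcomes:

```python
from itertools import product

# All 8 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))
p = 1 / len(outcomes)  # each outcome has probability 1/8

# Probability of exactly two heads: 3/8.
print(sum(p for w in outcomes if w.count("H") == 2))

# Expected number of heads: 3/2.
print(sum(p * w.count("H") for w in outcomes))
```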
Probability space
A probability space consists of $(\Omega, P)$:
1. $\Omega$ is the set of all possible outcomes.
2. $P: \Omega \to [0,1]$ is an assignment of probabilities to outcomes. Note that
$$\sum_{\omega \in \Omega} P(\omega) = 1$$
e.g. three coin tosses:
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}, \quad P(\omega) = \frac{1}{8}, \ \omega \in \Omega$$
Typically we are interested in collections of outcomes, e.g. outcomes with at least two heads. We refer to such collections as events.
e.g. $E = \{HHH, HHT, HTH, THH\}$ is the event that there are at least two heads, and
$$P(E) = \frac{1}{2}$$
Technically, more machinery is needed to define a probability space; you can learn about this in a math course on probability theory.
Useful properties
Partition property: given any partition of events $E_1, E_2, \ldots, E_n$ of $\Omega$,
$$\sum_{i=1}^{n} P(E_i) = 1$$
Union bound: given any events $E_1, E_2, \ldots, E_n$,
$$P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i)$$
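For a concrete check (my addition): in the three-toss experiment, take $E_1$ = "at least two heads" with $P(E_1) = 1/2$ and $E_2$ = "all tosses show the same face" with $P(E_2) = 1/4$. Then $E_1 \cup E_2 = \{HHH, HHT, HTH, THH, TTT\}$, so $P(E_1 \cup E_2) = 5/8 \le 1/2 + 1/4 = 3/4$; the slack comes from $HHH$ being counted twice on the right-hand side.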
Independence
Definition: Two events $E_1$ and $E_2$ are independent $\iff P(E_1 \cap E_2) = P(E_1)P(E_2)$.
e.g. coin toss experiment:
$A$ denotes the event that all the tosses have the same face
$B$ denotes the event that there is at most one head
$C$ denotes the event that there is at least one head and at least one tail
$$P(A) = \frac{1}{4}, \quad P(B) = \frac{1}{2}, \quad P(C) = \frac{3}{4}$$
Are $A$ and $B$ independent? Yes: $P(A \cap B) = \frac{1}{8} = P(A)P(B)$.
Are $A$ and $C$ independent? No: $P(A \cap C) = 0 \ne P(A)P(C)$.
Are $B$ and $C$ independent? Yes: $P(B \cap C) = \frac{3}{8} = P(B)P(C)$.
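These three checks can be verified mechanically; here is a minimal Python sketch (my addition) using the event definitions above:

```python
from itertools import product

outcomes = set(product("HT", repeat=3))

def prob(event):
    # Uniform probability: each of the 8 outcomes has mass 1/8.
    return len(event) / len(outcomes)

A = {w for w in outcomes if len(set(w)) == 1}      # all tosses show the same face
B = {w for w in outcomes if w.count("H") <= 1}     # at most one head
C = {w for w in outcomes if 0 < w.count("H") < 3}  # at least one head and one tail

for X, Y, name in [(A, B, "A,B"), (A, C, "A,C"), (B, C, "B,C")]:
    print(name, "independent:", prob(X & Y) == prob(X) * prob(Y))
```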
Random variables
A random variable $X$ is a function mapping $\Omega$ to $\mathbb{R}$, i.e. $X: \Omega \to \mathbb{R}$.

[Diagram: the set of possible outcomes $\Omega$ mapped by $X$ into $\mathbb{R}$]

e.g. $X$ = number of heads in three coin tosses
$Y$ = number of tails in three coin tosses
$Z$ = 1 if all three tosses show the same face and 0 otherwise
$$P(X = k) := P(\{\omega \in \Omega : X(\omega) = k\})$$
e.g. coin toss experiment:
$A$ denotes the event that all the tosses have the same face: $\{X = 3\} \cup \{X = 0\}$
$B$ denotes the event that there is at most one head: $\{X \le 1\}$
$C$ denotes the event that there is at least one head and at least one tail: $\{1 \le X \le 2\}$
Independence of random variables
How might we extend the notion of independence of events to random variables?
We might desire this: if random variables $X$ and $Y$ are independent, then for any $A \subset \mathbb{R}$ and $B \subset \mathbb{R}$, the events $\{X \in A\}$ and $\{Y \in B\}$ are independent. It turns out that we do not have to check all possible subsets $A$ and $B$!
Definition: Two random variables $X$ and $Y$ are independent $\iff$ for every $x$ and $y$, the events $\{X \le x\}$ and $\{Y \le y\}$ are independent.
Quiz
Let $X$ and $Y$ be the number of heads and tails respectively in the coin toss experiment.
• Are $X$ and $Y$ identical?
No. They are different functions!
• Are the probability distributions of $X$ and $Y$ identical?
Yes. In other words, "$X$ and $Y$ are identically distributed".
• Are $X$ and $Y$ independent?
No. For instance, the events $\{X = 2\}$ and $\{Y = 2\}$ are not independent, since we cannot simultaneously have 2 heads and 2 tails.
Discrete vs. continuous random variables
A discrete random variable takes values in $\mathbb{Z}$. It is described by a probability mass function (pmf):
$$P(X = k) = p_X(k)$$
Properties: $p_X(k) \in [0,1]$, $\sum_{k \in \mathbb{Z}} p_X(k) = 1$
A continuous random variable takes values in an interval (or a collection of intervals) in $\mathbb{R}$. It is described by a probability density function (pdf):
$$P(a \le X \le b) = \int_a^b f_X(s)\, ds$$
Often, the cumulative distribution function (cdf) is also used:
$$F_X(x) := P(X \le x) = \int_{-\infty}^{x} f_X(s)\, ds$$
Properties: $f_X(x) \ge 0$, $F_X(-\infty) = 0$, $F_X(+\infty) = 1$, $f_X(x) = \frac{d}{dx} F_X(x)$
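To make the pmf concrete, here is a small sketch (my addition) tabulating the pmf of the number of heads in three fair tosses:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# pmf of X = number of heads in three fair coin tosses.
counts = Counter(t.count("H") for t in product("HT", repeat=3))
pmf = {k: Fraction(c, 8) for k, c in sorted(counts.items())}
print(pmf)                # masses 1/8, 3/8, 3/8, 1/8 for k = 0, 1, 2, 3
print(sum(pmf.values()))  # 1, as required of a pmf
```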
Expectation
Long-run average of the values taken by the random variable when the same experiment is performed repeatedly. Also called the "mean" or the "first moment".
$$E[X] = \sum_{k \in \mathbb{Z}} k\, p_X(k)$$
$$E[X] = \int_{-\infty}^{+\infty} s f_X(s)\, ds$$
If $c$ is a constant, then $E[c] = c$.
Linearity of expectation:
$$E[aX + bY] = aE[X] + bE[Y]$$
$X$ and $Y$ independent $\implies E[XY] = E[X]E[Y]$
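As a concrete instance (my addition): for $X$ = number of heads in three fair tosses, the discrete formula gives
$$E[X] = 0 \cdot \tfrac{1}{8} + 1 \cdot \tfrac{3}{8} + 2 \cdot \tfrac{3}{8} + 3 \cdot \tfrac{1}{8} = \tfrac{3}{2},$$
which matches the indicator-based computation on the next slide.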
Indicator random variable
The indicator random variable for the event $E$ is given by
$$\mathbf{1}_E(\omega) = \begin{cases} 1, & \text{if } \omega \in E \\ 0, & \text{if } \omega \notin E \end{cases}$$
Important fact:
$$E[\mathbf{1}_E] = 1 \cdot P(E) + 0 \cdot P(E^c) = P(E)$$
Indicators are tremendously useful for counting the expected number of occurrences!
Consider the coin toss experiment. Let $E_i$ be the event that the $i$th toss results in a head. Then the total number of heads is given by $\mathbf{1}_{E_1} + \mathbf{1}_{E_2} + \mathbf{1}_{E_3}$, and the expected number of heads is
$$E[\mathbf{1}_{E_1} + \mathbf{1}_{E_2} + \mathbf{1}_{E_3}] = E[\mathbf{1}_{E_1}] + E[\mathbf{1}_{E_2}] + E[\mathbf{1}_{E_3}] \quad \text{(linearity)}$$
$$= P(E_1) + P(E_2) + P(E_3) = \frac{3}{2}$$
Note: We did not require the $E_i$ to be independent or identically distributed. An example will be given in the next two slides.
1. A total of $r$ balls are placed, one at a time, into $k$ boxes. Each ball is placed independently into box $i$ with probability $p_i$ (with $p_1 + p_2 + \cdots + p_k = 1$).
(a) What is the expected number of empty boxes?
(b) What is the expected number of boxes containing at least 2 balls?
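A simulation sketch for part (a) (my addition, with hypothetical parameter values): summing indicators of the events "box $i$ is empty" gives $\sum_i (1 - p_i)^r$, and the Monte Carlo estimate below should agree with it.

```python
import random

def empty_boxes(r, p):
    """Place r balls into len(p) boxes at random; return how many boxes stay empty."""
    filled = set(random.choices(range(len(p)), weights=p, k=r))
    return len(p) - len(filled)

r, p, trials = 5, [0.5, 0.25, 0.25], 100_000
estimate = sum(empty_boxes(r, p) for _ in range(trials)) / trials
exact = sum((1 - pi) ** r for pi in p)  # indicator of "box i empty" has mean (1 - p_i)^r
print(estimate, exact)  # the two numbers should be close
```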
2. There are $r$ red balls and $b$ black balls in a box. We take out the balls one at a time, uniformly at random. What is the expected number of red balls that precede all the black balls?
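A brute-force check for small cases (my addition): the indicator argument says each red ball precedes all $b$ black balls with probability $\frac{1}{b+1}$, giving an expectation of $\frac{r}{b+1}$, and exhaustive enumeration agrees.

```python
from itertools import permutations

def expected_reds_before_blacks(r, b):
    """Average number of reds drawn before the first black, over all orderings."""
    balls = ["R"] * r + ["B"] * b
    orderings = set(permutations(balls))  # each distinct ordering is equally likely
    return sum(perm.index("B") for perm in orderings) / len(orderings)

print(expected_reds_before_blacks(3, 2))  # r / (b + 1) = 1.0
```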
Conditional probability
Coin toss experiment: suppose your friend tells you that there is at most one head. What is the probability that all three tosses show the same face?
Definition: Given two events $A$ and $B$ such that $P(B) > 0$, the conditional probability of $A$ given $B$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Events $A$ and $B$ are independent $\iff P(A \mid B) = P(A)$.
Law of total probability: given any partition of events $E_1, E_2, \ldots, E_n$ of $\Omega$ where $P(E_i) > 0$,
$$P(A) = \sum_{i=1}^{n} P(E_i) P(A \mid E_i)$$
Conditional probability
e.g. coin toss experiment
$A$ denotes the event that all the tosses have the same face
$B$ denotes the event that there is at most one head
$C$ denotes the event that there is at least one head and at least one tail
$$P(A) = \frac{1}{4}, \quad P(B) = \frac{1}{2}, \quad P(C) = \frac{3}{4}$$
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/8}{1/2} = \frac{1}{4} = P(A)$$
$$P(A \mid C) = \frac{P(A \cap C)}{P(C)} = \frac{0}{3/4} = 0 \ne P(A)$$
Conditional expectation
Long-run average of the values taken by a random variable conditioned on the occurrence of a particular event.
$$E[X \mid A] = \sum_{k \in \mathbb{Z}} k\, p_X(k \mid A)$$
$$E[X \mid A] = \int_{-\infty}^{+\infty} s f_X(s \mid A)\, ds$$
$X$ and $Y$ independent $\implies E[X \mid Y = y] = E[X]$
Law of total expectation: given any partition of events $E_1, E_2, \ldots, E_n$ of $\Omega$ where $P(E_i) > 0$,
$$E[X] = \sum_{i=1}^{n} P(E_i) E[X \mid E_i]$$
Conditional expectation
The law of total expectation allows us to break a problem into pieces!
A coin has a probability $p$ of showing heads. Let $N$ be a discrete random variable taking non-negative values. Assume that $N$ is independent of our coin tosses. What is the expected number of heads that show up if we toss this coin $N$ times?
$$X_i = \begin{cases} 1, & \text{if the } i\text{th toss is a head} \\ 0, & \text{if the } i\text{th toss is a tail} \end{cases} \quad \text{(indicator random variable)}$$
$$S = X_1 + X_2 + \cdots + X_N$$
$$E[S] = E[E[S \mid N]] \quad \text{(law of total expectation)}$$
$$= E[E[X_1 + X_2 + \cdots + X_N \mid N]]$$
$$= E[E[X_1 \mid N] + E[X_2 \mid N] + \cdots + E[X_N \mid N]] \quad \text{(linearity)}$$
$$= E[p + p + \cdots + p] = E[Np] = E[N]\,p \quad \text{(independence)}$$
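A quick simulation (my addition, with a hypothetical distribution for $N$) to sanity-check $E[S] = E[N]\,p$:

```python
import random

p, trials = 0.3, 200_000
total_heads = 0
for _ in range(trials):
    n = random.randint(0, 10)  # N uniform on {0, ..., 10}, so E[N] = 5
    total_heads += sum(random.random() < p for _ in range(n))  # heads among n tosses
print(total_heads / trials, 5 * p)  # simulated E[S] vs. E[N] * p = 1.5
```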
3. A biased coin has a probability 𝑝 of showing heads when tossed. The coin is
tossed repeatedly until a head occurs. What is the expected number of tosses
required to obtain a head? Solve this without calculating an infinite sum.
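One standard route (my addition; it also honors the "no infinite sum" request) is to condition on the first toss. With probability $p$ we are done after one toss; with probability $1-p$ we have used one toss and are back where we started, so
$$E[X] = p \cdot 1 + (1-p)(1 + E[X]) \implies E[X] = \frac{1}{p}.$$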
Know the following random variables well
• Bernoulli
• Binomial
• Geometric
• Poisson
• Uniform
• Exponential
• Normal/Gaussian
For each: when they arise, their distribution functions, their moments, and how they relate to each other.
Bernoulli random variable
• A biased coin has a probability $p$ of showing heads when tossed. Let $X$ take the value 1 if the outcome is heads and 0 if the outcome is tails.
• Often, the events corresponding to $X = 1$ and $X = 0$ are referred to as "success" and "failure".
$$p_X(k) = \begin{cases} 1 - p, & \text{if } k = 0 \\ p, & \text{if } k = 1 \end{cases}$$
$$E[X] = p, \quad Var(X) = p(1-p)$$
• Notation: $X \sim Bernoulli(p)$
Binomial random variable
• A biased coin has a probability $p$ of showing heads when tossed. Let $X$ denote the total number of heads obtained over $n$ tosses.
• Equivalently, perform $n$ independent Bernoulli trials $X_1, X_2, \ldots, X_n$ and let $X = X_1 + X_2 + \cdots + X_n$ denote the total number of successes.
$$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$
$$E[X] = np, \quad Var(X) = np(1-p)$$
• Notation: $X \sim Binomial(n, p)$
• Exercise: use indicator variables and linearity of expectation to derive the expectation of the binomial random variable (a sketch follows below).
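A sketch of that exercise (my addition): writing $X = X_1 + X_2 + \cdots + X_n$ with each $X_i \sim Bernoulli(p)$, linearity gives
$$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p = np,$$
with no independence needed for this step.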
Geometric random variable
• A biased coin has a probability $p$ of showing heads when tossed. Suppose the coin is tossed repeatedly until a head occurs. Let $X$ denote the number of tosses required to obtain a head.
• $X$ is also called a waiting time till the first success.
$$p_X(k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots$$
$$E[X] = \frac{1}{p}$$
• Notation: $X \sim Geometric(p)$
Poisson random variable
• Number of phone calls arriving at a call center per minute.
• Number of jumps in a stock price in a given time interval.
• Number of misprints on the front page of a newspaper.
$$p_X(k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
$$E[X] = \lambda$$
• Notation: $X \sim Poisson(\lambda)$
• Can be used to approximate a $Binomial(n, p)$ when $n$ is very large and $p$ is very small by setting $\lambda = np$ (see the sketch below).
• Misprints: $n$ is the total number of characters and $p$ is the probability of a character being misprinted.
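To see the approximation in action, a small sketch (my addition, with hypothetical values $n = 1000$, $p = 0.002$) comparing the $Binomial(n, p)$ pmf with $Poisson(\lambda = np)$:

```python
from math import comb, exp, factorial

n, p = 1000, 0.002
lam = n * p  # lambda = np = 2

for k in range(6):
    binom = comb(n, k) * p**k * (1 - p) ** (n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom, 5), round(poisson, 5))  # the two columns nearly coincide
```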
Uniform random variable
• Let $X$ denote a random variable that takes any value inside $[a, b]$ with equal probability.
$$f_X(x) = \begin{cases} \frac{1}{b-a}, & \text{if } a < x < b \\ 0, & \text{otherwise} \end{cases}$$
$$F_X(x) = \begin{cases} 0, & \text{if } x \le a \\ \frac{x-a}{b-a}, & \text{if } a < x < b \\ 1, & \text{if } x \ge b \end{cases}$$
$$E[X] = \frac{a+b}{2}, \quad Var(X) = \frac{(b-a)^2}{12}$$
• Notation: $X \sim Uniform(a, b)$
Exponential random variable
• Time it takes for a server to complete a request.
• Time it takes before your next telephone call.
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$$
$$F_X(x) = \begin{cases} 1 - e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$$
$$E[X] = \frac{1}{\lambda}$$
• Notation: $X \sim Exponential(\lambda)$
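A quick empirical check (my addition) that samples match the stated mean and cdf:

```python
import random
from math import exp

lam, n = 2.0, 200_000
samples = [random.expovariate(lam) for _ in range(n)]

print(sum(samples) / n, 1 / lam)  # sample mean vs. E[X] = 1/lambda = 0.5
x = 1.0
print(sum(s <= x for s in samples) / n, 1 - exp(-lam * x))  # empirical vs. exact cdf at x = 1
```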
Normal/Gaussian random variable
• Velocities of molecules in a gas.
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
$$E[X] = \mu, \quad Var(X) = \sigma^2$$
• Notation: $X \sim Normal(\mu, \sigma^2)$
• Arises as a limit of the sample average of independent and identically distributed (i.i.d.) random variables, e.g. $Binomial(n, p)$ as $n \to \infty$. This is known as the Central Limit Theorem, which will be presented in a subsequent lecture.
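A small CLT illustration (my addition, with hypothetical parameters): standardized $Binomial(n, p)$ draws should fall within one standard deviation of the mean about 68% of the time, as a standard normal does.

```python
import random
from math import sqrt

n, p, trials = 400, 0.3, 20_000
mu, sigma = n * p, sqrt(n * p * (1 - p))

hits = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))  # one Binomial(n, p) draw
    hits += abs(x - mu) < sigma                     # within one standard deviation?
print(hits / trials)  # roughly 0.68, matching the standard normal
```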
Questions?