Intro to Probability
Zhi Wei
1
Outline
Basic concepts in probability theory
Random variable and probability distribution
Bayes’ rule
2
Introduction

Probability is the study of randomness and
uncertainty.

In the early days, probability was associated with
games of chance (gambling).
3
Simple Games Involving Probability
Game: A fair die is rolled. If the result is 2, 3, or 4,
you win $1; if it is 5, you win $2; but if it is 1 or 6,
you lose $3.
Should you play this game?
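One way to answer the question is to compute the expected winnings per play. The sketch below assumes only the payoffs stated above and a fair die; the variable names are illustrative.

```r
# Payoff for each face 1..6: lose $3 on 1 or 6, win $1 on 2, 3, 4, win $2 on 5
payoff <- c(-3, 1, 1, 1, 2, -3)
prob   <- rep(1/6, 6)            # fair die: each face has probability 1/6

# Expected winnings per play = sum over faces of payoff * probability
expected <- sum(payoff * prob)
expected
```

A negative value means that, on average, the player loses money on each play.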
4
Random Experiment
A random experiment is a process whose outcome is uncertain.
Examples:
Tossing a coin once or several times
Picking a card or cards from a deck
Measuring temperature of patients
...
5
Events & Sample Spaces
Sample Space
The sample space is the set of all possible outcomes.
Simple Events
The individual outcomes are called simple events.
Event
An event is any subset of the whole sample space.
6
Example
Experiment: Toss a coin 3 times.

Sample space
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Examples of events include
A = {at least two heads} = {HHH, HHT, HTH, THH}
B = {exactly two tails} = {HTT, THT, TTH}
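The sample space and the two events can also be enumerated by machine. This is a minimal sketch using base R's expand.grid(); the names tosses, outcome and n_heads are illustrative.

```r
# All 2^3 outcomes of three coin tosses, one row per outcome
tosses <- expand.grid(t1 = c("H", "T"), t2 = c("H", "T"), t3 = c("H", "T"),
                      stringsAsFactors = FALSE)
outcome <- paste0(tosses$t1, tosses$t2, tosses$t3)  # e.g. "HHT"
n_heads <- rowSums(tosses == "H")                   # heads in each outcome

A <- outcome[n_heads >= 2]   # at least two heads
B <- outcome[n_heads == 1]   # exactly two tails (i.e. exactly one head)
A
B
```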
7
Basic Concepts (from Set Theory)

A ∪ B, the union of two events A and B, is the event consisting of all outcomes that are either in A or in B or in both events.

A ∩ B (AB), the intersection of two events A and B, is the event consisting of all outcomes that are in both events.

A^c, the complement of an event A, is the set of all outcomes in Ω that are not in A.

A-B, the set difference, is the event consisting of all outcomes that are in A but not in B.

When two events A and B have no outcomes in common, they are said to be mutually exclusive, or disjoint, events.
8
Example
Experiment: toss a coin 10 times and observe the number of heads.
Let A = {0, 2, 4, 6, 8, 10},
B = {1, 3, 5, 7, 9}, C = {0, 1, 2, 3, 4, 5}.
A ∪ B = {0, 1, …, 10} = Ω.
A ∩ B contains no outcomes. So A and B are mutually exclusive.
C^c = {6, 7, 8, 9, 10}, A ∩ C = {0, 2, 4}, A-C = {6, 8, 10}
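The same set operations are available in base R, which gives a quick way to check the example. A minimal sketch; omega is an illustrative name for the sample space.

```r
omega <- 0:10                     # possible numbers of heads in 10 tosses
A <- c(0, 2, 4, 6, 8, 10)
B <- c(1, 3, 5, 7, 9)
C <- 0:5

union(A, B)        # union of A and B: the whole sample space
intersect(A, B)    # intersection of A and B: empty, so A and B are disjoint
setdiff(omega, C)  # complement of C within the sample space
intersect(A, C)    # intersection of A and C
setdiff(A, C)      # set difference A - C
```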
9
Rules

Commutative Laws:
A ∪ B = B ∪ A, A ∩ B = B ∩ A
Associative Laws:
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributive Laws:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
DeMorgan’s Laws:
$\left(\bigcup_{i=1}^{n} A_i\right)^c = \bigcap_{i=1}^{n} A_i^c$, $\left(\bigcap_{i=1}^{n} A_i\right)^c = \bigcup_{i=1}^{n} A_i^c$
10
Venn Diagram

[Figure: Venn diagram of two overlapping events A and B, with the overlap labeled A∩B]
11
Probability

A probability is a number assigned to each subset (event) of a sample space Ω.

Probability distributions satisfy the following rules:
12
Axioms of Probability

For any event A, 0 ≤ P(A) ≤ 1.

P(Ω) = 1.

If A1, A2, …, An is a partition of A, then
P(A) = P(A1) + P(A2) + ... + P(An)
(A1, A2, …, An is called a partition of A if A1 ∪ A2 ∪ … ∪ An = A and A1, A2, …, An are mutually exclusive.)
13
Properties of Probability

For any event A, P(A^c) = 1 - P(A).
P(A-B) = P(A) - P(A ∩ B)
If B ⊆ A, then P(A-B) = P(A) - P(B)
For any two events A and B,
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
For three events, A, B, and C,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C).
14
Example

In a certain population, 10% of the people are rich, 5% are famous, and 3% are both rich and famous. A person is randomly selected from this population. What is the chance that the person is
not rich?
rich but not famous?
either rich or famous?
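Each question maps onto one of the properties of probability listed earlier (complement, set difference, union). The sketch below works them out from the three stated percentages; the variable names are illustrative.

```r
p_rich   <- 0.10   # P(rich)
p_famous <- 0.05   # P(famous)
p_both   <- 0.03   # P(rich and famous)

p_not_rich        <- 1 - p_rich                  # complement rule
p_rich_not_famous <- p_rich - p_both             # P(A - B) = P(A) - P(A and B)
p_rich_or_famous  <- p_rich + p_famous - p_both  # addition rule for a union

c(p_not_rich, p_rich_not_famous, p_rich_or_famous)
```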
15
Joint Probability


For events A and B, the joint probability Pr(AB) stands for the probability that both events happen.
Example: A = {HT}, B = {HT, TH}, what is the joint probability Pr(AB)?
16
Independence

Two events A and B are independent in case
Pr(AB) = Pr(A)Pr(B)

Example 1: Drug test
          Women   Men
Success   200     1800
Failure   1800    200
A = {Patient is a woman}
B = {Drug fails}
Are A and B independent?
Example 2: toss a coin 3 times. Let
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
A = {having both T and H}, B = {at most one T}
Are A and B independent? What if the coin is tossed 2 times?
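Both examples can be checked directly against the definition Pr(AB) = Pr(A)Pr(B). The sketch below assumes the drug-test counts read as a Success/Failure by Women/Men table and treats all coin sequences as equally likely; check_tosses() is a hypothetical helper written for this illustration.

```r
# Example 1: counts from the drug-test table (rows: outcome, columns: sex)
counts <- matrix(c(200, 1800,
                   1800, 200), nrow = 2, byrow = TRUE,
                 dimnames = list(c("Success", "Failure"), c("Women", "Men")))
total <- sum(counts)
p_A  <- sum(counts[, "Women"]) / total        # Pr(patient is a woman)
p_B  <- sum(counts["Failure", ]) / total      # Pr(drug fails)
p_AB <- counts["Failure", "Women"] / total    # Pr(woman and drug fails)
drug_independent <- isTRUE(all.equal(p_AB, p_A * p_B))

# Example 2: enumerate all sequences of n fair coin tosses
check_tosses <- function(n) {
  g <- expand.grid(rep(list(c("H", "T")), n), stringsAsFactors = FALSE)
  tails <- rowSums(g == "T")
  A <- tails > 0 & tails < n     # both T and H appear
  B <- tails <= 1                # at most one T
  isTRUE(all.equal(mean(A & B), mean(A) * mean(B)))
}

drug_independent   # Example 1
check_tosses(3)    # Example 2, three tosses
check_tosses(2)    # Example 2, two tosses
```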

Independence

If A and B are independent, then A and B^c, B and A^c, and A^c and B^c are also independent.

Consider the experiment of tossing a coin twice.
Example I:
A = {HT, HH}, B = {HT}
Is event A independent of event B?
Example II:
A = {HT}, B = {TH}
Is event A independent of event B?

Disjoint ≠ Independence

If A is independent of B, and B is independent of C, will A be independent of C?
20
Conditioning

If A and B are events with Pr(A) > 0, the conditional probability of B given A is
$\Pr(B \mid A) = \frac{\Pr(AB)}{\Pr(A)}$

Example: Drug test
          Women   Men
Success   200     1800
Failure   1800    200
A = {Patient is a woman}
B = {Drug fails}
Pr(B|A) = ?
Pr(A|B) = ?
Given A is independent of B, what is the relationship between Pr(A|B) and Pr(A)?
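The two conditional probabilities in the drug-test example follow from the counts by restricting attention to the conditioning event. A minimal sketch, assuming the table is read as Success/Failure rows and Women/Men columns:

```r
counts <- matrix(c(200, 1800,
                   1800, 200), nrow = 2, byrow = TRUE,
                 dimnames = list(c("Success", "Failure"), c("Women", "Men")))

# Pr(B|A): among women, the fraction for whom the drug fails
p_B_given_A <- counts["Failure", "Women"] / sum(counts[, "Women"])
# Pr(A|B): among failures, the fraction who are women
p_A_given_B <- counts["Failure", "Women"] / sum(counts["Failure", ])
# Unconditional Pr(A), for comparison with Pr(A|B)
p_A <- sum(counts[, "Women"]) / sum(counts)

c(p_B_given_A, p_A_given_B, p_A)
```

Here Pr(A|B) differs from Pr(A), which is another way of seeing that A and B are not independent in this table.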
23
Which Drug is Better?
24
Simpson’s Paradox: View I
A = {Using Drug I}
B = {Using Drug II}
C = {Drug succeeds}
          Drug I   Drug II
Success   219      1010
Failure   1801     1190
Pr(C|A) ~ 10%
Pr(C|B) ~ 50%
Drug II is better than Drug I
25
Simpson’s Paradox: View II
A = {Using Drug I}
B = {Using Drug II}
C = {Drug succeeds}
Female Patient: Pr(C|A) ~ 10%, Pr(C|B) ~ 5%
Male Patient: Pr(C|A) ~ 100%, Pr(C|B) ~ 50%
Drug I is better than Drug II
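The reversal can be reproduced with concrete counts. The per-gender tables are not given in the slides, so the numbers below are hypothetical: they were chosen only so that they add up to the pooled totals of View I (219/1801 and 1010/1190) and give roughly the stated per-gender rates.

```r
# HYPOTHETICAL per-gender counts (an assumption, not from the slides)
female <- matrix(c(200, 10,
                   1800, 190), nrow = 2, byrow = TRUE,
                 dimnames = list(c("Success", "Failure"), c("Drug I", "Drug II")))
male   <- matrix(c(19, 1000,
                   1, 1000), nrow = 2, byrow = TRUE,
                 dimnames = list(c("Success", "Failure"), c("Drug I", "Drug II")))
pooled <- female + male          # reproduces the View I table

# Success rate per drug = successes / patients given that drug
success_rate <- function(tab) tab["Success", ] / colSums(tab)

success_rate(female)   # Drug I ahead among women
success_rate(male)     # Drug I ahead among men
success_rate(pooled)   # Drug II ahead once the groups are pooled
```

With these counts most Drug I patients are women and most Drug II patients are men, which is what allows the pooled comparison to point the other way.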
28
Conditional Independence
Events A and B are conditionally independent given C in case
Pr(AB|C) = Pr(A|C)Pr(B|C)
A set of events {Ai} is conditionally independent given C in case
$\Pr(A_1 \cdots A_n \mid C) = \prod_{i=1}^{n} \Pr(A_i \mid C)$
29
Conditional Independence (cont’d)

Example: There are three events: A, B, C
Pr(A) = Pr(B) = Pr(C) = 1/5
Pr(AC) = Pr(BC) = 1/25, Pr(AB) = 1/10
Pr(ABC) = 1/125
Are A and B independent?
Are A and B conditionally independent given C?
A and B being independent ≠ A and B being conditionally independent
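Both questions reduce to comparing a joint probability with a product. A minimal sketch using only the probabilities given in the example:

```r
p_A <- 1/5; p_B <- 1/5; p_C <- 1/5
p_AC <- 1/25; p_BC <- 1/25; p_AB <- 1/10
p_ABC <- 1/125

# Independence: is Pr(AB) equal to Pr(A) Pr(B)?
independent <- isTRUE(all.equal(p_AB, p_A * p_B))

# Conditional independence given C: is Pr(AB|C) equal to Pr(A|C) Pr(B|C)?
p_AB_given_C <- p_ABC / p_C
p_A_given_C  <- p_AC / p_C
p_B_given_C  <- p_BC / p_C
cond_independent <- isTRUE(all.equal(p_AB_given_C, p_A_given_C * p_B_given_C))

c(independent = independent, cond_independent = cond_independent)
```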
30
Random Variable and Distribution
A random variable X is a numerical outcome of a random experiment.
The distribution of a random variable is the collection of possible outcomes along with their probabilities:
Categorical case: $\Pr(X = x) = p(x)$
Numerical case: $\Pr(a \le X \le b) = \int_a^b p(x)\,dx$
32
Random Variable Distributions
Cumulative Distribution Function (CDF): $F(x) = \Pr(X \le x)$
Probability Density Function (PDF): $p(x) = \frac{dF(x)}{dx}$
33
Random Variable: Example
Let S be the set of all sequences of three rolls of a die. Let X be the sum of the number of dots on the three rolls.
What are the possible values for X?
Pr(X = 5) = ?, Pr(X = 10) = ?
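Because all 6^3 = 216 sequences are equally likely for a fair die, these probabilities can be found by enumeration. A minimal sketch, assuming a fair die; rolls and X are illustrative names.

```r
# Every sequence of three rolls, one row per sequence (216 rows)
rolls <- expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6)
X <- rowSums(rolls)    # sum of dots for each sequence

range(X)               # smallest and largest possible value of X
mean(X == 5)           # Pr(X = 5): fraction of sequences summing to 5
mean(X == 10)          # Pr(X = 10)
```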
34
Expectation value

Division of the stakes problem
Henry and Tony play a game. They toss a fair coin: if it lands heads, Henry wins; if tails, Tony wins. They contribute equally to a prize pot of $100, and agree in advance that the first player who has won 3 rounds will collect the entire prize. However, the game is interrupted for some reason after 3 rounds. They got 2 H and 1 T. How should they divide the pot fairly?
a) It seems unfair to divide the pot equally, since Henry has won 2 out of 3 rounds. Then, how about Henry gets 2/3 of $100?
b) Other thoughts?
X   0      100
P   0.25   0.75
X is what Henry would win if the game were not interrupted.
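The table for X suggests dividing the pot by expected winnings. The sketch below assumes a fair coin: Tony, who needs two more wins, takes the pot only if the next two tosses are both tails.

```r
p_tony_wins  <- 0.5^2             # Tony needs T, T on the next two tosses
p_henry_wins <- 1 - p_tony_wins

x <- c(0, 100)                    # Henry's possible winnings
p <- c(p_tony_wins, p_henry_wins) # matching probabilities (0.25, 0.75)

henry_share <- sum(x * p)         # expected value of X
tony_share  <- 100 - henry_share
c(henry_share, tony_share)
```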
35
Expectation

Definition: the expectation of a random variable is
$E[X] = \sum_x x \Pr(X = x)$, discrete case
$E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$, continuous case

Properties
Summation: For any n≥1, and any constants k1,…,kn
$E\left[\sum_{i=1}^{n} k_i X_i\right] = \sum_{i=1}^{n} k_i E[X_i]$
Product: If X1, X2, …, Xn are independent
$E\left[\prod_{i=1}^{n} X_i\right] = \prod_{i=1}^{n} E[X_i]$
36
Expectation: Example




Let S be the set of all sequences of three rolls of a die. Let X be the sum of the number of dots on the three rolls.
What is E(X)?
Let S be the set of all sequences of three rolls of a die. Let X be the product of the number of dots on the three rolls.
What is E(X)?
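Both expectations can be checked by enumeration and compared with the summation and product properties. A minimal sketch, assuming a fair die with independent rolls:

```r
rolls <- expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6)  # all 216 sequences

e_one  <- mean(1:6)                                  # E of a single roll
e_sum  <- mean(rowSums(rolls))                       # E[sum], by enumeration
e_prod <- mean(rolls$d1 * rolls$d2 * rolls$d3)       # E[product], by enumeration

# Compare with the properties: 3 * E[one roll] and E[one roll]^3
c(e_sum, 3 * e_one)
c(e_prod, e_one^3)
```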
37
Variance

Definition: the variance of a random variable X is the expectation of (X-E[X])^2:
$\mathrm{Var}(X) = E\big((X - E[X])^2\big)$
$= E\big(X^2 + E[X]^2 - 2XE[X]\big)$
$= E(X^2) + E[X]^2 - 2E(XE[X])$
$= E[X^2] - E[X]^2$

Properties
For any constant C, Var(CX) = C^2 Var(X)
If X1, X2, …, Xn are independent
Var(X1 + X2 + ... + Xn) = Var(X1) + Var(X2) + ... + Var(Xn)
38
Bernoulli Distribution
The outcome of an experiment can be either success (i.e., 1) or failure (i.e., 0).
Pr(X=1) = p, Pr(X=0) = 1-p, or
$p(x) = p^x (1-p)^{1-x}$
E[X] = p, Var(X) = p(1-p)
Using sample() to generate Bernoulli samples
> n = 10; p = 1/4;
> sample(0:1, size=n, replace=TRUE, prob=c(1-p, p))
[1] 0 1 0 0 0 1 0 0 0 1
39
Binomial Distribution

n draws of a Bernoulli distribution
Xi ~ Bernoulli(p), $X = \sum_{i=1}^{n} X_i$, X ~ Bin(p, n)
Random variable X stands for the number of times that experiments are successful.
$\Pr(X = x) = p(x) = \binom{n}{x} p^x (1-p)^{n-x}$ for x = 0, 1, 2, ..., n, and 0 otherwise
n = the number of trials
x = the number of successes
p = the probability of success
E[X] = np, Var(X) = np(1-p)
40
the binomial distribution in R
dbinom(x, size, prob)
Try 7 times, equally likely to succeed or fail
> dbinom(3,7,0.5)
[1] 0.2734375
> barplot(dbinom(0:7,7,0.5),names.arg=0:7)
[Figure: bar plot of dbinom(0:7,7,0.5) for x = 0..7]
what if p ≠ 0.5?
> barplot(dbinom(0:7,7,0.1),names.arg=0:7)
[Figure: bar plot of dbinom(0:7,7,0.1) for x = 0..7]
Which distribution has greater variance?
p = 0.5: var = n*p*(1-p) = 7*0.5*0.5 = 7*0.25 = 1.75
p = 0.1: var = n*p*(1-p) = 7*0.1*0.9 = 7*0.09 = 0.63
[Figure: bar plots of the two binomial distributions (n = 7) for p = 0.5 and p = 0.1, x = 0..7]
briefly comparing an experiment to a distribution
nExpr = 1000
tosses = 7; y = rep(0,nExpr)
for (i in 1:nExpr) {
  x = sample(c("H","T"), tosses, replace = T)
  y[i] = sum(x=="H")
}
hist(y,breaks=-0.5:7.5)
lines(0:7,dbinom(0:7,7,0.5)*nExpr)
points(0:7,dbinom(0:7,7,0.5)*nExpr)
[Figure: "Histogram of y" (result of 1000 trials) with the theoretical distribution overlaid; x-axis: y, y-axis: Frequency]
Cumulative distribution
> barplot(dbinom(0:7,7,0.5),names.arg=0:7)
> barplot(pbinom(0:7,7,0.5),names.arg=0:7)
[Figure: two bar plots for x = 0..7: probability distribution P(X=x) and cumulative distribution P(X≤x)]
example: surfers on a website
Your site has a lot of visitors, 45% of whom are female
You’ve created a new section on gardening
Out of the first 100 visitors, 55 are female.
What is the probability that this many or more of the visitors are female?
P(X≥55) = 1 - P(X≤54) = 1-pbinom(54,100,0.45)

Another way to calculate cumulative probabilities
?pbinom
P(X≤x) = pbinom(x, size, prob, lower.tail = T)
P(X>x) = pbinom(x, size, prob, lower.tail = F)
> 1-pbinom(54,100,0.45)
[1] 0.02839342
> pbinom(54,100,0.45,lower.tail=F)
[1] 0.02839342
Female surfers visiting a section of a website
[Figure: probability distribution of the number of female visitors out of 100; what is the area under the curve?]
Cumulative distribution
> 1-pbinom(54,100,0.45)
[1] 0.02839342
<3 %
[Figure: cumulative distribution of the number of female visitors out of 100]
Plots of Binomial Distribution
51
Another discrete distribution: hypergeometric
Randomly draw n elements without replacement from a set of N elements, r of which are S’s (successes) and (N-r) of which are F’s (failures)
hypergeometric random variable x is the number of S’s in the draw of n elements
$p(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$
hypergeometric example
fortune cookies
there are N = 20 fortune cookies
r = 18 have a fortune, N-r = 2 are empty
What is the probability that out of n = 5 cookies, x = 5 have a fortune (that is, we don’t notice that some cookies are empty)?
> dhyper(5, 18, 2, 5)
[1] 0.5526316
So there is a greater than 50% chance that we won’t notice.
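The dhyper() result can be cross-checked against the hypergeometric formula written out with choose(). A minimal sketch; note that dhyper() takes the number of successes, the number of failures, and the number drawn, in that order.

```r
N <- 20   # cookies in total
r <- 18   # cookies with a fortune (successes)
n <- 5    # cookies drawn
x <- 5    # drawn cookies that have a fortune

# p(x) = C(r, x) * C(N - r, n - x) / C(N, n)
p_manual <- choose(r, x) * choose(N - r, n - x) / choose(N, n)
p_dhyper <- dhyper(x, r, N - r, n)

c(p_manual, p_dhyper)
```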
Gene Set Enrichment Analysis
hypergeometric and binomial
When the population N is (very) big, whether one samples with or without replacement is pretty much the same
100 cookies, 10 of which are empty
[Figure: side-by-side bars of hypergeometric and binomial probabilities; x-axis: number of full cookies out of 5]
code aside
> x = 1:5
> y1 = dhyper(1:5,90,10,5)   # hypergeometric probability
> y2 = dbinom(1:5,5,0.9)     # binomial probability
> tmp = as.matrix(t(cbind(y1,y2)))
> barplot(tmp,beside=T,names.arg=x)
Poisson distribution
# of events in a given interval
e.g. number of light bulbs burning out in a building in a year
# of people arriving in a queue per minute
$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
λ = mean # of events in a given interval
E[X] = λ, Var(X) = λ
Example: Poisson distribution
You got a box of 1,000 widgets.
The manufacturer says that the failure rate is 5 per box on average.
Your box contains 10 defective widgets. What are the odds?
> ppois(9,5,lower.tail=F)
[1] 0.03182806
About 3%; maybe the manufacturer is not quite honest.
Or the distribution is not Poisson?

Poisson approximation to binomial
If n is large (e.g. > 100) and n*p is moderate (p should be small) (e.g. < 10), the Poisson is a good approximation to the binomial with λ = n*p
[Figure: binomial and Poisson probabilities plotted together for x = 0..15]
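The quality of the approximation can be inspected numerically. The sketch below uses n = 1000 and p = 0.005 as assumed illustrative values (they echo the widget example, but the parameters behind the plot are not stated in the slides).

```r
n <- 1000          # assumed number of trials
p <- 0.005         # assumed success probability (small)
lambda <- n * p    # matching Poisson mean

x <- 0:15
binom <- dbinom(x, n, p)     # exact binomial probabilities
pois  <- dpois(x, lambda)    # Poisson approximation

max_diff <- max(abs(binom - pois))   # largest pointwise discrepancy
round(cbind(x, binom, pois), 4)
max_diff
```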
Plots of Poisson Distribution
59
Normal (Gaussian) Distribution
Normal distribution (aka “bell curve”)
fits many biological data well, e.g. height, weight
serves as an approximation to binomial, hypergeometric, Poisson because of the Central Limit Theorem
Well studied
Normal (Gaussian) Distribution

X ~ N(μ, σ²)
$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
$\Pr(a \le X \le b) = \int_a^b p(x)\,dx = \int_a^b \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx$
E[X] = μ, Var(X) = σ²
If X1 ~ N(μ1, σ1²), X2 ~ N(μ2, σ2²), and X1, X2 are independent
X = X1 + X2 ?
X = X1 - X2 ?
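For independent normals, the sum and the difference are again normal, with means μ1 + μ2 and μ1 - μ2 respectively and variance σ1² + σ2² in both cases. The simulation below is a rough check of the means and variances only; the parameter values are arbitrary choices for illustration.

```r
set.seed(1)                       # fixed seed so the run is reproducible
mu1 <- 1; sigma1 <- 1             # illustrative parameters (assumed)
mu2 <- 2; sigma2 <- 2

x1 <- rnorm(1e5, mean = mu1, sd = sigma1)
x2 <- rnorm(1e5, mean = mu2, sd = sigma2)
s <- x1 + x2                      # sum of the two variables
d <- x1 - x2                      # difference of the two variables

c(mean(s), var(s))   # compare with mu1 + mu2 and sigma1^2 + sigma2^2
c(mean(d), var(d))   # compare with mu1 - mu2 and sigma1^2 + sigma2^2
```

Note that rnorm() is parameterized by the standard deviation, not the variance.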
61
sampling from a normal distribution
x <- rnorm(1000)
h <- hist(x, plot=F)
ylim <- range(0,h$density,dnorm(0))
hist(x,freq=F,ylim=ylim)
curve(dnorm(x),add=T)
[Figure: "Histogram of x" on a density scale with the standard normal density curve overlaid]
Normal Approximation based on Central Limit Theorem

Central Limit Theorem
If x1, …, xn are i.i.d. with mean μ and variance σ², and n is large, then approximately
(x1+…+xn)/n ~ N(μ, σ²/n)
Or (x1+…+xn) ~ N(nμ, nσ²)

Example
A population is evenly divided on an issue (p=0.5). For a random sample of size 1000, what is the probability of having ≥550 in favor of it?
n=1000, xi~Bernoulli(p=0.5), i.e. E(xi)=p; V(xi)=p(1-p)
(x1+…+xn) ~ Binomial(n=1000, p=0.5)
Pr((x1+…+xn)>=550) = 1-pbinom(549,1000,0.5)
Normal Approximation:
(x1+…+xn) ~ N(np, np(1-p)) = N(500, 250)
Pr((x1+…+xn)>=550) ≈ 1-pnorm(550, mean=500, sd=sqrt(250))
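The exact binomial tail and its normal approximation can be compared side by side. A minimal sketch: since the count is discrete, P(X ≥ 550) equals P(X > 549), which is what lower.tail = FALSE returns for q = 549. The continuity-corrected variant is an addition for comparison, not part of the slide.

```r
n <- 1000; p <- 0.5

# Exact: P(X >= 550) = P(X > 549) for X ~ Binomial(n, p)
exact <- pbinom(549, n, p, lower.tail = FALSE)

# Normal approximation with mean np and variance np(1-p)
approx <- pnorm(550, mean = n * p, sd = sqrt(n * p * (1 - p)),
                lower.tail = FALSE)

# Same approximation with a continuity correction (evaluate at 549.5)
approx_cc <- pnorm(549.5, mean = n * p, sd = sqrt(n * p * (1 - p)),
                   lower.tail = FALSE)

c(exact = exact, approx = approx, approx_cc = approx_cc)
```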
63
d, p, q, and r functions in R
In R, a set of functions has been implemented for each of many common distributions.
r<dist>(n,<parameters>): generate n random samples
Possible distributions: binom, pois, hyper, norm, beta, chisq, f, gamma, t, unif, etc.
You can find other characteristics of distributions as well:
d<dist>(x,<parameters>): density at x
p<dist>(q,<parameters>): cumulative distribution function at q
q<dist>(p,<parameters>): inverse cdf (quantile function)
64
Example: Uniform Distribution

The uniform distribution on [a,b] has two parameters. The family name is unif. In R, the parameters are named min and max.
> dunif(x=1, min=0, max=3)
[1] 0.3333333
> punif(q=2, min=0, max=3)
[1] 0.6666667
> qunif(p=0.5, min=0, max=3)
[1] 1.5
> runif(n=5, min=0, max=3)
[1] 2.7866852 1.9627136 0.9594195 2.6293273 1.6277597
65
Lab Exercise
Using R for Introductory Statistics
Page 39: 2.4, 2.8-2.10
Page 54: 2.16, 2.23, 2.35, 2.26
Page 66: 2.30, 2.32, 2.34-2.36, 2.39, 2.41,
2.42, 2.43-2.46

66
Bayes’ Rule

Given two events A and B, suppose that Pr(A) > 0. Then
$\Pr(B \mid A) = \frac{\Pr(AB)}{\Pr(A)} = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A)}$
Example:
R: It is a rainy day
W: The grass is wet
Pr(R) = 0.8
Pr(W|R)   R     ¬R
W         0.7   0.4
¬W        0.3   0.6
Pr(R|W) = ?
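The question can be worked through with Bayes' rule, expanding Pr(W) over the two cases "rain" and "no rain". The sketch assumes the table columns are "rain" and "no rain", so that Pr(W | rain) = 0.7 and Pr(W | no rain) = 0.4; the variable names are illustrative.

```r
p_R            <- 0.8   # prior probability of rain
p_W_given_R    <- 0.7   # Pr(grass wet | rain)
p_W_given_notR <- 0.4   # Pr(grass wet | no rain), assumed table reading

# Pr(W) summed over the two cases
p_W <- p_W_given_R * p_R + p_W_given_notR * (1 - p_R)

# Bayes' rule: Pr(R|W) = Pr(W|R) Pr(R) / Pr(W)
p_R_given_W <- p_W_given_R * p_R / p_W
p_R_given_W
```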
68
Bayes’ Rule
R: It rains
W: The grass is wet
Pr(W|R)   R     ¬R
W         0.7   0.4
¬W        0.3   0.6
Information: Pr(W|R) (from R to W)
Inference: Pr(R|W) (from W back to R)
69
Bayes’ Rule
R: It rains
W: The grass is wet
Pr(W|R)   R     ¬R
W         0.7   0.4
¬W        0.3   0.6
Hypothesis H, Evidence E
Information: Pr(E|H)
Inference: Pr(H|E)
$\Pr(H \mid E) = \frac{\Pr(E \mid H)\Pr(H)}{\Pr(E)}$
Posterior: Pr(H|E); Likelihood: Pr(E|H); Prior: Pr(H)
70
Summation (Integration) out tip

Suppose that B1, B2, …, Bk form a partition of Ω:
B1 ∪ B2 ∪ … ∪ Bk = Ω and B1, B2, …, Bk are mutually exclusive
Suppose that Pr(Bi) > 0 and Pr(A) > 0. Then
$\Pr(B_i \mid A) = \frac{\Pr(A \mid B_i)\Pr(B_i)}{\Pr(A)} = \frac{\Pr(A \mid B_i)\Pr(B_i)}{\sum_{j=1}^{k} \Pr(AB_j)} = \frac{\Pr(A \mid B_i)\Pr(B_i)}{\sum_{j=1}^{k} \Pr(B_j)\Pr(A \mid B_j)}$
Key: Joint distribution!
73
Application of Bayes’ Rule
R: It rains
W: The grass is wet
U: People bring umbrella
[Diagram: R points to W and to U]
Pr(R) = 0.8
Pr(UW|R) = Pr(U|R)Pr(W|R)
Pr(UW|¬R) = Pr(U|¬R)Pr(W|¬R)
Pr(W|R)   R     ¬R
W         0.7   0.4
¬W        0.3   0.6
Pr(U|R)   R     ¬R
U         0.9   0.2
¬U        0.1   0.8
Pr(U|W) = ?
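Pr(U|W) can be obtained by summing the joint distribution over the two rain cases, using the stated conditional independence of U and W given R. The sketch assumes the table columns are "rain" and "no rain"; the vector names are illustrative.

```r
prior <- c(R = 0.8, notR = 0.2)   # Pr(rain), Pr(no rain)
p_W_R <- c(R = 0.7, notR = 0.4)   # Pr(W | rain), Pr(W | no rain)
p_U_R <- c(R = 0.9, notR = 0.2)   # Pr(U | rain), Pr(U | no rain)

# Pr(W): sum Pr(W | case) Pr(case) over the two cases
p_W <- sum(p_W_R * prior)

# Pr(UW): U and W are conditionally independent given the rain case,
# so Pr(UW | case) = Pr(U | case) Pr(W | case)
p_UW <- sum(p_U_R * p_W_R * prior)

p_U_given_W <- p_UW / p_W
p_U_given_W
```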
74
Acknowledgments
Peter N. Belhumeur: for some of the slides adapted or modified from his lecture slides at Columbia University
Rong Jin: for some of the slides adapted or modified from his lecture slides at Michigan State University
Jeff Solka: for some of the slides adapted or modified from his lecture slides at George Mason University
Brian Healy: for some of the slides adapted or modified from his lecture slides at Harvard University
77