Download Lecture slides - Columbia Statistics

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Binomial model (cont’d)
Introduction to Statistics
Lecture 16
„
For n random Bernoulli trials, X is the
number of successes. If the probability of
success is p, q=1-p is the probability of
failure,
⎛n⎞
P ( X = k ) = ⎜ ⎟ p k q n −k
⎝k ⎠
"n choose k"
Reminder: Quiz 3 next week.
⎛n⎞
n!
⎜ ⎟=
!(
k
k
n
− k )!
⎝ ⎠
5! = 5 × 4 × 3 × 2 × 1
Mean: np
Standard deviation: np(1 − p )
0! = 1
Example
„
„
Normal approximation of binomial
distribution
A large pool of candies. 30% red.
We randomly sampled 10, what is the
probability 5 of them are red?
Binom(100, 0.3)
Binom(10, 0.3)
Binom(100, 0.3)
Binom(10, 0.3)
25
8
20
„
What is the probability that fewer than 5 of
them are red?
Percent of Total
Percent of Total
6
15
10
4
2
5
0
0
0
2
4
6
rbinom(50000, 10, 0.3)
8
10
0
20
40
60
80
100
rbinom(50000, 100, 0.3)
When n is large, so that (np>10) and (nq>10), the binomial
distribution Binom(n, p) can be approximated by N (np, np(1 − p ))
Calculation steps
Normal approximation of Binomial model
„
„
„
„
„
„
„
Check Bernoulli trials?
‰ Success/failure
‰ Probability of success
‰ Independence
Find n—number of trials
Find p—probability of success for each trial
Write down the distribution Binom(n, p)
Check Normal approximation rules:
‰ np>10
‰ n(1-p)>10
Find mean and standard deviation of Binom(n, p)
Use normal distribution with the same mean and standard
deviation to carry out the calculation.
What if p is too small that np<10 even
when n is large
„
Poisson distribution can be used to
approximate the binomial distribution when
‰
‰
„
n is large
np is small
where λ=np.
P( X = k ) = e−λ
λk
k!
1
Poisson approximation of the
binomial distribution
Binom(10000, 0.0001)
Poisson with rate 1
Why study binomial distribution and its
approximations
„
Consider the following scenario:
0.3
0.3
‰
‰
In a large population, 60% of the people support
the current mayor.
You random sample one individual
0.2
0.2
„
„
0.1
A survey of 100 people, X is the number of people
that supports the mayor in this survey.
0.0
0.0
0.1
„
‰
Success: he/she supports the mayor
Failure: he/she doesn’t support the mayor
Probability of success = 60% ?
0 1 2 3 4 5
0 1 2 3 4 5
Different samples out of a population
Why probability models are
important? Or why binomial
distribution is important?
We are starting chapters on statistical
inference NOW.
„
„
„
„
Given a simple random sample with n observations
of a Bernoulli trial
X: the number of success
p: probability of success
Sample proportion:
X
pˆ =
Sample
2
(a,b,c,d)
1
Green
Red
2
Red
Green
3
Green
Green
4
Red
Green
c
d
2
3
b
4
a
Sampling distribution models
of the sample proportion
Sample proportion
„
Sample
1
(1,2,3,4)
1
„
„
‰
‰
n
Sample proportion is a random variable since X
is a random variable that has Binomial model.
Sampling variation: the p-hat calculated based on
different random samples from the sample
population differs due to chance.
Parameters:
„
The sample size, n
The probability of success: long-term relative frequency
(proportion) of success in the population—population
proportion, p
We can study the variation of sample proportion
using some probability model under some
assumptions—sampling distribution model.
2
By the normal approximation …
X is approximately normally distributed with
General situation …
„
mean np and standard deviation np (1 − p )
„
Then,
„
pˆ =
X
is approximately normally distributed with
n
mean p and standard deviation
p (1- p )
.
n
N: population size
n: sample size
To use normal approximation of binomial
model to study the random behavior of the
sample proportion, we need
n < N/10
np > 10 AND nq=n(1-p) > 10
‰
‰
Sampling distribution model of sample proportion!
Example
We just discussed about
proportion, what about mean,
the average?
As historically studied,15% of faculty
members leave campus during spring break.
For a (simple) random sample of 100 faculty
members, what is the chance that more than
20% of them left campus during the past
spring break?
100
-1
0
1
0
50
0
-2
2
-0.5
0.0
0.5
-0.2
100
6
8
10
1
2
3
4
5
1.4
4
2.0
2.2
2.4
2.6
-2
0
2
sample mean of 4 obs
4
1.9
2.0
2.1
2.2
150
100
300
50
200
0
100
0
-4
1.8
sample mean of 1000 obs
400
300
200
100
2
1.8
sample mean of 100 obs
0
0
population
1.6
sample mean of 10 obs
0.12
-2
0.2
0
0
0
population
-4
0.1
50
50
50 100
0
4
0.0
100 150 200 250
200
150
0.4
0.3
2
-0.1
sample mean of 1000 obs
200
sample mean of 100 obs
300
0.5
sample mean of 10 obs
0.2
density curve
50 100 150 200 250 300
150
300
200
4
0.1
0
density curve
2
2
0.0
1 n
1
X = ∑ X i so µ X = ( µ + ... + µ ) = µ
n i =1
n
σ2
⎛1⎞
σ X2 = ⎜ ⎟ (σ 2 + ... + σ 2 ) =
n
⎝n⎠
0
population
0.11
1
-2
0.10
1
50 100
-4
independent, and have the same probability distribution
µ X = µ σ X2 = σ 2
0.15
0
X 1 ,… , X n
n observations:
0.10
0.05
density curve
0.20
Central limit theorem
0.09
Mean and Variance
of Sample mean
Let’s see an online demo first.
0.08
„
-4
-2
0
2
sample mean of 2 obs
4
-3
-2
-1
0
1
2
3
sample mean of 10 obs
3
Sampling distribution model of
Sample mean
„
If the population distribution is
X ∼ N (µ ,
„
n
N (µ , σ )
σ
n
„
)
If the population distribution is not normal and with
mean µ and standard deviation σ
X ∼ N (µ ,
„
σ
In practice
„
Standard error: estimated standard deviation
of a sampling distribution.
Distinguish the sampling distribution and the
distribution of a sample.
) approximately when n is large.
How large is large?
Reading
„
Chapter 18
4
Related documents