Sampling Distribution Models
and the Central Limit Theorem
Transition from Data Analysis and Probability
to Statistics
OBJECTIVES
At the conclusion of this unit you should be able to:

1) Derive the correct sampling distribution model
when given the population parameters

2) Correctly apply the Central Limit Theorem to
calculate probabilities associated with a sample
proportion and sample mean
Probability:

From population to
sample (deduction)
Statistics:
 From sample to the
population (induction)
Sampling Distributions
Population parameter: a numerical descriptive
measure of a population.
(for example: µ, σ, p (a population proportion);
the numerical value of a population parameter
is usually not known)
Examples: µ = mean height of all NCSU students
p = proportion of Raleigh residents who favor
stricter gun control laws
Sample statistic: a numerical descriptive
measure calculated from sample data.
(e.g., x̄, s, p̂ (sample proportion))

Parameters; Statistics

In real life parameters of populations are
unknown and unknowable.
– For example, the mean height of US adult
(18+) men is unknown and unknowable


Rather than investigating the whole
population, we take a sample, calculate a
statistic related to the parameter of
interest, and make an inference.
The sampling distribution of the statistic
tells us how the value of the statistic
varies from sample to sample.
DEFINITION: Sampling Distribution

The sampling distribution of a sample
statistic calculated from a sample of n
measurements is the probability
distribution of values taken by the
statistic in all possible samples of size n
taken from the same population.
Based on all possible samples
of size n.
Constructing a Sampling
Distribution


In some cases the sampling distribution can
be determined exactly.
In other cases it must be approximated by
using a computer to draw some of the
possible samples of size n and drawing a
histogram.
Sampling Distribution Models
of Sample Proportions
Example: sampling distribution of
p̂, the sample proportion

If a coin is fair the probability of a head on
any toss of the coin is p = 0.5 (p is the
population parameter)
Imagine tossing this fair coin 4 times and
calculating the proportion p̂ of the 4 tosses
that result in heads (note that p̂ = x/4, where x
is the number of heads in 4 tosses).
Objective: determine the sampling
distribution of p̂, the proportion of heads in 4
tosses of a fair coin.
Example: Sampling distribution of p̂
Possible values of p̂: 0/4 = 0, 1/4 = .25, 2/4 = .50, 3/4 = .75, 4/4 = 1
There are 2^4 = 16 equally likely possible outcomes
(1 = head, 0 = tail)
(1,1,1,1) (1,1,1,0) (1,1,0,1) (1,0,1,1)
(0,1,1,1) (1,1,0,0) (1,0,1,0) (1,0,0,1)
(0,1,1,0) (0,1,0,1) (0,0,1,1) (1,0,0,0)
(0,1,0,0) (0,0,1,0) (0,0,0,1) (0,0,0,0)
p̂      Heads      P(p̂)
0.0     0 heads    1/16 = 0.0625
0.25    1 head     4/16 = 0.25
0.50    2 heads    6/16 = 0.375
0.75    3 heads    4/16 = 0.25
1.0     4 heads    1/16 = 0.0625
Sampling distribution of p̂ (cont.)
E(p̂) = 0(0.0625) + 0.25(0.25) + 0.50(0.375) + 0.75(0.25)
+ 1.0(0.0625) = 0.5 = p (the prob of heads)
\[ Var(\hat{p}) = (0 - 0.5)^2(0.0625) + (0.25 - 0.5)^2(0.25) + (0.5 - 0.5)^2(0.375) + (0.75 - 0.5)^2(0.25) + (1 - 0.5)^2(0.0625) = 0.0625 \]
\[ SD(\hat{p}) = \sqrt{Var(\hat{p})} = \sqrt{0.0625} = 0.25 \]
Note that
\[ SD(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.5 \times .5}{4}} = \frac{.5}{2} = 0.25 \]
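The enumeration above can be reproduced exactly in a few lines. The following is a minimal sketch in Python; the function and variable names are illustrative and do not come from the slides. It lists all 2^4 outcomes, tallies the proportion of heads, and recomputes the mean, variance and standard deviation with exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

def phat_distribution(n_tosses):
    """Exact sampling distribution of the proportion of heads in n fair tosses."""
    outcomes = list(product([0, 1], repeat=n_tosses))  # 1 = head, 0 = tail
    weight = Fraction(1, len(outcomes))                # outcomes are equally likely
    dist = {}
    for outcome in outcomes:
        phat = Fraction(sum(outcome), n_tosses)
        dist[phat] = dist.get(phat, Fraction(0)) + weight
    return dist

dist = phat_distribution(4)
mean = sum(value * prob for value, prob in dist.items())
variance = sum((value - mean) ** 2 * prob for value, prob in dist.items())

for value in sorted(dist):
    print(float(value), dist[value])
print("E =", mean, "Var =", variance, "SD =", sqrt(variance))
```

Exact fractions are used so that the results can be compared with 1/16, 4/16 and 6/16 without rounding error.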
Expected Value and Standard Deviation of
the Sampling Distribution of p̂

E(p̂) = p
\[ SD(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}} \]
where p is the “success” probability in the
sampled population and n is the sample
size
Shape of Sampling Distribution of
p̂

The sampling distribution of p̂ is
approximately normal when the sample
size n is large enough. “n large enough”
means np ≥ 10 and n(1-p) ≥ 10
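The rule of thumb is easy to encode. This is a small sketch, with `large_enough` as a hypothetical helper name and the threshold of 10 taken from the slide; the three calls use sample sizes and proportions that appear in this unit's examples.

```python
def large_enough(n, p, threshold=10):
    """Rule of thumb from the slides: np >= 10 and n(1 - p) >= 10."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(large_enough(4, 0.5))      # 4 tosses of a fair coin
print(large_enough(244, 0.44))   # binge drinking sample
print(large_enough(100, 0.12))   # left-handed students sample
```

The coin example fails the check (np = 2), which is why its sampling distribution was found by enumeration rather than with a normal model.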
Shape of Sampling Distribution of p̂
[Figure: population distribution with p = .65; bar of height 0.35 at 0 and bar of height 0.65 at 1]
Sampling distribution of p̂ for
samples of size n
Example
8% of the American Caucasian male
population is color blind.
Use a computer to simulate random
samples of size n = 1000

[Figure: histogram of p̂ values from simulated samples (2000
independent samples, each of size n = 1000 men); x-axis: p̂, y-axis: # of samples]
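A simulation like the one behind this histogram can be sketched as follows. The code assumes independent draws with success probability p = 0.08 and uses Python's standard `random` module with a fixed seed; it is not the program used to produce the slide, and the helper name `simulate_phats` is made up for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_phats(p, n, num_samples, seed=0):
    """Draw num_samples independent samples of size n; return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    phats = []
    for _sample in range(num_samples):
        successes = sum(1 for _person in range(n) if rng.random() < p)
        phats.append(successes / n)
    return phats

p, n = 0.08, 1000
phats = simulate_phats(p, n, num_samples=2000)
theory_sd = sqrt(p * (1 - p) / n)
print("mean of p-hats:", round(mean(phats), 4), "vs p =", p)
print("SD of p-hats:  ", round(stdev(phats), 4), "vs sqrt(p(1-p)/n) =", round(theory_sd, 4))
```

The simulated mean and standard deviation should land close to p and sqrt(p(1-p)/n), and a histogram of `phats` should look roughly bell-shaped.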
The sampling distribution model for a
sample proportion p̂
Provided that the sampled values are independent and the
sample size n is large enough, the sampling distribution of
p̂ is modeled by a normal distribution with E(p̂) = p and
standard deviation \( SD(\hat{p}) = \sqrt{pq/n} \), that is
\[ \hat{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right) \]
where q = 1 – p and where n large enough means np ≥ 10
and nq ≥ 10
The Central Limit Theorem will be a formal statement of
this fact.
Example: binge drinking by
college students




Study by Harvard School of Public Health:
44% of college students binge drink.
At a particular college 244 students were
surveyed; 36% admitted to binge drinking in
the past week
Assume the value 0.44 given in the Harvard
study is the proportion p of college students
that binge drink; that is 0.44 is the population
proportion p
Compute the probability that in a sample of
244 students, 36% or less have engaged in
binge drinking.
Example: binge drinking by
college students (cont.)

Let p̂ be the proportion in a sample of 244
that engage in binge drinking.
We want to compute P(p̂ ≤ .36)
E(p̂) = p = .44; \( SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.44 \times .56}{244}} = .032 \)
Since np = 244*.44 = 107.36 and nq =
244*.56 = 136.64 are both greater than 10,
we can model the sampling distribution of p̂
with a normal distribution, so …
p̂ ~ N(.44, .032)
So
\[ P(\hat{p} \le .36) = P\left(\frac{\hat{p} - .44}{.032} \le \frac{.36 - .44}{.032}\right) = P(z \le -2.5) = .0062 \]
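The same calculation can be done without a z table. This is a sketch using `statistics.NormalDist` from the Python standard library; `prob_phat_at_most` is a hypothetical helper name. Because the code keeps the unrounded standard deviation (about .0318 rather than .032), its answer comes out slightly below the slide's .0062.

```python
from math import sqrt
from statistics import NormalDist

def prob_phat_at_most(cutoff, p, n):
    """Normal-model approximation to P(p-hat <= cutoff)."""
    sd = sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=sd).cdf(cutoff)

sd = sqrt(0.44 * 0.56 / 244)
z = (0.36 - 0.44) / sd
prob = prob_phat_at_most(0.36, p=0.44, n=244)
print("SD =", round(sd, 4), "z =", round(z, 2), "P =", round(prob, 4))
```

The same function applies to any example of this kind once np ≥ 10 and nq ≥ 10 have been checked.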
Example: snapchat
by college students




A recent scientifically valid survey: 77% of college
students use Snapchat.
1136 college students were surveyed; 75% reported that
they use Snapchat.
Assume the value 0.77 given in the survey is the
proportion p of college students that use Snapchat;
that is, 0.77 is the population proportion p
Compute the probability that in a sample of 1136
students, 75% or less use Snapchat.
Example: Snapchat by college
students (cont.)

Let p̂ be the proportion in a sample of 1136
that use Snapchat.
We want to compute P(p̂ ≤ .75)
E(p̂) = p = .77; \( SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.77 \times .23}{1136}} = .0125 \)
Since np = 1136*.77 = 874.72 and nq =
1136*.23 = 261.28 are both greater than 10,
we can model the sampling distribution of p̂
with a normal distribution, so …
p̂ ~ N(.77, .0125)
So
\[ P(\hat{p} \le .75) = P\left(\frac{\hat{p} - .77}{.0125} \le \frac{.75 - .77}{.0125}\right) = P(z \le -1.6) = .0548 \]
Sampling Distribution Models
of Sample Means
Another Population Parameter of Frequent
Interest: the Population Mean µ
To estimate the unknown value of
µ, the sample mean x̄ is often used.
We need to examine the Sampling
Distribution of the Sample Mean x̄
(the probability distribution of all
possible values of x̄ based on a
sample of size n).
Example
Professor Stickler has a large statistics class
of over 300 students. He asked them the
ages of their cars and obtained the following
probability distribution:
x      2     3     4     5     6     7     8
p(x)   1/14  1/14  2/14  2/14  2/14  3/14  3/14

An SRS of size n = 2 is to be drawn from the population.
Find the sampling distribution of the
sample mean x̄ for samples of size n = 2.

Solution

7 possible ages (ages 2 through 8)

Total of 7^2 = 49 possible samples of size 2

All 49 possible samples with the
corresponding sample means and
probabilities are on the next slide
All 49 possible samples of size n = 2
x      2     3     4     5     6     7     8
p(x)   1/14  1/14  2/14  2/14  2/14  3/14  3/14

Sample  2,2    2,4    2,6    2,8    2,5    2,3    2,7    4,2    4,4    4,6    4,8    4,5    4,3    4,7    6,2    6,4    6,6
xbar    2      3      4      5      3.5    2.5    4.5    3      4      5      6      4.5    3.5    5.5    4      5      6
Prob    1/196  2/196  2/196  3/196  2/196  1/196  3/196  2/196  4/196  4/196  6/196  4/196  2/196  6/196  2/196  4/196  4/196

Sample  6,8    6,5    6,3    6,7    8,2    8,4    8,6    8,8    8,5    8,3    8,7    5,2    5,4    5,6    5,8    5,5
xbar    7      5.5    4.5    6.5    5      6      7      8      6.5    5.5    7.5    3.5    4.5    5.5    6.5    5
Prob    6/196  4/196  2/196  6/196  3/196  6/196  6/196  9/196  6/196  3/196  9/196  2/196  4/196  4/196  6/196  4/196

Sample  5,3    5,7    3,2    3,4    3,6    3,8    3,5    3,3    3,7    7,2    7,4    7,6    7,8    7,5    7,3    7,7
xbar    4      6      2.5    3.5    4.5    5.5    4      3      5      4.5    5.5    6.5    7.5    6      5      7
Prob    2/196  6/196  1/196  2/196  2/196  3/196  2/196  1/196  3/196  3/196  6/196  6/196  9/196  6/196  3/196  9/196
Probability Distribution of the Sample
Mean Age of 2 Cars
Solution (cont.)

Probability distribution of x̄ (the 49 samples grouped by their sample mean):
xbar     2      2.5    3      3.5    4       4.5     5       5.5     6       6.5     7       7.5     8
p(xbar)  1/196  2/196  5/196  8/196  12/196  18/196  24/196  26/196  28/196  24/196  21/196  18/196  9/196
This is the sampling distribution of x̄ because it
specifies the probability associated with each
possible value of x̄
From the sampling distribution above
P(4 ≤ x̄ ≤ 6) = p(4) + p(4.5) + p(5) + p(5.5) + p(6)
= 12/196 + 18/196 + 24/196 + 26/196 + 28/196 = 108/196
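The grouping step can be checked by brute force. The sketch below assumes the two cars are drawn independently (which is what the 7^2 = 49 ordered samples imply) and uses exact fractions; the function name `sampling_dist_of_mean` is illustrative only.

```python
from fractions import Fraction
from itertools import product

# Population distribution of car ages from the example.
population = {2: Fraction(1, 14), 3: Fraction(1, 14), 4: Fraction(2, 14),
              5: Fraction(2, 14), 6: Fraction(2, 14), 7: Fraction(3, 14),
              8: Fraction(3, 14)}

def sampling_dist_of_mean(pop, n):
    """Exact distribution of the mean of n independent draws from pop."""
    dist = {}
    for sample in product(pop, repeat=n):   # every ordered sample of size n
        prob = Fraction(1)
        for value in sample:
            prob *= pop[value]
        xbar = Fraction(sum(sample), n)
        dist[xbar] = dist.get(xbar, Fraction(0)) + prob
    return dist

dist = sampling_dist_of_mean(population, 2)
for xbar in sorted(dist):
    print(float(xbar), dist[xbar] * 196, "/ 196")
middle = sum(prob for xbar, prob in dist.items() if 4 <= xbar <= 6)
print("P(4 <= xbar <= 6) =", middle)
```

Python prints the last probability in lowest terms, so 108/196 appears as 27/49.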
Expected Value and Standard
Deviation of the Sampling
Distribution of x̄
Example (cont.)
Population probability dist.
x      2     3     4     5     6     7     8
p(x)   1/14  1/14  2/14  2/14  2/14  3/14  3/14
E(X) = 2(1/14) + 3(1/14) + 4(2/14) + … + 8(3/14) = 5.714
Population mean E(X) = µ = 5.714
Sampling dist. of x̄
xbar     2      2.5    3      3.5    4       4.5     5       5.5     6       6.5     7       7.5     8
p(xbar)  1/196  2/196  5/196  8/196  12/196  18/196  24/196  26/196  28/196  24/196  21/196  18/196  9/196
E(X̄) = 2(1/196) + 2.5(2/196) + 3(5/196) + 3.5(8/196) + 4(12/196) + 4.5(18/196) + 5(24/196)
+ 5.5(26/196) + 6(28/196) + 6.5(24/196) + 7(21/196) + 7.5(18/196) + 8(9/196) = 5.714
Mean of sampling distribution of x̄: E(X̄) = 5.714
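Example (cont.)
Population from which sample is selected:
\[ \mu = E(X) = 2\left(\tfrac{1}{14}\right) + 3\left(\tfrac{1}{14}\right) + 4\left(\tfrac{2}{14}\right) + \dots + 8\left(\tfrac{3}{14}\right) = 5.714 \]
\[ \sigma^2 = Var(X) = 3.4898 \]
\[ \sigma = SD(X) = \sqrt{Var(X)} = \sqrt{3.4898} = 1.8681 \]

Sampling dist. of X̄:
\[ E(\bar{X}) = 2\left(\tfrac{1}{196}\right) + 2.5\left(\tfrac{2}{196}\right) + \dots + 8\left(\tfrac{9}{196}\right) = 5.714 \]
\[ Var(\bar{X}) = 1.7449 = \frac{3.4898}{2} = \frac{Var(X)}{2} \]
\[ SD(\bar{X}) = \sqrt{Var(\bar{X})} = \sqrt{\frac{Var(X)}{2}} = \frac{SD(X)}{\sqrt{2}} = \frac{1.8681}{\sqrt{2}} = 1.3209 \]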
IMPORTANT
Numerical Summaries of the Sampling Distribution of X̄ are
Related to the Numerical Summaries of the Population X from
Which the Sample is Selected
E(X̄) = E(X) (the mean of the sampling distribution of X̄ is always
equal to the mean of the population from which the sample is selected)

\[ Var(\bar{X}) = \frac{Var(X)}{n} \]
\[ SD(\bar{X}) = \sqrt{Var(\bar{X})} = \sqrt{\frac{Var(X)}{n}} = \frac{SD(X)}{\sqrt{n}} \]
(the standard deviation of the sampling distribution of X̄ is always
equal to the standard deviation of the population from which the sample
is selected, divided by the square root of the sample size n)
Sampling Distribution of the
Sample Mean X̄: Example

An example
– A fair 6-sided die is thrown; let X represent the number of
dots showing on the upper face.
– The probability distribution of X is
x      1    2    3    4    5    6
p(x)   1/6  1/6  1/6  1/6  1/6  1/6
Population mean µ:
µ = E(X) = 1(1/6) + 2(1/6) + 3(1/6) + … = 3.5
Population variance σ²:
σ² = V(X) = (1 − 3.5)²(1/6) + (2 − 3.5)²(1/6) + … = 2.92
Suppose we want to estimate µ from the
mean x̄ of a sample of size n = 2.
What is the sampling distribution of x̄ in
this situation?

Sample         Mean    Sample         Mean    Sample         Mean
 1    1,1      1.0     13    3,1      2.0     25    5,1      3.0
 2    1,2      1.5     14    3,2      2.5     26    5,2      3.5
 3    1,3      2.0     15    3,3      3.0     27    5,3      4.0
 4    1,4      2.5     16    3,4      3.5     28    5,4      4.5
 5    1,5      3.0     17    3,5      4.0     29    5,5      5.0
 6    1,6      3.5     18    3,6      4.5     30    5,6      5.5
 7    2,1      1.5     19    4,1      2.5     31    6,1      3.5
 8    2,2      2.0     20    4,2      3.0     32    6,2      4.0
 9    2,3      2.5     21    4,3      3.5     33    6,3      4.5
10    2,4      3.0     22    4,4      4.0     34    6,4      5.0
11    2,5      3.5     23    4,5      4.5     35    6,5      5.5
12    2,6      4.0     24    4,6      5.0     36    6,6      6.0
Note: E(X̄) = E(X) and \( Var(\bar{X}) = \frac{Var(X)}{2} \)
E(x̄) = 1.0(1/36) + 1.5(2/36) + … = 3.5
V(X̄) = (1.0 − 3.5)²(1/36) + (1.5 − 3.5)²(2/36) + … = 1.46
[Figure: sampling distribution of x̄ for n = 2; x-axis: x̄ from 1 to 6 in steps of 0.5, y-axis: probability from 1/36 to 6/36]
[Figure: sampling distributions of x̄ for n = 5, n = 10 and n = 25]
n = 5:  E(X̄) = 3.5, Var(X̄) = .5833 (= Var(X)/5)
n = 10: E(X̄) = 3.5, Var(X̄) = .2917 (= Var(X)/10)
n = 25: E(X̄) = 3.5, Var(X̄) = .1167 (= Var(X)/25)
Notice that Var(X̄) is smaller
than Var(X). The larger the sample
size the smaller is Var(X̄). Therefore,
x̄ tends to fall closer to µ, as the
sample size increases.
The variance of the sample mean is smaller
than the variance of the population.
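The variances quoted for the die can be verified exactly. The sketch below builds the distribution of the sum of n independent throws by repeated convolution, rescales it to the mean, and compares the variance with Var(X)/n. It assumes independent throws, and the helper names are hypothetical.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

def sum_distribution(pop, n):
    """Distribution of the sum of n independent draws, built by repeated convolution."""
    dist = {0: Fraction(1)}
    for _draw in range(n):
        new = {}
        for total, p_total in dist.items():
            for value, p_value in pop.items():
                new[total + value] = new.get(total + value, Fraction(0)) + p_total * p_value
        dist = new
    return dist

def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return mean, sum((x - mean) ** 2 * p for x, p in dist.items())

pop_mean, pop_var = mean_and_variance(die)
for n in (2, 5, 10, 25):
    xbar_dist = {Fraction(total, n): p for total, p in sum_distribution(die, n).items()}
    xbar_mean, xbar_var = mean_and_variance(xbar_dist)
    print(n, float(xbar_mean), round(float(xbar_var), 4), xbar_var == pop_var / n)
```

Convolution avoids listing all 6^n samples, which would be impractical for n = 25.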
[Figure: a population with values 1, 2 and 3, and the means of samples of two observations taken from it (Mean = 1.5, Mean = 2, Mean = 2.5)]
Population
Let us take samples
of two observations
Compare the variability of the population
to the variability of the sample
mean.
Also,
Expected value of the population = (1 + 2 + 3)/3 = 2
Expected value of the sample mean = (1.5 + 2 + 2.5)/3 = 2
Properties of the Sampling Distribution of
x̄
1. E(x̄) = µ
(the expected value of the sampling distribution
of x̄ = the expected value µ of the sampled population)
2. \( SD(\bar{x}) = \frac{SD(x)}{\sqrt{n}} = \frac{\sigma}{\sqrt{n}} \)
where σ is the standard deviation of the
population from which the sample is taken and n is
the sample size.
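Both properties follow from a short calculation. The derivation below is a sketch that assumes the n sample values X_1, …, X_n are independent draws from the same population, each with mean µ and variance σ²; the slides use this assumption implicitly when they list all possible samples.

```latex
\begin{align*}
E(\bar{x}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu \\
Var(\bar{x}) &= \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i)
              = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \\
SD(\bar{x}) &= \sqrt{Var(\bar{x})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

The variance step is where independence is needed: without it, the variance of a sum is not simply the sum of the variances.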
[Figure (BUS 350 - Topic 6.1, Handout 6.1, pages 1-2): panels labeled Unbiased and Biased, each marked with µ; bullets: Confidence, Precision. The central tendency is down the center.]
Consequences
1. E(x̄) = µ. This is why we use x̄ to estimate an
unknown population mean µ. The sampling
dist. of x̄ is "centered" at the parameter we are
trying to estimate.
2. \( SD(\bar{x}) = \frac{SD(x)}{\sqrt{n}} \); the standard deviation of x̄ is
smaller than SD(x), the stand. dev. of the population
from which the sample is taken. The
values of x̄ will cluster tightly around µ
when n is large.
A Billion Dollar Mistake





“Conventional” wisdom: smaller schools better
than larger schools
Late 90’s, Gates Foundation, Annenberg
Foundation, Carnegie Foundation
Among the 50 top-scoring Pennsylvania
elementary schools 6 (12%) were from the
smallest 3% of the schools
But …, they didn’t notice …
Among the 50 lowest-scoring Pennsylvania
elementary schools 9 (18%) were from the
smallest 3% of the schools
A Billion Dollar
Mistake (cont.)
Smaller schools have (by definition)
smaller n’s.
When n is small, \( SD(\bar{x}) = \frac{SD(x)}{\sqrt{n}} \) is larger
That is, the sampling distributions of
small school mean scores have larger
SD’s
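A small simulation makes the point concrete. Every number in this sketch is hypothetical (1000 schools, 30 of them small, 20 versus 400 students tested, scores with mean 500 and SD 100), and each school's mean score is drawn directly from a normal model with SD equal to SIGMA / sqrt(n). All schools share the same true mean, so any pattern comes from sampling variability alone.

```python
import random
from math import sqrt

# All numbers below are hypothetical, chosen only to illustrate the effect.
rng = random.Random(1)
MU, SIGMA = 500, 100                 # every school shares the same true mean score
NUM_SCHOOLS, NUM_SMALL = 1000, 30    # the smallest 3% of schools
SMALL_N, LARGE_N = 20, 400           # students tested per school

schools = []
for i in range(NUM_SCHOOLS):
    n = SMALL_N if i < NUM_SMALL else LARGE_N
    # A school's mean score has standard deviation SIGMA / sqrt(n).
    mean_score = rng.gauss(MU, SIGMA / sqrt(n))
    schools.append((mean_score, n == SMALL_N))

schools.sort()                       # lowest mean score first
bottom_small = sum(is_small for _score, is_small in schools[:50])
top_small = sum(is_small for _score, is_small in schools[-50:])
print("small schools among the 50 lowest: ", bottom_small)
print("small schools among the 50 highest:", top_small)
```

If school size had no effect on variability, about 3% of each group of 50 (one or two schools) would be small; in this setup the small schools should be over-represented at both ends.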

We Know More!

We know 2 parameters of the sampling
distribution of x̄:
E(x̄) = μ
\( SD(\bar{x}) = \frac{SD(x)}{\sqrt{n}} \)
The Central Limit Theorem tells
us about the shape of the distribution of x̄
when the sample size n is sufficiently large.
THE CENTRAL LIMIT
THEOREM
The “World is Normal” Theorem
But first, … Sampling Distribution of
x̄ - Normally Distributed Population
[Figure: population distribution N(µ, σ) and, for n = 10, the narrower sampling distribution of x̄: \( N(\mu, \sigma/\sqrt{10}) \)]
Normal Populations

Important Fact:
If the population is normally distributed, then the sampling
distribution of x̄ is normally distributed for any sample size
n.
Non-normal Populations
What can we say about the shape of the
sampling distribution of x when the
population from which the sample is
selected is not normal?
[Figure: histogram of Baseball Salaries; x-axis: Salary ($1,000's), y-axis: Frequency (0 to 600); bar frequencies range from 0 to 490]
The Central Limit Theorem
(for the sample mean x̄)
If a random sample of n observations is selected from
a population (any population), then when n is
sufficiently large, the sampling distribution of x̄ will be
approximately normal.
(The larger the sample size, the better will be the normal
approximation to the sampling distribution of x̄.)

The Importance of the Central
Limit Theorem

When we select simple random samples of
size n, the sample means x̄ will vary from
sample to sample. We can model the
distribution of these sample means with a
probability model that is …
\[ N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \]
How Large Should n Be?

For the purpose of applying the Central
Limit Theorem, we will consider a
sample size to be large when n > 30.
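The theorem can be checked by simulation. The sketch below assumes an exponential population with rate 1 (mean 1, SD 1), which is not from the slides; it only stands in for a strongly right-skewed population. The code draws 5000 samples of size n = 40 and compares the sample means with the normal model N(1, 1/sqrt(40)).

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)              # fixed seed so the sketch is repeatable
n, num_samples = 40, 5000

# Exponential population with rate 1: strongly right-skewed, mean 1 and SD 1.
sample_means = [sum(rng.expovariate(1.0) for _obs in range(n)) / n
                for _sample in range(num_samples)]

center, spread = mean(sample_means), stdev(sample_means)
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / num_samples
print("mean of sample means:", round(center, 3), "theory: 1")
print("SD of sample means:  ", round(spread, 3), "theory:", round(1 / sqrt(n), 3))
print("share within 1 SD:   ", round(within_one_sd, 3), "normal model: about 0.68")
```

Lowering n to 2 or 3 in this sketch should make the skew of the population visible again in the sample means.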
Even if the population from
which the sample is
selected looks like this …
[Figure: histogram of Baseball Salaries; x-axis: Salary ($1,000's), y-axis: Frequency]
… the Central Limit
Theorem tells us that a
good model for the sampling
distribution of the sample
mean x̄ is … \( N(\mu, \sigma/\sqrt{n}) \)
Summary
Population: mean µ; stand. dev. σ;
shape of population dist. is
unknown; value of µ is unknown;
select random sample of size n;
Sampling distribution of x̄:
mean µ; stand. dev. \( \sigma/\sqrt{n} \);
always true!
By the Central Limit Theorem:
the shape of the sampling distribution
is approx normal, that is
\( \bar{x} \sim N(\mu, \sigma/\sqrt{n}) \)
The Central Limit Theorem
(for the sample proportion p̂)
If x “successes” occur in a random
sample of n observations selected from
a population (any population), then
when n is sufficiently large, the
sampling distribution of p̂ = x/n will be
approximately normal.
(The larger the sample size, the better will
be the normal approximation to the
sampling distribution of p̂.)

The Importance of the Central
Limit Theorem

When we select simple random samples of size
n from a population with “success” probability p
and observe x “successes”, the sample
proportions p̂ = x/n will vary from sample to
sample. We can model the distribution of these
sample proportions with a probability model that
is …
\[ N\left(p, \sqrt{\frac{p(1 - p)}{n}}\right) \]
How Large Should n Be?

For the purpose of applying the central limit
theorem, we will consider a sample size n
to be large when np ≥ 10 and n(1-p) ≥ 10
If the population from
which the sample is
selected looks like this …
[Figure: population with "success" proportion p; bar of height 1-p at 0 and bar of height p at 1]
… the Central Limit
Theorem tells us that a
good model for the sampling
distribution of the sample
proportion p̂ = x/n is … \( N\left(p, \sqrt{p(1 - p)/n}\right) \)
Population Parameters and Sample
Statistics
Population parameter                                            Value      Sample statistic used to estimate
p  (proportion of population with a certain characteristic)    Unknown    p̂
µ  (mean value of a population variable)                       Unknown    x̄
The value of a population
parameter is a fixed
number, it is NOT random;
its value is not known.
The value of a sample
statistic is calculated from
sample data
The value of a sample
statistic will vary from
sample to sample
(sampling distributions)
Example
A random sample of n = 64 observations is
drawn from a population with mean
µ = 15 and standard deviation σ = 4.
a. E(X̄) = µ = 15; \( SD(\bar{X}) = \frac{SD(X)}{\sqrt{n}} = \frac{4}{8} = 0.5 \)
b. The shape of the sampling distribution
model for x̄ is approx. normal (by the CLT)
with mean E(X̄) = 15 and SD(X̄) = 0.5 (The
answer depends on the sample size n
since \( SD(\bar{X}) = \frac{SD(X)}{\sqrt{n}} = \frac{4}{\sqrt{64}} = \frac{4}{8} = 0.5 \))
Example (cont.)
c. x̄ = 15.5; \( z = \frac{\bar{x} - \mu}{SD(\bar{X})} = \frac{15.5 - 15}{.5} = \frac{.5}{.5} = 1 \)
This means that x̄ = 15.5 is one standard
deviation above the mean µ = E(X̄) = 15
Example 2
The probability distribution of 6-month
incomes of account executives has mean
$20,000 and standard deviation $5,000.
 a) A single executive’s income is $20,000.
Can it be said that this executive’s income
exceeds 50% of all account executive
incomes?
ANSWER No. P(X<$20,000)=? No
information given about shape of
distribution of X; we do not know the
median of 6-month incomes.

Example 2 (cont.)

b) n = 64 account executives are randomly
selected. What is the probability that the
sample mean exceeds $20,500?
Answer: E(X) = $20,000; SD(X) = $5,000
E(X̄) = $20,000
\( SD(\bar{X}) = \frac{SD(X)}{\sqrt{n}} = \frac{5{,}000}{\sqrt{64}} = 625 \)
By CLT, X̄ ~ N(20,000, 625)
\[ P(\bar{X} > 20{,}500) = P\left(\frac{\bar{X} - 20{,}000}{625} > \frac{20{,}500 - 20{,}000}{625}\right) = P(z > .8) = 1 - .7881 = .2119 \]
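The table lookup can be replaced by a direct computation. This is a sketch using `statistics.NormalDist` from the Python standard library, with `prob_mean_exceeds` as a hypothetical helper name; it applies the normal model X̄ ~ N(µ, σ/√n) used in the answer above.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_exceeds(cutoff, mu, sigma, n):
    """Normal-model (CLT) approximation to P(sample mean > cutoff)."""
    model = NormalDist(mu=mu, sigma=sigma / sqrt(n))
    return 1 - model.cdf(cutoff)

prob = prob_mean_exceeds(20_500, mu=20_000, sigma=5_000, n=64)
print(round(prob, 4))   # 0.2119, in agreement with the z-table answer
```

Replacing `1 - model.cdf(cutoff)` with `model.cdf(cutoff)` gives the matching "at most" probability for the other examples in this unit.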
Example 3
A sample of size n = 16 is
drawn from a normally
distributed population with
E(X) = 20 and SD(X) = 8.
X ~ N(20, 8); \( \bar{X} \sim N\left(20, \tfrac{8}{\sqrt{16}}\right) = N(20, 2) \)
a) \( P(\bar{X} \ge 24) = P\left(\frac{\bar{X} - 20}{2} \ge \frac{24 - 20}{2}\right) = P(z \ge 2) = 1 - .9772 = .0228 \)
b) \( P(16 \le \bar{X} \le 24) = P\left(\frac{16 - 20}{2} \le z \le \frac{24 - 20}{2}\right) = P(-2 \le z \le 2) = .9772 - .0228 = .9544 \)
Example 3 (cont.)
c. Do we need the Central Limit
Theorem to solve part a or part b?
 NO. We are given that the population is
normal, so the sampling distribution of
the mean will also be normal for any
sample size n. The CLT is not needed.

Example 4

Battery life X ~ N(20, 10). Guarantee: avg.
battery life in a case of 24 exceeds 16
hrs. Find the probability that a randomly
selected case meets the guarantee.
E(x̄) = 20; \( SD(\bar{x}) = \frac{10}{\sqrt{24}} = 2.04 \). X̄ ~ N(20, 2.04)
\[ P(\bar{X} \ge 16) = P\left(\frac{\bar{X} - 20}{2.04} \ge \frac{16 - 20}{2.04}\right) = P(z \ge -1.96) = 1 - .0250 = .9750 \]
Example 5
Cans of salmon are supposed to have a
net weight of 6 oz. The canner says that
the net weight is a random variable with
mean µ = 6.05 oz. and stand. dev. σ = .18
oz.
Suppose you take a random sample of 36
cans and calculate the sample mean
weight to be 5.97 oz.
Find the probability that the mean
weight of the sample is less than or
equal to 5.97 oz.
Population X: amount of salmon in
a can
E(X) = 6.05 oz, SD(X) = .18 oz
X̄ sampling dist: E(X̄) = 6.05, SD(X̄) = .18/6 = .03
By the CLT, X̄ sampling dist is approx. normal
P(X̄ ≤ 5.97) = P(z ≤ [5.97 − 6.05]/.03)
= P(z ≤ −.08/.03) = P(z ≤ −2.67) = .0038
How could you use this answer?
Suppose you work for a “consumer
watchdog” group
If you sampled the weights of 36
cans and obtained a sample mean x̄
≤ 5.97 oz., what would you think?
Since P(x̄ ≤ 5.97) = .0038, either

– you observed a “rare” event (recall: 5.97
oz is 2.67 stand. dev. below the mean)
and the mean fill E(X) is in fact 6.05 oz.
(the value claimed by the canner), or
– the true mean fill is less than 6.05 oz.
(the canner is lying).
Example 6
X: weekly income. E(X) = 1050, SD(X) = 100
n = 64;
X̄ sampling dist: E(X̄) = 1050,
SD(X̄) = 100/8 = 12.5
P(X̄ ≤ 1022) = P(z ≤ [1022 − 1050]/12.5)
= P(z ≤ −28/12.5) = P(z ≤ −2.24) = .0125
Suspicious of claim that average is $1050;
evidence is that average income is less.
Example 7

12% of students at NCSU are left-handed.
What is the probability that in a sample of 100
students, the sample proportion that are left-handed
is less than 11%?
E(p̂) = p = .12; \( SD(\hat{p}) = \sqrt{\frac{.12 \times .88}{100}} = .032 \)
np = 100 × .12 = 12 ≥ 10;
n(1 − p) = 100 × .88 = 88 ≥ 10;
So by the CLT, p̂ ~ N(.12, .032)
Example 7 (cont.)
\[ P(\hat{p} < .11) = P\left(\frac{\hat{p} - .12}{.032} < \frac{.11 - .12}{.032}\right) = P(z < -.31) = .3783 \]
[Figure: normal curves showing P(p̂ < .11) = .3783 to the left of p̂ = .11 and, equivalently, P(z < −.31) = .3783 to the left of z = −.31]