Chapter 7: Sampling Distributions






STAT 3038
Chapter 7: Sampling Distributions
7.1 Sampling Distribution, Sampling Error, and Nonsampling Errors
7.2 Mean and Standard Deviation of x̄
7.3 Shape of the Sampling Distribution of x̄
7.4 Applications of the Sampling Distribution of x̄
7.5 Population and Sample Proportions; and Mean, Standard Deviation, and Shape of the Sampling Distribution of p̂
7.6 Applications of the Sampling Distribution of p̂
Dr. Yingfu (Frank) Li






Population distribution & sampling distributions
Errors: sampling vs. nonsampling
Mean and standard deviation of the sample mean
Sampling distribution: distribution of the sample mean
Population and sample proportion
Sampling distribution of the sample proportion

7.1 Population & Sampling Distributions
Population distribution: the probability distribution of the population data
  Example of a simple population distribution: test scores of 5 students (see next slide)
Sampling distribution: the probability distribution of a sample statistic
Sampling distribution of x̄: the probability distribution of x̄
  It lists all possible values of x̄ and their corresponding probabilities
In general, the probability distribution of a sample statistic is called its sampling distribution.
Population Distribution
Suppose there are only five students in an advanced statistics class and the midterm scores of these five students are
70  78  80  80  95
Let x denote the score of a student.
Sampling Distribution
Reconsider the population of midterm scores of five students given in Table 7.1.
Consider all possible samples of three scores each that can be selected, without replacement, from that population.
The total number of possible samples is

\[ {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10 \]

Suppose we assign the letters A, B, C, D, and E to the scores of the five students so that
A = 70, B = 78, C = 80, D = 80, E = 95
Then the 10 possible samples of three scores each are
ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE

All Possible Samples and the Distribution of x̄ (all possible samples for n = 3)
x̄        P(x̄)
76.00    0.20
76.67    0.10
79.33    0.10
81.00    0.10
81.67    0.20
84.33    0.20
85.00    0.10
Sum      1.00
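The enumeration above can be reproduced with a short script. This is a minimal sketch, not part of the original slides: the function name `sampling_distribution` and the use of exact fractions are illustrative choices. It lists every sample of three scores drawn without replacement and tallies how often each sample mean occurs.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population from the slide: letters are labels for the five students.
scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

def sampling_distribution(population, n):
    """Return {sample mean: probability} over all size-n samples
    drawn without replacement (hypothetical helper for illustration)."""
    samples = list(combinations(population, n))  # e.g. ('A', 'B', 'C')
    counts = Counter(
        Fraction(sum(population[label] for label in s), n) for s in samples
    )
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

dist = sampling_distribution(scores, 3)
for sample_mean, prob in dist.items():
    print(f"{float(sample_mean):6.2f}  {float(prob):.2f}")
```

Exact fractions are used so that equal sample means (for example ABC and ABD) are grouped without floating-point noise. The probabilities sum to 1, and the mean of this distribution equals the population mean of 80.60.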
Sampling and Nonsampling Errors
Sampling error is the difference between the value of a sample statistic and the value of the corresponding population parameter. In the case of the mean,

\[ \text{sampling error} = \bar{x} - \mu \]

assuming that the sample is random and no nonsampling error has been made.
The errors that occur in the collection, recording, and tabulation of data are called nonsampling errors.

Reasons for Occurrence of Nonsampling Errors
If a sample is nonrandom (and, hence, nonrepresentative), the sample results may be too different from the census results.
The questions may be phrased in such a way that they are not fully understood by the members of the sample or population.
The respondents may intentionally give false information in response to some sensitive questions.
The poll taker may make a mistake and enter a wrong number in the records or make an error while entering the data on a computer.
Example 7-1
Sampling & Nonsampling Errors
Reconsider the population of five scores given in Table 7.1. Suppose one sample of three scores is selected from this population, and this sample includes the scores 70, 80, and 95. Find the sampling error.

\[ \mu = \frac{70 + 78 + 80 + 80 + 95}{5} = 80.60 \]
\[ \bar{x} = \frac{70 + 80 + 95}{3} = 81.67 \]
\[ \text{Sampling error} = \bar{x} - \mu = 81.67 - 80.60 = 1.07 \]

That is, the mean score estimated from the sample is 1.07 higher than the mean score of the population.

Now suppose, when we select the sample of three scores, we mistakenly record the second score as 82 instead of 80. As a result, we calculate the sample mean as

\[ \bar{x} = \frac{70 + 82 + 95}{3} = 82.33 \]

The difference between this sample mean and the population mean is

\[ \bar{x} - \mu = 82.33 - 80.60 = 1.73 \]

This difference does not represent the sampling error. Only 1.07 of this difference is due to the sampling error. The remaining portion represents the nonsampling error. It is equal to 1.73 - 1.07 = .66 and is due to the error we made in recording the second score in the sample. Also,

\[ \text{Nonsampling error} = \text{Incorrect } \bar{x} - \text{Correct } \bar{x} = 82.33 - 81.67 = .66 \]
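The decomposition in Example 7-1 can be checked numerically. The following is a small sketch, with variable names chosen here for illustration, that separates the total difference into its sampling and nonsampling parts.

```python
population = [70, 78, 80, 80, 95]
sample_correct = [70, 80, 95]
sample_recorded = [70, 82, 95]  # second score mistakenly recorded as 82

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

mu = mean(population)

# Sampling error: correct sample mean minus population mean.
sampling_error = mean(sample_correct) - mu
# Nonsampling error: incorrect sample mean minus correct sample mean.
nonsampling_error = mean(sample_recorded) - mean(sample_correct)
# Total observed difference is the sum of the two parts.
total_difference = mean(sample_recorded) - mu

print(f"sampling error    = {sampling_error:.2f}")
print(f"nonsampling error = {nonsampling_error:.2f}")
print(f"total difference  = {total_difference:.2f}")
```

Because the script does not round the intermediate means, the nonsampling error prints as 0.67 rather than the .66 obtained on the slide from the rounded values 82.33 and 81.67.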
7.2 Mean and Standard Deviation of x̄
We have just seen that x̄ is a random variable. So it has its own probability distribution, and its own mean and standard deviation.

Mean of x̄:
\[ \mu_{\bar{x}} = \mu \]
Standard deviation of x̄:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

This means that the sampling distribution of x̄ has the same central location as the population, but its standard deviation is smaller than σ by a factor of √n, so it decreases as the sample size increases.

Example 7-2
The mean wage for all 5000 employees who work at a large company is $27.50 and the standard deviation is $3.70. Let x̄ be the mean wage per hour for a random sample of certain employees selected from this company. Find the mean and standard deviation of x̄ for a sample size of
(a) 30
(b) 75
(c) 200
Solution: μ = $27.50 and σ = $3.70, so for every sample size

\[ \mu_{\bar{x}} = \mu = \$27.50 \]

(a) \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{30}} = \$.676 \]
(b) \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{75}} = \$.427 \]
(c) \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{200}} = \$.262 \]
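The three calculations in Example 7-2 follow the same pattern, which can be sketched as a small function. The function name below is a hypothetical helper, not something from the slides.

```python
import math

def mean_and_sd_of_sample_mean(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean
    for a sample of size n, using sigma / sqrt(n)."""
    return mu, sigma / math.sqrt(n)

# Example 7-2: mu = 27.50, sigma = 3.70
for n in (30, 75, 200):
    m, sd = mean_and_sd_of_sample_mean(27.50, 3.70, n)
    print(f"n = {n:3d}: mean = {m:.2f}, sd = {sd:.3f}")
```

The mean stays at 27.50 for every n, while the standard deviation shrinks as n grows.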
7.3 Shape of Sampling Distribution of x̄
The population from which samples are drawn has a normal distribution.
  If the population from which the samples are drawn is normally distributed with mean μ and standard deviation σ, then the sampling distribution of the sample mean, x̄, will also be normally distributed with the following mean and standard deviation, irrespective of the sample size:

\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

The population from which samples are drawn does not have a normal distribution.
  According to the central limit theorem, for a large sample size (n ≥ 30), the sampling distribution of x̄ is approximately normal, irrespective of the shape of the population distribution. The mean and standard deviation of the sampling distribution of x̄ are

\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

[Figure: Pop Distribution and Sampling Distributions of x̄]
[Slide: Simulations]
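A simulation along these lines can illustrate the central limit theorem. The sketch below is an assumption-laden illustration rather than the simulation shown in class: it uses an exponential population (right-skewed, with mean 1 and standard deviation 1), a sample size of 30, and 5000 repetitions, all chosen here for convenience.

```python
import math
import random
import statistics

def simulate_sample_means(draw, n, reps, seed=0):
    """Draw `reps` samples of size n using draw(rng) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n=30, reps=5000)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / math.sqrt(30), 3))
```

The simulated mean of x̄ should land near μ = 1 and its standard deviation near σ/√n, even though the population itself is far from normal. A histogram of `means` would show the roughly bell-shaped curve.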
Example 7-3
In a recent SAT, the mean score for all examinees was 1020. Assume that the distribution of SAT scores of all examinees is normal with a mean of 1020 and a standard deviation of 153. Let x̄ be the mean SAT score of a random sample of certain examinees. Calculate the mean and standard deviation of x̄ and describe the shape of its sampling distribution when the sample size is
(a) 16
(b) 50
(c) 1000
Solution: μ = 1020 and σ = 153, so for every sample size

\[ \mu_{\bar{x}} = \mu = 1020 \]

(a) \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{16}} = 38.250 \]
(b) \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{50}} = 21.637 \]
(c) \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{1000}} = 4.838 \]

Because the population is normally distributed, the sampling distribution of x̄ is normal for each of these sample sizes.
[Figures: Shape of (a), (b), and (c) in Example 7-3]
Example 7-4
The mean rent paid by all tenants in a small city is $1550 with a standard deviation of $225. However, the population distribution of rents for all tenants in this city is skewed to the right. Calculate the mean and standard deviation of x̄ and describe the shape of its sampling distribution when the sample size is (a) 30 (b) 100
Solution: μ = $1550 and σ = $225, so for both sample sizes

\[ \mu_{\bar{x}} = \mu = \$1550 \]

(a) \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{30}} = \$41.08 \]
(b) \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{100}} = \$22.50 \]

Although the population is skewed, both sample sizes satisfy n ≥ 30, so by the central limit theorem the sampling distribution of x̄ is approximately normal in each case.
[Figure: Shape of (a) in Example 7-4]
[Figure: Shape of (b) in Example 7-4]

7.4 Applications of Sampling Distribution of x̄
For applications, first find the standard deviation of x̄,

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

Then work on the new distribution of x̄, which has mean μ and standard deviation σ_x̄
Finding probability
  Sketch the normal curve
  Shade the area
  Use table, calculator, or Excel to find the probability
Finding x̄ values given probability
Empirical rules
Examples 7-5 & 7-6
Empirical Rules
If we take all possible samples of the same (large) size from a population and calculate the mean for each of these samples, then about 68.26% (95.44% or 99.74%) of the sample means will be within one (two or three) standard deviations of the population mean. For example,

\[ P(\mu - 1\sigma_{\bar{x}} \le \bar{x} \le \mu + 1\sigma_{\bar{x}}) \approx .6826 \]

Example 7-5
Assume that the weights of all packages of a certain brand of cookies are normally distributed with a mean of 32 ounces and a standard deviation of .3 ounce. Find the probability that the mean weight, x̄, of a random sample of 20 packages of this brand of cookies will be between 31.8 and 31.9 ounces.
Solution

\[ \mu_{\bar{x}} = \mu = 32 \text{ ounces} \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.3}{\sqrt{20}} = .06708204 \text{ ounce} \]
\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]
\[ P(31.8 < \bar{x} < 31.9) = P(-2.98 < z < -1.49) = P(z < -1.49) - P(z < -2.98) = .0681 - .0014 = .0667 \]
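The steps in Example 7-5 can also be carried out in software instead of a z table. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `prob_sample_mean_between` is an assumption made for illustration, and it presumes that x̄ is (at least approximately) normal.

```python
import math
from statistics import NormalDist

def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low < x-bar < high) when x-bar ~ Normal(mu, sigma / sqrt(n))."""
    sampling_dist = NormalDist(mu, sigma / math.sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

# Example 7-5: mu = 32, sigma = 0.3, n = 20
p = prob_sample_mean_between(32, 0.3, 20, 31.8, 31.9)
print(round(p, 4))
```

The result is close to the .0667 found with the table. Any small difference comes from rounding z to two decimal places in the table lookup.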
Example 7-6
According to Moebs Services Inc., an individual checking account at major U.S. banks costs the banks between $350 and $450 per year (Time, November 21, 2011). Suppose that the current average cost of all checking accounts at major U.S. banks is $400 per year with a standard deviation of $30. Let x̄ be the current average annual cost of a random sample of 225 individual checking accounts at major banks in America.
What is the probability that the average annual cost of the checking accounts in this sample is within $4 of the population mean?
  Within $4 of pop mean: P(μ - 4 < x̄ < μ + 4) = ?
What is the probability that the average annual cost of the checking accounts in this sample is less than the population mean by $2.70 or more?
  Lower than the pop mean by $2.70 or more: P(x̄ < μ - 2.70) = ?

7.5 Population & Sample Proportions
What is the proportion?
  The ratio of # of elements with a specific characteristic to the total # of elements
Population proportion & sample proportion

\[ p = \frac{X}{N} \quad \text{and} \quad \hat{p} = \frac{x}{n} \]

  N = total # of elements in the population
  n = total # of elements in the sample
  X = # of elements in the population with a characteristic
  x = # of elements in the sample with a characteristic
For comparison, the sample mean is

\[ \bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + \dots + x_n}{n} \]

Example 7-7
Suppose a total of 789,654 families live in a city and 563,282 of them own homes. A sample of 240 families is selected from this city, and 158 of them own homes. Find the proportion of families who own homes in the population and in the sample.
Solution
N = 789,654 and X = 563,282, so

\[ p = \frac{X}{N} = \frac{563{,}282}{789{,}654} = .71 \]

n = 240 and x = 158, so

\[ \hat{p} = \frac{x}{n} = \frac{158}{240} = .66 \]

Mean, Standard Deviation & Shape of p̂
Sampling distribution of p̂: the probability distribution of the sample proportion p̂. It gives the various values that p̂ can assume and their probabilities.
Mean of sample proportion

\[ \mu_{\hat{p}} = p \]

Standard deviation of sample proportion

\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}, \quad q = 1 - p \]

Sample proportion p̂ approximately follows a normal distribution if np > 5 and nq > 5
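The formulas for the mean, standard deviation, and shape of p̂ can be bundled into one helper. The sketch below is illustrative only: the function name and the returned dictionary keys are assumptions, and applying it to the home-ownership data of Example 7-7 with n = 240 goes one step beyond what that example asks for.

```python
import math

def proportion_sampling_distribution(p, n):
    """Mean, standard deviation, and normality check for the sample proportion."""
    q = 1 - p
    return {
        "mean": p,                                  # mu_p-hat = p
        "sd": math.sqrt(p * q / n),                 # sigma_p-hat = sqrt(pq/n)
        "approx_normal": n * p > 5 and n * q > 5,   # rule used on the slide
    }

# Example 7-7: population and sample proportions of home owners.
p = 563_282 / 789_654
p_hat = 158 / 240
print(f"p = {p:.2f}, p-hat = {p_hat:.2f}")

info = proportion_sampling_distribution(p, 240)
print(f"mean = {info['mean']:.4f}, sd = {info['sd']:.4f}, "
      f"approximately normal: {info['approx_normal']}")
```

The check `np > 5 and nq > 5` is the condition stated on the slide. Other textbooks use stricter thresholds, so the cutoff here should be read as this course's convention.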
Example 7-8
Boe Consultant Associates has five employees. Table 7.6 gives the names of these five employees and information concerning their knowledge of statistics.
If we define the population proportion, p, as the proportion of employees who know statistics, then
p = 3 / 5 = .60
Now, suppose we draw all possible samples of three employees each and compute the proportion of employees, for each sample, who know statistics.

\[ \text{Total number of samples} = {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10 \]

Example 7-9
According to a New York Times/CBS News poll conducted during June 24-28, 2011, 55% of adults polled said that owning a home is a very important part of the American Dream (The New York Times, June 30, 2011). Assume that this result is true for the current population of American adults. Let p̂ be the proportion of American adults in a random sample of 2000 who will say that owning a home is a very important part of the American Dream. Find the mean and standard deviation of p̂ and describe the shape of its sampling distribution.
Solution: p = 0.55, q = 1 - p = 0.45, and n = 2000

\[ \mu_{\hat{p}} = p = 0.55 \]
\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.55 \times 0.45}{2000}} = 0.0111 \]
\[ np = 2000 \times 0.55 = 1100, \quad nq = 2000 \times 0.45 = 900 \]

Because np and nq are both greater than 5, the sampling distribution of p̂ is approximately normal.

7.6 Applications of Sampling Distribution of p̂
When we conduct a study, we usually take only one sample and make all decisions or inferences on the basis of the results of that one sample. We use the concepts of the mean, standard deviation, and shape of the sampling distribution of p̂ to determine the probability that the value of p̂ computed from one sample falls within a given interval.
For applications, first find the mean and standard deviation of p̂,

\[ \mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \]

Then work on the new distribution of p̂
Finding probability such as Examples 7-10 & 7-11
  Sketch the normal curve
  Shade the area
  Use table, calculator, or Excel to find the probability
Example 7-10
According to a Pew Research Center nationwide telephone survey of American adults conducted between March 15 and April 24, 2011, 75% of adults said that college education has become too expensive for most people and they cannot afford it (Time, May 30, 2011). Suppose that this result is true for the current population of American adults. Let p̂ be the proportion in a random sample of 1400 adult Americans who will hold the said opinion. Find the probability that 76.5% to 78% of adults in this sample will hold this opinion.
Solution: n = 1400, p = 0.75, so q = 1 - p = 0.25

\[ \mu_{\hat{p}} = p = 0.75 \]
\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.75 \times 0.25}{1400}} = 0.01157275 \]
\[ \hat{p} \sim N(\mu_{\hat{p}}, \sigma_{\hat{p}}^2) \]
\[ z = \frac{0.765 - 0.75}{0.01157275} = 1.30 \quad \text{and} \quad z = \frac{0.78 - 0.75}{0.01157275} = 2.59 \]
\[ P(0.765 < \hat{p} < 0.78) = P(1.30 < z < 2.59) = 0.0920 \]

Example 7-11
Maureen Webster, who is running for mayor in a large city, claims that she is favored by 53% of all eligible voters of that city. Assume that this claim is true. What is the probability that in a random sample of 400 registered voters taken from this city, less than 49% will favor Maureen Webster?
Solution: n = 400, p = .53, and q = 1 - p = 1 - .53 = .47

\[ \mu_{\hat{p}} = p = .53 \]
\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.53)(.47)}{400}} = .02495496 \]
\[ z = \frac{.49 - .53}{.02495496} = -1.60 \]
\[ P(\hat{p} < .49) = P(z < -1.60) = .0548 \]
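Examples 7-10 and 7-11 follow the same recipe, which can be sketched as one function. This is a minimal illustration assuming the normal approximation holds (np > 5 and nq > 5). The name `prob_p_hat_between` and its optional `low`/`high` arguments are hypothetical choices, not part of the slides.

```python
import math
from statistics import NormalDist

def prob_p_hat_between(p, n, low=None, high=None):
    """P(low < p-hat < high) under the normal approximation.
    Leave `low` or `high` as None for a one-sided probability."""
    sd = math.sqrt(p * (1 - p) / n)          # sigma_p-hat = sqrt(pq/n)
    dist = NormalDist(p, sd)                 # p-hat ~ N(p, sd^2)
    lower = 0.0 if low is None else dist.cdf(low)
    upper = 1.0 if high is None else dist.cdf(high)
    return upper - lower

# Example 7-10: P(0.765 < p-hat < 0.78) with p = 0.75, n = 1400
print(round(prob_p_hat_between(0.75, 1400, 0.765, 0.78), 4))

# Example 7-11: P(p-hat < 0.49) with p = 0.53, n = 400
print(round(prob_p_hat_between(0.53, 400, high=0.49), 4))
```

The printed values differ slightly from the table-based answers of 0.0920 and .0548, because the table lookup rounds z to two decimal places while the function uses the unrounded z.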
Summary
Introduction to the sampling distributions of x̄ and p̂
These two distributions are both approximately normal with means and standard deviations as follows
For x̄:

\[ \mu_{\bar{x}} = \mu, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, \quad \bar{x} \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}}^2) \]

For p̂:

\[ \mu_{\hat{p}} = p, \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}, \quad \hat{p} \sim N(\mu_{\hat{p}}, \sigma_{\hat{p}}^2) \]

Applications of these two sampling distributions