STP 226 Brief Lecture Notes, Instructor Ela Jackiewicz
CHAPTER 7
THE SAMPLING DISTRIBUTION OF THE MEAN
7.1 Sampling Error; The Need for Sampling Distributions
Sampling Error – the error resulting from using a sample characteristic (statistic) to
estimate a population characteristic (parameter).
Example 1 (Household Income): We try to estimate the population mean income, μ, of all
U.S. households by the sample mean income, x̄, of the 60,000 households surveyed. In
1998, it was reported to be $51,855 (x̄) in CPR (Current Population Reports).
Question: How accurate is our sample mean (estimate) likely to be? What is the
probability that a sample mean from an SRS of 60,000 households will estimate the
population mean household income with an error of no more than $1000?
To answer these questions we need to examine the sampling distribution of x̄.

Sampling Distribution: The distribution of a statistic, or the distribution of all
possible observations of the statistic for samples of a given size.
Example 2. Weights of a certain breed of dog. Below is an unrealistically small
population of 5 dogs and their weights in pounds.
Dog      A    B    C    D    E
Weight   42   48   52   58   60
The population mean weight is
μ = Σx / N = (42 + 48 + 52 + 58 + 60) / 5 = 52 pounds,
and the population standard deviation is σ = 6.57 pounds.
The possible samples of size two and their means are summarized in the following table.
Samples   A,B     A,C     A,D     A,E     B,C     B,D     B,E     C,D     C,E     D,E
Weights   42,48   42,52   42,58   42,60   48,52   48,58   48,60   52,58   52,60   58,60
x̄         45      47      50      51      50      53      54      55      56      59
Any sample of size 2 we take is going to be one of the above 10 possible samples, so
the probability of obtaining each sample is 1/10. (The value x̄ = 50 arises from two
samples, so its probability is 2/10.)
How likely is it that the mean of a sample of size two will estimate the population
mean weight to within 2 pounds?
In other words, what is P(50 ≤ x̄ ≤ 54)?
Since 5 of the 10 samples ({A,D}, {A,E}, {B,C}, {B,D} and {B,E}) have means within 2
pounds of the population mean 52, P(50 ≤ x̄ ≤ 54) = 5/10 = 50%.
We can also compute the mean and standard deviation of all x̄ values. They are:
μ_x̄ = 52 and σ_x̄ = 4.02
Notice that the mean of all x̄ values is the same as the population mean, and their
standard deviation is smaller than the population standard deviation.
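The enumeration above can be checked with a short script. This is only a sketch in Python (the notes themselves use no software); the variable names are made up for illustration. It lists every sample of size 2 from the five dog weights, then computes P(50 ≤ x̄ ≤ 54) and the mean and standard deviation of the sample means.

```python
from itertools import combinations
from statistics import mean, pstdev

# Population from Example 2: dog -> weight in pounds
weights = {"A": 42, "B": 48, "C": 52, "D": 58, "E": 60}

# Means of all 10 unordered samples of size 2 (each sample equally likely)
sample_means = [mean(pair) for pair in combinations(weights.values(), 2)]

# Share of samples whose mean lies within 2 pounds of the population mean 52
within_2 = sum(1 for m in sample_means if 50 <= m <= 54)
prob_within_2 = within_2 / len(sample_means)

# Mean and (population-style) standard deviation of the sample means
mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)

print(sorted(sample_means))
print(prob_within_2, mu_xbar, round(sigma_xbar, 2))
```

`pstdev` is used rather than `stdev` because the 10 sample means are the complete set of possible values, not a sample from them.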
Now we repeat our example for samples of size 4.
The possible samples of size four and their means are summarized in the following table.
Samples   A,B,C,D       A,B,C,E       A,B,D,E       A,C,D,E       B,C,D,E
Weights   42,48,52,58   42,48,52,60   42,48,58,60   42,52,58,60   48,52,58,60
x̄         50            50.5          52            53            54.5
Any sample of size 4 we take is going to be one of the above 5 possible samples, so
the probability of obtaining each value of x̄ is 1/5.
This time 4 of the 5 sample means lie within 2 pounds of 52, so
P(50 ≤ x̄ ≤ 54) = 4/5 = 80%
We can also compute the mean and standard deviation of all x̄ values. They are:
μ_x̄ = 52 and σ_x̄ = 1.64
In conclusion, we can clearly see that the mean of the distribution of x̄ remains the
same as the population mean, regardless of the sample size. The standard deviation of
that distribution decreases as n increases.
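To see this pattern for every possible sample size, not just n = 2 and n = 4, the same enumeration can be repeated in a loop. The following Python sketch assumes the five dog weights from Example 2; `sampling_distribution` is a helper name invented here.

```python
from itertools import combinations
from statistics import mean, pstdev

weights = [42, 48, 52, 58, 60]  # population from Example 2

def sampling_distribution(population, n):
    """Means of all unordered samples of size n drawn without replacement."""
    return [mean(sample) for sample in combinations(population, n)]

summary = {}
for n in range(1, len(weights) + 1):
    means = sampling_distribution(weights, n)
    # Store (mean of sample means, SD of sample means) for this sample size
    summary[n] = (mean(means), pstdev(means))
    print(f"n={n}: samples={len(means)}, "
          f"mean={summary[n][0]:.2f}, SD={summary[n][1]:.2f}")
```

The mean stays at 52 for every n, while the standard deviation shrinks, reaching 0 at n = 5 where the only possible sample is the whole population.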
Sampling Distribution of the sample mean - the distribution of the variable x̄ (i.e., of
all possible sample means) for a given variable x.
Sample size and Sampling Error
As the sample size increases, the sample means cluster more closely around the
population mean, and the sampling error in estimating μ by x̄ becomes smaller.
What do we do in practice?
Large and unknown population -> Obtaining the exact sampling distribution is not feasible.
But we can find an approximate sampling distribution of the sample mean for any
underlying population distribution, which we discuss in 7.2 and 7.3.
7.2 The Mean and Standard Deviation of x̄
We use the sampling distribution of the sample mean to make inferences about a
population mean based on the mean of a sample from the population. But generally we
do not know the exact distribution of the sample mean (sampling distribution).
Under certain conditions, we can approximate the sampling distribution of the sample
mean (x̄) by the normal distribution. A normal distribution is determined by its mean and
standard deviation, so let us denote its mean by μ_x̄ and its standard deviation by σ_x̄.
Mean of the variable x̄
For samples of size n, the mean of the variable x̄ equals the mean of the variable x
under consideration (the mean of all possible sample means equals the population mean):
μ_x̄ = μ
Standard Deviation of the variable x̄
For samples of size n, the standard deviation of the variable x̄ equals the standard
deviation of the variable under consideration divided by the square root of the sample
size (the standard deviation of all possible sample means equals the population standard
deviation divided by the square root of the sample size):
σ_x̄ = σ / √n
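Both formulas can be verified exactly on a small population. The Python sketch below reuses the five dog weights from Example 2 as an assumed population and enumerates all ordered samples of size 2 drawn with replacement, so that the observations are independent. This is the setting in which σ_x̄ = σ/√n holds exactly; it is also a good approximation when the population is much larger than the sample. (In Example 2 the samples were drawn without replacement from only 5 dogs, which is why the value there was 4.02 rather than 6.57/√2 ≈ 4.65.)

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

weights = [42, 48, 52, 58, 60]  # assumed population (Example 2)
mu = mean(weights)
sigma = pstdev(weights)

n = 2
# All 5**n ordered samples drawn WITH replacement, each equally likely
means = [mean(sample) for sample in product(weights, repeat=n)]

mu_xbar = mean(means)
sigma_xbar = pstdev(means)

print(mu_xbar, mu)                                      # mean of x-bar vs. mu
print(round(sigma_xbar, 4), round(sigma / sqrt(n), 4))  # SD of x-bar vs. sigma/sqrt(n)
```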
Sample Size and Sampling Error
1. The larger the sample size, the smaller the standard deviation of x̄.
2. The smaller the standard deviation of x̄, the more closely its possible values
cluster around the mean of x̄.
3. The mean of x̄ is the same as the population mean: μ_x̄ = μ
NOTE: The standard deviation of x̄ determines the amount of sampling error to be
expected when a population mean is estimated by a sample mean, so it is often referred
to as the standard error of the sample mean.
Standard error (SE) of a statistic – the standard deviation of that statistic
7.3 The Sampling Distribution of the Mean
Sampling Distribution of the Mean for a Normally Distributed Variable
• If the variable x of a population is normally distributed with mean μ and standard
deviation σ, then, for any sample of size n ≥ 1, the variable x̄ is also normally
distributed with mean μ and standard deviation σ / √n.

The Central Limit Theorem (CLT) – one of the most important theorems in
statistics
For a relatively large sample size, the variable x̄ is approximately normally distributed,
regardless of the distribution of the variable x under consideration. The approximation
becomes better and better with increasing sample size.
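A small simulation can illustrate the theorem. The Python sketch below is not part of the original notes: the exponential population (mean 1, SD 1, strongly right-skewed), the sample size n = 40, the number of repetitions and the seed are all assumptions chosen for illustration. If x̄ is approximately normal, about 95% of the simulated sample means should fall within 1.96 standard errors of μ.

```python
import random
from math import sqrt

random.seed(226)  # arbitrary seed, only for reproducibility

mu, sigma = 1.0, 1.0  # mean and SD of the assumed exponential(1) population
n = 40                # sample size (assumed)
reps = 20_000         # number of simulated samples (assumed)

# Draw `reps` samples of size n and record each sample mean
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

se = sigma / sqrt(n)  # theoretical standard error of x-bar
mean_of_means = sum(sample_means) / reps
sd_of_means = sqrt(sum((m - mean_of_means) ** 2 for m in sample_means) / reps)

# Fraction of sample means within 1.96 standard errors of mu
within = sum(1 for m in sample_means if abs(m - mu) <= 1.96 * se) / reps

print(round(mean_of_means, 3), round(sd_of_means, 3), round(se, 3))
print(round(within, 3))
```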
The Sampling Distribution of the Sample Mean
If a variable x of a population has mean μ and standard deviation σ, then for samples of
size n,
1. The mean of x̄ equals the population mean: μ_x̄ = μ
2. The standard deviation of x̄ (standard error of x̄) equals the population
standard deviation divided by the square root of the sample size: σ_x̄ = σ / √n
3. If x is normally distributed, then so is x̄, regardless of sample size.
4. If the sample size is large (roughly 30 or more), then x̄ is
approximately normally distributed, regardless of the distribution of x.
The following graphs represent the distribution of IQ scores (X) in some population (a)
and the sampling distributions of the sample mean (x̄) for n = 4 (b) and n = 16 (c).
Notice that all three distribution curves are centered at 100 (μ) and that the graphs
become narrower as the sample size increases.
The following graph illustrates the distribution of household sizes in the USA (clearly
not a normal distribution).
This histogram illustrates the distribution of the sample means of 1000 samples
of size n = 30. The distribution is clearly nearly normal.
Example.
Let Y be the height of males in a certain population. Assume Y has an approximately
normal distribution with mean μ = 69.7 inches and SD σ = 2.8 inches.
a) Suppose we randomly select one individual from that population, what is the
probability that his height will exceed 72 inches?
P(Y > 72) = P(Z > 0.82) = 0.2061, where Z = (72 − 69.7) / 2.8 = 0.82,
and the probability equals the area under N(69.7, 2.8) to the right of 72.
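The table lookup in part a) can be cross-checked in software. Below is a minimal Python sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative. Because it does not round z to two decimals first, its result differs slightly from the table-based 0.2061.

```python
from statistics import NormalDist

# Height of one randomly selected male: Y ~ N(69.7, 2.8)
height = NormalDist(mu=69.7, sigma=2.8)

z = (72 - height.mean) / height.stdev  # standardized value of 72 inches
p_exceeds_72 = 1 - height.cdf(72)      # area to the right of 72

print(round(z, 2), round(p_exceeds_72, 4))
```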
b) Suppose we randomly select a sample of 4 individuals from that population. What is
the probability that their average height ( Ȳ ) will estimate population mean with an
error of no more than 1 inch?
P(68.7 ≤ Ȳ ≤ 70.7) = P(−0.71 ≤ Z ≤ 0.71) = 0.5223, where
−0.71 = (68.7 − 69.7) / (2.8/√4) and 0.71 = (70.7 − 69.7) / (2.8/√4),
and the probability equals the area under N(69.7, 2.8/√4 = 1.4) between 68.7 and 70.7.
c) Without computations, will the answer in part b change or will it remain the same if
we take a sample of size 16?
If n = 16, the distribution of Ȳ will be N(69.7, 2.8/√16 = 0.7), so it will be narrower
than the distribution for n = 4. The percentage of all Ȳ values within 1 inch of the
mean will be larger, so our probability will increase.
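Parts b) and c) can be compared numerically. The following Python sketch assumes, as in the example, that Y is normal with μ = 69.7 and σ = 2.8; `prob_within` is a helper name invented here. It works with unrounded z-values, so the n = 4 result is close to, but not exactly, the table-based 0.5223.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 69.7, 2.8  # population mean and SD from the example

def prob_within(error, n):
    """P(|Ybar - mu| <= error) when Ybar ~ N(mu, sigma / sqrt(n))."""
    ybar = NormalDist(mu, sigma / sqrt(n))
    return ybar.cdf(mu + error) - ybar.cdf(mu - error)

p4 = prob_within(1, 4)    # part b: sample of 4
p16 = prob_within(1, 16)  # part c: sample of 16

print(round(p4, 4), round(p16, 4))
```

The larger sample gives the larger probability, in line with the reasoning in part c).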
d) Suppose our population was not normal, but severely left-skewed. What would be the
answers to questions b and c?
If the population is not normal, we need to use the Central Limit Theorem, but that
requires us to have a large sample (of size at least 30). If samples are as small as 4
and 16, we can't assume that the distribution of Ȳ will be normal or approximately
normal, so we can't answer the questions in either part.
Example (Central Limit Theorem)
Based on service records from the past year, the time (in hours) that a technician
requires to complete preventative maintenance on an air conditioner follows the
distribution that is strongly right-skewed, and whose most likely outcomes are close to 0.
The mean time is µ = 1 hour and the standard deviation is σ = 1.
Your company will service an SRS of 70 air conditioners.
You budgeted 1.1 hours per unit. Will that be enough?
The Central Limit Theorem states that the sampling
distribution of the mean time spent working on 70 units
is approximately normal with mean 1 and SD 1/√70 ≈ 0.12
(since n = 70 > 30).
z = (1.1 − 1) / 0.12 = 0.83
P(x̄ > 1.1) = P(Z > 0.83) = 1 − 0.7967 = 0.2033
If you budget 1.1 hours per unit, there is
over a 20% chance that the technicians
will not complete the work within the budgeted
time.
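The same calculation can be sketched in Python with `statistics.NormalDist`, relying on the CLT's normal approximation for x̄ just as the example does. The variable names are illustrative. Since the standard error is not rounded to 0.12 here, the probability comes out slightly below the table-based 0.2033, but it is still roughly 20%.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.0, 1.0, 70  # mean, SD (hours) and sample size from the example

se = sigma / sqrt(n)         # standard error of the mean service time
xbar = NormalDist(mu, se)    # CLT approximation to the distribution of x-bar

# Probability that the average time per unit exceeds the 1.1-hour budget
p_over_budget = 1 - xbar.cdf(1.1)

print(round(se, 4), round(p_over_budget, 4))
```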