SAMPLING DISTRIBUTIONS
POPULATION AND SAMPLES, PARAMETERS AND STATISTICS
RECALL!
A POPULATION is the set of all possible subjects of a given experiment or study.
A SAMPLE is a specially-chosen, relatively small subset of the population that is
used for actual measurements. Usually, a RANDOM SAMPLE is obtained.
[Diagram: a POPULATION with MEAN (μ) and ST.DEV. (σ), from which SAMPLE 1, SAMPLE 2 and SAMPLE 3 are drawn. Each sample has its own MEAN (X̄1, X̄2, X̄3) and ST.DEV. (s1, s2, s3).]
NOTE!
For a given experiment, the measurements on the population are fixed quantities
(constants). These measurements are called PARAMETERS (of the population).
For each sample, the corresponding measurements are variable quantities (they vary from
sample to sample). These measurements are called (sample) STATISTICS.
DISTRIBUTION OF A STATISTIC (MEANS)
EXPLAIN PLEASE!
Suppose we have the following population data:
2, 4, 6, 8
Computing the mean and standard deviation, we get:
μ=5
σ=2.24
Now, we list every possible sample of size N=2 (drawn with replacement), compute their means,
and make a frequency table and histogram for this list of sample means.
SAMPLE   MEAN     SAMPLE   MEAN     SAMPLE   MEAN     SAMPLE   MEAN
2, 2     2        4, 2     3        6, 2     4        8, 2     5
2, 4     3        4, 4     4        6, 4     5        8, 4     6
2, 6     4        4, 6     5        6, 6     6        8, 6     7
2, 8     5        4, 8     6        6, 8     7        8, 8     8

SAMPLE MEAN   FREQ.
2             1
3             2
4             3
5             4
6             3
7             2
8             1

[Histogram: frequency (0 to 4) against sample mean (2 to 8), rising to a peak at 5 and falling symmetrically.]
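The listing above can be reproduced with a short script. This is only a sketch, assuming the samples are ordered pairs drawn with replacement (which is what the 16 listed samples suggest); the variable names are illustrative.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

population = [2, 4, 6, 8]

# Population parameters (pstdev uses the population formula, dividing by N).
mu = mean(population)
sigma = pstdev(population)

# Every ordered sample of size 2, drawn with replacement: 4 * 4 = 16 samples.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

# Frequency table of the sample means, printed as a text histogram.
freq = Counter(sample_means)
for value in sorted(freq):
    print(f"{value:>4}  {'#' * freq[value]}  ({freq[value]})")

print("population mean:", mu)
print("population st.dev.:", round(sigma, 2))
print("mean of sample means:", mean(sample_means))
print("st.dev. of sample means:", round(pstdev(sample_means), 4))
print("sigma / sqrt(2):", round(sigma / 2 ** 0.5, 4))
```

The last two printed values agree, which previews the rule stated in the next section: the standard deviation of the sample means is the population standard deviation divided by the square root of the sample size.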
THE SAMPLING DISTRIBUTION OF MEANS
SAMPLING DISTRIBUTION OF THE MEANS
If we can obtain all samples of a fixed size N≥30 from ANY POPULATION with mean μ
and standard deviation σ, then the distribution of the sample means is approximately normal with:
mean $\mu_{\bar{X}} = \mu$ and
standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$
For a NORMAL POPULATION, the samples can be of any size!
[Diagram: a POPULATION with MEAN: μ=12.2 and ST.DEV.: σ=2.41, from which SAMPLE 1, SAMPLE 2, SAMPLE 3, SAMPLE 4, SAMPLE 5, SAMPLE 6, ... are drawn, each of SIZE N=35.]
MEANS of the samples shown:
$\bar{X}_1 = 12.3$, $\bar{X}_2 = 11.7$, $\bar{X}_3 = 12.8$, $\bar{X}_4 = 10.9$, $\bar{X}_5 = 13.8$, $\bar{X}_6 = 10.2$, ...
The DATA SET of all the SAMPLE MEANS
(each sample of fixed size, N=35):
$\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \bar{X}_5, \bar{X}_6, \bar{X}_7, \bar{X}_8, \bar{X}_9, \bar{X}_{10}, \ldots$
has a NORMAL DISTRIBUTION.
Also, for this DATA SET of SAMPLE MEANS:
MEAN $\mu_{\bar{X}} = \mu = 12.2$
STANDARD DEVIATION $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{2.41}{\sqrt{35}} \approx 0.41$
NOTE!
This DATA SET of the MEANS of all possible samples (of fixed size N) that can be
drawn from a given population is called the SAMPLING DISTRIBUTION OF MEANS.
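A quick simulation can make this concrete. The sketch below is only an illustration: it assumes a uniform (clearly non-normal) population chosen to have μ=12.2 and σ=2.41, and it draws 10,000 random samples of size N=35 instead of all possible samples. The population shape, the seed and the number of samples are assumptions, not part of the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA, N = 12.2, 2.41, 35   # population mean, st.dev. and sample size from the slide
NUM_SAMPLES = 10_000            # assumption: how many random samples to simulate

# Assumption: a uniform population on [low, high].
# A uniform distribution has st.dev. (high - low) / sqrt(12), so the
# half-width that gives st.dev. SIGMA is SIGMA * sqrt(3).
half_width = SIGMA * sqrt(3)
low, high = MU - half_width, MU + half_width

rng = random.Random(0)          # fixed seed so the run is repeatable

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [rng.uniform(low, high) for _ in range(N)]
    sample_means.append(sum(sample) / N)

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)

print("mean of the sample means:   ", round(mean_of_means, 2))
print("st.dev. of the sample means:", round(sd_of_means, 2))
print("sigma / sqrt(N):            ", round(SIGMA / sqrt(N), 2))
```

The simulated mean and standard deviation of the sample means should land close to μ and σ/√N, even though the population itself is not normal.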
EXAMPLE 1. The sardines delivered to the Eugenio’s Cannery have a mean length of 4.54 ins.,
and a standard deviation of 1.03 ins.
A. Suppose these lengths are found to be normally distributed. What percentage of sardines
delivered to the cannery are longer than 5 ins?
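MEAN = 4.54
ST.DEV. = 1.03
Convert the data value to a Z-score:
$X = 5$: $Z = \frac{5 - 4.54}{1.03} \approx 0.45$
[Normal curve: Length (in) axis with 4.54 and 5 marked, z-score axis with 0 and 0.45 marked; the area to the right of 5 is shaded.]
Shaded section:
P(Z>0.45) = 1 – P(Z<0.45)
= 1 – 0.6736
= 0.3264 or 32.64%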
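The same tail area can be checked without a z-table. This sketch uses `NormalDist` from Python's standard `statistics` module; the result may differ slightly from a table lookup because the z-score is not rounded to two decimals first.

```python
from statistics import NormalDist

# Lengths of individual sardines: normal with mean 4.54 and st.dev. 1.03.
lengths = NormalDist(mu=4.54, sigma=1.03)

# z-score of a 5-inch sardine.
z = (5 - lengths.mean) / lengths.stdev

# P(X > 5) = 1 - P(X < 5)
p_longer = 1 - lengths.cdf(5)

print("z-score:", round(z, 2))
print("P(length > 5):", round(p_longer, 4))
```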
B. If the sardines are delivered in plastic bags (15 per bag), what percentage of these bags
contain (sardines with) a mean length, less than 4.9 ins?
THINK!
The sardines are packed in plastic bags (15 each), so the bags are
samples of fixed size N=15.
Length of sardines = NORMAL, so
the sampling distribution of means (of lengths) = NORMAL
with mean $\mu_{\bar{X}} = \mu = 4.54$
and st.dev. $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{1.03}{\sqrt{15}} \approx 0.27$
[Normal curve of mean length: MEAN = 4.54, ST.DEV. = 0.27.]
B. (Continuation)
MEAN = 4.54
ST.DEV. = 0.27
Convert the data value to a Z-score:
$\bar{X} = 4.9$: $Z = \frac{4.9 - 4.54}{0.27} \approx 1.33$
[Normal curve: Length (in) axis with 4.54 and 4.9 marked, z-score axis with 0 and 1.33 marked; the area to the left of 4.9 is shaded.]
Shaded section:
P(Z<1.33)
= 0.9082 or 90.82%
C. If someone claims that he has found a plastic bag of rather large sardines with mean length
above 5.2 ins, would you believe his claim?
MEAN = 4.54
ST.DEV. = 0.27
Convert the data value to a Z-score:
$\bar{X} = 5.2$: $Z = \frac{5.2 - 4.54}{0.27} \approx 2.44$
[Normal curve: Length (in) axis with 4.54 and 5.2 marked, z-score axis with 0 and 2.44 marked; the area to the right of 5.2 is shaded.]
Shaded section:
P(Z>2.44) = 1 – P(Z<2.44)
= 1 – 0.9927
= 0.0073 or 0.73%
The probability of finding a bag of sardines with mean length above 5.2 ins. is very close to 0, so the claim is hard to believe.
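Parts B and C both work with the sampling distribution of the bag means, so they can be checked together. A minimal sketch, again using the standard-library `NormalDist`; because the standard error is not rounded to 0.27 here, the printed values can differ a little from the table-based answers.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 4.54, 1.03, 15   # population mean, st.dev. and sardines per bag

# Sampling distribution of the mean length per bag.
standard_error = SIGMA / sqrt(N)
bag_means = NormalDist(MU, standard_error)

# Part B: share of bags with mean length below 4.9.
p_below_4_9 = bag_means.cdf(4.9)

# Part C: probability of a bag with mean length above 5.2.
p_above_5_2 = 1 - bag_means.cdf(5.2)

print("st.dev. of bag means:", round(standard_error, 2))
print("P(mean < 4.9):", round(p_below_4_9, 4))
print("P(mean > 5.2):", round(p_above_5_2, 4))
```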
EXAMPLE 2. A. C. Nielsen reported that children between the ages of 2 and 5 watch an average
of 25 hours of television per week, with standard deviation of 3 hours per week.
A. If 40 children (ages 2 to 5) are randomly selected, what is the probability that the mean no. of
hours they watch television is less than 24.6 hours?
THINK!
40 children (ages 2 to 5) are randomly selected, so we are dealing with
samples (of children) of size N=40.
Since the sample size N=40 ≥ 30, the sampling distribution of the
mean no. of hours watching TV = NORMAL
with mean $\mu_{\bar{X}} = \mu = 25$
and st.dev. $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{3}{\sqrt{40}} \approx 0.47$
Convert the data value to a Z-score:
$\bar{X} = 24.6$: $Z = \frac{24.6 - 25}{0.47} \approx -0.85$
[Normal curve of mean no. of hours on TV: MEAN = 25, ST.DEV. = 0.47; 24.6 and 25 marked, z-score axis with –0.85 and 0 marked; the area to the left of 24.6 is shaded.]
Shaded section:
P(Z<–0.85) = 0.1977 or 19.77%
B. If 35 children (ages 2 to 5) are randomly selected, what is the probability that the mean no. of
hours they watch television is more than 25.9 hours?
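MEAN $\mu_{\bar{X}} = \mu = 25$
ST.DEV. $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{3}{\sqrt{35}} \approx 0.51$
Convert the data value to a Z-score:
$\bar{X} = 25.9$: $Z = \frac{25.9 - 25}{0.51} \approx 1.76$
[Normal curve of mean no. of hours on TV: 25 and 25.9 marked, z-score axis with 0 and 1.76 marked; the area to the right of 25.9 is shaded.]
Shaded section: P(Z>1.76)
= 1 – P(Z<1.76)
= 1 – 0.9608
= 0.0392 or 3.92%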
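Both parts of this example follow the same recipe with a different sample size, so a small helper keeps the standard error tied to the right N. This is a sketch under the slide's assumption that N≥30 makes the sample means normal; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 25, 3   # population mean and st.dev. of weekly TV hours

def sampling_distribution(n):
    """Normal model for the mean of n children (assumes n is large enough)."""
    return NormalDist(MU, SIGMA / sqrt(n))

# Part A: N = 40, probability that the sample mean is below 24.6 hours.
p_a = sampling_distribution(40).cdf(24.6)

# Part B: N = 35, probability that the sample mean is above 25.9 hours.
p_b = 1 - sampling_distribution(35).cdf(25.9)

print("A. P(mean < 24.6):", round(p_a, 4))
print("B. P(mean > 25.9):", round(p_b, 4))
```

Note that the standard error changes with the sample size: 3/√40 for part A but 3/√35 for part B.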
NOTE!
If we get P(X)≤0.1000, it means that the event X is UNUSUAL (unlikely to happen by chance alone).
An assumption under which the observed event would be unusual can be rejected (much like a contradiction).
ESTIMATION OF PARAMETERS USING INTERVALS
EXAMPLE 1. In 36 sea water samples, the mean salt concentration was 23 cm³/m³. The st.dev. of
salt concentration in all sea waters is known to be approximately 2.83 cm³/m³. How can
we estimate the mean salt concentration in all sea waters?
THINK!
In this case, we know the population st.dev. as: σ=2.83 cm³/m³
We want to estimate the population mean as: μ=?? cm³/m³
We know that one sample mean is: $\bar{X} = 23$ cm³/m³
So, assume that the population mean is: μ=23 cm³/m³
But we are not 100% sure that μ=23!
The actual mean is somewhere around μ=23!
To find our level of ‘sureness’, we can use the
SAMPLING DISTRIBUTION OF THE MEANS.
fixed sample size: N = 36 (>30)
mean of the sample means: $\mu_{\bar{X}} = \mu = 23$
st.dev. of the sample means: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{2.83}{\sqrt{36}} \approx 0.47$
Now, if I want, say, a 90% probability that
a mean-value will fall within a specific interval
around the assumed μ=23…
[Normal curve: MEAN = 23, ST.DEV. = 0.47; the central area 0.90 lies between the unknown values ?1 and ?2, which sit at z-scores –1.65 and 1.65.]
Find ?1 and ?2:
$-1.65 = \frac{?_1 - 23}{0.47}$, so ?1 = 22.22
$1.65 = \frac{?_2 - 23}{0.47}$, so ?2 = 23.78
Finally! We can say: I am 90% certain that the
population mean is between 22.22 and 23.78 cm³/m³!
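The interval can also be computed directly. The sketch below uses the σ=2.83 that the worked solution uses and obtains the critical z-value from `NormalDist().inv_cdf` instead of a table (about 1.645 where the slide rounds to 1.65); the function name `z_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Interval around sample_mean covering the given central probability."""
    tail = (1 - confidence) / 2              # area left in each tail
    z = NormalDist().inv_cdf(1 - tail)       # critical z-score for that tail
    margin = z * sigma / sqrt(n)             # z times the st.dev. of the sample means
    return sample_mean - margin, sample_mean + margin

low, high = z_interval(sample_mean=23, sigma=2.83, n=36, confidence=0.90)
print(f"90% interval: {low:.2f} to {high:.2f}")
```

Raising `confidence` (for example to 0.95) widens the interval, since a larger central area needs a larger critical z-score.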
TESTING HYPOTHESIS: WHEN TO REJECT A CLAIM
EXAMPLE 1. A manufacturer advertises that its new hybrid car has a mean gas mileage of at
least 50 mi/gal. To test this, you drove a random sample of 33 such vehicles and
computed a mean of 47 mi/gal. If the standard deviation of the gas mileages of all
such cars is 5.8 mi/gal, how do we know if we can reject the ad?
THINK!
In this case, we know the population st.dev. as: σ=5.8 mi/gal
We assume that the ad is true: (A) μ≥50 mi/gal (μ is AT LEAST 50)
But, if the ad is not true: (B) μ<50 mi/gal
We have to test if the sample we have found is
‘possible’ under the assumption. We use the
SAMPLING DISTRIBUTION OF THE MEANS.
fixed sample size: N = 33 (>30)
mean of the sample means: $\mu_{\bar{X}} = \mu = 50$
st.dev. of the sample means: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{5.8}{\sqrt{33}} \approx 1.01$
Here, I admit that I may be wrong in rejecting the
assumption! So, I say: there is a small probability
(say 5%) that a mean-value can be found to be too
low for the stated assumption: μ≥50 mi/gal
[Normal curve: MEAN = 50, ST.DEV. = 1.01; the lower-tail area 0.05 lies to the left of the z-score –1.65.]
Find the z-score of the sample mean:
$Z = \frac{47 - 50}{1.01} \approx -2.97$
Since Z = –2.97 is below –1.65, the sample mean falls inside the 5% tail.
With the assumption and my own admission of
a possible mistake at 5% probability, I can reject
the assumption! (Even allowing for that 5% probability,
I have still found a sample with a mean too low!)
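The decision rule can be written out as a short script. This is a sketch of a one-sided z-test using the figures from the example (claimed mean 50, σ=5.8, N=33, sample mean 47, 5% tail); the variable names are illustrative, and the critical value comes from `NormalDist().inv_cdf` (about –1.645) where the slide uses the rounded –1.65.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 50          # mean claimed by the ad (the assumption being tested)
SIGMA = 5.8        # population st.dev. of gas mileage
N = 33             # sample size
SAMPLE_MEAN = 47   # observed sample mean
ALPHA = 0.05       # probability of wrongly rejecting a true claim

# z-score of the observed sample mean under the assumption mu = 50.
standard_error = SIGMA / sqrt(N)
z = (SAMPLE_MEAN - MU_0) / standard_error

# Lower-tail critical value: 5% of sample means fall below this z-score.
z_critical = NormalDist().inv_cdf(ALPHA)

# Probability of a sample mean this low (or lower) if the claim were true.
p_value = NormalDist().cdf(z)

reject = z < z_critical
print("z =", round(z, 2))
print("critical z =", round(z_critical, 3))
print("reject the ad's claim:", reject)
```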
OTHER SAMPLE STATISTICS BESIDES THE MEAN
NOTE!
For EACH SAMPLE (of FIXED SIZE N) of A GIVEN POPULATION the most common
SAMPLE STATISTICS that can be calculated are the following:
MEAN
$\bar{X}$ is simply the mean of the sample
Z-STATISTIC
$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}}$
where
$\bar{X}$ is the mean of the sample
μ is the mean of the population
σ is the st.dev. of the population
T-STATISTIC
$t = \frac{\bar{X} - \mu}{s / \sqrt{N}}$
where
$\bar{X}$ is the mean of the sample
μ is the mean of the population
s is the st.dev. of the sample
χ2-STATISTIC
$\chi^2 = \frac{(N - 1)s^2}{\sigma^2}$
where
σ is the st.dev. of the population
s is the st.dev. of the sample
NOTE!
Each of these SAMPLE STATISTICS possesses a specific SAMPLING DISTRIBUTION.
We have discussed the SAMPLING DISTRIBUTION OF THE MEANS before.
As for the Z-STATISTIC, if a SAMPLING DISTRIBUTION OF A STATISTIC is NORMAL,
then we just convert that STATISTIC to Z-SCORE and use the standard normal.
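To see how the three statistics are computed from one sample, here is a small sketch. The sample `[4, 6, 8, 8]` and the population values μ=5 and σ=√5 are hypothetical numbers chosen for illustration; `stdev` and `variance` from the `statistics` module use the sample formulas (dividing by N – 1).

```python
from math import sqrt
from statistics import mean, stdev, variance

def z_statistic(sample, mu, sigma):
    """(sample mean - mu) / (sigma / sqrt(N)), using the population st.dev."""
    return (mean(sample) - mu) / (sigma / sqrt(len(sample)))

def t_statistic(sample, mu):
    """(sample mean - mu) / (s / sqrt(N)), using the sample st.dev. s."""
    return (mean(sample) - mu) / (stdev(sample) / sqrt(len(sample)))

def chi2_statistic(sample, sigma):
    """(N - 1) * s^2 / sigma^2, comparing sample and population variance."""
    return (len(sample) - 1) * variance(sample) / sigma ** 2

# Hypothetical sample of size N = 4 and hypothetical population parameters.
sample = [4, 6, 8, 8]
mu, sigma = 5, sqrt(5)

z = z_statistic(sample, mu, sigma)
t = t_statistic(sample, mu)
chi2 = chi2_statistic(sample, sigma)

print("z =", round(z, 4))
print("t =", round(t, 4))
print("chi2 =", round(chi2, 4))
```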
THE SAMPLING DISTRIBUTION OF THE t-STATISTIC
t-STATISTIC
$t = \frac{\bar{X} - \mu}{s / \sqrt{N}}$
where
$\bar{X}$ is the mean of the sample
μ is the mean of the population
s is the st.dev. of the sample
NOTE!
The t-statistic is used for questions about the population mean where
the (population) standard deviation is not given, which is usually the case!
SAMPLING DISTRIBUTION OF THE t-STAT
If the distribution of sample means is normal,
then the data set of the t-statistic:
$t = \frac{\bar{X} - \mu}{s / \sqrt{N}}$
has the so-called t-distribution (with df = N – 1)
NOTE!
The t-distribution is symmetric around the mean,
at t=0. It is very similar to the standard normal!
The standard normal has exactly one form for all
sorts of normal data, while the t-distribution has one
form for each value of df (degrees of freedom).
[Plots: t-distribution curves for N = 10 (df = 9) and N = 65 (df = 64).]
From the previous section, we learned that, if the
population is normal, or if sample size N≥30,
then the distribution of the sample means is
normal! And so, the t-stat has (at least approximately) the t-distribution!
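A simulation shows why the t-distribution is needed for small samples. This is only a sketch: it assumes a standard normal population, samples of size N=10 (df = 9), a fixed random seed, and the tabulated two-tailed 5% critical value 2.262 for df = 9. If the t-stat behaved exactly like a standard normal, about 5% of its values would fall beyond ±1.96; in the simulation the share beyond ±1.96 is noticeably larger, while the share beyond ±2.262 is close to 5%.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA, N = 0.0, 1.0, 10   # assumption: standard normal population, df = 9
NUM_SAMPLES = 20_000          # assumption: number of simulated samples
T_CRIT_DF9 = 2.262            # two-tailed 5% critical value from a t-table (df = 9)
Z_CRIT = 1.96                 # two-tailed 5% critical value of the standard normal

rng = random.Random(1)        # fixed seed so the run is repeatable

t_values = []
for _ in range(NUM_SAMPLES):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    t_values.append((mean(sample) - MU) / (stdev(sample) / sqrt(N)))

# Share of t-statistics falling outside each pair of cut-offs.
frac_beyond_t = sum(abs(t) > T_CRIT_DF9 for t in t_values) / NUM_SAMPLES
frac_beyond_z = sum(abs(t) > Z_CRIT for t in t_values) / NUM_SAMPLES

print("share beyond +/-2.262:", round(frac_beyond_t, 3))
print("share beyond +/-1.96: ", round(frac_beyond_z, 3))
```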
THE SAMPLING DISTRIBUTION OF THE χ2-STATISTIC
χ2-STATISTIC
$\chi^2 = \frac{(N - 1)s^2}{\sigma^2}$
where
σ is the st.dev. of the population
s is the st.dev. of the sample
NOTE!
The χ2-statistic is used for questions about the
variance and standard deviation of the population data.
SAMPLING DISTRIBUTION OF THE χ2-STAT
If the population is normal,
then the data set of the χ2-statistic:
$\chi^2 = \frac{(N - 1)s^2}{\sigma^2}$
has the so-called χ2-distribution (with df = N – 1)
NOTE!
The χ2-distribution is asymmetric, unlike the
t-distribution and the standard normal.
Like the t-distribution, the χ2-distribution has one
form for each value of df (degrees of freedom).
[Plots: χ2-distribution curves for N = 10 (df = 9) and N = 15 (df = 14).]
From the previous section, we learned that, if the
population is normal, or if sample size N≥30,
then the distribution of the sample means is
normal! For the χ2-stat, it is the normality of the population that matters:
if the population is normal, the χ2-stat has the χ2-distribution, but a
large sample size (N≥30) alone does not guarantee this.
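The shape of this distribution can be explored by simulation. The sketch below assumes a normal population with hypothetical parameters μ=50 and σ=4, samples of size N=10 (df = 9) and a fixed seed. It relies on two known facts about the χ2-distribution: its mean equals df, and because it is skewed to the right its median lies below its mean.

```python
import random
from statistics import mean, median, variance

MU, SIGMA, N = 50.0, 4.0, 10   # assumption: hypothetical normal population, df = 9
NUM_SAMPLES = 20_000           # assumption: number of simulated samples

rng = random.Random(2)         # fixed seed so the run is repeatable

chi2_values = []
for _ in range(NUM_SAMPLES):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    # variance() is the sample variance s^2 (divides by N - 1).
    chi2_values.append((N - 1) * variance(sample) / SIGMA ** 2)

chi2_mean = mean(chi2_values)
chi2_median = median(chi2_values)

print("mean of the chi2-stat:  ", round(chi2_mean, 2))
print("median of the chi2-stat:", round(chi2_median, 2))
print("smallest value:         ", round(min(chi2_values), 2))
```

The values are never negative and the median sits below the mean, which is the asymmetry described in the note above.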