Lecture 9
Chapter 17. Inference for a population mean (σ unknown)
Objectives (PSLS Chapter 17)
Inference for the mean of one population (σ unknown)

- Know which sampling distribution to use when σ is unknown
- Know the properties of t distributions
- Be able to apply Student's t-tests
- Be able to use confidence intervals based on the t distribution
- Know how to adapt t procedures for matched pairs designs
- Recognize the limits of robustness for t-tests
Motivating Examples: Sweetening colas and Guinness beer

How is the sweetness of a cola drink affected by storage? The sweetness loss due to 1 year of storage was evaluated by 10 professional tasters, who compared the sweetness before and after storage:

Taster   Sweetness loss (before − after)
1         2.0
2         0.4
3         0.7
4         2.0
5        -0.4
6         2.2
7        -1.3
8         1.2
9         1.1
10        2.3

We want to test whether storage results in a loss of sweetness. This can be translated into a statistical hypothesis (Ha), and we can look for evidence against the null hypothesis of no loss of sweetness:
H0: µ = 0 versus Ha: µ ≠ 0
We are familiar with such tests, except that here we do not know the value of the population standard deviation σ.
Motivating Examples: Sweetening colas and Guinness beer

• In 1908 Guinness biochemist William Gosset developed the t-test.
• It was used as a means for comparing small samples of beer and beer ingredients, and may have been applied to beer quality control and process/recipe development.
• It should be called the Gosset t-test. Gosset published the t-test under a pseudonym ("Student") because Guinness did not want competitors to know they were using statistics to improve and maintain the consistency of their product.
• While it appears at first that Gosset's t distribution was only a minor tweak, those small adjustments are so important that engineers and scientists now recognize the t distribution as essential whenever σ is unknown.
What to do when σ is unknown

- Use s. The sample standard deviation s provides an estimate of the population standard deviation σ.
- Z-scores are values on a standard deviation scale. You can't calculate a z-score without knowing the standard deviation, but you can estimate one if you have an estimate of the standard deviation.
- Just as with the sample mean, s is subject to sampling variation: in any given sample, s will fall below or above σ. That means the estimated z-scores will be over- or underestimated from sample to sample.
- Larger samples give more reliable estimates of σ. (A small simulation illustrating this follows below.)
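To see that sampling variability concretely, here is a minimal simulation sketch in Python (numpy is an assumption; the original slides use no software). It draws many samples from a Normal population with a known σ and shows that s scatters around σ, with less scatter for larger n.

import numpy as np

rng = np.random.default_rng(seed=1)
sigma = 10.0  # true population standard deviation (known here only because we simulate)

for n in (5, 15, 100):
    # 10,000 samples of size n; ddof=1 gives the n - 1 divisor for s
    s_values = rng.normal(loc=0.0, scale=sigma, size=(10_000, n)).std(axis=1, ddof=1)
    print(f"n = {n:3d}: mean of s = {s_values.mean():5.2f}, spread (SD of s) = {s_values.std():4.2f}")

The spread of s around σ = 10 shrinks noticeably as n grows, which is exactly why larger samples give more reliable estimates of σ.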
The t distributions

We take a random sample of size n from a Normal population N(µ, σ):

- When σ is known, the sampling distribution of x̄ is Normal N(µ, σ/√n), and the statistic
  z = (x̄ − µ) / (σ/√n)
  follows the standard Normal N(0, 1).
- When σ is estimated by the sample standard deviation s, the statistic
  t = (x̄ − µ) / (s/√n)
  follows a t distribution with n − 1 degrees of freedom.
The t and z sampling distributions

[Figure: the standard Normal curve and a t distribution (n = 15, so df = 14) on the standardized scale, both centered at 0.]

When n is very large, s is a very good estimate of σ and the corresponding t distributions are very close to the Normal distribution. The t distributions become wider for smaller sample sizes, reflecting the lack of precision in estimating σ from s.
Using the t-distribution tables

Table C provides the t-values and corresponding confidence ranges for the t distribution. How large does n have to be before the t distribution approximates a z distribution? (See the sketch below.)
Note: "degrees of freedom" equals n − 1.
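One way to explore that question numerically, as a sketch with scipy.stats (an assumption; not part of the original slides): compare the 95% t critical value t* with the Normal z* as the degrees of freedom grow.

from scipy import stats

z_star = stats.norm.ppf(0.975)  # two-sided 95% critical value, about 1.96
for df in (4, 9, 29, 99, 999):
    t_star = stats.t.ppf(0.975, df)
    print(f"df = {df:4d}: t* = {t_star:.3f}  (z* = {z_star:.3f})")
# t* approaches z* as df grows; by df around 100 the difference is already quite small.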
Table C

When σ is known, we use the Normal distribution and z. When σ is unknown, we use a t distribution with n − 1 degrees of freedom (df), and the statistic is
t = (x̄ − µ) / (s/√n)
Table C shows the z-values and t-values corresponding to landmark P-values/confidence levels.
Standard deviation versus standard error

For a sample of size n, the sample standard deviation s is:
s = √[ (1/(n − 1)) Σ (xᵢ − x̄)² ]
where n − 1 is the "degrees of freedom."

The value s/√n is called the standard error of the mean (SEM). Scientists often present their sample results as mean ± SEM.

A medical study examined the effect of a new medication on seated systolic blood pressure. The results, presented as mean ± SEM for 25 patients, are 113.5 ± 8.9. What is the standard deviation s of the sample data?
SEM = s/√n ⇔ s = SEM × √n
s = 8.9 × √25 = 44.5
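The same arithmetic in a couple of lines of Python (a sketch; the numbers are the ones from the blood-pressure example above):

import math

sem, n = 8.9, 25
s = sem * math.sqrt(n)  # invert SEM = s / sqrt(n)
print(s)  # 44.5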
Why n − 1?

If you know x̄, then once you know the first n − 1 deviations from x̄, the last one is determined (the deviations must sum to zero), so you know it without having to calculate it.

Related example: if n = 5, x̄ = 0, and the first four observations are −2, −1, 0, and 1, then you know the last observation: it must be 2 for all of the given information to be consistent. That last observation is not free to vary. In this example there are only 4 degrees of freedom in x.

What happens if we calculate the sample standard deviation and use n instead of n − 1 in the denominator? (A small simulation below suggests the answer.)
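A quick simulation sketch (assuming numpy; not part of the original slides) suggests what happens: dividing by n systematically underestimates the population variance, while the n − 1 divisor is unbiased for the variance.

import numpy as np

rng = np.random.default_rng(seed=2)
sigma2 = 25.0  # true population variance
samples = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, 5))  # many samples of size n = 5

var_n   = samples.var(axis=1, ddof=0)  # divide by n
var_nm1 = samples.var(axis=1, ddof=1)  # divide by n - 1
print(var_n.mean())    # about 20: biased low, roughly (n-1)/n * 25
print(var_nm1.mean())  # about 25: unbiased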
The one-sample t test

As before, a test of hypotheses requires a few steps:
1. Identifying the biological hypothesis
2. Translating it into a statistical null hypothesis (H0)
3. Choosing a significance level α
4. Calculating t and its degrees of freedom
5. Finding the area under the curve with Table C or software
6. Estimating the difference and stating the P-value
7. Making a conclusion about H0
8. Making a biological conclusion

We draw a random sample of size n from an N(µ, σ) population. When σ is estimated by s, the test statistic
t = (x̄ − µ0) / (s/√n)
follows a t distribution with df = n − 1 under H0: µ = µ0.

The resulting t test is robust to deviations from Normality as long as the sample size is large enough.

The P-value is the probability, if H0 were true, of randomly drawing a sample like the one obtained, or more extreme, in the direction of Ha.
The P-value can be one-sided (one-tailed) or two-sided (two-tailed), depending on Ha, with t = (x̄ − µ0) / (s/√n) in either case.

Using Table C: for Ha: µ ≠ µ0, if n = 10 (df = 9) and t = 2.70, then
2.398 < t = 2.70 < 2.821
so
0.04 > P-value > 0.02
(Software can replace this bracketing with an exact value; see the sketch below.)
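A sketch of the exact computation with scipy.stats (an assumption; the slides use Table C): for a two-sided test, the P-value is twice the upper-tail area beyond |t|.

from scipy import stats

t, df = 2.70, 9
p_two_sided = 2 * stats.t.sf(t, df)  # sf = survival function = upper-tail area
print(round(p_two_sided, 4))  # about 0.024, consistent with 0.02 < P < 0.04 from Table C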
Sweetening colas (cont.)

Is there evidence that storage results in sweetness loss for the new cola recipe at the 0.05 level of significance (α = 5%)?

Taster   Sweetness loss
1         2.0
2         0.4
3         0.7
4         2.0
5        -0.4
6         2.2
7        -1.3
8         1.2
9         1.1
10        2.3
___________________________
Average              1.02
Standard deviation   1.196

H0: µ = 0 versus Ha: µ ≠ 0

t = (x̄ − µ0) / (s/√n) = (1.02 − 0) / (1.196/√10) = 2.70
df = n − 1 = 9

From Table C: 2.398 < t = 2.70 < 2.821, so 0.04 > P > 0.02.
P < α. The result is mildly significant.
⇒ There is a significant loss of sweetness, on average, following storage.
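The same test in one call, as a sketch with scipy.stats (an assumption; the ten sweetness-loss values are the ones tabulated above):

from scipy import stats

loss = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
t, p = stats.ttest_1samp(loss, popmean=0)  # one-sample t test, two-sided by default
print(f"t = {t:.2f}, P = {p:.3f}")  # t = 2.70, P about 0.024

This reproduces the hand calculation and gives an exact P-value inside the Table C bracket (0.02, 0.04).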
Confidence intervals

A confidence interval is a range of values computed so that, with probability (confidence level) C, it contains the true population parameter.

We have a set of data from a population with both µ and σ unknown. We use x̄ to estimate µ, and s to estimate σ, using a t distribution with df = n − 1.

- C is the area under the t curve between −t* and t*.
- We find t* in the row of Table C for df = n − 1 and confidence level C.
- The margin of error m is:
  m = t* × s/√n

[Figure: t curve with central area C between −t* and t*, and margin of error m on each side of the center.]
Sweetening colas (cont.)

What is the true population mean sweetness loss after storage? We want 90% confidence.

Taster   Sweetness loss (positive value = loss)
1         2.0
2         0.4
3         0.7
4         2.0
5        -0.4
6         2.2
7        -1.3
8         1.2
9         1.1
10        2.3
___________________________
Mean                 1.02
Standard deviation   1.196

m = t* × s/√n = 1.833 × 1.196/√10 ≈ 0.693
x̄ ± m = 1.02 ± 0.69  ⇒  0.33 to 1.71

With 90% confidence, the true population mean sweetness loss is somewhere between 0.33 and 1.71.
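The same 90% interval computed directly, as a sketch with scipy.stats (an assumption; t.interval applies the same t* that Table C provides):

import numpy as np
from scipy import stats

loss = np.array([2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3])
n = len(loss)
sem = loss.std(ddof=1) / np.sqrt(n)  # standard error of the mean
low, high = stats.t.interval(0.90, df=n - 1, loc=loss.mean(), scale=sem)
print(f"{low:.2f} to {high:.2f}")  # 0.33 to 1.71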
Matched pairs t procedures

Sometimes we want to compare treatments or conditions at the individual level. The data from the pairs of observations are not independent.

Example study designs where the individuals in one sample are related to those in the other sample:

- Pre-test and post-test studies look at data collected on the same sample elements before and after some experiment is performed.
- Twin studies often try to sort out the influence of genetic factors by comparing a variable between sets of twins.
- Using people matched for age, sex, and education in social studies allows us to cancel out the effect of these potential lurking variables.

In these cases, we use the paired data to test for the difference in the two population means. The variable tested becomes x̄diff, the average difference, and
H0: µdifference = 0; Ha: µdifference ≠ 0
Conceptually, this is just like a test for one population mean.
Sweetening colas (revisited)

The sweetness loss due to storage was evaluated by 10 professional tasters (comparing the sweetness before and after storage):

Taster   Sweetness loss
1         2.0
2         0.4
3         0.7
4         2.0
5        -0.4
6         2.2
7        -1.3
8         1.2
9         1.1
10        2.3

We want to test if storage results in a loss of sweetness, thus
H0: µ = 0 versus Ha: µ ≠ 0

Although the text did not mention it explicitly, this is a pre-/post-test design, and the variable is the difference in cola sweetness before and after storage. A matched pairs test of significance is indeed just like a one-sample test.
Does lack of caffeine increase depression?

Randomly selected caffeine-dependent individuals were deprived of all caffeine-rich foods and assigned to receive daily pills. At one time the pills contained caffeine and, at another time, they were a placebo. Depression was assessed quantitatively (higher scores represent greater depression).

Subject   Depression     Depression     Difference
          with caffeine  with placebo   (placebo − caffeine)
1          5              16             11
2          5              23             18
3          4               5              1
4          3               7              4
5          8              14              6
6          5              24             19
7          0               6              6
8          0               3              3
9          2              15             13
10        11              12              1
11         1               0             -1

This is a matched pairs design with 2 data points for each subject. We compute a new variable, "Difference" = placebo minus caffeine.
With 11 "difference" points, df = n − 1 = 10.
We find: x̄diff = 7.36; sdiff = 6.92; so SEMdiff = sdiff/√n = 6.92/√11 = 2.086.
We test:
H0: µdiff = 0; Ha: µdiff ≠ 0

t = (x̄diff − µdiff) / (sdiff/√n) = x̄diff / SEMdiff = 7.36 / 2.086 ≈ 3.53

For df = 10, Table C gives 3.169 < t = 3.53 < 3.581, so the two-sided P-value is between 0.01 and 0.005. (Software gives P ≈ 0.005; see the sketch below.)

Caffeine deprivation causes a significant increase in depression (P < 0.01, n = 11). [Assuming the P-value is valid.]
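As a check, the matched pairs test in scipy (a sketch; ttest_rel takes the two raw columns and is equivalent to a one-sample t test on the differences):

from scipy import stats

caffeine = [5, 5, 4, 3, 8, 5, 0, 0, 2, 11, 1]
placebo  = [16, 23, 5, 7, 14, 24, 6, 3, 15, 12, 0]
t, p = stats.ttest_rel(placebo, caffeine)  # paired t test, two-sided
print(f"t = {t:.2f}, P = {p:.4f}")  # t = 3.53, P about 0.005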
Robustness

The t procedures are exactly correct when the population is exactly Normal. This is rare.

The t procedures are robust to small deviations from Normality, but:

- The sample must be a random sample from the population.
- Parent populations that produce lots of outliers or are strongly skewed strongly influence the mean, and therefore the t procedures. Their impact diminishes as the sample size gets larger, because of the Central Limit Theorem.

As a guideline:

- When n < 15, the data must be close to Normal and without outliers.
- When 15 ≤ n < 40, mild skewness is acceptable, but not outliers.
- When n > 40, the t statistic will be valid even with strong skewness.
Red wine, in moderation

Does drinking red wine in moderation increase blood polyphenol levels, thus maybe protecting against heart attacks?

Nine randomly selected healthy men were assigned to drink half a bottle of red wine daily for two weeks. The percent change in their blood polyphenol levels was assessed:

0.7  3.5  4.0  4.9  5.5  7.0  7.4  8.1  8.4

x̄ = 5.5; s = 2.517; df = n − 1 = 8

[Figure: histogram of the percent change in blood polyphenol level.]

Can we use a t inference procedure for this study? Discuss the assumptions. (A sketch of a software check follows below.)
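For discussion, here is what a software check might look like, as a sketch with scipy.stats (an illustration, not part of the original exercise). With n = 9 (< 15), the guideline above requires data close to Normal with no outliers, which we can examine with a sorted listing or a formal test before running the t procedure.

from scipy import stats

change = [0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4]
# n = 9 is small, so the data must be roughly Normal with no outliers.
w, p_norm = stats.shapiro(change)  # Shapiro-Wilk test of Normality
t, p = stats.ttest_1samp(change, popmean=0)  # is the mean percent change different from 0?
print(f"Shapiro-Wilk P = {p_norm:.2f}")  # a large P gives no evidence against Normality
print(f"t = {t:.2f}, P = {p:.4f}")  # t about 6.6 with df = 8

Remember that a formal Normality test on 9 points has little power; the random-sample assumption and a look at the data for outliers matter just as much.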