4.1 Hypothesis Testing
• z-test for a single value
• double-sided and single-sided z-test for one average
• z-test for two averages
• double-sided and single-sided t-test for one average
• the F-parameter and F-table
• the F-test
• the t-test for two averages
4.1 : 1/14
z-Test for a Single Value
Suppose that an analytical technique is done sufficiently often that
σ is known. Also assume that only a single measured value is
available.
The following hypothesis test involves the situation where μ is
specified or obtained by theory.
Null Hypothesis: the measured value, x, comes from a normal pdf
having μ as its mean.
Alternative Hypothesis: the measured value, x, comes from a
normal pdf not having μ as its mean.
To test the hypothesis, compute zcalc = |x - μ|/σ and compare it with 1.96, the 95% double-sided limit of the normal pdf.
If zcalc ≤ 1.96, accept the null hypothesis.
If zcalc > 1.96, accept the alternative hypothesis.
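The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name `z_test_single_value` and the numbers in the example are assumptions, not part of the original slides.

```python
def z_test_single_value(x, mu, sigma, z_crit=1.96):
    """Double-sided z-test for one measured value with known sigma.

    Returns (z_calc, accept_null); z_crit = 1.96 is the 95% limit.
    """
    z_calc = abs(x - mu) / sigma
    return z_calc, z_calc <= z_crit


# Hypothetical numbers, chosen only for illustration.
z_calc, accept_null = z_test_single_value(x=10.3, mu=10.0, sigma=0.2)
print(f"z_calc = {z_calc:.2f}, accept null hypothesis: {accept_null}")
```

With these made-up values the deviation is 1.5σ, so the null hypothesis is accepted; a deviation of 2.5σ would be rejected.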
4.1 : 2/14
Graphical Interpretation of Test
The null hypothesis will be accepted for any pdf having a mean over
the range μ = x - 1.96σ (blue curve) to μ = x + 1.96σ (red curve).
Accepting the null hypothesis does not guarantee that x comes from
a pdf having μ as its mean. Acceptance implies only that the pdf giving
rise to x is statistically indistinguishable from one having μ as its mean.
Rejection of the null hypothesis states with 95% confidence that x
comes from a pdf not having μ as its mean.
[Figure: two normal pdfs, f(x) versus σ units from x, labeled "pdf with lowest possible mean" and "pdf with highest possible mean"; 2.5% of the area lies on each side.]
4.1 : 3/14
z-Test for One Average
Suppose that an analytical technique is done sufficiently often that
σ is known. Also assume that an average, x̄, has been obtained
using N replicate measurements.
The following hypothesis test involves the situation where μ is
specified or obtained by theory.
Null Hypothesis: the average, x̄, comes from a normal pdf having μ
as its mean.
Alternative Hypothesis: the average, x̄, comes from a normal pdf
not having μ as its mean.
To test the hypothesis, compute
$$z_{\mathrm{calc}} = \frac{|\bar{x} - \mu|\,\sqrt{N}}{\sigma}$$
If zcalc ≤ 1.96, accept the null hypothesis.
If zcalc > 1.96, accept the alternative hypothesis.
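As a rough sketch of how the √N factor changes the outcome, the snippet below evaluates the same deviation for a single value (N = 1) and for an average of four. The function name and the numbers are hypothetical.

```python
import math


def z_test_one_average(x_bar, mu, sigma, n, z_crit=1.96):
    """Double-sided z-test for an average of n replicates with known sigma."""
    z_calc = abs(x_bar - mu) * math.sqrt(n) / sigma
    return z_calc, z_calc <= z_crit


# Hypothetical numbers: the same 0.3 deviation, with sigma = 0.2.
for n in (1, 4):
    z_calc, accept_null = z_test_one_average(10.3, 10.0, 0.2, n)
    print(f"N = {n}: z_calc = {z_calc:.2f}, accept null hypothesis: {accept_null}")
```

A deviation that is acceptable for one measurement (z = 1.5) is rejected for an average of four (z = 3.0), because the average has a narrower pdf.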
4.1 : 4/14
Single-Sided z-Test for One Average
Confidence limits can also be used to test whether the measured
average comes from a pdf with a mean greater than or equal to a
specified value, C0. The entire 5% uncertainty is put on the low-value side of the pdf. For a normal pdf, F(x) = 0.05 occurs at x = μ - 1.64σ.
Now zcalc = (C0 - x̄)/σ, which can be negative. Here σ must be the standard deviation of the average, that is, the single-measurement value divided by √N.
Null hypothesis: z ≤ 1.64, the measured average is greater than or equal to the specified value.
Alternative hypothesis: z > 1.64, the measured average is less than the specified value.
[Figure: normal pdf f(x) versus σ units from C0, with 5% of the area on one side; the limit is marked at x̄ + 1.64σ.]
A similar strategy can be used to test if the measured
average is less than a specified value.
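A minimal sketch of the single-sided rule follows. It keeps the sign of zcalc, as the text requires, and assumes that `sigma` is the known standard deviation of the measured average. The function name and the numbers are illustrative only.

```python
def z_test_lower_limit(x_bar, c0, sigma, z_crit=1.64):
    """Single-sided z-test: is the mean greater than or equal to c0?

    z_calc is signed; it is negative whenever x_bar exceeds c0.
    sigma is assumed to be the standard deviation of the average.
    """
    z_calc = (c0 - x_bar) / sigma
    return z_calc, z_calc <= z_crit


# Hypothetical specification C0 = 5.0 with sigma = 0.1.
for x_bar in (5.2, 4.9, 4.8):
    z_calc, accept_null = z_test_lower_limit(x_bar, c0=5.0, sigma=0.1)
    print(f"x_bar = {x_bar}: z_calc = {z_calc:+.2f}, accept null hypothesis: {accept_null}")
```

Only the last average, which lies 2σ below C0, leads to the alternative hypothesis. Swapping the sign of the numerator gives the mirror-image test for "less than a specified value".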
4.1 : 5/14
z-Test for Two Averages
It is possible to statistically test whether two experimental averages
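come from the same pdf. Let x̄1 have the normal pdf n(μ, σ/√N1)
and x̄2 have the normal pdf n(μ, σ/√N2). To test the hypothesis,
calculate the difference, D, and test whether it is statistically
indistinguishable from 0.
$$D = \bar{x}_1 - \bar{x}_2, \qquad \mu_D = 0$$
$$\sigma_D^2 = \sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2 = \frac{\sigma^2}{N_1} + \frac{\sigma^2}{N_2} = \frac{N_1 + N_2}{N_1 N_2}\,\sigma^2$$
$$\sigma_D = \sqrt{\frac{N_1 + N_2}{N_1 N_2}}\;\sigma$$
$$z_{\mathrm{calc}} = \frac{|D - 0|}{\sigma_D} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma_D}$$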
For zcalc ≤ 1.96, the averages are not statistically distinguishable.
For zcalc > 1.96, the averages come from pdfs with different means.
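The propagation of the two standard deviations into σD can be sketched as follows. The function name and the example values are assumptions made for illustration.

```python
import math


def z_test_two_averages(x1_bar, n1, x2_bar, n2, sigma, z_crit=1.96):
    """Double-sided z-test for two averages sharing a known sigma."""
    # Variances of the two averages add: sigma_D^2 = sigma^2/N1 + sigma^2/N2.
    sigma_d = sigma * math.sqrt((n1 + n2) / (n1 * n2))
    z_calc = abs(x1_bar - x2_bar) / sigma_d
    return z_calc, z_calc <= z_crit


# Hypothetical averages of four replicates each, with sigma = 0.2.
z_calc, same_pdf = z_test_two_averages(10.2, 4, 10.0, 4, sigma=0.2)
print(f"z_calc = {z_calc:.2f}, statistically indistinguishable: {same_pdf}")
```

For equal N the difference has a standard deviation √2 times that of either average, so a 0.2 difference here gives z ≈ 1.41.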
4.1 : 6/14
The t-Test for One Average
When the value of σ is not known and is estimated by s, the
confidence limits are determined by the t-parameter. The hypothesis
test uses tcalc instead of zcalc.
$$\mu = \bar{x} \pm \frac{t\,s}{\sqrt{N}}$$
$$t_{\mathrm{calc}} = \frac{|\bar{x} - \mu|\,\sqrt{N}}{s}$$
To perform the hypothesis test, an appropriate value of ttable is
found by choosing the confidence level and the degrees of
freedom, φ = N-1.
Null hypothesis: tcalc ≤ ttable, the average comes from a pdf with a
mean indistinguishable from μ.
Alternative hypothesis: tcalc > ttable, the average comes from a pdf
with a mean different than μ.
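The confidence limits and the hypothesis test are two views of the same comparison: tcalc ≤ ttable exactly when μ lies inside x̄ ± ts/√N. The sketch below checks this for made-up data. The function names are hypothetical, and the value 3.18 is the standard 95% double-sided t for φ = 3, which is not listed in the slides.

```python
import math


def confidence_limits(x_bar, s, n, t_table):
    """Return (low, high) = x_bar -/+ t*s/sqrt(N)."""
    half_width = t_table * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def t_test_one_average(x_bar, mu, s, n, t_table):
    """Double-sided t-test; t_table must match the confidence level and phi = N - 1."""
    t_calc = abs(x_bar - mu) * math.sqrt(n) / s
    return t_calc, t_calc <= t_table


# Hypothetical data: N = 4 replicates, so phi = 3 and t_table(0.95) = 3.18.
x_bar, s, n, t_table = 5.00, 0.20, 4, 3.18
low, high = confidence_limits(x_bar, s, n, t_table)
for mu in (5.20, 5.40):
    t_calc, accept_null = t_test_one_average(x_bar, mu, s, n, t_table)
    print(f"mu = {mu}: t_calc = {t_calc:.2f}, accept = {accept_null}, "
          f"inside limits = {low <= mu <= high}")
```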
4.1 : 7/14
t-Test Examples
A NIST nickel standard known to contain 6.15 mmol is analyzed by a
gravimetric method. Three replicate measurements were obtained:
{5.88, 5.68, 6.16 mmol}. The average is 5.91 mmol and the
standard deviation is 0.24 mmol.
$$t_{\mathrm{calc}} = \frac{|\bar{x} - \mu|\,\sqrt{N}}{s} = \frac{|5.91 - 6.15|\,\sqrt{3}}{0.24} = 1.73$$
$$t_{\mathrm{table}}(0.95, \phi = 2) = 4.30$$
Since tcalc ≤ ttable, the average is indistinguishable from a pdf having
a mean of 6.15 mmol. An A-grade! A second student ran 10
replicates and obtained an average of 5.46 mmol and a standard
deviation of 0.29 mmol.
$$t_{\mathrm{calc}} = \frac{|\bar{x} - \mu|\,\sqrt{N}}{s} = \frac{|5.46 - 6.15|\,\sqrt{10}}{0.29} = 7.52$$
$$t_{\mathrm{table}}(0.95, \phi = 9) = 2.26$$
Since tcalc > ttable, the average does not come from a pdf having
6.15 mmol as its mean. The student has to repeat the
determination and identify the determinate error.
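The two student results can be checked with a short script. It is a sketch, not the course's own code: it recomputes the first student's average and standard deviation from the three values using Python's `statistics` module, then evaluates tcalc from the rounded summary statistics, as the slides do. The function name is an assumption.

```python
import math
import statistics


def t_calc_one_average(x_bar, mu, s, n):
    """t_calc = |x_bar - mu| * sqrt(N) / s."""
    return abs(x_bar - mu) * math.sqrt(n) / s


mu = 6.15  # certified content of the standard, mmol

# First student: summary statistics recomputed from the raw values.
data1 = [5.88, 5.68, 6.16]
x1 = round(statistics.mean(data1), 2)    # 5.91
s1 = round(statistics.stdev(data1), 2)   # 0.24 (N - 1 in the denominator)
t1 = t_calc_one_average(x1, mu, s1, len(data1))
print(f"student 1: t_calc = {t1:.2f}, t_table = 4.30, accept = {t1 <= 4.30}")

# Second student: only the summary statistics are given in the text.
t2 = t_calc_one_average(5.46, mu, 0.29, 10)
print(f"student 2: t_calc = {t2:.2f}, t_table = 2.26, accept = {t2 <= 2.26}")
```

Carrying the unrounded average and standard deviation for the first student would change tcalc slightly in the second decimal, without affecting the conclusion.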
4.1 : 8/14
Single-Sided t-Test
A pollution regulation requires that the concentration of airborne
SO2 be less than 10 ppm. Three measurements are made {12.64,
11.04,14.57} with an average of 12.75 and a standard deviation of
1.77. By analogy with the single-sided z-test on slide 4.1-5 we can
write the following.
$$t_{\mathrm{calc}} = \frac{(\bar{x} - C_0)\,\sqrt{N}}{s} = \frac{(12.75 - 10.00)\,\sqrt{3}}{1.77} = 2.69$$
A single-sided t-test at 95% confidence puts the entire 5% in one tail, so ttable is read from a 90% double-sided table (5% in each tail).
φ        2     3     4     9     19    29    39    ∞
t(0.90)  2.92  2.35  2.13  1.83  1.73  1.70  1.69  1.64
Since tcalc ≤ ttable(0.90,2), the average of 12.75 is statistically
indistinguishable from the 10 ppm regulation.
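The SO2 example can be reproduced directly from the three measurements. In this sketch the dictionary `T_90` holds the finite-φ entries of the t(0.90) table above; the function name is hypothetical.

```python
import math
import statistics

# t(0.90) double-sided values keyed by degrees of freedom, copied from the table above.
T_90 = {2: 2.92, 3: 2.35, 4: 2.13, 9: 1.83, 19: 1.73, 29: 1.70, 39: 1.69}


def single_sided_t(data, c0):
    """Return (t_calc, t_table) for a single-sided test against the limit c0."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)                  # N - 1 in the denominator
    t_calc = (x_bar - c0) * math.sqrt(n) / s    # signed, not absolute
    return t_calc, T_90[n - 1]


t_calc, t_table = single_sided_t([12.64, 11.04, 14.57], c0=10.00)
print(f"t_calc = {t_calc:.2f}, t_table = {t_table:.2f}, "
      f"indistinguishable from limit: {t_calc <= t_table}")
```

With only three measurements the scatter is too large to show that the 10 ppm limit is exceeded, even though every individual value is above it.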
4.1 : 9/14
The F-Parameter
The ratio of two experimental variances is called the F-parameter,
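where F = s1²/s2². The pdf, f(F), is asymmetric and has to be
integrated numerically.
$$f(F) = \frac{\dfrac{\phi_1}{\phi_2}\,\Gamma\!\left(\dfrac{\phi_1+\phi_2}{2}\right)\left(\dfrac{\phi_1}{\phi_2}F\right)^{\frac{\phi_1-2}{2}}}{\Gamma\!\left(\dfrac{\phi_1}{2}\right)\Gamma\!\left(\dfrac{\phi_2}{2}\right)\left(1+\dfrac{\phi_1}{\phi_2}F\right)^{\frac{\phi_1+\phi_2}{2}}}$$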
In the accompanying plot of f(F), the red line is F(3,3), the blue F(5,5), the green F(10,10), the magenta F(20,20), and the cyan F(50,50).
As both N1 and N2 increase, the pdf becomes symmetric and the
mean approaches 1.
4.1 : 10/14
The F-Table
By convention the larger variance is placed into
the numerator so that 1 ≤ F ≤ +∞. This simplifies
integration by allowing it to start at 0.
$$\int_0^{F_{\mathrm{table}}} f(F)\,dF = 0.95$$
The value of F yielding any particular confidence level depends upon
the degrees of freedom used to compute both variances. The
numerator degrees of freedom, φ1, are in the top row; the denominator φ2 are in the left column.
φ2 \ φ1   2      3      4      9      19     29     39     ∞
2         19.00  19.16  19.25  19.38  19.44  19.46  19.47  19.50
3         9.55   9.28   9.12   8.81   8.67   8.62   8.60   8.53
4         6.94   6.59   6.39   6.00   5.81   5.75   5.72   5.63
9         4.26   3.86   3.63   3.18   2.95   2.87   2.83   2.71
19        3.52   3.13   2.90   2.42   2.17   2.08   2.03   1.88
29        3.33   2.93   2.70   2.22   1.96   1.86   1.81   1.64
39        3.24   2.85   2.61   2.13   1.86   1.76   1.70   1.52
∞         3.00   2.61   2.37   1.88   1.59   1.47   1.40   1.00
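A table entry can be checked by integrating the pdf numerically, as the integral condition above describes. The sketch below codes the F pdf with `math.gamma` and integrates it with a simple composite Simpson's rule from 0 to the tabulated value for φ1 = φ2 = 4. Simpson's rule is a stand-in chosen for brevity; the slides do not say which integration method was used to build the table. The function names are assumptions.

```python
import math


def f_pdf(F, phi1, phi2):
    """pdf of the F-parameter; phi1 = numerator, phi2 = denominator degrees of freedom.

    Assumes phi1 >= 2 so that the pdf is finite at F = 0.
    """
    ratio = phi1 / phi2
    num = ratio * math.gamma((phi1 + phi2) / 2) * (ratio * F) ** ((phi1 - 2) / 2)
    den = (math.gamma(phi1 / 2) * math.gamma(phi2 / 2)
           * (1 + ratio * F) ** ((phi1 + phi2) / 2))
    return num / den


def simpson(func, a, b, n=2000):
    """Composite Simpson's rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


# Table entry for phi1 = 4, phi2 = 4 is 6.39.
area = simpson(lambda F: f_pdf(F, 4, 4), 0.0, 6.39)
print(f"area from 0 to 6.39 = {area:.3f}")
```

The area comes out very close to 0.95, which is consistent with the tabulated value.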
4.1 : 11/14
Example F-Test
Two students analyzed a steel sample for nickel. The first used a
gravimetric method and obtained 3 values, {9.87, 9.95, and 10.01
mmol}, with s = 0.07 mmol. The second used a titrimetric method
and obtained 5 values, {9.98, 9.99, 9.99, 10.06, 9.97 mmol}, with
s = 0.04 mmol. Although the two standard deviations differ by a
factor of ~2, are they statistically different?
$$F_{\mathrm{calc}} = \frac{s_{\mathrm{larger}}^2}{s_{\mathrm{smaller}}^2} = \frac{(0.07)^2}{(0.04)^2} = 3.06$$
Since Fcalc ≤ Ftable(0.95,φlarger=2,φsmaller = 4) = 6.94, the two standard
deviations are statistically indistinguishable.
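The F-test can be sketched as below. The helper name `f_calc` is hypothetical. The script evaluates F twice: once from the rounded standard deviations quoted in the text, and once from the raw values, which gives a somewhat larger ratio (about 3.9) because the quoted s values are rounded to one significant figure. Both are below the tabulated 6.94, so the conclusion is the same.

```python
import statistics

gravimetric = [9.87, 9.95, 10.01]               # 3 values, phi = 2
titrimetric = [9.98, 9.99, 9.99, 10.06, 9.97]   # 5 values, phi = 4


def f_calc(var_a, var_b):
    """Ratio of two variances with the larger one in the numerator, so F >= 1."""
    return max(var_a, var_b) / min(var_a, var_b)


F_TABLE = 6.94  # F_table(0.95, phi_larger = 2, phi_smaller = 4), from the text

f_rounded = f_calc(0.07 ** 2, 0.04 ** 2)
f_raw = f_calc(statistics.variance(gravimetric), statistics.variance(titrimetric))

print(f"F from rounded s values: {f_rounded:.2f}, below F_table: {f_rounded <= F_TABLE}")
print(f"F from raw data:         {f_raw:.2f}, below F_table: {f_raw <= F_TABLE}")
```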
4.1 : 12/14
t-Test for Two Averages
Two experimental averages can be compared as long as they both
come from pdfs having the same standard deviation. Ordinarily the
two experimental standard deviations are first checked using the F-test. If the F-test is passed, then the t-test for two averages can
proceed.
First compute a pooled variance using the following equation, then
calculate t.
$$s_p^2 = \frac{(N_1 - 1)\,s_1^2 + (N_2 - 1)\,s_2^2}{N_1 + N_2 - 2}$$
$$t_{\mathrm{calc}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_p}\sqrt{\frac{N_1 N_2}{N_1 + N_2}}$$
The ttable value depends upon the confidence level and N1+N2 -2
degrees of freedom.
If tcalc ≤ ttable, accept the null hypothesis; if tcalc > ttable, accept the
alternative hypothesis.
4.1 : 13/14
Example t-Test for Two Averages
Use the previous example where two students analyzed a steel
sample for nickel. The first used a gravimetric method and obtained
3 values with x̄1 = 9.94 and s1 = 0.07 mmol. The second used a
titrimetric method and obtained 5 values with x̄2 = 10.00 and
s2 = 0.04 mmol. The two variances have already been shown to be
statistically indistinguishable.
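$$s_p^2 = \frac{(N_1 - 1)\,s_1^2 + (N_2 - 1)\,s_2^2}{N_1 + N_2 - 2} = \frac{2 \times 0.07^2 + 4 \times 0.04^2}{6} = 0.052^2$$
$$t_{\mathrm{calc}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_p}\sqrt{\frac{N_1 N_2}{N_1 + N_2}} = \frac{|9.94 - 10.00|}{0.052}\sqrt{\frac{15}{8}} = 1.15 \times 1.37 = 1.58$$
$$t_{\mathrm{table}}(0.95, \phi = 6) = 2.45$$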
Since tcalc ≤ ttable, the null hypothesis is accepted. The two means
are statistically indistinguishable.
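The pooled-variance calculation can be verified with a few lines of Python. The function names are illustrative assumptions; the inputs are the rounded summary statistics quoted in the example. Carrying unrounded intermediates gives tcalc ≈ 1.58; small differences from hand-rounded arithmetic do not change the conclusion.

```python
import math


def pooled_variance(n1, s1, n2, s2):
    """s_p^2 = ((N1 - 1) s1^2 + (N2 - 1) s2^2) / (N1 + N2 - 2)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def t_calc_two_averages(x1_bar, n1, s1, x2_bar, n2, s2):
    """t_calc = |x1_bar - x2_bar| / s_p * sqrt(N1 N2 / (N1 + N2))."""
    s_p = math.sqrt(pooled_variance(n1, s1, n2, s2))
    return abs(x1_bar - x2_bar) / s_p * math.sqrt(n1 * n2 / (n1 + n2))


T_TABLE = 2.45  # t_table(0.95, phi = 6), from the text

s_p2 = pooled_variance(3, 0.07, 5, 0.04)
t_calc = t_calc_two_averages(9.94, 3, 0.07, 10.00, 5, 0.04)
print(f"s_p = {math.sqrt(s_p2):.3f} mmol, t_calc = {t_calc:.2f}, "
      f"accept null hypothesis: {t_calc <= T_TABLE}")
```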
4.1 : 14/14