Decision Errors and Power
• When we perform a statistical test we hope that our decision will
be correct, but sometimes it will be wrong. There are two types
of incorrect decisions. To help distinguish these two types of
error, we give them specific names.
• The error made by rejecting the null hypothesis H0
(accepting Ha) when in fact H0 is true is called a type I error.
• The probability of making a type I error is denoted by α.
• The error made by accepting the null hypothesis H0
(rejecting Ha) when in fact H0 is false is called a type II error.
• The probability of making a type II error is denoted by β.
• The probability that a fixed level α significance test will reject H0
when a particular alternative value of the parameter is true is
called the power of the test to detect that alternative.
week10
• Significance and type I error
The significance level α of any fixed level test is the
probability of a Type I error. That is, α is the probability that the
test will reject the null hypothesis H0 when H0 is in fact true.
• Power and Type II error
The power of a fixed level test against a particular alternative is
Power = 1 − β = 1 − P(accepting H0 when H0 is false)
= P(rejecting H0 when H0 is false)
Ways to increase Power
• Increase α. When we increase α the strength of evidence
required for the rejection is less.
• Consider a particular alternative value of μ in Ha that is farther from μ0.
Values of μ that are in Ha but lie close to μ0 are harder to
detect (lower power) than values of μ that are far from μ0.
• Increase sample size. More data will provide more information
about the population so we have a better chance of
distinguishing values of µ.
• Decrease σ. This has the same effect as increasing the sample
size: more information about µ. Improving the measurement
process and restricting attention to a subpopulation are two
common ways to decrease σ.
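The four ways listed above can be checked numerically. The following is a minimal sketch for a one-sided (lower-tail) z test with known σ. The function name power_lower_tail and the baseline numbers (μ0 = 100, σ = 10, n = 25, alternative μ = 97) are illustrative assumptions, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist

def power_lower_tail(mu_alt, mu0, sigma, n, alpha):
    """Power of the z test of H0: mu = mu0 vs Ha: mu < mu0 at the alternative mu_alt."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)                             # standard error of the sample mean
    cutoff = mu0 + std_normal.inv_cdf(alpha) * se    # reject H0 when x-bar <= cutoff
    return std_normal.cdf((cutoff - mu_alt) / se)

# Hypothetical baseline, then one change at a time
base = power_lower_tail(97, 100, 10, 25, 0.05)
bigger_alpha = power_lower_tail(97, 100, 10, 25, 0.10)    # increase alpha
farther_alt = power_lower_tail(94, 100, 10, 25, 0.05)     # alternative farther from mu0
bigger_n = power_lower_tail(97, 100, 10, 100, 0.05)       # increase sample size
smaller_sigma = power_lower_tail(97, 100, 5, 25, 0.05)    # decrease sigma

for label, value in [
    ("baseline", base),
    ("alpha 0.05 -> 0.10", bigger_alpha),
    ("alternative 97 -> 94", farther_alt),
    ("n 25 -> 100", bigger_n),
    ("sigma 10 -> 5", smaller_sigma),
]:
    print(f"{label:22s} power = {value:.3f}")
```

Each of the four changes should print a power larger than the baseline. Halving σ and quadrupling n give the same standard error σ/√n, which is why the last two lines agree.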
Example
• Example 6.16 discusses a test about the mean contents of cola
bottles. The hypotheses are
H0: μ = 300
Ha: μ < 300.
• The sample size is n = 6, and the population is assumed to
have a normal distribution with σ = 3. A 5% significance test
rejects H0 if z ≤ Z0.05 = −1.645, where the test statistic z is
z = (x̄ − 300) / (3/√6)
• Power calculations help us see how large a shortfall in the
bottle contents the test can be expected to detect.
(a) Find the power of this test against the alternative μ = 299.
(b) Find the power against the alternative μ = 295.
(c) Is the power against μ = 290 higher or lower than the value
you found in (b)? Explain why this result makes sense.
Solution
• The rejection criterion z ≤ −1.645 is equivalent to
x̄ ≤ 300 − 1.645(3/√6) = 297.99, so we may proceed as follows:
(a) P(reject H0 when μ = 299) = P(x̄ ≤ 297.99 when μ = 299)
= P(Z ≤ (297.99 − 299)/(3/√6)) = P(Z ≤ −0.83) = 0.2033.
(b) P(reject H0 when μ = 295) = P(x̄ ≤ 297.99 when μ = 295)
= P(Z ≤ (297.99 − 295)/(3/√6)) = P(Z ≤ 2.44) = 0.9927.
(c) The power against μ = 290 would be greater: it is farther from
the null value of 300, so it is easier to distinguish from the null
hypothesis.
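As a check on the arithmetic, the three powers can be recomputed directly. This is a sketch using Python's standard library. The helper name cola_power is an assumption made for illustration. Because the code does not round the cutoff or the z values, its results can differ from the table-based answers in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 300, 3, 6, 0.05
Z = NormalDist()                         # standard normal distribution
SE = SIGMA / sqrt(N)                     # standard error of x-bar
CUTOFF = MU0 + Z.inv_cdf(ALPHA) * SE     # reject H0 when x-bar <= CUTOFF (about 297.99)

def cola_power(mu_alt):
    """P(x-bar <= CUTOFF) when the true mean is mu_alt."""
    return Z.cdf((CUTOFF - mu_alt) / SE)

for mu in (299, 295, 290):
    print(f"mu = {mu}: power = {cola_power(mu):.4f}")
```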
Exercise
• You have an SRS of size n = 9 from a normal distribution with
σ = 1. You wish to test the following
H0: μ = 0
Ha: μ > 0
• You decide to reject H0 if x̄ > 0 and to accept H0 otherwise.
(a) Find the probability of a Type I error, that is, the probability that
your test rejects H0 when in fact μ = 0.
(b) Find the probability of a Type II error when μ = 0.3. This is the
probability that your test accepts H0 when in fact μ = 0.3.
(c) Find the probability of a Type II error when μ = 1.
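One way to check your answers is to compute the rejection probability of the rule directly from the sampling distribution of x̄, which is N(μ, σ/√n). The sketch below assumes the rule is "reject H0 when x̄ > 0". The function name reject_probability is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def reject_probability(mu, sigma=1.0, n=9, cutoff=0.0):
    """P(x-bar > cutoff) when the true mean is mu."""
    xbar_dist = NormalDist(mu, sigma / sqrt(n))   # sampling distribution of x-bar
    return 1 - xbar_dist.cdf(cutoff)

type_1 = reject_probability(0.0)             # rejecting H0 although mu = 0
type_2_at_03 = 1 - reject_probability(0.3)   # accepting H0 although mu = 0.3
type_2_at_1 = 1 - reject_probability(1.0)    # accepting H0 although mu = 1

print(f"P(Type I error)            = {type_1:.4f}")
print(f"P(Type II error | mu = 0.3) = {type_2_at_03:.4f}")
print(f"P(Type II error | mu = 1)   = {type_2_at_1:.4f}")
```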
Question 17 Final Exam Dec 2000
When testing H0: μ = 5 vs Ha: μ ≠ 5 at α = 0.01 with n = 40,
suppose that the probability of a type II error (β) is equal to
0.02 when μ = 2. Which of the following statements are true?
a) β > 0.02 when μ = 3
b) β > 0.02 if the sample size was 50 (at μ = 2)
c) β > 0.02 if  had been twice as large (at μ = 2)
d) The power of the test at μ = 2 is 0.99
Exercise
A study was carried out to investigate the effectiveness of a
treatment. 1000 subjects participated in the study, with 500
being randomly assigned to the "treatment group" and the other
500 to the "control (or placebo) group". A statistically significant
difference was reported between the responses of the two groups
(P < 0.005).
State whether the following statements are true or false.
a) There is a large difference between the effects of the treatment
and the placebo.
b) There is strong evidence that the treatment is very effective.
c) There is strong evidence that there is some difference in effect
between the treatment and the placebo.
d) There is little evidence that the treatment has some effect.
e) The probability that the null hypothesis is true is less than 0.005.
Use and abuse of Tests
• The spirit of a test of significance is to give a clear statement
of the degree of evidence provided by the sample against the
null hypothesis. The P-value does this. There is no sharp
boundary between “significant” and “not significant”, only
increasingly strong evidence as the P-value decreases.
• When large samples are available, even tiny deviations from
the null hypothesis will be significant (small P-value).
A statistically significant effect need not be practically
important. Always plot the data and examine them carefully.
Beware of outliers.
• On the other hand, lack of significance does not imply that H0 is
true, especially when the test has low power. When planning a
study, verify that the test you plan to use has a high
probability of detecting an effect of the size you hope to find.
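The effect of sample size on significance can be sketched with a one-sided z test. Suppose the sample mean exceeds μ0 by a fixed, small amount, here 0.05 standard deviations (an illustrative assumption, as is the function name one_sided_p). The P-value then depends only on n.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p(deviation_in_sd, n):
    """P-value of a one-sided z test when x-bar exceeds mu0 by deviation_in_sd * sigma."""
    z = deviation_in_sd * sqrt(n)          # z = (x-bar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)

for n in (25, 100, 400, 10_000):
    print(f"n = {n:6d}: P-value = {one_sided_p(0.05, n):.4g}")
```

The same tiny deviation is far from significant at n = 25 and overwhelmingly significant at n = 10,000, although its practical importance has not changed.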
• Significance tests are not always valid. Faulty data collection,
outliers in the data, and testing a hypothesis on the same data
that first suggested that hypothesis can invalidate a test. Many
tests run at once will probably produce some significant results
by chance alone, even if the null hypotheses are true.
• The reasoning behind statistical significance works well if you
decide what effect you are seeking, design an experiment or
sample to search for it, and use a test of significance to weigh
the evidence you get.
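The warning about many tests run at once can be quantified. If k independent tests are each run at level α and every null hypothesis is true, the chance of at least one significant result is 1 − (1 − α)^k, and the expected number of significant results is kα. A minimal sketch, with the function name chosen for illustration:

```python
def prob_any_false_positive(k, alpha=0.05):
    """Chance of at least one significant result among k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 100):
    expected = k * 0.05    # expected number of significant results by chance alone
    print(f"k = {k:3d}: P(at least one) = {prob_any_false_positive(k):.4f}, "
          f"expected count = {expected:.2f}")
```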
Example
Suppose that the population of scores of the high school
seniors that took the SAT-Verbal test this year follows a normal
distribution with μ = 48 and σ = 90. A report claims that
10,000 students who took part in the national program for
improving SAT-Verbal scores had a significantly better score
(at the 5% level of sig.) than the population as a whole.
In order to determine if the improvement is of practical
significance one should:
• Find out the actual mean score for the 10,000 students.
• Find out the actual p-value.
Example 6.22 on page 425 in IPS
• Suppose that we are testing the hypothesis of no correlation
between two variables. With 400 observations, an observed
correlation of only r = 0.1 is significant evidence at the α =
0.05 level that the correlation in the population is not zero. The
low significance level does not mean there is strong
association, only that there is some evidence of some
association.
• This is an example where the test results are statistically
significant but not practically significant.
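The claim can be checked with the usual t statistic for a correlation, t = r√(n − 2)/√(1 − r²), which has n − 2 degrees of freedom. The sketch below is an approximation: with 398 degrees of freedom the t distribution is very close to the standard normal, so the normal tail is used for the P-value. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

r, n = 0.1, 400
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Normal approximation to the two-sided P-value (assumption: df = 398 is large
# enough that t(398) and N(0, 1) tails are nearly identical).
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

share_explained = r ** 2   # fraction of variation explained by a linear relationship

print(f"t = {t_stat:.3f}, approximate two-sided P = {p_approx:.3f}")
print(f"r^2 = {share_explained:.2f}")
```

The P-value falls just under 0.05, yet r² shows that the linear relationship accounts for only about 1% of the variation.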
Question 19 Final Exam Dec 1998
Suppose a researcher carries out 100 separate and
independent tests of a particular null hypothesis, and finds
that exactly 4 tests out of 100 are statistically sig. at the 5%
level. Which of the following statements are true?
a) We may report to the media that we have found evidence
for the alternative hypothesis at the 5% level.
b) If the null hypothesis were in fact true, the number of
statistically sig. test results (5% level), out of 100 tests,
should follow a binomial(n = 100, p = ½).
c) The null hypothesis has not been disproved, since the
results are not at all unusual, i.e. the observed level of
significance for these findings should be considered to be
quite unimpressive and of high probability.
The t distribution
• Suppose that a SRS of size n is drawn from a N(μ, σ)
population. Then the one-sample t statistic
t = (x̄ − μ) / (s/√n)
has a t distribution with n − 1 degrees of freedom.
• The t distribution has mean 0 and it is a symmetric
distribution.
• There is a different t distribution for each sample size.
• A particular t distribution is specified by its degrees of
freedom, which come from the sample standard deviation.
Tests for the population mean μ when σ is unknown
• Suppose that a SRS of size n is drawn from a population
having unknown mean μ and unknown stdev. σ. To test the
hypothesis H0: μ = μ0, we first estimate σ by s, the sample
stdev., then compute the one-sample t statistic given by
t = (x̄ − μ0) / (s/√n)
• In terms of a random variable T having the t(n − 1)
distribution, the P-value for the test of H0 against
Ha: μ > μ0 is P(T ≥ t)
Ha: μ < μ0 is P(T ≤ t)
Ha: μ ≠ μ0 is 2·P(T ≥ |t|)
Example
• In a metropolitan area, the concentration of cadmium (Cd) in
leaf lettuce was measured in 6 representative gardens where
sewage sludge was used as fertilizer. The following
measurements (in mg/kg of dry weight) were obtained.
Cd: 21 38 12 15 14 8
Is there strong evidence that the mean concentration of Cd is
higher than 12?
Descriptive Statistics
Variable   N   Mean    Median   TrMean   StDev   SE Mean
Cd         6   18.00   14.50    18.00    10.68   4.36
• The hypotheses to be tested are: H0: μ = 12 vs Ha: μ > 12.
• The test statistic is:
t = (x̄ − μ0) / (s/√n) = (18 − 12) / (10.68/√6) = 1.38
The degrees of freedom are df = 6 − 1 = 5.
Since t = 1.38 < 2.015 (the upper 5% critical value of t(5)), we cannot
reject H0 at the 5% level, and so there is no strong evidence that the
mean concentration is higher than 12.
The P-value satisfies 0.10 < P(T(5) ≥ 1.38) < 0.15, so it is greater
than 0.05, indicating a non-significant result.
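The test statistic and P-value can be reproduced from the six measurements. Python's standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule. The function names t_pdf and t_upper_tail are illustrative, and this is not a description of how MINITAB computes the value.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3     # by symmetry, P(T >= 0) = 0.5

cd = [21, 38, 12, 15, 14, 8]       # cadmium measurements from the example
mu0 = 12
n = len(cd)

t_stat = (mean(cd) - mu0) / (stdev(cd) / sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)   # Ha: mu > 12, so the upper tail is used

print(f"t = {t_stat:.2f}, df = {n - 1}, P-value = {p_value:.2f}")
```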
CIs for the population mean μ when σ is unknown
• Suppose that a SRS of size n is drawn from a population
having unknown mean μ. A C-level CI for μ when σ is
unknown is an interval of the form
( x̄ − t* s/√n , x̄ + t* s/√n )
where t* is the value for the t(n − 1) density curve with area C
between −t* and t*.
• Example:
Give a 95% CI for the mean Cd concentration.
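A sketch of the calculation for the six Cd measurements follows. The critical value t* = 2.571 for 5 degrees of freedom is taken from a standard t table and hard-coded, since Python's standard library does not provide t quantiles.

```python
from math import sqrt
from statistics import mean, stdev

cd = [21, 38, 12, 15, 14, 8]   # cadmium measurements from the example
T_STAR = 2.571                 # t* for 95% confidence with df = 5 (from a t table)

n = len(cd)
x_bar = mean(cd)
margin = T_STAR * stdev(cd) / sqrt(n)   # t* times the standard error s / sqrt(n)
ci = (x_bar - margin, x_bar + margin)

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```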
• MINITAB commands: Stat > Basic Statistics > 1-Sample t
• MINITAB outputs for the above problem:
T-Test of the Mean
Test of mu = 12.00 vs mu > 12.00
Variable   N   Mean    StDev   SE Mean   T      P
Cd         6   18.00   10.68   4.36      1.38   0.11
T Confidence Intervals
Variable   N   Mean    StDev   SE Mean   95.0 % CI
Cd         6   18.00   10.68   4.36      (6.79, 29.21)
Question 3 Final Exam Dec 2000
• In order to test H0: μ = 60 vs Ha: μ ≠ 60 a random sample of 9
observations (normally distributed) is obtained, yielding x̄ = 55
and s = 5. What is the p-value of the test for this sample?
a) greater than 0.10.
b) between 0.05 and 0.10.
c) between 0.025 and 0.05.
d) between 0.01 and 0.025.
e) less than 0.01.
Question
A manufacturing company claims that its new floodlight will
last 1000 hours. After collecting a simple random sample of
size ten, you determine that a 95% confidence interval for the
true mean number of hours that the floodlights will last, μ, is
(970, 995). Which of the following are true? (Assume all tests
are two-sided.)
I) At any α < .05, we can reject the null hypothesis that the true
mean is 1000.
II) If a 99% confidence interval for the mean were determined
here, the numerical value 972 would certainly lie in this
interval.
III) If we wished to test the null hypothesis H0: μ = 988, we
could say that the p-value must be < 0.05.
Questions
1. Alpha (level of sig. α) is
a) the probability of rejecting H0 when H0 is true.
b) the probability of supporting H0 when H0 is false.
c) supporting H0 when H0 is true.
d) rejecting H0 when H0 is false.
2. Confidence intervals can be used to do hypothesis tests for
a) left tail tests.
b) right tail tests.
c) two tailed tests.
3. The Type II error is supporting a null hypothesis that is false. T/F
Robustness of the t procedures
• Robust procedures
A statistical inference procedure is called robust if the
probability calculations required are insensitive to violations of
the assumptions made.
• t-procedures are quite robust against nonnormality of the
population except in the case of outliers or strong skewness.
Simulation study
• Let’s generate 100 samples of size 10 from a moderately
skewed distribution (Chi-square distribution with 5 df ) and
calculate the 95% t-intervals to see how many of them contain
the true mean μ = 5.
• First let’s have a look at the histogram of the 1000 values
generated from this distribution.
[Histogram of C1: Frequency (vertical axis, 0 to 400) against C1 (horizontal axis, 0 to 30)]
Variable   N      Mean     Median   TrMean   StDev
C1         1000   4.9758   4.2788   4.7329   3.1618
T Confidence Intervals
Variable   N    Mean    StDev   SE Mean   95.0 % CI
C1         10   5.21    3.89    1.23      (2.43, 7.99)
. . .
C4         10   4.449   1.593   0.504     (3.309, 5.589)
C5         10   5.33    4.23    1.34      (2.31, 8.36)
C6         10   3.267   2.312   0.731     (1.612, 4.921)*
C7         10   4.981   2.988   0.945     (2.844, 7.118)
C8         10   3.725   1.520   0.481     (2.638, 4.812)*
C9         10   4.487   2.332   0.738     (2.819, 6.155)
. . .
C14        10   4.650   1.854   0.586     (3.324, 5.977)
C15        10   2.973   2.163   0.684     (1.425, 4.520)*
C16        10   4.685   2.254   0.713     (3.072, 6.297)
C26        10   5.594   2.984   0.944     (3.459, 7.728)
C27        10   3.468   2.078   0.657     (1.982, 4.955)*
C28        10   5.59    3.84    1.22      (2.84, 8.34)
. . .
C62        10   5.689   3.113   0.984     (3.462, 7.916)
C63        10   3.724   1.741   0.551     (2.479, 4.970)*
C64        10   4.387   2.157   0.682     (2.843, 5.930)
. . .
C87        10   7.01    3.44    1.09      (4.55, 9.47)
C88        10   3.281   2.265   0.716     (1.661, 4.902)*
C89        10   4.78    3.20    1.01      (2.49, 7.06)
. . .
C99        10   6.52    4.24    1.34      (3.49, 9.56)
C100       10   3.614   2.198   0.695     (2.042, 5.186)
The number of intervals not capturing the true mean (μ = 5), marked with *, is 6/100.
Example
• 100 samples of size 15 were drawn from a very skewed
distribution (Chi-square distribution with 1 df).
Variable   N      Mean     Median   TrMean   StDev
C1         1500   0.9947   0.4766   0.8059   1.3647
[Histogram of C1: Frequency (vertical axis, 0 to 1500) against C1 (horizontal axis, 0 to 15)]
• The 95% CIs (t-intervals) for these 100 samples are given below.
T Confidence Intervals
Variable   N    Mean    StDev   SE Mean   95.0 % CI
C1         15   0.773   0.939   0.242     (0.253, 1.293)
C2         15   1.093   1.491   0.385     (0.268, 1.919)
C3         15   0.553   0.735   0.190     (0.146, 0.960)*
C4         15   0.387   0.732   0.189     (-0.019, 0.792)*
C5         15   1.239   2.146   0.554     (0.051, 2.427)
...
C23        15   0.491   0.619   0.160     (0.148, 0.834)*
C24        15   0.582   1.088   0.281     (-0.020, 1.184)
C25        15   0.550   0.660   0.170     (0.184, 0.915)*
C26        15   0.634   0.769   0.199     (0.208, 1.060)
C27        15   0.508   0.528   0.136     (0.216, 0.800)*
...
C51        15   1.122   1.292   0.334     (0.406, 1.837)
C52        15   0.519   0.664   0.171     (0.151, 0.887)*
C53        15   1.666   2.028   0.524     (0.543, 2.789)
...
C59        15   1.208   2.297   0.593     (-0.065, 2.480)
C60        15   0.644   0.525   0.136     (0.353, 0.935)*
C61        15   1.088   1.122   0.290     (0.466, 1.709)
T Confidence Intervals (continuation)
Variable   N    Mean     StDev    SE Mean   95.0 % CI
...
C79        15   0.895    0.931    0.240     (0.379, 1.411)
C80        15   0.391    0.767    0.198     (-0.034, 0.816)*
C81        15   1.038    0.992    0.256     (0.488, 1.587)
C82        15   0.952    1.407    0.363     (0.173, 1.732)
C83        15   0.2763   0.2999   0.0774    (0.1102, 0.4424)*
C84        15   1.237    1.999    0.516     (0.130, 2.345)
...
C99        15   0.921    0.865    0.223     (0.442, 1.400)
C100       15   0.813    1.437    0.371     (0.018, 1.609)
The number of intervals not capturing the true mean (μ = 1) is 9/100.
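Both simulation studies can be repeated with more samples to estimate the actual coverage of the nominal 95% t-interval. The sketch below is not the MINITAB procedure used for the slides. It assumes Python's random.gammavariate as the chi-square generator (a chi-square variable with k df is a gamma variable with shape k/2 and scale 2) and hard-codes t* values from a standard t table. The function name coverage, the seed, and the number of replications are illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

def coverage(df_chi2, n, t_star, reps=2000, seed=1):
    """Share of nominal 95% t-intervals containing the true mean of a chi-square(df_chi2) population."""
    rng = random.Random(seed)
    true_mean = df_chi2     # the mean of a chi-square distribution equals its df
    hits = 0
    for _ in range(reps):
        sample = [rng.gammavariate(df_chi2 / 2, 2) for _ in range(n)]
        margin = t_star * stdev(sample) / sqrt(n)
        if abs(mean(sample) - true_mean) <= margin:
            hits += 1
    return hits / reps

# t* = 2.262 for df = 9 and t* = 2.145 for df = 14 (95% confidence, from a t table)
cov_moderate = coverage(df_chi2=5, n=10, t_star=2.262)   # moderately skewed case
cov_strong = coverage(df_chi2=1, n=15, t_star=2.145)     # very skewed case

print(f"chi-square(5), n = 10: coverage = {cov_moderate:.3f}")
print(f"chi-square(1), n = 15: coverage = {cov_strong:.3f}")
```

With only 100 samples, as on the slides, the observed miss counts (6 and 9) are themselves subject to sampling variation, so a larger number of replications gives a steadier estimate.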