Download biostat7

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Psychometrics wikipedia , lookup

Taylor's law wikipedia , lookup

Foundations of statistics wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Statistical hypothesis testing wikipedia , lookup

Misuse of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Hypothesis Testing
• A criminal trial is an example of hypothesis testing.
• In a trial a jury must decide between two hypotheses.
• The null hypothesis is
•
H0: The defendant is innocent
• The alternative hypothesis or research hypothesis is
•
HA: The defendant is guilty
• The jury does not know which hypothesis is true. They
must make a decision on the basis of evidence
presented.
1
Hypothesis Testing
• Convicting the defendant is called rejecting the null
hypothesis in favor of the alternative hypothesis. That is,
the jury is saying that there is enough evidence to
conclude that the defendant is guilty (i.e., there is
enough evidence to conclude that the assumption of
innocence is suspect).
• If the jury acquits it is stating that there is not enough
evidence to reject the null hypothesis in support of the
alternative hypothesis. This does not prove that the
defendant is innocent, only that there is not enough
evidence to support the alternative hypothesis. That is
why we never say that we accept the null hypothesis,
although most people in industry will say “We accept the
null hypothesis”
2
Errors in Hypothesis Testing
• There are two possible errors.
• A Type I error occurs when we reject a true null
hypothesis. That is, a Type I error occurs when the jury
convicts an innocent person. We would want the
probability of this type of error [maybe 0.001 – beyond a
reasonable doubt] to be very small for a criminal trial
where a conviction results in the death penalty, whereas
for a civil trial, where conviction might result in someone
having to “pay for damages to a wrecked auto”,we would
be willing for the probability to be larger [0.49 –
preponderance of the evidence ]
•
P(Type I error) =  [usually 0.05 or 0.01]
3
Errors in Hypothesis Testing
• A Type II error occurs when we don’t reject a false null
hypothesis [accept the null hypothesis]. That occurs
when a guilty defendant is acquitted.
• In practice, this type of error is by far the most serious
mistake we normally make. For example, if we test the
hypothesis that the amount of medication in a heart pill is
equal to a value which will cure your heart problem and
“accept the hull hypothesis that the amount is ok”. Later
on we find out that the average amount is WAY too large
and people die from “too much medication” [I wish we
had rejected the hypothesis and threw the pills in the
trash can], it’s too late because we shipped the pills to
the public.
4
Errors in Hypothesis Testing
• The probability of a Type I error is denoted as α
(Greek letter alpha). The probability of a type II
error is β (Greek letter beta).
• The two probabilities are inversely related.
Decreasing one increases the other, for a fixed
sample size.
• In other words, you can’t have  and β both real
small for any old sample size. You may have to
take a much larger sample size, or in the court
example, you need much more evidence.
5
Only 4 things can happen when
you test a hypothesis!
Conclusion of
Hypothesis Test
True state of nature
Ho is TRUE
Ho is FALSE
Fail to reject Ho
Correct Decision
Type II error [β]
Reject Ho
Type I error []
Correct Decision
6
Conclusions to Hypothesis Testing
• 1. There are two hypotheses, the null and the alternative
hypotheses.
• 2. The procedure begins with the assumption that the
null hypothesis is true.
• 3. The goal is to determine whether there is enough
evidence to infer that the alternative hypothesis is true,
or the null is not likely to be true.
• 4. There are two possible decisions:
•
Conclude that there is enough evidence to support
the alternative hypothesis. Reject the null.
•
Conclude that there is not enough evidence to
support the alternative hypothesis. Fail to reject the null.
7
Hypothesis Testing
• The two hypotheses are called the null hypothesis and
the other the alternative or research hypothesis. The
usual notation is:
•
H0: — the ‘null’ hypothesis
•
HA: — the ‘alternative’ or ‘research’ hypothesis
• The null hypothesis (H0) will always state that the
parameter equals the value specified in the alternative
hypothesis HA ( some people will use H1 for the
alternative hypothesis)
8
Example: Hypothesis Test
•
•
•
•
•
Recall the mean time in the recovery room example earlier. Rather than
estimate the mean time, our hospital administrator wants to know whether
the mean is different from 350 minutes (which is the standard Blue
Cross/Blue Shield uses for insurance). In other words BC/BS claims that
the mean time is 350 minutes and we want to check this claim out to see if it
appears reasonable. We can rephrase this request into a test of the
hypothesis:
H0: μx = 350
Thus, our research hypothesis becomes:
HA: μx ≠ 350
Recall that the standard deviation [σ]was assumed to be 75, the sample
size [n] was 25, and the sample mean [ ] was calculated to be 370.16. If
the sample mean is close to 350, it would appear that BC/BS may be
correct in assuming the average time for recovery is in fact 350
9
Example: Hypothesis Test
• The testing procedure begins with the assumption that
the null hypothesis is true.
• Thus, until we have further statistical evidence, we will
assume:
•
H0: = 350 (assumed to be TRUE)
• The next step will be to determine the sampling
distribution of the sample mean assuming the true
mean is 350.
•
is normal with
350
•
75/SQRT(25) = 15
10
Is the Sample Mean in the Guts of the
Sampling Distribution??
11
Three ways to determine this: First way
1.
Unstandardized test statistic: Is in the guts of the
sampling distribution? Depends on what you define as
the “guts” of the sampling distribution.
2.
If we define the guts as the center 95% of the
distribution [this means  = 0.05], then the critical
values that define the guts will be 1.96 standard
deviations of X-Bar on either side of the mean of the
sampling distribution [350], or
UCV = 350 + 1.96*15 = 350 + 29.4 = 379.4
LCV = 350 – 1.96*15 = 350 – 29.4 = 320.6
3.
4.
12
1. Unstandardized Test Statistic Approach
13
1. Unstandardized Test Statistic Approach –
Managerial Summary
• Reason: Since the sample mean is between the
LCV(320.6) and the UCV(379.4)
• Conclusion: We fail to reject the null
hypothesis Ho: μx = 350
• Level of Significance: At a 5% level of significance.
• In other words, we have no statistical evidence to
dispute BC/BS’s claim that patients should stay in
recovery an average of 350 minutes.
14
Three ways to determine this: Second way
• 2. Standardized test statistic: Since we defined the
“guts” of the sampling distribution to be the center 95%
[ = 0.05],
• If the Z-Score for the sample mean is greater than
1.96, we know that will be in the reject region on the
right side or
• If the Z-Score for the sample mean is less than -1.97, we
know that
will be in the reject region on the left side.
• Z=(
- μo )/ σ
= (370.16 – 350)/15 = 1.344
• Is this Z-Score in the guts of the sampling distribution???
15
2. Standardized Test Statistic Approach
16
2. Standardized Test Statistic Approach –
Managerial Summary
• Reason: Since the sample value for Z is
between the LCV(-1.96) and the UCV(+1.96)
• Conclusion: We fail to reject the null
hypothesis Ho: μx = 350
• Level of Significance: At a 5% level of significance.
• In other words, we have no statistical evidence to
dispute BC/BS’s claim that patients should stay in
recovery an average of 350 minutes.
17
Three ways to determine this:
Third way
• 3. The p-value approach (which is generally used with computer
and statistical software): Increase the “Rejection Region” until it
“captures” the sample mean.
• For this example, since is to the right of the mean, calculate
• P( > 370.16) = P(Z > 1.344) = 0.0901
• Since this is a two tailed test, you must double this area for the pvalue.
•
p-value = 2*(0.0901) = 0.1802
• Since we defined the guts as the center 95% [ = 0.05], the reject
region was 5%. Since our sample mean, , is in the 18.02%
region, it cannot be in our 5% rejection region [ = 0.05].
• Therefore, since the p-value > 0.05, we reject the hypothesis
that the mean is 350 at a 5% level of significance
18
Three ways to determine this:
Third way
19
Statistical Conclusions:
• Unstandardized Test Statistic:
•
Since LCV (320.6) <
(370.16) < UCV (379.4), we
fail to reject the null hypothesis at a 5% level of
significance.
• Standardized Test Statistic:
•
Since -Z/2(-1.96) < Z(1.344) < Z/2 (1.96), we fail to
reject the null hypothesis at a 5% level of significance.
• P-value:
•
Since p-value (0.1802) > 0.05 [], we fail to reject
the hull hypothesis at a 5% level of significance.
20
NOW, what happens when σ is
unknown and we have to use “s”
•
Now assume the standard deviation [σ] was unknown
and the sample standard deviation [s] calculated to be
80.8, the sample size [n] was 25, and the sample
mean [ ] was calculated to be 370.16.
•
Use the t-statistic [assuming population is normal] to
calculate the new critical values:
1.
UCV = 350 + t/2*(80.8/5) = ?
2.
LCV = 350 – t/2*(80.8/5) = ?
d.f. = n-1 = 24
 = 0.05
What is your managerial conclusion???
21
Hypothesis Testing Advise
• In most cases you will not know σ and will need
to use the sample standard deviation in all your
formulas.
• The minute you do this, everywhere you find a
“Z score” you will replace it with a “t score” with
n-1 degrees of freedom [single mean problems]
• You must be able to assume population is
approximately normal to do this, especially for
small sample sizes
22
Hypothesis Testing Advise
• If sample sizes are “large” the central limit will
take care of you and “in the old days” we went
ahead and used the Z score whenever the
sample sizes were greater than 30.
• Bottom Line: If the population is not normal,
and you have to estimate σ from the data [ use
“s” ], and the sample size is small [usually <
30], YOU CANNOT WORK THIS PROBLEM
WITH WHAT YOU LEARN IN THIS CLASS.
23
Three Forms of Single Mean Hypothesis Test [ is
not divided in the one-tail tests]
24
One Tail Hypothesis Test Example
• A health care facility determines that a new
billing system will be cost-effective only if the
mean monthly bill is more than $170.
• A random sample of 400 monthly bills is drawn,
for which the sample mean is $178. The bills are
approximately normally distributed with a
standard deviation of $65 [σ = 65].
• Can we conclude that the new system will be
cost-effective?
25
One Tail Hypothesis Test Example
• The system will be cost effective if the mean bill
for all customers is greater than $170 [some
refer to this as the “research” hypothesis].
• Our null and alternative hypothesis would be:
•
H0: μ < 170 (some text will leave the < off
this null: H0: μ = 170)
•
Ha: μ > 170 (this is what we want to
conclude if we buy the new system)
•
•
Fail to Reject Ho – don’t buy system
Reject Ho – buy system
26
One Tail Hypothesis Test Example
• H0: μ < 170
• Ha: μ > 170
• We know:
•
n = 400,
•
= 178, and
•
σ = 65
•
σ = 65/SQRT(400) = 3.25
•
 = 0.05
27
One Tail Hypothesis Test Example
• Managerial Conclusion???
28
Homework:
• 7.2.3, 7.2.11, 7.2.13, 7.2.15
29
Questions to thing about!
• Is there any significant difference in the
mean grade made on the first test
between male and female students?
– Ho: μF = μM or Ho: μF - μM =0
– “Difference in Means Hypothesis Test”
30
Questions to thing about!
• Is there any significant difference in the mean grade on
the first test and the second test?
• Would we test
– Ho: μ1 = μ2
– Ho: μd = 0
OR
Test 1
85
48
92
……..
μ1
Test 2
77
56
80
……..
μ2
Difference = d
8
-8
12
…….
μd
• Called a “Paired Difference in Means” test.
• The second test Ho: μd = 0 takes the variability between
the students out of the analysis, resulting in less
variability, which means you will have more statistical
“POWER” to detect a difference for this sample size. [In
effect this is a single mean hypothesis test]
31
Questions to thing about!
• Is there a significant difference in the mean grades depending on
which row the student sits in?
• Ho: μRow 1 = μRow 2 = μRow 3 = μRow 4 = μRow 5 = μRow 6
• Called a “One Way Analysis of Variance” Conclusion???
•
32
One Way Analysis of Variance
Box-and-Whisker Plot
1
Row
2
3
4
5
6
74
78
82
86
90
94
98
Grade
33
One Way Analysis of Variance
• You could claim that the means associated with
any vertical column are equal BUT for sure
Rows 5 and 4 have different means than rows 1,
2, and 6
34
Homework
• Describe an experiment at your work
where you might use a “Difference in
Means Hypothesis Test”
• Describe an experiment at your work
where you might use a “Paired Difference
in Means Hypothesis Test”
• Describe an experiment at your work
where you might use a “One Way Analysis
of Variance Test”
35