Hypothesis testing: Examples
AMS7, Spring 2012
Example 1: Testing a Claim about a Proportion
• Sect. 7.3, # 2: Survey of Drinking:
In a Gallup survey, 1087 randomly selected adults were
asked whether they used alcoholic beverages. 62% of the
subjects said that they used alcoholic beverages. Test the
claim that the majority (more than 50%) of adults use
alcoholic beverages with a 0.05 significance level.
Set the Null and Alternative Hypotheses
• Set the Null and Alternative Hypotheses about p (proportion
of adults that use alcoholic beverages)
• Claim: p > 0.5
• Since the claim does not contain the equal sign, the claim
becomes the Alternative Hypothesis.
• The opposite of the original claim is p ≤ 0.5. Since the
opposite of the original claim contains the equal sign, it
becomes the Null Hypothesis.
• Finally, set the Null Hypothesis to p = 0.5.
• We finally want to test:
H0: p = 0.5 vs. H1: p > 0.5
• From the sign of the Alternative Hypothesis we figure out
that we have a Right-Tailed test.
Select the test statistic and check requirements for its application
• The sample is assumed to be a simple random sample.
• The random variable is the number of adults that use alcoholic
beverages. Under the Null Hypothesis this variable has a Binomial
distribution with sample size n = 1087 and p = 0.5.
• Since np = 543.5 ≥ 5 and nq = 543.5 ≥ 5 (with q = 1 - p), the normal
distribution can be used to approximate the binomial distribution.
• The sample proportion is p̂ = 0.62. The sampling
distribution of the sample proportion is approximately
normal with mean p and standard deviation $\sqrt{pq/n}$.
• The test statistic
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
has a standard normal distribution.
Critical Region and p-value
• Test statistic:
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.62 - 0.5}{\sqrt{\frac{0.5 \times 0.5}{1087}}} = 7.913$$
(Note that p = 0.5 is the value proposed under the Null
Hypothesis. We assume beforehand that the Null Hypothesis is true.)
• Level of significance of the test: α = 0.05 (probability of rejecting H0
given that it is true).
• Critical value: $z_{\alpha} = z_{0.05} = 1.645$ (this is the z score
corresponding to an area to the left equal to 1 - 0.05 = 0.95).
• Critical region (rejection of H0): all values of the test statistic
greater than 1.645.
• P-value: area to the right of the observed test statistic
(z = 7.913). This area is less than 0.0001.
Take a decision based on the critical region or p-value
• Using the critical region: because the test statistic (z = 7.913) is
greater than the critical value ($z_{0.05} = 1.645$), we reject the Null
Hypothesis.
• Using the p-value: because the p-value (less than 0.0001) is lower than
the significance level (0.05), we reject the Null Hypothesis.
Note: We should reach the same conclusion under the two methods.
• CONCLUSION: The sample data support the claim that the
majority of adults use alcoholic beverages.
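The steps of Example 1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `one_proportion_z_test` is hypothetical and chosen for illustration, and the right-tail area is computed with `math.erfc` rather than read from a table.

```python
from math import erfc, sqrt
from statistics import NormalDist

def one_proportion_z_test(p_hat, n, p0, alpha):
    """Right-tailed z test of H0: p = p0 against H1: p > p0 (illustrative helper)."""
    se = sqrt(p0 * (1 - p0) / n)                   # standard deviation of p-hat under H0
    z = (p_hat - p0) / se                          # observed test statistic
    critical = NormalDist().inv_cdf(1 - alpha)     # z score with area 1 - alpha to its left
    p_value = 0.5 * erfc(z / sqrt(2))              # area to the right of z
    return z, critical, p_value

# Data from Example 1: 62% of 1087 adults, claim p > 0.5, alpha = 0.05
z, critical, p_value = one_proportion_z_test(p_hat=0.62, n=1087, p0=0.5, alpha=0.05)
print(f"z = {z:.3f}, critical value = {critical:.3f}, p-value = {p_value:.2e}")
print("Reject H0" if z > critical else "Fail to reject H0")
```

Both decision rules are evaluated from the same inputs, so the comparison `z > critical` and the comparison `p_value < alpha` necessarily agree.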
Example 2: Testing a claim about a mean
• At a dam in Oregon, fisheries biologists are studying the length
of a particular species of salmon to investigate the population
structure of resident fish. They collect a sample of 60 fish
and find that the mean length is 15 inches. Assume the population
standard deviation is known from a previous study to be 1.5
inches. Use a 0.05 significance level to test the claim that this
species of salmon has a mean length different from 14
inches.
Set the Null and Alternative Hypotheses
• Set the Null and Alternative Hypotheses about μ (mean
length of salmon)
• Claim: μ ≠ 14
• Since the claim does not contain the equal sign, the claim
becomes the Alternative Hypothesis.
• The opposite of the original claim is μ = 14. Since the
opposite of the original claim contains the equal sign, it
becomes the Null Hypothesis.
• Finally, set the Null Hypothesis to μ = 14.
• We finally want to test:
H0: μ = 14 vs. H1: μ ≠ 14
• From the sign of the Alternative Hypothesis we figure out
that we have a Two-Tailed test.
Select the test statistic and check requirements for its application
• The sample is assumed to be a simple random sample.
• The random variable is the length of salmon.
• Since the sample size n = 60 > 30, a normal distribution can be
used.
• The sample mean is x̄ = 15. The sampling distribution of
the sample mean is approximately normal with mean μ and standard deviation $\sigma/\sqrt{n}$.
• The test statistic
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
has a standard normal distribution.
Critical Region and p-value
• Test statistic:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{15 - 14}{1.5/\sqrt{60}} = 5.164$$
(Note that μ = 14 is the value proposed under the Null
Hypothesis. We assume beforehand that the Null Hypothesis is true.)
• Level of significance of the test: α = 0.05
• Critical values: $z_{\alpha/2} = z_{0.025} = 1.96$ (this is the z score
corresponding to an area to the left equal to 1 - 0.025 = 0.975),
and $-z_{\alpha/2} = -z_{0.025} = -1.96$.
• Critical region (rejection of H0): all values of the test statistic
greater than 1.96 or lower than -1.96.
• P-value: 2 × area to the right of the observed test statistic
(z = 5.164). The p-value is less than 0.0001.
Take a decision based on the critical region or p-value
• Using the critical region: because the test statistic (z = 5.164)
is greater than the critical value ($z_{0.025} = 1.96$), we reject the
Null Hypothesis.
• Using the p-value: because the p-value (less than 0.0001) is lower than
the significance level (0.05), we reject the Null Hypothesis.
Note: Again, we should reach the same conclusion under the
two methods.
• CONCLUSION: The sample data support the claim that
salmon have a mean length different from 14 inches.
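As a check on Example 2, here is a short sketch of the two-tailed z test with known population standard deviation. It uses only the Python standard library; the function name `two_tailed_z_test` is a hypothetical helper, not something from the course material.

```python
from math import erfc, sqrt
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, sigma, n, alpha):
    """Two-tailed z test of H0: mu = mu0 against H1: mu != mu0 (illustrative helper)."""
    se = sigma / sqrt(n)                            # standard deviation of the sample mean
    z = (x_bar - mu0) / se                          # observed test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical value z_{alpha/2}
    p_value = erfc(abs(z) / sqrt(2))                # equals 2 * P(Z > |z|)
    return z, critical, p_value

# Data from Example 2: n = 60 fish, sample mean 15, sigma = 1.5, H0: mu = 14
z, critical, p_value = two_tailed_z_test(x_bar=15, mu0=14, sigma=1.5, n=60, alpha=0.05)
print(f"z = {z:.3f}, critical values = ±{critical:.2f}, p-value = {p_value:.2e}")
print("Reject H0" if abs(z) > critical else "Fail to reject H0")
```

The rejection rule compares `abs(z)` with the upper critical value, which is equivalent to checking both tails of the critical region.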
Example 3: Testing a claim about a mean
• Sect. 7.5 # 17. Sugar in Cereal
A sample of cereal boxes is randomly selected and the sugar
content (grams of sugar per gram of cereal) is recorded. Those
amounts are summarized with these statistics: n = 16, x̄ =
0.295, s = 0.168. Use a 0.10 significance level to test the
claim of a cereal lobbyist that the mean sugar content for all
cereals is less than 0.3 g. Assume that a simple random sample
has been selected from a normally distributed population.
Set the Null and Alternative Hypotheses
• Set the Null and Alternative Hypotheses about μ (mean sugar
content of cereals)
• Claim: μ < 0.3
• Since the claim does not contain the equal sign, the claim
becomes the Alternative Hypothesis.
• The opposite of the original claim is μ ≥ 0.3. Since the
opposite of the original claim contains the equal sign, it
becomes the Null Hypothesis.
• Finally, set the Null Hypothesis to μ = 0.3.
• We finally want to test:
H0: μ = 0.3 vs. H1: μ < 0.3
• From the sign of the Alternative Hypothesis we figure out
that we have a Left-Tailed test.
Select the test statistic and check requirements for its application
• The sample is assumed to be a simple random sample.
• The random variable is the sugar content in cereals.
• The sample size is n = 16 < 30, but the sample comes from a
normally distributed population.
• The population standard deviation is unknown, so we use
the sample standard deviation s.
• The sample mean is x̄ = 0.295. Its standard deviation is estimated
by $s/\sqrt{n}$, and the standardized sample mean follows a Student t
distribution with n - 1 degrees of freedom.
• The test statistic to be used is
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
with n - 1 = 15 degrees of freedom.
Critical Region and p-value
• Test statistic:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{0.295 - 0.3}{0.168/\sqrt{16}} = -0.119$$
(Note that μ = 0.3 is the value proposed under the Null
Hypothesis. We assume beforehand that the Null Hypothesis is true.)
• Level of significance of the test: α = 0.10
• Critical value: $-t_{\alpha} = -t_{0.10} = -1.341$ (this is the critical t value
corresponding to a one-tail area to the left equal to 0.10 and
15 degrees of freedom).
• Critical region (rejection of H0): all values of the test statistic
lower than -1.341.
• P-value: area to the left of the observed test statistic
(t = -0.119). This area is greater than 0.10. Using the computer
we found that this value is approximately 0.453.
Take a decision based on the critical region or p-value
• Using the critical region: because the test statistic (t = -0.119)
is greater than the critical value ($-t_{0.10} = -1.341$), we fail to reject
the Null Hypothesis.
• Using the p-value: because the p-value (0.453) is greater than
the significance level (0.10), we fail to reject the Null
Hypothesis.
Note: Again, we should reach the same conclusion under the
two methods.
• CONCLUSION: There is not sufficient sample evidence to
support the claim that the mean sugar content for all cereals
is less than 0.3 g.
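The t test of Example 3 can be reproduced without statistical tables. The sketch below is an assumption-laden illustration: it uses only the Python standard library, so the Student t probabilities are obtained by Simpson's-rule integration of the density and the critical value by bisection. The helper names `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical; a statistics package would normally provide these.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and the symmetry of the density."""
    if x == 0:
        return 0.5
    h = abs(x) / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                     # area between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(prob, df, lo=-50.0, hi=50.0):
    """Value t with P(T <= t) = prob, found by bisection on t_cdf."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Data from Example 3: n = 16, sample mean 0.295, s = 0.168, H0: mu = 0.3, alpha = 0.10
n, x_bar, s, mu0, alpha = 16, 0.295, 0.168, 0.3, 0.10
df = n - 1
t_stat = (x_bar - mu0) / (s / sqrt(n))       # observed test statistic
critical = t_quantile(alpha, df)             # left-tail critical value, -t_alpha
p_value = t_cdf(t_stat, df)                  # area to the left of t_stat

print(f"t = {t_stat:.3f}, critical value = {critical:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if t_stat < critical else "Fail to reject H0")
```

Numerical integration is used here only to keep the example self-contained; the values it produces should agree with the table value and the computer-reported p-value quoted above up to rounding.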
Confidence Interval Method of Testing Hypotheses
• For a two-tailed test, construct a
confidence interval with a confidence level
of 1 - α.
• For a one-tailed test, construct a
confidence interval with a confidence level
of 1 - 2α (you have to double the level of
significance α).
• We take the decision based on whether
the proposed parameter value falls
within the confidence interval limits: if it does, we fail to
reject H0; if it does not, we reject H0.
Confidence Interval Method for Example 3
• α = 0.10. Construct a confidence interval with confidence level 1 - 2 × 0.10 = 0.80.
• Confidence Interval with 80% Confidence Level:
$$\bar{x} - 1.341 \times \frac{s}{\sqrt{n}} < \mu < \bar{x} + 1.341 \times \frac{s}{\sqrt{n}}$$
$$0.295 - 1.341 \times \frac{0.168}{\sqrt{16}} < \mu < 0.295 + 1.341 \times \frac{0.168}{\sqrt{16}}$$
$$0.238678 < \mu < 0.351322$$
• CONCLUSION: The value μ = 0.3 does fall in the Confidence
Interval. We fail to reject H0 and we reach the same conclusion:
There is not sufficient sample evidence to support the claim that the
mean sugar content for all cereals is less than 0.3 g.
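The interval arithmetic for Example 3 can be verified with a few lines of Python. This is a minimal sketch that takes the critical value 1.341 from the example as given instead of computing it; the variable names are chosen for illustration.

```python
from math import sqrt

# Data from Example 3; t_crit = 1.341 is the table value quoted in the example
n, x_bar, s = 16, 0.295, 0.168
t_crit = 1.341
mu0 = 0.3                                   # value proposed under H0

margin = t_crit * s / sqrt(n)               # half-width of the 80% interval
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:.6f} < mu < {upper:.6f}")
if lower < mu0 < upper:
    print("mu0 is inside the interval: fail to reject H0")
else:
    print("mu0 is outside the interval: reject H0")
```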
Additional Note on using Confidence Intervals for hypothesis testing
• When testing a claim about a population mean, the traditional
method using the critical region, the p-value method and the
confidence interval method are all equivalent and we should expect
the same conclusion.
• When testing a claim about a population proportion, the critical
region method and the p-value method are equivalent. The
confidence interval method can give different results because the
standard deviation of the sample proportion is calculated in
different ways:
• Standard Dev. of the sample proportion (Confidence Interval
Method), using the sample proportion p̂: $\sqrt{\hat{p}\hat{q}/n}$
• Standard Dev. of the sample proportion (Critical region or p-value
Method), using the value of p proposed under H0: $\sqrt{pq/n}$
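To make the difference concrete, the sketch below evaluates both standard deviations with the data of Example 1 (p̂ = 0.62, n = 1087, p = 0.5 under H0) and builds the 1 - 2α = 90% confidence interval that corresponds to the one-tailed test at α = 0.05. The variable names are illustrative, and reusing Example 1 here is a choice made for this sketch.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, p0, alpha = 0.62, 1087, 0.5, 0.05

se_ci = sqrt(p_hat * (1 - p_hat) / n)       # confidence interval method: uses p-hat
se_test = sqrt(p0 * (1 - p0) / n)           # critical region / p-value method: uses p0

# One-tailed test at level alpha <-> confidence level 1 - 2*alpha
z_crit = NormalDist().inv_cdf(1 - alpha)
lower, upper = p_hat - z_crit * se_ci, p_hat + z_crit * se_ci

print(f"SE (confidence interval method) = {se_ci:.6f}")
print(f"SE (critical region method)     = {se_test:.6f}")
print(f"90% confidence interval: {lower:.4f} < p < {upper:.4f}")
print("p0 inside interval" if lower < p0 < upper else "p0 outside interval: reject H0")
```

In this particular case the two methods agree, since the evidence is strong; the two standard deviations differ, so borderline cases can lead to different decisions.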
Testing a claim about a standard deviation or variance
• REQUIREMENTS
1) Simple random sample
2) Population must have a normal distribution (a stricter requirement
than in the tests about a mean)
3) Test statistic has a chi-square distribution:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
with n - 1 degrees of freedom
Chi-square distribution
• PROPERTIES
1. All values are non-negative
2. Distribution is not symmetric
3. Different distributions for different degrees of freedom
4. Critical values in Table A-4
Example: Testing a claim on a standard deviation
According to the US Department of Agriculture, imports of
Canadian-grown potatoes have depressed US sales of potatoes
during the last six years. From the sample of these six years (n = 6),
the standard deviation is s = 2.07. Assume we know that, in the past, the
population standard deviation was 0.79. Use a 0.05 significance
level to test the claim that US potato sales have more variation
in the last six years than in the past. Assume a simple random
sample and a normally distributed population.
Set the Null and Alternative Hypotheses
• Set the Null and Alternative Hypotheses about σ (population
standard deviation of US sales of potatoes)
• Claim: σ > 0.79
• Since the claim does not contain the equal sign, the claim
becomes the Alternative Hypothesis.
• The opposite of the original claim is σ ≤ 0.79. Since the
opposite of the original claim contains the equal sign, it
becomes the Null Hypothesis.
• Finally, set the Null Hypothesis to σ = 0.79.
• We finally want to test:
H0: σ = 0.79 vs. H1: σ > 0.79
• From the sign of the Alternative Hypothesis we figure out
that we have a Right-Tailed test.
Select the test statistic and check requirements for its application
• The sample is assumed to be a simple random sample.
• The random variable is US sales of potatoes.
• The sample size is n = 6 < 30, but the sample comes from a
normally distributed population.
• The test statistic to be used is
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
with n - 1 = 5 degrees of freedom.
Critical Region and p-value
• Test statistic:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{5 \times 2.07^2}{0.79^2} = 34.328$$
with n - 1 = 5 degrees of freedom.
• (Note that σ = 0.79 is the value proposed under the Null Hypothesis.
We assume beforehand that the Null Hypothesis is true.)
• Level of significance of the test: α = 0.05
• Critical value: $\chi^2_{0.05} = 11.071$ (this value corresponds to the column
for α = 0.05 and 5 degrees of freedom of Table A-4).
• Critical region (rejection of H0): all values of the test statistic
greater than 11.071.
• P-value: area to the right of the observed test statistic (χ² = 34.328).
This area is lower than 0.005. Using the computer we found that this
value is approximately 0.000002.
Take a decision based on the critical region or p-value
• Using the critical region: because the test statistic
(χ² = 34.328) is greater than the critical value ($\chi^2_{0.05} = 11.071$),
we reject the Null Hypothesis.
• Using the p-value: because the p-value (0.000002) is lower
than the significance level (0.05), we reject the Null
Hypothesis.
Note: Again, we should reach the same conclusion under the
two methods.
• CONCLUSION: The sample data support the claim that US
potato sales have more variation in the last six years than in
the past.
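The chi-square test in this example can also be checked numerically. The sketch below is an illustration under stated assumptions: it uses only the Python standard library, approximates the right-tail area by Simpson's rule over a finite interval (the area beyond it is negligible for these values), and finds the critical value by bisection. The helper names `chi2_pdf`, `chi2_right_tail`, and `chi2_critical` are hypothetical.

```python
from math import exp, gamma

def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom (df >= 2)."""
    return x ** (df / 2 - 1) * exp(-x / 2) / (2 ** (df / 2) * gamma(df / 2))

def chi2_right_tail(x, df, width=200.0, steps=4000):
    """P(X > x), approximated by Simpson's rule on [x, x + width]."""
    h = width / steps                        # steps must be even for Simpson's rule
    total = chi2_pdf(x, df) + chi2_pdf(x + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(x + i * h, df)
    return total * h / 3

def chi2_critical(alpha, df, lo=0.0, hi=200.0):
    """Value c with P(X > c) = alpha, found by bisection on the right-tail area."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_right_tail(mid, df) > alpha:
            lo = mid                         # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Data from the example: n = 6, s = 2.07, H0: sigma = 0.79, alpha = 0.05
n, s, sigma0, alpha = 6, 2.07, 0.79, 0.05
df = n - 1
chi2_stat = df * s ** 2 / sigma0 ** 2        # observed test statistic
critical = chi2_critical(alpha, df)          # right-tail critical value
p_value = chi2_right_tail(chi2_stat, df)     # area to the right of chi2_stat

print(f"chi-square = {chi2_stat:.3f}, critical value = {critical:.3f}, p-value = {p_value:.1e}")
print("Reject H0" if chi2_stat > critical else "Fail to reject H0")
```

The integration helper is only meant for moderate degrees of freedom and test statistics like those in this example; a statistics package would be the usual choice in practice.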