AMS7: WEEK 7. CLASS 3
Hypothesis Testing Examples
Friday May 15, 2015
Example 1: Testing a Claim about a Proportion
• Sect. 7.3, # 2: Survey of Drinking:
In a Gallup survey, 1087 randomly selected adults were
asked whether they used alcoholic beverages. 62% of the
subjects said that they used alcoholic beverages. Test the
claim that the majority (more than 50%) of adults use
alcoholic beverages with a 0.05 significance level.
Set the Null and Alternative Hypotheses
• Set the Null and Alternative Hypotheses about p (proportion of adults who use alcoholic beverages)
• Claim: p > 0.5
• Since the claim does not contain the equal sign, the claim becomes the Alternative Hypothesis.
• The opposite of the original claim is p ≤ 0.5. Since the opposite of the original claim contains the equal sign, it becomes the Null Hypothesis.
• Finally, set the Null Hypothesis to p = 0.5.
• The hypotheses to test are: H0: p = 0.5 vs. H1: p > 0.5
• From the sign of the Alternative Hypothesis we see that this is a Right-Tailed test.
Select the test statistic and check requirements for its
application
• The sample is assumed to be a simple random sample
• The random variable is the number of adults who use alcoholic beverages. Under the Null Hypothesis this variable has a Binomial distribution with n = 1087 trials and p = 0.5.
• Since $np \geq 5$ and $nq \geq 5$ (here $np = nq = 543.5$), the normal distribution can be used to approximate the binomial distribution.
• The sample proportion is $\hat{p} = 0.62$. The sampling distribution of the sample proportion is approximately normal with mean $p$ and standard deviation $\sqrt{pq/n}$.
• The test statistic $z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$ has a standard normal distribution.
Critical Region and p-value
• Test statistic: $z = \dfrac{\hat{p} - p}{\sqrt{pq/n}} = \dfrac{0.62 - 0.5}{\sqrt{0.5 \times 0.5 / 1087}} = 7.913$
(Note that p = 0.5 is the value proposed under the Null Hypothesis. We always assume that the Null Hypothesis is true when computing the test statistic.)
• Level of significance of the test: α = 0.05 (the probability of rejecting H0 given that it is true).
• Critical value: $z_{\alpha} = z_{0.05} = 1.645$ (this is the z score with an area of 1 − 0.05 = 0.95 to its left).
• Critical region (rejection of H0): all values of the test statistic greater than 1.645.
• P-value: area to the right of the observed test statistic (z = 7.913). This area is less than 0.0001.
Take a decision based on the critical
region or p-value
• Using the critical region: because the test statistic (z = 7.913) is greater than the critical value ($z_{0.05} = 1.645$), we reject the Null Hypothesis.
• Using the p-value: because the p-value (less than 0.0001) is lower than the significance level (0.05), we reject the Null Hypothesis.
Note: The two methods should always lead to the same conclusion!
• CONCLUSION: The sample data support the claim
that the majority of adults use alcoholic beverages.
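The steps of Example 1 can be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the function name `one_proportion_z_test` is made up for illustration, and the critical value 1.645 is copied from the example rather than computed.

```python
import math

def one_proportion_z_test(p_hat, p0, n):
    """Right-tailed z test for a proportion (normal approximation)."""
    q0 = 1.0 - p0
    # The normal approximation requires n*p0 >= 5 and n*q0 >= 5.
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("normal approximation is not justified")
    # Standard deviation of the sample proportion under the null hypothesis.
    std_error = math.sqrt(p0 * q0 / n)
    z = (p_hat - p0) / std_error
    # Area to the right of z under the standard normal curve.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

z, p_value = one_proportion_z_test(p_hat=0.62, p0=0.5, n=1087)
critical_value = 1.645  # z_0.05, value taken from the example

print(f"z = {z:.3f}")
print("reject H0 by critical region:", z > critical_value)
print("reject H0 by p-value:", p_value < 0.05)
```

Both comparisons give the same decision, as the note above says they should.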
Example 2: Testing a claim about a mean: σ known
• At a dam in Oregon, fisheries biologists are studying the length of a particular species of salmon to investigate the population structure of resident fish. They collect a sample of 60 fish and find that the mean length is 15 inches. Assume the population standard deviation is known from a previous study to be 1.5 inches. Use a 0.05 significance level to test the claim that this species of salmon has a mean length different from 14 inches.
Set the Null and Alternative Hypotheses
• Set the Null and Alternative Hypotheses about μ (mean length of salmon)
• Claim: μ ≠ 14
• Since the claim does not contain the equal sign, the claim becomes the Alternative Hypothesis.
• The opposite of the original claim is μ = 14. Since the opposite of the original claim contains the equal sign, it becomes the Null Hypothesis.
• Finally, set the Null Hypothesis to μ = 14.
• The hypotheses to test are: H0: μ = 14 vs. H1: μ ≠ 14
• From the sign of the Alternative Hypothesis we see that this is a Two-Tailed test.
Select the test statistic and check requirements
for its application
• The sample is assumed to be a simple random sample
• The random variable is: length of salmon
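• Since the sample size n = 60 is greater than 30, the sampling distribution of the sample mean can be approximated by a normal distribution.
• The sample mean is $\bar{x} = 15$. The sampling distribution of the sample mean is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
• The test statistic $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution.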
Critical Region and p-value
• Test statistic: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{15 - 14}{1.5/\sqrt{60}} = 5.164$
(Note that μ = 14 is the value proposed under the Null Hypothesis. We always assume that the Null Hypothesis is true when computing the test statistic.)
• Level of significance of the test: α = 0.05
• Critical values: $z_{\alpha/2} = z_{0.025} = 1.96$ (this is the z score with an area of 1 − 0.025 = 0.975 to its left) and $-z_{\alpha/2} = -z_{0.025} = -1.96$.
• Critical region (rejection of H0): all values of the test statistic greater than 1.96 or lower than −1.96.
• P-value: 2 × area to the right of the observed test statistic (z = 5.164). This area is less than 0.0001.
Take a decision based on the critical
region or p-value
• Using the critical region: because the test statistic (z = 5.164) is greater than the critical value ($z_{0.025} = 1.96$), we reject the Null Hypothesis.
• Using the p-value: because the p-value (less than 0.0001) is lower than the significance level (0.05), we reject the Null Hypothesis.
Note: Again, the two methods should lead to the same conclusion!
• CONCLUSION: The sample data support the claim that salmon have a mean length different from 14 inches.
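The two-tailed calculation in Example 2 can be reproduced with a short sketch. It assumes only the Python standard library; `z_test_mean_two_tailed` is an illustrative name, and the critical value 1.96 is taken from the example rather than computed.

```python
import math

def z_test_mean_two_tailed(x_bar, mu0, sigma, n):
    """Two-tailed z test for a mean when sigma is known."""
    std_error = sigma / math.sqrt(n)  # standard deviation of the sample mean
    z = (x_bar - mu0) / std_error
    # Two-tailed p-value: twice the area beyond |z|.
    # 2 * (0.5 * erfc(|z| / sqrt(2))) simplifies to erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = z_test_mean_two_tailed(x_bar=15, mu0=14, sigma=1.5, n=60)
critical_value = 1.96  # z_0.025, value taken from the example

print(f"z = {z:.3f}")
print("reject H0 by critical region:", abs(z) > critical_value)
print("reject H0 by p-value:", p_value < 0.05)
```

Comparing `abs(z)` with the positive critical value covers both tails of the rejection region at once.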
Example 3: Testing a claim about a mean: σ unknown
• Sect. 7.5 # 17. Sugar in Cereal
A sample of cereal boxes is randomly selected and the sugar content (grams of sugar per gram of cereal) is recorded. The amounts are summarized with these statistics: n = 16, $\bar{x} = 0.295$, s = 0.168. Use a 0.10 significance level to test the claim of a cereal lobbyist that the mean sugar content for all cereals is less than 0.3 g. Assume that a simple random sample has been selected from a normally distributed population.
Set the Null and Alternative Hypotheses
• Set the Null and Alternative Hypotheses about μ (mean sugar content of cereals)
• Claim: μ < 0.3
• Since the claim does not contain the equal sign, the claim becomes the Alternative Hypothesis.
• The opposite of the original claim is μ ≥ 0.3. Since the opposite of the original claim contains the equal sign, it becomes the Null Hypothesis.
• Finally, set the Null Hypothesis to μ = 0.3.
• The hypotheses to test are: H0: μ = 0.3 vs. H1: μ < 0.3
• From the sign of the Alternative Hypothesis we see that this is a Left-Tailed test.
Select the test statistic and check requirements
for its application
• The sample is assumed to be a simple random sample
• The random variable is: sugar content in cereals
• The sample size n = 16 is less than 30, but the sample comes from a normally distributed population.
• The population standard deviation σ is unknown, so we use the sample standard deviation s.
• The sample mean is $\bar{x} = 0.295$. Its standard deviation is estimated by $s/\sqrt{n}$, and the standardized sample mean follows a Student t distribution with n − 1 degrees of freedom.
• The test statistic to be used is $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ with n − 1 = 15 degrees of freedom.
Critical Region and p-value
• Test statistic: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{0.295 - 0.3}{0.168/\sqrt{16}} = -0.119$
(Note that μ = 0.3 is the value proposed under the Null Hypothesis. We always assume that the Null Hypothesis is true when computing the test statistic.)
• Level of significance of the test: α = 0.10
• Critical value: $-t_{\alpha} = -t_{0.10} = -1.341$ (this is the critical t value corresponding to a one-tail area to the left equal to 0.10 and 15 degrees of freedom).
• Critical region (rejection of H0): all values of the test statistic lower than −1.341.
• P-value: area to the left of the observed test statistic (t = −0.119). This area is greater than 0.10. Using the computer we found that this value is approximately 0.453.
Take a decision based on the critical
region or p-value
• Using the critical region: because the test statistic (t = −0.119) is greater than the critical value ($-t_{0.10} = -1.341$), it does not fall in the critical region and we fail to reject the Null Hypothesis.
• Using the p-value: because the p-value (0.453) is greater than the significance level (0.10), we fail to reject the Null Hypothesis.
Note: Again, the two methods should lead to the same conclusion!
• CONCLUSION: There is not sufficient sample evidence to support the claim that the mean sugar content for all cereals is less than 0.3 g.
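The critical region decision of Example 3 can be sketched as follows. This assumes only the Python standard library, which has no Student t distribution, so the critical value −1.341 is copied from the example and the p-value is not computed; `t_statistic` is an illustrative name.

```python
import math

def t_statistic(x_bar, mu0, s, n):
    """t statistic for a mean when sigma is unknown (n - 1 degrees of freedom)."""
    return (x_bar - mu0) / (s / math.sqrt(n))

t = t_statistic(x_bar=0.295, mu0=0.3, s=0.168, n=16)
critical_value = -1.341  # -t_0.10 with 15 degrees of freedom, from the example

# Left-tailed test: reject H0 only if t falls below the critical value.
reject_h0 = t < critical_value

print(f"t = {t:.3f}")
print("reject H0:", reject_h0)
```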
Confidence Interval Method of Testing Hypotheses
• For a two-tailed test, construct a confidence interval with a confidence level of 1 − α.
• For a one-tailed test, construct a confidence interval with a confidence level of 1 − 2α (you have to double the level of significance α).
• We make the decision based on whether the parameter value proposed under the Null Hypothesis falls within the confidence interval limits: if it falls outside the interval we reject H0; otherwise we fail to reject H0.
Confidence Interval Method for
example 3
• α = 0.10. Construct a confidence interval with confidence level 1 − 2 × 0.10 = 0.80.
• Confidence interval with 80% confidence level (using $t_{0.10} = 1.341$ with 15 degrees of freedom):
$\bar{x} - 1.341 \times \dfrac{s}{\sqrt{n}} < \mu < \bar{x} + 1.341 \times \dfrac{s}{\sqrt{n}}$
$0.295 - 1.341 \times \dfrac{0.168}{\sqrt{16}} < \mu < 0.295 + 1.341 \times \dfrac{0.168}{\sqrt{16}}$
$0.238678 < \mu < 0.351322$
• CONCLUSION: The value μ = 0.3 does fall in the confidence interval. We fail to reject H0 and reach the same conclusion: there is not sufficient sample evidence to support the claim that the mean sugar content for all cereals is less than 0.3 g.
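The interval for Example 3 can be recomputed with a few lines. This is a sketch under the same assumptions as the example: the critical value 1.341 is taken from the text, and `t_confidence_interval` is an illustrative name.

```python
import math

def t_confidence_interval(x_bar, s, n, t_crit):
    """Confidence interval for a mean, given a t critical value."""
    margin = t_crit * s / math.sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin

# 80% confidence level for a one-tailed test with alpha = 0.10.
lower, upper = t_confidence_interval(x_bar=0.295, s=0.168, n=16, t_crit=1.341)

mu0 = 0.3  # value proposed under the null hypothesis
reject_h0 = not (lower < mu0 < upper)

print(f"{lower:.6f} < mu < {upper:.6f}")
print("reject H0:", reject_h0)
```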
Additional Note on using Confidence
Intervals for hypothesis testing
• When testing a claim about a population mean, the traditional
method using the critical region, the p-value method and the
confidence interval method are all equivalent and we should
expect the same conclusion.
• When testing a claim about a population proportion, the critical region method and the p-value method are equivalent. The confidence interval method can give a different result because the standard deviation of the sample proportion is calculated in a different way:
• Standard deviation of the sample proportion (confidence interval method): $\sqrt{\hat{p}\hat{q}/n}$, which uses the sample proportion $\hat{p}$.
• Standard deviation of the sample proportion (critical region or p-value method): $\sqrt{pq/n}$, which uses the value of p proposed under the Null Hypothesis.
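To make the difference concrete, the sketch below computes both standard deviations for the data of Example 1 (n = 1087, sample proportion 0.62, null value 0.5). The variable names are illustrative.

```python
import math

n = 1087
p_hat = 0.62  # sample proportion
p0 = 0.5      # proportion proposed under the null hypothesis

# Confidence interval method: uses the sample proportion.
se_interval = math.sqrt(p_hat * (1 - p_hat) / n)

# Critical region / p-value method: uses the null value of p.
se_test = math.sqrt(p0 * (1 - p0) / n)

print(f"confidence interval method: {se_interval:.6f}")
print(f"critical region or p-value method: {se_test:.6f}")
```

Here the two values are close, so all three methods agree, but in borderline cases the difference can change the decision.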
Testing a claim about a standard
deviation or variance
• REQUIREMENTS
1) Simple random sample
2) Population must have a normal distribution (a stricter requirement than in tests about a mean)
3) Test statistic has a chi-square distribution:
$\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$ with n − 1 degrees of freedom
Chi-square distribution
• PROPERTIES
1. All values are non-negative
2. Distribution is not symmetric
3. Different distributions for different degrees of freedom
4. Critical values in Table A-4
Example: Testing a claim on a
standard deviation
According to the US Department of Agriculture, imports of Canadian-grown potatoes have depressed US sales of potatoes during the last six years. From the sample of the last six years (n = 6), the standard deviation is 2.07. Assume we know that, in the past, the population standard deviation was 0.79. Use a 0.05 significance level to test the claim that US potato sales have more variation in the last six years than in the past. Assume a simple random sample and a normally distributed population.
Set the Null and Alternative Hypotheses
• Set the Null and Alternative Hypotheses about σ (population standard deviation of US sales of potatoes)
• Claim: σ > 0.79
• Since the claim does not contain the equal sign, the claim becomes the Alternative Hypothesis.
• The opposite of the original claim is σ ≤ 0.79. Since the opposite of the original claim contains the equal sign, it becomes the Null Hypothesis.
• Finally, set the Null Hypothesis to σ = 0.79.
• The hypotheses to test are: H0: σ = 0.79 vs. H1: σ > 0.79
• From the sign of the Alternative Hypothesis we see that this is a Right-Tailed test.
Select the test statistic and check requirements
for its application
• The sample is assumed to be a simple random sample
• The random variable is: US sales of potatoes
• The sample size is n=6 <30, but the sample comes from a
normally distributed population.
• The test statistic to be used is $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$ with n − 1 = 5 degrees of freedom.
Critical Region and p-value
• Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2} = \dfrac{5 \times 2.07^2}{0.79^2} = 34.328$ with n − 1 = 5 degrees of freedom.
• (Note that σ = 0.79 is the value proposed under the Null Hypothesis. We always assume that the Null Hypothesis is true when computing the test statistic.)
• Level of significance of the test: α = 0.05
• Critical value: $\chi^2_{0.05} = 11.071$ (this value corresponds to the column for α = 0.05 and 5 degrees of freedom of Table A-4).
• Critical region (rejection of H0): all values of the test statistic greater than 11.071.
• P-value: area to the right of the observed test statistic ($\chi^2 = 34.328$). This area is lower than 0.005. Using the computer we found that this value is approximately 0.000002.
Take a decision based on the critical
region or p-value
• Using the critical region: because the test statistic ($\chi^2 = 34.328$) is greater than the critical value ($\chi^2_{0.05} = 11.071$), we reject the Null Hypothesis.
• Using the p-value: because the p-value (0.000002) is lower than the significance level (0.05), we reject the Null Hypothesis.
Note: Again, the two methods should lead to the same conclusion!
• CONCLUSION: The sample data support the claim that US potato sales have more variation in the last six years than in the past.
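The chi-square statistic and the critical region decision can be checked with a minimal sketch. The critical value 11.071 is copied from the example (Table A-4) rather than computed, and `chi_square_statistic` is an illustrative name.

```python
def chi_square_statistic(n, s, sigma0):
    """Chi-square statistic for a standard deviation (n - 1 degrees of freedom)."""
    return (n - 1) * s**2 / sigma0**2

chi2 = chi_square_statistic(n=6, s=2.07, sigma0=0.79)
critical_value = 11.071  # right-tail area 0.05, 5 degrees of freedom, from the example

# Right-tailed test: reject H0 if the statistic exceeds the critical value.
reject_h0 = chi2 > critical_value

print(f"chi-square = {chi2:.2f}")
print("reject H0:", reject_h0)
```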