Hypothesis Testing
Martina Litschmannová
EA 538
Terms Introduced in Prior Chapter
• Population … all possible values
• Sample … a portion of the population
• Statistical inference … generalizing from a sample to a population with a calculated degree of certainty
• Two forms of statistical inference:
  ◦ Hypothesis testing
  ◦ Estimation
• Parameter 𝛩 … a characteristic of a population, e.g., the population mean µ
• Statistic … a value calculated from the data in a sample, e.g., the sample mean x̄
Distinctions Between Parameters and Statistics
(Exercise 8 and 9 review)

              Parameters        Statistics
Source        Population        Sample
Notation      Greek (e.g., μ)   Roman (e.g., x̄)
Variability   No                Yes
Calculated    No                Yes
What is Hypothesis Testing?
• A statistical hypothesis is an assumption about a population. This assumption may or may not be true.
• Hypothesis testing refers to the formal procedures statisticians use to reject or fail to reject statistical hypotheses.
The best way to determine whether a statistical hypothesis is true would be to examine the entire population. Since that is often impractical, researchers typically examine a random sample from the population. If the sample data are not consistent with the statistical hypothesis, the hypothesis is rejected.
Statistical Hypotheses
There are two types of statistical hypotheses.
• Null hypothesis. The null hypothesis, denoted by 𝐻0, is usually the hypothesis that sample observations result purely from chance. 𝐻0 is the statement being tested in a test of hypothesis.
• Alternative hypothesis. The alternative hypothesis, denoted by 𝐻𝐴 or 𝐻1, is the hypothesis that sample observations are influenced by some non-random cause. 𝐻𝐴 is what we believe to be true if 𝐻0 is false.
In this course, we will always assume that the null hypothesis for a population parameter Θ specifies a single value Θ0 for that parameter, so an equal sign always appears:
𝐻0 : Θ = Θ0
• If the primary concern is deciding whether a population parameter differs from a specified value Θ0, the alternative hypothesis should be:
𝐻𝐴 : Θ ≠ Θ0
A hypothesis test whose alternative hypothesis has this form is called a two-tailed test.
• If the primary concern is whether a population parameter Θ is less than a specified value Θ0, the alternative hypothesis should be:
𝐻𝐴 : Θ < Θ0
A hypothesis test whose alternative hypothesis has this form is called a left-tailed test.
• If the primary concern is whether a population parameter Θ is greater than a specified value Θ0, the alternative hypothesis should be:
𝐻𝐴 : Θ > Θ0
A hypothesis test whose alternative hypothesis has this form is called a right-tailed test.
A hypothesis test is called a one-tailed test if it is either right- or left-tailed, i.e., if it is not a two-tailed test.
Can We Accept the Null Hypothesis?
• Some researchers say that a hypothesis test can have one of two outcomes: you accept the null hypothesis or you reject the null hypothesis. Many statisticians, however, take issue with the notion of "accepting the null hypothesis." Instead, they say: you reject the null hypothesis or you don't reject the null hypothesis.
• Why the distinction between "acceptance" and "don't reject"? Acceptance implies that the null hypothesis is true. Don't reject implies only that the data are not sufficiently persuasive for us to prefer the alternative hypothesis over the null hypothesis.
Hypothesis Testing
Hypothesis testing is a formal process for deciding whether to reject a null hypothesis, based on sample data. This process consists of four steps.
• State the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are stated in such a way that they are mutually exclusive: if one is true, the other must be false.
• Formulate an analysis plan. The analysis plan describes how to use sample data to evaluate the null hypothesis. The evaluation usually centers on a single test statistic.
• Analyze sample data. Find the value of the test statistic (mean score, proportion, t-score, z-score, etc.) described in the analysis plan.
• Interpret results. Apply the decision rule described in the analysis plan. If the observed value of the test statistic is unlikely under the null hypothesis, reject the null hypothesis.
Decision Errors

                                   True state of the null hypothesis
Your statistical decision          H0 true                 H0 false
                                   (e.g., the drug         (e.g., the drug
                                   doesn't work)           works)
Reject H0                          Type I error (α)        Correct
(e.g., you conclude that
the drug works)
Do not reject H0                   Correct                 Type II error (β)
(e.g., you conclude that there
is insufficient evidence that
the drug works)
Two types of errors can result from a hypothesis test.
• Type I error. A Type I error occurs when the researcher rejects a null hypothesis that is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha and is often denoted by α.
• Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta and is often denoted by β. The probability of not committing a Type II error, 1 − β, is called the power of the test.
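The two error probabilities can be made concrete with a small simulation. The sketch below rests on assumptions that are not in the slides: a two-tailed z-test of H0: μ = 0 with a known standard deviation σ = 1, samples of size 25, and α = 0.05. The helper `rejection_rate` is a hypothetical name; only the Python standard library (3.8+) is used. When the data are generated with H0 true, the rejection rate estimates α; when they are generated with μ = 0.5 (H0 false), it estimates the power 1 − β.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=10_000, seed=1):
    """Fraction of simulated samples in which a two-tailed z-test rejects H0: mu = mu0.

    Each sample has size n and is drawn from N(true_mu, sigma^2); sigma is assumed known.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > crit:
            rejections += 1
    return rejections / trials


type_i_rate = rejection_rate(true_mu=0.0)  # H0 true: estimates alpha (Type I error)
power = rejection_rate(true_mu=0.5)        # H0 false: estimates 1 - beta
beta = 1 - power                           # estimated Type II error probability
print(f"Type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, beta ~ {beta:.3f}")
```

Under these assumptions the first rate should land near 0.05, and the power near 0.70 (the exact value for this setup is about 0.705), so β is roughly 0.3.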
Decision Rules
The analysis plan includes decision rules for rejecting the null hypothesis. In practice, statisticians describe these decision rules in two ways: with reference to a p-value or with reference to a region of acceptance.
• p-value. The strength of evidence against the null hypothesis is measured by the p-value: the smaller the p-value, the stronger the evidence against 𝐻0. Suppose the observed test statistic is equal to S. The p-value is the probability of observing a test statistic at least as extreme as S, assuming the null hypothesis is true. If the p-value is less than the significance level, we reject the null hypothesis.
• Region of acceptance. The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that, when 𝐻0 is true, the probability of making a Type I error equals the significance level.
• The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we say that the hypothesis has been rejected at the α level of significance.
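A minimal sketch of the region of acceptance, assuming a test statistic whose null distribution is N(0; 1) (a z statistic). The function names `acceptance_region` and `reject_h0` are illustrative, not from any library; only `statistics.NormalDist` from the Python standard library is used. The bounds are quantiles of the null distribution, placed so that the region of rejection has probability α under 𝐻0.

```python
import math
from statistics import NormalDist


def acceptance_region(alpha, tail, null=NormalDist()):
    """Region of acceptance (lo, hi) for a statistic with the given null distribution.

    tail: 'left'  (H_A: theta < theta_0), 'right' (H_A: theta > theta_0),
          'two'   (H_A: theta != theta_0).
    Under H0 the statistic falls outside (lo, hi) with probability alpha.
    """
    if tail == "left":
        return (null.inv_cdf(alpha), math.inf)
    if tail == "right":
        return (-math.inf, null.inv_cdf(1 - alpha))
    if tail == "two":
        return (null.inv_cdf(alpha / 2), null.inv_cdf(1 - alpha / 2))
    raise ValueError("tail must be 'left', 'right' or 'two'")


def reject_h0(stat, alpha, tail):
    """True when stat lies in the region of rejection (outside the region of acceptance)."""
    lo, hi = acceptance_region(alpha, tail)
    return not (lo <= stat <= hi)


print(acceptance_region(0.05, "two"))   # approximately (-1.96, 1.96)
print(reject_h0(2.1, 0.05, "two"))      # 2.1 lies in the region of rejection
print(reject_h0(-1.8, 0.05, "right"))   # a negative value cannot reject a right-tailed H0
```

For a continuous null distribution this rule always agrees with the p-value rule (reject when p-value < α); the two are just different descriptions of the same decision.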
How to Test Hypotheses?
1) State the hypotheses. Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive: if one is true, the other must be false, and vice versa.
2) Formulate an analysis plan. The analysis plan describes how to use sample data to reject or fail to reject the null hypothesis. It should specify the following elements.
Significance level. Often, researchers choose a significance level of 0.01, 0.05, or 0.10, but any value between 0 and 1 can be used.
Test method. Typically, the test method involves a test statistic and its sampling distribution under 𝐻0. Computed from sample data, the test statistic might be a mean score, proportion, difference between means, difference between proportions, z-score, t-score, chi-square, etc. Given a test statistic and its sampling distribution, a researcher can assess probabilities associated with the test statistic. If the probability of a test statistic at least as extreme as the observed one (the p-value) is less than the significance level, the null hypothesis is rejected.
3) Analyze sample data. Using sample data, perform the computations called for in the analysis plan.
Test statistic. Compute the value of the test statistic specified in the analysis plan.
p-value. The p-value is the probability of observing a sample statistic at least as extreme as the observed test statistic, assuming the null hypothesis is true.
4) Interpret the results. If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the p-value to the significance level and rejecting the null hypothesis when the p-value is less than the significance level.
p-value is low, null hypothesis must go!
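How to calculate p-value?
Alternative hypothesis $H_A$     p-value
$\theta < \theta_0$              $F_0(x_{OBS})$
$\theta > \theta_0$              $1 - F_0(x_{OBS})$
$\theta \neq \theta_0$           $2\min\{F_0(x_{OBS}),\ 1 - F_0(x_{OBS})\}$
• $x_{OBS}$ … observed value of the test statistic (computed from the sample under the assumption that $H_0$ is true)
• $F_0(x_{OBS})$ … distribution function of the null distribution (the distribution of the test statistic when $H_0$ is true), evaluated at $x_{OBS}$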
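The table translates directly into code. In this sketch (Python standard library only), `F0` is any callable returning the null distribution function; here it is assumed to be the N(0; 1) CDF, and x_obs = −1.75 is only an illustrative value. `p_value` and `decision` are hypothetical helper names; `decision` applies the rule "reject 𝐻0 when p-value < α".

```python
from statistics import NormalDist


def p_value(F0, x_obs, alternative):
    """p-value from the null distribution function F0 at the observed statistic x_obs.

    alternative: '<'  for H_A: theta < theta_0 (left-tailed),
                 '>'  for H_A: theta > theta_0 (right-tailed),
                 '!=' for H_A: theta != theta_0 (two-tailed).
    """
    F = F0(x_obs)
    if alternative == "<":
        return F
    if alternative == ">":
        return 1 - F
    if alternative == "!=":
        return 2 * min(F, 1 - F)
    raise ValueError("alternative must be '<', '>' or '!='")


def decision(p, alpha=0.05):
    """Reject H0 when the p-value is below the significance level."""
    return "reject H0" if p < alpha else "do not reject H0"


F0 = NormalDist().cdf  # assumed null distribution: N(0; 1)
for alt in ("<", ">", "!="):
    p = p_value(F0, -1.75, alt)
    print(alt, round(p, 4), decision(p))
```

The same observed statistic can lead to different decisions depending on the form of 𝐻𝐴: here only the left-tailed alternative gives a p-value below 0.05.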
How to interpret results?
p-value          Result
p-value < 𝛼      We reject H0 at significance level 𝛼.
p-value ≥ 𝛼      We do not reject H0 at significance level 𝛼.
• 𝛼 … significance level (typically 0.05)
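Hypothesis problems (one sample)

Null hypothesis: $\mu = \mu_0$
  Assumptions: normal or near-normal population, large sample
  Test statistic: $\frac{\bar{X} - \mu_0}{S}\sqrt{n}$
  Null distribution: $N(0; 1)$

Null hypothesis: $\mu = \mu_0$
  Assumptions: normal or near-normal population, small sample
  Test statistic: $\frac{\bar{X} - \mu_0}{S}\sqrt{n}$
  Null distribution: $t_{n-1}$

Null hypothesis: $\pi = \pi_0$
  Assumptions: $n > \frac{9}{p(1-p)}$
  Test statistic: $\frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)}}\sqrt{n}$
  Null distribution: $N(0; 1)$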
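The one-sample rows above can be sketched in Python and applied to the summary data of exercises 1 and 6 below. Because the standard library has no Student t distribution, `t_cdf` is a hypothetical helper that integrates the t density numerically (Simpson's rule); a statistics package would normally be used instead. `mean_test`, `proportion_test` and `two_tailed_p` are also illustrative names. Exercise 1 is treated with the t row (n − 1 = 49 degrees of freedom); with n = 50 the N(0; 1) row gives a similar p-value.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """CDF of Student's t distribution with df degrees of freedom.

    Stdlib-only approximation: Simpson's rule on the density over [0, |x|],
    then symmetry about 0. `steps` must be even.
    """
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    s = pdf(0) + pdf(a) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    area = s * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def mean_test(xbar, s, n, mu0, large_sample=False):
    """Rows 1-2: statistic (xbar - mu0) / s * sqrt(n) and its null CDF F0."""
    stat = (xbar - mu0) / s * math.sqrt(n)
    if large_sample:
        return stat, NormalDist().cdf          # N(0; 1)
    return stat, lambda x: t_cdf(x, n - 1)     # t with n - 1 degrees of freedom


def proportion_test(p, n, pi0):
    """Row 3: statistic (p - pi0) / sqrt(pi0 (1 - pi0)) * sqrt(n), null N(0; 1)."""
    if not n > 9 / (p * (1 - p)):
        raise ValueError("assumption n > 9 / (p (1 - p)) is not met")
    stat = (p - pi0) / math.sqrt(pi0 * (1 - pi0)) * math.sqrt(n)
    return stat, NormalDist().cdf


def two_tailed_p(stat, F0):
    F = F0(stat)
    return 2 * min(F, 1 - F)


# Exercise 1: H0: mu = 300 vs H_A: mu != 300, n = 50, mean 295, s = 20.
t_stat, F0 = mean_test(xbar=295, s=20, n=50, mu0=300)
p1 = two_tailed_p(t_stat, F0)

# Exercise 6: H0: pi = 0.80 vs H_A: pi != 0.80, n = 100, sample proportion 0.73.
z_stat, G0 = proportion_test(p=0.73, n=100, pi0=0.80)
p6 = two_tailed_p(z_stat, G0)

print(f"Exercise 1: t = {t_stat:.3f}, p-value = {p1:.3f}")
print(f"Exercise 6: z = {z_stat:.3f}, p-value = {p6:.3f}")
```

Under these assumptions both p-values come out at roughly 0.08, above α = 0.05, so neither null hypothesis would be rejected.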
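Hypothesis problems (two samples)

Null hypothesis: $\mu_1 - \mu_2 = D$
  Assumptions: independent samples, normal or near-normal populations
  Test statistic: $\dfrac{\bar{X}_1 - \bar{X}_2 - D}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
  Null distribution: $t_{DF}$, where $DF = \dfrac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$

Null hypothesis: $\mu_d = D$
  Assumptions: not independent samples (paired data), normal or near-normal populations
  Test statistic: $\frac{\bar{d} - D}{S_d}\sqrt{n}$
  Null distribution: $t_{n-1}$

Null hypothesis: $\pi_1 - \pi_2 = D$
  Assumptions: $\forall i \in \{1, 2\}$: $n_i > 30$ and $n_i > \frac{9}{p_i(1-p_i)}$
  Test statistic: $\dfrac{p_1 - p_2 - D}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$
  Null distribution: $N(0; 1)$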
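A similar sketch for the independent-samples and two-proportion rows, applied to exercises 4 and 7 below (the paired row needs the raw data in IQtest.xls, so it is not shown). `t_cdf`, `welch_test` and `two_proportion_test` are hypothetical helper names, and `t_cdf` is the same stdlib-only numerical approximation of the Student t distribution function. The two-proportion statistic uses the unpooled standard error exactly as in the table; some textbooks pool the proportions under 𝐻0, which gives a slightly different value. In exercise 4 the claim "at least 7 minutes longer" is tested with 𝐻0: μ1 − μ2 = 7 against the left-tailed alternative 𝐻𝐴: μ1 − μ2 < 7, following the equal-sign convention for 𝐻0 used in this course.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """Student t CDF via Simpson's rule on the density (stdlib-only sketch; df may be non-integer)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    s = pdf(0) + pdf(a) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    area = s * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def welch_test(x1, s1, n1, x2, s2, n2, D=0.0):
    """Independent samples, H0: mu1 - mu2 = D; returns (statistic, DF) as in the table."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    stat = (x1 - x2 - D) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return stat, df


def two_proportion_test(p1, n1, p2, n2, D=0.0):
    """H0: pi1 - pi2 = D, unpooled standard error; null distribution N(0; 1)."""
    for p, n in ((p1, n1), (p2, n2)):
        if not (n > 30 and n > 9 / (p * (1 - p))):
            raise ValueError("sample-size assumption is not met")
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - D) / se


# Exercise 4: sample 1 = new batteries, sample 2 = old batteries;
# H0: mu1 - mu2 = 7 vs H_A: mu1 - mu2 < 7 (left-tailed), so p-value = F0(x_obs).
t4, df4 = welch_test(200, 40, 100, 190, 20, 100, D=7)
p4 = t_cdf(t4, df4)

# Exercise 7: sample 1 = women, sample 2 = men; H0: pi1 - pi2 = 0 (two-tailed).
z7 = two_proportion_test(0.38, 100, 0.51, 200)
F = NormalDist().cdf(z7)
p7 = 2 * min(F, 1 - F)

print(f"Exercise 4: t = {t4:.3f}, DF = {df4:.2f}, p-value = {p4:.3f}")
print(f"Exercise 7: z = {z7:.3f}, p-value = {p7:.3f}")
```

With these inputs, exercise 4 gives t ≈ 0.67 with DF ≈ 145.6 and a p-value of about 0.75 (no evidence against the engineer's claim), while exercise 7 gives z ≈ −2.17 with a p-value of about 0.03, which is below 0.05.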
1) An inventor has developed a new, energy-efficient lawn
mower engine. He claims that the engine will run
continuously for 5 hours (300 minutes) on a single gallon of
regular gasoline. Suppose a simple random sample of 50
engines is tested. The engines run for an average of 295
minutes, with a standard deviation of 20 minutes. Test the
null hypothesis that the mean run time is 300 minutes
against the alternative hypothesis that the mean run time is
not 300 minutes. Use a 0.05 level of significance. (Assume
that run times for the population of engines are normally
distributed.)
2) Bon Air Elementary School has 300 students. The principal of
the school thinks that the average IQ of students at Bon Air
is at least 110. To prove her point, she administers an IQ test
to 20 randomly selected students. Among the sampled
students, the average IQ is 108 with a standard deviation of
10. Based on these results, should the principal accept or
reject her original hypothesis? Assume a significance level of
0.01.
3) Within a school district, students were randomly assigned to
one of two Math teachers - Mrs. Smith and Mrs. Jones. After
the assignment, Mrs. Smith had 30 students, and Mrs. Jones
had 25 students.
At the end of the year, each class took the same standardized
test. Mrs. Smith's students had an average test score of 78, with
a standard deviation of 10; and Mrs. Jones' students had an
average test score of 85, with a standard deviation of 15.
Test the hypothesis that Mrs. Smith and Mrs. Jones are equally
effective teachers. Use a 0.10 level of significance. (Assume that
student performance is approximately normal and the variances
of both groups are equal.)
4) The Acme Company has developed a new battery. The
engineer in charge claims that the new battery will operate
continuously for at least 7 minutes longer than the old
battery.
To test the claim, the company selects a simple random sample
of 100 new batteries and 100 old batteries. The old batteries run
continuously for an average of 190 minutes with a standard deviation
of 20 minutes; the new batteries, an average of 200 minutes with a
standard deviation of 40 minutes.
Test the engineer's claim that the new batteries run at least 7
minutes longer than the old. Use a 0.05 level of significance.
(Assume that there are no outliers in either sample and the
variances in both groups are unequal.)
5) Forty-four sixth graders were randomly selected from a
school district. Then, they were divided into 22 matched
pairs, each pair having equal IQs. One member of each pair
was randomly selected to receive special training. Then, all of
the students were given an IQ test. Test results are in dataset
IQtest.xls.
Do these results provide evidence that the special training
helped or hurt student performance? Use a 0.05 level of
significance. Assume that the paired differences are
approximately normally distributed.
6) The CEO of a large electric utility claims that 80 percent of his
1,000,000 customers are very satisfied with the service they
receive. To test this claim, the local newspaper surveyed 100
customers, using simple random sampling. Among the
sampled customers, 73 percent say they are very satisfied.
Based on these findings, can we reject the CEO's hypothesis
that 80% of the customers are very satisfied? Use a 0.05 level
of significance.
7) Suppose the Acme Drug Company develops a new drug,
designed to prevent colds. The company states that the drug
is equally effective for men and women. To test this claim,
they choose a simple random sample of 100 women and
200 men from a population of 100,000 volunteers.
At the end of the study, 38% of the women caught a cold; and
51% of the men caught a cold. Based on these findings, can we
reject the company's claim that the drug is equally effective for
men and women? Use a 0.05 level of significance.
Chi-Square Test for Independence
• The test is applied when you have two categorical variables from a single population.
• It is used to determine whether there is a significant association between the two variables.
For example, in an election survey, voters might be classified by
gender (male or female) and voting preference (Democrat,
Republican, or Independent). We could use a chi-square test for
independence to determine whether gender is related to voting
preference.
Voting Preferences
Gender         Republican   Democrat   Independent   Row total
Male                  200        150            50         400
Female                250        300            50         600
Column total          450        450           100        1000

[Figure: stacked bar chart of voting preferences (Republican, Democrat, Independent) for male and female voters, percentage axis from 0% to 100%]
When to Use the Chi-Square Test for Independence?
• The sampling method is simple random sampling.
• Each population is at least 10 times as large as its respective sample.
• The variables under study are each categorical.
• If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
State the Hypotheses
Suppose that Variable A has r levels, and Variable B has c levels.
The null hypothesis states that knowing the level of Variable A
does not help you predict the level of Variable B. That is, the
variables are independent.
H0: Variable A and Variable B are independent.
HA: Variable A and Variable B are not independent.
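Test: Chi-square test for independence
  H0: Variable A and Variable B are independent.
  HA: Variable A and Variable B are not independent.
  Assumption: $E_{ij} \geq 5$ for all $i, j$
  Test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
  Null distribution: $\chi^2(DF)$, $DF = (r-1)(c-1)$
  p-value: $1 - F_0(x_{OBS})$
where
  $O_{ij}$ … observed frequencies
  $E_{ij}$ … expected frequencies, $E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$, where $n_{i\cdot}$ is the total of row $i$, $n_{\cdot j}$ the total of column $j$, and $n$ the total sample size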
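A stdlib-only sketch of the test, applied to the voting-preference table shown earlier (not to the public_opinion.xls data of exercise 8). `chi2_sf` and `chi_square_independence` are hypothetical helper names: `chi2_sf` computes 1 − F0(x_OBS) for the χ²(DF) null distribution via the series expansion of the regularized lower incomplete gamma function. In practice a statistics package, or one of the tools mentioned below, would be used.

```python
import math


def chi2_sf(x, df):
    """1 - F0(x) for the chi-square distribution with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    term = total = 1 / s
    k = 1
    while term > 1e-16 * total:
        term *= z / (s + k)
        total += term
        k += 1
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))  # F0(x)
    return 1 - lower


def chi_square_independence(table):
    """Chi-square test for independence of the row and column variables of an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    expected = [[ri * cj / n for cj in col_tot] for ri in row_tot]  # E_ij = n_i. n_.j / n
    if any(e < 5 for row in expected for e in row):
        raise ValueError("some expected count is below 5")
    stat = sum((o - e) ** 2 / e
               for orow, erow in zip(table, expected)
               for o, e in zip(orow, erow))
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)


# Rows: male, female; columns: Republican, Democrat, Independent.
votes = [[200, 150, 50],
         [250, 300, 50]]
stat, df, p = chi_square_independence(votes)
print(f"chi2 = {stat:.2f}, DF = {df}, p-value = {p:.4f}")
```

For that table the expected counts are 180, 180, 40 (male) and 270, 270, 60 (female), χ² ≈ 16.2 with DF = 2, and the p-value is about 0.0003, so independence of gender and voting preference would be rejected at α = 0.05.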
8) A public opinion poll surveyed a simple random sample of
1000 voters. Respondents were classified by gender (male or
female) and by voting preference (Republican, Democrat, or
Independent). Results are shown in the contingency table in
dataset public_opinion.xls.
Is there a gender gap? Do the men's voting preferences differ
significantly from the women's preferences? Use a 0.05 level of
significance.
You can use Statgraphics or http://www.quantpsy.org/chisq/chisq.htm
Study materials:
• http://homel.vsb.cz/~bri10/Teaching/Bris%20Prob%20&%20Stat.pdf (pp. 111-129)
• http://stattrek.com/tutorials/statistics-tutorial.aspx?Tutorial=Stat (Hypothesis testing)
• https://onlinecourses.science.psu.edu/stat500/node/56 (Chi-Square Test of Independence)