Hypothesis Testing
Introduction to Statistics
Chapter 8
Mar 2-4, 2010
Classes #13-14
Hypothesis Test

• A statistical method that uses sample data to evaluate a hypothesis about a population parameter
Hypothesis-Testing Procedure
1. State hypothesis
   • Use hypothesis to predict characteristics of population
   • Null (H0) vs. Alternative (HA)
2. Set criteria for decision
   • Must be clearly set before testing
   • Set alpha level (also before testing)
3. Collect data and compute sample statistics
   • Obtain a random sample
   • Larger samples are preferred
   • Calculate z-scores
4. Make a decision
   • Compare obtained sample with hypothesis
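The procedure above can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-tailed test with a known population standard deviation; the function name `z_test` and the numbers passed to it are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test (sigma assumed known)."""
    # Step 1: H0 says the population mean equals mu0; HA says it does not.
    # Step 2: alpha fixes the critical value before the data are examined.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: express the sample mean as a z-score on the sampling distribution.
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Step 4: reject H0 only if z falls in one of the two rejection regions.
    decision = "reject H0" if abs(z) >= z_crit else "retain H0"
    return z, z_crit, decision


# Hypothetical numbers, chosen only for illustration.
z, z_crit, decision = z_test(sample_mean=52.0, mu0=50.0, sigma=10.0, n=25)
print(f"z = {z:.2f}, critical z = +/-{z_crit:.2f}, decision: {decision}")
```

With these illustrative numbers the standard error is 2, so the sample mean sits one standard error above the hypothesized mean and H0 is retained.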
Example 1

• Suppose that we want to compare the crime rate in San Diego with the crime rate in the rest of the country…
  • Is there more or less crime in San Diego than the national average?
Example 1
• First, we start with the hypothesis that “the crime rate on average in San Diego is the same as the national average”
• To test our hypothesis, we ask what sample means would occur if many samples of the same size were drawn at random from our population if our hypothesis is true
Example 1
• We can now refer to the sampling distribution of the mean, for an infinite series of samples of size n, drawn from a population whose mean is the same as the national average, and we compare our sample mean with those in this sampling distribution
• If our hypothesis is true, then the distribution of sample means will be centered about the national average
Example 1
• Suppose that the relationship between our sample mean and those of the sampling distribution of the mean looks like this…
[Figure: sampling distribution of the mean, marking our hypothesized value and our obtained value]
Example 1
• If so, our sample mean is one that could reasonably occur if the hypothesis is true, and we will retain our hypothesis as one that could be true
  • We retain the hypothesis that the crime rate of San Diego is the same as the national average
Example 1
• On the other hand, if the relationship between our sample mean and those of the sampling distribution of the mean looks like this…
[Figure: sampling distribution of the mean, marking our hypothesized value and our obtained value]
Example 1
• Our sample mean is so deviant that it would be quite unusual to obtain such a value when our hypothesis is true
• In this case, we would reject our hypothesis and conclude that it is more likely that the crime rate of San Diego is not the same as the national average
  • The population represented by the sample differs significantly from the comparison population
Null Hypothesis
• The hypothesis that we put to the test is called the null hypothesis, symbolized H0
• The null hypothesis usually states the situation in which there is no difference (the difference is “null”) between populations

Alternative Hypothesis
• The alternative hypothesis, symbolized HA, is the opposite of the null hypothesis
• The alternative hypothesis is also identified as the research hypothesis, or the “hunch” that the investigator wants to test

Null and Alternative Hypotheses

• Both H0 and HA are statements about population parameters, not sample statistics
  • A decision to retain the null hypothesis implies a lack of support for the alternative hypothesis
  • A decision to reject the null hypothesis implies support for the alternative hypothesis
When do we retain and when do we
reject the null hypothesis?


• When we draw a random sample from a population, our obtained value of the sample mean will almost never exactly equal the mean of our population
• The decision to reject or retain the null hypothesis depends on the selected criterion for distinguishing between those sample means that would be common and those that would be rare if H0 were true
When do we retain and when do we
reject the null hypothesis?
• If the sample mean is so different from what is expected when H0 is true that its appearance would be unlikely, H0 should be rejected
• But what degree of rarity of occurrence is so great that it seems better to reject the null hypothesis than to retain it?

When do we retain and when do we
reject the null hypothesis?
• This decision is somewhat arbitrary, but common research practice is to reject H0 if the sample mean is so deviant that its probability of occurrence in random sampling is .05 or less
• Such a criterion is called the level of significance, symbolized α

Rejection Regions


• For our purposes, we will adopt the .05 level of significance.
• Therefore, we will reject H0 only if our obtained sample mean is so deviant that it falls in the upper 2.5% or lower 2.5% of all the possible sample means that would occur when H0 is true.
  • The portions of the sampling distribution that include the values of the mean that lead to rejection of the null hypothesis are called rejection regions.
• If our sample mean falls in the middle 95% of the distribution of all possible values of the mean that could occur when H0 is true, we will retain the null hypothesis.
Critical Values

We can use the normal curve table to calculate the Z
values, called critical values, that separate the upper
2.5% and lower 2.5% of sample means from the
remainder.
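As an alternative to the normal curve table, the critical values can be computed directly. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for the table; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Z values that cut off the lower 2.5% and the upper 2.5% of sample means.
lower_critical = standard_normal.inv_cdf(alpha / 2)
upper_critical = standard_normal.inv_cdf(1 - alpha / 2)

print(f"lower critical Z = {lower_critical:.2f}")
print(f"upper critical Z = {upper_critical:.2f}")
```

For the .05 level of significance this gives critical values of about -1.96 and +1.96.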
Example 2



• Suppose our obtained sample mean (n = 100) of the crime rate in Boston is a score of 90
• Suppose that the national average is known to be 85, with a standard deviation of 20
• Even if the population mean really is a score of 85, because of random sampling variation we do not expect the mean of a sample randomly drawn from a population to be exactly 85 (although it could be)
Example 2
Using the Sampling Distribution of
the Mean to Determine Probability
• The important question is: what is the relative position of the obtained sample mean among all those that could have been obtained if the hypothesis were true?
• To determine the position of the obtained sample mean, it must be expressed as a Z score.

Z score
• Before, you were finding the Z score of a single individual on a distribution of a population of individuals
• In hypothesis testing, you are finding a Z score of your sample’s mean on a distribution of means

Example 2

In the current study,

$$Z = \frac{\text{obtained sample mean} - \text{hypothesized population mean (when } H_0 \text{ is true)}}{\text{standard error of the mean}}$$

$$Z = \frac{90 - 85}{20 / \sqrt{100}} = \frac{5}{2} = 2.5$$
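The Example 2 calculation can be checked numerically. The sketch below uses the values given in the example (sample mean 90, hypothesized mean 85, standard deviation 20, n = 100); the two-tailed probability is an addition for illustration and is not computed on the slides.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, mu0, sigma, n = 90, 85, 20, 100

standard_error = sigma / sqrt(n)           # 20 / 10 = 2
z = (sample_mean - mu0) / standard_error   # 5 / 2 = 2.5

# Probability of a sample mean at least this far from mu0 (in either
# direction) when H0 is true.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"standard error = {standard_error}, Z = {z}")
print(f"two-tailed probability = {p_two_tailed:.4f}")
```

The probability comes out a little above .01, which is below the .05 criterion.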
Example 2
• Our sample mean is 2.5 standard errors of the mean greater than expected if the null hypothesis were true.
• The value of 2.5 falls in the rejection region (it exceeds the upper critical value of 1.96), so we reject H0 in favor of HA.
• We can conclude that the mean of the population from which the sample came is not 85.

Example 2
• The crime rate of Boston is, on average, different from (greater than) the national average.
• Notice that the conclusion is about the population represented by the sample under study and not simply the particular sample itself.

What if we had used α = .01?
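One way to explore this question is to repeat the Example 2 decision with the stricter criterion. This is a minimal sketch assuming a two-tailed test and the Z score of 2.5 obtained in Example 2.

```python
from statistics import NormalDist

z = 2.5  # Z score obtained in Example 2

decisions = {}
for alpha in (0.05, 0.01):
    # Two-tailed critical value for this level of significance.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    decisions[alpha] = "reject H0" if abs(z) >= z_crit else "retain H0"
    print(f"alpha = {alpha}: critical Z = {z_crit:.3f}, {decisions[alpha]}")
```

At the .01 level the two-tailed critical value is about 2.576, so the same sample mean that led to rejection at .05 no longer falls in the rejection region.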
If we retain H0, what can we conclude?
• The decision to retain H0 does not mean that it is likely that H0 is true.
• Rather, this decision reflects the fact that we do not have sufficient evidence to reject the null hypothesis.
• Certain other hypotheses would also have been retained if tested in the same way.

If we retain H0, what can we conclude?




• Consider our example where the hypothesized population mean is 85.
• If we had obtained a sample mean of 86, the null hypothesis would have been retained.
• But suppose the hypothesized population mean was 87.
• If we had obtained a sample mean of 86, the null hypothesis would also have been retained.

What if we obtain a mean of 80 and what if we had used α = .01? (Hypothesized population mean was 87)
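These scenarios can be worked through in code. The sketch below assumes the standard deviation (20) and sample size (100) carry over from Example 2 and that the tests are two-tailed; the helper `decide` is a hypothetical name introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N = 20, 100  # assumed to carry over from Example 2


def decide(sample_mean, hypothesized_mean, alpha):
    """Return the Z score and the two-tailed decision for one scenario."""
    z = (sample_mean - hypothesized_mean) / (SIGMA / sqrt(N))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, ("reject H0" if abs(z) >= z_crit else "retain H0")


# (sample mean, hypothesized population mean, alpha)
scenarios = [(86, 85, 0.05), (86, 87, 0.05), (80, 87, 0.01)]
results = [decide(*scenario) for scenario in scenarios]

for (mean, mu0, alpha), (z, decision) in zip(scenarios, results):
    print(f"mean = {mean}, H0 mean = {mu0}, alpha = {alpha}: Z = {z:+.1f}, {decision}")
```

A sample mean of 86 is only half a standard error from either 85 or 87, so both hypotheses are retained. A sample mean of 80 lies 3.5 standard errors below 87, which falls in the rejection region even at the .01 level.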
Example 3







• A teacher believes that by taking her summer course students will achieve a higher score on a biology final exam taken at the end of the semester. The exam has a maximum score of 275 points. The teacher has been at the college for 30 years and has kept the data on these exam scores. The known population mean is 200 with a standard deviation of 15.
• 40 students decide to spend the summer taking this prep course. In reviewing their scores on the final exam, the teacher is pleased and reports the success of her summer course to the dean. The 40 students achieved a mean score of 205.
• The dean hires someone to do a statistical analysis to determine the effectiveness of the summer course.
• State the Null (H0) and Alternative (HA) hypotheses.
• Use an alpha level of .01.
• What is your decision?
• Interpret this decision.
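One possible way to work Example 3 is sketched below. It assumes a one-tailed (upper-tail) test, since the teacher predicts higher scores; the slides do not state which form of test was intended, so treat that choice as an assumption.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 200, 15        # known population mean and standard deviation
sample_mean, n = 205, 40    # summer-course students
alpha = 0.01

# H0: the course does not raise the mean score (population mean is 200).
# HA: the course raises the mean score (population mean is above 200).
standard_error = sigma / sqrt(n)
z = (sample_mean - mu0) / standard_error

# Assumption: upper-tail test, so all of alpha sits in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha)
decision = "reject H0" if z >= z_crit else "retain H0"

print(f"Z = {z:.2f}, critical Z = {z_crit:.2f}, decision: {decision}")
```

Under these assumptions Z is about 2.11, short of the critical value of about 2.33, so H0 is retained: the data do not give sufficient evidence at the .01 level that the course raises scores.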
Example 3
Strength of Decision
• Rejecting the null hypothesis means that H0 is probably false, a strong decision.
• Retaining the null hypothesis is a weak decision.

Two-tailed Test

• The alternative hypothesis states that the population parameter may be either less than or greater than the value stated in H0.
  • The critical region is divided between both tails of the sampling distribution.
Two-tailed Test

• This type of test is desirable in certain research situations
  • For example, in cases in which the performance of a group is compared to a known standard, it would be of interest to discover that the group is superior or inferior
One-tailed Test

• The alternative hypothesis states that the population parameter differs from the value stated in H0 in one particular direction.
  • The critical region is located only in one tail of the sampling distribution.
One-tailed Test
[Figures: upper-tail critical region; lower-tail critical region]
One-tailed Test


• The advantage of a one-tailed test is that it is more sensitive than a two-tailed test in detecting a false null hypothesis in the direction of concern.
• The major disadvantage of a one-tailed test is that it precludes any chance of discovering that reality is just the opposite of what the alternative hypothesis says.
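The trade-off can be illustrated by comparing decisions for the same Z score under both kinds of test. This is a sketch at the .05 level with an upper-tail one-tailed test; the Z scores 1.8 and -3.0 are hypothetical values picked to show each case.

```python
from statistics import NormalDist

alpha = 0.05
two_tailed_crit = NormalDist().inv_cdf(1 - alpha / 2)  # alpha split across both tails
upper_tail_crit = NormalDist().inv_cdf(1 - alpha)      # all of alpha in the upper tail


def two_tailed(z):
    return "reject H0" if abs(z) >= two_tailed_crit else "retain H0"


def upper_tailed(z):
    return "reject H0" if z >= upper_tail_crit else "retain H0"


# Hypothetical Z scores: one modest effect in the predicted direction,
# one large effect in the opposite direction.
for z in (1.8, -3.0):
    print(f"Z = {z:+.1f}: two-tailed -> {two_tailed(z)}, upper-tailed -> {upper_tailed(z)}")
```

A Z of 1.8 is rejected only by the one-tailed test (greater sensitivity in the direction of concern), while a Z of -3.0 is rejected only by the two-tailed test (the one-tailed test cannot detect an effect in the opposite direction).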
Hypothesis Testing

                          True State
Test Result               H0 True             H0 False
H0 True (retain H0)       Correct Decision    Type II Error
H0 False (reject H0)      Type I Error        Correct Decision

α = P(Type I Error), β = P(Type II Error)
• Goal: Keep α, β reasonably small
Errors in Hypothesis Testing

• Type I Error:
  • Occurs when a researcher rejects a null hypothesis that is actually true
  • Concluding there IS an effect when there is NOT
• Type II Error:
  • Occurs when a researcher fails to reject a null hypothesis that is false
  • Basically, here the hypothesis test has failed to detect a real treatment effect
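The two error probabilities can be computed for a concrete case. The sketch below reuses the Example 2 setup (hypothesized mean 85, standard deviation 20, n = 100, two-tailed test at .05) and assumes, purely for illustration, that the true population mean is 90; that value and the helper `prob_retain` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 85, 20, 100, 0.05
true_mean = 90  # hypothetical true population mean, for illustration only

standard_error = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Sample means between these bounds lead to retaining H0.
lower_bound = mu0 - z_crit * standard_error
upper_bound = mu0 + z_crit * standard_error


def prob_retain(population_mean):
    """Probability that the sample mean lands in the retention region."""
    sampling_dist = NormalDist(mu=population_mean, sigma=standard_error)
    return sampling_dist.cdf(upper_bound) - sampling_dist.cdf(lower_bound)


type_i = 1 - prob_retain(mu0)      # rejecting H0 when H0 is true
type_ii = prob_retain(true_mean)   # retaining H0 when the true mean is 90

print(f"P(Type I error)  = {type_i:.3f}")
print(f"P(Type II error) = {type_ii:.3f}")
```

The Type I error probability equals the chosen level of significance by construction, while the Type II error probability depends on how far the true mean lies from the hypothesized one.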
Example - Efficacy Test for New drug
• Drug company has new drug, wishes to compare it with current standard treatment
• Federal regulators tell company that they must demonstrate that new drug is better than current treatment to receive approval
• Firm runs clinical trial where some patients receive new drug, and others receive standard treatment
• Numeric response of therapeutic effect is obtained (higher scores are better).
• Parameter of interest: $\mu_{New} - \mu_{Std}$
Example - Efficacy Test for New drug

• Null hypothesis - New drug is no better than standard trt

$$\mu_{New} - \mu_{Std} \le 0$$

$$H_0: \mu_{New} - \mu_{Std} = 0$$

• Alternative hypothesis - New drug is better than standard trt

$$H_A: \mu_{New} - \mu_{Std} > 0$$

• Experimental (Sample) data: $\bar{y}_{New}$, $\bar{y}_{Std}$, $s_{New}$, $s_{Std}$, $n_{New}$, $n_{Std}$
Example - Efficacy Test for New drug

Type I error - Concluding that the new drug is better than
the standard (HA) when in fact it is no better (H0). Ineffective
drug is deemed better.

Type II error - Failing to conclude that the new drug is
better (HA) when in fact it is. Effective drug is deemed to be
no better.
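A test of these hypotheses might look like the sketch below. The slides give no data and do not name a test statistic, so everything here is an assumption: the summary statistics are hypothetical, and a large-sample Z statistic for the difference between two independent means is used as one reasonable choice.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (not from the slides).
y_new, s_new, n_new = 25.0, 8.0, 100   # new drug: mean, SD, sample size
y_std, s_std, n_std = 22.0, 8.0, 100   # standard treatment

alpha = 0.05

# Assumed statistic: large-sample Z for a difference of independent means.
standard_error = sqrt(s_new**2 / n_new + s_std**2 / n_std)
z = (y_new - y_std) / standard_error

# HA is one-directional (new drug better), so only the upper tail is critical.
z_crit = NormalDist().inv_cdf(1 - alpha)
decision = "reject H0" if z >= z_crit else "retain H0"

print(f"Z = {z:.2f}, critical Z = {z_crit:.3f}, decision: {decision}")
```

If H0 were actually true, a rejection like this one would be a Type I error; if the drug were truly better and the Z score had fallen short of the critical value, retaining H0 would be a Type II error.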
Effect Size
• Effect size is a measure of the strength of the relationship between two variables
• In scientific experiments, it is often useful to know not only whether an experiment has a statistically significant effect, but also the size of any observed effects
• In practical situations, effect sizes are helpful for making decisions

Effect Size


• The concept of effect size appears in everyday language.
• For example, a weight loss program may boast that it leads to an average weight loss of 30 pounds. In this case, 30 pounds is an indicator of the claimed effect size. Another example is that a tutoring program may claim that it raises school performance by one letter grade. This grade increase is the claimed effect size of the program.
Effect Size
• Cohen’s d
  • An effect size measure representing the standardized difference between two means
Effect Size

$$\text{Cohen's } d = \frac{\text{mean difference}}{\text{population standard deviation}} = \frac{M - \mu}{\sigma}$$

Example 2

$$d = \frac{M - \mu}{\sigma} = \frac{90 - 85}{20} = \frac{5}{20} = 0.25$$

Small effect (small to medium). See Table 8.2 (page 233).
Example 3

$$d = \frac{M - \mu}{\sigma} = \frac{205 - 200}{15} = \frac{5}{15} \approx 0.33$$

Small effect (small to medium). See Table 8.2 (page 233).
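Both effect sizes can be reproduced with a short function. This is a minimal sketch of the Cohen's d formula as used in these examples (sample mean minus population mean, divided by the population standard deviation); the function name `cohens_d` is illustrative.

```python
def cohens_d(sample_mean, population_mean, population_sd):
    """Standardized difference between a sample mean and a population mean."""
    return (sample_mean - population_mean) / population_sd


d_example_2 = cohens_d(90, 85, 20)     # Boston crime rate example
d_example_3 = cohens_d(205, 200, 15)   # summer course example

print(f"Example 2: d = {d_example_2:.2f}")
print(f"Example 3: d = {d_example_3:.2f}")
```

Note that the sample size does not enter the calculation, so d describes the size of the difference independently of how many observations were collected.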
Credits


http://psy.ucsd.edu/~sky/Psyc%2060%20Hypothesis%20Testing.ppt#3
http://www.stat.ufl.edu/~winner/sta6934/hyptest.ppt#2