Inference
Sampling distributions
Hypothesis testing
Sample
Numbers describing
sample are statistics
Population
Numbers describing population
are parameters
Random Sample
Every unit in the population has an
equal probability of being included
in the sample
A random sample should represent
the population well, so sample
statistics from a random sample
should provide reasonable estimates
of population parameters
All sample statistics have some error
in estimating population parameters
If repeated samples are taken from a
population and the same statistic (e.g.
mean) is calculated from each
sample, the statistics will vary, that is,
they will have a distribution
Example
• Average IQ in population is 100
• If we take repeated random samples of size 30
from the population, we would expect to get
samples with means clustered about 100
• Some sample means would be far from 100
– but we expect fewer means the farther we
get from 100
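The clustering described above can be checked with a short simulation. This is only a minimal sketch: it assumes a normally distributed population with mean 100 and an SD of 15 (the SD is an assumption here, since this example does not state one), and the function name simulate_sample_means is illustrative.

```python
import random
import statistics


def simulate_sample_means(pop_mean, pop_sd, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(num_samples)
    ]


# Assumed population: normal, mean 100, SD 15 (the SD is an assumption).
means = simulate_sample_means(pop_mean=100, pop_sd=15, n=30, num_samples=5000)

# Count sample means close to 100 and sample means far from 100.
near = sum(1 for m in means if abs(m - 100) <= 3)
far = sum(1 for m in means if abs(m - 100) > 6)

print(f"mean of sample means: {statistics.fmean(means):.2f}")
print(f"within 3 points of 100: {near}; more than 6 points away: {far}")
```

Most of the simulated means should fall near 100, with only a small number far away, which is the pattern the bullets describe.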
A larger sample provides more
information than a smaller sample so
a statistic from a large sample should
have less error than a statistic from a
small sample
Review
• Random samples are the best
• Statistics have error
• Statistics have distributions
• Larger sample size (n) is better - less error
The distribution of a statistic is
called a sampling distribution.
Distribution of $\bar{X}$s when
sampling from a normal
distribution
The distribution of means has a
normal distribution with
mean = $\mu_{\bar{x}} = \mu$
and
standard deviation = $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
Central Limit Theorem
If the sample size (n) is large enough,
The distribution of means has a normal
distribution with
mean = $\mu_{\bar{x}} = \mu$
and
standard deviation = $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
regardless of the population distribution
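A simulation can illustrate the theorem for a population that is clearly not normal. The sketch below assumes a hypothetical exponential population with rate 1 (so its mean and SD are both 1) and a sample size of 50; neither choice comes from the slides, and the helper names are illustrative.

```python
import math
import random
import statistics

rng = random.Random(1)


def draw_from_population():
    """One draw from a hypothetical skewed population: exponential, mean 1, SD 1."""
    return rng.expovariate(1.0)


def sample_means(n, num_samples):
    """Return the means of num_samples random samples, each of size n."""
    return [
        statistics.fmean(draw_from_population() for _ in range(n))
        for _ in range(num_samples)
    ]


n = 50
means = sample_means(n, num_samples=4000)

predicted_se = 1.0 / math.sqrt(n)       # sigma / sqrt(n), with sigma = 1
observed_sd = statistics.stdev(means)   # SD of the simulated sample means

# Share of sample means within one predicted standard error of the population mean.
within_1_se = sum(abs(m - 1.0) <= predicted_se for m in means) / len(means)

print(f"predicted SE: {predicted_se:.3f}, observed SD of means: {observed_sd:.3f}")
print(f"share of means within 1 SE of the population mean: {within_1_se:.2f}")
```

If the theorem applies, the observed SD of the means should be close to sigma / sqrt(n), and roughly two thirds of the means should lie within one standard error of the population mean, even though the population itself is skewed.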
Example
• Average IQ = 100, SD = 15.
• Take random samples of n = 6 adults.
• Then, sample means are normally
distributed with mean 100 and standard
error 6.12 [from SD/sqrt(n) = 15/sqrt(6)].
Therefore
• 68% of samples of n=6 adults will have an
average IQ between 93.88 and 106.12
• 95% of samples of n=6 adults will have an
average IQ between 87.75 and 112.25
• 99.7% of samples of n=6 adults will have an
average IQ between 81.63 and 118.37
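The ranges above are the mean plus or minus 1, 2, and 3 standard errors. A minimal sketch of that arithmetic, using the slide's values (mean 100, SD 15, n = 6); the function names are illustrative:

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def interval(mean, sd, n, k):
    """Range within k standard errors of the mean."""
    se = standard_error(sd, n)
    return (mean - k * se, mean + k * se)


# k = 1, 2, 3 correspond to the three ranges listed on the slide.
for k in (1, 2, 3):
    low, high = interval(mean=100, sd=15, n=6, k=k)
    print(f"within {k} SE: {low:.2f} to {high:.2f}")
```

Calling interval with n=30 instead of n=6 gives the ranges for the larger-sample version of the example.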
Same example: larger sample
• Average IQ = 100, SD = 15.
• Take random samples of n = 30 adults.
• Then, sample means are normally
distributed with mean 100 and standard
error 2.74 [from SD/sqrt(n) = 15/sqrt(30)].
Therefore
• 68% of samples of n=30 adults will have an
average IQ between 97.26 and 102.74
• 95% of samples of n=30 adults will have an
average IQ between 94.52 and 105.48
• 99.7% of samples of n=30 adults will have an
average IQ between 91.78 and 108.22
• So … the larger the sample, the less the
sample averages vary.
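How quickly the variation shrinks follows from the SD/sqrt(n) formula: the standard error falls with the square root of n, so quadrupling the sample size halves it. A small sketch with SD = 15; the sample sizes 120 and 480 are added here only for illustration.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


sd = 15
# 6 and 30 are the slide's sample sizes; 120 and 480 are illustrative.
for n in (6, 30, 120, 480):
    print(f"n = {n:3d}  standard error = {standard_error(sd, n):.2f}")
```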
Hypothesis Testing
Two ways to learn
about a population
• Confidence intervals
• Hypothesis testing
Confidence Intervals
• Allow us to use sample data to estimate a
population value, like the true mean or the
true proportion.
• Give a range of values that is likely to
contain the true population value
Hypothesis Testing
• Allows us to use sample data to test a claim
about a population, such as testing whether
a population mean equals some number.
• e.g., do students spend more than 3 hours
per week on their stats homework?
General Idea of
Hypothesis Testing
• Make an initial assumption.
• Collect evidence (data).
• Based on the available evidence, decide
whether or not the initial assumption is
reasonable.
Example
Population: 5 million college students
Sample: 100 college students, M = 2.9
Question: Is the average GPA 2.7?
How likely is it that 100 students would
have an average GPA as large as 2.9
if the population average was 2.7?
Making the Decision
• It is either likely or unlikely that we would
collect the evidence we did given the initial
assumption.
• If it is likely, then we “do not reject” our
initial assumption. There is not enough
evidence to do otherwise.
• “Likely” is determined by a probability.
Making the Decision (cont’d)
• If it is unlikely, then:
– either our initial assumption is correct and we
experienced an unusual event
– or our initial assumption is incorrect
• In statistics, if it is unlikely, we decide to
“reject” our initial assumption.
Idea of Hypothesis Testing:
Criminal Trial Analogy
• First, state 2 hypotheses, the null hypothesis
(“H0”) and the alternative hypothesis (“HA”)
– H0: Defendant is not guilty.
– HA: Defendant is guilty.
Hypotheses
• The null hypothesis always represents the
status quo, i.e. the hypothesis that requires
no change in current behavior.
• The alternative hypothesis is the
conclusion that the researcher is trying to
make.
Criminal Trial Analogy
(continued)
• Then, collect evidence, such as
fingerprints, blood spots, hair samples, carpet
fibers, shoe prints, ransom notes,
handwriting samples, etc.
• In statistics, the data are the evidence.
Criminal Trial Analogy
(continued)
• Then, make an initial assumption.
– Defendant is innocent until proven guilty.
• In statistics, we always assume the null
hypothesis is true.
Criminal Trial Analogy
(continued)
• Then, make a decision based on the
available evidence.
– If there is sufficient evidence (“beyond a
reasonable doubt”), reject the null hypothesis.
(Behave as if defendant is guilty.)
– If there is not enough evidence, do not reject
the null hypothesis. (Behave as if defendant is
not guilty.)
Important Point
• Neither decision entails proving the null
hypothesis or the alternative hypothesis.
• No matter what decision we make, there is
always a chance we made an error.
Errors in Criminal Trials
                       Truth
Jury decision    Not guilty    Guilty
Guilty           Error         OK
Not guilty       OK            Error
Errors in Hypothesis Testing
                            Truth
Decision              Null hypothesis     Alternative hypothesis
Reject null           TYPE I ERROR (α)    OK
Do not reject null    OK                  TYPE II ERROR (β)
Definitions: Types of Errors
• Type I error: The null hypothesis is
rejected when it is true.
• Type II error: The null hypothesis is not
rejected when it is false.
• There is always a chance of error; we want
to minimize that chance
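Both error rates can be estimated by simulation. The sketch below is illustrative only: it assumes a one-sided z-test with a known population SD, a null mean of 100, SD 15, n = 30, a 0.05 cutoff, and a hypothetical true mean of 95 for the case where the null is false. None of these settings come from the slides, and the function names are made up for the example.

```python
import math
import random
import statistics


def lower_tail_p_value(sample_mean, null_mean, sd, n):
    """Probability of a sample mean this low or lower if the null mean is true."""
    z = (sample_mean - null_mean) / (sd / math.sqrt(n))
    return statistics.NormalDist().cdf(z)


def rejection_rate(true_mean, null_mean, sd, n, alpha, trials, seed=0):
    """Share of simulated samples for which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        m = statistics.fmean(rng.gauss(true_mean, sd) for _ in range(n))
        if lower_tail_p_value(m, null_mean, sd, n) < alpha:
            rejections += 1
    return rejections / trials


# Null hypothesis true (true mean = 100): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=100, null_mean=100, sd=15, n=30,
                             alpha=0.05, trials=4000)

# Null hypothesis false (hypothetical true mean = 95): every failure to
# reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=95, null_mean=100, sd=15, n=30,
                                 alpha=0.05, trials=4000)

print(f"estimated Type I error rate:  {type_1_rate:.3f}")
print(f"estimated Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should land near the chosen cutoff, while the Type II rate depends on how far the true mean is from the null value and on the sample size.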
Example
Population: many, many adults
Sample: 80 adults, M = 98.4
Question: Is average adult body temperature
98.6 degrees? Or is it lower?
Average body temperature of 80
sampled adults is 98.4 degrees.
Example (continued)
• Specify hypotheses.
– H0: μ = 98.6 degrees
– HA: μ < 98.6 degrees
• Make initial assumption: μ = 98.6 degrees
• Collect data: Average body temp of 80
sampled adults is 98.4 degrees. How likely
is it that a sample of 80 adults would have
an average body temp as low as 98.4 if the
average body temp of population was 98.6?
Using the probability
to make the decision
• The probability represents how likely we
would be to observe such an extreme
sample if the null hypothesis were true.
• The probability is a number between 0 and
1.
• Close to 0 means “unlikely.”
• So if the probability is “small” (typically,
less than 0.05), then reject the null
hypothesis.
Example (continued)
The probability (the p-value) can easily be
obtained from statistical software like SPSS.
Say we calculate a p-value of 0.0026
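For readers without SPSS, here is a rough sketch of how such a probability could be computed by hand. It uses a one-sided z-test, which may not be the test the software ran, and it needs a standard deviation that the slides do not report. The value 0.64 below is a hypothetical SD chosen purely for illustration, and the function name is also an assumption.

```python
import math
import statistics


def lower_tail_p_value(sample_mean, null_mean, sd, n):
    """One-sided z-test p-value for HA: mu < null_mean (SD treated as known)."""
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    return statistics.NormalDist().cdf(z)


ASSUMED_SD = 0.64   # hypothetical; the slides do not give the SD
ALPHA = 0.05        # the "small probability" cutoff from the slides

p = lower_tail_p_value(sample_mean=98.4, null_mean=98.6, sd=ASSUMED_SD, n=80)
decision = "reject H0" if p < ALPHA else "do not reject H0"

print(f"p = {p:.4f} -> {decision}")
```

With a different assumed SD the p-value changes, so this sketch shows the mechanics of the calculation rather than a reproduction of the reported result.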
Example (continued)
• The p-value, 0.0026, indicates that, if the
average body temperature in the population
is 98.6 degrees, it is unlikely that a sample
of 80 adults would have an average body
temperature as extreme as 98.4 degrees.
• Decision: Reject the null hypothesis.
• Conclude that the average body temperature
is lower than 98.6 degrees.
What type of error
might we have made?
• Type I error here is claiming that average body
temp is lower than 98.6 when in fact it really isn’t.
• Type II error here is failing to claim that the
average body temp is lower than 98.6 when it is.
• We rejected the null hypothesis, i.e. claimed body
temp is lower than 98.6, so we may have made a
Type I error.