Hypothesis Testing
Comparing One Sample to its Population

Hypothesis Testing w/ One Sample

If the population mean (μ) and standard deviation (σ) are known:

Testing whether our sample mean (X̄) is significantly different from the sampling distribution of the mean
Similar to testing how different an individual score is from the other scores in the sample

What is this test called?
Hypothesis Testing w/ One Sample

z-score formula [for an individual score (X)]:
z = (X − μ) / σ

z-score formula [for means (X̄)]:
z = (X̄ − μ) / (σ / √N)
Hypothesis Testing w/ One Sample

Testing a score against the standard deviation of a distribution of scores:
z = (X − μ) / σ

Testing a mean against the standard deviation of the distribution of sample means (i.e., the standard error):
z = (X̄ − μ) / (σ / √N)
Hypothesis Testing w/ One Sample

Two implications of this formula:

1. Because we are dividing by √N, the same data (same sample mean, population mean, and σ) with a larger sample size yields a smaller p-value (i.e., the result is more likely to be significant)

All statistical tests that produce p-values are sensitive to sample size – i.e., with enough people, anything is significant at p < .05 (see the sketch below)
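To make this concrete, here is a minimal Python sketch; the numbers (sample mean 12, population mean 10, σ = 5) are made up purely for illustration. The same mean difference produces a smaller one-tailed p-value as N grows:

```python
import math
from scipy.stats import norm

x_bar, mu, sigma = 12.0, 10.0, 5.0   # hypothetical values, not from the slides

for n in (4, 16, 64, 256):
    z = (x_bar - mu) / (sigma / math.sqrt(n))   # z = (X̄ − μ) / (σ/√N)
    p = norm.sf(z)                              # one-tailed p-value
    print(f"N = {n:3d}  z = {z:5.2f}  p = {p:.5f}")
```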
Hypothesis Testing w/ One Sample

Two implications of this formula:

2. If you recall, this formula was derived from the formula for the normal distribution

This means that your data must be normally distributed to use this test validly
However, this test is robust to violations of this assumption – i.e., you can get away with violating it if (a) you have a large enough sample or (b) your population data are normally distributed
Why?
Hypothesis Testing w/ One Sample

The Central Limit Theorem:
Given a population with mean μ and variance σ², the sampling distribution of the mean (the distribution of sample means) will have a mean equal to μ (i.e., μ_X̄ = μ) and a variance equal to σ²/N (i.e., σ²_X̄ = σ²/N, with standard deviation σ_X̄ = σ/√N). The distribution will approach the normal distribution as N, the sample size, increases.
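A small simulation sketch of the theorem; the choice of an exponential population with μ = σ = 2, N = 30, and 100,000 replications is an assumption made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 2.0          # an exponential(scale=2) population has mean 2 and SD 2
N = 30                        # sample size
reps = 100_000                # number of samples drawn

# Draw many samples of size N and record each sample mean
sample_means = rng.exponential(scale=2.0, size=(reps, N)).mean(axis=1)

print("mean of sample means:", sample_means.mean())        # ≈ μ
print("SD of sample means:  ", sample_means.std(ddof=1))   # ≈ σ/√N
print("predicted σ/√N:      ", sigma / np.sqrt(N))
```

A histogram of `sample_means` would also look roughly normal even though the population itself is strongly skewed, which is the "approach the normal distribution" part of the theorem.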
Hypothesis Testing w/ One Sample

Example #1:

You want to test the hypothesis that the current crop of Kent State freshmen is more depressed than Kent State undergraduates in general.

What is your sample and what is your population?
What are your H0 and H1?
Are you using a one- or two-tailed test?
Assume that the current Kent State freshmen have a mean depression score of 15, while the mean for all previous Kent State undergrads (N = 100,000) is 10, with a standard deviation of 5
Hypothesis Testing w/ One Sample

z = (15 − 10) / (5 / √100,000) = 5 / .0158 = 316.46

Look up the p associated with this z-score in the z-table
p is essentially 0 (far smaller than .0001)
Since this is less than .05 (or .025 if we were using a two-tailed test), we can conclude that the current batch of freshmen is significantly more depressed than previous undergrads

Also notice the effect that our large N had on our p-value (see the check below)
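A quick check of this calculation in Python, following the slide in plugging the population N of 100,000 into the standard-error term:

```python
import math
from scipy.stats import norm

x_bar, mu, sigma, n = 15, 10, 5, 100_000

z = (x_bar - mu) / (sigma / math.sqrt(n))
p = norm.sf(z)                 # one-tailed p-value

print(f"z = {z:.2f}")          # 316.23 (the slide's 316.46 comes from rounding
                               # the standard error to .0158 before dividing)
print(f"p = {p}")              # underflows to 0.0 -- effectively zero
```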
Hypothesis Testing w/ One Sample

Most often, however, we don't know μ and σ, because this is what we're trying to estimate with our sample in the first place
The formula for the t statistic accomplishes this by substituting s² for σ² in the formula for the z statistic
Because of this substitution, we have a different statistic, which requires that we use a different table than the z-table
Don't worry too much about why it's different
Hypothesis Testing w/ One Sample

Testing a mean against the standard deviation of the distribution of sample means (i.e., the standard error), with σ known:
z = (X̄ − μ) / (σ / √N)

Testing a mean against the standard error estimated from the sample's standard deviation (s):
t = (X̄ − μ) / (s / √N)

(see the sketch below)
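A short sketch of the substitution; the scores, hypothesized μ = 10, and σ = 1.5 are made-up values, used only to show that the sole change from z to t is replacing σ with the sample s:

```python
import math
import numpy as np

sample = np.array([11.2, 9.8, 12.5, 10.1, 13.0, 9.4, 11.8, 10.6])  # made-up scores
mu = 10.0        # hypothesized population mean
sigma = 1.5      # population SD -- only available for the z statistic

x_bar = sample.mean()
s = sample.std(ddof=1)          # sample SD, computed with N − 1 in the denominator
n = len(sample)

z = (x_bar - mu) / (sigma / math.sqrt(n))   # z = (X̄ − μ) / (σ/√N)
t = (x_bar - mu) / (s / math.sqrt(n))       # t = (X̄ − μ) / (s/√N)
print(f"z = {z:.3f}, t = {t:.3f}")
```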
Hypothesis Testing w/ One Sample

After computing our t statistic, we need to compare it with the t-table (called Student's t-table)

"Student" is a pseudonym for William Gosset
Gosset worked for the Guinness Brewing Company, but they wouldn't let him publish under his own name

First, we will need to become familiar with the concept of degrees of freedom, or df

df = N – 1
This represents the number of individual subjects' data points that are free to vary if you already know the mean or s
Hypothesis Testing w/ One Sample

For example:

Suppose we already know that a particular set of data has a mean of 5 and 10 scores in total (N = 10)
Once we have nine of those scores, we can calculate the tenth; however, if we have only eight scores, we do not know what the other two scores could be

We can solve x + 5 = 10, but not x + y = 10, because in the latter we have more than one unknown (x and y)
x and y could be 5 and 5, 8 and 2, 4 and 6, 7 and 3, etc.
Therefore, nine scores are free to vary, and then the tenth is fixed
Hypothesis Testing w/ One Sample

Factors that influence the z and t statistics:

The difference between the sample mean and the population mean – greater differences = greater t and z values
The magnitude of s (or s²) – since we're dividing by s, smaller values of s result in larger values of t or z [i.e., we want to decrease variability (error) in our sample]
The sample size – the bigger the N, the bigger the t and z
The significance level (α) – the smaller the α, the higher the critical t needed to reject H0 – although raising α also raises our Type I error rate, so we probably won't want to do this without good reason
Whether the test is one- or two-tailed – two-tailed tests split α into two tails of p < .025 each, instead of one tail at p < .05 (the last two factors are sketched below)
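A small sketch of the last two factors, showing how α and the one- vs. two-tailed choice change the critical t; df = 28 is an arbitrary choice for illustration:

```python
from scipy.stats import t

df = 28
for alpha in (0.05, 0.01):
    one_tailed = t.ppf(1 - alpha, df)        # all of α in one tail
    two_tailed = t.ppf(1 - alpha / 2, df)    # α split across both tails
    print(f"alpha = {alpha}: one-tailed critical t = {one_tailed:.3f}, "
          f"two-tailed critical t = ±{two_tailed:.3f}")
```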
General Approach to Hypothesis Testing
1. Identify H0 and H1
2. Calculate df and identify the critical test statistic
3. Determine whether to use a one- or two-tailed test, determine what value of α to use (usually .05), and identify the rejection region(s) bounded by the critical statistic
4. Calculate your obtained test statistic
5. Compare the value of your test statistic to your rejection region to determine whether or not to reject H0
(These five steps are sketched in code below.)
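A sketch of the five steps as a single function, assuming for concreteness a one-sample, one-tailed t-test with H1: μ > μ0; the function name and default α are my own choices:

```python
import math
from scipy.stats import t as t_dist

def one_sample_t_test(scores, mu0, alpha=0.05):
    # Step 1: H0: μ = mu0 vs. H1: μ > mu0 (one-tailed, upper tail)
    n = len(scores)
    # Step 2: degrees of freedom
    df = n - 1
    # Step 3: one-tailed test at level alpha; the rejection region is t > critical_t
    critical_t = t_dist.ppf(1 - alpha, df)
    # Step 4: the obtained t statistic
    x_bar = sum(scores) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in scores) / (n - 1))
    t_obt = (x_bar - mu0) / (s / math.sqrt(n))
    # Step 5: compare the obtained statistic with the rejection region
    return t_obt, critical_t, t_obt > critical_t
```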
Hypothesis Testing w/ One Sample

Example #1:

You've administered a therapy for people with anorexia that will supposedly assist them in gaining weight. The following data are the amounts of weight gained in pounds over your 16-session therapy for 29 participants. Does this represent a significantly increased degree of weight gain compared to the average weight gained without treatment (−.45 lbs.)?

What are H0 and H1?
Will you be using a one- or two-tailed test? Why?
Based on this, what is your df?
Hypothesis Testing w/ One Sample

Example #1 data (weight gained, in lbs.):
1.7   -9.1    .7   2.1   -.1  -1.4   -.7
1.4   -3.5   -.3  14.9  -3.7   3.5   -.8
17.1   2.4  -7.6  12.6   1.6   1.9  11.7
3.9    6.1    .1   1.1  15.4  -4.0   -.7
20.9
Hypothesis Testing w/ One Sample

Example #1:

Sample mean = 87.2 / 29 = 3.0069
s² = (1757.8 – [(87.2)² / 29]) / 28 = 53.41
s = 7.3085
t = (3.0069 – (–.45)) / (7.3085 / √29) = 2.5472, p < .05

t > critical t and falls in our rejection region, which is above the population mean (since we're only interested in people gaining weight); therefore we reject H0 and conclude that our treatment is more effective than no treatment at all (see the check below)
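The hand calculations can be checked with scipy; the 29 weight-gain scores and the comparison value of −.45 lbs. are taken straight from the example:

```python
import numpy as np
from scipy.stats import ttest_1samp

gains = np.array([1.7, -9.1, 0.7, 2.1, -0.1, -1.4, -0.7, 1.4, -3.5, -0.3,
                  14.9, -3.7, 3.5, -0.8, 17.1, 2.4, -7.6, 12.6, 1.6, 1.9,
                  11.7, 3.9, 6.1, 0.1, 1.1, 15.4, -4.0, -0.7, 20.9])

print("mean =", gains.mean())           # ≈ 3.0069
print("s^2  =", gains.var(ddof=1))      # ≈ 53.41
print("s    =", gains.std(ddof=1))      # ≈ 7.3085

# One-tailed (upper-tail) test against −.45 lbs.; 'alternative' requires SciPy >= 1.6
result = ttest_1samp(gains, popmean=-0.45, alternative="greater")
print("t =", result.statistic, " p =", result.pvalue)   # t ≈ 2.547, one-tailed p < .05
```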
Hypothesis Testing w/ One Sample

Often, if we’re reporting the results of our
experiments to the public, or the results of an
assessment (psychological or otherwise) to a
client, we want to emphasize to them that our
measurements are made with error, or that
our samples include sampling error

We can do this by including intervals around the
scores we report, indicating that the “true” score
measured without error lies in this interval
Hypothesis Testing w/ One Sample

This is what is known as a Confidence Interval

In keeping with the p < .05 tradition, we are often looking for the 95% confidence interval – the range within which 95% of our distribution lies – but we can do this for any interval
They are calculated just as with z-scores: we plug the critical t values into the formula and work backwards
Hypothesis Testing w/ One Sample

For a 95% CI:

Given your df (we'll assume df = 9 for this example) and type of test (assume a two-tailed test for now), look up your critical values of t from a t-table (t = ±2.262)
Plug into your formula with your sample mean and s (which we'll assume are 1.463 and .341, respectively), and solve for μ
Hypothesis Testing w/ One Sample

For a 95% CI:
±2.262 = (1.463 – μ) / (.341 / √10)
μ = 1.463 ± 2.262(.108) = 1.463 ± .244
1.463 + .244 = 1.707
1.463 – .244 = 1.219
CI.95 = 1.219 ≤ μ ≤ 1.707 (see the check below)
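A check of this interval using the slide's assumed values (X̄ = 1.463, s = .341, N = 10, df = 9, two-tailed critical t ≈ 2.262):

```python
import math
from scipy.stats import t as t_dist

x_bar, s, n = 1.463, 0.341, 10
df = n - 1

crit = t_dist.ppf(0.975, df)            # ≈ 2.262 for a 95%, two-tailed interval
margin = crit * s / math.sqrt(n)        # ≈ .244

print(f"critical t = {crit:.3f}")
print(f"95% CI: {x_bar - margin:.3f} <= mu <= {x_bar + margin:.3f}")   # 1.219 to 1.707
```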