Download Introducing Hypothesis Tests

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
STAT 250
Dr. Kari Lock Morgan
Normal Distribution
Chapter 5
• Normal distribution
• Normal distribution for p-values (5.1)
• Normal distribution for confidence intervals (5.2)
Statistics: Unlocking the Power of Data
Lock5
Question of the Day
How do Malaria parasites impact
mosquito behavior?
Statistics: Unlocking the Power of Data
Lock5
Malaria Parasites and Mosquitoes
 Mosquitos were randomized to either eat from a malaria
infected mouse or a healthy mouse
 After infection, the parasites go through two stages:
1)
2)
Oocyst (not yet infectious), Days 1-8
Sporozoite (infectious), Days 9 – 28
 Response variable: whether the mosquito approached a
human (in a cage with them)
 Does this behavior differ by infected vs control? Does it
differ by infection stage?
 Dr. Andrew Read, Professor of Biology and
Entomology and Penn State, is a co-author
Cator LJ, George J, Blanford S, Murdock CC, Baker TC, Read AF, Thomas MB. (2013).
‘Manipulation’ without the parasite: altered feeding behaviour of mosquitoes is not
dependent on infection with malaria parasites. Proc R Soc B 280: 20130711.
Statistics: Unlocking the Power of Data
Lock5
Malaria Parasites and Mosquitoes
 Malaria parasites would benefit if:
 Mosquitos
sought fewer blood meals after getting
infected, but before becoming infectious (oocyst
stage), because blood meals are risky
 Mosquitoes
sought more blood meals after
becoming infectious (sporozoite stage), to pass on
the infection
 Does infecting mosquitoes with Malaria
actually impact their behavior in this way?
Statistics: Unlocking the Power of Data
Lock5
Oocyt Stage
We’ll first look at the Oocyst stage, after the
infected group has been infected, but before they
are infectious.
pC: proportion of controls to approach human
pI: proportion of infecteds to approach human
What are the relevant hypotheses?
a) H0: pI = pC, Ha: pI < pC
b) H0: pI = pC, Ha: pI > pC
c) H0: pI < pC, Ha: pI = pC
d) H0: pI > pC, Ha: pI = pC
Statistics: Unlocking the Power of Data
Lock5
Data: Oocyst Stage
20 36
p̂I - p̂C =
= 0.177- 0.308 = -0.131
113 117
Statistics: Unlocking the Power of Data
Lock5
Randomization Test
p-value
observed statistic
Statistics: Unlocking the Power of Data
Lock5
Randomization and Bootstrap Distributions
Mate choice and
offspring fitness
Tea and immunity
Difference in
proportions
Difference in
means
What do you
notice?
Mercury in fish
Single mean
Statistics: Unlocking the Power of Data
Sleep and
anxiety
Correlation
Lock5
Normal Distribution
 The symmetric bell-shaped curve we have seen
for almost all of our distribution of statistics is
called a normal distribution
 The normal distribution is fully described by
it’s mean and standard deviation:
N(mean, standard deviation)
Statistics: Unlocking the Power of Data
Lock5
Central Limit Theorem
For a sufficiently large sample size,
the distribution of sample statistics
for a mean or a proportion is normal
 The larger the sample sizes, the more normal
the distribution of the statistic will be
 www.lock5stat.com/statkey
Statistics: Unlocking the Power of Data
Lock5
Randomization Distributions
N(mean, standard deviation)
If a randomization distribution is
normally distributed, we can write it as
a) N(null value, se)
b) N(statistic, se)
c) N(parameter, se)
Statistics: Unlocking the Power of Data
Lock5
Malaria and Mosquitoes
Which normal
distribution
should we use to
approximate this?
Statistics: Unlocking the Power of Data
a) N(0, -0.131)
b) N(0, 0.056)
c) N(-0.131, 0.056)
d) N(0.056, 0)
Lock5
Normal Distribution
We can compare the original statistic to this Normal
distribution to find the p-value!
Statistics: Unlocking the Power of Data
Lock5
p-value from N(null, SE)
p-value
Exact same idea
as randomization
test, just using a
smooth curve!
observed statistic
Statistics: Unlocking the Power of Data
Lock5
Standardized Data
 Often, we standardize the statistic to have
mean 0 and standard deviation 1
 How? z-scores!
statistic
null value
x  mean
z
sd
SE
 What is the equivalent for the null distribution?
Statistics: Unlocking the Power of Data
Lock5
Standardized Statistic
The standardized test statistic (also
known as a z-statistic or SDS) is
statistic - null
z=
SE
•
Calculating the number of standard errors a
statistic is from the null lets us assess
extremity on a common scale
Statistics: Unlocking the Power of Data
Lock5
Standardized Statistic
statistic - null
z=
SE
Malaria and Mosquitoes:
 From original data: statistic = -0.131
 From null hypothesis: null value = 0
 From randomization distribution: SE = 0.056
statistic - null -0.131 - 0
z=
=
= -2.34
SE
0.056
Compare to N(0,1) to find p-value…
Statistics: Unlocking the Power of Data
Lock5
Standard Normal
• The standard normal distribution is the
normal distribution with mean 0 and standard
deviation 1
N  0,1
• Standardized statistics are compared to the
standard normal distribution
Statistics: Unlocking the Power of Data
Lock5
p-value from N(0,1)
If a statistic is normally distributed
under H0, the p-value can be calculated
as the proportion of a N(0,1) beyond
statistic - null
z=
SE
Statistics: Unlocking the Power of Data
Lock5
p-value from N(0,1)
p-value
Exact same idea
as before, just
standardized!
standardized statistic
Statistics: Unlocking the Power of Data
Lock5
Randomization
test:
Replace with
smooth curve
The p-value is always the
proportion in the tail(s)
beyond the relevant statistic!
We have evidence that mosquitoes
exposed to malaria parasites are less
likely to approach a human before they
become infectious than mosquitoes not
exposed to malaria parasites.
N(null, SE)
N(0,1)
Standardize
Statistics: Unlocking the Power of Data
Lock5
Sporozoite Stage
For the data from the Sporozoite stage, after
infectious, what are the relevant hypotheses?
pC: proportion of controls to approach human
pI: proportion of infecteds to approach human
What are the relevant hypotheses?
a) H0: pI = pC, Ha: pI < pC
b) H0: pI = pC, Ha: pI > pC
c) H0: pI < pC, Ha: pI = pC
d) H0: pI > pC, Ha: pI = pC
Statistics: Unlocking the Power of Data
Lock5
Data
Oocyst Stage
Sporozoite Stage
20 36
p̂I - p̂C =
113 117
= 0.177- 0.308
37 14
p̂I - p̂C =
149 144
= 0.248- 0.097
= -0.131
= 0.151
Statistics: Unlocking the Power of Data
Lock5
Sporozoite Stage
The difference in proportions is 0.15 and the
standard error is 0.05. Is this significant?
a) Yes
b) No
Statistics: Unlocking the Power of Data
Standard
normal
Lock5
Malaria and Mosquitoes
 It appears that mosquitoes infected by malaria
parasites do, in fact, behave in ways
advantageous to the parasites!
 Oocyst
stage results: They are less likely to engage in
risky blood meals before becoming infectious (so
more likely to stay alive)
 Sporozoite stage
results: They are more likely to
engage in risky blood meals after becoming infectious
(so more likely to pass on disease)
 Toxoplasmosis parasites impact rat and human
behavior, malaria parasites impact mosquito
behavior… what else might parasites be
impacting that we have yet to discover?
Statistics: Unlocking the Power of Data
Lock5
Proportion of Infected
 All mosquitoes in the infected group were
exposed to the malaria parasites, but not all
mosquitoes were actually infected
 Of the 201 mosquitoes in the infected group
that we actually have data on, only 90 were
actually infected (90/201 = 0.448)
 What proportion of mosquitoes eating from a
malaria infected mouse become infected?
 We want a confidence interval!
Statistics: Unlocking the Power of Data
Lock5
Bootstrap Interval
95% Confidence Interval
Statistics: Unlocking the Power of Data
Lock5
Bootstrap Distributions
If a bootstrap distribution is normally
distributed, we can write it as
a)
b)
c)
d)
N(parameter, sd)
N(statistic, sd)
N(parameter, se)
N(statistic, se)
sd = standard deviation of data values
se = standard error = standard deviation of statistic
Statistics: Unlocking the Power of Data
Lock5
Normal Distribution
We can find the middle P% of this Normal
distribution to get the confidence interval!
Statistics: Unlocking the Power of Data
Lock5
CI from N(statistic, SE)
Same idea as the
bootstrap, just using
a smooth curve!
95% CI
Statistics: Unlocking the Power of Data
Lock5
Bootstrap
Interval:
95% CI
Replace with
smooth curve
N(statistic, SE)
95% CI
Statistics: Unlocking the Power of Data
Lock5
(Un)-standardization
 Standardized scale:
x - mean
z=
sd
 To un-standardize:
x - mean
z=
sd
z×sd = x - mean
x = mean+ z×sd
Statistics: Unlocking the Power of Data
Lock5
(Un)-standardization
 In testing, we go to a standardized statistic
 In intervals, we find (-z*, z*) for a standardized
distribution, and return to the original scale
 Un-standardization (reverse of z-scores):
statistic
± z*
SE
x = mean + z × sd
 What’s the equivalent for the distribution of
the statistic? (bootstrap distribution)
Statistics: Unlocking the Power of Data
Lock5
P% Confidence Interval
1. Find values (–z*
and z*) that capture
the middle P% of
N(0,1)
2. Return to
original scale with
statistic ± z*× SE
P%
-z*
Statistics: Unlocking the Power of Data
z*
Lock5
Confidence Interval using N(0,1)
If a statistic is normally distributed, we find a
confidence interval for the parameter using
statistic ± z*× SE
where the proportion between –z* and +z* in
the standard normal distribution is the desired
level of confidence.
Statistics: Unlocking the Power of Data
Lock5
Confidence Intervals
Find z* for a 99% confidence interval.
www.lock5stat.com/statkey
Statistics: Unlocking the Power of Data
Lock5
Proportion of Infected
 Proportion of infected mosquitoes:
 Sample
statistic (from data): 90/201 = 0.448
 z*
(from standard normal): 2.575
 SE
(from bootstrap distribution): 0.037
 Give a 99% confidence interval for the
proportion of mosquitoes who get infected.
Statistics: Unlocking the Power of Data
Lock5
z*
 Why use the standard normal?
 z* is always the same, regardless of the data!
 Common confidence levels:
 95%: z*
= 1.96 (but 2 is close enough)
 90%: z*
= 1.645
 99%: z* =
2.576
Statistics: Unlocking the Power of Data
Lock5
N(0, 1)
Bootstrap
Interval:
middle 95%
Replace with
smooth curve
N(statistic, SE)
middle 95%
Unstandardize
statistic ± z* × SE
0.448 ± 1.96 × 0.037
(0.375, 0.521)
We are 95% confident that only between
0.375 and 0.521 of the mosquitoes
exposed to infection actually get infected.
middle 95%
Statistics: Unlocking the Power of Data
Lock5
Malaria and Mosquitoes
 Should we limit our analysis to only those
mosquitoes that actually got infected? Why or
why not?
 In favor of yes:
 In favor of no:
Statistics: Unlocking the Power of Data
Lock5
Confidence Interval Formula
If distribution of statistic is normal…
From N(0,1)
sample statistic  z  SE
*
From original
data
Statistics: Unlocking the Power of Data
From
bootstrap
distribution
Lock5
Formula for p-values
If distribution of statistic is normal…
From original
data
From H0
sample statistic  null value
z
SE
From
randomization
distribution
Statistics: Unlocking the Power of Data
Compare z to
N(0,1) for p-value
Lock5
Standard Error
• Wouldn’t it be nice if we could compute
the standard error without doing
thousands of simulations?
• We can!!!
• Or at least we’ll be able to next class…
Statistics: Unlocking the Power of Data
Lock5
To Do
 HW 5.1, 5.2 (due Monday, 3/27)
Statistics: Unlocking the Power of Data
Lock5