Lecture 9:
p-value functions and intro to
Bayesian thinking
Matthew Fox
Advanced Epidemiology
If you are a hypothesis tester, which p-value is more likely to avoid a type I error?
– P = 0.049
– P = 0.005
What is the p-value fallacy?
If you go to the doctor with a set
of symptoms, does the doctor
develop a hypothesis and test it?
Anyone heard of Bayesian
statistics?
After completing a study, would you rather
know the probability of the data given the
null hypothesis, or the probability of the null
hypothesis given the data?
Last Session

• Randomization
  – Leads to average 0 confounding, gives meaning to p-values
  – Provides a known distribution for all possible observed results
  – Observational data does not have average 0 confounding
• P-values
  – Probability under the null that a test statistic would be greater than or equal to its observed value, assuming no bias
  – Not the probability of the data, the null, or a significance level
• Confidence intervals
  – Calculated assuming infinite repetitions of the data
  – Don't give a probability of containing the true value
Today

• The p-value fallacy
• p-value functions
  – Show p-values for the data at a range of hypotheses
• Bayesian statistics
  – The difference between Frequentists and Bayesians
  – Bayesian theory
  – How to apply Bayes theory in practice
P-value fallacy

• P-value developed by Fisher as an informal measure of compatibility of data with the null
  – Provides no guidance on significance
  – Should be interpreted in light of what we know
• Hypothesis testing developed by Neyman and Pearson (Egon, not his father Karl) to minimize errors in the long run
  – P = 0.04 is no more evidence than p = 0.00001
• Fallacy is that the p-value can do both

[Portraits: RA Fisher, Jerzy Neyman, Egon Pearson]
Different goals

• The goal of epidemiology:
  – To measure precisely and accurately the effect of an exposure on a disease
• The goal of policy:
  – To make decisions
• Given our goal:
  – Why hypothesis testing?
  – Why compare to the null?
p-value functions (1)

• Recall that a p-value is:
  – "The probability under the test hypothesis (usually the null) that a test statistic would be ≥ to its observed value, assuming no bias."
• We can calculate p-values under test hypotheses other than the null
  – Particularly easy if we use a normal approximation
  – If we assume we can fix the margins with observational data
p-value functions (1)

              Exposed   Unexposed
Disease          6          3
No disease      14         17
Total           20         20
Risk             0.3        0.15

Observed RR = 2.00;  SE(ln(RR)) = 0.63
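The table's two summary numbers can be checked with a short sketch (Python, not part of the lecture), using the standard formula SE(ln RR) = √(b/(a·N₁) + d/(c·N₀)):

```python
# Risk ratio and SE(ln RR) from the 2x2 table above:
# a=6 exposed cases, b=14 exposed non-cases,
# c=3 unexposed cases, d=17 unexposed non-cases.
import math

a, b, c, d = 6, 14, 3, 17
n1, n0 = a + b, c + d                 # exposed and unexposed totals (20 each)

rr = (a / n1) / (c / n0)              # risk ratio = 0.30 / 0.15 = 2.0
se = math.sqrt(b / (a * n1) + d / (c * n0))   # SE of ln(RR), about 0.63

print(rr, round(se, 2))
```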
p-value functions (2a)

To calculate a test statistic (Z score) for a p-value, usually given:

z = [ln(RR_o) − ln(RR_H)] / SE(ln(RR_o)) = [ln(2.0) − ln(1)] / 0.63 ≈ 1.1

(recall ln(1) = 0)

p(|z| ≥ 1.1) = 0.27
p-value functions (2b)

z = [ln(RR_o) − ln(RR_H)] / SE(ln(RR_o))

SE(ln(RR)) = √( b/(a·N₁) + d/(c·N₀) )   (N₁ = exposed total, N₀ = unexposed total)

p = 2 × (1 − Φ(|z|)),  where Φ(|z|) = ∫ from −∞ to |z| of (1/√(2π)) e^(−z²/2) dz
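A minimal sketch of this calculation, building the standard normal CDF from math.erf (the function name two_sided_p is mine, not the lecture's):

```python
# Two-sided p-value for an observed RR against any test hypothesis RR_H,
# using the normal approximation on the log scale.
import math

def two_sided_p(rr_obs, rr_hyp, se_ln_rr):
    z = (math.log(rr_obs) - math.log(rr_hyp)) / se_ln_rr
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF
    return z, 2 * (1 - phi)

# The slide's example: RR_obs = 2.0 tested against the null (RR_H = 1)
z, p = two_sided_p(2.0, 1.0, 0.63)
print(round(z, 1), round(p, 2))   # z about 1.1, p about 0.27
```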
p-value functions (3)

Hypothesis      RR      z-value   p-value
Hypothesis a     0.1     4.737    0.0000
Hypothesis b     0.20    3.641    0.0000
Hypothesis c     0.33    2.849    0.0040
Hypothesis d     0.5     2.192    0.0280
Hypothesis e     1       1.096    0.2730
Observed         2.00    0.000    1.0000
Hypothesis g     3      −0.641    0.5210
Hypothesis h     5      −1.449    0.1470
Hypothesis i    10      −2.545    0.0110
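The whole table can be reproduced by sweeping the test hypothesis (a sketch; the SE of √0.4 ≈ 0.632 is the unrounded value from the example's 2x2 table):

```python
# Trace the p-value function: two-sided p for RR_obs = 2.0 at a
# range of test hypotheses, using the normal approximation.
import math

def p_value(rr_obs, rr_hyp, se):
    z = (math.log(rr_obs) - math.log(rr_hyp)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

se = math.sqrt(0.4)   # SE(ln RR) from the 2x2 table, about 0.632
curve = {rr: round(p_value(2.0, rr, se), 3)
         for rr in (0.1, 0.2, 0.33, 0.5, 1, 2, 3, 5, 10)}
print(curve)   # peaks at p = 1.0 at the point estimate RR = 2
```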
p-value functions (4)

[Figure: p-value function for the example; 2-sided p-value (0% to 100%) against RR hypothesis (0.1 to 10, log scale). The curve peaks at the point estimate RR = 2.0 (p = 100%); the null p-value is 0.27; the 95% confidence limits, where p = 5%, are LCLM = 0.58 and UCLM = 6.9.]
Case-control study of spermicides
and Down Syndrome
Interpretation
Introduction to Bayesian Thinking
What is the best estimate of the true effect of E on D?

• Given a disease D and
• An exposure E+ vs. unexposed E−:
  – The relative risk associating D w/ E+ (vs E−) = 2.0
  – with 95% confidence interval 1.5 to 2.7
• OK, but why did you say what you said?
  – Have no other info to go on
What is the best estimate of the true effect of E on D?

• Given a disease D and
• An exposure E+ vs. unexposed E−:
  – The relative risk associating D w/ E+ (vs E−) = 2.0
  – with 95% confidence interval 0.5 to 8.0
• Note that the width of the interval doesn't affect the best estimate in this case
What is the best estimate of the true effect of E on D?

• Given a disease D (breast cancer) and
• An exposure E+ (ever-smoking) vs. unexposed E− (never-smoking):
  – The relative risk associating D w/ E+ (vs E−) = 2.0
  – with 95% confidence interval 1.0 to 4.0
• Most previous studies of smoking and BC have shown no association
What is the best estimate of the true effect of E on D?

• Given a disease D (lung cancer) and
• An exposure E+ (ever-smoking) vs. unexposed E− (never-smoking):
  – The relative risk associating D w/ E+ (vs E−) = 2.0
  – with 95% confidence interval 1.0 to 4.0
• Most previous studies of smoking and LC have shown much larger effects
What is the best estimate of the true probability of heads?

• Given 100 flips of a fair coin flipped in a fair way
• Observed number of heads = 40
  – The probability of heads equals 0.40
  – with 95% confidence interval 0.304 to 0.496
• Given what we know about a fair coin, why should this data override what we know?
• So why would we interpret our study data as if it existed in a vacuum?
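The interval quoted for the coin is the usual normal (Wald) approximation for a proportion; a quick check:

```python
# 95% Wald confidence interval for 40 heads in 100 flips.
import math

heads, n = 40, 100
p_hat = heads / n                             # 0.40
se = math.sqrt(p_hat * (1 - p_hat) / n)       # about 0.049
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(round(lo, 3), round(hi, 3))             # about 0.304 to 0.496
```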
The Monty Hall Problem
An alternative to frequentist statistics

• Something important about the data is not being captured by the p-value and confidence interval, or at least in the way they're used
• What is missing is a measure of the evidence provided by the data
• Evidence is a property of data that makes us alter our beliefs
Frequentist statistics fail as measures of evidence

• The logical underpinning of frequentist statistics is that
  – "if an observation is rare under a hypothesis, then the observation can be used as evidence against the hypothesis."
• Life is full of rare events we accord little attention
  – What makes us react is a plausible competing hypothesis under which the data are more probable
Frequentist statistics fail as measures of evidence

• Null p-value provides no information about the probability of alternatives to the null
• Measurement of evidence requires 3 things:
  – The observations (data)
  – 2 competing hypotheses (often null and alternative)
• The p-value incorporates the data & only 1 hypothesis
  – usually the null
The likelihood as a measure of evidence

• Likelihood = c × Probability(data | H)
• Data are fixed and hypotheses variable
  – p-values are calculated with a fixed (null) hypothesis, assuming the data are randomly variable
• Evidence supporting one hypothesis versus another = the ratio of their likelihoods
  – Log of the ratio is an additive measure of support
Evidence versus belief

• The hypothesis with the higher likelihood is better supported by the evidence
  – But that does not make it more likely to be true
• Belief also depends on prior knowledge, and can be incorporated w/ Bayes Theorem
  – It is the likelihood ratio that represents the data
• Priors can be subjective or empirical
  – But not arbitrary
Bayesian analysis (1)

• Given (1) the prior odds that a hypothesis is true, and (2) data to measure the effect
  – Update the prior odds using the data to calculate the posterior odds that the hypothesis is true
  – A formal algorithm to accomplish what many do informally
Bayesian analysis (2)

[p(H1) / p(H0)] × [p(D|H1) / p(D|H0)] = p(H1|D) / p(H0|D)

• Prior odds times the likelihood ratio equals the posterior odds
• Only for people with an ignorant (uniform) prior distribution can we say that the frequentist 95% CI covers the true value with 95% certainty
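The updating rule is a one-liner; a sketch, using the Sandler likelihood ratio of 1.78 from the ETS example with an indifferent prior (odds = 1.0):

```python
# Bayes on the odds scale: posterior odds = prior odds x likelihood ratio.
def posterior_odds(prior_odds, likelihood_ratio):
    return prior_odds * likelihood_ratio

print(posterior_odds(1.0, 1.78))   # 1.78
# The posterior becomes the prior for the next study, e.g. Hirayama (LR 0.38):
print(round(posterior_odds(1.78, 0.38), 1))   # 0.7
```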
Bayesian analysis (3): Environmental tobacco smoke and breast cancer

study      observation     likelihood ratio   prior odds   posterior odds
Sandler    1.6 (0.8–3.4)        1.78             1.0           1.78
Hirayama   1.3 (0.8–2.1)        0.38             1.8           0.7
Smith      1.6 (0.8–3.1)        2.12             0.7           1.4

• H1 = [OR = 2]; H0 = [OR = 1]
• Initially, the analyst has no preference (prior odds = 1); each study's posterior odds serve as the prior odds for the next study
Bayesian analysis (4): concepts

• Keep these concepts separate:
  – The hypotheses under comparison (e.g., RR=2 vs RR=1)
  – The prior odds (>1 favors the 1st hypothesis (RR=2), <1 favors the 2nd hypothesis (RR=1))
  – The estimate of effect for a study (this is the data used to modify the prior odds)
Bayesian analysis (5): concepts

• Keep these concepts separate:
  – The likelihood ratio (probability of the data under the 1st hypothesis versus under the 2nd hypothesis; >1 favors the 1st, <1 favors the 2nd)
  – The posterior odds (compares the hypotheses after observing the data; >1 favors the 1st, <1 favors the 2nd)
Bayesian analysis:
Part V: Problem 3 (20 points total)

The following table shows odds ratios, 95% confidence intervals, and the standard error of the ln(OR) from three studies of the association between passive exposure to tobacco smoke and the occurrence of breast cancer. The fourth column shows the likelihood ratio calculated as the likelihood under the hypothesis that the true relative risk equals 2.0 (H1: RR = 2.0) divided by the likelihood under the hypothesis that the true relative risk equals 1.0 (H0: RR = 1.0).

study      observation 95% CI   SE(ln(OR))   likelihood ratio   prior odds   posterior odds
Sandler    1.6 (0.8–3.4)           0.38            1.8             0.75          1.34
Hirayama   1.3 (0.8–2.1)           0.24            0.4             1.34          0.50
Smith      1.6 (0.8–3.1)           0.34            2.1             0.50          1.07

A. (7 points) Assume that someone favors the hypothesis that the true odds ratio equals 1.0 over the hypothesis that the true odds ratio equals 2.0. The person quantifies their preference by stating that their prior odds for the hypothesis of a relative risk equal to 2.0 versus the hypothesis of a relative risk equal to 1.0 is 0.75 (see the first row of the table). Complete the shaded cells of the table using Bayesian analysis. After seeing these three studies, should the person favor (circle one):

the hypothesis of 2.0        the hypothesis of 1.0
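One way to fill in the shaded cells is to compute each likelihood ratio as a ratio of normal densities for ln(OR) under the two hypotheses and then chain the odds; this normal-likelihood approximation is my assumption about how the table's values were produced, not something the problem states:

```python
# Likelihood ratio for H1: RR = 2 vs H0: RR = 1, treating ln(OR_obs) as
# normal with the given SE; the density ratio reduces to exp((z0^2 - z1^2)/2).
import math

def lr_normal(or_obs, se, h1=2.0, h0=1.0):
    z1 = (math.log(or_obs) - math.log(h1)) / se
    z0 = (math.log(or_obs) - math.log(h0)) / se
    return math.exp((z0**2 - z1**2) / 2)

# (observation, SE(ln OR)) for Sandler, Hirayama, Smith
studies = [(1.6, 0.38), (1.3, 0.24), (1.6, 0.34)]

odds = 0.75   # the skeptic's prior odds from part A
for or_obs, se in studies:
    odds *= lr_normal(or_obs, se)   # posterior becomes the next prior
print(round(odds, 2))
```

The chained result lands just above 1, matching the table's final posterior odds of about 1.07 up to rounding of the intermediate cells.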
http://statpages.org/bayes.html
Connection between p-values and
the likelihood ratio
Bayesian intervals

• Bayesian intervals require specification of prior odds for an entire distribution of hypotheses, not just two hypotheses
  – The distribution will look like a p-value function, but incorporates only prior knowledge
• Update the distribution with the data
  – Posterior distribution
• Choose interval limits
Priors

[Figure: prior probability density vs. RR hypothesis (0.1 to 10, log scale) for four priors: advocate, skeptic, ignorant, and bimodal.]

+ Sandler

[Figure: the four distributions updated with the Sandler study.]

+ Hirayama

[Figure: the distributions further updated with the Hirayama study.]

+ Smith

[Figure: the distributions further updated with the Smith study.]

+ Morabia

[Figure: the distributions further updated with the Morabia study.]

+ Johnson

[Figure: the distributions further updated with the Johnson study.]
Conclusion

• P-value fallacy
  – P-values cannot serve both the long-run perspective and the individual study perspective
• P-value functions
  – Can help see the entire distribution of probabilities
• Bayesian analysis
  – Allows us to change our beliefs with new information and measure the probability of a hypothesis