Download Lecture 2 - Biostatistics

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Effect size wikipedia , lookup

Transcript
Lecture 2: Association and Inference
Karen Bandeen-Roche, PhD
Department of Biostatistics
Johns Hopkins University
July 12, 2011
Introduction to Statistical
Measurement and Modeling
Data motivation
 Osteoporosis screening
 Scientific question: Can we detect osteoporosis earlier
and more safely?
 Some related statistical questions:
 Are ultrasound measurements associated with
osteoporosis status?
 How strong is the evidence of association?
 By how much do the mean ultrasound values differ
between those with and without osteoporosis?
 How precisely can we determine that difference?
Data motivation
DPA scores by osteoporosis groups
0.6
1600
0.7
1700
0.8
1800
0.9
1.0
1900
1.1
2000
1.2
Ultrasound scores by osteoporosis groups
control
case
control
case
Outline
 Association
 Conditional probability
 Joint distributions / Independence (note SRS)
 Correlation
 Statistical evidence / certainty
 Confidence intervals
 Statistical tests
Association - Heuristic
 Two random variables are associated if…
 … they are “connected” or “related”
 … knowing one helps predict the other
 “Association” and “causality” are not equivalent
 Two variables may be associated because a third variable
causes each
 Even if an association is causal the direction may not be
clear (which causes which)
 Association is necessary, but not sufficient, for causality
Association - Formal
 Tool: Conditional probability
 Definition for two events A and B: If P(B) > 0, the
conditional probability that event A occurs, given that
event B has occurred, is P(A|B) := P(A∩B)/P(B)
 Key concept: independence
 A,B are pairwise independent if P{A,B} = P{A}P{B}.
 Events {Aj, j = 1,...,n} are mutually independent if
P{∩j=1n Aj} = ∏1n P{XjεAj}
 A, B are independent iff P(A|B) = P(A)
 Osteoporosis example: A=ultrasound<1750, B=case?
Association - Formal
 What about random variables?
 Joint distribution function: FX,Y(x,y) = P{X≤x,Y≤y} =
P{{X≤x} ∩ {Y≤y}}
 Joint mass, density functions:
 Discrete X, Y: pXY(x,y)=P{X=x,Y=y}
 Continuous X, Y: fX,Y (x,y) = d2/(dsdt) FX,Y (s,t) |(x,y)
 Conditional mass, density functions:
 Discrete X, Y: pX|Y(x|y)=pXY(x,y)/pY(y)
 Continuous X, Y: f(x|y) = fX,Y(x,y)/fY (y)
Association - Formal
 Key concept: independence
 RVs X,Y are independent if for each x є SX , y є SY
 FX,Y(x,y) = FX(x)FY(y)
 pX|Y(x|y)=pX(x) (discrete X)
 f(x|y) = fX(x) (continuous Y)
 Osteoporosis example: X=ultrasound score, Y = 1 if
osteoporosis case and Y=0 if osteoporosis control
 Are the ultrasound densities the same for cases & controls?
 Concepts generalize to mixed continuous / discrete (etc.),
multiple random variables
Association - Examples
 Discrete random variables
 Let
X=1 if ultrasound < 1750, X=0 otherwise
Y = 1 if osteoporosis case and Y=0 if control
 Suppose the joint mass function in a population of older
women is as follows:
Y=0
Y=1
X=0
.43
.07
X=1
.10
.40
 Are low ultrasound and osteoporosis associated?
Association - Examples
 Continuous random variables
 A pair of random variables (X,Y) is said to be bivariate
normal if it is distributed according to the following density:

1
1
f ( x, y ) 
exp 
 2 1  2
21 2 1   2



2

x  1   y  2   y  2  



 x  1


  2
 

1
2
 1 
  2  



  is the “correlation” parameter
 Bivariate normal X,Y are independent if
f  x 
2

1
 1  x  1  

exp 
 
2
2 
21

  1  

 =0
Association - Examples
 Continuous random variables
Association - Formal
 A related concept to independence: Covariance
 Heuristic: measures degree of linear relationship between
variables.
 Covariance=Cov(X,Y) =E{[X-E(X)][Y-E(Y)]}
= E[XY] - E[X]E[Y]
 Note: Details provided on adjunct handout
 Two relationships now easy to characterize:
a) Pairwise independence ⇒ Cov(X,Y) = 0; not vice versa
b) Var(X+Y) = Var(X)+Var(Y)+2Cov(X,Y)
Association - Example
1800
1700
1600
Ultrasound score
1900
2000
 Correlation of (ultrasound, DPA) = 0.36
0.6
0.7
0.8
0.9
DPA
1.0
1.1
1.2
Association - Formal
 X, Y are dependent (not independent) if there exist x1 є SX,
y1, y2 є SY such that
 FX,Y(x1,y1) ≠ FX(x1)FY(y1)
 pX|Y(x1|y1) ≠ pX(x1) ≠ pX|Y(x1|y2) (discrete X)
 fX|Y(x1|y1) ≠ fX(x1) ≠ fX|Y(x1|y2)(continuous Y)
 A sufficient but not necessary condition: X, Y are
dependent (not independent) if the mean of X varies
conditionally on Y
E{X|Y}
= ∫xfX|Y(x|Y)dx
= ΣxεSx xpX|Y(x|Y)
> Basis of regression!
(continuous)
(discrete)
Data motivation
2000
Ultrasound scores by osteoporosis groups
 In our data sample the
means are certainly
different for the women
with versus without
osteoporosis
1900
X = 1828
1800
X = 1688
 How persuasive is the
1600
1700
evidence of a mean
difference in a
population of older
women?
control
case
Basic paradigm of statistics
 We wish to learn about populations
 All about which we wish to make an inference
 “True” experimental outcomes and their mechanisms
 We do this by studying samples
 A subset of a given population
 “Represents” the population
 Sample features are used to infer population features
 Method of obtaining the sample is important
 Simple random sample: All population elements /
outcomes have equal probability of inclusion
Basic paradigm of statistics
Probability
Truth for
Population
Our example: the
difference in mean
ultrasound measurement
Between older women
With and without
osteoporosis
Observed Value for a
Representative Sample
Statistical inference
Basic paradigm of statistics
 Identify a parameter that summarizes the scientific
quantity of interest
 Obtain a representative sample of the target population
 X1, … , Xn
(possibly within groups)
 Estimate the parameter using the sample (statistic)
 Characterize the variability of the estimate
 Make an inference, characterize its uncertainty, and
draw a conclusion
Basic paradigm
 Identify a parameter that summarizes the scientific
quantity of interest
µ1 = E[X|Y=1] = mean ultrasound given osteoporosis
µ0 = E[X|Y=0] = mean ultrasound given no osteoporosis
Target parameter: µ1 - µ0
 Obtain a representative sample of the target population
 The current sample was a clinical series. Questionable
how representative.
Basic paradigm
 Estimate the parameter using the sample (statistic)
 Many estimation methods have been developed.
 Proposed estimator here: Means based on ECDF
X1  X 0
 “Method of moments”
 Pause: Is this a good estimator? What makes it one?
 Consider a generic estimator (statistic) gn(X)
 Denote the target parameter we wish to estimate by Θ
Basic paradigm
 Some properties of a good estimator gn(X)
 Accuracy with respect to target parameter Θ
 Unbiased: E[gn(X) ] = Θ
 That is, bias = E[gn(X)] – Θ = 0
 Consistent:
For each ε > 0, limn→∞ P{|gn(X)-Θ|> ε} = 0
or (stronger) P{limn→∞ gn(X) = Θ} = 1
 Precision: small Var(gn(X))
 Good tradeoff of accuracy and precision:
 Low mean squared error
= E(gn(X)-Θ)2 = Var(gn (X))+[Bias(gn (X))]2
Basic paradigm
 Estimate the parameter using the sample (statistic) X 1  X 0
 Characterize the variability of the estimate
 Method 1: Calculate the variance directly
 Assumption: Sampling method ⇒ mutual independence
 Then, Var(X 1  X 0 ) = Var(X 1) + Var( X 0)
 Var( X 1 ) = 1/n1 Var(X1) = σ12/n1
 Var( X 1  X 0 ) = σ12/n1 + σ02/n0
*** Basic paradigm ***
 Characterize the variability of the estimate
 Method 1: Var( X 1  X 0 ) = σ12/n1 + σ02/n0
 What does this mean?
 We “have” one possible sample among many that could
have been taken from the population
 Suppose it was drawn randomly
 Definition: The values that the statistic of interest could
have taken over all possible samples (of the same size
randomly drawn from the population), and their probability
measure, is called the sampling distribution of that statistic.
 σ12/n1 + σ02/n0 is the variance of the sampling distribution
 Important term: The standard deviation of the sampling
distribution is called the standard error
Way to “see” the sampling
distribution: Bootstrap-Efron, 1979
 Idea: mimic sampling that produced the original sample.
1. Treat the sample as if it is the whole population. The
original sample statistic becomes the true value
(“truth”) we seek.
2. Draw the first bootstrap sample at random (with
replacement) from the original sample and calculate
the statistic of interest.
3. Repeat this process 1000* times. The distribution of
bootstrapped statistics approximates the sampling
distribution of the statistic.
Efron, B. Bootstrap Methods: Another Look at the Jackknife. Ann Stat 1979; 7:1-26.
24
Repeat 1000 Times
Bootstrapped
Original sample value of -140.48
mean difference:
-150.14
-118.16
-128.30
-168.46
-157.69
.
.
.
-196.65
25
Characterize variability of gn(X)
 Method 2: Var( X 1  X 0 ) estimated by the variance of the
bootstrapped distribution
 Variance is still a bit difficult to interpret
 Method 3: Employ the Central Limit Theorem
 Distributions of sample means converge to normal as n→∞
 Definition: Fgn(X)(s) converges to F(s) in distribution ⇒
limn→∞ P{gn(X)≤s} = F(s) at every continuity point of F.
 Central limit theorem: Let X1,...,Xn be a sequence of
mutually independent RVs with common distribution.
Define Sn=Σi Xi. Then limn→∞ P{(Sn-nμ)/(σn1/2) ≤ z} =
Φ(z) for every fixed z , where Φ is the Normal CDF with
mean=0 and variance=1.
Characterize variability of gn(X)
 Implications of the Central Limit Theorem
 ~ 95% of estimates within +/- 1.96 standard errors of µ1 - µ0
Characterize variability of gn(X)
 Implications of the Central Limit Theorem
 ~ 95% of estimates within +/- 1.96 standard errors of µ1 - µ0
 P{µ1 - µ0 ε X 1  X 0 ± 1.96 SE(X 1  X 0)} = 0.95
 Many gn(X) converge in distribution to normal
 Broadly: P[Θ ε gn(X) ± z(1-α/2) SE{gn(X)}] ≈ 1-α with z(1-α/2)
= Q{(1-α/2)} of the normal distribution with µ=0 and σ=1
 An interval I(X) = [L{gn(X)},U{gn(X)}] satisfying
P{Θ ε I(X)}= 1-α is called a 100x(1-α) confidence interval
Bootstrap Confidence Interval
Fact: the interval
that includes the
middle 95% of the
bootstrapped
statistics covers
the true unknown
population value
for roughly 95% of
samples if the
sampling
distribution of the
statistic or a
function thereof is
roughly Gaussian.
Original sample value of -140.48
29
Analytic Confidence Interval (CI)
 Sample mean difference X 1  X 0 = 1688-1828 = -140
 Var( X 1  X 0 ) = σ12/n1 + σ02/n0; SE = square root of this
 Estimate σ12, σ02 by sample analogs s12, s02 = (5362,13570)
 n1 = n0 = 21
 SE = √{5362/21 + 13570/21} = 30.03
 95% CI = (-140-1.96*30.03,-140+1.96*30.03)= (-199,-81)
= an interval in which we have 95% confidence for
including the difference in mean ultrasound scores
between osteoporotic and non-osteoporotic older women
Analytic CI – A detail
 The preceding calculation assumed
X 1  X 0 / √ (s12/n1 + s02/n0)
approximately distributed as normal mean=0, variance=1
 This does not account for variability in substituting s for σ
 Rather, the correct distribution is “t” (close to normal for
large n)
 With this correction the 95% CI is (-202,-79)
Basic paradigm of statistics
 Make an inference, characterize its uncertainty, and
draw a conclusion
 Inference Method 1: (-202,-79) is a 95% CI for the
difference in mean ultrasound scores between those with,
without osteoporosis.
 Conclusion: Based on the data, we are confident that the
mean ultrasound score is substantially lower in
osteoporotic older women than in those without
osteoporosis.
 The conclusion is a population statement.
Basic paradigm of statistics
 Make an inference, characterize its uncertainty, and
draw a conclusion
 Inference Method 2: Statistical testing
 Two goals
 Assess evidence for a statement / hypothesis
Ultrasound measurements are (or are not) associated with
osteoporosis status
 Make yes/no decision about question of interest:
Are ultrasound measurements associated with osteoporosis
status?
Decision making
 Neyman-Pearson framework for hypothesis testing
 Define complementary hypotheses about the truth in
the population
 Denote as θ the parameter space={possible values of Θ}
 Define θ0, θ1 as distinct subsets of θ corresponding to
the different hypotheses
 Example: No difference in means versus a difference in
means
 “Null” Hypothesis - H0: Θ ε θ0
(H0: µ1 - µ0 = 0)
 “Alternative” Hypothesis - H1: Θ ε θ1 (H1: µ1 - µ0 ≠ 0)
Decision making
 Choose an estimator of Θ, gn(X) (as above)
 Main idea: If the value of gn(X) observed in one’s data
is “unusual” assuming H0, then conclude H0 is not a
reasonable model for the data and decide against it
 Testing – steps:
 Compute distribution of gn(X) for Θ = θ0
 Define a “rejection” region of values far from θ0
occurring with low probability = α when H0 is true
 Reject H0 if gn(x) is in the “rejection” region; do not
reject it otherwise.
Rejection region - example
 We are interested in X 1  X 0
 If H0 is true (µ1 - µ0 = 0), then (X 1  X 0 - 0)/SE is
approximately distributed as normal with mean = 0
and variance = 1 (henceforth, N(0,1) )
Density if
H0 is true
Possible Results
of Hypothesis Testing
Reject when H0 true
Fail to reject when
H1 true
Significance testing
 Variant of Neyman-Pearson
 Based on calculation of a p-value:
The probability of observing a test statistic as or more
extreme than occurred in sample under the null
hypothesis
 As or more extreme: as different or more different from the
hypothesized value
 p-value sometimes used as measure of evidence
against null, and often, "for" alternative
Osteoporosis data
 Are ultrasound measurements associated with osteoporosis
status?
 Hypotheses: H0: µ1 - µ0 = 0 vs. H1: µ1 - µ0 ≠ 0
 Test statistic:
X 1  X/√
(s12/n1 + s02/n0) =140/30.03=4.66
0
 Rejection region: > 1.96 or < -1.96
 4.66 > 1.96, thus we reject H0 (again, approximate: “t”)
 p-value: probability of a value as far or farther from 0 than
4.66 if the true mean difference were 0
 P{gn(X) > 4.66 or gn(X) < -4.66} = 3.16E-06
 Conclusion: Data strongly support a “yes” answer.
Data motivation
 Osteoporosis screening
 Scientific question: Can we detect osteoporosis earlier
and more safely?
 Some related statistical questions:
 Are ultrasound measurements associated with
osteoporosis status? - Testing indicates “yes”.
 How strong is the evidence of association? Strong (95%
CI very substantially excludes values near 0)
 By how much do the mean ultrasound values differ
between those with and without osteoporosis? We
estimate that older women with osteoporosis have mean
140 dB/MHz lower than those without osteoporosis
 How precisely can we determine that difference?
Standard error = 30.03; 95% CI = (-202,-79)
Main points
 Variables (characteristics, features, etc.) are associated
if they are statistically dependent.
 Covariance / correlation measure linear association
 We use representative samples to study features of
populations
 Estimators (“statistics”) provide approximations of
population parameters
 Possible values and their probabilities are characterized
by sampling distributions
 Standard errors and confidence intervals summarize
their associated variability / uncertainty
Main points
 Statistical tests evaluate evidence and inform
decisions about scientific (and other) questions
 Neyman-Pearson paradigm
 Hypothesis testing
 Significance testing