Inferential Statistics: A Frequentist Perspective
Mark A. Weaver, PhD
Family Health International
Office of AIDS Research, NIH ICSSC, FHI
Goa, India, September 2009

Outline
1. What are inferential statistics?
2. Inference space:
  – randomization versus random sampling
3. Two methods of statistical inference
  – Hypothesis Testing
  – Estimation and Confidence Intervals
Objective: To gain an appreciation for some concepts underlying (frequentist) statistical inference.

What Are Inferential Statistics?
• Inferential Statistics – methods for drawing conclusions about a population based on data from a sample of that population
• Q: What allows us to make valid inferences about a population based only on a sample?
• A: PROBABILITY
• Q: Where does probability come from?
• A: A random process
• i.e., random sampling or randomization

What is Probability?
• Frequentist definition – the long-term relative frequency of a (hypothetically) repeatable event:
  – Examples:
    • Flipping a fair coin many times, P(heads) = 50%
    • Rolling a fair 6-sided die many times, P(roll 1 or 2) = 2/6 = 1/3
    • Probability that you will win a fair lottery if 10,000,000 tickets sold and only one winner: 1 / 10,000,000
• Alternative definition (used in Bayesian statistics) – subjective probability as a measure of belief
• We will only discuss frequentist methods

Two Special Cases
• Two important cases for which we can derive exact probability distributions on which to base inferences:
  1. Random sampling from a finite population
  2. Randomization
• Note: important distinction between randomization and random sampling!
  – Both induce randomness required for statistical inference.
  – However, allow for different “spaces” of inference
  – Rarely used together in the same study!
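
The frequentist definition of probability as a long-term relative frequency can be illustrated by simulation. The following is a minimal sketch, not part of the original slides: the helper name `relative_frequency`, the seed, and the number of repetitions are all assumptions chosen for illustration. It repeats the coin flip and the die roll many times and reports how often each event occurs.

```python
import random


def relative_frequency(event, draw, repetitions, seed=2009):
    """Fraction of repetitions in which `event` occurs (hypothetical helper).

    `draw` takes a random generator and returns one outcome;
    `event` takes an outcome and returns True or False.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(repetitions) if event(draw(rng)))
    return hits / repetitions


# Flip a fair coin many times: the relative frequency of heads.
coin = relative_frequency(
    event=lambda flip: flip == "H",
    draw=lambda rng: rng.choice("HT"),
    repetitions=200_000,
)

# Roll a fair 6-sided die many times: the relative frequency of a 1 or 2.
die = relative_frequency(
    event=lambda roll: roll in (1, 2),
    draw=lambda rng: rng.randint(1, 6),
    repetitions=200_000,
)

print(f"Relative frequency of heads:       {coin:.3f}")
print(f"Relative frequency of rolling 1-2: {die:.3f}")
```

With a large number of repetitions the two relative frequencies should settle near 1/2 and 1/3; with only a handful of repetitions they can be far off, which is why the definition refers to the long term.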

Random Sampling from a Finite Population
• Consider a large, but finite, population of size N
  – Assume that the population is well-defined such that we could (theoretically) list every member
  – We want to determine something about that population
• We take a sample of size n < N
  – Using some random sampling method

Random Sampling from a Finite Population
• In theory, we could list all possible samples of size n
• Thus, we could compute the selection probability for any individual in the population simply by counting
• In simple random sampling (SRS), it is easy to show that the selection probability for any individual is n / N
• Sampling inference in a nutshell:
  – selection probability can be used to “up-weight” each individual’s contribution to the sample to estimate the number of similar individuals in the population

Random Sampling from a Finite Population
• Simple example: an urn with 1000 balls, some red and some black, “well-mixed”
  – Randomly select 10 balls, so the selection probability is P(S) = 10/1000 = 0.01
  – Observe 1 red – estimate for number of red balls in urn is 1 / 0.01 = 100
  – Mix, repeat sample, observe 2 red, estimate 200 in urn
  – Keep repeating sample, average of estimates will be very close to true value (definition of unbiased)
• What if there really were 150 red balls?

Random Sampling from a Finite Population
Q: What is the statistical inference space for results from such a random sample?
  – That is, to whom do the results directly apply?
a) Some larger population that contains “similar types” of people as the finite population
b) The finite population from which sample was drawn
c) Some other population altogether

Randomization
• Example: randomized controlled trial (RCT) to compare treatments A and B
• Enroll N people, randomize about N/2 to A, N/2 to B
  – Study population (N people) typically convenience sample
    • Patients must meet inclusion criteria and provide consent
    • i.e., not random!
• In theory, we could list all possible random assignments among these N participants
  – each assignment is equally likely (usually)
• Thus, we can calculate the probability of observing our particular random assignment
  – from which we can calculate exact p-values

Randomization
• Example, randomize 4 patients to either A or B
• 6 possible allocations
• Randomly select one; P(S) = 1/6 for each
• Observe results
  A: ID 1 → 10, ID 4 → 6
  B: ID 2 → 4, ID 3 → 0
  – TS (test statistic): difference in means, 8 – 2 = 6

  Possible allocations (patient IDs):
  A       B
  1, 2    3, 4
  1, 3    2, 4
  1, 4    2, 3
  3, 4    1, 2
  2, 4    1, 3
  2, 3    1, 4

Randomization
• Suppose both treatments have exactly the same underlying effect (H0 true)
• Since assignment was random, any other assignment would have produced the same individual results

  Outcomes under each possible allocation:
  A        B
  10, 4    0, 6
  10, 0    4, 6
  10, 6    4, 0
  0, 6     10, 4
  4, 6     10, 0
  4, 0     10, 6

Randomization
• Suppose both treatments have exactly the same underlying effect (H0 true)
• Since assignment was random, any other assignment would have produced the same individual results
• P-value = probability of a TS as or more extreme than the one observed, under the null hypothesis
• Exact 1-sided p-value: 1 / 6 = 0.167

  A        B        TS
  10, 4    0, 6      4
  10, 0    4, 6      0
  10, 6    4, 0      6
  0, 6     10, 4    -4
  4, 6     10, 0     0
  4, 0     10, 6    -6

Randomization
Q: What is the statistical inference space for results from a randomized experiment?
  – That is, to whom does this p-value directly apply?
a) Some larger population that contains “similar types” of people as those enrolled in the trial
b) The finite population consisting of all possible random assignments of the people actually enrolled in trial
c) Some other population altogether

Random Sampling from Infinite Population
• Occasionally, study sample really is randomly sampled from a huge or ambiguous population
  – E.g., randomly selecting clients from population of all clients who attend a clinic in specific time period

Random Sampling from Infinite Population
• However, more typically, random sampling is implicitly assumed when
  – Applying statistical models, p-values, or confidence intervals to observational data
  – Generalizing statistical inferences from an RCT to some broader population
• Such generalizations have been called nonstatistical inference, or inferences without a basis in probability
• “Clinical judgement” as opposed to “statistical inference”

“Arguments regarding the ‘representativeness’ of a nonrandomly selected sample are irrelevant to the question of its randomness: a random sample is random because of the sampling procedure used to select it, not because of the composition of the sample.”
Edgington and Onghena (2007), Randomization Tests, 4th ed.

“I have never met random samples except when sampling has been under human control and choice as in random sampling from a finite population or in experimental randomization in the comparative experiment.”
Kempthorne (1979), Sankhya

“In most epidemiologic studies, randomization and random sampling play little or no role in the assembly of study cohorts.
I therefore conclude that probabilistic interpretations of conventional statistics are rarely justified, and that such interpretations may encourage misinterpretation of nonrandomized studies.”
Greenland (1990), Epidemiology

The Logic of Hypothesis Testing
• Statistical hypothesis – a statement about parameters of a population
• Null hypothesis (H0) – the hypothesis to be tested, often includes hypothesis of no difference
  – H0: Avg. BP in group A ≥ Avg. BP in group B
• Alternative hypothesis (HA) – corresponds to the research hypothesis
  – HA: Avg. BP in group A < Avg. BP in group B
• H0 and HA – mutually exclusive and exhaustive

The Logic of Hypothesis Testing
• Goal of hypothesis testing – reject H0!
• How do we decide to reject or not?
  – Obtain data via a random process
  – If data are consistent with H0, then do not reject
  – Otherwise, if data are inconsistent, then reject H0 and conclude HA
• P-value = probability of getting sample data as or more extreme than observed by chance, assuming H0
• Decision rule:
  – If p-value > α, do not reject H0
  – If p-value ≤ α, reject H0

Type I and Type II Errors and Power
• In truth, H0 is either true or false, but we never get to know the truth.
• Based only on observed data, we decide to either reject H0 or not.

                      Truth
  Decision            H0 is true                               H0 is false
  Do not reject H0    Correct decision (1 - α)                 Type II error (β)
  Reject H0           Type I error (α = significance level)    Correct decision (1 - β) = Power

Interpreting “The Power to Detect”
• Suppose protocol says “study has 90% power to detect a mean difference between groups of 5.”
• This does not mean:
  1. There is a 90% chance to conclude that the true mean difference between groups is at least 5
  2. That there is a 90% chance of observing a mean difference between groups of at least 5
• It simply means that there is a 90% chance of making a decision to reject the null hypothesis if the true (but unknown) mean difference is 5

Interpreting Results
• Suppose we decided before the study that α = 0.05, study designed with 90% power to detect mean difference of 5
• Suppose we observe p-value = 0.06
  – Reject or not?
  – What is the probability that we would be making a type II error if we decide to not reject H0 in this case?
• Now suppose we observe p-value of 0.001
  – Reject or not?
  – What is the probability of a type I error here?
• How does knowing a study’s power help interpret results?

Absence of Evidence …
Is not evidence of absence!
• That is, not rejecting the null hypothesis does not provide evidence that the null is true.
• P-values provide evidence against the null.
• Avoid the following conclusions:
  – “no difference between groups”
  – “treatment was ineffective”
  – “no association between X and Y”
• Instead, say “insufficient evidence of” difference, effect, or association

Genesis of a Confidence Interval
[Figure: intervals of -1.96 SE to +1.96 SE drawn around the true value θ and around the estimate θ̂]

Interpreting a Confidence Interval
• For a 95% CI, we have 95% “confidence” that the true value is somewhere within the interval
  – This is not a probability statement
  – True value is either in the observed interval or it is not
• Have no more confidence regarding the center of the interval than we do around the endpoints
  – True value can be anywhere within the interval, not necessarily near the middle
  – Point estimate should not necessarily be regarded as “best estimate” or most likely value for true value

Confidence Intervals and Hypothesis Tests
• Close relationship between confidence intervals and hypothesis tests
• In fact, a confidence interval can be regarded as a family of hypothesis tests
  – Any value not contained within a 100(1 – α)% CI would have been rejected by a 2-sided
    test of size α
  – Example: 95% CI for OR (1.25, 2)
  – CI does not contain the value 1
  – Thus, can reject H0: OR = 1 at the 5% level
  – CIs used frequently for conducting tests in certain contexts, e.g., non-inferiority designs

Key Points
• Statistical inference requires a random process
• Random sampling and randomization are not the same thing
• Goal of hypothesis testing is (almost) always to reject the null hypothesis
  – Deciding not to reject tells you little
  – Don’t over-interpret non-significant results
• Confidence intervals provide a range of plausible values for true parameters, none of which is more “likely” to be the true one
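
The exact p-value in the four-patient randomization example can be reproduced by listing every possible allocation, which is the counting argument the slides describe. The following is a minimal sketch under stated assumptions: the names `outcomes`, `mean_difference`, and `exact_one_sided_p` are hypothetical, and the code assumes that under H0 each patient's outcome is the same whichever treatment is assigned.

```python
from fractions import Fraction
from itertools import combinations

# Observed outcomes by patient ID, taken from the randomization example.
outcomes = {1: 10, 2: 4, 3: 0, 4: 6}
observed_a = (1, 4)  # patients actually randomized to treatment A


def mean_difference(group_a, outcomes):
    """Test statistic: mean outcome in A minus mean outcome in B."""
    group_b = [pid for pid in outcomes if pid not in group_a]
    mean_a = Fraction(sum(outcomes[pid] for pid in group_a), len(group_a))
    mean_b = Fraction(sum(outcomes[pid] for pid in group_b), len(group_b))
    return mean_a - mean_b


def exact_one_sided_p(outcomes, observed_a):
    """Share of equally likely allocations with a TS at least as large as observed."""
    observed_ts = mean_difference(observed_a, outcomes)
    # Every way of choosing which patients receive A; the rest receive B.
    allocations = list(combinations(sorted(outcomes), len(observed_a)))
    as_extreme = sum(
        1 for alloc in allocations
        if mean_difference(alloc, outcomes) >= observed_ts
    )
    return observed_ts, Fraction(as_extreme, len(allocations))


ts, p_value = exact_one_sided_p(outcomes, observed_a)
print(f"Observed TS = {ts}, exact 1-sided p-value = {p_value}")
```

Only the observed allocation gives a difference in means as large as 6, so the count is 1 out of 6 allocations, matching the exact 1-sided p-value of 1/6 in the slides. The p-value refers only to the possible random assignments of the four enrolled patients, not to any wider population.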