Probability and Statistical Inference
Gehlbach: Chapter 8

Objective of Statistical Analysis
• To answer research questions using observed data, via data reduction and analysis of variability
• To make an inference about a population based on information contained in a sample from that population
• To provide an associated measure of how good the inference is

Basic Concepts of Statistics
• Estimation & inference
  – Method of testing hypotheses
  – Based on statistics: f(X)
• Population: parameters; sample X: sample values
• Sampling
  – Method of collecting data
  – Based on probability

General Approach to Statistical Analysis
• Population distribution: random variables, parameters µ, σ
• Sampling: samples of size N generate data
• Descriptive statistics (figures, tables); estimation: statistics X̄, SD
• Statistical tests of hypothesis
• Inference about the population

Outline
• Probability
  – Definition
  – Probability laws
  – Random variables
  – Probability distributions
• Statistical inference
  – Definition
  – Sample vs. population
  – Sampling variability
  – Sampling problems
  – Central Limit Theorem
  – Hypothesis testing
  – Test statistics
  – P-value calculation
  – Errors in inference
  – P-value adjustments
  – Confidence intervals

We disagree with Stephen
• A working understanding of P-values is not difficult to come by.
• For the most part, statistics and clinical research can work well together.
• Good collaborations result when researchers have some knowledge of design and analysis issues

Probability

Probability and the P-value
• You need to understand what a P-value means
• A P-value represents a probabilistic statement
• You need to understand the concept of probability distributions
• More on P-values later

Definition of Probability
• An experiment is any process by which an observation is made
• An event (E or Ei) is any outcome of an experiment
• The sample space (S) is the set of all possible outcomes of an experiment
• Probability: a measure based on the sample space S; in the simplest case it is empirically estimated by (# times event occurs) / (total # trials)
  E.g.: Pr(red car) = (# red cars seen) / (total # cars)
• Probability is the basis for statistical inference

Axiomatic Probability (laying down "the laws")
For any sample space S containing events E1, E2, E3, …, we assign a number, P(Ei), called the probability of Ei, such that:
1. 0 ≤ P(Ei) ≤ 1
2. P(S) = 1
3. If E1, E2, E3, … are pairwise mutually exclusive events in S, then
   P(E1 ∪ E2 ∪ E3 ∪ …) = Σi P(Ei)

Union and Intersection: Venn Diagrams
• Union of E1 and E2: "E1 or E2", denoted E1 ∪ E2
• Intersection of E1 and E2: "E1 and E2", denoted E1 ∩ E2

Laws of Probability (the sequel)
• Let Ē ("E complement") be the set of events in S not in E; then P(Ē) = 1 − P(E)
• P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)
• The conditional probability of E1 given that E2 has occurred:
  P(E1 | E2) = P(E1 ∩ E2) / P(E2)
• Events E1 and E2 are independent if P(E1 ∩ E2) = P(E1) P(E2)

Conditional Probability
• Restrict yourself to a "subspace" of the sample space

                Male    Female
  Infection      20%      10%
  No infection   35%      35%

• P(I|M) = P(I∩M)/P(M) = 0.2/0.55 = 0.36
• P(M|I) = P(I∩M)/P(I) = 0.2/0.3 = 0.67

Conditional probability examples
• Categorical data analysis: odds ratio = ratio of the odds of two conditional probabilities
• Survival analysis: conditional probabilities of the form P(alive at time t1+t2 | survived to t1)

Random Variables (where the math begins)
• A random variable is a (set) function with domain S and real-valued range (i.e., a real-valued function defined over a sample space)
• E.g.: tossing a coin, let X = 1 if heads, X = 0 if tails
  – P(X=0) = P(X=1) = ½
• Many times the random variable of interest will be the realized value of the experiment (e.g., if X is the b-segment PSV from RDS)
• Random variables have probability distributions

Probability Distributions
Two types:
• Discrete distributions (and discrete random variables) are represented by a finite (or countable) number of values: P(X=x) = p(x)
• Continuous distributions (and random variables) are represented by a real-valued interval: P(x1 < X < x2) = F(x2) − F(x1)

Expected Value & Variance
• Random variables are typically described using two quantities:
  – Expected value = E(X) (the mean, usually "μ")
  – Variance = V(X) (usually "σ²")
• Discrete case:
  E(X) = Σi xi p(xi)    V(X) = Σi [xi − E(X)]² p(xi)
• Continuous case (integrals over −∞ to ∞):
  E(X) = ∫ x f(x) dx    V(X) = ∫ (x − μ)² f(x) dx

Discrete Distribution Example
Binomial:
– Experiment consists of n identical trials
– Each
trial has only 2 outcomes: success (S) or failure (F)
– P(S) = p for a single trial; P(F) = 1 − p = q
– Trials are independent
– R.V. X = the number of successes in n trials
  p(x) = (n choose x) p^x (1 − p)^(n−x)

Continuous Distribution Example
Normal (Gaussian):
• The normal distribution is defined by its probability density function, given as
  f(x) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²)),  −∞ < x < ∞
  for parameters μ and σ, where σ > 0.
• X ~ N(μ, σ²), E(X) = μ and V(X) = σ²

Same Variance, Different Means
[Figure: two normal densities with common variance, X ~ N(μ1, σ²) and X ~ N(μ2, σ²)]

Same Mean, Different Variances
[Figure: two normal densities with common mean, X ~ N(μ, σ1²) and X ~ N(μ, σ2²)]

Statistical Inference

Statistical Inference
• Is there a difference in the population?
• You do not know about the population, just the sample you collected.
• Develop a probability model
• Infer characteristics of a population from a sample
• How likely is it that the sample data support the null hypothesis?

Statistical Inference
[Figure: inference from a sample (mean = 16.2) to a population (mean = ?)]

Definition of Inference
• Infer a conclusion/estimate about a population based on a sample from the population
• If you collect data from the whole population, you don't need to infer anything
• Inference = conducting hypothesis tests (for P-values), estimating 95% CIs

Sample vs. Population (example)
• "The primary sample [involved] students in the 3rd through 5th grades in a community bordering a major urban center in North Carolina… The sampling frame for the study was all third- through fifth-grade students attending the seven public elementary schools in the community (n=2,033). From the sampling frame, school district evaluation staff generated a random sample of 700 students."
Source: Bowen, NK. (2006) Psychometric properties of the Elementary School Success Profile for Children. Social Work Research, 30(1), p. 53.

Philosophy of Science
• Idea: We posit a paradigm and attempt to falsify that paradigm.
• Science progresses faster by attempting to falsify a paradigm than by attempting to corroborate it. (Thomas S. Kuhn. 1970.
The Structure of Scientific Revolutions. University of Chicago Press.)

Philosophy of Science
• Easier to collect evidence to contradict something than to prove truth?
• The fastest way to progress in science under a paradigm of falsification is through perturbation experiments.
• In epidemiology,
  – we are often unable to do perturbation experiments
  – it becomes a process of accumulating evidence
• Statistical testing provides a rigorous, data-driven framework for falsifying hypotheses

What is Statistical Inference?
• A generalization made about a larger group or population from the study of a sample of that population.
• Sampling variability: repeat your study (sample) over and over again. The results from each sample would be different.

Sampling Variability
[Figures: two repeated samples from the same population yield different sample means (16.2, then 17.1), while the population mean remains unknown]

Sampling Problems
• Low response rate
• Refusals to participate
• Attrition

Low Response Rate
• Response rate = % of the targeted sample that supplies the requested information
• Statistical inferences extend only to individuals who are similar to completers
• Low response rate ≠ nonresponse bias, but it is a possible symptom

Low Response Rate (examples)
• "One hundred six of the 360 questionnaires were returned, a response rate of 29%."
Source: Nordquist, G. (2006) Patient insurance status and do-not-resuscitate orders: Survival of the richest? Journal of Sociology & Social Welfare, 33(1), p. 81.
• "At the 7th week, we sent a follow-up letter to thank the respondents and to remind the nonrespondents to complete and return their questionnaires. The follow-up letter generated 66 additional usable responses."
Source: Zhao JJ, Truell AD, Alexander MW, Hill IB. (2006) Less success than meets the eye? The impact of Master of Business Administration education on graduates' careers. Journal of Education for Business, 81(5), p. 263.
• "The response rate, however, was below our expectation.
We used 2 procedures to explore issues related to non-response bias. First, there were several identical items that we used in both the onsite and mailback surveys. We compared the responses of the non-respondents to those of respondents for [both surveys]. No significant differences between respondents and non-respondents were observed. We then conducted a follow-up telephone survey of non-respondents to test for potential non-response bias as well as to explore reasons why they had not returned their survey instruments…"
Source: Kyle GT, Mowen AJ, Absher JD, Havitz ME. (2006) Commitment to public leisure service providers: A conceptual and psychometric analysis. Journal of Leisure Research, 38(1), 86-87.

Refusals to Participate
• A similar kind of problem to having a low response rate
• Statistical inferences may extend only to those who agreed to participate, not to all who were asked to participate
• Compare those who agree with those who refuse

Refusals to Participate (example)
• "Participants were 38 children aged between 7 and 9 years. Children were from working- or middle-class backgrounds, and were drawn from 2 primary schools in the north of England. Letters were sent to the parents of all children between 7 and 9 in both schools seeking consent to participate in the study. Around 40% of the parents approached agreed for their children to take part."
Source: Meins E, Fernyhough C, Johnson F, Lidstone J. (2006) Mind-mindedness in children: Individual differences in internal-state talk in middle childhood. British Journal of Developmental Psychology, 24(1), p. 184.

Attrition
• Individuals who drop out before the study's end (not an issue for every study design)
• Differences between those who drop out and those who stay in are called attrition bias.
• Conduct a follow-up study on dropouts
• Compare baseline data

Attrition (example)
• "…Of the 251 men who completed an assigned intervention, about a fifth (19%) failed to return for a 1-month assessment and more than half (54%) for a 3-month assessment… Conclusions also cannot be generalized beyond the sample [partly because] attrition in the evaluation study was relatively high and it was not random. Therefore, findings cannot be generalized to those least likely to complete intervention sessions or follow-up assessments."
Source: Williams ML, Bowen AM, Timpson SC, Ross MW, Atkinson JS. (2006) HIV prevention and street-based male sex workers: An evaluation of brief interventions. AIDS Education & Prevention, 18(3), pp. 207-214.
• "The 171 participants who did not return for their two follow-up visits represent a significant attrition rate (34%). A comparison of demographic and baseline measures indicated that [those who stayed in the study versus those who did not] differed on age, BMI, when diagnosed, language, ethnicity, HbA1c, PCS, MCS and symptoms of depression (CES-D)."
Source: Maljanian R, Grey N, Staff I, Conroy L. (2005) Intensive telephone follow-up to a hospital-based disease management model for patients with diabetes mellitus. Disease Management, 8(1), p. 18.

Back to Inference…

Motivation
• Typically you want to see if there are differences between groups (i.e., treatment vs. control)
• Approach this by looking at the "typical" difference, or the difference "on average", between groups
• Thus we look at differences in central tendency to quantify group differences
• Test whether two sample means are different (assuming the same variance) in an experiment

Same Variance, Different Means
[Figure: two normal densities with common variance, X ~ N(μ1, σ²) and X ~ N(μ2, σ²)]

Central Limit Theorem
• The CLT states that regardless of the distribution of the original data, the average of the data is Normally distributed
• Why such a big deal?
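The CLT claim above can be checked with a short simulation: sample means drawn from a strongly skewed (exponential) population still center on the population mean, with spread shrinking like σ/√n. A minimal sketch, assuming NumPy is available; the function name and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampling_distribution(n, reps=20_000):
    """Draw `reps` samples of size n from a skewed (exponential, mean 1)
    population and return the sample means."""
    samples = rng.exponential(scale=1.0, size=(reps, n))
    return samples.mean(axis=1)

for n in (2, 30, 200):
    means = sampling_distribution(n)
    # CLT: the mean of the averages stays near the population mean (1.0),
    # and their standard deviation shrinks like sigma / sqrt(n)
    print(n, round(means.mean(), 3), round(means.std(ddof=1) * np.sqrt(n), 3))
```

A histogram of `means` for n = 200 would look close to a Normal curve even though the raw exponential data are far from Normal.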
• Allows hypothesis tests (P-values) and CIs to be calculated

Central Limit Theorem
• If a random sample is drawn from a population, a statistic (like the sample average) follows a distribution called a "sampling distribution".
• The CLT tells us the sampling distribution of the average is a Normal distribution, regardless of the distribution of the original observations, as the sample size increases.

[Figure: sampling distributions of the number of infections for control and treatment groups, X ~ N(μC, σ²) and X ~ N(μT, σ²); P-value = 0.164]

What is the P-value?
• The P-value represents the probability of getting a test statistic as extreme or more extreme under the null hypothesis
• That is, the P-value is the chance of obtaining your data results under the assumption that your null hypothesis is true.
• If this probability is low (say P < 0.05), then you conclude your data results do not support the null being true and "reject the null hypothesis."

Hypothesis Testing & P-value
• P-value is: Pr(observed data results | null hypothesis is true)
• If the P-value is low, then conclude the null hypothesis is not true and reject the null ("in data we trust")
• How low is low?

Statistical Significance
• If the P-value is as small as or smaller than the pre-determined Type I error (size) α, we say that the data are statistically significant at level α.
• What value of α is typically assumed?

Probability Distribution & P-value
[Figure: a one-sided test on the mean number of infections (x-axis from 4.4 to 5.6); the critical limit divides the "fail to reject H0" region from the "reject H0" critical region]

2-sided P-value & Probability Distribution
[Figure: the same distribution with critical limits in both tails; H0 is rejected in either critical region]

Why P-value < 0.05?
• This arbitrary cutoff has evolved over time as something of a precedent.
• In legal matters, courts typically require statistical significance at the 5% level.

The P-value
• The P-value is a continuum of evidence against the null hypothesis, not just a dichotomous indicator of significance.
• Would you change your standard-of-care surgery procedure for p = 0.049999 vs. p = 0.050001?

Gehlbach's beefs with the P-value
• The size of the P-value does not indicate the [clinical] importance of the result
• Results may be statistically significant but practically unimportant
• Differences that are not statistically significant are not necessarily unimportant ***
• Any difference can become statistically significant if N is large enough
• Even if there is statistical significance, is there clinical significance?

Controversy around HT and the P-value
• "A methodological culprit responsible for spurious theoretical conclusions" (Meehl, 1967; see Greenwald et al., 1996)
• "The P-value is a measure of the credibility of the null hypothesis. The smaller the P-value is, the less likely one feels the null hypothesis can be true."

HT and the P-value
• "It cannot be denied that many journal editors and investigators use P-value < 0.05 as a yardstick for the publishability of a result."
• "This is unfortunate because not only the P-value, but also the sample size and the magnitude of a physically important difference determine the quality of an experimental finding."
• "[We] endorse the reporting of estimation statistics (such as effect sizes, variabilities, and confidence intervals) for all important hypothesis tests." (Greenwald et al., 1996)

Test Statistics
• Each hypothesis test has an associated test statistic.
• A test statistic measures compatibility between the null hypothesis and the data.
• A test statistic is a random variable with a certain distribution.
• A test statistic is used to calculate the probability (P-value) for the test of significance.
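To make the role of a test statistic concrete, a one-sample z-style test can be computed directly: standardize the distance between the sample mean and the null-hypothesis mean, then convert it to a two-sided P-value via the standard Normal CDF. A minimal sketch using only the Python standard library; the data values are invented for illustration:

```python
import math
from statistics import mean, stdev

def z_test(sample, mu0):
    """One-sample z-style test: compare the sample mean to the
    hypothesized population mean mu0; return (z, two-sided P-value)."""
    n = len(sample)
    # test statistic: (estimate - null value) / standard error
    z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    # two-sided P-value: probability of a statistic at least this extreme
    # under H0, using the standard Normal CDF Phi(x) = (1 + erf(x/sqrt(2)))/2
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# hypothetical infection counts tested against a null mean of 5.0
data = [4.1, 5.3, 4.8, 5.9, 4.4, 5.1, 4.7, 5.0, 4.2, 4.9]
z, p = z_test(data, mu0=5.0)
print(round(z, 2), round(p, 3))
```

With a sample this small, a t distribution (as in the CI formulas later in these slides) would be the more careful choice; the Normal version keeps the sketch dependency-free.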
How a P-value is Calculated
• A data summary statistic is estimated (like the sample mean)
• A "test" statistic is calculated which relates the data summary statistic to the null hypothesis about the population parameter (e.g., the population mean)
• The observed/calculated test statistic is compared to what is expected under the null hypothesis using the sampling distribution of the test statistic
• The probability of finding the observed test statistic (or one more extreme) is calculated; this is the P-value

Hypothesis Testing
1. Set up a null and an alternative hypothesis
2. Calculate the test statistic
3. Calculate the P-value for the test statistic
4. Based on the P-value, make a decision to reject or fail to reject the null hypothesis
5. Make your conclusion

Errors in Statistical Inference

The Four Possible Outcomes in Hypothesis Testing

                          Truth in population
  Decision based on data   H0 true                            H0 false
  Fail to reject H0        H0 is true & H0 is not rejected    H0 is false & H0 is not rejected
  Reject H0                H0 is true & H0 is rejected        H0 is false & H0 is rejected

Note the similarities to diagnostic tests!

The Four Possible Outcomes in Hypothesis Testing

                          TRUTH
  DECISION                 H0 true             H0 false
  Fail to reject H0        (correct)           Type II error (β)
  Reject H0                Type I error (α)    Power (1−β)

Conditioned on column!

Type I Errors
• α = Pr(Type I error) = Pr(reject H0 | H0 is true)
• "Innocent until proven guilty": innocence is rejected but the defendant is truly innocent (concluded guilty).

Type II Errors
• β = Pr(Type II error) = Pr(do not reject H0 | H0 is false)
• "Innocent until proven guilty": innocence is not rejected but the defendant was truly guilty (concluded innocent).

P-value Adjustments

P-value adjustments
• Sometimes adjustments for multiple testing are made
• Bonferroni: adjusted α = α / (# of tests)
• α is usually 0.05 (the P-value cutoff)
• Bonferroni is a common (but conservative) adjustment; many others exist

P-value adjustments (example)
• "An alpha of .05 was used for all statistical tests.
The Bonferroni correction was used, however, to reduce the chance of committing a Type I error. Therefore, given that five statistical tests were conducted, the adjusted alpha used to reject the null hypothesis was .05/5 or alpha = .01."
Source: Cumming-McCann A. (2005) An investigation of rehabilitation counselor characteristics, white racial attitudes, and self-reported multicultural counseling competencies. Rehabilitation Counseling Bulletin, 48(3), 170-171.

Confidence Intervals (CIs)

Confidence Intervals
• What is the idea of a confidence interval? Calculate a range of reasonable values (an interval) around the point estimate that should include the population value 95% of the time if you were to collect sample data over and over again.

Confidence Intervals
• 95% confidence: if 100 different samples were drawn from the same population and 100 intervals were calculated, approximately 95 of them would contain the population mean.

Confidence Intervals
• 100(1−α)% confidence interval for a mean:
  X̄ ± t(df=n−1, 1−α/2) · sd/√n
• 100(1−α)% confidence interval for a proportion:
  p̂ ± z(1−α/2) · √(p̂(1−p̂)/n)

95% Confidence Intervals
• 95% confidence interval for a mean:
  X̄ ± 2 · sd/√n
• 95% confidence interval for a proportion:
  p̂ ± 2 · √(p̂(1−p̂)/n)

Bayesian vs.
Classical Inference
• There are 2 main camps of statistical inference:
  – Frequentist (classical) statistical inference
  – Bayesian statistical inference
• Bayesian inference incorporates "past knowledge" about the probability of events using "prior probabilities"
• The Bayesian paradigm assumes the parameters of interest follow a statistical distribution of their own; frequentist inference assumes parameters are fixed
• Statistical inference is then performed to ascertain the "posterior probability" of outcomes, depending on:
  – the data
  – the assumed prior probabilities

Schedule

  Seminar #   Topic                                                      Date    Time
  1           Study design and data collection                           9/10    1:30 – 3:00
  2           Probability and statistical inference                      9/17    2:00 – 4:00
  3           Data summary measures and graphical display of results*    10/1    2:00 – 4:00
  4           Survey of statistical analysis techniques (part I)         10/8    2:00 – 4:00
  5           Survey of statistical analysis techniques (part II)        10/15   2:00 – 4:00
  6           Evidence-based medicine and decision analysis              11/5    2:00 – 4:00
  7           Reading and reviewing analyses in medical literature*      11/19   2:00 – 4:00
  8           Review of student-selected medical publications*           12/3    2:00 – 4:00

*10/01 seminar will meet in Wachovia 2314
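As a worked complement to the confidence-interval formulas presented earlier, both intervals can be computed in a few lines. A minimal sketch using only the Python standard library; the data values are invented, and z = 1.96 is used in place of the exact t critical value (a close approximation for moderate-to-large n):

```python
import math
from statistics import mean, stdev

def mean_ci(sample, z=1.96):
    """Approximate 95% CI for a mean: x_bar +/- z * sd/sqrt(n).
    (z = 1.96 stands in for the t critical value.)"""
    n = len(sample)
    half_width = z * stdev(sample) / math.sqrt(n)
    return mean(sample) - half_width, mean(sample) + half_width

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% CI for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# hypothetical data: 30 infections observed among 100 patients
lo, hi = proportion_ci(successes=30, n=100)
print(round(lo, 3), round(hi, 3))
```

Repeating such a calculation on many fresh samples from the same population would, per the slides above, yield intervals that cover the true value about 95% of the time.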