Introduction to Biostatistics for Clinical and Translational Researchers
KUMC Departments of Biostatistics & Internal Medicine
University of Kansas Cancer Center
FRONTIERS: The Heartland Institute of Clinical and Translational Research

Course Information
Jo A. Wick, PhD
Office Location: 5028 Robinson
Email: [email protected]
Lectures are recorded and posted at http://biostatistics.kumc.edu under ‘Educational Opportunities’

Course Objectives
• Understand the role of statistics in the scientific process
• Understand the features, strengths, and limitations of descriptive, observational, and experimental studies
• Distinguish between association and causation
• Understand the roles of chance, bias, and confounding in the evaluation of research

Course Calendar
• June 29: Descriptive Statistics and Core Concepts
• July 6: Hypothesis Testing
• July 13: Linear Regression & Survival Analysis
• July 20: Clinical Trial & Experimental Design

Probability Review
An experiment is a process whose results are not known until after it has been performed. The range of possible outcomes is known in advance; we do not know the exact outcome, but we would like to know the chances of its occurrence. The probability of an outcome E, denoted P(E), is a numerical measure of the chances of E occurring, with 0 ≤ P(E) ≤ 1.

Probability
The most common definition of probability is the relative frequency view:

P(x = a) = (# of times x = a) / (total # of observations of x)

Probabilities for the outcomes of a random variable x are represented through a probability distribution. [Figure: bar chart of a probability distribution P(x) over outcomes x = 0 to 14.]

Population Parameters
Most often our research questions involve unknown population parameters: What is the average BMI among 5th graders? What proportion of hospital patients acquire a hospital-based infection? To determine these values exactly would require a census.
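The relative-frequency definition of probability above can be sketched in a few lines of Python. The observations are hypothetical, and `relative_frequency` is an illustrative helper, not a standard library function.

```python
# Relative-frequency estimate of P(x = a): count the occurrences of a and
# divide by the total number of observations.
from collections import Counter

def relative_frequency(observations, a):
    """Estimate P(x = a) as (# of times x = a) / (total # of observations of x)."""
    counts = Counter(observations)
    return counts[a] / len(observations)

rolls = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6]  # hypothetical observations
print(relative_frequency(rolls, 4))  # 3 occurrences of 4 out of 10 -> 0.3
```

As the number of observations grows, this relative frequency settles toward the true P(x = a), which is what the definition relies on.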
However, due to a prohibitively large population (or other considerations), a sample is taken instead.

Sample Statistics
Statistics describe or summarize sample observations. They vary from sample to sample, making them random variables. We use statistics generated from samples to make inferences about the parameters that describe populations.

Sampling Variability
[Figure: repeated samples from a population with mean μ and standard deviation σ produce different sample statistics (e.g., x̄ = 0, s = 1; x̄ = 0.15, s = 1.1; x̄ = 0.1, s = 0.98), which together form the sampling distribution of x̄.]

Types of Samples
• Random sample: each person in the population has an equal chance of being selected. Random sampling is the principal way to guarantee that the sample is representative.
• Convenience sample: persons are selected because they are convenient or readily available.
• Systematic sample: persons are selected based on a pattern.
• Stratified sample: persons are selected from within subgroups.

Random Sampling
For studies, it is optimal (but not always possible) for the sample providing the data to be representative of the population under study. Simple random sampling, a sampling scheme in which every possible sub-sample of size n from a population is equally likely to be selected, provides a representative sample (theoretically). Assuming the sample is representative, the summary statistics (e.g., the mean) should be ‘good’ estimates of the true quantities in the population.
• The larger n is, the better the estimates will be.

Types of Samples
We will explore the impact of sampling when we discuss Experimental Design on July 20.

Hypothesis Testing
Recall: Types of Data
All data contain information. It is important to recognize that the hierarchy implied in the level of measurement of a variable has an impact on (1) how we describe the variable's data and (2) what statistical methods we use to analyze it.
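Returning to sampling variability for a moment: the claim that statistics vary from sample to sample can be demonstrated with a short simulation. The population below is simulated (standard normal), and all names are illustrative.

```python
# Sampling variability sketch: repeated simple random samples from the same
# population yield different sample means (x-bar) and standard deviations (s).
import random
import statistics

random.seed(0)
population = [random.gauss(0, 1) for _ in range(100_000)]  # mu = 0, sigma = 1

for _ in range(3):
    sample = random.sample(population, 50)      # simple random sample, n = 50
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    print(f"x-bar = {x_bar:.3f}, s = {s:.3f}")  # varies from sample to sample
```

Each printed pair is a different realization of the random variables x̄ and s; increasing n = 50 tightens their spread around μ = 0 and σ = 1.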
Levels of Measurement
• Nominal: difference (discrete, qualitative)
• Ordinal: difference, order (discrete, qualitative)
• Interval: difference, order, equivalence of intervals (continuous, quantitative)
• Ratio: difference, order, equivalence of intervals, absolute zero (continuous, quantitative)

Types of Data
NOMINAL → ORDINAL → INTERVAL → RATIO: information increases from left to right.

Levels of Measurement
The levels are in increasing order of mathematical structure, meaning that more mathematical operations and relations are defined, and the higher levels are required in order to define some statistics. At the lower levels, assumptions tend to be less restrictive and the appropriate data analysis techniques tend to be less sensitive. In general, it is desirable to have a higher level of measurement.

Levels of Measurement
Level     Statistical Summary                        Mathematical Relation/Operation
Nominal   Mode                                       one-to-one transformations
Ordinal   Median                                     monotonic transformations
Interval  Mean, Standard Deviation                   positive linear transformations
Ratio     Geometric Mean, Coefficient of Variation   multiplication by c > 0

Recall: Hypotheses
Null hypothesis “H0”: a statement of no difference or association between variables. This is the hypothesis we test; the first step in the ‘recipe’ for hypothesis testing is to assume H0 is true.
Alternative hypothesis “H1”: a statement of a difference or association between variables. This is what we are trying to prove.

Hypothesis Testing
One-tailed hypothesis: the outcome is expected in a single direction (e.g., administration of the experimental drug will result in a decrease in systolic BP); H1 includes ‘<’ or ‘>’.
Two-tailed hypothesis: the direction of the effect is unknown (e.g., the experimental therapy will result in a different response rate than that of the current standard of care); H1 includes ‘≠’.

Hypothesis Testing
The statistical hypotheses are statements concerning characteristics of the population(s) of interest:
• Population mean: μ
• Population variability: σ
• Population rate (or proportion): π
• Population correlation: ρ
Example: It is hypothesized that the
response rate for the experimental therapy is greater than that of the current standard of care: πExp > πSOC ← This is H1.

Recall: Decisions
• Type I Error (α): a true H0 is incorrectly rejected. “An innocent man is proven GUILTY in a court of law.” The commonly accepted rate is α = 0.05.
• Type II Error (β): failing to reject a false H0. “A guilty man is proven NOT GUILTY in a court of law.” The commonly accepted rate is β = 0.2.
• Power (1 – β): correctly rejecting a false H0. “Justice has been served.” The commonly accepted rate is 1 – β = 0.8.

Decisions
                 Truth: H1         Truth: H0
Conclude H1      Correct (Power)   Type I Error
Conclude H0      Type II Error     Correct

Basic Recipe for Hypothesis Testing
1. State H0 and H1.
2. Assume H0 is true.
3. Collect the evidence: from the sample data, compute the appropriate sample statistic and the test statistic. Test statistics (e.g., t, chi-square, F) quantify the level of evidence within the sample; they also provide the information for computing a p-value.
4. Determine whether the test statistic is large enough to meet the a priori determined level of evidence necessary to reject H0 (. . . or, is p < α?).

Example: Carbon Monoxide
An experiment is undertaken to determine the concentration of carbon monoxide in air. It is hypothesized that the actual concentration is significantly greater than 10 mg/m3. Eighteen air samples are obtained and the concentration of each sample is measured. The random variable (outcome) x is the carbon monoxide concentration. The characteristic (parameter) of interest is μ, the true average concentration of carbon monoxide in air.

Step 1: State H0 & H1
H1: μ > 10 mg/m3 ← We think!
H0: μ ≤ 10 mg/m3 ← We assume in order to test!

Step 2: Assume μ = 10
[Figure: the null distribution of x̄, centered at μ = 10.]

Step 3: Evidence
10.25 10.37 10.66 10.47 10.56 10.22 10.44 10.38 10.63 10.40 10.39 10.26 10.32 10.35 10.54 10.33 10.48 10.68
Sample statistic: x̄ = 10.43
Test statistic: t. What does 1.79 mean? How do we use it?
t = (x̄ – μ0) / (s/√n) = (10.43 – 10) / (1.02/√18) = 1.79

Student’s t Distribution
Remember when we assumed H0 was true (Step 2: assume μ = 10)? What we were actually doing was setting up the theoretical Student’s t distribution from which the p-value can be calculated: under H0, t = (x̄ – μ0)/(s/√n) follows a t distribution centered at t = 0. [Figure: Student’s t distribution centered at t = 0.]

Student’s t Distribution
Assuming the true air concentration of carbon monoxide is actually 10 mg/m3, how likely is it that we should get evidence in the form of a sample mean equal to 10.43? That is, what is P(x̄ ≥ 10.43)?

Student’s t Distribution
We can say how likely by framing the statement in terms of the probability of an outcome:
t = (x̄ – μ0) / (s/√n) = (10.43 – 10) / (1.02/√18) = 1.79
p = P(t ≥ 1.79) = 0.0456
[Figure: t = 1.79 marked on the t distribution; the upper tail beyond it has area 0.0456.]

Step 4: Make a Decision
Decision rule: if p ≤ α, the chances of getting the actual collected evidence from our sample, given that the null hypothesis is true, are very small. The observed data conflict with the null ‘theory’ and support the alternative ‘theory.’ Since the evidence (data) was actually observed and our theory (H0) is unobservable, we choose to believe that our evidence is the more accurate portrayal of reality and reject H0 in favor of H1.

Step 4: Make a Decision
What if our evidence had not been in as great a degree of conflict with our theory? If p > α, the chances of getting the actual collected evidence from our sample, given that the null hypothesis is true, are fairly high, and we fail to reject H0.

Decision
How do we know whether the decision we made was the correct one? We don’t! If α = 0.05, the chances of our decision being an incorrect rejection of a true H0 are no greater than 5%. We have no way of knowing whether we made this kind of error; we only know that our chances of making it in this setting are relatively small.

Which test do I use?
What kind of outcome do you have? Nominal? Ordinal? Interval? Ratio?
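Before moving on, the carbon monoxide t statistic can be reproduced from the slide's summary values (x̄ = 10.43, s = 1.02, n = 18). The function name is illustrative.

```python
# One-sample t statistic: t = (x_bar - mu0) / (s / sqrt(n)).
import math

def one_sample_t(x_bar, mu0, s, n):
    """Test statistic for H0: mu = mu0, computed from summary statistics."""
    return (x_bar - mu0) / (s / math.sqrt(n))

# Carbon monoxide example: x-bar = 10.43, s = 1.02, n = 18
t = one_sample_t(x_bar=10.43, mu0=10, s=1.02, n=18)
print(round(t, 2))  # 1.79, as on the slide
```

Comparing this value against the t distribution with n – 1 = 17 degrees of freedom gives the upper-tail p-value of 0.0456 quoted on the slide.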
How many samples do you have? Are they related or independent?

Types of Tests: One Sample
• Nominal: parameter is the proportion π; H0: π = π0 vs. H1: π ≠ π0; sample statistic p = x/n; method: binomial test, or z test if np > 10 and nq > 10.
• Ordinal: parameter is the median M; H0: M = M0 vs. H1: M ≠ M0; sample statistic m = p50; method: Wilcoxon signed-rank test.
• Interval or ratio: parameter is the mean μ; H0: μ = μ0 vs. H1: μ ≠ μ0; sample statistic x̄; method: Student’s t, or Wilcoxon if non-normal or small n.

Types of Tests
Parametric methods make assumptions about the distribution of the data (e.g., normally distributed) and are suited to sample sizes large enough to assess whether the distributional assumption is met. Nonparametric methods make no assumptions about the distribution of the data and are suitable for small sample sizes, or for large samples where parametric assumptions are violated.
• They use the ranks of the data values rather than the actual data values themselves.
• There is a loss of power when the parametric test is appropriate.

Types of Tests: Two Independent Samples
• Nominal: parameters π1, π2; H0: π1 = π2 vs. H1: π1 ≠ π2; sample statistics p1 = x1/n1 and p2 = x2/n2; method: Fisher’s exact test, or chi-square if cell counts > 5.
• Ordinal: parameters M1, M2; H0: M1 = M2 vs. H1: M1 ≠ M2; sample statistics m1, m2; method: median test.
• Interval or ratio: parameters μ1, μ2; H0: μ1 = μ2 vs. H1: μ1 ≠ μ2; sample statistics x̄1, x̄2; method: Student’s t, or Mann-Whitney if non-normal, unequal variances, or small n.

Comparing Central Tendency
• 2 groups, normal or large n: 2-sample t (independent samples) or paired t (dependent samples)
• 2 groups, non-normal or small n: Wilcoxon rank-sum (independent samples) or Wilcoxon signed-rank (dependent samples)
• >2 groups, normal or large n: ANOVA (independent samples) or 2-way ANOVA (dependent samples)
• >2 groups, non-normal or small n: Kruskal-Wallis (independent samples) or Friedman’s (dependent samples)

One-Sample Test of a Mean
Dissolving times
(seconds) of a drug in gastric juice:
42.7 43.4 44.6 45.1 45.6 45.9 46.8 47.6
x̄ = 45.21, s² = 2.69
It is hypothesized that the drug will take more than 45 seconds to fully dissolve.
H1: μ > 45
H0: μ ≤ 45
t = (x̄ – μ0) / (s/√n) = (45.21 – 45) / 0.58 = 0.36
p = P(t > 0.36) = 0.36
[Figure: t = 0.36 falls near the center of the t distribution, so the evidence against H0 is weak.]

Two-Sample Test of Means
Clotting times (minutes) of blood for subjects given one of two different drugs:
Drug B: 8.8 8.4 7.9 8.7 9.1 9.6 (x̄1 = 8.75)
Drug G: 9.9 9.0 11.1 9.6 8.7 10.4 9.5 (x̄2 = 9.74)
It is hypothesized that the two drugs will result in different blood-clotting times.
H1: μB ≠ μG
H0: μB = μG

Two-Sample Test of Means
What we’re actually hypothesizing is H0: μB – μG = 0. The observed difference is x̄1 – x̄2 = –0.99; how likely is a difference of this size if H0 is true?
t = (x̄1 – x̄2) / √(s1²/n1 + s2²/n2) = (8.75 – 9.74) / 0.40 = –2.475
p = P(|t| > 2.475) = 0.03
[Figure: |t| = 2.48 marked in both tails of the t distribution.]

Assumptions of t
In order to use the parametric Student’s t test, a few assumptions need to be met:
• approximate normality of the observations
• in the case of two samples, approximate equality of the sample variances

Assumption Checking
To assess the assumption of normality, a simple histogram will show any issues with skewness or outliers. Other graphical assessments include the Q-Q plot, where points departing from the reference line indicate a violation of normality. To assess the assumption of equal variances (when there are two groups), simple boxplots will show any issues with heteroscedasticity. Rule of thumb: if the larger variance is more than 2 times the smaller, the assumption has been violated.

Now what?
If you have enough observations (20? 30?) to be able to determine whether the assumptions are feasible, check them.
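The variance rule of thumb and the unequal-variance two-sample t statistic can be sketched as follows. The function names are illustrative, and the variance values fed in below are hypothetical values chosen to give a standard error near the slide's 0.40, so the result differs slightly from the slide's rounded –2.475.

```python
# Two sketches: the slides' rule of thumb for comparing variances, and the
# unequal-variance two-sample t statistic. Variance inputs are hypothetical.
import math

def variances_comparable(s1_sq, s2_sq, ratio_limit=2.0):
    """Rule of thumb: the equal-variance assumption is in trouble
    if the larger variance is more than 2x the smaller."""
    big, small = max(s1_sq, s2_sq), min(s1_sq, s2_sq)
    return big / small <= ratio_limit

def two_sample_t(x1_bar, x2_bar, s1_sq, n1, s2_sq, n2):
    """t = (x1_bar - x2_bar) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    return (x1_bar - x2_bar) / se

# Clotting-time means from the slides (8.75 vs. 9.74), hypothetical variances.
print(variances_comparable(0.34, 0.67))                       # True: ratio < 2
print(round(two_sample_t(8.75, 9.74, 0.34, 6, 0.67, 7), 2))   # -2.54
```

A |t| this far into the tails is what drives the slide's two-sided p-value of about 0.03.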
If violated:
• Try a transformation (e.g., the natural log) to correct the violated assumptions and reassess; proceed with the t-test if fixed.
• If a transformation doesn’t work, proceed with a nonparametric test.
• Or skip the transformation altogether and proceed straight to the nonparametric test.
If okay, proceed with the t-test.

Now what?
If you have too small a sample to adequately assess the assumptions, perform the nonparametric test instead. For the one-sample t, we typically substitute the Wilcoxon signed-rank test; for the two-sample t, we typically substitute the Mann-Whitney test.

Consequences of Nonparametric Testing
• Robust!
• Less powerful, because they are based on ranks, which do not contain the full level of information in the raw data.
When in doubt, use the nonparametric test; it will be less likely to give you a ‘false positive’ result.

Summary
• Probability review
• Population parameters
• Sample statistics
• Types of samples
• Hypothesis testing
• Matching the level of measurement to the type of test
• Recipe for hypothesis testing
• Types of tests: parametric versus nonparametric
• Assumption checking
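To close, since the nonparametric substitutes named above (Wilcoxon signed-rank, Mann-Whitney) operate on ranks rather than raw values, their core bookkeeping is rank assignment with midranks for ties. A minimal sketch, with hypothetical data and illustrative function names; no p-value is computed here.

```python
# Rank bookkeeping behind rank-based tests: assign ranks 1..n to the pooled
# data, averaging the ranks of tied values (midranks), then sum the ranks
# belonging to one group (the Wilcoxon rank-sum statistic W).
def midranks(values):
    """Assign ranks 1..n, averaging ranks for tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum(group1, group2):
    """Sum of the pooled-sample ranks that fall in group1."""
    ranks = midranks(group1 + group2)
    return sum(ranks[: len(group1)])

print(rank_sum([1.2, 3.4, 2.2], [5.0, 4.1, 2.2]))  # 7.5 (ranks 1 + 4 + 2.5)
```

An unusually small or large W relative to its null distribution is what the Mann-Whitney test converts into a p-value; because only ranks are used, the test is robust but gives up some power when the t-test's assumptions actually hold.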