Biostatistics for the biomedical profession, Lecture 4 (BIMM34)
Karin Källen & Linda Hartman, November 2015

Today
• Repetition: normal distribution; reference intervals and confidence intervals; elements of statistical inference, hypothesis testing; t-test, ANOVA; central limit theorem; criteria to be met for performing a t-test
• Lecture 4: non-parametric tests; paired samples tests

The (perfect) normal distribution
• The mean, median, and mode all have the same value
• The curve is symmetric around the mean; the skew and kurtosis are 0
• The curve approaches the x-axis asymptotically
• Mean ± 1 SD covers 2·34.1% of the data
• Mean ± 2 SD covers 2·47.5% of the data
• Mean ± 3 SD covers 99.7% of the data

Exercise: birth length (sample mean 50.5 cm, SD 2.3 cm)
1. Estimate the values corresponding to z-scores of -2 and +2:
   • z = -2: 50.5 - 2·2.3 = 45.9 cm
   • z = +2: 50.5 + 2·2.3 = 55.1 cm
2. Estimate the z-score corresponding to a birth length of 48 cm:
   • (48 - 50.5) cm / 2.3 cm = -1.1

Birth length example, continued
• 95% of all Swedish babies will have a birth length between 45.9 and 55.1 cm.
• A birth length of 48 cm corresponds to a z-value of -1.1.
• Exercise: How large a proportion of the population will have a birth length above 48 cm?
  Hint: approximate that 70% of all values lie within ±1.1 SD. Answer: approximately 85%.

Standard error of the mean (SEM)
• The standard error of the mean is a measure of the spread of the mean.
• Let SD = the population standard deviation and n = the sample size. Then SEM = SD/√n.

Confidence interval
• A confidence interval tells us within which interval the 'true' value of a parameter probably lies.
• E.g., a 95% confidence interval tells us between which limits the 'true' value lies, with 95% certainty.
• Repetition: 95% of the data lie within ±2 SD (1.96 exactly). A 95% CI can be constructed (assuming a large sample) as (mean - 1.96·SEM, mean + 1.96·SEM).

Exercise: birth lengths, continued (mean 50.5 cm, SD 2.3 cm)
1. Values corresponding to z-scores of -2 and +2: 45.9 cm and 55.1 cm (as above).
2. z-score corresponding to 48 cm: (48 - 50.5)/2.3 = -1.1.
3. How large a proportion (approximately) of the infants will have a birth length below 48 cm? Approximately 15%.
4. The mean (50.5 cm) was based on 77 000 births.
   A) Construct a 95% CI for the mean birth length in this sample:
      50.5 ± 1.96·SEM, where SEM = 2.3/√77 000 ≈ 0.008
      Mean (95% CI) = 50.5 (50.48 – 50.52)
   B) Construct a 95% reference interval for birth length:
      50.5 ± 1.96·SD = (46.0 – 55.0) cm

• The 95% CIs are often represented by error bars.
• Statistical inference: Is there a true difference between the means?
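The birth-length calculations above can be reproduced in a few lines. The following is a minimal sketch, not part of the lecture (the course uses SPSS), assuming Python with NumPy/SciPy; the numbers are the sample values quoted above.

# Sketch (not from the lecture): z-scores, a reference interval, and a 95% CI
# for the birth-length example.
import numpy as np
from scipy import stats

mean, sd, n = 50.5, 2.3, 77_000   # sample mean (cm), SD (cm), number of births

# z-score for a birth length of 48 cm, and the proportion below/above it
z_48 = (48 - mean) / sd                      # about -1.1
p_below = stats.norm.cdf(z_48)               # about 0.15 below 48 cm
p_above = 1 - p_below                        # about 0.85 above 48 cm

# 95% reference interval (for individual observations)
ref_low, ref_high = mean - 1.96 * sd, mean + 1.96 * sd

# 95% confidence interval for the mean (uses SEM = SD / sqrt(n))
sem = sd / np.sqrt(n)
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem

print(f"z(48 cm) = {z_48:.2f}, P(below) = {p_below:.2f}, P(above) = {p_above:.2f}")
print(f"95% reference interval: ({ref_low:.1f}, {ref_high:.1f}) cm")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f}) cm")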
Elements of statistical inference: hypothesis testing
• Set up a null hypothesis (H0): no effect. Set up an alternative hypothesis (H1): effect.
• In order to get significant findings, we want to reject H0 (e.g., "no effect"). If H0 is rejected, the alternative hypothesis (H1) remains.
• The p-value is the probability of getting the result you got, or one even more extreme, if H0 is true. The p-value is a probability between 0% and 100%.
• If the p-value is small enough, reject H0. "Small enough" is something you decide before the analysis: the significance level, e.g. 1%, 5%, or 10%.
• The p-value can be calculated even if the data are not normally distributed, but in different ways.

Elements of statistical inference
• Sample 1: x11, …, xn1, with mean x̄1 and standard deviation s1.
• Sample 2: x12, …, xn2, with mean x̄2 and standard deviation s2.
• H0: μ1 = μ2, i.e. μ1 - μ2 = 0; H1: μ1 ≠ μ2, i.e. μ1 - μ2 ≠ 0.
• The p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis (H0) is true.
• The test statistic is x̄1 - x̄2. If H0 is true, x̄1 - x̄2 is a sample from N(0, SE(x̄1 - x̄2)).

Hypothesis testing, two-sided
[Figure: expected distribution of x̄1 - x̄2 under H0, with the observed sample difference marked.]
• Large p-value: the data are probable if H0 is true. Thus H0 is not rejected (it is considered true until further evidence).
• Small p-value: the data are unlikely if H0 is true. Thus H0 is rejected (it is considered false).

The t-test
• Uses the fact that the difference between two means, both from normally distributed data, follows a normal distribution with expected mean 0 and expected standard deviation SE.
• t = (difference between group means) / (standard error of the difference in means) = (x̄1 - x̄0)/SE,
  where SE = s·√(1/n0 + 1/n1) and s = √[((n1 - 1)·s1² + (n0 - 1)·s0²) / (n1 + n0 - 2)].
• The test statistic t is then compared with Student's t-distribution with df = n0 + n1 - 2 degrees of freedom. If the samples are large, the test statistic approximately follows a normal distribution.
• The t-test can be used to determine whether two small sets of normally distributed data differ significantly from each other when the spread is not known but is estimated from the data. Under certain conditions, the test statistic follows a Student's t-distribution.

Hypothesis testing, two-sided, large samples
• Expected mean difference under H0: μ1 - μ2 = 0.
• If the z-value (x̄1 - x̄2)/SE is below -1.96 or above +1.96, then p = 0.0249 + 0.0249 ≈ 0.05: our data are unlikely if H0 is true, so H0 is rejected.
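As a concrete illustration of the formulas above, here is a minimal sketch (not from the lecture) of the pooled two-sample t-test computed by hand and checked against SciPy's built-in test. The two data vectors are made up for the example.

# Sketch (not from the lecture): two-sample t-test with the pooled SD.
import numpy as np
from scipy import stats

group0 = np.array([3.4, 3.6, 3.1, 3.8, 3.5, 3.3, 3.7])   # hypothetical values
group1 = np.array([3.0, 3.2, 2.9, 3.4, 3.1, 3.3])

n0, n1 = len(group0), len(group1)
s0, s1 = group0.std(ddof=1), group1.std(ddof=1)

# Pooled standard deviation and standard error of the difference in means
s = np.sqrt(((n1 - 1) * s1**2 + (n0 - 1) * s0**2) / (n1 + n0 - 2))
se = s * np.sqrt(1 / n0 + 1 / n1)

t = (group1.mean() - group0.mean()) / se
df = n0 + n1 - 2
p = 2 * stats.t.sf(abs(t), df)            # two-sided p-value

print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
print(stats.ttest_ind(group1, group0))    # same result (equal variances assumed)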
Example: Does birth weight differ significantly between girls and boys? (data from the birth database)
• Discuss: What do you conclude regarding the distributions? Would a t-test be suitable?

T-test, birth weight boys vs girls, continued
[SPSS output: independent samples t-test, with the significance of the t-test and the difference between the means with a 95% CI for the difference.]
• The Levene "Sig." column is used only to decide which row to read from: if Sig. is large (e.g. > .1), read from the upper row.

Statistical significance vs. clinical relevance
• Statistical significance ("there is a difference"): a low p-value.
• Clinical relevance ("is the difference of importance?"): how large is the difference? An effect estimate with a CI is needed!

Important notes
• The p-value does not tell us anything about the size of the effect, only how probable it is to obtain an effect of the size seen in our sample if the null hypothesis is true.
• The p-value is a function of both the sample size and the true effect size.
• With large samples, statistically significant results can be found even if the absolute effects are so small that they are of no clinical interest.
• The fact that no significant result was found does not mean that no difference exists. Perhaps the study had too low power to detect a true difference/effect/association.

Exercise: combine the results with the comments
1. In a sample with two arms of 200 000 observations each, a statistically significant 25 g weight-loss difference between two diet groups was detected.
2. In a sample of 300 individuals, the associations between 50 environmental toxins and 5 different outcomes (cancer, psychiatric disorders, etc.) were investigated. Two strong associations and 10 associations of moderate strength were statistically significant.
3. In a sample of 300 individuals, the risk of ADHD after exposure to a certain pollutant during pregnancy was investigated. In the exposed group (n = 100), 4 children had ADHD, compared to 6 in the control group. The difference was not statistically significant (p = 0.28).
A. High risk of a Type II error
B. High risk of a Type I error
C. Statistically significant result without clinical significance

Repetition: more than two groups, t-test assumptions hold: one-way ANOVA (analysis of variance)
• Multiple t-tests can result in mass significance! Do an ANOVA instead of repeated t-tests.
• ANOVA: H0: mean1 = mean2 = mean3; H1: at least two of the means are different.
• In short: in an ANOVA, the total variance is divided into the within-group and between-group variance.

ANOVA: compare variances
• Between groups (VB) and within groups (VW):
  VB large, VW small: ratio VB/VW large
  VB small, VW large: ratio VB/VW small
• The quotient F = VB/VW is around 1 if the group means are equal and > 1 if they are not.
• The corresponding test is called an F-test and is based on the F-distribution.

ANOVA example, birth weights
[Descriptives table, created with Analyze > Tables > Custom Tables in SPSS.]
• Exercise: Interpret the output in words.
• "A one-way between-subjects ANOVA was conducted to compare the effect of maternal smoking (non-smokers, smokers ≤ 10 cig/day, smokers > 10 cig/day) on birth weight. There was no significant effect of smoking on birth weight at the p < .05 level for the three groups [F(2, 246) = 2.73, p = 0.067]."
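For readers who want to try the same analysis outside SPSS, a minimal one-way ANOVA sketch follows. The group means, SDs and sample sizes are invented for illustration and are not the course's birth-weight data.

# Sketch (not from the lecture): one-way ANOVA on made-up birth weights
# for three hypothetical smoking groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
non_smokers  = rng.normal(3550, 500, size=120)   # hypothetical birth weights (g)
smokers_le10 = rng.normal(3450, 500, size=80)
smokers_gt10 = rng.normal(3350, 500, size=49)

# F = between-group variance / within-group variance
f_stat, p_value = stats.f_oneway(non_smokers, smokers_le10, smokers_gt10)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")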
Cotinine
• Cotinine is the main metabolite of nicotine.
[Histogram: cotinine levels in the umbilical artery. Mean = 31.2, median = 0.2.]

Child cotinine levels, continued
• Exercise: Normal distribution? NO!
• Exercise: Would it nevertheless be possible to use a t-test to compare the cotinine levels among children of smokers versus non-smokers? Under what conditions?

Repetition: the central limit theorem
• The mean has an approximately normal distribution if
  • the number of observations is large (the approximation is reached faster if the distribution is symmetric),
  • the observations are independent, and
  • the observations come from the same distribution.
• We can therefore often use the normal distribution to test a difference in means, even if the observations themselves are not normally distributed.
[Figure: 10 000 samples of mean values from dice rolls, based on 10 rolls and on 100 rolls.]
[Figure: the population distribution versus the distribution of sample means, for sample sizes N = 10, 50 and 100.]
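The dice-roll illustration of the central limit theorem is easy to simulate. A minimal sketch (not from the lecture) follows; the sample counts mirror the figure above (many samples of means of 10 and 100 rolls).

# Sketch (not from the lecture): means of many small samples of dice rolls
# look approximately normal even though single rolls are uniform on 1..6.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

for n_rolls in (10, 100):
    # n_samples experiments, each the mean of n_rolls dice rolls
    means = rng.integers(1, 7, size=(n_samples, n_rolls)).mean(axis=1)
    print(f"{n_rolls} rolls: mean of means = {means.mean():.2f}, "
          f"SD of means = {means.std(ddof=1):.3f} "
          f"(theory: {np.sqrt(35/12) / np.sqrt(n_rolls):.3f})")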
Repetition: t-test assumptions
1. The mean is a relevant summary measure.
2. Independent observations (e.g., no patient contributes more than one observation).
3. The observations are normally distributed OR both groups are large.
• Exercise: Would you perform a t-test if you, e.g., wanted to compare the cotinine levels in children of smokers vs non-smokers? Doubtful (the first criterion is not met). Other solutions? Perform a non-parametric test (e.g., Mann-Whitney).

[Box plot: cotinine levels in children of smoking vs non-smoking mothers.]
• Exercise: Describe the distributions by studying the box plot.

The SPSS output from a t-test (against better knowledge): cotinine levels in children of smokers compared to non-smokers
• Equal variances could not be assumed: read from the lower row.
• A significant difference between the means... but the means (and thus the difference between the means) do not make sense.

So… how can we analyse data that are not normally distributed? Using non-parametric tests!

Non-parametric methods
• The original measurements are converted to ranks in the analysis.
• H0: the distributions are equal in all groups. The median is a useful marker for differences in distribution.
• Insensitive to skewed distributions and extreme values.
• Can be used for ordinal data, e.g. 0 = no response, 1 = mild response, 2 = strong response.

Difference between two independent groups: Mann-Whitney's test
• Rank the observations from lowest to highest.
• Calculate the rank sum in group A (WA) and in group B (WB).
• The larger the difference between the mean ranks WA/nA and WB/nB, the lower the p-value.
• The Mann-Whitney U statistic: UA = WA - nA(nA + 1)/2.
• Another name for the same test is the Wilcoxon rank sum test, which uses WA directly.
• There is a straightforward generalization to more than two groups (the Kruskal-Wallis test).

Mann-Whitney: small-group discussion (creatinine)
• The observations, sorted from lowest to highest:
  38, 57, 64, 65, 70 (Female); 79, 81 (Male); 81 (Female); 82, 85, 93 (Male); 105 (Female); 107 (Male); 110 (Female); 113 (Male); 123 (Female); 219, 232 (Male); 262 (Female); 297 (Male); 313, 320 (Female); 845 (Male)
• Calculate the rank sum and the mean rank for males (and for females if you have time).
• For the group sizes nA = 11 (males) and nB = 12 (females), p < 0.05 if the rank sum for the smallest group (males) is below 100 or above 175.
• Conclusion? How would you summarize the test?

Presenting Mann-Whitney results
• An average in each group: the median (possibly the mean as well, for comparison).
• Variability in each group: percentiles or quartiles (in smaller groups) or min-max (in even smaller groups); the standard deviation is not relevant.
• The difference between the groups: the p-value for the M-W test; the U statistic is sometimes relevant; ideally a median difference with 95% CI (this can be calculated in e.g. SPSS).

Mann-Whitney, creatinine, continued
[Output for the creatinine example.]

Extension to more than two groups: the Kruskal-Wallis test
• Mann-Whitney U test (k = 2 groups): H0: distribution A = distribution B; H1: distribution A ≠ distribution B.
• Kruskal-Wallis test (k > 2 groups), e.g. k = 3: H0: distribution A = distribution B = distribution C; H1: distribution A ≠ distribution B, or A ≠ C, or B ≠ C.
• Independent groups, independent observations within each group.
• The median is a useful marker for differences in distribution; the more the mean ranks differ, the lower the p-value.

Now… back to our cotinine data: how to compare levels between the groups?
The SPSS output from a non-parametric Mann-Whitney U test: cotinine levels in children of smoking vs non-smoking mothers.
• Exercise: How would you interpret the output? Do you need to do more to show the results?
• Present medians, perhaps percentiles or the interquartile range, or a box plot:
  Non-smokers: median 0.13, interquartile range 0.29
  Smokers: median 76.1, interquartile range 86.2
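A minimal sketch (not from the lecture) of the Mann-Whitney calculation on the creatinine exercise data above, first via the rank-sum formula UA = WA - nA(nA + 1)/2 and then via SciPy's built-in test.

# Sketch (not from the lecture): Mann-Whitney U on the creatinine data.
import numpy as np
from scipy import stats

males   = np.array([79, 81, 82, 85, 93, 107, 113, 219, 232, 297, 845])
females = np.array([38, 57, 64, 65, 70, 81, 105, 110, 123, 262, 313, 320])

# Rank all observations together (ties get the average rank)
combined = np.concatenate([males, females])
ranks = stats.rankdata(combined)
w_males = ranks[:len(males)].sum()                      # rank sum for males, W_A
u_males = w_males - len(males) * (len(males) + 1) / 2   # U_A = W_A - n_A(n_A+1)/2
print(f"Rank sum (males) = {w_males}, mean rank = {w_males/len(males):.1f}, U = {u_males}")

# Built-in test for comparison
print(stats.mannwhitneyu(males, females, alternative="two-sided"))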
Paired data, cell-count example
• Two ways of comparing the means:
  1. Calculate the mean of each group and estimate the difference between the group means.
  2. Estimate the difference for each row, and then calculate the mean of the differences.

T-test for paired data
Preparation   ControlsDay2   SalDay2    Difference
1                915 600     357 800      557 800
2                953 300     502 200      451 100
3                650 000     470 000      180 000
4                700 000     560 000      140 000
5              1 050 000     736 000      314 000
6                984 000     556 000      428 000
7                772 000     418 000      354 000
8                920 000     600 000      320 000
9              1 080 000     680 000      400 000
10               920 000     520 000      400 000
11               840 000     560 000      280 000
12               533 000     620 000      -87 000
13               510 000     704 000     -194 000
14               722 000     696 000       26 000
Mean           824 992.9     570 000     254 992.9
s              181 454.0     111 808     216 636.9
(s combined = 150 709.3; SEM = 56 962.8 for the independent-groups approach and 57 898.6 for the differences)
• The difference between the means equals the mean of the differences (254 992.9).
• Thus, the mean difference is not influenced by whether the data are treated as paired or not, but the estimate of the standard deviation is likely to differ between the methods.
• Use analyses for paired data when adequate!

Paired samples t-test
• The previous t-test was made to find differences between independent groups of observations. Sometimes it is more powerful to test for differences within the same patient (or another paired measurement).
• In a study of weight loss from spicy food, 12 subjects were weighed before and after a month on a spicy-food diet (see the table).
• Discuss: How would you test whether the diet gave weight loss?
• Do a paired samples t-test! Calculate the difference di for each subject's weights and test whether mean(d) = 0.
• 95% CI for mean(d): mean(d) ± t0.025(N - 1)·s/√N. Here: -2.1 ± 2.2·3.03/√12 = -2.1 ± 1.9 = (-4.0, -0.17).
• Discuss: How do you interpret the CI? Was the treatment effective?

Paired samples t-test, continued
• Paired test: 95% CI for the weight change (-4.0, -0.17), p = 0.036.
• If the researchers did not recognize the paired design but did an independent-groups t-test: CI = (-18.6, 22.76), p = 0.84.
• Why so wide? Large variability BETWEEN subjects inflates the variability of the difference in an independent-groups design!
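A minimal sketch (not from the lecture, which works in SPSS and by hand) of the paired versus independent analysis on the cell-count data above, using SciPy.

# Sketch (not from the lecture): paired t-test on the cell-count data.
import numpy as np
from scipy import stats

controls = np.array([915600, 953300, 650000, 700000, 1050000, 984000, 772000,
                     920000, 1080000, 920000, 840000, 533000, 510000, 722000])
sal      = np.array([357800, 502200, 470000, 560000, 736000, 556000, 418000,
                     600000, 680000, 520000, 560000, 620000, 704000, 696000])

diffs = controls - sal
print(f"Mean difference = {diffs.mean():.1f}, SD of differences = {diffs.std(ddof=1):.1f}")

# Paired t-test: tests whether the mean of the within-pair differences is 0
print(stats.ttest_rel(controls, sal))

# Independent-groups t-test (ignores the pairing), for comparison
print(stats.ttest_ind(controls, sal))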
Parametric methods for group comparisons (normally distributed outcomes / 'large' studies; focus on mean comparisons)
• Two independent groups: t-test
• Paired groups (paired measurements): paired t-test
• More than two groups: analysis of variance (ANOVA)
If the assumptions are not met: non-parametric tests!
• Two independent groups: Mann-Whitney U test
• More than two independent groups: Kruskal-Wallis test
• Paired groups (paired measurements): Wilcoxon signed rank test

Non-parametric test for paired samples: Wilcoxon signed rank test (spicy diet continued)
Subject   Pre   Post   Diff   Sign   Rank   Signed rank
1          65    62     -3     -      5.5     -5.5
2          88    86     -2     -      3       -3
3         125   118     -7     -     11      -11
4         103   105      2     +      3        3
5          90    91      1     +      1        1
6          76    72     -4     -      8       -8
7          85    81     -4     -      8       -8
8         126   122     -4     -      8       -8
9          97    95     -2     -      3       -3
10        142   145      3     +      5.5      5.5
11        132   132      0            (excluded)
12        110   105     -5     -     10      -10
• Sum the positive ranks: 3 + 1 + 5.5 = 9.5.
• (The sum of the negative ranks = 56.5 = 11·12/2 - 9.5.)

Comparison of different tests
Test situation                      Parametric test   Non-parametric test
Independent samples, 2 groups       t-test            Mann-Whitney
Independent samples, ≥ 2 groups     ANOVA             Kruskal-Wallis
Paired samples, 2 groups            Paired t-test     Wilcoxon signed rank test

Two broad categories of statistical methods
• Parametric methods (e.g., the t-test)
  + Result in both an effect measure (with CI) and a p-value.
  + More effective at detecting differences if the data are (close to) normal.
  - Based on assumptions about the distribution of the data, typically N(μ, σ).
  - Test results can be sensitive to deviations from the normal distribution, especially in small studies.
• Non-parametric methods (e.g., Mann-Whitney)
  + No assumptions about the distribution of the data.
  + Also useful for data measured on an ordinal scale.
  + Suitable for small studies.
  - Less powerful than parametric methods (if the normal distribution applies).
  - Typically result only in a p-value (but sometimes an effect measure with CI can be computed).

Comparison of maternal cotinine levels and the child's cotinine levels
• Exercise: Which method would you use to investigate the relationship between the mothers' and their children's values?
  1. We know that the cotinine levels are not suitable for a t-test.
  2. This type of analysis should have a paired design: Wilcoxon signed rank test.

The SPSS output from an analysis with paired data
• Exercise: Interpret the output. What more should be shown?
  Child's levels: median 0.22, interquartile range 22.6
  Mother's levels: median 0.22, interquartile range 27.5
• This information is not very illustrative. We'll learn more about correlations next lesson.

Next lecture
• Linear regression
• Correlation
• R²
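As a closing worked example, a minimal sketch (not from the lecture) that reproduces the Wilcoxon signed rank test for the spicy-diet table above using SciPy.

# Sketch (not from the lecture): Wilcoxon signed rank test, spicy-diet data.
import numpy as np
from scipy import stats

pre  = np.array([65, 88, 125, 103, 90, 76, 85, 126, 97, 142, 132, 110])
post = np.array([62, 86, 118, 105, 91, 72, 81, 122, 95, 145, 132, 105])

diff = post - pre   # negative values = weight loss
# SciPy drops the zero difference and ranks |diff|, as in the table above.
print(stats.wilcoxon(diff))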