Two sample t-test - MGH Biostatistics Center
Comparison of two samples
Summer program
Brian Healy

Previous classes
Hypothesis testing
– Null and alternative hypotheses
– Test statistic
– p-value
– Conclusion
Confidence intervals
Comparison of CI to hypothesis test
Power and sample size

What are we doing today?
Two-sample t-test
– Paired t-test
– Independent samples
  – Equal variance
  – Unequal variance
Sample size for two samples

Big picture
Up to this point, we have only concerned ourselves with one sample. Often we want to compare one group to another.
What happens when we are comparing two samples?
– There is variability in both samples, and the two samples may be related
– Much of the theory is the same

Example
One of the first studies I analyzed was a tumor size study. Having an accurate measure of tumor size is extremely important because it allows a physician to accurately determine if a tumor is growing, shrinking or remaining constant. The problem is that the measurements of tumor size often vary from physician to physician.
In the past, tumor size was measured using the linear distance across the tumor, but this was found to be very variable because of the irregular shape of some tumors. A new method called the RECIST criteria traces the outside of the tumor. The RECIST method was believed to give more consistent measures of the volume of the tumor.

Available data
For a portion of the study, a pair of doctors were shown the same set of tumor pictures. The volume of each tumor was measured by two separate physicians under similar conditions.
Question of interest: Did the measurements from the two physicians differ significantly?
– If not, then there would be no evidence that the volume measurements change based on physician.
20 scans were measured by each physician (10 are shown here)
Measurements in cm^3
What can you say about these samples?
– Two measurements of the same tumor
– They are related, so we must account for this
– Much research in statistics deals with how to handle correlated data, but in this case it is pretty easy

Tumor   Dr. 1   Dr. 2
1       15.8    17.2
2       22.3    20.3
3       14.5    14.2
4       15.7    18.5
5       26.8    28.0
6       24.0    24.8
7       21.8    20.3
8       23.0    25.4
9       29.3    27.5
10      20.5    19.7

Dependent sample
We can measure the effect of the treatment in each person by taking the difference $d_i = x_{1i} - x_{2i}$
Instead of having two samples, we can consider our dataset to be one sample of differences
– Just like the one sample problem

Tumor   Dr. 1   Dr. 2   Difference
1       15.8    17.2    -1.4
2       22.3    20.3     2.0
3       14.5    14.2     0.3
4       15.7    18.5    -2.8
5       26.8    28.0    -1.2
6       24.0    24.8    -0.8
7       21.8    20.3     1.5
8       23.0    25.4    -2.4
9       29.3    27.5     1.8
10      20.5    19.7     0.8

Differences
Volume from Dr. 1
– Population mean: $\mu_1$
– Sample mean: $\bar{x}_1$
Volume from Dr. 2
– Population mean: $\mu_2$
– Sample mean: $\bar{x}_2$
Difference
– Population mean: $\mu_1 - \mu_2$
– Sample mean: $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$

Distribution of differences
Assuming the $d_i$'s are normally distributed, we can use the t-distribution with $n-1$ dof, where $n$ is the number of differences:
$t = \frac{\bar{d}}{s_d / \sqrt{n}}$
Standard deviation of differences:
$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}$
Test statistic acts just like one sample

Picture
[Figure: histogram of diff; x-axis: diff, y-axis: Frequency]
We can see that the assumption of normality of the differences is reasonable in this case

Paired t-test
1) Null hypothesis: no difference between physicians
   $H_0: \mu_{dr1} = \mu_{dr2}$, or $\mu_{dr1} - \mu_{dr2} = 0$
2) Two dependent samples; alpha = 0.05
3) Test statistic: t-statistic with n - 1 = 19 dof
   $t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-0.24}{1.66 / \sqrt{20}} = -0.646$
4) p-value = 0.53
5) Fail to reject null hypothesis
6) Conclusion: there is no evidence of a difference in tumor volume measurement based on physician

Confidence interval
The confidence interval for the paired t-test is constructed in the same way as for the one-sample t-test:
$\left( \bar{d} - t_{1-\alpha/2} \frac{s_d}{\sqrt{n}},\ \bar{d} + t_{1-\alpha/2} \frac{s_d}{\sqrt{n}} \right)$
For our example, the confidence interval is (-1.02, 0.54)
Note that the conclusions from the hypothesis test and the confidence interval are the same: the interval contains 0, so we fail to reject the null

Paired t-test in R
Using the help menu, determine how to complete the paired t-test in R.

Paired t-test in R
    data <- read.table("P:\\pairedscans.dat", header=FALSE)
    dr1 <- data[,1]; dr2 <- data[,2]
    t.test(dr1, dr2, paired=TRUE)
– The output provides the p-value and the confidence interval

            Paired t-test
    data:  data[, 1] and data[, 2]
    t = -0.6456, df = 19, p-value = 0.5262
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -1.0180279  0.5380279
    sample estimates:
    mean of the differences
                      -0.24

Extensions
Some additional examples of paired samples are:
– Differences between left and right eye
– Differences between dominant and non-dominant hand
– Matched samples
When you have more than two samples, other techniques account for the correlation between the samples
– Multivariate / longitudinal data

Unpaired samples
Often it is impractical to design a study that uses the same patients for both groups
– Ex. Comparison of cholesterol in males and females
– Ex. Time constraints
Since the samples are not paired, we cannot use the differences between individual observations
– Must adjust previous analysis

Example
Another aspect of the tumor volume study was trying to compare the tumor volume among patients with different forms of cancer. The average tumor size is important to know so that the effect of treatment can be determined. This study included patients with brain, breast and liver tumors, but initially we will only compare the brain and breast cancers.
All of the tumors were measured using the RECIST method

Null hypothesis
The null hypothesis is that there is no difference between the volume of the tumor in the two forms of cancer
– $H_0: \mu_{brain} = \mu_{breast}$, or $\mu_{brain} - \mu_{breast} = 0$
More generally, we can test if the difference between two groups is a specific value, $\mu_1 - \mu_2 = D$
– This occurs when comparing two treatment groups and we are interested in whether the two groups differ by a specific amount

Each patient contributes one observation
Can estimate from the sample
– $(\mu_1, \sigma_1)$: mean and standard deviation in the brain cancer group, estimated with $(\bar{x}_1, sd_1)$
– $(\mu_2, \sigma_2)$: mean and standard deviation in the breast cancer group, estimated with $(\bar{x}_2, sd_2)$
Are the two groups the same?
– $H_0: \mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$
– To determine this, we are going to look at $\bar{x}_1 - \bar{x}_2$
– We also need to know $Var(\bar{x}_1 - \bar{x}_2) = Var(\bar{x}_1) + Var(\bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$

Difference in the sample means
We are going to use the difference of the means as our test statistic, but we need to estimate the variance of this difference to determine if the difference is significant
Basic form of test statistic:
– Standard deviations known: $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
– Standard deviations unknown: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$
The estimate of the standard deviation depends on whether
– The samples have equal variance OR
– The samples have unequal variance

Equal variance
Sometimes we will be willing to assume that the variance in the two groups is equal: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
If we know this variance, we can use the z-statistic
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma \sqrt{1/n_1 + 1/n_2}}$
Often we have to estimate $\sigma^2$ with the sample variance from each of the samples, $s_1^2, s_2^2$
Since we have two estimates of one quantity, we pool the two estimates

Equal variance continued
The estimate of $\sigma^2$ is given by:
$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
The t-statistic based on the pooled variance is very similar to the z-statistic, as always:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}}$
The t-statistic has a t-distribution with $n_1 + n_2 - 2$ degrees of freedom

For the tumor volume study, there were 20 brain cancer subjects and 28 breast cancer subjects
The summary statistics and histograms for the data are given here

        Brain       Breast
n       20          28
xbar    16.2 cm^3   17.5 cm^3
s2      3.49        6.0

[Figure: histogram of size[(gr == 0)] (brain) and histogram of size[(gr == 1)] (breast); x-axis: size, y-axis: Frequency]
What can you say about the distributions?
Does the equal variance assumption seem valid in this case?

Hypothesis test
1) H0: mean brain tumor size = mean breast tumor size
2) Two independent samples with equal variance; alpha = 0.05
3) $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}} = \frac{(16.15 - 17.49) - 0}{2.23 \sqrt{1/20 + 1/28}} = -2.054$ (means shown to two decimals)
4) p-value: 0.046
5) Reject null hypothesis
6) Conclusion: There is a significant difference in the size of brain and breast cancer tumors

R code
If we only had the summary statistics above, we can calculate the test statistic and then compare it to the t-distribution using pt(-2.054, df=46) to determine the area in the lower tail
How do we convert this into the appropriate p-value? For a two-sided test, double the lower-tail area: 2*pt(-2.054, df=46), which gives about 0.046
With the full data, we can use
    data <- read.table("cancer.dat", header=TRUE)
    gr <- data[,1]; size <- data[,2]
    t.test(size[(gr==0)], size[(gr==1)], var.equal=TRUE)

R output
            Two Sample t-test
    data:  size[(gr == 0)] and size[(gr == 1)]
    t = -2.054, df = 46, p-value = 0.04568
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -2.65174438 -0.02682705
    sample estimates:
    mean of x mean of y
     16.15000  17.48929

Unequal variance
Often, we are unwilling to assume that the variances are equal
We now write the test statistic as:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
The distribution of this statistic is difficult to derive, and we approximate it using a t-distribution with $\nu$ degrees of freedom:
$\nu = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
This is called the Satterthwaite or Welch approximation
– When you complete a two-sample t-test in R and the variances are not assumed equal, this approximation is used

Example
For the comparison of the brain cancers to the liver cancers, the variances
differ much more. Let's use the unequal variance two sample t-test in this case

        Brain       Liver
n       20          17
xbar    16.2 cm^3   19.35 cm^3
s2      3.49        14.4

Example
1) H0: mean brain tumor size = mean liver tumor size
2) Two independent samples with unequal variance; alpha = 0.05
3) $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{(16.15 - 19.35) - 0}{\sqrt{3.5/20 + 14.4/17}} = -3.17$
4) p-value: 0.0044
5) Reject null hypothesis
6) Conclusion: There is a significant difference between brain and liver tumor size

R output
    > t.test(size[(gr==0)],size[(gr==2)])
            Welch Two Sample t-test
    data:  size[(gr == 0)] and size[(gr == 2)]
    t = -3.1666, df = 22.48, p-value = 0.00439
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -5.288291 -1.105827
    sample estimates:
    mean of x mean of y
     16.15000  19.34706

Practice
Get the TV dataset from the course folder
We want to compare the amount of TV the boys and girls watch. Perform the most appropriate test. Boys are coded as 0 and girls are coded as 1.

Can we test if the variances are equal?
Since we can never be sure if the variances are equal, could we test if they are equal?
Of course we can!!!
– But, remember there is error in every statistical test
– Sometimes it is just preferred to use the unequal variance test unless there is a good reason not to

Equality of variance
$H_0: \sigma_1^2 = \sigma_2^2$
To test this hypothesis, we use the sample variances: $s_1^2, s_2^2$
If one of the variances is much larger than the other, this is evidence against the null
As we discussed a couple classes ago:
$\frac{(n_1 - 1) s_1^2}{\sigma_1^2} \sim \chi^2_{n_1 - 1}$, $\frac{(n_2 - 1) s_2^2}{\sigma_2^2} \sim \chi^2_{n_2 - 1}$

Test of equality
One way to test if the two variances are equal is to check if the ratio is equal to 1 ($H_0$: ratio = 1)
Under the null, the ratio simplifies to $s_1^2 / s_2^2$
The ratio of two independent chi-square random variables, each divided by its degrees of freedom, has an F-distribution
The F-distribution is defined by the numerator and denominator degrees of freedom
Here we have an F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom
This works better with $s_1^2 > s_2^2$ (put the larger sample variance in the numerator)

F-distribution
Here is the F-distribution with 5 and 500 degrees of freedom
Note the skew of the distribution
[Figure: histogram of rf(10000, df1 = 5, df2 = 500); y-axis: Frequency]

Example
    > var.test(size[(gr==1)],size[(gr==0)])
            F test to compare two variances
    data:  size[(gr == 1)] and size[(gr == 0)]
    F = 1.719, num df = 27, denom df = 19, p-value = 0.2247
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.710335 3.904512
    sample estimates:
    ratio of variances
              1.719033

    > var.test(size[(gr==2)],size[(gr==0)])
            F test to compare two variances
    data:  size[(gr == 2)] and size[(gr == 0)]
    F = 4.1182, num df = 16, denom df = 19, p-value = 0.004156
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     1.589643 11.111060
    sample estimates:
    ratio of variances
              4.118214

Practice
Example from Rosner, Fundamentals of Biostatistics (Problems 8.88 and 8.89)
The following table compares the balance scores in patients with rheumatoid arthritis (RA) and osteoarthritis (OA). Which test is most appropriate and what is the conclusion you would draw?
      Mean   SD    n
RA    3.4    3.0   36
OA    2.5    2.8   30

Power and sample size
As with the one sample case, we can find power and sample size for a two sample problem
For two dependent samples, the power and sample size can be calculated exactly as in the one sample case because the paired t-test is a one sample problem
For two independent samples, the power and sample size calculations are slightly different

One sample case (review)
To find the sample size in the one sample case we needed
– The hypothesized difference in the means
– The alpha level
– The power
– The variance in the sample
– One-sided or two-sided test
$n = \frac{\sigma^2 \left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{(\mu_1 - \mu_0)^2}$

Two sample case
We still need the same pieces of information.
For equal sample sizes, each group needs
$n = \frac{(\sigma_1^2 + \sigma_2^2) \left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{(\mu_1 - \mu_2)^2}$
For sample sizes $n_2 = k n_1$,
$n_1 = \frac{(\sigma_1^2 + \sigma_2^2 / k) \left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{(\mu_1 - \mu_2)^2}$
$n_2 = \frac{(k \sigma_1^2 + \sigma_2^2) \left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{(\mu_1 - \mu_2)^2}$
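
The two-sample sample size formulas can be tried out in R. The following is only a sketch: it assumes the two-sided normal-approximation formula with n2 = k * n1, the function name two_sample_n and its arguments are made up for this example, and the planning values (variances 3.49 and 6.0, a difference of 1.3 cm^3 to detect, 80% power) are borrowed from the brain and breast summary statistics purely for illustration.

```r
# Sketch: per-group sample sizes for a two-sided, two-sample comparison of
# means, using the normal-approximation formula with n2 = k * n1.
# two_sample_n and its argument names are hypothetical, chosen for this example.
two_sample_n <- function(delta, var1, var2, alpha = 0.05, power = 0.80, k = 1) {
  # z_{1 - alpha/2} + z_{1 - beta}, where power = 1 - beta
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  # n1 = (sigma1^2 + sigma2^2 / k) * z^2 / delta^2, before rounding
  n1 <- (var1 + var2 / k) * z^2 / delta^2
  # n2 = k * n1; round both up so the target power is at least met
  c(n1 = ceiling(n1), n2 = ceiling(k * n1))
}

# Illustrative planning values (assumptions, not results from the study)
equal_groups  <- two_sample_n(delta = 1.3, var1 = 3.49, var2 = 6.0)
twice_as_many <- two_sample_n(delta = 1.3, var1 = 3.49, var2 = 6.0, k = 2)
print(equal_groups)
print(twice_as_many)
```

Because the sketch uses normal quantiles, it will usually give a slightly smaller n than a t-based calculation such as R's power.t.test, which assumes equal variances and equal group sizes.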