Ph.D. COURSE IN BIOSTATISTICS
DAY 2
SOME RESULTS ABOUT MEANS AND VARIANCES
The sample mean and the sample variance were used to describe a
typical value and the variation in the sample.
We may similarly use the population mean, the expected value, and
the population variance to describe the typical value and the variation
in a population.
These values are often referred to as the theoretical values, and the
sample mean and the sample variance are considered as estimates
of the analogous population quantities.
If X represents a random variable, e.g. birth weight or blood pressure,
the mean and variance are often denoted
mean = $E(X) = \mu$
variance = $Var(X) = \sigma^2$
The notation is also used when the distribution is not normal.
1
The random variation in a series of observations is transferred to
uncertainty, i.e. sampling error or sampling variation, in estimates
computed from the observations. The average, or sample mean,
is an important example of such an estimate.
Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a
population with mean $\mu$ and variance $\sigma^2$. The average $\bar{X}$
is then itself a random variable.
If several samples of size n are drawn from the population,
the average value will vary between samples.
Terminology:
A ”random sample” implies that the observations are mutually
independent replicates of the experiment ”take a unit at random from
the population and measure the value on this unit”.
For the average (sample mean) we have
$E(\bar{X}) = \mu$
$Var(\bar{X}) = \sigma^2 / n$
2
The sample mean is an unbiased estimate of the population mean.
The variance of the sample mean is proportional to the variance of
a single observation and inversely proportional to the sample size.
The standard deviation of the sample mean =
standard error of the mean = $\sigma/\sqrt{n}$ = s.e.m.
Interpretation: The expected value, the variance, and the standard error
of the mean are the values of these quantities that one would expect to
find if we generated a large sample of averages each obtained from
independent random samples of size n from the same population.
The result shows that the precision of the sample mean increases with
the sample size.
Moreover, if the variation in the population follows a normal distribution
the sampling variation of the average also follows a normal distribution
$\bar{X} \sim N(\mu, \sigma^2/n)$
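As a small numerical illustration (a sketch using the values s = 7.53 and n = 213 that appear in the fish oil example later in these notes), the standard error of the mean can be computed directly in Stata:
display 7.528853/sqrt(213)
which returns approximately 0.516, the standard error that reappears in the ttest output on page 28.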
3
Consider a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from a population
with mean $\mu_X$ and variance $\sigma_X^2$, and an independent random sample
$Y_1, Y_2, \ldots, Y_m$ of size $m$ from a population with mean $\mu_Y$ and
variance $\sigma_Y^2$. For the difference between the sample means we have
$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_X - \mu_Y$
$Var(\bar{X} - \bar{Y}) = Var(\bar{X}) + Var(\bar{Y}) = \dfrac{\sigma_X^2}{n} + \dfrac{\sigma_Y^2}{m}$
These results are a consequence of the following general results
• Linear transformations of random variables (change of scale)
$E(a + bX) = a + bE(X)$
$Var(a + bX) = b^2 Var(X)$
• The expected value of a sum of random variables
$E(a_0 + a_1 X_1 + \cdots + a_n X_n) = a_0 + a_1 E(X_1) + \cdots + a_n E(X_n)$
• The variance of a sum of independent random variables
$Var(a_0 + a_1 X_1 + \cdots + a_n X_n) = a_1^2 Var(X_1) + \cdots + a_n^2 Var(X_n)$
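For example, writing $\bar{X} = \frac{1}{n}X_1 + \cdots + \frac{1}{n}X_n$ and applying these rules gives
$E(\bar{X}) = \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu = \mu$ and $Var(\bar{X}) = n \cdot \frac{1}{n^2}\sigma^2 = \sigma^2/n$,
the results for the sample mean stated on page 2.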
4
For a random sample of size n from a normal distribution the result
above can be reformulated as
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim$ standard normal distribution
The standard normal distribution is tabulated, so for given values of
$\mu$ and $\sigma$ this relation can be used to derive probability statements
about the sample mean.
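For instance (a sketch with hypothetical values $\mu = 0$, $\sigma = 7.5$, and n = 213, not taken from the notes), the probability that the sample mean exceeds 1 can be computed with Stata's norm function (see page 23):
display 1 - norm((1 - 0)/(7.5/sqrt(213)))
which returns roughly 0.026.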
The sampling distribution of the variance
The sample variance s2 is also a statistic derived from the observations
and therefore subject to sampling variation. For a random sample from
a normal distribution one may show that
$E(s^2) = \sigma^2$
so the sample variance is an unbiased estimate of the population
variance
5
For a random sample of size n from a normal distribution the
sampling error of the sample variance can also be described.
We have
$(n - 1)\dfrac{s^2}{\sigma^2} \sim \chi^2$-distribution with $f = n - 1$ degrees of freedom
The $\chi^2$-distributions (chi-square distributions) are tabulated, so for
a given value of $\sigma^2$ this relation can be used to derive probability
statements about the sample variance. A $\chi^2$-distribution is the
distribution of a sum of independent, squared standard normal variates.
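As a sketch of such a probability statement (with hypothetical values n = 10 and $\sigma^2 = 1$), the probability that the sample variance exceeds 2 is $P(\chi^2_9 \geq 9 \times 2)$, which Stata's chi2tail function (see page 30) computes as
display chi2tail(9, (10-1)*2/1)
returning approximately 0.035.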
[Figure: densities of the sampling distribution of the sample variance for n = 5, 10, 20, 50, and 100]
The distribution of the sample variance when $\sigma^2 = 1$ for various n.
6
INTRODUCTION TO STATISTICAL INFERENCE
Statistical inference: The use of a statistical analysis of data to draw
conclusions from observations subject to random variation.
Data are considered as a sample from a population (real or hypothetical)
The purpose of the statistical analysis is to make statements about
certain aspects of this population
The basic components of a statistical analysis
• Specification of a relevant statistical model (the scientific problem
is ”translated” to a statistical problem)
• Estimation of the population characteristics (the model parameters)
• Validation of the underlying assumptions
• Test of hypotheses about the model parameters.
A statistical analysis is always based on a statistical model, which
formalizes the assumptions made about the sampling procedure and
the random and systematic variation in the population from which
the sample is drawn.
7
The validity of the conclusions depends on the degree to which
the statistical model gives an adequate description of the sampling
procedure and the random and systematic variation.
Consequently, checking the appropriateness of the underlying
assumptions (i.e. the statistical model) is an important part of a
statistical analysis.
The statistical model should be seen as an approximation to the
real world.
The choice of a suitable model is always a balance between
complex models, which are close approximations,
but very difficult to use in practice,
and
simple models, which are crude approximations,
but easy to apply
8
Example: Comparing the efficacy of two treatments
Design: Experimental units (e.g. patients) are allocated to two
treatments. For each experimental unit in both treatment groups an
outcome is measured. The outcome reflects the efficacy of the treatment.
Purpose: To compare the efficacy of the two treatments
Analysis: To summarize the results the average outcome is computed
in each group and the two averages are compared.
Possible explanations for a discrepancy between the average outcome
in the two groups
• The treatments have different efficacy. One is better than the other
• Random variation
• Bias originating from other differences between the groups. Other
factors which influence the outcome may differ between the groups
and lead to apparent differences between the efficacy of the two
treatments (confounding).
9
A proper design of the study (randomization, blinding etc.) can
eliminate or reduce the bias and therefore make this explanation unlikely.
Bias correction (control of confounding) is also possible in the
statistical analysis.
The statistical analysis is performed to estimate the size of the treatment
difference and evaluate if random variation is a plausible explanation for
this difference.
If the study is well-designed and the statistical analysis indicates that
random variation is not a plausible explanation for the difference, we
may conclude that a real difference between the efficacy of the two
treatments is the most likely explanation of the findings.
The statistical analysis can also identify a range of plausible values, a
so-called confidence interval, for the difference in efficacy.
10
STATISTICAL ANALYSIS OF A SAMPLE FROM
A NORMAL DISTRIBUTION
Example. Fish oil supplement and blood pressure in pregnant women
Purpose: To evaluate the effect of fish oil supplement on diastolic
blood pressure in pregnant women.
Design: Randomised controlled clinical trial on 430 pregnant women,
enrolled at week 30 and randomised to either fish oil supplement or
control.
Data: Diastolic and systolic blood pressure at week 30 and 37
(source: Sjurdur Olsen)
The Stata file fishoil.dta contains the following variables
grp       treatment group, 1 for control, 2 for fish oil
group     a string variable with the name of the group allocation
difsys    increase in systolic blood pressure from week 30 to week 37
difdia    increase in diastolic blood pressure from week 30 to week 37
11
We shall here consider the change in diastolic blood pressure
from week 30 to week 37.
Stata
histogram difdia , by(group)
[Figure: histograms of difdia (range -40 to 40) with density on the vertical axis, in two panels titled Control and Fishoil. Graphs by group]
12
Stata
qnorm difdia if grp==1, title("Control") ///
    saving(q1,replace)
qnorm difdia if grp==2, title("Fish oil") ///
    saving(q2,replace)
graph combine q1.gph q2.gph
[Figure: normal Q-Q plots of difdia against the Inverse Normal, in two panels titled Control and Fish oil]
13
For both groups the histogram and the probability plot correspond
closely to the expected behavior of normally distributed data.
Hence our statistical model is:
The observations in each group can be considered as a random
sample from a normal distribution with unknown parameters as below:
Group      Mean       Variance
Control    $\mu_1$    $\sigma_1^2$
Fish oil   $\mu_2$    $\sigma_2^2$
The two sets of observations are independent.
The ultimate goal of the analysis is to compare the two treatments
with respect to the expected change in blood pressure.
We shall return to this analysis later.
First we want to examine the change in the diastolic blood pressure
in women in the control group.
14
We now consider the control group and focus on the increase in
diastolic blood pressure
Problem:
Do the data suggest that the diastolic blood pressure in pregnant
women increases from week 30 to week 37?
Data
The observed values of the change in diastolic blood pressure in the
213 women who participated in the study
Statistical model
The data are considered as a random sample of size 213 from a
normal distribution with mean $\mu$ and variance $\sigma^2$.
The parameter $\mu$ describes the expected change and the parameter
$\sigma^2$ describes the random variation caused by biological factors and
measurement errors.
15
Assumptions
The assumptions of the statistical model are
1. The observations are independent
2. The observations have the same mean and the same variance
3. A normal distribution describes the variation.
Checking the validity of the assumptions is usually done by various
plots and diagrams. Knowledge of the measurement process can often
help in identifying points which need special attention.
Re 1. Checking independence often involves going through the
sampling procedure. Here the assumption would e.g. be violated
if the same woman contributes more than one pregnancy.
Re 2. Do we have “independent replications of the same experiment”?
Factors that are known to be associated with changes in blood
pressure are not accounted for in the model. They contribute to
the random variation.
16
Re 3. The plots above indicate that a normal distribution gives an
adequate description of the data
Estimation
The estimation problem: Find the normal distribution that best fits
the data.
Solution: Use the normal distribution with mean equal to the sample
mean and variance equal to the sample variance.
sum difdia if grp==1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
    ---------+--------------------------------------------------------
      difdia |       213    1.901408    7.528853        -28         29
i.e.
$\hat{\mu} = \bar{x} = 1.90$
$\hat{\sigma}^2 = s^2 = 56.68 \quad (s = 7.53)$
Note: A normal distribution is completely determined by the values
of the mean and the variance.
Convenient notation: A “^” on top of a population parameter is
used to identify the estimate of the parameter
17
Question: Do the data suggest a systematic change in the diastolic
pressure? No systematic change means that the expected change is 0,
i.e.
Hypothesis: The data are consistent with the value of $\mu$ being 0.
This hypothesis is usually written as $H: \mu = 0$
We have observed an average value of 1.90. Is sampling variation a
possible explanation for the difference between the observed value of
1.90 and the expected value of 0?
Statistical test
To evaluate if the random variation can account for the difference we
assume that the hypothesis is true and compute the probability that
the average value in a random sample of size 213 differs by at least
as much as the observed value.
From the model assumptions we conclude that the average can be
considered as an observation from a normal distribution with
mean 0 and standard deviation equal to $\sigma/\sqrt{n} = \sigma/\sqrt{213}$
18
Consequently, the standardized value
$z = \dfrac{\bar{x} - 0}{\sigma/\sqrt{213}}$
is an observation from a standard normal distribution.
Problem: The population standard deviation is unknown, but in large
samples we may use the sample standard deviation and still rely on
the normal distribution. Small samples are considered later.
Replacing $\sigma$ with the estimate $s$ we therefore get
"z" $= \dfrac{\bar{x} - 0}{s/\sqrt{213}} = \dfrac{1.901 - 0}{7.529/\sqrt{213}} = \dfrac{1.901}{0.516} = 3.69$
For a normal distribution a value more than 3 standard deviations
from the mean is very unlikely to occur.
Using a table of the standard normal distribution function we find that
a value that deviates more than 3.69 in either direction occurs with a
probability of 0.00023.
19
p-value
The probability computed above is called the p-value.
p-value = the probability of obtaining a value of the test statistic at
least as extreme as the one actually observed if the hypothesis is true.
Usually extreme values in both tails of the distribution are included
(two-sided test), so in the present case
$p\text{-value} = P(z \leq -3.69) + P(z \geq 3.69) = 2 \times 0.000114 = 0.00023$
[Figure: standard normal density with the two-sided tail areas beyond -3.69 and 3.69 marked]
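In Stata this tail probability can be obtained directly (a sketch using the norm function described on page 23):
display 2*(1 - norm(3.6858))
which returns approximately 0.00023.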
The calculation indicates that sampling variation is a highly implausible
explanation for the observed change in blood pressure. The observed
deviation from the hypothesized value is statistically significant.
20
Usually a hypothesis is rejected if the p-value is less than 0.05.
SMALL SAMPLES – use of the t-distribution
To compute the p-value above we replaced the unknown population
standard deviation with the sample standard deviation and referred
the value of the test statistic to a normal distribution.
For large samples this approach is unproblematic, but for small samples
the p-value becomes too small, since the sampling error of the sample
standard deviation is ignored. Statistical theory shows that the correct
distribution of the test statistic is a so-called t-distribution with f = n – 1
degrees of freedom.
The t-distribution has been tabulated, so we are still able to compute
a p-value. Note that the t-distribution does not depend on the
parameters $\mu$ and $\sigma^2$, so the same table applies in all situations.
As the sample size increases the t-distribution will approach a standard
normal distribution. Usually the approximation is acceptable for samples
larger than 60, say.
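A small illustration of the difference (a sketch with a hypothetical sample of size n = 10, i.e. 9 degrees of freedom):
display 2*(1 - norm(3.69))
display 2*ttail(9,3.69)
The first command (the normal approximation) returns about 0.0002, the second (the t-distribution) about 0.005, so ignoring the sampling error of s understates the p-value by more than a factor of 20 in this small-sample case.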
21
If we again compute
$t = \dfrac{\bar{x} - 0}{s/\sqrt{213}} = 3.69$
but this time look up the value in a table of a t-distribution with f = 212
degrees of freedom, we get p = 0.00029. Since the sample is relatively
large the result is almost identical to the one above.
[Figure: densities of t-distributions for n = 5, 20, and 100 together with the standard normal density]
A comparison of t-distributions with 4, 19, and 99 degrees of freedom
and a standard normal distribution (the black curve)
22
STATA: PROBABILITY CALCULATIONS
Output from statistical programs like Stata usually also includes
p-values, so statistical tables are rarely needed. Moreover, Stata
has a lot of built-in functions that can compute almost any kind of
probability. Write help probfun to see the full list.
Some examples
display norm(3.6858) returns .99988601, the value of the
cumulative probability function of a standard normal distribution at
3.6858, i.e. $P(Z \leq 3.6858)$, the probability that a standard normal
variate is less than or equal to 3.6858
display ttail(212,3.6858) returns .00014478, the probability
that a t-statistic with 212 degrees of freedom is larger than 3.6858.
display Binomial(224,130,0.5134) returns .02608126, the
probability of getting at least 130 successes from a Binomial distribution
with n = 224 and p = 0.5134.
23
ONE SAMPLE t-TEST: THE GENERAL CASE
Above we derived the t-test of the hypothesis $H: \mu = 0$
The same approach can be used to test if any specified value is
consistent with the data. If we e.g. want to test the hypothesis
$H: \mu = 2$ we compute
$t = \dfrac{\bar{x} - 2}{s/\sqrt{213}} = \dfrac{1.901 - 2}{7.529/\sqrt{213}} = \dfrac{-0.099}{0.516} = -0.1911$
display 2*ttail(212,0.1911) returns the p-value .84863014,
so an expected change of 2 is compatible with the data and can not
be rejected.
Note: The function ttail gives a probability in the upper tail of the
distribution. A negative t-value should therefore be replaced by the
corresponding positive value when computing the p-value.
24
CONFIDENCE INTERVALS
In the example the observed average change in blood pressure is
1.901, and this value was used as an estimate of the expected change $\mu$.
Values close to 1.901 are also compatible with the data; we saw e.g.
that the value 2 could not be rejected.
Problem: Find the range of values for the expected change that is
supported by the data.
A confidence interval is the solution to this problem.
Formally:
A 95% confidence interval identifies the values of the unknown parameter
which would not be significantly contradicted by a (two-sided) test at the
5% level, because the p-value associated with the test statistic for each
of these values is larger than 5%
25
Frequency interpretation: If the experiment is repeated a large
number of times and a 95% confidence interval is computed for each
replication, then 95% of these confidence intervals will contain the
true value of the unknown parameter.
How to calculate the 95% confidence interval
The limits of the confidence interval are the values of $\mu$ for which the
test statistic $t$ equals the 2.5 or the 97.5 percentile of a t-distribution
with $n - 1$ degrees of freedom.
The t-distribution is symmetric around 0, so $t_{0.025} = -t_{0.975}$ and
the confidence limits are therefore given by the values of $\mu$ satisfying
$\left| \dfrac{\bar{x} - \mu}{s/\sqrt{n}} \right| \leq t_{0.975}$
i.e.
$\bar{x} - t_{0.975}\dfrac{s}{\sqrt{n}} \leq \mu \leq \bar{x} + t_{0.975}\dfrac{s}{\sqrt{n}}$
The formula shows that the confidence interval becomes narrower
as the sample size increases.
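As an illustration (a sketch with a hypothetical standard deviation of s = 7.5), the half-width $t_{0.975}\, s/\sqrt{n}$ of the interval can be computed for two sample sizes:
display invttail(49,0.025)*7.5/sqrt(50)
display invttail(199,0.025)*7.5/sqrt(200)
giving roughly 2.1 for n = 50 and 1.0 for n = 200.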
26
Example continued
In Stata the command invttail gives the upper percentiles and
display invttail(212,0.025) returns 1.971217.
The 95% confidence limits for the expected change in diastolic blood
pressure therefore become
$\bar{x} \pm t_{0.975}\dfrac{s}{\sqrt{n}} = 1.9014 \pm 1.9712 \times 0.5159$, i.e. 0.88 and 2.92,
and the 95% confidence interval becomes $0.88 \leq \mu \leq 2.92$
99% confidence intervals are derived from the upper 0.5 percentile in
a similar way.
Also, one-sided confidence intervals can be defined and computed
from one-sided statistical tests (statistical tests are called one-sided
if large deviations in only one direction are considered extreme).
27
STATA: ONE SAMPLE t-TEST
A single command in Stata will give all the results derived so far.
ttest difdia=0 if grp==1
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  difdia |     213    1.901408    .5158685    7.528853    .8845197    2.918297
------------------------------------------------------------------------------
Ho: mean(difdia) = 0   (the hypothesis tested)                  t =     3.6858
                                                degrees of freedom =       212

 Ha: mean < 0               Ha: mean != 0                  Ha: mean > 0
 Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0003         Pr(T > t) = 0.0001
                            (two-sided) p-value
To test the hypothesis $H: \mu = 2$ use ttest difdia=2 if grp==1
instead
28
Statistical inference about the variance
So far we have looked at statistical inference about the mean of a
normal population based on a random sample.
In the same setting we can also derive a test statistic for hypotheses
about the variance (or the standard deviation) and obtain confidence
intervals for this parameter. The arguments are based on the result
about the sampling distribution of the sample variance (see p. 6)
$(n - 1)\dfrac{s^2}{\sigma^2} \sim \chi^2$-distribution with $f = n - 1$ degrees of freedom
Inference problems involving a hypothesis about the variance are
much less common, but may e.g. arise in studies of methods of
measurement
Example continued
Suppose we for some reason want to see if the change in diastolic blood
pressure has a standard deviation of 7, or equivalently a variance of 49.
29
To test the hypothesis $H: \sigma = 7$ we could compute
$(213 - 1)\dfrac{s^2}{49} = (213 - 1)\dfrac{56.68}{49} = 245.24$
and see if this value is extreme when referred to a $\chi^2$-distribution
on 212 degrees of freedom.
Using Stata’s probability calculator, display chi2(212,245.24),
we get .94165889. This is the probability of a value less than or
equal to 245.24. The probability of getting a value larger than 245.24
is 1-.94165889 = .05834111. Stata can also give this result
directly from the command display chi2tail(212,245.24).
The p-value is 2 times the smallest tail probability, i.e. 0.117.
A standard deviation of 7 can not be rejected.
Rule:
If the test statistic, x, is smaller than the degrees of freedom, f, use
display 2*chi2(f,x), else use display 2*chi2tail(f,x)
30
Confidence intervals for variances and standard deviations
A 95% confidence interval for the population variance $\sigma^2$ is given by
$\dfrac{f \cdot s^2}{\chi^2_{0.975}} \leq \sigma^2 \leq \dfrac{f \cdot s^2}{\chi^2_{0.025}}$
where $f$ is the degrees of freedom and $\chi^2_{0.025}$ and $\chi^2_{0.975}$ are the 2.5
and the 97.5 percentiles of a $\chi^2$-distribution with $f$ degrees of freedom.
A 95% confidence interval for the standard deviation therefore
becomes
$s\sqrt{\dfrac{f}{\chi^2_{0.975}}} \leq \sigma \leq s\sqrt{\dfrac{f}{\chi^2_{0.025}}}$
Example – diastolic blood pressure continued
Stata’s probability calculator has a function invchi2 that computes
percentiles of $\chi^2$-distributions. We find that
display invchi2(212,0.025) gives 173.5682
display invchi2(212,0.975) gives 254.2178
31
A 95% confidence interval for the standard deviation is therefore
$7.5289\sqrt{\dfrac{212}{254.2178}} \leq \sigma \leq 7.5289\sqrt{\dfrac{212}{173.5682}}$
i.e. $6.88 \leq \sigma \leq 8.32$
More Stata
A test of a hypothesis about the standard deviation is carried out
by the command
sdtest difdia=7 if grp==1
One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  difdia |     213    1.901408    .5158685    7.528853    .8845197    2.918297
------------------------------------------------------------------------------
    sd = sd(difdia)                                        c = chi2 = 245.2435
Ho: sd = 7   (the hypothesized value)            degrees of freedom =      212

 Ha: sd < 7                 Ha: sd != 7                    Ha: sd > 7
 Pr(C < c) = 0.9417         2*(C > c) = 0.1166             Pr(C > c) = 0.0583
                            (two-sided) p-value
Note that the 95% confidence interval is the confidence interval for
the population mean and not for the standard deviation.
32
STATISTICAL ANALYSIS OF TWO INDEPENDENT
SAMPLES FROM NORMAL DISTRIBUTIONS
Example. Fish oil supplement and blood pressure in pregnant women
The study was a randomized trial carried out to evaluate the effect
of fish oil supplement on diastolic blood pressure in pregnant women.
Pregnant women were assigned at random to one of two treatment
groups. One group received fish oil supplement, the other was a
control group.
Here we shall compare the two treatments using difdia, the change
in diastolic blood pressure, as outcome, or response.
We have already seen histograms and Q-Q plots of the distribution
of difdia in each of the two groups (see p. 12-13) and these plots
suggest that the random variation may be adequately described by
normal distributions.
33
The standard analysis of this problem is based on the following
statistical model
The observations in each group can be considered as a random
sample from a normal distribution with unknown parameters as below:
Group      Mean       Variance
Control    $\mu_1$    $\sigma^2$
Fish oil   $\mu_2$    $\sigma^2$
The two sets of observations are independent.
Note that the size of the random variation is assumed to be the same
in the two groups, so this assumption should also be checked.
The purpose of the analysis is to quantify the difference between the
expected change in the two groups and assess if this difference is
statistically different from 0
34
Model assumptions
1. Independence within and between samples
2. Random samples from populations with the same variance
3. The random variation in each population can be described by a
normal distribution
Note: The model assumptions imply that if this difference is not
statistically different from 0 we may conclude that the distributions are
not significantly different, since a normal distribution is completely
determined by the parameters $\mu$ and $\sigma^2$.
Re 1. Inspect the design and the data. Repeated observations on the
same individual usually imply violation of the independence assumption.
Re 2. A formal test of the hypothesis of identical variances of normal
distributions is described below.
Re 3. Histograms and Q-Q plots, see page 12-13
35
Estimation
Basic idea: population values are estimated by the corresponding
sample values. This gives two estimates of the variance, which should
be pooled to a single estimate.
Stata performs the basic calculations with
bysort grp: summarize difdia
__________________________________________________________________
-> grp = control

    Variable |       Obs        Mean    Std. Dev.       Min        Max
    ---------+--------------------------------------------------------
      difdia |       213    1.901408    7.528853        -28         29

__________________________________________________________________
-> grp = fish oil

    Variable |       Obs        Mean    Std. Dev.       Min        Max
    ---------+--------------------------------------------------------
      difdia |       217    2.193548    8.364904        -28         31
i.e. control group: mean = 1.90
fish oil group: mean = 2.19
36
The standard deviations are rather similar, so let us assume for
a moment that it is reasonable to derive a pooled estimate. How
should this be done?
Statistical theory shows that the best approach is to compute a pooled
estimate of the variance as a weighted average of the sample
variances and use the corresponding standard deviation as the pooled
estimate. The weighted average uses weights proportional to the
degrees of freedom, i.e. f = n – 1. Hence
$s_{pooled}^2 = s_p^2 = \dfrac{f_1 s_1^2 + f_2 s_2^2}{f_1 + f_2}$
and
$s_{pooled} = s_p = \sqrt{s_p^2}$
Stata does not include this estimate in the output above, but the
result is produced by the commands
quietly regress difdia grp
display e(rmse)
giving the output 7.9617662, i.e. $s_p$ = 7.962.
(Writing quietly in front suppresses output from the command;
the string variable group can not be used here.)
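The pooled estimate can also be checked by hand (a sketch using the sample standard deviations reported on page 36 and the weights f = n - 1 from the formula above):
display ((213-1)*7.528853^2 + (217-1)*8.364904^2)/(213-1+217-1)
display sqrt(((213-1)*7.528853^2 + (217-1)*8.364904^2)/428)
giving a pooled variance of about 63.4 and a pooled standard deviation of about 7.962.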
37
Statistical test
comparing means of two independent samples
The expected change in diastolic blood pressure is slightly higher in
the fish oil group. Does this reflect a systematic effect?
To see if random variation can explain the difference we test the
hypothesis
$H: \mu_1 = \mu_2$
of identical population means in the two samples.
The line of argument is similar to the one that was used in the one-sample
case. Assume that the hypothesis is true. The observed
difference between the two means must then be caused by sampling
variation.
The plausibility of this explanation is assessed by computing a p-value,
the probability of obtaining a result at least as extreme as the observed.
38
From the model assumptions we conclude that if the hypothesis is true
then the difference between the sample means can be considered as
an observation from a normal distribution with mean 0 and variance
$Var(\bar{X}_1 - \bar{X}_2) = \dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2} = \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$
Consequently, the standardized value
$\dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
is an observation from a standard normal distribution. If the standard
deviation $\sigma$ is replaced by the pooled estimate $s_p$ we arrive at the
test statistic
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
39
To derive the p-value this test statistic should be referred to a
t-distribution with $f_1 + f_2 = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$ degrees
of freedom, since one may show that the sampling distribution of the pooled
variance estimate is identical to the sampling distribution of a variance
estimate with $f_1 + f_2$ degrees of freedom (see page 6).
We get
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{1.9014 - 2.1935}{7.9618\sqrt{\dfrac{1}{213} + \dfrac{1}{217}}} = -0.38$
and the p-value becomes 0.70. The difference is not statistically
significantly different from 0.
40
Confidence intervals for the parameters of the model
The model has three unknown parameters $\mu_1$, $\mu_2$, and $\sigma$.
A 95% confidence interval for the expected value $\mu_1$ becomes
$\bar{x}_1 - t_{0.975}\dfrac{s_p}{\sqrt{n_1}} \leq \mu_1 \leq \bar{x}_1 + t_{0.975}\dfrac{s_p}{\sqrt{n_1}}$
and similarly for $\mu_2$. Note that the pooled standard deviation is used
and $t_{0.975}$ is therefore the 97.5 percentile of a t-distribution with $f_1 + f_2$
degrees of freedom. For the change in diastolic blood pressure we get
$1.901 \pm 1.966 \times \dfrac{7.962}{\sqrt{213}}$, i.e. $0.83 \leq \mu_1 \leq 2.97$
Note: some programs, e.g. Stata, use the separate sample standard
deviation when computing these confidence intervals.
A 95% confidence interval for the standard deviation is based on the
pooled estimate with 212 + 216 = 428 degrees of freedom (see page 31)
$7.962\sqrt{\dfrac{428}{487.21}} \leq \sigma \leq 7.962\sqrt{\dfrac{428}{372.57}}$
i.e. $7.46 \leq \sigma \leq 8.53$
41
Confidence intervals for the difference between means
In a two-sample problem the parameter of interest is usually the difference
$\mu_1 - \mu_2$ between the expected values. From the results above
(page 39) we get
$(\bar{x}_1 - \bar{x}_2) - t_{0.975}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} \leq \mu_1 - \mu_2 \leq (\bar{x}_1 - \bar{x}_2) + t_{0.975}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
where the t-percentile refers to a t-distribution with $f_1 + f_2$ degrees of
freedom.
The example
$(1.901 - 2.194) \pm 1.966 \times 7.962\sqrt{\dfrac{1}{213} + \dfrac{1}{217}}$
i.e. $-1.80 \leq \mu_1 - \mu_2 \leq 1.22$
42
STATA: TWO SAMPLE t-TEST (equal variances)
A single command in Stata gives all the results derived so far except
an estimate of the pooled variance (see page 37)
ttest difdia , by(grp)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 control |     213    1.901408    .5158685    7.528853    .8845197    2.918297
fish oil |     217    2.193548    .5678467    8.364904    1.074318    3.312778
---------+--------------------------------------------------------------------
combined |     430    2.048837    .3835675    7.953826    1.294932    2.802743
---------+--------------------------------------------------------------------
    diff |             -.2921399    .7679341              -1.801531   1.217252
------------------------------------------------------------------------------
    diff = mean(control) - mean(fish oil)                         t =  -0.3804
Ho: diff = 0   (the hypothesis tested)           degrees of freedom =      428

 Ha: diff < 0               Ha: diff != 0                  Ha: diff > 0
 Pr(T < t) = 0.3519         Pr(|T| > |t|) = 0.7038         Pr(T > t) = 0.6481
                            (two-sided) p-value
(Note: the Std. Dev. in the combined row is the s.d. in the combined
samples, not the pooled s.d.)
43
Comparing the variances: The F-distribution
In the statistical model we assumed the same variance in the two
populations. To assess this assumption we consider a statistical
test of the hypothesis $H: \sigma_1^2 = \sigma_2^2$
An obvious test statistic is the ratio of sample variances
$F = \dfrac{s_1^2}{s_2^2}$
A value close to 1 is expected if the hypothesis is true. Both small and
large values would suggest that the variances differ.
From statistical theory it follows that the distribution of the ratio of two
independent variance estimates is a so-called F-distribution if the
corresponding population variances are identical (i.e. if H is true).
The F-distribution is characterized by a pair of degrees of freedom
(the degrees of freedom for the two variance estimates). Like normal,
t-, and chi-square distributions the F-distributions are extensively
tabulated.
44
Comparing the variances
In practice the hypothesis of equal variances is tested by computing
$F_{obs} = \dfrac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)} = \dfrac{\text{largest variance estimate}}{\text{smallest variance estimate}}$
and the p-value is then obtained as $p = 2 \cdot P(F \geq F_{obs})$ where the pair
of degrees of freedom are those of the numerator and the denominator.
Example
For the change in diastolic blood pressure we have
$F_{obs} = \dfrac{69.97}{56.68} = 1.2344$
Stata’s command display 2*Ftail(216,212,1.2344) returns
0.125, so the p-value becomes 0.125.
The difference between the two standard deviations is not statistically
significant.
45
STATA: COMPARISON OF TWO VARIANCES
Stata’s command sdtest can also be used to compare two variances.
Write
sdtest difdia , by(grp)
Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 control |     213    1.901408    .5158685    7.528853    .8845197    2.918297
fish oil |     217    2.193548    .5678467    8.364904    1.074318    3.312778
---------+--------------------------------------------------------------------
combined |     430    2.048837    .3835675    7.953826    1.294932    2.802743
------------------------------------------------------------------------------
    ratio = sd(control) / sd(fish oil)                            f =   0.8101
Ho: ratio = 1   (the hypothesis tested)        degrees of freedom = 212, 216

 Ha: ratio < 1              Ha: ratio != 1                 Ha: ratio > 1
 Pr(F < f) = 0.0622         2*Pr(F < f) = 0.1245           Pr(F > f) = 0.9378
                            (two-sided) p-value
46
Comparing the means when variances are unequal
Problem: What if the assumption of equal variances is unreasonable?
Some solutions:
1. Try to obtain homogeneity of variances by transforming the
observations in a suitable way, e.g. by working with log-transformed
data.
2. Use an approximate t-test, that does not rely on equal variances.
The approximate t-test has the form
$t_{approx} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
Under the hypothesis of equal means the distribution of this test
statistic is approximately equal to a t-distribution. To compute the
degrees of freedom for the approximate t-distribution first compute
47
$c = \dfrac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}$
the degrees of freedom are then obtained as
$f_{approx} = \left( \dfrac{c^2}{n_1 - 1} + \dfrac{(1 - c)^2}{n_2 - 1} \right)^{-1}$
(a numeric check for the fish oil example is sketched after the list below)
3. Use a non-parametric test, e.g. a Wilcoxon-Mann-Whitney test.
We shall consider solution 1 next time and solution 3 later in the
course. The Stata command ttest computes solution 2 if the option
unequal is added.
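As a numeric check of these formulas (a sketch using the sample standard deviations from page 36):
* compute c and the approximate degrees of freedom for the fish oil example
scalar cc = (7.528853^2/213)/(7.528853^2/213 + 8.364904^2/217)
display 1/(cc^2/(213-1) + (1-cc)^2/(217-1))
which gives approximately 424.8, matching Satterthwaite's degrees of freedom in the Stata output on the next page.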
Note: When the variances of the two normal distributions differ
the hypothesis of equal means is no longer equivalent to the
hypothesis of equal distributions.
48
STATA: TWO SAMPLE t-TEST (unequal variances)
To compute the approximate t-test (solution 2 above) with Stata write
ttest difdia , by(grp) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 control |     213    1.901408    .5158685    7.528853    .8845197    2.918297
fish oil |     217    2.193548    .5678467    8.364904    1.074318    3.312778
---------+--------------------------------------------------------------------
combined |     430    2.048837    .3835675    7.953826    1.294932    2.802743
---------+--------------------------------------------------------------------
    diff |             -.2921399    .7671833              -1.800088   1.215808
------------------------------------------------------------------------------
    diff = mean(control) - mean(fish oil)                         t =  -0.3808
Ho: diff = 0                    Satterthwaite's degrees of freedom =  424.831

 Ha: diff < 0               Ha: diff != 0                  Ha: diff > 0
 Pr(T < t) = 0.3518         Pr(|T| > |t|) = 0.7035         Pr(T > t) = 0.6482
                            (two-sided) p-value of the approximate t-test
(Note: the confidence limits in the diff row are the approximate confidence
limits, and 424.831 is the degrees of freedom of the approximate t-test.)
49
SOME GENERAL COMMENTS ON STATISTICAL TESTS
To test a hypothesis we compute a test statistic, which follows a
known distribution if the hypothesis is true. We can therefore compute
the probability of obtaining a value of the test statistic at least as
extreme as the one observed. This probability is called the p-value.
The p-value describes the degree of support for the hypothesis found
in the data. The result of the statistical test is often classified as
”statistically significant” or ”non-significant” depending on whether or
not the p-value is smaller than a level of significance, often called $\alpha$,
and usually equal to 0.05.
The hypothesis being tested is often called the null hypothesis. A
null hypothesis always represents a simplification of the statistical model.
Hypothesis testing is sometimes given a decision theoretic formulation:
The null hypothesis is either true or false and a decision is made based
on the data.
50
When hypothesis testing is viewed as decisions, two types of error are
possible
• Type 1 error: Rejecting a true null hypothesis
• Type 2 error: Accepting (i.e. not rejecting) a false null hypothesis.
The level of significance specifies the risk of a type 1 error. In the
usual setting the null hypothesis is tested against an alternative
hypothesis which includes different values of the parameter, e.g.
$H_0: \mu = 0$   against   $H_A: \mu \neq 0$
The risk of a type 2 error depends on which of the alternative values
is the true value.
The power of a statistical test is 1 minus the risk of type 2 error. When
planning an experiment power considerations are sometimes used
to determine the sample size. We return to this in the last lecture.
Once the data are collected confidence intervals are the appropriate
way to summarize the uncertainty in the conclusions.
51
Relation between p-values and confidence intervals
In a two sample problem it is tempting to compare the 95% confidence
intervals of the two means and conclude that the hypothesis $\mu_1 = \mu_2$ is
non-significant if the 95% confidence intervals overlap.
This is not correct.
Overlapping 95% confidence intervals do not imply that the
difference is not significant at the 5% level. On the other hand, if the
95% confidence intervals do not overlap, the difference is statistically
significant at the 5% level (actually, the p-value is 1% or smaller).
This may at first seem surprising, but it is a simple consequence of the
fact that for independent samples the result
$Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$
implies that
$se(\bar{x} - \bar{y}) \leq se(\bar{x}) + se(\bar{y})$
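A small numerical illustration (a sketch assuming, hypothetically, standard errors of 1 in both groups): non-overlapping 95% confidence intervals require a difference of at least $1.96 \times (1 + 1) = 3.92$, whereas the corresponding test statistic is only $3.92/\sqrt{1^2 + 1^2} = 2.77$, and
display 2*(1 - norm(3.92/sqrt(2)))
returns approximately 0.006, i.e. 1% or smaller, as stated above.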
52