A Macro To Perform A T-Test For 2 Independent Samples Using Sufficient Statistics
Lan-Feng Tsai, Edwards Lifesciences LLC, Irvine, California

Abstract
The T-test is a commonly used statistical test to compare the mean of one sample to a predetermined value, the means of paired samples, or the means of 2 independent samples. The test statistic for the T-test depends on the data only through the sample means, sample standard deviations, and sample sizes. Therefore, if only these summary statistics are known and the raw data are unavailable, the result of the T-test can still be calculated. Because SAS procedures generally expect raw data to perform a T-test, a SAS macro that performs a T-test for 2 independent samples using sufficient statistics is proposed. The advantages of performing a T-test using sufficient statistics are also discussed.

Introduction
A T-test can be performed using only summary statistics because, under the normal model, the sample mean and sample variance are jointly sufficient statistics for the population mean and variance; they are also consistent and unbiased estimators of those parameters. The definition of a sufficient statistic is given as follows (Rice 1995):

A statistic $T(X_1, \ldots, X_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $X_1, \ldots, X_n$, given $T = t$, does not depend on $\theta$ for any value of $t$.

Calculation
The purpose of this macro is to perform a T-test for 2 independent sample means using sufficient statistics (summary statistics). The theoretical details can be found in statistical textbooks (Arnold 1990, Rice 1995) and are not discussed here. The following formulas are used to calculate the result of the T-test, where $t_{p,\nu}$ and $\chi^2_{p,\nu}$ denote the $p$-th quantiles of the t and chi-square distributions with $\nu$ degrees of freedom, and $\alpha$ is the significance level.

Confidence interval for sample mean:
$$\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Lower and upper confidence limits for sample standard deviation:
$$\left[\sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}}\,\right]$$

Sample standard error:
$$\frac{s}{\sqrt{n}}$$

Sample mean difference:
$$\bar{x}_1 - \bar{x}_2$$

Pooled sample standard deviation (or standard deviation for sample mean difference):
$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$$

Pooled sample standard error:
$$s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

Confidence interval for sample mean difference:
$$(\bar{x}_1-\bar{x}_2) \pm t_{1-\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$
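Confidence interval for pooled sample standard deviation:
$$\left[\sqrt{\frac{(n_1+n_2-2)s_p^2}{\chi^2_{1-\alpha/2,\,n_1+n_2-2}}},\ \sqrt{\frac{(n_1+n_2-2)s_p^2}{\chi^2_{\alpha/2,\,n_1+n_2-2}}}\,\right]$$

Degrees of freedom for equal variances:
$$df_{eq} = n_1 + n_2 - 2$$

Test statistic for equal variances:
$$t_{eq} = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$$

P-value for equal variances:
$$p_{eq} = 2\left[1 - P\left(T_{n_1+n_2-2} \le |t_{eq}|\right)\right]$$

Degrees of freedom for unequal variances (Satterthwaite approximation):
$$df_{uneq} = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}$$

Test statistic for unequal variances:
$$t_{uneq} = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$$

P-value for unequal variances:
$$p_{uneq} = 2\left[1 - P\left(T_{df_{uneq}} \le |t_{uneq}|\right)\right]$$

F value for folded F statistic:
$$f = \frac{\max(s_1^2,\, s_2^2)}{\min(s_1^2,\, s_2^2)}$$

Degrees of freedom for folded F statistic, where $L$ denotes the group with the larger sample variance and $S$ the group with the smaller one:
$$df_f = (n_L - 1,\ n_S - 1)$$

P-value for folded F statistic:
$$p_f = 2\left[1 - P\left(F_{n_L-1,\,n_S-1} \le f\right)\right]$$

Here $T_\nu$ and $F_{\nu_1,\nu_2}$ denote random variables having t and F distributions with the indicated degrees of freedom.

Discussion

Normality assumption
One of the assumptions of the T-test is normality, and this assumption cannot be examined without the raw data. Therefore, one should keep in mind that the normality assumption might not be valid when comparing 2 independent sample means using summary statistics in a T-test.

Advantages of performing T-tests using sufficient statistics
Sometimes statisticians obtain only summary statistics from clients, published literature, or other sources. T-tests can still be performed, keeping in mind the potential violation of the normality assumption. For example, we can compare the result of our product with the results of competitors' products from published journals, companies' websites, or advertisements. This macro can also be a handy tool when we would like to do a quick comparison of 2 sample means or to validate the results of T-tests from other statistical software.

The Macro
PROC TTEST in Version 8, which the macro in Appendix 1 calls, cannot perform a T-test from summary statistics unless the input data set contains a _STAT_ variable. Therefore, a _STAT_ variable must be created along with the summary statistics that are input to the macro. The macro parameters MIN and MAX for both groups and the alpha level are not required. However, if MIN's and MAX's are not used, the numeric missing value "." must be entered for them; otherwise the macro generates an incomplete assignment statement in its DATA step.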
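As a minimal sketch of the input layout described above, the DATA step below builds a summary-statistics data set by hand for two hypothetical groups, A and B. The data set name sumstat_demo and all numeric values are invented for illustration only. It follows the same layout as the data set the ttest8 macro creates (a GROUP class variable, a _STAT_ variable, and a SUMSTAT analysis variable), and it assumes, consistent with the macro notes above, that the optional MIN and MAX rows can be left out.

```sas
/* Hypothetical summary statistics for two groups, illustration only. */
/* One row per group per statistic; _STAT_ names the statistic that   */
/* each value of SUMSTAT holds (N, MEAN, or STD).                     */
/* Assumption: MIN and MAX rows are optional and are omitted here.    */
data sumstat_demo;
  length group $ 1 _stat_ $ 4;
  input group $ _stat_ $ sumstat;
  datalines;
A N    25
A MEAN 10.2
A STD  1.8
B N    30
B MEAN 9.1
B STD  2.4
;
run;

/* Because of the _STAT_ variable, PROC TTEST reads SUMSTAT as */
/* summary statistics rather than as raw observations.         */
proc ttest data=sumstat_demo;
  class group;
  var sumstat;
run;
```

The ttest8 macro generates an equivalent data set from its parameters (with MIN and MAX rows included) before calling PROC TTEST.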
The confidence intervals for the standard deviations of the groups are not calculated by PROC TTEST in Version 8. However, they can be calculated using the formula provided above, and this is expected to be addressed in SAS Version 9. The macro therefore writes the confidence intervals for the standard deviations of the groups to a separate text file. A macro (Appendix 2) to perform a T-test using summary statistics in SAS Version 6 is also shown.

Conclusion
One-sample T-tests and paired-sample T-tests can also be performed using only summary statistics (for paired samples, the summary statistics must be those of the within-pair differences). More macros for such T-tests will be developed using summary statistics in the future.

Acknowledgement
The author would like to thank William Anderson PhD, Rita Kristy, Brian Ramos, and Felicia Ho for their generous comments.

Reference
Arnold, S. F., Mathematical Statistics (1990), Prentice-Hall, Inc., p. 366, p. 373.
Rice, J. A., Mathematical Statistics and Data Analysis, Second Edition (1995), Duxbury Press, p. 280, p. 388.
SAS Institute Inc., SAS/STAT User's Guide, Version 8 (1999), SAS Institute Inc.

SAS is a registered trademark of SAS Institute Inc., Cary, NC, USA.

Contact Information
Lan-Feng Tsai
One Edwards Way
Irvine, CA 92614
E-mail: [email protected]

Appendix

1. Version 8 SAS code:

```sas
*********************************************************;
*** ttest8 macro: Perform V8 PROC TTEST using summary ***;
***               statistics                          ***;
***                                                   ***;
*** Positional parameters                             ***;
***   g1: sample name of group 1                      ***;
***   n1: sample size of group 1                      ***;
***   m1: sample mean of group 1                      ***;
***   s1: sample standard deviation of group 1        ***;
***   i1: sample minimum of group 1                   ***;
***   x1: sample maximum of group 1                   ***;
***   g2: sample name of group 2                      ***;
***   n2: sample size of group 2                      ***;
***   m2: sample mean of group 2                      ***;
***   s2: sample standard deviation of group 2        ***;
***   i2: sample minimum of group 2                   ***;
***   x2: sample maximum of group 2                   ***;
***   alpha: alpha level (default is 0.05)            ***;
***                                                   ***;
*** Note: i1, x1, i2, x2 are not required, enter      ***;
***       values or "." for missing.                  ***;
***                                                   ***;
*** Written by Lan-Feng Tsai                          ***;
*********************************************************;
%macro ttest8(g1, n1, m1, s1, i1, x1, g2, n2, m2, s2, i2, x2, alpha);
  %local i len;
  /* Length of GROUP = length of the longer group name */
  %let len=%sysfunc(max(%length(&g1), %length(&g2)));

  /* One row per group per statistic, flagged by _STAT_ */
  data sumstat;
    length group $&len _stat_ $4;
    %do i=1 %to 2;
      group="&&g&i"; sumstat=&&n&i; _stat_='N';    output;
      group="&&g&i"; sumstat=&&m&i; _stat_='MEAN'; output;
      group="&&g&i"; sumstat=&&s&i; _stat_='STD';  output;
      group="&&g&i"; sumstat=&&i&i; _stat_='MIN';  output;
      group="&&g&i"; sumstat=&&x&i; _stat_='MAX';  output;
    %end;
  run;

  proc print data=sumstat;
  run;

  proc ttest data=sumstat %if &alpha ne %then %do; alpha=&alpha %end;;
    class group;
    var sumstat;
  run;

  /* Chi-square confidence limits for each group standard deviation */
  data _null_;
    file 'ttestmacro_V8.txt';
    g1="&g1"; n1=&n1; s1=&s1;
    g2="&g2"; n2=&n2; s2=&s2;
    %if &alpha ne %then %do;
      lcl1=sqrt(((n1-1)*s1**2)/cinv(1-&alpha/2, n1-1));
      ucl1=sqrt(((n1-1)*s1**2)/cinv(&alpha/2, n1-1));
      lcl2=sqrt(((n2-1)*s2**2)/cinv(1-&alpha/2, n2-1));
      ucl2=sqrt(((n2-1)*s2**2)/cinv(&alpha/2, n2-1));
    %end;
    %else %do;
      lcl1=sqrt(((n1-1)*s1**2)/cinv(0.975, n1-1));
      ucl1=sqrt(((n1-1)*s1**2)/cinv(0.025, n1-1));
      lcl2=sqrt(((n2-1)*s2**2)/cinv(0.975, n2-1));
      ucl2=sqrt(((n2-1)*s2**2)/cinv(0.025, n2-1));
    %end;
    put @1 'Group' @21 'LCL STD' @41 'STD' @61 'UCL STD';
    put @1 g1      @21 lcl1      @41 s1    @61 ucl1;
    put @1 g2      @21 lcl2      @41 s2    @61 ucl2;
  run;
%mend ttest8;
```
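A hypothetical call of the ttest8 macro, shown only to illustrate the parameter order; the group names A and B and all summary values are invented. Because no minimum or maximum values are available, "." is passed for i1, x1, i2, and x2, as the note in the macro header requires.

```sas
/* Hypothetical summary statistics, for illustration only.       */
/* Order: g1, n1, m1, s1, i1, x1, g2, n2, m2, s2, i2, x2, alpha  */
%ttest8(A,25,10.2,1.8,.,.,B,30,9.1,2.4,.,.,0.05);

/* Leaving alpha empty uses the PROC TTEST default level and the  */
/* 0.975 and 0.025 chi-square quantiles for the standard          */
/* deviation limits (95 percent limits).                          */
%ttest8(A,25,10.2,1.8,.,.,B,30,9.1,2.4,.,.,);
```

Each call overwrites ttestmacro_V8.txt, so copy or rename the file between calls if both sets of limits are needed.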
2. Version 6 SAS code:

```sas
*********************************************************;
*** ttest6 macro: gives output similar to V6 PROC     ***;
***               TTEST using sufficient statistics   ***;
***                                                   ***;
*** Positional parameters                             ***;
***   mg1: name of group 1                            ***;
***   mn1: sample size of group 1                     ***;
***   mm1: sample mean of group 1                     ***;
***   ms1: sample standard deviation of group 1       ***;
***   mg2: name of group 2                            ***;
***   mn2: sample size of group 2                     ***;
***   mm2: sample mean of group 2                     ***;
***   ms2: sample standard deviation of group 2       ***;
***                                                   ***;
*** Note: results are written to the text file       ***;
***       ttestmacro_V6.txt.                          ***;
***                                                   ***;
*** Written by Lan-Feng Tsai                          ***;
*********************************************************;
%macro ttest6(mg1, mn1, mm1, ms1, mg2, mn2, mm2, ms2);
  data _null_;
    file 'ttestmacro_V6.txt';
    attrib g1 g2 format=$8.
           n1 n2 format=5.
           m1 m2 s1 s2 t_uneq t_eq f format=8.2
           df_uneq df_eq format=5.1
           f_p t_p_uneq t_p_eq format=8.4;
    g1="&mg1"; n1=&mn1; m1=&mm1; s1=&ms1;
    g2="&mg2"; n2=&mn2; m2=&mm2; s2=&ms2;
    v1=s1**2; v2=s2**2;

    *** folded F test for equal variances, 2-sided ***;
    f=max(v1, v2)/min(v1, v2);
    df1=n1-1; df2=n2-1;
    *** numerator df belongs to the group with the larger variance ***;
    if v1>=v2 then do; dfnum=df1; dfden=df2; end;
    else do; dfnum=df2; dfden=df1; end;
    f_p=2*(1-probf(f, dfnum, dfden));

    *** pooled variance, t statistics, and degrees of freedom ***;
    v_pool=((n1-1)*v1+(n2-1)*v2)/(n1+n2-2);
    t_uneq=(m1-m2)/sqrt(v1/n1+v2/n2);
    t_eq=(m1-m2)/sqrt(v_pool*(1/n1+1/n2));
    df_uneq=(v1/n1+v2/n2)**2/((v1/n1)**2/(n1-1)+(v2/n2)**2/(n2-1));
    df_eq=n1+n2-2;

    *** 2-sided p-values ***;
    t_p_uneq=2*(1-probt(abs(t_uneq), df_uneq));
    t_p_eq=2*(1-probt(abs(t_eq), df_eq));

    put @1 'Group' @9 'N' @17 'Mean' @25 'Std Dev';
    put '--------------------------------';
    put @1 g1 @9 n1 @17 m1 @25 s1;
    put @1 g2 @9 n2 @17 m2 @25 s2;
    put;
    put;
    put @1 'Variance' @17 'T' @25 'DF' @33 'Prob>|T|';
    put '----------------------------------------';
    put @1 'Unequal' @17 t_uneq @25 df_uneq @33 t_p_uneq;
    put @1 'Equal'   @17 t_eq   @25 df_eq   @33 t_p_eq;
    put;
    put @1 'For H0: Variances are equal, F'' = ' f
        +5 'DF = (' dfnum +(-1) ',' dfden +(-1) ')'
        +5 'Prob>F'' = ' f_p;
  run;
%mend ttest6;
```
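A hypothetical call of the ttest6 macro with invented values, for illustration only. Unlike ttest8, it takes only the group name, sample size, mean, and standard deviation of each group. The short DATA _NULL_ step afterwards is an assumed convenience, not part of the original macro: it echoes the report file to the SAS log. Running both macros on the same summary statistics is one simple way to cross-check the hand-coded Version 6 calculation against PROC TTEST.

```sas
/* Hypothetical summary statistics, for illustration only. */
/* Order: mg1, mn1, mm1, ms1, mg2, mn2, mm2, ms2            */
%ttest6(A,25,10.2,1.8,B,30,9.1,2.4);

/* Copy each line of the report to the log for a quick look. */
data _null_;
  infile 'ttestmacro_V6.txt';
  input;
  put _infile_;
run;
```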