A Macro to Perform a T-Test for 2 Independent Samples Using Sufficient Statistics
Lan-Feng Tsai, Edwards Lifesciences LLC, Irvine, California
Abstract
The T-test is a commonly used statistical test to compare the mean of one sample to a predetermined value, the means of paired samples, or the means of 2 independent samples. The test statistic for the T-test depends only on the sample means, sample standard deviations, and sample sizes. Therefore, if only the summary statistics are known and the raw data are unavailable, the result of the T-test can still be calculated. SAS procedures ordinarily expect raw data to perform a T-test, so a SAS macro to perform a T-test for 2 independent samples using sufficient statistics is proposed. The advantages of performing a T-test using sufficient statistics are also discussed.

Introduction
A T-test can be performed using only summary statistics because the summary statistics are sufficient, consistent, and unbiased estimators for its normal model. The definition of a sufficient statistic is given as follows (Rice 1995):

A statistic $T(X_1, \ldots, X_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $X_1, \ldots, X_n$, given $T = t$, does not depend on $\theta$ for any value of $t$.

Calculation
The purpose of this macro is to perform a T-test for 2 independent sample means using sufficient statistics (summary statistics). The theoretical details can be found in statistical textbooks (Arnold 1990, Rice 1995) and will not be discussed here. The following formulas are used to calculate the result of the T-test. In these formulas, $\bar{x}$, $s$, and $n$ are the sample mean, standard deviation, and size; subscripts 1 and 2 identify the groups; $t_{p,\nu}$ and $\chi^2_{p,\nu}$ are the $p$-th quantiles of the t and chi-square distributions with $\nu$ degrees of freedom; and $T_{\nu}$ and $F_{\nu_1,\nu_2}$ are random variables with the corresponding t and F distributions.

Confidence interval for sample mean:
$$\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Lower and upper confidence limits for sample standard deviation:
$$\sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}}, \qquad \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}}$$

Sample standard error:
$$\frac{s}{\sqrt{n}}$$

Sample mean difference:
$$\bar{x}_1 - \bar{x}_2$$

Pooled sample standard deviation (or standard deviation for sample mean difference):
$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$$

Pooled sample standard error:
$$s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

Confidence interval for sample mean difference:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

Lower and upper confidence limits for pooled sample standard deviation:
$$\sqrt{\frac{(n_1+n_2-2)s_p^2}{\chi^2_{1-\alpha/2,\,n_1+n_2-2}}}, \qquad \sqrt{\frac{(n_1+n_2-2)s_p^2}{\chi^2_{\alpha/2,\,n_1+n_2-2}}}$$

Degrees of freedom for equal variances:
$$df_{eq} = n_1+n_2-2$$

Test statistic for equal variances:
$$t_{eq} = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$$

P-value for equal variances (2-sided, as computed in Appendix 2):
$$2\,P\left(T_{n_1+n_2-2} \ge |t_{eq}|\right)$$

Degrees of freedom for unequal variances:
$$df_{uneq} = \frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}$$

Test statistic for unequal variances:
$$t_{uneq} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}$$

P-value for unequal variances (2-sided, as computed in Appendix 2):
$$2\,P\left(T_{df_{uneq}} \ge |t_{uneq}|\right)$$

Degrees of freedom for folded F statistic:
$$df_f = \left(\max(n_1-1,\, n_2-1),\ \min(n_1-1,\, n_2-1)\right)$$

F value for folded F statistic:
$$f = \frac{\max(s_1^2,\, s_2^2)}{\min(s_1^2,\, s_2^2)}$$

P-value for folded F statistic:
$$2\left[1 - P\left(F_{\max(n_1-1,\,n_2-1),\ \min(n_1-1,\,n_2-1)} \le f\right)\right]$$

The macro takes the larger of the two degrees of freedom as the numerator degrees of freedom. This agrees with the usual folded F test, whose numerator degrees of freedom belong to the group with the larger variance, when that group is also at least as large as the other group.

Discussion
Normality assumption
One of the assumptions of the T-test is normality, and this assumption cannot be examined without the raw data. Therefore, one should keep in mind that the normality assumption might not be valid when comparing 2 independent sample means using summary statistics in a T-test.

Advantages of performing T-tests using sufficient statistics
Sometimes, statisticians obtain only summary statistics from clients, published literature, or other sources. T-tests can still be performed, keeping in mind the potential violation of the normality assumption. For example, we can compare the result of our product with the results of competitors' products from published journals, companies' websites, or advertisements. This macro can also be a handy tool when we would like to do a quick comparison of 2 sample means or to validate the results of T-tests against other statistical software.

The Macro
PROC TTEST in Version 8 (Appendix 1) cannot perform a T-test using just summary statistics without a _STAT_ variable. Therefore, a _STAT_ variable must be created along with the summary statistics that are input to the macro. The macro parameters MIN and MAX for both groups and the alpha level are not required. However, because the macro parameters are positional, the numeric missing value "." must be entered for each MIN and MAX that is not used.
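The input layout described here can be sketched as follows. This is a minimal illustration only: the group names A and B and every number are hypothetical values invented for the example, not results from the paper. Each statistic occupies one row, _STAT_ names the statistic, and the unused MIN and MAX rows carry the missing value ".".

```sas
/* Hypothetical summary statistics for two groups; all values are invented. */
data sumstat;
  length group $ 8 _stat_ $ 4;
  input group $ _stat_ $ sumstat;
  datalines;
A N    20
A MEAN 5.1
A STD  1.2
A MIN  .
A MAX  .
B N    25
B MEAN 4.6
B STD  1.4
B MIN  .
B MAX  .
;
run;

/* Because the data set contains a character variable named _STAT_, */
/* PROC TTEST treats the rows as statistics rather than raw values. */
proc ttest data=sumstat alpha=0.05;
  class group;
  var sumstat;
run;
```

The macro in Appendix 1 builds a data set of this shape from its positional parameters, so the sketch is mainly useful for seeing what the procedure receives.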
The confidence intervals for the standard deviations of the groups are not calculated by PROC TTEST. However, they can be calculated using the formula provided above. This is expected to be solved in SAS Version 9. The macro therefore creates a separate text file containing the confidence intervals for the standard deviations of the groups. A macro (Appendix 2) to perform a T-test using summary statistics in SAS Version 6 is also shown.

Conclusion
One-sample T-tests and paired-sample T-tests can also be performed using only summary statistics. More macros for such T-tests will be developed in the future.

Acknowledgement
The author would like to thank William Anderson PhD, Rita Kristy, Brian Ramos, and Felicia Ho for their generous comments.

Reference
Arnold, S. F. (1990), Mathematical Statistics, Prentice-Hall, Inc., p. 366, p. 373.
Rice, J. A. (1995), Mathematical Statistics and Data Analysis, Second Edition, Duxbury Press, p. 280, p. 388.
SAS Institute Inc. (1999), SAS/STAT User's Guide, Version 8, SAS Institute Inc.

Contact Information
Lan-Feng Tsai
One Edwards Way
Irvine, CA 92614
E-mail: [email protected]

SAS is a registered trademark of SAS Institute Inc., Cary, NC, USA.
Appendix
1. Version 8 SAS code:
***********************************************************;
*** ttest8 macro: Perform V8 proc ttest using summary   ***;
***               statistics                            ***;
*** Position parameters                                 ***;
***   g1: sample name of group 1                        ***;
***   n1: sample size of group 1                        ***;
***   m1: sample mean of group 1                        ***;
***   s1: sample standard deviation of group 1          ***;
***   i1: sample minimum of group 1                     ***;
***   x1: sample maximum of group 1                     ***;
***   g2: sample name of group 2                        ***;
***   n2: sample size of group 2                        ***;
***   m2: sample mean of group 2                        ***;
***   s2: sample standard deviation of group 2          ***;
***   i2: sample minimum of group 2                     ***;
***   x2: sample maximum of group 2                     ***;
***   alpha: alpha level (default is 0.05)              ***;
***                                                     ***;
*** Note: i1, x1, i2, x2 are not required,              ***;
***       enter values or . for missing.                ***;
***                                                     ***;
*** Written by Lan-Feng Tsai                            ***;
***********************************************************;
%macro ttest8(g1, n1, m1, s1, i1, x1, g2, n2, m2, s2, i2, x2, alpha);
/* Build the input data set: one row per statistic, named by _STAT_ */
data sumstat;
  %let len=%sysfunc(max(%length(&g1), %length(&g2)));
  length group $ &len _stat_ $ 4;
  %do i=1 %to 2;
    group="&&g&i"; sumstat=&&n&i; _stat_='N';    output;
    group="&&g&i"; sumstat=&&m&i; _stat_='MEAN'; output;
    group="&&g&i"; sumstat=&&s&i; _stat_='STD';  output;
    group="&&g&i"; sumstat=&&i&i; _stat_='MIN';  output;
    group="&&g&i"; sumstat=&&x&i; _stat_='MAX';  output;
  %end;
run;
proc print; run;
/* ALPHA= is passed on only when the alpha parameter is supplied */
proc ttest
  %if &alpha ne %then %do;
    alpha=&alpha
  %end;;
  class group;
  var sumstat;
run;
/* Write the confidence limits for each group's standard deviation */
data _null_;
  file 'ttestmacro_V8.txt';
  g1="&g1"; n1=&n1; s1=&s1;
  g2="&g2"; n2=&n2; s2=&s2;
  %if &alpha ne %then %do;
    lcl1=sqrt(((n1-1)*s1**2)/cinv((1-&alpha/2), n1-1));
    ucl1=sqrt(((n1-1)*s1**2)/cinv(&alpha/2, n1-1));
    lcl2=sqrt(((n2-1)*s2**2)/cinv((1-&alpha/2), n2-1));
    ucl2=sqrt(((n2-1)*s2**2)/cinv(&alpha/2, n2-1));
  %end;
  %else %do;
    lcl1=sqrt(((n1-1)*s1**2)/cinv(0.975, n1-1));
    ucl1=sqrt(((n1-1)*s1**2)/cinv(0.025, n1-1));
    lcl2=sqrt(((n2-1)*s2**2)/cinv(0.975, n2-1));
    ucl2=sqrt(((n2-1)*s2**2)/cinv(0.025, n2-1));
  %end;
  put @1 'Group' @21 'LCL STD' @41 'STD' @61 'UCL STD';
  put @1 g1      @21 lcl1      @41 s1    @61 ucl1;
  put @1 g2      @21 lcl2      @41 s2    @61 ucl2;
run;
%mend ttest8;
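The limits that the macro writes to its text file can be checked for a single group in isolation. The following is a minimal sketch, assuming hypothetical values (n = 20, mean 5.1, standard deviation 1.2, alpha = 0.05) that do not come from the paper. It evaluates the chi-square limits for the standard deviation with CINV and, for comparison, the t-based interval for the mean from the Calculation section with TINV.

```sas
data _null_;
  /* Hypothetical inputs, invented for illustration */
  n = 20; m = 5.1; s = 1.2; alpha = 0.05;

  /* Standard error and t-based confidence interval for the mean */
  se       = s / sqrt(n);
  t_crit   = tinv(1 - alpha/2, n - 1);
  lcl_mean = m - t_crit*se;
  ucl_mean = m + t_crit*se;

  /* Chi-square confidence limits for the standard deviation */
  lcl_std = sqrt((n - 1)*s**2 / cinv(1 - alpha/2, n - 1));
  ucl_std = sqrt((n - 1)*s**2 / cinv(alpha/2, n - 1));

  /* Results are written to the SAS log */
  put lcl_mean= ucl_mean= lcl_std= ucl_std=;
run;
```

The upper chi-square quantile produces the lower limit for the standard deviation, which is why CINV(1 - alpha/2, ...) appears in the lower limit.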
2. Version 6 SAS code:
***********************************************************;
*** ttest6 macro: gives similar output as proc ttest    ***;
***               V6 using sufficient statistics        ***;
*** position parameters                                 ***;
***   mg1: name of group 1                              ***;
***   mn1: sample size of group 1                       ***;
***   mm1: sample mean of group 1                       ***;
***   ms1: sample standard deviation of group 1         ***;
***   mg2: name of group 2                              ***;
***   mn2: sample size of group 2                       ***;
***   mm2: sample mean of group 2                       ***;
***   ms2: sample standard deviation of group 2         ***;
***                                                     ***;
*** Note: specify output file out in a                  ***;
***       FILENAME statement.                           ***;
*** written by : Lan-Feng Tsai                          ***;
***********************************************************;
%macro ttest6(mg1, mn1, mm1, ms1, mg2, mn2, mm2, ms2);
data _null_;
  file 'ttestmacro_V6.txt';
  attrib g1 g2 format=$8. n1 n2 format=5.
         m1 m2 s1 s2 t_uneq t_eq f format=8.2
         df_uneq df_eq format=5.1
         f_p t_p_uneq t_p_eq format=8.4;
  g1="&mg1"; n1=&mn1; m1=&mm1; s1=&ms1;
  g2="&mg2"; n2=&mn2; m2=&mm2; s2=&ms2;
  /* Folded F test for equality of variances */
  v1=s1**2; v2=s2**2;
  f=max(v1, v2)/min(v1, v2);
  df1=n1-1; df2=n2-1;
  /* The larger df is used as the numerator df, which assumes the */
  /* group with the larger variance is not the smaller sample     */
  dfmax=max(df1, df2); dfmin=min(df1, df2);
  f_p=2*(1-probf(f, dfmax, dfmin));          *** 2-sided ***;
  /* Pooled and unequal-variance t statistics */
  v_pool=((n1-1)*v1+(n2-1)*v2)/(n1+n2-2);
  t_uneq=(m1-m2)/sqrt(v1/n1+v2/n2);
  t_eq=(m1-m2)/sqrt(v_pool*(1/n1+1/n2));
  df_uneq=(v1/n1+v2/n2)**2/((v1/n1)**2/(n1-1)+(v2/n2)**2/(n2-1));
  df_eq=n1+n2-2;
  *** 2-sided ***;
  t_p_uneq=2*(1-probt(abs(t_uneq), df_uneq));
  t_p_eq=2*(1-probt(abs(t_eq), df_eq));
  /* s1 and s2 are standard deviations, so the column is labeled Std Dev */
  put @1 'Group' @9 'N' @17 'Mean' @25 'Std Dev';
  put '---------------------------------';
  put @1 g1 @9 n1 @17 m1 @25 s1;
  put @1 g2 @9 n2 @17 m2 @25 s2;
  put; put;
  put @1 'Variance' @17 'T' @25 'DF' @33 'Prob>|T|';
  put '-----------------------------------------------------------------';
  put @1 'Unequal' @17 t_uneq @25 df_uneq @33 t_p_uneq;
  put @1 'Equal'   @17 t_eq   @25 df_eq   @33 t_p_eq;
  put;
  put @1 'For H0: Variances are equal, F'' = ' f +5 'DF = (' dfmax +(-1)
      ',' dfmin +(-1) ')' +5 'Prob>F'' = ' f_p;
run;
%mend ttest6;
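The macro pairs the larger degrees of freedom with the numerator of the folded F statistic. A minimal sketch of the alternative, in which the numerator degrees of freedom follow the group with the larger variance, is given below. It is not part of the author's macro; the variable names df_num and df_den and all input values are hypothetical and chosen so that the smaller sample has the larger variance, which is the case where the two choices differ.

```sas
data _null_;
  /* Hypothetical summary statistics, invented for illustration */
  n1 = 5;  s1 = 10;
  n2 = 50; s2 = 1;
  v1 = s1**2; v2 = s2**2;

  /* Put the larger variance in the numerator and carry its df along */
  if v1 >= v2 then do;
    f = v1/v2; df_num = n1 - 1; df_den = n2 - 1;
  end;
  else do;
    f = v2/v1; df_num = n2 - 1; df_den = n1 - 1;
  end;

  /* 2-sided p-value for the folded F statistic */
  f_p = 2*(1 - probf(f, df_num, df_den));

  /* Results are written to the SAS log */
  put f= df_num= df_den= f_p=;
run;
```

When both groups have the same size, or the larger sample also has the larger variance, this gives the same degrees of freedom as dfmax and dfmin in the macro.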