Handy Reference II
HANDY REFERENCE SHEET 2 – HRP 259
Calculation Formulas for Sample Data:
1. Univariate:
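Sample proportion: $\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i = 1$ if success and $x_i = 0$ if failure
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Sum of squares of x: $SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ [to ease computation: $SS_x = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$]
Sample variance: $s_x^2 = \frac{SS_x}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Sample standard deviation: $s_x = \sqrt{\frac{SS_x}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
Standard error of the sample mean: $\frac{s_x}{\sqrt{n}} = \frac{\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}}{\sqrt{n}}$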
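As a quick check of the univariate formulas above, the following is a minimal Python sketch. The function names and the sample data are hypothetical, chosen only for illustration; the computations follow the sheet's definitions, including the computational shortcut for the sum of squares.

```python
import math


def univariate_summary(x):
    """Return the sheet's univariate statistics for a list of numbers."""
    n = len(x)
    mean = sum(x) / n
    # Definitional sum of squares: squared deviations from the mean.
    ss_x = sum((xi - mean) ** 2 for xi in x)
    # Computational shortcut: sum of x_i^2 minus n times the squared mean.
    ss_x_shortcut = sum(xi ** 2 for xi in x) - n * mean ** 2
    variance = ss_x / (n - 1)
    sd = math.sqrt(variance)
    se_mean = sd / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "ss_x": ss_x,
        "ss_x_shortcut": ss_x_shortcut,
        "variance": variance,
        "sd": sd,
        "se_mean": se_mean,
    }


def sample_proportion(outcomes):
    """Proportion of successes, with outcomes coded 1 (success) or 0 (failure)."""
    return sum(outcomes) / len(outcomes)


# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = univariate_summary(data)
p_hat = sample_proportion([1, 0, 1, 1, 0])

print(summary)
print(p_hat)
```

Both versions of the sum of squares should agree up to floating-point rounding; the shortcut is convenient by hand but can lose precision when the mean is large relative to the spread.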
2. Bivariate:
Sum of squares of xy: $SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ [to ease computation: $SS_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$]
Sample covariance: $s_{xy} = \frac{SS_{xy}}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
Sample correlation: $\hat{r} = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}} = \frac{SS_{xy}}{\sqrt{SS_x SS_y}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
Variance rules for correlated random variables:
Var (x+y)=Var(x)+Var(y)+2Cov(x,y); Var (x-y)=Var(x)+Var(y)-2Cov(x,y)
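The bivariate formulas and the variance rules can be verified numerically. This is a small sketch with hypothetical helper names and made-up data; it computes the covariance and correlation from the sum of cross-products and then checks both variance rules on the same sample.

```python
import math


def mean(values):
    return sum(values) / len(values)


def ss_xy(x, y):
    """Sum of cross-products of deviations; ss_xy(x, x) gives SS_x."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))


def covariance(x, y):
    return ss_xy(x, y) / (len(x) - 1)


def correlation(x, y):
    return ss_xy(x, y) / math.sqrt(ss_xy(x, x) * ss_xy(y, y))


# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = covariance(x, y)
r_hat = correlation(x, y)
var_x = covariance(x, x)
var_y = covariance(y, y)

# Sample variances of the sum and the difference of the paired values.
sums = [xi + yi for xi, yi in zip(x, y)]
diffs = [xi - yi for xi, yi in zip(x, y)]
var_sum = covariance(sums, sums)
var_diff = covariance(diffs, diffs)

print(cov_xy, r_hat)
print(var_sum, var_x + var_y + 2 * cov_xy)
print(var_diff, var_x + var_y - 2 * cov_xy)
```

The two variance rules hold exactly for sample variances and covariances as well as for their population counterparts, which is why the printed pairs match.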
Hypothesis Testing
The Steps:
1. Define your hypotheses (null, alternative)
2. Specify your null distribution
3. Do an experiment
4. Calculate the p-value of what you observed
5. Reject or fail to reject (~accept) the null hypothesis
The Errors

| Your statistical decision | True state of null hypothesis: H0 true | True state of null hypothesis: H0 false |
|---|---|---|
| Reject H0 | Type I error (α) | Correct |
| Do not reject H0 | Correct | Type II error (β) |

Power = 1 − β
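The two error rates can be illustrated by simulation. The sketch below is not from the sheet: it assumes a two-sided Z test with known standard deviation at α = 0.05 (critical value 1.96), and the function names, sample size, and effect size are hypothetical. Repeating the experiment many times under a true null estimates the Type I error rate; repeating it under a false null estimates power.

```python
import math
import random

Z_CRIT = 1.96  # two-sided critical value for alpha = 0.05


def z_test_rejects(sample, mu0, sigma):
    """Return True if a two-sided Z test (known sigma) rejects H0: mu = mu0."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > Z_CRIT


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=20, trials=10000, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma):
            rejections += 1
    return rejections / trials


# H0 true: the rejection rate estimates the Type I error rate (alpha).
type_i_rate = rejection_rate(true_mean=0.0)
# H0 false: the rejection rate estimates power; 1 - power is the Type II rate.
power = rejection_rate(true_mean=0.8)
type_ii_rate = 1 - power

print(type_i_rate, power, type_ii_rate)
```

With these assumed settings the first rate should land near 0.05 and the second near 0.95, up to simulation noise.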
Confidence intervals (estimation)
For a mean (σ² unknown):
$\bar{x} \pm t_{n-1,\alpha/2} \times \frac{s_x}{\sqrt{n}}$ [if variance known or large sample size, $t_{df,\alpha/2} \rightarrow Z_{\alpha/2}$]
For a paired difference (σ² unknown):
$\bar{d} \pm t_{n-1,\alpha/2} \times \frac{s_d}{\sqrt{n}}$ [where $d_i$ = the within-pair difference]
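A minimal sketch of the interval for a mean follows. The data are hypothetical, and the sample size of 11 is chosen so that the critical value for 10 degrees of freedom at 95% confidence (2.23, as tabulated on this sheet) applies. The same function handles a paired difference when it is given the within-pair differences.

```python
import math


def mean_confidence_interval(values, t_crit):
    """Return (lower, upper) limits: mean +/- t_crit * s / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


T_10_95 = 2.23  # t critical value for 10 d.f., 95% confidence (from the sheet)

# Hypothetical sample of n = 11 observations.
values = [5, 7, 8, 6, 9, 10, 7, 8, 6, 9, 13]
low, high = mean_confidence_interval(values, T_10_95)

# Hypothetical paired data: the interval is built on the within-pair differences.
before = [140, 135, 150, 145, 160, 155, 138, 142, 148, 152, 158]
after = [135, 133, 145, 140, 150, 150, 136, 140, 141, 147, 150]
differences = [b - a for b, a in zip(before, after)]
d_low, d_high = mean_confidence_interval(differences, T_10_95)

print(low, high)
print(d_low, d_high)
```

For other sample sizes the critical value has to be looked up for n − 1 degrees of freedom; the constant above is only valid for n = 11.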
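For a difference in means, 2 independent samples (σ²'s unknown but roughly equal):
$(\bar{x} - \bar{y}) \pm t_{n-2,\alpha/2} \times \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$
where $s_p^2 = \frac{SS_x + SS_y}{n-2}$ or $s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n-2}$ [with $n = n_x + n_y$]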
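The pooled-variance interval can be sketched as below. The two samples are hypothetical, with six observations each so that n − 2 = 10 degrees of freedom and the sheet's tabulated 95% critical value of 2.23 applies; the function names are illustrative.

```python
import math


def sum_of_squares(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def pooled_variance(x, y):
    """Pooled variance: (SS_x + SS_y) / (n_x + n_y - 2)."""
    return (sum_of_squares(x) + sum_of_squares(y)) / (len(x) + len(y) - 2)


def difference_in_means_ci(x, y, t_crit):
    """Return (lower, upper) limits for mean(x) - mean(y), equal variances assumed."""
    sp2 = pooled_variance(x, y)
    diff = sum(x) / len(x) - sum(y) / len(y)
    se = math.sqrt(sp2 / len(x) + sp2 / len(y))
    return diff - t_crit * se, diff + t_crit * se


T_10_95 = 2.23  # t critical value for 10 d.f., 95% confidence (from the sheet)

# Hypothetical independent samples.
x = [10, 12, 14, 16, 18, 20]
y = [8, 9, 11, 12, 14, 18]

sp2 = pooled_variance(x, y)
low, high = difference_in_means_ci(x, y, T_10_95)
print(sp2, low, high)
```

In this made-up example the interval includes zero, so the corresponding two-sample t-test would not reject equality of means at the 5% level.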
For a proportion:
$\hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
For a difference in proportions, 2 independent samples:
$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
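A short sketch of both proportion intervals, using Z = 1.96 for 95% confidence. The counts and function names are hypothetical.

```python
import math

Z_95 = 1.96  # Z critical value for 95% confidence


def proportion_ci(successes, n, z=Z_95):
    """Return (lower, upper) limits for a single proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def difference_in_proportions_ci(s1, n1, s2, n2, z=Z_95):
    """Return (lower, upper) limits for p1 - p2, two independent samples."""
    p1, p2 = s1 / n1, s2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se


# Hypothetical counts: 30/100 successes in group 1, 20/100 in group 2.
p_low, p_high = proportion_ci(30, 100)
d_low, d_high = difference_in_proportions_ci(30, 100, 20, 100)

print(p_low, p_high)
print(d_low, d_high)
```

These are large-sample (normal approximation) intervals, so they are less reliable when counts are small or proportions are close to 0 or 1.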
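For a correlation coefficient:
$\hat{r} \pm t_{n-2,\alpha/2} \times \sqrt{\frac{1 - \hat{r}^2}{n-2}}$
For a regression coefficient:
$\hat{\beta} \pm t_{n-2,\alpha/2} \times \sqrt{\frac{s^2}{SS_x}}$ [where $\hat{\beta} = \frac{SS_{xy}}{SS_x}$; $s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$]
Common values of t and Z

| Confidence level | $t_{10,\alpha/2}$ | $t_{20,\alpha/2}$ | $t_{30,\alpha/2}$ | $t_{50,\alpha/2}$ | $t_{100,\alpha/2}$ | $Z_{\alpha/2}$ |
|---|---|---|---|---|---|---|
| 90% | 1.81 | 1.73 | 1.70 | 1.68 | 1.66 | 1.64 |
| 95% | 2.23 | 2.09 | 2.04 | 2.01 | 1.98 | 1.96 |
| 99% | 3.17 | 2.85 | 2.75 | 2.68 | 2.63 | 2.58 |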
For an odds ratio:
95% confidence limits: $OR \times \exp\left(-1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$, $OR \times \exp\left(+1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$
For a risk ratio:
95% confidence limits: $RR \times \exp\left(-1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right)$, $RR \times \exp\left(+1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right)$
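The odds ratio and risk ratio limits can be sketched as follows. The cell labels are an assumption about the 2x2 layout (a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without), which is the layout consistent with the risks a/(a+b) and c/(c+d) in the risk ratio formula; the counts and function names are hypothetical.

```python
import math

Z_95 = 1.96


def odds_ratio_ci(a, b, c, d, z=Z_95):
    """Return (OR, lower, upper) using the standard error of log(OR)."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = odds_ratio * math.exp(-z * se_log_or)
    upper = odds_ratio * math.exp(z * se_log_or)
    return odds_ratio, lower, upper


def risk_ratio_ci(a, b, c, d, z=Z_95):
    """Return (RR, lower, upper) using the standard error of log(RR)."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_ratio = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt((1 - risk_exposed) / a + (1 - risk_unexposed) / c)
    lower = risk_ratio * math.exp(-z * se_log_rr)
    upper = risk_ratio * math.exp(z * se_log_rr)
    return risk_ratio, lower, upper


# Hypothetical 2x2 table (assumed layout described above).
a, b, c, d = 20, 80, 10, 90
or_est, or_low, or_high = odds_ratio_ci(a, b, c, d)
rr_est, rr_low, rr_high = risk_ratio_ci(a, b, c, d)

print(or_est, or_low, or_high)
print(rr_est, rr_low, rr_high)
```

Because the limits are built on the log scale, they are asymmetric around the point estimate, and an interval that includes 1 corresponds to no significant association at the 5% level.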
Corresponding hypothesis tests
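Test for H0: μ = μ0 (σ² unknown):
$t_{n-1} = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$
Test for H0: μd = 0 (σ² unknown):
$t_{n-1} = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$
Test for H0: μx − μy = 0 (σ² unknown, but roughly equal):
$t_{n-2} = \frac{(\bar{x} - \bar{y}) - 0}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}}$
Test for H0: p = p0:
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$
Test for H0: p1 − p2 = 0:
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}}}$; $\bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
Test for H0: r = 0:
$t_{n-2} = \frac{\hat{r} - 0}{\sqrt{\frac{1 - \hat{r}^2}{n-2}}}$
Test for H0: β = 0:
$t_{n-2} = \frac{\hat{\beta} - 0}{\sqrt{\frac{s^2}{SS_x}}}$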
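Two of the test statistics above are sketched here: the one-sample t statistic and the two-proportion Z statistic with the pooled proportion. The data, counts, and function names are hypothetical. The two-sided p-value for the Z statistic uses `statistics.NormalDist` from the Python standard library; a t-distribution p-value would need a table or an external library, so the t statistic is only compared against a tabulated critical value.

```python
import math
from statistics import NormalDist


def one_sample_t(values, mu0):
    """t statistic with n - 1 d.f. for H0: mu = mu0."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (mean - mu0) / (sd / math.sqrt(n))


def two_proportion_z(s1, n1, s2, n2):
    """Z statistic and two-sided p-value for H0: p1 - p2 = 0."""
    p1, p2 = s1 / n1, s2 / n2
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample of n = 11, tested against mu0 = 6.
values = [5, 7, 8, 6, 9, 10, 7, 8, 6, 9, 13]
t_stat = one_sample_t(values, mu0=6)
rejects_at_5_percent = abs(t_stat) > 2.23  # tabulated t for 10 d.f., 95%

# Hypothetical counts: 30/100 versus 20/100.
z_stat, p_value = two_proportion_z(30, 100, 20, 100)

print(t_stat, rejects_at_5_percent)
print(z_stat, p_value)
```

Note that the test for two proportions pools the proportions in its standard error, whereas the confidence interval for the difference uses each sample's own proportion, so the two need not agree exactly at the boundary.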
Corresponding sample size/power
Sample size required to test H0: μd = 0 (paired difference t-test):
$n = \frac{\sigma_d^2 (Z_{power} + Z_{\alpha/2})^2}{d^2}$
Corresponding power for a given n:
$Z_{power} = \frac{d}{\sigma_d}\sqrt{n} - Z_{\alpha/2}$
Smaller group sample size required to test H0: μx − μy = 0 (two-sample t-test):
(where r = ratio of larger group to smaller group)
$n_{smaller} = \frac{(r + 1)}{r} \times \frac{\sigma^2 (Z_{power} + Z_{\alpha/2})^2}{(\mu_x - \mu_y)^2}$
Corresponding power for a given n:
$Z_{power} = \frac{\mu_x - \mu_y}{\sigma}\sqrt{\frac{nr}{r + 1}} - Z_{\alpha/2}$
Smaller group sample size required to test H0: p1 − p2 = 0 (difference in two proportions):
(where r = ratio of larger group to smaller group)
$n_{smaller} = \frac{(r + 1)}{r} \times \frac{\bar{p}(1 - \bar{p})(Z_{power} + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$
Corresponding power for a given n:
$Z_{power} = \frac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})}}\sqrt{\frac{nr}{r + 1}} - Z_{\alpha/2}$
Sample size required to test H0: r = 0 (correlation/equivalent to simple linear regression):
(where r = the correlation coefficient)
$n = \frac{(1 - r^2)(Z_{power} + Z_{\alpha/2})^2}{r^2} + 2$
Corresponding power for a given n:
$Z_{power} = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2} - Z_{\alpha/2}$
Common values of Zpower

| Zpower | .25 | .52 | .84 | 1.28 | 1.64 | 2.33 |
|---|---|---|---|---|---|---|
| Power | 60% | 70% | 80% | 90% | 95% | 99% |
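A sketch of the paired and two-sample sample size calculations, together with the matching power calculation, is given below. The planning values (standard deviations, differences) and function names are hypothetical; Zpower = 0.84 (80% power) and Z = 1.96 (two-sided α = 0.05) are taken from the sheet's tables. Converting Zpower back to a power uses the standard normal CDF from the Python standard library.

```python
import math
from statistics import NormalDist

Z_ALPHA = 1.96     # two-sided alpha = 0.05
Z_POWER_80 = 0.84  # 80% power (from the sheet)


def n_paired(sigma_d, d, z_power=Z_POWER_80, z_alpha=Z_ALPHA):
    """Number of pairs needed to detect a mean within-pair difference d."""
    return sigma_d ** 2 * (z_power + z_alpha) ** 2 / d ** 2


def n_smaller_two_means(sigma, delta, r=1.0, z_power=Z_POWER_80, z_alpha=Z_ALPHA):
    """Smaller-group size needed to detect a difference in means of delta."""
    return (r + 1) / r * sigma ** 2 * (z_power + z_alpha) ** 2 / delta ** 2


def power_two_means(sigma, delta, n_smaller, r=1.0, z_alpha=Z_ALPHA):
    """Power for a given smaller-group size, via Z_power and the normal CDF."""
    z_power = delta / sigma * math.sqrt(n_smaller * r / (r + 1)) - z_alpha
    return NormalDist().cdf(z_power)


# Hypothetical planning values: SD of 10, difference of 5, equal group sizes.
pairs_needed = math.ceil(n_paired(sigma_d=10, d=5))
per_group_raw = n_smaller_two_means(sigma=10, delta=5, r=1.0)
per_group = math.ceil(per_group_raw)
achieved_power = power_two_means(sigma=10, delta=5, n_smaller=per_group, r=1.0)

print(pairs_needed, per_group_raw, per_group, achieved_power)
```

Sample sizes are rounded up, so the achieved power comes out slightly above the 80% target.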
Linear regression
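| Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Model (k levels of X) | k−1 | $SSM = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$ | $\frac{SSM}{k-1}$ | $\frac{SSM/(k-1)}{SSE/(N-k)}$ | Go to $F_{k-1,N-k}$ chart |
| Error | N−k | $SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ | $s^2 = \frac{SSE}{N-k}$ | | |
| Total variation | N−1 | $TSS = SS_y = \sum_{i=1}^{N} (y_i - \bar{y})^2$ | | | |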
Assumptions of Linear Regression
Linear regression assumes that…
1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the same (homogeneity of variances)
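To show how the regression quantities on this sheet fit together, here is a minimal sketch of simple linear regression with one continuous predictor. The data and function name are hypothetical. In this setting the model has 1 degree of freedom and the error has n − 2, so the F-statistic equals the square of the t statistic for the slope.

```python
import math


def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares and return key quantities."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = ss_xy / ss_x                   # beta-hat = SS_xy / SS_x
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]

    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    ssm = sum((fi - mean_y) ** 2 for fi in fitted)          # model sum of squares
    tss = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares

    s2 = sse / (n - 2)                     # residual variance
    t_slope = slope / math.sqrt(s2 / ss_x)
    f_stat = (ssm / 1) / s2                # 1 model d.f. in simple regression
    r_squared = ssm / tss
    return {
        "slope": slope, "intercept": intercept, "sse": sse, "ssm": ssm,
        "tss": tss, "s2": s2, "t_slope": t_slope, "f_stat": f_stat,
        "r_squared": r_squared,
    }


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = simple_linear_regression(x, y)
print(fit)
```

The decomposition TSS = SSM + SSE holds for any least-squares fit with an intercept, which is what makes the two forms of the coefficient of determination equivalent.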
ANOVA TABLE

| Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Between (k groups) | k−1 | $SSB = n\sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y})^2$ | $\frac{SSB}{k-1}$ | $\frac{SSB/(k-1)}{SSW/(nk-k)}$ | Go to $F_{k-1,nk-k}$ chart |
| Within | nk−k | $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$ | $s^2 = \frac{SSW}{nk-k}$ | | |
| Total variation | nk−1 | $TSS = SS_y = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y})^2$ | | | |

Coefficient of Determination: $r^2 = R^2 = \frac{\text{variation explained by the predictor}}{\text{total variation in the outcome}} = \frac{SSB}{TSS} = 1 - \frac{SSW}{TSS}$
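The one-way ANOVA quantities can be sketched for the balanced case the table describes (k groups of n observations each). The group data and the function name are hypothetical; the p-value step is left as a lookup in an F chart, as on the sheet.

```python
def one_way_anova(groups):
    """Return sums of squares, F, and R^2 for k groups of equal size n."""
    k = len(groups)
    n = len(groups[0])
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / n for g in groups]

    # Between-group sum of squares: n times squared deviations of group means.
    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
    # Within-group sum of squares: deviations from each group's own mean.
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    tss = sum((v - grand_mean) ** 2 for v in all_values)

    df_between = k - 1
    df_within = n * k - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return {
        "ssb": ssb, "ssw": ssw, "tss": tss,
        "df_between": df_between, "df_within": df_within,
        "f_stat": f_stat, "r_squared": ssb / tss,
    }


# Hypothetical data: k = 3 groups, n = 4 observations per group.
groups = [[1, 2, 3, 4], [2, 3, 4, 5], [5, 6, 7, 8]]
result = one_way_anova(groups)
print(result)
```

The F-statistic is then compared with the F distribution on (k − 1, nk − k) degrees of freedom, here (2, 9).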
ANOVA TABLE FOR linear regression (more general) case
Coefficient of Determination: $r^2 = R^2 = \frac{\text{variation explained by the predictor}}{\text{total variation in the outcome}} = \frac{SSM}{TSS} = 1 - \frac{SSE}{TSS}$
Probability distributions often used in statistics:
T-distribution
Given n independent observations $x_i$, $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
The Chi-Square Distribution
$\chi_n^2 = \sum_{i=1}^{n} Z_i^2$; where Z ~ Normal(0,1)
$E(\chi_n^2) = n$
$Var(\chi_n^2) = 2n$
The F-Distribution
$F_{n,m} = \frac{\chi_n^2 / n}{\chi_m^2 / m}$
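The stated mean and variance of the chi-square distribution can be checked by simulation. This sketch builds each chi-square draw as a sum of squared standard normal values, as in the definition above; the degrees of freedom, number of draws, and seed are arbitrary choices for illustration.

```python
import random
import statistics


def chi_square_draw(rng, df):
    """One chi-square draw: the sum of df squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


rng = random.Random(259)  # fixed seed so the run is repeatable
df = 5
draws = [chi_square_draw(rng, df) for _ in range(20000)]

sim_mean = statistics.fmean(draws)     # should be close to df
sim_var = statistics.variance(draws)   # should be close to 2 * df

print(sim_mean, sim_var)
```

An F-distributed draw could be built the same way, as the ratio of two independent chi-square draws each divided by its own degrees of freedom.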
Summary of common statistical tests for epidemiology/clinical research:
Choice of appropriate statistical test or measure of association for various types of data by study design.

Cross-sectional/case-control studies

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Continuous | T-test* |
| Categorical | Continuous | ANOVA* |
| Continuous | Continuous | Simple linear regression |
| Multivariate (categorical and continuous) | Continuous | Multiple linear regression |
| Categorical | Categorical | Chi-square test§ |
| Binary | Binary | Odds ratio, Mantel-Haenszel OR |
| Multivariate (categorical and continuous) | Binary | Logistic regression |

Cohort Studies/Clinical Trials

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Binary | Relative risk |
| Categorical | Time-to-event | Kaplan-Meier curve/log-rank test |
| Multivariate (categorical and continuous) | Time-to-event | Cox-proportional hazards model |
| Categorical | Continuous (repeated) | Repeated-measures ANOVA |
| Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures |

*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher’s exact test is used when the expected cell counts are less than 5.
Course coverage in the HRP statistics sequence:
Choice of appropriate statistical test or measure of association for various types of data by study design.
Courses marked on the chart: HRP259, HRP261, HRP262.

Cross-sectional/case-control studies

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Continuous | T-test* |
| Categorical | Continuous | ANOVA* |
| Continuous | Continuous | Simple linear regression |
| Multivariate (categorical and continuous) | Continuous | Multiple linear regression |
| Categorical | Categorical | Chi-square test§ |
| Binary | Binary | Odds ratio, Mantel-Haenszel OR |
| Multivariate (categorical and continuous) | Binary | Logistic regression |

Cohort Studies/Clinical Trials

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Binary | Risk ratio |
| Categorical | Time-to-event | Kaplan-Meier curve/log-rank test |
| Multivariate (categorical and continuous) | Time-to-event | Cox-proportional hazards model (hazard ratios) |
| Categorical | Continuous (repeated) | Repeated-measures ANOVA |
| Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures |

*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher’s exact test is used when the expected cell counts are less than 5.
Corresponding SAS PROCs:
Choice of appropriate statistical test or measure of association for various types of data by study design.

Cross-sectional/case-control studies

| Predictor | Outcome | Statistical procedure or measure of association | SAS PROC |
|---|---|---|---|
| Binary | Continuous | T-test* | PROC TTEST |
| Categorical | Continuous | ANOVA* | PROC ANOVA |
| Continuous | Continuous | Simple linear regression | PROC REG |
| Multivariate (categorical/continuous) | Continuous | Multiple linear regression | PROC GLM |
| Categorical | Categorical | Chi-square test§ | PROC FREQ |
| Binary | Binary | Odds ratio, Mantel-Haenszel OR | PROC FREQ |
| Multivariate (categorical/continuous) | Binary | Logistic regression | PROC LOGISTIC |

Cohort Studies/Clinical Trials

| Predictor | Outcome | Statistical procedure or measure of association | SAS PROC |
|---|---|---|---|
| Binary | Binary | Risk ratio | PROC FREQ |
| Categorical | Time-to-event | Kaplan-Meier curve/log-rank test | PROC LIFETEST |
| Multivariate (categorical and continuous) | Time-to-event | Cox-proportional hazards model (hazard ratios) | PROC PHREG |
| Categorical | Continuous (repeated) | Repeated-measures ANOVA | PROC GLM |
| Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures | PROC MIXED |

*Non-parametric equivalents: PROC NPAR1WAY; §Fisher’s exact test: PROC FREQ, option: exact