Download 03/26/2009

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Statistics
March 26, 2009
Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D.
Nemours Bioinformatics Core Facility
Nemours Biomedical Research
Hypothesis Testing (Quantitative
variable)
Hypothesis Testing Procedure
One group sample
Two-group sample
More than two groups sample
One sample t-test
Independent
Not Independent
Analysis of variance
Sign test
Two-sample t-test
Paired t-test
Kruskal Wallis test
Mann-Whitney U test
Wilcoxon Signed Rank test
Nemours Biomedical Research
One vs two sided tests
• One-sided
– One direction of effect (e.g. mean efficacy of treatment group
is greater than the mean efficacy of placebo group)
– Greater power to detect difference in expected direction
• Two-sided
– Effect could be in either direction (e.g. mean efficacy of
treatment group is not equal to the mean efficacy of placebo
group)
– More conservative
Nemours Biomedical Research
One vs two sided tests
One group
Two groups
One sided
A single mean differs
from a known value in a
specific direction. e.g.
mean > 0 or median > 0
Two means differ from
one another in a specific
direction. e.g.,
mean2 < mean1
median2 < median1
Two sided
A single mean differs
from a known value in
either direction. e.g.,
mean ≠ 0
median ≠ 0
Two means are not
equal. E.g.,
mean1 ≠ mean2
median1 ≠ median2
Nemours Biomedical Research
Single group sample - one sample ttest
• Test for value of a single mean
• E.g., to see if mean SBP of all AIDHC employees is
120 mm Hg
• Assumptions
– Parent population is normal
– Sample observations (subjects) are independent
Nemours Biomedical Research
Single group sample-one sample ttest
• Formula
Consider a sample of size n from a normal population
with mean µ and variance σ2, then the following statistic
is distributed as Student’s t with (n-1) degrees of
freedom.
x
t
s/ n
x is the sample mean and s is the sample standard deviation
Nemours Biomedical Research
One group sample - Sign Test
(Nonparametric)
• Use:
(1) Compares the median of a single group with a
specified value (instead of single sample t-test).
• Hypothesis: H0:Median = c
Ha:Median  c
• Test Statistic:
We take the difference of observations from median (xi c). The number of positive difference follows a Binomial
distribution. For large sample size, this distribution
follows normal distribution. (Rcmdr does not have a
default sign test)
Nemours Biomedical Research
One group sample: Rcmdr
demonstration
• Load data default to Rcmdr and then select• Statistics->Means->single sample t-test. From this window,
select variable (e.g.hgt), alternative hypothesis, value for null
hypothesis (e.g. 50).
data: data$hgt
t = -2.6925, df = 59, p-value = 0.009217
alternative hypothesis: true mean is not equal
to 50
95 percent confidence interval:
46.50166 49.48460
sample estimates:
mean of x
47.99313
•
Nemours Biomedical Research
Two-group (independent) samples two-sample t-statistic
• Use
– Test for equality of two means
• Assumptions
– Parent population is normal
– Sample observations (subjects) are independent.
Nemours Biomedical Research
Two-group (independent) samples two-sample t-statistic
•
Formula (two groups)
– Case 1: Equal Population Standard Deviations:
• The following statistic is distributed as t distribution with (n1+n2 -2) d.f.
t
( x1  x2 )
1 1
Sp

n1 n2
The pooled standard deviation,
(n1  1) S12  (n2  1) S 22
Sp 
n1  n2  2
n1 and n2 are the sample sizes and S1 and S2 are the sample standard
deviations of two groups.
Nemours Biomedical Research
Two-group (independent) samples two-sample t-statistic
• Formula (two groups)
– Case 2: Unequal population standard deviations
• The following statistic follows t distribution.
t
( x1  x2 )  ( 1   2 )
s12 s22

n1 n2
• The d.f. of this statistic is,
s

2
/ n1  s / n2
v 2
( s1 / n1 ) 2 ( s22 / n2 ) 2

n1  1
n2  1
2
1
2
2
Nemours Biomedical Research
Two-sample (independent) t-statistic
• Statistics->Means-> Independent sample t test
• From the small window, select response variable (e.g.
PLUC.pre), select group variable (e.g. sex) and select
alternative hypothesis type (e.g. two sided), confidence level
(e.g. .95), and Assume equal variance (e.g yes)
Two Sample t-test
data: PLUC.pre by sex
t = -0.4866, df = 58, p-value = 0.6284
alternative hypothesis: true difference in means is
not equal to 0
95 percent confidence interval:
-1.4004136 0.8527181
sample estimates:
mean in group f mean in group m
9.606148
9.879995
Nemours Biomedical Research
Two-group (independent) samples- Wilcoxon
Rank-Sum Test (Nonparametric)
• Use: Compares medians of two independent groups.
• Corresponds to t-test for 2 Independent sample means
• Test Statistic:
Consider two samples of sizes m and n. Suppose N=m+n. Compute
the rank of all N observations. Then, the statistic,
Wm= Sum of the ranks of all observations of the first variable (size
m).
Nemours Biomedical Research
Two-group (independent) samples- Wilcoxon
Rank-Sum Test (Nonparametric)
• Statistics-> Nonparametric tests->Two sample Wilcoxon test ->
select response variable (e.g. PLUC.pre), group variable (e.g.
sex), Alternative hypothesis (e.g. two sided), Type of Test (e.g.
exact).
Wilcoxon rank sum test
data: PLUC.pre by sex
W = 439, p-value = 0.8776
alternative hypothesis: true location shift is
not equal to 0
Nemours Biomedical Research
Two-group (matched) samples - paired
t-statistic
• Use: Compares equality of means of two matched or
paired samples (e.g. pretest versus posttest)
• Assumptions:
– Parent population is normal
– Sample observations (subjects) are independent
Nemours Biomedical Research
Two-group (matched) samples - paired
t-statistic
• Formula
– The following statistic follows t distribution with n-1 d.f.
d
t
sd / n
Where, d is the difference of two matched samples and Sd is the standard
deviation of the variable d.
Nemours Biomedical Research
Paired t-test:Rcmdr demo
• Statistics -> Means->paired t-test-> pick First variable (e.g.
PLUC.pre), pick second variable (e.g. PLUC.post), Alternative
Hypothesis (e.g. Two sided), Confidence level (e.g. 0.95).
data: data$PLUC.pre and data$PLUC.post
t = -5.4953, df = 59, p-value = 8.739e-07
alternative hypothesis: true difference in
means is not equal to 0
95 percent confidence interval:
-3.275342 -1.526769
sample estimates:
mean of the differences
-2.401056
Nemours Biomedical Research
Two-group (matched) samples Wilcoxon
Signed-Rank Test (Nonparametric)
• USE:
– Compares medians of two paired samples.
• Test Statistic
– Obtain Differences of two variables, Di = X1i - X2i
– Take Absolute Value of Differences, Di
– Assign Ranks to absolute values (lower to higher), Ri
– Sum up ranks for positive differences (T+) and negative
differences (T-)
• Test Statistic is smaller of T- or T+ (2-tailed)
Nemours Biomedical Research
Two-group (matched) samples Wilcoxon
Signed-Rank Test (Nonparametric)
• Statistics -> Nonparametric tests ->Paired sample Wilcoxon
test -> select first variable (e.g. PLUC. Pre) and second
variable (e.g. PLUC.post), Alternative hypothesis (e.g.two
sided), type of test (e.g. normal approximation)
median(data$PLUC.pre - data$PLUC.post, na.rm=TRUE) #
median difference
[1] -2.283691
> wilcox.test(data$PLUC.pre, data$PLUC.post,
alternative='two.sided',
+
paired=TRUE)
Nemours Biomedical Research
Comparing more than two independent
samples: F statistic (Parametric)
• Use:
– Compare means of more than two groups
– Test the equality of variances.
Nemours Biomedical Research
Comparing > 2 independent
samples: F statistic (Parametric)
• Let X and Y be two independent Chi-square variables with n1 and n2 d.f.
respectively, then the following statistic follows a F distribution with n1
and n2 d.f.
X / n1
Fn1 ,n2 
Y / n2
• Let, X and Y are two independent normal variables with sample sizes
n1 and n2. Then the following statistic follows a F distribution with n1
and n2 d.f.
2
Fn1 ,n2
sx
 2
sy
Where, sx2 and sy2 are sample variances of X and Y.
Nemours Biomedical Research
Comparing > 2 independent samples: F
statistic (Parametric)
• Hypotheses:
H0: µ1= µ2=…. =µn
Ha: µ1≠ µ2 ≠ …. ≠µn
 Comparison will be done using analysis of
variance (ANOVA) technique.
 ANOVA uses F statistic for this comparison.
 The ANOVA technique will be covered in
another class session.
Nemours Biomedical Research
More than two independent samples:
One-way analysis of variance (ANOVA)
• Create a new variable: PLUC.chng=PLUC.post – PLUC.pre
(Data-> Manage variable in active dataset-> Compute new
variable -> New variable name (on the left): PLUC.chng =
PLUC.post – PLUC.pre (on the right).
• For ANOVA: Statistics -> means -> One-way ANOVA -> Select
model, response variable (e.g. PLUC.chng), group variable (e.g.
grp), pairwise comparison of means.
Nemours Biomedical Research
More than two independent samples:
One-way analysis of variance (ANOVA)
AnovaModel.5 <- aov(PLUC.chng ~ grp, data=data)
Df Sum Sq
> summary(AnovaModel.5)
grp
2 93.64
Residuals
57 582.17
--Signif. codes: 0 '***'
0.1 ' ' 1
Mean Sq F value
46.82
10.21
Pr(>F)
4.5839 0.01426 *
0.001 '**' 0.01 '*' 0.05 '.'
> numSummary(data$PLUC.chng , groups=data$grp,
statistics=c("mean", "sd"))
mean
sd n
1 1.023051 3.686748 20
2 2.132612 2.509201 20
3 4.047504 3.279040 20
Contd.
Nemours Biomedical Research
More than two independent samples:
One-way analysis of variance (ANOVA)
threegrp <- aov(PLUC.chng ~ grp, data=data)
> summary(threegrp)
Df Sum Sq Mean Sq F value Pr(>F)
grp
2 93.64
46.82 4.5839 0.01426 *
Residuals
57 582.17
10.21
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1
> numSummary(data$PLUC.chng , groups=data$grp,
statistics=c("mean", "sd"))
mean
sd n
1 1.023051 3.686748 20
2 2.132612 2.509201 20
3 4.047504 3.279040 20
Contd.
Nemours Biomedical Research
More than two independent samples:
One-way analysis of variance (ANOVA)
95% family-wise confidence level
2-1
(
)
(
3-1
)
(
3-2
-1
)
0
1
2
3
4
5
Nemours
Biomedical
Research
Linear
Function
More than two groups: Nonparametric
Kruskal-Wallis Test
• Compares median of three or more groups or (means of ranks
of three or more groups)
• Rcmdr Demo: Statistics -> Nonparametric tests ->Kruskal-Wallis
test and then select response and group variables etc.
> kruskal.test(PLUC.chng ~ grp, data=data)
Kruskal-Wallis rank sum test
data: PLUC.chng by grp
Kruskal-Wallis chi-squared = 6.5452, df = 2, p-value = 0.03791
Nemours Biomedical Research
Proportion Tests
• Use
– Test for equality of two Proportions
• E.g. proportions of subjects in two treatment groups who
benefited from treatment.
– Test for the value of a single proportion
• E.g., to test if the proportion of smokers in a population
is some specified value (less than 1)
Nemours Biomedical Research
Proportion Tests
• Formula
– One Group:
z
pˆ  p0
p0 (1  p0 )
n
– Two Groups:
z
pˆ 1  pˆ 2
1 1
ˆp (1  pˆ )(  )
n1 n2
x1  x2
where pˆ 
.
n1  n2
Nemours Biomedical Research
Proportion Test
• Statistics-> proportions -> single sample (e.g.sex) or two
sample-> Select, variable, Alternative hypothesis, null
hypothesis, Confidence Level, Type of test.
> .Table
sex
f m
30 30
> binom.test(rbind(.Table), alternative='two.sided', p=.5,
conf.level=.95)
Exact binomial test
data: rbind(.Table)
number of successes = 30, number of trials = 60, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.368062 0.631938
sample estimates:
probability of success
0.5
Nemours Biomedical Research
Chi-square statistic
• USE
– Testing the population variance σ2= σ02.
– Testing the goodness of fit.
– Testing the independence/ association of attributes
• Assumptions
– Sample observations should be independent.
– Cell frequencies should be >= 5.
– Total observed and expected frequencies are equal
Nemours Biomedical Research
Chi-square statistic
• Formula: If xi (i=1,2,…n) are independent and normally distributed
with mean µ and standard deviation σ, then,
 xi   
2
is
a

distributi on with n d.f.



 
i 1 
2
n
• If we don’t know µ, then we estimate it using a sample mean and
 xi  x 
2
then,
is
a

distributi on with (n - 1) d.f.



 
i 1 
n
2
Nemours Biomedical Research
Chi-square statistic
• For a contingency table, we use the following chi- square
test statistic,
2
(
O

E
)
i
2   i
, distribute d as  2 with (n - 1) d.f.
Ei
i 1
n
Oi  Observed Frequency
Ei  Expected Frequency
Nemours Biomedical Research
Chi-square statistic
Female
O(E)
11 (10)
Total
Group 1
Male
O(E)
9 (10)
Group 2
8 (10)
12 (10)
20
Group 3
11 (10)
9(10)
20
30
30
60
Nemours Biomedical Research
20
Chi-square statistic – calculation of
expected frequency
• To obtain the expected frequency for any cell, use:
• Corresponding (row total X column total) / grand total
• E.g: cell for group 1 and female, substituting: (30 X
20 / 60) = 10
Nemours Biomedical Research
Chi-square statistic:Rcmdr demonstration
• Statistics->Contingency tables -> Two way table -> Pick
row and column variable, select type of cell percentage
you want (e.g. row), hypothesis tests (e.g. chi-square)
> rowPercents(.Table) # Row Percentages
grp
sex
1
2
3 Total Count
f 30.0 40.0 30.0 100.0
30
m 36.7 26.7 36.7 100.1
30
> .Test <- chisq.test(.Table, correct=FALSE)
> .Test
Pearson's Chi-squared test
data: .Table
X-squared = 1.2, df = 2, p-value = 0.5488
Nemours Biomedical Research
Thank you
Nemours Biomedical Research