z-test and t-test
Xuhua Xia
http://dambe.bio.uottawa.ca
Properties of a Normal Distribution
68.27% of the measurements lie within the range of $\mu \pm \sigma$,
95.45% lie within $\mu \pm 2\sigma$,
99.73% lie within $\mu \pm 3\sigma$,
50% lie within $\mu \pm 0.67\sigma$,
95% lie within $\mu \pm 1.96\sigma$,
97.5% lie within $\mu \pm 2.24\sigma$,
99% lie within $\mu \pm 2.58\sigma$,
99.5% lie within $\mu \pm 2.81\sigma$,
99.9% lie within $\mu \pm 3.29\sigma$.
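The percentages above can be checked numerically. The sketch below is only an illustration and assumes the standard relation between the normal distribution and the error function; the helper name `coverage` is made up for this example.

```python
import math

def coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Multipliers of sigma taken from the list above
for k in (1, 2, 3, 0.67, 1.96, 2.24, 2.58, 2.81, 3.29):
    print(f"within mu +/- {k} sigma: {100 * coverage(k):.2f}%")
```

Because the multipliers are rounded to two decimals, the printed percentages can differ slightly from the nominal values.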
Given $\mu = 70$ kg and $\sigma = 10$ kg for a normal distribution (of body weight), what is
the probability of a body weight of 40 kg belonging to the population?
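One way to work through this question is to convert the weight to a normal deviate, Z = (X - mean) / SD, and look up its tail probability. The sketch below assumes a two-tailed probability is what is wanted; that choice is an assumption, not something the slide states.

```python
import math

mu = 70.0      # population mean (kg), from the text
sigma = 10.0   # population standard deviation (kg), from the text
x = 40.0       # observed body weight (kg)

# Normal deviate: how many standard deviations x lies from the mean
z = (x - mu) / sigma

# Two-tailed probability of a deviate at least this extreme (assumed reading)
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-tailed p = {p_two_tailed:.4f}")
```

A deviate of 3 standard deviations corresponds to the 99.73% range listed among the properties above, so the probability is the remaining fraction outside that range.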
The normal deviate: $Z = \frac{X_i - \mu}{\sigma}$
Standard deviation and standard error of the mean: $\sigma_{\bar{X}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$
The standard deviate pertaining to the normal distribution of means: $Z = \frac{\bar{X}_i - \mu}{\sigma_{\bar{X}}}$
The z-score
$Z = \frac{X_i - \mu}{\sigma_X}$, with critical value $1.96$
The government has certain regulations on
commercial products. Suppose that packages of sugar
labeled as 2 kg should have a mean weight of 2 kg
and a standard deviation equal to 0.10. If a package
of sugar labeled 2 kg that you bought from a store
has a weight of 1.82 kg, what is the z score? Can
you present the package as evidence that the
manufacturer has violated the government
regulation?
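The z score asked for can be computed directly. The sketch below assumes the 1.96 cutoff (the 95% range of a normal distribution) as the decision criterion; the variable names are illustrative only.

```python
labeled_mean = 2.0   # kg, mean weight required by the regulation
sd = 0.10            # kg, standard deviation stated in the problem
observed = 1.82      # kg, weight of the purchased package

z = (observed - labeled_mean) / sd

CRITICAL_Z = 1.96    # assumed criterion: bounds of the 95% range
outside_95_range = abs(z) > CRITICAL_Z

print(f"z = {z:.2f}; outside the 95% range: {outside_95_range}")
```

Under this criterion the package lies inside the 95% range, so a single package of this weight would not on its own be unusual for a manufacturer that follows the regulation.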
Normal Distribution
[Figure: histogram "Body Weight of 10,000 Adult Men" (Mean = 70 kg, Std Dev = 10 kg); x-axis: Body Weight, y-axis: Frequency.]
[Figure: histogram "Frequency Distribution of Means"; x-axis: Body Weight, y-axis: Frequency. Standard error of the mean: $s_{\bar{X}} = \frac{s}{\sqrt{n}}$.]
Darwin’s Breeding Experiment
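Wrong method assuming normal distribution:
$\mu = 20.933$; $\sigma = 37.744$; $n = 15$;
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{37.744}{\sqrt{15}} = 9.75$
Testing the mean difference against zero:
$Z = \frac{20.933 - 0}{\sigma_{\bar{X}}} = \frac{20.933}{9.75} = 2.147 > 1.96$
Therefore, the mean difference is significantly larger than
zero, i.e., inbreeding does reduce seed production.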
Species  Outbreed  Inbreed  Difference
1        100       51        49
2        222       289      -67
3        121       113        8
4        433       417       16
5        222       216        6
6        111       88        23
7        534       506       28
8        432       391       41
9        99        85        14
10       445       416       29
11       112       56        56
12       333       309       24
13       222       147       75
14       422       362       60
15       101       149      -48
Is the mean difference
significantly larger than 0?
Problem of Small Samples
I may premise that if we took by
chance a dozen or score of men
belonging to two nations and
measured them, it would I presume
be very rash to form any judgment
from such small numbers on their
(the nation’s) average heights. But
the case is somewhat different with
my … plants, as they were exactly of
the same age, were subjected from
first to last to the same conditions,
and were descended from the same
parents.
-- Darwin, quoted in Fisher’s The design of experiments.
William S. Gosset & t Distribution
t distribution is wider and flatter than the normal distribution
[Figure: normal distribution and t distribution curves compared; x-axis: Body Weight, y-axis: Frequency.]
• The t distribution depends on the degrees of freedom (DF). For
Darwin’s data with a sample size of 15, DF = 15 - 1 = 14.
• With the t distribution with DF = 14, we expect 95% of the
observations to fall within the range of mean ± 2.145 STD.
• Remember that for a normal distribution, 95% of the
observations are expected to fall within the range of $\mu \pm 1.96\sigma$.
• For the paired-sample t-test with the null hypothesis being Mean1 =
Mean2 (or MeanD = 0):
$t = \frac{\bar{D} - 0}{s_{\bar{X}}} = \frac{20.933}{9.75} = 2.147 > 2.145$
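The paired-sample statistic can be recomputed from the raw outbred and inbred counts in Darwin's table. The sketch below uses only the Python standard library; the critical values 1.96 (normal) and 2.145 (t, DF = 14) are the ones quoted in the text, and the variable names are illustrative.

```python
import math
import statistics

# Darwin's data as listed in the table (species 1 to 15)
outbred = [100, 222, 121, 433, 222, 111, 534, 432, 99, 445, 112, 333, 222, 422, 101]
inbred = [51, 289, 113, 417, 216, 88, 506, 391, 85, 416, 56, 309, 147, 362, 149]

diffs = [o - i for o, i in zip(outbred, inbred)]
n = len(diffs)

mean_d = statistics.mean(diffs)    # mean of the paired differences
sd_d = statistics.stdev(diffs)     # sample standard deviation (n - 1 in the denominator)
se_d = sd_d / math.sqrt(n)         # standard error of the mean difference
t = (mean_d - 0) / se_d            # null hypothesis: mean difference = 0
df = n - 1

Z_CRIT = 1.96    # normal-distribution cutoff quoted in the text
T_CRIT = 2.145   # t-distribution cutoff for DF = 14 quoted in the text

print(f"mean = {mean_d:.3f}, sd = {sd_d:.3f}, se = {se_d:.3f}, t = {t:.3f}, df = {df}")
print(f"exceeds normal cutoff: {t > Z_CRIT}; exceeds t cutoff: {t > T_CRIT}")
```

The statistic is the same number in both approaches; only the cutoff changes. With 14 degrees of freedom the t cutoff is larger, so the result clears it by a much smaller margin than it clears 1.96.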
T-Test
• T-Test can be used to test
– the difference in mean between two samples (paired or unpaired),
– a sample mean against a mean of a known population (e.g., the
concentration of a medicine set as a standard by the government),
– whether a single individual observation belongs to a sample with
sample size larger than one.
• The normal distribution and the Student’s t distribution. Why
should the statistic t take into consideration both the mean
difference and the variance?
• How to apply the test using Excel or SAS.
• The assumptions.
• Alternative methods: Wilcoxon rank-sum test or Mann-Whitney U test.
The Essence of the t Statistic
[Figure: pairs of distributions compared in two panels: "Same variance, smaller mean difference" and "Same mean difference, larger variance".]
$t = \frac{\bar{X}_1 - \bar{X}_2}{\text{pooled } S_{\bar{X}}}$
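The two panels can be mimicked numerically: t shrinks when the mean difference shrinks and also when the variance grows. The sketch below assumes equal sample sizes and a common standard deviation in both groups; all numbers are hypothetical and chosen only for illustration.

```python
import math

def t_statistic(mean1: float, mean2: float, sd: float, n: int) -> float:
    """Two-sample t for equal sample sizes n and a common standard deviation sd."""
    se_diff = math.sqrt(2 * sd ** 2 / n)   # standard error of the mean difference
    return (mean1 - mean2) / se_diff

base = t_statistic(2.0, -2.0, sd=1.0, n=10)            # reference case
smaller_diff = t_statistic(1.0, -1.0, sd=1.0, n=10)    # same variance, half the mean difference
larger_var = t_statistic(2.0, -2.0, sd=3.0, n=10)      # same mean difference, three times the SD

print(f"base: {base:.3f}, smaller difference: {smaller_diff:.3f}, larger variance: {larger_var:.3f}")
```

Halving the mean difference halves t, and tripling the standard deviation divides t by three, which is why the statistic has to account for both quantities.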
More on variance and SE
Two independent variables: x1, x2 sampled from two normal distributions
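$s^2_{x_1 - x_2} = E(s^2_{x_1} + s^2_{x_2})$
$s^2_{x_1 + x_2} = E(s^2_{x_1} + s^2_{x_2})$
A better estimate (pooling the two samples):
$s^2_{x_1 - x_2} = E(s^2_{x_1}, s^2_{x_2}) = E\left(\frac{SS_1}{DF_1}, \frac{SS_2}{DF_2}\right) = \frac{SS_1 + SS_2}{DF_1 + DF_2}$
$S_{\bar{x}_1} = \sqrt{\frac{s^2_{x_1}}{n_1}}$; $S_{\bar{x}_2} = \sqrt{\frac{s^2_{x_2}}{n_2}}$
with $n_1 = n_2 = n$: $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2_{x_1} + s^2_{x_2}}{n}}$
with $n_1 \neq n_2$, but both large: $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2_{x_1}}{n_1} + \frac{s^2_{x_2}}{n_2}}$
Estimate of $S_{\bar{x}_1 - \bar{x}_2}$ assuming equal variance:
$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2_{x_1 - x_2}}{n_1} + \frac{s^2_{x_1 - x_2}}{n_2}}$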
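The three standard-error formulas can be compared on data. The sketch below assumes the pooled variance is (SS1 + SS2) / (DF1 + DF2) with DF = n - 1 in each sample; the two samples and all function names are hypothetical.

```python
import math

def sum_of_squares(xs):
    """Sum of squared deviations from the sample mean (SS)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def se_equal_n(var1, var2, n):
    """Standard error of the mean difference when n1 = n2 = n."""
    return math.sqrt((var1 + var2) / n)

def se_unequal_n(var1, n1, var2, n2):
    """Standard error of the mean difference with separate variances."""
    return math.sqrt(var1 / n1 + var2 / n2)

def se_pooled(ss1, n1, ss2, n2):
    """Standard error of the mean difference assuming equal variance (pooled)."""
    pooled_var = (ss1 + ss2) / ((n1 - 1) + (n2 - 1))
    return math.sqrt(pooled_var / n1 + pooled_var / n2)

# Hypothetical samples of equal size
a = [4.1, 5.0, 5.6, 6.2, 4.8]
b = [6.9, 7.4, 6.1, 8.0, 7.1]
ss_a, ss_b = sum_of_squares(a), sum_of_squares(b)
var_a, var_b = ss_a / (len(a) - 1), ss_b / (len(b) - 1)

print(se_equal_n(var_a, var_b, len(a)))
print(se_unequal_n(var_a, len(a), var_b, len(b)))
print(se_pooled(ss_a, len(a), ss_b, len(b)))
```

With equal sample sizes the three estimates coincide; they diverge only when the sample sizes differ.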
Computation for unpaired t-test
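               Sample 1   Sample 2
Sample size    n1         n2
Mean           x̄1         x̄2
Standard dev.  s1         s2

               Sample 1   Sample 2
Sample size    7          7
Mean           76.857     82.714
Standard dev.  2.545      3.147

$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} = \frac{76.857 - 82.714}{\sqrt{\frac{2.545^2 + 3.147^2}{7}}} = -3.828$
Df = (7-1) + (7-1) = 12
with $n_1 = n_2 = n$: $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2_{x_1} + s^2_{x_2}}{n}}$
with $n_1 \neq n_2$, but both large: $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2_{x_1}}{n_1} + \frac{s^2_{x_2}}{n_2}}$
Estimate of $S_{\bar{x}_1 - \bar{x}_2}$ assuming equal variance:
$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2_{x_1 - x_2}}{n_1} + \frac{s^2_{x_1 - x_2}}{n_2}}$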
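The worked computation can be reproduced from the summary statistics alone. The sketch below uses the equal-sample-size formula for the standard error; the critical value 2.179 (two-tailed, 5% level, 12 degrees of freedom) is a standard t-table value and does not appear in the source.

```python
import math

n1 = n2 = 7
mean1, mean2 = 76.857, 82.714
s1, s2 = 2.545, 3.147

# Standard error of the mean difference for equal sample sizes
se_diff = math.sqrt((s1 ** 2 + s2 ** 2) / n1)
t = (mean1 - mean2) / se_diff
df = (n1 - 1) + (n2 - 1)

T_CRIT_DF12 = 2.179  # assumption: standard table value, not taken from the slides
significant = abs(t) > T_CRIT_DF12

print(f"t = {t:.3f}, df = {df}, |t| > {T_CRIT_DF12}: {significant}")
```

Recomputing from the rounded summary values can differ from the slide's figure in the last decimal place.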
Paired-sample t-test: 3
Using blocks to reduce confounding environmental factors (everything else
being equal except for the treatment effect) in evaluating the protein content
of two wheat varieties.
[Figure: two candidate field layouts, each with four blocks (Block 1 to Block 4) of four plots; every plot is assigned variety 1 or variety 2, eight plots per variety in each layout.]
How should we allocate the two crop varieties to the plots? What comparison would be fair?
The Wilcoxon-Mann-Whitney Test
• Statistical significance tests can be grouped into
– Parametric tests, e.g., t-test, ANOVA
– Non-parametric tests, e.g., Wilcoxon-Mann-Whitney test,
sign test, runs test.
When to Use Non-parametric Tests
• Parametric tests depend on assumed probability
distributions, e.g., normal distribution, t distribution,
etc., and give misleading results when the
assumptions are violated.
• Non-parametric tests are called distribution-free tests
and can be used in cases where the parametric tests
are inappropriate.
• Parametric tests are more powerful than their non-parametric
counterparts when the underlying assumptions are met.
Wilcoxon-Mann-Whitney Test
• The Wilcoxon-Mann-Whitney test is the non-parametric equivalent of the unpaired t-test.
• The original data are rank-transformed before
applying the test.
• The test statistic is U.
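The rank transformation and the U statistic described above can be sketched as follows. This is a minimal illustration, assuming tied values receive the average of their ranks; the two samples and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) computed from the rank sum of sample1 in the pooled data."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    rank_sum1 = sum(ranks[:n1])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2

# Hypothetical measurements from two independent samples
sample1 = [1.1, 2.3, 2.9, 3.4]
sample2 = [3.0, 4.2, 5.5, 6.1, 7.0]
u1, u2 = mann_whitney_u(sample1, sample2)
print(f"U1 = {u1}, U2 = {u2}")
```

U1 and U2 always sum to n1 × n2. A common convention is to compare the smaller of the two against a table of critical values, or to use a normal approximation for larger samples.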