Chapter 9
Inferences Based on Two Samples
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.
9.1
z Tests and Confidence Intervals for a Difference Between Two Population Means
The Difference Between Two Population Means
Assumptions:
1. $X_1,\ldots,X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$.
2. $Y_1,\ldots,Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$.
3. The $X$ and $Y$ samples are independent of one another.
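Expected Value and Standard Deviation of $\bar{X} - \bar{Y}$
The expected value of $\bar{X} - \bar{Y}$ is $\mu_1 - \mu_2$, so $\bar{X} - \bar{Y}$ is an unbiased estimator of $\mu_1 - \mu_2$.
The standard deviation is
$$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$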
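A quick way to see the standard-deviation formula at work is to compare it with a simulation. The sketch below is illustrative only: it assumes normal populations so that `random.gauss` can generate the samples, and the function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def sd_diff_of_means(sigma1, sigma2, m, n):
    """sigma_{Xbar - Ybar} = sqrt(sigma1^2/m + sigma2^2/n) for independent samples."""
    return math.sqrt(sigma1**2 / m + sigma2**2 / n)


def simulate_diff_of_means(mu1, sigma1, m, mu2, sigma2, n, reps=10_000, seed=1):
    """Monte Carlo mean and SD of Xbar - Ybar, assuming normal populations."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(m))
        ybar = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n))
        diffs.append(xbar - ybar)
    return statistics.fmean(diffs), statistics.stdev(diffs)


if __name__ == "__main__":
    # Exact value: sqrt(4/10 + 9/15) = sqrt(1) = 1
    print(sd_diff_of_means(2, 3, 10, 15))
    # Simulated mean should be near mu1 - mu2 = 2 and SD near 1
    print(simulate_diff_of_means(5, 2, 10, 3, 3, 15))
```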
Test Procedures for Normal Populations With Known Variances
Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test statistic value:
$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$
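A minimal sketch of this test in Python, assuming the usual z rejection regions (upper-tailed, lower-tailed, two-tailed) at level α. The function name and the `alternative` labels are hypothetical; `statistics.NormalDist` from the standard library supplies the z critical values.

```python
import math
from statistics import NormalDist


def two_sample_z_test(xbar, ybar, sigma1, sigma2, m, n, delta0=0.0,
                      alpha=0.05, alternative="two-sided"):
    """z test of H0: mu1 - mu2 = delta0 with known sigma1, sigma2.

    Returns (z, reject) based on z critical values.
    """
    se = math.sqrt(sigma1**2 / m + sigma2**2 / n)  # sigma_{Xbar - Ybar}
    z = (xbar - ybar - delta0) / se
    std_normal = NormalDist()  # mean 0, sd 1
    if alternative == "greater":      # Ha: mu1 - mu2 > delta0
        reject = z >= std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":       # Ha: mu1 - mu2 < delta0
        reject = z <= -std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":  # Ha: mu1 - mu2 != delta0
        # z >= z_{alpha/2} or z <= -z_{alpha/2}
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, reject
```

For example, with $\bar{x} = 10.5$, $\bar{y} = 9.0$, $\sigma_1 = 2$, $\sigma_2 = 3$, m = 10 and n = 15, the standard deviation is 1, so z = 1.5; this does not reach $z_{.025} \approx 1.96$ for a two-tailed test at α = .05.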
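$\beta(\Delta') = P(\text{Type II Error})$ when $\mu_1 - \mu_2 = \Delta'$
Alt. Hypothesis | $\beta(\Delta')$
$H_a: \mu_1 - \mu_2 > \Delta_0$ | $\Phi\left(z_\alpha - \dfrac{\Delta' - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 < \Delta_0$ | $1 - \Phi\left(-z_\alpha - \dfrac{\Delta' - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 \neq \Delta_0$ | $\Phi\left(z_{\alpha/2} - \dfrac{\Delta' - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \dfrac{\Delta' - \Delta_0}{\sigma}\right)$
where $\sigma = \sigma_{\bar{X}-\bar{Y}} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$.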
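The three β expressions translate directly into code. This is a sketch with a hypothetical function name; `delta_prime` stands for Δ′, and σ is computed as $\sqrt{\sigma_1^2/m + \sigma_2^2/n}$.

```python
import math
from statistics import NormalDist


def beta_two_sample_z(delta_prime, delta0, sigma1, sigma2, m, n,
                      alpha=0.05, alternative="greater"):
    """P(type II error) of the level-alpha z test when mu1 - mu2 = delta_prime."""
    nd = NormalDist()
    sigma = math.sqrt(sigma1**2 / m + sigma2**2 / n)  # sigma_{Xbar - Ybar}
    shift = (delta_prime - delta0) / sigma
    if alternative == "greater":
        z_a = nd.inv_cdf(1 - alpha)
        return nd.cdf(z_a - shift)
    if alternative == "less":
        z_a = nd.inv_cdf(1 - alpha)
        return 1 - nd.cdf(-z_a - shift)
    if alternative == "two-sided":
        z_half = nd.inv_cdf(1 - alpha / 2)
        return nd.cdf(z_half - shift) - nd.cdf(-z_half - shift)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

A useful sanity check: at Δ′ = Δ0 every version returns 1 − α, since the test then rejects with probability α.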
Large-Sample Tests
When m and n are both large, the assumptions of normal population distributions and known values of $\sigma_1$, $\sigma_2$ are unnecessary: the Central Limit Theorem guarantees that $\bar{X} - \bar{Y}$ has approximately a normal distribution.
Large-Sample Tests
When m > 40 and n > 40, using the test statistic value
$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$
along with the previously stated rejection regions based on z critical values gives large-sample tests whose significance levels are approximately $\alpha$.
Confidence Interval for $\mu_1 - \mu_2$
Provided m and n are both large, a CI for $\mu_1 - \mu_2$ with a confidence level of $100(1-\alpha)\%$ is
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
One-sided confidence bounds can be found by replacing $z_{\alpha/2}$ by $z_\alpha$.
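The large-sample test statistic and CI share the same estimated standard error, so one sketch can return both. It works from raw data lists, enforces the m > 40 and n > 40 rule of thumb from the test slide, and uses the standard library only; the function name is hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def large_sample_two_mean(x, y, delta0=0.0, alpha=0.05):
    """Large-sample z statistic and 100(1-alpha)% CI for mu1 - mu2 from raw samples.

    The sample variances s1^2, s2^2 replace the unknown population variances.
    """
    m, n = len(x), len(y)
    if m <= 40 or n <= 40:
        raise ValueError("large-sample methods need m > 40 and n > 40")
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / m + statistics.variance(y) / n)
    z = (diff - delta0) / se
    z_half = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z, (diff - z_half * se, diff + z_half * se)
```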
9.2
The Two-Sample t Test and Confidence Interval
Assumptions
Both populations are normal, so that
X1,…,Xm is a random sample from a
normal distribution and so is Y1,…,Yn.
The plausibility of these assumptions can
be judged by constructing a normal
probability plot of the xi’s and another of
the yi’s.
t Distribution
When the population distributions are both normal, the standardized variable
$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$
has approximately a t distribution…
t Distribution
The df $v$ can be estimated from the data by
$$v = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}}$$
(round down to the nearest integer)
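The df formula is easy to get wrong by hand, so a direct transcription may help. Here `var1` and `var2` stand for $s_1^2$ and $s_2^2$ (hypothetical parameter names), and the result is rounded down as the slide specifies.

```python
import math


def welch_df(var1, m, var2, n):
    """Estimated df v for the two-sample t procedures, rounded down.

    var1 and var2 are the sample variances s1^2 and s2^2.
    """
    a, b = var1 / m, var2 / n
    v = (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))
    return math.floor(v)
```

For instance, $s_1^2 = 10$, m = 10, $s_2^2 = 16$, n = 16 gives $s_1^2/m = s_2^2/n = 1$ and $v = 4/(1/9 + 1/15) = 22.5$, which rounds down to 22.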
Two-Sample CI for $\mu_1 - \mu_2$
The two-sample CI for $\mu_1 - \mu_2$ with a confidence level of $100(1-\alpha)\%$ is
$$\bar{x} - \bar{y} \pm t_{\alpha/2,v}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
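A sketch of the CI computation. Python's standard library has no t quantile function, so the sketch assumes the caller supplies $t_{\alpha/2,v}$ (from a t table, or from `scipy.stats.t.ppf(1 - alpha/2, v)` if SciPy is available). The function and parameter names are hypothetical.

```python
import math


def two_sample_t_ci(xbar, var1, m, ybar, var2, n, t_crit):
    """CI: xbar - ybar +/- t_crit * sqrt(s1^2/m + s2^2/n).

    t_crit is t_{alpha/2, v}, looked up separately.
    """
    half_width = t_crit * math.sqrt(var1 / m + var2 / n)
    diff = xbar - ybar
    return diff - half_width, diff + half_width
```

With v = 22 (for which a t table gives $t_{.025,22} = 2.074$), `two_sample_t_ci(13.0, 10.0, 10, 10.0, 16.0, 16, 2.074)` returns $3 \pm 2.074\sqrt{2}$, roughly (0.07, 5.93).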
Two-Sample t Test
Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test statistic value:
$$t = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$
The Two-Sample t Test
Alternative Hypothesis | Rejection Region for Approx. Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > \Delta_0$ | $t \ge t_{\alpha,v}$
$H_a: \mu_1 - \mu_2 < \Delta_0$ | $t \le -t_{\alpha,v}$
$H_a: \mu_1 - \mu_2 \neq \Delta_0$ | $t \ge t_{\alpha/2,v}$ or $t \le -t_{\alpha/2,v}$
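The test statistic, the estimated df, and the rejection rules above can be combined in one sketch. The critical value ($t_{\alpha,v}$ or $t_{\alpha/2,v}$) is an input because the standard library cannot compute it; the function name is hypothetical. This unpooled procedure is often called Welch's t test.

```python
import math
import statistics


def two_sample_t_test(x, y, t_crit, delta0=0.0, alternative="two-sided"):
    """Two-sample t test of H0: mu1 - mu2 = delta0 without pooling.

    t_crit must be t_{alpha,v} for a one-sided alternative or t_{alpha/2,v}
    for the two-sided one, where v is the estimated df returned here.
    """
    m, n = len(x), len(y)
    a = statistics.variance(x) / m  # s1^2 / m
    b = statistics.variance(y) / n  # s2^2 / n
    t = (statistics.fmean(x) - statistics.fmean(y) - delta0) / math.sqrt(a + b)
    v = math.floor((a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1)))
    if alternative == "greater":
        reject = t >= t_crit
    elif alternative == "less":
        reject = t <= -t_crit
    elif alternative == "two-sided":
        reject = t >= t_crit or t <= -t_crit
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return t, v, reject
```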
Pooled t Procedures
Assume the two populations are normal and have equal variances. If $\sigma^2$ denotes the common variance, it can be estimated by combining information from the two samples. Standardizing $\bar{X} - \bar{Y}$ using this pooled estimator gives a t variable based on $m + n - 2$ df.
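The slide does not spell out the pooled estimator; this sketch assumes the standard form $S_p^2 = \frac{m-1}{m+n-2}S_1^2 + \frac{n-1}{m+n-2}S_2^2$, with the statistic dividing $\bar{x} - \bar{y} - \Delta_0$ by $s_p\sqrt{1/m + 1/n}$. The function name is hypothetical.

```python
import math
import statistics


def pooled_t(x, y, delta0=0.0):
    """Pooled t statistic, its df (m + n - 2), and the pooled variance s_p^2.

    Assumes equal population variances.
    """
    m, n = len(x), len(y)
    # s_p^2 = [(m-1) s1^2 + (n-1) s2^2] / (m + n - 2)
    sp2 = ((m - 1) * statistics.variance(x)
           + (n - 1) * statistics.variance(y)) / (m + n - 2)
    se = math.sqrt(sp2 * (1 / m + 1 / n))
    t = (statistics.fmean(x) - statistics.fmean(y) - delta0) / se
    return t, m + n - 2, sp2
```

For x = (1, 2, 3, 4, 5) and y = (2, 4, 6), $s_1^2 = 2.5$ and $s_2^2 = 4$, so $s_p^2 = (4 \cdot 2.5 + 2 \cdot 4)/6 = 3$ on 6 df.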
9.3
Analysis of Paired Data
Paired Data (Assumptions)
The data consist of n independently selected pairs $(X_1,Y_1),\ldots,(X_n,Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$.
Let $D_1 = X_1 - Y_1, \ldots, D_n = X_n - Y_n$.
The $D_i$'s are assumed to be normally distributed with mean value $\mu_D$ and variance $\sigma_D^2$.
The Paired t Test
Null hypothesis: $H_0: \mu_D = \Delta_0$
Test statistic value:
$$t = \frac{\bar{d} - \Delta_0}{s_D/\sqrt{n}}$$
where $\bar{d}$ and $s_D$ are the sample mean and standard deviation of the $d_i$'s.
The Paired t Test
Alternative Hypothesis | Rejection Region for Level $\alpha$ Test
$H_a: \mu_D > \Delta_0$ | $t \ge t_{\alpha,n-1}$
$H_a: \mu_D < \Delta_0$ | $t \le -t_{\alpha,n-1}$
$H_a: \mu_D \neq \Delta_0$ | $t \ge t_{\alpha/2,n-1}$ or $t \le -t_{\alpha/2,n-1}$
Confidence Interval for $\mu_D$
The paired t CI for $\mu_D$ is
$$\bar{d} \pm t_{\alpha/2,n-1} \cdot s_D/\sqrt{n}$$
One-sided confidence bounds can be found by replacing $t_{\alpha/2,n-1}$ by $t_{\alpha,n-1}$.
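Because the paired procedures reduce to a one-sample t analysis of the differences, a single sketch can return the test statistic, its df, and the CI. The critical value $t_{\alpha/2,n-1}$ is passed in (a table lookup is assumed); the function name and the data in the usage note are hypothetical.

```python
import math
import statistics


def paired_t(x, y, t_crit, delta0=0.0):
    """Paired t statistic, df, and CI for mu_D from matched samples x and y.

    t_crit should be t_{alpha/2, n-1}; it gives both the two-sided test
    cutoff and the multiplier for the 100(1-alpha)% CI.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]  # differences d_i = x_i - y_i
    n = len(d)
    dbar = statistics.fmean(d)
    se = statistics.stdev(d) / math.sqrt(n)  # s_D / sqrt(n)
    t = (dbar - delta0) / se
    ci = (dbar - t_crit * se, dbar + t_crit * se)
    return t, n - 1, ci
```

With hypothetical pairs x = (10, 12, 9, 11) and y = (8, 11, 9, 8), the differences are (2, 1, 0, 3), so $\bar{d} = 1.5$, $s_D = \sqrt{5/3} \approx 1.29$, and t ≈ 2.32 on 3 df; with $t_{.025,3} = 3.182$ the two-tailed test at α = .05 does not reject $H_0: \mu_D = 0$.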
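Paired Data and Two-Sample t
$$V(\bar{X} - \bar{Y}) = V(\bar{D}) = V\left(\frac{1}{n}\sum_{i=1}^{n} D_i\right) = \frac{V(D_i)}{n} = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{n}$$
where $\rho$ is the correlation between $X_i$ and $Y_i$ within a pair.
Independence between X and Y ⇒ $\rho = 0$
Positive dependence ⇒ $\rho > 0$, so $V(\bar{D})$ is smaller than under independence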
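A small numerical illustration of the variance formula, with hypothetical values $\sigma_1 = \sigma_2 = 2$ and n = 10 pairs: as ρ grows, $V(\bar{D})$ shrinks.

```python
def var_dbar(sigma1, sigma2, rho, n):
    """V(Dbar) = (sigma1^2 + sigma2^2 - 2*rho*sigma1*sigma2) / n."""
    return (sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2) / n


if __name__ == "__main__":
    # Illustrative values: sigma1 = sigma2 = 2, n = 10 pairs
    for rho in (0.0, 0.5, 0.8):
        print(rho, var_dbar(2, 2, rho, 10))
```

The printed variances are 0.8, 0.4 and 0.16 (up to floating-point rounding) for ρ = 0, .5 and .8.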
Pros and Cons of Pairing
1. If there is great heterogeneity between experimental
units and a large positive correlation within pairs, the
loss in degrees of freedom will be compensated for by the
increased precision associated with pairing (use pairing).
2. If the units are relatively homogeneous and
the correlation within pairs is not large, the
gain in precision due to pairing will be
outweighed by the decrease in degrees of
freedom (use independent samples).
9.4
Inferences Concerning a Difference Between Population Proportions
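Difference Between Population Proportions
Let $X \sim \text{Bin}(m, p_1)$ and $Y \sim \text{Bin}(n, p_2)$ with X and Y independent variables, and let $\hat{p}_1 = X/m$ and $\hat{p}_2 = Y/n$. Then
$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$
so $\hat{p}_1 - \hat{p}_2$ is an unbiased estimator of $p_1 - p_2$, and
$$V(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{m} + \frac{p_2 q_2}{n} \qquad (q_i = 1 - p_i)$$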
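Large-Sample Tests
Null hypothesis: $H_0: p_1 - p_2 = 0$
Test statistic value:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}$$
where $\hat{p} = (X + Y)/(m + n)$ is the pooled estimate of the common proportion under $H_0$ and $\hat{q} = 1 - \hat{p}$.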
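Large-Sample Tests
Alternative Hypothesis | Rejection Region
$H_a: p_1 - p_2 > 0$ | $z \ge z_\alpha$
$H_a: p_1 - p_2 < 0$ | $z \le -z_\alpha$
$H_a: p_1 - p_2 \neq 0$ | $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$
Valid provided $m\hat{p}_1$, $m\hat{q}_1$, $n\hat{p}_2$, and $n\hat{q}_2$ are all at least 10.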
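A sketch of the pooled large-sample test with its rejection rules. Here x and y are success counts, the function name is hypothetical, and the validity check applies the at-least-10 rule of thumb to $m\hat{p}_1 = x$, $m\hat{q}_1 = m - x$, $n\hat{p}_2 = y$ and $n\hat{q}_2 = n - y$.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x, m, y, n, alpha=0.05, alternative="two-sided"):
    """Large-sample z test of H0: p1 - p2 = 0 using the pooled estimate p_hat."""
    if min(x, m - x, y, n - y) < 10:
        raise ValueError("counts too small for the normal approximation")
    p1_hat, p2_hat = x / m, y / n
    p_hat = (x + y) / (m + n)  # pooled estimate under H0
    q_hat = 1 - p_hat
    z = (p1_hat - p2_hat) / math.sqrt(p_hat * q_hat * (1 / m + 1 / n))
    nd = NormalDist()
    if alternative == "greater":
        reject = z >= nd.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = z <= -nd.inv_cdf(1 - alpha)
    elif alternative == "two-sided":
        reject = abs(z) >= nd.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, reject
```

For example, 60 successes out of m = 100 and 45 out of n = 100 give $\hat{p} = .525$ and z ≈ 2.12, which exceeds $z_{.025} \approx 1.96$.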
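General Expressions for $\beta(p_1, p_2)$
Alt. Hypothesis | $\beta(p_1, p_2)$
$H_a: p_1 - p_2 > 0$ | $\Phi\left[\dfrac{z_\alpha\sqrt{\bar{p}\bar{q}(1/m + 1/n)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right]$
$H_a: p_1 - p_2 < 0$ | $1 - \Phi\left[\dfrac{-z_\alpha\sqrt{\bar{p}\bar{q}(1/m + 1/n)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right]$
where $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{p_1 q_1/m + p_2 q_2/n}$.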
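General Expressions for $\beta(p_1, p_2)$
Alt. Hypothesis | $\beta(p_1, p_2)$
$H_a: p_1 - p_2 \neq 0$ | $\Phi\left[\dfrac{z_{\alpha/2}\sqrt{\bar{p}\bar{q}(1/m + 1/n)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right] - \Phi\left[\dfrac{-z_{\alpha/2}\sqrt{\bar{p}\bar{q}(1/m + 1/n)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right]$
where
$\bar{p} = (mp_1 + np_2)/(m + n)$, $\bar{q} = (mq_1 + nq_2)/(m + n)$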
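The β expressions can be coded the same way as in the two-mean case. A sketch with a hypothetical function name; `null_sd` is $\sqrt{\bar{p}\bar{q}(1/m + 1/n)}$ and `sigma` is $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{p_1q_1/m + p_2q_2/n}$.

```python
import math
from statistics import NormalDist


def beta_two_proportions(p1, p2, m, n, alpha=0.05, alternative="greater"):
    """Approximate type II error probability of the two-proportion z test."""
    nd = NormalDist()
    q1, q2 = 1 - p1, 1 - p2
    p_bar = (m * p1 + n * p2) / (m + n)
    q_bar = (m * q1 + n * q2) / (m + n)
    null_sd = math.sqrt(p_bar * q_bar * (1 / m + 1 / n))
    sigma = math.sqrt(p1 * q1 / m + p2 * q2 / n)  # sigma_{p1_hat - p2_hat}
    d = p1 - p2
    if alternative == "greater":
        z = nd.inv_cdf(1 - alpha)
        return nd.cdf((z * null_sd - d) / sigma)
    if alternative == "less":
        z = nd.inv_cdf(1 - alpha)
        return 1 - nd.cdf((-z * null_sd - d) / sigma)
    if alternative == "two-sided":
        z = nd.inv_cdf(1 - alpha / 2)
        return nd.cdf((z * null_sd - d) / sigma) - nd.cdf((-z * null_sd - d) / sigma)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

When $p_1 = p_2$ the two standard deviations coincide and β = 1 − α, which makes a convenient check.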
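Sample Size
For the case m = n, the level $\alpha$ test has type II error probability $\beta$ at the alternative values $p_1, p_2$ with $p_1 - p_2 = d$ when
$$n = \frac{\left[z_\alpha\sqrt{(p_1 + p_2)(q_1 + q_2)/2} + z_\beta\sqrt{p_1 q_1 + p_2 q_2}\right]^2}{d^2}$$
for an upper- or lower-tailed test ($z_{\alpha/2}$ replaces $z_\alpha$ for a two-tailed test).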
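A direct transcription of the sample-size formula. Rounding up to the next integer is an assumption of this sketch (so the target β is met), and the function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, beta=0.10, two_sided=False):
    """Common sample size m = n for the two-proportion z test, rounded up."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    z_b = nd.inv_cdf(1 - beta)
    q1, q2 = 1 - p1, 1 - p2
    d = p1 - p2
    numerator = (z_a * math.sqrt((p1 + p2) * (q1 + q2) / 2)
                 + z_b * math.sqrt(p1 * q1 + p2 * q2))
    return math.ceil(numerator**2 / d**2)
```

For instance, $p_1 = .6$, $p_2 = .45$, α = .05 (one-tailed) and β = .10 give n = 188 per sample.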
Confidence Interval for $p_1 - p_2$
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{m} + \frac{\hat{p}_2\hat{q}_2}{n}}$$
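A sketch of the CI; note that, unlike the test statistic, the standard error here is unpooled. The function name and the `conf` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x, m, y, n, conf=0.95):
    """CI for p1 - p2 from success counts x (of m) and y (of n)."""
    p1_hat, p2_hat = x / m, y / n
    se = math.sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
    z_half = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    diff = p1_hat - p2_hat
    return diff - z_half * se, diff + z_half * se
```

With 60/100 and 45/100 successes, the 95% interval is about .15 ± .137, or (.013, .287).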
9.5
Inferences Concerning Two Population Variances
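The F Distribution
The F probability distribution has parameters $v_1$ (number of numerator df) and $v_2$ (number of denominator df). If $X_1$ and $X_2$ are independent chi-squared rv's with $v_1$ and $v_2$ df, respectively, then
$$F = \frac{X_1/v_1}{X_2/v_2}$$
has an F distribution with $v_1$ numerator df and $v_2$ denominator df.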
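The definition can be checked by simulation. The sketch assumes the standard fact that a chi-squared rv with v df is a gamma rv with shape v/2 and scale 2, so `random.gammavariate` can generate it; it also uses the known mean $v_2/(v_2 - 2)$ of an F distribution (for $v_2 > 2$) as a check. The function name and df values are illustrative.

```python
import random
import statistics


def sample_f(v1, v2, size, seed=0):
    """Draw F(v1, v2) variates as (X1/v1) / (X2/v2), X1 and X2 independent chi-squared.

    A chi-squared rv with v df is simulated as Gamma(shape=v/2, scale=2).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        x1 = rng.gammavariate(v1 / 2, 2)
        x2 = rng.gammavariate(v2 / 2, 2)
        draws.append((x1 / v1) / (x2 / v2))
    return draws


if __name__ == "__main__":
    draws = sample_f(5, 10, 20_000)
    # For v2 > 2 the F mean is v2 / (v2 - 2); here 10/8 = 1.25
    print(statistics.fmean(draws))
```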
The F Distribution Density Curve Property
$$F_{1-\alpha,v_1,v_2} = 1/F_{\alpha,v_2,v_1}$$
[Figure: F density curve with shaded upper-tail area $\alpha$ to the right of the critical value $F_{\alpha,v_1,v_2}$]
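The reciprocal property can also be checked empirically: the lower 5% point of F(v1, v2), which is $F_{.95,v_1,v_2}$ in upper-tail notation, should match the reciprocal of the upper 5% point of F(v2, v1), which is $F_{.05,v_2,v_1}$. A simulation sketch with hypothetical names:

```python
import random
import statistics


def f_draws(v1, v2, size, rng):
    """F(v1, v2) variates built from two independent chi-squared (gamma) draws."""
    return [(rng.gammavariate(v1 / 2, 2) / v1) / (rng.gammavariate(v2 / 2, 2) / v2)
            for _ in range(size)]


def check_reciprocal_property(v1, v2, size=40_000, seed=0):
    """Return (empirical F_{.95,v1,v2}, 1 / empirical F_{.05,v2,v1}).

    statistics.quantiles(..., n=20) returns the 5th, 10th, ..., 95th percentiles.
    """
    rng = random.Random(seed)
    lower_point = statistics.quantiles(f_draws(v1, v2, size, rng), n=20)[0]
    upper_point = statistics.quantiles(f_draws(v2, v1, size, rng), n=20)[-1]
    return lower_point, 1 / upper_point
```

For v1 = 5 and v2 = 10 the two returned values should be close; exact agreement is not expected because each is a sample quantile.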
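Inferential Methods
Let $X_1,\ldots,X_m$ and $Y_1,\ldots,Y_n$ be random (independent) samples from normal distributions with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let $S_1^2$ and $S_2^2$ denote the two sample variances; then
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F distribution with $v_1 = m - 1$ and $v_2 = n - 1$ df.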
F Test for Equality of Variances
Null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$
Test statistic value: $f = s_1^2/s_2^2$
F Test for Equality of Variances
Alternative Hypothesis | Rejection Region
$H_a: \sigma_1^2 > \sigma_2^2$ | $f \ge F_{\alpha,m-1,n-1}$
$H_a: \sigma_1^2 < \sigma_2^2$ | $f \le F_{1-\alpha,m-1,n-1}$
$H_a: \sigma_1^2 \neq \sigma_2^2$ | $f \ge F_{\alpha/2,m-1,n-1}$ or $f \le F_{1-\alpha/2,m-1,n-1}$
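A sketch of the F test decision. The F critical values are inputs (from an F table, or `scipy.stats.f.ppf` if SciPy is available), since the standard library has no F quantile function; the function and parameter names are hypothetical.

```python
import statistics


def f_test_equal_variances(x, y, alternative="two-sided",
                           upper_crit=None, lower_crit=None):
    """F test of H0: sigma1^2 = sigma2^2 using f = s1^2 / s2^2.

    Critical values use m-1 numerator and n-1 denominator df:
      'greater':   upper_crit = F_{alpha,m-1,n-1}
      'less':      lower_crit = F_{1-alpha,m-1,n-1}
      'two-sided': upper_crit = F_{alpha/2,m-1,n-1}, lower_crit = F_{1-alpha/2,m-1,n-1}
    A lower critical value can be computed as 1 / F_{alpha,n-1,m-1}.
    """
    f = statistics.variance(x) / statistics.variance(y)
    if alternative == "greater":
        reject = f >= upper_crit
    elif alternative == "less":
        reject = f <= lower_crit
    elif alternative == "two-sided":
        reject = f >= upper_crit or f <= lower_crit
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return f, reject
```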
P-Values for F Tests
The P-value for an upper-tailed F test
is the area under the F curve with
appropriate numerator and
denominator df to the right of the
calculated f.
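Computing this area exactly needs the F cdf (for example `scipy.stats.f.sf(f, v1, v2)` in SciPy). Without it, a Monte Carlo estimate is a reasonable sketch; the function name is hypothetical and the estimate carries simulation error of roughly $\sqrt{p(1-p)/\text{reps}}$.

```python
import random


def f_upper_tail_pvalue_mc(f, v1, v2, reps=50_000, seed=0):
    """Monte Carlo estimate of P(F >= f) for F ~ F(v1, v2).

    Each draw is (X1/v1) / (X2/v2) with chi-squared X's simulated as gamma rv's.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        draw = (rng.gammavariate(v1 / 2, 2) / v1) / (rng.gammavariate(v2 / 2, 2) / v2)
        if draw >= f:
            hits += 1
    return hits / reps
```

For $v_1 = v_2 = 2$ the upper-tail area has the closed form $1/(1 + f)$, so f = 3 should give an estimate near .25.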