Chapter 22 Comparing two proportions
Two populations, two unknown proportions p1 and p2
Problems
- Estimate the difference p1 - p2
- Test HO: p1 = p2
Samples: Two independent, large samples of sizes n1, n2
Sample proportions:
$$\hat p_1 = \frac{\text{success}_1}{n_1}, \qquad \hat p_2 = \frac{\text{success}_2}{n_2}$$
- Point estimate of p1 - p2 is $\hat p_1 - \hat p_2$
- If n1, n2 are large, then $\hat p_1 - \hat p_2$ is approximately normal with mean p1 - p2
- Standard deviation of $\hat p_1 - \hat p_2$ is
$$SD(\hat p_1 - \hat p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}, \qquad q_1 = 1 - p_1, \; q_2 = 1 - p_2$$
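To see the standard-deviation formula at work, the sketch below simulates many pairs of independent samples and compares the empirical spread of $\hat p_1 - \hat p_2$ with $\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$. It is a minimal Python sketch: the function names and the values p1 = 0.30, p2 = 0.45, n1 = 200, n2 = 150 are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def simulate_diff_sd(p1, p2, n1, n2, reps=4000, seed=1):
    """Empirical SD of p1_hat - p2_hat over many simulated pairs of samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count successes in each of the two independent samples
        s1 = sum(rng.random() < p1 for _ in range(n1))
        s2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(s1 / n1 - s2 / n2)
    return statistics.stdev(diffs)


def formula_sd(p1, p2, n1, n2):
    """SD(p1_hat - p2_hat) = sqrt(p1*q1/n1 + p2*q2/n2)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical population values for illustration only
sim_sd = simulate_diff_sd(0.30, 0.45, 200, 150)
theory_sd = formula_sd(0.30, 0.45, 200, 150)
print(f"simulated SD: {sim_sd:.4f}")
print(f"formula SD:   {theory_sd:.4f}")
```

With these settings the two numbers should agree to within a few percent; the remaining gap is simulation noise.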
Two-proportion z-interval
Assumptions
1. Random samples, each with independent observations
2. Independent samples
3. If sampling without replacement, the sample size n should be
no more than 10% of the population.
4. "Large" samples (n1p1 > 10, n1q1>10, n2p2 > 10, n2q2 >10)
Standard Error:
$$SE(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}}$$
C% Margin of Error:
$$ME(\hat p_1 - \hat p_2) = z^* \times SE(\hat p_1 - \hat p_2)$$
where z* is the critical value of the standard normal distribution that corresponds to the C% confidence level
A C% confidence interval for the difference p1 - p2 is
$$(\hat p_1 - \hat p_2) \pm ME(\hat p_1 - \hat p_2)$$
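The interval recipe can be sketched as a small Python function. The name `two_prop_z_interval` is hypothetical; it takes the sample proportions, uses the unpooled SE, and obtains z* from the standard library's `statistics.NormalDist`. As an assumption of this sketch, condition 4 is checked with the sample proportions, since p1 and p2 are unknown.

```python
import math
from statistics import NormalDist


def two_prop_z_interval(p1_hat, n1, p2_hat, n2, conf=0.95):
    """Return the C% two-proportion z-interval for p1 - p2 as (lower, upper)."""
    # Condition 4 ("large" samples), checked here with the sample proportions
    counts = (n1 * p1_hat, n1 * (1 - p1_hat), n2 * p2_hat, n2 * (1 - p2_hat))
    if min(counts) <= 10:
        raise ValueError("samples too small: each n*p-hat and n*q-hat must exceed 10")
    diff = p1_hat - p2_hat
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* leaves (1 - C)/2 of the standard normal area in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    me = z_star * se
    return diff - me, diff + me
```

Passing `conf=0.90` or `conf=0.99` changes only z*; the SE is the same for every confidence level.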
Example: In 2000 researchers contacted 25,138 Americans aged
24 years to see if they had finished high school;
84.9% of the 12,460 males and
88.1% of the 12,678 females
indicated that they had a high school diploma. Create a 95%
confidence interval for the difference in graduation rate between
males and females.
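Data
$$\hat p_1 = 0.849, \quad \hat p_2 = 0.881, \quad n_1 = 12460, \quad n_2 = 12678$$
$$\hat p_1 - \hat p_2 = 0.849 - 0.881 = -0.032$$
Standard Error:
$$SE(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}} = \sqrt{\frac{0.849 \times 0.151}{12460} + \frac{0.881 \times 0.119}{12678}} \approx 0.004$$
Critical value: z* = 1.96
95% Margin of Error:
$$ME(\hat p_1 - \hat p_2) = 1.96 \times 0.004 \approx 0.008$$
A 95% confidence interval for the difference p1 - p2 is
$$(\hat p_1 - \hat p_2) \pm ME(\hat p_1 - \hat p_2)$$
Answer: -0.032 ± 0.008, or (-0.040, -0.024)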
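The arithmetic of this example can be reproduced in a few lines of Python. This is a sketch that uses only the summary statistics given in the example; the variable names are illustrative. Carrying the unrounded SE (about 0.0043) gives a margin of about 0.0084, which still rounds to the same interval.

```python
import math

# Summary statistics from the example (1 = males, 2 = females)
p1_hat, n1 = 0.849, 12460
p2_hat, n2 = 0.881, 12678
z_star = 1.96  # critical value for 95% confidence

diff = p1_hat - p2_hat
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
me = z_star * se
lower, upper = diff - me, diff + me
print(f"difference = {diff:.3f}, SE = {se:.4f}, ME = {me:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

It should print an interval of about (-0.040, -0.024).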
Two-proportion z-test
Assumptions
1. Random samples, each with independent observations
2. Independent samples
3. If sampling without replacement, the sample size n should be
no more than 10% of the population.
4. "Large" samples (n1p1 > 10, n1q1 > 10, n2p2 > 10, n2q2 > 10)
Hypotheses:
1. Null hypothesis
HO: p1 = p2, that is, HO: p1 - p2 = 0
2. Alternative hypothesis
HA: p1 > p2
or HA: p1 < p2
or HA: p1 ≠ p2
that is
HA: p1- p2 > 0 or HA: p1 - p2 < 0 or HA: p1- p2 ≠ 0
Attitude: Assume that the null hypothesis HO is true and uphold
it unless the data speak strongly against it.
To estimate the common p = p1 = p2, we combine (pool) the two samples:
$$\hat p_{pooled} = \frac{\text{success}_1 + \text{success}_2}{n_1 + n_2} = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2}$$
and use it to estimate the standard deviation of $\hat p_1 - \hat p_2$.
Pooled standard error of $\hat p_1 - \hat p_2$:
$$SE_{pooled}(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_{pooled}\,\hat q_{pooled}}{n_1} + \frac{\hat p_{pooled}\,\hat q_{pooled}}{n_2}}$$
Test statistic:
$$z = \frac{\hat p_1 - \hat p_2}{SE_{pooled}(\hat p_1 - \hat p_2)}$$
Distribution under H0: approximately standard normal
P-value: Let z0 be the observed value of the test statistic. How we compute the P-value depends on HA:
HA                P-value
HA: p1 > p2       P(z > z0)
HA: p1 < p2       P(z < z0)
HA: p1 ≠ p2       P(z > |z0|) + P(z < -|z0|)
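A minimal Python sketch of the pooled two-proportion z-test follows. The function `two_prop_z_test` and its `alternative` keywords ("greater", "less", "two-sided") are hypothetical names for this illustration; the function takes success counts, pools them as described, and reads the P-value from the standard normal according to the HA table.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-proportion z-test of HO: p1 = p2; returns (z0, P-value).

    x1, x2 are success counts; alternative is "greater" (HA: p1 > p2),
    "less" (HA: p1 < p2) or "two-sided" (HA: p1 != p2).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    se_pooled = math.sqrt(p_pooled * q_pooled / n1 + p_pooled * q_pooled / n2)
    z0 = (p1_hat - p2_hat) / se_pooled
    Z = NormalDist()  # standard normal
    if alternative == "greater":
        p_value = 1 - Z.cdf(z0)                # P(z > z0)
    elif alternative == "less":
        p_value = Z.cdf(z0)                    # P(z < z0)
    elif alternative == "two-sided":
        p_value = 2 * (1 - Z.cdf(abs(z0)))     # P(z > |z0|) + P(z < -|z0|)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z0, p_value
```

For instance, with 60 successes out of 100 versus 40 out of 100 (hypothetical counts), `two_prop_z_test(60, 100, 40, 100, "greater")` gives z ≈ 2.83 and a one-sided P-value of about 0.002.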
Example.
Of 995 respondents, 37% reported that they snored at least a few nights
a week. Split into two age categories, 26% of the 184 people
under 30 snored, compared with 39% of the 811 in the older group. Is
this difference real (statistically significant) or due only to natural
fluctuations? Use α = 0.05.
Assumptions
1. Random samples, each with independent observations
2. Independent samples
3. If sampling without replacement, the sample size n should be no more
than 10% of the population.
4. "Large" samples (n1p1 > 10, n1q1 > 10, n2p2 > 10, n2q2 > 10)
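Data:
$$\hat p_1 = 0.26, \quad \hat p_2 = 0.39, \quad n_1 = 184, \quad n_2 = 811$$
Hypotheses:
HO: p1 = p2   (HO: p1 - p2 = 0)
HA: p1 < p2   (HA: p1 - p2 < 0)
Estimate of the common p = p1 = p2:
$$\hat p_{pooled} = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2} = \frac{184 \times 0.26 + 811 \times 0.39}{184 + 811} \approx 0.366$$
Pooled standard error of $\hat p_1 - \hat p_2$:
$$SE_{pooled}(\hat p_1 - \hat p_2) = \sqrt{\frac{0.366 \times 0.634}{184} + \frac{0.366 \times 0.634}{811}} \approx 0.039$$
Test statistic:
$$z = \frac{\hat p_1 - \hat p_2}{SE_{pooled}(\hat p_1 - \hat p_2)} = \frac{0.26 - 0.39}{0.039} \approx -3.3$$
P-value: P(z < -3.3) ≈ 0.0005
Conclusion: Reject HO at level α = 0.05 (and even at 0.01); the difference in snoring rates is statistically significant.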
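The snoring calculation can be checked with the sketch below, which works directly from the reported proportions (26% of 184 is not a whole number of people, so the sketch does not convert to counts). Variable names are illustrative.

```python
import math
from statistics import NormalDist

p1_hat, n1 = 0.26, 184   # under 30
p2_hat, n2 = 0.39, 811   # older group
alpha = 0.05

# Pooled estimate of the common p under HO: p1 = p2
p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
q_pooled = 1 - p_pooled
se_pooled = math.sqrt(p_pooled * q_pooled / n1 + p_pooled * q_pooled / n2)
z0 = (p1_hat - p2_hat) / se_pooled
p_value = NormalDist().cdf(z0)  # HA: p1 < p2, so the lower tail P(z < z0)
print(f"pooled p = {p_pooled:.3f}, SE = {se_pooled:.4f}, z = {z0:.2f}, P = {p_value:.4f}")
print("Reject HO" if p_value < alpha else "Fail to reject HO")
```

Carrying full precision gives z ≈ -3.3 and a P-value of about 0.0005, well below α = 0.05.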
STT 200 102/104/701 Summer A
A.Makagon
4/30/2017
Chapter 23 Inferences About Means
Problems
- Estimate μ
- Test HO: μ = μ0
Assumptions:
- Normal population (or large sample)
- Unknown population mean μ
- Unknown standard deviation σ
Point Estimator:
$$\bar x = \frac{x_1 + x_2 + \dots + x_n}{n}$$
- If σ is unknown and n is large, then for any population
$$z = \frac{\bar x - \mu}{s/\sqrt{n}}$$
is approximately standard normal.
- If the population is normal and the standard deviation σ is known, then for any n (large or small) the sample mean $\bar x$ is $N(\mu, \sigma/\sqrt{n})$, and hence
$$z = \frac{\bar x - \mu}{\sigma/\sqrt{n}}$$
is standard normal.
- If the population is normal and the standard deviation σ is unknown, then for any n (large or small) the statistic
$$t = \frac{\bar x - \mu}{s/\sqrt{n}}$$
has a Student's t-distribution with n - 1 degrees of freedom.
Example.
Using t tables (Table T) and/or a calculator, find or estimate
1. the critical value t7* for a 90% confidence level if the number of degrees of freedom is 7
2. the one-tail probability if t = 2.56 and the number of degrees of freedom is 7
3. the two-tail probability if t = 2.56 and the number of degrees of freedom is 7
NOTE: If t has a Student's t-distribution with df degrees of freedom, then the TI-83 function tcdf(a,b,df) computes the area under the t-curve between a and b.
Solution:
1. Critical value t7* for a 90% confidence level with 7 degrees of freedom = 1.895 (from Table T)
2. One-tail probability for t = 2.56 with 7 degrees of freedom = tcdf(2.56,10^10,7) = 0.0188
3. Two-tail probability for t = 2.56 with 7 degrees of freedom = 2 × 0.0188 = 0.0376
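Python's standard library has no Student's t functions, so the sketch below approximates them: it integrates the t density numerically (Simpson's rule) for tail areas and uses bisection for critical values. This is a hedged stand-in for Table T or the TI-83 `tcdf` function, not a library API; the function names are hypothetical. (With SciPy available, `scipy.stats.t.sf` and `scipy.stats.t.ppf` serve the same purpose.)

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on [0, |t|] and symmetry."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    area = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # area under the t-curve between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area


def t_critical(conf, df):
    """t* with area conf between -t* and t*, found by bisection."""
    target = (1 - conf) / 2  # area left in each tail
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid  # tail still too big: t* lies further right
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.90, 7), 3))        # compare with Table T: 1.895
print(round(t_upper_tail(2.56, 7), 4))      # one tail, compare with 0.0188
print(round(2 * t_upper_tail(2.56, 7), 4))  # two tails, compare with 0.0376
```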
One-sample t-interval
for population mean μ
Assumptions
1. Random sample, independent observations
2. If sampling without replacement, the sample size n should be
no more than 10% of the population.
3. Normal population
Point Estimator:
$$\bar x = \frac{x_1 + x_2 + \dots + x_n}{n}$$
Standard Error:
$$SE(\bar x) = \frac{s}{\sqrt{n}}$$
C% Margin of Error:
$$ME(\bar x) = t^*_{n-1} \times SE(\bar x)$$
where $t^*_{n-1}$ is a critical value for Student's t-model with n - 1 degrees of freedom that corresponds to the C% confidence level
A C% confidence interval for μ is
$$\bar x \pm t^*_{n-1}\, SE(\bar x)$$
Sample size needed for a given ME:
$$n = \frac{(t^*_{n-1})^2 \, s^2}{ME^2}$$
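The sample-size formula translates directly into code. Because $t^*_{n-1}$ itself depends on n, a common approach is to start with z* (1.645 for 90%), compute n, and then, if desired, repeat the calculation with $t^*_{n-1}$ for that n. The helper name and the planning values below (s = 4.25, ME = 1) are hypothetical.

```python
import math


def sample_size_for_me(crit, s, me):
    """Smallest whole n with crit * s / sqrt(n) <= me, i.e. n >= (crit * s / me) ** 2.

    crit is the critical value: z* on a first pass, or t*_{n-1} on a later pass.
    """
    return math.ceil((crit * s / me) ** 2)


# Hypothetical planning values: s = 4.25, desired ME = 1, 90% confidence
n_first = sample_size_for_me(1.645, 4.25, 1.0)  # first pass with z* = 1.645
print(n_first)
```

Rounding up (rather than to the nearest integer) guarantees the margin of error does not exceed the target.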
Example:
Below is the speed of vehicles recorded on Triphammer Road:
Find a 90% confidence interval for the mean speed μ of vehicles driving
on Triphammer Road.
Sample size: n = 23 (small)
Descriptive statistics: $\bar x = 31.0$, s = 4.25
The histogram is symmetric, so we assume a normal model.
Degrees of freedom: df = n - 1 = 22
Critical value (90%): t22* = 1.717
$$SE(\bar x) = \frac{s}{\sqrt{n}} = \frac{4.25}{\sqrt{23}} \approx 0.886$$
$$ME(\bar x) = 1.717 \times 0.886 \approx 1.521$$
A 90% confidence interval for the mean speed of vehicles driving on
Triphammer Road is
31.0 ± 1.5 mph, or (29.5, 32.5) mph
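The same interval can be computed from the summary statistics with a short Python sketch. The helper `t_interval` is a hypothetical name, and it expects the critical value to be supplied (here t22* = 1.717 from Table T, as in the example) rather than computing it.

```python
import math


def t_interval(x_bar, s, n, t_star):
    """One-sample t-interval x_bar ± t* * s / sqrt(n).

    t_star must be the critical value for n - 1 degrees of freedom
    at the desired confidence level (e.g. read from Table T).
    """
    se = s / math.sqrt(n)
    me = t_star * se
    return x_bar - me, x_bar + me


# Triphammer Road: n = 23, x_bar = 31.0, s = 4.25, t*_22 = 1.717 for 90%
lower, upper = t_interval(31.0, 4.25, 23, 1.717)
print(f"90% CI: ({lower:.1f}, {upper:.1f})")
```

It should print (29.5, 32.5), matching the hand calculation.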
One-sample t-test
for population mean μ
Assumptions
1. Random sample, independent observations
2. If sampling without replacement, the sample size n should be
no more than 10% of the population.
3. Normal population (or large sample)
Hypotheses:
1. Null hypothesis
HO: μ = μ0
2. Alternative hypothesis
HA: μ > μ0
or HA: μ < μ0
or HA: μ ≠ μ0
Attitude: Assume that the null hypothesis HO is true and uphold
it unless the data speak strongly against it.
Standard error:
$$SE(\bar x) = \frac{s}{\sqrt{n}}$$
Test statistic:
$$t = \frac{\bar x - \mu_0}{SE(\bar x)}$$
Under HO, t has Student's t-distribution with n - 1 degrees of freedom.
P-value: Let t0 be the observed value of the test statistic.
HA                P-value
HA: μ > μ0        P(t > t0)
HA: μ < μ0        P(t < t0)
HA: μ ≠ μ0        P(t > |t0|) + P(t < -|t0|)
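The decision table maps onto a small function. This Python sketch is illustrative only: `one_sample_t_test` and the `alternative` keywords are hypothetical names, and the t tail area is approximated by Simpson's rule because the standard library has no t-distribution.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t with df degrees of freedom.

    Uses Simpson's rule for the area between 0 and |t|, plus symmetry.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    area = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def one_sample_t_test(x_bar, s, n, mu0, alternative="two-sided"):
    """Return (t0, P-value) for HO: mu = mu0, using n - 1 degrees of freedom."""
    se = s / math.sqrt(n)
    t0 = (x_bar - mu0) / se
    df = n - 1
    if alternative == "greater":      # HA: mu > mu0 -> P(t > t0)
        p_value = t_upper_tail(t0, df)
    elif alternative == "less":       # HA: mu < mu0 -> P(t < t0)
        p_value = 1 - t_upper_tail(t0, df)
    elif alternative == "two-sided":  # HA: mu != mu0 -> P(t > |t0|) + P(t < -|t0|)
        p_value = 2 * t_upper_tail(abs(t0), df)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return t0, p_value
```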
Example - cont.
Below is the speed of vehicles recorded on Triphammer Road:
$\bar x = 31.0$, s = 4.25
Test whether the data provide evidence that the mean speed of vehicles on
Triphammer Road exceeds 30 mph.
n = 23 (small). The histogram is symmetric, so we assume a normal model.
We use the one-sample t-test.
Hypotheses: HO: μ = 30 vs. HA: μ > 30
Standard error:
$$SE(\bar x) = \frac{s}{\sqrt{n}} = \frac{4.25}{\sqrt{23}} \approx 0.886$$
Test statistic:
$$t = \frac{\bar x - 30}{SE(\bar x)} = \frac{31 - 30}{0.886} \approx 1.13$$
Degrees of freedom: df = n - 1 = 22
P-value: greater than 0.10 (Table T); TI-83: tcdf(1.13,1E99,22) ≈ 0.14
Conclusion: Fail to reject H0, even at α = 0.10. The data do not provide convincing evidence that the mean speed exceeds 30 mph.
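As a check on the Triphammer Road test, the sketch below recomputes SE, t and the upper-tail P-value from the summary statistics, approximating the t-distribution numerically in place of `tcdf` (an assumption of this sketch; the helper and variable names are illustrative).

```python
import math


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 via Simpson's rule on [0, t]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps is even, as Simpson's rule requires
    area = pdf(0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - area * h / 3


# Summary statistics from the Triphammer Road example
x_bar, s, n, mu0, alpha = 31.0, 4.25, 23, 30.0, 0.10
se = s / math.sqrt(n)
t0 = (x_bar - mu0) / se
p_value = t_upper_tail(t0, n - 1)  # HA: mu > 30, so the upper tail P(t > t0)
print(f"SE = {se:.3f}, t = {t0:.2f}, df = {n - 1}, P-value = {p_value:.3f}")
print("Reject HO" if p_value < alpha else "Fail to reject HO")
```

It should report t ≈ 1.13 and a P-value of roughly 0.14, so HO is not rejected at α = 0.10.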