Chapter 22 Comparing two proportions

Two populations, two unknown proportions $p_1$ and $p_2$.

Problems:
1. Estimate the difference $p_1 - p_2$.
2. Test $H_0: p_1 = p_2$.

Samples: two independent, large samples of sizes $n_1$ and $n_2$.

Sample proportions:
$$\hat p_1 = \frac{\text{success}_1}{n_1}, \qquad \hat p_2 = \frac{\text{success}_2}{n_2}$$

Point estimate of $p_1 - p_2$: $\hat p_1 - \hat p_2$.

If $n_1$ and $n_2$ are large, then $\hat p_1 - \hat p_2$ is approximately normal with mean $p_1 - p_2$ and standard deviation
$$SD(\hat p_1 - \hat p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}, \qquad q_i = 1 - p_i.$$

Two-proportion z-interval

Assumptions
1. Random samples, each with independent observations.
2. Independent samples.
3. If sampling without replacement, each sample size should be no more than 10% of its population.
4. "Large" samples ($n_1 p_1 > 10$, $n_1 q_1 > 10$, $n_2 p_2 > 10$, $n_2 q_2 > 10$).

Standard error:
$$SE(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}}$$

C% margin of error:
$$ME(\hat p_1 - \hat p_2) = z^* \cdot SE(\hat p_1 - \hat p_2)$$
Here $z^*$ is the critical value of the standard normal distribution that corresponds to the C% confidence level.

A C% confidence interval for the difference $p_1 - p_2$ is
$$(\hat p_1 - \hat p_2) \pm ME(\hat p_1 - \hat p_2).$$

Example: In 2000, researchers contacted 25,138 Americans aged 24 to see whether they had finished high school. Of the 12,460 males, 84.9% said they had a high school diploma, compared with 88.1% of the 12,678 females. Create a 95% confidence interval for the difference in graduation rates between males and females.

Data: $\hat p_1 = 0.849$, $\hat p_2 = 0.881$, $n_1 = 12460$, $n_2 = 12678$. This gives $\hat p_1 - \hat p_2 = 0.849 - 0.881 = -0.032$.

Standard error:
$$SE(\hat p_1 - \hat p_2) = \sqrt{\frac{0.849 \cdot 0.151}{12460} + \frac{0.881 \cdot 0.119}{12678}} \approx 0.004$$

Critical value: $z^* = 1.96$.

95% margin of error: $ME(\hat p_1 - \hat p_2) = 1.96 \cdot 0.004 \approx 0.008$.

The 95% confidence interval for the difference $p_1 - p_2$ is $(\hat p_1 - \hat p_2) \pm ME(\hat p_1 - \hat p_2)$.

Answer: $-0.032 \pm 0.008$, or $(-0.040, -0.024)$.

Two-proportion z-test

Assumptions
1. Random samples, each with independent observations.
2. Independent samples.
3. If sampling without replacement, each sample size should be no more than 10% of its population.
4. "Large" samples ($n_1 p_1 > 10$, $n_1 q_1 > 10$, $n_2 p_2 > 10$, $n_2 q_2 > 10$).

Hypotheses:
1. Null hypothesis $H_0: p_1 = p_2$, that is, $H_0: p_1 - p_2 = 0$.
2. Alternative hypothesis $H_A: p_1 > p_2$ or $H_A: p_1 < p_2$ or $H_A: p_1 \neq p_2$, that is, $H_A: p_1 - p_2 > 0$ or $H_A: p_1 - p_2 < 0$ or $H_A: p_1 - p_2 \neq 0$.

Attitude: Assume that the null hypothesis $H_0$ is true, and uphold it unless the data strongly speak against it.

To estimate the common proportion $p = p_1 = p_2$, we combine (pool) the two samples:
$$\hat p_{pooled} = \frac{\text{success}_1 + \text{success}_2}{n_1 + n_2} = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2}$$
We then use it to estimate the standard deviation of $\hat p_1 - \hat p_2$.

Pooled standard error of $\hat p_1 - \hat p_2$:
$$SE_{pooled}(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_{pooled}\,\hat q_{pooled}}{n_1} + \frac{\hat p_{pooled}\,\hat q_{pooled}}{n_2}}$$

Test statistic:
$$z = \frac{\hat p_1 - \hat p_2}{SE_{pooled}(\hat p_1 - \hat p_2)}$$

Distribution under $H_0$: approximately standard normal.

P-value: Let $z_0$ be the observed value of the test statistic. How the P-value is computed depends on $H_A$:
$H_A: p_1 > p_2$: P-value $= P(z > z_0)$
$H_A: p_1 < p_2$: P-value $= P(z < z_0)$
$H_A: p_1 \neq p_2$: P-value $= P(z > |z_0|) + P(z < -|z_0|)$

Example: Of 995 respondents, 37% reported that they snored at least a few nights a week. The respondents were split into two age categories. Of the 184 people under 30, 26% snored, compared with 39% of the 811 people in the older group. Is this difference real (statistically significant), or is it due only to natural fluctuation? Use $\alpha = 0.05$.

Assumptions: the four conditions of the two-proportion z-test listed above. These are random samples with independent observations, independent samples, the 10% condition, and large samples.

Data: $\hat p_1 = 0.26$, $\hat p_2 = 0.39$, $n_1 = 184$, $n_2 = 811$.

Hypotheses: $H_0: p_1 = p_2$ ($H_0: p_1 - p_2 = 0$) vs. $H_A: p_1 < p_2$ ($H_A: p_1 - p_2 < 0$).

Estimate of the common $p = p_1 = p_2$:
$$\hat p_{pooled} = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2} = \frac{184 \cdot 0.26 + 811 \cdot 0.39}{184 + 811} \approx 0.366$$

Pooled standard error:
$$SE_{pooled}(\hat p_1 - \hat p_2) = \sqrt{\frac{0.366 \cdot 0.634}{184} + \frac{0.366 \cdot 0.634}{811}} \approx 0.039$$

Test statistic:
$$z = \frac{\hat p_1 - \hat p_2}{SE_{pooled}(\hat p_1 - \hat p_2)} = \frac{0.26 - 0.39}{0.039} \approx -3.33$$

P-value: $P(z < -3.33) \approx 0.0004$.

Conclusion: The P-value is far below $\alpha = 0.05$ (and below 0.01), so we reject $H_0$. The lower snoring rate among people under 30 is statistically significant.

STT 200 102/104/701 Summer A A.Makagon 4/30/2017

Chapter 23 Inferences About Means

Problems:
1. Estimate $\mu$.
2. Test $H_0: \mu = \mu_0$.

Assumptions:
1. Normal population (or large sample).
2. Unknown population mean $\mu$.
3. Unknown standard deviation $\sigma$.

Point estimator:
$$\bar x = \frac{x_1 + x_2 + \dots + x_n}{n}$$

If $\sigma$ is unknown and $n$ is large, then for any population
$$z = \frac{\bar x - \mu}{s/\sqrt{n}}$$
is approximately standard normal.

If the population is normal and the standard deviation $\sigma$ is known, then for any $n$ (large or small) the sample mean $\bar x$ is $N(\mu, \sigma/\sqrt{n})$. Hence
$$z = \frac{\bar x - \mu}{\sigma/\sqrt{n}}$$
is standard normal.

If the population is normal and the standard deviation is unknown, then for any $n$ (large or small) the statistic
$$t = \frac{\bar x - \mu}{s/\sqrt{n}}$$
has a Student's t-distribution with $n - 1$ degrees of freedom.

Example: Using t tables (Table T) and/or a calculator, find or estimate:
1. the critical value $t_7^*$ for a 90% confidence level with 7 degrees of freedom;
2. the one-tail probability for $t = 2.56$ with 7 degrees of freedom;
3. the two-tail probability for $t = 2.56$ with 7 degrees of freedom.

NOTE: If $t$ has a Student's t-distribution with df degrees of freedom, the TI-83 function tcdf(a,b,df) computes the area under the t-curve between a and b.

Solution:
1. $t_7^* = 1.895$ (from Table T).
2. One-tail probability: tcdf(2.56,10^10,7) = 0.0188.
3. Two-tail probability: $2 \cdot 0.0188 = 0.0376$.

One-sample t-interval for population mean

Assumptions
1. Random sample, independent observations.
2. If sampling without replacement, the sample size $n$ should be no more than 10% of the population.
3. Normal population.

Point estimator: $\bar x = \frac{x_1 + x_2 + \dots + x_n}{n}$

Standard error: $SE(\bar x) = \frac{s}{\sqrt{n}}$

C% margin of error:
$$ME(\bar x) = t_{n-1}^* \cdot SE(\bar x)$$
Here $t_{n-1}^*$ is the critical value for Student's t-model with $n - 1$ degrees of freedom that corresponds to the C% confidence level.

A C% confidence interval for $\mu$ is
$$\bar x \pm t_{n-1}^* \cdot SE(\bar x).$$

Sample size needed for a given ME:
$$n = \frac{(t_{n-1}^*)^2 s^2}{ME^2}$$

Example: Below are the speeds of vehicles recorded on Triphammer Road:
[data table not reproduced]
Find a 90% confidence interval for the mean speed of vehicles driving on Triphammer Road.

Sample size: $n = 23$ (small).
Descriptive statistics: $\bar x = 31.0$, $s = 4.25$.
The histogram is symmetric, so we assume a normal model.
Degrees of freedom: $df = n - 1 = 22$, so $t_{22}^* = 1.717$.
$SE(\bar x) = \frac{s}{\sqrt{n}} = \frac{4.25}{\sqrt{23}} \approx 0.886$
$ME(\bar x) = 1.717 \cdot 0.886 \approx 1.521$

A 90% confidence interval for the mean speed of vehicles driving on Triphammer Road is $31.0 \pm 1.5$ mph, or $(29.5, 32.5)$.

One-sample t-test for population mean

Assumptions
1. Random sample, independent observations.
2. If sampling without replacement, the sample size $n$ should be no more than 10% of the population.
3. Normal population (or large sample).

Hypotheses:
1. Null hypothesis $H_0: \mu = \mu_0$.
2. Alternative hypothesis $H_A: \mu > \mu_0$ or $H_A: \mu < \mu_0$ or $H_A: \mu \neq \mu_0$.

Attitude: Assume that the null hypothesis $H_0$ is true, and uphold it unless the data strongly speak against it.

Standard error: $SE(\bar x) = \frac{s}{\sqrt{n}}$

Test statistic:
$$t = \frac{\bar x - \mu_0}{SE(\bar x)}$$
Under $H_0$, $t$ has Student's t-distribution with $n - 1$ degrees of freedom.

P-value: Let $t_0$ be the observed value of the test statistic. The P-value depends on $H_A$:
$H_A: \mu > \mu_0$: P-value $= P(t > t_0)$
$H_A: \mu < \mu_0$: P-value $= P(t < t_0)$
$H_A: \mu \neq \mu_0$: P-value $= P(t > |t_0|) + P(t < -|t_0|)$

Example (continued): For the Triphammer Road speeds, $\bar x = 31.0$ and $s = 4.25$. Test whether the data provide evidence that the mean speed of vehicles on Triphammer Road exceeds 30 mph.

The sample is small ($n = 23$) but the histogram is symmetric. We therefore assume a normal model and use the one-sample t-test.

Hypotheses: $H_0: \mu = 30$ vs. $H_A: \mu > 30$.
Standard error: $SE(\bar x) = \frac{4.25}{\sqrt{23}} \approx 0.886$
Test statistic: $t = \frac{\bar x - 30}{SE(\bar x)} = \frac{31 - 30}{0.886} \approx 1.13$
Degrees of freedom: $df = n - 1 = 22$.
P-value: greater than 0.10 (Table T). On the TI-83, tcdf(1.13,1E99,22) ≈ 0.14.
Conclusion: Fail to reject $H_0$ even at $\alpha = 0.10$. The data do not provide convincing evidence that the mean speed exceeds 30 mph.
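The two Chapter 22 procedures, the two-proportion z-interval and the pooled two-proportion z-test, can be checked against the hand calculations with a short script. This is a minimal sketch in plain Python. It assumes only the standard library: `statistics.NormalDist` stands in for the z-table, supplying $z^*$ and the normal tail areas. The function names and the `alternative` argument are hypothetical helpers chosen for illustration, not part of any course software.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(p1_hat, n1, p2_hat, n2, conf=0.95):
    """C% confidence interval for p1 - p2, using the unpooled standard error."""
    diff = p1_hat - p2_hat
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* leaves (1 - conf) / 2 of the standard normal area in the upper tail
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    me = z_star * se
    return diff - me, diff + me


def two_prop_z_test(p1_hat, n1, p2_hat, n2, alternative="two-sided"):
    """Pooled two-proportion z-test of H0: p1 = p2; returns (z, P-value)."""
    p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    std_normal = NormalDist()  # approximate distribution of z under H0
    if alternative == "greater":    # HA: p1 > p2
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":     # HA: p1 < p2
        p_value = std_normal.cdf(z)
    else:                           # HA: p1 != p2
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Graduation-rate example: 95% CI for p_males - p_females
low, high = two_prop_z_interval(0.849, 12460, 0.881, 12678)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")

# Snoring example: HA: p_under30 < p_older
z, p = two_prop_z_test(0.26, 184, 0.39, 811, alternative="less")
print(f"z = {z:.2f}, P-value = {p:.4f}")
```

The script carries full precision throughout. For the snoring data the unrounded pooled SE is about 0.0393, which makes $z$ about $-3.3$. The one-sided P-value is then well below 0.01, so the decision to reject $H_0$ holds.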
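The Chapter 23 look-ups (Table T, or tcdf on the TI-83) can also be reproduced numerically, along with the Triphammer Road interval and test. The sketch below assumes no statistics package is installed. It integrates the Student's t density with Simpson's rule to get tail areas, and finds $t^*$ by bisection. That is much slower than a library routine, but it stays self-contained. All function names here are hypothetical illustrations, not a standard API.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t0, df, steps=1000):
    """P(T > t0) = 0.5 minus the signed area from 0 to t0 (Simpson's rule, even steps)."""
    h = t0 / steps  # negative when t0 < 0, which gives the signed integral
    area = t_pdf(0.0, df) + t_pdf(t0, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def t_critical(conf, df):
    """t* whose upper-tail area is (1 - conf) / 2, found by bisection on [0, 50]."""
    target = (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid  # tail area still too large: t* lies further right
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, conf):
    """C% one-sample t-interval: xbar +/- t*_{n-1} * s / sqrt(n)."""
    me = t_critical(conf, n - 1) * s / sqrt(n)
    return xbar - me, xbar + me


def t_test(xbar, s, n, mu0, alternative="greater"):
    """One-sample t-test of H0: mu = mu0; returns (t0, P-value)."""
    t0 = (xbar - mu0) / (s / sqrt(n))
    df = n - 1
    if alternative == "greater":    # HA: mu > mu0
        p_value = t_upper_tail(t0, df)
    elif alternative == "less":     # HA: mu < mu0
        p_value = 1 - t_upper_tail(t0, df)
    else:                           # HA: mu != mu0
        p_value = 2 * t_upper_tail(abs(t0), df)
    return t0, p_value


# Table T look-ups from the t-distribution example (df = 7)
print(f"t7* for 90%: {t_critical(0.90, 7):.3f}")
print(f"one-tail P(t > 2.56): {t_upper_tail(2.56, 7):.4f}")
print(f"two-tail: {2 * t_upper_tail(2.56, 7):.4f}")

# Triphammer Road: 90% interval, then H0: mu = 30 vs HA: mu > 30
print(t_interval(31.0, 4.25, 23, 0.90))
print(t_test(31.0, 4.25, 23, 30))
```

In everyday work a library routine would replace these hand-rolled functions. SciPy's `scipy.stats.t`, for example, provides `ppf` for critical values and `sf` for upper-tail areas. The numerical integration here only makes visible what Table T and tcdf compute.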