Chapter 9
Inferences Based on Two Samples

Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.

9.1 z Tests and Confidence Intervals for a Difference Between Two Population Means

The Difference Between Two Population Means
Assumptions:
1. $X_1, \dots, X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$.
2. $Y_1, \dots, Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$.
3. The X and Y samples are independent of one another.

Expected Value and Standard Deviation of $\bar{X} - \bar{Y}$
The expected value is $\mu_1 - \mu_2$, so $\bar{X} - \bar{Y}$ is an unbiased estimator of $\mu_1 - \mu_2$. The standard deviation is
$$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$

Test Procedures for Normal Populations With Known Variances
Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test statistic value:
$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$

$\beta(\Delta')$ = P(Type II Error) when $\mu_1 - \mu_2 = \Delta'$
Alt. Hypothesis                          $\beta(\Delta')$
$H_a: \mu_1 - \mu_2 > \Delta_0$          $\Phi\left(z_\alpha - \dfrac{\Delta' - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 < \Delta_0$          $1 - \Phi\left(-z_\alpha - \dfrac{\Delta' - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$        $\Phi\left(z_{\alpha/2} - \dfrac{\Delta' - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \dfrac{\Delta' - \Delta_0}{\sigma}\right)$
Here $\sigma = \sigma_{\bar{X} - \bar{Y}}$.

Large-Sample Tests
When both sample sizes are large, the assumptions of normal population distributions and known values of $\sigma_1$, $\sigma_2$ are unnecessary. The Central Limit Theorem guarantees that $\bar{X} - \bar{Y}$ has approximately a normal distribution.

Large-Sample Tests
Use of the test statistic value
$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}} \qquad (m, n > 40)$$
along with the previously stated rejection regions based on z critical values gives large-sample tests whose significance levels are approximately $\alpha$.

Confidence Interval for $\mu_1 - \mu_2$
Provided m and n are large, a CI for $\mu_1 - \mu_2$ with a confidence level of $100(1 - \alpha)\%$ is
$$\bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
Confidence bounds can be found by replacing $z_{\alpha/2}$ by $z_\alpha$.
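The large-sample z statistic and confidence interval above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_sample_z` and the data are hypothetical, the P-value shown is the two-sided one, and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z(x, y, delta0=0.0, alpha=0.05):
    """Large-sample z statistic, two-sided P-value, and CI for mu1 - mu2.

    Hypothetical helper; intended for m, n > 40 as the slides require.
    """
    m, n = len(x), len(y)
    diff = mean(x) - mean(y)
    # Estimated standard deviation of X-bar minus Y-bar, using s1^2 and s2^2.
    se = sqrt(variance(x) / m + variance(y) / n)
    z = (diff - delta0) / se
    std_normal = NormalDist()
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z, p_two_sided, (diff - z_crit * se, diff + z_crit * se)


# Made-up samples of size 50 each, purely for illustration.
x = [10.0 + (i % 5) for i in range(50)]
y = [9.0 + (i % 5) for i in range(50)]
z, p, ci = two_sample_z(x, y)
print(f"z = {z:.3f}, two-sided P-value = {p:.5f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

For a one-sided confidence bound, `z_crit` would be computed from `1 - alpha` instead of `1 - alpha / 2`, matching the replacement of $z_{\alpha/2}$ by $z_\alpha$.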
9.2 The Two-Sample t Test and Confidence Interval

Assumptions
Both populations are normal, so that $X_1, \dots, X_m$ is a random sample from a normal distribution and so is $Y_1, \dots, Y_n$. The plausibility of these assumptions can be judged by constructing a normal probability plot of the $x_i$'s and another of the $y_i$'s.

t Distribution
When the population distributions are both normal, the standardized variable
$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$
has approximately a t distribution with df $\nu$ estimated from the data.

t Distribution
df $\nu$ can be estimated from the data by
$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m - 1} + \dfrac{(s_2^2/n)^2}{n - 1}}$$
(round down to the nearest integer)

Two-Sample CI for $\mu_1 - \mu_2$
The two-sample CI for $\mu_1 - \mu_2$ with a confidence level of $100(1 - \alpha)\%$ is
$$\bar{x} - \bar{y} \pm t_{\alpha/2,\nu} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$

Two-Sample t Test
Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test statistic value:
$$t = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$

The Two-Sample t Test
Alternative Hypothesis                   Rejection Region for Approx. Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > \Delta_0$          $t \ge t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 < \Delta_0$          $t \le -t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$        $t \ge t_{\alpha/2,\nu}$ or $t \le -t_{\alpha/2,\nu}$

Pooled t Procedures
Assume the two populations are normal and have equal variances. If $\sigma^2$ denotes the common variance, it can be estimated by combining information from the two samples. Standardizing $\bar{X} - \bar{Y}$ using the pooled estimator gives a t variable based on m + n – 2 df.

9.3 Analysis of Paired Data
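Before the paired analysis, it may help to have the unpaired two-sample t procedure of Section 9.2 in code form as a baseline. The sketch below computes the test statistic and the estimated df $\nu$, rounded down. The name `two_sample_t` and the data are hypothetical. The critical value $t_{\alpha,\nu}$ is not computed, since the Python standard library has no t distribution; it would come from a t table or a statistics package.

```python
from math import floor, sqrt
from statistics import mean, variance


def two_sample_t(x, y, delta0=0.0):
    """Two-sample t statistic and estimated df (hypothetical helper)."""
    m, n = len(x), len(y)
    a = variance(x) / m  # s1^2 / m
    b = variance(y) / n  # s2^2 / n
    t = (mean(x) - mean(y) - delta0) / sqrt(a + b)
    # Estimated df, rounded down to the nearest integer as the slides specify.
    nu = (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))
    return t, floor(nu)


# Made-up small samples, purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t, nu = two_sample_t(x, y)
print(f"t = {t:.4f} with df = {nu}")
```

The returned `t` is then compared with the appropriate t critical value on `nu` df, using the rejection regions listed in Section 9.2.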
Paired Data (Assumptions)
The data consists of n independently selected pairs $(X_1, Y_1), \dots, (X_n, Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. Let $D_1 = X_1 - Y_1, \dots, D_n = X_n - Y_n$. The $D_i$'s are assumed to be normally distributed with mean value $\mu_D$ and variance $\sigma_D^2$.

The Paired t Test
Null hypothesis: $H_0: \mu_D = \Delta_0$
Test statistic value:
$$t = \frac{\bar{d} - \Delta_0}{s_D / \sqrt{n}}$$
where $\bar{d}$ and $s_D$ are the sample mean and standard deviation of the $d_i$'s.

The Paired t Test
Alternative Hypothesis          Rejection Region for Level $\alpha$ Test
$H_a: \mu_D > \Delta_0$         $t \ge t_{\alpha,n-1}$
$H_a: \mu_D < \Delta_0$         $t \le -t_{\alpha,n-1}$
$H_a: \mu_D \ne \Delta_0$       $t \ge t_{\alpha/2,n-1}$ or $t \le -t_{\alpha/2,n-1}$

Confidence Interval for $\mu_D$
The paired t CI for $\mu_D$ is
$$\bar{d} \pm t_{\alpha/2,n-1} \cdot s_D / \sqrt{n}$$
Confidence bounds can be found by replacing $t_{\alpha/2}$ by $t_\alpha$.

Paired Data and Two-Sample t
$$V(\bar{X} - \bar{Y}) = V(\bar{D}) = V\left(\frac{1}{n} \sum D_i\right) = \frac{V(D_i)}{n} = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{n}$$
Independence between X and Y: $\rho = 0$
Positive dependence: $\rho > 0$

Pros and Cons of Pairing
1. For great heterogeneity and large correlation within experimental units, the loss in degrees of freedom will be compensated for by an increased precision associated with pairing (use pairing).
2. If the units are relatively homogeneous and the correlation within pairs is not large, the gain in precision due to pairing will be outweighed by the decrease in degrees of freedom (use independent samples).

9.4 Inferences Concerning a Difference Between Population Proportions

Difference Between Population Proportions
Let X ~ Bin(m, $p_1$) and Y ~ Bin(n, $p_2$) with X and Y independent variables.
Then, with $\hat{p}_1 = X/m$ and $\hat{p}_2 = Y/n$,
$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$
so $\hat{p}_1 - \hat{p}_2$ is an unbiased estimator of $p_1 - p_2$, and
$$V(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{m} + \frac{p_2 q_2}{n} \qquad (q_i = 1 - p_i)$$

Large-Samples
Null hypothesis: $H_0: p_1 - p_2 = 0$
Test statistic value:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\,(1/m + 1/n)}}$$
where $\hat{p} = (X + Y)/(m + n)$ is the pooled sample proportion and $\hat{q} = 1 - \hat{p}$.

Large-Samples
Alternative Hypothesis          Rejection Region
$H_a: p_1 - p_2 > 0$            $z \ge z_\alpha$
$H_a: p_1 - p_2 < 0$            $z \le -z_\alpha$
$H_a: p_1 - p_2 \ne 0$          $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$
Valid provided $m\hat{p}_1$, $m\hat{q}_1$, $n\hat{p}_2$, and $n\hat{q}_2$ are all at least 10.

General Expressions for $\beta(p_1, p_2)$
Alt. Hypothesis                 $\beta(p_1, p_2)$
$H_a: p_1 - p_2 > 0$            $\Phi\left(\dfrac{z_\alpha \sqrt{\bar{p}\bar{q}\,(1/m + 1/n)} - (p_1 - p_2)}{\sigma}\right)$
$H_a: p_1 - p_2 < 0$            $1 - \Phi\left(\dfrac{-z_\alpha \sqrt{\bar{p}\bar{q}\,(1/m + 1/n)} - (p_1 - p_2)}{\sigma}\right)$

General Expressions for $\beta(p_1, p_2)$
Alt. Hypothesis                 $\beta(p_1, p_2)$
$H_a: p_1 - p_2 \ne 0$          $\Phi\left(\dfrac{z_{\alpha/2} \sqrt{\bar{p}\bar{q}\,(1/m + 1/n)} - (p_1 - p_2)}{\sigma}\right) - \Phi\left(\dfrac{-z_{\alpha/2} \sqrt{\bar{p}\bar{q}\,(1/m + 1/n)} - (p_1 - p_2)}{\sigma}\right)$
where $\bar{p} = (m p_1 + n p_2)/(m + n)$, $\bar{q} = (m q_1 + n q_2)/(m + n)$, and $\sigma = \sqrt{p_1 q_1 / m + p_2 q_2 / n}$.

Sample Size
For the case m = n, the level $\alpha$ test has type II error probability $\beta$ at the alternative values $p_1$, $p_2$ with $p_1 - p_2 = d$ when
$$n = \frac{\left[ z_\alpha \sqrt{(p_1 + p_2)(q_1 + q_2)/2} + z_\beta \sqrt{p_1 q_1 + p_2 q_2} \right]^2}{d^2}$$

Confidence Interval for $p_1 - p_2$
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{m} + \frac{\hat{p}_2 \hat{q}_2}{n}}$$

9.5 Inferences Concerning Two Population Variances

The F Distribution
The F probability distribution has parameters $\nu_1$ (number of numerator df) and $\nu_2$ (number of denominator df). If $X_1$ and $X_2$ are independent chi-squared rv's with $\nu_1$ and $\nu_2$ df, then the rv
$$F = \frac{X_1 / \nu_1}{X_2 / \nu_2}$$
has an F distribution.

The F Distribution
Density Curve Property:
$$F_{1-\alpha,\nu_1,\nu_2} = \frac{1}{F_{\alpha,\nu_2,\nu_1}}$$
[Figure: F density curve; the shaded area to the right of $F_{\alpha,\nu_1,\nu_2}$ equals $\alpha$.]
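The reciprocal property $F_{1-\alpha,\nu_1,\nu_2} = 1/F_{\alpha,\nu_2,\nu_1}$ can be checked numerically. The sketch below is a Monte Carlo illustration, not an exact calculation. It builds F variates from the chi-squared ratio in the definition, generating each chi-squared variate as a gamma variate with shape $\nu/2$ and scale 2. The function names, the seed, the sample size, and the choice $\nu_1 = 5$, $\nu_2 = 10$, $\alpha = 0.05$ are all assumptions made for illustration.

```python
import random


def chi2_sample(rng, df):
    # A chi-squared rv with df degrees of freedom is Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2, 2.0)


def f_samples(rng, v1, v2, size):
    # Ratio of independent chi-squared rv's, each divided by its df.
    return [(chi2_sample(rng, v1) / v1) / (chi2_sample(rng, v2) / v2)
            for _ in range(size)]


def upper_critical_value(samples, alpha):
    """Empirical F_alpha: roughly a fraction alpha of the samples lies above it."""
    ordered = sorted(samples)
    return ordered[int((1 - alpha) * len(ordered))]


rng = random.Random(0)  # fixed seed so the run is repeatable
v1, v2, alpha = 5, 10, 0.05
lower = upper_critical_value(f_samples(rng, v1, v2, 50_000), 1 - alpha)
reciprocal = 1 / upper_critical_value(f_samples(rng, v2, v1, 50_000), alpha)
print(f"estimated F_(1-alpha),v1,v2 = {lower:.4f}")
print(f"estimated 1 / F_alpha,v2,v1 = {reciprocal:.4f}")
```

The two printed estimates come from independent simulated samples, so they should agree only up to simulation error. This is why F tables need to list upper-tail critical values only.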
Inferential Methods
Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be random (independent) samples from normal distributions with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let $S_1^2$ and $S_2^2$ denote the two sample variances; then
$$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$$
has an F distribution with $\nu_1 = m - 1$ and $\nu_2 = n - 1$.

F Test for Equality of Variances
Null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$
Test statistic value: $f = s_1^2 / s_2^2$

F Test for Equality of Variances
Alternative Hypothesis                   Rejection Region
$H_a: \sigma_1^2 > \sigma_2^2$           $f \ge F_{\alpha,m-1,n-1}$
$H_a: \sigma_1^2 < \sigma_2^2$           $f \le F_{1-\alpha,m-1,n-1}$
$H_a: \sigma_1^2 \ne \sigma_2^2$         $f \ge F_{\alpha/2,m-1,n-1}$ or $f \le F_{1-\alpha/2,m-1,n-1}$

P-Values for F Tests
The P-value for an upper-tailed F test is the area under the F curve with appropriate numerator and denominator df to the right of the calculated f.
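As a small illustration of the F test for equality of variances, the sketch below computes the test statistic $f = s_1^2 / s_2^2$ together with its numerator and denominator df. The function name `f_statistic` and the data are hypothetical. The critical values and the P-value are not computed here, since the Python standard library has no F distribution; they would come from an F table or a statistics package.

```python
from statistics import variance


def f_statistic(x, y):
    """Return f = s1^2 / s2^2 with numerator df m - 1 and denominator df n - 1."""
    return variance(x) / variance(y), len(x) - 1, len(y) - 1


# Made-up samples, purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
f, df_num, df_den = f_statistic(x, y)
print(f"f = {f:.4f} on ({df_num}, {df_den}) df")
```

For an upper-tailed test, the P-value is then the area to the right of `f` under the F curve with `df_num` and `df_den` df.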