Topic 20 – COMPARING TWO POPULATIONS (OR TREATMENTS)

A) Two Population Means Using Independent Samples

EXAMPLE A scientist is interested in determining which of two butterfly subspecies has a larger wingspan. Subspecies 1 is found on forest understory plants and tends to feed on its nursery plants, so it doesn't travel far. The other subspecies is found on open field flowers and migrates seasonally. She hypothesizes that the migrating subspecies has a larger average wingspan than the forest subspecies and plans to take two samples to test her hypothesis.

Notation:

  Population   Population   Population           Sample   Sample        Sample
               Mean         Standard Deviation   Size     Mean          Standard Deviation
  1            $\mu_1$      $\sigma_1$           $n_1$    $\bar{x}_1$   $s_1$
  2            $\mu_2$      $\sigma_2$           $n_2$    $\bar{x}_2$   $s_2$

To compare two population means we consider the size of the difference $\mu_1 - \mu_2$:

$$\mu_1 - \mu_2 = 0 \;\Rightarrow\; \mu_1 = \mu_2$$
$$\mu_1 - \mu_2 > 0 \;\Rightarrow\; \mu_1 > \mu_2$$
$$\mu_1 - \mu_2 < 0 \;\Rightarrow\; \mu_1 < \mu_2$$

Our sampling estimator of this population difference is the sample mean difference $\bar{x}_1 - \bar{x}_2$ when the two samples are independent of one another.

Sampling Distribution of $\bar{x}_1 - \bar{x}_2$ when the two samples are independently and randomly taken:
1) the mean of the distribution is $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ (that is, $\bar{x}_1 - \bar{x}_2$ is unbiased)
2) the standard deviation of the distribution is
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
3) the shape of the sampling distribution is approximately normal (a bell curve) if
   a) both $n_1$ and $n_2$ are large, or
   b) both of the populations being sampled are approximately normally distributed

The estimator of $\mu_1 - \mu_2$ is $\hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2$.

The estimator of $\sigma_{\bar{X}_1 - \bar{X}_2}$ depends on whether $\sigma_1 \ne \sigma_2$ (unequal variance case) or $\sigma_1 = \sigma_2$ (equal variance case).

Equal Variance Case: When $\sigma_1 = \sigma_2$, the estimator of $\sigma_{\bar{X}_1 - \bar{X}_2}$ is given by
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where the estimator of the common variance is
$$s_c^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}.$$
The degrees of freedom for this estimator are $n_1 + n_2 - 2$.

Unequal Variance Case: When $\sigma_1 \ne \sigma_2$, the estimator of $\sigma_{\bar{X}_1 - \bar{X}_2}$ is given by
$$s'_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
What are the degrees of freedom for $s'_{\bar{x}_1 - \bar{x}_2}$? Satterthwaite showed that the appropriate degrees of freedom for this estimator are
$$df = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} \quad\text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}.$$
These reduce to $df = n_1 + n_2 - 2$ when the two sample variances are equal and the two sample sizes are equal; otherwise they are smaller.

So, if $\bar{x}_1 - \bar{x}_2$ is at least approximately normally distributed we get that
$$t'_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \quad\text{or}\quad t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_c^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
have approximate T-distributions on the associated degrees of freedom.

Hypothesis Test of the Difference in Two Population Means Based on Two Independent Samples:

Hypotheses are one of three:
a) $H_0: \mu_1 - \mu_2 \le D_0$ vs. $H_A: \mu_1 - \mu_2 > D_0$
b) $H_0: \mu_1 - \mu_2 \ge D_0$ vs. $H_A: \mu_1 - \mu_2 < D_0$
c) $H_0: \mu_1 - \mu_2 = D_0$ vs. $H_A: \mu_1 - \mu_2 \ne D_0$
where $D_0$ is the hypothesized difference between the means.

Test Statistic: depends on whether the variances in the two populations are different or the same, so the statistic is either
$$(1)\quad t'_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \quad\text{or}\quad (2)\quad t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_c^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
The degrees of freedom are
for (1): $\dfrac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}}$ where $V_1 = \dfrac{s_1^2}{n_1}$ and $V_2 = \dfrac{s_2^2}{n_2}$,
and for (2): $n_1 + n_2 - 2$.

P-value: depends on the alternative hypothesis:
a) P-value = $\Pr(T > t_{obs})$
b) P-value = $\Pr(T < t_{obs})$
c) P-value = $2\Pr(T > |t_{obs}|)$

Decision Rule: reject $H_0$ if P-value ≤ α

Assumptions:
1. $n_1$ and $n_2$ are large enough for the sample means to be approximately normally distributed
2. the sampling was random and each sample is no more than 5% of its population
3. the two samples are independently taken

EXAMPLE Nitrogen is the most common nutrient applied to soils.
In tropical areas with warm temperatures and heavy rainfall, only part of the applied nitrogen is used by crops and the rest is lost. Information about the mean nitrogen loss (N-loss) is important for research on optimal growth of plants. To that end, two nitrogen fertilizer treatments are to be compared for their average N-loss: Urea alone (population 1) and Urea+N-Serve (population 2). A sugarcane field was divided into equal-size plots and plots were randomly assigned to one of the two treatments. There were sufficient numbers of plots so that no treated plots were adjacent on any side.

Important Point about Experimental Design: when planning an experiment to compare two or more treatments:
1) experimental units (plants, field plots, people, etc.) should be randomly selected from the larger group from which they could be selected (the population of potential experimental units)
2) treatments should be randomly assigned to the experimental units
3) extraneous or confounding factors should be considered and minimized when assigning and running the experiment (e.g. all units should be the same size, have the same weather conditions, etc.)

The following data represent nitrogen loss (% of total N applied) at the end of a 16-week period:

  Fertilizer   Percentage N-loss
  UN           10.8, 10.5, 14.0, 13.5, 8.0, 9.5, 11.8, 10.0, 8.7, 9.0, 9.8, 13.8, 14.7, 10.3, 12.8
  U            8.0, 7.3, 14.1, 9.8, 7.1, 6.3, 10.0, 7.1, 7.9, 6.1, 6.9, 11.0, 10.0

  Group   Treatment ID   N    Mean     SD      S²
  U       1              13   8.585    2.288   5.235
  UN      2              15   11.147   2.140   4.580

Question: Is there sufficient evidence to support the hypothesis that the two treatments differ in their mean percentage N-loss?
Hypotheses: $H_0: \mu_1 - \mu_2 = 0$ vs. $H_A: \mu_1 - \mu_2 \ne 0$

Significance level: α = 0.05

Test Statistic (assuming unequal variances):
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(8.58 - 11.15) - 0}{\sqrt{\dfrac{(2.29)^2}{13} + \dfrac{(2.14)^2}{15}}} = -3.045$$
(The summary statistics are shown rounded; the final value is computed from the unrounded statistics.)

Degrees of Freedom:
$$V_1 = \frac{s_1^2}{n_1} = \frac{(2.29)^2}{13} = 0.4034, \qquad V_2 = \frac{s_2^2}{n_2} = \frac{(2.14)^2}{15} = 0.3053$$
$$df = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} = \frac{(0.4034 + 0.3053)^2}{\dfrac{(0.4034)^2}{13 - 1} + \dfrac{(0.3053)^2}{15 - 1}} = 24.8 \approx 24 \quad\text{(always round down)}$$

P-value: $2\Pr(T > |t|) = 2\Pr(T > 3.0)$. From the table in the book, we see that for 24 df, $|t_{obs}| = 3.0$ lies between 2.797 (upper-tail area 0.005) and 3.467 (upper-tail area 0.001). Hence, the p-value for our test lies between 2(0.001) = 0.002 and 2(0.005) = 0.01.

Test Statistic (assuming equal variances):
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_c^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(8.58 - 11.15) - 0}{\sqrt{4.882\left(\dfrac{1}{13} + \dfrac{1}{15}\right)}} = -3.06$$
where
$$s_c^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2} = \frac{5.235(12) + 4.580(14)}{13 + 15 - 2} = 4.882$$

Degrees of Freedom: $n_1 + n_2 - 2 = 26$.

P-value: $2\Pr(T > |t|) = 2\Pr(T > 3.0)$. From the table in the book, we see that for 26 df, $|t_{obs}| = 3.0$ lies between 2.779 (upper-tail area 0.005) and 3.435 (upper-tail area 0.001). Hence, the p-value for our test lies between 2(0.001) = 0.002 and 2(0.005) = 0.01.

Conclusion: Regardless of the choice of test, the p-value is less than α = 0.05, so we reject the null hypothesis. There is sufficient evidence to indicate that the two nitrogen treatments differ in their average percentage nitrogen loss.
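The hand calculation above can be cross-checked from the raw data. The following is a minimal Python sketch (Python is used here only for illustration; it is not the software used in these notes). The function name `two_sample_t` and the variable names are illustrative choices, and only the standard library is used, so no p-value is computed.

```python
import math
import statistics

# N-loss data (% of applied N) copied from the example above.
urea = [8.0, 7.3, 14.1, 9.8, 7.1, 6.3, 10.0, 7.1, 7.9, 6.1, 6.9, 11.0, 10.0]
urea_nserve = [10.8, 10.5, 14.0, 13.5, 8.0, 9.5, 11.8, 10.0, 8.7, 9.0,
               9.8, 13.8, 14.7, 10.3, 12.8]


def two_sample_t(x1, x2, d0=0.0):
    """Return (t_unequal, df_unequal, t_pooled, df_pooled) for H0: mu1 - mu2 = d0."""
    n1, n2 = len(x1), len(x2)
    diff = statistics.mean(x1) - statistics.mean(x2) - d0
    s1_sq = statistics.variance(x1)  # sample variance, divisor n - 1
    s2_sq = statistics.variance(x2)

    # Unequal-variance statistic with Satterthwaite degrees of freedom.
    v1, v2 = s1_sq / n1, s2_sq / n2
    t_unequal = diff / math.sqrt(v1 + v2)
    df_unequal = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    # Equal-variance statistic based on the pooled variance estimate.
    sc_sq = (s1_sq * (n1 - 1) + s2_sq * (n2 - 1)) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(sc_sq * (1 / n1 + 1 / n2))
    return t_unequal, df_unequal, t_pooled, n1 + n2 - 2


t_u, df_u, t_p, df_p = two_sample_t(urea, urea_nserve)
print(f"unequal variances: t = {t_u:.3f}, df = {df_u:.1f}")
print(f"equal variances:   t = {t_p:.3f}, df = {df_p}")
```

Up to rounding, the printed values should agree with the statistics and degrees of freedom worked out by hand above. The p-values would still come from a t-table or from statistical software.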
Had we used SAS, the code and output would be:

    /* read (treatment, loss) pairs; @@ allows several observations per line */
    data nloss;
      input treatment $ loss @@;
      cards;
    UN 10.8 UN 10.5 UN 14.0 UN 13.5 UN 8.0 UN 9.5 UN 11.8 UN 10.0
    UN 8.7 UN 9.0 UN 9.8 UN 13.8 UN 14.7 UN 10.3 UN 12.8
    U 8.0 U 7.3 U 14.1 U 9.8 U 7.1 U 6.3 U 10.0 U 7.1
    U 7.9 U 6.1 U 6.9 U 11.0 U 10.0
    ;
    /* two-sample t-tests (pooled and Satterthwaite) plus the folded F test */
    proc ttest data=nloss;
      class treatment;
      var loss;
    run;

Statistics

  Variable   treatment    N    LowerCL   Mean    UpperCL   LowerCL   StdDev   UpperCL   StdErr
                               Mean              Mean      StdDev             StdDev
  loss       U            13   7.201     8.584   9.967     1.640     2.288    3.777     0.6347
  loss       UN           15   9.961     11.14   12.33     1.566     2.139    3.374     0.5525
  loss       Diff (1-2)        -4.283    -2.56   -0.84     1.740     2.209    3.028     0.8373

In the Diff (1-2) row, Mean is $\bar{y}_1 - \bar{y}_2$, StdDev is the pooled standard deviation $\sqrt{s_c^2}$, and StdErr is $\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.

T-Tests

  Variable   Method          Variances   DF     t Value   Pr > |t|
  loss       Pooled          Equal       26     -3.06     0.0051
  loss       Satterthwaite   Unequal     24.8   -3.04     0.0054

Equality of Variances

  Variable   Method     Num DF   Den DF   F Value   Pr > F
  loss       Folded F   12       14       1.14      0.8014

Note that the p-values are exact here and are equal to 0.0051 or 0.0054 depending on whether we use the test assuming equal variances or not.

There are two questions here:

1) We just saw that when the two sample variances are close in value, the two test statistics are almost identical. So, why not just use the unequal variance test all of the time?

Actually, that is not unreasonable, since the unequal variance test statistic coincides with the equal variance test statistic when the two sample variances are identical. In reality though, even when the two populations have equal variance, the sample variances can be quite different. This is especially true when the sample sizes are not the same. As a result, the unequal variance test is not as good as the equal variance test when the two population variances are equal: it tends to have a higher type II error rate when the two variances are equal but we assume they are not. So, the next question is:

2) How do we identify which test statistic should be used?
Well, we can either
• use the rule of thumb that the two sample variances should be within a factor of 3 of each other, OR
• do a test of equality of the two variances.

We will learn the test for equality of variances after CI estimation of the difference in two population means.

Confidence Interval Estimation of the Difference of Two Means Based on Independent Samples:

Interval Estimator:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df} \times \left(\text{estimator of } \sigma_{\bar{x}_1 - \bar{x}_2}\right)$$
where the t-value is based on the desired confidence level (1 − α) and has degrees of freedom calculated according to which estimator you use for the variance (equal or unequal).

Assumptions:
1. $n_1$ and $n_2$ are large enough for the sample means to be approximately normally distributed
2. the sampling was random and each sample is no more than 5% of its population
3. the two samples are independently taken

EXAMPLE N-loss experiment. A 95% confidence interval based on two independent samples is given by either
$$(\bar{x}_1 - \bar{x}_2) \pm t_{0.025,\,24}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \quad\text{or}\quad (\bar{x}_1 - \bar{x}_2) \pm t_{0.025,\,26}\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

From earlier:

  Group   Treatment ID   N    Mean     SD      S²
  U       1              13   8.585    2.288   5.235
  UN      2              15   11.147   2.140   4.580

For unequal variances, the t-value for 95% confidence and 24 df is 2.064. So, the confidence interval (computed from the unrounded summary statistics) is
$$(8.58 - 11.15) \pm 2.064\sqrt{\frac{(2.29)^2}{13} + \frac{(2.14)^2}{15}} \approx (-4.30,\ -0.83)$$
Similarly, if we assume equal variances, the t-value for 95% confidence and 26 df is 2.056, so we obtain
$$(8.58 - 11.15) \pm 2.056\sqrt{4.882\left(\frac{1}{13} + \frac{1}{15}\right)} \approx (-4.2831,\ -0.8410)$$

Thus, with 95% confidence, the mean nitrogen loss (%) from Urea alone is between 0.8 and 4.3 percentage points below the mean loss of the Urea+N-Serve combination, regardless of which method we use.

Note that the SAS output shown earlier reports the 95% confidence interval of the difference assuming equal variances. If you want the interval for unequal variances, you will have to calculate it yourself.
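Since the unequal-variance interval has to be calculated by hand, a short script can help. The sketch below is a minimal Python illustration, not part of the SAS workflow of these notes. It takes the summary statistics from the table above and treats the two critical values as assumed inputs read from a t-table (2.064 for 24 df and 2.056 for 26 df, to three decimals), because the Python standard library has no t quantile function. The helper name `interval` is illustrative.

```python
import math

# Summary statistics from the N-loss example
# (group 1 = Urea alone, group 2 = Urea+N-Serve).
n1, mean1, var1 = 13, 8.585, 5.235
n2, mean2, var2 = 15, 11.147, 4.580

# Critical values t_{0.025, df}: assumed inputs looked up in a t-table.
T_CRIT_24 = 2.064  # 24 df, unequal-variance case
T_CRIT_26 = 2.056  # 26 df, equal-variance case


def interval(estimate, t_crit, std_err):
    """Return (lower, upper) for estimate +/- t_crit * std_err."""
    half_width = t_crit * std_err
    return estimate - half_width, estimate + half_width


diff = mean1 - mean2

# Unequal-variance standard error of the difference.
se_unequal = math.sqrt(var1 / n1 + var2 / n2)

# Equal-variance standard error based on the pooled variance.
pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

ci_unequal = interval(diff, T_CRIT_24, se_unequal)
ci_pooled = interval(diff, T_CRIT_26, se_pooled)
print("unequal variances:", [round(v, 2) for v in ci_unequal])
print("equal variances:  ", [round(v, 2) for v in ci_pooled])
```

Both intervals lie entirely below zero, which is consistent with the conclusion that Urea alone has the lower mean N-loss. Small differences in the last digits relative to a hand calculation come from how far the t-value and the summary statistics are rounded.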
EXAMPLE Discharge of industrial waste into rivers affects water quality. To assess the effect of a power plant on water quality, 24 samples were taken 16 km upstream of the plant and another 24 were taken 4 km downstream. Alkalinity (mg/l) was measured on each water sample. Do the data suggest that the true mean alkalinity below the plant is more than 50 mg/l higher than the true mean alkalinity upstream of the plant?

Since the two tests report similar results, we will use the unequal variance test here. Output from a statistical software program:

  Group        N    Mean    SD     Pop'ln
  upstream     24   75.9    1.83   2
  downstream   24   183.6   1.70   1

  tobs = 113.2    df = 45    2-sided P-value = 0+

Hypotheses: $H_0: \mu_1 - \mu_2 = 50$ vs. $H_A: \mu_1 - \mu_2 > 50$

Check the t-score:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(183.6 - 75.9) - 50}{\sqrt{\dfrac{(1.70)^2}{24} + \dfrac{(1.83)^2}{24}}} = 113.17$$
The test is one-sided, so its p-value is half the reported two-sided value and is likewise essentially 0.

Assumptions:
1) sample sizes large enough?
2) samples independent and random?

Conclusion: There is strong evidence to suggest that the average alkalinity of the water below the power plant is more than 50 mg/l higher than the mean alkalinity of the water above the power plant.

For a 95% confidence interval estimate of the difference we have: the t-value for 95% and 45 df ≈ 2.02. So,
$$(183.6 - 75.9) \pm 2.02\sqrt{\frac{(1.70)^2}{24} + \frac{(1.83)^2}{24}} = 107.7 \pm 1.03 = (106.67,\ 108.73)$$
We conclude with 95% confidence that the mean alkalinity below the power plant is between 106.7 and 108.7 mg/l higher than the mean alkalinity of the water above the power plant!

B) Comparing Two Population Variances Using Independent Samples

Suppose we are interested in determining which t-test to use to compare two means, or we might be interested in comparing two population variances for other purposes. A simple test of two population variances based on two independent samples is called Hartley's Fmax test or the folded F-test. An underlying assumption is that the two populations being tested are Normally distributed.
To test hypotheses about population variances we look at the ratio of the two sample variances:
$$F_{obs} = \frac{s_{\max}^2}{s_{\min}^2} \quad\text{where } s_{\max}^2 = \max(s_1^2, s_2^2) \text{ and } s_{\min}^2 = \min(s_1^2, s_2^2).$$
Hence, the larger sample variance is always put in the numerator.

This test statistic, $F_{obs}$, has a sampling distribution known as the F-distribution with two sets of degrees of freedom, the numerator and the denominator degrees of freedom. For the Fmax test:
• the numerator df are $n_{\max} - 1$ ($n_{\max}$ is the sample size for $s_{\max}^2$) and
• the denominator df are $n_{\min} - 1$ ($n_{\min}$ is the sample size for $s_{\min}^2$).
Note that $n_{\max}$ need not be larger than $n_{\min}$!

The F-distribution is positively skewed with a long right tail and a shape that depends on the two df values. It is a probability distribution for random variables whose values are > 0 (like the Chi-Square distribution).

As with the chi-square distribution, we can use a table of cutoff values to determine whether to reject the null hypothesis. See pages 625-635 of Freund & Wilson. For the Fmax test, use the table on page 635 if the two sample sizes are the same, i.e. $n_1 = n_2$.

Hartley's Fmax Test of Equality of Two Population Variances Based on Two Independent Samples:

Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_A: \sigma_1^2 \ne \sigma_2^2$

Test Statistic: $F_{obs} = \dfrac{s_{\max}^2}{s_{\min}^2}$ where $s_{\max}^2 > s_{\min}^2$

The numerator df are $n_{\max} - 1$ ($n_{\max}$ is the sample size for $s_{\max}^2$) and the denominator df are $n_{\min} - 1$ ($n_{\min}$ is the sample size for $s_{\min}^2$).

Decision Rule (2 approaches):
1) reject $H_0$ if $F_{obs}$ > tabulated F-value for α and the two sets of df.
2) reject $H_0$ if the p-value of the test < α.

EXAMPLE A wildlife biologist is interested in comparing the variability in weights for two populations of deer: those raised in the wild and those raised in a zoo. She randomly selected eight deer from each population and weighed them (lbs) at the age of 1 year.
The data are:

  W (wild):   114.7, 134.5, 128.9, 126.7, 111.5, 120.6, 116.4, 129.6
  Z (zoo):    103.1, 182.5, 90.7, 76.8, 129.5, 87.3, 75.8, 77.3

$H_0: \sigma_W^2 = \sigma_Z^2$ vs. $H_A: \sigma_W^2 \ne \sigma_Z^2$

From SAS we have (subset of the output):

Statistics

  Variable   location     N   Mean     Std Dev   Std Err
  weight     W            8   122.86   8.2342    2.9112
  weight     Z            8   102.88   36.853    13.029
  weight     Diff (1-2)       19.988   26.701    13.351

T-Tests

  Variable   Method          Variances   DF    t Value   Pr > |t|
  weight     Pooled          Equal       14    1.50      0.1566
  weight     Satterthwaite   Unequal     7.7   1.50      0.1742

Equality of Variances

  Variable   Method     Num DF   Den DF   F Value   Pr > F
  weight     Folded F   7        7        20.03     0.0008

Test statistic:
$$F_{obs} = \frac{s_{\max}^2}{s_{\min}^2} = \frac{36.85^2}{8.23^2} = 20.03$$

Choose α = 0.05.

Decision:
1) From the table on pg. 635, with the denominator df = 8 − 1 = 7, we have a cutoff value of 4.99. Since $F_{obs}$ = 20.03 > cutoff = 4.99, we reject the null hypothesis and conclude that the two populations of deer, those raised in the wild and those raised in zoos, differ in the variability of their weights at 1 year of age.
2) From the SAS output, the p-value for $F_{obs}$ is 0.0008, which is less than α = 0.05. Hence, we reject the null hypothesis and conclude that there is sufficient evidence to indicate that the variability of weights of deer raised in zoos differs from the variability of weights of wild deer.

C) Comparing Two Population Means Using Paired Samples

Consider the following experiments:

1. In order to determine if two IQ tests yield similar results (means and standard deviations), the researcher selected 50 college students at random to take both tests. The order in which any given student took the tests was randomized and the tests were taken 1 month apart to minimize crossover effects. The hypothesis is that test #1 is biased in that it yields a higher average score than test #2, which has been in use for many years.

Hypotheses: $H_0: \mu_1 - \mu_2 = 0$ vs. $H_A: \mu_1 - \mu_2 > 0$

Note the experimental design here as well as the hypotheses being tested. We can't use the independent samples test for this case.

2. A swine nutritionist wished to compare a nitrogen poor + enzyme diet (#1) to a nitrogen rich diet (#2) for pigs. Rather than take one piglet from each new litter and assign it a diet at random, he chose instead to take 2 piglets from each litter and randomly assign one pig to one diet and the other to the other diet. The hypothesis is that the nitrogen rich diet results in a higher average weight gain than the nitrogen poor + enzyme diet.

Hypotheses: $H_0: \mu_1 - \mu_2 = 0$ vs. $H_A: \mu_1 - \mu_2 < 0$

3. A researcher is interested in the effect of oxygen exposure on cell fluidity in pulmonary artery cells in dogs. She intends to collect cells from ten dogs for the experiment. For each dog, two agar plates of artery cells are prepared and each plate is randomly assigned to either receive O2 or not receive O2 treatment. She wishes to test the hypothesis that the mean fluidity for oxygen treated cells (2) differs from the mean for untreated cells (1).

Hypotheses: $H_0: \mu_1 - \mu_2 = 0$ vs. $H_A: \mu_1 - \mu_2 \ne 0$

In all three cases, the samples are NOT independent of each other. In fact, they are deliberately dependent.

One reason for this is that the estimator of the difference between two means based on 2 independent samples has a large standard deviation (recall that it is the square root of the SUM of two variances). When samples are paired as is done here, the standard deviation of the estimator of $\mu_1 - \mu_2$ is often smaller.

Defn: A PAIRED or “BLOCKED” experiment is one in which each randomly selected experimental unit in the first sample is paired deliberately with a selected unit in the second sample.
The units in the second sample are chosen so that they have characteristics similar to the unit in the first sample to which they have been paired. The characteristics used for pairing are usually those that likely have an effect on the response variable being studied in the experiment but are not of direct interest. Because pairing removes the variability due to these characteristics, the standard deviation is often smaller in paired experiments.

Example #1. Perfect pairing, since each experimental unit in sample 1 is also used in sample 2. By having each student take both tests and looking at the differences in scores, we have removed variability due to intelligence, test-taking ability, or other things that influence an individual's test performance. Intuitively, comparing how several people react to each test is more informative and accurate than comparing results for independently chosen people for each test.

Example #2. Genetics has a relatively large influence on adult size and growth in most animals. Hence it would not be surprising if two pigs from the same litter responded similarly to the two diets, in the sense that each pig would have responded as its littermate did had it been given the littermate's diet. Hence, the two littermates are paired in this experiment and we will look at the difference in responses of littermates on different diets.

Example #3. Although the cells in each of the 2 treatments are not exactly the same, they are as close as possible, being from the same animal. Hence any effect due to animal variability is controlled somewhat by using the same dogs for both treatments.

For paired samples, the estimator of the difference $\mu_1 - \mu_2$ is the average of the paired sample differences D. To obtain this mean difference: for each pair, calculate the difference in Y of the two paired experimental units under the two treatments. Call this difference D.
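Why pairing tends to shrink the standard deviation can be made explicit with the variance of a difference. The short derivation below is a sketch that introduces one symbol not used elsewhere in these notes: $\rho$, the correlation between the two responses within a pair (an assumption of this illustration). It also assumes $n$ pairs, compared against two independent samples of size $n$ each.

```latex
% Variance of one paired difference D = Y_1 - Y_2,
% where rho is the within-pair correlation (symbol introduced here).
\operatorname{Var}(D) = \sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2

% Standard deviation of the mean of n paired differences:
\sigma_{\bar{D}} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n}}

% Two independent samples of size n each (rho = 0):
\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{n}}
```

When the pairing is effective, the two responses in a pair are positively correlated ($\rho > 0$), so the paired standard deviation is the smaller of the two. If the pairing variable has no effect ($\rho \approx 0$), pairing gains nothing and costs degrees of freedom ($n - 1$ instead of $2n - 2$).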
EXAMPLE: Cell fluidity

  Dog     Without O2 (Y1)   With O2 (Y2)   Difference D = (Y1 − Y2)
  1       0.308             0.308           0.000
  2       0.304             0.309          -0.005
  3       0.305             0.305           0.000
  4       0.304             0.311          -0.007
  5       0.301             0.303          -0.002
  6       0.278             0.293          -0.015
  7       0.296             0.302          -0.006
  8       0.301             0.300           0.001
  9       0.302             0.308          -0.006
  10      0.237             0.250          -0.013
  Mean    0.294             0.299          -0.0053
  SD      0.022             0.018           0.00542

We have transformed the original data from two dependent samples: the new data consist of a single sample of n differences. The average of the sample differences is
$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i$$
and the standard deviation is
$$s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}.$$

The sample of differences can be regarded as a random sample from a population of differences if the experimental units (e.g. the ten dogs) can be regarded as a random selection from among all experimental units. In that case, we have

SAMPLING DISTRIBUTION of $\bar{D}$:
1) the mean of the distribution is $\mu_{\bar{D}} = \mu_D = \mu_1 - \mu_2$
2) the standard deviation of the distribution is $\sigma_{\bar{D}} = \dfrac{\sigma_D}{\sqrt{n}}$, where $\sigma_D$ is the standard deviation of the population of differences from which we sampled n differences
3) the shape of the distribution is approximately normal (a bell curve) if n is large or the two populations being sampled are approximately normally distributed.

The estimator of $\mu_D$ is $\bar{D}$, the sample mean difference, and the estimator of $\sigma_D$ is $s_D$, the sample standard deviation of the differences. The problem reverts to a test of the mean $\mu_D$ based on a single sample.

Hypothesis Test of the Difference in Two Population Means Using Paired Samples:

Hypotheses are one of three:
a) $H_0: \mu_D \le D_0$ vs. $H_A: \mu_D > D_0$
b) $H_0: \mu_D \ge D_0$ vs. $H_A: \mu_D < D_0$
c) $H_0: \mu_D = D_0$ vs. $H_A: \mu_D \ne D_0$

Test Statistic:
$$t_{obs} = \frac{\bar{D} - D_0}{s_D / \sqrt{n}} \quad\text{on } n - 1 \text{ df}$$

P-value: depends on the alternative hypothesis:
a) P-value = $\Pr(T > t_{obs})$
b) P-value = $\Pr(T < t_{obs})$
c) P-value = $2\Pr(T > |t_{obs}|)$

Decision Rule: reject $H_0$ if p-value ≤ α

Assumptions:
1. $\bar{D}$ is approximately normally distributed
2. the sampling was random and the sample is no more than 5% of the population.

EXAMPLE dog fluidity study

Hypotheses: $H_0: \mu_D = 0$ vs. $H_A: \mu_D \ne 0$ (i.e. $D_0 = 0$)

Significance Level: we'll choose α = 0.025.

Now, the numbers we need are: $\bar{D} = -0.0053$, $s_D = 0.00542$, and $n = 10$.

Test Statistic:
$$t_{obs} = \frac{\bar{D} - 0}{s_D / \sqrt{n}} = \frac{-0.00530}{0.00542 / \sqrt{10}} = -3.0939, \qquad df = n - 1 = 9.$$

P-value: $2\Pr(T > |t_{obs}|) = 2\Pr(T > 3.09)$ is between 0.01 and 0.02 using the T-table in the book.

Conclusion: p-value < α = 0.025. Hence we reject $H_0$ and conclude that the data provide sufficient evidence at α = 0.025 to indicate that oxygen treatment changes the mean fluidity of pulmonary artery cells in dogs.

Assumptions: The sample size is small, but it is likely that the population of differences is not too skewed.

In SAS, the code and output are:

    /* one record per dog: ID, fluidity without O2 (Y1), fluidity with O2 (Y2) */
    data dogs;
      input dogID Y1 Y2 @@;
      datalines;
    1 0.308 0.308   2 0.304 0.309
    3 0.305 0.305   4 0.304 0.311
    5 0.301 0.303   6 0.278 0.293
    7 0.296 0.302   8 0.301 0.300
    9 0.302 0.308  10 0.237 0.250
    ;
    /* paired t-test on the differences Y1 - Y2 */
    proc ttest data=dogs;
      paired Y1*Y2;
    run;

The TTEST Procedure

Statistics

  Difference   N    LowerCL   Mean     UpperCL   LowerCL   StdDev   UpperCL   StdErr
                    Mean               Mean      StdDev             StdDev
  Y1 - Y2      10   -0.009    -0.005   -0.001    0.0037    0.0054   0.0099    0.0017

T-Tests

  Difference   DF   t Value   Pr > |t|
  Y1 - Y2      9    -3.09     0.0128

The output gives the exact p-value of 0.0128 < α = 0.025.

Confidence Interval For the Difference of Two Means Based on a Paired Sample:
$$\bar{D} \pm t_{\alpha/2,\,n-1}\left(\frac{s_D}{\sqrt{n}}\right)$$

Assumptions: 1) sampling is random and 2) either the sample size is large so we can use the CLT or the original population of differences has a frequency distribution that is bell-curve shaped.

EXAMPLE: dog fluidity study. For a 95% confidence interval of the difference of two means we need the t critical value for 95% and 9 df. It is t = 2.26. Hence, the 95% C.I. of the difference between mean fluidity in cells without and with oxygen is
$$-0.0053 \pm 2.26\left(\frac{0.00542}{\sqrt{10}}\right) = -0.0053 \pm 0.0039 = (-0.0092,\ -0.0014)$$
which implies, with 95% confidence, that the mean fluidity in the cells without oxygen is below the mean fluidity for those that receive oxygen. The SAS output provides the same confidence limits.
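The paired analysis of the dog fluidity data can also be reproduced from the raw measurements. The following is a minimal Python sketch (for illustration only; the notes themselves use SAS). The function name `paired_summary` is an illustrative choice, and the critical value 2.262 for 9 df is an assumed input taken from a t-table, since the standard library has no t quantile function.

```python
import math
import statistics

# Cell fluidity for the ten dogs, copied from the example table.
without_o2 = [0.308, 0.304, 0.305, 0.304, 0.301, 0.278, 0.296, 0.301, 0.302, 0.237]
with_o2 = [0.308, 0.309, 0.305, 0.311, 0.303, 0.293, 0.302, 0.300, 0.308, 0.250]

T_CRIT_9 = 2.262  # t_{0.025, 9}: assumed input looked up in a t-table


def paired_summary(y1, y2, d0=0.0):
    """Return (d_bar, s_d, std_err, t_obs, df) for the differences y1 - y2."""
    diffs = [a - b for a, b in zip(y1, y2)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)      # sample SD, divisor n - 1
    std_err = s_d / math.sqrt(n)
    t_obs = (d_bar - d0) / std_err
    return d_bar, s_d, std_err, t_obs, n - 1


d_bar, s_d, std_err, t_obs, df = paired_summary(without_o2, with_o2)
ci = (d_bar - T_CRIT_9 * std_err, d_bar + T_CRIT_9 * std_err)

print(f"mean difference = {d_bar:.4f}, s_D = {s_d:.5f}")
print(f"t = {t_obs:.3f} on {df} df")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Up to rounding, the results should match the hand calculation: a mean difference of −0.0053, a t statistic of about −3.09 on 9 df, and an interval of roughly (−0.0092, −0.0014).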