Comparing Two Population Means
Chapter 9

Introduction

What information does the test statistic give us? What does a p-value mean in terms of the null hypothesis?

Suppose we are investigating the GPA of USU students. We collect a sample and find the average GPA for the sample. Testing the null hypothesis $H_0: \mu = 2.7$, we get a p-value of p = 0.001. Is it safe to say that we have proven that the average GPA is not 2.7? Explain.

A farmer would like to know which of two brands of fertilizer results in the greater average yield from his tomato plants. How would you design a study to address this question? How would you analyze your results?

Comparing a treatment group to a control group, or one treatment to another, to determine whether the mean response differs is an important research tool in many disciplines. When comparing the means of two groups, a researcher has two sets of observations. This is referred to as a two-sample problem. Two-sample problems can involve either paired (related) samples or independent (unrelated) samples.

Paired Samples

Paired samples arise when measurements are made twice on the same subject, or when measurements are made on two subjects that can be considered dependent.

• To assess the effectiveness of a reading remediation course, students are given a pre-test and a post-test and the average scores are compared.
• The speed at which a group of athletes can run 1 mile is recorded. They are then given a strict training regime for a period of time, and their mile times are recorded again.
• An experiment is conducted to compare a new tire to a standard tire. One of each type of tire is placed on each of 20 trucks, the trucks are driven over a variety of road conditions, and the reduction of tread is measured for each tire.
Paired samples are analyzed by reducing the problem to a one-sample problem. This is done by calculating the difference within each pair of observations. We can then apply the techniques of Chapter 8 to make inferences about the unknown mean $\mu_Z$ of the differences.

In general, for related samples we observe the n pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. The difference in the i-th pair is denoted by $Z_i = X_i - Y_i$, for i = 1, 2, …, n.

What is a point estimate of $\mu_Z$? What is the standard error of this point estimate?

We are usually interested in whether the mean $\mu_Z$ of the differences is equal to zero, so we test the hypothesis $H_0: \mu_Z = 0$ versus $H_A: \mu_Z \neq 0$ using the test statistic

$t = \frac{\sqrt{n}\,(\bar{z} - 0)}{s}$

where $\bar{z}$ and $s$ are the sample mean and sample standard deviation of the differences. This statistic follows a t-distribution with n − 1 degrees of freedom if the sample size is sufficiently large and we assume the null hypothesis is true.

Additionally, a confidence interval for $\mu_Z$ is given by

$\mu_Z = \mu_A - \mu_B \in \left( \bar{z} - \frac{t_{\alpha/2,\,n-1}\, s}{\sqrt{n}},\ \bar{z} + \frac{t_{\alpha/2,\,n-1}\, s}{\sqrt{n}} \right)$

Example: Neurobiology suggests that piano lessons may improve the spatial-temporal reasoning of preschool children. To test this hypothesis, the spatial-temporal reasoning of 34 preschool children was measured before and after 6 months of piano lessons. The changes in their reasoning scores are shown below:

Is there evidence that piano lessons change spatial-temporal reasoning? Construct a 95% confidence interval for the average difference.

Independent Samples

Independent samples arise when measurements are made on two unrelated or independent sets of subjects.

• To assess the effectiveness of a reading remediation course, students are randomly assigned to a remediation group and a non-remediation group. Reading scores are compared between the two groups.
• Athletes are randomly assigned to two groups.
The first is given a rigorous new training regime and the second uses a standard regime; the average mile times of the two groups are compared.
• New tires are put on 10 trucks and standard tires are put on 10 others. The trucks then drive over a variety of terrains. The reduction in tread is measured and compared for the two types of tires.

Example: We'd like to assess the effect of piano lessons on spatial-temporal reasoning by comparing the piano lesson group to a control group that did not receive piano lessons. The changes in scores for the treatment and control group are:

Consider a sample of n observations $x_i$ from population A, with sample mean $\bar{x}$ and sample standard deviation $s_x$, and a sample of m observations $y_i$ from population B, with sample mean $\bar{y}$ and sample standard deviation $s_y$. The point estimate of the difference in means is $\bar{x} - \bar{y}$, thus

$\mathrm{s.e.}(\bar{x} - \bar{y}) = \sqrt{\frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{m}}$

There are two methods for analyzing independent samples. The difference between them is how the standard error is estimated. The book describes a third method for when the population variances are known, but we will not discuss it, since population variances are usually unknown.

The two procedures for estimating the variance are a "general procedure" that can be used in any case and a "pooled variance procedure" that is useful when the variances of the two populations are approximately equal.

Once we have an estimate of the standard error of $\bar{x} - \bar{y}$, we can use this estimate to create confidence intervals and conduct hypothesis tests for the difference.
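The standard error of $\bar{x} - \bar{y}$ stated in this section follows from the independence of the two samples. A short derivation, assuming the observations within each sample are independent draws with population variances $\sigma_A^2$ and $\sigma_B^2$ (the intermediate steps are not spelled out in the slides):

```latex
\begin{align*}
\operatorname{Var}(\bar{x} - \bar{y})
  &= \operatorname{Var}(\bar{x}) + \operatorname{Var}(\bar{y})
     && \text{(the two samples are independent)} \\
  &= \frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{m}
     && \text{(variance of a sample mean)} \\
\mathrm{s.e.}(\bar{x} - \bar{y})
  &= \sqrt{\operatorname{Var}(\bar{x} - \bar{y})}
   = \sqrt{\frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{m}}
\end{align*}
```

The first step is the one that fails for paired samples, which is why paired data are reduced to differences instead.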
The general procedure estimates the standard error

$\mathrm{s.e.}(\bar{x} - \bar{y}) = \sqrt{\frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{m}}$

by

$\mathrm{s.e.}(\bar{x} - \bar{y}) = \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$

Thus, a (1 − α) level confidence interval for $\mu_A - \mu_B$ is given by

$\mu_A - \mu_B \in \left( \bar{x} - \bar{y} - t_{\alpha/2,\,\nu} \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}},\ \bar{x} - \bar{y} + t_{\alpha/2,\,\nu} \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}} \right)$

The degrees of freedom ν can be calculated via a formula given on page 395 in your text. Statistical software will calculate this automatically. When making calculations by hand, we will use the convention that the degrees of freedom ν are equal to the minimum of (n − 1) and (m − 1).

To implement a hypothesis test for $H_0: \mu_A - \mu_B = \delta$, the test statistic is

$T = \frac{\bar{x} - \bar{y} - \delta}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$

Under the null hypothesis and given a sufficiently large sample size, T approximately follows a $t_\nu$ distribution, where ν = min(n − 1, m − 1).

We'd like to assess the effect of piano lessons on spatial-temporal reasoning by comparing the piano lesson group to a control group that did not receive piano lessons. The sample mean and standard deviation of score changes for the treatment group are 3.62 and 3.06 respectively (n = 34), while the mean and standard deviation for the control group are 0.39 and 2.42 (m = 44). Construct a 95% confidence interval for the difference in score changes between the treatment and control groups. Conduct a hypothesis test to determine whether there is a difference between the change in scores for the two groups.

The general procedure can always be used if the sample size is sufficiently large or the data are approximately normally distributed.
However, when it is safe to assume that the variances of the two populations are equal, a better analysis can be obtained by using the estimate

$\mathrm{s.e.}(\bar{x} - \bar{y}) = s_p \sqrt{\frac{1}{n} + \frac{1}{m}}$

where

$s_p = \sqrt{\frac{(n-1)\, s_x^2 + (m-1)\, s_y^2}{n + m - 2}}$

A confidence interval for $\mu_A - \mu_B$ is given by

$\mu_A - \mu_B \in \left( \bar{x} - \bar{y} - t_{\alpha/2,\,n+m-2}\, s_p \sqrt{\frac{1}{n} + \frac{1}{m}},\ \bar{x} - \bar{y} + t_{\alpha/2,\,n+m-2}\, s_p \sqrt{\frac{1}{n} + \frac{1}{m}} \right)$

and a hypothesis test of $H_0: \mu_A - \mu_B = \delta$ is implemented using the test statistic

$T = \frac{\bar{x} - \bar{y} - \delta}{s_p \sqrt{\frac{1}{n} + \frac{1}{m}}}$

where T has a t-distribution with n + m − 2 degrees of freedom.

How do we decide if the variances are approximately equal? A good rule of thumb is to assume equal variances when the larger of $s_x^2$ and $s_y^2$ is no more than 1.5 times the smaller of the two.

Returning to the "piano lessons affect spatial-temporal reasoning" example, is it appropriate to assume that the variances are equal?

To study the question "does cocaine use by pregnant women cause their babies to have low birth weight," birth weights (in lbs) of babies whose mothers used cocaine during pregnancy were compared to birth weights of babies whose mothers did not. For the group whose mothers used cocaine, the average and standard deviation of birth weight are 6.025 lbs and 1.32 lbs respectively. For the other group, the average birth weight was 6.87 lbs with a standard deviation of 1.48 lbs. Is it appropriate to assume equal variances for the two samples? Construct a 95% confidence interval for the difference in birth weight between the "cocaine" and "no cocaine" groups. Conduct a hypothesis test to determine whether there is a difference in birth weight for the two groups.

A study looked at the relationship between physical fitness and ego.
Middle-aged college faculty were divided into low and high fitness groups based on a physical exam and were given a personality test to assess ego strength. The average and standard deviation of ego strength scores for the low fitness group were 4.64 and 0.69 respectively (n = 14), while the mean and standard deviation of the high fitness group were 6.42 and 0.44 respectively (m = 14). Construct a 99% confidence interval for the difference in ego scores of the two groups. Use the pooled variance procedure if this is appropriate. Is there evidence that mean ego scores are different in the two groups?

Describe a two-sample experiment that would be relevant to your field. How would you collect your samples? Would you use paired or independent samples?

Find a general form for a two-sided confidence interval for a statistic $\hat{\theta}$.

Statistical Software

In R, two-sample tests are implemented with the command:

> t.test(x, y, paired=FALSE, var.equal=TRUE)

"paired=FALSE" and "var.equal=FALSE" are the defaults.

Notes: Use the command x <- read.table("file.txt", header=TRUE) to read in files consisting of multiple columns. Use the command x[i,j] to access row i and column j of x. 'x[i,]' will access row i and 'x[,j]' will access column j. So if the two samples are in columns 1 and 2 of x, the command to implement a two-sample test for independent samples would look like this (assuming unequal variances):

> t.test(x[,1], x[,2])

Excel
"ttest(array1,array2,tails,type)"
– array1 and array2 contain the x and y samples
– 'tails' allows you to choose a two-sided or one-sided test
– 'type' options are '1-paired, 2-equal var, 3-unequal var'

SAS
From 'hypothesis tests' under the statistics menu:
– 'two sample paired t-test for means': enter variables into groups 1 and 2, choose hypotheses, ok.
– 'two sample t-test for means': select 'two variables', enter variables into groups 1 and 2, choose hypotheses, ok. This will give you results for equal and unequal variances.

Examples

State whether you would treat the samples as paired or independent. If independent, would you use the general or equal variance procedure to analyze the data?

1. Does calcium reduce blood pressure? A randomized comparative experiment gave one group of 10 men a calcium supplement for 12 weeks. The control group of 11 men received a placebo. The average decrease in blood pressure for the treatment group was 5 mm with sample standard deviation 8.743 mm. The average and standard deviation for the control (placebo) group were -0.273 mm and 5.901 mm respectively. Is there evidence of a difference between the two groups?

2. A reading activity is introduced to help third graders improve their reading. A class of 21 students takes part in the activities for 8 weeks. A control class of 23 students has the same curriculum minus the activities. At the end of the 8 weeks, all students are given a reading test. The average and standard deviation of the treatment group's scores are 51.41 and 11.01 respectively. The average and standard deviation of the control group's scores are 41.52 and 17.15 respectively. Is there evidence that the activities helped?

3. A group of 133 male students and a group of 162 female students were given a test to determine how accurately they appraise one another. The possible test scores were from 0 to 41.
The average scores and standard deviations for the two groups were 25.24 and 5.05 for the males and 24.94 and 5.44 for the females. Do these data support the contention that male and female students differ in average social insight?

4. A test was given to the husband and wife in 133 married couples to determine how well men appraise women and vice versa. The possible test scores were from 0 to 41. The average scores and standard deviations for the two groups were 25.24 and 5.05 for the males and 24.94 and 5.44 for the females. Do these data support the contention that males and females differ in average social insight?

5. A pharmaceutical company interested in comparing a new drug for arthritis, drug A, to the standard treatment, drug B, administers each of the drugs to a group of 30 arthritis sufferers and records the time it takes for the pain to go away. The order in which each subject receives the treatments is random. The drugs are administered on different days so that there are no residual effects from a previous treatment. The average time for drug A was 15.2 minutes with a standard deviation of 3 minutes. The average time for drug B was 17.3 minutes with a standard deviation of 5 minutes. Is there evidence that there is a difference in relief times?
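The general and pooled variance procedures from the Independent Samples slides can be computed directly from summary statistics. The following is a minimal Python sketch, not code from the course: the function names are hypothetical, it uses the by-hand convention ν = min(n − 1, m − 1) for the general procedure, and the critical value 2.0345 for 33 degrees of freedom is an assumed t-table lookup rather than a value given in the slides. The inputs are the piano-lesson summary statistics quoted earlier.

```python
import math


def general_procedure(xbar, sx, n, ybar, sy, m, delta=0.0):
    """General (unequal-variance) procedure from summary statistics."""
    se = math.sqrt(sx**2 / n + sy**2 / m)   # estimated s.e. of xbar - ybar
    t_stat = (xbar - ybar - delta) / se     # test statistic T
    df = min(n - 1, m - 1)                  # by-hand convention for nu
    return se, t_stat, df


def pooled_procedure(xbar, sx, n, ybar, sy, m, delta=0.0):
    """Pooled variance procedure (assumes equal population variances)."""
    sp = math.sqrt(((n - 1) * sx**2 + (m - 1) * sy**2) / (n + m - 2))
    se = sp * math.sqrt(1 / n + 1 / m)
    t_stat = (xbar - ybar - delta) / se
    df = n + m - 2
    return se, t_stat, df


def variances_look_equal(sx, sy, max_ratio=1.5):
    """Rule of thumb: larger sample variance at most 1.5 times the smaller."""
    larger, smaller = max(sx**2, sy**2), min(sx**2, sy**2)
    return larger <= max_ratio * smaller


def confidence_interval(xbar, ybar, se, t_crit):
    """Interval (xbar - ybar) -/+ t_crit * se; t_crit comes from a t-table."""
    diff = xbar - ybar
    return diff - t_crit * se, diff + t_crit * se


# Piano-lesson example: treatment (n=34) versus control (m=44).
se, t_stat, df = general_procedure(3.62, 3.06, 34, 0.39, 2.42, 44)
equal_ok = variances_look_equal(3.06, 2.42)

# Assumed table value for t_{0.025, 33}; check it against your own t-table.
T_CRIT_33 = 2.0345
low, high = confidence_interval(3.62, 0.39, se, T_CRIT_33)

print(f"s.e. = {se:.3f}, T = {t_stat:.2f}, df = {df}")
print(f"equal variances plausible by rule of thumb: {equal_ok}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Passing the critical value in as an argument keeps the sketch free of external libraries; statistical software such as R's t.test computes both the degrees of freedom and the critical value automatically.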