Chapter 10 Two-Sample Problems: Inference about a Population Mean

When we have two samples (treatments) and we want to test whether there is a statistically significant difference between the two groups, this is the procedure we use. Many of the techniques and statistics from previous chapters will be used again, so how we go about studying and analyzing this type of procedure should feel very familiar.

Assumptions When Comparing Two Sample Means
a. When looking at two samples we have to assume independence. If they were not independent, it would not make sense to treat them as separate groups (i.e., if the samples are dependent you are saying they ARE related, so this test of differences is not appropriate; paired data are handled later in this chapter).
b. The populations must be normally distributed, with either known or unknown variances. This implies we will be using a z-stat or a t-stat.
**Although the book goes through the procedure with unknown mean and variance, we should note that if they were known, then instead of a t-stat we would simply use our Z. The method would be exactly the same otherwise. I am going to outline both.

A. Hypothesis Testing: Differences Between Means
Recall from before that we had essentially 3 types of hypotheses. Since we are testing to see whether or not there is a difference between the populations, we get the following types of hypotheses:
a. Ho: μ1 ≥ μ2 (or μ1 − μ2 ≥ 0)   Ha: μ1 < μ2 (or μ1 − μ2 < 0)
**To get the other one-tailed test, simply switch μ1 and μ2 around.
b. Ho: μ1 = μ2 (or μ1 − μ2 = 0)   Ha: μ1 ≠ μ2 (or μ1 − μ2 ≠ 0)
So what we are testing is whether one mean is bigger than the other (i.e., the one-tailed tests) or simply whether there is a significant difference between the two in either direction (the two-tailed test).
Question: So when might this be used?
Answer: What if we wanted to see if there was a significant difference between the test scores of on-line Business Statistics students vs. in-class students?
If we didn't suspect one type of result or the other, we would have hypotheses of:
1) Ho: μ(test scores, in-class) = μ(test scores, on-line), or μ1 − μ2 = 0 **this implies that there is no difference
2) Ha: μ(test scores, in-class) ≠ μ(test scores, on-line), or μ1 − μ2 ≠ 0
The method of analysis is exactly the same as before, except now in how we find our Z-stat or t-stat. So the first thing we need to do (as before) is find out whether we have known σ's, or unknown values so that we have to use the sample values (i.e., the s's).

Performing Hypothesis Tests on the Difference Between Means: 3 Conditions

1. σ known
In this case we simply use our z-stat as before:
Z = [(x̄1 − x̄2) − Do] / SE, where SE = √(σ1²/n1 + σ2²/n2)
and Do is the difference that is hypothesized. Note that many times we simply have Do = 0. Generically, the statistic can be thought of as (observed difference − hypothesized difference) divided by the standard error.
Example: We are given that the average height of the 50 males in one class is 67 inches with a variance of 9, and the average height of the 36 males in a second class is 66 inches with a variance of 6. Perform a hypothesis test to see if the mean height of males is different between the two classes. Use α = 0.05. Since n > 30 for each of the samples, we may use the Z-table to get our test statistic.
i) Hypotheses: Ho: μclass1 = μclass2   Ha: μclass1 ≠ μclass2
ii) Test Statistic: SE = √(9/50 + 6/36) ≈ 0.589, so Z = (67 − 66)/0.589 ≈ 1.70
iii) Analysis of Critical Region: Since our alpha is 0.05 and the test is two-tailed, our critical values are ±1.96. We will reject Ho only if our test statistic lies farther out than either of these two values.
iv) Conclusion: Since our test statistic 1.70 < 1.96, we fail to reject Ho and conclude there is not enough evidence to suggest that the mean height of males is different between the two classes.

2. σ unknown
In this case we use our t-stat as before, but there are two cases we must examine here. We must decide if the sample variances are roughly equal or not.
Case 1: We have σ unknown and we think that σ1² = σ2² (i.e., the
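As a quick arithmetic check of the known-σ case above, here is a minimal Python sketch (the function name is my own, not from the text) applied to the height example:

```python
import math

def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, d0=0.0):
    """Z-statistic for the difference of two means with known variances."""
    se = math.sqrt(var1 / n1 + var2 / n2)      # standard error of (xbar1 - xbar2)
    return ((xbar1 - xbar2) - d0) / se

# Height example: class 1 (n=50, mean 67, var 9) vs. class 2 (n=36, mean 66, var 6)
z = two_sample_z(67, 66, 9, 6, 50, 36)
print(round(z, 2))  # comes out to about 1.70, inside +/-1.96, so fail to reject Ho
```

The same helper handles any hypothesized difference via `d0`, which defaults to 0 as in the notes.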
equal-variance assumption). In this case we use the following statistic, called the pooled estimate of the variance:
sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
We then note that SE = √( sp² (1/n1 + 1/n2) ), our test stat is then simply t = [(x̄1 − x̄2) − Do] / SE, and in this case df = n1 + n2 − 2.

Example: Suppose we are given the following values and told that our variances can be assumed equal. If we want to test whether or not there is a difference in the samples at the 95% level of confidence, here is how we do it.

Variable     Sample 1    Sample 2
x̄            15          16
s²           4           4.6
n            24          16

Step 1: Set up hypotheses. We are testing simply to see if there is a difference, so
i) Hypotheses: Ho: μsample1 = μsample2   Ha: μsample1 ≠ μsample2
ii) Find the test statistic, which is t = [(x̄1 − x̄2) − 0] / SE. One of the first things we want to do is find the SE, since our calculation becomes much simpler then. So recall that
sp² = [(23)(4) + (15)(4.6)] / 38 = 161/38 ≈ 4.24
What this gives us is an approximate value for the combined variance. We then find
SE = √( 4.24 (1/24 + 1/16) ) ≈ 0.664; this is the standard error.
So now we find that our t-stat = (15 − 16)/0.664 ≈ −1.50.
iii) Compare to Critical Value: Since we want to be 95% confident, we want the t-values that put 2.5% in each tail with df = 24 + 16 − 2 = 38. Note: there is no table row for t with df = 38, so always go to the next lower value, in this case 30. These values are ±2.042, i.e., −tα/2 = −2.042 and tα/2 = 2.042. (Two-sided test: you need to note the level of confidence and α to find the appropriate t-values.)
iv) Conclusion: Since our test stat is not in the critical/rejection region [i.e., −1.50 > −2.042, so it does not lie in the tail], which is always the tail, we can conclude that there is no statistical difference between the sample means with 95% confidence.

Case 2: We have σ unknown and we think that σ1² ≠ σ2² (unequal variances assumed). We then use the following information:
t-stat = [(x̄1 − x̄2) − Do] / SE, where SE = √( s1²/n1 + s2²/n2 )
This method is a little simpler in calculating the test statistic; where it becomes a bit more difficult is in finding the appropriate degrees of freedom.
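The pooled-variance calculation above can be sketched in a few lines of Python (helper name and layout are mine; only the formulas from the notes are used):

```python
import math

def pooled_t(xbar1, xbar2, var1, var2, n1, n2, d0=0.0):
    """Equal-variance (pooled) two-sample t-statistic with df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                    # standard error
    t = ((xbar1 - xbar2) - d0) / se
    return t, n1 + n2 - 2

# Example data: means 15 and 16, sample variances 4 and 4.6, n = 24 and 16
t, df = pooled_t(15, 16, 4, 4.6, 24, 16)
print(round(t, 3), df)  # t is about -1.5 with df = 38
```

Since −1.5 does not fall below the −2.042 cutoff, the code reproduces the fail-to-reject conclusion worked out above.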
The exact same method is used, but the degrees-of-freedom calculation for this instance is:
df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)² / (n1 − 1) + (s2²/n2)² / (n2 − 1) ]
and yes, this is a bear. So if you believe the variances are close enough to being equal, you would generally want to assume so if you had to do this by hand. This becomes much simpler when we have a software package simply give us the df. In any case, our rule will be to assume the variances are equal unless you test them and find they are statistically different, or you notice that the larger sample variance is 3X the value of the smaller one.
For example, suppose we have the following data:

Variable     Sample 1    Sample 2
x̄            15          16
s²           9           2
n            24          16

In this case we see that sample 1's variance is 9, and it is the larger one. So we must check whether it is 3X the size of sample 2's variance. Since 9/2 = 4.5, it is in fact 4.5 times the size, and we would use the unequal-variances method.

B. Hypothesis Testing: Paired Differences
Many times experiments are run where the variation between the measurements needs to be controlled for to truly test whether there is a difference between measurements. In this case we want to use the paired-differences procedure. It can be thought of as a before-and-after test. Here are some common situations and why:
1) When performing a drug test you want to see how it affects a person or group of people, but internal chemistry, age, weight, and other characteristics need to be controlled for.
2) When one wants to test a difference in procedure or method. For example, if we wanted to test study methods, we would want to have the same people use the different methods and get the difference in response on a test of some sort.
3) A final example could be fertilizer or some other agricultural process. Since soil composition, sunlight, water retention, etc. differ among plots of land, you generally want to run a test on the same plot of land to ensure there is truly a difference that occurs due to what you are testing for.
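The Satterthwaite degrees-of-freedom formula above is tedious by hand, so here is a small sketch (my own helper, same notation as the notes) that applies both the 3X rule check and the df formula to this data:

```python
def welch_df(var1, n1, var2, n2):
    """Approximate df for the unequal-variance t (the 'bear' formula above)."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Example data: s1^2 = 9 (n = 24), s2^2 = 2 (n = 16)
ratio = 9 / 2                # 4.5 >= 3, so the rule says use unequal variances
df = welch_df(9, 24, 2, 16)
print(ratio, round(df, 1))   # df comes out just under 35; round down for the table
```

Software reports this fractional df directly; with a printed t-table you would round it down to the nearest row.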
1. Calculation of the Mean Difference and Variance/Standard Deviation
Here we turn our two means into one mean by working with the differences:
i) d̄ = Σdᵢ / n, where dᵢ = measure 2 − measure 1 (or you can take measure 1 and subtract measure 2).
ii) To get the variance and standard deviation we do the following:
s_d² = Σ(dᵢ − d̄)² / (n − 1), with s_d = √(s_d²) as before.
If you notice, this is just the regular standard deviation and variance formula applied to the differences.

For example, suppose we have the following data on practice SAT test scores at Kaplan before and after the class for six people (n = 6):

Person   Before   After   Difference (measure 2 − measure 1)
1        1050     1210    160
2        970      1310    340
3        1160     1480    320
4        890      1080    190
5        760      1110    350
6        1260     1580    320

d̄ = [160 + 340 + 320 + 190 + 350 + 320]/6 = 280
s_d² = [(−120)² + (60)² + (40)² + (−90)² + (70)² + (40)²]/5 = 6840, so s_d = √6840 ≈ 82.7

2. Performing a Hypothesis Test
So now we can perform a hypothesis test as before, simply using a t-stat. In this instance our test stat becomes
t = (d̄ − Do) / (s_d / √n), with df = n − 1, where n is the number of pairs.
Example: Using our data from before, suppose we want to test whether or not the Kaplan class actually improves scores with 90% confidence.
i) Hypotheses: Ho: μd ≤ 0 (i.e., that it doesn't improve anything)   Ha: μd > 0 (i.e., that Kaplan improves scores, since the difference would be positive)
ii) Test Stat: t = 280 / (82.7/√6) ≈ 8.29
iii) Compare to the critical value: This is a one-sided test, since we are only testing to see if the score is increased. You need to note the level of confidence and α to find the appropriate t-value. The t-value that puts 10% in the upper tail with df = 5 is 1.476.
iv) Conclusion: Since our test stat is clearly bigger than 1.476 (8.29 >> 1.476, so it lies in the critical or rejection region), we can conclude with 90% confidence that the Kaplan study program does indeed help to raise scores.

C.
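The paired-differences steps above can be sketched as follows (a minimal Python check of the Kaplan example; the function name is mine, not from the notes):

```python
import math

def paired_t(before, after):
    """Paired-difference t-statistic on (after - before), df = n - 1."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    dbar = sum(diffs) / n                                    # mean difference
    var_d = sum((d - dbar) ** 2 for d in diffs) / (n - 1)    # sample variance of differences
    t = dbar / math.sqrt(var_d / n)
    return t, n - 1

before = [1050, 970, 1160, 890, 760, 1260]
after  = [1210, 1310, 1480, 1080, 1110, 1580]
t, df = paired_t(before, after)
print(round(t, 2), df)  # 8.29 5, well past the 1.476 cutoff
```

Note that the whole test runs on a single column of differences, which is why only one mean and one variance appear.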
Hypothesis Testing: Comparing Two Population Proportions Using Large Samples
All of the hypothesis tests we have done before have involved means and differences of means, but hypothesis testing can be applied to any type of distribution. In this next section we will look at how it can be used to compare differences in population proportions. As before, we want to see if there is any difference between two values; in this case it must be the case that we have random, independent, large samples.
1. How to calculate the standard deviation:
σ(p̂1 − p̂2) = √( p1(1 − p1)/n1 + p2(1 − p2)/n2 )
a. Note that if we don't have p1 and p2 and must use sample values, we can estimate the standard deviation by simply using the previous formula with our sample proportions p̂1 and p̂2 substituted in.
b. No Difference Assumed Between the Proportions
One thing to note is that in hypothesis testing we are testing to see if there are differences in the proportions. So if we assume that there is no difference (i.e., Do = 0, as in most cases), then we can amend the calculation of the standard deviation as follows:
SE = √( p̄(1 − p̄)(1/n1 + 1/n2) ), where p̄ = (x1 + x2)/(n1 + n2)
So in this case our proportion is constructed using both samples.
Example: Suppose we are looking at the number of students at OCCC who wear shorts, and we take two samples, one during the day and one at night, both of size 50. If in the first sample (during the day) 23 wore shorts and in the evening 28 did, then
p̂1 = 23/50 = 0.46, p̂2 = 28/50 = 0.56, p̄ = (23 + 28)/100 = 0.51
Note: SE = √( 0.51(0.49)(1/50 + 1/50) ) ≈ 0.10

2. Performing the Hypothesis Test
This is done as before, with the same four steps. So suppose we want to see if there is a difference between the two samples, and we want to test this at the 95% confidence level.
i) Set up hypotheses: Ho: p1 − p2 = 0   Ha: p1 − p2 ≠ 0
*It is two-sided because we are only seeing if there is a difference, not whether one is specifically bigger than the other.
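The pooled-proportion arithmetic above can be verified with a short Python sketch (helper name is mine; it assumes the no-difference case Do = 0 described in part b):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-statistic, assuming Ho: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    pbar = (x1 + x2) / (n1 + n2)                             # pooled proportion
    se = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))    # pooled standard error
    return (p1 - p2) / se

# Shorts example: 23 of 50 during the day, 28 of 50 at night
z = two_proportion_z(23, 50, 28, 50)
print(round(z, 2))  # about -1.0
```

This is the value carried into the test-statistic step below.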
ii) Test Stat: We use a Z since we have large samples, and in this case
Z = (p̂1 − p̂2)/SE = (0.46 − 0.56)/0.10 = −1.0
**If you had not assumed no difference, you would use the unpooled formula from part 1.a as the standard deviation.
iii) Find the critical values: these are just our typical z-values for the 95% confidence level. Two-sided test: −zα/2 = −1.96 and zα/2 = 1.96.
iv) Conclusion: When we compare our critical values and our test stat, we find −1.0 is not in either tail. So we fail to reject Ho and conclude there is no difference in the proportion of students who wear shorts at night or during the day.

D. Hypothesis Testing: Comparing Two Population Variances Assuming Independence
Recall from parts A and B that sometimes we assume that the variances of two samples are equal and sometimes we assume they are unequal. Aside from simply using our rule (equality of variances unless the larger sample variance is 3X the size of the smaller), we can actually test to see if they are different. To do this we must introduce a new distribution called the F distribution. The F distribution has the following characteristics:
i) It is skewed to the right (i.e., the tail extends to the right).
ii) It only has positive test stats, so it starts from 0.
iii) It has two degrees of freedom: one for the numerator (df1) and one for the denominator (df2).
[Graph: a right-skewed F curve with α, the probability in the tail, lying beyond Fα.]
Once again α is the probability in the tail. We can have one- or two-sided tests; in this case we have just one tail. If there were two, you would divide α into two parts as before.
For example, the F value with α = 0.05 and df1 = 7 and df2 = 7 is F ≈ 3.79. To get these values go to Tables A.5–A.8 on pages 644 through 647. Make sure to use the proper table with the appropriate α.

1. Performing a Hypothesis Test for Equality of Variances: General Setup
a. Set up the hypotheses. We can test:
i) Two-tailed: Ho: σ1² = σ2²   Ha: σ1² ≠ σ2²
ii) One-tailed: Ho: σ1² ≤ σ2²   Ha: σ1² > σ2²; or Ho: σ1² ≥ σ2²   Ha: σ1² < σ2²
**Nearly all of the time we will run a two-tailed test.
b. Set up the test stat, which is F = s1²/s2² or F = s2²/s1²; it depends on which variance is larger.
Put the larger one on top.
c & d. Find the critical value (based on the hypothesis test run and α) and then state your conclusion. Note that for the degrees of freedom, df1 = n1 − 1 and df2 = n2 − 1. When performing the test, if the df are not in the table, always round down; for example, if we have df1 = 23 − 1 = 22, then use the value for 20. Also, for a two-tailed test we must use α/2 when looking up the critical value.

2. Example: Suppose we want to test to see if the following variances are significantly different from one another, testing at the 95% level.

Variable     Sample 1    Sample 2
x̄            15          16
s²           9           2
n            24          16

a. Hypotheses: Ho: σ1² = σ2²   Ha: σ1² ≠ σ2². It is two-tailed since we are testing to see if they are different.
b. Test Stat: Since sample 1 has the greater variance, F = 9/2 = 4.5.
c. Find the critical value: Once again α is the probability in the tail. We have a two-tailed test, so α/2 = 0.025, with df1 = 24 − 1 = 23 and df2 = 16 − 1 = 15. We use Table A.7 since we want F0.025; rounding df1 down to 20, F0.025 ≈ 2.76.
d. Conclusion: Since our test stat of 4.5 is greater than the critical value and lies in our tail (also known as our critical or rejection region), we reject Ho and conclude with 95% confidence that the variances are NOT equal.
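The larger-variance-on-top convention is easy to get backwards by hand, so here is a tiny sketch of the F statistic (helper name is mine; the critical value still has to come from the printed tables):

```python
def f_stat(var1, var2):
    """F statistic for equality of variances: larger sample variance on top."""
    big, small = max(var1, var2), min(var1, var2)
    return big / small

# Example data: s1^2 = 9, s2^2 = 2
F = f_stat(9, 2)
print(F)  # 4.5; compare against the tabled F_0.025 critical value
```

Because the function always divides the larger variance by the smaller, the argument order does not matter.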