Chapter 9: Hypothesis Testing and Statistical Inference

Now we extend our statements about a population parameter: we use our knowledge of sampling and sampling distributions to decide whether to reject, or fail to reject, the claim that a population has certain characteristics.

Ex: We may think that the average income level in a certain community is around $45,000, but if we take an appropriate random sample and find that the income is much lower, we now have a basis to call this claim false (with a certain level of confidence – think back to confidence intervals).

A. Developing and Defining the Null and Alternative Hypothesis

1. Null Hypothesis (Ho) – the tentative assumption about a population parameter. This is what you are testing. It can only be rejected or failed to be rejected with a certain level of confidence; it cannot be accepted.

2. Alternative Hypothesis (Ha) – the opposite conclusion to the null hypothesis.

3. Types of Hypotheses and How to Test Them

a. Testing a Research Hypothesis – here you try to prove something based on experimental evidence. What you are trying to prove is generally your Ha. Generally the null hypothesis is the status quo, and you want to show that under your experimental settings the status quo does not hold. You are trying to reject the null to support your research.
Ex: Suppose you look at performance and Gatorade, and you want to show that using Gatorade helps performance.
Ho: The person performs the same using Gatorade.
Ha: The person performs better than the status quo while using the sports drink.

b. Testing the Validity of a Claim – here you want to test a claim being made. We assume the claim is true unless there is sample evidence that supports the alternative. Note: this can be either a one-sided or a two-sided test.
Ho: can take the form μ ≥ μo, μ ≤ μo, or μ = μo
Ha: the opposite of the claim above.

c. Decision Situation – here you are trying to determine a course of action.
You would generally make some decision based on whether you judge Ho or Ha to be supported with a given degree of confidence.

4. Testing Conditions in General – forms a and b are one-sided, while c is two-sided:
a. Ho: μ ≥ μo   Ha: μ < μo
b. Ho: μ ≤ μo   Ha: μ > μo
c. Ho: μ = μo   Ha: μ ≠ μo
Note: μo is just some proposed value that you are testing – the value you think the population parameter takes on.
- A two-sided test asks whether the value is different from some proposed value, while a one-sided test asks whether the value is larger (or smaller) than some proposed value.

5. Type I and Type II Errors

Table 1:
                       Accept Ho             Reject Ho
Ho is actually true    Correct conclusion    Type I error
Ha is true             Type II error         Correct conclusion

- When we designate α we are actually designating the amount of Type I error that will be accepted. So when α = 0.05, we accept that 5% of the time we will reject Ho when it is actually true.
- Many experiments do not actually try to control for Type II error, and for this reason we do not "accept" Ho, but rather fail to reject it.

B. Testing Hypotheses Using the Critical Value Method and Test Statistic – Mean with σ Known

1. Formulating the Hypothesis Test
a. Test Statistic – the value you compute from the sample, used to decide whether to reject or fail to reject the null hypothesis. One example is the z-stat for testing a sample mean, where the test statistic takes the value
z = (x̄ − μ) / (σ/√n)
b. Critical Value – the value you test against: the value of z that you compare your test statistic to.
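To see what designating α = 0.05 means in practice, here is a small simulation sketch (not from the notes; the population values are illustrative): draw many samples from a population where Ho really is true and count how often a two-sided z test rejects anyway. The rejection rate should come out close to the 5% Type I error rate we agreed to accept.

```python
import math
import random

random.seed(1)

MU0, SIGMA, N = 45_000, 5_000, 36   # hypothetical population; Ho is actually true
Z_CRIT = 1.96                       # two-sided critical value for alpha = 0.05

trials = 10_000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    z = (xbar - MU0) / (SIGMA / math.sqrt(N))   # z test statistic
    if abs(z) > Z_CRIT:                         # falls in a rejection region
        rejections += 1

print(rejections / trials)   # close to 0.05: the accepted Type I error rate
```

This is exactly the sense in which "5% of the time we expect to reject Ho when it is actually true."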
Graph 1: One-Sided Test
[Graph: rejection region for a one-sided test. Note the level of confidence and α to find the appropriate critical value zα; if the test statistic lies to the right of zα, reject Ho.]

Graph 2: Two-Sided Test
[Graph: rejection regions for a two-sided test. Note the level of confidence and α to find the appropriate critical values −zα/2 and zα/2; if the test statistic lies to the right or left of these, reject Ho.]

c. Example: Suppose you are told the mean score on a test in a particular dept is 70, with σ = 2. If you take a sample of 20 students and find that the average is 75, do you have enough evidence at the 95% confidence level to conclude the mean is not 70?

Step 1: Set up the hypotheses.
Ho: μ = 70   Ha: μ ≠ 70   – so this is a two-sided test.

Step 2: Find the critical values. We go to the normal table and find the z's for α = 0.05, i.e. where 2.5% of the probability lies in each tail. These values are 1.96 and −1.96, so if our test stat falls below or above these we reject Ho in favor of Ha at the 95% level of confidence.

Step 3: Find the test stat: z = (x̄ − μ) / (σ/√n) = (75 − 70) / (2/√20) = 11.18

Step 4: Compare the test stat to the critical value and analyze. Since 11.18 >> 1.96, we conclude at the 95% level of confidence that the mean score in the dept is most likely not 70; it is highly unlikely to find such a large difference by chance. Most likely the population mean is higher, but recall that 5% of the time we expect outliers, so it might be that the sample we obtained was in fact an anomaly and the average really is 70.

2. Testing Hypotheses Using the p-Value Method
In this case we use the p-value instead of the test statistic method. It is almost identical to the method above, but instead of comparing a test statistic to some critical value, we compare the tail probability of the test statistic to a threshold.
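The four steps of the worked example above can be reproduced in a few lines of Python (a sketch; variable names are ours, numbers come straight from the example):

```python
import math

mu0, sigma, n, xbar = 70, 2, 20, 75   # claimed mean, sigma, sample size, sample mean
z_crit = 1.96                          # two-sided critical value at alpha = 0.05

# Step 3: compute the z test statistic
z = (xbar - mu0) / (sigma / math.sqrt(n))
print(round(z, 2))        # 11.18

# Step 4: compare to the critical value
print(abs(z) > z_crit)    # True -> reject Ho at the 95% confidence level
```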
p-value – the probability, assuming the null hypothesis is true, that the test statistic would take a value at least as extreme as the one actually observed. The threshold we compare it to is α for a one-sided test and α/2 per tail for a two-sided test.

Ex: Using the same example as above, the test is two-sided, so the comparison threshold is α/2 = 0.05/2 = 0.025. We then find our test statistic's tail probability and compare it to this threshold. Our test statistic is still z = (x̄ − μ) / (σ/√n) = (75 − 70) / (2/√20) = 11.18, so the tail probability of this z (taken from the standard normal table) is 1 − 0.9998 = 0.0002.
Note: since the normal table only goes up to 3.49, we use that value, noting that for 11.18 the probability would be even smaller. Since 0.0002 < 0.025, we reject Ho again and conclude that the test average is most likely different from 70.
The p-value method and the test statistic method must give the same result; if they don't, a mistake has been made.

- Notice that a two-sided test is almost exactly like constructing a confidence interval, finding a sample statistic, and checking whether the sample statistic lies in that interval. If we construct a 95% confidence interval, then 95% of the time our sample data should lie within that interval; if it doesn't, we can most likely conclude that the proposed population parameter value is in fact incorrect.

3. Statistical Significance – if we get a p-value smaller than α, we say the data are significant at the level α.
Ex: In our case above, the results are statistically significant at the 0.05 level. In fact, since the p-value obtained is much smaller than 0.05, the results are even more significant.

4. Problems with Inference and Confidence Intervals
a. The data may not be reliable – recall that we earlier discussed different sampling techniques, as well as how NOT to sample.
If you produce or sample data that is unreliable, any tests or conclusions drawn from those tests are also invalid.
b. The proposed conditions must be met. Recall that to use a z-table the data should come from an SRS and should follow a normal distribution. If we don't have these conditions we cannot use this type of test; many times we must use different distributions to perform hypothesis testing. The procedure we have used is still valid, but we would use another table instead of the z-table.
c. One test, especially if the results are used to make an expensive decision, is generally not enough to ensure that the decision is correct. The method should be repeated to ensure reliability, and the level of significance should be chosen so the results are conclusive. Instead of testing at the 90% level (0.10 level of significance), testing at the 99% level might be more appropriate.
d. The sample size should be duly noted. The smaller the sample, the larger the variation we should expect; and if the population is very large, something that affects the population overall should definitely show up in the sample. For this reason, practical significance may or may not show up as statistical significance, depending on the size of the sample and the nature of the population.

5. Power – the power of a test is 1 minus the probability of a Type II error for the alternative hypothesis.

C. Inference About a Population Mean – σ Unknown
Recall from before that we mentioned the t-test statistic and t-distribution. These are used when the sample size is insufficient (we will use n < 30) and σ is unknown. If we are given this information, we use the following:
1. Sampling Where σ Is Unknown
In this case we cannot assume that our sampling distribution is completely normal. We must use the t-distribution (sometimes called Student's t-distribution) to make probability statements.
a.
t-distribution – a family of distributions indexed by degrees of freedom; each distribution has its own features. As the degrees of freedom (d.o.f.) go up, the distributions get closer and closer to the standard normal, because as the d.o.f. go up the variability is reduced.
- We read the table the same way as the standard normal, the only difference being that we now have d.o.f., which is (n − 1).
t-test stat: t = (x̄ − μ) / (s/√n); the denominator s/√n is known as s_x̄.
- The only difference from before is that we don't use the population standard deviation σ, because it is not known. If one thinks about it, if we don't know the mean it would be impossible to know the standard deviation, since the mean is required to compute it.

b. Confidence Intervals
i. Two-Tailed: x̄ ± tα/2 · s_x̄ = x̄ ± tα/2 · s/√n
Recall that s² = Σ(xi − x̄)² / (n − 1); this is just the sample variance. To get s we simply take the square root of s².

[Graph: two-sided test. The t-values −tα/2 and tα/2 show the range in which we expect to find the mean; note the level of confidence and α to find the appropriate t-values for your CI.]

*** So the only time you use the t-distribution is in the case of a small sample and an unknown population variance. The text implies that we MUST know the population is normal; this is actually incorrect. We even have the CLT (central limit theorem) in the text, which verifies this: the CLT tells us that as long as our sample size is "large," our sampling distribution will be normal. Our rule for "large" is n > 30. If we have a normal population, our sampling distribution is guaranteed to be normal no matter what the sample size. So the times we will use a t-stat are if:
1) we use s in place of σ, and
2) the sample size is < 30.
** One thing that can be used to assess the validity of this claim is to go to your t-table.
As n increases (i.e. look at n = 40, 50, 80, 1000, and beyond) you will note that the t-table values approach what we get from the z-table. For a 95% CI, recall that the z-value is ±1.96; looking at the t-table for 1000 degrees of freedom, the t-stat is 1.962 – very close to the z-value. So in these cases there is no gain from using the t-stat, and we simply use our z-table.

ii. One-Tailed: We can also have a one-tailed confidence interval. In this case we simply find an upper or lower bound for our values. It takes the form
x̄ ± tα · s_x̄ = x̄ ± tα · s/√n
Note: in this case we do not divide α by 2.

[Graph: one-sided test. tα is one-sided, so there is only one t-value for the CI; it could also be a lower tail, i.e. to the left of the mean. Just as before, note the level of confidence and α to find the appropriate t-value.]

2. Hypothesis Testing Using the t-Stat
The method for both the test statistic and the p-value approach is exactly the same as before, so simply refer back to the previous notes. We will do an example below to illustrate the method again. We will use the test statistic method predominantly, since for the p-value method we cannot calculate exact p-values for all possible sample t-values.

3. Examples:
a. Construct a 99% confidence interval for a mean, given the following (the population standard deviation is unknown):
x̄ = 18,  s = 3,  n = 25
All the conditions for using the t-stat hold: we don't know the population standard deviation and our sample size is less than 30. The first thing to do is find tα/2. We go to the t-table at the back of the book and look up the 99% CI column with d.o.f. = n − 1 = 25 − 1 = 24. This value is 2.797. Now we simply apply the formula as before:
x̄ ± tα/2 · s/√n
18 ± 2.797 · (3/√25)
18 ± 2.797 · (0.6)
16.32 to 19.68
So we are 99% sure the true population mean lies between 16.32 and 19.68.
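The 99% interval from example (a) can be checked with a short script (a sketch; the critical value is hard-coded from a t table, tα/2 = 2.797 for d.o.f. = 24, since the Python standard library has no inverse-t function):

```python
import math

xbar, s, n = 18, 3, 25
t_crit = 2.797               # t_{alpha/2} for a 99% CI with dof = n - 1 = 24 (from a t table)

se = s / math.sqrt(n)        # estimated standard error, s_xbar = 3/5 = 0.6
lo = xbar - t_crit * se
hi = xbar + t_crit * se
print(round(lo, 2), round(hi, 2))   # 16.32 19.68
```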
b. Using the same data, construct a 95% one-tailed (lower) confidence interval. This is the same as finding a lower bound for the mean. The first thing to do is find tα. We go to the t-table at the back of the book and look up the value with one-tail area α = 0.05 and d.o.f. = n − 1 = 25 − 1 = 24. This value is t = 1.711.
x̄ ± tα · s_x̄   – note that we want a lower bound, so we only subtract:
x̄ − tα · s/√n
18 − 1.711 · (0.6) = 16.97
So we are 95% sure that the true population mean is greater than 16.97.

c. Suppose you are told the mean score on a test in a particular dept is 70, with s = 2. If you take a sample of 25 students and find that the average is 75, do you have enough evidence at the 95% confidence level to conclude the mean is not 70?
** Note that we don't have σ and n < 30, so we use t-stats.
Step 1: Set up the hypotheses.
Ho: μ = 70   Ha: μ ≠ 70   – so this is a two-sided test.
Step 2: Find the critical values. We go to the t-table and find the t's that give α = 0.05, i.e. where 2.5% of the probability lies in each tail. These values are 2.064 and −2.064, so if our test stat falls below or above these we reject Ho in favor of Ha at the 95% level of confidence.
Step 3: Find the test stat: t = (x̄ − μ) / (s/√n) = (75 − 70) / (2/√25) = 12.50
Step 4: Compare the test stat to the critical value and analyze. Since 12.50 >> 2.064, we conclude at the 95% level of confidence that the mean score in the dept is most likely not 70; it is highly unlikely to find such a large difference by chance. Most likely the population mean is higher, but recall that 5% of the time we expect outliers, so it might be that the sample we obtained was in fact an anomaly and the average really is 70.
Graphically: [Graph: rejection regions for the two-sided test at ±2.064; 12.5 is so far to the right of 2.064 that we reject the null.]
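Example (c) above follows the same recipe as the z case, just with s in place of σ and a t critical value; as a quick check (a sketch, with the critical value hard-coded from a t table):

```python
import math

mu0, s, n, xbar = 70, 2, 25, 75   # claimed mean, sample sd, sample size, sample mean
t_crit = 2.064                     # t_{alpha/2} with dof = 24 at the 95% confidence level

t_stat = (xbar - mu0) / (s / math.sqrt(n))
print(round(t_stat, 2))       # 12.5
print(abs(t_stat) > t_crit)   # True -> reject Ho
```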
P-value method: If we use the p-value method, we cannot find the exact p-value for 12.5, but we can approximate it. Go to the d.o.f. = 24 row of the t-table and move as far to the right as possible without passing our test statistic value (12.5). Doing so takes us all the way to the right side of the t-table, to a t-value of 3.745. (Go to the t-table at d.o.f. = 24 to verify that the last value in that row is 3.745.) Going up the column, you see this corresponds to the 99.9% confidence level, so the p-value for 12.5 is less than 0.001. Since 0.001 < 0.05 (we are simply comparing our p-value to α), we reject Ho and conclude there is enough evidence to suggest that the mean is not 70. Both methods must once again give the same result.

4. Notes About the t-Stat
a. Robustness – the term used to describe how well a test stands up to violations of its assumptions. Here we assume the data are symmetric; a t-stat (and a z-stat, for that matter) does not actually work well if this is violated. So if you have data that is heavily skewed or contains outliers, using a z- or t-stat might not be a good idea.
b. For very small sample sizes a t-stat is also not very good. A general rule: if n is not greater than 15, it may be better to use some other statistic or test.
c. Samples should be an SRS. Recall from chapter 11 (sampling distributions) that we assumed all samples are random when taken. This is a very important assumption and cannot be violated if the t-stat is to be a good/valid test.

D. Hypothesis Testing: Proportions
1. Hypothesis Test
Recall from before that we had essentially three types of hypotheses. We have the same general setup here: we want to test whether the sample proportion is consistent with some proposed value p0.
a. Ho: p ≥ p0   Ha: p < p0
** To get the other one-tailed test, simply reverse the inequalities.
b. Ho: p = p0   Ha: p ≠ p0
Question: So when might this be used?
Answer: Polling data is a really good example, as are disease rates, error rates, or anything else measured as a proportion.
c. Example: Suppose we are given that the pass rate of a certain class is 60%. Looking at the data from one specific class, 20 people out of 30 passed. Test whether this class's pass rate differs significantly from 60%. Use α = 0.05.
i) Hypotheses: Ho: p = 0.60   Ha: p ≠ 0.60
ii) Analysis of the critical region: Since α = 0.05 and the z-stat is two-tailed, the critical values are ±1.96. We reject Ho only if our test statistic lies farther out than either of these two values.
iii) Test statistic: Z = (p̂ − p0) / σp0, where p̂ = 20/30 = 0.667 and σp0 = √(p0(1 − p0)/n) = √(0.60 · 0.40 / 30) = 0.0894. So Z = (0.667 − 0.60) / 0.0894 = 0.75.
iv) Conclusion: Since our test statistic 0.75 < 1.96, we fail to reject Ho and conclude there is not enough evidence to suggest that the pass rate in the class differs from 0.60, or 60%.
Graphically: [Graph: two-sided rejection regions at ±1.96. The test statistic 0.75 is not in the rejection region, so the sample proportion of 0.667 is not far enough from 0.60 for us to think this class differs from the rest in passing rate.]
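The proportion test in the example works the same way as the mean tests, only with the standard error built from p0. A short Python sketch of the class pass-rate example (variable names are ours):

```python
import math

p0, n, passed = 0.60, 30, 20
z_crit = 1.96                        # two-sided critical value at alpha = 0.05

p_hat = passed / n                   # sample proportion, 0.667
se = math.sqrt(p0 * (1 - p0) / n)    # sigma_p0 = sqrt(0.60 * 0.40 / 30) = 0.0894
z = (p_hat - p0) / se
print(round(z, 2))      # 0.75
print(abs(z) > z_crit)  # False -> fail to reject Ho
```

Note the standard error uses the claimed p0, not p̂, because the test statistic is computed under the assumption that Ho is true.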