"It is a capital mistake to theorize before one has data." Sir Arthur Conan Doyle

11.1 INFERENCE FOR THE MEAN OF A POPULATION (Pages 586-605)

OVERVIEW: Last chapter we were given the standard deviation of the population. In reality, one frequently does not know the standard deviation of the population from which a random sample was obtained. When sample sizes are small and the population standard deviation is not known, statisticians make use of the t-distribution, which is bell-shaped but differs from the normal distribution. Hence, there is a t-distribution table that differs from the normal distribution table. As sample sizes get larger, the t-distribution approaches the normal distribution in shape. In a one-sample test, the degrees of freedom (needed to use the table) is one less than the sample size.

A few important things to note in this situation:
- When the standard deviation parameter is estimated from a sample, the resulting statistic is called the ________.
  The ________ of the sample mean ($\bar{x}$) = ________, where n = ________ and $s_x$ = ________
- If a sample has mean $\bar{x}$, the one-sample t-statistic is ________
- A confidence interval calculated for a t-statistic has the form ________, where t* is the appropriate value from the t-distribution table (Table C in the back of the book).

Assumptions:
1.
2.
3.

Example 1 (Single sample): An SRS of 75 male adults living in a particular suburb was taken to study the amount of time they spent per week doing rigorous exercise. It indicated a mean of 73 minutes with a standard deviation of 21 minutes.
(a) Calculate a 95% confidence interval for the mean for all males in the suburb.
(b) Interpret this interval in words.
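The arithmetic behind part (a) can be sketched in Python. This is only a minimal sketch and not part of the original notes: the function name `t_interval` is hypothetical, and the value t* = 2.000 is an assumption (the 95% entry for 60 degrees of freedom, chosen on the assumption that the printed table has no row for df = 74 and one drops to the next lower row). Check t* against Table C before relying on it.

```python
import math

def t_interval(xbar, s, n, t_star):
    """Return (low, high) for the interval xbar ± t* · s/√n."""
    se = s / math.sqrt(n)      # standard error of the sample mean
    margin = t_star * se       # margin of error
    return xbar - margin, xbar + margin

# Example 1 summary statistics: n = 75, mean 73 minutes, s = 21 minutes.
# ASSUMPTION: t* = 2.000 (95%, df = 60 row) stands in for df = 74.
low, high = t_interval(73, 21, 75, t_star=2.000)
print(f"({low:.2f}, {high:.2f})")
```

Passing t* in as an argument keeps the sketch close to the table-lookup method used in these notes.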
Response to (a):
Assumptions:

$\mu$ = ________    $\bar{x}$ = ________    $s_x$ = ________    C = ________    dof = ________

t-confidence interval = $\bar{x} \pm t^* \frac{s_x}{\sqrt{n}}$        t* = ________

The desired 95% confidence interval is ________

Response to (b): We are 95% confident that the true mean time of rigorous exercise is in the interval ( ________ ) since 95% of the confidence intervals of all samples of size 75 from this population will contain the value of the true population mean of rigorous exercise time. Consequently, we are 95% confident that our interval is one of those that contains $\mu$.

Example 2: The belief is that the mean number of hours per week of part-time work of high school seniors in a city is 10.6 hours. Data from an SRS of 50 seniors indicated that their mean number of hours of part-time work was 12.5 with a standard deviation of 1.3. Test whether these data cast doubt on the current belief ($\alpha$ = .05).

$\mu$ = ________

Assumptions:

We will use a one-sample t-test.

H0: ________        H0: ________
Ha: ________        Ha: ________

Test Statistic: $t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$, with n - 1 degrees of freedom.

t = ________

Using the TI-83, to find ________ is tcdf( ________ ). Therefore, tcdf( ________ ) = ________

Since this is a 2-tail situation, our P-value is 2( ________ ) = ________

With a P-value this small, one would ________ a null hypothesis stating $\mu$ = 10.6. In other words, since the P-value is ________ than $\alpha$, there is sufficient evidence to doubt that the mean number of hours of part-time work per week for high school seniors in this city is 10.6.

Facts about the t-distribution
1.
2.
3.

Comparative studies are more convincing than single-sample investigations, which will often lead us to use an alternative design.

Matched Pairs Design: subjects are matched in pairs and each treatment is given to one subject in each pair, or a before-after observation is made on the same subjects.

Example 3 (Matched Pairs): Assume that the following data represent a pretest, posttest situation. In a situation such as this, one calculates a difference column and tests the null hypothesis that the mean of the difference column is 0. In a 2-tail test, the alternate hypothesis is that the mean of the differences is not zero.
Individual #    After Score (A)    Before Score (B)    Diff. = (A) - (B)
     1                78                  73              ________
     2                92                  86              ________
     3                68                  70              ________
     4                69                  62              ________
     5                76                  72              ________
     6                80                  83              ________
     7                91                  85              ________
     8                79                  76              ________
     9                86                  86              ________
    10                59                  54              ________

Mean of difference column = ________
s for difference column = ________
SE = standard error = $\frac{s}{\sqrt{n}}$ = ________
Degrees of freedom = ________
t-statistic = ________

Using the TI-83, ________. In other words, we conclude that there is about a ________% chance that we would get a difference as large as 3.1 if the numbers in the difference column came from a population with $\mu$ = 0. Since this probability is ________, one would probably ________ a null hypothesis H0: $\mu$ = 0. Since we are using a 2-tail test, the P-value is 2tcdf( ________ ) = ________, or about ________

Rules for using the t-test
Ideally the population will have a normal distribution, but for times when this is not given:
- For sample sizes less than 15, the t-procedures can be used if the data are close to normal. Do not use t-procedures if the data are clearly non-normal or if outliers are present.
- For sample sizes 15 or greater, t-procedures can be safely used except in the presence of outliers or strong skewness.
- For sample sizes 40 or greater, t-procedures can be used even if the data are heavily skewed.

1. An SRS of 40 packages of light bulbs indicated that the mean number of defective bulbs in each package was .79 with a standard deviation of .2. The packages contained 4 bulbs each.
   a. Define the parameter of interest.
   b. Construct a 90% confidence interval based on these data. List your assumptions.
   c. Interpret this interval in words.

2. The choice between a z-test and a t-test for a population mean depends primarily on:
   (a) the sample size
   (b) the level of significance
   (c) whether a 1 or 2 tailed test is indicated
   (d) given $s_x$ or $\sigma$

3. The P-value of a significance test is the probability that:
   (A) the decision resulting from the test is correct
   (B) 95% of the confidence intervals will contain the parameter of interest
   (C) the null hypothesis is true
   (D) the alternative hypothesis is true
   (E) none of these describes the P-value

4. If the P-value of a test is less than the level of significance, then which of the following is a correct conclusion?
   (A) The value of the test statistic is in the rejection region for the test
   (B) The sample size should be increased to decrease the margin of error
   (C) The null hypothesis is true
   (D) The corresponding confidence interval will contain the hypothesized value of the parameter in the null hypothesis
   (E) None of these is a valid conclusion

5. A bank makes an offer of no credit fee to an SRS of 40 customers. It then compares the amount customers charged this year with no credit fee to last year’s charged amounts with the credit fee. The mean increase is $332 and the standard deviation is $108.
   a) Is there significant evidence at the 1% level that the mean amount charged increases under the no credit fee offer?
   b) Back up your test with a confidence interval.

   a) H0: ________
      Ha: ________

   b)

6. The guidance office of a school wants to test the claim of an SAT test preparation company that students who complete their course will improve their SAT Math score by at least 50 points. Ten members of the junior class who have had no SAT preparation but have taken the SAT once were selected at random and agreed to participate in the study. All took the course and re-took the SAT at the next opportunity. The results of the test indicated:

   Student     1     2     3     4     5     6     7     8     9    10
   Before    475   512   492   465   523   560   610   477   501   420
   After     500   540   512   530   533   603   691   512   489   458

   Is there sufficient evidence at the 5% level of significance to support the prep course company’s claim that scores will improve by at least 50 points?
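For a matched-pairs exercise such as #6, the computation reduces to a one-sample t-procedure on the column of differences. The sketch below, which is not part of the original worksheet, uses Python's standard `statistics` module on the Before/After scores from the table. The variable names are illustrative, and the hypothesized value `mu0 = 50` is an assumption taken from the company's claim. The choice of hypotheses and the P-value lookup (Table C or tcdf) are left as in the exercise.

```python
import math
import statistics

# Scores from exercise 6, listed in student order 1..10.
before = [475, 512, 492, 465, 523, 560, 610, 477, 501, 420]
after = [500, 540, 512, 530, 533, 603, 691, 512, 489, 458]

# Difference column: After minus Before for each student.
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_d = statistics.mean(diffs)    # mean of the difference column
s_d = statistics.stdev(diffs)      # sample standard deviation (n - 1 divisor)
se = s_d / math.sqrt(n)            # standard error of the mean difference
df = n - 1                         # degrees of freedom

mu0 = 50                           # ASSUMPTION: hypothesized mean improvement
t = (mean_d - mu0) / se            # one-sample t statistic on the differences

print(f"mean diff = {mean_d:.1f}, s = {s_d:.2f}, SE = {se:.2f}, df = {df}, t = {t:.2f}")
```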
Let’s use the TI-83 to run our test and confirm our answers in exercise #5:
Stat/Tests/2: T-Test
Highlight Stats in order to manually load the data needed.
Whoomp, there it is!

Will the TI-83 also CONFIRM my confidence interval?
Stat/Tests/#8 TInterval
Highlight Stats to manually input the information needed.
Voila, we have the interval.
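For readers without a TI-83, the same two checks can be approximated with the Python standard library. This is a rough sketch and not the calculator's actual algorithm. The tail area is computed by Simpson's rule, the function names (`upper_tail`, `t_star`, `t_test_stats`, `t_interval_stats`) are hypothetical, and the 99% confidence level is an assumption, since exercise #5 does not state one.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: Simpson's rule after mapping [t, inf) onto [0, 1)."""
    def g(u):
        if u >= 1.0:
            return 0.0                     # integrand vanishes at infinity (df > 1)
        x = t + u / (1.0 - u)              # substitution x = t + u/(1 - u)
        return t_pdf(x, df) / (1.0 - u) ** 2
    h = 1.0 / steps
    total = g(0.0) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

def t_star(conf, df):
    """Critical value t* with central area conf, found by bisection."""
    tail = (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail:
            lo = mid                       # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

def t_test_stats(mu0, xbar, s, n):
    """t statistic and upper-tail P-value (Ha: mu > mu0) from summary stats."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    p = upper_tail(t, n - 1) if t >= 0 else 1 - upper_tail(-t, n - 1)
    return t, p

def t_interval_stats(xbar, s, n, conf):
    """Interval xbar ± t* · s/√n from summary stats."""
    margin = t_star(conf, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Exercise 5 summary statistics: n = 40, mean increase 332, s = 108.
t, p = t_test_stats(0, 332, 108, 40)
low, high = t_interval_stats(332, 108, 40, 0.99)   # ASSUMPTION: 99% level
print(f"t = {t:.2f}, P-value = {p:.3g}")
print(f"99% interval: ({low:.1f}, {high:.1f})")
```

The two helper functions mirror the "Stats" input mode of T-Test and TInterval, where only the sample mean, standard deviation, and sample size are entered.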