Review test 3 spring
8.5 – Goodness of Fit Test

Suppose we want to make an inference about several groups of data (instead of just one or two), or we want to test counts for two categorical variables. Chi-square (χ²) testing allows us to make such inferences. There are several types of chi-square tests, but in this section we focus on the goodness-of-fit test.

The goodness-of-fit test is used to test how well the sample proportions of the categories match up with the known population proportions stated in the null hypothesis. The chi-square goodness-of-fit test extends inference on proportions to more than two proportions by enabling us to determine whether a particular population distribution has changed from a specified form.

The null and alternative hypotheses do not lend themselves to symbols, so we define them with words:

H0: _____ is the same as _____
Ha: _____ is different from _____

For each problem you will make a table with the following headings:

Observed Counts (O)    Expected Counts (E)    (O − E)²/E

The sum of the third column is called the chi-square test statistic:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

Table D gives p-values for χ² with n – 1 degrees of freedom, where n is the number of categories. Chi-square distributions have only positive values and are skewed right. As the degrees of freedom increase, the distribution becomes more normal. The total area under the χ² curve is 1.

The assumptions for a chi-square goodness-of-fit test are:
2. The sample must be an SRS from the populations of interest.
3. The population size is at least ten times the size of the sample.
4. All expected counts must be at least 5.

To find probabilities for χ² distributions:
TI-83/84 calculator: use the command χ²cdf found under the DISTR menu.
R-Studio command: 1 – pchisq(test statistic, df)

Examples:

1. The Mixed-Up Nut Company advertises that their nut mix contains (by weight) 40% cashews, 15% Brazil nuts, 20% almonds and only 25% peanuts.
The truth-in-advertising investigators took a random sample (of size 50 lb) of the nut mix and found the distribution to be as follows:

Cashews    Brazil Nuts    Almonds    Peanuts
15 lb      11 lb          13 lb      11 lb

At the 1% level of significance, is the claim made by Mixed-Up Nuts true?

Summary: how we decide what to do in Chapter 7 and Chapter 8

Test 3 review

Questions 1-7 are 10 points each and questions 8-17 are 3 points each.

1) The one-sample t statistic for a test of H0: μ = 19 vs. Ha: μ < 19 based on n = 27 observations has the test statistic value t = -2.58. What is the p-value for this test?
TI-83/84: tcdf(-10^99, -2.58, 26) = .00794

2) Let x represent the hemoglobin count (HC) in grams per 100 milliliters of whole blood. The distribution for HC is approximately normal with μ = 14 for healthy adult women. Suppose that a female patient has taken 10 laboratory blood samples in the last year. The HC data sent to her doctor is listed below. Test whether these data cast doubt on the current belief (use α = 0.05). State the null and alternative hypotheses, give the p-value, and sketch and give the rejection area. State your conclusion based on a 5% significance level.
TI-84: using T-Test I get t = 1.428, P = .18. Sample mean = 15.4, sample sd = 3.098, n = 10, df = n - 1 = 9, significance level .05. Rejection area: invT(.05/2, 9) gives -2.26 and 2.26.

3) Based on information from a large insurance company, 67% of all damage liability claims are made by single people under the age of 25. A random sample of 51 claims showed that 44 were made by single people under the age of 25. Does this indicate that the proportion of insurance claims made by single people under the age of 25 is higher than the national percent reported by the large insurance company?
State your null and alternative hypotheses. H0: p = .67, Ha: p > .67
Sketch the rejection region. Calculate the test statistic. Plot this value in your sketch.
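
The test statistic for question 3 can also be checked by hand. The following is a minimal Python sketch, not part of the original handout, of the one-proportion z test under the normal approximation; the function name `one_proportion_z_test` is a hypothetical helper, and Python's standard library stands in for the calculator.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, right-tailed p-value) for H0: p = p0 vs. Ha: p > p0.

    Hypothetical helper for illustration; uses the normal approximation.
    """
    p_hat = successes / n
    # The standard error uses the null value p0, not the sample proportion.
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Question 3: 44 of 51 sampled claims, null proportion 0.67.
z, p_value = one_proportion_z_test(44, 51, 0.67)

# Right-tailed critical value at the 5% significance level.
critical_value = NormalDist().inv_cdf(1 - 0.05)

print(round(z, 4), round(p_value, 4), round(critical_value, 2))
```

Since the computed z lies above the critical value, it falls in the rejection region for Ha: p > .67.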
TI-83/84: 1-PropZTest gives z = 2.9273. Rejection area for significance level .05: invNorm(.05) = -1.64; change the sign to 1.64 since Ha is >. Note: for Ha <, keep it at -1.64. For Ha ≠, use invNorm(.05/2).
Determine the P-value for your test. P = .0017
State your conclusions clearly in complete sentences.

5) A 97% confidence interval for the mean of a population is to be constructed and must be accurate to within 0.3 units. A preliminary sample standard deviation is 1.7. The smallest sample size n that provides the desired accuracy is
a) 167
*b) 152 (151.20)
c) 139
d) 143
e) 138
Critical value for the 97% CI on the TI-84: invNorm(.97 + (1 - .97)/2) = invNorm(.985) = 2.17

6) Mars Inc. claims that they produce M&Ms with the following distribution:

Brown    Orange    Red    Green    Yellow    Blue
20%      5%        25%    15%      25%       10%

A bag of M&Ms was randomly selected from the grocery store shelf, and the color counts were:

Brown    Orange    Red    Green    Yellow    Blue
25       13        23     15       21        14

Use the χ² goodness-of-fit test (α = 0.05) to determine if the proportion of M&Ms is what is claimed. Select the [test statistic, p-value, decision to reject (RH0) or failure to reject (FRH0)].
a) [χ² = 6.865, p-value = 0.983, RH0]
b) [χ² = 13.730, p-value = 0.009, RH0]
*c) [χ² = 13.730, p-value = 0.017, RH0]
d) [χ² = 13.730, p-value = 0.017, FRH0]
e) [χ² = 6.865, p-value = 0.983, FRH0]

7) True or false
a) In a hypothesis test, if the computed P-value is less than 0.001, there is very strong evidence to reject the null hypothesis.
b) In a hypothesis test, if the computed P-value is greater than a specified level of significance, then we fail to reject the null hypothesis.
c) What will reduce the width of a confidence interval? Decrease variance.

8) A simple random sample of 49 8th graders at a large suburban middle school indicated that 84% of them are involved with some type of after school activity. Find the 98% confidence interval that estimates the proportion of them that are involved in an after school activity.
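
One way to work question 8 is the large-sample interval for a proportion, p̂ ± z*·sqrt(p̂(1 − p̂)/n). The Python sketch below is an illustration under that assumption, not the handout's own method; `proportion_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence):
    """Large-sample z interval for a proportion (hypothetical helper)."""
    # Two-sided critical value: leave (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Question 8: sample proportion 0.84, n = 49, 98% confidence.
low, high = proportion_confidence_interval(0.84, 49, 0.98)
print(round(low, 3), round(high, 3))
```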
Ans: [0.718, 0.962]

9) Television viewers often express doubts about the validity of certain commercials. In an attempt to answer their critics, a large advertiser wants to estimate the true proportion of consumers who believe what is shown in commercials. Preliminary studies indicate that about 40% of those surveyed believe what is shown in commercials. What is the minimum number of consumers that should be sampled by the advertiser to be 99% confident that their estimate will fall within 2% of the true population proportion?
Ans: 3982

10) The gas mileage for a certain model of car is known to have a standard deviation of 6 mi/gallon. A simple random sample of 49 cars of this model is chosen and found to have a mean gas mileage of 28.4 mi/gallon. Construct a 97% confidence interval for the mean gas mileage for this car model.
Ans: [26.540, 30.260] (TI-84: STAT, TESTS, 7:ZInterval)

11) An SRS of 28 students at UH gave an average height of 5.9 feet and a standard deviation of .1 feet. Construct a 90% confidence interval for the mean height of students at UH.
Ans: [5.868, 5.932] (TI-84: TInterval)
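
The goodness-of-fit procedure from section 8.5 can be checked numerically against question 6. This is a rough Python sketch, not the handout's method: the handout uses χ²cdf on the TI-83/84 or `1 - pchisq(...)` in R, whereas here the upper-tail probability is computed with a closed-form series that holds for whole-number degrees of freedom. The function names `chi_square_statistic` and `chi_square_upper_tail` are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square variable with whole-number df.

    Swapped-in technique: closed-form series instead of a library call.
    """
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^j / j! for j < df/2.
        term = exp(-half)
        total = term
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: erfc term plus a finite series with odd-number denominators.
    total = erfc(sqrt(half))
    term = sqrt(2 * x / pi) * exp(-half)
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    return total


# Question 6: Brown, Orange, Red, Green, Yellow, Blue.
observed = [25, 13, 23, 15, 21, 14]
claimed = [0.20, 0.05, 0.25, 0.15, 0.25, 0.10]

total_count = sum(observed)
expected = [p * total_count for p in claimed]

statistic = chi_square_statistic(observed, expected)
p_value = chi_square_upper_tail(statistic, len(observed) - 1)
print(round(statistic, 3), round(p_value, 3), min(expected) >= 5)
```

With six categories the test has 5 degrees of freedom, and the p-value is compared with α = 0.05 to choose between RH0 and FRH0. The final printed value confirms the assumption that all expected counts are at least 5.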