ST 432 Exam 1 Practice Questions

1. A dentist was interested in the effectiveness of a new toothpaste. Records showed there was an average of 2.2 cavities every six months for a group of 1000 schoolchildren before they started using the new toothpaste. After six months of the study, the dentist sampled n = 10 children to determine how they were progressing on the new toothpaste. The data follow.

Child      1  2  3  4  5  6  7  8  9  10
Cavities   0  4  2  3  2  0  3  4  1   1

a. Assume the 10 children were selected via SRS from the group of 1000. Estimate the mean number of cavities for the whole group of 1000 and place a bound on the error of your estimate.
Answer: \(\bar{y} = 2\); bound on the error of estimation = 1.06 (t) or 0.92 (z).

b. The dentist would like to calculate a confidence interval to see whether the population mean has changed since before the study began. Would you recommend a confidence interval based on a normal approximation for this purpose? Why or why not?
Answer: No, the sample size is too small.

2. A local newspaper reporter is doing a story about NCSU faculty's off-campus consulting. The reporter is specifically interested in how much money faculty members earn from such activities. The reporter obtains a list of all 1588 faculty members of the university and chooses a simple random sample (SRS) of 10 people. He mails each of the ten people a survey in which he asks, among other things, how much money each person earned through off-campus consulting activities during the previous calendar year. All ten people respond to the survey. The data (in dollars) follow.

12,000   250   750   8500   3200
     0   300     0   2500   3000

a. What is the target population?
Answer: All NCSU faculty.

b. What is the sampling frame?
Answer: The list of all 1588 faculty.

c. What are the sampling units?
Answer: Individual faculty members.

d. What kind of measurement bias might occur in the survey?
Answer: Individuals may not remember correctly, may misunderstand what is considered consulting, or may lie.

e. Using an unbiased estimator, estimate the population mean and give a bound on the error of estimation.
Answer: \(\bar{y} = 3050\); bound on the error of estimation = 2906 (t) or 2518 (z).

f. The reporter's editor says that his standard error is too large to be useful. To get a smaller standard error, the reporter proposes to take a larger sample. Using the list of 1578 faculty who were not previously selected, he will take an SRS of 10 more faculty and send them surveys. Could these 10 new observations be combined with the 10 previous observations (given above) and treated as an SRS of size 20 from the population of all 1588 faculty? Explain.
Answer: Yes. Each subset of 20 unique units from the population has an equal chance of being selected as the final sample.

3. Researchers are interested in studying the number of oak trees in a wilderness region. The region is divided into 50 plots, 10 of which are selected for the sample via SRS. For each selected plot, the number of oak trees on the plot is counted. The sample data appear below.

Plot         3   8   9  13  25  27  32  39  43  49
Oak trees   42   1   5  28  12  38  15   8  41   3

a. Calculate an unbiased estimate of the total number of oak trees in the region and place a bound on the error of estimation.
Answer: \(\hat{\tau} = 965\); bound on the error of estimation = 524 (t) or 454 (z).

4. Below is a population of five units. The study variable is denoted y.

Unit   1   2   3   4   5
y      5  12   8   4   6

Consider the following sampling design.

Sample S   P(S)
{1,4}      1/4
{2,3}      1/4
{5}        1/2

a. We want to use the sample mean to estimate the population mean. Is the sample mean an unbiased estimator of the population mean under this sampling design?
Answer: No. \(E(\bar{y}) = \tfrac{1}{4}(4.5) + \tfrac{1}{4}(10) + \tfrac{1}{2}(6) = 6.625 \neq 7\), where 7 is the population mean.

b. What is the standard deviation of the sample mean under this sampling design?
Answer: \(V(\bar{y}) = \tfrac{1}{4}(4.5 - 6.625)^2 + \tfrac{1}{4}(10 - 6.625)^2 + \tfrac{1}{2}(6 - 6.625)^2 = 4.17\), so SD = 2.04.

c. What is the inclusion probability \(\pi_i\) for each unit under this sampling design?
Answer: \(\pi_1 = \pi_2 = \pi_3 = \pi_4 = 1/4\) and \(\pi_5 = 1/2\).

d. Is this a probability sampling design?
Answer: Yes.

5.
Consider the following small population of people and their heights.

Name      Height (inches)
Ann       62
Barry     68
Chip      76
Denise    72
Edward    73

The following sampling design will be used.

S                        P(S)
{Ann, Barry}             1/4
{Chip, Denise}           1/4
{Ann, Denise, Edward}    1/8
{Barry, Chip}            3/8

a. What is the population mean \(\mu\)?
Answer: 70.2

b. What is the population variance \(\sigma^2\)?
Answer: 23.36

c. Is this a fixed size sampling design? Explain.
Answer: No, some samples are of size 2 and some are of size 3.

d. Is this a probability sampling design? Explain.
Answer: Yes, every member of the population appears in at least one sample.

e. What is the mean of the sampling distribution of \(\bar{y}\)? Is \(\bar{y}\) unbiased for \(\mu\)?
Answer: \(E(\bar{y}) = 70.375\). No, it is not unbiased.

f. What is the standard deviation of \(\bar{y}\)?
Answer: \(V(\bar{y}) = \tfrac{1}{4}(65 - 70.375)^2 + \tfrac{1}{4}(74 - 70.375)^2 + \tfrac{1}{8}(69 - 70.375)^2 + \tfrac{3}{8}(72 - 70.375)^2 = 11.73\), so SD = 3.43.

g. What is the inclusion probability \(\pi_i\) for each member of the population?
Answer: \(\pi_1 = 3/8\); \(\pi_2 = 5/8\); \(\pi_3 = 5/8\); \(\pi_4 = 3/8\); \(\pi_5 = 1/8\).

h. Would you recommend this sampling design over an SRS of size 2? Explain.
Answer: If \(\bar{y}\) is going to be used, then under an SRS of size 2 it would at least be unbiased, and its SD would be 2.96 rather than 3.43. SRS therefore appears to be better than using \(\bar{y}\) with this sampling design.

6. The statistics department wants to estimate the average housing cost for a graduate student in order to give this information to prospective students. From a list of the 150 currently supported statistics graduate students (those receiving pay for an assistantship or fellowship), an SRS of 10 students is selected.

a. What is the target population in this study?
Answer: All prospective graduate students.

b. What is the sampled population in this study?
Answer: All currently supported graduate students.

c. Are there any possible sources of selection bias? Explain.
Answer: Graduate students who are unsupported have no chance of ending up in the sample.

d. Give one possible source of measurement bias.
Answer: Students may misunderstand what should be included in "housing costs."

e. The collected data are given below. Using an unbiased estimator, give an estimate of the population mean and the corresponding bound on the error of estimation.

Student                                         1    2    3    4    5    6    7    8    9    10
Housing cost per month (hundreds of dollars)    4.5  5.5  7.4  0    3.6  3.5  2.5  3.5  3.7  4.0

Answer: \(\bar{y} = 3.82\) (382 dollars); bound on the error of estimation = 1.32 (132 dollars) (t) or 1.14 (114 dollars) (z).

7. Selection bias can occur in several ways:
i) Using a sample-selection procedure that, unknown to the investigators, depends on some characteristic associated with the properties of interest;
ii) Deliberately or purposefully selecting a "representative" sample;
iii) Misspecifying the target population;
iv) Failing to include all the target population in the sampling frame, called undercoverage;
v) Substituting a convenient member of the population for a designated member that is not readily available;
vi) Failing to obtain responses from all the chosen sample (nonresponse);
vii) Allowing the sample to consist entirely of volunteers.

For each of the following scenarios, state whether you believe the survey was subject to selection bias and explain why or why not. [Answer keys to these are not provided.]

a. A survey by a company was said to measure American attitudes toward college. The polling firm mailed questionnaires to 4200 people across the US and received 1188 responses. About 55% of those responding said they had major concerns about being able to pay for their child's college education.

b. A chain of department stores is interested in estimating the proportion of accounts receivable that are delinquent. The chain consists of four stores, each having 100 accounts. Fifty accounts are chosen from store 1, 10 from store 2, 10 from store 3, and 10 from store 4.

c. You are hired to estimate the proportion of registered voters in your county who favor an increase in the number of nuclear weapons owned by the US.
You attend one meeting of the Republican party in your county and one meeting of the Democratic party in your county and survey all attendees at each meeting. For a final estimate, you average the percentages from the two meetings.

d. An investigator is curious about a community's attitude toward tougher credentials for day care workers proposed by the state legislature. The investigator obtains a list of all licensed day care facilities in the community. A sample of facilities is selected, then a sample of parents at each selected facility is taken. In presenting the results to a local community action group, the investigator states, "A survey of parents of children in licensed day care facilities in this community indicates that well over 75% are in favor of tougher credentials for day care workers."

8. Suppose we want to estimate the proportion p of recipes in the Better Homes and Gardens New Cook Book that do not involve animal products. We plan to take an SRS of the N = 1251 recipes in the book.

a. We want to be 95% confident that we are within 0.03 of the true proportion p. What is the required sample size?
Answer: The sample size ignoring the fpc is
\[ n_0 = \frac{z^2 \, p(1-p)}{(\mathrm{MOE})^2}. \]
Since we have no prior information about the proportion p, we will use \(p = \tfrac{1}{2}\). So
\[ n_0 = \frac{1.96^2 \cdot \tfrac{1}{2}\left(1 - \tfrac{1}{2}\right)}{(0.03)^2} = 1067. \]
The sample size, ignoring the fpc, is large compared with the population size, so in this case we should make the fpc adjustment and use
\[ n = \frac{n_0}{1 + \dfrac{n_0}{N}} = \frac{1067}{1 + \dfrac{1067}{1251}} = 576. \]

b. Suppose we want to reduce the margin of error by half, that is, we want to be 95% confident that we are within 0.015 of the true proportion p. What is the required sample size?
Answer: Since we want to reduce the margin of error by half, new \(n_0 = 4 \times\) original \(n_0 = 4(1067) = 4268\), and
\[ \text{new } n = \frac{\text{new } n_0}{1 + \dfrac{\text{new } n_0}{N}} = \frac{4268}{1 + \dfrac{4268}{1251}} = 967. \]

c. Suppose we want to be 98% confident that we are within 0.03 of the true proportion p. What is the required sample size?
Answer:
\[ n_0 = \frac{z^2 \, p(1-p)}{(\mathrm{MOE})^2} = \frac{2.33^2 \cdot \tfrac{1}{2}\left(1 - \tfrac{1}{2}\right)}{(0.03)^2} = 1508. \]
Again, the sample size, ignoring the fpc, is large compared with the population size (in fact, it is larger than the population size N), so in this case we should make the fpc adjustment and use
\[ n = \frac{n_0}{1 + \dfrac{n_0}{N}} = \frac{1508}{1 + \dfrac{1508}{1251}} = 684. \]
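
The sample-size arithmetic in Problem 8 can be checked with a short Python sketch. The function name `sample_size_proportion` and its parameter names are illustrative assumptions, not part of the course materials. The function returns unrounded values; the figures in the answer key correspond to rounding to the nearest integer, whereas rounding up would be the more conservative choice.

```python
def sample_size_proportion(moe, z, N, p=0.5):
    """Sample size for estimating a proportion under SRS.

    moe: desired margin of error
    z:   normal multiplier (1.96 for 95%, 2.33 for 98%, as in Problem 8)
    N:   population size
    p:   planning value for the proportion (0.5 when nothing is known)
    Returns (n0, n): the size ignoring the fpc and the fpc-adjusted size.
    """
    n0 = z ** 2 * p * (1 - p) / moe ** 2
    n = n0 / (1 + n0 / N)
    return n0, n


# Problem 8a: 95% confidence, margin of error 0.03, N = 1251
n0_a, n_a = sample_size_proportion(moe=0.03, z=1.96, N=1251)
print(round(n0_a), round(n_a))  # 1067 576

# Problem 8c: 98% confidence, same margin of error
n0_c, n_c = sample_size_proportion(moe=0.03, z=2.33, N=1251)
print(round(n0_c), round(n_c))  # 1508 684
```

Halving the margin of error, as in Problem 8b, multiplies the unadjusted size by four, which is why the fpc adjustment matters so much when N is only 1251.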
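
As a cross-check on the SRS answers in Problems 1, 2, 3 and 6, the following is a minimal sketch of the estimate of the mean and its bound on the error of estimation, assuming the usual SRS variance estimator with the finite population correction, (1 - n/N) s^2 / n. The helper name `srs_mean_bound` is hypothetical. The multiplier is passed in by the caller: 1.96 for the z bound, or 2.262 (the tabled t value with 9 degrees of freedom) for the t bound.

```python
import math
import statistics


def srs_mean_bound(y, N, multiplier=1.96):
    """Estimate a population mean from an SRS and bound the error.

    y:          list of sampled values
    N:          population size
    multiplier: z or t multiplier chosen by the caller
    Returns (ybar, bound).
    """
    n = len(y)
    ybar = statistics.mean(y)
    s2 = statistics.variance(y)            # sample variance, divisor n - 1
    se = math.sqrt((1 - n / N) * s2 / n)   # standard error with the fpc
    return ybar, multiplier * se


# Problem 1: cavities for 10 children sampled from 1000
cavities = [0, 4, 2, 3, 2, 0, 3, 4, 1, 1]
ybar, bound_z = srs_mean_bound(cavities, N=1000)
_, bound_t = srs_mean_bound(cavities, N=1000, multiplier=2.262)
print(ybar, round(bound_z, 2), round(bound_t, 2))

# Problem 3: a total is N times the mean, and so is its bound
oaks = [42, 1, 5, 28, 12, 38, 15, 8, 41, 3]
mean_oaks, bound_oaks = srs_mean_bound(oaks, N=50)
print(round(50 * mean_oaks), round(50 * bound_oaks))
```

Keeping the multiplier as an argument avoids any dependency beyond the standard library, at the cost of looking up the t value by hand.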
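
The design-based quantities asked for in Problems 4 and 5 (the expectation and variance of the sample mean, and the inclusion probabilities) follow directly from the list of possible samples and their probabilities. The sketch below works through Problem 5 with exact fractions; the function name `design_summary` and the data layout are assumptions made for illustration.

```python
from fractions import Fraction


def design_summary(y, design):
    """Summarise the sample mean under a sampling design.

    y:      dict mapping each unit to its value
    design: list of (sample, probability) pairs; probabilities sum to 1
    Returns (expected value, variance, inclusion probabilities).
    """
    # Sample mean of each possible sample, paired with its probability
    means = [(Fraction(sum(y[u] for u in s), len(s)), p) for s, p in design]
    expected = sum(m * p for m, p in means)
    variance = sum((m - expected) ** 2 * p for m, p in means)
    # A unit's inclusion probability sums P(S) over samples containing it
    inclusion = {u: sum(p for s, p in design if u in s) for u in y}
    return expected, variance, inclusion


heights = {"Ann": 62, "Barry": 68, "Chip": 76, "Denise": 72, "Edward": 73}
design = [
    (("Ann", "Barry"), Fraction(1, 4)),
    (("Chip", "Denise"), Fraction(1, 4)),
    (("Ann", "Denise", "Edward"), Fraction(1, 8)),
    (("Barry", "Chip"), Fraction(3, 8)),
]

expected, variance, inclusion = design_summary(heights, design)
print(float(expected))            # 70.375
print(round(float(variance), 2))  # 11.73
print(inclusion)
```

Comparing `expected` with the population mean of 70.2 shows the bias of the sample mean under this design, which is the point of Problem 5e.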