Math 251, Final Review Questions

Note: Many of the questions included here are ones you have already seen on reviews or tests. The purpose of the questions is to serve as a reminder of many of the types of problems we have had this quarter. The following is a link to Answers to all questions except 5, 6, 9 and 15; the Answers to 5, 6, 9 and 15 are available here. Remember, answers are unedited; please see your instructor if you have questions.

1. Classify the type of sampling used in the following examples.
(a) To conduct a poll, the Join Arnold team randomly chose 8 different prefixes in California (the first 3 digits of the telephone number) and called all households from those prefixes.
(b) To maintain quality control, a tire manufacturer tests every 100th tire that comes off the assembly line in its plant.
(c) To determine student attitudes toward worship requirements at La Sierra, President Geraty gave questionnaires to ten randomly selected students from each of the following groups: Freshmen, Sophomores, Juniors, Seniors and Graduate Students.
(d) To determine worker attitudes in the recent MTA strike, the union numbered all employees and used a random number generator to select 100 of those employees for interviewing.
(e) To estimate the mean age of students at La Sierra University, President Geraty computed the mean age of his Gender Issues in Ancient Hebrew Society class.

2. Categorize the following data according to level: nominal, ordinal, interval, or ratio.
(a) The time someone goes to bed.
(b) Length of time to complete a marathon.
(c) The condition of a highway: poor, acceptable, good.
(d) The color of the winning horse at the Kentucky Derby.
(e) The height in inches of the winning horse at the Kentucky Derby.

3. How would you use a randomized two-treatment experiment in the following setting? Is a placebo being used or not? Explain. A veterinarian wants to test a strain of antibiotic on calves to determine their resistance to common infection.
In a pasture are 22 newborn calves. There is enough vaccine for 10 calves. However, blood tests to determine resistance to infection can be done on all calves.

4. (a) In a set of data with more than two distinct values, how does increasing the largest number affect the mean? How would it affect the median? How does it affect the range?
(b) How will the mean, standard deviation, and coefficient of variation in Population 1 below compare with those in Population 2 below? Explain why, but do not compute them. Notice that the data values in Population 2 are 10 times the data values in Population 1.
Pop1: 5 10 15 20 25 30 40 45 50 75 80 95
Pop2: 50 100 150 200 250 300 400 450 500 750 800 950

5. Consider the data (which are systolic blood pressures of 25 subjects):
105 108 110 110 112 112 116 118 118 120
126 126 128 130 130 130 132 134 136 140
146 152 166 188 190
(a) What class width should be chosen if you would like to have 8 classes?
(b) Complete the following table for this data, given that the first class has limits 105-119.
Lower Limit | Upper Limit | Lower Boundary | Upper Boundary | Midpoint | Frequency | Cumulative Frequency | Relative Frequency
105 | 119 | | | | | |
(c) Find the median, Q1, and Q3 for the above data. Draw a box and whisker plot for the data. You may draw it horizontally if you prefer.
(d) Construct a relative frequency histogram for the data using the table in (b).

6. Consider the data (which are systolic blood pressures of 50 subjects):
100, 102, 104, 108, 108, 110, 110, 112, 112, 112, 115, 116, 116, 118, 118,
118, 118, 120, 120, 126, 126, 126, 128, 128, 128, 130, 130, 130, 130, 130,
132, 132, 134, 134, 136, 136, 138, 140, 140, 146, 148, 152, 152, 152, 156,
160, 190, 200, 208, 208
(a) What class width should be chosen if you would like to have 6 classes?
(b) Suppose you don’t want a class width of 19, but would like a class width of 15 irrespective of how many classes that would give you. Complete the following table for this data.
Lower Limit | Upper Limit | Lower Boundary | Upper Boundary | Midpoint | Frequency | Cumulative Frequency | Relative Frequency
100 | 114 | | | | | |
(c) Draw a frequency histogram using the table in (b).
(d) Draw a frequency polygon using the table in (b).
(e) Draw an ogive using the table in (b).
(f) Find the median, mode, range, and first and third quartiles for the data in this problem.

7. For this question, consider the ogive that is displayed below for winning times for the Kentucky Derby.
[Figure: ogive titled "Winning Times for Kentucky Derby". Horizontal axis: Seconds over 2 Minutes. Vertical axis: Cumulative Frequency. Plotted points (seconds over 2 minutes, cumulative frequency): (-0.85, 0), (1.15, 12), (3.15, 48), (5.15, 75), (7.15, 85), (9.15, 94), (11.15, 100), (13.15, 101).]
(a) How many races had winning times over 2:07.15?
(b) What percentage of the races had winning times between 2:01.15 and 2:05.15?
(c) How many races had winning times under 2:03.15?

8. (a) At a large university, 4000 students wrote a mathematics placement test one day. Given that $\sum x$ = 85,400 and $\sum x^2$ = 1,904,290 for these test scores, find the mean, the population standard deviation, and the coefficient of variation.
(b) If a person scored at the 38th percentile in the placement test in (a), approximately how many students scored lower than that student? Approximately how many students scored higher than that student?

9. (a) Make a stem and leaf display for the following data.
58 92 85 52 66 84 68 68 90 86 87 57 72 86 77 66
73 76 97 61 84 89 70 93 84 75 58 91 72 47 91 73
(b) After making the display, find the median of the data.

10. (a) A population is known to have a mean of 50 and standard deviation of 15. Use Chebyshev’s theorem to find the interval in which you would expect to find at least 8/9 of the data.
(b) In general, at least what proportion of the data in any population must lie within 4 standard deviations of the mean?

11. Professor Henry Wiggins decided to study the ages of the children attending the nursery at his school. He constructed the following frequency distribution for ages in months.
x: 10-19 | 20-29 | 30-39
f: 30 | 50 | 20
Please help Professor Wiggins by estimating the mean and sample standard deviation for the ages of children at the nursery.

12. At Kenwood College of Engineering, 60% of incoming freshmen students are female and 40% are male. Recent records indicate that 90% of the entering female students will graduate with a BSE degree, while 80% of the male students will obtain a BSE degree. If an incoming engineering student is selected at random, find
(a) P(student will graduate, given student is female)
(b) P(student will graduate, and student is female)
(c) P(student will graduate, given student is male)
(d) P(student will graduate, and student is male)
(e) P(student will graduate)
(f) P(student will graduate, or student is female)

13. The following represents the outcomes of a flu vaccine study.
 | Got the Flu | Did not get Flu | Row Total
No Flu Shot | 223 | 777 | 1000
Given Flu Shot | 446 | 1554 | 2000
Column Total | 669 | 2331 | 3000
Let F represent the event that the person caught the flu, let V represent the event that the person was vaccinated, let H represent the event that the person remained healthy (didn’t catch the flu), and let N represent the event that the person was not vaccinated.
(a) Compute: P(F), P(V), P(H), P(N), P(F given V), P(V given F), P(V and F), and P(V or F).
(b) Are the events V and F mutually exclusive? Are the events V and F independent? Explain your answers.
(c) Explain what mutually exclusive events are.
(d) Explain what independent events are.
(e) Give an example of two events that are independent but are not mutually exclusive. Justify your answer.

14. (a) President Geraty has recently received permission to excavate the site of an ancient temple. In how many ways can he choose 11 of the 37 students in his History of Antiquities course to join him on the dig?
(b) Of the 37 students, 19 are female and 18 are male. In how many ways can President Geraty select a group of 11 that consists of 6 females and 5 males?
(c) What is the probability that President Geraty would randomly select a group of 11 consisting of 6 females and 5 males?
(d) In how many ways can 3 of the 37 students in President Geraty’s class win $10,000, $5,000 and $1,000 scholarships, respectively, for finishing first, second and third in the class?
(e) Is the answer to (b) larger than the number of license plates that are of the form xyy-zzz, where x is a digit 1 to 9, y is a digit 0 to 9, and z is a capital letter A to Z?

15. (a) Fill in the missing probability for the following discrete random variable:
x: 3 | 6 | 9 | 10 | 12
P(x): .10 | .15 | .25 | ? | .11
(b) The number of cars per household in a small town is given by
Cars: 0 | 1 | 2 | 3
Households: 20 | 280 | 75 | 25
(i) Make a probability distribution for x, where x represents the number of cars per household in this small town.
Cars (x): | | |
P(x): | | |
(ii) Find the mean and standard deviation for the random variable in (i).
(iii) What is the average number of cars per household in that small town? Explain what you mean by average.
(c) Yet a different random variable question: (From p. 219 #15) Combinations of Random Variables. Norb and Gary are entered in a local golf tournament. Both have played the local course many times. Their scores are random variables with the following means and standard deviations.
Norb, $x_1$: $\mu_1$ = 115; $\sigma_1$ = 12
Gary, $x_2$: $\mu_2$ = 100; $\sigma_2$ = 8
Assume that Norb’s and Gary’s scores vary independently of each other. The difference between their scores is $W = x_1 - x_2$. Compute the mean, variance and standard deviation for the random variable W.
(d) Identify the following random variables as continuous or discrete.
(i) x = the winning time of a horse race
(ii) x = the number of customers in Vons on a given day
(iii) x = the number of traffic accidents in LA County in a given year
(iv) x = the weight of the winning jockey at a horse race
(v) x = the length of time it takes a person to drive from Anaheim to Hesperia

16. Cascade Airlines (a.k.a.
“Crashcade” and now defunct) records showed that, on average, 10% of prospective passengers will not claim their reservations on a certain flight. Suppose that they booked 21 passengers for 20 seats on that flight.
(a) Find the mean and standard deviation for the number of passengers who will claim a reservation.
(b) Find the probability that all passengers who show up for the flight will receive a seat.
(c) Compute the probability that exactly 17 passengers show up for the flight.

17. Suppose the distribution of weights of adult male American Landrace pigs is normal with a mean of 480 lbs and a standard deviation of 55 lbs.
(a) What weight is at the 80th percentile?
(b) What proportion of adult male American Landrace pigs weigh between 500 lbs and 600 lbs?
(c) What proportion of adult male American Landrace pigs weigh less than 600 lbs?
(d) What proportion of adult male American Landrace pigs weigh more than 600 lbs?
(e) Find an interval whose center is the mean and which contains the weights of 99% of adult male American Landrace pigs. Hint: first find a z-interval so that 99% of the normal curve lies between -z and z.

18. Alaska Airlines has found that 94% of people with tickets will show up for their Friday afternoon flight from Seattle to Ontario. Suppose that there are 153 passengers holding tickets for this flight, that the jet can carry 146 passengers, and that the decisions of passengers to show up are independent of one another.
(a) Verify that the normal approximation of the binomial distribution can be used for this problem.
(b) What is the probability that 146 or fewer passengers will show up for the flight (i.e., all passengers who show up will receive a seat)?

19. (Review of some terminology from Section 7.1) In a state with 459,341 voters, a poll of 2300 voters finds that 45 percent support the Republican candidate, whereas in reality, unknown to the pollster, 42 percent support the Republican candidate.
(a) What is the value of the statistic of interest?
(b) What is the value of the parameter of interest?
(c) Describe the population of interest.
(d) In general, is it true that given a certain population, the parameter of interest will not change under repeated sampling? Explain.

20. (From p. 352 #18) The taxi and takeoff time for commercial jets is a random variable x with a mean of 8.5 minutes and a standard deviation of 2.5 minutes. You may assume the jets are lined up on the runway so that one taxis and takes off immediately after the other, and they take off one at a time on a given runway. What is the probability that for 36 jets on a given runway the total taxi and takeoff time will be
(a) less than 320 minutes?
(b) more than 275 minutes?
(c) between 275 and 320 minutes?

21. (From p. 385 #6) The Roman Arches is an Italian restaurant. The manager wants to estimate the average amount a customer spends on lunch Monday through Friday. A random sample of 115 customers' lunch tabs gave a mean of $9.74 with a standard deviation s = $2.93.
(a) Find a 95% confidence interval for the average amount spent on lunch by all customers.
(b) For a day when the Roman Arches has 115 lunch customers, use part (a) to estimate the range of dollar values for the total lunch income that day.

22. A recent Gallup poll of 1006 adult Americans found that 37% of those asked oppose cloning of human organs.
(a) Find a 98% confidence interval for the proportion of adult Americans that oppose cloning of human organs.
(b) Based on the answer to (a), would it be appropriate for the Gallup organization to state that less than 38% of adult Americans oppose cloning of human organs? Explain.

23. A company claims that in one coat, 1 gallon of its brand of paint will cover at least 350 square feet on average. A random sample of ten 1-gallon cans produced the following data.
Area Covered (Square Feet): 342, 378, 358, 364, 381, 392, 339, 356, 386, 347
Note: for this data, $\sum x$ = 3643 and $\sum x^2$ = 1330395.
(a) Construct a 99% confidence interval for the mean; assume that the data comes from a normal distribution.
(b) Repeat question (a) for a 95% confidence interval.
(c) Do you think the company’s claim is true? Explain.

24. In a 1999 survey of 80 Computer Science graduates and 110 Electrical Engineering graduates, it was found that the Computer Science graduates had a mean starting salary of $48,100 with a standard deviation of $7,200, while the Electrical Engineering graduates had a mean starting salary of $52,900 with a standard deviation of $5,300.
(a) Find a point estimate for the difference in average starting salaries for Computer Science and Electrical Engineering graduates.
(b) Let $\mu_1$ be the population mean starting salary for the Computer Science graduates and let $\mu_2$ be the population mean starting salary for the Electrical Engineering graduates. Find a 96% confidence interval for $\mu_1 - \mu_2$.
(c) Based on the interval in (b), would you be comfortable saying that the mean starting salary for Computer Science graduates is less than that for Electrical Engineering graduates? Explain.

25. A Gallup News Release (see http://www.gallup.com/poll/releases/pr031030.asp) reported that in July 1996, 69% of those surveyed supported assisted suicide for the terminally ill, while in May 2003, 72% of those surveyed supported assisted suicide for the terminally ill. Assume the 1996 poll surveyed 1103 adult Americans, while the May 2003 poll surveyed 1009 adult Americans. Find a 95% confidence interval for p1 - p2, where p1 is the proportion of adult Americans supporting assisted suicide in July 1996, and p2 is the proportion of adult Americans supporting assisted suicide in May 2003.

26. (a) If a population has a standard deviation of 25, what sample size would be necessary in order for a 98% confidence interval to estimate the population mean within 4?
(b) What size of random sample is needed by the Gallup organization to estimate a population proportion within .03 with 95% confidence? In your calculation assume that there is no preliminary estimate for p.
(c) Repeat (b), but assume that there is a preliminary estimate that p = .33.

27. A vendor was concerned that a soft drink machine was not dispensing 6 ounces per cup, on average. A sample of size 40 gave a mean amount per cup of 6.05 ounces and a standard deviation of .15 ounce.
(a) Find the P-value for an hypothesis test to determine if the mean is different from 6 ounces.
(b) For which of the following levels of significance would the null hypothesis be rejected? (i) .01 (ii) .04 (iii) .05 (iv) .10
(c) For each case in part (b), what type of error has possibly been committed?
(d) Find a 96% confidence interval for the mean amount of soda dispensed per cup.
(e) Is your interval in (d) consistent with the test conclusion in (b)(ii)? Explain.
(f) Based on your answer to (b)(i), would you expect a 99% confidence interval to contain 6?
(g) Suppose that the population standard deviation is 0.15. What sample size would be needed so that the maximum error in a 96% confidence interval is E = .01?

28. (a) Suppose that a February Gallup poll of 1200 randomly selected voters found that 53 percent support George W. Bush's energy policy. Conduct an hypothesis test at a level of significance of .01 to test whether the true voter population support for George W. Bush's energy policy in February was less than 56 percent.
(b) Report the P-value of the test in (a) and give a practical interpretation of it.

29. (From p. 536 #8) A reading test is given to both a control group and an experimental group (which received special tutoring). The average score for the 30 subjects in the control group was 349.2 with a standard deviation of 56.6. The average score for the 30 subjects in the experimental group was 368.4 with a standard deviation of 39.5.
Use a 4% level of significance to test the claim that the experimental group performed better than the control group.

30. (From p. 539 #18) A random sample of 288 voters registered in the state of California showed that 141 voted in the last general election. A random sample of 216 registered voters in Colorado showed that 125 voted in the last general election. Do these data indicate that the population proportion of voter turnout in Colorado is higher than that in California? Use a 5% level of significance.

31. The following sample data concerns the number of years a student studied German in school versus their score on a proficiency test.
Years (x): 3 | 4 | 4 | 2 | 5 | 3 | 4 | 5 | 3 | 2
Test Score (y): 57 | 78 | 72 | 58 | 89 | 63 | 73 | 84 | 75 | 48
Note: $\sum x$ = 35, $\sum y$ = 697, $\sum x^2$ = 133, $\sum y^2$ = 50085, $\sum xy$ = 2554.
(a) Find the equation of the least squares line for this data.
(b) Use your line from (a) to predict the score on the proficiency test of a person who had 3.5 years of German.
(c) Use the regression line in (a) to predict the number of years of German required to achieve a proficiency score of 75.
(d) Compute the correlation coefficient r for this data. What does this coefficient suggest about a linear relationship between the number of years German was studied in school and test scores for this sample? That is, determine whether it is a good fit, and whether it indicates a positive or negative linear relationship.
(e) Compute the coefficient of determination, and interpret what it means.

32. A local radio station claims that 15 percent of all people in Riverside say it is their favorite station, 65 percent of all people in Riverside listen to it occasionally, while 20 percent never listen to it. Suppose you surveyed 200 randomly selected people in Riverside and found that of those 200 people, 20 claimed it was their favorite station, 131 said they listen to it occasionally, while 49 never listen to it.
Conduct an hypothesis test at a level of significance of .05 to determine whether the station’s claim concerning the distribution of listeners is correct. Make sure to state the critical region for your test.

33. (a) A company wishes to check whether the mean weekly production levels at their five factories are equal using the method of analysis of variance.
(i) State the conditions that should be satisfied by the populations and samples in order to use the method of analysis of variance.
(ii) State the null and alternative hypotheses.
(iii) Given that random samples of size 4 were obtained from each factory, what is the critical region for the hypothesis test if it is conducted at a level of significance of 5%?

Some Short Answer Questions

34. (a) State the Central Limit Theorem.
(b) Let x be a random variable from any population with mean $\mu$ and standard deviation $\sigma$. What type of distribution will the distribution of $\bar{x}$, from samples of size n with n large, approximate?
(c) Find the mean and standard deviation for the distribution of $\bar{x}$ based on a random sample of size 64 from a population with mean 37 and standard deviation 24.

35. Find the mean and standard deviation for the distribution of $\bar{x}$ based on a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$.

36. What conditions on a sample are necessary in order to find a confidence interval for a proportion?

37. What conditions are necessary in order to find a confidence interval for a mean with a large sample?

38. If a population has a standard deviation of 5, what sample size would be necessary in order for a 99% confidence interval to estimate the population mean within 2?

39. What size of random sample is needed by the Gallup organization to estimate a population proportion within .02 with 90% confidence? In your calculation assume that there is no preliminary estimate for p.

40. Repeat #39, but assume that there is a preliminary estimate that p = .33.

41.
Suppose you are to construct a 95% confidence interval for the mean using a sample of size n = 19 from a normal population with unknown standard deviation. What value of $t_c$ would you use in your confidence interval formula?

42. Suppose you are to construct a 92% confidence interval for the mean using a large sample. What value of $z_c$ would you use?

43. Which method (large sample/small sample) would you use to find a confidence interval from a normal population with known standard deviation given a sample size of 15?

44. What formula do you use for finding confidence intervals for: (i) means using large samples; (ii) means using small samples; (iii) proportions; (iv) difference of two means; (v) difference of two proportions?

45. Describe the characteristics of a binomial experiment.

46. When does one use the formula $z = \frac{x - \mu}{\sigma}$ versus $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$?

47. In terms of standard deviations, what does the formula $z = \frac{x - \mu}{\sigma}$ measure? What is the significance of a positive versus a negative z?

48. There are infinitely many different normal distributions (different means and different standard deviations lead to different distributions). Why is only one table necessary?

49. What assumptions must be made on the sample in order to use a normal approximation to the binomial distribution?

50. Because binomial probabilities can be computed directly, what advantage is there to using a normal approximation to the binomial distribution?

51. What formula is used for computing binomial probabilities? Explain what all of the variables are in the formula. How is the formula used for computing something like $P(k \le r \le m)$?

52. Find the critical region for a two-tailed hypothesis test on a mean that is conducted using a large sample at a level of significance of $\alpha$ = .01.

53. Find the critical region for a right-tailed test on a proportion at a level of significance of $\alpha$ = .04.

54. Find the critical region for a left-tailed test on a large sample mean at a level of significance of $\alpha$ = .04.

55.
If you reject the null hypothesis when it is true, what type of error have you committed?

56. If you fail to reject the null hypothesis when it is false, what type of error have you committed?

57. Suppose you found a z-value of -8.05 in a right-tailed hypothesis test at a level of significance of $\alpha$ = .05. Would you reject the null hypothesis?

58. Suppose you found a z-value of -8.05 in a two-tailed hypothesis test at a level of significance of $\alpha$ = .05. Would you reject the null hypothesis?

59. What do you call the maximum probability at which you are willing to risk making a type I error?

60. Suppose you are conducting a two-tailed test on the mean and arrive at a sample statistic of z = 1.4. What P-value would you report?

61. Suppose you are conducting a two-tailed test on the mean and arrive at a sample statistic of z = 1.72. What P-value would you report?

62. If you find a P-value of .03 for an hypothesis test, what is the probability that the alternative hypothesis is true?

63. Suppose you conducted a left-tailed test on the mean and arrived at a sample statistic of z = -1.72. What P-value would you report?

64. Suppose you conducted a right-tailed test on the mean and arrived at a sample statistic of z = -1.72. What P-value would you report?

65. Suppose the test in #63 was conducted with a level of significance of 5%. Would you reject the null hypothesis? Repeat this question for #64.

66. Given a P-value of 0.0376, for which levels of significance would you reject the null hypothesis: (i) 10% (ii) 5% (iii) 1%?

67. Given a test comparing two proportions, what formula do you use for computing the sample statistic z?

68. When and where do you use the pooled estimate $\hat{p} = \frac{r_1 + r_2}{n_1 + n_2}$?

69. What formula do you use for computing the sample test statistic in hypothesis tests on: (i) mean (large sample) (ii) proportion (iii) difference of two means?

70. In formulas pertaining to computing least squares lines, what does the quantity n represent?

71.
In what types of problems is the method of one-way analysis of variance used?

72. What assumptions are made on the samples and/or populations when using the one-way analysis of variance?

73. In what type of problem would one use a goodness of fit test?

74. What assumptions must be made on the sample and/or population when conducting a goodness of fit test?

75. What values of the correlation coefficient r indicate a good linear fit? What does the coefficient of determination measure?
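
Note: Several questions above give summary sums instead of raw data (for example $\sum x$ and $\sum x^2$ in questions 8 and 23), and several ask for a sample size (questions 26, 38, 39 and 40). The following Python sketch is one possible way to check such hand calculations. It is not part of the original review: the function names are hypothetical, and it uses the exact normal critical value from Python's standard library rather than a rounded table value, so a result may occasionally differ by one from an answer worked with a printed table.

```python
import math
from statistics import NormalDist


def summary_from_sums(n, sum_x, sum_x2, population=True):
    """Mean and standard deviation from n, sum of x, and sum of x^2.

    Uses the computation formula SS = sum(x^2) - (sum(x))^2 / n, then divides
    by n (population) or n - 1 (sample) before taking the square root.
    """
    mean = sum_x / n
    ss = sum_x2 - sum_x ** 2 / n
    variance = ss / n if population else ss / (n - 1)
    return mean, math.sqrt(variance)


def z_critical(confidence):
    """Two-sided critical value z_c for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size_mean(sigma, margin, confidence):
    """Smallest n with z_c * sigma / sqrt(n) <= margin."""
    return math.ceil((z_critical(confidence) * sigma / margin) ** 2)


def sample_size_proportion(margin, confidence, p=0.5):
    """Smallest n for a proportion; p = 0.5 when no preliminary estimate exists."""
    return math.ceil(p * (1 - p) * (z_critical(confidence) / margin) ** 2)


if __name__ == "__main__":
    # Question 8(a): population mean and standard deviation from the sums.
    print(summary_from_sums(4000, 85_400, 1_904_290))
    # Question 23: sample mean and sample standard deviation from the sums.
    print(summary_from_sums(10, 3643, 1_330_395, population=False))
    # Question 38 and question 26(b).
    print(sample_size_mean(sigma=5, margin=2, confidence=0.99))
    print(sample_size_proportion(margin=0.03, confidence=0.95))
```

Passing population=False switches to the n - 1 divisor used for a sample standard deviation, and passing p=0.33 to sample_size_proportion covers the case where a preliminary estimate is available.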