Statistics 1100   Sec: 08-27   Test 2-A-2   October 26, 2011   Kathleen McLaughlin

Name ___________________________   People Soft # _____________________   Section Number ___________________

Information for Questions 1 - 5:
The rise in obesity rates in the U.S. has been blamed for the rise in diabetes rates. I decided to explore the relationship between these 2 variables by looking at a sample of state data (26 different states). I chose state obesity rates (% of state population who are obese) as my predictor variable and state diabetes rates (% of state population with diabetes) as my response variable. Please use the MINITAB output displayed here to answer the questions that follow. (Note: I do not show the data here. Use the output and graphs to answer Questions 1-5.)

Regression Analysis: diabetes versus obesity

The regression equation is
diabetes = 1.09 + 0.350 obesity

Predictor   Coef     SE Coef   T      P
Constant    1.088    3.027     0.36   0.722
obesity     0.3498   0.1120    3.12   0.005

S = 1.71772   R-Sq = 28.9%   R-Sq(adj) = 25.9%

Analysis of Variance
Source           DF   SS       MS       F      P
Regression       1    28.767   28.767   9.75   0.005
Residual Error   24   70.814   2.951
Total            25   99.580

Unusual Observations
Obs   obesity   diabetes   Fit
4     29.8      6.300      11.513
26    19.6      7.600      7.945

Predicted Values for New Observations
New Obs   Fit      SE Fit   95% CI             95% PI
1         10.533   0.337    (9.837, 11.229)    (6.920, 14.146)

Values of Predictors for New Observations
New Obs   obesity
1         27.0

[Figure: Fitted Line Plot, diabetes = 1.088 + 0.3498 obesity (S = 1.71772, R-Sq = 28.9%, R-Sq(adj) = 25.9%); y-axis: diabetes, x-axis: obesity]
[Figure: Residuals Versus obesity (response is diabetes); y-axis: Residual, x-axis: obesity]

1. How much of the variation in state diabetes rates can be explained by state obesity rates?
   a.) 35%
   b.) 1.09%
   c.) 28.9%
   d.) 1.71772%
   e.) 72.2%

2. The predicted diabetes rate for a state with an obesity rate of 25% is:
   a.) 8.09%
   b.) 9.84%
   c.) 11.59%
   d.) 9.14%
   e.) 10.26%

3.
For Unusual observation #4, calculate the Residual.
   a.) 23.5%
   b.) 18.287%
   c.) -5.213%
   d.) -2.3%
   e.) -7.513%

4. Consider all the states with obesity rates of 27%. We would expect the average diabetes rate in those states to lie in the interval:
   a.) (10.533% ± 0.05%)
   b.) (27% ± 2.5%)
   c.) (27% ± 5%)
   d.) (6.920%, 14.146%)
   e.) (9.837%, 11.229%)

5. Look at all the Minitab output and comment on how well obesity rates predict diabetes rates.
   a.) Most of the variation in diabetes rates can be explained by obesity rates. State obesity rates are very strong predictors of diabetes rates.
   b.) Much of the variation in diabetes rates is NOT explained by obesity rates. State obesity rates are not strong predictors of state diabetes rates.
   c.) The two graphs clearly indicate that the linear model is incorrect and that a quadratic model would be more appropriate.
   d.) A causal relationship between obesity and diabetes is clearly demonstrated in this model.

Information for Questions 6 - 10:
The Internet Movie Database monitors the gross revenues for all major motion pictures. The accompanying table gives both the U.S. and Worldwide gross revenues for a random sample selected from the films that were the highest grossing films in the U.S.

Movie Title                                            Domestic Gross          Worldwide Gross
                                                       (millions of dollars)   (millions of dollars)
Titanic                                                600.8                   1835.3
Shrek 2                                                436.5                   880.9
E.T.                                                   434.9                   756.7
Star Wars: Episode I                                   431.1                   922.3
Spider-Man                                             403.7                   806.7
Star Wars: Episode III                                 380.2                   848.5
The Passion of the Christ                              370.3                   604.4
The Lord of the Rings: The Two Towers                  340.5                   921.6
Finding Nemo                                           339.7                   865.0
Spider-Man 3                                           336.5                   885.4
Forrest Gump                                           329.6                   679.4
Iron Man                                               318.3                   571.8
Indiana Jones and the Kingdom of the Crystal Skull     317.0                   783.0
Pirates of the Caribbean: At the World’s End           309.4                   958.4
Independence Day                                       306.1                   811.2

I was interested in seeing whether I could use the U.S. gross revenues to predict the Worldwide gross revenues. That would lead me to believe that movies that are successful here in the U.S.
have a worldwide appeal.

6. To begin your analysis, do a scatterplot of the data and comment on the graph.
   a.) The graph shows a weak negative linear association between the variables.
   b.) There is a lot of scatter in the data, indicating a strong linear relationship between the 2 variables.
   c.) The graph shows a positive linear trend but it looks like it is due mostly to one outlier.
   d.) The graph shows a strong positive quadratic trend.

7. Fit a linear model to the data and calculate the R-square value. Store the model in Y1.
   a.) ŷ = -470.8 + 3.572x, R-Square = 77.9%
   b.) ŷ = 492.4 + 0.9347x, R-Sq = 9.9%
   c.) ŷ = 0.768 - 205.4x, R-Sq = .5900%
   d.) ŷ = -205.4 + 2.87x, R-Sq = 59%
   e.) ŷ = 310.5 - 3.42x, R-Sq = 25.3%

8. Using the model you stored in Y1, calculate the predicted y-value and the residual for Shrek 2.
   a.) ŷ = 886.62, Residual = -36.12
   b.) ŷ = 1030.5, Residual = -108.2
   c.) ŷ = 652.4, Residual = -207.5
   d.) ŷ = 1046, Residual = -165.1
   e.) ŷ = 770.81, Residual = 150.79

9. Remove the data for Titanic and recalculate the linear model and R-square value.
   a.) ŷ = 682.7 + 0.344x, R-Sq = 1.97%
   b.) ŷ = 0.140 + 0.344x, R-Sq = .0197%
   c.) ŷ = 0.140 + 682.7x, R-Sq = 1.97%
   d.) ŷ = 0.344 + 682.7x, R-Sq = .0197%
   e.) ŷ = 315.6 - 0.472x, R-Sq = 2.35%

10. Look at the two models (with and without Titanic) and make a statement about the linear relationship between X and Y.
   a.) Based on the 2 models, there appears to be a strong linear relationship between U.S. revenues and Worldwide revenues.
   b.) Even without the Titanic data, the linear relationship still shows the U.S. revenues are good predictors of Worldwide revenues.
   c.) After looking at the scatterplot without the Titanic, I would use a Quadratic model.
   d.) It appears that the strength of the linear relationship is highly dependent on the Titanic data so I would conclude that U.S. revenues are not good predictors of Worldwide revenues.

Information for Questions 11 – 14:
Use of the Internet has grown at an amazing rate!
Here is the data from the very early stages in 1995 to the present day:

Year   Number of Users (in millions)
1995   16
1996   36
1997   70
1998   147
1999   248
2000   361
2001   513
2002   587
2003   719
2004   817
2005   1018
2006   1093
2007   1319
2008   1574
2009   1802
2010   1971

I fit a linear model to the data and the results are displayed here:

Regression Analysis: Number of Users (in millions) versus Years

The regression equation is
Number of Users (in millions) = - 263147 + 132 Years

S = 132.280   R-Sq = 96.0%

[Figure: Scatterplot of Residuals vs Years; y-axis: Residuals, x-axis: Years]

11. What conclusion would you make about the usefulness of this linear model?
   a.) Based on the large R²-value and the curved pattern in the residuals, the linear model is a good model for this data.
   b.) Based on the positive value of the slope, the negative value of the y-intercept and the large R²-value, the linear model is a good model for this data.
   c.) The distinct curved pattern of the residuals tells us that the linear model is not the appropriate model for the data and that other regression models should be considered.
   d.) The distinct curved pattern of the residuals indicates that the residuals are always positive and squared.

12. Fit a quadratic model to the data and calculate the R-square value.
   a.) ŷ = 5.3x² - 254.21x + 2059, R-square = .988
   b.) ŷ = 13.4x² - 2567x + 11921, R-square = .901
   c.) ŷ = 6.306548x² - 25125.9306x + 25026014.77, R-square = .997
   d.) ŷ = .2173433x² - 35.2176554x + 4179098076, R-square = .923
   e.) ŷ = -.00057x² + 24.21x - 7789, R-square = .925

13. Store the predictions in L3 and the Residuals in L4. Do a scatterplot of the residuals (Xlist:L1 and Ylist:L4). Remember to Deselect Y1. Use the TRACE key to find the ‘largest’ residual. (Note: This can be a positive or a negative value.) Return to the lists: L1 – L4.
Which of the following answers contains information on the data point with the largest residual?
   a.) Year: 2009, y: 1802 million, ŷ: 1956.4 million, Residual: -154.4 million
   b.) Year: 2009, y: 1802 million, ŷ: 1757.3 million, Residual: 44.7 million
   c.) Year: 2008, y: 1093 million, ŷ: 1172.7 million, Residual: 24.2 million
   d.) Year: 2006, y: 1093 million, ŷ: 1172.7 million, Residual: -79.7 million
   e.) Year: 2006, y: 1802 million, ŷ: 1284.3 million, Residual: 111.6 million

14. Use your quadratic model that you stored in Y1 to predict the number of Internet users in 2011.
   a.) 2145 million
   b.) 2210 million
   c.) 2725 million
   d.) 2346 million
   e.) 2509 million

Information for Questions 15-16:
Although U.S. citizens value the freedoms and rights of democracy, they often do not vote. Data on x: the number of U.S. citizens eligible to vote (in millions) and y: the number of U.S. citizens who actually did vote (also in millions) in the last eight federal elections was entered into MINITAB and a scatterplot was created and is shown here:

[Figure: Scatterplot of Actual votes cast vs Voting age population; y-axis: Actual votes cast, x-axis: Voting age population]

The following statistics are calculated:

   x̄ = 165.2   s_x = 26.0   ȳ = 88.0   s_y = 10.3   r = .942

Create the regression equation for x: U.S. citizens eligible to vote (in millions) and y: U.S. citizens who actually did vote (in millions). Here are the formulas you need to find the slope and y-intercept for the linear regression equation. Round each value (b and a) to the nearest thousandth:

   b = r (s_y / s_x)   and   a = ȳ - b x̄

15. Find the least squares regression line for predicting ‘actual votes cast’ from ‘voting age population.’
   a.) ŷ = -488.18 + 507.87x
   b.) ŷ = 26.0 - 10.3x
   c.) ŷ = 26.380 + .373x
   d.) ŷ = 88.0 - .942x
   e.) ŷ = 165.2 - 88.0x

16. What percent of the variation in ‘Actual Votes Cast’ can be explained by the linear relationship between ‘Voting Age Population’ and ‘Actual Votes Cast’?
   a.) 64.2%
   b.) 10.3%
   c.) 26.0%
   d.) 88.7%
   e.) 37.3%

17. With the ever increasing price of gasoline, researchers are constantly evaluating automobile data. In one study, data was collected on “car weight (in pounds)” and “miles per gallon.” The data was entered into MINITAB and the following regression equation was generated: ŷ = 45.6 - 0.0052x, where x = ‘car weight in pounds’ and y = ‘miles per gallon.’ Interpret the slope in terms of a 1000 pound increase in the vehicle weight.
   a.) Mileage would increase by .052 miles per gallon for every 1000 pound increase in vehicle weight.
   b.) Mileage would decrease by .052 miles per gallon for every 1000 pound increase in vehicle weight.
   c.) Mileage would increase by 5.2 miles per gallon for every 1000 pound increase in vehicle weight.
   d.) Mileage would decrease by .0052 miles per gallon for every 1000 pound increase in vehicle weight.
   e.) Mileage would decrease by 5.2 miles per gallon for every 1000 pound increase in vehicle weight.

18. Blood types (A, B, AB and O) and Rh factors (+ and -) combine to classify an individual’s blood into exactly one of the following 8 categories: A+, A-, B+, B-, AB+, AB-, O+ and O-. Consider the following statement: “If a person is randomly selected from the population, the probability that he/she has blood type A+ is 1/8.” Must this statement be true?
   a.) Yes, since there are eight different blood classifications.
   b.) No, we cannot assume that the eight different blood classifications are equally likely.
   c.) Yes, the eight different blood classifications are independent so are therefore equally likely.
   d.) No, we cannot assume that the eight different blood classifications are mutually exclusive.
   e.) Yes, since the eight different blood classifications are mutually exclusive.

19. Internet sites often vanish or move, so that references to them can’t be followed. In fact, 13% of Internet sites referenced in papers in major scientific journals are lost within two years after publication.
Suppose you are researching obesity rates in the U.S. You find medical scientific journal articles with links to 4 Internet sites that you are interested in. All the articles were written prior to 2008. What is the probability that all 4 Internet sites are still good and that you can link to them? Assume that the four links are independent. (You can use a tree diagram to model this problem.)
   a.) 0.13
   b.) 0.52
   c.) 0.169
   d.) 0.0003
   e.) 0.573

20. Suppose you roll a fair die 9 times and you get a ‘5’ on each of the 9 rolls. What are the chances that the next roll will be a ‘5’?
   a.) Since each roll is independent, the probability of ‘5’ changes with each roll so we cannot determine an actual value.
   b.) The probability that the 10th roll is ‘5’ is 0.167.
   c.) The probability that the 10th roll is ‘5’ is approximately 0.95 since ‘5’ is now more likely based on the previous rolls.
   d.) The probability that the 10th roll is ‘5’ is 0.05 since ‘5’ is now very unlikely based on the previous rolls.
   e.) Since each outcome is a random event, the probability that the 10th roll is a ‘5’ is 0.50.

Information for Questions 21 – 22:
An individual has a torn tendon and is facing surgery to repair it. The orthopedic surgeon explains the risks to the patient. Infection occurs in 4% of such operations, the repair fails in 15%, and both infection and failure together in 1.5%. (Use a Venn Diagram to model this problem.)

21. What is the probability that the operation succeeds and is free from infection?
   a.) 0.78
   b.) 0.80
   c.) 0.84
   d.) 0.81
   e.) 0.825

22. What is the probability that the repair fails, given that an infection occurs?
   a.) 0.60
   b.) 0.455
   c.) 0.375
   d.) 0.833
   e.) 0.015

Information for Questions 23 and 24:
The Triple Blood Test screens pregnant women for the genetic disorder, Down syndrome (D). This syndrome occurs in about 1 in 800 live births, that is P(D) = 1/800. If the fetus actually has Down syndrome, the Triple Blood Test will result in a positive test with probability 0.89.
And so, the probability of a false negative is 0.11. If the fetus does not have Down syndrome, the Triple Blood Test will result in a negative test with probability 0.75. And so, the probability of a false positive is 0.25. Fill in these probabilities on the branches of the tree diagram.

[Tree diagram: first-level branches D and Dᶜ; each is followed by second-level branches Pos and Neg]

23. Calculate the probability of a Positive Test result.
   a.) 0.0011125
   b.) 0.2496875
   c.) 0.2508
   d.) 0.89
   e.) 0.25

24. Calculate the probability of Down syndrome, given that the test is positive. That is, calculate P(D|pos).
   a.) 0.00618
   b.) 0.89
   c.) 1/800
   d.) 0.0044
   e.) 0.0011125

25. A local church holds an annual raffle to raise money for its sister church in Haiti. The first prize is a weekend in Newport and the second prize is two season passes to the Children’s series at Jorgenson. The church sells 200 tickets at $10 apiece. One of the parishioners buys 20 tickets. Winning tickets are drawn without replacement. What is the probability that she wins EXACTLY one (but not both) of the two prizes? Hint: Use a tree diagram to model this problem.
   a.) 0.10
   b.) 0.09
   c.) 0.181
   d.) 0.20
   e.) 0.236

26. The Monty Hall Problem: The Monty Hall Problem, named after the host of the long-running game show "Let’s Make a Deal," is a statistical puzzle that seems counterintuitive. A recurring deal on the show featured contestants choosing one of three closed doors, with a big prize (like a car) behind one of them and something else, like a goat, behind each of the others. As a contestant, you are asked to choose a door. But before Monty Hall opens the door you chose, he wants to make the game more interesting. He opens one of the other doors to reveal a goat. (Note: Monty Hall knows what is behind each door.) Then he asks: "Are you sure you want the door you chose? Or would you like to switch to the other door?" What should you do and why?
   a.) It does not matter if I SWITCH or STAY.
       Since there are only 2 doors left unopened, the probability is 0.5 that the car is behind either door.
   b.) I should STAY because the probability of Winning by STAYING is 1/3.
   c.) I should SWITCH because the probability of Winning by SWITCHING is 9/10 and the probability of Winning by STAYING is 1/10.
   d.) I should SWITCH because the probability of Winning by SWITCHING is 2/3.

1. During one holiday season, the Texas lottery played a game called the Stocking Stuffer. The price of a ticket for this lottery was $1.00. Shown here are the various prizes and the probability of winning each prize.

   Prize (x)   Probability
   $1000       .00002
   $100        .00063
   $20         .00400
   $10         .00601
   $4          .03403
   $2          .14355
   $0          .81176

   Calculate the expected value for this game and decide whether it is worthwhile, in the long run, to play.
   a.) The expected value is $1.62. This means that in the long run you will make a profit of $0.62 for every dollar you spend so it is worthwhile to play the game.
   b.) The expected value is $0.64. This means that in the long run you will make a profit of $0.64 for every dollar you spend so it is worthwhile to play the game.
   c.) The expected value is $0.64. This means that in the long run you lose $0.36 for every dollar you spend so it is not worthwhile to play the game.
   d.) The expected value is $0.64. This means that in the long run you will lose $0.64 for every dollar you spend so it is not worthwhile to play the game.

2. The probability that a male professional golfer makes a hole-in-one is 1/2780. Suppose 36 professional male golfers play the sixth hole during a round of golf. Let the random variable X be the number of golfers in the group of 36 who make a hole-in-one. Calculate the probability that exactly four of the 36 golfers make a hole-in-one on the sixth hole, as actually happened during the 1989 U.S. Open.
   a.) 0.0005
   b.) 9.7 × 10⁻¹⁰
   c.) 6.3 × 10⁻⁸
   d.) 4.2 × 10⁻²
   e.) 3.6 × 10⁻⁴

3.
According to Information Resources, which publishes data on market share for various products, Oreos control about 10% of the market for cookie brands. Suppose 20 purchasers of cookies are selected randomly from the population. What is the probability that fewer than four purchasers choose Oreos?
   a.) 0.957
   b.) 0.867
   c.) 0.677
   d.) 0.989
   e.) 0.190

Information for Questions 4 and 5:
I asked all 500 students in Statistics last semester to flip a biased coin that I have 10 times and to record the number of Heads. I know that this biased coin has P(Heads) = 0.75 and P(Tails) = 0.25. Define the random variable X as the number of heads for each individual student’s experiment. So, X is a binomial random variable with n = 10 and p = 0.75. Here is a table of the results of the 500 experiments:

   X: No. of Heads   0   1   2   3   4   5    6    7     8     9     10
   Freq:             0   0   0   0   4   32   57   124   148   111   24

4. Use the output in the above table to calculate a relative frequency estimate of P(5 ≤ X ≤ 7).
   a.) 0.93
   b.) 0.186
   c.) 0.213
   d.) 0.426
   e.) 0.75

5. Now use your calculator and the values for n and p given in the information above to calculate the theoretical probability: P(5 ≤ X ≤ 7).
   a.) 0.2206
   b.) 0.2044
   c.) 0.4547
   d.) 0.3963
   e.) 0.75

Information for Questions 6 and 7:
The National Center for Health Statistics reports that 25% of all Americans between the ages of 65 and 74 have a chronic heart condition. Suppose you live in a state where the environment is conducive to good health and low stress and you believe the conditions in your state promote healthy hearts. To investigate this theory, you conduct a survey of 150 persons 65 to 74 years of age in your state.

6. On the basis of the figure from the National Center for Health Statistics, calculate the mean and standard deviation of the binomial random variable, X, the number of persons with a chronic heart condition in a sample of 150 persons.
   a.) µ = 37.5, σ = 5.30
   b.) µ = 37.5, σ = 28.125
   c.) µ = 150, σ = 28.125
   d.) µ = .25, σ = 5.30
   e.) µ = .25, σ = .75

7.
Based on the sample of 150 persons, would you be surprised to observe X = 21 persons in the 65-74 age group in your state with chronic heart disease? What would this tell you about the environmental conditions in your state?
   a.) Yes, X = 21 would be very unusual. I would say that the environmental conditions in my state may be conducive to promoting healthy hearts.
   b.) No, X = 21 would not be all that unusual. The sample data does not support the belief that the environmental conditions in my state promote healthy hearts any more than the conditions in other states.
   c.) No comment can be made since the criteria for a binomial experiment have not been met.
   d.) Yes, X = 21 would be slightly unusual since 21 is approximately one standard deviation greater than the mean. Environmental conditions in my state are certainly better than in other states.
   e.) There is an error in the sampling plan. All sample results must fall within 1 standard deviation of the mean. These results cannot tell us anything about the environmental conditions in my state.

Information for Questions 8 - 11:
The S & P 500 is a collection of 500 stocks of publicly traded companies. Using data obtained from Yahoo!Finance, the monthly rates of return of the S & P 500 from 1950 through 2010 are normally distributed. The mean rate of return is 0.007443 and the standard deviation is 0.04135.

8. What is the probability that a randomly selected month has a positive rate of return? That is, what is P(X > 0)?
   a.) 0.68
   b.) 0.36
   c.) 0.57
   d.) 0.43
   e.) 0.50

9. What is the probability that the mean monthly return for a one year period is positive? That is, for n = 12 months, what is P(X̄ > 0)?
   a.) 0.73
   b.) 0.43
   c.) 0.57
   d.) 0.27
   e.) 0.50

10. What is the probability that the mean monthly return for a five year period is positive? That is, for n = 60 months, what is P(X̄ > 0)?
   a.) 0.92
   b.) 0.50
   c.) 0.08
   d.) 0.84
   e.) 0.16

11. Consider the sampling distribution of X̄ in Example #9 and Example #10.
What effect does the sample size have on the standard deviation of the sampling distribution of X̄?
   a.) As n increases, the standard deviation of the sampling distribution increases.
   b.) As n increases, the standard deviation of the sampling distribution decreases.
   c.) Since the underlying population is normal, the sample size has no effect on the standard deviation of the sampling distribution.
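
Study note for Questions 9 - 11: the following is a minimal sketch in Python, not part of the original test, of how the standard deviation of the sampling distribution of the sample mean can be tabulated for different sample sizes. It assumes independent monthly returns and uses the standard result σ/√n; the function name standard_error is a hypothetical helper chosen for illustration, and the value of σ is the one given in the information for Questions 8 - 11.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean of n independent observations.

    Assumes each observation has population standard deviation sigma.
    """
    return sigma / math.sqrt(n)


# Population standard deviation of monthly returns, from the problem statement.
sigma = 0.04135

# n = 1 is a single month; n = 12 and n = 60 match Questions 9 and 10.
for n in (1, 12, 60):
    print(f"n = {n:2d}: sd of sample mean = {standard_error(sigma, n):.5f}")
```

Comparing the printed values across n = 1, 12 and 60 gives a concrete check on the pattern that Question 11 asks about.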