Download Statistics Review Chapters 1-8

Statistics Review In the paper “Reproduction in Laboratory colonies of Bank Vole,” the authors presented the results of a study of litter size. (A vole is a small rodent with a stout body, blunt nose, and short ears.) As each new litter was born, the number of babies was recorded, and the accompanying results were obtained. 1 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 9 9 9 10 10 11 The authors also kept track of the color of the first born in each litter. (B = brown, G = gray, W = white, and T = tan) B W B G B B G B G B B B T G T B B B W G T W B G B B B B T T W B B B B G B B G W T T B B T T B G T B G T G T W B G G B G G G W B W B W T B T 1. Which variable, litter size or color, is categorical? 2. Which variable is quantitative? Litter Size 3. Make a bar chart of the colors. 45 41 40 Frequency 35 30 25 25 21 20 13 15 10 5 0 Brown Gray White Color Tan G B B W B T G T G W B T B T G T T W B G B B G T B G G G B W 4. Make a histogram of the litter sizes. 5. Make a dotplot of the litter sizes. 6. Are there any outliers in the histogram or dotplot? Yes, there are outliers. Use the outlier rule to prove or disprove outliers, in both directions!!! Q1 1.5  IQR  5 1.5 2  2 and Q3 1.5  IQR  7 1.5 2 10 So any numbers less than 2 or more than 10 are considered outliers. So 1 and 11 are outliers. 7. Describe the shape of the histogram (symmetric or skewed). Approximately Symmetric  8. Find the mean of the litter sizes. x1  x 2  ... x n  x  5.87 n You can use 1-Var Stats XList. 9. Is the mean resistant to outliers? The mean is NOT resistant to outliers. Outliers can and do effect the mean. 10. Find the median of the litter sizes. M = 6, put the numbers in order (if they are not) and then find the middle one. If  there are 2 middle numbers, it is the average of the 2 numbers. 11. Is the median resistant to outliers? Yes, the median IS resistant to outliers. Outliers do not effect where the middle number is. 12. Find the range of the litter sizes. The range is from 1 – 11, or 10. 13. Find the 5-number summary of the litter sizes. The 5 # Summary is {1, 5, 6, 7, 11} (Min, Q1, Median, Q3, Max) 14. What is the interquartile range? IQR = Q1 – Q3 = 2 15. Make a boxplot of the litter sizes. 16. Find the variance of the litter sizes. (x1  x ) 2  (x 2  x ) 2  ... (x n  x ) 2 s   3.2860 n 1 2 x To get credit, you MUST plug in the first 2 sets of numbers and the last one. Just writing the formula with variables, doesn’t count! 17. Find the standard deviation of the litter sizes. #16  1.8127 Remember you can use 1-Var Stats to find these, but you must plug the numbers into the formula for full credit  18. Is standard deviation resistant to outliers? No, standard deviation is NOT resistant to outliers. Since we use x-bar in the formula and it is not resistant, neither is anything that uses it. 19. Find the degrees of freedom of the litter sizes. n-1 = 99 20. What is the area under a density curve? The area under a density curve is always 1. 21. The (mean or median) of a density curve is the equal-areas point, the point that divides the area under the curve in half. 22. The (mean or median) of a density curve is the balance point, at which the curve would balance if made of solid material. 23. If a density curve is skewed to the right, the (mean or median) will be further to the right than the (mean or median). The average gets “pulled” towards the extremes/outliers. 24. What is the difference between x and  ? X-bar is the average of a sample (the “real” or actual average) or the average without weight.  is the population mean, ideal mean (“statistical guess”), or mean for a Random Variable. 25. What is the difference between s and  ? Same as #24. s is the standard deviation of a sample (“actual” standard deviation).  is the standard deviation of a population, idealized standard deviation, or for a Random Variable. 26. Normal curves are density curves that are single peaked, bell-shaped, and symmetric. (See page 85) 27. How do you find the inflection points on a normal curve? The inflection points are where the curve turns from concave down to concave up or exactly 1 standard deviation away from the mean. 28. Sketch the graph of N(266, 16), the distribution of pregnancy length from conception to birth for humans. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. 29. What is the 68-95-99.7 rule? At approximately 68%, 95% and 99.7% will be where 1, 2 and 3 (respectively) standard deviations will fall away from the mean of a normal curve.  30. Using the empirical rule (the 68-95-99.7 rule), find the length of the longest 16% of all pregnancies. Sketch and shade a normal curve for this situation. The longest 16% of all pregnancies are 282 days and greater. 68% (1 standard deviation) is in the middle, so 16% is on each side. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. 31. Using the empirical rule, find the length of the middle 99.7% of all pregnancies. Sketch and shade. The middle 99.7% (3 standard deviations) of pregnancies are between 218 and 314 days. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. 32. Using the empirical rule, find the length of the shortest 2.5% of all pregnancies. Sketch and shade. The shortest 2.5% (2 standard deviations, so that 95% is in the middle) of all pregnancies is at 234 days and less. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. 33. Using the empirical rule, what percentile rank is a pregnancy of 218 days? The percentile rank for 218 days is the area of the curve less than 218 days, which is the .15th percentile. Remember percentile is everyone at or below that value. Use the Z formula and the Z table to answer questions 34 – 43. Use the following formula Z xx s 34. What percentile rank is a pregnancy of 298 days? We want the percent of pregnancies at or below 298 days. Z 298  266  2, so we get .9772 in the table. 16 35. What percentile is a pregnancy of 250 days? Z 298 days is at the 97.72nd percentile. 250  266  1 . 16 Using the table, we get .1587. So 250 days is at the 15.87th percentile. 36. What is the percentile of a pregnancy of 266 days? Z 266  266  0. 16 Z 279  266  .8125 . 16 percentile. 37. What z-score does a pregnancy of 279 days have? You don’t have to find the percentile/percentage. It just asks for the z-score. So we get .5, so 266 days is at the 50th So 279 days has a z-score of .8125. 38. What percent of humans have a pregnancy lasting less than 279 days? Sketch and shade a normal curve. Using the answer from #37, look for the area of the shaded region for a z-score of .82. We get .7910, so 79.1% of all pregnancies last less than 279. Use InvNorm to check this answer QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. 39. What z-score does a pregnancy of 257 days have? z 257  266  .5625 16 So 257 days has a z-score of -.5625. 40. What percent of humans have a pregnancy lasting less than 257 days? Sketch and shade. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. P(Z  .5625)  .2877 , you have to look up z=-.56 in the table since it only allows 2 decimal places. So 28.77% 41. What percent of humans have a pregnancy lasting longer than 280 days? Sketch and shade. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. P(Z  280)  P(X  280  266 )  P(Z  .875)  .8106 , so 81.06% 16 42. What percent of humans have a pregnancy lasting between 260 and 270 days? Sketch and shade. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. P(260  X  270)  P( 260  266 270  266 Z )  P(.375  Z  .25)  .2467 , Look up z<.25 and 16 16 then subtract z<-.38. So 24.67% 43. Would you say pregnancy length is a continuous or discrete variable? Justify. Continuous because you could find down to the days/mins/sec of pregnancy, if possible . 44. You have normal distributions on your TI-83. Use these functions to check your answers to problems 38, 40, 41, and 42. You can use Normalcdf (lower, upper, mean, standard deviation) to check your answers for problems where you know the days. If you are given the percentage, then you would use InvNorm(area below, mean, standard deviation). Remember these answers should be very close to what you get from the table. You do NOT have to use the tables on the Final or the AP exam; they are they incase you need them. If you do use the calculator, so NOT use “calculator speak” as replacement for work!!  45. How long would a pregnancy have to last to be in the longest 10% of all pregnancies? You can use the table, or InvNorm in your calculator. In the table, look up .9000 (or the closest number possible) inside the table (not a znumber). You get z=1.28, now solve for x. 1.28  x  266 . 16 X=286.48 days or longer (don’t forget “or longer”) You can also use InvNorm(.9, 266, 16) = 286.5048 days or longer. Don’t forger you must show work using the formula. “Calculator speak” means nothing and is NOT considered work! 46. How short would a pregnancy be to be in the shortest 25% of all pregnancies? Do the same thing as #45. Using the  With the calculator 255.2082 days or less.. table, you get 255.28 days or less. 47. How long would a pregnancy be to be in the middle fifth of all pregnancies? This does NOT say 50%!! It says 20%, and the middle 20%. So you want to know between 40% & 60%. Table = between 262 & 270 days. Calculator = between 261.9464 & 270.0536 days. 48. Does the vole information from the beginning of this review seem to be normal? Justify by checking actual percentages within 1, 2, and 3 standard deviations of the mean. Check to see if the percentages of the 1,2,3 standard deviations are relatively close to 68%, 95%, and 99.7%. 5.87 1s  4.0573 & 7.6827 , there are 62% of the numbers (5, 6 & 7) that fall in that range. 5.87  2s  2.4456 & 9.4954 , there are 94% in that range. 5.87  3s  .4318 &11.3084 , and finally 100% in this range. This is REALLY close to the correct values, so, these seem to be normal.   49. Make a back-to-back split stemplot of the following data:  Reading Scores 4th Graders 12 15 18 20 20 35 35 35 36 37 39 40 7th Graders 1 12 15 18 18 30 30 31 33 33 33 35 22 42 20 36 25 26 28 29 31 32 23 23 24 25 27 28 50. Make a comparison between 4th grade and 7th grade reading scores based on your stemplot. 7th graders have a smaller min & max. 4th graders might have an outlier at 1.  51. What is the mode of each set of scores? 4th grade = 35. 7th grade = 33. 52. Is the score of “1” for one of the 7th graders an outlier? Test using the 1.5 IQR rule. To find IQRs, we use the rule that any numbers beyond the values of Q1-1.5IQR & Q3+1.5IQR are outliers. Q1=19, Q3=32, IQR = 32-19=13. 191.5*13=-.5. No, 1 is not an outlier. 53. What is the difference between a modified boxplot and a regular boxplot? Why is a modified boxplot usually considered better? Modified boxplots show outliers and regular boxplots do not. More detail is always better, especially since outliers can mess up a lot of different calculations. So it’s good to know if there are any. 54. What are the four principles that guide the examination of data? (See page 121) *What individuals do the data describe? *What exactly are the variables? How are they measured? *Are all the variables quantitative or is at least one a categorical variable? *Do you want simply to explore the nature of the relationship, or do you think that some of the variables explain or even causes changes in others? That is, are some of the variables response variables and others explanatory variables? 55. Graph the following hot dog data: Calories Sodium (milligrams) 108 149 130 350 132 345 135 360 138 360 140 375 144 380 145 390 150 400 163 415 167 400 172 420 176 450 180 500 184 505 Calories(cont.) Sodium (milligrams) (cont.) 195 500 200 515 Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. 56. What is the response variable? Sodium is the response. 57. What is the explanatory variable? Calories are the explanatory variable. 58. What is the direction of this scatterplot? (positive, negative…) 59. What is the form of this scatterplot? (linear, exponential…) 60. What is the strength of this scatterplot? (strong, weak…) Pretty strong, if it wasn’t for that 1 outlier. Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. 61. Are their clusters? If so, where? There are 2 main clusters. 62. Are their outliers? (Outliers in a scatterplot have large residuals.) There are 2 ways to find residuals. The long way: y  yˆ . First do LinReg (#8). Then go to an empty list, an make it = Ylist – Y1(Xlist). Then plot this new list with your original Xlist. The fast way: First do LinReg (#8). Go to plot a scatter plot with your Xlist & “Resid” (which is   Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. found under List, but you MUST do LinReg first!). This is the original X & Y lists with the Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. equation . This is the residual plot 63. If there are outliers, are they influential? Yes, and it is influential because it will dramatically change the LinReg if it were deleted. 64. Calculate the correlation. x n  x yx  y  1 x1  x y1  y  x 2  x y2  y  r     ...   .9195 To get credit, you    n 1 s s s s s s             y x y x y  x  MUST plug in the first 2 sets of numbers and the last one. Just writing the formula with variables, doesn’t count! 65. Calculate the correlation without the point (108, 149). .9587 (All answers in this answer key will now no longer have this point calculated) 66. Using problems 64 and 65, check your answer to problem 63. Yes, influential outliers not only change the Line of Best Fit, but it will also change the correlation. 67. What two things does correlation tell us about a scatterplot? Direction & Strength. 68. If I change the units on sodium to grams instead of milligrams, what happens to the correlation? Nothing, correlation is not affected by unit changes. It describes the relationship between the variables. No matter how they are measured, that doesn’t change. 69. What is the highest correlation possible? -1 & 1. Remember that -1 is a “high” correlation, just in the negative direction. 70. What is the lowest correlation possible? 0 71. Correlation only applies to what type(s) of relationship(s)? Quantitative, you cannot put categorical/words into the formula. 72. Is correlation resistant to outliers? No, as we saw in # 65. Any formula with x-bar in it is not resistant. 73. Does a high correlation indicate a strong cause-effect relationship? No. Cause & effect can only be determined through an experiment. Correlation shows something just might be there, but might not. Remember the ice cream and drowning example we did in class. 74. Sketch a scatterplot with a correlation of about 0.8. May vary, see me if you are unsure. 75. Sketch a scatterplot with a correlation of about –0.5. May vary, see me if you are unsure. 76. Find the least-squares regression line (LSRL) for the calories-sodium data. a  y  bx & br sy sx yˆ  a  bx Don’t forget the “hat”!!!! If the “a” or “b” are asked for individually, make sure you show the work!! If LSRL is   asked for, you do not need to show how you got “a” and/or “b”. So the answer is that one point still deleted) Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. yˆ  24.4389  2.4594 x (with  77. Draw the LSRL on your scatterplot. 78. What is the slope of this line, and what does it tell you in this context? Slope is “b”, so it’s 2.4594. It says that for every 1 Calorie, Sodium increases 2.4594 units. 79. What is the y-intercept of this line, and what does it tell you in this context? Y-intercept is “a”, so it’s 24.4389. It means that if there are no Calories in a hotdog there will still be 24.4389 units of Sodium. 80. Predict the amount of sodium in a hot dog with 155 calories. If your LSRL is in Y1, you can use the notation Y1(155) or just hand calculate x=155. y=405.6488 units (with that one point deleted) 81. Predict the amount of sodium in a hot dog with 345 calories. Y = 872.9384 units of sodium. 82. Why is the prediction in problem 80 acceptable but the prediction in problem 81 not? #81 is extrapolation (far from our data) and cannot be trusted because trends may not continue the same way so far away from the given information. However #80 is interpolation (close/within our data) and is usually acceptable. 83. Find the error in prediction (residual) for a hot dog with 180 calories. Error (or residual) is y  yˆ  500  y1 (180)  500  467.1343  32.8657 84. Find the residual for 195 calories. -4.02559 85. The point ( x,y ) is always on the LSRL. Find this point, and label it on your scatterplot. (159.4375, 416.5625) 86. Find the standard deviation of the calories. (x1  x ) 2  (x 2  x ) 2  ... (x n  x ) 2 sx   23.1343 n 1 87. Find the standard deviation of the sodium. Same formula as #86, except with y. s y  59.3498 88. Using the equations on page 153,  verify the slope and intercept of the LSRL. Check these by plugging the numbers into the formula by hand, do not use 1-Var Stats.  89. Find the coefficient of determination for this data. Coefficient of determination is r2, so just square #65. r2=.9191 90. What does r2 tell you about this data? It tells us how well the y-values (sodium of a hotdog) are explained (predicted) by the x-values (calories of the hotdogs) though our LSRL. 91. How can you use a residual plot to tell if a line is a good model for data? A residual plot tells us if data is actually linear (remember our eyes can deceive us!). If the data is linear, we will find equal distant, random scatter in the dots. That is that there will be about as many points above & below the x-axis. And there will be no obvious pattern to the data. If there are obvious clusters and/or pattern(s), then the data was not linear to begin with. Thus a LSRL (linear equation) is not the best fit. 92. If you know a scatterplot has a curved shape, how can you decide whether to use a power model or an exponential model to fit data? You can check the y-values. If they increase my approximately the same ratio, then it’s probably exponential. 93. Graph the following data: Time (days) Mice 0 6 30 19 60 60 90 195 120 597 Ratio ////// 19/6=3.1667 60/16=3.1579 195/60=3.25 597/195=3.0615 Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure.  94. Perform the appropriate logarithmic transformation (power or exponential) by hand on the above data to get an equation. Since the ratios (see #92) are all approximately the same, I will use exponential. This means I must make a Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. new list and make it the LogY. Now graph. The LinReg(Xlist, LogYlist) is log yˆ10 .7788  .0167 x 10 log yˆ  .7788  .0167 x Now get rid of the log. yˆ  10 .7788 10 .0167x You can stop at the 2nd step, y  6.0094 1.0392 x but if you want, or are asked to, you can evaluate the powers of 10. 95. Check your equation in problem 94 using your calculator’s power and exponential regression functions. Stat->Calc->ExpReg (Xlist, Ylist) make sure you use the original y-list, not the logy! You get  yˆ  6.0094 1.0392 x Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure.  96. Make a residual plot to support your choice for problem 94. 97. Graph the following data: Diameter (inches) Cost (dollars) 6 3.50 9 8.00 12 14.50 15 22.50 20 39.50 Follow directions from # 62 Ratio ///// 8/3.5=2.2856 14.5/8=1.8125 22.5/14.5=1.5517 39.5/22.5=1.7556 Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. 98. Perform the appropriate logarithmic transformation (power or exponential) by hand on the above data to get an equation. Since the ratios are not really close, I’m going to try power. So we must make 2 new lists and log both X & Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. Y. If you look closely, it does look a little straighter. The LSRL of LogX, LogY is 10 log yˆ10 1.0217  2.0167 log x yˆ  10 1.0217 10 2.0167logx log yˆ  1.0217  2.0167 log x , now get rid of the logs. yˆ  10 1.0217 10 logx 2.0167 yˆ  10 1.0217  x 2.0167 yˆ  .0951  x 2.0167  99. Check your equation in problem 98 using your calculator’s power and exponential regression functions. Using ˆ  .0951 x 2.0167 PwrReg is found just below ExpReg. PwrReg(Xlist, YList), we get y 100. 101.  Make a residual plot to support your choice for problem 98.  Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. 101. What is the correlation for the equation you found in problem 98? r=.9999 102. What is extrapolation, and why shouldn’t we trust predictions using extrapolation? Extrapolation is being far from our data and cannot be trusted because trends may not continue the same way so far away. 103. What is interpolation? Interpolation is being close/within our data and is usually acceptable because we know how the trend reacts there. 104. What is a lurking variable? Something in the background that we don’t know about that may affect our variables. 105. Why should we avoid using averaged data for regression and correlation? Average data hides extremes, which doesn’t give you the entire picture of the data (usually more details is always better). Plus, many times, people want to take that data and break it down to use it to predict, which is inappropriate. For example, if you have data on the average snow fall in a month, some people might want to break that down to average snow fall per day. 106. What is causation? Give an example. Causation is one variable actually causes a change in another variable (regardless if the are other variables also making that change). Example: the harder your shake your soda bottle, the more “fizz” is made. The variables are vigorousness of the shake & amount of fizz. Check with me if you’re not sure if your example is causation. 107. What is common response? Give an example. Common response is when both variables actually do not affect each other at all. The change in both is due to the same 3rd (lurking) variable. Example: Sales of snow blowers goes up & down indirectly with the sales of swimsuits. Just looking at that info and not “thinking” you could say that if a store sells less snow blowers, that will make swimsuit sales go up. We know that there is a 3rd variable effecting both sales: weather. Check with me if you’re not sure if your example is common response. 108. What is confounding? Give an example. Confounding is when you’re not sure if one variable is affecting the other or if it’s a 3rd lurking variable. So you’re confounded on if “x” is actually causing any change in “y” at all. If you’re not “confused” on “x” causing “y” and you can come up with another variable that also affects “y,” that example is Causation. Confounding is only for when you’re not sure about “x.” Example: Intensity of a headache and amount of light in the room. For some people, light might not do anything to their headache, for others, it makes their headache much worse. So light may or may not actually be causing someone’s severity of the headache. 109. Why is a two-way table called a two-way table? A “two-way” table is a normal table, with all row & column variables shown. Use this table for questions 110–117: Education Did not complete high school Completed high school 1 to 3 years of college 4 or more years of college Total 110. Smoking Status Never smoked 82 97 92 86 357 Smoked, but quit 19 25 49 63 156 Smokes 113 103 59 37 312 Fill in the marginal distributions for this table. See above, check table total in both directions! Total 214 225 200 186 825 111. Display this table on a segmented bar chart (See p. 244). Education & Smoking 120 Did not complete high school Completed high school 1 to 3 years of college 4 or more years of college Education 100 80 60 40 20 ok es Sm it qu t bu ok ed , Sm N ev er sm ok ed 0 Smoking Status 112. What percent of these people smoke? 312/825=37.82% 113. What percent of never-smokers completed high school? 97/357=27.17% 114. What percent of those with 4 or more years of college have quit smoking? 63/186=33.87% 115. What percent of those with some college smoke? (59+37)/(186+200)=25.87% 116. What percent of smokers did not finish high school? 113/312=36.22% 117. What conclusion can be drawn about smoking and education from this table? The trend is the less likely are you to go further in schooling if you smoke or did smoke, but quit. However, if you did quit, you are more likely to go on in school than someone who smokes & didn’t quit. (However, remember you cannot prove causation from this!) 118. What is Simpson’s Paradox? Simpson’s Paradox is when 2 variables are shown in two two-way table, along with rd a 3 variable. Then for whatever reason, the 3rd variable is taken away to make one two-way table. The omission of the 3rd variable now makes it a “lurking” variable & this can completely reverse any trends or relationships seen in the different tables between the original two variables. 119. What is the difference between an observational study and an experiment? Observational study is when you are just “watching and recording” data or when you are just gathering data from different sites to combine it (like from different hospitals or gathering data from the internet). Experiment is when you are actually imposing a treatment on the subject to gather any responses. So you’re actually doing something to cause a response (remember, no response is a response) 120. What is a voluntary response sample? When all subject completely volunteered him or herself and were not put in some kind of random selection. Yes, most experiments on humans are “voluntary” since you need their permission, but this is not necessarily voluntary response. 121. How are a population and a sample related but different? Population is the “big” group. A sample is the “small” group from within that population. 122. Why is convenience sampling biased? To not be biased, there needs to be as little human thought in picking the subject for the sample. If the sample is picked based on convenience, they could all be similar in some aspect that can taint the data. 123. SRS stands for what kind of sample? Name and define. Simple Random Sample. Number every thing in the population and used some form of random number sampling to pick the ones in your sample. 124. Discuss how to choose a SRS of 4 towns from this list: Number these towns from 1 – 12 or 0 – 11. (any way you want. Up/down or across). Then use the randInt in your calculator randInt(1,12,4) or you can use the random number table in your book. Allendale Bangor Chelsea Detour Edmonton Fennville Gratiot Hillsdale Ionia Joliet Kentwood Ludington 125. What is a stratified random sample? Similar to SRS, but before you number & pick, you break up the population into some groups, usually based on some variable that you believe may affect your data. (Example: gender, species of animal, etc) Then you number each group and select from each group separately. 126. What is a multistage sample? Before numbering & selecting, you much break down a LARGE population into a small manageable group so you can number it and pick from it. They do this in the US census. 127. What is undercoverage? Undercoverage is from excluding a group from the population based on HOW you decided to take the sample (not because you couldn’t get a hold of them) Example: door-to-door survey from 9:00am – 2:00pm Monday - Friday. Because of this type of survey and how it’s being done, you will exclude all 1st shift workers from your survey. 128. What is nonresponse? Nonresponse is when you cannot get a hold of someone, but you could based on how you did your sampling. Example: phone surveys. You “undercover” any one with a phone, but you get nonresponse from people like me who just don’t answer their phones. You could get a hold of me, if I answered my phone. 129. What is response bias? Response bias is when the person giving the answers lie (on purpose or accident). Or question wording is another type of example. See next question. 130. Why is the wording of questions important? Give an example. Some times the way a question is worded or asked by the interviewer can lead the interviewee to one answer over another. “Who should be allowed to carry concealed weapons? A) Only safety officials. B) Safety office & private citizens with a clear need C) Any private citizen” (found at Gallop.com at http://www.gallup.com/poll/16822/Public-Wary-About-Broad-Concealed-FirearmPrivileges.aspx) This leaves no option for those who believe no one should and may feel that they should answer one of the given choices even if they don’t believe it. 131. How are experimental units and subjects similar but different? Experimental units stand for all things/people who are experimented on. However, we usually reserve the word “subjects” for humans only. 132. Explanatory variables in experiments are often called _factors____. 133. If I test a drug at 100 mg, 200 mg, and 300 mg, I am testing one variable at three __treatments___. 134. What is the placebo effect? When the subject reacts to a placebo like they were getting the real treatment. 135. What is the purpose of a control group? To help control the affects of lurking variables on the experiment and to have something to compare to after treatment is over. 136. Give an example of when we may not want to use a placebo/control group. We do not need to use placebos on none “conscious” things, but usually will still need a control. However, one reason we may want to not use a placebo is if we are comparing two different treatments to each other. Then we don’t need a placebo. 137. What are matched pairs experiments? Give an example. Matched pairs experiments are when both treatments are places on the same subject (can also be a before & after). Example: I want to determine the effectiveness of 2 different hand lotions. Since everyone’s skin is different, I will use both lotions of the same person and then compare. 138. What are the three principles of experimental design? * Control the effects of lurking variables by comparing several treatments. *Randomly assign subjects to treatment groups. * Replicate the experiment on many subjects to reduce chance variation. 139. What does double-blind mean, and why would we want an experiment to be double-blind? In a double-blind experiment, neither the subjects nor the people recording the observations know which subject received which treatment. This reduces bias. 140. Give an example of how lack of realism in an experiment can give false results. If the subjects know they are being watched, or if the situation doesn’t seem real to them (like scenarios you see on a candid type camera show), they may act differently or how they think the person watching would want them to react. 141. What is block design and how is it different than stratified random samples? Not any different really. We use the words “block design” to describe what we do in an experiment and “stratified random sampling” for observational studies. See number 125 for a description. 142. I want to test the effects of aerobic exercise on resting heart rate. I want to test two different levels of exercise, 30 minutes 3 times per week and 30 minutes 5 times per week. I have a group of 20 people to test, 10 men and 10 women. I will take heart rates before and after the experiment. Draw a chart for this experimental design 5 men for Treatment A compare Men 10 men 5 men for Treatment B compare everyone random selection 20 people 5 women for Treament A compare women 10 women 5 women for Treament B 143. Why is simulation useful? Sometimes you cannot actually do an experiment for time, monetary reasons, etc. 144. What are the five steps of a simulation? * State the problem * State the assumptions * Assign digits to represent outcomes * Simulate many repetitions * State your conclusions 145. Design and perform a simulation of how many children a couple must have to get two sons. (A simulation involves many trials. For this simulation, perform 10 trials.) There will be various answers to this. You can use your calculator or random number table. P(male) = .5 See me for help if you need it. 146. What is independence? Independence of two or more events is when they do not affect each other’s probabilities. There is a test for independence. A & B are Independent if and only if P(A)  P(B)  P(A  B) 147. You are going to flip a coin. What is the sample space for each flip? S={H, T} 148. You are going to flip a coin three times and note how many heads and tails you get. What is the sample space?  S={0, 1, 2, 3} 149. You are going to flip a coin three times and note what you get on each flip. What is the sample space? S= {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT} 150. 3rd coin Make a tree diagram for the three flips. 2nd coin Heads Heads Tails 1st coin Heads Tails Tails Heads HHH Tails HHT Heads HTH Tails Heads HTT THH Tails THT Heads TTH Tails TTT 151. There are three ways I can drive from Fremont to Grand Rapids and four ways I can drive from Grand Rapids to my home. How many different ways can I drive from Fremont to my home through Grand Rapids? There are two “tasks” for you to do: drive to Grand Rapids & then home. Use the multiplication rule. 3*4=12. 152. How many different four-digit numbers can you make? There are 4 tasks: pick 4 numbers. 10*10*10*10=10000 153. How many different four-digit numbers can you make without repeating digits? There are 4 tasks again, but this time, there can’t be any repeats! So it’s 10*9*8*7=5040 154. What is an event in probability? An event is one or more of the possible outcomes in a random phenomenon. 155. Any probability is a number between (and including) __0___ and __1___. 156. All possible outcomes together must have probability of __1___. 157. If S is the sample space, P(S) = ___1__. 158. What are complements? Give an example and draw a Venn diagram. Complements are the “nots.” If an event is rolling less than a 3 on a die, then it’s complement is “not rolling less than a three” or “rolling greater than or equal to 3” 159. What are disjoint events? Give two examples and draw a Venn diagram. Disjoint events are 2 or more events that share no outcomes. Like Aces & Kings in a deck of cards 1 and 2 Aces 3, 4, 5 and 6 Kings   Use the following chart for questions 160-163: M & M Color Brown Probability 0.3 Red 0.2 Yellow 0.2 Green 0.1 Orange 0.1 Blue ? 160. What is the probability that an M & M is blue? All colors add to 1, so 1-.3-.2-.2-.1-.1 = .1 161. What is the probability that an M & M is red or green? P(red or green) = .2+.1=.3 162. What is the probability that an M & M is yellow and orange? Impossibility. You cannot have a yellow & an orange M & M at the same time. So the answer is  (empty set). 163. What is the probability that an M & M is not brown or blue? P(Brownc  Blue)  P(red  yellow  green orangeblue)  .7 164. Bre can beat Erica in tennis 9% of the time. Erica can swim faster than Bre 8% of the time. What is the probability that Bre would beat Erica in a tennis match and in a swimming race? Assuming independence, then it’s .09*.92=.0828 165. What assumption are you making in problem 164? Do you think this assumption is valid? We assumed the two races were independent. This probably is not valid. If you loose one race it can affect your next race, positively or negatively. 166. Using two dice, what is the probability that you would roll a sum of seven or eleven? Check p. 336 for dice sample space. 167. P(7 11)  8 2  36 9 Using two dice, what is the probability that you would roll doubles?  6 1  36 6 168. Using two dice, what is the probability that you would roll a sum of 7 or 11 on the first roll and doubles on the second roll? “And” means to “multiply” in probability. And since dice rolling is independent, we can just multiply the two probabilities: 2 1 1 P[(7 11)  (doubles)]    9 6 27 169. What assumption are you making in problem 168? Do you think this assumption is valid? The dice are independent is the assumption. Yes, it’s valid. 170. Using two dice, what is the probability that you would roll a sum of 7 or 11 that is also doubles? P[(711)(doubles)]   This is impossibility. 171. What is the union of two events? The union of two or more events is when any one or more of the events can happen. The union of Aces and Hearts in a deck of cards has the S={Ace of D, Ace of H, Ace of S, Ace of C, 2H, 3H, 4H, 5H, 6H, 7H, 8H, 9H, 10H, JH, QH, KH} 172. What is an intersection of two events? The intersection of two or more events is when all the events happen at the same time. The intersection of Aces and Hearts in a deck of cards has the S={Ace of Hearts}. 173. How can we test independence in probability? We can use the independence rule: P(A)  P(B)  P(A  B) If this equation is true, then A & B are independent. If this equation is not true, then A & B are not independent.   174. Perform an independence test on the smoking/education chart from problem 110 to show that smoking status and education are not independent. You can check any 2 options from the smoking & education categories. As soon as you find one that does not work, then the two categories are not independent. ? P(Did not complete HS  Never smoked ) P(Didn' t complete HS)  P(Never smoked ) 82 825 .0994 214 357  825 825  .1122 ?  So Education & smoking are not independent, they are dependent according to this information. 175. Make a Venn diagram for the following situation: Do this problem in the following order to make it work. 7. 45% of kids like Barney This involves the ENTIRE circle of Barney. There are 4 Sections, we’ve already used 3, so subtract what’s left to find out the last part: 15% 6. 25% of kids like Blue This involves the ENTIRE circle of Blue. There are 4 Sections, we’ve already used 3, so subtract what’s left to find out the last part: 5% 5. 55% of kids like Pooh This involves the ENTIRE circle of Pooh. There are 4 Sections, we’ve already used 3, so subtract what’s left to find out the last part: 20% 4. 15% of kids like Blue and Pooh This is 2 sections again. Again, we already used 5% for #2, so there’s only 10% left for the Blue & Pooh only section. 3. 25% of kids like Barney and Pooh This involves 2 sections, the section we used in #2 and the Barney & Pooh only section. Since we’ve used 5% already in #2, there’s 20% left for the other section. 2. 5% of kids Barney, Blue, and Pooh So a “5” goes in the section were all 3 circles intersect. 1. 5% of kids like Blue but not Barney or Pooh Start with the one with the most information, this one. So a “5” goes in the crescent for Blue all by himself. 8. Now subtract all the parts from 100% to find the probability of being in the box, but outside of the circles. In other words, kids that do not like any of them at all = 20% Don’t forget the “box.” The box says, “Here are the subjects in this experiment or survey” Outside of the box is everyone else. Blue Barney 5% 5% 15% 5% 10% 20% 20% 20% Pooh  176. A dartboard has a circle with a 20-inch diameter drawn inside a 2-foot square. What is the probability that a dart lands inside the circle given that it at least lands inside the square? (Assume a random trial here.) Find the area of 2 2 each, the Circle A  r  10  314.1593 (store in your calculator so you don’t have any rounding error. Square = 242=576. Now divide Circle/Square = .5454 2 ft 20 in  177. Give an example of a discrete random variable. Number of students in a class. See me if you’re not sure if your example is correct. 178. Give an example of a continuous random variable. Temperature on a given day (we can measure to as many decimal places as technology will allow us). See me if you’re not sure if your example is correct. 179. Make a probability histogram of the following grades on a four-point scale: Grade Probability 180. 0 0.05 1 0.28 2 0.19 3 0.32 4 0.16 Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure. Using problem 179, what is P(X > 2)? P(3 or 4)  P(3)  P(4)  .46 P(2 or 3 or 4)  .67 181. Using problem 179, what is P(X > 2)? 182. What is a uniform distribution? Draw a picture. Uniform distribution is when all the probabilities for every x-value is the same. Quic k Ti me™ and a T IFF (Unc om pres s ed) dec om pres s or are needed to s ee t his pic t ure.  183. In a uniform distribution, what is P(0.2 < X < 0.6)? Find the area between .2 & .6, if 0  X 1. .4*1 = .4 184. In a uniform distribution, what is P(0.2 < X < 0.6)? Same as #183. The probability of being exactly on a point in a continuous distribution is 0. So, you take the answer from #183 and add 2*0 and you still get .4 185. How do your answers to problems 180, 181, 183, and 184 demonstrate a  difference between continuous and discrete random variables? A uniform distribution is continuous and there is no difference between < and . There is a difference between them if we are using a discrete distribution, like in # 180 &181. 186. Normal distributions are (continuous or discrete). 187. Expected value is another name for __mean/average___. 188. Find the expected value of the grades in problem 179. Var Stats XList, ProbList 189.  x  x1 p1  x2 p2  ... xk pk  2.26 You can use 1- Find the variance of the grades in problem 179. 2   (x1  x ) 2 p1  (x2  x ) 2 p2  ... (x k  x ) pk  1.3724 2 x 190. Find the standard deviation of the grades in problem 179.   You can use 1-Var Stats XList, ProbList. #189  1.1715  191. What is the law of large numbers? The law of large numbers says that probabilities can only be counted on in the long-run. Never in the short-run. 192. If I sell an average of 5 books per day and 7 CDs per day, what is the average number of items I sell per day? Use the rule for means:  x y   x   y  12 items , where  x  5 and  y  7 193. If I charge $2 per book and $1.50 per CD in problem 192, what is my average amount of income per day? ( a bX )( a bY )   a bX   a b Y  a  b X   a `  b `Y   $20.50 , where a=a`= nothing, b=$2, and b`=$1.50    Combine both means rules: 194. ` ` Before you can use the rules for variances you must make sure the variables are _independent____.  use the following situation: For Test 1, the class average was 80 with a standard deviation of 10. For problems 195-203, For Test 2, the class average was 70 with a standard deviation of 12. 195. What is the average for the two tests added together? Formula from #192. Answer is 80+70=150. 196. What is the standard deviation for the two tests added together?  X Y sure you use the variances! Then square root the final answer for 15.6205 2 197. What is the difference in the test averages?   X2   Y2  10 2  12 2  244 Make  xy   x   y  10  198. What is the standard deviation for the difference in the test averages?  X Y   X   Y  10  12  244 Then square root for 15.6205. The variance/standard deviation formula doesn’t care if you are subtracting or adding. It’s the same formula.  199. 2 2 2 2 2 If I cut the test scores on Test 2 in half and add 50, what is the new average?   a bX  a  b X  50  .5  70  85 200. What is the new standard deviation for Test 2 in problem 199?  a bX  b  X  .5 12 root for standard deviation. Remember that “a” does not affect the standard deviation/variance. 2 2 2 2 2  36 Now square 201. If I add 7 points to every Test 1, what is the new standard deviation? Since “a” does not affect standard deviation, the answer is the same as before adding 7 points. It’s 10.  202. If I multiply every Test 1 by 2 and subtract 80, what is the new mean? “a”=-80 and “b”=2, so the average is 80. 203. If I multiply every Test 1 by 2 and subtract 80, what is the new standard deviation? “a”=-80 and “b”=2, so the standard deviation is 2 2 10 2  20 . 204. Where are the mean and median located on a normal distribution? Since a normal curve is symmetrical, they are the same so right in the middle. For problems 205-211 consider the process of a drawing a card from a standard deck and replacing it. Let A be drawing a heart, B be drawing a king, and C be drawing a spade. 205. Are the events A and B disjoint? Explain. No because you can draw a King of Hearts. That is an outcome in both events. 206. Are the events A and B independent? Explain. Yes, since you are replacing cards. Also the ? P(Heart)  P(King) P(Heart  King) multiplication/independence rule works. 207. 1 1  4 13  1 52 This is true. Are the events A and C disjoint? Explain. Yes, because you cannot have a card that is both a heart & a spade. 208. Are the events A and C independent? Explain. NO!! Disjoint events are dependent!! Both A & C have a  However, if I pick a card and I get one event (let’s say Hearts) then there is no way to probability of happening of ¼. get the other (spades) and it’s probability goes down to zero. So it is affected by the first card. 209. What does the symbol  mean? “Union” See # 171 210. What does the symbol  mean? “Intersect” See # 172

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Statistics Review Chapters 1-8