Statistics 60: Fall 2011 Section 1 (Sept 30) TA: Dennis Sun

Statistics 60: Section 1
• TA: Dennis Sun ([email protected])
• Office Hours: M 2-3, Th 1-2 in Sequoia 238
• Section: Fridays 9-9:50 AM in Braun Lec. Hall
• Website: stats60.com

1 The Importance of Variability

The mean (or average) and the median are useful for describing the center of a variable, but they do not tell the whole story. We can get a better picture if we also consider the spread of the variable, which is measured by the standard deviation (or SD, for short).

• The average (or mean) of a list of numbers is:
  $$\text{mean of numbers} = \frac{\text{sum of numbers}}{n}$$
  where n is how many numbers are in the list.
• The root-mean-square (r.m.s.) measures how large the numbers are, ignoring their signs. It is literally the “root of the mean of the squares”:
  $$\text{r.m.s. of numbers} = \sqrt{\text{mean of the squared numbers}}$$
• The standard deviation (SD) of a list of numbers is the r.m.s. of the deviations from the mean.

Example 1 (Yellowstone Supervolcano).
Yellowstone National Park and much of the western U.S. sit atop a massive volcano. A Yellowstone eruption would be catastrophic, as Bill Bryson relates:

  “The ash fall from the last Yellowstone eruption covered...nearly the whole of the United States west of the Mississippi. This, bear in mind, is the breadbasket of America, an area that produces roughly half the world’s cereals.... If you wanted to grow crops again, you would have to find some place to put all the ash.”

So should we be worried about an impending eruption?

  “[Geologists] were able to work out that the cycle of Yellowstone’s eruptions averaged one massive blow every 600,000 years. The last one, interestingly enough, was 630,000 years ago. Yellowstone, it appears, is due.”

What do you think of Bryson’s claim in light of the data below?

Times between the last 5 supereruptions (in millions of years):
0.65   0.80   2.25   2.17

Better yet, we can consider the distribution of the data, which is visualized using a histogram.

Example 2 (Hurricane Irene).
In August 2011, Hurricane Irene devastated the northeastern US. Just how severe was Irene? Shown below is a histogram of the max wind speeds (in knots) of all Atlantic hurricanes and tropical storms in the past 5 years. Irene was clocked at 105 knots.

[Histogram: max wind speed of Atlantic storms, in knots (axis from 20 to 160)]

(a) Estimate the percentile rank of Irene’s max wind speed among Atlantic storms.
(b) The average max wind speed of Atlantic storms is 70 knots. Do you expect more than, less than, or exactly 50% of storms to have a max wind speed above 70?

Example 3 (Baseball Salaries).
The League Commissioner’s Office is trying to determine whether the “typical” player in the MLB is overpaid.
(a) Suppose you are a representative of the Players Association. Do you cite the mean or the median?
(b) Suppose you are the stingy owner of the Minnesota Twins. Do you cite the mean or the median?

2 Shifting, and Scaling, and Standardizing, Oh My!

• Shifting: Suppose we have temperature measurements in Celsius and, as good scientists, wish to convert them to Kelvin (K = C + 273). How will this affect the mean and SD?
• Scaling: Suppose we have the weights of students in this class in kilograms and would like to convert them to pounds (1 kg = 2.2 lbs). How will this affect the mean and SD?

Example 4 (Temperatures).
In 1861, Carl August Wunderlich measured the body temperatures of 25,000 people and reported an average of 37°C and an SD of 0.7°C. These figures were translated in the United States as...

Example 5 (Standardizing).
Suppose that, for a list of numbers, we subtract the mean of the list from each element, then divide by the SD of the list. This process is called standardizing, and the resulting numbers are called z-scores.
(a) The mean of the z-scores is always ______.
(b) The SD of the z-scores is always ______.
(c) What does a z-score represent, in words?

Example 6 (Greatest Hitter?).
The highest single-season batting averages in modern times are:

Year   Player          BA     Lg Avg   Lg SD
1941   Ted Williams    .406   .281     .032
1977   Rod Carew       .388   .277     .027
1980   George Brett    .390   .279     .028
1994   Tony Gwynn      .394   .293     .032

Comparing raw batting averages makes little sense. As the table shows, it was easier to get a hit in ’94 than in ’77. Which of the four players had the most impressive season, in your view?

3 The Normal Distribution

Many real-world variables approximately follow a normal distribution, which has the property that:
• ≈ 68% of the data falls within 1 SD of the mean.
• ≈ 95% of the data falls within 2 SD of the mean.
• ≈ 99.7% of the data falls within 3 SD of the mean.

If we need more precise numbers (e.g., what percentage falls within 1.5 SD of the mean), we consult a standard normal table.

Note: The mean and SD completely determine the distribution of a normal variable!

Example 7 (Soccer Penalty Kicks).
A regulation soccer net is 24 feet wide. Suppose a typical goalie can block any shot within 9 feet of the center. You are trying to decide where to aim (horizontally). Your kicks are normally distributed around where you aim, with an SD of 3 feet.
(a) One common strategy is to “aim for the post” (i.e., 12 feet from center). Is this a good idea? What fraction of the time will you score with this strategy?
(b) What is the optimal strategy? What fraction of the time will you score with this strategy?
(c) Where should you aim if you want the goalie to block it no more than 20% of the time?
(d) Bonus: In (b), you calculated that even with the optimal aiming strategy you still score less than 40% of the time. How much more “accurate” would you have to become before you are able to score more than half of the time? (That is, what would your SD have to be?)
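The three definitions in Section 1 (mean, r.m.s., and SD) translate directly into code. Below is a minimal Python sketch; the function names `mean`, `rms`, and `sd` are our own, and the data are the Yellowstone intervals from Example 1, read as 0.65, 0.80, 2.25, and 2.17 million years. Following the handout’s definition, the SD is the r.m.s. of the deviations, so it divides by n rather than n − 1.

```python
import math


def mean(xs):
    """Average: the sum of the numbers divided by how many there are."""
    return sum(xs) / len(xs)


def rms(xs):
    """Root-mean-square: square each number, average the squares, take the root."""
    return math.sqrt(mean([x * x for x in xs]))


def sd(xs):
    """Standard deviation: the r.m.s. of the deviations from the mean (divides by n)."""
    m = mean(xs)
    return rms([x - m for x in xs])


# Times between the last five Yellowstone supereruptions, in millions of years
intervals = [0.65, 0.80, 2.25, 2.17]
print(f"mean = {mean(intervals):.2f} million years")
print(f"SD   = {sd(intervals):.2f} million years")
```

This prints a mean of about 1.47 and an SD of about 0.74 million years, so the spread is roughly half the size of the average interval.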
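Examples 2 and 3 both turn on how a long right tail pulls the mean above the median. The sketch below uses Python’s standard `statistics` module on a small, entirely hypothetical list of salaries (invented for illustration, not real MLB data) to show the effect.

```python
import statistics

# Hypothetical salaries in millions of dollars (made up for illustration):
# most players earn modest amounts, while a few stars earn far more.
salaries = [0.4, 0.4, 0.5, 0.5, 0.6, 0.8, 1.0, 1.5, 3.0, 8.0, 15.0, 22.0]

avg = statistics.mean(salaries)
mid = statistics.median(salaries)  # average of the two middle values when n is even
share_above_mean = sum(s > avg for s in salaries) / len(salaries)

print(f"mean   = {avg:.2f}")
print(f"median = {mid:.2f}")
print(f"fraction above the mean = {share_above_mean:.2f}")
```

In this made-up list the mean (about 4.5) is roughly five times the median (0.9), and only a quarter of the entries lie above the mean.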
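The questions in Section 2 about shifting, scaling, and standardizing can be checked numerically. This sketch uses hypothetical temperatures and weights (invented for illustration only) and `statistics.pstdev`, which divides by n and so matches the SD definition in Section 1. The helper names `mean_sd` and `standardize` are our own.

```python
import statistics


def mean_sd(xs):
    """Return (mean, SD), where the SD divides by n (population SD)."""
    return statistics.mean(xs), statistics.pstdev(xs)


def standardize(xs):
    """Convert each number to a z-score: (x - mean) / SD."""
    m, s = mean_sd(xs)
    return [(x - m) / s for x in xs]


# Hypothetical measurements, invented for illustration
celsius = [36.2, 36.8, 37.0, 37.3, 37.9]
kelvin = [c + 273 for c in celsius]   # shifting: add the same constant to every entry
kg = [55.0, 62.5, 70.0, 81.0, 90.5]
lbs = [2.2 * w for w in kg]           # scaling: multiply every entry by the same constant

for label, xs in [("C", celsius), ("K", kelvin), ("kg", kg), ("lbs", lbs)]:
    m, s = mean_sd(xs)
    print(f"{label:>3}: mean = {m:8.3f}, SD = {s:6.3f}")

z = standardize(kg)
z_mean, z_sd = mean_sd(z)
print(f"z-scores: {[round(v, 2) for v in z]}, mean = {z_mean:.3f}, SD = {z_sd:.3f}")
```

Comparing the printed rows answers the Section 2 questions for this particular data set; trying other constants is a quick way to see that the pattern holds in general.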
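One way to put the four seasons in Example 6 on a common footing is to standardize each batting average against its own league, as in Example 5. The sketch below does that with the figures from the table; `z_score` is a hypothetical helper name.

```python
# (year, player, batting average, league average, league SD), from the Example 6 table
seasons = [
    (1941, "Ted Williams", 0.406, 0.281, 0.032),
    (1977, "Rod Carew", 0.388, 0.277, 0.027),
    (1980, "George Brett", 0.390, 0.279, 0.028),
    (1994, "Tony Gwynn", 0.394, 0.293, 0.032),
]


def z_score(x, league_avg, league_sd):
    """How many league SDs a batting average sits above the league average."""
    return (x - league_avg) / league_sd


for year, player, ba, lg_avg, lg_sd in seasons:
    print(f"{year} {player:<13} z = {z_score(ba, lg_avg, lg_sd):.2f}")
```

With these numbers, Carew’s 1977 season sits furthest above its league (about 4.1 SDs), even though his raw average is the lowest of the four.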
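Instead of a printed standard normal table, the normal CDF can be computed from the error function in Python’s `math` module. The sketch below (the helper names `normal_cdf` and `within_k_sd` are our own) reproduces the 68-95-99.7 rule and the 1.5 SD case mentioned in Section 3.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal variable with mean mu and SD sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def within_k_sd(k):
    """Fraction of a normal distribution lying within k SDs of its mean."""
    return normal_cdf(k) - normal_cdf(-k)


for k in (1, 1.5, 2, 3):
    print(f"within {k} SD of the mean: {within_k_sd(k):.1%}")
```

The printed values (about 68.3%, 86.6%, 95.4%, and 99.7%) agree with the rule of thumb. Because a normal curve is fixed by its mean and SD, `normal_cdf` needs only those two parameters.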
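The penalty-kick setup in Example 7 can also be explored numerically. This sketch models a shot’s horizontal landing point as normal, centered on the aim point with the SD from the problem, and counts a goal when the shot lands between 9 and 12 feet from center on either side. The grid search over aim points and the bisection on the SD are brute-force choices of ours, not the handout’s method (a standard normal table does the same job by hand), and all function names are hypothetical.

```python
import math

HALF_NET = 12.0      # the net is 24 ft wide, so each post is 12 ft from center
GOALIE_REACH = 9.0   # the goalie blocks any shot within 9 ft of center


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable with mean mu and SD sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def p_between(lo, hi, aim, sd):
    """Chance a shot aimed at `aim` lands between lo and hi feet from center."""
    return normal_cdf(hi, aim, sd) - normal_cdf(lo, aim, sd)


def p_score(aim, sd=3.0):
    """Chance of landing past the goalie but inside a post, on either side."""
    return (p_between(GOALIE_REACH, HALF_NET, aim, sd)
            + p_between(-HALF_NET, -GOALIE_REACH, aim, sd))


def p_block(aim, sd=3.0):
    """Chance the shot lands within the goalie's reach."""
    return p_between(-GOALIE_REACH, GOALIE_REACH, aim, sd)


# Candidate aim points from the center out to the post, every 0.01 ft
aims = [i / 100 for i in range(0, 1201)]

best_aim = max(aims, key=p_score)
print(f"(a) aim at the post: P(score) = {p_score(HALF_NET):.3f}")
print(f"(b) best aim = {best_aim:.2f} ft, P(score) = {p_score(best_aim):.3f}")

# (c) closest-to-center aim on the grid that keeps the block chance at or below 20%
safe_aim = min(a for a in aims if p_block(a) <= 0.20)
print(f"(c) aim at least {safe_aim:.2f} ft from center")

# (d) bisect on the SD, assuming the midpoint of the 9-12 ft gap stays
# (essentially) the best aim as the SD shrinks
mid_aim = (GOALIE_REACH + HALF_NET) / 2
lo_sd, hi_sd = 0.01, 3.0   # scores > 50% at lo_sd, < 50% at hi_sd
for _ in range(60):
    mid_sd = (lo_sd + hi_sd) / 2
    if p_score(mid_aim, mid_sd) > 0.5:
        lo_sd = mid_sd
    else:
        hi_sd = mid_sd
print(f"(d) need an SD below about {hi_sd:.2f} ft")
```

The (b) line should come out just under 40%, consistent with the premise of the bonus question.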