Statistics 60: Fall 2011
Section 1 (Sept 30)
TA: Dennis Sun
Statistics 60: Section 1
• TA: Dennis Sun ([email protected])
• Office Hours: M 2-3, Th 1-2 in Sequoia 238
• Section: Fridays 9-9:50 AM in Braun Lec. Hall
• Website:
The Importance of Variability
The mean (or average) and the median are useful for describing the center of a variable, but
they do not tell the whole story. We can get a better picture if we also consider the spread of the
variable, which is measured by the standard deviation (or SD, for short).
• The average (or mean) of a list of numbers is: $\text{mean of \#'s} = \dfrac{\text{sum of \#'s}}{\text{how many \#'s}}$
• The root-mean-square (r.m.s.) is another measure of center. It is literally the “root of the
mean of the squares”: $\text{r.m.s. of \#'s} = \sqrt{\text{mean of (\#}^2\text{)'s}}$.
• The standard deviation (SD) of a list of numbers is the r.m.s. of deviations from the mean.
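The three definitions above can be checked numerically. The following is a minimal Python sketch, not part of the original handout: the list `data` and the helper names `mean`, `rms`, and `sd` are illustrative assumptions. The SD here divides by the number of entries, which is what "r.m.s. of deviations from the mean" implies.

```python
import math

def mean(xs):
    """Sum of the numbers divided by how many there are."""
    return sum(xs) / len(xs)

def rms(xs):
    """Root of the mean of the squares."""
    return math.sqrt(mean([x ** 2 for x in xs]))

def sd(xs):
    """r.m.s. of the deviations from the mean (divides by n, not n - 1)."""
    m = mean(xs)
    return rms([x - m for x in xs])

# Hypothetical list, chosen only for illustration.
data = [1, 3, 4, 5, 7]
print(mean(data), rms(data), sd(data))
```

Note that many calculators and software packages divide by n - 1 instead of n when computing an SD, so their answers can differ slightly from this definition.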
Example 1 (Yellowstone Supervolcano). Yellowstone National Park and much of the western U.S.
sit atop a massive volcano. A Yellowstone eruption would be catastrophic, as Bill Bryson relates:
The ash fall from the last Yellowstone eruption covered...nearly the whole of the United
States west of the Mississippi. This, bear in mind, is the breadbasket of America, an
area that produces roughly half the world's cereals.... If you wanted to grow crops again,
you would have to find some place to put all the ash.
So should we be worried about an impending eruption?
[Geologists] were able to work out that the cycle of Yellowstone's eruptions averaged
one massive blow every 600,000 years. The last one, interestingly enough, was 630,000
years ago. Yellowstone, it appears, is due.
What do you think of Bryson’s claim in light of the data below?
[Table: Times between last 5 supereruptions (in millions of years)]
Better yet, we can consider the distribution of the data, which is visualized using a histogram.
Example 2 (Hurricane Irene). In August 2011, Hurricane Irene devastated the northeastern US.
Just how severe was Irene? Shown below is a histogram of the max wind speeds (in knots) of all
Atlantic hurricanes and tropical storms in the past 5 years. Irene was clocked at 105 knots.
(a) Estimate the percentile rank of Irene’s max wind speed
among Atlantic storms.
(b) The average max wind speed of Atlantic storms is 70
knots. Do you expect more than, less than, or exactly
50% of storms to have a max wind speed above 70?
[Histogram: max wind speeds of Atlantic storms; horizontal axis: Wind Speed in Knots]
Example 3 (Baseball Salaries). The League Commissioner’s Office is trying to determine whether
the “typical” player in the MLB is overpaid.
(a) Suppose you are a representative of the Players Association. Do you cite the mean or median?
(b) Suppose you are the stingy owner of the Minnesota Twins. Do you cite the mean or median?
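Examples 2(b) and 3 both turn on how a long right tail pulls the mean and the median apart. The sketch below illustrates the effect in Python. The salary figures are entirely hypothetical (they are not real MLB data), and the variable names are illustrative.

```python
import statistics

# Hypothetical salaries in millions of dollars (not real MLB data):
# most players earn modest amounts, a few stars earn far more.
salaries = [0.5, 0.5, 0.6, 0.8, 1.0, 1.5, 2.0, 3.0, 12.0, 25.0]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(f"mean   = {mean_salary:.2f}")
print(f"median = {median_salary:.2f}")
```

Try deleting the two largest salaries and rerunning: the median barely moves, while the mean changes a great deal.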
Shifting, and Scaling, and Standardizing, Oh My!
• Shifting: Suppose we have temperature measurements in Celsius and, as good scientists, wish
to convert them to Kelvin. (K = C + 273) How will this affect the mean and SD?
• Scaling: Suppose we have the weights of students in this class in kilograms and would like to
convert to pounds. (1 kg = 2.2 lbs) How will this affect the mean and SD?
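One way to explore the two questions above is to try them on a small list. This is a minimal Python sketch under stated assumptions: the temperature and weight lists are made up for illustration, and `statistics.pstdev` is used because it divides by n, consistent with the r.m.s. definition of the SD.

```python
import statistics

# Hypothetical Celsius temperatures, for illustration only.
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]
kelvin = [c + 273 for c in celsius]      # shifting: add a constant

# Hypothetical weights in kilograms, for illustration only.
kilograms = [50.0, 60.0, 70.0, 80.0]
pounds = [2.2 * kg for kg in kilograms]  # scaling: multiply by a constant

for label, xs in [("Celsius", celsius), ("Kelvin", kelvin),
                  ("kg", kilograms), ("lbs", pounds)]:
    # pstdev divides by n, matching the r.m.s. definition of the SD.
    print(f"{label:8s} mean = {statistics.mean(xs):7.2f}  SD = {statistics.pstdev(xs):6.2f}")
```

Compare each converted row with its original row to see which summary changes and by how much.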
Example 4 (Temperatures). In 1861, Carl August Wunderlich measured the body temperatures of
25,000 people and reported an average of 37°C and an SD of 0.7°C. These figures were translated
in the United States as...
Example 5 (Standardizing). Suppose for a list of numbers, we subtract from each element the
mean of the list, then divide by the SD of the list. This process is called standardizing, and the
resulting numbers are called z-scores.
(a) The mean of the z-scores is always
(b) The SD of the z-scores is always
(c) What does a z-score represent, in words?
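The standardizing recipe in Example 5 can be tried on any list to check answers to (a) and (b). The following Python sketch is illustrative only: the helper name `standardize` and the list `data` are assumptions, not part of the handout.

```python
import statistics

def standardize(xs):
    """Subtract the mean from each entry, then divide by the SD."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)  # SD with divisor n, as defined in this handout
    return [(x - m) / s for x in xs]

# Hypothetical list, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(data)

print(z)
print(statistics.mean(z), statistics.pstdev(z))
```

Replacing `data` with a different list changes the individual z-scores, so it is worth checking whether the last printed line changes as well.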
Example 6 (Greatest Hitter?). The highest single-season batting averages in modern times are:
[Table: single-season batting average and league average (Lg Avg) for Ted Williams, Rod Carew, George Brett, and Tony Gwynn]
Comparing raw batting averages makes little sense. As the table shows, it was easier to get a hit
in ’94 than in ’77. Which of the four players had the most impressive season in your view?
The Normal Distribution
Data in the real world often follows a normal distribution, which has the property that:
≈ 68% of data falls within 1 SD of the mean.
≈ 95% of data falls within 2 SD of the mean.
≈ 99.7% of data falls within 3 SD of the mean.
If we need more precise numbers (e.g., what percentage falls within 1.5 SD of the mean), we consult
a standard normal table.
Note: The mean and SD completely determine the distribution of a normal variable!
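In place of a printed table, the same percentages can be computed directly. This is a small sketch using `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the helper name `fraction_within` is an illustrative assumption.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal variable within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.5, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```

Because the answer depends only on how many SDs we move from the mean, the standard normal (mean 0, SD 1) is enough to answer the question for any normal variable.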
Example 7 (Soccer Penalty Kicks). A regulation soccer net is 24 feet wide. Suppose a typical
goalie can block any shot within 9 feet of the center. You are trying to decide where to aim
(horizontally). Your kicks are normally distributed around where you aim with an SD of 3 feet.
(a) One common strategy is to “aim for the post” (i.e. 12 feet from center). Is this a good idea?
What fraction of the time will you score with this strategy?
(b) What is the optimal strategy? What fraction of the time will you score with this strategy?
(c) Where should you aim if you want the goalie to block it no more than 20% of the time?
(d) Bonus: In (b), you calculated that even with the optimal aiming strategy you still score less
than 40% of the time. How much more “accurate” would you have to become before you are
able to score more than half of the time? (i.e. what would your SD have to be?)
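To experiment with Example 7, the setup can be written as a small calculator. This is a sketch under one reading of the problem: a kick scores if it lands inside the net (within 12 feet of center) but outside the goalie's reach (more than 9 feet from center). The function and constant names are illustrative assumptions, not part of the handout.

```python
from statistics import NormalDist

HALF_WIDTH = 12.0   # net is 24 feet wide, so posts are 12 feet from center
GOALIE_REACH = 9.0  # goalie blocks anything within 9 feet of center

def score_probability(aim, sd):
    """Chance a kick lands inside the net but outside the goalie's reach.

    Assumes the kick's horizontal position is normal with mean `aim`
    (feet from center) and standard deviation `sd`.
    """
    kick = NormalDist(mu=aim, sigma=sd)
    right_gap = kick.cdf(HALF_WIDTH) - kick.cdf(GOALIE_REACH)
    left_gap = kick.cdf(-GOALIE_REACH) - kick.cdf(-HALF_WIDTH)
    return right_gap + left_gap

def block_probability(aim, sd):
    """Chance the kick lands within the goalie's reach."""
    kick = NormalDist(mu=aim, sigma=sd)
    return kick.cdf(GOALIE_REACH) - kick.cdf(-GOALIE_REACH)

# Try different aim points (in feet from center) with the handout's SD of 3.
for aim in (9.0, 10.5, 12.0):
    print(aim, round(score_probability(aim, 3.0), 3),
          round(block_probability(aim, 3.0), 3))
```

Varying `aim` helps with parts (a) through (c), and varying `sd` helps with part (d). Work the problems by hand with the normal table first, then use the calculator to check.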