Download Chapter 3: Describing Relationships (first spread)

Ex. 2.1-1: Wins in Major League Baseball

The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2009.

 5 | 9
 6 | 2455
 7 | 00455589
 8 | 0345667778
 9 | 123557
10 | 3

Key: 5|9 represents a team with 59 wins.

Find the percentiles for the following teams:
(a) The Colorado Rockies, who won 92 games.
(b) The New York Yankees, who won 103 games.
(c) The Kansas City Royals and Cleveland Indians, who both won 65 games.

Ex. 2.1-2, 3: State Median Household Incomes

The table below shows the distribution of median household incomes for the 50 states and the District of Columbia.

Median Income                Relative           Cumulative   Cumulative Relative
($1000s)        Frequency    Frequency          Frequency    Frequency
35 to < 40       1            1/51 = 0.020       1            1/51 = 0.020
40 to < 45      10           10/51 = 0.196      11           11/51 = 0.216
45 to < 50      14           14/51 = 0.275      25           25/51 = 0.490
50 to < 55      12           12/51 = 0.235      37           37/51 = 0.725
55 to < 60       5            5/51 = 0.098      42           42/51 = 0.824
60 to < 65       6            6/51 = 0.118      48           48/51 = 0.941
65 to < 70       3            3/51 = 0.059      51           51/51 = 1.000

The cumulative relative frequency graph below shows the same income data. The point at (50, 0.49) means that 49% of the states had median household incomes less than $50,000. The point at (55, 0.725) means that 72.5% of the states had median household incomes less than $55,000. Thus, 72.5% - 49% = 23.5% of the states had median household incomes between $50,000 and $55,000, since the cumulative relative frequency increased by 0.235. This agrees with the relative frequency for the 50 to < 55 category, 12/51 = 0.235.

(a) At what percentile is California, with a median income of $57,445?
(b) Estimate and interpret the first quartile of this distribution.

Ex. 2.1-5: Wins in Major League Baseball

In 2009, the mean number of wins was 81 with a standard deviation of 11.4 wins. Find and interpret the z-scores for the following teams.
(a) The New York Yankees, with 103 wins.
(b) The New York Mets, with 70 wins.

Ex. 2.1-6: Home Run Kings

The single-season home run record for Major League Baseball has been set just three times since Babe Ruth hit 60 home runs in 1927. Roger Maris hit 61 in 1961, Mark McGwire hit 70 in 1998, and Barry Bonds hit 73 in 2001. In an absolute sense, Barry Bonds had the best performance of these four players, since he hit the most home runs in a single season. However, in a relative sense this may not be true. Baseball historians suggest that hitting a home run has been easier in some eras than others. This is due to many factors, including quality of batters, quality of pitchers, hardness of the baseball, dimensions of ballparks, and possible use of performance-enhancing drugs. To make a fair comparison, we should see how these performances rate relative to other hitters during the same year. Compute the standardized score for each performance.

Year   Player         HR   Mean   SD     z-score
1927   Babe Ruth      60    7.2    9.7
1961   Roger Maris    61   18.8   13.4
1998   Mark McGwire   70   20.7   12.7
2001   Barry Bonds    73   21.4   13.2

Which player had the most outstanding performance relative to his peers?

Ex. 2.1-7, 8: Test Scores (Transforming Data)

The graph and table below are summary statistics for a sample of 30 test scores. The maximum possible score on the test was 50 points.

[Dot plot of Score; horizontal axis from 10 to 50]

            n    x̄      sx     Min   Q1   M    Q3   Max   IQR   Range
Score       30   35.8   8.17   12    32   37   41   48    9     36

Suppose that the teacher was nice and added 5 points to each test score. How would this change the shape, center, and spread of the distribution? Shown below are the graphs and summary statistics for the original scores and the +5 scores:

[Dot plots of Score and Score + 5]

            n    x̄      sx     Min   Q1   M    Q3   Max   IQR   Range
Score       30   35.8   8.17   12    32   37   41   48    9     36
Score + 5   30   40.8   8.17   17    37   42   46   53    9     36

From both the graph and summary statistics, we can see that the measures of center and measures of position all increased by 5.
However, the shape of the distribution did not change, nor did the spread.

Suppose that the teacher wanted to convert the original test scores to percents. Since the test was out of 50 points, he should multiply each score by 2 to make them out of 100. Shown below are graphs and summary statistics for the original scores and the doubled scores.

[Dot plots of Score and Score x 2; horizontal axis from 10 to 100]

            n    x̄      sx      Min   Q1   M    Q3   Max   IQR   Range
Score       30   35.8    8.17   12    32   37   41   48     9    36
Score x 2   30   71.6   16.34   24    64   74   82   96    18    72

From the graphs and summary statistics, we can see that the measures of center, location, and spread have all doubled, just like the individual observations. But even though the distribution is more spread out, the shape hasn't changed. It is still skewed to the left (the mean is below the median and the low scores trail off much farther than the high ones), with the same clusters and gaps.

Ex. 2.2-1, 2: Batting Averages

The histogram below shows the distribution of batting average (proportion of hits) for the 432 Major League Baseball players with at least 100 plate appearances in the 2009 season. The smooth curve shows the overall shape of the distribution.

In the first graph below, the bars in red represent the proportion of players who had batting averages of at least 0.270. There are 177 such players out of a total of 432, for a proportion of 0.410. In the second graph below, the area under the curve to the right of 0.270 is shaded. This area is 0.391, only 0.019 away from the actual proportion of 0.410.

The mean of the 432 batting averages in MLB in 2009 was 0.261 with a standard deviation of 0.034. Suppose that the distribution is exactly Normal with $\mu = 0.261$ and $\sigma = 0.034$.
(a) Sketch a Normal density curve for this distribution of batting averages. Label the points that are 1, 2, and 3 standard deviations from the mean.
(b) What percent of the batting averages are above 0.329? Show your work.
(c) What percent of the batting averages are between 0.227 and 0.295? Show your work.
Compare your results with results from using the 68-95-99.7 rule.

Ex. 2.2-3: Finding Area to the Right

Suppose we wanted to find the proportion of observations in a Normal distribution that were more than 1.53 standard deviations above the mean. That is, we want to know what proportion of observations in the standard Normal distribution are greater than z = 1.53. To find this proportion, locate the value 1.5 in the left-hand column of Table A, then locate the remaining digit 3 as .03 in the top row. The corresponding entry is 0.9370. This is the area to the left of z = 1.53. To find the area above z = 1.53, subtract 0.9370 from 1 to get 0.0630.

[Figure: standard Normal curve. The table entry 0.9370 is the area to the left of z = 1.53. The area to the right of z = 1.53 is 1 - 0.9370 = 0.0630.]

Ex. 2.2-4: Finding Areas Under the Standard Normal Curve

Find the proportion of observations from the standard Normal distribution that are between -0.58 and 1.79.

To find the proportion of observations from the standard Normal curve that are between -0.58 and 1.79, we must find the proportion of values that are less than z = 1.79 and then subtract the proportion of values that are less than z = -0.58. The difference in these proportions is the proportion of observations that are between z = -0.58 and z = 1.79.

  Area to the left of z = 1.79                0.9633
- Area to the left of z = -0.58               0.2810
= Area between z = -0.58 and z = 1.79         0.6823

Ex. 2.2-5: Working Backward

In a standard Normal distribution, 20% of the observations are above what value?

Using Table A, we should look up an area of 0.8000, since the table always lists the area to the left of a boundary. The closest area to 0.8000 is 0.7995, which corresponds to a z-score of z = 0.84. Thus, approximately 20% of the observations in a standard Normal distribution are above z = 0.84.

[Figure: standard Normal curve. The area to the right of z is 0.20. What's z?]

Ex. 2.2-6, 7: Serving Speed

In the 2008 Wimbledon tennis tournament, Rafael Nadal averaged 115 miles per hour (mph) on his first serves [1].
Assume that the distribution of his first serve speeds is Normal with a mean of 115 mph and a standard deviation of 6 mph. About what proportion of his first serves would you expect to exceed 120 mph?

State: Let x = the speed of Nadal's first serve. The variable x has a Normal distribution with $\mu = 115$ and $\sigma = 6$. We want the proportion of first serves with $x > 120$.
Plan: The figure below shows the distribution with the area of interest shaded.
[Figure: Normal curve with the area to the right of x = 120 (z = 0.83) shaded]
Do: Standardize: $z = \frac{120 - 115}{6} = 0.83$. Table A: Looking up a z-score of 0.83 shows us that the area less than z = 0.83 is 0.7967. This means that the area to the right of z = 0.83 is 1 - 0.7967 = 0.2033.
Conclude: About 20% of Nadal's first serves will travel more than 120 mph.

What percent of Rafael Nadal's first serves are between 100 and 110 mph?

State: Let x = the speed of Nadal's first serve. The variable x has a Normal distribution with $\mu = 115$ and $\sigma = 6$. We want the proportion of first serves with $100 < x < 110$.
Plan: The figure below shows the distribution with the area of interest shaded.
[Figure: Normal curve with the area between x = 100 (z = -2.50) and x = 110 (z = -0.83) shaded]
Do: Standardize: When x = 100, $z = \frac{100 - 115}{6} = -2.50$. When x = 110, $z = \frac{110 - 115}{6} = -0.83$. Table A: Looking up a z-score of -2.50 shows us that the area less than z = -2.50 is 0.0062. Looking up a z-score of -0.83 shows us that the area less than z = -0.83 is 0.2033. Thus, the area between z = -2.50 and z = -0.83 is 0.2033 - 0.0062 = 0.1971.
Conclude: About 20% of Nadal's first serves will travel between 100 and 110 mph.

[1] http://sports.espn.go.com/sports/tennis/wimbledon08/columns/story?columnist=garber_greg&id=3472238

Ex. 2.2-8: Heights of Three-Year-Old Females

According to http://www.cdc.gov/growthcharts/, the heights of three-year-old females are approximately Normally distributed with a mean of 94.5 cm and a standard deviation of 4 cm. What is the third quartile of this distribution?

State: Let x = the height of a randomly selected three-year-old female.
This variable has the N(94.5, 4) distribution. The third quartile is the value with 75% of the distribution to its left.
Plan: The picture below illustrates what we are trying to find.
[Figure: Normal curve with 75% of the area shaded to the left of the unknown value x]
Do: Using Table A, the table entry closest to 0.75 is 0.7486. This corresponds to a z-score of 0.67. To un-standardize, we solve the equation $0.67 = \frac{x - 94.5}{4}$ for x and get x = 94.5 + 0.67(4) = 97.18 cm.
Conclude: The third quartile of three-year-old female heights is 97.18 cm.

Ex. 2.2-9: No Space in the Fridge? (Assessing Normality)

The measurements listed below describe the usable capacity (in cubic feet) of a sample of 36 side-by-side refrigerators. (Source: Consumer Reports, May 2010.) Are the data close to Normal?

12.9  13.7  14.1  14.2  14.5  14.5  14.6  14.7  15.1
15.2  15.3  15.3  15.3  15.3  15.5  15.6  15.6  15.8
16.0  16.0  16.2  16.2  16.3  16.4  16.5  16.6  16.6
16.6  16.8  17.0  17.0  17.2  17.4  17.4  17.9  18.4

A histogram of these data is shown below. It seems roughly symmetric and bell shaped.

The mean and standard deviation of these data are $\bar{x} = 15.825$ and $s_x = 1.217$.

$\bar{x} \pm 1s_x$ = (14.608, 17.042)    24 of 36 = 66.7%
$\bar{x} \pm 2s_x$ = (13.391, 18.259)    34 of 36 = 94.4%
$\bar{x} \pm 3s_x$ = (12.174, 19.476)    36 of 36 = 100%

These percents are quite close to what we would expect based on the 68-95-99.7 rule. Combined with the graph, this gives good evidence that this distribution is close to Normal.

Ex. 2.2-10: Assessing Normality

Alternate Example: Here is a Normal probability plot (also called a Normal quantile plot) of the refrigerator data from the example above. It is quite linear, supporting our earlier decision that the distribution is close to Normal.

Ex. 2.2-11: State Land Areas (Assessing Normality)

The histogram and Normal probability plot below display the land areas for the 50 states. Is this distribution approximately Normal?

Both the histogram and Normal probability plot indicate that this distribution is strongly skewed to the right.
In particular, there is one state whose area is much larger than we would expect if the distribution were approximately Normal.

Ex. 2.2-12: NBA Free Throw Percentage (Assessing Normality)

This is an example of a distribution that is skewed to the left. Notice that the lowest free throw percentages are to the left of what we would expect and the highest free throw percentages are not as far to the right as we would expect.

Ex. 2.2-13: How linear should a Normal probability plot actually be?

The screen shots below show the results of taking random samples from a Normal distribution and generating a Normal probability plot of each sample. As you can see, none of the plots were perfectly linear, even though each sample came from a Normal population.
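
The experiment in Ex. 2.2-13 can also be tried in software. The sketch below is only an illustration and is not how the screen shots were produced. It assumes Python 3.8 or later, and the sample size (20), the number of samples (5), the seed, the plotting positions (i - 0.5)/n, and the helper names are all choices made for this example. Instead of drawing each plot, it reports the correlation between the ordered data and their expected z-scores. A correlation of exactly 1 would correspond to a perfectly linear Normal probability plot.

```python
import random
from statistics import NormalDist, mean


def normal_scores(n):
    """Expected z-scores for n ordered observations.

    Uses the plotting positions (i - 0.5) / n, one common convention
    (an assumption here; software packages differ slightly).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def plot_linearity(sample):
    """Correlation between the sorted data and their expected z-scores.

    These are the two coordinates of each point on a Normal probability plot.
    """
    data = sorted(sample)
    return correlation(normal_scores(len(data)), data)


rng = random.Random(1)  # arbitrary seed so the run is repeatable
results = []
for trial in range(5):
    # Each sample really does come from a Normal population.
    sample = [rng.gauss(0, 1) for _ in range(20)]
    results.append(plot_linearity(sample))

for trial, r in enumerate(results, start=1):
    print(f"Sample {trial}: r = {r:.3f}")
```

Each correlation should come out high but below 1, which is the numerical version of the point made above: some wobble in a Normal probability plot is expected even when the population is exactly Normal.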
