Sec 4.4 Notes - Interpreting Center and Variability: Chebyshev's Rule, the Empirical Rule, and z Scores

*Using standard deviation as a ruler allows us to compare values that are measured on different variables, with different scales, with different units, or for different populations

Chebyshev's Rule:
- Applicable to all distributions
- For any $k > 1$, at least $1 - 1/k^2$ of the values fall within $k$ standard deviations of the mean (at least 75% within 2, at least about 89% within 3)
- Notice "at least" gives room for error (the actual percentage could be much more)
- Since data can be skewed, it is not appropriate to divide the percentages in half based on the mean

The Empirical Rule (68-95-99.7 Rule):
- Normal models give us an idea of how extreme a value is by telling us how likely it is to find one that far from the mean
- About 68% of the values fall within 1 standard deviation of the mean; about 95% fall within 2 standard deviations of the mean; and about 99.7% fall within 3 standard deviations of the mean

**Just Checking: pg. 109
- As a group, the Dutch are among the tallest people in the world. The average Dutch man is 184 cm tall, just over 6 feet (and the average Dutch woman is 170.8 cm tall, just over 5'7"). If a Normal model is appropriate and the standard deviation for men is about 8 cm, what percentage of all Dutch men will be over 2 meters (6'6") tall?
- Suppose it takes you 20 minutes, on average, to drive to school, with a standard deviation of 2 minutes. Suppose a Normal model is appropriate for the distribution of driving times.
  a) How often will you arrive at the school in less than 22 minutes?
  b) How often will it take you more than 24 minutes?
  c) Do you think the distribution of your driving times is unimodal and symmetric?
  d) What does this say about the accuracy of your predictions? Explain.

**When working with Normal models, always draw a picture!!!
**And don't forget about checking the Nearly Normal Condition!!!
**Step-by-Step: pg. 110-111 - BVD DO THIS!!!
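The two rules above can be compared numerically. The following is a minimal sketch, not part of the original notes: it uses `NormalDist` from Python's standard `statistics` module, and the helper names `chebyshev_lower_bound` and `normal_fraction_within` are made up for illustration. It assumes, as the problems do, that a Normal model is appropriate.

```python
from statistics import NormalDist


def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of values within k SDs of the mean, for ANY distribution."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is only informative for k > 1")
    return 1 - 1 / k**2


def normal_fraction_within(k: float) -> float:
    """Fraction of a Normal model that lies within k SDs of its mean."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(k) - standard.cdf(-k)


# Chebyshev's "at least" guarantee versus the Normal model for k = 2 and k = 3.
for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), round(normal_fraction_within(k), 3))

# Dutch men: N(184 cm, 8 cm). 200 cm is 2 SDs above the mean.
dutch_men = NormalDist(mu=184, sigma=8)
share_over_2m = 1 - dutch_men.cdf(200)      # about 0.023

# Driving times: N(20 min, 2 min).
driving = NormalDist(mu=20, sigma=2)
share_under_22 = driving.cdf(22)            # about 0.84
share_over_24 = 1 - driving.cdf(24)         # about 0.023

print(round(share_over_2m, 3), round(share_under_22, 2), round(share_over_24, 3))
```

The 68-95-99.7 Rule gives about 2.5% for the two upper-tail questions, because it rounds the area within 2 standard deviations to 95%; the model's own value is closer to 2.3%.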
Sketch Normal models using the 68-95-99.7 Rule:
- Birthweights of babies, N(7.6 lb, 1.3 lb)
- ACT scores at a certain college, N(12.2, 4.4)

Z-Scores:
- z-scores (or standardized values) use the mean and standard deviation to compare data with different units: $z = \frac{y - \bar{y}}{s}$
- A z-score tells us how many standard deviations above or below the mean a data value is
- z-scores have no units
- Positive z-scores = above the mean; negative z-scores = below the mean
- The bigger the absolute value of a z-score (the further from the mean), the more unusual the data value is

**Just Checking - pg. 104
Your Statistics teacher has announced that the lower of your two tests will be dropped. You got a 90 on test 1 and an 80 on test 2. You're all set to drop the 80 until she announces that she grades "on a curve." She standardized the scores in order to decide which is the lower one. The mean on the first test was 88 with a standard deviation of 4, and the mean on the second was 75 with a standard deviation of 5.
a) Which one will be dropped?
b) Does this seem "fair"?

Shifting Data:
- When we add (or subtract) a constant to each value, all measures of position (center, percentiles, min, max) increase (or decrease) by that same constant
- The distribution just shifts; the shape and spread are not affected

Rescaling Data:
- When we multiply (or divide) all the data values by any constant, all measures of position (such as the mean, median, and percentiles) and measures of spread (such as the range, the IQR, and the standard deviation) are multiplied (or divided) by that same constant

**Just Checking - pg. 106
In 1995 the Educational Testing Service (ETS) adjusted the scores of the SAT tests. Before ETS re-centered the SAT Verbal test, the mean of all test scores was 450.
a) How would adding 50 points to each score affect the mean?
b) The standard deviation was 100 points. What would the standard deviation be after adding 50 points?
c) Suppose we drew boxplots of test takers' scores a year before and a year after the re-centering.
How would the boxplots of the two years differ?

A company manufactures wheels for roller blades. The diameter of the wheels has a mean of 3 inches and a standard deviation of 0.1 inches. Because so many of its customers use the metric system, the company decided to report their production statistics in millimeters (1 inch = 25.4 mm). They report that the standard deviation is now 2.54 mm. A corporate executive is worried about this increase in variation. Should they be concerned? Explain.

Z-scores, again…
- When finding z-scores, we shift the data by the mean and rescale by the standard deviation
- Subtracting the mean of the data from each value shifts the mean of the distribution to 0
- Dividing each value by the standard deviation also divides the standard deviation by itself, which makes the new standard deviation 1
- Shape: unchanged
- Center: $\bar{y} = 0$
- Spread: $s = 1$

**Step-by-Step: pg. 107 - BVD

Normal Model:
- Appropriate for distributions whose shapes are unimodal and roughly symmetric
- Written $N(\mu, \sigma)$ to represent a Normal model with a mean of $\mu$ and a standard deviation of $\sigma$
- The symbols are Greek because they are not numerical summaries of data; they are part of the model. We call these parameters
- For the Normal model: $z = \frac{y - \mu}{\sigma}$
- Standard Normal model (or standard Normal distribution): the Normal model with mean 0 and standard deviation 1
- DATA MUST BE UNIMODAL AND SYMMETRIC!!!
- Nearly Normal Condition: the shape of the data's distribution is unimodal and symmetric. Check this by making a histogram or a Normal probability plot (explained later)
- NEVER use the model without checking whether the condition is satisfied.

Examples -
1. Suppose the class took a 40-point quiz. Results show a mean score of 30, median 32, IQR 8, SD 6, min 12, and Q1 27. (Suppose YOU got a 35.) What happens to each of the statistics if…
- I decide to weight the quiz as 50 points, and will add 10 points to every score. Your score is now 45.
- I decide to weight the quiz as 80 points, and double each score.
Your score is now 70.
- I decide to count the quiz as 100 points; I'll double each score and add 20 points. Your score is now 90.

Statistic     Original   y+10   2y   2y+20
Mean
Median
IQR
SD
Minimum
Q1
Your score

2. Let's talk about scoring the decathlon. Silly example, but suppose two competitors tie in each of the first eight events. In the ninth event, the high jump, one clears the bar 1 in. higher. Then in the 1500-meter run the other one runs 5 seconds faster. Who wins? It boils down to knowing whether it is harder to jump an inch higher or run 5 seconds faster. We have to be able to compare two fundamentally different activities involving different units. Standard deviations to the rescue! If we knew the mean performance (by world-class athletes) in each event, and the standard deviation, we could compute how far each performance was from the mean in SD units (called z-scores).

So consider the three athletes' performances shown below in a three-event competition. Note that each placed first, second, and third in an event. Who gets the gold medal? Who turned in the most remarkable performance of the competition?

Events
Competitor   100 m Dash   Shot Put   Long Jump
A            10.1 sec     66'        26'
B            9.9 sec      60'        27'
C            10.3 sec     63'        27'3"
Mean         10 sec       60'        26'
St Dev       0.2 sec      3'         6"

Finding Normal Percentiles by Hand:
*Use when you don't have a calculator
Z Table - pg. A78-A79
- When the value doesn't fall exactly 1, 2, or 3 standard deviations from the mean, look it up in a table of Normal percentiles (Z table)
- Convert to z-scores before using the table
- Find the first two digits in the vertical column to the right (or left)
- Find the third digit in the top row
- The table gives the percent of data to the left of (below) the z-score
- If you are looking for the percent of data above a z-score, or between two z-scores, you will have to subtract to get the value you want

Finding Normal Percentiles Using the Calculator:
**Step-by-Step: pg. 113

From Percentiles to z-scores:
- Sometimes you want to know what the cutoff value is for a certain percentile (e.g., what SAT score would you need to score in the 90th percentile?)
- Using the table:
  o Find the area (percent) in the table
    *for problems about things like the "top 15%" you will need to look up 1 minus that percent (1 - 0.15 = 0.8500)
  o If you can't find the exact value, take the closest one
  o Then look to the side and top to get the z-score
  o You often need to convert the z-score back to a raw data value (just plug the z-score into the z-score equation and solve for y)
**Step-by-Step: pg. 114-117

Normal Probability Plot:
- This is a way to check the Nearly Normal Condition
- If the distribution of the data is roughly Normal, the plot is roughly a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal
- This plot is usually able to show deviations from Normality more clearly than the corresponding histogram, but it's usually easier to understand how a distribution fails to be Normal by looking at its histogram
  *sometimes you will need to look at both

**TI Tips: pg. 119
- Turn a STATPLOT On
- Choose the last graph icon
- Specify your data list and which axis you want the data on (often Y)
- Specify the Mark you want the plot to use
- Hit ZoomStat

Suggested Practice (New Book): pg. 184 #4.38, 4.39, 4.41-4.44, 4.46, 4.48, 4.50, 4.51, 4.52
Suggested Practice (Old Book): Ch. 6 #2, 6, 10-22 even, 26-38 even, 42, 46
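The z-score formula and the percentile-to-cutoff steps from these notes can be sketched in Python as a check on hand or calculator work. This is only an illustration: `z_score` and `cutoff_for_percentile` are hypothetical helper names, `NormalDist` comes from the standard `statistics` module, and the N(500, 100) SAT model is an assumed example, since the notes give no mean or standard deviation for that question.

```python
from statistics import NormalDist


def z_score(y: float, mean: float, sd: float) -> float:
    """How many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


def cutoff_for_percentile(p: float, mean: float, sd: float) -> float:
    """Raw value below which a fraction p of a Normal model falls."""
    z = NormalDist().inv_cdf(p)  # the Z table read in reverse: area -> z
    return mean + z * sd         # solve z = (y - mean) / sd for y


# Test scores from the notes: 90 on test 1 (mean 88, SD 4), 80 on test 2 (mean 75, SD 5).
z_test1 = z_score(90, 88, 4)   # 0.5
z_test2 = z_score(80, 75, 5)   # 1.0

# "Top 15%" means looking up an area of 1 - 0.15 = 0.85 to the left.
top_15_z = NormalDist().inv_cdf(1 - 0.15)

# ASSUMPTION: a hypothetical SAT model N(500, 100), not taken from the notes.
sat_90th = cutoff_for_percentile(0.90, mean=500, sd=100)

print(z_test1, z_test2, round(top_15_z, 2), round(sat_90th))
```

Relative to its class, the 80 on test 2 is the stronger score (one full standard deviation above the mean), which is the kind of comparison that standardizing makes possible.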