Unit 3 – Exploring and Understanding Data (Chapter 6 – The Standard Deviation as a Ruler and the Normal Model)

Key Terms:
- Shifting –
- Rescaling –
- Standardizing –
- Standardized Value –
- Normal Model –
- Parameter –
- Statistic –
- z-score –
- Standard Normal Model –
- 68-95-99.7 Rule –
- Normal percentile –
- Changing center and spread –

• The trick in comparing very different-looking values is to use ___________ ____________ as our rulers.
• The standard deviation tells us how the whole collection of values varies, so it’s a natural ruler for comparing an individual to a group.
• As the most common measure of variation, the standard deviation plays a crucial role in how we look at data.

• We compare individual data values to their ____________, relative to their standard deviation, using the following formula: z = (y − ȳ) / s
• We call the resulting values ____________ ____________, denoted as z. They can also be called ____________.
• Standardized values have no units.
• z-scores measure the distance of each data value from the mean in standard deviations.
• A negative z-score tells us that the data value is below the mean, while a positive z-score tells us that the data value is above the mean.
• Standardized values have been converted from their original units to the standard statistical unit of standard deviations from the mean.
• Thus, we can compare values that are measured on different scales, with different units, or from different populations.

• Shifting data:
  • Adding (or subtracting) a ____________ amount to each value just adds (or subtracts) the same constant to (from) the mean. This is true for the ____________ and other measures of position too.
  • In general, adding a constant to every data value adds the same constant to measures of ____________ and percentiles, but leaves measures of _____________ unchanged.
• Rescaling data:
  • When we ___________ or ____________ all the data values by any constant value, both measures of location (e.g., mean and median) and measures of spread (e.g., range, IQR, standard deviation) are multiplied or divided by that same value.
  - Example: Use the following summary statistics to create a new list of summary statistics after every data value has been multiplied by 2 and then increased by 14.

• Standardizing data into z-scores shifts the data by subtracting the mean and rescales the values by dividing by their standard deviation.
  • Standardizing into z-scores does not change the shape of the distribution.
  • Standardizing into z-scores changes the center by making the mean 0.

• A z-score gives us an indication of how unusual a value is because it tells us how far it is from the mean.
• Remember that a negative z-score tells us that the data value is below the mean, while a positive z-score tells us that the data value is above the mean.
• The larger a z-score is (negative or positive), the more unusual it is.
• Anything more than 2 standard deviations from the mean is ___________.

• There is no universal standard for z-scores, but there is a model that shows up over and over in Statistics.
• This model is called the _______________ (You may have heard of “bell-shaped curves.”).
• Normal models are appropriate for distributions whose shapes are unimodal and roughly symmetric.
• These distributions provide a measure of how extreme a z-score is.
• There is a Normal model for every possible combination of mean and standard deviation.

EXAMPLE: Using the data set 1, 1, 1, 1, 5, 6, 20, 30, do the following:
- Find the mean.
- Subtract the mean from each data value. What is your new mean?
- Divide each new value by “s”. What is your new standard deviation?

EXAMPLE: Mary took the SAT and scored 1750. The mean score for the SAT is 1500 with a standard deviation of 125. Mark took the ACT and scored 28. The mean score for the ACT is 26 with a standard deviation of 1.5. Who did better on their test?
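The data-set and SAT/ACT examples can be checked with a short Python sketch. It assumes that “s” means the sample standard deviation (n − 1 in the denominator), which is what `statistics.stdev` computes. The helper names `standardize` and `z_score` are illustrative only and do not come from the notes.

```python
from statistics import mean, stdev


def standardize(values):
    """Shift by the mean, then rescale by the sample standard deviation s."""
    center = mean(values)
    s = stdev(values)  # assumption: "s" is the sample SD (n - 1 denominator)
    return [(y - center) / s for y in values]


def z_score(value, center, spread):
    """Distance of one value from the mean, in standard deviations."""
    return (value - center) / spread


data = [1, 1, 1, 1, 5, 6, 20, 30]
z_values = standardize(data)

# After standardizing, the mean should be 0 and the SD should be 1
# (up to floating-point rounding).
print("mean of z-scores:", round(mean(z_values), 10))
print("SD of z-scores:  ", round(stdev(z_values), 10))

# Comparing scores from different scales: SAT versus ACT.
mary = z_score(1750, 1500, 125)  # (1750 - 1500) / 125 = 2.0
mark = z_score(28, 26, 1.5)      # (28 - 26) / 1.5, about 1.33
print("Mary:", mary, "Mark:", round(mark, 2))
```

Because both scores are now in the same unit (standard deviations from the mean), the larger z-score identifies the more impressive result relative to its own test.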
• We write ________________ to represent a Normal model with a mean of μ and a standard deviation of σ.
• We use Greek letters because this mean and standard deviation do not come from data; they are numbers (called ____________________) that specify the model.
• Summaries of data, like the sample mean and standard deviation, are written with Latin letters. Such summaries of data are called statistics.
• When we standardize Normal data, we still call the standardized value a z-score, and we write z = (y − μ) / σ
• Standardizing into z-scores changes the spread by making the standard deviation 1.

• When we use the Normal model, we are assuming the distribution is Normal.
• We cannot check this assumption in practice, so we check the following condition:
  • Nearly Normal Condition: The shape of the data’s distribution is unimodal and symmetric.
  • This condition can be checked with a histogram or a Normal probability plot (to be explained later).
• And, when we have data, make a histogram to check the Nearly Normal Condition to make sure we can use the Normal model to model the distribution.
• When you actually have your own data, you must check to see whether a Normal model is reasonable.
• Looking at a histogram of the data is a good way to check that the underlying distribution is roughly unimodal and symmetric.

• Normal models give us an idea of how extreme a value is by telling us how likely it is to find one that far from the mean.
• We can find these numbers precisely, but until then we will use a simple rule that tells us a lot about the Normal model…
• It turns out that in a Normal model:
  • about 68% of the values fall within one standard deviation of the mean;
  • about 95% of the values fall within two standard deviations of the mean; and,
  • about 99.7% (almost all!) of the values fall within three standard deviations of the mean.
• The following shows what the 68-95-99.7 Rule tells us: [figure]

EXAMPLE: A student calculated that it takes an average of 17 minutes, with a standard deviation of 3 minutes, to drive from home, park the car, and walk to an early morning class.
a. One day it took the student 21 minutes to get to class. How many standard deviations from the average is that?
b. Another day it took only 12 minutes for the student to get to class. What is the z-score?
c. Another day it took 17 minutes for the student to get to class. What is the z-score?

EXAMPLE: SAT Verbal scores are represented by N(500, 100). Sketch the model with the Empirical Rule labeled. Use it to answer the questions that follow.
a. Between what scores would you expect to find 68% of the data?
b. What score is the cutoff to be in the top 2.5%?

• When a data value doesn’t fall exactly 1, 2, or 3 standard deviations from the mean, we can look it up in a table of Normal percentiles.
• Table Z in Appendix E provides us with Normal percentiles, but many calculators and statistics computer packages provide these as well.
• Table Z is the standard Normal table. We have to convert our data to z-scores before using the table.
• Figure 6.5 shows us how to find the area to the left when we have a z-score of 1.80.

EXAMPLE: SAT Verbal scores are represented by N(500, 100). What proportion of SAT scores fall between 450 and 600?

• Sometimes we start with areas and need to find the corresponding z-score or even the original data value.

Example: What z-score represents the first quartile in a Normal model?
• Look in Table Z for an area of 0.2500.
• The exact area is not there, but 0.2514 is pretty close.
• This figure is associated with z = -0.67, so the first quartile is 0.67 standard deviations below the mean.

Some colleges only admit students who have an SAT Verbal score in the top 10%. What score would make you eligible?

Find the percent of observations that fall in the given intervals:
a. -0.7 < z < 2.4
b. z > 1.8
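Statistics software can stand in for Table Z. As one possible sketch, Python's standard library provides `statistics.NormalDist` (Python 3.8 or later): its `cdf` method gives the area to the left of a value, and `inv_cdf` goes from an area back to a value. The variable names below are illustrative and are not part of the notes.

```python
from statistics import NormalDist

standard = NormalDist()               # standard Normal model, N(0, 1)
sat = NormalDist(mu=500, sigma=100)   # SAT Verbal model, N(500, 100)

# 68-95-99.7 Rule: area within k standard deviations of the mean.
rule = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

# Area to the left of z = 1.80, the kind of lookup Table Z provides.
left_of_1_80 = standard.cdf(1.80)

# From an area back to a z-score: the first quartile has 25% below it.
q1_z = standard.inv_cdf(0.25)

# Proportion of SAT Verbal scores between 450 and 600.
between_450_600 = sat.cdf(600) - sat.cdf(450)

# Cutoff score for the top 10% (90% of scores fall below it).
top_10_cutoff = sat.inv_cdf(0.90)

# Areas for the two z-intervals in the last exercise.
area_middle = standard.cdf(2.4) - standard.cdf(-0.7)
area_upper = 1 - standard.cdf(1.8)

for k, area in rule.items():
    print(f"within {k} SD: {area:.4f}")
print(f"area left of z = 1.80: {left_of_1_80:.4f}")
print(f"first-quartile z: {q1_z:.3f}")
print(f"P(450 < score < 600): {between_450_600:.4f}")
print(f"top 10% cutoff: {top_10_cutoff:.1f}")
print(f"P(-0.7 < z < 2.4): {area_middle:.4f}")
print(f"P(z > 1.8): {area_upper:.4f}")
```

The software value for the first quartile, about -0.674, agrees with the z = -0.67 read from Table Z. Small differences from table answers are expected because the table rounds z to two decimal places.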
• A more specialized graphical display that can help you decide whether a Normal model is appropriate is the Normal probability plot.
• If the distribution of the data is roughly Normal, the Normal probability plot approximates a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal.
• Nearly Normal data have a histogram and a Normal probability plot that look somewhat like this example: [figure]
• A skewed distribution might have a histogram and Normal probability plot like this: [figure]

For each model, find the missing parameter:
1) μ = 1250, 35% below 1200, σ = ?
2) μ = 0.64, 12% above 0.70, σ = ?
3) σ = 0.5, 90% above 10.0, μ = ?
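One way to approach the missing-parameter problems is to turn the stated percentage into a z-score and then solve z = (y − μ) / σ for the unknown. The sketch below does this with `statistics.NormalDist` from Python's standard library. The function names `sigma_from_cutoff` and `mu_from_cutoff` are hypothetical helpers introduced only for illustration.

```python
from statistics import NormalDist

standard = NormalDist()  # standard Normal model, N(0, 1)


def sigma_from_cutoff(mu, cutoff, area_below):
    """Solve z = (cutoff - mu) / sigma for sigma, given the area below cutoff."""
    z = standard.inv_cdf(area_below)
    return (cutoff - mu) / z


def mu_from_cutoff(sigma, cutoff, area_below):
    """Solve z = (cutoff - mu) / sigma for mu, given the area below cutoff."""
    z = standard.inv_cdf(area_below)
    return cutoff - z * sigma


# 1) mu = 1250, 35% below 1200
sigma_1 = sigma_from_cutoff(1250, 1200, 0.35)

# 2) mu = 0.64, 12% above 0.70, so 88% lies below 0.70
sigma_2 = sigma_from_cutoff(0.64, 0.70, 1 - 0.12)

# 3) sigma = 0.5, 90% above 10.0, so 10% lies below 10.0
mu_3 = mu_from_cutoff(0.5, 10.0, 1 - 0.90)

print(f"1) sigma is about {sigma_1:.1f}")
print(f"2) sigma is about {sigma_2:.3f}")
print(f"3) mu is about {mu_3:.2f}")
```

Note that "above" percentages are converted to areas to the left before the lookup, which mirrors how Table Z is organized. Answers worked from the table may differ slightly because of rounding.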