STANDARD DEVIATION
CHAPTER 6

To determine who should get the GOLD MEDAL, the performances in all 7 events somehow need to be combined into ONE score! How do they do this? Some events are recorded in minutes and seconds (the runs) and some are recorded in meters (the throwing and jumping). WHAT DO YOU THINK?

The Standard Deviation as a Ruler
• Use standard deviation when comparing unlike measures.
• Standard deviation is the most common measure of spread.
• Remember: standard deviation is the square root of the variance.

Standardizing
• We standardize to eliminate units.
• A standardized value is found by subtracting the mean from the value and dividing by the standard deviation. It has no units.
• A z-score measures the distance of each data value from the mean in standard deviations.
• Negative z-score: data value below the mean
• Positive z-score: data value above the mean

Benefits of Standardizing
• Standardized values are converted to the standard statistical unit of standard deviations from the mean (z-scores).
• Values that are measured on different scales or in different units can now be compared.

Example: Judges will use z-scores to determine the winner of a heptathlon.
• Bacher ran the 800-m in 129 seconds, 1.6 standard deviations better than the mean.
• Her long jump of 5.84 m had a z-score of −0.44 (it fell below the mean).
• 1.6 − 0.44 = 1.16
• Judges would do this for all athletes and events, and the highest total score wins.

Shifting Data
• Adding or subtracting a constant amount to each value adds or subtracts the same constant to:
– the mean and median
– the maximum, minimum, and quartiles
• The spread does not change because the distribution is simply shifting. The range, IQR, and standard deviation remain the same.
• Recap: Adding a constant to every data value adds the same constant to measures of center and percentiles, but leaves measures of spread unchanged.

Rescaling Data
• Rescaling data is multiplying or dividing all values by the same number. This changes the measurement units. Ex. Feet to inches (multiply by 12)
• When we divide or multiply all the data values by a constant, both measures of location (mean and median) and measures of spread (range, IQR, and standard deviation) are divided or multiplied by that same value.

Back to z-scores
• Standardizing data into z-scores shifts the data by the mean and rescales them by the standard deviation.
• Standardizing:
– does not change the shape of the distribution of a variable
– changes the center by making the mean 0
– changes the spread by making the standard deviation 1

The First Three Rules for Working with Normal Models
• Make a picture!
• Make a picture!
• Make a picture!

The Normal Model
• There are a lot of variables in the real world that have similar-looking distributions…

The Normal Model
• The Normal Model is a simplified description of symmetric, mound-shaped data (often referred to as a “Bell-Shaped Curve”)

The Normal Model
• We use a normal model to mathematically smooth the distribution (more on this later)

The Normal Model
• When can we use the normal model?
– The data is unimodal
– The data is symmetric
– The data is mound shaped
– The data has no outliers
• You need to check (with a histogram, a Normal Probability Plot, or something similar) that the data is nearly normal before you can use the Normal Distribution to model your data

Example
Is the normal model appropriate? Why or why not?

Example
Is the normal model appropriate? Why or why not?
Mean = 19.810 mm
StDev = 112.016 mm
Min = 0.316 mm
LQ = 1.872 mm
Med. = 3.589 mm
UQ = 9.457 mm
Max = 2682.000 mm

The Normal Model
• Whether or not the normal model is appropriate for a particular variable is based only on the shape of the variable’s distribution…
• A variable’s distribution can have any center and any spread and still fit a normal model. How is this possible?

The Normal Model
• There’s a different normal model for every center and every spread.
• Center
– The center of the normal model is denoted μ (say “mu”)
– μ is the mean and the median of the normal model
• Spread
– The spread of the normal model is denoted σ (say “sigma”)
– Smaller σ makes the curve “tall and skinny”; larger σ makes the curve “flat and wide”
• Write N(μ, σ)
– This says that we have a normal model with a mean of μ and a std. dev. of σ

The Normal Model
• We could work with all of the different normal models, but then we’d have to keep track of a lot of stuff. It is much, much easier to convert all the different normal models into one Standard Normal Model
• Standardize our model by transforming the data: z = (y − μ) / σ
• Standard Normal Model:
– N(0,1)
– Mean = 0
– Standard Deviation = 1

The Normal Model
• Once standardized…
• We’ve simply shifted and rescaled the distributions.
• Think of it as adding a z-scale/z-axis which always has a mean of 0 and a standard deviation of 1.

Notes on the Normal Model
• The curve is always above the x-axis
– Never reaches the x-axis
– Continues to infinity and −infinity
• Total area under the curve = 1
• It is perfectly symmetric
– Mean = Median
• For any segment of the normal curve we know the area underneath the curve within that segment

The Normal Model
• Why use the normal model?
– It’s simpler than real data.
– The data from a normal model is distributed in a predictable pattern
– For any segment of the standard normal model we know the area underneath the curve.
– The area underneath the curve tells us what proportion of the data has values that correspond to our segments, thus giving us the probability that a data point will be in that segment

The Normal Model: 68-95-99.7 Rule
• (about) 68% of the data is between −1 and 1 standard deviations from the mean
• (about) 95% of the data is between −2 and 2 standard deviations from the mean
• (about) 99.7% of the data is between −3 and 3 standard deviations from the mean

The Normal Model: 68-95-99.7 Rule
• 68% of observations are within 1 σ of the mean μ
• For N(0,1), 68% of the observations are between −1 and 1

The Normal Model: 68-95-99.7 Rule
• 95% of observations are within 2 σ of the mean μ
• For N(0,1), 95% of the observations are between −2 and 2

The Normal Model: 68-95-99.7 Rule
• 99.7% of observations are within 3 σ of the mean μ
• For N(0,1), 99.7% of the observations are between −3 and 3

The Normal Model: 68-95-99.7 Rule
The heights of men are thought to follow a Normal Model with a mean of 70 in. and a std. dev. of 3 in., N(70, 3). Draw a picture of this Normal Model. Clearly label what the 68-95-99.7 Rule indicates about men’s heights.

Example: How to proceed
Step 1: Check to see if the normal model is appropriate.
Step 2: Draw the standard normal picture.
Step 3: Find the mean and standard deviation of the sample data.
Step 4: Make the conversion from the standard normal to the data and draw the new axis (directly below the standard normal picture) using this formula: z = (y − μ) / σ
Step 5: Use the picture to answer any questions.

Example
1. What percentage of men have heights greater than 70 inches?
2. What percentage of men have heights less than 64 inches?
3. What percentage of men have heights greater than 73 inches?
4. 0.15% of men have heights above what value?

Example
De Veaux, Velleman & Bock, Ch. 6, #19: EPA fuel economy estimates for automobile models tested recently predicted a mean of 24.8 mpg and a standard deviation of 6.2 mpg for highway driving.
Assume that a Normal model can be applied.
1. What does the 68-95-99.7 rule say about the distribution of autos’ fuel efficiency?
2. About what percent of autos should get more than 31 mpg?
3. Describe the gas mileage of the worst 2.5% of all the cars.
4. If I pick an automobile at random, what is the probability that I select a car that gets more than 37.2 mpg?
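
One way to check work on a problem like this is to compute the z-scores and the areas under the curve directly. The sketch below is only an illustration, not part of the original slides: the function names `z_score`, `normal_cdf`, and `proportion_above` are made up for this example, and the area under the standard normal curve is computed with the error function from Python's standard library. It assumes, as the problem does, that N(24.8, 6.2) is an appropriate model.

```python
import math

# Mean and standard deviation taken from the fuel economy example: N(24.8, 6.2)
MU = 24.8
SIGMA = 6.2


def z_score(y, mu, sigma):
    """Standardize y: its distance from the mean in standard deviations."""
    return (y - mu) / sigma


def normal_cdf(z):
    """Area under the standard normal curve N(0, 1) to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_above(y, mu=MU, sigma=SIGMA):
    """Proportion of the N(mu, sigma) model that lies above the value y."""
    return 1.0 - normal_cdf(z_score(y, mu, sigma))


if __name__ == "__main__":
    # Area within 1, 2, and 3 standard deviations of the mean,
    # together with the matching mpg values on the data axis.
    for k in (1, 2, 3):
        within = normal_cdf(k) - normal_cdf(-k)
        low, high = MU - k * SIGMA, MU + k * SIGMA
        print(f"within {k} SD ({low:.1f} to {high:.1f} mpg): {within:.4f}")

    # Upper-tail areas for the two mpg values asked about in the example.
    for mpg in (31.0, 37.2):
        z = z_score(mpg, MU, SIGMA)
        print(f"{mpg} mpg: z = {z:.2f}, proportion above = {proportion_above(mpg):.4f}")
```

The exact areas (about 0.6827, 0.9545, and 0.9973) are slightly different from the rounded 68-95-99.7 figures, so tail answers from the rule (16% above one standard deviation, 2.5% above two) will differ a little from the computed values (about 15.9% and 2.3%). Drawing the picture first is still the recommended starting point; the code is only a check.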