* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Suppose we measured the right foot length of 30 teachers and graphed the results. Assume the first person had a 10 inch foot. Number of People with that Shoe Size If our second subject had a 9 inch foot, we would add her to the graph. As we continued to plot foot lengths, a pattern would begin to emerge. 8 7 6 5 4 3 2 1 . 4 5 6 7 8 9 10 11 12 Length of Right Foot 13 14 Number of People with that Shoe Size Notice how there are more people (n=6) with a 10 inch right foot than any other length. Notice also how as the length becomes larger or smaller, there are fewer and fewer people with that measurement. This is a characteristics of many variables that we measure. There is a tendency to have most measurements in the middle, and fewer as we approach the high and low extremes. 8 7 6 5 4 3 2 1 If we were to connect the top of each bar, we would create a frequency polygon. . 4 5 6 7 8 9 10 11 12 Length of Right Foot 13 14 Number of People with that Shoe Size You will notice that if we smooth the lines, our data almost creates a bell shaped curve. 8 7 6 5 4 3 2 1 4 5 6 7 8 9 10 11 12 13 14 Length of Right Foot You will notice that if we smooth the lines, our data almost creates a bell shaped curve. Number of People with that Shoe Size This bell shaped curve is known as the “Bell Curve” or the “Normal Curve.” 8 7 6 5 4 3 2 1 4 5 6 7 8 9 10 11 12 13 14 Length of Right Foot Number of Students Whenever you see a normal curve, you should imagine the HISTOGRAM within it. 9 8 7 6 5 4 3 2 1 12 13 14 15 16 17 18 19 20 21 22 Points on a Quiz What can you say about the Mean, Median and Mode of a Normal Curve? So WHAT ABOUT THE STANDARD DEVIATION???? The inflection points (where the curve starts to flatten out) represent the width of the standard deviation μ-σ AP Statistics, Section 2.1, Part 1 μ μ+σ 7 Normal distributions are a family of distributions that have the same general bell shape. They are symmetric (the left side is an exact mirror of the right side) with scores more concentrated in the middle than in the tails. Examples of normal distributions are shown to the right. Notice that they differ in how spread out they are although the area under each curve is always the same and equal to 1! The mean and standard deviation are useful ways to describe a set of scores as they also determine the shape of the bell curve. If the scores are grouped closely together, the curves will have a smaller standard deviation than if they are spread farther apart. Large Standard Deviation Small Standard Deviation . Same Means Different Standard Deviations Different Means Same Standard Deviations Different Means Different Standard Deviations THE NORMAL EQUATION Notation: N(μ,σ) is a normal distribution N(0,1) is the standard normal distribution “Standardizing” is the process of doing a linear translation from N(μ,σ) into N(0,1) IN GENERAL, A Normal Distribution is: Symmetrical bell-shaped (unimodal) density curve Above the horizontal axis Each curve is specified by its Mean & Standard Deviation: N(m, s) The transition points occur at m + s Probability is calculated by finding the area under the curve As s increases, the curve flattens & spreads out As s decreases, the curve gets taller and thinner All Normal distributions can be “standardized” and are therefore the same if we measure in units of size σ from the mean µ as center. Definition: The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1. If a variable x has any Normal distribution N(µ,σ) with mean µ and standard deviation σ, then the standardized variable z x -m s has the standard Normal distribution, N(0,1). Normal Distributions Standard Normal Distribution + The The EMPIRICAL Rule Normal models give us an idea of how extreme a value is by telling us how likely it is to find one that far from the mean. We can find these numbers precisely, but until then we will use a simple rule that tells us a lot about the Normal model… It turns out that in a Normal model: about 68% of the values fall within one standard deviation of the mean; about 95% of the values fall within two standard deviations of the mean; and, about 99.7% (almost all!) of the values fall within three standard deviations of the mean. Copyright © 2010, 2007, 2004 Pearson Education, Inc. 68-95-99.7 (or Empirical) Rule AP Statistics, Section 2.1, Part 1 15 Example, p. 113 Normal Distributions The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for 7th grade students in Gary, Indiana, is close to Normal. Suppose the distribution is N(6.84, 1.55). a) Sketch the Normal density curve for this distribution. b) What percent of ITBS vocabulary scores are less than 3.74? c) What percent of the scores are between 5.29 and 9.94? Finding Normal Percentiles by Hand When a data value doesn’t fall exactly 1, 2, or 3 standard deviations from the mean, we can look it up in a table of Normal percentiles. A Table of Standard Normal Probabilities provides us with normal percentiles, but many calculators and statistics computer packages provide these as well. Copyright © 2010, 2007, 2004 Pearson Education, Inc. The Standard Normal Table Definition: The Standard Normal Table Table A is a table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z. Suppose we want to find the proportion of observations from the standard Normal distribution that are less than 0.81. We can use Table A: Z .00 .01 .02 0.7 .7580 .7611 .7642 0.8 .7881 .7910 .7939 0.9 .8159 .8186 .8212 Copyright © 2010, 2007, 2004 Pearson Education, Inc. Normal Distributions Because all Normal distributions are the same when we standardize, we can find areas under any Normal curve from a single table. P(z < 0.81) = .7910 + Example, p. 117 Finding Areas Under the Standard Normal Curve Normal Distributions Find the proportion of observations from the standard Normal distribution that are between -1.25 and 0.81. Can you find the same proportion using a different approach? 1 - (0.1056+0.2090) = 1 – 0.3146 = 0.6854 Strategies for finding probabilities or proportions in normal distributions 1 2 3 4 5 6 Express the problem in terms of the observed variable x Make a graph of the data to check the Nearly Normal Condition to make sure we can use the Normal distribution to model the distribution. Draw a picture of the distribution and shade the area of interest under the curve. Standardize x to restate the problem in terms of a standard Normal variable z. Use Table A or the calculator and the fact that the total area under the curve is 1 to find the required area under the standard Normal curve. Conclude: Write your conclusion in the context of the problem. Copyright © 2010, 2007, 2004 Pearson Education, Inc. + Distribution Calculations When Tiger Woods hits his driver, the distance the ball travels can be described by N(304, 8). What percent of Tiger’s drives travel between 305 and 325 yards? When x = 305, z = 305 - 304 0.13 8 When x = 325, z = 325 - 304 2.63 8 Normal Distributions Normal Using Table A, we can find the area to the left of z=2.63 and the area to the left of z=0.13. 0.9957 – 0.5517 = 0.4440. About 44% of Tiger’s drives travel between 305 and 325 yards. + Cautions We should only use the z-table when the distributions are normal, and data has been standardized The z-table only gives the amount of data found below the z-score, THAT IS THE AREA TO THE LEFT OF THE z-score! If you want to find the portion found above the z-score, subtract the probability found on the table from 1. 22 AP Statistics, Section 2.2, Part 1 + EXAMPLE: Find the proportion of observations from the standard Normal distribution the is greater than .81 + From Percentiles to Scores: z in Reverse Sometimes we start with areas and need to find the corresponding z-score or even the original data value. Example: What z-score represents the first quartile in a Normal model? + From Percentiles to Scores: z in Reverse (cont.) Look in theTable for an area of 0.2500. The exact area is not there, but 0.2514 is pretty close. This figure is associated with z = –0.67, so the first quartile is 0.67 standard deviations below the mean. Will my calculator do any of this normal stuff? • Normalpdf – use for graphing ONLY • Normalcdf – will find probability of area from lower bound to upper bound • Invnorm (inverse normal) – will find zscore for probability • THESE COMMANDS ARE FOUND IN THE DISTRIBUTION MENU Finding Normal Percentiles using the calculator Go to the Distribution key on your calculator Find NORMCDF Use the key stroke: NORMCDF(min z,max z) Copyright © 2010, 2007, 2004 Pearson Education, Inc. Example Men’s heights are N(69,2.5). What percent of men are taller than 68 inches? Copyright © 2010, 2007, 2004 Pearson Education, Inc. AP Statistics, 28 Section 2.2, Part 1 Working with intervals What proportion of men are between 68 and 70 inches tall? AP Statistics, Section 2.2, Copyright © 2010, 2007, 2004 Pearson Education, Inc.1 Part 29 INVERSE NORM We can also use the calculator to also find the zscore for a particular area: INVNORM(prop of area to the left) Note: when using the calculator, entering μ,σ will “un-standardize” the data Copyright © 2010, 2007, 2004 Pearson Education, Inc. Working backwards How tall must a woman be in order to be in the top 15% of all women? Copyright © 2010, 2007, 2004 Pearson Education, Inc. AP Statistics, 31 Section 2.2, Part 1 Working backwards How tall must a man be in order to be in the 90th percentile? Copyright © 2010, 2007, 2004 Pearson Education, Inc. AP Statistics, 32 Section 2.2, Part 1 Working backwards What range of values make up the middle 50% of men’s heights? Copyright © 2010, 2007, 2004 Pearson Education, Inc. AP Statistics, 33 Section 2.2, Part 1 + REMINDER: CAUTION!!! Whether using the calculator or Table, we should only use the ztable when the distributions are normal, and data has been standardized 34 AP Statistics, Section 2.2, Part 1 Therefore, we need a strategy for assessing Normality. Plot the data. •Make a dotplot, stemplot, or histogram and see if the graph is approximately symmetric and bell-shaped. •Check the 1-Vars Stats & compare mean & median Check whether the data follow the 68-95-99.7 rule. •Count how many observations fall within one, two, and three standard deviations of the mean and check to see if these percents are close to the 68%, 95%, and 99.7% targets for a Normal distribution. + Normality Normal Distributions Assessing + Are You Normal? How Can You Tell? A more specialized graphical display that can help you decide whether a Normal model is appropriate is the Normal probability plot. If the distribution of the data is roughly Normal, the Normal probability plot approximates a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal. Most software packages can construct Normal probability plots. These plots are constructed by plotting each observation in a data set against its corresponding percentile’s z-score. Interpreting Normal Probability Plots If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot. Normal Distributions Probability Plots + Normal Are You Normal? How Can You Tell? (cont.) Nearly Normal data have a histogram and a Normal probability plot that look somewhat like this example: Copyright © 2010, 2007, 2004 Pearson Education, Inc. Are You Normal? How Can You Tell? (cont.) A skewed distribution might have a histogram and Normal probability plot like this: Copyright © 2010, 2007, 2004 Pearson Education, Inc. Are Walter Johnson’s Wins Normal? 5, 14, 13, 25, 25, 33, 36, 28, 27, 25, 23, 23, 20, 8, 17, 15, 17, 23, 20, 15, 5 into list L1 Run “1-Var Stats” Is the data set symmetric? Where do you look? AP Statistics, Section 2.2, Copyright © 2010, 2007, 2004 Pearson Education, Inc.2 Part 40 Are Walter Johnson’s Wins Normal? Look also at boxplot Is the data set symmetric? AP Statistics, Section 2.2, Copyright © 2010, 2007, 2004 Pearson Education, Inc.2 Part 41 68-95-99.7 Rule? You can use the 68-95-99.7 rule with a histogram to see if the distribution roughly fits the rule. You will also want to check the Normal Probability Plot. Copyright © 2010, 2007, 2004 Pearson Education, Inc. AP Statistics, 42 Section 2.2, Part 2 In Summary: What Can Go Wrong? Don’t use a Normal model when the distribution is not unimodal and symmetric. Copyright © 2010, 2007, 2004 Pearson Education, Inc. What Can Go Wrong? (cont.) Don’t use the mean and standard deviation when outliers are present—the mean and standard deviation can both be distorted by outliers. Don’t round off too soon. Don’t round your results in the middle of a calculation. Don’t worry about minor differences in results. Copyright © 2010, 2007, 2004 Pearson Education, Inc. What have we learned? The story data can tell may be easier to understand after shifting or rescaling the data. Shifting data by adding or subtracting the same amount from each value affects measures of center and position but not measures of spread. Rescaling data by multiplying or dividing every value by a constant changes all the summary statistics—center, position, and spread. Copyright © 2010, 2007, 2004 Pearson Education, Inc. What have we learned? (cont.) We’ve learned the power of standardizing data. Standardizing uses the SD as a ruler to measure distance from the mean (z-scores). With z-scores, we can compare values from different distributions or values based on different units. z-scores can identify unusual or surprising values among data. Copyright © 2010, 2007, 2004 Pearson Education, Inc. What have we learned? (cont.) We’ve learned that the 68-95-99.7 Rule can be a useful rule of thumb for understanding NORMAL distributions: For data that are unimodal and symmetric, about 68% fall within 1 SD of the mean, 95% fall within 2 SDs of the mean, and 99.7% fall within 3 SDs of the mean. Copyright © 2010, 2007, 2004 Pearson Education, Inc. What have we learned? (cont.) We see the importance of Thinking about whether a method will work: Normality Assumption: We sometimes work with Normal tables. These tables are based on the Normal model. Data can’t be exactly Normal, so we check the Nearly Normal Condition by making a histogram (is it unimodal, symmetric and free of outliers?) or a normal probability plot (is it straight enough?). Copyright © 2010, 2007, 2004 Pearson Education, Inc.