Chapter 2.2 NORMAL DISTRIBUTIONS

Lesson Objectives
• ESTIMATE the relative locations of the median and mean on a density curve.
• ESTIMATE areas (proportions of values) in a Normal distribution.
• FIND the proportion of values in a specified interval, or the value that corresponds to a given percentile, in any Normal distribution:
  – using the Empirical Rule
  – using the standard Normal distribution
• DETERMINE whether a distribution of data is approximately Normal from graphical and numerical evidence.

Density Curve
In Chapter 1, you learned how to plot a dataset to describe its shape, center, spread, etc. Sometimes the overall pattern of a large number of observations is so regular that we can describe it using a smooth curve.
• Density curve: an idealized description of the overall pattern of a distribution. The total area underneath it is 1, representing 100% of the observations.

Density Curves
• Density curves come in many different shapes: symmetric, skewed, uniform, etc.
• The area of a region under a density curve represents the % of observations that fall in that region.
• The median of a density curve is the point that cuts the area in half.
• The mean of a density curve is its “balance point.”

Density Curves
• What % of the observations represented by the following density curve fall between .4 and .6?
[Figure: density curve with height 1 marked on the vertical axis and .4, .6, and 1 marked on the horizontal axis]

Normal Distributions
• One special type of density curve is the Normal curve.
• These density curves are symmetric, single-peaked, and bell-shaped. Normal curves describe Normal distributions.
• Normal distributions are very important in statistics (hence the capital “N” in Normal).

Normal Distributions
• All Normal distributions have the same overall shape, although they may vary somewhat in appearance.
• We describe a Normal distribution by giving its mean μ and its standard deviation σ.
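The idea that “area = proportion of observations” can be checked numerically. Below is a minimal Python sketch. It assumes the pictured curve is a flat (uniform) density of height 1 on the interval 0 to 1. The figure's axis marks suggest this, but the text does not state it. `area_under` is a hypothetical helper that approximates area with the midpoint rule. The sketch also confirms that the total area under the standard Normal curve N(0, 1) is 1.

```python
import math

def area_under(density, a, b, n=100_000):
    """Approximate the area under `density` between a and b using the midpoint rule."""
    width = (b - a) / n
    return sum(density(a + (i + 0.5) * width) for i in range(n)) * width

def uniform_density(x):
    # Assumption: a flat density curve of height 1 on the interval [0, 1]
    return 1.0 if 0 <= x <= 1 else 0.0

def normal_density(x, mu, sigma):
    # Height of the N(mu, sigma) density curve at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# The total area under any density curve should be (approximately) 1
total_uniform = area_under(uniform_density, 0, 1)
total_normal = area_under(lambda x: normal_density(x, 0, 1), -10, 10)

# Proportion of observations between .4 and .6 under the assumed uniform curve
between = area_under(uniform_density, 0.4, 0.6)
print(round(total_uniform, 4), round(total_normal, 4), round(between, 4))
```

Under the uniform assumption, the region between .4 and .6 is a rectangle of width 0.2 and height 1, so about 20% of the observations fall there.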
• Because Normal distributions are symmetric, the mean is located in the _______ of the distribution and is _______ to the median.

Let’s Talk Notation
• Because Normal distributions come up a lot in statistics, we abbreviate a Normal distribution with mean μ and standard deviation σ as N(μ, σ).
• For example, heights of young women follow a Normal distribution with μ = 64.5 inches and σ = 2.5 inches.
  – The distribution of young women’s heights would be ___________.

Why Normal Distributions?
(1) Normal distributions are good descriptions for some distributions of real data.
  *Can you think of any data that would be Normally distributed?
(2) Normal distributions are good approximations to the results of many kinds of chance outcomes, such as tossing a coin many times.
(3) Many statistical inference procedures based on Normal distributions work well for other roughly symmetric distributions.

Warning…
• Although many sets of data follow a Normal distribution, many others do not.
  – Even symmetric distributions may not be Normal!

The 68-95-99.7 Rule (AKA the Empirical Rule)
• In any Normal distribution, approximately 68% of the observations fall within σ of the mean μ, 95% fall within 2σ of μ, and 99.7% fall within 3σ of μ.
• Heights of young women follow a Normal distribution with μ = 64.5 inches and σ = 2.5 inches, that is, N(64.5, 2.5).
• Between what heights do 95% of women fall?
• There are _____ women in this room. How many women would we expect to have heights within 1 standard deviation of the mean?
• What proportion of girls in this room would we expect to have heights greater than 67 inches?
• What proportion of girls in this room would we expect to have heights greater than 68 inches?

How do we know if a distribution is Normal???
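The height questions for N(64.5, 2.5) can be checked with Python's standard-library `statistics.NormalDist`. This is a minimal sketch that compares the 68-95-99.7 approximations with exact areas under the Normal curve. A height of 67 inches sits exactly 1σ above the mean, so the rule alone answers that question. A height of 68 inches sits 1.4σ above the mean, between the rule's cut points, so the sketch reads that area directly from the curve.

```python
from statistics import NormalDist

mu, sigma = 64.5, 2.5            # heights of young women, N(64.5, 2.5), in inches
heights = NormalDist(mu, sigma)

# 68-95-99.7 rule: compare the rule with the exact area within mu ± k*sigma
for k, rule in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    low, high = mu - k * sigma, mu + k * sigma
    exact = heights.cdf(high) - heights.cdf(low)
    print(f"within {k} SD: {low} to {high} in; rule {rule:.1%}, exact {exact:.4f}")

# 67 in is 1 SD above the mean: by symmetry the rule gives (1 - 0.68) / 2 above it
above_67_rule = (1 - 0.68) / 2
above_67 = 1 - heights.cdf(67)

# 68 in is 1.4 SD above the mean, so use the Normal curve directly
above_68 = 1 - heights.cdf(68)
print(round(above_67_rule, 2), round(above_67, 4), round(above_68, 4))
```

The 2-SD line answers the first question: about 95% of young women are between 59.5 and 69.5 inches tall.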
Assessing Normality

Method 1
• One method for assessing Normality is to construct a histogram or a stemplot and then see whether the graph is approximately bell-shaped and symmetric about the mean.
• Histograms and stemplots can reveal important non-Normal features of a distribution, such as skewness, outliers, or gaps and clusters.

Method 1 Continued
• For example, this distribution of vocabulary scores appears Normal.
  – The distribution is bell-shaped and roughly symmetric, there are no gaps or clusters, and there do not appear to be any outliers.

Method 1 Cont.
• A boxplot also works!
• Check that the boxplot is roughly symmetric and look for any outliers.

Method 2
We can improve the effectiveness of our plots by marking x̄, x̄ ± s, x̄ ± 2s, and x̄ ± 3s on the horizontal axis. Then compare the counts of observations in each interval with the Empirical Rule.
• Mean x̄ = 6.8585
• Standard deviation s = 1.5952

Interval                                   Count
Below x̄ − 3s (below 2.07)                      1
x̄ − 3s to x̄ − 2s (2.07 to 3.67)              21
x̄ − 2s to x̄ − s (3.67 to 5.26)              129
x̄ − s to x̄ (5.26 to 6.86)                   331
x̄ to x̄ + s (6.86 to 8.45)                   318
x̄ + s to x̄ + 2s (8.45 to 10.05)             125
x̄ + 2s to x̄ + 3s (10.05 to 11.64)            21
Above x̄ + 3s (above 11.64)                     1

Method 2 Continued…
• Because the actual counts follow the Empirical Rule very closely, we can conclude that the Normal distribution with μ = 6.86 and σ = 1.595 fits the data well.

STANDARD NORMAL DISTRIBUTIONS

The Standard Normal Distribution
• All Normal distributions are the same if we measure in units of size σ about the mean μ as center.
• Changing to these units requires that we standardize (as we did in 2.1):
  $z = \dfrac{x - \mu}{\sigma}$
• If the variable we standardize has a Normal distribution, then so does the new variable z.
• This new distribution, with mean 0 and standard deviation 1, is called the standard Normal distribution, N(0, 1).
• We can find the proportion of observations that lie within any range of values by finding the area under the curve over that range.
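Method 2 comes down to simple arithmetic on the interval counts. This is a minimal Python sketch that reads the counts in the Method 2 display from left to right as 1, 21, 129, 331, 318, 125, 21, 1. These run from below x̄ − 3s up to above x̄ + 3s. That ordering is our interpretation of the display, not something the text states explicitly. The sketch turns the counts into percentages within 1, 2, and 3 standard deviations and sets them beside the Empirical Rule.

```python
# Counts of vocabulary scores per interval, read left to right (interpretation of the display):
# below x̄-3s, x̄-3s..x̄-2s, x̄-2s..x̄-s, x̄-s..x̄, x̄..x̄+s, x̄+s..x̄+2s, x̄+2s..x̄+3s, above x̄+3s
counts = [1, 21, 129, 331, 318, 125, 21, 1]
total = sum(counts)

# Observations within 1, 2, and 3 standard deviations of the mean
within_1 = (counts[3] + counts[4]) / total
within_2 = sum(counts[2:6]) / total
within_3 = sum(counts[1:7]) / total

for label, observed, rule in [("1s", within_1, 0.68), ("2s", within_2, 0.95), ("3s", within_3, 0.997)]:
    print(f"within {label}: observed {observed:.1%}, Empirical Rule {rule:.1%}")
```

Each observed percentage lands within about one percentage point of the rule's value. This is the numerical evidence behind the conclusion that a Normal model fits these scores well.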
The Standard Normal Table
• Because standardizing makes all Normal distributions the same, we can use a single table to find areas under any Normal curve.
• This table is called the standard Normal table (Table A).
  – It’s inside the front cover of your textbook!
  – You will be given this table on the AP exam.

The Standard Normal Table: CAREFUL!!!!
• Example: Find the proportion of observations from the standard Normal distribution that are less than −2.15.

Using the Standard Normal Table…
• Table A gives an area of 0.0158 to the left of z = −2.15.
• Caution: the area we found was to the LEFT of z = −2.15. In this case, that is what we were looking for.
• HOWEVER, what if the problem had asked for the area lying to the RIGHT of −2.15? What would that answer be?

Area to the Right
• The total area under the curve is _____.
• So if 0.0158 lies to the left of −2.15…
• then _____ lies to the right of −2.15.

How do you avoid making a mistake when asked to find the area to the RIGHT?
• Always sketch the Normal curve, mark the z-value, and shade the area of interest (the area the problem asks for).
• THEN, when you get your answer, CHECK TO SEE IF IT IS REASONABLE!!!

Practice
• Exercise 2.29

Putting It All Together: Solving Problems Involving Normal Distributions
• Step 1: State the distribution and the values of interest. Draw a picture of the distribution with the mean and standard deviation clearly identified, and shade the area of interest.
• Step 2: Perform calculations, and SHOW YOUR WORK. Either
  (i) compute a z-score for each boundary value and use Table A or your calculator to find the area under the standard Normal curve; or
  (ii) use the normalcdf command and label each of the inputs.
• Step 3: Conclusion. Write your conclusion in the context of the problem.
  – Just saying “the area under the curve that is less than 2.1” means nothing! Your results should tell you something about the data.
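A Table A lookup can be reproduced in Python with `statistics.NormalDist()`, which defaults to N(0, 1). Its `cdf` method returns the area to the LEFT of a value, just as the table does. This is a minimal sketch for the −2.15 example. It also includes a small `standardize` helper (our own name, not from the text) that applies $z = (x - \mu)/\sigma$. The helper is shown on the 68-inch height from the young-women example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution, N(0, 1)

def standardize(x, mu, sigma):
    """z-score: how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

left = Z.cdf(-2.15)   # area to the LEFT of z = -2.15, the value Table A reports
right = 1 - left      # area to the RIGHT: the total area under the curve is 1

print(round(left, 4), round(right, 4))
print(standardize(68, 64.5, 2.5))  # 68 inches in N(64.5, 2.5)
```

Subtracting from 1 is the “area to the right” step. Sketching the curve first makes it obvious that the right-hand area should be large here.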
Example: Cholesterol and Young Boys
• For 14-year-old boys, the mean is μ = 170 milligrams of cholesterol per deciliter of blood (mg/dl) and the standard deviation is σ = 30 mg/dl. Levels above 240 mg/dl may require medical attention. What percent of 14-year-old boys have more than 240 mg/dl of cholesterol?

Finding a Value When Given a Proportion
• What if you wanted to know what score you would have to get to place among the top 10% of your class on a test?
• Sometimes we are asked to find the observed value with a given proportion of the observations above or below it.
• To do this, we read Table A backwards: find the proportion you are looking for in the body of the table, figure out the corresponding z-score, and then “unstandardize” to get the observed value, $x = \mu + z\sigma$.

Inverse Normal Calculation Example
• Scores on the SAT Verbal test in recent years follow approximately the N(505, 110) distribution. How high must a student score to place in the top 10% of all students taking the SAT?

Practice!
• 2.31a, b
• 2.32

Using Your Calculator: Finding Areas with normalcdf
• You can find areas under the Normal curve using normalcdf.
• For 14-year-old boys, the mean is μ = 170 mg/dl of cholesterol and the standard deviation is σ = 30 mg/dl. Levels above 240 mg/dl may require medical attention. What percent of 14-year-old boys have more than 240 mg/dl of cholesterol?

Using Your Calculator: invNorm
• Finally, we can use our calculators to find raw or standardized values given an area under the Normal curve (a relative frequency).
• Scores on the SAT Verbal test in recent years follow approximately the N(505, 110) distribution. How high must a student score to place in the top 10% of all students taking the SAT?
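The two calculator workflows can be mirrored in Python. This is a minimal sketch in which `normalcdf` and `inv_norm` are hypothetical helpers named after the calculator commands. They are built on the standard library's `NormalDist.cdf` and `NormalDist.inv_cdf`, not on any calculator API. The sketch assumes that cholesterol levels for 14-year-old boys are Normally distributed, which the slide implies but does not state. An upper bound of infinity stands in for the very large number you would type on a calculator.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    # Hypothetical stand-in for the calculator's normalcdf(lower, upper, μ, σ)
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mu=0.0, sigma=1.0):
    # Hypothetical stand-in for invNorm(area, μ, σ): the value with `area_left` below it
    return NormalDist(mu, sigma).inv_cdf(area_left)

# Cholesterol (assumed Normal): mu = 170, sigma = 30 mg/dl; area to the right of 240
z = (240 - 170) / 30
p_high = normalcdf(240, float("inf"), 170, 30)
print(f"z = {z:.2f}, proportion above 240 mg/dl = {p_high:.4f}")

# SAT Verbal, N(505, 110): the top 10% means 90% of scores lie below the cutoff
z_90 = inv_norm(0.90)
cutoff = 505 + z_90 * 110          # unstandardize: x = mu + z * sigma
print(f"z = {z_90:.2f}, cutoff = {cutoff:.1f}, direct: {inv_norm(0.90, 505, 110):.1f}")
```

In context: roughly 1% of 14-year-old boys have cholesterol above 240 mg/dl, and a student needs an SAT Verbal score of about 646 to place in the top 10%. Table A gives z ≈ 1.28 and a cutoff of about 645.8, which agrees up to rounding.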