Section 2.2 Density Curves and Normal Distributions

Activity
• Roll one die 20 times and record the results. Make a dotplot.
• Roll two dice 20 times and record the sum of the two dice. Make a dotplot.

Exploring Quantitative Data
In Chapter 1, we developed a kit of graphical and numerical tools for describing distributions. Now, we'll add one more step to the strategy.
1. Always plot your data: make a graph, usually a dotplot, stemplot, or histogram.
2. Look for the overall pattern (shape, center, and spread) and for striking departures such as outliers.
3. Calculate a numerical summary to briefly describe center and spread.
4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve (a density curve).

[Figure: dotplot of the Sum of Two Dice]

Density Curves
• A density curve is a mathematical model for the distribution.
  – A mathematical model is an idealized description.
• It ignores minor irregularities as well as any outliers.
• It describes the overall pattern of a distribution.

Density Curves
A density curve:
• Is always on or above the horizontal axis.
• Has area exactly 1 underneath it.
• The area under the curve between any two values is the proportion of all observations that fall in that range.
• The most common density curve is the Normal curve.

Mean and Median of a Density Curve
• The median of a density curve is the equal-areas point (hard to find with skewed curves).
• The mean is the balance point of a density curve (the point at which the curve would balance if made of solid material).
• In a symmetric distribution the mean and median are in the same place (the center of the curve).
• In a skewed distribution the mean is pulled towards the tail.
• We can roughly locate the mean, median, and quartiles of any density curve by eye (but not the standard deviation).
http://www.shodor.org/interactivate/activities/PlopIt/

Describing Density Curves
Because a density curve is an idealized description of the distribution of data, we need to
distinguish between the mean and standard deviation of the density curve and the mean and standard deviation computed from the actual (sample) observations.

                      Actual observations    Density curve
Mean                  x̄                      µ (mu)
Standard deviation    s                      σ (sigma)

Density Curve Practice

Normal Distributions
One particularly important class of density curves is the Normal curves, which describe Normal distributions.
• All Normal curves have the same overall shape: symmetric, single-peaked, and bell-shaped.
• Any specific Normal curve is completely described by giving its mean µ and its standard deviation σ.
• Changing µ without changing σ moves the Normal curve along the horizontal axis without changing its spread.
• The standard deviation σ controls the spread of a Normal curve.
• Inflection points are the points at which the curvature changes; they are located at a distance σ on either side of the mean.
• We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ, σ).

Empirical Rule: 68-95-99.7
In the Normal distribution with mean µ and standard deviation σ:
• About 68% of the observations fall within σ of the mean µ.
• About 95% of the observations fall within 2σ of µ.
• About 99.7% of the observations fall within 3σ of µ.
Remember: the values in a data set must show a Normal, bell-shaped histogram, dotplot, or stemplot before you use the Empirical Rule!

Warmup
1. Suppose the average height of freshmen at MRHS is 60 inches with a standard deviation of 1.5 inches. What is the z-score for a freshman who has a height of
   a. 58 inches? z = -1.33
   b. 60.15 inches? z = 0.1
2. Suppose the average height of sophomores at MRHS is 62 inches with a standard deviation of 2 inches. What is the height of the sophomore (x-value) that corresponds to a
   a) z-score = 0? x = 62 in
   b) z-score = -2.44? x = 57.12 in
   c) z-score = 3.1? x = 68.2 in
3. Suppose the average height of juniors at MRHS is unknown but the standard deviation is 2.5 inches. What is the population mean height of juniors if a junior 66 inches tall corresponds to a z-score of 0.75? µ = 66 - 0.75(2.5) = 64.125 in (the answer 67.8 in comes from adding instead of subtracting)
4.
Suppose the mean height of seniors at MRHS is 67 inches but the standard deviation is unknown. What is the standard deviation knowing
   a) a senior 68.5 inches tall has a corresponding z-score of 0.87? σ = 1.5/0.87 = 1.72 in
   b) a senior 63 inches tall has a corresponding z-score of -2.43? (The resulting population standard deviation should be different from the answer to a.) σ = 1.65 in
5. An incoming freshman took her college's placement exams in French and math. In French, she scored 82 and in math, 86. The overall results on the French exam had a mean of 72 and a standard deviation of 8, while the mean math score was 76 with a standard deviation of 12. On which exam did she do better RELATIVE to her classmates?
   French: z = (82 - 72)/8 = 1.25. Math: z = (86 - 76)/12 = 0.83.
   This shows that she performed better on the French exam than on the math exam relative to her classmates.

Standard Normal
• Normal distributions vary from one another in two main ways:
  – The mean, µ, may be located anywhere on the x-axis.
  – The bell shape may be more or less spread out according to the size of the standard deviation, σ.
• We can standardize our distribution in order to find proportions and compare it to other distributions.

The Standard Normal Distribution
All Normal distributions are the same if we measure in units of size σ from the mean µ as center.
The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1.
If a variable x has any Normal distribution N(µ, σ) with mean µ and standard deviation σ, then the standardized variable
   z = (x - µ)/σ
has the standard Normal distribution, N(0, 1).

Standard Normal Table
• Table A is a table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z.
• We can answer any question about the proportion of observations in a Normal distribution by standardizing and then using the standard Normal table.
Always draw your curve and shade what you are looking for!

The Standard Normal Table
Suppose we want to find the proportion of observations from the standard Normal distribution that are less than z = 0.81. We can use Table A:

z      .00     .01     .02
0.7    .7580   .7611   .7642
0.8    .7881   .7910   .7939
0.9    .8159   .8186   .8212

P(z < 0.81) = .7910

Practice
1. Find the proportion of observations from the standard Normal distribution that are less than 1.4. 0.9192 or 91.92%
2. Find the proportion of observations from the standard Normal distribution that are greater than -2.15. 1 - 0.0158 = 0.9842 or 98.42%
Always make sure that you draw a picture, shade the area of interest, and check that your answer is reasonable in the context of the problem.
Find the following probabilities:
3. P(z < 1) = 84.13%
4. P(z < 0) = 50%
5. P(z < 1.5) = 93.32%
6. P(z > 1.5) = 1 - .9332 = 6.68%
7. P(2.3 < z < 3.1) = .9990 - .9893 = 0.97%

Normal Distribution Calculations
We can answer a question about areas in any Normal distribution by standardizing and using Table A or by using technology.

How To Find Areas In Any Normal Distribution
Step 1: State the distribution and the values of interest. Draw a Normal curve with the area of interest shaded and the mean, standard deviation, and boundary value(s) clearly identified.
Step 2: Perform calculations and show your work! Do one of the following: (i) compute a z-score for each boundary value and use Table A or technology to find the desired area under the standard Normal curve; or (ii) use the normalcdf command and label each of the inputs.
Step 3: Answer the question in context.

Going back to Warmup #5
An incoming freshman took her college's placement exams in French and math. In French, she scored 82 and in math, 86. The overall results on the French exam had a mean of 72 and a standard deviation of 8, while the mean math score was 76 with a standard deviation of 12.
What percent of students did she score higher than on each test?
French: 89.44%
Math: 79.67%
The area in Table A corresponds to the percentile.

Using Your Calculator: normalcdf (instead of Table A)
Scores on the SAT approximately follow a Normal distribution, N(505, 110). What proportion of students fall:
• Below 100? normalcdf with Lower: -1E99, Upper: 100, µ: 505, σ: 110. P(x < 100) ≈ 0
• Above 330? normalcdf with Lower: 330, Upper: 1E99, µ: 505, σ: 110. P(x > 330) ≈ 0.944
• Between 200 and 600? normalcdf with Lower: 200, Upper: 600, µ: 505, σ: 110. P(200 < x < 600) ≈ 0.803

Working Backwards: Normal Distribution Calculations
Sometimes, we may want to find the observed value that corresponds to a given percentile. There are again three steps.

How To Find Values From Areas In Any Normal Distribution
Step 1: State the distribution and the values of interest. Draw a Normal curve with the area of interest shaded and the mean, standard deviation, and unknown boundary value clearly identified.
Step 2: Perform calculations and show your work! Do one of the following: (i) use Table A or technology to find the value of z with the indicated area under the standard Normal curve, then "unstandardize" to transform back to the original distribution; or (ii) use the invNorm command and label each of the inputs.
Step 3: Answer the question.

How high must a student score in order to place in the top 10% of all students taking the SAT?
Using Table A: the 90th percentile corresponds to a z-score of about 1.29.
   z = (x - µ)/σ
   1.29 = (x - 505)/110
   x = 646.9
Using invNorm: Area = 0.9, µ = 505, σ = 110 gives x = 645.97.
Remember to answer in context and MAKE SURE YOU ROUND IN THE CORRECT DIRECTION!

Warm up
1. According to http://www.cdc.gov/growthcharts/, the heights of three-year-old females are approximately Normally distributed with a mean of 94.5 cm and a standard deviation of 4 cm. What is the third quartile of this distribution?

#61 An airline flies the same route at the same time each day.
The flight time varies according to a Normal distribution with unknown mean and standard deviation. On 15% of days, the flight takes more than an hour. On 3% of days, the flight lasts 75 minutes or more. Use this information to determine the mean and standard deviation of the flight time distribution.

Ways of Assessing Normality
1. Construct a stemplot or histogram. Determine whether the graph is approximately bell-shaped.
2. See if your data follow the Empirical Rule (68%, 95%, 99.7%).
3. Construct a Normal probability plot (also called a Normal quantile plot). Assess the graph using the graphing calculator: do the data look approximately linear?

Assessing Normality
The Normal distributions provide good models for some distributions of real data. Many statistical inference procedures are based on the assumption that the population is approximately Normally distributed. A Normal probability plot provides a good assessment of whether a data set follows a Normal distribution.

Interpreting Normal Probability Plots
If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are approximately Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.

[Figure: example of GRE scores]
[Figure: Normal probability plot of hurricane losses]

Normal Probability Plots
Remember the comparison of Barry Bonds to Hank Aaron in Ch 1? What was the shape of Aaron's data?
13 20 24 26 27 29 30 32 34 34 38 39 39 40 40 44 44 44 44 45 47
• If the data distribution is close to a Normal distribution, the plotted points will lie close to a straight line.
• Non-Normal data will show a nonlinear trend.
• Outliers appear as points that are far away from the overall pattern of the plot.
• Try our Barry Bonds example from Ch 1.
• We showed before that 73 was an outlier. See how it looks on the Normal probability plot.
The measurements listed below describe the usable capacity (in cubic feet) of a sample of 36 side-by-side refrigerators (Consumer Reports, May 2010). Are the data close to Normal?

12.9 13.7 14.1 14.2 14.5 14.5 14.6 14.7 15.1 15.2 15.3 15.3
15.3 15.3 15.5 15.6 15.6 15.8 16.0 16.0 16.2 16.2 16.3 16.4
16.5 16.6 16.6 16.6 16.8 17.0 17.0 17.2 17.4 17.4 17.9 18.4

Here is a histogram of these data. It seems roughly symmetric and bell-shaped.
Do these data follow the Empirical Rule? What does this tell you?
Create a Normal probability plot. What can you conclude?
Once we have shown that our data are very close to Normal, we can go ahead and use our Normal calculations (normalcdf, invNorm, z-scores, etc.).
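The Empirical Rule check and the calculator commands can be reproduced with Python's standard library. This is a minimal sketch, not the calculator's own implementation. The names `share_within`, `normalcdf`, and `inv_norm` are hypothetical helpers chosen to mirror the calculator's labels, and the sample standard deviation s is assumed for the Empirical Rule check.

```python
from statistics import NormalDist, mean, stdev

# Usable capacity (cubic feet) of the 36 refrigerators listed above.
capacities = [
    12.9, 13.7, 14.1, 14.2, 14.5, 14.5, 14.6, 14.7, 15.1, 15.2, 15.3, 15.3,
    15.3, 15.3, 15.5, 15.6, 15.6, 15.8, 16.0, 16.0, 16.2, 16.2, 16.3, 16.4,
    16.5, 16.6, 16.6, 16.6, 16.8, 17.0, 17.0, 17.2, 17.4, 17.4, 17.9, 18.4,
]


def share_within(data, k):
    """Proportion of values within k sample standard deviations of the mean."""
    center, spread = mean(data), stdev(data)
    inside = [x for x in data if center - k * spread <= x <= center + k * spread]
    return len(inside) / len(data)


def normalcdf(lower, upper, mu, sigma):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def inv_norm(area, mu, sigma):
    """Value with the given area to its left under the N(mu, sigma) curve."""
    return NormalDist(mu, sigma).inv_cdf(area)


# Compare these proportions with 68%, 95%, and 99.7%.
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(capacities, k):.1%}")

# The SAT examples from earlier, N(505, 110).
print(f"P(200 < x < 600) = {normalcdf(200, 600, 505, 110):.3f}")
print(f"90th percentile  = {inv_norm(0.90, 505, 110):.2f}")
```

If the three proportions land close to 68%, 95%, and 99.7%, that supports treating the data as approximately Normal, alongside the histogram and the Normal probability plot.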