1.3 Density Curves and Normal Distributions

Density curves
Measuring center and spread for density curves
Normal distributions
The 68-95-99.7 rule
Standardizing observations
Using the standard Normal Table
Inverse Normal calculations
Normal quantile plots

Exploring Quantitative Data

We now have a kit of graphical and numerical tools for describing distributions. We also have a strategy for exploring data on a single quantitative variable. Now, we’ll add one more step to the strategy.

1. Always plot your data: make a graph.
2. Look for the overall pattern (shape, center, and spread) and for striking departures such as outliers.
3. Calculate a numerical summary to briefly describe center and spread.
4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.

Recall: Histogram

[Histogram: Table 1.3, Introduction to the Practice of Statistics, Sixth Edition © 2009 W.H. Freeman and Company]

Recall from Section 1.1:
Q: What percent of the chosen fifth-grade students have IQ scores of 105 or less?

Density Curves

An important property of a density curve is that areas under the curve correspond to relative frequencies.

Example: Here is a histogram of vocabulary scores of 947 seventh graders. The smooth curve drawn over the histogram is a mathematical model for the distribution.

[Figure: histogram with density curve. Relative frequency of scores <= 6 is 0.303; area under the curve <= 6 is 0.293.]

Note that the relative frequency of vocabulary scores <= 6 is roughly equal to the area under the density curve <= 6.

Density Curves and Normal Distribution

Density curves come in any imaginable shape. Some are well known mathematically and others aren’t.

Definition, pg 56, Introduction to the Practice of Statistics, Sixth Edition © 2009 W.H.
Freeman and Company

Normal distributions

Normal (or Gaussian) distributions are a family of symmetrical, bell-shaped density curves defined by a mean µ (mu) and a standard deviation σ (sigma): N(µ, σ).

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

e = 2.71828… is the base of the natural logarithm; π = pi = 3.14159…

A family of density curves

[Figure: three Normal curves on an x axis from 0 to 30. The means are the same (µ = 15) while the standard deviations are different (σ = 2, 4, and 6).]

[Figure: three Normal curves on an x axis from 0 to 30. The means are different (µ = 10, 15, and 20) while the standard deviations are the same (σ = 3).]

The 68-95-99.7 Rule

In the Normal distribution with mean µ and standard deviation σ:
Approximately 68% of the observations fall within σ of µ.
Approximately 95% of the observations fall within 2σ of µ.
Approximately 99.7% of the observations fall within 3σ of µ.

Standard Normal Distribution N(0, 1)

The standard Normal distribution

Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N(µ, σ) into the standard Normal curve N(0, 1).

[Figure: N(64.5, 2.5) in x (height) => N(0, 1) in z, the standardized height (no units)]

For each x we calculate a new value, z (called a z-score).

Standardizing: calculating z-scores

A z-score measures the number of standard deviations that a data value x is from the mean µ.

$$z = \frac{x - \mu}{\sigma}$$

When x is 1 standard deviation larger than the mean, then z = 1:

$$\text{for } x = \mu + \sigma,\quad z = \frac{\mu + \sigma - \mu}{\sigma} = \frac{\sigma}{\sigma} = 1$$

When x is 2 standard deviations larger than the mean, then z = 2:

$$\text{for } x = \mu + 2\sigma,\quad z = \frac{\mu + 2\sigma - \mu}{\sigma} = \frac{2\sigma}{\sigma} = 2$$

When x is larger than the mean, z is positive. When x is smaller than the mean, z is negative.

Use normalcdf(start, end, 0, 1) to find prob = area under N(0, 1).

Prob = Area = normalcdf(-999, -1, 0, 1) = 0.1587
Prob = Area = normalcdf(-999, 1, 0, 1) = 0.8413

[Figure: standard Normal curve with shaded regions labeled A and B]

Prob = Area = normalcdf(-1, 2, 0, 1) = 0.8186
For Part A: Prob = Area = normalcdf(1, 999, 0, 1) = 0.1587
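The normalcdf areas above, and the 68-95-99.7 rule, can be cross-checked without a calculator. The following is a minimal sketch in Python, assuming the standard-library `statistics.NormalDist` class; the `normalcdf` helper is a hypothetical name chosen here to mirror the calculator's argument order, not a built-in Python function.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper.

    Hypothetical helper that mirrors the calculator's argument order.
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Areas under the standard Normal curve N(0, 1).
left_of_minus_1 = normalcdf(-999, -1)   # area to the left of z = -1
left_of_1 = normalcdf(-999, 1)          # area to the left of z = 1
between = normalcdf(-1, 2)              # area between z = -1 and z = 2
right_of_1 = normalcdf(1, 999)          # area to the right of z = 1

# 68-95-99.7 rule: area within 1, 2, and 3 standard deviations of the mean.
rule = [normalcdf(-k, k) for k in (1, 2, 3)]

print(round(left_of_minus_1, 4))     # 0.1587
print(round(left_of_1, 4))           # 0.8413
print(round(between, 4))             # 0.8186
print(round(right_of_1, 4))          # 0.1587
print([round(a, 4) for a in rule])   # [0.6827, 0.9545, 0.9973]
```

As on the calculator, -999 and 999 stand in for minus and plus infinity; the area beyond them is far too small to affect four decimal places.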
Example: Let Z follow a standard Normal distribution, Z ~ N(0, 1). Find:
(1) Pr(Z > 0).
(2) Pr(Z > 3).
(3) Pr(Z < -1).
(4) Pr(-1 < Z < 1).
(5) Pr(-2 < Z < 1).
(6) Pr(1.5 < Z < 2.3).
(7) Pr(-4 < Z < 4).

Normal Calculations

How to Solve Problems Involving Normal Distributions

Express the problem in terms of the observed variable x, and list the values of µ and σ.
Perform calculations.
Step 1: Standardize x to restate the problem in terms of a standard Normal variable z.
Step 2: Draw a picture of N(0, 1), and shade the area of interest under the curve.
Step 3: Use normalcdf(start, end, 0, 1) to find: PROB = required area under the standard Normal curve.
Write your conclusion in the context of the problem.

Example 1: The National Collegiate Athletic Association (NCAA) requires Division I athletes to score at least 820 on the combined math and verbal SAT exam to compete in their first college year. SAT scores in 2003 were approximately Normal with mean 1026 and standard deviation 209. What proportion of all students would be NCAA qualifiers (SAT ≥ 820)?

x = 820, µ = 1026, σ = 209

$$z = \frac{x - \mu}{\sigma} = \frac{820 - 1026}{209} = \frac{-206}{209} \approx -0.99$$

Use the calculator: normalcdf(-0.99, 999, 0, 1) ≈ 0.84

About 84% of all students would be NCAA qualifiers.

Example 2: Recall that SAT scores in 2003 were approximately Normal with mean 1026 and standard deviation 209. The NCAA defines a “partial qualifier”, eligible to practice and receive an athletic scholarship but not to compete, as a student whose combined SAT score is at least 720. What proportion of all students who take the SAT would be partial qualifiers? That is, what proportion have scores between 720 and 820?

x = 720, µ = 1026, σ = 209

$$z = \frac{x - \mu}{\sigma} = \frac{720 - 1026}{209} = \frac{-306}{209} \approx -1.46$$

x = 820, µ = 1026, σ = 209

$$z = \frac{x - \mu}{\sigma} = \frac{820 - 1026}{209} = \frac{-206}{209} \approx -0.99$$

Use the calculator: normalcdf(-1.46, -0.99, 0, 1) ≈ 0.09, or 9%

About 9% of all students who take the SAT have scores between 720 and 820.
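The two SAT examples follow the same three steps (standardize, picture the area, find the area), so they can be sketched together. This is an illustrative Python version, assuming the standard-library `statistics.NormalDist` class; the function and variable names are hypothetical, and the z-scores are rounded to two decimals only to match the hand calculation.

```python
from statistics import NormalDist

MU, SIGMA = 1026, 209        # SAT mean and standard deviation from the examples
STD_NORMAL = NormalDist()    # the standard Normal distribution N(0, 1)


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# Step 1: standardize, rounding to two decimals as in the hand calculation.
z_820 = round(z_score(820, MU, SIGMA), 2)
z_720 = round(z_score(720, MU, SIGMA), 2)

# Step 3: areas under N(0, 1).
qualifiers = 1 - STD_NORMAL.cdf(z_820)                    # area to the right of z_820
partial = STD_NORMAL.cdf(z_820) - STD_NORMAL.cdf(z_720)   # area between z_720 and z_820

print(z_820, z_720)           # -0.99 -1.46
print(round(qualifiers, 2))   # 0.84
print(round(partial, 2))      # 0.09
```

Skipping the rounding of z, or calling `NormalDist(1026, 209).cdf(820)` directly on the original scale, changes the areas only in the third decimal place.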
Example 1.25, Page 59: Heights of young women

The distribution of heights of young women aged 18 to 24 is approximately Normal with mean µ = 64.5 inches and standard deviation σ = 2.5 inches. That is, heights follow approximately the N(64.5", 2.5") distribution.

Let X be the height of women aged 18 to 24. X ~ N(64.5", 2.5") approx.

Question: What percent of women are shorter than 67 inches tall (i.e., 5'7")?

mean µ = 64.5", standard deviation σ = 2.5", x (height) = 67"

We calculate z, the standardized value of x:

$$z = \frac{x - \mu}{\sigma} = \frac{67 - 64.5}{2.5} = \frac{2.5}{2.5} = 1$$

The area under N(0, 1) to the left of z = 1 is normalcdf(-999, 1, 0, 1) = 0.8413.

Conclusion: 84.13% of women are shorter than 67". By subtraction, 1 - 0.8413 = 0.1587, so 15.87% of women are taller than 67".

EX 1.25, Page 59: Women heights (Cont.)

Let X = height (inches) of young women aged 18-24 years. X ~ N(64.5", 2.5") approx.

Question:
a) What percent of these women's heights are between 63" and 68"?
b) What percent of these women are taller than 65 inches?
c) What height represents the 90th percentile for women in this age group?

Question c) is what is called a "backwards problem", since you are solving for an X value while knowing an area.

Review: p-th percentile

The p-th percentile of a distribution is the value such that p percent of the observations fall at or below it. (Recall Q1, Median, and Q3.)

Inverse normal calculations

For N(0, 1), find the value that corresponds to a given proportion (the area under the curve to its left) with invnorm(area, 0, 1), where the area is written as a decimal, for example 0.25 for the 25th percentile.

EX:
(1) the 25th percentile.
(2) the 55th percentile.
(3) the 10th percentile.
(4) the 90th percentile.

Inverse normal calculations

Example 1: Suppose the height of a randomly selected 5-year-old child follows a Normal distribution with µ = 100 cm and σ = 6 cm.
(1) What’s the 90th percentile?
(2) What’s the 50th percentile?
(3) What’s the 10th percentile?
(4) What’s the 25th percentile?
(5) What’s the 56th percentile?

Solution (1):
Step 1: From the calculator, invnorm(0.90, 0, 1) gives z = 1.28.
Step 2: So the z-score is 1.28, which means that

$$\frac{x - \mu}{\sigma} = \frac{x - 100}{6} = 1.28$$

Step 3: x = 100 + (1.28 × 6) = 107.68

Answer Key: (1) 107.68 (107.69 if z is not rounded to 1.28); (2) 100; (3) 92.31; (4) 95.95; (5) 100.91.

Inverse normal calculations

Example 2: A soft-drink machine is regulated so that it discharges an average of 200 milliliters per cup with a standard deviation of 15 milliliters. Assume the amount discharged is Normally distributed.
(1) Find the probability that a cup will contain more than 220 milliliters.
(2) Find the probability that a cup will contain between 180 and 230 milliliters.
(3) Find the 40th percentile of the discharge amount.
(4) Find the 89th percentile of the discharge amount.

Normal quantile plots

One way to assess whether a distribution is indeed approximately normal is to plot the data on a normal quantile plot.

The data points are ranked and the percentile ranks are converted to z-scores with Table A. The z-scores are then used for the x axis, against which the data are plotted on the y axis of the normal quantile plot.

If the distribution is indeed normal, the plot will show a straight line, indicating a good match between the data and a normal distribution.
Systematic deviations from a straight line indicate a nonnormal distribution. Outliers appear as points that are far away from the overall pattern of the plot.

[Figure] Good fit to a straight line: the distribution of rainwater pH values is close to normal.

[Figure] Curved pattern: the data are not normally distributed. Instead, they show a right skew: a few individuals have particularly long survival times.

Normal quantile plots are complex to do by hand, but they are standard features in most statistical software.
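The percentile-to-z conversion behind a normal quantile plot is the same inverse normal calculation used in the examples above. The sketch below, in Python with the standard-library `statistics.NormalDist` class, first reproduces the 5-year-old height percentiles and then computes the (z, data) coordinates of a quantile plot. The `invnorm` and `normal_quantile_points` names are hypothetical, the sample values are made up for illustration, and the percentile rank (i - 0.5)/n is one common convention assumed here, not necessarily the one behind the plots in these slides.

```python
from statistics import NormalDist


def invnorm(area, mu=0.0, sigma=1.0):
    """Value of N(mu, sigma) that has the given area (0 < area < 1) to its left."""
    return NormalDist(mu, sigma).inv_cdf(area)


# Example 1: heights of 5-year-olds, N(100, 6), at the listed percentiles.
percentiles = [0.90, 0.50, 0.10, 0.25, 0.56]
heights = [round(invnorm(p, 100, 6), 2) for p in percentiles]
print(heights)   # [107.69, 100.0, 92.31, 95.95, 100.91]


def normal_quantile_points(data):
    """Return (z, x) pairs: x is the i-th smallest value, z its Normal score.

    Assumption: the percentile rank of the i-th of n sorted values is
    (i - 0.5) / n; other conventions exist.
    """
    ordered = sorted(data)
    n = len(ordered)
    return [(invnorm((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


sample = [4.9, 5.6, 5.1, 5.4, 5.3]   # made-up values, for illustration only
points = normal_quantile_points(sample)
for z, x in points:
    print(f"z = {z:6.3f}   x = {x}")
```

Plotting x against z with any graphing tool gives the normal quantile plot. The first height prints as 107.69 rather than 107.68 because the code keeps z unrounded instead of using 1.28.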
Normal quantile plot of CO2 (Table 1.6 on page 33)

Notice the systematic failure of the points to fall on the line, especially at the low end where the data are “piled up”. Also, note the outliers at the high end.

Conclusion: Not normal

Normal quantile plot of the IQ scores of 78 seventh-grade students (data in Table 1.9 on page 39)

Notice that the data points follow the line fairly well, though there is a slight curve at the low-middle, indicating more data than would be expected for a normal distribution. The y-intercept is around 110 (mean approx. 110) and the slope is around 10 (s.d. approx. 10).

Conclusion: Normal
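Why the intercept and slope of the line estimate the mean and standard deviation can be seen from x = µ + σz: data that are exactly Normal fall on a line with intercept µ and slope σ. The Python sketch below illustrates this with idealized values that lie exactly on the N(110, 10) quantiles; these are not the IQ scores from Table 1.9, and the `fit_line` helper and the (i - 0.5)/n percentile rank are assumptions made for the illustration.

```python
from statistics import NormalDist, mean


def fit_line(zs, xs):
    """Least-squares slope and intercept of x plotted against z."""
    z_bar, x_bar = mean(zs), mean(xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    szx = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
    slope = szx / szz
    return slope, x_bar - slope * z_bar


n = 78                       # same count as the IQ example; the values are synthetic
std_normal = NormalDist()    # the standard Normal distribution N(0, 1)

# Normal scores for ranks 1..n, using the (i - 0.5) / n convention.
zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Idealized "data" lying exactly on the N(110, 10) quantiles.
xs = [110 + 10 * z for z in zs]

slope, intercept = fit_line(zs, xs)
print(round(slope, 6), round(intercept, 6))   # 10.0 110.0
```

With real data the points scatter around the line, so the fitted slope and intercept are only rough estimates of the standard deviation and mean.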