1.3 Density Curves and Normal Distributions
Ulrich Hoensch
Tuesday, September 11, 2012

Fitting Density Curves to Histograms

Advanced statistical software (NOT Microsoft Excel) can produce “smoothed versions” of histograms.

Example 1
The following are histograms and corresponding density curves for data representing: (a) the acidity of rainwater; (b) the survival time of guinea pigs.

When fitting a density curve to a histogram, we require that for any interval on the horizontal axis that spans the width of a collection of rectangles, the following holds:

area of rectangles ≈ area under density curve.

This requirement follows from the more general fact that for both histograms and density curves, area = proportion.

Example 2: Scores of Seventh-Graders

The following figure shows the histogram and a fitted density curve of the test scores of 947 seventh-grade students on the Iowa Test of Basic Skills.
(a) The bars shaded in blue represent the actual proportion of students who score less than or equal to 6.0 (the proportion is 0.303 = 30.3%).
(b) The area below the density curve shaded in blue represents the proportion of students who score less than or equal to 6.0, as predicted by the density curve (this proportion is 0.293 = 29.3%).

Definition of Density Curve

A density curve is a curve that
- is always on or above the horizontal axis and
- has area exactly 1 underneath it.
In addition, for any two values a and b on the horizontal axis,

area below the density curve between a and b ≈ proportion of observations that fall between a and b.

Median of a Density Curve

The median of a density curve is the point M on the horizontal axis such that the area below the density curve and to the left of M is 50% (and consequently the area to the right is also 50%).
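The definition of a density curve and of its median can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical right-skewed density f(x) = e^(-x) for x ≥ 0; this curve is not one of the curves from the examples, and the names density, area, and median are illustrative only. It approximates areas with the midpoint rule and finds the median by bisection.

```python
import math

def density(x: float) -> float:
    """Hypothetical right-skewed density (an assumption, not from the slides):
    f(x) = e^(-x) for x >= 0, and 0 otherwise."""
    return math.exp(-x) if x >= 0 else 0.0

def area(a: float, b: float, steps: int = 20_000) -> float:
    """Approximate the area under the density between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

def median(lo: float = 0.0, hi: float = 50.0, tol: float = 1e-4) -> float:
    """Bisect for the point M that has area 0.5 to its left."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area(0.0, mid) < 0.5:
            lo = mid   # too little area to the left: M lies further right
        else:
            hi = mid   # enough area to the left: M lies at or left of mid
    return (lo + hi) / 2

# The curve is cut off at x = 50, where the remaining area is negligible.
total_area = area(0.0, 50.0)
m = median()
print(f"total area under the curve: {total_area:.4f}")
print(f"median M: {m:.3f}")
```

For this particular curve the median can also be found by hand: solving 1 − e^(−M) = 0.5 gives M = ln 2, which the bisection result can be compared against.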
[Figure: density curve with 50% of the area on each side of the median]

Percentiles of a Density Curve

The pth percentile of a density curve is the point P on the horizontal axis such that p percent of the area below the density curve lies to the left of P. The interquartile range is consequently the extent of the middle 50% of the area, from the first quartile Q1 to the third quartile Q3.

[Figure: density curve with the middle 50% of the area between Q1 and Q3]

Mean of a Density Curve

The mean of a density curve is the “balance point” of the curve: if the area below the curve were made of a solid material, the mean would correspond to the position of the fulcrum when balancing it.

[Figure: density curve balanced on a fulcrum at the mean]

Mean and Median of a Density Curve

If a density curve is symmetric, the mean equals the median; for skewed curves the two generally differ.
- For right-skewed distributions the mean is larger than the median;
- For left-skewed distributions the mean is smaller than the median.

Normal Distributions

Normal curves are the density functions of normal distributions. They have the following general shape.
- They are symmetric, unimodal (have only one peak), and bell-shaped.
- The mean is denoted by the symbol µ (small Greek letter “mu”), and the standard deviation is denoted by the symbol σ (small Greek letter “sigma”).
- There are two inflection points, one on each side of the mean, where the curve changes between bending downwards and bending upwards.
- The standard deviation σ is the horizontal distance from the mean µ to these inflection points.

[Figure: two normal curves]

The 68-95-99.7 Rule

In a normal distribution, approximately 68% of the observations fall within σ of the mean µ, approximately 95% within 2σ, and approximately 99.7% within 3σ.

[Figure: the 68-95-99.7 rule]

Example 3: Height of Young Women

The height of young women aged 18 to 24 is approximately normally distributed with mean µ = 64.5 inches and standard deviation σ = 2.5 inches. We write X ∼ N(µ, σ) if a variable X has a normal distribution with mean µ and standard deviation σ. Consequently, for the height X of young women, X ∼ N(64.5, 2.5).

Question. What is the percentage of women who are between 62 and 69.5 inches tall?
Answer.
- The value 62 is one standard deviation below the mean, so the area between the value 62 and the mean is 68%/2 = 34%.
- The value 69.5 is two standard deviations above the mean, so the area between the mean and the value 69.5 is 95%/2 = 47.5%.
- We are looking for the combined area: the percentage of women who are between 62 and 69.5 inches tall is 34% + 47.5% = 81.5%.

Example 4: LSAT Scores

The scores X on the LSAT are normally distributed with mean µ = 150 and standard deviation σ = 15; that is, X ∼ N(150, 15).

[Figure: normal curve N(150, 15), horizontal axis from 100 to 200]

Find each of the following, using a TI-83/TI-83 Plus/TI-84 Plus calculator.

Q. Find the percentage of people who score between 120 and 160 on the LSAT.
A.
1. Press [2ND] VARS (DISTR) and select 2: normalcdf(.
2. Type normalcdf(120,160,150,15).
3. Press ENTER.
The proportion is 0.72475 . . ., so the percentage of people who score between 120 and 160 is about 72.5%.

[Figure: normal curve N(150, 15), horizontal axis from 100 to 200; the shaded area between 120 and 160 is 72.5%]

Note: The general syntax for finding the proportion between a and b is normalcdf(a, b, µ, σ).

Q. Suppose you scored a 172 on the LSAT. Find the percentage of people who scored below or at the same level.
A.
1. Press [2ND] VARS (DISTR) and select 2: normalcdf(.
2. Type normalcdf(-1000000,172,150,15). (The number −1000000 can be replaced by any negative number of very large magnitude.)
3. Press ENTER.
The proportion is 0.92876 . . ., so the percentage of people who score at or below 172 is about 92.9%.

[Figure: normal curve N(150, 15), horizontal axis from 100 to 200; shaded area 92.9%]

Q. What is the cutoff score for the top 10% (i.e. the 90th percentile)?
A.
1. Press [2ND] VARS (DISTR) and select 3: invNorm(.
2. Type invNorm(0.9,150,15).
3. Press ENTER.
The 90th percentile is 169.22 . . ., so 90% of people score below about 169 (and 10% score above it).
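For readers without a TI calculator, the normalcdf and invNorm steps above can be reproduced in software. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library class statistics.NormalDist; it is not part of the original lecture. Here cdf(b) − cdf(a) plays the role of normalcdf(a, b, µ, σ), and inv_cdf(p) plays the role of invNorm(p, µ, σ). The variable names are illustrative only.

```python
from statistics import NormalDist

# Example 3: heights of young women, X ~ N(64.5, 2.5).
heights = NormalDist(mu=64.5, sigma=2.5)
# Exact area between 62 and 69.5; the 68-95-99.7 rule gave the approximation 81.5%.
height_share = heights.cdf(69.5) - heights.cdf(62)

# Example 4: LSAT scores, X ~ N(150, 15).
lsat = NormalDist(mu=150, sigma=15)

# Counterpart of normalcdf(120,160,150,15): proportion scoring between 120 and 160.
between = lsat.cdf(160) - lsat.cdf(120)

# Counterpart of normalcdf(-1000000,172,150,15): proportion scoring at or below 172.
# No large negative lower bound is needed, because cdf already starts at minus infinity.
at_or_below = lsat.cdf(172)

# Counterpart of invNorm(0.9,150,15): the 90th percentile.
cutoff = lsat.inv_cdf(0.9)

print(f"heights between 62 and 69.5: {height_share:.4f}")
print(f"LSAT between 120 and 160:    {between:.5f}")
print(f"LSAT at or below 172:        {at_or_below:.5f}")
print(f"LSAT 90th percentile:        {cutoff:.2f}")
```

The height result differs slightly from 81.5% because the 68-95-99.7 rule uses rounded percentages.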
[Figure: normal curve N(150, 15), horizontal axis from 100 to 200; shaded area 90%]

Note: The general syntax for finding the cutoff such that a proportion p of the observations falls below it is invNorm(p, µ, σ).

Q. Find the scores of the middle 80% of the distribution.
A. To find the middle 80%, we need the 10th and the 90th percentiles.

[Figure: normal curve N(150, 15), horizontal axis from 100 to 200; shaded middle area 80%]

1. The 90th percentile was computed above; it is about 169.
2. Type invNorm(0.1,150,15) to find the 10th percentile. It is 130.77 . . . ≈ 131.
So the middle 80% score between 131 and 169 on the LSAT.
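The two-percentile approach used for the middle 80% works for any middle share. The following is a minimal sketch, again assuming Python's standard-library statistics.NormalDist rather than the calculator; the helper name middle_interval is hypothetical and introduced only for illustration.

```python
from statistics import NormalDist

def middle_interval(dist: NormalDist, share: float) -> tuple[float, float]:
    """Return the cutoffs that enclose the middle `share` of the distribution.

    The area left over, 1 - share, is split equally between the two tails,
    so the cutoffs are the percentiles at tail and 1 - tail.
    """
    tail = (1 - share) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

lsat = NormalDist(mu=150, sigma=15)

# Middle 80%: the 10th and 90th percentiles, as in the example above.
low, high = middle_interval(lsat, 0.80)
print(f"middle 80% of LSAT scores: {low:.2f} to {high:.2f}")

# Sanity check: the area between the two cutoffs should be the requested share.
print(f"area between cutoffs: {lsat.cdf(high) - lsat.cdf(low):.4f}")
```

Because a normal curve is symmetric, the two cutoffs lie at equal distances on either side of the mean µ = 150.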