M227 Chapter 6 The Normal Distribution

Section 1

OBJECTIVES
• Identify distributions as symmetrical or skewed.
• Identify the properties of the normal distribution.
• Find the area under the standard normal distribution, given various z values.
• Find the probabilities for a normally distributed variable by transforming it into a standard normal variable.
• Find specific data values for given percentages using the standard normal distribution.
• Use the central limit theorem to solve problems involving sample means for large samples.
• Use the normal approximation to compute probabilities for a binomial variable.

INTRODUCTION
• Quantitative random variables can be either discrete or continuous.
• Discrete variables assume a finite number of values between any two given values of the variable.
• Continuous variables can assume an infinite number of values between any two given values of the variable.
• Examples of continuous variables: height, weight, temperature, cholesterol levels.
• Many continuous variables have distributions that are bell-shaped; these are called approximately normally distributed variables.
• Experiment: measure the heights of women in the U.S. Start with a sample of 100, then increase the sample size and decrease the class width, and observe the shape of the resulting histograms.
• When the sample size becomes very large, the histogram approaches a normal distribution (figure (d) above). This distribution is also called a bell curve or the Gaussian distribution.
• No variable fits the normal distribution perfectly, since the normal distribution is a theoretical curve.
• For many variables, the deviation of their distributions from the normal distribution is very small; thus, we can use the properties of the normal distribution in the study of these variables.

Section 2 Properties of the Normal Distribution
• In mathematics, curves can be represented by equations.
Examples: the equation of a line is $y = mx + b$, the equation of a circle is $x^2 + y^2 = r^2$, and so on.
• The normal distribution is a continuous, symmetric, bell-shaped distribution of a variable.
• The equation for the normal distribution curve, developed by the German mathematician Carl Gauss, is:

$$y = \frac{e^{-(X-\mu)^2/(2\sigma^2)}}{\sigma\sqrt{2\pi}}$$

where
e ≈ 2.718
π ≈ 3.14
µ = population mean
σ = population standard deviation
• The shape and position of the normal distribution curve depend on two parameters, the mean and the standard deviation.

Normal Distribution Properties
1. The normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and located at the center of the distribution.
3. The normal distribution curve is unimodal (i.e., it has only one mode).
4. The curve is symmetrical about the mean, which is equivalent to saying that its shape is the same on both sides of a vertical line passing through the center.
5. The curve is continuous; i.e., there are no gaps or holes. For each value of X, there is a corresponding value of Y.
6. The curve never touches the x-axis. Theoretically, no matter how far in either direction the curve extends, it never meets the x-axis, but it gets increasingly closer.
7. The total area under the normal distribution curve is equal to 1.00, or 100%.
8. The area under the normal curve that lies within one standard deviation of the mean is approximately 0.68, or 68%; within two standard deviations, about 0.95, or 95%; and within three standard deviations, about 0.997, or 99.7%.

[Figure: Areas under the normal distribution curve]

Section 3 The Standard Normal Distribution
• Since each normally distributed variable has its own mean and standard deviation, the shape and location of these curves will vary. In practical applications, one would have to have a table of areas under the curve for each variable.
To simplify this, statisticians use the standard normal distribution.
• The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.

[Figure: Standard Normal Distribution]

• The values under the curve indicate the proportion of area in each section and represent probabilities (compare this to a relative frequency histogram). For example, the area between the mean and 1 standard deviation above or below the mean is about 0.3413, or 34.13%.
• “Represent probabilities” means: if it were possible to select any z value at random, the probability of choosing one, say, between 0 and 2 would be the same as the area under the curve between 0 and 2. In this case, the area is 0.4772. Therefore, the probability of randomly selecting a z value between 0 and 2 is 0.4772.
• The horizontal axis for the graph of the standard normal distribution is called the z-axis.
• All normally distributed variables can be transformed into the standard normally distributed variable by using the formula for the standard score:

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \quad \text{or} \quad z = \frac{X - \mu}{\sigma}$$

• The z value is the number of standard deviations that a particular X value is away from the mean.
• In order to find the area under the standard normal distribution curve for any z value, use Table E in Appendix C.

Finding the Area Under the Normal Distribution Curve:
1. Between 0 and any z value:
   Look up the z value in the table to get the area.
2. In any tail:
   a. Look up the z value to get the area.
   b. Subtract the area from 0.5000.
3. Between two z values on the same side of the mean:
   a. Look up both z values to get the areas.
   b. Subtract the smaller area from the larger area.
4. Between two z values on opposite sides of the mean:
   a. Look up both z values to get the areas.
   b. Add the areas.
5. To the left of any z value, where z is greater than the mean:
   a. Look up the z value to get the area.
   b. Add 0.5000 to the area.
6. To the right of any z value, where z is less than the mean:
   a. Look up the z value to get the area.
   b. Add 0.5000 to the area.
7. In any two tails:
   a. Look up the z values to get the areas.
   b. Subtract each area from 0.5000.
   c. Add the answers.

Examples
1. Find the area under the normal distribution curve for 0 < z < 2.34, or P(0 < z < 2.34).
2. Find the area for −1.75 < z < 0, or P(−1.75 < z < 0).
3. Find the area for z > 1.11, or P(z > 1.11).
4. Find the area for z < −1.93, or P(z < −1.93).
5. Find the area for 2 < z < 2.47, or P(2 < z < 2.47).
6. Find the area for −2.48 < z < −0.83, or P(−2.48 < z < −0.83).
7. Find the area for z < 1.99, or P(z < 1.99).
8. Find the area for z > 2.43, or P(z > 2.43).

Examples
1. Find the probability that z is less than 2.03.
2. Find the probability that z is within 1.4 standard deviations of the mean.
3. Fill in the blank: P(0 < z < _______) = 0.4279 (or “find the z value such that the area under the standard normal distribution curve between 0 and the z value is 0.4279”).
4. Find two z values, one positive and one negative, so that the areas in the two tails total 12%.

Examples
Study the section in the book titled “Excel Step by Step”, page 301. Redo some of the above problems using Excel.
Note: The NORMSDIST function returns the “cumulative” area. For example, if z = 1, then NORMSDIST(1) = 0.8413 (0.5000 + 0.3413), as opposed to Table E, which returns the value 0.3413.

[Figure: area returned by Table E versus area returned by NORMSDIST for z = 1]

Section 4 Applications of the Normal Distribution
• For all the problems presented in this chapter, one can assume that the variable is normally distributed or approximately normally distributed.
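The Table E rules and the NORMSDIST note from Section 3 can be cross-checked numerically. The following is only a sketch and is not part of the course materials: it assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist`, whose `cdf` method returns the cumulative area (as NORMSDIST does). The helper name `table_e_area` is hypothetical and simply converts a cumulative area into the "between 0 and z" area that Table E reports.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)


def table_e_area(z):
    """Area between 0 and |z|, the quantity Table E reports (hypothetical helper)."""
    return Z.cdf(abs(z)) - 0.5


# Cumulative area (what NORMSDIST returns) versus the Table E area at z = 1.
cumulative_at_1 = Z.cdf(1.0)
table_at_1 = table_e_area(1.0)

# Rule 1: between 0 and z -> read the table area directly.
p_0_to_2_34 = table_e_area(2.34)
# Rule 2: one tail -> 0.5 minus the table area.
p_right_of_1_11 = 0.5 - table_e_area(1.11)
# Rule 3: same side of the mean -> larger area minus smaller area.
p_2_to_2_47 = table_e_area(2.47) - table_e_area(2.00)
# Rule 4: opposite sides of the mean -> add the two areas.
p_within_1_4 = table_e_area(-1.4) + table_e_area(1.4)
# Rule 5: left of a positive z -> table area plus 0.5.
p_left_of_1_99 = table_e_area(1.99) + 0.5

results = [
    ("cumulative area at z = 1", cumulative_at_1),
    ("Table E area at z = 1", table_at_1),
    ("P(0 < z < 2.34)", p_0_to_2_34),
    ("P(z > 1.11)", p_right_of_1_11),
    ("P(2 < z < 2.47)", p_2_to_2_47),
    ("P(-1.4 < z < 1.4)", p_within_1_4),
    ("P(z < 1.99)", p_left_of_1_99),
]
for label, area in results:
    print(f"{label}: {area:.4f}")
```

The first two printed values should agree with the 0.8413 and 0.3413 quoted in the NORMSDIST note. Small differences from Table E in the fourth decimal place can occur because the table is rounded.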
• To solve problems by using the standard normal distribution, transform the original variable to a standard normal distribution variable by using the z formula:

$$z = \frac{X - \mu}{\sigma}$$

Example: Let X be a normal random variable with mean 80 and standard deviation 12. What percentage of values are:
1. Between 85 and 98?
2. Outside of 1.5 standard deviations of the mean?

Example 6-14: The mean number of hours an American worker spends on a computer is 3.1 hours per workday. Assume the standard deviation is 0.5 hour. Find the percentage of workers who spend less than 3.5 hours on the computer.
Solution:
1. Draw the figure and represent the area that we want to find.
2. Find the z value corresponding to 3.5: $z = \frac{X - \mu}{\sigma} = \frac{3.5 - 3.1}{0.5} = 0.80$. Hence, 3.5 is 0.8 standard deviations above the mean of 3.1.
3. Find the area using Table E: A(0 < z < 0.8) = 0.2881. Since we need the area to the left of z = 0.8, add 0.5 to it to get 0.7881.
4. Therefore, 78.81% of the workers spend less than 3.5 hours per workday on the computer.

Example 6-16: AAA reports that the average time it takes to respond to an emergency call is 25 minutes. Assume that the standard deviation is 4.5 minutes. If 80 calls are randomly selected, approximately how many will be responded to in less than 15 minutes?
1. To find the area to the left of 15, first find its z value: $z = \frac{X - \mu}{\sigma} = \frac{15 - 25}{4.5} = -2.22$.
2. Find the area from Table E: 0.4868 (use +2.22).
3. Subtract 0.4868 from 0.5 to get 0.0132.
4. Multiply the sample size 80 by 0.0132 to get 1.056. Hence, approximately 1 call will be responded to in under 15 minutes.

Calculating Cut-off Values
Example 6-17: To qualify for a police academy, candidates must score in the top 10% on a general abilities test. The test has a mean of 200 and a standard deviation of 20. Find the lowest possible score to qualify.
1. We need to find the X value that cuts off the upper 10% of the area under the normal distribution curve.
2. Work backward to solve this problem.
3. Subtract 0.1 from 0.5 to get the area between the mean 200 and X, which is 0.4000.
4. Find the z value that corresponds to the area of 0.4000. If the specific value cannot be found, use the closest value. If it falls exactly between two z values, use the larger of the two z values. Here, z = 1.28.
5. Substitute in the z value formula and solve for X: $z = \frac{X - \mu}{\sigma} \rightarrow 1.28 = \frac{X - 200}{20} \rightarrow X = 200 + (1.28)(20) = 225.6$, which rounds up to 226.
6. A score of at least 226 is needed in order to qualify.

Example 6-18: For a medical study, a researcher wishes to select people in the middle 60% of the population based on blood pressure. If the mean systolic pressure is 120 and the standard deviation is 8, find the upper and lower readings that would qualify people to participate in the study.
Note that two values are needed, one above the mean and one below the mean. Find the value to the right of the mean first. The closest z value for an area of 0.3000 is 0.84. Substitute this into the z-score formula to find $X_1$:

$$z = \frac{X - \mu}{\sigma} \rightarrow 0.84 = \frac{X_1 - 120}{8} \rightarrow X_1 - 120 = (0.84)(8) \rightarrow X_1 = 120 + 6.72 = 126.72$$

On the other side: $X_2 = 120 - 6.72 = 113.28$.
Therefore, the middle 60% will have blood pressure readings of 113.28 < X < 126.72.

Determine Normality
1. Draw a histogram and see if the curve is bell-shaped. If it is, do step 2.
2. Check for skewness using Pearson’s index of skewness: $PI = \frac{3(\bar{X} - \text{median})}{s}$. If $-1 < PI < 1$, then do step 3. (If not, assume that the data are significantly skewed.)
3. Check for outliers.

Section 5 The Central Limit Theorem
• A sampling distribution of sample means is a distribution using the means computed from all possible random samples of a specific size taken from a population.
• The goal of the Central Limit Theorem is to determine the behavior of the means of samples of the same size taken from the same population, as it relates to the population mean.
• Properties of the Distribution of Sample Means (select all possible samples of a specific size with replacement):
  o The mean of the sample means (denoted by $\mu_{\bar{X}}$) will be the same as the population mean.
  o The standard deviation of the sample means will be smaller than the standard deviation of the population, and it will be: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
• Example: Consider an 8-point quiz given to 4 students. The results of the quiz were: 2, 6, 4, 8. Assume that the population for this experiment consists of the 4 students. The population mean and standard deviation are:

$$\mu = \frac{2 + 6 + 4 + 8}{4} = 5$$

$$\sigma = \sqrt{\frac{(2-5)^2 + (6-5)^2 + (4-5)^2 + (8-5)^2}{4}} = 2.236$$

Select all samples of size 2 taken with replacement, and calculate the mean of each sample:

Sample   Mean     Sample   Mean
2,2      2        6,2      4
2,4      3        6,4      5
2,6      4        6,6      6
2,8      5        6,8      7
4,2      3        8,2      5
4,4      4        8,4      6
4,6      5        8,6      7
4,8      6        8,8      8

Construct an ungrouped frequency distribution of the means:

Sample mean   f
2             1
3             2
4             3
5             4
6             3
7             2
8             1

The histogram for this distribution appears to be approximately normal:

[Figure: histogram of the sample means; horizontal axis: Sample Means (2 to 8), vertical axis: Frequencies]

Calculate the mean of the sample means, which equals the population mean:

$$\mu_{\bar{X}} = \frac{2 + 3 + \cdots + 8}{16} = \frac{80}{16} = 5 = \mu$$

Calculate the standard deviation of the sample means, which equals $\sigma/\sqrt{n}$:

$$\sigma_{\bar{X}} = \sqrt{\frac{(2-5)^2 + \cdots + (8-5)^2}{16}} = 1.581, \qquad \frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{2}} = 1.581 = \sigma_{\bar{X}}$$

• The standard deviation of the sample means is called the standard error of the mean.
• Central Limit Theorem: As the sample size n increases, the shape of the distribution of the sample means taken with replacement from a population with mean µ and standard deviation σ will approach a normal distribution. This distribution will have a mean µ and a standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
• The central limit theorem can be used to answer questions about sample means in the same manner that the normal distribution can be used to answer questions about individual values.
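The quiz example above can be reproduced by brute-force enumeration. The sketch below is an illustration rather than part of the course materials: it assumes Python 3 and uses only the standard library (`itertools.product` to list every ordered sample drawn with replacement, `statistics.pstdev` for the population standard deviation that divides by N). All variable names are illustrative.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 6, 4, 8]  # quiz scores of the 4 students
n = 2                      # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Ungrouped frequency distribution of the sample means.
frequencies = Counter(sample_means)

mu = mean(population)
sigma = pstdev(population)          # population standard deviation (divide by N)
mu_xbar = mean(sample_means)        # mean of the sample means
sigma_xbar = pstdev(sample_means)   # standard error computed directly from the 16 means

print("number of samples:", len(samples))
for value in sorted(frequencies):
    print(f"mean {value}: f = {frequencies[value]}")
print(f"mu = {mu}, mean of sample means = {mu_xbar}")
print(f"sigma = {sigma:.3f}, sigma / sqrt(n) = {sigma / sqrt(n):.3f}")
print(f"standard deviation of sample means = {sigma_xbar:.3f}")
```

The last two printed values should coincide, which is the $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ relationship checked by hand in the example.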
• A new formula must be used for the z values:

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

• Practical Alternatives to the Central Limit Theorem:
  o When the original population is normally distributed, the distribution of the sample means will also be normally distributed, for any sample size n.
  o When the distribution of the original population might not be normal, a sample size of 30 or more is needed in order to assume that the distribution of the sample means is approximately normal. The larger the sample, the better the approximation.

Example 6-22: The average age of a vehicle registered in the United States is 8 years (or 96 months). Assume the standard deviation is 16 months. If a random sample of 36 vehicles is selected, find the probability that the mean of their ages is between 90 and 100 months.
Since the sample size is greater than 30, we can assume that the distribution of the sample means is approximately normal. The two z values are:

$$z_1 = \frac{90 - 96}{16/\sqrt{36}} = -2.25, \qquad z_2 = \frac{100 - 96}{16/\sqrt{36}} = 1.5$$

The two areas corresponding to these z values are A(−2.25) = 0.4878 and A(1.5) = 0.4332. Since the z values are on opposite sides of the mean, the probability is found by adding the two areas:

$$P(90 < \bar{X} < 100) = 0.4878 + 0.4332 = 0.9210 = 92.1\%$$

Example 6-23:
• Emphasize the difference between asking questions about an individual value and asking questions about a sample mean.

Finite Population Correction Factor: OMIT

Section 6 Normal Approximation to the Binomial Distribution
• Often, normal distributions are used to solve problems involving binomial distributions when n is large.
• Characteristics of binomial distributions:
  o There must be a fixed number of trials.
  o The outcome of each trial must be independent.
  o Each experiment can have only two outcomes or be reduced to two outcomes.
  o The probability of a success must remain the same for each trial.
• A binomial distribution is determined by n (number of trials) and p (probability of success).
• When p is approximately 0.5 and n increases, the shape of the binomial distribution becomes similar to that of the normal distribution.
• Rule of thumb: Use the normal distribution when $n \cdot p \ge 5$ and $n \cdot q \ge 5$, where q = 1 − p.
• In addition to the above condition, a correction for continuity must be used. This correction results from the fact that when we deal with a discrete variable X, we must use its boundaries for its probability. For example, for P(X = 7) we use the correction P(6.5 < X < 7.5).

Summary of Normal Approximation to the Binomial Distribution

Binomial      Normal
P(X = a)      P(a − 0.5 < X < a + 0.5)
P(X ≥ a)      P(X > a − 0.5)
P(X > a)      P(X > a + 0.5)
P(X ≤ a)      P(X < a + 0.5)
P(X < a)      P(X < a − 0.5)

• Procedure to use the normal distribution for approximating the binomial distribution:
  o Step 1: Check to see whether the normal approximation can be used.
  o Step 2: Find the mean µ and the standard deviation σ.
  o Step 3: Write the problem in probability notation, using X.
  o Step 4: Rewrite the problem using the continuity correction factor, and show the corresponding area under the normal distribution curve.
  o Step 5: Find the corresponding z values.
  o Step 6: Find the solution.
• Example 6-24.
• Example 6-27.

Section 7 Summary
• The normal distribution can be used to describe a variety of variables, such as heights, weights, and temperatures.
• The normal distribution is bell-shaped, unimodal, symmetric, and continuous; its mean, median, and mode are equal.
• The area under the normal distribution curve is 1.
• Mathematicians use the standard normal distribution, which has a mean of 0 and a standard deviation of 1.
• The normal distribution can be used to describe a sampling distribution of sample means.
• These samples must be of the same size and randomly selected with replacement from the population.
• The central limit theorem states that as the size of the samples increases, the distribution of sample means will approach a normal distribution.
• If the normality of the population is not known, use a sample size of 30 or more.
• The normal distribution can be used to approximate other distributions, such as the binomial distribution.
• For the normal distribution to be used as an approximation to the binomial distribution, the conditions np ≥ 5 and nq ≥ 5 must be met.
• A correction for continuity may be used for more accurate results.
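The six-step procedure from Section 6 can be sketched in code and compared against exact binomial probabilities. This is an illustration only, under stated assumptions: the numbers n = 20, p = 0.5, and a = 7 are made up for the sketch and do not come from Examples 6-24 or 6-27; the mean and standard deviation use the standard binomial formulas µ = np and σ = √(npq), which the notes refer to in Step 2 but do not spell out; and the function names are hypothetical. It assumes Python 3.8 or later (for `math.comb` and `statistics.NormalDist`).

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_approx(n, p):
    """Steps 1 and 2: check the rule of thumb, then build the approximating normal."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("rule of thumb np >= 5 and nq >= 5 is not met")
    return NormalDist(mu=n * p, sigma=sqrt(n * p * q))


# Illustrative numbers only (assumed, not taken from the course examples).
n, p, a = 20, 0.5, 7
approx = normal_approx(n, p)

# P(X = a)  ->  P(a - 0.5 < X < a + 0.5)
approx_equal = approx.cdf(a + 0.5) - approx.cdf(a - 0.5)
exact_equal = binomial_pmf(a, n, p)

# P(X <= a)  ->  P(X < a + 0.5)
approx_at_most = approx.cdf(a + 0.5)
exact_at_most = sum(binomial_pmf(k, n, p) for k in range(a + 1))

# P(X >= a)  ->  P(X > a - 0.5)
approx_at_least = 1 - approx.cdf(a - 0.5)
exact_at_least = sum(binomial_pmf(k, n, p) for k in range(a, n + 1))

for label, exact, estimate in [
    ("P(X = 7)", exact_equal, approx_equal),
    ("P(X <= 7)", exact_at_most, approx_at_most),
    ("P(X >= 7)", exact_at_least, approx_at_least),
]:
    print(f"{label}: exact = {exact:.4f}, normal approximation = {estimate:.4f}")
```

Using `cdf` on a normal distribution with mean µ and standard deviation σ is equivalent to Steps 5 and 6 (converting to z values and looking up areas); the continuity correction appears as the ±0.5 shifts, following the summary table in Section 6.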