Instruction: The Normal Distribution

This lecture discusses the normal distribution (also called the Gaussian distribution). Previous lectures discussed samples and their distributions. In this section, we will consider a particular population distribution called the normal curve. The normal curve is a bell-shaped, symmetric curve that is the graph of a theoretical relative frequency distribution of a continuous variable associated with a large population. Recall that a continuous variable is one for which infinitely many possible values fall between any two measured values. Observations such as weights, lengths, and durations are continuous variables.

Consider, for example, the following sample of weights, rounded to the nearest tenth of a microgram, and the sample's relative frequency distribution.

W = { 2.2, 4.9, 0.6, 2.4, 2.9, 3.1, 1.4, 2.6, 1.7, 2.3,
      1.8, 3.2, 2.1, 4.3, 2.6, 1.9, 3.3, 0.8, 3.9, 2.5,
      2.7, 1.8, 2.3, 3.4, 1.1, 2.7, 3.5, 1.4, 3.8, 2.1 }

[Figure: relative frequency histogram of W; the vertical axis f runs from 0.00 to 0.45]

This symmetrical distribution represents a sample of thirty weights measured in micrograms. Imagine that the sample is a random sample of thirty weights taken from a type of krill in the Atlantic Ocean; then imagine the distribution associated with the population of billions of the same type of krill. The distribution above uses five classes, each with a width of one microgram. For a population of billions, much narrower classes would be used, with the effect that the graph would be "smoothed" into a continuous bell-shaped curve similar to Figure A.

Figure A

Indeed, many such populations have relative frequency distributions whose graphs take a shape of this type, called the normal curve, whose properties are given below.

The graph of a normal curve is bell-shaped and symmetric about a vertical line through the center of the distribution, which corresponds to the mean, median, and mode.
The normal curve approaches the X-axis asymptotically at the far right and far left.

The functions that define the distribution of a continuous random variable are called continuous probability density functions. The normal probability density function, f(X), is given by

$$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left[\frac{X-\mu}{\sigma}\right]^{2}}$$

where µ is the mean, σ is the standard deviation, and X is the continuous random variable. The independent variable of the normal probability density function is the continuous random variable. The mean and the standard deviation are parameters: every distinct combination of µ and σ produces a different normal probability distribution. As Figure B shows, normal distributions can vary while maintaining the characteristics that make them normal.

Figure B

Since the normal curve represents a relative frequency distribution, it is in effect a probability distribution: the relative frequency of any given interval of X-values equals the probability that an object selected at random from the population will have a raw score that falls in that interval (assuming that each member of the population is equally likely to be selected). Accordingly, the area under the curve over any interval of X-values corresponds to the probability that the continuous variable takes a value in that interval. The total area under the normal curve equals one square unit, representing the population as a whole (100% of the data) and corresponding to certainty.

According to what is called the empirical rule, the areas under the normal curve over the intervals within one, two, and three standard deviations of the mean equal about 68%, 95%, and 99.7% of the total area, respectively.
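The density formula and the empirical rule above can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the function names `normal_pdf` and `area_within` are hypothetical, and the areas are computed with the standard library's `statistics.NormalDist` rather than by integrating the density by hand.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal probability density f(X) for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# The defaults mu=0 and sigma=1 are an arbitrary choice for illustration;
# the areas are the same for every combination of mu and sigma.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The printed areas (0.6827, 0.9545, and 0.9973) round to the 68%, 95%, and 99.7% quoted by the empirical rule.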
This means that approximately 68% of the data of a normal population falls within one standard deviation of the mean, approximately 95% falls within two standard deviations, and approximately 99.7% falls within three standard deviations. The empirical rule is displayed graphically below in Figure C.

[Figure C: normal curve with the horizontal axis marked from −3 to 3 standard deviations]

The empirical rule has implications that can help solve problems. Consider a population of 500,000 whose distribution is understood to be normal. If the population mean is 12 and the standard deviation equals 4, how many members of the population have raw scores between 8 and 16? Note that 8 and 16 each lie exactly one standard deviation (that is, one four-unit interval) from the mean. According to the empirical rule, about 68% of the data fall within one standard deviation of the mean. Since 0.68 × 500,000 = 340,000, an estimated 340,000 members have raw scores between 8 and 16.

The symmetry of the normal curve also has implications that can help solve problems. Consider the same population above. How many members of the population have raw scores between 12 and 16? We know that about 340,000 members fall between 8 and 16. Note that the mean equals the midpoint of 8 and 16. Since the distribution is symmetrical, half of the 340,000 members fall between 8 and the mean while the other half fall between the mean and 16. Thus, about 170,000 members have a score between 12 and 16.

The previous paragraphs discussed problems that could be addressed by the empirical rule, which applies only to regions under the normal curve bounded at 1, 2, or 3 standard deviations from the mean, that is, at whole-number Z-scores. If we transform X-values to Z-scores, then it is possible to standardize the normal probability density function by letting µ = 0 and σ = 1. The standardized normal probability density function, f(Z), is given by

$$f(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}}.$$

Most textbooks provide tables like Table E.2 in our text (Business Statistics, Levine et al.). These tables help answer questions involving fractional units of standard deviations. Table E.2 shows the area under the curve for intervals that extend from negative infinity to a particular Z-score. The full table is in the textbook, but the segment of the table below will help answer the following four questions regarding a population with a mean of 476 and a standard deviation of 20.

1) What percent of the population has a score below 506?
2) What percent of the population has a score greater than 506?
3) What percent of the population has a score between the mean and 506?
4) What percent of the population has a score between 468 and 506?

To use the table, convert the raw scores to Z-scores as shown below.

$$Z_{506} = \frac{506 - 476}{20} = 1.50 \qquad Z_{468} = \frac{468 - 476}{20} = -0.40$$

Use the Z-score to consult the table as below (Z is the Z-score; A is the area under the curve to the left of Z).

Z   0.37     0.38     0.39     0.40     0.41     0.42     0.43
A   0.6443   0.6480   0.6517   0.6554   0.6591   0.6628   0.6664

Z   0.93     0.94     0.95     0.96     0.97     0.98     0.99
A   0.8238   0.8264   0.8289   0.8315   0.8340   0.8365   0.8389

Z   1.49     1.50     1.51     1.52     1.53     1.54     1.55
A   0.9319   0.9332   0.9345   0.9357   0.9370   0.9382   0.9394

Z   2.05     2.06     2.07     2.08     2.09     2.10     2.11
A   0.9798   0.9803   0.9808   0.9812   0.9817   0.9821   0.9826

Z   2.61     2.62     2.63     2.64     2.65     2.66     2.67
A   0.9953   0.9955   0.9956   0.9957   0.9959   0.9960   0.9961

Z   3.17     3.18     3.19     3.20     3.21     3.22     3.23
A   0.99924  0.99926  0.99929  0.99931  0.99934  0.99936  0.99938

The table contains the answer to question one: 93.32% of the area under the curve falls below the Z-score of 1.50. Equivalently, 93.32% of the population has a raw score below 506.

For question two, we use the complement principle. If 93.32% of the data falls below the raw score of 506, then the rest of the data must fall above 506. Subtracting the answer to question one from the whole gives the answer to question two: 100% − 93.32% = 6.68%.

For question three, we will find a difference.
According to the table, 93.32% of the area falls below 1.50. Also, 50% of the area falls below the mean. Thus, 43.32% (93.32 − 50 = 43.32) of the area falls between the mean and the Z-score 1.50. Equivalently, 43.32% of the population has a raw score between the mean and 506.

Table E.2 in the textbook shows areas associated with negative Z-scores, but our partial table above does not. Nevertheless, we can answer question four using the symmetry of the normal curve. We need the area between the raw score 468, which has a Z-score of −0.40, and the mean. The table gives the area below a positive 0.40, which is 65.54%. Subtracting the area below the mean gives the area between the mean and 0.40: 65.54% − 50.00% = 15.54%. Because the normal curve is symmetrical, 15.54% of the data also falls between −0.40 and the mean. Since we know from question three that 43.32% of the data falls between the mean and 1.50, we know that 58.86% of the data falls between the Z-scores −0.40 and 1.50 (because 15.54 plus 43.32 equals 58.86). Equivalently, 58.86% of the data falls between the raw scores 468 and 506.

Assignment 5

Problems

#1 Use the empirical rule for the normal distribution to answer the following two questions.
A) In a survey conducted by the National Center for Health Statistics, the sample mean height of women in the United States (ages 20-29) was 64 inches, with a sample standard deviation of 2.75 inches. If the sample is normally distributed, about what percent of the women have heights between 64 inches and 69.5 inches?
B) The mean mileage (in thousands) for a rental car company's fleet is twelve and the standard deviation (in thousands) is approximately 3.2. Between what two values do 99.7% of the data lie? (Assume normality.)

#2 Assume the annual consumption of peanuts is normally distributed with a mean of 5.9 pounds per person and a standard deviation of 1.8 pounds per person.
A) What percent of people annually consume less than 3.1 pounds of peanuts?
B) What percent of people annually consume more than 3.1 pounds of peanuts?

#3 The weights of adult male rhesus monkeys are normally distributed, with a mean of 15 pounds and a standard deviation of 3 pounds. A rhesus monkey is randomly selected and weighed.
A) Find the probability that the monkey's weight is less than thirteen pounds.
B) Find the probability that the monkey's weight is more than seventeen pounds.

#4 According to the National Marine Fisheries Service, the lengths of Atlantic croaker fish are normally distributed with a mean of ten inches and a standard deviation of two inches. An Atlantic croaker fish is selected at random.
A) Find the probability that the fish is less than seven inches in length.
B) Find the probability that the fish is between seven inches and fifteen inches in length.

#5 Assume that a normally distributed population of weights of a certain species of fish has a mean of forty pounds and a standard deviation of eight pounds. A certain fishing trawler throws back any netted fish (of the species in question) that does not weigh at least sixty-four pounds. Find the probability that at least one acceptable fish will be caught per netting of 100 fish (of the species in question).
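Returning to the lecture's worked example (mean 476, standard deviation 20), the four table lookups can be cross-checked without Table E.2. The following is a minimal Python sketch, not part of the original lecture: the variable names are hypothetical, and the standard library's `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist

# Population from the lecture's worked example: mean 476, standard deviation 20.
population = NormalDist(mu=476, sigma=20)

# Question 1: area below the raw score 506 (Z = 1.50).
below_506 = population.cdf(506)

# Question 2: complement principle, everything not below 506.
above_506 = 1 - below_506

# Question 3: area between the mean and 506.
mean_to_506 = below_506 - population.cdf(476)

# Question 4: area between 468 (Z = -0.40) and 506 (Z = 1.50).
from_468_to_506 = below_506 - population.cdf(468)

results = [
    ("below 506", below_506),
    ("above 506", above_506),
    ("between the mean and 506", mean_to_506),
    ("between 468 and 506", from_468_to_506),
]
for label, probability in results:
    print(f"{label}: {100 * probability:.2f}%")
```

Because `NormalDist.cdf` evaluates the cumulative area directly at any raw score, the symmetry step needed for the negative Z-score in question four is unnecessary here. The same pattern applies to any problem that asks for the area below, above, or between raw scores.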