10/22/2014

5.1 Introduction to Normal Distributions

Properties of a Normal Distribution
• The mean, median, and mode are equal.
• The curve is bell shaped and symmetric about the mean.
• The total area under the curve is one, or 100%.
• As the curve extends farther and farther away from the mean, it gets closer and closer to the x-axis but never touches it.
• The points at which the curvature changes are called inflection points. The graph curves downward between the inflection points and curves upward past the inflection points.
[Figure: normal curve over the x-axis with the two inflection points marked]

Means and Standard Deviations
[Figure: curves with different means and the same standard deviation; x-axis from 10 to 20. Question on slide: what are the means?]
[Figure: curves with different means and different standard deviations; x-axis from 9 to 22]

Empirical Rule
• About 68% of the area lies within 1 standard deviation of the mean.
• About 95% of the area lies within 2 standard deviations of the mean.
• About 99.7% of the area lies within 3 standard deviations of the mean.

Determining Intervals
Example: An instruction manual claims that assembly time for a product is normally distributed with a mean of 4.2 hours and a standard deviation of 0.3 hour. Determine the interval in which 95% of the assembly times fall.
[Figure: normal curve; x-axis marked at 3.3, 3.6, 3.9, 4.2, 4.5, 4.8, 5.1]
About 95% of the data fall within 2 standard deviations of the mean:
4.2 – 2(0.3) = 3.6 and 4.2 + 2(0.3) = 4.8
About 95% of the assembly times will be between 3.6 and 4.8 hours.

The Standard Normal Distribution
The standard normal distribution has mean = 0 and standard deviation = 1. Using z-scores, any normal distribution can be transformed into the standard normal distribution.

Chapter 2: The Standard Score
The standard score, or z-score, represents the number of standard deviations a random variable x falls from the mean:
z = (value – mean) / (standard deviation) = (x – μ) / σ
Example: Test scores for a civil service exam are normally distributed with a mean of 152 and a standard deviation of 7.
Find the standard z-score for a person with a score of: (a) 161 (b) 148 (c) 152
(a) z = (161 – 152)/7 ≈ 1.29
(b) z = (148 – 152)/7 ≈ –0.57
(c) z = (152 – 152)/7 = 0

If a normal distribution is standardized using tables, then each value must be standardized to find probabilities.
[Figure: standard normal curve; z-axis from –4 to 4]

Cumulative Areas
The total area under the curve is one. Cumulative area is summed from left to right.
• The cumulative area is close to 0 for z-scores close to –3.49.
• The cumulative area for z = 0 is 0.50.
• The cumulative area is close to 1 for z-scores close to 3.49.

Cumulative Areas
Using a standard normal table, find the cumulative area for a z-score of –1.25.
Pg. A16: read down the z column on the left to z = –1.2 and across to the cell under .05. The entry, 0.1056, is the cumulative area.
The probability that z is at most –1.25 is 0.1056.

Finding Probabilities
To find the probability that z is less than a given value, read the cumulative area in the table corresponding to that z-score.
Find P(z < –1.45).
Read down the z column to –1.4 and across to .05: P(z < –1.45) = 0.0735.

Finding Probabilities
To find the probability that z is greater than a given value, subtract the cumulative area in the table from 1.
Find P(z > –1.24).
The cumulative area (area to the left) is 0.1075, so the area to the right is 1 – 0.1075 = 0.8925.
P(z > –1.24) = 0.8925

Finding Probabilities
The probability that z is between two values: find the cumulative areas for each and subtract the smaller area from the larger.
Find P(–1.25 < z < 1.17).
1. P(z < 1.17) = 0.8790
2. P(z < –1.25) = 0.1056
3. P(–1.25 < z < 1.17) = 0.8790 – 0.1056 = 0.7734

Summary
• To find the probability that z is less than a given value, read the corresponding cumulative area (the cdf).
• To find the probability that z is greater than a given value, subtract the cumulative area in the table from 1.
• To find the probability that z is between two given values, find the cumulative areas for each and subtract the smaller area from the larger. Probabilities can't be negative, so always subtract smaller from larger.

Section 5.2 Normal Distributions: Finding Probabilities

Probabilities and Normal Distributions
If a random variable x is normally distributed, then the probability that x will fall within an interval is equal to the area under the curve in the interval.
Example: IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. Find the probability that a person selected at random will have an IQ score less than 115.
To find the area, first find the standard score equivalent to x = 115:
z = (115 – 100)/15 = 1

Probabilities and Normal Distributions
Normal distribution: find P(x < 115). Standard normal distribution: find P(z < 1). The two areas are the SAME.
From the standard normal table: P(z < 1) = 0.8413, so P(x < 115) = 0.8413.

Application Example
Monthly utility bills in a city are normally distributed with a mean of $100 and a standard deviation of $12. A utility bill is randomly selected. Find the probability it is between $80 and $115.
Normal distribution: μ = 100; σ = 12
z = (80 – 100)/12 ≈ –1.67 and z = (115 – 100)/12 = 1.25
P(80 < x < 115) = P(–1.67 < z < 1.25)
Subtract areas under the curve: 0.8944 – 0.0475 = 0.8469
The probability that a utility bill is between $80 and $115 is 0.8469.

Section 5.3 Normal Distributions: Finding Values

From Areas to z-Scores
Find the z-score corresponding to a cumulative area of 0.9803.
Locate 0.9803 in the area portion of the table. Read the values at the beginning of the corresponding row and at the top of the column. The z-score is 2.06.
z = 2.06 corresponds roughly to the 98th percentile.

Finding z-Scores from Areas
Find the z-score corresponding to the 90th percentile.
The closest table area is .8997. The row heading is 1.2 and the column heading is .08, which corresponds to z = 1.28.
A z-score of 1.28 corresponds to the 90th percentile.

Finding z-Scores from Areas
Find the z-score with an area of .60 falling to its right.
With .60 to the right, the remaining area to the left is .40. The closest value in the table is .4013. The row heading is –0.2 and the column heading is .05, so the z-score is –0.25.
A z-score of –0.25 has an area of .60 to its right. It also corresponds to the 40th percentile.

Finding z-Scores from Areas
Find the z-score such that 45% of the area under the curve falls between –z and z.
Cumulate from the left. The area remaining in the tails is 1 – .45 = .55. Half this area is in each tail, so .55/2 = .275 is the cumulative area for the negative z value and .275 + .45 = .725 is the cumulative area for the positive z value.
The closest table area to .275 is .2743, so the negative z-score is –0.60. The positive z-score is 0.60.

From z-Scores to Raw Scores
To find a data value x when given a standard score z, rearrange the z-score formula:
x = μ + zσ
Example: The test scores for a civil service exam are normally distributed with a mean of 152 and a standard deviation of 7. Find the test score for a person with a standard score of: 2.33, –1.75, 0
x = 152 + (2.33)(7) = 168.31
x = 152 + (–1.75)(7) = 139.75
x = 152 + (0)(7) = 152
z-scores, or standard scores, are the number of standard deviations above or below the mean.

Finding Percentiles or Cut-off Values
Monthly utility bills in a city are normally distributed with a mean of $100 and a standard deviation of $12. What is the smallest utility bill that can be in the top 10% of the bills?
[Figure: normal curve with 90% of the area to the left of z and 10% to the right]
Find the cumulative area in the table that is closest to 0.90. The area 0.8997 corresponds to a z-score of 1.28.
To find the corresponding x-value, use x = μ + zσ:
x = 100 + 1.28(12) = 115.36
$115.36 is the smallest value in the top 10%.

Section 5.4 The Central Limit Theorem
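Looking back at the Section 5.3 lookups (area to z-score, then z-score to raw score): the table work can be cross-checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library statistics.NormalDist class; the variable names are illustrative and are not part of the slides.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

# Area -> z-score: the 90th percentile (the table lookup gives about 1.28).
z_90 = standard.inv_cdf(0.90)

# An area of .60 to the right leaves a cumulative area of .40 to the left
# (the table lookup gives about -0.25).
z_right_60 = standard.inv_cdf(1 - 0.60)

# z-score -> raw score, x = mean + z * sd, for the civil service exam.
exam_mean, exam_sd = 152, 7
raw_scores = [exam_mean + z * exam_sd for z in (2.33, -1.75, 0)]

# Cut-off for the top 10% of utility bills (mean $100, sd $12).
bills = NormalDist(mu=100, sigma=12)
cutoff = bills.inv_cdf(0.90)

print(round(z_90, 2), round(z_right_60, 2))
print([round(x, 2) for x in raw_scores])
print(round(cutoff, 2))
```

The computed cut-off can differ from the table answer of $115.36 by a couple of cents, because the table method rounds the z-score to 1.28 before converting back to dollars.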
Sampling Distributions
A sampling distribution is the probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population. If the sample statistic is the sample mean, then the distribution is the sampling distribution of sample means.
[Figure: repeated samples drawn from a population, each producing a sample mean x̄]
The sampling distribution consists of the values of the sample means.

The Central Limit Theorem
If samples of size n ≥ 30 are taken from a population with any type of distribution that has a mean = μ and standard deviation = σ, then the sample means will have an approximately normal distribution with mean μ_x̄ = μ and a standard deviation of σ_x̄ = σ/√n, the standard error of the mean.

The Central Limit Theorem
If a sample of any size is taken from a population with a normal distribution with mean = μ and standard deviation = σ, then the distribution of means of sample size n will be normally distributed with a mean μ_x̄ = μ and a standard deviation σ_x̄ = σ/√n.

Application
Mean length of sockeye salmon is μ = 69.2 cm and σ = 2.9 cm. Random samples of 60 fish are selected. Find the mean and standard deviation (standard error) of the sampling distribution.
mean: μ_x̄ = 69.2
standard deviation: σ_x̄ = 2.9/√60 ≈ 0.3744
The distribution of means of sample size 60 will be normal.

Interpreting the Central Limit Theorem
Mean length of sockeye salmon is μ = 69.2 cm. If a random sample of 60 fish is selected, what is the probability that the mean length for the sample is greater than 70 cm? Assume the standard deviation is 2.9 cm.
Since n > 30, the sampling distribution of x̄ will be approximately normal, with mean 69.2 and standard deviation 2.9/√60 ≈ 0.3744.

Interpreting the Central Limit Theorem
Find the z-score for a sample mean of 70:
z = (70 – 69.2)/0.3744 ≈ 2.14
P(z > 2.14) = 1 – 0.9838 = 0.0162
There is a 0.0162 or 1.62% probability that a sample of 60 sockeye will have a mean length greater than 70 cm.
What is the probability that 1 fish will be > 70 cm? Here the individual standard deviation applies: z = (70 – 69.2)/2.9 ≈ 0.28, so P(z > 0.28) = 1 – 0.6103 = 0.3897, about 39%.

Application: Central Limit Theorem
A long time ago, the mean price of gasoline in California was $1.164 per gallon.
What is the probability that the mean price for a sample of 38 gas stations in California is between $1.169 and $1.179? Assume the standard deviation = $0.049.
Since n > 30, the sampling distribution of x̄ will be approximately normal, with mean 1.164 and standard deviation 0.049/√38 ≈ 0.0079.

Application: Central Limit Theorem
Calculate the z-score for sample values of $1.169 and $1.179:
z = (1.169 – 1.164)/0.0079 ≈ 0.63 and z = (1.179 – 1.164)/0.0079 ≈ 1.90
P(0.63 < z < 1.90) = 0.9713 – 0.7357 = 0.2356
The probability is 0.2356 that the mean for the sample is between $1.169 and $1.179.
Hint: drawing the distribution, values, and area of interest will help keep calculations clear.

Central Limit Theorem
Creature Cast Central Limit Theorem video: http://vimeo.com/75089338

Section 5.5 Normal Approximation to Binomial Distributions

Binomial Distribution Characteristics
• There are a fixed number of independent trials, n.
• Each trial has 2 outcomes, Success or Failure.
• The probability of S on a single trial is p and the probability of F is q. In total: p + q = 1.
• We can find the probability of exactly x successes out of n trials, where x = 0 or 1 or 2 … n.
• x is a discrete random variable representing a count of the number of S's in n trials.

Application
34% of Americans have type A+ blood. If 500 Americans are sampled at random, what is the probability at least 300 have type A+ blood?
Using Chapter 4 you could calculate the probability that exactly 300, exactly 301 … exactly 500 Americans have A+ blood type and then add the probabilities (but this should drive you crazy).
Alternatively, use normal curve probabilities to approximate binomial probabilities.
If np ≥ 5 and nq ≥ 5, then the binomial random variable x is approximately normally distributed with mean μ = np and standard deviation σ = √(npq).

Why np ≥ 5 and nq ≥ 5?
[Figure: binomial histogram for n = 5, p = 0.25, q = .75, where np = 1.25 and nq = 3.75; x from 0 to 5]

Binomial Probabilities
The binomial distribution is discrete with a probability histogram graph. The probability that a specific value of x will occur is equal to the area of the rectangle with midpoint at x.
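Because each rectangle in the histogram is one unit wide, its area equals its height, which is the binomial probability of exactly x successes. A minimal sketch of that calculation, assuming Python 3.8 or later for math.comb; the helper name binomial_pmf is hypothetical, and the values n = 5 and p = 0.25 are the ones from the "Why np ≥ 5 and nq ≥ 5?" slide:

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, x) * p**x * q ** (n - x)


# Rectangle heights (equal to areas, since each bar has width 1).
n, p = 5, 0.25
heights = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, height in enumerate(heights):
    print(x, round(height, 4))
```

With np = 1.25 the bars pile up at the left and trail off to the right, so a symmetric bell curve fits them poorly. That skew is what the np ≥ 5 and nq ≥ 5 condition guards against.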
Example: If n = 50 and p = 0.25, find P(14 ≤ x ≤ 16).
Add the areas of the rectangles with midpoints at x = 14, x = 15, and x = 16:
0.111 + 0.089 + 0.065 = 0.265

[Figures for "Why np ≥ 5 and nq ≥ 5?": binomial histograms for n = 20, p = 0.25 (np = 5, nq = 15; x from 0 to 20) and for n = 50, p = 0.25 (np = 12.5, nq = 37.5; x from 0 to 50)]

Correction for Continuity
Use the normal approximation to the binomial distribution to find P(14 ≤ x ≤ 16).
Values for the binomial random variable x are 14, 15 and 16.
The interval of values under the normal curve is 13.5 ≤ x ≤ 16.5.
To ensure the boundaries of each rectangle are included in the interval, subtract 0.5 from a left-hand boundary and add 0.5 to a right-hand boundary.

Normal Approximation to the Binomial
Use the normal approximation to the binomial to find P(14 ≤ x ≤ 16).
Find the mean and standard deviation using binomial distribution formulas:
μ = np = 50(0.25) = 12.5
σ = √(npq) = √(50(0.25)(0.75)) ≈ 3.06
Adjust the endpoints to correct for continuity: P(13.5 ≤ x ≤ 16.5).
Convert each endpoint to a standard score:
z = (13.5 – 12.5)/3.06 ≈ 0.33 and z = (16.5 – 12.5)/3.06 ≈ 1.31
P(0.33 ≤ z ≤ 1.31) = 0.9049 – 0.6293 = 0.2756

Application
A survey of Internet users found that 75% favored government regulations of "junk" e-mail. If 200 Internet users are randomly selected, find the probability that fewer than 140 are in favor of government regulation.
Since np = 150 ≥ 5 and nq = 50 ≥ 5, we can use the normal approximation to the binomial distribution.
The binomial phrase "fewer than 140" means up to 139: 0, 1, 2, 3 … 139.
Use the correction for continuity to translate to the continuous variable in the interval below 139.5. Find P(x < 139.5).

Application (continued)
Use the correction for continuity: P(x < 139.5).
μ = np = 200(0.75) = 150 and σ = √(npq) = √(200(0.75)(0.25)) ≈ 6.12
z = (139.5 – 150)/6.12 ≈ –1.71
P(z < –1.71) = 0.0436
The probability that fewer than 140 are in favor of government regulation is approximately 0.0436.
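The continuity-corrected approximation for the junk e-mail example can be compared against the exact binomial sum. This is a minimal sketch, assuming Python 3.8 or later; the function names normal_approx_less_than and exact_less_than are hypothetical helpers introduced for illustration, not part of the slides.

```python
from math import comb, sqrt
from statistics import NormalDist


def normal_approx_less_than(k: int, n: int, p: float) -> float:
    """Approximate P(x < k) for a binomial count x, with continuity correction."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("need np >= 5 and nq >= 5 for the normal approximation")
    mu = n * p              # binomial mean
    sigma = sqrt(n * p * q)  # binomial standard deviation
    # "Fewer than k" means counts 0..k-1, so the right-hand boundary is k - 0.5.
    return NormalDist(mu, sigma).cdf(k - 0.5)


def exact_less_than(k: int, n: int, p: float) -> float:
    """Exact P(x < k): add the binomial probabilities for x = 0..k-1."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q ** (n - x) for x in range(k))


approx = normal_approx_less_than(140, 200, 0.75)
exact = exact_less_than(140, 200, 0.75)
print(round(approx, 4), round(exact, 4))
```

The approximate value will not match the table answer of 0.0436 to the last digit, because the table method rounds z to –1.71 first. The exact sum is the "add the probabilities" route that the slides call tedious by hand, shown here only as a check on the approximation.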