Download S03B Normal Distributions - Dixie State University :: Business

William Christensen, Ph.D.

The Standard Normal Distribution

[Figure: density curve of a standard normal distribution, centered at 0]

• The density curve (or probability density function) shown here gives us a good picture of a Standard Normal Distribution.
• Although all normal distributions have a bell shape, a “standard” normal distribution has a mean = 0 and a standard deviation = 1.
• Also, the total area under the curve is exactly 1.
• Remember: we previously learned that the sum of all probabilities in any probability distribution must also add up to exactly 1.
• Hopefully you get the picture and realize there is an important relationship between the density curve shown here (with a total area under the curve of 1) and probability distributions where the sum of all individual probabilities is also 1.

The Standard Normal Distribution

• In the Standard Normal Distribution we integrate the idea of “area-under-the-curve” with what we learned about probability.
• This concept is at the heart of statistics and it is critical that you understand it, so let’s review a little and take it step by step.
• First, know and remember that a normal distribution has a classic bell shape and is perfectly symmetrical, with the mean right in the middle.
• Remember when we talked about the concept of “unusual” and defined it as any value or observation more than 2 standard deviations from the mean?
• The following slide shows us the general relationship between “area under the curve” and probability.
Let’s break it down.

The Empirical Rule For Normal Distributions

• 99.7% of data are within 3 standard deviations of the mean
• 95% within 2 standard deviations
• 68% within 1 standard deviation

[Figure: bell curve divided at x̄ - 3s, x̄ - 2s, x̄ - s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s; from left to right the sections hold 0.1%, 2.4%, 13.5%, 34%, 34%, 13.5%, 2.4%, and 0.1% of the area]

Area-under-the-curve and Probability

• OK, the previous slide shows us some good stuff.
• Notice how you can add up all the sections and they add to 1.
• Let’s consider some other examples from the previous slide:
  - It shows us that the area-under-the-curve or PROBABILITY of an x value being between the mean and 1 standard deviation above the mean is 0.34 or 34%.
  - It also shows us that the area-under-the-curve or PROBABILITY of an x value being within plus or minus 1 standard deviation of the mean is 0.68 or 68% (34% + 34% = 68%).
  - It shows us that the area-under-the-curve or PROBABILITY of an x value being more than 2 standard deviations above or below the mean is how much? How about 2.4% + 0.1% on one side plus 2.4% + 0.1% on the other side, for a total of 0.05 or 5%.
• Etc., etc., etc. You should become very familiar with this way of thinking and of finding P(x) by looking at the area-under-the-curve.

Standard Normal Distributions and area-under-the-curve

• A Standard Normal Distribution allows us to use z-scores to do the same thing we did in the previous slide.
• Remember what a z-score is? It represents the number of standard deviations an x value is from the mean.
• Well, if µ = 0 and σ = 1 (a Standard Normal Distribution), then x values and z-scores are the same number and we get a density curve with the same sections, labeled by z-score.
• Again, we can read this graph just like the previous slide:
  - We see the area-under-the-curve or PROBABILITY of an x value between the mean and z = 1 is 0.34 or 34%.
  - We also see the area-under-the-curve or PROBABILITY of an x value of more than z = 2 above the mean is 0.025 or 2.5% (2.4% + 0.1%).
• Etc., etc., etc.
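The section areas quoted above are rounded. As a rough cross-check outside Excel, here is a minimal Python sketch that recomputes them from the standard normal curve. It assumes Python 3.8 or later (for `statistics.NormalDist`); the helper name `area_between` is made up for this illustration and is not part of the slides.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)


def area_between(lo: float, hi: float) -> float:
    """Area under the standard normal curve between z = lo and z = hi."""
    return Z.cdf(hi) - Z.cdf(lo)


within_1 = area_between(-1, 1)  # about 0.68
within_2 = area_between(-2, 2)  # about 0.95
within_3 = area_between(-3, 3)  # about 0.997
mean_to_1 = area_between(0, 1)  # about 0.34
beyond_2 = 1 - within_2         # about 0.05, both tails together

for label, value in [
    ("within 1 std. dev.", within_1),
    ("within 2 std. devs.", within_2),
    ("within 3 std. devs.", within_3),
    ("mean to z = 1", mean_to_1),
    ("more than 2 std. devs. away", beyond_2),
]:
    print(f"{label}: {value:.4f}")
```

The unrounded values differ slightly from the slide (for example, roughly 0.6827 rather than 0.68), which is why the Empirical Rule is best treated as a convenient approximation.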
• This is great, but what if we want to know the PROBABILITY of an x value falling between the mean and z = 0.5 (0.5 standard deviations above the mean)? We need some way to find all these in-between areas or probabilities.

[Figure: standard normal density curve with z-scores -3, -2, -1, 0, 1, 2, 3 on the horizontal axis; from left to right the sections hold 0.1%, 2.4%, 13.5%, 34%, 34%, 13.5%, 2.4%, and 0.1% of the area]

In a Standard Normal Distribution, x values are the same as z-scores because an x of 1 is 1 standard deviation from the mean (0). A negative z-score means you are below the mean and a positive z-score means you are above the mean.

Using z-scores to find Probability or area-under-the-curve

• The Excel function NORMSDIST allows us to find the area-under-the-curve from the extreme left side of a normal distribution out to any z-score.

[Figure: standard normal curve shaded from the extreme left out to z]

=NORMSDIST(z), where z is the z-score for any x value, provides the area-under-the-curve or PROBABILITY from the extreme left out to the z-score.

Note: Since the total area-under-the-curve = 1, and the distribution is symmetrical (mirror image on each side), it should make sense that the total area-under-the-curve below the mean = 0.5, and the total area-under-the-curve (or probability) above the mean also = 0.5.

Using z-scores to find Probability or area-under-the-curve

• Example: try using Excel to find the area-under-the-curve from the extreme left to a z-score of 1.

[Figure: standard normal curve shaded from the extreme left out to z = 1]

Excel gives us an area or probability of 0.841345. This means the probability of an x value falling anywhere from below the mean up to z = 1 (one standard deviation above the mean) is about 0.841 or 84.1%. Notice that since the entire left half of the distribution has an area = 0.5, we can calculate the area from the mean (0) to z = +1 as 0.841345 - 0.5000 = 0.341345.

Using z-scores to find Probability or area-under-the-curve

• Example: With the same information as the previous slide, can you calculate the area-under-the-curve from a z-score of 1 to the extreme right?
[Figure: standard normal curve shaded from z = 1 to the extreme right]

Since we know the total area-under-the-curve equals 1, and the area from the extreme left over to z = 1 is 0.841345, the remaining area must be the difference between 1 and 0.841345, or in other words 1 - 0.841345 = 0.158655. This means there is a probability of about 0.159 or 15.9% of finding an x value with z > +1 (more than 1 standard deviation above the mean). Cool, eh!

Using z-scores to find Probability or area-under-the-curve

• Example: This time, let’s use Excel to find the area-under-the-curve from the extreme left to a z-score of -0.5 (remember a negative z-score means we are below the mean).

[Figure: standard normal curve shaded from the extreme left out to z = -0.5]

IMPORTANT: Although z-scores can be negative, area or probability is NEVER negative. Here Excel gave us an area or probability of +0.308538. This means the probability of finding an x value with a z-score less than -0.5 (below the value which is 0.5 standard deviations below the mean) is about 0.3085 or 30.85%.

Using z-scores to find Probability or area-under-the-curve

• Example: With the same information, let’s find the area-under-the-curve from the extreme right down to the z-score of -0.5.

[Figure: standard normal curve shaded from z = -0.5 to the extreme right]

Since the total area-under-the-curve must be equal to 1, the area to the right of z = -0.5 must be the difference between 1 and 0.308538, or 1 - 0.308538 = 0.691462. This means the probability of finding an x value with a z-score greater than -0.5 (greater than a value which is ½ standard deviation below the mean) is about 0.6915 or 69.15%.

Normal Distributions

Many distributions are normal but not “standard”. That is, they are bell-shaped but do not have a mean of 0 or a standard deviation of 1. The heights of men and women, for example, are normally distributed.
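The four standard normal areas worked out in the NORMSDIST examples above can be cross-checked outside Excel. The sketch below is only an illustration: it assumes Python 3.8 or later, and `normsdist` is a hypothetical stand-in written for this example, not an Excel or Python built-in.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def normsdist(z: float) -> float:
    """Hypothetical stand-in for Excel's NORMSDIST: area from the far left to z."""
    return STANDARD_NORMAL.cdf(z)


left_of_1 = normsdist(1.0)                # 0.841345 (far left up to z = 1)
mean_to_1 = left_of_1 - 0.5               # 0.341345 (mean up to z = 1)
right_of_1 = 1 - left_of_1                # 0.158655 (z = 1 to the far right)
left_of_neg_half = normsdist(-0.5)        # 0.308538 (far left up to z = -0.5)
right_of_neg_half = 1 - left_of_neg_half  # 0.691462 (z = -0.5 to the far right)

print(f"{left_of_1:.6f} {mean_to_1:.6f} {right_of_1:.6f}")
print(f"{left_of_neg_half:.6f} {right_of_neg_half:.6f}")
```

Every area other than the cumulative one comes from adding or subtracting against 0.5 or 1, exactly as in the slides.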
[Figure: normal curves of height (inches) for women (µ = 63.6 inches, σ = 2.5 inches) and men (µ = 69.0 inches, σ = 2.8 inches), with 63.6 and 69.0 marked on the horizontal axis]

Normal Distributions

• We can convert any normal distribution to a standard normal distribution (and thus be able to find probabilities, that is, area-under-the-curve).
• We do this by changing x values to z-scores.
• For example: the mean height of men is 69.0 inches and the standard deviation of men’s heights is 2.8 inches. What is the z-score for the mean (x = 69.0)? How many standard deviations is 69.0 from the mean? Intuition alone should tell you that 69.0 is 0 standard deviations from itself (69.0).
• We can also use the formula we learned: z = (x - µ) / σ = (69 - 69) / 2.8 = 0
• We can also use the Excel function =STANDARDIZE(x,mean,std.dev.) to calculate the z-score for any x value where you know the mean and standard deviation of the distribution. In this case =STANDARDIZE(69.0,69.0,2.8) = 0

Normal Distributions

• Example: Try this. Can you calculate the z-score for a woman’s height of 6 feet (that’s 12 x 6 = 72 inches)?
  - Using the formula: z = (x - µ) / σ = (72 - 63.6) / 2.5 = 3.36
  - Using Excel: =STANDARDIZE(72,63.6,2.5) = 3.36
  - This means that 72 inches is 3.36 standard deviations above the mean height.
• So now the big question: what is the probability of a woman being 6 feet (72 inches) tall or taller? Or, another way of asking this is: what percent of all women are 6 feet tall or taller?
• We can solve this using Excel in two different ways.
• First, we could use the =NORMSDIST(z) function we just learned and simply plug in the z = 3.36 we just calculated: =NORMSDIST(3.36) = 0.99961. But remember, this is the probability or area from the extreme left up to z = 3.36. We actually want to know the area or probability above that (72 inches or greater). We know the total area must equal 1, so the difference between 1 and 0.99961 must be that remaining area.
1 - 0.99961 = 0.00039

This means the probability of a woman being 6 feet or taller is 0.00039 or 0.039% (about 4 in 10,000). Another way of saying this is that about 0.039% of all women are 6 feet or taller.

[Figure: women’s height distribution (inches), µ = 63.6 inches, σ = 2.5 inches, with 63.6 and 72 marked on the horizontal axis]

Normal Distributions

• Here is the second (and even easier) way I promised to show you. Excel has another function that combines the calculation of a z-score with finding the area or probability from the extreme left to that z-score: =NORMDIST(x,mean,standard_dev,cumulative)
• Using Excel’s NORMDIST function we can answer the question about women 6 feet or taller very easily.
• The “cumulative” option, if set to “true” or “1”, provides the cumulative probability, or the total area from the extreme left to the x value we input. This is normally how we would use this function.
• However, just so you know, if this option is set to “false” or “0” it returns the height of the density curve at x instead of a cumulative area. Because height is a continuous measurement, that number is not the probability of a woman being exactly 72 inches tall, and it is not what we want here.

[Figure: women’s height distribution (inches), µ = 63.6 inches, σ = 2.5 inches, shaded from the extreme left up to 72 inches]

With x = 72, mean = 63.6, standard deviation = 2.5, and cumulative set to “true”, NORMDIST returns 0.99961. This is the area or probability that includes everything below the mean plus everything above the mean up to z = 3.36 (3.36 standard deviations above the mean), so the area at or above 72 inches is again 1 - 0.99961 = 0.00039.

Finding an x value when given an area or probability

Often we are interested in finding the value for x (in this case men’s height) given a certain probability.
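Returning to the example of women 6 feet or taller: both routes described above (z-score first, or the distribution’s own mean and standard deviation directly) can be sketched in a few lines of Python. This is an illustration only, assuming Python 3.8 or later; the variable names are invented for the example, and `NormalDist` is used as a rough counterpart to the Excel functions rather than a replacement for them.

```python
from statistics import NormalDist

# Women's heights from the slides: mean 63.6 inches, std. deviation 2.5 inches.
women = NormalDist(mu=63.6, sigma=2.5)
x = 72.0  # 6 feet, in inches

# Route 1: convert x to a z-score (like STANDARDIZE), then use the
# standard normal curve (like NORMSDIST).
z = (x - women.mean) / women.stdev
area_via_z = NormalDist().cdf(z)

# Route 2: work with the original distribution directly
# (like NORMDIST with cumulative set to TRUE).
area_direct = women.cdf(x)

# Either way, the probability of 72 inches or taller is what is left over.
upper_tail = 1 - area_direct

# Like NORMDIST with cumulative set to FALSE: the height of the density
# curve at x, which is not a probability.
density_at_x = women.pdf(x)

print(f"z = {z:.2f}")
print(f"area to the left = {area_direct:.5f}")
print(f"P(72 inches or taller) = {upper_tail:.5f}")
```

Both routes give the same cumulative area, which is the point of the “second way”: the conversion to a z-score is simply folded into the function call.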
Using the information shown below, we might want to know what height separates the tallest 10% of men.

[Figure: men’s height distribution (inches), µ = 69.0 inches, σ = 2.8 inches; 90% of men lie below the unknown height marked “?” and the tallest 10% lie above it]

Finding a z-score or x value when given an area or probability

• Remember that probability is measured by the area-under-the-curve.
• To find the z-score associated with a given probability or area-under-the-curve we can use the Excel function NORMSINV.
• To find an x value (within a distribution in which we know the mean and standard deviation) we can use the Excel function NORMINV.

NORMSINV

• =NORMSINV(probability) gives us the z-score (number of standard deviations from the mean) for whatever cumulative probability we enter into the function.
• The probability we input always represents the cumulative area starting from the extreme left and going out until the area-under-the-curve equals the probability we input.

[Figure: standard normal curve; the probability is the cumulative area shaded from the extreme left]

NORMSINV

• For example: to find the z-score (number of standard deviations from the mean) for a cumulative probability of 0.85 or 85%, use =NORMSINV(0.85).
• In other words, the z-score associated with a cumulative probability of 0.85 or 85% is 1.036433.

[Figure: standard normal curve with 85% of the area shaded to the left of the z-score]

NORMINV

• =NORMINV(probability,mean,standard_dev) gives us the x value associated with a probability we input (given we also have the mean and standard deviation of the distribution which x is part of).
• Again, the probability we input always represents the cumulative area starting from the extreme left and going out until the area-under-the-curve equals the probability we input.
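The two inverse lookups described above can also be tried outside Excel. The following is a minimal sketch, assuming Python 3.8 or later, in which `NormalDist.inv_cdf` plays roughly the role of NORMSINV and NORMINV; the variable names are invented for this illustration.

```python
from statistics import NormalDist

# Like NORMSINV(0.85): the z-score with 85% of the area to its left.
z_85 = NormalDist().inv_cdf(0.85)

# Men's heights from the slides: mean 69.0 inches, std. deviation 2.8 inches.
men = NormalDist(mu=69.0, sigma=2.8)

# Like NORMINV(0.90, 69.0, 2.8): the height with 90% of men below it,
# which is the same height that sets apart the tallest 10%.
cutoff = men.inv_cdf(0.90)

# Round trip: the cumulative area up to the cutoff should be 0.90 again.
area_check = men.cdf(cutoff)

print(f"z for cumulative probability 0.85: {z_85:.6f}")
print(f"height cutoff for the tallest 10% of men: {cutoff:.3f} inches")
print(f"area to the left of the cutoff: {area_check:.2f}")
```

The input is 0.90 rather than 0.10 because the probability always means the area accumulated from the extreme left.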
NORMINV

• For example: Remember we wanted to find the men’s height that sets apart the tallest 10% of men.
• For men’s heights we were given a mean of 69.0 inches and a standard deviation of 2.8 inches.
• We now have all the ingredients and can input them into the Excel function =NORMINV(0.90,69.0,2.8). We input 0.90 rather than 0.10 because the probability is the cumulative area from the extreme left, and 90% of men fall below the height we are looking for.
• In other words, the height that separates the tallest 10% of men is 72.588 inches. We could also say that only 10% of all men are over 72.588 inches tall.

[Figure: men’s height distribution (inches), µ = 69.0 inches, σ = 2.8 inches; 90% of men lie below 72.588 inches and the tallest 10% lie above it]

Exercises

Given that the population of women has a mean weight of 143 lbs., with a standard deviation of 29 lbs., use what you just learned to answer the following questions:

1. What percent of women would you expect to weigh less than 130 lbs?
2. If you randomly select one woman, what is the probability that she will weigh more than 150 lbs?
3. What percent of women weigh more than 110 lbs?
4. What percent of women weigh between 130 and 160 lbs?

For answers, either email me a single Excel file containing all your work, or come to one of the scheduled open lab sessions and bring your work.

[Figure: women’s weight distribution (lbs.), mean = 143, standard deviation = 29]

Caution!!!

1. Don’t confuse z scores and probabilities.

• z scores represent the number of standard deviations an x value is from the mean. z scores are negative if their corresponding x value falls below the mean, and z scores are positive if their corresponding x value falls above the mean.
  - For example, for a distribution with a mean of 100, the z score for any x value less than 100 would be negative, and the z score for any x value greater than 100 would be positive.
  - Or, for a distribution with a mean of -50 (negative fifty), the z-score for -60 (below the mean) would be negative, and the z-score for -40 (above the mean) would be positive.
• Probabilities are represented by the area-under-the-curve and must always, always, always be between 0 and 1.
  - If you ever think you have a probability that is less than 0 or greater than 1, then you have made a serious error.
  - The probabilities that Excel gives you are cumulative, meaning that they represent the area from the extreme left (beginning) of the distribution out to some z score or x value.
  - The area or probability you are interested in might not be the exact probability that Excel gives you, but you can always find the area you are interested in by remembering that the total area-under-the-curve is 1 and adding or subtracting areas appropriately. Some of the exercises I gave you will give you some practice doing this.

The Central Limit Theorem

Everything we just learned about probability and normal distributions only applies, of course, when our population is normally distributed. And, in fact, many things in nature, science, and behavior are normally distributed. However, there are also a number of things of interest that may not be normally distributed. The “Central Limit Theorem” provides us with a neat little trick that can turn ANY distribution into an (approximately) normal distribution and therefore allow us to use what we learned about area and probabilities.

Here is what we need to do, according to the Central Limit Theorem, to change a non-normal distribution into a normal distribution:

1. Rather than looking at each individual x, we take the mean (average) of randomly selected groups of x’s (usually groups of 3, 4, or 5 individual x’s), and we need a reasonable number of these groups (30 or more, but the more the better).
2. It is the means of these groups that form our new “normally distributed” population.
   • The mean of our new “normal distribution” (composed of group means) is simply the mean of the group means, often called the mean of means.
   • The standard deviation of our new distribution is calculated as the standard deviation of the individual x’s divided by the square root of n (where n is the number of individual x’s in each group).

Central Limit Theorem

The mean of the sample means: µ_x̄ = µ
The standard deviation of sample means (often called the standard error of the mean): σ_x̄ = σ / sqrt(n)
where n is the number in each group.

Central Limit Theorem

Here is an example of how we can use the Central Limit Theorem.

• A study was done in which 50 social security numbers were randomly selected.
• The last 4 digits of these numbers were put into the following histogram (4 x 50 = 200 numbers total).
• Does this look like a normal distribution (bell-shaped)?
• I hope you can see that it DOES NOT. In fact, it looks a lot like what is called a uniform distribution (all the numbers have about the same number of occurrences).

So here we go: we have a non-normal distribution that we would like to turn into a normal distribution by using the Central Limit Theorem.

[Figure: histogram of the 200 individual digits; horizontal axis shows the digits 0 to 9, vertical axis shows Frequency (0 to 20)]

Central Limit Theorem

As already mentioned, to use the Central Limit Theorem we first put the data into groups.

• In this case, the data are naturally in groups (groups of 4 digits).
• We next take the mean or average of each group of 4 digits as shown in the table (the mean of these means becomes the mean of our new transformed distribution).
• The standard deviation of our new transformed distribution is the standard deviation of the original data divided by the square root of the number in each group (for this example n = 4).

[Table, flattened in extraction; only part of the data survives]
last 4 digits of SS#: 1 8 6 4 5 3 3 6 9 8 8 8 5 1 2 5 9 3 3 5 4 2 6 2 7 7 1 6 9 1 5 4 5 3 3 9 6 2 2 5 0 2 7 8 5 7 3 4 4 4 5 1 3 6 7 3 7 3 3 8 3 7 6 1 9 5 7 8 6 4 0 7
mean: 4.75 4.25 8.25 3.25 5.00 3.50 5.25 4.75 5.00 5.25 4.25 4.50 4.75 3.75 5.25 3.75 4.50 6.00

Central Limit Theorem

• The histogram of our new transformed distribution (the distribution of means of the last 4 digits of 50 SS#’s) looks very much like a normal distribution should (i.e., bell-shaped).
• Here we had 50 groups, and the more sample groups we have, the closer the distribution of means will be to a pure normal distribution.
• There you go: that’s how the Central Limit Theorem transforms a non-normal distribution of individual samples into a normal distribution of sample means.

[Figure: histogram of the 50 group means; horizontal axis runs from 0 to 9, vertical axis shows Frequency (0 to 15)]

Determining Normality (How to know if a distribution is normally distributed)

Although the Central Limit Theorem is usually used to transform non-normally distributed data into a normal distribution (by taking the distribution of the means of randomly selected groups), it can also be used on data that is already normally distributed in order to examine probabilities associated with groups of x’s rather than individual x’s. For example, if we wanted to know the probability that the average weight of a group of 5 randomly selected women is greater than 160 lbs., we could use the Central Limit Theorem to do that. See if you can come up with the correct answer to that question (hint: the key is to remember that the standard deviation of our new distribution is σ / sqrt(n) where n is the group size, n = 5 in this case).
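The grouping procedure from the social security digits example can be imitated with a small simulation. The sketch below is a rough illustration and not the original study: it assumes Python 3.8 or later, draws made-up random digits (each digit 0 to 9 equally likely) in place of real social security numbers, and uses 10,000 groups instead of 50 so the pattern is easier to see. All names are invented for the example.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

random.seed(0)  # fixed seed so repeated runs give the same numbers

GROUP_SIZE = 4       # n: digits per group, as in the "last 4 digits" example
NUM_GROUPS = 10_000  # assumption: far more groups than the 50 in the slides

digits = list(range(10))  # uniform population: each digit equally likely
mu = mean(digits)         # population mean
sigma = pstdev(digits)    # population standard deviation

# Step 1: form random groups and keep only each group's mean.
group_means = [
    mean(random.choices(digits, k=GROUP_SIZE)) for _ in range(NUM_GROUPS)
]

# Step 2: compare the distribution of group means with the formulas
# (mean of means = mu, standard error = sigma / sqrt(n)).
mean_of_means = mean(group_means)
observed_spread = stdev(group_means)
predicted_spread = sigma / sqrt(GROUP_SIZE)

print(f"population mean {mu:.2f}, mean of means {mean_of_means:.2f}")
print(f"predicted spread {predicted_spread:.3f}, observed {observed_spread:.3f}")

# Crude text histogram of the group means, one row per whole number.
for bin_start in range(10):
    count = sum(1 for m in group_means if bin_start <= m < bin_start + 1)
    print(f"{bin_start}: {'#' * (count // 100)}")
```

The individual digits are flat (uniform), but the rows of the printed histogram bulge in the middle, and the observed spread of the group means lands close to σ / sqrt(n).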
The next slide discusses how to test whether or not data are normally distributed.

Procedure for Determining Whether Data Have a Normal Distribution

There are various sophisticated methods for determining normality, and if you progress in statistics you will learn these methods. However, for this elementary course, I only expect you to be able to test for “normality” (whether or not a sample comes from a normally distributed population) by looking at the following:

1. Histogram: Construct a histogram. Reject normality if the histogram departs dramatically from a bell shape.
2. Outliers: Identify outliers. Reject normality if there is more than one outlier present. An outlier is an extremely small or extremely large value that appears inconsistent with the rest of the data.
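The two checks above are visual judgments, but a rough sketch can help with the bookkeeping. The Python below (3.8 or later) prints a simple text histogram and counts outliers. Note the assumption: the slides do not give a numeric definition of an outlier, so this sketch borrows the common 1.5 x IQR fence rule purely for illustration, and every function name here is made up.

```python
from collections import Counter
from statistics import quantiles


def text_histogram(data, bin_width=1.0):
    """Return a crude text histogram, one line per bin, for eyeballing shape."""
    counts = Counter(int(x // bin_width) for x in data)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        lines.append(f"{b * bin_width:>6.1f} | {'#' * counts[b]}")
    return "\n".join(lines)


def count_outliers(data):
    """Count values outside the 1.5 x IQR fences (an assumed rule, see above)."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for x in data if x < low or x > high)


def fails_outlier_check(data):
    """Apply the slide's rule: reject normality if more than one outlier."""
    return count_outliers(data) > 1


sample = list(range(50, 61)) + [5, 120]  # made-up data with two extreme values
print(text_histogram(sample, bin_width=10))
print("outliers:", count_outliers(sample))
print("reject normality on outliers:", fails_outlier_check(sample))
```

The histogram still has to be judged by eye for a bell shape; only the outlier count is automated here, and a different outlier rule would give different counts.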
