
1/21/2016 Normal Curve

Chapter 9: Normal Curve

Objectives for This Chapter

Understand the basic characteristics of the standard Normal curve.
Apply the Z table.
Assess percentile rank of a score.
Assess percent frequency (number of scores) above/below/between points on the curve.
Assess percent of curve above/below/between score(s).
Assess probability of attaining certain scores and finding deviant scores.
Assess area of curve above/below/between score(s).

I march across the desert and all I have to show for it is this stupid rock?

Discovery of The Rosetta Stone

The young French soldier was hot, thirsty and tired. He had been involved in Napoleon's Egyptian expedition for months. He'd marched through the desert with no water, chasing after strange pools of liquid that miraculously appeared on the horizon, and then cruelly disappeared as he approached them. He'd also choked on storms of dust that raced across the landscape. Now he and his companions had been assigned to tear down an ancient wall so they could build an extension to Fort Julien. Backbreaking work. As he pried another stone out of the wall, its color struck him as being odd. It was dark, almost black actually. On the one side that was flat there appeared to be writing of some sort. Not French, to be sure, but some strange script. No, not just one type either, but three different types. Very odd. He decided to call the officer in charge over to take a look at it.

What had been discovered was a large stone on which the same message had been written in three different scripts: the upper text is in Ancient Egyptian hieroglyphs, the middle portion in Demotic and the lowest in Ancient Greek. This stone is known as the Rosetta Stone.
Because it presents essentially the same text in all three scripts (with some minor differences between them), it provided the key to the modern understanding of Egyptian hieroglyphics (http://en.wikipedia.org/wiki/Egyptian_hieroglyphs).

http://www.derekborman.com/230_web_book/module3/normal/index.html

We aren't troops in a foreign country, but we are on a journey across what may sometimes feel like a hot and unforgiving desert. Additionally, I've heard more than one student mutter under her breath that learning statistics seems an awful lot like deciphering hieroglyphics. Fortunately, if you brave the burning sands, the scorching winds, the poisonous scorpions, thirst, exhaustion, feelings of hopelessness and the occasional fainting spell...where was I going with this...Oh, yes...you will be provided with the key to the modern understanding of statistics. Not only will you have the key to understanding modern statistics, but you will have new insight into everything from politics to medicine to sports to business. Though it may not feel like it, we will be talking about nothing less than the heart of scientifically informed analysis and decision making.

Like the Rosetta Stone, the Normal distribution or Normal curve allows us to make several translations: from scores to percentiles and probabilities, from probabilities and percentiles to scores, and from scores in one set of units to scores in other units. This is the focus of this chapter.

The Normal curve is also called the bell curve because of its shape. The creation of this curve is often attributed to the brilliant German mathematician, Karl Friedrich Gauss (1777-1855). Thus, it is sometimes called the Gaussian distribution. Actually, the mathematical equation that generates the Normal curve was introduced by Abraham de Moivre (1667-1754). He was a sharp guy. He even spent time in the company of Sir Isaac Newton.
Why do we use the Normal curve?

The Total Area Always Equals 100%

[Figure: Normal curves that differ in how spread out they are]

Normal distributions are a family of distributions that have the same general shape. They are symmetric, with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped. Here are some examples. Notice that they differ in how spread out they are. But even though the shapes are different, the area under each curve is the same. The total area under any Normal curve is 100%. This never changes and is critical to our understanding of how to apply the Normal distribution. Because we know that the area under the curve is always 1 or 100%, we can understand a lot about individual scores and groups of scores to which the Normal distribution is applied. We'll need a Normal curve table for this. More on that one, shortly.

The Normal curve has other characteristics that are always true. Once again, the fact that we can always count on these characteristics provides a good model for understanding numeric trends in data. The following are other important characteristics of the Normal curve:

1. All Normal curves are symmetric around the mean of the distribution. In other words, the left half of the Normal curve is a mirror image of the right half.
2. All Normal curves are unimodal. Because Normal curves are symmetric, the most frequently observed score in a Normal distribution (the mode) is the same as the mean.
3. Since Normal curves are unimodal and symmetric, the mean, median, and mode of all Normal distributions are equal.
4. All Normal curves are asymptotic to the horizontal axis of the distribution. The curve descends rapidly as one moves along the horizontal axis from the center of the distribution toward the extreme ends, but it never actually touches the axis.
This is because scores on the Normal curve are continuous and held to describe an infinity of observations.
5. All Normal curves have the same proportions of scores under the curve relative to particular locations on the horizontal axis when scores are expressed as areas, percentiles, probabilities, etc.

Hey, It's As Natural as Big Feet!

As we discussed in a previous chapter, research in any field must deal with variability. We know that too much variability probably means that we have more error in our methods and data, whereas less variability is one indication that our methods and data contain less error. So, less variability is good, but there will always be some. Why? Not everyone responds the same way to the same medication; different people have different memory abilities; some people are taller and some people are shorter. Turns out that variability is natural, as is the Normal distribution. In other words, organisms inherit physical and derivative psychological traits...well..."Normally."

We take this Normality as a common pattern or "process" of nature and our "observations" of it. And even though we may not be able to identify all of the factors (we never will, by the way) that make up the thing we like to call "intelligence," when we measure this thing in large numbers and with proper research methods, we get that nice Normal curve. Go figure!

Consider the phenomenon of Bigfoot or Sasquatch. I know...I know. A fairytale. Right? Like there could really be populations of half-human/half-ape creatures that exist in various remote locations and are only detectable through their forensic remains. Before we dismiss it too quickly, let's try the hard thing. Let's try to argue FOR the existence of Sasquatch based on Normality. How could we do this?
As you may know, footprints are the standard stock in trade of Sasquatch research, and their sometimes inhuman length assures almost immediate measurement, even by first-time witnesses. The process here consists of foot lengths and the observations are the measurements of footprints. Foot lengths are going to be affected by a lot of factors:

Gender of the creature.
Family genetics.
Nutrition.
Surface from which the foot lengths were measured: snow, mud, grass, etc.
Length of time between the creation of the footprint and its measurement.
Amount of alcohol consumed by everyone involved.

It's complicated!

[Figure: bigfoot.jpg, frequency (y axis) plotted against foot length (x axis)]

Nonetheless, as can be seen here, a sample of 410 independently collected footprints (ostensibly left by a Bigfoot) forms a fairly Normal curve (with frequency plotted on the y axis and foot length plotted on the x axis). The Normal distribution overall argues compellingly for the existence of Sasquatch as a genuine species, in that production of fictitious data over 40 years by hundreds of people independently of each other would likely have generated a distribution with many peaks. A further factor that supports the authenticity of the data is the fact that foot length, foot width, heel width, and gait are interrelated in a logical and cohesive fashion, a congruence not plausible by pure chance. Hmmmm....very interesting. Are you a true believer, yet? If you want to learn a little more about forensic research on the big fella, you can read this research paper (http://www.bfro.net/ref/theories/whf/fahrenbacharticle.htm).

[Figure: SAT_chart.png, distribution of SAT scores]

Why don't we frame this in less cryptozoological terms? Let's look at the SAT. The process here consists of the students taking the test, and the observations are the students' scores.
Now, my score, for example, is going to be due to a whole set of different factors: my IQ, what I had for breakfast, how much I studied the night before, how good my teachers are, which butterflies were flapping their wings in Beijing this morning, and so on. In short, my score is the result of a whole set of hard-to-predict factors. The same goes for my fellow students. And yet, even though all these factors are hard to predict, if you take the scores of a large number of students from a single population, the scores will be approximately Normally distributed, as you see here. Once again, when we see such a Normal curve in our data, we're inclined to think that we're on the right track.

Have A Go At Sir Francis Galton's Quincunx

Of course, this same kind of Normal distribution shows up all over the place, just about anywhere we observe a large number of samples from a process that is the sum of many independent probabilistic factors. In fact, Francis Galton invented a machine, the quincunx, designed to illustrate how multiple probabilistic factors can add up and result in a Normal distribution.

Below is an interactive quincunx. Experiment with it and see if it gives you greater insight into sampling, probability and how Normal curves are created. Make sure to select the "Auto Drop" option so that you don't have to keep clicking to drop individual balls.

So, what's really going on here? The little gray balls drop down through the fixed black pegs to the bins below. If you drop more and more little gray balls, their pattern in the bins increasingly resembles a Normal distribution. Another way to think about it is that there are more pathways to the middle of the distribution and fewer pathways to the tails of the distribution.
Therefore, the probability of balls landing in one of the middle bins is higher. We might call it the "path of least resistance," which might also be referred to as an average of sorts. If you want to know more about the math and probability underlying the functioning of the quincunx, you can look at this information on Pascal's triangle (http://en.wikipedia.org/wiki/Pascal%27s_triangle).

If we performed a real-world simulation with actual three-dimensional balls, bins, etc., there would be numerous factors (beyond simple chance) contributing to the final location of each ball dropped. Factors such as air currents, irregularities in ball shape, and other hard-to-analyze factors would make it very difficult to predict the bin in which any individual gray ball would land. However, as a whole, the pattern becomes very predictable as the number of "samples" increases. The individual peculiarities of each gray ball's trajectory are indeed wholly due to the combined influence of many independent "accidents." This, as Galton predicted, eventually yields a Normal distribution.

The 68-95-99 Rule

The standard normal curve is a special example of the normal distribution: the one with a mean of 0 and a standard deviation of 1. The height of a Normal distribution can be specified mathematically in terms of two population parameters: the mean (μ) and the standard deviation (σ). Instead of calculating our curve parameters in painstaking, mathematical longhand, we will simply use sample statistics (s and x-bar) to estimate the properties or distribution shape of our actual population. In other words, we can do some shortcutting.

Every time you look at a group of scores (a sample of data), you want to be thinking about those scores as comprising a shape. Even though you will see data listed in groups and columns, underneath every data set is a shape. Whenever we perform statistical analyses, we're hoping that this shape comes as close as possible to bell shaped or Normal.
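One way to watch a shape emerge from raw numbers is to simulate the quincunx described earlier. The sketch below is not Galton's machine or the interactive version from this page; it simply assumes that every peg sends a ball left or right with equal probability, and the names `drop_balls` and `show` are made up for illustration.

```python
import random
from collections import Counter


def drop_balls(n_balls, n_rows, seed=0):
    """Simulate a quincunx: each ball bounces left (0) or right (1) at every row.

    The bin a ball lands in is simply its number of rightward bounces,
    so bins are numbered 0..n_rows.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same picture
    bins = Counter()
    for _ in range(n_balls):
        rights = sum(rng.randint(0, 1) for _ in range(n_rows))
        bins[rights] += 1
    return bins


def show(bins, n_rows, width=50):
    """Print a sideways text histogram, one line per bin."""
    tallest = max(bins.values())
    for b in range(n_rows + 1):
        bar = "#" * round(width * bins[b] / tallest)
        print(f"bin {b:2d} | {bar} {bins[b]}")


if __name__ == "__main__":
    show(drop_balls(n_balls=2000, n_rows=12), n_rows=12)
```

With only a handful of balls the bars look ragged; as `n_balls` grows, the middle bins fill fastest because more left/right paths lead to them, and the outline drifts toward a bell.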
As we move along with our discussion, this idea of "shape" will become more concrete.

[Figure: approximate 68-95-99 breakdown for a bell-shaped, standard normal curve]

The distances along the horizontal axis of our curve, when divided into standard deviations, will always include the same proportion of the total area:

Between -1 and +1 standard deviation units lies about 68% of the area.
Between -2 and +2 standard deviation units lies about 95% of the area.
Between -3 and +3 standard deviation units lies about 99% of the area.

This is true of any normal curve, whether it is perfectly bell-shaped, a little narrower or a little wider. This graphic depicts the approximate 68-95-99 breakdown for a bell-shaped, standard normal curve.

This conception of the normal curve starts to become powerful when we "map" it onto normally distributed variables. One example of a variable that forms a normal curve is I.Q. In this case, we can tell what percentage of people are in any area of the curve. A normal distribution of 1000 cases will have about 683 people (about 68%) within +/-1 standard deviation of the mean, about 954 people (about 95%) within +/-2 standard deviations, and about 997 people (a little over 99%) within +/-3 standard deviations. Only about 3 people will be more than 3 standard deviations from the mean, if the sample size is 1000. In other words, in a perfectly normal distribution based on such data, we would expect only about three people to have I.Q. scores above or below the I.Q. scores associated with z scores of +3 and -3.

Review of Z Scores

We discussed standard scores, or z scores, in a previous chapter. A standard score or z score is the deviation of a raw score from the mean in standard deviation units. Each standard deviation unit represents a specific distance, expressed in the units of the sample scores.
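Counting in standard deviation units is exactly what the 68-95-99 Rule above does, so the rule can be checked numerically. This is a rough sketch that assumes a perfectly normal curve and uses `statistics.NormalDist` from the Python standard library (3.8 or later) in place of a printed table; `area_within` is a name invented here.

```python
from statistics import NormalDist

# The standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Proportion of a normal curve lying between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    share = area_within(k)
    print(f"within +/-{k} SD: {share:.2%} of the area, "
          f"about {round(1000 * share)} of 1000 cases")
```

The counts per 1000 cases come out at 683, 954 and 997, matching the I.Q. illustration; the last band is closer to 99.7% than to 99%, which is why only about 3 cases in 1000 fall outside it.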
When we have normally distributed data, our deviation units will go out about three up and three down before we almost run out of curve. We have two formulas: one that allows us to calculate a z score from a raw score, z = (X - x-bar) / s, and one that allows us to calculate a raw score from a z score, X = x-bar + z(s). Z scores can also be positive or negative. The sign of the z score tells the direction of the score relative to the mean: negative z scores represent raw scores below the mean, and positive z scores indicate scores above the mean.

Let's consider another "big foot" example to refresh our memories about the use of these formulas. Suppose we find for a sample of women that the average shoe size is 8.25, with a standard deviation of 1.17. What will be the z score, or standard score, corresponding to a shoe size of 10.5? Using the first formula, we would first subtract 8.25 from 10.5, giving us a difference of 2.25. Then, we would divide this difference by 1.17. The answer would be 1.92. Based on this sample, a woman who wears a size 10.5 shoe will be 1.92 standard deviation units above the mean. And because you know a little bit about the shape of normal curves and how z scores relate to it, you could also conclude that very few women from this distribution would have larger feet, but many would have smaller feet.

Continuing with the same example, what size shoe will be worn by a woman who is 2.25 standard deviation units below the mean? In this case, we have been given the z score, z = -2.25, and we are asked to convert it to a raw score. Using the second formula, we multiply -2.25 by 1.17. This gives us -2.63, which we then add to the average (8.25) of the distribution. In the end we figure out that for a z score of -2.25, the corresponding shoe size in this distribution is 5.62. This woman will probably have to try on a size 5 ½ and a size 6 to get a pair of shoes that fits.

What is a Z Table And How Do I Use It?
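Because the Z Table is indexed by z scores, every lookup starts or ends with one of the two conversions just reviewed. Here they are as a minimal sketch using the shoe-size figures from the review; the function names `to_z` and `to_raw` are illustrative only.

```python
def to_z(raw, mean, sd):
    """Raw score -> z score: distance from the mean in standard deviation units."""
    return (raw - mean) / sd


def to_raw(z, mean, sd):
    """z score -> raw score: walk z standard deviations away from the mean."""
    return mean + z * sd


# Shoe-size sample from the review: mean 8.25, standard deviation 1.17.
mean_size, sd_size = 8.25, 1.17

print(f"size 10.5 -> z = {to_z(10.5, mean_size, sd_size):.4f}")
print(f"z = -2.25 -> size {to_raw(-2.25, mean_size, sd_size):.4f}")
```

The two printed values round to the 1.92 and 5.62 worked out by hand. Note that the sign of z does the work in `to_raw`: a negative z is added to the mean, which moves the score down.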
Now that we know about the 68-95-99 Rule and we know how to convert raw scores to z scores and vice versa, what do we do with all of this amazing knowledge? We pull out our own "Rosetta Stone," light some candles, call up the ghost of Napoleon and make him perform all of our statistical calculations. Hey, you gotta blame somebody for all the pain you're enduring in this class. Why not throw it all on the dead guy who started this whole thing?

This normal curve table really is our Rosetta Stone. It can provide information about the population from which our sample was drawn. Moreover, much of inferential statistics rests on the assumption that whatever variables we study have underlying distributions that come close to Normal. In other words, it sort of all comes back to this table. We will use this table to answer some simple but important questions about data. This table opens the door to many applications.

You have a full copy of this table in your Instructional Manual.

In this table, the z score is in the far left column. Each column to the right gives the hundredths decimal place of the z score. Values in this chart show the area/percent/probability BELOW a certain point (the z score) in the normal curve. For example, the percent area below a z score of -1.22 is 11.12%. Conversely, we know that the area/percent/probability ABOVE a z score is 100 minus the value listed in the table. In other words, Below area + Above area = 100% or 1.0. A negative z score is to the left of the mean. A positive z score is to the right of the mean. You must keep these ideas in mind if you are to use the table correctly.

Let's look at a specific z score to see what we can extract from the table. How about a z score of +.52?
First, to find this z score, we would scroll down to the second part of the table, the part that depicts positive scores. In the Z-Score column, I would go down to the row with 0.5. Next, I would follow the headings of the columns over to 0.02. Finally, I would identify the intersection of the selected row and column at 69.85. In other words, about 70% of the distribution is below a z score of +.52 and about 30% of the distribution is above.

So, what about a z score of -.52? Easy. We would just flip our previous conclusions: about 30% of the curve is below and about 70% is above. Don't believe me? Look at the part of the Z Table depicting negative scores, that is, z scores below the mean. Your table indicates that 30.15% of the distribution is below a z score of -.52. Of course, this means that about 70% is above.

What if we wanted to find out how much of the distribution is between a z score of .52 and the mean (midpoint) of the distribution? To answer this question, you have to keep in mind two things: FIRST, the area above a z score of 0 represents 50% of the distribution, as does the area below. Together, the two areas comprise 100% of the distribution. SECOND, the areas depicted in our Z Table are for areas below a given z score. So, here are the rules for determining the amount of area between a given z score and the middle of the distribution:

1. IF THE Z SCORE IS POSITIVE, subtract .50 (50%) from the value associated with that given z score.
2. IF THE Z SCORE IS NEGATIVE, subtract the value associated with that given z score from .50 (50%).

For a z score of +.52, that is 69.85% - 50% = 19.85%; for a z score of -.52, it is 50% - 30.15% = 19.85%. Either way, about 20% of the distribution lies between the score and the mean.

Play Time!!

This is some pretty nasty stuff to get your head around without getting your hands dirty. Below is a z-score calculator. Explore the different options. Spend some time with it. Mentally note how changing one parameter leads to changes in another. Use the Z Table to predict what will happen if you enter a certain z score in the calculator. What will happen to the shading? To the area above the z score? Below it?
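If the interactive calculator is not to hand, the table lookups from this section can be approximated in a few lines. This is only a sketch: it computes areas with `statistics.NormalDist` from the Python standard library rather than reading the printed Z Table, so values may differ from the table in the last decimal place, and the three function names are made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def area_below(z):
    """What the Z Table lists: proportion of the curve below z."""
    return Z.cdf(z)


def area_above(z):
    """Below area + above area = 1, so the rest of the curve is above z."""
    return 1 - Z.cdf(z)


def area_to_mean(z):
    """Proportion between z and the mean (z = 0), for either sign of z."""
    return abs(Z.cdf(z) - 0.5)


for z in (0.52, -0.52, -1.22):
    print(f"z = {z:+.2f}: below {area_below(z):.2%}, "
          f"above {area_above(z):.2%}, to mean {area_to_mean(z):.2%}")
```

Taking the absolute value in `area_to_mean` folds the two sign rules into one step: subtracting .50 from the table value for a positive z and subtracting the table value from .50 for a negative z give the same distance from one half.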
Notice how percentile, area and probability almost seem to be "saying" the same thing. Hmmm......

What is a Percentile Rank and How Do I Calculate It?

Percentiles and percentile ranks are frequently used as indicators of performance in many fields, from psychology to medicine to education to business. Percentiles and percentile ranks provide information about how a person or thing relates to a larger group. Relative measures of this type are often extremely valuable to researchers employing statistical techniques.

Characteristics of Data Distributions

Remember that Z Table to which you were introduced, a couple of sections ago? That table is your Rosetta Stone for understanding the language or characteristics of standard normal distributions. Whereas with the Rosetta Stone, scholars have been able to interpret ancient Egyptian, Demotic and Greek, the Z Table will help us to understand distribution characteristics such as percentile rank, area, probability, percent, and percent frequency.

One reason the normal distribution is important is that many psychological and educational variables are distributed approximately normally. We already discussed this.

A second reason the normal distribution is so important is that it is easy for mathematical statisticians to work with. This means that many kinds of statistical tests can be derived for normal distributions. Almost all statistical tests discussed in this book assume normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normal. Some tests work well even with very wide deviations from normality.

Finally, if the mean and standard deviation of a normal distribution are known, it is easy to convert back and forth from raw scores to percentiles to areas to frequencies.
Stay with it, and you will come to understand why these distribution characteristics are so important.

Finding And Understanding Percentile Ranks of Individual Scores

A percentile rank is the percentage of cases up to and including the one in which we are interested. Turns out that is exactly the information our Z Table gives us. So, calculating percentile ranks is a fairly straightforward procedure. One important thing to remember is that a percentile rank tells us about a single score.

To be more concrete, assume a test in Introductory Psychology is normally distributed with a mean of 80 and a standard deviation of 5.

Have you drawn it? You already know how. Even though you have the image right here, go ahead and draw the curve on your own. DRAWING IS A BIG DEAL in this chapter. Trust me. It helps with understanding and will only take a few seconds for each problem.

What is the percentile rank of a person who received a score of 70 on the test? Before we calculate this one and check our Z Table, can you make a guess about the answer? Bet you can. If you really understand what percentile rank means and you understand the 68-95-99 Rule, you could make a pretty edumacated guess about the answer. In this chapter, you should start developing your "statistics 6th sense," and this involves GUESSING ANSWERS BEFORE CALCULATING THEM. This is another powerful pathway to understanding statistics. So do it!

Mathematical statisticians have developed ways of determining the proportion of a distribution that is below a given number of standard deviations from the mean. They have shown that only 2.3% of the population will be less than or equal to a score two standard deviations below the mean. In terms of the Introductory Psychology test example, this means that a person scoring 70 would be in the 2.3rd percentile.

So, how did we get to this conclusion? There are a few steps we have to go through. First, we need to convert that score of 70 to a z score.
Any time we're trying to find area, percentile rank, probability, etc. in a standard normal distribution, we must have a z score. We use our z-score formula to get this. Our calculations would go as follows: z = (70 - 80) / 5 = -2.0.

Now that we have a z score, we can go to our Z Table. We will go and look at our negative z scores. We will be looking for what else...a z value of -2.0. When you find that z score, you can see that it is associated with an area of .0228. We could multiply this number by 100 to get 2.28%. This is the percent of scores at or below our score of 70.

To keep us on track in these sections, I propose that we use "outlines." At the beginning of a problem, you should not only think about drawing and guessing, but you should also think about creating an OUTLINE for the problem. The outline will comprise the steps you need to take to complete the problem. An outline always begins with the given information and ends with the missing information. The outline for the problem we just completed is:

X > z > Percentile Rank

So, what does this percentile rank tell us? Remember that all of our interpretations based on our Z Table are "relative." That is, what we conclude is relative to the one or more distributions that we're working with. In this case, relative to other test takers in the group, a person who earns a 70 on the test did not perform very well. Think about that for a moment. What we are NOT saying is that the test was hard or easy. That would be a different claim altogether, and would probably involve comparisons of averages among different groups. We're not doing that in this chapter.

Now, what if we had asked ourselves a different question? What if we had a percentile rank in mind and wanted to know what type of raw score that translated to?
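Before taking up that reverse question, the X > z > Percentile Rank outline just completed can be written out step by step. This is a sketch under the same assumption as the text (a normally distributed test with mean 80 and standard deviation 5); it uses `statistics.NormalDist` from the Python standard library instead of the printed table, and `percentile_rank` is an invented name.

```python
from statistics import NormalDist


def percentile_rank(raw, mean, sd):
    """Follow the outline X > z > Percentile Rank.

    Returns the z score and the percentage of scores at or below `raw`.
    """
    z = (raw - mean) / sd             # X > z
    rank = 100 * NormalDist().cdf(z)  # z > area below, as a percentage
    return z, rank


z, rank = percentile_rank(70, mean=80, sd=5)
print(f"z = {z:+.2f}, percentile rank = {rank:.2f}")
```

For the score of 70 this gives z = -2.00 and a percentile rank of about 2.28, in line with the table value of .0228.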
Oftentimes, graduate schools will publish the percentile ranks (on graduate school exams) for those who have gained admittance into their elitist temples of dogma and indefatigable self-importance. Oops! Is my egalitarianism showing? Let's suppose we know that in order to get into graduate school, you have to score in the upper 10 percent of those who take the test. In other words, you have to be in the 90th percentile or better. So the question that you must answer is: What is the lowest score you can get on the test and still be at or above the 90th percentile? Another way of asking this: What score cuts off the lower 90% from the upper 10% of scores in this distribution? And yet one more way to ask the question: What score has a percentile rank of 90? We're just asking the same question in different ways.

Have you drawn it? Have you guessed what score is needed? Let's see...knowing that 50% of the scores are below the mean and 34% of scores are between the mean and the first standard deviation unit above, I'm going to guess that my z score will be somewhere around 1.1 or 1.2 and the raw score that is at the 90th percentile is somewhere around 86. What were your guesses?

The outline for this problem would look like:

Percentile Rank > z > X

Okay. Let's follow the outline to the answer. We know that the percentile rank is 90. That's given. But where do we go from there? For this problem, we're trying to get to X, and the only way to do that is by figuring out z first. That is, we have to figure out what z score is associated with the 90th percentile. So, open up the Z Table and look for the area that is closest to 90%, then read off its z score. Because the percentile rank we're looking for is 90, we know that we will be looking in the part of the table showing positive z scores. (If the percentile rank were below 50, then we would be looking in the part of the table showing negative z scores.) So, we locate the box in the table that comes closest to 90.
The box that comes closest to 90 is .8997. What is the z score associated with this area? A z score of +1.28 is associated with the place in the distribution where the upper 10% of scores is cut off from the lower 90%. So, I was pretty close with my initial guess. And look at the chart that we drew and shaded. Seems to make sense. So far, so good. But we're not through, yet. How do I know? Because the outline says that I'm not through. So far, I've only gotten as far as identifying the z score. To finish the problem, I need to find the raw score associated with that z score. To do this, I just use my z-to-X formula: X = 80 + 1.28(5) = 86.4.

Our answer to this question is 86.4. In other words, you will have to score a little better than 86 on the exam in order to feel especially confident about getting into graduate school.

Try a couple on your own. What is the raw score for a percentile rank of 34? What is the percentile rank for a raw score of 96? Draw it! Guess it! Outline it! Calculate it! Check with someone else in class to see if you arrived at the correct answers. If you got the correct answers, then you just might be grad school material!

How Do I Find The Percentage Area of The Curve Above a Score?

Pleeeeeze. You're an expert now. This one is easy. To find the percentile rank of a score, we had to find the percentage of the normal curve below the score. A related problem is to find the percentage of the curve above a particular point.

How about an example? Suppose we administer an IQ test and calculate some statistics. The average for our sample is 100 and the standard deviation is 16. The question is: What percentage of the distribution is above a score of 120? As with the previous section, the first order of business is to draw the curve. Go ahead and do that.
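While you are drawing, here is the previous section's Percentile Rank > z > X outline as a short sketch. Scanning the table for the box closest to 90 is an inverse lookup, which `statistics.NormalDist.inv_cdf` (Python standard library, 3.8 or later) performs directly; it returns an unrounded z, so the result can differ slightly from one based on the table's two-decimal z scores. The name `score_at_percentile` is hypothetical.

```python
from statistics import NormalDist


def score_at_percentile(rank, mean, sd):
    """Follow the outline Percentile Rank > z > X.

    `rank` is a percentile rank strictly between 0 and 100.
    Returns the z score and the raw score that cut off that percentage below.
    """
    z = NormalDist().inv_cdf(rank / 100)  # Percentile Rank > z
    raw = mean + z * sd                   # z > X
    return z, raw


z, raw = score_at_percentile(90, mean=80, sd=5)
print(f"z = {z:.2f}, raw score = {raw:.1f}")
```

For the Introductory Psychology test (mean 80, standard deviation 5) this reproduces z = 1.28 and a cutoff of 86.4 for the 90th percentile.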
The next order of business is to think about what is being asked and make some guesses. Given the information in the question and our drawing of the curve, we know that we're dealing with the upper tail of the distribution. If we've labeled our drawing correctly we can guess that about 10-15% of the distribution is above a raw score of 120. Even if we didn't know anything about the 68-95-99 Rule, we could still make this guess, so long as our normal curve is drawn to scale.

What would the outline be for this problem?

X > z > Area

Now we can walk through it. As before, we have been given a raw score (120) and must first convert this score into a z score so we can use the normal curve table: z = (120 - 100) / 16 = 1.25. So, we see that a raw score of 120 has a z score of 1.25 in this distribution. Because this is a positive z score, we know that we will be looking at the part of the Z Table with positive z scores depicted. After locating +1.25 in the Z Table, we find that .8944 or 89.44% of the distribution lies below. That's what it says in the table, but that's not what we were asked. We were asked to find the area above. How do we do this? Simple. Just subtract 89.44% from 100%. That gives us a difference of 10.56%. In other words, we conclude that 10.56% of the area in this distribution is above a raw score of 120.

Let's see, our original guess was 10-15%. So, we were pretty close. And because our original guess was close, we can feel all the more confident that we have worked out this problem correctly.

How do I Find The Percentage Frequency?

Another useful piece of information about the normal curve is that the percentage area under the curve is not all that different from percentage frequency. Percentage frequency involves finding the total number of subjects who have scores within a particular area of the curve.
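Percentage frequency builds on percentage area, so it is worth pinning the area step down first. Here is the X > z > Area outline from the IQ question above as a sketch, assuming a normal distribution with mean 100 and standard deviation 16 and using `statistics.NormalDist` from the Python standard library rather than the printed table; `percent_above` is an illustrative name.

```python
from statistics import NormalDist


def percent_above(raw, mean, sd):
    """Follow the outline X > z > Area for the upper tail.

    Returns the z score and the percentage of the curve above `raw`.
    """
    z = (raw - mean) / sd          # X > z
    below = NormalDist().cdf(z)    # what the Z Table lists
    return z, 100 * (1 - below)    # area above = 100% minus area below


z, pct = percent_above(120, mean=100, sd=16)
print(f"z = {z:.2f}, area above = {pct:.2f}%")
```

For an IQ of 120 the sketch gives z = 1.25 and 10.56% above, inside the 10-15% guess made from the drawing.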
We find percentage frequency by first finding the percentage area in a particular part of the curve and then taking that percentage of the total sample size to find out how many subjects have scores in that area.

[Figure: freq_above1.jpg]

We're really just answering a simple question: How many subjects have scores within a particular area of the curve? This area could be above a particular point, below it or even between two points.

For example, if we administer an IQ test to 250 randomly selected individuals, we would expect 10.56% of them to score 120 or above. We answered that question in the last section. But now for a new question: How many would score 120 or above? The answer is 10.56% of 250, or 26.4 (realistically, 26) people.

Let's look at another example. How many of our randomly selected 250 would we expect to score above 80? As before, we draw it, guess it, outline it and then calculate it. With only the information that we have and a well-drawn and labeled curve, we might guess that 210 to 230 people in this distribution have an IQ above 80.

How about an outline for this one? Remember that we always start the outline with what we're given and finish it with where we need to end up.

X > z > area > Number

To use the Z Table, we first convert the IQ score of 80 into a z score. You might have already guessed that our z score would be somewhere around -1.2 or -1.3 based on the drawing.

z = (X - avg)/s = (80 - 100)/16 = -1.25

Our calculations reveal that an IQ of 80 is associated with a z score of -1.25 in this distribution. Now that we have a z score, we can go to the Z Table and look it up. When we do, we see that -1.25 is associated with an area below of about .1056. That is, 10.56% of the distribution is below a z score of -1.25. But remember, that's not what the question asked. We need to find the percent and number of participants above an IQ of 80. How do we do this? Simple.
If 10.56% represents the area below a z score of -1.25, then the area above must equal 89.44%. Right. The areas above and below must add up to 100%. So, all we did was subtract 10.56% from 100%. Thus, we need to find 89.44% of 250. Using a calculator, we find that: 250 x 89.44% = 223.6. Another way to perform this calculation is: 250 x .8944 = 223.6. In other words, we would expect about 224 people to score 80 or above.

Did you have a hard time following that? Yes? Did you draw your curve and label it? I know, I know...you had a picture right there in front of you. Why should you draw it? Here's an answer for you: So you don't have to take this class again! Seriously, do you really want to have to go through all of this one more time? I thought not. So, when I tell you to draw, guess, outline and calculate, go ahead and do that. Even if you don't quite understand what you're doing, the simple act of doing it on your own helps tremendously.

Now, using the same data from the previous example, try a couple on your own. How many people would we expect to score below an IQ of 105? How many people would we expect to score above an IQ of 105? Draw it! Guess it! Outline it! Calculate it! Check with someone else in class to see if you arrived at the correct answers. Or better yet, articulate your answers in the bulletin board below. If you got the correct answers, then you just might be grad school material!

How Do I Find The Area and Percent Frequency Between Two Scores?

We have learned how to find a percentage area below a score (percentile rank of the score) and how to determine an area above a score. What about determining a percentage area (or frequency) between two scores? For example, suppose we have a random sample of 1,000 individuals addicted to The Simpsons TV show. The average number of episodes watched per week is 100 with a standard deviation of 16.
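Before adding a second score to the mix, the percent frequency outline (X > z > area > Number) from the 250-person IQ example can be written as a small function. This is a sketch only: expected_count_above is a hypothetical helper name, and Python's statistics.NormalDist stands in for the printed Z Table, so results are unrounded.

```python
from statistics import NormalDist


def expected_count_above(x, mean, sd, n):
    """Follow X > z > area > Number for the region above x."""
    z = (x - mean) / sd                    # X > z
    area_above = 1 - NormalDist().cdf(z)   # z > area (above the score)
    return area_above * n                  # area > Number


# Assumed values from the IQ example: average 100, SD 16, 250 people.
count_above_80 = expected_count_above(80, mean=100, sd=16, n=250)
count_above_120 = expected_count_above(120, mean=100, sd=16, n=250)

print(f"expected above 80:  {count_above_80:.1f}")
print(f"expected above 120: {count_above_120:.1f}")
```

The two counts come out near 223.6 and 26.4, in line with the hand calculations, and because 80 and 120 sit the same distance from the average, they add up to the full 250.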
Suppose that we want to offer special counseling to the Simpsons viewers who watch between 90 and 120 episodes per week, but we need to know how many people we would be dealing with. How many people watch between 90 and 120 episodes per week? The problem, then, is to determine the area between the scores of 90 and 120 and to convert this area to a frequency based on N = 1,000. But even though this problem is a little different from previous ones in this section, we still start in the same place: draw the curve and label it.

[Figure: between1.jpg]

As you can see, this curve is a little more complex than the others that we've seen so far. This is because we will now have to figure out two z scores instead of one. That's right. We will need to calculate z scores for 90 and 120. Before we get to the calculations, however, we need to guess our answer and make an outline for the problem.

Let's see...a guess. Just looking at the space between 90 and 120, we might guess that 60-70% of the area is in that space. Well, if 60-70% of the area is in that space, then we would have to guess that between 600 and 700 people in this group would watch between 90 and 120 episodes of The Simpsons per week. We've made our guess.

There will be one outline for this problem but we will have to go through it twice (once for X = 90 and once for X = 120) to get all of the information that we need. The subscripts indicate whether the outline is for the first or second X value.

X1 > z1 > area1
X2 > z2 > area2
area between > Number

The last step is to actually perform the calculations. First we will calculate the z scores. The z score for 90 is

z = (X - avg)/s = (90 - 100)/16 = -.625

which rounds to -.63. (Remember that when the last digit is 5 or more, we round up.) The z score for 120 is

z = (X - avg)/s = (120 - 100)/16 = 1.25

Now, look at the drawing above. Sheez! Homer looks pretty happy; well, I guess he looks more manic.
Anyhow, I think that Homer's uncharacteristic jubilation arises from the fact that our two calculations seem to line up with our drawing. That is, when we labeled the drawing, we might have guessed that our X value of 90 would be close to a z score of -.625 and our X value of 120 would be close to 1.25. So far, so good.

Now that we have our z scores, we can go to the Z Table. Remember that the Z Table has a section for negative z scores and another section for positive z scores. When we look up a z score of -.63, we see that the area below that is .2643 or 26.43%. Looking up a z score of 1.25 reveals that .8944 or 89.44% of the distribution is below. But our question didn't ask about area and number of people below these values. Our original question asked us to deal with the area between scores.

To find the area between two z scores, we just subtract the area below (26.43%) for one z score (-.63) from the area below (89.44%) for the other (1.25). When we subtract 26.43% from 89.44% we arrive at a difference of 63.01%. In other words, we estimate that about 63% of the area in this curve is between the raw scores of 90 and 120 episodes viewed per week.

We're almost there. Look at our outline. One step to go. To finish this up, we need to figure out the total number of individuals that we would expect to fall between 90 and 120 episodes viewed per week. All we have to do at this point is find 63.01% of 1,000, which is our sample size. Piece of cake.

1,000 x 63.01% = 630.1, or 630

Thus, about 630 people would watch between 90 and 120 episodes per week. Wow! 630! That's a lot of people to provide counseling for, but I think we're up to the challenge.

Note that drawing your curve really helps you to visualize the problem and enables you to see easily that we must find the difference between the areas to arrive at the correct solution. No question about it: drawing curves will help you understand these problems better and help you to avoid errors.

How Do I Find Probability?
The normal distribution, as well as other distributions, is a probability distribution. Although we have been discussing percentage areas, percentile ranks and percent frequencies up to this point, we could just as easily have talked about probabilities. In fact, we can convert the percentage areas in the normal curve table to probabilities by dividing % Area by 100. For example, if the percentage area under the curve between 2 and 3 standard deviation units above the mean is 2.15, then the probability of someone having a score in this area is: 2.15/100 = .0215.

[Figure: area_above1.png]

Let's go back to a previous problem. We will use the data and curve from the section on assessing area above a given point on the curve. We have administered an IQ assessment and have found a sample average of 100 and a standard deviation of 16. Suppose that we want to determine the probability that a person picked at random would have an IQ score of 120 or above. After drawing and labeling our curve, we might guess that the probability is somewhere between .10 and .15, that is, between 10% and 15% probability. What would the outline be for this problem?

X > z > Area > Probability

Let's walk through it. We have a raw score of 120. According to our outline, our next move is to calculate a z score. We conduct our calculations as follows and come up with a positive z score of 1.25.

z = (X - avg)/s = (120 - 100)/16 = 1.25

Because this is a positive z score, we know that we will be looking at the part of the Z Table with positive z scores depicted. After locating +1.25 in the Z Table, we find that .8944 or 89.44% of the distribution lies below. We find the area above by simply subtracting 89.44% from 100%.
That gives us a difference of 10.56%. In other words, 10.56% of the distribution is above a raw score of 120. To convert this area into a probability, you would perform this calculation: 10.56/100 = .1056. In other words, if you randomly chose someone from the group tested, there is a .1056 probability that you would select someone who has an IQ higher than 120. That's a fairly low probability.

Another way to move from percent area to probability is to simply use the word "probability." You don't really have to divide the area by 100 and use a decimal place. Just take your area and say something like, "There is a 10.56% probability of selecting someone with an IQ of 120 or higher." That works just fine. Moreover, we can see that this conclusion is consistent with our drawing and our guess.

Try a couple on your own. What is the probability of someone scoring below a percentile rank of 63? What is the probability of someone scoring above a raw score of 105? What is the probability of randomly selecting someone who scored between 90 and 120? Draw it! Guess it! Outline it! Calculate it! Check with someone else in class to see if you arrived at the correct answers. Or better yet, articulate your answers in the bulletin board below.

How Do I Find Deviant Scores And Their Probability?

How do you like that? We start off talking about normal and end up talking about "deviance." Circle of life, I suppose: sometimes we're near the middle of the distribution, moving with the herd, while at other times we're off doing our own thing.

What have we covered so far?
1. Finding percentile rank from a score
2. Finding a score based on percentile rank
3. Finding the area above a score
4. Finding the area between two scores
5. Finding the frequency of scores in a particular area
6. Finding the probability of scores in a particular area

In another type of problem, we might be asked to determine IQ scores so deviant or unlikely that they occur 10% or less of the time. In a way, this is a trick question, because the direction of the deviance from the mean is not stated. Because the direction is not specified, we are actually dealing with both ends of the normal curve and not just the top or bottom tail. An appropriate outline would be:

Area > z > X

[Figure: deviance1.png]

Notice in the figure that because the direction is not stated, the deviant 10% on the curve has been split into two equal parts, with 5% in the upper tail and 5% in the lower tail. Extremely high IQ scores are as unlikely as extremely low ones. If we had not divided the 10% in half and had instead put 10% at the upper end and 10% at the lower end, then we would be cutting off the deviant 20% of the distribution. Whenever we encounter a problem in which the direction of the deviance is not given, we must halve the given area before labeling both ends of the curve.

So what scores cut off the deviant upper and lower 5% of the distribution? Based on our drawing, we might guess somewhere around 70 to 75 on the low end and 125 to 130 on the high end. The problem now becomes one of finding a z score when we know a percentage area under the curve and converting this z score into a raw score. Since our z scores are cutting off the deviant 5% of the distribution on the high and low ends, we know that the z scores will be mirror images of one another. That is, the values will be the same, with one being positive and the other negative.
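The halving rule and the mirror-image claim can be checked with a short sketch. It assumes Python's statistics.NormalDist as a stand-in for the printed Z Table (read in reverse, from area to z), and the variable names are illustrative only.

```python
from statistics import NormalDist

# The problem gives a total "deviant" area with no direction stated,
# so the area is halved before it is placed in each tail.
total_deviant_area = 0.10
tail_area = total_deviant_area / 2

std_normal = NormalDist()  # mean 0, standard deviation 1

# z score with tail_area of the curve below it (lower cutoff).
z_low = std_normal.inv_cdf(tail_area)

# z score with tail_area of the curve above it (upper cutoff).
z_high = std_normal.inv_cdf(1 - tail_area)

print(f"lower cutoff z = {z_low:.3f}")
print(f"upper cutoff z = {z_high:.3f}")
```

Both values come out at roughly 1.645 in size, one negative and one positive.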
Because the two z scores mirror one another, we can look in either the negative or positive section of the Z Table. If you look in the negative section, you will be looking for the area that comes closest to 5%. In our Z Table, the areas that come closest are .0495 and .0505. These areas are associated with z scores of -1.65 and -1.64, respectively. If we look in the positive section of the table, we have to look for the area that comes closest to 95%. The areas that come closest to 95% in the positive section are .9495 and .9505. These areas are associated with z scores of +1.64 and +1.65, respectively. We've just confirmed that the 5% cutoff point on the high end is at exactly the same distance from the mean as the one on the low end. For our example, let's take positive and negative 1.64 as the z scores cutting off the deviant 10% of the distribution.

The final step in the problem is to convert these two z scores into raw scores. We will use the formula that allows us to convert z scores to raw scores.

X = zs + avg = (±1.64)(16) + 100 = ±26.24 + 100 = 73.76 and 126.24

Thus, the IQ scores that are so unlikely that they occur 10% or less of the time are those less than 74 and greater than 126.

The problem of finding deviant scores that we have been considering can be related back to probability. For example, when we asked, "What scores are so deviant that they occur less than 10% of the time?" we could have as easily asked, "What scores are so deviant that their probability of occurrence is 10% or less?" To answer this question we would have followed the same steps as above, but instead of thinking about the Z Table in terms of area, we would have thought about it in terms of probability. And as we've already figured out, area and probability are almost one and the same idea when we're talking about curves.

Self Test

Self-test for chapter (normal_test.pdf)
Answers to self-test (normal_answers.pdf)

Some content adapted from others' work.
See home page for specifics. Last updated: 2015-09-24 5:32 PM. Derek Borman: Psychological Science. Mesa Community College | 1833 W. Southern Ave. Mesa, AZ 85202 | Phone: (480) 461-7181
