1/21/2016
Normal Curve
Chapter 9: Normal Curve
Objectives for This Chapter
Understand the basic characteristics of the standard Normal curve.
Apply the Z table.
Assess percentile rank of a score.
Assess percent frequency (number of scores) above/below/between points on
the curve.
Assess percent of curve above/below/between score(s).
Assess probability of attaining certain scores and finding deviant scores.
Assess area of curve above/below/between score(s).
I march across the desert and all I have to show for it is this stupid rock?
Discovery of The Rosetta Stone
The young French soldier was hot, thirsty
and tired. He had been involved in Napoleon's Egyptian expedition for months.
He'd marched through the desert with no water, chasing after strange pools of
liquid that miraculously appeared on the horizon, and then cruelly disappeared
as he approached them. He'd also choked on storms of dust that raced across
the landscape. Now he and his companions had been assigned to tear down an
ancient wall so they could build an extension to Fort Julien. Backbreaking work.
As he pried another stone out of the wall, its color struck him as odd. It
was dark, almost black actually. On one flat side there appeared to be
writing of some sort. Not French, to be sure, but some strange script. No, not
just one type either, but three different types. Very odd. He
decided to call the officer-in-charge over to take a look at it.
What had been discovered was a large stone on which the same message had been written in three different
scripts; the upper text is in Ancient Egyptian hieroglyphs, the middle portion in Demotic and the lowest in Ancient
Greek (http://en.wikipedia.org/wiki/Ancient_Greek). This stone is known as the Rosetta Stone. Because it presents essentially the same text in all three scripts
(with some minor differences between them), it provided the key to the modern understanding of Egyptian
hieroglyphics (http://en.wikipedia.org/wiki/Egyptian_hieroglyphs).
We aren't troops in a foreign country, but we are on a journey across what may sometimes feel like a hot and
unforgiving desert. Additionally, I've heard more than one student mutter under her breath that learning statistics
http://www.derekborman.com/230_web_book/module3/normal/index.html
seems an awful lot like deciphering hieroglyphics. Fortunately, if you brave the
burning sands, the scorching winds, the poisonous scorpions, thirst, exhaustion,
feelings of hopelessness and the occasional fainting spell...where was I going with
this...Oh, yes...you will be provided with the key to the modern understanding of
statistics. Not only will you have the key to understanding modern statistics, but you
will have new insight into everything from politics to medicine to sports to business.
Though it may not feel like it, we will be talking about nothing less than the heart of
scientifically informed analysis and decision making.
Like the Rosetta Stone, the Normal distribution or Normal curve allows us to make
several translations: from scores to percentiles and probabilities, from probabilities
and percentiles to scores, and from scores in one set of units to scores in other units.
This is the focus of this chapter.
The Normal curve is also called the bell curve because of its shape. The creation of this curve is often attributed to
the brilliant German mathematician, Karl Friedrich Gauss (1777-1855). Thus, it is sometimes called the Gaussian
distribution. Actually, the mathematical equation that generates the Normal curve was introduced earlier by Abraham de
Moivre (1667-1754). He was a sharp guy. He even spent time in the company of Sir Isaac Newton.
Why do we use the Normal curve?
The Total Area Always equals 100%
Normal distributions are a family of
distributions that have the same general shape. They are symmetric, with scores
more concentrated in the middle than in the tails. Normal distributions are
sometimes described as bell shaped. Here are some examples. Notice that they differ in how spread out they are. But
even though the shapes are different, the area under each curve is the same.
The total area accounted for under any Normal curve is 100%. This never changes
and is critical to our understanding of how to apply the Normal distribution.
Because we know that the area under the curve is always 1 or 100%, we can
understand a lot about individual scores and groups of scores to which the
Normal distribution is applied. We'll need a Normal curve table for this. More on
that one, shortly.
The Normal curve has other characteristics that are always true. Once again, the
fact that we can always count on these characteristics provides a good model for understanding numeric trends in
data. The following are other important characteristics of the Normal curve:
1. All Normal curves are symmetric around the mean of the distribution. In other
words, the left half of the Normal curve is a mirror image of the right half.
2. All Normal curves are unimodal. Because Normal curves are symmetric, the most
frequently observed score in a Normal distribution (the mode) is the same as
the mean.
3. Since the Normal curves are unimodal and symmetric, the mean, median, and
mode of all Normal distributions are equal.
4. All Normal curves are asymptotic to the horizontal axis of the distribution. The height of the
curve (the frequency of scores) drops rapidly as one moves along the horizontal axis
from the center of the distribution toward the extreme ends of the distribution,
but the curve never actually touches the axis. This is because scores on the Normal curve are
continuous and held to describe an infinity of observations.
5. All Normal curves have the same proportions of scores under the curve relative
to particular locations on the horizontal axis when scores are expressed as areas,
percentiles, probabilities, etc.
Hey, It's As Natural as Big Feet!
As we discussed in a previous chapter, research in any field
must deal with variability. We know that too much
variability probably means that we have more error in our
methods and data, whereas less variability is one indication
that our methods and data comprise less error. So, less
variability is good, but there will always be some. Why?
Everyone doesn't respond the same way to the same
medication; different people have different memory
abilities; some people are taller and some people are
shorter. Turns out that variability is natural, as is the
Normal distribution. In other words, organisms inherit
physical and derivative psychological
traits...well..."Normally."
We take this Normality as a common pattern or "process" of
nature and our "observations" of it. And even though we
may not be able to identify all of the factors (we never will,
by the way) that make up the thing we like to call "intelligence," when we measure this thing in large numbers and
with proper research methods, we get that nice Normal curve. Go figure!
Consider the phenomenon of Bigfoot or Sasquatch. I know...I know. A fairytale. Right? Like there could really be
populations of half-human/half-ape creatures that exist in various remote locations and are only detectable
through their forensic remains. Before we dismiss it too quickly, let's try the hard thing. Let's try to argue FOR the
existence of Sasquatch based on Normality. How could we do this?
As you may know, footprints are the
standard stock in trade of Sasquatch research, and their sometimes
inhuman length assures almost immediate measurement, even by
first-time witnesses. The process here consists of foot lengths and the
observations are the measurements of footprints. Foot lengths are
going to be affected by a lot of factors: Gender of the creature. Family
genetics. Nutrition. Surface from which the foot lengths were
measured (snow, mud, grass, etc.). Length of time between the
creation of the footprint and its measurement. Amount of alcohol
consumed by everyone involved. It's complicated!
Nonetheless, as can be seen here, a sample of 410 independently
collected footprints (ostensibly left by a Bigfoot) forms a fairly Normal
curve (with frequency plotted on the y axis and foot length plotted on
the x axis). The Normal distribution overall argues compellingly for the existence of Sasquatch as a genuine
species, in that production of fictitious data over 40 years by hundreds of people independently of each other
would likely have generated a distribution with many peaks. A further factor that supports the authenticity of the
data is the fact that foot length, foot width, heel width, and gait are interrelated in a logical and cohesive fashion, a
congruence not plausible by pure chance.
Hmmmm....very interesting. Are you a true believer, yet? If you want to learn a little more about forensic research
on the big fella, you can read this research paper (http://www.bfro.net/ref/theories/whf/fahrenbacharticle.htm).
Why don't we frame this in less
cryptozoological terms? Let's look at the SAT. The process here consists of the
students taking the test, and the observations are the students’ scores. Now,
my score, for example, is going to be due to a whole set of different factors:
my IQ, what I had for breakfast, how much I studied the night before, how
good my teachers are, which butterflies were flapping their wings in Beijing
this morning, and so on. In short, my score is the result of a whole set of
hard-to-predict factors. The same with my fellow students. And yet, even
though all these factors are hard to predict, if you take the scores of a large
number of students from a single population, the scores will be Normally
distributed as you see here. Once again, when we see such a Normal curve in
our data, we're inclined to think that we're on the right track.
Have A Go At Sir Francis Galton's Quincunx
Of course, this same kind of Normal distribution shows up all over the place, just about anywhere we observe a
large number of samples from a process that is the sum of many independent probabilistic factors. In fact, Francis
Galton invented a machine, the quincunx, designed to illustrate how multiple probabilistic factors can add up and
result in a Normal distribution.
Below is an interactive quincunx. Experiment with it and see if it gives you greater insight into sampling,
probability and how Normal curves are created. Make sure to select the "Auto Drop" option so that you don't have
to keep clicking to drop individual balls.
So, what's really going on here? The little gray balls drop down through the fixed black pegs to the bins below. If
you drop more and more little gray balls, their pattern in the bins increasingly resembles a Normal distribution. Another way to think about it is that there are more pathways to the middle of the distribution and fewer pathways
to the tails of the distribution. Therefore, the probability of balls landing in one of the middle bins is higher. We
might call it the "path of least resistance" which might also be referred to as an average of sorts. If you want to
know more about the math and probability underlying the functioning of the quincunx, you can look at this
information on Pascal's triangle (http://en.wikipedia.org/wiki/Pascal%27s_triangle).
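The quincunx is easy to imitate in a few lines of code. The following is only a minimal sketch, assuming each peg sends a ball left or right with equal probability; the function name `drop_balls`, the 12 rows of pegs and the 5,000 balls are illustrative choices, not details of the interactive version described above.

```python
import random
from collections import Counter

def drop_balls(n_balls, n_rows, seed=0):
    """Simulate a quincunx: each ball bounces left (0) or right (1) at every row of pegs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    bins = Counter()
    for _ in range(n_balls):
        # The bin a ball lands in is simply its number of rightward bounces.
        position = sum(rng.randint(0, 1) for _ in range(n_rows))
        bins[position] += 1
    return bins

bins = drop_balls(n_balls=5000, n_rows=12)
for b in range(13):
    # One '#' per 25 balls gives a rough sideways histogram.
    print(f"{b:2d} {'#' * (bins[b] // 25)}")
```

The middle bins fill up fastest because more left/right sequences lead to them, which is the "more pathways to the middle" idea in the text.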
If we performed a real-world simulation with actual three-dimensional balls, bins, etc., there would be numerous
factors (beyond simple chance) contributing to the final location of each ball dropped. Factors such as air currents,
irregularities in ball shape, and other hard-to-analyze factors would make it very difficult to predict the bin in which
any individual gray ball would land. However, as a whole, the pattern becomes very predictable as the number of
“samples” increases. The individual peculiarities of each gray ball’s trajectory are indeed wholly due to the
combined influence of many independent “accidents.” This, as Galton predicted, eventually yields a Normal
distribution.
The 68-95-99 Rule
The standard normal curve is a special example of the normal distribution: one with a mean of 0 and a standard deviation of 1. The height of a Normal distribution can
be specified mathematically in terms of two population parameters: the mean (μ) and the standard deviation (σ).
Instead of calculating our curve parameters in painstaking, mathematical longhand, we will simply use sample
statistics (s and x-bar) to estimate the properties or distribution shape of our actual population. In other words, we
can do some shortcutting.
Every time you look at a group of scores (sample of data), you want to be thinking about those scores as
comprising a shape. Even though you will see data listed in groups and columns, underneath every data set is a
shape. Whenever we perform statistical analyses, we're hoping that this shape comes as close as possible to
bell-shaped or Normal. As we move along with our discussion, this idea of "shape" will become more concrete.
The distances along the
horizontal axis of our curve, when divided into standard deviations,
will always include the same proportion of the total area: Between
-1 and +1 standard deviation units lies about 68% of the area.
Between -2 and +2 standard deviation units lies about 95% of the
area. Between -3 and +3 standard deviation units lies about 99.7%
of the area (the "99" in the rule's name is rounded down). This is true of any Normal curve, whether it is
a little narrower or a little wider. This graphic
depicts the approximate 68-95-99 breakdown for a bell-shaped,
standard normal curve.
This conception of the normal curve starts to become powerful
when we "map" it onto normally distributed variables. One example
of a variable that forms a normal curve is I.Q. In this case, we can
tell what percentage of people are in any area of the curve. A normal distribution of 1000 cases will have about 683
(about 68%) people between +/-1 standard deviation, about 954 (about 95%) people between +/-2 standard
deviations, and about 997 (about 99.7%) people between +/-3 standard deviations. Only about 3 people will be more than 3
standard deviations from the mean, if the sample size is 1000. In other words, in a perfectly normal distribution
based on such data, we would expect only about three people to have I.Q. scores above the score associated with a
z score of +3 or below the score associated with a z score of -3.
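The 68-95-99 figures can be checked without a printed table. This is a small sketch using `statistics.NormalDist` from Python's standard library (Python 3.8 or later); the helper name `area_within` is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal curve

def area_within(k):
    """Proportion of the curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    proportion = area_within(k)
    # Expected head count in a perfectly normal sample of 1000 cases.
    print(f"+/-{k} SD: {proportion:.4f} of the area, about {proportion * 1000:.0f} of 1000 cases")
```

The three proportions come out near .683, .954 and .997, which is where the 683, 954 and 997 head counts in the I.Q. example come from.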
Review of Z Scores
We discussed standard scores, or z scores, in a previous chapter. A standard score or z score is the deviation of a
raw score from the mean in standard deviation units. Each standard deviation unit represents a specific distance,
expressed in the units of the sample scores. When we have normally distributed data, our deviation units will go
out about three up and three down before we almost run out of curve. We have two formulas: one that allows us
to calculate a z score from a raw score, z = (X - x-bar) / s, and one that allows us to calculate a raw score from a z score, X = x-bar + (z)(s).
Z scores can also be positive or negative. The sign of the z score tells the direction of the score relative to the
mean: Negative z scores represent raw scores below the mean, and positive z scores indicate scores above the
mean.
Let's consider another "big foot" example to refresh our memories about the use of
these formulas. Suppose we find for a sample of women that the average shoe size is
8.25, with a standard deviation of 1.17. What will be the z score, or standard score,
corresponding to a shoe size of 10.5? Using the raw-score-to-z formula, we would first subtract 8.25
from 10.5, giving us a difference of 2.25. Then, we would divide this difference by
1.17. The answer would be 1.92. Based on this sample, a woman who wears a size
10.5 shoe will be 1.92 standard deviation units above the mean. And because you
know a little bit about the shape of normal curves and how z scores relate to it, you could also conclude that very
few women from this distribution would have larger feet, but many would have smaller feet.
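The two conversions can be written as a pair of one-line functions. This is just a sketch of the arithmetic described in this section; the function names `z_from_raw` and `raw_from_z` are invented for illustration, and the mean (8.25) and standard deviation (1.17) are the shoe-size values from the example.

```python
def z_from_raw(x, mean, sd):
    """Raw score -> z score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

def raw_from_z(z, mean, sd):
    """z score -> raw score: move z standard deviations away from the mean."""
    return mean + z * sd

# Shoe-size example: mean 8.25, standard deviation 1.17.
z = z_from_raw(10.5, mean=8.25, sd=1.17)
print(f"A size 10.5 shoe has z = {z:.2f}")

# The reverse direction: a woman 2.25 standard deviations below the mean.
size = raw_from_z(-2.25, mean=8.25, sd=1.17)
print(f"z = -2.25 corresponds to a shoe size of about {size:.4f}")
```

The first result is about 1.92. The second lands close to the 5.62 worked out by hand; any small difference comes from rounding the intermediate product to -2.63 in the hand calculation.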
Continuing with the same example, what size shoe will be worn by a woman who is
2.25 standard deviation units below the mean? In this case, we have been given
the z score, z = -2.25, and we are asked to convert it to a raw score. Using the
z-to-raw-score formula, we multiply -2.25 by 1.17. This gives us -2.63, which we then add to the average (8.25) of the
distribution. In the end we figure out that for a z score of -2.25, the corresponding shoe size in this distribution is
5.62. This woman will probably have to try on a size 5 ½ and a size 6 to get a pair of shoes that fits.
What is a Z Table And How Do I Use It?
Now that we know about the 68-95-99 Rule and we know how to convert raw scores to z scores and vice versa,
what do we do with all of this amazing knowledge? We pull out our own "Rosetta Stone," light some candles, call
up the ghost of Napoleon and make him perform all of our statistical calculations. Hey, you gotta blame
somebody for all the pain you're enduring in this class. Why not throw it all on the dead guy who started this whole
thing?
This normal curve table really is our Rosetta Stone. It
can provide information about the population from
which our sample was drawn. Moreover, all of the
inferential statistics in the world derive from the
assumption that whatever variables we study have
underlying distributions that come close to Normal.
In other words, it sort of all comes back to this table.
We will use this table to answer some simple but
important questions about data. This table opens the
door to many applications.
You have a full copy of this table in your
Instructional Manual.
In this table, the z score is in the far left column. Each column to the right supplies the hundredths decimal place of
the z score. Values in this chart show the area/percent/probability BELOW a certain point (the z
score) in the normal curve. For example, the percent area below a Z-score of -1.22 is 11.12%. Conversely, we
know that the area/percent/probability ABOVE a Z score is 100 minus the value listed in the table. In other words,
Below area + Above area = 100% or 1.0. A negative z-score value is to the left of the mean. A positive z-score
value is to the right of the mean. You must keep these ideas in mind if you are to use the table correctly.
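If you want to double-check a table entry, the "area below" value is what statisticians call the cumulative distribution function. Here is a brief sketch, again assuming Python's standard `statistics.NormalDist`; the names `percent_below_z` and `percent_above_z` are hypothetical helpers, not part of any library.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def percent_below_z(z):
    """Percent of the standard normal curve below z (what the Z Table lists)."""
    return std_normal.cdf(z) * 100

def percent_above_z(z):
    """Percent of the curve above z: 100 minus the table value."""
    return 100 - percent_below_z(z)

print(f"Below z = -1.22: {percent_below_z(-1.22):.2f}%")
print(f"Above z = -1.22: {percent_above_z(-1.22):.2f}%")
```

The first line reproduces the 11.12% quoted above, and the two values always add up to 100%.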
Let's look at a specific z score to see what we can extract from the table. How about a z score of +.52? First, to
find this z score, we would scroll down to the second part of the table, the part that
depicts positive scores. In the Z-Score
column, I would go down to the row with 0.5. Next, I would follow the
headings of the columns over to 0.02. Finally, I would identify the
intersection of the selected row and column at 69.85. In other words,
about 70% of the distribution is below a z score of +.52 and about 30%
of the distribution is above.
So, what about a z score of -.52? Easy. We would just flip our previous
conclusions: about 30% of the curve is below and about 70% is above. Don't
believe me? Look at the part of the Z Table depicting negative scores,
that is, z scores below the mean. Your table indicates that 30.15% of
the distribution is below a z-score of -.52. Of course, this means that
about 70% is above.
What if we wanted to find out how much of the distribution is between a z score of -.52 and the mean (mid-point) of the
distribution? To answer this question, you have to keep in mind two things: FIRST, the area above a z score of 0
represents 50% of the distribution, as does the area below. Together, the two areas comprise 100% of the
distribution. SECOND, the areas depicted in our Z Table are for areas below a given z score.
So, here are the rules for determining the amount of area between a given z score and the middle of the
distribution:
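1. IF THE Z SCORE IS POSITIVE, subtract .50 (50%) from the value associated with that
given z score. For z = +.52, that is 69.85% - 50% = 19.85%.
2. IF THE Z SCORE IS NEGATIVE, subtract the value associated with that given z
score from .50 (50%). For z = -.52, that is 50% - 30.15% = 19.85%.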
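The two rules above can be captured in a single small function. This is only an illustrative sketch (the name `area_between_z_and_mean` is invented), using `statistics.NormalDist` in place of the printed table.

```python
from statistics import NormalDist

def area_between_z_and_mean(z):
    """Proportion of the standard normal curve between z and the mean (z = 0)."""
    below = NormalDist().cdf(z)  # the "area below" value the Z Table would give
    if z >= 0:
        return below - 0.50      # rule 1: positive z
    return 0.50 - below          # rule 2: negative z

print(f"{area_between_z_and_mean(-0.52):.4f}")
print(f"{area_between_z_and_mean(+0.52):.4f}")
```

Both calls give the same answer, roughly .1985, because the curve is symmetric around the mean.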
Play Time!!
This is some pretty nasty stuff to get your head around without getting your hands dirty. Below is a z-score
calculator. Explore the different options. Spend some time with it. Mentally note how changing one parameter
leads to changes in another. Use the Z Table to predict what will happen if you enter a certain z score in the
calculator. What will happen to the shading? To the area above the z score? Below it? Notice how percentile, area
and probability almost seem to be "saying" the same thing. Hmmm......
What is a Percentile Rank and How Do I Calculate It?
Percentiles and percentile ranks are frequently used as indicators of performance in many fields, from psychology
to medicine to education to business. Percentiles and percentile ranks provide information about how a person or
thing relates to a larger group. Relative measures of this type are often extremely valuable to researchers
employing statistical techniques.
Characteristics of Data Distributions
Remember that Z Table to which you were introduced a couple of sections ago? That table is your Rosetta Stone
for understanding the language or characteristics of standard normal distributions. Whereas with the Rosetta
Stone, scholars have been able to interpret ancient Egyptian, Demotic and Greek, the Z Table will help us to
understand distribution characteristics such as percentile rank, area, probability, percent, and percent frequency.
One reason the normal distribution is important is that many psychological and educational variables are
distributed approximately normally. We already discussed this.
A second reason the normal distribution is so important is that it is easy for mathematical statisticians to work
with. This means that many kinds of statistical tests can be derived for normal distributions. Almost all statistical
tests discussed in this book assume normal distributions. Fortunately, these tests work very well even if the
distribution is only approximately normally distributed. Some tests work well even with very wide deviations from
normality.
Finally, if the mean and standard deviation of a normal distribution are known, it is easy to convert back and forth
from raw scores to percentiles to areas to frequencies. Stay with it, and you will come to understand why these
distribution characteristics are so important.
Finding And Understanding Percentile Ranks of Individual Scores
A percentile rank is the percentage of cases up to and including the one in which we are interested. Turns out
that is exactly the information our Z Table gives us. So, calculating percentile ranks is a fairly straightforward
procedure. One important thing to remember is that a percentile rank tells us about a single score.
To be more concrete, assume a test in Introductory Psychology is normally
distributed with a mean of 80 and a standard deviation of 5.
Have you drawn it? You already know how. Even though you have the image right
here, go ahead and draw the curve on your own. DRAWING IS A BIG DEAL in
this chapter. Trust me. It helps with understanding and will only take a few
seconds for each problem.
What is the percentile rank of a person who received a score of 70 on the test?
Before we calculate this one and check our Z Table, can you make a guess about
the answer? Bet you can. If you really understand what percentile rank means and
you understand the 68-95-99 Rule, you could make a pretty edu-ma-cated guess
about the answer. In this chapter, you should start developing your "statistics 6th sense," and this involves
GUESSING ANSWERS BEFORE CALCULATING THEM. This is another powerful pathway to understanding
statistics. So do it!
Mathematical statisticians have developed ways of
determining the proportion of a distribution that is below a
given number of standard deviations from the mean. They
have shown that only 2.3% of the population will be less than
or equal to a score two standard deviations below the mean.
In terms of the Introductory Psychology test example, this
means that a person scoring 70 would be in the 2.3rd
percentile.
So, how did we get to this conclusion? There are a few steps
we have to go through. First, we need to convert that score of
70 to a z score. Any time we're trying to find area, percentile
rank, probability, etc. in a standard normal distribution, we
must have a z score. We use our z-score formula to get this. Our calculations would go as follows:
z = (70 - 80) / 5 = -2.0
Now that we have a z score, we can go to our Z Table. We will go and look at
our negative z scores. We will be looking for, what else, a z value of -2.0. When
you find that z score, you can see that it is associated with an area of .0228. We
could multiply this number by 100 to get 2.28%. This is the percent of scores at
or below our score of 70.
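The same two steps (raw score to z score, then z score to area below) can be sketched in code. The function name `percentile_rank` is hypothetical, and `statistics.NormalDist` stands in for the printed Z Table.

```python
from statistics import NormalDist

def percentile_rank(x, mean, sd):
    """Return (z, percentile rank) for raw score x in a normal distribution."""
    z = (x - mean) / sd                 # step 1: raw score -> z score
    rank = NormalDist().cdf(z) * 100    # step 2: z score -> percent at or below
    return z, rank

# Introductory Psychology test: mean 80, standard deviation 5.
z, rank = percentile_rank(70, mean=80, sd=5)
print(f"z = {z:.1f}, percentile rank = {rank:.2f}")
```

A score of 70 gives z = -2.0 and a percentile rank just under 2.3, in line with the table lookup.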
To keep us on track in these sections, I propose that we use "outlines." At the beginning of a problem, you should
not only think about drawing and guessing, but you should also think about creating an OUTLINE for the problem.
The outline will comprise the steps you need to take to complete the problem. An outline always begins with
the given information and ends with the missing information. The outline for the problem we just completed is:
X --> z --> Percentile Rank
So, what does this percentile rank tell us? Remember that all of our interpretations based on our Z Table are
"relative." That is, what we conclude is relative to the one or more distributions that we're working with. In this
case, relative to other test takers in the group, a person who earns a 70 on the test did not perform very well.
Think about that for a moment. What we are NOT saying is that the test was hard or easy. That would be a
different claim, altogether and would probably involve comparisons of averages among different groups. We're not
doing that in this chapter.
Now, what if we
had asked ourselves a different question? What if we had a
percentile rank in mind and wanted to know what type of raw
score that translated to? Oftentimes, graduate schools will
publish the percentile ranks (on graduate school exams) for
those who have gained admittance into their elitist temples of
dogma and indefatigable self-importance. Oops! Is my
egalitarianism showing?
Let's suppose we know that in order to get into graduate
school, you have to score in the upper 10 percent of those
who take the test. In other words, you have to be in the 90th
percentile or better. So the question that you must answer is:
What is the lowest score you can get on the test and still be
at or above the 90th percentile? Another way of asking this:
What score cuts off the lower 90% from the upper 10% of
scores in this distribution? And yet one more way to ask the question: What score has a percentile rank of 90?
We're just asking the same question in different ways.
Have you drawn it? Have you guessed what score is needed? Let's see...knowing that 50% of the scores are below
the mean and 34% of scores are between the mean and the first standard deviation unit above, I'm going to guess
that my z score will be somewhere around 1.1 or 1.2 and the raw score that is at the 90th percentile is somewhere
around 86. What were your guesses? The outline for this problem would look like:
Percentile Rank --> z --> X
Okay. Let's follow the outline to the answer. We know that the percentile rank is 90. That's given. But where do we
go from there? For this problem, we're trying to get to x, and the only way to do that is by figuring out z first. That
is, we have to figure out what z score is associated with the 90th percentile. So, open up the Z Table and look for
the z score that is closest to an area of 90% (.9000). Because the percentile rank we're looking for is 90, we know that we
will be looking in the part of the table showing positive z scores. (If the percentile rank were below 50, then we
would be looking in the part of the table showing negative z scores.) So, we locate the box in the table that comes
closest to .9000. The box that comes closest is .8997. What is the z score associated with this area? A z score of
+1.28 is associated with the place in the distribution where the upper 10% of scores is cut off from the lower 90%.
So, I was pretty close with my initial guess. And look at the chart that we drew and shaded. Seems to make sense.
So far, so good. But we're not through, yet. How do I know? Because the outline says that I'm not through. So far,
I've only gotten as far as identifying the z score. To finish the problem, I need to find the raw score associated with
that z score. To do this, I just use my z-to-x formula: X = 80 + (1.28)(5) = 86.4. Our answer to this question is 86.4. In other words, you will
have to score about 86.4 or better on the exam in order to feel especially confident about getting into graduate school.
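Going in this direction means reading the Z Table "backwards," from an area to a z score. In code, that backwards lookup is the inverse of the cumulative distribution function. The sketch below assumes `statistics.NormalDist.inv_cdf` from the Python standard library; `raw_score_at_percentile` is a made-up helper name.

```python
from statistics import NormalDist

def raw_score_at_percentile(rank, mean, sd):
    """Return (z, raw score) for a given percentile rank (0 < rank < 100)."""
    z = NormalDist().inv_cdf(rank / 100)  # step 1: percentile rank -> z score
    x = mean + z * sd                     # step 2: z score -> raw score
    return z, x

# Same test as before: mean 80, standard deviation 5.
z, x = raw_score_at_percentile(90, mean=80, sd=5)
print(f"z = {z:.2f}, raw score = {x:.1f}")
```

The exact z score is a hair above the table's 1.28 (the nearest table entry is .8997, not .9000 exactly), so the raw score agrees with the hand calculation of 86.4 to one decimal place.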
Try a couple on your own. What is the raw score for a percentile rank of 34? What is the percentile rank for a raw
score of 96? Draw it! Guess it! Outline it! Calculate it! Check with someone else in class to see if you arrived at
the correct answers. If you got the correct answers, then you just might be grad school material!
How Do I Find The Percentage Area of The Curve Above a Score?
Pleeeeeze. You're an expert now. This one is easy. To find the percentile rank of a score, we had to find the
percentage of the normal curve below the score. A related problem is to find the percentage of the curve above a
particular point.
How about an
example? Suppose we administer an IQ test and calculate
some statistics. The average for our sample is 100 and the
standard deviation is 16. The question is: What percentage of
the distribution is above a score of 120?
As with the previous section, the first order of business is to
draw the curve. Go ahead and do that.
The next order of business is to think about what is being
asked and make some guesses. Given the information in the
question and our drawing of the curve, we know that we're
dealing with the upper tail of the distribution. If we've labeled
our drawing correctly we can guess that about 10-15% of the
distribution is above a raw score of 120. Even if we didn't
know anything about the 68-95-99 Rule, we could still make this guess, so long as our normal curve is drawn to
scale.
What would the outline be for this problem?
X --> z --> Area
Now we can walk through it. As before, we have been given a raw
score (120) and must first convert this score into a z score so we can
use the normal curve table. Using z = (120 - 100) / 16, we see that a raw score of 120 has a
z score of 1.25 in this distribution. Because this is a positive z score,
we know that we will be looking at the part of the Z Table with
positive z scores depicted. After locating +1.25 in the Z Table, we
find that .8944 or 89.44% of the distribution lies below.
That's what it says in the table, but that's not what we were asked. We were asked to find the area above. How do
we do this? Simple. Just subtract 89.44% from 100%. That gives us a difference of 10.56%. In other words, we
conclude that 10.56% of the area in this distribution is above a raw score of 120. Let's see, our original guess was
10-15%. So, we were pretty close. And because our original guess was close, we can feel all the more confident
that we have worked out this problem correctly.
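If you'd like to double-check the table lookup, here is a minimal sketch that walks the same outline (X --> z --> Area) in Python. It assumes the standard library's statistics.NormalDist can stand in for the Z Table; the variable names are made up for this illustration and are not part of the chapter.

```python
from statistics import NormalDist

mean, sd = 100, 16   # IQ example from this section
score = 120

# Step 1: X --> z
z = (score - mean) / sd

# Step 2: z --> area below (this is what the Z Table reports)
area_below = NormalDist().cdf(z)

# Step 3: the areas above and below must add up to 1 (that is, 100%)
area_above = 1 - area_below

print(f"z = {z:.2f}")
print(f"area below = {area_below:.4f}")
print(f"area above = {area_above:.4f}")
```

Rounded to four decimal places, the two areas should agree with the table values used above (.8944 below and .1056 above).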
How do I Find The Percentage Frequency?
Another useful piece of information about the normal curve is that the percentage area under the curve is not all
that different from percentage frequency. Percentage frequency involves finding the total number of subjects
who have scores within a particular area of the curve. We do this by finding the percentage area in a particular
part of the curve and then figuring that percentage of the total sample size to find out how many subjects have
scores in that area.
We're really just answering a simple question: How many subjects have
scores within a particular area of the curve? This area could be above a particular point, below it or even between
two points.
For example, if we administer an IQ test to 250 randomly
selected individuals, we would expect 10.56% of them to score
120 or above. We answered that question in the last section. But
now for a new question: How many would score 120 or above?
The answer is 10.56% of 250, or 26.4 (realistically, 26)
people. Let's look at another example.
How many of our randomly selected 250 would we expect to
score above 80? As before, we draw it, guess it, outline it and
then calculate it. With only the information that we have and a
well-drawn and labeled curve, we might guess that 210 to 230
people in this distribution have an IQ above 80.
How about an outline for this one. Remember that we always start the outline with what we're given and finish it
with where we need to end up.
X --> z --> area --> Number
To use the Z Table, we first convert the IQ score of 80 into a z score. You might have already guessed that our z
score would be somewhere around -1.2 or -1.3 based on the drawing. Our calculations reveal that an IQ of 80 is
associated with a z score of -1.25 in this distribution:
z = (X - avg)/s = (80 - 100)/16 = -1.25
Now that we have a z score, we can go to the Z Table and look it up.
When we do, we see that -1.25 is associated with an area below of about .1056. That is, 10.56% of the
distribution is below a z score of -1.25. But remember, that's not what the question asked.
We need to find the percent and number of participants above an IQ of 80.
How do we do this? Simple. If 10.56% represents the area below a z score of -1.25, then the area above must
equal 89.44%. Right. The area above and below must equal 100%. So, all we did was subtract 10.56% from
100%.
Thus, we need to find 89.44% of 250. Using a calculator, we find that: 250 x 89.44% = 223.6. Another way to
perform this calculation is: 250 x .8944 = 223.6. In other words, we would expect about 224 people to score 80
or above.
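The outline X --> z --> area --> Number can be sketched as a small function. This is only an illustration under the same assumption as before (Python's statistics.NormalDist in place of the Z Table); the function name expected_count_above is hypothetical.

```python
from statistics import NormalDist

def expected_count_above(score, mean, sd, n):
    """Follow X --> z --> area above --> number of people."""
    z = (score - mean) / sd                 # X --> z
    area_above = 1 - NormalDist().cdf(z)    # z --> area above
    return area_above * n                   # area --> number

count = expected_count_above(score=80, mean=100, sd=16, n=250)
print(f"Expected number scoring above 80: {count:.1f}")
```

Because the function takes the score, mean, standard deviation and sample size as arguments, the same sketch can be reused for other cutoffs; it does not replace drawing and labeling the curve first.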
Did you have a hard time following that? Yes? Did you draw your curve and label it? I know, I know...you had a
picture right there in front of you. Why should you draw it? Here's an answer for you: So you don't have to take
this class again! Seriously, do you really want to have to go through all of this one more time? I thought not. So,
when I tell you to draw, guess, outline and calculate, go ahead and do that. Even if you don't quite understand
what you're doing, the simple act of doing it on your own helps tremendously.
Now, using the same data from the previous example, try a couple on your own. How many people would we
expect to score below an IQ of 105? How many people would we expect to score above an IQ of 105? Draw it!
Guess it! Outline it! Calculate it! Check with someone else in class to see if you arrived at the correct answers.
Or better yet, articulate your answers in the bulletin board below. If you got the correct answers, then you just
might be grad school material!
How do I find the Area and Percent Frequency between Two Scores?
We have learned how to find a percentage area below a score (percentile
rank of the score) and how to determine an area above a score. What
about determining a percentage area (or frequency) between two scores?
For example, suppose we have a random sample of 1000 individuals
addicted to The Simpsons TV show. Average number of episodes watched
per week is 100 with a standard deviation of 16. Suppose that we want
to offer special counseling to the group of subjects that watches between
90 and 120 episodes per week, but we need to know how many people
we would be dealing with. How many people watch between 90 and 120
episodes per week?
The problem, then, is to determine the area between the scores of 90
and 120 and to convert this area to a frequency based on N = 1,000. But
even though this problem is a little different from previous ones in this section, we still start in the same place:
draw the curve and label it. As you can see, this curve is a little more complex than the others
that we've seen so far.
This is because we will now have to figure out two z scores
instead of one. That's right. We will need to calculate z
scores for 90 and 120.
Before we get to the calculations, however, we need to
guess our answer and make an outline for the problem.
Let's see...a guess. Just looking at the space between 90
and 120, we might guess that 60-70% of the area is in that
space. Well, if 60-70% of the area is in that space, then we
would have to guess that between 600 and 700 people in
this group would watch between 90 and 120 episodes of The
Simpsons per week.
We've made our guess. There will be one outline for this
problem but we will have to go through it twice (once for
x=90 and once for x=120) to get all of the information that
we need. The subscripts indicate whether the outline is for the first or second x value.
X1 --> z1 --> Area1
X2 --> z2 --> Area2
Area2 - Area1 --> Number
The last step is to actually perform the calculations. First we will calculate the z scores. The z score for 90 is
z = (X - avg)/s = (90 - 100)/16 = -.625
which rounds to -.63. (Remember that when the last digit is 5 or more, we round up.) The z score for 120 is
z = (X - avg)/s = (120 - 100)/16 = 1.25
Now, look at the drawing above. Sheez! Homer looks pretty happy; well I guess he looks more manic. Anyhow, I
think that Homer's uncharacteristic jubilation arises from the fact that our two calculations seem to line up with our
drawing. That is, when we labeled the drawing, we might have guessed that our x value of 90 would be close to a
z score of -.625 and our x value of 120 would be close to 1.25. So far, so good.
Now that we have our z scores, we can go to the Z Table. Remember that the Z Table has a section for negative z
scores and another section for positive z scores. When we look up a z score of -.63, we see that the area below
that is .2643 or 26.43%. Looking up a z score of 1.25 reveals that .8944 or 89.44% of the distribution is below.
But our question didn't ask about area and number of people below these values. Our original question asked us to
deal with the area between scores. To find the area between two z scores, we just subtract the area below
(26.43%) for one z score (-.63) from the area below (89.44%) for the other (1.25). When we subtract 26.43%
from 89.44% we arrive at a difference of 63.01%. In other words, we estimate that about 63% of the area in this
curve is between the raw scores of 90 and 120 episodes viewed per week.
We're almost there. Look at our outline. One step to go. To finish this up, we need to figure out the total number
of individuals that we would expect to fall between 90 and 120 episodes viewed per week. All we have to do at this
point is find 63.01% of 1,000, which is our sample size. Piece of cake.
1,000 x 63.01% = 630.1, or 630
Thus, about 630 people would watch between 90 and 120 episodes per week. Wow! 630! That's a lot of people to provide
counseling for, but I think we're up to the challenge.
Note that drawing your curve really helps you to visualize the problem and enables you to see easily that we must
find the difference between the areas to arrive at the correct solution. No question about it: drawing curves will
help you understand these problems better and help you to avoid errors.
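Here is a rough sketch of the between-two-scores calculation, again assuming Python's statistics.NormalDist in place of the Z Table (variable names are illustrative only). It runs the numbers twice: once with z scores rounded to two decimals the way we do for a table lookup, and once with the unrounded z scores, so you can see how much the rounding matters.

```python
from statistics import NormalDist

mean, sd, n = 100, 16, 1000
low, high = 90, 120
std_normal = NormalDist()

# Table-style: z scores rounded to two decimals before the lookup.
# -.625 is entered by hand as -.63 to follow the "5 or more rounds up"
# rule; Python's built-in round() would give -0.62 here instead.
z_low_table, z_high_table = -0.63, 1.25
area_table = std_normal.cdf(z_high_table) - std_normal.cdf(z_low_table)

# Unrounded: use the z scores exactly as calculated.
z_low = (low - mean) / sd
z_high = (high - mean) / sd
area_exact = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"table-style area = {area_table:.4f}, people = {area_table * n:.0f}")
print(f"unrounded area   = {area_exact:.4f}, people = {area_exact * n:.0f}")
```

The two answers differ by only a couple of people out of 1,000, so rounding z to two decimals for the table costs very little. Either way, the result lands inside our 60-70% guess.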
How Do I Find Probability?
The normal distribution, as well as other distributions, is a probability distribution. Although we have been
discussing percentage areas, percentile ranks and percent frequencies up to this point, we could just as easily have
talked about probabilities. In fact, we can convert the percentage areas in the normal curve table to probabilities
by dividing % Area by 100.
For example, if the percentage area under the curve between 2 and 3 standard deviation units above the mean is 2.15,
then the probability of someone having a score in this area is: 2.15/100 = .0215.
Let's go back to a previous problem. We will use the data and curve from
the section on assessing area above a given point on the curve. We have administered an IQ assessment and have
found a sample average of 100 and a standard deviation of 16. Suppose that we want to determine the probability
that a person picked at random would have an IQ score of 120 or above. After drawing and labeling our curve, we
might guess that the probability is somewhere between .10 and .15, that is, between 10% and 15% probability.
What would the outline be for this problem?
X --> z --> Area --> Probability
Let's walk through it. We have a raw score of 120. According
to our outline, our next move is to calculate a z score. We
conduct our calculations as follows and come up with a
positive z score of 1.25.
z = (X - avg)/s = (120 - 100)/16 = 1.25
Because this is a positive z score, we know that we will be
looking at the part of the Z Table with positive z scores depicted. After locating +1.25 in the Z Table, we find that
.8944 or 89.44% of the distribution lies below.
That's what it says in the table, but that's not what we were asked. We were asked to find the area above. How do
we do this? Simple. Just subtract 89.44% from 100%. That gives us a difference of 10.56%. In other words,
10.56% of the distribution is above a raw score of 120.
To convert this area into a probability, you would perform this calculation: 10.56/100 = .1056. In other
words, if you randomly chose someone from the group tested, there is a .1056 probability that you would select
someone who has an IQ of 120 or higher. That's a fairly low probability.
Another way to move from percent area to probability is to simply use the word "probability." You don't really have
to divide the area by 100 and use a decimal place. Just take your area and say something like, "There is a 10.56%
probability of selecting someone with an IQ of 120 or higher." That works just fine. Moreover, we can see that this
conclusion is consistent with our drawing and our guess.
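One way to convince yourself that area and probability really are the same thing is to pick lots of people "at random" and count. The sketch below does that with simulated scores. It assumes IQ scores follow a normal curve with a mean of 100 and a standard deviation of 16, and it uses Python's random module with a fixed seed; the number of trials is an arbitrary choice for this illustration.

```python
import random

mean, sd = 100, 16
trials = 100_000
rng = random.Random(0)   # fixed seed so the run is repeatable

# Draw simulated IQ scores and count how many land at 120 or above.
hits = sum(1 for _ in range(trials) if rng.gauss(mean, sd) >= 120)
estimate = hits / trials

print(f"Simulated probability of an IQ of 120 or above: {estimate:.4f}")
```

The simulated proportion should come out close to the .1056 we read from the Z Table, though it won't match exactly because it is based on random draws.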
Try a couple on your own. What is the probability of someone scoring below a percentile rank of 63? What is the
probability of someone scoring above a raw score of 105? What is the probability of randomly selecting someone
who scored between 90 and 120? Draw it! Guess it! Outline it! Calculate it! Check with someone else in class
to see if you arrived at the correct answers. Or better yet, articulate your answers in the bulletin board below.
How Do I Find Deviant Scores and Their Probability?
How do you like that? We start off talking about normal and end up talking about "deviance." Circle of life I
suppose. Sometimes we're near the middle of the distribution, moving with the herd, while at other times we're off
doing our own thing.
What have we covered so far?
1. Finding percentile rank from a score
2. Finding a score based on percentile rank
3. Finding the area above a score
4. Finding the area between two scores
5. Finding the frequency of scores in a particular area
6. Finding the probability of scores in a particular area
In another type of problem, we might be asked to determine IQ scores so deviant or unlikely that they occur 10%
or less of the time. In a way, this is a trick question, because the direction of the deviance from the mean is not
stated. Because the direction is not specified, we are actually dealing with both ends of the normal curve and not
just the top or bottom tail. An appropriate outline would be:
Area --> z --> X
Notice in the figure
that because the direction is not stated,
the deviant 10% on the curve has been split into two equal
parts, with 5% in the upper tail and 5% in the lower tail.
Extremely high IQ scores are as unlikely as extremely low
ones. If we had not divided the 10% in half and had
instead put 10% at the upper end and 10% at the lower
end, then we would be cutting off the deviant 20% of the
distribution. Whenever we encounter a problem in which
the direction of the deviance is not given, we must halve
the given area before labeling both ends of the curve.
So what scores cut off the deviant upper and lower 5% of
the distribution? Based on our drawing, we might guess
somewhere around 70 to 75 on the low end and 125 to 130
on the high end.
The problem now becomes one of finding a z score when we know a percentage area under the curve and
converting this z score into a raw score. Since our z scores are cutting off the deviant 5% of the distribution on the
high and low ends, we know that the z scores will be mirror images of one another. That is, the values will be the
same, with one being positive and the other negative.
So we can look in either the negative or positive sections of the Z Table. If you look in the negative section, you
will be looking for the area that comes closest to 5%. In our Z Table, the areas that come closest are .0495 and
.0505. These areas are associated with z scores of -1.65 and -1.64, respectively. If we look in the positive section
of the table, we have to look for the area that comes closest to 95%. The areas that come closest to 95% in the
positive section are .9495 and .9505. These areas are associated with z scores of +1.64 and +1.65, respectively.
We've just confirmed that the 5% cutoff point on the high end is exactly as far from the mean as the one on the low end.
For our example, let's take positive and negative 1.64 as the z scores cutting off the deviant 10% of the
distribution. The final step in the problem is to convert these two z scores into raw scores. We will use the formula
that allows us to convert z scores to raw scores.
X = zs + avg = (±1.64)(16) + 100 = 100 ± 26.24 = 73.76 and 126.24
Thus, the IQ scores that are so unlikely that they occur 10% or less of the time are those less than 74 and greater
than 126.
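The outline Area --> z --> X can also be run in code. The sketch below is one way to do it, assuming Python's statistics.NormalDist and its inv_cdf method as a substitute for reading the Z Table backwards; the variable names are hypothetical. Note that inv_cdf returns an unrounded z score (about 1.645), where we settled on 1.64 from the table.

```python
from statistics import NormalDist

mean, sd = 100, 16
total_deviant_area = 0.10            # "occur 10% or less of the time"
tail_area = total_deviant_area / 2   # direction not stated, so halve it

# Area --> z: the z score with 95% of the curve below it.
z_cut = NormalDist().inv_cdf(1 - tail_area)

# z --> X, using X = zs + avg for both tails.
low_score = -z_cut * sd + mean
high_score = z_cut * sd + mean

print(f"z cutoffs: -{z_cut:.3f} and +{z_cut:.3f}")
print(f"raw-score cutoffs: {low_score:.2f} and {high_score:.2f}")
```

The raw-score cutoffs differ slightly from 73.76 and 126.24 because of the unrounded z score, but they round to the same 74 and 126.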
The problem of finding deviant scores that we have been considering can be related back to probability. For
example, when we asked, "What scores are so deviant that they occur 10% or less of the time?" we could have
as easily asked, "What scores are so deviant that their probability of occurrence is .10 or less?" To answer this
question we would have followed the same steps as above, but instead of thinking about the Z Table in terms of
area, we would have thought about it in terms of probability. And as we've already figured out, area and
probability are almost one and the same idea when we're talking about curves.
Self Test
Self-test for chapter (normal_test.pdf)
Answers to self-test (normal_answers.pdf)
Some content adapted from others' work. See home page for specifics.
LAST UPDATED: 2015-09-24 5:32 PM
DEREK BORMAN: PSYCHOLOGICAL SCIENCE