Download Excel Basics – Finding areas under the normal distribution

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Excel Basics – Finding areas under the normal distribution.
Excel has some very useful functions for finding areas under the normal
distribution.
NORMSDIST(z)
Z is the value for which you want the distribution.
Returns the standard normal cumulative distribution function. The
distribution has a mean of 0 (zero) and a standard deviation of one. Use this
function in place of a table of standard normal curve areas.
1) a) Pick a cell and enter a z score into it (for example 2), don’t forget to
add a label so you’ll know what you put in this cell.
b) In a cell next to it, enter the function NORMSDIST(Z), use the
address of the cell where you placed the z score as your z value. What
did you get?
(if you used z=2, you should get an area of 0.97724 or in other words,
more than 97% of the population have scores lower then your z. Try
other values of z in order to get a better feeling for the use of this
function, for example 0,1,5,-1,-3)
NORMSINV(probability)
Probability is a probability corresponding to the normal distribution.
Returns the inverse of the standard normal cumulative distribution. The
distribution has a mean of zero and a standard deviation of one.
NORMSINV will return a z score that corresponds to an area under the
curve. The area should be between 0 and 1.
2) a) Pick a cell and enter a probability into it (for example 0.975), don’t
forget to add a label so you’ll know what you put in this cell.
b) In a cell next to it, enter the function NORMSINV(probability), use
the address of the cell where you placed the probability. What did you
get?
(if you used p=0.975, you should get a z score of 1.95996. What
would you get if you used a p = 0.97724 (you should get a value close
to 2, your z from #1) Try other values of p in order to get a better
feeling for the use of this function, for example 0.5,0.99.
In real life, we usually deal with normal distributions that are not
standardized, so they are not expressed in z scores. Excel has several
functions that will let you compute areas under the curve directly from your
scores without standardizing them first.
NORMDIST(x,mean,standard_dev,cumulative)
X is the value for which you want the distribution.
Mean is the arithmetic mean of the distribution.
Standard_dev is the standard deviation of the distribution.
Cumulative is a logical value that determines the form of the function. If
cumulative is TRUE, NORMDIST returns the cumulative distribution
function; if FALSE, it returns the probability mass function. IN THIS
EXERCISE USE “TRUE” SINCE YOU WANT THE AREA UNDER
THE CURVE.
3) a) Enter your score, mean, S.D to different cell in Excel. Don’t
forget to add a label so you’ll know what you put in this cell, for
example use, x=102,m=100,sd = 2. You can also enter the word
TRUE into a cell so you can use it in the function.
b) In another cell enter the function
NORMDIST(x,mean,SD,cumulative), use the address of the cells
where you placed the x,mean,sd,TRUE. What did you get?
(if you used the sample values, you should get an area of 0.8413. Try
other values of x, m and s.d in order to get a better feeling for the use
of this function.
NORMINV(probability,mean,standard_dev)
Probability is a probability corresponding to the normal distribution.
Mean is the arithmetic mean of the distribution.
Standard_dev is the standard deviation of the distribution.
Returns the inverse of the normal cumulative distribution for the specified
mean and standard deviation.
4) a) Type into different cells a probability,mean and S.D (for
example p = 0.84134, m =100, s.d = 2), don’t forget to add a label so
you’ll know what you put in this cell.
b) In a cell next to it, enter the function
NORMINV(probability,mean,s.d), use the addresses of the cells
where you placed the probability, mean and S.D. What did you get?
(if you used the sample values, you should get a score close to 102
which was the x value from #3. This means that 84.135% of the
population have a score below 102 in a population with that is
normally distributed with a mean of 100 and S.D of 2. Try other
values of p, m and S.D in order to get a better feeling for the use of
this function.
Questions about the normal distribution often ask you to calculate
the area under the curve between two scores or the probability that
a score would turn out to be between two scores. The following
exercise shows you how to calculate those values easily.
5) Enter into cell A16 “M”, B16 “SD”, C16 “X1”, D16 “X2”, E16 “Z1”, F16
“Z2”,G16 “F(Z1)”, H16 “F(Z2)”, I16 “p”.
Enter the values 100,2,96,104 below m, S.D, x1, x2 respectively. These
values are usually given to you in questions. Below Z1 we will calculate the
standard score of X1. There are two ways to do this, either using the formula
we learned in class (X-)/σ or using the excel function STANDARDIZE(x,
mean, S.D). Choose either one of them.
Calculate Z2 in the same way. If you used the sample values you should get
z scores of –2,2.
Given the Z scores in E17 and F17, which Excel function would you use to
calculate the area under the curve? (NORMSDIST). Calculate F(Z1) and
F(Z2) using NORMSDIST, you should get areas of 0.023,0.977.
In order to calculate the area between these two scores, or the probability
that a score would fall between X1 and X2, calculate the difference between
F(Z2) and F(Z1) in cell I17. (H17-G17) You should get a value of 0.954 so
there is 95.4 chance that a given score would fall between 96 and 104 in our
distribution.
This could be very useful when working on the homework problems. For
example, see question 4 in chapter 6.
We are given the mean height of a Merkin plant m=65 and the S.D = 3.
A) What is the probability that the plant will be between 64 and 67 inches
tall?
D) What is the probability that the plant will be less than 40 inches tall? I
would use NORMDIST to answer this.
E) What is the probability that the plant will be more than 60 inches tall?
You can use NORMDIST to calculate the probability that the plant will
be less than 60 inches and then get the probability that the plant will be
more than 60 inches tall using the fact that the total area under the curve
is equal to 1.
Question:
What is the z score that corresponds to Alpha = 0.05, or in other words: if
we set up alpha to be 0.05 what is the z score that anything above it would
cause us to reject the null hypothesis?
Enter 0.05 into a cell in Excel, label it Desired alpha. In another cell,
calculate the bottom portion of the distribution. In other words, if your alpha
is set to 0.05, then 0.95 of the population is below is so the bottom portion is
equal to 1- desired alpha.
Now use the function NORMSINV(p) to calculate the z score that
corresponds to this alpha. Your p is the bottom portion and you should get a
z = 1.645, you will see this value many times in the next few weeks.
As you will soon learn in class, hypothesis testing can be either nondirectional or directional. If we divide the distribution to a bottom portion
and the region above alpha we are using a directional hypothesis and
predicting that our effect will be found in the upper portion of the curve.
Sometime we don’t know where we will find an effect so we use a non
directional test. In that case, an alpha of 0.05 should be divided to 2 so that
we place 0.025 on one end of the curve and 0.025 on the other side of the
curve.
Enter your desired alpha into a cell in Excel, the upper tail in this case will
be alpha/2. Calculate this value in another cell.
As before, the bottom portion is 1-upper_tail, the only difference is that now
the upper tail is equal to alpha/2. Calculate the bottom portion in a different
cell, you should get 0.975.
You can now calculate the z score that corresponds to the bottom portion
using NORMSINV(p). You should get z=1.96. Because of symmetry
reasons (the standardized normal distribution is symmetric around 0) the z
score that corresponds to the upper portion is equal to –z or –1.96. You can
also get this value by using NORMSINV(upper tail) or NORMSINV(0.025).
Don’t worry if this is not entirely clear, the class should clear up any
confusion.