Statistics 215 Lab Materials
Gaussian (or Normal) Random Variable
In this section we introduce the Gaussian Random Variable, which is more commonly referred to as the
Normal Random Variable. This is the random variable that has a bell-shaped curve as its probability
density function. This is pictured below.
The Normal distribution or a Normal random variable has nothing truly “normal” about it. That is to say,
there is nothing abnormal about other random variables. The Normal distribution does, however, arise more
frequently than other distributions. There are two settings in which it occurs quite frequently. The first of
these is biological. The Normal distribution seems to arise when numerous quantities are added together.
This often happens in biology when large amounts of genetic material combine in a particular trait, e.g.
heights or lengths. The other setting where the Normal is often observed is the psychological setting. As
with heights and lengths, this is thought to be the result of many genetic factors combining. For example,
IQ measurements are often modeled as having a Normal distribution. More specific examples of Normal
RVs include: lengths of newborn male piglets, heights of female peacocks, lengths of 2 inch nails, and scores
on the Stanford-Binet psychological test.
The Normal distribution has the following characteristics. Its range is the entire number line. We can
completely determine any Normal distribution by knowing its mean and its standard deviation. The
distribution is symmetric about the mean. Consequently the mean and the median are the same number.
The mean tells us where the center of the distribution is and the standard deviation tells us how dispersed or
spread out the distribution is. The Normal distribution is used so commonly that we have special notation
for it. The reason that the Normal distribution is so widely used is that it is an
extremely flexible distribution.
Notation: X ~ N(5,4) is read “X is a RV with a Normal distribution with mean 5 and variance 4”. In
general, the notation Y ~ N(µy, σy²) is read “Y is a Normal random variable with mean µy and variance
σy².”
As with other continuous RVs, the Normal distribution uses area to determine probability. However, the
Normal has a special feature that separates it from other distributions: all that is needed to find a
particular probability is the z-score corresponding to the cutoff of interest. That is, if we want to know
P(X<7) for a Normal RV X, what we need to know is the z-score for 7. Recall that the z-score for 7 would be
z = (7 − µx) / σx
which depends on the values for the mean and the standard deviation. One result of this is that the
probability of being more than 2 standard deviations above the mean is the same whether the mean is 75 or
75,000 and whether the standard deviation is 2 or 200. As a consequence the z-score plays an indispensable
role in calculating probabilities from a Normal distribution. Recall that the z-score of a value c is the
number of standard deviations c is above or below the mean.
Because of the role that the z-score plays, we specify a random variable Z to have a Normal distribution
with mean 0 and standard deviation 1. Z is often referred to as a “standard Normal” random variable. The
reason for this specification is that by taking the z-score all Normal random variables can be transformed
into having mean 0 and standard deviation 1. The consequence is that we use the z-score (and hence the
standard Normal distribution) to find Normal probabilities. Thus if X is a Normal random variable with
mean 85 and standard deviation 5, then P(X>90) = P(Z>(90 − 85)/5) = P(Z>1.0). This is because we can
transform the variable X to the variable Z, and the z-score for 90 is 1.0. The same holds for any
calculation that we do with Normal random variables. We transform to Z and use Z for our probabilities.
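The transformation to Z can be checked numerically. The following is a minimal sketch and not part of the lab materials: it assumes Python 3.8 or later, whose standard library provides statistics.NormalDist, and the helper name z_score is our own.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

Z = NormalDist(mu=0, sigma=1)   # the standard Normal random variable
X = NormalDist(mu=85, sigma=5)  # the example from the text

# P(X > 90) computed directly, and again after transforming to Z
p_direct = 1 - X.cdf(90)
p_via_z = 1 - Z.cdf(z_score(90, mean=85, sd=5))

# Being more than 2 standard deviations above the mean has the same
# probability whatever the mean and standard deviation are.
p_small = 1 - NormalDist(75, 2).cdf(75 + 2 * 2)
p_large = 1 - NormalDist(75_000, 200).cdf(75_000 + 2 * 200)

print(round(p_direct, 4), round(p_via_z, 4))
print(round(p_small, 4), round(p_large, 4))
```

Each printed pair should agree, which is the reason a single table for Z is enough.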
Calculating Normal Probabilities
There are three steps to calculating a Normal probability.
1. Find the z-score for the value of interest.
2. Determine the appropriate formula for calculating the probability.
3. Use that z-score to find the probability using Table 3.
Example:
If X is a Normal RV with mean 5 and standard deviation 2, find the z-score for x = 4.
The z-score for x = 4 is (4 − 5)/2 = -1/2 = -0.5. Consequently, the value x = 4 is one-half of a
standard deviation below the mean, since z = -0.5. Then P(X>4) = P(Z>-0.5).
Example:
If X is a Normal RV with mean 5 and standard deviation 2, find the z-score for 8.4.
The z-score for 8.4 is (8.4 − 5)/2 = 3.4/2 = 1.7. Consequently, the value 8.4 is 1.7 standard
deviations above the mean, since z = 1.7. Then P(X<8.4) = P(Z<1.7).
Example:
If H ~ N(142, 3.5²), find the z-score for 150.
The z-score for 150 is (150 − 142)/3.5 = 8/3.5 = 2.29. Consequently, the value 150 is 2.29 standard
deviations above the mean, since z = 2.29. Then P(H>150) = P(Z>2.29).
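The three z-scores above can be reproduced with a few lines of code. This is only an illustrative sketch; the function name z_score and the list layout are our own choices, not something defined in the lab materials.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from the mean."""
    return (value - mean) / sd

# (value, mean, standard deviation) for the three examples above.
# H ~ N(142, 3.5^2) lists the variance, so its standard deviation is 3.5.
examples = [(4, 5, 2), (8.4, 5, 2), (150, 142, 3.5)]

# Round to two decimal places, as the printed table requires
z_values = [round(z_score(v, m, s), 2) for v, m, s in examples]

for (v, m, s), z in zip(examples, z_values):
    side = "above" if z > 0 else "below"
    print(f"{v} is {abs(z)} standard deviations {side} the mean {m} (z = {z})")
```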
Having found the z-score we need to determine the appropriate formula for calculating the probability of
interest. The reason that we do this is the structure of Table D.3(a), which we will use for calculation. This
table only gives “less than” probabilities, P(Z<z), for positive z-scores. However, we are often
interested in probabilities that involve negative z-scores or in “greater than” probabilities. Assume that
we are interested in a Normal random variable X with mean 70 and standard deviation 10. Then
P(X<80) = P(Z<(80 − 70)/10) = P(Z<1.0). This is an example of a less than probability with a
positive z-score. If instead we wanted P(X>80) = P(Z>1.0), this is an example of a greater than
probability with a positive z-score. If we wanted to know P(X>60) = P(Z>(60 − 70)/10) = P(Z>-1.0), this is an
example of a greater than probability with a negative z-score. Finally, if we need to calculate P(X<60) =
P(Z<-1.0), this is an example of a less than probability with a negative z-score.
Table 3 contains probabilities such as P(Z<z). Consequently, we need rules to work other probabilities into
this format. This is similar to the rules that were used for the binomial and Poisson tables to get
probabilities other than P(X≤r).
What we want               Calculation we need to perform    Example
P(Z<z) with z positive     P(Z<z)                            P(Z<1.42)
P(Z>z) with z positive     P(Z>z) = 1-P(Z<z)                 P(Z>1.42) = 1-P(Z<1.42)
P(Z<z) with z negative     P(Z<z) = 1-P(Z<-z) *              P(Z<-1.42) = 1-P(Z<1.42)
P(Z>z) with z negative     P(Z>z) = P(Z<-z) *                P(Z>-1.42) = P(Z<1.42)
*Recall that the negative of a negative is a positive.
These rules stem from two basic facts. First, the symmetry of the Normal distribution means that
P(Z>z) = P(Z<-z). Since z and -z are the same distance from the mean of zero, symmetry says these
probabilities must be the same. The other fact that is used is the complement rule, which says that P(Z>z)
= 1- P(Z<z). Combining these facts we get the above table of rules.
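The rules can be written as two small functions that only ever look up P(Z<z) for a non-negative z, as the printed table requires. This is a sketch under assumptions: table_lookup stands in for the printed table and uses statistics.NormalDist from the Python standard library (3.8 or later), and the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

def table_lookup(z):
    """Stand-in for the printed table: P(Z < z) for non-negative z only."""
    if z < 0:
        raise ValueError("the table only lists non-negative z-scores")
    return Z.cdf(z)

def prob_less(z):
    """P(Z < z) for any z, using only table look-ups."""
    if z >= 0:
        return table_lookup(z)
    return 1 - table_lookup(-z)      # symmetry, then the complement rule

def prob_greater(z):
    """P(Z > z) for any z, using only table look-ups."""
    if z >= 0:
        return 1 - table_lookup(z)   # complement rule
    return table_lookup(-z)          # symmetry

for z in (1.42, -1.42):
    print(z, round(prob_less(z), 4), round(prob_greater(z), 4))
```

For every z the two results add to 1, which is a quick way to check that the rules were applied consistently.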
Finally, the last step is using Table 3. Suppose we want to find P(Z<1.48). The first step is to take the
z-score through the tenths place, 1.4, and find it in the first column. Then go across that row to the
column labeled 0.08. The entry in the table is 0.9306, so P(Z<1.48) = 0.9306.
To find P(Z<0.85), first find 0.8 in the first column of the table. Then go across that row to the
column for 0.05. The entry in the table is 0.8023, so P(Z<0.85) = 0.8023.
To find P(Z<2.11), again find 2.1 in the first column of the table and go across that row to
the column for 0.01. The value in the table is 0.9826, so P(Z<2.11) = 0.9826.
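The row-and-column look-up can be mimicked in code. The sketch below assumes the table is laid out as described above (rows by tenths, columns by hundredths, entries rounded to four decimal places) and computes each entry with statistics.NormalDist rather than reading a printed table. The function name table_lookup is our own.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def table_lookup(z):
    """Return (row label, column label, table entry) for a z-score >= 0."""
    hundredths = round(z * 100)            # e.g. 1.48 -> 148
    tenths, col = divmod(hundredths, 10)   # 148 -> row 14 (1.4), column 8 (0.08)
    row_label = tenths / 10
    col_label = col / 100
    entry = round(Z.cdf(hundredths / 100), 4)
    return row_label, col_label, entry

for z in (1.48, 0.85, 2.11):
    row, col, p = table_lookup(z)
    print(f"z = {z}: row {row}, column {col}, P(Z<{z}) = {p}")
```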
The following examples combine all these steps.
Example
Suppose that X is a Normal random variable with mean 100 and standard deviation 7.5.
Find P(X < 110).
P(X<110) = P(Z<(110 − 100)/7.5) = P(Z<1.33) = 0.9082. We can look P(Z<1.33) up directly in the table.
Find P(X > 120).
P(X>120) = P(Z>(120 − 100)/7.5) = P(Z>2.67) = (by complementary events) = 1- P(Z<2.67) = 1- 0.9962 =
0.0038.
Find P(X > 93).
P(X>93) = P(Z>(93 − 100)/7.5) = P(Z>-0.93) = (by symmetry of the Normal distribution) = P(Z<0.93) =
0.8238.
Find P(X < 84).
P(X<84) = P(Z<(84 − 100)/7.5) = P(Z<-2.13) = (by symmetry of the Normal distribution) = P(Z>2.13) = (by
complementary events) = 1-P(Z<2.13) = 1- 0.9834 = 0.0166.
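The four calculations can be repeated in code by following the same three steps. This is a sketch, not the lab's method: table_prob_less is a hypothetical stand-in for the printed table that rounds the z-score to two places and the probability to four, using statistics.NormalDist from the Python standard library.

```python
from statistics import NormalDist

Z = NormalDist()       # standard Normal
mean, sd = 100, 7.5    # the Normal RV X from the example

def table_prob_less(z):
    """P(Z < z) as a table would give it: z to two places, result to four."""
    return round(Z.cdf(round(z, 2)), 4)

def z_score(value):
    return (value - mean) / sd

# P(X < 110): positive z, look up directly
p_lt_110 = table_prob_less(z_score(110))

# P(X > 120): positive z, complement rule
p_gt_120 = round(1 - table_prob_less(z_score(120)), 4)

# P(X > 93): negative z, symmetry
p_gt_93 = table_prob_less(-z_score(93))

# P(X < 84): negative z, symmetry then complement rule
p_lt_84 = round(1 - table_prob_less(-z_score(84)), 4)

print(p_lt_110, p_gt_120, p_gt_93, p_lt_84)
```

Because the z-scores are rounded to two places first, the results match a table look-up and can differ in the third or fourth decimal place from an unrounded calculation.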
TIP: Since Table 3 uses only two decimal places for z-scores, round all z-scores to two decimal places
when using this table.
TIP: It is common to refer to a random variable by the name of the random variable or by the distribution.
They are interchangeable. Since any RV is defined by its distribution, this usage is appropriate, though it
often confuses people the first time they see or hear this.
TIP: It is often helpful when doing calculations with Normal probabilities to draw a picture to get an idea
about the quality of your final answer. If the answer conflicts with the picture then you may need to
reconsider your calculations. The first step in this is to draw a bell-shaped curve. Draw a vertical line
down the center and label it with the value of the mean. Over 99% of the Normal distribution is within 3
standard deviations of the mean. So go to the right edge of your curve and label it with the value of the
mean plus three times the standard deviation. Go to the left edge and label it with the value of the mean
minus three times the standard deviation. Then shade the area for the probability that you are interested in.
Example:
Suppose X is a Normal random variable with mean 120 and standard deviation 7. Find P(X>125).
[Figure: bell-shaped curve with 99, 120, and 141 labeled along the horizontal axis.]
We use 120 for the center since it is the mean. The values 141 and 99 are 120 + 3*7 and 120-3*7, which
are 3 standard deviations above and below the mean, respectively.
P(X>125) = P(Z>(125 − 120)/7) = P(Z>0.71) = 1-P(Z<0.71) = 1- 0.7611 = 0.2389. Given the accuracy of
the picture it seems reasonable that the probability should be around 24%. We would have been nervous
had the answer we calculated been more than 50 % or less than 2%. Drawing a picture is a nice check for
gross errors in calculation.
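The labels for the picture and the rough check can also be computed. The sketch below is illustrative only and assumes Python 3.8 or later for statistics.NormalDist. It works with the unrounded z-score, so its answer differs slightly from the table-based 0.2389.

```python
from statistics import NormalDist

mean, sd = 120, 7
X = NormalDist(mean, sd)

# Labels for the edges of the picture: 3 standard deviations either side
left_edge = mean - 3 * sd     # 99
right_edge = mean + 3 * sd    # 141

# Share of the distribution that lies between the two edge labels
within_3_sd = X.cdf(right_edge) - X.cdf(left_edge)

# The probability of interest, without rounding the z-score
p = 1 - X.cdf(125)

# Gross-error check suggested by the picture: 125 is to the right of the
# mean, so the shaded area must be less than half.
looks_reasonable = 0 < p < 0.5

print(left_edge, right_edge, round(within_3_sd, 4), round(p, 4), looks_reasonable)
```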
7.4 Percentiles of the Normal distribution
Back in Chapter 4 we discussed percentiles for data. For example the 80th percentile is the point in the
distribution where 80% of the data or 80% of the probabilities are below that point (and consequently 20%
are above that point). We often want to calculate percentiles for a specific distribution or set of data. For
example, if I want to build a cage that 98% of frogs will be comfortable in, then I need to know the 98th
percentile of frog sizes. An admissions officer might only want to accept students who are in the top 20%
of all scores on some standardized test. In that case the admissions officer would need to know the 80th
percentile of scores on that test and would accept only those students whose test scores were above the
80th percentile.
To find percentiles for the Normal distribution, we reverse the process from the previous section. In the
previous section we had a value and we were looking for a probability or a percentage. For example, in the
previous section we wanted P(X>182) = c and we found what c was. In this section, we’ll have P(X>k) =
0.7500, say, and we’ll have to find k. Here, we have the percentage and we want to find the value that
would give us that percentage. Consequently, we’ll reverse the steps we took in the previous section.
Suppose that we want to find the 75th percentile of a Normal distribution with mean 430 and standard
deviation 22. Let X be a Normal RV with mean 430 and standard deviation 22. Then we want to find a
value k, such that P(X<k) = 0.7500. Likewise there exists a z-score for k, call it zk, such that P(Z<zk) =
0.7500. Now we can find zk by going into the body of Table 3 and finding 0.7500. Inside the body of the
table we find the closest percentage to 0.7500. That percentage is 0.7486. This probability corresponds to
a z-score of 0.67. To find the z-score go to the top of the column and the left of the row. Thus, zk = 0.67.
This is the z-score for k, but we need to convert that back to k. Now zk = (k − 430)/22, so
0.67 = (k − 430)/22.
Solving for k gives us k = 430 + 0.67*(22) = 444.74. So the 75th percentile of a Normal distribution with
mean 430 and standard deviation 22 is approximately 444.74.
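For comparison, the percentile can also be computed without the table. This sketch assumes Python 3.8 or later, where statistics.NormalDist offers an inv_cdf method. It shows why the table answer is described as approximate: the table forces us to use the closest listed z-score.

```python
from statistics import NormalDist

X = NormalDist(mu=430, sigma=22)

# Table-based answer from the text: closest entry to 0.7500 gives z = 0.67
z_table = 0.67
k_table = 430 + z_table * 22

# Unrounded answer: invert the distribution directly
z_exact = NormalDist().inv_cdf(0.75)   # z-score with P(Z < z) = 0.75
k_exact = X.inv_cdf(0.75)              # value k with P(X < k) = 0.75

print(round(k_table, 2), round(z_exact, 4), round(k_exact, 2))
```

The two values of k agree to within about a tenth, which is the price of rounding the z-score to two decimal places.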
Finding the j*100th percentile, k, of a Normal random variable X with mean µx and standard deviation σx.
1. In the body of Table D.3(a) find j (or the value closest to j).
2. Find the z-score that corresponds to that entry, call it zk.
3. Using the formula for the z-score, zk = (k − µx)/σx, solve for k: k = µx + zk*σx.
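The three steps can be sketched in code by rebuilding the body of the table and searching it for the closest entry. Several things here are assumptions rather than facts about the printed table: that it covers z = 0.00 to 3.49 in steps of 0.01, that entries are rounded to four decimal places, and the names TABLE and percentile_from_table, which are our own.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal

# Assumed table body: P(Z < z) for z = 0.00, 0.01, ..., 3.49, to four places
TABLE = {h / 100: round(Z.cdf(h / 100), 4) for h in range(350)}

def percentile_from_table(j, mean, sd):
    """Return (zk, k) for the j*100th percentile, with j between 0.5 and 1."""
    # Steps 1 and 2: find the body entry closest to j and read off its z-score
    z_k = min(TABLE, key=lambda z: abs(TABLE[z] - j))
    # Step 3: solve zk = (k - mean) / sd for k
    k = mean + z_k * sd
    return z_k, k

# The 75th percentile of a Normal distribution with mean 430 and sd 22
z_k, k = percentile_from_table(0.75, mean=430, sd=22)
print(z_k, round(k, 2))
```

Percentiles below the 50th would need the symmetry rule first, since this table body only covers non-negative z-scores.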
Example:
For a Normal random variable X with mean 45 and standard deviation 6, i.e. X ~ N(45, 6²), find the 92nd
percentile.
1. In Table 3, the closest value to 0.9200 is 0.9207.
2. zk = 1.41
3. 1.41 = (k − 45)/6, then k = 45 + 1.41*6 = 53.46.
So the 92nd percentile is 53.46.
Example:
For a Normal RV Y with mean 76 and standard deviation 3, i.e. Y ~ N(76, 3²), find the 97th percentile.
1. In Table D.3(a), the closest value to 0.9700 is 0.9699.
2. zk = 1.88
3. 1.88 = (k − 76)/3, then k = 76 + 1.88*3 = 81.64.
So the 97th percentile is 81.64.