Chapter 6
Putting Statistics to Work: the Normal Distribution
Productive inference from sample to population requires that the appropriate statistic be
used to characterize the probabilities associated with the distributions of interest.
Because there are hypothetically an infinite number of means and an infinite number of
standard deviations describing potential distributions, we face a problem: we do not have
an infinite number of statistical procedures to deal with every possible distribution. Does
this mean that we cannot use statistics to analyze the vast majority of our data?
Thankfully, no. While each distribution is unique, most
distributions can be grouped with other distributions that share important characteristics.
These groups of similar distributions can be further characterized by an 'ideal' (i.e.,
theoretical) distribution that typifies the important characteristics. Statistics applicable to
the entire group of similar distributions can then be developed based upon our knowledge
of the ideal distribution. Perhaps the most important ideal distribution used is the
'normal' distribution (Figure 6.1). Once one understands the characteristics of the normal
distribution, knowledge of other distributions is easily obtained.
Figure 6.1. The normal distribution.
Most people are familiar with the normal distribution described as a “bell-shaped curve,”
perhaps as a scale for grading. The bell-shaped curve is nothing but a special case of the
normal distribution; the words “bell-shaped” describe the general shape of the
distribution, and the word “curve” is used as a synonym for distribution. While we
generally refer to the normal distribution, there are really many different normal
distributions. In fact, there are as many different normal distributions as there are possible
means and standard deviations, both theoretically and in the real world.
However, all of these normal distributions share five characteristics.
1. Symmetry. If divided into left and right halves, each half is a mirror image of the other.
2. The maximum height of the distribution is at the mean. One of the consequences of
this stipulation and number 1 above is that the mean, the mode and the median have
identical values.
3. The area under a normal distribution sums to unity. This characteristic is simpler than
it sounds but is important because of how we use the normal distribution. Areas within
the theoretical distribution as a geometric form represent probabilities of events that
range from 0 to 1 (i.e., 0% to 100%). The phrase ‘sums to unity’ means that all of the
probabilities represented by the area under the normal distribution sum to 1 and thus
represent all possible outcomes. Each half of the symmetrical distribution of which the
mean is the center represents half (.5) of the probabilities.
4. Normal distributions are theoretically asymptotic at both ends, or tails, of the
distribution. If we were to follow a point along the slope of the curve toward the tail to
infinity, the point would incrementally become ever closer to zero without ever quite
reaching it. This aspect of the normal distribution is necessary because we need to
consider every possible variate to infinity. Put another way, every single possible variate
can be assigned some probability of occurring, even if it is astronomically small.
5. The distribution of means of multiple samples from a normal distribution will have a
tendency to be normally distributed. Considering this commonality among normal
distributions requires thinking about means somewhat differently. As you know, means
characterize groupings of variates. In this special context we need to consider calculating
individual means on repeated samples, and plotting these means as variates that
collectively create a new distribution that is composed of means. Accordingly, this new
distribution has a tendency to be normally distributed. This issue will be further discussed
in Chapter 7.
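To make characteristic 5 concrete, the following is a minimal sketch, assuming Python with the numpy library is available; the values chosen for µ, σ, and the sample sizes are arbitrary illustrative choices, not figures from the text. It draws repeated samples from a normal distribution and examines the distribution of their means.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator for reproducibility

# Arbitrary illustrative values for the population and sampling scheme
mu, sigma = 5.7, 0.48
n, n_samples = 30, 1000

# Draw 1,000 samples of n = 30 and record each sample's mean
sample_means = np.array([rng.normal(mu, sigma, n).mean()
                         for _ in range(n_samples)])

# The means cluster around mu and are themselves approximately
# normally distributed (a point taken up again in Chapter 7)
print(f"mean of the sample means: {sample_means.mean():.3f}")
print(f"standard deviation of the sample means: {sample_means.std():.3f}")
```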
With these commonalities in mind, let us further consider some of the differences among
normal distributions. First, as Figures 3.17, 3.18 and 3.19 show, normal distributions may
be conceptualized as leptokurtic, platykurtic, or mesokurtic. Additionally, any
combination of means and standard deviations is possible, and there is no necessary
relationship between the mean and the standard deviation for any given distribution.
Normal distributions may have different means and the same standard deviation (Figure
6.2) or the same means and different standard deviations (Figure 6.3).
Figure 6.2. Two normal distributions with different means and the same standard
deviation.
Figure 6.3. Two normal distributions with the same mean and different standard
deviations.
If σ is large, variates are generally far from the mean. If σ is small, most variates are
relatively close to the mean. Regardless of the standard deviation, variates near the mean
in a normal distribution are more common, and therefore more probable, than variates in
the tails of the distribution. One of the most useful aspects of normal distributions is that
regardless of the value of σ or µ (Figure 6.4):
µ ± 1 σ contains 68.26% of all variates
µ ± 2 σ contains 95.44% of all variates
µ ± 3 σ contains 99.74% of all variates
Figure 6.4. Percentages of variates within 1, 2, and 3 standard deviations from µ .
It is also possible to express this relationship in terms of more commonly used
percentages. For example:
50% of all items fall between µ ± .674 σ
95% of all items fall between µ ± 1.96 σ
99% of all items fall between µ ± 2.58 σ
If µ ± 1 σ contains 68.26% of all variates, µ ± 2 σ contains 95.44% of all variates, and µ
± 3 σ contains 99.74% of all variates (Figure 6.4), we know that any values beyond µ ±
2 σ are rare events, expected fewer than 5 times in 100, and values beyond µ ± 3 σ are
rarer still, expected less than 1 time in 100.
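These percentages can be verified numerically. The sketch below assumes Python with the scipy library is available; it uses the standard normal cumulative distribution function (cdf) and its inverse (ppf) to recover both sets of figures. Small differences in the last decimal place reflect rounding in the tabled values.

```python
from scipy.stats import norm

# Coverage within k standard deviations of the mean:
# P(mu - k*sigma < Y < mu + k*sigma) = cdf(k) - cdf(-k)
for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within mu +/- {k} sigma: {coverage:.4f}")  # .6827, .9545, .9973

# The inverse problem: how many sigmas capture a given central
# percentage? (ppf is the inverse of cdf)
for p in (0.50, 0.95, 0.99):
    k = norm.ppf((1 + p) / 2)
    print(f"{p:.0%} of variates fall within mu +/- {k:.3f} sigma")  # .674, 1.960, 2.576
```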
This characteristic of the normal distribution allows us to consider the probability of
individual variates occurring within a geometric space under the distribution. As the
probability space (i.e., the sum of the area of probability we are considering) under the
normal distribution = 1.0, we know that the percentages mentioned above may be
converted to probabilities. When we consider the relationship between a distribution and
an individual variate of that distribution, we know that the probability is .6826 that the
variate is within µ ± 1 σ ; .9544 that the variate is within µ ± 2 σ ; and .9974 that the
variate is within µ ± 3 σ (Figure 6.5).
Figure 6.5. Standard deviations as areas under the normal curve expressed as
probabilities.
The probabilities illustrated in Figure 6.5 are unchanging for all normal distributions
regardless of their means or standard deviations. Furthermore, probabilities may be
calculated for any area under the curve. For example, we might be interested in the areas
between two points on the axis, or between one point and the mean, or between one point
and infinity. These areas under the curve do vary depending on the location and the shape
of the distribution as described by the mean and the standard deviation. In other words,
there are as many relationships between any individual variate and the probabilities
associated with normal distributions as there are different possible means and standard
deviations. All are infinite in number.
To use the normal distribution most effectively to generate probabilities,
statisticians have created the standard normal distribution. The standard normal
distribution has, by definition, µ=0 and σ =1. Rather than calculate probabilities of areas
under the curve for every possible mean and standard deviation, it is easiest to convert
any distribution to the standard normal. This transformation occurs through the
calculation of z, where:
Formula 6.1: z = (Yi − µ) / σ
The calculation of z establishes the difference between any variate and the mean (Yi − µ),
and expresses that difference in standard deviation units (by dividing by σ). In other
words, the result of the formula, called a z-score, is the number of standard deviations Yi
lies from µ on the standard normal distribution. Appendix A is a table of areas under the
curve of the standard normal distribution. Once we have a z-score, it is possible to use
Appendix A to determine the exact probabilities under the curve. To illustrate this point,
let us consider the following example.
Donald K. Grayson, in his analysis of the microfauna from Hidden Cave, Nevada, notes
that only one species of pocket gopher, Thomomys bottae, occurs in the area today,
although it is possible for other species to have been represented in the past. Grayson
(1985:144) presents the following descriptive statistics on mandibular alveolar lengths in
mm for modern Thomomys bottae: Ȳ = 5.7, s = .48, and n = 54. Specimen number HC-215
has a value Yi = 6.4. What is the probability of obtaining a value between the mean,
Ȳ = 5.7, and Yi = 6.4 (Figure 6.6)?
Figure 6.6. Illustration of the relationship between the sample mean and the variate of
6.4.
Since we do not have the population parameters, we substitute the sample values for the
mean and standard deviation.
z = (Yi − Ȳ) / s = (6.40 − 5.70) / .48
z = 1.46
This value for z tells us that 6.4 is 1.46 standard deviations from the mean. Is this
a common or rare event? We know in general it is common, as the value lies between
one and two standard deviations from the mean. Yet, we might be interested in the exact
probability. These probabilities for areas under the standard normal distribution can be
found in Appendix A. Values expressed in Appendix A are probabilities in the area
between z and the mean. To find the probability for 1.46 standard deviation units, look
down the left side of the table until the value 1.4 is located. Follow this row until it
intersects with the column value for .06. At that intersection is the value .4279, which
represents the probability of a variate falling between the mean and z = 1.46 . A value in
that interval is therefore a common event.
In addition to determining the probability found in the above example, we can also find
the probability of having a value greater than, or less than, 6.4 (z = 1.46). Since we know
that the total probability represented in the curve is equal to 1.0, and that .50 lies on each
side of the mean, we can determine that .5 + .4279 = .9279 equals the probability of a
value less than z = 1.46, and 1 − .9279 = .0721 represents the probability of a value greater
than z = 1.46. We could then conclude that a value larger than z = 1.46 approaches being
a rare event, something we would expect only about 7 times out of 100.
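The same figures can be reproduced without the table. Below is a minimal sketch, assuming Python with the scipy library is available; it standardizes the Hidden Cave specimen and computes the probabilities discussed above, rounding z to two decimals to mirror the Appendix A lookup.

```python
from scipy.stats import norm

y_bar, s = 5.7, 0.48   # sample statistics from Grayson (1985)
y_i = 6.4              # specimen HC-215

# Round to two decimals to mirror the Appendix A lookup
z = round((y_i - y_bar) / s, 2)
print(f"z = {z}")                                    # 1.46

# Area between the mean and z (the tabled Appendix A value)
print(f"P(mean to z)   = {norm.cdf(z) - 0.5:.4f}")   # .4279

# One-tailed probabilities on either side of the specimen's value
print(f"P(value < 6.4) = {norm.cdf(z):.4f}")         # .9279
print(f"P(value > 6.4) = {1 - norm.cdf(z):.4f}")     # .0721
```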
The above example illustrates finding probabilities based on areas under the normal
curve. It should be noted that we cannot determine exact values that represent points on
the line, because points are infinitesimally small. To illustrate this, we did not determine
above the probability of a value z = 1.46; only values greater or lesser than this value, or
the probability of a value between z = 1.46 and the mean. The probability of the point z =
1.46 cannot be measured. If absolutely necessary to find the area that closely relates to
1.46, one should look for the area under the curve between 1.455 and 1.465.
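In the same spirit, that interval approximation can be sketched numerically (again assuming scipy is available):

```python
from scipy.stats import norm

# A single point on the axis has zero probability; approximate
# "z = 1.46" by the narrow interval suggested above
p = norm.cdf(1.465) - norm.cdf(1.455)
print(f"P(1.455 < z < 1.465) = {p:.5f}")  # a very small, but nonzero, area
```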
Note that Appendix A only presents values for areas where Yi is greater than the mean.
What happens if Yi is less than the mean? Another example will serve to illustrate this
point. What is the probability of an alveolar length between 5.3 mm and 6.8 mm (Figure
6.7)?
Figure 6.7. Illustration of the relationship between the sample mean and the variates 5.3
and 6.8.
We can illustrate this probability in the following way:
Pr{5.3 < Yi < 6.8}
Pr{(Y1 − Ȳ)/s < z < (Y2 − Ȳ)/s}
Pr{(5.3 − 5.7)/.48 < z < (6.8 − 5.7)/.48}
Pr{−.83 < z < 2.29}
Since the normal curve is symmetrical, it is possible to ignore the negative sign for -.83 to
use Appendix A to find the area between this value and the mean. The tabled value for
.83 = .2967. The tabled value for 2.29 = .4890. Since we are interested in the area under
the curve between -.83 and 2.29, we can sum the two individual probabilities to
determine that the
Pr{ − .83 < z < 2.29 } = .2967+.4890 = .7857.
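A numerical check of this result, assuming Python with the scipy library is available (z-scores rounded to two decimals to match the table):

```python
from scipy.stats import norm

y_bar, s = 5.7, 0.48
z_lo = round((5.3 - y_bar) / s, 2)   # -0.83
z_hi = round((6.8 - y_bar) / s, 2)   #  2.29

# Area under the standard normal curve between the two z-scores
p = norm.cdf(z_hi) - norm.cdf(z_lo)
print(f"Pr{{{z_lo} < z < {z_hi}}} = {p:.4f}")  # .7857
```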
You will note that we used a new kind of notation in the preceding example. Unlike
many of the symbols previously discussed, this notation does not provide instructions for
computation. Instead, it describes the problem we wish to solve. Pr is the symbol
indicating we are determining a probability. The area inside the brackets {} is called the
probability space. It indicates exactly what probability we wish to find: in this case, the
probability of a variate with a value between 5.3 and 6.8. While it may seem tedious, it is
important that you explicitly write out and sketch your probability space. Doing so is a
useful and easy way to keep track of the probability you are after while ensuring you do
not make a simple mistake.
The normal distribution is incredibly useful for a number of reasons. For example, we
may now conclude that specimen HC-215 from Hidden Cave does not differ in a
significant manner from the modern population of Thomomys bottae. If it did, that
difference might have led us to suggest that another species of Thomomys was present at
Hidden Cave in the past – a conclusion of considerable paleoenvironmental and
archaeological significance. This and similar uses fall under the subject of hypothesis
testing, the subject of the next chapter.
References Cited
Grayson, D.K. 1985. The paleontology of Hidden Cave: Birds and mammals. In The
Archaeology of Hidden Cave, Nevada, edited by D. H. Thomas, pp. 125-161. American
Museum of Natural History Anthropological Papers 66(1).