Chapter 1
Properties of Random Variables
Random variables are encountered in every discipline in science. In this
chapter we discuss how we may describe the properties of random variables,
in particular by using probability distributions, as well as defining the mean
and the standard deviation of random variables. Since random variables
are encountered throughout chemistry and the other natural sciences, this
chapter is rather broad in scope. We do, however, introduce one particular
type of random variable, the normally-distributed random variable. One
of the most important skills you will need to obtain in this chapter is the
ability to use tables of the standard normal cumulative probabilities to solve
problems involving normally-distributed variables. A number of numerical
examples in the chapter will illustrate how to do so.
Chapter topics:

1. Random variables
2. Probability distributions (especially the normal distribution)
3. Measures of location and dispersion
1.1 A First Look at Random Variables
In studying statistics, we are concerned with experiments in which the outcome is subject to some element of chance; these are statistical experiments.
Classical statistical experiments include coin flipping or drawing cards at
random from a deck. Let’s consider a specific experiment: we throw two
dice and add the numbers displayed. The list of possible outcomes of this experiment would be {2, 3, . . . , 12}. This list comprises the domain of the experiment.
The domain of a variable is defined simply as the list of all possible values
that the variable may assume.
The domain will depend on exactly how a variable is defined. For example, in our dice experiment, we might be interested in whether or not the
sum is odd or even; the domain will then be {even, odd}. Let’s consider a
different experiment: tossing a coin two times. We can think of the domain
as {HH, HT, TH, TT}, where H = heads and T = tails. Alternatively, we can
focus on the total number of heads after the two tosses, in which case the
domain is {0, 1, 2}. In all experiments, the domain associated with the
experiment will contain all the outcomes that are possible, no matter how
unlikely.
Chemical measurements using some instrument are also statistical measurements, with an associated domain. The domain of these measurements
will be all the possible values that can be assumed by the measurement device.
The domain contains all the possible values of a variable.
Figure 1.1: Difference between discrete and continuous random variables. A discrete variable can only assume certain values (e.g., the hash-marks on the number
line), whereas a continuous variable can assume any value within the range of all
possible values.
Random variables are variables
that cannot be predicted with
complete certainty
The outcome of an experiment will vary: in other words, the outcome
is a variable. Furthermore, the outcome of a vast majority of the experiments in science will contain a random component, so that the outcome
is not completely predictable. These types of variables are called random
variables. Since random variables cannot be predicted exactly, they must
be described in the language of probability. For example, we cannot say
for certain that the result of a single coin flip will be ‘heads,’ but we can
say that the probability is 0.5. Every outcome in the domain will have a
probability associated with it.
It is an advantage to be able to express experimental outcomes as numbers; such variables are quantitative random variables (as opposed to an
outcome such as ‘heads,’ which is a “qualitative” random variable). We
will be concerned exclusively with quantitative random variables, of which
there are two types: discrete and continuous variables.
The distinction between these variables is most easily understood by
using a few examples. If our experiment consists of rolling dice or surveying the number of children in households, then the random variable will
always be a whole number; these are discrete variables. A discrete variable
can only assume certain values within the range contained in the domain.
Unlike a discrete variable, a continuous variable is theoretically able to assume any value in an interval contained within its domain. If we wanted
to measure the height or weight of a group of people, then the resulting
values would be continuous variables.
If we think in terms of a number line, a discrete variable can only assume certain values on the line (for example, the values associated with
whole numbers) while continuous variables may assume any value on the
line. Figure 1.1 demonstrates this concept. The number line in the figure represents the entire domain for a variable. A discrete variable would
be constrained to assume only certain values within the interval, while a
continuous variable can assume any value on the number line. Within its
domain there are always an infinite number of possible values for a continuous variable. The number of possible values of a discrete variable can be either finite or infinite.
One final note: although the distinction between continuous and discrete variables is important in how we use probability to describe the possible outcomes, as a practical matter there is probably no such thing as a
truly continuous random variable in measurement science. This is because
any measuring device will limit the number of possible outcomes. For example, consider a digital analytical balance that has a range of 0–100 g and
displays the mass to the nearest 0.1 mg. There are 10⁶ possible values in
this range — a large value, to be sure, but not infinitely large. For most purposes, however, we may treat this measurement as a continuous variable.
1.2 Probability Distributions of Discrete Variables
1.2.1 Introduction
Let’s briefly summarize what we have so far:
• a statistical experiment is one in which there is some element of
chance in the outcome;
• the outcome of the experiment is a random variable;
• the domain is a list (possibly infinite) of all possible outcomes of a
statistical experiment.
Now, although the domain gives us the possible outcomes of an experiment, we haven’t said anything about which of these are the most probable outcomes of the experiment. For example, if we wish to measure the
heights of all the students at the University of Richmond using a 30 ft. tape
measure, then the domain will consist of all the possible readings from
the tape, 0–30 ft. However, even though measurements of 6 in or 20 ft are
contained within the domain, the probability of observing these values is
vanishingly small.
A probability distribution describes the likelihood of the outcomes of
an experiment. Probability distributions are used to describe both discrete
and continuous random variables. Discrete distributions are a little easier
to understand, and so we will discuss them first.
Let’s consider a simple experiment: tossing a coin twice. Our random
variable will be the number of heads that are observed after two tosses.
Thus, the domain is {0, 1, 2}; no other outcomes are possible. Now, let’s
assign probabilities to each of these possible outcomes. The following table
lists the four possible outcomes along with the value of x associated with
each outcome.
Outcome    Random Variable (x)
TT         0
TH         1
HT         1
HH         2
If we assume that heads or tails is equally probable (probability of 0.5
for both), then each of the four outcomes is equally probable, with a probability of 0.25 each. It seems intuitive, then, that
P(x = 0) = 0.25
P(x = 1) = 0.5
P(x = 2) = 0.25
where P(x = x0 ) is the probability that the random variable x is equal to
the value x0 .
There! We have described the probability distribution of each possible
outcome of our experiment. The set of ordered pairs, [x, P(x = x0 )], where
x is a random variable and P(x = x0) is the probability that x assumes any one of the values in its domain, describes the probability distribution of the random variable x for this experiment. Note that the sum of the probabilities of all the outcomes in the domain equals one; this is a requirement for all discrete probability distributions.

Since random variables are inherently unpredictable, probability distributions must be used to describe their properties.
1.2.2 Examples of Discrete Distributions
The Binomial Distribution
A Bernoulli experiment consists
of a series of identical trials,
each of which has two possible
outcomes.
Coin-tossing experiments are an example of a general type of experiment
called a Bernoulli, or binomial, experiment. For example, a biologist may be
testing the effectiveness of a new drug in treating a disease. After infecting,
say, 30 rats, the scientist may then inject the drug into each rat and record
whether the drug is successful or not on a rat by rat basis. Each rat is a
“coin toss,” and the result is an either-or affair, just like heads-tails. Many
other experiments in all areas of science can be described in similar terms.
To generalize, a Bernoulli experiment has the following properties:
1. Each experiment consists of a number of identical trials (“coin flips”).
The random variable, x, is the total number of “successful” trials observed after all the trials are completed.
2. Each trial has only two possible results, “success” and “failure.” The
probability of success is p and the probability of failure is q. Obviously, p + q = 1.
3. The probabilities p and q remain constant for all the trials in the
experiment.
In our simple coin-tossing example, we could deduce the probability
distribution of the experiment by simple inspection, but there is a more
general method. The probability distribution for any Bernoulli experiment
is given by the following function, p(x),
p(x) = [n! / (x!(n − x)!)] · p^x · q^(n−x)    (1.1)

where n is the number of trials in the experiment, and n! is the factorial of n. If we want to find the probability of a particular outcome, P(x = x0), then we must evaluate the binomial distribution function at that value x0.

The binomial distribution function describes the outcome of Bernoulli experiments.
Let’s imagine that, in our hypothetical drug-testing experiment (with
n = 30 rats), the probability of a successful drug treatment is p = 0.15.
Figure 1.2 shows the probability distribution of this experiment.¹
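If you would like to check such numbers with a computer rather than by hand, the binomial probabilities of eqn. 1.1 are easy to evaluate directly. The short Python sketch below is our own illustration (the function name binomial_pmf is not part of the text); it reproduces the n = 30, p = 0.15 distribution plotted in figure 1.2.

    from math import comb

    def binomial_pmf(x, n, p):
        """Probability of exactly x successes in n Bernoulli trials (eqn. 1.1)."""
        q = 1.0 - p
        return comb(n, x) * p**x * q**(n - x)

    # Hypothetical drug-testing experiment from the text: n = 30 rats, p = 0.15
    n, p = 30, 0.15
    for x in range(11):
        print(x, round(binomial_pmf(x, n, p), 4))

    # The probabilities over the entire domain must sum to one
    print(sum(binomial_pmf(x, n, p) for x in range(n + 1)))   # -> 1.0 (to within rounding)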
There are three common methods used to represent the probability distribution of a random variable:
1. As a table of values, where the probability of every possible outcome
in the domain is given. Obviously, this is only practical when the
number of possible outcomes is fairly small.
2. As a mathematical function. This is the most general format, but it
can be difficult to visualize. In some cases it may not be possible to
represent the probability distribution as a mathematical function.
¹ Note that the binomial distribution becomes more difficult to calculate as the number of trials, n, increases (due to the factorial terms involving n). There are some other distribution functions that can give reasonable approximations to the binomial function in such cases.
Figure 1.2: Binomial Probability Distribution. A graphical depiction of the binomial
probability distribution as calculated from eqn. 1.1 with n = 30 and p = 0.15.
3. As a graphical plot. This is a common method to examine probability
distributions.
Figure 1.3 on page 6 describes the outcome of an experiment using a plot
and a table, both of which were constructed using equation 1.2.
The Poisson Distribution
Besides binomial experiments, counting experiments are also quite common
in science. Most often, we are interested in counting the number of occurrences of some event within some time interval or region in space. For
example, we might want to characterize a photon detection rate by counting the number of photons detected in a certain time interval, or we might
want to characterize the density of trees in a plot of land by counting the
number of trees that occur in a given acre. The random variable in any
counting experiment is a positive whole number, the number of “counts.”
It is often true that this discrete random variable follows a Poisson distribution.
Let’s say that we are counting alpha particles emitted by a sample of a
radioactive isotope at a rate of 2.5 counts/second. Our experimental measurement is thus “counts detected in one second” and the domain consists
of all positive whole numbers (and zero).
The Poisson probability distribution for this experiment can be determined from the following general formula:
p(x) = e^(−λt) (λt)^x / x!    (1.2)
where λ is the average rate of occurrence of events, and t is the interval of observation. Thus, for our experiment, the product λt = 2.5 events/second × 1 second = 2.5 events. Let’s use this formula to calculate the probability that we will measure 5 counts during one observation period:

P(x = 5) = e^(−2.5) (2.5)⁵ / 5! = 0.0668

Figure 1.3 shows the probabilities of measuring zero through 10 counts during one measurement period for this experiment.

A counting experiment is an experiment in which events or objects are enumerated in a given unit of time or space. The Poisson distribution function describes the outcome of many counting experiments.

Figure 1.3: The Poisson probability distribution, shown here as both a table and a plot, describes the probability of observing alpha particle counts, as calculated from eqn. 1.2 with λ = 2.5 counts/second and t = 1 second.
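The same sort of numerical check can be applied to the Poisson calculation above. A minimal Python sketch (our own; λt = 2.5, as in the α-particle example):

    from math import exp, factorial

    def poisson_pmf(x, rate, t=1.0):
        """Probability of observing x events in an interval t (eqn. 1.2)."""
        mu = rate * t                          # lambda * t, the mean number of events
        return exp(-mu) * mu**x / factorial(x)

    print(round(poisson_pmf(5, rate=2.5), 4))  # -> 0.0668, as calculated above

    # Probabilities of 0 through 10 counts, as tabulated and plotted in figure 1.3
    for x in range(11):
        print(x, round(poisson_pmf(x, 2.5), 4))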
Just like the binomial distribution, the Poisson distribution of discrete
variables has two important properties:
• The probability is never negative: P(x = x0) ≥ 0
• The sum of all probabilities is unity: Σ_{i=0}^{∞} P(x = xi) = 1
These properties are shared by all discrete probability distributions.
Advanced Topic: The Boltzmann Distribution
Probability distributions are necessary in order to characterize the outcome of many experiments in science due to the presence of measurement
error, which introduces a random component to experimental measurements. However, probability distributions are also vital in understanding
the nature of matter on a more fundamental level. This is because many
properties of a system, when viewed at the atomic and molecular scale,
are actually random variables. There is an inherent “uncertainty” of matter
and energy that is apparent at small scales; this nature of the universe is
predicted by quantum mechanics. What this means is that we must again
resort to the language of probability (and probability distributions) in order
to describe such systems.
Let us consider the energy of a molecule, which is commonly considered to be partitioned as electronic, vibrational, and rotational energy. As
you should know from introductory chemistry, a molecule’s energy is quantized. In other words, the energy of a molecule is actually a discrete random
variable. The random nature of the energy is an innate property of matter and not due to random error in any measuring process.

Figure 1.4: Probability distribution among vibrational energy levels of the I2 molecule at two different temperatures. The actual energy levels are given by Eν = (ν + 1/2) · 214.6 cm⁻¹, where ν is the vibrational quantum number. The probability distribution assumes evenly spaced vibrational levels (i.e., the harmonic oscillator assumption). Notice that at higher temperatures, there is a greater probability that a molecule will be in a higher energy level.
Since molecular energy is a random variable, it must be described by
a probability distribution. If a molecule is in thermal equilibrium with its
environment, the probability that the molecule has a particular energy at
any given time is described by the Boltzmann distribution:
p(x) = e^(−βx) / Σ_i e^(−βxi)    (1.3)
where β = (kT )−1 , T is the temperature in K, and the denominator is a summation over all the possible energy states of the molecule. If the different
states of a molecule have evenly spaced energy levels and no degeneracy,
then the Boltzmann distribution function can be simplified to
p(x) = e^(−βx) · (1 − e^(−β∆E))
where ∆E is the separation between energy levels. Figure 1.4 shows the
probability distribution for the vibrational energy of an I2 molecule at two
different temperatures.
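The vibrational populations quoted below (about 64.5% in ν = 0 and 22.9% in ν = 1 at 298 K) can be reproduced from the simplified form of eqn. 1.3. The following Python sketch is our own check; it assumes evenly spaced levels with ∆E = 214.6 cm⁻¹ and uses the Boltzmann constant in wavenumber units (k ≈ 0.6950 cm⁻¹/K) so that βx is dimensionless.

    from math import exp

    k_cm = 0.6950            # Boltzmann constant, cm^-1 per K (energies kept in cm^-1)
    dE = 214.6               # vibrational spacing of I2, cm^-1 (harmonic approximation)

    def boltzmann_p(v, T):
        """Probability that the molecule is in level v at temperature T (evenly spaced levels)."""
        beta = 1.0 / (k_cm * T)
        return exp(-beta * v * dE) * (1.0 - exp(-beta * dE))

    for T in (298.0, 400.0):
        print(T, [round(boltzmann_p(v, T), 3) for v in range(4)])
    # At 298 K this gives roughly 0.645, 0.229, 0.081, ..., matching figure 1.4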
We can interpret the Boltzmann distribution in two ways, both of which
are useful:
• The Boltzmann distribution gives the probability distribution of the
energy of a single molecule at any given time. For example, if we measure the vibrational energy of an I2 molecule at 298 K, then according
The Boltzmann probability
distribution function describes
the energy of a molecule in
thermal equilibrium with its
surroundings.
8
1. Properties of Random Variables
to the Boltzmann distribution there is a 64.5% probability that the
molecule is in the ground vibrational level (ν = 0). If we wait for a
time (say 10 seconds), and then measure again, then there is a 22.9%
chance that the molecule has absorbed some heat and is now in the
first excited vibrational energy level (ν = 1). Of course, there is still a
64.5% chance that the molecule is in the ground state.
• The Boltzmann distribution gives the fractional distribution of molecular energy states in a chemical sample. Let’s imagine that we have a
sample of one million I2 molecules at 298 K (which is not very many; remember that one mole is about 10²³ molecules). The Boltzmann distribution tells us that at any given time, about 645,000 molecules will
be in the ground vibrational energy level and about 229,000 molecules
will be in the first excited vibrational level. Molecules may be constantly gaining and losing vibrational energy, through collisions and
by absorbing/emitting infrared light, but since there are so many
molecules, the total number of molecules at each energy level will
remain fairly constant. For this reason, the energy probability density
function is sometimes called the Boltzmann distribution of states.
1.3 Important Characteristics of Random Variables

Two important properties of random variables are location (‘central tendency’) and dispersion (‘variability’).
We have discussed the idea of probability distributions, in particular the
distributions of discrete variables. We will proceed to continuous variables
momentarily, but first we will discuss two important properties by which
we may characterize random variables, irregardless of the probability distribution: location and the dispersion.
Let’s take stock of the situation thus far: for any random variable, the
domain gives all possible values of the variable and the probability distribution gives the likelihood of those values. Together these two pieces of
information provide a complete description of the properties of the random variable. Two important properties of a variable contained in this description are:
• Location: the central tendency of the variable, which describes a value
around which the variable tends to cluster, and
• Dispersion: the typical range of values that might be expected to be
observed in experiments. This gives some idea of the spread in values
that might result from our experiment(s).
The probability distribution contains all the information needed to determine these characteristics, as well as still more esoteric descriptors of
the properties of random variables. Since we have discussed the distributions of discrete variables, we will tend to use these in our discussions and
examples; however, the same concepts apply, with very little modification,
to continuous variables.
1.3.1 Central Tendency of a Random Variable
The central tendency, or location, of a variable can be indicated by any (or
all) of the following: the mode, the median, or the mean. Although most
1.3. Important Characteristics of Random Variables
9
people are familiar with means, the other two properties are actually easier
to understand.
Mode
The mode is the most probable value of a discrete variable. More generally,
it is the maximum of the probability distribution function: the value of
xmode such that
p(xmode ) = Pmax
Multi-modal probability distributions have more than one mode — distributions with two modes are bimodal, and so on. Although multi-modal distributions may have several local maxima, there is usually a single global
maximum that is the single most probable value of the random variable.
In the example with the alpha particle measurements (see fig. 1.3), the
mode of the distribution — xmode = 2 — can be determined by glancing at
the bar graph of the Poisson distribution.
Median
The median is only a little more complicated than the mode: it is the value
Q2 such that
P(x < Q2 ) = P(x > Q2 )
In other words, there is an equal probability of observing a value greater
than or less than the median.
The median is also the second quartile — hence the origin of the symbol
Q2 . Any distribution can be divided into four equal “pieces,” such that:
P(x < Q1 ) = P(Q1 < x < Q2 ) = P(Q2 < x < Q3 ) = P(x > Q3 )
The boundaries Q1 , Q2 (i.e., the median), and Q3 are the quartiles of the
probability distribution.
Mean
Before defining the mean, it is helpful to discuss a mathematical operation called the weighted sum. Most everybody performs weighted sums —
especially students calculating test averages or grade point averages! For
example, let’s imagine that a student has taken two tests and a final, scoring
85 and 80 points on the tests, and 75 points on the final. An “unweighted
average” of these three numbers is 80 points; however, the final is worth
(i.e., weighted) more than the tests. Suppose that the instructor feels that
the final is worth 60% of the test grade, while the other two tests are worth
20% each. The weighted sum would be calculated as follows.
Let w1 = w2 = 0.2, and w3 = 0.6:

weighted score = Σ wi · scorei = w1 · 85 + w2 · 80 + w3 · 75 = 78

The final score, 78, is a weighted sum. Since the final is weighted more than the tests, the weighted sum is closer to the final score (75) than is the
unweighted average (80). Grade point averages are calculated on a similar
principle, where the weights for each grade are determined by the course
credit hours. To choose an example from chemistry, the atomic weights
listed in the periodic table are weighted averages of isotope masses; the weights are determined by the relative abundances of the isotopes.

A distribution is sometimes split up ten ways, into deciles. The median is the fifth decile, D5.
In general, a weighted sum is represented by the expression

weighted sum = Σ_i wi · xi    (1.4)

where xi are the individual values and wi are the corresponding weights. When the sum of all the individual weights is one (Σ wi = 1), the weighted sum is often referred to as a weighted average.
The mean of a discrete random variable is simply a weighted average,
using the probabilities as the weights. In this way, the most probable values
have the most “influence” in determining the mean; this is why the mean is
a good indicator of central tendency. The mean, or expected value, E(x),
of a random variable is defined as follows: for a discrete variable, it is
E(x) = µx = Σ_i xi · p(xi)    (1.5a)

while for a continuous variable, it is

E(x) = µx = ∫_{−∞}^{+∞} x · p(x) dx    (1.5b)
where p(x) is a mathematical function that defines the probability distribution of the random variable x.
The means of binomial and Poisson distributions are given by the following general formulas:
µx = n · p
for the binomial distribution, where n is the number of trials and p is the
probability of success for each trial. For a variable described by the Poisson
distribution, the mean can be calculated as
µx = λ · t
where λ is the mean “rate” and t is the measurement interval (usually a
time or distance).
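These definitions are easy to verify numerically. The Python sketch below (our own example) applies the weighted sum of eqn. 1.5a to the two-coin distribution given earlier, and then checks the µx = n·p shortcut against the full probability-weighted sum for the drug-testing experiment.

    from math import comb

    def mean_discrete(dist):
        """Expected value of a discrete distribution given as {value: probability} (eqn. 1.5a)."""
        return sum(x * p for x, p in dist.items())

    # Two coin tosses: domain {0, 1, 2} with probabilities 0.25, 0.5, 0.25
    coins = {0: 0.25, 1: 0.5, 2: 0.25}
    print(mean_discrete(coins))                                # -> 1.0

    # Binomial mean: the weighted sum agrees with n*p (= 4.5 for the rat example)
    n, p = 30, 0.15
    binom = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
    print(round(mean_discrete(binom), 3), n * p)               # -> 4.5 4.5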
Comparison of Location Measures
We have defined three different indicators of the location of a random variable: the mean, median and mode. Each of these has a slightly different
meaning.
Imagine that you are betting on the outcome of a particular experiment:
• If you choose the mode, you are essentially betting on the single most
likely outcome of the experiment.
• If you choose the median, you are equally likely to be larger or smaller
than the outcome.
• If you choose the mean, you have the best chance of being closest to
the outcome.
A random variable cannot be predicted exactly, but each of the three
indicators gives a sense of what value the random variable is likely to
be near. In most applications, the mean gives the best single description of the location of the variable.

Figure 1.5: Comparison of values of mean, median and mode for (a) positively and (b) negatively skewed probability distributions. For symmetrical distributions (so-called ‘bell-shaped’ curves) the three values are identical.
Just how different are the values of the mean, median and mode? It
turns out that the three values are different only for asymmetric distributions, such as the two shown in figure 1.5. If a distribution is skewed to the
right (or positively skewed; fig. 1.5(a)) then
µx > Q2 > xmode
while for distributions skewed to the left (negatively skewed; fig. 1.5(b))
µx < Q2 < xmode
For symmetrical (“bell-shaped”) distributions, the mean, median and mode all have exactly the same value.

The mean is the most common descriptor of the location of a random variable.
1.3.2 Dispersion of a Random Variable
Some variables are more “variable,” more uncertain, than others. Of course,
theoretically speaking, a variable may assume any one of the range of values in the domain. However, when speaking of the variability of a random
variable, we generally mean the range of values that would commonly (i.e.,
most probably) be observed in an experiment. This property is called the
dispersion of the random variable. Dispersion refers to the range of values
that are commonly assumed by the variable.
Experiments that produce outcomes that are highly variable will be more
likely to give values that are farther from the mean than similar experiments that are not as variable. In other words, probability distributions
tend to be broader as the variability increases. Figure 1.6 compares the
probability functions (actually called “probability density functions”) of two
continuous variables.
As with the mean, it is convenient to describe variability with a single
value. Three common ways to do so are:
1. The interquartile range and the semi-interquartile range
2. The mean absolute deviation
3. The variance and the standard deviation.

These will now be described in turn.

Statisticians sometimes use the term scale instead of dispersion.

Figure 1.6: Comparing the variability of two random variables. The variable described by the broader probability distribution (dotted line) is more likely to be farther from the mean than the other variable.
(Semi-)Interquartile Range
The interquartile range, IQR, is the difference between the first and third
quartiles (see figure 1.7):
IQR = Q3 − Q1
This is a measure of dispersion because the “wider” a distribution gets, the
greater the difference between the quartiles.
The semi-interquartile range, QR, is probably the more commonly used measure of dispersion; it is simply half the interquartile range:

QR = (Q3 − Q1) / 2    (1.6)
Mean Absolute Deviation
The mean absolute deviation is
the expected value of | x − µx |
Since the dispersion describes the spread of a random variable about its
mean, it makes sense to have a quantitative descriptor of this quantity. The
mean absolute deviation, MD, is exactly what it sounds like: the expected
value (i.e., the mean) of the absolute deviation of a variable from its mean
value, µx .
MD ≡ E (| x − µx |)
The concept behind the mean absolute deviation is quite simple: it indicates the mean (‘typical’) distance of a variable from its own mean, µx.
Figure 1.7: The interquartile range is a measure of the dispersion of a random
variable. It is the difference between the first and third quartiles of a distribution,
Q3 − Q1 , where the quartiles divide the distribution into four equal parts (see page
9). The semi-interquartile range is also a common measure of dispersion; it is half
the interquartile range.
For a discrete variable,

MD = Σ_i | xi − µx | · p(xi)    (1.7a)

while for a continuous variable,

MD = ∫_{−∞}^{+∞} | x − µx | · p(x) dx    (1.7b)
Variance and Standard Deviation
Like the mean absolute deviation, the variance and standard deviation measure the dispersion of a random variable about its mean µx . The variance
of a random variable x, σx2 , is the expected value of (x − µx )2 , which is the
squared deviation of x from its mean value:
σx² ≡ E[(x − µx)²]
As you can see, the concept of the variance is very similar to that of the
mean absolute deviation. In fact, the variance is sometimes called the mean
squared deviation. The variance for discrete and continuous variables is
given by
σx² = Σ_i (xi − µx)² p(xi)    (1.8a)

σx² = ∫_{−∞}^{+∞} (x − µx)² p(x) dx    (1.8b)
The variance is the expected value of (x − µx)², and the standard deviation is the positive root of the variance. The standard deviation is calculated from the variance: σx = +√(σx²). The RSD is an alternate way to present the standard deviation.
Look at the discrete variable (eqn. 1.8a): we have another weighted sum!
The values being summed, (xi − µx )2 , are the squared deviations of the
variable from the mean. The squared deviations indicate how far the value
xi is from the mean value µx , and the weights in the sum, as in eqn. 1.5, are
the probabilities of xi . Thus, “broader” probability distributions will tend
to have larger weights for values of x that have larger squared deviations
(x − µx )2 (and hence are more distant from the mean). Such distributions
will give larger values for the variance, σx2 . Higher variance signifies greater
variability of a random variable.
One problem with using the variance to describe the dispersion of a
random variable is that the units of variance are the squared units of the
original variable. For example, if x is a length measured in m, then σx2 has
units of m2 . The standard deviation, σx , has the same units as x, and so
is a little more convenient at times. The standard deviation is simply the
positive square root of the variance.
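As with the mean, eqn. 1.8a is just another probability-weighted sum, and a few lines of Python (again our own illustration) make the calculation concrete for the two-coin distribution:

    def mean_var_std(dist):
        """Mean, variance (eqn. 1.8a) and standard deviation of a discrete {value: probability} distribution."""
        mu = sum(x * p for x, p in dist.items())
        var = sum((x - mu) ** 2 * p for x, p in dist.items())
        return mu, var, var ** 0.5

    coins = {0: 0.25, 1: 0.5, 2: 0.25}       # number of heads in two coin tosses
    mu, var, sigma = mean_var_std(coins)
    print(mu, var, sigma)                    # -> 1.0 0.5 0.707...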
Sometimes the variability of a random variable is specified by the relative standard deviation, RSD:
RSD = σx / µx    or    RSD = σx / x̄
Both of these expressions are commonly used to calculate RSD; which one
is used is usually obvious from the context. The RSD can be expressed as a
fraction or as a percentage. The RSD is sometimes called the coefficient of
variation (CV).
Comparison of Dispersion Measures
The standard deviation, σx , is
the most common measure of
dispersion.
We have described three common ways to measure a random variable’s
dispersion: semi-interquartile range, QR , mean absolute deviation, MD, and
the standard deviation, σ . These measures are all related to each other, so,
in a sense, it makes no difference which we use. In fact, for distributions
that are only moderately skewed, MD ≈ 0.8σ and QR ≈ 0.67σ . For a
variety of reasons (which are beyond the scope of this text), the variance
and standard deviation are the best measures of dispersion of a random
variable.
Aside: Quantum Variability
The term ‘uncertainty’ in
Heisenberg’s principle refers to
the standard deviation of values
used to describe properties at
the atomic/molecular level.
As stated earlier, quantum mechanics asserts that many of the properties
of matter at the atomic/molecular scale are inherently unpredictable (i.e.,
random). The magnitude of the variability of these properties only becomes
apparent on a sufficiently small scale. Hence, these variables must be described by a probability distribution with a certain mean and standard deviation. This ability to interpret system properties such as energy and position as random variables is an example of the broad scope of the concepts
contained in the study of probability and statistics.
One of the most important relationships in quantum mechanics is Heisenberg’s Uncertainty Principle. The Uncertainty Principle states that the product of the standard deviations of certain random variables, called complementary variables, or complementary properties, has a lower limit. For
example, the linear momentum p and the position q of a particle are complementary properties; the Heisenberg Uncertainty Principle states that
σp · σq ≥ h / (4π)
1.4. Probability Distributions of Continuous Variables
15
As stated in the Uncertainty Principle, the standard deviations of complementary variables are inversely related to one another. In other words, if a
particle such as an electron is constrained to remain confined to a certain
area, then the uncertainty in the linear momentum is great: i.e., if σq is
small (e.g., for a confined electron) then σp is large.
1.4 Probability Distributions of Continuous Variables
1.4.1 Introduction
Properties such as mass or voltage are typically free to assume any value;
hence, they are continuous variables. There is one fundamental distinction
between discrete and continuous variables: the probability of a continuous
random variable, x, exactly assuming one of the values, x0 , in the domain
is zero! In other words,
P(x = x0 ) = 0
How, then, do we specify the probabilities of continuous random variables? Instead of calculating the probabilities of specific values, we determine the probability that the outcome falls within a given range of values. In order to find the probability that the random
variable x will be between two values x1 and x2 , we can use a function p(x)
such that
P(x1 ≤ x ≤ x2) = ∫_{x1}^{x2} p(x) dx    (1.9)
The function p(x) is called the probability density function of the continuous random variable x. Figure 1.8 demonstrates the general idea.
Just as the probability of a discrete variable must sum to one over the
entire domain, the area under the probability density function within the
range of possible values for x must be one. For example, if the domain
ranges from −∞ to ∞, then
∫_{−∞}^{∞} p(x) dx = 1
As in the discrete case, the value of the function p(x) must be positive over its entire range. The probability density function allows us to
construct a probability distribution for continuous variables; indeed, sometimes it is called simply a “distribution function,” as with discrete variables.
However, evaluation of the probability density function for a particular
value x0 does not yield the probability that x = x0 — that probability is
zero, after all — as it would for a discrete distribution function.
Probability distributions are thus a little more complicated for continuous variables than for discrete variables. The main difference is that probabilities of continuous variables are defined in terms of ranges of values,
rather than single values. The probability density function, p(x) (if one
exists) can be used to determine these probabilities.
The probability density function, sometimes called simply the distribution function, is used to determine probabilities of continuous random variables.
Figure 1.8: Probability characteristics of continuous variables. The curve is the
probability density function, and the shaded area is the probability that the random
variable will be between x1 and x2 . The area under the entire curve is one.
1.4.2 Normal (Gaussian) Probability Distributions
The normal probability
distribution describes the
characteristics of many
continuous random variables
encountered in measurement
science.
By far, the most common probability distribution in science is the Gaussian
distribution. In very many situations, it is assumed that continuous random
variables follow this distribution; in fact, it is so common that it is simply
referred to as the normal probability distribution. The probability density
function of this distribution is given by the following equation:
N(x : µ, σ) = [1 / (σ√(2π))] · exp[−(x − µ)² / (2σ²)]    (1.10)
where the expression N(x : µ, σ ) conveys the information that x is a
normally-distributed variable with mean µ and standard deviation σ . Figure 1.9 shows a normal probability density function with µx = 50 and
σx = 10. Note that it is a symmetric distribution with the well-known “bell-curve” shape.
Calculating probability distributions of continuous variables using the
probability density function is a little more complicated than with discrete
variables, as shown in example 1.1.
Example 1.1
Johnny Driver is a conscientious driver; on a freeway with a posted speed
limit of 65 mph, he tries to maintain a constant speed of 60 mph. However, the car speed fluctuates during moments of inattention. Assuming
that car speed follows a normal distribution with a mean µx = 60 mph
and standard deviation σx = 3 mph, what is the probability that Johnny
is exceeding the speed limit at any time?
Figure 1.10 shows a sketch of the situation. The car speed is a random
variable that is normally distributed with µx = 60 mph and σx = 3 mph.
Figure 1.9: Plot of the Gaussian (“normal”) probability distribution with µ = 50 and
σ = 10. Note that most of the area under the curve is within 3σ of the mean.
We need to determine the probability that x is greater than 65 mph, which
is the shaded area under the curve in the figure:
P(x > 65) = ∫_{65}^{∞} [1 / (σx√(2π))] · exp[−(x − µx)² / (2σx²)] dx
where µx and σx are given the appropriate values. When this integral is
evaluated, a value of 0.0478 is obtained. Thus, there is a 4.78% probability
that Johnny is speeding.
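The integral in this example has no closed form, but it can be evaluated without tables using Python’s standard library. The sketch below uses statistics.NormalDist (our choice of tool, not the method of the text):

    from statistics import NormalDist

    speed = NormalDist(mu=60.0, sigma=3.0)    # Johnny's speed distribution, mph
    p_speeding = 1.0 - speed.cdf(65.0)        # right-tail area beyond the 65 mph limit
    print(round(p_speeding, 4))               # -> 0.0478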
1.4.3 The Standard Normal Distribution
In calculating probabilities of continuous variables, it is usually necessary
to evaluate integrals, which can be inconvenient (a computer program is required in the case of normally-distributed variables) and tedious. It would
be preferable if there were tables of integration values available for reference; there are, in fact, many tables available for just this purpose. Of
course, an integration table will only be valid for a specific probability distribution. However, the normal distribution is not a single distribution, but
is actually a family of distributions: changing the mean or variance of the
variable will give a different distribution. It is not practical to formulate
integration tables for all possible values of µ and σ 2 ; fortunately, this is
not necessary, as we will see now.
A special case of the normal distribution (eqn. 1.10) occurs when the
mean is zero (µx = 0) and the variance is unity (σx2 = σx = 1); this particular
normal probability distribution is called the standard normal distribution,
N(z).
N(z) = [1 / √(2π)] · exp(−z² / 2)    (1.11)
The standard normal distribution
is a special version of the normal
distribution. It is useful in
solving problems like
example 1.1.
Figure 1.10: Sketch of distribution of the random variable in example 1.1. The
area under the curve is the value we want: P(x > 65) = 0.0478
Other than giving a simplified form of the normal distribution function,
the standard normal distribution is important because integration tables
of this function exist that can be used to calculate probability values for
normally-distributed variables. In order to use these tables, it is necessary
to transform a normal variable, with arbitrary values of µ and σ 2 , to the
standard normal distribution. This transformation is accomplished with
z-transformation, which is usually called standardization.
Taking a variable x, we define z such that
z = (x − µx) / σx    (1.12)
The transformed value z is the z-score of the value x. The z-score of a
value is the deviation of the value from its mean µx in units of the standard
deviation, σx , as illustrated in the example 1.2.
Example 1.2
Let’s say we set up an experiment such that the outcome is described by
a normal distribution with µx = 25.0 and σx = 2.0. A single measurement
yields x0 = 26.4; what is the z-score of this measurement?
The value is calculated directly from eqn. 1.12
z0 = (x0 − µx) / σx = (26.4 − 25.0) / 2.0 = 0.7
Thus, the measurement is +0.7σ from the mean.
The process of standardization of a random variable x yields another
variable z; if x is normally distributed with mean µx and standard deviation
σx , then z is also normally distributed with µ = 0 and σ = 1. This illustrates an important concept: any value calculated using one or more random variables is also a random variable. In other words, the calculations
associated with standardization did not rid x of its innate “randomness.”
Although there are no integration tables for a normally-distributed variable x with arbitrary mean and standard deviation, we can apply the z-transformation and use the integration tables of the standard normal distribution. Tables of the standard normal distribution usually give cumulative probabilities, which correspond to the areas in one of the “tails” under the density function. The ‘left tail’ is given by

P(z < z0) = ∫_{−∞}^{z0} N(z) dz    (1.13)

while the ‘right tail’ area is calculated from

P(z > z0) = ∫_{z0}^{+∞} N(z) dz    (1.14)
The next example will show how we can use the z-tables to calculate probabilities of normally-distributed variables.
Example 1.3
In example 1.1 we determined by integration the probability that a car
of variable speed was exceeding the speed limit (65 mph); the mean and
standard deviation of the car speed were 60 mph and 3 mph, respectively. Now solve this problem using z-tables.
The problem can be re-stated as follows: determine the probability

P(x > x0) = ?

where x is a normally-distributed variable with µx = 60 mph, σx = 3 mph, and x0 = 65 mph. The only way to solve this problem is by integration, but we can use the z-tables if we first standardize the variable. The z-transformed problem reduces to

P(x > x0) = P((x − µx)/σx > (x0 − µx)/σx) = P(z > z0)
where z is described by the standard normal distribution, and z0 is the
appropriate z-score:
z0 = (x0 − µx) / σx = (65 − 60) / 3 ≈ 1.67
Now we can use the z-table to find the area in the ‘right tail’ of the
z-distribution. From the z-table, we see that
P(x > 65) ≈ P(z > 1.67) = 0.0475
This answer agrees (more or less) with our previous value, 0.0478 (see example 1.1). The slight difference is due to the fact that 5/3 does not exactly equal 1.67.
Important Relationships for Standard Normal Distributions
You should become very familiar
with the concepts presented in
this section.
The Appendix presents a number of useful statistical tables, including one
for the standard normal distribution (i.e., a z-table). Since the normal distribution is symmetric, there is no need to list the areas corresponding to
both negative and positive z-score values, so most tables only present half
of the information. The z-table given in this book lists right-tail areas associated with positive z-scores. In order to calculate the areas corresponding
to various ranges of normally-distributed variables, using only right-tail areas, a few important relationships should be learned.
⇒ Calculating left-tail areas: P(z < −z0 )
Since the normal distribution is symmetric, the following relationship is
true:
P(z < −z0) = P(z > z0)    (1.15)
This expression allows one to calculate left-tail areas from right-tail areas,
and vice versa; this follows from the symmetric nature of the normal probability distribution.
⇒ Calculating probabilities greater than 0.5: P(z > −z0 )
As mentioned previously, most tables (including the one in this book) only
list the areas for half the normal curve. That is because areas corresponding to the other half — i.e., probabilities larger than 0.5 — can easily be
calculated. Our table only lists the right-tail areas for positive z-scores;
thus, we need a way to calculate right-tail areas for negative z-scores. We
would use the following equation:
P(z > −z0) = 1 − P(z > z0)    (1.16)
⇒ Calculating ‘middle’ Probabilities: P(z1 < z < z2 )
Instead of “tail” areas (i.e., P(z > z0 ) or P(z < −z0 )), it is often necessary to
calculate the area under the curve between two z-scores. The most general
expression for this situation is
P(z1 < z < z2) = P(z > z1) − P(z > z2)    (1.17)
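All three relationships are easy to confirm numerically. A short Python check (our own, using statistics.NormalDist to play the role of the z-table):

    from statistics import NormalDist

    Z = NormalDist()                          # standard normal: mu = 0, sigma = 1
    right = lambda z0: 1.0 - Z.cdf(z0)        # right-tail area P(z > z0)

    z0, z1, z2 = 1.0, -0.6, 0.6
    print(Z.cdf(-z0), right(z0))                          # eqn. 1.15: both 0.1587
    print(right(-z0), 1.0 - right(z0))                    # eqn. 1.16: both 0.8413
    print(Z.cdf(z2) - Z.cdf(z1), right(z1) - right(z2))   # eqn. 1.17: both 0.4515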
It is important to become adept at using z-tables to calculate probabilities of normally-distributed variables. The following two examples illustrate some of the problems you might encounter.
Example 1.4
A soft-drink machine is regulated so that it discharges an average volume of 200. mL per cup. If the volume of drink discharged is normally
distributed with a standard deviation of 15 mL,
(a) what fraction of the cups will contain more than 224 mL of soft
drink?
(b) what is the probability that a 175 mL cup will overflow?
(c) what is the probability that a cup contains between 191 and 209 mL?
(d) below what volume do we get the smallest 25% of the drinks?
In answering these types of questions, it is always helpful to draw a quick
sketch of the desired area, as we do here (in the margins).
(a) This problem is similar to previous ones: we must find a right-tail area
P(x > x0 ), where x0 = 224 mL. To do so, we can use the z-tables if we first
calculate z0 , the z-score of x0 .
P(x > x0) = P((x − µx)/σx > (x0 − µx)/σx)
          = P(z > (224 − 200)/15) = P(z > 1.6)
          = 0.0548
Looking in the z-tables yields the answer. There is a 5.48% probability that
a 224 mL cup will overflow.
(b) In this case, the z-score is negative, so that we must use eqn. 1.16 to find
the probability using the z-tables in the Appendix.
P(x > 175 mL) = P(z > (175 − 200)/15)
              = P(z > −1.67) = 1 − P(z > 1.67)
              = 1 − 0.0475 = 0.9525
There is a 95.25% probability that the 175 mL cup will overflow. A common
mistake in this type of problem is to calculate P(z > z0 ) (0.0475) instead
of 1 − P(z > z0 ) (0.9525); referring to a sketch helps to catch this problem,
since it is obvious from the sketch that the probability should be greater
than 50%.
(c) We must find P(x1 < x < x2), where x1 = 191 mL and x2 = 209 mL. To do so using the z-tables, we must find the z-scores for both x1 and x2, and then use eqn. 1.17 to calculate the probability.
P(191 mL < x < 209 mL) = P((191 − 200)/15 < z < (209 − 200)/15)
                       = P(−0.6 < z < +0.6)
                       = 1 − P(z < −0.6) − P(z > +0.6)
                       = 1 − 2 · P(z > +0.6) = 1 − 2 · 0.2743
                       = 0.4514
So there is a 45.14% probability that a cup contains 191–209 mL.
(d) This question is a little different than the others. We must find a value x0
such that P(x < x0 ) = 0.25. In all of the previous examples, we began with
a value (or a range of values) and then calculated a probability; now we are
doing the reverse — we must calculate the value associated with a stated
probability. In both cases we use the z-tables, but in slightly different ways.
To begin, from the z-tables we must find a value z0 such that P(z < z0 ) =
0.25. Looking in the z-tables, we see that P(z > 0.67) = 0.2514 and P(z >
0.68) = 0.2483; thus, it appears that a value of 0.675 will give a right-tailed
area of approximately 0.25. Since we are looking for a left-tailed area, we
can state that
P(z < −0.675) ≈ 0.25
Our next task is to translate this z-score into a volume; in other words, we
want to “de-standardize” the value z0 = −0.675 to obtain x0 , the volume
that corresponds to this z-score. From eqn. 1.12 on page 18, we may write
x0 = µx + z0 · σx
= 200 + (−0.675)(15) mL
= 189.9 mL
Thus, we have determined that the drink volume will be less than 189.9 mL
with probability 25%.
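All four parts of this example can be checked in a few lines of Python. The sketch below (our own; it bypasses the z-tables entirely) uses statistics.NormalDist with µx = 200 mL and σx = 15 mL.

    from statistics import NormalDist

    vol = NormalDist(mu=200.0, sigma=15.0)    # dispensed volume, mL

    print(1.0 - vol.cdf(224.0))               # (a) P(x > 224 mL)            -> ~0.055
    print(1.0 - vol.cdf(175.0))               # (b) P(x > 175 mL)            -> ~0.952
    print(vol.cdf(209.0) - vol.cdf(191.0))    # (c) P(191 mL < x < 209 mL)   -> ~0.451
    print(vol.inv_cdf(0.25))                  # (d) volume below which 25% of drinks fall -> ~189.9 mL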
Example 1.5
The mean inside diameter of washers produced by a machine is 0.502 in,
and the standard deviation is 0.005 in. The purpose for which these
washers are intended allows a maximum tolerance in the diameter of
0.496–0.508 in; otherwise, the washers are considered defective. Determine the percentage of defective washers produced by the machine, assuming that the diameters are normally distributed.
We are looking for the probability that the washer diameter is either less
than 0.496 in or greater than 0.508 in. In other words, we want to calculate
the sum P(x < 0.496 in) + P(x > 0.508 in).
First we must calculate the z-scores of the two values x1 and x2 , where
x1 = 0.496 in and x2 = 0.508 in. Then we can use the z-table to determine
the desired probability.
z1 = (x1 − µx)/σx = (0.496 − 0.502)/0.005 = −1.2

z2 = (x2 − µx)/σx = (0.508 − 0.502)/0.005 = 1.2
We can see that z1 = −z2 ; in other words, the two tails have the same area.
Thus,
P(x < x1 ) + P(x > x2 ) = P(z < z1 ) + P(z > z2 )
= 2 · P(z > 1.2) = 2 · 0.1151
= 0.2302
Remember: x1 = 0.496 in and x2 = 0.508 in.
Thus, 23.02% of the washers produced by this machine are defective.
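The same approach reproduces the defective-washer fraction; the short Python check below is our own illustration.

    from statistics import NormalDist

    d = NormalDist(mu=0.502, sigma=0.005)              # washer diameter, inches
    p_defective = d.cdf(0.496) + (1.0 - d.cdf(0.508))  # both tails outside the tolerance
    print(p_defective)                                 # -> ~0.230, i.e., about 23% defective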
Aside: Excel Tip
Modern spreadsheet programs, such as MS Excel, contain a number of statistical functions, including functions that will integrate the normal probability distribution. In Excel, the functions NORMDIST and NORMSDIST will
perform these integrations for normal and standard normal distributions,
respectively. These can be especially useful in determining integration values that are not in z-tables, or when the tables are not readily available.
View the on-line help documentation in Excel for more information on how to use these functions. Note that both functions return left-tail areas rather than the right-tail areas used in this book. NORMSDIST was used in generating the z-table in the Appendix; in fact, all of the statistical tables were generated in Excel — other useful spreadsheet functions will be highlighted throughout this book.
Further Characteristics of Normally-Distributed Variables
Before leaving this section, consider the following characteristics of all random variables that follow a normal distribution.
• Approximately two-thirds of the time the random variable will be
within one standard deviation of the mean value; to be exact,
P(µx − σx < x < µx + σx ) = 0.6827
• There is approximately a 95% probability that the variable will be
within two standard deviations of the mean:
P(µx − 2σx < x < µx + 2σx ) = 0.9545
You should be able to use the z-tables to obtain these probabilities;
you might want to verify these numbers as an exercise.
These characteristics, which are shown in figure 1.11, are useful rules of
thumb to keep in mind when dealing with normally-distributed variables.
For example, a measurement that is five standard deviations above the
mean is not very likely, unless there is something wrong with the measuring device (or there is some other source of error).
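These two rules of thumb can be verified without the tables, as suggested above; a quick Python check (our own):

    from statistics import NormalDist

    Z = NormalDist()
    print(Z.cdf(1) - Z.cdf(-1))    # within one standard deviation  -> 0.6827
    print(Z.cdf(2) - Z.cdf(-2))    # within two standard deviations -> 0.9545
    print(1.0 - Z.cdf(5))          # five standard deviations above the mean -> ~3e-7, very unlikely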
Figure 1.11: Characteristic of normally-distributed variables: the shaded area represents the probability that a normally-distributed variable will assume a value within (a) one or (b) two standard deviations of the mean.

1.4.4 Advanced Topic: Other Continuous Probability Distributions

At this point, we have described several important probability distributions, along with the types of experiments that might result in these distributions.
• Bernoulli experiments (“coin tossing experiments”) are common and
their outcomes are described by the binomial probability distribution,
which is a discrete probability distribution.
• Counting experiments are also common, and these often result in variables described by a Poisson distribution, which is also a discrete distribution.
• Many continuous variables are adequately described by the Gaussian
(‘normal’) probability distribution.
Still, there are some situations that result in continuous variables that
cannot be described by a normal distribution. We will describe two other
continuous probability functions, but there are many more.
The Exponential Distribution
Let’s go back to counting experiments (see page 5). In this type of experiment, we are interested in counting the number of events that occur in a
unit of time or space. However, let’s say we change things around, as in the
following examples.
• We may count the number of photons detected per unit time (a discrete variable) or we may measure the time between detected photons
(a continuous variable).
• We may count the number of cells in a solution volume (a discrete
variable) or we may measure the distance between cells (a continuous
variable).
• We may count the number of molecules that react per unit time (i.e.,
the reaction rate, a discrete variable) or we may be interested in the
time between reactions (a continuous variable).
• We may count the number of cars present on a busy street (a discrete
variable) or we may measure the distance between the cars (a continuous variable).
Figure 1.12: Exponential probability distribution with µx = σx = 0.4 s. This distribution describes the time interval between α-particles emitted by a radioisotope;
see page 5 for more details.
And so on. We are essentially “flipping” the variable from events in a
given unit of time (or space) to the time (or space) between events. If the
discrete variable — the number of events — in these examples is described
well by a Poisson distribution, then the continuous variable is described by
the exponential probability density function.
p(x) = k e^(−kx)    (1.18)
If the number of counts follows a
Poisson distribution, then the
interval between counts follows
an exponential distribution.
where k is a characteristic of the experiment. In fact, for an exponentially
distributed variable the mean, median, and standard deviation are given by
µx = 1/k        Q2 = ln(2)/k        σx = 1/k

In certain applications, the mean of the exponential distribution is called the lifetime, τ, and the median is the half-life, t1/2.
The mean (and standard deviation) of the exponential distribution is
the inverse of the mean of the corresponding Poisson distribution. For example, we described an experiment on page 5 in which we were counting
α-particles emitted by a sample of a radioactive isotope at a mean rate of
2.5 counts/second. It only stands to reason that the mean time between detected α-particles would be 1/2.5 = 0.4 seconds. The corresponding exponential distribution is shown in figure 1.12.
Statistical tables for the exponential distribution are not often given
because the integral of the exponential probability density function is easy
to evaluate: the probability that x is between x1 and x2 is given by
P(x1 < x < x2) = e^(−x1/µx) − e^(−x2/µx)    (1.19)
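Eqn. 1.19 is simple enough to evaluate directly. A short Python sketch (our own) for the α-particle example, with µx = 0.4 s:

    from math import exp

    def p_between(x1, x2, mean):
        """P(x1 < x < x2) for an exponentially distributed interval with the given mean (eqn. 1.19)."""
        return exp(-x1 / mean) - exp(-x2 / mean)

    mean = 0.4                                   # mean time between detected alpha particles, s
    print(p_between(0.0, 0.4, mean))             # P(interval shorter than the mean) -> ~0.632
    print(p_between(0.4, float("inf"), mean))    # P(interval longer than the mean)  -> ~0.368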
Exponential probability distributions are common in chemistry, but you
may not be used to thinking of them as probability distributions. Anytime you come across a process that experiences an “exponential decay,”
26
1. Properties of Random Variables
chances are that you can think of the process in terms of a counting experiment. Examples of exponential decays are:
• the decrease in concentration in a chemical reaction (first-order rate
law);
• the decrease in light intensity as photons travel through an absorbing
medium (Beer’s Law);
• the decrease in the population in an excited energy state of an atom
or molecule (lifetime measurements).
All of these processes can be observed in a counting experiment, with characteristic Poisson and exponential distributions.
Atomic and Molecular Orbitals
Atomic and molecular orbitals
are simply probability
distributions describing the
position of electrons in atoms
and molecules, respectively.
Radial probability distribution
function for atomic 1s orbitals.
Atomic and molecular orbitals are probability density functions for the position of an electron in an atom or molecule. Such orbitals are sometimes
called electron density functions. They allow us to determine the probability that the electron will be found in a given position relative to the
nucleus. The different orbitals (e.g., 2s or 3px atomic orbitals) correspond
to different probability density functions.
The electron density functions actually contain three random variables,
since they give the probability that an electron is at any given point in
space. As such they are really examples of joint probability distributions of
the three random variables corresponding to the coordinate axes (e.g., x, y
and z in a Cartesian coordinate system). For spherically-symmetric orbitals,
it is convenient to rewrite the joint probability distribution in terms of a
single variable r , which is the distance of the electron from the nucleus. For
the 1s orbital, this probability density function (called a radial distribution
function) has the following form:
p(r) = k r² e^(−3r/µr)
where µr is the mean electron-nucleus distance and k is a normalization
constant that ensures that the integrated area of the function is one. The
values of µr and k will depend on the identity of the atom. The radial
density function for the hydrogen 1s atomic orbital is shown in figure 1.13.
On page 6 we observed that the energy of a molecule is a random variable that can be described by the Boltzmann probability distribution; now
we have encountered yet another property at the atomic/molecular scale
that must be considered a random variable. Electron position can also only
be described in terms of a probability distribution. Understanding the nature and properties of random variables and their probability distributions
thus has applications beyond statistical data analysis.
1.5 Summary and Skills
The single most important skill
developed in this chapter is the
ability to use z-tables to do
probability calculations involving
normally-distributed random
variables.
A random variable is a variable that cannot be predicted with absolute certainty, and must be described using a probability distribution. The location
of the probability distribution is well described by the mean, µx , of the
variable, and the inherent uncertainty in the variable is usually described
by its standard deviation, σx .
Figure 1.13: Radial distribution function of the hydrogen 1s orbital. The dotted
line at the mode indicates the Bohr radius, a0 , of the orbital, where a0 = 52.9 pm.
The mean radial distance µr for this orbital is 79.4 pm.
There are two general types of quantitative random variables: discrete
variables, which can only assume certain values (e.g., integers) and continuous variables. Examples of important discrete probability distributions
include the binomial and Poisson distributions — these functions allow one
to predict the outcome of Bernoulli and counting experiments, respectively.
Both of these types of experiments are quite common in science.
For continuous variables, the probability density function can be used
to find the probability that the variable is within a certain range of values,
P(x1 < x < x2 ). The most important family of probability density functions is the Gaussian, or normal, probability distribution. The standard
normal distribution specifically describes a normally-distributed variable
with µx = 0 and σx = 1; integration tables of the cumulative standard normal distribution (i.e., z-tables) can be used to calculate probabilities for any
normally-distributed variable.
Another important probability density function is the exponential distribution, which describes the interval between successive Poisson events
in a counting experiment.
Finally, properties of matter at the atomic/molecular scale must often
be described using probability distributions. In particular, molecular energy is a discrete random variable that may be described by the Boltzmann
probability distribution, and electron position is a continuous random variable whose probability density function is called an atomic (or molecular)
orbital. The Heisenberg Uncertainty Principle describes the relationship between the standard deviations of certain sets of random variables called
complementary variables.