eNote 2
Probability and simulation

Contents

2 Probability and simulation
2.1 Random variable
2.2 Discrete distributions
2.2.1 Mean and variance
2.2.2 Binomial distribution
2.2.3 Hypergeometric distribution
2.2.4 Poisson distribution
2.3 Continuous distributions
2.3.1 Uniform distribution
2.3.2 Normal distribution
2.3.3 Log-Normal distribution
2.3.4 Exponential distribution
2.4 Identities for the mean and variance
2.5 Covariance and correlation
2.5.1 Correlation
2.6 Independence of random variables
2.7 Functions of normal random variables
2.7.1 The χ² distribution
2.7.2 The t-distribution
2.7.3 The F distribution
2.8 Exercises
In this chapter elements from probability theory are introduced. These are needed to form the basic mathematical description of randomness: for example, for calculating the probabilities of outcomes in various types of experimental setups, e.g. dice rolls and lottery draws, as well as for natural phenomena such as the waiting time between radioactive decays. The theory is introduced together with illustrative R code examples, which the reader is encouraged to run and interact with while reading the text.
2.1 Random variable
The basic building blocks used to describe the random outcome of an experiment are introduced in this section. The definition of an experiment is quite broad: it can be an experiment carried out under controlled conditions, e.g. in a laboratory or flipping a coin, as well as an experiment under conditions which are not controlled, where for example a process is observed, e.g. observations of the GNP or measurements taken with a space telescope. Hence an experiment can be thought of as any setting in which the outcome cannot be fully known. This also includes measurement noise, i.e. random “errors” related to the system used to observe with, perhaps originating from noise in electrical circuits or small turbulence around the sensor. Measurements will always contain some noise.
First the sample space is defined.
Definition 2.1
The sample space S is the set of all possible outcomes of an experiment.
Example 2.2
Consider an experiment in which a person throws two paper balls with the purpose of hitting a wastebasket. All the possible outcomes form the sample space of this experiment as

S = {(miss, miss), (hit, miss), (miss, hit), (hit, hit)}    (2-1)
Now a random variable can be defined.
Definition 2.3
A random variable is a function which assigns a numerical value to each outcome in the sample space. In this book random variables are denoted with capital letters, e.g.

X, Y, . . .    (2-2)
Example 2.4
Continuing the paper ball example above, a random variable can be defined as the number
of hits, thus
X (miss,miss) = 0
(2-3)
X (hit,miss) = 1
(2-4)
X (miss,hit) = 1
(2-5)
X (hit,hit) = 2
(2-6)
In this case the random variable is a function which maps the sample space S to the non-negative integers, i.e. X : S → N0.
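As a small illustrative sketch (not part of the original example), the experiment and the random variable X can be simulated directly in R. The hit probability is not specified in the example, so a 50% chance per throw is assumed here purely for illustration.

## Sketch: simulate the paper ball experiment (assumed 50% hit chance per throw)
set.seed(1)
throws <- sample(c("miss","hit"), size=2, replace=TRUE, prob=c(0.5,0.5))
throws
## The random variable X assigns the number of hits to the outcome
x <- sum(throws == "hit")
x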
Remark 2.5
The random variable represents a value of the outcome before the experiment is carried out. Usually the experiment is carried out n times and a random variable is needed for each time

{ Xi : i = 1, 2, . . . , n }    (2-7)

After the experiment has been carried out n times a set of values of the random variable is available as

{ xi : i = 1, 2, . . . , n }    (2-8)

Each value is called a realization or observation of the random variable and is denoted with a small letter sub-scripted with an index i, as introduced in Chapter 1.
Finally, in order to quantify probability, a random variable is associated with a probability distribution. The distribution can either be discrete or continuous depending on the nature of the outcomes:
• Discrete outcomes can for example be: the outcome of a dice roll, the number of children per family or the number of failures of a machine per year. Hence some countable phenomenon which can be represented by an integer.
• Continuous outcomes can for example be: the weight of the yearly harvest, the time spent on homework each week or the electricity production per hour. Hence a phenomenon which can be represented by a continuous value.
Furthermore the outcome can either be unlimited or limited. This is most obvious in the discrete case, e.g. a dice roll is limited to the values between 1 and 6. However it is also often the case for continuous random variables, for example many are non-negative (weights, distances, etc.) and proportions are limited to the range between 0 and 1.
Conceptually there is no difference between the discrete and the continuous case; they are however easy to distinguish, since the formulas in the discrete case use sums while in the continuous case they use integrals. In the remainder of this chapter, first the discrete case is presented and then the continuous.
2.2 Discrete distributions
In this section discrete distributions and their properties are introduced. A discrete
random variable has discrete outcomes and follows a discrete distribution.
To exemplify, consider the outcome of one roll of a fair six-sided dice as the random
variable X fair . It has six possible outcomes, each with equal probability. This is specified
with the probability density function.
Definition 2.6
For a discrete random variable X the probability density function (pdf) is
f(x) = P(X = x)    (2-9)

It assigns a probability to every possible outcome value x.
A discrete pdf fulfills two properties: there must be no negative probabilities for any outcome value

f(x) ≥ 0 for all x    (2-10)

and the probability for all outcome values must sum to one

∑_{all x} f(x) = 1    (2-11)
For the fair dice the pdf is

x             1    2    3    4    5    6
f_Xfair(x)    1/6  1/6  1/6  1/6  1/6  1/6
If the dice is not fair, maybe it has been modified to increase the probability of rolling a
six, the pdf could for example be
x               1    2    3    4    5    6
f_Xunfair(x)    1/7  1/7  1/7  1/7  1/7  2/7
where X unfair is a random variable representing the value of a roll with the unfair dice.
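As a quick check (a small sketch, not from the eNote code files), the two pdf properties in Definition 2.6 can be verified numerically for both dice:

## Check the two pdf properties for the fair and the unfair dice
pdfFair <- rep(1/6, 6)
pdfUnfair <- c(rep(1/7,5), 2/7)
all(pdfFair >= 0)
sum(pdfFair)
all(pdfUnfair >= 0)
sum(pdfUnfair)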
Remark 2.7
Note that the pdfs have a subscript with the symbol of the random variable to which they belong. This is done when there is a need to distinguish between pdfs, e.g. for several random variables. For example, if two random variables X and Y are used in the same context, then f_X(x) is the pdf for X and f_Y(x) for Y; similarly the sample standard deviation s_X is for X and s_Y is for Y, and so forth.
With the pdf defined the experiment can easily be simulated, i.e. instead of carrying out
the experiment in reality it is carried out using a model on the computer. When the
simulation includes generating random numbers it is called a stochastic simulation.
Example 2.8
Let’s simulate the experiment of rolling a dice using the following R code (open the file
eNote2-ProbabilitySimulation.R and try it)
## Simulate rolling a dice
## Make a random draw from (1,2,3,4,5,6) with equal probability for each outcome
sample(1:6, size=1)
The simulation becomes more interesting when the experiment is repeated many times, since then we can calculate the empirical density function (or empirical pdf) as a discrete histogram and actually “see” the shape of the pdf:
## Simulate a fair dice
## Number of simulated realizations
n <- 30
## Draw independently from the set (1,2,3,4,5,6) with equal probability
xFair <- sample(1:6, size=n, replace=TRUE)
## Count the number of each outcome using the table function
table(xFair)
## Plot the empirical pdf
plot(table(xFair)/n)
## Add the pdf to the plot
lines(rep(1/6,6), type="h", col="red")

[Figure: the empirical pdf (black) and the pdf (red) of the fair dice, Density vs. x]
Try simulating with different numbers of rolls n and describe how this affects the accuracy of the empirical pdf compared to the pdf.
Now repeat this with the unfair dice
## Simulate an unfair dice
## Number of simulated realizations
n <- 30
## Draw independently from the set (1,2,3,4,5,6) with higher probability for a six
xUnfair <- sample(1:6, size=n, replace=TRUE, prob=c(rep(1/7,5),2/7))
## Plot the empirical density function
plot(table(xUnfair)/n)
## Add the pdf to the plot
lines(c(rep(1/7,5),2/7), type="h", col="red")

[Figure: the empirical pdf (black) and the pdf (red) of the unfair dice, Density vs. x]
Compare the fair and the unfair dice simulations:
How did the empirical pdf change?
By simply observing the empirical pdf, can we be sure to distinguish between the fair and the unfair dice?
How does the number of rolls n affect how well we can distinguish the two dice?
One reason to simulate becomes quite clear here: it would take considerably more time to actually carry out these experiments. Furthermore, sometimes calculating the theoretical properties of random variables (e.g. products of several random variables, etc.) can be impossible, and simulations can be an easy way to obtain results.
Random number sequences generated with software algorithms behave like real random numbers, e.g. they appear independent, but are in fact deterministic sequences depending on a seed, which sets the initial value of the sequence. Therefore they are called pseudo random numbers: they behave like, and are used as, random numbers in simulations, but are in fact deterministic sequences.
Remark 2.9
In R the initial values can be set with a single number called the seed as demonstrated
with the following R code. As default the seed is created from the time of start-up of
a new instance of R. A way to generate truly (i.e. non-pseudo) random numbers can
be to sample some physical phenomena, for example atmospheric noise as done at
www.random.org.
## The random numbers generated depends on the seed
## Set the seed
set.seed(127)
## Generate a (pseudo) random sequence
sample(1:10)
[1]  3  1  2  8  7  4 10  9  6  5

## Generate again and see that new numbers are generated
sample(1:10)

[1]  9  3  7  2 10  8  6  1  5  4

## Set the seed and the same numbers as before just after the
## seed was set are generated
set.seed(127)
sample(1:10)

[1]  3  1  2  8  7  4 10  9  6  5
The cumulative distribution function (cdf), or simply the distribution function, is often used.
Definition 2.10    The cdf
The cumulative distribution function (cdf) for the discrete case is the probability of realizing an outcome below or equal to the value x

F(x) = P(X ≤ x) = ∑_{j where x_j ≤ x} f(x_j) = ∑_{j where x_j ≤ x} P(X = x_j)    (2-12)
The probability that the outcome of X is in a range is
P( a < X ≤ b) = F (b) − F ( a)
(2-13)
For the fair dice the probability of an outcome below or equal to 4 can be calculated

F_Xfair(4) = ∑_{j=1}^{4} f_Xfair(x_j) = 1/6 + 1/6 + 1/6 + 1/6 = 2/3    (2-14)
Example 2.11
Simulate again 30 rolls of the dice and see the empirical distribution function
## Simulate a fair dice
## Number of realizations
n <- 30
## Draw independently from the set (1,2,3,4,5,6) with equal probability
xFair <- sample(1:6, size=n, replace=TRUE)
## Calculate the empirical cdf
cumsum(table(xFair))
## Plot a discrete histogram
plot(as.table(cumsum(table(xFair)))/n, lwd=10)
## Add the cdf
lines(cumsum(rep(1/6,6)), type="h", col="red")

[Figure: the empirical cdf (black) and the cdf (red) of the fair dice, Distribution vs. x]
2.2.1 Mean and variance
In Chapter 1 the sample mean and the sample variance were introduced. They indicate, respectively, the centering and the spread of data, i.e. of a sample. In this section the mean and variance are introduced. They are properties of the distribution of a random variable and are called population parameters. The mean indicates where the distribution is centered, while the variance indicates the spread of the distribution.
Mean and expected value
The mean (µ) of a random variable is the population parameter which most statistical analyses focus on. It is also defined as a function E(X): the expected value of the random variable X. There is no difference between the two, only that usually the mean is used for a single random variable and the expected value when dealing with functions of multiple random variables (for example E(Y) = E(a1 X1 + a2 X2) is used instead of µ_Y = µ_{a1 X1 + a2 X2}).
Definition 2.12
Mean value
The mean of a discrete random variable X is
µ = E(X) = ∑_{i=1}^{∞} x_i f(x_i)    (2-15)
where xi is the value and f ( xi ) is the probability that X takes the outcome value xi .
The mean is simply the weighted average over all possible outcome values, weighted with the corresponding probabilities. As indicated in the definition there might be infinitely many possible outcome values; hence, even though the total sum of probabilities is one, the probabilities must go to zero sufficiently fast for increasing values of X in order for the sum to be defined.
Example 2.13
For the fair dice the mean is calculated by
µ_Xfair = E(X_fair) = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + 4 · 1/6 + 5 · 1/6 + 6 · 1/6 = 3.5    (2-16)
for the unfair dice the mean is
µ_Xunfair = E(X_unfair) = 1 · 1/7 + 2 · 1/7 + 3 · 1/7 + 4 · 1/7 + 5 · 1/7 + 6 · 2/7 ≈ 3.86    (2-17)
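As a quick check (a small sketch, not part of the original text), the two means in Equations (2-16) and (2-17) can be computed as weighted sums in R:

## The mean of the fair and the unfair dice as weighted sums
sum((1:6) * rep(1/6,6))
sum((1:6) * c(rep(1/7,5), 2/7))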
After an experiment has been carried out n times then the sample mean or average can be
calculated as previously defined in Chapter 1.
µ̂ = x̄ = (1/n) ∑_{i=1}^{n} x_i    (2-18)
It is called a statistic, which means that it is calculated from a sample. Note the use of a hat in the notation over µ: this indicates that it is an estimate of the real underlying mean.
Our intuition tells us that the estimate (µ̂) will be close to the true underlying expectation (µ) when n is large. This is indeed the case; to be more specific, E[(1/n) ∑ Xi] = µ (when E[Xi] = µ), and we say that the average is a central estimator for the expectation. The exact quantification of these qualitative statements will be covered in Chapter 3.
Now let us play around a little with the mean and the sample mean in some simulations.
Example 2.14
Carrying out the experiment more than one time an estimate of the mean, i.e. the sample
mean, can be calculated. Simulate rolling the fair dice
## Simulate a fair dice
## Number of realizations
n <- 30
## Simulate rolls with a fair dice
xFair <- sample(1:6, size=n, replace=TRUE)
## Calculate the sample mean
sum(xFair)/n
## or
mean(xFair)
Let us see what happens with the sample mean of the unfair dice by simulating the same
number of rolls
## Simulate an unfair dice
## n realizations
xUnfair <- sample(1:6, size=n, replace=TRUE, prob=c(rep(1/7,5),2/7))
## Calculate the sample mean
mean(xUnfair)
Consider the mean of the unfair dice and compare it to the mean of the fair dice (see
Equation (2-16) and (2-17)). Is this in accordance with your simulation results?
Let us again turn to how much we can “see” from the simulations and the impact of the number of realizations n on the estimation. In statistics the term information is used to refer to how much information is embedded in the data, and therefore how accurately different properties (parameters) can be estimated from the data.
Repeat the simulations several times with n = 30. By simply comparing the sample means from a single simulation, can it then be determined if the two means really are different?
Repeat the simulations several times and increase n. What happens to the ’accuracy’ of the sample mean compared to the real mean? And thereby how well can it be inferred whether the sample means are different?
Does the information embedded in the data increase or decrease when n is increased?
Variance and standard deviation
The second most used population parameter is the variance (or standard deviation). It is
a measure describing the spread of the distribution, more specifically the spread away
from the mean.
Definition 2.15    Variance
The variance of a discrete random variable X is

σ² = Var(X) = E[(X − µ)²]    (2-19)

The variance is the square of the standard deviation σ.
The variance is the expected value (i.e. the probability-weighted average) of the squared distance between the outcome and the mean value.
Remark 2.16
Notice that the variance cannot be negative.
Notice that the variance is the expected value of a function, namely the squared distance between the random variable X and its mean. It is thus calculated by

σ² = ∑_{i=1}^{∞} (x_i − µ)² f(x_i)    (2-20)

where x_i is the outcome value and f(x_i) is the pdf of the ith outcome value.

The standard deviation is measured on the same scale (same units) as the random variable, which is not the case for the variance. Therefore the standard deviation is much easier to interpret when communicating the spread of a distribution.
Consider how the expected value is calculated in Equation (2-15). One can
think of the squared distance as a new random variable that has an expected
value which is the variance of X.
Example 2.17
The variance of rolls with the fair dice is
σ²_Xfair = E[(X_fair − µ_Xfair)²]
         = (1 − 3.5)² · 1/6 + (2 − 3.5)² · 1/6 + (3 − 3.5)² · 1/6 + (4 − 3.5)² · 1/6 + (5 − 3.5)² · 1/6 + (6 − 3.5)² · 1/6
         = 70/24 ≈ 2.92    (2-21)
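As a quick check (a small sketch), the weighted sum in Equation (2-21) can be evaluated directly in R:

## The variance of the fair dice as a weighted sum of squared distances
muFair <- sum((1:6) * 1/6)
sum((1:6 - muFair)^2 * 1/6)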
After an experiment has been carried out n times the sample variance can be calculated
as defined previously by
s² = σ̂² = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)²    (2-22)

and hence thereby also the sample standard deviation s.
Again our intuition tells us that the statistic (e.g. the sample variance) should in some sense converge to the true variance. This is indeed the case, and we call the sample variance a central estimator for the true underlying variance. This convergence will be quantified for a special case in Chapter 3.
The sample variance is calculated by:
• take the sample mean (average of the observations): x̄
• take the distance for each sample: xi − x̄
• and finally take the average of the squared distances (using n − 1 in the
denominator, see Chapter 1)
Example 2.18
Return to the simulations. First calculate the sample variance from n rolls of a fair dice
## Simulate a fair dice and calculate the sample variance
## Number of realizations
n <- 30
## Simulate
xFair <- sample(1:6, size=n, replace=TRUE)
## Calculate the distance for each sample to the sample mean
distances <- xFair - mean(xFair)
## Calculate the average of the squared distances
sum(distances^2)/(n-1)
## Or use the built in function
var(xFair)
Let us then try to play with variance in the dice example. Let us now consider a four-sided
dice. The pdf is
x                1    2    3    4
f_XfairFour(x)   1/4  1/4  1/4  1/4
Plot the pdf for both the six-sided dice and the four-sided dice
## Plot the pdf of the six-sided dice and the four-sided dice
## Plot them
plot(rep(1/6,6), type="h", col="red")
plot(rep(1/4,4), type="h", col="blue")

[Figure: the pdf of the six-sided dice (left, red) and of the four-sided dice (right, blue)]

## Calculate the means and variances of the dices
## The means
muXSixsided <- sum((1:6)*1/6) # Six-sided
muXFoursided <- sum((1:4)*1/4) # Four-sided
## The variances
sum((1:6-muXSixsided)^2*1/6)
sum((1:4-muXFoursided)^2*1/4)

Which dice outcome has the highest variance? Is that as you had anticipated?
2.2.2 Binomial distribution
The binomial distribution is a very important discrete distribution which appears in many applications; it is presented in this section. In statistics it is typically used for proportions, as described in Chapter 8.
If an experiment has two possible outcomes (e.g. failure or success, no or yes, 0 or 1) and is repeated more than one time, then the number of successes is binomial distributed. For example, the number of heads obtained after a certain number of flips with a coin. Each repetition must be independent; this is called successive draws with replacement (think of drawing notes from a hat, where after each draw the note is put back again, i.e. the drawn note is replaced).
Definition 2.19
Binomial distribution
Let the random variable X be binomial distributed
X ∼ B(n, p)
(2-23)
where n is number of independent draws and p is the probability of a success in
each draw.
The binomial pdf describes the probability of obtaining x successes

f(x; n, p) = P(X = x) = (n choose x) · p^x · (1 − p)^(n − x)    (2-24)

where the binomial coefficient

(n choose x) = n! / (x!(n − x)!)    (2-25)

is the number of distinct sets of x elements which can be chosen from a set of n elements. Remember that n! = n · (n − 1) · . . . · 2 · 1.
Theorem 2.20    Mean and variance
The mean of a binomial distributed random variable is

µ = np    (2-26)

and the variance is

σ² = np(1 − p)    (2-27)
Actually this can be proved by calculating the mean using Definition 2.12 and the variance using Definition 2.15.
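As a small sketch (not part of the original text), the two formulas can be checked numerically for n = 10 and p = 1/2 by applying Definitions 2.12 and 2.15 directly to the binomial pdf:

## Check the binomial mean and variance using the definitions of mean and variance
n <- 10; p <- 1/2
x <- 0:n
sum(x * dbinom(x, size=n, prob=p))            ## equals n*p = 5
sum((x - n*p)^2 * dbinom(x, size=n, prob=p))  ## equals n*p*(1-p) = 2.5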
Example 2.21
The binomial distribution for 10 flips with a coin describes the probabilities of getting x heads (or equivalently tails):
## Simulate a binomial distributed experiment
## Number of flips
nFlips <- 10
## The possible outcomes are (0,1,...,nFlips)
xSeq <- 0:nFlips
## Use the dbinom() function which returns the pdf, see ?dbinom
pdfSeq <- dbinom(xSeq, size=nFlips, prob=1/2)
## Plot the density
plot(xSeq, pdfSeq, type="h")

[Figure: the binomial pdf for 10 coin flips, Density vs. x]
Example 2.22
In the previous examples successive rolls of a dice were simulated. If a random variable X_six which counts the number of sixes obtained is defined, it follows a binomial distribution

## Simulate 30 successive dice rolls
Xfair <- sample(1:6, size=30, replace=TRUE)
## Count the number of sixes obtained
sum(Xfair==6)
## This is equivalent to
rbinom(1, size=30, prob=1/6)
Remark 2.23
In R, implementations of many different distributions are available. For each distribution at least the following is available
• The pdf is available by preceding with ’d’, e.g. dbinom
• The cdf is available by preceding with ’p’, e.g. pbinom
• The quantiles by preceding with ’q’
• and random number generation by preceding with ’r’
See for example the help with ?dbinom.
2.2.3 Hypergeometric distribution
The hypergeometric distribution describes the number of successes from successive draws without replacement.
Definition 2.24
Hypergeometric distribution
Let the random variable X be the number of successes in n draws without replacement. Then X follows the hypergeometric distribution
X ∼ H(n, a, N)    (2-28)

where a is the number of successes in the population of N elements. The probability of obtaining x successes is described by the hypergeometric pdf

f(x; n, a, N) = P(X = x) = (a choose x) · (N − a choose n − x) / (N choose n)    (2-29)

The notation

(a choose b) = a! / (b!(a − b)!)    (2-30)

represents the number of distinct sets of b elements which can be chosen from a set of a elements.
Theorem 2.25    Mean and variance
The mean of a hypergeometric distributed random variable is

µ = n · a/N    (2-31)

and the variance is

σ² = n · a(N − a)(N − n) / (N²(N − 1))    (2-32)
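As a small simulation sketch (with arbitrarily chosen values, not from the eNote), the two formulas can be compared with the sample mean and variance of simulated hypergeometric draws:

## Compare simulated draws with the theoretical mean and variance
n <- 25; a <- 8; N <- 90
x <- rhyper(10000, m=a, n=N-a, k=n)
mean(x); n*a/N
var(x); n*a*(N-a)*(N-n) / (N^2*(N-1))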
Example 2.26
A lottery drawing is a good example where the hypergeometric distribution can be applied.
The numbers from 1 to 90 are put in a bowl and randomly drawn without replacement (i.e.
without putting back the number when it has been drawn). Say that you have the sheet with
8 numbers and want to calculate the probability of getting all 8 numbers in 25 draws.
## The probability of getting x numbers of the lottery sheet in 25 drawings
## Number of successes in the population
a <- 8
## Size of the population
N <- 90
## Number of draws
n <- 25
## Plot the pdf, note that the parameters names are different in the R function
plot(0:8, dhyper(x=0:8,m=a,n=N-a,k=n), type="h")

[Figure: the hypergeometric pdf with n = 25, a = 8, N = 90, Density vs. x]
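The question asked in the example, the probability of getting all 8 numbers, can be read off as the last point of the plot; as a small addition (not in the original code) it can also be computed directly with the pdf, reusing the values of a, N and n defined above:

## The probability of getting all 8 numbers in the 25 draws
dhyper(x=8, m=a, n=N-a, k=n)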
2.2.4 Poisson distribution
The Poisson distribution describes the probability of a given number of events occurring in a fixed interval if these events occur with a known average rate and independently of the distance to the last event. Often it is events in a time interval, but it can as well be counts in other intervals, e.g. of distance, area or volume. In statistics the Poisson distribution is usually applied for analyzing, for example, counts of arrivals, traffic, failures and breakdowns.
Definition 2.27
Poisson distribution
Let the random variable X be Poisson distributed
X ∼ Po(λ)
(2-33)
where λ is the rate (or intensity): the average number of events per interval. The
Poisson pdf describes the probability of x events in an interval
f(x; λ) = (λ^x / x!) e^(−λ)    (2-34)
Example 2.28
The Poisson distribution is typically used to describe phenomena such as:
• the number of radioactive particle decays per time interval, i.e. the number of clicks per time interval of a Geiger counter
• calls to a call center per time interval (λ does vary over the day)
• the number of mutations in a given stretch of DNA after a certain amount of radiation
• goals scored in a soccer match
One important feature is that the rate can be scaled, such that probabilities of occurrences in other interval lengths can be calculated. Usually the rate is subscripted with the interval length; for example the hourly rate is denoted λ_hour and can be scaled to the minutely rate by

λ_minute = λ_hour / 60    (2-35)

such that the probabilities of x events per minute can be calculated with the Poisson pdf with rate λ_minute.
Example 2.29    Poisson rate scaling
You are enjoying a soccer match. Assume that the scoring of goals in the league is a Poisson process and that on average 3.4 goals are scored per match. Calculate the probability that no goals will be scored while you leave the match for 10 minutes.
Let λ_90minutes = 3.4 be the goals per match and scale this to the 10 minute rate by

λ_10minutes = λ_90minutes / 9 = 3.4/9    (2-36)

Let X be the number of goals in a 10 minute interval and use this to calculate the probability of no events in a 10 minute interval by

P(X = 0) = f(0, λ_10minutes) ≈ 0.685    (2-37)

which was found with the R code
which was found with the R code
## Probability of no goals in 10 minutes
## The Poisson pdf
dpois(x=0, lambda=3.4/9)
Theorem 2.30    Mean and variance
A Poisson distributed random variable X has the rate λ as both its mean

µ = λ    (2-38)

and its variance

σ² = λ    (2-39)
Example 2.31
Poisson process
Simulate a Poisson random variable to see the Poisson distribution
## Simulate a Poisson random variable
## The mean rate of events per interval
lambda <- 4
## Number of realizations
n <- 1000
## Simulate
x <- rpois(n, lambda)
## Plot the empirical pdf
plot(table(x)/n)
## Add the pdf to the plot
lines(0:20, dpois(0:20,lambda), type="h", col="red")

[Figure: the empirical pdf (black) and the Poisson pdf (red), Density vs. x]
2.3 Continuous distributions
If an outcome of an experiment takes a continuous value, for example: a distance, a
temperature, a weight, etc., then it is represented by a continuous random variable.
Definition 2.32
Density and probabilities
The pdf of a continuous random variable X is a non-negative function for all possible
outcomes
f ( x ) ≥ 0 for all x
(2-40)
and has an area below the function of one

∫_{−∞}^{∞} f(x) dx = 1    (2-41)

It defines the probability of observing an outcome in the range from a to b by

P(a < X ≤ b) = ∫_a^b f(x) dx    (2-42)
For the discrete case the probability of observing an outcome x is equal to the pdf of x,
but this is not the case for a continuous random variable, where
P(X = x) = P(x < X ≤ x) = ∫_x^x f(u) du = 0    (2-43)
i.e. the probability for a continuous random variable to be realized at a single number
P( X = x ) is zero.
The plot in Figure 2.1 shows how the area below the pdf represents the probability of observing an outcome in a range. Note that the normal distribution is used for the examples here; it is introduced below in Section 2.3.2.
Figure 2.1: The probability of observing the outcome of X in the range between a and b
is the area below the pdf spanning the range, as illustrated with the colored area.
Definition 2.33
Distribution
The cdf of a continuous variable is defined by
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du    (2-44)

and has the properties (in both the discrete and continuous case): the cdf is non-decreasing and

lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1    (2-45)

The relation between the cdf and the pdf is

P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx    (2-46)

as illustrated in Figure 2.2.

Figure 2.2: The probability of observing the outcome of X in the range between a and b is the distance between F(a) and F(b).
Definition 2.34
Mean and variance
For a continuous random variable the mean or expected value is
µ = E(X) = ∫_{−∞}^{∞} x f(x) dx    (2-47)

hence, similarly to the discrete case, the outcome is weighted with the pdf.
The variance is

σ² = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x) dx    (2-48)
The differences between the discrete and the continuous case can be summed up in two points:
• In the continuous case integrals are used, in the discrete case sums are used.
• In the continuous case the probability of observing a single value is always zero.
In the discrete case it can be positive or zero.
2.3.1 Uniform distribution
A random variable following the uniform distribution has equal density at any value
within a defined range.
Definition 2.35
Uniform distribution
Let X be a uniform distributed random variable
X ∼ U(α, β)    (2-49)

where α and β define the range of possible outcomes. It has the pdf

f(x) = 1/(β − α) for x ∈ [α, β], and f(x) = 0 otherwise    (2-50)

The uniform cdf is

F(x) = 0 for x < α,  (x − α)/(β − α) for x ∈ [α, β),  1 for x ≥ β    (2-51)

In Figure 2.3 the uniform pdf and cdf are plotted.
Figure 2.3: The uniform distribution pdf and cdf.
Theorem 2.36    Mean and variance of the uniform distribution
The mean of a uniform distributed random variable X is

µ = (α + β)/2    (2-52)

and the variance is

σ² = (β − α)²/12    (2-53)
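As a small simulation sketch (with illustrative values α = 2 and β = 5, not from the eNote), the two formulas can be compared with the sample mean and variance of simulated uniform realizations:

## Check the uniform mean and variance by simulation
alpha <- 2; beta <- 5
x <- runif(10000, min=alpha, max=beta)
mean(x); (alpha + beta)/2
var(x); (beta - alpha)^2/12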
2.3.2 Normal distribution
The normal distribution is, for many reasons, the most famous continuous distribution. It is often also called the Gaussian distribution. The normal distribution appears naturally for many phenomena and is therefore used in a great many applications, as will become apparent in later chapters of the eNotes.
Definition 2.37
Normal distribution
Let X be a normal distributed random variable
X ∼ N (µ, σ2 )
(2-54)
where µ is the mean and σ2 is the variance (remember that the standard deviation is
σ). Note that the two parameters are actually the mean and variance of X.
It follows the normal pdf

f(x) = 1/(σ√(2π)) · e^(−(x − µ)²/(2σ²))    (2-55)

and the normal cdf

F(x) = 1/(σ√(2π)) ∫_{−∞}^{x} e^(−(u − µ)²/(2σ²)) du    (2-56)

Theorem 2.38    Mean and variance
The mean of a normal distributed random variable is

µ    (2-57)

and the variance is

σ²    (2-58)

Hence simply the two parameters defining the distribution.
Example 2.39
Let us play with the normal pdf

## Play with the normal distribution
## The mean and standard deviation
muX <- 0
sigmaX <- 1
## A sequence of x values
xSeq <- seq(-6, 6, by=0.1)
##
pdfX <- 1/(sigmaX*sqrt(2*pi)) * exp(-(xSeq-muX)^2/(2*sigmaX^2))
## Plot the pdf
plot(xSeq, pdfX, type="l")

[Figure: the normal pdf calculated above, pdfX vs. xSeq]
Try with different values of the mean and standard deviation. Describe how this changes the position and spread of the pdf.
Theorem 2.40
A linear combination of independent normal distributed random variables is also
normal distributed.
Use the mean and variance identities introduced in Section 2.4 to find the mean and
variance of the linear combination as exemplified here:
Example 2.41
Consider two normal distributed random variables

X1 ∼ N(µ_X1, σ²_X1) and X2 ∼ N(µ_X2, σ²_X2)    (2-59)

The difference

Y = X1 − X2    (2-60)

is normal distributed

Y ∼ N(µ_Y, σ²_Y)    (2-61)

where the mean is

µ_Y = µ_X1 − µ_X2    (2-62)

and

σ²_Y = σ²_X1 + σ²_X2    (2-63)

where the mean and variance identities introduced in Section 2.4 have been used.
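As a small simulation sketch (with illustrative parameter values, not from the eNote), the result of the example can be checked by simulating the difference of two independent normal random variables:

## Check Example 2.41 by simulation
x1 <- rnorm(10000, mean=3, sd=2)
x2 <- rnorm(10000, mean=1, sd=1)
y <- x1 - x2
mean(y)   ## close to 3 - 1 = 2
var(y)    ## close to 2^2 + 1^2 = 5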
Standard normal distribution
Definition 2.42
Standard normal distribution
The standard normal distribution is the normal distribution with zero mean and
unit variance
Z ∼ N(0, 1)    (2-64)

where Z is the standardized normal random variable.
Historically, before the widespread use of computers, the standardized random variables were used a lot, since it was not possible to easily evaluate the pdf and cdf; instead they were looked up in tables for the standardized distribution. This was practical since the transformation into the standardized distribution requires only a few simple operations.
Theorem 2.43    Transformation to the standardized normal random variable
A normal distributed random variable X can be transformed into a standardized normal random variable by

Z = (X − µ)/σ    (2-65)

Example 2.44
Percentiles in the standard normal distribution
The most used percentiles (or quantiles) in the standard normal distribution are

Percentile   Quantile   Value
1%           0.01       -2.33
2.5%         0.025      -1.96
5%           0.05       -1.64
25%          0.25       -0.67
75%          0.75        0.67
95%          0.95        1.64
97.5%        0.975       1.96
99%          0.99        2.33
Note that the values can be considered as standard deviations (i.e. for the standardized normal Z we have σ_Z = 1), which holds for any normal distribution.
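As a small sketch (not part of the original example), the table values can be reproduced with qnorm, and a quantile of any normal distribution can be obtained from the standardized quantile as µ + σ·z; the values µ = 10 and σ = 2 below are chosen only for illustration:

## The standard normal quantiles from the table
qnorm(c(0.01, 0.025, 0.05, 0.25, 0.75, 0.95, 0.975, 0.99))
## A quantile of a general normal distribution via the standardized quantile
mu <- 10; sigma <- 2
qnorm(0.975, mean=mu, sd=sigma)
mu + sigma * qnorm(0.975)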
The most used quantiles are marked on the plot:

[Figure: the standard normal pdf with the quantiles q0.01, q0.025, q0.05, q0.25, q0.75, q0.95, q0.975 and q0.99 marked, Density vs. z (σ)]

Note that the units on the x-axis are in standard deviations.
Normal pdf details
In order to get insight into how the normal distribution is formed consider the following
steps. In Figure 2.4 the result of each step is plotted.
1. Take the distance to the mean: x − µ
2. Square the distance: (x − µ)²
3. Make it negative and scale it: −(x − µ)²/(2σ²)
4. Take the exponential: e^(−(x − µ)²/(2σ²))
5. Finally, scale it to have an area of one: 1/(σ√(2π)) · e^(−(x − µ)²/(2σ²))
Figure 2.4: The steps involved in calculating the normal distribution pdf.

Example 2.45
In R, functions for many distributions are implemented. For the normal distribution the following functions are available:
## Normal distribution functions
## Do it for a sequence of x values
xSeq <- seq(-10, 10, by=0.1)
## The pdf
dnorm(xSeq, mean=0, sd=1)
## The cdf
pnorm(xSeq, mean=0, sd=1)
## The quantiles
qnorm(c(0.01,0.025,0.05,0.5,0.95,0.975,0.99), mean=0, sd=1)
## Generate random normal distributed realizations
rnorm(100, mean=0, sd=1)
## Calculate the probability that the outcome of X is between a and b
a <- 0.2
b <- 0.8
pnorm(b) - pnorm(a)
## See more details by running "?dnorm"
Use the functions to make a plot of the normal pdf with marks of the
2.5%, 5%, 95%, 97.5% percentiles.
Make a plot of the normal pdf and a histogram (empirical pdf) of 100 simulated realizations.
2.3.3 Log-Normal distribution
If a random variable is log-normal distributed then its logarithm is normally distributed.
Definition 2.46
Log-Normal distribution
A log-normal distributed random variable

X ∼ LN(α, β²)    (2-66)

where α is the mean and β² is the variance of the normal distribution obtained when taking the natural logarithm of X.
The log-normal pdf is

f(x) = 1/(x√(2π) β) · e^(−(ln x − α)²/(2β²))    (2-67)
The log-normal distribution occurs in many fields, in particular: biology, finance and
many technical applications.
Theorem 2.47
Mean and variance of log-normal distribution
The mean of the log-normal distribution is

µ = e^(α + β²/2)    (2-68)

and the variance is

σ² = e^(2α + β²) (e^(β²) − 1)    (2-69)
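As a small simulation sketch (with illustrative values α = 0 and β = 0.5, not from the eNote), the two formulas can be compared with the sample mean and variance of simulated log-normal realizations:

## Check the log-normal mean and variance by simulation
alpha <- 0; beta <- 0.5
x <- rlnorm(100000, meanlog=alpha, sdlog=beta)
mean(x); exp(alpha + beta^2/2)
var(x);  exp(2*alpha + beta^2) * (exp(beta^2) - 1)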
2.3.4 Exponential distribution
The usual application of the exponential distribution is for describing the length (usually time) between events which, when counted, follows a Poisson distribution, see
Section 2.2.4. Hence the length between events which occur continuously and independently at a constant average rate.
Definition 2.48
Exponential distribution
Let X be an exponential distributed random variable
X ∼ Exp(λ)
(2-70)
where λ is the average rate of events.
It follows the exponential pdf

f(x) = λ e^(−λx) for x ≥ 0, and f(x) = 0 for x < 0    (2-71)
Theorem 2.49    Mean and variance of exponential distribution
The mean of an exponential distribution is

µ = 1/λ    (2-72)

and the variance is

σ² = 1/λ²    (2-73)
Example 2.50
Exponential distributed time intervals
Simulate a Poisson process by generating exponential distributed time intervals
## Simulate exponential waiting times
## The rate parameter: events per time
lambda <- 4
## Number of realizations
n <- 1000
## Simulate
x <- rexp(n, lambda)
## The empirical pdf
hist(x, probability=TRUE)
## Add the pdf to the plot
xseq <- seq(0.001, 10, len=1000)
lines(xseq, dexp(xseq,lambda), col="red")
[Figure: histogram (empirical pdf) of the simulated exponential waiting times with the exponential pdf (red), Density vs. x]
Furthermore, check that the events, when counted in fixed-length intervals, follow a Poisson distribution.
## Check that the exponential distributed intervals form a Poisson process
## by counting the events in each interval
## Sum up to get the running time
xCum <- cumsum(x)
## Use the hist function to count in intervals between the breaks,
## here 0,1,2,...
tmp <- hist(xCum, breaks=0:ceiling(max(xCum)))
## Plot the discrete empirical pdf
plot(table(tmp$counts)/length(tmp$counts))
## Add the Poisson pdf to the plot
lines(0:20, dpois(0:20,lambda), type="h", col="red")
[Figure: the empirical pdf of the counts per interval with the Poisson pdf (red), Density vs. x]

Figure 2.5: Exponential distributed time intervals in a simulated Poisson process.
2.4 Identities for the mean and variance
Rules for calculation of the mean and variance of linear combinations of independent
random variables are introduced here. They are valid for both the discrete and continuous case.
Theorem 2.51
Mean and variance of linear functions
Let Y = aX + b then
E(Y ) = E( aX + b) = aE( X ) + b
(2-74)
Var(Y ) = Var( aX + b) = a2 Var( X )
(2-75)
and
Random variables are often scaled, i.e. aX, for example when shifting units.
Example 2.52
A bike shop sells 100 bikes per month on average, varying with a standard deviation of 15. They earn 200 Euros per bike. What are their average earnings per month?
Let X be the number of bikes sold per month. On average they sell µ = 100 bikes per month and it varies with a variance of σ²_X = 225. The shop's monthly earnings Y then have a mean and standard deviation of

E(Y) = E(200X) = 200 E(X) = 200 · 100 = 20000 Euro/month    (2-76)

σ_Y = √Var(Y) = √Var(200X) = √(200² Var(X)) = √(40000 · 225) = 3000 Euro/month    (2-77)
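As a small simulation sketch (the example does not state a distribution for X, so a normal distribution with mean 100 and standard deviation 15 is assumed here purely for illustration), the identities can be checked numerically:

## Check Example 2.52 by simulation
x <- rnorm(10000, mean=100, sd=15)
y <- 200 * x
mean(y)   ## close to 20000 Euro/month
sd(y)     ## close to 3000 Euro/month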
Theorem 2.53    Mean and variance of linear combinations
The mean of a linear combination of random variables (Xi) is

E(a1 X1 + a2 X2 + . . . + an Xn) = a1 E(X1) + a2 E(X2) + . . . + an E(Xn)    (2-78)

If furthermore the random variables are independent, then the variance is

Var(a1 X1 + a2 X2 + . . . + an Xn) = a1² Var(X1) + a2² Var(X2) + . . . + an² Var(Xn)    (2-79)
Example 2.54
Return to the dice example to emphasize an important point.
Think of a scaling (i.e. a linear function) of a single roll with a dice, say five times

Y_scale = 5X    (2-80)

then the spread (standard deviation) of the distribution increases linearly with the scaling of the mean

σ_Y = 5σ_X    (2-81)

whereas for a sum (linear combination) of five rolls

Y_sum = X1 + X2 + X3 + X4 + X5    (2-82)

the mean will be the same as for the scaling

E(Y_sum) = E(Y_scale) = 5 E(X)    (2-83)

but the spread will increase only with the square root of the number of rolls

σ_Y = √5 σ_X    (2-84)

This is simply because, when summing many random outcomes, the high and the low outcomes even each other out, such that the variance of a sum is smaller than that of a scaling.
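As a small simulation sketch (not part of the original example), the difference between scaling and summing can be seen directly by simulating both:

## Compare a scaling and a sum of dice rolls by simulation
k <- 10000
x <- sample(1:6, size=k, replace=TRUE)
yScale <- 5 * x
ySum <- replicate(k, sum(sample(1:6, size=5, replace=TRUE)))
mean(yScale); mean(ySum)   ## both close to 5 * 3.5 = 17.5
sd(yScale); sd(ySum)       ## approx. 5*sd(x) and sqrt(5)*sd(x), respectively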
2.5 Covariance and correlation
In this chapter (Section 2.2.1) we have discussed the mean and variance (or standard deviation), and their relation to the sample versions (sample mean/average and sample variance). In Chapter 1 we discussed the sample covariance and sample correlation. These two measures also have theoretical counterparts, namely the covariance and correlation, which we discuss in this section. We start with the definition of covariance.
Definition 2.55
Covariance
Let X and Y be two random variables, then the covariance between X and Y, is
Cov( X, Y ) = E [( X − E[ X ])(Y − E[Y ])] .
(2-85)
Remark 2.56
It follows immediately from the definition that Cov(X, X) = Var(X) and Cov(X, Y) = Cov(Y, X).
An important concept in statistics is independence (see Section 2.6 for a formal definition). We often assume that realizations (random variables) are independent. If two random variables are independent then their covariance is zero; the reverse is however not necessarily true (see also the discussion on sample correlation in Chapter 1).
The following calculation rule applies to the covariance between linear combinations of two random variables X and Y.
Theorem 2.57
Covariance between linear combinations
Let X and Y be two random variables, then
Cov( a0 + a1 X + a2 Y, b0 + b1 X + b2 Y ) = a1 b1 Var( X ) + a2 b2 Var(Y )+
( a1 b2 + a2 b1 )Cov( X, Y )
(2-86)
Proof
Let Z1 = a0 + a1 X + a2 Y and Z2 = b0 + b1 X + b2 Y, then

Cov(Z1, Z2) = E[(a1 (X − E[X]) + a2 (Y − E[Y]))(b1 (X − E[X]) + b2 (Y − E[Y]))]    (2-87)
            = E[a1 (X − E[X]) b1 (X − E[X])] + E[a1 (X − E[X]) b2 (Y − E[Y])] + E[a2 (Y − E[Y]) b1 (X − E[X])] + E[a2 (Y − E[Y]) b2 (Y − E[Y])]    (2-88)
            = a1 b1 Var(X) + a2 b2 Var(Y) + (a1 b2 + a2 b1) Cov(X, Y)    (2-89)
Example 2.58
Let X ∼ N(3, 2²) and Y ∼ N(2, 1), and let the covariance between X and Y be given by Cov(X, Y) = 1. What is the variance of the random variable Z = 2X − Y?

Var(Z) = Cov(2X − Y, 2X − Y) = 2² Var(X) + Var(Y) − 4 Cov(X, Y)    (2-90)
       = 2² · 2² + 1 − 4 = 13    (2-91)
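As a small simulation sketch (not part of the original example), the result can be checked by simulating correlated normal variables with mvrnorm from the MASS package, which ships with R:

## Check Example 2.58 by simulation of correlated normals
library(MASS)
## Variance-covariance matrix with Var(X)=4, Var(Y)=1 and Cov(X,Y)=1
Sigma <- matrix(c(4, 1,
                  1, 1), ncol=2)
xy <- mvrnorm(100000, mu=c(3, 2), Sigma=Sigma)
z <- 2*xy[ ,1] - xy[ ,2]
var(z)   ## close to 13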
2.5.1 Correlation
We have already seen (in Chapter 1) that the sample correlation measures the degree of linear dependence between two variables. The theoretical counterpart is the correlation.
Definition 2.59
Correlation
Let X and Y be two random variables with Var( X ) = σx2 , Var(Y ) = σy2 , and
Cov( X, Y ) = σxy , then the correlation between X and Y is
ρ_xy = σ_xy / (σ_x σ_y)    (2-92)
Remark 2.60
The correlation is a number between -1 and 1.
Example 2.61
Let X ∼ N(1, 2²) and e ∼ N(0, 0.5²) be independent random variables; find the correlation between X and Z = X + e.
The variance of Z is

Var(Z) = Var(X) + Var(e) = 4 + 0.25 = 4.25    (2-93)

The covariance between X and Z is

Cov(X, Z) = Cov(X, X + e) = Var(X) = 4    (2-94)

and hence

ρ_xz = 4/√(4.25 · 4) = 0.97    (2-95)
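As a small simulation sketch (not part of the original example), the correlation can be checked by simulating X, e and Z = X + e:

## Check Example 2.61 by simulation
x <- rnorm(100000, mean=1, sd=2)
e <- rnorm(100000, mean=0, sd=0.5)
z <- x + e
cor(x, z)   ## close to 0.97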
2.6 Independence of random variables
In statistics the concept of independence is very important, and in order to give a formal definition of independence we will need two-dimensional random variables. The probability density function of a two-dimensional discrete random variable, called the joint probability density function, is:
Definition 2.62
Joint pdf of two-dimensional discrete random variables
The pdf of a two-dimensional discrete random variable [ X, Y ] is
f ( x, y) = P( X = x, Y = y)
(2-96)
with the properties

f(x, y) ≥ 0 for all (x, y)    (2-97)

∑_{all x} ∑_{all y} f(x, y) = 1    (2-98)
Remark 2.63
P( X = x, Y = y) should be read: the probability of X = x and Y = y.
Example 2.64
Imagine two throws with an honest coin: the possible outcome of each throw is either head
or tail, which will be given the values 0 and 1 respectively. The complete set of outcomes is
(0,0), (0,1), (1,0), and (1,1) each with probability 1/4. And hence the pdf is
f ( x, y) =
1
;
4
x = {0, 1}, y = {0, 1}
(2-99)
further we see that
1
1
∑∑
x =0 y =0
1
f ( x, y) =
∑ ( f (x, 0) + f (x, 1)) = ( f (0, 0) + f (0, 1)) + ( f (1, 0) + f (1, 1))
(2-100)
x =0
=1
(2-101)
The formal definition of independence for a two dimensional discrete random variable
is
Definition 2.65
Independence of discrete random variables
Two discrete random variables X and Y are said to be independent if and only if
P( X = x, Y = y) = P( X = x ) P(Y = y)
(2-102)
Example 2.66
Example 2.64 is an example of two independent random variables. To see this, write up the probabilities

P(X = 0) = ∑_y f(0, y) = 1/2    (2-103)

P(X = 1) = ∑_y f(1, y) = 1/2    (2-104)

and similarly P(Y = 0) = P(Y = 1) = 1/2. Now we see that P(X = x) P(Y = y) = 1/4 for x and y equal to 0 or 1, which is equal to f(x, y).
Example 2.67
Now imagine that for the second throw we only observe the sum (Z) of X and Y. In this case
the set of outcomes ( X, Z ) is (0, 0), (0, 1), (1, 1), (1, 2), each with probability 1/4. Now we see
that P( X = 0) = P( X = 1) = P( Z = 1) = 1/2, while P( Z = 0) = P( Z = 2) = 1/4, hence
P( X = 0, Z = 0) = P( X = 0, Z = 1) = P( X = 1, Z = 1) = P( X = 1, Z = 2) = 1/4 (2-105)
but, e.g.,

P(X = 0) P(Z = 0) = (1/2) · (1/4) = 1/8 ≠ 1/4 = P(X = 0, Z = 0)    (2-106)

and hence we see that X and Z are not independent.
Remark 2.68
In the construction above it is quite clear that X and Z cannot be independent. In real
applications we do not know exactly how the outcomes are realized and therefore
we will need to assume independence (or test it).
To be able to define independence of continuous random variables, we will need the pdf
of a two-dimensional random variable.
Definition 2.69    Pdf of two-dimensional continuous random variables
The pdf of a two-dimensional continuous random variable [X, Y] is a function f(x, y) from R² into R+ with the properties

f(x, y) ≥ 0 for all (x, y)    (2-107)

∫∫ f(x, y) dx dy = 1    (2-108)
Just as for one-dimensional random variables, the probability interpretation is in the form of integrals

P((X, Y) ∈ A) = ∫_A f(x, y) dx dy    (2-109)

Example 2.70    Bivariate normal distribution
The most important two-dimensional distribution is the bivariate normal distribution

f(x1, x2) = 1/(2π √|Σ|) · e^(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))    (2-110)
          = 1/(2π √(σ11 σ22 − σ12²)) · e^(−[σ22 (x1 − µ1)² + σ11 (x2 − µ2)² − 2σ12 (x1 − µ1)(x2 − µ2)] / (2(σ11 σ22 − σ12²)))    (2-111)

here x = (x1, x2), µ = [E(X1), E(X2)], and Σ is the so-called variance-covariance matrix with elements (Σ)_ij = σ_ij = Cov(Xi, Xj) (note that σ12 = σ21), |·| is the determinant, and Σ⁻¹ is the inverse of Σ.
Definition 2.71    Independence of continuous random variables
Two continuous random variables X and Y are said to be independent if

f(x, y) = f(x) f(y)    (2-112)
We list here some properties of independent random variables.
Theorem 2.72
Properties of independent random variables
If X and Y are independent then
• E( XY ) = E( X ) E(Y )
• Cov( X, Y ) = 0
Let X1, . . . , Xn be independent identically distributed random variables and let X̄ denote their sample mean; then

Cov(X̄, Xi − X̄) = 0    (2-113)
Proof.

E(XY) = ∫∫ xy f(x, y) dx dy = ∫∫ xy f(x) f(y) dx dy
      = ∫ x f(x) dx ∫ y f(y) dy = E(X) E(Y)    (2-114)-(2-115)

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
          = E[XY] − E[E(X) Y] − E[X E(Y)] + E(X) E(Y)
          = 0    (2-116)-(2-118)

Cov(X̄, Xi − X̄) = Cov(X̄, Xi) − Cov(X̄, X̄)
                = (1/n) σ² − (1/n²) Cov(∑ Xi, ∑ Xi)
                = (1/n) σ² − (1/n²) n σ² = 0    (2-119)-(2-121)
Remark 2.73
Note that Cov(X, Y) = 0 does not imply that X and Y are independent. However, if X and Y follow a bivariate normal distribution and are uncorrelated, then they are also independent.
2.7 Functions of normal random variables
This section will cover some important functions of normal random variables. In general the question of how an arbitrary function of a random variable is distributed cannot be answered in closed form. We have already discussed simulation as an explorative tool; this is covered in more detail in Chapter 4.1, and it is recommended that you read through Chapter 4.1 before you read this section.
The simplest functions we can think of are linear combinations of normal random variables.
Theorem 2.74
Linear combinations of normal random variables
Let X1 , . . . , Xn be independent normal random variables, then any linear combination of X1 , . . . , Xn will follow a normal distribution, with mean and variance given
in Theorem 2.53.
Remark 2.75
Note that combining Theorems 2.74 and 2.72 and Remark 2.73 implies that X̄ and Xi − X̄ are independent.
In addition to the result given in the theorem above we will cover 3 additional distributions (χ², student-t and F) that are very important for the statistical inference covered in the following chapters.
2.7.1 The χ2 distribution
The χ2 (chi-square) distribution is defined by
Definition 2.76
Let X be χ2 distributed, then its pdf is
f(x) = 1/(2^(ν/2) Γ(ν/2)) · x^(ν/2 − 1) e^(−x/2);   x ≥ 0    (2-122)

where Γ(ν/2) is the Γ-function and ν is the degrees of freedom.
An alternative definition (here formulated as a theorem) of the χ2 -distribution is
Theorem 2.77
Let X1 , . . . , Xν be independent random variables following the standard normal distribution, then
∑_{i=1}^{ν} Xi² ∼ χ²(ν)    (2-123)
We will omit the proof of the theorem as it requires more probability calculus than is covered here. Instead we present a small simulation example that illustrates how the theorem can be checked by simulation.
Example 2.78
## Simulate
set.seed(125)
n <- 10; k <- 200
Z <- replicate(k, sum(rnorm(n)^2))
## Plot
par(mfrow=c(1,2))
hist(Z, freq = FALSE)
curve(dchisq(x, df = n), add = TRUE, col = 2, lwd = 2)
plot(ecdf(Z))
curve(pchisq(x, df = n), add = TRUE, col = 2, lwd = 2)
[Figure: left, histogram of Z with the χ² pdf (red); right, the empirical cdf of Z (ecdf) with the χ² cdf (red)]
The left plot compares the histogram and the theoretical pdf, while the right plot compares the empirical cdf with the theoretical cdf.
Theorem 2.79
Given a sample of size n from the normal distributed random variables Xi with
variance σ2 , then the sample variance S2 (viewed as random variable) can be transformed into
χ² = (n − 1)S²/σ²    (2-124)

which follows the χ² distribution with ν = n − 1 degrees of freedom.
Proof. Start by rewriting (n − 1)S²/σ²

(n − 1)S²/σ² = ∑_{i=1}^{n} ((Xi − X̄)/σ)²    (2-125)
             = ∑_{i=1}^{n} ((Xi − µ + µ − X̄)/σ)²    (2-126)
             = ∑_{i=1}^{n} ((Xi − µ)/σ)² + ∑_{i=1}^{n} ((X̄ − µ)/σ)² − 2 ∑_{i=1}^{n} (X̄ − µ)(Xi − µ)/σ²    (2-127)
             = ∑_{i=1}^{n} ((Xi − µ)/σ)² − 2n ((X̄ − µ)/σ)² + n ((X̄ − µ)/σ)²    (2-128)
             = ∑_{i=1}^{n} ((Xi − µ)/σ)² − ((X̄ − µ)/(σ/√n))²    (2-129)

We know that (Xi − µ)/σ ∼ N(0, 1) and (X̄ − µ)/(σ/√n) ∼ N(0, 1), and hence the left hand side equals a χ²(n) distributed random variable minus a χ²(1) distributed random variable (also X̄ and S² are independent, see Theorems 2.72 and 2.74, and Remark 2.73). Hence the left hand side must be χ²(n − 1).
If someone claims that a sample comes from a specific normal distribution (i.e. Xi ∼ N(µ, σ²)), then we can examine probabilities of specific outcomes of the sample variance. Such a calculation will be termed a hypothesis test in later chapters.
Example 2.80
A manufacturer of machines for dosing milk claims that their machines can dose with a precision defined by a normal distribution with a standard deviation of less than 2% of the dose volume in the operating range. A sample of n = 20 observations was taken to check if the precision was as claimed. The sample standard deviation was calculated to be 0.03.
Hence the claim is that σ ≤ 0.02; thus we want to answer the question: if σ = 0.02 (i.e. the upper limit of the claim), what is then the probability of getting a sample standard deviation s ≥ 0.03?
## Chi-square milk dosing precision
## The sample size
n <- 20
## The claimed deviation
sigma <- 0.02
## The sample deviation
s <- 0.03
## Calculate the chi-square statistic
chiSq <- (n-1)*s^2 / sigma^2
## Use the cdf to calculate the probability of getting the sample deviation
## or higher
1 - pchisq(chiSq, df=n-1)
[1] 0.0014023
It seems very unlikely that the standard deviation is below 2%, since the probability of obtaining this sample standard deviation under these conditions is very small. The probability we just found will be termed a p-value in later chapters, and p-values are a very important concept in hypothesis testing.
The probability in the above example will be called the p-value in later chapters and it
is a fundamental concept in statistics.
Theorem 2.81
Mean and variance
Let X ∼ χ²(ν), then the mean and variance of X are

E(X) = ν;   V(X) = 2ν    (2-130)
We will omit the proof of this theorem, but it is easily checked by a symbolic calculation
software (like e.g. Maple).
Example 2.82
We want to calculate the expected value of the sample variance (S²) based on n observations with Xi ∼ N(µ, σ²). We have already seen that ((n − 1)/σ²) S² ∼ χ²(n − 1) and we can therefore write

E(S²) = E[(σ²/(n − 1)) · ((n − 1)/σ²) S²]    (2-131)
      = (σ²/(n − 1)) E[((n − 1)/σ²) S²]    (2-132)
      = (σ²/(n − 1)) (n − 1) = σ²    (2-133)

and we say that S² is a central estimator for σ². We can also find the variance of the estimator

V(S²) = (σ²/(n − 1))² V[((n − 1)/σ²) S²]    (2-134)
      = (σ⁴/(n − 1)²) · 2(n − 1) = 2σ⁴/(n − 1)    (2-135)

Example 2.83    Pooled variance
Suppose now that we have two different samples (not yet realized) X1 , . . . , Xn1 and Y1 , . . . , Yn2
with Xi ∼ N (µ1 , σ2 ) and Yi ∼ N (µ2 , σ2 ) (both i.i.d.). Let S12 be the sample variance based on
the X’s and S22 be the sample variance based on the Y’s. Now both S12 and S22 will be central
estimators for σ2 , and so will any weighted average of the type
S² = a S1² + (1 − a) S2²;   a ∈ [0, 1]    (2-136)
Now we would like to choose a such that the variance of S2 is as small as possible, and hence
we calculate the variance of S2
V(S²) = a² · 2σ⁴/(n1 − 1) + (1 − a)² · 2σ⁴/(n2 − 1)    (2-137)
      = 2σ⁴ [ a²/(n1 − 1) + (1 − a)²/(n2 − 1) ]    (2-138)
In order to find the minimum we differentiate with respect to a

∂V(S²)/∂a = 2σ⁴ [ 2a/(n1 − 1) − 2(1 − a)/(n2 − 1) ]    (2-139)
          = 4σ⁴ [ a (1/(n1 − 1) + 1/(n2 − 1)) − 1/(n2 − 1) ]    (2-140)
          = 4σ⁴ [ a (n1 + n2 − 2)/((n1 − 1)(n2 − 1)) − 1/(n2 − 1) ]    (2-141)

which is zero for

a = (n1 − 1)/(n1 + n2 − 2)    (2-142)

In later chapters we will refer to this choice of a as the pooled variance (S²_p); inserting it in (2-136) gives

S²_p = ((n1 − 1) S1² + (n2 − 1) S2²)/(n1 + n2 − 2)    (2-143)

Note that S²_p is a weighted (proportional to the number of observations) average of the sample variances. It can also be shown (you are invited to do this) that ((n1 + n2 − 2)/σ²) S²_p ∼ χ²(n1 + n2 − 2).
Note that the assumption of equal variance in the two samples is crucial in the calculations above.
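As a small sketch (not part of the original example), the pooled variance in Equation (2-143) can be computed with a little helper function; the data below are simulated only for illustration:

## The pooled variance of two samples
pooledVar <- function(x, y){
  n1 <- length(x); n2 <- length(y)
  ((n1 - 1)*var(x) + (n2 - 1)*var(y)) / (n1 + n2 - 2)
}
x <- rnorm(10, mean=0, sd=2)
y <- rnorm(15, mean=5, sd=2)
pooledVar(x, y)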
2.7.2 The t-distribution
The t-distribution is the sampling distribution of the sample mean standardized with the sample standard deviation. It is valid for all sample sizes; however, for larger sample sizes (n > 30) the difference between the t-distribution and the normal distribution is very small, hence for larger sample sizes the normal distribution is often applied.
Definition 2.84
The t-distribution pdf is
fT(t) = Γ((ν + 1)/2) / ( √(νπ) Γ(ν/2) ) · (1 + t2/ν)^(−(ν+1)/2)    (2-144)
where ν is the degrees of freedom and Γ(·) is the Gamma function.
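The pdf is available in R as dt(); as a quick sanity check the formula above can be compared directly with dt(). In the sketch below ν = 8 and the evaluation points are arbitrary choices.

## Compare the pdf formula (2-144) with R's built-in dt()
## (nu and the evaluation points are arbitrary choices)
nu <- 8
tseq <- seq(-3, 3, by = 0.5)
fT <- gamma((nu+1)/2) / (sqrt(nu*pi) * gamma(nu/2)) * (1 + tseq^2/nu)^(-(nu+1)/2)
## Should be practically zero
max(abs(fT - dt(tseq, df = nu)))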
The relation between normally distributed random variables and χ2-distributed random variables is given in the following theorem.
Theorem 2.85
Let Z ∼ N (0, 1) and Y ∼ χ2 (ν), then
X = Z / √(Y/ν) ∼ t(ν)    (2-145)
We will not prove this theorem, but show by an example how this can be illustrated by
simulation.
Example 2.86
par(mar = c(3.5, 3.5, 1.5, 0.5))
## Simulate
set.seed(125)
nu <- 8; k <- 200
Z <- rnorm(k)
Y <- rchisq(k, df=nu)
X <- Z/sqrt(Y/nu)
## Plot
par(mfrow=c(1,2))
hist(X, freq = FALSE, xlim = qt(c(0.001, 0.999), df = nu), xlab = "x")
curve(dt(x, df = nu), add =TRUE, col = 2, lwd = 2)
plot(ecdf(X), xlim = qt(c(0.001, 0.999), df = nu))
curve(pt(x, df = nu), add =TRUE, col = 2, lwd = 2)
(Figure: left panel "Histogram of X" showing the histogram with the t(ν) pdf overlaid; right panel "ecdf(X)" showing the empirical CDF with the theoretical CDF overlaid.)
The left plot compares the histogram with the theoretical pdf, while the right plot compares the empirical CDF with the theoretical CDF.
The t-distribution arises when a sample is taken of a normally distributed random variable: the sample mean standardized with the sample standard deviation then follows the t-distribution.
Theorem 2.87
Given a sample of normally distributed random variables X1, . . . , Xn, the random variable
T = (X̄ − µ) / (S/√n) ∼ t(n − 1)    (2-146)
follows the t-distribution, where X̄ is the sample mean, µ is the mean of X, n is the sample size and S is the sample standard deviation.
Proof. Note that (X̄ − µ)/(σ/√n) ∼ N(0, 1) and (n − 1)S2/σ2 ∼ χ2(n − 1), hence

T = [ (X̄ − µ)/(σ/√n) ] / √[ (n − 1)S2 / (σ2(n − 1)) ]      (2-147)
  = (X̄ − µ) / ( √(S2)/√n ) = (X̄ − µ)/(S/√n) ∼ t(n − 1)     (2-148)
We could also verify this by simulation
Example 2.88
## Simulate
set.seed(125)
n <- 8; k <- 200
mu <- 1; sigma <- 2
X <- replicate(k,rnorm(n, mean = mu, sd = sigma)) # Return a (n x k) matrix
Xbar <- apply(X, 2, mean)
S <- apply(X, 2, sd)
T <- (Xbar - mu)/(S/sqrt(n))
## Plot
par(mfrow=c(1,2))
hist(T, freq = FALSE, xlim = qt(c(0.001, 0.999), df = n-1))
curve(dt(x, df = n-1), add = TRUE, col = 2, lwd = 2)
plot(ecdf(T), xlim = qt(c(0.001, 0.999), df = n-1))
curve(pt(x, df = n-1), add = TRUE, col = 2, lwd = 2)
(Figure: left panel "Histogram of T" showing the histogram with the t(n − 1) pdf overlaid; right panel "ecdf(T)" showing the empirical CDF with the theoretical CDF overlaid.)
The left plot compares the histogram with the theoretical pdf, while the right plot compares the empirical CDF with the theoretical CDF.
Note that X̄ and S are random variables, since they are the sample mean and sample standard deviation of a sample (not yet taken) consisting of realizations of X.
Very often only samples with few observations are available. In this case, by assuming normality of the population (i.e. that the Xi's are normally distributed) and for some given mean µ, the t-distribution can be used to calculate the probability of obtaining the sample mean in a given range.
Example 2.89
An electric car manufacturer claims that their cars can drive on average 400 km on a full charge at a specified speed. From experience it is known that this full-charge distance, denote it by X, is normally distributed. A test of n = 10 cars was carried out, which resulted in a sample mean of x̄ = 393 km and a sample standard deviation of s = 14 km.
Now we can use the t-distribution to calculate the probability of obtaining this value of the
sample mean or lower, if their claim about the mean is actually true.
## Calculate the probability of getting the sample mean given that the
## claim is the real mean
## A test of 10 cars was carried out
n <- 10
## The claim is that the real mean is 400 km
muX <- 400
## From the sample the sample mean was calculated to
xMean <- 393
## And the sample deviation was
xSD <- 14
## Use the cdf to calculate the probability of obtaining this sample mean or
## a lower value
pt( (xMean - muX) / (xSD / sqrt(n)), df = n - 1)
[1] 0.074152
If we had the same sample mean and sample standard deviation, how do you think changing the number of observations would affect the calculated probability? Try it out.
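One way to explore this is sketched below, simply evaluating the probability for a few different sample sizes while keeping the sample mean and standard deviation fixed; the chosen values of n are arbitrary.

## Explore the effect of n with the sample mean and deviation kept fixed
## (the values of n are arbitrary choices)
muX <- 400; xMean <- 393; xSD <- 14
n <- c(5, 10, 20, 40)
pt( (xMean - muX) / (xSD / sqrt(n)), df = n - 1)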
The t-distribution converges to the normal distribution as the sample size increases. For small sample sizes it has a higher spread than the normal distribution. For larger sample sizes with n > 30 observations the difference between the normal distribution and the t-distribution is very small.
Example 2.90    t-distribution
Generate plots to see how the t-distribution is shaped compared to the normal distribution.
## Plot the t-distribution for different sample sizes
## First plot the standard normal distribution
xseq <- seq(-5, 5, len = 1000)
plot(xseq, dnorm(xseq), type = "l", xlab = "x", ylab = "Density")
## Add the t-distribution for 30 observations
lines(xseq, dt(xseq, df = 30-1), col = 2)
## Add the t-distribution for 15, 5 and 2 observations
lines(xseq, dt(xseq, df = 15-1), col = 3)
lines(xseq, dt(xseq, df = 5-1), col = 4)
lines(xseq, dt(xseq, df = 2-1), col = 5)
## Add a legend matching the labels in the figure (position chosen as "topright")
legend("topright", legend = c("Norm.", "n= 30", "n= 15", "n= 5", "n= 2"),
       col = 1:5, lty = 1)
(Figure: the standard normal pdf together with the t-distribution pdfs for n = 30, 15, 5 and 2.)
How does the number of observations affect the shape of the t-distribution pdf compared to the normal pdf?
Theorem 2.91
Mean and variance
Let X ∼ t(ν), then the mean and variance of X are
E(X) = 0,            for ν > 1    (2-149)
V(X) = ν/(ν − 2),    for ν > 2    (2-150)
We will omit the proof of this theorem, but it is easily checked using symbolic calculation software (e.g. Maple).
Remark 2.92
For ν ≤ 1 the expectation (and hence the variance) is not defined (the integral is not absolutely convergent), and for ν ∈ (1, 2] (i.e. 1 < ν ≤ 2) the variance is infinite. Note that this does not violate the general definition of probability density functions.
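Theorem 2.91 can also be illustrated by simulation; in the sketch below ν = 5 is an arbitrary choice (with ν > 2 so that the variance exists).

## Simulation check of Theorem 2.91 (nu = 5 is an arbitrary choice with nu > 2,
## so that the variance exists)
set.seed(123)
nu <- 5
x <- rt(100000, df = nu)
## Should be close to 0
mean(x)
## Should be close to nu/(nu-2) = 5/3
var(x)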
2.7.3 The F distribution
The F-distribution is defined by
Definition 2.93
The F-distribution pdf is
fF(x) = 1/B(ν1/2, ν2/2) · (ν1/ν2)^(ν1/2) · x^(ν1/2 − 1) · (1 + (ν1/ν2) x)^(−(ν1+ν2)/2)    (2-151)
where ν1 and ν2 are the degrees of freedom and B(·, ·) is the Beta function.
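As with the t-distribution, the formula can be compared against R's built-in density df() as a quick sanity check; ν1, ν2 and the evaluation points below are arbitrary choices.

## Compare the pdf formula (2-151) with R's built-in df()
## (nu1, nu2 and the evaluation points are arbitrary choices)
nu1 <- 8; nu2 <- 10
xseq <- seq(0.1, 5, by = 0.1)
fF <- 1/beta(nu1/2, nu2/2) * (nu1/nu2)^(nu1/2) * xseq^(nu1/2 - 1) *
  (1 + nu1/nu2 * xseq)^(-(nu1+nu2)/2)
## Should be practically zero
max(abs(fF - df(xseq, df1 = nu1, df2 = nu2)))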
The F-distribution appears as the ratio between two independent χ2-distributed random variables, each divided by its degrees of freedom.
Theorem 2.94
Let U ∼ χ2(ν1) and V ∼ χ2(ν2) be independent, then
F = (U/ν1) / (V/ν2) ∼ F(ν1, ν2)    (2-152)
Again we will omit the proof of the theorem and rather show how it can be visualised by simulation.
Example 2.95
## Simulate
set.seed(125)
nu1 <- 8; nu2 <- 10; k <- 200
U <- rchisq(k,df=nu1)
V <- rchisq(k,df=nu2)
F <- (U/nu1) / (V/nu2)
## Plot
par(mfrow=c(1,2))
hist(F, freq = FALSE)
f <- seq(0, max(F), length=100)
curve(df(x, df1 = nu1, df2 = nu2), add = TRUE, col = 2, lwd = 2)
plot(ecdf(F))
curve(pf(x, df1 = nu1, df2 = nu2),add = TRUE, col = 2, lwd = 2)
(Figure: left panel "Histogram of F" showing the histogram with the F(ν1, ν2) pdf overlaid; right panel "ecdf(F)" showing the empirical CDF with the theoretical CDF overlaid.)
The left plot compares the histogram with the theoretical pdf, while the right plot compares the empirical CDF with the theoretical CDF.
Theorem 2.96
Let X1, . . . , Xn1 be independent and sampled from a normal distribution with mean µ1 and variance σ12, and further let Y1, . . . , Yn2 be independent and sampled from a normal distribution with mean µ2 and variance σ22. Then the statistic

F = (S12/σ12) / (S22/σ22) ∼ F(n1 − 1, n2 − 1)    (2-153)

Proof. Note that (n1 − 1)S12/σ12 ∼ χ2(n1 − 1) and (n2 − 1)S22/σ22 ∼ χ2(n2 − 1), hence

[ (n1 − 1)S12 / (σ12(n1 − 1)) ] / [ (n2 − 1)S22 / (σ22(n2 − 1)) ] = (S12/σ12) / (S22/σ22) ∼ F(n1 − 1, n2 − 1)    (2-154)

We can also illustrate this sample version by simulation.
Example 2.97
## Simulate
set.seed(125)
n1 <- 8; n2 <- 10; k <- 200
mu1 <- 2; mu2 <- -1
sigma1 <- 2; sigma2 <- 4
S1 <- replicate(k, sd(rnorm(n1, mean = mu1, sd = sigma1)))
S2 <- replicate(k, sd(rnorm(n2, mean = mu2, sd = sigma2)))
F <- (S1^2 / sigma1^2) / (S2^2 / sigma2^2)
## Plot
par(mfrow=c(1,2))
f <- seq(0, max(F), length = 100)  ## grid used only to set the y-axis range
hist(F, freq = FALSE, ylim = range(df(f, df1 = n1-1, df2 = n2-1)))
curve(df(x, df1 = n1-1, df2 = n2-1), add =TRUE, col = 2, lwd = 2)
plot(ecdf(F))
curve(pf(x, df1 = n1-1, df2 = n2-1), add =TRUE, col = 2, lwd = 2)
(Figure: left panel "Histogram of F" showing the histogram with the F(n1 − 1, n2 − 1) pdf overlaid; right panel "ecdf(F)" showing the empirical CDF with the theoretical CDF overlaid.)
The left plot compares the histogram with the theoretical pdf, while the right plot compares the empirical CDF with the theoretical CDF.
Remark 2.98
Of particular importance in statistics is the case where σ1 = σ2; in this case
F = S12 / S22 ∼ F(n1 − 1, n2 − 1)    (2-155)

Theorem 2.99
Mean and variance
Let F ∼ F(ν1, ν2), then the mean and variance of F are
E(F) = ν2/(ν2 − 2),                                   for ν2 > 2    (2-156)
V(F) = 2ν22(ν1 + ν2 − 2) / ( ν1(ν2 − 2)2(ν2 − 4) ),   for ν2 > 4    (2-157)
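Also this theorem can be checked by simulation; in the sketch below ν1 = 8 and ν2 = 10 are arbitrary choices (with ν2 > 4 so that the variance exists).

## Simulation check of Theorem 2.99 (nu1 and nu2 are arbitrary choices with
## nu2 > 4, so that the variance exists)
set.seed(123)
nu1 <- 8; nu2 <- 10
x <- rf(100000, df1 = nu1, df2 = nu2)
## Should be close to nu2/(nu2-2) = 1.25
mean(x)
## Should be close to 2*nu2^2*(nu1+nu2-2)/(nu1*(nu2-2)^2*(nu2-4)) = 1.04
var(x)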
2.8 Exercises
Exercise 1
Discrete random variable
a) Let X be a random variable. When running the R-command dbinom(4,10,0.6) R
returns 0.1115, written as:
dbinom(4,10,0.6)
[1] 0.11148
What distribution is applied and what does 0.1115 represent?
b) Let X be the same random variable as above. The following are results from R:
pbinom(4,10,0.6)
[1] 0.16624
pbinom(5,10,0.6)
[1] 0.3669
Calculate the following probabilities: P( X ≤ 5), P( X < 5), P( X > 4) and P( X =
5).
c) Let X be a random variable. From R we get:
dpois(4,3)
[1] 0.16803
What distribution is applied and what does 0.168 represent?
d) Let X be the same random variable as above. The following are results from R:
ppois(4,3)
[1] 0.81526
ppois(5,3)
[1] 0.91608
Calculate the following probabilities: P( X ≤ 5), P( X < 5), P( X > 4) and P( X =
5).
Exercise 2
Course passing proportions
a) If a passing proportion for a course given repeatedly is assumed to be 0.80 on average, and there are 250 students who are taking the exam each time, what is the expected value µ and standard deviation σ for the number of students who do not pass the exam for a randomly selected course?
Exercise 3
Notes in a box
A box contains 6 notes:
On 1 of the notes there is the number 1
On 2 of the notes there is the number 2
On 2 of the notes there is the number 3
On 1 of the notes there is the number 4
Two notes are drawn at random from the box. The following random variable is introduced: X, which describes the number of notes with the number 4 among the 2 drawn.
The two notes are drawn without replacement.
a) Calculate the mean and variance for X, and the probability of drawing no note
with the number 4, hence P( X = 0):
b) The 2 notes are now drawn with replacement. What is the probability that none
of the 2 notes has the number 1 on it?
Exercise 4
Consumer Survey
An investigation of the quality of price tagging was carried out, where 20 different products were purchased in a store. Discrepancies between the price appearing on the sales slip and the price paid at the shelf were found in 6 of these purchased products.
a) At the same time a customer buys 3 random (different) items within the group consisting of the 20 products in the store. The probability that no discrepancies occur for this customer is:
Exercise 5
Hay delivery quality
A horse owner receives 20 bales of hay in a sealed plastic packaging. To control the hay,
3 bales of hay are randomly sampled, and checked whether they contain harmful fungal
spores.
It is believed that among the 20 bales of hay 2 bales are infected with fungal spores. A
random variable X describes the number of infected bales of hay among the 3 sampled.
a) The mean of X, (µ), the variance of X, (σ2 ) and P( X ≥ 1) are:
b) Another supplier advertises that no more than 1% of his bales of hay are infected.
The horse owner buys 10 bales of hay from this supplier, and decides to buy hay
for the rest of the season from this supplier if the 10 bales are not infected.
The probability that none of the 10 purchased bales of hay are infected, if 1% of the
bales from a supplier are infected ( p1 ) and the probability that the 10 purchased
bales of hay are error-free, if 10% of the bales from a supplier are infected ( p10 )
are:
Exercise 6
Newspaper consumer survey
In a consumer survey performed by a newspaper, 20 different groceries (products) were
purchased in a grocery store. Discrepancies between the price appearing on the sales
slip and the shelf price were found in 6 of these purchased products.
a) Let X denote the number of discrepancies when purchasing 3 random (different)
products within the group of the 20 products in the store. What is the mean and
variance of X?
Exercise 7
A fully automated production
On a large fully automated production plant items are pushed to a side band at random
time points, from which they are automatically fed to a control unit. The production
plant is set up in such a way that the number of items sent to the control unit on average
is 1.6 items per minute. Let the random variable X denote the number of items pushed
to the side band in 1 minute. It is assumed that X follows a Poisson distribution.
a) The probability that more than 5 items will arrive at the control unit in a given minute is:
b) The probability that no more than 8 items arrive at the control unit within a 5-minute period is:
Exercise 8
Call center staff
The staffing for answering calls in a company is based on the assumption that there will be 180 phone calls per hour, randomly distributed. If there are 20 calls or more in a period of 5 minutes the capacity is exceeded and there will be an unwanted waiting time; hence there is a capacity of 19 calls per 5 minutes.
a) What is the probability that the capacity is exceeded in a random period of 5 minutes?
b) If the probability should be at least 99% that all calls will be handled without
waiting time for a randomly selected period of 5 minutes, how large should the
capacity per 5 minutes then at least be?
Exercise 9
Illegal importation of dogs
Through illegal importation of dogs from Eastern Europe, dogs are often received that are not vaccinated against a specific disease. A new product is on the market for the treatment of non-vaccinated dogs, if they are attacked by the disease. So far the disease has caused a high mortality among imported dogs, but the producer of the new product claims that at least 80% of infected dogs will survive if treated with the product. In an animal clinic 30 dogs were treated with the new product, and 19 of them were cured.
a) Does this result disprove the manufacturer’s claim, if a significance level of α =
5% is used (both the conclusion and the argument must be included in the answer):
b) Veterinarians are required to report cases of illness. In one period, there were on
average 10 reported cases of illness per workday. If you use the usual probability
distribution for the number of events per time unit with an average of 10 per day,
the probability of getting 16 reported cases or more on any given day is:
c) What is the mean and standard deviation of X?
d) Chickens are received in batches of 1500. A batch is controlled by sampling 15 chickens, and the batch is accepted if all the controlled chickens are free of salmonella. Assume that chickens of non-Danish origin include 10% infected with salmonella, and that Danish-produced chickens include 1% infected with salmonella. The probability of accepting a batch, for chickens of non-Danish origin and for Danish-produced chickens respectively, is:
Exercise 10
Continuous random variable
a) The following R commands and results are given:
pnorm(2)
[1] 0.97725
pnorm(2,1,1)
[1] 0.84134
pnorm(2,1,2)
[1] 0.69146
Specify which distributions are used and explain the resulting probabilities (preferably by a sketch).
b) What is the result of the following command: qnorm(pnorm(2))?
c) The following R commands and results are given:
qnorm(0.975)
[1] 1.96
qnorm(0.975,1,1)
[1] 2.96
qnorm(0.975,1,2)
[1] 4.9199
State what the numbers represent in the three cases (preferably by a sketch).
Exercise 11
The normal pdf
a) Which of the following statements regarding the probability density function of
the normal distribution N (1, 22 ) is false:
1. The total area under the curve is equal to 1
2. The mean is equal to 1
3. The variance is equal to 2
4. The curve is symmetric about the mean
5. The two tails of the curve extend indefinitely
6. Don’t know
Let X be normally distributed with mean 24 and variance 16.
b) Calculate the following probabilities:
– P( X ≤ 20)
– P( X > 29.5)
– P( X = 23.8)
Exercise 12
Computer chip control
A machine for checking computer chips uses on average 65 milliseconds per check with
a standard deviation of 4 milliseconds. A newer machine, potentially to be bought, uses
on average 54 milliseconds per check with a standard deviation of 3 milliseconds. It can be assumed that the check times are normally distributed and independent.
a) The probability that the time savings per check using the new machine is less than
10 milliseconds is:
b) The mean (µ) and standard deviation (σ ) for the total time use for checking 100
chips on the new machine is:
Exercise 13
Concrete items
A manufacturer of concrete items knows that the length (L) of his items is reasonably normally distributed with µL = 3000 mm and σL = 3 mm. The requirement for these items is that the length should be no more than 3007 mm and at least 2993 mm.
a) The expected error rate in the manufacturing will be:
b) The concrete items are supported by beams, where the distance between the beams is called l and the length of the concrete items is still called L. For the items to be supported correctly, the following requirement for these lengths must be fulfilled: 90 mm < L − l < 110 mm. It is assumed that the mean of the distance between the beams is µl = 2900 mm. How large may the standard deviation σl of the distance between the beams be, if the requirement should be fulfilled in 99% of the cases?
Exercise 14
Online statistic video views
In 2013, there were 110,000 views of the DTU statistics videos that are available online.
Assume first that the occurrence of views through 2014 follows a Poisson process with
a 2013 average: λ365days = 110000.
a) What is the probability that in a randomly chosen half an hour there is no occurrence of views?
b) There has just been a view, what is the probability that you have to wait more than
fifteen minutes for the next view?
c) In June 2013, there were 17560 views. Which of the following answers is the best one for the question: is this June figure in contradiction to the assumption that the occurrence of views is evenly distributed over the year? (All probability statements in the answer options are correct, and X ∼ Po(λ) means that X follows a Poisson distribution with intensity λ.)
Exercise 15
Body mass index distribution
The so-called BMI (Body Mass Index) is a measure of the weight-height-relation, and is
defined as the weight (W) in kg divided by the squared height (H) in meters:
BMI = W/H2 .
Assume that the BMI-distribution in a population is a log-normal distribution with α = 3.1 and β = 0.15 (hence that log(BMI) is normally distributed with mean 3.1 and standard deviation 0.15).
a) A definition of ”being obese” is a BMI-value of at least 30. How large a proportion
of the population would then be obese?
Exercise 16
Bivariate normal
a) In the bivariate normal distribution (see Example 2.70), show that if Σ is a diagonal matrix, then (X1, X2) are independent and follow univariate normal distributions.
b) Assume that X and Y are independent standard normal random variables. Now
let Z1 and Z2 be defined by
Z1 = a11 X + c1
(2-158)
Z2 = a12 X + a22 Y + c2
(2-159)
Show that an appropriate choice of a11 , a12 , a22 , c1 , c2 can give any bivariate normal
distribution for the random vector ( Z1 , Z2 ), i.e. find a11 , a12 , a22 , c1 , c2 as a function
of µ1 , µ2 and the elements of Σ.
Note that Σij = Cov( Zi , Zj ), and that any linear combination of random normal
variables will result in a random normal variable.
c) Use the result to simulate 1000 realizations of a bivariate normal random variable with µ = (1, 2) and
Σ = ( 1  1
      1  2 )    (2-160)
and make a scatter plot of the bivariate random variable.
Exercise 17
Sample distributions
a) Verify by simulation that ((n1 + n2 − 2)/σ2) S2p ∼ χ2(n1 + n2 − 2) (see Example 2.83). You may use n1 = 5, n2 = 8, µ1 = 2, µ2 = 4, and σ2 = 2.
b) Show that if X ∼ N (µ1 , σ2 ) and Y ∼ N (µ2 , σ2 ), then
( X̄ − Ȳ − (µ1 − µ2) ) / ( Sp √(1/n1 + 1/n2) ) ∼ t(n1 + n2 − 2)    (2-161)
Verify the result by simulation. You may use n1 = 5, n2 = 8, µ1 = 2, µ2 = 4, and
σ2 = 2.
Exercise 18
Sample distributions 2
Let X1 , ..., Xn , and Y1 , ..., Yn , with Xi ∼ N (µ1 , σ2 ) and Yi ∼ N (µ2 , σ2 ) be independent
random variables. S12 and S22 are the sample variances based on the X’s and the Y’s
respectively. Now define a new random variable
Q = S12 / S22    (2-162)
a) For n equal to 2, 4, 8, 16 and 32, find
1. P(Q < 1)
2. P(Q > 2)
3. P(Q < 1/2)
4. P(1/2 < Q < 2)
b) For at least one value of n illustrate the results above by direct simulation from
independent normal distributions. You may use any values of µ1 , µ2 and σ2 .