2.3 Continuous Random Variables
A continuous random variable is one which can take any value in
an interval (the values that can be taken by such a variable cannot
be listed).
Such variables are normally measured according to a scale.
Examples of continuous random variables: age, height, weight,
time, air pressure.
Such variables are normally only measured to a given accuracy
(e.g. the age of a person is normally given to the nearest year).
2.3.1 The notion of a density function
Suppose X is a continuous random variable. Consider
fδ(x) = P(x < X < x + δ)/δ
This is the probability that X lies in an interval of length δ divided
by the length of the interval.
i.e. this can be thought of as the average probability density on
the interval (x, x + δ).
Let
fX(x) = lim_{δ→0} P(x < X < x + δ)/δ.
Then fX(x) is the probability density function of the random variable X.
If it is clear which variable we are talking about then the subscript
may be left out.
"Likely" values of X correspond to regions where the density function is large. "Unlikely" values of X correspond to regions where the density function is small.
2.3.2 Properties of a density function
A density function f(x) of a random variable X satisfies two conditions:
1) f(x) ≥ 0, for all x.
2) ∫_{−∞}^{∞} f(x) dx = 1.
The second condition simply states that the total area under the
density curve is 1.
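As a quick numerical illustration (not from the original slides), the sketch below checks both conditions for a hypothetical density f(x) = 3x²/8 on [0, 2] (and 0 elsewhere); the choice of density and the use of scipy are our own assumptions.

```python
# A minimal sketch checking the two density conditions for the
# hypothetical density f(x) = 3x^2/8 on [0, 2] (and 0 elsewhere).
import numpy as np
from scipy.integrate import quad

def f(x):
    # The density is positive only on the support [0, 2].
    return 3 * x**2 / 8 if 0 <= x <= 2 else 0.0

# Condition 1: f(x) >= 0, checked on a grid of test points.
assert all(f(x) >= 0 for x in np.linspace(-1, 3, 1001))

# Condition 2: the total area under the density curve is 1.
area, _ = quad(f, 0, 2)  # integrating over the support suffices
print(round(area, 6))    # 1.0
```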
The support of a continuous random variable
The support, SX, of a continuous random variable X is the set of values for which f(x) > 0.
We have
∫_{SX} f(x) dx = 1.
In general, we only have to integrate over intervals where the
density function is positive.
Density curves and probability
The probability that X lies between a and b is the area under the
density curve between x = a and x = b.
Hence,
P(a < X < b) = ∫_a^b f(x) dx.
In particular,
1. P(X > a) = ∫_a^∞ f(x) dx
2. P(X < b) = ∫_{−∞}^b f(x) dx.
Note that for any constant a, P(X = a) = 0.
2.3.3 Expected value of a continuous random variable
The expected value of a random variable X with density function
f (x) is
E(X) = µX = ∫_{SX} x f(x) dx.
i.e. we integrate over the interval(s) where the density function is positive.
The expected value of a function g(X) of a random variable is
E[g(X)] = ∫_{SX} g(x) f(x) dx.
If a distribution is symmetrical about x = x0, then E(X) = x0.
Variance of a continuous random variable
The variance of X is given by
σX² = Var(X) = E[(X − µ)²]
    = ∫_{SX} (x − µ)² f(x) dx
    = E(X²) − E(X)².
σX is the standard deviation of the random variable X .
Note that these formulas are analogous to the definitions of
expected values for discrete random variables. The only change is
that the summations become integrals.
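To make the analogy concrete, here is a sketch (ours, not the lecturer's) that computes E(X) and Var(X) by numerical integration for the same hypothetical density f(x) = 3x²/8 on [0, 2] used earlier.

```python
# Sketch: E(X), E(X^2) and Var(X) by numerical integration for the
# hypothetical density f(x) = 3x^2/8 on [0, 2].
from math import sqrt
from scipy.integrate import quad

f = lambda x: 3 * x**2 / 8                       # density on the support
mean, _ = quad(lambda x: x * f(x), 0, 2)         # E(X) = 1.5
mean_sq, _ = quad(lambda x: x**2 * f(x), 0, 2)   # E(X^2) = 2.4
var = mean_sq - mean**2                          # Var(X) = E(X^2) - E(X)^2
print(mean, var, sqrt(var))                      # 1.5, 0.15, sigma = 0.387...
```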
2.3.4 The Cumulative Distribution Function and Quantiles of a Distribution
The (cumulative) distribution function of a continuous random
variable X is denoted FX . By definition,
FX(t) = P(X ≤ t) = ∫_{−∞}^t fX(x) dx,
where fX is the density function.
Differentiating this equation, we obtain FX′(x) = fX(x).
Suppose SX = [a, b], where a and b are finite. For x ≤ a,
FX (x) = 0. Also, for x ≥ b, FX (x) = 1.
The Cumulative Distribution Function
It should be noted that some textbooks define the cumulative
distribution function as
FX (x) = P(X < x).
In the case of continuous distributions, this definition is equivalent
to the definition given above, since P(X = x) = 0.
However, these definitions are not equivalent in the case of discrete
random variables.
The Quantiles of a Distribution
For 0 < p < 1, the p-quantile of a continuous random variable, qp, satisfies
FX(qp) = p.
q0.5 is the median of X .
q0.25 and q0.75 are called the lower and upper quartiles of X ,
respectively.
If the support SX is an interval, then all quantiles are uniquely
defined.
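As an illustration (our addition, assuming scipy is available), the quantile function FX⁻¹ is exposed by scipy as ppf; the sketch below reads off the quartiles and median of an Exp(1) distribution.

```python
# Sketch: quantiles of Exp(1) via the inverse CDF (scipy's ppf).
# For Exp(1), F(x) = 1 - exp(-x), so q_p = -ln(1 - p).
from scipy.stats import expon

for p in (0.25, 0.5, 0.75):
    print(p, round(expon.ppf(p), 4))
# 0.25 -> 0.2877 (lower quartile), 0.5 -> 0.6931 (median), 0.75 -> 1.3863
```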
Relation between the mean and the median for a
continuous distribution
If a continuous random variable X has a distribution which is
symmetric around x0 , then
E [X ] = q0.5 = x0 .
Many continuous distributions are right skewed, i.e. have a long
right hand tail (e.g. the distribution of wages, the exponential
distribution).
For such distributions, the mean is greater than the median, i.e. in everyday language the "average" (median) person earns less than the "average" (mean) wage.
For left-skewed distributions, the median is greater than the mean.
Example 2.3.1
Suppose the random variable X has density function f (x) = cx on
the interval [0,5] and f (x) = 0 outside this interval.
1. Calculate the value of the constant c.
2. Calculate the probability that (X − 2)2 ≥ 1.
3. Calculate E (X ) and σX .
4. Derive the cumulative distribution function of X .
5. Calculate the median, lower quartile and upper
quartile of this distribution.
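The solution slides are blank in the transcript; the following worked sketch is our reconstruction, not the lecturer's.

```latex
% Worked sketch for Example 2.3.1.
\begin{align*}
\text{1. } & \int_0^5 cx\,dx = \tfrac{25c}{2} = 1 \;\Rightarrow\; c = \tfrac{2}{25}.\\
\text{2. } & P\big((X-2)^2 \ge 1\big) = P(X \le 1) + P(X \ge 3)
  = \tfrac{1}{25} + \tfrac{16}{25} = \tfrac{17}{25} = 0.68.\\
\text{3. } & E(X) = \int_0^5 \tfrac{2x^2}{25}\,dx = \tfrac{10}{3}, \qquad
  E(X^2) = \int_0^5 \tfrac{2x^3}{25}\,dx = 12.5,\\
  & \sigma_X^2 = 12.5 - \big(\tfrac{10}{3}\big)^2 = \tfrac{25}{18}, \qquad
  \sigma_X = \tfrac{5}{\sqrt{18}} \approx 1.179.\\
\text{4. } & F_X(t) = \tfrac{t^2}{25} \text{ for } 0 \le t \le 5;\;
  F_X(t) = 0 \text{ for } t < 0,\; F_X(t) = 1 \text{ for } t > 5.\\
\text{5. } & \tfrac{q_p^2}{25} = p \;\Rightarrow\; q_p = 5\sqrt{p}: \quad
  q_{0.25} = 2.5,\; q_{0.5} \approx 3.536,\; q_{0.75} \approx 4.330.
\end{align*}
```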
2.3.5 Standard continuous distributions
1. The uniform distribution on the interval [a, b]. We write X ∼ U[a, b].
[Figure: the density of U[a, b], a rectangle of height 1/(b − a) over [a, b] and 0 elsewhere.]
The uniform distribution
The area under the density function (a rectangle) is 1.
The width of this rectangle is (b − a), the height of this rectangle
is f (x), the density function.
Hence, for x ∈ [a, b],
(b − a)f(x) = 1 ⇒ f(x) = 1/(b − a).
Otherwise, f (x) = 0.
By symmetry, E(X) is the mid-point of the interval, i.e.
E(X) = (a + b)/2.
Suppose a calculator calculates to k decimal places.
The rounding error involved in a calculation may be assumed to be
uniform on the interval [−0.5 × 10−k , 0.5 × 10−k ].
Example 2.3.2
Suppose the length of the side of a square is chosen from the
uniform distribution on [0, 3].
Calculate
1. the probability that the length of the side is between
2 and 4
2. the expected area of this square.
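The solution slides are blank in the transcript; a worked sketch (ours) follows.

```latex
% Worked sketch for Example 2.3.2, with side length X ~ U[0, 3].
\begin{align*}
\text{1. } & P(2 < X < 4) = P(2 < X \le 3) = \int_2^3 \tfrac{1}{3}\,dx
  = \tfrac{1}{3} \quad (f(x) = 0 \text{ for } x > 3).\\
\text{2. } & E(\text{area}) = E(X^2) = \int_0^3 \tfrac{x^2}{3}\,dx
  = \Big[\tfrac{x^3}{9}\Big]_0^3 = 3.
\end{align*}
```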
2. The exponential distribution
The density function of an exponential random variable with parameter λ is given by f(x) = λe^(−λx) for x ≥ 0, and f(x) = 0 for x < 0. We write X ∼ Exp(λ).
This distribution may be used to model the time between the
arrival of telephone calls.
λ is the rate at which calls arrive (i.e. the expected length of time
between calls is 1/λ).
The exponential distribution and the Poisson distribution
From the interpretation of the exponential distribution and the
Poisson distribution, we can see that there is a connection between
them.
If the time between observations, X , has an Exp(λ) distribution
(e.g. the time between two calls, when calls come in at rate λ),
then the number of observations in time t has a Poisson(λt)
distribution.
Note that λt is the expected number of calls to arrive in time t.
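The sketch below (our addition, assuming scipy) checks this connection numerically for an illustrative rate and time window: "no calls in time t" means the waiting time exceeds t.

```python
# Sketch of the exponential-Poisson connection: P(X > t) for the waiting
# time X ~ Exp(lam) equals P(N = 0) for the count N ~ Poisson(lam * t).
from scipy.stats import expon, poisson

lam, t = 3.0, 0.5                    # illustrative rate and time window
p_wait = expon.sf(t, scale=1 / lam)  # P(X > t) = exp(-lam * t)
p_zero = poisson.pmf(0, lam * t)     # P(N = 0) = exp(-lam * t)
print(p_wait, p_zero)                # both = 0.2231...
```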
Example 2.3.3
The average number of calls coming into a call centre is 3/minute.
Calculate
1) the probability that the time between two calls is
greater than k mins.
2) t, where t is the time such that the length of time
between two calls is less than t with probability 0.8.
3) the probability that the time between calls is greater
than c + k, given that the time between calls is at
least c (c, k > 0).
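The solution slides are blank in the transcript; here is one worked sketch (ours), with X ∼ Exp(3) measuring the time between calls in minutes.

```latex
% Worked sketch for Example 2.3.3.
\begin{align*}
\text{1) } & P(X > k) = e^{-3k}.\\
\text{2) } & P(X < t) = 1 - e^{-3t} = 0.8 \;\Rightarrow\; e^{-3t} = 0.2
  \;\Rightarrow\; t = \tfrac{\ln 5}{3} \approx 0.536 \text{ min}.\\
\text{3) } & P(X > c + k \mid X > c) = \frac{P(X > c + k)}{P(X > c)}
  = \frac{e^{-3(c+k)}}{e^{-3c}} = e^{-3k} = P(X > k).
\end{align*}
```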
The lack of memory property
Note that this result states that it does not matter how long we
have waited between calls, the distribution of the extra time we
wait until the next call is simply the distribution of the time
between calls.
This property is called the "lack of memory" property.
Another distribution which has this property is the geometric
distribution. For example, suppose I throw a die until I get a six.
It does not matter how often I have already thrown the die; I expect to throw it on average another 6 times before I obtain a six (as long as the die is "fair").
3. The Pareto distribution
X has a Pareto distribution with parameters xm and α when the
density function is given by
f(x) = αxm^α / x^(α+1), for x ≥ xm
f(x) = 0, for x < xm
We write X ∼ Pareto(xm , α).
Note that xm > 0 and α > 0.
The density function of the Pareto distribution looks similar to that
of the exponential distribution shifted xm units to the right (it has
a heavier tail though).
The Pareto distribution is often used to model the distribution of
wages when there is a minimum wage.
xm represents the minimum wage.
α represents the degree of "concentration" of the wage distribution, i.e. the smaller α, the larger the degree of wage inequality.
Standard Results for the Exponential and Pareto
Distributions
Suppose X ∼ Exp(λ); then for k ≥ 0
P(X > k) = e^(−λk)
Suppose X ∼ Pareto(xm, α); then for k ≥ xm
P(X > k) = (xm/k)^α
For α > 1, the expected value (mean) of the Pareto distribution is given by
E(X) = αxm/(α − 1)
Note that these results follow directly from calculating the
appropriate definite integrals.
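For completeness, a sketch of the two integrals (our addition):

```latex
% The tail probabilities follow by direct integration.
\begin{align*}
X \sim \mathrm{Exp}(\lambda):\quad
  & P(X > k) = \int_k^\infty \lambda e^{-\lambda x}\,dx
    = \big[-e^{-\lambda x}\big]_k^\infty = e^{-\lambda k}.\\
X \sim \mathrm{Pareto}(x_m, \alpha):\quad
  & P(X > k) = \int_k^\infty \frac{\alpha x_m^\alpha}{x^{\alpha+1}}\,dx
    = \Big[-\Big(\frac{x_m}{x}\Big)^{\alpha}\Big]_k^\infty
    = \Big(\frac{x_m}{k}\Big)^{\alpha}.
\end{align*}
```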
In order to get any probability related to the exponential or Pareto
distribution, we use the facts from the previous slide together with
the following two facts, which hold for any continuous distribution.
1. P(X < k) = 1 − P(X > k)
2. P(a < X < b) = P(X > a) − P(X > b)
Illustration of these Results
The second result follows from the fact that the probability of X
lying between a and b is the area under the density curve between
x = a and x = b.
Example 2.3.4
The distribution of monthly salaries in Poland can be modelled
using the Pareto distribution. The minimum salary is 2 000 PLN
and the concentration factor is 2.
Calculate the probability that an individual earns
i) less than 4 000 PLN
ii) greater than 8 000 PLN
iii) Calculate the expected wage.
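The solution slides are blank in the transcript; a worked sketch (ours, using scipy's pareto with shape α = 2 and scale xm = 2000) follows.

```python
# Worked sketch for Example 2.3.4: wages ~ Pareto(x_m = 2000, alpha = 2).
from scipy.stats import pareto

alpha, x_m = 2, 2000
X = pareto(alpha, scale=x_m)

print(X.cdf(4000))  # i)   P(X < 4000) = 1 - (2000/4000)^2 = 0.75
print(X.sf(8000))   # ii)  P(X > 8000) = (2000/8000)^2 = 0.0625
print(X.mean())     # iii) E(X) = alpha * x_m / (alpha - 1) = 4000 PLN
```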
4. The normal (Gaussian) distribution
X has a normal distribution with expected value (mean) µ and
variance σ² when the density function is given by
f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)).
We write X ∼ N(µ, σ²). This is the very commonly met bell-shaped distribution.
Much of the theory of statistics is based upon the properties of this
distribution. The normal distribution will be the subject of the next
section.
Expected value and variance of standard continuous
distributions
Distribution       Expected value    Variance
N(µ, σ²)           µ                 σ²
Exp(λ)             1/λ               1/λ²
U[a, b]            (a + b)/2         (b − a)²/12
Pareto(xm, α)      αxm/(α − 1)       (xm/(α − 1))² · α/(α − 2)
Note: The expected value of the Pareto distribution exists only when α > 1; the variance exists only when α > 2.
2.4 The Normal Distribution and the Central Limit
Theorem
The importance of the normal distribution results from the central
limit theorem, which explains why this bell shaped distribution is
so often observed in nature.
2.4.1 The standard normal distribution
The density function cannot be integrated analytically (it has no closed-form antiderivative in terms of elementary functions).
In order to calculate probabilities associated with the normal
distribution, we can use standardization.
A standard normal random variable has expected value 0 and
standard deviation equal to 1.
Such a random variable is denoted by Z i.e. Z ∼ N(0, 1).
Using tables for the standard normal distribution
The NORMSDIST(k) function in Excel gives the value of P(Z < k).
This function has been used to create the table used in this course
with probabilities of the form P(Z > k) = 1 − P(Z < k) for k ≥ 0.
Of course, often we have to calculate probabilities of events which
take a different form.
In order to do this we use the following 3 rules. These follow from
the interpretation of the probability of an event as the appropriate
area under the density curve.
1. The law of complementarity
P(Z < k) = 1 − P(Z > k)
It should be noted that P(Z = k) = 0.
The area under the density curve is 1, hence
P(Z < k) + P(Z > k) = 1 i.e. P(Z < k) = 1 − P(Z > k). This
is a general rule for continuous distributions.
2. The law of symmetry
Since the standard normal distribution is symmetric about 0,
P(Z < −k) = P(Z > k)
This is used to calculate probabilities in the "left-hand tail" of the distribution (i.e. when the constant is negative). This law is specific to distributions which are symmetric around 0.
3. The interval rule
P(a < Z < b) = P(Z > a) − P(Z > b)
This rule holds for any continuous distribution.
Reading the table for the standard normal distribution
In order to read P(Z > k), where k is given to 2 decimal places,
we find the row corresponding to the digits either side of the
decimal point and the column corresponding to the second place
after the decimal point.
The table on the next slide illustrates a fragment of the table.
k      0.00     0.01     0.02     0.03
1.1    0.1357   0.1335   0.1314   0.1292
1.2    0.1151   0.1131   0.1112   0.1093
For example, P(Z > 1.22) = 0.1112.
Since P(Z > k) is decreasing in k and must be non-negative, we can assume that for k > 4, P(Z > k) ≈ 0.
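The same values can be reproduced without the table; the sketch below (our addition, assuming scipy rather than Excel) uses norm.sf, which returns the upper-tail probability P(Z > k).

```python
# Sketch: reproducing the table entries with scipy's survival function.
from scipy.stats import norm

print(round(norm.sf(1.22), 4))  # 0.1112, matching the table fragment
print(norm.sf(4.5))             # about 3.4e-06, effectively 0 for k > 4
```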
Example 2.4.1
Calculate
i) P(Z > 1.76)
ii) P(Z > −0.83)
iii) P(Z < −0.18)
iv) P(−0.43 < Z < 1.36).
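The solution slides are blank in the transcript; a worked sketch (ours, via scipy in place of the printed table) follows, using the laws of complementarity, symmetry and the interval rule.

```python
# Worked sketch for Example 2.4.1.
from scipy.stats import norm

print(norm.sf(1.76))    # i)   P(Z > 1.76) = 0.0392...
print(norm.sf(-0.83))   # ii)  P(Z > -0.83) = P(Z < 0.83) = 0.7967...
print(norm.cdf(-0.18))  # iii) P(Z < -0.18) = P(Z > 0.18) = 0.4286...
print(norm.cdf(1.36) - norm.cdf(-0.43))  # iv) P(-0.43 < Z < 1.36) = 0.5795...
```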
Reading the table for the standard normal distribution
Sometimes it is necessary to find the number k for which
P(Z > k) = p, where p ≤ 0.5.
In this case we find the value closest to p in the body of the table and read off k from the corresponding row and column.
The rules of complementarity and symmetry may be needed to
obtain the desired form i.e. P(Z > k) = p, where p ≤ 0.5.
Example 2.4.2
Find the value of k satisfying P(Z < k) = 0.17.
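A worked sketch (ours; the solution slides are blank in the transcript):

```latex
% Since 0.17 < 0.5, k must be negative; symmetry converts the problem
% into the tabulated form P(Z > c) = p.
\begin{align*}
P(Z < k) = 0.17 \;&\Rightarrow\; P(Z > -k) = 0.17
  && \text{(law of symmetry)}\\
  &\Rightarrow\; -k \approx 0.95
  && \text{(closest table entry)}\\
  &\Rightarrow\; k \approx -0.95.
\end{align*}
```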
The NORMSINV function in Excel
The function NORMSINV(p) gives the value of k for which P(Z < k) = p for a given p, 0 < p < 1.
This is the inverse function to the distribution function of the
standard normal distribution.
In the previous example (find the value of k satisfying P(Z < k) = 0.17):
k = NORMSINV(0.17) = −0.95
2.4.2 Standardisation of a normal random variable
Clearly, the technique used in the previous subsection only works
for a standard normal random variable.
How do we calculate appropriate probabilities for a general normal distribution, i.e. X ∼ N(µ, σ²)?
The first step is to standardise the variable.
If X ∼ N(µ, σ²), then
Z = (X − µ)/σ ∼ N(0, 1)
Subtracting the expected value first centres the distribution around 0, and then division by the standard deviation "shrinks" the dispersion of the distribution to that of the standard normal distribution.
Transformations of normal random variables
In general, if X ∼ N(µ, σ²), then Y = aX + b also has a normal distribution.
In particular, Y ∼ N(aµ + b, a²σ²).
The sum of independent, normal random variables is also normally
distributed. The expected value and variance of such a sum are the
sums of the individual expected values and variances, respectively.
After standardisation, we can calculate the appropriate
probabilities as before.
It should be noted that this standardisation procedure is specific to
the normal distribution.
Other distributions may have particular standard forms and
standardisation procedures.
For example, the standard exponential distribution is Exp(1), where f(x) = e^(−x).
Note that if Y ∼ Exp(λ), then λY ∼ Exp(1).
Example 2.4.3
The height of male students is normally distributed with a mean of 175 cm and a variance of 144 cm².
a) What is the probability that a randomly picked male student is
i) taller than 190cm
ii) between 163 and 181cm?
b) 10% of male students are shorter than what height?
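The solution slides are blank in the transcript; a worked sketch (ours, via scipy) follows, standardising with µ = 175 and σ = √144 = 12.

```python
# Worked sketch for Example 2.4.3: X ~ N(175, 144), sigma = 12 cm.
from scipy.stats import norm

mu, sigma = 175, 12
print(norm.sf((190 - mu) / sigma))     # a) i)  P(Z > 1.25) = 0.1056...
print(norm.cdf((181 - mu) / sigma)
      - norm.cdf((163 - mu) / sigma))  # a) ii) P(-1 < Z < 0.5) = 0.5328...
# b) P(Z < z) = 0.1 gives z = -1.28..., i.e. about 175 - 1.28 * 12 cm.
print(mu + sigma * norm.ppf(0.1))      # about 159.6 cm
```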
2.4.3 The central limit theorem
Suppose I throw a coin once. The distribution of the number of
heads, X , is
P(X = 0) = 0.5; P(X = 1) = 0.5,
i.e. nothing like a bell-shaped distribution.
However, suppose I throw the coin a large number of times, say n times. I am reasonably likely to get around n/2 heads, but the probability of getting a number of heads far from n/2 (either much larger or much smaller) is very small.
The distribution of the number of heads thrown, X, then has a bell-like shape (i.e. similar to the normal distribution).
This is a particular case of the central limit theorem.
Note that X can be written as X = X1 + X2 + . . . + Xn , where
Xi = 1 if the i-th toss results in heads
Xi = 0 if the i-th toss results in tails.
The central limit theorem (CLT)
Suppose X = X1 + X2 + . . . + Xn , where n is large and the Xi are
independent random variables, then X is approximately normally
distributed, i.e. X ∼approx N(µ, σ²), where
µ = E(X) = Σ_{i=1}^n E(Xi)
σ² = Var(X) = Σ_{i=1}^n Var(Xi).
This approximation is good if n ≥ 30, the variances of the Xi are comparable and the distributions of the Xi are reasonably symmetrical.
If the distributions of the Xi are clearly asymmetric, then this approximation will be less accurate.
Example 2.4.4
n independent observations are taken from the exponential
distribution with expected value 1.
Using an appropriate approximation, estimate the probability that the mean of these observations (the sample mean X̄) is between 0.9 and 1.1 when i) n = 30, ii) n = 100.
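The solution slides are blank in the transcript; here is one worked sketch (ours).

```latex
% Worked sketch for Example 2.4.4. Each X_i ~ Exp(1) has mean 1 and
% variance 1, so by the CLT the sample mean is approximately N(1, 1/n).
\begin{align*}
P(0.9 < \bar{X} < 1.1)
  &\approx P\big(-0.1\sqrt{n} < Z < 0.1\sqrt{n}\big)\\
\text{i) } n = 30:\quad
  & P(-0.55 < Z < 0.55) = 2\Phi(0.55) - 1 \approx 0.4176.\\
\text{ii) } n = 100:\quad
  & P(-1 < Z < 1) = 2\Phi(1) - 1 \approx 0.6826.
\end{align*}
```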
The relation between the central limit theorem and
sampling
Note 1: As the sample size grows, the probability of the sample
mean being close to the theoretical (population) mean increases.
In the case of the exponential distribution, the estimation of the
theoretical mean using the sample mean is not highly accurate, due
to the high coefficient of variation (CV=1).
Note 2: In this case the exact probabilities can be calculated,
since the sum of i.i.d. exponential random variables has a Gamma
distribution (not considered in this course).
In the first case, the exact probability (to 4 d.p.) is 0.4162
(compared to the estimate 0.4176).
In the second case, the exact probability (to 4 d.p.) is 0.6835
(compared to the estimate 0.6826).
Hence, the approximation given by the CLT becomes more accurate as the number of observations increases.
However, since the exponential distribution is clearly asymmetric, the approximation given by the CLT is relatively poor here.
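The exact figures quoted above can be reproduced as follows (our sketch, assuming scipy): the sum of n i.i.d. Exp(1) variables has a Gamma(n, 1) distribution, and the sample mean lies in (0.9, 1.1) exactly when the sum lies in (0.9n, 1.1n).

```python
# Sketch: exact probabilities via the Gamma distribution of the sum.
from scipy.stats import gamma

for n in (30, 100):
    p = gamma.cdf(1.1 * n, n) - gamma.cdf(0.9 * n, n)
    print(n, round(p, 4))  # 30 -> 0.4162, 100 -> 0.6835
```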
Proportion of observations from a normal distribution
within one standard deviation of the mean
Note 3: After standardisation, the constants indicate the number
of standard deviations from the mean (a negative sign indicates
deviations below the mean).
Here, P(−1 < Z < 1) = 0.6826 shows that if X comes from any normal distribution, the probability of it being within one standard deviation of the mean is just over 2/3.
Similarly, P(−2 < Z < 2) = 0.9545. Thus, an observation from
any normal distribution will be less than 2 standard deviations
from the mean with a probability of just over 0.95.
2.4.4 The normal approximation to the binomial
distribution
Suppose n is large and X ∼ Bin(n, p); then X ∼approx N(µ, σ²), where µ = np, σ² = np(1 − p).
This approximation is used when n ≥ 30, 0.1 ≤ p ≤ 0.9.
For values of p outside this range, the Poisson approximation tends
to work better.
The continuity correction for the normal approximation to
the binomial distribution
It should be noted that X has a discrete distribution, but we are
using a continuous distribution in the approximation.
For example, suppose we wanted to estimate the probability of
obtaining exactly k heads when we throw a coin n times. This
probability will in general be positive.
However, if we use the normal approximation without an appropriate "correction", we cannot sensibly estimate P(X = k), since for continuous distributions P(X = k) = 0.
Suppose the random variable X takes only integer values and has
an approximately normal distribution.
In order to estimate P(X = k), we use the continuity correction.
This uses the fact that when k is an integer
P(X = k) = P(k − 0.5 < X < k + 0.5).
Example 2.4.5
Suppose a coin is tossed 36 times. Using CLT, estimate the
probability that exactly 20 heads are thrown.
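The solution slides are blank in the transcript; a worked sketch (ours) follows.

```latex
% Worked sketch for Example 2.4.5. X ~ Bin(36, 0.5), so mu = np = 18
% and sigma = sqrt(np(1-p)) = 3; apply the continuity correction.
% (2.5/3 is rounded to 0.83 to match the printed table.)
\begin{align*}
P(X = 20) &= P(19.5 < X < 20.5)
  \approx P\Big(\frac{19.5 - 18}{3} < Z < \frac{20.5 - 18}{3}\Big)\\
  &= P(0.5 < Z < 0.83) = \Phi(0.83) - \Phi(0.5)
  \approx 0.7967 - 0.6915 = 0.1052.
\end{align*}
```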
Example 2.4.5
Using the BINOMDIST function in Excel, we can calculate this probability using the exact distribution.
To four decimal places, this probability is 0.1063.
This continuity correction can be adapted to problems in which we have to estimate the probability that the number of successes lies in a given interval, e.g.
P(15 ≤ X < 21) = P(X = 15) + P(X = 16) + . . . + P(X = 20)
= P(14.5 < X < 15.5) + . . . + P(19.5 < X < 20.5)
= P(14.5 < X < 20.5)
Note that when applying the continuity correction, if the end point
of an interval is given by a non-strict inequality, then we stretch
the interval at that end by 0.5.
If the end point of an interval is given by a strict inequality, then
we shrink the interval at that end by 0.5.
Example 2.4.6
A die is thrown 180 times. Estimate the probability that
1) at least 35 sixes are thrown
2) between 27 and 33 sixes are thrown (inclusively).
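The solution slides are blank in the transcript; one worked sketch (ours):

```latex
% Worked sketch for Example 2.4.6. X ~ Bin(180, 1/6), so mu = 30 and
% sigma = sqrt(180 * (1/6) * (5/6)) = 5; apply the continuity correction.
\begin{align*}
\text{1) } P(X \ge 35) &= P(X > 34.5)
  \approx P\Big(Z > \frac{34.5 - 30}{5}\Big) = P(Z > 0.9) \approx 0.1841.\\
\text{2) } P(27 \le X \le 33) &= P(26.5 < X < 33.5)
  \approx P(-0.7 < Z < 0.7) = 2\Phi(0.7) - 1 \approx 0.5160.
\end{align*}
```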
The normal approximation to the binomial
The probabilities calculated from the exact binomial distribution are, to four decimal places, 0.1828 and 0.5160 respectively.
It should be noted that the normal approximation to the binomial
is most accurate when n is large and p is close to 0.5.
This is due to the fact that X = X1 + X2 + . . . + Xn, where the Xi are independent Bernoulli (0-1) random variables with parameter p.
The distribution of Xi is symmetric when p = 0.5.