Chapter 2: Random Variables
In this chapter we will cover:
1. Discrete Random variables, (§2.1 Rice)
2. Continuous Random variables, (§2.2 Rice)
3. Functions of a random variable (§2.3 Rice)
Random Variables
1. A random variable is a number whose value is determined by chance
2. The number of heads in three coin tosses is a random variable
3. The time until the next magnitude 8 earthquake is a random variable.
4. Example 2 is a discrete random variable, since its value must be one of the integers 0, 1, 2, 3. Since time is continuous, example 3 is a continuous random variable.
Example: coin toss
• For the three coin tosses the sample space is
Ω = {hhh, hht, htt, hth, ttt, tth, thh, tht}
• Let X be the number of heads. Then X is 3 when hhh occurs, and 2 when hht, thh or hth occurs
• That is X = 2 if and only if ω ∈ {hht, thh, hth}, hence P (X = 2) = P ({hht, thh, hth}).
• We can therefore work out the probabilities of seeing X = 0, 1, 2, 3. These are
$P(X = 0) = \frac{1}{8}, \quad P(X = 1) = \frac{3}{8}, \quad P(X = 2) = \frac{3}{8}, \quad P(X = 3) = \frac{1}{8}$
• This is called the probability mass function for X. It is also called the frequency function.
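The coin-toss mass function can be checked by brute-force enumeration. The following is a minimal sketch, assuming a fair coin so that each of the 8 outcomes has probability 1/8; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three tosses: all 8 sequences of 'h' and 't'.
omega = ["".join(seq) for seq in product("ht", repeat=3)]

# X(outcome) = number of heads in the outcome.
counts = Counter(outcome.count("h") for outcome in omega)

# Assumption: fair coin, so p(x) = (number of outcomes with x heads) / 8.
pmf = {x: Fraction(n, len(omega)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

Using exact fractions rather than floats keeps the output directly comparable with the values 1/8 and 3/8 above.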
Probability mass function
• A general discrete random variable takes values $x_1, x_2, x_3, \ldots$
• The probability mass function is
$p(x_i) = P(X = x_i)$
• From the rules of probability we must have that
$0 \le p(x_i) \le 1$
and
$\sum_i p(x_i) = 1$
Cumulative distribution function
• As an alternative to the mass function you can also define the cumulative distribution function (cdf)
• It is defined by
$F(x) = P(X \le x)$
[Figure: two panels, "Prob. Mass Fn." and "Cumulative Distribution"; axes: x, Probability]
Cumulative distribution function
• Cumulative distribution functions are often denoted by capital letters e.g. F (x)
• Frequency functions by lowercase letters e.g. f (x)
• The CDF is non-decreasing and satisfies
$\lim_{x \to -\infty} F(x) = 0, \quad \lim_{x \to \infty} F(x) = 1$
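For a discrete random variable the cdf is a running sum of the mass function. The sketch below builds it for the three-coin-toss example, assuming a fair coin; the helper name `cdf` is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

# pmf of X = number of heads in three fair coin tosses.
xs = [0, 1, 2, 3]
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Running totals give the value of F at each jump point.
cum = list(accumulate(pmf))

def cdf(x):
    """F(x) = P(X <= x): sum of p(x_i) over all x_i <= x."""
    return sum((p for xi, p in zip(xs, pmf) if xi <= x), Fraction(0))

for x in (-1, 0, 1.5, 3, 10):
    print(f"F({x}) = {cdf(x)}")
```

The function is 0 to the left of the smallest value, 1 to the right of the largest, and constant between jumps, which matches the limits stated above.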
Bernoulli Random variables
1. A Bernoulli random variable takes only two possible values 0 or 1
2. The probability it takes the value 1 is p, the probability it takes value 0 is 1 − p.
3. Its frequency function is
$p(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases}$
4. This can also be written as $p(x) = p^x (1 - p)^{1 - x}$ for x = 0, 1 and 0 otherwise
Exercise
Sketch the frequency and cdf for the random variable X where
$P(X = -1) = \frac{1}{10}, \quad P(X = 0) = \frac{1}{10}, \quad P(X = 1) = \frac{3}{10},$
$P(X = 2) = \frac{2}{10}, \quad P(X = 3) = 0, \quad P(X = 4) = \frac{3}{10}$
Exercise
What is the probability mass function for the cdf below?
[Figure: "Cumulative Distribution", a cdf plotted for x from 0 to 10; axes: x, Probability]
Recommended Questions
From §2.5 of Rice you should do 1, 3, 5 (a), 7
Indicator random variables
1. If A is an event, then there is a probability p that the event happens, and 1 − p that it doesn't happen
2. This can be coded as a random variable by taking the value 1 if it does happen and 0 if it does not.
3. Formally, define the indicator random variable $I_A(\omega)$ by
$I_A(\omega) = \begin{cases} 1 & \omega \in A \\ 0 & \omega \notin A \end{cases}$
4. Then $I_A$ is a Bernoulli r.v. for any event A.
The Binomial distribution
1. Suppose that n independent experiments, each either ‘success’ or a ‘failure’, are run
2. Further suppose that for each experiment there is a fixed probability p of ‘success’
3. The number of successes in n experiments is called a binomial random variable
4. Its frequency function is
$p(k) = \binom{n}{k} p^k (1 - p)^{n - k}$
for $k \in \{0, 1, \ldots, n\}$.
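The binomial frequency function translates directly into code. This is a minimal sketch using the standard-library `math.comb` for the binomial coefficient; the function name `binomial_pmf` and the chosen n and p are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # illustrative values
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(probs):
    print(f"P(X = {k:2d}) = {prob:.4f}")
print("total:", sum(probs))
```

Summing over k = 0, ..., n is a quick sanity check that the values form a valid mass function.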
The Binomial distribution
Some frequency functions when n = 10 for different values of p
[Figure: four panels, p=0.1, p=0.5, p=0.3 and p=0.9; axes: x, probability]
The Binomial distribution
1. The mode is the x value with the highest probability. What is it in each of the cases shown above?
2. What is the relationship between the p = 0.1 and p = 0.9 case?
The Tay-Sachs disease
• Suppose both parents are carriers of Tay-Sachs disease
• Each child has a probability 0.25 of having the disease and this is independent across different children
• If the couple have 4 children, the number X that will have the disease is Binomial(4, 0.25)
• The probabilities are P(X = 0) = 0.316, P(X = 1) = 0.422, P(X = 2) = 0.211, P(X = 3) = 0.047, P(X = 4) = 0.004
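As a worked check of two of these values, writing X for the number of affected children and substituting n = 4 and p = 0.25 into the binomial frequency function:

```latex
\begin{align*}
P(X = 0) &= \binom{4}{0} (0.25)^0 (0.75)^4 = 0.31640625 \approx 0.316, \\
P(X = 1) &= \binom{4}{1} (0.25)^1 (0.75)^3 = 4 \times 0.25 \times 0.421875 = 0.421875 \approx 0.422.
\end{align*}
```

The remaining three values follow in the same way.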
The Tay-Sachs disease
• What would these probabilities be if the probability of a single child having the disease is 0.5?
• What is the mode (i.e., the most likely number)?
The geometric distribution
• The geometric distribution is also constructed from independent Bernoulli trials
• On each trial a ‘success’ occurs with probability p
• The geometric random variable counts the number of trials up to and including the first success
• The frequency function is
$p(k) = (1 - p)^{k - 1} p$
for $k = 1, 2, 3, \ldots$.
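The geometric frequency function can be summed to get the cdf. The sketch below compares the partial sum with the closed form 1 − (1 − p)^k, which follows because X > k exactly when the first k trials all fail; the value p = 0.3 and the function names are illustrative.

```python
def geometric_pmf(k, p):
    """P(X = k): k - 1 failures followed by one success, k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k, p):
    """P(X <= k) = 1 - P(first k trials all fail) = 1 - (1 - p)**k."""
    return 1 - (1 - p) ** k

p = 0.3  # illustrative success probability
partial = sum(geometric_pmf(j, p) for j in range(1, 6))
print(partial, geometric_cdf(5, p))
```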
The geometric distribution
Here are some numerical examples for different values of p.
[Figure: four panels, p=0.5, p=0.1, p=0.3 and p=0.6; axes: x, probability]
Exercise
1. Which is more likely with a fair coin: (i) 9 heads from 10 throws or (ii) 18 heads from 20 throws?
2. If X is a geometric random variable with p = 0.5, for what value of k is P(X ≤ k) ≈ 0.99?
The hypergeometric distribution
• Suppose we have an urn with n balls, r black and n − r white.
• Let X be the number of black balls drawn when taking m balls without replacement. X has a hypergeometric
distribution.
• Its frequency function is
$P(X = k) = \frac{\binom{r}{k} \binom{n - r}{m - k}}{\binom{n}{m}}$
• Thus the probability of winning a lottery is a hypergeometric probability
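To make the lottery remark concrete, here is a minimal sketch of the hypergeometric frequency function. The lottery format (49 numbers, 6 winning numbers, 6 picks per ticket) is a hypothetical choice for illustration, not something given in the notes.

```python
from math import comb

def hypergeom_pmf(k, n, r, m):
    """P(X = k): k black balls among m drawn from n balls, r of them black."""
    return comb(r, k) * comb(n - r, m - k) / comb(n, m)

# Hypothetical lottery: the 6 winning numbers play the role of the black balls.
n, r, m = 49, 6, 6
probs = [hypergeom_pmf(k, n, r, m) for k in range(m + 1)]

for k, prob in enumerate(probs):
    print(f"P(match {k}) = {prob:.3e}")
```

Here matching all six numbers corresponds to k = 6, whose probability is 1 divided by the number of ways to choose 6 numbers from 49.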
The Poisson distribution
• This has a frequency function
$P(X = k) = \frac{\lambda^k}{k!} \exp(-\lambda)$
for $k = 0, 1, 2, \ldots$.
• This can be thought of as the limit of the binomial distribution as n gets large and p gets small, with λ = np held fixed.
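The quality of the Poisson approximation can be checked numerically. This sketch measures the largest pointwise gap between the two frequency functions for two cases with the same λ = np = 10; the helper `max_gap` is a hypothetical name introduced here.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

def max_gap(n, p):
    """Largest |binomial - Poisson| over k = 0..n, with lambda = n * p."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))

print(max_gap(20, 0.5))   # lambda = 10, but p is not small
print(max_gap(100, 0.1))  # lambda = 10, larger n and smaller p
```

The second gap is noticeably smaller than the first, in line with the approximation needing large n and small p.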
The Poisson and binomial distributions
A numerical comparison of some Poisson and binomial distributions; the binomial is shown in black and the Poisson in red.
[Figure: four panels titled "n=20, p=0.5, lambda=10", "n=5, p=0.1, lambda=0.5", "n=100, p=0.1, lambda=10" and "n=5, p=0.1, lambda=0.5"; axes: x, probability]
Examples
• Modelling the number of telephone calls coming into an exchange, if the exchange has a large number of customers who act more or less independently
• Modelling the number of α particles emitted from a radioactive source
• Modelling the number of large accidents dealt with by an insurance company
Recommended questions
From §2.5 Rice problems: 11,13,1,7,27,31,32.
Continuous Random variables
• Suppose that the random variable of interest can take a continuum of values rather than lying in a discrete set
• In such a case the frequency function is replaced by the density function f(x), which satisfies f(x) ≥ 0 and
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
• If X is a random variable with density f(x) then
$P(a < X < b) = \int_a^b f(x)\,dx$
Continuous Random variables
• For small δ, if f(x) is continuous then
$P\left(x - \frac{\delta}{2} \le X \le x + \frac{\delta}{2}\right) = \int_{x - \delta/2}^{x + \delta/2} f(u)\,du \approx \delta f(x)$
• The cumulative distribution function F(x) is defined as
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$
• By calculus we have that
$f(x) = \frac{dF}{dx}(x)$
Uniform Random Variables
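• If X is uniformly distributed on the interval [a, b] then
$f(x) = \begin{cases} \frac{1}{b - a} & a \le x \le b \\ 0 & x < a \text{ or } x > b \end{cases}$
• The cumulative distribution function is
$F(x) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$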
Uniform Random Variables
The density and cdf for the uniform on [0, 1].
[Figure: two panels, "Uniform density" (axes: x, Density) and "Uniform CDF" (axes: x, Probability)]
The cdf
• When the cdf F is continuous and strictly increasing, the inverse $F^{-1}$ is well-defined.
• The pth quantile of F is defined to be $x_p$ such that $F(x_p) = p$, where p ∈ [0, 1]
• When p = 0.5 the quantile is called the median; when p is 0.25 or 0.75 it is called the lower or upper quartile of F.
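For the uniform distribution on [a, b] the quantile can be found by solving (x − a)/(b − a) = p directly. The sketch below does this for 0 < p < 1; the interval [2, 6] and the function names are hypothetical choices for illustration.

```python
def uniform_cdf(x, a, b):
    """F(x) for X uniform on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def uniform_quantile(p, a, b):
    """Solve F(x_p) = p for x_p, i.e. x_p = a + p * (b - a), for 0 < p < 1."""
    return a + p * (b - a)

a, b = 2.0, 6.0  # hypothetical interval
lower = uniform_quantile(0.25, a, b)
median = uniform_quantile(0.5, a, b)
upper = uniform_quantile(0.75, a, b)
print(lower, median, upper)
```

Feeding each quantile back into `uniform_cdf` returns the original p, which is the defining property F(x_p) = p.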
Probabilities
If X has a uniform [0, 1] distribution then P (X ∈ (0.5, 0.6)) is illustrated for both the density and cdf below
[Figure: two panels, "Uniform density" (axes: x, Density) and "Uniform CDF" (axes: x, Probability)]
Exercise
Sketch both the density and the cdf for a uniform [−1, 1] random variable X, and indicate on each what corresponds to the probability that X > 0.
The exponential distribution
• The density function is
$f(x) = \begin{cases} \lambda \exp(-\lambda x) & x \ge 0 \\ 0 & x < 0 \end{cases}$
• The cdf is
$F(x) = \begin{cases} 1 - \exp(-\lambda x) & x \ge 0 \\ 0 & x < 0 \end{cases}$
‘Memoryless property’
• The exponential distribution is often used to model lifetimes or waiting times.
• It has the following property
P (T > t + s|T > s) = P (T > t),
see page 49
• What does this mean?
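One way to explore the property is to evaluate both sides numerically from the exponential cdf. This is a minimal sketch; the rate and the times s and t are hypothetical values, and `survival` is a helper name introduced here for P(T > t) = 1 − F(t).

```python
from math import exp, isclose

def survival(t, lam):
    """P(T > t) = 1 - F(t) = exp(-lam * t) for t >= 0."""
    return exp(-lam * t)

lam, s, t = 0.5, 3.0, 2.0  # hypothetical rate and times

# P(T > t + s | T > s) = P(T > t + s) / P(T > s),
# because the event {T > t + s} is contained in {T > s}.
conditional = survival(t + s, lam) / survival(s, lam)

print(conditional, survival(t, lam))
print(isclose(conditional, survival(t, lam)))
```

Changing s leaves the conditional probability unchanged, so the time already waited carries no information about the remaining wait.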
The Normal distribution
• Probably the most used distribution in statistics is called the normal
• Its density function is given by
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty.$
• The µ term is called the mean and the σ is called the standard deviation.
• The cdf does not have a nice formula, but Table 2 page A7 Rice gives numerical values for a standard normal
distribution.
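Numerical values of the standard normal cdf can also be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the evaluation points are arbitrary examples.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# A few values of the standard normal cdf, in the style of a printed table.
for x in (0.0, 0.5, 1.0, 1.96, 2.0):
    print(f"Phi({x:4.2f}) = {std_normal.cdf(x):.4f}")
```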
The Normal distribution
The plot shows three normal distributions. The black has µ = 0, σ = 1 (often called a standard normal). The red has
µ = 5, σ = 1 while the blue has µ = 0, σ = 3.
[Figure: "Normal densities"; axes: x, Density]
The Normal distribution
The figure shows the relationship between the shape of the normal density and the size of a standard deviation
[Figure: "Normal density"; axes: Standard deviations from mean, Density]
Recommended Questions
From §2.5 Rice look at questions 34, 40, 41, and 45. Also study the memoryless property of the exponential on page 49.
Functions of a random variable
• Suppose X has a density function f (x), what is the density function of Y = g(X) for some function g?
• Since X is a random variable (i.e., its value is determined by chance), the value of g(X) is also determined by chance, hence it is also a random variable
• The function g(X) could be a linear function, i.e., Y = g(X) = aX + b
• Alternatively it could be a non-linear function, e.g. $Y = g(X) = X^2$.
Example Normal distribution
• Suppose $X \sim N(\mu, \sigma^2)$ (i.e. X has a normal distribution with mean µ and standard deviation σ) and that Y = aX + b where a > 0.
• Consider the cdf for Y,
$F_Y(y) = P(Y \le y) = P(aX + b \le y) = P\left(X \le \frac{y - b}{a}\right) = F_X\left(\frac{y - b}{a}\right)$
• Thus the density of Y is
$f_Y(y) = \frac{d}{dy} F_X\left(\frac{y - b}{a}\right) = \frac{1}{a} f_X\left(\frac{y - b}{a}\right)$
Example Normal distribution
• Thus
$f_Y(y) = \frac{1}{a\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y - b - a\mu}{a\sigma}\right)^2\right]$
so $Y \sim N(a\mu + b, a^2\sigma^2)$
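The result can be checked numerically by comparing (1/a) f_X((y − b)/a) with the density of a normal distribution with mean aµ + b and standard deviation aσ. This is a sketch under hypothetical parameter values; all names are illustrative.

```python
from math import exp, isclose, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 1.0, 2.0  # hypothetical parameters of X
a, b = 3.0, -4.0      # hypothetical constants, with a > 0

def density_of_y(y):
    """f_Y(y) = (1/a) f_X((y - b)/a), obtained by differentiating the cdf of Y."""
    return normal_pdf((y - b) / a, mu, sigma) / a

# Each row: y, the transformed density, and the N(a*mu + b, (a*sigma)^2) density.
checks = [(y, density_of_y(y), normal_pdf(y, a * mu + b, a * sigma))
          for y in (-10.0, -1.0, 0.0, 5.0)]

for y, lhs, rhs in checks:
    print(y, lhs, rhs, isclose(lhs, rhs))
```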
Example B page 59
• Let $X \sim N(\mu, \sigma^2)$; we want to find the probability that X is less than σ away from µ, i.e. P(|X − µ| < σ)
• This probability is
$P(-\sigma < X - \mu < \sigma) = P\left(-1 < \frac{X - \mu}{\sigma} < 1\right)$
• Using the previous result we see that $Z = \frac{X - \mu}{\sigma}$ has a standard normal N(0, 1) distribution
• If Φ(x) is the cdf for the standard normal distribution, then we want
$\Phi(1) - \Phi(-1) \approx 0.68$
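The value 0.68 can be reproduced without tables by using the identity Φ(z) = (1 + erf(z/√2))/2, where erf is the error function. The sketch below relies on `math.erf` from the standard library; the function name `phi_cdf` is introduced here for illustration.

```python
from math import erf, sqrt

def phi_cdf(z):
    """Standard normal cdf via the error function: (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

within_one_sd = phi_cdf(1.0) - phi_cdf(-1.0)
print(f"{within_one_sd:.4f}")
```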
Example C page 59
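• Find the density of $X = Z^2$ where Z ∼ N(0, 1)
• For x ≥ 0 we have
$F_X(x) = P(X \le x) = P(-\sqrt{x} \le Z \le \sqrt{x}) = \Phi(\sqrt{x}) - \Phi(-\sqrt{x})$
• Find the density of X by differentiating the cdf. Since $\Phi'(x) = \phi(x)$, the density for the standard normal, we get
$f_X(x) = \frac{1}{2} x^{-1/2} \phi(\sqrt{x}) + \frac{1}{2} x^{-1/2} \phi(-\sqrt{x}) = x^{-1/2} \phi(\sqrt{x})$
where the last step uses the symmetry φ(−z) = φ(z)
• More explicitly this is
$f_X(x) = \frac{x^{-1/2}}{\sqrt{2\pi}} \exp(-x/2), \quad x > 0.$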
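A numerical derivative of the cdf gives an independent check of this density. The sketch below uses a central difference with a hypothetical step size h and a few arbitrary evaluation points; the function names are illustrative.

```python
from math import erf, exp, pi, sqrt

def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cdf_x(x):
    """F_X(x) = Phi(sqrt(x)) - Phi(-sqrt(x)) for x >= 0."""
    return std_normal_cdf(sqrt(x)) - std_normal_cdf(-sqrt(x))

def density_x(x):
    """f_X(x) = x**(-1/2) * exp(-x/2) / sqrt(2*pi) for x > 0."""
    return x ** -0.5 * exp(-x / 2) / sqrt(2 * pi)

h = 1e-5  # step for the central difference (an arbitrary small value)
rows = []
for x in (0.5, 1.0, 2.0, 4.0):
    slope = (cdf_x(x + h) - cdf_x(x - h)) / (2 * h)
    rows.append((x, slope, density_x(x)))
    print(x, slope, density_x(x))
```

The two columns agree to several decimal places, as expected if the density is the derivative of the cdf.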
General rule
• Let X be a continuous random variable with density $f_X(x)$ and let Y = g(X) where g is differentiable and strictly monotonic
• The density of Y is
$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|$
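As a worked illustration of the rule (an example chosen here, not taken from Rice), let X be uniform on [0, 1] and let Y = g(X) = −ln(X)/λ with λ > 0, which is differentiable and strictly decreasing on (0, 1]. The absolute value of the derivative of the inverse is what keeps the density non-negative for a decreasing g.

```latex
\begin{align*}
g^{-1}(y) &= \exp(-\lambda y), \qquad y \ge 0, \\
\left| \frac{d}{dy} g^{-1}(y) \right| &= \lambda \exp(-\lambda y), \\
f_Y(y) &= f_X\!\left(g^{-1}(y)\right) \left| \frac{d}{dy} g^{-1}(y) \right|
        = 1 \cdot \lambda \exp(-\lambda y) = \lambda \exp(-\lambda y), \qquad y \ge 0.
\end{align*}
```

So Y has the exponential density with parameter λ, since f_X equals 1 wherever exp(−λy) lies in [0, 1].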
Recommended Questions
From §2.5 Rice look at questions 53 (Hint: use the results on a function of a random variable to convert the r.v. to a standard normal, then use the tables at the back of the book), 55, 58, 59 and 67 (a, b)