Proposition 4.4. Let $X$ be a discrete random variable. Then its probability mass function satisfies:

(i) $p_X(x) \geq 0$ for all $x \in \mathbb{R}$;

(ii) $\{x \in \mathbb{R} : p_X(x) \neq 0\}$ is at most countable;

(iii) $\sum_{x \in \mathbb{R}} p_X(x) = 1$.
Conversely, any map $p : \mathbb{R} \to \mathbb{R}$ that satisfies properties (i)-(iii) is the probability mass function of some discrete random variable.
Proof.

(i) It follows directly from the definition of a PMF, that is, $p_X(x) = P(X = x)$, and from the fact that probability measures are non-negative.

(ii) Since $X$ is a discrete random variable, there exists a countable subset $K \subset \mathbb{R}$ of the real line such that $P(X \in K) = 1$. This means that the PMF vanishes outside $K$, that is, $p_X(x) = 0$ for all $x \in \mathbb{R} \setminus K$. Thus, the points to which the PMF assigns a strictly positive value can only be among those contained in $K$, hence there are at most countably many such points.
(iii) Note that, since a random variable takes values in $\mathbb{R}$, we can rewrite the sample space as
$$\Omega = \bigcup_{x \in \mathbb{R}} \{X = x\} = \bigcup_{x \in X(\Omega)} \{X = x\},$$
that is, a union over all possible values of $X$ (the image of $\Omega$ through $X$), of which there are at most countably many. Moreover, the events in the union are all mutually exclusive. So, by the additivity axiom of probability measures, we have
$$1 = P(\Omega) = P\bigg(\bigcup_{x \in X(\Omega)} \{X = x\}\bigg) = \sum_{x \in X(\Omega)} P(\{X = x\}) = \sum_{x \in \mathbb{R}} p_X(x),$$
where the last equality holds because $p_X$ vanishes outside $X(\Omega)$.
Example 4.5. In the experiment consisting of three independent tosses of a coin, let us denote by $X$ the number of heads obtained. If the coin is balanced, we have an equal-likelihood model, so each elementary event has probability $1/8 = 0.125$. So, if we want to compute the probability that in the three tosses we get exactly one head, we have
$$p_X(1) = P(X = 1) = P(\{(HTT), (THT), (TTH)\}) = P(\{(HTT)\}) + P(\{(THT)\}) + P(\{(TTH)\}) = 3 \cdot 0.125 = 0.375.$$
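The probabilities in this example are easy to check by brute force. Below is a minimal Python sketch (mine, not part of the original notes) that enumerates all $2^3$ equally likely outcomes and recovers $p_X(1) = 0.375$:

```python
from itertools import product

# Enumerate all 2^3 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))

# X = number of heads; each outcome has probability 1/8.
p_X1 = sum(1 for w in outcomes if w.count("H") == 1) / len(outcomes)
print(p_X1)  # 0.375
```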
Another function associated with a discrete random variable and determined by its PMF is the cumulative distribution function.

Definition 4.6. Let $X$ be a discrete random variable. The map $F_X : \mathbb{R} \to \mathbb{R}$ defined by
$$F_X(a) = P(X \leq a) = \sum_{x \leq a} p_X(x), \qquad a \in \mathbb{R},$$
is called the cumulative distribution function of $X$.
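As an illustration (again my own sketch, not from the notes), the CDF of the coin-toss variable $X$ of Example 4.5 can be tabulated directly from its PMF:

```python
# PMF of X = number of heads in three tosses of a balanced coin.
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

def cdf(a: float) -> float:
    """F_X(a) = sum of p_X(x) over all x <= a."""
    return sum(p for x, p in pmf.items() if x <= a)

print(cdf(1.5))   # 0.5  (= p_X(0) + p_X(1))
print(cdf(-1.0))  # 0.0
print(cdf(3.0))   # 1.0
```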
4.1 Classes of discrete random variables
Discrete random variables can be classified, based on their PMF, into different classes that are often related to specific situations.
Bernoulli
In many contexts we are interested in the occurrence of a certain event, called a success, or the non-occurrence thereof, called a failure, in any of a sequence of independent repetitions of a random experiment, where the probability of success is always the same. If we denote by $E$ the event of interest, the success probability is the parameter $p = P(E)$, which is constant over all independent repetitions of the experiment, called Bernoulli trials. Then $1 - p$ is called the failure probability.
Definition 4.7. A Bernoulli random variable $X$ with success probability $p$ is a random variable defined by
$$X(\omega) = \begin{cases} 1, & \omega \in E, \\ 0, & \omega \in E^c, \end{cases} \qquad \forall \omega \in \Omega,$$
for some event $E \in \mathcal{F}$ with probability $P(E) = p$.
Note that the probability mass function of a Bernoulli random variable $X$ with success probability $p$ is given by
$$p_X(x) = \begin{cases} 1 - p, & \text{if } x = 0, \\ p, & \text{if } x = 1, \\ 0, & \text{otherwise}, \end{cases} \qquad \forall x \in \mathbb{R}.$$
Examples of Bernoulli trials are independent tosses of a coin, where we observe whether the outcome is a head (success) or not (failure); repeated bets on “red” at roulette, where we observe whether we win (success) or not (failure); or random sampling with replacement from a population, where we observe whether the selected member possesses a specific attribute (success) or not (failure).
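A Bernoulli variable is also straightforward to simulate. The following sketch (mine; the value $p = 0.3$ is purely illustrative) draws trials and checks the empirical frequency of success against the PMF above:

```python
import random

random.seed(0)
p = 0.3  # illustrative success probability

# Simulate 100_000 Bernoulli trials: X = 1 with probability p, else 0.
trials = [1 if random.random() < p else 0 for _ in range(100_000)]

# The empirical frequency should be close to p_X(1) = p.
print(sum(trials) / len(trials))  # approximately 0.3
```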
Binomial
Binomial random variables arise when we are not just interested in the success/failure relative to some event $E \in \mathcal{F}$ in a single trial, but rather in the number of successes in a finite sequence of Bernoulli trials.
Definition 4.8. A Binomial random variable $X$ with parameters $n, p$ is a random variable that counts the number of successes, each with probability $p$, in a finite sequence of $n$ Bernoulli trials. We denote $X \sim \mathrm{Bin}(n, p)$.
Proposition 4.9. The probability mass function of a Binomial random variable $X$ with parameters $n, p$ is
$$p_X(x) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x}, & \text{if } x = 0, 1, \ldots, n, \\ 0, & \text{otherwise}, \end{cases} \qquad \forall x \in \mathbb{R}. \tag{4.1}$$
Proof. The sample space for a sequence of $n$ Bernoulli trials is
$$\Omega = \{(x_1, \ldots, x_n) : x_i \in \{0, 1\},\ i = 1, \ldots, n\},$$
where each element $x_i$ in any $n$-tuple representing a possible outcome has value 1 in case of a success and value 0 in case of a failure. Take now a non-negative integer $k$, $0 \leq k \leq n$, and consider the event of getting exactly $k$ successes in $n$ trials. This is written as
$$\{X = k\} = \{(x_1, \ldots, x_n) \in \Omega : |\{i : x_i = 1\}| = k\}.$$
Since the trials are independent, any single outcome in $\{X = k\}$ has probability $p^k (1-p)^{n-k}$. Since the cardinality of $\{X = k\}$ is the number of combinations of $k$ elements from a collection of $n$ and the single outcomes are mutually exclusive events, we get
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$$
For all other values of $k$, the probability that $X$ equals $k$ is null.
Example 4.10. Insurance companies compute their premiums based on many factors, among which are the mortality tables, containing the probabilities that people of a certain age will live another specified number of years. Assume that the probability that a person aged 20 will be alive at age 65 is 80%, and suppose that three people of age 20 are randomly selected. Compute the probability that exactly two, at most one, or at least one, respectively, of the three people will be alive at age 65.
We assume that the life of each of the three people is independent of the others, which means we have three Bernoulli trials. The event of interest is $E$ = “the selected person is still alive at age 65” and the success probability is $p = 0.8$. Then, using formula (4.1), the PMF of the variable $X$ counting the number of people among the three that are still alive at 65 is
$$p_X(x) = \binom{3}{x} (0.8)^x (0.2)^{3-x}, \qquad x = 0, 1, 2, 3,$$
and 0 otherwise. We get
$$P(X = 2) = p_X(2) = 38.4\%,$$
$$P(X \leq 1) = p_X(0) + p_X(1) = 10.4\%,$$
$$P(X \geq 1) = p_X(1) + p_X(2) + p_X(3) = 1 - p_X(0) = 99.2\%.$$
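These numbers are easy to reproduce numerically. Here is a short check (my own sketch, using scipy.stats.binom, which implements the PMF (4.1)):

```python
from scipy.stats import binom

n, p = 3, 0.8  # three people, survival probability 0.8

print(binom.pmf(2, n, p))  # P(X = 2)  = 0.384
print(binom.cdf(1, n, p))  # P(X <= 1) = 0.104
print(binom.sf(0, n, p))   # P(X >= 1) = 1 - p_X(0) = 0.992
```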
Hypergeometric
In the statistical estimation of population proportions and in quality control, the key role is played by hypergeometric random variables. Unlike binomial random variables, these are not related to a sequence of Bernoulli trials.
Definition 4.11. A hypergeometric random variable $X$ with parameters $N, n, p$ is a random variable that counts the number of elements having a specified attribute in a random sample of size $n$, taken without replacement from a population of size $N$, where $p$ is the proportion of members of the population possessing the attribute. We denote $X \sim H(N, n, p)$.
Proposition 4.12. The probability mass function of a hypergeometric random variable $X$ with parameters $N, n, p$ is given by
$$p_X(x) = \frac{\binom{Np}{x} \binom{N(1-p)}{n-x}}{\binom{N}{n}}, \qquad \max\{0,\, n - N(1-p)\} \leq x \leq \min\{n,\, Np\}, \tag{4.2}$$
and $p_X(x) = 0$ otherwise.
Proof. Since the process of sampling without replacement does not consist of independent selections, we don't have Bernoulli trials. However, since each member of the population is equally likely to be selected, we have an equal-likelihood model, where the probability of any event $E$ can be computed as the ratio of the number of outcomes in the event over the number of all possible outcomes. The cardinality of the sample space, i.e. the number of all possible samples of size $n$ without replacement, is $|\Omega| = \binom{N}{n}$. In order to compute the PMF of $X$, we only have to consider the events of the kind $\{X = k\}$ where $\max\{0,\, n - N(1-p)\} \leq k \leq \min\{n,\, Np\}$, since for any other value of $k$ we have $\{X = k\} = \emptyset$. Indeed, the number of selected members having the specified attribute cannot be less than the difference between the sample size and the number of members of the whole population that don't have the attribute, and it cannot be bigger than the number of members of the whole population that have the attribute. Then, in order to compute $p_X(k) = P(X = k)$, we have to compute the cardinality of $\{X = k\}$. To this aim, we can divide the population into two groups, one made up of the members possessing the attribute and one made up of the remaining ones. Outcomes in $\{X = k\}$ are samples where $k$ elements are selected from the first group, for which we have $\binom{Np}{k}$ possibilities, and $n - k$ are selected from the second group, for which we have $\binom{N(1-p)}{n-k}$ possibilities. Thus, by the BCR,
$$|\{X = k\}| = \binom{Np}{k} \binom{N(1-p)}{n-k}.$$
This implies that
$$P(X = k) = \frac{|\{X = k\}|}{|\Omega|} = \frac{\binom{Np}{k} \binom{N(1-p)}{n-k}}{\binom{N}{n}},$$
which ends the proof.
Example 4.13 (Statistical quality control). Consider a quality assurance engineer who has to inspect the finished products of a company manufacturing TVs. In particular, he has to select at random 5 TVs from each lot of 100, inspect them thoroughly and report the number of defective ones. Let $X$ denote the number of defective TVs in the sample drawn from a lot of size 100. Assume that 6 items are actually defective in that lot and compute the probability that the selected sample contains $k$ defective items, for $k = 0, 1, 2, 3, 4, 5$.
The variable $X$ defined here is a hypergeometric random variable with parameters $N = 100$, $n = 5$, $p = 6/100 = 0.06$. Therefore, we have
$$P(X = k) = \frac{\binom{100 \cdot 0.06}{k} \binom{100 \cdot 0.94}{5-k}}{\binom{100}{5}} = \frac{\binom{6}{k} \binom{94}{5-k}}{\binom{100}{5}}.$$
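For a numerical check (my sketch; note that scipy.stats.hypergeom uses the parameter order M = population size, n = number of marked items, N = sample size, which differs from the $N, n, p$ convention above):

```python
from scipy.stats import hypergeom

M, marked, sample = 100, 6, 5  # lot size, defective items, sample size

for k in range(6):
    # P(X = k) = C(6, k) * C(94, 5 - k) / C(100, 5)
    print(k, hypergeom.pmf(k, M, marked, sample))
```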
Poisson
Poisson random variables appear very frequently in applications and, together with binomial random variables, are the most important ones in the discrete framework.
Definition 4.14. A discrete random variable $X$ is a Poisson random variable with parameter $\lambda > 0$ if its probability mass function is given by
$$p_X(x) = \begin{cases} e^{-\lambda} \dfrac{\lambda^x}{x!}, & \text{if } x = 0, 1, \ldots, \\ 0, & \text{otherwise}, \end{cases} \qquad \forall x \in \mathbb{R}. \tag{4.3}$$
Remark 4.15. The function defined in (4.3) is indeed a probability mass function.
This is important in order for Definition 4.14 to make sense. To prove it, it is enough to check that properties (i)-(iii) in Proposition 4.4 are satisfied. Properties (i) and (ii) are trivially satisfied by the definition in (4.3). Let us check that (iii) holds true. Note that for any $t \in \mathbb{R}$, the exponential of $t$ can be written as
$$e^t = \sum_{k=0}^{\infty} \frac{t^k}{k!}.$$
We have:
$$\sum_{x \in \mathbb{R}} p_X(x) = \sum_{x=0}^{\infty} p_X(x) = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1.$$
So all properties in Proposition 4.4 are satisfied and we have a PMF.
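Property (iii) can also be seen numerically. Here is a minimal sketch (mine; the value $\lambda = 2.5$ is purely illustrative) showing that the partial sums of the Poisson PMF approach 1:

```python
import math

lam = 2.5  # illustrative value of the parameter lambda

# p_X(x) = exp(-lam) * lam**x / x!; the partial sums converge to 1.
total = sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(50))
print(total)  # 0.999999... (approaches 1 as more terms are added)
```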
Poisson random variables often arise in modeling the frequency of occurrence of a certain event during a specified period of time.
Geometric
Geometric random variables are also related to a sequence of Bernoulli trials, but instead of counting the number of successes in a finite sequence, as in the case of binomial variables, they count the number of trials performed up to the first success.
Definition 4.16. A geometric random variable $X$ with parameter $p$ is a random variable that counts the number of Bernoulli trials with success probability $p$ up to, and including, the first success. We denote $X \sim G(p)$.
Proposition 4.17. The probability mass function of a geometric random variable $X$ with parameter $p$ is given by
$$p_X(x) = \begin{cases} p(1-p)^{x-1}, & \text{if } x = 1, 2, \ldots, \\ 0, & \text{otherwise}, \end{cases} \qquad \forall x \in \mathbb{R}. \tag{4.4}$$
Proof. Let us consider a positive integer $k$. The event $\{X = k\}$ = “the first success is at the $k$-th trial” occurs if and only if the first $k - 1$ trials result in failures and the $k$-th trial results in a success. Thus, denoting by $E_i$ the event that the $i$-th trial results in a success, for all $i \in \mathbb{N}$, we can rewrite
$$\{X = k\} = \bigg(\bigcap_{i=1}^{k-1} E_i^c\bigg) \cap E_k.$$
Then, since the trials are all independent, and so are all the events in the intersection, we get
$$p_X(k) = P(\{X = k\}) = \prod_{i=1}^{k-1} P(E_i^c) \cdot P(E_k) = \prod_{i=1}^{k-1} (1-p) \cdot p = (1-p)^{k-1} p.$$
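As a sanity check on (4.4), one can simulate the trial index of the first success and compare the empirical frequencies with $p(1-p)^{k-1}$. A minimal sketch (mine; $p = 0.3$ is illustrative):

```python
import random

random.seed(1)
p, runs = 0.3, 100_000

def first_success(p: float) -> int:
    """Number of Bernoulli(p) trials up to and including the first success."""
    k = 1
    while random.random() >= p:  # failure: keep trying
        k += 1
    return k

samples = [first_success(p) for _ in range(runs)]
for k in (1, 2, 3):
    # Empirical frequency vs. theoretical p * (1 - p)**(k - 1).
    print(k, samples.count(k) / runs, p * (1 - p) ** (k - 1))
```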
Geometric random variables have an interesting property which is due to the independence of Bernoulli trials. If we choose any trial in a sequence of Bernoulli trials and consider the subsequent sequence of trials, this behaves in the same way as the sequence starting from the beginning. Thus, if by the $n$-th trial no successes occur, the number of trials up to the first success starting from the $n$-th trial has the same distribution as the number of trials from the first one up to the first success. Furthermore, we can prove that geometric random variables are the only positive-integer-valued random variables possessing such a property.
Proposition 4.18. A discrete random variable $X$ taking values in $\mathbb{N} \cup \{0\}$ has the lack-of-memory property, i.e.
$$P(X = n + k \mid X > n) = P(X = k), \qquad \forall n, k \in \mathbb{N}, \tag{4.5}$$
if and only if it is a geometric random variable.
Proof. ($\Leftarrow$) If $X \sim G(p)$, then
$$\begin{aligned}
P(X = n + k \mid X > n) &= \frac{P(X = n + k,\ X > n)}{P(X > n)} = \frac{P(X = n + k)}{P(X > n)} \\
&= \frac{p(1-p)^{n+k-1}}{\sum_{i=n+1}^{\infty} p(1-p)^{i-1}} = \frac{p(1-p)^{n+k-1}}{p \sum_{i=n}^{\infty} (1-p)^{i}} \\
&= \frac{p(1-p)^{n+k-1}}{p \, \frac{(1-p)^n}{1-(1-p)}} = p(1-p)^{k-1} = P(X = k),
\end{aligned}$$
so the lack-of-memory property holds true.
($\Rightarrow$) If $X$ is a discrete random variable such that $P(X \in \mathbb{N}) = 1$ and having the lack-of-memory property, then we want to prove that it is geometrically distributed. Let us denote $p = P(X = 1)$, i.e. the probability of having a success on the first trial, hence on each trial, and $q_n = P(X > n)$. Applying the lack-of-memory property for $k = 1$, we get
$$P(X = n + 1 \mid X > n) = P(X = 1) = p,$$
where the left-hand side can be rewritten as
$$P(X = n + 1 \mid X > n) = \frac{P(X = n + 1)}{P(X > n)} = \frac{q_n - q_{n+1}}{q_n} = 1 - \frac{q_{n+1}}{q_n}.$$
Hence the equation $\frac{q_{n+1}}{q_n} = 1 - p$ holds true for all $n \in \mathbb{N}$, but it is also true for $n = 0$, since $q_0 = P(X > 0) = 1$ and $q_1 = P(X > 1) = 1 - P(X = 1) = 1 - p$. Then, taking the product for $n = 0$ to $n = k - 1$, we get
$$q_k = \prod_{n=0}^{k-1} \frac{q_{n+1}}{q_n} = (1-p)^k,$$
and consequently
$$p_X(k) = q_{k-1} - q_k = (1-p)^{k-1} - (1-p)^k = p(1-p)^{k-1}.$$
Therefore, $X \sim G(p)$.
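Finally, the lack-of-memory property (4.5) itself is easy to verify numerically. A minimal sketch (mine, using scipy.stats.geom, whose pmf matches (4.4); the values of $p$, $n$, $k$ are illustrative):

```python
from scipy.stats import geom

p, n, k = 0.3, 4, 2

# Conditional probability P(X = n + k | X > n) ...
lhs = geom.pmf(n + k, p) / geom.sf(n, p)
# ... should equal the unconditional P(X = k).
print(lhs, geom.pmf(k, p))  # both equal p * (1 - p)**(k - 1) = 0.21
```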