MTH 202 : Probability and Statistics
Lecture 5 - 8 : 15, 20, 21, 23 January, 2013
Random Variables
and their Probability Distributions
3.1 : Random Variables
Often, when we need to deal with the probability of certain events, we
have to carry out operations on the underlying sets, which can turn out
to be tricky. It is usually easier to work with real-valued functions
defined on the points of the sample space Ω. We call these functions
random variables.
Before we get into these, we need to know about a specific σ-field
built on the open intervals in R, known as the Borel σ-field.
Theorem 3.1.1 : Let X be a non-empty set and P(X) denote its
power set. For any collection C ⊆ P(X) of subsets of X, there exists a
smallest σ-field of subsets of X which contains all the subsets of
X from C.
Proof : There is at least one σ-field which contains C, namely
P(X). Now collect all such σ-fields : T := {U : U is a σ-field on X and C ⊆ U};
this collection is non-empty since P(X) is a member of it. The
intersection ∩_{U ∈ T} U fulfills the defining properties of a
σ-field (Verify !) and also contains C. It is the smallest such σ-field,
since it is contained in every member of T.
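On a finite set the generated σ-field can be computed by brute force, which may help in getting a feel for Theorem 3.1.1. The following is only a sketch under the assumption that X is finite (so closure under complements and pairwise unions is enough); the function name `generate_sigma_field` and the sample sets are made up for illustration.

```python
from itertools import combinations

def generate_sigma_field(universe, collection):
    """Smallest family containing `collection` that is closed under
    complement and union (enough for a sigma-field on a FINITE set)."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # close under complements
        for a in current:
            comp = universe - a
            if comp not in family:
                family.add(comp)
                changed = True
        # close under pairwise unions
        for a, b in combinations(current, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family

X = {1, 2, 3, 4}
C = [{1}, {1, 2}]
sigma = generate_sigma_field(X, C)
print(sorted(sorted(s) for s in sigma))
```

Here the generated family has 8 members, built from the "atoms" {1}, {2} and {3, 4}; the singleton {3} is not in it, so the result is strictly smaller than the power set.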
Note 3.1.2 : The σ-field constructed above is often denoted by ⟨C⟩ and
is called the σ-field generated by C. In practice it is hard to describe
the members of ⟨C⟩ even when the collection of subsets C ⊆ P(X) is given.
Hence for practical purposes, we will need a condition that lets us
avoid dealing with arbitrary sets from ⟨C⟩.
Definition 3.1.3 : Let C be the collection of all open intervals of the
form (a, b) where a, b ∈ R. Then the smallest σ-field ⟨C⟩ (in R)
generated by C, often denoted by B(R), is called the Borel σ-field. The
sets from B(R) are called Borel sets.
Exercise 3.1.4 : Show that B(R) is also generated by each of the
following collections :
(i){(−∞, x] : x ∈ R}, (ii){[x, ∞) : x ∈ R}, (iii){(−∞, x) : x ∈ R},
(iv){(x, ∞) : x ∈ R}, (v){[x, y] : x, y ∈ R}, (vi){[x, y) : x, y ∈ R},
(vii){(x, y] : x, y ∈ R}.
We are now ready to encounter random variables :
Definition 3.1.5 : Let (Ω, S, P ) be a probability space. A function
F : Ω → R is called a random variable (RV in short) if :
F⁻¹(B) = {ω ∈ Ω : F(ω) ∈ B} ∈ S
for every Borel set B ⊆ R.
It is customary to denote a random variable by X instead of the usual
function notation F . From the previous exercise we can deduce that :
Theorem 3.1.6 : Let (Ω, S, P ) be a probability space. Then X is a
random variable if and only if
X⁻¹((−∞, x]) = {ω ∈ Ω : X(ω) ≤ x} ∈ S
for all x ∈ R.
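The criterion of Theorem 3.1.6 can be tested mechanically on a small example. The sketch below assumes a finite Ω and a σ-field given explicitly as a set of frozensets; the helper `is_random_variable` and the two sample functions are hypothetical names chosen for illustration.

```python
def is_random_variable(omega, sigma_field, X):
    """Return True if {w : X(w) <= x} lies in sigma_field for every real x.

    On a finite omega the preimage only changes at the values X attains,
    so it is enough to test those thresholds (below the minimum the
    preimage is empty, and the empty set is in every sigma-field).
    """
    for x in sorted({X(w) for w in omega}):
        preimage = frozenset(w for w in omega if X(w) <= x)
        if preimage not in sigma_field:
            return False
    return True

omega = {1, 2, 3, 4}
# A sigma-field that is strictly smaller than the power set of omega.
S = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega)}

coarse = lambda w: 0 if w <= 2 else 1   # constant on {1, 2} and on {3, 4}
identity = lambda w: w                  # separates every point

print(is_random_variable(omega, S, coarse))
print(is_random_variable(omega, S, identity))
```

The function `coarse` passes the test, while `identity` fails because {ω : ω ≤ 1} = {1} is not a member of S. With S equal to the full power set, every function on Ω would pass.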
Example 3.1.7 : Suppose that we toss a coin thrice and count the
number of heads that turn up in each outcome. Here the sample space Ω
is {abc : a, b, c ∈ {H, T}}. Let us define the probability on Ω by
P({abc}) := 1/8 for all abc ∈ Ω. Suppose we are interested in the event
A = {at least two H's} and want to calculate P(A).
Instead of talking about the set A, we can simply
introduce a random variable X : Ω → R by :
X(ω) := number of H's in ω
For example X(HTH) = 2 = X(HHT), X(HHH) = 3 etc. The
function X is a random variable simply because S is all of the power
set P(Ω).
Finally we write P(A) as P(X ≥ 2) and calculate that P(A) =
1/2 = P(X⁻¹([2, ∞))).
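The computation in Example 3.1.7 is small enough to enumerate directly. The following sketch lists the eight outcomes, defines X as the head count, and sums the probabilities of the outcomes with X ≥ 2; the helper name `prob` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Sample space: all strings abc with a, b, c in {H, T}
omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: Fraction(1, 8) for w in omega}

def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

def prob(event):
    """Probability of the set {w : event(w) is true}."""
    return sum(P[w] for w in omega if event(w))

p_at_least_two = prob(lambda w: X(w) >= 2)
print(p_at_least_two)  # 1/2
```

Exact fractions are used instead of floats so that the result can be compared with 1/2 without rounding issues.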
Exercise 3.1.8 : Let X be an RV. Is |X| also an RV? If X is an RV
that takes only nonnegative values, is √X also an RV?
Solution : Let U_x be the set
U_x := |X|⁻¹((−∞, x]) = {ω ∈ Ω : |X(ω)| ≤ x}
Then
\[ U_x = \begin{cases} \emptyset & \text{if } x < 0, \\ X^{-1}(\{0\}) & \text{if } x = 0, \\ X^{-1}([-x, x]) & \text{if } x > 0. \end{cases} \]
Clearly these sets are in S, since X is an RV. Hence |X| is an RV.
Next we recall that for x ≥ 0 in R the symbol √x denotes the
non-negative square root of x. Let V_x be the set
V_x := (√X)⁻¹((−∞, x]) = {ω ∈ Ω : √X(ω) ≤ x}
Then V_x = ∅ if x < 0. Now if x ≥ 0 we have
V_x = {ω : 0 ≤ √X(ω) ≤ x} = X⁻¹([0, x²])
which is in S since X is an RV. Hence √X is also an RV.
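Exercise 3.1.9 : Let Ω = [0, 1] and S be the Borel σ-field of subsets
of Ω. Define X : Ω → R by :
\[ X(\omega) = \begin{cases} \omega & \text{if } 0 \le \omega \le \tfrac{1}{2}, \\ \omega - \tfrac{1}{2} & \text{if } \tfrac{1}{2} < \omega \le 1. \end{cases} \]
Is X an RV? If so, what is the event {ω : X(ω) ∈ (1/4, 1/2)}?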
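Solution : We notice that
\[ X^{-1}((-\infty, x]) = \begin{cases} \emptyset & \text{if } x < 0, \\ [0, x] \cup (\tfrac{1}{2}, \tfrac{1}{2} + x] & \text{if } 0 \le x < \tfrac{1}{2}, \\ [0, 1] & \text{if } x \ge \tfrac{1}{2}. \end{cases} \]
Each of these sets is a Borel subset of [0, 1], so X is an RV. Moreover,
{ω : X(ω) ∈ (1/4, 1/2)} = (1/4, 1/2) ∪ (3/4, 1).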
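The event asked for in Exercise 3.1.9 can be sanity-checked numerically. The sketch below compares {ω : 1/4 < X(ω) < 1/2} with the candidate set (1/4, 1/2) ∪ (3/4, 1) on a grid of rational points in [0, 1]. A grid check is not a proof, and the grid size is an arbitrary choice.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
QUARTER = Fraction(1, 4)

def X(w):
    """The function from Exercise 3.1.9 on omega = [0, 1]."""
    return w if w <= HALF else w - HALF

def in_event(w):
    """Membership in {w : X(w) in (1/4, 1/2)}."""
    return QUARTER < X(w) < HALF

def in_candidate(w):
    """Membership in (1/4, 1/2) union (3/4, 1)."""
    return QUARTER < w < HALF or Fraction(3, 4) < w < 1

grid = [Fraction(k, 1000) for k in range(1001)]
mismatches = [w for w in grid if in_event(w) != in_candidate(w)]
print(len(mismatches))  # 0
```

The grid contains the endpoints 1/4, 1/2, 3/4 and 1, so the open/closed behaviour at the boundaries is exercised as well.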
3.2 : Probability distribution of a Random Variable
Theorem 3.2.1 : The RV X defined on a probability space (Ω, S, P )
induces a probability space (R, B(R), Q) defined by
Q(B) := P(X⁻¹(B)) = P({ω : X(ω) ∈ B})
for all B ∈ B(R).
Proof : Ref. Pg 43, Sec. 2.3, Theorem 1 [RS].
Before we speak about the probability distribution, we would first define the idea of a distribution function in general.
Definition 3.2.2 : A function F : R → R is called a distribution
function if :
(i) x < y implies F (x) ≤ F (y) for all x, y ∈ R (non-decreasing),
(ii) lim_{x→a+} F(x) = F(a) for all a ∈ R (right continuous),
(iii) F(−∞) = 0 (i.e., lim_{x→−∞} F(x) = 0) and
F(+∞) = 1 (i.e., lim_{x→+∞} F(x) = 1)
Exercise 3.2.3 : Do the following functions define DF's?
(a)
\[ F(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x < \tfrac{1}{2}, \\ 1 & \text{if } x \ge \tfrac{1}{2} \end{cases} \]
(b)
\[ F(x) = \frac{1}{\pi} \tan^{-1} x \quad (x \in \mathbb{R}) \]
Solution : (a) The property (i) can be easily checked. Since the
function is defined piecewise by continuous functions, we only
need to verify (ii) at x = 0 and x = 1/2.
Now we see that lim_{x→0+} F(x) = lim_{x→0+} x = 0 = F(0). Similarly at
x = 1/2 we have lim_{x→1/2+} F(x) = 1 = F(1/2).
The third property is clear since F coincides with the constant functions
0 and 1 near −∞ and +∞ respectively. Hence F is a DF.
(b) The limit of F(x) at −∞ is lim_{x→−∞} (1/π) tan⁻¹ x = −1/2 ≠ 0.
Similarly, F(+∞) = 1/2. Hence this is not a distribution function.
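The limits used in Exercise 3.2.3 can be approximated numerically as a cross-check. The sketch below is only a rough probe: it evaluates each function far out in the tails and just to the right of a point, with the cut-offs `big` and `h` chosen arbitrarily, and the helper names are illustrative.

```python
import math

def F_a(x):
    """Candidate (a): 0 for x < 0, x on [0, 1/2), 1 for x >= 1/2."""
    if x < 0:
        return 0.0
    if x < 0.5:
        return x
    return 1.0

def F_b(x):
    """Candidate (b): (1/pi) * arctan(x)."""
    return math.atan(x) / math.pi

def tail_values(F, big=1e12):
    """Approximate (F(-infinity), F(+infinity)) by evaluating far out."""
    return F(-big), F(big)

def right_gap(F, a, h=1e-9):
    """|F(a + h) - F(a)|; a small value is consistent with right continuity."""
    return abs(F(a + h) - F(a))

print(tail_values(F_a))   # close to (0, 1)
print(tail_values(F_b))   # close to (-1/2, 1/2)
print(right_gap(F_a, 0.0), right_gap(F_a, 0.5))
```

For (a) the tail values match property (iii) and the right-hand gaps at 0 and 1/2 are negligible; for (b) the left tail is near −1/2, which is what rules it out as a DF.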
Theorem 3.2.4 : The set of points where a DF F is discontinuous is
at most countable.
Proof : Ref. Pg 44, Sec. 2.3, Theorem 2 [RS].
We would now define the DF of an RV.
Definition 3.2.5 : Let X be an RV defined on a probability space
(Ω, S, P ). The function F : R → R defined by
F(x) = Q((−∞, x]) = P({ω ∈ Ω : X(ω) ≤ x}) (x ∈ R)
is called the distribution function of the RV X.
The name "distribution function of an RV" is justified by the
following :
Theorem 3.2.6 : The function F defined as above is a DF.
Proof : Ref. Pg 45, Sec. 2.3, Theorem 3 [RS].
In fact every DF can be shown to be the DF of an RV on some probability
space. The proof of this will not be discussed in this course.
From now on we adopt the following notations :
P({ω ∈ Ω : X(ω) ≤ α}) is denoted by P(X ≤ α),
P({ω ∈ Ω : X(ω) < α}) is denoted by P(X < α) etc.
Exercise 3.2.7 : Does the following function define a DF? If so, find
P(−∞ < X < 2).
\[ F(x) = \begin{cases} 1 - e^{-x} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases} \]
Solution : F′(x) = e^{−x} > 0 for x > 0 shows that the function is
strictly increasing on the positive half of the real line. It is constant
on the negative side and 0 < 1 − e^{−x} for all x > 0. Hence F is
non-decreasing.
F(x) is continuous for x ≥ 0, which implies F is right continuous at
x = 0. Since F(0−) = 0 = F(0), F is in fact continuous at x = 0, and it
is clearly continuous at every other point.
Finally F is the constant function 0 on the negative side
of the real line, showing F(−∞) = 0. Since lim_{x→+∞} 1/e^x = 0, we have
that F(+∞) = 1. Thus F is a DF.
Since F is a continuous function, P(X = a) = 0 for all a ∈ R (Why?).
Hence
P(−∞ < X < 2) = P(−∞ < X ≤ 2) − P(X = 2)
= F(2) − 0 = 1 − e^{−2}
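For a numerical value of the answer to Exercise 3.2.7, the DF can be coded directly. This is a small sketch; the monotonicity check on a grid is only a spot check on arbitrarily chosen points, not a proof.

```python
import math

def F(x):
    """DF from Exercise 3.2.7: 1 - exp(-x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

# P(-inf < X < 2) = F(2), because P(X = 2) = 0 for a continuous DF
p = F(2.0)
print(round(p, 4))  # 0.8647

# Spot check of monotonicity on a grid from -5 to 5 in steps of 0.01
grid = [k / 100 for k in range(-500, 501)]
is_non_decreasing = all(F(a) <= F(b) for a, b in zip(grid, grid[1:]))
print(is_non_decreasing)
```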
3.3 : Discrete and Continuous Random Variables
There are essentially two distinct types of RV's that we will be dealing
with. We first discuss discrete RV's. Roughly
speaking, a discrete RV is one for which the complete probability
mass is concentrated at some discrete points (i.e., points which
are separated from each other by a certain positive distance). First, we
briefly recall the notion of a countable set.
Definition 3.3.1 : A set E is said to be countable if it is either finite,
or else there is a bijection f : N → E.
That the set E is finite means that if you, perhaps along with some
others, try to count the elements of E by the numbers 1, 2, 3, . . . ,
the count would theoretically stop at some point, no matter if the sun
is extinct by then, or the earth has been evacuated by the rest of the
humans, as long as no one has changed your interest in counting E.
Figure 1. WALL-E and EVA
On the other hand, a countably infinite set cannot be counted
in any finite amount of time. However, as in the previous case, while
counting by the numbers 1, 2, 3, . . . you also put a tag on the elements
by these numbers. Thus we call E countably infinite if
every element gets a number tag "n", however large it may be.
In the previous definition the bijection f ensures that the tags
"n" given to the elements "f(n)" are all distinct.
Definition 3.3.2 : An RV X defined on a probability space (Ω, S, P) is
said to be of discrete type (or simply discrete) if there is a countable
set E ⊆ R such that P(X ∈ E) = 1.
A relevant question at this point is whether countable sets in
R are Borel sets; otherwise it would be meaningless to talk about
P(X ∈ E) = P(X⁻¹(E)). First we note that every singleton set {x} in
R is a Borel set, by means of the infinite nested intersection :
\[ \{x\} = \bigcap_{n=1}^{\infty} \left( x - \frac{1}{n},\; x + \frac{1}{n} \right) \]
Thus every countable subset of R is a Borel set, since it is a countable
union (either finite or infinite) of singleton sets.
Now write E = {x_1, x_2, . . . }. If it is known that P(X = x_i) = p_i ≥ 0
for all x_i ∈ E, we have from the definition of probability that
\[ \sum_{n=1}^{\infty} p_n = 1 \]
(the sum being finite when E is finite).
Definition 3.3.3 : The collection of non-negative real numbers
{p_i}_{i=1}^∞ satisfying P(X = x_i) = p_i for all i ∈ N and
∑_{i=1}^∞ p_i = 1, is called the
probability mass function (PMF) of the RV X.
The DF F of X is given by :
\[ F(x) = P(X \le x) = \sum_{x_i \le x} p_i \quad (x \in \mathbb{R}) \]
The name "probability mass function" for the collection {p_i}_{i=1}^∞ of
non-negative real numbers may be misleading. In fact it can precisely
be written as a function p : R → R by
\[ p(x) = \begin{cases} p_k & \text{if } x = x_k \ (k = 1, 2, \dots), \\ 0 & \text{otherwise} \end{cases} \]
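To make the relation between a PMF and its DF concrete, here is a sketch for the head count in three fair coin tosses (Example 3.1.7). The PMF values 1/8, 3/8, 3/8, 1/8 are worked out from that example by counting outcomes; the names `pmf_table`, `p` and `F` are illustrative.

```python
from fractions import Fraction

# PMF of X = number of heads in three fair coin tosses
pmf_table = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

def p(x):
    """PMF as a function on R: p(x_k) = p_k, and 0 at every other point."""
    return pmf_table.get(x, Fraction(0))

def F(x):
    """DF of X: sum of p_i over all points x_i <= x."""
    return sum((pk for xk, pk in pmf_table.items() if xk <= x), Fraction(0))

print(sum(pmf_table.values()))    # 1
print(p(2), p(1.5))               # 3/8 0
print(F(-1), F(1.5), F(3))        # 0 1/2 1
```

The DF is a step function: it is constant between the points 0, 1, 2, 3 and jumps by p_k at x_k, so for instance P(X ≥ 2) = 1 − F(1) = 1/2, in agreement with Example 3.1.7.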
In general :
Definition 3.3.4 : Let {p_i}_{i=1}^∞ be a collection of non-negative real
numbers such that ∑_{i=1}^∞ p_i = 1. Then {p_i}_{i=1}^∞ is the PMF of
some RV X.
Exercise 3.3.5 : For what value of K does the following define the
probability mass function of some random variable :
f(x) = K/N (x = 1, 2, . . . , N)
Solution : We need ∑_{i=1}^N K/N = K = 1.
Next we would consider the RV’s associated to the DF’s which are of
continuous type.
Definition 3.3.6 : Let X be an RV defined on a probability space
(Ω, S, P) with DF F. Then X is said to be of continuous type if
there is an integrable function f : R → [0, ∞) such that
\[ F(x) = \int_{-\infty}^{x} f(t)\,dt \quad (x \in \mathbb{R}) \]
The function f is called the probability density function (PDF) of the
RV X.
Properties 3.3.7 : Let f be the PDF of the RV X on the probability
space (Ω, S, P). Then :
\[ \text{(i)} \ \int_{-\infty}^{\infty} f(t)\,dt = 1, \qquad \text{(ii)} \ P(a < X \le b) = \int_{a}^{b} f(t)\,dt \]
In general :
Theorem 3.3.8 : Every non-negative real function f that is integrable
over R and satisfies
\[ \int_{-\infty}^{\infty} f(t)\,dt = 1 \]
is the PDF of some continuous RV X.
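The two properties of a PDF can be checked numerically for a concrete density. The sketch below takes f(t) = e^{−t} for t ≥ 0 (the derivative of the DF in Exercise 3.2.7) and uses a simple midpoint rule; truncating the infinite integral at 50 and the number of subintervals are assumptions made for illustration only.

```python
import math

def f(t):
    """Density e^{-t} on [0, infinity), zero on the negative half line."""
    return math.exp(-t) if t >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# (i) total mass; the tail beyond 50 contributes about e^{-50}
total = integrate(f, 0.0, 50.0)

# (ii) P(1 < X <= 2) compared with F(2) - F(1) = e^{-1} - e^{-2}
prob_1_2 = integrate(f, 1.0, 2.0)
exact = math.exp(-1) - math.exp(-2)

print(total)
print(prob_1_2, exact)
```

Both printed probabilities should agree to many decimal places, and the total mass should be very close to 1.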
As a special note, we would address a few comments regarding continuity of the distribution function.
Theorem 3.3.10 : Let F be the distribution function corresponding
to an RV X over the probability space (Ω, S, P ).
If F is continuous at x = a, then P (X = a) = 0. Otherwise
P (X = a) = F (a) − F (a−) > 0
Proof : Consider the sequence of events
E_n := {ω ∈ Ω : a − 1/n < X(ω) ≤ a} = X⁻¹((a − 1/n, a])
Since (a − 1/n, a] is a Borel set, E_n ∈ S for all n ∈ N. We see that
E_1 ⊇ E_2 ⊇ E_3 ⊇ . . .
i.e., the sequence {E_n}_{n=1}^∞ is decreasing, and we have
\[ \bigcap_{n=1}^{\infty} E_n = X^{-1}\left( \bigcap_{n=1}^{\infty} \left( a - \frac{1}{n},\, a \right] \right) = X^{-1}(\{a\}) \]
Since {E_n}_{n=1}^∞ is decreasing we have (See corollary to Thm.6, Pg-13,
[RS])
\[ \lim_{n \to \infty} P(E_n) = P\left( \bigcap_{n=1}^{\infty} E_n \right) = P(X^{-1}(\{a\})) = P(X = a) \]
But P(E_n) = P(a − 1/n < X ≤ a) = F(a) − F(a − 1/n). Hence
\[ \lim_{n \to \infty} P(E_n) = F(a) - \lim_{n \to \infty} F\left( a - \frac{1}{n} \right) = F(a) - F(a-) \]
and therefore P(X = a) = F(a) − F(a−).
Now if F is continuous at x = a, it is left continuous there as well.
Hence F(a) = F(a−) and P(X = a) = 0.
Next suppose F is not continuous at x = a. Since F is right continuous
at a, it must fail to be left continuous there, i.e., F(a−) ≠ F(a).
Since F is non-decreasing, F(a − 1/n) ≤ F(a) for all n ∈ N, so the
limit F(a−) exists and F(a−) ≤ F(a). Hence F(a) − F(a−) > 0.
We finally note that if X is of continuous type, then F is
absolutely continuous, a notion which is much stronger than continuity;
in particular F has a derivative almost everywhere.
For details, you may consult Chap-5, Section 4, Cor. 12, [ROY]. In
short, we have the following conclusion :
Corollary 3.3.11 : Let F be the distribution function corresponding
to an RV X of continuous type over the probability space (Ω, S, P ).
Then P (X = a) = 0 for all a ∈ R. In particular, F is a continuous
function.
Moreover, there are RV’s whose types are neither continuous, nor discrete. Hence the DF’s for these would not be absolutely continuous.
However, these might have the corresponding density (or probability)
function which would be a little tricky to describe. For example :
Example 3.3.12 : Is the following function a DF? If so, find the
corresponding density or probability function :
\[ F(x) = \begin{cases} 0 & \text{if } x < 1, \\ \dfrac{(x-1)^2}{8} & \text{if } 1 \le x < 3, \\ 1 & \text{if } x \ge 3 \end{cases} \]
Proof : Outside the interval [1, 3), the function F is piecewise constant.
In the open interval (1, 3), we have F′(x) = (x − 1)/4 > 0. Moreover
F(1) = 0 and F(3−) = 1/2 ≤ 1 = F(3). Hence F is non-decreasing.
Clearly, F(−∞) = 0, F(+∞) = 1. Finally, F is defined
piecewise, on intervals closed on the left, by continuous functions,
implying F is right continuous. Therefore, F is a DF. The corresponding
density function f is given by
\[ f(x) = \begin{cases} 0 & \text{if } x < 1, \\ \dfrac{x-1}{4} & \text{if } 1 \le x < 3, \\ 0 & \text{if } x \ge 3 \end{cases} \]
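We note that F is not continuous at x = 3. In fact,
P(X = 3) = F(3) − F(3−) = 1 − 1/2 = 1/2 > 0.
Thus f accounts only for the probability 1/2 spread over [1, 3), while
the remaining probability 1/2 is concentrated at the point x = 3.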
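The split of the probability in Example 3.3.12 between a continuous part and a point mass can be confirmed numerically. This is a sketch only: the left limit F(3−) is approximated by evaluating F slightly to the left of 3 (the step `h` is an arbitrary choice), and the integral of f over [1, 3) uses a midpoint rule.

```python
def F(x):
    """DF from Example 3.3.12."""
    if x < 1:
        return 0.0
    if x < 3:
        return (x - 1) ** 2 / 8
    return 1.0

def f(x):
    """Density of the continuous part: (x - 1)/4 on [1, 3), else 0."""
    return (x - 1) / 4 if 1 <= x < 3 else 0.0

def left_limit(G, a, h=1e-9):
    """Crude approximation of G(a-) by evaluating just left of a."""
    return G(a - h)

# Point mass at 3: P(X = 3) = F(3) - F(3-)
jump_at_3 = F(3) - left_limit(F, 3)

# Mass of the continuous part: integral of f over [1, 3), midpoint rule
n = 100_000
h = 2 / n
continuous_mass = h * sum(f(1 + (i + 0.5) * h) for i in range(n))

print(jump_at_3, continuous_mass, jump_at_3 + continuous_mass)
```

Both the jump and the continuous mass come out close to 1/2, so together they account for the whole probability 1.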
References :
[ROY] Real Analysis, H.L. Royden, 3rd Edition, Macmillan Publishing
Co.
[RS] An Introduction to Probability and Statistics, V.K. Rohatgi and
A.K. Saleh, Second Edition, Wiley Students Edition.