PROBABILITY FALL 2014 - CLASS NOTES
3. SEPTEMBER 16
We begin with an example.
Example 3.1. As an example of a random experiment with sample space the interval [0, 2π), we described a spinner on September 11. It is reasonable to assume that the probability of the needle ending up between the angles a and b is proportional to the normalized (by 2π) length of the interval, i.e. (b − a)/2π. We verified this experimentally using the random generator in MATLAB.
We now want to look at the distribution of the sum of two such uniformly distributed random numbers in [0, 1) (we renormalize for convenience); call it X.
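As a quick check, here is a minimal simulation sketch (in Python with NumPy rather than the MATLAB session from class; the sample size and seed are our arbitrary choices): it draws many pairs of independent uniform numbers in [0, 1) and compares the histogram of their sums with the triangular density that Example 3.9 below asks you to derive.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.random(n) + rng.random(n)                    # sums of two uniforms in [0, 1)
counts, edges = np.histogram(x, bins=40, range=(0.0, 2.0), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
triangle = np.where(mids < 1.0, mids, 2.0 - mids)    # triangular density on [0, 2]
print(np.max(np.abs(counts - triangle)))             # small for large n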
3.1. Probability distribution functions on R, R^n. We want to be able to describe random experiments whose natural sample space Ω is a subset of the real line R or of the Euclidean space R^n. We get a simplified theory if we restrict Ω to be of the following type:
· Ω ⊂ R is a finite or countable union of (closed, half-open, open) possibly unbounded intervals {I_n};
· Ω ⊂ R^n is a countable union of products of intervals as above;
· Ω ⊂ R^n is a domain for which the Riemann integral ∫_Ω · dx_1 · · · dx_n makes sense;
we will refer to such an Ω by the phrase sample space (where we mean in fact admissible sample space).
Definition 3.2. Let Ω ⊂ R^n be a sample space. A function f : Ω → R is a probability distribution function on Ω if
· f(x) ≥ 0 for all x ∈ Ω;
· f is Riemann integrable on Ω and ∫_Ω f(x) dx = 1.⁵
Note that we can always think of f as being defined on R^n by setting f = 0 on R^n \ Ω. If we do so, then ∫_{R^n} f = 1.
Definition 3.3. We say that X : R → R is a random variable with probability distribution function f = f_X if
P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.
The function
F_X(x) = ∫_{−∞}^{x} f(t) dt = P(X ≤ x)
is called the cumulative distribution function of X.
⁵ We recall that, for instance, when Ω ⊂ R, Ω = ∪_n I_n as above, so that
∫_Ω f(x) dx = Σ_n ∫_{I_n} f(x) dx.
Lemma 3.4. The cumulative distribution function F_X of f is a nondecreasing, absolutely continuous function such that F_X′ = f at all points where F_X is differentiable, and
lim_{x→−∞} F_X(x) = 0,    lim_{x→∞} F_X(x) = 1.
Proof. That F_X is nondecreasing is immediate from f being nonnegative. That F_X is absolutely continuous and F_X′ = f wherever F_X is differentiable is immediate from the fundamental theorem of calculus for Riemann-integrable functions. Finally, the last two properties are obvious respectively from the Riemann integrability of f and from the fact that ∫_R f = 1. □
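To make the lemma concrete, here is a minimal numerical sketch (ours, not from the notes; the density f(x) = 2x on [0, 1] is an arbitrary choice): build F_X by numerical integration and check that it is nondecreasing, runs from 0 to 1, and differentiates back to f.

import numpy as np

x = np.linspace(0.0, 1.0, 1001)
f = 2.0 * x                                  # a probability distribution function on Ω = [0, 1]
F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))  # trapezoid rule
print(np.all(np.diff(F) >= 0))               # nondecreasing: True
print(F[0], F[-1])                           # limits: 0.0 and (approximately) 1.0
print(np.max(np.abs(np.gradient(F, x) - f))) # F_X' ≈ f up to discretization error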
Example 3.5. Let X be a random variable which is uniformly distributed on the interval [a, b], with a < b being real numbers. Intuitively, this means that
P(X ≤ x) =
    0,                 x ≤ a,
    (x − a)/(b − a),   a ≤ x < b,
    1,                 x ≥ b;
this means that the function
f(x) =
    1/(b − a),   a < x < b,
    0,           x ≤ a or x ≥ b
is a probability distribution function for X.
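A minimal simulation sketch of this example (ours; the endpoints a = 2, b = 5 and the sample size are arbitrary choices): sample X uniformly on [a, b] and compare the empirical value of P(X ≤ x) with (x − a)/(b − a).

import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 5.0                        # arbitrary endpoints with a < b
samples = rng.uniform(a, b, size=200_000)
for x in (2.5, 3.0, 4.5):
    empirical = np.mean(samples <= x)  # empirical P(X <= x)
    exact = (x - a) / (b - a)
    print(x, empirical, exact)         # empirical ≈ exact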
The R^n case of Definition 3.3 is as follows.
Definition 3.6. We say that X = (X_1, . . . , X_n) : R^n → R^n is a random variable with probability distribution function f if
P(X_1 ≤ x_1, . . . , X_n ≤ x_n) = ∫_{−∞}^{x_1} · · · ∫_{−∞}^{x_n} f(t_1, . . . , t_n) dt_1 · · · dt_n.
The function
F_X(x_1, . . . , x_n) = ∫_{−∞}^{x_1} · · · ∫_{−∞}^{x_n} f(t_1, . . . , t_n) dt_1 · · · dt_n
is called the cumulative distribution function of X.
Example 3.7. Let X = (X_1, X_2) be a random variable describing the landing position of a dart thrown at a target Ω which is a disc of radius R > 0, in cartesian coordinates centered at the center of the target. Assuming that the landing position is uniformly distributed on the disc, a probability distribution function is given by
f(x_1, x_2) =
    1/(πR^2),   x_1^2 + x_2^2 < R^2,
    0,          otherwise.
So in particular, for instance,
P(X_1 ≤ 0, X_2 ≤ 0) = ∫_{−∞}^{0} ∫_{−∞}^{0} f(x_1, x_2) dx_1 dx_2 = ∫∫_{x_1^2 + x_2^2 < R^2, x_1 ≤ 0, x_2 ≤ 0} 1/(πR^2) dx_1 dx_2 = 1/4.
Our theory will later justify that for any E ⊂ Ω
P(X ∈ E) := ∫_E f(x_1, x_2) dx_1 dx_2 = |E|/(πR^2),
as we had postulated.
We can define a new random variable ρ := √(X_1^2 + X_2^2) as the distance from the landing to the center of the target. We have, using the above, that
P(ρ ≤ r) = P(X ∈ {x_1^2 + x_2^2 ≤ r^2}) =
    0,         r ≤ 0,
    r^2/R^2,   0 ≤ r ≤ R,
    1,         r > R.
Can we find the probability distribution function of ρ?
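A minimal simulation sketch of the dart example (ours; R = 1 and the sample size are arbitrary choices): draw points uniformly in the disc by rejection from the enclosing square, then compare the empirical quadrant probability with 1/4 and the empirical P(ρ ≤ r) with r^2/R^2.

import numpy as np

rng = np.random.default_rng(2)
R = 1.0
# Rejection sampling: uniform points in the square [-R, R]^2, keep those inside the disc.
pts = rng.uniform(-R, R, size=(500_000, 2))
pts = pts[np.sum(pts**2, axis=1) < R**2]
print(np.mean((pts[:, 0] <= 0) & (pts[:, 1] <= 0)))  # ≈ 1/4
rho = np.sqrt(np.sum(pts**2, axis=1))
for r in (0.3, 0.6, 0.9):
    print(r, np.mean(rho <= r), (r / R) ** 2)        # empirical ≈ r^2/R^2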
Example 3.8 (improper integrals). Let
f(x) =
    c/(x log^2(e/x)),   0 < x < 1,
    0,                  elsewhere.
Choose c such that f is a probability distribution function.
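A sketch of one way to attack this (assuming the density above was reconstructed correctly from the garbled source): substitute u = log(e/x), so that du = −dx/x and
∫_0^1 c dx/(x log^2(e/x)) = c ∫_1^∞ du/u^2 = c,
which shows that c = 1 makes f a probability distribution function.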
Example 3.9 (uniform probability). Let (X, Y) be uniformly distributed on the square [0, 1]^2. Find the cumulative distribution function and the probability distribution function of Z = X + Y.
Example 3.10 (uniform probability). The train to Boston leaves every hour, but Francesco has forgotten the exact time (i.e. each hour hr a train leaves at hr:mm and he does not know what mm is). He will show up at the train station between 5pm and 6pm. Let T be the time that Francesco will have to wait at the station. Assuming that both Francesco's arrival time and the train departure time 5:mm are uniformly distributed between 5pm and 6pm, calculate the cumulative distribution function of T.
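A minimal simulation sketch (ours; times measured in minutes after 5pm, with an arbitrary seed and sample size): if A is the arrival time and M the unknown train minute, the waiting time is T = (M − A) mod 60, and the empirical distribution of T can be compared with a candidate cumulative distribution function.

import numpy as np

rng = np.random.default_rng(3)
n = 200_000
arrival = rng.uniform(0, 60, n)        # Francesco's arrival, minutes after 5pm
train = rng.uniform(0, 60, n)          # the unknown departure minute mm
T = (train - arrival) % 60             # wait until the next hourly train
for t in (15, 30, 45):
    print(t, np.mean(T <= t))          # empirical P(T <= t)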
3.2. Probability measure associated to a distribution function. Given a random variable X : R → R with probability distribution function f = f_X, we would like to calculate P(X ∈ E) for as many sets E ⊂ R as possible: these sets will be our events.
Definition 3.11. Let B ⊂ R. Then B ∈ B(R) if either B or its complement can be written as the countable union of subintervals (a_n, b_n] with a_n < b_n real numbers.
Lemma 3.12. The collection B(R) is a σ-algebra on R and furthermore
· B(R) contains all intervals of the form (a, b] (with possibly a = −∞), [a, b], [a, b) (with possibly b = ∞), (a, b) (with possibly a = −∞, b = +∞);
· B(R) contains all the points, in the sense that {a} ∈ B(R) for all a ∈ R.
With the present definitions, see e.g. Definition 3.3, we can only compute P(X ∈ E) for E = (−∞, x]. With the following theorem, we extend the definition of P(·) to all sets of B(R).
Theorem 3.1. Let X : R → R be a random variable with probability distribution function f = f_X. There is a unique probability measure P_X on (R, B(R)) in the sense of Definition 2.13, given by P_X(B) = P(X ∈ B), such that for all a < b (with possibly a = −∞)
(3)    P(X ∈ (a, b]) = P(X ∈ (a, b)) = F_X(b) − F_X(a) = ∫_a^b f_X(x) dx.
In particular, the following properties hold true.
(1) if E ⊂ F ⊂ R are both in B(R), then P_X(E) ≤ P_X(F);
(2) P_X(R) = 1, and if Ω is the set where f is nonzero, P_X(B ∩ Ω) = P_X(B) for all B ∈ B(R);
(3) if {A_j} is a collection of pairwise disjoint sets of B(R), meaning that A_j ∩ A_k = ∅ unless k = j, then
P_X(∪_{j=1}^∞ A_j) = Σ_{j=1}^∞ P_X(A_j);
(4) P_X({x}) = P(X = x) = 0 for all x ∈ R;
(5) P_X(A) + P_X(Ã) = 1, Ã being the complement of A.
We will not see a complete proof, but the idea is to define P_X on countable unions of disjoint left-open intervals by
P_X(∪_{j=1}^∞ (a_j, b_j]) := Σ_{j=1}^∞ (F_X(b_j) − F_X(a_j))
and then extend this to any B ∈ B(R) by postulating that the countable additivity property (3) in the Theorem holds.
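To illustrate the construction, here is a minimal sketch (ours): a helper that evaluates P_X on a finite union of disjoint intervals (a_j, b_j] directly from the cumulative distribution function, shown here with F_X of the uniform distribution on [0, 1] from Example 3.5.

def measure_of_union(F, intervals):
    # P_X of a finite union of disjoint intervals (a, b], per the formula above.
    return sum(F(b) - F(a) for a, b in intervals)

def F_uniform01(x):
    # Cumulative distribution function of the uniform distribution on [0, 1].
    return min(max(x, 0.0), 1.0)

# (0, 1/4] ∪ (1/2, 3/4] should have probability 1/2 under the uniform law.
print(measure_of_union(F_uniform01, [(0.0, 0.25), (0.5, 0.75)]))  # 0.5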
3.3. Further examples.
Example 3.13 (exponential distribution). Let T be the random variable describing the (random) time between two consecutive breakdowns of a certain machine which is assumed to be wear-free, in the following sense: if we set the origin of time at the last breakdown, the probability of having a breakdown in the time interval [t, t + s), given that the machine is still working at time t > 0, is the same as the probability of having one in [0, s). By this assumption, we can determine the cumulative distribution function of T up to some parameter λ > 0. Indeed, we only consider positive times, so P(T ≤ 0) = 0. If t > 0 and s > 0, a moment's thought leads to P(T > t + s) = P(T > t)P(T > s) and thus, setting G(t) = P(T > t) = 1 − P(T ≤ t) = 1 − F_T(t), we have that
G(t + s) = G(t)G(s) for all t, s > 0.
It is clear that G(t) = e^{−λt} satisfies the above equation for all s, t. Some work shows that these are the only continuous solutions to the above equation (the proof is that H = ln G satisfies H(t + s) = H(t) + H(s), and such a function, if continuous, must be linear). Moreover, λ > 0 is needed for G to go to zero at infinity. Therefore, we have found
F_T(t) =
    0,              t ≤ 0,
    1 − e^{−λt},    t > 0.
Note that λ > 0 is the reciprocal of the expected time between occurrences.
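A minimal simulation sketch of the memoryless property (ours; λ = 2, t = 0.4, s = 0.7 are arbitrary choices): sample T by inverting F_T, i.e. T = −log(1 − U)/λ with U uniform on [0, 1), and compare P(T > t + s | T > t) with P(T > s).

import numpy as np

rng = np.random.default_rng(4)
lam = 2.0
u = rng.random(500_000)
T = -np.log(1.0 - u) / lam                  # inverse-CDF sampling of F_T
t, s = 0.4, 0.7
lhs = np.mean(T > t + s) / np.mean(T > t)   # empirical P(T > t+s | T > t)
rhs = np.mean(T > s)                        # empirical P(T > s)
print(lhs, rhs, np.exp(-lam * s))           # all approximately equal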