Random Variables
Math 146
1 Introduction
Recall that we introduced random variables as functions defined on a sample space. In other words, we have a sample space S, and functions (often denoted by X, Y, Z, ...) which take specific values depending on the outcome of our observations. The point is that we will mostly deal with events defined by values taken by one or more random variables, as in

{X = x}, {a ≤ X ≤ b}, {X = x, a ≤ Y ≤ b}

etc. We'll sometimes write r.v. for random variable, for brevity.
The collection of probabilities

$$ P[X = x] = p_X(x) \qquad (1) $$

(for all possible values of x) is called the (probability) distribution of the random variable X. When we consider more than one random variable, we may speak of the collection

$$ P[X = x, Y = y] = p_{X,Y}(x, y) $$

(for all possible values of x and y) as the joint distribution of X and Y.
Since a full description of the distribution is sometimes difficult to obtain, and at other times is not needed, we often use some parameters of the distribution. Specifically, we will often be interested in

• The mean (or expected value) of the r.v. X:

$$ EX = \sum_x p_X(x)\, x \qquad (2) $$

where the sum is over all possible values of x.

• The mean (or expected value) of some interesting function of X: sometimes we consider a function of X, f(X) (common examples are X², X³, ..., e^{kX}, and so on). It is a random variable too, and with a little reflection it is not difficult to show that

$$ Ef(X) = \sum_x p_X(x)\, f(x) \qquad (3) $$
• As special cases of the previous definition, we have the absolute moments of the distribution:

$$ EX^k = \sum_x p_X(x)\, x^k \qquad (4) $$

for k = 2, 3, ... and the centered moments

$$ E(X - EX)^k = \sum_x p_X(x)\,(x - EX)^k = \sum_x p_X(x)\left(x - \sum_x p_X(x)\, x\right)^k \qquad (5) $$

• An especially commonly used centered moment is the one with k = 2, called the Variance of the r.v.:

$$ Var(X) = \sum_x p_X(x)\,(x - EX)^2 \qquad (6) $$
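To make these definitions concrete, here is a minimal numerical check of formulas (2), (4), and (6), added as an illustration (it is not part of the original notes); the fair six-sided die is just a convenient choice of pmf.

```python
# A minimal check of formulas (2), (4) and (6) for a fair six-sided die.
pmf = {x: 1/6 for x in range(1, 7)}                            # p_X(x) for x = 1, ..., 6

mean = sum(p * x for x, p in pmf.items())                      # (2): EX
second_moment = sum(p * x**2 for x, p in pmf.items())          # (4) with k = 2
variance = sum(p * (x - mean)**2 for x, p in pmf.items())      # (6): Var(X)

print(mean, second_moment, variance)   # 3.5, about 15.17, about 2.92
```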
Using the formulas above and a little algebra, it is not hard to show a few properties of the operation of taking the expected value. For example:

• E(aX + bY) = aEX + bEY (where a and b are two numbers)

• E(aX + bY)² = a²EX² + b²EY² + 2abE(XY)

• Var(X + Y) = Var(X) + Var(Y) + 2cov(X, Y), where cov(X, Y) = E[(X − EX)(Y − EY)] is usually called the covariance of X and Y
We'll see several cases where the whole point of our experiments will reduce to estimating the true value of the expected values and the variance of one or more random variables.
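These identities are easy to verify numerically. The sketch below is an added illustration (not part of the original notes): it simulates a pair of dependent random variables, with an arbitrary choice of X and Y, and compares both sides of the first and third bullets above.

```python
import random

random.seed(0)
n = 200_000
# Two dependent r.v.'s: X is a die roll, Y adds an independent coin flip to X.
xs = [random.randint(1, 6) for _ in range(n)]
ys = [x + random.randint(0, 1) for x in xs]

def mean(v): return sum(v) / len(v)
def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)
def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# E(aX + bY) = aEX + bEY, here with a = 2, b = 3
print(mean([2*x + 3*y for x, y in zip(xs, ys)]), 2*mean(xs) + 3*mean(ys))

# Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y)
print(var([x + y for x, y in zip(xs, ys)]), var(xs) + var(ys) + 2*cov(xs, ys))
```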
2 A Few Consequences
Concentrating for a moment on the first few moments, in particular the mean and the variance, there are a few consequences that we may want to draw.
2.1 Linear Transformations
Given a r.v. X we are sometimes interested in working with a new r.v. defined as aX + b, where a and b are two numbers. Note that

• X + b is the same r.v., but with its values shifted by b. Suppose, for example, that X represents the time until a certain event occurs (e.g., a bus arrives). To study it we need to decide a starting time, when it is that X = 0. X + b shifts the starting time to −b.

• aX is the same r.v., but using a different scale. Suppose we measure the time X mentioned above in hours. If we decide to change our units to minutes, all readings will be multiplied by 60, hence we will be considering the r.v. 60X.
A little algebra and the definitions show that

• E(aX + b) = aEX + b

• Var(aX + b) = a²Var(X)
In particular, a shift does not change the variance, since we are computing it with respect to EX, which is shifted by the same amount.
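The units example translates directly into a short check, added here as an illustration (the waiting-time pmf below is made up): rescaling from hours to minutes (a = 60) and shifting the clock (b) moves the mean to aEX + b, while the variance only picks up the factor a².

```python
# Check E(aX + b) = a*EX + b and Var(aX + b) = a^2 * Var(X)
# for a small, made-up waiting-time distribution (values in hours).
pmf = {0.5: 0.2, 1.0: 0.5, 2.0: 0.3}         # hypothetical p_X(x)

def E(f):                                    # expected value of f(X), formula (3)
    return sum(p * f(x) for x, p in pmf.items())

EX = E(lambda x: x)
VarX = E(lambda x: (x - EX) ** 2)

a, b = 60, 15                                # hours -> minutes, then shift the clock
EY = E(lambda x: a * x + b)
VarY = E(lambda x: (a * x + b - EY) ** 2)

print(EY, a * EX + b)                        # equal
print(VarY, a * a * VarX)                    # equal: the shift b drops out
```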
2.2 Using Moments to Get Estimates
A simple (and very rough, because of its vast generality) estimate illustrates one use of moments. Consider E|X|^k for some k (we use absolute values, so as to deal only with non-negative values, whether k is even or odd). Applying the formula we saw, we have

$$ E|X|^k = \sum_x p_X(x)\, |x|^k \qquad (7) $$

Now, suppose we are interested in the probability that |X| exceeds a certain value, P[|X| > M]. To get a rough handle on it, we can split the sum in (7) in two parts: for |x| < M, and |x| ≥ M:

$$ \sum_x p_X(x)\,|x|^k = \sum_{|x|<M} p_X(x)\,|x|^k + \sum_{|x|\ge M} p_X(x)\,|x|^k $$

Now, if |x| ≥ M, we lower the value of the sum if we write M in place of x:

$$ \sum_{|x|\ge M} p_X(x)\,|x|^k \ge \sum_{|x|\ge M} p_X(x)\, M^k = M^k \sum_{|x|\ge M} p_X(x) = M^k\, P[|X| \ge M] $$

Also, no matter what, we'll have that $\sum_{|x|<M} p_X(x)\,|x|^k \ge 0$ (that's a quite rough estimate, but we are assuming almost nothing on p_X, so we can only apply very rough information). Combining the two,

$$ E|X|^k \ge M^k\, P[|X| \ge M], \qquad \text{that is,} \qquad P[|X| \ge M] \le \frac{E|X|^k}{M^k} $$

This is known as Markov's Inequality. In particular, consider a r.v. Y and define X = Y − EY. Then EX² = Var(Y), and Markov's inequality, for k = 2, becomes

$$ P[|Y - EY| \ge M] \le \frac{Var(Y)}{M^2} $$

This is known as Chebyshev's Inequality. Hence, knowing the variance of a r.v. allows us a worst-case estimate of the probability of ending up far from the mean.
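As a quick sanity check of Chebyshev's inequality (an added sketch, not from the original notes), we can compare the exact probability P[|Y − EY| ≥ M] with the bound Var(Y)/M² for a simple distribution; the fair die below is an arbitrary choice.

```python
# Compare P[|Y - EY| >= M] with Chebyshev's bound Var(Y) / M^2 for a fair die.
pmf = {y: 1/6 for y in range(1, 7)}

EY = sum(p * y for y, p in pmf.items())
VarY = sum(p * (y - EY) ** 2 for y, p in pmf.items())

for M in (1, 2, 3):
    exact = sum(p for y, p in pmf.items() if abs(y - EY) >= M)
    bound = VarY / M ** 2
    print(M, exact, bound)    # the exact probability never exceeds the bound
```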
2.3 What is Expected in the Expected Value?
Actually, nothing is expected. EX is not (necessarily) the most likely outcome, and, quite often, it is not even a value that X will ever take (think of X, equal to 0 or 1, each with probability 1/2: the expected value is 0 · 1/2 + 1 · 1/2 = 1/2, which is not a value X can take).

The significance of EX, in practical terms, is given by the (mathematical) Law of Large Numbers, which we will discuss when turning to statistical applications.
In a hand-waving way, it is a good approximation to the average of very many independent observations of X. For example, suppose you play a lottery with probability of winning p (say, p = 10⁻⁵). If a win will gain you $100, we can represent a win as a random variable X such that P[X = 100] = 10⁻⁵, and P[X = 0] = 1 − 10⁻⁵. Hence,

$$ EX = 100 \cdot 10^{-5} = 10^{-3} $$

If n successive attempts at this lottery can be considered as independent, identically distributed copies of X, and n is sufficiently large, we may expect to end up with an average win (that is, total dollars won, divided by number of attempts) approximately equal to 10⁻³.
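The lottery lends itself to a small simulation, added here as an illustration (not part of the original notes): with win probability 10⁻⁵ and a $100 prize, the average win over many independent plays should drift toward EX = 10⁻³.

```python
import random

random.seed(1)
p, prize = 1e-5, 100
n = 10_000_000                       # number of independent plays (takes a few seconds)
wins = sum(1 for _ in range(n) if random.random() < p)

average_win = wins * prize / n       # total dollars won, divided by number of attempts
print(average_win)                   # typically close to EX = 0.001
```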
This does not mean that you can be pretty sure that after, say, 10,000 attempts, you will end up with $10! To make this clear, consider a simpler calculation: suppose you play a fair game of chance, one in which the probability of winning is 1/2, and you are looking at the number of wins in this game, over a large number of attempts. If the Law of Large Numbers applies, the percentage of wins will be close to 50%. This means that if you play N times, and you win n times, you would think that this implies n ≈ N/2, that is, n ≈ N − n, but that's not so! The statement above means, in precise language, that

$$ \left| \frac{n}{N} - \frac{1}{2} \right| < \varepsilon \qquad (8) $$

for any ε, provided N is sufficiently large. If, for example, we had n = N/2 + √N, and if N > 1/ε², indeed (8) would hold very well:

$$ \left| \frac{n}{N} - \frac{1}{2} \right| = \frac{\frac{N}{2} + \sqrt{N}}{N} - \frac{1}{2} = \frac{\sqrt{N}}{N} = \frac{1}{\sqrt{N}} < \varepsilon $$

Consequently, for, say, N = 10⁶ (you play a million games), you would be winning

$$ \frac{10^6}{2} + 10^3 $$
times, and so the difference between your wins and those of your opponent would be

$$ \left( \frac{10^6}{2} + 10^3 \right) - \left( \frac{10^6}{2} - 10^3 \right) = 2 \cdot 10^3 $$

a relatively small number compared to 10⁶, but marking a significant difference in number of wins and, under this assumption, things would get worse and worse as the game proceeded!
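A short simulation, added as an illustration, makes the same point: in a fair game the fraction of wins settles near 1/2, yet the gap between wins and losses typically keeps wandering on the scale of √N.

```python
import random

random.seed(2)
wins = 0
for N in range(1, 1_000_001):
    wins += random.randint(0, 1)            # 1 = win, 0 = loss, each with probability 1/2
    if N in (10_000, 100_000, 1_000_000):
        losses = N - wins
        # fraction of wins is close to 0.5, but wins - losses drifts on the order of sqrt(N)
        print(N, wins / N, wins - losses)
```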
3 A Note on Continuous Random Variables
We may consider any r.v. as discrete, i.e., taking a number of values (maybe, theoretically, infinite) that we can list. In some cases, this is obvious. For instance, the toss of two dice results in 11 possible values for Z = X + Y, where X and Y are the points shown by the first and second die. Also, if we play a game repeatedly, and for a very long time, the attempt number of our first win, let's call it N, can take value 1, 2, 3, ... and so on, potentially without bound, if we keep losing and are very persistent.
However, in other cases this idea of listing all values is a bit of a stretch. Suppose we are measuring the time needed for a piece of equipment to fail, T. In principle, T can take any non-negative real number as a value. However, we may still treat it as a discrete r.v., if we take into account that our measurements will be inevitably limited in precision. Hence, if we can be accurate to the minute, and T is measured in hours, the only values we can observe will be 0, 1/60, 2/60, ..., 1, 61/60, .... Note that, in theory, this sequence can go on indefinitely.
Hence, formulas like (1), (2), (3), (4), (5), (6), and (7) can always be thought of as making sense. Of course, if we are dealing with a huge number of values, each with a very small probability, calculating these formulas can become exceedingly difficult. For this reason, mathematicians have developed a tool to evaluate these sums to great accuracy, without having to actually add all these many small terms. If you take a calculus class, you will learn how this problem of computing long sums of very small addends has been solved by the introduction of integrals. Since we will not really need to perform those sums (we'll rely on the work of others who bothered to do that), we won't go into this field. Of course, if you should pursue the study of statistics beyond introductory courses, you will definitely need to include calculus in your bag of tools!
The one thing that we need to remember is the following: suppose that we have a r.v. taking very many values, very close one to the next, each with very small probability. Draw a histogram of this distribution: the probability of the event a ≤ X ≤ b is given by the area below the histogram between the two values a and b.
Now, if we think of each step to be very small, almost invisible (and note how this can be done by selecting a suitable scale for the units used to measure our variable!), we may substitute a smooth curve for the ragged line we have above. And, just as with the ragged curve, the probability for X to be between two values will be the area under the curve, between these two values. More often than not this is not a simple calculation, but there are plenty of tables and computer programs that can do the work for us.
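One of those "computer programs" can be as simple as adding up the areas of many thin rectangles under the curve, which is exactly the sum-of-many-small-terms idea mentioned earlier. The sketch below is an added illustration (not part of the original notes) and uses the bell-shaped curve of the standard normal distribution, discussed in the next section, purely as an example.

```python
import math

def normal_density(x):
    # the familiar bell-shaped curve (standard normal density)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_between(a, b, steps=100_000):
    # P[a <= X <= b], approximated by the total area of many thin rectangles
    width = (b - a) / steps
    return sum(normal_density(a + (i + 0.5) * width) for i in range(steps)) * width

print(prob_between(-1, 1))    # about 0.6827
```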
4 A Few Examples
The book concentrates on two cases: the binomial distribution, as the example discrete distribution, and the so-called normal distribution, as the example continuous distribution. There are good reasons to concentrate on the latter (the main one being a deep theorem known as the Central Limit Theorem, which we will discuss when applying all this material to statistical problems), and less so for the former (that is, the binomial distribution is a useful model, but it is extremely far from the exclusive discrete model). Here are a few different examples, with a brief mention of where they may arise.
4.1 Discrete Distributions
4.1.1 The Geometric Distribution
Suppose an event may occur with probability p. If we repeat it over and over, in an independent way, the time of first occurrence will be given by

$$ P[N = k] = p\,(1 - p)^{k-1} $$

If you are curious, you may inquire about the first time the event will have occurred 2, 3, ..., m times. You will find the result in any book on probability, as the negative binomial distribution.
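A small simulation, added here as an illustration, can be checked against the formula P[N = k] = p(1 − p)^{k−1}; the win probability p = 0.3 below is arbitrary.

```python
import random

random.seed(3)
p, trials = 0.3, 100_000

def first_occurrence():
    k = 1
    while random.random() >= p:     # keep repeating until the event finally occurs
        k += 1
    return k

counts = {}
for _ in range(trials):
    k = first_occurrence()
    counts[k] = counts.get(k, 0) + 1

for k in range(1, 6):
    print(k, counts.get(k, 0) / trials, p * (1 - p) ** (k - 1))   # simulated vs. formula
```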
4.1.2 The Multinomial Distribution
The binomial distribution is nice, but it considers two outcomes only (say, win or lose). What if we have several different possible outcomes (for instance, we repeatedly toss a die, in an independent way; what is the probability of having a certain number of 1's, of 2's, ...)? The formula is a slight complication of the binomial formula: if you have possible outcomes a_1, a_2, ..., a_m, each with probability p_1, p_2, ..., p_m, and you repeat this experiment n times, you'll find that the probability of having n_1 times the outcome a_1, n_2 times the outcome a_2, and so on, will be given by the formula (note that, necessarily, n_1 + n_2 + ··· + n_m = n)

$$ \frac{n!}{n_1!\, n_2! \cdots n_m!}\; p_1^{n_1} p_2^{n_2} \cdots p_m^{n_m} $$

For m = 2 this is just the binomial formula.
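The multinomial formula is easy to evaluate directly. As an added illustration (not part of the original notes), the sketch below computes the probability that 12 rolls of a fair die show each face exactly twice.

```python
from math import factorial

def multinomial_probability(counts, probs):
    # n! / (n1! n2! ... nm!) * p1^n1 * p2^n2 * ... * pm^nm
    n = sum(counts)
    coefficient = factorial(n)
    for k in counts:
        coefficient //= factorial(k)
    probability = 1.0
    for k, p in zip(counts, probs):
        probability *= p ** k
    return coefficient * probability

# Probability that 12 rolls of a fair die show each of the six faces exactly twice.
print(multinomial_probability([2] * 6, [1 / 6] * 6))    # about 0.0034
```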
4.1.3 The Poisson Distribution
This is a distribution associated with rare events. In fact, it can be deduced as the limiting case (in an appropriate sense) of the binomial distribution when the probability of one of the outcomes is very small, and the number of attempts is very large. The classical example is the number of calls arriving at a switchboard over a fixed amount of time. This distribution is extremely useful in crucial applications since, for example, it is a simple but not unrealistic model
for the number of requests to use a resource in a network (a computer network, an electrical network, the number of customers arriving at a teller, ...). It turns out that the number N in question has distribution

$$ P[N = k] = e^{-\lambda}\, \frac{\lambda^k}{k!} $$

where λ is a parameter that, in a sense, is connected to the average time between requests (the higher the value of λ, the more intense the flow of requests). This is a probability distribution because of the remarkable formula (which won't make sense until you learn some calculus)
$$ \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda} $$

Incidentally, it turns out that EN = λ, and Var(N) = λ as well!
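The claims EN = λ and Var(N) = λ can be checked numerically by truncating the infinite sum where the remaining terms are negligible; the sketch below is an added illustration, and λ = 2.5 is an arbitrary choice.

```python
from math import exp, factorial

lam = 2.5                                                       # an arbitrary lambda
pmf = [exp(-lam) * lam ** k / factorial(k) for k in range(60)]  # terms beyond k ~ 60 are negligible here

total = sum(pmf)                                                # should be (essentially) 1
mean = sum(k * p for k, p in enumerate(pmf))                    # should equal lambda
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))  # should equal lambda too

print(total, mean, variance)
```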
4.2 Continuous Distributions
Depending on the applications, there are myriads of distributions that have been considered. A few examples follow.
4.2.1 The Exponential Distribution
This is the continuous analog of the Geometric Distribution above (and it is connected to it in a very precise mathematical sense). Here you have a possible model of first time to an event, like the breakdown of a piece of equipment. One way of giving a formula is through

$$ P[T > t] = e^{-\lambda t} $$

from which you can easily deduce that

$$ P[a < T < b] = e^{-\lambda a} - e^{-\lambda b} $$

With more calculations than we can perform here, you would find that

$$ ET = \frac{1}{\lambda}, \qquad Var(T) = \frac{1}{\lambda^2} $$

If this reminds you of the Poisson distribution's formulas, you are right: there is a strong connection between the two distributions: if the number of arrivals has a Poisson distribution, the time between arrivals has an exponential distribution, and the λ is the same!
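That connection can be illustrated with a short simulation, added here as a sketch (not part of the original notes): generate exponential waiting times with rate λ, count how many arrivals land in each unit of time, and the counts come out with mean and variance both close to λ.

```python
import random

random.seed(4)
lam, periods = 3.0, 50_000

counts = []
for _ in range(periods):
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)   # exponential waiting time with rate lambda
        if t > 1.0:                    # stop once we leave the unit time interval
            break
        n += 1
    counts.append(n)                   # number of arrivals in one unit of time

mean = sum(counts) / periods
variance = sum((c - mean) ** 2 for c in counts) / periods
print(mean, variance)                  # both close to lambda = 3.0
```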
4.2.2 The Weibull Distribution
This is a variation of the exponential distribution, used in survival analysis, when the exponential model is not appropriate:

$$ P[T > t] = e^{-\lambda t^{\alpha}} $$

where α > 0 is another parameter.
4.2.3 The Beta and the Gamma Distribution
Just as the exponential distribution concerns the first arrival time in a Poisson flow, and, in the discrete case, considering multiple arrivals led from the geometric to the negative binomial distribution, in the continuous case the analog is the so-called Gamma distribution. Also, in a number of problems more or less connected to the same setup, another distribution comes up, called the Beta distribution. We won't be concerned with these more complex cases.
4.2.4 The Cauchy Distribution
Once you allow for infinitely many outcomes, you can't be really sure that the parameters we defined, EX, EX², and so on, make sense. In fact, there are many examples where they don't. These examples have become of greater interest since the explosion of Financial Mathematics, where the modeling of the distribution of stock prices has led to considering some of these exotic examples. The granddaddy of these examples is the Cauchy distribution. This is a distribution where the probability of the random variable to fall between a and b is given by the area under the curve

$$ \frac{1}{\pi} \cdot \frac{1}{1 + x^2} $$
If you think this looks very much like the Normal Distribution, you are only seeing a superficial similarity. In fact, while for a normally distributed variable X, with parameters µ and σ,

$$ EX = \mu, \qquad Var(X) = \sigma^2 $$

attempts to compute expected value and variance for a Cauchy variable are fruitless: the numbers become infinitely large...
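A simulation, added here as an illustration, shows what "fruitless" means in practice: the running average of Cauchy samples never settles down, in sharp contrast with the Law of Large Numbers behaviour seen earlier.

```python
import math
import random

random.seed(5)

def cauchy_sample():
    # a standard Cauchy variate, obtained by inverting its cumulative distribution
    return math.tan(math.pi * (random.random() - 0.5))

total = 0.0
for n in range(1, 1_000_001):
    total += cauchy_sample()
    if n in (10_000, 100_000, 1_000_000):
        print(n, total / n)    # the running average keeps jumping around instead of converging
```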