Lecture 2. Discrete random variables
Mathematical Statistics and Discrete Mathematics
November 4th, 2015
Random variables
The setting that we start with is a sample space S and a probability measure P on the
events contained in S.
A random variable is an assignment of a real number to each outcome in S. In other
words, X is a function defined on S and taking values in R (we denote this by
X : S → R). The value taken by X for the outcome ω is denoted by X(ω).
Let S be the combined sample space for three tosses of a die. Let ω = (ω1 , ω2 , ω3 ) ∈ S
be a configuration of three outcomes. Then, the following are random variables:
! X0 = 2, a constant random variable.
! X1 = sum of all outcomes; X1(ω) = ω1 + ω2 + ω3, X1(2, 5, 3) = 10.
! X2 = maximal outcome; X2(ω) = max{ω1, ω2, ω3}, X2(2, 5, 3) = 5.
! X3 = third outcome; X3(ω) = ω3, X3(2, 5, 3) = 3.
! X4 = index of the smallest outcome; X4(2, 5, 3) = 1 since the first outcome is the smallest.
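Viewing random variables as functions can be made concrete in code. The following Python sketch (illustrative only, not part of the lecture; the names S, X1, X2, X3, X4 simply mirror the examples above) enumerates the sample space of three die tosses and evaluates the four random variables at the outcome (2, 5, 3).

    from itertools import product

    # Sample space S: all triples (w1, w2, w3) of die outcomes.
    S = list(product(range(1, 7), repeat=3))

    # Random variables are just real-valued functions defined on S.
    X1 = lambda w: sum(w)               # sum of all outcomes
    X2 = lambda w: max(w)               # maximal outcome
    X3 = lambda w: w[2]                 # third outcome
    X4 = lambda w: w.index(min(w)) + 1  # 1-based index of the smallest outcome

    print(X1((2, 5, 3)), X2((2, 5, 3)), X3((2, 5, 3)), X4((2, 5, 3)))  # 10 5 3 1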
Discrete random variables
A set K ⊂ R is countably infinite if one can list all elements of K in an infinite
sequence, that is, one can write K = {k1, k2, k3, . . .}. Infinite subsets of R for which this
is not the case are called uncountable.
! The sets of natural and integer numbers are countably infinite.
! Any interval (a, b) with a < b is uncountable.
A random variable X is called discrete if it takes values in a finite or countably infinite
subset of R.
! Toss a coin until the first heads. Let X1 be the total number of tosses. The set of
possible values of X1 is {1, 2, 3, . . . }, and therefore X1 is discrete.
! Take a sample of sea water. Let X2 be the amount of salt per weight unit. The set
of possible values of X2 is [0, 1), and therefore X2 is not discrete.
Discrete probability density function
We will use capital letters X, Y, Z, etc. to denote random variables, and small letters x,
y, z, etc. for their observed values.
The knowledge of all possible values is not enough to describe a random variable. We
also need to know how likely the variable is to attain each of those values.
The discrete probability density function (PDF) fX : R → [0, 1] of a discrete random
variable X is defined by
fX (x) = P(X = x) = P({ω : X(ω) = x}).
We say that x is a possible value for a discrete random variable X if
fX (x) = P(X = x) > 0.
The probability density function fX contains all information about the random variable
X. We say that two variables have the same distribution if they have the same PDF.
Discrete probability density function
Toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of
heads. The only possible values are 0, 1, 2, 3. We have
• fX(0) = P(X = 0) = P(ttt) = P(t)^3 = 8/27,
• fX(1) = P(X = 1) = P({htt, tht, tth}) = 3 P(t)^2 P(h) = 3 · (2/3)^2 · 1/3 = 4/9,
• fX(2) = P(X = 2) = P({hht, hth, thh}) = 3 P(h)^2 P(t) = 3 · (1/3)^2 · 2/3 = 2/9,
• fX(3) = P(X = 3) = P(hhh) = P(h)^3 = 1/27.
[Figure: bar plot of the discrete PDF f(x) of X for x between -1 and 4, with bars at x = 0, 1, 2, 3.]
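The four probabilities above can be checked by brute force. The Python sketch below (an illustration under the stated assumptions of a fixed bias 1/3 and independent tosses; the dictionary name f is ours) enumerates all 2^3 = 8 outcomes and accumulates their probabilities by the number of heads.

    from itertools import product
    from fractions import Fraction

    p_h = Fraction(1, 3)   # P(heads)
    p_t = 1 - p_h          # P(tails)

    f = {}                                    # discrete PDF of X = number of heads
    for outcome in product("ht", repeat=3):   # all 8 outcomes of three tosses
        prob = 1
        for toss in outcome:
            prob *= p_h if toss == "h" else p_t
        x = outcome.count("h")
        f[x] = f.get(x, 0) + prob

    for x in sorted(f):
        print(x, f[x])   # 0 8/27, 1 4/9, 2 2/9, 3 1/27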
Properties of discrete PDF
A nonnegative function f defined on the real numbers is a PDF of some discrete
random variable if and only if the following two conditions hold:
• The set K = {x : f (x) > 0} is finite or countably infinite,
• Σ_{x ∈ K} f(x) = 1.
Moreover, if X is a discrete random variable whose PDF is f, then for any set T ⊂ R,
P(X ∈ T) = Σ f(x),
where the sum runs over all possible values x of X in T.
Toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of
heads. We have
P(X ≤ 2.5) = P(X ∈ [0, 2.5]) = fX(0) + fX(1) + fX(2) = 26/27.
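In code, the rule P(X ∈ T) = Σ f(x) amounts to summing the PDF over the possible values that fall in T. A minimal sketch, reusing the PDF of the biased-coin example as a plain dictionary (our own representation, not the lecture's):

    from fractions import Fraction

    # PDF of X = number of heads in three tosses of the biased coin.
    f = {0: Fraction(8, 27), 1: Fraction(4, 9), 2: Fraction(2, 9), 3: Fraction(1, 27)}

    def prob(f, T):
        """P(X in T): sum f(x) over the possible values x lying in T."""
        return sum(px for x, px in f.items() if x in T)

    print(prob(f, {0, 1, 2}))   # 26/27, i.e. P(X <= 2.5)
    print(sum(f.values()))      # 1, the second defining property of a PDF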
Cumulative distribution function
The cumulative distribution function (CDF) FX : R → [0, 1] of a random variable X is
defined by
FX (x) = P(X ≤ x).
Toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of
heads.
[Figure: graph of the CDF FX(x) of X for x between -1 and 4.]
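A short sketch of how FX can be tabulated from the PDF of the biased-coin example (same dictionary representation as before; the helper F is ours):

    from fractions import Fraction

    f = {0: Fraction(8, 27), 1: Fraction(4, 9), 2: Fraction(2, 9), 3: Fraction(1, 27)}

    def F(x):
        """CDF: F(x) = P(X <= x) = sum of f(k) over possible values k <= x."""
        return sum(pk for k, pk in f.items() if k <= x)

    for x in (-1, 0, 0.5, 1, 2, 2.5, 3, 4):
        print(x, F(x))
    # F is a step function: 0 below 0, then 8/27, 20/27, 26/27, and 1 from x = 3 onward.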
Expectation
Main motivating question: what is the average value that a random variable assumes?
• Let us conduct a simple experiment: toss a die 6 million times, and let x̄ be the
arithmetic average of all outcomes.
• We would like to predict the approximate value of x̄.
• Since each outcome is equally probable and we toss the die an enormous number
of times, there should be no bias towards any of the possible outcomes.
• This means that each outcome appears roughly 1 million times, and hence
x̄ ∼ (10^6 · 1 + 10^6 · 2 + 10^6 · 3 + 10^6 · 4 + 10^6 · 5 + 10^6 · 6) / (6 · 10^6) = 3.5.
• More generally, for non-uniform distributions, the number of appearances of each
outcome should be proportional to its probability; the fair-die case is simulated in the sketch below.
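The thought experiment is easy to simulate; the following sketch is only illustrative (the printed average is random and will vary slightly around 3.5).

    import random

    n = 6_000_000
    total = sum(random.randint(1, 6) for _ in range(n))   # n fair die tosses
    print(total / n)                                      # arithmetic average, close to 3.5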
Expectation
The expectation, expected value, mean value, or simply the mean E[X] of a discrete
random variable X is defined by
E[X] = µX = Σ x · P(X = x) = Σ x · fX(x),
where both sums run over all possible values x of X.
! Toss a die and let Y be the outcome. Then,
E[Y] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
! Toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of
heads. Then,
E[X] = 0 · fX (0) + 1 · fX (1) + 2 · fX (2) + 3 · fX (3)
= 4/9 + 2 · 2/9 + 3 · 1/27 = 1.
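Both expectations can be computed mechanically from the definition. A minimal sketch, representing each PDF as a dictionary (the helper name expectation is ours):

    from fractions import Fraction

    def expectation(f):
        """E[X] = sum of x * f(x) over all possible values x."""
        return sum(x * px for x, px in f.items())

    f_die = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die
    f_coins = {0: Fraction(8, 27), 1: Fraction(4, 9), 2: Fraction(2, 9), 3: Fraction(1, 27)}

    print(expectation(f_die))     # 7/2, i.e. 3.5
    print(expectation(f_coins))   # 1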
Expectation
Properties of expectation:
• If X and Y have the same distribution, then E[X] = E[Y]. Expectation is a
characteristic of the distribution.
The following is known as linearity of expectation.
• E[c] = c for a constant c.
• E[aX] = aE[X] for any random variable X and a constant a.
• E[X + Y] = E[X] + E[Y] for any pair of random variables X and Y.
• More generally, we have
E[X1 + X2 + . . . + Xn ] = E[X1 ] + E[X2 ] + . . . + E[Xn ]
for any choice of random variables X1 , X2 , . . . , Xn .
Exercise: check the facts above using the definition of expectation.
Expectation
Toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of
heads. For i = 1, 2, 3, let Xi = 1 if the outcome of the ith toss is heads and Xi = 0 otherwise. We have
X = X1 + X2 + X3 .
Note that each Xi , i = 1, 2, 3, describes the same kind of experiment, and hence X1 ,
X2 , and X3 have the same distribution. We have,
E[X1 ] = E[X2 ] = E[X3 ] = 0 · 2/3 + 1 · 1/3 = 1/3.
Using linearity of expectation, we get
E[X] = E[X1 ] + E[X2 ] + E[X3 ] = 1.
No need to count all possible configurations as before!
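The decomposition can also be checked by simulation. The sketch below (illustrative only) draws the three indicator variables independently and compares the empirical average of X with E[X1] + E[X2] + E[X3] = 1.

    import random

    def toss():
        """One biased toss: 1 (heads) with probability 1/3, otherwise 0 (tails)."""
        return 1 if random.random() < 1/3 else 0

    n = 100_000
    total = 0
    for _ in range(n):
        x1, x2, x3 = toss(), toss(), toss()   # the indicators X1, X2, X3
        total += x1 + x2 + x3                 # X = X1 + X2 + X3
    print(total / n)                          # close to E[X] = 1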
Expectations of functions of random variables
Let X be a discrete random variable and let H : R → R be a function. The
expectation, expected value, mean value, or simply the mean E[H(X)] of H(X) is
defined by
E[H(X)] = Σ H(x) · P(X = x) = Σ H(x) · fX(x),
where both sums run over all possible values x of X.
Toss a die and let Y be the outcome. Let H(x) = x^2. We have
E[H(Y)] = E[Y^2] = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 ∼ 15.17.
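A sketch of the same computation for an arbitrary function H, here H(x) = x^2 on a fair die (the helper name expectation_of is ours):

    from fractions import Fraction

    def expectation_of(H, f):
        """E[H(X)] = sum of H(x) * f(x) over all possible values x."""
        return sum(H(x) * px for x, px in f.items())

    f_die = {x: Fraction(1, 6) for x in range(1, 7)}
    m2 = expectation_of(lambda x: x**2, f_die)
    print(m2, float(m2))   # 91/6, approximately 15.17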
Variance
Main motivating question: how far from its expectation is a random variable on
average?
The variance of a random variable X is defined to be
Var[X] = σX^2 = E[(X − E[X])^2] = E[(X − µX)^2].
It is usually easier to use the following computational formula:
Var[X] = E[X^2] − (E[X])^2.
Toss a die and let Y be the outcome. We have
Var[Y] = E[Y^2] − (E[Y])^2 ∼ 15.17 − (3.5)^2 = 2.92.
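The computational formula can be applied directly to a PDF; a small sketch for the fair die, reusing the dictionary representation used earlier:

    from fractions import Fraction

    f_die = {x: Fraction(1, 6) for x in range(1, 7)}

    mean = sum(x * px for x, px in f_die.items())               # E[Y] = 7/2
    second_moment = sum(x**2 * px for x, px in f_die.items())   # E[Y^2] = 91/6
    var = second_moment - mean**2                               # Var[Y] = E[Y^2] - (E[Y])^2

    print(var, float(var))   # 35/12, approximately 2.92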
Variance
The standard deviation of a random variable X is defined to be
σX = √(σX^2) = √(Var[X]).
We say that two random variables X and Y are independent if the knowledge about the
values assumed by X does not give any information about the values of Y, and vice
versa.
• If X and Y have the same distribution, then Var[X] = Var[Y]. Variance is a
characteristic of the distribution.
• Var[X] ≥ 0 for any random variable X.
• Var[c] = 0 for a constant c.
• Var[aX] = a2 Var[X] for a constant a and a random variable X.
• Var[X + Y] = Var[X] + Var[Y] for any pair of independent random variables X
and Y.
• In particular, Var[X + c] = Var[X] for any random variable X and a constant c (see the numerical check below).
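The scaling and additivity rules can be verified exactly for small examples. The sketch below (our own check, assuming that the joint PDF of two independent rolls factorizes) confirms Var[aX] = a^2 Var[X] for a = 3 and Var[X + Y] = Var[X] + Var[Y] for two independent fair-die rolls.

    from fractions import Fraction
    from itertools import product

    f_die = {x: Fraction(1, 6) for x in range(1, 7)}

    def var(f):
        """Var[X] = E[X^2] - (E[X])^2 for a PDF given as a dictionary."""
        mean = sum(x * px for x, px in f.items())
        return sum(x**2 * px for x, px in f.items()) - mean**2

    # Var[aX] = a^2 Var[X] with a = 3: the PDF of 3X just relabels the values.
    f_3x = {3 * x: px for x, px in f_die.items()}
    print(var(f_3x) == 9 * var(f_die))   # True

    # Var[X + Y] = Var[X] + Var[Y] for independent X, Y.
    f_sum = {}
    for (x, px), (y, py) in product(f_die.items(), f_die.items()):
        f_sum[x + y] = f_sum.get(x + y, 0) + px * py   # P(X = x, Y = y) = P(X = x) P(Y = y)
    print(var(f_sum) == 2 * var(f_die))   # True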
Important distributions based on a coin toss
A trial is an experiment with two possible outcomes: success or failure.
A trial from a probabilistic point of view is nothing more but a toss of a (possibly
biased) coin.
• Bernoulli trial - a single trial
• Geometric distribution - time until first success in a sequence of independent
Bernoulli trials
• Binomial distribution - the number of successes in a fixed number of independent
Bernoulli trials
Bernoulli trial (Bernoulli(p))
A random variable X has Bernoulli distribution with parameter p ∈ [0, 1] if
P(X = 1) = p = 1 − P(X = 0).
We denote this by writing X ∼ Bernoulli(p). We interpret the event {X = 1} as
{success} and {X = 0} as {failure}.
Properties:
• fX (1) = p, fX (0) = 1 − p,
• E[X] = p,
• Var[X] = p(1 − p).
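A tiny sketch confirming the mean and variance from the two-point PDF (the value of p is chosen arbitrarily for illustration):

    from fractions import Fraction

    p = Fraction(1, 3)
    f = {1: p, 0: 1 - p}                                    # Bernoulli(p) PDF

    mean = sum(x * px for x, px in f.items())               # p
    var = sum(x**2 * px for x, px in f.items()) - mean**2   # p(1 - p)

    print(mean == p, var == p * (1 - p))                    # True True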
Geometric distribution (Geom(p))
A random variable X has geometric distribution with parameter p ∈ (0, 1] if
fX(x) = (1 − p)^(x−1) p, for x = 1, 2, 3, . . . .
We denote this by writing X ∼ Geom(p).
Properties:
• E[X] = 1/p,
• Var[X] = (1 − p)/p^2,
• FX(x) = 1 − (1 − p)^⌊x⌋, for x ≥ 0, where ⌊x⌋ = floor of x = x rounded down to an integer.
Geometric distribution (Geom(p))
One should think of X ∼ Geom(p) as the time of the first success in an infinite
sequence of independent trials Bernoulli(p).
Suppose that you play a 6 out of 49 lottery every week. Let X be the first time you win
since you started playing.
How long do you wait on average? We have X ∼ Geom(1/(49 choose 6)). Hence,
E[X] = (49 choose 6) weeks ∼ 14 million weeks ∼ 270 thousand years.
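The lottery figures are easy to reproduce; the sketch below assumes one independent play per week with success probability 1/(49 choose 6).

    from math import comb

    p = 1 / comb(49, 6)          # probability of winning a single 6-out-of-49 draw
    expected_weeks = 1 / p       # E[X] for X ~ Geom(p)

    print(comb(49, 6))           # 13983816 possible tickets, about 14 million weeks
    print(expected_weeks / 52)   # roughly 270 thousand years of weekly play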
Binomial distribution (Binom(n, p))
A random variable X has binomial distribution with parameters p ∈ [0, 1] and
n = 1, 2, 3, . . . if
fX(x) = (n choose x) p^x (1 − p)^(n−x), for x = 0, 1, 2, . . . , n.
We denote this by writing X ∼ Binom(n, p).
Properties:
• E[X] = np,
• Var[X] = np(1 − p),
• FX(x) = Σ_{i=0}^{⌊x⌋} (n choose i) p^i (1 − p)^(n−i), for x ≥ 0.
One should think of X ∼ Binom(n, p) as the total number of successes in a sequence
of n independent trials Bernoulli(p).
Toss a die 12 times and let X be the total number of 5’s. Then X ∼ Binom(12, 1/6),
and for example E[X] = 12 · 1/6 = 2.
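A sketch computing the Binom(12, 1/6) PDF from the formula and confirming that the probabilities sum to 1 and that E[X] = np = 2:

    from math import comb
    from fractions import Fraction

    n, p = 12, Fraction(1, 6)

    # Binomial PDF: f(x) = (n choose x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n.
    f = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

    print(sum(f.values()))                      # 1
    print(sum(x * px for x, px in f.items()))   # 2, i.e. E[X] = n * p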