Now we will cover discrete random variables.
A random variable is a function which maps from the sample space Ξ© to the set of real numbers.
Say I have a sample space of people, and say I am interested in their height in inches. Let H be the
function from the sample space to the set of real numbers. In other words, H is a random
variable.
Here is the picture:
We also wish to define HΜ„ = 2.5H (i.e., the height in cm). HΜ„ is a function of H. HΜ„ is also a
random variable, because everything in the sample space gets mapped to something.
It is a random variable which is a function of another random variable. Thus, a function of
a random variable is also a random variable.
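To make this concrete, here is a minimal Python sketch of the height example; the people and the heights are made-up illustrations, not values from the lecture.

# A toy sample space of three people with their heights in inches.
omega = {"Alice": 64.0, "Bob": 70.0, "Carol": 67.0}   # hypothetical outcomes

def H(person):
    # Random variable: maps an outcome in omega to a real number (height in inches).
    return omega[person]

def H_bar(person):
    # H_bar = 2.5 * H is a function of H, hence also a random variable.
    return 2.5 * H(person)

for person in omega:
    print(person, H(person), "in =", H_bar(person), "cm")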
Notation:
- Random variable X (capital letter), i.e. X : Ξ© β†’ ℝ (a function).
- Numerical value x (lower case letter), i.e. x ∈ ℝ (a number).
The Probability Mass Function (PMF) tells us how likely it is that we get an outcome that leads
to a particular numerical value. We calculate the overall probability of the outcomes that lead
to a particular numerical value and we generate a bar graph. We give this bar graph the name
pX(x): the probability mass function.
The picture is as follows.
Probability mass function definitions, notations, and observations:
pX(x) = P(X = x)
= P({Ο‰ ∈ Ξ© s.t. X(Ο‰) = x})
Note that
pX(x) β‰₯ 0 and βˆ‘_x pX(x) = 1
which means that probabilities are nonnegative and the sum of all the possible probabilities adds
to 1.
Example
Let X be the number of heads in n independent coin tosses, where n is fixed and
P(H) = p.
Let n = 4, then
pX(2) = P(HHTT) + P(HTHT) + P(HTTH) + P(THHT) + P(THTH) + P(TTHH)
= 6pΒ²(1 βˆ’ p)Β²
= (4 choose 2) pΒ²(1 βˆ’ p)Β²
In general, we have
pX(k) = (n choose k) p^k (1 βˆ’ p)^(nβˆ’k)
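As a quick numerical sanity check of this formula (a sketch; the value p = 0.3 is arbitrary), math.comb from the Python standard library gives the binomial coefficient:

from math import comb

def binomial_pmf(k, n, p):
    # pX(k) = (n choose k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 4, 0.3
print(binomial_pmf(2, n, p))                             # 6 * p^2 * (1-p)^2 = 0.2646
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))  # PMF sums to 1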
Expected Value:
Consider the following example: X takes the value 1 with probability 1/6, the value 2 with
probability 1/2, and the value 4 with probability 1/3.
Consider the following computation:
1Β·(1/6) + 2Β·(1/2) + 4Β·(1/3) = 5/2
We think of this value as an average of sorts. What we did was take each value, multiply it by its
probability of occurring, and sum over all x.
In general, what we did was compute
βˆ‘_x x Β· pX(x)
We have a name for this computation and it is called the expected value. It is denoted by E[X].
Therefore, we have the following definition.
E[X] = βˆ‘π‘₯(π‘₯ βˆ™ px(x))
Properties of Expectations
Let X be a random variable and let Y = g(X)
We have the following picture.
Suppose we wish to compute E[Y].
We could do this by using the definition:
βˆ‘_y y Β· pY(y)
This means that for every particular value of y, we would collect all the outcomes that lead to
that value of y and compute their probability.
It does the addition along the β€œy line.”
An alternative way is to do the addition along the β€œx line.” Instead of using the probabilities of
y, we use the probabilities of x, multiply by g(x) instead of x, and sum over all x.
This gives the formula
βˆ‘_x g(x) Β· pX(x)
The picture is as follows:
We can see that we do the same arithmetic both ways. A proof will not be included here.
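Though we omit the proof, the claim is easy to check numerically. The sketch below is mine, not the lecture's: it takes g(x) = xΒ² with an arbitrary PMF, computes E[Y] over the β€œy line” by first building pY, then over the β€œx line” directly, and confirms the two sums agree.

from fractions import Fraction
from collections import defaultdict

p_x = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}

def g(x):
    return x * x

# "x line": sum g(x) * pX(x) directly.
e_via_x = sum(g(x) * p for x, p in p_x.items())

# "y line": first collect pY(y) = P(g(X) = y), then sum y * pY(y).
p_y = defaultdict(Fraction)
for x, p in p_x.items():
    p_y[g(x)] += p
e_via_y = sum(y * p for y, p in p_y.items())

print(e_via_x, e_via_y)   # both 3/4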
Properties:
If Ξ± and Ξ² are real numbers, then
E[Ξ±] = ?
Recall the definition,
E[X] = βˆ‘π‘₯(π‘₯ βˆ™ px(x))
Here, the only value X takes is Ξ±, and pX(Ξ±) = 1 (since everything in the sample space maps to Ξ±).
Here is the picture
So we have
E[Ξ±] = βˆ‘π‘₯(π‘₯ βˆ™ px(x))
= βˆ‘π‘₯(𝛼 βˆ™ px(x))
= 𝛼 βˆ‘π‘₯ px(x)
=Ξ±βˆ™1
=Ξ±
Now let’s think about
E[Ξ±X].
Recall our definition, and the rule for a random variable X with Y = g(X):
E[X] = βˆ‘_x x Β· pX(x)
and
E[Y] = βˆ‘_x g(x) Β· pX(x)
Here, g(x) = Ξ±x, so
E[Ξ±X] = βˆ‘π‘₯(𝑔(π‘₯) βˆ™ px(x))
= βˆ‘π‘₯(𝛼π‘₯ βˆ™ px(x))
=Ξ± βˆ‘π‘₯(π‘₯ βˆ™ px(x))
= Ξ±E[X]
Similarly,
E[Ξ±X + Ξ²] = βˆ‘π‘₯(Ξ±x + Ξ²)px(x)
= βˆ‘π‘₯(Ξ±x)px(x) + βˆ‘π‘₯(Ξ²)px(x)
= Ξ± βˆ‘π‘₯ x βˆ™ px(x) + Ξ² βˆ‘π‘₯ px(x)
= α𝐸[𝑋] + Ξ²(π‘₯)(1)
= α𝐸[𝑋] + Ξ²(π‘₯)
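A quick numerical check of this linearity property (a sketch; the PMF and the values Ξ± = 3, Ξ² = 7 are arbitrary):

from fractions import Fraction

p_x = {1: Fraction(1, 6), 2: Fraction(1, 2), 4: Fraction(1, 3)}
alpha, beta = 3, 7

E_X = sum(x * p for x, p in p_x.items())
E_aXb = sum((alpha * x + beta) * p for x, p in p_x.items())

print(E_aXb, alpha * E_X + beta)   # both 29/2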
Note:
E[X2] = βˆ‘π‘₯ π‘₯ 2pX(x)
If we think of E[X] as the average, then we define the variance to be the average squared deviation
from the mean. It is denoted var(X).
Visually, what we do is take the deviation from the mean and square it. This gives more
emphasis to big distances, such as outliers. We have the following picture.
We square the distance to the mean (and do so for all points), then take the average of that. If the
variance is large, then points are distributed far away from the mean. If the variance is small,
then the likely values are around the mean.
We now derive a formula to compute the variance.
var(X) = E[|X βˆ’ E[X]|Β²]
= E[(X βˆ’ E[X])Β²]
= βˆ‘_x (x βˆ’ E[X])Β² Β· pX(x)
= βˆ‘_x (xΒ² + (E[X])Β² βˆ’ 2E[X]Β·x) Β· pX(x)
= βˆ‘_x xΒ² Β· pX(x) + βˆ‘_x (E[X])Β² Β· pX(x) βˆ’ βˆ‘_x 2E[X]Β·x Β· pX(x)
= βˆ‘_x xΒ² Β· pX(x) + (E[X])Β² Β· βˆ‘_x pX(x) βˆ’ 2E[X] Β· βˆ‘_x x Β· pX(x)
= E[XΒ²] + (E[X])Β² Β· 1 βˆ’ 2E[X] Β· E[X]
= E[XΒ²] βˆ’ (E[X])Β²
So, if we wish to compute the variance, we have the formula
var(X) = E[XΒ²] βˆ’ (E[X])Β²
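Both forms can be computed side by side to confirm they agree; here is a sketch using the earlier PMF on the values 1, 2, 4:

from fractions import Fraction

p_x = {1: Fraction(1, 6), 2: Fraction(1, 2), 4: Fraction(1, 3)}

E_X = sum(x * p for x, p in p_x.items())
# Definition: average squared deviation from the mean.
var_def = sum((x - E_X) ** 2 * p for x, p in p_x.items())
# Shortcut formula: E[X^2] - (E[X])^2.
E_X2 = sum(x ** 2 * p for x, p in p_x.items())
var_short = E_X2 - E_X ** 2

print(var_def, var_short)   # both 5/4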
You may be wondering why we gave a complicated definition for the variance to tell us
information about the average distance from the mean. The answer is because we had to.
Consider the following quantity which seems like it would do the job.
E[X – E[X]]
This is the average distance from the mean.
Recall that
E[Ξ±X + Ξ²] = Ξ±E[X] + Ξ²
Therefore,
E[X – E[X]] = E[X] – E[X] = 0
This tells us that the average deviation from the mean is 0. This is not a useful quantity for us.
One solution is to add absolute values to make |X - E[X]|, but it turns out to be more useful to
use the quantity
E[|X βˆ’ E[X]|Β²]
We call this quantity the variance.
var(X) = E[|X βˆ’ E[X]|Β²] = E[XΒ²] βˆ’ (E[X])Β²
A problem with the variance is that it gets the units wrong if you want to talk about the spread of
a distribution. For example, if we start with units in meters, the variance will have units in
meters squared.
To remedy this problem, we define the standard deviation (Οƒ) to be
Οƒ_X = √var(X)
It gives us a quantity which is in the same units of the random variable we are dealing with.
Example
Consider the following distribution: X takes the value 1 with probability 1/4, the value 3 with
probability 1/2, and the value 5 with probability 1/4.
We wish to find E[X], var(X), and Οƒ_X.
E[X] = βˆ‘_x x Β· pX(x)
= 1Β·(1/4) + 3Β·(1/2) + 5Β·(1/4)
= 3
var(X) = E[XΒ²] βˆ’ (E[X])Β²
Note:
E[XΒ²] = βˆ‘_x xΒ² Β· pX(x)
= 1Β²Β·(1/4) + 3Β²Β·(1/2) + 5Β²Β·(1/4)
= 11
Therefore,
var(X) = E[XΒ²] βˆ’ (E[X])Β²
= 11 βˆ’ 3Β²
= 2
Thus,
Οƒ_X = √2
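A short sketch verifying all three values:

from math import sqrt

p_x = {1: 0.25, 3: 0.5, 5: 0.25}

E_X  = sum(x * p for x, p in p_x.items())      # expected value
E_X2 = sum(x * x * p for x, p in p_x.items())  # second moment
var  = E_X2 - E_X ** 2                         # variance via the shortcut formula
print(E_X, var, sqrt(var))                     # 3.0 2.0 1.4142...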