Probability Distributions
A probability distribution is a mapping of all the possible values of a random variable to their corresponding
probabilities for a given sample space.
The probability distribution is denoted as $P(X = x)$, which can be written in short form as $P(x)$.
The probability distribution can also be described as a set of ordered pairs of outcomes and their probabilities. The
function that supplies these probabilities is known as the probability function f(x).
This set of ordered pairs can be written as:
$$(x, f(x))$$
where the function is defined as:
$$f(x) = P(X = x)$$
Cumulative Distribution Function (CDF)
The Cumulative Distribution Function (CDF) is defined as the probability that a random variable X with a given
probability distribution f(x) takes a value less than or equal to x. The cumulative distribution function is a
cumulative sum of the probabilities up to a given point.
The CDF is denoted by F(x) and is mathematically described as:
$$F(x) = P(X \le x)$$
Discrete Probability Distributions
Discrete random variables give rise to discrete probability distributions. For example, the probability of obtaining a
certain number x when you toss a fair die is given by the probability distribution table below.
x P(X = x)
1 1⁄6
2 1⁄6
3 1⁄6
4 1⁄6
5 1⁄6
6 1⁄6
For a discrete probability distribution, the set of ordered pairs (x, f(x)), where x is each outcome in a given sample
space and f(x) is its probability, must satisfy the following conditions:

P(X = x) = f(x)

f(x) ≥ 0

∑x f(x) = 1
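The two numerical conditions above can be checked mechanically. The following is a minimal sketch, assuming the distribution is stored as a dictionary from outcome to probability; the helper name `is_valid_distribution` is illustrative and not part of the original text. Exact fractions are used so that the sum is not affected by floating-point rounding.

```python
from fractions import Fraction

def is_valid_distribution(f):
    """True if every probability is >= 0 and the probabilities sum to exactly 1."""
    return all(p >= 0 for p in f.values()) and sum(f.values()) == 1

# Fair die from the table above: f(x) = 1/6 for x = 1..6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

print(is_valid_distribution(fair_die))  # True
```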
Cumulative Distribution Function for a Discrete Random Variable
For a discrete random variable, the CDF is given as follows:
$$F(x) = P(X \le x) = \sum_{t \le x} f(t)$$
In other words, to get the cumulative distribution function, you sum the probabilities of all the outcomes less than
or equal to the given value.
For example, given a random variable X defined as the face that you obtain when you toss a fair die, find F(3):
$$F(3) = P(X \le 3) = f(1) + f(2) + f(3) = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{2}$$
The probability function can also be found from the cumulative distribution function, for example
$$f(3) = F(3) - F(2)$$
given that you know the full table of cumulative distribution function values for the sample space.
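The fair-die example can be reproduced in a few lines. This is only a sketch: the names `f` and `cdf` are chosen for illustration, and exact fractions are used so that the results print as 1/2 and 1/6 without rounding.

```python
from fractions import Fraction

# Probability function of a fair six-sided die: f(x) = 1/6 for x = 1..6.
f = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): sum f(t) over all outcomes t <= x."""
    return sum(p for t, p in f.items() if t <= x)

F3 = cdf(3)                    # 1/6 + 1/6 + 1/6
f3_from_cdf = cdf(3) - cdf(2)  # recovers f(3) from the CDF values

print(F3)           # 1/2
print(f3_from_cdf)  # 1/6
```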
Continuous Probability Distribution
Continuous random variables give rise to continuous probability distributions. Continuous probability distributions
can't be tabulated since the probability of any single exact value is zero, i.e.
$$P(X = x) = 0$$
This is because the random variable X is continuous and its range can be divided into infinitely many ever smaller
parts, so the probability of landing on any one exact value x is zero.
Consequently, probabilities are assigned to intervals, and including or excluding an endpoint makes no difference:
$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$$
While a discrete probability distribution is characterized by its probability function (also known as the probability mass
function), continuous probability distributions are characterized by their probability density functions.
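Since we look at regions in which a given outcome is likely to occur, we define the Probability Density Function (PDF)
as the function f(x) whose integral over a region gives the probability that the outcome falls in that region. The
value of f(x) at a single point is a density, not a probability.
This can be mathematically represented as:
$$P(a < X < b) = \int_a^b f(x)\,dx$$
In other words, the probability is the area under the curve of f(x) between a and b.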
For a continuous probability distribution, the density function f(x) must satisfy the following conditions:

$P(x_1 < X < x_2) = \int_{x_1}^{x_2} f(x)\,dx$
$f(x) \ge 0$ for all real numbers $x$
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
Cumulative Distribution Function for a Continuous Probability Distribution
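For a continuous random variable X, its CDF is given by
$$F(x) = P(X \le x)$$
which is the same as saying:
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
and
$$P(a < X < b) = F(b) - F(a)$$
From the above, we can see that the probability density function f(x) can be found from the cumulative distribution
function F(x) by differentiating:
$$f(x) = \frac{dF(x)}{dx}$$
if the derivative exists.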
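Continuous probability distributions are given in the form
$$f(x) = \begin{cases} \text{(an expression in } x\text{)}, & a \le x \le b \\ 0, & \text{elsewhere} \end{cases}$$
which means that the probability density function f(x) is nonzero only within the region between a and b and takes on
the value of zero anywhere else.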
For example, given the following probability density function
Find
1. P(X ≤ 4)
2. P(X < 1)
3. P(2 ≤ X ≤ 3)
4. P(X > 1)
5. F(2)
Solutions:
1. P(X ≤ 4)
Since we're finding the probability that the random variable is less than or equal to 4, we integrate the density
function from the given lower limit (1) to the limit we're testing for (4):
$$P(X \le 4) = \int_1^4 f(x)\,dx$$
We need not concern ourselves with the 0 part of the density function: it only indicates that the density is zero
outside the given region, so the probability of the random variable landing anywhere outside that region is always
zero.
2. P(X < 1)
P(X < 1) = 0, since the density function f(x) is zero outside of the given boundary.
3. P(2 ≤ X ≤ 3)
Since the region we're given lies within the boundary on which f(x) is nonzero, we integrate the density function
over it:
$$P(2 \le X \le 3) = \int_2^3 f(x)\,dx$$
4. P(X > 1)
The above problem is asking us to find the probability that the random variable lies at any point between 1 and
positive infinity. We can solve it as follows:
$$P(X > 1) = \int_1^{\infty} f(x)\,dx$$
remembering that the reciprocal of infinity is taken as zero, since 1/x tends to zero as x grows without bound.
The result is 1, which is what we expect: f(x) is nonzero only within that region, hence the random variable will
always be picked from there.
5. F(2)
The above is asking us to find the cumulative distribution function evaluated at 2, that is, the probability that X is
less than or equal to 2.
Thus F(2) can be found as
$$F(2) = P(X \le 2) = \int_1^2 f(x)\,dx$$
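The density used in this example did not survive in the transcript, so the five answers cannot be reproduced exactly. Purely as an illustration, the sketch below ASSUMES the density f(x) = 1/x² for x ≥ 1 and 0 elsewhere, a choice that is merely consistent with the lower limit of 1 and the remark about the reciprocal of infinity. Under that assumption the CDF has the closed form F(x) = 1 − 1/x, and each part reduces to evaluating F.

```python
import math

def F(x):
    """CDF of the ASSUMED density f(x) = 1/x**2 for x >= 1, 0 elsewhere.

    Integrating 1/t**2 from 1 to x gives 1 - 1/x.
    """
    if x < 1:
        return 0.0
    return 1.0 - 1.0 / x  # 1/x is 0.0 when x is math.inf

p1 = F(4)                # 1. P(X <= 4)
p2 = F(1)                # 2. P(X < 1): no probability lies below 1
p3 = F(3) - F(2)         # 3. P(2 <= X <= 3)
p4 = F(math.inf) - F(1)  # 4. P(X > 1)
p5 = F(2)                # 5. F(2)

for label, value in [("P(X<=4)", p1), ("P(X<1)", p2), ("P(2<=X<=3)", p3),
                     ("P(X>1)", p4), ("F(2)", p5)]:
    print(label, round(value, 4))
```

With a different density only the body of `F` would change; the way each probability is read off the CDF stays the same.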
Variance and Standard Deviation of a Random Variable
We have already looked at variance and standard deviation as measures of dispersion under the section
on Averages. We can also measure the dispersion of random variables across a given distribution using variance and
standard deviation. This allows us to better understand whatever the distribution represents.
The variance of a random variable X is denoted by σ² and is sometimes written as Var(X).
The variance of a random variable can be defined as the expected value of the square of the difference between the
random variable and the mean.
Given that the random variable X has a mean of μ, then the variance is expressed as:
$$\sigma^2 = E[(X - \mu)^2]$$
In the previous section on Expected value of a random variable, we saw that the method/formula for calculating the
expected value varied depending on whether the random variable was discrete or continuous. As a consequence, we
have two different methods for calculating the variance of a random variable depending on whether the random
variable is discrete or continuous.

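For a Discrete random variable, the variance σ² is calculated as:
$$\sigma^2 = \sum_x (x - \mu)^2 f(x)$$
For a Continuous random variable, the variance σ² is calculated as:
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
In the discrete case f(x) is the probability function; in the continuous case it is the probability density function.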
The Standard Deviation σ in both cases can be found by taking the square root of the variance.
Example 1
A software engineering company tested a new product of theirs and found that the number of errors per 100 CDs of
the new software had the following probability distribution:
x f(x)
2 0.01
3 0.25
4 0.4
5 0.3
6 0.04
Find the Variance of X
Solution
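The probability distribution given is discrete and so we can find the variance from the following:
$$\sigma^2 = \sum_x (x - \mu)^2 f(x)$$
We need to find the mean μ first:
$$\mu = \sum_x x f(x) = 2(0.01) + 3(0.25) + 4(0.4) + 5(0.3) + 6(0.04) = 4.11$$
Then we find the variance:
$$\sigma^2 = (2 - 4.11)^2(0.01) + (3 - 4.11)^2(0.25) + (4 - 4.11)^2(0.4) + (5 - 4.11)^2(0.3) + (6 - 4.11)^2(0.04) = 0.7379$$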
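The arithmetic in Example 1 can be checked with a short script. This is a sketch only; the dictionary `dist` simply restates the table from the example, and the variable names are illustrative.

```python
# Distribution from Example 1: number of errors per 100 CDs.
dist = {2: 0.01, 3: 0.25, 4: 0.40, 5: 0.30, 6: 0.04}

# Mean: sum of x * f(x) over all outcomes.
mean = sum(x * p for x, p in dist.items())

# Variance: sum of (x - mean)^2 * f(x) over all outcomes.
variance = sum((x - mean) ** 2 * p for x, p in dist.items())

# Standard deviation: square root of the variance.
std_dev = variance ** 0.5

print(round(mean, 4))      # 4.11
print(round(variance, 4))  # 0.7379
print(round(std_dev, 3))   # 0.859
```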
Example 2
Find the Standard Deviation of a random variable X whose probability density function is given by f(x) where:
Solution
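Since the random variable X is continuous, we use the following formula to calculate the variance:
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
First we find the mean μ:
$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx$$
Then we find the variance from the formula above and take its square root to obtain the standard deviation σ.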
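The density for Example 2 is missing from this transcript, so its numerical answer cannot be reproduced here. To show the procedure anyway, the sketch below uses a HYPOTHETICAL density, f(x) = 2x on [0, 1] and 0 elsewhere, which is not the one from the example. The integrals are approximated with a simple midpoint rule (the helper `integrate` is illustrative); for this assumed density the exact values are μ = 2/3 and σ² = 1/18.

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    # HYPOTHETICAL density for illustration only: f(x) = 2x on [0, 1].
    return 2.0 * x

a, b = 0.0, 1.0  # region where the assumed density is nonzero

total = integrate(f, a, b)                                     # should be close to 1
mean = integrate(lambda x: x * f(x), a, b)                     # mu
variance = integrate(lambda x: (x - mean) ** 2 * f(x), a, b)   # sigma^2
std_dev = variance ** 0.5                                      # sigma

print(round(total, 4), round(mean, 4), round(variance, 4), round(std_dev, 4))
```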
Simplifying the Variance formula
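We have seen that the variance of a random variable is given by:
$$\sigma^2 = E[(X - \mu)^2]$$
We can attempt to simplify this formula by expanding the quadratic in the formula above as follows:
$$(X - \mu)^2 = X^2 - 2\mu X + \mu^2$$
We shall see in the next section that the expected value of a linear combination behaves as follows:
$$E[aX + bY] = aE[X] + bE[Y]$$
Substituting the expanded form into the variance equation:
$$\sigma^2 = E[X^2 - 2\mu X + \mu^2] = E[X^2] - E[2\mu X] + E[\mu^2]$$
Remember that after you've calculated the mean μ, the result is a constant and the expected value of a constant is
that same constant.
This simplifies the formula as shown below:
$$\sigma^2 = E[X^2] - 2\mu E[X] + \mu^2$$
but
$$E[X] = \mu$$
which means that
$$\sigma^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2$$
The above is a simplified formula for calculating the variance.
We can also derive the above for a discrete random variable as follows:
$$\sigma^2 = \sum_x (x - \mu)^2 f(x) = \sum_x x^2 f(x) - 2\mu \sum_x x f(x) + \mu^2 \sum_x f(x)$$
but since the total probability is 1
$$\sum_x f(x) = 1$$
and
$$\sum_x x f(x) = \mu$$
Therefore,
$$\sigma^2 = \sum_x x^2 f(x) - 2\mu^2 + \mu^2 = \sum_x x^2 f(x) - \mu^2$$
whereby
$$E[X^2] = \sum_x x^2 f(x)$$
Hence
$$\sigma^2 = E[X^2] - \mu^2$$
For a continuous random variable:
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$$
whereby
$$E[X^2] = \int_{-\infty}^{\infty} x^2 f(x)\,dx$$
which means that
$$\sigma^2 = E[X^2] - \mu^2$$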
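As a quick sanity check, the simplified formula σ² = E[X²] − μ² can be compared with the definition σ² = E[(X − μ)²] on the fair-die distribution from earlier. This is a sketch with illustrative variable names; exact fractions are used so the two results can be compared for equality.

```python
from fractions import Fraction

# Fair die: f(x) = 1/6 for x = 1..6.
f = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in f.items())  # E[X]

# Definition: expected squared deviation from the mean.
var_definition = sum((x - mu) ** 2 * p for x, p in f.items())

# Simplified formula: E[X^2] - mu^2.
e_x2 = sum(x ** 2 * p for x, p in f.items())
var_shortcut = e_x2 - mu ** 2

print(var_definition, var_shortcut)  # 35/12 35/12
```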
Variance of an Arbitrary function of a random variable g(X)
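Consider an arbitrary function g(X). We saw that the expected value of this function is given by:

For a discrete case
$$\mu_{g(X)} = E[g(X)] = \sum_x g(x) f(x)$$

For a continuous case
$$\mu_{g(X)} = E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$
The variance of this function g(X) is denoted as $\sigma^2_{g(X)}$ and can be found as follows:

When X is a discrete random variable
$$\sigma^2_{g(X)} = E[(g(X) - \mu_{g(X)})^2] = \sum_x (g(x) - \mu_{g(X)})^2 f(x)$$

When X is a continuous random variable
$$\sigma^2_{g(X)} = E[(g(X) - \mu_{g(X)})^2] = \int_{-\infty}^{\infty} (g(x) - \mu_{g(X)})^2 f(x)\,dx$$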
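For the discrete case, the calculation can be sketched as follows. The function g(x) = 2x + 1 is a HYPOTHETICAL choice made only for illustration, applied to the fair-die distribution used earlier; neither appears in this part of the original text.

```python
from fractions import Fraction

# Fair die: f(x) = 1/6 for x = 1..6.
f = {x: Fraction(1, 6) for x in range(1, 7)}

def g(x):
    # HYPOTHETICAL transformation chosen only for illustration.
    return 2 * x + 1

# Expected value of g(X): sum of g(x) * f(x).
mu_g = sum(g(x) * p for x, p in f.items())

# Variance of g(X): sum of (g(x) - mu_g)^2 * f(x).
var_g = sum((g(x) - mu_g) ** 2 * p for x, p in f.items())

print(mu_g, var_g)  # 8 35/3
```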
Covariance
In the section on probability distributions, we saw that at times we might have to deal with more than one random
variable at a time, hence the need to study Joint Probability Distributions.
Just as we can find the expected value of a joint pair of random variables X and Y, we can also measure how the two
variables vary together, and this is what we refer to as the Covariance.
The Covariance of a joint pair of random variables X and Y is denoted by Cov(X,Y).