Random Variables, Distributions and Expectations
The Distribution Function
--Definition of a Distribution Function
I. Introduction:
Probability is the branch of mathematics that studies the possible outcomes of
given events together with the outcomes' relative likelihoods and distributions. In
common usage, the word "probability" is used to mean the chance that a particular event
(or set of events) will occur, expressed on a linear scale from 0 (impossibility) to 1
(certainty), or as a percentage between 0 and 100%. The analysis of events
governed by probability is called statistics.
There are several competing interpretations of the actual "meaning" of probabilities.
Frequentists view probability simply as a measure of the frequency of outcomes (the
more conventional interpretation), while Bayesians treat probability more subjectively as
a statistical procedure that endeavors to estimate parameters of an underlying distribution
based on the observed distribution.
A properly normalized function that assigns a probability "density" to each possible
outcome within some interval is called a probability density function (or probability
distribution function), and its cumulative value (integral for a continuous distribution or
sum for a discrete distribution) is called a distribution function (or cumulative distribution
function).
The distribution function $D(x)$, also called the cumulative distribution function (CDF) or
cumulative frequency function, describes the probability that a variate $X$ takes on a value
less than or equal to a number $x$. The distribution function is sometimes also denoted $F(x)$
(Evans et al. 2000, p. 6).
The distribution function is therefore related to a continuous probability density function
$P(x)$ by

$D(x) = P(X \le x)$   (1)
$\quad\;\; = \int_{-\infty}^{x} P(\xi)\, d\xi,$   (2)

so $P(x)$ (when it exists) is simply the derivative of the distribution function:

$P(x) = D'(x).$   (3)
Similarly, the distribution function is related to a discrete probability $P(x)$ by

$D(x) = P(X \le x)$   (4)
$\quad\;\; = \sum_{X \le x} P(x).$   (5)
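As a concrete illustration of equations (4) and (5), a discrete CDF is just a running sum of the probability mass function. The following is a minimal Python sketch, assuming NumPy is available; the support and probabilities are hypothetical.

```python
import numpy as np

values = np.array([0, 1, 2, 3])        # hypothetical support of X
pmf = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical probabilities P(x); they sum to 1
cdf = np.cumsum(pmf)                   # equation (5): D(x) = sum of P(x) over X <= x

for x, d in zip(values, cdf):
    print(f"D({x}) = {d:.1f}")         # prints 0.1, 0.3, 0.6, 1.0
```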
There exist distributions that are neither continuous nor discrete.
A joint distribution function can be defined if outcomes are dependent on two
parameters:

$D(x, y) = P(X \le x,\ Y \le y)$   (6)
$D_x(x) = D(x, \infty)$   (7)
$D_y(y) = D(\infty, y).$   (8)
Similarly, a multivariate distribution function can be defined if outcomes depend on $n$
parameters:

$D(a_1, \ldots, a_n) = P(x_1 \le a_1, \ldots, x_n \le a_n).$   (9)
The probability content of a closed region can be found much more efficiently than by
direct integration of the probability density function $P(x)$ by appropriate evaluation of the
distribution function at all possible extrema defined on the region (Rose and Smith 1996;
2002, p. 193). For example, for a bivariate distribution function $D(x, y)$, the probability
content in the region $a < x \le b$, $c < y \le d$ is given by

$P(a < x \le b,\ c < y \le d) = \int_a^b \int_c^d P(x, y)\, dy\, dx,$   (10)

but can be computed much more efficiently using

$P(a < x \le b,\ c < y \le d) = D(b, d) - D(a, d) - D(b, c) + D(a, c).$   (11)
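To make equation (11) concrete, here is a small Python sketch, assuming SciPy is available and using a standard bivariate normal purely as an example distribution; it obtains the probability content of a rectangle from four evaluations of the distribution function instead of the double integral in (10).

```python
from scipy.stats import multivariate_normal

# Example distribution: standard bivariate normal with independent components.
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]])

def D(x, y):
    """Bivariate distribution function D(x, y) = P(X <= x, Y <= y)."""
    return mvn.cdf([x, y])

a, b, c, d = -1.0, 1.0, -1.0, 1.0
# Equation (11): four CDF evaluations replace the double integral of (10).
prob = D(b, d) - D(a, d) - D(b, c) + D(a, c)
print(prob)   # ~0.4661 for this example distribution
```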
Given a continuous $P(x)$, assume you wish to generate numbers distributed as $P(x)$ using
a random number generator. If the random number generator yields a uniformly
distributed value $y_i$ in $[0, 1]$ for each trial $i$, then compute the distribution function

$D(x) = \int P(x)\, dx.$   (12)

The formula connecting $y_i$ with a variable distributed as $P(x)$ is then

$x_i = D^{-1}(y_i),$   (13)

where $D^{-1}(x)$ is the inverse function of $D(x)$. For example, if $P(x)$ were a normal
distribution so that

$D(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right],$   (14)

then

$x_i = \mu + \sqrt{2}\,\sigma\,\operatorname{erf}^{-1}(2 y_i - 1).$   (15)
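Equations (12)-(15) describe inverse transform sampling. The sketch below, assuming NumPy and SciPy are available, applies equation (15) directly to convert uniform variates into normally distributed ones.

```python
import numpy as np
from scipy.special import erfinv

def sample_normal(mu, sigma, n, seed=None):
    """Draw n values from a normal distribution via equation (15)."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(0.0, 1.0, size=n)   # uniform variates y_i (a draw of exactly
                                        # 0 would map to -infinity; ignored here)
    return mu + np.sqrt(2.0) * sigma * erfinv(2.0 * y - 1.0)

samples = sample_normal(0.0, 1.0, 10_000, seed=0)
print(samples.mean(), samples.std())    # close to 0 and 1, as expected
```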
A distribution with constant variance of $y$ for all values of $x$ is known as a homoscedastic
distribution. The method of finding the value at which the distribution is a maximum is
known as the maximum likelihood method.
II. Definition:
In probability theory and statistics, the cumulative distribution function (CDF), or just
distribution function, completely describes the probability distribution of a real-valued
random variable X. Cumulative distribution functions are also used to specify the
distribution of multivariate random variables.
For every real number x, the CDF of a real-valued random variable X is given by

$F_X(x) = P(X \le x),$

where the right-hand side represents the probability that the random variable X takes on a
value less than or equal to x. The probability that X lies in the interval (a, b] is therefore
FX(b) − FX(a) if a < b.
When treating several random variables X, Y, ..., the corresponding letters are used as
subscripts; when treating only one, the subscript is omitted. It is conventional to use a
capital F for a cumulative distribution function, in contrast to the lower-case f used for
probability density functions and probability mass functions. This applies when
discussing general distributions: some specific distributions have their own conventional
notation, for example the normal distribution.
The CDF of X can be defined in terms of the probability density function f as follows:

$F_X(x) = \int_{-\infty}^{x} f(t)\, dt.$
Note that in the definition above, the "less than or equal to" sign, "≤", is a convention, not
a universally used one (e.g., Hungarian literature uses "<"), but the distinction is important
for discrete distributions. The proper use of tables of the binomial and Poisson
distributions depends upon this convention. Moreover, important formulas like Lévy's
inversion formula for the characteristic function also rely on the "less than or equal"
formulation.
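Returning to the integral definition of the CDF above, a quick numerical check (assuming SciPy is available) is to integrate a density from $-\infty$ to x and compare the result with a direct CDF evaluation, here for the standard normal:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

x = 1.5
# F(x) = integral of the density f(t) from -infinity to x
integral, err = quad(norm.pdf, -np.inf, x)
print(integral)      # ~0.9332
print(norm.cdf(x))   # SciPy's direct CDF evaluation agrees
```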
III. Definition of Terms:
Probability Distribution
The probability distribution of a discrete random variable is a list of probabilities
associated with each of its possible values. It is also sometimes called the probability
function or the probability mass function.
More formally, the probability distribution of a discrete random variable X is a function which
gives the probability $p(x_i)$ that the random variable equals $x_i$, for each value $x_i$:

$p(x_i) = P(X = x_i)$

It satisfies the following conditions:
a. $0 \le p(x_i) \le 1$ for each $x_i$;
b. $\sum_i p(x_i) = 1$.
Cumulative Distribution Function
All random variables (discrete and continuous) have a cumulative distribution function. It
is a function giving the probability that the random variable X is less than or equal to x,
for every value x.
Formally, the cumulative distribution function F(x) is defined to be:

$F(x) = P(X \le x)$ for $-\infty < x < \infty.$
For a discrete random variable, the cumulative distribution function is found by summing
up the probabilities as in the example below.
For a continuous random variable, the cumulative distribution function is the integral of
its probability density function.
Probability Density Function
The probability density function of a continuous random variable is a function which can
be integrated to obtain the probability that the random variable takes a value in a given
interval.
More formally, the probability density function, f(x), of a continuous random variable X is the
derivative of the cumulative distribution function F(x):

$f(x) = \frac{d}{dx} F(x).$

Since $F(x) = \int_{-\infty}^{x} f(t)\, dt$, it follows that:

$P(a \le X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\, dx.$
If f(x) is a probability density function then it must obey two conditions:
a. that the total probability for all possible values of the continuous random variable X is 1: $\int_{-\infty}^{\infty} f(x)\, dx = 1$;
b. that the probability density function can never be negative: $f(x) \ge 0$ for all x.
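These two conditions can be checked numerically; the sketch below, assuming SciPy is available, uses the hypothetical density f(x) = 2x on [0, 1].

```python
from scipy.integrate import quad

def f(x):
    """Hypothetical pdf: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

total, _ = quad(f, 0.0, 1.0)                          # condition (a): total probability
print(total)                                          # 1.0
print(all(f(x) >= 0 for x in (0.0, 0.25, 0.5, 1.0)))  # condition (b) spot-check: True
```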
IV. Examples:
1. Discrete case: Suppose a random variable X has the following probability distribution
$p(x_i)$:

x_i:     0     1     2     3     4     5
p(x_i):  1/32  5/32  10/32 10/32 5/32  1/32
This is actually a binomial distribution: Bi(5, 0.5) or B(5, 0.5). The cumulative
distribution function F(x) is then:

x_i:     0     1     2     3     4     5
F(x_i):  1/32  6/32  16/32 26/32 31/32 32/32
F(x) does not change at intermediate values. For example:
F(1.3) = F(1) = 6/32
F(2.86) = F(2) = 16/32
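This table can be reproduced with a few lines of Python using only the standard library; the running sum gives the CDF, and its step-function behavior explains why F(1.3) = F(1).

```python
from math import comb

n, p = 5, 0.5
pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
cdf = []
running = 0.0
for q in pmf:
    running += q        # CDF = running sum of the pmf
    cdf.append(running)

print(pmf)      # [1/32, 5/32, 10/32, 10/32, 5/32, 1/32]
print(cdf)      # [1/32, 6/32, 16/32, 26/32, 31/32, 32/32]
print(cdf[1])   # 0.1875 = 6/32, so F(1.3) = F(1)
```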
2. The binomial distribution Bi(n, p) has the discrete probability density function

$p(r) = \binom{n}{r} p^r (1 - p)^{n - r}, \quad r = 0, 1, \ldots, n.$

3. This discrete probability density function models experiments that have only two possible
outcomes. The probability of success is p and the probability of failure is q = 1 - p. The pdf
models the probability that we will observe r successes and n - r failures in a total of n trials.
Figure 1: Graph of the probability distribution function and the cumulative probability
distribution function (redrawn from cne.gmu.edu/modules/dau/prob/distributions/dis_1_frm.html).
4. Suppose that two fair dice are tossed. This time, let the random variable X denote the
sum of the points. What is the sample space and what is the probability distribution for
this experiment? In the sample space below, the first number of the ordered pair is the
number showing on the first die, and the second number is the number showing on the
second die. Notice that there are thirty-six possible results, so the sample space has
thirty-six elements.
(1,6) (2,6) (3,6) (4,6) (5,6) (6,6)
(1,5) (2,5) (3,5) (4,5) (5,5) (6,5)
(1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
(1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
(1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
(1,1) (2,1) (3,1) (4,1) (5,1) (6,1)

Table 1: Sample space (redrawn from
cne.gmu.edu/modules/dau/prob/distributions/dis_1_frm.html, example 2, probability
distributions)
5. In the probability distribution table below, X is the sum of the two numbers showing on
the dice. If X = 2, the number showing on the first die must be one, and the number showing
on the second die must also be one. The distribution table shows there is only one chance
out of thirty-six that both dice show one. When X = 3, the first die shows 1 and the second
die shows 2, or vice versa. Thus there are two chances in thirty-six of this happening.
x:     2     3     4     5     6     7     8     9     10    11    12
f(x):  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Table 2: Probability distribution table (redrawn from
cne.gmu.edu/modules/dau/prob/distributions/dis_1_frm.html, example 2, probability
distributions)
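Tables 1 and 2 can be verified by brute-force enumeration; the standard-library Python sketch below lists the 36 equally likely outcomes and tallies the sums.

```python
from collections import Counter
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))   # the 36 ordered pairs of Table 1
counts = Counter(a + b for a, b in sample_space)      # tally each possible sum X

for x in range(2, 13):
    print(x, f"{counts[x]}/36")                       # matches Table 2, e.g. 7 -> 6/36
```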
V. Summary:
The distribution function completely describes the probability distribution of a real-valued
random variable X.
It is commonly presented through two related functions: the cumulative distribution
function and the probability density function. All random variables (discrete and
continuous) have a cumulative distribution function, which gives the probability that the
random variable X is less than or equal to x for every value x, while the probability
density function of a continuous random variable is a function that can be integrated to
obtain the probability that the random variable takes a value in a given interval.