UNIT-1
Random variables:
Random variables: discrete and continuous.
Probability distributions: discrete (Binomial distribution, Poisson distribution) and continuous (Normal distribution).
Related problems: mean, variance, standard deviation.
In a random experiment the outcomes are governed by a chance mechanism, and the sample space S consists of all outcomes of the experiment. When the elements of the sample space are non-numeric, they can be quantified by assigning a real number to every element of the sample space. This assignment rule is known as a random variable.
Random Variable:
A random variable X on a sample space S is a function X : S → R from S to the set of real numbers which assigns a real number X(s) to each sample point s of S.
[Diagram: each sample point s in S is mapped to a real number X(s) in R.]
[i.e., the pre-image of every element of R is an event of S]
Range:
The range space Rx is the set of all possible values of X: Rx ⊆ R.
Note: Although X is called a random variable, it is in fact a single-valued function.
X denotes the random variable and x denotes one of its values.
Discrete Random Variable:
A random variable X is said to be a discrete random variable if its set of all possible outcomes, the sample space S, is countable (finite or countably infinite).
Counting problems give rise to discrete random variables.
Continuous Random variable:
A random variable X is said to be a continuous random variable if S contains an infinite number of values, equal to the number of points on a line segment, i.e., it takes all possible values in an interval.
(An interval contains an uncountable number of possible values.)
Ex: Consider the experiment of tossing a coin twice.
Sample space S={HH,HT,TH,TT}. Define X:S→R by
X(s) = number of heads in s.
X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0.
Range of X = {X(s) : s ∈ S} = {0, 1, 2}.
2. Consider the experiment of throwing a pair of dice and noting the sum.
S = {(1,1), (1,2), ..., (6,6)}
Then the random variable for this experiment is defined as
X : S → R by X((i, j)) = i + j for all (i, j) ∈ S.
{1. If X and Y are two random variables defined on S and a and b are two real numbers, then:
(i) aX + bY is also a random variable. In particular, X − Y is also a random variable.
(ii) XY is also a random variable.
(iii) If X(s) ≠ 0 for all s ∈ S, then 1/X is also a random variable.}
If in a random experiment the event corresponding to a number a occurs, then the corresponding random variable X assumes the value a, and the probability of that event is denoted by P(X = a). Similarly, the probability of the event that X assumes any value in the interval a < x < b is written as P(a < X < b). The probability of the event X ≤ c is written as P(X ≤ c).
Note that more than one random variable can be defined on a sample space.
Discrete random variable:
If we can count the possible values of X, then X is discrete.
Ex: The random variable X = the sum of the dots on two dice is discrete.
X can assume the values 2, 3, 4, ..., 12.
Continuous:
In an interval of real numbers there are an infinite number of possible values.
Ex: The random variable X = the time at which an athlete crosses the winning line.
[Probability density function:
The p.d.f. of a random variable X, denoted by f(x), has the following properties:
1. f(x) ≥ 0
2. ∫ f(x) dx = 1, the integral being taken over the whole range of X
3. P(E) = ∫_E f(x) dx, where E is any event.
Note: P(E) = 0 does not imply that E is the null or impossible event.]
Probability distribution or distribution:
The probability distribution or distribution f(x) of a random variable X is a description of the set of possible values of X (the range of X), along with the probability associated with each value of x.
Ex: Let X = the number of heads in tossing two coins.
X = x:            0      1            2
f(x) = P(X = x):  1/4    2/4 = 1/2    1/4
Cumulative distribution function:
The cumulative distribution function of a random variable X is defined by F(x) = P(X ≤ x), where x is any real number.
Properties:
1. If a < b, then P(a < X ≤ b) = F(b) − F(a).
2. P(a ≤ X ≤ b) = P(X = a) + F(b) − F(a).
3. P(a < X < b) = F(b) − F(a) − P(X = b).
4. P(a ≤ X < b) = F(b) − F(a) − P(X = b) + P(X = a).
According to the type of random variables, we have two
types of probability distributions.
1 Discrete probability distribution
2. Continuous probability distribution.
Discrete Probability distribution:
Let X be a discrete random variable. The discrete probability distribution function f(x) for X is given by
f(x) = P(X = x) for each x, or f(xi) = P(X = xi), i = 1, 2, ...
satisfying the properties
1. f(xi) ≥ 0 for all i
2. ∑ f(xi) = 1, the sum being taken over all xi in the range of X.
The discrete probability function is also called the probability mass function.
Any function satisfying the above two properties will be a discrete probability function, or probability mass function.
X = x       x1    x2    x3    ...
P(X = x)    p1    p2    p3    ...
Ex: X = the sum of the numbers which turn up on tossing a pair of dice.
X = xi       2     3     4     5     6     7     8     9     10    11    12
P(X = xi)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
1. P(xi) ≥ 0 for all i
2. ∑ P(xi) = 1
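The table above can be checked by direct enumeration. Below is a minimal Python sketch (not part of the original notes) that builds the probability mass function of the dice sum and verifies the two properties.

from fractions import Fraction
from collections import Counter

# Enumerate all 36 equally likely outcomes of a pair of dice
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
counts = Counter(i + j for i, j in outcomes)

# Probability mass function: P(X = x) = count / 36
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}
for x, p in pmf.items():
    print(x, p)

# Check the two properties of a discrete probability function
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1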
X       x1      x2      ...     xn
P(x)    p(x1)   p(x2)   ...     p(xn)
1. P(X < xi) = p(x1) + p(x2) + ... + p(xi−1)
2. P(X ≤ xi) = p(x1) + p(x2) + ... + p(xi−1) + p(xi)
3. P(X > xi) = 1 − P(X ≤ xi)
Check whether the following can act as discrete probability functions.
1. f(x) = (x − 2)/2 for x = 1, 2, 3, 4
2. f(x) = x^2/25 for x = 0, 1, 2, 3, 4
(1) cannot, since f(x) < 0 for x = 1
(2) cannot, since ∑ f(x) ≠ 1
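A quick computational check of these two candidates, as a sketch (the functions below mirror the exercise as reconstructed above):

from fractions import Fraction

def is_pmf(values, f):
    # Check the two conditions of a discrete probability function
    probs = [f(x) for x in values]
    return all(p >= 0 for p in probs) and sum(probs) == 1

# 1. f(x) = (x - 2)/2 for x = 1, 2, 3, 4  -> fails: f(1) < 0
print(is_pmf([1, 2, 3, 4], lambda x: Fraction(x - 2, 2)))      # False
# 2. f(x) = x^2/25 for x = 0, 1, 2, 3, 4  -> fails: the sum is 30/25, not 1
print(is_pmf([0, 1, 2, 3, 4], lambda x: Fraction(x * x, 25)))  # False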
General properties:
Expectation, mean, variance & Standard deviation :
Expectation:
Let a random variable X assume the values x1, x2, ..., xn with respective probabilities p1, p2, ..., pn. Then the expectation of X, E(X), is defined as the sum of the products of the different values of x and the corresponding probabilities:
E(X) = ∑_{i=1}^{n} pi xi
Results:
1. If X is a random variable and k is a constant, then
a. E(X + k) = E(X) + k
b. E(kX) = k E(X)
2. If X and Y are two discrete random variables, then E(X + Y) = E(X) + E(Y).
Note:
1. E(X + Y + Z) = E(X) + E(Y) + E(Z)
2. E(aX + bY) = a E(X) + b E(Y)
3. E(XYZ) = E(X) E(Y) E(Z), provided X, Y and Z are independent.
Mean:
The mean value μ of a discrete distribution function is given by
μ = ∑_{i=1}^{n} pi xi = E(X)
Variance: The variance of a discrete distribution function is given by
σ² = ∑_{i=1}^{n} pi xi² − μ²
Standard deviation:
It is nothing but the positive square root of the variance.
σ = sqrt( ∑_{i=1}^{n} pi xi² − μ² )
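As an illustration, here is a small Python sketch (not from the notes) that evaluates these three formulas for the dice-sum distribution tabulated earlier.

from fractions import Fraction
from math import sqrt

# Dice-sum distribution: P(X = x) for x = 2, ..., 12
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

mean = sum(p * x for x, p in pmf.items())            # mu = sum of p_i x_i
variance = sum(p * x * x for x, p in pmf.items()) - mean ** 2
std_dev = sqrt(variance)                             # positive square root

print(mean)      # 7
print(variance)  # 35/6
print(std_dev)   # about 2.415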
Continuous Probability distribution:
When a random variable X takes every value in an interval, it is called a continuous random variable.
Ex: temperature, heights and weights.
A continuous random variable gives rise to a curve that can be used to calculate probabilities via areas.
Def: Let X be a continuous random variable. A function f(x) is said to be a continuous probability function of X if for any interval [a, b], a, b ∈ R,
1. f(x) ≥ 0
2. ∫_{−∞}^{∞} f(x) dx = 1, i.e., the total area bounded by the graph of f(x) and the horizontal axis is 1
3. P(a ≤ X ≤ b) = ∫_a^b f(t) dt, where a and b are any two values of x satisfying a < b.
The continuous probability function f(x) is also called the probability density function.
Def: The cumulative distribution function of a continuous random variable X is defined by
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
where f(x) is the continuous probability function.
By definition, f(x) = d/dx F(x).
General properties:
Let X be a random variable with p.d.f. f(x).
The mean of X:
μ = ∫_{−∞}^{∞} x f(x) dx
The variance of X, denoted by σ²:
σ² = ∫_{−∞}^{∞} (x − μ)² f(x) dx = ∫_{−∞}^{∞} x² f(x) dx − μ²
The standard deviation of X, denoted by σ:
σ = sqrt( ∫_{−∞}^{∞} (x − μ)² f(x) dx )
Results:
If X is a continuous random variable and Y = aX + b, then
E(Y) = a E(X) + b
Var(Y) = a² Var(X)
Var(X + k) = Var(X)
Var(kX) = k² Var(X)
Median:
In the case of a continuous distribution, the median is the point which divides the entire distribution into two equal parts. If X is defined from a to b and M is the median, then
∫_a^M f(x) dx = ∫_M^b f(x) dx = 1/2
Mode:
The mode is the value of x for which f(x) is maximum. The mode is given by
f′(x) = 0 and f″(x) < 0 for a < x < b.
CHEBYSHEV’S INEQUALITY
"The probability that the outcome of an experiment with the random variable X will fall more than k standard deviations beyond the mean μ of X is at most 1/k²":
P(|X − μ| ≥ kσ) ≤ 1/k².
Or: "The proportion of the total area under the pdf of X outside of k standard deviations from the mean is at most 1/k²."
Proof
Let S be the sample space for a random variable X, and let f(x) stand for the pdf of X. Partition S into R1, the set of points where |x − μ| ≥ kσ, and R2, the set of points where |x − μ| < kσ. Then
σ² = ∫_S (x − μ)² f(x) dx = ∫_{R1} (x − μ)² f(x) dx + ∫_{R2} (x − μ)² f(x) dx.
Clearly
σ² ≥ ∫_{R1} (x − μ)² f(x) dx,
since the non-negative term over R2 has been dropped from the right-hand side.
For any sample point in R1 we have |x − μ| ≥ kσ, and so (x − μ)² ≥ k²σ²; squaring preserves the direction of the inequality because both sides are non-negative. Hence
∫_{R1} (x − μ)² f(x) dx ≥ ∫_{R1} k²σ² f(x) dx = k²σ² ∫_{R1} f(x) dx = k²σ² P(|X − μ| ≥ kσ),
so
σ² ≥ k²σ² P(|X − μ| ≥ kσ).
Dividing each side of the inequality by k²σ² results in
P(|X − μ| ≥ kσ) ≤ 1/k².
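The bound can be checked empirically. The following Python sketch (an illustration, not from the notes) estimates P(|X − μ| ≥ kσ) for an exponential random variable and compares it with 1/k².

import random

random.seed(0)
n = 100000
lam = 1.0                        # rate of an exponential distribution
samples = [random.expovariate(lam) for _ in range(n)]

mu = 1 / lam
sigma = 1 / lam                  # for the exponential, the s.d. equals the mean

for k in (2, 3, 4):
    tail = sum(abs(x - mu) >= k * sigma for x in samples) / n
    print(k, tail, 1 / k ** 2)   # the empirical tail stays below 1/k^2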
UNIT-II
Binomial distribution:
Binomial and Poisson distributions are related to discrete random variables, and the normal distribution is related to continuous random variables.
In many cases it is desirable to have situations called repeated trials. For this, we develop a model that is useful in representing the probability distributions pertaining to the number of occurrences of an event in repeated trials of an experiment.
The binomial distribution was discovered by James Bernoulli.
Bernoulli Trials:
If there are n trials of an experiment in which each trial has only two mutually exclusive outcomes, the trials are independent, and the probability of each outcome is the same in every trial, then they are called Bernoulli trials. Let us denote the two outcomes by success and failure.
The Bernoulli distribution is a discrete distribution having two possible outcomes, labelled by n = 0 and n = 1, in which n = 1 ("success") occurs with probability p and n = 0 ("failure") occurs with probability q = 1 − p, where 0 < p < 1. It therefore has probability density function
P(n) = p^n (1 − p)^(1 − n), n = 0, 1,
which can also be written P(0) = 1 − p, P(1) = p.
The binomial distribution
The binomial distribution is one of the discrete probability distributions. It is used when there are exactly two mutually exclusive outcomes of a trial. These outcomes are appropriately labeled success and failure. The binomial distribution is used to obtain the probability of observing r successes in n trials, with the probability of success on a single trial denoted by p.
Formula:
P(X = r) = nCr p^r (1 − p)^(n − r)
where,
n = number of trials,
r = number of successful events,
p = probability of success on a single trial,
nCr = n! / ( r! (n − r)! ),
1 − p = probability of failure.
Example: Toss a coin 12 times. What is the probability of getting exactly 7 heads?
Step 1: Here,
Number of trials n = 12
Number of successes r = 7 (since we define getting a head as success)
Probability of success on any single trial p = 0.5
Step 2: To calculate nCr the formula is used.
nCr = n! / ( r! (n − r)! )
= 12! / ( 7! (12 − 7)! )
= 12! / ( 7! 5! )
= ( 479001600 / 120 ) / 5040
= 3991680 / 5040
= 792
Step 3: Find p^r.
p^r = 0.5^7 = 0.0078125
Step 4: To find (1 − p)^(n − r), calculate 1 − p and n − r.
1 − p = 1 − 0.5 = 0.5
n − r = 12 − 7 = 5
Step 5: Find (1 − p)^(n − r).
(1 − p)^(n − r) = 0.5^5 = 0.03125
Step 6: Solve P(X = r) = nCr p^r (1 − p)^(n − r)
= 792 × 0.0078125 × 0.03125
= 0.193359375
The probability of getting exactly 7 heads is 0.19.
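The same calculation can be done in a few lines of Python; this is a sketch of the worked example above, using only the standard library.

from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) = nCr * p^r * (1 - p)^(n - r)
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Probability of exactly 7 heads in 12 tosses of a fair coin
print(binomial_pmf(7, 12, 0.5))   # 0.193359375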
Mean and variance
If X ~ B(n, p) (that is, X is a binomially distributed random variable), then the expected value of X is
E[X] = np
and the variance is
Var(X) = np(1 − p).
This fact is easily proven as follows. Suppose first that we have a single Bernoulli trial. There are two possible outcomes: 1 and 0, the first occurring with probability p and the second having probability 1 − p. The expected value in this trial will be equal to μ = 1 · p + 0 · (1 − p) = p. The variance in this trial is calculated similarly: σ² = (1 − p)² · p + (0 − p)² · (1 − p) = p(1 − p).
The generic binomial distribution is a sum of n independent Bernoulli trials. The mean and the variance of such distributions are equal to the sums of the means and variances of each individual trial:
μ = np, σ² = np(1 − p).
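A direct check of these formulas from the pmf, as a short Python sketch (the parameters are illustrative):

from math import comb

n, p = 12, 0.3   # example parameters, chosen for illustration
pmf = [comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]

mean = sum(r * q for r, q in enumerate(pmf))
variance = sum(r * r * q for r, q in enumerate(pmf)) - mean ** 2

print(mean, n * p)                # both 3.6
print(variance, n * p * (1 - p))  # both 2.52 (up to floating-point rounding)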
Mode and median
Usually the mode of a binomial B(n, p) distribution is equal to ⌊(n + 1)p⌋, where ⌊ ⌋ is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, then the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode is 0 or n correspondingly.
In general, there is no single formula to find the median of a binomial distribution, and it may even be non-unique. However, several special results have been established:
 If np is an integer, then the mean, median, and mode coincide.
 Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.
 A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.
 The median is unique and equal to m = round(np) in cases when either p ≤ 1 − ln 2 or p ≥ ln 2 or |m − np| ≤ min{p, 1 − p} (except for the case when p = ½ and n is odd). When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
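A quick numerical sanity check of the mode formula, as a hedged sketch (the parameters below are arbitrary illustrative values):

from math import comb, floor

def binomial_pmf(r, n, p):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 15, 0.4                             # example parameters for illustration
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

mode_from_pmf = max(range(n + 1), key=lambda r: pmf[r])
print(mode_from_pmf, floor((n + 1) * p))   # both 6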
Data sometimes arise as the number of occurrences of an event per unit time or space, e.g., the number of yeast cells per cm² on a microscope slide. Under certain conditions (see below), the random variable is said to follow a Poisson distribution, which, as a count, is a type of discrete distribution. Occurrences are sometimes called arrivals when they take place in a fixed time interval.
The Poisson distribution was discovered in 1838 by Simeon-Denis Poisson as an approximation to the binomial distribution, when the probability of success is small and the number of trials is large. The Poisson distribution is called the law of small numbers because Poisson events occur rarely even though there are many opportunities for these events to occur.
Poisson Experiment
The number of occurrences of an event per unit time or space will have a Poisson distribution if:
 the rate of occurrence is constant over time or space;
 past occurrences do not influence the likelihood of future occurrences;
 simultaneous occurrences are nearly impossible.
Poisson Distribution
The Poisson probability density function is given by
P(X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, ...
where λ is the mean of the Poisson random variable, i.e., the average number of occurrences of the event per unit of time or space. As such, λ is the rate of occurrence per unit time or space. For example, if one decay event of a radioactive substance occurs per second, then λ = 1 per second.
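A minimal Python sketch of this density (illustrative, not part of the original notes):

from math import exp, factorial

def poisson_pmf(x, lam):
    # P(X = x) = e^(-lambda) * lambda^x / x!
    return exp(-lam) * lam ** x / factorial(x)

# Example: average rate lambda = 3 occurrences per unit time
print(poisson_pmf(2, 3.0))                          # about 0.224
print(sum(poisson_pmf(x, 3.0) for x in range(50)))  # approximately 1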
Distributional Properties
If λ is a positive integer, the modes of the Poisson distribution are λ and λ − 1; for example, when λ = 3 the modes are 3 and 2. If λ is not an integer, then the mode is ⌊λ⌋, the largest integer less than or equal to λ.
The Poisson distribution with sufficiently large λ can be approximated by a normal distribution with mean λ and variance λ. The approximation is good if λ is large (greater than about 10) and a continuity correction is used, i.e., the Poisson probability P(X ≤ x), with x a non-negative integer, is computed from the normal distribution as P(Y ≤ x + 0.5). As λ increases, the Poisson distribution becomes more symmetric.
Poisson Probabilities
Poisson probabilities can be computed directly from the probability density function or from Poisson probability tables for certain values of λ. We start by verifying the rather surprising fact pointed out in the definition of the mode: P(X = λ) = P(X = λ − 1) when λ is a positive integer. Indeed,
P(X = λ) / P(X = λ − 1) = [e^(−λ) λ^λ / λ!] / [e^(−λ) λ^(λ−1) / (λ − 1)!] = λ / λ = 1.
Poisson Moments
The mean of a Poisson random variable is λ, i.e., E(X) = λ. The mean is a rate, e.g., a temporal rate for time events. For example, if 0.5 phone calls per hour are received on a home phone during the day, then the mean number of phone calls between 9 A.M. and 5 P.M. is 0.5 × 8 = 4.
The Poisson has the interesting property that the variance is also λ, i.e., Var(X) = λ. Thus, unlike the normal distribution, the variance of a Poisson random variable depends on the mean. Certain Poisson-like random variables are over-dispersed (variance greater than the mean) or under-dispersed (variance less than the mean). For example, the negative binomial can be viewed as an over-dispersed Poisson and, like the Poisson, is often used to model species abundances in ecology, i.e., certain species abundances are Poisson distributed and others are distributed as a negative binomial.
The mean, and hence the variance, can be estimated by the sample mean, which is the maximum likelihood estimator, i.e., if k1, k2, ..., kn are realizations of a Poisson experiment, the estimated mean is
λ̂ = (1/n) ∑ ki.
For example, suppose a supervisor wants to know the average number of typing mistakes his/her secretary makes per page. Ten pages are randomly selected and the following values were obtained:
2 1 3 1 3 3 3 2 3 1
Then λ̂ = 22/10 = 2.2.
The value of λ actually used to simulate the data was 2, and thus the estimate is reasonably close.
Poisson Approximation to the Binomial
Consider a binomial distribution consisting of n trials with probability of success p, i.e., X ~ Bin(n, p). If n is sufficiently large and p is sufficiently small, then the binomial probability
P(X = x) ≈ e^(−np) (np)^x / x!,
i.e., the binomial probability is approximately equal to the corresponding Poisson probability. Note that λ = np, the mean of the binomial distribution.
Thus, binomial probabilities, which are hard to compute for large n, can be approximated by corresponding Poisson probabilities. For example, suppose 10,000 soldiers are screened for a rare blood disease with a small probability p of a positive test. We want the probability that at least 10 soldiers test positive for the disease, i.e., we want P(X ≥ 10), where X ~ Bin(10000, p). This is difficult to compute using the binomial distribution, but much easier for the Poisson with λ = 10000p.
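The following Python sketch illustrates the approximation on this screening example; the disease probability p = 0.001 below is an assumed value chosen for illustration, since the original figure is not recoverable from the text.

from math import comb, exp, factorial

n, p = 10000, 0.001     # p is an assumed illustrative value
lam = n * p             # Poisson parameter, here 10

def binom_pmf(x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x):
    return exp(-lam) * lam ** x / factorial(x)

# P(X >= 10) = 1 - P(X <= 9) under each model
binom_tail = 1 - sum(binom_pmf(x) for x in range(10))
poisson_tail = 1 - sum(poisson_pmf(x) for x in range(10))
print(binom_tail, poisson_tail)   # both close to 0.542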
POISSON DISTRIBUTION
In probability theory and statistics, the Poisson
distribution is a discrete probability distribution
that expresses the probability of a number of events
occurring in a fixed period of time if these events
occur with a known average rate and independently
of the time since the last event. The Poisson
distribution can also be used for the number of
events in other specified intervals such as distance,
area or volume.
The distribution was discovered by Siméon-Denis
Poisson (1781–1840) and published, together with
his probability theory, in 1838 in his work
Recherches sur la probabilité des jugements en
matières criminelles et matière civile ("Research on
the Probability of Judgments in Criminal and Civil
Matters"). The work focused on certain random
variables N that count, among other things, a
number of discrete occurrences (sometimes called
"arrivals") that take place during a time-interval of
given length. If the expected number of occurrences
in this interval is λ, then the probability that there
are exactly k occurrences (k being a non-negative
integer, k = 0, 1, 2, ...) is equal to
f(k; λ) = λ^k e^(−λ) / k!,
where
 e is the base of the natural logarithm (e = 2.71828...)
 k is the number of occurrences of an event, the probability of which is given by the function f(k; λ)
 k! is the factorial of k
 λ is a positive real number, equal to the expected number of occurrences that occur during the given interval. For instance, if the events occur on average 4 times per minute, and you are interested in the number of events occurring in a 10 minute interval, you would use as a model a Poisson distribution with λ = 10 × 4 = 40.
As a function of k, this is the probability mass
function. The Poisson distribution can be derived as a
limiting case of the binomial distribution.
The Poisson distribution can be applied to systems
with a large number of possible events, each of
which is rare. A classic example is the nuclear decay
of atoms.
The Poisson distribution is sometimes called a
Poissonian, analogous to the term Gaussian for a
Gauss or normal distribution
Poisson noise and characterizing small
occurrences
The parameter λ is not only the mean number of occurrences, but also its variance. Thus, the number of observed occurrences fluctuates about its mean λ with a standard deviation of √λ. These fluctuations are denoted as Poisson noise or (particularly in electronics) as shot noise.
The correlation of the mean and standard deviation in counting independent, discrete occurrences is useful scientifically. By monitoring how the fluctuations vary with the mean signal, one can estimate the contribution of a single occurrence, even if that contribution is too small to be detected directly. For example, the charge e on an electron can be estimated by correlating the magnitude of an electric current with its shot noise. If N electrons pass a point in a given time t on the average, the mean current is I = eN / t; since the current fluctuations should be of the order of √N (the standard deviation of the Poisson-distributed count), the charge e can be estimated from the ratio of the variance of the fluctuations to the mean current. An everyday example is the
graininess that appears as photographs are enlarged; the graininess is due to Poisson fluctuations in
the number of reduced silver grains, not to the individual grains themselves. By correlating the
graininess with the degree of enlargement, one can estimate the contribution of an individual grain
(which is otherwise too small to be seen unaided). Many other molecular applications of Poisson
noise have been developed, e.g., estimating the number density of receptor molecules in a cell
membrane.
Related distributions
 If X1 ~ Pois(λ1) and X2 ~ Pois(λ2) are independent, then the difference Y = X1 − X2 follows a Skellam distribution.
 If X1 ~ Pois(λ1) and X2 ~ Pois(λ2) are independent, and Y = X1 + X2, then the distribution of X1 conditional on Y = y is a binomial. Specifically, X1 | (Y = y) ~ Binomial(y, λ1/(λ1 + λ2)). More generally, if X1, X2, ..., Xn are Poisson random variables with parameters λ1, λ2, ..., λn then Xi | (∑ Xj = y) ~ Binomial(y, λi/∑ λj).
 The Poisson distribution can be derived as a limiting case to the binomial distribution as the
number of trials goes to infinity and the expected number of successes remains fixed.
Therefore it can be used as an approximation of the binomial distribution if n is sufficiently
large and p is sufficiently small. There is a rule of thumb stating that the Poisson
distribution is a good approximation of the binomial distribution if n is at least 20 and p is
smaller than or equal to 0.05. According to this rule the approximation is excellent if n ≥
100 and np ≤ 10.[1]
 For sufficiently large values of λ, (say λ > 1000), the normal distribution with mean λ, and
variance λ, is an excellent approximation to the Poisson distribution. If λ is greater than
about 10, then the normal distribution is a good approximation if an appropriate continuity
correction is performed, i.e., P(X ≤ x), where (lower-case) x is a non-negative integer, is
replaced by P(X ≤ x + 0.5).
 If the number of arrivals in a given time follows the Poisson distribution, with mean = λ,
then the lengths of the inter-arrival times follow the Exponential distribution, with rate 1 /
λ.
Occurrence
The Poisson distribution arises in connection with Poisson processes. It applies to various
phenomena of discrete nature (that is, those that may happen 0, 1, 2, 3, ... times during a given
period of time or in a given area) whenever the probability of the phenomenon happening is
constant in time or space. Examples of events that may be modelled as a Poisson distribution
include:
 The number of soldiers killed by horse-kicks each year in each corps in the Prussian cavalry. This example was made famous by a book of Ladislaus Josephovich Bortkiewicz (1868–1931).
 The number of phone calls at a call center per minute.
 The number of times a web server is accessed per minute.
 The number of mutations in a given stretch of DNA after a certain amount of radiation.
[Note: the intervals between successive Poisson events follow the Exponential distribution, whose mean 1/λ is the reciprocal of the Poisson rate. Examples: the lifetime of a lightbulb, or the waiting time between buses.]
How does this distribution arise? — The
law of rare events
In several of the above examples—for example, the number of mutations in a given sequence of
DNA—the events being counted are actually the outcomes of discrete trials, and would more
precisely be modelled using the binomial distribution. However, the binomial distribution with
parameters n and λ/n, i.e., the probability distribution of the number of successes in n trials, with
probability λ/n of success on each trial, approaches the Poisson distribution with expected value λ
as n approaches infinity. This provides a means by which to approximate random variables using
the Poisson distribution rather than the more-cumbersome binomial distribution.
This limit is sometimes known as the law of rare events, since each of the individual Bernoulli events rarely triggers. The name may be misleading because the total count of success events in a Poisson process need not be rare if the parameter λ is not small. For example, the number of telephone calls to a busy switchboard in one hour follows a Poisson distribution with the events appearing frequent to the operator, but they are rare from the point of view of the average member of the population, who is very unlikely to make a call to that switchboard in that hour.
Here are the details. First, recall from calculus that
lim_{n→∞} (1 − λ/n)^n = e^(−λ).
Let p = λ/n. Then we have
P(X = k) = C(n, k) p^k (1 − p)^(n − k) = [n! / (k! (n − k)!)] (λ/n)^k (1 − λ/n)^(n − k)
= [n (n − 1) ⋯ (n − k + 1) / n^k] (λ^k / k!) (1 − λ/n)^n (1 − λ/n)^(−k).
(The factorial ratio n!/((n − k)! n^k) can also be handled with the Stirling formula; either way it tends to 1 as n → ∞, as does (1 − λ/n)^(−k).)
Consequently the limit of the distribution becomes
P(X = k) → λ^k e^(−λ) / k!,
which is the Poisson distribution.
More generally, whenever a sequence of binomial random variables with parameters n and pn is such that n pn → λ as n → ∞, the sequence converges in distribution to a Poisson random variable with mean λ (see, e.g., law of rare events).
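A numerical illustration of this limit (a sketch, not part of the source): the binomial pmf with p = λ/n approaches the Poisson pmf as n grows.

from math import comb, exp, factorial

lam, k = 4.0, 3   # compare P(X = 3) when the mean is 4

def binom(n):
    p = lam / n
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

poisson = exp(-lam) * lam ** k / factorial(k)
for n in (10, 100, 1000, 10000):
    print(n, binom(n))
print("Poisson limit:", poisson)   # about 0.1954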
Properties
 The expected value of a Poisson-distributed random variable is equal to λ and so is its variance. The higher moments of the Poisson distribution are Touchard polynomials in λ, whose coefficients have a combinatorial meaning. In fact, when the expected value of the Poisson distribution is 1, then Dobinski's formula says that the nth moment equals the number of partitions of a set of size n.
 The mode of a Poisson-distributed random variable with non-integer λ is equal to ⌊λ⌋, which is the largest integer less than or equal to λ. This is also written as floor(λ). When λ is a positive integer, the modes are λ and λ − 1.
 Sums of Poisson-distributed random variables: if Xi ~ Pois(λi) and the Xi are independent, then Y = ∑ Xi also follows a Poisson distribution whose parameter is the sum of the component parameters, Y ~ Pois(∑ λi).
 The moment-generating function of the Poisson distribution with expected value λ is E(e^(tX)) = exp(λ(e^t − 1)). All of the cumulants of the Poisson distribution are equal to the expected value λ. The nth factorial moment of the Poisson distribution is λ^n.
 The Poisson distributions are infinitely divisible probability distributions.
 The directed Kullback-Leibler divergence between Poi(λ0) and Poi(λ) is given by D(Poi(λ0) || Poi(λ)) = λ − λ0 + λ0 ln(λ0/λ).
Generating Poisson-distributed random variables
A simple way to generate random Poisson-distributed numbers is given by Knuth, see References
below.
algorithm poisson random number (Knuth):
    init:
        Let L ← e^(−λ), k ← 0 and p ← 1.
    do:
        k ← k + 1.
        Generate uniform random number u in [0,1] and let p ← p × u.
    while p ≥ L.
    return k − 1.
While simple, the complexity is linear in λ. There are many other algorithms to overcome this. Some
are given in Ahrens & Dieter, see References below.
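A direct Python translation of Knuth's algorithm, as a sketch:

import random
from math import exp

def poisson_knuth(lam):
    # Knuth's method: multiply uniforms until the product drops below e^(-lambda)
    L = exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p < L:
            return k - 1

random.seed(1)
samples = [poisson_knuth(4.0) for _ in range(100000)]
print(sum(samples) / len(samples))   # close to the mean lambda = 4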
Parameter estimation
Maximum likelihood
Given a sample of n measured values ki we wish to estimate the value of the parameter λ of the Poisson population from which the sample was drawn. To calculate the maximum likelihood value, we form the log-likelihood function
L(λ) = ln ∏_{i=1}^{n} f(ki; λ) = ∑_{i=1}^{n} ( −λ + ki ln λ − ln(ki!) ) = −nλ + (∑ ki) ln λ − ∑ ln(ki!).
Take the derivative of L with respect to λ and equate it to zero:
dL/dλ = −n + (1/λ) ∑ ki = 0.
Solving for λ yields the maximum-likelihood estimate of λ:
λ̂ = (1/n) ∑ ki.
Since each observation has expectation λ, so does this sample mean. Therefore it is an unbiased estimator of λ. It is also an efficient estimator, i.e. its estimation variance achieves the Cramér-Rao lower bound (CRLB).
Bayesian inference
In Bayesian inference, the conjugate prior for the rate parameter λ of the Poisson distribution is the Gamma distribution. Let
λ ~ Gamma(α, β)
denote that λ is distributed according to the Gamma density g parameterized in terms of a shape parameter α and an inverse scale parameter β:
g(λ; α, β) = β^α λ^(α−1) e^(−βλ) / Γ(α), for λ > 0.
Then, given the same sample of n measured values ki as before, and a prior of Gamma(α, β), the posterior distribution is
λ ~ Gamma(α + ∑ ki, β + n).
The posterior mean E[λ] = (α + ∑ ki)/(β + n) approaches the maximum likelihood estimate in the limit as α → 0, β → 0.
The posterior predictive distribution of additional data is a Gamma-Poisson (i.e. negative binomial) distribution.
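A small sketch of this conjugate update in Python; the prior hyperparameters below are assumed values chosen only for illustration.

def gamma_poisson_update(alpha, beta, counts):
    # Conjugate update: Gamma(alpha, beta) prior + Poisson counts
    # -> Gamma(alpha + sum(counts), beta + n) posterior
    return alpha + sum(counts), beta + len(counts)

counts = [2, 1, 3, 1, 3, 3, 3, 2, 3, 1]   # the typing-mistake data from earlier
alpha, beta = 1.0, 0.5                    # assumed prior hyperparameters

a_post, b_post = gamma_poisson_update(alpha, beta, counts)
print(a_post, b_post)       # 23.0, 10.5
print(a_post / b_post)      # posterior mean, about 2.19 (the MLE was 2.2)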
The "law of small numbers"
The word law is sometimes used as a synonym of probability distribution, and convergence in law
means convergence in distribution. Accordingly, the Poisson distribution is sometimes called the law
of small numbers because it is the probability distribution of the number of occurrences of an
event that happens rarely but has very many opportunities to happen. The Law of Small Numbers is
a book by Ladislaus Bortkiewicz about the Poisson distribution, published in 1898. Some historians
of mathematics have argued that the Poisson distribution should have been called the Bortkiewicz
distribution.[2]
See also
 Anscombe transform - a variance-stabilising transformation for the Poisson distribution
 Compound Poisson distribution
 Tweedie distributions
 Poisson process
 Poisson regression
 Poisson sampling
 Queueing theory
 Erlang distribution, which describes the waiting time until n events have occurred. For temporally distributed events, the Poisson distribution is the probability distribution of the number of events that would occur within a preset time, the Erlang distribution is the probability distribution of the amount of time until the nth event.
 Skellam distribution, the distribution of the difference of two Poisson variates, not necessarily from the same parent distribution.
 Incomplete gamma function, used to calculate the CDF.
 Dobinski's formula (on combinatorial interpretation of the moments of the Poisson distribution)
 Schwarz formula
 Robbins lemma, a lemma relevant to empirical Bayes methods relying on the Poisson distribution
 Coefficient of dispersion, a simple measure to assess whether observed events are close to Poisson
Examples for rare events:
1. Number of printing mistakes per page.
2. Number of accidents on a highway.
3. Number of bad cheques at a bank.
4. Number of blind persons.
5. Number of Nobel prize winners.
6. Number of Bharat Ratna awardees.
Chapter V: Normal Probability Distribution
The Normal Distribution is perhaps the most important model for studying quantitative
phenomena in the natural and behavioral sciences - this is due to the Central Limit
Theorem. Many numerical measurements (e.g., weight, time, etc.) can be well
approximated by the normal distribution.
The Standard Normal Distribution
The Standard Normal Distribution is the simplest version (zero mean, unit standard deviation) of the (General) Normal Distribution. Yet, it is perhaps the most frequently used version because many tables and computational resources are explicitly available for calculating probabilities.
Nonstandard Normal Distribution: Finding Probabilities
In practice, the mechanisms underlying natural phenomena may be unknown, yet the use
of the normal model can be theoretically justified in many situations to compute critical
and probability values for various processes.
Nonstandard Normal Distribution: Finding Scores (Critical Values)
In addition to being able to compute probability (p) values, we often need to estimate the
critical values of the Normal Distribution for a given p-value.
Chapter VI: Relations Between Distributions
In this chapter, we will explore the relationships between different distributions. This
knowledge will help us to compute difficult probabilities using reasonable
approximations and identify appropriate probability models, graphical and statistical
analysis tools for data interpretation. The complete list of all SOCR Distributions is
available here and the Distributome applet provides an interactive graphical interface for
exploring the relations between different distributions.
The Central Limit Theorem
The exploration of the relations between different distributions begins with the study of
the sampling distribution of the sample average. This will demonstrate the universally
important role of normal distribution.
Law of Large Numbers
Suppose the probability of observing a particular event in each experiment is p. If we repeat the same experiment over and over, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of experiments increases. Why is that, and why is this important?
Normal Distribution as Approximation to Binomial Distribution
Normal Distribution provides a valuable approximation to Binomial when the sample
sizes are large and the probability of successes and failures is not close to zero.
Poisson Approximation to Binomial Distribution
Poisson provides an approximation to Binomial Distribution when the sample sizes are
large and the probability of successes or failures is close to zero.
Binomial Approximation to Hypergeometric
Binomial Distribution is much simpler to compute, compared to Hypergeometric, and can
be used as an approximation when the population sizes are large (relative to the sample
size) and the probability of successes is not close to zero.
Normal Approximation to Poisson
The Poisson can be approximated fairly well by Normal Distribution when λ is large.
Normal distribution:
Normal distribution is one of the most widely used continuous probability distributions in applications of statistical methods. It is of tremendous importance in the analysis and evaluation of every aspect of experimental data in science and medicine.
Def: Normal distribution is the probability distribution of a continuous random variable X, known as a normal random variable or normal variate.
It is also called the Gaussian distribution.
Chief characteristics:
1. The graph of the normal distribution y = f(x) in the x-y plane is known as the normal curve.
2. The curve is bell shaped and symmetrical about the line x = μ.
3. The area under the normal curve is unity, i.e., it represents the total population.
4. Mean = median = mode = μ.
5. The curve is symmetrical about the line x = μ.
6. The x-axis is an asymptote to the curve, and the points of inflexion of the curve are at x = μ ± σ.
7. Since mean = μ, the line x = μ divides the total area into two equal parts.
8. No portion of the curve lies below the x-axis.
9. The probability that the normal variate X with mean μ and s.d. σ lies between μ − σ and μ + σ is about 0.6826; between μ − 2σ and μ + 2σ, about 0.9544; and between μ − 3σ and μ + 3σ, about 0.9973.
Importance and applications of normal distribution:
Normal distribution plays a very important role in statistical
theory because of the following reasons
1. Most of the distributions, for example binomial, Poisson etc., can be approximated by the normal distribution.
2. Since it is a limiting case of the binomial distribution for exceptionally large numbers, it is applicable to many applied problems in the kinetic theory of gases and fluctuations in the magnitude of an electric current.
3. If a variable is not normally distributed, it can sometimes be brought to normal form by a simple transformation of the variable.
4. The proofs of all the tests of significance in sampling are based on the fundamental assumption that the populations from which the samples have been drawn are normal.
5. Normal distribution finds large applications in statistical quality control.
6. Many of the distributions of sample statistics, such as the distributions of the sample mean, sample variance etc., tend to normality for large samples and as such they can best be studied with the help of the normal curve.
Area property:
By taking z = (x − μ)/σ, the standard normal curve is formed. The probability that the normal variate X with mean μ and s.d. σ lies between two specific values x1 and x2 (x1 < x2) can be obtained using the area under the standard normal curve between z1 = (x1 − μ)/σ and z2 = (x2 − μ)/σ.
Def: A normal random variable with mean 0 and variance 1 is called a standard normal variable. Its probability density function is given by
φ(z) = (1/√(2π)) e^(−z²/2), −∞ < z < ∞.
Def: The cumulative distribution function of a standard normal random variable is
Φ(z) = ∫_{−∞}^{z} φ(t) dt.
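Standard normal probabilities can be evaluated without tables using the error function; the sketch below (illustrative, not part of the notes) computes Φ(z) and the probability that a normal variate lies between two values.

from math import erf, sqrt

def std_normal_cdf(z):
    # Phi(z) expressed through the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(x1, x2, mu, sigma):
    # P(x1 < X < x2) for X ~ N(mu, sigma^2), via the standard normal curve
    z1, z2 = (x1 - mu) / sigma, (x2 - mu) / sigma
    return std_normal_cdf(z2) - std_normal_cdf(z1)

print(normal_prob(-1, 1, 0, 1))   # about 0.6827
print(normal_prob(-2, 2, 0, 1))   # about 0.9545
print(normal_prob(-3, 3, 0, 1))   # about 0.9973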
Normal approximation to the binomial distribution:
When n is very large, it is very difficult to calculate the probabilities by using the binomial distribution. The normal distribution is a limiting case of the binomial distribution under the following conditions:
(1) n, the number of trials, is very large (n → ∞);
(2) neither p nor q is very small.
For a binomial distribution, E(X) = np and Var(X) = npq.
Then the standard normal variate
Z = (X − μ)/σ = (X − np)/√(npq)
tends to the distribution of the standard normal variable given by
φ(z) = (1/√(2π)) e^(−z²/2), −∞ < z < ∞.
If p ≈ q and n is large, we can approximate the binomial curve by the normal curve. Here, for a binomial probability at x, the interval becomes (x − 1/2, x + 1/2) (continuity correction).
Note: If X is a Poisson variable with mean λ, then the standard normal variable is Z = (X − λ)/√λ and the probability can be calculated as explained above.
The Poisson distribution approaches the normal distribution as λ → ∞.
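The continuity correction is easy to check numerically; the following Python sketch compares an exact binomial probability with the normal approximation over the interval (x − 1/2, x + 1/2).

from math import comb, erf, sqrt

def std_normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, x = 100, 0.5, 55
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = comb(n, x) * p ** x * (1 - p) ** (n - x)
approx = std_normal_cdf((x + 0.5 - mu) / sigma) - std_normal_cdf((x - 0.5 - mu) / sigma)

print(exact)    # about 0.0485
print(approx)   # about 0.0484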
A uniform distribution, sometimes also known as a rectangular distribution, is a distribution that has constant probability.
The probability density function and cumulative distribution function for a continuous uniform distribution on the interval [a, b] are
P(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise,
D(x) = 0 for x < a, (x − a)/(b − a) for a ≤ x ≤ b, and 1 for x > b.
These can be written in terms of the Heaviside step function H(x) as
P(x) = [H(x − a) − H(x − b)] / (b − a),
D(x) = [(x − a) H(x − a) − (x − b) H(x − b)] / (b − a),
the latter of which simplifies to the expected D(x) = (x − a)/(b − a) for a ≤ x ≤ b.
The continuous distribution is implemented as UniformDistribution[a, b].
For a continuous uniform distribution, the characteristic function is
φ(t) = (e^(i t b) − e^(i t a)) / (i t (b − a)).
If a = −b, the characteristic function simplifies to
φ(t) = sin(b t) / (b t).
The moment-generating function is
M(t) = (e^(t b) − e^(t a)) / (t (b − a)).
The moment-generating function is not differentiable at zero, but the moments can be calculated by differentiating and then taking the limit t → 0. The raw moments are given analytically by
μ'_n = (b^(n+1) − a^(n+1)) / ((n + 1)(b − a)).
The first few are therefore given explicitly by
μ'_1 = (a + b)/2,
μ'_2 = (a² + a b + b²)/3,
μ'_3 = (a³ + a² b + a b² + b³)/4.
The central moments are given analytically by
μ_n = ((b − a)^n + (a − b)^n) / (2^(n+1) (n + 1)),
which is 0 for odd n and (b − a)^n / (2^n (n + 1)) for even n. The first few are therefore given explicitly by
μ_2 = (b − a)²/12,
μ_3 = 0,
μ_4 = (b − a)⁴/80.
The mean, variance, skewness, and kurtosis excess are therefore
mean = (a + b)/2,
variance = (b − a)²/12,
skewness = 0,
kurtosis excess = −6/5.
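These moment formulas can be sanity-checked by simulation; a brief Python sketch with assumed endpoints a = 2, b = 10:

import random
from statistics import fmean, pvariance

random.seed(0)
a, b = 2.0, 10.0
samples = [random.uniform(a, b) for _ in range(200000)]

print(fmean(samples), (a + b) / 2)            # both near 6
print(pvariance(samples), (b - a) ** 2 / 12)  # both near 5.33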
Exponential distribution
Exponential
parameters:   λ > 0 rate, or inverse scale
support:      x ∈ [0, ∞)
pdf:          λ e^(−λx)
cdf:          1 − e^(−λx)
mean:         λ^(−1)
median:       λ^(−1) ln 2
mode:         0
variance:     λ^(−2)
skewness:     2
ex. kurtosis: 6
entropy:      1 − ln(λ)
mgf:          λ/(λ − t), for t < λ
cf:           λ/(λ − i t)
Not to be confused with the exponential families of probability distributions.
In probability theory and statistics, the exponential distribution (a.k.a. negative
exponential distribution) is a family of continuous probability distributions. It describes
the time between events in a Poisson process, i.e. a process in which events occur
continuously and independently at a constant average rate.
Note that the exponential distribution is not the same as the class of exponential families
of distributions, which is a large class of probability distributions that includes the
exponential distribution as one of its members, but also includes the normal distribution,
binomial distribution, gamma distribution, Poisson, and many others.
Characterization
Probability density function
The probability density function (pdf) of an exponential distribution is
f(x; λ) = λ e^(−λx) for x ≥ 0, and f(x; λ) = 0 for x < 0.
Here λ > 0 is the parameter of the distribution, often called the rate parameter. The distribution is supported on the interval [0, ∞). If a random variable X has this distribution, we write X ~ Exp(λ).
Cumulative distribution function
The cumulative distribution function is given by
F(x; λ) = 1 − e^(−λx) for x ≥ 0, and F(x; λ) = 0 for x < 0.
Alternative parameterization
A commonly used alternative parameterization is to define the probability density function (pdf) of an exponential distribution as
f(x; β) = (1/β) e^(−x/β) for x ≥ 0,
where β > 0 is a scale parameter of the distribution and is the reciprocal of the rate
parameter, λ, defined above. In this specification, β is a survival parameter in the sense
that if a random variable X is the duration of time that a given biological or mechanical
system manages to survive and X ~ Exponential(β) then E[X] = β. That is to say, the
expected duration of survival of the system is β units of time. The parameterisation
involving the "rate" parameter arises in the context of events arriving at a rate λ, when the
time between events (which might be modelled using an exponential distribution) has a
mean of β = λ−1.
The alternative specification is sometimes more convenient than the one given above, and
some authors will use it as a standard definition. This alternative specification is not used
here. Unfortunately this gives rise to a notational ambiguity. In general, the reader must
check which of these two specifications is being used if an author writes
"X ~ Exponential(λ)", since either the notation in the previous (using λ) or the notation in
this section (here, using β to avoid confusion) could be intended.
Occurrence and applications
The exponential distribution occurs naturally when describing the lengths of the interarrival times in a homogeneous Poisson process.
The exponential distribution may be viewed as a continuous counterpart of the geometric
distribution, which describes the number of Bernoulli trials necessary for a discrete
process to change state. In contrast, the exponential distribution describes the time for a
continuous process to change state.
In real-world scenarios, the assumption of a constant rate (or probability per unit time) is
rarely satisfied. For example, the rate of incoming phone calls differs according to the
time of day. But if we focus on a time interval during which the rate is roughly constant,
such as from 2 to 4 p.m. during work days, the exponential distribution can be used as a
good approximate model for the time until the next phone call arrives. Similar caveats
apply to the following examples which yield approximately exponentially distributed
variables:
 The time until a radioactive particle decays, or the time between clicks of a geiger counter
 The time it takes before your next telephone call
 The time until default (on payment to company debt holders) in reduced form credit risk modeling
Exponential variables can also be used to model situations where certain events occur
with a constant probability per unit length, such as the distance between mutations on a
DNA strand, or between roadkills on a given road.[citation needed]
In queuing theory, the service times of agents in a system (e.g. how long it takes for a
bank teller etc. to serve a customer) are often modeled as exponentially distributed
variables. (The inter-arrival of customers for instance in a system is typically modeled by
the Poisson distribution in most management science textbooks.) The length of a process
that can be thought of as a sequence of several independent tasks is better modeled by a
variable following the Erlang distribution (which is the distribution of the sum of several
independent exponentially distributed variables).
Reliability theory and reliability engineering also make extensive use of the exponential
distribution. Because of the memoryless property of this distribution, it is well-suited to
model the constant hazard rate portion of the bathtub curve used in reliability theory. It is
also very convenient because it is so easy to add failure rates in a reliability model. The
exponential distribution is however not appropriate to model the overall lifetime of
organisms or technical devices, because the "failure rates" here are not constant: more
failures occur for very young and for very old systems.
In physics, if you observe a gas at a fixed temperature and pressure in a uniform
gravitational field, the heights of the various molecules also follow an approximate
exponential distribution. This is a consequence of the entropy property mentioned below.
Properties
Mean, variance, and median
The mean or expected value of an exponentially distributed random variable X with rate parameter λ is given by
E[X] = 1/λ.
In light of the examples given above, this makes sense: if you receive phone calls at an average rate of 2 per hour, then you can expect to wait half an hour for every call.
The variance of X is given by
Var(X) = 1/λ².
The median of X is given by
m(X) = ln(2)/λ,
where ln refers to the natural logarithm. Thus the absolute difference between the mean and median is
|E[X] − m(X)| = (1 − ln 2)/λ < 1/λ = standard deviation,
in accordance with the median-mean inequality.
Memorylessness
An important property of the exponential distribution is that it is memoryless. This means that if a random variable T is exponentially distributed, its conditional probability obeys
P(T > s + t | T > s) = P(T > t) for all s, t ≥ 0.
This says that the conditional probability that we need to wait, for example, more than
another 10 seconds before the first arrival, given that the first arrival has not yet happened
after 30 seconds, is equal to the initial probability that we need to wait more than 10
seconds for the first arrival. So, if we waited for 30 seconds and the first arrival didn't
happen (T > 30), probability that we'll need to wait another 10 seconds for the first arrival
(T > 30 + 10) is the same as the initial probability that we need to wait more than 10
seconds for the first arrival (T > 10). This is often misunderstood by students taking
courses on probability: the fact that Pr(T > 40 | T > 30) = Pr(T > 10) does not mean that
the events T > 40 and T > 30 are independent.
To summarize: "memorylessness" of the probability distribution of the waiting time T until the first arrival means
P(T > 40 | T > 30) = P(T > 10).
It does not mean
P(T > 40 | T > 30) = P(T > 40).
(That would be independence of the events T > 40 and T > 30. These two events are not independent.)
The exponential distributions and the geometric distributions are the only memoryless
probability distributions.
The exponential distribution is consequently also necessarily the only continuous
probability distribution that has a constant Failure rate.
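The memoryless property can be checked by simulation; below is a small Python sketch estimating both sides of the identity used in the example above (the rate is an assumed illustrative value).

import random

random.seed(0)
lam = 1 / 20.0                          # assumed rate: mean waiting time 20 seconds
samples = [random.expovariate(lam) for _ in range(500000)]

# P(T > 40 | T > 30) versus P(T > 10)
over_30 = [t for t in samples if t > 30]
cond = sum(t > 40 for t in over_30) / len(over_30)
uncond = sum(t > 10 for t in samples) / len(samples)
print(cond, uncond)   # both close to exp(-10 * lam) = 0.6065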
Quartiles
The quantile function (inverse cumulative distribution function) for Exponential(λ) is
F^(−1)(p; λ) = −ln(1 − p)/λ
for 0 ≤ p < 1. The quartiles are therefore:
first quartile:  ln(4/3)/λ
median:          ln(2)/λ
third quartile:  ln(4)/λ
Kullback–Leibler divergence
The directed Kullback–Leibler divergence between Exp(λ0) ('true' distribution) and Exp(λ) ('approximating' distribution) is given by
Δ(λ0 || λ) = ln(λ0/λ) + λ/λ0 − 1.
Maximum entropy distribution
Among all continuous probability distributions with support [0,∞) and mean μ, the
exponential distribution with λ = 1/μ has the largest entropy.
Distribution of the minimum of exponential random variables
Let X1, ..., Xn be independent exponentially distributed random variables with rate parameters λ1, ..., λn. Then
min{X1, ..., Xn}
is also exponentially distributed, with parameter
λ = λ1 + ... + λn.
This can be seen by considering the complementary cumulative distribution function:
P(min{X1, ..., Xn} > x) = P(X1 > x, ..., Xn > x) = ∏ P(Xi > x) = ∏ e^(−λi x) = e^(−(λ1 + ... + λn) x).
The index of the variable which achieves the minimum is distributed according to the law
P(Xk = min{X1, ..., Xn}) = λk / (λ1 + ... + λn).
Note that
max{X1, ..., Xn}
is not exponentially distributed.
Parameter estimation
Suppose a given variable is exponentially distributed and the rate parameter λ is to be
estimated.
Maximum likelihood
The likelihood function for λ, given an independent and identically distributed sample x = (x1, ..., xn) drawn from the variable, is
L(λ) = ∏_{i=1}^{n} λ e^(−λ xi) = λ^n e^(−λ n x̄),
where
x̄ = (1/n) ∑ xi
is the sample mean.
The derivative of the likelihood function's logarithm is
d/dλ ln L(λ) = n/λ − n x̄.
Consequently the maximum likelihood estimate for the rate parameter is
λ̂ = 1/x̄.
While this estimate is the most likely reconstruction of the true parameter λ, it is only an estimate, and as such, one can imagine that the more data points are available the better the estimate will be. It so happens that one can compute an exact confidence interval, that is, a confidence interval that is valid for all numbers of samples, not just large ones. The 100(1 − α)% exact confidence interval for this estimate is given by[1]
χ²_{2n; α/2} / (2 n x̄)  ≤  λ  ≤  χ²_{2n; 1−α/2} / (2 n x̄),
where λ̂ is the MLE estimate, λ is the true value of the parameter, and χ²_{k; x} is the value of the chi squared distribution with k degrees of freedom that gives x cumulative probability (i.e. the value found in chi-squared tables [1]).
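A short Python sketch of this estimator on simulated data (illustrative only):

import random
from statistics import fmean

random.seed(0)
true_rate = 2.5
data = [random.expovariate(true_rate) for _ in range(10000)]

rate_mle = 1 / fmean(data)     # lambda-hat = 1 / sample mean
print(rate_mle)                # close to 2.5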
Bayesian inference
The conjugate prior for the exponential distribution is the gamma distribution (of which the exponential distribution is a special case). The following parameterization of the gamma pdf is useful:
Gamma(λ; α, β) = (β^α / Γ(α)) λ^(α−1) e^(−λβ).
The posterior distribution p can then be expressed in terms of the likelihood function defined above and a gamma prior:
p(λ | x) ∝ L(λ) Gamma(λ; α, β) ∝ λ^n e^(−λ n x̄) λ^(α−1) e^(−λβ) = λ^(α+n−1) e^(−λ(β + n x̄)).
Now the posterior density p has been specified up to a missing normalizing constant. Since it has the form of a gamma pdf, this can easily be filled in, and one obtains
p(λ | x) = Gamma(λ; α + n, β + n x̄).
Here the parameter α can be interpreted as the number of prior observations, and β as the sum of the prior observations.
Prediction
Having observed a sample of n data points from an unknown exponential distribution a
common task is to use these samples to make predictions about future data from the same
source. A common predictive distribution over future samples is the so-called plug-in
distribution, formed by plugging a suitable estimate for the rate parameter λ into the
exponential density function. A common choice of estimate is the one provided by the
principle of maximum likelihood, and using this yields the predictive density over a
future sample xn+1, conditioned on the observed samples x = (x1, ..., xn), given by
p_ML(x_{n+1} | x) = λ̂ e^(−λ̂ x_{n+1}) = (1/x̄) e^(−x_{n+1}/x̄).
The Bayesian approach provides a predictive distribution which takes into account the
uncertainty of the estimated parameter, although this may depend crucially on the choice
of prior. A recent alternative that is free of the issues of choosing priors is the Conditional
Normalized Maximum Likelihood (CNML) predictive distribution [2]
The accuracy of a predictive distribution may be measured using the distance or
divergence between the true exponential distribution with rate parameter, λ0, and the
predictive distribution based on the sample x. The Kullback–Leibler divergence is a
commonly used, parameterisation free measure of the difference between two
distributions. Letting Δ(λ0||p) denote the Kullback–Leibler divergence between an
exponential with rate parameter λ0 and a predictive distribution p it can be shown that
where the expectation is taken with respect to the exponential distribution with rate
parameter λ0 ∈ (0, ∞), and ψ( · ) is the digamma function. It is clear that the CNML
predictive distribution is strictly superior to the maximum likelihood plug-in distribution
in terms of average Kullback–Leibler divergence for all sample sizes n > 0.
Generating exponential variates
A conceptually very simple method for generating exponential variates is based on inverse transform sampling: given a random variate U drawn from the uniform distribution on the unit interval (0, 1), the variate
T = F^(−1)(U)
has an exponential distribution, where F^(−1) is the quantile function, defined by
F^(−1)(p) = −ln(1 − p)/λ.
Moreover, if U is uniform on (0, 1), then so is 1 − U. This means one can generate exponential variates as follows:
T = −ln(U)/λ.
Other methods for generating exponential variates are discussed by Knuth[3] and
Devroye.[4]
The ziggurat algorithm is a fast method for generating exponential variates.
A fast method for generating a set of ready-ordered exponential variates without using a
sorting routine is also available.[4]
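A minimal sketch of the inverse transform method described above:

import random
from math import log
from statistics import fmean

def exponential_variate(lam):
    # Inverse transform sampling: T = -ln(U)/lambda with U uniform
    # use 1 - random.random(), which lies in (0, 1], to avoid log(0)
    return -log(1.0 - random.random()) / lam

random.seed(0)
lam = 0.5
samples = [exponential_variate(lam) for _ in range(100000)]
print(fmean(samples), 1 / lam)   # sample mean close to the theoretical mean 1/lambda = 2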
Related distributions
 An exponential distribution is a special case of a gamma distribution with α = 1 (or k = 1 depending on the parameter set used). Both an exponential distribution and a gamma distribution are special cases of the phase-type distribution.
 The exponential distribution is closely related to the double exponential distribution (a.k.a. the Laplace distribution), which is a shifted version of an exponential distribution applied to the absolute value of a quantity (graphically, two exponential distributions glued back-to-back).
 Y ∼ Pareto(xm, α), i.e. Y has a Pareto distribution, if Y = xm e^X and X ∼ Exponential(α).
 Y ∼ Weibull(γ, λ), i.e. Y has a Weibull distribution, if Y = X^(1/γ) and X ∼ Exponential(λ^(−γ)). In particular, every exponential distribution is also a Weibull distribution.
 Y ∼ Rayleigh(σ), i.e. Y has a Rayleigh distribution, if Y = σ√(2λX) and X ∼ Exponential(λ).
 Y ∼ Gumbel(μ, β), i.e. Y has a Gumbel distribution, if Y = μ − β log(λX) and X ∼ Exponential(λ).
 Y ∼ Laplace, i.e. Y has a Laplace distribution, if Y = X1 − X2 for two independent exponential distributions X1 and X2.
 Y ∼ Exponential, i.e. Y has an exponential distribution, if Y = min(X1, …, XN) for independent exponential distributions Xi.
 Y ∼ Uniform(0, 1), i.e. Y has a uniform distribution, if Y = exp(−λX) and X ∼ Exponential(λ).
 X ∼ χ²(2), i.e. X has a chi-square distribution with 2 degrees of freedom, if X ∼ Exponential(1/2).
 Let X1, …, Xn ∼ Exponential(λ) be exponentially distributed and independent and Y = ∑_{i=1}^{n} Xi. Then Y ∼ Gamma(n, 1/λ), i.e. Y has a Gamma distribution.
 If X ∼ SkewLogistic(θ), then log(1 + e^(−X)) ∼ Exponential(θ): see skew-logistic distribution.
 Let X ∼ Exponential(λX) and Y ∼ Exponential(λY) be independent. Then Z = λX X / (λY Y) has probability density function f(z) = 1/(1 + z)². This can be used to obtain a confidence interval for λX/λY.
Other related distributions:
 Hyper-exponential distribution – the distribution whose density is a weighted sum of exponential densities.
 Hypoexponential distribution – the distribution of a general sum of exponential random variables.
 exGaussian distribution – the sum of an exponential distribution and a normal distribution.
Independent events
The standard definition says:
Two events A and B are independent if and only if Pr(A ∩ B) = Pr(A)Pr(B).
Here A ∩ B is the intersection of A and B, that is, it is the event that both events A and B
occur.
More generally, any collection of events—possibly more than just two of them—are mutually independent if and only if for every finite subset A1, ..., An of the collection we have
Pr(A1 ∩ A2 ∩ ... ∩ An) = Pr(A1) Pr(A2) ... Pr(An).
Similarly, two random variables X and Y are independent if and only if for every a and b, the events {X ≤ a} and
{Y ≤ b} are independent events as defined above. Mathematically, this can be described
as follows:
The random variables X and Y with distribution functions FX(x) and FY(y), and probability densities fX(x) and fY(y), are independent if and only if the combined random variable (X, Y) has a joint cumulative distribution function
F(x, y) = FX(x) FY(y),
or equivalently, a joint density
f(x, y) = fX(x) fY(y).
Similar expressions characterise independence more generally for more than two random
variables.
An arbitrary collection of random variables – possibly more than just two of them — is
independent precisely if for any finite collection X1, ..., Xn and any finite set of numbers
a1, ..., an, the events {X1 ≤ a1}, ..., {Xn ≤ an} are independent events as defined above.
The measure-theoretically inclined may prefer to substitute events {X ∈ A} for events
{X ≤ a} in the above definition, where A is any Borel set. That definition is exactly
equivalent to the one above when the values of the random variables are real numbers. It
has the advantage of working also for complex-valued random variables or for random
variables taking values in any measurable space (which includes topological spaces
endowed by appropriate σ-algebras).
If any two of a collection of random variables are independent, they may nonetheless fail
to be mutually independent; this is called pairwise independence.
If X and Y are independent, then the expectation operator E has the property
E[X Y] = E[X] E[Y],
and for the variance we have
Var(X + Y) = Var(X) + Var(Y),
so the covariance cov(X, Y) is zero. (The converse of these, i.e. the proposition that if two
random variables have a covariance of 0 they must be independent, is not true. See
uncorrelated.)
Two independent random variables X and Y have the property that the characteristic
function of their sum is the product of their marginal characteristic functions:
φ_{X+Y}(t) = φ_X(t) φ_Y(t),
but the reverse implication is not true (see subindependence).
Independent σ-algebras
The definitions above are both generalized by the following definition of independence for σ-algebras. Let (Ω, Σ, Pr) be a probability space and let A and B be two sub-σ-algebras of Σ. A and B are said to be independent if, whenever A ∈ A and B ∈ B,
Pr(A ∩ B) = Pr(A) Pr(B).
The new definition relates to the previous ones very directly:
 Two events are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by an event E ∈ Σ is, by definition, σ(E) = {∅, E, Ω \ E, Ω}.
 Two random variables X and Y defined over Ω are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by a random variable X taking values in some measurable space S consists, by definition, of all subsets of Ω of the form X^(−1)(U), where U is any measurable subset of S.
Using this definition, it is easy to show that if X and Y are random variables and Y is Pr-almost surely constant, then X and Y are independent, since the σ-algebra generated by an almost surely constant random variable is the trivial σ-algebra {∅, Ω}.
Conditionally independent random variables
Main article: Conditional independence
Intuitively, two random variables X and Y are conditionally independent given Z if, once
Z is known, the value of Y does not add any additional information about X. For instance,
two measurements X and Y of the same underlying quantity Z are not independent, but
they are conditionally independent given Z (unless the errors in the two measurements
are somehow connected).
The formal definition of conditional independence is based on the idea of conditional
distributions. If X, Y, and Z are discrete random variables, then we define X and Y to be
conditionally independent given Z if
P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
for all x, y and z such that P(Z = z) > 0. On the other hand, if the random variables are continuous and have a joint probability density function p, then X and Y are conditionally independent given Z if
pXY|Z(x, y | z) = pX|Z(x | z) pY|Z(y | z)
for all real numbers x, y and z such that pZ(z) > 0.
If X and Y are conditionally independent given Z, then
P(X = x | Y = y, Z = z) = P(X = x | Z = z)
for any x, y and z with P(Z = z) > 0. That is, the conditional distribution for X given Y and
Z is the same as that given Z alone. A similar equation holds for the conditional
probability density functions in the continuous case.
Independence can be seen as a special kind of conditional independence, since
probability can be seen as a kind of conditional probability given no events.