UNIT-1: Random Variables

Topics: discrete and continuous random variables; discrete probability distributions (binomial, Poisson) and continuous distributions (normal); mean, variance, standard deviation; related problems.

In a random experiment the outcomes are governed by a chance mechanism, and the sample space S consists of all outcomes of the experiment. When the elements of the sample space are non-numeric, they can be quantified by assigning a real number to every event of the sample space. This assignment rule is known as a random variable.

Random variable: A random variable X on a sample space S is a function X : S -> R from S to the set of real numbers which assigns a real number X(s) to each sample point s of S. (The pre-image of every element of R is an event of S.)

Range: The range space R_X is the set of all possible values of X; R_X is a subset of R.

Note: Although X is called a random variable, it is in fact a single-valued function. X denotes the random variable and x denotes one of its values.

Discrete random variable: A random variable X is said to be discrete if its set of possible outcomes (the sample space S) is countable (finite or countably infinite). Counting problems give rise to discrete random variables.

Continuous random variable: A random variable X is said to be continuous if S contains infinitely many values, equal in number to the points on a line segment; i.e., X takes all possible values in an interval. (An interval contains an uncountable number of possible values.)

Ex 1: Consider the experiment of tossing a coin twice. The sample space is S = {HH, HT, TH, TT}. Define X : S -> R by X(s) = number of heads in s. Then X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0, so the range of X is {0, 1, 2}.

Ex 2: Consider the experiment of throwing a pair of dice and noting the sum. S = {(1,1), (1,2), ..., (6,6)}, and the random variable is X : S -> R defined by X(i, j) = i + j for (i, j) in S.

Results: If X and Y are two random variables defined on S and a, b are real numbers, then
(i) aX + bY is also a random variable (in particular X - Y is a random variable);
(ii) XY is also a random variable;
(iii) if X(s) is non-zero for every s in S, then 1/X is also a random variable.

If, in a random experiment, the event corresponding to a number a occurs, then the random variable X assumes the value a, and the probability of that event is denoted by P(X = a). Similarly, the probability that X assumes any value in an interval is written P(a < X < b), and the probability of the event X <= c is written P(X <= c). Note that more than one random variable can be defined on the same sample space.

Discrete: if we can count the possible values of X, then X is discrete. Ex: X = the sum of the dots on two dice is discrete; X can assume the values 2, 3, 4, ..., 12.

Continuous: in an interval of real numbers there are infinitely many possible values. Ex: X = the time at which an athlete crosses the finishing line.

Probability density function: The p.d.f. of a continuous random variable X, denoted f(x), has the following properties:
1. f(x) >= 0;
2. the integral of f(x) over the whole real line equals 1;
3. P(E) = integral of f(x) over E, where E is any event.
Note: P(E) = 0 does not imply that E is the null (impossible) event.
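A minimal Python sketch of Ex 1: the random variable is just a function on the sample space, and its probability distribution is obtained by counting equally likely sample points. The names S, X and pmf are illustrative choices, not part of the original text.

    from itertools import product
    from collections import Counter
    from fractions import Fraction

    # Sample space for tossing a coin twice; each outcome is equally likely.
    S = list(product("HT", repeat=2))     # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

    # The random variable X : S -> R, here X(s) = number of heads in s.
    def X(s):
        return s.count("H")

    # Probability of each value of X (its range is {0, 1, 2}).
    counts = Counter(X(s) for s in S)
    pmf = {x: Fraction(c, len(S)) for x, c in counts.items()}
    print(pmf)    # {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}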
Probability distribution: The probability distribution (or simply distribution) f(x) of a random variable X is a description of the set of possible values of X (the range of X) together with the probability associated with each value.

Ex: Let X = the number of heads in tossing two coins.

X = x        0      1      2
P(X = x)    1/4    1/2    1/4

Cumulative distribution function: The cumulative distribution function of a random variable X is defined by F(x) = P(X <= x), where x is any real number.

Properties:
1. If a < b then P(a < X <= b) = F(b) - F(a).
2. P(a <= X <= b) = P(X = a) + F(b) - F(a).
3. P(a < X < b) = F(b) - F(a) - P(X = b).
4. P(a <= X < b) = F(b) - F(a) + P(X = a) - P(X = b).

According to the type of random variable, there are two types of probability distributions: 1. discrete probability distributions; 2. continuous probability distributions.

Discrete probability distribution: Let X be a discrete random variable. The discrete probability function f(x) of X is given by f(x) = P(X = x), or f(x_i) = P(X = x_i) for i = 1, 2, ..., satisfying the properties
1. f(x_i) >= 0 for every i;
2. the sum of f(x_i) over all i equals 1.
The discrete probability function is also called the probability mass function; any function satisfying the two properties above is a discrete probability function (probability mass function). The distribution can be displayed as a table:

X = x       x1    x2    x3   ...
P(X = x)    p1    p2    p3   ...

Ex: X = the sum of the numbers which turn up on tossing a pair of dice.

X = xi        2     3     4     5     6     7     8     9    10    11    12
P(X = xi)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Here each P(xi) >= 0 and the probabilities sum to 1.

For a distribution with values x1, x2, ..., xn and probabilities p(x1), ..., p(xn):
1. P(X < xi) = p(x1) + p(x2) + ... + p(x_{i-1});
2. P(X <= xi) = p(x1) + p(x2) + ... + p(x_{i-1}) + p(xi);
3. P(X > xi) = 1 - P(X <= xi).

Check whether the following can act as discrete probability functions:
1. f(x) = (x - 2)/2 for x = 1, 2, 3, 4;
2. f(x) = x^2/25 for x = 0, 1, 2, 3, 4.
(1) cannot, since f(x) < 0 for x = 1. (2) cannot, since the values of f(x) do not sum to 1.

Expectation, mean, variance and standard deviation:

Expectation: Let a random variable X assume the values x1, x2, ..., xn with respective probabilities p1, p2, ..., pn. Then the expectation of X, E(X), is defined as the sum of the products of the values of X and the corresponding probabilities:
E(X) = p1 x1 + p2 x2 + ... + pn xn.

Results:
1. If X is a random variable and k is a constant, then (a) E(X + k) = E(X) + k and (b) E(kX) = k E(X).
2. If X and Y are two discrete random variables, then E(X + Y) = E(X) + E(Y).
Note: 1. E(X + Y + Z) = E(X) + E(Y) + E(Z). 2. E(aX + bY) = a E(X) + b E(Y). 3. If X, Y, Z are independent, E(XYZ) = E(X) E(Y) E(Z).

Mean: The mean mu of a discrete distribution is mu = sum of p_i x_i = E(X).

Variance: The variance of a discrete distribution is sigma^2 = (sum of p_i x_i^2) - mu^2 = E(X^2) - [E(X)]^2.

Standard deviation: the positive square root of the variance, sigma = sqrt((sum of p_i x_i^2) - mu^2).

Continuous probability distribution: When a random variable X takes every value in an interval, it is called a continuous random variable. Ex: temperature, heights, weights. A continuous random variable gives rise to a curve whose areas are used to calculate probabilities.

Def: Let X be a continuous random variable. A function f(x) is said to be the continuous probability function of X if
1. f(x) >= 0;
2. the integral of f(x) over the whole real line equals 1 (the total area bounded by the graph of f(x) and the horizontal axis is 1);
3. P(a <= X <= b) = integral of f(t) dt from a to b, for any two values a and b of X with a < b.
The continuous probability function f(x) is also called the probability density function.

Def: The cumulative distribution function of a continuous random variable X is defined by F(x) = P(X <= x) = integral of f(t) dt from minus infinity to x, where f(x) is the continuous probability function. By definition, f(x) = dF(x)/dx.
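A short Python sketch computing the mean and variance of the two-dice-sum distribution tabulated above, using the formulas E(X) = sum p_i x_i and Var(X) = E(X^2) - [E(X)]^2. The variable names are illustrative.

    from fractions import Fraction

    # P(X = x) for X = the sum of two fair dice, built by enumeration.
    pmf = {}
    for i in range(1, 7):
        for j in range(1, 7):
            s = i + j
            pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)

    mean = sum(x * p for x, p in pmf.items())               # E(X)
    second_moment = sum(x * x * p for x, p in pmf.items())  # E(X^2)
    variance = second_moment - mean ** 2                    # E(X^2) - [E(X)]^2
    print(mean, variance)    # 7 and 35/6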
General properties: Let X be a continuous random variable with p.d.f. f(x).
The mean of X: mu = integral of x f(x) dx over the whole real line.
The variance of X: sigma^2 = integral of (x - mu)^2 f(x) dx = (integral of x^2 f(x) dx) - mu^2.
The standard deviation of X: sigma = sqrt((integral of x^2 f(x) dx) - mu^2).

Results: If X is a continuous random variable and Y = aX + b, then E(Y) = a E(X) + b and Var(Y) = a^2 Var(X). Also Var(X + k) = Var(X) and Var(kX) = k^2 Var(X).

Median: In the case of a continuous distribution, the median is the point which divides the entire distribution into two equal parts. If X is defined from a to b and M is the median, then
integral of f(x) dx from a to M = integral of f(x) dx from M to b = 1/2.

Mode: The mode is the value of x for which f(x) is maximum. It is given by f'(x) = 0 and f''(x) < 0 for a <= x <= b.

CHEBYSHEV'S INEQUALITY: The probability that the outcome of an experiment with random variable X falls more than k standard deviations away from the mean mu is at most 1/k^2:
P(|X - mu| >= k sigma) <= 1/k^2.
Equivalently, the proportion of the total area under the p.d.f. of X lying outside k standard deviations from the mean is at most 1/k^2.

Proof: Let f be the p.d.f. of X. Split the variance integral according to whether |x - mu| < k sigma or |x - mu| >= k sigma; dropping the first (non-negative) part gives
sigma^2 = integral of (x - mu)^2 f(x) dx >= integral of (x - mu)^2 f(x) dx over {x : |x - mu| >= k sigma}.
For every x in that region, (x - mu)^2 >= k^2 sigma^2, so
sigma^2 >= k^2 sigma^2 * integral of f(x) dx over {x : |x - mu| >= k sigma} = k^2 sigma^2 P(|X - mu| >= k sigma).
Dividing both sides by k^2 sigma^2 gives P(|X - mu| >= k sigma) <= 1/k^2.
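A quick Monte Carlo check of Chebyshev's inequality. The choice of an exponential population (mean 1, standard deviation 1) and the seed are illustrative assumptions only; the bound holds for any distribution with finite variance.

    import random

    # Check P(|X - mu| >= k*sigma) <= 1/k^2 empirically for an Exp(1) population.
    random.seed(0)
    mu, sigma = 1.0, 1.0
    sample = [random.expovariate(1.0) for _ in range(100_000)]

    for k in (2, 3, 4):
        tail = sum(abs(x - mu) >= k * sigma for x in sample) / len(sample)
        print(k, round(tail, 4), "<=", round(1 / k**2, 4))   # observed tail vs bound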
UNIT-II: Binomial distribution

Binomial and Poisson distributions are related to discrete random variables, while the normal distribution is related to continuous random variables. In many cases it is desirable to analyse situations involving repeated trials. For this we develop a model that is useful in representing the probability distribution of the number of occurrences of an event in repeated trials of an experiment. The binomial distribution was discovered by James Bernoulli.

Bernoulli trials: If there are n trials of an experiment in which each trial has only two mutually exclusive outcomes, the trials are independent, and the probability of each outcome is the same in every trial, then the trials are called Bernoulli trials. We denote the two outcomes by success and failure.

The Bernoulli distribution is a discrete distribution having two possible outcomes, labelled x = 1 ("success"), which occurs with probability p, and x = 0 ("failure"), which occurs with probability q = 1 - p, where 0 < p < 1. Its probability function is
P(X = x) = p^x (1 - p)^(1 - x),  x = 0, 1,
which can also be written out as P(X = 1) = p and P(X = 0) = 1 - p.

The binomial distribution: The binomial distribution is a discrete probability distribution. It is used when there are exactly two mutually exclusive outcomes of a trial, appropriately labelled success and failure. The binomial distribution gives the probability of observing r successes in n trials, with the probability of success on a single trial denoted by p:

P(X = r) = nCr p^r (1 - p)^(n - r),

where
n = number of trials,
r = number of successful events,
p = probability of success on a single trial,
nCr = n! / (r! (n - r)!),
1 - p = probability of failure.

Example: A coin is tossed 12 times. What is the probability of getting exactly 7 heads?
Step 1: Here the number of trials is n = 12, the number of successes is r = 7 (getting a head is defined as success), and the probability of success on any single trial is p = 0.5.
Step 2: Calculate nCr: nCr = n! / (r! (n - r)!) = 12! / (7! 5!) = (479001600 / 120) / 5040 = 3991680 / 5040 = 792.
Step 3: Find p^r: 0.5^7 = 0.0078125.
Step 4: Calculate 1 - p and n - r: 1 - p = 1 - 0.5 = 0.5 and n - r = 12 - 7 = 5.
Step 5: Find (1 - p)^(n - r): 0.5^5 = 0.03125.
Step 6: Solve P(X = r) = nCr p^r (1 - p)^(n - r) = 792 x 0.0078125 x 0.03125 = 0.193359375.
The probability of getting exactly 7 heads is about 0.19.

Mean and variance: If X ~ B(n, p) (that is, X is a binomially distributed random variable), then the expected value of X is np and the variance is np(1 - p). This fact is easily proven as follows. Suppose first that we have a single Bernoulli trial. There are two possible outcomes, 1 and 0, the first occurring with probability p and the second with probability 1 - p. The expected value in this trial is mu = 1*p + 0*(1 - p) = p. The variance is calculated similarly: sigma^2 = (1 - p)^2 * p + (0 - p)^2 * (1 - p) = p(1 - p). The generic binomial random variable is a sum of n independent Bernoulli trials, and its mean and variance are the sums of the means and variances of the individual trials: mu = np and sigma^2 = np(1 - p).

Mode and median: Usually the mode of a binomial B(n, p) distribution is equal to floor((n + 1)p). However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes, (n + 1)p and (n + 1)p - 1. When p equals 0 or 1, the mode is 0 or n respectively.

In general there is no single formula for the median of a binomial distribution, and it may even be non-unique. However, several special results have been established:
1. If np is an integer, then the mean, median and mode coincide.
2. Any median m must lie within the interval floor(np) <= m <= ceil(np).
3. A median m cannot lie too far away from the mean: |m - np| <= min{ln 2, max{p, 1 - p}}.
4. The median is unique and equal to m = round(np) when either p <= 1 - ln 2 or p >= ln 2 or |m - np| <= min{p, 1 - p} (except for the case when p = 1/2 and n is odd).
5. When p = 1/2 and n is odd, any number m in the interval (n - 1)/2 <= m <= (n + 1)/2 is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
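A Python sketch reproducing the worked example above and the mean and variance formulas for B(n, p).

    from math import comb

    # Recompute the worked example: P(X = 7) for n = 12 tosses of a fair coin.
    n, r, p = 12, 7, 0.5
    prob = comb(n, r) * p**r * (1 - p)**(n - r)
    print(prob)                      # 0.193359375

    # Mean and variance of B(n, p): np and np(1 - p).
    print(n * p, n * p * (1 - p))    # 6.0 and 3.0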
Poisson distribution: Data sometimes arise as the number of occurrences of an event per unit time or space, e.g. the number of yeast cells per cm^2 on a microscope slide. Under certain conditions (see below), such a count is said to follow a Poisson distribution, which is a type of discrete distribution. Occurrences are sometimes called arrivals when they take place in a fixed time interval. The Poisson distribution was discovered in 1838 by Simeon-Denis Poisson as an approximation to the binomial distribution when the probability of success is small and the number of trials is large. The Poisson distribution is called the law of small numbers because Poisson events occur rarely even though there are many opportunities for these events to occur.

Poisson experiment: The number of occurrences of an event per unit time or space will have a Poisson distribution if:
1. the rate of occurrence is constant over time or space;
2. past occurrences do not influence the likelihood of future occurrences;
3. simultaneous occurrences are nearly impossible.

Poisson distribution: The Poisson probability function is given by
P(X = x) = e^(-lambda) lambda^x / x!,  x = 0, 1, 2, ...,
where lambda is the mean of the Poisson random variable, i.e. the average number of occurrences of the event per unit of time or space. As such, lambda is the rate of occurrence per unit time or space. For example, if one decay event of a radioactive substance occurs per second, then lambda = 1 per second.

Distributional properties: If lambda is a positive integer, the distribution has two modes, lambda and lambda - 1 (for example, when lambda = 3 the modes are 3 and 2). If lambda is not an integer, the mode is floor(lambda), the largest integer less than or equal to lambda.

The Poisson distribution with sufficiently large lambda can be approximated by a normal distribution with mean lambda and variance lambda. The approximation is good when lambda is large and a continuity correction is used: P(X <= x) is approximated by P(Y <= x + 0.5), where Y is normal with mean lambda and variance lambda and x is a non-negative integer. As lambda increases the distribution becomes more symmetric.

Poisson probabilities: Poisson probabilities can be computed directly from the probability function or from Poisson probability tables for certain values of lambda. The rather surprising fact pointed out in the definition of the mode can be verified directly: when lambda is a positive integer, P(X = lambda) = P(X = lambda - 1), since
e^(-lambda) lambda^lambda / lambda! = e^(-lambda) lambda^(lambda - 1) / (lambda - 1)!.

Poisson moments: The mean of a Poisson random variable is lambda, i.e. E(X) = lambda. The mean is a rate, e.g. a temporal rate for time events. For example, if 0.5 phone calls per hour are received on a home phone during the day, then the mean number of phone calls between 9 A.M. and 5 P.M. is 0.5 x 8 = 4. The Poisson distribution has the interesting property that the variance is also lambda, i.e. Var(X) = lambda. Thus, unlike the normal distribution, the variance of a Poisson random variable depends on the mean. Certain Poisson-like random variables are over-dispersed (variance greater than the mean) or under-dispersed (variance less than the mean). For example, the negative binomial can be viewed as an over-dispersed Poisson and, like the Poisson, is often used to model species abundances in ecology: certain species abundances are Poisson distributed and others are distributed as a negative binomial.

The mean, and hence the variance, can be estimated by the sample mean, which is the maximum likelihood estimator: if k1, ..., kn are realizations of a Poisson experiment, the estimated mean is the average of the ki. For example, suppose a supervisor wants to know the average number of typing mistakes a secretary makes per page. Ten pages are randomly selected and the following counts were obtained: 2, 1, 3, 1, 3, 3, 3, 2, 3, 1. Then the estimate is 22/10 = 2.2. The value of lambda actually used to simulate the data was 2, so the estimate is reasonably close.
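A small Python sketch of the estimation example above: the sample mean of the typing-mistake counts is the maximum likelihood estimate of lambda, and the fitted Poisson probability function can then be evaluated.

    from math import exp, factorial

    # Sample mean as the MLE of the Poisson mean lambda (typing-mistake counts from the text).
    counts = [2, 1, 3, 1, 3, 3, 3, 2, 3, 1]
    lam_hat = sum(counts) / len(counts)
    print(lam_hat)                      # 2.2

    # Poisson probability P(X = x) = e^(-lam) * lam^x / x! at the estimated rate.
    def poisson_pmf(x, lam):
        return exp(-lam) * lam**x / factorial(x)

    print(poisson_pmf(2, lam_hat))      # probability of exactly 2 mistakes on a page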
Poisson approximation to the binomial: Consider a binomial distribution with n trials and probability of success p. If n is sufficiently large and p is small, then the binomial probability P(X = x) is approximately equal to the corresponding Poisson probability with lambda = np, the mean of the binomial distribution. Thus binomial probabilities, which are hard to compute for large n, can be approximated by the corresponding Poisson probabilities. For example, suppose 10,000 soldiers are screened for a rare blood disease. We want the probability that at least 10 soldiers test positive, i.e. P(X >= 10), where X is binomial. This is difficult to compute using the binomial distribution directly, but much easier for the Poisson with lambda = np.

POISSON DISTRIBUTION

In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

The distribution was discovered by Simeon-Denis Poisson (1781-1840) and published, together with his probability theory, in 1838 in his work Recherches sur la probabilite des jugements en matiere criminelle et en matiere civile ("Research on the Probability of Judgments in Criminal and Civil Matters"). The work focused on certain random variables N that count, among other things, a number of discrete occurrences (sometimes called "arrivals") that take place during a time interval of given length. If the expected number of occurrences in this interval is lambda, then the probability that there are exactly k occurrences (k being a non-negative integer, k = 0, 1, 2, ...) is equal to
P(N = k) = e^(-lambda) lambda^k / k!,
where
e is the base of the natural logarithm (e = 2.71828...),
k is the number of occurrences of an event, the probability of which is given by the function,
k! is the factorial of k,
lambda is a positive real number, equal to the expected number of occurrences during the given interval.

For instance, if the events occur on average 4 times per minute and you are interested in the number of events occurring in a 10-minute interval, you would use as a model a Poisson distribution with lambda = 10 x 4 = 40. As a function of k, this is the probability mass function. The Poisson distribution can be derived as a limiting case of the binomial distribution.

The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare. A classic example is the nuclear decay of atoms. The Poisson distribution is sometimes called a Poissonian, analogous to the term Gaussian for a Gauss or normal distribution.
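A sketch comparing an exact binomial tail with its Poisson approximation (lambda = np), in the spirit of the screening example above. The text does not state the disease probability, so p = 0.001 below is a hypothetical value chosen only to illustrate how close the two answers are.

    from math import comb, exp, factorial

    n, p = 10_000, 0.001          # hypothetical screening probability, for illustration only
    lam = n * p                   # Poisson parameter of the approximation (here 10.0)

    def binom_pmf(k):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    def poisson_pmf(k):
        return exp(-lam) * lam**k / factorial(k)

    # P(X >= 10) = 1 - P(X <= 9) under each model.
    binom_tail   = 1 - sum(binom_pmf(k) for k in range(10))
    poisson_tail = 1 - sum(poisson_pmf(k) for k in range(10))
    print(binom_tail, poisson_tail)   # the two values agree to several decimal places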
Poisson noise and characterizing small occurrences: The parameter lambda is not only the mean number of occurrences but also its variance. Thus the number of observed occurrences fluctuates about its mean lambda with a standard deviation of sqrt(lambda). These fluctuations are denoted as Poisson noise or (particularly in electronics) as shot noise.

The correlation of the mean and standard deviation in counting independent, discrete occurrences is useful scientifically. By monitoring how the fluctuations vary with the mean signal, one can estimate the contribution of a single occurrence, even if that contribution is too small to be detected directly. For example, the charge e on an electron can be estimated by correlating the magnitude of an electric current with its shot noise. If N electrons pass a point in a given time t on average, the mean current is I = eN/t; since the current fluctuations are of the order e*sqrt(N)/t (reflecting the Poisson variance of the count), the charge e can be estimated from the ratio of the squared fluctuations to the mean current. An everyday example is the graininess that appears as photographs are enlarged; the graininess is due to Poisson fluctuations in the number of reduced silver grains, not to the individual grains themselves. By correlating the graininess with the degree of enlargement, one can estimate the contribution of an individual grain (which is otherwise too small to be seen unaided). Many other molecular applications of Poisson noise have been developed, e.g. estimating the number density of receptor molecules in a cell membrane.

Related distributions:
1. If X1 ~ Poisson(lambda1) and X2 ~ Poisson(lambda2) are independent, then the difference Y = X1 - X2 follows a Skellam distribution.
2. If X1 ~ Poisson(lambda1) and X2 ~ Poisson(lambda2) are independent and Y = X1 + X2, then the distribution of X1 conditional on Y = y is binomial, specifically Binomial(y, lambda1/(lambda1 + lambda2)). More generally, if X1, X2, ..., Xn are independent Poisson random variables with parameters lambda1, lambda2, ..., lambdan, then, conditional on their sum, each Xi is binomially distributed with success probability lambdai divided by the sum of the parameters.
3. The Poisson distribution can be derived as a limiting case of the binomial distribution as the number of trials goes to infinity and the expected number of successes remains fixed. Therefore it can be used as an approximation of the binomial distribution if n is sufficiently large and p is sufficiently small. A rule of thumb states that the Poisson distribution is a good approximation of the binomial distribution if n is at least 20 and p is smaller than or equal to 0.05; according to this rule the approximation is excellent if n >= 100 and np <= 10.[1]
4. For sufficiently large values of lambda (say lambda > 1000), the normal distribution with mean lambda and variance lambda is an excellent approximation to the Poisson distribution. If lambda is greater than about 10, the normal distribution is a good approximation if an appropriate continuity correction is performed, i.e. P(X <= x), where (lower-case) x is a non-negative integer, is replaced by P(X <= x + 0.5).
5. If the number of arrivals in a given time interval follows the Poisson distribution with mean lambda, then the lengths of the inter-arrival times follow the exponential distribution with mean 1/lambda (rate lambda).

Occurrence: The Poisson distribution arises in connection with Poisson processes. It applies to various phenomena of discrete nature (that is, those that may happen 0, 1, 2, 3, ... times during a given period of time or in a given area) whenever the probability of the phenomenon happening is constant in time or space. Examples of events that may be modelled by a Poisson distribution include:
1. the number of soldiers killed by horse-kicks each year in each corps of the Prussian cavalry (an example made famous by a book of Ladislaus Josephovich Bortkiewicz, 1868-1931);
2. the number of phone calls at a call centre per minute;
3. the number of times a web server is accessed per minute;
4. the number of mutations in a given stretch of DNA after a certain amount of radiation.
[Note: the intervals between successive Poisson events follow the exponential distribution, whose mean is the reciprocal of the Poisson rate. Examples are the lifetime of a light bulb, or the waiting time between buses.]
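A short simulation illustrating two points above: Poisson counts built from exponential inter-arrival times have variance equal to their mean (the shot-noise property). The value of lambda, the number of trials and the seed are arbitrary illustrative choices.

    import random
    from statistics import mean, pvariance

    random.seed(1)
    lam, trials = 9.0, 100_000

    def poisson_sample(lam):
        # Count exponential inter-arrival times falling inside one unit of time.
        t, k = 0.0, 0
        while True:
            t += random.expovariate(lam)
            if t > 1.0:
                return k
            k += 1

    counts = [poisson_sample(lam) for _ in range(trials)]
    print(mean(counts), pvariance(counts))   # both close to lambda = 9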
How does this distribution arise? The law of rare events: In several of the above examples, for example the number of mutations in a given sequence of DNA, the events being counted are actually the outcomes of discrete trials, and would more precisely be modelled using the binomial distribution. However, the binomial distribution with parameters n and lambda/n, i.e. the probability distribution of the number of successes in n trials with probability lambda/n of success on each trial, approaches the Poisson distribution with expected value lambda as n approaches infinity. This provides a means by which to approximate random variables using the Poisson distribution rather than the more cumbersome binomial distribution.

This limit is sometimes known as the law of rare events, since each of the individual Bernoulli events rarely triggers. The name may be misleading because the total count of success events in a Poisson process need not be rare if the parameter lambda is not small. For example, the number of telephone calls to a busy switchboard in one hour follows a Poisson distribution with the events appearing frequent to the operator, but they are rare from the point of view of the average member of the population, who is very unlikely to make a call to that switchboard in that hour.

Here are the details. Let p = lambda/n. Then
P(X = k) = nCk p^k (1 - p)^(n - k)
         = (lambda^k / k!) * [n(n - 1)...(n - k + 1) / n^k] * (1 - lambda/n)^n * (1 - lambda/n)^(-k).
As n tends to infinity, the factor n(n - 1)...(n - k + 1)/n^k tends to 1, (1 - lambda/n)^n tends to e^(-lambda) (a standard limit from calculus), and (1 - lambda/n)^(-k) tends to 1; the factorial terms can also be handled with Stirling's formula. Consequently the limit of the distribution becomes
P(X = k) = e^(-lambda) lambda^k / k!,
which is the Poisson distribution. More generally, whenever a sequence of binomial random variables with parameters n and p_n is such that n p_n converges to lambda, the sequence converges in distribution to a Poisson random variable with mean lambda (the law of rare events).

Properties:
1. The expected value of a Poisson-distributed random variable is equal to lambda, and so is its variance. The higher moments of the Poisson distribution are Touchard polynomials in lambda, whose coefficients have a combinatorial meaning. In fact, when the expected value of the Poisson distribution is 1, Dobinski's formula says that the nth moment equals the number of partitions of a set of size n.
2. The mode of a Poisson-distributed random variable with non-integer lambda is equal to floor(lambda), the largest integer less than or equal to lambda. When lambda is a positive integer, the modes are lambda and lambda - 1.
3. Sums of Poisson-distributed random variables: if the Xi follow Poisson distributions with parameters lambdai and are independent, then their sum also follows a Poisson distribution whose parameter is the sum of the component parameters.
4. The moment-generating function of the Poisson distribution with expected value lambda is M(t) = exp(lambda (e^t - 1)).
5. All of the cumulants of the Poisson distribution are equal to the expected value lambda. The nth factorial moment of the Poisson distribution is lambda^n.
6. The Poisson distributions are infinitely divisible probability distributions.
7. The directed Kullback-Leibler divergence of Poi(lambda) from Poi(lambda0) is given by D(lambda0 || lambda) = lambda - lambda0 + lambda0 ln(lambda0/lambda).

Generating Poisson-distributed random variables: A simple way to generate random Poisson-distributed numbers is given by Knuth (see references).

algorithm poisson random number (Knuth):
    init: Let L <- e^(-lambda), k <- 0 and p <- 1.
    do:   k <- k + 1.
          Generate a uniform random number u in [0,1] and let p <- p * u.
    while p >= L.
    return k - 1.

While simple, the complexity is linear in lambda. There are many other algorithms to overcome this; some are given in Ahrens & Dieter (see references).
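A direct Python transcription of the Knuth pseudocode above, as a runnable sketch; the check of the sample mean at the end is only a sanity test.

    import math, random

    def knuth_poisson(lam):
        # Knuth's algorithm: multiply uniforms until the product drops below e^(-lambda).
        L = math.exp(-lam)
        k, p = 0, 1.0
        while True:
            k += 1
            p *= random.random()     # uniform u in [0, 1)
            if p < L:                # loop continues while p >= L
                return k - 1

    random.seed(0)
    sample = [knuth_poisson(4.0) for _ in range(100_000)]
    print(sum(sample) / len(sample))   # close to lambda = 4.0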
Parameter estimation, maximum likelihood: Given a sample of n measured values ki, we wish to estimate the value of the parameter lambda of the Poisson population from which the sample was drawn. To calculate the maximum likelihood value, we form the log-likelihood function
L(lambda) = -n*lambda + (sum of ki) ln(lambda) - sum of ln(ki!).
Taking the derivative of L with respect to lambda and equating it to zero,
dL/dlambda = -n + (sum of ki)/lambda = 0,
and solving for lambda yields the maximum-likelihood estimate: the sample mean (sum of ki)/n. Since each observation has expectation lambda, so does the sample mean; therefore it is an unbiased estimator of lambda. It is also an efficient estimator, i.e. its estimation variance achieves the Cramer-Rao lower bound (CRLB).

Bayesian inference: In Bayesian inference, the conjugate prior for the rate parameter lambda of the Poisson distribution is the Gamma distribution. Let lambda ~ Gamma(alpha, beta) denote that lambda is distributed according to the Gamma density g parameterized in terms of a shape parameter alpha and an inverse scale parameter beta:
g(lambda; alpha, beta) = beta^alpha lambda^(alpha - 1) e^(-beta*lambda) / Gamma(alpha),  lambda > 0.
Then, given the same sample of n measured values ki as before and a prior of Gamma(alpha, beta), the posterior distribution is
lambda | k1, ..., kn ~ Gamma(alpha + sum of ki, beta + n).
The posterior mean E[lambda] approaches the maximum likelihood estimate in the limit as alpha and beta tend to 0. The posterior predictive distribution of additional data is a Gamma-Poisson (i.e. negative binomial) distribution.

The "law of small numbers": The word law is sometimes used as a synonym of probability distribution, and convergence in law means convergence in distribution. Accordingly, the Poisson distribution is sometimes called the law of small numbers because it is the probability distribution of the number of occurrences of an event that happens rarely but has very many opportunities to happen. The Law of Small Numbers is a book by Ladislaus Bortkiewicz about the Poisson distribution, published in 1898. Some historians of mathematics have argued that the Poisson distribution should have been called the Bortkiewicz distribution.[2]

See also: Anscombe transform (a variance-stabilising transformation for the Poisson distribution); compound Poisson distribution; Tweedie distributions; Poisson process; Poisson regression; Poisson sampling; queueing theory; the Erlang distribution, which describes the waiting time until n events have occurred (for temporally distributed events, the Poisson distribution is the probability distribution of the number of events occurring within a preset time, while the Erlang distribution is the probability distribution of the amount of time until the nth event); the Skellam distribution, the distribution of the difference of two Poisson variates, not necessarily from the same parent distribution; the incomplete gamma function, used to calculate the CDF; Dobinski's formula (on the combinatorial interpretation of the moments of the Poisson distribution); Schwarz formula; Robbins lemma, a lemma relevant to empirical Bayes methods relying on the Poisson distribution; coefficient of dispersion, a simple measure to assess whether observed events are close to Poisson.

Examples of rare events: 1. number of printing mistakes per page; 2. number of accidents on a highway; 3. number of bad cheques at a bank; 4. number of blind persons; 5. number of Nobel prize winners; 6. number of Bharat Ratna recipients.
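A minimal sketch of the conjugate Gamma-Poisson update described above, reusing the typing-mistake counts; the prior hyperparameters alpha = beta = 1 are hypothetical choices for illustration.

    # Gamma(alpha, beta) prior on lambda combined with Poisson counts k_i gives a
    # Gamma(alpha + sum(k_i), beta + n) posterior.
    counts = [2, 1, 3, 1, 3, 3, 3, 2, 3, 1]
    alpha, beta = 1.0, 1.0                      # hypothetical prior hyperparameters

    alpha_post = alpha + sum(counts)            # shape: alpha + sum k_i
    beta_post = beta + len(counts)              # rate:  beta + n

    posterior_mean = alpha_post / beta_post     # E[lambda | data]
    mle = sum(counts) / len(counts)             # sample mean for comparison
    print(posterior_mean, mle)                  # about 2.09 vs 2.2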
Chapter V: Normal Probability Distribution

The normal distribution is perhaps the most important model for studying quantitative phenomena in the natural and behavioural sciences; this is due to the Central Limit Theorem. Many numerical measurements (e.g. weight, time) can be well approximated by the normal distribution.

The standard normal distribution: The standard normal distribution is the simplest version (zero mean, unit standard deviation) of the (general) normal distribution. Yet it is perhaps the most frequently used version, because many tables and computational resources are explicitly available for calculating probabilities.

Non-standard normal distribution, finding probabilities: In practice, the mechanisms underlying natural phenomena may be unknown, yet the use of the normal model can be theoretically justified in many situations to compute critical values and probability values for various processes.

Non-standard normal distribution, finding scores (critical values): In addition to being able to compute probability (p) values, we often need to estimate the critical values of the normal distribution for a given p-value.

Chapter VI: Relations Between Distributions

In this chapter we explore the relationships between different distributions. This knowledge helps us compute difficult probabilities using reasonable approximations and identify appropriate probability models and graphical and statistical analysis tools for data interpretation.

The Central Limit Theorem: The exploration of the relations between different distributions begins with the study of the sampling distribution of the sample average. This demonstrates the universally important role of the normal distribution.

Law of Large Numbers: Consider an event whose probability of being observed in each experiment is p. If we repeat the same experiment over and over, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions increases. Why is that, and why is this important?

Normal distribution as approximation to binomial distribution: The normal distribution provides a valuable approximation to the binomial when the sample sizes are large and the probabilities of success and failure are not close to zero.

Poisson approximation to binomial distribution: The Poisson provides an approximation to the binomial distribution when the sample sizes are large and the probability of success (or failure) is close to zero.

Binomial approximation to hypergeometric: The binomial distribution is much simpler to compute than the hypergeometric, and can be used as an approximation when the population size is large relative to the sample size and the probability of success is not close to zero.

Normal approximation to Poisson: The Poisson can be approximated fairly well by the normal distribution when lambda is large.

Normal distribution: The normal distribution is one of the most widely used continuous probability distributions in applications of statistical methods. It is of tremendous importance in the analysis and evaluation of every aspect of experimental data in science and medicine.

Def: The normal distribution is the probability distribution of a continuous random variable X, known as a normal random variable or normal variate, with probability density function
f(x) = (1 / (sigma sqrt(2 pi))) e^(-(x - mu)^2 / (2 sigma^2)),  -infinity < x < infinity,
where mu is the mean and sigma the standard deviation. It is also called the Gaussian distribution.

Chief characteristics:
1. The graph of the normal distribution y = f(x) in the x-y plane is known as the normal curve.
2. The curve is bell shaped and symmetrical about the line x = mu.
3. The area under the normal curve is unity, i.e. it represents the total population.
4. Mean = median = mode.
5. The curve is symmetrical about the line x = mu, so its skewness is zero.
6. The x-axis is an asymptote to the curve, and the points of inflexion of the curve are at x = mu - sigma and x = mu + sigma.
7. Since mean = median, the line x = mu divides the total area into two equal parts.
8. No portion of the curve lies below the x-axis.
9. The probability that the normal variate X with mean mu and standard deviation sigma lies between two given values is the area under the curve between them (see the area property below).
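A sketch of computing normal-curve areas, anticipating the area property described below. It uses the standard normal CDF written in terms of the error function and the standardisation z = (x - mu)/sigma; the values of mu and sigma are illustrative, not taken from the text.

    from math import erf, sqrt

    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2, applied after standardising.
    def normal_cdf(x, mu=0.0, sigma=1.0):
        return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

    mu, sigma = 50.0, 10.0          # illustrative values
    p = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)
    print(p)                        # about 0.6827: area within one s.d. of the mean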
Importance and applications of the normal distribution: The normal distribution plays a very important role in statistical theory for the following reasons.
1. Most of the common distributions, for example the binomial and the Poisson, can be approximated by the normal distribution.
2. Since it is a limiting case of the binomial distribution for exceptionally large numbers of trials, it is applicable to many applied problems, e.g. in the kinetic theory of gases and fluctuations in the magnitude of an electric current.
3. If a variable is not normally distributed, it can sometimes be brought to normal form by a simple transformation of the variable.
4. The proofs of all the tests of significance in sampling are based on the fundamental assumption that the populations from which the samples have been drawn are normal.
5. The normal distribution finds large applications in statistical quality control.
6. Many distributions of sample statistics, e.g. the distributions of the sample mean and the sample variance, tend to normality for large samples, and as such they can best be studied with the help of the normal curve.

Area property: By taking z = (x - mu)/sigma, the standard normal curve is formed. The probability that the normal variate X with mean mu and standard deviation sigma lies between two specified values x1 and x2 (x1 < x2) can be obtained as the area under the standard normal curve between z1 = (x1 - mu)/sigma and z2 = (x2 - mu)/sigma.

Def: A normal random variable with mean 0 and variance 1 is called a standard normal variable. Its probability density function is
phi(z) = (1/sqrt(2 pi)) e^(-z^2 / 2).

Def: The cumulative distribution function of a standard normal random variable is
Phi(z) = integral of phi(t) dt from minus infinity to z.

Normal approximation to the binomial distribution: When n is very large it is difficult to calculate probabilities using the binomial distribution. The normal distribution is a limiting case of the binomial distribution under the following conditions: (1) n, the number of trials, is very large (n tends to infinity); (2) neither p nor q is very small. For a binomial distribution, E(X) = np and Var(X) = npq. Then the standard normal variate
Z = (X - mu)/sigma = (X - np)/sqrt(npq)
tends to the distribution of a standard normal variable with density
phi(z) = (1/sqrt(2 pi)) e^(-z^2 / 2).
If p is close to q and n is large, we can approximate the binomial curve by the normal curve; with the continuity correction, P(X = x) is approximated by the normal area over the interval (x - 1/2, x + 1/2).

Note: If X is a Poisson variable with mean lambda, then the standard normal variable is Z = (X - lambda)/sqrt(lambda), and the probability can be calculated as explained above. The Poisson distribution approaches the normal distribution as lambda tends to infinity.
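A sketch comparing an exact binomial probability with its normal approximation using the continuity correction described above. The values n, p and x are illustrative choices, not taken from the text.

    from math import comb, erf, sqrt

    n, p, x = 100, 0.5, 55
    mu, sigma = n * p, sqrt(n * p * (1 - p))

    def Phi(z):
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF

    exact = comb(n, x) * p**x * (1 - p)**(n - x)
    approx = Phi((x + 0.5 - mu) / sigma) - Phi((x - 0.5 - mu) / sigma)
    print(exact, approx)     # the two values are close (about 0.0485 each)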
Uniform distribution: A uniform distribution, sometimes also known as a rectangular distribution, is a distribution that has constant probability. The probability density function and cumulative distribution function of a continuous uniform distribution on the interval [a, b] are
f(x) = 1/(b - a) for a <= x <= b (and 0 otherwise),
F(x) = 0 for x < a, (x - a)/(b - a) for a <= x <= b, and 1 for x > b.
These can be written in terms of the Heaviside step function H as
f(x) = [H(x - a) - H(x - b)] / (b - a),
F(x) = [(x - a) H(x - a) - (x - b) H(x - b)] / (b - a),
the latter of which simplifies to the expected (x - a)/(b - a) for a <= x <= b. The continuous distribution is implemented as UniformDistribution[a, b].

For a continuous uniform distribution, the characteristic function is
phi(t) = (e^(i t b) - e^(i t a)) / (i t (b - a)).
If a = -c and b = c, the characteristic function simplifies to phi(t) = sin(ct)/(ct). The moment-generating function is
M(t) = (e^(t b) - e^(t a)) / (t (b - a)).
The moment-generating function is not differentiable at zero, but the moments can be calculated by differentiating and then taking the limit as t tends to 0.

The raw moments are given analytically by
mu'_n = (b^(n+1) - a^(n+1)) / ((n + 1)(b - a)).
The first few are therefore
mu'_1 = (a + b)/2,
mu'_2 = (a^2 + ab + b^2)/3,
mu'_3 = (a^3 + a^2 b + a b^2 + b^3)/4,
mu'_4 = (a^4 + a^3 b + a^2 b^2 + a b^3 + b^4)/5.
The central moments are given analytically by
mu_n = (b - a)^n [1 + (-1)^n] / (2^(n+1) (n + 1)),
so the first few are
mu_1 = 0,  mu_2 = (b - a)^2/12,  mu_3 = 0,  mu_4 = (b - a)^4/80.
The mean, variance, skewness and kurtosis excess are therefore
mean = (a + b)/2,  variance = (b - a)^2/12,  skewness = 0,  kurtosis excess = -6/5.

Exponential distribution

In probability theory and statistics, the exponential distribution (a.k.a. negative exponential distribution) is a family of continuous probability distributions. It describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate. Note that the exponential distribution is not the same as the class of exponential families of distributions, which is a large class of probability distributions that includes the exponential distribution as one of its members, but also includes the normal distribution, binomial distribution, gamma distribution, Poisson, and many others.

Summary of the exponential distribution:
Parameter:        lambda > 0 (rate, or inverse scale)
Support:          x in [0, infinity)
pdf:              lambda e^(-lambda x)
cdf:              1 - e^(-lambda x)
Mean:             1/lambda
Median:           (ln 2)/lambda
Mode:             0
Variance:         1/lambda^2
Skewness:         2
Excess kurtosis:  6
Entropy:          1 - ln(lambda)

Characterization

Probability density function: The probability density function (pdf) of an exponential distribution is
f(x; lambda) = lambda e^(-lambda x) for x >= 0, and 0 for x < 0.
Here lambda > 0 is the parameter of the distribution, often called the rate parameter. The distribution is supported on the interval [0, infinity). If a random variable X has this distribution, we write X ~ Exp(lambda).

Cumulative distribution function: The cumulative distribution function is given by
F(x; lambda) = 1 - e^(-lambda x) for x >= 0, and 0 for x < 0.

Alternative parameterization: A commonly used alternative parameterization defines the probability density function of an exponential distribution as
f(x; beta) = (1/beta) e^(-x/beta) for x >= 0,
where beta > 0 is a scale parameter of the distribution and is the reciprocal of the rate parameter lambda defined above. In this specification, beta is a survival parameter in the sense that if a random variable X is the duration of time that a given biological or mechanical system manages to survive and X ~ Exponential(beta), then E[X] = beta. That is to say, the expected duration of survival of the system is beta units of time. The parameterization involving the "rate" parameter arises in the context of events arriving at a rate lambda, when the time between events (which might be modelled using an exponential distribution) has a mean of beta = 1/lambda. The alternative specification is sometimes more convenient than the one given above, and some authors will use it as a standard definition. This alternative specification is not used here. Unfortunately this gives rise to a notational ambiguity: in general, the reader must check which of the two specifications is being used if an author writes "X ~ Exponential(lambda)", since either the rate or the scale parameterization could be intended.
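A minimal sketch of the exponential pdf and cdf in the rate parameterization, together with a check that the scale parameterization beta = 1/lambda gives the same density. The sample values of lambda and x are illustrative.

    from math import exp

    def exp_pdf(x, lam):
        return lam * exp(-lam * x) if x >= 0 else 0.0      # f(x) = lambda e^(-lambda x)

    def exp_cdf(x, lam):
        return 1.0 - exp(-lam * x) if x >= 0 else 0.0      # F(x) = 1 - e^(-lambda x)

    lam = 2.0
    beta = 1.0 / lam           # scale (survival) parameterization
    x = 0.7
    print(exp_pdf(x, lam), (1.0 / beta) * exp(-x / beta))   # identical values
    print(exp_cdf(x, lam))                                  # P(X <= 0.7)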
Occurrence and applications: The exponential distribution occurs naturally when describing the lengths of the inter-arrival times in a homogeneous Poisson process. The exponential distribution may be viewed as a continuous counterpart of the geometric distribution, which describes the number of Bernoulli trials necessary for a discrete process to change state; in contrast, the exponential distribution describes the time for a continuous process to change state.

In real-world scenarios, the assumption of a constant rate (or probability per unit time) is rarely satisfied. For example, the rate of incoming phone calls differs according to the time of day. But if we focus on a time interval during which the rate is roughly constant, such as from 2 to 4 p.m. during work days, the exponential distribution can be used as a good approximate model for the time until the next phone call arrives. Similar caveats apply to the following examples, which yield approximately exponentially distributed variables:
1. the time until a radioactive particle decays, or the time between clicks of a Geiger counter;
2. the time until your next telephone call;
3. the time until default (on payment to company debt holders) in reduced-form credit risk modelling.
Exponential variables can also be used to model situations where certain events occur with a constant probability per unit length, such as the distance between mutations on a DNA strand, or between roadkills on a given road.

In queueing theory, the service times of agents in a system (e.g. how long it takes a bank teller to serve a customer) are often modelled as exponentially distributed variables. (The arrival of customers in a system, for instance, is typically modelled by the Poisson distribution in most management science textbooks.) The length of a process that can be thought of as a sequence of several independent tasks is better modelled by a variable following the Erlang distribution (which is the distribution of the sum of several independent exponentially distributed variables).

Reliability theory and reliability engineering also make extensive use of the exponential distribution. Because of the memoryless property of this distribution, it is well suited to model the constant-hazard-rate portion of the bathtub curve used in reliability theory. It is also very convenient because it is so easy to add failure rates in a reliability model. The exponential distribution is, however, not appropriate to model the overall lifetime of organisms or technical devices, because the "failure rates" here are not constant: more failures occur for very young and for very old systems.

In physics, if you observe a gas at a fixed temperature and pressure in a uniform gravitational field, the heights of the various molecules also follow an approximate exponential distribution. This is a consequence of the entropy property mentioned below.

Properties

Mean, variance, and median: The mean or expected value of an exponentially distributed random variable X with rate parameter lambda is given by
E[X] = 1/lambda.
In light of the examples given above, this makes sense: if you receive phone calls at an average rate of 2 per hour, then you can expect to wait half an hour for every call.
The variance of X is given by
Var[X] = 1/lambda^2.
The median of X is given by
m = (ln 2)/lambda,
where ln refers to the natural logarithm. Thus the absolute difference between the mean and the median is (1 - ln 2)/lambda, which is less than the standard deviation 1/lambda, in accordance with the median-mean inequality.

Memorylessness: An important property of the exponential distribution is that it is memoryless. This means that if a random variable T is exponentially distributed, its conditional probability obeys
P(T > s + t | T > s) = P(T > t) for all s, t >= 0.
This says that the conditional probability that we need to wait, for example, more than another 10 seconds before the first arrival, given that the first arrival has not yet happened after 30 seconds, is equal to the initial probability that we need to wait more than 10 seconds for the first arrival. So, if we waited 30 seconds and the first arrival didn't happen (T > 30), the probability that we'll need to wait another 10 seconds for the first arrival (T > 30 + 10) is the same as the initial probability that we need to wait more than 10 seconds for the first arrival (T > 10). This is often misunderstood by students taking courses on probability: the fact that P(T > 40 | T > 30) = P(T > 10) does not mean that the events T > 40 and T > 30 are independent.

To summarize, "memorylessness" of the probability distribution of the waiting time T until the first arrival means
P(T > 40 | T > 30) = P(T > 10).
It does not mean
P(T > 40 | T > 30) = P(T > 40).
(That would be independence; these two events are not independent.)

The exponential distributions and the geometric distributions are the only memoryless probability distributions. The exponential distribution is consequently also necessarily the only continuous probability distribution that has a constant failure rate.

Quartiles: The quantile function (inverse cumulative distribution function) for Exp(lambda) is
F^(-1)(p; lambda) = -ln(1 - p)/lambda, for 0 <= p < 1.
The quartiles are therefore: first quartile ln(4/3)/lambda, median ln(2)/lambda, third quartile ln(4)/lambda.

Kullback-Leibler divergence: The directed Kullback-Leibler divergence between Exp(lambda0) (the 'true' distribution) and Exp(lambda) (the 'approximating' distribution) is given by
Delta(lambda0 || lambda) = ln(lambda0/lambda) + lambda/lambda0 - 1.

Maximum entropy distribution: Among all continuous probability distributions with support [0, infinity) and mean mu, the exponential distribution with lambda = 1/mu has the largest entropy.

Distribution of the minimum of exponential random variables: Let X1, ..., Xn be independent exponentially distributed random variables with rate parameters lambda1, ..., lambdan. Then min(X1, ..., Xn) is also exponentially distributed, with parameter lambda1 + ... + lambdan. This can be seen by considering the complementary cumulative distribution function:
P(min(X1, ..., Xn) > x) = P(X1 > x) ... P(Xn > x) = e^(-(lambda1 + ... + lambdan) x).
The index of the variable which achieves the minimum is distributed according to the law
P(Xk = min(X1, ..., Xn)) = lambdak / (lambda1 + ... + lambdan).
Note that max(X1, ..., Xn) is not exponentially distributed.

Parameter estimation: Suppose a given variable is exponentially distributed and the rate parameter lambda is to be estimated.

Maximum likelihood: The likelihood function for lambda, given an independent and identically distributed sample x = (x1, ..., xn) drawn from the variable, is
L(lambda) = lambda^n exp(-lambda n xbar),
where xbar = (x1 + ... + xn)/n is the sample mean. The derivative of the likelihood function's logarithm is
d/dlambda ln L(lambda) = n/lambda - n xbar.
Consequently the maximum likelihood estimate for the rate parameter is
lambda_hat = 1/xbar.
While this estimate is the most likely reconstruction of the true parameter lambda, it is only an estimate, and as such one can imagine that the more data points are available the better the estimate will be. It so happens that one can compute an exact confidence interval, that is, a confidence interval that is valid for all numbers of samples, not just large ones.
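A numerical check of the memorylessness identity and the quartile formulas above; the rate lambda = 0.05 is an arbitrary illustrative value.

    from math import exp, log

    lam = 0.05
    def surv(t):
        return exp(-lam * t)            # P(T > t) for T ~ Exp(lam)

    lhs = surv(40) / surv(30)           # P(T > 40 | T > 30)
    rhs = surv(10)                      # P(T > 10)
    print(lhs, rhs)                     # equal, as memorylessness requires

    # Quartiles -ln(1 - p)/lambda at p = 0.25, 0.5, 0.75.
    print([-log(1 - p) / lam for p in (0.25, 0.5, 0.75)])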
The 100(1 - alpha)% exact confidence interval for this estimate is given by[1]
lambda_hat * chi2(2n; alpha/2) / (2n)  <  lambda  <  lambda_hat * chi2(2n; 1 - alpha/2) / (2n),
where lambda_hat is the MLE estimate, lambda is the true value of the parameter, and chi2(k; x) is the value of the chi-squared distribution with k degrees of freedom that gives x cumulative probability (i.e. the value found in chi-squared tables[1]).

Bayesian inference: The conjugate prior for the exponential distribution is the gamma distribution (of which the exponential distribution is a special case). The following parameterization of the gamma pdf is useful:
Gamma(lambda; alpha, beta) = beta^alpha lambda^(alpha - 1) e^(-beta lambda) / Gamma(alpha).
The posterior distribution p can then be expressed in terms of the likelihood function defined above and a gamma prior:
p(lambda | x) proportional to L(lambda) Gamma(lambda; alpha, beta) proportional to lambda^(alpha + n - 1) e^(-lambda (beta + n xbar)).
Now the posterior density p has been specified up to a missing normalizing constant. Since it has the form of a gamma pdf, this can easily be filled in, and one obtains
p(lambda | x) = Gamma(lambda; alpha + n, beta + n xbar).
Here the parameter alpha can be interpreted as the number of prior observations, and beta as the sum of the prior observations.

Prediction: Having observed a sample of n data points from an unknown exponential distribution, a common task is to use these samples to make predictions about future data from the same source. A common predictive distribution over future samples is the so-called plug-in distribution, formed by plugging a suitable estimate for the rate parameter lambda into the exponential density function. A common choice of estimate is the one provided by the principle of maximum likelihood, and using this yields the predictive density over a future sample x_{n+1}, conditioned on the observed samples x = (x1, ..., xn), given by
p_ML(x_{n+1} | x) = (1/xbar) exp(-x_{n+1}/xbar).
The Bayesian approach provides a predictive distribution which takes into account the uncertainty of the estimated parameter, although this may depend crucially on the choice of prior. A recent alternative that is free of the issues of choosing priors is the Conditional Normalized Maximum Likelihood (CNML) predictive distribution.[2]

The accuracy of a predictive distribution may be measured using the distance or divergence between the true exponential distribution with rate parameter lambda0 and the predictive distribution based on the sample x. The Kullback-Leibler divergence is a commonly used, parameterization-free measure of the difference between two distributions. Letting Delta(lambda0 || p) denote the Kullback-Leibler divergence between an exponential with rate parameter lambda0 and a predictive distribution p, it can be shown (the expectation being taken with respect to the exponential distribution with rate parameter lambda0 in (0, infinity), and with psi denoting the digamma function) that the CNML predictive distribution is strictly superior to the maximum likelihood plug-in distribution in terms of average Kullback-Leibler divergence for all sample sizes n > 0.

Generating exponential variates: A conceptually very simple method for generating exponential variates is based on inverse transform sampling: given a random variate U drawn from the uniform distribution on the unit interval (0, 1), the variate
T = F^(-1)(U) = -ln(1 - U)/lambda
has an exponential distribution, where F^(-1) is the quantile function defined by F^(-1)(p) = -ln(1 - p)/lambda. Moreover, if U is uniform on (0, 1), then so is 1 - U. This means one can generate exponential variates as T = -ln(U)/lambda. Other methods for generating exponential variates are discussed by Knuth[3] and Devroye.[4] The ziggurat algorithm is a fast method for generating exponential variates.
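A direct sketch of the inverse transform sampling method just described; the sample-mean check at the end is only a sanity test against the theoretical mean 1/lambda.

    import math, random

    def exponential_variate(lam):
        u = random.random()                  # uniform on [0, 1)
        return -math.log(1.0 - u) / lam      # -log(u)/lam works equally well

    random.seed(0)
    lam = 2.0
    sample = [exponential_variate(lam) for _ in range(100_000)]
    print(sum(sample) / len(sample))         # close to the mean 1/lambda = 0.5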
A fast method for generating a set of ready-ordered exponential variates without using a sorting routine is also available.[4]

Related distributions:
1. An exponential distribution is a special case of a gamma distribution with alpha = 1 (or k = 1, depending on the parameter set used). Both the exponential distribution and the gamma distribution are special cases of the phase-type distribution.
2. The exponential distribution is closely related to the double exponential distribution (a.k.a. the Laplace distribution), which is a shifted version of an exponential distribution applied to the absolute value of a quantity (graphically, two exponential distributions glued back to back).
3. Y ~ Pareto(x_m, alpha), i.e. Y has a Pareto distribution, if Y = x_m e^X and X ~ Exponential(alpha).
4. Y ~ Weibull(gamma, lambda), i.e. Y has a Weibull distribution, if Y = X^(1/gamma) and X ~ Exponential(lambda^(-gamma)). In particular, every exponential distribution is also a Weibull distribution.
5. Y ~ Rayleigh(sigma), i.e. Y has a Rayleigh distribution, if Y = sigma sqrt(2 lambda X) and X ~ Exponential(lambda).
6. Y ~ Gumbel(mu, beta), i.e. Y has a Gumbel distribution, if Y = mu - beta log(X lambda) and X ~ Exponential(lambda).
7. Y ~ Laplace, i.e. Y has a Laplace distribution, if Y = X1 - X2 for two independent exponential distributions X1 and X2 with the same rate.
8. Y ~ Exponential, i.e. Y has an exponential distribution, if Y = min(X1, ..., XN) for independent exponential distributions Xi.
9. Y ~ Uniform(0, 1), i.e. Y has a uniform distribution, if Y = exp(-X lambda) and X ~ Exponential(lambda).
10. X ~ chi-square with 2 degrees of freedom if X ~ Exponential(1/2).
11. Let X1, ..., Xn ~ Exponential(lambda) be exponentially distributed and independent, and let Y be their sum. Then Y ~ Gamma(n, 1/lambda), i.e. Y has a gamma distribution.
12. If X ~ SkewLogistic(theta), then log(1 + e^(-X)) ~ Exponential(theta); see the skew-logistic distribution.
13. Let X ~ Exponential(lambda_X) and Y ~ Exponential(lambda_Y) be independent. Then the ratio (lambda_X X)/(lambda_Y Y) has probability density function f(z) = 1/(1 + z)^2. This can be used to obtain a confidence interval for lambda_X/lambda_Y.

Other related distributions: the hyper-exponential distribution, whose density is a weighted sum of exponential densities; the hypoexponential distribution, the distribution of a general sum of exponential random variables; the exGaussian distribution, the sum of an exponential distribution and a normal distribution.

Independent events

The standard definition says: two events A and B are independent if and only if
Pr(A intersect B) = Pr(A) Pr(B).
Here A intersect B is the intersection of A and B, that is, the event that both events A and B occur. More generally, any collection of events, possibly more than just two of them, are mutually independent if and only if for every finite subset A1, ..., An of the collection we have
Pr(A1 intersect ... intersect An) = Pr(A1) ... Pr(An).

Similarly, random variables X and Y are independent if and only if, for every a and b, the events {X <= a} and {Y <= b} are independent events as defined above. Mathematically, this can be described as follows: the random variables X and Y with distribution functions F_X(x) and F_Y(y), and probability densities f_X(x) and f_Y(y), are independent if and only if the combined random variable (X, Y) has a joint cumulative distribution function
F_{X,Y}(x, y) = F_X(x) F_Y(y),
or equivalently a joint density
f_{X,Y}(x, y) = f_X(x) f_Y(y).
Similar expressions characterise independence more generally for more than two random variables. An arbitrary collection of random variables, possibly more than just two of them, is independent precisely if for any finite collection X1, ..., Xn and any finite set of numbers a1, ..., an, the events {X1 <= a1}, ..., {Xn <= an} are independent events as defined above.
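A small simulation consistent with item 11 above: the sum of n independent Exp(lambda) variates should have the mean n/lambda and variance n/lambda^2 of a Gamma(n, 1/lambda) distribution. The values of n, lambda and the seed are arbitrary illustrative choices.

    import random
    from statistics import mean, pvariance

    random.seed(0)
    n, lam, reps = 5, 2.0, 100_000
    sums = [sum(random.expovariate(lam) for _ in range(n)) for _ in range(reps)]
    print(mean(sums), n / lam)            # both about 2.5
    print(pvariance(sums), n / lam**2)    # both about 1.25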
The measure-theoretically inclined may prefer to substitute events {X in A} for events {X <= a} in the above definition, where A is any Borel set. That definition is exactly equivalent to the one above when the values of the random variables are real numbers. It has the advantage of working also for complex-valued random variables or for random variables taking values in any measurable space (which includes topological spaces endowed with appropriate sigma-algebras).

If any two of a collection of random variables are independent, they may nonetheless fail to be mutually independent; this is called pairwise independence.

If X and Y are independent, then the expectation operator E has the property
E[XY] = E[X] E[Y],
and for the variance we have
Var(X + Y) = Var(X) + Var(Y),
so the covariance cov(X, Y) is zero. (The converse of these, i.e. the proposition that if two random variables have a covariance of 0 they must be independent, is not true. See uncorrelated.)

Two independent random variables X and Y have the property that the characteristic function of their sum is the product of their marginal characteristic functions, but the reverse implication is not true (see subindependence).

Independent sigma-algebras

The definitions above are both generalized by the following definition of independence for sigma-algebras. Let (Omega, Sigma, Pr) be a probability space and let A and B be two sub-sigma-algebras of Sigma. A and B are said to be independent if, whenever A is in A and B is in B,
Pr(A intersect B) = Pr(A) Pr(B).
The new definition relates to the previous ones very directly:
1. Two events are independent (in the old sense) if and only if the sigma-algebras that they generate are independent (in the new sense). The sigma-algebra generated by an event E in Sigma is, by definition, the collection consisting of the empty set, E, the complement of E, and Omega.
2. Two random variables X and Y defined over Omega are independent (in the old sense) if and only if the sigma-algebras that they generate are independent (in the new sense). The sigma-algebra generated by a random variable X taking values in some measurable space S consists, by definition, of all subsets of Omega of the form X^(-1)(U), where U is any measurable subset of S.
Using this definition, it is easy to show that if X and Y are random variables and Y is Pr-almost surely constant, then X and Y are independent, since the sigma-algebra generated by an almost surely constant random variable is the trivial sigma-algebra {empty set, Omega}.

Conditionally independent random variables

Intuitively, two random variables X and Y are conditionally independent given Z if, once Z is known, the value of Y does not add any additional information about X. For instance, two measurements X and Y of the same underlying quantity Z are not independent, but they are conditionally independent given Z (unless the errors in the two measurements are somehow connected).

The formal definition of conditional independence is based on the idea of conditional distributions. If X, Y and Z are discrete random variables, then we define X and Y to be conditionally independent given Z if
P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
for all x, y and z such that P(Z = z) > 0. On the other hand, if the random variables are continuous and have a joint probability density function p, then X and Y are conditionally independent given Z if
p(x, y | z) = p(x | z) p(y | z)
for all real numbers x, y and z such that p_Z(z) > 0.

If X and Y are conditionally independent given Z, then
P(X = x | Y = y, Z = z) = P(X = x | Z = z)
for any x, y and z with P(Z = z) > 0. That is, the conditional distribution for X given Y and Z is the same as that given Z alone. A similar equation holds for the conditional probability density functions in the continuous case.
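A small simulation illustrating the parenthetical remark above that zero covariance does not imply independence. Taking X uniform on (-1, 1) and Y = X^2 is a standard counterexample; it is an illustrative choice, not taken from the text.

    import random
    from statistics import mean

    random.seed(0)
    xs = [random.uniform(-1, 1) for _ in range(200_000)]
    ys = [x * x for x in xs]

    # Covariance is approximately zero even though Y is a function of X.
    cov = mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)
    print(round(cov, 4))                     # about 0

    # Yet the joint probability does not factor: P(X > 0.5, Y > 0.25) != P(X > 0.5) P(Y > 0.25).
    p_joint = sum(1 for x, y in zip(xs, ys) if x > 0.5 and y > 0.25) / len(xs)
    p_x = sum(1 for x in xs if x > 0.5) / len(xs)
    p_y = sum(1 for y in ys if y > 0.25) / len(ys)
    print(round(p_joint, 3), round(p_x * p_y, 3))   # about 0.25 vs about 0.125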
Independence can be seen as a special kind of conditional independence, since probability can be seen as a kind of conditional probability given no events.