Random Variables, Distributions and Expectations
The Distribution Function -- Definition of a Distribution Function

I. Introduction:

Probability is the branch of mathematics that studies the possible outcomes of given events together with the outcomes' relative likelihoods and distributions. In common usage, the word "probability" means the chance that a particular event (or set of events) will occur, expressed on a linear scale from 0 (impossibility) to 1 (certainty), or equivalently as a percentage between 0 and 100%. The analysis of events governed by probability is called statistics. There are several competing interpretations of the actual "meaning" of probabilities. Frequentists view probability simply as a measure of the frequency of outcomes (the more conventional interpretation), while Bayesians treat probability more subjectively as a statistical procedure that endeavors to estimate parameters of an underlying distribution based on the observed distribution.

A properly normalized function that assigns a probability "density" to each possible outcome within some interval is called a probability density function (or probability distribution function), and its cumulative value (integral for a continuous distribution or sum for a discrete distribution) is called a distribution function (or cumulative distribution function).

The distribution function D(x), also called the cumulative distribution function (CDF) or cumulative frequency function, describes the probability that a variate X takes on a value less than or equal to a number x. The distribution function is sometimes also denoted F(x) (Evans et al. 2000, p. 6).

The distribution function is therefore related to a continuous probability density function P(x) by

    D(x) = P(X ≤ x)                      (1)
         = ∫_{-∞}^{x} P(ξ) dξ,           (2)

so P(x) (when it exists) is simply the derivative of the distribution function:

    P(x) = D'(x).                        (3)

Similarly, the distribution function is related to a discrete probability P(x) by

    D(x) = P(X ≤ x)                      (4)
         = Σ_{X ≤ x} P(x).               (5)

There exist distributions that are neither continuous nor discrete.
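The relation between a continuous CDF and its density, equation (3), can be checked numerically. A minimal Python sketch, using the exponential distribution with unit rate as an assumed illustrative example (not from the text):

```python
import math

# Assumed example: the exponential distribution with rate 1.
# Its pdf is P(x) = e^{-x} and its CDF is D(x) = 1 - e^{-x} for x >= 0.
def pdf(x):
    return math.exp(-x)

def cdf(x):
    return 1.0 - math.exp(-x)

# Equation (3): the pdf is the derivative of the CDF.
# Approximate D'(x) with a central finite difference at x = 0.7.
x, h = 0.7, 1e-6
numeric_derivative = (cdf(x + h) - cdf(x - h)) / (2 * h)
print(abs(numeric_derivative - pdf(x)) < 1e-6)  # prints True
```

The same check works for any distribution whose CDF is differentiable at the chosen point.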
A joint distribution function can be defined if outcomes are dependent on two parameters:

    D(x, y) = P(X ≤ x, Y ≤ y)            (6)
    D_x(x) = D(x, ∞)                     (7)
    D_y(y) = D(∞, y)                     (8)

Similarly, a multivariate distribution function can be defined if outcomes depend on n parameters:

    D(a_1, ..., a_n) = P(x_1 ≤ a_1, ..., x_n ≤ a_n).   (9)

The probability content of a closed region can be found much more efficiently than by direct integration of the probability density function, by appropriate evaluation of the distribution function at all possible extrema defined on the region (Rose and Smith 1996; 2002, p. 193). For example, for a bivariate distribution function D(x, y), the probability content in the region a_1 < x ≤ a_2, b_1 < y ≤ b_2 is given by

    P = ∫_{a_1}^{a_2} ∫_{b_1}^{b_2} P(x, y) dy dx,     (10)

but can be computed much more efficiently using

    P = D(a_2, b_2) − D(a_1, b_2) − D(a_2, b_1) + D(a_1, b_1).   (11)

Given a continuous P(x), assume you wish to generate numbers distributed as P(x) using a random number generator. If the random number generator yields a uniformly distributed value y_i in [0, 1] for each trial i, then compute

    D(x) = ∫_{-∞}^{x} P(x') dx'.         (12)

The formula connecting y_i with a variable distributed as P(x) is then

    x_i = D^{-1}(y_i),                   (13)

where D^{-1}(y) is the inverse function of D(x). For example, if P(x) were a normal distribution, so that

    D(x) = (1/2) [1 + erf((x − μ) / (σ√2))],           (14)

then

    x_i = σ√2 erf^{-1}(2 y_i − 1) + μ.   (15)

A distribution with constant variance of y for all values of x is known as a homoscedastic distribution. The method of finding the value at which the distribution is a maximum is known as the maximum likelihood method.

II. Definition:

In probability theory and statistics, the cumulative distribution function (CDF), or just distribution function, completely describes the probability distribution of a real-valued random variable X. Cumulative distribution functions are also used to specify the distribution of multivariate random variables. For every real number x, the CDF of a real-valued random variable X is given by

    F_X(x) = P(X ≤ x),

where the right-hand side represents the probability that the random variable X takes on a value less than or equal to x. The probability that X lies in the interval (a, b] is therefore F_X(b) − F_X(a) if a < b. If treating several random variables X, Y, ... etc.
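The sampling recipe in equations (12)-(13) is often called inverse transform sampling. A minimal Python sketch, using an exponential distribution with rate 1 as an assumed example (chosen because its inverse CDF has the closed form D^{-1}(y) = −ln(1 − y), avoiding the inverse error function needed for the normal case):

```python
import math
import random

# Equations (12)-(13): if y is uniform on [0, 1], then x = D^{-1}(y)
# is distributed with CDF D.  Assumed example: exponential, rate 1,
# with D(x) = 1 - e^{-x} and hence D^{-1}(y) = -ln(1 - y).
def inverse_cdf(y):
    return -math.log(1.0 - y)

random.seed(0)  # fixed seed so the run is reproducible
samples = [inverse_cdf(random.random()) for _ in range(100_000)]

# Sanity check: the sample mean should be close to the true mean, 1.
mean = sum(samples) / len(samples)
print(abs(mean - 1.0) < 0.05)  # prints True
```

For the normal example in equations (14)-(15), one would substitute the inverse error function, which is not in the Python standard library.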
the corresponding letters are used as subscripts, while if treating only one, the subscript is omitted. It is conventional to use a capital F for a cumulative distribution function, in contrast to the lower-case f used for probability density functions and probability mass functions. This applies when discussing general distributions: some specific distributions have their own conventional notation, for example the normal distribution. The CDF of X can be defined in terms of the probability density function f as follows:

    F(x) = ∫_{-∞}^{x} f(t) dt.

Note that in the definition above, the "less than or equal to" sign, "≤", is a convention, not a universally used one (e.g. Hungarian literature uses "<"), but it is important for discrete distributions. The proper use of tables of the binomial and Poisson distributions depends upon this convention. Moreover, important formulas like Lévy's inversion formula for the characteristic function also rely on the "less than or equal" formulation.

III. Definition of Terms:

Probability Distribution

The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function. More formally, the probability distribution of a discrete random variable X is a function which gives the probability p(x_i) that the random variable equals x_i, for each value x_i:

    p(x_i) = P(X = x_i)

It satisfies the following conditions:

    a. 0 ≤ p(x_i) ≤ 1
    b. Σ_i p(x_i) = 1

Cumulative Distribution Function

All random variables (discrete and continuous) have a cumulative distribution function. It is a function giving the probability that the random variable X is less than or equal to x, for every value x. Formally, the cumulative distribution function F(x) is defined to be

    F(x) = P(X ≤ x)   for −∞ < x < ∞.

For a discrete random variable, the cumulative distribution function is found by summing up the probabilities as in the example below.
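The discrete summing just described can be sketched in a few lines of Python; the die pmf below is an assumed illustration, not from the text:

```python
from itertools import accumulate

# Assumed example pmf: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6

# Conditions on a pmf: each p(x_i) lies in [0, 1] and the values sum to 1.
assert all(0.0 <= p <= 1.0 for p in pmf)
assert abs(sum(pmf) - 1.0) < 1e-12

# F(x_i) is the running total of the probabilities up to and including x_i.
cdf = list(accumulate(pmf))
print(abs(cdf[-1] - 1.0) < 1e-12)  # prints True: the CDF ends at 1
```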
For a continuous random variable, the cumulative distribution function is the integral of its probability density function.

Probability Density Function

The probability density function of a continuous random variable is a function which can be integrated to obtain the probability that the random variable takes a value in a given interval. More formally, the probability density function, f(x), of a continuous random variable X is the derivative of the cumulative distribution function F(x):

    f(x) = d/dx F(x).

Since F(x) = P(X ≤ x), it follows that

    P(a ≤ X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(x) dx.

If f(x) is a probability density function then it must obey two conditions:

    a. the total probability for all possible values of the continuous random variable X is 1:
       ∫_{-∞}^{∞} f(x) dx = 1;
    b. the probability density function can never be negative: f(x) ≥ 0 for all x.

IV. Examples:

1. Discrete case: Suppose a random variable X has the following probability distribution p(x_i):

       x_i     0     1     2      3      4     5
       p(x_i)  1/32  5/32  10/32  10/32  5/32  1/32

   This is actually a binomial distribution: Bi(5, 0.5) or B(5, 0.5). The cumulative distribution function F(x) is then:

       x_i     0     1     2      3      4      5
       F(x_i)  1/32  6/32  16/32  26/32  31/32  32/32

   F(x) does not change at intermediate values. For example:

       F(1.3) = F(1) = 6/32
       F(2.86) = F(2) = 16/32

2. This discrete probability density function models experiments that have only two possible outcomes. The probability of success is p and the probability of failure is q = 1 − p. The pdf models the probability that we will observe r successes and n − r failures in a total of n trials.

   Figure 1: Graph of the probability distribution function and the cumulative probability distribution function (redrawn from …)

4. Suppose that two fair dice are tossed. This time, let the random variable X denote the sum of the points. What is the sample space and what is the probability distribution for this experiment? In the Sample Space below, the first number of the ordered pair is the number showing on the first die, and the second number is the number showing on the second die.
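Example 1 can be verified directly in code. A minimal Python sketch of the B(5, 0.5) pmf and its step-function CDF (the function names are illustrative):

```python
from math import comb

# Example 1 as code: the pmf of the binomial distribution B(5, 0.5)
# and its cumulative distribution function.
n, p = 5, 0.5

def pmf(r):
    return comb(n, r) * p**r * (1 - p)**(n - r)

def cdf(x):
    # Sum p(r) over every integer outcome r <= x (the "<=" convention).
    return sum(pmf(r) for r in range(n + 1) if r <= x)

print([int(pmf(r) * 32) for r in range(6)])  # [1, 5, 10, 10, 5, 1]
print(cdf(1.3) == cdf(1) == 6 / 32)          # True: F is flat between jumps
print(cdf(2.86) == cdf(2) == 16 / 32)        # True
```

Because the CDF only jumps at the integer outcomes 0 through 5, evaluating it at 1.3 or 2.86 returns the value at the largest outcome not exceeding the argument, matching the worked values above.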
Notice that there are thirty-six possible results, so the sample space has thirty-six elements.

    (1,6) (2,6) (3,6) (4,6) (5,6) (6,6)
    (1,5) (2,5) (3,5) (4,5) (5,5) (6,5)
    (1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
    (1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
    (1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
    (1,1) (2,1) (3,1) (4,1) (5,1) (6,1)

Table 1: Sample space (redrawn from cne.gmu.edu/modules/dau/prob/distributions/dis_1_frm.html, example 2, probability distributions)

5. In the Probability Distribution Table below, X is the sum of the two numbers showing on the dice. If X = 2, the number showing on the first die must be one and the second die also is one. The distribution table shows there is only one chance out of thirty-six that both dice show one. When X = 3, the first die shows 1 and the second die shows 2, or vice versa. Thus there are two chances in thirty-six of this happening.

    x     2     3     4     5     6     7     8     9     10    11    12
    f(x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Table 2: Probability distribution table (redrawn from cne.gmu.edu/modules/dau/prob/distributions/dis_1_frm.html, example 2, probability distributions)

V. Summary:

The distribution function completely describes the probability distribution of a real-valued random variable X. It is commonly presented in two related forms: the cumulative distribution function and the probability density function. All random variables (discrete and continuous) have a cumulative distribution function, a function giving the probability that the random variable X is less than or equal to x, for every value x, while the probability density function of a continuous random variable is a function which can be integrated to obtain the probability that the random variable takes a value in a given interval.
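The two-dice example above (Examples 4 and 5) can be reproduced by enumeration. A minimal Python sketch (the variable names are illustrative, not from the source):

```python
from collections import Counter
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of tossing two fair dice
# and count each possible value of the sum X.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
counts = Counter(d1 + d2 for d1, d2 in outcomes)

# f(x) = (number of outcomes summing to x) / 36, as in Table 2.
f = {x: Fraction(counts[x], 36) for x in range(2, 13)}

print(len(outcomes))                    # 36
print(counts[2], counts[3], counts[7])  # 1 2 6
print(f[7] == Fraction(6, 36))          # True
```

Exact rational arithmetic with Fraction avoids any floating-point rounding when comparing against the tabulated probabilities.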