RANDOM VARIABLES: Binomial and hypergeometric examples

BINOMIAL EXPERIMENT, probabilities, and random variable.

Binomial experiment: Consider a sequence of n independent and identical trials, each resulting in one of two possible outcomes, "success" (S) or "failure" (F). Let p = P(S), and assume p remains the same from trial to trial. We are interested in the number of successes.

Binomial probabilities: The probability of k successes in n binomial trials is

$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^k (1-p)^{n-k}$, for $k = 0, 1, 2, \ldots, n$.   (1)

The binomial distribution is the assignment of the binomial probabilities above.

Binomial random variable: X = number of S in n binomial trials.

***************************************************************************

HYPERGEOMETRIC EXPERIMENT, probabilities, and random variable.

Hypergeometric experiment: A box contains r red chips and w white chips, where r + w = N (the total number of chips). We draw n chips at random, without replacement, and we are interested in the number of, say, red chips.

Hypergeometric probabilities: The probability of k red chips in a hypergeometric experiment is

$P(k \text{ red chips among } n \text{ sampled}) = \dfrac{\binom{r}{k}\binom{w}{n-k}}{\binom{N}{n}}$.   (2)

The hypergeometric distribution is the assignment of the hypergeometric probabilities above.

Hypergeometric random variable: X = number of red chips among the n sampled chips. (A numerical check of formulas (1) and (2) appears below, following the discussion of discrete random variables.)

DISCRETE RANDOM VARIABLES (RVs)

Definition. Suppose a sample space S has a finite or countable number of simple outcomes. Let p be a real-valued function on S such that
1. $0 \le p(s) \le 1$ for every element s of S;
2. $\sum_{s \in S} p(s) = 1$.
Then p is said to be a discrete probability function.

NOTE: For any event A defined on S: $P(A) = \sum_{s \in A} p(s)$.

Definition. A real-valued function $X : S \to \mathbb{R}$ is called a random variable.

Definition. A random variable with finitely or countably many values is called a discrete random variable.

Definition. Any discrete random variable X is described by its probability density function (or probability mass function), denoted $p_X(k)$, which provides the probabilities of all values of X as follows:

$p_X(k) = P(s \in S : X(s) = k)$.   (3)

NOTE: For any k not in the range (set of values) of X: $p_X(k) = 0$.

NOTE: For any integers $t \le s$, $P(t \le X \le s) = \sum_{k=t}^{s} P(X = k)$.

NOTATION: For simplicity, we write $p_X(k) = P(X = k)$, thus suppressing the dependence on the sample space.

Examples:
1. Binomial random variable: X = number of S in the binomial experiment with n trials and probability of S equal to p, written $X \sim \mathrm{Bin}(n, p)$:

$p_X(k) = P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^k (1-p)^{n-k}$, for $k = 0, 1, 2, \ldots, n$.   (4)

2. Hypergeometric random variable: X = number of red chips selected among the n sampled in a hypergeometric experiment.

Definition. Let X be a discrete random variable. For any real number t, the cumulative distribution function (cdf) of X at t is given by

$F_X(t) = P(X \le t) = P(s \in S : X(s) \le t)$.   (5)

Linear transformation: Let X be a discrete random variable (rv) and let Y = aX + b, where a and b are real constants, $a \ne 0$. Then $p_Y(y) = p_X\!\left(\frac{y-b}{a}\right)$.
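Formulas (1) and (2) are easy to sanity-check numerically. Below is a minimal Python sketch (not part of the original notes; the function names and parameter values are our own choices) that evaluates both pmfs with the standard-library function math.comb and verifies that each assignment of probabilities sums to 1:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Formula (1): P(k successes in n independent trials, success prob p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeometric_pmf(k, n, r, w):
    """Formula (2): P(k red among n drawn without replacement from r red + w white)."""
    return comb(r, k) * comb(w, n - k) / comb(r + w, n)

# each distribution assigns total probability 1
n, p = 10, 0.3
assert abs(sum(binomial_pmf(k, n, p) for k in range(n + 1)) - 1.0) < 1e-12

r, w, m = 5, 7, 4   # draw m = 4 chips from a box of 5 red and 7 white
assert abs(sum(hypergeometric_pmf(k, m, r, w) for k in range(m + 1)) - 1.0) < 1e-12
```

Note that math.comb(n, k) returns 0 when k > n, which conveniently handles impossible counts such as asking for more red chips than the box contains.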
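Similarly, the cdf in (5) and the linear-transformation rule can be illustrated on a small binomial pmf. A hypothetical sketch, with our own choice of $X \sim \mathrm{Bin}(3, 1/2)$ and $Y = 2X + 1$:

```python
from math import comb

# pmf of X ~ Bin(3, 1/2) as a dict mapping value -> probability
pmf_x = {k: comb(3, k) * 0.5**3 for k in range(4)}

def cdf_x(t):
    """Formula (5): F_X(t) = P(X <= t), summing the pmf over values k <= t."""
    return sum(p for k, p in pmf_x.items() if k <= t)

# linear transformation Y = aX + b: p_Y(y) = p_X((y - b)/a)
a, b = 2, 1
pmf_y = {a * k + b: p for k, p in pmf_x.items()}

assert abs(cdf_x(1.5) - (pmf_x[0] + pmf_x[1])) < 1e-12   # F_X(1.5) = P(X=0) + P(X=1)
assert pmf_y[3] == pmf_x[1]                              # y = 3 corresponds to k = (3-1)/2 = 1
```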
CONTINUOUS RANDOM VARIABLES (RVs)

Suppose a sample space S is uncountable, like the real numbers. We think of such an S as "identical" to the real numbers. We can define a probability structure on such a space using a special function f, called a probability density function (pdf).

Definition. A probability function P on a set of real numbers S is called continuous if there exists a function f(t) such that for any closed interval $[a, b] \subset S$ we have $P([a, b]) = \int_a^b f(t)\,dt$.

For a function f to be a pdf of some probability function P, it is necessary and sufficient that:
1. $f(t) \ge 0$ for every t;
2. $\int_{-\infty}^{\infty} f(t)\,dt = 1$.

NOTE: If P is such that $P(A) = \int_A f(t)\,dt$ for all A, then P satisfies all the Kolmogorov probability axioms.

Definition. Any function Y that maps S (a subset of the real numbers) into the real numbers is called a continuous random variable. The pdf of Y is a function f such that

$P(a \le Y \le b) = \int_a^b f(t)\,dt$.

For any event A defined on S: $P(A) = \int_A f(t)\,dt$.

Definition. The cdf of a continuous random variable Y with pdf f is the function $F_Y$ such that

$F_Y(y) = P(Y \le y) = P(\{s \in S : Y(s) \le y\}) = \int_{-\infty}^{y} f(t)\,dt$ for any real y.

Theorem. If $F_Y(t)$ is the cdf and $f_Y(t)$ the pdf of a continuous random variable Y, then

$\frac{d}{dt} F_Y(t) = f_Y(t)$.

Properties of the cdf. Let Y be a continuous random variable with cdf $F_Y$. Then the following are true:
1. $P(Y > y) = 1 - F_Y(y)$;
2. $P(a < Y < b) = F_Y(b) - F_Y(a)$;
3. $\lim_{y \to \infty} F_Y(y) = 1$;
4. $\lim_{y \to -\infty} F_Y(y) = 0$.

Theorem. The cdf of any random variable is a nondecreasing function.

Linear transformation: Let X be a continuous random variable with pdf $f_X$, and let Y = aX + b, where a and b are real constants, $a \ne 0$. Then the pdf of Y is

$g_Y(y) = \frac{1}{|a|} f_X\!\left(\frac{y-b}{a}\right)$.

(Both the derivative theorem and this transformation formula are checked numerically in the sketch that follows the general framework discussion below.)

MORE ON RANDOM VARIABLES

A general framework for random variables:

Definition. A triple (S, E, P), where S is a sample space, E is a set of subsets of S (the events), and P is a probability function on S (that is, $P : E \to [0, 1]$ satisfying Kolmogorov's axioms), is called a probability space.

Definition. A real-valued function X on S, $X : S \to \mathbb{R}$, is called a random variable.

Definition. The probability function (measure) defined on the real numbers $\mathbb{R}$ by

$P(A) = P(X \in A) = P(\{s \in S : X(s) \in A\})$

is called the probability distribution of X.

Theorem. The distribution of X is uniquely determined by the cumulative distribution function (cdf) $F_X$ of X:

$F_X(x) = P(X \le x) = P((-\infty, x])$.

Properties of the cdf:
1. F is nondecreasing: if $x_1 \le x_2$, then $F(x_1) \le F(x_2)$;
2. F is right-continuous: for any x, $\lim_{y \to x^+} F(y) = F(x)$;
3. $\lim_{x \to \infty} F(x) = 1$;
4. $\lim_{x \to -\infty} F(x) = 0$.

NOTE: Here are two useful rules for computing probabilities:
1. For a sequence of increasing sets $A_1 \subset A_2 \subset \ldots$, the probability of their union is the limit of their probabilities: $P(\bigcup_{i=1}^{\infty} A_i) = \lim_{i \to \infty} P(A_i)$.
2. For a sequence of decreasing sets $A_1 \supset A_2 \supset \ldots$, the probability of their intersection is the limit of their probabilities: $P(\bigcap_{i=1}^{\infty} A_i) = \lim_{i \to \infty} P(A_i)$.

Types of distributions. There are three main types of distributions/random variables:
1. Discrete random variable: the cdf is a step function; X has at most countably many values.
2. Continuous random variable: the cdf is a continuous function; the set of values contains intervals.
3. Mixed random variable: the cdf is neither continuous nor a step function.

Theorem. For any continuous random variable, P(X = a) = 0 for every real number a.
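As a numerical illustration of the relationship $\frac{d}{dt} F_Y(t) = f_Y(t)$ and of the linear-transformation formula, here is a Python sketch. The exponential density with rate $\lambda = 2$ is our own choice of example, not one from the notes:

```python
import math

lam = 2.0                                                    # illustrative exponential rate
f = lambda t: lam * math.exp(-lam * t) if t >= 0 else 0.0    # pdf
F = lambda t: 1 - math.exp(-lam * t) if t >= 0 else 0.0      # cdf

# d/dt F(t) = f(t): central finite difference at t = 0.7
t, h = 0.7, 1e-6
deriv = (F(t + h) - F(t - h)) / (2 * h)
assert abs(deriv - f(t)) < 1e-5

# linear transformation Y = aX + b has pdf g_Y(y) = (1/|a|) f_X((y - b)/a)
a, b = 3.0, -1.0
g = lambda y: f((y - b) / a) / abs(a)

# P(b <= Y <= a + b) must equal P(0 <= X <= 1) = F(1) - F(0);
# check with a crude midpoint Riemann sum over y in [b, a + b]
n = 100_000
width = a / n
riemann = sum(g(b + (i + 0.5) * width) * width for i in range(n))
assert abs(riemann - (F(1) - F(0))) < 1e-4
```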
EXPECTED VALUES OF RANDOM VARIABLES

To get an idea about the central tendency of a random variable, we compute its expected value (mean).

Definition. Let X be a random variable.
1. If X is a discrete random variable with pmf $p_X(k)$, then the expected value of X is given by

$E(X) = \mu = \mu_X = \sum_{\text{all } k} k \cdot p_X(k) = \sum_{\text{all } k} k \cdot P(X = k)$.

2. If X is a continuous random variable with pdf f, then

$E(X) = \mu = \mu_X = \int_{-\infty}^{\infty} x f(x)\,dx$.

3. If X is a mixed random variable with cdf F, then the expected value of X is given by

$E(X) = \mu = \mu_X = \int_{-\infty}^{\infty} x F'(x)\,dx + \sum_{\text{all } k} k \cdot P(X = k)$,

where F' is the derivative of F where the derivative exists, and the k's in the summation are the "discrete" values of X.

NOTE: For the expectation of a random variable to exist, we assume that all integrals and sums in the definition above converge absolutely.

Median of a random variable: a value "dividing the distribution of X in half." If X is a discrete random variable, its median m is the point for which P(X < m) = P(X > m). If there are two values m and m' such that $P(X \le m) = 0.5$ and $P(X \ge m') = 0.5$, the median is their average, (m + m')/2. If X is a continuous random variable with pdf f, the median is the solution of the equation

$\int_{-\infty}^{m} f(x)\,dx = 0.5$.

EXPECTED VALUES OF A FUNCTION OF A RANDOM VARIABLE

Theorem. Let X be a random variable, and let g(·) be a function of X. If X is discrete with pmf $p_X(k)$, then the expected value of g(X) is given by

$E\,g(X) = \sum_{\text{all } k} g(k) \cdot p_X(k) = \sum_{\text{all } k} g(k) \cdot P(X = k)$,

provided that $\sum_{\text{all } k} |g(k)|\, p_X(k)$ is finite. If X is a continuous random variable with pdf $f_X(x)$, and if g is a continuous function, then the expected value of g(X) is given by

$E\,g(X) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$,

provided that $\int_{-\infty}^{\infty} |g(x)|\, f_X(x)\,dx$ is finite.

NOTE: Expected value is a linear operator; that is, E(aX + b) = aE(X) + b for any rv X with finite mean.

VARIANCE and HIGHER MOMENTS of A RANDOM VARIABLE

To get an idea about the variability of a random variable, we look at measures of spread. These include the variance and the standard deviation.

Definition. The variance of a random variable, denoted Var(X) or $\sigma^2$, is the average of its squared deviations from the mean $\mu$. Let X be a random variable.
1. If X is a discrete random variable with pmf $p_X(k)$ and mean $\mu_X$, then the variance of X is given by

$\mathrm{Var}(X) = \sigma^2 = E[(X - \mu_X)^2] = \sum_{\text{all } k} (k - \mu_X)^2 p_X(k) = \sum_{\text{all } k} (k - \mu_X)^2 P(X = k)$.

2. If X is a continuous random variable with pdf f and mean $\mu_X$, then

$\mathrm{Var}(X) = \sigma^2 = E[(X - \mu_X)^2] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx$.

3. If X is a mixed random variable with cdf F and mean $\mu_X$, then the variance of X is given by

$\mathrm{Var}(X) = \sigma^2 = E[(X - \mu_X)^2] = \int_{-\infty}^{\infty} (x - \mu_X)^2 F'(x)\,dx + \sum_{\text{all } k} (k - \mu_X)^2 P(X = k)$,

where F' is the derivative of F where the derivative exists, and the k's in the summation are the "discrete" values of X.

NOTE: If $E(X^2)$ is not finite, then the variance does not exist.

Definition. The standard deviation $\sigma$ of a rv X is the square root of its variance (if it exists): $\sigma = \sqrt{\mathrm{Var}(X)}$.

NOTE: The units of the variance are the square of the units of the random variable. The units of the standard deviation are the same as the units of the random variable.

Theorem. Let X be a random variable with variance $\sigma^2$. Then we can compute $\sigma^2$ as follows:

$\mathrm{Var}(X) = \sigma^2 = E(X^2) - \mu_X^2 = E(X^2) - [E(X)]^2$.

Theorem. Let X be a rv with variance $\sigma^2$. Then the variance of aX + b, for any real a and b, is given by

$\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$.

HIGHER MOMENTS OF A RANDOM VARIABLE

The expected value is called the first moment of a random variable. The variance is called the second central moment, or second moment about the mean. In general, we have the following definition of the ordinary and central moments of random variables.

Definition. Let X be an rv. Then:
1. The r-th moment of X (about the origin) is $\mu_r = E(X^r)$, provided that the moment exists.
2. The r-th moment of X about the mean is $\mu'_r = E[(X - \mu_X)^r]$, provided that the moment exists.
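To make the definitions above concrete, the sketch below computes E(X), $E(X^2)$ (an instance of E g(X) with $g(k) = k^2$), and the shortcut $E(X^2) - [E(X)]^2$ for a small binomial pmf. It checks the results against the standard binomial identities E(X) = np and Var(X) = np(1 - p), which are assumed facts here, not derived in these notes:

```python
from math import comb

# X ~ Bin(4, 0.25); parameters are illustrative
n, p = 4, 0.25
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(k * pk for k, pk in pmf.items())        # E(X)
ex2  = sum(k * k * pk for k, pk in pmf.items())    # E(X^2), i.e. E g(X) with g(k) = k^2
var  = ex2 - mean**2                               # shortcut: Var(X) = E(X^2) - [E(X)]^2

assert abs(mean - n * p) < 1e-12
assert abs(var - n * p * (1 - p)) < 1e-12
```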
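The continuous median condition $\int_{-\infty}^{m} f(x)\,dx = 0.5$ can be solved numerically by bisection on the cdf. A sketch, again using an illustrative exponential density; its median has the closed form $\ln 2 / \lambda$, which we use only to verify the result:

```python
import math

lam = 1.5                                  # illustrative exponential rate
F = lambda m: 1 - math.exp(-lam * m)       # cdf; the median solves F(m) = 0.5

lo, hi = 0.0, 10.0                         # bracket known to contain the median
for _ in range(60):                        # bisection: halve the bracket 60 times
    mid = (lo + hi) / 2
    if F(mid) < 0.5:
        lo = mid
    else:
        hi = mid

assert abs((lo + hi) / 2 - math.log(2) / lam) < 1e-12
```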
JOINT DENSITIES - RANDOM VECTORS

Joint densities describe probability distributions of random vectors. A random vector X is an n-dimensional vector whose components are random variables: $X = (X_1, X_2, \ldots, X_n)$, where all $X_i$'s are random variables. We will concentrate on bivariate random vectors, that is, n = 2.

Discrete random vectors are described by the joint probability density function of X and Y (or joint pdf), denoted

$p_{X,Y}(x, y) = P(s \in S : X(s) = x, Y(s) = y) = P(X = x, Y = y)$.

Another name for the joint pdf of a discrete random vector is the joint probability mass function (pmf).

Computing probabilities for discrete random vectors. For any subset A of $\mathbb{R}^2$, we have

$P((X, Y) \in A) = \sum_{(x,y) \in A} P(X = x, Y = y) = \sum_{(x,y) \in A} p_{X,Y}(x, y)$.

Continuous random vectors are described by the joint probability density function of X and Y (or joint pdf), denoted $f_{X,Y}(x, y)$. The joint pdf has the following properties:
1. $f_{X,Y}(x, y) \ge 0$ for every $(x, y) \in \mathbb{R}^2$;
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$;
3. For any region A in the xy-plane, $P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy$.

Marginal distributions. Let (X, Y) be a continuous/discrete random vector having a joint distribution with pdf/pmf f(x, y). The one-dimensional distributions of X and Y are called the marginal distributions. We compute them as follows. If (X, Y) is a discrete vector, then the distributions of X and Y are given by

$f_X(x) = \sum_{\text{all } y} P(X = x, Y = y)$ and $f_Y(y) = \sum_{\text{all } x} P(X = x, Y = y)$.

If (X, Y) is a continuous vector, then the distributions of X and Y are given by

$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$.

Joint cdf of a vector (X, Y). The joint cumulative distribution function of X and Y (or joint cdf) is defined by

$F_{X,Y}(u, v) = P(X \le u, Y \le v)$.

Theorem. Let $F_{X,Y}(u, v)$ be the joint cdf of the vector (X, Y). Then the joint pdf of (X, Y) is given by the second mixed partial derivative of the cdf:

$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x \partial y} F_{X,Y}(x, y)$,

provided that $F_{X,Y}(x, y)$ has continuous second partial derivatives.

Multivariate random vectors: please read on page 215.

INDEPENDENT RANDOM VARIABLES

Definition. Two random variables are called independent iff for all intervals A and B on the real line,

$P(X \in A \text{ and } Y \in B) = P(X \in A)\, P(Y \in B)$.

Theorem. The random variables X and Y are independent iff $f_{X,Y}(x, y) = f_X(x) f_Y(y)$, where $f_{X,Y}(x, y)$ is the joint pdf of (X, Y), and $f_X(x)$ and $f_Y(y)$ are the marginal densities of X and Y, respectively.

NOTE: Random variables X and Y are independent iff $F_{X,Y}(x, y) = F_X(x) F_Y(y)$, where $F_{X,Y}(x, y)$ is the joint cdf of (X, Y), and $F_X(x)$ and $F_Y(y)$ are the marginal cdf's of X and Y, respectively.

Independence of more than 2 rv's. A set of n random variables $X_1, X_2, \ldots, X_n$ is independent iff their joint pdf is the product of the marginal pdf's, that is,

$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n)$,

where the left-hand side is the joint pdf of the vector $(X_1, X_2, \ldots, X_n)$ and $f_{X_1}(x_1), \ldots, f_{X_n}(x_n)$ are the marginal pdf's of the individual variables.

Random sample. A random sample of size n from a distribution f is a set $X_1, X_2, \ldots, X_n$ of independent and identically distributed (iid) random variables, each with distribution f.
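The marginal and independence definitions can be checked mechanically on a small joint pmf table. In the sketch below, the table values are our own and were chosen so that the joint pmf factors into its marginals:

```python
# joint pmf p(x, y) for (X, Y), each taking values in {0, 1}
joint = {(0, 0): 0.12, (0, 1): 0.28,
         (1, 0): 0.18, (1, 1): 0.42}

# marginals: p_X(x) sums over y, p_Y(y) sums over x
p_x = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

# X and Y are independent iff the joint factors into the marginals everywhere
independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12
                  for x in (0, 1) for y in (0, 1))
assert independent   # e.g. 0.12 = 0.4 * 0.3 and 0.42 = 0.6 * 0.7
```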
CONDITIONAL DISTRIBUTIONS

Let (X, Y) be a random vector with some joint pdf or pmf. Consider the problem of finding the probability that X = x AFTER a value of Y was observed. To do that, we develop the conditional distribution of X given Y = y.

Definition. If (X, Y) is a discrete random vector with pmf $p_{X,Y}(x, y)$, and if P(Y = y) > 0, then the conditional distribution of X given Y = y is given by the conditional pmf

$p_{X|Y=y}(x) = \dfrac{p_{X,Y}(x, y)}{p_Y(y)}$.

Similarly, if P(X = x) > 0, then the conditional distribution of Y given X = x is given by the conditional pmf $p_{Y|X=x}(y) = \frac{p_{X,Y}(x, y)}{p_X(x)}$. (A numerical sketch appears at the end of this section.)

Definition. If (X, Y) is a continuous random vector with pdf $f_{X,Y}(x, y)$, and if $f_Y(y) > 0$, then the conditional distribution of X given Y = y is given by the conditional pdf

$f_{X|Y=y}(x) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$.

Similarly, if $f_X(x) > 0$, then the conditional distribution of Y given X = x is given by the conditional pdf $f_{Y|X=x}(y) = \frac{f_{X,Y}(x, y)}{f_X(x)}$.

Independence and conditional distributions. If random variables X and Y are independent, then their marginal pdf/pmf's are the same as their conditional pdf/pmf's. That is,

$f_{Y|X=x}(y) = f_Y(y)$ and $f_{X|Y=y}(x) = f_X(x)$,

for all y and x where $f_X(x) > 0$ and $f_Y(y) > 0$, respectively.

FUNCTIONS OF RANDOM VARIABLES, or COMBINING RANDOM VARIABLES: PDF OF A SUM, PRODUCT, AND QUOTIENT OF TWO RVs

Let X and Y be independent random variables with pdf's $f_X$ and $f_Y$ (or pmf's $p_X$ and $p_Y$), respectively.

If X and Y are discrete random variables, then the pmf of their sum W = X + Y is

$p_W(w) = \sum_{\text{all } x} p_X(x)\, p_Y(w - x)$.

If X and Y are continuous random variables, then the pdf of their sum W = X + Y is the convolution of the individual densities:

$f_W(w) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(w - x)\,dx$.

(The discrete convolution is illustrated with two dice at the end of this section.)

If X and Y are independent continuous random variables, then the pdf of their quotient W = Y/X is given by

$f_W(w) = \int_{-\infty}^{\infty} |x|\, f_X(x)\, f_Y(wx)\,dx$.

The above formula is valid if X equals zero on at most a set of isolated points (no intervals).

If X and Y are independent continuous random variables, then the pdf of their product W = XY is given by

$f_W(w) = \int_{-\infty}^{\infty} \frac{1}{|x|}\, f_X(w/x)\, f_Y(x)\,dx$.

MORE PROPERTIES OF MEAN AND VARIANCE

Mean of a function of two random variables. Let (X, Y) be a discrete random vector with pmf p(x, y), or a continuous random vector with pdf f(x, y). Let g(x, y) be a real-valued function of X and Y. Then the expected value of the random variable g(X, Y) is given by

$E\,g(X, Y) = \sum_{\text{all } x}\sum_{\text{all } y} g(x, y)\, p(x, y)$ in the discrete case, and

$E\,g(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f(x, y)\,dx\,dy$ in the continuous case,

provided that the sums and the integrals converge absolutely.

Mean of a sum of random variables. Let X and Y be any random variables, and a and b real numbers. Then

$E(aX + bY) = aE(X) + bE(Y)$,

provided both expectations are finite.

NOTE: Let $X_1, X_2, \ldots, X_n$ be any random variables with finite means, and let $a_1, a_2, \ldots, a_n$ be a set of real numbers. Then

$E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n)$.

Mean of a product of independent random variables. If X and Y are independent random variables with finite expectations, then E(XY) = E(X) E(Y).

Variance of a sum of independent random variables. Let $X_1, X_2, \ldots, X_n$ be independent random variables with finite second moments (i.e., $E(X_i^2) < \infty$). Then

$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)$.

NOTE: Let $X_1, X_2, \ldots, X_n$ be independent random variables with finite second moments, and let $a_1, a_2, \ldots, a_n$ be a set of real numbers. Then

$\mathrm{Var}(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1^2 \mathrm{Var}(X_1) + a_2^2 \mathrm{Var}(X_2) + \cdots + a_n^2 \mathrm{Var}(X_n)$.
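To illustrate the conditional pmf definition $p_{X|Y=y}(x) = p_{X,Y}(x, y)/p_Y(y)$, here is a sketch with a small, deliberately dependent joint table (the values are our own). Note that each conditional pmf sums to 1 over x:

```python
# a dependent joint pmf for (X, Y), each in {0, 1}
joint = {(0, 0): 0.30, (0, 1): 0.10,
         (1, 0): 0.20, (1, 1): 0.40}

p_y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

def cond_x_given_y(x, y):
    """Conditional pmf p_{X|Y=y}(x), defined whenever p_Y(y) > 0."""
    return joint[(x, y)] / p_y[y]

# each conditional pmf is a genuine pmf: it sums to 1 over x
for y in (0, 1):
    assert abs(sum(cond_x_given_y(x, y) for x in (0, 1)) - 1.0) < 1e-12

print(cond_x_given_y(0, 1))   # 0.10 / 0.50 = 0.2, versus marginal p_X(0) = 0.4
```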
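The discrete convolution formula for a sum is easy to see with two fair dice (our example, not from the notes): $p_W(w) = \sum_x p_X(x)\, p_Y(w - x)$ reproduces the familiar triangular distribution of the total.

```python
from collections import defaultdict

die = {k: 1/6 for k in range(1, 7)}   # pmf of one fair die

# p_W(w) = sum over x of p_X(x) * p_Y(w - x) for W = X + Y, X and Y independent
p_w = defaultdict(float)
for x, px in die.items():
    for y, py in die.items():
        p_w[x + y] += px * py

assert abs(p_w[7] - 6/36) < 1e-12              # seven is the most likely total
assert abs(sum(p_w.values()) - 1.0) < 1e-12    # the convolution is again a pmf
```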
ORDER STATISTICS

Definition. Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous distribution with pdf f(x). Let $x_1, x_2, \ldots, x_n$ be the values of the sample. Order the sample values from smallest to largest: $x_{(1)} < x_{(2)} < \ldots < x_{(n)}$. Define the random variable $X_{(i)}$ to have the value $x_{(i)}$, $1 \le i \le n$. Then $X_{(i)}$ is called the i-th order statistic.

Special order statistics: the maximum $X_{(n)}$ and the minimum $X_{(1)}$ of a random sample.

Theorem. Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous distribution with pdf f(x) and cdf F(x). The pdf of the i-th order statistic is given by

$f_{X_{(i)}}(x) = \dfrac{n!}{(i-1)!\,(n-i)!}\, [F(x)]^{i-1} [1 - F(x)]^{n-i} f(x)$, for any $1 \le i \le n$.

(A simulation check of this formula appears at the end of these notes.)

MOMENT GENERATING FUNCTIONS

Definition. Let X be a random variable. The moment generating function (mgf) $M_X(t)$ of X is given by

$M_X(t) = E\,e^{tX} = \begin{cases} \sum_{\text{all } k} e^{tk} p_X(k) & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{\infty} e^{tx} f(x)\,dx & \text{if } X \text{ is continuous,} \end{cases}$

at all real values t for which the expectation exists.

Uses for mgf's. Moment generating functions are very useful in mathematical statistics. They are primarily used for two purposes:
1. finding moments of random variables, and
2. identifying distributions of random variables.

Theorem. Let X be a continuous random variable with pdf f(x). Assume that the pdf f(x) is sufficiently smooth for the order of differentiation and integration to be exchanged. Let $M_X(t)$ be the mgf of X. Then

$M_X^{(r)}(0) = E(X^r)$,

provided that the r-th moment of X exists.

Theorem: Identifying distributions. Suppose that $X_1$ and $X_2$ are random variables with pdf/pmf's $f_{X_1}(x)$ and $f_{X_2}(x)$, respectively. Suppose that $M_{X_1}(t) = M_{X_2}(t)$ for all t in a neighborhood of 0. Then $X_1$ and $X_2$ are equal in distribution, that is, $f_{X_1}(x) = f_{X_2}(x)$.

Theorem: Properties of mgf's. This theorem describes the mgf of a linear function and of a sum of independent random variables.
1. Mgf of a linear function. Suppose that X is a random variable with mgf $M_X(t)$, and let Y = aX + b, where a and b are real numbers. Then the mgf of Y is given by

$M_Y(t) = e^{bt} M_X(at)$.

2. Mgf of a sum of independent rv's. Suppose $X_1, X_2, \ldots, X_n$ are independent random variables with mgf's $M_{X_1}(t), M_{X_2}(t), \ldots, M_{X_n}(t)$, respectively. Then the mgf of their sum is the product of the mgf's, that is,

$M_{X_1 + X_2 + \cdots + X_n}(t) = M_{X_1}(t) \cdot M_{X_2}(t) \cdots M_{X_n}(t)$.
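The order-statistic pdf can be spot-checked by simulation. For the minimum of n iid Uniform(0, 1) variables (our choice of example), the theorem with i = 1, F(x) = x, and f(x) = 1 gives $f_{X_{(1)}}(x) = n(1 - x)^{n-1}$ on (0, 1), hence $P(X_{(1)} \le x) = 1 - (1 - x)^n$. A sketch:

```python
import random

# minimum of n iid Uniform(0,1) draws, repeated many times
n, trials = 5, 200_000
random.seed(0)                                # fixed seed for reproducibility
mins = [min(random.random() for _ in range(n)) for _ in range(trials)]

# compare P(X_(1) <= 0.2) against the exact value 1 - (1 - 0.2)^n
empirical = sum(m <= 0.2 for m in mins) / trials
exact = 1 - 0.8**n
assert abs(empirical - exact) < 0.01          # within Monte Carlo error
```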
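Finally, both uses of the mgf can be demonstrated numerically: differentiating $M_X(t)$ at 0 recovers a moment, and the mgf of an independent sum is the product of the mgf's. A sketch for $X \sim \mathrm{Bin}(n, p)$; the parameters are illustrative, and the identity E(X) = np is a standard binomial fact assumed here:

```python
from math import comb, exp

# mgf of X ~ Bin(n, p): M_X(t) = sum over k of e^{tk} p_X(k)
n, p = 6, 0.4
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
M = lambda t: sum(exp(t * k) * pk for k, pk in pmf.items())

# M'(0) = E(X) = np, via a central finite difference
h = 1e-6
first_moment = (M(h) - M(-h)) / (2 * h)
assert abs(first_moment - n * p) < 1e-6

# the mgf of a sum of two independent copies is the product of the mgf's,
# so its derivative at 0 is the mean of the sum, 2np
M_sum = lambda t: M(t) * M(t)
assert abs((M_sum(h) - M_sum(-h)) / (2 * h) - 2 * n * p) < 1e-5
```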