Random Variables
M. George Akritas

Outline
Random Variables and Their Distribution
  Discrete and Continuous Random Variables
  The Probability Mass Function
  The (Cumulative) Distribution Function
The Expected Value of Discrete Random Variables
  The Expected Value in the Simplest Case
  General Definition for Discrete RVs
  Types of Random Variables

Random Variables and Their Distribution

Definition. Let S be the sample space of some probabilistic experiment. A function X : S → R is called a random variable.

Example.
1. A unit is selected at random from a population of units, so S is the collection of units in the population, and a characteristic (weight, volume, or opinion on a certain matter) is recorded. A numerical description of the outcome is a random variable.
2. S = {s = (x1, ..., xn) : xi ∈ R, ∀i}, with X(s) = Σ_i xi, or X(s) = x̄, or X(s) = max{x1, ..., xn}.
3. S = {s : 0 ≤ s < ∞} (e.g., we may be recording the lifetime of an electrical component), with X(s) = I(s > 1500), or X(s) = √s, or X(s) = log(s).

- A random variable X induces a probability measure on the range of its values, which is denoted by X(S). (SX in the book.)
- X(S) can be thought of as the sample space of a compound experiment which consists of the original experiment and the subsequent transformation of the outcome into a numerical value.
Because the value X(s) of the random variable X is determined by the outcome s, we may assign probabilities to the possible values of X. For example, if a die is rolled and we define X(s) = 1 for s = 1, 2, 3, 4, and X(s) = 0 for s = 5, 6, then P(X = 1) = 4/6 and P(X = 0) = 2/6.

The probability measure PX, induced on X(S) by the random variable X, is called the (probability) distribution of X. The distribution of a random variable is considered known if the probabilities PX((a, b]) = P(a < X ≤ b) are known for all a < b.

Definition. A random variable X is called discrete if X(S) is a finite or a countably infinite set. If X(S) is uncountably infinite, then X is called continuous.

- For discrete r.v.'s X, PX is completely specified by the probabilities PX({k}) = P(X = k), for each k ∈ X(S).
- The function p(x) = P(X = x) is called the probability mass function (pmf) of X.

Example. Consider a batch of size N = 10 products, 3 of which are defective. Draw 3 at random and without replacement, and let the r.v. X denote the number of defective items. Find the pmf of X.

Solution: The sample space of X is SX = {0, 1, 2, 3}, and

P(X = 0) = \binom{3}{0}\binom{7}{3} / \binom{10}{3} = 0.292,  P(X = 1) = \binom{3}{1}\binom{7}{2} / \binom{10}{3} = 0.525,
P(X = 2) = \binom{3}{2}\binom{7}{1} / \binom{10}{3} = 0.175,  P(X = 3) = \binom{3}{3}\binom{7}{0} / \binom{10}{3} = 0.008.

Thus, the pmf of X is

x     0      1      2      3
p(x)  0.292  0.525  0.175  0.008

Figure: bar graph of this probability mass function.

Example. Three balls are selected at random and without replacement from an urn containing 20 balls numbered 1 through 20. Find the probability that at least one of the balls will have number ≥ 17.

Solution: Here S = {s = (i1, i2, i3) : 1 ≤ i1, i2, i3 ≤ 20, all distinct}, X(s) = max{i1, i2, i3}, X(S) = {3, 4, ..., 20}, and we want P(X ≥ 17) = P(X = 17) + P(X = 18) + P(X = 19) + P(X = 20). These are found from the formula

P(X = k) = \binom{k−1}{2} / \binom{20}{3}  (why?)

The end result is P(X ≥ 17) = 0.508.

The PMF of a Function of X

Let X be a discrete random variable with range (i.e., set of possible values) 𝒳 and pmf pX, and let Y = g(X) be a function of X with range 𝒴. Then the pmf pY(y) of Y is given in terms of the pmf pX(x) of X by

pY(y) = Σ_{x ∈ 𝒳 : g(x) = y} pX(x), for all y ∈ 𝒴.

Example. Roll a die and let X denote the outcome. If X = 1 or 2, you win $1; if X = 3 you win $2; and if X ≥ 4 you win $4. Let Y denote your prize. Find the pmf of Y.

Solution: The pmf of Y is

y      1      2      4
pY(y)  0.333  0.167  0.5

Definition. The function FX : R → [0, 1] (or simply F if no confusion is possible) defined by FX(x) = P(X ≤ x) = PX((−∞, x]) is called the (cumulative) distribution function of the rv X.

Proposition. FX determines the probability distribution, PX, of X.

Proof: PX is determined by its values PX((a, b]) over all intervals (a, b], and PX((a, b]) is determined from FX by PX((a, b]) = FX(b) − FX(a).

Example. Consider a batch of size N = 10 products, 3 of which are defective. Draw 3 at random and without replacement, and let the r.v. X denote the number of defectives. Find the cdf of X.

Solution:

x     0      1      2      3
p(x)  0.292  0.525  0.175  0.008
F(x)  0.292  0.817  0.992  1.000

Moreover, F(−1) = 0 and F(1.5) = 0.817. Also, p(1) = F(1) − F(0).

Example. Consider a random variable X with cumulative distribution function given by

F(x) = 0,    for all x < 1,
F(x) = 0.4,  for all x such that 1 ≤ x < 2,
F(x) = 0.7,  for all x such that 2 ≤ x < 3,
F(x) = 0.9,  for all x such that 3 ≤ x < 4,
F(x) = 1,    for all x ≥ 4.
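As a sketch, the defectives pmf above and its cdf can be reproduced with `math.comb` from Python's standard library:

```python
from math import comb

# Batch of N = 10 products, 3 defective; draw n = 3 without replacement.
# X = number of defectives drawn; P(X = x) = C(3, x) C(7, 3 - x) / C(10, 3).
pmf = {x: comb(3, x) * comb(7, 3 - x) / comb(10, 3) for x in range(4)}

# The cdf accumulates the pmf: F(x) = sum of p(k) for k <= x.
cdf, running = {}, 0.0
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

print({x: round(p, 3) for x, p in pmf.items()})  # {0: 0.292, 1: 0.525, 2: 0.175, 3: 0.008}
print({x: round(F, 3) for x, F in cdf.items()})  # {0: 0.292, 1: 0.817, 2: 0.992, 3: 1.0}
```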
Figure: The CDF of a Discrete Distribution is a Step or Jump Function.

Example. Let X have the cdf shown above. Use the form of the cdf to deduce the distribution of X.

Solution: Since its cdf is a jump function, we conclude that X is discrete with sample space the jump points of its cdf, i.e., 1, 2, 3, and 4. Finally, the probability with which X takes each value equals the size of the jump at that value (for example, P(X = 1) = 0.4). These deductions are justified as follows: (a) P(X < 1) = 0 means that X cannot take a value less than one; (b) F(1) = 0.4 then implies that P(X = 1) = 0.4; (c) the second of the equations defining F also implies that P(1 < X < 2) = 0, and so on.

Proposition (Properties of the CDF).
1. If a ≤ b then F(a) ≤ F(b).
2. F(−∞) = 0, F(∞) = 1.
3. If a < b, then P(a < X ≤ b) = F(b) − F(a).
4. F(x) is right continuous.
5. If p(x) is the pmf, then
   - F(x) = Σ_{k ≤ x} p(k) and p(k) = F(k) − F(k − 1),
   - F is a jump or step function,
   - the flat regions of F correspond to regions where X takes no values,
   - the size of the jump at each x ∈ SX equals p(x).
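As a sketch of property 5, the pmf can be recovered from the jump sizes of a step-function cdf; the jump points and cdf values below are those of the piecewise example above.

```python
# Recover the pmf of a discrete rv from the jumps of its step-function cdf.
# Jump points and cdf values from the example: F = 0.4, 0.7, 0.9, 1 at x = 1, 2, 3, 4.
jump_points = [1, 2, 3, 4]
F_values = [0.4, 0.7, 0.9, 1.0]

pmf = {}
prev = 0.0  # F(x) = 0 to the left of the first jump
for x, F in zip(jump_points, F_values):
    pmf[x] = round(F - prev, 10)  # the size of the jump at x equals p(x)
    prev = F

print(pmf)  # {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}
```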
The Expected Value of Discrete Random Variables

The Expected Value in the Simplest Case

A unit is selected at random from a population of N units, and the random variable X is the (numerical) value of a characteristic of interest. Let v1, v2, ..., vN be the values of that characteristic for the N units. Then the expected value of X, denoted by µX or E(X), is defined by

E(X) = (1/N) Σ_{i=1}^{N} vi.

Example.
1. Let X denote the outcome of a roll of a die. Find E(X).
2. Let X denote the outcome of a roll of a die that has the six on four sides and the number 8 on the other two sides. Find E(X).

General Definition for Discrete RVs

The expected value, E(X) or µX, of a discrete r.v. X having a possibly infinite sample space SX and pmf p(x) = P(X = x), for x ∈ SX, is defined as

µX = Σ_{x ∈ SX} x p(x).

Example. Roll a die and let X denote the outcome. If X = 1 or 2, you win $1; if X = 3 you win $2; and if X ≥ 4 you win $4. Let Y denote your prize. Find E(Y).

Solution: The pmf of Y is

y      1      2      4
pY(y)  0.333  0.167  0.5

Thus, E(Y) = 0.333 + 2 × 0.167 + 4 × 0.5 = 2.667.

Example. You are given a choice between accepting $3.5² = $12.25 or rolling a die and winning $X². What will you choose and why?

Solution: If the game will be played several times, your decision should be based on the value of E(X²). (Why?) To find this, use 1 + 4 + 9 + 16 + 25 + 36 = 91, so E(X²) = 91/6 ≈ 15.17 > 12.25.

Proposition. Let X be a discrete r.v.
taking values xi, i ≥ 1, with pmf pX. Then

E[g(X)] = Σ_i g(xi) pX(xi).

Example. A product that a particular store location stocks monthly yields a net profit of b dollars for each unit sold and a net loss of ℓ dollars for each unit left unsold at the end of the month. The monthly demand (i.e., number of units ordered) for this product is a rv having pmf p(k), k ≥ 0. If the store stocks s units, find the expected profit, and determine the number of units the store should stock to maximize the expected profit.

Solution: Let X be the monthly demand. The random variable of interest here is the profit

Ys = gs(X) = bX − (s − X)ℓ, if X ≤ s;  Ys = bs, if X > s.

Next,

E(Ys) = sb + (b + ℓ) Σ_{x=0}^{s} (x − s) p(x)

(details in class). To determine the optimum value of s, note that the difference E(Y_{s+1}) − E(Ys) > 0 provided

Σ_{x=0}^{s} p(x) < b / (b + ℓ)

(details in class). Thus, if this inequality holds, stocking s + 1 units is better than stocking s. Let sL be the largest value of s that satisfies the inequality. Then stocking sopt = sL + 1 maximizes the expected profit.

- For constants a, b we have E(aX + b) = aE(X) + b.

Definition. The variance, σX² or Var(X), and standard deviation, σX or SD(X), of a rv X are

σX² = Var(X) = E(X − µX)²,  σX = √(σX²).
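The stocking rule above can be checked numerically. The sketch below uses a hypothetical demand pmf and hypothetical values of b and ℓ (my choices, not from the text), computes E(Ys) directly from the definition of Ys, and compares the brute-force maximizer with the rule sopt = sL + 1.

```python
# Sketch of the expected-profit stocking rule; demand pmf, b, and l are
# illustrative choices, not values from the text.
b, l = 5.0, 2.0                       # profit per unit sold, loss per unsold unit
p = [0.1, 0.2, 0.3, 0.25, 0.1, 0.05]  # pmf of demand X on {0, 1, ..., 5}

def expected_profit(s):
    # E(Y_s): average profit over the demand distribution when stocking s units
    total = 0.0
    for x, px in enumerate(p):
        profit = b * x - (s - x) * l if x <= s else b * s
        total += profit * px
    return total

# Rule from the text: s_opt = s_L + 1, where s_L is the largest s with
# sum_{x <= s} p(x) < b / (b + l).
threshold = b / (b + l)
cum, s_L = 0.0, -1
for s, px in enumerate(p):
    cum += px
    if cum < threshold:
        s_L = s
s_opt_rule = s_L + 1

# Brute-force maximizer of E(Y_s) for comparison
s_opt_brute = max(range(len(p)), key=expected_profit)
print(s_opt_rule, s_opt_brute)  # → 3 3
```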
Proposition. Two common properties of the variance are
1. Var(X) = E(X²) − µX²,
2. Var(aX + b) = a² Var(X).

Types of Random Variables

The Bernoulli Random Variable
- A r.v. X is called Bernoulli if it takes only two values. The two values are referred to as success (S) and failure (F), or are re-coded as 1 and 0. Thus, always, SX = {0, 1}. Experiments resulting in a Bernoulli r.v. are called Bernoulli.

Example.
1. A product is inspected. Set X = 1 if defective, X = 0 if non-defective.
2. A product is put to a life test. Set X = 1 if it lasts more than 1000 hours, X = 0 otherwise.

- If P(X = 1) = p, we write X ∼ Bernoulli(p) to indicate that X is Bernoulli with probability of success p.

If X ∼ Bernoulli(p), then
- its pmf is p(0) = 1 − p, p(1) = p;
- its expected value is E(X) = p;
- its variance is σX² = p(1 − p). (Why?)

The Binomial Random Variable
- An experiment consisting of n independent replications of a Bernoulli experiment is called a binomial experiment.
- If X1, X2, ..., Xn are the Bernoulli r.v.'s for the n Bernoulli experiments, then

  Y = Σ_{i=1}^{n} Xi = the total number of 1s

  is the binomial r.v. Clearly SY = {0, 1, ..., n}.
- We write Y ∼ Bin(n, p) to indicate that Y is binomial with probability of success equal to p for each Bernoulli trial.
If Y ∼ Bin(n, p), then its pmf is

P(Y = k) = \binom{n}{k} p^k (1 − p)^{n−k}, k = 0, 1, ..., n.

Use the identities x\binom{n}{x} = n\binom{n−1}{x−1} and x²\binom{n}{x} = nx\binom{n−1}{x−1} to get:
1. its expected value is E(Y) = np;
2. its variance is σY² = np(1 − p).

Example. A company sells screws in packages of 10 and offers a money-back guarantee if two or more of the screws are defective. If a screw is defective with probability 0.01, independently of other screws, what proportion of the packages sold will the company replace?

Solution: With X ∼ Bin(10, 0.01) the number of defective screws in a package, 1 − P(X = 0) − P(X = 1) ≈ 0.004.

Example. Physical traits, such as eye color, are determined by a pair of genes, each of which can be either dominant (d) or recessive (r): one inherited from the mother and one from the father. Persons with gene pairs (dd), (dr), and (rd) are alike in that physical trait. Assume that a child is equally likely to inherit either of the two genes from each parent. If both parents are hybrid with respect to a particular trait (i.e.
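As a sketch, the screw-package probability follows directly from the binomial pmf, and summing the pmf also confirms the mean and variance formulas:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(Y = k) for Y ~ Bin(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.01  # package of 10 screws, each defective with probability 0.01
# Proportion of packages replaced: P(two or more defectives)
replace_prob = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)
print(round(replace_prob, 3))  # 0.004

# Mean and variance computed from the pmf, matching np and np(1 - p)
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var = sum(k**2 * binom_pmf(k, n, p) for k in range(n + 1)) - mean**2
```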
both have pairs of genes (dr) or (rd)), find the probability that three of their four children will be like their parents in that physical trait.

Solution: The probability that an offspring of two hybrid parents displays the dominant trait, and is thus like its parents, is 0.75. Thus, the desired probability is

\binom{4}{3} 0.75³ 0.25¹ ≈ 0.422.

Example. In order for the defendant to be convicted in a jury trial, at least eight of the twelve jurors must enter a guilty vote. Assume each juror makes the correct decision with probability 0.7, independently of the other jurors. If 40% of defendants in such jury trials are innocent, what is the probability that the jury renders the correct verdict for a randomly selected defendant?

Solution: Let B = {jury renders the correct verdict} and A = {defendant is innocent}. Then, according to the Law of Total Probability,

P(B) = P(B|A)P(A) + P(B|Ac)P(Ac) = P(B|A)·0.4 + P(B|Ac)·0.6.

Next, let X denote the number of jurors who reach the correct decision in a particular trial, so X ∼ Bin(12, 0.7). An innocent defendant is acquitted when fewer than eight jurors vote guilty, i.e., when at least five jurors decide correctly; a guilty defendant is convicted when at least eight jurors decide correctly. Thus,

P(B|A) = P(X ≥ 5) = 1 − Σ_{k=0}^{4} \binom{12}{k} 0.7^k 0.3^{12−k} = 0.9905,
P(B|Ac) = P(X ≥ 8) = Σ_{k=8}^{12} \binom{12}{k} 0.7^k 0.3^{12−k} = 0.724.

Thus, P(B) = 0.9905 × 0.4 + 0.724 × 0.6 = 0.8306.

Example. A communications system consisting of n components works if at least half of its components work.
Suppose it is possible to add components to the system, and that currently the system has n = 2k − 1 components.
1. Show that by adding one component the system becomes more reliable, for all integers k ≥ 1.
2. Show that this is not necessarily the case if we add two components to the system.

Solution: 1. Let An = {the system works when it has n components}. Then

A_{2k−1} = {k or more of the 2k − 1 components work},
A_{2k} = A_{2k−1} ∪ {k − 1 of the original 2k − 1 work, and the 2kth works}.

It follows that A_{2k−1} ⊆ A_{2k}. Thus, P(A_{2k−1}) ≤ P(A_{2k}).

2. Using the same notation,

A_{2k+1} = {k + 1 or more of the original 2k − 1 work}
  ∪ {k of the original 2k − 1 work, and at least one of the 2kth and (2k+1)th work}
  ∪ {k − 1 of the original 2k − 1 work, and both the 2kth and (2k+1)th work}.

It is seen that A_{2k−1} is not a subset of A_{2k+1}, since, for example, A_{2k−1} includes the outcome {exactly k of the original 2k − 1 work, and both added components fail} but A_{2k+1} does not. It is also clear that A_{2k+1} is not a subset of A_{2k−1}. Thus, more information is needed to compare the reliability of the two systems.

Example (Example Continued). Suppose each component functions with probability p independently of the others. For what value of p is a (2k + 1)-component system more reliable than a (2k − 1)-component system?

Solution: Let X denote the number of the first 2k − 1 components that function. Then

P(A_{2k−1}) = P(X ≥ k) = P(X = k) + P(X ≥ k + 1),
P(A_{2k+1}) = P(X ≥ k + 1) + P(X = k)(1 − (1 − p)²) + P(X = k − 1)p²,

and P(A_{2k+1}) − P(A_{2k−1}) > 0 iff p > 0.5.
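As a numerical check of the p > 0.5 threshold, the sketch below compares 3- and 5-component majority systems (i.e., k = 2, my choice of illustration) on either side of 0.5:

```python
from math import comb

def binom_sf(k, n, p):
    # P(X >= k) for X ~ Bin(n, p)
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def system_works(n, p):
    # A system with n = 2k - 1 components works if at least k (a majority) work.
    k = (n + 1) // 2
    return binom_sf(k, n, p)

# With p = 0.6 > 0.5 the 5-component system is more reliable;
# with p = 0.4 < 0.5 the ordering reverses.
print(system_works(5, 0.6) > system_works(3, 0.6))  # True
print(system_works(5, 0.4) < system_works(3, 0.4))  # True
```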
(Details in class.)

The Binomial CDF Tables

Example. Suppose 70% of all purchases in a certain store are made with a credit card. Let X denote the number of credit card uses in the next 10 purchases. Find (a) µX and σX², and (b) P(5 ≤ X ≤ 8).

Solution: It seems reasonable to assume that X ∼ Bin(10, 0.7).
(a) E(X) = np = 10(0.7) = 7, σX² = 10(0.7)(0.3) = 2.1.
(b) Using the binomial table, we have P(5 ≤ X ≤ 8) = P(4 < X ≤ 8) = F(8) − F(4) = 0.851 − 0.047 = 0.804.

The Hypergeometric Random Variable

The hypergeometric distribution arises when a simple random sample of size n is taken from a finite population of N units, of which M are labeled 1 and the rest are labeled 0. The number X of units labeled 1 in the sample is a hypergeometric random variable with parameters n, M, and N. This is denoted by X ∼ Hypergeo(n, N, M).

If X ∼ Hypergeo(n, N, M), its pmf is

P(X = x) = \binom{M}{x}\binom{N−M}{n−x} / \binom{N}{n}.

Note that P(X = x) = 0 if x > M, or if n − x > N − M.

Figure: Some Hypergeometric PMFs — Hypergeo(10, 60, M) for M = 15, 30, 45.
Applications of the Hypergeometric Distribution

Example (Quality Control). A company buys electrical components in batches of size 10. Quality inspection consists of choosing 3 components at random and accepting the batch only if all 3 are nondefective. If 30% of the batches have 4 defective components and 70% have only 1, what proportion of batches does the company accept?

Solution: Let A be the event that a batch is accepted. Then

P(A) = P(A | 4 defectives)·0.3 + P(A | 1 defective)·0.7
     = [\binom{4}{0}\binom{6}{3} / \binom{10}{3}]·0.3 + [\binom{1}{0}\binom{9}{3} / \binom{10}{3}]·0.7 = 0.54.

Example (The Capture/Recapture Method). This method is used to estimate the size N of a wildlife population. Suppose that 10 animals are captured, tagged, and released. On a later occasion, 20 animals are captured. Let X be the number of recaptured (i.e., tagged) animals. If all \binom{N}{20} possible groups are equally likely, X is more likely to take small values if N is large. The precise form of the hypergeometric pmf can be used to estimate N from the value that X takes.

If X ∼ Hypergeo(n, N, M), then
- its expected value is µX = n(M/N);
- its variance is σX² = n(M/N)(1 − M/N)·(N − n)/(N − 1);
- (N − n)/(N − 1) is called the finite population correction factor.

Binomial Approximation to Hypergeometric Probabilities. If n ≤ 0.05 × N, then P(X = x) ≈ P(Y = x), where Y ∼ Bin(n, p = M/N).
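As a sketch, the quality-control acceptance probability follows directly from the hypergeometric pmf:

```python
from math import comb

def hypergeom_pmf(x, n, N, M):
    # P(X = x) for X ~ Hypergeo(n, N, M): x tagged units in a sample of n
    # from a population of N units, M of which are tagged.
    if x > M or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Quality-control example: accept a batch of 10 iff a sample of 3 has no defectives.
p_accept = (hypergeom_pmf(0, 3, 10, 4) * 0.3    # batches with 4 defectives (30%)
            + hypergeom_pmf(0, 3, 10, 1) * 0.7)  # batches with 1 defective (70%)
print(round(p_accept, 2))  # 0.54
```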
Example (Illustration of the binomial approximation). We contrast P(X = 2) for X ∼ Hypergeo(n = 10, N, M), when M/N = 0.25, with its binomial approximation P(Y = 2) = 0.282 for Y ∼ Bin(n = 10, p = 0.25).
1. If N = 20 and M = 5, then P(X = 2) = \binom{5}{2}\binom{15}{8} / \binom{20}{10} = 0.348.
2. If N = 100 and M = 25, then P(X = 2) = \binom{25}{2}\binom{75}{8} / \binom{100}{10} = 0.292.

Binomial Approximation to Hypergeometric Probabilities
- As N, M → ∞, with M/N → p, and n fixed,

  \binom{M}{x}\binom{N−M}{n−x} / \binom{N}{n} → \binom{n}{x} p^x (1 − p)^{n−x}, ∀x = 0, 1, ..., n.

One way to show this is via Stirling's formula for approximating factorials: n! ≈ √(2πn)(n/e)^n, or more precisely

  n! = √(2πn) (n/e)^n e^{λn}, where 1/(12n + 1) < λn < 1/(12n).

Applying this to each factorial on the left-hand side, and noting that the terms resulting from √(2πn) tend to 1 and the powers of e cancel, gives

  \binom{M}{x}\binom{N−M}{n−x} / \binom{N}{n} ≈ \binom{n}{x} · [M^M (N − M)^{N−M} (N − n)^{N−n}] / [N^N (M − x)^{M−x} (N − M − n + x)^{N−M−n+x}].

Now rewrite the three ratios:

  M^M / (M − x)^{M−x} = (1 + x/(M − x))^{M−x} M^x,
  (N − n)^{N−n} / N^N = (1 − n/N)^N (N − n)^{−n},
  (N − M)^{N−M} / (N − M − n + x)^{N−M−n+x} = (1 + (n − x)/(N − M − n + x))^{N−M−n+x} (N − M)^{n−x}.

As N, M → ∞ the three parenthesized powers tend to e^x, e^{−n}, and e^{n−x}, whose product is 1, while M^x (N − M)^{n−x} / (N − n)^n → p^x (1 − p)^{n−x}, completing the argument.
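As a sketch, the two hypergeometric probabilities above can be compared with their binomial approximation numerically:

```python
from math import comb

def hyper_pmf(x, n, N, M):
    # P(X = x) for X ~ Hypergeo(n, N, M)
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    # P(Y = x) for Y ~ Bin(n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

approx = binom_pmf(2, 10, 0.25)    # binomial approximation
small = hyper_pmf(2, 10, 20, 5)    # N = 20, M/N = 0.25
large = hyper_pmf(2, 10, 100, 25)  # N = 100, M/N = 0.25
# The approximation improves as N grows with M/N fixed at 0.25.
print(round(small, 3), round(large, 3), round(approx, 3))
```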
The Negative Binomial Random Variable
- In the negative binomial experiment, a Bernoulli experiment is repeated independently until the rth 1 is observed. For example, products are inspected, as they come off the assembly line, until the rth defective is found.
- The number, Y, of Bernoulli trials until the rth 1 is observed is the negative binomial r.v.
- If p is the probability of 1 in a Bernoulli trial, we write Y ∼ NBin(r, p).
- If r = 1, Y is called the geometric r.v.

If Y ∼ NBin(r, p), then
- its pmf is

  P(Y = y) = \binom{y−1}{r−1} p^r (1 − p)^{y−r}, y = r, r + 1, ...;

- its expected value is E(Y) = r/p;
- its variance is σY² = r(1 − p)/p².

If r = 1 the negative binomial is called geometric:
- P(X = x | p) = p(1 − p)^{x−1}, x ≥ 1.
- The "memoryless" property: for integers s > t, P(X > s | X > t) = P(X > s − t).

Example. Independent Bernoulli trials are performed with probability of success p. Find the probability that r successes will occur before m failures.
Solution: r successes will occur before m failures iff the rth success occurs no later than the (r + m − 1)th trial. Hence the desired probability is found from

Σ_{k=r}^{r+m−1} \binom{k−1}{r−1} p^r (1 − p)^{k−r}.

Example. Two athletic teams, A and B, play a best-of-three series of games. Suppose team A is the stronger team and will win any game with probability 0.6, independently of the other games. Find the probability that the stronger team will be the overall winner.

Solution: Let X be the number of games needed for team A to win twice. Then X has the negative binomial distribution with r = 2 and p = 0.6. Team A will win the series if X = 2 or X = 3. Thus,

P(Team A wins the series) = P(X = 2) + P(X = 3)
  = \binom{1}{1} 0.6² (1 − 0.6)^{2−2} + \binom{2}{1} 0.6² (1 − 0.6)^{3−2}
  = 0.36 + 0.288 = 0.648.

Example. A candle is lit every evening at dinner time with a match taken from one of two match boxes. Assume each box is equally likely to be chosen and that initially both contained N matches. What is the probability that there are exactly k matches left, k = 0, 1, ..., N, when one of the match boxes is first discovered empty?

Solution: Let E be the event that box #1 is discovered empty and there are k matches in box #2. E will occur iff the (N + 1)th choice of box #1 is made at the (N + 1 + N − k)th trial. Thus,

P(E) = \binom{2N − k}{N} 0.5^{2N−k+1},

and the desired probability is 2P(E).
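As a sketch, the negative binomial pmf gives both the r-successes-before-m-failures probability and the best-of-three result (a best-of-three win is 2 successes before 2 failures):

```python
from math import comb

def nbin_pmf(y, r, p):
    # P(Y = y) for Y ~ NBin(r, p): y trials until the r-th success
    return comb(y - 1, r - 1) * p**r * (1 - p)**(y - r)

def r_before_m(r, m, p):
    # P(r successes occur before m failures): the r-th success must come
    # no later than trial r + m - 1.
    return sum(nbin_pmf(k, r, p) for k in range(r, r + m))

# Best-of-three series: team A (p = 0.6) needs 2 wins before 2 losses.
print(round(r_before_m(2, 2, 0.6), 3))  # 0.648
```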
Example. Three electrical engineers toss coins to see who pays for coffee. If all three match, they toss another round; otherwise the "odd person" pays for coffee.
1. Find the probability of a round of tossing resulting in a match. Answer: 0.5³ + 0.5³ = 0.25.
2. Let Y be the number of rounds of tossing until the odd person is determined. What is the distribution of Y? Answer: geometric with p = 0.75.
3. Find P(Y ≥ 3). Answer: P(Y ≥ 3) = 1 − P(Y = 1) − P(Y = 2) = 1 − 0.75 − 0.25 × 0.75 = 0.0625.

The Poisson Random Variable
- A RV X with SX = {0, 1, 2, ...} is a Poisson RV with parameter λ, written X ∼ Poisson(λ), if its pmf is

  p(x) = P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, 2, ...,

  for some λ > 0.
- That Σ_{x=0}^{∞} p(x) = 1 follows from e^λ = Σ_{k=0}^{∞} λ^k / k!.
- µX = λ, σX² = λ.

The Poisson random variable X can be:
1. the number of fish caught by an angler in an afternoon,
2. the number of new potholes in a stretch of I-80 during the winter months,
3. the number of disabled vehicles abandoned on I-95 in a year,
4.
the number of earthquakes (or other natural disasters) in a region of the United States in a month, 5. the number of wrongly dialed telephone numbers in a given city in an hour, 6. the number of freak accidents, such as falls in the shower, in a given time period. 7. the number of hits in a website in a day. M. George Akritas Random Variables Outline Random Variables and Their Distribution The Expected Value of Discrete Random Variables The Expected Value in the Simplest Case General Definition for Discrete RVs Types of Random Variables I In general, the Poisson distribution is used to model the probability that a number of certain events occur in a specified period of time (or distance, area or volume). I The events must occur at random and at a constant rate. I The occurrence of an event must not influence the timing of subsequent events (i.e. events occur independently). I Its earliest use dealt with the number of alpha particles emitted from a radioactive source in a given period of time. I Current applications include areas such as insurance industry, tourist industry, traffic engineering, demography, forestry and astronomy. M. George Akritas Random Variables Outline Random Variables and Their Distribution The Expected Value of Discrete Random Variables The Expected Value in the Simplest Case General Definition for Discrete RVs Types of Random Variables Example (Use of the Poisson Table) Let X ∼ Poisson(5). Find: a) P(X ≤ 5), b) P(6 ≤ X ≤ 9), and c) P(X ≥ 10). Solution. a) P(X ≤ 5) = F (5) = 0.616. b) Write P(6 ≤ X ≤ 9) = P(5 < X ≤ 9) = P(X ≤ 9) − P(X ≤ 5) = F (9) − F (5) = 0.968 − 0.616. c) Write P(X ≥ 10) = 1 − P(X ≤ 9) = 1 − F (9) = 1 − 0.968. M. 
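Both the geometric tail probability and the Poisson table values quoted above can be reproduced in a few lines of Python (a sketch; `poisson_cdf` is an ad hoc helper, not a library routine):

```python
from math import exp, factorial

# Geometric: P(Y >= 3) = P(first two rounds produce no odd person) = (1 - p)^2
p = 0.75
print((1 - p) ** 2)  # 0.0625

def poisson_cdf(x, lam):
    # F(x) = P(X <= x) for X ~ Poisson(lam)
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

F5, F9 = poisson_cdf(5, 5), poisson_cdf(9, 5)
print(round(F5, 3))       # 0.616  -> part a)
print(round(F9 - F5, 3))  # 0.352  -> part b)
print(round(1 - F9, 3))   # 0.032  -> part c)
```

Summing the pmf directly like this replaces the table lookup; the table simply stores these partial sums.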
Example
Suppose that a person taking Vitamin C supplements contracts an average of 3 colds per year, and that this average increases to 5 colds per year for persons not taking Vitamin C supplements. Suppose further that the number of colds a person contracts in a year is a Poisson random variable.
1. Compare the probability that a person taking Vitamin C supplements catches no more than two colds with the corresponding probability for a person not taking supplements.
2. Suppose 70% of the population takes Vitamin C supplements. Compute the probability that a randomly selected person will have no more than two colds in a given year.
3. Suppose that a randomly selected person contracts no more than two colds in a given year. What is the probability that he/she takes Vitamin C supplements?

Proposition (Poisson Approximation to Binomial Probabilities)
If Y ∼ Bin(n, p), with n ≥ 100, p ≤ 0.01, and np ≤ 20, then

$P(Y \geq k) \simeq P(X \geq k), \quad k = 0, 1, 2, \ldots, n,$

where X ∼ Poisson(λ = np).
- The enormous range of applications of the Poisson distribution is due to this proposition. Read the discussion on p. 137 (following the proof of Proposition 3.5.1).

Example
For the following four binomial RVs, np = 3:
a) Y1 ∼ Bin(9, 1/3), b) Y2 ∼ Bin(18, 1/6), c) Y3 ∼ Bin(30, 0.1), d) Y4 ∼ Bin(60, 0.05).
Compare P(Yi ≤ 2) with P(X ≤ 2), where X ∼ Poisson(3).
Comparison: First, $P(X \leq 2) = e^{-3}\left(1 + 3 + \frac{3^2}{2}\right) = 0.4232$. Next,
a) P(Y1 ≤ 2) = 0.3772, b) P(Y2 ≤ 2) = 0.4027, c) P(Y3 ≤ 2) = 0.4114, d) P(Y4 ≤ 2) = 0.4174.
Note: The conditions of the Proposition on n and p are not satisfied for any of the four binomial RVs.

Example
Due to a serious defect, n = 10,000 cars are recalled. The probability that a car is defective is p = 0.0005. If Y is the number of defective cars, find: (a) P(Y ≥ 10), and (b) P(Y = 0).
Solution. Here Y ∼ Bin(10,000, 0.0005), and all conditions of the above Proposition are satisfied. Let X ∼ Poisson(λ = np = 5). Then,
(a) P(Y ≥ 10) ≃ P(X ≥ 10) = 1 − P(X ≤ 9) = 1 − 0.968 = 0.032.
(b) P(Y = 0) ≃ P(X = 0) = $e^{-5}$ = 0.007.
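A short Python sketch (helper names `binom_cdf` and `poisson_cdf` are my own) reproduces both the four-way comparison and the recall example:

```python
from math import comb, exp, factorial

def binom_cdf(x, n, p):
    # P(Y <= x) for Y ~ Bin(n, p)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def poisson_cdf(x, lam):
    # P(X <= x) for X ~ Poisson(lam)
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

# Four binomials with np = 3 versus the Poisson(3) value 0.4232
print(round(poisson_cdf(2, 3), 4))  # 0.4232
for n, p in [(9, 1/3), (18, 1/6), (30, 0.1), (60, 0.05)]:
    print(round(binom_cdf(2, n, p), 4))  # 0.3772, 0.4027, 0.4114, 0.4174

# Car recall: Y ~ Bin(10000, 0.0005) approximated by X ~ Poisson(5)
print(round(1 - poisson_cdf(9, 5), 3))  # (a) 0.032
print(round(exp(-5), 3))                # (b) 0.007
```

The printed sequence makes the pattern behind the proposition visible: as n grows and p shrinks with np fixed, the binomial probabilities climb toward the Poisson value.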