PHP 2510
Random variables; some discrete distributions

• Random variables - what are they?
• Probability mass function; cumulative distribution function
• Some discrete random variable models:
  • Bernoulli
  • Binomial
  • Geometric
  • Negative binomial

PHP 2510 – Sept 18, 2008

Random variables

A random variable is essentially a random number. Formally, a random variable maps elements of a sample space to the set of real numbers.

Example. Toss a fair coin 3 times. The sample space of all possible sequences is

Ω = {hhh, hht, hth, thh, htt, tht, tth, ttt}

Examples of random variables:
X = number of heads
Y = number of consecutive heads (the length of the longest run of heads)
Z = 1 if three heads, 0 if not.

We denote random variables by italic uppercase letters.

A discrete random variable takes on a finite (or countable) number of distinct values, such as the number of illnesses in a year.

A continuous random variable takes on values along a continuum, such as time until an event, or the height of a randomly selected person.

Our focus today is on discrete random variables.

Random variables and probability mass functions

A probability mass function (PMF) describes the frequency or probability of each value of a random variable.

Example. Let X be the number of heads in three tosses of a fair coin. The PMF of X is

P(X = 0) = 1/8
P(X = 1) = 3/8
P(X = 2) = 3/8
P(X = 3) = 1/8

Example. Let Y be the number of consecutive heads in three tosses of a fair coin. The PMF of Y is

P(Y = 0) = 1/8
P(Y = 1) = 4/8
P(Y = 2) = 2/8
P(Y = 3) = 1/8

Example. Let Z = 1 if 3 heads are tossed, and Z = 0 otherwise. The PMF of Z is

P(Z = 0) = 7/8
P(Z = 1) = 1/8

PMF and CDF of a random variable

The probability mass function (PMF) is usually denoted by p(x) = P(X = x). For a discrete variable having outcomes $x_1, x_2, \ldots$, the PMF sums to one:

$\sum_i p(x_i) = 1$

The cumulative distribution function (CDF) is defined as F(x) = P(X ≤ x).
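The PMFs and the CDF defined above can be reproduced by enumerating the sample space. The following is a minimal Python sketch, not part of the original notes: the helper names (pmf, cdf, number_of_heads, longest_head_run, all_heads) are illustrative, and Y is assumed to mean the length of the longest run of heads, which is the reading that matches the probabilities given for Y.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a fair coin: 8 equally likely sequences.
omega = ["".join(seq) for seq in product("ht", repeat=3)]


def pmf(random_variable):
    """Return {value: probability} for a function defined on the sample space."""
    probs = {}
    for outcome in omega:
        value = random_variable(outcome)
        probs[value] = probs.get(value, Fraction(0)) + Fraction(1, len(omega))
    return dict(sorted(probs.items()))


def cdf(pmf_table):
    """Accumulate a PMF into F(x) = P(X <= x) at each possible value."""
    running, table = Fraction(0), {}
    for value, p in pmf_table.items():
        running += p
        table[value] = running
    return table


def number_of_heads(outcome):
    return outcome.count("h")


def longest_head_run(outcome):
    # Assumed meaning of Y: longest block of consecutive h's ("hht" -> 2, "hth" -> 1).
    return max(len(run) for run in outcome.split("t"))


def all_heads(outcome):
    return 1 if outcome == "hhh" else 0


pmf_x = pmf(number_of_heads)
pmf_y = pmf(longest_head_run)
pmf_z = pmf(all_heads)
cdf_x = cdf(pmf_x)

for name, table in [("X", pmf_x), ("Y", pmf_y), ("Z", pmf_z)]:
    print(name, {value: str(p) for value, p in table.items()})
print("F_X", {value: str(p) for value, p in cdf_x.items()})
```

Exact fractions are used instead of floats so the results can be compared directly with the eighths in the notes; Fraction reduces automatically, so 4/8 is displayed as 1/2.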
Example. Let X denote the number of heads in three tosses of a coin. This table shows the PMF and CDF of X:

x    p(x)   F(x)
0    1/8    1/8
1    3/8    4/8
2    3/8    7/8
3    1/8    1

Bernoulli distribution

A Bernoulli random variable takes on only two values: 0 (failure) and 1 (success). If the probability of success is π, then the probability of failure is 1 − π:

p(1) = π,  p(0) = 1 − π,

or, equivalently,

$p(x) = \pi^x (1 - \pi)^{1 - x}$, for x = 0 or 1.

Example: The prevalence of HIV infection is 11%. Let X be the HIV status of a randomly chosen person: X = 1 if HIV+; X = 0 if HIV-. Then X has a Bernoulli distribution with

P(X = 1) = 0.11,  P(X = 0) = 0.89.

Binomial distribution

The binomial model for a random variable X characterizes the number of successes in n repeated trials of an experiment, each of which can result in either success or failure.

Example 1. X = number of heads on 10 tosses of a fair coin
Example 2. Y = number of winning lottery tickets out of 10 million purchased
Example 3. Z = number of patients, out of 100 in a clinical trial, who have cancer remission following an experimental treatment
Example 4. W = number of the 3 transferred embryos that implant in a woman’s uterus following in-vitro fertilization

Mass function for binomial distribution

When trials are independent and each has success probability π, the probability of having x successes in n trials is the same regardless of the ordering of successes and failures.

First, any particular sequence of x successes and n − x failures occurs with probability

$\underbrace{\pi \times \pi \times \cdots \times \pi}_{x \text{ successes}} \times \underbrace{(1 - \pi) \times (1 - \pi) \times \cdots \times (1 - \pi)}_{n - x \text{ failures}} = \pi^x (1 - \pi)^{n - x}$

There are $\binom{n}{x}$ ways of assigning x successes in a sequence of n trials. Then,

$P(X = x) = (\text{number of ways to have } x \text{ successes}) \times \pi^x \times (1 - \pi)^{n - x} = \binom{n}{x} \pi^x (1 - \pi)^{n - x}.$

Example: Number of smokers in a sample of size n

29% of Americans are smokers. Suppose you select 3 people at random from the population (i.e. n = 3).
Let X denote the number of smokers in the sample.

Each row below is one possible sample (1 = smoker, 0 = non-smoker), with the resulting value x of X and the probability of that particular sequence:

1st person   2nd person   3rd person   x   P(sequence)
1            1            1            3   0.02
0            1            1            2   0.06
1            1            0            2   0.06
1            0            1            2   0.06
1            0            0            1   0.15
0            1            0            1   0.15
0            0            1            1   0.15
0            0            0            0   0.36

Construct mass function for X

$P(X = 0) = \binom{3}{0} \times .29^0 \times .71^3 = .36$
$P(X = 1) = \binom{3}{1} \times .29^1 \times .71^2 = .44$
P(X = 2) =
P(X = 3) =

Quick review

If the sample contains at least one smoker, what is the probability it contains exactly one smoker?

Ans: P(X = 1 | X ≥ 1) = P(X = 1) / (1 − P(X = 0)) = .4386 / .6421 ≈ .68

Example calculations with the binomial distribution

Example 1: Roll 5 fair dice. Let X = number of sixes. Find:
1. P(X = 0)
2. P(X > 0)
3. P(X = 2 | X > 0)
4. E(X)

Example 2: Testing whether a die is fair.
1. A die is rolled 5 times, and a six does not come up. Is the die fair? (p(0) = .40)
2. A die is rolled 10 times, and a six does not come up. Is it fair? (p(0) = .16)
3. A die is rolled 50 times, and a six comes up only twice. Is it fair? (p(2) = .005, p(1) = .001, p(0) = .0001)

Geometric distribution

The geometric distribution is useful for modeling waiting times on a discrete scale.
• Assume independent trials where the success probability is π.
• A geometric variable X characterizes the number of trials until the first success.
• To have the first success occur on trial k, we need k − 1 failures before the first success.

The probability mass function is

$P(X = k) = (1 - \pi)^{k - 1} \times \pi$

Example. The probability of contracting HIV in a single sexual encounter is 1 in 500. Let X denote the encounter during which a person gets infected for the first time. Assume each encounter is independent and carries the same risk. The mass function is

$P(X = k) = \left( \frac{499}{500} \right)^{k - 1} \left( \frac{1}{500} \right)$

Example. What is the probability of contracting HIV within the first 3 encounters?
P(X = 1) = · · · = .002
P(X = 2) = · · · = .001996
P(X = 3) = · · · = .001992
P(X ≤ 3) = · · · = .006
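As a numerical check on the binomial and geometric examples in these notes, both mass functions can be evaluated directly. The following is a minimal Python sketch rather than course material: the function names (binomial_pmf, geometric_pmf, geometric_cdf) and variable names are illustrative assumptions, and the inputs are the values stated in the examples (π = 0.29 with n = 3 for smokers, π = 1/6 for a fair die, π = 1/500 for the HIV example).

```python
from math import comb


def binomial_pmf(x, n, pi):
    """P(X = x): x successes in n independent trials, each with success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def geometric_pmf(k, pi):
    """P(X = k): first success on trial k, i.e. k - 1 failures followed by a success."""
    return (1 - pi) ** (k - 1) * pi


def geometric_cdf(k, pi):
    """P(X <= k), summed term by term from the mass function."""
    return sum(geometric_pmf(j, pi) for j in range(1, k + 1))


# Smokers example: n = 3 people, pi = 0.29.
smokers = [binomial_pmf(x, 3, 0.29) for x in range(4)]

# Quick review: P(exactly one smoker | at least one smoker).
one_given_any = smokers[1] / (1 - smokers[0])

# Dice example: probability of no sixes in 5 and in 10 rolls of a fair die.
no_six_in_5 = binomial_pmf(0, 5, 1 / 6)
no_six_in_10 = binomial_pmf(0, 10, 1 / 6)

# HIV example: pi = 1/500, infection within the first 3 encounters.
within_three = geometric_cdf(3, 1 / 500)

print("smokers PMF:", [round(p, 4) for p in smokers])
print("P(X = 1 | X >= 1):", round(one_given_any, 3))
print("no six in 5, 10 rolls:", round(no_six_in_5, 3), round(no_six_in_10, 3))
print("P(X <= 3), geometric:", round(within_three, 6))
```

Summing the geometric mass function term by term mirrors the hand calculation above; the same value also follows from the complement, 1 − (1 − π)^3, since the first success falls within three trials unless all three trials fail.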