Chapter 3: Discrete Random Variables and Probability Distributions

3.1 - Random Variables
3.2 - Probability Distributions for Discrete Random Variables
3.3 - Expected Values
3.4 - The Binomial Probability Distribution
3.5 - Hypergeometric and Negative Binomial Distributions
3.6 - The Poisson Probability Distribution

POPULATION
Discrete random variable X
Examples: shoe size, dosage (mg), # cells, …

Pop values   Probabilities   Cumul Probs
x            p(x)            F(x)
x1           p(x1)           p(x1)
x2           p(x2)           p(x1) + p(x2)
x3           p(x3)           p(x1) + p(x2) + p(x3)
⋮            ⋮               ⋮
Total        1               1

Total Area = 1
Mean: $\mu = \sum_{\text{all } x} x \, p(x)$
Variance: $\sigma^2 = \sum_{\text{all } x} (x - \mu)^2 \, p(x)$

~ The Binomial Distribution ~

Used only when dealing with binary outcomes (two categories: “Success” vs. “Failure”), with a fixed probability of Success ($\pi$) in the population.
Calculates the probability of obtaining any given number of Successes in a random sample of n independent “Bernoulli trials.”
Has many applications and generalizations, e.g., multiple categories, variable probability of Success, etc.

POPULATION: 40% Male, 60% Female
For any randomly selected individual, define a binary random variable:
Y = 1 if Male, with prob $\pi$ = 0.4
Y = 0 if Female, with prob $1 - \pi$ = 0.6

RANDOM SAMPLE: n = 100
Discrete random variable X = # Males in sample (0, 1, 2, 3, …, 99, 100)

How can we calculate the probability of
p(x) = P(X = x), for x = 0, 1, 2, 3, …, 100?
F(x) = P(X ≤ x), for x = 0, 1, 2, 3, …, 100?

Example: How can we calculate p(25) = P(X = 25)?

Solution: Model the sample as a sequence of n = 100 independent coin tosses, with 1 = Heads (Male), 0 = Tails (Female), where P(H) = 0.4, P(T) = 0.6.

How many possible outcomes of n = 100 tosses exist? $2^{100}$
How many possible outcomes of n = 100 tosses exist with X = 25 Heads?

X = 25 Heads: { H1, H2, H3, …, H25 }
There are 100 possible open slots for H1 to occupy.
For each one of them, there are 99 possible open slots left for H2 to occupy.
For each one of them, there are 98 possible open slots left for H3 to occupy.
…etc…
For each one of them, there are 77 possible open slots left for H24 to occupy.
For each one of them, there are 76 possible open slots left for H25 to occupy.
Hence, there are 100 × 99 × 98 × … × 77 × 76 possible outcomes. This value is the number of permutations of 25 among 100, denoted 100P25.

This number unnecessarily includes the distinct permutations of the 25 Heads among themselves, all of which have Heads in the same positions. We would not want to count these as distinct outcomes.
How many is that? By the same logic, 25 × 24 × 23 × … × 3 × 2 × 1, which is “25 factorial”, denoted 25!.

Dividing out these repeats gives
$\frac{100 \times 99 \times 98 \times \cdots \times 77 \times 76}{25 \times 24 \times 23 \times \cdots \times 3 \times 2 \times 1} = \frac{100!}{25! \, 75!}$
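The counting argument above can be checked numerically. The following is a minimal sketch in R, not part of the original slides; the variable names are illustrative, and the results are floating-point approximations rather than exact integers.

```r
# Ordered placements of H1, ..., H25 into 100 slots: 100 * 99 * ... * 76
n_ordered <- prod(100:76)

# Orderings of the 25 Heads among themselves: 25!
n_reorder <- factorial(25)

# Distinct outcomes with exactly 25 Heads (ordered count divided by repeats)
n_distinct <- n_ordered / n_reorder

print(n_distinct)
print(choose(100, 25))   # R's built-in count of combinations, for comparison
print(2^100)             # total number of outcomes of 100 tosses
```

The first two printed values should agree up to floating-point rounding, and both are far smaller than the total number of outcomes.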
The quotient 100! / (25! 75!) counts the number of combinations of 25 Heads among 100 coins: “100-choose-25”, denoted $\binom{100}{25}$ or 100C25.
R: choose(100, 25)
Calculator: 100 nCr 25

Answer: there are $\binom{100}{25}$ outcomes of n = 100 tosses with X = 25 Heads.

What is the probability of each such outcome?
Recall that, per toss, P(Heads) = $\pi$ = 0.4 and P(Tails) = $1 - \pi$ = 0.6.
Answer: Via independence in binary outcomes between any two coins,
0.4 × 0.6 × 0.6 × 0.4 × 0.6 × … × 0.6 × 0.4 × 0.4 × 0.6 = $(0.4)^{25} (0.6)^{75}$.

Therefore, the probability P(X = 25) is equal to
$\binom{100}{25} (0.4)^{25} (0.6)^{75}$
R: dbinom(25, 100, .4)

Question: What if the coin were “fair” (unbiased), i.e., $\pi = 1 - \pi = 0.5$?
This is the “equally likely” scenario! Each outcome then has probability $(0.5)^{25} (0.5)^{75} = (0.5)^{100} = 1 / 2^{100}$, so
P(X = 25) = $\binom{100}{25} (0.5)^{100} = \binom{100}{25} / 2^{100}$.

In general:

POPULATION: “Success” vs. “Failure”
RANDOM SAMPLE: size n
For any randomly selected individual, define a binary random variable:
Y = 1 (“Success”) with prob $\pi$
Y = 0 (“Failure”) with prob $1 - \pi$
Discrete random variable X = # “Successes” in sample (0, 1, 2, 3, …, n)

Model the sample as a sequence of n Bernoulli trials with P(“Success”) = $\pi$ and P(“Failure”) = $1 - \pi$; that is, the trials are independent, with constant probability ($\pi$) per trial.
Then X is said to follow a Binomial distribution, written X ~ Bin(n, $\pi$), with “probability mass function”
$p(x) = \binom{n}{x} \pi^x (1 - \pi)^{n - x}$, x = 0, 1, 2, …, n.

Example: Blood Type probabilities, revisited

              Rh Factor
Blood Type    +       –       Total
O            .384    .077    .461
A            .323    .065    .388
B            .094    .017    .111
AB           .032    .007    .039
Total        .833    .166    .999

Suppose n = 10 individuals are to be selected at random from the population.
Probability table for X = #(Type O)
Binomial model applies? Check:
1. Independent outcomes? Reasonably assume that outcomes “Type O” vs. “Not Type O” between two individuals are independent of each other.
2. Constant probability $\pi$? From table, $\pi$ = P(Type O) = .461 throughout population.
Binomial model applies.
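With the Binomial model in place, the probability mass function can be evaluated directly from its formula. The sketch below is an illustration rather than code from the slides: `binom_pmf` is a hypothetical helper name, and it is compared against R's built-in `dbinom` for the coin-toss example, then used to tabulate the Type O example.

```r
# Hypothetical helper: binomial pmf written out from the formula
#   p(x) = choose(n, x) * p^x * (1 - p)^(n - x)
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

# Coin-toss example: P(X = 25) with n = 100 tosses and P(Heads) = 0.4
p25_formula <- binom_pmf(25, 100, 0.4)
p25_builtin <- dbinom(25, 100, 0.4)
print(c(formula = p25_formula, builtin = p25_builtin))

# Type O example: pmf and cumulative probabilities for n = 10, p = .461
x   <- 0:10
px  <- binom_pmf(x, 10, 0.461)
cdf <- cumsum(px)                      # F(x) = P(X <= x)
print(round(data.frame(x = x, p = px, cdf = cdf), 5))
```

Writing the formula out by hand is fine for small n; for large n, `dbinom` is the safer choice because the separate factors can overflow or underflow.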
X ~ Bin(10, .461)
$p(x) = \binom{10}{x} (.461)^x (.539)^{10 - x}$, x = 0, 1, 2, …, 10
R: dbinom(0:10, 10, .461)

x       p(x)       F(x)
0       0.00207    0.00207
1       0.01770    0.01977
2       0.06813    0.08790
3       0.15538    0.24328
4       0.23257    0.47585
5       0.23870    0.71455
6       0.17013    0.88468
7       0.08315    0.96783
8       0.02667    0.99450
9       0.00507    0.99957
10      0.00043    1.00000
Total   1

R code to draw the probability histogram:

    n = 10                                 # number of trials
    p = .461                               # P(Type O)
    pmf = function(x)(dbinom(x, n, p))     # binomial pmf
    N = 100000                             # size of the artificial data set
    x = 0:10
    # repeat each value x in proportion to its probability
    bin.dat = rep(x, round(N*pmf(x)))
    # one bar per integer value, heights on the density scale
    hist(bin.dat, freq = F, breaks = c(-.5, x+.5), col = "green")
    axis(1, at = x)
    axis(2)

Also, can show
mean $\mu = \sum x \, p(x) = n\pi = (10)(.461) = 4.61$
and variance $\sigma^2 = \sum (x - \mu)^2 \, p(x) = n\pi(1 - \pi) = 2.48$.

Example: Blood Type probabilities, revisited
Suppose n = 1500 individuals are to be selected at random from the population.
Probability table for X = #(Type AB–), where $\pi$ = P(Type AB–) = .007 from the table.
Binomial model applies.
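The closed forms for the Binomial mean and variance, nπ and nπ(1 − π), can be checked against the defining sums Σ x p(x) and Σ (x − μ)² p(x). The following is a minimal sketch in R for both blood-type settings (n = 10 with π = .461, and n = 1500 with π = .007); the helper name `binom_moments` is hypothetical and not from the slides.

```r
# Hypothetical helper: mean and variance of Bin(n, p) from the defining sums
binom_moments <- function(n, p) {
  x  <- 0:n
  px <- dbinom(x, n, p)             # pmf values p(x)
  mu <- sum(x * px)                 # mean: sum of x * p(x)
  s2 <- sum((x - mu)^2 * px)        # variance: sum of (x - mu)^2 * p(x)
  c(mean = mu, variance = s2)
}

type_o  <- binom_moments(10, 0.461)      # X = #(Type O), n = 10
type_ab <- binom_moments(1500, 0.007)    # X = #(Type AB-), n = 1500

# Compare each pair of sums with the closed forms n*p and n*p*(1 - p)
print(rbind(sums = type_o,  closed_form = c(10 * 0.461, 10 * 0.461 * 0.539)))
print(rbind(sums = type_ab, closed_form = c(1500 * 0.007, 1500 * 0.007 * 0.993)))
```

For the n = 1500 case the mean and variance come out nearly equal, which is the feature of rare events that a Poisson model exploits.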
X ~ Bin(1500, .007)
Therefore, $p(x) = \binom{1500}{x} (.007)^x (.993)^{1500 - x}$, x = 0, 1, 2, …, 1500.
Also, can show mean $\mu = \sum x \, p(x) = n\pi = 10.5$
and variance $\sigma^2 = \sum (x - \mu)^2 \, p(x) = n\pi(1 - \pi) = 10.43$.

RARE EVENT! Long positive skew as x → 1500, but contribution ≈ 0.
Is there a better alternative?

3.6 - The Poisson Probability Distribution

Poisson distribution
$p(x) = \frac{e^{-\mu} \mu^x}{x!}$, x = 0, 1, 2, …,
where mean and variance are $\mu = n\pi = 10.5$ and $\sigma^2 = n\pi = 10.5$.
X ~ Poisson(10.5), so $p(x) = \frac{e^{-10.5} (10.5)^x}{x!}$.
Notation: Sometimes the symbol $\lambda$ (“lambda”) is used instead of $\mu$ (“mu”).

Ex: Probability of exactly X = 15 Type(AB–) individuals = ?
Binomial: $\binom{1500}{15} (.007)^{15} (.993)^{1485}$
Poisson: $\frac{e^{-10.5} (10.5)^{15}}{15!}$
(both ≈ .0437)

Example: Deaths in Wisconsin
Assuming deaths among young adults are relatively rare, we know the following:
• Average λ = 584 deaths per year
• Mortality rate (α) seems constant.
Therefore, the Poisson distribution can be used as a good model to make future predictions about the random variable X = “# deaths” per year, for this population (15-24 yrs)… assuming current values will still apply.

Probability of exactly X = 600 deaths next year:
P(X = 600) = $\frac{e^{-584} (584)^{600}}{600!}$ ≈ 0.0131
R: dpois(600, 584)

Probability of exactly X = 1200 deaths in the next two years:
Mean of 584 deaths per yr → mean of 1168 deaths per two yrs, so let λ = 1168:
P(X = 1200) = $\frac{e^{-1168} (1168)^{1200}}{1200!}$ ≈ 0.00746

Probability of at least one death per day:
λ = (584 deaths / yr) / (365 days / yr) = 1.6 deaths/day
P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) + … True, but not practical.
P(X ≥ 1) = 1 - P(X = 0) = $1 - \frac{e^{-1.6} (1.6)^0}{0!} = 1 - e^{-1.6}$ = 0.798

● Binomial ~ X = # Successes in n trials, P(Success) = π
● Poisson ~ As above, but n large, π small, i.e., Success RARE
● Negative Binomial ~ X = # trials for k Successes, P(Success) = π
● Geometric ~ As above, but specialized to k = 1
● Hypergeometric ~ As Binomial, but π changes between trials
● Multinomial ~ As Binomial, but for multiple categories, with π1 + π2 + … + πlast = 1 and x1 + x2 + … + xlast = n
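The Poisson calculations in the examples above can be reproduced with R's built-in `dbinom` and `dpois`. This is a minimal sketch, not code from the slides; the variable names are illustrative, and the two-year and per-day rates assume, as the slides do, that the yearly rate of 584 simply rescales with the length of the time window.

```r
# Rare-event example: X = #(Type AB-) among n = 1500, with probability .007 each
n_people <- 1500
p_ab     <- 0.007
p_binom  <- dbinom(15, n_people, p_ab)    # exact Binomial probability of X = 15
p_pois   <- dpois(15, n_people * p_ab)    # Poisson approximation, mu = 10.5
print(c(binomial = p_binom, poisson = p_pois))

# Deaths example: lambda = 584 deaths per year
lambda_year <- 584
p_600  <- dpois(600, lambda_year)         # exactly 600 deaths next year
p_1200 <- dpois(1200, 2 * lambda_year)    # exactly 1200 deaths in two years

# At least one death on a given day: complement of zero deaths
lambda_day <- lambda_year / 365           # 1.6 deaths per day
p_at_least_1 <- 1 - dpois(0, lambda_day)  # equals 1 - exp(-lambda_day)
print(c(p_600 = p_600, p_1200 = p_1200, p_at_least_1 = p_at_least_1))
```

Using the complement for "at least one" avoids summing an infinite series; `ppois(0, lambda_day, lower.tail = FALSE)` computes the same upper tail directly.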