Lecture 7: The concept of a random variable
Main concepts about probability · Discrete Random Variable · Probability Distributions
(Examples from "Chance Encounters: A First Course in Data Analysis and Inference" by C. J. Wild et al.)
Lejla Batina, Institute for Computing and Information Sciences – Digital Security, Radboud University Nijmegen
Wiskunde 1, version: spring 2012

Outline
• Main concepts about probability
• Discrete Random Variable
• Probability Distributions

Recap
• A sample space, S, for a random experiment is the set of all possible outcomes of the experiment.
• An event is a set of outcomes.
• The following events are often used, for given events A and B:
  • unions A ∪ B,
  • intersections A ∩ B,
  • the complement of A, denoted Ā (occurs if A does not occur).
• Mutually exclusive events cannot occur at the same time.
• A partition is a way of splitting up a sample space into separate parts. Events C1, C2, ..., Ck form a partition of the sample space S if they are mutually exclusive and C1 ∪ C2 ∪ ... ∪ Ck = S.

Conditional probability and independence
• The conditional probability of A occurring, given that B occurs, is P(A|B) = P(A ∩ B) / P(B).
• Events A and B are independent if knowing whether B has occurred gives no information about the chances of A occurring, i.e. P(A|B) = P(A). In this case it follows that P(A ∩ B) = P(A) · P(B).

Summary of useful concepts and formulas
1. P(S) = 1, P(S̄) = P(∅) = 0.
2. P(Ā) = 1 − P(A).
3. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
4. If A and B are mutually exclusive, P(A ∩ B) = 0.
5. Law of total probability: P(A) = P(A ∩ B) + P(A ∩ B̄) = P(B)P(A|B) + P(B̄)P(A|B̄).
   If C1, ..., Ck is a partition: P(A) = Σ_{i=1..k} P(A ∩ Ci) = Σ_{i=1..k} P(Ci)P(A|Ci).
6. Multiplication formula:
   P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B),
   P(A ∩ B ∩ C) = P(A ∩ B)P(C|A ∩ B) = P(A)P(B|A)P(C|A ∩ B),
   P(A1 ∩ A2 ∩ ... ∩ An) = P(A1)P(A2|A1) ··· P(An|A1 ∩ ... ∩ An−1).

Bayes Theorem
• P(B|A) = P(A ∩ B) / P(A) = P(A|B)P(B) / P(A) = P(A|B)P(B) / (P(A ∩ B) + P(A ∩ B̄)).
• If C1, ..., Ck is a partition of S:
  P(Ci|A) = P(Ci)P(A|Ci) / P(A) = P(Ci)P(A|Ci) / Σ_{j=1..k} P(Cj)P(A|Cj).
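The total-probability and Bayes formulas above translate directly into a short computation. Below is a minimal Python sketch (an addition to these notes, not part of the original slides) that starts from hypothetical values for P(B), P(A|B) and P(A|B̄), computes P(A) by the law of total probability, and then recovers P(B|A) with Bayes' theorem.

```python
# Minimal sketch of the law of total probability and Bayes' theorem.
# The numbers below are hypothetical, chosen only to illustrate the formulas.

p_B = 0.25             # P(B)
p_A_given_B = 0.60     # P(A | B)
p_A_given_notB = 0.10  # P(A | B̄)

# Law of total probability: P(A) = P(B)P(A|B) + P(B̄)P(A|B̄)
p_A = p_B * p_A_given_B + (1 - p_B) * p_A_given_notB

# Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A

print(f"P(A)   = {p_A:.4f}")          # 0.2250
print(f"P(B|A) = {p_B_given_A:.4f}")  # 0.6667
```

The telephone-poll example below carries out the same kind of calculation by hand, directly from a table of joint probabilities.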
Example
The data in the table below come from a telephone poll of 800 adult Americans carried out in 1993. The question asked was: "Should smoking be banned from workplaces, should there be special smoking areas, or should there be no restrictions?"

              Banned   Special areas   No restrictions   Total
Non-smokers   0.3350   0.3975          0.0238            0.7563
Smokers       0.0200   0.1963          0.0274            0.2437
Total         0.3550   0.5938          0.0512            1.0000

What is the probability that a person favors banning, given that the person does or does not smoke (when a person is chosen at random)?
P(banned | non-smoker) = P(banned ∩ non-smoker) / P(non-smoker) = 0.3350 / 0.7563 = 0.4429.
P(banned | smoker) = 0.0821.
P(non-smoker | banned) = P(banned ∩ non-smoker) / P(banned) = 0.3350 / 0.3550 = 0.9437.
Hence, about 94% of the people in the survey who favor banning smoking are non-smokers.

Random variables
A random variable is a type of measurement taken on the outcome of a random experiment: it assigns a number to every outcome of the experiment.

Example: A coin is tossed twice, so the sample space is S = {HH, HT, TH, TT}; let X be the number of heads. Then we get:

outcome   HH   HT   TH   TT
X          2    1    1    0

Definition: Let S be a sample space. A random variable is a real-valued function defined on S, i.e. X : S → R.

Definition of probability function
Definition: The probability function of a discrete random variable X gives P(X = xi) = pi for every value xi that X can take, and the following also holds: a sequence of numbers p1, p2, ... is a probability distribution for a discrete sample space S = {s1, s2, ...} provided
• pi ≥ 0 for all i,
• Σ_i pi = 1.
A random variable that takes on a finite or countably infinite number of values is called a discrete random variable; otherwise we have a non-discrete random variable.

Example: tossing a coin twice
Consider again tossing a coin twice.

x      0     1     2
P(x)   1/4   1/2   1/4

A biased coin, for which the probability of getting a "head" is p, is tossed twice. In this case we get the following probability function:

x      0         1         2
P(x)   (1−p)^2   2p(1−p)   p^2

Often we will need to use a probability function to compute probabilities of events that contain more than one value, e.g. P(X ≥ a), P(X > b), P(X ≤ c) or P(a ≤ X ≤ b).

Geometric distribution
Example: Consider tossing a biased coin with P(H) = p until the first head appears. Then S = {H, TH, TTH, TTTH, ...}. Let X be the total number of tosses executed, so X = 1, 2, 3, .... We get the following values of the probability function:
P(X = 1) = p, P(X = 2) = (1 − p)p, and in general
P(X = x) = (1 − p)^(x−1) · p, for x = 1, 2, 3, ....
This is called the Geometric distribution, and in this case we write X ∼ Geometric(p). The formula for P forms a geometric series, and it holds that P(X ≥ x) = (1 − p)^(x−1).
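As a quick check on the Geometric formulas, here is a small Python sketch (an addition, not from the original slides; the function names geom_pmf and geom_tail are ours) that evaluates P(X = x) = (1 − p)^(x−1) p and verifies numerically that the tail probability P(X ≥ x) equals (1 − p)^(x−1).

```python
# Geometric(p): number of tosses of a biased coin until the first head.

def geom_pmf(x: int, p: float) -> float:
    """P(X = x) = (1 - p)**(x - 1) * p, for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def geom_tail(x: int, p: float, terms: int = 10_000) -> float:
    """P(X >= x), summed directly from the pmf (truncated series)."""
    return sum(geom_pmf(k, p) for k in range(x, x + terms))

p = 0.1
for x in (1, 2, 4):
    print(x, geom_pmf(x, p), geom_tail(x, p), (1 - p) ** (x - 1))
# The last two columns agree: P(X >= x) = (1 - p)**(x - 1).
```

With p = 0.1 this is exactly the setting of the frozen-embryo example that follows.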
Example: modeling a real-life situation by "coin tossing"
The chances of a successful pregnancy resulting from implanting a frozen embryo are about 1 in 10. Suppose a couple who are desperate to have children will continue to try this procedure until they succeed. We can assume that the process is just like tossing a biased coin until the first success. Let X be the number of times the couple tries the procedure, up to and including the successful attempt. Then X has a Geometric distribution with p = 0.1.
• The probability of success on the 4th try is P(X = 4) = 0.9^3 · 0.1 = 0.0729.
• The probability of success before the 4th try is P(X < 4) = P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.271.
• The probability of success on the 2nd, 3rd or 4th attempt is P(2 ≤ X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4) = 0.2439.

Hypergeometric distribution
Consider a barrel or urn containing N balls, of which M are black and the rest, N − M, are white. We take a simple random sample (i.e. without replacement) of size n and measure X, the number of black balls in the sample. The distribution of X is called the Hypergeometric distribution; in this case we write X ∼ Hypergeometric(N, M, n), and for the probability function we get
P(X = x) = C(M, x) · C(N − M, n − x) / C(N, n),
where C(a, b) denotes the binomial coefficient "a choose b".

Example: applications of the Hypergeometric distribution
The two-color urn model can be used to model any situation in which we take a random sample from a finite population and count the number of objects (or individuals) in the sample that have (or do not have) a characteristic of interest. Examples: people who do or do not smoke, who will or will not vote for a particular political party, etc. Here N is the size of the population, M is the number of individuals with the characteristic of interest, and X measures the number with that characteristic in a sample of size n.

Example
Suppose a company has 20 cars, of which exactly 7 do not meet government standards and are therefore releasing excessive pollution. Moreover, suppose that a traffic policeman randomly inspects 5 cars. Find the probability that he does not find more than 2 polluting cars.
Since N = 20, M = 7 and n = 5, we get:
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.7932.
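The car-inspection calculation above can be reproduced in a few lines of Python. The sketch below (added here, not part of the original slides; the helper name hypergeom_pmf is ours) uses math.comb for the binomial coefficients C(a, b).

```python
from math import comb

def hypergeom_pmf(x: int, N: int, M: int, n: int) -> float:
    """P(X = x) for X ~ Hypergeometric(N, M, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Car-inspection example: N = 20 cars, M = 7 polluting, sample of n = 5.
N, M, n = 20, 7, 5
p_at_most_2 = sum(hypergeom_pmf(x, N, M, n) for x in range(3))
print(f"P(X <= 2) = {p_at_most_2:.4f}")  # about 0.7932
```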
Example: the game of Lotto
A player purchases a board and chooses 6 different numbers between 1 and 40. On the night of the draw, a sampling machine draws six balls (the so-called winning numbers) at random without replacement from forty balls labeled 1 to 40. The machine then chooses a 7th ball from the remaining 34, giving the so-called bonus number, which is treated specially. Prizes are awarded according to how many of the winning numbers the player has picked. Say that the following scheme applies:
• Category 1: all 6 winning numbers.
• Category 2: 5 of the winning numbers plus the bonus number.
• Category 3: 5 of the winning numbers but not the bonus number.
• Category 4: 4 of the winning numbers.
Find the probabilities of the prizes from Categories 1, 2 and 4.

Solution
Let X be the number of matches between the player's numbers and the winning numbers. Then X ∼ Hypergeometric(N = 40, M = 6, n = 6), so
P(X = x) = C(6, x) · C(34, 6 − x) / C(40, 6).
P(Category 1 prize) = P(X = 6) = 2.605 × 10^−7.
P(Category 4 prize) = P(X = 4) = 0.0022.
P(Category 2 prize) = P(X = 5 ∩ bonus) = P(X = 5) · P(bonus | X = 5) = P(X = 5) · 1/34 = 1.563 × 10^−6.

Binomial distribution
Suppose we again have a biased coin with P(H) = p. A random experiment consists of making a fixed number of tosses, say n, and letting X measure the number of heads. Then X ∈ {0, 1, 2, ..., n}, we write X ∼ Bin(n, p), and we call this distribution the Binomial distribution. X is said to be a Binomial random variable with parameters n and p if
P(X = x) = C(n, x) · p^x · (1 − p)^(n−x).

Example: Find the probability of 8 heads out of 10 flips of a fair coin.
P(X = 8) = C(10, 8) · 0.5^8 · 0.5^2 = 0.04394.

Example: applications of the Binomial distribution
The biased-coin-tossing model can be used in many situations, provided the following assumptions are valid:
• Each trial ("toss") has only 2 outcomes.
• The probability of getting a success is the same (say p) for each trial.
• The outcomes of the trials are mutually independent.
For example, we can even use it when rolling a die, if we are only interested in whether we get a six.

Relationship to the Hypergeometric distribution
Consider the urn again, but now we sample n balls with replacement. In this case "the biased coin model" applies: for each trial, the probability of a success is constant at p = M/N. If X measures the number of black balls in a sample of size n, then X ∼ Bin(n, p = M/N).
In practice we often cannot sample with replacement, e.g. when sampling people, but we can still use this model if M and N − M are large compared to the sample size n. Take N = 1000, M = 200 and n = 5; then after 5 balls have been drawn (out of which x are black), the proportion of black balls remaining is (200 − x)/(1000 − 5) ≈ 200/1000.
Summary: if we take a sample of less than 10% from a large population in which a proportion p have a characteristic of interest, then the distribution of X, the number in the sample with that characteristic, is approximately Binomial(n, p), where n is the sample size.
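To illustrate both the Binomial pmf and the approximation just described, here is a short Python sketch (not from the original slides; the helper names binom_pmf and hypergeom_pmf are ours). It reproduces the 8-heads-in-10-tosses probability and compares Hypergeometric(N = 1000, M = 200, n = 5) with Binomial(n = 5, p = 0.2).

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def hypergeom_pmf(x: int, N: int, M: int, n: int) -> float:
    """P(X = x) for X ~ Hypergeometric(N, M, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# 8 heads in 10 tosses of a fair coin: about 0.04394.
print(f"P(X = 8) = {binom_pmf(8, 10, 0.5):.5f}")

# Sampling 5 from N = 1000 with M = 200 "black" balls:
# without replacement (hypergeometric) vs. with replacement (binomial).
for x in range(6):
    print(x, round(hypergeom_pmf(x, 1000, 200, 5), 4),
          round(binom_pmf(x, 5, 0.2), 4))
# The two columns are nearly identical, as the slide's summary predicts.
```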
Poisson distribution
Definition: A random variable X, where X ∈ {0, 1, 2, ...}, has a Poisson distribution if
P(X = x) = e^(−λ) · λ^x / x!,
where λ is a constant. In this case we write X ∼ Poisson(λ).
Note that P(X = 0) = e^(−λ). It can be shown that the probabilities P(X = x) sum to 1.

Example: applications of the Poisson distribution
We consider a type of event occurring randomly through time, e.g. earthquakes, errors in accounts, telephone calls in a given time interval, arrivals at a queue, mistakes in calculations, etc. Let X be the number of occurrences in a unit interval of time. Then, under the following conditions, X can be shown to have a Poisson(λ) distribution:
• The event occurs at a constant average rate of λ per unit time.
• Occurrences are independent of one another.
• More than one occurrence cannot happen at the same time.

Example
While checking the galley proofs of several chapters of a book, the authors found 1.6 printer's errors per page on average. We can assume the errors were occurring randomly according to a Poisson process. Let X be the number of errors on a single page; then X ∼ Poisson(λ = 1.6). Find the following probabilities:
1. The probability of finding no errors on any particular page: P(X = 0) = e^(−1.6) = 0.2019.
2. The probability of finding 2 errors on any particular page: P(X = 2) = 0.2584.
3. The probability of no more than 2 errors on a page: P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.7833.
4. The probability of more than 4 errors on a page: P(X > 4) = 1 − P(X ≤ 4) = 0.0238.
5. The probability of getting a total of 5 errors on 3 consecutive pages: let Y be the number of errors in 3 pages; then Y ∼ Poisson(λ = 4.8) and P(Y = 5) = 0.1747.
6. What is the probability that in a block of 10 pages, exactly 3 pages have no errors? Let W be the number of pages with no errors; then W ∼ Binomial(n = 10, p = 0.2019) and P(W = 3) = C(10, 3) · 0.2019^3 · 0.7981^7 = 0.2037.
7. What is the probability that in 4 consecutive pages there are no errors on the first and third pages, and one error on each of the other two? Let Xi be the number of errors on the i-th page. Then
   P(X1 = 0 ∩ X2 = 1 ∩ X3 = 0 ∩ X4 = 1) = P(X1 = 0)P(X2 = 1)P(X3 = 0)P(X4 = 1) = 0.2019^2 × 0.3230^2 = 0.0043,
   where P(X = 1) = 0.3230.

Approximation of the Binomial by the Poisson distribution
For X ∼ Bin(n = 1000, p = 0.06), say, the direct calculation is rather complicated. But in this case it can be shown that
C(n, x) · p^x · (1 − p)^(n−x) ≈ e^(−λ) · λ^x / x!.
So, if p is small and n is large in a Binomial distribution, with np = λ held constant, then
lim_{n→∞} P(X = k) = e^(−λ) · λ^k / k!, for k = 0, 1, 2, ....
Rule of thumb: we can use the Poisson distribution if p < 0.1 and λ ≤ 5 (or λ ≤ 10).
Note: λ is the average number of events per time unit.
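The Poisson calculations in the printer's-errors example above are easy to check numerically. The following Python sketch (an addition, not part of the original slides; poisson_pmf is our helper name) evaluates the Poisson pmf and reproduces parts 1–6 of that example.

```python
from math import exp, factorial, comb

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 1.6  # average number of printer's errors per page

p0 = poisson_pmf(0, lam)                                # 1. about 0.2019
p2 = poisson_pmf(2, lam)                                # 2. about 0.2584
p_le2 = sum(poisson_pmf(x, lam) for x in range(3))      # 3. about 0.7833
p_gt4 = 1 - sum(poisson_pmf(x, lam) for x in range(5))  # 4. about 0.0238
p_y5 = poisson_pmf(5, 3 * lam)                          # 5. Y ~ Poisson(4.8), about 0.1747
p_w3 = comb(10, 3) * p0**3 * (1 - p0)**7                # 6. W ~ Bin(10, 0.2019), about 0.2037

print(p0, p2, p_le2, p_gt4, p_y5, p_w3)
```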
Example
When rolling 4 dice, the chance of getting four sixes is 1/6^4. If we roll the 4 dice 1000 times, then λ = 1000 · 1/6^4 ≈ 0.77. The probabilities of getting four sixes 0, 1 or 2 times, or at least 3 times, are computed respectively as:
• P(X = 0) = e^(−λ) ≈ 0.46
• P(X = 1) = e^(−λ) · λ ≈ 0.36
• P(X = 2) = e^(−λ) · λ^2 / 2 ≈ 0.14
• P(X ≥ 3) = 1 − P(X < 3) ≈ 0.043
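For this dice example, the Poisson numbers above can be compared against the exact Binomial(n = 1000, p = 1/6^4) probabilities. The sketch below (added here, not in the original slides) performs that comparison and shows how close the approximation is.

```python
from math import comb, exp, factorial

n, p = 1000, (1 / 6) ** 4  # 1000 rounds, P(four sixes in one round) = 1/1296
lam = n * p                # about 0.77

def binom_pmf(x: int) -> float:
    """Exact P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int) -> float:
    """Approximate P(X = x) using Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

for x in range(3):
    print(x, round(binom_pmf(x), 4), round(poisson_pmf(x), 4))

# P(X >= 3), exact vs. Poisson approximation: both about 0.043.
print(1 - sum(binom_pmf(x) for x in range(3)),
      1 - sum(poisson_pmf(x) for x in range(3)))
```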