9 Conditional Probability Continued

As we have seen, P(A|B) and P(B|A) are very different things. The following theorem relates these two conditional probabilities.

Theorem 9.5 (Bayes' Theorem). If A and B are events with P(A), P(B) > 0 then

    P(B|A) = P(A|B)P(B) / P(A).

Remark. The conclusion of this theorem is sometimes stated as

    P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|B^c)P(B^c))

(we have just applied the theorem of total probability to the denominator).

A popular example concerns medical trials. We gave an example in lectures of a test for a disease which has a 90% chance of correctly identifying the presence of the disease (in other words, the conditional probability of a positive result given that you have the disease is 9/10) and only a 1% chance of giving a "false positive" (i.e. identifying the disease in a healthy patient). Despite this, if the disease is a rare one affecting only 2% of the population, the conditional probability of not having the disease given a positive test is surprisingly large (around 0.35).

10 Introduction to Random Variables

Suppose, as usual, that we have a sample space S and a probability function P which assigns a real number to each event. Sometimes we are interested not in the exact outcome but only in some consequence of it (for example, I toss 3 coins but I only care about how many heads occur, not the exact outcome). In this sort of situation random variables are useful.

Definition. A random variable is a function from S to R.

Remark. We usually use capital letters for random variables. One (informal) way of thinking is to regard a random variable as a question about the outcome whose answer is a real number (e.g. how many heads occur), or as a measurement made on the outcome.

Statements like "X = 3" or "X ≤ 3" are events.
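The medical-test figure can be checked directly. The following is a minimal sketch (not part of the lecture notes) that plugs the numbers from the example into the total-probability form of Bayes' theorem:

```python
# Numbers from the example: sensitivity 0.9, false-positive rate 0.01,
# disease prevalence 0.02.
p_disease = 0.02                 # P(B): prior probability of disease
p_pos_given_disease = 0.9        # P(A|B): true positive rate
p_pos_given_healthy = 0.01       # P(A|B^c): false positive rate

# Total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(1 - p_disease_given_pos, 3))  # P(no disease | positive) ≈ 0.353
```

The surprise comes from the prevalence: the 1% false-positive rate applies to the 98% of the population who are healthy, so false positives are not much rarer than true positives.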
Specifically, "X = 3" is the event {s ∈ S : X(s) = 3} (that is, the set of all outcomes on which X takes the value 3) and "X ≤ 3" is the event {s ∈ S : X(s) ≤ 3} (that is, the set of all outcomes on which X takes a value at most 3).

Definition. For a random variable X, the range of X is the set of all values taken by X. We denote it by Range(X). Thus

    Range(X) = {x ∈ R : X(s) = x for some s ∈ S}.

Definition. For a random variable X, the probability mass function (or pmf) of X is the function from Range(X) to R given by x ↦ P(X = x).

We saw lots of examples in the lectures. Here is one of them.

Example. A coin which has probability p of coming up heads is tossed three times. Let X be the number of heads observed, so for example X(hht) = 2 (remember that X is a function from S to R, so each element of S is mapped by X to a real number, in this case to 2). The event "X = 2" is {hht, hth, thh} and so

    P(X = 2) = P({hht, hth, thh}) = 3p^2(1 − p).

Similarly, you can work out P(X = 0), P(X = 1), P(X = 3) and get that the probability mass function is

    n          0           1             2             3
    P(X = n)   (1 − p)^3   3p(1 − p)^2   3p^2(1 − p)   p^3

11 Discrete Random Variables

Definition (informal). Let X be a random variable defined on a sample space S. We say X is discrete if either

a) X only takes finitely many values (that is, Range(X) = {X(s) : s ∈ S} is finite), or, more generally,

b) the values X takes are separated by "gaps".

It follows that if Range(X) = N (for example when X equals the number of tosses of a coin until the first head is seen) then X is discrete. If Range(X) is equal to a subinterval of R such as {x ∈ R : x ≥ 0} (for example the time I wait until my bus arrives) then X is not discrete.

Remark. Generally speaking, random variables are easier to deal with when they are discrete. We will do quite a lot with discrete random variables but look only briefly at "non-discrete" ones.

Proposition 11.1. Let X be a discrete random variable.
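The pmf in the example can be recovered by brute force, treating X as a function on the eight outcomes of S exactly as in the definition. Here is a small sketch (the value p = 1/3 is an illustrative choice, not from the notes):

```python
from itertools import product
from fractions import Fraction

# Illustrative bias for the coin (any 0 < p < 1 would do).
p = Fraction(1, 3)

# Build the pmf of X = number of heads by enumerating S = {h,t}^3
# and summing the probability of each outcome on which X(s) = n.
pmf = {n: Fraction(0) for n in range(4)}
for outcome in product("ht", repeat=3):          # the 8 outcomes of S
    heads = outcome.count("h")                   # X(s)
    pmf[heads] += p**heads * (1 - p)**(3 - heads)  # independent tosses

# Agrees with the table, e.g. P(X = 2) = 3p^2(1 - p):
print(pmf[2] == 3 * p**2 * (1 - p))   # True
print(sum(pmf.values()) == 1)         # True
```

Using exact Fractions rather than floats makes the comparison with the closed-form entries of the table exact.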
Then

    Σ_x P(X = x) = 1,

where the summation runs over all x ∈ Range(X). This proposition is a useful check when calculating the probability mass function of X.

Definition. The expectation of a discrete random variable X is

    Σ_x x P(X = x),

where the summation runs over all x ∈ Range(X). It is usually denoted by E(X). The variance of X is

    Σ_x (x − E(X))^2 P(X = x),

where the summation runs over all x ∈ Range(X). It is usually denoted by Var(X).

The idea is that the expectation is some sort of "average value of X". The variance measures how concentrated X is about its expectation, with a small variance meaning sharply concentrated and a large variance meaning spread out.
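These definitions translate directly into code. The sketch below (not from the notes; the fair coin p = 1/2 is an illustrative choice) computes E(X) and Var(X) for the three-toss example by summing over Range(X) = {0, 1, 2, 3}:

```python
from fractions import Fraction

# pmf of X = number of heads in three tosses of a fair coin,
# taken from the table in the earlier example with p = 1/2.
p = Fraction(1, 2)
pmf = {0: (1 - p)**3, 1: 3*p*(1 - p)**2, 2: 3*p**2*(1 - p), 3: p**3}

# E(X) = Σ_x x P(X = x)
mean = sum(x * prob for x, prob in pmf.items())

# Var(X) = Σ_x (x − E(X))^2 P(X = x)
var = sum((x - mean)**2 * prob for x, prob in pmf.items())

print(mean)  # 3/2
print(var)   # 3/4
```

As a sanity check, Proposition 11.1 holds here (the pmf values sum to 1), and the answers match the standard binomial formulas np = 3/2 and np(1 − p) = 3/4.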