9 Conditional Probability Continued
As we have seen, P(A|B) and P(B|A) are very different things. The following
theorem relates these two conditional probabilities.
Theorem 9.5 (Bayes’ Theorem) If A and B are events with P(A), P(B) > 0
then

P(B|A) = P(A|B)P(B) / P(A).
Remark. The conclusion of this theorem is sometimes stated as

P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|B^c)P(B^c)),

(we have just applied the theorem of total probability to the denominator).
A popular example concerns medical tests. We gave an example in lectures of a test for a disease which has a 90% chance of correctly identifying
the presence of the disease (in other words, the conditional probability of a
positive result given that you have the disease is 9/10) and only a 1% chance
of giving a “false positive” (i.e. identifying the disease in a healthy patient).
Despite this, if the disease is a rare one affecting only 2% of the population,
the conditional probability of not having the disease given a positive test is
surprisingly large (around 0.35).
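As a quick check on these numbers, here is a minimal Python sketch of the calculation. The probabilities 0.9, 0.01 and 0.02 are taken from the example above; the variable names are just for illustration.

```python
# Bayes' theorem applied to the medical test example above.
p_disease = 0.02              # P(D): 2% of the population has the disease
p_pos_given_disease = 0.90    # P(+ | D): test detects the disease 90% of the time
p_pos_given_healthy = 0.01    # P(+ | D^c): 1% false positive rate

# Theorem of total probability: P(+) = P(+|D)P(D) + P(+|D^c)P(D^c)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(D | +) = P(+ | D)P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(p_disease_given_pos)      # about 0.647
print(1 - p_disease_given_pos)  # about 0.353, the "surprisingly large" value
```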
10 Introduction to Random Variables
Suppose, as usual, that we have a sample space S and a probability function
P which assigns a real number to each event. Sometimes we are interested
not in the exact outcome but only in some consequence of it (for example, I toss
3 coins but only care about how many heads occur, not the exact outcome).
In this sort of situation random variables are useful.
Definition. A random variable is a function from S to R.
Remark. We usually use capital letters for random variables. One (informal) way of thinking is to regard a random variable as a question about the
outcome whose answer is a real number (e.g. how many heads
occur), or as a measurement made on the outcome.
Statements like “X = 3” or “X ≤ 3” are events. Specifically “X = 3” is
the event {s ∈ S : X(s) = 3} (that is the set of all outcomes on which X
takes the value 3) and “X ≤ 3” is the event {s ∈ S : X(s) ≤ 3} (that is the
set of all outcomes on which X takes a value at most 3).
Definition. For a random variable X, the range of X is the set of all values
taken by X. We denote it by Range(X). Thus
Range(X) = {x ∈ R : X(s) = x for some s ∈ S}.
Definition. For a random variable X the probability mass function (or pmf)
of X is the function from Range(X) to R given by x ↦ P(X = x).
We saw lots of examples in the lectures. Here is one of them.
Example. A coin which has probability p of coming up heads is tossed three
times. Let X be the number of heads observed, so for example X(hht) = 2
(remember that X is a function from S to R, so each element of S is mapped
by X to a real number, in this case to 2).
The event “X = 2” is {hht, hth, thh} and so
P(X = 2) = P({hht, hth, thh}) = 3p^2(1 − p).
Similarly, you can work out P(X = 0), P(X = 1), P(X = 3) and get that the
probability mass function is
n           0           1             2             3
P(X = n)    (1 − p)^3   3p(1 − p)^2   3p^2(1 − p)   p^3
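The same probability mass function can be obtained by brute force, listing the eight outcomes and grouping them by the value of X. Here is a minimal Python sketch; the value p = 0.4 is an arbitrary choice for illustration.

```python
from itertools import product

p = 0.4  # arbitrary illustrative value for P(heads)

# Sample space: all sequences of three tosses, e.g. ('h', 'h', 't')
S = list(product("ht", repeat=3))

def prob(outcome):
    # Independent tosses: multiply p for each head and (1 - p) for each tail
    return p ** outcome.count("h") * (1 - p) ** outcome.count("t")

# pmf of X = number of heads: sum P(s) over outcomes s with X(s) = n
pmf = {n: sum(prob(s) for s in S if s.count("h") == n) for n in range(4)}
print(pmf)  # matches (1 - p)^3, 3p(1 - p)^2, 3p^2(1 - p), p^3
```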
11 Discrete Random Variables
Definition (informal). Let X be a random variable defined on a sample space
S. We say X is discrete if either
a) X only takes finitely many values (that is Range(X) = {X(s) : s ∈ S}
is finite),
or, more generally,
b) the values X takes are separated by “gaps”.
It follows that if Range(X) = N (for example when X equals the number
of tosses of a coin until the first head is seen) then X is discrete. If Range(X)
is equal to a subinterval of R such as {x ∈ R : x ≥ 0} (for example the
time I wait until my bus arrives) then X is not discrete.
Remark. Generally speaking, random variables are easier to deal with when
they are discrete. We will do quite a lot with discrete random variables but
look only briefly at “non-discrete” ones.
Proposition 11.1. Let X be a discrete random variable. Then
∑_x P(X = x) = 1,
where the summation runs over all x ∈ Range(X).
This proposition is a useful check when calculating the probability mass
function of X.
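For the three-coin example above this check can be done algebraically, since (1 − p)^3 + 3p(1 − p)^2 + 3p^2(1 − p) + p^3 = ((1 − p) + p)^3 = 1 by the binomial theorem. A short numerical sanity check in Python (p = 0.4 is again an arbitrary illustrative choice):

```python
p = 0.4  # arbitrary illustrative value

# Probabilities from the table in the three-coin example
pmf = [(1 - p) ** 3, 3 * p * (1 - p) ** 2, 3 * p ** 2 * (1 - p), p ** 3]

# Proposition 11.1: the probabilities must sum to 1
assert abs(sum(pmf) - 1) < 1e-12
print(sum(pmf))  # 1.0 (up to rounding)
```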
Definition. The expectation of a discrete random variable X is

∑_x x P(X = x),

where the summation runs over all x ∈ Range(X). It is usually denoted by
E(X).
The variance of X is

∑_x (x − E(X))^2 P(X = x),

where the summation runs over all x ∈ Range(X). It is usually denoted by
Var(X).
The idea is that the expectation is some sort of “average value of X”.
The variance measures how concentrated X is about its expectation, with a
small variance meaning sharply concentrated and a large variance meaning
spread out.
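As an illustration of these formulas, here is a short Python sketch computing E(X) and Var(X) for the number of heads in three tosses from the earlier example. The value p = 0.4 is again an arbitrary illustrative choice; for this particular X the answers agree with the binomial values 3p and 3p(1 − p).

```python
p = 0.4  # arbitrary illustrative value for P(heads)

# pmf of X = number of heads in three tosses, as a dict n -> P(X = n)
pmf = {0: (1 - p) ** 3,
       1: 3 * p * (1 - p) ** 2,
       2: 3 * p ** 2 * (1 - p),
       3: p ** 3}

# E(X) = sum over x of x * P(X = x)
expectation = sum(x * prob for x, prob in pmf.items())

# Var(X) = sum over x of (x - E(X))^2 * P(X = x)
variance = sum((x - expectation) ** 2 * prob for x, prob in pmf.items())

print(expectation)  # 1.2  (= 3p for this X)
print(variance)     # 0.72 (= 3p(1 - p) for this X)
```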