Download Topic 2 - NCSU Statistics

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Statistics wikipedia , lookup

History of statistics wikipedia , lookup

Inductive probability wikipedia , lookup

Birthday problem wikipedia , lookup

Probability interpretations wikipedia , lookup

Probability wikipedia , lookup

Transcript
ST 521: Statistical Theory I
Armin Schwartzman
Class topic 2
Conditional Probability and Independence
2
Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
cont. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Independence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Independence of many events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Independence of many events (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Sequential conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
The Birthday Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
The Birthday Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Turning around probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Decomposition Formula (Total Probability) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
(So called) Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Bayes and Screening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Example: cervical cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1
Conditional Probability and Independence
2 / 15
Conditional Probability
If P (B) > 0 we define the conditional probability of the event A given B as
P (A|B) =
P (AB)
P (B)
Intuitively, conditioning on B means reducing the original sample space S to B, which becomes the
new, reduced sample space. All probabilities are computed with respect to B. Notice:
P (A|S) =
P (AS)
= P (A)
P (S)
Disjoint events: If AB = ∅, then P (A|B) = 0 and P (B|A) = 0.
ST 521 – Class topic 2
Fall 2014 – 3 / 15
cont.
Conditional probability satisfies the axioms of probability:
1.
2.
3.
P (S|B) = 1
P (A|B) ≥ 0
∞
∞
If A1 , A2 , . . . are mutually exclusive events, then P
i=1 Ai B =
i=1 P (Ai |B)
and all the other properties:
1.
2.
3.
P (∅|B) = 0
P (A|B) ≤ 1
P (Ac |B) = 1 − P (A|B)
etc.
ST 521 – Class topic 2
Fall 2014 – 4 / 15
2
Independence
Two events A and B are said to be independent if
P (AB) = P (A)P (B)
Why? Because then
(1)
P (AB)
P (A)P (B)
=
= P (A)
P (B)
P (B)
P (A)P (B)
P (AB)
=
= P (B)
P (B|A) =
P (A)
P (A)
P (A|B) =
This could have been taken as the definition, but (1) is easier to generalize.
If A and B are independent, then so are
• Ac and B
• A and B c
• Ac and B c
* Can two disjoint events be independent and viceversa? (HW)
ST 521 – Class topic 2
Fall 2014 – 5 / 15
Independence of many events
The events A1 , A2 , . . . , An are mutually independent if for every subcollection A i1 , . . . , Aik of size
k = 2, . . . , n
k
k
Aij =
P (Aij )
P
j=1
j=1
Notice that is a very strong condition. But it is necessary to ensure that
P (Aj |Ai1 . . . Aik ) = P (Aj )
for every j and every subcollection Ai1 , . . . , Aik that does not include Aj . Pairwise independence is
not enough.
ST 521 – Class topic 2
Fall 2014 – 6 / 15
3
Independence of many events (cont.)
Example: Toss a coin three times. Define the events:
• H1 = { The outcome of the first toss is heads }
• H2 = { The outcome of the second toss is heads }
• H3 = { The outcome of the third toss is heads }
Suppose every outcome is equally likely. Then these events independent.
Now define the events:
• A12 = { The outcome of the first toss equals the second }
• A13 = { The outcome of the first toss equals the third }
• A23 = { The outcome of the second toss equals the third }
These events are pairwise independent but not mutually independent.
ST 521 – Class topic 2
Fall 2014 – 7 / 15
Sequential conditioning
By the definition of conditional probability:
P (AB) = P (A)P (B|A)
P (AB) = P (B)P (A|B)
This is useful for computing probabilities of sequential events.
E.g: What is the probability of dealing two aces in a row?
More generally:
P (A1 . . . An ) = P (A1 )P (A2 |A1 )P (A3 |A1 A2 ) . . . P (An |A1 . . . An−1 )
ST 521 – Class topic 2
Fall 2014 – 8 / 15
4
The Birthday Problem
In a group of n students in a class, what is the probability that at least two have the same birthday?
Solution:
Suppose we order the n students in an arbitrary order. Let D j be the event that the first j have
different birthdays. Then
ST 521 – Class topic 2
Fall 2014 – 9 / 15
The Birthday Problem
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
10
20
30
40
n
ST 521 – Class topic 2
50
60
70
80
Fall 2014 – 10 / 15
5
Turning around probabilities
Also by the definition of conditional probability:
P (A|B) =
P (B|A)P (A)
P (B)
(2)
This is useful for computing conditional probabilities when the reverse conditioning is easier to
compute.
E.g.: Prob. that it will rain given that it is thundering vs. prob. that it thundered given that it is raining.
(2) is called sometimes Bayes’ rule. It is often used in a context where we want to know the
probability that a particular hypothesis is true. We have an a priori belief in whether or not the
hypothesis is true, then update that probability by collecting data.
E.g.: Suppose a priori boys are equally likely to be born as girls. Say 90% of boys play with trucks.
Baby X plays with trucks. What is the probability that Baby X is a boy?
ST 521 – Class topic 2
Fall 2014 – 11 / 15
Decomposition Formula (Total Probability)
Let {A1 , A2 , . . .} be a partition of S. Then
P (B) =
n
P (Ai )P (B|Ai )
i=1
Proof:
ST 521 – Class topic 2
Fall 2014 – 12 / 15
6
(So called) Bayes’ Theorem
Let {A1 , A2 , . . .} be a partition of S. Then
P (B|Aj )P (Aj )
P (Aj |B) = n
i=1 P (B|Ai )P (Ai )
Proof:
ST 521 – Class topic 2
Fall 2014 – 13 / 15
Bayes and Screening
An important application of Bayes’ theorem is screening.
Let D be the disease:
D means diseased
D means ”no disease”
and T be the diagnostic test:
T + means a positive test
T − means a negative test.
Then we have that the positive predictive value
P (D|T + ) =
≡
P (D) P (T + |D)
P (D) P (T + |D) + P (D) P (T + |D)
prevalence × sensitivity
prev. × sens. + (1 − prev.) × (1 − specificity)
ST 521 – Class topic 2
Fall 2014 – 14 / 15
7
Example: cervical cancer
Let D be cervical cancer.
P (D) we’ll take to be 1 in 21,000, which is the approximate annual incidence rate in the US (SEER
2002 estimate).
P (D) = .00004762
Let us take the sensitivity (P (T + |D)) to be 0.71 and the specificity (1 − P (T + |D)) to be 0.75.
Thence the positive predictive value is:
0.00004762 × 0.71
0.00004762 × 0.71 + (1 − 0.00004762) × (1 − 0.75)
= 0.000135
P (D|T + ) =
That means that for every 1,000,000 positive results, about 135 truly have cervical cancer. But for
any particular patient, testing positive increases the probability of having the disease by a factor of
2.8!
ST 521 – Class topic 2
Fall 2014 – 15 / 15
8