Probability Review – PEER Program 2014
Welcome to PEER CWRU! This monograph will provide PEER students with an outline of basic probability
terms and concepts, and applications complementary to medical research and practice, sociological
research, and general evaluation.
Readers do not need a background in probability and statistics, but basic algebra is needed.
Mathematics will be kept to a minimum. Topics will have multiple examples, using information from
clinical abstracts and journals. Short examples will use italics. References will be provided to the
perplexed and curious, with additional information for the latter to inspire additional exploration. Let’s
go!
Here’s an outline of topics:
• Basics of Probability
  o Define probability, risk and odds
  o Three axioms
  o Conditional probability and Bayes’ Theorem
  o Common probability terms: risk and RISK RATIO versus odds and ODDS RATIO
1. Basics of probability – definitions
Probability is a number between 0 and 1, or between 0% and 100%, reflecting the likelihood of an event
occurring over a number of trials. Probability statements are written as fractions or percentages, as
Probability of an event = # desired outcomes / # all possible outcomes
Probability of an event (in percent) = # desired outcomes / # all possible outcomes * 100%
For example, the probability of heads on any coin toss is 0.50; the probability of the number six
appearing on a die is 1/6, or 0.167.
Chance, likelihood, risk, rates, and “percentage of the time” are all probability statements. For example, the
risk of an outcome to a person in the population (e.g. incidence of cancer during ages 40 to 49) is simply
Pr(incident cancer, age 40-49) = # people diagnosed with incident cancer / all persons in risk pool
The risk pool may be limited by age, sex, time span, time of membership, etc. In this example, persons at
risk are those in the population aged 40 through 49.
Odds differ from risk. Odds are not a direct probability statement, but can be converted into a
probability. As a definition,
David Bruckman, CWRU and Cleveland DPH for PEER program
Probability review
June 13, 2014
Odds of an event = # times the event occurs : # times the event does not occur
or
= # times the event occurs / # times the event does not occur
One example is to determine the odds that a drug works as expected for a specific set of patients.
Assume that we had outcomes information among people in the US who used this medication for
migraine headaches. In this group, the drug worked 80% of the time to reduce symptoms. While the
chance (probability) of success is 0.80, the odds that a person taking the drug has symptoms
reduced are 8:2 (read, ‘eight to two odds’), or 8/2 = 4. Of every ten people given the drug for
migraine headaches, we expect on average eight to have symptoms lessened and two not. In reduced
form, the odds are 4 to 1 (4:1): the medication is four times as likely to work as not to work
in these patients.
So, to convert odds into a probability, add the two event frequencies to obtain the denominator of
the probability. In this case, the 8:2 odds become the probability 8/(8 + 2) =
8/10 = 0.80 (80%), as shown before.
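The conversion between probability and odds described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original handout; the function names `probability_to_odds` and `odds_to_probability` are made up for this example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def probability_to_odds(p):
    """Odds in favor of an event with probability p (requires p < 1)."""
    return p / (1 - p)

def odds_to_probability(occurs, does_not_occur):
    """Probability from odds stated as 'occurs : does not occur'."""
    # The denominator is the sum of the two event frequencies.
    return Fraction(occurs, occurs + does_not_occur)

# Migraine drug example from the text: the drug works 80% of the time.
p_success = Fraction(8, 10)
odds_success = probability_to_odds(p_success)   # 4, i.e. 4:1 odds
p_again = odds_to_probability(8, 2)             # 8/10 = 4/5

print(odds_success, float(p_again))
```

The same two functions apply to any odds statement, as long as the event is not certain (p = 1 would divide by zero).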
Example: With two dice, you have four ways to obtain a five: (1, 4), (4, 1), (2, 3) and (3, 2). Of 36 possible
combinations, the true probability of obtaining a five with two dice is 4/36 = 1/9 = 0.11 or 11%.
The true odds would be 4:32, or 1:8 (read, “one to eight odds, or one roll seeing a five to every eight rolls
not seeing a five”), or 0.125. Note that odds are not equal to probability.
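The two-dice example can be checked by listing all 36 combinations. The sketch below is an illustration added for this review (variable names are assumptions); it counts the rolls that total five and derives both the probability and the odds.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two six-sided dice.
rolls = list(product(range(1, 7), repeat=2))

# Outcomes where the two dice total five.
fives = [roll for roll in rolls if sum(roll) == 5]

# Probability: desired outcomes over all outcomes.
prob_five = Fraction(len(fives), len(rolls))

# Odds: desired outcomes over non-desired outcomes.
odds_five = Fraction(len(fives), len(rolls) - len(fives))

print(fives)
print(prob_five, odds_five)
```

The probability uses all 36 rolls in the denominator, while the odds use only the 32 rolls that do not total five, which is why the two numbers differ.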
2. Basics of probability – three axioms or rules
Three axioms (rules) to keep in mind.
a) To describe any event probability, Pr(Event) or Pr(E), the outcomes of the event must be mutually
exclusive and collectively exhaustive. That is, no two outcomes can overlap, and every possible result
falls under exactly one outcome. Each outcome can be described, and the probabilities of the outcomes
can be added together.
E.g. A patient admitted to hospital can have three outcomes. These outcomes may be described
P1, P2, P3, such as P1 = discharged home, alive; P2 = death in hospital; P3 = discharged to
another facility, alive.
b) The probability for any event ranges from zero (no chance) to 1.0 (100% chance it will happen), or
alternatively as
0 ≤ Pr(event) ≤ 1
Probability is often written as Pr(event)= some number, or P(event) =, or π(event) = .
E.g. Using the three hospital outcomes from (a), say that for mitral valve surgery in one hospital,
with data collected over five years, the probabilities were found to be P1(discharged home) = 82.3%,
P2(death in hospital) = 5.7%, P3(discharged to another facility) = 12.0%.
c) All probabilities for an event should add up to one, that is, for n possible outcomes, P1 + P2 + … +
Pn-1 + Pn = 1.0 or 100%. This may be written as
∑ Pr(event) = 1
(read as “the sum of probabilities for this event equals one.”)
E.g. In our patient sample, P1 + P2 + P3 = 82.3% + 5.7% + 12.0% = 100%.
Similarly, P1 + P2 + P3 = 0.823 + 0.057 + 0.12 = 1.0
A useful corollary is to use subtraction to determine a probability. For example, suppose you knew
Pr(discharged alive to home) and Pr(discharged alive to another facility) but not the probability of
death, Pr(death). Using the equation in (c),
∑ Pr(all events) = 1
Or, expanded, we can write
Pr(discharged home) + Pr(discharged elsewhere) + Pr(death) = 1
then subtract the first two terms from both sides. We obtain
1 - Pr(discharged home) - Pr(discharged elsewhere) = Pr(death)
Another corollary is the complement of an outcome. If the probability of an outcome E is written Pr(E),
then the probability of its complement E^c (E not happening) is
Pr(E^c) = 1 - Pr(E)
or
Pr(E not happening) = 1 - Pr(E happening)
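The axioms and the two corollaries can be checked numerically with the mitral valve figures given above. This is a small sketch for illustration; the helper names `check_axioms` and `complement` are assumptions introduced here.

```python
import math

# Outcome probabilities from the mitral valve surgery example in the text.
outcomes = {
    "discharged home": 0.823,
    "death in hospital": 0.057,
    "discharged elsewhere": 0.120,
}

def check_axioms(probs):
    """True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs.values())
    sums_to_one = math.isclose(sum(probs.values()), 1.0)
    return in_range and sums_to_one

def complement(p):
    """Probability that the event does not happen."""
    return 1.0 - p

# Subtraction corollary: recover Pr(death) from the other two outcomes.
pr_death = 1.0 - outcomes["discharged home"] - outcomes["discharged elsewhere"]

# Complement corollary: probability of leaving the hospital alive.
pr_alive = complement(outcomes["death in hospital"])

print(check_axioms(outcomes), round(pr_death, 3), round(pr_alive, 3))
```

`math.isclose` is used instead of `==` because decimal fractions such as 0.823 are not stored exactly in floating point.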
3. Conditional Probability
The most important use of probability in medicine involves conditional probabilities. This is a
critical skill to understand since the applications are so common. Conditional probabilities allow one to
calculate probabilities (including odds and risks) when only partial information about the outcomes is
available. Also, it is a useful tool to determine probabilities of combinations of events, such as when
asking, “What is the probability of B when A has occurred?” This question assumes temporality and
linkage, such that A occurred before B and that the outcome B is conditional on the outcome of A. If
two outcomes are not linked (i.e. not conditional), then the answer to the question, “What is the
probability of B when A has occurred?” is simply the probability of B regardless of A.
Example: Let’s consider two separate events, A and B. (Hint: Think of A as exposure and B as disease.)
Assume exposure must come before the specific disease in question. Then, if Pr(A) > 0 and Pr(B) > 0,
$$\Pr(B \mid A) = \frac{\Pr(A \text{ and } B)}{\Pr(A)}, \qquad \Pr(A \mid B) = \frac{\Pr(A \text{ and } B)}{\Pr(B)}$$
(read the first as: the probability of B given A equals the probability of A and B occurring together,
divided by the probability of A). Also, Pr(A and B) = Pr(AB) = Pr(BA).
Equivalently,
$$\Pr(A \text{ and } B) = \Pr(A \mid B)\,\Pr(B)$$
Then,
$$\Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A)}$$
We can rewrite
$$\Pr(A) = \Pr(A \mid B)\,\Pr(B) + \Pr(A \mid {\sim}B)\,\Pr({\sim}B)$$
Then, Bayes’ Theorem can be shown as
$$\Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid {\sim}B)\,\Pr({\sim}B)}$$
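These identities can be verified by brute force on the two-dice setting used earlier. In the sketch below, the choice of events is an assumption made only for illustration: A is "the first die shows 4" and B is "the two dice total five". The code computes Pr(B|A) directly from the definition and again through Bayes’ Theorem.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event, given as a true/false test on a roll."""
    return Fraction(sum(1 for roll in rolls if event(roll)), len(rolls))

# Illustrative events (assumed for this example, not from the handout).
def first_is_four(roll):      # event A
    return roll[0] == 4

def total_is_five(roll):      # event B
    return sum(roll) == 5

pr_a = pr(first_is_four)
pr_b = pr(total_is_five)
pr_a_and_b = pr(lambda r: first_is_four(r) and total_is_five(r))

# Definition of conditional probability.
pr_b_given_a = pr_a_and_b / pr_a
pr_a_given_b = pr_a_and_b / pr_b

# Pr(A) rebuilt from Pr(A|B) Pr(B) + Pr(A|~B) Pr(~B).
pr_a_given_not_b = pr(lambda r: first_is_four(r) and not total_is_five(r)) / (1 - pr_b)
pr_a_total = pr_a_given_b * pr_b + pr_a_given_not_b * (1 - pr_b)

# Bayes' Theorem.
pr_b_given_a_bayes = pr_a_given_b * pr_b / pr_a_total

print(pr_b_given_a, pr_b_given_a_bayes)
```

Both routes give the same value, and it differs from the unconditional Pr(B) = 1/9, so these two events are linked rather than independent.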
Assume you want to find the probability of a disease B (outcome) given some exposure A. That is, A is
the exposure and B is the disease.
• Pr(B|A): The chance of disease given some exposure A or a positive test A. This is what we want
  to know using Bayes’ Theorem.
• Pr(A|B): The chance of being exposed to A, or a positive test A, given that you had the disease B.
  This is the chance of a true positive test or the true positive rate.
• Pr(B): Chance of having the disease. This is also called the prevalence of the disease.
• Pr(~B): Chance of not having the disease. This is equal to 1 – Pr(B).
• Pr(A|~B): Chance of being exposed to A, or a positive test A, given that you do not have the
  disease (~B). This is the false positive rate.
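The five quantities listed above fit together as in the following sketch. The function name and the numbers (a test with a 90% true positive rate, a 5% false positive rate, and a disease prevalence of 1%) are hypothetical values chosen only to illustrate the calculation; they do not come from the handout.

```python
def posterior_probability(true_positive_rate, false_positive_rate, prevalence):
    """Pr(B|A) by Bayes' Theorem.

    true_positive_rate  = Pr(A|B)
    false_positive_rate = Pr(A|~B)
    prevalence          = Pr(B)
    """
    pr_not_b = 1.0 - prevalence
    # Pr(A) = Pr(A|B) Pr(B) + Pr(A|~B) Pr(~B)
    pr_a = true_positive_rate * prevalence + false_positive_rate * pr_not_b
    return true_positive_rate * prevalence / pr_a

# Hypothetical inputs, for illustration only.
pr_disease_given_positive = posterior_probability(
    true_positive_rate=0.90,
    false_positive_rate=0.05,
    prevalence=0.01,
)

print(round(pr_disease_given_positive, 3))
```

With these assumed inputs the result is about 0.15: even with a fairly accurate test, a low prevalence keeps the chance of disease after a positive result modest.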
$$\Pr(\text{disease} \mid \text{exposure}) = \frac{\Pr(\text{exposure} \mid \text{disease})\,\Pr(\text{disease})}{\Pr(\text{exposure})}$$
This can be illustrated in a 2x2 contingency table and scenario.
Exposure status | Disease present | Disease absent | Total
Exposed | A | B | Total exposed, A + B
Not exposed | C | D | Total not exposed, C + D
Total | Total w/disease, A + C | Total w/o disease, B + D | Total, all subjects, A + B + C + D
The local public health department has received many calls from citizens, local clinics and emergency
rooms reporting that people became violently ill after attending a reception. The reception was held at a
restaurant with 200 patrons. The health department obtained a guest list with phone numbers from the
reception host. The guests were interviewed by phone to determine symptoms, if any, and a food
history. Some of the patrons were exposed to (consumed) the chicken while others were not. Therefore,
the risk or probability of disease, Pr(disease), can be written as two equations, one for each exposure status.
Risk of disease given exposure = Pr(disease| exposure) = A / (A + B)
A + B reflects the total “at-risk” of disease among the exposed. Risk equations always use the total
“at-risk” in the denominator. (Later, you will see the denominator changes with odds.)
The risk or probability of disease lacking exposure is
Risk of disease given no exposure = Pr(disease| no exposure) = C / (C + D)
C + D reflects the total “at-risk” of disease among the non-exposed, since there are people who were not
exposed but had the disease (or symptoms suggestive of disease).
Therefore, the RISK RATIO of disease comparing exposure and non-exposure status, is
$$RR(\text{disease given exposure status}) = \frac{A}{A+B} \div \frac{C}{C+D}$$
The RISK RATIO is simply the ratio of the two risks of disease based on exposure and non-exposure,
RR (disease) = Pr(disease | exposure) ÷ Pr (disease | non-exposure)
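The risk ratio can be computed directly from the four cells of the 2x2 table. The handout does not give the cell counts for the reception, so the counts below (A = 60, B = 40, C = 10, D = 90, totaling 200 patrons) are hypothetical and serve only to show the arithmetic; the function names are likewise assumptions.

```python
from fractions import Fraction

def risk(cases, non_cases):
    """Risk = cases divided by the total at risk (cases + non-cases)."""
    return Fraction(cases, cases + non_cases)

def risk_ratio(a, b, c, d):
    """RR = [A / (A + B)] / [C / (C + D)] for a 2x2 table.

    a: exposed, disease present      b: exposed, disease absent
    c: not exposed, disease present  d: not exposed, disease absent
    Raises ZeroDivisionError if no unexposed person has the disease (c == 0).
    """
    return risk(a, b) / risk(c, d)

# Hypothetical counts for the 200 reception patrons (not from the handout).
a, b, c, d = 60, 40, 10, 90

risk_exposed = risk(a, b)        # 60/100
risk_unexposed = risk(c, d)      # 10/100
rr = risk_ratio(a, b, c, d)

print(float(risk_exposed), float(risk_unexposed), rr)
```

With these made-up counts the risk among those who ate the chicken is six times the risk among those who did not.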
More facts on risk ratio:
1) 0 ≤ RR < ∞, with RR = 1 meaning that the risk of disease is independent of exposure, since the
risk of disease given exposure = risk of disease given non-exposure
2) RR > 1 means that the risk (probability or likelihood) of disease is greater for those
exposed than for those not exposed.
3) When exposure may be protective, such as with vaccination as the exposure, one often finds
RR < 1, or that the risk (probability or likelihood) of disease is smaller for those exposed (vaccinated)
than for those not exposed (not vaccinated).
Extra: Mathematical insight
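If Pr(A) > 0, then
$$\Pr(B \mid A) = \frac{\Pr(A \text{ and } B)}{\Pr(A)}$$
(read as the Probability of B given A equals…)
and, if Pr(B) > 0, then
$$\Pr(A \mid B) = \frac{\Pr(A \text{ and } B)}{\Pr(B)}$$
(read as the Probability of A given B equals…)
so that
$$\Pr(A \mid B) \cdot \Pr(B) = \Pr(A \text{ and } B)$$
Then we can rewrite
$$\Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A)}$$
Since Pr(B) + Pr(not B) = 1, we have Pr(A) = Pr(A|B) Pr(B) + Pr(A|not B) Pr(not B), and
$$\Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid \text{not } B)\,\Pr(\text{not } B)}$$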