Probability Review – PEER Program 2014

Welcome to PEER CWRU! This monograph provides PEER students with an outline of basic probability terms and concepts, along with applications to medical research and practice, sociological research, and general evaluation. Readers do not need a background in probability and statistics, but basic algebra is needed. Mathematics will be kept to a minimum. Topics will have multiple examples, using information from clinical abstracts and journals. Short examples will use italics. References will be provided for the perplexed and the curious, with additional information for the latter to inspire further exploration. Let's go!

Here's an outline of topics:
• Basics of Probability
  o Define probability, risk and odds
  o Three axioms
  o Conditional probability and Bayes' Theorem
  o Common probability terms: risk and RISK RATIO versus odds and ODDS RATIO

1. Basics of probability – definitions

Probability is a number between 0 and 1, or between 0% and 100%, reflecting how likely an event is to occur over a number of trials. Probability statements are written as fractions or percentages, as

Probability of an event = desired event / all possibilities
Probability of an event (in percent) = desired event / all possibilities * 100%

For example, the probability of heads on any coin toss is 0.50; the probability of the number six appearing on a die is 1/6, or 0.167.

Chance, likelihood, risk, rates, and "percentage of the time" are all probability statements. For example, the risk of an outcome to a person in the population (e.g. incidence of cancer during age 40 to 49) is simply

Pr(incident cancer, age 40-49) = # people diagnosed with incident cancer / all persons in risk pool

The risk pool may be limited by age, sex, time span, time of membership, etc. In this example, persons at risk are those in the population aged 40 up to, but not including, 50.

Odds differ from risk.
Odds are not a direct probability statement, but can be converted into a probability. As a definition,

Odds of an event = # desired event occurring : # desired event not occurring
or
Odds of an event = # desired event occurring / # desired event not occurring

One example is to determine the odds that a drug works as expected for a specific set of patients. Assume that we had outcomes information among people in the US who used this medication for migraine headaches. In this group, the drug worked 80% of the time to reduce symptoms. While the chance (probability) of success is 0.80, the odds that a person taking the drug will have symptoms reduced are 8:2 (read, 'eight to two odds'), or 8/2 = 4. Of any ten people chosen randomly and given the drug for migraine headaches, eight are expected to have symptoms lessened while two are not. In reduced form, the odds are 4 to 1 (4:1): for these patients, the medication is four times as likely to work as not to work.

So, to convert odds into a probability, add the two event frequencies to obtain the denominator of the probability. In this case, the odds of 8:2 become the probability 8 / (8 + 2) = 8/10 = 0.80 (80%), as shown before.

Example: With two dice, you have four ways to obtain a total of five: (1, 4), (4, 1), (2, 3) and (3, 2). Of 36 possible combinations, the true probability of obtaining a five with two dice is 4/36 = 1/9 = 0.11, or 11%. The true odds are 4:32, or 1:8 (read, "one to eight odds", or one roll showing a five for every eight rolls not showing a five), or 1/8 = 0.125. Note that odds are not equal to probability.

David Bruckman, CWRU and Cleveland DPH for PEER program. Probability review, June 13, 2014.

2. Basics of probability – three axioms or rules

Three axioms (rules) to keep in mind.

a) To describe any event probability, Pr(Event) or Pr(E), the outcomes of the event must be mutually exclusive and distinct.
That is, there can be no overlap among the outcomes. Each outcome can be described, and taken together, their probabilities can be added up.

E.g. A patient admitted to hospital can have three outcomes. These outcomes may be described as P1, P2, P3, such as P1 = discharged home, alive; P2 = death in hospital; P3 = discharged to another facility, alive.

b) The probability of any event ranges from zero (no chance) to 1.0 (100% chance it will happen), or alternatively

0 ≤ Pr(event) ≤ 1

Probability is often written as Pr(event) = some number, or P(event) = some number, or π(event) = some number.

E.g. Take the same three outcomes for a patient admitted to hospital. Say, for mitral valve surgery in one hospital, with data collected over five years, the probabilities were found to be P1(d/c home) = 82.3%, P2(dead) = 5.7%, P3(d/c to another facility) = 12.0%.

c) All probabilities for an event should add up to one. That is, for n possible outcomes, P1 + P2 + … + Pn-1 + Pn = 1.0 or 100%. This may be written as

∑ Pr(event) = 1

(read as "the sum of probabilities for this event equals one.")

E.g. In our patient sample, P1 + P2 + P3 = 82.3% + 5.7% + 12.0% = 100%. Similarly, P1 + P2 + P3 = 0.823 + 0.057 + 0.120 = 1.0.

A useful corollary is to use subtraction to determine a probability. For example, suppose you knew Pr(discharged alive to home) and Pr(discharged alive to another facility) but not the probability of death, Pr(death). Using the equation in (c),

∑ Pr(all events) = 1

Or, expanded, we can write

Pr(discharged home) + Pr(discharged elsewhere) + Pr(death) = 1

then subtract the first two terms from both sides.
We obtain

1 - Pr(discharged home) - Pr(discharged elsewhere) = Pr(death)

Another corollary is the complement of an outcome. If the probability of an outcome is written as Pr(E), then the probability of its complement is written Pr(E^c), and

Pr(E^c) = 1 - Pr(E), or Pr(E not happening) = 1 - Pr(E happening)

3. Conditional Probability

The most important use of probability in medicine involves conditional probabilities. This is a critical skill to understand because the applications are so common. Conditional probabilities allow one to calculate probabilities (including odds and risks) when only partial information on the outcomes is available. They are also a useful tool for determining probabilities of combinations of events, such as when asking, "What is the probability of B when A has occurred?" This question assumes temporality and linkage, such that A occurred before B and the outcome B is conditional on the outcome of A. If two outcomes are not linked (i.e. not conditional), then the answer to the question, "What is the probability of B when A has occurred?" is simply the probability of B regardless of A.

Example: Let's consider two separate events, A and B. (Hint: Think of A as exposure and B as disease.) Assume exposure must come before the specific disease in question. Then, if Pr(A) > 0,

\[ \Pr(B \mid A) = \frac{\Pr(A \text{ and } B)}{\Pr(A)} \]

(read as: the probability of B given A equals the probability of A and B occurring together, divided by the probability of A). Likewise, if Pr(B) > 0,

\[ \Pr(A \mid B) = \frac{\Pr(A \text{ and } B)}{\Pr(B)} \]

Also, Pr(A and B) = Pr(AB) = Pr(BA). Equivalently, combining the two definitions,

\[ \Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A)} \]

We can rewrite the denominator as

Pr(A) = Pr(A|B) * Pr(B) + Pr(A|~B) * Pr(~B)

Then Bayes' Theorem can be shown as

\[ \Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid {\sim}B)\,\Pr({\sim}B)} \]

Assume you want to find the probability of a disease B (the outcome) given some exposure A. That is, A is the exposure and B is the disease.
• Pr(B|A): The chance of disease given some exposure A or a positive test A. This is what we want to know using Bayes' Theorem.
• Pr(A|B): The chance of being exposed to A, or of a positive test A, given that you have the disease B. This is the chance of a true positive test, or the true positive rate.
• Pr(B): The chance of having the disease. This is also called the prevalence of the disease.
• Pr(~B): The chance of not having the disease. This is equal to Pr(~B) = 1 – Pr(B).
• Pr(A|~B): The chance of being exposed to A, or of a positive test A, given that you do not have the disease (~B). This is the false positive rate.

In words,

\[ \Pr(\text{disease} \mid \text{exposure}) = \frac{\Pr(\text{exposure} \mid \text{disease})\,\Pr(\text{disease})}{\Pr(\text{exposure})} \]

This can be illustrated with a 2x2 contingency table and a scenario. In the table, A, B, C and D are cell counts of subjects, not the events A and B used above.

                     Disease Status
Exposure Status      Disease present        Disease absent          Total
Exposed              A                      B                       Total exposed, A + B
Not exposed          C                      D                       Total not exposed, C + D
Total                Total w/disease, A+C   Total w/o disease, B+D  Total, all subjects, A+B+C+D

The local public health department has received many calls from citizens, local clinics and emergency rooms reporting that people became violently ill after attending a reception. The reception was held at a restaurant with 200 patrons. The health department obtained a guest list with phone numbers from the reception host. The guests were interviewed by phone to determine symptoms, if any, and a food history. Some of the patrons were exposed to (consumed) the chicken while others were not.

Therefore, the risk or probability of disease, Pr(disease), can be written in two equations. The first is

Risk of disease given exposure = Pr(disease | exposure) = A / (A + B)

A + B reflects the total "at-risk" of disease due to exposure. Risk equations always use the total "at-risk" in the denominator. (Later, you will see the denominator changes with odds.)
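As a minimal sketch of the risk calculation above, the Python snippet below computes Pr(disease | exposure) from the 2x2 cell counts in two ways: directly as A / (A + B), and through the conditional probability definition Pr(exposure and disease) / Pr(exposure). The function names are hypothetical, and the counts (60, 40, 10, 90) are made up for illustration; they are not the actual outbreak data.

```python
from fractions import Fraction


def risk_given_exposure(a, b):
    """Pr(disease | exposure) = A / (A + B), from the exposed row of the table."""
    at_risk = a + b  # total exposed: the "at-risk" denominator
    if at_risk == 0:
        raise ValueError("no exposed subjects, so the risk is undefined")
    return Fraction(a, at_risk)


def conditional_from_joint(a, b, c, d):
    """Same quantity via Pr(exposure and disease) / Pr(exposure)."""
    total = a + b + c + d
    pr_exposed_and_disease = Fraction(a, total)
    pr_exposed = Fraction(a + b, total)
    return pr_exposed_and_disease / pr_exposed


# Hypothetical cell counts for a reception with 200 patrons (assumed values).
a, b, c, d = 60, 40, 10, 90

risk = risk_given_exposure(a, b)
print(risk, float(risk))                          # 3/5 0.6
print(conditional_from_joint(a, b, c, d) == risk) # True
```

Exact fractions are used here so that the two routes can be compared for equality without floating-point rounding.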
The risk or probability of disease in the absence of exposure is

Risk of disease given no exposure = Pr(disease | no exposure) = C / (C + D)

C + D reflects the total "at-risk" of disease due to non-exposure, since there are people who may not have been exposed but had the disease (or symptoms suggestive of disease).

Therefore, the RISK RATIO of disease, comparing exposure and non-exposure status, is

\[ RR(\text{disease given exposure status}) = \frac{A}{A+B} \div \frac{C}{C+D} \]

The RISK RATIO is simply the ratio of the two risks of disease based on exposure and non-exposure,

RR (disease) = Pr(disease | exposure) ÷ Pr(disease | non-exposure)

More facts on risk ratio:

1) 0 ≤ RR < ∞, with RR = 1 meaning that the risk of disease is independent of exposure, since the risk of disease given exposure = risk of disease given non-exposure.
2) RR > 1 means that the risk (probability or likelihood) of disease is greater for those exposed than for those not exposed.
3) When exposure may be protective, such as with vaccination as the exposure, one often finds RR < 1, meaning that the risk (probability or likelihood) of disease is smaller for those exposed than for those not exposed (not vaccinated).

Extra: Mathematical insight

If Pr(B) > 0, then

\[ \Pr(A \mid B) = \frac{\Pr(A \text{ and } B)}{\Pr(B)} \]

(read as the probability of A given B equals…), so that Pr(A and B) = Pr(A|B) * Pr(B). Also, if Pr(A) > 0, then

\[ \Pr(B \mid A) = \frac{\Pr(A \text{ and } B)}{\Pr(A)} \]

(read as the probability of B given A equals…), so that Pr(A and B) = Pr(B|A) * Pr(A). Setting the two expressions for Pr(A and B) equal gives

Pr(B|A) * Pr(A) = Pr(A|B) * Pr(B)

Then we can rewrite

\[ \Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A)} \]

Since Pr(B) + Pr(not B) = 1, the denominator can be expanded as Pr(A) = Pr(A|B) * Pr(B) + Pr(A|~B) * Pr(~B), and

\[ \Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid {\sim}B)\,\Pr({\sim}B)} \]
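The risk ratio and Bayes' Theorem can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and every number below (the cell counts 60, 40, 10, 90 and the test characteristics 0.90, 0.01, 0.05) is an assumed value chosen for the example, not data from the monograph.

```python
def risk_ratio(a, b, c, d):
    """RR = [A / (A + B)] / [C / (C + D)] from the 2x2 cell counts."""
    risk_exposed = a / (a + b)      # Pr(disease | exposure)
    risk_unexposed = c / (c + d)    # Pr(disease | no exposure)
    if risk_unexposed == 0:
        raise ValueError("risk among the unexposed is zero, so RR is undefined")
    return risk_exposed / risk_unexposed


def bayes_posterior(pr_a_given_b, pr_b, pr_a_given_not_b):
    """Pr(B | A) via Bayes' Theorem.

    pr_a_given_b:     Pr(A | B), the true positive rate
    pr_b:             Pr(B), the prevalence
    pr_a_given_not_b: Pr(A | ~B), the false positive rate
    """
    pr_not_b = 1.0 - pr_b
    # Expanded denominator: Pr(A) = Pr(A|B)Pr(B) + Pr(A|~B)Pr(~B)
    pr_a = pr_a_given_b * pr_b + pr_a_given_not_b * pr_not_b
    return pr_a_given_b * pr_b / pr_a


# Assumed cell counts: exposed (60 ill, 40 well), not exposed (10 ill, 90 well).
rr = risk_ratio(60, 40, 10, 90)
print(f"RR = {rr:.2f}")  # RR = 6.00

# Assumed test: 90% true positive rate, 1% prevalence, 5% false positive rate.
posterior = bayes_posterior(0.90, 0.01, 0.05)
print(f"Pr(disease | positive test) = {posterior:.3f}")  # 0.154
```

With these assumed inputs, a positive test raises the chance of disease from 1% to only about 15%, because false positives among the many people without the disease outnumber the true positives.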