Stat I Notes on Contingency Tables and Bayes Theorem
Prof. Vinod, April 2015

Let us measure "Political affiliation" along the columns and "attitude toward federal healthcare" along the rows of a contingency table. We use the abbreviations: D = Democrat, I = Independent, R = Republican, F = favors federal healthcare, N = does not favor a federal role. (file http://www.fordham.edu/economics/vinod/st1cntn.doc)

A survey of 671 persons yielded the following information, which is readily tabulated in a 2 by 3 setup as follows.

            D      I      R    Total
F          161     40    130     331
N          110     40    190     340
Total      271     80    320     671 = GT (grand total)

The learning objective is to answer the following types of questions.

QUIZ: Please answer the following questions (answers are given later in this file):
1) Find the probability that a randomly chosen person is Republican.
1b) Find the probability that a randomly chosen person is either an Independent OR favors federal healthcare. (Hint: I or F, addition rule; numerator = 161+40+130+40)
2) Find the conditional probability P(F | I).
3) Give the formula for a test of statistical independence. Are political affiliation and attitude toward federal healthcare statistically independent?

Note that we compute the row and column totals and the grand total (GT). These are needed in the probability computations below. This is a contingency table because a person's attitude toward federal healthcare may be contingent upon (depend on) his or her political affiliation.

Probability of an event A:
P(A) = (number of simple events in A) / (total number of simple events in the sample space)

Find the probability of a column characteristic. D denotes the set along the first column (being a Democrat).
P(D) = 271/671 = 0.40387 = 0.4039 when rounded to four places.
P(D) is computed by dividing the column total at the bottom margin by the GT.

Find the probability of a row characteristic (being in favor of federal healthcare):
P(F) = 331/671 = 0.49329 = 0.4933 (rounded to 4 places).
P(F) is computed by dividing the row total at the right margin by the GT.
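The marginal probabilities just computed can be checked with a short Python sketch (the dictionary layout and the function names p_col and p_row are mine, not from the notes):

```python
# Contingency table from the survey of 671 persons.
# Rows: F (favors), N (does not favor); columns: D, I, R.
table = {
    "F": {"D": 161, "I": 40, "R": 130},
    "N": {"D": 110, "I": 40, "R": 190},
}

GT = sum(sum(row.values()) for row in table.values())  # grand total

def p_col(col):
    """Marginal probability of a column characteristic (party)."""
    return sum(row[col] for row in table.values()) / GT

def p_row(r):
    """Marginal probability of a row characteristic (attitude)."""
    return sum(table[r].values()) / GT

print(GT)                    # 671
print(round(p_col("D"), 4))  # 0.4039
print(round(p_row("F"), 4))  # 0.4933
```

Dividing a margin total by GT is all a marginal probability is, so each helper just sums down a column or across a row.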
Probability of a complement: P(A^c) = 1 - P(A). Similarly P(N) = 340/671. In fact we also have P(F) + P(N) = 1, since N is the complement of F.

Note that the column/row totals are placed along the margins of the table. Probabilities computed from the margins of a contingency table are called marginal or unconditional probabilities.

Addition rule for probability from the contingency table. The addition rule is for computing the probability of a union of two sets.

Union of mutually exclusive events: P(A ∪ B) = P(A) + P(B).
Addition rule (general): P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

Example: P(I ∪ D) is the probability of being a Democrat OR being an Independent, i.e., of the union of D and I. P(I ∪ D) = P(D) + P(I) - P(D ∩ I). Since no person is both a Democrat and an Independent, D and I are mutually exclusive and P(D ∩ I) = 0. Hence P(I ∪ D) = (271/GT) + (80/GT) = 351/671 = 0.5231.

We apply the general addition rule to quiz problem 1b as follows:
P(I or F) = P(I) + P(F) - P(I ∩ F) = (80/GT) + (331/GT) - (40/GT) = 371/GT = 0.5529.

Probability of the intersection of two sets. This is the probability of belonging to two sets simultaneously, also called the joint probability. We used it above for P(D ∩ I) and for P(I ∩ F). In a contingency table the numerator for a joint probability comes from the main body of the table (not the margins). For example, let the two sets be D (first column) and F (first row). The joint probability is simply an element from the body of the contingency table (not a margin) divided by the GT.
P(D and F) = P(D ∩ F) = 161/GT = 0.23994, or 0.2399.

Definition of conditional probability. Here we restrict attention to only a subset of the sample space where some condition is satisfied. The condition is described by one or more sets (rows or columns). When we restrict attention to a set, the denominator in probability calculations becomes the number of elements in that set (the total for that row or column) instead of the grand total.

Example: Let the condition be that we restrict attention to the set R of Republicans. There are 320 Republicans in the data. What is the probability of F, "favoring federal healthcare," among these folks?
There are 130 Republicans who favor federal healthcare. Hence by direct computation the conditional probability of F given R is 130/320, or 0.4063. Instead of repeating this kind of tedious conditioning argument, it is easier to use a general formula for conditional probability:

Conditional probability = (joint probability) / (marginal probability for the specified condition)

Conditional probability formula: P(A | B) = P(A ∩ B) / P(B).

The word "conditional" is often also described by the word "given" and written with a vertical bar: P(F | R) is pronounced "probability of F given R."

For our example, let us verify that the formula gives the correct result. The two sets under consideration are F (first row) and R (third column).
P(F given R) = Joint/Marginal, where Joint = P(F and R) = P(F ∩ R) = 130/GT and Marginal = P(R) = (130+190)/GT.
Joint/Marginal = 130/320 = 0.40625, or 0.4063. Thus we have verified that the conditional probability formula is right for this example.

Example 2: Find the conditional probability of being Independent given that we restrict attention to folks who do NOT favor federal healthcare (set N, second row).
P(I given N) = Joint/Marginal for the second row, with Joint = 40/GT and Marginal = (110+40+190)/GT. Now P(I|N) = 40/340 = 0.1176.

Example 3: Find the conditional probability of D given that the person favors federal healthcare.
P(D given F) = P(D | F) = Joint/Marginal with Joint = 161/GT and Marginal = (161+40+130)/GT. Hence P(D|F) = 161/331 = 0.486.

STATISTICAL INDEPENDENCE FROM CONTINGENCY TABLES

Are the column characteristic and the row characteristic statistically independent? In our example, we are considering whether political affiliation and attitude toward federal healthcare are statistically independent. To answer this, we make the following formal test: pick any one row from the set of rows (say F) and any one column from the set of columns (say D). The formal test is to check whether the conditional probability equals the unconditional (marginal) probability, so that the effect of the condition is moot.
It is intuitively clear that if the condition is moot we have independence. Then for our choice of D and F the test for independence is:

P(D|F) =? P(D)

P(D|F) = 161/331 = 0.486. Is this equal to P(D) = 0.404? The answer is NO. This means the row and column characteristics are NOT statistically independent. Since P(D given F) = 0.486 is not equal to P(D) = 0.404, the row characteristic and the column characteristic are dependent. We reject independence.

In general, the left side of the test has the conditional probability of some column given some row, and the right side of the test is the probability of that same column irrespective of the row condition. To summarize, having chosen D and F, the formal test is: P(D|F) =? P(D).

Let us think about the formula a bit more. There are two things on the left, D and F. Note that we must choose the symbol appearing before the conditioning bar, here "D," on the right hand side. If both probabilities are numerically equal (a very rare thing) we satisfy the formal test and conclude that there is independence between the row and column characteristics.

At the risk of confusing you, consider what happens if we switch the conditioning. An equivalent test is to check whether P(F|D) =? P(F). Notice we now have the symbol F before the conditioning bar "|". Hence we must choose P(F) on the right side of the testing equation. Verify that we reject independence here too.

EXAMPLE WHERE ROW AND COLUMN ARE STATISTICALLY INDEPENDENT

Take another example: B = baby is happy, U = baby is unhappy, M = mother is present, F = father is present. The baby was observed for 100 hours:

            M      F    Total
B          30     30      60
U          20     20      40
Total      50     50     100 = GT (grand total)

Is the baby independent of the mother? Yes, if P(B given M) = P(B). Check whether the baby is happy whether or not the mother is present! The conditional probability P(B given M) = joint/marginal = (30/GT) / (50/GT).
i.e., (30/50) = 0.6. Now the unconditional (marginal) probability that the baby is happy is 60/GT, which is also 0.6. Conclude that the conditional probability P(B given M) is exactly the same as the unconditional or marginal probability P(B) NUMERICALLY. Hence the baby is statistically independent of the mother: the ROW characteristic is independent of the column characteristic.

Independent events rule: P(A | B) = P(A), or equivalently P(B | A) = P(B).

QUIZ ANSWERS:
1) Find the probability that a randomly chosen person is Republican: 320/671 = 0.4769.
1b) P(F or I) = P(F) + P(I) - P(F and I) = (331/GT) + (80/GT) - (40/GT) = 371/GT.
2) Find the conditional probability P(F|I) = Joint/Marginal, with Joint = 40/GT and Marginal = 80/GT. Cancel GT to yield P(F|I) = 40/80 = 0.5.
3) Give the formula for the test of statistical independence: P(F|I) =? P(F). Are political affiliation and attitude toward federal healthcare statistically independent? P(F) = 331/671 = 0.4933, which is clearly not equal to 0.5, so political affiliation matters, so they are dependent.

BAYES THEOREM

A good source these days is: http://en.wikipedia.org/wiki/Bayes'_theorem

Example 1, statement of the problem: A city XYZ has reported 6 in 1000 HIV-positive cases. A person from that city was tested for HIV and the test was positive. The HIV test is known to be subject to two types of errors. The false negative (person declared HIV negative even though he really has HIV) error rate is 1%. The false positive (person declared HIV positive even though he really is free of HIV) error rate is 1 in 1000. What is the probability that he has HIV, given that he tested positive on the test with these known error rates?

H1 = being HIV positive, H2 = being HIV negative. These are competing hypotheses. Since the person is from city XYZ we can assume that there is a P(H1) = 0.006 probability that he is HIV positive. (This is the prior probability, or what frequentists call the prejudice of the researcher.) What about evidence or events?
Define E1 = test is positive, E2 = test is negative. We know the error rates of the tests, which are the following conditional probabilities.

Given that the person has HIV, the test wrongly declares him disease-free (a false negative) 1 in 100 times. That is, P(E2|H1) = 0.01.
Given that the person has no HIV, the test wrongly declares that he has the disease (a false positive) 1 in 1000 times. That is, P(E1|H2) = 0.001.

This is all the information we have, and we must compute the posterior probability that the person from city XYZ has HIV knowing that he tested positive: find P(H1|E1).

I suggest a setup with a long horizontal line, above which is H1 and below which is H2. Note how things must add up to 1 for the priors, for the conditionals above the line, and for the conditionals below the line. Finally, the two posteriors should also add up to unity: P(H1|E1) + P(H2|E1) = 1.

P(H1)=0.006     P(E1|H1) = 0.99      P(E2|H1) = 0.01
------------------------------------------------------------------ long horizontal line
P(H2)=0.994     P(E1|H2) = 0.001     P(E2|H2) = 0.999

Bayes theorem says that

P(H1|E1) = P(E1|H1) * P(H1) / [ Σ (i=1 to k) P(E1|Hi) * P(Hi) ],

where k is the number of alternative hypotheses, here k = 2.

Numerator of the Bayes theorem right side = P(E1|H1)*P(H1) = 0.99*0.006
#R command
num=0.99*0.006 #(=0.00594)

Note that the first term of the denominator is the same as the numerator. The second term is obtained by replacing H1 by H2 while keeping E1 the same:
P(E1|H2)*P(H2) = 0.001*0.994.
#R command to compute the second term in the denominator
den2=0.001*0.994 #(=0.000994)

The posterior probability answer for H1 by Bayes theorem then is 0.00594/(0.00594+0.000994) = 0.00594/0.006934:
P(H1|E1) = 0.8566484
#R command
num/(num+den2)

The second posterior probability, for H2, i.e., P(H2|E1), by Bayes theorem is:
#R command
den2/(num+den2) # 0.1433516

Verify that the two posteriors add up to unity: P(H1|E1) + P(H2|E1) = 0.8566484 + 0.1433516 = 1.

Let ∝ denote "proportional to." Then in words (upon ignoring the denominator) Bayes theorem says:

(Posterior probability) ∝ (Prior probability) * (Likelihood),

where the posterior probability is the revised probability. The prior probability is always the probability of a hypothesis, and the likelihood is the probability of an event given that hypothesis. Note that the likelihood is observable (objective), while the prior probability can be subjective (prejudice). Some scientists, the so-called frequentists, think that we should not revise probabilities obtained from objective data. Bayesians, on the other hand, argue that only the revised probabilities are reliable.

Derivation of Bayes theorem: The following left side (LHS) equals the right side (RHS) by the definition of conditional probability. We fix E1 as the event of interest and ignore E2.

P(H1|E1) = Joint(H1 & E1) / Marginal(E1) = P(H1 ∩ E1) / P(E1)    (1)

Multiply both sides by P(E1) to yield

P(H1 & E1) = P(H1|E1) * P(E1)    (2)

Conversely, the other conditional probability, the so-called likelihood, is

P(E1|H1) = Joint(H1 & E1) / Marginal(H1) = P(H1 ∩ E1) / P(H1)    (3)

Now multiply both sides of (3) by P(H1) to yield

P(H1 & E1) = P(E1|H1) * P(H1)    (4)

Note that the left sides of (2) and (4) are exactly the same, so we can equate the right sides as well, to yield

P(H1|E1) * P(E1) = P(E1|H1) * P(H1)    (5)

Note that the right side of (5) is the numerator of Bayes theorem. The left side of (5) has two terms, of which the first is the left side of Bayes theorem.
So we are almost there in proving it, provided P(E1) equals the denominator of Bayes theorem. Rewrite (5) as:

P(H1|E1) = P(E1|H1) * P(H1) / P(E1)    (6)

where the left side is the posterior probability of H1 given the event E1. For philosophical discussion it is appropriate to ignore the denominator on the right hand side of (6) and write the theorem as the statement that the left side is proportional to (∝) the numerator of the right hand side. Then Bayes theorem states

posterior = P(H1|E1) ∝ P(E1|H1) * P(H1) = Likelihood * Prior.    (6b)

For the numerical computation of posterior probabilities one does need the denominator P(E1); it is derived and explained next. The contingency table by definition is:

             E1            E2            Row total
H1           P(H1&E1)      P(H1&E2)      P(H1)
H2           P(H2&E1)      P(H2&E2)      P(H2)
Col total    P(E1)         P(E2)         1

Inserting numerical values from the HIV example (we fill this table using the definitional relation joint probability = conditional probability * marginal probability) we have:

             E1            E2            Row total
H1           0.00594       0.00006       0.006
H2           0.000994      0.993006      0.994
Col total    0.006934      0.993066      1

The following is seen from the column entitled E1 in the contingency table:
P(E1) = P(E1 & H1) + P(E1 & H2), a sum of joint probabilities. We can split each joint probability in that column into a conditional probability times a marginal probability (again by the definition of conditional probability):

P(E1 & H1) = P(E1|H1) * P(H1)
P(E1 & H2) = P(E1|H2) * P(H2)

Hence the denominator of Bayes theorem is written as a summation with as many terms as there are hypotheses. Here there are two hypotheses, so we have:

P(E1) = P(E1|H1)*P(H1) + P(E1|H2)*P(H2) = Σi P(E1|Hi) * P(Hi)    (7)

This finishes the derivation of the denominator. Now substituting (7) in (6) we have the usual form of Bayes theorem:

P(H1|E1) = P(E1|H1) * P(H1) / [ Σ (i=1 to 2) P(E1|Hi) * P(Hi) ]    (8)

QED, proof completed.
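The theorem just derived can be written as a small reusable function. The following is a Python sketch (the function name posteriors and the list layout are mine); it reproduces the HIV example computed above:

```python
def posteriors(priors, likelihoods):
    """Bayes theorem for k competing hypotheses.

    priors[i]      = P(Hi), assumed to sum to 1
    likelihoods[i] = P(E1 | Hi), the likelihood of the observed event
    Returns [P(Hi | E1)] for every i. The denominator P(E1) is the
    sum of likelihood*prior over all hypotheses, as in eq. (7).
    """
    numerators = [lk * p for lk, p in zip(likelihoods, priors)]
    denom = sum(numerators)            # P(E1), eq. (7)
    return [n / denom for n in numerators]

# HIV example: priors P(H1)=0.006, P(H2)=0.994;
# likelihoods P(E1|H1)=0.99, P(E1|H2)=0.001.
post = posteriors([0.006, 0.994], [0.99, 0.001])
print(round(post[0], 7))  # 0.8566484, matching the R computation above
```

Because the same denominator divides every numerator, the returned posteriors always add up to 1, which is the sanity check the notes recommend.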
Without loss of generality this theorem can also be proved for the other conditional probability, of the hypothesis H2:

P(H2|E1) = P(E1|H2) * P(H2) / [ Σ (i=1 to 2) P(E1|Hi) * P(Hi) ]    (9)

Example (Salesman): There is a 0.4 probability that a salesman will be sent by Morgan Co. to call on Mr. Smith. If Morgan does not send a salesman, there is a 0.7 probability that Smith will buy paper from Morgan's competitor Xerox Co. If, on the other hand, Morgan does send a salesman, there is only a 0.2 probability that Smith will buy from Xerox. If Smith does buy from Xerox, what is the probability that Morgan did not send a salesman?

ANSWER: The event E1 is that Smith buys from Xerox (this comes from the last sentence); E2 is that Smith buys from Morgan. There are two hypotheses involved:
H1: Morgan sent a salesman
H2: Morgan did not send a salesman

We are asked to find the probability of H2 given that event E1 occurred, i.e., P(H2|E1). Since we are looking for the probability of a hypothesis, we need Bayes theorem as in equation (9), because here we are asked to find the conditional probability P(H2|E1). In general, the right hand sides of equations (8) and (9) show that Bayes theorem computations need two kinds of probabilities: (i) prior probabilities of the exhaustive set of all hypotheses, AND (ii) the conditional probability of the event E1 conditional on the hypotheses, P(E1|H1) and P(E1|H2).
However, the conditional probability for E2 (the opposite of event E1) does not appear at all in Bayes formulas (8) or (9). Once we compute (i) and (ii), we are ready to apply Bayes theorem.

Let us compute the prior probabilities first: P(H1) = 0.4 from the first sentence of the problem. P(H2) = 0.6 because this is the complement of 0.4. Note that 0.4 + 0.6 = 1.0: H1 and H2 are mutually exclusive and exhaustive, so P(H1) + P(H2) must be a certainty, i.e., its probability must be 1.0.

I like to organize the information as follows, separately for the two hypotheses, with a long horizontal line separating them. The probabilities conditional on the two hypotheses are then conveniently written separated by the long horizontal line. The conditional probabilities add up to 1 for each hypothesis. Their values are discussed below in detail.

Salesman sent:      P(H1)=0.4     P(E1|H1)=0.2     P(E2|H1)=0.8
------------------------------------------------------------------ long horizontal line
Salesman not sent:  P(H2)=0.6     P(E1|H2)=0.7     P(E2|H2)=0.3

Now we turn to the second part, the conditional probabilities needed for Bayes theorem. The phrase "if the salesman is sent" means conditional on H1; the phrase "if the salesman is not sent" means conditional on H2. The second sentence of the original problem is stated conditional on H2, so this information goes below the horizontal line: the probability that Smith will buy from Xerox conditional on H2 is 0.7, i.e., P(E1|H2) = 0.7. See this written below the line in the setup above. The third sentence says that if the salesman is sent (conditional on H1), the probability that Smith will buy from Xerox is only 0.2: P(E1|H1) = 0.2. See this written above the line. Now we are ready to plug into the statement of Bayes theorem, eq.
(9)   P(H2|E1) = P(E1|H2) P(H2) / [ P(E1|H1) P(H1) + P(E1|H2) P(H2) ]
              = (0.7 × 0.6) / (0.2 × 0.4 + 0.7 × 0.6)
              = 0.42 / (0.08 + 0.42) = 0.84

Example 2 (WHO DONE IT?): From past records, the chance that the accounting mistake was made by Tom is 0.5, by Dick 0.25, and by Jane 0.25. With the kind of latest mistake, the evidence points to Tom and Jane (1 in 1000 each), while for Dick it is 2 in 1000. What is the posterior probability for each of them? Use Bayes theorem. Two useful computational checks: the priors add up to 1, and the conditionals add up to 1 for each hypothesis. E1 = mistake was made, E2 = no mistake made.

Tom:   P(H1)=0.50     P(E1|H1)=0.001     P(E2|H1)=0.999
--------------------------------------------------- long horizontal line
Dick:  P(H2)=0.25     P(E1|H2)=0.002     P(E2|H2)=0.998
--------------------------------------------------- long horizontal line
Jane:  P(H3)=0.25     P(E1|H3)=0.001     P(E2|H3)=0.999

Numerator of Bayes theorem for Tom:  P(E1|H1) * P(H1) = 0.001*0.5  = 0.0005
Numerator of Bayes theorem for Dick: P(E1|H2) * P(H2) = 0.002*0.25 = 0.0005
Numerator of Bayes theorem for Jane: P(E1|H3) * P(H3) = 0.001*0.25 = 0.00025
Denominator for all = grand sum of the above = 0.00125
Posterior for Tom and for Dick = 0.0005/0.00125 = 0.4 each; posterior for Jane = 0.2. So Tom or Dick did it!

EXAMPLE 3: You are an investor. The IRS audits corporations, and audits have an effect on stock prices. We know that 20% of corporations will file an incorrect return. The IRS itself makes errors: sometimes IRS auditors claim there is an error when in reality there was none, which happens 10% of the time. Conversely, IRS auditors miss real errors 30% of the time. News reports say that the IRS has just notified XYZ Corporation that there is an error in its corporate tax return. Use Bayes theorem to determine the posterior probability of an erroneous return by XYZ Corporation.

Step 1: Define the event! News of an IRS audit: E1 = error found, E2 = no error found.
Step 2: Prior probabilities?
Step 3: What can affect the prior probability? Determine the conditional probabilities.
Step 4: Compute the posterior probability numerators and then the posterior probabilities.

H1 = XYZ Corp.'s return does have an error
H2 = XYZ Corp.'s return does NOT have an error
E1 = IRS finds an error
E2 = IRS does NOT find any error

As investors we are interested in finding whether XYZ Corporation actually has an error in its tax return, now that the news has broken that it is being audited. That is, we want the posterior probability P(H1|E1).

Above the horizontal line we have P(H1). Given that XYZ Corp. has submitted an erroneous return (i.e., given H1), the probability that the IRS will find it is P(E1|H1) = 0.70. [We know this from the statement "IRS auditors miss real errors 30% of the time," which means they find them 70% of the time; this is called the likelihood.] Hence P(E2|H1) = 0.30.

Below the horizontal line we have P(H2), that XYZ Corp.'s return does not have an error. Given that the return is good (no error), the IRS does make mistakes and alleges errors in 10% of cases, allegations which eventually turn out to be wrong: P(E1, IRS finds error | H2, really no error) = 0.10.

P(H1)=0.20     P(E1|H1)=0.7     P(E2|H1)=0.3
-------------------------------------------- long horizontal line
P(H2)=0.80     P(E1|H2)=0.1     P(E2|H2)=0.9

For the posterior probability P(H1|E1), the numerator is P(E1|H1)*P(H1) = 0.7*0.2 = 0.14.
For the posterior probability P(H2|E1), the numerator is P(E1|H2)*P(H2) = 0.1*0.8 = 0.08.
The summation of all numerators, 0.14 + 0.08 = 0.22, is the denominator for the posterior probabilities. So the posterior probability by Bayes theorem that the XYZ corporate return actually has an error is

P(H1|E1) = Bayes numerator / Bayes denominator = 0.14/0.22 = 0.6364.

EXAMPLE 4: A firm screens prospective employees by using a test. Among those who perform their jobs satisfactorily, 65% passed the test. Among those who did NOT perform their jobs satisfactorily and were fired, 25% passed the test.
According to the recorded data, 90% of employees perform satisfactorily. What is the probability that a prospective employee who passed the test will not perform satisfactorily?

ANSWER: Define the events E1 = passing the test and E2 = failing, and the hypotheses H1 = satisfactory performance on the job and H2 = unsatisfactory performance.

H1, satisfactory (0.90):    Passed: P(E1|H1)=0.65     Failed: P(E2|H1)=0.35
---------------------------------------------------------------- long horizontal line
H2, unsatisfactory (0.10):  Passed: P(E1|H2)=0.25     Failed: P(E2|H2)=0.75

Find P(unsatisfactory | passed) = P(H2|E1)
  = P(E1|H2) P(H2) / [ P(E1|H2) P(H2) + P(E1|H1) P(H1) ]
  = (0.25*0.10) / (0.25*0.1 + 0.65*0.9) = 0.0410.

This is the answer. Note that P(E2|H1) and P(E2|H2) do not appear in the Bayes formula at all. Sometimes the problem statement contains information about P(E2|H1) instead of the needed information about P(E1|H1). Since the two conditional probabilities on either side of the long horizontal line always add up to 1, we can readily find the needed value by using P(E1|H1) = 1 - P(E2|H1).

EXAMPLE 5: A town suspects that teens are out to steal. Prior data show that the probability that a teen commits a theft is 0.8, and the probability that an adult commits a theft is 0.2. A theft was reported, and a teen and an adult were accused. Investigation showed that the probability that the accused teen is guilty is 0.6, while the probability that the accused adult is guilty is 0.7. Find the probability that the accused teen did it.

ANSWER: Look at the last sentence. It should have P(H1|E1), the left side of a typical Bayes problem. Clearly here H1 = teen did it, H2 = adult did it, and E1 = event that the theft was committed. Now we just have to fill in the available information as conditional and marginal probabilities.
P(H1) and P(H2) are prior probabilities.

H1, teen (0.8):  guilty: P(E1|H1)=0.6     innocent: P(E2|H1)=0.4
---------------------------------------------------------------- long horizontal line
H2, adult (0.2): guilty: P(E1|H2)=0.7     innocent: P(E2|H2)=0.3

Find P(teen | theft) = P(H1|E1)
  = P(E1|H1) P(H1) / [ P(E1|H1) P(H1) + P(E1|H2) P(H2) ]
  = (0.6*0.8) / (0.6*0.8 + 0.7*0.2)

Find P(adult | theft) = P(H2|E1)
  = P(E1|H2) P(H2) / [ P(E1|H1) P(H1) + P(E1|H2) P(H2) ]
  = (0.7*0.2) / (0.6*0.8 + 0.7*0.2)

#R commands
a=0.6*0.8;a
b=0.2*0.7;b
teen=a/(a+b);teen #posterior prob. that the accused teen did it = a/(a+b) = 0.7741935
adult=b/(a+b);adult #posterior prob. that the accused adult did it = b/(a+b) = 0.2258065

If there were no age profiling, the probability would be only 0.6 for the teen, instead of 0.7741935. This shows that applying Bayes theorem can be unfair. Frequentists are opposed to Bayes; they say that only P(E1|H1) = 0.6 should be relevant.

The generic contingency table is:

             E1            E2            Row total
H1           P(H1&E1)      P(H1&E2)      P(H1)
H2           P(H2&E1)      P(H2&E2)      P(H2)
Col total    P(E1)         P(E2)         1

The contingency table for this problem is:

             E1       E2       Row total
H1           0.48     0.32     0.8
H2           0.14     0.06     0.2
Col total    0.62     0.38     1

We filled this table by using the following definitional relations:
P(E1|H1) = 0.6 means (joint of E1 & H1) / (marginal 0.8) is 0.6, so the joint is 0.8*0.6 = 0.48.
P(E1|H2) = 0.7 means (joint of E1 & H2) / (marginal 0.2) is 0.7, so the joint is 0.7*0.2 = 0.14.

Example (Defective Product: which plant produced it?): A company's output is split between two plants, Plant A producing 60% and Plant B producing 40%. Past records show that A's products are generally 90% good and B's are 95% good. Given that a defective product is complained about, what is the probability that Plant A produced it?

The event is that a defective-product complaint is received. The priors are P(HA) = 0.6 and P(HB) = 0.4 (the supply propensity of each plant; no hidden prejudice here). E1 = good product, E2 = defective product.
We are asked to find P(HA|E2). [By the way, the fact that 60% of output comes from Plant A does not mean that 60% of defectives also come from Plant A. Frequentists say that this prior should be irrelevant and that what matters is how many defectives are produced by each plant. But Bayes theorem says that both should matter. I agree that as long as there is no unfair prejudice, Bayes's processing of information is optimal.]

HA (0.6): good: P(E1|HA)=0.90     defective: P(E2|HA)=0.10
---------------------------------------------------------------- long horizontal line
HB (0.4): good: P(E1|HB)=0.95     defective: P(E2|HB)=0.05

Posterior probability that Plant A produced it = prior*conditional / denom = 0.6*0.1/denom.
Posterior probability that Plant B produced it = 0.4*0.05/denom.

R code:
a=0.6*0.1; b=0.4*0.05; den=a+b
postA=a/den; postA #0.75 is the answer

The posterior for B is 0.25. That is, Plant A is the likely culprit here. When we are not considering the guilt or innocence of an individual, Bayes theorem is useful in business.

Example (Marketing choice): ABElectronics Inc. wants to sell new models of TV sets. In the past, 40% of new models were successfully sold and 60% failed to sell. A favorable recommendation from a consumer report matters: in the past, 80% of successful models had a favorable recommendation from consumer reports, while 30% of unsuccessful models had also received a favorable recommendation. The new model has received a favorable recommendation from consumer reports. What is the probability that it will be successful in the end?
H1 = successful model, prior (past "prejudice" probability) P(H1) = 0.40
H2 = unsuccessful model, prior P(H2) = 0.60
E1 = consumer report recommends the model
E2 = consumer report does NOT recommend it

H1 (0.4): CR recommends | success: P(E1|H1)=0.80       CR rejects | success: P(E2|H1)=0.20
---------------------------------------------------------------- long horizontal line
H2 (0.6): CR recommends | unsuccess: P(E1|H2)=0.30     CR rejects | unsuccess: P(E2|H2)=0.70

Find the posterior P(H1|E1) = numerator/denominator:
num1 = P(E1|H1)*P(H1) = 0.8*0.4
den1 = P(E1|H1)*P(H1) = 0.8*0.4 (first denominator term, same as the numerator)
den2 = P(E1|H2)*P(H2) = 0.3*0.6

ans = 0.8*0.4 / (0.8*0.4 + 0.3*0.6) = 0.32/0.50 = 0.64
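The marketing example above can be checked numerically with a short Python sketch (the dictionary layout and variable names are mine; the probabilities are those of the example):

```python
# Marketing example: H1 = model succeeds, H2 = it does not;
# E1 = consumer report recommends the model.
priors = {"H1": 0.40, "H2": 0.60}
like_E1 = {"H1": 0.80, "H2": 0.30}   # P(E1 | Hi), the likelihoods

# Bayes numerators: likelihood * prior for each hypothesis.
numerators = {h: like_E1[h] * priors[h] for h in priors}
denom = sum(numerators.values())      # P(E1) = 0.32 + 0.18 = 0.50

post_success = numerators["H1"] / denom
print(round(post_success, 2))  # 0.64
```

Note how the favorable recommendation revises the probability of success upward from the prior 0.40 to the posterior 0.64.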