Stat 504, Lecture 6

Introduction to Two-Way Tables

Example 1: 2 × 2 table of counts and/or proportions

Table 1: Incidence of common colds involving French skiers (Pauling (1971), as reported in Fienberg (1980))

                  Cold   No Cold   Totals
Placebo             31       109      140
Ascorbic Acid       17       122      139
Totals              48       231      279

Table 2: Incidence of common colds involving French skiers, as proportions of the grand total (Pauling (1971), as reported in Fienberg (1980))

                  Cold   No Cold   Totals
Placebo          0.111     0.391    0.502
Ascorbic Acid    0.061     0.437    0.498
Totals           0.172     0.828    1

Q1: Compare the relative frequency of occurrence of some characteristic in two groups, e.g. is the probability that a member of the placebo group contracts a cold the same as the probability that a member of the ascorbic acid group contracts a cold?

Q2: Are two characteristics independent, e.g. are the type of treatment and contracting a cold associated?

Q3: Is one characteristic a cause of another, e.g. does ascorbic acid (vitamin C) have therapeutic value in preventing a cold?

Suppose that we collect data on two binary variables, Y and Z. Binary means that these variables take two possible values, say 1 (e.g. "cold") and 2 (e.g. "no cold"). Suppose we collect values of Y (e.g. treatment) and Z (e.g. contracting a cold) for n sample units. The data then consist of n pairs,

    (y_1, z_1), (y_2, z_2), ..., (y_n, z_n).

We can summarize the data in a frequency table. Let x_{ij} be the number of sample units having Y = i and Z = j. Then x = (x_{11}, x_{12}, x_{21}, x_{22}) is a summary of all n responses, e.g. x_{11} = 31. We could display x as a one-way table with four cells, but it is customary to display x as a square table with two rows and two columns:

          Z = 1    Z = 2
Y = 1    x_{11}   x_{12}
Y = 2    x_{21}   x_{22}

Marginal totals. When a subscript in a cell count x_{ij} is replaced by a plus sign (+), it means that we have taken the sum of the cell counts over that subscript.
The row totals are

    x_{1+} = x_{11} + x_{12},    x_{2+} = x_{21} + x_{22},

the column totals are

    x_{+1} = x_{11} + x_{21},    x_{+2} = x_{12} + x_{22},

and the grand total is

    x_{++} = x_{11} + x_{12} + x_{21} + x_{22} = n.

These quantities are often called marginal totals, because they are conveniently placed in the margins of the table, like this:

          Z = 1    Z = 2    total
Y = 1    x_{11}   x_{12}   x_{1+}
Y = 2    x_{21}   x_{22}   x_{2+}
total    x_{+1}   x_{+2}   x_{++}

If the sample units are randomly sampled from a large population, then x = (x_{11}, x_{12}, x_{21}, x_{22}) will have a multinomial distribution with index n = x_{++} and parameter vector π = (π_{11}, π_{12}, π_{21}, π_{22}), where π_{ij} = P(Y = i, Z = j).

          Z = 1    Z = 2    total
Y = 1    π_{11}   π_{12}   π_{1+}
Y = 2    π_{21}   π_{22}   π_{2+}
total    π_{+1}   π_{+2}   π_{++} = 1

The probability distribution {π_{ij}} is the joint distribution of Y and Z. When you sum the joint probabilities over one subscript, you get a marginal distribution, e.g. the probability distribution {π_{i+}} is the marginal distribution of Y, where P(Y = 1) = π_{1+} and P(Y = 2) = π_{2+}.

How does the distribution of Z change as the category of Y changes? The conditional distribution of Z given Y = i, for example, is {π_{j|i}}, where

    π_{j|i} = π_{ij} / π_{i+},

such that Σ_j π_{j|i} = 1.
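
The marginal totals, joint proportions, and conditional proportions defined above can be computed directly from the counts in Table 1. The following is a minimal Python sketch, not part of the original lecture; the function names (marginal_totals, joint_proportions, conditional_given_row) and the dictionary layout are hypothetical choices made for illustration. Note that it computes sample proportions x_{ij}/n and x_{ij}/x_{i+}, which are estimates of π_{ij} and π_{j|i}, not the population values themselves.

```python
# Counts from Table 1: rows are treatment (Y), columns are cold status (Z).
counts = {
    "Placebo": {"Cold": 31, "No Cold": 109},
    "Ascorbic Acid": {"Cold": 17, "No Cold": 122},
}


def marginal_totals(table):
    """Return (row totals x_{i+}, column totals x_{+j}, grand total x_{++})."""
    row_totals = {i: sum(row.values()) for i, row in table.items()}
    col_totals = {}
    for row in table.values():
        for j, x in row.items():
            col_totals[j] = col_totals.get(j, 0) + x
    grand_total = sum(row_totals.values())
    return row_totals, col_totals, grand_total


def joint_proportions(table):
    """Sample joint proportions x_{ij} / n, as shown in Table 2."""
    _, _, n = marginal_totals(table)
    return {i: {j: x / n for j, x in row.items()} for i, row in table.items()}


def conditional_given_row(table):
    """Sample conditional proportions x_{ij} / x_{i+} (Z given Y = i)."""
    row_totals, _, _ = marginal_totals(table)
    return {
        i: {j: x / row_totals[i] for j, x in row.items()}
        for i, row in table.items()
    }


row_totals, col_totals, n = marginal_totals(counts)
p = joint_proportions(counts)
cond = conditional_given_row(counts)

print("row totals:", row_totals)
print("column totals:", col_totals)
print("grand total:", n)
for group in counts:
    print(group, "P(cold | group) is about", round(cond[group]["Cold"], 3))
```

Under these assumptions, the sample proportion contracting a cold is about 0.221 in the placebo group (31/140) and about 0.122 in the ascorbic acid group (17/139). Comparing these two conditional proportions is one way to approach Q1.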