Statistics 400 - Lecture 23

• Last Day: Regression
• Today: Finish Regression, Test for Independence (Section 13.4)
• Suggested problems: 13.21, 13.23

Computer Output

• Will not normally compute regression line, standard errors, … by hand
• Key will be identifying what computer is giving you

SPSS Example

[SPSS regression output: Model Summary, ANOVA, and Coefficients tables; values not recoverable]

• What is the Coefficients Table?
• What is the Model Summary?
• What is the ANOVA Table?

Back to Probability

• The probability of an event, A, occurring can often be modified after observing whether or not another event, B, has taken place
• Example: An urn contains 2 green balls and 3 red balls. Suppose 2 balls are selected at random one after another without replacement from the urn.
• Find P(Green ball appears on the first draw)
• Find P(Green ball appears on the second draw)

Conditional Probability

• The Conditional Probability of A given B:

    P(A | B) = P(A and B) / P(B)

• Example: An urn contains 2 green balls and 3 red balls. Suppose 2 balls are selected at random one after another without replacement from the urn.
• A = {Green ball appears on the second draw}
• B = {Green ball appears on the first draw}
• Find P(A | B) and P(A^c | B)

Example:

• Records of student patients at a dentist's office concerning fear of visiting the dentist suggest the following proportions

                              School Level
                         Elementary   Middle   High
    Fear Dentist            0.12       0.08    0.05
    Do Not Fear Dentist     0.28       0.25    0.22

• Let A = {Fears Dentist}; B = {Middle School}
• Find P(A | B)

Conditional Probability and Independence

• If fearing the dentist does not depend on age or school level, what would we expect the probability distribution in the previous example to look like?
• What does this imply about P(A | B)?
• If A and B are independent, what form should the conditional probability take?
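The urn and dentist examples above can be checked numerically. The following is a minimal sketch, not part of the original slides: it enumerates the equally likely ordered draws from the urn and applies P(A | B) = P(A and B) / P(B) directly. The helper names (`prob`, `first_green`, `second_green`, `joint`) are made up for illustration, and the dentist proportions are taken from the table as read here.

```python
from fractions import Fraction
from itertools import permutations

# Urn example: 2 green and 3 red balls, 2 draws without replacement.
balls = ["G", "G", "R", "R", "R"]
# All ordered pairs of distinct balls; each of the 20 pairs is equally likely.
draws = list(permutations(range(len(balls)), 2))

def prob(event):
    """Exact probability of `event` over the equally likely ordered draws."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

def first_green(d):   # event B
    return balls[d[0]] == "G"

def second_green(d):  # event A
    return balls[d[1]] == "G"

p_b = prob(first_green)
p_a = prob(second_green)
p_a_and_b = prob(lambda d: first_green(d) and second_green(d))
p_a_given_b = p_a_and_b / p_b     # P(A | B) = P(A and B) / P(B)
p_ac_given_b = 1 - p_a_given_b    # complement of A, still conditional on B

# Dentist example: joint proportions (attitude, school level) from the table.
joint = {
    ("fear", "elementary"): Fraction("0.12"),
    ("fear", "middle"): Fraction("0.08"),
    ("fear", "high"): Fraction("0.05"),
    ("no fear", "elementary"): Fraction("0.28"),
    ("no fear", "middle"): Fraction("0.25"),
    ("no fear", "high"): Fraction("0.22"),
}
p_middle = sum(p for (_, level), p in joint.items() if level == "middle")
p_fear = sum(p for (attitude, _), p in joint.items() if attitude == "fear")
p_fear_given_middle = joint[("fear", "middle")] / p_middle

print("P(B) =", p_b, " P(A) =", p_a)
print("P(A | B) =", p_a_given_b, " P(A^c | B) =", p_ac_given_b)
print("P(Fear | Middle) =", p_fear_given_middle, " P(Fear) =", p_fear)
```

If A and B were independent, P(A | B) would equal P(A). In the dentist table, P(Fear | Middle) = 8/33 while P(Fear) = 1/4, so the two are close but not equal.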
Summarizing Bivariate Categorical Data

• Have studied bivariate continuous data (regression)
• Often have two (or more) categorical measurements taken on the same sampling unit
• Data usually summarized in 2-way tables
• Often called contingency tables

Test for Independence

• Situation: We draw ONE random sample of predetermined size and record 2 categorical measurements
• Because we do not know in advance how many sampled units will fall into each category, neither the column totals nor the row totals are fixed

Example:

• Survey conducted by sampling 400 people who were questioned regarding union membership and attitude towards decreased spending on social programs

                   Union   Non-Union   Total
    Support         112        84       196
    Indifferent      36        68       104
    Opposed          28        72       100
    Total           176       224       400

• Would like to see if the distribution of union membership is independent of support for social programs
• If the two distributions are independent, what does that say about the probability of a randomly selected individual falling into a particular category?
• What would the expected count be for each cell?
• What test statistic could we use?

Formal Test

• Hypotheses:
• Test Statistic:
• P-Value:

Spurious Dependence

• Consider admissions from a fictional university by gender

             Male   Female   Total
    Admit     490     280     770
    Deny      210     220     430

             Male   Female
    Admit    0.70    0.56
    Deny     0.30    0.44

• Is there evidence of discrimination?
• Consider same data, separated by schools applied to:
• Business School:

             Male   Female
    Admit     480     180
    Deny      120      20

             Male   Female
    Admit    0.80    0.90
    Deny     0.20    0.10

• Law School:

             Male   Female
    Admit      10     100
    Deny       90     200

             Male   Female
    Admit    0.10    0.33
    Deny     0.90    0.67

• Simpson's Paradox: Reversal of comparison due to aggregation
• Contradiction of initial finding because of presence of a lurking variable
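Returning to the union membership survey, the questions about expected counts and a test statistic can be explored with a short calculation. The slides leave the formal test blank, so this sketch assumes the standard Pearson chi-square test of independence: expected count = (row total × column total) / n, statistic = sum of (observed − expected)² / expected, with (rows − 1)(columns − 1) degrees of freedom. The variable names are illustrative only.

```python
import math

# Observed counts from the survey: columns are (Union, Non-Union).
observed = [
    [112, 84],  # Support
    [36, 68],   # Indifferent
    [28, 72],   # Opposed
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, P(cell) = P(row) * P(column), so the expected count
# for a cell is n * (row_total / n) * (col_total / n).
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (assumed here; the slide leaves it blank).
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom, the chi-square upper-tail probability
# has the closed form exp(-x / 2); this shortcut does not apply to other df.
p_value = math.exp(-chi_sq / 2) if df == 2 else None

print("expected counts:", expected)
print("chi-square =", round(chi_sq, 3), "df =", df, "p-value =", p_value)
```

For tables with other dimensions, a library routine such as `scipy.stats.chi2_contingency` would be a more general choice than the closed-form tail used here.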
