Probability and Statistics
AMP Institutes & Workshops
Saturday, April 4th, 2015

Trey Cox, Ph.D., Mathematics Faculty, Chandler-Gilbert Community College
James Spiker, A.P. Statistics, Basha High School, Chandler, AZ

This work was supported in part by MSP grant #1103080 through the National Science Foundation. Opinions expressed are those of the authors and not necessarily those of the NSF.

© 2014 Relay Graduate School of Education and Teach For America. All rights reserved.

Bivariate Data Analysis – Qualitative/Categorical

On April 15, 1912, the Titanic struck an iceberg and rapidly sank with only 710 of her 2,204 passengers and crew surviving.

I wonder… were the rich people more likely to survive? Was chivalry alive and well on the Titanic? Was it “every man for himself”?

                         Survived   Did not survive
1st class passengers        201          123
2nd class passengers        118          166
3rd class passengers        181          528

Bivariate Data: Passenger Class and Survival
Response variable? _______________
Explanatory variable? ______________

CCSS.MATH.CONTENT.8.SP.A.4
Understand that patterns of association can also be seen in bivariate categorical data by displaying frequencies and relative frequencies in a two-way table. Construct and interpret a two-way table summarizing data on two categorical variables collected from the same subjects. Use relative frequencies calculated for rows or columns to describe possible association between the two variables.
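The row relative frequencies that the standard above refers to can be computed directly from the Titanic counts. What follows is a minimal Python sketch, not part of the original slides; the names `counts` and `row_percentages` are illustrative assumptions, and percentages are rounded to whole numbers.

```python
# Counts from the slides: class -> (survived, did not survive)
counts = {
    "1st class": (201, 123),
    "2nd class": (118, 166),
    "3rd class": (181, 528),
}


def row_percentages(table):
    """Express each row's counts as whole-number percentages of that row's total."""
    result = {}
    for label, row in table.items():
        total = sum(row)  # marginal (row) total for this class
        result[label] = tuple(round(100 * value / total) for value in row)
    return result


row_pct = row_percentages(counts)
for label, (survived, died) in row_pct.items():
    print(f"{label}: {survived}% survived, {died}% did not survive")
```

Comparing the survival percentage across the three rows is one way to describe a possible association between class and survival.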
CCSS.MATH.CONTENT.HSS.ID.B.5
Summarize categorical data for two categories in two-way frequency tables. Interpret relative frequencies in the context of the data (including joint, marginal, and conditional relative frequencies). Recognize possible associations and trends in the data.

CCSS.MATH.CONTENT.HSS.CP.A.4
Construct and interpret two-way frequency tables of data when two categories are associated with each object being classified. Use the two-way table as a sample space to decide if events are independent and to approximate conditional probabilities.

Is there an association between class and survival?

                         Survived   Did not survive
1st class passengers        201          123
2nd class passengers        118          166
3rd class passengers        181          528

                         Survived   Did not survive   TOTAL
1st class passengers        201          123            324
2nd class passengers        118          166            284
3rd class passengers        181          528            709
TOTAL                       500          817           1317

So, what do you think? Is there an association between class and survival?

Which table is easier to use to come to a conclusion? Why?
What is the difference between the two tables?
How is the second table generated from the first table?
Frequency table

                         Survived   Did not survive   TOTAL
1st class passengers        201          123            324
2nd class passengers        118          166            284
3rd class passengers        181          528            709

Relative frequency table (values in percent)

                         Survived   Did not survive   TOTAL
1st class passengers         62           38            100
2nd class passengers         42           58            100
3rd class passengers         26           74            100

How was this second table generated?

                         Survived   Did not survive
1st class passengers         40           15
2nd class passengers         24           20
3rd class passengers         36           65
TOTAL                       100          100

What question do the two tables help you answer?

                         Survived   Did not survive   TOTAL
1st class passengers         62           38            100
2nd class passengers         42           58            100
3rd class passengers         26           74            100

Your Turn!
Can you make any substantiated claims from this data?

                         Survived   Did not survive
Children in 1st class          4            1
Women in 1st class           139            4
Men in 1st class              58          118
Children in 2nd class         22            0
Women in 2nd class            83           12
Men in 2nd class              13          154
Children in 3rd class         30           50
Women in 3rd class            91           88
Men in 3rd class              60          390

Where does all of this go?

Q: Is there a more formal way to quantitatively measure if there is a significant difference between the different classes or genders in terms of who was saved and who perished?

A: The chi-square test provides a method for testing the association between the row and column variables in a two-way table.

\[ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \]

The expected value for each cell in a two-way table is equal to:

\[ \text{expected} = \frac{(\text{row total}) \times (\text{column total})}{n} \]

where n is the total number of observations included in the table.
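The two formulas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenters' own method: the names `observed`, `expected_counts`, and `chi_square` are hypothetical, and the sketch only computes the statistic (it does not look up a p-value).

```python
# Observed counts from the slides; columns are (survived, did not survive)
observed = [
    [201, 123],  # 1st class passengers
    [118, 166],  # 2nd class passengers
    [181, 528],  # 3rd class passengers
]


def expected_counts(table):
    """Expected count per cell: (row total) * (column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # total number of observations in the table
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
statistic = chi_square(observed)
print([[round(e, 2) for e in row] for row in expected])
print(round(statistic, 2))
```

Each cell contributes more to the statistic the further its observed count lies from its expected count, relative to the size of the expected count.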
Q: Why does this formula make sense for calculating the expected value (i.e., what we would expect the table values to be if there were no association)? In other words, why would we multiply the row total and column total and divide by n?

                         Survived   Did not survive   TOTAL
1st class passengers        201          123            324
2nd class passengers        118          166            284
3rd class passengers        181          528            709
TOTAL                       500          817           1317

Expected values?

\[ \text{expected} = \frac{(\text{row total}) \times (\text{column total})}{n} \]

                         Survived   Did not survive
1st class passengers       123.01       200.99
2nd class passengers       107.82       176.18
3rd class passengers       269.17       439.83

\[ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \]

1. Explain why the calculation of the chi-square statistic makes sense as a way to quantify if there is a difference between the variables in a table.
2. Do you think a large or small chi-square value would indicate an association between the two categorical variables? Explain.

CCSS.MATH.CONTENT.HSS.CP.A.3
Understand the conditional probability of A given B as P(A and B)/P(B), and interpret independence of A and B as saying that the conditional probability of A given B is the same as the probability of A, and the conditional probability of B given A is the same as the probability of B.

CCSS.MATH.CONTENT.HSS.CP.B.6
Find the conditional probability of A given B as the fraction of B's outcomes that also belong to A, and interpret the answer in terms of the model.
CCSS.MATH.CONTENT.HSS.CP.B.7
Apply the Addition Rule, P(A or B) = P(A) + P(B) - P(A and B), and interpret the answer in terms of the model.

CCSS.MATH.CONTENT.HSS.CP.B.8
Apply the general Multiplication Rule in a uniform probability model, P(A and B) = P(A)P(B|A) = P(B)P(A|B), and interpret the answer in terms of the model.

Conditional Probability – The Power of Two-way Tables

1. If one of the passengers is randomly selected, what is the probability that this passenger was in first class? Third class?
2. If one of the passengers is randomly selected, what is the probability that this passenger was in first class and survived?
3. If one of the passengers who survived is randomly selected, what is the probability that this passenger was in third class?

Have You Seen a Probability Problem like this one?...

The probability that a person has a certain virus is 0.005. A test used to detect the virus in a person is positive 80% of the time if the person has the virus and 5% of the time if the person does not have the virus. Let A be the event that “the person is infected” and B be the event “the person tests positive”. If a person tests positive, what is the probability that the person has the virus?

...and solved this way?

P(A) = 1/200 = 0.005
P(not A) = 0.995
P(B | A) = 0.80
P(B | not A) = 0.05

then we use:

\[ P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\text{not } A)\,P(B \mid \text{not } A)} \]

Why not solve it like this?...
Use a contrived frequency table

The probability that a person has a certain virus is 0.005.
A test used to detect the virus in a person is positive 80% of the time if the person has the virus and 5% of the time if the person does not have the virus. Let A be the event that “the person is infected” and B be the event “the person tests positive”. If a person tests positive, what is the probability that the person has the virus?

           Has virus   Does not have virus   TOTAL
Test +          4               50              54
Test −          1              945             946
TOTAL           5              995            1000
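The two approaches to the virus problem can be compared with a short Python sketch. It is an illustration only; the function names and the `population` parameter are assumptions introduced here, not part of the slides. The table approach keeps fractional people (49.75 false positives), whereas the slide rounds that cell to 50, so the slide's table gives 4/54 while the unrounded computation matches the formula exactly.

```python
prevalence = 0.005          # P(A): person has the virus
sensitivity = 0.80          # P(B | A): positive test given infection
false_positive_rate = 0.05  # P(B | not A): positive test given no infection


def posterior_by_formula(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) computed with the formula shown on the slide."""
    numerator = p_a * p_b_given_a
    return numerator / (numerator + (1 - p_a) * p_b_given_not_a)


def posterior_by_table(p_a, p_b_given_a, p_b_given_not_a, population=1000):
    """P(A | B) from a contrived frequency table for a hypothetical population."""
    infected = population * p_a
    true_positives = infected * p_b_given_a
    false_positives = (population - infected) * p_b_given_not_a
    # Among everyone who tests positive, what fraction is infected?
    return true_positives / (true_positives + false_positives)


by_formula = posterior_by_formula(prevalence, sensitivity, false_positive_rate)
by_table = posterior_by_table(prevalence, sensitivity, false_positive_rate)
print(round(by_formula, 4), round(by_table, 4))
```

Both routes give a probability of roughly 7%, which is the point of the frequency-table framing: the small share of true positives among all positive tests is visible directly in the "Test +" row.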