Probability and Statistics
AMP Institutes & Workshops
Saturday, April 4th, 2015
Trey Cox, Ph.D.
Mathematics Faculty
Chandler-Gilbert Community College
James Spiker
A.P. Statistics
Basha High School, Chandler, AZ
This work was supported in part by MSP grant #1103080 through the National Science Foundation.
Opinions expressed are those of the authors and not necessarily those of the NSF.
Bivariate Data Analysis –
Qualitative/Categorical
On the night of April 14–15, 1912, the Titanic struck an iceberg
and rapidly sank, with only 710 of her 2,204
passengers and crew surviving.
I wonder…were the rich people more likely to survive?
Was chivalry alive and well on the Titanic?
Was it “every man for himself”?
© 2014 Relay Graduate School of Education and Teach For America. All rights reserved.
Bivariate Data Analysis –
Qualitative/Categorical
                      Survived   Did not survive
1st class passengers       201               123
2nd class passengers       118               166
3rd class passengers       181               528
Bivariate Data Analysis –
Qualitative/Categorical
Bivariate Data: Passenger Class and Survival
Response variable? _______________
Explanatory variable? ______________
Bivariate Data Analysis –
Qualitative/Categorical
CCSS.MATH.CONTENT.8.SP.A.4
Understand that patterns of association can also
be seen in bivariate categorical data by displaying
frequencies and relative frequencies in a two-way
table. Construct and interpret a two-way table
summarizing data on two categorical variables
collected from the same subjects. Use relative
frequencies calculated for rows or columns to
describe possible association between the two
variables.
Bivariate Data Analysis –
Qualitative/Categorical
CCSS.MATH.CONTENT.HSS.ID.B.5
Summarize categorical data for two categories in two-way
frequency tables. Interpret relative frequencies in the
context of the data (including joint, marginal, and
conditional relative frequencies). Recognize possible
associations and trends in the data.
CCSS.MATH.CONTENT.HSS.CP.A.4
Construct and interpret two-way frequency tables of data
when two categories are associated with each object being
classified.
Use the two-way table as a sample space to decide if events
are independent and to approximate conditional
probabilities.
Bivariate Data Analysis –
Qualitative/Categorical
Is there an association between class and survival?
                      Survived   Did not survive   TOTAL
1st class passengers       201               123     324
2nd class passengers       118               166     284
3rd class passengers       181               528     709
TOTAL                      500               817    1317
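The TOTAL row and column are plain sums of the counts. A minimal Python sketch that recomputes the margins from the three class rows; the dictionary layout and the names `counts`, `row_totals`, `col_totals`, and `n` are illustrative choices, not part of the lesson.

```python
# Observed passenger counts: (survived, did not survive) for each class.
counts = {
    "1st class": (201, 123),
    "2nd class": (118, 166),
    "3rd class": (181, 528),
}

# Row totals: add across each row (passengers per class).
row_totals = {cls: sum(cells) for cls, cells in counts.items()}

# Column totals: add down each column (survivors, non-survivors).
col_totals = [sum(cells[j] for cells in counts.values()) for j in range(2)]

# Grand total n: every passenger in the table, counted once.
n = sum(row_totals.values())

for cls, (survived, died) in counts.items():
    print(f"{cls:<10} {survived:>5} {died:>5} {row_totals[cls]:>6}")
print(f"{'TOTAL':<10} {col_totals[0]:>5} {col_totals[1]:>5} {n:>6}")
```

Adding the row totals and adding the column totals must give the same n, which is a quick consistency check on any two-way table.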
Bivariate Data Analysis –
Qualitative/Categorical
So, what do you think? Is there an association between class and survival?
Bivariate Data Analysis –
Qualitative/Categorical
Which table is easier to use to come to a conclusion? Why?
What is the difference between the two tables?
How is the second table generated from the first table?
                      Survived   Did not survive   TOTAL
1st class passengers       201               123     324
2nd class passengers       118               166     284
3rd class passengers       181               528     709

Relative frequency table (%)
                      Survived   Did not survive   TOTAL
1st class passengers        62                38     100
2nd class passengers        42                58     100
3rd class passengers        26                74     100
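One way to check the relative frequency table: divide each count by its row total and convert to a whole-number percentage. A Python sketch; the dictionary layout and the name `row_pct` are illustrative, and rounding to whole percents is chosen to match the table.

```python
# Observed passenger counts: (survived, did not survive) for each class.
counts = {
    "1st class": (201, 123),
    "2nd class": (118, 166),
    "3rd class": (181, 528),
}

# Row relative frequencies: each cell divided by its row total, as a percentage.
row_pct = {}
for cls, (survived, died) in counts.items():
    row_total = survived + died
    row_pct[cls] = (round(100 * survived / row_total), round(100 * died / row_total))

for cls, (pct_survived, pct_died) in row_pct.items():
    print(f"{cls:<10} {pct_survived:>4} {pct_died:>4}")
```

Because every row is rescaled to 100, classes of very different sizes (324 versus 709 passengers) can be compared directly.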
Bivariate Data Analysis –
Qualitative/Categorical
How was this second table generated?
                      Survived   Did not survive
1st class passengers        40                15
2nd class passengers        24                20
3rd class passengers        36                65
TOTAL                      100               100
What question do the two tables help you answer?
                      Survived   Did not survive   TOTAL
1st class passengers        62                38     100
2nd class passengers        42                58     100
3rd class passengers        26                74     100
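The table with a TOTAL row of 100 divides each count by its column total instead. A Python sketch of that calculation; the names `survived_total`, `died_total`, and `col_pct` are illustrative.

```python
# Observed passenger counts: (survived, did not survive) for each class.
counts = {
    "1st class": (201, 123),
    "2nd class": (118, 166),
    "3rd class": (181, 528),
}

# Column totals: all survivors and all non-survivors.
survived_total = sum(s for s, _ in counts.values())
died_total = sum(d for _, d in counts.values())

# Column relative frequencies: each cell divided by its column total, as a percentage.
col_pct = {
    cls: (round(100 * s / survived_total), round(100 * d / died_total))
    for cls, (s, d) in counts.items()
}

for cls, (pct_s, pct_d) in col_pct.items():
    print(f"{cls:<10} {pct_s:>4} {pct_d:>4}")
```

Here the rounded columns happen to add to exactly 100; in general, rounding each cell separately can leave a column at 99 or 101.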
Bivariate Data Analysis –
Qualitative/Categorical
Your Turn! Can you make any substantiated
claims from this data?
                       Survived   Did not survive
Children in 1st class         4                 1
Women in 1st class          139                 4
Men in 1st class             58               118
Children in 2nd class        22                 0
Women in 2nd class           83                12
Men in 2nd class             13               154
Children in 3rd class        30                50
Women in 3rd class           91                88
Men in 3rd class             60               390
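To support a claim, it can help to turn each row into a survival proportion. A Python sketch; the group labels come from the table, while the sorting and formatting are illustrative. The proportions describe these data only; by themselves they do not test whether a difference is significant.

```python
# (survived, did not survive) for each age/sex group and class, from the table.
groups = {
    "Children in 1st class": (4, 1),
    "Women in 1st class": (139, 4),
    "Men in 1st class": (58, 118),
    "Children in 2nd class": (22, 0),
    "Women in 2nd class": (83, 12),
    "Men in 2nd class": (13, 154),
    "Children in 3rd class": (30, 50),
    "Women in 3rd class": (91, 88),
    "Men in 3rd class": (60, 390),
}

# Proportion of each group that survived (a row relative frequency).
survival_rate = {g: s / (s + d) for g, (s, d) in groups.items()}

# List groups from highest to lowest survival rate.
for g, rate in sorted(survival_rate.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{g:<22} {rate:6.1%}")
```

The group counts also add back up to the class totals used earlier (for example, 4 + 139 + 58 = 201 first-class survivors).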
Where does all of this go?
Q: Is there a more formal way to quantitatively
measure if there is a significant difference
between the different classes or genders in terms
of who was saved and who perished?
A: The chi-square test provides a method for
testing the association between the row and
column variables in a two-way table.
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
Where does all of this go?
The expected value for each cell in a two-way table
is equal to:
$$\text{expected} = \frac{(\text{row total}) \times (\text{column total})}{n}$$
where n is the total number of observations
included in the table.
Q: Why does this formula make sense for calculating the expected value (i.e., what we would expect the table values to be if there were no association)? In other words, why would we multiply the row total and column total and divide by n?
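As a worked instance of the formula, take the 1st class/Survived cell of the passenger table (row total 324, column total 500, n = 1317):

$$\text{expected} = \frac{324 \times 500}{1317} = \frac{162000}{1317} \approx 123.01$$

Equivalently, this is 324 × (500/1317): the overall survival rate (500 of 1317 passengers, about 38%) applied to the 324 first-class passengers, which is the count we would expect if class had no bearing on survival.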
Where does all of this go?
                      Survived   Did not survive   TOTAL
1st class passengers       201               123     324
2nd class passengers       118               166     284
3rd class passengers       181               528     709
TOTAL                      500               817    1317

Expected Values?
$$\text{expected} = \frac{(\text{row total}) \times (\text{column total})}{n}$$

                      Survived   Did not survive
1st class passengers    123.01            200.99
2nd class passengers    107.82            176.18
3rd class passengers    269.17            439.83
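The expected-value table can be reproduced cell by cell. A Python sketch, assuming the counts are stored as a list of rows (class) by columns (survived, did not survive); the names `observed`, `labels`, and `expected` are illustrative.

```python
# Observed counts: rows are 1st, 2nd, 3rd class; columns are survived, did not survive.
observed = [[201, 123], [118, 166], [181, 528]]
labels = ["1st class", "2nd class", "3rd class"]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under no association: (row total) x (column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(labels, expected):
    print(f"{label:<10} " + " ".join(f"{e:8.2f}" for e in row))
```

Each row of expected counts keeps the same row total as the observed table; only the split between the two columns changes.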
Where does all of this go?
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
1. Explain why the calculation of the chi-square
statistic makes sense as a way to quantify whether
the two variables in a table are associated.
2. Do you think a large or small chi-square value
would indicate an association between the two
categorical variables? Explain.
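A Python sketch of the statistic for the class-by-survival table. Two pieces go beyond the slide and are assumptions of this sketch: the degrees of freedom (r − 1)(c − 1), and a p-value from the closed form e^(−x/2), which holds only because this 3 × 2 table has df = 2. For tables of other shapes, a statistics library (for example SciPy's `scipy.stats.chi2_contingency`) would be the usual route.

```python
import math

# Observed counts: rows are 1st, 2nd, 3rd class; columns are survived, did not survive.
observed = [[201, 123], [118, 166], [181, 528]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - exp) ** 2 / exp

# Degrees of freedom for an r x c table: (r - 1)(c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; exp(-x/2) is exact ONLY for df = 2.
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.1f}, df = {df}, p = {p_value:.1e}")
```

The statistic comes out near 128, far above 2 (the mean of a chi-square distribution with df = 2), so the observed counts sit a long way from what no association would predict.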
Where does all of this go?
CCSS.MATH.CONTENT.HSS.CP.A.3
Understand the conditional probability of A given B as P(A and B)/P(B),
and interpret independence of A and B as saying that the conditional
probability of A given B is the same as the probability of A, and the
conditional probability of B given A is the same as the probability of B.
CCSS.MATH.CONTENT.HSS.CP.B.6
Find the conditional probability of A given B as the fraction of B's
outcomes that also belong to A, and interpret the answer in terms of the
model.
CCSS.MATH.CONTENT.HSS.CP.B.7
Apply the Addition Rule, P(A or B) = P(A) + P(B) - P(A and B), and
interpret the answer in terms of the model.
CCSS.MATH.CONTENT.HSS.CP.B.8
Apply the general Multiplication Rule in a uniform probability model, P(A
and B) = P(A)P(B|A) = P(B)P(A|B), and interpret the answer in terms of
the model.
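The Addition and Multiplication Rules can be checked exactly against the Titanic passenger table, using fractions to avoid rounding. A Python sketch; the events F (passenger was in 1st class) and S (passenger survived) are illustrative labels, not part of the standards text.

```python
from fractions import Fraction

n = 1317                  # all passengers in the table
first = 324               # 1st class passengers (event F)
survived = 500            # survivors (event S)
first_and_survived = 201  # 1st class passengers who survived

p_f = Fraction(first, n)
p_s = Fraction(survived, n)
p_f_and_s = Fraction(first_and_survived, n)

# Addition Rule: P(F or S) = P(F) + P(S) - P(F and S).
p_f_or_s = p_f + p_s - p_f_and_s
# Direct count: every 1st class passenger plus the 2nd and 3rd class survivors.
assert p_f_or_s == Fraction(first + 118 + 181, n)

# Multiplication Rule: P(F and S) = P(F) P(S | F), where P(S | F) = 201/324.
p_s_given_f = Fraction(first_and_survived, first)
assert p_f * p_s_given_f == p_f_and_s

print(p_f_or_s, float(p_f_or_s))
```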
Where does all of this go?
Conditional Probability – The Power of Two-way Tables
1. If one of the passengers is randomly selected, what
is the probability that this passenger was in first
class? In third class?
2. If one of the passengers is randomly selected, what
is the probability that this passenger was in
first class and survived?
3. If one of the passengers who survived is randomly
selected, what is the probability that this passenger
was in third class?
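Each question reads straight off the table with totals: a marginal probability uses a row total, a joint probability uses a single cell, and a conditional probability shrinks the denominator to the given group. A Python sketch; the dictionary keys and variable names are illustrative.

```python
# (survived, did not survive) for each class, from the passenger table.
counts = {"1st": (201, 123), "2nd": (118, 166), "3rd": (181, 528)}

n = sum(s + d for s, d in counts.values())           # all passengers
total_survived = sum(s for s, _ in counts.values())  # all survivors

# 1. Marginal probabilities: row total / n.
p_first = sum(counts["1st"]) / n
p_third = sum(counts["3rd"]) / n

# 2. Joint probability: a single cell / n.
p_first_and_survived = counts["1st"][0] / n

# 3. Conditional probability: restrict to survivors, so divide by that column total.
p_third_given_survived = counts["3rd"][0] / total_survived

for name, p in [("P(1st)", p_first), ("P(3rd)", p_third),
                ("P(1st and survived)", p_first_and_survived),
                ("P(3rd | survived)", p_third_given_survived)]:
    print(f"{name:<22} {p:.3f}")
```

The third answer agrees with the CP.A.3 form P(A and B)/P(B): (181/1317)/(500/1317) = 181/500.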
Have You Seen a Probability
Problem like this one?...
The probability that a person has a certain
virus is 0.005. A test used to detect the virus in
a person is positive 80% of the time if the
person has the virus and 5% of the time if the
person does not have the virus. Let A be the
event that “the person is infected” and B be the
event “the person tests positive”. If a person
tests positive, what is the probability that the
person has the virus?
...and solved this way?
P(A) = 1/200 = 0.005
P(not A) = 0.995
P(B | A) = 0.80
P(B | not A) = 0.05
then we use:
$$P(A \mid B) = \frac{P(A) \times P(B \mid A)}{P(A) \times P(B \mid A) + P(\text{not } A) \times P(B \mid \text{not } A)}$$
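The formula translates directly into a small function. A Python sketch; the function name `posterior` and its parameter names are illustrative.

```python
def posterior(prior, p_pos_given_infected, p_pos_given_not_infected):
    """Bayes' rule: P(A | B), where A = infected and B = tests positive."""
    numerator = prior * p_pos_given_infected                # P(A) * P(B | A)
    false_pos = (1 - prior) * p_pos_given_not_infected      # P(not A) * P(B | not A)
    return numerator / (numerator + false_pos)


p = posterior(0.005, 0.80, 0.05)
print(f"P(A | B) = {p:.4f}")
```

With the numbers from the problem this gives about 0.0744: fewer than 1 in 13 people who test positive actually have the virus, because false positives from the large uninfected group outnumber the true positives.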
Why not solve it like this?...
Use a contrived frequency table
        Has virus   Does not have virus   TOTAL
Test +          4                    50      54
Test -          1                   945     946
TOTAL           5                   995    1000
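The frequency-table approach in Python: pick a convenient population (1000, as in the table), turn each probability into a count, then read P(A | B) as infected positives over all positives. Note that 5% of 995 is 49.75, which the table rounds to 50, so 4/54 ≈ 0.0741 is a close approximation of the exact Bayes' rule answer of about 0.0744. Variable names are illustrative.

```python
population = 1000                       # size of the contrived table
infected = population * 0.005           # 5 people have the virus
not_infected = population - infected    # 995 do not

true_positives = infected * 0.80        # 4 infected people test positive
false_positives = not_infected * 0.05   # 49.75 uninfected people test positive

# Among everyone who tests positive, the share who are infected.
p_exact = true_positives / (true_positives + false_positives)

# The table rounds 49.75 up to 50, giving 4 out of 54.
p_table = 4 / 54

print(f"exact: {p_exact:.4f}, from rounded table: {p_table:.4f}")
```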