Download Female says Yes - Duke Statistical

STAT 101
Dr. Kari Lock Morgan
Synthesis
Big Picture Essential Synthesis
• Review
• Speed Dating
Statistics: Unlocking the Power of Data
Lock5
Final
• Monday, April 28th, 2 – 5pm
• No make-ups, no excuses
• 30% of your course grade
• Cumulative from the entire course
• Closed book: you may bring only a calculator and 3 double-sided
pages of notes prepared only by you
Help Before Final
• Wednesday, 4/23:
   • 3 – 4pm, Prof Morgan, Old Chem 216
   • 4 – 9pm, Stat Ed Help, Old Chem 211A
• Thursday, 4/24:
   • 5 – 7pm, Yating, Old Chem 211A
   • 4 – 9pm, Stat Ed Help, Old Chem 211A
• Friday, 4/25:
   • 1 – 3pm, Prof Morgan, Old Chem 216
   • 3 – 4pm, REVIEW SESSION, room tbd
• Sunday, 4/27:
   • 4 – 6pm, Tori, Old Chem 211A
   • 6 – 7pm, Stat Ed Help, Old Chem 211A
   • 7 – 9pm, David, Old Chem 211A
• Monday, 4/28:
   • 12:30 – 1:30, Prof Morgan, Old Chem 216
Review
What is Bayes Rule?
a) A way of getting from P(A if B) to P(B if A)
b) A way of calculating P(A and B)
c) A way of calculating P(A or B)
Data Collection
• The way the data are/were collected
determines the scope of inference
• For generalizing to the population: was it a
random sample? Was there sampling bias?
• For assessing causality: was it a randomized
experiment?
• Collecting good data is crucial to making
good inferences based on the data
Exploratory Data Analysis
• Before doing inference, always explore your
data with descriptive statistics
• Always visualize your data! Visualize your
variables and relationships between variables
• Calculate summary statistics for variables
and relationships between variables – these
will be key for later inference
• The type of visualization and summary
statistics depends on whether the variable(s)
are categorical or quantitative
Estimation
• For good estimation, provide not just a point
estimate, but an interval estimate which takes into
account the uncertainty of the statistic
• Confidence intervals are designed to capture the true
parameter for a specified proportion of all samples
• A P% confidence interval can be created by
• bootstrapping (sampling with replacement from the
sample) and using the middle P% of bootstrap statistics
• $\text{statistic} \pm z^* \cdot SE$
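As a rough illustration of the two approaches above, the sketch below bootstraps a proportion and then builds both a percentile interval and a statistic ± z*·SE interval. The sample (63 successes out of 276) is taken from the speed dating slides in this transcript; the function name `bootstrap_stats`, the seed, and the number of resamples are illustrative assumptions, and z* = 1.96 is the usual normal multiplier for 95%.

```python
import random
import statistics

def bootstrap_stats(sample, stat, reps=2000, seed=1):
    """Return sorted bootstrap statistics (resampling with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))

# 63 successful dates out of 276, coded as 1 = success, 0 = not
sample = [1] * 63 + [0] * 213
p_hat = statistics.mean(sample)

boot = bootstrap_stats(sample, statistics.mean)

# Percentile method: middle 95% of the bootstrap statistics
reps = len(boot)
percentile_ci = (boot[round(0.025 * reps)], boot[round(0.975 * reps) - 1])

# Formula method: statistic +/- z* x SE, with SE estimated from the bootstrap
se = statistics.stdev(boot)
z_star = 1.96  # normal multiplier for 95% confidence
formula_ci = (p_hat - z_star * se, p_hat + z_star * se)

print("percentile:", percentile_ci)
print("formula:   ", formula_ci)
```

With a reasonably symmetric bootstrap distribution, the two intervals should come out close to each other.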
Hypothesis Testing
• A p-value is the probability of getting a
statistic as extreme as observed, if H0 is true
• The p-value measures the strength of the
evidence the data provide against H0
• “If the p-value is low, the H0 must go”
• If the p-value is not low, then you cannot
reject H0 and the test is inconclusive
p-value
• A p-value can be calculated by
• A randomization test: simulate statistics
assuming H0 is true, and see what
proportion of simulated statistics are as
extreme as that observed
• Calculating a test statistic and comparing
that to a theoretical reference distribution
(normal, t, $\chi^2$, F)
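A randomization test can be sketched in a few lines. The example below assumes a two-sided test for a difference in proportions and uses the yes/no counts by gender from the Pickiness and Gender table in this transcript (146 of 276 males and 127 of 276 females said yes). The function name, seed, and number of simulations are illustrative choices, not part of the course material.

```python
import random

def randomization_p_value(yes_a, n_a, yes_b, n_b, reps=2000, seed=1):
    """Two-sided randomization p-value for a difference in proportions."""
    rng = random.Random(seed)
    total_yes = yes_a + yes_b
    # Under H0 the group labels do not matter, so pool all responses
    pooled = [1] * total_yes + [0] * (n_a + n_b - total_yes)
    # yes_a*n_b - yes_b*n_a is proportional to p_a - p_b (integer arithmetic)
    observed = abs(yes_a * n_b - yes_b * n_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)             # randomly reassign responses to groups
        sim_a = sum(pooled[:n_a])       # simulated yes count in group A
        sim_b = total_yes - sim_a       # the rest fall in group B
        if abs(sim_a * n_b - sim_b * n_a) >= observed:
            extreme += 1
    return extreme / reps

p_value = randomization_p_value(146, 276, 127, 276)
print(p_value)
```

The p-value is the proportion of shuffled datasets whose difference is at least as extreme as the observed one.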
Hypothesis Tests
Variables                           Appropriate Test
One Quantitative                    Single mean (t)
One Categorical                     Single proportion (normal)
                                    Chi-square Goodness of Fit
Two Categorical                     Difference in proportions (normal)
                                    Chi-square Test for Association
One Quantitative, One Categorical   Difference in means (t)
                                    Matched pairs (t)
                                    ANOVA (F)
Two Quantitative                    Correlation (t)
                                    Slope in Simple Linear Regression (t)
More than two                       Multiple Regression (t, F)
Regression
• Regression is a way to predict one response
variable with multiple explanatory variables
• Regression fits the coefficients of the model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon_i$
• The model can be used to
• Analyze relationships between the explanatory
variables and the response
• Predict Y based on the explanatory variables
• Adjust for confounding variables
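To make "fitting the coefficients" concrete, here is a minimal least-squares sketch that solves the normal equations with plain Python. The data are hypothetical: the two explanatory ratings and the response are invented for illustration and generated without noise from known coefficients, so the fit should recover them. The helper name `fit_ols` is also an assumption; in practice statistical software does this step.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # add an intercept column
    p = len(X[0])
    # Build the augmented system (X'X | X'y)
    M = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

# Hypothetical ratings (x1, x2) for six dates; NOT the course data
rows = [(5, 6), (7, 4), (8, 8), (3, 5), (6, 9), (9, 7)]
# Response generated exactly from y = 1 + 0.5*x1 + 0.3*x2 (no error term)
y = [1 + 0.5 * x1 + 0.3 * x2 for x1, x2 in rows]

coefs = fit_ols(rows, y)
print([round(c, 3) for c in coefs])
```

Each fitted slope is interpreted as the change in the predicted response for a one-unit change in that variable, holding the other explanatory variables fixed.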
Probability
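$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
$P(A \text{ and } B) = P(A \text{ if } B)\,P(B)$
$P(\text{not } A) = 1 - P(A)$
$P(A \text{ if } B) = \dfrac{P(A \text{ and } B)}{P(B)} = \dfrac{P(B \text{ if } A)\,P(A)}{P(B)}$
$P(A) = P(A \text{ and } B) + P(A \text{ and not } B)$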
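These rules can be checked numerically. The sketch below uses the counts from the Reciprocity table in this transcript (63 dates where both said yes, 64 where only the female said yes, 83 where only the male said yes, 66 where neither did) and exact fractions. The variable names are illustrative.

```python
from fractions import Fraction

# Counts from the Reciprocity table (276 speed dates in total)
both_yes, female_only, male_only, neither = 63, 64, 83, 66
n = both_yes + female_only + male_only + neither

p_male = Fraction(both_yes + male_only, n)      # P(male says yes)
p_female = Fraction(both_yes + female_only, n)  # P(female says yes)
p_both = Fraction(both_yes, n)                  # P(male yes and female yes)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either = p_male + p_female - p_both

# Conditional probability: P(female yes if male yes)
p_female_if_male = p_both / p_male

# Bayes rule: reverse the conditioning
p_male_if_female = p_female_if_male * p_male / p_female

# Total probability: P(A) = P(A and B) + P(A and not B)
p_male_total = p_both + Fraction(male_only, n)

print(p_either, p_female_if_male, p_male_if_female)
```

The Bayes rule result agrees with reading the conditional probability straight off the table, 63 out of the 127 dates where the female said yes.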
Romance
• What variables help to predict romantic
interest?
• Do these variables differ for males and females?
• All we need to figure this out is DATA!
(For all of you, being almost done with STAT 101,
this is the case for many interesting questions!)
Speed Dating
• We will use data from speed dating conducted
at Columbia University, 2002-2004
• 276 males and 276 females from Columbia’s
various graduate and professional schools
• Each person met with 10-20 people of the
opposite sex for 4 minutes each
• After each encounter each person said either
“yes” (they would like to be put in touch with
that partner) or “no”
Speed Dating Data
What are the cases?
a) Students participating in speed dating
b) Speed dates
c) Ratings of each student
Speed Dating
What is the population?
• Ideal population?
• More realistic population?
Speed Dating
It is randomly determined who the students will
be paired with for the speed dates.
We find that people are significantly more likely
to say “yes” to people they think are more
intelligent.
Can we infer causality between perceived
intelligence and wanting a second date?
a) Yes
b) No
Successful Speed Date?
What is the probability that a speed date is
successful (results in both people wanting a
second date)?
To best answer this question, we should use
a) Descriptive statistics
b) Confidence Interval
c) Hypothesis Test
d) Regression
e) Bayes Rule
Successful Speed Date?
63 of the 276 speed dates were deemed
successful (both male and female said yes).
A 95% confidence interval for the true
proportion of successful speed dates is
a) (0.2, 0.3)
b) (0.18, 0.28)
c) (0.21, 0.25)
d) (0.13, 0.33)
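One way to check the choices is the normal-based formula statistic ± z*·SE. The sketch below assumes the standard error of a sample proportion, sqrt(p̂(1 − p̂)/n), and z* = 1.96 for 95% confidence; the variable names are illustrative.

```python
import math

successes, n = 63, 276          # successful dates out of all speed dates
p_hat = successes / n           # sample proportion

# Standard error of a sample proportion (normal approximation)
se = math.sqrt(p_hat * (1 - p_hat) / n)

z_star = 1.96                   # normal multiplier for 95% confidence
lower = p_hat - z_star * se
upper = p_hat + z_star * se

print(round(lower, 2), round(upper, 2))
```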
Pickiness and Gender
Are males or females more picky when it comes
to saying yes?
Guesses?
a) Males
b) Females
Pickiness and Gender
          Yes   No
Males     146   130
Females   127   149
Are males or females more picky when it comes
to saying yes? How could you answer this?
a) Test for a single proportion
b) Test for a difference in proportions
c) Chi-square test for association
d) ANOVA
e) Either (b) or (c)
Pickiness and Gender
Do males and females differ in their pickiness?
Using α = 0.05, how would you answer this?
a) Yes
b) No
c) Not enough information
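A sketch of one way to carry out this test: a two-sided normal test for a difference in proportions, using the counts from the Pickiness and Gender table in this transcript. It assumes the pooled-proportion standard error under H0; the variable names are illustrative.

```python
import math

yes_m, n_m = 146, 276   # males who said yes, out of all males' dates
yes_f, n_f = 127, 276   # females who said yes, out of all females' dates

p_m, p_f = yes_m / n_m, yes_f / n_f
p_pool = (yes_m + yes_f) / (n_m + n_f)   # pooled proportion under H0

# Standard error of the difference, assuming H0: p_m = p_f
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_m + 1 / n_f))
z = (p_m - p_f) / se

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05
print(round(z, 2), round(p_value, 3), "reject H0" if p_value < alpha else "do not reject H0")
```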
Reciprocity
                  Male says Yes   Male says No
Female says Yes   63              64
Female says No    83              66
Are people more likely to say yes to someone who
says yes back? How would you best answer this?
a) Descriptive statistics
b) Confidence Interval
c) Hypothesis Test
d) Regression
e) Bayes Rule
Reciprocity
                  Male says Yes   Male says No
Female says Yes   63              64
Female says No    83              66
Are people more likely to say yes to someone
who says yes back? How could you answer this?
a) Test for a single proportion
b) Test for a difference in proportions
c) Chi-square test for association
d) ANOVA
e) Either (b) or (c)
Reciprocity
• Are people more likely to say yes to someone
who says yes back?
• p-value = 0.3731
• Based on these data, we cannot determine
whether people are more likely to say yes
to someone who says yes back.
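The slide does not say which test produced this p-value. As a sketch, a chi-square test for association on the Reciprocity table gives a value that rounds to 0.3731 when the Yates continuity correction is applied (an assumption about the method, chosen because it matches the reported number); without the correction the p-value is somewhat smaller. The function name is illustrative.

```python
import math

def chi_square_2x2(table, yates=True):
    """Chi-square statistic and p-value (1 df) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed - expected)
            if yates:
                # Yates continuity correction: shrink each deviation by 0.5
                deviation = max(deviation - 0.5, 0.0)
            chi2 += deviation ** 2 / expected
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: female says yes / no; columns: male says yes / no
table = [[63, 64], [83, 66]]
chi2_corr, p_corr = chi_square_2x2(table, yates=True)
chi2_raw, p_raw = chi_square_2x2(table, yates=False)

print(round(p_corr, 4), round(p_raw, 4))
```

Either way the p-value is well above 0.05, consistent with the inconclusive result stated above.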
Race and Response: Females
Does the chance of females saying yes to males
differ by race?
Asian   Black   Caucasian   Latino   Other
0.50    0.57    0.42        0.48     0.53
How could you answer this question?
a) Test for a single proportion
b) Test for a difference in proportions
c) Chi-square goodness of fit
d) Chi-square test for association
e) ANOVA
Race and Response: Males
Each person rated their date on a scale of 1-10
based on how much they liked them overall.
Does how much males like females differ by race?
How would you test this?
a) Chi-square test
b) t-test for a difference in means
c) Matched pairs test
d) ANOVA
e) Either (b) or (d)
Physical Attractiveness
Each person also rated their date from 1-10 on
physical attractiveness. Do males rate
females higher, or do females rate males higher?
Which tool would you use to answer this
question?
a) Two-sample difference in means
b) Matched pair difference in means
c) Chi-Square
d) ANOVA
e) Correlation
Physical Attractiveness
$\bar{x}_M - \bar{x}_F = 0.406$
95% CI: (0.10, 0.71)
p-value = 0.01
The histogram shown is of the
a) data
b) bootstrap distribution
c) randomization distribution
d) sampling distribution
Other Ratings
• Each person also rated their date from 1-10 on
the following attributes:
   • Attractiveness
   • Sincerity
   • Intelligence
   • How fun the person seems
   • Ambition
   • Shared interests
• Which of these best predict how much
someone will like their date?
Multiple Regression
MALES RATING FEMALES:
FEMALES RATING MALES:
Ambition and Liking
Do people prefer their dates to be less ambitious???
How does the perceived ambition of a date relate to
how much the date is liked?
How would you answer this question?
a) Inference for difference in means
b) ANOVA
c) Inference for correlation
d) Inference for simple linear regression
e) Either (b), (c) or (d)
Simple Linear Regression
MALES RATING FEMALES:
FEMALES RATING MALES:
Ambition and Liking
$r = 0.44$, SE = 0.05
$\hat{\beta}_1 = 0.28$, SE = 0.06
Find a 95% CI for $\rho$.
Test whether $\beta_1$ differs from 0.
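Both tasks follow the statistic ± multiplier·SE and statistic/SE patterns. The sketch below assumes the interval asked for is for the population correlation, and it uses the normal multiplier 1.96 and a normal approximation for the p-value because the slide does not give the degrees of freedom needed for the exact t distribution. The variable names are illustrative.

```python
import math

r, se_r = 0.44, 0.05      # sample correlation and its standard error
b1, se_b1 = 0.28, 0.06    # sample slope and its standard error

# 95% interval: statistic +/- multiplier x SE
# (1.96 is the normal multiplier; a t* with the right df would be slightly larger)
multiplier = 1.96
ci_corr = (r - multiplier * se_r, r + multiplier * se_r)

# Test statistic for H0: slope = 0 is (statistic - null value) / SE
t_stat = (b1 - 0) / se_b1

# Approximate two-sided p-value using the normal distribution
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(tuple(round(x, 3) for x in ci_corr), round(t_stat, 2), p_approx)
```

A test statistic this far from 0 gives a very small p-value under either the normal or the t reference distribution.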
After taking STAT 101:
• If you have a question that needs answering…
Thank You!!!