lab5 - Personal.psu.edu

• 1 in 8 women (12.5%) get breast cancer, so P(breast cancer if female) = 0.125
• 1 in 800 men (0.125%) get breast cancer, so P(breast cancer if male) = 0.00125
Statistics: Unlocking the Power of Data
Lock5
Two-Way Table
• Create a two-way table with 1000 females and 1000 males.

Gender/Cancer   Breast Cancer          No Breast Cancer   Total
Female          0.125*1000 = 125       875                1000
Male            0.00125*1000 = 1.25    998.75             1000
Total           126.25                 1873.75            2000
• What’s the overall (unconditional) probability of breast cancer?
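A minimal Python sketch of how the table and the overall probability follow from the two rates, assuming only the rates (1/8 and 1/800) and the 1000-per-group sizes stated on the slide. The helper name `two_way_table` is hypothetical; `Fraction` is used so that the expected count of 1.25 men stays exact.

```python
from fractions import Fraction

# Assumed inputs, taken from the slide: P(breast cancer if female) = 1/8,
# P(breast cancer if male) = 1/800, and 1000 people in each group.
RATES = {"Female": Fraction(1, 8), "Male": Fraction(1, 800)}
N_PER_GROUP = 1000


def two_way_table(rates, n_per_group):
    """Expected counts per group as (breast cancer, no breast cancer)."""
    return {group: (p * n_per_group, (1 - p) * n_per_group) for group, p in rates.items()}


table = two_way_table(RATES, N_PER_GROUP)
total_cancer = sum(cancer for cancer, _ in table.values())
total_no_cancer = sum(no_cancer for _, no_cancer in table.values())
grand_total = total_cancer + total_no_cancer

# Overall (unconditional) probability: all cancer cases / everyone in the table.
p_cancer = total_cancer / grand_total

for group, (cancer, no_cancer) in table.items():
    print(f"{group:<8}{float(cancer):>10}{float(no_cancer):>12}{float(cancer + no_cancer):>10}")
print(f"{'Total':<8}{float(total_cancer):>10}{float(total_no_cancer):>12}{float(grand_total):>10}")
print("P(breast cancer) =", float(p_cancer))  # 126.25 / 2000 = 0.063125
```

The overall probability ignores gender entirely: it is the Total row's breast cancer count divided by the grand total.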
Conditional Probability
Gender/Cancer   Breast Cancer   No Breast Cancer   Total
Female          125             875                1000
Male            1.25            998.75             1000
Total           126.25          1873.75            2000
• What’s P(breast cancer if female)?
• What’s P(female if breast cancer)?
• P(A if B) is NOT the same as P(B if A)!!!
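A minimal sketch of the two conditional probabilities, computed from the counts in the Conditional Probability table. `row_total` and `column_total` are hypothetical helper names: conditioning on gender restricts attention to one row, while conditioning on breast cancer restricts attention to one column.

```python
from fractions import Fraction

# Cell counts from the Conditional Probability table (1000 females, 1000 males).
COUNTS = {
    "Female": {"Breast Cancer": Fraction(125), "No Breast Cancer": Fraction(875)},
    "Male": {"Breast Cancer": Fraction(5, 4), "No Breast Cancer": Fraction(3995, 4)},  # 1.25, 998.75
}


def row_total(gender):
    """Total number of people of this gender (a row total)."""
    return sum(COUNTS[gender].values())


def column_total(outcome):
    """Total number of people with this outcome (a column total)."""
    return sum(row[outcome] for row in COUNTS.values())


# P(breast cancer if female): restrict to the Female row.
p_cancer_if_female = COUNTS["Female"]["Breast Cancer"] / row_total("Female")

# P(female if breast cancer): restrict to the Breast Cancer column.
p_female_if_cancer = COUNTS["Female"]["Breast Cancer"] / column_total("Breast Cancer")

print(float(p_cancer_if_female))            # 0.125
print(round(float(p_female_if_cancer), 4))  # 0.9901
```

Both answers use the same cell (125), but divide it by different totals: 1000 females in the first case and 126.25 breast cancer cases in the second, which is why P(A if B) and P(B if A) differ so much here.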
Odds Ratio
• The odds ratio (OR) is the ratio of the odds of an event in one group to the odds of the same event in another group.
• Odds ratio for breast cancer comparing females to males:
$$\text{OR} = \frac{\text{odds of getting breast cancer for females}}{\text{odds of getting breast cancer for males}}$$
Odds Ratio
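$$
\begin{aligned}
\text{OR} &= \frac{\text{odds of getting breast cancer for females}}{\text{odds of getting breast cancer for males}}
= \frac{P(\text{breast cancer if female}) \,/\, \bigl(1 - P(\text{breast cancer if female})\bigr)}{P(\text{breast cancer if male}) \,/\, \bigl(1 - P(\text{breast cancer if male})\bigr)} \\
&= \frac{(1/8) \,/\, (1 - 1/8)}{(1/800) \,/\, (1 - 1/800)} = \frac{1/7}{1/799} = \frac{799}{7} \approx 114.14
\end{aligned}
$$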
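A minimal sketch of the odds and odds ratio calculation above, using exact fractions. `odds` and `odds_ratio` are hypothetical helper names, not functions from any library; the sketch also checks that the same odds can be read directly from the two-way table as (cancer count) / (no-cancer count).

```python
from fractions import Fraction


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_ratio(p_group1, p_group2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    return odds(p_group1) / odds(p_group2)


p_female = Fraction(1, 8)    # P(breast cancer if female)
p_male = Fraction(1, 800)    # P(breast cancer if male)

# Odds straight from the two-way table: cancer count / no-cancer count.
odds_female_from_table = Fraction(125) / Fraction(875)
odds_male_from_table = Fraction(5, 4) / Fraction(3995, 4)  # 1.25 / 998.75

or_female_vs_male = odds_ratio(p_female, p_male)
print(odds(p_female), odds(p_male))                           # 1/7 1/799
print(or_female_vs_male, round(float(or_female_vs_male), 2))  # 799/7 114.14
```

Swapping the groups gives the reciprocal, `odds_ratio(p_male, p_female)` = 7/799 (about 0.0088), so an odds ratio is only meaningful once the order of comparison is stated.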
Unit A
Essential Synthesis
The Big Picture
[Figure: diagram relating Population, Sampling, Sample, Descriptive statistics, and Statistical Inference]
Chapter 1: Data Collection
Was the sample randomly selected?
   Yes: possible to generalize to the population
   No: should not generalize to the population
Was the explanatory variable randomly assigned?
   Yes: possible to make conclusions about causality
   No: cannot make conclusions about causality
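The two data-collection questions can be read as a simple yes/no decision rule. The sketch below encodes them in a hypothetical Python function `allowed_conclusions`; it only restates the flowchart and assumes each question has a plain yes/no answer.

```python
from typing import List


def allowed_conclusions(randomly_sampled: bool, randomly_assigned: bool) -> List[str]:
    """Return the conclusions the Chapter 1 flowchart permits."""
    conclusions = []
    # Question 1: was the sample randomly selected?
    if randomly_sampled:
        conclusions.append("possible to generalize to the population")
    else:
        conclusions.append("should not generalize to the population")
    # Question 2: was the explanatory variable randomly assigned?
    if randomly_assigned:
        conclusions.append("possible to make conclusions about causality")
    else:
        conclusions.append("cannot make conclusions about causality")
    return conclusions


# Example: a random sample in which the explanatory variable was not
# randomly assigned (an observational study).
for line in allowed_conclusions(randomly_sampled=True, randomly_assigned=False):
    print(line)
```

The two questions are independent: random sampling governs generalization, and random assignment governs causality, so each combination of answers yields one conclusion of each kind.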
Chapter 2: Descriptive Statistics
• The choice of summary statistics and visualization methods depends on the type of variable(s) being analyzed (categorical or quantitative).
Categorical
   Visualization: bar chart, pie chart
   Summary statistics: frequency table, relative frequency table, proportion, odds
Quantitative
   Visualization: dotplot, histogram, boxplot
   Summary statistics: mean, median, max, min, standard deviation, z-score, range, IQR, five number summary
Categorical vs Categorical
   Visualization: side-by-side bar chart, segmented bar chart
   Summary statistics: two-way table, difference in proportions, odds ratio
Quantitative vs Categorical
   Visualization: overlaid histograms, parallel dotplots, side-by-side boxplots
   Summary statistics: statistics by group, difference in means
Quantitative vs Quantitative
   Visualization: scatterplot
   Summary statistics: correlation
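A minimal sketch of the summary statistics for a single quantitative variable, using Python's standard `statistics` module. The helper names `quantitative_summary` and `z_score` and the sample values are hypothetical, made up for illustration. Quartile conventions differ between textbooks and software; this sketch assumes `method="inclusive"`, so its Q1, Q3, and IQR may not match other tools exactly.

```python
import statistics


def quantitative_summary(values):
    """Summaries for one quantitative variable (sample standard deviation, n - 1)."""
    data = sorted(values)
    # Assumed quartile convention: statistics.quantiles with method="inclusive".
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    median = statistics.median(data)
    return {
        "mean": statistics.mean(data),
        "median": median,
        "min": data[0],
        "max": data[-1],
        "sd": statistics.stdev(data),
        "range": data[-1] - data[0],
        "IQR": q3 - q1,
        "five_number_summary": (data[0], q1, median, q3, data[-1]),
    }


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical data, made up for illustration only.
sample = [4, 1, 7, 3, 9, 2, 8, 5, 6]
summary = quantitative_summary(sample)
print(summary["five_number_summary"])                         # (1, 3.0, 5, 7.0, 9)
print(round(summary["sd"], 3))                                # 2.739
print(round(z_score(9, summary["mean"], summary["sd"]), 2))   # 1.46
```

The range and IQR both measure spread, but the IQR uses only the middle half of the data, so it is less affected by a single extreme value.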
Descriptive Statistics
Think of a topic or question you would like to use data to help you answer.
• What would the cases be?
• What would the variables be? (Limit to one or two variables.)
Descriptive Statistics
How would you visualize and summarize the variable or relationship between variables?
a) bar chart/pie chart, proportions, frequency table/relative frequency table, odds
b) dotplot/histogram/boxplot, mean/median, sd/range/IQR, five number summary
c) side-by-side or segmented bar charts, difference in proportions, two-way table, odds ratio, conditionals
d) side-by-side boxplot, difference in means
e) scatterplot, correlation
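The choice among options (a) through (e) depends only on how many variables there are and whether each is categorical or quantitative. A hypothetical helper `suggest_summary` sketches that lookup; it mirrors the Chapter 2 table and assumes each variable has already been classified as categorical or quantitative.

```python
from typing import Sequence, Tuple

# Options (a)-(e) from the slide, keyed by the sorted variable types.
OPTIONS = {
    ("categorical",): ("a", "bar chart/pie chart, proportions, frequency table/relative frequency table, odds"),
    ("quantitative",): ("b", "dotplot/histogram/boxplot, mean/median, sd/range/IQR, five number summary"),
    ("categorical", "categorical"): ("c", "side-by-side or segmented bar charts, difference in proportions, "
                                          "two-way table, odds ratio, conditionals"),
    ("categorical", "quantitative"): ("d", "side-by-side boxplot, difference in means"),
    ("quantitative", "quantitative"): ("e", "scatterplot, correlation"),
}


def suggest_summary(variable_types: Sequence[str]) -> Tuple[str, str]:
    """Return (option letter, methods) for one or two variables.

    Each entry must be "categorical" or "quantitative"; sorting the types
    makes the order of the two variables irrelevant.
    """
    key = tuple(sorted(t.strip().lower() for t in variable_types))
    if key not in OPTIONS:
        raise ValueError(f"expected one or two categorical/quantitative variables, got {variable_types!r}")
    return OPTIONS[key]


# Example: gender vs breast cancer status, both categorical, as in the two-way table.
print(suggest_summary(["categorical", "categorical"]))
```

Sorting the key means a question phrased as "quantitative vs categorical" and one phrased as "categorical vs quantitative" both map to option (d).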