STAT 100
Review Section
Week 1 & 2
Winnie Wu
[email protected]
Office Hours: By appointment
Location: Online or on campus
Key Topics
• Association vs. Causation
• Design of Experiments and Survey Methods
• Randomization
• Random Sampling: types of sampling bias
Causation vs. Association
For each example, state whether you believe
there is causation, or whether some
confounding/lurking variable explains the association
a) Smokers have higher rates of lung cancer.
• Heavy coffee drinking is associated with
higher rates of smoking.
• Heavy alcohol consumption is associated
with higher rates of smoking.
b) Couples that live together before being
married are more likely to get divorced later.
• “Couples who are more confident about their
relationship are more likely to get married
straight away. Hence, more stable couples are
less likely to live together before marriage
than less stable couples. Living together per
se is not the problem. The real problem is
that a deeper source of instability is correlated
with cohabitation.”
• “More religious couples are less likely to get
divorced and less likely to live together before
marriage.”
c) Dog owners live longer.
• It turned out that walking with a dog gave
seniors a boost in parasympathetic nervous
system activity, which is good because the
parasympathetic nervous system helps calm
and rest the body.
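The idea of a lurking variable in the examples above can be illustrated with a small simulation. This is only a sketch with made-up numbers, not part of the original section: the variables z, x and y are hypothetical, with z playing the role of the lurking variable that drives both x and y, while x and y have no direct effect on each other.

```r
# Hypothetical simulation: z is a lurking variable that drives both x and y.
set.seed(1)                 # arbitrary seed, used only for reproducibility
n <- 5000                   # made-up sample size
z <- rnorm(n)               # lurking (confounding) variable
x <- z + rnorm(n)           # depends on z, not on y
y <- z + rnorm(n)           # depends on z, not on x

# x and y are clearly associated even though neither causes the other
r_xy <- cor(x, y)

# After adjusting for z, the coefficient of x should be close to 0
b_x <- unname(coef(lm(y ~ x + z))["x"])

print(c(correlation_xy = r_xy, x_coefficient_given_z = b_x))
```

In this setup the association between x and y comes entirely from z, which is the pattern to look for in each example before concluding causation.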
Sample Survey
In 1987, Shere Hite authored a book entitled
Women and Love: A Cultural Revolution in
Progress (http://www.amazon.com/WomenLove-Cultural-RevolutionProgress/dp/0394530527), which reported some
very captivating survey results on women's
intimacy and love relationships. She reported
the following:
• 84% of women are “not satisfied emotionally with
their relationships” (p. 804)
• 70% of all women “married five or more years are
having sex outside of their marriages” (p. 856)
• 95% of women “report forms of emotional and
psychological harassment from men with whom
they are in love relationships” (p. 810)
• 84% of women report forms of condescension
from the men in their love relationships (p. 809)
Hite collected her sample by sending surveys to
100,000 women via mail. She mailed the
questionnaires to addresses collected from
mailing lists of groups of women professionals,
counseling centers, church societies and senior
citizen groups. She received about 4,500 surveys
in response.
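The response rate implied by these figures is a useful number to have in mind when thinking about non-response. A minimal sketch in R, using only the two counts given above (the variable names are illustrative):

```r
mailed   <- 100000   # questionnaires sent, from the text
returned <- 4500     # approximate number returned, from the text

response_rate <- returned / mailed
print(response_rate)   # 0.045, i.e. about 4.5% of the women responded
```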
Obviously, this is not an example of great survey
sampling. For Hite's data collection techniques,
give an example of each of the following:
• Selection Bias
• Response Bias
• Non-Response Bias
KEY TOPICS
• Descriptive statistics
Center: mean, median
Spread: SD, range, IQR (and outlier detection)
Percentiles
• Graphics (bar plots, histograms, boxplots, scatterplot)
• Other concepts
Shape: symmetric, skewed, bell-shaped
Resistance, Outliers
Z-scores
Mean, Definition
$$\bar{x} = \sum_{i=1}^{n} x_i / n$$
$$\sum_{i=1}^{24} x_i = 4507, \qquad \bar{x} = 4507 / 24 = 187.8 \text{ mg/dl}$$
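The same arithmetic can be checked in R. This is a sketch only: the sum and sample size come from the slide, but the raw cholesterol values are not given, so the short vector at the end is made up purely to show that mean() matches the definition.

```r
total <- 4507          # sum of the 24 cholesterol values, from the slide
n     <- 24            # number of observations, from the slide
x_bar <- total / n
print(round(x_bar, 1)) # 187.8

# With raw data in a vector, mean() applies the same formula.
# These three values are hypothetical, for illustration only.
x <- c(180, 195, 210)
print(mean(x))
print(sum(x) / length(x))
```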
Mean Advantages vs. Disadvantages
Advantages
• It is representative of all the points.
• If the underlying distribution is normal, then it is the
most efficient estimator of the middle of the distribution.
• Many statistical tests are based on the mean.
Disadvantages
• It is very sensitive to outliers, e.g., if one of
the cholesterol levels were 800 rather than
200 then the mean would be increased by
25 units.
• It is inappropriate if the underlying distribution is
far from being normal, for example, a distribution
which looks highly skewed.
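The outlier example above (800 instead of 200 raising the mean by 25 units) follows from (800 - 200) / 24 = 25. The sketch below reproduces it in R. The 24 values are hypothetical, since the original cholesterol data are not listed; only the sample size and the 200-to-800 swap come from the slide.

```r
# Hypothetical data: 24 cholesterol values (mg/dl); the first one is 200.
x <- c(200, seq(140, 228, by = 4))
stopifnot(length(x) == 24)

y <- x
y[1] <- 800                      # replace the 200 with an outlier of 800

mean_shift   <- mean(y) - mean(x)      # (800 - 200) / 24 = 25
median_shift <- median(y) - median(x)  # the median does not move for these data

print(c(mean_shift = mean_shift, median_shift = median_shift))
```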
Median
Advantages
• Always guarantees that 50% of the data
values are on either side of the median.
• Insensitive to outliers.
Disadvantages
• If the distribution really is normal, it is not
as efficient an estimator of the middle as the
mean, because it is mostly sensitive to the
middle of the distribution.
• Most statistical procedures are based on the
mean.
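The efficiency claim can be explored with a small simulation. This is a sketch under assumed settings (the seed, the number of repetitions, and the sample size are all made up): it draws many samples from a normal distribution and compares how much the sample mean and the sample median vary from sample to sample.

```r
set.seed(100)              # arbitrary seed for reproducibility
reps <- 2000               # number of simulated samples (made up)
n    <- 25                 # size of each sample (made up)

# Each row of the matrix is one sample from N(0, 1)
samples <- matrix(rnorm(reps * n, mean = 0, sd = 1), nrow = reps)
means   <- apply(samples, 1, mean)     # one sample mean per row
medians <- apply(samples, 1, median)   # one sample median per row

# A smaller spread means a more efficient estimator of the middle
sd_means   <- sd(means)
sd_medians <- sd(medians)
print(c(sd_of_means = sd_means, sd_of_medians = sd_medians))
```

For normal data the medians should come out more spread out than the means, which is what "less efficient" means here.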
Skewness
Figure: Box plot of cholesterol_before and cholesterol_after
SAMPLE QUESTION #1
Suppose we measure the amount of weight 5 Harvard Football players can bench-press, and we record the following observations (in pounds): 280, 250, 355, 275,
290.
a) What are the mean, median, and standard deviation for these observations?
Mean = 290
Median = 280
Standard Deviation = 39.21
b) What would happen to the mean, median and sd if another player (let’s say
the kicker) joined the study and lifted only 220 lbs?
The median would be (275 + 280)/2 = 277.5
The mean would be 278.33 and the standard deviation would be 45.24
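These answers can be checked in R. A minimal sketch using the observations from the question (the variable names are illustrative); note that sd() computes the sample standard deviation, dividing by n - 1.

```r
# Bench-press weights in pounds, from the question
lifts <- c(280, 250, 355, 275, 290)
print(c(mean = mean(lifts), median = median(lifts), sd = sd(lifts)))

# Part (b): add the kicker, who lifts 220 lbs
with_kicker <- c(lifts, 220)
print(c(mean   = mean(with_kicker),
        median = median(with_kicker),
        sd     = sd(with_kicker)))
```

The kicker's low value pulls the mean down by more than 11 pounds, while the median moves by only 2.5 pounds.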
SAMPLE QUESTION #2
Female Heights in US ~ N(μ = 63.8in, σ = 2.5in) [http://en.wikipedia.org/wiki/Human_height]
Male Heights in US ~ N(μ = 69.2in, σ = 2.8in)
[http://hypertextbook.com/facts/2007/SimasCeckauskas.shtml]
a) What is the probability that your male TF is only 68in tall or shorter?
Using the equation from class and the given information for male height in the US, we get:
z-score = (68 – 69.2)/2.8 = -1.2/2.8 = -0.4286
From the z table, a z-score of -0.43 corresponds to a 33.36% chance that the male TF is 68in or shorter.
b) How tall does your male TF have to be in order to be taller than 90% of the US population?
Here we are working backwards. From the table, we find that 90% corresponds to a z-score = 1.28.
Therefore, z-score = (x – 69.2) / 2.8 = 1.28
We can solve for x by rearranging the above equation:
x = (1.28 * 2.8) + 69.2 = 72.78 in
The male TF needs to be taller than 72.78 inches in order to be taller than 90% of the US population.
c) Shaquille O’Neal is 85 inches tall. What proportion of the US population is as tall as Shaq (or taller)?
Using the equation from class and the given information for male height in the US, we get:
z-score = (85 – 69.2)/2.8 = 15.8/2.8 = 5.643
From the z table, a z-score of 5.643 means that, using the male height distribution, less than 0.01% of the
population is as tall as Shaq or taller.
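The three parts can also be computed directly in R instead of reading a z table. This is a sketch, not the method used in class: pnorm() gives normal probabilities and qnorm() gives normal percentiles, and the variable names are illustrative. Results may differ slightly from the table answers because the table rounds the z-score to two decimals.

```r
mu    <- 69.2   # mean male height in inches, from the question
sigma <- 2.8    # standard deviation in inches, from the question

# (a) probability of being 68 inches or shorter
p_a <- pnorm(68, mean = mu, sd = sigma)

# (b) height at the 90th percentile
h_b <- qnorm(0.90, mean = mu, sd = sigma)

# (c) probability of being 85 inches or taller (upper tail)
p_c <- pnorm(85, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(a = p_a, b = h_b, c = p_c))
```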
Help in R
• To find help on a particular topic, you
can type, for example:
  help.search("box plot")
• If you already know the name of the
command, you can type:
  ?boxplot