Download Monday, Sept. 17 Review

Feb. 6 Statistic for the day:
Number of Florida high school
students who take physical
education courses online: 1204
Assignment: Continue to review for test
on Monday!
These slides were created by Tom Hettmansperger and in some cases
modified by David Hunter
Friday, Feb. 6
Review
Exam #1 (100 points)
Monday, Feb 9 in class
60 Multiple choice questions
Responsible for
Anything in lecture (except SFD)
Anything in book Chapts 1,4,5,7,8,9
Bring ID! Bring pencils! Bring 1 sheet of notes!
2 Types of studies to obtain data
relevant to your research:
Randomized Experiment
 Observational Study

Literary Digest Survey Results:
2.4 million responded!
 43% were for Roosevelt
 Literary Digest predicted a landslide victory
for Alf Landon

Turning Data into Information:
The distribution of the data



The shape of the distribution
 Is it skewed or is it symmetric?
What is a typical value?
 Should we use the mean or the median?
What is the spread of the distribution?
 Should we use the standard deviation or the
interquartile range?
 What are the quartiles?
Mean vs. Median: Which is more
“typical” in this (right-skewed) case?
[Figure: Histogram of CD ownership, Stat 100.2 S04. Horizontal axis: CDs (0 to 1000); vertical axis: Frequency. Mean = 89, Median = 50.]
Age at Death of English Rulers
60, 50, 47, 53, 48, 33, 71, 43, 65, 34,
56, 59, 49, 81, 67, 68, 49, 16, 86, 67
Turn these data into information.
Shape: Stem and Leaf Display
1 6
2
3 34
4 37899
5 0369
6 05778
7 1
8 16
The Median and the Quartiles
16 33 34 43 47 * 48 49 49 50 53 ** 56 59 60 65 67 *** 67 68 71 81 86
(* marks Q1, ** marks the median M, *** marks Q3; each of the four groups contains 5 values.)
The first quartile is the number that divides the data into the first
quarter and the last three quarters.
The median divides the data into halves.
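The median and quartiles described above can be checked with a short Python sketch. It assumes the "halves" convention used on the slide (Q1 is the median of the lower half of the sorted data, Q3 the median of the upper half); other software may use slightly different quartile rules. The function name five_number_summary is illustrative only.

```python
from statistics import median

# Age at death of 20 English rulers (from the slide)
ages = [60, 50, 47, 53, 48, 33, 71, 43, 65, 34,
        56, 59, 49, 81, 67, 68, 49, 16, 86, 67]

def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, highest) using the halves method.

    Assumes at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # lower half (excludes the middle value if n is odd)
    upper = data[-half:]   # upper half (excludes the middle value if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

summary = five_number_summary(ages)
print(summary)  # (16, 47.5, 54.5, 67.0, 86)
```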
5 Number Summary
Median M = 54.5
 First Quartile Q1 = 47.5
 Third Quartile Q3 = 67
 Lowest = 16
 Highest = 86

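Anatomy of a Boxplot
[Figure: Boxplot of age at death of a sample of 20 rulers of England. Vertical axis: age (10 to 90). Labeled parts: Q1, M, and Q3 on the box, with IQR = Q3 - Q1; whiskers marking the reasonable range of the data; one outlier below the lower whisker.]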
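The slide does not say how the "reasonable range" is chosen. A common convention, assumed here and not taken from the slides, is to flag any value more than 1.5 IQRs beyond the quartiles. A minimal sketch for the rulers data:

```python
# Age at death of 20 English rulers (from the slides)
ages = [60, 50, 47, 53, 48, 33, 71, 43, 65, 34,
        56, 59, 49, 81, 67, 68, 49, 16, 86, 67]

q1, q3 = 47.5, 67.0              # quartiles from the five-number summary
iqr = q3 - q1                    # interquartile range: 19.5

# ASSUMPTION: the 1.5 * IQR rule, a common boxplot convention
lower_fence = q1 - 1.5 * iqr     # 18.25
upper_fence = q3 + 1.5 * iqr     # 96.25

outliers = [a for a in ages if a < lower_fence or a > upper_fence]
print(iqr, outliers)  # 19.5 [16]
```

Under this rule only the ruler who died at age 16 is flagged, which agrees with the single outlier marked on the boxplot.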
Shape: Histogram
[Figure: Histogram of age at death of a sample of 20 rulers of England. Horizontal axis: age (10 to 90); vertical axis: Frequency (0 to 5).]
Rough way to approximate the
standard deviation:
Look at the histogram and estimate the
range of the middle 95% of the data.
The standard deviation is about
¼ of this range
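As a rough check of this rule on the rulers data, the sketch below compares range / 4 with the sample standard deviation. It makes one simplifying assumption: with only 20 values, the full range (lowest to highest) stands in for the range of the middle 95%.

```python
from statistics import stdev

# Age at death of 20 English rulers (from the slides)
ages = [60, 50, 47, 53, 48, 33, 71, 43, 65, 34,
        56, 59, 49, 81, 67, 68, 49, 16, 86, 67]

# ASSUMPTION: the full range approximates the middle 95% for 20 values
rough_sd = (max(ages) - min(ages)) / 4   # (86 - 16) / 4 = 17.5
exact_sd = stdev(ages)                   # sample standard deviation, about 16.7

print(rough_sd, round(exact_sd, 1))
```

The two numbers are close, which is all the rule of thumb promises.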
Research Question 1: How high
should I build my doorways so
that 99% of the people will not
have to duck?
(Assume normal distribution with mean 68, st. dev. 4)
Secondary Question 2: If I built my
doors 75 inches (6 feet 3 inches) high,
what percent of the people would
have to duck?
Z-Scores: Measurement in
Standard Deviations
Given the mean (68), the standard deviation
(4), and a value (height say 75) compute
Z = (75-mean) / SD = (75-68) / 4 = 1.75
This says that 75 is 1.75 standard deviations
above the mean.
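Both doorway questions can be worked out under the stated normal model (mean 68, st. dev. 4). The sketch below uses Python's statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=68, sigma=4)    # normal model assumed on the slide

# Z-score of a 75-inch doorway
z = (75 - heights.mean) / heights.stdev

# Question 2: percent of people taller than 75 inches (they must duck)
pct_ducking = 100 * (1 - heights.cdf(75))

# Question 1: doorway height that 99% of people fall below
door_for_99 = heights.inv_cdf(0.99)

print(z, round(pct_ducking, 1), round(door_for_99, 1))  # 1.75 4.0 77.3
```

So a 75-inch door makes about 4% of people duck, and a door of roughly 77.3 inches would let 99% pass without ducking.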
Morals of the story:
Whenever you meet a graph that is very far
from square, it is likely to produce an
impression different from what you would
have obtained from the data themselves.
 Almost any graph in which the vertical
scale does not start at zero is deceptive.

BAD
Bogus vertical scale. Hard to say what the graph should look like.
Portion of income taken by the government.
Top: spending equal to the income in western states.
Bottom: more densely populated east.
A perplexing polling paradox
People generally believe the results of polls.
 People do not believe in the scientific
principles on which polls are based

According to Gallup, most Americans said that
a survey of 1500 to 2000 respondents (a larger-than-average sample size for national polls)
CANNOT represent the views of all Americans.
How are Gallup Opinion Polls
Taken?





Telephone interviews: Random digit dialing
At random pick
 Exchange (area code + first three digits; e.g.,
814 865)
 Next two digits eg. 22
 Last two digits eg. 11
Up to three callbacks (why callbacks?)
Evenings and weekends
This catches unlisted numbers
Designed to be a random sample from the
POPULATION of people with telephones.
All members of the population are equally
likely to be in the sample.
Called a SIMPLE RANDOM SAMPLE.
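The random digit dialing steps above can be sketched as follows. This is only an illustration of the idea, not Gallup's actual procedure; the function name random_digit_dial and the one-item exchange list are hypothetical.

```python
import random

def random_digit_dial(exchanges, rng=random):
    """Build one phone number: random exchange + two random 2-digit blocks."""
    exchange = rng.choice(exchanges)     # area code + first three digits
    next_two = rng.randrange(100)        # next two digits, 00 to 99
    last_two = rng.randrange(100)        # last two digits, 00 to 99
    return f"{exchange} {next_two:02d}{last_two:02d}"

# Hypothetical list containing only the example exchange from the slide
number = random_digit_dial(["814 865"], random.Random(0))
print(number)
```

Because the last four digits are generated rather than looked up, unlisted numbers are as likely to be dialed as listed ones.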
Polls typically take roughly 1500 or 1600 people.
Margin of error: 2 standard deviations
We generally will NOT have the benefit of
a histogram to get the standard deviation
or the margin of error of the sample percentage.
SECRET FORMULA FOR THE
MARGIN OF ERROR OF A
SAMPLE PERCENTAGE:
Margin of error = 1 / (square root of sample size)
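Applying the formula to typical poll sizes takes only a few lines. The function name margin_of_error is illustrative.

```python
import math

def margin_of_error(sample_size):
    """Margin of error of a sample percentage: 1 / sqrt(sample size)."""
    return 1 / math.sqrt(sample_size)

for n in (1500, 1600, 2000):
    print(n, f"{100 * margin_of_error(n):.1f}%")
# 1500 2.6%
# 1600 2.5%
# 2000 2.2%
```

This is why polls of roughly 1500 or 1600 people are usually reported as accurate to within about 3 percentage points.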
The Morning After Pill
Do you think that the ‘morning-after’ contraceptive
pill should be available over the counter?
Yes: 59.1%
No: 37.1%
Not sure: 3.8%
USA Today call-in poll
(http://www.usatoday.com/quick/health/qh1206a.htm)
Volunteer response vs. volunteer
sample
Contraceptive call-in poll?
Volunteer sample!
1936 Literary Digest poll?
Volunteer response!
Which is worse?
Volunteer sample!
Do you have a tattoo?
         Yes    No
Men      15%    85%
Women    23%    77%
Based on: 100 men, 136 women (Stat100.2 S04)
Sampling methods
(Simple) random sampling
 Stratified random sampling
 Cluster sampling
 Systematic sampling
 Bad: Haphazard or convenience sampling
(as in tattoo survey)

Stratified random sampling
Divide population into subgroups, or strata
 From each stratum, select a random sample

Example: Select a random sample from
each of four groups of students (in-state
non-minority, in-state minority, out-of-state
non-minority, out-of-state minority) to
ensure adequate representation of each
group.
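A minimal sketch of stratified random sampling, assuming each individual can be assigned to exactly one stratum. The roster below is made up purely for illustration, and the names stratified_sample and stratum_of are hypothetical.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, rng=random):
    """Split the population into strata, then draw a random sample from each."""
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# MADE-UP roster: (student id, residency, minority status)
students = [(i,
             "in-state" if i % 2 else "out-of-state",
             "minority" if i % 3 == 0 else "non-minority")
            for i in range(120)]

sample = stratified_sample(students,
                           stratum_of=lambda s: (s[1], s[2]),
                           per_stratum=5,
                           rng=random.Random(1))
print({name: len(chosen) for name, chosen in sample.items()})
```

Every one of the four groups contributes five students, no matter how small the group is in the population.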
Cluster sampling
Divide population into subgroups, or
clusters
 Select a random sample of clusters
 Measure individuals within selected clusters
according to some plan
Example: To study high schoolers, first take
a random sample of schools and then look in
depth at all students in selected schools
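A minimal sketch of cluster sampling for the high school example, where the plan is to measure every student in each selected school. The school names and rosters are made up for illustration, and cluster_sample is a hypothetical name.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly choose whole clusters and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# MADE-UP clusters: school name -> list of student ids
schools = {
    "School A": [1, 2, 3],
    "School B": [4, 5],
    "School C": [6, 7, 8, 9],
    "School D": [10, 11],
}

selected = cluster_sample(schools, n_clusters=2, rng=random.Random(1))
print(selected)
```

The randomness here is in which schools are picked, not in which students are picked within a school.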

Systematic sampling

From a list of individuals in the population,
select every kth individual
Grisly example: “Decimation”, a term originally used
for a punishment for mutinous Roman legions in which
the legion was lined up and every tenth person killed.
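A minimal sketch of systematic sampling. The random starting point is an added assumption (a common refinement); the slide only says to take every kth individual. The name systematic_sample is hypothetical.

```python
import random

def systematic_sample(individuals, k, rng=random):
    """Select every kth individual from a list."""
    start = rng.randrange(k)     # ASSUMPTION: random start among the first k
    return individuals[start::k]

legion = list(range(1, 101))     # 100 soldiers lined up, numbered 1 to 100
chosen = systematic_sample(legion, k=10, rng=random.Random(1))
print(chosen)
```

With 100 individuals and k = 10, exactly 10 are selected, each 10 places apart.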
Comparisons
Randomized Experiments
Observational Studies
EXPLANATORY VARIABLE says which
population we sampled from.
RESPONSE VARIABLE says what we
measured or counted.
The key to a good observational study or
a good randomized experiment is
RANDOMIZATION
in both cases.
• In observational studies we need a random
sample from each population.
• In randomized experiments we must
randomize the subjects to the different
treatments (or treatment and control groups).
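Randomizing subjects to a treatment group and a control group can be sketched as a shuffle followed by a split. This is one simple way to do it, assuming two groups of equal size; the name randomize is hypothetical.

```python
import random

def randomize(subjects, rng=random):
    """Shuffle the subjects and split them into two groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

treatment, control = randomize(range(1, 21), rng=random.Random(1))
print(sorted(treatment))
print(sorted(control))
```

Because chance alone decides who goes where, lurking variables tend to balance out between the two groups.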
Randomized Experiment
Associated concepts and ideas:
•Control group (provides a benchmark)
•Blinding: single or double (reduce bias)
•Placebo (benchmark, blinding)
•Confounding (a lurking third variable)
•Pairing or blocking (reduces noise in data)
The Hawthorne effect
Imagine the following study, intended to
determine the prevalence of cheating:
Individual students taking an exam in a particular
course are filmed and observed closely by a team of
extra observers, who then record the number of
instances of cheating they observe.
Named for Elton Mayo’s famous study (1924-1932) of
workers at the Hawthorne, Illinois plant of the Western
Electric Company
Research question: Do cell phones cause
cancer?
What sort of a study could be used to answer this?
•Observational Study?
•Randomized Experiment?
If we cannot establish cause and effect, perhaps
we can establish an association between cell
phones and cancer using an observational study.
Possible Observational Study:
Response Variable: whether or not a subject
gets cancer.
Explanatory Variable: whether or not the subject
uses a cell phone.
This may require a very long time.
A special kind of observational study:
SWITCH RESPONSE AND EXPLANATORY VARIABLES
Response Variable: whether a subject uses a cell phone or not
Explanatory Variable: whether a subject has cancer or not.
1. Select a sample of cancer patients (Cancer Case)
2. Develop a group of people who match the
cancer patients but do not have cancer. (Control)
3. Compute the % who use cell phones in each group.
Called a retrospective Case-Control Study
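Step 3 is simple arithmetic. The sketch below uses made-up records purely to show the calculation; the numbers are not results from any real study, and percent_users is a hypothetical name.

```python
def percent_users(group):
    """Percent of a group who use cell phones (True = user)."""
    return 100 * sum(group) / len(group)

# MADE-UP records for illustration only: True means "uses a cell phone"
cases = [True, False, True, True, False]       # people with cancer
controls = [True, False, False, True, False]   # matched people without cancer

print(percent_users(cases), percent_users(controls))  # 60.0 40.0
```

In a real case-control study, the two percentages are compared to see whether cell phone use is associated with cancer; the comparison alone cannot establish cause and effect.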
Research question: How does
putting a smiley face on the bill
influence a waitperson’s tip?





Response variable: Size of tip
Explanatory variable: Smiley face or not
Interacting variable: Sex of waitperson
Female waitress: Drawing a smiley face increased
tip
Male waiter: Drawing a smiley face decreased tip
Source: Journ. Appl. Soc. Psych, 1996