Math 1680.010
Chapter 19
Page 1 of 7
Chapter 19: Sample Surveys
Recall: Observed Value = Actual Value + Chance Error + Bias
In probability, we assumed that we knew the population parameter (for
example, the chance a coin lands heads), but we did not know which
sample would be observed. This is the essence of chance.
In statistics, we have the reverse scenario: the sample is known, but the
population parameter is unknown.
Definition: Population. A population is the whole class of individuals
or set of measurements, either existing or conceptual, that we want to
learn about.
Definition: Sample. A sample is a subset of the population: the part of
the population that is actually examined.
Definition: Inference. An inference is a generalization made about a
population based on a sample.
Definition: Parameter. A parameter is a numerical fact about a
population that investigators want to know.
Definition: Statistic. A statistic is a number that can be computed from
a sample. Parameters are estimated by statistics.
Ex. #1: Consider the following two questions and determine which is a
probability question and which is a statistics question.
Question #1: A fair coin is flipped 100 times. What is the chance of
getting at least 60 heads?
Question #2: A coin is flipped 100 times, and it lands heads 65 times.
Can we reasonably assert that the coin is loaded?
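To make the contrast concrete, here is a small Python sketch (an illustration, not part of the original notes). In Question #1 the parameter, the chance of heads, is known to be 1/2, so the answer is a direct binomial calculation. In Question #2 the parameter is unknown; one common way to start is to ask how likely 65 or more heads would be if the coin were fair. The helper name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Exact binomial chance of k or more heads in n tosses with chance p of heads."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Question #1 (probability): the coin is fair, so p = 0.5 is known.
q1 = prob_at_least(60, 100)

# Question #2 (statistics): p is unknown. One way to start is to ask how
# surprising 65 or more heads would be IF the coin were fair.
q2 = prob_at_least(65, 100)

print(f"Chance of at least 60 heads with a fair coin: {q1:.4f}")
print(f"Chance of at least 65 heads with a fair coin: {q2:.4f}")
```

A very small value for Question #2 would count as evidence against the fair-coin assumption. That is reasoning from an observed sample back to an unknown parameter.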
To determine the outcome of the next election, it is impractical to ask
the entire population – unless you actually held the election. Instead, a
sample is chosen, and the results from the sample are extrapolated to the
population. Such an extrapolation is only reasonable when the sample
is representative – the topic of today’s lecture.
Ex. #2: Suppose two 16-year-old boys are asked to conduct a mall
survey. Just whom do you think will be overrepresented in the sample?
Ex. #3: Let’s examine the Literary Digest poll of 1936. Their
prediction was that Roosevelt would get 43%. However, the actual
percentage was 62%. What went wrong?
Problems:
1. The sample was not representative of the population.
2. The Digest poll was sent to 10 million people; only 2.4 million
responded. Thus, the results were subject to non-response bias.
This is typical of surveys of convenience, like call-in polls or the daily
poll in ESPN’s Sportzone.
Non-respondents can be very different from respondents. In Chicago,
the Digest predicted that Landon would win at least half of the vote. In
actuality, he received less than a third. The problem is that only 20% of
the Chicagoans who received the survey returned it.
In practice, low-income and high-income people tend not to respond to
questionnaires, so the middle class is overrepresented. For this
reason, pollsters prefer interviews, which have a higher response rate
(65%) than questionnaires (25%).
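A minimal arithmetic sketch of how non-response alone can bias a poll. All numbers below are hypothetical: two groups that differ both in their support for a candidate and in their response rates. (For comparison, the Digest's own response rate was 2.4 million out of 10 million, or 24%.)

```python
# Hypothetical population split into two groups (all numbers assumed).
# name: (share of population, support for the candidate, response rate)
groups = {
    "group A": (0.6, 0.70, 0.15),
    "group B": (0.4, 0.50, 0.40),
}

# Support in the whole population: weight each group by its share.
population_support = sum(share * support for share, support, _ in groups.values())

# Fraction of the population that responds, and support among responders.
response_rate = sum(share * rate for share, _, rate in groups.values())
responder_support = (
    sum(share * rate * support for share, support, rate in groups.values())
    / response_rate
)

print(f"Support in the population:   {population_support:.1%}")
print(f"Overall response rate:       {response_rate:.1%}")
print(f"Support among responders:    {responder_support:.1%}")
```

Mailing more questionnaires would not help here: in this model the responders' percentage depends only on the group shares, support levels, and response rates, not on how many questionnaires are sent.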
However, non-response bias is still a concern in modern polling, which
is why, in a telephone poll, the pollster will call back up to 3 times,
trying different times (in the evening, during the day, or on weekends).
(FYI: A modern 1,000-person poll costs $25,000 or so.)
Ex. #4: Consider the polls of 1948 given below. What went wrong for
the polls? How were the samples chosen?
Predicted and actual percentage of the vote, 1948:
The candidates   Crossley   Gallup   Roper   The results
Truman              45         44       38        50
Dewey               50         50       53        45
Thurmond             2          2        5         3
Wallace              3          4        4         2
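The errors in the 1948 table can be computed directly as poll figure minus actual result, in percentage points. This Python sketch only does that arithmetic on the numbers in the table; the dictionary layout is an arbitrary choice.

```python
# Figures from the 1948 table (percent of the vote).
polls = {
    "Crossley": {"Truman": 45, "Dewey": 50, "Thurmond": 2, "Wallace": 3},
    "Gallup": {"Truman": 44, "Dewey": 50, "Thurmond": 2, "Wallace": 4},
    "Roper": {"Truman": 38, "Dewey": 53, "Thurmond": 5, "Wallace": 4},
}
result = {"Truman": 50, "Dewey": 45, "Thurmond": 3, "Wallace": 2}

# Error = poll prediction minus actual result, for each poll and candidate.
errors = {
    poll: {cand: pred[cand] - result[cand] for cand in result}
    for poll, pred in polls.items()
}

for poll, err in errors.items():
    print(poll, err)
```

Every poll underestimated Truman and overestimated Dewey. Errors that all point the same way suggest bias rather than chance error.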
Problem: Quota Sampling. Consider the following example of quota
sampling. In this example, the interviewer in St. Louis had to interview
13 people, chosen to fill the following quotas.
Of these 13 people:
   6 were women
   7 were men. Of these 7 men:
      1 was black
      6 were white. Of these 6 white men, by monthly rent:
         1 paid $44.01 or more
         3 paid $18.01-$44
         2 paid $18.00 or less
In this way, the sample is force-fit to match the characteristics of the
population collected from the Census Bureau.
What potential bias exists in this method of sampling?
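One way to explore this question is a small simulation. The sketch below rests on made-up assumptions, labeled in the code: a hypothetical population in which 55% of every quota cell is Democratic, and an interviewer who, within each cell, only approaches people who seem approachable, with Republicans assumed more likely to seem so. The quotas are filled exactly, yet the sample percentage can still be far off.

```python
import random


def make_population(rng, size_by_cell):
    """Build a hypothetical population of (cell, party, approachable) records.

    Assumptions made up for illustration: 55% of every cell is Democratic,
    and Republicans are more likely (70% vs 40%) to seem approachable.
    """
    people = []
    for cell, size in size_by_cell.items():
        for _ in range(size):
            party = "D" if rng.random() < 0.55 else "R"
            approachable = rng.random() < (0.7 if party == "R" else 0.4)
            people.append((cell, party, approachable))
    return people


def quota_sample(people, quotas, rng):
    """Fill each cell's quota exactly, but only with people the interviewer chooses to approach."""
    sample = []
    for cell, n in quotas.items():
        candidates = [p for p in people if p[0] == cell and p[2]]
        sample.extend(rng.sample(candidates, n))
    return sample


def percent_dem(group):
    """Percentage of Democrats in a list of records."""
    return 100 * sum(1 for p in group if p[1] == "D") / len(group)


rng = random.Random(1948)
population = make_population(rng, {"women": 6_000, "men": 7_000})
# Quotas in the same 6:7 women-to-men ratio as the population.
sample = quota_sample(population, {"women": 600, "men": 700}, rng)

print(f"Democratic share, population:   {percent_dem(population):.1f}%")
print(f"Democratic share, quota sample: {percent_dem(sample):.1f}%")
```

Because the interviewer's discretion, not chance, decides who fills each quota, the quota sample can lean toward whichever kind of person the interviewer finds easier to approach.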
Keep in mind that this bias was also present in the polls of 1936, 1940,
and 1944. However, the errors in those years were not large enough to
change the predicted winner.
Ideal Method: Simple random sample: drawing tickets at random
without replacement from a box of tickets. At each draw, every ticket
still in the box has an equal chance of being drawn, and the interviewer
has no discretion as to whom to interview. By the law of averages, the
sample percentage is then likely to be close to the population percentage.
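A minimal Python sketch of a simple random sample from a box of tickets, assuming a hypothetical box of 100,000 tickets of which 62% are marked 1. `random.sample` draws without replacement, so each draw is made at random from the tickets still in the box.

```python
import random


def srs_percent(box, n, rng):
    """Draw n tickets at random WITHOUT replacement; return the percentage of 1s."""
    draws = rng.sample(box, n)  # no interviewer discretion: chance picks every ticket
    return 100 * sum(draws) / n


# Hypothetical box: 100,000 tickets, 62% of them marked 1.
box = [1] * 62_000 + [0] * 38_000
rng = random.Random(0)

estimates = [srs_percent(box, 1_000, rng) for _ in range(5)]
print("Population percentage: 62.0")
print("Five sample percentages:", [round(e, 1) for e in estimates])
```

Each estimate differs from 62 by a chance error, but the estimates cluster around the population percentage.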
Problem: In real life, there is no “master list” with the names of all
300 million Americans. Even if such a list existed, the potential
respondents would be scattered all over the country, and the cost of
doing personal interviews would be too high.
For this reason, survey organizations tend to use multistage cluster
sampling, illustrated in the figure above.
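A simplified two-stage version of multistage cluster sampling can be sketched in Python. The frame of regions, towns, and households is hypothetical. Actual survey organizations use more stages and usually select towns with probabilities tied to their size; this sketch picks towns with equal chances, for simplicity.

```python
import random


def multistage_cluster_sample(frame, towns_per_region, households_per_town, rng):
    """Two-stage sketch: random towns within each region, then random households within each chosen town.

    `frame` is a hypothetical nested dict: region -> town -> list of household IDs.
    """
    sample = []
    for region in sorted(frame):
        towns = frame[region]
        for town in rng.sample(sorted(towns), towns_per_region):  # stage 1: towns
            sample.extend(rng.sample(towns[town], households_per_town))  # stage 2: households
    return sample


# Hypothetical frame: 4 regions x 10 towns x 200 households.
frame = {
    region: {
        f"{region}-town{t}": [f"{region}-town{t}-hh{h}" for h in range(200)]
        for t in range(10)
    }
    for region in ("Northeast", "South", "Midwest", "West")
}

rng = random.Random(42)
sample = multistage_cluster_sample(frame, towns_per_region=2, households_per_town=5, rng=rng)
print(len(sample), sample[:3])
```

Interviewers only need to travel to the 8 chosen towns, which is where the cost savings come from, yet chance, not the interviewer, decides who ends up in the sample.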
In all probability methods:
1) The interviewers have no discretion as to whom to interview;
2) There is a fixed algorithm for selecting the sample; and
3) Chance is used in a planned way.
These features are designed to minimize bias (non-sampling error).
Once the sample has been chosen, the desired information must then be
solicited from the sample. This aspect of the modern poll is a bit of an
art, and must be done with care in order to not bias the results.
Problems:
1. Nonvoters. Respondents may not want to admit they will not vote.
2. Undecided. This caused a big error in the 1992 election polls.
3. Response bias. The wording of the question or the interviewer's tone
of voice may affect the response.
4. Non-response bias. This was discussed earlier.
5. Interviewer control. The interviewer may not follow instructions.
Most random polling is done by telephone. The country is divided into
a number of regions, and random numbers are dialed within each area
code; this is called random digit dialing. It eliminates the worry about
unlisted numbers, and the Yellow Pages are used to screen out business
phone numbers.
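A rough Python sketch of random digit dialing. The area codes, the business-number list, and the assumption that local exchanges run from 200 to 999 are illustrative, not from the notes. Because numbers are generated rather than looked up in a directory, unlisted residential numbers have the same chance of being dialed as listed ones.

```python
import random


def random_digit_dial(area_codes, calls_per_area, business_numbers, rng):
    """Generate distinct random phone numbers in each area code, skipping business numbers.

    Assumption for illustration: local exchanges run from 200 to 999.
    """
    numbers = []
    for area in area_codes:
        chosen = set()
        while len(chosen) < calls_per_area:
            number = f"({area}) {rng.randrange(200, 1000)}-{rng.randrange(10_000):04d}"
            if number not in business_numbers:  # screen out known businesses
                chosen.add(number)
        numbers.extend(sorted(chosen))
    return numbers


# Hypothetical inputs: two example area codes and a tiny list of business numbers.
businesses = {"(940) 555-0100", "(214) 555-0199"}
rng = random.Random(7)
to_call = random_digit_dial(["940", "214"], calls_per_area=5, business_numbers=businesses, rng=rng)
print(to_call)
```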
Ex. #6: What are some potential pitfalls in this method that pollsters
should worry about?
Ex. #7: Pollsters often conduct pre-election surveys by telephone. Could
this bias the results? What if the sample is drawn from the telephone book?
Ex. #8: One study on slavery estimates that “11.9% of slaves were
skilled craftsmen.” This estimate was based on the records of thirty
plantations in Plaquemines Parish, Louisiana. Is the statistic
trustworthy?
Ex. #9: In any survey, a fair number of people who are in the original
sample cannot be contacted by the survey organization, or are contacted
but refuse to answer questions. A high non-response rate is a serious
problem for survey organizations. True or false, and explain: this
problem is serious because the investigators have to spend more time
and money getting additional people to bring the sample back up to its
planned size.
Summary:
Observed Value = Actual Value + Chance Error + Bias
To minimize bias (or non-sampling error), a probability method uses an
objective chance process to construct the sample.
Large samples do not preclude the possibility of bias – but relatively
small samples that are properly constructed can be used to predict the
behavior of a population of millions.
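The last two points can be illustrated with a short simulation in terms of Observed Value = Actual Value + Chance Error + Bias. Everything here is hypothetical: a population in which 62% support candidate A, and a biased list (loosely modeled on the Digest's situation) on which only 40% do. A large sample from the biased list has almost no chance error but a large bias; a small simple random sample from the whole population has some chance error but no bias.

```python
import random


def sample_percent(frame, n, rng):
    """Percentage of 1s in a simple random sample of size n drawn from `frame`."""
    return 100 * sum(rng.sample(frame, n)) / n


rng = random.Random(1936)

# Hypothetical population of 100,000 voters: 62% support candidate A (marked 1).
population = [1] * 62_000 + [0] * 38_000

# Hypothetical biased list of 50,000 names on which only 40% support A,
# standing in for a sampling frame that over-covers A's opponents.
biased_list = [1] * 20_000 + [0] * 30_000

small_fair = sample_percent(population, 1_000, rng)  # small, properly drawn
large_biased = sample_percent(biased_list, 40_000, rng)  # huge, but from a biased list

print("True percentage:                 62.0")
print(f"SRS of 1,000 from population:    {small_fair:.1f}")
print(f"40,000 drawn from biased list:   {large_biased:.1f}")
```

Increasing the size of the biased sample would shrink its chance error further, but the bias of about 22 percentage points would remain.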
Even if a sample is properly chosen, bias may result when soliciting
information from the sample.