Warm-up
In the United States, heart disease kills roughly one-and-a-quarter
times as many people as cancer. If you look at
the death rate per 100,000 residents by state, the
distributions for the two diseases are roughly normal,
provided you leave out Alaska and Utah, which are
outliers because of their unusually young populations.
The means and standard deviations for all 50 states are
given below:
                 Mean    SD
Heart disease    238     52
Cancer           196     31
Alaska has 88 deaths per 100,000 residents from heart
disease and 111 deaths per 100,000 from cancer. Explain
which death rate is more extreme compared to other states.
Solution
z_heart = (88 - 238) / 52 = -2.88
z_cancer = (111 - 196) / 31 = -2.74
Alaska’s death rate for heart disease is 2.88 standard
deviations below the mean. The death rate for cancer
is 2.74 standard deviations below the mean. Both
rates are extreme, but the death rate for heart disease
is more extreme.
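The warm-up arithmetic can be checked with a short Python sketch. The function name `z_score` is an illustrative choice, not something from the slides; the numbers are the ones given in the table above.

```python
def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


# Alaska's rates compared with the state means and standard deviations.
z_heart = z_score(88, 238, 52)
z_cancer = z_score(111, 196, 31)

print(round(z_heart, 2), round(z_cancer, 2))

# The rate with the larger absolute z-score is the more extreme one.
more_extreme = "heart disease" if abs(z_heart) > abs(z_cancer) else "cancer"
print(more_extreme)
```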
Daniel S. Yates
The Practice of Statistics
Third Edition
Chapter 5:
Producing Data
5.1 Designing Samples
Copyright © 2008 by W. H. Freeman & Company
Essential Questions
• What is the difference between an
observational study and an experiment?
• What is a simple random sample (SRS)?
• What are the different types of sampling
methods?
• What is bias? What are the sources of
bias?
Two Basic Ways to Collect
Data
Observational Study
Experiment
Observational vs. Experiment
• An observational study observes
individuals and measures variables of
interest but does not attempt to influence
the responses.
• An experiment, on the other hand,
deliberately imposes some treatment on
individuals in order to observe their
responses.
Observational and experimental
studies …
• Observational study is one in which measurements
representing a variable of interest are observed and
recorded, without controlling any factor that might
influence their values.
• Experimental study is one in which measurements
representing a variable of interest are observed and
recorded, while controlling factors that might influence
their values.
Sampling vs. a Census
• Sampling involves studying a part in order
to gain information about the whole.
• A census attempts to contact every
individual in the entire population.
Sample vs. Population
• What are some reasons that a sample
would be used instead of using the
population?
1. Cost
2. Access
3. Time
4. Not necessary if the sample truly
represents the population
Statistical Inference
• The purpose of collecting data on a
sample is to answer some question of
interest and make an inference about the
population
• or to conduct an experiment to
confirm/support a cause/effect relationship
when two variables, explanatory and
response, are related.
Parameter
— A descriptive measure of a population.
Statistic
— A descriptive measure of a sample.
Because populations tend to be very large, most population
parameters are not only unknown but also unknowable.
We can only use statistical inference to obtain an estimate, and
only if we are willing to accept less than 100% accuracy. Instead of
investigating the entire population, we choose to study a
sample.
How to capture a “Sample”
• Getting a portion of the population is not
difficult.
• Getting a good sample is difficult.
• Creating a plan to do this is called “sample
design”.
How not to sample – Voluntary
Response Sample
• A voluntary response sample consists of
people who choose themselves by
responding to a general appeal (example:
call-in opinion polls).
• The problem with call-in opinion polls is
that the people who answer the polls tend
to have strong opinions, especially strong
negative opinions.
• This sample is biased; this sample is not
representative of the population.
How not to sample – Convenience
Sample
• Choosing individuals who are easiest to reach is
called Convenience sampling. (For example:
Mall intercept interviews.)
• Convenience sampling may not get you access
to all the people in the population.
• Interviewers often avoid people who may make
them feel uncomfortable.
• This sample is biased; this sample is not
representative of the population.
Definition of Bias
• The design of a study is biased if it
systematically favors certain outcomes.
• The remedy for bias in choosing a sample is to
allow chance to do the selecting.
How to sample
• The best way to sample is to use a “simple
random sample”
• A simple random sample (SRS) of size n
consists of n individuals from the
population chosen in such a way that
every set of n individuals has an equal
chance to be the sample actually selected.
How to create an SRS
• Steps for choosing an SRS:
– Step 1: Label. Assign a numerical label to every
individual in the population.
– Step 2: Make Random selections of labels
• Random number table (Table B)
• Random number generator (RandInt in the TI-83/84)
• Computer software.
– Step 3: Stopping Rule – criteria used to stop sampling.
– Step 4: Identify Sample. Use the labels to identify the
subjects selected to be in the sample.
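The four steps above can be sketched with computer software, here Python's standard `random` module. The function name `choose_srs`, the student names, and the seed are illustrative assumptions; `random.sample` draws without replacement, so every set of n labels is equally likely.

```python
import random


def choose_srs(population, n, seed=None):
    """Choose a simple random sample of size n from a list of individuals."""
    rng = random.Random(seed)
    # Step 1: Label. Give every individual a numerical label 1..N.
    labels = list(range(1, len(population) + 1))
    # Steps 2 and 3: Select labels at random, stopping once n distinct
    # labels have been drawn (sampling without replacement).
    chosen_labels = rng.sample(labels, n)
    # Step 4: Identify the sample from the chosen labels.
    return [population[label - 1] for label in sorted(chosen_labels)]


# Hypothetical population of 30 students; the seed only makes the run repeatable.
students = [f"student_{i}" for i in range(1, 31)]
srs = choose_srs(students, 5, seed=1)
print(srs)
```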
Using a Table of Random Digits
Look at Problem 5.10 page 341
Problem 5.10
131 |
05007 16632 81194 14873 04197 855776
05 Beach Castle
19 Sea Castle
20 Banyan Tree
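Reading a table of random digits by hand follows a mechanical rule, sketched below in Python. This is only an illustration under stated assumptions: the digit string, the population size of 30, and the function name `select_labels` are made up for the example and are not taken from Problem 5.10. The rule is to read the digits in groups as wide as the labels, skip groups that are not valid labels, skip repeats, and stop when the sample is full.

```python
def select_labels(digits, population_size, sample_size, width=2):
    """Read fixed-width groups from a string of random digits and keep the
    first `sample_size` distinct labels between 1 and `population_size`."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip groups that are not labels, and labels already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        # Stopping rule: quit once the sample is full.
        if len(chosen) == sample_size:
            break
    return chosen


# Illustrative digits, read as 19 22 39 50 34 05 75 62 87 13.
labels = select_labels("19223 95034 05756 28713", population_size=30, sample_size=4)
print(labels)  # [19, 22, 5, 13]
```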
Probability Sample
A sample chosen by a method that uses chance is a
probability sample.
• Some probability sampling methods give every member
an equal chance of selection (SRS). This may not be
true for more elaborate sampling methods.
• However, the use of chance to select the sample is the
essential principle in statistical sampling.
More Complex Sampling Methods
• Methods for sampling from large
populations over wide areas may be more
complex than SRS.
• Common examples are:
– Stratified Random Sample
– Cluster Sampling
– Multi-Stage Sampling
Stratified Random Sample
Some Reasons to Do a Stratified
Random Sample
• It assures that you will be able to represent not
only the overall population, but also key
subgroups of the population.
If you want to be able to talk about subgroups,
this method gives you the ability to do so.
• Stratified random sampling will generally have
more statistical precision than simple random
sampling. This will only be true if the strata or
groups are homogeneous.
Sampling Methods/Designs
Stratified Random Sample
sample important groups within the population
separately and then combine the groups
Steps:
a. divide the population into groups of similar
individuals, called strata (gender, age, political
party, weight)
b. choose a separate SRS in each stratum
c. combine to form the full sample
Choose the strata based on facts
known BEFORE the sample is taken
Stratified Random Sample
For example, let's say that the population of clients for our agency can be
divided into three groups: Caucasian, African-American and Hispanic-American.
Furthermore, let's assume that both the African-Americans and
Hispanic-Americans are relatively small minorities of the clientele (10% and
5% respectively).
Stratified Random Sample
Example
A school official wants to estimate the average number of
hours per week that students devote to homework.
Because she believes that this figure will differ
considerably among classes, a stratified random sample will
be employed. The population of students at this school will
be grouped into four strata consisting of all freshmen,
sophomores, juniors and seniors. From each stratum, a
random sample of students will then be selected. The
resulting information can be combined to obtain an
estimate that is expected to be more precise than that
obtained from a random sample of the entire population.
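The school example can be sketched in Python as follows. The class sizes, the student names, the choice of 20 students per stratum, and the function name `stratified_sample` are all assumptions made for illustration; the slides do not specify them.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS of size n_per_stratum within every stratum,
    then combine the pieces into the full sample."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


# Hypothetical school with 100 students in each of the four classes.
grades = ("freshman", "sophomore", "junior", "senior")
strata = {g: [f"{g}_{i}" for i in range(1, 101)] for g in grades}

sample = stratified_sample(strata, 20, seed=0)
print(len(sample))  # 80: every class contributes exactly 20 students
```

Because every stratum is sampled, each class is guaranteed to be represented, which an SRS of 80 students from the whole school would not guarantee.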
Cluster Sampling
The main difference between stratified random
sampling and cluster sampling is that, once you
randomly select the clusters, all members of
each selected cluster are part of the sample.
In stratified random sampling, you take an SRS
within every stratum.
Cluster Sampling Example
Suppose an organization wishes to find out which
sports seniors are participating in across PA. It
would be too costly and take too long to survey
every student, or even some students from every
school. Instead, 100 schools are randomly
selected from all over PA.
These schools are considered to be clusters.
Then every senior in these 100 schools
is surveyed. In effect, the seniors in the sample of
100 schools represent all seniors in PA.
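A minimal Python sketch of the cluster design described above. The number of schools (500), the five seniors per school, and the function name `cluster_sample` are hypothetical; only the figure of 100 selected schools comes from the example.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every member of each
    chosen cluster in the sample."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return chosen, [member for name in chosen for member in clusters[name]]


# Hypothetical state with 500 schools and 5 seniors per school.
schools = {
    f"school_{i}": [f"school_{i}_senior_{j}" for j in range(1, 6)]
    for i in range(1, 501)
}

chosen_schools, seniors = cluster_sample(schools, 100, seed=0)
print(len(chosen_schools), len(seniors))  # 100 schools, 500 seniors
```

Unlike the stratified design, most schools contribute no one to the sample, while the chosen schools contribute all of their seniors.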
Multistage Sampling Design
• Randomly choose stage 1 units (for
example, states)
• Randomly choose stage 2 units (for
example, cities within the chosen states)
• and so on until you get down to the
sample size.
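The staged selection can be sketched in Python as nested random choices. Everything concrete here is an assumption for illustration: the three stages (states, cities, people), the sizes, and the function name `multistage_sample`.

```python
import random


def multistage_sample(states, n_states, n_cities, n_people, seed=None):
    """Stage 1: choose states at random. Stage 2: choose cities at random
    within each chosen state. Stage 3: take an SRS of people in each city."""
    rng = random.Random(seed)
    sample = []
    for state in rng.sample(sorted(states), n_states):
        cities = states[state]
        for city in rng.sample(sorted(cities), n_cities):
            sample.extend(rng.sample(cities[city], n_people))
    return sample


# Hypothetical population: 8 states, 10 cities per state, 50 people per city.
states = {
    f"state_{s}": {
        f"city_{s}_{c}": [f"person_{s}_{c}_{p}" for p in range(50)]
        for c in range(10)
    }
    for s in range(8)
}

sample = multistage_sample(states, n_states=3, n_cities=2, n_people=5, seed=0)
print(len(sample))  # 3 states x 2 cities x 5 people = 30
```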
Analysis of Complex Sampling
Designs
• Analysis of data from sampling designs
more complex than an SRS is beyond the
scope of this course.
• However, the SRS is the building block of
the more complex design.
• The fundamental concepts are the same.
Cautions About Sample Surveys
• Response Bias – The behavior of the respondent or of
the interviewer can influence a response. For example, a
respondent may lie about illegal or embarrassing
behavior.
• Poorly Worded Questions – Confusing or leading
questions can introduce strong bias.
Inferences About The Population
• Using chance to choose a sample
eliminates bias in the selection of the
sample of available individuals.
• The results from a sample are unlikely to
exactly match the entire population.
• We can improve our accuracy by using
large random samples.
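The effect of sample size can be illustrated with a small simulation in Python. The population here is made up (10,000 hypothetical values with mean near 10 and standard deviation near 4), and the function name `spread_of_sample_means` is an illustrative choice. The sketch draws many random samples of a given size and measures how much the sample means vary from one sample to the next.

```python
import random
import statistics


def spread_of_sample_means(population, n, repetitions=500, seed=0):
    """Draw `repetitions` SRSs of size n and return the standard deviation
    of their sample means (smaller means more consistent estimates)."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(means)


# Hypothetical population of 10,000 values.
pop_rng = random.Random(42)
population = [pop_rng.gauss(10, 4) for _ in range(10_000)]

small = spread_of_sample_means(population, n=10)
large = spread_of_sample_means(population, n=400)
print(small > large)  # True: larger samples give less variable estimates
```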
Review Questions
A business school researcher wants
to know what factors affect the
survival and success of small
businesses. She selects a sample of
150 restaurants from those listed in
the Yellow pages. The population is…
1. Successful restaurants
2. 150 restaurants she chose
3. All restaurants in the city
4. All small businesses
An SRS is
1. Stratified Random
Sample
2. Simple Random Sample
3. Statistically Real Survey
4. Single Radon Stocker
In an SRS, the sample
1. Is divided into groups
2. Is selected randomly
3. Is biased
4. Is on a voluntary basis
The design of a study is biased
if…
1. An SRS was used
2. It systematically
favors certain
outcomes
3. Population is divided
into strata
4. Not all individuals
are surveyed
A student wants to know the opinions
of the teachers in his school about
final exams so he asks his current
teachers.
1. SRS
2. Stratified Random
3. Convenience
4. Voluntary
A group of high school students
are first divided into groups by
elementary school attended and
then divided by grade. This is an
example of…
1. SRS
2. Stratified Random
Sample
3. Convenience
Sample
4. Multistage Sample
A principal wants to know the attitudes
of the students towards final exams so
she divides the students by grade and
then randomly selects 20 students
from each grade
1. SRS
2. Stratified
Random
3. Convenience
4. Voluntary
A textbook publisher wants to
know the attitudes of teachers in
the state towards final exams so
a survey is sent to the members
of the teachers’ union from his
hometown.
1. SRS
2. Stratified Random
3. Convenience
4. Voluntary
A state education board member wants to
know the attitudes of teachers toward
final exams so she interviews
teachers at the state teachers’
convention.
1. SRS
2. Stratified Random
3. Convenience
4. Voluntary
A local church is trying to determine
the most popular hymn. They select a
random sample from the traditional
service (as opposed to the
contemporary service) at 8 a.m. Which
bias is present?
1. Response Bias
2. Nonresponse Bias
3. Undercoverage
4. Leading question
If a sample is an SRS, then the
results will be free of bias.
1. True
2. False
Which of the following is not a
probability sample?
1. Voluntary
2. Convenience
3. SRS
4. Stratified Random Sample
5. Both 1 and 2
6. Both 3 and 4
If a survey requires that the
respondents call in with their
opinion and the interviewer
randomly selects which callers to
include, then it is an SRS.
1. True
2. False