White Paper
Towards the Use of Bayesian Credibility
Intervals in Online Survey Results
Alan Roshwalb, Neale El-Dash, and Clifford Young
Motivation
Introduction
Survey researchers and survey statisticians have been
dealing with compromise since the very beginnings
of the field. From the start we have been trading off
quality versus cost and results versus intuition. We
survey researchers and survey statisticians always ask
ourselves what is appropriate for the quality and the cost.
Should we use a door-to-door multi-stage area sample
or a telephone sample? How big a sample should
we use? How many callbacks do we attempt? Do we use
a single frame or multiple frames? Can we achieve our
goals using a mail study? Will the data be better if they are
self-administered or interviewer-administered? Is an
incentive appropriate and, if so, how much? Each question
asks us to assess the gains, the pitfalls, potential bias,
loss or gain of information and its costs. Each decision
can lead us towards or away from the perfect probability
sample. Most of us review the data as it comes out of the
field. We often ask ourselves, does it look right? Does
it feel right? Who can forget the overwhelming support
for Pat Buchanan in Palm Beach County, FL in the 2000
Presidential election? It didn’t look right, it didn’t feel
right, and in the end, there were so many inconsistencies
in the electoral ballot form design, it wasn’t right.1
At the root of survey research is the collection of
information about a target population through a sample
drawn from it. The purpose of sampling is to reduce
the cost and/or the amount of work that it would take to
survey the entire target population. Since only a portion of
the target population is measured, survey results need to
be generalized, allowing for statements about the entire
population to be made. Included in a generalization is a
measure of uncertainty with respect to the estimate. If
the estimate has a high degree of uncertainty, then there
is a wide range of values that can be the true population
value. If the estimate has little uncertainty attached to it,
then the range of plausible values is narrower, and hence
the estimate is more precise.
The most common measure of uncertainty is the
“confidence interval” (or “margin of error”) of the
estimate. It describes an interval around the sample
estimate that is expected, at a stated level of confidence,
to contain the true population parameter. It is often used
as a measure of a survey estimate’s reliability. As a result,
surveys are traditionally published using two statistics: the
survey estimate and the confidence interval. An example of
such practice is to say “For results based on these sample
sizes, the maximum margin of error is ±3 percentage points
with 95% confidence”. The correct interpretation of this
statement is: if the poll were repeated 100 times, in about
95 of those times the true percentage of people giving a
particular answer to the question would fall within the
95% Confidence Interval, i.e., would be within 3 points of
that poll’s sample estimate.
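As a rough illustration of where a figure such as ±3 points comes from, the sketch below uses the usual normal approximation for a proportion under simple random sampling. The function names are hypothetical and the worst case p = 0.5 is an assumption chosen for illustration; the paper does not state how its example margin was computed.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def margin_of_error(p_hat: float, n: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def sample_size_for_margin(margin: float, p_hat: float = 0.5, z: float = Z_95) -> int:
    """Smallest simple-random-sample size whose margin of error is at most `margin`."""
    return math.ceil(z ** 2 * p_hat * (1.0 - p_hat) / margin ** 2)


# Worst case (p = 0.5): how many completed interviews give a margin of 3 points?
n_needed = sample_size_for_margin(0.03)
print(n_needed, round(margin_of_error(0.5, n_needed), 4))
```

The calculation is only meaningful when inclusion probabilities are known, which is the condition the following sections examine.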
This paper tries to consolidate these competing
decisions, our understanding and intuition into a
reasonable framework. Here we propose a departure from
“classical” approaches to a Bayesian approach. Our goal
is to provide a framework to increase our understanding,
use our intuition and cumulated knowledge and interpret
data that very often we would dismiss.
The 95% Confidence Interval as a measure of reliability
relies on the theory of statistical sampling where p(s) are the
probabilities of inclusion for all elements of the population.
We refer to this as the “classical” approach. Expectations of
statistical estimates, including parameter estimates and
variances, are taken over these probabilities. When data
are missing, either through non-response or through gaps in
frame coverage, the classical approach requires the researcher
to make assumptions so that estimates are valid or
even calculable. Such assumptions include random non-response
in the event of non-response, or a minimal effect
of non-coverage when the sample frame covers less than
100% of the population.
1 In the 2000 Presidential Election, third-party candidate Patrick Buchanan received an unexpectedly large number of votes from Palm Beach County, FL, a heavily Jewish county. Pat Buchanan
had a history of positions considered to be controversial in the Jewish community. On review, it was felt the Palm Beach County butterfly ballot was poorly designed and could confuse voters.
Probabilistic Sampling and Classical Statistics
Theory from finite population sampling and survey
statistics allows for a confidence interval (±3% in this
example) to be calculated properly only if the sample
is selected using probabilistic sampling with known
and positive values for p(s). This means that one has to
know – with certainty – the probability of each unit in
the population being included in the sample, i.e., what
percentage of times each unit would be included in a
sample if an infinite number of samples were selected.
The root of this kind of reasoning is from the field of
Frequentist or Classical Statistics. Within the Frequentist
framework, probabilities measure the relative frequency
of the occurrence of an event.
As practitioners know, the idea of a ‘true’ probability sample
is utopian and does not exist in reality – we simply seek
to get as close as is possible. Gold standard probability
samples have always had non-probabilistic elements:
• Door-to-door studies – insufficient maps, missing
dwelling units, access-restricted neighborhoods or
apartment buildings.
• RDD telephone studies – cell phone households,
residential telephone numbers in zero-listed
blocks in RDD telephone samples, or households
consistently using call blocking or screening2.
• Internet studies – non-connected households etc.
Internet studies are perhaps more overtly non-probabilistic
than door-to-door or telephone samples because a
panel, in effect the sampling frame, may or may not
have been constructed to include every member of the
population. Internet frames rely on one of several means
of constructing their frame. Some use river sampling –
advertising on the Internet and asking people to join the
“panel,” i.e., their sample frame; some use lists of known
Internet users and actively recruit their membership in their
panels; others allow people to apply to their panel without
active recruitment; and others use non-Internet lists to
recruit. In most of these cases, the probabilities of being
included in the sample frame are unknown, very difficult to
ascertain, or simply zero, as for non-Internet
users. Troublingly, Internet use is
not uniform within the population, which limits our ability
to calculate the likelihood of reaching a person by online
contact in the same way we do (with some caveats) on the
telephone. The reasons vary. It could be because Internet
connections and usage patterns are highly variable across
households, such as:
• Some people can be regular users of the Internet only at
home,
• Some people can be regular users of the Internet only at
work,
• Some people can be users of the Internet at places
other than work or home (e.g. at the library, via
mobile device, etc.), and
• Some people never use the Internet or do not have
access.
Door-to-door and telephone surveys often profile the
population to account for elements of the population
missing from the sample. Less is known about the ‘profiles’
of individuals who complete online surveys versus those
who do not and the likelihoods of completing an online
survey for those that do. Our industry does not have a
standardized approach to these metrics yet.
The survey research community has categorized the
nature of online surveys. Most are based on self-recruited
sample frames, either through Internet river sampling
or volunteering, and the respondents create a panel of
potential respondents for current and future studies.
These studies are now referred to as panel-based,
‘opt-in’ samples. They are often characterized as non-probability
samples, convenience samples or samples
based on other motivations to be part of the study. This is
a ‘layer’ of non-probability, since the sample frame is
constructed as a subset of the population, and there is
no practical intention of including the entire population or
a substantial portion of it in the sample frame.
Strictly speaking from a classical perspective, one cannot
calculate the confidence interval for Internet opt-in based
surveys because it is not possible to assign a probability
of selection to each population member, and many have
a probability of selection of zero. Simplistically put, most
online surveys are considered non-probability samples.
So shall we give up? Many of the problems with online
samples are shared with the other approaches. Shall we
damn online samples simply because they are deprived
of measures of scientific precision? We do not believe so.
Even though online samples from opt-in panels are not
probabilistic, they do still provide useful information and
even have a good track record for accuracy in surveying
the population (e.g., election polling and market research
surveys). However, in one way or another, users need to
be able to ascribe the relative robustness and uncertainty
of a survey estimate. This cannot be done, however,
within the Frequentist framework.
2 Most of these issues are discussed in “Advances in Telephone Survey Methodology”, James M. Lepkowski, Clyde Tucker, J. Michael Brick, Edith D. De Leeuw, Lilli Japec, Paul J.
Lavrakas, Michael W. Link, Roberta L. Sangster, Wiley series in survey methodology.
Non-Probabilistic Sampling and Bayesian Statistics
In the context of an online sample design, saying that the
sampling design is ignorable implies that there is NOT a
relationship between the variables we want to measure,
denoted here by Y, and the probability that a person
completes the online survey. This is a strong assumption:
it would be hard to verify and it may not be true. A weaker,
more realistic, assumption is to say that the sampling
mechanism is ignorable conditional on X, the set of
adjustments and models that account for the design. This
means that once we control for all the issues described above, we
believe that the variables Y which we want to measure do
not influence the probability that one person completes
the survey. This can be interpreted as assuming that
the estimates are unbiased under a super-population
model conditioned on the relationship between Y and X.
Regardless of the design, randomness is interjected into
the relationship through the model.
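One way to picture ignorability conditional on X is post-stratification: if, within each level of X, respondents resemble non-respondents on Y, then cell-level estimates can be recombined using known population shares. The sketch below is only an illustration under that assumption; the cell labels, counts, and shares are hypothetical and are not taken from any Ipsos data or method.

```python
from typing import Dict, Tuple


def poststratified_proportion(
    cells: Dict[str, Tuple[int, int]],
    population_share: Dict[str, float],
) -> float:
    """Weight each cell's sample proportion by that cell's known population share.

    cells maps a cell label (a level of X) to (yes_count, respondents).
    population_share maps the same labels to shares that sum to 1.
    """
    if abs(sum(population_share.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    estimate = 0.0
    for label, (yes, n) in cells.items():
        if n == 0:
            raise ValueError(f"no respondents in cell {label!r}")
        estimate += population_share[label] * (yes / n)
    return estimate


# Hypothetical opt-in sample that over-represents heavy Internet users.
sample = {"heavy_user": (420, 700), "light_user": (90, 300)}
shares = {"heavy_user": 0.4, "light_user": 0.6}

raw = sum(y for y, _ in sample.values()) / sum(n for _, n in sample.values())
adjusted = poststratified_proportion(sample, shares)
print(f"unweighted: {raw:.3f}, post-stratified: {adjusted:.3f}")
```

The adjustment removes imbalance only on the variables included in X; if Y still affects participation within a cell, the design is not conditionally ignorable and the estimate remains biased.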
Moving forward requires us to step outside of the
Frequentist framework and involve the field of Statistics
known as Bayesian Statistics. The theoretical basis for
Bayesian Statistics is that the uncertainty of an estimate
and, therefore, its precision, can be based on what we
know about the world, and what we know about the world
can be based on opinions, expertise, other data sources,
and updates to this knowledge based on the data we
collect ourselves3.
Bayesian statistics relies on models. These models, whether
through Bayes’ Theorem or super-population models, allow
us to generalize from a sample to the population parameters
of interest4. The sample may come from a probability or
a non-probability design. The model under the Bayesian
framework ultimately controls for those variables that are
systematically different, correcting for the unbalanced
samples due to non-response, non-coverage or other
nonrandom nature of designs.
Ipsos is developing steps to ensure that the design issues
in our online samples are conditionally ignorable5. These
steps are discussed in the Ipsos paper “Our Brave New
World: Blended Online Samples and the Performance
of Non-Probability Approaches”6. But, to summarize,
conditional ignorability can be achieved by combining
multiple opt-in online panel and non-panel sources,
which we refer to as blended online samples. In essence,
by combining multiple sample sources, the “holes” in any
one sample source can theoretically be filled out by any
one of the others.
All of the necessary adjustments and models are
incorporated in X. Implicit in X are discrepancies and
inadequacies from the design, and the biases from
mode effect or measurement errors. Under ignorability,
an important concept in Bayesian statistics, one does
not need to know the actual probabilities of selection.
Formally, the sample design is considered ignorable
if the parameter and the response mechanism are
independent under their joint prior distribution, or if the
mechanism is taken into account in the model so as to allow
for sensible inferences about the population parameter. In
the context of this paper, the population parameter is θ.
3 For a discussion of probabilistic sampling from a Bayesian perspective see “Randomization in a Bayesian perspective”, J. B. Kadane and T. Seidenfeld.
Journal of Statistical Planning and Inference, 25:329-345, 1990.
4 For a famous example on the use of Bayesian statistics to obtain population estimates see “Subjective bayesian models in sampling finite populations”, W. A. Ericson.
Journal of the Royal Statistical Society - Series B (Methodological), 31(2):195-233, 1969.
5 For a discussion of ignorable sample designs within the context of non-random sampling see “On the validity of inferences from non-random sample”, T. M. F. Smith, Journal of the Royal
Statistical Society. Series A (General), 146(4):394-403, 1983.
6 Young et al. (2012) “Our Brave New World: Blended Online Samples and the Performance of Non-Probability Approaches”, Ipsos White Paper.
Bayesian Credibility Intervals
Credibility Intervals are often stated in a very similar way
as Confidence Intervals, but the interpretation is quite
different, and in many ways more intuitive. For example,
the statement, “For results based on these sample
sizes, the credible intervals are ±3 percentage points
with 95% credibility” should be interpreted as: Given the
knowledge base of a practitioner and the results of the
survey data he has collected, the probability that the true
percentage of people giving a particular answer in the
whole population is within ±3 percentage points of the
survey estimate is 95%.
The discussion in this section uses the Bayesian
framework and assumes that the sample design is
conditionally ignorable. Bayesian probability intervals
are also known as Credibility Intervals. They measure the
degree of certainty one has in the value of a parameter
based on one’s experience, understanding and
knowledge of the population and tempered by the data
that has been observed. Every probability statement, in
this context, is based on what is known at the time the
probability is stated. One’s experience, understanding
and knowledge is termed the “knowledge base” for the
purposes of this paper. Examples of information that may
be included in one’s knowledge base within the context
of survey sampling include all other current published
surveys, past surveys by the same pollster, historical
political outcome data, a practitioner’s political expertise,
predictive models and other sources. One’s knowledge
base is not static, since it is continuously being updated
with new information.
For most survey, marketing and polling research
estimates, credibility intervals are based on the Beta-Binomial
posterior distribution. This first assumes that Y
has a binomial distribution conditioned on the parameter
θ, i.e., $Y \mid \theta \sim \mathrm{Bin}(n, \theta)$, where n is the size of our sample.
In this setting, Y counts the number of “yes”, or “1”,
observed in the sample, so that the sample mean ($\bar{y} = Y/n$) is
a natural estimate of the true population proportion θ.
This portion of the model is often called the likelihood
function, and it is a standard concept in both the Bayesian
and the Classical framework. The posterior distribution
in Bayesian statistics takes the likelihood function and
combines it with our prior distribution. Using our prior
Beta distribution, we achieve our Beta-Binomial posterior
distribution. The Beta-Binomial posterior distribution
is also a beta distribution, $\pi(\theta \mid y) \sim \mathrm{Beta}(y + a,\; n - y + b)$.
Its parameters are the hyper-parameters of the prior distribution, i.e.,
one’s knowledge base, updated using the latest survey
information9. In other words, the posterior distribution
represents our opinion on which are the plausible values
for θ adjusted after observing the sample data.
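The update described here can be sketched in a few lines. The example takes a Beta(a, b) prior and y “yes” answers out of n, returns the Beta(y + a, n − y + b) posterior, and summarizes it with a normal-approximation 95% interval. The function names and the survey counts are hypothetical; an exact interval would use Beta quantiles instead of the normal approximation.

```python
import math
from typing import Tuple


def beta_binomial_posterior(y: int, n: int, a: float, b: float) -> Tuple[float, float]:
    """Parameters of the Beta posterior for y successes in n trials under a Beta(a, b) prior."""
    if not 0 <= y <= n:
        raise ValueError("y must lie between 0 and n")
    return y + a, n - y + b


def beta_mean_sd(a: float, b: float) -> Tuple[float, float]:
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(variance)


def credible_interval_normal(a: float, b: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation credible interval for Beta(a, b), clipped to [0, 1]."""
    mean, sd = beta_mean_sd(a, b)
    return max(0.0, mean - z * sd), min(1.0, mean + z * sd)


# Hypothetical survey: 520 "yes" answers out of 1,000, with a uniform Beta(1, 1) prior.
post_a, post_b = beta_binomial_posterior(y=520, n=1000, a=1.0, b=1.0)
low, high = credible_interval_normal(post_a, post_b)
print(f"posterior Beta({post_a:g}, {post_b:g}); 95% interval {low:.3f} to {high:.3f}")
```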
In the Bayesian framework, one’s knowledge base is
one’s Prior Distribution, and is mathematically denoted
by the function $\pi(\theta)$, where θ can be viewed as the
population quantity in which we are interested7. For the
remainder of this paper, we will look at a specific family
of prior distributions.
For the purposes of this document, θ is a proportion
which assumes values between 0 and 1. This may
reflect the proportion supporting a particular voter
initiative or candidate. The family of prior distributions
we are considering assumes a beta distribution. In
effect, $\pi(\theta) \sim \mathrm{Beta}(a, b)$ is a useful representation of our prior
knowledge about the proportion θ. The quantities a and
b are called hyper-parameters, and are used to express or
model one’s prior opinion about θ. In other words, a
judicious choice of a and b can restate one’s belief that
the parameter is nearer to 25% (a=1 and b=3), near to
50% (a=1 and b=1) or nearer to 75% (a=3 and b=1)8.
The choices of a and b also define the shape of the
probability curve, with a=1 and b=1 denoting a uniform
or flat distribution. In effect, this is equivalent to believing
that the true value of θ has the same chance of being
any value between 0 and 1. The hyper-parameters a and
b are not limited to a known constant. They too can be
modeled as random quantities. This adds flexibility to
the model, and it allows for data-based approaches to
be considered, such as Empirical and Hierarchical Bayes.
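To see how the hyper-parameters encode prior opinion, the sketch below evaluates the prior mean a/(a+b) and the prior standard deviation for the three choices mentioned above. The helper name is hypothetical; the formulas are the standard moments of the Beta distribution.

```python
import math
from typing import Tuple


def beta_prior_summary(a: float, b: float) -> Tuple[float, float]:
    """Mean and standard deviation of a Beta(a, b) prior."""
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))
    return mean, sd


for a, b in [(1, 3), (1, 1), (3, 1)]:
    mean, sd = beta_prior_summary(a, b)
    print(f"Beta({a}, {b}): prior mean {mean:.2f}, prior sd {sd:.3f}")
```

With a + b this small the priors are very diffuse; larger values of a + b with the same ratio keep the mean but concentrate the prior around it.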
7 For a detailed discussion about subjective probabilities see “Theory of Probability” by Bruno de Finetti (translation) 2 volumes, New York: Wiley, 1974-5.
8 The expected value of a Beta Distribution is a/(a+b).
9 For a discussion how to calculate one’s posterior distribution see “Bayesian Data Analysis - Second Edition”, A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin,
Chapman and Hall/CRC, 2003.
Our credibility interval for θ is based on this posterior
distribution. As mentioned above, these intervals represent
our belief about which are the most plausible values for
θ given our updated knowledge base. There are different
ways to calculate these intervals based on $\pi(\theta \mid y)$. One
approach is to create an estimator analogous to what
is done within the Classical framework. In this case, the
credible interval for any observed sample is based on a
prior distribution that does not include information from
our knowledge base. This case occurs when we assume
that the parameters of the beta distribution are a=1, b=1
and y=n/2. Essentially, these choices provide a uniform
prior distribution where the value of θ is equally likely on
the range between 0 and 1. In effect, our knowledge base
is equally sure or unsure that the true value is near zero,
25%, 50%, 75% or 100%. Under these choices the posterior
is Beta(n/2 + 1, n/2 + 1), with variance 1/(4(n + 3)). Using
a simple approximation of the posterior by the normal
distribution, the 95% credible interval is given by, approximately:
$$\bar{y} \pm \frac{1.96}{2\sqrt{n + 3}} \approx \bar{y} \pm \frac{1}{\sqrt{n}}$$
This result points to one interesting consequence. If the
sample size is large enough and the prior is uninformative,
the credibility interval almost coincides with the confidence
interval10, provided we assume random sampling was
used. This occurs only when the information included in
the sample dominates one’s prior, i.e., the information that
one has at hand from one’s knowledge base.
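The near-coincidence can be checked numerically. Under a Beta(1, 1) prior with y = n/2, the posterior is Beta(n/2 + 1, n/2 + 1), so its normal-approximation half-width can be compared directly with the classical half-width at a sample proportion of 0.5 and with the 1/√n rule of thumb. This is a sketch under those stated assumptions, with hypothetical function names.

```python
import math


def credible_half_width_uniform(n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation credible interval under a
    Beta(1, 1) prior when y = n/2, i.e. a Beta(n/2 + 1, n/2 + 1) posterior."""
    a = b = n / 2.0 + 1.0
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))
    return z * sd


def confidence_half_width(n: int, p_hat: float = 0.5, z: float = 1.96) -> float:
    """Classical normal-approximation half-width for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


for n in (100, 1000, 10000):
    print(
        n,
        round(credible_half_width_uniform(n), 5),
        round(confidence_half_width(n), 5),
        round(1.0 / math.sqrt(n), 5),
    )
```

The two half-widths differ only through n + 3 versus n in the denominator, so the gap shrinks as the sample grows.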
Choosing parameters for a more informed prior may
be arrived at using empirical methods, but let’s say we
were to choose a Beta prior with a=64, b=64 and y=n/2.
This is akin to saying at worst we believe the population
parameter is near 50% and concentrated enough that 2
standard deviations under the prior distribution is ± 8.8%.
The posterior distribution under this scenario has
posterior variance θ(1 – θ)/(n + 129). When θ=1/2, the
Credible Interval is approximately $\bar{y} \pm 1/\sqrt{n + 129}$. Under this
scenario, the parameter estimate $\bar{y}$ has a 95% Credible
Interval “tighter” than a 95% Confidence Interval. Here
we have gained by using our knowledge base, i.e., the
true value is near 50%. Bayesian analysis allows us to
construct the 95% credible interval even in the case of a
non-probability sample under the ignorability assumption.
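The effect of the informed prior can be made concrete with the numbers given above. The sketch below checks that two prior standard deviations of Beta(64, 64) is about 8.8 points, and compares the posterior half-width with the one obtained under a uniform prior, in both cases assuming y = n/2. The function names and the sample size are hypothetical.

```python
import math


def beta_sd(a: float, b: float) -> float:
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))


def posterior_half_width(n: int, a: float, b: float, z: float = 1.96) -> float:
    """Normal-approximation 95% half-width of the posterior when y = n/2."""
    return z * beta_sd(n / 2.0 + a, n / 2.0 + b)


n = 1000  # hypothetical sample size
print(f"two prior sd for Beta(64, 64): {2.0 * beta_sd(64, 64):.3f}")
print(f"half-width, uniform prior:     {posterior_half_width(n, 1, 1):.4f}")
print(f"half-width, Beta(64, 64):      {posterior_half_width(n, 64, 64):.4f}")
```

The informed prior acts like roughly 128 additional observations centred on 50%, which is why the interval tightens; the gain is only as trustworthy as the prior belief itself.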
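10 The 95% Confidence Interval would be $\bar{y} \pm 1.96\sqrt{\bar{y}(1-\bar{y})/n}$, which is approximately $\bar{y} \pm 0.98/\sqrt{n}$ when $\bar{y} = 1/2$.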
Non-Ignorable Design
Discussion
Ignorable designs assume that the mechanism that
causes elements to be missing from the sample
operates at random, often
referred to as missing at random or missing completely
at random. Basically, the likelihood function conditioned
on the parameter θ is independent of the missing data
mechanism. Since the process for updating the prior
distribution for θ uses this conditional relationship
between the data and the parameter θ, a design that
is non-ignorable has data generated by a parameter
other than θ, say θ1. In other words, the process may
naively update θ without accounting for the difference
between θ and θ1. Under our framework, this is model
misspecification. Non-ignorable missing data requires
additional analysis to decompose θ across the population
segments into a set of θ = {θa , θb , … θz}. The prior
under this usually assumes a multinomial distribution.
This is called a non-ignorable adjustment cell model11.
Under this model, adjustment cells are constructed
and updates occur across cells. It will require analyses
of existing data sources to identify cells where the
θ parameters may be different. The purpose of the
approach is to create cells where ignorability exists,
and then we can rely on this model to construct our
credibility interval.
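A much simplified picture of the adjustment-cell idea is sketched below: each cell gets its own θ with an independent Beta prior, is updated from that cell's data, and the cell posteriors are combined using population shares. This is an illustrative simplification, not the model of Little and Rubin or the Ipsos procedure; the independent Beta priors, the cell labels, the counts, and the shares are all assumptions.

```python
import math
from typing import Dict, Tuple


def cellwise_posterior(
    cells: Dict[str, Tuple[int, int]],
    shares: Dict[str, float],
    prior: Tuple[float, float] = (1.0, 1.0),
) -> Tuple[float, float]:
    """Posterior mean and sd of the share-weighted sum of per-cell proportions.

    Each cell's theta gets an independent Beta prior and is updated with that
    cell's (yes_count, respondents); independence across cells is assumed.
    """
    a0, b0 = prior
    mean, variance = 0.0, 0.0
    for label, (yes, n) in cells.items():
        a, b = yes + a0, n - yes + b0
        weight = shares[label]
        mean += weight * a / (a + b)
        variance += weight ** 2 * a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(variance)


# Hypothetical cells within which ignorability is assumed to hold.
cell_data = {"cell_a": (420, 700), "cell_b": (90, 300)}
cell_shares = {"cell_a": 0.4, "cell_b": 0.6}

post_mean, post_sd = cellwise_posterior(cell_data, cell_shares)
print(f"estimate {post_mean:.3f}, 95% interval +/- {1.96 * post_sd:.3f}")
```

Cells with few respondents but large population shares dominate the combined variance, which is one reason the choice of cells matters.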
How different is the Bayesian approach and its
corrections from classical concerns and approaches?
Simplistically, not very. Ipsos is using the Bayesian
approach to frame the problem in such a way to allow
us to use the wealth of information available outside
of the current survey to calibrate survey results and
provide statistical foundation for inference even when
a probabilistic sample is unavailable. The missing data
framework found in the work of Little and Rubin provides
a means of dealing with non-probability samples.
When examining conditions for ignorable designs, model
misspecification is very similar to biases that arise from
poor construction of a sample frame. If we consider opt-in
panels as a sample frame construction issue, the
question is whether or not a random sample from the
panel is unbiased and, if not, what the level of bias is.
We have discussed how the Bayesian approach
incorporates potential bias from the sample frame
construction within the estimation (θ versus θ1), allows
us to incorporate data from even non-probability samples
(Bayesian updating under conditionally ignorable
designs), and provides a probabilistic assessment of
uncertainty under these circumstances (posterior
distribution and its credible intervals).
To paraphrase Elliott and Little12, one criticism of “their”
work is that it is explicitly Bayesian. Hence it incorporates
subjective elements through the choices of model and
prior. Every method in modern survey sampling requires
subjective assumptions, even when data are left unadjusted
or a method is rejected. “The Bayesian framework makes
these assumptions explicit and open to debate, rather than
implicit in the estimation algorithm.” We hope this paper
accomplishes this – opens the discussion and use of data
from the blended online samples together with the large
quantities of information available outside of the study.
Here it is important to note that models to adjust within-cell
discrepancies can be employed either at the pre-survey
design stage or post-survey weighting adjustment
stage. For the most part, we have focused our attention
on the latter solution (see “Our Brave New World: Blended
Online Samples and the Performance of Non-Probability
Approaches” for further detail on such an approach).
11 See Little and Rubin, Statistical Analysis with Missing Data, New York, NY: Wiley, (1987).
12 Elliott, M.R. and Little, R.J.A., “A Bayesian Approach to Combining Information from a Census, a Coverage Measurement Survey and Demographic Analysis,”
Journal of the American Statistical Association, (2000), 95, 450, 351-362.
About Ipsos Public Affairs
Ipsos Public Affairs is a non-partisan, objective, survey-based
research practice made up of seasoned
professionals. We conduct strategic research initiatives
for a diverse number of American and international
organizations, based not only on public opinion
research, but elite stakeholder, corporate, and media
opinion research.
Ipsos has media partnerships with the most prestigious
news organizations around the world. In Canada, the U.S.,
UK, and internationally, Ipsos Public Affairs is the media
polling supplier to Reuters News, the world’s leading source
of intelligent information for businesses and professionals.
Ipsos Public Affairs is a member of the Ipsos Group,
a leading global survey-based market research
company. We provide boutique-style customer service
and work closely with our clients, while also undertaking
global research.
Contact
For more information, please contact:
Alan Roshwalb
Senior Vice President, US, Ipsos Public Affairs
202.420.2029
[email protected]
Clifford Young
President, US, Ipsos Public Affairs
202.420.2016
[email protected]
To learn more, visit: www.ipsos-na.com
Copyright© 2016 Ipsos Public Affairs. All rights reserved.
16-03-09