Name ____________________ Maths teacher _____
YEAR 12 Statistics 2013
Evaluate a statistically based report
AS91266
2.11 Level 2 Internal
2 credits
Statistics – following the PPDAC cycle (the statistical enquiry cycle)

Problem – This is the stage where you define the question you want to answer. Ask yourself:
• How do we go about answering this question?
• What do we need to know?
• How will we find the information that we need?
• What will we do with the information that we collect?
• Who will find this information useful?
• Is this information relevant to the problem?
• How would you answer the question now, before you gather the data?

Plan – This stage is about how you will gather the data. Ask yourself:
• How will we gather this data?
• What data will we gather?
• What measurement system will we use?
• How are we going to record this information?

Data – This stage is concerned with how the data is collected, managed and organised. Ask yourself:
• How shall we record the data (in a table)?

Analysis – This stage is about exploring the data, calculating statistics, drawing graphs and interpreting them in terms of the question posed. This is the "I notice, I wonder" stage. Ask yourself:
• What is a typical value?
• Where are most of the values?
• What sort of graph would display the data best?
• What sort of scale shall I use for the axes?
Use statistical language, e.g. spread, skew, mean, mode, variation.

Conclusion – This stage is about answering the question in the problem section and providing reasons based on your analysis.
Understanding and using statistical language
We need to use statistical language to describe data and to communicate precise information.
A population can be any group of individuals or measurements that we are interested in finding out
about.
A census is a survey of the whole population, asking every individual the same questions.
A sample is part of the population that we can measure to find out about the population, using an
inference.
Parameters are measures which describe a population, such as mean, median, inter-quartile range.
Statistics are measures which describe a sample, such as mean, median, inter-quartile range.
An inference is an estimate for a population parameter, based on a sample statistic. A sample can give
useful information about a typical member of the population and about the shape of the population
distribution, but not about the largest or smallest individuals in the population.
A numerical data set has a distribution which describes how the data varies among the sample or
population.
A distribution can be described by:
• a measure of the centre or typical value, such as mean, median or mode
• a measure of the variation or spread, such as IQR or standard deviation
• a measure of the shape, such as symmetry or skew
• a description of unusual features of the data, such as outliers.
A distribution is best displayed by a graph such as a dot plot, histogram or bar graph.
Measures of Central Tendency
A single number that can represent the typical value of a set of numbers.
The Median is the MIDDLE NUMBER (or
average of two middle numbers) of a group of numbers
listed in order from smallest to largest.
The Mean is the sum of a set of
numbers divided by how many
numbers are in the set.
The Mode is the number
that occurs the most often in
a set of data.
Mean
  Advantages: easy to calculate; every value is taken into account.
  Disadvantages: influenced by outliers.
Median
  Advantages: typical of middle data values; not affected by outliers or incorrect values.
  Disadvantages: hard to calculate manually.
Mode
  Advantages: for non-numeric data it is the only average you can use.
  Disadvantages: may not exist; tells us nothing about the remaining data.
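To see these advantages and disadvantages in action, the short sketch below computes all three averages with Python's standard `statistics` module. The nine scores are hypothetical (not from this booklet); the second list is the same data with the 20 mistyped as 200. Only the mean moves, which is the "influenced by outliers" disadvantage in the table above.

```python
from statistics import mean, median, multimode

# Hypothetical data (not from the booklet): nine test scores
scores = [12, 14, 14, 15, 16, 17, 18, 19, 20]
# The same scores with the 20 mistyped as 200 (an incorrect value / outlier)
with_error = scores[:-1] + [200]


def averages(data):
    """Return the mean, median and mode(s) of a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),
        # multimode lists every most-frequent value; if no value repeats,
        # it returns all the values, i.e. there is no single mode
        "modes": multimode(data),
    }


print(averages(scores))
print(averages(with_error))
```

The median stays at 16 and the mode at 14 in both cases, while the mean rises from about 16.1 to about 36.1.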
An important idea
Consider the scores for five contestants, Adeline, Belinda, Carey, Dorothy and Elika, in a trampoline competition. There are six judges, and each judge gives each contestant a score from one to six. The results for each contestant are shown in the dot plots below.
[Dot plots: trampoline results for contestants A, B, C, D and E, each on a score axis from 0 to 7]
What do you notice about the distributions of the scores?
They all have the same mean, median and range, but the actual distributions are all different.
We need a way of distinguishing between the different spreads of data in each distribution. The range does not distinguish between them, so we use the standard deviation. Use your calculator in stats mode to calculate the mean and standard deviation for each set of data. Use the x̄ and xσn functions.

Contestant   Mean   SD
A            3.5    2.1
B            3.5    2.5
C            3.5    1.5
D            3.5    2.2
E            3.5    1.8

How does the standard deviation vary with the spread of the data?
Small SD means little spread (most values close to the mean).
Large SD means a lot of spread (many values far from the mean).
Note that a uniform distribution has a middling spread.
The standard deviation is a measure of the spread of a set of data. It is a rough measure of the average distance from the mean.
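The contestants' actual dot plot data did not survive, so the sketch below uses three hypothetical contestants (P, Q and R, made up for illustration) whose six scores all have mean 3.5, median 3.5 and range 5. Python's `statistics.pstdev` gives the population standard deviation, which is what the calculator's xσn key reports.

```python
from statistics import mean, median, pstdev

# Hypothetical judges' scores (1 to 6) for three made-up contestants.
# These are NOT the booklet's data: the original dot plots were lost.
contestants = {
    "P": [1, 1, 1, 6, 6, 6],  # scores pushed out to the extremes
    "Q": [1, 2, 3, 4, 5, 6],  # uniform: one of each score
    "R": [1, 3, 3, 4, 4, 6],  # scores bunched near the middle
}

summary = {}
for name, scores in contestants.items():
    summary[name] = {
        "mean": mean(scores),
        "median": median(scores),
        "range": max(scores) - min(scores),
        # pstdev is the population standard deviation, like the xσn key
        "sd": pstdev(scores),
    }
    print(name, summary[name])
```

Mean, median and range cannot tell the three apart, but the standard deviations (2.5, about 1.71 and 1.5) can, and the uniform set Q sits in the middle.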
Measures of Spread
Measures of spread describe the variation in a set of numbers.
Low variation means most values are close to the centre.
High variation means lots of values are far from the centre.
Range is the difference between the upper and lower extremes (the difference between the maximum and minimum values).
Interquartile range is the difference between the upper and lower quartiles, so it measures the spread of the middle 50% of your data.
Standard deviation is a measure of spread. A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values. The standard deviation is affected by outliers and skewed data.
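The three measures of spread can be compared on a small hypothetical data set, first as it is and then with its largest value replaced by an outlier. This is a sketch under one stated assumption: quartiles are found as the medians of the lower and upper halves, a common classroom convention; calculators and software may use slightly different rules.

```python
from statistics import median, pstdev


def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves (needs 2+ values).

    When n is odd, the middle value is left out of both halves.
    This is one common convention; calculators and software may differ.
    """
    values = sorted(data)
    half = len(values) // 2
    return median(values[:half]), median(values[-half:])


def spread_summary(data):
    q1, q3 = quartiles(data)
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "sd": pstdev(data),  # population SD, as on the calculator's xσn key
    }


# Hypothetical data, then the same data with the 13 replaced by an outlier of 30
data = [2, 4, 4, 5, 7, 8, 9, 12, 13]
with_outlier = data[:-1] + [30]
print(spread_summary(data))
print(spread_summary(with_outlier))
```

The range jumps from 11 to 28 and the standard deviation also increases, but the IQR stays at 6.5 because the outlier is outside the middle 50% of the data.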
Estimating the Mean and Standard Deviation
[Dot plot: heights of year 9 students, height axis from 80 to 220]
Mean: 156.7 cm
Standard deviation: 14.6 cm

These data are about the length, weight and age of bears. The measurements were recorded in America.
[Dot plots: Bears, showing length (in), weight (pounds) and age (months)]
Length: mean 61.3 in, standard deviation 9.4 in
Weight: mean 192 pounds, standard deviation 110 pounds
Age: mean 43 months, standard deviation 34 months

This dot plot shows the number of words remembered in Kim’s Game by a year 9 class and a year 12 class.
[Dot plot: words remembered in Kim’s Game, year 9 and year 12 classes]
Year 12: mean 13.1, SD 2.4
Year 9: mean 9.0, SD 2.8

This dot plot shows the time taken to write a sentence with the dominant hand and the non-dominant hand by a class of year 12 students.
[Dot plot: writing time (secs), dominant hand and non-dominant hand, axis from 0 to 40]
Dominant hand: mean 13.6 sec, SD 1.4 sec
Non-dominant hand: mean 34.9 sec, SD 4.6 sec
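A dot plot is a frequency table in picture form: each value on the axis and the number of dots stacked above it. The sketch below computes the mean and population SD (σn) directly from such a table, which is what a calculator does when you enter values with frequencies in stats mode. The counts are hypothetical, not read from the Kim’s Game plot.

```python
from math import sqrt


def mean_sd_from_counts(counts):
    """Mean and population SD (σn) from a {value: frequency} table."""
    n = sum(counts.values())
    centre = sum(value * freq for value, freq in counts.items()) / n
    # Each squared distance from the mean is counted once per dot
    variance = sum(freq * (value - centre) ** 2 for value, freq in counts.items()) / n
    return centre, sqrt(variance)


# Hypothetical dot plot: number of words remembered -> number of students
words = {8: 2, 10: 3, 12: 5, 14: 3, 16: 2}
m, sd = mean_sd_from_counts(words)
print(round(m, 1), round(sd, 1))
```

For this symmetric made-up plot the mean is 12.0 and the SD is about 2.4. A useful check on an estimate is that most of the dots should lie within one SD of the mean.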
SAMPLING
Statistics involves the collection and analysis of data to find answers to complex problems. When possible, information
may be collected from the entire group under investigation. This is called a census.
Sampling involves surveying some of the group of interest. A sample survey is carried out to collect information
(data) when it is impractical, too expensive or unnecessary to carry out a census.
Situations involving sampling:
• Polls to establish public opinion on politics, current events or business
• Market research on goods and services
• Scientific experiments or tests, e.g. taking a sample of blood
• Radio and TV ratings, which are based on the listening and viewing habits of a sample of people. These determine advertising rates and the money available for programming.
• Government agencies collecting information to predict the needs of schools, hospitals and social services
Reasons for sampling
• Economic: it is too expensive to survey the entire population.
• Time: collecting data takes time.
• Availability of information: the entire population may not be accessible.
• The nature of the method of data collection: e.g. finding the number of hours a light bulb lasts uses up the bulb, so testing every bulb would leave none to sell.
The target population
The population under investigation is called the target population. A population is not necessarily people; it could be ice creams, light bulbs, the time taken to send a text, etc.
The Sampling Frame
In order to make a selection from the target population a list called the sampling frame is used. Some examples of
possible sampling frames are a school roll, the electoral roll for a district, a map showing the houses in an area, a
farmer’s database of her stock. Sometimes the sampling frame may not exactly coincide with the target population, for
example using the telephone directory to survey adults in Auckland, which misses adults who are not listed.
Bias
To be effective a sample should be representative of the target population so that correct conclusions may be
reached about the population as a whole. The characteristics of the sample should be the same as the characteristics of
the target population.
If the sample does not accurately represent the target population, the survey method is said to be biased.
Size of the sample
The larger the sample, the more likely your sample statistic is to be a good estimate of the population parameter.
However, for estimating the mean or median of populations up to 1000, a sample of about 30 is usually big enough to
give a reasonable estimate.
Making an inference
For each sampling method, once the sample units have been selected the data value for each is recorded. Statistics
can be calculated from the sample such as the mean, median, range, standard deviation. These statistics can then be
used to make an estimate of the parameter for the target population.
Making an estimate in this way is called making an inference about the population.
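The sketch below makes an inference in miniature. The population of 1000 heights is made up (generated with a fixed seed so the run is repeatable), a simple random sample of 30 is drawn, and the sample statistics are used as estimates. It uses `statistics.stdev`, the sample standard deviation with an n − 1 divisor, which is the usual choice when a sample is used to estimate the population SD.

```python
import random
from statistics import mean, median, stdev

rng = random.Random(2013)  # fixed seed (arbitrary) so the sketch is repeatable

# Hypothetical target population: 1000 heights in cm (made up, roughly normal)
population = [rng.gauss(157, 15) for _ in range(1000)]

# Simple random sample of 30 units, drawn without replacement
sample = rng.sample(population, 30)

print("sample mean    ", round(mean(sample), 1))
print("sample median  ", round(median(sample), 1))
print("sample SD      ", round(stdev(sample), 1))
# In a real survey the population values are unknown; here we can compare
print("population mean", round(mean(population), 1))
```

The sample mean will usually be close to, but not exactly equal to, the population mean. That gap is the sampling error described below.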
Sampling variation
The variation in a sample statistic from sample to sample due to the variability in the population, the sample
size and random variation.
Suppose a sample is taken and a sample statistic, such as a sample mean, is calculated. If a second sample of
the same size is taken from the same population, it is almost certain that the sample mean calculated from
this sample will be different from that calculated from the first sample. If further sample means are
calculated, by repeatedly taking samples of the same size from the same population, then the differences in
these sample means illustrate sampling variation.
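Sampling variation can be simulated by repeatedly drawing samples of the same size from one made-up population and recording each sample mean. The sketch below does this 500 times for each of three sample sizes. The population and the seed are assumptions for illustration only.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1000 values (made up for illustration)
population = [rng.gauss(50, 10) for _ in range(1000)]


def sample_means(size, repeats=500):
    """Means of `repeats` random samples, each of the given size."""
    return [mean(rng.sample(population, size)) for _ in range(repeats)]


# Spread of the sample means for three sample sizes
spread = {}
for size in (10, 30, 100):
    means = sample_means(size)
    spread[size] = pstdev(means)
    print(f"n={size}: sample means from {min(means):.1f} to {max(means):.1f}, SD {spread[size]:.2f}")
```

The sample means vary from sample to sample, and their spread shrinks as the sample size grows, which is why larger samples tend to give better estimates of a population parameter.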
Sampling error
The error (in an estimate of a population parameter, based on a sample statistic) caused because data are
collected from part of a population rather than the whole population (even if the sample is unbiased).
Sampling error occurs because of sampling variation. Even if identical sampling methods are used, two samples are likely to give different estimates of the population mean, percentage or standard deviation, and these estimates are also likely to differ from the true population mean, percentage or standard deviation. If the sampling method is valid and reliable, the sample will represent the population, but there is likely to be some sampling error in each estimate of a population parameter. If the sample is very small (fewer than 30 for an estimate of the median or mean, or fewer than 250 for an estimate of a proportion), then sampling error may cause the sample to be biased and unrepresentative of the population. If the sample is large, it may be biased for other reasons (non-sampling error), but it is unlikely to be biased due to sampling error.
An estimate of a population parameter, such as a sample mean or sample proportion, is different for
different samples (of the same size) taken from the population. Sampling error is due to sampling variation
and is one reason for the difference between an estimate and the true, but unknown, value of the population
parameter. The other reason is non-sampling error.
Non-sampling error
The error (in an estimate of a population parameter, based on a sample statistic) caused by human error in designing, carrying out or contributing to the survey. Non-sampling errors have the potential to cause bias in estimates based on surveys or samples.
To minimise non-sampling error:
• Use a sampling frame which is representative of the population.
• Use a sampling method (random or systematic) which is likely to give a representative sample.
• Make sure the survey questions are clear, unbiased and easy to answer.
Some sources of non-sampling error are more difficult to control:
• People who choose not to answer (non-response)
• People who don’t tell the truth
There are many types of non-sampling errors, and the names used for them are not consistent. Some examples of non-sampling errors causing bias in the sample are:
• The sampling frame or sampling process is such that a specific group is excluded or under-represented in the sample, deliberately or inadvertently. If the excluded or under-represented group is different with respect to survey issues, then bias will occur.
• The sampling process allows individuals to select themselves. Individuals with strong opinions or those with substantial knowledge will tend to be over-represented, creating bias.
• Bias will occur if people who refuse to answer have different views of the survey issues from those who respond. This can also happen with people who are never contacted and people who have yet to make up their minds.
• If the response rate (the proportion of the sample that takes part in a survey) is low, bias can occur because respondents may tend consistently to have views that are more extreme than those of the population in general.
• The wording of questions, the order in which they are asked, and the number and type of options offered can influence survey results.
• Answers given by respondents do not always reflect their true beliefs, because they may feel under social pressure not to give an unpopular or socially undesirable answer.
• Answers given by respondents may be influenced by the desire to impress an interviewer.
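Non-response bias can be simulated to show why it is a non-sampling error: making the sample bigger does not remove it. In the sketch below, 30% of a made-up population hold some view, the sample is drawn at random, and the response probabilities (0.6 for people holding the view, 0.2 for everyone else) are assumptions chosen to model "people with strong opinions respond more".

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10 000: the first 3000 (30%) hold the view (True)
population = [i < 3000 for i in range(10000)]

# Assumed response behaviour (made-up probabilities)
RESPONSE_RATE = {True: 0.6, False: 0.2}


def survey(sample_size):
    """Proportion holding the view among those who actually respond."""
    contacted = rng.sample(population, sample_size)  # a genuinely random sample
    respondents = [p for p in contacted if rng.random() < RESPONSE_RATE[p]]
    if not respondents:
        return None  # nobody answered
    return sum(respondents) / len(respondents)


for size in (100, 1000, 5000):
    print(size, survey(size))
```

Every estimate lands well above the true 30%, around 0.18 / (0.18 + 0.14) ≈ 0.56 on average, however large the sample is.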
Sampling Strategies
Statisticians have developed various methods for taking a sample. To produce a survey that is free from
bias the method should ensure that the sampling frame is representative of the target population, and that
every unit in the sampling frame has an equal chance of being included in the sample.
1. Simple Random Sampling
This is the mathematical equivalent of drawing the names out of a hat, or sticking a pin randomly in a
list of names. It involves
• Allocating a number to every unit in the sampling frame
• Generating random numbers
• Matching the random numbers generated to the units
• Recording what you are interested in about each unit
Notes
1. If the same random number comes up more than once it should be disregarded and another random
number generated to replace it.
2. Although the simple random sampling method is free from bias, the sample may turn out not to be
representative of the population.
3. A disadvantage of this method is that it is time consuming or impossible to carry out with large
populations.
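The steps of simple random sampling can be sketched with Python's `random` module. The class roll below is hypothetical. `random.Random.sample` never picks the same number twice, which has the same effect as Note 1 (discarding and replacing a repeated random number).

```python
import random

# Hypothetical sampling frame: a class roll (names are made up)
frame = ["Aroha", "Ben", "Chen", "Dev", "Ella", "Finn", "Grace", "Hemi",
         "Isla", "Jack", "Kiri", "Liam"]


def simple_random_sample(frame, n, seed=None):
    rng = random.Random(seed)
    # Step 1: allocate a number (0, 1, 2, ...) to every unit in the frame
    numbered = dict(enumerate(frame))
    # Steps 2 and 3: generate n different random numbers and match them to units
    chosen = rng.sample(range(len(frame)), n)
    return [numbered[k] for k in chosen]


print(simple_random_sample(frame, 4, seed=3))
```

Passing a seed makes the selection repeatable, which is useful if someone needs to check how the sample was chosen. Leave it out for a fresh random selection.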
2. Systematic Sampling
This is the mathematical equivalent of selecting every 10th person on a list. It involves
• Using a random number to find a starting point on the sampling frame
• Dividing the population size by the sample size (and rounding) to find the counting number, i.e. how many units to count along to select each next unit
Notes
1. This method has the advantage of being quicker than simple random sampling.
2. If the target population has recurring patterns in it the sample may not be representative.
3. If the counting number is too large the method becomes awkward.
4. If the counting number is too small the sample may not be representative of the population.
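A minimal sketch of systematic sampling follows. One assumption differs slightly from the steps above: the counting number k is rounded down (rather than to the nearest whole number), which guarantees there are always at least n units available after the random start. The numbered frame is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select n units: a random start, then every k-th unit after it."""
    rng = random.Random(seed)
    # Counting number k: population size divided by sample size, rounded down
    k = len(frame) // n
    start = rng.randrange(k)  # random starting point among the first k units
    return frame[start::k][:n]


frame = list(range(1, 101))  # hypothetical sampling frame: units numbered 1 to 100
print(systematic_sample(frame, 10, seed=5))
```

With 100 units and a sample of 10, k is 10, so the sketch picks one of units 1 to 10 at random and then every 10th unit after it. If the frame had a pattern repeating every 10 units, every selected unit would fall at the same point in the pattern (Note 2).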
3. Stratified Sampling
This method is only used when you have evidence that there are subgroups in the population which you
expect to have different parameters. It involves
• Splitting the population into layers or strata. Each unit in the population is allocated to one layer, e.g. male/female. The number to be selected from each layer is calculated to be in the same proportion as the number in each layer in the population. For example, from a group of 60 men and 20 women, for a sample of size 8 you would select 6 men and 2 women.
• Taking a simple random sample or systematic sample from each layer in the usual way.
Notes
1. This method guarantees that each stratum is represented.
2. The success of the method depends on the choice of strata.
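The proportional calculation and the random selection within each stratum can be sketched as below, reproducing the booklet's 60 men / 20 women example. The unit labels (M1 to M60, W1 to W20) are hypothetical, and rounding is an assumption of this sketch: with awkward numbers the rounded counts may not add up exactly to the sample size.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """How many units to take from each stratum, in proportion to its size.

    Rounding can make the total differ from sample_size by one or two;
    check the total and adjust by hand if it does.
    """
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total) for name, size in strata_sizes.items()}


def stratified_sample(strata, sample_size, seed=None):
    """Simple random sample from each stratum, using proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], k) for name, k in allocation.items()}


# The booklet's example: 60 men and 20 women, sample of size 8
# (the unit labels are hypothetical)
strata = {
    "men": [f"M{i}" for i in range(1, 61)],
    "women": [f"W{i}" for i in range(1, 21)],
}
print(proportional_allocation({"men": 60, "women": 20}, 8))  # {'men': 6, 'women': 2}
print(stratified_sample(strata, 8, seed=4))
```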
4. Cluster sampling
This method chooses a part or parts of the population which are believed to be representative and samples only from that part, using one of the methods above. For example, surveying people in Ellerslie, as it has an ethnic breakdown which is representative of Auckland as a whole.
Notes
1. It is easier and cheaper than sampling from the whole population.
2. The success of the method depends on the choice of the cluster(s).
5. Non-probability sampling
These methods include convenience sampling (e.g. person-on-the-street surveys) and quota sampling (a form of convenience sampling in which target numbers are set for certain groups to ensure representation, such as sampling equal numbers of men and women).
Notes
1. This method assumes that the people encountered in the convenience sample are representative of the population, which may or may not be a valid assumption.
2. If the sample is not representative of the population, you can’t make a valid inference about the population.
6. Self-selected sampling
An extreme form of non-probability sampling is self-selected sampling (e.g. text in your vote).
Notes
1. With self-selected sampling it is usual to get responses only from those people who feel strongly about the issue in question.
2. There is a very high chance that a sample obtained with self-selected sampling will not be representative of the target population.
3. It is highly likely that an inference made from the sample would not be valid for the target population.
Summary of advantages and disadvantages of different sampling methods
The advantages only apply when the sampling frame is representative of the population.

Simple random sampling
  Advantage: Usually representative.
  Disadvantage: May be time consuming and expensive to organise for a large population.
Systematic sampling
  Advantage: Quicker and easier to organise than a simple random sample, but still likely to be representative.
  Disadvantage: Cannot be used when there may be cyclic patterns in the data.
Stratified sampling
  Advantage: Ensures each identified stratum of the population is represented. Comparisons can be made between strata.
  Disadvantage: Requires prior knowledge of the population.
Cluster sampling
  Advantage: Usually representative; may be less expensive than simple random sampling.
  Disadvantage: Relies on the clusters selected being representative of the population.
Convenience sampling
  Advantage: The sampling units are chosen because they are easy to access.
  Disadvantage: Unlikely to be representative.
Self-selected sampling
  Advantage: This relies on people volunteering to take part in the research.
  Disadvantage: Unlikely to be representative.
Survey Methods
Questionnaires and other surveys can be completed in a face-to-face or telephone interview, or self-administered on paper or on the internet. When choosing a survey method, you need to consider who your target group is, the best way to reach them, the cost, and the time available. Even the best-designed survey will have some non-response, which may bias the results. There are advantages and disadvantages for each method.

Written, self-administered
  Advantages: cost is relatively low; geographic distribution can be wide; sensitivity issues handled well.
  Disadvantages: response rate may be poor and biased towards the more educated and those with an interest in the topic; no knowledge about non-response; long time between data collection and analysis.
Internet, self-administered
  Advantages: low cost (no paper, no data entry costs, no postage); data collection is quick; geographic distribution may be wide; questionnaires may be complex because the skips are programmed in; pop-up instructions, videos, voice-overs and animation are available to make it more fun and dynamic.
  Disadvantages: bias against those without internet access; self-selection bias; non-response bias.
Telephone interview
  Example: call centre selling. Numbers are selected randomly from a phone book or by generating random numbers.
  Advantages: cost is relatively low; geographic distribution can be wide; sensitivity issues handled well.
  Disadvantages: takes a long time; response rate may be poor and biased towards the more educated and those with an interest in the topic; no knowledge about non-response; long time between data collection and analysis.
Face-to-face interview
  Advantages: good control of question order; good quality of responses; appropriate for some sensitive issues.
  Disadvantages: cost is high; data collection period is long; geographic distribution must be clustered.
Questionnaire design must ensure that the questions asked:
• are easy to understand
• are not leading questions (designed to get one particular answer)
• allow for all possible responses.
A well-designed questionnaire will enable useful information to be collected.
A poorly designed questionnaire will result in non-sampling errors due to non-response, biased data and incorrect responses.
More big ideas in statistics:
Reliability describes the repeatability and consistency of a test or sample.
A sampling process is
reliable if it gives a similar distribution each time it is repeated.
Example:
RELIABILITY AND STATISTICS
Physical scientists expect to obtain exactly the same results every single time, due to the relative predictability of the
physical realms. If you are a nuclear physicist or an inorganic chemist, repeat experiments should give exactly the
same results, time after time.
Ecologists and social scientists, on the other hand, understand fully that achieving exactly the same results is
an exercise in futility. Research in these disciplines incorporates random factors and natural fluctuations and,
whilst any experimental design must attempt to eliminate confounding variables and natural variations, there
will always be some disparities.
The key to performing a good experiment is to make sure that your results are as reliable as is possible; if
anybody repeats the experiment, powerful statistical tests will be able to compare the results and the scientist
can make a solid estimate of statistical reliability.
Read more: http://www.experiment-resources.com/definition-of-reliability.html#ixzz1fosgGb00
Validity defines the strength of the final results and whether they can be regarded as accurately
describing the real world. A sampling process is valid if it is unbiased and is likely to give a sample that is
representative of the population the sample comes from.
Example:
Comparing RELIABILITY and VALIDITY
Reliability and validity are often confused, but the terms actually describe two completely different concepts, although
they are often closely inter-related. This distinct difference is best summed up with an example:
Example: A researcher devises a new test that measures IQ more quickly than the standard IQ test:
• If the new test delivers scores for a candidate of 87, 65, 143 and 102, then the test is not reliable or valid, and it is fatally flawed.
• If the test consistently delivers a score of 100 when checked, but the candidate’s real IQ is 120, then the test is reliable, but not valid.
• If the researcher’s test delivers a consistent score of 118, then that is pretty close, and the test can be considered both valid and reliable.
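The IQ example can be expressed as two separate checks: the spread of repeated scores (reliability) and how far their average is from the true value (validity). This is a sketch under stated assumptions: the candidate's true IQ is taken as 120 for all three scenarios, "consistent" is modelled as the same score four times, and the cut-offs of 5 points are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev

TRUE_IQ = 120  # the candidate's real IQ, taken from the second scenario

# Hypothetical cut-offs, chosen only for illustration
MAX_SD = 5    # repeated scores this consistent count as reliable
MAX_BIAS = 5  # an average this close to the true IQ counts as accurate


def judge(scores):
    """Return (reliable, valid) for a list of repeated scores from one test."""
    reliable = pstdev(scores) <= MAX_SD
    # An unreliable test is not valid: no single score from it can be trusted
    valid = reliable and abs(mean(scores) - TRUE_IQ) <= MAX_BIAS
    return reliable, valid


tests = {
    "scenario 1": [87, 65, 143, 102],
    "scenario 2": [100, 100, 100, 100],  # "consistently 100" (assumed 4 repeats)
    "scenario 3": [118, 118, 118, 118],  # "consistent score of 118"
}
for name, scores in tests.items():
    print(name, judge(scores))
```

The three results are (False, False), (True, False) and (True, True), matching the three conclusions above.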
Questions, questions, questions…
The problem is the big question an investigation is trying to answer, the purpose of an investigation. It is often
written as a question.
A survey question is a question asked in a survey or questionnaire in order to get information to help answer
the problem question.
A critical question (or worry question, or interrogative question) is a question asked by someone interpreting
data or reading a statistical report (see "Statistical literacy critical questions to ask about the article or report").
STATISTICAL LITERACY
Who needs statistical literacy?
Statistical literacy is needed by data consumers – anyone who tries to evaluate numerical information.
Statistical literacy is needed by journalists, policy analysts, decision makers and political, economic and social leaders, but most of all by the citizens of a modern democracy.
What should a statistically-literate person be able to do?
Statistical literates should be able to evaluate number-based claims in the media.
Consider these newspaper headlines:
• Soft Drinks Could Boost Pancreatic Cancer Risk.
• Absent Dads cause Earlier Puberty in Girls.
• Weddings boost mood
• Shooter video games can improve decision making
• New heart disease drug does not improve patient outcomes
A statistically-literate reader can tell that the first two claims will have much weaker support because
the outcome is not repeatable for a given person: you only get cancer once; you only go through puberty
once. Comparing different people weakens the argument.
They can tell that the last three claims have stronger support since the outcomes can be measured
before and after the event or condition in question. But only the last one can have fairly strong support.
The last study is the one in which the outcome is repeatable AND the subjects can be assigned, without knowing which, either to get the new drug or to get a placebo.
Statistical literacy is like speed reading or speed dating. You get more information faster.
Statistical literates can spot the difference between association and causation.
A statistically-literate reader knows that words like “kills”, “causes” and “blame” make “causal claims”
that are much more disputable than “association claims” involving words like “attributed to”,
“associated with”, “tied to”, “linked to” or “due to”.
A statistically-literate reader can spot obvious errors in news stories.
Consider these:
• Racial Imbalance Persists at Elite Public Schools, New York Times 11/08/2008. “at Stuyvesant...2% of blacks, 3% of Hispanics, 24% of whites and 72% of Asians were accepted.” The 100% total is suspicious. These are not parts of a pie so there is no reason for that total. The report should have said “among those accepted, 2% are blacks, 3% are Hispanics, 24% are whites and 72% are Asians.” Since these are parts of the same “pie”, they should total 100%.
• Study says too much candy could lead to prison, AP 9/30/2009. “Of the children who ate candies or chocolates daily at age 10, 69 percent were later arrested for a violent offense by the age of 34.” This is an incredible statistic. Do you believe eating candy daily can predict criminal behaviour 20 years in advance? No! The truth: “69% of respondents who were violent criminals by the age of 34 years reported that they ate confectionary nearly every day during childhood.” The AP reversed the order: “69% of daily candy-eating kids became violent criminals by 34” is very different from “69% of violent criminals by age 34 had been daily candy-eaters as kids”.
Statistically-literate readers look for weasel words: count words that imply much but assert
very little – words like many, some or few, lots or little, high or low, often or seldom.
• Many teens share prescription drugs.
• Some elderly get futile care.
• Adult video gamers often overweight, depressed.
• Older drivers in fewer crashes.
Statistically-literate readers look out for ideas that are vague.
Consider these headlines:
• High exposure to BPA linked to low sperm count. How low is low? By choosing a higher sperm count cutoff, the number of men with low sperm count is increased.
• Too much TV psychologically harms children. Exactly how much TV is too much? What did they consider harm? What did they consider psychological harm?
A statistically-literate person has an
idea of when a relationship is not
likely to be causal.
Consider the traffic fatality graph at right. As
the amount of lemons imported from Mexico
increased, the US traffic fatality rate decreased.
The lack of a plausible mechanism and the
small effect size (a 6% drop from 15.8 to 14.8)
all but invite alternate explanations.
A statistically-literate reader knows the difference between frequently and likely.
• Car most frequently stolen: Honda Civic
• Car most likely to be stolen: Cadillac Escalade
Frequently is a count. Honda Civics are common, so the number stolen is higher. Likely is a rate per car. Cadillac Escalades are less common, so fewer are stolen in total, but a higher proportion of them are stolen.
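The difference between a count and a rate is a single division. The figures below are made up for two unnamed car models (they are not real theft statistics); the point is only that "most frequently" and "most likely" can pick different models.

```python
# Hypothetical figures for two car models (made up; not real theft data)
cars = {
    "common model": {"stolen": 900, "registered": 300_000},
    "rare model": {"stolen": 60, "registered": 4_000},
}

# "Most frequently stolen" compares counts
most_frequent = max(cars, key=lambda m: cars[m]["stolen"])
# "Most likely to be stolen" compares rates: thefts per car on the road
most_likely = max(cars, key=lambda m: cars[m]["stolen"] / cars[m]["registered"])

for model, c in cars.items():
    per_1000 = 1000 * c["stolen"] / c["registered"]
    print(f"{model}: {c['stolen']} stolen, {per_1000:.1f} per 1000 cars")
print("most frequently stolen:", most_frequent)  # common model
print("most likely to be stolen:", most_likely)  # rare model
```

The common model has far more thefts (900 against 60), but the rare model's theft rate is 15.0 per 1000 cars against 3.0.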
A statistically-literate person knows the difference between real statistics and speculative statistics.
Which counts are real: deaths due to poisoning or drownings, or deaths due to obesity, radon or second-hand smoke? Answer: the former (coroner-certified); the latter are all speculative.
A statistically-literate person knows how to read statements involving rates and
percentages.
Do these statements say the same thing?
• Percentage of women who smoke vs. percentage of smokers who are women.
• Death rate of men vs. male rate of death.
In both cases, the answer is “No”.
How about these statements?
• Percentage of women who smoke vs. percentage of smokers among women.
Here the answer is “Yes.”
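The first pair of statements uses the same count (women who smoke) but divides it by different totals. The two-way table below is hypothetical, made up only to show the arithmetic.

```python
# Hypothetical two-way table of counts (made up): sex by smoking status
counts = {
    ("woman", "smoker"): 150, ("woman", "non-smoker"): 850,
    ("man", "smoker"): 300, ("man", "non-smoker"): 700,
}

women = counts[("woman", "smoker")] + counts[("woman", "non-smoker")]
smokers = counts[("woman", "smoker")] + counts[("man", "smoker")]

# Same numerator, different denominators
pct_women_who_smoke = 100 * counts[("woman", "smoker")] / women  # out of all women
pct_smokers_who_are_women = 100 * counts[("woman", "smoker")] / smokers  # out of all smokers

print(round(pct_women_who_smoke, 1))  # 15.0
print(round(pct_smokers_who_are_women, 1))  # 33.3
```

"Percentage of smokers among women" divides by the number of women, so it is the same calculation as "percentage of women who smoke".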
Statistical literacy critical questions to ask about the article or report
Purpose of the article or report and identification of the population of interest.
A description of measures and data representations used in the article or report and an evaluation of the
appropriateness of these to the purpose.
How was the data displayed in the article?
Are the displays or measures appropriate for the type of data?
Are the displays or measures misleading in any way?
What summary statistics were used in the article?
Do the comments match the graphs/displays given?
Were outliers or extreme values present in the data, and if so, how were they handled?
A description of the sampling or survey method, including reference to sample size when available, used
in the article or report and an evaluation of the appropriateness of these to the purpose.
Is the original data available?
What type of data is it, categorical or numerical?
How accurate is the data?
Did the data require cleaning?
Where is the data that was quoted/used in the article from?
What were the survey questions asked?
What was the data collection method?
Were the survey questions appropriate?
Could the survey questions be misinterpreted or not give the data needed?
What were/are the variables of interest?
How were the variables of interest measured?
An evaluation of the validity and bias of the information presented in the media report. This may involve
using relevant contextual knowledge. Consider how the author/s of the article or report collected the
information, and the assumptions that the author/s made.
Consider any bias present in the article or report. Bias is where the author may have a particular point of
view. A biased article or report may still be valid, even though it is one-sided.
It is not enough just to state that an article or report is (or is not) biased or valid - you must comment on why
it is biased or not biased, or valid or not valid. Put notes on your report about the bias and validity of your
sources of information.
Do the comments (descriptions) made in the article or report reflect accurately the data given?
Are any comments misleading or biased?
Could alternative analyses be made?
Could the data have been interpreted in another way?
What data/information is not present?
A summary of the results of investigation and an evaluation of the effectiveness of the article or report
in meeting the purpose. This may involve using relevant contextual knowledge.
What questions is the article or report answering (what is the investigative question(s))?
Who is the article intended to be about (who is the intended population)?
Who is the article or report aimed at (who might be interested in the outcomes)?
What is the purpose of the article or report?
What further information is needed?
Are there any underlying or lurking variables that may have an impact on the outcome?
Are claims made in the article or report valid and/or sensible?