Revisiting Sampling Concepts
Population
• A population is all the possible members of a
category
• Examples:
• The heights of every male or every female
• The temperature on every day since the beginning of time
• Every person who has ever taken, or will ever take, a particular drug
Sample
• A sample is some subset of a population
– Examples:
• The heights of 10 students picked at random
• The participants in a drug trial
• Researchers seek to select samples that
accurately reflect the broader population from
which they are drawn.
Samples are drawn to infer something about the population
[Diagram: a sample is drawn from the population; sample statistics are used, through inference, to estimate population parameters]
Reasons to Sample
• Ideally, a decision maker would like to consider every item in the population, but:
• Contacting the whole population would be time consuming, e.g. election polls
• The cost of such a study might be too high
• In many cases the whole population would be consumed if every part of it were tested
• The sample results are adequate
Probability Vs Non Probability Sampling
Probability Sampling
• Drawing samples in a random manner
• Using random numbers
• Writing names on identical cards or slips and then drawing randomly
• Choosing every nth item of the population (systematic sampling)
• First dividing the population into homogeneous groups and then drawing samples randomly from each group (stratified sampling)
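The probability sampling methods listed above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the population of ID numbers 1 to 100, the two strata, and the function names are all hypothetical.

```python
import random

def simple_random_sample(population, k, rng):
    """Draw k items without replacement; every subset of size k is equally likely."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random start in [0, step), then take every step-th item."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """Draw k_per_stratum items at random from each homogeneous group."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

rng = random.Random(42)            # fixed seed so the run is repeatable
population = list(range(1, 101))   # hypothetical population: IDs 1..100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}  # assumed grouping
stratified = stratified_sample(strata, 5, rng)

print("simple random:", srs)
print("systematic:   ", systematic)
print("stratified:   ", stratified)
```

In each case the selection is driven by a random number generator rather than by the researcher's judgment, which is what separates these methods from non probability sampling.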
Probability Vs Non Probability Sampling
Non Probability Sampling
• man-on-the-street interviews
• call-in surveys
• readership surveys
• web surveys
Types of Variables
• Qualitative (Categorical)
• Quantitative (Numerical)
– Discrete
– Continuous
Sampling Error
• “Sampling error is simply the difference between
the estimates obtained from the sample and the
true population value.”
Sampling Error = $\bar{X} - \mu$
Where
$\bar{X}$ = Mean of the Sample
$\mu$ = Mean of the Population
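As a rough illustration of the definition above, the sketch below builds a hypothetical population of 10,000 heights (the mean of 170 cm and spread of 10 cm are assumptions made only for this example), draws a random sample of 10, and computes the sampling error as the sample mean minus the population mean.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 heights in cm (values are made up)
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = mean(population)                # population mean

sample = rng.sample(population, 10)  # simple random sample of size 10
x_bar = mean(sample)                 # sample mean

sampling_error = x_bar - mu
print(f"population mean = {mu:.2f}")
print(f"sample mean     = {x_bar:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```

In practice the population mean is unknown, so the sampling error cannot be computed directly; it can only be characterized through the sampling distribution.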
Validity of Sampling Process
Sampling Distributions
• A distribution of all possible statistics calculated
from all possible samples of size n drawn from a
population is called a Sampling Distribution.
• Three things we want to know about any distribution:
– Central Tendency
– Dispersion
– Shape
Sampling Distribution of Means
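• Suppose a population consists of three numbers: 1, 2 and 3
• All the possible samples of size 2 are drawn from the population, with replacement
• Mean of the population: $\mu = (1 + 2 + 3)/3 = 2$
• Variance: $\sigma^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3} \approx 0.67$
• Standard deviation: $\sigma = \sqrt{2/3} \approx 0.82$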
[Figure: Distribution of the Population]
Sampling distribution of means (n = 2)
Sample #    Sample    Sample Mean
1           1,1       1
2           1,2       1.5
3           1,3       2
4           2,1       1.5
5           2,2       2
6           2,3       2.5
7           3,1       2
8           3,2       2.5
9           3,3       3
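Mean of the sampling distribution: $\mu_{\bar{X}} = 2 = \mu$
Standard deviation of the sampling distribution: $\sigma_{\bar{X}} \approx 0.6$
$\sigma_{\bar{X}} < \sigma$, since $0.6 < 0.8$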
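The three-number example can be reproduced by enumerating every ordered sample of size 2 drawn with replacement. The sketch below is one way to check the figures; the variable names are illustrative only.

```python
from itertools import product
from statistics import mean, pstdev

population = [1, 2, 3]
n = 2

# All ordered samples of size n drawn with replacement: (1,1), (1,2), ..., (3,3)
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation (divides by N)

mean_of_means = mean(sample_means)   # mean of the sampling distribution
sd_of_means = pstdev(sample_means)   # its standard deviation

print(f"mu = {mu}, sigma = {sigma:.2f}")                 # sigma is about 0.82
print(f"mean of sample means = {mean_of_means}")         # equals mu
print(f"SD of sample means   = {sd_of_means:.2f}")       # about 0.58
print(f"sigma / sqrt(n)      = {sigma / n ** 0.5:.2f}")  # same value
```

The last two lines agree, which matches the observation that the sample means are less dispersed than the population itself.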
• The population’s distribution has more variability than that of the sample means
• As the sample size increases, the dispersion of the sampling distribution decreases
• The mean of the sampling distribution of ALL the sample means is equal to the true population mean.
• The standard deviation of a sampling distribution, called the Standard Error, is calculated as $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
Central Limit Theorem
• The variability of a sample mean decreases as
the sample size increases
• If the population distribution is normal, so is the sampling distribution of the mean, for any sample size
• For any population with finite variance (regardless of its shape), the distribution of sample means will approach a normal distribution as n increases
• This can be demonstrated with the help of simulation.
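One possible simulation is sketched below. It assumes a strongly right-skewed population (exponential with mean 1, chosen only for illustration) and compares the sample means for n = 2 and n = 30. Skewness is used here as a simple stand-in for "distance from normality", since a normal distribution has skewness 0.

```python
import random
from statistics import pstdev

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population (mean 1)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution such as the normal."""
    m = sum(values) / len(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(1)  # fixed seed so the run is repeatable
results = {}
for n in (2, 30):
    means = sample_means(n, 5000, rng)
    results[n] = (pstdev(means), skewness(means))
    print(f"n = {n:2d}: SD of sample means = {results[n][0]:.3f}, "
          f"skewness = {results[n][1]:.3f}")
```

With the larger sample size, the spread of the sample means shrinks toward 1/sqrt(n) and the skewness moves toward 0, which is the behaviour the theorem describes.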
Central Limit Theorem Cont
• How large is a “large sample”?
• It depends upon the form of the distribution from
which the samples were taken
• If the population distribution deviates greatly from
normality larger samples will be needed to
approximate normality.
Implications of CLT
• A light bulb manufacturer claims that the life span
of its light bulbs has a mean of 54 months and a
standard deviation of 6 months. A consumer
advocacy group tests 50 of them. Assuming the
manufacturer’s claims are true, what is the
probability that it finds a mean lifetime of less than
52 months?
Implications of CLT Cont
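• From the data we know that
• $\mu = 54$ months, $\sigma = 6$ months, $n = 50$
• By the Central Limit Theorem
$\mu_{\bar{X}} = \mu = 54$
$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 6 / \sqrt{50} \approx 0.85$
[Figure: normal curve centered at 54, with the area to the left of 52 (z = -2.35) shaded; shaded area = 0.0094]
• To find $P(\bar{X} < 52)$, we need to convert to z-scores:
$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{52 - 54}{0.85} \approx -2.35$
• From the area table, $P(-2.35 < Z < 0) = 0.4906$, so $P(Z < -2.35) = 0.5 - 0.4906 = 0.0094$
• Hence, the probability of this happening is 0.0094.
• If the manufacturer’s claims are true, we are 99.06% certain that this will not happen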
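The light bulb calculation can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of a printed area table; the variable names are illustrative. Because it does not round z to two decimals, the result is slightly smaller than the table-based 0.0094.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 54, 6, 50     # claimed mean, claimed SD, sample size
threshold = 52               # sample mean of interest, in months

standard_error = sigma / sqrt(n)         # SD of the sampling distribution
z = (threshold - mu) / standard_error    # z-score of the threshold
probability = NormalDist().cdf(z)        # P(sample mean < 52) under the CLT

print(f"standard error = {standard_error:.3f}")
print(f"z              = {z:.3f}")
print(f"P(mean < 52)   = {probability:.4f}")

# Rounding z to -2.35, as an area table requires, gives the tabulated value
print(f"table lookup   = {NormalDist().cdf(-2.35):.4f}")
```

Either way the probability is below 1%, so a sample mean under 52 months would be strong evidence against the manufacturer's claim.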
What can go wrong
• Statistics can be manipulated by intentionally taking biased samples
Examples
• Asking leading questions in interviews and questionnaires
• A survey which showed that 2 out of 3 dentists recommend a particular brand of toothpaste
• Sometimes a particular portion of the population does not respond, affecting the sampling design
How to do it right
• Make sure that the sample truly represents the population
• Use random selection where possible
• Avoid personal bias
• Avoid measurement bias
• Do not make any decisions about the population
based on the samples until you have applied
statistical inferential techniques to the sample.