Random Samples
A random sample is taken from a probability model that describes a population or process of
(single) units:
Sample size: n
(Population size is generally not a factor.)
Statistics (values obtained from a sample)
Sample mean: $\bar{x}$
Sample standard deviation: S
Parameters (values for an entire population or process)
Population mean: $\mu$
Population standard deviation: $\sigma$
Statistics are used to estimate parameters. Generally speaking, the sample mean is the best way to
estimate the population mean, so we care about the distribution of sample means.
(Super) Units: Samples of size n
(Super) Variable: Mean (of sample)
Two types of random sampling
For both types the following is true:
The probability of selecting a particular unit at any point is given by the probability
model. If this model describes a population, then every unit in the population is equally
likely to be chosen on any selection. (Keep in mind that the probability model describes
how many units share any particular variable value.)
For a random sample drawn from a probability model describing a population or process (that
could go on indefinitely):
After sampling any particular set of units, the probabilities of the variable values for the
next unit are exactly the same as they were for the first unit, and for every other unit. In
effect, random sampling is done with replacement.
For a simple random sample drawn from a probability model describing a finite population:
After sampling any particular set of units, that set of units is not replaced, and subsequent
units are chosen from the remaining unsampled units. This means a simple random
sample is done without replacement.
(With a simple random sample, every possible sample of n different units is equally likely
to be the chosen sample.)
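The difference between the two types of sampling can be sketched in a few lines of Python. This is only an illustration: the population of ten unit labels, the sample size of 5, the seed, and the two function names are assumptions made up for the example, not part of the handout.

```python
import random


def random_sample_with_replacement(population, n, rng):
    # Every draw is made from the full population, so the same unit can
    # be selected more than once.
    return rng.choices(population, k=n)


def simple_random_sample(population, n, rng):
    # Draws n different units; a sampled unit is not returned to the pool.
    return rng.sample(population, n)


rng = random.Random(0)           # fixed seed only so the sketch is repeatable
population = list(range(1, 11))  # hypothetical population of 10 unit labels

with_repl = random_sample_with_replacement(population, 5, rng)
without_repl = simple_random_sample(population, 5, rng)

print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

The first list may contain repeated labels; the second never does.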
Distributional Results
For both types of sampling, the mean of the distribution of (all possible) sample
means is equal to the mean for the probability model describing individual units.
$\mu_{\bar{X}} = \mu$
The formula for the standard deviation of the sample mean technically depends
upon the type of sampling used. For a random sample (drawn with replacement)
the formula shown works. This formula also works for almost all practical
situations involving a simple random sample.
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
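Both distributional results can be checked directly on a small example. The sketch below lists every possible ordered sample of size n drawn with replacement from a tiny population, then compares the mean and standard deviation of all those sample means with the population mean and with the population standard deviation divided by the square root of n. The five population values and n = 3 are assumptions chosen only to keep the enumeration small.

```python
import itertools
import math
import statistics

population = [1, 3, 4, 8, 14]  # hypothetical values for five units
n = 3                          # hypothetical sample size

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# Every ordered sample of size n drawn with replacement; each one is
# equally likely, so this list is the full distribution of sample means.
sample_means = [
    statistics.fmean(sample)
    for sample in itertools.product(population, repeat=n)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print("population mean:     ", mu)
print("mean of sample means:", mean_of_means)
print("sigma / sqrt(n):     ", sigma / math.sqrt(n))
print("sd of sample means:  ", sd_of_means)
```

Up to floating-point rounding, the first two printed values agree, and so do the last two.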
20 Times Rule (also known as the 5% rule)
If the population size is at least 20 times as large as the sample size n, the two types of sampling
can be treated the same way. (Restated: If the sample size n is no more than 5% of the population
size…) Consequently, the standard deviation of the sample mean is virtually the same as that
shown above.
The two expressions above are generally appropriate to handle the distribution of sample means.
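One way to see why the 20 Times Rule works: for a simple random sample (without replacement) from a population of size N, the exact standard deviation of the sample mean is the with-replacement value multiplied by sqrt((N - n) / (N - 1)), usually called the finite population correction. That factor is not stated in this handout; it is brought in here only as a sketch, and the function name and the sample size of 50 are assumptions for illustration.

```python
import math


def fpc(population_size, n):
    # Finite population correction: the factor that multiplies
    # sigma / sqrt(n) when sampling without replacement.
    return math.sqrt((population_size - n) / (population_size - 1))


n = 50  # hypothetical sample size
for multiple in (2, 5, 20, 100):
    population_size = multiple * n
    print(f"N = {multiple:>3} x n: correction factor = "
          f"{fpc(population_size, n):.4f}")
```

When the population is 20 times the sample size, the factor is about 0.975, so ignoring it changes the standard deviation of the sample mean by only about 2.5%.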
Normally distributed data
If the distribution of the variable over individual units in the population is Normal, and the
sampling is random, then the sample mean has a Normal distribution for any sample size n.
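As a worked example, suppose (hypothetically) that the variable is Normal with population mean 100 and population standard deviation 15, and that a random sample of n = 9 units is taken. The sample mean is then Normal with mean 100 and standard deviation 15 / sqrt(9) = 5, so a probability such as P(sample mean > 105) can be read off a Normal distribution. All of the numbers here are made up for illustration.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100.0, 15.0, 9  # hypothetical population values and sample size

sd_of_mean = sigma / math.sqrt(n)           # 15 / 3 = 5
sampling_dist = NormalDist(mu, sd_of_mean)  # distribution of the sample mean

# Probability that the sample mean exceeds 105, one standard deviation
# (of the sample mean) above the population mean.
p_above_105 = 1 - sampling_dist.cdf(105)
print(round(p_above_105, 4))
```

The result is about 0.159, compared with about 0.369 for a single unit exceeding 105 under the same assumptions.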
Central Limit Theorem
For a variable that does not have a Normal distribution: As the sample size grows, the distribution
of (all possible) sample means becomes closer and closer to a Normal distribution.
How large is large enough to use a Normal distribution to determine probabilities for a
sample mean?
If the variable is Normal, n = 1 is large enough. See above.
For a variable with a symmetric distribution, n = 10 or so might be fine, unless the
distribution is the sort that has some large outliers.
For somewhat skewed distributions without large outliers, n = 30 is generally
fine.
For highly skewed distributions, or distributions that have outliers, rather large
samples are required. These situations require expertise beyond the intro stats
level.
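The effect of sample size can be explored with a small simulation. The sketch below draws many samples from an exponential distribution, used here as an assumed stand-in for a clearly skewed variable, and measures how skewed the resulting sample means are for n = 1, 10, and 30. A Normal distribution has skewness 0, so values nearer 0 indicate a shape closer to Normal. The distribution, the number of repetitions, the seed, and the helper names are all choices made for this illustration.

```python
import random
import statistics


def skewness(values):
    # Average cubed z-score: 0 for a symmetric distribution, positive
    # when the right tail is longer.
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])


def simulate_sample_means(n, reps, rng):
    # Draw `reps` random samples of size n from an exponential
    # distribution and return the mean of each sample.
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


rng = random.Random(1)  # fixed seed only so the sketch is repeatable
skew_by_n = {
    n: skewness(simulate_sample_means(n, 10_000, rng))
    for n in (1, 10, 30)
}

for n, sk in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {sk:.2f}")
```

The skewness shrinks as n grows but is still noticeably above 0 at n = 30, which fits the caution above about strongly skewed distributions.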