From the population to the
sample
The sampling distribution
FETP India
Competency to be gained
from this lecture
Use the properties of the sampling distribution
to calculate the standard error of the mean
Key issues
• Population parameters versus sample statistics
• Sampling distribution and its properties
• Mean and standard error of the sampling distribution
Things we already know
• Mean
 Arithmetic sum of data divided by number of observations
• Standard deviation
 Index of variability (spread) of data about the mean
• Z-score
 Distance from mean in standard deviation units
z = (x-mean)/sd
• Normal curve
 Bell-shaped curve that relates probability to z-scores
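The definitions above can be sketched in a few lines of Python. This is only an illustration: the five heights are made-up values, and `z_score` is a hypothetical helper name, not something from the lecture. The population form of the standard deviation (dividing by the number of observations) is assumed.

```python
import statistics

# Hypothetical heights in cm, made up for illustration only.
heights = [150, 160, 165, 170, 180]

mean = statistics.fmean(heights)   # sum of data / number of observations
sd = statistics.pstdev(heights)    # population SD (divides by N)

def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

z_180 = z_score(180, mean, sd)
print(f"mean={mean:.1f}, sd={sd:.2f}, z(180)={z_180:.2f}")
# prints: mean=165.0, sd=10.00, z(180)=1.50
```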
Parameters and statistics
Population parameters
• A population parameter is a numerical
descriptive measure of a population
• Examples:
 Population mean (µ)
 Standard deviation (σ)
Parameters and statistics
A statistic
• A statistic is a numerical descriptive measure
of a sample
• Examples:
 Sample mean (x̄)
 Sample standard deviation (s)
Parameters and statistics
Inference
• The parameter is fixed
• The sample statistic varies from sample to
sample
• We try to infer what happens in the
population from what we see in the sample
Parameters and statistics
Sample mean: A typical situation
• A sample might be taken
• The mean and standard deviation are
computed
• From this data, one will want to infer that
the population values are identical or at
least similar
• In other words, it is hoped that the sample
data reflects the population data
Sampling distribution
Sample mean: Another approach
• Change your thinking from a single sample
• Consider the situation where you:
 Take many samples
 Calculate a mean and standard deviation for each
sample
Sampling distribution
Taking many samples from a population
• Consider a population of 1,000 individuals
with various heights
• If we take 10 samples of 100 persons from
the population, each of the 10 samples will
have a specific frequency distribution with:
 A specific mean
 A specific standard deviation
• In each sample, each data point is a height
Sampling distribution
Looking at the means of the samples
• We can look at the frequency distribution of
the means of each of the 10 samples
• In this case:
 The data points are no longer the heights
 The data points are the means
Sampling distribution
Intuitive observation
• If we take repeated samples from a population, we
are unlikely to sample extreme values every time:
 Values close to the mean are common
 Extreme values are less common
• Thus, when we compare the distribution of the
heights and the distribution of the means, we
observe:
 More variation in the distribution of individual heights
 Less variation in the distribution of the means
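The observation above can be checked with a small simulation. This is a sketch under stated assumptions: the population of 1,000 heights is generated from a normal distribution with mean 165 cm and SD 10 cm, values chosen only for illustration and not taken from the lecture.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 heights (cm), assumed normal(165, 10).
population = [rng.gauss(165, 10) for _ in range(1000)]

# Take 10 samples of 100 persons each (no person repeated within a sample).
samples = [rng.sample(population, 100) for _ in range(10)]
sample_means = [statistics.fmean(s) for s in samples]

sd_heights = statistics.pstdev(population)   # spread of individual heights
sd_means = statistics.pstdev(sample_means)   # spread of the 10 sample means

print(f"SD of individual heights:  {sd_heights:.2f}")
print(f"SD of the 10 sample means: {sd_means:.2f}")
```

The second figure should come out much smaller than the first, which is the point of the slide: means vary less than individual values.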
Sampling distribution
Taking many samples from the
population
• If we take many samples, we can plot a
complete frequency distribution of the
means of the samples
• Each sample produces a statistic (mean)
• The distribution of statistics (means) is
called a sampling distribution
Sampling distribution
Multiple sample means
Sampling distribution
Important properties of the sampling
distribution
1. The sampling distribution is approximately
normally distributed when the sample size is large
2. The mean of the sampling distribution is
equal to the mean of the population
Sampling distribution
Standard deviation of the
sampling distribution
• If the standard deviation of the population is σ
• The standard deviation of the sampling
distribution will be σ / √n
• n is the sample size
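A short derivation shows where the square root of n comes from. It assumes the n observations x_1, ..., x_n are drawn independently from the population, each with variance σ²; the independence assumption is not stated on the slide.

```latex
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(x_i)
  = \frac{n\,\sigma^{2}}{n^{2}}
  = \frac{\sigma^{2}}{n}
\qquad\Longrightarrow\qquad
\operatorname{SD}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
```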
Sampling distribution
Terminology
• The mean of the sampling distribution
continues to be called the mean
• The standard deviation of the sampling
distribution is the standard error
Standard error
Distribution of sample means
• One could obtain a standard deviation of
sample means which would describe the
variability and the spread of sample means
about the true population mean
• In a practical situation:
 There is only one sample mean
 One hopes this sample mean is near the real
population mean
• Wouldn't it be nice to have an estimate of
the standard deviation of sample means,
which describes their spread?
Standard error
Standard error of the mean
• Divide the standard deviation by the square
root of the number of observations
• The resulting estimate of the standard
deviation of sample means is called the
standard error of the mean
• It can be interpreted in a manner similar to
the standard deviation of raw scores
 For example, the probability of obtaining a
sample mean more than 1.96 standard errors
away from the population mean is 5 out of 100
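As a rough sketch, the calculation for a single sample might look as follows. The nine measurements are made up for illustration, and the sample standard deviation (dividing by n - 1) is assumed as the estimate of the population standard deviation. The 1.96 multiplier is the normal-curve value used in the lecture.

```python
import math
import statistics

# Hypothetical sample of 9 measurements, made up for illustration.
sample = [4.8, 5.1, 5.6, 5.0, 5.9, 5.3, 4.7, 5.5, 5.7]

n = len(sample)
sample_mean = statistics.fmean(sample)
s = statistics.stdev(sample)     # sample SD (divides by n - 1)
se = s / math.sqrt(n)            # standard error of the mean

# Range within 1.96 standard errors of the sample mean.
lower = sample_mean - 1.96 * se
upper = sample_mean + 1.96 * se
print(f"mean={sample_mean:.2f}, SE={se:.3f}, range=({lower:.2f}, {upper:.2f})")
```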
Standard error
Central limit theorem
• If x has any distribution with mean µ and
standard deviation SD
• Then, as n increases without limit, the sample mean x̄
based on a random sample of size n will have a
distribution that approaches that of a normal
random variable with:
 Mean µ
 Standard deviation SD / √n
• Special case:
 If x is normally distributed, the result is true for any
sample size
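The theorem can be illustrated with a simulation on a clearly non-normal population. The sketch below assumes an exponential distribution with mean 1 and SD 1, a choice made only for this example; `sample_mean` is a hypothetical helper name.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution (mean 1, SD 1)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

results = {}
for n in (2, 30, 200):
    means = [sample_mean(n) for _ in range(2000)]
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    print(f"n={n:>3}: mean of means={results[n][0]:.3f}, "
          f"SD of means={results[n][1]:.3f}, SD/sqrt(n)={1 / n ** 0.5:.3f}")
```

For each n, the mean of the sample means should stay close to 1 and their standard deviation close to SD / √n, even though the individual values are strongly skewed.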
Standard error
Simple example
• Let the population be 1,2,3,4,5
 Mean = 15/5 = 3 = µ
• Let's take samples of two elements, drawn with replacement
• The 25 possible samples are:
1,1   2,1   3,1   4,1   5,1
1,2   2,2   3,2   4,2   5,2
1,3   2,3   3,3   4,3   5,3
1,4   2,4   3,4   4,4   5,4
1,5   2,5   3,5   4,5   5,5
Standard error
The frequency distribution of the
population is not normal
[Figure: bar chart of the population. x-axis: Values (1 to 5); y-axis: Frequency (0 to 2)]
Standard error
Standard deviation of the population
Values   Mean   Deviation to the mean   Square deviation to the mean
1        3      -2                      4
2        3      -1                      1
3        3       0                      0
4        3       1                      1
5        3       2                      4
Total            0                      10
Variance = 10 / 5 = 2
Standard deviation = √2 ≈ 1.4
Standard error
Looking at the mean of the samples
• The 25 means of the 25 samples are:
1     1.5   2     2.5   3
1.5   2     2.5   3     3.5
2     2.5   3     3.5   4
2.5   3     3.5   4     4.5
3     3.5   4     4.5   5
Mean of sample means = 75/25 = 3
Same as population mean
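The enumeration can be reproduced in a few lines. This is a minimal sketch that assumes ordered samples of size 2 drawn with replacement, which is what gives 5 x 5 = 25 samples.

```python
from itertools import product
from statistics import fmean

population = [1, 2, 3, 4, 5]

# All ordered samples of size 2, drawn with replacement: 5 * 5 = 25 samples.
samples = list(product(population, repeat=2))
sample_means = [fmean(s) for s in samples]

print(len(samples))          # 25
print(sum(sample_means))     # 75.0
print(fmean(sample_means))   # 3.0, same as the population mean
```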
Standard error
The sampling distribution tends
to be normal
[Figure: histogram of the 25 sample means. x-axis: Values (1 to 5, in steps of 0.5); y-axis: Frequency (0 to 6)]
Even if the population is not normally distributed,
the sampling distribution will tend to be normal
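A text version of this frequency distribution can be produced as follows. It is a sketch for the population 1, 2, 3, 4, 5 and samples of size 2 drawn with replacement; the `#` bars are only a stand-in for the chart on the slide.

```python
from collections import Counter
from itertools import product

population = [1, 2, 3, 4, 5]

# Mean of every ordered pair drawn with replacement (25 means in total).
means = [(a + b) / 2 for a, b in product(population, repeat=2)]
freq = Counter(means)

# One row per distinct mean, with one '#' per occurrence.
for value in sorted(freq):
    print(f"{value:>3}: {'#' * freq[value]}")
```

The counts rise from 1 at a mean of 1 to 5 at a mean of 3 and fall back to 1 at a mean of 5, a peaked shape that the flat population distribution does not have.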
Standard error
Standard deviation of the sample means
Values   Mean   Deviation to the mean   Square deviation to the mean
1        3      -2                      4
1.5      3      -1.5                    2.25
1.5      3      -1.5                    2.25
2        3      -1                      1
2        3      -1                      1
2        3      -1                      1
2.5      3      -0.5                    0.25
2.5      3      -0.5                    0.25
2.5      3      -0.5                    0.25
2.5      3      -0.5                    0.25
3        3       0                      0
3        3       0                      0
3        3       0                      0
3        3       0                      0
3        3       0                      0
3.5      3       0.5                    0.25
3.5      3       0.5                    0.25
3.5      3       0.5                    0.25
3.5      3       0.5                    0.25
4        3       1                      1
4        3       1                      1
4        3       1                      1
4.5      3       1.5                    2.25
4.5      3       1.5                    2.25
5        3       2                      4
Total            0                      25
Variance = 25 / 25 = 1.00
Standard error = √1.00 = 1.00
Standard error
Standard deviation in the population
and standard error
• Standard deviation in the population:
 1.4
• Sample size:
 2
• Square root of the sample size:
 1.4
• Standard deviation / square root of the sample size:
 1.4 / 1.4 = 1
 = Standard error
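The agreement between the two routes to the standard error can be verified numerically. The sketch below assumes the population form of the standard deviation (dividing by the number of values), as in the tables of this example.

```python
import math
from itertools import product
from statistics import fmean, pstdev

population = [1, 2, 3, 4, 5]
n = 2  # sample size

sigma = pstdev(population)                # sqrt(10 / 5) = sqrt(2), about 1.4
means = [fmean(s) for s in product(population, repeat=n)]
sd_of_means = pstdev(means)               # sqrt(25 / 25) = 1
formula = sigma / math.sqrt(n)            # sigma / sqrt(n)

print(f"sigma={sigma:.4f}, SD of means={sd_of_means:.4f}, "
      f"sigma/sqrt(n)={formula:.4f}")
```

Both the directly computed standard deviation of the 25 means and the formula give 1.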
Standard error
Applying the standard error:
Serum uric acid levels in men (1/2)
• Population mean:
 5.4 mg per 100 ml
• Standard deviation:
 1 mg per 100 ml
• Take 100 samples of 25 men in each sample
• Compute 100 sample means
• Standard error: 1 / √25 = 0.2
• How many of those means would you expect to fall
within the range 5.4 - (1.96 x 0.2) to 5.4 + (1.96 x 0.2)?
• The answer is about 95!
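This expectation can be checked by simulation. The sketch below assumes that individual uric acid levels are normally distributed and uses the standard error 1 / √25 = 0.2, so the range is 5.4 ± 1.96 x 0.2, roughly 5.01 to 5.79. `means_inside` is a hypothetical helper name.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

mu, sd, n = 5.4, 1.0, 25
se = sd / math.sqrt(n)                           # 1 / 5 = 0.2
lower, upper = mu - 1.96 * se, mu + 1.96 * se    # about 5.01 to 5.79

def means_inside(num_samples):
    """Count sample means (n men each) that fall inside [lower, upper]."""
    inside = 0
    for _ in range(num_samples):
        m = statistics.fmean([rng.gauss(mu, sd) for _ in range(n)])
        inside += lower <= m <= upper
    return inside

print(f"range: {lower:.2f} to {upper:.2f}")
print("inside, out of 100 samples:", means_inside(100))
share = means_inside(10_000) / 10_000
print(f"long-run share inside: {share:.3f}")
```

Any single run of 100 samples will not give exactly 95; the long-run share is what should settle near 0.95.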
Standard error
Applying the standard error:
Serum uric acid levels in men (2/2)
• One sample
• Mean serum uric acid level of 8.2 mg per 100 ml
• Would you assume this was "significantly"
different from the population mean?
 Yes, because a mean of that magnitude would
occur fewer than 5 times in 100 by chance
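The size of the discrepancy can be put in z-score terms. This sketch assumes the sample contains 25 men, as in part 1 (the slide does not state the sample size), and `two_sided_p` is a hypothetical helper built on the standard normal curve.

```python
import math

def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

mu, sd = 5.4, 1.0
n = 25                        # assumption carried over from part 1
se = sd / math.sqrt(n)        # 0.2
z = (8.2 - mu) / se           # distance from the population mean in SE units
p = two_sided_p(z)

print(f"z={z:.1f}, two-sided p={p:.3g}")
```

A z-score far beyond 1.96 corresponds to a probability far below 5 in 100, which matches the conclusion on the slide.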
Standard error
Key messages
• While population parameters are fixed, samples
provide estimates (statistics) that fluctuate
• The distribution of a statistic for all possible
samples of given size ‘n’ is called the sampling
distribution.
 For large ‘n’, the sampling distribution is ‘normal’, even if
the original distribution is not.
 If the original distribution is normal, the result is true even
for small ‘n’.
• The mean of the sampling distribution is the
population mean and the standard deviation
(standard error) is the population SD / √n