Sampling Distribution of Sample Mean

Introduction to Inference
Sampling Distribution of Means &
Central Limit Theorem
Dr. Amjad El-Shanti
MD, PMH, Dr PH
University of Palestine
2016
Course Overview
Collecting Data
Exploring Data
Probability Intro.
Inference
Comparing Variables
Means
Proportions
Relationships between Variables
Regression
Contingency Tables
Inference with a Single Observation
[Diagram: a population with unknown parameter μ is sampled to give a single observation Xi, which is used for inference about μ.]
• Each observation Xi in a random sample is a
representative of the unobserved values in the population
• How different would this observation be if we took a
different random sample?
Normal Distribution
• Last class, we learned about the normal distribution as
a model for our overall population
• We can calculate the probability of getting
observations greater than or less than any value
• We usually don’t have a single observation, but
instead the mean of a set of observations
Inference with Sample Mean
[Diagram: a sample is drawn from a population with unknown parameter μ; the sample statistic x̄ is used for estimation of, and inference about, μ.]
• Sample mean is our estimate of population mean
• How much would the sample mean change if we took
a different sample?
• Key to this question: Sampling Distribution of x̄
Sampling Distribution of Sample Mean
• Distribution of values taken by statistic in all possible
samples of size n from the same population
• Model assumption: our observations xi are sampled
from a population with mean μ and variance σ²
[Figure: a population with unknown parameter μ; samples 1 through 8 (and so on), each of size n, each yield a sample mean x̄. What is the distribution of these x̄ values?]
Mean of Sample Mean
• First, we examine the center of the sampling
distribution of the sample mean.
• Center of the sampling distribution of the sample
mean is the unknown population mean:
mean(x̄) = μ
• Over repeated samples, the sample mean will, on
average, be equal to the population mean
– no guarantees for any one sample!
Variance of Sample Mean
• Next, we examine the spread of the sampling
distribution of the sample mean
• The variance of the sampling distribution of the
sample mean is
variance(x̄) = σ²/n
• As sample size increases, variance of the sample
mean decreases!
• Averaging over many observations is more accurate than
just looking at one or two observations
• Comparing the sampling distribution of the
sample mean when n = 1 vs. n = 10
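The two results above, mean(x̄) = μ and variance(x̄) = σ²/n, can be checked by simulation. The sketch below is an illustration only: the Normal population with μ = 50 and σ = 10, the helper name simulate_sample_means, and the 20,000 repetitions are assumptions chosen for the demonstration, not values from the lecture.

```python
import random
import statistics

MU, SIGMA = 50.0, 10.0  # hypothetical population mean and standard deviation


def simulate_sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n and return the mean of each one."""
    return [
        statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed, so reruns give the same numbers
results = {}
for n in (1, 10):
    means = simulate_sample_means(n, 20_000, rng)
    # Center and spread of the simulated sampling distribution of the mean
    results[n] = (statistics.fmean(means), statistics.pvariance(means))
    print(f"n = {n:2d}: mean of sample means = {results[n][0]:.2f}, "
          f"variance = {results[n][1]:.2f}, sigma^2/n = {SIGMA ** 2 / n:.2f}")
```

Under these assumptions, the mean of the sample means should sit close to 50 for both sample sizes, while the variance should drop from roughly 100 (n = 1) to roughly 10 (n = 10).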
Law of Large Numbers
• Remember the Law of Large Numbers:
• If one draws independent samples from a
population with mean μ, then as the number of
observations increases, the sample mean x̄ gets
closer and closer to the population mean μ
• This is easier to see now since we know that
mean(x̄) = μ
variance(x̄) = σ²/n → 0 as n gets large
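A running average makes the Law of Large Numbers concrete. The sketch below assumes a hypothetical population, rolls of a fair six-sided die (μ = 3.5, σ² = 35/12), and reports the sample mean at a few checkpoints alongside σ²/n; the die and the checkpoint sizes are illustrative choices, not part of the lecture.

```python
import random

MU = 3.5         # population mean of a fair six-sided die
SIGMA2 = 35 / 12 # population variance of a fair six-sided die

rng = random.Random(1)  # fixed seed for reproducibility
checkpoints = (10, 100, 1_000, 10_000, 100_000)

running_means = {}
total = 0
for i in range(1, checkpoints[-1] + 1):
    total += rng.randint(1, 6)  # one more independent observation
    if i in checkpoints:
        running_means[i] = total / i  # sample mean of the first i rolls
        print(f"n = {i:6d}: sample mean = {running_means[i]:.4f}, "
              f"sigma^2/n = {SIGMA2 / i:.6f}")
```

The sample mean need not improve at every single step, but as σ²/n shrinks toward 0 the mean at the largest checkpoints should lie very close to 3.5.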
Example
• Population: seasonal home-run totals for
7032 baseball players from 1901 to 1996
• Take different samples from this population and
compare the sample mean we get each time
• In real life, we can’t do this because we don’t
usually have the entire population!
Sample Size                     Mean    Variance
100 samples of size n = 1       3.69    46.8
100 samples of size n = 10      4.43    4.43
100 samples of size n = 100     4.42    0.43
100 samples of size n = 1000    4.42    0.06
Population Parameter: μ = 4.42
Distribution of Sample Mean
• We now know the center and spread of the
sampling distribution for the sample mean.
• What about the shape of the distribution?
• If our data x1,x2,…, xn follow a Normal
distribution, then the sample mean x̄ will also
follow a Normal distribution!
Example
• Mortality in US cities (deaths/100,000 people)
• This variable seems to approximately follow a Normal
distribution, so the sample mean will also
approximately follow a Normal distribution
Central Limit Theorem
• What if the original data doesn’t follow a Normal
distribution?
• HR/Season for sample of baseball players
• If the sample is large enough, it doesn’t matter!
Central Limit Theorem
• If the sample size is large enough, then the
sample mean x̄ has an approximately
Normal distribution
• This is true no matter what the shape of
the distribution of the original data!
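One way to see this is to sample from a clearly non-Normal population and compare the tails of the resulting sample means with the Normal benchmark. The sketch below assumes an Exponential(1) population (strongly right-skewed, μ = 1, σ = 1) and a sample size of 50; both are illustrative assumptions, as is the helper name tail_fractions. For a Normal distribution, about 15.87% of values fall more than one standard deviation below the mean and the same share above.

```python
import math
import random
import statistics


def tail_fractions(n, num_samples, rng):
    """Fractions of sample means more than one standard error below/above mu.

    The population is Exponential(1): right-skewed, with mu = 1 and sigma = 1.
    """
    mu, sigma = 1.0, 1.0
    se = sigma / math.sqrt(n)  # standard deviation of the sample mean
    below = above = 0
    for _ in range(num_samples):
        xbar = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        if xbar < mu - se:
            below += 1
        elif xbar > mu + se:
            above += 1
    return below / num_samples, above / num_samples


rng = random.Random(2)  # fixed seed for reproducibility
normal_tail = statistics.NormalDist().cdf(-1.0)  # about 0.1587 in each tail
tails = {n: tail_fractions(n, 10_000, rng) for n in (1, 50)}
for n, (below, above) in tails.items():
    print(f"n = {n:2d}: below = {below:.3f}, above = {above:.3f}, "
          f"Normal benchmark = {normal_tail:.3f}")
```

With n = 1 the lower tail is empty, because an exponential value cannot fall below μ − σ = 0, so the distribution is visibly lopsided. With n = 50 both tails should land near the Normal benchmark.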
CENTRAL LIMIT THEOREM
• specifies a theoretical distribution
• formulated by the selection of all
possible random samples of a fixed
size n
• a sample mean is calculated for each
sample and the distribution of sample
means is considered
SAMPLING DISTRIBUTION OF
THE MEAN
• The mean of the sample means is
equal to the mean of the population
from which the samples were drawn.
• The standard deviation of the distribution
is σ divided by the square root of n (the
standard error).
STANDARD ERROR
Standard Deviation of the Sampling
Distribution of Means
σ(x̄) = σ/√n
How Large is Large?
• If the population is normal, then the sampling
distribution of x̄ will also be normal, no
matter what the sample size.
• When the sampled population is approximately
symmetric, the sampling distribution becomes
approximately normal for relatively small values
of n.
• When the sampled population is skewed, a common
rule of thumb is that the sample size must be at
least 25 before the sampling distribution of x̄
becomes approximately normal.
Central Limit Theorem
• The CLT states that for random samples of size n (n should be at
least 25, but the larger n, the better the approximation) drawn from a
population with mean μ and standard deviation σ:
1. The mean of the distribution of sample means is equal to the mean of
the population distribution (μ(x̄) = μ).
2. The standard deviation of the distribution of sample means is equal to
the standard deviation of the population divided by the square root of
the sample size (σ(x̄) = σ/√n).
3. For any population with mean μ and standard deviation σ, the
distribution of sample means is approximately normal regardless of
whether the population distribution is normal or not.
EXAMPLE
A certain brand of tires has a mean life of
25,000 miles with a standard deviation of
1600 miles.
What is the probability that the mean life
of 64 tires is less than 24,600 miles?
Example continued
The sampling distribution of the means
has a mean of 25,000 miles (the
population mean)
μ(x̄) = 25000 mi.
and a standard deviation (i.e., standard
error) of:
σ/√n = 1600/√64 = 1600/8 = 200 mi.
Example continued
Convert 24,600 mi. to a z-score and use the
normal table to determine the required
probability.
z = (24600-25000)/200 = -2
P(z< -2) = 0.0228
or 2.28% of the sample means will be less
than 24,600 mi.
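The tire calculation can be reproduced without a normal table. The sketch below uses Python's standard-library statistics.NormalDist in place of the table lookup; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 25_000        # population mean tire life (miles)
sigma = 1_600      # population standard deviation (miles)
n = 64             # number of tires in the sample
threshold = 24_600 # sample-mean value of interest (miles)

standard_error = sigma / math.sqrt(n)    # 1600 / 8 = 200
z = (threshold - mu) / standard_error    # (24600 - 25000) / 200 = -2
probability = NormalDist().cdf(z)        # P(Z < -2) under the standard normal

print(f"SE = {standard_error:.0f} mi, z = {z:.1f}, P = {probability:.4f}")
# prints: SE = 200 mi, z = -2.0, P = 0.0228
```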
ESTIMATION OF POPULATION
VALUES
• Point Estimates
• Interval Estimates
CONFIDENCE INTERVAL
ESTIMATES for LARGE
SAMPLES
• The sample has been randomly
selected
• The population standard deviation is
known or the sample size is at least
25.
Confidence Interval Estimate of the
Population Mean
X̄ − z·s/√n < μ < X̄ + z·s/√n
X̄: sample mean
s: sample standard deviation
n: sample size
EXAMPLE
Estimate, with 95% confidence, the
lifetime of nine volt batteries using a
randomly selected sample where:
X̄ = 49 hours
s = 4 hours
n = 36
EXAMPLE continued
Lower Limit:
49 - (1.96)(4/6)
49 - (1.3) = 47.7 hrs
Upper Limit:
49 + (1.96)(4/6)
49 + (1.3) = 50.3 hrs
We are 95% confident that the mean
lifetime of the population of batteries is
between 47.7 and 50.3 hours.
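The same interval can be computed in a few lines. This is a minimal sketch, assuming the large-sample z interval shown above; the function name z_confidence_interval is hypothetical, and the critical value comes from the standard-library statistics.NormalDist rather than a table (about 1.96 for 95% confidence).

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample confidence interval for a population mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / math.sqrt(n)  # z times the estimated standard error
    return sample_mean - margin, sample_mean + margin


# Battery example: sample mean 49 hours, s = 4 hours, n = 36
lower, upper = z_confidence_interval(49, 4, 36)
print(f"{lower:.1f} to {upper:.1f} hours")
# prints: 47.7 to 50.3 hours
```

Passing a different confidence level, for example confidence=0.99, widens the interval because the critical value z grows.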