Estimation of a Population Mean
Suppose the population mean m is unknown. How could
you provide an estimate?
What do we know about the sampling distribution of the
sample mean? (3 things)
It is centered at the original population mean m.

How variable is it? The standard error of M is
\(\sigma_M = \sigma / \sqrt{n}\).
If n is large enough, or if the original population from
which we are sampling is normally distributed, then the
sampling distribution of M is normal.
Example of the Sampling Distribution of the
Sample Mean
Suppose we know the population we are drawing the
sample from has a mean of m = 100 and a standard
deviation of σ = 21.
Suppose the sample size is n = 49.
What does the sampling distribution of the sample mean
look like?
Sampling Distribution of the Sample Mean
[Figure: Sampling Distribution of the Sample Mean. Horizontal axis: xbar, marked from 91 to 109 in steps of 3. Vertical axis: Probability, from 0.00 to 0.14.]
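A minimal sketch of the numbers behind this graph, using only the values given in the example (m = 100, σ = 21, n = 49). The helper name `standard_error` is my own choice for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

m = 100      # population mean from the example
sigma = 21   # population standard deviation from the example
n = 49       # sample size from the example

se = standard_error(sigma, n)
print(f"Sampling distribution of M: normal, centered at {m}, standard error {se}")

# Values located 1, 2, and 3 standard errors on either side of m
ticks = [m + k * se for k in range(-3, 4)]
print(ticks)
```

The standard error works out to 3, so the values 91, 94, ..., 109 on the horizontal axis sit at whole numbers of standard errors from m.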
Probability M is Within 2 Standard Errors of m
What proportion of all M values are between 94 and
106? To answer this question, you convert these values
into z-scores.
The standard error here is \(21/\sqrt{49} = 3\), so 94 on the M graph is equivalent to
z = (94 - 100)/3 = -2.00, and 106 is located at z = (106 - 100)/3 = +2.00.
How much area is between z = -2.00 and z = +2.00?
According to a z-table for probabilities, it is
.4772 + .4772 = .9544.
So the chance of randomly selecting an M value that is
between 94 and 106 is 95.44%.
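If software is available instead of a z-table, the same probability can be checked with a short sketch like the one below. It assumes the sampling distribution is normal with mean 100 and standard error 3 (that is, 21 divided by the square root of 49), and uses `NormalDist` from Python's standard library.

```python
from statistics import NormalDist

# Sampling distribution of M in the example: mean 100, standard error 3
sampling_dist = NormalDist(mu=100, sigma=3)

# z-scores for the two endpoints
z_low = (94 - 100) / 3
z_high = (106 - 100) / 3

# Area under the normal curve between 94 and 106
prob = sampling_dist.cdf(106) - sampling_dist.cdf(94)

print(z_low, z_high)
print(round(prob, 4))
```

Software reports about .9545 rather than .9544 because each table entry of .4772 has already been rounded.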
Generic Formula to Estimate m
Notice that 94 is 2 standard errors below the mean of 100
and that 106 is 2 standard errors above the mean of 100.
For any scenario where we know the sampling
distribution is normal, we could be certain that a little
more than 95% of all sample mean values are less than 2
standard errors away from the population mean.
In other words, 95.44% of all M values are between
\(m - 2\sigma_M\) and \(m + 2\sigma_M\). Why is this important?
Distance Between the M and m
For any of these 95.44% of sample mean values, what do
you know about the distance between the sample mean
value and the population mean?
\(|m - M| < 2\sigma_M\)
But notice, this tells me that if I have one of these
95.44% of sample means, and I go 2 standard errors on
either side of the sample mean, I will have the value of
the population mean in that interval! In other words, I
have a 95.44% chance that m is between the values
\(M - 2\sigma_M\) and \(M + 2\sigma_M\).
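The step from "M is within 2 standard errors of m" to "m is within 2 standard errors of M" can be written out as a short derivation. Here \(\sigma_M\) stands for the standard error of M; the layout is my own sketch of the argument above.

```latex
\begin{align*}
|m - M| < 2\sigma_M
  &\iff -2\sigma_M < m - M < 2\sigma_M \\
  &\iff M - 2\sigma_M < m < M + 2\sigma_M
\end{align*}
```

Distance is symmetric, so the same 95.44% of samples that put M close to m also put m inside the interval built around M.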
Difficulty with This Formula
In most cases, σ is unknown.
Substitute s for σ, but this creates t-scores instead of z-scores in our formula.
Result is the formula:
\(M - t\,\frac{s}{\sqrt{n}}\) to \(M + t\,\frac{s}{\sqrt{n}}\)
Value of t comes from a table or software. What does
the distribution of t-scores look like?
What are T-scores?
The t-distribution (Student’s t-distribution, formally) is
formed when
a sample is taken from a population that is known
to follow a normal distribution and
the standardizing calculation uses the sample, not
the population, standard deviation
What does a graph of t-scores look like? What
characteristics are present?
Graph of T-scores
The t-distribution (Student’s t-distribution, formally) is
similar to a standard normal distribution in that:
it is symmetric and mound-shaped
it is centered at the mean of 0
However, it is different from the z-distribution in that:
t-scores are more variable than z-scores so the
t-curve is stretched farther out in the tails, and has
less probability in the center at the peak
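One way to see the heavier tails is a small simulation. This is only a sketch: the sample size of 5, the 20,000 repetitions, and the normal population with mean 100 and standard deviation 21 are my assumptions, chosen so the difference is easy to see.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 5               # small sample size (assumption)
reps = 20_000       # number of simulated samples (assumption)
m, sigma = 100, 21  # normal population used for illustration (assumption)

z_scores, t_scores = [], []
for _ in range(reps):
    sample = [random.gauss(m, sigma) for _ in range(n)]
    M = statistics.fmean(sample)
    s = statistics.stdev(sample)                   # sample standard deviation
    z_scores.append((M - m) / (sigma / n ** 0.5))  # standardized with sigma
    t_scores.append((M - m) / (s / n ** 0.5))      # standardized with s

def tail_share(scores, cutoff=2.0):
    """Proportion of scores farther than `cutoff` from 0."""
    return sum(abs(x) > cutoff for x in scores) / len(scores)

z_tail = tail_share(z_scores)
t_tail = tail_share(t_scores)
print(f"beyond +/-2: z-scores {z_tail:.3f}, t-scores {t_tail:.3f}")
```

A larger share of the t-scores should land beyond 2 in either direction, which is the extra tail area described above.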
What is Confidence?
Suppose I calculate one 95% confidence interval from a
sample and the interval turns out to be (94.5, 106.2) and
the value of m is 100.
What % of the time is 100 between 94.5 and 106.2?
100% of the time!
So what is meant by the term 95% confidence?
Recall how we created the interval formula and
remember that we based our estimate on the sample we
observed.
Meaning of Confidence
We know that only 95% of all sample means will be
within “t” standard errors of the population mean. Now
95% of all samples will give us one of these M values.
However, 5% of all samples will give us an M value that
is more than “t” standard errors from m.
So 95% of all the intervals we could form using a certain
t-score in the formula will contain m, while 5% of all the
intervals we could form will not contain the value of m.
For any single interval, it either does or does not contain
m.
Trying to Explain the Meaning of Confidence
Clearly it is difficult to explain confidence for any single
interval. Notice, any single interval either estimates the
value of m correctly (hence is 100% correct) or it does
not (hence it is 0% correct).
The process (forming every possible interval using the
formula from before) works correctly 95% of the time
(assuming “t” is chosen to give 95% confidence) and
works incorrectly 5% of the time.
I suggest not trying to interpret confidence in terms of
only one sample. Use a large number of samples.
Large Number of Different Samples Explanation
The best option is to use a large number of samples. For
example, what does 99% confidence mean?
Suppose I took 100,000 samples of n = 50 observations
each (so the sampling distribution of M is normal) and
made intervals using t = 2.680 with the formula.
I would expect about 99,000 of the intervals (that is
99%) to contain the true value of m, while 1,000 intervals
would not contain m. Notice for any one interval, I don’t
know if it is one of the 99% correct ones or the 1% of
incorrect ones.
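The thought experiment can be tried on a computer. The sketch below is a scaled-down version under stated assumptions: it uses 10,000 samples rather than 100,000 to keep the run short, and it draws from a hypothetical normal population with m = 100 and standard deviation 21. Only n = 50 and t = 2.680 come from the text.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

m, sigma = 100, 21    # hypothetical normal population (assumption)
n = 50                # sample size from the text
t = 2.680             # t-value from the text (99% confidence, df = 49)
num_samples = 10_000  # fewer than the 100,000 in the text (assumption)

hits = 0
for _ in range(num_samples):
    sample = [random.gauss(m, sigma) for _ in range(n)]
    M = statistics.fmean(sample)
    s = statistics.stdev(sample)
    margin = t * s / n ** 0.5
    if M - margin < m < M + margin:  # does this interval contain m?
        hits += 1

coverage = hits / num_samples
print(f"{hits} of {num_samples} intervals contain m ({coverage:.1%})")
```

The share of intervals that contain m should come out close to 99%. In practice we only ever see one of these intervals, and nothing in it tells us which group it belongs to.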
Confidence Interval Steps by Hand
To calculate a confidence interval for m using s:
1. Locate the sample mean, M
2. Locate the sample standard deviation, s
3. Locate the sample size, n
4. Locate the t-value based on the desired
confidence and sample size (df = n – 1)
5. Find the two endpoints of the interval
\(M - t\,\frac{s}{\sqrt{n}}\) and \(M + t\,\frac{s}{\sqrt{n}}\)
6. Give an interpretation (see next slide)
7. Be sure the interval is valid (see later slide)
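Steps 1 through 5 can be sketched as a small function. The name `confidence_interval` and the inputs in the usage lines are hypothetical, chosen only to show the call; the t-value still has to be looked up in a table or software first.

```python
import math

def confidence_interval(M, s, n, t):
    """Return the (lower, upper) endpoints of M +/- t * s / sqrt(n)."""
    if n < 2:
        raise ValueError("need at least 2 observations to compute s")
    margin = t * s / math.sqrt(n)
    return M - margin, M + margin

# Hypothetical inputs: M = 50, s = 8, n = 64, and t = 2.0 as a rough
# stand-in for a table value (not looked up for any particular confidence).
lower, upper = confidence_interval(M=50, s=8, n=64, t=2.0)
print(lower, upper)
```

Returning both endpoints together mirrors step 5, where the subtraction and the addition are done with the same margin.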
Interpreting a Confidence Interval and Validity
To interpret a confidence interval, you should include 3
parts: (1) the level of confidence in your statement, (2)
the parameter you are estimating, and (3) the values of
the interval.
General format for now: With 95% confidence, I
estimate the mean “fill in the scenario for this
population” is between “lower value” and “upper value”.
Validity of a Confidence Interval
A confidence interval will be valid using this formula if:
1. a random sample is taken from the population
2. the population standard deviation, σ, is
unknown so we are using s
3. the sampling distribution of the mean (M)
is normal (approximately at least). Recall this
is true when either
(i) n is at least 30 or
(ii) the population from which the sample is
taken is normally distributed
Example of a Confidence Interval
A bank manager would like to estimate the mean savings
account balance for all savings accounts at her bank. She
randomly selects 50 accounts and observes the balances.
For these 50 accounts, the mean balance is $2135. She
also calculates the standard deviation of these 50 account
balances to be $820. Using 99% confidence, provide the
bank manager with the information she seeks. Assume
t = 2.680
Notice we are estimating a population mean, specifically
m which is the mean balance for all savings accounts (the
population) at her bank. Also σ is unknown, but we have
calculated s to be $820.
The Confidence Interval and Interpretation
M = $2135
s = $820
n = 50 balances
and t = 2.680 (from 99% confidence and df = 49)
\[ M \pm t\,\frac{s}{\sqrt{n}} = 2135 \pm 2.680\left(\frac{820}{\sqrt{50}}\right) = (1824.21,\ 2445.79) \]
Note: the plus/minus symbol is shorthand. Do the
calculation once with subtraction, then again with
addition. This gives the two endpoints of your interval.
Interpretation: With 99% confidence, I estimate the mean
balance in all savings accounts at this bank to be between
$1824.21 and $2445.79.
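For anyone checking the endpoints by hand, the arithmetic can be broken into steps as follows. This is a sketch that carries the intermediate values to a few extra decimal places before rounding to cents.

```latex
\begin{align*}
\frac{s}{\sqrt{n}} &= \frac{820}{\sqrt{50}} \approx 115.9655 \\
t \cdot \frac{s}{\sqrt{n}} &\approx 2.680 \times 115.9655 \approx 310.79 \\
M - 310.79 &= 2135 - 310.79 = 1824.21 \\
M + 310.79 &= 2135 + 310.79 = 2445.79
\end{align*}
```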
Is the Confidence Interval Valid?
To see if the interval is valid, remember to check for the
3 requirements:
1. a random sample was selected
2. σ is unknown so we use s = $820
3. n = 50 balances is at least 30
Yes, the confidence interval is valid.
One final note – the interpretation is a reference to the
population mean, hence the highlighted word “all” in my
interpretation. We always estimate parameters.