It is commonly believed that anyone who tabulates numbers is a statistician.
This is like believing that anyone who owns a scalpel is a surgeon.
Hooke
Chapter 10 - Sec 10.1
TWO METHODS FOR INFERENCE:
CONFIDENCE INTERVALS are used to estimate the "value" of a population parameter.
The interval establishes boundaries between which we can have a specified level of
confidence about our parameter of interest. There are different levels of confidence and
we will use the most common ones and also learn to calculate any desired level.
TESTS of SIGNIFICANCE assess the evidence for a "claim" about the population as a
result of gathered data. These tests determine whether the results can be explained by
chance occurrence or not and whether the results differ enough from chance to be
statistically significant.
Both procedures are based on sampling distributions, from sample proportions or sample
means, and report the probabilities that state what would happen if we used inference
methods many times. Long run regular behavior is required. Inference is most reliable
when data comes from RANDOMIZED samples.
We must rely on previously learned concepts, especially normal distributions and the
Central Limit Theorem, as we move forward with our logic. We will also rely on the fact
that the standard deviation of the sample mean equals the standard deviation of the
population divided by the square root of n (the number of observations in the sample):
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
When we make a claim about a population parameter, we can say that the parameter is
"somewhere around" our sample statistic. SOMEWHERE AROUND is not precise
enough. A better question would be: "How would the sample statistic vary if we took
many samples of equal size from the same population?"
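A small simulation can make this question concrete. The sketch below is illustrative only: it assumes a normally distributed population with mean 2.1 and standard deviation 0.5 (values picked for the demonstration, not measured from data), draws 2000 samples of size 100, and reports how the sample means vary. The function name simulate_sample_means is hypothetical.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Assumed population values, chosen only for illustration.
means = simulate_sample_means(mu=2.1, sigma=0.5, n=100, num_samples=2000)
center = statistics.fmean(means)   # should land near mu
spread = statistics.stdev(means)   # should land near sigma / sqrt(n) = 0.05
print(f"mean of the sample means:    {center:.3f}")
print(f"std dev of the sample means: {spread:.3f}")
```

The sample means should cluster around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.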
Estimating with Confidence:
Suppose I want to know how often teenagers go to the movies. Specifically, I want to
know how many times per month a typical teenager (ages 13 through 17) goes to the
movies.
Suppose I take an SRS of 100 teenagers and calculate the sample mean to be $\bar{x} = 2.1$.
The sample mean is an unbiased estimator of the unknown population mean μ, so I would
estimate the population mean to be approximately 2.1. However, a different sample
would have given a different sample mean, so I must consider the amount of variation in
the sampling model for $\bar{x}$.
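▪ The sampling model for $\bar{x}$ is approximately normal. (CLT)
▪ The mean of the sampling model is μ.
▪ The standard deviation of the sampling model is $\sigma/\sqrt{n}$, assuming the population
size is at least 10n.
[Figure: normal curve of the sampling model, with the horizontal axis marked at μ - 2σ, μ, and μ + 2σ]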
Suppose we know that the population standard deviation is σ = 0.5. Then the standard
deviation for the sampling model is $\sigma/\sqrt{n} = 0.5/\sqrt{100} = 0.05$.
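About 95% of a normal distribution lies within 2 standard deviations of its mean, and
2(0.05) = 0.10. So 95% of our samples will produce a statistic $\bar{x}$ that is between
μ – 0.10 and μ + 0.10. Therefore in 95% of our samples, the interval between
$\bar{x} - 0.10$ and $\bar{x} + 0.10$ will contain the parameter μ (the true population mean).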
The margin of error is 0.10.
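The claim that about 95% of samples give an interval (sample mean ± 0.10) that captures μ can be checked by simulation. This is a rough sketch under stated assumptions: the population is taken to be normal with σ = 0.5, and the "true" mean is set to 2.1 purely so the simulation has something to aim at (in practice μ is unknown). The function name coverage_rate is hypothetical.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, margin, num_samples, seed=1):
    """Fraction of simulated samples whose interval x_bar +/- margin contains mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / num_samples


sigma, n = 0.5, 100
margin = 2 * sigma / math.sqrt(n)  # two standard deviations of the sampling model
# mu=2.1 is an assumed "true" mean, used only to drive the simulation.
rate = coverage_rate(mu=2.1, sigma=sigma, n=n, margin=margin, num_samples=2000)
print(f"margin of error: {margin:.2f}")
print(f"share of intervals containing mu: {rate:.3f}")
```

The share should come out close to 0.95. It will not be exactly 0.95, both because the simulation is random and because 2 is a rounded multiplier.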
For our sample of 100 teenagers, $\bar{x} = 2.1$. Because the margin of error is 0.10, we
are 95% confident that the true population mean lies somewhere in the interval 2.1 ± 0.10
or [2.0, 2.2].
The interval [2.0, 2.2] is a 95% confidence interval because we are 95% confident that
the unknown μ lies between 2.0 and 2.2.
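The arithmetic for the movie example can be written out step by step. The sketch below uses only the numbers given in the example (sample mean 2.1, σ = 0.5, n = 100) and the rounded multiplier 2 for 95%. The variable names are illustrative.

```python
import math

x_bar = 2.1   # sample mean from the SRS of 100 teenagers
sigma = 0.5   # population standard deviation, assumed known
n = 100       # sample size

std_error = sigma / math.sqrt(n)  # standard deviation of the sampling model
margin = 2 * std_error            # about 95% of sample means fall within 2 of these
lower, upper = x_bar - margin, x_bar + margin

print(f"standard deviation of the sampling model: {std_error:.2f}")
print(f"margin of error: {margin:.2f}")
print(f"95% confidence interval: [{lower:.1f}, {upper:.1f}]")
```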
How do we construct confidence intervals?
Start with sample data. Compute an interval that has probability C of containing the true
value of the parameter. This is called a level C confidence interval.
Since the sampling model of the sample mean $\bar{x}$ is approximately normal, we can use
normal calculations to construct confidence intervals.
For a 95% confidence interval, we want the interval corresponding to the middle 95% of
the normal curve.
For a 90% confidence interval, we want the interval corresponding to the middle 90% of
the normal curve.
And so on…
If we are using the standard normal curve, we want to find the interval using z-values.
Suppose we want to find a 90% confidence interval for a standard normal curve. If the
middle 90% lies within our interval, then the remaining 10% lies outside our interval.
Because the curve is symmetric, there is 5% below the interval and 5% above the
interval. Find the z-values with area 5% below and 5% above.
These z-values are denoted ± z*. Because they come from the standard normal curve,
they are centered at mean 0.
z* is called the upper p critical value: it is the value with probability p lying to its
right under the standard normal curve. For a level C interval, p = (1 - C)/2.
For a 95% confidence interval, we want the upper p critical value with p = 2.5% = 0.025.
For a 99% confidence interval, we want the upper p critical value with p = 0.5% = 0.005.
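Instead of a table, the critical values can be looked up with the inverse of the standard normal cumulative distribution. The sketch below uses NormalDist from Python's standard statistics module (available in Python 3.8 and later). The helper name critical_value is hypothetical.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Return z* so that the central area under the standard normal curve is `confidence`."""
    upper_tail = (1 - confidence) / 2            # probability p to the right of z*
    return NormalDist().inv_cdf(1 - upper_tail)  # area to the left of z* is 1 - p


for level in (0.90, 0.95, 0.99):
    p = (1 - level) / 2
    print(f"C = {level:.0%}: p = {p:.3f}, z* = {critical_value(level):.3f}")
```

The values should agree with the usual table entries of roughly 1.645, 1.960, and 2.576.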
Remember that z-values tell us how many standard deviations we are above or below the
mean.
To construct a 95% confidence interval, we want to find the values 1.96 standard
deviations below the mean and 1.96 standard deviations above the mean, or μ ± 1.96σ.
Using our sample data, this is $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$, assuming the population is at least 10n.
In general, to construct a level C confidence interval using our sample data, we want to
find $\bar{x} \pm z^* \, \sigma/\sqrt{n}$.
The margin of error is $z^* \, \sigma/\sqrt{n}$. Note that the margin of error is a positive number. It is
not an interval.
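The general formula, sample mean ± z* times σ over the square root of n, can be wrapped in a small function. This is a minimal sketch, assuming σ is known and the conditions for the procedure hold. The function name z_interval is hypothetical, and the inputs are the movie example's numbers.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Return (lower, upper, margin) for x_bar +/- z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)  # a positive number, not an interval
    return x_bar - margin, x_bar + margin, margin


lower, upper, margin = z_interval(x_bar=2.1, sigma=0.5, n=100, confidence=0.95)
print(f"margin of error: {margin:.3f}")
print(f"95% interval: [{lower:.3f}, {upper:.3f}]")
```

With z* = 1.96 the margin comes out slightly under the 0.10 obtained with the rounded multiplier 2.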
We would like high confidence and a small margin of error.
A higher confidence level means a higher percentage of all samples produce an interval
that captures the true value of the parameter. Therefore we want a high level of confidence.
A smaller margin of error allows us to get closer to the true value of the parameter, so we
want a small margin of error.
So how do we reduce the margin of error?
▪ Lower the confidence level (by decreasing the value of z*)
▪ Lower the standard deviation
▪ Increase the sample size. To cut the margin of error in half, increase the sample
size to four times the previous size.
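The three options can be compared numerically. The sketch below is illustrative: it reuses σ = 0.5 and n = 100 from the movie example, takes z* = 1.96 for 95% and z* = 1.645 for 90%, and uses σ = 0.25 as a made-up smaller standard deviation. The function name margin_of_error is hypothetical.

```python
import math


def margin_of_error(z_star, sigma, n):
    """Margin of error z* * sigma / sqrt(n)."""
    return z_star * sigma / math.sqrt(n)


base = margin_of_error(z_star=1.96, sigma=0.5, n=100)
lower_conf = margin_of_error(z_star=1.645, sigma=0.5, n=100)    # 90% instead of 95%
smaller_sigma = margin_of_error(z_star=1.96, sigma=0.25, n=100)  # assumed smaller sigma
quadrupled = margin_of_error(z_star=1.96, sigma=0.5, n=400)      # four times the sample size

print(f"baseline (95%, n = 100):  {base:.4f}")
print(f"lower confidence (90%):   {lower_conf:.4f}")
print(f"smaller sigma (0.25):     {smaller_sigma:.4f}")
print(f"four times the sample:    {quadrupled:.4f}")
```

Quadrupling n halves the margin because n sits under a square root.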
** You can have high confidence and a small margin of error if you choose the right
sample size.**
To determine the sample size n that will yield a confidence interval for a population mean
with a specified margin of error m, set the expression for the margin of error to be less
than or equal to m and solve for n.
$z^* \frac{\sigma}{\sqrt{n}} \le m$ OR $n \ge \left( \frac{z^* \sigma}{m} \right)^2$
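Solving the margin-of-error inequality for n gives n at least (z* σ / m) squared, and since n must be a whole number the result is rounded up. The sketch below is a minimal illustration. The function name required_sample_size and the target margin m = 0.05 are assumptions made for the example.

```python
import math


def required_sample_size(z_star, sigma, m):
    """Smallest whole n satisfying z* * sigma / sqrt(n) <= m."""
    return math.ceil((z_star * sigma / m) ** 2)  # always round up, never down


# 95% confidence (z* = 1.96), sigma = 0.5, desired margin of error 0.05 (assumed target).
n = required_sample_size(z_star=1.96, sigma=0.5, m=0.05)
print(f"required sample size: {n}")
print(f"resulting margin of error: {1.96 * 0.5 / math.sqrt(n):.4f}")
```

Here (1.96 × 0.5 / 0.05) squared is about 384.16, so 385 teenagers are needed. Rounding down to 384 would leave the margin slightly above 0.05.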
CAUTION!!
These methods only apply to certain situations. In order to construct a level C confidence
interval using the formula $\bar{x} \pm z^* \, \sigma/\sqrt{n}$:
1) the data must be an SRS
2) we must know the population standard deviation
3) we want to eliminate (if possible) any outliers.
The margin of error only covers random sampling errors.
Things like under-coverage, non-response, and poor sampling designs can cause
additional errors.
STEPS TO CONSTRUCT A CONFIDENCE INTERVAL:
1. Identify the population of interest and the parameter you want to draw conclusions
about. (μ = the true mean….)
2. Verify conditions are met/Assumptions (SRS, approx. normal, pop at least 10n)
3. Name the procedure (1 sample mean Z interval)/write the formula, do calculations
4. Interpret results in the context of the problem. (Based on this sample, I am ___%
confident that the true mean is between ____ and ____)
Using a TI-83, press STAT – TESTS – 7:ZInterval, adjust your settings, and choose Calculate.