Estimated standard error of the sample average

Estimation
Sampling Distributions
• Because estimators are based on random samples, they are
random variates just like data!
• Estimators have distributions called sampling distributions
Say we are interested in the mean Mn mass contained
in bullets manufactured at a particular factory.
Let's use the average mass of Mn in a
sample (of size n) to estimate the
population mean mass: $\hat{\mu} = \bar{x}$
What might the distribution of $\bar{x}$
look like if we take 1000 samples of
size 10 over the course of a week?
Sampling Distributions
Important features of an estimator’s
sampling distribution:
[Figure: (approximate) sampling distribution of $\bar{x}$, built from 1000 samples of size n = 10 bullets and annotated with the sampling-distribution mean and the sampling-distribution s.d.]
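The sampling experiment described above can be sketched in R. This is only an illustration: the population is assumed to be normal, and the values of `mu` and `sigma` are made up, since the slides do not give the factory's actual Mn masses.

```r
set.seed(1)      # for reproducibility
mu    <- 50      # ASSUMED population mean Mn mass (arbitrary units)
sigma <- 5       # ASSUMED population standard deviation
n     <- 10      # sample size (bullets per sample)
nsamp <- 1000    # number of samples

# Draw nsamp samples of size n and record each sample average
xbars <- replicate(nsamp, mean(rnorm(n, mean = mu, sd = sigma)))

mean(xbars)      # sampling-distribution mean, close to mu
sd(xbars)        # sampling-distribution s.d., close to sigma/sqrt(n)
sigma/sqrt(n)    # theoretical s.d. of the sample average
```

Calling `hist(xbars)` afterwards draws the approximate sampling distribution of the sample average.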
Handy Unbiased Estimators
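• An unbiased estimator of the mean that we always use is:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (same as the MLE estimate)
• An unbiased estimator of the variance (which we will typically
use as a variance estimator) is:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ (different from the MLE estimate, which divides by n)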
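A short sketch of the mean and variance estimators in R, using the first six powder masses from the exercise later in these slides as illustrative data. The names `xbar`, `s2` and `s2_mle` are chosen here for illustration, and the MLE shown is the one for a normal model.

```r
x <- c(4.11, 3.70, 3.36, 3.68, 4.42, 3.23)   # illustrative data (mg)
n <- length(x)

xbar   <- sum(x)/n                     # sample average; same as mean(x)
s2     <- sum((x - xbar)^2)/(n - 1)    # unbiased variance estimator; same as var(x)
s2_mle <- sum((x - xbar)^2)/n          # MLE of the variance (normal model); divides by n

c(xbar = xbar, s2 = s2, s2_mle = s2_mle)
```

Note that R's built-in `var()` and `sd()` use the n - 1 divisor, so they return the unbiased variance estimate and its square root.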
Handy Unbiased Estimators
• An unbiased estimator for a proportion is:
$\hat{p} = \frac{\text{number of successes}}{n}$, where a "success" is heads, etc.
• An unbiased estimator of the standard error of p is:
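A minimal sketch for a proportion, using made-up coin-flip data. The slide's own formula for the standard error did not survive in this transcript, so two common choices are shown as assumptions: the plug-in form that divides by n, and the form that divides by n - 1, whose square is an unbiased estimator of Var(p-hat) = p(1 - p)/n.

```r
flips <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 1)   # HYPOTHETICAL data: 1 = success (heads)
n     <- length(flips)

p_hat <- sum(flips)/n                          # sample proportion of successes

se_plugin <- sqrt(p_hat*(1 - p_hat)/n)         # common plug-in standard error
se_nm1    <- sqrt(p_hat*(1 - p_hat)/(n - 1))   # square is unbiased for Var(p_hat)

c(p_hat = p_hat, se_plugin = se_plugin, se_nm1 = se_nm1)
```

For large n the two standard errors are nearly identical.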
Sampling Distributions
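• Uncertainty in the estimate can be represented as the standard
deviation of the sampling distribution:
$\sigma_{\hat{\theta}}$ is called the standard error of the estimator $\hat{\theta}$
• For the sample average, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The estimated standard error of the sample average is obtained by plugging
in s for $\sigma$: $\widehat{se}(\bar{x}) = s/\sqrt{n}$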
Interval Estimation
• We are interested in methods that produce an interval: $[\hat{\theta}_{lo}, \hat{\theta}_{hi}]$ (lower and upper limits)
• Given the assumptions of the method are satisfied, the interval
covers the true value of the parameter with (approximate)
probability at least 1 − α.
• Common interval methods:
• Confidence intervals
• Prediction intervals
• Tolerance intervals
• Credibility/Probability intervals (Bayesian)
Confidence Intervals
• θ is a parameter we are interested in and assume we
don’t know its true value.
• e.g. a mean, a sd, a proportion, etc.
• Consider an experiment that will collect a sample of
data.
• Then BEFORE we collect the data, we can devise a
procedure such that:
$P(\hat{\theta}_{lo} \le \theta \le \hat{\theta}_{hi}) \ge 1 - \alpha$
where $\hat{\theta}_{lo}$ and $\hat{\theta}_{hi}$ are estimates we will get from the sample we have yet to collect
Confidence Intervals
• In order to get actual numerical values for the interval limits
$\hat{\theta}_{lo}$ and $\hat{\theta}_{hi}$ we perform
the experiment and plug in the data
• The outcomes for this experiment are: the interval covers θ, or it does not
• Under the frequentist definition, probabilities (other than 0 or 1)
only exist for outcomes of experiments that haven’t happened yet.
• After we collect data, $[\hat{\theta}_{lo}, \hat{\theta}_{hi}]$
is a set of plausible values for θ.
Confidence Intervals
• Given a sample of data, the (1 − α)×100% confidence interval for a
parameter θ estimated from the sample is: $[\hat{\theta}_{lo}, \hat{\theta}_{hi}]$
• We are (1 − α)×100% confident that the true value of θ is covered by this interval
• The CI’s level of confidence, (1 − α)×100%, is the same “number”
as the CI method’s probability of producing an interval that
covers θ, but…
confidence is not probability
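The statement about the CI method's coverage probability can be checked by simulation. The sketch below repeats a hypothetical experiment many times and counts how often the interval covers the true mean. Everything about the population is an assumption made for illustration (normal, with invented `mu` and `sigma`), and the Student-t interval for a mean is used as the CI method.

```r
set.seed(2)      # for reproducibility
mu    <- 10      # ASSUMED true mean (known only because we simulate)
sigma <- 2       # ASSUMED true standard deviation
n     <- 20      # sample size per experiment
nexp  <- 5000    # number of repeated experiments
alpha <- 0.05

covered <- replicate(nexp, {
  x  <- rnorm(n, mean = mu, sd = sigma)   # one experiment's data
  se <- sd(x)/sqrt(n)                     # estimated standard error of the mean
  tt <- qt(1 - alpha/2, df = n - 1)       # Student-t quantile
  (mean(x) - tt*se <= mu) && (mu <= mean(x) + tt*se)   # did this interval cover mu?
})

mean(covered)    # fraction of intervals that cover mu; should be near 1 - alpha
```

Each individual interval either covers `mu` or it does not. The 1 − α figure describes the long-run behaviour of the method over many experiments.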
Confidence Intervals
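• So how do we compute a (1 − α)×100% confidence interval given
a set of data?
• General Case: (1 − α)×100% CIs for the mean μ:
• Sample size n, sd $\sigma_X$ unknown and estimated by s:
Two sided: $[\,\bar{x} - t_{1-\alpha/2,\,n-1}\, s/\sqrt{n},\ \ \bar{x} + t_{1-\alpha/2,\,n-1}\, s/\sqrt{n}\,]$
One sided, lower bound: $[\,\bar{x} - t_{1-\alpha,\,n-1}\, s/\sqrt{n},\ \ \infty)$
One sided, upper bound: $(-\infty,\ \ \bar{x} + t_{1-\alpha,\,n-1}\, s/\sqrt{n}\,]$
Student-t(n-1) quantiles: qt(1-alpha/2, df=n-1) or qt(1-alpha, df=n-1)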
Compute the Confidence Intervals
The mass of an unknown powder was determined 30
times. The results are shown below (units: mg):
4.11, 3.70, 3.36, 3.68, 4.42, 3.23,
4.03, 4.03, 3.52, 4.75, 5.09, 3.47,
3.02, 4.24, 4.74, 4.51, 2.90, 4.15,
3.54, 3.81, 2.98, 3.82, 4.32, 3.06,
4.00, 4.05, 3.19, 3.17, 3.67, 4.37
Compute:
a. The sample mean:
b. The sample sd:
c. The estimated standard error of the mean:
d. The number of estimated standard errors that cover 95% of the sampling
distribution symmetrically about the sample mean: ±
Compute the Confidence Intervals
a. Sample mean = 3.83
b. Sample sd = 0.58
c. Est se of mean = 0.11
d. For 95%, α = 0.05.
For a 95% spread symmetric
about the mean we want $t_{0.025,\,29}$
and $t_{0.975,\,29}$ = ±2.04523
# Data from the question:
x <- c(4.11, 3.70, 3.36, 3.68, 4.42, 3.23, 4.03, 4.03, 3.52, 4.75, 5.09,
3.47, 3.02, 4.24, 4.74, 4.51, 2.90, 4.15, 3.54, 3.81, 2.98, 3.82,
4.32, 3.06, 4.00, 4.05, 3.19, 3.17, 3.67, 4.37)
n <- length(x)       # Sample size
mn <- mean(x)        # Sample average (estimated mean)
s <- sd(x)           # Sample standard deviation
se <- s/sqrt(n)      # Estimated standard error of the mean
alpha <- 0.05                  # Level of significance
conf <- 1 - alpha/2            # Upper quantile probability for a two-sided
                               # interval (the confidence level is 1 - alpha)
tt <- qt(p = conf, df = n-1)   # t-quantile: The number of estimated standard
                               # errors that cover (1-alpha)*100% of the
                               # sampling distribution for the mean.
Compute the Confidence Intervals
e. Compute the two-sided 95% CI for the mean given this data:
[ 3.83 - 2.04*0.11, 3.83 + 2.04*0.11 ]
lo <- mn - tt*se
hi <- mn + tt*se
c(lo,hi)   # Two-sided confidence interval: a set of
           # plausible values for the mean given this sample.
[3.61, 4.05]
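As a cross-check on the hand computation, base R's `t.test()` returns the same Student-t interval for the mean. The block below is a self-contained sketch that repeats the data so it can be run on its own. The names `ci_manual` and `ci_ttest` are chosen here for illustration.

```r
# Powder masses from the question (mg)
x <- c(4.11, 3.70, 3.36, 3.68, 4.42, 3.23, 4.03, 4.03, 3.52, 4.75, 5.09,
       3.47, 3.02, 4.24, 4.74, 4.51, 2.90, 4.15, 3.54, 3.81, 2.98, 3.82,
       4.32, 3.06, 4.00, 4.05, 3.19, 3.17, 3.67, 4.37)

n  <- length(x)
se <- sd(x)/sqrt(n)                  # estimated standard error of the mean
tt <- qt(1 - 0.05/2, df = n - 1)     # two-sided 95% t-quantile

ci_manual <- c(mean(x) - tt*se, mean(x) + tt*se)           # interval by hand
ci_ttest  <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)   # interval from t.test()

round(ci_manual, 2)   # should agree with the interval computed by hand above
round(ci_ttest, 2)    # same interval via t.test()
```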
Confidence Intervals
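• For us, we can approximate the CI for any parameter we have
encountered as $\hat{\theta} \pm t \cdot \widehat{se}(\hat{\theta})$
• (1 − α)×100% CIs for general parameter θ:
Two sided: $[\,\hat{\theta} - t_{1-\alpha/2,\,n-1}\, \widehat{se}(\hat{\theta}),\ \ \hat{\theta} + t_{1-\alpha/2,\,n-1}\, \widehat{se}(\hat{\theta})\,]$
One sided, lower bound: $[\,\hat{\theta} - t_{1-\alpha,\,n-1}\, \widehat{se}(\hat{\theta}),\ \ \infty)$
One sided, upper bound: $(-\infty,\ \ \hat{\theta} + t_{1-\alpha,\,n-1}\, \widehat{se}(\hat{\theta})\,]$
Student-t(n-1) quantiles: qt(1-alpha/2, df=n-1) or qt(1-alpha, df=n-1)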
Example
Over a several-month period the rate of attacks per day on a
certain computer network was measured:
11.1, 12.3, 12.0, 11.3, 12.6, 12.9, 12.0, 13.2,
11.8, 13.2, 12.4, 10.3, 12.0, 12.1, 13.1
Compute the 90% lower confidence limit of the hack
rate parameter.
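One way to set this up in R is sketched below. It assumes the "hack rate parameter" is the mean daily attack rate and that the one-sided Student-t interval for a mean applies. The variable names are chosen here for illustration.

```r
# Daily attack rates from the example
rate <- c(11.1, 12.3, 12.0, 11.3, 12.6, 12.9, 12.0, 13.2,
          11.8, 13.2, 12.4, 10.3, 12.0, 12.1, 13.1)

n     <- length(rate)
alpha <- 0.10                        # 90% confidence
se    <- sd(rate)/sqrt(n)            # estimated standard error of the mean
tt    <- qt(1 - alpha, df = n - 1)   # one-sided t-quantile (note: 1 - alpha, not 1 - alpha/2)

lower <- mean(rate) - tt*se          # 90% lower confidence limit
lower

# Cross-check: t.test() with alternative = "greater" gives the interval [lower, Inf)
lower_ttest <- t.test(rate, alternative = "greater", conf.level = 0.90)$conf.int[1]
lower_ttest
```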