Lecture 14
Sections 7.1 – 7.2
Objectives:
•Estimation and Statistical Intervals
− Point Estimation
− Large sample confidence interval
Statistical Inference
When we cannot get information from the entire population, we take a
sample.
1. However, as we have seen before, statistics calculated from samples
vary from sample to sample (recall the sampling distribution).
2. When we obtain a statistic from a sample, we do not expect it to be
the same as the corresponding parameter.
3. A single number (point estimate) or an interval of numbers (interval
estimate) can be used to estimate a population parameter.
4. It would be desirable to have a range of plausible values which take
into account the sampling distribution of the statistic. A range of
values will capture the value of the parameter of interest with some
level of confidence. This is known as a confidence interval (CI).
5. We can find confidence intervals for any parameter of interest;
however, in this chapter we will be primarily concerned with CIs for a
population mean μ, a population proportion π, and a difference of
population means μ1 − μ2.
Point Estimation
Definition. A point estimate of some parameter (unknown) is a
single number, calculated from sample data, that can be regarded as
an educated guess for the value of the parameter.
Example.
x̄ = 86°F: a single number for the average temperature in summer in
Auburn.
μ = the actual average temperature in summer in Auburn, which is
unknown.
A point estimate is usually obtained by selecting a suitable statistic
and calculating its value for the given sample data. The statistic used
to calculate an estimate is called an estimator.
Some Point Estimators
Point estimator for a population mean μ is the sample mean x̄.
Point estimator for a population variance σ² is the sample
variance s².
Point estimator for a population proportion π is the sample
proportion p.
Point estimator for a population correlation coefficient ρ is the
sample correlation coefficient r.
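As a small illustration of these four estimators, the sketch below computes x̄, s², p and r in Python. The data values are hypothetical, invented only for this example, and r is computed directly from its definition rather than with a library call.

```python
import statistics

# Hypothetical paired measurements, invented purely for illustration.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
# Hypothetical yes/no outcomes (1 = success, 0 = failure).
outcomes = [1, 0, 1, 1, 0, 1, 0, 1]

x_bar = statistics.mean(x)          # point estimate of mu
s2 = statistics.variance(x)         # point estimate of sigma^2 (divisor n - 1)
p = sum(outcomes) / len(outcomes)   # point estimate of pi

# Sample correlation coefficient r, the point estimate of rho.
y_bar = statistics.mean(y)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)
r = sxy / (sxx * syy) ** 0.5

print(f"x_bar = {x_bar}, s^2 = {s2}, p = {p}, r = {r:.2f}")
```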
Properties of a good estimator
1) Unbiasedness
Definition. Denote a population parameter generically by the letter
θ and denote any estimator of this parameter by θ̂. Then θ̂ is an
unbiased estimator if the mean of its sampling distribution is equal to
the true value of the parameter being estimated, i.e., E(θ̂) = θ.
Otherwise, it is said to be biased, and the quantity E(θ̂) − θ is
called the bias of θ̂.
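One way to see what unbiasedness means is to simulate many samples and average an estimator over them. The sketch below is only an illustration under assumed settings (a normal population with variance 4, samples of size 5, a fixed random seed): it compares the sample variance with divisor n − 1 against the version with divisor n.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable

TRUE_VAR = 4.0   # assumed population variance (sigma = 2)
N = 5            # small sample size, where the bias is easiest to see
REPS = 20000     # number of simulated samples

def simulate_variance_estimators():
    """Average two variance estimators over many simulated samples."""
    unbiased_total = 0.0
    biased_total = 0.0
    for _ in range(REPS):
        sample = [random.gauss(0.0, TRUE_VAR ** 0.5) for _ in range(N)]
        m = sum(sample) / N
        ss = sum((v - m) ** 2 for v in sample)  # sum of squared deviations
        unbiased_total += ss / (N - 1)          # s^2, divisor n - 1
        biased_total += ss / N                  # divisor n
    return unbiased_total / REPS, biased_total / REPS

mean_s2, mean_biased = simulate_variance_estimators()
print(f"average of s^2 (divisor n - 1): {mean_s2:.3f}")
print(f"average with divisor n:         {mean_biased:.3f}")
```

The first average should land near the true variance 4, while the second should land near 4(n − 1)/n = 3.2, which is the bias the definition describes.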
Example
• x̄ is an unbiased estimator of μ.
• s² is an unbiased estimator of σ².
• p is an unbiased estimator of π.
2) Consistency
Definition. If the probability that an estimator θ̂ falls close to a
population parameter θ can be made as near to 1 as desired by
increasing the sample size n, then θ̂ is said to be a consistent
estimator of θ.
Example
• x̄ is a consistent estimator of μ.
• s² is a consistent estimator of σ².
• p is a consistent estimator of π.
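Consistency can also be illustrated by simulation. The sketch below assumes a normal population with μ = 10 and σ = 3 and takes "close" to mean within 0.5 of μ; these values, and the seed, are arbitrary choices for illustration. It estimates the probability that x̄ falls close to μ for increasing n.

```python
import random

random.seed(1)          # fixed seed so the run is repeatable

MU, SIGMA = 10.0, 3.0   # assumed population mean and standard deviation
TOL = 0.5               # "close" means within 0.5 of mu
REPS = 2000             # simulated samples per sample size

def prob_close(n):
    """Estimate P(|x_bar - mu| < TOL) for samples of size n by simulation."""
    hits = 0
    for _ in range(REPS):
        x_bar = sum(random.gauss(MU, SIGMA) for _ in range(n)) / n
        if abs(x_bar - MU) < TOL:
            hits += 1
    return hits / REPS

results = {n: prob_close(n) for n in (10, 100, 400)}
for n, prob in results.items():
    print(f"n = {n:4d}: estimated P(|x_bar - mu| < {TOL}) = {prob:.3f}")
```

The estimated probability should climb toward 1 as n grows, which is what the definition requires.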
Large-sample confidence intervals for a population
mean
A point estimate, since it is a single number, provides no information
about the precision and reliability of the estimation.
Example. Use the sample mean to calculate a point estimate for the
true average height of students at AU, and suppose that x̄ = 170 cm.
Because of sampling variability, μ will almost never equal x̄ exactly.
The point estimate (170 cm) says nothing about how close it might be
to μ. Instead, we calculate and report an entire interval of plausible
values: an interval estimate, or CI.
Confidence Interval
A CI is usually constructed in such a way that we have a certain
confidence that the interval does contain the unknown parameter.
• A CI is always calculated by first selecting a confidence level, which
is a measure of the degree of reliability of the interval.
• The higher the confidence level, the more strongly we believe that
the value of the parameter being estimated lies within the interval.
• The precision of an interval estimate is conveyed by the width of the
interval. If the confidence level is high and the resulting interval is quite
narrow, our knowledge of the value of the parameter is reasonably
precise.
By the CLT, for large n, x̄ ~ N(μ, σ²/n) approximately, and so
\[ P\left(-z_{\alpha/2} < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \approx 1-\alpha \]
where 1 − α is the degree of confidence or confidence level and z_{α/2} is
the upper α/2 critical value of the standard normal distribution, i.e. the
value with area α/2 to its right.
Confidence Interval for a population mean μ
A 100(1−α)% CI for μ is given by
\[ \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
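The interval comes from solving the standardized inequality for μ. A short sketch of the algebra, starting from the event that (x̄ − μ)/(σ/√n) lies between −z_{α/2} and z_{α/2}:

```latex
\begin{align*}
-z_{\alpha/2} < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}
&\iff -z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \bar{x}-\mu < z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
&\iff \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\end{align*}
```

Since the first event has probability approximately 1 − α, so does the last, and its two endpoints are the limits of the CI.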
The most common levels of confidence are 90%, 95% and 99%, and the
corresponding z critical values are given in the table below.
α       Confidence level 100(1−α)%     z_{α/2}
0.10    90%                            1.645
0.05    95%                            1.96
0.01    99%                            2.576
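These critical values can be checked numerically. The sketch below uses `NormalDist().inv_cdf` from Python's standard `statistics` module; the helper name `z_critical` is made up for this example.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Upper alpha/2 critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    level = 100 * (1 - alpha)
    print(f"alpha = {alpha:.2f}  level = {level:.0f}%  z = {z_critical(alpha):.3f}")
```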
Things to note
1. We can use this formula when (a) n is sufficiently large (say, n > 30) and
(b) σ is known.
2. In practice, it is unrealistic to know σ. The sample standard deviation (s)
can replace σ in the formula if n is sufficiently large.
3. We should not use the above formula when n is small and σ is unknown.
In that case we need the t-distribution (Section 7.4) or nonparametric
statistics (not covered in this course).
Examples
Suppose a large hospital wants to estimate the average length of time
patients remain in the hospital. Since finding the average of all patient stays
is difficult and time-consuming, we select an SRS of 100 previous patients’
records and find the average of these stays to be 4.53 days with a standard
deviation of 3.68 days. Construct a 95% confidence interval for μ. Also
construct 90% and 99% CIs.
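One way to work this example is sketched below, using x̄ ± z·s/√n with s in place of σ since n = 100 is large. The function name `large_sample_ci` is hypothetical; the z values are the ones tabulated above.

```python
from math import sqrt

def large_sample_ci(x_bar, s, n, z):
    """Return (lower, upper) for x_bar +/- z * s / sqrt(n)."""
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Values from the example: n = 100, x_bar = 4.53 days, s = 3.68 days.
# z critical values: 90% -> 1.645, 95% -> 1.96, 99% -> 2.576.
intervals = {
    level: large_sample_ci(4.53, 3.68, 100, z)
    for level, z in ((90, 1.645), (95, 1.96), (99, 2.576))
}
for level, (lower, upper) in intervals.items():
    print(f"{level}% CI: ({lower:.2f}, {upper:.2f}) days")
```

The 95% interval comes out to roughly (3.81, 5.25) days; the 90% interval is narrower and the 99% interval wider, as expected.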
Examples
The alternating current (AC) breakdown voltage of an insulating liquid
indicates its dielectric strength. The following data are breakdown
voltages (kV) of a particular circuit under certain conditions:
62 50 53 57 41 53 55 61 59 64 50 53 64 62 50 68
54 55 57 50 55 50 56 55 46 55 53 54 52 47 47 55
57 48 63 57 57 55 53 59 53 52 50 55 60 50 56 58
a. Find a point estimate for the true average breakdown voltage.
b. Construct a 95% confidence interval for the true average breakdown voltage.
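A possible way to carry out parts (a) and (b) is sketched below. It uses the large-sample formula with s in place of σ (n = 48) and the 95% critical value 1.96; the variable names are chosen for this example only.

```python
from math import sqrt
from statistics import mean, stdev

# Breakdown voltages (kV) copied from the example.
voltages = [
    62, 50, 53, 57, 41, 53, 55, 61, 59, 64, 50, 53, 64, 62, 50, 68,
    54, 55, 57, 50, 55, 50, 56, 55, 46, 55, 53, 54, 52, 47, 47, 55,
    57, 48, 63, 57, 57, 55, 53, 59, 53, 52, 50, 55, 60, 50, 56, 58,
]

n = len(voltages)
x_bar = mean(voltages)     # (a) point estimate of mu
s = stdev(voltages)        # sample standard deviation (divisor n - 1)

half_width = 1.96 * s / sqrt(n)
ci_95 = (x_bar - half_width, x_bar + half_width)   # (b) 95% CI

print(f"n = {n}, x_bar = {x_bar:.2f} kV, s = {s:.2f} kV")
print(f"95% CI: ({ci_95[0]:.2f}, {ci_95[1]:.2f}) kV")
```

With these data the point estimate is about 54.7 kV and the interval is roughly (53.2, 56.2) kV.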
True meaning of C.I.
With 95% confidence, we can say that µ should be within roughly 2
standard deviations (2σ/√n) of our sample mean x̄.
– In 95% of all possible samples of this size n, µ will indeed fall
in our confidence interval.
– In only 5% of samples would x̄ be farther from µ.
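The "95% of all possible samples" reading can be checked by simulation. The sketch below is an illustration under assumed values (a normal population with μ = 50, σ = 10, samples of size 40, σ treated as known); in a simulation the true μ is known, so we can count how often the interval captures it.

```python
import random
from math import sqrt

random.seed(42)          # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0   # assumed true values (known only because we simulate)
N = 40                   # sample size
REPS = 5000              # number of repeated samples
Z = 1.96                 # 95% critical value

def coverage():
    """Fraction of simulated 95% intervals that contain the true mean."""
    covered = 0
    half_width = Z * SIGMA / sqrt(N)   # sigma treated as known
    for _ in range(REPS):
        sample = [random.gauss(MU, SIGMA) for _ in range(N)]
        x_bar = sum(sample) / N
        if x_bar - half_width < MU < x_bar + half_width:
            covered += 1
    return covered / REPS

rate = coverage()
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The fraction should be close to 0.95. Note that it is the interval that varies from sample to sample, not μ.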
Choosing the sample size
The half-width of a CI is called the bound on the error of estimation,
i.e.
\[ z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad z_{\alpha/2}\,\frac{s}{\sqrt{n}} \]
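To get a desired bound on the error (B) by adjusting the sample size n, we use
the following:
- Determine the desired bound on the error (B).
- Use the following formula, rounding n up to the next whole number:
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^{2} \ \text{if } \sigma \text{ is known}, \qquad n = \left(\frac{z_{\alpha/2}\,s}{B}\right)^{2} \ \text{if } \sigma \text{ is unknown} \]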
Examples
Suppose we wish to construct a 95% confidence interval that is to be within
0.5 days of the true value of the average stay. It is known that σ = 3.68 from
a previous study. What is the sample size required to achieve the desired
accuracy?
Revisit the breakdown voltage data example. Find the appropriate sample
size for estimating the average breakdown voltage to within 1 kV with
confidence level 95%.
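Both questions can be answered with n = (z·σ/B)², rounded up. The sketch below uses a hypothetical helper `required_sample_size`; for the voltage question it assumes s ≈ 5.23 kV, the sample standard deviation computed from the 48 listed observations, in place of the unknown σ.

```python
from math import ceil

def required_sample_size(z, sd, bound):
    """Smallest integer n with z * sd / sqrt(n) <= bound."""
    return ceil((z * sd / bound) ** 2)

# Hospital stays: 95% confidence, sigma = 3.68 days, B = 0.5 days.
n_hospital = required_sample_size(1.96, 3.68, 0.5)

# Breakdown voltage: 95% confidence, B = 1 kV.
# Assumption: s = 5.23 kV, computed from the 48 listed voltages.
n_voltage = required_sample_size(1.96, 5.23, 1.0)

print(f"hospital stays:    n = {n_hospital}")
print(f"breakdown voltage: n = {n_voltage}")
```

Rounding up rather than to the nearest integer guarantees the bound B is not exceeded.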
One-sided C.I.s
Sometimes,
An investigator may wish to calculate a 95% CI only upper confidence
bound for true reaction time to a particular stimulus,
A reliability engineer may wish to find only a lower confidence bound for
true average lifetime of components of a certain type.
A large sample upper confidence bound for μ is
\[ \mu < \bar{x} + z_{\alpha}\,\frac{s}{\sqrt{n}} \]
A large sample lower confidence bound for μ is
\[ \mu > \bar{x} - z_{\alpha}\,\frac{s}{\sqrt{n}} \]
For one-sided bounds, the three most commonly used confidence levels, 90%,
95% and 99%, use z_α critical values of 1.28, 1.645 and 2.33, respectively.
Example
A sample of 48 shear strength observations gave a sample mean strength of
17.17 N/mm² and a sample standard deviation of 3.28 N/mm². Find a 95%
lower confidence bound for true average shear strength μ.
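A sketch of the calculation, using the one-sided 95% critical value 1.645 and the lower-bound formula x̄ − z_α·s/√n. The function name is made up for this example.

```python
from math import sqrt

def lower_confidence_bound(x_bar, s, n, z_alpha):
    """Large-sample lower confidence bound: x_bar - z_alpha * s / sqrt(n)."""
    return x_bar - z_alpha * s / sqrt(n)

# Values from the example: n = 48, x_bar = 17.17, s = 3.28 (both in N/mm^2).
bound = lower_confidence_bound(17.17, 3.28, 48, 1.645)
print(f"95% lower confidence bound: {bound:.2f} N/mm^2")
```

So with 95% confidence, the true average shear strength exceeds about 16.39 N/mm².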