Lecture 8
What is a statistical model?
A statistical model for some data is a set $\{f_\theta : \theta \in \Omega\}$ of distributions,
one of which corresponds to the true unknown distribution that
produced the data.
The variable θ is called the parameter of the model, and the set Ω
is called the parameter space.
From the definition of a statistical model, we see that there is a
unique value $\theta \in \Omega$ such that $f_\theta$ is the true distribution that
generated the data. We refer to this value as the true parameter
value.
Example: Suppose we have observations of heights in cm of
individuals in a population and we feel that it is reasonable to
assume that the distribution of height of the population is normal
with some unknown mean and variance. The statistical model in
this case is $\{N(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 > 0\}$.
Goals of Statistics:
• Estimate unknown parameters of underlying probability
distribution.
• Measure errors of these estimates.
• Test whether data gives evidence that parameters are (or are not)
equal to a certain value or that the probability distribution has a
particular form.
Point Estimation
Most statistical procedures involve estimation of the unknown
value of the parameter of the statistical model.
A point estimator of the parameter θ is a function of the
underlying random variables and so it is a random variable with a
distribution function.
A point estimate of the parameter θ is a function of the data; it is a
statistic. For a given sample an estimate is a number.
Notation: $\hat{\theta}$ denotes a point estimator (or point estimate) of θ.
Desirable properties of a point estimator:
• Unbiased
• Consistent
• Minimum variance
• With known probability distribution
Definition: Let $\hat{\theta}$ be a point estimator for a parameter θ. Then $\hat{\theta}$ is
an unbiased estimator if $E(\hat{\theta}) = \theta$.
Note: An unbiased estimator for θ may not always exist.
An estimator that is unbiased for θ is not necessarily unbiased for g(θ).
Example (of unbiased estimator): The sample mean is an unbiased
estimator of the population mean.
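If $E(\hat{\theta}) \neq \theta$, then $\hat{\theta}$ is called biased.
Definition: The bias of a point estimator $\hat{\theta}$ is given by
$B(\hat{\theta}) = E(\hat{\theta}) - \theta$.
Definition: The mean square error of a point estimator $\hat{\theta}$ is
$MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$.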
Note: $MSE(\hat{\theta}) = V(\hat{\theta}) + [B(\hat{\theta})]^2$.
Proof:
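A sketch of the standard argument for the decomposition of the mean square error into variance plus squared bias. The symbols $B(\hat{\theta}) = E(\hat{\theta}) - \theta$ for the bias and $V(\hat{\theta})$ for the variance are notational assumptions here. The idea is to add and subtract $E(\hat{\theta})$ inside the square and note that the cross term has expectation zero.

```latex
\begin{align*}
MSE(\hat{\theta})
  &= E\big[(\hat{\theta} - \theta)^2\big] \\
  &= E\big[\big(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\big)^2\big] \\
  &= E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]
     + 2\,\big(E(\hat{\theta}) - \theta\big)\,E\big[\hat{\theta} - E(\hat{\theta})\big]
     + \big(E(\hat{\theta}) - \theta\big)^2 \\
  &= V(\hat{\theta}) + 0 + [B(\hat{\theta})]^2 .
\end{align*}
```

The middle term vanishes because $E(\hat{\theta}) - \theta$ is a constant and $E[\hat{\theta} - E(\hat{\theta})] = 0$.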
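Example:
Suppose $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, $V(\hat{\theta}_1) = \sigma_1^2$, and $V(\hat{\theta}_2) = \sigma_2^2$.
Consider $\hat{\theta}_3 = a\hat{\theta}_1 + (1 - a)\hat{\theta}_2$.
(a) Show that $\hat{\theta}_3$ is an unbiased estimator for θ;
(b) If $\hat{\theta}_1$ and $\hat{\theta}_2$ are independent, how should the constant a be
chosen to minimize the variance of $\hat{\theta}_3$?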
Solution:
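One possible worked solution, sketched under the assumption that the example concerns two unbiased estimators $\hat{\theta}_1$, $\hat{\theta}_2$ of θ with variances $\sigma_1^2$, $\sigma_2^2$ and the combined estimator $\hat{\theta}_3 = a\hat{\theta}_1 + (1-a)\hat{\theta}_2$ (the subscripts are a reading of the statement, not confirmed by the notes):

```latex
\begin{align*}
\text{(a)}\quad E(\hat{\theta}_3)
  &= a\,E(\hat{\theta}_1) + (1-a)\,E(\hat{\theta}_2)
   = a\theta + (1-a)\theta = \theta . \\[4pt]
\text{(b)}\quad V(\hat{\theta}_3)
  &= a^2\sigma_1^2 + (1-a)^2\sigma_2^2
  \qquad \text{(independence removes the covariance term)} \\
\frac{d}{da}\,V(\hat{\theta}_3)
  &= 2a\sigma_1^2 - 2(1-a)\sigma_2^2 = 0
  \;\Longrightarrow\;
  a = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} . \\
\frac{d^2}{da^2}\,V(\hat{\theta}_3)
  &= 2(\sigma_1^2 + \sigma_2^2) > 0
  \qquad \text{(so this value of } a \text{ is a minimum)} .
\end{align*}
```

The weight on $\hat{\theta}_1$ grows as $\sigma_2^2$ grows, so the less variable estimator receives the larger weight.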
Examples of Unbiased Point Estimators
We denote by $\sigma_{\hat{\theta}}^2$ the variance of the sampling distribution of the
estimator $\hat{\theta}$; $\sigma_{\hat{\theta}} = \sqrt{\sigma_{\hat{\theta}}^2}$ is called the standard error of the
estimator.
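Claim: Let $Y_1, \ldots, Y_n$ be a random sample of size n from a
population with mean µ and variance $\sigma^2$. Then the sample variance
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ is an unbiased estimator of the population
variance $\sigma^2$, but $\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ is a biased estimator of $\sigma^2$.
Proof: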
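Separately from a formal proof, the claim about dividing by n - 1 versus n can be checked exactly on a small case. The sketch below assumes a made-up four-value population sampled with replacement (so the draws are independent with mean µ and variance σ²) and enumerates every possible sample of size 3. The population values and all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean of a sequence of Fractions."""
    return sum(values, Fraction(0)) / len(values)


def sum_sq_dev(sample):
    """Sum of squared deviations of the sample values from the sample mean."""
    m = mean(sample)
    return sum((y - m) ** 2 for y in sample)


# Hypothetical population; each value is equally likely on every draw.
population = [Fraction(v) for v in (1, 2, 3, 6)]
n = 3

mu = mean(population)
sigma2 = mean([(y - mu) ** 2 for y in population])  # population variance

# All equally likely ordered samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))

expected_s2 = mean([sum_sq_dev(s) / (n - 1) for s in samples])  # divisor n - 1
expected_biased = mean([sum_sq_dev(s) / n for s in samples])    # divisor n

print("population variance:     ", sigma2)
print("E[S^2] (divisor n - 1):  ", expected_s2)
print("E[divisor-n estimator]:  ", expected_biased)
```

With exact fractions, the average of the divisor-(n - 1) estimator over all samples matches the population variance, while the divisor-n version comes out smaller by the factor (n - 1)/n.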
Goodness of Point Estimator
Definition: The error of estimation is the distance between an
estimator and its target parameter.
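Suppose $\hat{\theta}$ is an unbiased estimator of θ and has a sampling
distribution. Select a number b and consider $P(|\hat{\theta} - \theta| < b)$,
the probability that the error of estimation is less than b.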
Example: A sample of n = 1000 voters, randomly selected from a
city, showed y = 560 in favor of candidate Jones. Estimate p, the
fraction of voters in the population favouring Jones, and place a
2-standard-error bound on the error of estimation.
Solution:
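A minimal sketch of the arithmetic, assuming the usual estimator p̂ = y/n with estimated standard error sqrt(p̂(1 - p̂)/n). The function name `two_se_bound` is made up for illustration.

```python
from math import sqrt


def two_se_bound(successes, n):
    """Return the sample proportion and twice its estimated standard error."""
    p_hat = successes / n
    std_err = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    return p_hat, 2 * std_err


p_hat, bound = two_se_bound(560, 1000)
print(f"p_hat = {p_hat:.2f}, 2-standard-error bound = {bound:.4f}")
```

Under these assumptions the estimate is 0.56 with a bound of roughly 0.03 on the error of estimation.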
Example: (#8.24) Results of a public opinion poll reported on the
Internet indicated that 69% of respondents rated the cost of
gasoline as a crisis or major problem. The article states that 1001
adults, age 18 or older, were interviewed and that the results have a
sampling error of 3%. How was the 3% calculated, and how should
it be interpreted? Can we conclude that a majority of the
individuals in the 18+ age group felt that cost of gasoline was a
crisis or major problem?
Solution:
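A sketch of one plausible reading: the reported 3% is approximately a 2-standard-error bound for the sample proportion, computed with p̂ = 0.69 and n = 1001. That the pollsters used exactly this formula is an assumption.

```latex
\[
2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
  = 2\sqrt{\frac{(0.69)(0.31)}{1001}}
  \approx 2(0.0146)
  \approx 0.029 \approx 3\% .
\]
```

Under this reading, the estimate is within about 3 percentage points of the population proportion with probability of roughly 0.95. Since 0.69 - 0.03 = 0.66 is well above 0.50, the data would support the majority claim, provided the respondents can be treated as a random sample of the 18+ population, which a poll reported on the Internet does not by itself guarantee.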
Confidence Intervals
A point estimate provides no information about the precision and
reliability of estimation. For example, the sample mean $\bar{y}$ is a point
estimate of the population mean μ, but because of sampling
variability, it is virtually never the case that $\bar{y} = \mu$. A point
estimate says nothing about how close it might be to μ.
An alternative to reporting a single sensible value for the parameter
being estimated is to calculate and report an entire interval of
plausible values – a confidence interval (CI).
Properties of the interval:
- It contains the true parameter θ;
- It is relatively narrow.
The upper and lower endpoints of a CI are called the upper and
lower confidence limits.
The probability that a CI will enclose θ is called the confidence
coefficient, denoted by $1 - \alpha$.
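Definition: A $100(1-\alpha)\%$ confidence interval for a parameter θ
is a random interval $[\hat{\theta}_L, \hat{\theta}_U]$ such that $P(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha$
regardless of the value of θ.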
A confidence level is a measure of the degree of reliability of a
confidence interval. It is denoted as 100(1-α)%. The most
frequently used confidence levels are 90%, 95% and 99%.
The higher the confidence level, the more strongly we believe that
the true value of the parameter being estimated lies within the
interval.
Deriving a Confidence Interval
Suppose $Y_1, \ldots, Y_n$ are a random sample and we observed the data
$y_1, \ldots, y_n$, which are the realization of these random variables.
We want a CI for some parameter θ.
Pivotal method:
To derive this CI we need to find another random variable that is
typically a function of the estimator of θ satisfying:
1) It depends on $Y_1, \ldots, Y_n$ and θ
2) Its probability distribution does not depend on θ or any other
unknown parameter.
Such a random variable is called a “pivot”.
Example: Suppose we are to obtain a single observation Y~Exp(θ).
Use Y to form a CI for θ with confidence coefficient 0.90, or 90%
confidence level.
Solution:
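A sketch of the pivotal method for this example, assuming Exp(θ) is parameterized by its mean θ, so that U = Y/θ follows an exponential distribution with mean 1 and serves as the pivot. Splitting the 0.10 equally between the two tails is also an assumption; the function name `exp_mean_ci` and the observation y = 1.0 are made up.

```python
from math import log


def exp_mean_ci(y, conf=0.90):
    """Equal-tailed CI for theta from one observation y, assuming Y/theta ~ Exp(1)."""
    alpha = 1 - conf
    a = -log(1 - alpha / 2)  # P(U <= a) = alpha/2 for U ~ Exp(1)
    b = -log(alpha / 2)      # P(U >= b) = alpha/2 for U ~ Exp(1)
    # a <= y/theta <= b  is equivalent to  y/b <= theta <= y/a
    return y / b, y / a


lower, upper = exp_mean_ci(1.0)  # y = 1.0 is a made-up observation
print(f"90% CI for theta: ({lower:.3f}, {upper:.3f})")
```

The interval scales with the observed y, so in general it has the form (y/b, y/a) with a = -ln(0.95) and b = -ln(0.05).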
Example:
{
Show that is a pivotal quantity. Use it to find a 90% lower
confidence limit for θ.
Solution:
Large-Sample Confidence Intervals
Example: Let $\hat{\theta}$ be a statistic with $\hat{\theta} \sim N(\theta, \sigma_{\hat{\theta}}^2)$. Find a confidence
interval for θ with a confidence coefficient $1 - \alpha$.
Solution:
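One way to carry this out, sketched under the assumption that the standard error $\sigma_{\hat{\theta}}$ is known and that $z_{\alpha/2}$ denotes the value with upper-tail area α/2 under the standard normal curve (this notation is an assumption): standardize $\hat{\theta}$ and use the result as a pivot.

```latex
\begin{align*}
Z &= \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \sim N(0, 1), \\
1 - \alpha
  &= P\left(-z_{\alpha/2} \le \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \le z_{\alpha/2}\right) \\
  &= P\left(\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \theta
            \le \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}\right),
\end{align*}
```

so the interval is $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$. In large samples, $\sigma_{\hat{\theta}}$ is typically replaced by an estimate.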
Example: (#8.56) In a survey of n = 800 randomly chosen adults,
45% indicated that movies were getting better whereas 43%
indicated that movies were getting worse.
(a) Find a 98% CI for p, the overall proportion of adults who
say that movies are getting better.
(b) Does the interval include the value p = 0.50? Do you think
that a majority of adults say that movies are getting better?
Solution:
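A sketch of part (a), assuming the large-sample interval p̂ ± z·sqrt(p̂(1 - p̂)/n) with the standard error estimated from the sample. The function name `proportion_ci` is made up; `NormalDist` is from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, conf):
    """Large-sample CI for a proportion using the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # upper-tail critical value
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = proportion_ci(0.45, 800, 0.98)
print(f"98% CI for p: ({lower:.3f}, {upper:.3f})")
print("interval contains 0.50:", lower <= 0.50 <= upper)
```

For part (b), the upper limit under these assumptions falls just below 0.50, so the interval gives no support for the claim that a majority say movies are getting better.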
Width and Precision of CI:
The precision of an interval is conveyed by the width of the
interval.
If the confidence level is high and the resulting interval is quite
narrow, the interval is more precise (i.e., our knowledge of the
value of the parameter is reasonably precise).
A very wide CI implies that there is a great deal of uncertainty
concerning the value of the parameter we are estimating.
Note: Confidence intervals do not need to be central. Any a and b
that solve $P\left(a \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le b\right) = 1 - \alpha$
define a 100(1-α)% CI for
the population mean μ.
Example: The National Student Loan Survey collected data about
the amount of money that borrowers owe. The survey selected a
random sample of 1280 borrowers who began repayment of their
loans four to six months prior to the study. The mean debt
for the selected borrowers was $18,900 and the standard deviation
was $49,000. Find a 95% CI for the mean debt for all borrowers.
Solution:
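A sketch of the computation, assuming n = 1280 is large enough to use the normal approximation with the sample standard deviation standing in for σ. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 1280
mean_debt = 18900.0  # sample mean, in dollars
sd_debt = 49000.0    # sample standard deviation, in dollars

z = NormalDist().inv_cdf(0.975)      # critical value for 95% confidence
half_width = z * sd_debt / sqrt(n)   # z times the estimated standard error

lower, upper = mean_debt - half_width, mean_debt + half_width
print(f"95% CI for mean debt: (${lower:,.0f}, ${upper:,.0f})")
```

The standard deviation is much larger than the mean, which suggests a strongly skewed debt distribution; the interval relies on the large sample size rather than on normality of individual debts.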
Interval Estimation of Variability
In many cases we will be interested in making inferences about the
population variance.
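Theorem: Let $Y_1, \ldots, Y_n$ be a random sample from a normal
distribution with mean μ and variance $\sigma^2$. Then
$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \sim \chi^2_{(n-1)}$.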
Proof:
Now let’s derive a $100(1-\alpha)\%$ CI for $\sigma^2$:
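A sketch of the derivation by the pivotal method, using $(n-1)S^2/\sigma^2 \sim \chi^2_{(n-1)}$ for a normal sample as the pivot. Two assumptions are made here: $\chi^2_{p}$ denotes the value with upper-tail area p under the chi-square curve with n - 1 degrees of freedom, and the tail areas are split equally.

```latex
\begin{align*}
1 - \alpha
  &= P\left(\chi^2_{1-\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2}\right) \\
  &= P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2
            \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right),
\end{align*}
```

so a $100(1-\alpha)\%$ CI for $\sigma^2$ is $\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right)$. Because the chi-square distribution is not symmetric, this interval is not centered at $S^2$.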
Example: An experimenter wanted to check the variability of
measurements obtained by using equipment designed to measure
the volume of an audio source. Three independent measurements
recorded by this equipment for the same sound were 4.1, 5.2, and
10.2. Estimate $\sigma^2$ with confidence coefficient 0.90.
Solution:
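A sketch of the computation, assuming the measurements are normally distributed and using the chi-square interval for σ² with equal tail areas. With n = 3 there are 2 degrees of freedom, and in that special case the chi-square CDF is 1 - exp(-x/2), so the quantiles have a closed form and no table lookup is needed. The helper name `chi2_quantile_2df` is made up.

```python
from math import log
from statistics import variance

data = [4.1, 5.2, 10.2]
n = len(data)
df = n - 1
assert df == 2, "the closed-form quantile below is valid only for 2 degrees of freedom"

s2 = variance(data)  # sample variance with divisor n - 1
alpha = 0.10


def chi2_quantile_2df(p):
    """Lower-tail quantile of chi-square with 2 df, whose CDF is 1 - exp(-x/2)."""
    return -2 * log(1 - p)


chi2_hi = chi2_quantile_2df(1 - alpha / 2)  # upper-tail area 0.05, about 5.991
chi2_lo = chi2_quantile_2df(alpha / 2)      # upper-tail area 0.95, about 0.103

lower = df * s2 / chi2_hi
upper = df * s2 / chi2_lo
print(f"s^2 = {s2:.2f}, 90% CI for sigma^2: ({lower:.2f}, {upper:.2f})")
```

The interval is very wide, which reflects how little three measurements reveal about the variance.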
The t distribution
Definition: Let Z be a standard normal random variable and let X
be an independent chi-squared random variable with n degrees of
freedom.
The random variable $T = \frac{Z}{\sqrt{X/n}}$ is said to follow a t distribution
with n degrees of freedom.
Theorem: Let $Y_1, \ldots, Y_n$ be a random sample from a normal
distribution with mean μ and variance $\sigma^2$. Then,
$T = \frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim t_{(n-1)}$.
Proof:
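A sketch of the usual argument, taking as given two facts about normal samples: $(n-1)S^2/\sigma^2$ has a chi-square distribution with n - 1 degrees of freedom, and $\bar{Y}$ and $S^2$ are independent. The second fact is assumed here rather than proved.

```latex
\begin{align*}
Z &= \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1),
\qquad
X = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{(n-1)},
\qquad Z \text{ and } X \text{ independent}, \\[4pt]
T &= \frac{Z}{\sqrt{X/(n-1)}}
   = \frac{(\bar{Y} - \mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}}
   = \frac{\bar{Y} - \mu}{S/\sqrt{n}} .
\end{align*}
```

By the definition of the t distribution as a standard normal divided by the square root of an independent chi-square over its degrees of freedom, T has a t distribution with n - 1 degrees of freedom. The unknown σ cancels, which is what makes T usable as a pivot for μ.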
CI for μ when σ is unknown
Suppose $Y_1, \ldots, Y_n$ are a random sample from a normal
distribution with mean μ and variance $\sigma^2$, where both μ and σ are
unknown.
If σ is unknown we can estimate it by S and use the $t_{(n-1)}$
distribution. A 100(1-α)% confidence interval for μ in this case is
$\bar{Y} \pm t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}}$.
Example: A manufacturer of gunpowder has developed a new
powder, which was tested in 8 shells. The resulting muzzle
velocities (ft/sec) were:
3005  3925  2935  2965  2995  3005  2939  2905
Find a 95% CI for the true average velocity for shells of this type.
Assume that the velocities are approximately normally distributed.
Solution:
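A sketch of the small-sample t interval for these data, using the eight velocities exactly as listed in the example. The critical value 2.365 (upper-tail area 0.025, 7 degrees of freedom) is hard-coded from a standard t table because the Python standard library has no t quantile function; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

velocities = [3005, 3925, 2935, 2965, 2995, 3005, 2939, 2905]  # ft/sec, as listed
n = len(velocities)

T_CRIT = 2.365  # t table value: upper-tail area 0.025 with n - 1 = 7 df

y_bar = mean(velocities)
s = stdev(velocities)              # sample standard deviation, divisor n - 1
half_width = T_CRIT * s / sqrt(n)  # critical value times estimated standard error

lower, upper = y_bar - half_width, y_bar + half_width
print(f"mean = {y_bar:.2f}, s = {s:.2f}")
print(f"95% CI for the mean velocity: ({lower:.1f}, {upper:.1f})")
```

The interval is driven largely by the single value 3925, which lies far from the other seven; the normality assumption deserves a check before the result is relied on.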