Chapter 8
Estimation
Nutan S. Mishra
Department of Mathematics and Statistics
University of South Alabama
Statistical Inference
Drawing inference about the unknown
population parameter based on the
information from a sample.
Statistical inference is studied in two parts:
Estimation and Testing of Hypothesis.
Estimation
When we cannot perform a census, we cannot
know the value of the population parameter.
For example, the U.S. Census Bureau may want to find
the average expenditure per month incurred by a
household on eating out.
The population average µ is unknown.
So we collect a sample and assign a value to µ
based on the sample values.
This is estimation.
Examples of Estimation
1. Since we do not know the average expenditure
per week on eating out, we collect a sample,
compute the sample mean, and assign the
value of the sample mean to the unknown population
mean.
2. We do not know the proportion p of all
smokers in the United States, so we collect a sample
of people and compute the sample proportion
x/n.
Estimation procedure
• Select a sample.
• Collect the required information from the
members of the sample.
• Calculate the value of the sample statistic.
• Assign the value(s) to the corresponding
population parameter.
An estimator may be a point estimator or an
interval estimator.
Point Estimation
The single value of a sample statistic is called a point
estimate of the corresponding population parameter.
Example:
When the population mean µ is unknown, the sample mean x̄
is an estimator of µ.
We write $\hat{\mu} = \bar{x}$.
The sample mean is a point estimator of the population mean.
When the population proportion p is unknown,
the sample proportion p̂ = x/n is an estimator of p.
p̂ is a point estimator of p.
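The two point estimators above can be sketched in a few lines of Python. The data values are invented purely for illustration and are not from the slides; the variable names are likewise assumptions.

```python
from statistics import mean

# Hypothetical sample data, invented for illustration only.
gpas = [3.2, 2.8, 3.6, 3.0, 2.9, 3.4]        # GPAs of 6 sampled students
smokes = [True, False, False, True, False]   # smoker status of 5 sampled people

# Point estimate of the population mean: the sample mean x-bar.
x_bar = mean(gpas)

# Point estimate of the population proportion: p-hat = x / n,
# where x counts the sample members that have the attribute.
x = sum(smokes)
n = len(smokes)
p_hat = x / n

print(f"x_bar = {x_bar:.2f}")
print(f"p_hat = {p_hat:.2f}")
```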
Point Estimation
Consider the problem of finding the average GPA of all
students at USA (around 13,000).
Let µ be the population average; we do not know its value.
Collect a sample of, say, n = 80 students.
Collect their GPAs and compute the sample mean.
Suppose the sample mean is 3.04.
x̄ is called an estimator of µ.
3.04 is an estimate of µ.
We collect another sample of size n = 100 and compute the
sample mean. Suppose the sample mean is 2.99.
Then 2.99 is another estimate of µ.
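The distinction between the estimator (the rule) and an estimate (one value it produces) can be illustrated with a small simulation. This is only a sketch: the population of 13,000 GPAs is generated artificially, and its centre 3.0 and spread 0.5 are assumptions, not figures from the slides.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Artificial population of 13,000 GPAs, clipped to the range [0, 4].
# The centre 3.0 and spread 0.5 are assumed for illustration only.
population = [min(4.0, max(0.0, random.gauss(3.0, 0.5))) for _ in range(13000)]
mu = mean(population)  # known here only because the population is simulated

# The estimator is the same rule (take the sample mean) each time;
# each sample produces a different estimate.
sample_80 = random.sample(population, 80)
sample_100 = random.sample(population, 100)
estimate_80 = mean(sample_80)
estimate_100 = mean(sample_100)

print(f"mu = {mu:.3f}")
print(f"estimate from n=80:  {estimate_80:.3f}")
print(f"estimate from n=100: {estimate_100:.3f}")
```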
Error in the estimation
In the last chapter we saw that there is a
difference between the value of µ and the value of x̄.
The difference x̄ − µ is the error in estimation.
Margin of error = $\pm 1.96\,\sigma_{\bar{x}}$, or $\pm 1.96\,s_{\bar{x}}$ when σ is unknown
Interval Estimator
To estimate an unknown value, instead of using a single point
value, we use an interval of values.
This is called interval estimation.
In interval estimation, an interval is constructed around the
point estimate.
This interval is said to be likely to contain the unknown value of
the population parameter.
Question: how likely? Can we assign a likelihood to our
statement?
Interval estimator
Consider the estimation of the population mean µ of the GPA of
students at USA.
After collecting a sample we computed the sample mean.
Suppose the sample mean is
x̄ = 3.05
We add and subtract a number from x̄ and ask the question:
how confident are we that the interval contains the unknown
value of µ?
3.05 + .15 = 3.20 and 3.05 - .15 = 2.90
What is the probability that the value of µ lies between 2.90 and
3.20?
Questions: What number should be added and subtracted?
How do we attach a probability (confidence level) to an interval?
Interval estimation
Confidence level and Confidence interval:
The confidence level associated with an interval
states how much confidence we have that
this interval contains the true value of the
population parameter.
Such an interval is called a confidence interval.
The confidence level is denoted by (1-α)*100%.
Interval estimation of 
Recall that x̄ is a point estimator of µ and, from
Chapter 7, that $\bar{x} \sim N(\mu, \sigma_{\bar{x}}) = N(\mu, \sigma/\sqrt{n})$ approximately, whenever
the sample size is large.
For large samples, the (1-α)*100% confidence
interval for µ is given by
$\bar{x} \pm z\,\sigma_{\bar{x}}$ if σ is known
$\bar{x} \pm z\,s_{\bar{x}}$ if σ is unknown
Recall $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $s_{\bar{x}} = s/\sqrt{n}$.
The value of z is read from the z-table for the given confidence level.
$E = z\,\sigma_{\bar{x}}$ or $z\,s_{\bar{x}}$ is called the maximum error of estimate for µ.
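The large-sample formula can be sketched as a small Python function. The function name `z_interval` and the numbers in the usage lines are hypothetical; the z-value is passed in as if it had been read from the z-table.

```python
from math import sqrt

def z_interval(x_bar, sd, n, z):
    """Large-sample confidence interval for the mean: x_bar ± z * sd / sqrt(n).

    `sd` is sigma when it is known, otherwise the sample standard deviation s.
    `z` is the table value for the chosen confidence level.
    Returns (lower limit, upper limit, maximum error of estimate E).
    """
    max_error = z * sd / sqrt(n)
    return x_bar - max_error, x_bar + max_error, max_error

# Hypothetical numbers: sample mean 50, s = 10, n = 100, z = 1.96 (95% level).
lower, upper, E = z_interval(50.0, 10.0, 100, 1.96)
print(f"E = {E:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```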
Confidence interval
For large samples
$\bar{x} \pm z\,\sigma_{\bar{x}}$ if σ is known
$\bar{x} \pm z\,s_{\bar{x}}$ if σ is unknown
Recall $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $s_{\bar{x}} = s/\sqrt{n}$.
The value of z is read from the z-table for the given confidence level.
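Question: How do we compute z for a given (1-α) in the above
formula?
For a 95% confidence interval, (1-α) = .95,
α = .05, and α/2 = .025.
z is the value with area α/2 = .025 in the right tail of the standard
normal curve (area .975 to its left), thus z = 1.96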
Similarly we can compute z for 99%, 98%, etc. confidence levels.
[Figure: standard normal curve with central area (1-α) between -z and z and area α/2 in each tail; horizontal axis from -4 to 4]
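Instead of a printed z-table, the same lookup can be sketched with Python's standard library. The helper name `z_for_confidence` is an assumption; `NormalDist().inv_cdf` returns the point with a given area to its left under the standard normal curve.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """z with area alpha/2 in each tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    # Area to the left of z is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

This gives about 1.960, 2.326 and 2.576. A z-table rounded to two decimals gives 2.58 for the 99% level, which is the value used in these slides.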
Example of Confidence Interval
8.11 Given n = 64, x̄ = 24.5, and s = 3.1
a. The point estimate of µ is 24.5
b. The margin of error associated with the point estimate
of µ is $\pm 1.96\,s_{\bar{x}}$ = 1.96*s/√n = 1.96*3.1/8 = .7595
c. The 99% confidence interval for µ is $\bar{x} \pm z\,s_{\bar{x}}$ =
24.5 ± z*3.1/8
To compute z: 1-α = .99, α = .01, α/2 = .005, so z = 2.58
Thus the confidence interval is given by 24.5 ± 2.58*
3.1/8 = 24.5 ± .99975 = (23.50025, 25.49975)
d. The maximum error of estimate is .99975
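The arithmetic in this example can be checked with a short script. The variable names are assumptions; the numbers are those given in the exercise, with the z-values 1.96 and 2.58 taken from the table as in the slides.

```python
from math import sqrt

n, x_bar, s = 64, 24.5, 3.1

s_xbar = s / sqrt(n)           # standard error of the mean: 3.1 / 8

# (b) margin of error at the 95% level
margin_95 = 1.96 * s_xbar      # about .7595

# (c), (d) 99% confidence interval and maximum error of estimate
z_99 = 2.58                    # table value for alpha/2 = .005
max_error = z_99 * s_xbar      # about .99975
ci_99 = (x_bar - max_error, x_bar + max_error)

print(f"margin of error (95%) = {margin_95:.4f}")
print(f"99% CI = ({ci_99[0]:.5f}, {ci_99[1]:.5f})")
```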
Interpretation of Confidence Interval
In the earlier example we constructed a 99%
confidence interval for µ, which is
(23.50025, 25.49975)
This means that we are 99% confident that the
unknown value of µ lies between 23.50025 and
25.49975
This does not mean that this particular interval contains µ
with probability .99: µ is a fixed number, so a computed
interval either contains it or does not.
It means that if we draw all possible samples of
size 64 from the given population and construct an interval
from each, then about 99% of
all such intervals will contain the value of µ.
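The repeated-sampling interpretation can be illustrated by simulation. This is a sketch under stated assumptions: the population is taken to be normal with µ = 25 and σ = 3 (hypothetical values, chosen only so that µ is known), and the function name `coverage` is invented.

```python
import random
from math import sqrt
from statistics import mean, stdev

def coverage(mu, sigma, n, z, trials, seed=0):
    """Fraction of simulated intervals x_bar ± z*s/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        max_error = z * stdev(sample) / sqrt(n)
        if x_bar - max_error <= mu <= x_bar + max_error:
            hits += 1
    return hits / trials

# 2000 samples of size 64, each giving its own 99% interval (z = 2.58).
rate = coverage(mu=25.0, sigma=3.0, n=64, z=2.58, trials=2000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should be close to .99. Each individual interval either contains µ or misses it; the 99% describes the long-run behaviour of the procedure.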
Interpretation of Confidence Interval
Recall the formula for the confidence interval for µ:
$\bar{x} \pm z\,\sigma_{\bar{x}} = \bar{x} \pm z\,\sigma/\sqrt{n}$
Note the following:
• The values in the interval depend on the sample chosen
• The width of the interval is $2z\,\sigma/\sqrt{n}$
• A narrower interval is a better interval
• The width depends on
– the z-value, which in turn depends on the confidence level
– the size of the sample
These are the two quantities which we can control.
To decrease the width of the interval
– Lower the confidence level (not a good choice)
– Increase the sample size.
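The effect of the two controllable quantities on the width can be sketched numerically. The value σ = 3.0 and the function name `ci_width` are assumptions for illustration; 1.645, 1.96 and 2.58 are the usual table values for the 90%, 95% and 99% levels.

```python
from math import sqrt

def ci_width(z, sigma, n):
    """Width of the interval x_bar ± z*sigma/sqrt(n), i.e. 2*z*sigma/sqrt(n)."""
    return 2 * z * sigma / sqrt(n)

sigma = 3.0  # hypothetical population standard deviation

# Larger samples give narrower intervals at a fixed confidence level.
for n in (64, 256, 1024):
    print(f"n = {n:4d}, 95% width = {ci_width(1.96, sigma, n):.4f}")

# Lower confidence levels give narrower intervals at a fixed sample size.
for label, z in (("90%", 1.645), ("95%", 1.96), ("99%", 2.58)):
    print(f"{label} width at n = 64: {ci_width(z, sigma, 64):.4f}")
```

Because n enters through √n, quadrupling the sample size halves the width.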
Application: Ex. 8.22
X = amount of time spent online per week by mothers with children
under age 18.
n = 1000, x̄ = 16.87 hrs, s = 3.2
To construct a 95% confidence interval for µ:
It’s a large sample, so the formula to construct such an interval is
$\bar{x} \pm z\,s_{\bar{x}}$ = 16.87 ± z*3.2/√1000
= 16.87 ± 1.96*3.2/√1000
= 16.87 ± .1983
= (16.6717, 17.0683)
Interpretation: If we draw a large number of samples, each of
size 1000, and construct a confidence interval
corresponding to each sample, then about 95% of all such
intervals will trap the true value of µ
Small samples case
Objective: To construct a confidence interval for µ when the
sample is small.
The t-distribution is used to construct a confidence interval for µ if
1. The population from which the sample is drawn is
approximately normal
2. The sample size is small
3. The population standard deviation σ is unknown.
The formula is:
The (1-α)*100% confidence interval for µ is
$\bar{x} \pm t\,s_{\bar{x}}$ where $s_{\bar{x}} = s/\sqrt{n}$
The value of t is obtained from the t-table for
n - 1 degrees of freedom and the given level of confidence.
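The small-sample formula can be sketched in Python without a printed t-table. This is not how the slides obtain t: here the critical value is found numerically (Simpson's rule for the area under the t-density, then bisection), using only the standard library. All function names and the numbers in the usage lines are assumptions.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x): symmetry plus Simpson's rule on [0, |x|] (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                      # area under the density on [0, |x|]
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(right_tail_area, df):
    """t with P(T > t) = right_tail_area, by bisection on [0, 200]."""
    lo, hi = 0.0, 200.0                       # 200 is ample for textbook tail areas
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > right_tail_area:
            lo = mid                          # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(x_bar, s, n, confidence):
    """Small-sample interval x_bar ± t * s / sqrt(n) with n - 1 df."""
    t = t_critical((1 - confidence) / 2, n - 1)
    max_error = t * s / sqrt(n)
    return x_bar - max_error, x_bar + max_error

# Hypothetical numbers: x_bar = 50, s = 8, n = 25, 95% confidence.
low, high = t_interval(50.0, 8.0, 25, 0.95)
print(f"t = {t_critical(0.025, 24):.3f}, interval = ({low:.2f}, {high:.2f})")
```

In practice a statistics package or the printed table would supply t; the numerical search is shown only to keep the sketch self-contained.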
What is a t-distribution?
• A specific bell-shaped sampling distribution
• Its only parameter is (n-1), where n is the size of the sample
• (n-1) is called the degrees of freedom
• The shape depends on the degrees of freedom (n-1)
• The t-distribution approaches the standard normal for larger values of n
• Values of t are tabulated for different degrees of freedom and right-tail areas.
Picture borrowed from: http://www.aiaccess.net/tutor_demo/tutor_t_1.htm
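The claim that the t-distribution approaches the standard normal can be checked numerically by comparing the two density curves. A minimal sketch, with invented helper names, measuring the largest gap between the curves on a grid from -4 to 4:

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

grid = [i / 10 for i in range(-40, 41)]       # -4.0, -3.9, ..., 4.0

# Largest vertical gap between the t-curve and the z-curve for each df.
gaps = {
    df: max(abs(t_pdf(x, df) - normal_pdf(x)) for x in grid)
    for df in (2, 10, 30, 100)
}

for df, gap in gaps.items():
    print(f"df = {df:3d}: largest gap = {gap:.4f}")
```

The gap shrinks steadily as the degrees of freedom grow.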
Exercises 8.39, 8.40, 8.41
8.39(a) Area in the right tail = .05, df = 12
From the t-table, the value of t = 1.782
8.40(a) Given that n = 21 and the area in the left tail is .10
Here df = n-1 = 20. Since the t-curve is symmetric, we first find the
t-value for area in the right tail = .10 and then attach a
negative sign to get the required value.
For df = 20 and area in the right tail = .10, t = 1.325
For df = 20 and area in the left tail = .10, t = -1.325
8.41(a) Given that the t-value = 2.467 and df = 28, find the area
in the right tail.
In the first column of the t-table look for 28. Then, in the row for
28, look for the t-value 2.467 and read the corresponding area from
the top row.
Area in the right tail = .01
http://lib.stat.cmu.edu/DASL/
Exercise 8.43(a)
Given confidence level = 99%
1-α = .99, α = .01, α/2 = .005
Also given that df = 13
Thus from table for df=13 and α/2 = .005
t-value = 3.012
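The table lookups in Exercises 8.39 to 8.43 can be mimicked with a small dictionary. The excerpt below holds only the four table entries quoted in these exercises; the function names are hypothetical.

```python
# Excerpt of a t-table: (df, right-tail area) -> t.
# Only the entries quoted in the exercises are included.
T_TABLE = {
    (12, 0.05): 1.782,
    (20, 0.10): 1.325,
    (28, 0.01): 2.467,
    (13, 0.005): 3.012,
}

def t_right_tail(df, area):
    """t-value with the given area to its right."""
    return T_TABLE[(df, area)]

def t_left_tail(df, area):
    """By symmetry, the left-tail value is the negative of the right-tail one."""
    return -t_right_tail(df, area)

def right_tail_area(df, t_value):
    """Reverse lookup: scan the row for df and return the matching area."""
    for (row_df, area), t in T_TABLE.items():
        if row_df == df and abs(t - t_value) < 1e-9:
            return area
    raise KeyError(f"t = {t_value} not in the excerpt for df = {df}")

def t_for_confidence(df, confidence):
    """t for a confidence interval: right-tail area alpha/2."""
    alpha = 1 - confidence
    return t_right_tail(df, round(alpha / 2, 6))

print(t_right_tail(12, 0.05))      # Exercise 8.39(a)
print(t_left_tail(20, 0.10))       # Exercise 8.40(a)
print(right_tail_area(28, 2.467))  # Exercise 8.41(a)
print(t_for_confidence(13, 0.99))  # Exercise 8.43(a)
```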
Exercise 8.49
X = time spent waiting in a line to….
Assumption: X ~ N(µ, σ), with both µ and σ unknown
Draw a sample of size n = 16
Computed x̄ = 31, s = 7 minutes
To construct a 99% CI for µ
Note that
1. The population is approximately normal
2. The population standard deviation is unknown
3. The sample size is small
Then the formula for the CI is $\bar{x} \pm t\,s_{\bar{x}}$ where $s_{\bar{x}} = s/\sqrt{n}$
= 31 ± t*7/√16 = 31 ± t*7/4
Computation of the t-value: α/2 = .005 and df = n-1 = 15, thus from the table
t = 2.947
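Plugging the table value t = 2.947 into the formula gives the interval limits. The short script below is only a sketch of that last arithmetic step, with assumed variable names; the inputs are the ones stated in the exercise.

```python
from math import sqrt

n, x_bar, s = 16, 31.0, 7.0
t = 2.947                      # t-table value for df = 15, right-tail area .005

s_xbar = s / sqrt(n)           # 7 / 4 = 1.75
max_error = t * s_xbar         # maximum error of estimate
ci_99 = (x_bar - max_error, x_bar + max_error)

print(f"max error = {max_error:.5f}")
print(f"99% CI = ({ci_99[0]:.2f}, {ci_99[1]:.2f}) minutes")
```

With these inputs the interval comes out at roughly 25.84 to 36.16 minutes.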
Exercise 8.49 continued