Download Handout 7 - TAMU Stat

STAT 211
Handout 7
(Chapter 7: Statistical Intervals based on a Single Sample)
A point estimate of a population characteristic is a single number that is based on sample data and
represents a plausible value of the characteristic.
The best statistic (MVUE) is the unbiased statistic with the smallest standard deviation.
Since the point estimate is a single number, it does not provide information about the precision and
reliability of estimation.
A confidence interval for a population characteristic (parameter) is an interval of plausible values
for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the
characteristic will be captured inside the interval. The confidence level, $1-\alpha$, associated with a
confidence interval estimate is the success rate of the method used to construct the interval.
If we repeatedly sample from a population and calculate a confidence interval each time with the
data available, then over the long run the proportion of the confidence intervals that actually contain
the true value of the population characteristic will be $100(1-\alpha)\%$ (95%, 90%, or 99% for $\alpha = 0.05$,
0.10, or 0.01, respectively).
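The long-run interpretation can be checked with a small simulation. The sketch below is illustrative only: the population values (mean 10, standard deviation 2), the sample size, the number of trials, the seed, and the function name `coverage_rate` are all assumptions chosen for the demonstration, not values from the handout.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu=10.0, sigma=2.0, n=25, level=0.95, trials=4000, seed=211):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)                       # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # critical value z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # Count the interval as a success when it captures the true mean.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"observed coverage: {rate:.3f}")
```

The observed proportion should land close to 0.95; it will not equal 0.95 exactly because only finitely many samples are drawn.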
The general form of a confidence interval:
(point estimate for a specified statistic) $\pm$ (critical value) $\cdot$ (standard error for the point estimate).
What is the best estimator for the parameters $\mu$, $\sigma^2$, $p$? _____________
The Empirical Rule tells you that about 95% of all values of $\bar{x}$ will be within 1.96 standard deviations
of the mean.
When you compute a 95% confidence interval:
$1-\alpha$ is 0.95
$\alpha$ is 0.05
$z_{\alpha/2}$ is 1.96
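As a quick check of the critical value, the standard normal quantile can be computed directly. This is a minimal sketch using Python's standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.2f} -> z = {z_critical(level):.3f}")
```

The three values agree, to table precision, with the familiar 1.645, 1.96, and 2.576.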
Confidence Interval for a Population Mean, $\mu$
(1) Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with unknown population
mean $\mu$ and known population standard deviation $\sigma$. Then the $100(1-\alpha)\%$ confidence interval for $\mu$ is
\[ \left( \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) \quad \text{where} \quad P\left( -z_{\alpha/2} < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2} \right) = 1-\alpha \]
Thus, in 95% of all possible samples, $\mu$ will be captured in the following calculated confidence
interval: $\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$
(2) Large Sample Confidence Interval for $\mu$: Let $X_1, X_2, \ldots, X_n$ be a random sample from a population
distribution with mean $\mu$ and standard deviation $\sigma$. For a large sample size n, the CLT implies
that $\bar{X}$ has approximately a normal distribution for any population distribution. The value of the
population standard deviation $\sigma$ may not be known. Instead, the value of the sample standard
deviation s may be known. If n is sufficiently large (n > 40), the $100(1-\alpha)\%$ large sample confidence
interval for $\mu$ is
\[ \left( \bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}} \right) \quad \text{where} \quad P\left( \bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}} \right) \approx 1-\alpha \]
Thus, in 95% of all possible samples, $\mu$ will be captured in the following calculated confidence
interval: $\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}}$
(3) Small Sample Confidence Interval for $\mu$: When the sample size is small (n ≤ 40), we have to
make specific assumptions to find the confidence intervals.
Assumption: The population of interest is normal, so that $X_1, X_2, \ldots, X_n$ constitutes a random sample
from a normal distribution with both $\mu$ and $\sigma$ unknown.
When $\bar{x}$ is the sample mean of a random sample of size n from a normal distribution with mean $\mu$, the
random variable
\[ T = \frac{\bar{x}-\mu}{s/\sqrt{n}} \]
has a probability distribution called a t-distribution with n-1 degrees of
freedom (Properties of t-distribution: discussion (page 300 of your textbook) and t-distribution
table is on page 743, Table A.5).
The $100(1-\alpha)\%$ confidence interval for $\mu$ is
\[ \left( \bar{x} - t_{\alpha/2;n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2;n-1}\frac{s}{\sqrt{n}} \right) \quad \text{where} \quad P\left( -t_{\alpha/2;n-1} < \frac{\bar{x}-\mu}{s/\sqrt{n}} < t_{\alpha/2;n-1} \right) = 1-\alpha \]
Thus, in 95% of all possible samples, $\mu$ will be captured in the following calculated confidence
interval: $\bar{x} \pm t_{0.025;n-1}\,\frac{s}{\sqrt{n}}$
Choosing the sample size: With the known desired confidence level and interval width, we can
determine the necessary sample size. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
population with unknown population mean $\mu$ and known population standard deviation $\sigma$.
The width of the interval is $w = 2 z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and the bound on the error of estimation is $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$; that is,
$\bar{x}$ will be within $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ of $\mu$. The sample size required to estimate a population mean $\mu$ to within
an amount $B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ with $100(1-\alpha)\%$ confidence is
\[ n = \left( \frac{z_{\alpha/2}\,\sigma}{B} \right)^2 . \]
The same formula can be written using the interval width, $w = 2 z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$; then
\[ n = \left( \frac{2 z_{\alpha/2}\,\sigma}{w} \right)^2 . \]
Example 1: Each of the following is a confidence interval for the true average amount of time spent by
patients using a physical therapy device, computed from the sample data: (10.90, 25.44), (13.58, 22.76)
(a) What is the value of the sample mean time spent by the patients using the physical therapy device?
(b) The confidence level for one of these intervals is 95% and for the other is 99%. Which of the
intervals has the 95% confidence level and why?
Example 2: Suppose we want to estimate the average number of violent acts on TV per hour for a specific
network. Data were collected by viewing a random selection of 50 prime time hours, and an average of
11.7 violent acts was recorded. Suppose it is known that $\sigma = 5$ and the population distribution is
normal.
The 95% CI for $\mu$ is (10.3141 , 13.0859)
The 95% confidence interval for $\mu$ if 100 prime time hours had been viewed, with the same mean
and variance obtained, is (10.72 , 12.68)
The 90% CI for $\mu$ is (10.5368 , 12.8632)
The width of the 90% confidence interval for $\mu$ is 2.3264
The bound on the error of estimation of the 90% confidence interval for $\mu$ is 1.1632
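The intervals in Example 2 can be reproduced with a few lines of code. This is a sketch under the example's stated assumptions (normal population, known standard deviation 5); the function name `z_interval` is hypothetical, and the critical values are computed exactly rather than read from a table, so the 90% endpoints can differ from the handout in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level):
    """Two-sided z confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2}
    bound = z * sigma / sqrt(n)                     # bound on the error of estimation
    return x_bar - bound, x_bar + bound

ci95_n50 = z_interval(11.7, 5, 50, 0.95)
ci95_n100 = z_interval(11.7, 5, 100, 0.95)
ci90_n50 = z_interval(11.7, 5, 50, 0.90)

for label, (low, high) in [("95%, n=50", ci95_n50),
                           ("95%, n=100", ci95_n100),
                           ("90%, n=50", ci90_n50)]:
    print(f"{label}: ({low:.4f}, {high:.4f}), width {high - low:.4f}")
```

Quadrupling precision requires more than quadrupling effort: doubling n from 50 to 100 shrinks the width only by a factor of $\sqrt{2}$.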
Example 3: Investigators would like to estimate the average taxable income of apartment dwellers
to within $500, using a 95% CI for the normally distributed data. Suppose that the previous studies
show that standard deviation is $8000. How many people should they study? (Answer: 984)
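The sample size calculation in Example 3 can be sketched as follows. The function name `sample_size_for_mean` is hypothetical; the result is rounded up because a fractional observation is not possible and rounding down would miss the target bound.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, level):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z * sigma / bound) ** 2)

n_needed = sample_size_for_mean(sigma=8000, bound=500, level=0.95)
print(n_needed)
```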
Example 4: The brightness of a television picture tube can be evaluated by measuring the amount of
current required to achieve a particular brightness. An engineer has designed a tube that he believes
will require 300 microamps of current to produce the desired brightness level. A sample of 10 tubes
results in the average of 317.2 and the standard deviation 15.7. Using 95% confidence interval, did
he achieve the desired brightness?
Example 5: I want to see how long on average, it takes Drano to unclog a sink. In a recent
commercial, the stated claim was that it takes on average, 15 minutes. I wanted to see if that claim
was true, so I tested Drano on 64 randomly selected sinks. I found that it took an average of 18
minutes with standard deviation of 2.5 minutes. Was their claim false?
99% CI for $\mu$ is (17.1953 , 18.8047)
90% CI for $\mu$ is (17.4859 , 18.5141)
Would my answer be different if I tested Drano on 25 randomly selected sinks and I found that it
took an average of 18 minutes with standard deviation of 2.5 minutes?
99% CI for $\mu$ is (16.6015 , 19.3985)
90% CI for $\mu$ is (17.1445 , 18.8555)
Example 6: Students were weighed in kilograms at the beginning and end of a semester-long fitness class.
Assume the population of weight changes follows a normal distribution. A random sample of 12
female students yielded a mean of 0.45 and standard deviation of 1.5.
99% CI to estimate the true mean weight change is (-0.8949 , 1.7949).
Would you believe me if I claimed the average weight change was 0?
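The interval in Example 6 can be checked numerically. Python's standard library has no t quantile function, so this sketch takes the critical value as an input; the value 3.106 for $t_{0.005;11}$ is read from a standard t table and is an assumption of the sketch, as is the function name `t_interval`.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Two-sided t confidence interval; t_crit is t_{alpha/2; n-1} from a table."""
    bound = t_crit * s / sqrt(n)
    return x_bar - bound, x_bar + bound

T_0005_DF11 = 3.106  # assumed table value of t_{0.005;11} (99% confidence, n = 12)

ci_low, ci_high = t_interval(x_bar=0.45, s=1.5, n=12, t_crit=T_0005_DF11)
print(f"99% CI: ({ci_low:.4f}, {ci_high:.4f})")
print("0 is a plausible value:", ci_low <= 0 <= ci_high)
```

Because 0 lies inside the interval, a claimed average weight change of 0 is not contradicted by these data at the 99% level.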
What is different in one-sided confidence intervals? Discussion
Example 7: Determine the confidence level for each of the following large sample one-sided
confidence bounds.
(a) Upper bound: $\bar{x} + 0.93\,\frac{s}{\sqrt{n}}$ (Answer: 0.8238)
(b) Lower bound: $\bar{x} - 1.75\,\frac{s}{\sqrt{n}}$ (Answer: 0.9599)
Would your answer be different in small samples?
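For a large-sample one-sided bound with multiplier z, the confidence level is the standard normal probability below z. A minimal sketch, with the hypothetical helper name `one_sided_level`:

```python
from statistics import NormalDist

def one_sided_level(z_multiplier):
    """Confidence level of a large-sample one-sided bound x_bar +/- z * s / sqrt(n)."""
    return NormalDist().cdf(z_multiplier)

level_a = one_sided_level(0.93)   # part (a), upper bound
level_b = one_sided_level(1.75)   # part (b), lower bound
print(f"(a) {level_a:.4f}  (b) {level_b:.4f}")
```

In small samples the same calculation would use the t distribution with n-1 degrees of freedom in place of the standard normal, so the level would depend on n.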
A General Large Sample Confidence Interval
When the estimator $\hat{\theta}$ satisfies the following properties,
a. The estimator has approximately a normal distribution
b. It is (at least approximately) unbiased
c. The standard deviation of the estimator, $\sigma_{\hat{\theta}}$, is known
the confidence interval for $\theta$ can be constructed as $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$ where
\[ P\left( -z_{\alpha/2} < \frac{\hat{\theta}-\theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2} \right) \approx 1-\alpha \]
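Example 8: The large sample confidence interval for the parameter $\lambda$ in the Poisson distribution is
\[ \left( \bar{x} - z_{\alpha/2}\sqrt{\frac{\bar{x}}{n}},\ \bar{x} + z_{\alpha/2}\sqrt{\frac{\bar{x}}{n}} \right) \quad \text{where} \quad P\left( -z_{\alpha/2} < \frac{\bar{x}-\lambda}{\sqrt{\bar{x}/n}} < z_{\alpha/2} \right) \approx 1-\alpha \]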
Large Sample Confidence Interval for a population proportion, p
If n is sufficiently large, the $100(1-\alpha)\%$ large sample confidence interval for p is
\[ \left( \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right) \quad \text{where} \quad P\left( -z_{\alpha/2} < \frac{\hat{p}-p}{\sqrt{\hat{p}(1-\hat{p})/n}} < z_{\alpha/2} \right) \approx 1-\alpha \]
Check if $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$ to see if you have a large sample. Otherwise, there is a
formula (7.10) in your textbook, which can be used without checking if it is a large sample; that is,
formula (7.10) can be used for large and small samples.
Choosing the sample size: With the known desired confidence level and interval width, we can
determine the necessary sample size. The bound on the error of estimation is $z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$; that is, $\hat{p}$
will be within $z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ of p. The sample size required to estimate a population proportion p
to within an amount $B = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ with $100(1-\alpha)\%$ confidence is
\[ n = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{B^2} . \]
The same formula can be written using the interval width, $w = 2 z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$; then
\[ n = \frac{4 z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{w^2} . \]
The conservative sample size can be found when $\hat{p} = 1-\hat{p} = 0.5$.
What is different in one-sided confidence intervals? Discussion
Example 9: We are interested in proportion of all students enrolled in Stat211 who listen to country
music. Using our class as random sample from Stat211 students, we see that ___________ out of
___________ listen to country music. Estimate the true proportion of all Stat211 students that
listen to country music using 90% confidence interval.
What parameter are we estimating?_______________
Example 10: Scripps News service reported that 4% of the members of the American Bar
Association (ABA) are African American. Suppose that this figure is based on a random sample of
400 ABA members.
(a) Is the sample size large enough to justify the use of the large-sample confidence interval for a
population proportion?
(b) Construct and interpret a 90% confidence interval for the true proportion of all ABA members
who are African American. (Answer: (0.0239 , 0.0561))
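Both parts of Example 10 can be checked with a short sketch. The function names `large_enough` and `proportion_interval` are hypothetical; the large-sample check uses the rule that both $n\hat{p}$ and $n(1-\hat{p})$ should be at least 10.

```python
from math import sqrt
from statistics import NormalDist

def large_enough(p_hat, n):
    """Large-sample check: both n*p_hat and n*(1 - p_hat) should be at least 10."""
    return n * p_hat >= 10 and n * (1 - p_hat) >= 10

def proportion_interval(p_hat, n, level):
    """Large-sample two-sided confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    bound = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - bound, p_hat + bound

ok = large_enough(0.04, 400)                     # part (a): 16 and 384
p_low, p_high = proportion_interval(0.04, 400, 0.90)
print(ok, f"({p_low:.4f}, {p_high:.4f})")
```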
Example 11: I want to estimate the proportion of freshmen Aggies who will drop out before
graduation. How many Aggies should I include in my study in order to estimate p within 0.05 with
95% confidence? (Answer: 385)
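Example 11 gives no prior estimate of the proportion, so the conservative choice of 0.5 applies. A minimal sketch, with the hypothetical function name `sample_size_for_proportion`:

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(bound, level, p_guess=0.5):
    """Smallest n so that z_{alpha/2} * sqrt(p(1-p)/n) <= bound.

    p_guess = 0.5 maximizes p(1-p) and gives the conservative sample size.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / bound ** 2)

n_aggies = sample_size_for_proportion(bound=0.05, level=0.95)
print(n_aggies)
```

Any other guess for p gives a smaller product p(1-p), so the required n can only decrease.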
A Prediction Interval for a Single Future Value:
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population distribution and we wish to predict
the value of $X_{n+1}$, a single future observation. The $100(1-\alpha)\%$ prediction interval for $X_{n+1}$ is
\[ \left( \bar{x} - t_{\alpha/2;n-1}\, s\sqrt{1+\frac{1}{n}},\ \bar{x} + t_{\alpha/2;n-1}\, s\sqrt{1+\frac{1}{n}} \right) \quad \text{where} \quad P\left( -t_{\alpha/2;n-1} < \frac{\bar{x}-x_{n+1}}{s\sqrt{1+\frac{1}{n}}} < t_{\alpha/2;n-1} \right) = 1-\alpha \]
Example 12: What is the 99% prediction interval for the weight change of an individual student
from the population distribution in example 6? (Answer: (-4.3992 , 5.2992))
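The prediction interval in Example 12 uses the Example 6 summary values (n = 12, mean 0.45, standard deviation 1.5). In this sketch the critical value 3.106 for $t_{0.005;11}$ is an assumed table value, and the function name `prediction_interval` is hypothetical.

```python
from math import sqrt

def prediction_interval(x_bar, s, n, t_crit):
    """Prediction interval for one future observation from a normal population."""
    bound = t_crit * s * sqrt(1 + 1 / n)   # wider than the CI bound t * s / sqrt(n)
    return x_bar - bound, x_bar + bound

T_0005_DF11 = 3.106  # assumed table value of t_{0.005;11}

pi_low, pi_high = prediction_interval(0.45, 1.5, 12, T_0005_DF11)
print(f"99% PI: ({pi_low:.4f}, {pi_high:.4f})")
```

The prediction interval is much wider than the confidence interval for the mean because it must allow for the variability of a single new observation as well as the uncertainty in $\bar{x}$.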
Tolerance Intervals: Let k be a number between 0 and 100. A tolerance interval for capturing at
least k% of the values in a normal distribution with a confidence level $100(1-\alpha)\%$ has the form
\[ \left( \bar{x} - (\text{critical value}) \cdot s,\ \bar{x} + (\text{critical value}) \cdot s \right) \]
Table A.6 (page 726) is designed for the tolerance critical values where k = 90, 95, 99 and $\alpha$ = 0.05,
0.01 in one- and two-sided intervals.
Example 13: Use example 6 and calculate an interval that includes at least 95% of the student
weight changes in the population distribution using a confidence level of 99%. (Answer: (-5.355 ,
6.255))
Confidence Intervals for the Variance, $\sigma^2$, and Standard Deviation, $\sigma$, of a Normal Population:
The population of interest is normal, so that $X_1, X_2, \ldots, X_n$ constitutes a random sample from a
normal distribution with parameters $\mu$ and $\sigma^2$. Then the random variable
\[ \frac{(n-1)s^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sigma^2} \]
has a chi-squared ($\chi^2$) probability distribution with n-1 degrees of freedom.
The $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
\[ \left( \frac{(n-1)s^2}{\chi^2_{\alpha/2;n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2;n-1}} \right) \quad \text{where} \quad P\left( \chi^2_{1-\alpha/2;n-1} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2;n-1} \right) = 1-\alpha . \]
The details of the chi-squared ($\chi^2$) probability distribution will be discussed in class and the table
of critical values (Table A.7, Page 727) will be demonstrated.
Example 14: Determine the following:
(a) The 95th percentile for the chi-squared distribution with n=20.
(b) The 5th percentile for the chi-squared distribution with n=20.
(c) $P(10.117 \le \chi^2 \le 30.143)$ where $\chi^2$ is a chi-squared r.v. with n=20.
(d) $P(\chi^2 < 10.283 \text{ or } \chi^2 > 35.478)$ where $\chi^2$ is a chi-squared r.v. with n=22.
Example 15 (Exercise 7.46, the 6th edition and Exercise 7.44, the 5th edition):
(a) Is it plausible to assume that the data come from a normal population distribution?
[Figure: Normal probability plot for turbidity (horizontal axis: Data; vertical axis: Percent). ML estimates: Mean 25.3133, StDev 1.52528]
Variable     n    Mean     Median   TrMean   StDev   SE Mean
turbidity    15   25.313   25.800   25.438   1.579   0.408
Variable     Minimum   Maximum   Q1       Q3
turbidity    21.700    27.300    24.100   26.700
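(b) Calculate a 95% CI for the population standard deviation of turbidity.
Taking square roots of the endpoints of the interval for $\sigma^2$, the 95% CI for $\sigma$ is
\[ \left( \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2;n-1}}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2;n-1}}} \right) = \left( \sqrt{\frac{14(1.579)^2}{\chi^2_{0.025;14}}},\ \sqrt{\frac{14(1.579)^2}{\chi^2_{0.975;14}}} \right) = (1.16 , 2.49) \]
where $\chi^2_{0.025;14} = 26.119$ and $\chi^2_{0.975;14} = 5.629$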
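(c) Calculate an upper confidence bound with confidence level 95% for the population standard
deviation of turbidity.
\[ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha;n-1}}} = \sqrt{\frac{14(1.579)^2}{\chi^2_{0.95;14}}} = 2.305 \quad \text{where} \quad \chi^2_{0.95;14} = 6.571 \]
Dividing by $\chi^2_{0.05;14} = 23.685$ instead gives 1.214, which is the 95% lower confidence bound for $\sigma$.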
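The two-sided interval in part (b) can be verified numerically. Python's standard library has no chi-squared quantile function, so this sketch takes the two critical values quoted in part (b) (26.119 and 5.629) as inputs; the function name `sigma_interval` is hypothetical.

```python
from math import sqrt

def sigma_interval(s, n, chi2_upper, chi2_lower):
    """Two-sided CI for a normal standard deviation.

    chi2_upper is the critical value with area alpha/2 to its right,
    chi2_lower the one with area 1 - alpha/2 to its right (df = n - 1).
    """
    sum_sq = (n - 1) * s ** 2
    # The larger critical value gives the lower endpoint, and vice versa.
    return sqrt(sum_sq / chi2_upper), sqrt(sum_sq / chi2_lower)

sd_low, sd_high = sigma_interval(s=1.579, n=15, chi2_upper=26.119, chi2_lower=5.629)
print(f"95% CI for sigma: ({sd_low:.2f}, {sd_high:.2f})")
```

Unlike the intervals for a mean, this interval is not symmetric about s, because the chi-squared distribution is skewed.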
Discussion on finding the confidence interval for the linear combination of the population means
Example 16 (Exercise 7.53, the 6th edition and Exercise 7.51, the 5th edition): Four different groves of
fruit trees are selected for experimentation. The first three groves are sprayed with pesticides and
the fourth is treated with ladybugs. We would like to measure the difference in true average yields
between treatment with pesticides and treatment with ladybugs. Compute the 95% CI for
$\theta = \frac{1}{3}(\mu_1 + \mu_2 + \mu_3) - \mu_4$ where $\mu_i$ is the ith true average yield.
Treatment        $n_i$    $\bar{x}_i$    $s_i$
1 (pesticide)    100      10.5           1.5
2 (pesticide)    90       10.0           1.3
3 (pesticide)    100      10.1           1.8
4 (ladybugs)     120      10.7           1.6
\[ \hat{\theta} = \frac{1}{3}\left( \bar{x}_1 + \bar{x}_2 + \bar{x}_3 \right) - \bar{x}_4 = \frac{1}{3}(10.5 + 10 + 10.1) - 10.7 = -0.5 \]
\[ Var(\hat{\theta}) = \frac{1}{9}\left[ Var(\bar{x}_1) + Var(\bar{x}_2) + Var(\bar{x}_3) \right] + Var(\bar{x}_4) = \frac{1}{9}\left( \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} + \frac{\sigma_3^2}{n_3} \right) + \frac{\sigma_4^2}{n_4} \]
\[ \text{Estimated } Var(\hat{\theta}) = \frac{1}{9}\left( \frac{1.5^2}{100} + \frac{1.3^2}{90} + \frac{1.8^2}{100} \right) + \frac{1.6^2}{120} = 0.0295 \]
95% CI for $\theta$ is $-0.5 \pm 1.96\sqrt{0.0295}$ = (-0.8366 , -0.1634)
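The arithmetic in Example 16 generalizes to any linear combination of independent sample means. The sketch below assumes the groves are independent samples and uses the large-sample z interval; the names `groves`, `weights`, and `linear_combination_interval` are hypothetical. Because it does not round the estimated variance to 0.0295, the endpoints can differ from the handout in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

# (n_i, sample mean, sample standard deviation) for each grove
groves = [(100, 10.5, 1.5), (90, 10.0, 1.3), (100, 10.1, 1.8), (120, 10.7, 1.6)]
weights = [1 / 3, 1 / 3, 1 / 3, -1]   # theta = (mu1 + mu2 + mu3)/3 - mu4

def linear_combination_interval(samples, coeffs, level):
    """Large-sample CI for sum(a_i * mu_i), assuming independent samples."""
    estimate = sum(a * mean for a, (_, mean, _) in zip(coeffs, samples))
    variance = sum(a ** 2 * sd ** 2 / n for a, (n, _, sd) in zip(coeffs, samples))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    bound = z * sqrt(variance)
    return estimate, variance, (estimate - bound, estimate + bound)

theta_hat, var_hat, (th_low, th_high) = linear_combination_interval(groves, weights, 0.95)
print(f"estimate {theta_hat:.4f}, variance {var_hat:.4f}, CI ({th_low:.4f}, {th_high:.4f})")
```

Since the whole interval lies below 0, the data suggest that the average yield under pesticides is lower than under ladybugs.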