MGMT 201: Statistics
Interval Estimation (ASW Chapter 8)
 What is interval estimation?
 An interval estimate is a range of values such that we believe a certain characteristic falls within that
range with some given probability.
 This is quite useful in nearly every field. Consider political polling, space flights, etc.
 Interval Estimation of a Population Mean
 Recall that $\bar{x} - \mu$ is defined as the sampling error.
 We will typically never know the sampling error, but we can say something about it.
 Basic intuition: The CLT tells us that $\bar{x}$ will be approximately normally distributed when n is
large. We can therefore use the normal table to establish confidence intervals (and, in fact, do
much more).
example: Consider the following population data (N = 70).
14 91 87 48 18 24 32 12 34 33
80 25 42 24 33 67 24 67 95 77
82 36  0  5 35 85 30 18 82 91
44 25 76  0 86 88  2 91 75 91
 4 70 25 58 31 72 63 73 48 66
24 55 16 25 64 34 97 32 12 88
96 35 94 72 90 91 78 41 97 29
 $\mu = 43.686$; $\sigma = 30.251$
 If we were to draw a random sample of size n from the population...
 We are 95% certain that $\bar{x}$ will fall within 1.96 standard deviations of the mean... BUT,
we need the standard deviation of $\bar{x}$. Suppose n = 36.
 Since we know the population, we can use the finite population formula. Most software
packages do not provide for this and instead use the infinite population formula. This is
reasonable because if we know the population, there’s no need to be taking samples!
Below are the solutions using both approaches.
 Finite population approach (the correct one for this problem).
 $\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \cdot \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{70-36}{70-1}} \cdot \frac{30.251}{\sqrt{36}} = 3.54$.
 So, we are 95% sure that $\bar{x}$ will fall between $43.686 - 1.96 \times 3.54$ and
$43.686 + 1.96 \times 3.54$ (36.747 and 50.621).
 This is our interval estimate.
 Infinite population approach (the one used by software packages and encountered
most often in reality).
 $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{30.251}{\sqrt{36}} = 5.04$.
 So, we are 95% sure that $\bar{x}$ will fall between $43.686 - 1.96 \times 5.04$ and
$43.686 + 1.96 \times 5.04$ (33.804 and 53.568).
 This is our interval estimate.
We might alternatively look at a 99% confidence interval.
 From the normal table, we see that we are 99% sure that $\bar{x}$ will fall within 2.576
standard deviations of the mean.
 Finite population: We are 99% sure that $\bar{x}$ will fall between 34.567 and 52.801.
 Infinite population: We are 99% sure that $\bar{x}$ will fall between 30.698 and 56.674.
 We call such statements precision statements.
 To be clear, let me stress that the finite population approach is the correct one for this
problem. In nearly every “real” case, we will be using the infinite population approach.
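The two standard errors and the resulting precision statements can be reproduced with a short Python sketch. The function names `standard_error` and `precision_interval` are illustrative and do not come from the notes, and the z-scores are taken from Python's `statistics.NormalDist` rather than a printed normal table, so the last digit may differ slightly from the hand calculations.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n, population_size=None):
    """sigma / sqrt(n), times the finite population correction when N is given."""
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

def precision_interval(mu, se, confidence):
    """Range the sample mean falls in with the given probability (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mu - z * se, mu + z * se

mu, sigma, N, n = 43.686, 30.251, 70, 36   # values from the example
se_finite = standard_error(sigma, n, population_size=N)
se_infinite = standard_error(sigma, n)

for confidence in (0.95, 0.99):
    for label, se in (("finite", se_finite), ("infinite", se_infinite)):
        low, high = precision_interval(mu, se, confidence)
        print(f"{confidence:.0%} {label:8s} se={se:.2f} -> ({low:.3f}, {high:.3f})")
```

The finite population correction always shrinks the standard error, which is why the finite-population intervals are narrower.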
In general, $P\left(|\bar{x} - \mu| \le z_{\alpha/2}\,\sigma_{\bar{x}}\right) = 1 - \alpha$.
Here, $\alpha$ is the probability of falling outside the given range and $1-\alpha$ is the confidence level.
$z_{\alpha/2}$ is the z-score that leaves an area of $\alpha/2$ in each tail. So if $\alpha$ = 5%, we want to find the entry in the normal
table corresponding to a tail of area 2.5%. We therefore look up 0.5 - 0.025 = 0.475 to find the
appropriate z-score.
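The table lookup can also be checked numerically. The sketch below assumes a half-table that lists the area between 0 and z, as the lookup of 0.475 suggests; `table_area` and `z_critical` are hypothetical helper names, and the inverse normal CDF from Python's standard library stands in for the printed table.

```python
from statistics import NormalDist

def table_area(confidence):
    """Area between 0 and z to look up in a half-table: 0.5 - alpha/2."""
    alpha = 1 - confidence
    return 0.5 - alpha / 2

def z_critical(confidence):
    """z-score that leaves alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: look up {table_area(confidence):.3f}, "
          f"z = {z_critical(confidence):.3f}")
```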
example: A random variable has a population mean of 200 and a population standard deviation of
50. What is a 90% confidence interval for a sample size of 100?
 $\alpha$ = 10%, so we need to find the z-score corresponding to a tail of 5% (i.e., we look up
0.45). From the normal table, we see that $z_{5\%} = 1.645$.
 $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{50}{\sqrt{100}} = 5$, so a 90% confidence interval is
$\{200 - 1.645 \times 5,\ 200 + 1.645 \times 5\}$ = {191.775, 208.225}. We are 90% sure that $\bar{x}$ will fall in this range when n = 100.


 Formally, we write $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ for a $1-\alpha$ confidence interval.
Now, suppose we want to choose n so that we are 95% sure that $\bar{x}$ will fall between 190
and 210. What is n?
 Recall that $z_{2.5\%} = 1.96$. We therefore need $\sigma_{\bar{x}}$ to satisfy $1.96\,\sigma_{\bar{x}} = 10$. So, $\sigma_{\bar{x}} = 5.102$.
 $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, so $n = \left(\frac{\sigma}{\sigma_{\bar{x}}}\right)^2 = \left(\frac{50}{5.102}\right)^2 = 96.04$.
 Because n must be a whole number, we round up. We therefore need a sample size of 97 to be 95% sure that $\bar{x}$ will fall between 190
and 210.

 Notice that $z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = E$ here (E is the margin of error specified in the units of the
random variable), so we can rearrange the formula to get $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$.
 In this problem, E = 10, $\sigma$ = 50, and $z_{\alpha/2}$ = 1.96, so $n = \left(\frac{50 \times 1.96}{10}\right)^2 = 96.04$.
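The rearranged formula is easy to turn into a small helper. This is a minimal sketch, with `sample_size_for_mean` as an illustrative name and the table value z = 1.96 passed in directly; it returns both the exact value and the rounded-up sample size.

```python
import math

def sample_size_for_mean(sigma, margin, z):
    """Return (exact n, rounded-up n) from n = (z * sigma / E) ** 2."""
    n_exact = (z * sigma / margin) ** 2
    # Round up because n must be a whole number; the small tolerance guards
    # against float round-off when n_exact is already an integer.
    return n_exact, math.ceil(n_exact - 1e-9)

n_exact, n_needed = sample_size_for_mean(sigma=50, margin=10, z=1.96)
print(f"n = {n_exact:.2f}, so use a sample of {n_needed}")
```

Because n grows with the square of 1/E, halving the margin of error to 5 roughly quadruples the required sample size.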

 Interval Estimation using Sample Means
 In most cases, we do not know the population parameters. In such cases, we must use the
information contained in samples to extract information about the underlying population. The
only substantial difference in the approach is that we use s instead of $\sigma$.
 example: Consider election polls and suppose that a poll of 1000 likely voters showed Bush with
48%. What is the margin of error?
 Recall that $s_{\bar{p}} = \sqrt{\frac{\bar{p}(1-\bar{p})}{n-1}}$, so $s_{\bar{p}} = \sqrt{\frac{0.48 \times (1-0.48)}{1000-1}} = 0.0158$.
 The “margin of error” is typically quoted at the 95% confidence level, so we are interested
in the range of numbers within 1.96 standard deviations of the mean. In this case,
$1.96 \times 0.0158 = 0.03097$. The margin of error is then about 3%.
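The poll calculation can be written as a short Python sketch. The name `proportion_margin` is illustrative, and the n - 1 denominator follows the formula used in these notes (many texts divide by n instead).

```python
import math

def proportion_margin(p_hat, n, z=1.96):
    """Return (standard error, margin of error) for a sample proportion."""
    std_err = math.sqrt(p_hat * (1 - p_hat) / (n - 1))
    return std_err, z * std_err

std_err, margin = proportion_margin(p_hat=0.48, n=1000)
print(f"standard error = {std_err:.4f}, margin of error = {margin:.4f}")
```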
example: Suppose that we want to do an election poll and want a margin of error of 1%. What
sample size do we need?
 We do not know p or $\bar{p}$ prior to taking the sample, so we must arbitrarily choose some
value.
 Notice that $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$ is highest when p = 0.5. Any other value results in a numerator
that is less than 0.25. Because this represents the worst-case scenario, it is common practice
to assume p = 0.5 when designing a poll.
 We want $1.96 \times \sqrt{\frac{0.5 \times (1-0.5)}{n}} = 0.01$.
 Solving gives n = 9604.
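Both steps of this design calculation, the worst-case choice of p and the solve for n, can be sketched in Python. The helper names `worst_case_p` and `poll_sample_size` are hypothetical, and the candidate values of p are chosen only for illustration.

```python
import math

def worst_case_p(candidates):
    """Candidate p with the largest p * (1 - p)."""
    return max(candidates, key=lambda p: p * (1 - p))

def poll_sample_size(margin, z=1.96, p=0.5):
    """Whole-number n solving z * sqrt(p * (1 - p) / n) = margin, rounded up."""
    n_exact = z ** 2 * p * (1 - p) / margin ** 2
    # Subtract a tiny tolerance so float round-off cannot push an exact
    # integer answer up to the next whole number.
    return math.ceil(n_exact - 1e-9)

print(worst_case_p([0.1, 0.3, 0.5, 0.7, 0.9]))
print(poll_sample_size(0.01))
print(poll_sample_size(0.03))
```

Comparing the 1% and 3% cases shows why most published polls settle for a margin of about 3%: the tighter margin needs roughly nine times as many respondents.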
example: Consider the following sample data:
1698 1926 1566 1812 1858 1807 1241 1248 1263 1367
1687 1388 1119 1714 1022 1544 1881 1636 1389 1039
1875 1492 1552 1827 1848 1341 1601 1053 1768 1408
1503 1372 1786 1550 1474 1257 1238 1625 1648 1842
1388 1454 1210 1604 1915 1103 1623 1496 1255 1536
1097 1872 1253 1538 1957 1586 1188 1885 1960 1549
1825 1991 1697 1251 1938 1989 1890 1885 1541 1817
 $\bar{x}$ = 1565.14; s = 274.95
 What is a 92% confidence interval?
 $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{274.95}{\sqrt{70}} = 32.863$
From the normal table, we see that 0.46 corresponds to a z-score of about 1.75. We look up
0.46 because it is half of 92%. In that way, we allow for a 4% tail on each side of the
distribution.
 So, a 92% confidence interval is $[1565.14 - 1.75 \times 32.863,\ 1565.14 + 1.75 \times 32.863]$ =
[1507.6, 1622.6]. We are 92% certain that $\mu$ is within that range.
 Interestingly, I generated the data using a discrete uniform distribution between 1000 and
2000. That distribution has $\mu$ = 1500. We therefore might incorrectly conclude from the
sample that the population mean is above 1500. This error is called a Type I error and will be
considered in the next chapter.
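How often should an interval like this miss the true mean? A small simulation gives a feel for it. The sketch below is an illustration under stated assumptions, not the procedure used to generate the data in the notes: it draws repeated samples of 70 integers uniformly from 1000 to 2000 with Python's `random` module, builds the interval with z = 1.75 each time, and counts how often 1500 falls inside. The function names and the seed are arbitrary choices.

```python
import math
import random
from statistics import mean, stdev

def interval_from_sample(sample, z):
    """Return x_bar -/+ z * s / sqrt(n) for one sample."""
    x_bar = mean(sample)
    std_err = stdev(sample) / math.sqrt(len(sample))
    return x_bar - z * std_err, x_bar + z * std_err

def coverage_rate(trials, n, z, low, high, seed):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    true_mean = (low + high) / 2       # mean of a discrete uniform on low..high
    hits = 0
    for _ in range(trials):
        sample = [rng.randint(low, high) for _ in range(n)]
        lower, upper = interval_from_sample(sample, z)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials

rate = coverage_rate(trials=2000, n=70, z=1.75, low=1000, high=2000, seed=0)
print(f"Share of intervals containing 1500: {rate:.3f}")
```

The share should land near 92%, which means that roughly 8 samples in 100 produce an interval that excludes the true mean, as happened with the sample above.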
 Dealing with Small Samples
 The Central Limit Theorem is great, but we are often faced with situations in which we have a
small sample. Perhaps, for example, we are doing a test that destroys the product. If we use a
large sample, it is extremely costly.
 If the underlying distribution is not approximately normal, we are in a bind.
Unless we know the underlying distribution and can form other test statistics, we cannot
adequately compute test statistics. In such cases, our only option is to increase the sample size to
the point where the CLT approximation is reasonable.
 If, however, the underlying distribution is approximately normal, we have some hope.
 Why?
 Recall that if the underlying random variable is normally distributed, then $\bar{x}$, standardized using s in place of $\sigma$, will be
“approximately” normally distributed for any n.
 By approximately, I mean that it has a normal-like shape, but isn’t quite normal for small n.
Fortunately, we know its distribution for those cases.
The t-Distribution
 The t-distribution is used in precisely the same way as the normal, but it applies when n is small
and the underlying distribution is approximately normal.
[Figure: density f(x) plotted against x for the standard normal, the t with n = 30, and the t with n = 15.]
Notice that as n gets larger, the t-distribution becomes closer and closer to the normal
distribution.
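The convergence can be checked numerically without any plotting. The sketch below codes the standard density formulas for the normal and the t directly, using only Python's `math` module, and measures the largest gap between the two curves on a grid. It assumes the degrees of freedom are n - 1, so n = 15 and n = 30 become 14 and 29; the function names and the grid from -4 to 4 are illustrative choices.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student t density, computed on the log scale so large df does not overflow."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def max_gap(df, lo=-4.0, hi=4.0, steps=800):
    """Largest absolute difference between the t and normal densities on a grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(t_pdf(x, df) - normal_pdf(x)) for x in xs)

for df in (14, 29, 1000):
    print(f"df = {df:4d}: largest gap from the normal = {max_gap(df):.5f}")
```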
example: Consider the following sample data:
47.00 67.33 43.10 37.22 28.16
33.10 52.44 47.66 31.53 62.76
60.95 40.22 61.98 39.13 42.26
 What is an 80% confidence interval?
 Looking at the table (Table 2 in Appendix B), we see that the t-distribution requires us to
choose something called the degrees of freedom. Intuitively, the degrees of freedom are the
number of opportunities the data have to vary. We have 15 observations, so we have 15 - 1 = 14 degrees of
freedom.
 Also notice that the t table differs from the normal table. Instead of having the z-score on the
axis and the probabilities tabulated, the t table has the probability and degrees of freedom on
the axes and the equivalent of the z-score tabulated.
 For an 80% confidence interval, we need 10% in each tail. So, we look up the entry for 14
degrees of freedom and 0.10 in the upper tail. This gives us t = 1.345.

 $\bar{x}$ = 46.32 and $s_{\bar{x}} = \frac{12.34}{\sqrt{15}} = 3.19$, so the 80% confidence interval is
$[46.32 - 1.345 \times 3.19,\ 46.32 + 1.345 \times 3.19]$ = [42.04, 50.61].

 Formally, we write $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$ for a $1-\alpha$ confidence interval.
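The small-sample interval can be recomputed directly from the 15 observations. In this sketch the critical value 1.345 is the table entry quoted in the notes (14 degrees of freedom, 0.10 in the upper tail) and is entered by hand, since Python's standard library has no t-distribution; `t_interval` is an illustrative name.

```python
import math
from statistics import mean, stdev

data = [47.00, 67.33, 43.10, 37.22, 28.16,
        33.10, 52.44, 47.66, 31.53, 62.76,
        60.95, 40.22, 61.98, 39.13, 42.26]

T_CRIT_80_DF14 = 1.345  # table value from the notes: 14 df, 0.10 upper tail

def t_interval(sample, t_crit):
    """Return x_bar -/+ t_crit * s / sqrt(n), with s the sample standard deviation."""
    n = len(sample)
    x_bar = mean(sample)
    std_err = stdev(sample) / math.sqrt(n)
    return x_bar - t_crit * std_err, x_bar + t_crit * std_err

lower, upper = t_interval(data, T_CRIT_80_DF14)
print(f"degrees of freedom = {len(data) - 1}")
print(f"80% interval: [{lower:.2f}, {upper:.2f}]")
```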