1/24/07 252onea (Open this document in 'Outline' view!)
ECONOMICS 252 COURSE OUTLINE
A. Parameter Estimation
1. Review of the Normal Distribution
See 251greatD, 251distrex2, 251distrex3, 251distrex4
2. Point and Interval Estimation
Point and Interval Estimation. Properties of Estimators.
Let $\hat{\theta}$ be an estimator for $\theta$.
i. Unbiasedness: $E(\hat{\theta}) = \theta$.
ii. Consistency (as the sample size gets larger, the estimate gets better).
iii. Efficiency ($\hat{\theta}$ has a small variance). Define BLUE.
iv. Maximum Likelihood ($\hat{\theta}$ is the value of $\theta$ that is most likely to have produced the observed
data).
3. A Confidence Interval for the Mean when the Population Variance is Known.
a. A Two-Sided Confidence Interval
An interval of this type is used in two situations: (i) where the population variance, $\sigma^2$, is in fact
known and the sample size is relatively large; or (ii) where the variance is not known and the sample
variance, $s^2$, is used to replace $\sigma^2$, but the degrees of freedom are so large that the appropriate value of
$t^{(n-1)}$ is not very different from $z$. The first of these situations is not very realistic, but serves as a good
introduction to confidence intervals. The formula for this type of confidence interval for the mean is
$\mu = \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\sigma^2}{n}}$.
Note: If $n > .05N$, use $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
($n$ is sample size and $N$ is population size). See 252oneaex1.
Don’t use this method unless you know the population variance.
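As an illustration of the two-sided interval with a known population variance, the following is a minimal Python sketch. The function name `z_interval`, its parameters, and the numbers in the example are hypothetical and are not taken from 252oneaex1; the finite population correction is applied when $n > .05N$, which is an assumed reading of the note above.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95, N=None):
    """Two-sided interval xbar +/- z * sigma_xbar for a known population sigma."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z for alpha/2 in each tail
    se = sigma / sqrt(n)                           # standard error of the mean
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))              # finite population correction
    return xbar - z * se, xbar + z * se


# Hypothetical numbers: sample mean 100, known sigma 15, n = 36
low, high = z_interval(100, 15, 36)
print(round(low, 2), round(high, 2))
```

Passing a population size, for example `z_interval(100, 15, 36, N=100)`, shrinks the standard error and so narrows the interval.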
b. A One-Sided Confidence Interval.
There are two types of one-sided confidence interval for the mean.
These are (i) an upper bound, and (ii) a lower bound, and have the form: $\mu \le \bar{x} + z_{\alpha}\,\sigma_{\bar{x}}$ and
$\mu \ge \bar{x} - z_{\alpha}\,\sigma_{\bar{x}}$. An example is in 252oneaex1a.
4. A Confidence Interval for the Mean when the Population Variance is not Known.
"The variance is not known" implies no previous knowledge or assumption about the value of $\sigma^2$.
Knowing $s^2$ is having a guess as to what the variance is; it is not the same as knowing the variance. If the
population distribution is normal or approximately normal, the formula for a two-sided confidence interval
for the mean is $\mu = \bar{x} \pm t_{\alpha/2}^{(n-1)}\, s_{\bar{x}}$, where $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{s^2}{n}}$.
Note: If $n > .05N$, use $s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$.
See 252oneaex2 and 252oneaex3.
Note: this is the more common case – if you do not know the population variance and the sample size
is not very large, using z instead of t is a very bad idea.
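A minimal Python sketch of the interval for the mean when the variance is not known follows. The function name `t_interval` and the data are hypothetical; the value of $t$ is passed in from a t table (2.776 is the usual table value for 4 degrees of freedom and a 95% two-sided interval) because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_value, N=None):
    """Two-sided interval xbar +/- t * s_xbar, with t_value read from a t table."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                      # sample standard deviation (n - 1 divisor)
    se = s / sqrt(n)                     # estimated standard error of the mean
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))    # finite population correction
    return xbar - t_value * se, xbar + t_value * se


sample = [20, 25, 29, 30, 31]            # hypothetical data, n = 5, so 4 degrees of freedom
low, high = t_interval(sample, t_value=2.776)
print(round(low, 3), round(high, 3))
```

Using $z = 1.960$ in place of $t = 2.776$ here would give a noticeably narrower interval, which is the danger the note above warns about.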
5. Deciding on Sample Size when working with a Mean
The formula usually suggested is $n = \frac{z^2 \sigma^2}{e^2}$, where, if $\sigma$ is not known, it can be approximated by
$\frac{x_{.001} - x_{.999}}{6}$.
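The sample-size formula for a mean can be sketched in Python as below. The function names and the numbers are hypothetical, $e$ is taken to be the largest error we are willing to accept, and rounding up to the next whole observation is an added assumption.

```python
from math import ceil
from statistics import NormalDist


def sigma_from_range(high_value, low_value):
    """Rough guess at sigma: the distance between extreme percentiles divided by 6."""
    return (high_value - low_value) / 6


def sample_size_mean(sigma, e, conf=0.95):
    """n = z^2 * sigma^2 / e^2, rounded up to a whole number of observations."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * sigma**2 / e**2)


sigma_guess = sigma_from_range(145, 55)      # hypothetical extremes, gives 15.0
print(sample_size_mean(sigma_guess, e=2))
```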
6. A Confidence Interval for a Proportion.
(a. Small Samples.
Table 16 gives Confidence Intervals for proportions. These tables are of use when the conditions
do not exist in which one can use the normal distribution. For example if $n = 10$ and $\bar{p} = .5$, and we wish to
find a 95% confidence interval, we can look at the horizontal axis of the upper table. There we can find
$\bar{p} = .5$ and look up to find the upper and lower curves for $n = 10$. The vertical line at $\bar{p} = .5$ intersects
these curves. The lower curve meets the vertical line at about $p = .175$. (Read up the vertical axis.) The
upper curve meets the vertical line at about $p = .825$, so that our 95% confidence interval is about
$.175 \le p \le .825$.)
b. Large Samples.
More usually, using the normal approximation to the binomial distribution, and using $p$ for the
population probability of success and $q$ for the population probability of failure, and letting $\bar{p}$ and $\bar{q}$ be
the corresponding sample quantities, we can write $p = \bar{p} \pm z_{\alpha/2}\, s_{\bar{p}}$, where $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$
and $\bar{q} = 1 - \bar{p}$. An example is in 251 proport.
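A minimal Python sketch of the large-sample interval for a proportion is given below. The function name `proportion_interval` and the counts are hypothetical and are not the example in 251 proport.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, conf=0.95):
    """Large-sample interval p_bar +/- z * sqrt(p_bar * q_bar / n)."""
    p_bar = successes / n                # sample proportion of successes
    q_bar = 1 - p_bar                    # sample proportion of failures
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p_bar * q_bar / n)         # estimated standard error of p_bar
    return p_bar - z * se, p_bar + z * se


# Hypothetical sample: 120 successes in 300 trials
low, high = proportion_interval(120, 300)
print(round(low, 4), round(high, 4))
```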
c. Deciding on Sample Size.
The usually suggested formula is $n = \frac{pqz^2}{e^2}$, but since $p$ is usually unknown, a conservative choice is to
set $p = 0.5$. This is the formula everyone forgets that we covered.
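The sample-size formula for a proportion can be sketched as follows. The function name, the default of $p = 0.5$, and the error values are illustrative assumptions; the result is rounded up to a whole observation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(e, p=0.5, conf=0.95):
    """n = p * q * z^2 / e^2; p = 0.5 gives the largest (most conservative) n."""
    q = 1 - p
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(p * q * z**2 / e**2)


print(sample_size_proportion(0.03))          # conservative choice p = 0.5
print(sample_size_proportion(0.03, p=0.2))   # smaller n if p is believed to be near 0.2
```

Because $pq$ is largest at $p = 0.5$, any other value of $p$ gives a smaller required sample, which is why 0.5 is the safe choice.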
7. A Confidence Interval for a Variance.
This method is only appropriate when the population distribution is normal or approximately
normal. For small samples use
$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$,
but if the degrees of freedom are too large for the chi-square table use
$\frac{s\sqrt{2DF}}{z_{\alpha/2} + \sqrt{2DF}} \le \sigma \le \frac{s\sqrt{2DF}}{-z_{\alpha/2} + \sqrt{2DF}}$.
An example is in 252oneaex4.
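The large-sample version can be sketched in Python, assuming the formula is read as $s\sqrt{2DF}/(\pm z_{\alpha/2} + \sqrt{2DF})$ with $DF = n - 1$. The function name and the numbers are hypothetical and are not those of 252oneaex4. The result is an interval for the standard deviation; squaring both ends gives an interval for the variance.

```python
from math import sqrt
from statistics import NormalDist


def sigma_interval_large_df(s, n, conf=0.95):
    """Approximate interval for sigma when n - 1 is too large for the chi-square table."""
    df = n - 1
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    root = sqrt(2 * df)
    lower = s * root / (z + root)        # larger denominator gives the lower limit
    upper = s * root / (-z + root)       # smaller denominator gives the upper limit
    return lower, upper


# Hypothetical sample: s = 10 from n = 201 observations (DF = 200)
low, high = sigma_interval_large_df(10, 201)
print(round(low, 3), round(high, 3))
print(round(low**2, 2), round(high**2, 2))   # corresponding interval for the variance
```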
(8. Appendix
A Confidence Interval for a Median.
In a situation where the population distribution is not normal, it is often more appropriate to find
the median than the mean. The process of finding a confidence interval for a median is based on one simple
fact: the probability that a single number picked at random from a population is above (or below) the
median is 50%. Similarly, the probability that any two numbers picked at random from a population are
both above (or both below) the median is 25%. This comes from the multiplication rule: If $A$ is the
event that the first number is above the median, and $B$ is the event that the second number is
above the median, then $P(A \cap B) = P(A)\,P(B)$ if $A$ and $B$ are independent events. If the probability of
both numbers being above the median is 25%, and the probability of both numbers being below the median
is 25%, then the probability that both numbers are on the same side of the median is 50%. This is due to the
addition rule: Let event $C$ be "both numbers are above the median," and event $D$ be "both numbers are
below the median." Then event $C \cup D$ is "both numbers are on the same side of the median." The
addition rule says that if $C$ and $D$ are mutually exclusive, $P(C \cup D) = P(C) + P(D)$. Finally, if the
probability that both numbers are on the same side of the median is 50%, then the probability that the two
numbers are on opposite sides of the median is also 50%. This means that, since any two numbers picked
from the sample have a 50% chance of bracketing the median, these two numbers constitute a 50%
confidence interval.
Note that, since p , the probability that any one number is above the median, is 0.5, and q , the
probability that any one number is below the median, is also 0.5, we have a problem that resembles
finding the distribution of the number of heads on two tosses of a fair coin. If we call a head a success, the
distribution of heads on two tosses is described by the binomial distribution with n (the number of tries)
set at 2, and p (the probability of success on one try) set at 0.5. For convenience, we will use q (the
probability of failure on one try) for the probability that one number is below the median or of getting a tail
on one toss of a fair coin. It is always true that $q = 1 - p$. The formula for the binomial distribution is
$P(x) = C^n_x\, p^x q^{n-x}$, where $x$ is the number of successes. For the probability of two successes (heads) in 2
tries, we find that $P(2) = C^2_2 (.5)^2 (.5)^0 = 1(.25)(1) = .25$. We find the probability of two heads or two tails
in two tries by noting that the probability of two failures (tails) is $P(0) = C^2_0 (.5)^0 (.5)^2 = .25$. Thus the
probability of two heads or two tails is $P(2) + P(0) = .25 + .25 = .50$. This is the same as the
probability of two randomly picked numbers both being on the same side of the median.
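The coin-toss arithmetic above can be checked with a short Python sketch. The function name `binom_pmf` is a hypothetical helper; it simply evaluates the binomial formula with `math.comb` for the combination count.

```python
from math import comb


def binom_pmf(x, n, p=0.5):
    """Binomial probability of exactly x successes in n tries."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)


two_heads = binom_pmf(2, 2)              # both numbers above the median
two_tails = binom_pmf(0, 2)              # both numbers below the median
same_side = two_heads + two_tails
print(two_heads, two_tails, same_side, 1 - same_side)
```

The last value printed is the chance that the two numbers fall on opposite sides of the median, that is, the confidence level of the interval they form.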
To take this a bit further, let us assume that we take a sample of n numbers from a population and
then take two numbers at equal distances from the ends of the sample (for example, the fourth lowest and
the fourth highest of a sample of 20 numbers). We will find that it is relatively easy to figure out the
probability that these numbers bracket the median, and this will be our confidence level. This process
requires some new thinking because: (i) we find our confidence interval without using a point estimate as
we did in every previously studied method for constructing a confidence interval; and (ii) we find the
interval first and then figure out its confidence level instead of starting with a confidence level and then
figuring out the interval. This process serves as an introduction to the field of nonparametric statistics,
which is largely made up of methods that do intervals and tests without assuming that the parent distribution
(the distribution of the population from which the sample is drawn) is normal. In the case of finding a
median, the process to be explained would be unnecessary if the parent population were normal, because
in a normal population the mean and median are identical. Therefore, if the parent population is normal,
we could use a method for finding a confidence interval for the mean in place of a method for finding a
confidence interval for a median.
Assume that we pick a sample of four from a population, and that this sample, when put in ascending order,
is 20, 25, 29, 30. If we use two numbers at equal distances from the ends as our confidence interval, we
can use $20 \le \eta \le 30$ or $25 \le \eta \le 29$ ($\eta$ is our symbol for a population median). The first of these intervals
($20 \le \eta \le 30$) is wrong only if all four numbers in the sample are below the median or all four numbers are
above the median. The probability that all four are above the median is the same as the probability of four
heads in four tosses, $P(4) = C^4_4 (.5)^4 (.5)^0 = .0625$. The probability that all four numbers are below the
median is the same as the probability of four tails on four tosses, $P(0) = C^4_0 (.5)^0 (.5)^4 = .0625$. We can find
the probability of all four being above the median from a cumulative binomial table by noting that, for
$n = 4$, $P(x = 4) = P(x \ge 4) = 1 - P(x \le 3)$.
The binomial table will tell us that, for $p = .5$, $P(x \le 4) = 1$, and
$P(x \le 3) = .9375$, so $P(x \ge 4) = 1 - .9375 = .0625$. Since the probability that all four numbers are below the
median, $P(x = 0)$, is the same as the probability that all four numbers are above the median, the
probability that the two numbers do not bracket the median (the probability that we are wrong or the
significance level) is $\alpha = 2P(x = 0) = 2(.0625) = .1250$. The confidence level is thus
$1 - \alpha = 1 - 2P(x = 0) = 1 - 2(.0625) = .8750$.
Now try picking the confidence interval $25 \le \eta \le 29$, by choosing the numbers $x_2$ and $x_3$, that
is the second from the top and the second from the bottom in the ordered sample, 20, 25, 29, 30. This
interval is invalid if (i) the lowest three or more numbers in the sample are below the median (equivalent to
three or more tails when a coin is tossed four times), or (ii) the highest three or more numbers in the sample
are above the median (equivalent to three or more heads). The probability of the first of these events is (for
$n = 4$ and $p = .5$) $P(x \le 1)$, and the probability of the second event is $P(x \ge 3)$. But, using the binomial
table we find that $P(x \ge 3) = 1 - P(x \le 2) = P(x \le 1) = .3125$.
So the probability that the interval
does not bracket the median is $2P(x \le 1) = 2(.3125) = .6250$, and the confidence level is
$1 - \alpha = 1 - 2P(x \le 1) = 1 - 2(.3125) = .3750$.
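One way to check both confidence levels for the sample of four is to list every equally likely pattern of observations falling above or below the median and count the patterns in which the chosen pair brackets it. This brute-force sketch is an illustration, not the method of the notes; the function name is hypothetical and ties with the median are ignored.

```python
from itertools import product


def bracket_probability(n, k):
    """Share of above/below patterns in which the k-th lowest and k-th highest bracket the median."""
    hits = 0
    for pattern in product([0, 1], repeat=n):   # 1 means that observation is above the median
        above = sum(pattern)
        below = n - above
        # The k-th lowest is below the median when at least k observations are below it;
        # the k-th highest is above the median when at least k observations are above it.
        if below >= k and above >= k:
            hits += 1
    return hits / 2**n


print(bracket_probability(4, 1))   # outer pair, 20 and 30
print(bracket_probability(4, 2))   # inner pair, 25 and 29
```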
Generalize this to a situation where we take a random sample of $n$ items from a population and put
the numbers in ascending order so that $x_1 \le x_2 \le x_3 \le \cdots \le x_{n-1} \le x_n$. Now pick $x_k$ and $x_{n-k+1}$, the
numbers that are the $k$th from the bottom and the $k$th from the top, respectively. This interval is invalid if
(i) all the numbers included in the interval and all the numbers below the interval are below the median or
(ii) all the numbers in the interval and all the numbers above the interval are above the median. The
probability of the first event is $P(x \le k-1)$ and the probability of the second event is
$P(x > n-k) = P(x \le k-1)$; the equality is due to the symmetry of the binomial distribution for $p = .5$. So
$\alpha = 2P(x \le k-1)$, and the confidence level is $1 - \alpha = 1 - 2P(x \le k-1)$. For example, if we take a sample of
100 items and put them in order and then use the interval $x_{38} \le \eta \le x_{63}$, that is, the 38th number from the
bottom and the 38th number from the top, the confidence level (from the binomial table for
$n = 100$ and $p = .5$) is $1 - \alpha = 1 - 2P(x \le 37) = 1 - 2(.0060) = .9880$.
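The general rule can be sketched in Python by computing the cumulative binomial probability directly instead of reading it from a table. The function names are hypothetical; the calculation assumes $p = .5$, so every one of the $2^n$ above/below patterns is equally likely.

```python
from math import comb


def binom_cdf_half(c, n):
    """P(x <= c) for a binomial with n tries and p = .5."""
    return sum(comb(n, x) for x in range(c + 1)) / 2**n


def median_confidence(n, k):
    """Confidence level of the interval from the k-th lowest to the k-th highest value."""
    alpha = 2 * binom_cdf_half(k - 1, n)
    return 1 - alpha


print(median_confidence(4, 1))                # sample of four, outer pair
print(median_confidence(4, 2))                # sample of four, inner pair
print(round(median_confidence(100, 38), 4))   # 38th from each end of 100
```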
There will be some situations in which we cannot find Px  k  1 on the cumulative binomial
table. Then we must use a normal approximation to the binomial distribution, that is (using a continuity
correction), find the normal probability,
$P(x \le k-1) \approx P\left(x \le k - 1 + \tfrac{1}{2}\right) = P\left(z \le \frac{k - 1 + \frac{1}{2} - np}{\sqrt{npq}}\right) = P\left(z \le \frac{k - .5 - .5n}{.5\sqrt{n}}\right)$.
(In the last part of this equality, .5 was
substituted for both $p$ and $q$.) This takes us back to a more conventional formulation for the confidence
interval because we can choose $k$ so that $-z_{\alpha/2} = \frac{k - .5 - .5n}{.5\sqrt{n}}$. If we solve this equation for $k$, we find
that $k = \frac{n + 1 - z_{\alpha/2}\sqrt{n}}{2}$. Thus if we want a 95% confidence interval for the median and we take a sample
of $n = 150$, we pick $k = \frac{150 + 1 - 1.96\sqrt{150}}{2} = 63.4975$, which we round down to 63. Our interval will then be
$x_{63} \le \eta \le x_{88}$.)
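The normal-approximation rule for choosing $k$ can be sketched in Python as follows. The function name is hypothetical, and rounding $k$ down (so that the interval is no narrower than the formula asks for) is an assumption consistent with the $n = 150$ example.

```python
from math import floor, sqrt
from statistics import NormalDist


def median_interval_ranks(n, conf=0.95):
    """Return (k before rounding, lower rank k, upper rank n - k + 1)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    k_exact = (n + 1 - z * sqrt(n)) / 2
    k = floor(k_exact)                   # round down to a usable position in the ordered sample
    return k_exact, k, n - k + 1


k_exact, k, upper = median_interval_ranks(150)
print(round(k_exact, 4), k, upper)
```

With an ordered sample stored in a list `xs`, the interval would run from `xs[k - 1]` to `xs[upper - 1]`, since Python lists are indexed from zero.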
© 2002 R. E. Bove