Chapter 2
Statistical Background
2.3 Random Variables and
Probability Distributions
• A variable X is said to be a random variable (rv)
if for every real number a there exists a
probability P(X ≤ a) that X takes on a value less
than or equal to a.
• Thus P(X = x) is the probability that the random
variable X takes the value x.
• P(x1 ≤ X ≤ x2) is the probability that the random
variable X takes values between x1 and x2, both
inclusive.
2.3 Random Variables and
Probability Distributions
• A formula giving the probabilities for different
values of the random variable X is called a
probability distribution in the case of discrete
random variables.
• For continuous random variables, the analogous
formula is the probability density function
(p.d.f.), usually denoted by f(x).
2.3 Random Variables and
Probability Distributions
• In general, for a continuous random variable,
the occurrence of any exact value of X may be
regarded as having a zero probability.
• Hence probabilities are discussed in terms of
some ranges.
• These probabilities are obtained by integrating
f(x) over the desired range.
2.3 Random Variables and
Probability Distributions
• For instance, if we want Prob(a ≤ X ≤ b),
this is given by

  \mathrm{Prob}(a \le X \le b) = \int_{a}^{b} f(x)\,dx

and the cumulative probability up to c is

  F(c) = \mathrm{Prob}(X \le c) = \int_{-\infty}^{c} f(x)\,dx
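As an illustration (not part of the original slides), the sketch below computes Prob(a ≤ X ≤ b) for a standard normal density in Python, once by numerically integrating f(x) over [a, b] and once from the cumulative distribution function; the interval endpoints a and b are arbitrary choices.

```python
import numpy as np
from scipy import integrate, stats

# Standard normal density f(x); mu = 0 and sigma = 1 are illustrative choices.
def f(x, mu=0.0, sigma=1.0):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

a, b = -1.0, 2.0   # arbitrary interval endpoints

# Prob(a <= X <= b) by integrating the density over [a, b].
prob_by_integration, _ = integrate.quad(f, a, b)

# The same probability from the cumulative distribution function: F(b) - F(a).
prob_by_cdf = stats.norm.cdf(b) - stats.norm.cdf(a)

print(prob_by_integration, prob_by_cdf)   # both ≈ 0.8186
```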
2.4 The Normal Probability Distribution
and
Related Distributions
• There are some probability distributions for
which the probabilities have been tabulated
and which are considered suitable descriptions
for a wide variety of phenomena.
• These are the normal distribution and the χ², t,
and F distributions.
2.4 The Normal Probability Distribution
and
Related Distributions
• There is also a question of whether the normal
distribution is an appropriate one to use to
describe economic variables.
• However, even if the variables are not normally
distributed, one can consider transformations of
the variables so that the transformed variables
are normally distributed.
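As an illustrative sketch (not from the slides), a log transformation is one common choice: a positively skewed lognormal variable becomes exactly normal after taking logs. The parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# A positively skewed variable, as many economic series (e.g., incomes) are;
# the lognormal parameters are illustrative choices.
income = rng.lognormal(mean=3.0, sigma=0.8, size=10_000)
log_income = np.log(income)          # the transformed variable is normally distributed here

print(income.mean(), np.median(income))           # skewed: mean well above median
print(log_income.mean(), np.median(log_income))   # roughly symmetric: mean ≈ median ≈ 3
```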
2.4 The Normal Probability Distribution
and
Related Distributions
The Normal Distribution (an example)
• The normal distribution is a bell-shaped
distribution which is used most extensively in
statistical applications in a wide variety of fields.
• Its probability density function is given by

  f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left[-\frac{1}{2\sigma^{2}}(x-\mu)^{2}\right], \qquad -\infty < x < \infty
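A quick sketch (not from the slides) that evaluates the density formula above, compares it with scipy.stats.norm.pdf, and checks that it integrates to one; μ and σ are arbitrary illustrative values.

```python
import numpy as np
from scipy import integrate, stats

mu, sigma = 1.5, 2.0   # arbitrary illustrative parameters

def normal_pdf(x, mu, sigma):
    """The density formula from the slide."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10, 13, 7)
print(np.allclose(normal_pdf(x, mu, sigma), stats.norm.pdf(x, loc=mu, scale=sigma)))  # True

total, _ = integrate.quad(normal_pdf, -np.inf, np.inf, args=(mu, sigma))
print(total)   # ≈ 1.0: the density integrates to one
```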
2.4 The Normal Probability Distribution
and
Related Distributions
• If x_1 ~ N(μ_1, σ_1²) and x_2 ~ N(μ_2, σ_2²),
and the correlation between x_1 and x_2 is ρ, then

  a_1 x_1 + a_2 x_2 \sim N\!\left(a_1\mu_1 + a_2\mu_2,\; a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2\rho a_1 a_2 \sigma_1 \sigma_2\right)

• In particular,

  x_1 + x_2 \sim N\!\left(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2\right)

and

  x_1 - x_2 \sim N\!\left(\mu_1 - \mu_2,\; \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\right)
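A Monte Carlo sketch (illustrative values, not from the slides) that draws correlated normal pairs and checks the simulated mean and variance of a₁x₁ + a₂x₂ against the formula above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (not from the slides).
mu1, mu2 = 2.0, -1.0
s1, s2 = 1.5, 0.5
rho = 0.6
a1, a2 = 3.0, 2.0

cov = np.array([[s1 ** 2, rho * s1 * s2],
                [rho * s1 * s2, s2 ** 2]])
x = rng.multivariate_normal([mu1, mu2], cov, size=200_000)
z = a1 * x[:, 0] + a2 * x[:, 1]

# Theoretical mean and variance from the formula on the slide.
mean_theory = a1 * mu1 + a2 * mu2
var_theory = a1 ** 2 * s1 ** 2 + a2 ** 2 * s2 ** 2 + 2 * rho * a1 * a2 * s1 * s2

print(z.mean(), mean_theory)   # both ≈ 4.0
print(z.var(), var_theory)     # both ≈ 26.65
```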
2.4 The Normal Probability Distribution
and
Related Distributions
The χ²-Distribution
• If x_1, x_2, …, x_n are independent normal
variables with mean zero and variance 1,
that is, x_i ~ IN(0, 1), i = 1, 2, …, n, then

  Z = \sum_{i=1}^{n} x_i^{2}

is said to have the χ²-distribution with
degrees of freedom (d.f.) n, and we will
write this as Z ~ χ²_n.
2.4 The Normal Probability Distribution
and
Related Distributions
• The subscript n denotes the d.f.
• The χ²_n-distribution is the distribution of
the sum of squares of n independent
standard normal variables.
2.4 The Normal Probability Distribution
and
Related Distributions
• If x_i ~ IN(0, σ²), then Z should be defined as

  Z = \sum_{i=1}^{n} \frac{x_i^{2}}{\sigma^{2}}

• The χ²-distribution also has an “additive
property,” although it is different from the
property of the normal distribution and is much
more restrictive.
2.4 The Normal Probability
Distribution and
Related Distributions
• The property is:
If Z_1 ~ χ²_n and Z_2 ~ χ²_m, and Z_1 and Z_2 are
independent, then

  Z_1 + Z_2 \sim \chi^2_{n+m}
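A minimal Monte Carlo sketch (not from the slides) of both facts: the sum of squares of n independent N(0, 1) draws behaves like a χ² variable with n d.f., and independent χ² variables add their degrees of freedom. The values of n, m, and the replication count are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, m, reps = 5, 3, 100_000   # arbitrary degrees of freedom and replication count

# Z1 = sum of squares of n independent N(0, 1) variables: chi-square with n d.f.
z1 = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)
print(z1.mean(), z1.var())   # ≈ n and 2n, the chi-square(n) mean and variance

# Additive property: independent chi-square(n) + chi-square(m) = chi-square(n + m).
z2 = (rng.standard_normal((reps, m)) ** 2).sum(axis=1)
z = z1 + z2
print(z.mean(), z.var())                              # ≈ n + m and 2(n + m)
print(stats.kstest(z, "chi2", args=(n + m,)).pvalue)  # KS test vs chi2(n + m): rejects only at its nominal rate
```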
2.4 The Normal Probability Distribution
and
Related Distributions
t-Distribution
• If x ~ N(0, 1) and y ~ χ²_n, and x and y are
independent, then

  Z = \frac{x}{\sqrt{y/n}}

has a t-distribution with d.f. n.
• We write this as Z ~ t_n.
• The subscript n again denotes the d.f.
2.4 The Normal Probability Distribution
and
Related Distributions
• Thus the t-distribution is the distribution of a
standard normal variable divided by the square
root of an independent averaged χ² variable
(a χ² variable divided by its degrees of freedom).
• The t-distribution is a symmetric probability
distribution like the normal distribution but is
flatter than the normal and has longer tails.
• As the d.f. n approaches infinity, the t-distribution
approaches the normal distribution.
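A small sketch (illustrative, not from the slides) that builds t variates exactly as defined above, from a standard normal and an independent χ²_n variable, and checks the heavier tails; the degrees of freedom n is an arbitrary choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 6, 100_000                   # arbitrary d.f. and replication count

x = rng.standard_normal(reps)          # x ~ N(0, 1)
y = rng.chisquare(df=n, size=reps)     # y ~ chi-square with n d.f., independent of x
z = x / np.sqrt(y / n)                 # the construction from the slide

print(z.var(), n / (n - 2))            # longer tails than N(0, 1): variance n/(n-2) > 1
print(stats.kstest(z, "t", args=(n,)).pvalue)  # KS test vs t_n: rejects only at its nominal rate
```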
2.4 The Normal Probability Distribution
and
Related Distributions
F-Distribution
• If y_1 ~ χ²_{n_1} and y_2 ~ χ²_{n_2}, and y_1 and y_2 are
independent, then

  Z = \frac{y_1 / n_1}{y_2 / n_2}

has the F-distribution with d.f. n_1 and n_2.
• We write this as Z ~ F_{n_1, n_2}.
2.4 The Normal Probability
Distribution and
Related Distributions
F-Distribution
• The first subscript, n_1, refers to the d.f. of the
numerator, and the second subscript, n_2, refers to
the d.f. of the denominator.
• The F-distribution is thus the distribution of the
ratio of two independent averaged χ² variables.
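As an illustrative sketch (not from the slides), the code below forms F variates as the ratio of two independent averaged χ² variables and compares them with scipy's F distribution; n1 and n2 are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n1, n2, reps = 4, 10, 100_000          # arbitrary degrees of freedom and replication count

y1 = rng.chisquare(df=n1, size=reps)   # chi-square with n1 d.f.
y2 = rng.chisquare(df=n2, size=reps)   # independent chi-square with n2 d.f.
z = (y1 / n1) / (y2 / n2)              # ratio of two "averaged" chi-square variables

print(z.mean(), n2 / (n2 - 2))                      # ≈ the F(n1, n2) mean for n2 > 2
print(stats.kstest(z, "f", args=(n1, n2)).pvalue)   # KS test vs F(n1, n2): rejects only at its nominal rate
```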
2.5 Classical Statistical Inference
• Statistical inference is the area that
describes the procedures by which we use
the observed data to draw conclusions
about the population from which the data
came or about the process by which the
data were generated.
2.5 Classical Statistical Inference
• Broadly speaking, statistical inference can
be classified under two headings:
– Classical inference
– Bayesian inference.
2.5 Classical Statistical Inference
• In Bayesian inference we combine sample
information with prior information.
• Suppose that we draw a random sample y_1,
y_2, …, y_n of size n from a normal population with
mean μ and variance σ² (assumed known),
and we want to make inferences about μ.
2.5 Classical Statistical Inference
• In classical inference we take the sample mean ȳ
as our estimate of μ. Its variance is σ²/n.
• The inverse of this variance is known as the
sample precision. Thus the sample precision
is n/σ².
2.5 Classical Statistical Inference
• In Bayesian inference we have prior information
on μ.
• This is expressed in terms of a probability
distribution known as the prior distribution.
• Suppose that the prior distribution is normal with
mean μ_0 and variance σ_0², that is, precision 1/σ_0².
2.5 Classical Statistical Inference
• We now combine this with the sample
information to obtain what is known as the
posterior distribution of  .
• This distribution can be shown to be normal.
• Its mean is a weighted average of the sample
mean ȳ and the prior mean μ_0, weighted by the
sample precision and prior precision, respectively.
2.5 Classical Statistical Inference
• Thus

  \mu(\text{Bayesian}) = \frac{w_1 \bar{y} + w_2 \mu_0}{w_1 + w_2}

where

  w_1 = \frac{n}{\sigma^2} = \text{sample precision}, \qquad w_2 = \frac{1}{\sigma_0^2} = \text{prior precision}

• Also, the precision (or inverse of the variance) of the
posterior distribution of μ is w_1 + w_2, that is, the
sum of the sample precision and the prior precision.
2.5 Classical Statistical Inference
• For instance, if the sample mean is 20 with variance
4 and the prior mean is 10 with variance 2, we have

  \text{posterior variance} = \left(\frac{1}{4} + \frac{1}{2}\right)^{-1} = \frac{4}{3} \approx 1.33

  \text{posterior mean} = \frac{\frac{1}{4}(20) + \frac{1}{2}(10)}{\frac{1}{4} + \frac{1}{2}} = \frac{10}{3/4} \approx 13.33
• The posterior mean will lie between the sample
mean and the prior mean.
• The posterior variance will be less than both the
sample and prior variance.
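A minimal sketch of the precision-weighted combination described above; the function name is illustrative, and the call reproduces the worked example (sample mean 20 with variance 4, prior mean 10 with variance 2).

```python
def posterior_normal(sample_mean, sample_var, prior_mean, prior_var):
    """Combine a sample mean and a normal prior by precision weighting."""
    w1 = 1.0 / sample_var    # sample precision (variance of the sample mean assumed known)
    w2 = 1.0 / prior_var     # prior precision
    post_var = 1.0 / (w1 + w2)                                    # posterior precision = w1 + w2
    post_mean = (w1 * sample_mean + w2 * prior_mean) / (w1 + w2)  # precision-weighted average
    return post_mean, post_var

print(posterior_normal(20, 4, 10, 2))   # (13.33..., 1.33...)
```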
2.5 Classical Statistical Inference
• Classical statistical inference covers three areas:
• 1. Point estimation.
• 2. Interval estimation.
• 3. Testing of hypotheses.
2.6 Properties of Estimators
• There are some desirable properties of
estimators that are often mentioned in the book.
• These are:
– 1. Unbiasedness.
– 2. Efficiency.
– 3. Consistency.
• The first two are small-sample properties. The
third is a large-sample property.
2.6 Properties of Estimators
Unbiasedness
• An estimator g is said to be unbiased for θ
if E(g) = θ, that is, the mean of the sampling
distribution of g is equal to θ.
• What this says is that if we calculate g for each
sample and repeat this process infinitely many times,
the average of all these estimates will be equal to θ.
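A brief simulation sketch (illustrative values, not from the slides) of this repeated-sampling idea: the sample mean is computed for many independent samples, and the average of those estimates settles at the true parameter.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 7.0, 25, 50_000   # illustrative true mean, sample size, number of samples

# Draw many samples and compute the estimator g (the sample mean) for each one.
samples = rng.normal(loc=theta, scale=3.0, size=(reps, n))
g = samples.mean(axis=1)

print(g.mean())   # ≈ 7.0: averaged over repeated samples, E(g) = theta
```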
2.6 Properties of Estimators
• If Eg    , then g is said to be biased and we
refer to Eg    as the bias.
• Unbiasedness is a desirable property but not at
all costs.
• Suppose that we have two estimator g1 and g2
can assume values far away from  and yet
have its mean equal to  ,whereas g2 always
ranges close to  but has its mean slightly away
from  .
2.6 Properties of Estimators
• Then we might prefer g2 to g1 because it has
smaller variance even though it is biased.
• If the variance of the estimator is large, we can
have some unlucky samples where our estimate is
far from the true value.
• Thus the second property we want our estimators
to have is a small variance.
• One criterion that is often suggested is the mean-
squared error (MSE), which is defined by

  \text{MSE} = (\text{bias})^{2} + \text{variance}
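An illustrative sketch (not from the slides) comparing the sample mean with a biased but lower-variance "shrinkage" alternative by MSE; the shrinkage factor and population values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 2.0, 10, 100_000     # illustrative true mean, sample size, replications

samples = rng.normal(loc=theta, scale=4.0, size=(reps, n))

g1 = samples.mean(axis=1)   # unbiased sample mean
g2 = 0.8 * g1               # biased shrinkage estimator with smaller variance

def mse(est):
    bias = est.mean() - theta
    return bias ** 2 + est.var()

print(mse(g1), mse(g2))     # ≈ 1.60 vs ≈ 1.18: the biased estimator has the lower MSE here
```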
2.6 Properties of Estimators
Efficiency
• The property of efficiency is concerned with the
variance of estimators.
• Obviously, it is a relative concept and we have to
confine ourselves to a particular class.
• If g is an unbiased estimator and it has the minimum
variance in the class of unbiased estimators, g is said
to be an efficient estimator.
• We say that g is an MVUE (a minimum-variance
unbiased estimator).
2.6 Properties of Estimators
• If we confine ourselves to linear estimators,
that is,
  g = c_1 y_1 + c_2 y_2 + \cdots + c_n y_n
• where the c’s are constants which we choose
so that g is unbiased and has minimum
variance, g is called a BLUE (a best linear
unbiased estimator).
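A small sketch (illustrative, not from the slides) of the BLUE idea for i.i.d. observations: any weights that sum to one give an unbiased linear estimator, but equal weights (the sample mean) give the smallest variance; the particular weights and population values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 5, 200_000
y = rng.normal(loc=3.0, scale=2.0, size=(reps, n))   # i.i.d. observations (illustrative)

c_equal = np.full(n, 1 / n)                          # equal weights: the sample mean
c_other = np.array([0.4, 0.3, 0.15, 0.1, 0.05])      # another set of weights summing to 1

g_equal = y @ c_equal
g_other = y @ c_other

print(g_equal.mean(), g_other.mean())   # both ≈ 3.0: both linear estimators are unbiased
print(g_equal.var(), g_other.var())     # equal weights give the smaller variance (BLUE)
```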
2.6 Properties of Estimators
Consistency
• Often it is not possible to find estimators that have
desirable small-sample properties such as
unbiasedness and efficiency.
• In such cases, it is customary to look at desirable
properties in large samples.
• These are called asymptotic properties.
– Three such properties often mentioned are
consistency, asymptotic unbiasedness, and
asymptotic efficiency.
2.6 Properties of Estimators
• Suppose that θ̂_n is the estimator of θ based on a
sample of size n.
• Then the sequence of estimators θ̂_n is called a
consistent sequence if for any arbitrarily small
positive numbers ε and δ there is a sample
size n_0 such that

  \mathrm{Prob}\left[\,\lvert\hat{\theta}_n - \theta\rvert < \varepsilon\,\right] > 1 - \delta \quad \text{for all } n \ge n_0
2.6 Properties of Estimators
• That is, by increasing the sample size n the
estimator θ̂_n can be made to lie arbitrarily close
to the true value of θ with probability arbitrarily
close to 1.
• This statement is also written as

  \lim_{n \to \infty} P\left[\,\lvert\hat{\theta}_n - \theta\rvert < \varepsilon\,\right] = 1

and more briefly we write it as

  \hat{\theta}_n \xrightarrow{p} \theta \qquad \text{or} \qquad \operatorname{plim} \hat{\theta}_n = \theta
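A brief simulation sketch (illustrative values, not from the slides) of this convergence in probability for the sample mean: for a fixed tolerance ε, the fraction of samples whose mean lies within ε of the true value rises toward 1 as n grows.

```python
import numpy as np

rng = np.random.default_rng(8)
theta, eps, reps = 1.0, 0.1, 2_000   # true mean, tolerance, and replication count (illustrative)

for n in (10, 100, 1_000, 10_000):
    estimates = rng.normal(loc=theta, scale=2.0, size=(reps, n)).mean(axis=1)
    coverage = np.mean(np.abs(estimates - theta) < eps)
    print(n, coverage)   # Prob(|theta_hat - theta| < eps) approaches 1 as n increases
```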
2.6 Properties of Estimators
• A sufficient condition for θ̂_n to be consistent is that
the bias and variance should both tend to zero as
the sample size increases.
• This condition is often useful to check in practice,
but it should be noted that the condition is not
necessary.
• An estimator can be consistent even if the bias
does not tend to zero.
• An example: unbiased and consistent estimate