Statistics Lecture 2
Data, Models, Parameters, and Statistics
In this lecture, we will see more datasets and give a brief introduction to some typical
models and setups.
In statistics, our starting point is a collection of data $X_1, X_2, \ldots, X_n$. Each $X_i$ could be a number, a vector, or even a matrix. Our goal is to draw useful information from the data.
Examples:
1. Old Faithful data.
data(faithful)
faithful
eruptions: numeric. Eruption time in minutes.
waiting: numeric. Waiting time to the next eruption (in minutes).
2. ChickWeight data
data(ChickWeight)
ChickWeight
weight: a numeric vector giving the body weight of the chick (gm).
Time: a numeric vector giving the number of days since birth when the measurement
was made.
Chick: an ordered factor with levels 18 < ... < 48 giving a unique identifier for the
chick.
Diet: a factor with levels 1,...,4 indicating which experimental diet the chick received.
3. Longley's Economic Regression Data
data(longley)
longley
This is a macroeconomic data set which provides a well-known example for a highly collinear
regression.
GNP.deflator: GNP implicit price deflator (1954=100)
GNP: Gross National Product.
Unemployed: number of unemployed.
Armed.Forces: number of people in the armed forces.
Population: ‘noninstitutionalized’ population >= 14 years of age.
Year: the year (time).
Employed: number of people employed.
lm(Employed ~ GNP, data = longley)
4. Air passenger data.
data(AirPassengers)
AirPassengers
Assumptions: Once we have a dataset, we need proper assumptions to do statistical inference (estimation, testing, prediction, confidence intervals, etc.). Typical assumptions include:
1. The samples are independent.
2. The samples are identically distributed.
3. Relationship among the coordinates of each sample (linear, for example).
4. The samples follow a particular distribution (normal, exponential, uniform, etc.).
5. ……
We should be careful when applying these assumptions to a dataset.
Parameters: If we assume the samples follow some particular distribution, there will
be parameters for the distribution, generally unknown.
Example: Michelson-Morley Speed of Light Data.
data(morley)
morley
attach(morley)
hist(Speed)
qqnorm(Speed)
The samples of Speed are approximately normal, so it is reasonable to assume Speed follows a $N(\mu, \sigma^2)$ distribution. But the parameters $\mu$ and $\sigma^2$ are unknown. We need to estimate them in some cases.
Basic Models and Goals
1. Estimation.
Observe i.i.d. samples $X_1, X_2, \ldots, X_n$. They follow some distribution with parameter $\theta$. Our goal is to estimate $\theta$, or more generally, a function of $\theta$, $q(\theta)$.
2. Confidence Interval.
We do not need an actual estimate of the parameter. Instead, we want to find an interval $(C_l, C_u)$ that covers the true parameter with high probability (for example, 95%); see the sketch below.
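As a quick sketch (using the morley data from above and assuming the Speed measurements are approximately normal), base R's t.test reports such an interval:
# 95% confidence interval for the mean of Speed
data(morley)
t.test(morley$Speed, conf.level = 0.95)$conf.int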
3. Hypothesis Testing.
We want to get a yes or no answer to some question. For example, $\mu = 0$ or $\mu \neq 0$; $\mu_1 = \mu_2$ or $\mu_1 \neq \mu_2$.
For example, in the ChickWeight data, we want to compare the weights of chicks on different diets, as sketched below.
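A minimal sketch of such a comparison (the choice of day 21 and of diets 1 and 2 is just for illustration):
data(ChickWeight)
# weights at the final measurement time (day 21) for two of the diets
w1 <- ChickWeight$weight[ChickWeight$Time == 21 & ChickWeight$Diet == 1]
w2 <- ChickWeight$weight[ChickWeight$Time == 21 & ChickWeight$Diet == 2]
t.test(w1, w2)   # tests H0: the two diets give equal mean weight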
4. Prediction.
Predict the value of the next observation, for example, in the air passenger data; see the sketch below.
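One possible sketch (using Holt-Winters exponential smoothing from base R; many other time-series models could be used instead):
data(AirPassengers)
fit <- HoltWinters(AirPassengers)   # seasonal exponential smoothing
predict(fit, n.ahead = 12)          # forecast the next 12 months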
5. Linear Regression Model.
We observe paired data $(x_1, y_1), \ldots, (x_n, y_n)$. We assume $x_1, x_2, \ldots, x_n$ are nonrandom and $y_1, y_2, \ldots, y_n$ are realizations of the random variables
$$Y_i = \alpha + \beta x_i + \epsilon_i,$$
where the $\epsilon_i$ are independent random variables with expectation 0 and variance $\sigma^2$. Here $\alpha$, $\beta$, and $\sigma^2$ are unknown parameters. $y = \alpha + \beta x$ is called the regression line. We want to estimate it.
data(trees)
attach(trees)
plot(Volume, Girth)
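A minimal sketch of estimating the line for this plot (treating Volume as $x$ and Girth as $y$, to match the axes above):
fit <- lm(Girth ~ Volume, data = trees)   # least-squares estimates of alpha and beta
abline(fit)   # draw the fitted regression line on the existing plot
coef(fit)     # estimated intercept (alpha) and slope (beta)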
Measurement of Performance
Once we get an answer to a statistical problem, we need to know how good it is. We need to measure the performance of our decision, for example via:
Unbiased estimation.
Mean squared error.
Efficiency.
……
Unbiased Estimator
In this lecture, we will study the estimation problem. Our goal here is to use the
dataset to estimate a quantity of interest. We will focus on the case where the quantity
of interest is a certain function of the parameter of the distribution of samples.
Examples:
1. data(morley)
We want to estimate the speed of light, under the normality assumption.
2. Exponential distribution (lifetime of a machine).
X<-rexp(100,rate=2)
Let us pretend that we do not know the true parameter (which is 2), and estimate it
based on the samples.
An estimate is a value that depends only on the dataset $x_1, x_2, \ldots, x_n$, i.e., the estimate $t$ is a function of the dataset: $t = h(x_1, x_2, \ldots, x_n)$.
One can often think of several estimates for the parameter of interest.
In example 1, we could use the sample mean or the sample median.
In example 2, we could use the reciprocal of the sample mean or $\dfrac{\log 2}{\text{sample median}}$.
Then we need to answer the following questions:
When is one estimate better than another? Does there exist a best estimate?
Since the dataset $x_1, x_2, \ldots, x_n$ is a realization of random variables $X_1, X_2, \ldots, X_n$, the estimate $t$ is a realization of the random variable $T = h(X_1, X_2, \ldots, X_n)$. $T$ is called an estimator.
Example:
# compare the two estimators of the exponential rate over 50 simulated datasets
y <- rep(0, 50)   # estimates from the reciprocal of the sample mean
z <- rep(0, 50)   # estimates from log(2) / sample median
for (i in 1:50) {
  X <- rexp(100, rate = 2)
  y[i] <- 1 / mean(X)
  z[i] <- log(2) / median(X)
}
For each set of samples, we have an estimate. So the estimator
$$T = \frac{100}{X_1 + \cdots + X_{100}}$$
is a random variable. We need to investigate the behavior of the estimators.
hist(y); mean(y); var(y);
hist(z); mean(z); var(z);
The mean squared error of the estimator is defined as $\mathrm{MSE}(T) = E[(T - \theta)^2]$.
mean((y-2)^2)
mean((z-2)^2)
Now we know that an estimator is a random variable. The probability distribution of $T = h(X_1, X_2, \ldots, X_n)$ is also called the sampling distribution of $T$.
Definition: An estimator $T$ is called an unbiased estimator for parameter $\theta$ if $E(T) = \theta$ for all $\theta$. Generally, the difference $E(T) - \theta$ is called the bias of $T$.
Let us consider the normal mean problem. Suppose $X_1, X_2, \ldots, X_n$ follow the $N(\mu, 1)$ distribution and we want to estimate $\mu$. Since $\mu$ is the expectation of the distribution, an intuitive estimator is
$$T = \frac{X_1 + \cdots + X_n}{n} = \bar{X}_n.$$
This is an unbiased estimator.
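A quick simulation sketch of this unbiasedness (the sample size and number of replications are arbitrary choices):
# average many sample means; the result should be close to the true mu
mu <- 3
xbar <- replicate(10000, mean(rnorm(20, mean = mu, sd = 1)))
mean(xbar)   # approximately 3, illustrating E(T) = mu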
Unbiased estimator for expectation and variance
Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. random variables with mean $\mu$ and variance $\sigma^2$. We have the following unbiased estimators for both of them:
$$\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$$
is an unbiased estimator of $\mu$, and
$$S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$
is an unbiased estimator of $\sigma^2$.
Remark: Unbiased estimators do not necessarily exist. Also, unbiasedness does not always carry over: $T$ being an unbiased estimator of $\theta$ does not mean $g(T)$ is an unbiased estimator of $g(\theta)$, unless $g$ is a linear function.
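For instance, a sketch with $g(T) = T^2$: the sample mean $\bar{X}_n$ is unbiased for $\mu$, but $E(\bar{X}_n^2) = \mu^2 + \sigma^2/n \neq \mu^2$:
# here mu = 0, so mu^2 = 0, but E(xbar^2) = sigma^2/n = 1/25
m2 <- replicate(10000, mean(rnorm(25, mean = 0, sd = 1))^2)
mean(m2)   # approximately 0.04, not 0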
Method of Moments
From the previous normal example, we can see that if the parameter of interest is the expectation or variance of the distribution, we can use the sample mean or sample variance to estimate it. These estimators are reasonable.
Suppose we have i.i.d. samples $X_1, X_2, \ldots, X_n$ following some distribution with unknown parameter $\theta$. Now we want to estimate this parameter $\theta$. We first calculate the expectation of the distribution, $E(X)$. Usually, this is a function of $\theta$ (think about the normal or exponential distribution). Suppose $E(X) = g(\theta)$; then under suitable conditions, $\theta$ can be written as $\theta = g^{-1}(E(X))$. Since we can always use the sample mean to estimate the expectation, we have an intuitive estimator for $\theta$:
$$\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n) = g^{-1}(\bar{X}_n).$$
In general, we can calculate the expectation of a function of $X$. Suppose $E(m(X)) = h(\theta)$ for some function $m(x)$. In the previous discussion, $m(x) = x$. Actually, $m(x)$ could be any function, for example $x^2$, $x^3$, $e^x$, …, as long as its expectation is easy to compute. Then an estimator of $\theta$ would be
$$\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n) = h^{-1}\left(\frac{m(X_1) + m(X_2) + \cdots + m(X_n)}{n}\right).$$
This method is called the method of moments. From the Law of Large Numbers, we know that these estimators are consistent: they converge to the true parameter as the sample size grows.
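A sketch for the exponential example above: since $E(X) = 1/\lambda$, we have $g(\lambda) = 1/\lambda$, and the method-of-moments estimator is $\hat{\lambda} = 1/\bar{X}_n$:
# method of moments for the exponential rate
X <- rexp(100, rate = 2)   # true rate is 2
1 / mean(X)                # estimate; should be near 2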