Notes 1 - Wharton Statistics
Statistics 550 Notes 1
Reading: Section 1.1.
I. Basic definitions and examples of models (Section 1.1.1)
Goal of statistics: Draw useful information from data.
Model based approach to statistics: Treat data as the
outcome of a random experiment that we model
mathematically.
Random Experiment: Any procedure that (1) can be
repeated, theoretically, an infinite number of times; and (2)
has a well-defined set of possible outcomes (the sample
space).
The outcome of the experiment is the data X.
Examples of experiments and data:
• Experiment: Randomly select 1000 people without replacement from the U.S. adult population and ask them whether they are employed. Data: X = (X_1, ..., X_1000), where X_i = 1 if person i in the sample is employed and X_i = 0 if person i in the sample is not employed.
• Experiment: Randomly sample 500 handwritten ZIP codes on envelopes from U.S. postal mail. Data: X = (X_1, ..., X_500), where X_i is a 216 x 72 matrix whose elements are numbers from 0 to 255 that represent the intensity of writing in each part of the image.
The probability distribution of the data X over repeated experiments is P.
Frequentist concept of probability:
P(X ∈ E) = proportion of times in repeated experiments that the data X falls in the set E.
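The frequentist definition can be illustrated by simulation. The sketch below is my own illustration, not part of the notes: the experiment is a fair die roll and E is the (hypothetical) event that the roll is 5 or 6, so P(X ∈ E) = 1/3; the proportion of repetitions falling in E approaches 1/3 as the number of repetitions grows.

```python
import random

random.seed(0)

# Experiment: roll a fair six-sided die.  Event E = {roll is 5 or 6},
# so P(X in E) = 1/3.  The proportion of repetitions whose outcome
# falls in E should approach 1/3 as the number of repetitions grows.
def proportion_in_E(num_repetitions):
    hits = sum(1 for _ in range(num_repetitions)
               if random.randint(1, 6) >= 5)
    return hits / num_repetitions

print(proportion_in_E(100))      # rough approximation
print(proportion_in_E(200_000))  # close to 1/3
```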
(Statistical) Model: Family of possible P's: P ∈ 𝒫 = {P_θ : θ ∈ Θ}. The θ's label the P's, and Θ is a space of labels called the parameter space.
Goal of statistical inference: On the basis of the data X ,
make inferences about the true P that generated the data.
We will study three types of inferences:
(1) Point estimation – best estimate of θ.
(2) Hypothesis testing – decide whether θ is in a specified subset of Θ.
(3) Interval (set) estimation – estimate a set in which θ lies.
Goal of this course: Study how to make “good” inferences.
Examples of statistical models:
Example 1: Shaquille O’Neal’s free throw shooting.
The following are the number of free throw attempts and
number of free throws made by Shaquille O’Neal during
each game of the 2000 NBA playoffs:
Game   Made   Attempts
  1      4       5
  2      5      11
  3      5      14
  4      5      12
  5      2       7
  6      7      10
  7      6      14
  8      9      15
  9      4      12
 10      1       4
 11     13      27
 12      5      17
 13      6      12
 14      9       9
 15      7      12
 16      3      10
 17      8      12
 18      1       6
 19     18      39
 20      3      13
 21     10      17
 22      1       6
 23      3      12
Experiment: In a sequence of 23 games, Shaq shoots 5 free
throws in the first game, 11 free throws in the second game,
..., 12 free throws in the 23rd game.
Data: X = (X_{1,1}, ..., X_{1,5}, X_{2,1}, ..., X_{2,11}, ..., X_{23,1}, ..., X_{23,12}).
X_{i,j} = 1 or 0 according to whether Shaq makes his jth free throw in his ith game.
Potential model:
(X_{1,1}, ..., X_{1,5}, X_{2,1}, ..., X_{2,11}, ..., X_{23,1}, ..., X_{23,12}) are independent and identically distributed (iid) Bernoulli random variables with P(X_{i,j} = 1) = p.
θ = p, Θ = [0,1].
Commentators remarked that Shaq’s shooting varied
dramatically from game to game.
Another model:
(X_{1,1}, ..., X_{1,5}, X_{2,1}, ..., X_{2,11}, ..., X_{23,1}, ..., X_{23,12}) are independent.
X_{i,j} is Bernoulli(p_i), i = 1, ..., 23.
θ = (p_1, ..., p_23), Θ = ([0,1])^23.
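The two models can be compared informally against the data. The sketch below is my own illustration, not part of the notes: it computes the maximum-likelihood estimate of the common p under the first model, the per-game estimates p_i under the second, and the likelihood-ratio statistic for H0: p_1 = ... = p_23 (a large value favors the game-to-game-variation model).

```python
from math import log

# (made, attempts) for games 1-23, from the table above
games = [(4, 5), (5, 11), (5, 14), (5, 12), (2, 7), (7, 10), (6, 14),
         (9, 15), (4, 12), (1, 4), (13, 27), (5, 17), (6, 12), (9, 9),
         (7, 12), (3, 10), (8, 12), (1, 6), (18, 39), (3, 13),
         (10, 17), (1, 6), (3, 12)]

total_made = sum(x for x, n in games)
total_attempts = sum(n for x, n in games)

# Model 1 MLE: a single common success probability p
p_hat = total_made / total_attempts

# Model 2 MLEs: a separate p_i for each game
p_i_hat = [x / n for x, n in games]

def loglik(x, n, p):
    """Binomial log-likelihood of x successes in n trials,
    with the convention 0*log(0) = 0."""
    out = 0.0
    if x > 0:
        out += x * log(p)
    if n - x > 0:
        out += (n - x) * log(1 - p)
    return out

# Likelihood-ratio statistic for H0: p_1 = ... = p_23 (df = 22)
G = 2 * (sum(loglik(x, n, x / n) for x, n in games)
         - sum(loglik(x, n, p_hat) for x, n in games))
print(p_hat, G)
```

G could then be compared to a chi-squared distribution with 22 degrees of freedom; this test is only a sketch of how one might probe the commentators' claim, not a procedure from the notes.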
Choosing models:
Consultation with subject matter experts and knowledge
about how the data are collected are important for selecting
a reasonable model.
George Box (1979): “Models, of course, are never true but
fortunately it is only necessary that they be useful.”
We will focus mostly on making inferences about the true P conditional on the model’s validity, i.e., P ∈ 𝒫 = {P_θ : θ ∈ Θ}, but another important step in data analysis is to investigate the model’s validity through diagnostics (techniques for doing this will be discussed in Chapter 4).
II. Parameterization and Parameters (Section 1.1.2)
Model: P ∈ 𝒫 = {P_θ : θ ∈ Θ}. The vector θ is a way of labeling the distributions in the model.
Parameterization: Formally, an onto map θ → P_θ from a parameter space Θ to 𝒫 is called a parameterization of 𝒫. The parameterization is a way of labeling the distributions in the model.
The parameterization is not unique. For example, in Example 1, Model 1: instead of using the parameterization θ = p, Θ = [0,1], we can use the parameterization θ = 10p, Θ = [0,10] to label the distributions in the model.
We try to choose a parameterization whose components are interpretable in terms of the phenomenon we are trying to measure.
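As a toy check of the point above (my own illustration, not in the notes): the labels θ = p and θ = 10p name exactly the same family of Bernoulli distributions; only the labeling differs.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p), with x in {0, 1}."""
    return p if x == 1 else 1 - p

# Alternative parameterization: eta = 10p with label space [0, 10].
# The distribution labeled eta is Bernoulli(eta / 10).
def bernoulli_pmf_eta(x, eta):
    return bernoulli_pmf(x, eta / 10)

# Label 0.25 under the first parameterization and label 2.5 under the
# second name the same distribution:
print(bernoulli_pmf(1, 0.25), bernoulli_pmf_eta(1, 2.5))
```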
Example 3: Sal is a pizza inspector for the city health
department. Recently, he has received a number of
complaints directed against a certain pizzeria for allegedly
failing to comply with their advertisements. The pizzeria
claims that on the average, each of their large pepperoni
pizzas is topped with 2 ounces of pepperoni. The
dissatisfied customers feel that the actual average amount
of pepperoni used is considerably less than that. To settle
the matter, Sal takes a random sample of 10 pizzas. The
data is ( X 1 , , X10 ) , the amount of pepperoni on each of
the ten pizzas.
Sal assumes the model is
(X_1, ..., X_10) iid N(μ, σ²) (where μ, σ² are the mean and variance of the normal distribution, respectively).
Two possible parameterizations are (μ, σ²) and (μ, μ² + σ²).
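The second parameterization labels each distribution by its first two moments, (E X_i, E X_i²) = (μ, μ² + σ²). A small sketch (my own illustration) confirming that the two labelings are in one-to-one correspondence:

```python
def to_moments(mu, sigma2):
    """(mu, sigma^2) -> (E X, E X^2) for X ~ N(mu, sigma^2)."""
    return mu, mu**2 + sigma2

def from_moments(m1, m2):
    """(E X, E X^2) -> (mu, sigma^2); the inverse of to_moments."""
    return m1, m2 - m1**2

mu, sigma2 = 2.0, 0.25           # e.g. a mean of 2 oz of pepperoni
m1, m2 = to_moments(mu, sigma2)  # (2.0, 4.25)
print(from_moments(m1, m2))      # recovers (2.0, 0.25)
```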
Parametric vs. Nonparametric models: Models in which Θ is a nice subset of a finite-dimensional Euclidean space are called “parametric” models; e.g., the model in Example 3 is parametric. Models in which Θ is infinite-dimensional are called “nonparametric.” For example, if in Example 3 we considered (X_1, ..., X_10) iid from any distribution with a density, the model would be nonparametric.
Identifiability: The parameterization is identifiable if the map θ → P_θ is one-to-one, i.e., if θ_1 ≠ θ_2 implies P_{θ_1} ≠ P_{θ_2}.
The parameterization is unidentifiable if there exist θ_1 ≠ θ_2 such that P_{θ_1} = P_{θ_2}.
When the parameterization is unidentifiable, parts of θ remain unknowable even with “infinite amounts of data,” i.e., even if we knew the true P.
Example 4: Suppose X_1, ..., X_n are iid Exponential with mean θ, i.e.,
f(x_1, ..., x_n) = (1/θ^n) exp(−Σ_{i=1}^n x_i / θ).
The parameterization θ is identifiable. The parameterization (θ_1, θ_2) with θ_1 θ_2 = θ is unidentifiable because P_{(θ_1, θ_2)} = P_{(θ_1 θ_2, 1)}.
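Numerically (a sketch of my own, not from the notes): the labels (θ_1, θ_2) = (2, 3) and (6, 1) both correspond to mean θ = 6 and therefore give identical joint densities, so no amount of data can distinguish them.

```python
from math import exp

def joint_density(xs, theta):
    """Joint density of iid Exponential observations with mean theta."""
    n = len(xs)
    return (1 / theta**n) * exp(-sum(xs) / theta)

def joint_density_pair(xs, theta1, theta2):
    """The same model labeled by (theta1, theta2) with theta = theta1*theta2."""
    return joint_density(xs, theta1 * theta2)

xs = [0.5, 1.2, 3.0]
# (2, 3) and (6, 1) are different labels for the same distribution:
print(joint_density_pair(xs, 2, 3), joint_density_pair(xs, 6, 1))
```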
Parameter: A parameter is a feature ν(P) of P, i.e., a map from 𝒫 to another space N.
E.g., for Example 3, (X_1, ..., X_10) iid N(μ, σ²):
μ, the mean of each X_i, is a parameter.
σ², the variance of each X_i, is a parameter.
μ² + σ² = E(X_i²) is a parameter.
Some parameters are of interest and others are nuisance
parameters that are not of central interest.
In Example 3, for the parameterization (μ, σ²), the parameter μ is the parameter of interest and the parameter σ² is a nuisance parameter. The pizzeria’s claim concerns the average amount of pepperoni.
A parameter is by definition “identified,” meaning that if
we knew the true P , we would know the parameter.
For a given parameterization θ → P_θ, θ is a parameter if and only if the parameterization is identifiable.
Proof: If the parameterization is identifiable, then θ is equal to the inverse of the parameterization, which maps P_θ → θ. If the parameterization is not identifiable, then for some θ_1 ≠ θ_2 we have P_{θ_1} = P_{θ_2}, and consequently we can’t write θ = ν(P) for any function ν.
Remark: Even if the parameterization is unidentifiable,
components of the parameterization may be identifiable
(i.e., parameters).
Why would we ever want to consider an unidentifiable
parameterization?
Components of the parameterization may capture the scientific features of interest, and we may be interested in whether those particular components are identifiable.
Example 5: Suppose X_1, ..., X_n are iid from density
f(x) = λg(x) + (1 − λ)h(x), 0 ≤ λ ≤ 1,
where g and h are densities (nothing further assumed about g and h). A setting in which this model arises is one in which we take a health measurement X_i on individuals, some of whom have a disease and some of whom are healthy, but we do not have a way of diagnosing which individuals have the disease. λ represents the proportion of individuals with the disease, and g, h represent the densities for diseased and healthy individuals, respectively. An example of this setting is the study of malaria, where X_i is the parasite level of an individual.
The parameterization (g, h, λ) is unidentifiable:
Suppose f(x) = λg(x) + (1 − λ)h(x) for some λ < 1. Then f(x) = λ*g*(x) + (1 − λ*)h*(x), where λ* = 1, g*(x) = λg(x) + (1 − λ)h(x), and h*(x) can be any density.
However, certain features of the model are identified (i.e., are parameters). For example, the mean of the observations, E(X) = ∫ x f(x) dx, is identified (i.e., is a parameter).
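A numerical sketch of the unidentifiability argument (my own illustration; the normal densities for g and h are hypothetical choices, since the model assumes nothing about them): the labels (λ, g, h) and (λ*, g*, h*) = (1, λg + (1 − λ)h, anything) produce the same mixture f.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

lam = 0.3
g = lambda x: normal_pdf(x, 0.0, 1.0)   # illustrative "diseased" density
h = lambda x: normal_pdf(x, 2.0, 1.0)   # illustrative "healthy" density

f = lambda x: lam * g(x) + (1 - lam) * h(x)

# Alternative labels: lambda* = 1, g* = the mixture itself, h* arbitrary
lam_star = 1.0
g_star = f
h_star = lambda x: normal_pdf(x, -5.0, 2.0)   # any density works here

f_star = lambda x: lam_star * g_star(x) + (1 - lam_star) * h_star(x)

# The two labelings give the same density at every point:
print(f(0.5), f_star(0.5))
```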
III. Statistics
A statistic Y = T(X) is a random variable or random vector that is a function of the data.
Example 3 continued: (X_1, ..., X_n) iid N(μ, σ²). Two statistics are the sample mean
X̄ = Σ_{i=1}^n X_i / n
and the sample variance
s² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)².
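For the pizza example, both statistics can be computed with Python's standard library; note that statistics.variance uses the n − 1 denominator, matching s² above. The pepperoni weights below are made up for illustration.

```python
from statistics import mean, variance

# Hypothetical pepperoni weights (ounces) for Sal's 10 sampled pizzas
x = [1.9, 2.1, 1.8, 1.7, 2.0, 1.6, 1.9, 1.8, 2.2, 1.7]

x_bar = mean(x)       # sample mean, X-bar
s2 = variance(x)      # sample variance s^2 (divides by n - 1)
print(x_bar, s2)
```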
Section 1.1.4: Examples, Regression Models. This section provides an example of one of the most important models in statistics.