Simple random sampling

Peter McCullagh∗

July 2007

1 Simple random sampling

1.1 Background and terminology

Samples and sample values
Let N be a positive integer, let [N ] = {1, . . . , N } be a set containing N
elements, and let Y : [N ] → R be a given function. In the present context,
[N ] is called the population, the elements of [N ] are called statistical units
or sampling units, and Y(1), . . . , Y(N) is the list of population values. The
value Y(i) ≡ Yi on unit i is assumed at present to be a real number; in
practice it may be vector-valued.
Let 0 ≤ n ≤ N be a non-negative integer. A one-to-one function ϕ: [n] →
[N] is called a sample. One-to-one means that ϕ(u) = ϕ(u′) implies u =
u′, so the sample contains no duplicate units. The sample is an ordered
list ϕ(1), . . . , ϕ(n) consisting of n distinct units in the population. The
composition Yϕ: [n] → R is the list of sample values Y(ϕ(1)), . . . , Y(ϕ(n)).
The sampled units are distinct, but the sample values need not be distinct.
It is important to distinguish between the population elements, each of
which is an identifying label such as u = Mr G. Bush, and the value of Y
associated with that unit. For example, if Y = weight in kg, the value Y (u)
is a real number such as 85.1. Note that the units of measurement (kg)
are part of the definition of the variable Y, which is contrary to normal
practice in physics and engineering. In everyday speech we often say that
Y = weight, in which case Y(u) = 85.1 kg is not a real number.
If σ: [n] → [n] is a permutation of the sample labels, the composition
ϕσ: [n] → [N] is a sample of size n whose elements ϕ(σ(1)), . . . , ϕ(σ(n))
coincide with those of ϕ, but in a different order. Two samples ϕ, ϕ′
∗ Support for this research was provided by NSF Grant DMS-0305009.
consisting of the same units in different orders are distinct and are not
regarded as equivalent. Accordingly, the number of distinct samples [n] →
[N] is given by the descending factorial function

N↓n = N(N − 1) · · · (N − n + 1) = (N choose n) n!.

A simple random sample is a random element ϕ having the uniform distribution on the set of injective maps [n] → [N]. As always, the sample values
Yϕ: [n] → R are obtained by composition with Y.
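These definitions are easy to make concrete. The following sketch (in Python; the variable names are mine, for illustration only) draws a simple random sample ϕ as a uniformly distributed injective map and counts all such maps:

```python
import itertools
import random

# Population [N] = {1, ..., N} with a value Y(i) attached to each unit.
N = 6
Y = {i: float(i) ** 2 for i in range(1, N + 1)}  # any fixed Y: [N] -> R

# A sample of size n is an injective (one-to-one) map phi: [n] -> [N];
# a simple random sample is uniform over all such maps.
n = 3
phi = random.sample(range(1, N + 1), n)        # n distinct units, in order
sample_values = [Y[u] for u in phi]            # the composition Y o phi

# The number of distinct samples is the descending factorial N(N-1)...(N-n+1).
num_samples = sum(1 for _ in itertools.permutations(range(1, N + 1), n))
print(num_samples)  # 6 * 5 * 4 = 120
```

Since `random.sample` reports the selected units in selection order, two draws containing the same units in different orders are distinct samples, exactly as in the definition above.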
Symmetric functions

This section is concerned with properties of a function T(x1, . . . , xn) under
permutation of its arguments. We say that T is a symmetric function if

T(xσ(1), . . . , xσ(n)) = T(x1, . . . , xn)

for each permutation σ: [n] → [n]. It is immediately evident that if T, T′
are two real-valued symmetric functions, each linear combination is also
symmetric, as is the product TT′ and the ratio T/T′ (provided that T′ ≠ 0).
A symmetric function need not be real-valued, but we focus initially on
real-valued functions on account of their simplicity. Examples of symmetric
functions:
∑ xj,   ∑ xj²,   x̄n,   s²n,   min(x1, . . . , xn),   max(x1, . . . , xn),   ∑ (xj − x̄n)²,   med(x1, . . . , xn).

Examples of non-symmetric functions:

x1,   xn,   x1 + 2x2 + · · · + nxn.

Note that the sample variance s²n is defined only for n ≥ 2. The emphasis
in this section is on homogeneous polynomial symmetric functions.
Statistic

For present purposes, a statistic is a symmetric function of the sample values,
defined for each sample ϕ: [n] → [N] with n ≥ d sufficiently large. That is
to say, a statistic is a sequence T = (Tn)n≥d in which each component Tn
is a symmetric function R^n → R. For example, the sample mean and the
total sum of squares

x̄n = (x1 + · · · + xn)/n,   S²n = ∑ (xi − x̄n)²

are both defined for all n ≥ 1. However, the sample variance s²n = S²n/(n − 1)
is defined only for n ≥ 2, while the sample skewness

k3,n(x) = n ∑ (xi − x̄n)³ / ((n − 1)(n − 2))

is defined only for n ≥ 3.
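Viewed as sequences of functions, these statistics can be sketched as follows (an illustrative implementation; the helper names are mine). Each function below works for every sample size n above its threshold d, which is the point of defining a statistic as a sequence:

```python
def xbar(x):                         # sample mean, defined for n >= 1
    return sum(x) / len(x)

def total_ss(x):                     # S_n^2 = sum (x_i - xbar_n)^2, n >= 1
    xb = xbar(x)
    return sum((xi - xb) ** 2 for xi in x)

def s2(x):                           # sample variance s_n^2, n >= 2
    return total_ss(x) / (len(x) - 1)

def k3(x):                           # sample skewness k_{3,n}, n >= 3
    n, xb = len(x), xbar(x)
    return n * sum((xi - xb) ** 3 for xi in x) / ((n - 1) * (n - 2))

x = [1.0, 2.0, 4.0]
print(xbar(x), s2(x), k3(x))   # xbar = s2 = 7/3 and k3 = 10/3 for this x
```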
Ordinarily, a statistic is defined as an ordinary function Rn → R for
some fixed n, but here we insist on a sequence of functions. The rationale
is that the most important properties of a statistic such as the sample variance are not properties of the individual functions, but properties of the
sequence s²2, s²3, . . .. By defining a statistic as a sequence of functions, we
are forced to declare, at least implicitly, a relation between the functions Tn
and Tn+1 . Implicitly, we are saying that if we had one additional data value
we would compute Tn+1 (x1 , . . . , xn+1 ) rather than Tn (x1 , . . . , xn ), and if the
entire population were available we would compute TN (x1 , . . . , xN ). It is
therefore natural to require that the sequence of functions be related to one
another. In 1950, Tukey proposed a mathematical definition of the concept
of a natural statistic, related to simple random sampling, as follows.
Let T be a statistic with components Tn : Rn → R defined for each
n ≥ d. Consider a population of size N with values x1 , . . . , xN , and a
sample ϕ of size n with values xϕ(1) , . . . , xϕ(n) . The statistic T associates
with each sample ϕ a number Tn (xϕ), and with the population another
number TN(x). The sequence T is said to be inherited on the average if
for each n ≤ N the average sample value Tn(xϕ) is equal to the population
value TN(x). This average is taken over simple random samples ϕ: [n] → [N]
for fixed population values x: [N] → R. In other words, for each x: [N] → R,

E(Tn(xϕ) | x) = TN(x),

where the expectation is taken with respect to ϕ, uniformly distributed on the set of N↓n injective maps [n] → [N].
It is worth remarking at this point that Tukey’s definition, which he
termed ‘inheritance on the average’, looks very much like the definition
of unbiasedness in parametric statistical models. However, unbiasedness
in parametric models is a property of individual functions Tn (x), whereas
inheritance is a property of the sequence. This is a subtle but fundamental
distinction. By contrast, consistency and rates of convergence are both
properties of a sequence. However, they depend only on the tail sequence
so there are no direct logical implications for finite samples.
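For a small population the defining average can be checked exhaustively rather than in expectation: enumerate all N↓n injective maps ϕ and compare the average of Tn(xϕ) with TN(x). A sketch of this check for the sample mean (population values and names are my own choices):

```python
import itertools

# Exhaustive check of inheritance on the average: average T_n(x phi)
# over ALL N-down-n injective maps phi: [n] -> [N] and compare with the
# population value T_N(x).  Here T is the mean.
x = [2.0, 3.0, 5.0, 7.0, 11.0]        # fixed population values, x: [N] -> R
N, n = len(x), 3

samples = list(itertools.permutations(range(N), n))
avg_sample_mean = sum(sum(x[i] for i in phi) / n
                      for phi in samples) / len(samples)

print(len(samples))                               # 5 * 4 * 3 = 60 samples
print(abs(avg_sample_mean - sum(x) / N) < 1e-12)  # True: the mean is inherited
```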
Since each element of [N] occurs in the image of ϕ with probability n/N,
the sample total has the property that

aveϕ(xϕ(1) + · · · + xϕ(n)) = (n/N)(x1 + · · · + xN).

Thus, while the totals are symmetric, they do not have the inheritance
property. However, the sample average is inherited. The sample variance can
be written in the form
2s²n = (1/n↓2) ∑≠ (xi − xj)²

with summation over pairs of distinct units, which makes it clear that this statistic is
also inherited under simple random sampling. Likewise, the sample fraction
of identical pairs

(1/n↓2) ∑≠ I(xi = xj)

and the mean absolute deviation defined as

(1/n↓2) ∑≠ |xi − xj|

are both inherited for n ≥ 2. Other examples include the sample skewness
k3,n(x) and

k11,n = x̄²n − s²n/n.
In addition, if T, T′ are inherited statistics, each linear combination αT + α′T′
is also inherited. In this context, T, T′ are both statistics, i.e. each is a sequence
of symmetric functions, so the coefficients α, α′ are scalars independent of n.
Inheritance on the average is a very demanding property, satisfied only
by very special sequences such as U-statistics. In practice, we often
work with statistics that are only approximately inherited. For example,
neither the sample median nor the inter-quartile range is inherited.
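Both claims, the pairwise form of the variance and the contrast between inherited and non-inherited statistics, can be verified by enumeration. A sketch (helper names and the test population are my own choices):

```python
import itertools

# Checks: (i) the pairwise identity 2 s_n^2 = (1/n-down-2) sum_{i!=j}(x_i-x_j)^2,
# (ii) exact inheritance of s_n^2, (iii) failure of inheritance for the median.
def s2(v):
    vb = sum(v) / len(v)
    return sum((vi - vb) ** 2 for vi in v) / (len(v) - 1)

def s2_pairwise(v):
    n = len(v)
    tot = sum((v[i] - v[j]) ** 2
              for i in range(n) for j in range(n) if i != j)
    return tot / (2 * n * (n - 1))              # divide by 2 * n-down-2

def med(v):                                     # median, odd n only
    return sorted(v)[len(v) // 2]

x = [1.0, 2.0, 3.0, 7.0, 100.0]
N, n = len(x), 3
subs = [[x[i] for i in phi] for phi in itertools.permutations(range(N), n)]

print(abs(s2(x) - s2_pairwise(x)) < 1e-12)                 # identity holds
print(abs(sum(map(s2, subs)) / len(subs) - s2(x)) < 1e-9)  # s^2 is inherited
print(sum(map(med, subs)) / len(subs), med(x))             # 3.9 vs 3.0: not inherited
```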
1.2 k-statistics

Sample means

For any x ∈ R^n define the following power averages:

mr,n(x) = (1/n) ∑i xi^r,
mrs,n(x) = (1/n↓2) ∑≠ij xi^r xj^s,
mrst,n(x) = (1/n↓3) ∑≠ijk xi^r xj^s xk^t, . . .

with the marked sums running over distinct indices. The sequence of functions mr = (mr,n)n≥1 is a statistic defined for n ≥ 1.
Likewise, mrs = (mrs,n)n≥2 is a sequence of symmetric functions and thus
a statistic in the same sense. Ordinarily, we suppress the index n and write
mrs (x) instead of mrs,n (x), the value of n being inferred from the argument
x ∈ R^n. It is evident from their construction as U-statistics that each
of these statistics has the inheritance property. Consequently each linear
combination also has the inheritance property. The combinations that have
proved to be most useful and natural for statistical purposes are certain
homogeneous polynomials called k-statistics and polykays defined as follows:
k1(x) = m1(x),
k2(x) = m2(x) − m11(x),
k3 = m3 − 3m21 + 2m111,
k4 = m4 − 4m31 − 3m22 + 12m211 − 6m1111,
k11(x) = m11(x),
k111 = m111,
k1111 = m1111,
k21 = m21 − m111,
k211 = m211 − m1111,
k22 = m22 − 2m211 + m1111,
k31 = m31 − 3m211 + 2m1111.
In general, kr,n(x) is a polynomial of degree r in x,

kr,n(x) = ∑ φ(i1, . . . , ir) xi1 xi2 · · · xir,

with coefficients φ(i1, . . . , ir) = (−1)^(ν−1) (ν − 1)!/n↓ν, where ν is the number of distinct
values in i1, . . . , ir. The single-index ks, called k-statistics, are due to Fisher
(1929); the multi-index ks, called polykays, are due to Tukey (1950, 1956).
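A direct, if inefficient, way to compute these quantities is from the defining power averages; the sketch below (function names are mine) confirms that k2 reproduces the sample variance and that k3 matches the skewness formula given earlier:

```python
import itertools
import math

# Power average m_{r s ...,n}: m(v, [r, s]) averages v_i^r v_j^s over
# ordered pairs of DISTINCT indices, and similarly for longer index lists.
def m(v, rs):
    idx = list(itertools.permutations(range(len(v)), len(rs)))
    return sum(math.prod(v[i] ** r for i, r in zip(t, rs))
               for t in idx) / len(idx)

def k2(v):                           # k_2 = m_2 - m_11
    return m(v, [2]) - m(v, [1, 1])

def k3(v):                           # k_3 = m_3 - 3 m_21 + 2 m_111
    return m(v, [3]) - 3 * m(v, [2, 1]) + 2 * m(v, [1, 1, 1])

x = [1.0, 2.0, 4.0]
vb = sum(x) / len(x)
s2 = sum((xi - vb) ** 2 for xi in x) / (len(x) - 1)
print(abs(k2(x) - s2) < 1e-12)       # k_2 is the sample variance
print(abs(k3(x) - 10 / 3) < 1e-12)   # matches n sum(x_i - xbar)^3/((n-1)(n-2))
```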
The following rationale helps to explain the coefficients that occur in the
definition of the k-statistics. Suppose that the components of x are independent and identically distributed random variables with distribution F.
Then the expected value of the power average is E(mr(x)) = µr, the rth
moment of F. Likewise, E(mrs(x)) = µr µs is the product of two moments, E(mrst(x)) = µr µs µt is the product of three moments, and so
on. For the k-statistics of degree two, E(k11(x)) = µ1² = κ1² and E(k2(x)) =
µ2 − µ1² = κ2, a cumulant or product of cumulants of F. For the k-statistics
of degree three, E(k111(x)) = κ1³, E(k21(x)) = µ2µ1 − µ1³ = κ1κ2, and
E(k3(x)) = µ3 − 3µ2µ1 + 2µ1³ = κ3. For the k-statistics of degree four,
E(k22(x)) = µ2² − 2µ2µ1² + µ1⁴ = κ2², and E(k4(x)) = κ4. These expectations are for a fixed sample with values that are independent and identically
distributed. This is not to be confused with the expectation for a simple
random sample ϕ: [n] → [N] with fixed x: [N] → R.
Multiplication tables

The polynomial k1,n(x) is homogeneous symmetric of degree one in x, while
k11,n(x) and k2,n(x) are homogeneous of degree two. The square k1,n²(x)
is homogeneous of degree two, while the products k1,n k11,n and k1,n k2,n are
homogeneous of degree three. In order to compute variances and covariances,
it is helpful to have a multiplication table that expresses each product as
a linear combination of k-statistics and polykays. This is a multiplication
table for functions, so the coefficients in the linear combination depend on n.
Two explicit calculations may help to illustrate what is involved in the
construction of such a multiplication table.
k1,n²(x) = (x1 + · · · + xn)²/n²
         = ∑≠ xi xj/n² + ∑ xi²/n²
         = n↓2 m11,n(x)/n² + m2,n(x)/n
         = k11,n(x)(n − 1)/n + k2,n(x)/n + k11,n(x)/n
         = k11,n(x) + k2,n(x)/n
k2,n²(x) = (∑ φij xi xj)²
         = ∑≠ φij φkl xi xj xk xl + ∑≠ (4φij φik + 2φii φjk) xi² xj xk
           + ∑≠ (2φij φij + φii φjj) xi² xj² + ∑≠ 4φii φij xi³ xj + ∑ φii² xi⁴
         = n↓4 m1111/(n↓2)² + n↓3 m211 (4/(n↓2)² − 2/(n²(n − 1))) + n↓2 m22 (2/(n↓2)² + 1/n²) − 4m31/n + m4/n
         = k22,n(x)(n + 1)/(n − 1) + k4,n(x)/n,

where φii = 1/n and φij = −1/n↓2 for i ≠ j are the coefficients in k2,n(x), and the sums marked ∑≠ run over distinct indices.
Some additional multiplication formulae are as follows:

k1² = k11 + k2/n
k1³ = k111 + 3k21/n + k3/n²
k1⁴ = k1111 + 6k211/n + 3k22/n² + 4k31/n² + k4/n³
k1 k2 = k21 + k3/n
k1² k2 = k211 + k22/n + 2k31/n + k4/n²
k1 k3 = k31 + k4/n
k2² = k22(n + 1)/(n − 1) + k4/n
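Because each row is a polynomial identity in x for each fixed n, it can be spot-checked numerically at any single x. A sketch checking two rows (helper names are mine; the power-average routine follows the definitions above):

```python
import itertools
import math

# Power average m_{r s ...,n} over ordered tuples of distinct indices.
def m(v, rs):
    idx = list(itertools.permutations(range(len(v)), len(rs)))
    return sum(math.prod(v[i] ** r for i, r in zip(t, rs))
               for t in idx) / len(idx)

x = [1.0, 2.0, 4.0]
n = len(x)
k1, k11 = m(x, [1]), m(x, [1, 1])
k2 = m(x, [2]) - m(x, [1, 1])
k21 = m(x, [2, 1]) - m(x, [1, 1, 1])
k3 = m(x, [3]) - 3 * m(x, [2, 1]) + 2 * m(x, [1, 1, 1])

print(abs(k1 ** 2 - (k11 + k2 / n)) < 1e-12)   # row k1^2 = k11 + k2/n
print(abs(k1 * k2 - (k21 + k3 / n)) < 1e-12)   # row k1 k2 = k21 + k3/n
```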
Variances

Let x: [N] → R be given and let ϕ: [n] → [N] be a simple random sample. The sample average k1,n(xϕ) is a random variable whose mean is
E(k1,n(xϕ) | x) = k1,N(x), the population mean. The variance is obtained
by using the multiplication table twice as follows:

var(k1,n(xϕ)) = E(k1,n²(xϕ)) − k1,N²(x)
             = E(k11,n(xϕ) + k2,n(xϕ)/n) − k11,N(x) − k2,N(x)/N
             = k11,N(x) + k2,N(x)/n − k11,N(x) − k2,N(x)/N
             = k2,N(x)(1/n − 1/N) = n⁻¹ k2,N(x)(1 − n/N).

In the limit as N → ∞, the variance is κ2/n, where κ2 is the assumed limit
of k2,N(x). The factor 1 − n/N is called the finite-population correction.
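For a small population, this variance formula can be confirmed exactly by replacing the expectation with an enumeration over all injective maps (population values and names are my own choices):

```python
import itertools

# Exhaustive check of var(k_{1,n}(x phi)) = k_{2,N}(x) (1/n - 1/N).
x = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]
N, n = len(x), 4

def k2N(v):                                   # population variance k_{2,N}
    vb = sum(v) / len(v)
    return sum((vi - vb) ** 2 for vi in v) / (len(v) - 1)

means = [sum(x[i] for i in phi) / n
         for phi in itertools.permutations(range(N), n)]
mu = sum(means) / len(means)
var = sum((mm - mu) ** 2 for mm in means) / len(means)

print(abs(var - k2N(x) * (1 / n - 1 / N)) < 1e-9)   # True
```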
The sample variance k2,n(xϕ) is a random variable whose mean under
simple random sampling

E(k2,n(xϕ) | x) = k2,N(x)

coincides with the population variance, a consequence of inheritance. The
variance is obtained by using the multiplication table twice and using the
inheritance property as follows:

var(k2,n(xϕ)) = E(k2,n²(xϕ)) − k2,N²(x)
             = E(k22,n(xϕ)(n + 1)/(n − 1) + k4,n(xϕ)/n) − k22,N(x)(N + 1)/(N − 1) − k4,N(x)/N
             = k22,N(x)(2/(n − 1) − 2/(N − 1)) + k4,N(x)(1/n − 1/N).

In the limit as N → ∞, the variance is 2κ2²/(n − 1) + κ4/n.
In a similar manner we find that

cov(k1,n(xϕ), k2,n(xϕ) | x) = k3,N(x)(1/n − 1/N),

while the third-order cumulant of k1,n(xϕ) is

cum3(k1,n(xϕ) | x) = k3,N(x)(1/n − 1/N)(1/n − 2/N).
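The covariance formula follows from the rows k1k2 = k21 + k3/n at sample size n and at N, together with inheritance of k21, and it too can be confirmed by enumeration (names are mine):

```python
import itertools

# Exhaustive check of cov(k_{1,n}, k_{2,n} | x) = k_{3,N}(x)(1/n - 1/N).
def k1(v):
    return sum(v) / len(v)

def k2(v):
    vb = k1(v)
    return sum((vi - vb) ** 2 for vi in v) / (len(v) - 1)

def k3(v):
    n, vb = len(v), k1(v)
    return n * sum((vi - vb) ** 3 for vi in v) / ((n - 1) * (n - 2))

x = [1.0, 2.0, 4.0, 8.0, 16.0]
N, n = len(x), 3
stats = [(k1(s), k2(s)) for s in
         ([x[i] for i in phi]
          for phi in itertools.permutations(range(N), n))]
e1 = sum(a for a, _ in stats) / len(stats)
e2 = sum(b for _, b in stats) / len(stats)
cov = sum((a - e1) * (b - e2) for a, b in stats) / len(stats)

print(abs(cov - k3(x) * (1 / n - 1 / N)) < 1e-9)   # True
```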
References:

Dressel, P.L. (1940) Statistical seminvariants and their estimates with particular emphasis on their relation to algebraic seminvariants. Ann. Math. Statist. 11, 33–57.

Dwyer, P.S. and Tracy, D.S. (1964) A combinatorial method for the product of two polykays with some general formulae. Ann. Math. Statist. 35, 1174–1185.

Fisher, R.A. (1929) Moments and product moments of sampling distributions. Proc. Lond. Math. Soc., Series 2, 30, 199–238.

Thiele, T.N. (1897) Elementær Iagttagelseslære. Reprinted in English as The Theory of Observations, Ann. Math. Statist. (1931) 2, 165–308.

Tracy, D.S. (1968) Some rules for a combinatorial method for multiple products of generalized k-statistics. Ann. Math. Statist. 39, 983–998.

Tukey, J.W. (1950) Some sampling simplified. J. Amer. Statist. Assoc. 45, 501–519.

Tukey, J.W. (1956) Variances of variance components: I. Balanced designs. Ann. Math. Statist. 27, 362–377.