Chapter 7
Asymptotic Results
This chapter is devoted to asymptotic results. Firstly, consistency is discussed: the problem of when a sequence of estimators $(\hat{q}_n)_{n\geq 1}$ converges in probability to $q(\theta)$, the quantity to be estimated. Then techniques based on the Central Limit Theorem are discussed, giving conditions under which the maximum likelihood estimators of the canonical parameters of an exponential family are asymptotically normal.
7.1 Consistency
Let $\theta$ be an unknown parameter from a parameter space $\Theta$. Let $(\hat{q}_n)_{n\geq 1}$ be a sequence of estimators of $q(\theta)$, where $q : \Theta \to \mathbb{R}^p$.
Definition 7.1. The sequence $(\hat{q}_n)_{n\geq 1}$ is consistent if for all $\theta \in \Theta$ and $\epsilon > 0$,
$$P_\theta\big(|\hat{q}_n - q(\theta)| \geq \epsilon\big) \xrightarrow{n\to\infty} 0, \qquad (7.1)$$
where $|\cdot|$ denotes the Euclidean norm. It is said to be uniformly consistent over $K \subset \Theta$ (or simply uniformly consistent if $K = \Theta$) if
$$\sup_{\theta\in K} P_\theta\big(|\hat{q}_n - q(\theta)| \geq \epsilon\big) \xrightarrow{n\to\infty} 0. \qquad (7.2)$$
7.1.1 The Weak Law of Large Numbers
The simplest example of consistency is convergence of the sample average to the population average.
Theorem 7.2. Let $X_1, \ldots, X_n$ be i.i.d. with distribution $P$. Suppose that $E[|X_1|] < +\infty$. Then $\bar{X} \to_P E[X_1] =: \mu$.
Sketch of Proof Only a sketch of the proof is given, since the result is treated fully in Probability 2. Let $\varphi_X(t) = E\big[e^{itX}\big]$ denote the characteristic function of the random variable $X$. Since $|\varphi_X(t)| \leq 1$ for all $t \in \mathbb{R}$, Taylor's expansion theorem may be applied to give:
$$\varphi_X(t) = E[1 + itX + o(t)] = 1 + it\mu + o(t).$$
It follows that, for $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$,
$$\varphi_{\bar{X}}(t) = \prod_{j=1}^{n}\varphi_{X_j/n}(t) = \varphi_X\Big(\frac{t}{n}\Big)^n = \Big(1 + i\frac{t}{n}\mu + o\Big(\frac{t}{n}\Big)\Big)^n \xrightarrow{n\to+\infty} e^{it\mu}.$$
This is the characteristic function of the constant random variable $\mu$ and hence, by the Lévy continuity theorem (omitted), $\bar{X} \to_P \mu$.
Uniform consistency cannot be proved from the assumption that $E_\theta[|X_1|] < +\infty$ for each $\theta \in \Theta$ alone; stronger conditions are required. If, in addition, $V_\theta(X_1) < M < +\infty$ where $M$ does not depend on $\theta$, Chebyshev's inequality may be used to prove uniform consistency:
$$P_\theta\big(|\bar{X} - \mu(\theta)| > \epsilon\big) \leq \frac{1}{\epsilon^2}V_\theta(\bar{X}) \leq \frac{M}{n\epsilon^2}.$$
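This bound can be checked numerically. The following sketch (an illustration, not part of the notes) uses Bernoulli($\theta$) samples, for which $V_\theta(X_1) \leq 1/4 =: M$ uniformly in $\theta$, and compares a Monte Carlo estimate of the tail probability with the Chebyshev bound $M/(n\epsilon^2)$.

```python
import numpy as np

# Illustration: for Bernoulli(theta) data, Var(X_1) <= 1/4 =: M uniformly in theta,
# so Chebyshev gives P(|Xbar - theta| >= eps) <= M / (n * eps^2) for every theta.
rng = np.random.default_rng(0)
n, eps, M, reps = 500, 0.05, 0.25, 20_000

for theta in (0.1, 0.5, 0.9):
    xbar = rng.binomial(n, theta, size=reps) / n    # 'reps' independent sample means
    emp = np.mean(np.abs(xbar - theta) >= eps)      # Monte Carlo estimate of the tail
    print(f"theta={theta}: empirical {emp:.4f} <= bound {M / (n * eps**2):.4f}")
```

The empirical tail probabilities are far below the bound; the point of the bound is that it holds uniformly in $\theta$.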
The main tool to prove consistency will be the law of large numbers. This is natural for a method of
moments estimator, where the parameter estimators will be functions of moment estimators.
We start with a result about functions of estimators of multinomial sampling probabilities. The estimators $\hat{p}_i$ are the sample averages and hence are consistent by the law of large numbers. To prove
uniform consistency for a function of these estimators, the function has to be uniformly continuous.
This is obtained ‘for free’ if the parameter space is compact. In the following theorem for multinomial
sampling, the usual parameter space is extended to obtain compactness. A continuous function over a
compact space is uniformly continuous.
Theorem 7.3. Let $\mathcal{P} = \{(p_1, \ldots, p_k) : 0 \leq p_j \leq 1,\ 1 \leq j \leq k,\ \sum_{j=1}^k p_j = 1\}$. Let $P_p$ denote the probability distribution $p = (p_1, \ldots, p_k)$ over $\mathcal{X} = (x_1, \ldots, x_k)$. Let $X_1, \ldots, X_n$ denote a random sample from $P_p$. Let $N_j = \sum_{i=1}^n 1_{x_j}(X_i)$ and $\hat{p}_{n,j} = \frac{N_j}{n}$ for $j = 1, \ldots, k$. Let $q : \mathcal{P} \to \mathbb{R}^p$ be continuous. Let $\hat{p}_n = (\hat{p}_{n,1}, \ldots, \hat{p}_{n,k})$. Then $\hat{q}_n := q(\hat{p}_n)$ is a uniformly consistent estimator of $q(p)$.
Proof Let $\hat{p}_n = (\hat{p}_{n,1}, \ldots, \hat{p}_{n,k})$. Note that $E[\hat{p}_{n,j}] = p_j$ for each $j$ and $V_p(\hat{p}_{n,j}) = \frac{p_j(1-p_j)}{n} \leq \frac{1}{4n}$. By Chebyshev's inequality, it follows that for all $p = (p_1, \ldots, p_k) \in \mathcal{P}$ and $\delta > 0$,
$$P_p\big(|\hat{p}_n - p| \geq \delta\big) = P_p\Big(\sum_{j=1}^k(\hat{p}_{n,j} - p_j)^2 \geq \delta^2\Big) \leq P_p\Big(\cup_{j=1}^k\big\{k(\hat{p}_{n,j} - p_j)^2 \geq \delta^2\big\}\Big) \leq \sum_{j=1}^k P_p\Big(|\hat{p}_{n,j} - p_j| \geq \frac{\delta}{\sqrt{k}}\Big) \leq \frac{k^2}{4n\delta^2}.$$
Because $q$ is continuous and $\mathcal{P}$ is compact, it follows that $q$ is uniformly continuous on $\mathcal{P}$. It follows that for every $\epsilon > 0$, there exists a $\delta(\epsilon) > 0$ such that for all $p, p' \in \mathcal{P}$,
$$|p' - p| \leq \delta(\epsilon) \quad\Rightarrow\quad |q(p') - q(p)| \leq \epsilon.$$
It follows that
$$P_p\big(|\hat{q}_n - q(p)| \geq \epsilon\big) \leq P_p\big(|\hat{p}_n - p| \geq \delta(\epsilon)\big) \leq \frac{k^2}{4n\delta(\epsilon)^2}$$
and the result follows.
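As an illustration of the plug-in estimator $q(\hat{p}_n)$ (a sketch, not part of the notes), take $q$ to be the Shannon entropy, which is continuous on the compact simplex; the plug-in estimate converges as the sample size grows.

```python
import numpy as np

def entropy(p):
    """q(p) = -sum_j p_j log p_j, continuous on the simplex (with 0*log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(np.where(p > 0, p * np.log(p), 0.0))

# Hypothetical multinomial model with k = 4 cells; q(p_hat) is the plug-in estimator.
rng = np.random.default_rng(1)
p = np.array([0.1, 0.2, 0.3, 0.4])
for n in (100, 10_000, 1_000_000):
    p_hat = rng.multinomial(n, p) / n              # cell frequencies N_j / n
    print(n, abs(entropy(p_hat) - entropy(p)))     # error shrinks as n grows
```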
The aim of the discussion that now follows is to establish Theorem 7.6; that is, that as n → +∞, the
probability of existence of the ML estimator for an exponential family tends to 1 and the sequence
of ML estimators from random samples of size n is consistent. This requires Proposition 7.4 and
Lemma 7.5, which is a corollary to Theorem 4.5.
Proposition 7.4. Let $X_1, \ldots, X_n$ be i.i.d., each with state space $\mathcal{X}$ and let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a regular family of probability distributions over $\mathcal{X}$. Let $g = (g_1, \ldots, g_d)$ map $\mathcal{X}$ onto $\mathcal{Y} \subset \mathbb{R}^d$. Suppose $E_\theta[|g_j(X_1)|] < +\infty$ for $1 \leq j \leq d$ for all $\theta \in \Theta$. Let $m_j(\theta) = E_\theta[g_j(X_1)]$ and let $q(\theta) = h(m(\theta))$ where $h : \mathcal{Y} \to \mathbb{R}^p$ is a continuous function. Then
$$\hat{q} = h(\bar{g}) = h\Big(\frac{1}{n}\sum_{i=1}^n g(X_i)\Big)$$
is a consistent estimate of $q(\theta)$.
Proof It follows from the weak law of large numbers that
$$\frac{1}{n}\sum_{i=1}^n g(X_i) \to_P E_P[g(X_1)].$$
It is straightforward to establish that if $Y_n \to_P Y$ and $h$ is continuous, then $h(Y_n) \to_P h(Y)$.
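A small sketch of Proposition 7.4 in action (an illustration, not from the notes): a method of moments estimator for a Gamma model, where the moments $(m_1, m_2)$ are mapped continuously to the parameters.

```python
import numpy as np

# Assumed model for illustration: Gamma(shape=a, scale=b).  With g(x) = (x, x^2),
# m1 = a*b and m2 = a*b^2 + (a*b)^2, so h(m1, m2) = (m1^2/(m2 - m1^2), (m2 - m1^2)/m1)
# recovers (a, b); h is continuous wherever m2 - m1^2 > 0, so h(g_bar) is consistent
# by Proposition 7.4.
rng = np.random.default_rng(2)
a, b = 3.0, 2.0
for n in (100, 10_000, 1_000_000):
    x = rng.gamma(a, b, size=n)
    m1, m2 = x.mean(), (x ** 2).mean()
    a_hat, b_hat = m1 ** 2 / (m2 - m1 ** 2), (m2 - m1 ** 2) / m1
    print(n, round(a_hat, 3), round(b_hat, 3))     # converges to (3.0, 2.0)
```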
The following result is a corollary to Theorem 4.5.
Lemma 7.5. Suppose that $\mathcal{P} = \{P_\eta : \eta \in \mathcal{E}\}$, where $\mathcal{E}$, the natural parameter space, is open; $\mathcal{P}$ is the canonical exponential family generated by $(h, T)$ where $T = (T_1, \ldots, T_k)$, of rank $k$. Let $C_T$ denote the convex support of the distribution of $T$ under $P_\eta$ for all $\eta \in \mathcal{E}$. Let $t_0 = E_\eta[T(X)]$. Then $\hat{\eta}$, the MLE, exists and is unique if and only if $t_0 \in C_T^0$, the interior of $C_T$.
Proof of Lemma 7.5 Recall Theorem 4.5: if there is a $\delta > 0$ and an $\epsilon > 0$ such that $t_0$ satisfies
$$\inf_{(c_1,\ldots,c_k):\,\sum_j c_j^2 = 1} P\big((c, T(X) - t_0) > \delta\big) > \epsilon,$$
then the MLE $\hat{\eta}$ exists, is unique and is the solution to the equation
$$\dot{A}(\eta) = E_\eta[T(X)] = t_0.$$
The point $t_0 \in C^0$, the interior of a convex set $C$, if and only if for every $d \neq 0$, both $\{t : (d, t) > (d, t_0)\} \neq \emptyset$ and $\{t : (d, t) < (d, t_0)\} \neq \emptyset$. The equivalence of Equation (4.1) and Lemma 7.5 follows.
The main result, which is a consequence of Proposition 7.4 together with Lemma 7.5, may now be stated and proved.
Theorem 7.6. Let $\mathcal{P}$ be a canonical exponential family of rank $d$ generated by $T = (T_1, \ldots, T_d)$. Let $\eta$ denote the natural parameter, $\mathcal{E}$ the natural parameter space and $A$ the log partition function. Suppose that $\mathcal{E}$ is open. Let $X_1, \ldots, X_n$ be a random sample from $P_\eta \in \mathcal{P}$. Let $\hat{\eta}$ denote the MLE. Then

1. $P_\eta(\hat{\eta} \text{ exists}) \xrightarrow{n\to+\infty} 1$.

2. $(\hat{\eta}_n)_{n\geq 1}$ is consistent.
Proof It follows from Lemma 7.5 that $\hat{\eta}(X_1, \ldots, X_n)$ exists if and only if $\bar{T}_n := \frac{1}{n}\sum_{j=1}^n T(X_j)$ belongs to the interior $C_T^0$ of the convex support of the distribution of $\bar{T}_n$. If $\eta_0$ is the parameter value, then $E_{\eta_0}[T(X_1)]$ belongs to the interior of the convex support by Theorem 4.5, since $\eta_0$ solves the equation $\dot{A}(\eta_0) = t_0 = E_{\eta_0}[T(X_1)]$. By definition of the convex support, there exists a ball
$$S_\delta := \{t : |t - E_{\eta_0}[T(X_1)]| < \delta\} \subset C_T^0.$$
By the WLLN,
$$\frac{1}{n}\sum_{i=1}^n T(X_i) \xrightarrow{P_{\eta_0}} E_{\eta_0}[T(X_1)],$$
from which
$$P_{\eta_0}\Big(\frac{1}{n}\sum_{i=1}^n T(X_i) \in C_T^0\Big) \geq P_{\eta_0}\Big(\Big|\frac{1}{n}\sum_{i=1}^n T(X_i) - E_{\eta_0}[T(X_1)]\Big| < \delta\Big) \xrightarrow{n\to+\infty} 1.$$
Since $\hat{\eta}$ is the solution to $\dot{A}(\eta) = \frac{1}{n}\sum_{i=1}^n T(X_i)$, it follows that the probability that $\hat{\eta}$ exists tends to 1 and hence 1. follows.

For part 2., by Theorem 3.7, the map $\eta \to \dot{A}(\eta)$ is 1-1 and continuous on $\mathcal{E}$. It follows that the inverse $\dot{A}^{-1} : \dot{A}(\mathcal{E}) \to \mathcal{E}$ is continuous on $S_\delta$ and the result now follows by Proposition 7.4.
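A concrete sketch for Theorem 7.6 (an illustration, not from the notes) is the Poisson family in canonical form: $T(x) = x$ and $A(\eta) = e^\eta$, so the MLE solves $\dot{A}(\eta) = \bar{T}_n$, i.e. $\hat{\eta} = \log \bar{T}_n$, which exists precisely when $\bar{T}_n$ lies in the interior $(0, \infty)$ of the convex support.

```python
import numpy as np

# Poisson family in canonical form (illustration): T(x) = x, A(eta) = exp(eta), so the
# MLE eta_hat = log(T_bar) exists iff T_bar > 0.  With a small mean, the MLE can fail
# to exist for small n, but P(existence) -> 1 and eta_hat -> eta as n grows.
rng = np.random.default_rng(3)
eta0 = np.log(0.05)
for n in (5, 50, 500, 5_000):
    tbar = rng.poisson(np.exp(eta0), size=(2_000, n)).mean(axis=1)
    exists = tbar > 0
    err = np.mean(np.abs(np.log(tbar[exists]) - eta0))
    print(f"n={n}: P(MLE exists) ~ {exists.mean():.3f}, mean |eta_hat - eta| ~ {err:.3f}")
```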
7.1.2 Consistency of Minimum Contrast Estimators
Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a regular family, let $\rho(x, \theta)$ be a contrast function and let $X = (X_1, \ldots, X_n)$ be a random sample from $P_\theta$. Let $\hat{\theta}$ be a minimum contrast estimate that minimises
$$\rho_n(X, \theta) = \frac{1}{n}\sum_{i=1}^n \rho(X_i, \theta).$$
Recall that if $\rho$ is a contrast function, then (by definition) $D(\theta_0, \theta) := E_{\theta_0}[\rho(X_1, \theta)]$ is uniquely minimised at $\theta = \theta_0$ for all $\theta_0 \in \Theta$.
Theorem 7.7. Suppose
$$\sup_{\theta\in\Theta}\Big|\frac{1}{n}\sum_{i=1}^n\big(\rho(X_i, \theta) - D(\theta_0, \theta)\big)\Big| \xrightarrow{P_{\theta_0}} 0 \qquad (7.3)$$
and
$$\inf\{D(\theta_0, \theta) - D(\theta_0, \theta_0) : |\theta - \theta_0| > \epsilon\} > 0 \qquad \forall \epsilon > 0, \qquad (7.4)$$
then $\hat{\theta}$ is consistent.
Proof Firstly, consider the event $\{|\hat{\theta} - \theta_0| \geq \epsilon\}$. Since $\hat{\theta}$ minimises $\rho_n$, this event is contained in the event that $\rho_n(X, \theta) \leq \rho_n(X, \theta_0)$ for some $\theta$ with $|\theta - \theta_0| \geq \epsilon$. It follows that
$$P_{\theta_0}\big(|\hat{\theta} - \theta_0| \geq \epsilon\big) \leq P_{\theta_0}\Big(\inf\Big\{\frac{1}{n}\sum_{j=1}^n\big(\rho(X_j, \theta) - \rho(X_j, \theta_0)\big) : |\theta - \theta_0| \geq \epsilon\Big\} \leq 0\Big). \qquad (7.5)$$
Let
$$A = \inf\Big\{\frac{1}{n}\sum_{i=1}^n\big(\rho(X_i, \theta) - \rho(X_i, \theta_0)\big) : |\theta - \theta_0| \geq \epsilon\Big\}$$
and
$$B = \inf\{D(\theta_0, \theta) - D(\theta_0, \theta_0) : |\theta - \theta_0| \geq \epsilon\}.$$
From the hypotheses, $P_{\theta_0}\big(\sup_\theta|\rho_n(X, \theta) - D(\theta_0, \theta)| > \delta\big) \xrightarrow{n\to+\infty} 0$ and hence, using $D(\theta_0, \theta) - D(\theta_0, \theta_0) \geq 0$, it follows that for all $\delta > 0$,
$$P_{\theta_0}(A - B \leq -\delta) \leq P_{\theta_0}\Big(\inf_{\theta:|\theta - \theta_0|\geq\epsilon}\big[(\rho_n(X, \theta) - D(\theta_0, \theta)) - (\rho_n(X, \theta_0) - D(\theta_0, \theta_0))\big] \leq -\delta\Big) \xrightarrow{n\to+\infty} 0. \qquad (7.6)$$
Now choose $\delta = \inf_{\theta:|\theta - \theta_0|>\epsilon}\big(D(\theta_0, \theta) - D(\theta_0, \theta_0)\big)$. Then $\delta > 0$ and, from Equation (7.6), it follows directly that the right hand side of (7.5) tends to zero.
The following simple and important corollary gives a condition under which the MLE is consistent.
Corollary 7.8. Let $\Theta = \{\theta_1, \ldots, \theta_d\}$ denote a finite parameter space. Suppose that
$$\max_{j,k} E_{\theta_j}\big[|\log p(X_1, \theta_k)|\big] < +\infty$$
and suppose that the parametrisation is identifiable. Let $\hat{\theta}$ denote the MLE of $\theta$. Then $P_{\theta_j}(\hat{\theta} \neq \theta_j) \to 0$ for all $j \in \{1, \ldots, d\}$.
Proof Since the parameter space is discrete and finite, it follows that there is an $\epsilon > 0$ such that
$$P_{\theta_j}\big(\hat{\theta} \neq \theta_k\big) = P_{\theta_j}\big(|\hat{\theta} - \theta_k| \geq \epsilon\big) \qquad \forall (j, k).$$
Recall that the MLE is the minimum contrast estimator with contrast function $\rho(x, \theta) = -\log p(x, \theta)$, so that
$$\rho_n(X, \theta) = -\frac{1}{n}\sum_{i=1}^n \log p(X_i, \theta).$$
By Shannon's Lemma 4.2, $D(\theta_0, \theta)$ is minimised at $\theta = \theta_0$ for all $\theta_0 \in \Theta$. It follows that only Equations (7.3) and (7.4) need to be checked. Equation (7.3) follows from the WLLN; Equation (7.4) follows from Shannon's Lemma.
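A quick numerical sketch of Corollary 7.8 (an illustration, not from the notes): a Bernoulli model with a finite parameter grid, where the MLE picks the grid point with the largest log likelihood.

```python
import numpy as np

# Hypothetical finite model: Bernoulli(theta) with Theta = {0.2, 0.5, 0.8}.
# The MLE maximises the log likelihood over the three candidate values.
rng = np.random.default_rng(4)
thetas = np.array([0.2, 0.5, 0.8])
theta_true = 0.5
for n in (10, 50, 200):
    s = rng.binomial(n, theta_true, size=20_000)          # sufficient statistic sum(X_i)
    loglik = s[:, None] * np.log(thetas) + (n - s)[:, None] * np.log(1 - thetas)
    theta_hat = thetas[np.argmax(loglik, axis=1)]
    print(f"n={n}: P(theta_hat != theta) ~ {np.mean(theta_hat != theta_true):.4f}")
```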
7.2 The Delta Method
Let $X_1, \ldots, X_n$ be a random sample from a parent distribution satisfying $E[X_1] = \mu$ and $V(X_1) = \sigma^2 < +\infty$. The central limit theorem states that
$$\mathcal{L}\big(\sqrt{n}(\bar{X} - \mu)\big) \xrightarrow{n\to+\infty} N(0, \sigma^2).$$
The delta method is simply the name given to the application of Taylor's expansion theorem to obtain the distribution of functions of the sample average.
Theorem 7.9 (The Delta Method). Let $X_1, \ldots, X_n$ be a random sample, where $X_1$ has state space $\mathbb{R}$, $E[X_1] = \mu$, $V(X_1) = \sigma^2 < +\infty$ and $h : \mathbb{R} \to \mathbb{R}$ a differentiable function. Then
$$\mathcal{L}\big(\sqrt{n}(h(\bar{X}) - h(\mu))\big) \xrightarrow{n\to+\infty} N\big(0, h'(\mu)^2\sigma^2\big). \qquad (7.7)$$
The result follows from the following lemma.
Lemma 7.10. Let $\{U_n\}$ be a sequence of real valued random variables and $\{a_n\}$ a sequence of constants that satisfies $a_n \to +\infty$ as $n \to +\infty$. Suppose that

1. $a_n(U_n - u) \xrightarrow{\mathcal{L}} V$ for some constant $u \in \mathbb{R}$, where $V$ is a well defined random variable,

2. $g : \mathbb{R} \to \mathbb{R}$ is differentiable at $u$ with derivative $g'(u)$.

Then
$$a_n(g(U_n) - g(u)) \xrightarrow{\mathcal{L}} g'(u)V.$$
Proof of Lemma 7.10 Since $a_n \to +\infty$ as $n \to +\infty$ and $a_n(U_n - u)$ converges in law, it follows that for every $\delta > 0$,
$$P(|U_n - u| \leq \delta) \to 1.$$
From the definition of a derivative, it follows that for every $\epsilon > 0$, there exists a $\delta > 0$ such that
$$|v - u| \leq \delta \quad\Rightarrow\quad |g(v) - g(u) - (v - u)g'(u)| \leq \epsilon|v - u|. \qquad (7.8)$$
From this, it follows that
$$P\big(|g(U_n) - g(u) - g'(u)(U_n - u)| \leq \epsilon|U_n - u|\big) \to 1,$$
from which
$$P\big(|a_n(g(U_n) - g(u)) - g'(u)\,a_n(U_n - u)| \leq \epsilon|a_n(U_n - u)|\big) \xrightarrow{n\to+\infty} 1.$$
Since $a_n(U_n - u) \xrightarrow{\mathcal{L}} V$, the result follows.
Proof of Theorem 7.9 This follows from the lemma by setting $U_n = \bar{X}$, $a_n = n^{1/2}$, $u = \mu$, $g = h$ and $V \sim N(0, \sigma^2)$.
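The prediction of Theorem 7.9 is easy to check by simulation (a sketch, not from the notes): with exponential data of mean 1 and $h(x) = x^2$, the limit variance should be $h'(\mu)^2\sigma^2 = 4$.

```python
import numpy as np

# Illustration: X_i ~ Exponential(mean 1), so mu = sigma = 1 and the sample mean has the
# exact distribution Xbar ~ Gamma(n, scale=1/n).  With h(x) = x^2, Theorem 7.9 predicts
# sqrt(n) * (h(Xbar) - h(mu)) -> N(0, h'(mu)^2 * sigma^2) = N(0, 4).
rng = np.random.default_rng(5)
n, reps = 2_000, 100_000
xbar = rng.gamma(n, 1.0 / n, size=reps)          # exact law of the sample mean
z = np.sqrt(n) * (xbar ** 2 - 1.0)
print("simulated variance:", round(z.var(), 3), " predicted:", 4.0)
```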
The delta method can be extended to situations where $h : \mathbb{R} \to \mathbb{R}$ is a twice differentiable function with $h'(\mu) = 0$ but $h''(\mu) \neq 0$.
Theorem 7.11 (Second order delta method). Let $(Y_n)$ be a sequence of random variables that satisfy $\sqrt{n}(Y_n - \mu) \xrightarrow{\mathcal{L}} N(0, \sigma^2)$. Let $h$ be a function that is twice differentiable and satisfies $h'(\mu) = 0$, $h''(\mu) \neq 0$. Then
$$n(h(Y_n) - h(\mu)) \xrightarrow{\mathcal{L}} \frac{\sigma^2}{2}h''(\mu)V$$
where $V \sim \chi^2_1$.
Proof Similar to the first order delta method; consider the second derivative and recall that if $V \sim \chi^2_1$, then $V =_{\mathcal{L}} Z^2$ where $Z \sim N(0, 1)$.
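For instance (a worked illustration, not in the original notes), take $Y_n = \bar{Y}_n$, $h(y) = y^2$ and $\mu = 0$, so that $h'(0) = 0$ and $h''(0) = 2$. Then
$$n\big(\bar{Y}_n^2 - 0\big) = \big(\sqrt{n}\,\bar{Y}_n\big)^2 \xrightarrow{\mathcal{L}} \sigma^2 Z^2 = \frac{\sigma^2}{2}h''(0)V, \qquad Z \sim N(0, 1),\ V = Z^2 \sim \chi^2_1,$$
in agreement with Theorem 7.11, whereas the first order delta method would only give a degenerate limit at rate $\sqrt{n}$.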
The delta method extends to the multivariate setting. Firstly, Lemma 7.10 extends directly:
Lemma 7.12. Let $\{U_n\}$ be random $d$-vectors and let $\{a_n\}$ be a sequence of constants satisfying $a_n \to +\infty$ as $n \to +\infty$ and suppose that

1. $a_n(U_n - u) \xrightarrow{\mathcal{L}} V$ where $V$ is a random $d$-vector,

2. $g : \mathbb{R}^d \to \mathbb{R}^p$ has a differential $g^{(1)}_{p\times d}(u)$ at $u$.

Then
$$a_n(g(U_n) - g(u)) \xrightarrow{\mathcal{L}} g^{(1)}(u)V.$$
Proof Similar to Lemma 7.10.
From this, the multivariate version of the delta method can be stated and proved.
Theorem 7.13 (Multivariate delta method). Let $Y_1, \ldots, Y_n$ be i.i.d. random $d$-vectors with well defined expected value $\mu$ and covariance matrix $\Sigma$. Let $h : O \to \mathbb{R}^p$ where $O$ is an open subset of $\mathbb{R}^d$. Suppose that $h$ has a well defined differential $h^{(1)}(\mu)$, where
$$h^{(1)}_{ij}(\mu) = \frac{\partial h_i}{\partial x_j}(\mu).$$
Then
$$h(\bar{Y}) = h(\mu) + h^{(1)}(\mu)\big(\bar{Y} - \mu\big) + o_P(n^{-1/2}), \qquad (7.9)$$
where $o_P(n^{-1/2})$ denotes a quantity $V$ that satisfies $P(n^{1/2}|V| > \epsilon) \xrightarrow{n\to+\infty} 0$ for all $\epsilon > 0$, and
$$\mathcal{L}\big(\sqrt{n}(h(\bar{Y}) - h(\mu))\big) \xrightarrow{n\to+\infty} N\big(0, h^{(1)}(\mu)\Sigma h^{(1)t}(\mu)\big). \qquad (7.10)$$
The proof follows in the same way as before; let $a_n = \sqrt{n}$, $U_n = \bar{Y}$, $u = \mu$ and $V \sim N(0, \Sigma)$. Then
$$\sqrt{n}\big(h(\bar{Y}) - h(\mu)\big) \xrightarrow{\mathcal{L}} h^{(1)}(\mu)V \sim N\big(0, h^{(1)}(\mu)\Sigma h^{(1)t}(\mu)\big)$$
as required.
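As a sketch of the multivariate delta method (an illustration, not from the notes), take $Y_i = (X_i, X_i^2)$ with $X_i$ exponential of mean 1 and $h(y_1, y_2) = y_2 - y_1^2$, so that $h(\bar{Y})$ is the sample variance; here $\mu = (1, 2)$, $h^{(1)}(\mu) = (-2, 1)$ and $\Sigma = \begin{pmatrix}1 & 4\\ 4 & 20\end{pmatrix}$, giving asymptotic variance $h^{(1)}(\mu)\Sigma h^{(1)t}(\mu) = 8$.

```python
import numpy as np

# Illustration: Y_i = (X_i, X_i^2) with X_i ~ Exponential(mean 1), h(y1, y2) = y2 - y1^2.
# Theorem 7.13 predicts sqrt(n) * (h(Ybar) - 1) -> N(0, 8) for this model.
rng = np.random.default_rng(6)
n, reps = 2_000, 4_000
x = rng.exponential(1.0, size=(reps, n))
s2 = (x ** 2).mean(axis=1) - x.mean(axis=1) ** 2      # h(Ybar) = sample variance
z = np.sqrt(n) * (s2 - 1.0)                           # true variance of Exp(1) is 1
print("simulated variance:", round(z.var(), 2), " predicted:", 8.0)
```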
7.3 Asymptotic Results for Maximum Likelihood
The following result gives the asymptotic distribution for the maximum likelihood estimator of the
canonical parameters.
Theorem 7.14. Let $\mathcal{P}$ be a canonical exponential family of rank $d$ generated by $T$ and suppose that $\mathcal{E}$ (the natural parameter space) is open. Let $X_1, \ldots, X_n$ be a random sample from $P_\eta \in \mathcal{P}$. Let $\hat{\eta}$ be the MLE if it exists and equal to a constant vector $c$ otherwise. Then

1. $$\hat{\eta} = \eta + \ddot{A}^{-1}(\eta)\Big(\frac{1}{n}\sum_{i=1}^n T(X_i) - \dot{A}(\eta)\Big) + o_{P_\eta}(n^{-1/2})$$

2. $$\mathcal{L}\big(\sqrt{n}(\hat{\eta} - \eta)\big) \xrightarrow{n\to+\infty} N\big(0, I^{-1}(\eta)\big)$$

where $o_{P_\eta}(n^{-1/2})$ denotes a quantity $V_n$ such that $\lim_{n\to+\infty} P_\eta(n^{1/2}|V_n| > \epsilon) = 0$ for all $\epsilon > 0$.
Remark The asymptotic variance matrix $I^{-1}(\eta)$ of $\sqrt{n}(\hat{\eta} - \eta)$ is the matrix that gives the Cramér-Rao lower bound on variances of unbiased estimators of linear combinations of $(\eta_1, \ldots, \eta_d)$. This is the asymptotic efficiency property of the ML estimator for exponential families.
Proof This is an immediate consequence of the multivariate delta method. Firstly, let
$$\bar{T} = \frac{1}{n}\sum_{j=1}^n T(X_j);$$
then $P_\eta\big(\bar{T} \in \dot{A}(\mathcal{E})\big) \to 1$ and hence $P_\eta\big(\hat{\eta} = \dot{A}^{-1}(\bar{T})\big) \to 1$. Now set $h = \dot{A}^{-1}$ and $\mu = \dot{A}(\eta)$ in Theorem 7.13.
The following result is from Analysis 2: let $h : \mathbb{R}^d \to \mathbb{R}^d$ be 1-1 and continuously differentiable on an open neighbourhood $O$ of $x$, and suppose that $Dh(x) := \big(\frac{\partial h_i}{\partial x_j}\big)_{d\times d}$ is non-singular. Then $h^{-1} : h(O) \to O$ is differentiable at $y = h(x)$ and
$$Dh^{-1}(y) = (Dh(x))^{-1}.$$
By definition, $D\dot{A} = \ddot{A}$. In Theorem 7.13, $h^{(1)}(\mu) = \ddot{A}^{-1}(\eta)$. The first statement of the theorem now follows from (7.9). The second part follows by noting that the covariance matrix of $T(X_1)$ is $\Sigma = \ddot{A}(\eta) = I(\eta)$, from which
$$h^{(1)}(\mu)\Sigma h^{(1)t}(\mu) = \ddot{A}^{-1}(\eta)\ddot{A}(\eta)\ddot{A}^{-1}(\eta) = \ddot{A}^{-1}(\eta) = I^{-1}(\eta).$$
Statement 2. now follows from (7.10).
Example 7.1 (Normal Random Sample). Let $X_1, \ldots, X_n$ be a $N(\mu, \sigma^2)$ random sample. Note that
$$p(x_1, \ldots, x_n; \mu, \sigma) = \frac{1}{(2\pi)^{n/2}}\exp\Big\{\frac{\mu}{\sigma^2}\sum_{j=1}^n x_j - \frac{1}{2\sigma^2}\sum_{j=1}^n x_j^2 - n\frac{\mu^2}{2\sigma^2} - n\log\sigma\Big\}.$$
Let $\eta_1 = \frac{\mu}{\sigma^2}$ and $\eta_2 = -\frac{1}{2\sigma^2}$; then the model can be re-written in canonical form as
$$p(x_1, \ldots, x_n; \eta) = \frac{1}{(2\pi)^{n/2}}\exp\Big\{n\Big(\eta_1\bar{x} + \eta_2\overline{x^2} + \frac{\eta_1^2}{4\eta_2} - \log\sqrt{-\frac{1}{2\eta_2}}\Big)\Big\}.$$
Then $T = (T_1, T_2)$, where $T_1 = \bar{X}$ and $T_2 = \overline{X^2}$, is a sufficient statistic for the parameters. Since $E_\eta[T_1] = \mu$ and $E_\eta[T_2] = \sigma^2 + \mu^2$, it follows from the central limit theorem that
$$\sqrt{n}\begin{pmatrix} T_1 - \mu \\ T_2 - (\mu^2 + \sigma^2)\end{pmatrix} \xrightarrow{\mathcal{L}} N\Big(\begin{pmatrix}0\\0\end{pmatrix}, \ddot{A}(\eta)\Big),$$
where $A(\eta) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\big(\log 2 + \log(-\eta_2)\big)$, so that
$$\ddot{A}(\eta) = \frac{1}{2\eta_2^2}\begin{pmatrix} -\eta_2 & \eta_1 \\ \eta_1 & 1 - \frac{\eta_1^2}{\eta_2}\end{pmatrix}.$$
The maximum likelihood estimators for the normal are
$$\hat{\mu} = \bar{X} = T_1 \qquad\text{and}\qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{j=1}^n(X_j - \bar{X})^2 = T_2 - T_1^2.$$
It follows that $\hat{\eta}_1 = \frac{\hat{\mu}}{\hat{\sigma}^2}$ and $\hat{\eta}_2 = -\frac{1}{2\hat{\sigma}^2}$. By the preceding theorem,
$$\sqrt{n}\begin{pmatrix}\hat{\eta}_1 - \eta_1 \\ \hat{\eta}_2 - \eta_2\end{pmatrix} \xrightarrow{\mathcal{L}} N\big(0, I^{-1}(\eta)\big).$$
7.4 Asymptotic Distribution of Minimum Contrast Estimators
The result for the asymptotic distribution of maximum likelihood estimators of the canonical parameters of exponential families may be extended to a large class of minimum contrast estimators. Let $\theta \in \Theta \subset \mathbb{R}^d$, where $\theta = (\theta_1, \ldots, \theta_d)$. Let $X = (X_1, \ldots, X_n)$ and let $\rho_n(X, \theta)$ be a minimum contrast function based on the random sample, of the form
$$\rho_n(X, \theta) = \frac{1}{n}\sum_{j=1}^n \rho(X_j, \theta), \qquad (7.11)$$
where $\rho$ is a contrast function for a single observation. For example, take $\rho(x, \theta) = -\log p(x, \theta)$ for maximum likelihood estimation. Assume the following:
1. $\rho_n$ is differentiable in $\theta_j$ for each $j = 1, \ldots, d$. Let $\hat{\theta}_n$ denote the minimum contrast estimate; that is, $\hat{\theta}_n$ satisfies
$$\frac{\partial\rho_n}{\partial\theta_j}(X, \hat{\theta}_n) = 0, \qquad j = 1, \ldots, d. \qquad (7.12)$$
In the case of $\rho_n$ given by Equation (7.11), this is the maximum likelihood estimate.

2. $$E_\theta\Big[\frac{\partial\rho_n}{\partial\theta_j}(X, \theta)\Big] = 0, \qquad (7.13)$$
$$E_\theta\big[|\nabla_\theta\rho_n(X, \theta)|^2\big] < +\infty, \qquad (7.14)$$
where $|\cdot|$ denotes the Euclidean norm.

3. $\rho_n$ is twice differentiable in $\theta$ and satisfies
$$\sum_{j,k} E_\theta\Big[\Big|\frac{\partial^2}{\partial\theta_j\partial\theta_k}\rho_n(X, \theta)\Big|\Big] < +\infty \qquad \forall \theta \in \Theta.$$
The matrix with entries $E_\theta\big[\frac{\partial^2}{\partial\theta_i\partial\theta_j}\rho_n(X, \theta)\big]$ is non-singular for each $\theta \in \Theta$.

4. $\hat{\theta}_n \to_{P_\theta} \theta$ for each $\theta \in \Theta$.
For the fourth of these, in the case of exponential families of full rank, where $\theta = \theta(\eta)$ for a continuous 1-1 mapping $\theta$, $\hat{\theta}_n \to_{P_\theta} \theta$ by virtue of Theorem 7.14.
Theorem 7.15. Let $\mathcal{P} = \{P_\theta : \theta \in \Theta \subset \mathbb{R}^d\}$ be a regular parametric family. Let $X = (X_1, \ldots, X_n)$ be a random sample. Suppose that conditions 1., 2., 3. and 4. hold. Let $J$ be the matrix with entries
$$J_{jk}(\theta) = E_\theta\Big[\frac{\partial^2}{\partial\theta_j\partial\theta_k}\rho(X_1, \theta)\Big]$$
and let $K$ be the matrix with entries
$$K_{jk}(\theta) = E_\theta\Big[\frac{\partial\rho}{\partial\theta_j}(X_1, \theta)\,\frac{\partial\rho}{\partial\theta_k}(X_1, \theta)\Big].$$
Then the minimum contrast estimate satisfies
$$\hat{\theta}_n = \theta - J^{-1}(\theta)\nabla\rho_n(X, \theta) + o_{P_\theta}(n^{-1/2}),$$
so that
$$\mathcal{L}\big(\sqrt{n}(\hat{\theta}_n - \theta)\big) \xrightarrow{n\to+\infty} N\big(0, J^{-1}(\theta)K(\theta)J^{-1}(\theta)\big).$$
In the case of the maximum likelihood estimate, $I = J = K$, so that
$$\mathcal{L}\big(\sqrt{n}(\hat{\theta}_n - \theta)\big) \xrightarrow{n\to+\infty} N\big(0, I^{-1}(\theta)\big).$$
Proof By Taylor's expansion theorem,
$$\frac{\partial}{\partial\theta_k}\rho_n(X, \theta) = \frac{\partial}{\partial\theta_k}\rho_n(X, \hat{\theta}_n) + \sum_j\frac{\partial^2}{\partial\theta_j\partial\theta_k}\rho_n(X, \theta_n^*)(\theta_j - \hat{\theta}_{n,j}) = \sum_j\frac{\partial^2}{\partial\theta_j\partial\theta_k}\rho_n(X, \theta_n^*)(\theta_j - \hat{\theta}_{n,j}),$$
where $|\theta_{n,j}^* - \theta_j| \leq |\hat{\theta}_{n,j} - \theta_j|$ for each $j$. It follows from assumption 4. that $\theta_n^* \to \theta$ and hence, from the WLLN, that
$$\frac{\partial^2}{\partial\theta_j\partial\theta_k}\rho_n(X, \theta_n^*) \xrightarrow{P_\theta} E_\theta\Big[\frac{\partial^2}{\partial\theta_j\partial\theta_k}\rho(X_1, \theta)\Big] = J_{jk}(\theta).$$
Since $E_\theta\big[\frac{\partial}{\partial\theta_k}\rho_n(X, \theta)\big] = 0$, it follows that $K(\theta)$ is the covariance matrix of $\nabla\rho(X_1, \theta)$ and hence, from the central limit theorem, that
$$\sqrt{n}\,\nabla\rho_n(X, \theta) \xrightarrow{\mathcal{L}} N(0, K(\theta)).$$
If $V \sim N(0, K(\theta))$, then $J^{-1}(\theta)V \sim N\big(0, J^{-1}(\theta)K(\theta)J^{-1}(\theta)\big)$, from which it follows that
$$\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{n\to+\infty} N\big(0, J^{-1}(\theta)K(\theta)J^{-1}(\theta)\big).$$
The result for maximum likelihood follows directly, since for maximum likelihood $I = J = K$.
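A short numerical sketch of the sandwich form $J^{-1}(\theta)K(\theta)J^{-1}(\theta)$ (an illustration, not from the notes): the quasi-Poisson contrast $\rho(x, \theta) = \theta - x\log\theta$ applied to a negative binomial family parametrised by its mean. The minimum contrast estimate is $\hat{\theta}_n = \bar{X}$, with $J(\theta) = 1/\theta$ and $K(\theta) = V(X_1)/\theta^2$, so $J^{-1}KJ^{-1} = V(X_1)$, which differs from the Poisson "model-based" value $\theta$ when the data are overdispersed.

```python
import numpy as np

# Illustration: contrast rho(x, theta) = theta - x*log(theta) (quasi-Poisson), data from a
# negative binomial with fixed r, parametrised by its mean.  Then theta_hat = Xbar,
# J(theta) = 1/theta, K(theta) = Var(X)/theta^2 and the sandwich J^-1 K J^-1 = Var(X),
# whereas the Poisson model-based value would be theta (too small under overdispersion).
rng = np.random.default_rng(7)
r, p = 2.0, 0.2
mu, var = r * (1 - p) / p, r * (1 - p) / p ** 2          # mean 8, variance 40
n, reps = 2_000, 4_000
xbar = rng.negative_binomial(r, p, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu)
print("simulated variance:", round(z.var(), 1),
      " sandwich J^-1 K J^-1 =", var, " Poisson model-based value =", mu)
```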