Lecture III: Review of Classic Quadratic Variation
Results and Relevance to Statistical Inference in
Finance
Christopher P. Calderon
Rice University / Numerica Corporation
Research Scientist
Chris Calderon, PASI, Lecture 3
Outline
I Refresher on Some Unique Properties of
Brownian Motion
II Stochastic Integration and Quadratic
Variation of SDEs
III Demonstration of How Results Help
Understand "Realized Variation" Discretization
Error and Idea Behind "Two Scale Realized
Volatility Estimator"
Part I
Refresher on Some Unique
Properties of Brownian Motion
Outline
For Items I & II Draw Heavily from:
Protter, P. (2004) Stochastic Integration and Differential
Equations, Springer-Verlag, Berlin.
Steele, J.M. (2001) Stochastic Calculus and Financial
Applications, Springer, New York.
For Item III Highlight Results from:
Barndorff-Nielsen, O. & Shephard, N. (2003) Bernoulli 9 243-265.
Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.
Assuming Some Basic Familiarity
with Brownian Motion (B.M.)
• Stationary Independent Increments
• Increments are Gaussian
$B_t - B_s \sim N(0, t - s)$
• Paths are Continuous but “Rough” / “Jittery”
(Not Classically Differentiable)
• Paths are of Infinite Variation, so "Funny" Integrals are Used in
Stochastic Integration, e.g. $\int_0^t B_s\,dB_s = \frac{1}{2}(B_t^2 - t)$
Assuming Some Basic Familiarity
with Brownian Motion (B.M.)
The Material in Parts I and II is "Classic":
Fundamental & Established Results that Set the
Stage for Understanding the Basics in Part III.
The References Listed at the Beginning Provide
a Detailed Mathematical Account of the Material
I Summarize Briefly Here.
A Classic Result Worth Reflecting On
$\lim_{N\to\infty} \sum_{i=1}^{N} (B_{t_i} - B_{t_{i-1}})^2 = T, \qquad t_i = \frac{iT}{N}$
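The limit above is easy to probe numerically. A minimal sketch (my own illustration, not from the lecture): simulate the Brownian increments directly and watch the sum of squared increments settle on T as the mesh is refined.

```python
# Monte Carlo check that the sum of squared Brownian increments over [0, T]
# converges to T as the partition is refined (t_i = i T / N).
import numpy as np

rng = np.random.default_rng(0)
T = 2.0
for N in (10, 100, 10_000, 1_000_000):
    # Increments B_{t_i} - B_{t_{i-1}} ~ N(0, T/N).
    dB = rng.normal(0.0, np.sqrt(T / N), size=N)
    qv = np.sum(dB**2)  # realized quadratic variation
    print(N, round(qv, 3))
```

The printed values fluctuate for small N and concentrate near T = 2 as N grows.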
Implies Something Not
Intuitive to Many Unfamiliar
with Brownian Motion …
Discretely Sampled Brownian Motion Paths
[Figure: sample path Z_t vs. t on [0, 1]; shift the initial condition to zero if "Standard Brownian Motion" is desired]
Fix the Terminal Time, "T", and Refine the Mesh by Increasing "N":
$\delta \equiv \frac{T}{N}, \qquad t_i \equiv i\,\frac{T}{N}$
Suppose You are Interested in the Limit of:
$\sum_{i=1}^{k(t)} (B_{t_i} - B_{t_{i-1}})^2, \qquad k(t) = \max\{i : t_{i+1} \le t\}$
Different Paths, Same Limit
$\lim_{N\to\infty} \sum_{i=1}^{k(t)} (B_{t_i} - B_{t_{i-1}})^2 = t$
[Proof Sketched on Board]
[Figure: two sample paths B_t(ω1) and B_t(ω2) with the same quadratic variation limit]
Compare To the
Central Limit
Theorem for "Nice"
iid Random
Variables.
Different Paths, Same Limit
$\lim_{N\to\infty} \sum_{i=1}^{N} (B_{t_i} - B_{t_{i-1}})^2 \equiv \lim_{N\to\infty} \sum_{i=1}^{N} Y_i = T$
Compare/Contrast to a Properly Centered and
Scaled Sum of "Nice" iid Random Variables:
$S_N \equiv \frac{\sum_{i=1}^{N} (\theta(\omega_i) - \mu)}{\sqrt{N}}, \qquad S_N \xrightarrow{L} N(0, \sigma^2)$
[Figure: sample paths B_t(ω1), B_t(ω2)]
Continuity in Time
$\lim_{N\to\infty} \sum_{i=1}^{N} (B_{t_i} - B_{t_{i-1}})^2 \equiv \lim_{N\to\infty} \sum_{i=1}^{N} Y_i = T$
The Above is "Degenerate," but Note That the Time Scaling
Used in Our Previous Example Makes the Infinite
Sum Along a Path Seem Similar to
Computing the Variance of Independent RV's:
$S_N \equiv \frac{\sum_{i=1}^{N} (\theta(\omega_i) - \mu)}{\sqrt{N}}, \qquad S_N \xrightarrow{L} N(0, \sigma^2)$
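The contrast can be made concrete with a small simulation (my own sketch; exponential(1) draws for θ are an assumed example, so μ = σ = 1): the centered/scaled iid sum S_N keeps an O(1) spread, while the quadratic-variation sum is degenerate, its spread shrinking to zero.

```python
# CLT limit is non-degenerate; the QV sum's limit is degenerate (a constant).
import numpy as np

rng = np.random.default_rng(1)
N, reps, T = 2000, 1000, 1.0
theta = rng.exponential(1.0, size=(reps, N))       # iid draws with mu = sigma = 1
S = (theta - 1.0).sum(axis=1) / np.sqrt(N)         # CLT: spread stays O(1)
dB = rng.normal(0.0, np.sqrt(T / N), size=(reps, N))
Y = (dB**2).sum(axis=1)                            # QV sum: spread shrinks to 0
print(round(S.std(), 2), round(Y.std(), 3))
```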
B.M. Paths Do Not Have Finite Variation
Let $0 < t \le T$
$\pi_n = \{0 \le t_1 \le \ldots \le t_i \le \ldots \le t_n = t\}, \qquad \lim_{n\to\infty} \max_{i<n}(t_{i+1} - t_i) = 0$
$\mathcal{P} \equiv$ All Finite Partitions of $[0, t]$
$V_{[0,t]}(\omega) = \sup_{\pi\in\mathcal{P}} \sum_{t_i\in\pi} |B_{t_{i+1}} - B_{t_i}|$
Suppose That: $P(V_{[0,t]} < \infty) > 0$. Then:
$t = \lim_{N\to\infty} \sum_{i=1}^{N} (B_{t_i} - B_{t_{i-1}})^2 \le \lim_{N\to\infty} \Big(\sup_{t_i\in\pi_N} |B_{t_i} - B_{t_{i-1}}|\Big) \sum_{i=1}^{N} |B_{t_i} - B_{t_{i-1}}|$
$\le \lim_{N\to\infty} \Big(\sup_{t_i\in\pi_N} |B_{t_i} - B_{t_{i-1}}|\Big)\, V_{[0,t]} = 0$
The sup Tends to Zero Due to the a.s. Uniform Continuity of B.M. Paths.
But $t > 0$, so the
Probability of
Finite Variation
Must be Zero.
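A numerical sketch of the same dichotomy (mine, not the lecture's): the first-variation sum Σ|ΔB| grows like √N (each increment contributes roughly √(2t/(πN)) in expectation), while the squared sum stays near t — consistent with infinite variation but finite quadratic variation.

```python
# First variation diverges with the mesh; quadratic variation stabilizes.
import numpy as np

rng = np.random.default_rng(2)
t = 1.0
for N in (100, 10_000, 1_000_000):
    dB = rng.normal(0.0, np.sqrt(t / N), size=N)
    # First-variation sum grows ~ sqrt(2 t N / pi); QV sum stays near t.
    print(N, round(np.abs(dB).sum(), 1), round((dB**2).sum(), 3))
```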
Part II
Stochastic Integration and
Quadratic Variation of SDEs
Stochastic Integration
$P(V_{[0,t]} < \infty) = 0$
$X(\omega, t) = \int_0^t b(\omega, s)\,ds + \int_0^t \sigma(\omega, s)\,dB_s$
"Finite Variation Term" (the $ds$ integral) and "Infinite Variation Term" (the $dB$ integral)
Stochastic Integration
$P(V_{[0,t]} < \infty) = 0$
$X(\omega, t) = \int_0^t b(\omega, s)\,ds + \int_0^t \sigma(\omega, s)\,dB_s$
The "Infinite Variation Term" Complicates
Interpreting the Limit as a Lebesgue-Stieltjes Integral:
$\sum_i \sigma(X_{t_i})\,|B_{t_{i+1}} - B_{t_i}|$
Stochastic Differential Equations (SDEs)
$X(\omega, t) = \int_0^t b(\omega, s)\,ds + \int_0^t \sigma(\omega, s)\,dB_s$
Written in Abbreviated Form:
$dX(\omega, t) = b(\omega, t)\,dt + \sigma(\omega, t)\,dB_t$
Stochastic Differential Equations Driven by B.M.
$dX(\omega, t) = b(\omega, t)\,dt + \sigma(\omega, t)\,dB_t$ — Adapted and Measurable Processes
$dX_t = b_t\,dt + \sigma_t\,dB_t$
I will use this shorthand for the above (which is
itself shorthand for stochastic integrals…)
$dX_t = b_t\,dt + \sigma_t\,dB_t$ — General Ito Process
$dX_t = b(X_t, t)\,dt + \sigma(X_t, t)\,dB_t$ — Diffusion Process
$dX_t = 0\,dt + 1\,dB_t$ — Repackaging Brownian Motion
Quadratic Variation for General
SDEs (Jump Processes Readily Handled)
$RV_{\pi_N}(t) \equiv \sum_{i=1}^{N} (X_{t_i} - X_{t_{i-1}})^2$ — "Realized Variation"
Quadratic Variation: Defined if the limit taken over
any partition exists with the mesh going to zero: $:= \{V_t\}$
Stochastic Integration
$Y(\omega, t) = \int_0^t b'(\omega, s)\,ds + \int_0^t \sigma'(\omega, s)\,dX_s$
"Finite Variation Term" and "Infinite Variation Term"
Quadratic Variation of "X" Can Be Used to
Define Other SDEs (A "Good Stochastic
Integrator")
Quadratic Variation Comes Entirely
From the Stochastic Integral
$X(\omega, t) = \int_0^t b(\omega, s)\,ds + \int_0^t \sigma(\omega, s)\,dB_s$
i.e. the drift (the term in front of "ds") makes
no contribution to the quadratic variation
[quick sketch of proof on board]
Stochastic Integration
$P(V_{[0,t]} < \infty) = 0$
$X(\omega, t) = \int_0^t b(\omega, s)\,ds + \int_0^t \sigma(\omega, s)\,dB_s$
The "Infinite Variation Term" Can Now Be
Dealt With If One Uses Q.V.:
$\sum_i \sigma(X_{t_i})\,|B_{t_{i+1}} - B_{t_i}|$
Quadratic Variation and "Volatility"
$X_t = \int_0^t b_s\,ds + \int_0^t \sigma_s\,dB_s$
Typical Notation for QV:
$[X, X]_t \equiv V_t = \int_0^t \sigma_s^2\,ds$
Quadratic Variation and "Volatility"
For Diffusions:
$[X, X]_t \equiv V_t = \int_0^t \sigma_s^2\,ds$
More Generally, for Semimartingales
(Includes Jump Processes):
$[X, X]_t := X_t^2 - 2\int_0^t X_{s-}\,dX_s$
Part III
Demonstration of How Results Help
Understand "Realized Variation"
Discretization Error and the Idea Behind the "Two
Scale Realized Volatility Estimator"
Most People and Institutions Don't
Sample Functions (Discrete
Observations)
$RV_{\pi_N}(t) \equiv \sum_{i=1}^{N} (X_{t_i} - X_{t_{i-1}})^2$
Assuming the Process is a Genuine Diffusion, How
Far is the Finite-N Approximation From the Limit $[X, X]_t$?
Discretization Errors and Their
Limit Distributions
$N^{1/2}\big(RV_{\pi_N}(t) - [X, X]_t\big)\,\big|\,\sigma \;\xrightarrow{L}\; N\Big(0,\; C \int_0^t \sigma_s^4\,ds\Big)$
The Paper Below Derived Explicit Expressions for the
Variance for Fairly General SDEs Driven by B.M.;
it also goes over some nice classic moment
generating function results for a wide class of spot
volatility processes (and proves things beyond QV).
One Condition That They Assume:
$\lim_{N\to\infty} \delta^{1/2} \sum_{i=1}^{N} \big|\sigma_{t_i+\delta}^2 - \sigma_{t_i}^2\big| = 0, \qquad \delta \equiv \frac{t}{N}$
Good for Many Stochastic Volatility Models … BUT
Some Exceptions are Easy to Find (See Board)
Barndorff-Nielsen, O. & Shephard, N. (2003) Bernoulli 9 243-265.
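The √N rate is easy to verify in the simplest case (my own sketch, constant σ = 1 so X = B): then [X, X]_t = t and the limit variance is available in closed form, √N·(RV − t) → N(0, 2t²).

```python
# Check that sqrt(N) * std(RV - [X, X]_t) stabilizes near sqrt(2) * t for X = B.
import numpy as np

rng = np.random.default_rng(3)
t, reps = 1.0, 5000
for N in (50, 200, 800):
    dB = rng.normal(0.0, np.sqrt(t / N), size=(reps, N))
    err = (dB**2).sum(axis=1) - t            # RV - [X, X]_t across sample paths
    print(N, round(np.sqrt(N) * err.std(), 2))
```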
What Happens When One Has to
Deal with "Imperfect" Data?
Suppose the "Price" Process Is Not Directly
Observed, But Instead One Has:
$y_{t_i} = X_{t_i} + \epsilon_i$
This is Due to the Fact that Real World Data Does
Not Adhere to the Strict Mathematical
Definition of a Diffusion.
If the "observation noise" sequence is i.i.d. (and
independent of X), it is easy to see the problem this
poses for estimating quadratic variation (see
demonstration on board). Surprisingly, These Stringent
Assumptions Model Many Real World Data Sets (e.g. See
Results from Lecture 1).
Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.
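The board demonstration can be sketched numerically (my own illustration): with iid observation noise, the realized variation of y = X + ε picks up a 2N·var(ε) term, so as the sampling frequency grows the naive estimator measures noise, not [X, X]_T.

```python
# Naive RV of noisy observations diverges with N instead of converging to [X, X]_T.
import numpy as np

rng = np.random.default_rng(4)
T, noise_sd = 1.0, 0.01
for N in (100, 10_000, 1_000_000):
    X = np.cumsum(np.r_[0.0, rng.normal(0.0, np.sqrt(T / N), N)])  # latent B.M.
    y = X + rng.normal(0.0, noise_sd, N + 1)                       # observed prices
    rv = np.sum(np.diff(y)**2)
    print(N, round(rv, 2), "| 2*N*var(eps) =", round(2 * N * noise_sd**2, 2))
```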
Simple Case "Bias" Estimate
Compute the Realized Variation of Y and Divide by 2N.
Then Obtain a (Biased) Estimate by Subsampling.
Finally, Aggregate the Subsample Estimates and
Remove the Bias.
Simple Case "Bias Corrected" Estimate
$\widehat{[X, X]}_T = [Y, Y]_T^{(avg)} - \frac{N_s}{N}\,[Y, Y]_T^{(all)}$
Result Based on "Subsampling"
Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.
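A hedged sketch of the two-scale idea (my notation and helper `tsrv`, not the authors' code; the average subsample size `n_bar` follows the usual subsampling convention): average the realized variations computed on K offset sparse grids, then subtract the noise bias estimated from the all-data RV.

```python
# Two-scale realized variance: subsampled average minus a bias correction.
import numpy as np

def tsrv(y, K):
    """Two-scale realized variance of observations y using K subgrids."""
    n = len(y) - 1
    rv_all = np.sum(np.diff(y)**2)                     # [Y, Y]^(all)
    rv_sub = [np.sum(np.diff(y[k::K])**2) for k in range(K)]
    rv_avg = np.mean(rv_sub)                           # [Y, Y]^(avg)
    n_bar = (n - K + 1) / K                            # average subsample size
    return rv_avg - (n_bar / n) * rv_all

rng = np.random.default_rng(5)
n, T, noise_sd = 23_400, 1.0, 0.005
X = np.cumsum(np.r_[0.0, rng.normal(0.0, np.sqrt(T / n), n)])
y = X + rng.normal(0.0, noise_sd, n + 1)
print(round(np.sum(np.diff(y)**2), 2))   # naive RV: inflated by noise
print(round(tsrv(y, K=300), 2))          # much closer to [X, X]_T = 1
```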
Chasing A (Time Series) Dream
Obtaining a "Consistent" Estimate with Asymptotic Error Bounds:
$N^{1/6}\big(\widehat{[X, X]}_T - [X, X]_T\big)\,\big|\,\sigma \;\xrightarrow{L}\; N\Big(0,\; C_1\,(\mathrm{VAR}(\epsilon))^2 + C_2 \int_0^T \sigma_s^4\,ds\Big)$
The Result Above is Valid for Simple Observation Noise, but
the Results (by the Authors Below) Have Been Extended to
Situations More General Than the iid Case.
Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.
DNA Melting
[Figure: force applied across the two DNA strands]
Tensions are Important in Fundamental Life Processes Such As
DNA Repair and DNA Transcription.
Single-Molecule Experiments are Not Just a Neat Toy:
They Have Provided Insights Bulk Methods Cannot.
Time Dependent Diffusions
Stochastic Differential Equation (SDE):
$dz_t = b(z_t, t; \Theta)\,dt + \sigma(z_t; \Theta)\,dW_t$
$y_{t_i} = z_{t_i} + \epsilon_{t_i}$ — "Measurement" Noise
For Frequently Sampled Single-Molecule Experimental
Data, "Physically Uninteresting Measurement Noise" Can
Be a Large Component of the Signal.
Find/Approximate: $p(z_t | y_s; \Theta)$ — the "Transition Density" /
Conditional Probability Density
A Word from the Wise (Lucien Le Cam)
• Basic Principle 0: Don’t trust any principle
• Principle 1: Have clear in your mind what it is you want
to estimate
…
• Principle 4: If satisfied that everything is in order, try first
a crude but reliable procedure to locate the general area
in which your parameters lie.
• Principle 5: Having localized yourself by (4), refine the
estimate using some of your theoretical assumptions,
being careful all the while not to undo what you did in (4).
…
Le Cam, L. (1990) International Statistical Review 58 153-171.
Example of the Relevance of Going Beyond the iid Setting
Watching the Process at Too
Fine a Resolution Allows
"Messy" Details to Be
Statistically Detectable
Calderon, J. Chem. Phys. (2007).
Example of the Relevance of Going Beyond the iid Setting
$r \equiv X_n - E_{P_{\theta_i}}[X_n | X_{n-1}]$, i.e. $X_{t+dt} - E[X_{t+dt} | X_t]$
Calderon, J. Chem. Phys. (2007).
Lecture IV: A Selected Review of Some Recent
Goodness-of-Fit Tests Applicable to Levy Processes
Christopher P. Calderon
Rice University / Numerica Corporation
Research Scientist
Outline
I Compare iid Goodness of Fit Problem to Time
Series Case
II Review a Classic Result Often Attributed to
Rosenblatt (The Probability Integral Transform)
III Highlight Hong and Li’s “Omnibus” Test
IV Demonstrate Utility and Discuss Some
Other Recent Approaches and Open Problems
Outline
Primary References (Items II & III):
Diebold, F., Hahn, J. & Tay, A. (1999) Review of
Economics and Statistics 81 661-663.
Hong, Y. and Li, H. (2005) Review of Financial Studies 18
37-84.
Goodness-of-Fit Issues
[Figure: true and parametric density p(x) vs. x; the two coincide]
Goodness-of-Fit in Random Variable
Case (No Time Evolution)
[Figure: density p(x) and the corresponding CDF P(x) vs. x]
Goodness-of-Fit Issues
[Figure: two densities p(x); the second is shifted and skewed relative to the first]
Goodness-of-Fit in Random Variable
Case (No Time Evolution)
[Figure: the two densities p(x) and their CDFs P(x)]
Goodness-of-Fit in Time Series
[Figure: evolving distributions P1, F1, P2, F2 (CDFs vs. x)]
Note:
The truth and the approximate
distribution change with the
(known) time index.
Time series only get "one"
sample from each
evolving distribution.
Time Series: Residual and Extensions
$r_i \equiv X_i - E_{\hat\theta}[X_i | X_{i-1}]$
The Problem with Using Low Order
Correlation was Shown In Lecture 2.
Can we Make Use of All Moments?
YES. With the Probability Integral
Transform (PIT):
$Z_i = \int_{-\infty}^{X_i} p(X | X_{i-1}; \theta_0)\,dX$
Also Called Generalized Residuals
Test the Model Against Observed Data:
$\{X_1, X_2, \ldots, X_N\}$ — a Time Series (Noisy, Correlated,
and Possibly Nonstationary)
$Z_i = G(X_i; X_{i-1}, \hat\theta)$
Probability Integral Transform / Rosenblatt
Transform / "Generalized Residual"
If the Assumed Model is Correct, the Residuals are
i.i.d. with Known Distribution: $Z_i \sim U[0, 1]$
Formulate a Test Statistic: $Q = H(\{Z_1, Z_2, \ldots, Z_T\})$
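A minimal PIT sketch (my own toy example, not from the lecture): for an AR(1) model with Gaussian transitions — the exact discretization of an OU process — the generalized residuals under the true model should look like iid U[0, 1] draws.

```python
# Generalized residuals for a correctly specified Gaussian AR(1) model:
# Z_i = P(X <= X_i | X_{i-1}) should be iid U[0, 1].
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
phi, sd, n = 0.8, 0.5, 5000
x = np.zeros(n)
for i in range(1, n):                  # X_i | X_{i-1} ~ N(phi * X_{i-1}, sd^2)
    x[i] = phi * x[i - 1] + rng.normal(0.0, sd)

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# PIT under the assumed transition density.
Z = np.array([norm_cdf((x[i] - phi * x[i - 1]) / sd) for i in range(1, n)])
print(round(Z.mean(), 2), round(Z.var(), 3))   # U[0,1]: mean 1/2, variance 1/12
```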
Simple Illustration of PIT
[Specialized to Markov Dynamics]
Start with time series random variables characterized by: $X_n \sim P(X_n | X_{n-1})$
Introduce a (strictly increasing) transformation: $Z_n = h(X_n; \theta)$
The transformation introduces a "new" R.V. => New distribution: $Z_n \sim F(Z_n | Z_{n-1}; \theta)$
$p(X_n | X_{n-1}) \equiv \frac{dP(X_n | X_{n-1})}{dX_n}$ — "Truth" density in "X" (NOTE: no parameter)
PIT transformation: $Z_n \equiv h(X_n) := \int_{-\infty}^{X_n} f(X' | X_{n-1}; \theta)\,dX'$
We want an expression for the "new" density in terms of "Z" ($h$ is a Monotone Function of X):
$\frac{dF(Z_n | Z_{n-1}; \theta)}{dZ_n} = \frac{dP(X_n | X_{n-1})}{dX_n}\,\frac{dX_n}{dZ_n} = \frac{dP(X_n | X_{n-1})}{dX_n}\,\frac{1}{dZ_n/dX_n} = p(X_n | X_{n-1})\,\frac{1}{f(X_n | X_{n-1}; \theta)} \overset{?}{=} 1$
$\hat{Q}(j) \equiv \big( h(N - j)\hat{M}_1(j) - hA_0 \big) \big/ V_0^{1/2}$
($hA_0$: Location, Depends on Bandwidth; $V_0^{1/2}$: Scale)
$\hat{M}_1(j) \equiv \int_0^1\!\!\int_0^1 \big( \hat{g}_j(z_1, z_2) - 1 \big)^2\,dz_1\,dz_2.$
$\hat{g}_j(z_1, z_2) \equiv \frac{1}{N - j} \sum_{\tau = j+1}^{N} K_h(z_1, \hat{Z}_\tau)\,K_h(z_2, \hat{Z}_{\tau - j}).$
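A simplified stand-in for these ingredients (my own sketch: a 2-D histogram replaces the boundary-modified kernel K_h, and `m1_hist` is a hypothetical helper): estimate the lag-j joint density of (Z_t, Z_{t-j}) on [0,1]² and measure its squared deviation from 1, which is what M̂₁(j) integrates. Under the null the Z's are iid U[0,1] and the deviation is small.

```python
# Histogram-based approximation to the integral inside M̂_1(j).
import numpy as np

def m1_hist(Z, j, bins=10):
    pairs = np.stack([Z[j:], Z[:-j]], axis=1)
    H, _, _ = np.histogram2d(pairs[:, 0], pairs[:, 1],
                             bins=bins, range=[[0, 1], [0, 1]], density=True)
    cell = (1.0 / bins) ** 2
    return np.sum((H - 1.0) ** 2) * cell     # ~ integral of (g_j - 1)^2

rng = np.random.default_rng(7)
Z_null = rng.uniform(size=20_000)            # iid U[0,1]: small deviation
Z_dep = (Z_null[:-1] + Z_null[1:]) / 2       # serially dependent: large deviation
print(round(m1_hist(Z_null, j=1), 3), round(m1_hist(Z_dep, j=1), 3))
```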
Stationary Time Series [No Time Dependent Forcing]
[Figure: y vs. Time [ns] over 0–1 ns]
Stationary Time Series?
[Figure: y vs. Time [ns] over 0–1 ns]
Dynamic Rules "Changing" Over Time
(Simple 1D SDE Parameters "Evolving")
[Figure: y vs. Time [ns] over 0–1 ns]
“Subtle” Coupling
Calderon, J. Phys. Chem. B (2010).
Estimate a "Simple" SDE in a Local
Time Window. Then Apply
Hypothesis Tests To Quantify the
Time of Validity
$d\Phi_t = (A + B\Phi_t)\,dt + (C + D\Phi_t)\,dW_t$
[Figure: y vs. Time [ns]]
Estimate the Model in Window "0"; Keep it
Fixed and Apply it to the Other Windows
[Figure: y vs. Time [ns], with Windows 0, 2, 6, and 8 marked]
Each Test Statistic Uses ONE Short Path
Calderon, J. Phys. Chem. B (2010).
Stationary Time Series?
Subtle Noise Magnitude Changes due to Unresolved
Degrees of Freedom (Check the Validity of a "Born-Oppenheimer" Type Proxy)
$d\Phi_t = (A + B\Phi_t)\,dt + (C + D\Phi_t)\,dW_t$
The Evolving "X" Influences Higher Order Moments of "Y"
Estimate a "Simple" SDE in a Local
Time Window. Then Apply
Hypothesis Tests To Quantify the
Time of Validity
Window 0 | Window 2 | Window 6 | Window 8
"Subtle" Noise Magnitude Changes
Calderon, J. Phys. Chem. B (2010).
Frequentist vs. Bayesian Approaches
• Concept of “Residual” Unnatural in Bayesian Setting.
All Information Condensed into Posterior (Time Residual
Can Help in Non-ergodic Settings)
• Uncertainty Information Available in Both Under Ideal
Models with Heavy Computation (Bayesian Methods Need
“Prior” Specification)
• Test Against Many Alternatives in a Frequentist
Setting (Instead of a Handful of Candidate Models as
is Done in Bayesian Methods)
Several Issues
Evaluate Z at Estimated Parameter (Their Limit
Distribution Requires Root N Consistent Estimator to Get
[Asymptotic ] Critical Values)
Asymptopia is a Strange Place [Multiplying by Infinity Can
Be Dangerous When Numerical Proxies are Involved]
In Nonstationary Case, Initial Condition Distribution May
Be Important but Often Unknown.
Testing Surrogate Models
Hong & Li, Rev. Financial Studies, 18 (2005). Chen, S., Leung, D. & Qin, J., Annals of Statistics, 36
(2008). Aït-Sahalia, Fan, Peng, JASA (in press). Calderon & Arora, J. Chemical Theory &
Computation 5 (2009).
[Figure (a): empirical CDFs F(Q) of the test statistic Q for OD and OU models at Δt = 0.15, 0.30, 0.45 ps, plotted against the Null distribution]
One Time Series Reduces to One Test Statistic. The Empirical
CDF Summarizes Results from 100 Time Series.
A Sketch of Local to Global,
or
Fitting Nonlinear SDEs Without A
Priori Knowledge of a Global
Parametric Model
Local Time Dependent Diffusion Models
Calderon, J. Chem. Phys. 126 (2007).
$dz_t = b(z_t, t; \Theta)\,dt + \sigma(z_t; \Theta)\,dW_t$
Approximate Locally by Low Order Polynomials, e.g.
$b(z) \approx A + Bz + f(t), \qquad \sigma(z) \approx C + Dz$
Can use Physics-based Proxies, e.g. Overdamped Langevin Dynamics:
$\sigma(z) \approx C + Dz, \qquad b(z) \approx (C + Dz)^2/(2 k_B T)\,(A + Bz + f(t))$
Maximum Likelihood
For a given model & discrete observations,
find the maximum likelihood estimate:
$\hat{\theta} \equiv \arg\max_\theta\, p(z_0, \ldots, z_T; \theta)$
Special case of Markovian Dynamics:
$\hat{\theta} \equiv \arg\max_\theta\, p(z_0; \theta)\,p(z_1 | z_0; \theta)\cdots p(z_T | z_{T-1}; \theta)$
"Transition Density" (aka Conditional Probability Density)
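The Markov factorization can be sketched on the simplest tractable case (my own example, not the lecture's code): a discretely observed OU model dz = −θz dt + σ dW has a Gaussian transition density z_{i+1}|z_i ~ N(m z_i, v) with m = e^{−θΔt} and v = σ²(1 − m²)/(2θ), so the conditional MLE reduces to an AR(1) regression with closed-form estimates.

```python
# Exact conditional MLE for a discretely observed OU process.
import numpy as np

rng = np.random.default_rng(8)
theta, sigma, dt, n = 2.0, 0.7, 0.01, 50_000
m = np.exp(-theta * dt)
v = sigma**2 * (1 - m**2) / (2 * theta)
z = np.zeros(n)
for i in range(1, n):                      # simulate the exact discretization
    z[i] = m * z[i - 1] + np.sqrt(v) * rng.normal()

# Conditional MLE (the stationary p(z_0) factor is ignored):
m_hat = np.dot(z[1:], z[:-1]) / np.dot(z[:-1], z[:-1])
v_hat = np.mean((z[1:] - m_hat * z[:-1])**2)
theta_hat = -np.log(m_hat) / dt
sigma_hat = np.sqrt(2 * theta_hat * v_hat / (1 - m_hat**2))
print(round(theta_hat, 1), round(sigma_hat, 2))
```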
After Estimating Acceptable (Not Rejected)
Local Models, What’s Next?
• Quality of Data: “Variance” (MC
Simulation in Local Windows)
• Then Global Fit (Nonlinear Regression)
Note: the Noise Magnitude Depends on the State,
And the State is Evolving in a Non-Stationary Fashion
[Figure: Z (State Space) vs. Time]
State Dependent Noise Would "Fool" Wavelet Type Methods
$dZ_t = k(v_{pull}\,t - Z_t)\,dt + \sigma(Z_t)\,dW_t$
Note: the Noise Magnitude Depends on the State,
And the State is Evolving in a Non-Stationary Fashion
[Figure: Z (State Space) vs. Time (left); σ(Z) vs. Z (State Space) (right)]
Local Time Window and Corresponding State Window
[Figure: Z (State Space) vs. Time with a local time window (left); σ(Z) vs. Z with the corresponding state window (right)]
Global Nonlinear Function of Interest (the "truth")
[Figure: Z (State Space) vs. Time (left); σ(Z) vs. Z (right)]
Point Estimates Come From a Local Parametric Model
(Parametric Likelihood Inference Tools are Available)
[Figure: Z (State Space) vs. Time (left); σ(Z) vs. Z (right)]
$\sigma(z) \approx C + D(z - z^o)$
[Figure: Z (State Space) vs. Time with a local window (left); σ(Z) vs. Z with the local estimate $\hat{C}$ marked (right)]
Noisy Point Estimates (finite discrete time series sample
uncertainty)
$\sigma(z) \approx C + D(z - z^o)$
[Figure: Z (State Space) vs. Time with a window at $Z^o$ (left); σ(Z) vs. Z with noisy estimates $\hat{C}$ (center); ∂σ(Z)/∂Z, the Spatial Derivative of the Function of Interest, vs. Z with noisy estimates $\hat{D}$ (right)]
The Idea is Similar in Spirit to J. Fan, Y. Fan, J. Jiang, JASA 102
(2007), Except it Uses Fully Parametric MLE (No Local Kernels)
in Discrete Local State Windows
From Path to Point
[Figure: Z (State Space) vs. Time (left); σ(Z) vs. Z with the local estimate $\hat{C}$ (right)]
HOWEVER: We are Not Willing To Assume Stationarity in Windows
(Completely Nonparametric and Fourier Analysis are
Problematic); "Subject Specific Function Variability" is of Interest
[Figure: Z (State Space) vs. Time (left); σ(Z) vs. Z (right)]
The Validity of Local Models Can Be Tested Using the
Generalized Residuals Available from the Transition
Density with the Local Parametric Approach
Function of Interest: Noisy Point
Estimates
(Obtained Using Local
Maximum Likelihood)
[Figure: transform "X vs t" data into "X vs Y=F(X)" regression data: σ(Z) vs. Z (State Space) point estimates taken from windows along the path]
Estimate the Nonlinear Function via Regression
Spatial Derivative of the Function of Interest: Noisy Point
Estimates
(Obtained Using Local
Maximum Likelihood)
[Figure: transform "X vs t" data into "X vs Y′=G(X)" regression data: ∂σ(Z)/∂Z vs. Z point estimates]
Estimate the Nonlinear Function via Regression
[Figure: σ(Z) vs. Z (State Space), with windowed point estimates (1), (2), (3) taken along the path]
Use the Noisy Diffusive Path to Infer Regression Data:
$\{z_m, \hat{C}, \hat{D}\}_{m=1}^{M}$
Same Ideas Apply to the Drift Function:
$\{z_m, \hat{A}, \hat{B}\}_{m=1}^{M}$
Can Also Entertain Simultaneous Regression:
$\{z_m, \hat{A}, \hat{B}, \hat{C}, \hat{D}\}_{m=1}^{M}$
Spline Function Estimate of the Function of Interest
[Figure: σ(Z) vs. Z with a spline fit through the regression data $\{z_m, \sigma(z_m), \frac{d\sigma(z)}{dz}\big|_{z_m}\}_{m=1}^{M}$]
The Variability Between
Different Functions
Gives Information About
Unresolved Degrees of
Freedom. A Reliable
Means for Patching the
Windowed Estimates
Together is Desirable.
Open Mathematical Questions for
Local SDE Inference
How Can One Efficiently Choose "Optimal" Local
Window Sizes?
Hypothesis Tests Checking the Markovian Assumption with
Non-Stationary Data? (The "Omnibus" goodness-of-fit tests
of Hong and Li [2005] based on the probability integral
transform are currently employed.)
Gramicidin A: Commonly Studied Benchmark
36,727 Atom Molecular Dynamics Simulation
Potassium
Ion
Explicitly
Modeled
Lipid
Molecules
Explicitly
Modeled
Water
Molecules
Gramicidin A
Protein
Compute Work: the Integral of "Force times Distance"
PMF and Non-equilibrium Work
[Figure: W [kT] vs. τ [ns]: Molecular Dynamics Trajectory "i" and SDE Trajectory "Tube i"]
Nonergodic Sampling: Pulling "Too Fast"
Variability Induced by the
Conformational Initial
Condition (variability between curves)
and
"Thermal" Noise (Solvent
Bombardment, Vibrational Motion, etc.,
quantified by tube width)
PMF and Non-equilibrium Work
[Figure: W [kT] vs. τ [ns]]
The Distribution at the
Final Time is
non-Gaussian
and can be
viewed as a
MIXTURE of
distributions
Importance of Tube Variability
Mixture of Distributions:
Conformational
Noise
and
Thermal
Noise
Other Kinetic Issues
The Diffusion Coefficient / Effective Friction is State Dependent
NOTE: the Noise
Intensity
Differs in the
Two Sets of
Trajectories
PMF and Non-equilibrium Work
[Figure: W [kT] vs. λ [Å]: Molecular Dynamics Trajectory "Tube i" (each tube starts with the same q's but different p's), Molecular Dynamics Trajectory "j", and SDE Trajectory "Tube j"]
Pathwise View
Use one path, discretely sampled at many time points, to obtain a
collection of θ̂'s (one for each path). Ensemble Averaging is
Problematic Due to Nonergodic / Nonstationary Sampling.
[Figure: Z_t vs. t with windowed estimates θ̂1, …, θ̂5; the vertical lines represent observation times]
A Collection of Models: Another Way of Accounting for "Memory"
Potential of Mean Force Surface
Hummer & Szabo, PNAS 98 (2001).
Minh & Adib, PRL 100 (2008).
Several issues arise when one attempts
to use this surface to approximate
dynamics in complex systems.
PMF of a "Good" Coordinate
Calderon, Janosi & Kosztin, J. Chem. Phys. 130 (2009).
[Figure: U [kT] vs. z [Å]: PMF and Confidence Band Computed Using 10 Stretching & Relaxing Nonequilibrium all-atom MD Paths (SPA; FR with SPA Data), compared against PMFs Computed Using Umbrella Sampling (Allen et al., Ref. 27; Bastug et al., Ref. 28; FR, Ref. 16)]
Penalized Splines
(See Ruppert, Wand, Carroll's Text: Semiparametric Regression, Cambridge
University Press, 2003.)
$y = \{f(x_1), \ldots, f(x_m), \partial f(x_1), \ldots, \partial f(x_m)\} + \epsilon$ — Observed or Inferred Data plus Observation Error
$y_i = \eta_0 + \eta_1 x_i + \ldots + \eta_p x_i^p + \sum_{j=1}^{K} \zeta_j B_j(x_i)$
Spline Basis with K << m (tall and skinny design matrix)
"Fixed Effects" $\eta$, "Random Effects" $\zeta$, where $\beta \equiv (\eta, \zeta)$
P-Spline Problem (Flexible Penalty):
$\|y - C\beta\|_2^2 + \alpha\|D\beta\|_2^2$
Penalized Splines Using Derivative
Information
$C^{PuDI} := \begin{pmatrix} 1 & x_1 & \cdots & x_1^p & (\kappa_1 - x_1)_+^p & \cdots & (\kappa_K - x_1)_+^p \\ \vdots & & & \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^p & (\kappa_1 - x_m)_+^p & \cdots & (\kappa_K - x_m)_+^p \\ 0 & 1 & \cdots & p x_1^{p-1} & p(\kappa_1 - x_1)_+^{p-1} & \cdots & p(\kappa_K - x_1)_+^{p-1} \\ \vdots & & & \vdots & \vdots & & \vdots \\ 0 & 1 & \cdots & p x_m^{p-1} & p(\kappa_1 - x_m)_+^{p-1} & \cdots & p(\kappa_K - x_m)_+^{p-1} \end{pmatrix}$
A Poorly Conditioned Design Matrix
Pick Smoothness: Solve Many
Least Squares Problems
$\|y - C\beta\|_2^2 + \alpha\|D\beta\|_2^2$ — P-Spline Problem (Flexible Penalty)
Choose $\alpha$ with a Cost Function (GCV). This object has a
nice "mixed model" interpretation as the ratio of observation
variance / random effects variance.
$\alpha$ Selection Requires Traces and Residuals for
Each Candidate.
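The selection loop can be sketched as follows (my own illustration with synthetic data; GCV(α) = m·RSS / (m − tr S_α)², where S_α is the "hat" matrix of the penalized fit — exactly the traces and residuals mentioned above):

```python
# GCV-based choice of the smoothness parameter alpha for a p-spline.
import numpy as np

rng = np.random.default_rng(10)
m, K = 200, 20
x = np.sort(rng.uniform(0, 1, m))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, m)
knots = np.linspace(0, 1, K + 2)[1:-1]
C = np.column_stack([np.ones(m), x] + [np.clip(x - k, 0, None) for k in knots])
P = np.diag([0.0, 0.0] + [1.0] * K)      # D^T D for D = diag(0, 0, 1, ..., 1)

def gcv(alpha):
    S = C @ np.linalg.solve(C.T @ C + alpha * P, C.T)   # hat matrix S_alpha
    resid = y - S @ y
    return m * np.sum(resid**2) / (m - np.trace(S))**2

alphas = 10.0 ** np.arange(-6, 4)
best = min(alphas, key=gcv)
print(best)
```

(A production implementation would reuse factorizations across the α grid instead of refactoring each time, which is precisely the motivation for the PSQR approach below.)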
PSQR: Penalized Splines using QR
(Calderon, Martinez, Carroll, Sorensen)
Allows Fast and Numerically Stable P-spline Results.
An Exactly Rank Deficient "C" Can Be Treated.
Can Process Many Batches of Curves (Facilitates
Solving Many GCV Type Optimization Problems):
Data Mining without User Intervention
(e.g. Closely Spaced and/or Overlapping Knots Are Not
Problematic).
Let Regularization Be Handled By the Built-In Smoothness
Penalty (Do Not Introduce Extra Numerical
Regularization Steps).
Demmler-Reinsch Basis Approach
(A Popular Method For Efficiently Computing Smoothing Splines)
Forms an Eigenbasis using the Cholesky Factor of $C^T C$.
Traces and Residuals for Each Candidate $\alpha$
Can Easily be Obtained in Vectorized Fashion.
Squaring a Matrix is Not a Good Idea (Subtle and
Dramatic Consequences).
Avoid ad hoc Regularization
Analogous Existing Methods Do Not
Use Good Numerics:
e.g. if $C^T C$ Cannot be Cholesky
Factored, the ad hoc solution is to
Factor $C^T C + \eta D$
Instead and Solve That, Rather Than the Penalized
Regression Problem Originally Posed.
Others use SVD truncation in
Combination with Regularization (Again,
Not the Problem Posed).
Basic Sketch of Our Approach
(1) Factor C via QR (Done Only Once)
(2) Then Do an SVD on the Penalized Columns
(3) For a Given $\alpha$, Find the (Lower Dimensional) QR
and Exploit the Banded Matrix Structure: Solve the Original
Penalized Least Squares Problem with a CHEAP QR of the
Banded "R"-Like Matrix $\begin{pmatrix} S \\ \sqrt{\alpha}\,I \end{pmatrix}$
(4) Repeat (3) and Choose $\alpha$
QR and Least Squares
$\left\| \begin{pmatrix} C \\ \sqrt{\alpha}\,D \end{pmatrix} \beta - \begin{pmatrix} y \\ 0 \end{pmatrix} \right\|_2^2$
Efficient QR Treatment?
Exploit the P-Spline Problem Structure.
The Minimizing Vector is Equivalent to the Solution of
$(C^T C + \alpha D^T D)\beta = C^T y$, i.e. $A^T A\beta = A^T (y^T, 0)^T$ with $A \equiv \begin{pmatrix} C \\ \sqrt{\alpha}\,D \end{pmatrix}$
Some in Statistics use QR,
but Combine Penalized
Regression with SVD
Truncation.
PSQR (Factor Steps)
1. Obtain the QR decomposition of $C = QR$.
2. Partition the result above as: $QR = (Q^F, Q^P) \begin{pmatrix} R_{11}^F & R_{12} \\ 0 & R_{22}^P \end{pmatrix}$.
3. Obtain the SVD of $R_{22}^P = U S V^T$.
4. Form the following:
$\tilde{Q} = (Q^F, Q^P U), \qquad \tilde{V} = \begin{pmatrix} I & 0 \\ 0 & V^T \end{pmatrix},$
$\tilde{R} = \begin{pmatrix} \tilde{R}_{11}^F & \tilde{R}_{12} \\ 0 & \tilde{R}_{22} \end{pmatrix} \equiv \begin{pmatrix} R_{11} & R_{12} V \\ 0 & S \end{pmatrix},$
$b = \begin{pmatrix} \tilde{Q}^T y \\ 0 \end{pmatrix} \equiv \begin{pmatrix} b^F \\ b^P \\ 0 \end{pmatrix}.$
PSQR (Solution Steps)
5. For each given $\alpha$ (and/or $D^P$) form: $\tilde{D}_\alpha = \sqrt{\alpha}\,D^P V$ and $\tilde{W}_\alpha = \begin{pmatrix} S \\ \tilde{D}_\alpha \end{pmatrix}$.
6. Obtain the QR decomposition $\tilde{W}_\alpha = Q' R'$.
7. Form $c = (R')^{-1}(Q')^T \begin{pmatrix} b^P \\ 0 \end{pmatrix}$.
8. Solve $\hat{\beta}_\alpha = \begin{pmatrix} (\tilde{R}_{11}^F)^{-1}(b^F - \tilde{R}_{12}\,c) \\ V c \end{pmatrix}$.
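The eight steps can be checked numerically (my own sketch under the stated special case D = diag(0, …, 0, 1, …, 1), so D^P = I; variable names are mine): the PSQR solution should agree with a direct solve of the normal equations.

```python
# PSQR factor/solution steps vs. direct solve of (C^T C + a D^T D) b = C^T y.
import numpy as np

rng = np.random.default_rng(11)
n, F, P, alpha = 300, 4, 25, 0.3
C = rng.normal(size=(n, F + P))
y = rng.normal(size=n)

# Factor steps (done once, independent of alpha):
Q, R = np.linalg.qr(C)                                 # 1. C = QR
R11, R12, R22 = R[:F, :F], R[:F, F:], R[F:, F:]        # 2. partition
U, S, Vt = np.linalg.svd(R22)                          # 3. R22 = U S V^T
Qt = np.hstack([Q[:, :F], Q[:, F:] @ U])               # 4. Q-tilde
b = Qt.T @ y
bF, bP = b[:F], b[F:]
R12t = R12 @ Vt.T                                      #    R~12 = R12 V

# Solution steps (cheap, repeated per alpha):
W = np.vstack([np.diag(S), np.sqrt(alpha) * Vt.T])     # 5. (S; sqrt(a) V)
Qp, Rp = np.linalg.qr(W)                               # 6.
c = np.linalg.solve(Rp, Qp.T @ np.r_[bP, np.zeros(P)]) # 7.
beta = np.r_[np.linalg.solve(R11, bF - R12t @ c), Vt.T @ c]  # 8.

D = np.diag([0.0] * F + [1.0] * P)
beta_direct = np.linalg.solve(C.T @ C + alpha * D.T @ D, C.T @ y)
print(np.allclose(beta, beta_direct))
```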
PSQR (Efficiency)
Since $R_{22}^P = U S V^T$ and $\tilde{R}_{22} = S$:
$\begin{pmatrix} S V^T \\ D^P \end{pmatrix} = \begin{pmatrix} S \\ D^P V \end{pmatrix} V^T,$
So if $D = \mathrm{diag}(0, \ldots, 0, 1, \ldots, 1)$, i.e. $D^P = \mathrm{diag}(1, \ldots, 1)$,
Then the problem reduces to finding the x minimizing
$\left\| \begin{pmatrix} S \\ \sqrt{\alpha}\,V \end{pmatrix} x - \begin{pmatrix} b^P \\ 0 \end{pmatrix} \right\|_2^2 = \left\| \begin{pmatrix} I & 0 \\ 0 & V \end{pmatrix} \begin{pmatrix} S \\ \sqrt{\alpha}\,I \end{pmatrix} x - \begin{pmatrix} b^P \\ 0 \end{pmatrix} \right\|_2^2.$
(An Orthogonal Matrix Times a Matrix Close to "R")
PSQR and Givens Rotations
$D^P = \mathrm{diag}(1, \ldots, 1)$
$\left\| \begin{pmatrix} S \\ \sqrt{\alpha}\,V \end{pmatrix} x - \begin{pmatrix} b^P \\ 0 \end{pmatrix} \right\|_2^2 = \left\| \begin{pmatrix} I & 0 \\ 0 & V \end{pmatrix} \begin{pmatrix} S \\ \sqrt{\alpha}\,I \end{pmatrix} x - \begin{pmatrix} b^P \\ 0 \end{pmatrix} \right\|_2^2$
(An Orthogonal Matrix Times a Matrix Close to "R")
Use Givens Rotations to Finalize the QR;
Applying the Rotations to the RHS Yields:
$(\sqrt{\Lambda})^{-1} S\, b^P, \qquad \text{where } R = \sqrt{\Lambda}$
Subtle Errors
PuDI Design Matrix (One Curve at a Time)
[Figure: Sparsity Pattern of the TPF Basis; QR Needed in Step 1 (Before the Penalty is Added)]
TPF: Free and Penalized Blocks
$C^{PuDI} := \begin{pmatrix} 1 & x_1 & \cdots & x_1^p & (\kappa_1 - x_1)_+^p & \cdots & (\kappa_K - x_1)_+^p \\ \vdots & & & \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^p & (\kappa_1 - x_m)_+^p & \cdots & (\kappa_K - x_m)_+^p \\ 0 & 1 & \cdots & p x_1^{p-1} & p(\kappa_1 - x_1)_+^{p-1} & \cdots & p(\kappa_K - x_1)_+^{p-1} \\ \vdots & & & \vdots & \vdots & & \vdots \\ 0 & 1 & \cdots & p x_m^{p-1} & p(\kappa_1 - x_m)_+^{p-1} & \cdots & p(\kappa_K - x_m)_+^{p-1} \end{pmatrix}$
The Last K Columns form the Penalized Block.
PuDI Design Matrix (Batches of Curves)
[Figure: Sparsity Pattern of the TPF Basis; QR Needed in Step 1 (Before the Penalty is Added)]
Screened Spectrum
The Simple Basis Readily Allows
For Useful Extensions
(e.g. Generalized Least Squares)
Function of Interest (the "truth") and
Noisy Point Estimates (finite discrete time series sample uncertainty)
[Figure: Z (State Space) vs. Time (left); σ(Z) vs. Z with noise $\epsilon_i$ (center); ∂σ(Z)/∂Z, the Spatial Derivative of the Function of Interest, vs. Z with noise $\epsilon_i$ (right)]
The Point Estimates' Noise Distribution Depends on the Window and the
Function Estimated (Quantifying and Balancing These Can Be
Important), Especially for Resolving "Spiky" Features.
[Figure: same panels as above]