Solving large structured non-linear least-squares
problems, with an application in earth observation
Peter Harris, Arta Dilo, Maurice Cox, Emma Woolliams, Jonathan Mittaz
and Sam Hunt
National Physical Laboratory, UK
Workshop on Mathematics for Measurement, ICMS, 30 Jan – 2 Feb 2017
Welcome to the National Physical Laboratory
Outline
• Measurement problem
– ‘Harmonisation’ of sensors on-board satellites in orbit
• Mathematical problem
– ‘Errors-in-variables’ regression problem
– Very large numbers of measured data and parameters
– Respect data standard uncertainties and covariances
– Require uncertainties for small subset of parameters
• Solution approach under development
2
Measurement problem
• Re-calibrate sensors on-board satellites in orbit
• Use data from ‘match-ups’ – pairs of sensors observe the
same part of the Earth at the same time
• Essential to obtaining records of climate variables that are
– consistent over long periods of time
– supported by valid statements of uncertainty
[Figure: spectral radiance/(W sr−1 m−2 Hz−1) plotted against time/a for a series of sensors, indicating the ‘reference’ sensor and the ‘match-ups’ between overlapping sensors]
3
Measurement model
• Model for spectral radiance measured by each sensor
𝑅 = 𝑓(𝒂; 𝑥, 𝑦, … )
• Model for adjustment between pairs of sensors
$K = h\bigl(f(\boldsymbol a_s;\ x_s, y_s, \ldots)\bigr) - h\bigl(f(\boldsymbol a_t;\ x_t, y_t, \ldots)\bigr)$
(with $h(R_\mathrm{ref})$ in place of the second term when the match-up is against the reference sensor)
where
$\boldsymbol a$: (unknown) sensor calibration parameters
$x, y, \ldots$: stimulus variables (earth counts, temperatures, …)
$R_\mathrm{ref}$: radiances from the single reference sensor
$K$: adjustment factor
$s, t$: sensor indices
($x, y, \ldots$, $R_\mathrm{ref}$ and $K$ are data with uncertainty information)
4
Example
• Advanced Very High Resolution Radiometer (AVHRR)
– senses in three solar channels in the visible-near
infrared region and in three thermal infrared channels
$R = a_0 + \dfrac{(\epsilon + a_1)\,R_\mathrm{ICT} - a_2\,(C_\mathrm{S} - C_\mathrm{ICT})^2}{C_\mathrm{S} - C_\mathrm{ICT}}\,(C_\mathrm{S} - C_\mathrm{E}) + a_2\,(C_\mathrm{S} - C_\mathrm{E})^2 + a_3\,T_\mathrm{O}$
𝑥, 𝑦, … = (𝐶E , 𝐶S , 𝐶ICT , 𝑅ICT , 𝑇O )
– model for adjustment: $h(R) = R$
5
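To make the model concrete, a minimal Python sketch of the measurement and adjustment models above; it is illustrative only (the function names, the default value of $\epsilon$ and the use of scalar inputs are assumptions, not taken from the slides):

def avhrr_radiance(a, C_E, C_S, C_ICT, R_ICT, T_O, eps=1.0):
    """Spectral radiance R = f(a; C_E, C_S, C_ICT, R_ICT, T_O) for one channel.

    a   : sequence of calibration parameters (a0, a1, a2, a3)
    eps : constant appearing in the model (placeholder value, not given in the slides)
    """
    a0, a1, a2, a3 = a
    gain = ((eps + a1) * R_ICT - a2 * (C_S - C_ICT) ** 2) / (C_S - C_ICT)
    return a0 + gain * (C_S - C_E) + a2 * (C_S - C_E) ** 2 + a3 * T_O

def adjustment_K(a_s, obs_s, a_t, obs_t, h=lambda R: R):
    """Adjustment K = h(f(a_s; ...)) - h(f(a_t; ...)) for one match-up, with h(R) = R here."""
    return h(avhrr_radiance(a_s, *obs_s)) - h(avhrr_radiance(a_t, *obs_t))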
Data
• Comprise values $\boldsymbol R^\mathrm{data}_\mathrm{ref}$, $\boldsymbol x^\mathrm{data}$, $\boldsymbol y^\mathrm{data}$, …, $\boldsymbol K^\mathrm{data}$
• Various ‘uncertainty structures’, e.g.,
– Values of 𝑅ref (and 𝐾) are subject to random effects only
with known standard uncertainty 𝜎ref (and 𝜎𝐾 )
– Values of 𝑥 for a sensor within a match-up series are
obtained by applying ‘moving average’ operator to values
obtained independently with known 𝜎𝑥
– Values of 𝑦 for a sensor in all match-up series are
subject to random effects and a common systematic
effect with known 𝜎r and 𝜎s
– … maybe others
6
Parameters
• Comprise variables 𝑹ref , 𝒙, 𝒚, … , 𝑲, 𝒂
• Introduce standardised variables (written here with a tilde) $\tilde{\boldsymbol R}_\mathrm{ref}, \tilde{\boldsymbol x}, \tilde{\boldsymbol y}, \ldots, \tilde{\boldsymbol K}$ that are
independent and have variances of one:
$R_\mathrm{ref} = \sigma_\mathrm{ref}\,\tilde R_\mathrm{ref}$, with $\operatorname{var} \tilde R_\mathrm{ref} = 1$
$\boldsymbol x = \sigma_x \boldsymbol W_x \tilde{\boldsymbol x}$, with $\operatorname{var} \tilde{\boldsymbol x} = \boldsymbol I$, $\boldsymbol W_x$ a band matrix
$\boldsymbol y = \sigma_\mathrm{r} \tilde{\boldsymbol y} + \sigma_\mathrm{s} \boldsymbol 1\, \tilde y_0$, with $\operatorname{var} \tilde{\boldsymbol y} = \boldsymbol I$, $\operatorname{var} \tilde y_0 = 1$
– helps with problem scaling and conditioning
– helps treat data correlations
– transform data to relate to standardised variables
7
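As an illustration of the standardisation (not the authors' code), a Python sketch for one match-up series, assuming $\boldsymbol W_x$ is the band matrix of a simple $k$-point moving average built with scipy.sparse; the names and sizes are invented:

import numpy as np
import scipy.sparse as sp

def moving_average_band(n, k):
    """Band matrix W_x applying a k-point moving average to n independently obtained values."""
    offsets = list(range(-(k // 2), k // 2 + 1))
    return sp.diags([1.0 / k] * len(offsets), offsets, shape=(n, n), format="csr")

def to_physical(x_std, y_std, y0_std, sigma_x, sigma_r, sigma_s, W_x):
    """Map standardised variables (independent, unit variance) back to the physical ones."""
    x = sigma_x * (W_x @ x_std)               # x = sigma_x * W_x * x~  (moving-average correlation)
    y = sigma_r * y_std + sigma_s * y0_std    # random effects plus a common systematic effect y0~
    return x, y

# Example use with made-up sizes and uncertainties
W_x = moving_average_band(n=1000, k=5)
x, y = to_physical(np.random.standard_normal(1000), np.random.standard_normal(1000), 0.3,
                   sigma_x=0.1, sigma_r=0.05, sigma_s=0.02, W_x=W_x)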
‘Errors-in-variables’ regression problem
Data: $\boldsymbol R^\mathrm{data}_\mathrm{ref}$, $\boldsymbol x^\mathrm{data}$, $\boldsymbol y^\mathrm{data}$, …, $\boldsymbol K^\mathrm{data}$
Parameters: $\tilde{\boldsymbol R}_\mathrm{ref}$, $\tilde{\boldsymbol x}$, $\tilde{\boldsymbol y}$, …, $(\tilde{\boldsymbol K},)$ $\boldsymbol a_s$, $\boldsymbol a_t$, …
Objective function
$F \equiv (\boldsymbol R^\mathrm{data}_\mathrm{ref} - \tilde{\boldsymbol R}_\mathrm{ref})^\mathrm{T}(\boldsymbol R^\mathrm{data}_\mathrm{ref} - \tilde{\boldsymbol R}_\mathrm{ref}) + (\boldsymbol x^\mathrm{data} - \tilde{\boldsymbol x})^\mathrm{T}(\boldsymbol x^\mathrm{data} - \tilde{\boldsymbol x}) + (\boldsymbol y^\mathrm{data} - \tilde{\boldsymbol y})^\mathrm{T}(\boldsymbol y^\mathrm{data} - \tilde{\boldsymbol y}) + (\boldsymbol K^\mathrm{data} - \tilde{\boldsymbol K})^\mathrm{T}(\boldsymbol K^\mathrm{data} - \tilde{\boldsymbol K})$
with
$\sigma_K \tilde K = h\bigl(f(\boldsymbol a_s;\ x_s(\tilde{\boldsymbol x}_s),\ y_s(\tilde y_s, \tilde y_{0,s}),\ \ldots)\bigr) - h\bigl(f(\boldsymbol a_t;\ x_t(\tilde{\boldsymbol x}_t),\ y_t(\tilde y_t, \tilde y_{0,t}),\ \ldots)\bigr)$
(with $h(\sigma_\mathrm{ref} \tilde R_\mathrm{ref})$ in place of the second term for match-ups against the reference sensor)
8
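One reading of this formulation (an interpretation, not taken from the slides) is that $F$ is the squared norm of a stacked residual vector in which the adjustment equation supplies the modelled value of $K$; a schematic Python sketch with illustrative names:

import numpy as np

def stacked_residuals(params, data, model_K):
    """Residual vector f whose squared norm is the objective function F (one match-up series).

    params  : dict of standardised parameter vectors 'R_ref', 'x', 'y' plus the entry 'a'
    data    : dict with transformed data 'R_ref', 'x', 'y', raw adjustments 'K' and scalar 'sigma_K'
    model_K : callable returning the modelled adjustments from the current parameters
    """
    return np.concatenate([
        data["R_ref"] - params["R_ref"],
        data["x"] - params["x"],
        data["y"] - params["y"],
        (data["K"] - model_K(params)) / data["sigma_K"],   # adjustment equation, weighted by 1/sigma_K
    ])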
Structure of (weighted) Jacobian
[Block diagram of the weighted Jacobian: rows are grouped by the data blocks $\boldsymbol R^\mathrm{data}_\mathrm{ref}$, $\boldsymbol x^\mathrm{data}$, $\boldsymbol y^\mathrm{data}$, $\boldsymbol K^\mathrm{data}$ and columns by the parameter blocks $\tilde{\boldsymbol R}_\mathrm{ref}$, $\tilde{\boldsymbol x}$, $\tilde{\boldsymbol y}$, $\boldsymbol a$]
9
Technical objective and characteristics
• Estimate sensor calibration parameters 𝒂𝑠 , 𝒂𝑡 , …
– other parameters regarded as nuisance parameters
• ‘Large’ problem
– Number $m$ of observations: $O(10^7)$ to $O(10^9)$
– Number $n$ of adjustable parameters: $O(10^7)$ to $O(10^9)$
– Number of calibration parameters small: $O(10^1)$ to $O(10^2)$
• Correlations exist ‘across’ match-ups
– Not ‘orthogonal distance regression’ (ODR)
• Evaluate covariance matrices for 𝒂𝑠 , 𝒂𝑡 , …
10
Non-linear least-squares problem
• Solve using Gauss-Newton (GN) algorithm
• Each iteration: solve the linear least-squares (LLS) problem
$\boldsymbol J\,\Delta\boldsymbol p = -\boldsymbol f$
for the increment $\Delta\boldsymbol p$ to $\boldsymbol p$, the current approximation to the solution
• Since the general performance of GN algorithms is very satisfactory, concentrate on solving the LLS problem
11
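A minimal Gauss-Newton loop, as a sketch only, with scipy's LSQR used for the inner linear least-squares solve; the stopping rule and iteration cap are assumptions:

import numpy as np
from scipy.sparse.linalg import lsqr

def gauss_newton(residual, jacobian, p0, max_iter=20, tol=1e-8):
    """Gauss-Newton: at each iteration solve J dp = -f in the least-squares sense and update p.

    residual(p) -> residual vector f at p
    jacobian(p) -> sparse matrix or LinearOperator for J at p
    """
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        f = residual(p)
        dp = lsqr(jacobian(p), -f)[0]    # inner LLS solve, J dp ~ -f
        p = p + dp
        if np.linalg.norm(dp) <= tol * (1.0 + np.linalg.norm(p)):
            break
    return p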
Linear least-squares problem
• Example: $m \approx 10^8$, $n \approx 10^8$: $O(10^{16})$ elements
• QR factorization of dense $\boldsymbol J$: $O(mn^2) = O(10^{24})$ flops
PC ($O(10^{11})$ flops s$^{-1}$): $O(10^{13})$ s
Tianhe-2 ($O(10^{17})$ flops s$^{-1}$): $O(10^{7})$ s $\approx O(10^{2})$ days
• Other direct methods have essentially same operation count
• Consider iterative methods that account for structure and
sparsity
12
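For reference, the arithmetic behind these estimates (using the figures quoted above) is:

\begin{align*}
mn^2 &\approx 10^{8} \times (10^{8})^{2} = 10^{24} \text{ flops},\\
\text{PC: } & 10^{24} \text{ flops} \,/\, 10^{11} \text{ flops s}^{-1} = 10^{13} \text{ s} \approx 3 \times 10^{5} \text{ years},\\
\text{Tianhe-2: } & 10^{24} \text{ flops} \,/\, 10^{17} \text{ flops s}^{-1} = 10^{7} \text{ s} \approx 10^{2} \text{ days}.
\end{align*}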
Linear least-squares problem
• Iterative algorithm generates sequence of approximations
(in exact arithmetic, identical to conjugate gradients for 𝑱T 𝑱)
• Convergence in exact arithmetic in ≤ 𝑛 iterations
• Each iteration requires calculation of 𝑱𝒗 and 𝑱T𝒖
– Avoids need to store 𝑱
– Calculations implemented to exploit structure/sparsity
• Examples include ‘LSQR’ and ‘LSMR’
13
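A sketch (illustrative only, not the authors' code) of how the products $\boldsymbol J\boldsymbol v$ and $\boldsymbol J^\mathrm{T}\boldsymbol u$ can be supplied to a matrix-free solver through scipy's LinearOperator; the toy Jacobian below has identity rows for the nuisance variables and a block coupling the adjustment ($\boldsymbol K$) rows to all parameters, with all sizes and block contents invented for the example:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, lsmr

# Toy block Jacobian: identity rows for the nuisance variables, a sparse block U and a
# small dense block J_a for the rows coming from the adjustment (K) equations.
n_nuis, n_K, n_cal = 1000, 100, 4
rng = np.random.default_rng(0)
U = sp.random(n_K, n_nuis, density=0.01, format="csr", random_state=0)
J_a = rng.standard_normal((n_K, n_cal))

def matvec(v):
    """J v without forming J: v = [v_nuisance, v_calibration]."""
    v_n, v_a = v[:n_nuis], v[n_nuis:]
    return np.concatenate([v_n, U @ v_n + J_a @ v_a])

def rmatvec(u):
    """J^T u without forming J: u = [u_nuisance_rows, u_K_rows]."""
    u_n, u_K = u[:n_nuis], u[n_nuis:]
    return np.concatenate([u_n + U.T @ u_K, J_a.T @ u_K])

J = LinearOperator((n_nuis + n_K, n_nuis + n_cal), matvec=matvec, rmatvec=rmatvec)
f = rng.standard_normal(n_nuis + n_K)
dp = lsmr(J, -f)[0]   # increment for one Gauss-Newton step; J is never stored explicitly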
Linear least-squares problem
• Example: $m \approx 10^8$, $n \approx 10^8$: $O(10^{16})$ elements
– $9 \times 10^7$ rows contain 1 non-zero element
– $1 \times 10^7$ rows contain 100 non-zero elements
i.e. $n_\mathrm{z} \approx 10^9$ non-zero elements
• $n$ iterations take $O(n\,n_\mathrm{z})$ flops $\approx O(10^{17})$ flops
LSQR: PC ($O(10^{11})$ flops s$^{-1}$) $O(10^{6})$ s; Tianhe-2 ($O(10^{17})$ flops s$^{-1}$) $O(10^{0})$ s
Dense QR: PC $O(10^{13})$ s; Tianhe-2 $O(10^{7})$ s
• Usually convergence in $\ll n$ iterations
14
Uncertainty evaluation
• Element $(i, j)$ of the covariance matrix obtained by solving
$\operatorname{cov}(\boldsymbol p_i, \boldsymbol p_j) = \boldsymbol e_i^\mathrm{T} \bigl(\boldsymbol J^\mathrm{T}\boldsymbol J\bigr)^{-1} \boldsymbol e_j$, where $(\boldsymbol e_i)_k = \delta_{ik}$
• Or, partition the Jacobian and covariance matrix as
$\boldsymbol J = \begin{pmatrix} \boldsymbol I & \boldsymbol 0 \\ \boldsymbol U & \boldsymbol J_a \end{pmatrix}, \qquad \boldsymbol V = \begin{pmatrix} \boldsymbol V_x & \boldsymbol V_{xa} \\ \boldsymbol V_{ax} & \boldsymbol V_a \end{pmatrix}$
and then
$\boldsymbol V_a = \bigl(\boldsymbol J_a^\mathrm{T}\boldsymbol J_a - \boldsymbol J_a^\mathrm{T}\boldsymbol U\,(\boldsymbol I + \boldsymbol U^\mathrm{T}\boldsymbol U)^{-1}\boldsymbol U^\mathrm{T}\boldsymbol J_a\bigr)^{-1}$
• Depends on solving large square, symmetric systems of
equations with small numbers of right-hand sides
• Again apply iterative approach, e.g., ‘MINRES’
15
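As an illustration of the first route, each covariance element requires one solve of the large symmetric system $(\boldsymbol J^\mathrm{T}\boldsymbol J)\,\boldsymbol z = \boldsymbol e_j$, which scipy's MINRES can perform matrix-free; the helper below is a sketch with an illustrative name, assuming $\boldsymbol J$ is available as a sparse matrix or LinearOperator:

import numpy as np
from scipy.sparse.linalg import LinearOperator, aslinearoperator, minres

def covariance_element(J, i, j):
    """cov(p_i, p_j) = e_i^T (J^T J)^{-1} e_j via one MINRES solve for the right-hand side e_j."""
    J = aslinearoperator(J)
    n = J.shape[1]
    normal_op = LinearOperator((n, n), matvec=lambda v: J.rmatvec(J.matvec(v)), dtype=float)
    e_j = np.zeros(n)
    e_j[j] = 1.0
    z, info = minres(normal_op, e_j)   # z = (J^T J)^{-1} e_j; info == 0 signals convergence
    return z[i]

Because the number of calibration parameters is small, only a small number of such right-hand sides is needed.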
Numerical considerations
• Convergence problems in some cases
• Pre-conditioning 𝑱 gives decreased iteration count
– Seek $\boldsymbol P$ such that $\boldsymbol V = \bigl(\boldsymbol J^\mathrm{T}\boldsymbol J\bigr)^{-1} \approx \boldsymbol P \boldsymbol P^\mathrm{T}$
– Solve $(\boldsymbol J\boldsymbol P)\,\Delta\boldsymbol q = -\boldsymbol f$ and recover $\Delta\boldsymbol p = \boldsymbol P\,\Delta\boldsymbol q$
• Construct $\boldsymbol P$ as the Cholesky factor of an approximation to $\boldsymbol V$ built from:
– (Diagonal) covariance matrix for nuisance variables
– Covariance matrix for variables 𝒂 obtained from solving
least-squares problem defined by subset of data for 𝑲
and variables 𝒂 holding other variables fixed
16
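A sketch of applying such a preconditioner once a factor $\boldsymbol P$ with $(\boldsymbol J^\mathrm{T}\boldsymbol J)^{-1} \approx \boldsymbol P\boldsymbol P^\mathrm{T}$ is available; the operator composition and the back-transformation $\Delta\boldsymbol p = \boldsymbol P\,\Delta\boldsymbol q$ follow the scheme above, but the code itself is illustrative only:

from scipy.sparse.linalg import LinearOperator, aslinearoperator, lsmr

def preconditioned_step(J, f, P):
    """Solve (J P) dq = -f in the least-squares sense, then return dp = P dq."""
    J, P = aslinearoperator(J), aslinearoperator(P)
    JP = LinearOperator(
        (J.shape[0], P.shape[1]),
        matvec=lambda v: J.matvec(P.matvec(v)),       # (J P) v
        rmatvec=lambda u: P.rmatvec(J.rmatvec(u)),    # (J P)^T u = P^T J^T u
        dtype=float,
    )
    dq = lsmr(JP, -f)[0]
    return P.matvec(dq)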
Pre-conditioner
• Solve approximation to errors-in-variables problem
– Set nuisance variables equal to their measured values
with covariance matrix 𝑰
– Obtain estimates of $\boldsymbol a$ and covariance matrix $\boldsymbol V_a$ by solving non-linear least-squares problems defined by the equations
$K + h\bigl(R^\mathrm{data}_\mathrm{ref}\bigr) = h\bigl(f(\boldsymbol a_s)\bigr), \qquad K = h\bigl(f(\boldsymbol a_s)\bigr) - h\bigl(f(\boldsymbol a_t)\bigr)$
and the data $\boldsymbol K^\mathrm{data}$, with
$\boldsymbol V_K = \sigma_K^2 \boldsymbol I + \sigma_\mathrm{ref}^2\, \boldsymbol J_\mathrm{ref}\boldsymbol J_\mathrm{ref}^\mathrm{T} + \sigma_x^2\, \boldsymbol J_x \boldsymbol J_x^\mathrm{T} + \ldots$
17
Indicative results
• Simulated problem
– 4 sensors to be calibrated
– 5 match-up series, each comprising $0.5 \times 10^6$ match-ups
– Models $R = f(\boldsymbol a; x, y) = a_0 + a_1 x + a_2 y$ and $h(R) = R$
– Different uncertainty structures for $R_\mathrm{ref}$, $x$, $y$, $K$
• Problem with $m \approx 12 \times 10^6$ and $n \approx 9.5 \times 10^6$
– 3 GN iterations taking 59, 27 and 6 LSMR iterations
– Uncertainty evaluation both ways: $2 \times 12$ square systems taking between 4 and 40 MINRES iterations
– Elapsed time of 72 minutes on an NPL workstation
18
Summary
• Measurement problem concerned with re-calibration of
sensors on-board satellites in orbit
– Treatment of uncertainties an essential component
• Mathematical problem
– Large non-linear least-squares problem
– Re-parametrisation to account for uncertainty information
– Iterative approaches applied to exploit structure/sparsity
– Pre-conditioning to help numerical performance
• Real data to become available in the (very) near future
19
Work supported by
FIDUCEO project (Fidelity and Uncertainty in Climate data records from Earth
Observations) through funding from the European Union’s Horizon 2020
Programme for Research and Innovation under Grant Agreement no. 638822
(www.fiduceo.eu)
and the Mathematics and Modelling programme of the
The National Physical Laboratory is operated by NPL Management Ltd, a wholly-owned company of the Department for Business, Energy and Industrial Strategy (BEIS).
20