Solving large structured non-linear least-squares problems, with an application in earth observation

Peter Harris, Arta Dilo, Maurice Cox, Emma Woolliams, Jonathan Mittaz and Sam Hunt
National Physical Laboratory, UK
Workshop on Mathematics for Measurement, ICMS, 30 Jan – 2 Feb 2017
Welcome to the National Physical Laboratory

Outline
• Measurement problem
– 'Harmonisation' of sensors on-board satellites in orbit
• Mathematical problem
– 'Errors-in-variables' regression problem
– Very large numbers of measured data and parameters
– Respect data standard uncertainties and covariances
– Require uncertainties for a small subset of parameters
• Solution approach under development

Measurement problem
• Re-calibrate sensors on-board satellites in orbit
• Use data from 'match-ups' – pairs of sensors observe the same part of the Earth at the same time
• Essential to obtaining records of climate variables that are
– consistent over long periods of time
– supported by valid statements of uncertainty
[Figure: spectral radiance / (W sr^-1 m^-2 Hz^-1) against time / a, showing the 'reference' sensor record and the 'match-ups'.]

Measurement model
• Model for spectral radiance measured by each sensor:
R = f(a; x, y, …)
• Model for adjustment between pairs of sensors:
K = h(f(a_s; x_s, y_s, …)) − h(f(a_t; x_t, y_t, …)),
or K = h(f(a_s; x_s, y_s, …)) − h(R_ref) for match-ups with the reference sensor
• Notation:
– a: (unknown) sensor calibration parameters
– x, y, …: stimulus variables (earth counts, temperatures, …)
– R_ref: radiances from a single reference sensor
– K: adjustment factor
– s, t: sensor indices
– x, y, …, R_ref and K are provided as data with uncertainty information

Example
• Advanced Very High Resolution Radiometer (AVHRR)
– senses in three solar channels in the visible–near-infrared region and in three thermal infrared channels
– measurement model:
R = a_0 + [(ε + a_1) R_ICT − a_2 (C_S − C_ICT)^2] / (C_S − C_ICT) × (C_S − C_E) + a_2 (C_S − C_E)^2 + a_3 T_O
with (x, y, …) = (C_E, C_S, C_ICT, R_ICT, T_O)
– model for adjustment: h(R) = R

Data
• Comprise values R_ref^data, x^data, y^data, …, K^data
• Various 'uncertainty structures', e.g.:
– Values of R_ref (and K) are subject to random effects only with known standard
uncertainty σ_ref (and σ_K)
– Values of x for a sensor within a match-up series are obtained by applying a 'moving average' operator to values obtained independently with known σ_x
– Values of y for a sensor in all match-up series are subject to random effects and a common systematic effect with known σ_r and σ_s
– … maybe others

Parameters
• Comprise variables R_ref, x, y, …, K, a
• Introduce standardised variables R̃_ref, x̃, ỹ, …, K̃ that are independent and have variances of one:
R_ref = σ_ref R̃_ref, var(R̃_ref) = 1
x = σ_x W_x x̃, var(x̃) = I, with W_x a band matrix
y = σ_r I ỹ + σ_s 1 ỹ_0, var(ỹ) = I, var(ỹ_0) = 1
– helps with problem scaling and conditioning
– helps treat data correlations
– transform the data to relate to the standardised variables

'Errors-in-variables' regression problem
• Data: R_ref^data, x^data, y^data, …, K^data
• Parameters: R̃_ref, x̃, ỹ, …, (K̃,) a_s, a_t, …
• Objective function:
F ≡ (R̃_ref^data − R̃_ref)^T (R̃_ref^data − R̃_ref) + (x̃^data − x̃)^T (x̃^data − x̃) + (ỹ^data − ỹ)^T (ỹ^data − ỹ) + (K^data − K)^T (K^data − K) / σ_K^2
with
K = h(f(a_s; x_s(x̃_s), y_s(ỹ_s, ỹ_0,s), …)) − h(f(a_t; x_t(x̃_t), y_t(ỹ_t, ỹ_0,t), …)),
or K = h(f(a_s; x_s(x̃_s), y_s(ỹ_s, ỹ_0,s), …)) − h(σ_ref R̃_ref) for match-ups with the reference sensor

Structure of (weighted) Jacobian
[Figure: block structure of the weighted Jacobian, with rows indexed by the data R_ref^data, x^data, y^data, K^data and columns by the parameters R̃_ref, x̃, ỹ, a.]

Technical objective and characteristics
• Estimate sensor calibration parameters a_s, a_t, …
– other parameters regarded as nuisance parameters
• 'Large' problem:
– number m of observations O(10^7)–O(10^9)
– number n of adjustable parameters O(10^7)–O(10^9)
– number of calibration parameters small, O(10^1)–O(10^2)
• Correlations exist 'across' match-ups
– not 'orthogonal distance regression' (ODR)
• Evaluate covariance matrices for a_s, a_t, …

Non-linear least-squares problem
• Solve using the Gauss–Newton (GN) algorithm
• Each iteration: solve the linear least-squares (LLS) problem J Δp = −f for the increment Δp to p, the current approximation to the solution
• Since the general performance of GN algorithms is very satisfactory, concentrate on solving the LLS problem

Linear least-squares problem
• Example: m ≈ 10^8, n ≈ 10^8, i.e. O(10^16) elements
• QR factorization of dense J: O(m n^2) = O(10^24) flops
– PC: O(10^11) flops/s → O(10^13) s
– Tianhe-2: O(10^17) flops/s → O(10^7) s ≈ O(10^2) days
• Other direct methods have essentially the same operation count
• Consider iterative methods that account for structure and sparsity

Linear least-squares problem
• An iterative algorithm generates a sequence of approximations (in exact arithmetic, identical to conjugate gradients applied to J^T J)
• Convergence in exact arithmetic in ≤ n iterations
• Each iteration requires calculation of J v and J^T u
– avoids the need to store J
– calculations implemented to exploit structure/sparsity
• Examples include 'LSQR' and 'LSMR'

Linear least-squares problem
• Example: m ≈ 10^8, n ≈ 10^8, i.e. O(10^16) elements
– 9 × 10^7 rows contain 1 non-zero element
– 1 × 10^7 rows contain 100 non-zero elements, i.e. n_z ≈ 10^9 non-zero elements
• n iterations take O(n n_z) flops ≈ O(10^17) flops

                               LSQR        Dense QR
 PC [O(10^11) flops/s]         O(10^6) s   O(10^13) s
 Tianhe-2 [O(10^17) flops/s]   O(10^0) s   O(10^7) s

• Usually convergence in ≪ n iterations

Uncertainty evaluation
• Element (i, j) of the covariance matrix obtained by solving
cov(p_i, p_j) = e_i^T (J^T J)^{-1} e_j, with (e_i)_k = δ_ik
• Or, partition the Jacobian and covariance matrix as
J = [ I 0 ; U J_a ], V = [ V_x V_xa ; V_ax V_a ]
and then
V_a = [ J_a^T J_a − J_a^T U (I + U^T U)^{-1} U^T J_a ]^{-1}
• Depends on solving large square, symmetric systems of equations with small numbers of right-hand sides
• Again apply an iterative approach, e.g., 'MINRES'

Numerical considerations
• Convergence problems in some cases
• Pre-conditioning J gives a decreased iteration count
– seek P such that V = (J^T J)^{-1} ≈ P P^T
– solve (J P) Δq = −f
• Construct P as the Cholesky factor of an approximation to V
– (diagonal) covariance matrix for the nuisance variables
– covariance matrix for the variables a obtained from solving the least-squares problem defined by the subset of data for K and the variables a, holding the other variables fixed

Pre-conditioner
• Solve an approximation to the
errors-in-variables problem
– Set the nuisance variables equal to their measured values with covariance matrix I
– Obtain estimates of a and the covariance matrix V_a by solving the non-linear least-squares problem defined by the equations
K^data + h(R_ref^data) = h(f(a_s)) and K^data = h(f(a_s)) − h(f(a_t))
for the data K^data with covariance matrix
V_K = σ_K^2 I + σ_ref^2 J_ref J_ref^T + σ_x^2 J_x J_x^T + …

Indicative results
• Simulated problem:
– 4 sensors to be calibrated
– 5 match-up series, each comprising 0.5 × 10^6 match-ups
– models R = f(a; x, y) = a_0 + a_1 x + a_2 y and h(R) = R
– different uncertainty structures for R_ref, x, y, K
• Problem with m ≈ 12 × 10^6 and n ≈ 9.5 × 10^6:
– 3 GN iterations taking 59, 27 and 6 LSMR iterations
– uncertainty evaluation both ways: 2 × 12 square systems taking between 4 and 40 MINRES iterations
– elapsed time of 72 min on an NPL workstation

Summary
• Measurement problem concerned with the re-calibration of sensors on-board satellites in orbit
– treatment of uncertainties is an essential component
• Mathematical problem:
– large non-linear least-squares problem
– re-parametrisation to account for the uncertainty information
– iterative approaches applied to exploit structure/sparsity
– pre-conditioning to help numerical performance
• Real data to become available in the (very) near future

Work supported by the FIDUCEO project (Fidelity and Uncertainty in Climate data records from Earth Observations) through funding from the European Union's Horizon 2020 Programme for Research and Innovation under Grant Agreement no. 638822 (www.fiduceo.eu), and by the Mathematics and Modelling programme. The National Physical Laboratory is operated by NPL Management Ltd, a wholly-owned company of the Department for Business, Energy and Industrial Strategy (BEIS).
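The re-parametrisation on the Parameters slide can be checked by simulation. The sketch below (invented toy numbers, not from the talk) draws independent unit-variance standardised variables ỹ and ỹ_0 and confirms that y = σ_r ỹ + σ_s 1 ỹ_0 has covariance σ_r^2 I + σ_s^2 1 1^T, i.e. a common systematic effect on top of independent random effects:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma_r, sigma_s, n, n_samp = 0.3, 0.5, 6, 200_000   # toy values

# standardised variables: independent, each with unit variance
y_tilde = rng.normal(size=(n_samp, n))
y0_tilde = rng.normal(size=(n_samp, 1))

# re-parametrisation: random effects plus a common systematic effect
y = sigma_r * y_tilde + sigma_s * y0_tilde

V_emp = np.cov(y.T)                                           # sample covariance
V_theory = sigma_r**2 * np.eye(n) + sigma_s**2 * np.ones((n, n))
print(np.max(np.abs(V_emp - V_theory)))                        # small MC error
```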
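As a toy illustration of the matrix-free iterative approach described for the LLS sub-problem (a sketch with invented sizes and data, not the talk's implementation), the following solves one Gauss–Newton increment J Δp = −f with SciPy's LSMR, supplying only the products J v and J^T u for a Jacobian with the block structure J = [I 0; U J_a] used on the uncertainty-evaluation slide:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsmr

rng = np.random.default_rng(0)
n_nuis, n_cal, n_match = 50, 3, 40     # nuisance vars, calibration params, match-ups
m, n = n_nuis + n_match, n_nuis + n_cal

U = rng.normal(size=(n_match, n_nuis))   # stand-in blocks of the weighted Jacobian
Ja = rng.normal(size=(n_match, n_cal))

def matvec(v):
    # J v for J = [[I, 0], [U, J_a]] without ever forming J
    v_nuis, v_cal = v[:n_nuis], v[n_nuis:]
    return np.concatenate([v_nuis, U @ v_nuis + Ja @ v_cal])

def rmatvec(u):
    # J^T u, again using only the blocks
    u_top, u_bot = u[:n_nuis], u[n_nuis:]
    return np.concatenate([u_top + U.T @ u_bot, Ja.T @ u_bot])

J = LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec)

f = rng.normal(size=m)                                      # current residual vector
dp = lsmr(J, -f, atol=1e-10, btol=1e-10, maxiter=500)[0]    # GN increment

# dense reference solve, feasible only at this toy size
J_dense = np.block([[np.eye(n_nuis), np.zeros((n_nuis, n_cal))], [U, Ja]])
dp_ref = np.linalg.lstsq(J_dense, -f, rcond=None)[0]
print(np.max(np.abs(dp - dp_ref)))
```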
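The partitioned formula for V_a on the uncertainty-evaluation slide is a Schur complement of J^T J; the sketch below (invented toy sizes) checks it against the corresponding block of (J^T J)^{-1} computed densely:

```python
import numpy as np

rng = np.random.default_rng(3)
n_nuis, n_cal, m2 = 30, 4, 30          # toy dimensions
U = rng.normal(size=(m2, n_nuis))
Ja = rng.normal(size=(m2, n_cal))
J = np.block([[np.eye(n_nuis), np.zeros((n_nuis, n_cal))], [U, Ja]])

# full covariance matrix, feasible only at toy size
V = np.linalg.inv(J.T @ J)
Va_full = V[n_nuis:, n_nuis:]

# partitioned formula: V_a = [J_a^T J_a - J_a^T U (I + U^T U)^{-1} U^T J_a]^{-1}
S = Ja.T @ Ja - Ja.T @ U @ np.linalg.solve(np.eye(n_nuis) + U.T @ U, U.T @ Ja)
Va = np.linalg.inv(S)
print(np.max(np.abs(Va - Va_full)))
```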
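The element-wise covariance evaluation cov(p_i, p_j) = e_i^T (J^T J)^{-1} e_j reduces to one symmetric solve per right-hand side, which is what MINRES handles. A minimal matrix-free sketch (toy sizes, not the talk's code), applying J^T J only through the products J v and J^T u:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

rng = np.random.default_rng(1)
m, n = 120, 8
J = rng.normal(size=(m, n))    # stand-in Jacobian; the real one is never formed

# apply J^T J matrix-free
N = LinearOperator((n, n), matvec=lambda v: J.T @ (J @ v))

def cov_element(i, j):
    # one symmetric solve (J^T J) z = e_j, then pick out component i
    e_j = np.zeros(n)
    e_j[j] = 1.0
    z, info = minres(N, e_j)
    assert info == 0           # converged
    return z[i]

V = np.linalg.inv(J.T @ J)     # dense reference, feasible only at toy size
print(abs(cov_element(2, 5) - V[2, 5]))
```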
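The effect of pre-conditioning on iteration count can be seen even with a much cruder P than the talk's (which is built as a Cholesky factor of an approximate covariance matrix). In this sketch (invented data) P is the simplest diagonal approximation, taken from the column norms of a badly scaled Jacobian; the solve for (J P) Δq = −f needs fewer LSMR iterations than the un-preconditioned solve, and Δp = P Δq recovers the same increment:

```python
import numpy as np
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(2)
m, n = 200, 20

# badly scaled Jacobian: column norms spanning four orders of magnitude
J = rng.normal(size=(m, n)) * np.logspace(0, 4, n)
f = rng.normal(size=m)

# P should approximate a Cholesky factor of V = (J^T J)^{-1}; here we take the
# crudest (diagonal) choice, built from the column norms of J
P = np.diag(1.0 / np.linalg.norm(J, axis=0))

itn_plain = lsmr(J, -f, atol=1e-10, btol=1e-10, maxiter=300)[2]
res = lsmr(J @ P, -f, atol=1e-10, btol=1e-10, maxiter=300)
dq, itn_pre = res[0], res[2]
dp = P @ dq                     # map the increment back to the original variables

dp_ref = np.linalg.lstsq(J, -f, rcond=None)[0]
print(itn_plain, itn_pre)       # pre-conditioning reduces the iteration count
```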