NPE - Cross-cutting research on verification techniques
Presentation Session Code: SCI-PS153.03
Verifying modelled currents using a threshold exceedance approach
An exploration of the Gerrity Skill Score
Dr Ray Mahdon
© Crown copyright Met Office

Table of Contents
• Introduction
• Data Source & Locations
• Differing Current Regimes
• Time Series, Continuous Statistics & Simple Cat. Metrics
• Neighbourhood Methods
• Bias Removal Questions
• Multi-Cat. Metric – Gerrity Skill Score & Ocean Currents
• Threshold Choices

Introduction
• Surface current forecasts are important for commercial or defence “weather-windows”
  • e.g. Current speed below 1 kt for 12 hours
  • e.g. Does not exceed 1 kt more than x times
• Good for site-specific & threshold-based analysis
• Some questions we are trying to answer:
  • Does the model capture extreme events or “weather-windows”?
  • In which locations or times of year do the models perform best; is there a significant difference in regime, time or area?

Data Source & Locations
[Map: observation locations over the domain 26-56N, 19W-5E. Station labels: Donostia, Matxitxako, 62083, 62025, 6201030, 62024, 61280, 61281, 61417, 62085, 61198, 61430. Regime labels: Shelf Circulation, Wind & Tidal Currents, Eddies, General Ocean Circulation, Slope Current. Other labels: MyOcean, Puertos Del Estado.]

Data, Time Series & Continuous Statistics
• Hourly frequency, Jan 2012 – Jun 2014 (30 months)
• Collocated model & in-situ moored observation surface currents
• Continuous statistics are helpful to describe overall behaviour
  • e.g. q-q & histogram plots describe climatology
  • Time series can show seasonal patterns or significant events
  • They do not quantify the performance of a system when exceeding thresholds is of interest
• We focus on surface currents
  • Validation is relatively sparse for this parameter
• → Categorical Metric Assessment
  • Simple 2x2 (binary) contingency table per chosen threshold

Neighbourhood Sampling
• Spatial neighbourhoods: 1x1, 3x3, 5x5, ..., NxN
• Temporal neighbourhoods: T-1, T+0, T+1
• Time averaging & shifting
• Combinations of spatial & temporal neighbourhoods trialled

Simple Categorical Metrics
• Improvements from temporal averaging
• Hour-by-hour assessment is not good: CSI → ETS says the model is mostly correct by chance!
[Figure: CSI and ETS diagrams. Labels: HITS, MISSES, F. ALARMS, CORR. REJ.]
• CSI & ETS require unbiased input data
• Over what period should a tidally dominated field be normalised: 1 tidal cycle; spring-neap cycle; astronomical cycle?
• How to handle negative currents?

Multi-Categorical Metric Method
The Gerrity Skill Score

Gerrity* Skill Score (GSS)
• Refinement of binary categorical methods
• Does not depend on the forecast distribution
• Rewards/penalises for rare (extreme)/disparate events
  • Does not reward conservative forecasting
• Large choice of threshold divisions
• Good observation (sample) climatology required
• Contingency table distribution leads to scoring matrix
• Equitable (i.e., random or constant forecasts score a value of 0)

Contingency table (rows: MODEL, columns: OBS), GSS = 0.38

  MODEL \ OBS    C≤T1    T1<C≤T2    C≥T3
  C≤T1            46       21         4
  T1<C≤T2         26       55         5
  C≥T3             5        4         8

× Scoring matrix

   0.56   -0.45   -1.00
  -0.45    0.55   -0.01
  -1.00   -0.01    5.08

* Gerrity, J.P., (1992), Monthly Weather Review, 120, 2709-2712.

GSS - Threshold Choices
• 1 year rolling data per point, captured from 2 ½ years (365 × 24 = 8760 pts. – a good climatology!)
• Skewed Thresholds: [0.10, 0.25, 0.45, 0.7]
• Equal Frequency Distribution: [20, 40, 60, 80] percentiles
• Variability in skill versus thresholds, neighbourhood & time
• Clues in events from time series & data captured

GSS - Threshold Choices Cont.
• Equal Frequency Distribution = [0.07, 0.12, 0.18, 0.25]
• Skewed Thresholds = [0.10, 0.25, 0.45, 0.70]
[Figure: Daily Max/Min Current Speed - 62024. Mean error = -0.03 m s⁻¹, RMSE = 0.11 m s⁻¹]

GSS - Threshold Choices Cont.
• Equal Frequency Distribution = [0.05, 0.1, 0.15, 0.2]
• Skewed Thresholds = [0.1, 0.25, 0.45, 0.7]
[Figure: Daily Max/Min Current Speed - 62024. Mean error = -0.03 m s⁻¹, RMSE = 0.11 m s⁻¹]

GSS - Threshold Choices Cont.
• 1 year’s data captured from 2 ½ years (365 × 24 = 8760 pts. – a good climatology)
• Equal Frequency Distribution = [0.07, 0.12, 0.18, 0.25]
• Regular Thresholds = [0.25, 0.5, 0.75, 1.0]
[Panels: Equal Frequency Distribution; Regular Thresholds]

Contingency table (rows: FC, columns: OBS), GSS = 0.7

  FC \ OBS       C≤0.25    0.25<C≤0.5
  C≤0.25          272          6
  0.25<C≤0.5       16         19

× Scoring matrix

   0.09   -1.00
  -1.00   11.52

Other trials & results
• Various spatial & temporal neighbourhoods
  • Report similar results
• Preliminary results on other model systems show similar skill scores
  • Met Office FOAM-Shelf system
  • Maximum skill versus neighbourhood size
• Other binning thresholds
  • The lack of a firm a priori binning remains a deficiency
• Decoupling tidal cycle & residual current from raw signal to highlight skill partitioning
  • Doodson sea surface height decoupler trialled
  • Separation of potentially non-parallel (orthogonal) fields not addressed

Conclusions
• Hourly frequency currents, Jan 2012 – Jun 2014 (30 months)
• Threshold-based assessment
• Continuous statistics are helpful to describe overall behaviour
• Time series can show seasonal patterns
  • They do not quantify spatially or temporally coordinated model/obs values
• → Categorical Metric Assessment
• Gerrity Skill Score – attractive attributes for rewards/penalties

Conclusions cont.
• Choice of thresholds is important
  • Model CAN CAPTURE EXTREME EVENTS – threshold dependent!
  • Equal Frequency Distribution appears to be the fairest a priori
  • Can be personalised to a particular regime or current distribution
• Time series needed alongside Gerrity
  • Missing data can skew results
• Similar locations/regimes appear to give broadly similar Gerrity Skill Scores
• Winter months tend to show better skill – more extreme events
• Multi-category methods on surface ocean current speed are relatively new, so expectation of skill level is unknown

Future Work
• Now the concept is established, apply to forecast data
• Include other regional models which have a long-term observation record
• Bootstrapping Gerrity Skill Score
  • Error estimation around each score
• Return to bias removal issue
  • Scaled currents, rather than constant removal?
• Assess wind speed with Gerrity Skill Score & compare to surface currents
  • Potentially highlights efficiency of wind speed transmission to surface currents in Ocean:Atmosphere boundary

Acknowledgement
• Thank you to MyOcean for funding towards this work

THANK YOU FOR YOUR ATTENTION
Any Questions (& answers)?
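
Supplementary sketch: the threshold-exceedance workflow described in the slides (bin collocated model and observed current speeds by thresholds, build a multi-category contingency table, then score it with the Gerrity Skill Score) could look roughly like the Python below. This is an illustration, not the code used for the presentation. The function names are hypothetical, the scoring matrix follows the standard Gerrity (1992) construction, and the observed climatology is assumed to come from the table's own column totals (observations in columns, model in rows). `statistics.quantiles` with its default method is only one of several percentile conventions for equal-frequency thresholds.

```python
from bisect import bisect_left
from statistics import quantiles


def equal_frequency_thresholds(obs, n_bins=5):
    """Thresholds at equally spaced percentiles (n_bins=5 -> 20, 40, 60, 80)."""
    return quantiles(obs, n=n_bins)


def categorise(value, thresholds):
    """Category index of `value`; a value equal to a threshold stays in the
    lower category (C <= T), matching the category labels on the slides."""
    return bisect_left(thresholds, value)


def contingency_table(model, obs, thresholds):
    """Count collocated pairs; rows are model categories, columns observed."""
    k = len(thresholds) + 1
    table = [[0] * k for _ in range(k)]
    for m, o in zip(model, obs):
        table[categorise(m, thresholds)][categorise(o, thresholds)] += 1
    return table


def gerrity_skill_score(table):
    """Gerrity Skill Score of a K x K table (rows model, columns observed).

    Assumption: the observed climatology is the table's own column
    frequencies, and every observed category is non-empty.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    if min(p_obs) == 0:
        raise ValueError("every observed category needs at least one sample")

    # a_r = (1 - cumulative p) / cumulative p for r = 1 .. K-1
    a, cum = [], 0.0
    for r in range(k - 1):
        cum += p_obs[r]
        a.append((1.0 - cum) / cum)

    b = 1.0 / (k - 1)
    score = 0.0
    for i in range(k):
        for j in range(k):
            lo, hi = min(i, j), max(i, j)  # the scoring matrix is symmetric
            s_ij = b * (
                sum(1.0 / a[r] for r in range(lo))
                - (hi - lo)
                + sum(a[r] for r in range(hi, k - 1))
            )
            score += table[i][j] / n * s_ij
    return score


# Counts copied from the two contingency tables shown in the slides.
TABLE_3X3 = [[46, 21, 4], [26, 55, 5], [5, 4, 8]]
TABLE_2X2 = [[272, 6], [16, 19]]

print(f"{gerrity_skill_score(TABLE_3X3):.2f}")  # 0.38
print(f"{gerrity_skill_score(TABLE_2X2):.2f}")  # 0.70
```

With observations in columns, both tables give scores that round to the values quoted on the slides (0.38 and 0.7). The scoring matrix printed with the 3x3 table may have been built from a different sample climatology, so this sketch is not expected to reproduce those individual matrix entries.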