Verification of nowcasts and very short range forecasts
Beth Ebert
BMRC, Australia
WWRP Int'l Symposium on Nowcasting and Very Short Range Forecasting, Toulouse, 5-9 Sept 2005
Why verify forecasts?
• To monitor performance over time → summary scores
• To evaluate and compare forecast systems → continuous and categorical scores
• To show the impact of forecasts → skill & value scores
• To understand errors in order to improve the forecast system → diagnostic methods
The verification approach taken depends on the purpose of the verification.
Verifying nowcasts and very short range forecasts
Nowcast characteristic → impact on verification:
• Concerned mainly with high impact weather → rare events difficult to verify in a systematic manner
• May detect severe weather elements → storm spotter observations & damage surveys required
• Observations-based → same observations often used to verify nowcasts
• High temporal frequency → many nowcasts to verify
• High spatial resolution → observation network usually not dense enough (except radar)
• Small spatial domain → relatively small number of standard observations
Observations – issues for nowcasts
Thunderstorms and severe weather (mesocyclones, hail, lightning, damaging winds)
• Spotter observations may contain error
• Biased observations
  • More observations during daytime & in populated areas
  • More storm reports when warnings were in effect
• Cell mis-association by cell tracking algorithms
Precipitation
• Radar rain rates contain error
• Scale mismatch between gauge observations and radar pixels
Observation error can be large but is usually neglected → more research required on handling observation error
Matching forecasts and observations
• Matching approach depends on
  • Nature of forecasts and observations (scale, consistency, sparseness)
  • Verification goals
  • Use of forecasts
  • Other matching criteria
• Matching approach can impact verification results
• Grid-to-grid approach
  • Overlay forecast and observed grids
  • Match each forecast and observation
[Figure: point-to-grid and grid-to-point matching between a forecast grid and an observed grid]
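To make the grid-to-point option concrete, here is a minimal Python sketch of nearest-neighbour matching; the grid, coordinates, and rain values are all hypothetical.

```python
import numpy as np

def match_grid_to_point(grid, lats, lons, obs_lat, obs_lon):
    """Grid-to-point matching: take the forecast value from the grid
    box nearest the observation site (nearest-neighbour)."""
    i = np.abs(lats - obs_lat).argmin()  # nearest latitude row
    j = np.abs(lons - obs_lon).argmin()  # nearest longitude column
    return grid[i, j]

# Hypothetical 1-degree forecast grid of rain rates (mm/h)
lats = np.arange(-40.0, -30.0, 1.0)
lons = np.arange(140.0, 150.0, 1.0)
rng = np.random.default_rng(0)
fcst = rng.random((lats.size, lons.size)) * 10.0

print(match_grid_to_point(fcst, lats, lons, obs_lat=-33.9, obs_lon=144.2))
```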
Forecast Quality Definitions
Wilson subjective categories
1 – forecast and observed almost perfectly overlap
2 – majority of observed and forecast echoes overlap, or offsets < 50 km
3 – forecast and observed look similar, but there are a number of echo offsets and several areas may be missing or extra
4 – the forecast and observed are significantly different with very little overlap, but some features are suggestive of what actually occurred
5 – there is no resemblance between forecast and observed
First rule of forecast verification – look at the results!
Systematic verification – many cases
Aggregation and stratification
• Aggregation
  • More samples → more robust statistics
  • Across time – results for each point in space
  • Across space – results for each time
  • Across space and time – results summarized across a spatial region and across time
• Stratification
  • Homogeneous subsamples → better understanding of how errors depend on regime
  • By location or region
  • By time period (diurnal or seasonal variation)
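A minimal Python sketch of the aggregation/stratification distinction, using an invented sample of forecast/observed pairs and POD as the score:

```python
# Hypothetical verification sample: (region, hour, forecast_yes, observed_yes)
sample = [
    ("coast",  3, 1, 1), ("coast",  9, 1, 0), ("coast", 15, 0, 1),
    ("inland", 3, 0, 0), ("inland", 9, 1, 1), ("inland", 15, 1, 1),
]

def pod(pairs):
    """Probability of detection = hits / (hits + misses)."""
    hits = sum(1 for f, o in pairs if f == 1 and o == 1)
    misses = sum(1 for f, o in pairs if f == 0 and o == 1)
    return hits / (hits + misses)

# Aggregation: pool every pair in space and time for one robust score
print("all cases:", pod([(f, o) for _, _, f, o in sample]))

# Stratification: score each homogeneous subsample separately
for region in ("coast", "inland"):
    print(region, pod([(f, o) for r, _, f, o in sample if r == region]))
```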
Real-time nowcast verification
• Rapid feedback from latest radar scan
• Evaluate the latest objective guidance while it is still "fresh"
• Better understand strengths and weaknesses of nowcast system
• Tends to be subjective in nature
• Not commonly performed!
[Figure: probability of exceedance of rain rate (mm/h) for radar analysis and LAPS05, from the real time forecast verification system (RTFV) under development in BMRC]
Post-event verification
• More observations may be available → verification results more robust
• No single measure is adequate! Several metrics needed:
  • distributions-oriented verification
  • scatter plots
  • (multi-category) contingency tables
  • box-whisker plots
• Confidence intervals recommended, especially when comparing one set of results with another
  • Bootstrap (resampling) method simple to apply
[Figure: time series of POD, frequency bias, CSI and FAR]
Accuracy – categorical verification
Contingency table of forecast/observed counts:

                  observed yes       observed no
  forecast yes    H (hits)           F (false alarms)
  forecast no     M (misses)         CR (correct rejections)

Standard categorical verification scores:
  proportion correct (accuracy): PC = (H + CR) / N
  frequency bias: Bias = (F + H) / (M + H)
  probability of detection: POD = H / (H + M)
  probability of false detection: POFD = F / (CR + F)
  false alarm ratio: FAR = F / (H + F)
  critical success index (threat score): CSI = H / (H + M + F)
  equitable threat score: ETS = (H – Hrandom) / (H + M + F – Hrandom)
  Heidke skill score: HSS = (H + CR – PCrandom) / (N – PCrandom)
  Hanssen and Kuipers discriminant: HK = POD – POFD
  odds ratio: OR = (H × CR) / (F × M)
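These scores are simple to compute from paired yes/no arrays. A minimal Python sketch with illustrative data, where Hrandom and PCrandom are the usual chance-expected counts:

```python
import numpy as np

def categorical_scores(fcst_yes, obs_yes):
    """Standard categorical scores from paired yes/no forecasts and
    observations (a minimal sketch)."""
    f = np.asarray(fcst_yes, bool)
    o = np.asarray(obs_yes, bool)
    H, M = np.sum(f & o), np.sum(~f & o)      # hits, misses
    F, CR = np.sum(f & ~o), np.sum(~f & ~o)   # false alarms, correct rejections
    N = H + M + F + CR
    H_rand = (H + M) * (H + F) / N                           # chance hits
    PC_rand = ((H + M) * (H + F) + (CR + M) * (CR + F)) / N  # chance correct
    return {
        "PC":   (H + CR) / N,                # proportion correct
        "Bias": (F + H) / (M + H),           # frequency bias
        "POD":  H / (H + M),                 # probability of detection
        "POFD": F / (CR + F),                # probability of false detection
        "FAR":  F / (H + F),                 # false alarm ratio
        "CSI":  H / (H + M + F),             # critical success index
        "ETS":  (H - H_rand) / (H + M + F - H_rand),
        "HSS":  (H + CR - PC_rand) / (N - PC_rand),
        "HK":   H / (H + M) - F / (CR + F),  # POD - POFD
        "OR":   (H * CR) / (F * M),          # odds ratio
    }

print(categorical_scores([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```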
Accuracy – continuous verification
Standard continuous verification scores, computed over the entire domain for forecast F and observations O (overbars denote the domain mean):
  mean error: bias = (1/N) Σ (F – O)
  mean absolute error: MAE = (1/N) Σ |F – O|
  root mean square error: RMSE = sqrt[ (1/N) Σ (F – O)² ]
  correlation coefficient: r = Σ (F – F̄)(O – Ō) / sqrt[ Σ (F – F̄)² · Σ (O – Ō)² ]
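A minimal Python sketch of these four scores for paired forecast and observed values (the data are illustrative):

```python
import numpy as np

def continuous_scores(f, o):
    """Bias, MAE, RMSE and correlation for paired forecast (f) and
    observed (o) values."""
    f, o = np.asarray(f, float), np.asarray(o, float)
    return {
        "bias": np.mean(f - o),                  # mean error
        "MAE":  np.mean(np.abs(f - o)),          # mean absolute error
        "RMSE": np.sqrt(np.mean((f - o) ** 2)),  # root mean square error
        "r":    np.corrcoef(f, o)[0, 1],         # correlation coefficient
    }

print(continuous_scores([2.0, 5.0, 1.0, 0.0], [1.5, 6.0, 0.0, 0.5]))
```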
Accuracy – probabilistic verification
Standard probabilistic verification scores/methods:
  Reliability diagram
  Relative operating characteristic (ROC)
  Brier score: BS = (1/N) Σ (p_i – o_i)², i = 1…N
  Brier skill score: BSS = 1 – BS / BSreference
  Ranked probability score: RPS = (1/(M–1)) Σ (CDFfcst,m – CDFobs,m)², m = 1…M
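A minimal Python sketch of the three scores; the probabilities, outcomes and reference forecast are illustrative, and the RPS function handles one M-category forecast at a time:

```python
import numpy as np

def brier_score(p, o):
    """Brier score for probability forecasts p of binary outcomes o."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

def brier_skill_score(p, o, p_ref):
    """Brier skill score relative to a reference forecast (e.g. climatology)."""
    return 1.0 - brier_score(p, o) / brier_score(p_ref, o)

def rps(fcst_probs, obs_category):
    """Ranked probability score for one M-category probability forecast:
    mean squared difference between forecast and observed CDFs."""
    cdf_f = np.cumsum(fcst_probs)
    cdf_o = (np.arange(len(fcst_probs)) >= obs_category).astype(float)
    return np.sum((cdf_f - cdf_o) ** 2) / (len(fcst_probs) - 1)

# Illustrative numbers only
p, o = [0.9, 0.2, 0.7, 0.1], [1, 0, 1, 1]
print(brier_score(p, o), brier_skill_score(p, o, p_ref=[0.5] * 4))
print(rps([0.2, 0.5, 0.3], obs_category=1))  # 3 categories, middle observed
```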
Skill
A forecast has skill if it is more accurate than a reference forecast (usually persistence, cell extrapolation, or random chance).
Skill scores measure the relative improvement of the forecast over the reference forecast:
  skill = (scoreforecast – scorereference forecast) / (scoreperfect – scorereference forecast)
Strategy 1: Plot the performance of the forecast system and the unskilled reference on the same diagram
[Figure: Hanssen & Kuipers score vs. forecast lead time (30–180 min) at thresholds > 0 mm, > 1 mm and > 5 mm, for nowcast, extrapolation and gauge persistence]
Strategy 2: Plot the value of the skill score
[Figure: skill with respect to gauge persistence vs. forecast lead time (30–180 min), for nowcast and extrapolation]
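The skill score formula is one line of code. A sketch using hypothetical RMSE values against a persistence reference (for RMSE the perfect score is 0):

```python
def skill_score(score_fcst, score_ref, score_perfect):
    """Generic skill score: relative improvement of the forecast over
    a reference such as persistence, extrapolation or chance."""
    return (score_fcst - score_ref) / (score_perfect - score_ref)

# Hypothetical RMSE values: nowcast 2.1 mm/h, persistence 3.0 mm/h
# -> skill = 0.3, i.e. 30% of the possible improvement is realized
print(skill_score(2.1, 3.0, 0.0))
```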
Practically perfect hindcast – upper bound on accuracy
Approach: If the forecaster had all of the observations in advance, what would the "practically perfect" forecast look like?
• Apply a smoothing function to the observations to get probability contours, choose an appropriate yes/no threshold
• Did the actual forecast look like the practically perfect forecast?
• How did the performance of the actual forecast compare to the performance of the practically perfect forecast?
Ex: SPC convective outlook CSI = 0.34, practically perfect hindcast CSI = 0.48 → the convective outlook was 75% of the way to being "practically perfect" (Kay and Brooks, 2000)
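A minimal Python sketch of building a practically perfect hindcast; the Gaussian smoother, grid, and threshold here are assumptions for illustration, not the exact procedure of Kay and Brooks:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Observed yes/no event grid (locations are invented)
obs = np.zeros((50, 50))
obs[20:25, 30:35] = 1.0

# Smooth the observations into a probability-like field
probs = gaussian_filter(obs, sigma=5.0)
probs /= probs.max()  # rescale to [0, 1]

# Threshold to get the yes/no "practically perfect" forecast
pperfect = probs > 0.3
print(pperfect.sum(), "grid boxes in the practically perfect forecast")
```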
"Double penalty"
Event predicted where it did not occur, no event predicted
where it did occur
Big problem for nowcasts and other high resolution forecasts
Ex: Two rain forecasts giving the same volume
fcst
obs
10
10
High resolution forecast
RMS ~ 4.7
POD=0, FAR=1, CSI=0
fcst
3
obs
10
Low resolution forecast
RMS ~ 2.7
POD~1, FAR~0.7, CSI~0.3
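A small numeric illustration of the double penalty in Python; the 1-D rain fields are invented, so the scores differ from the slide's values but show the same effect:

```python
import numpy as np

# Same rain volume (20 mm): the sharp forecast is displaced from the
# observed rain, the smooth forecast covers it with a lower amount.
obs    = np.array([0, 0, 10, 10, 0, 0, 0, 0, 0, 0], float)
sharp  = np.array([0, 0, 0, 0, 0, 10, 10, 0, 0, 0], float)  # displaced
smooth = np.array([0] + [2.5] * 8 + [0], float)              # broad, weak

for name, f in [("sharp", sharp), ("smooth", smooth)]:
    rmse = np.sqrt(np.mean((f - obs) ** 2))
    fy, oy = f > 0, obs > 0
    H, M, F = np.sum(fy & oy), np.sum(~fy & oy), np.sum(fy & ~oy)
    print(name, f"RMSE={rmse:.1f} POD={H/(H+M):.2f} "
                f"FAR={F/(H+F):.2f} CSI={H/(H+M+F):.2f}")
```

The sharp forecast is penalized twice (a miss where rain fell, a false alarm where it was placed) and scores worse on every measure, despite being the more realistic-looking forecast.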
Value
A forecast has value if it helps a user make a better decision.
Value scores measure the relative economic value of the forecast over some reference forecast:
  value = (expenseforecast – expensereference forecast) / (expenseperfect – expensereference forecast)
Expense depends on the cost of taking preventative action and the loss incurred for a missed event.
• Small or rare events with high losses: value maximized by over-prediction
• Events with high costs and displacement error likely: value maximized by under-prediction
The most accurate forecast is not always the most valuable! (Baldwin and Kain, 2004)
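One common way to make the expense terms concrete is a static cost-loss model; the sketch below assumes that model (not necessarily the formulation of Baldwin and Kain) with hypothetical contingency counts, cost C and loss L:

```python
def expected_expense(H, M, F, CR, C, L):
    """Mean expense per case if the user acts on every 'yes' forecast:
    pay cost C for each action, loss L for each missed event."""
    N = H + M + F + CR
    return ((H + F) * C + M * L) / N

def relative_value(H, M, F, CR, C, L):
    """Relative economic value vs. the better of 'always act'/'never act'."""
    s = (H + M) / (H + M + F + CR)  # event base rate
    e_fcst = expected_expense(H, M, F, CR, C, L)
    e_perfect = s * C               # act exactly when the event occurs
    e_ref = min(C, s * L)           # cheapest no-skill strategy
    return (e_ref - e_fcst) / (e_ref - e_perfect)

# Hypothetical warning system: rare event, cheap action, large loss
print(relative_value(H=8, M=2, F=30, CR=960, C=1.0, L=50.0))
```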
Exact match vs. "close enough"
Need we get a high resolution forecast exactly right? Often "close" is still useful to a forecaster.
YES, exact match needed:
• High stakes situations (e.g. space shuttle launch, hurricane landfall)
• Hydrological applications (e.g. flash floods)
• Topographically influenced weather (valley winds, orographic rain, etc.)
→ Standard verification methods appropriate (POD, FAR, CSI, bias, RMSE, correlation, etc.)
NO, "close" is good enough:
• Guidance for forecasters
• Model validation (does it predict what we expect it to predict?)
• Observations may not allow standard verification of high resolution forecasts
→ "Fuzzy" verification methods, diagnostic methods; verify attributes of the forecast
"Fuzzy" verification methods
• Large forecast and observed variability at high resolution
Sydney
Forecasters don't (shouldn't!) take a high resolution forecast
at face value – instead they interpret it in a probabilistic way.
• Fuzzy verification methods don't require an exact match between
forecasts and observations to get a good score
• Damrath, 2004
• Rezacova and Sokol, 2004 *
t-1
• Theis et al., 2005
t
• Roberts, 2004 *
t+1
Frequency
• Vary the size of the space / time neighborhood around a point
Forecast value
• Germann and Zawadski, 2004
• Also vary magnitude, other elements
• Atger, 2001
• Evaluate using categorical, continuous, probabilistic
scores / methods
* Giving a talk in this Symposium
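A minimal Python sketch of the neighborhood idea: POD recomputed as the spatial window around each observed event grows. The grids and the displaced forecast are illustrative, not any one published method:

```python
import numpy as np

def neighborhood_pod(fcst, obs, radius):
    """Fuzzy (neighborhood) verification sketch: an observed event counts
    as a hit if any forecast event lies within `radius` grid boxes."""
    hits = misses = 0
    for i, j in zip(*np.nonzero(obs)):  # observed event locations
        window = fcst[max(0, i - radius):i + radius + 1,
                      max(0, j - radius):j + radius + 1]
        if window.any():
            hits += 1
        else:
            misses += 1
    return hits / (hits + misses)

# Illustrative grids: forecast event displaced 2 boxes from observed
obs = np.zeros((20, 20), int);  obs[10, 10] = 1
fcst = np.zeros((20, 20), int); fcst[12, 10] = 1
for radius in (0, 1, 2, 3):
    print(radius, neighborhood_pod(fcst, obs, radius))
```

The exact-match score (radius 0) is zero, but the forecast is rewarded once the neighborhood is large enough to absorb the displacement.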
Spatial multi-event contingency table
Verify using the Relative Operating Characteristic (ROC)
• Measures how well the forecast can separate events from non-events based on some decision threshold
• Decision thresholds to vary:
  • magnitude (ex: 1 mm h-1 to 20 mm h-1)
  • distance from point of interest (ex: within 10 km, ..., within 100 km)
  • timing (ex: within 1 h, ..., within 12 h)
  • anything else that may be important in interpreting the forecast
• Can apply to ensembles, and to compare deterministic forecasts to ensemble forecasts
(Atger, 2001)
[Figures: ROC curve for a single threshold; ROC curve for varying rain threshold; ROC curve for an ensemble forecast with varying rain threshold]
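A minimal Python sketch of building ROC points by varying the magnitude threshold; the synthetic forecast/observation pairs and the fixed 1 mm/h observed-event definition are assumptions (distance and timing thresholds work the same way):

```python
import numpy as np

def roc_points(fcst, obs, thresholds):
    """ROC sketch: (POFD, POD) pairs as the decision threshold varies."""
    obs_yes = np.asarray(obs) >= 1.0  # fixed observed event: >= 1 mm/h
    points = []
    for t in thresholds:
        fcst_yes = np.asarray(fcst) >= t
        H = np.sum(fcst_yes & obs_yes)
        M = np.sum(~fcst_yes & obs_yes)
        F = np.sum(fcst_yes & ~obs_yes)
        CR = np.sum(~fcst_yes & ~obs_yes)
        points.append((F / (F + CR), H / (H + M)))  # (POFD, POD)
    return points

rng = np.random.default_rng(0)
obs = rng.gamma(0.5, 2.0, 500)           # synthetic rain rates (mm/h)
fcst = obs + rng.normal(0, 1.0, 500)     # noisy "forecast" of them
print(roc_points(fcst, obs, thresholds=[1, 2, 5, 10, 20]))
```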
Object- and entity-based verification
• Consistent with human interpretation
• Provides diagnostic information on whole-system properties:
  • Location
  • Amplitude
  • Size
  • Shape
• Techniques:
  • Contiguous Rain Area (CRA) verification (Ebert and McBride, 2000)
  • NCAR object-oriented approach* (Brown et al., 2004)
  • Cluster analysis (Marzban and Sandgathe, 2005)
  • Composite method (Nachamkin, 2004)
[Figures: NCAR object-oriented matching of forecast objects (Af–Df) to observed objects (Ao–Do); 8 clusters identified in x-y-p space for MM5 forecasts]
Contiguous Rain Area (CRA) verification
• Define entities using a threshold (Contiguous Rain Areas)
• Horizontally translate the forecast until a pattern matching criterion is met:
  • minimum total squared error
  • maximum correlation
  • maximum overlap
• The displacement is the vector difference between the original and final locations of the forecast
• Compare properties of matched entities:
  • area
  • mean intensity
  • max intensity
  • shape, etc.
(Ebert and McBride, 2000)
[Figure: observed and forecast rain entities before and after translation]
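A minimal Python sketch of the translation step using the minimum-total-squared-error criterion; the entities and grid are illustrative, and a real CRA implementation would first isolate each contiguous rain area:

```python
import numpy as np

def best_shift(fcst, obs, max_shift=5):
    """Translate the forecast over a search box and keep the shift that
    minimizes the total squared error against the observations."""
    best, best_err = (0, 0), np.inf
    for di in range(-max_shift, max_shift + 1):
        for dj in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(fcst, di, axis=0), dj, axis=1)
            err = np.sum((shifted - obs) ** 2)
            if err < best_err:
                best_err, best = err, (di, dj)
    return best, best_err

# Illustrative entities: forecast rain blob displaced from the observed one
obs = np.zeros((30, 30));  obs[10:15, 10:15] = 8.0
fcst = np.zeros((30, 30)); fcst[13:18, 8:13] = 8.0
print(best_shift(fcst, obs))  # shift that best aligns forecast with obs
```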
Error decomposition methods
• Attempt to quantify the causes of the errors
• Some approaches:
• CRA verification (Ebert and McBride, 2000)
MSEtotal = MSEdisplacement + MSEvolume + MSEpattern
• Feature calibration and alignment (Nehrkorn et al., 2003)
E(x,y) = Ephase(x,y) + Elocal bias(x,y) + Eresidual(x,y)
• Acuity-fidelity approach (Marshall et al., 2004)
minimize cost function: J = Jdistance + Jtiming + Jintensity + Jmisses
from both perspectives of forecast (fidelity) and observations (acuity)
• Error separation (Ciach and Krajewski, 1999)
MSEforecast = MSEtrue + MSEreference
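A minimal Python sketch of the CRA decomposition, given the best-match shifted forecast from the translation step above; the fields are illustrative:

```python
import numpy as np

def cra_decomposition(fcst, obs, shifted_fcst):
    """CRA-style error decomposition (after Ebert and McBride, 2000):
    MSEtotal = MSEdisplacement + MSEvolume + MSEpattern."""
    mse_total = np.mean((fcst - obs) ** 2)
    mse_shifted = np.mean((shifted_fcst - obs) ** 2)
    mse_displacement = mse_total - mse_shifted            # removed by the shift
    mse_volume = (shifted_fcst.mean() - obs.mean()) ** 2  # mean (volume) bias
    mse_pattern = mse_shifted - mse_volume                # what remains
    return mse_displacement, mse_volume, mse_pattern

# Illustrative fields: forecast blob displaced and too intense
obs = np.zeros((30, 30));  obs[10:15, 10:15] = 8.0
fcst = np.zeros((30, 30)); fcst[13:18, 8:13] = 10.0
shifted = np.roll(np.roll(fcst, -3, axis=0), 2, axis=1)  # best-match shift
print(cra_decomposition(fcst, obs, shifted))
```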
Scale separation methods
• Measure correspondence between forecast and observations at a variety of spatial scales
• Some approaches:
  • Multiscale statistical properties (Zepeda-Arce et al., 2000; Harris et al., 2001)
  • Scale recursive estimation (Tustison et al., 2003)
  • Intensity-scale approach* (Casati et al., 2004)
[Figure: scale recursive estimation tree linking satellite, model, radar and rain gauge data across scales l = 0 to l = 3]
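A minimal Python sketch of the scale-separation idea, using simple 2x2 block averaging rather than the wavelet or recursive machinery of the cited approaches; the fields are synthetic:

```python
import numpy as np

def mse_by_scale(fcst, obs, levels=3):
    """Compare forecast and observations after averaging to successively
    coarser grids (one 2x2 block-mean coarsening per level)."""
    out = []
    f, o = np.asarray(fcst, float), np.asarray(obs, float)
    for level in range(levels + 1):
        out.append((level, np.mean((f - o) ** 2)))
        # coarsen by one level: average non-overlapping 2x2 blocks
        f = f.reshape(f.shape[0] // 2, 2, f.shape[1] // 2, 2).mean(axis=(1, 3))
        o = o.reshape(o.shape[0] // 2, 2, o.shape[1] // 2, 2).mean(axis=(1, 3))
    return out

# Synthetic 16x16 fields: small-scale noise washes out at coarser scales
rng = np.random.default_rng(1)
obs = rng.random((16, 16))
fcst = obs + rng.normal(0, 0.2, (16, 16))
print(mse_by_scale(fcst, obs))
```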
Summary
• Nowcasts and very short range forecasts present some unique challenges for verification
  • High impact weather
  • High resolution forecasts
  • Imperfect observations
• There is still a place for standard scores
  • Historical reasons
  • When highly accurate forecasts are required
  • Useful for monitoring improvement
  • Must use several metrics
  • Please quantify uncertainty, especially when intercomparing forecast schemes
  • Compare with unskilled forecast such as persistence
Summary (cont'd)
• Evolving concept of what makes a "good" forecast
  • Recognizing the value of "close enough"
  • Probabilistic view of deterministic forecasts
• Exciting new developments in diagnostic methods to better understand the nature and causes of forecast errors
  • Object- and entity-based verification
  • Error decomposition
  • Scale separation
http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html