CSNI Workshop on Testing PSHA Results and Benefit of Bayesian Techniques for Seismic Hazard Assessment
Pavia, Italy (4-6 February 2015)

Metrics, observations, and biases in quantitative assessment of seismic hazard model predictions
Edward Brooks1, Seth Stein1, Bruce D. Spencer2, Antonella Peresan3,4
1 Department of Earth & Planetary Sciences and Institute for Policy Research, Northwestern University, Evanston, Illinois, USA
2 Department of Statistics and Institute for Policy Research, Northwestern University, Evanston, Illinois, USA
3 Department of Mathematics and Geosciences, University of Trieste, Italy
4 SAND Group, ICTP, Trieste, Italy
Forecasting ground shaking: many maps… and many questions
• What's going wrong with existing maps?
• How can we improve forecasts?
• How can we quantify their uncertainties?
• How can we measure their performance?
• How do we know when to update them?
• How good do they have to be to be useful?
• How do we make sensible policy given forecasts' limitations?
[Figure: Geller, 2011]
Geller (2011) argued that
“all of Japan is at risk from
earthquakes, and the
present state of
seismological science does
not allow us to reliably
differentiate the risk level in
particular geographic areas,”
so a map showing uniform
hazard would be preferable
to the existing maps.
How should we
test this idea?
How good a baseball player was
Babe Ruth?
The answer depends on the
metric used.
In many seasons Ruth led the
league in both home runs and in
the number of times he struck
out.
By one metric he did very well,
and by another, very poorly.
From users’ perspective,
what specifically should hazard maps seek to
accomplish?
Different users likely want different things
How do we measure how well they meet
users' requirements?
No agreed way yet…
Lessons from meteorology
• Weather forecasts are routinely evaluated to assess how well their predictions matched what actually occurred: "it is difficult to establish well-defined goals for any project designed to enhance forecasting performance without an unambiguous definition of what constitutes a good forecast." (Murphy, 1993)
• Information about how a forecast performs is crucial in determining how best to use it. The better a weather forecast has worked to date, the more we factor it into our daily plans.
Choosing appropriate
metrics is crucial in
assessing performance of
forecasts.
Silver (2012) shows that TV
weather forecasts have a "wet
bias" - predicting more rain
than actually occurs, probably
because they feel that
customers accept
unexpectedly sunny weather,
but are annoyed by
unexpected rain.
From users’ perspective,
what specifically should hazard maps
seek to accomplish?
How do we measure how well they do it?
How much can we improve them?
How can we quantify their large
uncertainties?
How to measure map
performance?
Implicit probabilistic map criterion: after the appropriate time, the predicted shaking should have been exceeded at only a fraction p of sites.
Define the fractional site exceedance metric
M0(f,p) = |f – p|
where f is the fraction of sites at which the observed shaking exceeded the map's prediction.
The ideal map has M0 = 0.
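As a concrete illustration, here is a minimal Python sketch of M0 (the function and variable names are ours, not from the presentation), computed from the map's predicted shaking and the observed maximum shaking at a set of sites:

    import numpy as np

    def fractional_exceedance_metric(predicted, observed, p):
        # f = fraction of sites where observed shaking exceeded the map's prediction
        s = np.asarray(predicted, dtype=float)
        x = np.asarray(observed, dtype=float)
        f = np.mean(x > s)
        # M0(f, p) = |f - p|; an ideal map gives M0 = 0
        return abs(f - p)

    # Example: a map designed so shaking should exceed the mapped value
    # at only 10% of sites over the chosen time window (p = 0.1)
    # m0 = fractional_exceedance_metric(s_map, x_observed, p=0.1)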
Fractional site exceedance is a useful metric but only tells part of the story.
Both maps have M0 = 0 and are thus successful, but…
This map exposed some sites to much
greater shaking than predicted. This
situation could reflect faults that had
larger earthquakes than assumed.
Fractional site exceedance is a useful metric but only tells part of the story.
All these maps have M0 = 0 and are thus successful, but…
This map significantly
overpredicted shaking,
which could arise from
overestimating the
magnitude of the
largest earthquakes.
Other metrics can
provide additional
information beyond
the fractional site
exceedance M0
Squared misfit to the data
M1(s,x) = Σi (xi − si)² / N
measures how well the predicted shaking si compares to the highest observed shaking xi at each site.
From a purely seismological
view, M1 tells us more than
M0 about how well a map
performed.
Other metrics can
provide additional
information beyond
the fractional site
exceedance M0
Because underprediction does
potentially more harm than
overprediction, we could weight
underprediction more heavily.
Asymmetric squared misfit
M2(s,x) = Σi wi (xi − si)² / N
with wi = a for (xi − si) > 0 and wi = b for (xi − si) ≤ 0.
This is more useful for hazard mitigation than M1.
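A minimal Python sketch of these two misfit metrics follows; the asymmetry weights a and b are illustrative only (the slides do not give values), with a > b so underprediction is penalized more heavily:

    import numpy as np

    def squared_misfit(predicted, observed):
        # M1(s, x) = sum_i (x_i - s_i)^2 / N
        s, x = np.asarray(predicted, float), np.asarray(observed, float)
        return np.mean((x - s) ** 2)

    def asymmetric_squared_misfit(predicted, observed, a=2.0, b=1.0):
        # M2(s, x) = sum_i w_i (x_i - s_i)^2 / N,
        # with w_i = a where the map underpredicted (x_i - s_i > 0)
        # and w_i = b otherwise
        s, x = np.asarray(predicted, float), np.asarray(observed, float)
        w = np.where(x - s > 0, a, b)
        return np.mean(w * (x - s) ** 2)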
Other metrics can
provide additional
information beyond
the fractional site
exceedance M0
Shaking-weighted
asymmetric squared misfit
We could use larger
weights for areas predicted
to be the most hazardous,
so the map is judged most
on how it does there.
Other metrics can
provide additional
information beyond
the fractional site
exceedance M0
Exposure-weighted
asymmetric squared misfit
We could use larger
weights for areas with the
largest exposure of people
or property, so the map is
judged most on how it
does there.
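The shaking-weighted and exposure-weighted variants can be sketched by adding a per-site weight vector on top of the asymmetric weights; the names and the mean-one normalization below are our assumptions, not specified on the slides:

    import numpy as np

    def weighted_asymmetric_misfit(predicted, observed, site_weights, a=2.0, b=1.0):
        # site_weights: larger where the predicted hazard, or the exposed
        # population/property, is largest, so the map is judged mostly there
        s, x = np.asarray(predicted, float), np.asarray(observed, float)
        w_site = np.asarray(site_weights, float)
        w_site = w_site / w_site.mean()        # normalize to mean 1 for comparability
        w_asym = np.where(x - s > 0, a, b)     # penalize underprediction more
        return np.mean(w_site * w_asym * (x - s) ** 2)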
Although no single
metric fully
characterizes map
performance, using
several metrics can
provide valuable
insight for assessing
and improving
hazard maps
Comparing maps could be done via the skill score
SS(s,r,x) = 1 - M(s,x) / M(r,x)
where M is any of the metrics, x is the maximum observed
shaking, s is the map prediction, and r is the prediction of a
reference map produced using a selected null hypothesis (e.g.
uniform hazard).
The skill score would be positive if the map's predictions did
better than those of the map made with the null hypothesis, and
negative if they did worse.
We could assess how well maps have done after a certain time,
and whether successive generations of maps do better.
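A sketch of the skill score in the same style, reusing any of the metric functions above (the uniform-hazard reference map is only one example of a null hypothesis):

    def skill_score(metric, predicted, reference, observed):
        # SS(s, r, x) = 1 - M(s, x) / M(r, x):
        # positive if the map beats the reference (null-hypothesis) map,
        # negative if it does worse
        return 1.0 - metric(predicted, observed) / metric(reference, observed)

    # Example: compare a hazard map against a uniform-hazard reference using M1
    # ss = skill_score(squared_misfit, s_map, r_uniform, x_observed)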
[Figure: 217 BC – 2002 AD (Nekrasova et al., 2014)]
One possible space-time sampling bias…
• The probabilistic map with 2% probability of exceedance in 50 years (i.e. ground shaking expected on average once in about 2475 years; see the conversion below) significantly overestimates the shaking reported over a comparable time span (about 2200 years).
• The deterministic map, which is not associated with a specific time span, also tends to overestimate the ground shaking with respect to past earthquakes.
The historical catalog is thought to be incomplete (Stucchi et al., 2004) and may underestimate the largest shaking due to space-time sampling bias.
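For reference, the return period quoted in the first bullet follows from the standard Poisson relation between exceedance probability p over exposure time t and return period T:

    T = -t / ln(1 - p) = -50 / ln(1 - 0.02) ≈ 2475 years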
Dependence of seismic hazard estimates on the time span of the input catalog: NDSHA map
[Panels: a) TOTAL – [1000,1500); b) TOTAL – [1500,2000)]
Intensity differences between the NDSHA map obtained for the entire catalog (TOTAL) and the maps obtained for the 500-year time intervals: a) [1000,1500) and b) [1500,2000).
Dependence of seismic hazard estimates on the time span of the input catalog: NDSHA map
[Panels: a) TOTAL – [1000,1500); b) TOTAL – [1500,2000)]
Intensity differences between the NDSHA map obtained for the entire catalog (TOTAL) and the maps obtained, considering the seismogenic nodes, for the time intervals: a) [1000,1500) and b) [1500,2000).
Options after an earthquake yields shaking larger than anticipated:
Either regard the high shaking as a low-probability event allowed by the map,
Or, as usually done, accept that the high shaking was not simply a low-probability event and revise the map.
No formal or objective
criteria are used to
decide whether to
change map & how
Done via BOGSAT
(“Bunch Of Guys
Sitting Around Table”)
Challenge: a new map
that better describes the
past may or may not
better predict the future
Deciding whether to remake a map
is like deciding after a coin has come up heads a
number of times whether to continue assuming that
the coin is fair and the run is a low-probability event,
or to change to a model in which the coin is
assumed to be biased.
Changing the model may describe the future worse.
Bayes' Rule – how much to change depends on one's confidence in the prior model:
Revised (posterior) probability model ∝
Likelihood of observations given the prior model
× Prior probability model
If you were confident that the coin was fair, you would
probably not change your model. If you were given the coin
at a magic show, your confidence would be lower and you
would be more likely to change your model.
Assume Poisson earthquake recurrence with rate λ = 1/T = 1/50 = 0.02 per year.
This estimate is assumed (prior) to have mean μ and standard deviation σ.
If an earthquake occurs after only 1 year:
The updated forecast, described by the
posterior mean, increasingly differs
from the initial forecast (prior mean)
when the uncertainty in the prior
distribution is larger. The less
confidence we have in the prior model,
the more a new datum can change it.
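A minimal Python sketch of this update, assuming a conjugate gamma prior on λ (the slides specify only the prior mean μ and standard deviation σ, not its functional form, so the gamma choice is our assumption):

    def gamma_prior_params(mu, sigma):
        # Gamma prior on the Poisson rate lambda with mean mu and sd sigma:
        # shape alpha = (mu/sigma)^2, rate beta = mu/sigma^2
        return (mu / sigma) ** 2, mu / sigma ** 2

    def posterior_mean(mu, sigma, n_events, t_years):
        # Gamma-Poisson conjugacy: alpha' = alpha + n, beta' = beta + t
        alpha, beta = gamma_prior_params(mu, sigma)
        return (alpha + n_events) / (beta + t_years)

    # Prior mean 0.02 per year (one event per 50 years); one earthquake after 1 year.
    for sigma in (0.005, 0.01, 0.02):
        print(sigma, posterior_mean(0.02, sigma, n_events=1, t_years=1.0))
    # The posterior mean shifts further from 0.02 as sigma grows: the less
    # confidence in the prior, the more the single new observation changes the forecast.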
Conclusions
• We need agreed ways of assessing how well hazard maps performed and thus whether one map performed better than another.
• This information is crucial to tell how much confidence to have in using them for very expensive policy decisions.
• Although no single metric alone fully characterizes map behavior, using several metrics can provide useful insight for comparing and improving maps.
• Deciding when and how to revise hazard maps should combine BOGSAT (subjective judgement given limited information) and Bayes (ideas about parameter uncertainty).
Challenge
U.S. Meteorologists (Hirschberg et al., 2011)
have adopted a goal of “routinely providing
the nation with comprehensive, skillful,
reliable, sharp, and useful information about
the uncertainty of hydrometeorological
forecasts.”
Although seismologists have a tougher
challenge and a longer way to go, we should
try to do the same for earthquake hazards.