Oceanography 569
Oceanographic Data Analysis Laboratory
Kathie Kelly
Applied Physics Laboratory
515 Ben Hall IR Bldg
class web site: faculty.washington.edu/kellyapl/classes/ocean569_2014/
Exercise 3a: Error Estimates
Known errors for Q, T and H
Need error estimates for
• Q/(ρ c_p H)
• dT/dt
Exercise 3a: Error Estimates
Terms quite similar:
• Q/(ρ c_p H)
• dT/dt
Compare LHS variance $\langle [dT/dt - Q/(\rho c_p H)]^2 \rangle$ with error variance estimate
Exercise 3a: Error Estimates
1) Compute estimated relative error variance for heating term
2) Convert to error variance by multiplying by variance
3) Compute error variance for dT/dt (assuming uncorrelated errors)
4) Total error variance is sum of these terms
5) Compare with variance of LHS
6) Error variance > LHS variance => other terms negligible compared with errors
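The six steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the exercise solution: ρ and c_p are treated as exact, the relative errors in Q and H are assumed uncorrelated, dT/dt is assumed to be a two-point difference over an interval dt, and all function and argument names are hypothetical.

```python
from statistics import pvariance


def error_budget(heating, dTdt, rel_err_Q, rel_err_H, sigma_T, dt):
    """Return (total_error_variance, lhs_variance) for the heat budget.

    heating  : time series of Q/(rho*cp*H)
    dTdt     : time series of dT/dt, same length as heating
    rel_err_Q, rel_err_H : relative (fractional) errors in Q and H
    sigma_T  : standard error of a single temperature value
    dt       : time interval used for the two-point difference (assumed)
    """
    # 1) relative error variance of the heating term (uncorrelated errors assumed)
    rel_var = rel_err_Q ** 2 + rel_err_H ** 2
    # 2) convert to an error variance by multiplying by the variance of the term
    heating_err_var = rel_var * pvariance(heating)
    # 3) error variance of dT/dt for two independent temperature errors
    dTdt_err_var = 2.0 * sigma_T ** 2 / dt ** 2
    # 4) total error variance is the sum
    total_err_var = heating_err_var + dTdt_err_var
    # 5) variance of the left-hand side, <[dT/dt - Q/(rho*cp*H)]^2>
    resid = [a - b for a, b in zip(dTdt, heating)]
    lhs_var = sum(r * r for r in resid) / len(resid)
    return total_err_var, lhs_var


# Hypothetical numbers, for illustration only
total_err_var, lhs_var = error_budget(
    heating=[1.0, 2.0, 3.0, 4.0], dTdt=[1.1, 1.9, 3.2, 3.8],
    rel_err_Q=0.1, rel_err_H=0.2, sigma_T=0.1, dt=1.0)
# 6) if this is True, other terms are negligible compared with the errors
other_terms_negligible = total_err_var > lhs_var
```

If a centered difference over 2*dt is used instead, the dT/dt error variance in step 3 changes accordingly.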
Exercise 3b: Significance Test
Is the recent global temperature change significantly different from the previous values?
[Figure: probability of year-to-year temperature differences]
Exercise 3b: Significance Test
Differences ΔT can be normalized to N(0,1) using the Z-transform
Probability p of a Z score (or lower) can be found from a table, or
from Matlab:
p = normcdf(Z,0,1) [Matlab function]
Is the most recent temperature change significantly different from
previous values at the 5% level? (Is the probability of the recent value
smaller than 5%?)
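For readers without Matlab, the same test can be sketched in Python with the standard library. `NormalDist(0, 1).cdf(z)` plays the role of `normcdf(Z,0,1)`; the function name and the sample differences are hypothetical, and the previous differences are assumed to be roughly normal.

```python
from statistics import NormalDist, mean, stdev


def z_probability(previous_diffs, recent_diff):
    """Return (Z, p): the Z score of recent_diff and P(value <= Z) under N(0,1)."""
    mu = mean(previous_diffs)
    sigma = stdev(previous_diffs)          # sample standard deviation
    z = (recent_diff - mu) / sigma         # Z-transform
    p = NormalDist(0.0, 1.0).cdf(z)        # probability of this Z score or lower
    return z, p


# Hypothetical year-to-year differences, for illustration only
z, p = z_probability([-1.0, 0.0, 1.0], 1.96)
upper_tail = 1.0 - p   # probability of a change this large or larger
```

For an unusually large positive change, the quantity to compare with 5% is the upper tail `1 - p`; for an unusually large negative change it is `p` itself.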
Derivatives and Errors: strategy
The error variance in the derivative dT/dt is given by
How is the error variance affected by the choice of time step Δt?
Consider two cases:
1) errors are point-to-point random
2) errors have time scales long relative to Δt
Derivatives and Errors
1) For errors with little serial correlation the error variance in the
derivative decreases with increasing interval Δt
2) For errors that are highly correlated (more like a bias) there is
little advantage to increasing the interval size because the error
terms nearly cancel
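The two cases can be worked through with a short derivation. This is a sketch that assumes a two-point difference, $dT/dt \approx (T_2 - T_1)/\Delta t$, with errors $\epsilon_1$ and $\epsilon_2$ of equal variance $\sigma_\epsilon^2$ and correlation $\rho_\epsilon$ between them; these symbols are assumptions, not taken from the slides.

```latex
\sigma^2_{dT/dt}
  = \frac{\langle (\epsilon_2 - \epsilon_1)^2 \rangle}{\Delta t^2}
  = \frac{2\sigma_\epsilon^2 - 2\langle \epsilon_1 \epsilon_2 \rangle}{\Delta t^2}
  = \frac{2\sigma_\epsilon^2 \,(1 - \rho_\epsilon)}{\Delta t^2}
```

For point-to-point random errors, $\rho_\epsilon \approx 0$ and the error variance is $2\sigma_\epsilon^2/\Delta t^2$, which decreases as $\Delta t$ increases. For errors with long time scales, $\rho_\epsilon \approx 1$, so the numerator is already near zero and a longer interval gains little.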
Analysis of Variance (ANOVA)
Evaluate a dynamical or statistical model using observations
[mean and periodic signals removed]
Example: linear function
The error is the difference between the model and the observations.
The data variance NOT explained by model (error variance) is
The fraction of data variance σ² that IS explained by the model is
usually called the skill
Signal-to-noise ratio:
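The error variance and skill described on this slide can be sketched in Python as below. It assumes the usual definition, skill = 1 - (error variance)/(data variance), with means already removed from both series; the function names are hypothetical.

```python
def error_variance(obs, model):
    """Mean squared difference between model and observations."""
    return sum((o - m) ** 2 for o, m in zip(obs, model)) / len(obs)


def skill(obs, model):
    """Fraction of data variance explained by the model (can be negative)."""
    data_var = sum(o * o for o in obs) / len(obs)   # means assumed removed
    return 1.0 - error_variance(obs, model) / data_var


# Hypothetical series, for illustration only
obs = [1.0, -1.0, 2.0, -2.0]
perfect = skill(obs, obs)                    # model identical to the data
no_model = skill(obs, [0.0, 0.0, 0.0, 0.0])  # model predicts nothing
```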
Chi-Squared
A related measure of the goodness of fit of a least-squares
estimator is chi-squared (χ²)
For errors that are all about the same size this is just the error
variance divided by the data variance
which is the fraction of variance in the error so in this case
skill is given by
Analysis of Variance: correlation vs. skill
What is the relationship between correlation and skill for
By definition:
In terms of variance:
Eliminate y:
which is the skill S. This is a special case where amplitude of y is nearly the
same as that of x and
Analysis of Variance for a Model
What is the relationship between skill and correlation for
In terms of variance:
So the skill S is given by
Exercise 4: Lagged correlations
Lag correlate the SSH at one location with SSH at every other location to get this image:
[Figure: SSH, longitude-time, with lags Δx and Δt marked]
Measuring Similarity
The correlation gives a measure of the similarity of two time series
but not their magnitude
[Figure: example of a model that is highly correlated with data but smaller in magnitude]
The fraction of variance explained by the model (skill) does not
distinguish between correlations and magnitudes.
Does any metric describe both?
Taylor Diagram
[Figure: Taylor diagram, with "Truth" marked]
To compare both correlations and relative magnitudes (Taylor, 2001)
Given a model of an observed value
The mean squared error is given by
which reduces to [equation (1)]
because
Compare with law of cosines for triangle [equation (2)]
[Figure: triangle with sides σm, σo and ε, and angle θ]
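A reconstruction of equations (1) and (2), following the standard result in Taylor (2001). The symbols $m$ (model), $o$ (observations, means removed) and $\rho$ (their correlation) are assumptions; $\sigma_m$, $\sigma_o$, $\epsilon$ and $\theta$ are the labels on the triangle.

```latex
\epsilon^2 = \langle (m - o)^2 \rangle
           = \sigma_m^2 + \sigma_o^2 - 2\langle m\,o \rangle
           = \sigma_m^2 + \sigma_o^2 - 2\,\sigma_m \sigma_o\, \rho
           \qquad (1)

\text{because}\quad \langle m\,o \rangle = \rho\, \sigma_m \sigma_o

\epsilon^2 = \sigma_m^2 + \sigma_o^2 - 2\,\sigma_m \sigma_o \cos\theta
           \qquad (2)
```

Here (2) is the law of cosines for a triangle with sides $\sigma_m$ and $\sigma_o$ enclosing the angle $\theta$, and third side $\epsilon$.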
Taylor Diagram & Normalized Version
Eqns (1) and (2) match if we define
Triangle shows error contributions of both
correlation and magnitude geometrically
Normalized version:
Divide through by variance of observations σo²
or
where r is relative magnitude
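A sketch of the normalized form, assuming equation (1) is the standard Taylor (2001) relation, $\rho$ is the model-observation correlation (so that (1) and (2) match when $\cos\theta = \rho$), and $r = \sigma_m/\sigma_o$:

```latex
\frac{\epsilon^2}{\sigma_o^2} = r^2 + 1 - 2\,r\,\rho,
\qquad r = \frac{\sigma_m}{\sigma_o}

S = 1 - \frac{\epsilon^2}{\sigma_o^2} = 2\,r\,\rho - r^2
```

The second line uses skill defined as one minus the normalized error variance. It shows how skill mixes the two quantities that the Taylor diagram separates: $S$ is largest, $S = \rho^2$, when $r = \rho$, and becomes negative when $r > 2\rho$.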
Analysis of Variance for a Model
What is the relationship between skill and correlation for
In terms of variance:
So the skill S is given by
where
. For large a, S can be negative. (For small a the correlation
will be small and the assumption is violated.)
Skill: Empirical vs. Model
A good estimator has a skill near 1 (small squared error).
A linear regression is an empirical fit to data that minimizes the squared
error (least-squares fit). Its skill is positive by design.
Skill or fraction of variance explained can also be used to evaluate a
model
d_m(x,t) = f[u(x,t)]
However, models do not optimize the fit to the observations (unless
data assimilation is used) so the skill can be negative.
For example, a model that predicts the seasonal cycle of ocean
temperature with a good match of phase, but overestimates the
amplitude by more than a factor of two, gives a negative skill.
Therefore, Taylor diagrams are frequently used to evaluate models.
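A small numerical check of how amplitude alone changes the skill of a perfectly phased seasonal cycle. This is a sketch with hypothetical names: a 12-point cosine stands in for the observed cycle, the model is the same cosine scaled by an amplitude ratio r, and skill is taken as one minus error variance over data variance. Under those assumptions the skill is 1 - (1 - r)².

```python
import math


def skill(obs, model):
    """1 - (error variance)/(data variance); means assumed removed."""
    err_var = sum((o - m) ** 2 for o, m in zip(obs, model)) / len(obs)
    data_var = sum(o * o for o in obs) / len(obs)
    return 1.0 - err_var / data_var


def seasonal_skill(amplitude_ratio, n=12):
    """Skill of a model with perfect phase and the given amplitude ratio."""
    obs = [math.cos(2.0 * math.pi * k / n) for k in range(n)]
    model = [amplitude_ratio * o for o in obs]
    return skill(obs, model)


# Amplitude ratios chosen for illustration only
skills = {r: seasonal_skill(r) for r in (0.5, 1.0, 2.5)}
```

The correlation is 1 in all three cases, which is why a diagram showing correlation and relative magnitude together is more informative than skill alone.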
Lowpass Filter
Smooth data to remove errors.
What is the assumption?
Filter removes half the power at the specified time (or space) scale
Input to function "butter" in the signal processing toolbox:
Wn = 2*t/half_power
where 2*t is twice the grid spacing (the period of the Nyquist frequency)
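The cutoff calculation can be sketched as a small Python helper. The function name is hypothetical; it only computes the normalized cutoff Wn and checks that it lies between 0 and 1, as a cutoff normalized by the Nyquist frequency must.

```python
def butter_cutoff(grid_spacing, half_power_scale):
    """Normalized cutoff Wn = 2*t/half_power.

    grid_spacing     : sampling interval t (time or space units)
    half_power_scale : period or wavelength at which half the power is removed,
                       in the same units as grid_spacing
    """
    nyquist_period = 2.0 * grid_spacing      # shortest resolvable period
    wn = nyquist_period / half_power_scale   # fraction of the Nyquist frequency
    if not 0.0 < wn < 1.0:
        raise ValueError("half-power scale must exceed twice the grid spacing")
    return wn


# Hypothetical example: daily data, half power at a 10-day period
wn = butter_cutoff(1.0, 10.0)
```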
Other Filters
Highpass filter: removes low frequencies (or large spatial scales)
Bandpass filter: removes low and high frequencies (small and large scales)
Filtering and Correlations
Lowpass filter: what happens to the integral length scale?
How does this affect N*?
How does this affect the significance level for correlations?
Linear Algebra Review (2)
Linear Algebra Review (3)
B=A\C
Linear Algebra Review (1)
Linear Regression
Linear Regression
Linear Regression (cont’d)
[Figure: geometric view of regression; labels: observation z, estimate x, estimate error y, best estimate, minimum error]
Linear Regression (cont’d)
y = X β
β = X\y
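For an overdetermined system, Matlab's backslash returns the least-squares coefficients. A minimal pure-Python sketch of the same idea is given below. It solves the normal equations (XᵀX) b = Xᵀy by Gauss-Jordan elimination, which is a swapped-in technique chosen for brevity and is not how Matlab computes X\y internally; the function name and the example data are hypothetical.

```python
def solve_least_squares(X, y):
    """Least-squares coefficients b minimizing |X b - y|^2.

    X is a list of rows (one per observation), y a list of observations.
    """
    n_obs, n_par = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n_obs)) for c in range(n_par)]
         + [sum(X[i][r] * y[i] for i in range(n_obs))]
         for r in range(n_par)]
    for col in range(n_par):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, n_par), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("columns of X are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row
        for r in range(n_par):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][n_par] / A[r][r] for r in range(n_par)]


# Hypothetical data lying exactly on y = 1 + 2x; columns of X are 1 and x
x = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, xi] for xi in x]
y = [1.0 + 2.0 * xi for xi in x]
coeffs = solve_least_squares(X, y)
```

The normal equations are fine for a few well-scaled variables; for many or nearly redundant variables a QR-based solver is the safer choice.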
Linear Regression: limiting the number of variables
Note: fitting data to a curve is a simple form of linear regression in
which variables X are 1, x, x², x³, ...
Coefficients are optimized to give best fit to the data. For each
variable X added to the regression squared error decreases, because
coefficients also fit random components (“noise”).
However, on another set of data the same coefficients will not fit
random components and the fit may be worse. The amount by which
the estimator overfits data is sometimes called “artificial skill.”
Minimize “artificial skill” by limiting regression to only significant
variables, which are determined using an F test on the variance
reduction (skill) of the estimator.
We check the estimator by comparing the errors from regression and
errors from another set of data.
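The check described above, fitting on one set of data and testing on another, can be sketched as follows. This is an illustration under stated assumptions: synthetic data from a seeded random generator, a single-variable straight-line fit, and hypothetical function names. The skill on the fitted half cannot be negative, while the skill on the withheld half is not constrained.

```python
import random


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    b = (sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
         / sum((xi - xm) ** 2 for xi in x))
    return ym - b * xm, b


def skill(x, y, a, b):
    """1 - (squared error)/(variance of y about its own mean)."""
    ym = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ym) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic data (assumption): weak linear signal plus unit-variance noise
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(40)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

a, b = fit_line(x[:20], y[:20])                  # fit on the first half only
hindcast_skill = skill(x[:20], y[:20], a, b)     # same data as the fit
forecast_skill = skill(x[20:], y[20:], a, b)     # withheld data
artificial_skill = hindcast_skill - forecast_skill
```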
Significance of Linear Regression
k is number of additional parameters
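A sketch of the F statistic for testing k additional parameters. It assumes the standard partial F test, comparing the squared error of the smaller regression with that of the larger one; the function and argument names are hypothetical.

```python
def f_statistic(sse_reduced, sse_full, k, n_obs, n_params_full):
    """Partial F statistic for adding k parameters to a regression.

    sse_reduced   : sum of squared errors without the k extra parameters
    sse_full      : sum of squared errors with them
    n_obs         : number of independent observations
    n_params_full : total number of fitted parameters in the larger model
    """
    dof = n_obs - n_params_full                  # residual degrees of freedom
    explained_per_param = (sse_reduced - sse_full) / k
    residual_per_dof = sse_full / dof
    return explained_per_param / residual_per_dof


# Hypothetical numbers, for illustration only
F = f_statistic(sse_reduced=10.0, sse_full=6.0, k=2, n_obs=20, n_params_full=4)
```

The result is compared with the tabulated F value for (k, n_obs - n_params_full) degrees of freedom at the chosen significance level. For serially correlated data, the effective number of degrees of freedom N* would be the more appropriate choice for n_obs.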
Code for Linear Regression (1)
Code for Linear Regression (2)
Exercise 6: Linear Regression
create linear estimator for heat flux
Linear regression
Test each variable (hindcast)
Regression on single variables for latent heat flux [use only half the data]
[Figure panels: cosine, air-sea temp., wind speed, humidity diff.]
Multiple Regression
Test combinations of variables (hindcast)
Find variable(s) least correlated with best single variable (correlated = redundant)
F test for evaluating additional variables
[Figure panels: humidity; humidity + wind; humidity + wind + cosine]
Multiple Regression
Examine residuals
plot residual
check histogram (nearly normal?)
Is there a pattern in residual?
[Figure: residual for humidity + wind + cosine]
Linear regression
Test regressions on new data (forecast)
Compare hindcast and forecast errors
Do estimators perform as predicted?
Check for patterns in residual
[Figure: humidity + wind]