Download Fixing Autocorrelatin

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
Transcript
Spatial autoregressive
methods
Nr245
Austin Troy
Based on Spatial Analysis by Fortin
and Dale, Chapter 5
Autcorrelation types
• None: independence
xi   i ,  i  N (0,  2 )
• Spatial independence,
functional dependence
 xi  zi   i

 zi  i
• True autocorrelation>>
inherent autoregressive
xi  xi 1   i ,1    1
• Functional autocorr>>
induced autoregressive
 xi  zi   i
2
,
where
,


N
(
0
,


)
 zi  zi 1  i
Autocorrelation types
• Double autoregressive
 xi  zi   x xi 1   i

 zi   z zi 1  i
• Notice there are now two
autocorrelation parameters x and -z
Effects?
• Standard test statistics become “too
liberal”—more significant results than the
data justify
• Because observations are not totally
independent have lower actual degrees of
freedom, or lower “effective sample size”:
n’ instead of n; since t stat denominator =
s/n, if n is too big it inflates the t statistic
What to do? Non-effective
• Why not just adjust up the significance
level? E.g. 99% instead of 95%? Because
don’t how by how much to adjust without
further information. Could end up with a
test that is way too conservative
• Why not just adjust sampling to only
include “independent samples?” Because
wasteful of data and because easy to
mistake “critical distance to independence”
Best approach: Adjust effective
sample size
1 
• For large sample sizes
n'  n
1 
– So for instance n=1000 and ro=.4 means n’=429
• Problem is that, to be useful, autoregressive
model (ro parameter) has to be an effective
descriptor of the structure of autocorrelation of
the data
Moving average models
• How calculated depends on “order”
• A simple model for adjusting sample size: first order
autoregressive model, only immediate (first order)
neighbors are correlated with ro>0. All other pairs are
zero.
• In such a model xi is a function of xi+1 and xi-1
• Hence half the info for xi is in each neighbor; produce
k
ro=.5 for large n and n’=n/2.
xi   i    j  i  j
• An n order model can take form
j 1
• Translates into generalized matrix form X  Z  W  
• With variance covariance matrix
C   [(I  W) (I  W)]
2
T
Moving average
• When you increase the order, calculating sample size gets
complicated; e.g. second order model, where two ro
parameters now
n'  n 2 /[ n  2(n  1) 1  2(n  2)  2 )
• Important point: If there are several different levels of
autocorrelation (k), each k must be incorporated even if
non-significant; this can have a huge impact on the
calculation of effective sample size
• Fortin and Dale recommend not using moving average
approach because very sensitive to irregularities in the
data and can produce a wide range of estimates
Two dimensional approaches
• Problem with MA approach as it was just
presented is assumes one-dimensionality
• In spatial data, xi depends on all neighbors
most likely
• Two best ways for dealing with this:
– Simultaneous autoregressive models (SAR)
– Conditional autoregressive models (CAR)
SAR
• Based on concept of set of simultaneous equations to be
solved. In this xi and xi-1 are each defined by their own
equations
x  Z  u
• Where x is a vector and is linearly dependent on a vector
of underlying variables z , z z …. Given as matrix Z, u is a
vector non-independent error terms with mean zero and
var-covar matrix C
• Spatial autocorrelation enters via u where
1
2
3
u  W u  
• Here e is independent error term and W is neighbor
weights standardized to row totals of 1. W is not
necessarily symmetrical, allowing for inclusion of
anisotropy. Wij is >0 if values at location i is not
independent of value at location j
SAR
• This yields the model
x  Z  W(x  Z )  
• With variance
covariance matrix
(from u)
C   2 [(I  W)T (I  W)]1
• Note how similar to
MA—difference is no
inverse in formula
• The elements of C
are variances
From Fortin and Dale p. 231
CAR
• More commonly used in spatial statistics
• Not based on spatial dependence per se; instead probability
of a certain value is conditional on neighbor values
• Similar to SAR, but requires that weight matrix be
symmetrical
2
1
C


(
I

j
V
)
• Here
Where j is the autocorrelation parameter and V is a
symmetrical weight matrix
Any SAR process is a CAR process if
V= W + WT – W TW