© Hyon-Jung Kim, 2016

1 Introduction

Spatial analysis is the quantitative study of phenomena that are located in space. Spatial data analysis usually refers to an analysis of observations in which the spatial locations of the sites are taken into account, and includes the reduction of spatial patterns to a few clear and useful summaries. Spatial statistics goes beyond this in that these summaries are compared with what might be expected from theories of how the pattern might have originated and developed, i.e., inferential statistics. So spatial statistics involves the inferential level of analysis: model building, testing, and interpretation. It is a vast subject, in large part because spatial data are of so many different types.

Spatial data: data that are location specific and that vary in space.

The observations may be:
- univariate or multivariate
- categorical or continuous
- real-valued (numerical) or not real-valued
- observational or experimental

The data locations may:
- be points, regions, line segments, or curves
- be regularly or irregularly spaced
- be regularly or irregularly shaped
- belong to Euclidean or non-Euclidean space

The mechanism that generates the data locations may be:
- known or unknown
- random or non-random
- related or unrelated to the processes that govern the observations

Typical data:
- a sample of observations from the process of interest
- often very noisy and NOT independent

Three prototypes of data:

1. Geostatistical data
The components of geostatistical data are the locations and the measurements at each location.
e.g. rainfall measurements in Tampere, temperature at weather stations in Finland, air pollutant measurements, soil pH in water, etc.

2. Lattice data (areal or aggregate data)
Counts or averages of a quantity on subregions that make up a larger region. e.g.
presence or absence of a plant species in square quadrats over a study area, number of deaths due to SIDS in the counties of North Carolina, pixel values from remote sensing (satellites).

3. Spatial point patterns
e.g. locations of bird nests in a suitable habitat (evidence of territoriality), locations of lunar craters (meteor impacts or volcanism), etc.

Note that the distinction between these types is not always clear-cut. In particular, geostatistical data and lattice data have many similarities.

Spatial Structure

• Large-scale structure (global)
- Mean function of a geostatistical process
- Intensity of a spatial point process
- Mean vector of lattice data

• Small-scale structure (local)
- Variogram, covariance function of a geostatistical process (and lattice process)
- Ripley's K function, second-order intensity, nearest-neighbor functions for a spatial point process
- Neighbor weights for a lattice process

Stationarity implies a constant large-scale structure and a small-scale structure that depends on the spatial locations only through their relative positions (formal descriptions will be discussed later).

Main objectives of spatial statistics
- Inference for spatial structure
- Inference for non-spatial structure
- Prediction of unobserved variables
- Design issues, such as where to take observations or how to arrange treatments in a spatial experiment

Temporal Statistics, Spatial Statistics, and Spatio-Temporal Statistics

The inherent difference between temporal statistics and spatial statistics is due to the fact that time flows in one direction only, from past to present to future.
- In spatial statistics, observations are often irregularly spaced and models must be more flexible.
- In geostatistics and lattice data analysis, observations are usually assumed to be dependent and non-identically distributed; in particular, models usually include a trend.
- In space, interaction regarding each observation generally occurs in all directions, and many geostatistical/lattice models incorporate omnidirectional interaction.
- In time series, prediction usually consists of extrapolating to a future time point. In geostatistics/lattice analysis, interpolation is as important as extrapolation.
- Geostatistics and lattice data analysis are most similar to the subfield of modern longitudinal data analysis which explicitly models the temporal correlation among observations.
- Spatial point pattern analysis is most similar to failure time data analysis.

Spatio-temporal statistics: the data are observations with identifiable and observed spatial and temporal labels. e.g. earthquakes (locations random in time and space), changes in the locations of trees over time, environmental monitoring of water quality, etc.

Space-time data can be modeled either as a collection of spatially correlated time series or as a collection of temporally correlated spatial random fields, lattice processes, or spatial point processes. There are many possibilities for combining spatial data types with temporal data types and the interaction between them. We will focus on "pure" spatial statistics in this course, but occasionally we will discuss spatio-temporal extensions of certain issues, topics, or methods.

Basic Notation and Statistical Model
- Space S, which for concreteness we assume to be Euclidean: S = R^d, where d = 1, 2, or 3
- Study region, A ⊂ S
- Spatial data (or point) locations s1, ..., sn, si ∈ D, D an index set
- Observations Z(s1), Z(s2), ..., Z(sn)
- Covariates X(s1), X(s2), ..., X(sn)
- Model: {Z(s), s ∈ D, D ⊆ R^d}

This is a stochastic process, i.e., a collection of random variables, indexed by points or regions in D. Either the Z values or the s values or both are random. The X values are usually assumed to be nonrandom.

1.1 Visualization

1.
Visualizing Geostatistical (Point-Referenced) Data

The best way to visualize these data is to display them on a map, differentiating the values of the measurements of interest by colour or size.

Example: field observations of air pollution measurements in the northeast US. The points are air pollution monitors, with the monthly average PM2.5 concentration colour-coded (a gradient from blue (low) to red (high)). Alternatively, we can display the points as gradients in size (second figure): the monthly average PM2.5 concentration, where larger circles represent higher concentrations and smaller circles lower concentrations. Note that the choice of colour and size gradient of the points can lead to different conclusions!

[Figure: map of monthly average PM2.5 concentrations (ug m^-3) in the northeast US, plotted by longitude and latitude, colour-coded.]

• Goals of spatial statistics applied to geostatistical data
- Explore the spatial pattern in the observations (often called spatial "structure").
- Quantify the spatial pattern with a function.
- Model the spatial correlation/covariance in the observations.
- Make predictions at unobserved locations: interpolation, smoothing.

Additional considerations: account for spatial structure in regression models and/or test a null hypothesis of no spatial structure.

[Figure: the same PM2.5 monitoring data, with concentrations represented by circle size.]

2. Areal Data

Areal units are often referenced as polygons. The centroids of the areal units may be useful as a spatial reference, in combination with the area of the polygon. The best way to visualize these data is to display them as a map, differentiating the areal units by colour. Areal data (lattices) use neighbor relationships.
Examples:
- Median household income in Los Angeles neighborhoods
- State-specific (or county-, census tract-, zip code-specific) election results
- County hospital admission rates for influenza

Information collected in areal units may be census related, health related, or environmental (satellite estimates of pollution, land cover).

• Goals of spatial statistics applied to areal data
- Understand the linkage between areal units.
- Determine spatial patterns of areal units within a region.
- If there is a spatial pattern, how strong is it?

A pattern identified through visualization is often subjective. Independent measurements will usually have no pattern.

[Figure: example visualization of areal data.]

3. Point Pattern Data

A spatial point process is a stochastic mechanism that generates events in 2D. The event is an observation (e.g. presence/absence) and the point is the location.

Mapped point pattern: events in a study area D have been recorded.
Sampled point pattern: events are recorded after taking samples in an area D.

Examples:
- Locations of homeless in Los Angeles
- Cases of malaria in Nairobi
- Locations of a specific tree species in a forest

If there are different categories of a point pattern, such as with the homeless data (http://graphics.latimes.com/homeless-los-angeles-2015/), then these categories may be coloured separately. Often conclusions cannot be drawn from visual inspection alone.

• Goals for point pattern data: model some spatial pattern and determine whether our observed point pattern fits this model.

Measure of intensity: the mean number of events per unit area.

Questions we would like to answer:
- Is there a regular pattern in the points?
- Is there clustering of the points?
- Can we define a point process that our events follow?
- Is there an underlying population distribution from which events arise in a region?

4. Spatio-Temporal Data

All three types of data we have described may be referenced in space and in time. That is, data that are location specific can have replicates in time: each observation has a location, a time, and a value.

Geostatistical: relationship between daily air pollution measured at discrete locations in the US Northeast and hospital admissions.
Areal: examining birth rates from year to year in US states (crude birth rates by state based on equal-interval cut points).
Point process: changes in the spatial clustering of homeless individuals from 2015 to 2016.

[Figure: Monmonier, M. Lying with Maps. Statistical Science 2005, 20(3), 215-222.]

2 Geostatistics

The (stochastic) process varies continuously over space, but data are measured only at discrete locations.
- Process (random field): {Z(s), s ∈ D, D ⊆ R^d}
- Observations: z1 = Z(s1), z2 = Z(s2), ..., zn = Z(sn)

First law of geography: nearby quantities tend to be more alike than those far apart.

The usual model for many kinds of data is

    Datum = Mean + Residual

In a geostatistical context, the basic model takes the form

    Z(s) = m(s) + ε(s)    (i.e., large-scale variation + small-scale variation)
         = m(s) + W(s) (smooth) + δ(s) (white noise)
         = signal + noise

where m(s) ≡ E[Z(s)] is the mean function, which is usually a nonrandom quantity. When we specify the distribution of ε(s) sufficiently, the distribution of {Z(s), s ∈ D} will be specified. However, random sampling assumptions generally are not appropriate: geostatistical data generally represent an incomplete sampling of a single realization. Some further assumptions about Z(·) must be made for inference to be possible, and one such assumption is stationarity (to be discussed later in detail).

2.1 Exploratory Data Analysis

1. Non-spatial summaries
- Numerical summaries: mean, median, standard deviation, range, etc.
- Graphical tools: stem-and-leaf plots, box plots, etc.

2. Descriptive statistics for spatial information

a) Methods mainly to explore large-scale variation:
• Plot of Zi versus each marginal coordinate
• Plot of the mean or median of Zi versus row index or column index (data locations on a regular grid)
• 2-D or 3-D scatterplots: a plot of Zi vs. data location (for d = 3)
• Indicator maps: assign each data point to one of only two classes, using two symbols
• Contour plots, greyscale maps, proportional symbol maps
• Spatial moving averages: estimation by averaging the values at neighboring sampled data points
• Nonparametric smoothing: kernel estimation (Bailey and Gatrell, section 2.3.2), LOESS (locally weighted polynomial regression)
• Mean or median polish
  - Requires a rectangular grid, say p × q
  - Decomposes the data: data = overall + row effect + column effect + residuals (i.e., removes some trend, or large-scale variation)
  - Alternately subtract row means (medians) and column means and accumulate these in extra cells. Repeat this procedure until another iteration produces virtually no change.

b) Methods to explore small-scale variation:
• h-scatterplots (or same-lag scatterplots)
  - Methods to explore dependence
  - Require regular spacing between data locations
  - For a fixed vector e of unit length and a scalar h, plot Z(si + he) vs. Z(si) for all i
  - May reveal direction of dependence, outliers, or the existence of nonstationarity in the mean and/or variance
• 3-D plot of standard deviation vs. spatial location, computed from a moving window
• Scatterplot of standard deviation vs. mean, computed from a moving window
• Semivariogram cloud
  - Plot (Z(si) − Z(sj))^2 or |Z(si) − Z(sj)|^(1/2) vs. ‖si − sj‖ for all possible pairs of observations
  - Note that this implicitly assumes some kind of stationarity.

e.g.
Coal Ash Data (Cressie): the data contain 208 coal ash core samples collected on a grid. Suppose X = % coal ash, Y1 = % coal ash of the nearest neighbor to the East, and Y2 = % coal ash of the second nearest neighbor to the East. Let D1^2 = (X − Y1)^2, D2^2 = (X − Y2)^2, etc. Make boxplots of D1^2, D2^2, ... and put them side by side.
D1^2 small ⇒
D1^2 large ⇒

• Empirical (or sample, or experimental) semivariogram (Matheron, 1962)
(Assume that large-scale variation for Z(·) is removed or ignorable for now.)

    γ̂(h) = (1 / (2|N(h)|)) Σ_{N(h)} {Z(si) − Z(sj)}^2

where N(h) = {(si, sj) : si − sj = h; i, j = 1, 2, ..., n} and |N(h)| is the number of distinct pairs in N(h).

• Sample covariance function
The usual estimator is

    Ĉ(h) = (1 / |N(h)|) Σ_{N(h)} (Z(si) − Z̄)(Z(sj) − Z̄)

which is the spatial generalization of the sample autocovariance function used in time series analysis. (This will be discussed in more depth later.)

2.2 Models

Stationarity

a) Strict stationarity requires that the joint probability distribution of the data depends only on the relative positions of the sites at which the data were taken.

b) Second-order stationarity
i) The variate's mean is constant.
ii) The covariance between variates at two sites depends only on the sites' relative positions:

    C(s, t) = C(s + h, t + h)   for all h

c) Intrinsic stationarity
i) The mean is constant: E[Z(s)] = µ for all s ∈ D.
ii) (1/2) Var[Z(s) − Z(t)] depends only on the lag s − t, for all s, t ∈ D.

Trend surfaces (mean functions)

The first requirement for stationarity (that the spatial variate have constant mean) does not seem reasonable in many cases. What seems more reasonable is that sites close to one another should have similar means, but sites far apart need not. This kind of local stationarity, rather than global stationarity, leads to the postulation of a continuous, relatively smooth but nonconstant function for the mean.
- The conventional multiple regression model: Z(s) = X(s)β + ε(s)
- A very useful class of mean functions are the polynomials: e.g. m(x, y) = β0 + β1 x + β2 y
- Another kind of continuous (but less smooth) function is the surface that results from performing a median polish.
- An alternative to a parametric approach to modeling the mean function is a nonparametric approach using splines, LOESS, or a kernel estimator.

Recall from above that if we assume the distribution of {ε(s), s ∈ D} is a Gaussian process, then the distribution of {Z(s), s ∈ D} is completely specified. The convention in geostatistics is that the distribution of {Z(s), s ∈ D} is specified through its covariance function, as a function of the coordinates of the two corresponding sites.

Covariance functions

The function needs to satisfy the following properties:

a) Evenness: C(h) = C(−h) for all h

b) Nonnegative definiteness:

    Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj C(si − sj) ≥ 0

for all n, all sequences {ai, i = 1, ..., n}, and all sequences of spatial locations {si, i = 1, ..., n}.

a) and b) ⇒ C(0) ≥ 0 and |C(h)| ≤ C(0) for all h.

Bochner's theorem: a function is nonnegative definite iff (if and only if) it is the Fourier transform of a positive Borel measure.

Isotropy and Anisotropy

A stationary covariance function is called isotropic if the covariance between any two values depends only on the Euclidean distance ‖s − t‖ between the locations, i.e., C(h) = C(‖h‖). When the covariance depends on the direction, it is called anisotropic.

Isotropic, parametric (valid) covariance function models

Let r = ‖h‖ for convenience.
• Tent (triangular, piecewise linear) model (valid in R^1 only)

    C(r; θ) = θ1 (1 − r/θ2)   for 0 ≤ r ≤ θ2
            = 0               for θ2 < r

• Spherical model

    C(r; θ) = θ1 (1 − 3r/(2θ2) + r^3/(2θ2^3))   for 0 ≤ r ≤ θ2
            = 0                                  for θ2 < r

• Exponential model

    C(r; θ) = θ1 exp(−θ2 r),   θ1 ≥ 0, θ2 ≥ 0

• Gaussian model

    C(r; θ) = θ1 exp(−θ2 r^2),   θ1 ≥ 0, θ2 ≥ 0

• Rational quadratic model

    C(r; θ) = θ1 (θ2 − r^2/(1 + r^2/θ2)),   θ1 ≥ 0, θ2 ≥ 0

• Matern class of models

    C(r; θ) = (θ1 / (2^{θ3−1} Γ(θ3))) ((2θ3)^{1/2} r/θ2)^{θ3} K_{θ3}((2θ3)^{1/2} r/θ2),   θ1 ≥ 0, θ2 ≥ 0, θ3 > 0

where K_{θ3} is the modified Bessel function of the third kind of order θ3.

• Cosine model

    C(r; θ) = θ1 cos(r/θ2),   θ1 ≥ 0, θ2 ≥ 0

• Wave or hole-effect model

    C(r; θ) = θ1 θ2 sin(r/θ2) / r,   θ1 ≥ 0, θ2 ≥ 0

Note that we can construct more complicated models using the following rules:
- If C1(·) and C2(·) are valid covariance functions in R^d, then so is C(·) ≡ C1(·) + C2(·).
- If C0(·) is a valid covariance function in R^d and b > 0, then C(·) ≡ b C0(·) is a valid covariance function in R^d.
- If C1(·) and C2(·) are valid covariance functions in R^{d1} and R^{d2} respectively, then C(·) ≡ C1(·) C2(·) is a valid covariance function in R^{d1+d2}.
- A valid isotropic covariance function in R^{d1} may not be a valid isotropic covariance function in R^{d2}, where d2 > d1. However, the converse is true.

With the exception of the tent model, all the models listed above are valid in R^2 and R^3.

Semivariogram

Traditionally, geostatistical practitioners have adopted a slightly more general kind of stationarity assumption (intrinsic stationarity) than second-order stationarity, and they modeled the small-scale dependence through a function (the semivariogram) somewhat different from the covariance function:

    γ(s − t) = (1/2) Var[Z(s) − Z(t)].

The function 2γ(·) is called the variogram. When the process is intrinsically stationary, it can also be expressed as

    γ(h) = (1/2) E[Z(s) − Z(t)]^2   where h = s − t.
A second-order stationary random process with covariance function C(·) is intrinsically stationary, with semivariogram

    γ(h) = C(0) − C(h),

but the converse is not true in general. That is, there exist processes that are intrinsically stationary but not second-order stationary.

The semivariogram must satisfy the following properties:
a) It vanishes at 0, i.e., γ(0) = 0.
b) Evenness: γ(h) = γ(−h) for all h.
c) It must be conditionally negative-definite; that is, it must satisfy

    Σ_{i=1}^{n} Σ_{j=1}^{n} λi λj γ(si − sj) ≤ 0

for each set of locations s1, ..., sn and all λ1, ..., λn such that Σ_{i=1}^{n} λi = 0.
d) lim_{‖h‖→∞} {γ(h)/‖h‖^2} = 0.

Attributes of the semivariogram

• Nugget effect: microscale variability
• Sill (= partial sill + nugget effect)
• Range or effective range
The range of an isotropic semivariogram (or covariance function) is defined as the distance beyond which the correlation is equal to 0. Of the models listed, only the tent and spherical models have a range (which is equal to θ2). For isotropic models that do not have a range, the effective range, if one exists, is defined as the distance beyond which the correlation does not exceed 0.05 (i.e., the covariance does not exceed 0.05 × C(0), the partial sill). The exponential, Gaussian, rational quadratic, and Matern models all have effective ranges, but the cosine model does not.
• Slope

Examples of valid isotropic semivariogram models

• Tent (valid in R^1 only)

    γ(r; θ) = θ1 r/θ2   for 0 ≤ r ≤ θ2
            = θ1        for θ2 < r

• Linear

    γ(r; θ1) = θ1 r,   θ1 ≥ 0

• Power

    γ(r; θ) = θ1 r^{θ2},   θ1 ≥ 0, 0 ≤ θ2 < 2

• Spherical

    γ(r; θ) = θ1 (3r/(2θ2) − r^3/(2θ2^3))   for 0 ≤ r ≤ θ2
            = θ1                             for θ2 < r

• Exponential

    γ(r; θ) = θ1 {1 − exp(−θ2 r)},   θ1 ≥ 0, θ2 ≥ 0

• Gaussian model

    γ(r; θ) = θ1 {1 − exp(−θ2 r^2)},   θ1 ≥ 0, θ2 ≥ 0

• Rational quadratic model

    γ(r; θ) = θ1 r^2 / (1 + r^2/θ2),   θ1 ≥ 0, θ2 ≥ 0

• Cosine model

    γ(r; θ) = θ1 {1 − cos(r/θ2)},   θ1 ≥ 0, θ2 ≥ 0

• Wave or hole-effect model

    γ(r; θ) = θ1 {1 − θ2 sin(r/θ2)/r},   θ1 ≥ 0, θ2 ≥ 0

• Matern class of models

    γ(r; θ) = θ1 [1 − (1 / (2^{θ3−1} Γ(θ3))) ((2θ3)^{1/2} r/θ2)^{θ3} K_{θ3}((2θ3)^{1/2} r/θ2)],   θ1 ≥ 0, θ2 ≥ 0, θ3 > 0

The exponential model is a special case of the Matern model with θ3 = 1/2; the Gaussian model is the limiting case of the Matern model as θ3 → ∞.

[Figure: example exponential, Gaussian, spherical, and power (ω = 0.3, 1, 1.5) semivariograms plotted against lag h.]

Modeling anisotropy

a) Range anisotropy
- Most often seen in practice (the sill and nugget are the same in all directions).
- Geometric anisotropy is the easiest to model. Any valid isotropic model can be generalized to make it geometrically anisotropic, e.g.

    C(h; θ) = θ1 exp[−θ2 (h1^2 + 2θ3 h1 h2 + θ4 h2^2)^{1/2}]

b) Sill anisotropy
- Either the assumption of second-order stationarity is violated, or there are measurement errors which are correlated or do not have mean zero.

c) Nugget anisotropy
- Can be caused by correlated measurement errors.
- Typically occurs in one direction only.

d) Slope anisotropy
- Can be dealt with in a similar fashion as geometric anisotropy.
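As a small illustration of the geometrically anisotropic exponential covariance given above, the following sketch (Python; the function name and parameter values are illustrative, not from the notes) evaluates C(h; θ) = θ1 exp[−θ2 (h1^2 + 2θ3 h1 h2 + θ4 h2^2)^{1/2}] and shows that it reduces to the isotropic exponential model when θ3 = 0 and θ4 = 1.

```python
import math

def aniso_exp_cov(h, theta1, theta2, theta3, theta4):
    """Geometrically anisotropic exponential covariance from the text.
    Validity requires A = [[1, theta3], [theta3, theta4]] to be positive
    definite, i.e. theta4 > theta3**2 (an assumption stated here, not in
    the notes)."""
    h1, h2 = h
    q = h1**2 + 2.0 * theta3 * h1 * h2 + theta4 * h2**2  # quadratic form h'Ah
    return theta1 * math.exp(-theta2 * math.sqrt(q))

# With theta3 = 0, theta4 = 1 this is the isotropic theta1 * exp(-theta2 * ||h||):
iso = aniso_exp_cov((3.0, 4.0), 2.0, 0.5, 0.0, 1.0)   # ||h|| = 5
# With theta4 != 1, the covariance decays at different rates in different directions:
ew = aniso_exp_cov((1.0, 0.0), 2.0, 0.5, 0.0, 4.0)    # east-west lag
ns = aniso_exp_cov((0.0, 1.0), 2.0, 0.5, 0.0, 4.0)    # north-south lag, faster decay
```

Geometric anisotropy, in other words, only stretches the distance in some directions before the isotropic model is applied; the sill C(0) = θ1 is unchanged.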
Other types of anisotropy

i) Geometric anisotropy: a covariance function is geometrically anisotropic if a positive definite matrix A exists such that C(h) = C0((h'Ah)^{1/2}) for all h, where C0 is a valid isotropic covariance function.

ii) Zonal anisotropy

Estimation of C(·) and γ(·) (revisited)

• Empirical (or sample, or experimental) semivariogram

For a sample of given realizations from Z(·) where the mean function is taken to be constant, the empirical semivariogram is the unbiased estimator of an isotropic semivariogram given by

    γ̂(h) = (1 / (2|N(h)|)) Σ_{si − sj = h} {Z(si) − Z(sj)}^2

When a non-constant trend is assumed, the sample semivariogram is computed from the residuals:

    γ̂(h) = (1 / (2|N(h)|)) Σ_{si − sj = h} {ε̂(si) − ε̂(sj)}^2

where ε̂(si) = Z(si) − m(si; β̂), i = 1, ..., n.

This estimator is unbiased for the semivariogram (assuming the correct mean function has been adopted); it is a method-of-moments type estimator.

When data locations are irregularly spaced, we partition the lag space H = {(s − t) : s, t ∈ D} into lag classes or windows H1, ..., Hk, say, and assign each lag in the data set to one of these classes. For non-regularly spaced data, this estimator is only approximately unbiased because the grouping (binning) of lags into classes causes a blurring effect. We need to replace 'si − sj = h' with 'si − sj ∈ T(h)', where T(h) is a tolerance region about h:

    γ̂(h) = (1/2) avg{[Z(si) − Z(sj)]^2 : si − sj ∈ T(h)}

Two main types of partitions:
1. Polar partitioning, i.e., angle and distance classes
2. Rectangular partitioning

Rules of thumb to be considered (Journel and Huijbregts, 1978):
i) The empirical semivariogram should be considered only for distances for which the number of pairs is greater than (about) 30.
ii) The distance of reliability is half the maximum distance over the field of data.
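The binned estimator just described can be sketched in a few lines. This is a minimal illustration (Python/NumPy; the function name and defaults are my own), using equal-width distance classes and following the rules of thumb above: it reports the pair count per bin (to be checked against ~30) and restricts lags to half the maximum distance by default.

```python
import numpy as np

def empirical_semivariogram(coords, z, n_bins=10, max_frac=0.5):
    """Binned isotropic empirical semivariogram:
    gamma_hat(h) = (1/(2|N(h)|)) * sum of {z_i - z_j}^2 over pairs whose
    distance falls in the tolerance class around h.
    coords: (n, d) locations; z: (n,) observations (constant mean assumed,
    or pass residuals from a fitted trend)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)            # all distinct pairs
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq = (z[i] - z[j]) ** 2
    edges = np.linspace(0.0, max_frac * d.max(), n_bins + 1)
    keep = d <= edges[-1]                          # rule of thumb ii)
    which = np.clip(np.digitize(d[keep], edges) - 1, 0, n_bins - 1)
    lags, gammas, counts = [], [], []
    for b in range(n_bins):
        mask = which == b
        m = int(mask.sum())                        # |N(h)|: compare with ~30
        if m == 0:
            continue
        lags.append(d[keep][mask].mean())
        gammas.append(sq[keep][mask].sum() / (2.0 * m))
        counts.append(m)
    return np.array(lags), np.array(gammas), np.array(counts)
```

The returned average lag per bin, semivariogram estimate, and pair count can then be plotted and used for model fitting as in the next subsection.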
• Robust semivariogram estimators
- Cressie and Hawkins (1984):

    γ̄(h) = { (1/|N(h)|) Σ_{N(h)} |ε̂(si) − ε̂(sj)|^{1/2} }^4 / [2 (0.457 + 0.494/|N(h)|)]

- Genton (1998)

• Sample covariance function

Recall that the estimator is given by

    Ĉ(h) = (1 / |N(h)|) Σ_{N(h)} (Z(si) − Z̄)(Z(sj) − Z̄)

This estimator is biased even for regularly spaced data, and it is meaningful only if the process is second-order stationary.

NOTE: γ̂(h) ≠ Ĉ(0) − Ĉ(h)

• Correlation function (correlogram)

    ρ(h) = C(h)/C(0)

• Checking for isotropy

Isotropy means that the semivariance depends only on the distance between points, not the direction; anisotropy means the semivariance also depends on direction as well as distance. We can examine anisotropy with directional semivariograms:
- Superimposition of directional sample semivariograms
- Rose diagram: consists of smoothing the directional sample semivariograms, then, in the lag space, connecting with a smooth curve those lag vectors h for which these smoothed semivariograms are roughly equal. In effect, this plots estimated isocorrelation contours (in the case of a second-order stationary process).

[Figure: directional sample semivariograms at 0°, 45°, 90°, and 135° plotted against distance h.]

2.3 Estimation for geostatistical models

In summary, the general (or classical) model we use for our analysis of geostatistical data is

    Z(s) = m(s; β) + ε(s)

where m(·; β) is a specified family of continuous functions, β is a vector of unknown parameters, {ε(s) : s ∈ D} is an intrinsically (or second-order) stationary process with mean zero and semivariogram γ(·; θ) (or covariance function C(·; θ)), and θ is a vector of unknown parameters.

Overview of the geostatistical method:

i) Using exploratory techniques, prior knowledge, etc., set up an appropriate model (e.g. the model given above) with assumptions on the mean function and the stationarity of the process that generated the data.
ii) Estimate β for the mean function (if it is not assumed to be constant): β̂ (e.g. by ordinary least squares or median polish).

iii) Obtain the fitted residuals ε̂(si) = Z(si) − m(si; β̂). Compute the empirical semivariogram of the residuals.

iv) Select a valid semivariogram model that is compatible with the plot from the previous step. Fit the chosen model to the empirical semivariogram to estimate the model's parameters.

v) Using the fitted semivariogram model, re-estimate β by generalized least squares (or some other method which accounts for correlation among observations).

vi) Repeat steps iii)-v) if needed.

vii) Predict ('krige') unobserved values at sites (or over regions) of interest and estimate the corresponding variances of prediction error. Determine optimal locations at which to take additional observations, and repeat the above steps if needed.

Semivariogram Model Fitting

Although the empirical semivariogram is unbiased for the semivariogram, it may not be conditionally negative-definite. Neither the sample semivariogram nor the sample covariance function can be used directly for statistical inference, e.g., spatial prediction (kriging).

⇒ Fit a valid semivariogram model to the sample semivariogram.

Methods of fitting

i) By inspection (by eye)

ii) Ordinary nonlinear least squares (OLS)

    min_θ Σ_h [γ̂(h) − γ(h; θ)]^2

But note: semivariogram estimates are correlated!

iii) Weighted nonlinear least squares (WNLS) (Cressie, 1985)
A weighted nonlinear least squares estimator of γ(h; θ) is defined as a value θ̂ that minimizes the weighted residual sum of squares

    min_θ Σ_h |N(h)| [γ̂(h) − γ(h; θ)]^2 / [γ(h; θ)]^2

Note that the nonparametric estimates at large lags tend to receive relatively less weight.

iv) Generalized nonlinear least squares (GLS)

    min_θ [γ̂ − γ(θ)]' [Vâr(γ̂)]^{-1} [γ̂ − γ(θ)]

- How to derive and calculate Vâr(γ̂)?
v) Maximum likelihood (ML) / restricted maximum likelihood (REML)
- Assuming normality for a model Z = Xβ + ε, the log-likelihood is

    L(β, θ; Z) = −(1/2) log|V| − (1/2) (Z − Xβ)' V^{-1} (Z − Xβ)

where V = V(θ) denotes the covariance matrix of Z = (Z1, ..., Zn)' and X is the model matrix for the covariates.
- Estimates θ and β simultaneously by finding the values that maximize L(β, θ).
- Applicable to processes with second-order stationary errors only.
- The restricted MLE (REML estimator) maximizes the log-likelihood function associated with n − rank(X) linearly independent error contrasts. It is known to be less biased than the MLE and is thus often preferred, especially when rank(X) is appreciable relative to n.

Model Selection Procedures
- Visual inspection of the semivariogram plot
- Minimized weighted (or generalized) residual sum of squares
- Maximized log-likelihood (restricted log-likelihood)
- Penalized likelihood criteria, e.g., Akaike's criterion: AIC = L(β̂, θ̂) − (number of estimated parameters)

Estimating the large-scale variation

If the mean function m(s; β) is a linear or nonlinear function of the elements of β, then linear or nonlinear least squares can be used to fit the model to the data. This is called trend surface analysis. This approach is quite easy to implement due to the wide availability of computing software (e.g. PROC REG in SAS or lm in S-Plus).

Other approaches:

• Median polish
The mean function is taken to be m(xl, yk; β) = a + rk + cl.

• Locally weighted least squares (LOESS)
- Only assumes that the mean function is smooth.
- Estimates the smooth trend in a moving fashion, by fitting a site-specific first-order or second-order polynomial to only the most proximate data to a site.
- Fits using weighted least squares with weights inversely related to distance from the site.
• Kernel estimator
A type of local smoother which calculates a weighted average of the observations near a target point s:

    (1/b^2) Σ_{i=1}^{n} k((s − si)/b) zi

where k(·) is called a kernel function, or simply a kernel, satisfying some moment conditions (e.g. a quadratic or uniform kernel), and b is the bandwidth.

• Smoothing splines
An estimator which minimizes a functional criterion (a penalized residual sum of squares) so as to fit the data well and at the same time have some degree of smoothness.

Spatial Regression

i) Generalized least squares (GLS) with known covariance matrix

Model: Z = Xβ + ε, E(ε) = 0, Var(ε) = V(θ), where V = V(θ) is a completely specified positive definite matrix.

- GLS estimator of β:

    β̂_GLS = (X' V^{-1} X)^{-1} X' V^{-1} Z

ii) Estimated generalized least squares (EGLS)

In practice, the true value of θ, and consequently V, is hardly ever known and completely specified. A natural way to deal with this problem is to replace θ in the evaluation of V by an estimator θ̂, thereby obtaining V̂.

- EGLS estimator of β:

    β̂_EGLS = (X' V̂^{-1} X)^{-1} X' V̂^{-1} Z

Example:

*** Mean structure or covariance structure?

The issue mentioned previously is that, in practice, a decomposition of the data into large-scale and small-scale variation is not so clear-cut. This problem is often described as follows (Statistics for Spatial Data, Cressie): "One man's mean structure is another man's covariance structure." If replications of a spatial process are available, statistical procedures exist for distinguishing between the two structures. In practice, however, geostatistical data are usually not replicated, so we must settle for plausibility rather than a high degree of certainty.
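The GLS/EGLS step can be sketched numerically as follows (Python/NumPy; the helper names and the use of an exponential covariance model to build V̂ are illustrative assumptions, not from the notes). EGLS is the same computation with V replaced by V̂ = V(θ̂).

```python
import numpy as np

def exp_cov_matrix(coords, theta1, theta2, nugget=0.0):
    """Covariance matrix V(theta) under the exponential model
    C(r) = theta1 * exp(-theta2 * r), plus an optional nugget."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return theta1 * np.exp(-theta2 * d) + nugget * np.eye(len(coords))

def gls(X, Z, V):
    """beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} Z, computed by solving
    linear systems rather than forming V^{-1} explicitly."""
    VinvX = np.linalg.solve(V, X)
    VinvZ = np.linalg.solve(V, Z)
    return np.linalg.solve(X.T @ VinvX, X.T @ VinvZ)

# When V is proportional to the identity, GLS reduces to OLS:
X = np.column_stack([np.ones(4), np.arange(4.0)])
Z = np.array([1.0, 3.0, 5.0, 7.0])      # exact line 1 + 2x
beta = gls(X, Z, 2.0 * np.eye(4))       # -> (1, 2)
```

Solving `V a = b` instead of inverting V is the standard numerically stable way to evaluate these sandwich expressions.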
2.4 Spatial Prediction (Kriging)

Goal: predict the value of Z(s) at s0, an arbitrary location in D.

Spatial prediction usually refers to 'interpolating' a value, rather than extrapolation, for a random spatial process. The main idea relies on a form of weighted averaging in which the weights are chosen such that the error associated with the predictor is smaller than for any other linear sum. The terminology 'kriging' is from D. G. Krige, a South African mining engineer who, in the 1950s, developed empirical methods for predicting ore grades at unsampled locations using the known grades of ore sampled at nearby sites.

For kriging:
i) First choose a parametric model for the semivariogram or covariance function.
ii) Estimate the semivariogram (covariance) parameters.
iii) Make predictions and uncertainty estimates given the parameter estimates.

The types of kriging:
a. Simple kriging: assumes a constant known mean; not often used, because for the unbiasedness constraint to be applicable in the kriging equations we must estimate the expected value.
b. Ordinary kriging: assumes a constant unknown mean (the mean needs to be estimated).
c. Universal kriging: assumes a trend in x and y, and may include other spatially varying covariates.

1. Ordinary Kriging (O.K.)

Basic assumptions:
i) The mean function is assumed to be constant.
ii) The semivariogram is assumed to be known.

Restrictions to obtain an ordinary kriging predictor:
i) It is a linear combination of the data values.
ii) It is unbiased.
iii) It minimizes the variance of the prediction error among all functions satisfying the above two properties:

    min Var[Σ_{i=1}^{n} λi Z(si) − Z(s0)]   subject to Σ_{i=1}^{n} λi = 1.

Then Ẑ(s0) = Σ_{i=1}^{n} λi Z(si), with E[Ẑ(s0)] = µ. Kriging gives us the best linear unbiased predictor (BLUP) at any new location s0.
With the method of Lagrange multipliers (from calculus), it can be shown that the optimal coefficients λ₁, …, λₙ are the first n elements of the vector λₒ that satisfies the following system of linear equations, known as the ordinary kriging equations:

    Γₒλₒ = γₒ

where
    λₒ = (λ₁, …, λₙ, m)′
    γₒ = [γ(s₁ − s₀), …, γ(sₙ − s₀), 1]′
    Γₒ has entries γ(sᵢ − sⱼ) for i, j = 1, …, n; 1 for i = n + 1, j = 1, …, n (and symmetrically); and 0 for i = j = n + 1,

m is a Lagrange multiplier, and Γₒ is symmetric. The minimized variance, called the kriging variance, is

    σ²_OK(s₀) = Σᵢ₌₁ⁿ λᵢγ(sᵢ − s₀) + m = λₒ′γₒ.

Example: [figure: six sample locations and the prediction location s₀ plotted in the (x, y) plane, 0 ≤ x ≤ 4, 0 ≤ y ≤ 3]

Take γ(‖h‖) = 1 − exp(−‖h‖/2). Building Γₒ and γₒ from the pairwise distances (entries such as 1 − exp(−1/2), 1 − exp(−1), 1 − exp(−√2/2), 1 − exp(−√5/2)) and solving the system gives

    λₒ = Γₒ⁻¹γₒ = (0.017, 0.422, 0.065, 0.218, 0.031, 0.246, 0.004)′

(the last element is the Lagrange multiplier m; note the six weights sum to 1), and

    σ²_OK(s₀) = λₒ′γₒ = 0.478.

Alternative expressions for the ordinary kriging predictor and prediction variance, which do not involve the unknown Lagrange multiplier, are given below. Define

    λ = (λ₁, …, λₙ)′
    γ = (γ(s₁ − s₀), …, γ(sₙ − s₀))′
    Γ = {γ(sᵢ − sⱼ)}.

Then it can be shown that

    m = −(1 − 1′Γ⁻¹γ)/(1′Γ⁻¹1)
    λ = Γ⁻¹[γ + 1(1 − 1′Γ⁻¹γ)/(1′Γ⁻¹1)].

So the OK predictor can be obtained as

    Ẑ(s₀) = [γ + 1(1 − 1′Γ⁻¹γ)/(1′Γ⁻¹1)]′Γ⁻¹Z

and the kriging variance is

    σ²_OK(s₀) = γ′Γ⁻¹γ − (1 − 1′Γ⁻¹γ)²/(1′Γ⁻¹1).

A 100(1 − α)% prediction interval for Z(s₀), assuming the random field is Gaussian, is

    Ẑ(s₀) ± z_{α/2} σ_OK(s₀)

where z_{α/2} is the upper α/2 percentage point of the standard normal distribution.

Remarks: Ordinary kriging is derived under the assumption of a constant mean. This assumption will be relaxed later in the discussion of universal kriging. It is also derived under the assumption that the semivariogram is known.
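The ordinary kriging system Γₒλₒ = γₒ can be solved numerically. This is a hedged sketch: the four sample locations, observed values, and target point below are illustrative choices, not the lecture example; only the semivariogram γ(‖h‖) = 1 − exp(−‖h‖/2) is taken from the text.

```python
import numpy as np

def semivariogram(h):
    """gamma(||h||) = 1 - exp(-||h||/2), as in the example above."""
    return 1.0 - np.exp(-h / 2.0)

# Illustrative data: four sites on a unit square, target at the center.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.2, 0.8, 1.0, 1.4])          # observations Z(s_i)
s0 = np.array([0.5, 0.5])                   # prediction location

n = len(sites)
D = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
Gamma_o = np.ones((n + 1, n + 1))
Gamma_o[:n, :n] = semivariogram(D)          # gamma(s_i - s_j) block
Gamma_o[n, n] = 0.0                         # Lagrange-multiplier corner
gamma_o = np.append(semivariogram(np.linalg.norm(sites - s0, axis=1)), 1.0)

lam = np.linalg.solve(Gamma_o, gamma_o)     # (lambda_1, ..., lambda_n, m)
z_hat = lam[:n] @ z                         # OK predictor Z_hat(s_0)
sigma2_ok = lam @ gamma_o                   # kriging variance
```

By symmetry all four weights equal 1/4 here, so Ẑ(s₀) is just the sample mean; the weights always sum to 1, as the unbiasedness constraint requires.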
In practice it is hardly ever known and must be estimated, and the estimator γ̂(·) replaces γ(·) in the kriging equations and the kriging variance. It should be noted, however, that the estimated kriging variance tends to underestimate the prediction error variance of the estimated OK predictor, because it does not account for the estimation error incurred in estimating θ.

Example continued from p. 20: Suppose we wish to minimize the kriging variance at s₀ (a new site inside the sampling configuration), and we have sufficient resources to take an observation at any one of the remaining unsampled sites (excluding s₀). The kriging variances at s₀ corresponding to the addition of each of the sites A, B, C and D are:

Cross Validation

This is a method of evaluating the aptness of a spatial correlation model using only data from the sample. It can be used for evaluating choices of search radius, lag tolerance, etc.

Procedure:
i) For location sᵢ, omit zᵢ from the data set temporarily.
ii) Estimate Z(sᵢ) = zᵢ from the remaining points and call the estimate ẑ₋ᵢ.
iii) Compare the estimate ẑ₋ᵢ to zᵢ.
iv) Repeat the above steps for all points i = 1, …, n in the sample.
v) Compute summary statistics and graphs of the cross-validation error distribution.

Summary statistics:
1. Average prediction sum of squares (PRESS): (1/n) Σᵢ₌₁ⁿ (zᵢ − ẑ₋ᵢ)², where ẑ₋ᵢ denotes the prediction of zᵢ from the rest of the data.
2. Mean of standardized PRESS residuals: (1/n) Σᵢ₌₁ⁿ (zᵢ − ẑ₋ᵢ)/σ̂₋ᵢ, where σ̂²₋ᵢ is the mean squared prediction error for predicting zᵢ from the rest.
3. Root mean squared standardized prediction residuals: [(1/n) Σᵢ₌₁ⁿ ((zᵢ − ẑ₋ᵢ)/σ̂₋ᵢ)²]^(1/2)
4. Histograms, scatterplots, or maps of PRESS residuals or standardized PRESS residuals.

Caution: the model that appears best may depend on which summary statistic is used.
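Steps i)-v) of the cross-validation procedure can be sketched as follows. For brevity, an inverse-distance interpolator (a hypothetical stand-in, not the kriging predictor) is used, and the locations and data are simulated.

```python
import numpy as np

# Leave-one-out cross-validation sketch. Data are made up for illustration.
rng = np.random.default_rng(1)
sites = rng.uniform(0, 10, size=(30, 2))
z = np.sin(sites[:, 0] / 3.0) + 0.1 * rng.standard_normal(30)

def idw_predict(sites, z, s0, power=2.0):
    """Inverse-distance-weighted prediction at s0 (stand-in predictor)."""
    d = np.linalg.norm(sites - s0, axis=1)
    w = 1.0 / d ** power
    return w @ z / w.sum()

press_resid = np.empty(len(z))
for i in range(len(z)):
    keep = np.arange(len(z)) != i                 # i) omit z_i temporarily
    z_hat = idw_predict(sites[keep], z[keep], sites[i])  # ii) predict it
    press_resid[i] = z[i] - z_hat                 # iii) compare

# v) summary statistic 1: average prediction sum of squares
press = np.mean(press_resid ** 2)
```

In practice ẑ₋ᵢ would come from re-kriging with the fitted semivariogram, and σ̂²₋ᵢ from the corresponding kriging variance.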
2. Universal Kriging

The constant-mean assumption of ordinary kriging may not be reasonable in many practical situations. Two extensions that allow for a nonconstant mean are universal kriging and median polish kriging.

Assume

    Z(s) = β₀ + β₁f₁(s) + … + βₚfₚ(s) + ε(s)

where the fⱼ(·) are functions of spatial location (they can be any covariates measured at each location) and ε(·) is assumed to be intrinsically stationary. Again we seek a linear unbiased predictor that minimizes the variance of the prediction error:

    minimize Var[Σᵢ₌₁ⁿ λᵢZ(sᵢ) − Z(s₀)]
    subject to E[Σᵢ₌₁ⁿ λᵢZ(sᵢ)] = β₀ + β₁f₁(s₀) + … + βₚfₚ(s₀).

(This yields a set of p + 1 constraints.) There are then p + 1 Lagrange multipliers to be found, and the algebra is messier than in the ordinary kriging case. The optimal coefficients λ₁, …, λₙ are the first n elements of the vector λ_U that satisfies the following system of linear equations (the UK equations):

    Γ_U λ_U = γ_U

where
    λ_U = (λ₁, …, λₙ, m₀, m₁, …, mₚ)′
    γ_U = [γ(s₁ − s₀), …, γ(sₙ − s₀), 1, f₁(s₀), …, fₚ(s₀)]′
    Γ_U has entries γ(sᵢ − sⱼ) for i, j = 1, …, n; f_{j−n−1}(sᵢ) (with f₀ ≡ 1) for i = 1, …, n and j = n + 1, …, n + p + 1; and 0 for i, j = n + 1, …, n + p + 1,

and Γ_U is a symmetric (n + p + 1) × (n + p + 1) matrix.

We should try to understand why the trend exists, based on the nature of our data, and use a simple form for the trend if possible. We then subtract this trend from the observed data to obtain residuals, use the residuals to compute the sample variogram, fit a model variogram to it, predict the values at the unsampled locations (i.e. krige the residuals), and finally add the kriged residuals back to the trend.

OTHER EXTENSIONS OF ORDINARY KRIGING:

So far we have considered point kriging, i.e. prediction at a single site. Sometimes it is desirable to predict the average value over a region.
This can be done by a straightforward extension of OK called ordinary block kriging. In some cases a quantity such as P(Z(s₀) ≥ z₀ | Z) (e.g. "ozone levels in air cannot exceed 2 ppm" in environmental monitoring) is of more importance, and a method for predicting such a quantity is indicator kriging, which utilizes 0-1 data (exceeds the standard or not). In other situations there are measurements on more than one variable at each data location. An extended method that utilizes the dependence between variables, as well as the dependence within each variable, to predict values at unsampled locations is called cokriging.

Block Kriging

Suppose that we want to predict the average value of Z over a region B, i.e.

    Z(B) ≡ ∫_B Z(s)ds / |B|

where |B| is the area of the block. The theoretical development is similar to that of ordinary kriging and yields the ordinary block kriging equations

    Γ_OB λ_OB = γ_OB

where
    γ_OB = [γ(B, s₁), …, γ(B, sₙ), 1]′
    γ(B, sᵢ) = |B|⁻¹ ∫_B γ(u − sᵢ)du.

The ordinary block kriging predictor is

    Ẑ(B) = Σᵢ₌₁ⁿ λ_{OB,i} Z(sᵢ)

where λ_{OB,1}, …, λ_{OB,n} are the first n elements of λ_OB. The kriging variance is

    λ_OB′γ_OB − |B|⁻² ∫_B ∫_B γ(u − v)du dv.

In practice it will generally be necessary to evaluate the integrals by a numerical integration procedure. (This is an instance of the "change of support" problem.)

Median Polish Kriging

First do a median polish fit of overall, row and column effects, and compute the residuals from this fit. Then perform ordinary kriging on those residuals to get, say, ε̂(s₀). To get the median polish kriging predictor of Z(s₀), add the planar-interpolated median polish fit at s₀ to the kriged residual:

    Ẑ(s₀) = m(s₀; â, {r̂ₖ}, {ĉₗ}) + ε̂(s₀).

The kriging variance of the median polish kriging predictor is taken (with little modification) to be the ordinary kriging variance based on the median polish residuals.
Indicator Kriging

Define the indicator random field

    I(s, z) = 1 if Z(s) ≤ z;  0 otherwise.

The indicator random field is intrinsically stationary if the following conditions hold:
i) E[I(s, z)] ≡ F(z) for all s and all z ∈ R.
ii) Var[I(s, z) − I(s + h, z)] ≡ 2γ_I(h; z) for all h and all z ∈ R.

Indicator kriging proceeds as ordinary kriging does, but with I(sᵢ, z) in place of zᵢ and γ_I(·; z) in place of γ(·). Prediction is often carried out at K levels z₁, …, z_K, which requires the K corresponding semivariograms to be estimated and modeled.

- Other simple methods of spatial prediction:
1. Method of polygons.
2. Weighted averages based on triangulation.
3. Inverse distance (k-nearest-neighbor) methods.

Characterization of spatial cross-dependence

When sampling over a spatial domain, measurements are often collected on more than one variable, say m variables, and we may also be interested in the correlations between them. Consider for now the simplest case, confining the development to two variables, i.e. m = 2. As before, there are several functions that can be used to characterize the dependence between two variables.

• Cross-covariance function

    C_ij(s, t) = Cov(Z_i(s), Z_j(t)),  i, j = 1, 2.

Note that in general C_ij(s, t) ≠ C_ij(t, s) and C_ij(s, t) ≠ C_ji(s, t) for i ≠ j.

• "Traditional" cross-variogram

    2ν_ij(s, t) = Cov(Z_i(s) − Z_i(t), Z_j(s) − Z_j(t)),  i, j = 1, 2.

• "Pseudo" cross-variogram

    2γ_ij(s, t) = Var(Z_i(s) − Z_j(t)),  i, j = 1, 2.

Note that ν_ij requires that both variables be measured at the same locations (or at least at many of the same locations), whereas γ_ij requires that the two variables be measured in the same units in order to be meaningful. It is recommended to standardize the variables before estimating the latter quantity.
Estimation: with h = s − t and N(h) the set of location pairs separated by h,

Sample cross-covariance function:

    Ĉ_ij(h) = (1/|N(h)|) Σ_{(s,t)∈N(h)} Z_i(s)Z_j(t) − Z̄_{i,−h} Z̄_{j,+h}

where Z̄_{i,−h} and Z̄_{j,+h} are the means of the Z_i and Z_j values appearing in the sum.

Sample cross-variograms:

    2ν̂_ij(h) = (1/|N(h)|) Σ_{(s,t)∈N(h)} (Z_i(s) − Z_i(t))(Z_j(s) − Z_j(t))
    2γ̂_ij(h) = (1/|N(h)|) Σ_{(s,t)∈N(h)} (Z_i(s) − Z_j(t))² − (Z̄_{i,−h} − Z̄_{j,+h})²

Example:

Cokriging

Suppose that the data are now m × 1 vectors Z(s₁), …, Z(sₙ), and we may want to predict one or more of the variables at an unsampled location. Denote the jth element of the ith of these vectors by Z_j(sᵢ), and let s₀ denote the unsampled site. First consider predicting, say, Z₁(s₀). We could simply apply ordinary kriging to the first variable to get a predicted value. However, if the other variables are correlated with the first, then a better predictor can be obtained by basing the prediction on all of the elements of Z(s₁), …, Z(sₙ). The best linear unbiased predictor of Z₁(s₀) based on all of these is called the (ordinary) cokriging predictor. When we wish to predict the entire vector of variables at an unsampled site, i.e. Z(s₀), this can be accomplished using similar ideas and is called multivariate spatial prediction.

For m = 2, define Z₁ = [Z₁(s₁), …, Z₁(sₙ)]′ and Z₂ = [Z₂(s₁), …, Z₂(sₙ)]′. Then the cokriging predictor of Z₁(s₀) is of the form λ₁′Z₁ + λ₂′Z₂, whereas the multivariate spatial predictor of Z(s₀) is of the form Λ₁Z₁ + Λ₂Z₂, where Λ₁ and Λ₂ are matrices.

COKRIGING EQUATIONS: Assume for simplicity that m = 2 and that the two variables are jointly second-order stationary. The model for the process is Z(s) = β + ε(s), i.e.

    Z₁ = 1β₁ + ε₁,  Z₂ = 1β₂ + ε₂   (1 denotes an n-vector of ones)

and likewise Z(s₀) = (β₁ + ε₁(s₀), β₂ + ε₂(s₀))′.

Define the partitioned matrix

    Σ = [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂]

where Σ_ij = Cov(ε_i, ε_j) is the n × n matrix with (k, l) entry C_ij(s_k, s_l), and
    c₁ = Cov(ε, ε₁(s₀)) = [C₁₁(s₁, s₀), …, C₁₁(sₙ, s₀), C₂₁(s₁, s₀), …, C₂₁(sₙ, s₀)]′.

The cokriging equations for predicting Z₁(s₀) are

    [Σ₁₁  Σ₁₂  1  0
     Σ₂₁  Σ₂₂  0  1     [λ₁; λ₂; m₁; m₂]  =  [c₁; 1; 0]
     1′   0′   0  0
     0′   1′   0  0]

(the two constraints are 1′λ₁ = 1 and 1′λ₂ = 0). The ordinary cokriging predictor of Z₁(s₀) is then λ₁′Z₁ + λ₂′Z₂, and the associated cokriging variance is given by (λ₁′, λ₂′)c₁ + m₁.

Note that the symmetry condition C_ij(s, t) = C_ij(t, s) should be satisfied in order for cokriging based on 2ν_ij to give the optimal predictor. This condition is not required for 2γ_ij, which always gives the same predictor as cokriging based on the cross-covariance function.

We can get the same results using the variance-based cross-variogram. The cokriging equations in those terms are

    [Γ₁₁  Γ₁₂  1  0
     Γ₂₁  Γ₂₂  0  1     [λ₁; λ₂; m₁; m₂]  =  [γ₁; 1; 0]
     1′   0′   0  0
     0′   1′   0  0]

with the ordinary cokriging predictor of Z₁(s₀) again λ₁′Z₁ + λ₂′Z₂, and the associated cokriging variance (λ₁′, λ₂′)γ₁ + m₁.

To implement cokriging we need to estimate the cross-covariance functions or cross-variograms, choose valid parametric models for these functions, and fit the models to the estimates. Much research is still needed on these topics, especially because of the scarcity of known valid models.

EXAMPLE: m = 2,

    γ_ij(k, l) = 1 − exp(−|l − k|) for i = j;  1 − 0.5 exp(−|l − k + 1|) for i ≠ j.

Space-time Geostatistics

Suppose that we have observed spatial data at each of m time points, i.e.

    {Z(s_{1i}, tᵢ), …, Z(s_{nᵢ i}, tᵢ) : i = 1, …, m}

where s_{1i}, …, s_{nᵢ i} are the nᵢ data locations at time tᵢ, and t₁ < t₂ < … < t_m are the times of observation. When nᵢ ≡ n for all i, the data are said to be rectangular. The data are usually assumed to be an incomplete sampling of one realization of the stochastic process {Z(s, t), s ∈ D(t), t ∈ T}. If D(t) ≡ D and T = {1, 2, …
}, then we can view this as a time series of spatial processes. If the temporal correlation is non-negligible, then we generally need to assume spatial and temporal stationarity of some kind.

The generic space-time problem is to use the data to predict Z(s₀, t₀), where s₀ ∈ D and t₀ ∈ T; typically t₀ ≥ t_m. In principle we can use ideas from spatial kriging to perform space-time kriging, but some differences arise:

- Data in time often reveal a cyclical or periodic component, whereas data in space usually do not. This can be dealt with by using a mean function model that contains periodic components.
- We must use a valid space-time covariance function or semivariogram, e.g.
  i) include an extra parameter to scale time properly;
  ii) assume space-time additivity;
  iii) assume space-time separability.

3 Lattice Data

Recall the definition of lattice data: nontrivial observations are taken at a finite number of sites which together constitute the entire study region. For this type of data there is no possibility of a response between data locations. When the data locations are points, geostatistical methods can be used to handle the data, so we shall focus on the case where the data locations are regions. For areal (lattice) data, we use neighbour information to define spatial relationships.

Examples:
- Cancer rate in each city district
- Census data with zipcode divisions for a metropolitan area
- Remotely sensed data

Exploratory Data Analysis

Many of the EDA tools previously introduced for geostatistical data can also be applied to lattice data.
- For data on a regular grid: median polish; plots of row or column means versus row or column index; same-lag scatterplots.
- For irregularly spaced regions: 3-D scatterplots; the semivariogram cloud; plots of each datum against the average of its nearest neighbors; gray-scale maps; plots of the response versus the area of the region; etc.
The data analysis involves: representation of spatial proximity, testing for spatial pattern using Moran's I or Geary's c statistic, and modeling with autoregressive models (SAR, CAR).

Measures of spatial autocorrelation

The study objective is mainly to measure how strong the tendency is for observations from nearby regions to be more (or less) alike than observations from regions far apart, and then to judge whether any apparent tendency is sufficiently strong that it is unlikely to be due to chance alone.

- The data locations may be points or regions, and the response variable can be either discrete or continuous.

[figure: example 0-1 lattices illustrating spatial autocorrelation in binary data; layout lost in extraction]

1. The general cross-product statistic

Notation:
- Let Zᵢ denote the response at the ith location, i = 1, …, n.
- Let Yᵢⱼ be a measure of how similar or dissimilar the responses are at locations i and j.
- Let Wᵢⱼ be a measure of the spatial proximity of locations i and j.
- Define the matrices (for future reference) Y = (Yᵢⱼ) and W = (Wᵢⱼ); W is called a proximity matrix.

The general cross-product statistic is

    C = Σᵢ Σⱼ Wᵢⱼ Yᵢⱼ.

If C is too small ⇒
If C is too large ⇒

Example (hypothetical): let Yᵢⱼ = (Zᵢ − Zⱼ)² for binary Zᵢ's, and Wᵢⱼ = 1 if locations i and j are adjacent, 0 otherwise.

[figure: hypothetical 0-1 lattice for the example; layout lost in extraction]

Testing the statistical significance of C (H₀: no correlation):
- Normal approximation
- Comparison to the randomization distribution
- Monte Carlo approach

i) Normal approximation of C: let

    S₀ = Σ_{i≠j} Wᵢⱼ,  S₁ = (1/2) Σ_{i≠j} (Wᵢⱼ + Wⱼᵢ)²,  S₂ = Σᵢ (Wᵢ. + W.ᵢ)²

and define T₀, T₁ and T₂ similarly, but in terms of the Yᵢⱼ's. Then

    E(C) = S₀T₀ / [n(n − 1)]

and

    Var(C) = S₁T₁/[2n(n − 1)] + (S₂ − 2S₁)(T₂ − 2T₁)/[4n(n − 1)(n − 2)]
           + (S₀² + S₁ − S₂)(T₀² + T₁ − T₂)/[n(n − 1)(n − 2)(n − 3)] − [E(C)]²,

and C ~ approx. N(E(C), Var(C)).
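As a numerical sketch of the cross-product statistic and a Monte Carlo assessment of its significance: the 3 × 3 binary grid below is hypothetical, with rook adjacency and Yᵢⱼ = (Zᵢ − Zⱼ)², as in the example above.

```python
import numpy as np

# Hypothetical 3x3 binary lattice; rook adjacency; Y_ij = (Z_i - Z_j)^2.
grid = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [0, 0, 1]])
r, c = grid.shape
n = r * c
z = grid.ravel()

W = np.zeros((n, n))                       # proximity matrix (1 = adjacent)
for i in range(r):
    for j in range(c):
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ii, jj = i + di, j + dj
            if 0 <= ii < r and 0 <= jj < c:
                W[i * c + j, ii * c + jj] = 1

def cross_product(z, W):
    Y = (z[:, None] - z[None, :]) ** 2     # dissimilarity Y_ij
    return float(np.sum(W * Y))            # C = sum_ij W_ij Y_ij

C_obs = cross_product(z, W)                # counts each unlike join twice

# Monte Carlo approach: permute responses over locations m times and
# estimate P = (1 + #{C_perm >= C_obs}) / (1 + m).
rng = np.random.default_rng(0)
m = 999
C_perm = np.array([cross_product(rng.permutation(z), W) for _ in range(m)])
p_value = (1 + np.sum(C_perm >= C_obs)) / (1 + m)
```

For this grid each unlike adjacent pair contributes twice (W is symmetric), so C = 12 corresponds to 6 black-white joins.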
Compute

    z = (|C − E(C)| − 1) / √Var(C).

Example continued: [figure: 0-1 lattice, presumably

    1 1 0
    1 1 0
    0 0 0 ]

ii) Randomization distribution
- List all possible arrangements of the observed responses over the locations, obtained by permutation of the responses.
- Compute C for each arrangement, and rank these values.
- Determine where the observed C fits in; the P-value for the test is the proportion of C values in the randomization distribution as extreme as or more extreme than the observed C.

Example continued:

iii) Monte Carlo approach
- Complete enumeration of the possible arrangements may be computationally prohibitive even for moderately sized data sets.
- So instead, obtain a random sample from the randomization distribution and follow the same type of procedure.
- To implement this random sampling, generate n random numbers (one for each data location), rank them from smallest to largest, and rearrange the observations in accordance with the ranking of the random numbers. Compute C for this arrangement, and repeat the whole process m times.
- The P-value estimates the proportion of C values as extreme as or more extreme than the observed C, and is given by

    P = (1 + number of C values ≥ observed C) / (1 + m).

Example:

2. Join-Count Statistics

A subclass of the general cross-product statistics, for use with binary data.
- Code the data as either 1 (black) or 0 (white). The black-white classification is for the purpose of making a map.
- Question of interest: are neighboring locations more likely to display the same color (or opposite colors) than we would expect in the absence of spatial correlation?

Procedure:
• Classify the "joins" between contiguous regions as BB, BW, or WW.
• Define Wᵢⱼ = 1 if regions i and j share an edge, and 0 otherwise (the rook's definition of neighborhood). Other ways of defining neighborhoods: bishop's, queen's, etc.
• Count the number of joins of a specified type, e.g.
the number of BW joins, denoted BW. Note that if we define Yᵢⱼ = (Zᵢ − Zⱼ)², then

    C = Σᵢ Σⱼ Wᵢⱼ Yᵢⱼ = 2BW,

i.e. BW = C/2. Likewise BB = C*/2, where C* is the value of C obtained by defining Yᵢⱼ = ZᵢZⱼ. If the total number of joins in the system is J, then WW = J − BB − BW.

BW statistic (there is some evidence that this statistic is slightly better than the other two): let b = number of black regions and w = number of white regions (b + w = n). Note that E(BW) = (1/2)E(C) and Var(BW) = (1/4)Var(C). It can also be shown that

    T₀ = 2bw,  T₁ = 2T₀,  T₂ = 4nbw.

If the regions form a rectangular r × c lattice and the rook's contiguity definition is used, then

    S₀ = 2(2rc − r − c),  S₁ = 2S₀,  S₂ = 8(8rc − 7r − 7c + 4).

• Commonly used definitions of neighborhood for areal data (border/edge connectivity):
- Rook's: neighbors share an edge (spatial correlation down columns and across rows)
- Bishop's: neighbors share only a corner (spatial correlation in the diagonal directions)
- Queen's: neighbors share an edge or a corner (omni-directional correlation)
Under the queen's definition a single shared boundary point makes two regions neighbours; the rook's definition requires more than a single shared point to constitute neighbours.

The same approach can be used for data at irregularly spaced and shaped locations, but then only the formulas given for T₀, T₁ and T₂ apply; those for S₀, S₁ and S₂ do not.

- BB statistic: T₀ = b(b − 1), T₁ = 2T₀, T₂ = 4b(b − 1)².

Extensions to polytomous categorical data (i.e. a multi-colored map) are possible.

3. Moran's and Geary's statistics (for continuous data)

Moran's I (1950, Biometrika):

    I = (n/S₀) Σᵢ Σⱼ Wᵢⱼ(Zᵢ − Z̄)(Zⱼ − Z̄) / Σᵢ (Zᵢ − Z̄)²

where Z̄ = Σᵢ Zᵢ / n. Under independence, E(I) = −1/(n − 1).

    I > −1/(n − 1) ⇒
    I < −1/(n − 1) ⇒

- Normal approximation to the distribution of I under independence (n > 25): E(I) as before, and

    Var(I) = { n[(n² − 3n + 3)S₁ − nS₂ + 3S₀²] − k[n(n − 1)S₁ − 2nS₂ + 6S₀²] } / [(n − 1)(n − 2)(n − 3)S₀²] − 1/(n − 1)²

where k = n Σᵢ(Zᵢ − Z̄)⁴ / [Σᵢ(Zᵢ − Z̄)²]².
- For small sample sizes, one can use the randomization distribution or the Monte Carlo approach to evaluate significance.

Geary's c (1954, The Incorporated Statistician):

    c = (n − 1) Σᵢ Σⱼ Wᵢⱼ(Zᵢ − Zⱼ)² / [2S₀ Σᵢ(Zᵢ − Z̄)²]

Example: responses on a 3 × 3 lattice:

    6 9 6
    5 7 4
    4 2 2

4. Generalized Proximity Values

The join-count statistics and Moran's and Geary's statistics above have assumed that the Wᵢⱼ's are binary (0 or 1). This is rather crude; in many situations we can measure spatial proximity on a more refined scale (as we do with the Yᵢⱼ's in going from BW to I or c). Possible refinements:
- Use the lengths of common boundaries.
- Use the actual distance between locations or between centroids of locations, e.g. the inverse of the Euclidean or city-block distance.
- Incorporate directionality by allowing Wᵢⱼ ≠ Wⱼᵢ.

A side benefit of using non-binary Wᵢⱼ is that the distribution of the test statistic under independence is better approximated by the normal distribution.

Example:

5. Spatial Autocorrelation Functions

The statistics considered so far attempt to express information about spatial autocorrelation in a single number. Alternatively, we can regard spatial autocorrelation as a function of distance:
- Divide the range of distances into q classes.
- Compute a previously considered spatial autocorrelation measure, e.g. I, once for each of the q distance classes; in other words, use only those pairs of locations whose separation falls in the same distance class.
- Plot the statistic, e.g. I_d versus d. Such a plot is called the correlogram corresponding to that statistic (just as for geostatistical data).

MODELS

The most popular models for lattice data are similar to commonly used models for discrete time series. In time series analysis one of the best-known models is the autoregressive model of order one, AR(1):

    Xₜ = ρXₜ₋₁ + εₜ,  εₜ ~ iid N(0, σ²),  t = 0, ±1, ±2, …

where ρ ∈ (−1, 1) is called the autoregressive coefficient.
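Returning to the 3 × 3 lattice example above (values 6, 9, 6 / 5, 7, 4 / 4, 2, 2), Moran's I and Geary's c can be computed directly; rook adjacency with binary weights is an assumption here.

```python
import numpy as np

# 3x3 example lattice from the notes; rook adjacency assumed.
grid = np.array([[6, 9, 6],
                 [5, 7, 4],
                 [4, 2, 2]], dtype=float)
r, c = grid.shape
n = r * c
z = grid.ravel()

W = np.zeros((n, n))                       # binary rook proximity matrix
for i in range(r):
    for j in range(c):
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ii, jj = i + di, j + dj
            if 0 <= ii < r and 0 <= jj < c:
                W[i * c + j, ii * c + jj] = 1

S0 = W.sum()                               # sum of all weights
zc = z - z.mean()                          # deviations Z_i - Zbar

# Moran's I = (n/S0) * sum_ij W_ij zc_i zc_j / sum_i zc_i^2
moran_I = (n / S0) * (zc @ W @ zc) / (zc @ zc)

# Geary's c = (n-1) * sum_ij W_ij (Z_i - Z_j)^2 / (2 S0 sum_i zc_i^2)
sq_diff = (z[:, None] - z[None, :]) ** 2
geary_c = (n - 1) * np.sum(W * sq_diff) / (2 * S0 * (zc @ zc))
```

Here I ≈ 0.393 > −1/(n − 1) = −0.125 and c ≈ 0.587 < 1, both pointing toward positive spatial autocorrelation.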
It can be shown that corr(Xₜ, Xₜ₋₁) = ρ, corr(Xₜ, Xₜ₋₂) = ρ², and in general corr(Xₜ, Xₜ₋ₖ) = ρᵏ. Instead of assuming zero mean as in the AR(1) model above, we can allow for a trend by supposing that deviations of the responses from time-specific means, rather than the responses themselves, follow an AR(1) model:

    Xₜ − μₜ = ρ(Xₜ₋₁ − μₜ₋₁) + εₜ,  εₜ ~ iid N(0, σ²),  t = 0, ±1, ±2, …

where μₜ ≡ E(Xₜ). There are two ways to specify a first-order autoregressive model:
i) a simultaneous AR(1), as above;
ii) a conditional AR(1): Xₜ | Xₜ₋₁ ~ independent N(μₜ + ρ(Xₜ₋₁ − μₜ₋₁), σ²).

It turns out that these two specifications are equivalent here, i.e. they produce responses X₁, …, Xₙ that have the same joint distribution. Moreover, they are both equivalent to a "two-sided" specification,

    Xₜ − μₜ = (ρ/2)(Xₜ₋₁ − μₜ₋₁) + (ρ/2)(Xₜ₊₁ − μₜ₊₁) + εₜ,  εₜ ~ iid N(0, σ²),  t = 0, ±1, ±2, …

We can write this in matrix notation as

    X − μ = ρW(X − μ) + ε,  ε ~ N(0, σ²I)

where W is an n × n matrix whose nonzero elements specify the neighboring times of each time point. Specifically, after accounting for "edge effects", W has entries 1/2 on the sub- and superdiagonal and 0 elsewhere.

We generalize these ideas to spatial models for lattice data. Consider a spatial model in which each response is a first-order autoregression on the average of its neighbors' responses, i.e.

    Zᵢ = ρ Σ_{j∈Nᵢ} Zⱼ / |Nᵢ| + εᵢ,  εᵢ ~ iid N(0, σ²),  i = 1, …, n

where Nᵢ is the set of neighbors of location i and |Nᵢ| is the number of those neighbors.

Example: data on a 3 × 3 regular rectangular lattice, with neighbors defined as adjacent sites:

    Z₁ Z₂ Z₃
    Z₄ Z₅ Z₆
    Z₇ Z₈ Z₉

A model of this kind is called a spatial autoregression model. As in the time series case, we can specify two kinds of spatial autoregressive models. In their most general forms they are:

1. Simultaneous autoregression (SAR) model

    Zᵢ − μᵢ = Σⱼ Sᵢⱼ(Zⱼ − μⱼ) + εᵢ,  εᵢ ~ iid N(0, σ²),  i = 1, …
, n, where S ≡ {Sᵢⱼ} is such that I − S is nonsingular. In matrix form,

    Z − μ = S(Z − μ) + ε,  ε ~ N(0, σ²I).

2. Conditional autoregression (CAR) model

    (Zᵢ | Zⱼ, j ≠ i) ~ N(μᵢ + Σⱼ Cᵢⱼ(Zⱼ − μⱼ), σ²)

where C ≡ {Cᵢⱼ} is such that I − C is symmetric and positive definite. In contrast to the time series case, however, the two specifications yield different models: if we take Cᵢⱼ = Sᵢⱼ, the CAR yields responses whose joint distribution is different from that of the SAR.

Examples:

INFERENCE

For the SAR, the distribution of the data is

    Z − μ ~ N(0, σ²(I − S)⁻¹(I − S′)⁻¹)

and for the CAR,

    Z − μ ~ N(0, σ²(I − C)⁻¹).

The log-likelihood function associated with a Z from either process is

    −(n/2) log(2πσ²) + (1/2) log|B| − (1/(2σ²))(Z − μ)′B(Z − μ)

where B = (I − S′)(I − S) for a SAR and B = I − C for a CAR. Usually the mean is parametrized by a linear model μ = Xβ, and the MLEs of β, σ² and the parameters in B are given by

    β̂ = (X′B̂X)⁻¹X′B̂Z
    σ̂² = (1/n)(Z − Xβ̂)′B̂(Z − Xβ̂)

where B̂ minimizes the "profile log-likelihood" L(B) = n log(σ̂²) − log|B|.

4 Spatial Point Patterns

Terminology:
• "Pattern": the characteristic of a set of points (events) which describes their locations in terms of the relative distances and orientations of one point, or one group of points, to other points, at one or more scales of observation.
• Spatial point pattern: the locations of a finite number of events in a bounded region A ⊆ R^d.
• Spatial point process (SPP): a set of random events in R^d, i.e. a random mechanism for generating a countable set of points in A.
• Key point: the locations are modeled as random variables. In what follows, we refer to the points of a spatial point pattern or process as events, to distinguish them from arbitrary points of A.
If one or more additional variables (other than the location labels) are measured at each point, the SPP is referred to as a marked point pattern.

• Objectives of the statistical analysis of SPPs:
- What is the nature of the spatial pattern?
- What is the intensity of the underlying process?
- Can we model the process that we envisage has generated the data, and can we do statistical inference on the parameters of the model?

• Appropriate statistical methods for addressing these questions depend on
- the extent of sampling (completely mapped or sparsely mapped)
- the type of sampling (quadrat or distance sampling)

• Four-way classification of SPPs (a qualitative, single-scale, simplistic classification):
i) Completely random (complete spatial randomness, CSR): no obvious structure; a random sample from the uniform distribution on A.
ii) Aggregated (clustered, clumped)
iii) Regular (overdispersed, inhibitory, superuniform)
iv) Heterogeneous

Exploratory Analysis

1. Quadrat methods
Partition A into subregions of equal size (quadrats) and summarize the spatial pattern in each quadrat. Quadrats are usually rectangular, but may or may not constitute an exhaustive partition of the study area.

2. Distance methods
Based on reducing the SPP to distances. These methods may utilize inter-event distances (such as the distance of an event to its nearest neighbor), point-to-event distances, or both.

Note:
- The size and shape of the quadrats are arbitrary, and different choices can give different results.
- The two main problems with distance methods are edge effects and overlap effects.

• Edge effects
Distance measurements taken near the boundary of A tend to be larger than those taken in the interior, since points or events near the boundary are denied the possibility of neighbors outside the boundary.
Possible remedies:
- Restrict attention to points or events in the interior, surrounded by a "buffer zone".
- If A is rectangular, connect opposite edges (toroidal edge correction).
- Censor the search distance and incorporate this into the distribution of the distance measurements.
- Base inferences on the actual areas searched instead of distances.

• Overlap effects
"Search areas" for the nearest event can overlap, resulting in dependent measurements. Possible remedies:
- Use sparse sampling (undesirable, however, for completely mapped patterns).
- Censor to prevent overlap.

4.1 Models

• Notation and definitions:
- Process: {N(x) : x ∈ A}
- N(B): the number of events in an arbitrary region B ⊆ A
- |B|: the area of B
- dx: an infinitesimal region containing x ∈ A
- Intensity function (first-order): λ(x) = lim_{|dx|→0} E[N(dx)]/|dx|
- Second-order intensity function: λ₂(x, y) = lim_{|dx|,|dy|→0} E[N(dx)N(dy)]/(|dx||dy|)
- Stationarity: a process is stationary if all probability statements about it in any region B ⊆ A are invariant under arbitrary translations of B.
- Isotropy: a process is isotropic if the same invariance holds under rotation as well as translation.
- Orderliness: a process is first-degree orderly if

    lim_{|dx|→0} P(N(dx) > 1)/|dx| = 0

and second-degree orderly if

    lim_{|dx|,|dy|→0} P(N(dx) > 1, N(dy) > 1)/(|dx||dy|) = 0.

Orderliness thus rules out coincident events: the probability of more than one event in an increasingly small area is negligible.

1. Homogeneous Poisson Process (HPP, CSR)

Two equivalent characterizations:
i) For every B ⊆ A, N(B) has a Poisson distribution with mean λ|B| for some λ > 0.
ii) Conditional on N(A), the events of the process are a random sample from the uniform distribution on A.

Note that i) ⇔ ii).
- Stationary and isotropic, with intensity λ
- λ₂(t) = λ²

2.
Poisson Cluster Process (PCP)

Three postulates:
i) Cluster centers form an HPP with intensity ρ.
ii) The numbers of events in the clusters are iid variates with mean μ.
iii) The positions of the events within a cluster, relative to its center, are iid with pdf h(·).
- Stationary, with intensity λ = ρμ
- Isotropic ⇔ h(·) is radially symmetric

3. Simple Inhibition Process (SIP)

Two types:
i) Static: modify an HPP of intensity ρ by deleting all pairs of events less than δ units apart.
ii) Sequential (dynamic): the first event is uniformly distributed on A; the distribution of each subsequent event, conditional on all previously realized events, is uniform on the portion of A that lies no closer than δ to any previously realized event.
- Stationary and isotropic
- The static SIP has λ = ρ exp(−πρδ²) and

    λ₂(t) = 0 for 0 < t < δ;  ρ² exp(−ρU_δ(t)) for t ≥ δ

where U_δ(t) is the area of the union of two circles of radius δ whose centers are a distance t apart.
- For the sequential SIP, for any fixed number of events desired, δ cannot be too large or it becomes impossible to add further events (this is related to the maximum packing intensity). The maximum permissible value of δ is usually given by

    √(2|A| / (√3 N)).

4. Inhomogeneous Poisson Process (IPP)

A nonstationary process with variable intensity function λ(x):
i) For every B ⊂ A, N(B) ~ Poisson with mean ∫_B λ(x)dx.
ii) Conditional on N(A), the events are a random sample from the continuous distribution with pdf proportional to λ(x).

Simulation:
- Generate an event from the uniform distribution on A; call its coordinate vector x.
- Retain the event at x with probability λ(x)/λ₀, where λ₀ ≡ max_{x∈A} λ(x).
- Repeat the above steps until N events have been obtained.

5. Cox Process (doubly stochastic process)

{Λ(x) : x ∈ R²} is a nonnegative-valued stochastic process. Conditional on {Λ(x) = λ(x) : x ∈ R²}, the events form an IPP with intensity function λ(x).
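The thinning simulation of an inhomogeneous Poisson process described under item 4 can be sketched as follows; the intensity surface λ(x, y) = 100·exp(−2x) on the unit square is an illustrative assumption.

```python
import numpy as np

# Simulate an IPP on A = [0,1]^2 by thinning uniform proposals:
# retain a proposal at x with probability lambda(x)/lambda0.
rng = np.random.default_rng(5)

def intensity(x, y):
    """Illustrative intensity lambda(x, y), decaying in x."""
    return 100.0 * np.exp(-2.0 * x)

lam0 = 100.0                               # max of lambda over A
N = 60                                     # number of events desired
events = []
while len(events) < N:
    x, y = rng.uniform(0, 1, size=2)       # uniform proposal on A
    if rng.uniform() < intensity(x, y) / lam0:  # retain w.p. lambda/lambda0
        events.append((x, y))
events = np.array(events)
```

Because the intensity decays in x, accepted events pile up near the left edge of the square, exactly the heterogeneity the IPP is meant to model.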
6. Markov Process
A process is Markov of range δ if the conditional intensity at a point x, given the configuration of events in A − {x}, depends only on the configuration in a well-defined neighborhood of x.

7. Strauss Process
This process belongs to the class of pairwise interaction processes. The joint density function for n point locations (s1, ..., sn) containing m distinct pairs of neighbors is specified as
  f(s1, ..., sn) = α β^n γ^m,  β > 0, 0 ≤ γ ≤ 1,
where α is the normalizing constant, β reflects the intensity, and γ describes the interaction between neighbors.

8. Thinned Process
Start with any "primary" process (an HPP, for example). Bring in a stochastic process {Z(x)} such that 0 ≤ Z(x) ≤ 1 for all x, called the "thinning field". Let {z(xi)} denote its realization. Retain each event xi in the realized primary process with probability z(xi).

4.2 Tests for CSR: Completely Mapped Patterns

1. Quadrat Methods
Let n1, ..., nm denote the counts from a partitioning of A into m equally-sized quadrats. Write n̄ = Σ ni / m for the sample mean of the ni's. Then compute the "index of dispersion",
  X² = Σ_{i=1}^m (ni − n̄)² / n̄.
If the pattern is completely random, then the distribution of X² is, to a good approximation, χ²_{m−1} (provided that n̄ is not too small, say n̄ ≥ 5). The test is two-sided:
X² too large ⇒ evidence of aggregation (clustering)
X² too small ⇒ evidence of regularity
Example: Analysis of Japanese black pines.
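To make the quadrat test concrete, here is a minimal Python sketch of the index of dispersion (the example counts are made up for illustration; this is not code from the notes):

```python
def dispersion_index(counts):
    """Index of dispersion X^2 = sum_i (n_i - nbar)^2 / nbar for quadrat
    counts n_1, ..., n_m; approximately chi-square(m - 1) under CSR."""
    m = len(counts)
    nbar = sum(counts) / m
    x2 = sum((n - nbar) ** 2 for n in counts) / nbar
    return x2, m - 1

# Nearly equal counts (regular-looking) vs. wildly varying counts
# (aggregated-looking); both have nbar = 5 over m = 6 quadrats
x2_reg, df = dispersion_index([5, 5, 4, 6, 5, 5])
x2_agg, _ = dispersion_index([0, 0, 15, 0, 0, 15])
```

Comparing x2_reg and x2_agg to χ²₅ quantiles would flag the first as suspiciously small and the second as suspiciously large, matching the two-sided interpretation above.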
Examples of mapped patterns: [figure: four point patterns — Pines, Redwoods, Cells, Rushes]
- Quadrat methods are insensitive to regular departures from CSR.
- The conclusion can depend on quadrat size and shape, the choice of which is quite arbitrary.
- Too much information is lost by reducing the pattern to quadrat counts.
However, an analysis based on combining contiguous quadrats can be very useful for characterizing pattern at different scales. For example:
i) Successively combine quadrats into 2 × 2, 4 × 4, ... blocks.
ii) Plot X² for each block size vs. block size.
iii) Peaks or troughs in the plot can be interpreted as evidence of scales of pattern.
iv) A problem with this approach is that the values of X² at different scales are not independent. Mead (Biometrics, 1974) suggests a modification that yields a sequence of independent tests.

2. Distance Methods
a) Clark-Evans test (Ecology, 1954)
- Based on the mean nearest-neighbor (NN) distance, Ȳ:
Ȳ too small ⇒ evidence of aggregation
Ȳ too large ⇒ evidence of regularity
The test statistic is given by
  CE = (Ȳ − 1/(2√λ)) / [(4 − π)/(4λπN)]^{1/2},
where λ = N/|A|.
- The test tends to be powerful for detecting aggregation and regularity, but weak at detecting heterogeneity.
- Under CSR, and if edge and overlap effects are ignored, the distribution of CE is, to a fairly good approximation, N(0, 1). Note that the above statistic ignores edge and overlap effects.
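A bare-bones Python version of the Clark-Evans statistic, ignoring edge and overlap effects as the statistic above does (the brute-force O(N²) nearest-neighbor search and the test pattern are illustrative, not from the notes):

```python
import math

def clark_evans(points, area):
    """CE = (Ybar - 1/(2*sqrt(lam))) / sqrt((4 - pi) / (4 * lam * pi * N)),
    with lam = N / |A|; approximately N(0, 1) under CSR when edge and
    overlap effects are ignored."""
    n = len(points)
    lam = n / area
    # nearest-neighbor distance for each event (naive O(N^2) search)
    nn = [min(math.hypot(px - qx, py - qy)
              for j, (qx, qy) in enumerate(points) if j != i)
          for i, (px, py) in enumerate(points)]
    ybar = sum(nn) / n
    se = math.sqrt((4 - math.pi) / (4 * lam * math.pi * n))
    return (ybar - 1 / (2 * math.sqrt(lam))) / se

# A perfectly regular 5 x 5 unit-spaced grid on |A| = 25: every NN distance
# is 1, well above the CSR expectation 1/(2*sqrt(lam)) = 0.5, so CE >> 0
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
ce_regular = clark_evans(grid, 25.0)
```

A tightly clustered pattern drives Ȳ below 1/(2√λ) and CE strongly negative, matching the directional interpretation above.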
There are various modifications to account for these; one, given by Donnelly, is as follows:
  E(Ȳ) = 0.5 (|A|/N)^{1/2} + 0.0514 l(A)/N + 0.041 l(A)/N^{3/2}
  Var(Ȳ) = 0.0703 |A|/N² + 0.037 l(A) (|A|/N⁵)^{1/2}
where l(A) is the length of the study region's perimeter.
Example:

b) Diggle's refined NN analysis (Biometrics, 1979)
A test based on the entire empirical distribution function (EDF) of the NN distances may be more sensitive than the Clark-Evans test when there are few of the intermediate distances expected under CSR. Let
  Ĝ(y) = #(Yi ≤ y) / N.
If CSR holds, Ĝ(y) should be close to G(y) = 1 − exp(−λπy²) for all y > 0, and a plot of Ĝ(y) vs. G(y) should be nearly a straight line.
Ĝ(y) > G(y) for small y ⇒ evidence of aggregation
Ĝ(y) < G(y) for small y ⇒ evidence of regularity
• Measures of discrepancy between Ĝ(·) and G(·):
- ∆G = max_y |Ĝ(y) − G(y)| (Kolmogorov-Smirnov type)
- ∫ {Ĝ(y) − G(y)}² dy (Cramer-von Mises type)
• To assess the significance of these tests, Monte Carlo testing is usually used because the distribution theory is difficult. That is, we compare the measure's value for our data to the measure's values for s simulations of an HPP (typically s = 99 or 999).
- As a modification for edge and overlap effects, using
  Ḡi(y) = (1/(s−1)) Σ_{j≠i} Ĝj(y)
in place of G(y) is recommended. Koen (1990, Biometrical Journal) has tabulated the distribution of ∆G using simulation.
Rather than reducing the EDF to a single summary statistic, it may be more informative to look at a plot of the EDF. If the SPP is consistent with CSR, then a plot of Ĝ(y) vs. G(y) should be nearly a straight line through the origin.
Departures from CSR can be detected by means of simulation envelopes, whose upper and lower endpoints are defined as
  U(y) = max_{i=1,...,s} {Ĝi(y)},  L(y) = min_{i=1,...,s} {Ĝi(y)},
where s is the number of simulated HPP patterns having the same number of events (s is usually taken to be 99), and Ĝi(·) is the NN-distance EDF for the ith simulation. For each y > 0,
  P[Ĝ(y) > U(y)] = P[Ĝ(y) < L(y)] = 1/(s + 1).
Simulation envelopes also indicate the distance at which a deviation from CSR occurs, if there is any.
We can do precisely the same kinds of tests using the EDF of the point-to-nearest-event distances X1, ..., Xm from m random or systematically placed sample points. Let
  F̂(x) = #(Xi ≤ x) / m.
If CSR holds, F̂(x) should be close to F(x) = 1 − exp(−λπx²) for all x > 0, and a plot of F̂(x) vs. F(x) should be nearly a straight line.
F̂(x) > F(x) for small x ⇒ evidence of regularity
F̂(x) < F(x) for small x ⇒ evidence of aggregation
The use of both Ĝ(y) and F̂(x) is what Diggle calls refined NN analysis.

c) Ripley's K-function approach (JRSS-B, 1979)
The "K-function" (second-moment cumulative function) is defined as
  K(t) = (1/λ) E(number of additional events within distance t of a randomly chosen event).
- It combines distance measurements with quadrat counting. For an HPP, K(t) = πt², or equivalently,
  L(t) ≡ t − {K(t)/π}^{1/2} = 0.
L(t) < 0 (K(t) > πt²) for small t ⇒ evidence of aggregation
L(t) > 0 (K(t) < πt²) for small t ⇒ evidence of regularity
- Ripley proposes a nonparametric estimator K̂(t) of K(t) (whose exact form we will not go into). He suggests looking at the plot of L̂(t) ≡ t − {K̂(t)/π}^{1/2} vs. t and computing the test statistic
  Lmax = max_{t<t0} |L̂(t)|.
The upper bound t0 is used to account for the scarcity of information about K(t) at "large" distances.
• A Monte Carlo approach is preferably used to assess significance.
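The envelope construction for Ĝ is easy to sketch in Python (standard library only; brute-force O(s·N²) NN distances, no edge correction — purely illustrative, not code from the notes):

```python
import math
import random

def g_hat(points, y_grid):
    """NN-distance EDF: Ghat(y) = #(Y_i <= y) / N on a grid of y values."""
    n = len(points)
    nn = [min(math.hypot(px - qx, py - qy)
              for j, (qx, qy) in enumerate(points) if j != i)
          for i, (px, py) in enumerate(points)]
    return [sum(d <= y for d in nn) / n for y in y_grid]

def csr_envelopes(n, side, y_grid, s=99, rng=None):
    """U(y) = max_i Ghat_i(y) and L(y) = min_i Ghat_i(y) over s simulated
    CSR patterns with the same number of events on a side x side square."""
    rng = rng or random.Random(2)
    sims = [g_hat([(rng.uniform(0, side), rng.uniform(0, side))
                   for _ in range(n)], y_grid)
            for _ in range(s)]
    lower = [min(g[k] for g in sims) for k in range(len(y_grid))]
    upper = [max(g[k] for g in sims) for k in range(len(y_grid))]
    return lower, upper
```

An observed Ĝ(y) escaping above U(y) at small y points to aggregation at that distance, and escaping below L(y) to regularity, each with pointwise level 1/(s + 1).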
Example:

Modeling Completely Mapped Patterns
Methods used to fit models to patterns may differ from model to model. For example, for some models maximum likelihood estimation is possible, but for others the likelihood function is intractable or computationally burdensome to evaluate.

1. Stationary Processes
If CSR is rejected, we may want to fit an alternative model to the data, such as a PCP or SIP. Let K̂(t) be a nonparametric estimator of K(t), and suppose that we wish to fit a family of stationary models whose K-function is a known function of a parameter vector θ. A modified least squares estimator of θ is obtained by minimizing
  Q(θ) = ∫_0^{t0} {[K̂(t)]^c − [K(t; θ)]^c}² dt,
where c and t0 are "tuning constants".
• Some computable K-functions:
- PCP with a Poisson number of offspring per parent: K(t) = πt² + H(t)/ρ, where H(t) is a nonnegative-valued function.
- SIP:
  K(t) = 2π exp(2πρδ²) ∫_δ^t exp{−ρ U_δ(x)} x dx.
• c is used to control for the heterogeneity of the variance of K̂(t); c = 1/4 is suggested for aggregated patterns, and c = 1/2 for regular patterns.
• t0 is used as an upper limit since the pattern supplies increasingly limited information as t increases.

2. Inhomogeneous Poisson Process (IPP)
a) Maximum likelihood estimation
Consider a parametric family of intensity functions {λθ(x, y) : θ ∈ Θ}. For this family, the likelihood function is proportional to
  l(θ; A) = {Π_{i=1}^{N(A)} λθ(xi, yi)} exp{−∫_A λθ(u, v) du dv}.
An MLE θ̂ is a value of θ that maximizes l(θ; A). A particularly useful family of intensity functions is
  λ(x, y; θ) = exp{θ′z(x, y)},
where z(x, y) is a vector whose components may be values of concomitant environmental variables, known functions of the coordinates themselves, or distances to known environmental features.
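For the log-linear family, taking logs of the likelihood above gives Σi θ′z(xi, yi) − ∫_A exp{θ′z(u, v)} du dv (up to a constant). A Python sketch with the integral approximated by a Riemann sum over grid cell centers (the covariate z(x, y) = (1, x) and the point set are hypothetical illustrations):

```python
import math

def ipp_loglik(theta, points, z, grid, cell_area):
    """Log-likelihood (up to an additive constant) of an IPP with
    lambda(x, y; theta) = exp(theta . z(x, y)):
    sum_i theta.z(x_i, y_i) - integral_A exp(theta.z(u, v)) du dv,
    the integral approximated by a Riemann sum over grid cell centers."""
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    term1 = sum(dot(theta, z(x, y)) for (x, y) in points)
    term2 = sum(math.exp(dot(theta, z(x, y))) for (x, y) in grid) * cell_area
    return term1 - term2

# Hypothetical covariate vector z = (1, x) on the unit square
z = lambda x, y: (1.0, x)
grid = [((i + 0.5) / 20, (j + 0.5) / 20) for i in range(20) for j in range(20)]
points = [(0.9, 0.5), (0.8, 0.3), (0.95, 0.7)]  # events piled up at large x
ll_up = ipp_loglik((0.0, 1.0), points, z, grid, 1 / 400)
ll_down = ipp_loglik((0.0, -1.0), points, z, grid, 1 / 400)
```

With the events concentrated at large x, a positive slope on x fits better (ll_up > ll_down); with the slope component fixed at 0 the criterion reduces to the HPP log-likelihood, maximized at exp(θ̂₁) = N(A)/|A|.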
- Special case of HPP: with constant intensity λθ(x, y) ≡ λ, the MLE is λ̂ = N(A)/|A|.

b) Nonparametric estimation
As an alternative to parametric estimation, nonparametric methods for multivariate density estimation can be applied to the problem of estimating λ(·). An edge-corrected kernel estimator of λ(x, y) is given by
  λ̂h(x, y) = (1/p_h(x, y)) Σ_{i=1}^{N(A)} (1/h²) κ( [(x − xi)² + (y − yi)²]^{1/2} / h ),
where κ(·) is a probability density (kernel) function symmetric about the origin, h > 0 is a bandwidth that determines the amount of smoothing, and p_h(x, y) is an edge correction.

4.3 Testing for CSR: Sparsely Sampled Data
Now suppose that there were not sufficient resources to completely map the events, and that the pattern was sampled in some manner. The most common sampling methods are completely random sampling and systematic sampling. Note that the sampling must be carried out completely independently of the observed events.
- Advantages and disadvantages of systematic sampling:

1. Quadrat methods
Here n1, ..., nm are the counts from m randomly placed, non-overlapping, equally-sized quadrats in A with relatively sparse coverage of A, rather than from a complete partition of A. The same test statistic and its limiting distribution under CSR can be used as in the completely mapped case:
  X² = Σ_{i=1}^m (ni − n̄)² / n̄ ∼ χ²_{m−1} under CSR.
This approach is more competitive in this context, and is generally quite powerful against aggregation and heterogeneity, but weak against regularity.
- Example: Lansing Woods data

2. Distance methods
The data generally cannot be NN distances if they are to be regarded as a random sample. So consider methods based on m sample point-to-nearest-event distances X1, ..., Xm.
- Each Xi has cdf F(x) = 1 − exp(−λπx²) under CSR (ignoring edge effects).
- So if X1, ..., Xm are independent (as is the case if overlap effects are ignored), then
  2λ Σ_{i=1}^m πXi² ∼ χ²_{2m}.
However, an exact test for CSR cannot be based on this since λ is unknown.

• Hopkins' test
Suppose, for the sake of argument, that we could measure NN distances Y1, ..., Ym from a randomly selected subset of m events. Then, by the same arguments,
  2λ Σ_{i=1}^m πYi² ∼ χ²_{2m},
which is independent of 2λ Σ_{i=1}^m πXi² under CSR, ignoring overlap effects. Then
  H ≡ (2λ Σ_{i=1}^m πXi² / 2m) / (2λ Σ_{i=1}^m πYi² / 2m) ∼ F_{2m,2m}.
H small ⇒ evidence of regularity
H large ⇒ evidence of aggregation
But, as noted above, Hopkins' test is not quite sound, as we cannot obtain a random sample of the Yi's.

• Alternatively, consider T-square sampling: let Xi be the distance from a sample point to the nearest event, and let Zi be the distance from that nearest event to its NN within the half-plane "perpendicular" to the chord from the point to the nearest event. Thus, the search area associated with Zi is a semicircle, not a circle. By an argument similar to one given before,
  λ Σ_{i=1}^m πZi² ∼ χ²_{2m}.
Then we can test for CSR using
  t ≡ (2λ Σ_{i=1}^m πXi²) / (λ Σ_{i=1}^m πZi²) ∼ F_{2m,2m}.
Several other distance-based tests for CSR have been proposed; see, for example, the book by Cressie.
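Both F-statistics simplify because λ and π cancel between numerator and denominator. A small Python sketch (the distance values in the example are made up for illustration):

```python
def hopkins(x_dists, y_dists):
    """Hopkins' statistic H = sum(X_i^2) / sum(Y_i^2): point-to-event
    squared distances over event-to-event NN squared distances (lambda
    and pi cancel); ~ F(2m, 2m) under CSR, ignoring overlap effects."""
    return sum(x * x for x in x_dists) / sum(y * y for y in y_dists)

def t_square(x_dists, z_dists):
    """T-square statistic t = 2 * sum(X_i^2) / sum(Z_i^2); ~ F(2m, 2m)
    under CSR (the factor 2 reflects the semicircular search area for Z_i)."""
    return 2 * sum(x * x for x in x_dists) / sum(z * z for z in z_dists)

# Aggregated pattern: sample points tend to land in empty space (large X_i)
# while events sit close to other events (small Y_i), so H comes out large
h_agg = hopkins([0.9, 1.1, 1.0], [0.1, 0.2, 0.1])
h_csr = hopkins([0.5, 0.4, 0.6], [0.5, 0.6, 0.4])
```

Under CSR the two sums of squares are comparable and H hovers near 1, as in h_csr; values far into either tail of F_{2m,2m} signal aggregation or regularity.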