Download ch4.distance.methods..

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Foundations of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Transcript
Chapter 4 – Distance methods
Distance sampling (or called plotless sampling) is widely used in forestry and ecology
to study the spatial patterns of plants. Numerous mathematical models based on
distance sampling have been developed since the 50’s.
These models depend partly or wholly on distances from randomly selected points to
the nearest plant or from a randomly selected plant to its nearest neighbor.
The majority of the models are based on the assumptions that (1) the population of
interest is randomly distributed (Poisson distribution) within an infinitely large area and
(2) an observed distribution is a realization (or part) of the theoretical population.
Distance methods make use of precise information on the locations of events and have
the advantage of not depending on arbitrary choices of quadrat size or shape.
1
100
80
1. Two types of distance
measures: from tree to
tree and from point to
tree.
2. In general a buffer
zone is needed to
eliminate edge effect.
y
60
ri
x
20
40
ri
ri
0
20
40
60
80
100
x
2
Nearest neighbor distance index
This index is the simplest one, based on the distance from a tree to its nearest
neighbor. It was first developed by Clark and Evans (1954). It is defined as
R
rA
rE
where R = the nearest neighbor index
rA = average distance from randomly selected plants to their nearest
neighbors
rE = expected mean distance between nearest neighbors. Under the Poisson
distribution with intensity l, we have
rE 
1
2 l
* Clark, P.J. and Evans, F.C. 1954. Distance to nearest neighbor as a measure of spatial relationships in
populations. Ecology 35:445-453.
3
Testing the nearest neighbor distance index
R
rA
rE
The ratio R provides a method for detecting the degree to which the observed
distance departs from random expectation. In a regular distribution, R would be
significantly greater than 1, whereas in an aggregated distribution R would be
significantly less than 1. To test the null hypothesis (H0) that the observed
distance is from a randomly distributed population, we have
z
where rE 
1.
2.
1
2 l
rA  rE
~ N (0,1)
sr
and sr  4   1  0.26136 .
4 nl
nl
Two-tail test: p-value = p(|z|  zobs). Large |zobs| value has small p-value,
evidence against H0, suggesting aggregated or regular pattern.
One-tail test: p-value = p(z  zobs) for testing regularity, or p-value = p(z  zobs)
for testing aggregated pattern.
4
Derivation of the nearest neighbor distance
We now go on to show how the nearest neighbor distance was derived. Assume a
population of organisms randomly distributed with intensity l, the probability of x
individuals falling in any area of unit size is
p( x ) 
lx e  l
x!
•
.
r
Then, the number of individuals in a circle of radius r follows
a Poisson distribution with mean
p( x ) 
lr2:
•
•
r1
•
2 x l r 2
( lr ) e
x!
.
Similarly, the probability that the number of individuals in the annulus between the
concentric circles radii r and r1 is
p( x ) 
[l ( r12
x l ( r12 r 2 )
 r )] e
x!
2
.
5
p( x ) 
2 x l r12
( lr1 ) e
x!
[l ( r22  r12 )] x el
p( x ) 
x!
•
.
r
•
( r22 r12 )
•
•
.
r1
The probability for the nearest neighbor distance r can be derived as follows.
p(r)  p(circle r is empty, but individuals occur in the annulus)
= p(circle r is empty)  p(individuals occur in the annulus)
It is straightforward to compute the two probabilities:
The first probability is:
p( x  0 )  e l r .
2
The second probability is: p( x  0)  1  e
Therefore,
p( r )  e
lr 2 
1  e

l ( r12 r 2 )
.
l ( r12 r 2 ) 
.

6
The probability for the nearest neighbor distance r is obtained by assuming r1  r:
p( r )  e
e
lr 2 
1  e

l r 2
l ( r12 r 2 ) 

l ( r12

e

lr 2
 r )  le
2
1  1  l (r
2
1
l r 2
 r2 )

( r1  r )( r1  r )  2rle
l r 2
dr
Thus, the pdf for the nearest neighbor distance r is a Weibull distribution:
p( r )  2lre

Mean:
2 lr 2
E ( r )   2lr e
l r 2
dr 
0
Variance:
V (r ) 
4 
4l
V (r ) 
1
2 l
4 
4nl
Need to use the gamma function:

( )   x 1e  x dx
0
(
1
)
2

7
An example for the nearest neighbor distance
We test the spatial pattern for the western hemlock in the Victoria Watershed plot.
There are 982 hemlock stems in the 10387 m plot. The procedure is as follows.
1.
2.
20
40
60
80
Clark & Evans Nearest Neighbor Index
0
Randomly choose 200 stems,
Measure the distance for each of these 200 stems to
its nearest neighbor,
3.
Average these 200 distances (= 1.0458),
4.
Calculate the expected mean distance (= 1.5104),
5.
Compute the density l = 0.1096,
6.
The nearest neighbor index R = 0.6924,
7.
Calculate the standard error sr = 0.05582,
8.
Calculate the z-value = (1.0458-1.5104)/0.05582 =
-8.3232,
9.
p-value = p(z  zobs) = p(z  -8.3232) = 0,
10. Conclusion: Reject null hypothesis of random
distribution; strong evidence for aggregated spatial
pattern.
11. R: distance.main(hl.xy,200,”event.event”)
0
20
40
60
80
100
8
The nth nearest neighbor distance
Thompson (1956) proved that the mean distance to the nth nearest neighbor is
E (rn ) 
For Victoria HL:

1 n  2n !
l 2n  n!2

1
l
0.5 1/ 8 n
n e

1
l
n 0.5
Observed Hemlock
l  0.1124
CSR expectation
Observed Hemlock
Thompson, H.R. 1956. Distribution of distance to n th neighbour in a
population of randomly distributed individuals. Ecology 37:391-394.
9
1 n(2n)!
1 1/ 2
E (rn ) 

n
n
2
l (2 n!)
l
Hubbell, S.P. et al. 2008. How many tree species are
there in the Amazon and how many of them will go
extinct? PNAS 105:11498-11504
Index of point to plant distances
First proposed by Pielou (1959), is based on the distances from randomly chosen
points to their respective nearest events (trees). The index is defined as
  l
where  = Pielou’s index of non-randomness
l = average density of events per unit area
 = mean squared distance between randomly chosen points to their nearest
neighbors. For randomly distributed population, it is E ( )  1 .
l
For observed distances, it is calculated as
1 n 2
   ri
n i 1
(ri is the distance from the ith point to its nn)
* Pielou, E.C. 1959. The use of point-to-plant distances in the study of the pattern of plant populations.
Journal of Ecology 47:607-613.
12
Test statistics for Pielou’s index
  l
It can be shown that 2n ~ c22n. (Sketch of the derivation: Following the Weibull
distribution on p.7, it is easy to show that  has an exponential distribution:
f() = le- l = e- ( is the density per unit circle). Then the sum of ’s
follows a gamma distribution of which c2 is a special case.)
E ( ) 
Thus,
n 1 1
n l
(Unbiased estimator)
E (2n )  E ( 2nl )  2nlE ( )  2( n  1)
Test for the hypothesis of random pattern:
1.
p-value = p(c22n > 2n) for testing aggregated pattern of distribution. Large
2n value has small p-value, evidence against H0, suggesting aggregated
patterns.
2.
p-value = p(c22n < 2n) for testing regularity. Small 2n value leads to small
p-value, evidence to suggest regular patterns.
13
Hopkins and Skellam’s coefficient of aggregation
This test is based on the assumption that a population is randomly distributed if the
distribution of distances from a random point to its nearest neighbor is identical to
the distribution of distances from a random plant to its nearest neighbor. The index
is defined as the ratio of the sum of the squared distances from point-to-plant (1)
to the sum of the squared distances from plant-to-plant (2):
A
 1
2
A = 1 for a randomly distributed population
A > 1 for an aggregated population
A < 1 for a regular population.
To test whether A departs significantly from its expectation of 1, the sampling
distribution for the following statistic is derived:
x
A
u
 1


1 A
uv
 1    2
14the type of
Hopkins, B. (with an appendix by Skellam, J.G.) 1954. A new method for determining
distribustion of plant individuals. Ann. Bot., London, N.S. 18:213.
x ~ Beta distribution
It is not difficult to show that x follows a beta distribution.
A
u
 1
x


1 A
uv
 1    2
That is
where
f ( x) 
( n)
u n1e u
 ( n ) ( n )
.
 ( 2n)
The mean and variance of the beta distribution are:
V ( x) 
f (u ) 
n
1
x n 1 (1  x) n 1.
B(n, n)
B(n, n) 
E ( x) 
Note (same for v):


 0.5
Standard beta distribution:
f ( x) 
1
x 1 (1  x)  1
B( ,  )

1

(   ) 2 (    1) 4(2n  1)
15
Test for x
x
A
u
 1


1 A
uv
 1    2
x = 0.5 is for random distribution
x > 0.5 is for aggregated distribution
x < 0.5 is for regular distribution
For a large sample size n, x tends towards normality. We have
z  2( x  0.5) 2n  1 ~ N (0,1).
Therefore, a statistical decision can be made based on the size of p-value:
1.
p-value = p(z > zobs) for testing aggregated pattern of distribution. Large zobs
value has small p-value, suggesting an aggregated pattern.
2.
p-value = p(z < zobs) for testing regularity. Small zobs value leads to small pvalue, evidence for a regular pattern.
16
Spatial relationships between two species
Segregated species
0.4
0.6
0.2
0.2
0.4
Random pattern
0.6
0.8
0.8
1.0
1.0
Unsegregated species
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
Aggregated pattern
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
0.0
0.2
0.4
0.6
0.8
17
Index of species segregation
Segregation is the degree to which the individuals of two (or more) species tend
to separate from one another. We have learned that quadrat counts could be
used to test the association of two species, but the results are strongly influenced
by quadrat size. An alternative approach which overcomes this problem is
based on distance sampling.
Assume there are two species, we randomly select an individual plant and locate
its nearest neighbor and then record the species type. This process is repeated N
times. The data can be summarized in a contingency table similar to the one for
the quadrat counts.
Nearest neighbor
Base Species
Species A
Species B
Species A
a
b
m = a+b
Species B
c
d
n = c+d
r = a+c
s = b+d
N
18
Index of segregation (Kappa statistic)
Cohen (1960):
2
Pielou (1961):
 
 1
1
 
N
2
where
N (b  c )
2
N  xii   x1
i 1
i 1
2
N 2   xi  x i
i 1
N  ( mr  ns )
2
 1 (1  1 ) 2(1  1 )( 21 2   3 ) (1  1 )2 ( 4  4 22 ) 



3
4
 (1   )2 

(
1


)
(
1


)
2
2
2


xii
i 1 N
2 x
x  x i
 3   ii i 
i 1 N
N
2
1  
xi  x i
2
i 1 N
2 2 x
x  x j 2
 4    ij ( i 
)
N
N
i 1 j 1
2
2  
Note: With a large
sample size,  ~ N(0,1)
Nearest neighbor
Base Species
Species A
Species B
Species A
a (x11)
b (x12)
m = a+b (x1+)
Species B
c (x21)
d (x22)
n = c+d (x2+)
r = a+c (x+1)
s = b+d (x+2)
N
* Pielou, E.C. 1961. Segregation and symmetry in two-species population as studied by nearest-neighbor
19
relationships. Journal of Ecology 49:255-269
What we have learned?
•
•
•
•
•
The concept of nearest neighbor distances
Tree-to-tree (event-to-event) distances (Clark & Evans 1954)
Point-to-tree distances (Pielou 1959)
Hopkins and Skellam’s index of aggregation (Hopkins 1954)
Index of species aggregation (kappa statistics)
20