GG 313 beginning Chapter 5
Sequences and Time Series Analysis
Sequences and Markov Chains
Lecture 21
Nov. 12, 2005
Ordered Sequences
Many geological problems have ordered sequences of data. Such data have
observations whose time of occurrence or location is important, as opposed
to sequences where the time or location is not important.
Many such sequences occur in pairs, with a time and an observation, such as
earthquake occurrences, volcanic eruptions, seismometer data, magnetic
data, temperatures, etc. Other data may have an observation tied to a
location, such as bathymetric data and any map information. These data are
often called TIME SERIES, whether the independent variable is a location or
a time.
Analysis of sequential data is aimed at addressing
questions such as:
1) Are the data random, or do they contain a pattern or
trend?
2) If there is a trend, what form does it have?
3) Are there any periodicities in the data?
4) What can be estimated or predicted from the data?
Methods for comparing two or more sequences are
broadly grouped into two classes. In the first, the exact
location of an event matters. Good examples of this class
are X-ray diffraction data and mass spectrometer data.
Two peaks at different locations are not related and give
us specific information.
The second class compares sequences where absolute
location is not important, such as different types of
earthquakes and other characteristic events. These
sequences are compared by cross-correlation, and we will
cover them beginning in the next class.
Some data form sequences that do not fit well into the time series
classification; a good example is stratigraphic sequences.
It is not a simple matter to relate a particular layer or
sequence of layers to time, since compaction and
variations in deposition rate do not allow a simple
relationship between layer thickness and time.
In Figure 5.1, a stratigraphic sequence is shown. If we
sample this sequence 62 times, we can record 61
transitions from one type of rock to another.
In this example, we have four different rock types, (A)
sandstone, (B) limestone, (C) shale, and (D) coal. We
would like to know if there are some transitions that are
favored over random transitions. For example, is a change from limestone to
sandstone more likely than a change from limestone to coal? Since there are
four different states, or rock types, there are 16 possible types of
transitions:
AA   AB   AC   AD
BA   BB   BC   BD
CA   CB   CC   CD
DA   DB   DC   DD
By looking at the change from observation n to observation n+1 for the
sequence in Figure 5.1, we can set up a transition frequency matrix:

From/To      A     B     C     D   Row total
   A        17     0     5     0      22
   B         0     5     2     0       7
   C         5     2    17     3      27
   D         0     0     3     2       5
Column
total       22     7    27     5      61
Thus, there are 17 cases where A remains A from one
measurement to the next, and 5 cases where A changes to
C, but no cases where A changes to D.
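As a sketch of how such a matrix can be built, the following Python
fragment counts transitions in an ordered sequence of lithology codes. The
sequence shown is a made-up illustration, not the actual Figure 5.1 data,
and numpy is assumed to be available.

    import numpy as np

    states = ["A", "B", "C", "D"]              # sandstone, limestone, shale, coal
    sequence = list("AAACCCABBBCCCDDCAACCC")   # hypothetical ordered observations

    index = {s: i for i, s in enumerate(states)}
    freq = np.zeros((len(states), len(states)), dtype=int)

    # Each pair (observation n, observation n+1) adds one count to the matrix.
    for current, following in zip(sequence[:-1], sequence[1:]):
        freq[index[current], index[following]] += 1

    print(freq)          # rows = "from" state, columns = "to" state
    print(freq.sum())    # total transitions = number of observations - 1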
Is it necessary that there be as many cases of A changing
to C as there are C changing to A? Does the matrix need
to be symmetric? Paul says it should be.
We can more easily quantify the tendency to change from one state to
another by converting the numbers in the above matrix to fractions, or
probabilities, making a transition probability matrix: divide each row by
its row total, so that each row sums to 1.0:
From/To      A      B      C      D
   A        0.77   0      0.23   0
   B        0      0.71   0.29   0
   C        0.19   0.07   0.63   0.11
   D        0      0      0.60   0.40

The probability of D changing to C is 0.6.
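A minimal sketch of this row normalization in Python (assuming numpy),
using the frequency counts tabulated above:

    import numpy as np

    freq = np.array([[17, 0,  5, 0],
                     [ 0, 5,  2, 0],
                     [ 5, 2, 17, 3],
                     [ 0, 0,  3, 2]], dtype=float)

    row_totals = freq.sum(axis=1)
    P = freq / row_totals[:, None]   # each row divided by its row total

    print(np.round(P, 2))
    # [[0.77 0.   0.23 0.  ]
    #  [0.   0.71 0.29 0.  ]
    #  [0.19 0.07 0.63 0.11]
    #  [0.   0.   0.6  0.4 ]]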

If we divide the row totals in the transition frequency matrix
by the total number of transitions, we get the probability of
each state, in other words, we get the proportions of each
lithology, called the marginal or fixed probability vector:
f = [0.36  0.12  0.44  0.08]                                     (5.1)
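The same idea as a short sketch (again assuming numpy); the row totals come
from the transition frequency matrix above:

    import numpy as np

    # Fixed (marginal) probability vector: row totals divided by the grand total.
    row_totals = np.array([22, 7, 27, 5], dtype=float)   # A, B, C, D
    f = row_totals / row_totals.sum()
    print(f)   # proportions of each lithology, cf. eqn (5.1)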
We can describe these properties in a cyclic diagram to
show the probabilities involved.
A similar diagram could be drawn for the hydrologic cycle
and similar phenomena.
Recall from Chapter 2 that the probability that two events (A and B) both
occur (the joint probability) equals the probability of B given that A has
occurred times the probability of A occurring:

P(A, B) = P(B | A) P(A)                                          (5.2)

which we rearrange to:

P(B | A) = P(A, B) / P(A)                                        (5.3)
In our example, this is the probability that B occurs after A. If all
events are independent, that is, the probability distribution is uniform
and no event depends on the previous event, then:

P(B | A) = P(B | anything) = P(B)                                (5.4)
If all events are independent, then we can make a matrix
showing what the probabilities of each transition are, given
the relative abundance of each state. Each row will be the
same and the sum of the probabilities in each row will equal
1. The numbers in each row are the fixed probability vector
(5.1):
             A      B      C      D
A (SS)      0.36   0.12   0.44   0.08
B (LS)      0.36   0.12   0.44   0.08
C (SH)      0.36   0.12   0.44   0.08
D (CO)      0.36   0.12   0.44   0.08
These are the EXPECTED transition probabilities given that each transition
from one lithology to another is independent.
We are now in a position to test whether the observed transitions are
random (independent) or not. Our hypothesis is that the transitions are not
random, and our null hypothesis is that the transitions occur randomly. We
first change our probability matrix above back to frequencies by
multiplying each row by the corresponding row total from the transition
frequency matrix:
22
 0.36

 
7

 0.36

27  0.36

 
5

 0.36
0.12
0.12
0.12
0.12
0.44
0.44
0.44
0.44
0.08 7.9
 
0.08 2.5

0.08 9.7
 
0.08 1.8
2.6 9.7 1.8 

0.8 3.1 0.6
3.2 11.9 2.2

0.6 2.2 0.4
(5.6)
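As a sketch (assuming numpy), the expected frequency matrix of (5.6) is the
outer product of the row totals with the fixed probability vector:

    import numpy as np

    # Expected transition frequencies under independence:
    # E[i, j] = (row total of state i) * (fixed probability of state j).
    row_totals = np.array([22, 7, 27, 5], dtype=float)
    f = np.array([0.36, 0.12, 0.44, 0.08])      # fixed probability vector (5.1)

    E = np.outer(row_totals, f)
    print(np.round(E, 1))
    # [[ 7.9  2.6  9.7  1.8]
    #  [ 2.5  0.8  3.1  0.6]
    #  [ 9.7  3.2 11.9  2.2]
    #  [ 1.8  0.6  2.2  0.4]]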
and test with:

χ² = Σ (Oi - Ei)² / Ei ,   summed over i = 1, ..., n             (5.7)
The Oi's are from the data frequency matrix and the Ei's are from the
predicted frequency matrix (5.6). The degrees of freedom, ν, are
(m-1)*(m-1), where m = 4 in this example. One degree of freedom is lost
from each row and column because "all must add up to 1". I see where the
rows add up to 1, but where do the columns add to 1?
χ² is only valid if the expected values are greater than 5; otherwise the
error is too large for a valid test. We can get around this by combining
some categories to raise the expected values. Since we are only testing for
independence, this is OK. Some of the expected values are larger than 5
anyhow (AA, AC, CA, and CC), and we combine the others to form a total of 7
new categories:
7.9

2.5
9.7

1.8
2.6 9.7 1.8 

0.8 3.1 0.6
3.2 11.9 2.2

0.6 2.2 0.4
And 2 thus is:
2 17  7.9 
2
 
7.9
5  9.7


2
9.7
5  9.7


2
9.7

(5.8)
17 11.9  7  7.0  5  5.0  5  9.8  19.7
11.9
7.0
5.0
9.8
2
2
2
2
We haven't lost any degrees of freedom by combining categories, so
ν = (m-1)*(m-1) = 9, and the critical χ² value from the tables is 16.92 at
95% confidence. Since the critical value is smaller than the observed value
(5.8), we can reject the null hypothesis that the transitions are
independent; thus there is a statistical dependence in the transitions from
one lithology to the next.
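A hedged sketch of this test in Python, assuming numpy and scipy are
available; the observed and expected values for the seven combined
categories are taken from eqn (5.8), and the exact statistic depends
slightly on how much the expected values were rounded.

    import numpy as np
    from scipy.stats import chi2

    # Chi-squared test of eqn (5.7) on the seven combined categories of eqn (5.8).
    observed = np.array([17, 5, 5, 17, 7, 5, 5], dtype=float)
    expected = np.array([7.9, 9.7, 9.7, 11.9, 7.0, 5.0, 9.8])

    chi_sq = np.sum((observed - expected) ** 2 / expected)   # ~19.6-19.7
    dof = (4 - 1) * (4 - 1)                                  # nu = (m-1)*(m-1), m = 4
    critical = chi2.ppf(0.95, dof)                           # 16.92 at 95% confidence

    print(round(chi_sq, 1), round(critical, 2))
    if chi_sq > critical:
        print("Reject the null hypothesis: transitions are not independent.")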
Geologically, this is to be expected, since limestones most often form in
deep-water depositional environments, while coals form subaerially.
Sequences that are partially (statistically) dependent on the previous
state are called Markov chains. Sequences that are completely determined by
the previous value are called deterministic. For example, the sequence
[ 0 1 2 3 4 5 6 ] is deterministic, or fully predictable. We can also have
sequences that are completely random.
Markov chains whose values depend only on the previous value are said to
have first-order Markov properties. Those that also depend on the value
before that have 2nd-order Markov properties, etc.
We can use the transition probability matrix to predict the
likely lithology 2 feet above a point. This might be
necessary to fill in a missing point, for example, with the
most probable value.
For example if we have B (limestone) at some depth, what
is the most likely lithology 2 feet (two steps) above?
For a single step, starting from B (limestone), the probabilities come from
row B of the transition probability matrix:

From/To      A      B      C      D
   A        0.77   0      0.23   0
   B        0      0.71   0.29   0
   C        0.19   0.07   0.63   0.11
   D        0      0      0.60   0.40

BA (SS): 0%
BB (LS): 71%
BC (SH): 29%
BD (CO): 0%

So we can only get from B to B or B to C. If the transition is to C, then
the probabilities of the NEXT transition come from row C:

CA (SS): 19%
CB (LS): 7%
CC (SH): 63%
CD (CO): 11%
We can see that the probability of each possibility is:
P(BC)•P(CB) = 0.29•0.07 = 2%
P(BB)•P(BB) = 0.71•0.71 = 50%, so
P(B?B) = P(BBB) + P(BCB) = 50% + 2% = 52%
This process gets more complex as we ask for predictions higher in the
chain (higher-order Markov properties), but there is an easy way: we just
square the probability matrix to get the 2nd-order Markov properties, and
cube it to get the 3rd order, etc.
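A short sketch of this matrix-power shortcut (assuming numpy), using the
transition probability matrix from this example:

    import numpy as np

    P = np.array([[0.77, 0.00, 0.23, 0.00],
                  [0.00, 0.71, 0.29, 0.00],
                  [0.19, 0.07, 0.63, 0.11],
                  [0.00, 0.00, 0.60, 0.40]])
    states = ["A", "B", "C", "D"]

    P2 = np.linalg.matrix_power(P, 2)    # two-step (2 ft) transition probabilities
    b = states.index("B")
    print(np.round(P2[b], 2))            # P(B -> A, B, C, D) two steps up; B -> B is 0.52

    # np.linalg.matrix_power(P, 3) gives the three-step probabilities
    # needed for the in-class question below.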
IN CLASS: what is the probability of having a shale three
feet above a limestone in this example?
Embedded Markov chains:
The choice of sampling interval is arbitrary and important. If
we sample too sparsely, we will likely miss information
completely. If we sample too closely, then the diagonal
elements of the probability matrix will approach 1 and the
off-diagonal elements will approach zero.
What if we only sample at points of “real” transitions, and
ignore points where the two states are the same?
In this case, the transition frequency matrix will have zeroes along the
diagonal.
In this example we have five lithologies: A (Ms, mudstone), B (Sh, shale),
C (siltstone), D (Ss, sandstone), and E (Co, coal).
The fixed probability vector is found by dividing the row totals
by the grand total:
f=[0.3 0.09 0.24 0.2 0.17]
To test the Markov properties, we would like to do a χ² test, but we cannot
use the fixed vector to estimate the transition frequency matrix, because
the diagonal terms would be non-zero.
If we did NOT ignore the repeated states, then the frequency matrix would
have identical numbers except along the diagonal. If we raise this matrix
to a higher power, then we could discard the diagonal terms, adjust the
off-diagonal terms to sum to 1, and get our results. Since we don't know
the number of repeated states, we look for the diagonal terms by trial and
error.
We iterate (try over and over) to find these terms as follows (a short code
sketch follows the list):
1) Put arbitrarily large estimates (like 1000) into the diagonal positions
   of the observation matrix.
2) Divide the row totals by the grand total to get the diagonal
   probabilities.
3) Calculate new diagonal estimates by multiplying the diagonal
   probabilities from step 2 by the latest row sums.
4) Repeat steps 2 and 3 until the diagonal elements remain unchanged,
   typically about 10-20 iterations.
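A minimal sketch of this iteration in Python (assuming numpy); the
off-diagonal counts below are a made-up illustration, not the lecture's
five-lithology data:

    import numpy as np

    # Off-diagonal transition counts for an embedded chain (diagonal observed as zero).
    obs = np.array([[0, 4, 10, 6, 3],
                    [5, 0,  2, 1, 1],
                    [9, 3,  0, 5, 4],
                    [5, 1,  6, 0, 3],
                    [4, 1,  3, 4, 0]], dtype=float)

    est = obs.copy()
    np.fill_diagonal(est, 1000.0)            # step 1: large arbitrary diagonal estimates

    for _ in range(50):                      # steps 2-4: iterate until stable
        row_sums = est.sum(axis=1)
        diag_prob = row_sums / row_sums.sum()    # step 2: diagonal probabilities
        new_diag = diag_prob * row_sums          # step 3: new diagonal estimates
        if np.allclose(np.diag(est), new_diag):
            break
        np.fill_diagonal(est, new_diag)

    print(np.round(np.diag(est), 1))         # estimated counts of repeated states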
For our comparison, we test against independent states, and the probability
that state j will follow state i is P(ij) = P(i)•P(j). We construct our
expected probability matrix Pc, zero out the diagonal elements, and use the
off-diagonal counts to calculate the χ² value for our data from eqn (5.7).
We get χ² = 172, which is much larger than the critical value for
ν = (m-1)² - m = 11 degrees of freedom, indicating a strong dependence in
the transitions - a strong 1st-order Markov characteristic.