GG 313 Lecture 21, Nov. 12, 2005
Beginning Chapter 5: Sequences and Time Series Analysis

Sequences and Markov Chains

Ordered Sequences

Many geological problems have ordered sequences for data. In such data the time of occurrence or the location of each observation is important, as opposed to sequences where the time or location does not matter. Many such sequences come as pairs of a time and an observation, as with earthquake occurrences, volcanic eruptions, seismometer data, magnetic data, temperatures, etc. Other data have an observation tied to a location, such as bathymetric data and any map information. These data are often called TIME SERIES, whether the independent variable is a location or time.

Analysis of sequential data is aimed at addressing questions such as:
1) Are the data random, or do they contain a pattern or trend?
2) If there is a trend, what form does it have?
3) Are there any periodicities in the data?
4) What can be estimated or predicted from the data?

Methods for comparing two or more sequences fall broadly into two classes. In the first, the exact location of an event matters. Good examples of this class are X-ray diffraction data and mass spectrometer data: two peaks at different locations are not related, and each peak position gives us specific information. The second class compares sequences in which absolute location is not important, such as different types of earthquakes and other characteristic events. These sequences are compared by cross-correlation, which we will cover beginning in the next class.

Some data form sequences that do not fit well into the time series classification; a good example is stratigraphic sequences. It is not a simple matter to relate a particular layer or sequence of layers to time, since compaction and variations in deposition rate prevent a simple relationship between layer thickness and time. Figure 5.1 shows a stratigraphic sequence.
If we sample this sequence 62 times, we can record 61 transitions from one rock type to the next (including "transitions" in which the rock type stays the same). In this example there are four rock types: (A) sandstone, (B) limestone, (C) shale, and (D) coal. We would like to know whether some transitions are favored over random transitions. For example, is a change from limestone to sandstone more likely than one from limestone to coal? Since there are four states, or rock types, there are 16 possible types of transitions:

AA  AB  AC  AD
BA  BB  BC  BD
CA  CB  CC  CD
DA  DB  DC  DD

By counting the change from observation n to observation n+1 for the sequence in Figure 5.1, we can set up a transition frequency matrix (rows give the state at observation n, columns the state at observation n+1):

From/To      A    B    C    D  Row total
A           17    0    5    0         22
B            0    5    2    0          7
C            5    2   17    3         27
D            0    0    3    2          5
Col total   22    7   27    5         61

Thus, there are 17 cases where A remains A from one measurement to the next and 5 cases where A changes to C, but no cases where A changes to D. Is it necessary that there be as many cases of A changing to C as of C changing to A; that is, does the matrix need to be symmetric? In this example it happens to be symmetric, but in general it need not be: a sequence that cycles A→B→C→A contains the transitions A→B, B→C, and C→A, but none in the reverse direction.

We can more easily quantify the tendency to change from one state to another by converting the counts to fractions, or probabilities. Dividing each row by its row total gives the transition probability matrix, in which each row sums to 1.0:

From/To       A     B     C     D
A          0.77  0.00  0.23  0.00
B          0.00  0.71  0.29  0.00
C          0.19  0.07  0.63  0.11
D          0.00  0.00  0.60  0.40

The probability of D changing to C is 0.6. If we divide the row totals of the transition frequency matrix by the total number of transitions, we get the probability of each state, in other words the proportion of each lithology. This is called the marginal or fixed probability vector:

$$ f = \begin{bmatrix} 0.36 & 0.12 & 0.44 & 0.08 \end{bmatrix} \qquad (5.1) $$

We can show these probabilities in a cyclic diagram of the transitions between lithologies. A similar diagram could be drawn for the hydrologic cycle and similar phenomena.
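The counting and normalizing steps above are simple enough to sketch in plain Python. This is an illustrative sketch, not code from the course: the function names are hypothetical, lithologies are assumed to be coded as single letters, and the short log passed to count_transitions is made up because the Figure 5.1 sequence is not reproduced here. The frequency matrix F is the one tabulated above.

```python
STATES = ["A", "B", "C", "D"]  # sandstone, limestone, shale, coal


def count_transitions(sequence, states):
    """Tally transitions from observation n to observation n+1 (rows = from, columns = to)."""
    index = {s: k for k, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for current, following in zip(sequence, sequence[1:]):
        counts[index[current]][index[following]] += 1
    return counts


def transition_probabilities(counts):
    """Divide each row by its row total so every row sums to 1 (assumes no empty rows)."""
    return [[c / sum(row) for c in row] for row in counts]


def fixed_probability_vector(counts):
    """Row totals divided by the total number of transitions."""
    total = sum(sum(row) for row in counts)
    return [sum(row) / total for row in counts]


# Transition frequency matrix from the text.
F = [
    [17, 0, 5, 0],
    [0, 5, 2, 0],
    [5, 2, 17, 3],
    [0, 0, 3, 2],
]

P = transition_probabilities(F)
f = fixed_probability_vector(F)
for state, row in zip(STATES, P):
    print(state, [round(p, 2) for p in row])
print("f =", [round(x, 3) for x in f])

# Hypothetical short log (not Figure 5.1), only to exercise count_transitions.
print(count_transitions("AACCDCCBBA", STATES))
```

The printed rows reproduce the transition probability matrix to two decimals, and f comes out as roughly [0.361, 0.115, 0.443, 0.082].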
Recall from Chapter 2 that the probability that two events A and B both occur (the joint probability) equals the probability of B given that A has occurred times the probability of A:

$$ P(A, B) = P(B \mid A)\,P(A) \qquad (5.2) $$

which we rearrange to:

$$ P(B \mid A) = \frac{P(A, B)}{P(A)} \qquad (5.3) $$

In our example, $P(B \mid A)$ is the probability that B occurs next, given that the current state is A. If all events are independent, that is, if no event depends on the previous event, then:

$$ P(B \mid A) = P(B \mid \text{anything}) = P(B) \qquad (5.4) $$

If all events are independent, we can make a matrix of the probability of each transition, given the relative abundance of each state. Every row is the same, each row sums to 1, and the numbers in each row are the fixed probability vector (5.1):

From/To       A     B     C     D
A (SS)     0.36  0.12  0.44  0.08
B (LS)     0.36  0.12  0.44  0.08
C (SH)     0.36  0.12  0.44  0.08
D (CO)     0.36  0.12  0.44  0.08

These are the EXPECTED transition probabilities if each transition from one lithology to another is independent. We can now test whether the observed transitions are random (independent) or not. Our hypothesis is that the transitions are not random, and our null hypothesis is that the transitions occur randomly. We first convert the probability matrix above back to frequencies by multiplying each row by the corresponding row total from the transition frequency matrix (for example, 22 × 0.36 ≈ 7.9). The expected transition frequency matrix (5.6) is:

From/To       A     B     C     D  Row total
A           7.9   2.6   9.7   1.8         22
B           2.5   0.8   3.1   0.6          7
C           9.7   3.2  11.9   2.2         27
D           1.8   0.6   2.2   0.4          5

and we test with:

$$ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \qquad (5.7) $$

The $O_i$ are the observed frequencies from the transition frequency matrix, the $E_i$ are the expected frequencies from (5.6), and the sum runs over all n = 16 cells. The degrees of freedom are $\nu = (m-1)(m-1)$, where m = 4 in this example. One degree of freedom is lost from each row and each column because both sets of totals are fixed by the data: each row of (5.6) is scaled to its observed row total, and because the fixed probability vector was itself estimated from the data, each column of (5.6) also reproduces the observed column total (up to rounding).
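A plain-Python sketch of the expected frequencies (5.6) and the statistic (5.7), using the fixed probability vector exactly as printed in (5.1). The function names are hypothetical; the check at the end counts how many expected frequencies fall below 5, which bears on whether the test is valid.

```python
# Observed transition frequencies (rows = from, columns = to) and fixed vector (5.1).
F = [
    [17, 0, 5, 0],
    [0, 5, 2, 0],
    [5, 2, 17, 3],
    [0, 0, 3, 2],
]
f = [0.36, 0.12, 0.44, 0.08]


def expected_frequencies(counts, fixed):
    """Under independence every row is the fixed vector scaled by that row's total."""
    return [[sum(row) * p for p in fixed] for row in counts]


def chi_square(observed, expected):
    """Eqn (5.7): sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


E = expected_frequencies(F, f)
for row in E:
    print([round(e, 1) for e in row])

m = len(F)
dof = (m - 1) * (m - 1)
small = sum(e < 5 for row in E for e in row)
print("degrees of freedom:", dof, "| cells with E < 5:", small)
```

Because 12 of the 16 expected frequencies are below 5, applying chi_square(F, E) to the raw matrix would not give a valid test; categories need to be combined first.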
The $\chi^2$ test is only valid if the expected values are greater than about 5; otherwise the error is too large for a valid test. We can get around this by combining categories to raise the expected values. Since we are only testing for independence, this is acceptable. Four transitions already have expected values above 5 (AA, AC, CA, and CC). We combine the others into three pooled categories (all of row B, all of row D, and the remaining cells AB, AD, CB, and CD), giving 7 categories in total. $\chi^2$ is then:

$$ \chi^2 = \frac{(17-7.9)^2}{7.9} + \frac{(5-9.7)^2}{9.7} + \frac{(5-9.7)^2}{9.7} + \frac{(17-11.9)^2}{11.9} + \frac{(7-7.0)^2}{7.0} + \frac{(5-5.0)^2}{5.0} + \frac{(5-9.8)^2}{9.8} \approx 19.6 \qquad (5.8) $$

We haven't lost any degrees of freedom by combining categories, so $\nu = (m-1)(m-1) = 9$, and the critical $\chi^2$ value from the tables is 16.92 at 95% confidence. Since the observed value (5.8) exceeds the critical value, we can reject the null hypothesis that the transitions are independent: there is a statistical dependence in the transitions from one lithology to the next. Geologically, this is to be expected, since limestones most often form in deep-water depositional environments, while coals form subaerially.

Sequences that are partially (statistically) dependent on the previous state are called Markov chains. Sequences that are completely determined by the previous value are called deterministic; for example, the sequence [0 1 2 3 4 5 6] is deterministic, or fully predictable. We can also have sequences that are completely random. Markov chains in which values depend only on the previous value are said to have first-order Markov properties. Those that also depend on the value before the previous one have second-order Markov properties, and so on.

We can use the transition probability matrix to predict the likely lithology 2 feet above a point, for example to fill in a missing sample with its most probable value. If we have B (limestone) at some depth, what is the most likely lithology 2 feet (two steps) above?
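The grouped test in (5.8) can be checked with the sketch below. The assignment of cells to the seven categories (AA, AC, CA, CC, the whole B row, the whole D row, and the pooled cells AB, AD, CB, CD) is inferred from the observed and expected totals in (5.8), and the critical value is the tabulated 16.92 quoted in the text rather than something the code computes.

```python
# Observed frequencies and expected frequencies (5.6); rows = from, columns = to.
O = [
    [17, 0, 5, 0],
    [0, 5, 2, 0],
    [5, 2, 17, 3],
    [0, 0, 3, 2],
]
E = [
    [7.9, 2.6, 9.7, 1.8],
    [2.5, 0.8, 3.1, 0.6],
    [9.7, 3.2, 11.9, 2.2],
    [1.8, 0.6, 2.2, 0.4],
]

A, B, C, D = range(4)
# Each category is a list of (from, to) cells whose counts are pooled together.
categories = [
    [(A, A)], [(A, C)], [(C, A)], [(C, C)],
    [(B, j) for j in range(4)],        # whole B row
    [(D, j) for j in range(4)],        # whole D row
    [(A, B), (A, D), (C, B), (C, D)],  # remaining small cells
]

chi2 = 0.0
for cells in categories:
    o = sum(O[i][j] for i, j in cells)
    e = sum(E[i][j] for i, j in cells)
    chi2 += (o - e) ** 2 / e

CRITICAL_95_DOF9 = 16.92  # chi-square table value quoted in the text
print(round(chi2, 2), chi2 > CRITICAL_95_DOF9)
```

It prints a statistic of about 19.57 and True, i.e., the observed value exceeds the critical value. Every one of the 16 cells lands in exactly one category, so no observations are dropped by the pooling.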
For a single step from B (limestone), row B of the transition probability matrix gives:

BA (SS): 0%
BB (LS): 71%
BC (SH): 29%
BD (CO): 0%

So from B we can only go to B or to C. If the transition is to C, the probabilities of the NEXT transition come from row C:

CA (SS): 19%
CB (LS): 7%
CC (SH): 63%
CD (CO): 11%

The probabilities of the two ways of returning to limestone are therefore:

$$ P(BCB) = P(BC)\,P(CB) = 0.29 \times 0.07 \approx 0.02 $$

$$ P(BBB) = P(BB)\,P(BB) = 0.71 \times 0.71 \approx 0.50 $$

so

$$ P(B?B) = P(BBB) + P(BCB) \approx 0.50 + 0.02 = 0.52 $$

Shale is the next most likely outcome, with $P(B?C) = P(BB)\,P(BC) + P(BC)\,P(CC) = 0.71 \times 0.29 + 0.29 \times 0.63 \approx 0.39$ (sandstone and coal are each below 0.06), so limestone is the most likely lithology two feet above.

This process gets more complex as we ask for predictions further up the chain, but there is an easy way: squaring the transition probability matrix gives the two-step transition probabilities, cubing it gives the three-step probabilities, and so on. Note that these are multi-step predictions from a first-order chain; they are not the same thing as the higher-order Markov properties defined above.

IN CLASS: what is the probability of having a shale three feet above a limestone in this example?

Embedded Markov chains

The choice of sampling interval is arbitrary, yet important. If we sample too sparsely, we will likely miss some beds completely. If we sample too closely, the diagonal elements of the probability matrix will approach 1 and the off-diagonal elements will approach zero. What if we sample only at points of "real" transitions, and ignore points where the two states are the same? In this case, the transition frequency matrix has zeros along the diagonal.

In this example we have five lithologies: A (Ms, mudstone), B (Sh, shale), C (siltstone), D (Ss, sandstone), and E (Co, coal). The fixed probability vector is found by dividing the row totals by the grand total:

$$ f = \begin{bmatrix} 0.30 & 0.09 & 0.24 & 0.20 & 0.17 \end{bmatrix} $$

To test the Markov properties we would like to use a $\chi^2$ test, but we cannot use the fixed vector to estimate the expected transition frequency matrix, because the expected diagonal terms would be nonzero while the observed diagonal terms are zero by construction. If we did NOT ignore the repeated states, the frequency matrix would have the same off-diagonal numbers; only the diagonal would differ.
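The matrix-power shortcut can be sketched in plain Python as below, assuming the chain is first order and using the rounded transition probabilities from the table. The helpers matmul and n_step are illustrative, not library functions; calling n_step with n = 3 gives the shale-above-limestone probability asked for in class.

```python
# Rounded transition probability matrix (rows = from, columns = to).
P = [
    [0.77, 0.00, 0.23, 0.00],  # A sandstone
    [0.00, 0.71, 0.29, 0.00],  # B limestone
    [0.19, 0.07, 0.63, 0.11],  # C shale
    [0.00, 0.00, 0.60, 0.40],  # D coal
]
A, B, C, D = range(4)


def matmul(X, Y):
    """Plain-Python matrix product of X and Y."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def n_step(P, n):
    """n-step transition probabilities P^n of a first-order chain (n >= 1)."""
    result = P
    for _ in range(n - 1):
        result = matmul(result, P)
    return result


P2 = n_step(P, 2)
print("two steps above B:", [round(p, 3) for p in P2[B]])
print("P(B?B) =", round(P2[B][B], 3))
```

Row B of the squared matrix sums to 1 and its largest entry is P(B?B), about 0.52, agreeing with the hand calculation; n_step(P, 3)[B][C] answers the in-class question.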
If we raised this matrix to a higher power, we could then discard the diagonal terms, rescale the off-diagonal terms so each row sums to 1, and get our results. Since we don't know the number of repeated states, we find the diagonal terms by trial and error, iterating (trying over and over) as follows:

1) Put arbitrary large estimates (like 1000) into the diagonal positions of the observed matrix.
2) Divide the row totals by the grand total to get the diagonal probabilities.
3) Calculate new diagonal estimates by multiplying the diagonal probabilities from step 2 by the latest row totals.
4) Repeat steps 2 and 3 until the diagonal elements stop changing, typically after about 10 to 20 iterations.

For our comparison, we test against independent states, for which the probability that state j follows state i is $P(ij) = P(i)\,P(j)$. From these products we construct our expected probability matrix, Pc. We now zero out the diagonal elements and use the off-diagonal counts to calculate the $\chi^2$ value for our data from eqn (5.7). We get $\chi^2 = 172$, which is much larger than the critical value (about 19.7 at 95% confidence) for $\nu = (m-1)^2 - m = 11$ degrees of freedom; the extra m degrees of freedom are lost because the m diagonal cells are estimated rather than observed. This indicates a strong dependence in the transitions, i.e., a strong first-order Markov character.
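The iteration in steps 1 to 4, followed by the off-diagonal $\chi^2$ test, might be sketched as follows. The 4 × 4 count matrix is hypothetical (the five-lithology matrix from the example is not reproduced here), the tolerance and iteration cap are arbitrary choices, and the function names are illustrative only.

```python
# Hypothetical embedded-chain counts for four lithologies (NOT the example in the text).
# Diagonal entries are zero because repeated states are not recorded.
counts = [
    [0, 8, 3, 1],
    [6, 0, 9, 2],
    [4, 7, 0, 5],
    [2, 2, 6, 0],
]


def estimate_diagonal(offdiag, start=1000.0, tol=1e-9, max_iter=1000):
    """Steps 1-4: iterate diagonal estimates until they stop changing."""
    m = len(offdiag)
    row_off = [sum(row) for row in offdiag]  # observed row totals (diagonal is zero)
    diag = [start] * m  # step 1: arbitrary large starting values
    for iteration in range(1, max_iter + 1):
        rows = [row_off[i] + diag[i] for i in range(m)]
        total = sum(rows)
        probs = [r / total for r in rows]  # step 2: row totals / grand total
        new_diag = [probs[i] * rows[i] for i in range(m)]  # step 3
        if max(abs(a - b) for a, b in zip(new_diag, diag)) < tol:
            return new_diag, iteration
        diag = new_diag  # step 4: repeat
    raise RuntimeError("diagonal estimates did not converge")


def embedded_chi_square(offdiag):
    """Chi-square (5.7) over off-diagonal cells, with E_ij = N * P(i) * P(j)."""
    m = len(offdiag)
    diag, _ = estimate_diagonal(offdiag)
    rows = [sum(offdiag[i]) + diag[i] for i in range(m)]
    total = sum(rows)
    p = [r / total for r in rows]
    chi2 = 0.0
    for i in range(m):
        for j in range(m):
            if i == j:
                continue  # diagonal zeroed out: only real transitions are compared
            expected = total * p[i] * p[j]
            chi2 += (offdiag[i][j] - expected) ** 2 / expected
    dof = (m - 1) ** 2 - m
    return chi2, dof


diag, iterations = estimate_diagonal(counts)
print("diagonal estimates:", [round(d, 2) for d in diag], "after", iterations, "iterations")
chi2, dof = embedded_chi_square(counts)
print("chi-square =", round(chi2, 2), "with", dof, "degrees of freedom")
```

At convergence each diagonal estimate satisfies d_i = r_i^2 / N (row total squared over grand total), which makes the expected off-diagonal counts in each row add back up to the observed row totals, so the comparison uses the same number of transitions as the data.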