Spatial Sequential Pattern Mining for Seismic Data

Riccardo Campisano, Fabio Porto, Esther Pacitti, Florent Masseglia, Eduardo Ogasawara
Oct 05th, 2016
CEFET/RJ, National Laboratory Scientific Computing

Spatial-time series

● Many different applications collect observed event data in the form of time series.
● In many of them, the observed events are related to space, resulting in even larger spatial-time series:
– Astronomical data
– Seismic data
– Climatic data (e.g. temperature changes are related to both time and latitude)
● Analyzing such data is a challenge due to:
– the large volume of data
– the difficulty of matching continuous (non-discrete) observed values [Fu, 2011]
– the complex relationship between the spatial and time dimensions [Han et al., 2007]

Real world seismic use-case

● Seismic dataset of the F3 block in the Dutch sector of the North Sea.
● Data is obtained by producing shots on the terrain surface. The shots provoke waves that are reflected by the different subsoil materials and collected by receivers placed at specific locations.
● The result is a 3D data cube of observations collected at different positions of the terrain surface. It is composed of 2D vertical sections called inlines and crosslines, which correspond to the E,Z plane and the N,Z plane in the figure, respectively.
Figure: 3D data cube with crosslines and inlines
● Each inline and crossline is composed of vertical time series.
Figure: Inline 100, time series at specific spatial positions
● One interesting analysis goal is the detection of faults and horizons, i.e., zones of major unconformities that correspond to specific geologic boundaries [Zhou, 2014].
Figure: Inline 100

Methodology

● Sequential pattern mining has been used successfully to obtain insight from large transactional databases.
● The scope of this work is the use of this technique to discover sequential patterns in seismic spatial-time series:
– an indexing technique is used to discretize the input
– an adapted algorithm is implemented to retrieve the positions of the discovered patterns
– results are presented over the original seismic trace images to better evaluate their quality

1) Discretization  2) Sequential pattern mining  3) Visualization

1) Discretization

● Time series contain continuous (non-discrete) values.
● It is not possible to find patterns by performing an exact match between the items of such sequences.
● SAX indexing [Lin et al., 2003] was applied to convert the continuous values to a discrete symbolic representation.
Figure: translated figure of Inline 100 (portion of original seismic dataset, SAX converted data)

2) Sequential pattern mining

● A sequential pattern algorithm was adapted to perform sequence mining on a spatial-time series dataset.
● It uses the Apriori principle: if a set of items is frequent, each of its subsets is frequent too.
● Candidates are therefore built level by level: itemsets of size k-1 → itemsets of size k.

3) Visualization

● For each detected frequent sequence, the algorithm provides all the positions where the sequence was encountered.
● With these positions it is possible to draw the matches of each sequence over the data, which allows a supervised evaluation of the quality of the results.

Results

● alphabet-size: 25, min-support: 70%, max-stretch: 2
– Sequence: <a,a,y,y>
– Several horizon segments detected (Inline 100)
● alphabet-size: 25, min-support: 80%, max-stretch: 10
– Sequence: <y,y,y,a,a>
– Continuous horizons detected (Inline 401)

Conclusions

● The algorithm was able to detect sequential patterns in a spatial-time series database.
● The positions of the detected frequent sequences follow some of the geologic boundaries of the subsoil.
● However, the large number of results returned by the algorithm makes it difficult to select the interesting patterns.
● Moreover, finding a sequence that is contained in all the spatial-time series is not very surprising.

Future work

● Ranking can be used to prioritize interesting results.
● Tight patterns can be used to detect faults.
Figure: <a,a,y,y> sequence positions at inline 100
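
The discretization step can be illustrated with a minimal sketch of textbook SAX: z-normalize the series, average it over equal frames (PAA), then map each frame mean to a letter using breakpoints that split the standard normal curve into equal-probability bins. This is not the authors' code. The names `sax` and `sax_breakpoints` and the `word_length` parameter are assumptions made for illustration, and the slides do not say whether the frame-averaging step was used. Passing `word_length=len(series)` keeps one symbol per sample.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
import string


def sax_breakpoints(alphabet_size):
    """Cut points splitting the standard normal curve into equal-probability bins."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax(series, word_length, alphabet_size):
    """Convert one numeric time series into a SAX word (hypothetical helper).

    Assumes 1 <= word_length <= len(series) and alphabet_size <= 26.
    """
    # 1) z-normalize so that the Gaussian breakpoints are meaningful
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2) Piecewise Aggregate Approximation: mean of each of word_length frames
    n = len(z)
    paa = []
    for i in range(word_length):
        start, end = i * n // word_length, (i + 1) * n // word_length
        paa.append(mean(z[start:end]))

    # 3) map each frame mean to a letter: lowest bin -> 'a', next -> 'b', ...
    cuts = sax_breakpoints(alphabet_size)
    return "".join(string.ascii_lowercase[bisect_right(cuts, v)] for v in paa)


if __name__ == "__main__":
    trace = [1, 2, 3, 4, 5, 6, 7, 8]
    print(sax(trace, word_length=4, alphabet_size=4))
```

With this letter mapping, an alphabet of 25 symbols runs from `a` (lowest amplitudes) to `y` (highest amplitudes), which matches the letters in the sequences reported in the Results.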
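
The level-by-level use of the Apriori principle, together with the position tracking needed for visualization, can be sketched as follows. This is a simplified illustration under stated assumptions and not the adapted algorithm from the slides. It matches only contiguous symbol runs and ignores the `max-stretch` parameter, which the slides do not define. Support is taken here to be the fraction of traces containing the pattern, and the name `mine_frequent_sequences` and its parameters are hypothetical.

```python
from collections import defaultdict


def mine_frequent_sequences(series, min_support, max_length):
    """Apriori-style search for frequent contiguous symbol sequences.

    series      -- list of symbol strings, one per spatial position (trace)
    min_support -- minimum fraction of traces that must contain a pattern
    max_length  -- longest pattern length to explore
    Returns {pattern: [(trace_index, offset), ...]} for each frequent pattern.
    """
    min_count = min_support * len(series)

    def occurrences(k, allowed):
        # Record the position of every length-k window. From level 2 on, keep
        # only windows whose (k-1)-prefix and (k-1)-suffix are both frequent
        # (Apriori pruning: a frequent pattern has only frequent sub-patterns).
        found = defaultdict(list)
        for t, word in enumerate(series):
            for i in range(len(word) - k + 1):
                cand = word[i:i + k]
                if allowed is None or (cand[:-1] in allowed and cand[1:] in allowed):
                    found[cand].append((t, i))
        return found

    def frequent(found):
        # Support counts distinct traces, not the total number of matches.
        return {p: pos for p, pos in found.items()
                if len({t for t, _ in pos}) >= min_count}

    result, allowed = {}, None
    for k in range(1, max_length + 1):
        level = frequent(occurrences(k, allowed))
        if not level:
            break  # no frequent pattern of size k, so none of size k+1 either
        result.update(level)
        allowed = set(level)
    return result


if __name__ == "__main__":
    traces = ["aayyb", "baayy", "aayya", "ayyaa"]
    patterns = mine_frequent_sequences(traces, min_support=0.75, max_length=4)
    print(patterns["aayy"])  # (trace_index, offset) pairs to overlay on the image
```

The returned `(trace_index, offset)` pairs are what a plotting step would overlay on the seismic image. Allowing gaps between symbols would need a different matching routine in `occurrences`.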