Survey
Proceedings of the International Conference, “Computational Systems and Communication Technology”, Jan. 9, 2009, by Lord Venkateshwaraa Engineering College, Kanchipuram Dt., PIN-631 605, INDIA

A Survey on Mining Spatio-Temporal Data using Independent Component Analysis

B. PRIYA (1), M. VELU (2)
(1) Lecturer, MCA Dept, Sri SaiRam Engineering College, Chennai, [email protected]
(2) Senior Analyst Programmer, London Underground - Information Management, London, [email protected]

Abstract
Recent advances in telecommunications (e.g., GPS, cellular networks) have facilitated the collection of large spatial and spatio-temporal datasets. The volume of such data and their potentially high update rate make manual analysis extremely difficult, if not impossible, and call for mining techniques that extract valuable information automatically. Furthermore, the special nature of the data and of the analysis objectives renders knowledge extraction techniques designed for simple data types inadequate. Special issues in this Data Mining (DM) field include the fuzzy and implicit nature of spatial and spatio-temporal relationships between objects, the complex geometry of spatial objects, the varying temporal nature of events (instantaneous vs. durable), the variability of spatio-temporal data (moving objects, evolution of spatial events or phenomena, etc.), and the multiple (spatial and temporal) resolution levels of abstraction. Spatio-temporal data mining represents the confluence of several fields, including spatio-temporal databases, machine learning, statistics, geographic visualization, and information theory. Spatial data mining and temporal data mining have each received much attention independently in the KDD and DM research communities. Nevertheless, the need to investigate both “spatial” and “temporal” relations at the same time complicates the data mining tasks even further. A crucial challenge in spatio-temporal DM is the development of efficient methods, given the large amount of spatio-temporal data and the complexity of spatio-temporal data types, data representation, and spatial data structures. The applications of Independent Component Analysis (ICA) range from speech processing, brain imaging, and electrical brain signals to telecommunications and stock predictions. This article deals with mining spatio-temporal data using ICA, with some case studies related to weather data mining.

Key Words: Spatio-Temporal Data, ICA, PCA, NAO, Machine learning, Weather Data Mining.

1. Introduction
Spatio-temporal data record spatial views of objects across time. Data produced by fluid dynamics simulations, or geoinformatics data that track the behavior of intrusions, are of this type. A fundamental difference between pure spatial data and spatio-temporal data is that objects in spatio-temporal data are under constant change. Regardless of the nature of a change (e.g., location change, shape change), a standard assumption is that change is continuous. This means that, while changing, a quantity must pass through all the intermediate values. For example, in the quantity space {-, 0, +}, a value cannot change from ’-’ to ’+’ without going through the value 0. It is often impossible to model and represent the continuous properties of changes, especially when multiple objects are involved. A commonly used approach is to represent a continuously changing system as a sequence of snapshots, where each snapshot records the state of every involved object at a certain time point.

Temporal and spatial data are complex to mine because of their internal structure, which can be considered multi-dimensional. Indeed, spatial data may involve two or three dimensions for determining a region, as well as complex relations for describing the positions of regions relative to each other.
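The snapshot representation described above can be sketched as a small data structure. This is only an illustrative sketch: the names `Snapshot`, `trajectory`, and `skipped_values`, and the choice of storing one qualitative value per object, are assumptions made for the example and do not come from the surveyed work. The helper flags pairs of consecutive snapshots in which an object jumps between non-adjacent values of the quantity space {-, 0, +}, which under the continuity assumption means an intermediate state was not recorded.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Qualitative quantity space, ordered so that neighbouring values are adjacent.
QUANTITY_SPACE = ["-", "0", "+"]


@dataclass(frozen=True)
class Snapshot:
    """State of every tracked object at one time point (hypothetical layout)."""
    time: float
    states: Dict[str, str]  # object id -> qualitative value


def trajectory(snapshots: List[Snapshot], obj_id: str) -> List[Tuple[float, str]]:
    """Return the time-ordered (time, value) pairs recorded for one object."""
    ordered = sorted(snapshots, key=lambda s: s.time)
    return [(s.time, s.states[obj_id]) for s in ordered if obj_id in s.states]


def skipped_values(snapshots: List[Snapshot], obj_id: str) -> List[Tuple[float, float]]:
    """Return (t0, t1) pairs where the value moved by more than one step.

    Under the continuity assumption, such a pair means the snapshots are
    too coarse to show the intermediate value the object passed through.
    """
    traj = trajectory(snapshots, obj_id)
    gaps = []
    for (t0, v0), (t1, v1) in zip(traj, traj[1:]):
        if abs(QUANTITY_SPACE.index(v0) - QUANTITY_SPACE.index(v1)) > 1:
            gaps.append((t0, t1))
    return gaps


# Made-up example: object "a" changes smoothly, object "b" jumps from + to -.
snapshots = [
    Snapshot(0.0, {"a": "-", "b": "+"}),
    Snapshot(1.0, {"a": "0", "b": "+"}),
    Snapshot(2.0, {"a": "+", "b": "-"}),
]
```

Calling `skipped_values(snapshots, "b")` reports the interval in which the sampling missed the passage through 0, which is one simple way to notice that the snapshot rate is too low for the phenomenon being tracked.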
Temporal data may present a linear but also a two-dimensional aspect when time intervals are taken into account and have to be analyzed. In this way, mining temporal or spatial data is a task related to KDDK [1].

ICA is becoming an increasingly important tool for analyzing large data sets. In essence, ICA separates an observed set of signal mixtures into a set of statistically independent component signals, or source signals. In so doing, it can extract the relatively small amount of useful information typically found in large data sets.

Copy Right @CSE/IT/ECE/MCA-LVEC-2009

2. Independent Component Analysis (ICA)
ICA is a fairly new and generally applicable method for several challenges in signal processing. It raises a diversity of theoretical questions and opens a variety of potential applications. Successful results in electroencephalography (EEG), functional Magnetic Resonance Imaging (fMRI), speech recognition, and face recognition systems indicate the power of the new paradigm. ICA is a method for finding underlying factors or components in multidimensional statistical data. There are many latent variable decomposition methods, such as Principal Component Analysis (PCA), singular value decomposition (SVD), factor analysis, projection pursuit, and so on. What distinguishes ICA from these methods is that it looks for components that are both statistically independent and non-Gaussian.

In PCA or factor analysis, an observed vector x(t) is first centered by removing its mean (in practice, the mean is estimated as the average value of the vector in a sample). Then the vector is transformed by a linear transformation into a new vector, possibly of lower dimension, whose elements are uncorrelated with each other.
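The centering and decorrelation steps can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the procedure of any cited paper: the function name `pca_decorrelate`, the synthetic data, and the choice of an eigendecomposition of the sample covariance (one common way to obtain the transformation) are all illustrative.

```python
import numpy as np


def pca_decorrelate(X, n_components=None):
    """Centre X (n_samples, n_features) and project it onto principal axes.

    Returns the projected data Y, the projection matrix W (eigenvectors as
    columns), and the eigenvalues sorted from largest to smallest.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # remove the sample mean
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)      # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is not None:
        eigvals = eigvals[:n_components]
        eigvecs = eigvecs[:, :n_components]
    Y = Xc @ eigvecs                         # uncorrelated components
    return Y, eigvecs, eigvals


# Synthetic example (made-up data): 3 observed variables driven by 2 factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loading = np.array([[2.0, 1.0, 0.5],
                    [0.0, 1.0, -1.0]])
X = latent @ loading + 0.05 * rng.normal(size=(500, 3))

Y, W, eigvals = pca_decorrelate(X, n_components=2)
```

The covariance matrix of `Y` is diagonal up to rounding error, which is exactly the "uncorrelated elements" property; nothing in this step enforces independence beyond second-order statistics.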
The linear transformation is found by computing the eigenvalue decomposition of the covariance matrix, which for zero-mean vectors is the correlation matrix E[x(t) x(t)^T]; its eigenvectors form a new coordinate system in which the data are represented. As a result, the number of components y_i(t) can be quite small, maybe only 1 or 2, yet these components contain most of the information and may provide insight into the structure of the data in terms of second-order statistics. The basic PCA network can be described by

$$y_i(t) = \sum_{j} w_{ij}\, x_j(t)$$
$$x_j'(t) = x_j(t) - \sum_{i} w_{ij}\, y_i(t)$$
$$\Delta w_{ij} = \eta\, x_j'(t)\, y_i(t), \qquad i = 1, 2, \ldots, N$$

But in many applications uncorrelatedness is not enough, and we must find the independent components (ICs). Here, independence does not correspond to the independence in factor analysis. Factor analysis was originally developed in the social sciences, and it is often claimed that its factors are independent, but this is only partly true, because factor analysis assumes that the data have a Gaussian distribution. If the data have a Gaussian distribution, it is easy to find the ICs, since for Gaussian data uncorrelated components are also independent [2]. ICA, on the other hand, tries to find statistically independent sources by additionally minimizing higher-order statistical dependencies between the components. In applications where we intend to find independent factors in a huge data set, ICA is better suited than PCA or factor analysis.

3. Mining Spatio-Temporal Data
“Data mining is the process of digging through large data sets and extracting the useful information for analyzing to find the hidden patterns and relationships using modern statistical and computational techniques.” [3]

Many natural phenomena present intrinsic spatial and temporal characteristics. With the recent advances in data collection technologies, high-resolution spatio-temporal datasets can be stored and analyzed to accurately study the behavior of such events.
However, these datasets are often very large and difficult to analyze and display. Recently, much attention has been dedicated to the application of innovative data mining techniques to filter out relevant subsets of very large repositories, as well as to the development of visualization tools to effectively display the corresponding results.

Spatio-temporal data mining is an emerging research area dedicated to the development and application of novel computational techniques for the analysis of large spatio-temporal databases. ICA is an efficient method to be used in a geospatial environment for mining spatio-temporal data. Some research estimates that about 80% of the data stored in corporate databases integrate spatial information (Fayyad and Grinstein, 2001), leading to huge amounts of georeferenced information that need to be analyzed and processed. These datasets are often critical for decision support, but their value depends on the ability to extract useful information for studying and understanding the phenomena governing the data source. Therefore, the need for efficient and effective techniques for analyzing spatio-temporal datasets has recently emerged as a research priority (Bédard et al., 2001), and spatio-temporal data mining aims at addressing these needs. It encompasses a set of exploratory, computational, and interactive approaches for analyzing very large spatial and spatio-temporal datasets. Numerous research projects on spatial data mining have been conducted in the last two decades (a comprehensive review is provided by Andrienko et al., 2003). Several open issues have been identified, ranging from the definition of mining techniques capable of dealing with spatio-temporal information to the development of effective methods for interpreting and visualizing the final results.

In particular, visualization techniques are widely recognized to be powerful in this domain (Andrienko et al., 2003), (Andrienko et al., 2005), (Johnston, 2001), since they take advantage of human abilities to perceive visual patterns and to interpret them (Andrienko et al., 2003), (Kopanakis and Theodoulidis, 2003), (Costabile and Malerba, 2003). However, it is recognized that the spatial visualization features provided in existing geographical applications are not adequate for decision support when used alone. Hence, alternative solutions have to be defined (Bédard et al., 2001) to dynamically and interactively obtain different spatial and temporal views, and to interact in different ways with the results produced during the data mining process. The problems of how to visualize the spatio-temporal multidimensional dataset (Bédard et al., 1997) and how to define effective visual interfaces for viewing and manipulating the geometrical components of the spatial data (Shneiderman, 2002) are some of the challenges that still need to be tackled [4].

The main impulse to research in this subfield of data mining comes from the large amount of spatial data made available by GIS, CAD, robotics and computer vision applications, computational biology, and mobile computing applications, and of temporal data obtained by registering events (e.g., telecommunication or web traffic data) and by monitoring processes and workflows. Both the temporal and spatial dimensions add substantial complexity to data mining tasks. First of all, the spatial relations, both metric (such as distance) and non-metric (such as topology, direction, shape, etc.), and the temporal relations (such as before and after) are information bearing and therefore need to be considered in the data mining methods. Secondly, some spatial and temporal relations are implicitly defined, that is, they are not explicitly encoded in a database.
These relations must be extracted from the data, and there is a trade-off between precomputing them before the actual mining process starts (eager approach) and computing them on the fly when they are actually needed (lazy approach). Moreover, despite the many formalizations of space and time relations available in spatio-temporal reasoning, the extraction of spatial/temporal relations implicitly defined in the data introduces some degree of fuzziness that may have a large impact on the results of the data mining process. Thirdly, working at the level of stored data, that is, geometric representations (points, lines, and regions) for spatial data or time stamps for temporal data, is often undesirable. For instance, urban planning researchers are interested in possible relations between two roads, which either cross each other, run parallel, or are confluent, regardless of whether the two roads are represented by one or more tuples of a relational table of “lines” or “regions”. Therefore, complex transformations are required to describe the units of analysis at higher conceptual levels, where human-interpretable properties and relations are expressed. Fourthly, spatial resolution or temporal granularity can have a direct impact on the strength of the patterns that can be discovered in the datasets. Interesting patterns are more likely to be discovered at the lowest resolution/granularity level. On the other hand, large support is more likely to exist at higher levels.
Fifthly, many rules of qualitative reasoning on spatial and temporal data (e.g., transitive properties of the temporal relations after and before), as well as spatio-temporal ontologies, provide a valuable source of domain-independent knowledge that should be taken into account when generating patterns. How to express these rules and how to integrate spatio-temporal reasoning mechanisms into data mining systems are still open problems. Additional research issues related to spatio-temporal data mining concern the visualization of spatio-temporal patterns and phenomena, the scalability of the methods, and the data structures used to represent and efficiently index spatio-temporal data [5].

4. Analysis of Spatio-Temporal Climate Variability by ICA
Statistical approaches to weather and climate prediction have a long and distinguished history that predates modeling based on physics and dynamics. This trend continues today with new approaches based on machine learning algorithms. The central problem in weather and climate modeling is to predict the future states of the atmospheric system. It is therefore possible to view the weather variables as sources of spatio-temporal signals. The information in these spatio-temporal signals can be extracted using data mining techniques. The variation in the weather variables can be viewed as a mixture of several independently occurring spatio-temporal signals with different strengths. A key problem in climatology is to deduce from observations the physical phenomena at the origin of climate variability. Classical approaches such as PCA are based on hypotheses that are not always valid for the analysis of climate (linearity, Gaussian distributions, orthogonality of components, maximum of variance in a minimum number of modes).
ICA, by contrast, aims at extracting linearly or nonlinearly independent components from a dataset of observations or model outputs using a criterion of statistical independence, which is a stronger constraint than the decorrelation used in the classical approaches.

Recently there has been increased interest in the use of ICA for image analysis. ICA can be considered one approach to component analysis; among the other approaches, the traditional Principal Component Analysis (PCA) is the most popular. Component analysis, which extracts the most important components of the data, is useful for data mining in remote sensing, which normally involves a very large amount of data. While PCA attempts to decorrelate the components of a vector, ICA methods attempt to make the components as statistically independent as possible. There are several ICA algorithms that can be implemented efficiently by a neural network. As such, ICA is a very useful tool for data mining in remote sensing.

5. Case Studies

5.1 Weather Data Mining
Weather data mining is a form of data mining concerned with finding hidden patterns in the large body of available meteorological data, so that the information retrieved can be transformed into usable knowledge.

5.2 Use of ICA in Weather Data Mining with regard to Pacific Decadal Oscillation
If the assumption of independent stable activity in the weather variables holds true, then it is also possible to extract these activities using ICA. One basic assumption is that we view the weather phenomenon as a mixture of a certain number of signals with independent stable activity. The weather changes because the mixing patterns of these stable activities change over time. For linear mixtures, the change in the mixing coefficients gives rise to the changing nature of the global weather.
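The linear mixture assumption can be illustrated with a small NumPy sketch. It is not the implementation used in the cited studies: it is a compact FastICA-style fixed-point iteration with a tanh nonlinearity, and the two synthetic sources (a periodic "seasonal" signal and heavy-tailed noise), the mixing matrix `A`, and all function names are assumptions made for the example. Observations are modeled as X = S A^T, and the code estimates an unmixing matrix from X alone.

```python
import numpy as np


def whiten(X):
    """Centre X (n_samples, n_signals) and rescale it to identity covariance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    K = eigvecs / np.sqrt(eigvals)      # column j scaled by 1/sqrt(eigenvalue j)
    return Xc @ K, K


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Estimate independent components of X under a linear mixing model.

    Returns the estimated sources (n_samples, n_signals) and the unmixing
    matrix that maps centred observations to those sources.
    """
    Z, K = whiten(X)
    n_samples, n = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.normal(size=(n, n)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)            # g(w_i^T z) for every sample and row i
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for all rows.
        W_new = G.T @ Z / n_samples - np.diag((1.0 - G ** 2).mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when each new row is parallel to the matching old row.
        change = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if change < tol:
            break
    unmixing = W @ K.T
    return Z @ W.T, unmixing


# Synthetic stand-ins for two independent "stable activities" (made-up data).
rng = np.random.default_rng(1)
t = np.arange(3000)
sources = np.column_stack([
    np.sin(2 * np.pi * t / 12.0),       # periodic, seasonal-like signal
    rng.laplace(size=t.size),           # heavy-tailed irregular signal
])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])              # hypothetical mixing coefficients
X = sources @ A.T                       # each column: one observed variable

S_est, unmixing = fast_ica(X)
mixing_est = np.linalg.inv(unmixing)    # estimated mixing coefficients
# Absolute correlation between each estimated and each true source.
corr = np.abs(np.corrcoef(S_est.T, sources.T)[:2, 2:])
```

ICA recovers sources only up to order, sign, and scale, which is why the comparison uses absolute correlations rather than a direct difference. In this picture, the columns of `mixing_est` play the role of the mixing coefficients discussed above.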
We have to investigate whether there exists any such set of stable spatio-temporal patterns such that the variation of their mixture gives rise to the observed weather or climate phenomena. The conjecture is that there exist independent stable spatio-temporal activities, the mixture of which gives rise to the weather variables, and that these stable activities can be extracted by ICA of the data arising from the weather and climate patterns, viewing them as spatio-temporal signals. If the conjecture about the existence of stable spatio-temporal activity in the weather is true, then the mixing coefficients will vary in accordance with the changes in the weather variables. Figure 5.2 represents an application of temporal ICA with regard to the Pacific Decadal Oscillation [6].

Figure 5.2 Pacific Decadal Oscillation

5.3 North Atlantic Oscillation (NAO)
In the research work by M. S. Santhanam and co-authors [7], the authors provide a new way of viewing the physical phenomena of changing weather and climate by mining spatio-temporal data of weather and climate variables. They consider the NAO as a typical example and mine the sea level pressure (SLP) data using ICA.
They provide techniques for determining the strongest independent components in the multidimensional data set, and observe that the strongest stable patterns obtained by ICA match the physical patterns of oscillation in SLP. The results are also verified by a linear fit of the independent components to the standard NAO index provided by meteorological measurements. The method of mining spatio-temporal data is generic in nature and is not limited to weather phenomena. The same method can be applied to find certain stable characteristics in other spatio-temporal systems. Even when a spatio-temporal system is chaotic, the method may be applied to extract meaningful patterns if the system embeds some such stable patterns (weather is possibly a natural example of a physical chaotic system), as shown in Figure 5.3.

Figure 5.3 North Atlantic Oscillation

The method can be further investigated in the following manner. First, it extracts certain stable patterns whose temporal trend perfectly matches the physical phenomenon. Therefore, the individual stable oscillations (obtained as independent components from the spatio-temporal data) can be analyzed further to predict the time-series behavior of the oscillation. Second, it is very difficult to analyze the NAO in order to find the physical correlations between the various modes that interact to produce the NAO phenomenon. However, ICA gives a mixing matrix that provides an indication of how the various modes interact (in a linear manner). Third, we assumed a linear mixture of various independent components.
In further investigation, this assumption can be relaxed and nonlinear ICA can be performed on these kinds of spatio-temporal data sets in order to find even more meaningful characteristics.

6. Conclusion
Spatio-temporal data mining is an emerging research area dedicated to the development and application of novel computational techniques for the analysis of very large spatio-temporal databases. Data mining techniques are typically inductive, as opposed to deductive, in that they are not used to prove or disprove pre-existing hypotheses but rather to identify patterns embedded within data, and thereby support hypothesis generation. Most research in spatial, temporal, and spatio-temporal data mining has sought to adapt ‘classical’ data mining algorithms intended to operate on more conventional data types. Spatio-temporal data mining presents a number of challenges due to the complexity of geographic domains, the mapping of all data values into a spatial and temporal framework, and the spatial and temporal autocorrelation exhibited in most spatio-temporal data sets. ICA is a powerful tool for mining spatio-temporal data, in particular for weather data mining.

REFERENCES
[1] http://ralyx.inria.fr/2007/Raweb/orpailleur/uid17.html
[2] “Data mining with Independent Component Analysis” by Rui Li, Proceedings of the 6th World Congress on Intelligent Control and Automation.
[3] http://en.wikipedia.org/wiki/Weather_Data_Mining
[4] http://geoanalytics.net/VisA-SDS2006/paper28.pdf
[5] http://www.di.uniba.it/~malerba/activities/mstd/
[6] ti.arc.nasa.gov/is/IDU/tasks/MLDM.html
[7] “Weather Data Mining Using Independent Component Analysis” by Basak, J., Sudarshan, A., Trivedi, D., Santhanam, M. S., Journal of Machine Learning Research, 2005, Vol. 5, No. 1, pages 239-254.
[8] http://cnl.salk.edu/~tewon/ica_cnl.html