1 META MINING SYSTEM FOR SUPERVISED LEARNING by
... compactness of user-friendly data models that it generates. These two features make it applicable for applications that use megabytes, or even gigabytes of data. The fields contributing to this research are Inductive Machine Learning, Data Mining and Knowledge Discovery, and Meta Mining. A study of ...
Scaling Up All Pairs Similarity Search
... The all-pairs similarity search problem is a generalization of the well-known nearest neighbor problem in which the goal is to find the nearest neighbors of a given point query. There is a wide body of work on this problem, with many recent works considering various approximation techniques [6, 10, ...
DOCTORAATSPROEFSCHRIFT
... information spaces" and by the Bulgarian National Science Fund under the Project D002-308 "Automated Metadata Generating for e-Documents Specifications and Standards". I would like to express my gratitude to Hasselt University, Belgium and Institute of Mathematics and Informatics, Bulgaria for ensur ...
Here - Advanced Computing Group home page
... their comparison and joint study. To fill this gap, we developed a feature selection repository, which is designed to collect the most popular algorithms that have been developed in the feature selection research to serve as a platform to facilitate their application, comparison and joint study. The ...
Class Association Rule Mining Using Multi
... information spaces" and by the Bulgarian National Science Fund under the Project D002-308 "Automated Metadata Generating for e-Documents Specifications and Standards". I would like to express my gratitude to Hasselt University, Belgium and Institute of Mathematics and Informatics, Bulgaria for ensur ...
A survey of temporal knowledge discovery paradigms and methods
... Datatype: The data subject to the knowledge discovery process can be conventional scalar values, such as stock prices, or events that cannot be ordered, such as telecommunication signals. We also consider one further datatype, the one describing the mining results themselves, so that we can observe ...
Proceedings of the ECMLPKDD 2015 Doctoral Consortium
... statistical methods seemed most appropriate. We first explored a parametric statistical approach using a Gaussian-based model. This technique works well if the underlying distribution fits properly and the distribution is fixed over time. But in the case of an evolving data stream, it is often the case is ...
Mining Outlying Aspects on Numeric Data
... a set of rules A → B, where A and B are subspaces, and the outlier is normal in subspace A but deviates substantially in subspace B. The deviation degree can be computed using some outlier score, such as LOF (Breunig et al, 2000). Then, a ranked list of rules is output as the explanation of the outl ...
Information Mining Technologies to Enable Discovery of Actionable
... The current focus of the data mining community is the application of data mining to nonstandard data sets (i.e. non-tabular data sets) such as image sets, documents, video, multimedia data, network data, matrices, graphs and tensors. For the last three listed data sets, the data mining algorithms em ...
Spatial autocorrelation
... Spatial regression (SR) is a global spatial modeling technique in which spatial autocorrelation among the regression parameters is taken into account. SR is usually performed for spatial data obtained from spatial zones or areas. The basic aim in SR modeling is to establi ...
MCAIM: Modified CAIM Discretization Algorithm for Classification
... attribute into the smallest number of intervals and maximizes the class-attribute interdependency, and thus makes the subsequent classification much easier. The algorithm automatically selects the number of discrete intervals without any user supervision. Experiments in [5] showed that C ...
Frequent pattern analysis for decision making in big data
... classification errors made by this method using standard statistical methods. Therefore, the proposed method can be employed without extensive empirical performance evaluations that are necessary for other state-of-the-art approximate methods. Multiple Re-sampling Method (MRM) is an improved versi ...
A comprehensive review on privacy preserving data
... mining operation between a number of users u_1, …, u_m with m ≥ 2. The data is viewed as a database of n records, each consisting of l fields, where each record represents an individual i_i and illustrates them through its fields. In a simplified representation a table T contains rows to signify i_1, …, i_n an ...
Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.
Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and error. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology (from Greek βότρυς "grape") and typological analysis. The subtle differences are often in the usage of the results: while in data mining the resulting groups are the matter of interest, in automatic classification the resulting discriminative power is of interest. This often leads to misunderstandings between researchers coming from the fields of data mining and machine learning, since they use the same terms and often the same algorithms, but have different goals.
Cluster analysis originated in anthropology with Driver and Kroeber in 1932, was introduced to psychology by Zubin in 1938 and Robert Tryon in 1939, and was famously used by Cattell beginning in 1943 for trait theory classification in personality psychology.
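The dependence of a clustering result on its parameter settings, noted in the cluster analysis entry above, can be illustrated with a minimal sketch. It assumes one simple notion of a cluster (points linked by chains of small distances) applied to made-up one-dimensional data; the function name `threshold_clusters`, the parameter `max_dist`, and the sample values are hypothetical and chosen only for illustration. It is not a method taken from any of the works listed here.

```python
from itertools import combinations


def threshold_clusters(points, max_dist):
    """Group 1-D points: two points share a cluster when a chain of
    points, each at most max_dist from the next, connects them."""
    parent = list(range(len(points)))  # union-find forest, one node per point

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points that is close enough.
    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= max_dist:
            parent[find(i)] = find(j)

    # Collect the points under each root and return them in sorted order.
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Illustrative data only; the same points under two distance thresholds.
data = [1.0, 1.2, 1.9, 5.0, 5.3, 9.0]
tight = threshold_clusters(data, max_dist=0.5)
loose = threshold_clusters(data, max_dist=1.0)
print(len(tight), tight)
print(len(loose), loose)
```

With the smaller threshold the point 1.9 forms its own cluster, while the larger threshold merges it with 1.0 and 1.2. Neither grouping is "correct" in itself; which one is useful depends on the data set and the intended use of the result.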