New Methods for Mining Sequential and Time Series Data
... high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection, and we present experimental results that show the possibility of managing ...
Object-Based Selective Materialization for Efficient Implementation
... merging spatial (vector or raster) objects is a computationally expensive operation [10]. This object-based approach reflects a trade-off between space and time for efficient implementation of spatial OLAP operations. On the one hand, it is important to precompute some spatial OLAP results, such as ...
Clustering of time-series subsequences is meaningless: implications
... We then measured the average cluster distance (as defined in Equation 1) between each set of cluster centers in X̂ and each other set of cluster centers in X̂. We call this number within_set_X̂_distance. within_set_X̂_distance = ...
Detection of Outliers in Time Series Data
... In this section, we discuss the problem of outlier detection in natural gas consumption time series. An outlier is an entry in a data set that is anomalous with respect to the behavior seen in the majority of the other entries in the data set (3; 4; 5). The data sets used in this thesis are provided ...
Customer Churn Prediction for the Icelandic Mobile
... prediction model that can output the probabilities that customers will churn in the near future. Churn prediction is formulated as a classification task of churners and non-churners. Learning algorithms are applied to training data to build classifiers. The data is a set of customers where each one ...
Optimizing Query Processing In Sensor Networks
... – nodes participating in a query send • the selectivity of each of the query's predicates • its longitude and latitude ...
Finding or Not Finding Rules in Time Series
... 1. Calculate the distance between all objects. Store the results in a distance matrix. 2. Search through the distance matrix and find the two most similar clusters/objects. 3. Join the two clusters/objects to produce a cluster that now has at least 2 objects. 4. Update the matrix by calculating the ...
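The four agglomerative steps above can be sketched in a few lines. This is a minimal illustration, not the cited paper's implementation; it assumes 1-D points, Euclidean distance, and single-linkage (the distance between two clusters is the smallest pairwise distance between their members), and it recomputes cluster distances each pass instead of maintaining the matrix incrementally.

```python
def agglomerate(points, target_clusters):
    # Step 1: every object starts as its own cluster; distances are
    # recomputed on demand rather than stored in an explicit matrix.
    clusters = [[p] for p in points]

    def cluster_dist(c1, c2):
        # Single-linkage: smallest pairwise distance between members.
        return min(abs(a - b) for a in c1 for b in c2)

    while len(clusters) > target_clusters:
        # Step 2: find the two most similar clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        # Steps 3-4: join them; the "matrix update" happens implicitly
        # because distances are recomputed on the next pass.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerate([1.0, 1.1, 5.0, 5.2, 9.0], 3))  # → [[1.0, 1.1], [5.0, 5.2], [9.0]]
```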
An overview on subgroup discovery - Soft Computing and Intelligent
... for the first value of the target variable can be observed, where the rule attempts to cover a high number of objects with a single function: a circle. As can be observed, the subgroup does not cover all the examples for the target value x, and even the examples it does cover are not positive in all cases, ...
Kmeans - chandan reddy
... 3. Bradley and Fayyad [5]: Choose random subsamples from the data and apply K-means clustering to all these subsamples using random seeds. The centroids from each of these subsamples are then collected and a new dataset consisting of only these centroids is created. This new dataset is clustered usi ...
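The Bradley–Fayyad refinement described above can be sketched as follows. This is a simplified 1-D illustration, not the authors' code: the tiny `k_means` helper, the sampling-with-replacement, and all parameter values are assumptions made for brevity.

```python
import random

def k_means(points, k, iters=20, rng=None):
    # Minimal 1-D Lloyd's algorithm with random initial centers (assumed helper).
    rng = rng or random
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)

def refined_centers(points, k, n_subsamples=5, sample_size=10, seed=0):
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_subsamples):
        # Draw a random subsample and cluster it with random seeds.
        sub = [rng.choice(points) for _ in range(sample_size)]
        pooled.extend(k_means(sub, k, rng=rng))
    # Cluster the pooled centroids themselves: the result serves as the
    # refined initial centers for a final K-means run on the full data.
    return k_means(pooled, k, rng=rng)
```

On clearly bimodal data, the refined centers land near the two modes regardless of which random subsamples were drawn.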
CG33504508
... cluster V is selected and bisected further into two partitions V1 and V2 using the basic KM algorithm. This process continues until the desired number of clusters or some other specified stopping condition is reached. There are a number of different ways to choose which cluster to split. For example ...
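The bisecting scheme described above can be sketched as a loop: select a cluster V, split it into V1 and V2 with basic K-means, and repeat until the desired number of clusters is reached. This is an illustrative 1-D sketch, not the paper's implementation; it assumes "split the largest cluster" as the selection rule (one of the common choices mentioned) and uses a simplified two-center splitter in place of the basic KM algorithm.

```python
def two_means(points, iters=20):
    # Simplified stand-in for basic K-means with k=2 (assumes the cluster
    # contains at least two distinct values): init at the extremes, then
    # alternate assignment and mean-update.
    c1, c2 = min(points), max(points)
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return g1, g2

def bisecting_k_means(points, k):
    clusters = [list(points)]
    while len(clusters) < k:
        v = max(clusters, key=len)   # selection rule: bisect the largest cluster
        clusters.remove(v)
        v1, v2 = two_means(v)        # bisect V into V1 and V2
        clusters.extend([v1, v2])
    return clusters
```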
NEW DENSITY-BASED CLUSTERING TECHNIQUE Rwand D. Ahmed
... Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most popular algorithms for cluster analysis. It can discover clusters of arbitrary shape and separate out noise. But this algorithm cannot choose its parameters according to the distribution of the dataset. It simply uses the gl ...
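A minimal sketch of DBSCAN with a single global density threshold, illustrating the limitation noted above (one `eps` for the whole dataset). This is a 1-D teaching version, not the paper's algorithm; labels are cluster ids 0, 1, ... with -1 for noise.

```python
def dbscan(points, eps, min_pts):
    labels = [None] * len(points)            # None = unvisited, -1 = noise

    def neighbors(i):
        # Global eps: the same radius is applied everywhere in the dataset.
        return [j for j in range(len(points)) if abs(points[i] - points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                   # noise (may later become a border point)
            continue
        labels[i] = cluster                  # i is a core point: start a new cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster          # noise reached from a core point: border
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors(j)) >= min_pts:
                seeds.extend(neighbors(j))   # j is itself core: keep expanding
        cluster += 1
    return labels

print(dbscan([0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 50.0], 1.5, 2))
# → [0, 0, 0, 1, 1, 1, -1]
```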
Ensembles for Unsupervised Outlier Detection: Challenges
... If given a ground truth dataset where we know, for each object, whether it actually is an outlier or not, two ways of measuring the quality of the outlier detection result are commonly used in the literature [76]. The first, more widely used measure of success is based on receiver operating characte ...
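The ROC-based measure mentioned above can be computed without tracing the curve: the area under the ROC curve equals the probability that a randomly chosen true outlier receives a higher score than a randomly chosen inlier (the Mann–Whitney U statistic), with ties counted as 1/2. A small sketch under that standard equivalence:

```python
def roc_auc(scores, is_outlier):
    # AUC as the fraction of (outlier, inlier) pairs ranked correctly,
    # counting tied scores as half a correct ranking.
    outlier_scores = [s for s, o in zip(scores, is_outlier) if o]
    inlier_scores = [s for s, o in zip(scores, is_outlier) if not o]
    wins = sum(
        1.0 if so > si else 0.5 if so == si else 0.0
        for so in outlier_scores
        for si in inlier_scores
    )
    return wins / (len(outlier_scores) * len(inlier_scores))

# A detector that scores the single true outlier highest gets AUC 1.0.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [True, False, False, False]))  # → 1.0
```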
Rule-Based Classifier
... Given a record with attributes (A1, A2,…,An) – Goal is to predict class C – Specifically, we want to find the value of C that ...
HSC: A SPECTRAL CLUSTERING ALGORITHM
... data processing and analysis tool. Many clustering applications can be found in these fields, such as web mining, biological data analysis, social network analysis [1], etc. However, clustering is still an attractive and challenging problem. It is hard for any clustering method to give a reasonable p ...
Session 9: Clustering
... Comments on the K-Means Method. Strength: relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n. • Comparison: PAM: O(k(n-k)^2), CLARA: O(ks^2 + k(n-k)) ...
Parallel Itemset Mining in Massively Distributed Environments
... some hidden relationships cannot be easily derived and detected inside the data. This is especially the case when the data is very large and massively distributed. To this end, a careful analysis of the informativeness of the itemsets would give more explanation about the existing correlations and ...
A Dense-Region Based Approach to On
... area of clustering in large databases. CLARANS is a partitioning technique which improves the k-medoid methods [12]. BIRCH uses CF-trees to reduce the input size and adopts an approximate technique for clustering in large databases [17]. CURE is another algorithm that uses sampling and partitioning ...
K-nearest neighbors algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression: In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its single nearest neighbor. In k-NN regression, the output is the property value for the object: the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

For both classification and regression, it can be useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm has nothing to do with, and is not to be confused with, k-means, another popular machine learning technique.
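The classification, regression, and 1/d-weighting variants described above fit in one short sketch. The small epsilon added to d is an implementation choice to avoid division by zero on exact matches, not part of the definition.

```python
from collections import defaultdict

def knn_predict(train, query, k, mode="classify", weighted=False):
    # train: list of (feature_vector, target) pairs; no training step is needed,
    # reflecting the "lazy learning" character of k-NN.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(train, key=lambda xy: dist(xy[0], query))[:k]

    if mode == "classify":
        # Majority vote, optionally weighted by 1/d so nearer neighbors count more.
        votes = defaultdict(float)
        for x, y in nearest:
            votes[y] += 1.0 / (dist(x, query) + 1e-9) if weighted else 1.0
        return max(votes, key=votes.get)

    # Regression: (optionally 1/d-weighted) average of the k nearest targets.
    if weighted:
        ws = [1.0 / (dist(x, query) + 1e-9) for x, _ in nearest]
        return sum(w * y for w, (_, y) in zip(ws, nearest)) / sum(ws)
    return sum(y for _, y in nearest) / k

train = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((5.0, 5.0), "b"), ((5.1, 5.0), "b")]
print(knn_predict(train, (0.2, 0.1), k=3))  # → a
```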