Understanding taxi driving behaviors from movement data
... 8000 GPS points of each car would be recorded in one day (24 hours), provided the GPS device is working. Each position record has nine attributes, including car identification number, company name, current timestamp, current location (longitude, latitude), instantaneous velocity, and GPS effectiveness. Th ...
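To make the record layout concrete, here is a minimal sketch of one position record, assuming hypothetical field names and encodings (seconds since midnight, km/h). Only the seven fields named in the snippet are modelled, since the remaining attributes of the nine are not listed. It also checks the arithmetic behind the sampling rate: about 8000 fixes spread evenly over 86,400 seconds means one fix roughly every 10.8 seconds.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


@dataclass(frozen=True)
class GpsRecord:
    """One taxi position record (hypothetical field names, partial schema)."""
    car_id: str
    company: str
    timestamp: int        # seconds since midnight (assumed encoding)
    longitude: float
    latitude: float
    velocity_kmh: float   # instantaneous velocity (unit assumed)
    gps_valid: bool       # the "GPS effectiveness" flag


def mean_sampling_interval(points_per_day: int) -> float:
    """Average seconds between fixes if the points are spread evenly over 24 h."""
    return SECONDS_PER_DAY / points_per_day


def valid_only(records):
    """Keep only records whose GPS fix is flagged as effective."""
    return [r for r in records if r.gps_valid]


records = [
    GpsRecord("A123", "CabCo", 0, 114.05, 22.55, 32.0, True),
    GpsRecord("A123", "CabCo", 11, 114.06, 22.55, 0.0, False),
]
print(mean_sampling_interval(8000))  # 10.8
print(len(valid_only(records)))      # 1
```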
... common pages among sessions. The second model, Click-Stream Tree, considers both the order of pages in a session and the time spent on them. User sessions are clustered according to their pairwise similarity, and each resulting cluster is then represented by a click-stream tree. A new m ...
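One plausible reading of "a cluster represented by a click-stream tree" is a prefix tree in which sessions that share their opening pages share a path, and each node keeps how many sessions passed through it and how long they stayed. The sketch below builds such a tree for one cluster; the node fields and the (page, seconds) session format are assumptions, and the pairwise clustering step is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    page: str
    count: int = 0                 # sessions passing through this node
    total_time: float = 0.0        # seconds spent on this page at this position
    children: dict = field(default_factory=dict)


def build_click_stream_tree(sessions):
    """Merge ordered sessions of (page, seconds) pairs into one prefix tree."""
    root = Node(page="<root>")
    for session in sessions:
        node = root
        node.count += 1
        for page, seconds in session:
            child = node.children.get(page)
            if child is None:
                child = node.children[page] = Node(page=page)
            child.count += 1
            child.total_time += seconds
            node = child
    return root


cluster = [
    [("home", 5), ("products", 30), ("cart", 10)],
    [("home", 3), ("products", 20)],
    [("home", 4), ("about", 12)],
]
tree = build_click_stream_tree(cluster)
products = tree.children["home"].children["products"]
print(products.count, products.total_time / products.count)  # 2 25.0
```

Keeping time per node rather than per page lets the tree distinguish, say, time on "products" reached from "home" from time on "products" reached elsewhere.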
Data Stream Mining with Extensible Markov Model
... A Markov process is a random process that satisfies the Markov property. A Markov chain is a Markov process with discrete states. Clustering -> determine representative granules in the data space. Static Markov chain -> dynamic Markov chain. Map a cluster into a state in the Markov chain. What is EMM: A data ...
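The bullet points above (clusters become states, and the chain grows as data arrives) can be illustrated with a toy model. This is a minimal sketch, not the EMM algorithm itself: it assumes Euclidean distance, a fixed distance threshold for opening a new state, and running-mean centroids, and it estimates transition probabilities from transition counts.

```python
import math
from collections import defaultdict


class ExtensibleMarkovModelSketch:
    """Toy EMM-style model: each cluster is a Markov state, created on the fly."""

    def __init__(self, threshold):
        self.threshold = threshold     # max distance to join an existing state (assumed)
        self.centroids = []            # state id -> centroid
        self.sizes = []                # state id -> number of points absorbed
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.current = None

    def _nearest(self, x):
        best, best_d = None, math.inf
        for sid, c in enumerate(self.centroids):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = sid, d
        return best, best_d

    def update(self, x):
        sid, d = self._nearest(x)
        if sid is None or d > self.threshold:
            sid = len(self.centroids)          # extend the chain with a new state
            self.centroids.append(list(x))
            self.sizes.append(1)
        else:
            n = self.sizes[sid] + 1            # running-mean centroid update
            self.centroids[sid] = [c + (xi - c) / n
                                   for c, xi in zip(self.centroids[sid], x)]
            self.sizes[sid] = n
        if self.current is not None:
            self.transitions[self.current][sid] += 1
        self.current = sid
        return sid

    def transition_probability(self, i, j):
        out = self.transitions[i]
        total = sum(out.values())
        return out.get(j, 0) / total if total else 0.0


model = ExtensibleMarkovModelSketch(threshold=1.0)
stream = [(0, 0), (0.1, 0), (5, 5), (0, 0.1), (5.1, 5)]
print([model.update(x) for x in stream])    # [0, 0, 1, 0, 1]
print(model.transition_probability(0, 1))   # 2 of the 3 transitions out of state 0
```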
Impacts of Frequent Itemset Hiding Algorithms on Privacy
... Public sensitivity toward data mining has increased because it is seen as a threat to individuals' private information, as shown in the example above. On the other hand, data mining is important for discovering knowledge efficiently. Privacy-preserving data mining arises from the need to continue performing ...
Machine learning and data mining for yeast functional genomics
... such as the results of phenotypic growth experiments, microarray experiments, sequence characteristics, secondary structure prediction, and sequence similarity searches. This work builds on existing approaches to the analysis of ORF function in the M. tuberculosis and E. coli genomes and extends the comp ...
Inducing Generalized Multi-Label Rules with Learning Classifier
... Thus, defining the problem from a machine learning point of view, a multi-label classification model approximates a function f : X → L∗, where X is the feature space and L∗ is the power set of the label space L (i.e., the set of all subsets of the set of possible labels). The general multi-label classificat ...
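The definition f : X → L∗ says a prediction is any subset of the labels, including the empty set. The sketch below makes that concrete with a hypothetical three-label space and two hand-written, hypothetical rules; it is not the rule set or the learning procedure of the paper, only an instance of a function whose outputs land in the power set.

```python
from itertools import chain, combinations

LABELS = ("sports", "politics", "economy")   # hypothetical label space L


def power_set(labels):
    """Every subset of the label space, i.e. the codomain L* of f."""
    return {frozenset(c) for c in chain.from_iterable(
        combinations(labels, r) for r in range(len(labels) + 1))}


# Hypothetical rules: (feature, threshold, labels predicted when feature > threshold)
RULES = [
    ("mentions_match", 0.5, {"sports"}),
    ("mentions_tax", 0.5, {"politics", "economy"}),
]


def f(x):
    """Toy multi-label classifier f: X -> L*, the union of labels of all firing rules."""
    predicted = set()
    for feature, threshold, labels in RULES:
        if x.get(feature, 0.0) > threshold:
            predicted |= labels
    return frozenset(predicted)


prediction = f({"mentions_match": 0.9, "mentions_tax": 0.7})
assert prediction in power_set(LABELS)
print(sorted(prediction))  # ['economy', 'politics', 'sports']
```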
Doctoral Thesis
... One of the most appealing machine learning paradigms is Learning Classifier Systems (LCSs), and more specifically Michigan-style LCSs: an open framework that combines an apportionment-of-credit mechanism with a knowledge discovery technique inspired by biological processes to evolve their internal ...
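The two ingredients named above can be shown in a heavily simplified Michigan-style loop. This is a sketch under stated assumptions, not XCS or the system of this thesis: rules use the ternary {0, 1, #} alphabet, credit is a Widrow-Hoff update of each rule's predicted reward (learning rate `BETA` assumed), discovery is one-point crossover plus mutation, and the task (reward 1000 when the action equals the first input bit) is hypothetical.

```python
import random
from dataclasses import dataclass

BETA = 0.2          # learning rate for the credit update (assumed value)
ACTIONS = (0, 1)    # hypothetical binary action space


@dataclass
class Rule:
    condition: str          # string over {'0', '1', '#'}; '#' matches either bit
    action: int
    prediction: float = 10.0

    def matches(self, state: str) -> bool:
        return all(c == "#" or c == s for c, s in zip(self.condition, state))


def credit(action_set, reward):
    """Apportionment of credit: Widrow-Hoff update of each rule's predicted reward."""
    for rule in action_set:
        rule.prediction += BETA * (reward - rule.prediction)


def discover(parents, rng, mutation_rate=0.1):
    """Discovery: one-point crossover of two parents' conditions, then mutation."""
    a, b = parents
    point = rng.randrange(1, len(a.condition))
    child = list(a.condition[:point] + b.condition[point:])
    for i, c in enumerate(child):
        if rng.random() < mutation_rate:
            child[i] = rng.choice([s for s in "01#" if s != c])
    return Rule("".join(child), a.action, (a.prediction + b.prediction) / 2)


def step(population, state, rng):
    """One learning iteration; the population is never pruned in this sketch."""
    match_set = [r for r in population if r.matches(state)]
    for a in ACTIONS:  # covering: make sure every action is proposed by some rule
        if not any(r.action == a for r in match_set):
            cond = "".join(s if rng.random() < 0.67 else "#" for s in state)
            rule = Rule(cond, a)
            population.append(rule)
            match_set.append(rule)
    action = max(match_set, key=lambda r: r.prediction).action   # pure exploitation
    action_set = [r for r in match_set if r.action == action]
    reward = 1000.0 if action == int(state[0]) else 0.0           # toy environment
    credit(action_set, reward)
    if len(action_set) >= 2:  # evolve a new rule from the two best in the action set
        best = sorted(action_set, key=lambda r: r.prediction, reverse=True)[:2]
        population.append(discover(best, rng))
    return reward


rng = random.Random(0)
population = []
states = ["".join(rng.choice("01") for _ in range(3)) for _ in range(300)]
rewards = [step(population, s, rng) for s in states]
```

A full LCS would also track rule accuracy, explore as well as exploit, and cap the population size; those are left out to keep the credit and discovery steps visible.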
Cooperative Clustering Model and Its Applications
... Data clustering plays an important role in many disciplines, including data mining, machine learning, bioinformatics, and pattern recognition, where there is a need to learn the inherent grouping structure of data in an unsupervised manner. Many clustering approaches have been proposed ...
Mining Frequent Patterns with Counting Inference
... We present the PASCAL algorithm, introducing a novel, effective, and simple optimization of the Apriori algorithm. This optimization is based on pattern counting inference, which relies on the new concept of key patterns. A key pattern is a minimal pattern of an equivalence class gathering all patt ...
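Counting inference can be sketched on top of a plain level-wise Apriori loop. A pattern is treated as a key when its support differs from the minimum support of its immediate subsets; any candidate with a non-key subset is itself non-key, so its support is inferred as that minimum instead of being counted against the data. This is a minimal sketch of the idea as stated above, not the authors' implementation; the function name and the `db_counts` tally are illustrative.

```python
def pascal_sketch(transactions, minsup):
    """Level-wise frequent-itemset mining with counting inference (sketch)."""
    tx = [frozenset(t) for t in transactions]
    db_counts = 0                                  # supports counted against the data

    def count(itemset):
        return sum(1 for t in tx if itemset <= t)

    support, is_key, level = {}, {}, []
    for item in sorted({i for t in tx for i in t}):
        s = frozenset([item])
        sup = count(s)
        db_counts += 1
        if sup >= minsup:
            support[s] = sup
            is_key[s] = sup != len(tx)            # the empty set has support |tx|
            level.append(s)

    k = 2
    while level:
        prev = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        next_level = []
        for c in sorted(candidates, key=sorted):
            subsets = [c - {x} for x in c]
            if not all(s in prev for s in subsets):
                continue                           # Apriori pruning
            min_sub = min(support[s] for s in subsets)
            if all(is_key[s] for s in subsets):
                sup = count(c)                     # must scan the data
                db_counts += 1
                key = sup != min_sub
            else:
                sup, key = min_sub, False          # inferred, no data scan
            if sup >= minsup:
                support[c] = sup
                is_key[c] = key
                next_level.append(c)
        level = next_level
        k += 1
    return support, db_counts


transactions = ["abc", "abc", "ab", "bc", "b", "c"]
support, db_counts = pascal_sketch(transactions, minsup=1)
print(len(support), db_counts)  # 7 6
```

Here "a" always occurs with "b", so {a, b} is not a key and the support of {a, b, c} is inferred rather than counted.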
Generalizing Self-Organizing Map for Categorical Data
... attributes with the domain of ... This straightforward approach has several drawbacks, including increased dimensionality of the transformed relation, difficulty in maintaining the transformed relation schema, and inability to convey the semantics of the original attribute. Most importantly, this approac ...
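Assuming the "straightforward approach" is the usual binary (one-hot) transformation, in which each categorical attribute is replaced by one 0/1 attribute per domain value, the dimensionality drawback is easy to see. The sketch below uses a hypothetical three-attribute relation; three categorical columns become ten binary ones.

```python
def one_hot_encode(rows, attributes):
    """Replace each categorical attribute by one binary attribute per domain value."""
    domains = {a: sorted({row[a] for row in rows}) for a in attributes}
    columns = [f"{a}={v}" for a in attributes for v in domains[a]]
    encoded = [[1 if row[a] == v else 0 for a in attributes for v in domains[a]]
               for row in rows]
    return columns, encoded


rows = [
    {"color": "red", "size": "S", "shape": "round"},
    {"color": "green", "size": "M", "shape": "square"},
    {"color": "blue", "size": "L", "shape": "round"},
    {"color": "red", "size": "XL", "shape": "oval"},
]
cols, enc = one_hot_encode(rows, ["color", "size", "shape"])
print(len(cols))  # 10 binary columns replace 3 categorical ones
```

The encoding also loses meaning: "S" is exactly as far from "M" as from "XL", because every pair of distinct values differs in the same two binary columns.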
A Dense-Region Based Approach to On
... that techniques such as image analysis, decision tree classification, and clustering can be used for this purpose [9, 15]. However, our investigation finds that none of these can deliver a suitable solution. The techniques of grid generation in image analysis are similar to finding dense regio ...
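The grid-generation idea mentioned above can be sketched as: bucket points into fixed-size cells, keep the cells whose count reaches a threshold, and merge adjacent dense cells into regions. This is a generic grid-based illustration with an assumed cell size, density threshold, and 4-connectivity; it is not the dense-region method the paper proposes.

```python
from collections import Counter


def dense_cells(points, cell_size, min_points):
    """Return grid cells (ix, iy) holding at least min_points points."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return {cell: n for cell, n in counts.items() if n >= min_points}


def dense_regions(cells):
    """Merge 4-connected dense cells into regions with a flood fill."""
    remaining, regions = set(cells), []
    while remaining:
        stack = [remaining.pop()]
        region = set(stack)
        while stack:
            cx, cy = stack.pop()
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    region.add(nb)
                    stack.append(nb)
        regions.append(region)
    return regions


points = [(0.2, 0.3), (0.5, 0.5), (0.8, 0.1), (1.2, 0.4), (1.5, 0.6),
          (1.9, 0.9), (5.1, 5.2), (5.5, 5.5), (5.9, 5.8), (9.5, 0.5)]
cells = dense_cells(points, cell_size=1.0, min_points=3)
print(sorted(len(r) for r in dense_regions(cells)))  # [1, 2]
```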
Ensemble of Feature Selection Techniques for High
... multiple feature selection techniques are combined to yield more robust and stable results. An ensemble of multiple feature-ranking techniques is built in two steps. The first step involves creating a set of different feature selectors, each providing its own ordering of the features, while the second ...
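The description of the second step is cut off, so the aggregation rule here is an assumption: mean rank, one common way to merge several rankings. The sketch takes the orderings produced in the first step (hypothetical outputs of three selectors) and sorts features by their average position, breaking ties by name.

```python
def aggregate_rankings(rankings):
    """Combine several feature rankings by mean rank (lower is better).

    Assumes every ranking orders the same set of features.
    """
    features = rankings[0]
    positions = [{f: i for i, f in enumerate(r)} for r in rankings]
    mean_rank = {f: sum(p[f] for p in positions) / len(positions) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))


rankings = [
    ["f1", "f2", "f3", "f4"],   # hypothetical selector A
    ["f2", "f1", "f4", "f3"],   # hypothetical selector B
    ["f1", "f3", "f2", "f4"],   # hypothetical selector C
]
print(aggregate_rankings(rankings))  # ['f1', 'f2', 'f3', 'f4']
```

Mean rank is simple and order-only; score-based alternatives (for example averaging normalized scores) would need each selector's raw scores, which the two-step description does not mention.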
Comparison of Chi-Square Based Algorithms for Discretization of
... concluded that the most common techniques were Equal-width Discretization (EWD), Equal-frequency Discretization (EFD), MDLP, ID3, ChiMerge, 1R, D2, and Chi2. Among these, EWD and EFD are common unsupervised discretization methods due to their simplicity and availability in many data mining app ...
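EWD and EFD are simple enough to state in a few lines, and putting them side by side shows why they behave differently. A minimal sketch: EWD cuts the value range into k intervals of equal width, while EFD places roughly the same number of values in each bin. In the example a single outlier pushes almost every value into the first equal-width bin, while equal-frequency bins stay balanced.

```python
def equal_width_bins(values, k):
    """EWD: split [min, max] into k equal-width intervals; return a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    return [min(int((v - lo) / width), k - 1) for v in values]  # max goes in last bin


def equal_frequency_bins(values, k):
    """EFD: sort the values and give each bin about len(values) / k of them.

    Tied values may be split across bins in this simple version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(equal_width_bins(values, 3))      # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print(equal_frequency_bins(values, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```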
Proceedings as a pdf file - Helsinki Institute for Information
... “normal” routes from peculiar ones: the former will be grouped in clusters and the latter will be marked as noise. In our library of distance functions, we have a function “route similarity” [2][11], which measures the correspondence between the geometric shapes of two trajectories and the closeness ...
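Grouping normal routes into clusters while marking peculiar ones as noise is what density-based clustering does, so the sketch below pairs a DBSCAN-style procedure with a trajectory distance. The distance is a stand-in (a symmetric mean closest-point distance), not the "route similarity" function of [2][11]; `eps`, `min_pts`, and the example routes are assumptions.

```python
import math


def route_distance(a, b):
    """Stand-in route dissimilarity: symmetric mean closest-point distance."""
    def one_way(p, q):
        return sum(min(math.dist(x, y) for y in q) for x in p) / len(p)
    return (one_way(a, b) + one_way(b, a)) / 2


def dbscan(items, dist, eps, min_pts):
    """Density-based clustering: one label per item, -1 for noise."""
    labels = [None] * len(items)
    cluster = -1

    def neighbours(i):
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    for i in range(len(items)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                  # noise, unless a cluster reaches it later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # border point previously marked as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:      # core point: keep expanding
                queue.extend(j_nbrs)
    return labels


routes = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 0.1), (1, 0.1), (2, 0.1)],
    [(0, -0.1), (1, -0.1), (2, -0.1)],
    [(0, 10), (1, 10), (2, 10)],           # a peculiar route
]
print(dbscan(routes, route_distance, eps=0.5, min_pts=3))  # [0, 0, 0, -1]
```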
OPTIMIZATION-BASED MACHINE LEARNING AND DATA MINING
... An example showing that the set Γ1 discretized in (3.15) need not contain the region {x | g(x)₊ = 0} in which the left-hand side of the implication (3.12) is satisfied. Each of the figures (a), (b), and (c) depicts 600 points, denoted by “+” and “o”, that are obtained from three bivariate normal distribut ...