
CV - Peter Laurinec
... to big data. I analyze methods that effectively handle large volumes of data and data streams. I see the application in the domain of energy and smart grids. The area is interesting to examine from the perspective of sustainable sources of energy, economy and environment. ...
Data Mining
... • Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix. • Approach: – collect different attributes on customers based on geographical and lifestyle-related information – identify clust ...
Region Discovery Technology - Department of Computer Science
... building search engines that can navigate through millions of documents and return a ranked set of documents based on user interests and user feedback. Earth scientists are interested in having similar capabilities to find interesting regions on planet Earth based on knowledge that is stored in mul ...
Comparative Analysis of K-Means and Kohonen
... information technology and computer science, high-capacity data appear in our lives. To help people analyze and dig out useful information, the development and application of data mining technology have become highly significant. Clustering and decision trees are the most used methods of data ...
Data Mining Assignment
... 1b.) Data smoothing is processing data to reduce the number of values by removing noise in the data. It is external smoothing if done before classification and internal smoothing if done during the classification procedure. Good 1c). Decimal scaling accomplishes the normalization of values by moving ...
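The decimal scaling mentioned in answer 1c can be sketched as follows. This is a minimal illustration that assumes the usual textbook definition (divide every value by 10^j, where j is the smallest non-negative integer that brings the largest absolute value below 1); the snippet above is truncated, so the function name `decimal_scale` and this exact rule are assumptions rather than the assignment's own wording.

```python
def decimal_scale(values):
    """Normalize values by moving the decimal point.

    Assumed rule: divide by 10**j, where j is the smallest non-negative
    integer such that max(|v|) / 10**j < 1.
    """
    max_abs = max(abs(v) for v in values)
    j = 0
    # Increase j until the largest absolute value drops below 1.
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


scaled, j = decimal_scale([-986, 917])
print(scaled, j)
```

For values ranging from -986 to 917, the largest absolute value is 986, so three decimal places are moved and every value is divided by 1000.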
Scalable Cluster Analysis of Spatial Events
... is characterized by the number of events in it, its duration, and start and end time. The durations of the clusters range from 34 seconds to 242 minutes, 43% of them have duration up to 10 minutes while very long clusters are rare. We interactively filter out clusters with durations below 10 minutes ...
Questions October 4
... In the news clustering problem we computed the distance between two news entities based on their (key-) wordlists A and B as follows: distance(A,B) = 1 - (|A∩B| / |A∪B|), with ‘||’ denoting set cardinality; e.g. |{a,b}| = 2. Why do we divide by |A∪B| in the formula? 2. What is the main difference between or ...
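The wordlist distance in the question above can be sketched in a few lines. The sketch assumes the formula is the Jaccard distance 1 - |A ∩ B| / |A ∪ B| (the set operators are an interpretation of the snippet); the function name `jaccard_distance`, the sample wordlists, and the convention that two empty lists have distance 0 are illustrative assumptions.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two keyword lists."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty wordlists are treated as identical.
        return 0.0
    return 1 - len(a & b) / len(union)


story_1 = ["storm", "flood", "rain"]
story_2 = ["flood", "rain", "sun"]
print(jaccard_distance(story_1, story_2))
```

Dividing by the size of the union normalizes the overlap, so the distance always lies between 0 (identical wordlists) and 1 (no shared words) regardless of how long the lists are.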
improved mountain clustering algorithm for gene expression data
... service, GOstat [22]. It accepts as input the group IDs of the clustered genes to be annotated and of all genes in the microarray data. The enrichment p-value is calculated using the hypergeometric distribution [23]. K-means Clustering K-means [5, 6] is one of the most widely used clustering ...
4 - Read
... The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different starting locations lead to different results. So, the better choice is to place them as far away from each other as possible. The next step is to take each point belonging to ...
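The procedure described in this snippet can be sketched as below. The snippet is truncated, so this is an assumption-laden illustration rather than the source's own algorithm: it uses the standard assign-and-update iteration of k-means, and it interprets "as far away from each other as possible" as a farthest-first initialization. The names `farthest_first` and `kmeans` and the sample points are hypothetical.

```python
import math
import random


def farthest_first(points, k, seed=0):
    """Pick k starting centroids that are spread far apart (assumed heuristic)."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Choose the point whose nearest already-chosen centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=100, seed=0):
    """Standard k-means loop: assign points to the nearest centroid, then update."""
    centroids = farthest_first(points, k, seed)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

On these two well-separated groups the loop converges after one update. On less clean data the result still depends on the starting centroids, which is the point the snippet makes.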
beyond the curse of multidimensionality: high dimensional clustering
... methods enable the structural information represented in relatively few dimensions to be preserved, removing noise. Multidimensional data analysis techniques are often used for this task. Correspondence analysis (Lebart et al., 1998) is one of the most common methods for analysing lexical tables, an ...
Different Perspectives at Clustering: The Number-of
... Clustering with Ward Criterion; Extensions of Ward Clustering DATA RECOVERY MODELS: Statistics Modelling as Data Recovery; Data Recovery Model for K-Means; for Ward; Extensions to Other Data Types; One-by-One Clustering DIFFERENT CLUSTERING APPROACHES: Extensions of K-Means; Graph-Theoretic Approach ...
My presentation - User Web Pages
... [4] Kargupta, H., Bhargava, R., Liu, K., Powers, M., Blair, P., Bushra, S., Dull, J., Sarkar, K., Klein, M., Vasa, M., Handy, D.: VEDAS: A Mobile and Distributed Data Stream Mining System for Real-Time Vehicle Monitoring. Accepted for publication in the Proceedings of the SIAM International Data Min ...
Review of Kohonen-SOM and K-Means data mining Clustering
... of the Two-Step clustering where combine K-Means and HAC. [1] The Kohonen algorithm is a very powerful tool for data ... A self-organizing map (SOM) or self-organizing feature map (SOFM) is a kind of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typi ...
Non-parametric Mixture Models for Clustering
... to squared error based clustering algorithms such as K-means, which is one of the most popular clustering algorithms due to its ease of implementation and reasonable empirical performance [1]. The limitations of parametric mixture models can be overcome by the use of algorithms that exploit non-para ...
Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.
Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology (from Greek βότρυς "grape") and typological analysis. The subtle differences are often in the usage of the results: while in data mining the resulting groups are the matter of interest, in automatic classification the resulting discriminative power is of interest. This often leads to misunderstandings between researchers coming from the fields of data mining and machine learning, since they use the same terms and often the same algorithms, but have different goals.
Cluster analysis originated in anthropology with Driver and Kroeber in 1932, was introduced to psychology by Zubin in 1938 and Robert Tryon in 1939, and was famously used by Cattell beginning in 1943 for trait theory classification in personality psychology.
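The dependence of the result on parameter settings can be illustrated with a small sketch. It assumes the "small distances among the cluster members" notion of a cluster, applied to one-dimensional data: a new cluster starts whenever the gap to the previous sorted value exceeds a threshold. The function name `threshold_clusters`, the parameter `max_gap`, and the sample data are hypothetical and chosen only for illustration.

```python
def threshold_clusters(values, max_gap):
    """Group 1-D values so that neighbouring members are at most max_gap apart."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)   # close enough to the previous value
        else:
            clusters.append([v])     # gap too large: start a new cluster
    return clusters


data = [1.0, 1.2, 1.9, 5.0, 5.3, 9.0]
for max_gap in (0.5, 1.0, 4.0):
    print(max_gap, threshold_clusters(data, max_gap))
```

The same data yields four, three, or one cluster depending on the threshold. None of these is the correct answer in itself, which is why the choice has to be guided by the data set and the intended use of the results.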