
Outlier Detection using Semi-supervised and Unsupervised Learning on High Dimensional Data
... The ODIN method used outlier scoring based on Nk counts. Without using threshold parameters ...
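Scoring by Nk counts can be sketched as follows. This is a minimal illustration, not the ODIN authors' code. It assumes that N_k(x) is the number of points that have x among their k nearest neighbours (the in-degree of x in the kNN graph), that distances are Euclidean, and that a low count marks a likely outlier. Points are then ranked by their counts rather than cut off at a threshold. The helper names `knn_indices`, `nk_counts` and `odin_style_ranking` are hypothetical, and the neighbour search is brute force (O(n²)).

```python
import math

def knn_indices(points, k):
    """For each point, indices of its k nearest neighbours (Euclidean, excluding itself)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def nk_counts(points, k):
    """N_k(x): how many points list x among their k nearest neighbours (kNN in-degree)."""
    counts = [0] * len(points)
    for nbrs in knn_indices(points, k):
        for j in nbrs:
            counts[j] += 1
    return counts

def odin_style_ranking(points, k):
    """Indices ordered from most to least outlying (lowest N_k count first), no threshold."""
    counts = nk_counts(points, k)
    return sorted(range(len(points)), key=lambda i: counts[i])

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(nk_counts(pts, 2))           # → [2, 3, 2, 3, 0]
print(odin_style_ranking(pts, 2)[0])  # → 4
```

The isolated point (10, 10) is never among another point's two nearest neighbours, so its count is 0 and it heads the ranking.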
Trajectory Boundary Modeling of Time Series for Anomaly Detection
... we mean that each test point receives an anomaly score, with an upper bound on computation time. We accept that there is no "best" anomaly detection algorithm for all data, and that many algorithms have ad hoc parameters that are tuned to specific data sets. Therefore, our subgoal is to provide tool ...
Assisting Higher Education in Assessing, Predicting, and Managing
... Historically, the most prominent category in educational data mining has been relational mining. In this research, we focus on knowledge discovery in databases (KDD) and predictive modeling applications in education. This area is known as educational data mining (EDM). Educational data mining is ...
FROM DATA MINING TO KNOWLEDGE MINING: SYMBOLIC DATA
... Example 1: in a decision tree. With standard encoding of symbolic data, the variable « size » disappears, as it is replaced by the variables « Size Min » and « Size max ». The symbolic approach allows the use of the variable « size » itself rather than the variables « Size Min » and « Size max » ...
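The contrast in Example 1 can be made concrete with a small sketch. It assumes an interval-valued variable is stored as a hypothetical `Interval(low, high)` object, and that "standard encoding" means flattening each interval into two classical columns. The column names `"<name> Min"` / `"<name> Max"` are illustrative only. After encoding, the original variable no longer exists as a single column, which is exactly the loss the example describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A hypothetical interval-valued (symbolic) variable, e.g. a size observed as a range."""
    low: float
    high: float

def standard_encoding(record: dict) -> dict:
    """Flatten every Interval into two classical columns '<name> Min' and '<name> Max'."""
    flat = {}
    for name, value in record.items():
        if isinstance(value, Interval):
            flat[f"{name} Min"] = value.low
            flat[f"{name} Max"] = value.high
        else:
            flat[name] = value
    return flat

symbolic_row = {"Size": Interval(150.0, 180.0), "Class": "A"}  # symbolic view keeps "Size"
row = standard_encoding(symbolic_row)                            # classical view loses it
print(row)
```

A symbolic learner could split directly on `symbolic_row["Size"]`, whereas a classical decision tree only sees the two unrelated numeric columns.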
ARAA: A Fast Advanced Reverse Apriori Algorithm for Mining
... together during a server session [10]. Association rules show the potential relationships between pages that are often visited together even though they are not directly associated. They can depict the associations between groups of users with specific interests. Association rule mining is used for busin ...
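The two standard measures behind such page-association rules, support and confidence, can be sketched over a list of sessions. This is a brute-force illustration, assuming each server session is a set of page paths (the paths below are made up). It enumerates every candidate pair without Apriori-style pruning and is not the ARAA (reverse Apriori) algorithm itself.

```python
from itertools import combinations

def support(itemset, sessions):
    """Fraction of sessions that contain every page in itemset."""
    itemset = set(itemset)
    return sum(1 for s in sessions if itemset <= s) / len(sessions)

def confidence(antecedent, consequent, sessions):
    """conf(A -> C) = support(A ∪ C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, sessions) / support(antecedent, sessions)

def frequent_pairs(sessions, min_support):
    """All page pairs whose support reaches min_support (brute force, no pruning)."""
    pages = sorted(set().union(*sessions))
    return {pair for pair in combinations(pages, 2) if support(pair, sessions) >= min_support}

sessions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
]
print(frequent_pairs(sessions, 0.5))
print(confidence({"/products"}, {"/cart"}, sessions))
```

Here `/products` and `/cart` co-occur in half of the sessions, and two thirds of the sessions that visit `/products` also visit `/cart`.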
data stream mining algorithms – a review of issues and existing
... mining result leads to a dilemma. The smaller the value of ε, the more accurate is the approximation but the greater is the number of sub-FIs generated, which requires both more memory space and more CPU processing power. However, if ε approaches σ, more false-positive answers will be included in th ...
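The ε/σ trade-off described above can be seen in Lossy Counting (Manku and Motwani), a classic approximate frequency-counting scheme for streams. The review may be discussing a different algorithm. The sketch below handles single items rather than itemsets. A smaller ε gives wider buckets and a lower pruning bound, so more low-count entries survive and memory use grows. Reporting uses the threshold (σ − ε)N, so as ε approaches σ that threshold approaches 0 and more false positives slip through. The class name `LossyCounter` is hypothetical.

```python
import math

class LossyCounter:
    """Lossy Counting for single items in a stream.

    With output threshold (sigma - eps) * N, every item whose true frequency is
    at least sigma * N is reported, and each stored count f underestimates the
    true count by at most eps * N.
    """

    def __init__(self, eps: float):
        self.eps = eps
        self.width = math.ceil(1 / eps)  # bucket width w
        self.n = 0                       # number of stream items seen (N)
        self.entries = {}                # item -> [f, delta]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # delta bounds how many occurrences may have been missed before insertion
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:  # bucket boundary: drop entries with f + delta <= bucket
            self.entries = {k: v for k, v in self.entries.items() if v[0] + v[1] > bucket}

    def frequent(self, sigma: float):
        """Items whose stored count f is at least (sigma - eps) * N."""
        threshold = (sigma - self.eps) * self.n
        return {k for k, (f, _) in self.entries.items() if f >= threshold}

lc = LossyCounter(eps=0.2)
for x in ["a", "b", "a", "c", "a", "d", "a", "e", "a", "f"]:
    lc.add(x)
print(lc.frequent(sigma=0.3))  # → {'a'}
```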
Spatial Data Mining
... the need for the automated discovery of spatial knowledge. Spatial data mining is the process of discovering interesting, previously unknown, but potentially useful patterns from spatial databases. The complexity of spatial data and intrinsic spatial relationships limit the usefulness of convent ...
Instant Selection of High Contrast Projections in Multi
... as PCA detect one global projection for the entire database [13]. However, a major limitation of PCA and other dimensionality reduction techniques is that they all provide only a single projection. Thus, they miss outliers hidden in different projections. Subspace outlier mining has tackled this challen ...
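A toy illustration of why a single global projection can hide an outlier is sketched below, using NumPy. The data is synthetic: points on the line y = x plus one point off the line. PCA's top component follows the line, and projecting onto it places the off-line point right at the centre of the data. The point only stands out in the discarded second direction. This is not the projection-selection method of the paper, only a demonstration of the limitation it addresses.

```python
import numpy as np

def pca_axes(X):
    """Return the data mean and principal axes (as columns), by decreasing variance."""
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))  # eigenvalues ascending
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]

# Synthetic data: 21 points on the line y = x, plus one point off the line.
t = np.linspace(-1.0, 1.0, 21)
X = np.vstack([np.column_stack([t, t]), [[0.5, -0.5]]])
outlier = len(X) - 1

mean, axes = pca_axes(X)
scores = (X - mean) @ axes      # coordinates in the PCA basis
on_pc1 = np.abs(scores[:, 0])   # the single "global" projection PCA would keep
on_pc2 = np.abs(scores[:, 1])   # a projection that the top component discards

# On PC1 the outlier sits at the centre of the data; on PC2 it is the most extreme point.
print(on_pc1[outlier], on_pc2.argmax() == outlier)
```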
Algorithmic Approach to Data Mining and Classification Techniques
... [2] discussed how data mining and knowledge discovery are related to each other and to other fields such as machine learning and statistics. A method for discovering knowledge from a database through data mining is given, along with the data mining steps within the KDD process. An experiment was performed on a ...
Social Big Data: Recent achievements and new challenges
... social big data application: knowledge extraction and exploitation. The rest of the paper is structured as follows: Section 2 introduces the basics of the methodologies, frameworks, and software used to work with big data. Section 3 provides a description of the current state of the ...
... is additionally expected to profit from ongoing trends in hardware development. For example, [13] suggests that the transfer rates of disk drives continue to improve much faster than the rotational delay time. As a consequence, the optimum page size with respect to I/O will increase even further. As we will show in ...
Data mining in soft computing framework: a survey
... support or exploration, and understanding the phenomenon governing the data source. In most domains, data analysis was traditionally a manual process. One or more analysts would become intimately familiar with the data and, with the help of statistical techniques, provide summaries and generate repo ...
... This work has been partially supported by the Spanish Ministry of Science and Innovation (DAMASK project, Data mining algorithms with semantic knowledge, TIN2009-11005) and the Spanish Government (PlanE, Spanish Economy and Employment Stimulation Plan), the Universitat Rovira i Virgili (2009AIRE-04) ...
Spatio-Temporal Outlier Detection in Precipitation Data
... The Exact-Grid Top-k algorithm finds the top-k outliers for each time period by keeping track of the highest discrepancy regions as they are found. As it iterates through all the region shapes, it may find a new region that has a discrepancy value higher than the lowest discrepancy value (kth value) ...
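The bookkeeping described for Exact-Grid Top-k (keep the k highest-discrepancy regions and compare each new region against the current k-th value) maps naturally onto a size-k min-heap. This is a sketch of that bookkeeping only, not the paper's implementation. It assumes regions arrive as `(discrepancy, region)` pairs from some enumeration of grid rectangles; the enumeration and the discrepancy function are outside its scope, and the region labels below are hypothetical.

```python
import heapq

def top_k_regions(regions, k):
    """Keep the k highest-discrepancy regions from an iterable of (discrepancy, region).

    A min-heap of size k holds the current top k, so heap[0] is the k-th (lowest
    retained) discrepancy; each new region is compared against it in O(1) and
    swapped in with O(log k) work. The arrival index breaks ties, so region
    objects themselves never need to be comparable.
    """
    heap = []
    for idx, (discrepancy, region) in enumerate(regions):
        entry = (discrepancy, idx, region)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif discrepancy > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the current k-th value
    return [(d, region) for d, _, region in sorted(heap, reverse=True)]

found = [(0.2, "r1"), (0.9, "r2"), (0.5, "r3"), (0.7, "r4"), (0.1, "r5")]
print(top_k_regions(found, 2))  # → [(0.9, 'r2'), (0.7, 'r4')]
```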
Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.
Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals, or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold, or the number of expected clusters) depend on the individual data set and the intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology (from Greek βότρυς "grape"), and typological analysis. The subtle differences often lie in how the results are used: in data mining, the resulting groups are the matter of interest, whereas in automatic classification the resulting discriminative power is of interest. This often leads to misunderstandings between researchers coming from the fields of data mining and machine learning, since they use the same terms and often the same algorithms but have different goals.
Cluster analysis originated in anthropology with Driver and Kroeber in 1932, was introduced to psychology by Zubin in 1938 and Robert Tryon in 1939, and was famously used by Cattell beginning in 1943 for trait theory classification in personality psychology.
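The "small distances among the cluster members" notion of a cluster can be illustrated with k-means (Lloyd's algorithm). It is only one of the many clustering algorithms mentioned above and is not singled out by the text. The sketch assumes Euclidean distance and a number of clusters k fixed in advance, which is exactly the kind of parameter the text says must be chosen per data set. The function name and the toy points are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    labels = []
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: mean of each cluster (keep the old centroid if a cluster empties).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return labels, centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, k=2)
print(labels, centroids)
```

Different seeds can lead to different local optima on less well-separated data, which is one reason the text describes clustering as an iterative, trial-and-failure process.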