Applications of Data Mining Techniques to Electric Load Profiling

... Data Mining (abbreviated DM) is currently a fashionable term, and seems to be gaining slight favour over its near synonym Knowledge Discovery in Databases (KDD). Since there is no unique definition, it is not possible to set rigid boundaries upon what is and is not a data mining technique; the defin ...

See new possibilities with predictive analytics

... predictive analytics software is driving high ROI with leading organizations worldwide, SPSS continues to receive recognition from top industry experts.” Last year, SPSS was selected by CRMGuru.com as the most customer-centric solution provider in marketing automation. CRMGuru.com is the world’s lar ...

Tensor Decompositions and Applications

... See [16] for further discussion of concepts in this subsection. 2.6. Matrix Kronecker, Khatri–Rao, and Hadamard Products. Several matrix products are important in the sections that follow, so we briefly define them here. The Kronecker product of matrices A ∈ ℝ^{I×J} and B ∈ ℝ^{K×L} is denoted by A ⊗ B. The ...
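The snippet above is cut off before the definitions are complete, so the following is only a minimal sketch using the standard textbook definitions of the three products, not the paper's own notation or code. The function names `kronecker`, `khatri_rao`, and `hadamard` are illustrative choices, and matrices are assumed to be plain lists of rows.

```python
def kronecker(a, b):
    """Kronecker product of an I x J matrix a and a K x L matrix b.

    The result has I*K rows and J*L columns; block (i, j) equals a[i][j] * b.
    """
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def khatri_rao(a, b):
    """Column-wise Kronecker product of an I x K matrix a and a J x K matrix b.

    Column k of the (I*J) x K result is the Kronecker product of
    column k of a with column k of b.
    """
    if len(a[0]) != len(b[0]):
        raise ValueError("both matrices need the same number of columns")
    n_cols = len(a[0])
    return [
        [row_a[k] * row_b[k] for k in range(n_cols)]
        for row_a in a
        for row_b in b
    ]


def hadamard(a, b):
    """Element-wise product of two matrices of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("both matrices need the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
K = kronecker(A, B)  # a 4 x 4 matrix built from the blocks A[i][j] * B
```

For real workloads an array library would normally be used instead of nested lists; the list version is kept here only to make the index pattern of each product visible.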

Multiple additive regression trees: a methodology for

... Total error rate for CNI classification of OAK-Detailed data with MART (M=14, F=3). Misclassification error rate for CNI classes in classification of OAK-Detailed data with MART (M=14, F=3). ...

Selecting the right correlation measure for binary data.

... sets of arbitrary size, most of the published work with regard to correlation is related to finding correlated pairs [Tan et al. 2004; Geng and Hamilton 2006]. Related work with association rules [Brin et al. 1997a, 1997b; Omiecinski 2003] is a special case of correlation pairs since each rule has a ...

Exploiting A Support-based Upper Bound of Pearson's Correlation

... the upper bound of the φ correlation coefficient as a coarse filter. In other words, if the upper bound of the φ correlation coefficient for an item pair is less than the user-specified correlation threshold, we can prune this item pair right away. The second pruning technique prunes item pairs based ...
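The coarse filter described in this snippet can be sketched as follows. This is a minimal illustration, assuming the standard support-based form of the bound: for items A and B with supp(A) ≥ supp(B), the joint support cannot exceed supp(B), which gives φ ≤ sqrt(supp(B)/supp(A)) · sqrt((1 − supp(A))/(1 − supp(B))). The function names and the `supports` dictionary are hypothetical, and all supports are assumed to lie strictly between 0 and 1.

```python
import math
from itertools import combinations


def phi(supp_a, supp_b, supp_ab):
    """Phi correlation coefficient of two binary items from their supports."""
    denom = math.sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
    return (supp_ab - supp_a * supp_b) / denom


def phi_upper_bound(supp_a, supp_b):
    """Upper bound of phi that needs only the two single-item supports."""
    hi, lo = max(supp_a, supp_b), min(supp_a, supp_b)
    return math.sqrt(lo / hi) * math.sqrt((1 - hi) / (1 - lo))


def candidate_pairs(supports, threshold):
    """Keep only item pairs whose upper bound reaches the threshold.

    Pairs that fail this test are pruned without ever counting their
    joint support, which is the expensive step.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(supports), 2)
        if phi_upper_bound(supports[a], supports[b]) >= threshold
    ]


supports = {"a": 0.5, "b": 0.1, "c": 0.45}  # hypothetical item supports
survivors = candidate_pairs(supports, threshold=0.5)
```

Only the surviving pairs need their exact φ value computed with `phi`, which is what makes the bound useful as a first-pass filter.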

Collinearity: a review of methods to deal with it and a simulation

... (“independent” or “explanatory”) variables. Examples include regression models of all types, classification and regression trees as well as neural networks. Ecologists might be interested in understanding the factors affecting some observed response, or they might want to fit a ...

Mining Hierarchies of Correlation Clusters

... Since the local covariance matrix Σ_P of a point P is a square matrix it can be decomposed into the Eigenvalue matrix E_P of P and the Eigenvector matrix V_P of P such that Σ_P = V_P · E_P · V_P^T. The Eigenvalue matrix E_P is a diagonal matrix holding the Eigenvalues of Σ_P in decreasing order in its diagon ...

Truth and robustness in cross-country growth regressions

... Advocates of extreme-bounds analysis sometimes argue that the concern with truth is misplaced because to reject a variable as fragile is not to deny that it might be a true determinant of the dependent variable. Rather it is to deny that we have good evidence for its truth. That is, the claim is ab ...

Relational Data Mining Through Extraction of Representative

... Let us examine the stability of the standard when outliers are feared. We simulate outliers that we append to an initial dataset. We consider that the standard extraction is robust against outliers when the extracted standard remains one of three most frequent standards of the initial dataset. In th ...

Data mining reconsidered: encompassing and the general

... features of the economic world.4 Any more parsimonious model is an improvement on such a complicated model if it conveys all of the same information in a simpler, more compact form. Such a parsimonious model would necessarily be superior to all other models that are restrictions of the completely ge ...

Lecture 2 Use SAS Enterprise Miner

... Cluster analysis (also known as data segmentation in the data mining community) has a variety of goals. The goals are invariably related to grouping or segmenting a collection of objects into disjoint subsets or “clusters” such that those objects within each cluster are “similar” to each other while ...

Two heads better than one: Pattern Discovery in Time

... multi-aspect streams? Some recent developments are along these lines such as Dynamic Tensor Analysis [11] and Window-based Tensor Analysis [10], which incrementalize the standard offline tensor decompositions such as Tensor PCA (Tucker 2) and Tucker. However, the existing work often adopts the same ...

paper - USC CSSE's homepage - University of Southern California

... system will analyze the subsets of data as requested by the user to calculate the required statistics.” Capability requirements, on the other hand, are more precise statements about the functional requirements of a project. A typical capability requirement specifies the pre-conditions, post-conditio ...

Variable Selection and Outlier Detection for Automated K

... always outliers since they can be considered as another cluster if there are some cases over some threshold compared to the total number of cases. Our focus is to find local outliers. To detect local outliers, we adopt hybrid approaches that combine clustering-based approaches and distance-based app ...

Streaming Pattern Discovery in Multiple Time

... principal components y_{τ,1} and y_{τ,2} are the coordinates of these projections in the orthogonal coordinate system defined by w_1 and w_2. However, batch methods for estimating the principal components require time that depends on the duration t, which grows to infinity. In fact, the principal direction ...

Mining TOP-K Strongly Correlated Pairs in Large Databases

... cube computations. He showed that finding the subcubes that satisfy statistical tests such as χ² is inherently NP-hard, but can be made more tractable using approximation schemes. Jermaine [9] also presented an iterative procedure for high-dimensional correlation analysis by shaving off part of the ...

Tree-based Models: Identification of Influential Factors under Condition of Instability

... inputs from the original data set to be analyzed. Each tree-based model from the Pareto-optimal set is represented in this data set by one column that reflects the importance measurements of the corresponding input variables. The concept of partial lists (Dwork et al., 2000) provides a practical way ...

Comparison of K-means, Normal Mixtures and Probabilistic-D Clustering for B2B Segmentation using Customers’ Perceptions

... characteristics of their customers in order to satisfy them [8]. Several clustering methods and numerous clustering algorithms have been developed by statisticians, and are available in the literature. These methods and algorithms vary depending on how similarity between observations is defined as w ...

PREDICTION AND CLASSIFICATION IN NONLINEAR DATA

... optimal ordering. Nonmonotonic functions can also be used for continuous (numeric) and ordinal variables when nonlinear relationships among the variables are assumed. In these cases, we can collapse the data in a limited number of categories (sometimes called binning), and find an optimal quantifica ...

Using Probabilistic Latent Semantic Analysis for Web Page Grouping

... recently. Basically, there are two kinds of clustering methods in the context of web usage mining, distinguished by the objects being clustered: user session clustering and web page clustering [16]. One successful application of web page clustering is the adaptive web site. For example, an algori ...

Aalborg Universitet Co-clustering for Weblogs in Semantic Space

... capturing the latent semantic factor hidden in the co-occurrence observations. As such combining the latent semantic analysis with clustering motivates the idea presented in this paper. In this paper, we aim to propose a novel approach addressing the co-clustering of web users and pages by leveragin ...

Data Mining Techniques for Mortality at Advanced Age

... quadratic relationship between the response variable and the independent variables. For example, the tree will produce a step function to explain the relationship between age and mortality rate even if the true relationship between age and mortality rate is linear or quadratic. Thus, some preliminary wo ...

Feature Subset Selection and Feature Ranking for

... In practice, PCA is performed by applying Singular Value Decomposition (SVD) to either a covariance matrix or a correlation matrix of an MTS item depending on the data set. That is, when a covariance matrix A is decomposed by SVD, i.e., A = UΛU^T, a matrix U contains the variables' loadings for the ...

Exploratory factor analysis

In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. A measured variable is any one of several attributes of people that may be observed and measured; an example is the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis, because EFA procedures are more accurate when each factor is represented by multiple measured variables. EFA is based on the common factor model, in which each measured variable is expressed as a function of common factors, unique factors, and errors of measurement. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables. EFA assumes that any indicator (measured variable) may be associated with any factor. When developing a scale, researchers should use EFA before moving on to confirmatory factor analysis (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method.
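The common factor model can be illustrated with a small simulation. This is a sketch under stated assumptions rather than an EFA procedure: it assumes a single common factor, a linear model x_i = λ_i · f + sqrt(1 − λ_i²) · u_i with standard normal f and u_i, and hypothetical loadings λ. Under those assumptions the correlation between two measured variables is approximately the product of their loadings, so the correlation comes entirely from the common factor and none of it from the unique factors.

```python
import math
import random


def simulate_one_factor(n, loadings, seed=0):
    """Draw n observations of len(loadings) measured variables.

    Each variable is its loading times a shared common factor plus an
    independent unique factor, scaled so that every variable has unit variance.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)
        rows.append([
            lam * common + math.sqrt(1.0 - lam ** 2) * rng.gauss(0.0, 1.0)
            for lam in loadings
        ])
    return rows


def correlation(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


loadings = [0.8, 0.6, 0.7]  # hypothetical loadings on the single common factor
data = simulate_one_factor(20000, loadings)
columns = list(zip(*data))
r_01 = correlation(columns[0], columns[1])  # expected to be close to 0.8 * 0.6
```

An actual exploratory analysis runs in the opposite direction: it starts from the observed correlations and estimates the number of factors and the loadings, which is where the analyst's decisions mentioned above come in.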
