Toward Exploratory Test-Instance-Centered Diagnosis in High-Dimensional Classification

Charu C. Aggarwal
TKDE, Vol. 19, No. 8, 2007, pp. 1001-1015.

Presenter: Wei-Shen Tai
Advisor: Professor Chung-Chian Hsu
2007/10/3
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.
國立雲林科技大學 National Yunlin University of Science and Technology

Outline
* Introduction
* Quantification of discriminatory subspaces
* Exploratory construction of decision paths
* Determination of subspace alternatives for path construction
* Construction of visual density profiles
* Isolation of instance-centered local data segments
* Experimental results
* Conclusion and summary
* Comments

Motivation
[Figure: example decision tree with nodes Root, Age, Education, Gender, Family, Salary]
* Decision trees and rule-based systems
    * Strict hierarchical partitioning can split a single cluster across many different nodes.
    * Rule-based systems produce a large number of overlapping rules that are not particularly optimized for the test instance.
* Basic limitation of classification methods
    * The succinct summary built by a classifier may fail to capture instance-specific characteristics.
    * This incomplete characterization of the data can make the particular structure of a classifier more or less suited to particular kinds of test instances.

Objective
* Diagnostic classification
    * Comprehensive exploratory ability for individual test instances.
* Decision path construction
    * Supports exploratory classification of high-dimensional data.
    * Finds the diagnostic classification behavior of a particular test instance.
    * Provides the user with a visual representation of the data in a small number of well-chosen subspaces.

Quantification of discriminatory subspaces
* Kernel density estimation
    * One intuitive way to characterize the discrimination in a subspace is to quantify the difference in class distribution at each point in the space.
* Accuracy density A(x, C_i, D) for the class C_i
* Interest density for the class C_i
    * The class C_i is overrepresented at x when the interest density is larger than 1.

Exploratory construction of decision paths
* Subspace determination process
    * Finds the appropriate local discriminative subspaces for a given test example.
    * In each of these subspaces, the user is provided with a visual profile of the accuracy density.
    * If the chosen decision path is not strongly indicative of any class, the user can backtrack to a higher-level node and explore a different path of the tree.

Determination of subspace alternatives for path construction

Speeding up subspace determination
* Representative points
    * For the first level of the decision process, we randomly sampled maxrep points from the database and computed their (dominant) interest density and class in each of the possible 2D combinations.
* Maximum-size sample
    * For subsequent iterations (lower levels of the decision process), only a random sample of the maximum size (maxsiz) was used to determine the classification subspace.

Construction of visual density profiles
* Visual profile of the accuracy density
    * Once the most discriminatory subspaces have been determined at a given node, the accuracy density profile is constructed in these projections.

Isolation of instance-centered local data segments
* Concept
    * An easy way to isolate smaller segments of the data is for the user to specify an accuracy density threshold.
    * A well-defined local region around the test instance can be clearly distinguished.
    * Such a judgment can be made effectively only by human perception and intuition.

Termination
* User determination
    * The method gives the user an open-ended exploratory ability, so the final decision to terminate rests with the user.
* Cumulative dominance level
    * A statistical measure of the significance of a given path is obtained by computing the cumulative dominance level of each class C_i along PATH.

Experimental results
* Commercial document data set from UCI
    * The class label was binary; the data set had 150,000 records with 46 attributes corresponding to topical content.

Conclusion and summary
* SD-Path method
    * An effective exploratory, instance-based approach to decision path construction for high-dimensional data sets.
    * Combines the data mining process with human interaction to give a good understanding of the classification characteristics of a given test instance.
* The ability to explore multiple paths of an instance-specific process
    * Provides the user with multiple perspectives on the important characteristics of the instance.

Comments
* Advantage
    * The proposed method offers a novel way to find decision paths for instances that general classification methods leave indeterminate.
    * This diagnosis helps the user understand the various combinations of dimensions that reveal this contradicting behavior.
* Drawback
    * Could the reliance on human intuition be removed from the proposed method if the performance of an SD (Subspace Decision) path could be measured by specified equations?
* Application
    * Insightful exploration of indeterminate instances that cannot be classified by general classification methods.
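As a rough illustration of the subspace determination step summarized on the earlier slides, the following Python sketch scores every 2D attribute combination by the dominant interest density at a test instance. The slides do not reproduce the formulas, so several things here are assumptions: the interest density is taken to be the ratio of the class-conditional kernel density to the overall kernel density (which is consistent with "overrepresented when larger than 1"), the accuracy density is taken to be the class-size-weighted share of the density at x, and a Gaussian product kernel with a fixed bandwidth h is used. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def kde(x, points, dims, h=0.5):
    """Gaussian product-kernel density estimate at x, using only the attributes in dims."""
    if not points:
        return 0.0
    norm = (h * math.sqrt(2.0 * math.pi)) ** len(dims)
    total = 0.0
    for p in points:
        sq_dist = sum((x[d] - p[d]) ** 2 for d in dims)
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(points) * norm)


def interest_density(x, data, labels, cls, dims, h=0.5):
    """Assumed form: f(x, D_cls) / f(x, D). Values above 1 mean cls is overrepresented at x."""
    class_points = [p for p, y in zip(data, labels) if y == cls]
    overall = kde(x, data, dims, h)
    return kde(x, class_points, dims, h) / overall if overall > 0 else 0.0


def accuracy_density(x, data, labels, cls, dims, h=0.5):
    """Assumed form: |D_cls| * f(x, D_cls) / sum_j |D_j| * f(x, D_j)."""
    weighted = {}
    for c in set(labels):
        pts = [p for p, y in zip(data, labels) if y == c]
        weighted[c] = len(pts) * kde(x, pts, dims, h)
    denom = sum(weighted.values())
    return weighted[cls] / denom if denom > 0 else 0.0


def rank_subspaces(x, data, labels, h=0.5, top_k=3):
    """Return the top_k 2D subspaces as (dominant interest density, dims, dominant class)."""
    classes = sorted(set(labels))
    scored = []
    for dims in combinations(range(len(x)), 2):
        best_cls, best_val = max(
            ((c, interest_density(x, data, labels, c, dims, h)) for c in classes),
            key=lambda pair: pair[1],
        )
        scored.append((best_val, dims, best_cls))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy data (made up): attributes 0 and 1 separate the classes, attribute 2 does not.
data = [(0.0, 0.0, z) for z in (1.0, 2.0, 3.0, 4.0)] + \
       [(1.0, 1.0, z) for z in (1.0, 2.0, 3.0, 4.0)]
labels = ["a"] * 4 + ["b"] * 4
test_instance = (0.0, 0.0, 2.5)

for value, dims, cls in rank_subspaces(test_instance, data, labels):
    print(dims, cls, round(value, 3))
```

On this toy data the pair of attributes (0, 1) ranks first, because both attributes separate the two classes around the test instance, while any pair that includes attribute 2 is less discriminative. In an interactive setting, the accuracy density in the top-ranked subspaces would be shown to the user as a visual profile instead of being printed, and the sampling limits (maxrep, maxsiz) would bound how many points enter the density estimates.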