Soft Computing, Machine Intelligence and Data Mining
Sankar K. Pal
Machine Intelligence Unit, Indian Statistical Institute, Calcutta
http://www.isical.ac.in/~sankar

MIU Activities (Formed in March 1993)
• Pattern Recognition and Image Processing
• Data Mining: data condensation, feature selection, support vector machines, case generation
• Soft Computing: fuzzy logic, neural networks, genetic algorithms, rough sets, hybridization
• Case Based Reasoning
• Fractals/Wavelets: image compression, digital watermarking, wavelet + ANN
• Bioinformatics
• Color Image Processing
• Externally funded projects: INTEL, CSIR, Silicogene; Center for Excellence in Soft Computing Research
• Foreign collaborations (Japan, France, Poland, Hong Kong, Australia)
• Editorial activities: journals, special issues, books
• Achievements/Recognitions
• Faculty: 10; Research Scholars/Associates: 8

Contents
• What is Soft Computing? Computational Theory of Perceptions
• Pattern Recognition and Machine Intelligence: relevance of soft computing tools, different integrations
• Emergence of Data Mining: need, KDD process, relevance of soft computing tools, rule generation/evaluation
• Modular Evolutionary Rough Fuzzy MLP
  – Modular network
  – Rough sets, granules and rule generation
  – Variable mutation operators
  – Knowledge flow
  – Example and merits
• Rough-fuzzy Case Generation
  – Granular computing
  – Fuzzy granulation
  – Mapping dependency rules to cases
  – Case retrieval
  – Examples and merits
• Conclusions

SOFT COMPUTING (L. A. Zadeh)
Aim:
• To exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, low solution cost, and close resemblance with human-like decision making.
• To find an approximate solution to an imprecisely/precisely formulated problem.

Parking a Car
Generally, a car can be parked rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a fraction of a millimeter and a few seconds of arc, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem. High precision carries a high cost. The challenge is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. This, in essence, is the guiding principle of soft computing.

• Soft computing is a collection of methodologies (working synergistically, not competitively) which, in one form or another, reflect its guiding principle: exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, and close resemblance with human-like decision making.
• It is a foundation for the conception and design of high-MIQ (Machine IQ) systems, and provides flexible information-processing capability for the representation and evaluation of various real-life ambiguous and uncertain situations (Real World Computing).
• It may be argued that it is soft computing rather than hard computing that should be viewed as the foundation for Artificial Intelligence.
• At this juncture, the principal constituents of soft computing are Fuzzy Logic (FL), Neurocomputing (NC), Genetic Algorithms (GA) and Rough Sets (RS).
• Within soft computing, FL, NC, GA and RS are complementary rather than competitive. Their roles:
  – FL: algorithms for dealing with imprecision and uncertainty
  – NC: the machinery for learning and curve fitting
  – GA: algorithms for search and optimization
  – RS: handling uncertainty arising from the granularity of the domain of discourse

Referring back to the "Parking a Car" example: do we use any measurement and computation while performing such tasks? We use the Computational Theory of Perceptions (CTP). [AI Magazine, 22(1), 73-84, 2001]

Computational Theory of Perceptions (CTP)
• Provides the capability to compute and reason with perception-based information. Examples: parking a car, driving in a city, cooking a meal, summarizing a story.
• Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements or computations. They use perceptions of time, direction, speed, shape, possibility, likelihood, truth, and other attributes of physical and mental objects.
• Reflecting the finite ability of the sensory organs (and finally the brain) to resolve detail, perceptions are inherently imprecise.
• Perceptions are fuzzy (F)-granular (both fuzzy and granular): the boundaries of perceived classes are unsharp, and the values of attributes are granulated (a granule being a clump of indistinguishable points/objects).
Example: granules in age: very young, young, not so old, ...; granules in direction: slightly left, sharp right, ... (A small sketch of such fuzzy granulation appears at the end of this part.)

Machine Intelligence: a core concept for grouping various advanced technologies with pattern recognition and learning. It spans knowledge-based systems (fuzzy logic, approximate reasoning, case-based reasoning, probabilistic reasoning), data-driven systems (neural network systems, evolutionary computing, rough sets, pattern recognition and learning), hybrid systems (neuro-fuzzy, genetic-neural, fuzzy-genetic, fuzzy-neuro, approximate-reasoning-genetic, ...), and non-linear dynamics (chaos theory, rescaled range analysis (wavelets), fractal analysis).

The relevance of FL, ANN and GAs individually to PR problems is established. In the late eighties scientists asked: why not integrations?
• Fuzzy Logic + ANN
• ANN + GA
• Fuzzy Logic + ANN + GA
• Fuzzy Logic + ANN + GA + Rough Sets
Neuro-fuzzy hybridization is the most visible integration realized so far.

Why Fusion?
• Fuzzy set-theoretic models try to mimic human reasoning and the capability of handling uncertainty (SW).
• Neural network models attempt to emulate the architecture and information representation scheme of the human brain (HW).

NEURO-FUZZY Computing (for more intelligent systems)
• NFS: an ANN is used for learning and adaptation within a fuzzy system; its merits are generic.
• FNN: fuzzy sets are used to augment the application domain of an ANN; its merits are application specific.

Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron (eds), Springer-Verlag, Singapore, 1999.
Incorporating domain knowledge using rough sets: IEEE TNN, 9, 1203-1216, 1998.
[Figure: integration of ANN, FL, GAs and rough sets; each feature Fj is described by fuzzy sets low (L), medium (M) and high (H), and selected substrings of the GA chromosome (e.g., XX|000|XX) are tuned.]
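Before moving on, the notion of F-granulation introduced above (perceived values like "young" clumped into overlapping fuzzy granules) can be made concrete. Below is a minimal Python sketch using a common form of the π-function (the same membership shape used later for case generation); the granule names, centers and radii for "age" are invented here purely for illustration and are not given in the talk.

```python
def pi_membership(x, c, lam):
    """A common form of the pi-function: membership is 1 at the center c
    and falls smoothly to 0 at distance lam from c (lam = granule radius)."""
    d = abs(x - c) / lam
    if d <= 0.5:
        return 1 - 2 * d ** 2
    if d <= 1.0:
        return 2 * (1 - d) ** 2
    return 0.0

# Hypothetical fuzzy granules over "age" (centers/radii assumed).
granules = {"very young": (10, 15), "young": (25, 15),
            "middle-aged": (45, 20), "old": (65, 20)}

age = 33
for name, (c, lam) in granules.items():
    print(f"{name:12s} {pi_membership(age, c, lam):.2f}")
```

Running this shows a 33-year-old belonging partly to "young" and partly to "middle-aged": exactly the unsharp class boundaries that CTP attributes to perception.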
Before we describe the
• Modular Evolutionary Rough-fuzzy MLP, and
• Rough-fuzzy Case Generation system,
we explain data mining and the significance of pattern recognition, image processing and machine intelligence.

Why Data Mining?
The digital revolution has made digitized information easy to capture and fairly inexpensive to store. With the development of computer hardware and software and the rapid computerization of business, huge amounts of data have been collected and stored in centralized or distributed databases.
• Data is heterogeneous (a mixture of text, symbolic, numeric, texture and image data), huge (both in dimension and size) and scattered.
• Such data is accumulating at a phenomenal rate. As a result, traditional ad hoc mixtures of statistical techniques and data management tools are no longer adequate for analyzing this vast collection of data.

Data Mining: pattern recognition and machine learning principles applied to a very large (both in size and dimension) heterogeneous database. Data mining + knowledge interpretation = knowledge discovery: the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. [Pattern Recognition, World Scientific, 2001]

Knowledge Discovery in Databases (KDD): huge raw data → preprocessing (data cleaning, dimensionality reduction, data wrapping/condensation) → data mining, i.e., a mathematical model of the preprocessed data (classification, clustering, rule generation, description) that extracts knowledge (patterns) via machine learning → knowledge interpretation and evaluation → useful knowledge.

Why the Growth of Interest?
• Falling cost of large storage devices and increasing ease of collecting data over networks.
• Availability of robust and efficient machine learning algorithms to process data.
• Falling cost of computational power, enabling the use of computationally intensive methods for data analysis.

Example: Medical Data
• Numeric and textual information may be interspersed.
• Different symbols can be used with the same meaning.
• Redundancy often exists.
• Erroneous/misspelled medical terms are common.
• Data is often sparsely distributed.
A robust preprocessing system is required to extract any kind of knowledge from even medium-sized medical data sets. The data must not only be cleaned of errors and redundancy, but also organized in a fashion that makes sense for the problem.

So, we need efficient, robust and flexible machine learning algorithms; hence the need for the soft computing paradigm. Without soft computing, machine intelligence research remains incomplete.

Modular Neural Networks
Task: split a learning task into several subtasks, train a subnetwork for each subtask, and integrate the subnetworks to generate the final solution.
Strategy: divide and conquer. The approach involves
• effective decomposition of the problem, such that the subproblems can be solved with compact subnetworks;
• effective combination and training of the subnetworks, such that there is a gain in total training time, network size and accuracy of solution.
Advantages:
• Accelerated training.
• The final solution network has more structured components.
• Representation of individual clusters (irrespective of size/importance) is better preserved in the final solution network.
• The catastrophic-interference problem of neural network learning (in the case of overlapped regions) is reduced.
[Figure: a 3-class problem is split into 2-class subproblems with one subnetwork per class; the subnetwork modules are integrated, with links carrying evolved values preserved and inter-module links grown; the integrated network then undergoes a final training phase.]

Modular Rough Fuzzy MLP
A modular network designed using four different soft computing tools.
• Basic network model: fuzzy MLP.
• Rough set theory is used to generate crude decision rules representing each of the classes from the discernibility matrix. (There may be multiple rules per class, and hence multiple subnetworks per class.)
• The knowledge-based subnetworks are concatenated to form a population of initial solution networks.
• The final solution network is evolved using a GA with a variable mutation operator: the bits corresponding to the intra-module links (already evolved) have a low mutation probability, while the inter-module links have a high mutation probability. (A sketch of this operator follows.)
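The variable mutation operator just described is easy to state in code. Below is a minimal Python sketch, assuming a bit-string chromosome in which each bit marks the presence of a link; the specific probabilities (0.01 intra-module, 0.2 inter-module) and the toy chromosome are assumptions for illustration, not values from the talk.

```python
import random

def variable_mutation(chromosome, is_intra, p_intra=0.01, p_inter=0.2):
    """Flip each bit of the concatenated-network chromosome with a
    link-dependent probability: low for the already-evolved intra-module
    links, high for the newly grown inter-module links."""
    return [bit ^ (random.random() < (p_intra if intra else p_inter))
            for bit, intra in zip(chromosome, is_intra)]

# Toy chromosome: two 4-bit modules joined by 3 inter-module link bits.
chromosome = [1, 0, 1, 1,  0, 0, 0,  1, 1, 0, 1]
is_intra   = [True] * 4 + [False] * 3 + [True] * 4
print(variable_mutation(chromosome, is_intra))
```

The effect is that the GA mostly explores the sparse inter-module connections while leaving the knowledge encoded within each partially trained module largely intact.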
Rough Sets
[Figure: a set X in the universe U, shown with its granules [x]_B, its B-lower approximation and its B-upper approximation.]
[x]_B is the set of all points belonging to the same granule as the point x in the feature space Ω_B, i.e., the set of all points indiscernible from x in terms of the feature subset B.
Approximations of a set X ⊆ U w.r.t. the feature subset B:
• B-lower: B̲X = {x ∈ U : [x]_B ⊆ X} (granules definitely belonging to X)
• B-upper: B̄X = {x ∈ U : [x]_B ∩ X ≠ ∅} (granules definitely and possibly belonging to X)
If B̲X = B̄X, then X is B-exact (B-definable); otherwise it is roughly definable.

Rough sets thus provide both uncertainty handling (using the lower and upper approximations) and granular computing (using information granules).
Granular computing: computation is performed using information granules and not the data points (objects), giving information compression and computational gain.

Information Granules and Rough Set Theoretic Rules
[Figure: a feature space F1 x F2 granulated by fuzzy sets low, medium and high along each axis; a rule covers the granule M1 x M2.]
• A rule provides a crude description of the class using granules.

Rough Set Rule Generation
Decision table:

Object  F1  F2  F3  F4  F5  Decision
x1      1   0   1   0   1   Class 1
x2      0   0   0   0   1   Class 1
x3      1   1   1   1   1   Class 1
x4      0   1   0   1   0   Class 2
x5      1   1   1   0   0   Class 2

Discernibility matrix for Class 1: c_ij = {a : a(x_i) ≠ a(x_j)}, 1 ≤ i, j ≤ p.

Objects  x1    x2        x3
x1             F1, F3    F2, F4
x2                       F1, F2, F3, F4
x3

Discernibility function: f_xk = AND{ OR(c_kj) : 1 ≤ j ≤ p, j ≠ k, c_kj ≠ ∅ }.
For object x1 of Class 1: f_x1 = (discernibility of x1 w.r.t. x2) AND (discernibility of x1 w.r.t. x3) = (F1 ∨ F3) ∧ (F2 ∨ F4). Similarly, for object x2: f_x2 = F1 ∨ F2 ∨ F3 ∨ F4.
Dependency rules (AND-OR form):
Class 1 ← (F1 ∧ F2) ∨ (F1 ∧ F4) ∨ (F3 ∧ F2) ∨ (F3 ∧ F4)
Class 1 ← F1 ∨ F2 ∨ F3 ∨ F4

Knowledge Flow in the Modular Rough Fuzzy MLP [IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003]
1. Rough set rules are extracted from the granulated feature space, e.g.
   c1 ← (L1 ∧ M2) ∨ (M1 ∧ H2)   (R1)
   c2 ← M2 ∧ H1                 (R2)
   c2 ← L2 ∧ L1                 (R3)
2. Network mapping: each rule becomes a subnetwork (R1 → subnet 1, R2 → subnet 2, R3 → subnet 3).
3. The subnetworks SN1, SN2, SN3 are partially trained/refined with an ordinary GA.
4. The partially refined subnetworks are concatenated, and the population of concatenated networks is evolved with a GA having the variable mutation operator (low mutation probability on intra-module links, high on inter-module links).
5. This yields the final solution network for the classes C1, C2 in the feature space.

Speech Data: 3 features, 6 classes
[Charts: classification accuracy on 20% training / 80% test sets, network size (number of links), and training time in hours (DEC Alpha workstation @400 MHz), compared for (1) MLP, (2) Fuzzy MLP, (3) Modular Fuzzy MLP, (4) Rough Fuzzy MLP, (5) Modular Rough Fuzzy MLP.]

Network Structure [IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003]
• Modular Rough Fuzzy MLP: structured (few links).
• Fuzzy MLP: unstructured (more links).
[Figure: histogram of weight values and connectivity of the network obtained using the Modular Rough Fuzzy MLP.]

Rule Evaluation
• Accuracy.
• Fidelity (the number of times the network and rule-base outputs agree).
• Confusion (should be restricted within a minimum number of classes).
• Coverage (a rule base with a smaller uncovered region, i.e., the percentage of the test set for which no rule fires, is better).
• Rule base size (the smaller the number of rules, the more compact the rule base).
• Certainty (confidence of the rules).
[IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003]
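To make the rule-generation step concrete, here is a minimal Python sketch that recomputes the discernibility matrix of the worked example above from its decision table; only the standard library is used.

```python
from itertools import combinations

# Decision table of the worked example: objects x1..x5, features F1..F5.
table = {
    "x1": ([1, 0, 1, 0, 1], "Class 1"),
    "x2": ([0, 0, 0, 0, 1], "Class 1"),
    "x3": ([1, 1, 1, 1, 1], "Class 1"),
    "x4": ([0, 1, 0, 1, 0], "Class 2"),
    "x5": ([1, 1, 1, 0, 0], "Class 2"),
}

def discernibility(a, b):
    """c_ij = the set of features on which objects a and b differ."""
    return {f"F{k + 1}"
            for k, (u, v) in enumerate(zip(table[a][0], table[b][0]))
            if u != v}

# Discernibility matrix entries for the Class 1 objects x1, x2, x3.
class1 = [obj for obj, (_, dec) in table.items() if dec == "Class 1"]
for a, b in combinations(class1, 2):
    print(a, b, sorted(discernibility(a, b)))
# x1 x2 ['F1', 'F3']
# x1 x3 ['F2', 'F4']
# x2 x3 ['F1', 'F2', 'F3', 'F4']
```

Taking the conjunction over a row, e.g. f_x1 = (F1 ∨ F3) ∧ (F2 ∨ F4), and expanding it into AND-OR (disjunctive normal) form reproduces the first Class 1 dependency rule above.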
Comparison of Rules Obtained for Speech Data
[Charts: accuracy, user's accuracy and kappa; number of rules; uncovered region (samples); CPU time (sec); and confusion, compared for the proposed method and the Subset, MofN, X2R and C4.5 rule-extraction algorithms.]

Case Based Reasoning (CBR)
• Cases: typical situations already experienced by the system; a case is a conceptualized piece of knowledge representing an experience that teaches a lesson for achieving the goals of the system.
• CBR involves
  – adapting old solutions to meet new demands,
  – using old cases to explain new situations or to justify new solutions,
  – reasoning from precedents to interpret new situations.
• A CBR system learns, and becomes more efficient, as a byproduct of its reasoning activity.
• Examples: medical diagnosis and law interpretation, where the available knowledge is incomplete and/or the evidence is sparse.

Case selection: cases belong to the set of examples encountered.
Case generation: the constructed cases need not be any of the examples.

Granular Computing and Case Generation [IEEE Trans. Knowledge Data Engg., to appear]
• Rough sets provide uncertainty handling (using lower and upper approximations) and granular computing (using information granules).
• Information granules: groups of similar objects clubbed together by an indiscernibility relation.
• Granular computing: computation is performed using information granules and not the data points (objects), giving information compression and computational gain.
• Cases: informative patterns (prototypes) characterizing the problems.
• In the rough set theoretic framework: cases are information granules.
• In the rough-fuzzy framework: cases are fuzzy information granules.

Characteristics and Merits
• Cases are cluster granules, not sample points.
• A case involves only a reduced number of relevant features, and that number can vary from case to case.
• Less storage requirement; fast retrieval.
• Suitable for mining data large in both dimension and size.

How to Achieve This?
• Fuzzy sets help in the linguistic representation of patterns, providing a fuzzy granulation of the feature space.
• Rough sets help in generating dependency rules to model the informative/representative regions in the granulated feature space.
• The fuzzy membership functions corresponding to the representative regions are stored as cases.

Fuzzy (F)-granulation: each feature j is described by three π-function fuzzy linguistic sets low, medium and high, with centers c_L, c_M, c_H and radii λ_L, λ_M, λ_H.
[Figure: the three overlapping π-functions low, medium, high along feature j.]

Example [IEEE Trans. Knowledge Data Engg., to appear]
[Figure: a two-feature space with Case 1 in the low-F1/high-F2 region of class C1 and Case 2 in the high-F1/low-F2 region of class C2.]
Parameters of the fuzzy linguistic sets low, medium, high:
Feature 1: c_L = 0.1, λ_L = 0.5; c_M = 0.5, λ_M = 0.7; c_H = 0.7, λ_H = 0.4
Feature 2: c_L = 0.2, λ_L = 0.5; c_M = 0.4, λ_M = 0.7; c_H = 0.9, λ_H = 0.5
Dependency rules and cases obtained:
Class 1 ← L1 ∧ H2
Class 2 ← H1 ∧ L2
Case 1: feature 1, fuzzset (L): c = 0.1, λ = 0.5; feature 2, fuzzset (H): c = 0.9, λ = 0.5; class = 1.
Case 2: feature 1, fuzzset (H): c = 0.7, λ = 0.4; feature 2, fuzzset (L): c = 0.2, λ = 0.5; class = 2.

Case Retrieval
The similarity sim(x, c) between a pattern x and a case c is defined as

sim(x, c) = sqrt( (1/n) Σ_{j=1..n} [μ_fuzzset^j(x)]² )

where n is the number of features present in case c, and μ_fuzzset^j(x) is the degree of belongingness of pattern x to the fuzzy linguistic set fuzzset for feature j. For classifying an unknown pattern, the case closest to the pattern in terms of sim(x, c) is retrieved and its class is assigned to the pattern.
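The retrieval step can be sketched directly from the two cases of the example above. The following minimal Python sketch uses the similarity formula as reconstructed above; the π-function form is the common one assumed earlier, and the query pattern is invented purely for illustration.

```python
import math

def pi_membership(x, c, lam):
    """pi-function membership with center c and radius lam
    (same form as in the F-granulation sketch earlier)."""
    d = abs(x - c) / lam
    if d <= 0.5:
        return 1 - 2 * d ** 2
    if d <= 1.0:
        return 2 * (1 - d) ** 2
    return 0.0

# The two cases from the example: each stores, per retained feature,
# the (center, radius) of the fuzzy linguistic set named in the rule.
cases = [
    {"class": 1, "fuzzsets": {0: (0.1, 0.5), 1: (0.9, 0.5)}},  # L1 and H2
    {"class": 2, "fuzzsets": {0: (0.7, 0.4), 1: (0.2, 0.5)}},  # H1 and L2
]

def sim(x, case):
    """sim(x, c) = sqrt((1/n) * sum_j mu_fuzzset_j(x)^2) over the
    n features stored in the case."""
    ms = [pi_membership(x[j], c, lam)
          for j, (c, lam) in case["fuzzsets"].items()]
    return math.sqrt(sum(m * m for m in ms) / len(ms))

x = (0.15, 0.80)  # an unknown pattern (values assumed for illustration)
best = max(cases, key=lambda c: sim(x, c))
print("retrieved class:", best["class"])
```

For this query the pattern falls inside the low-F1/high-F2 granule, so Case 1 is retrieved and class 1 is assigned; note that a case is matched using only the features it actually stores.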
Evaluation in terms of:
a) 1-NN classification accuracy using the cases (training set: 10%, used for case generation; test set: 90%);
b) number of cases stored in the case base;
c) average number of features required to store a case (n_avg);
d) CPU time required for case generation (t_gen);
e) average CPU time required to retrieve a case (t_ret).
(On a Sun UltraSparc workstation @350 MHz.)

Iris Flowers: 4 features, 3 classes, 150 samples; number of cases = 3 for all methods.
[Charts: 1-NN classification accuracy, average features per case, t_gen (sec) and t_ret (sec), compared for Rough-fuzzy, IB3, IB4 and random selection.]

Forest Cover Types: 10 features, 7 classes, 586,012 samples; number of cases = 545 for all methods.
[Charts: the same four measures, compared for Rough-fuzzy, IB3, IB4 and random selection.]

Handwritten Numerals: 649 features, 10 classes, 2,000 samples; number of cases = 50 for all methods.
[Charts: the same four measures, compared for Rough-fuzzy, IB3, IB4 and random selection.]

For the same number of cases:
• Accuracy: the proposed method is much superior to random selection and IB4, and close to IB3.
• Average number of features stored: the proposed method stores far fewer features than the original data dimension.
• Case generation time: the proposed method requires much less than IB3 and IB4.
• Case retrieval time: several orders of magnitude less for the proposed method than for IB3 and random selection; also less than IB4.

Conclusions
• The relation between soft computing, machine intelligence and pattern recognition is explained.
• The emergence of data mining and knowledge discovery is explained from a PR point of view.
• The significance of hybridization in the soft computing paradigm is illustrated.
• The modular concept enhances performance, accelerates training, and makes the network structured with fewer links.
• The rules generated are superior to those of other related methods in terms of accuracy, coverage, fidelity, confusion, size and certainty.
• Rough sets are used for generating information granules.
• Fuzzy sets provide an efficient granulation of the feature space (F-granulation).
• The reduced, variable feature-subset representation of cases is a unique feature of the scheme.
• The rough-fuzzy case generation method is suitable for CBR systems involving data sets large in both dimension and size.
• Unsupervised case generation: Rough-SOM (Applied Intelligence, to appear).
• Application to multispectral image segmentation (IEEE Trans. Geoscience and Remote Sensing, 40(11), 2495-2501, 2002).
• Significance in the Computational Theory of Perceptions (CTP).

Thank You!!