Tutorial on Incremental Learning
Ms. Preeti Mulay

I. MOTIVATION
Life-long learning, that is, learning from examples and from historical or empirical data, is an interesting research problem, and our research aims to solve it. Our goal is to develop a new clustering method that allows incremental learning when new examples of a known cluster structure become available, and that discovers completely new clusters when new examples warrant them. Consequently, we aim to develop a new method that effectively incorporates new knowledge without losing what is already known. Incremental learning aims at the enhancement of knowledge; accordingly, we seek a self-adaptive algorithm whose performance improves over time. By analogy, the enhancement of knowledge in humans happens over time, because all the information necessary to learn many concepts is rarely available a priori. Knowledge is constantly revised (it evolves) as new pieces of information become available; we term this "knowledge augmentation". It is therefore feasible to achieve knowledge augmentation via incremental learning.

II. INTRODUCTION
Modern computing, technological advances and information technology have enabled the collection of large amounts of engineering data. Discovering the hidden patterns needed for learning, knowledge enhancement, forecasting and estimation requires data mining and advanced machine learning techniques. These techniques focus on pre-processing raw data to remove noise and select impactful factors, and on discovering patterns and models. Clustering is one of the important areas of data mining, and "incremental clustering" is the area of advanced data mining that enables "incremental learning". Patterns embedded in large volumes of organizational data may change over time, so an ideal advanced data mining technique should have the incremental learning ability to update existing patterns with the influx of new data.
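To make the idea of updating patterns with new data concrete, here is a minimal sketch of closeness-based incremental clustering. It is a generic illustration, not necessarily the exact method proposed in this research; the distance threshold and data are assumptions.

```python
import math

def incremental_cluster(points, threshold=2.0):
    """Assign each arriving point to the closest existing cluster, or start
    a new cluster when no centroid is within `threshold` (assumed value)."""
    clusters = []  # each cluster: {"centroid": [...], "members": [...]}
    for p in points:
        best, best_dist = None, float("inf")
        for c in clusters:
            d = math.dist(p, c["centroid"])
            if d < best_dist:
                best, best_dist = c, d
        if best is not None and best_dist <= threshold:
            # Update the existing cluster: add the member, recompute centroid.
            best["members"].append(p)
            n = len(best["members"])
            best["centroid"] = [
                sum(m[i] for m in best["members"]) / n for i in range(len(p))
            ]
        else:
            # New data warrants a completely new cluster.
            clusters.append({"centroid": list(p), "members": [p]})
    return clusters

# Points arrive over time; two natural groups emerge.
stream = [(0, 0), (1, 0), (10, 10), (0, 1), (11, 10)]
clusters = incremental_cluster(stream)
print(len(clusters))  # 2
```

Note that each point is processed exactly once as it arrives; nothing is reclustered from scratch.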
This research introduces a new incremental clustering method based on the closeness of data points. The method works in two phases. In the initial phase, clusters are formed from the raw data set after pre-processing and are stored in a cluster database for future reuse, along with their characteristics (cluster members, cluster centres, inter-cluster distances, threshold values, etc.). Once the basic clusters are formed, each influx of new data tuples either updates existing clusters or forms new ones. Such continuous updating of cluster details supports incremental learning and knowledge augmentation.

III. DEFINITIONS OF INCREMENTAL LEARNING
The first definition of incremental learning, as given by the author of [1], is:

Definition 1. A learning task is incremental if the training examples used to solve it become available over time, usually one at a time. [1]

Note that it is undesirable to wait a long time to achieve incremental learning; hence any new incremental learning algorithm is designed under the implicit assumption that waiting to learn is undesirable. Here are a few scenarios of incremental learning; in all of them, incrementality arises from human behavior, unpredictable situations, or past experience, to name a few sources. For example, improving the accuracy of software project estimates of cost, effort, duration and so on requires prior experience and/or empirical project data. Software projects generally run for a long time, so useful quantitative data becomes available in bits and pieces over time; forecasting software project details is one of the case studies of our research.
IJACKD JOURNAL OF RESEARCH | VOL 2 | ISSUE 1 | FEBRUARY 2013 | Page 27 | ISSN (Print): 2278-5698
Another example concerns changes in human behavior: users' behavior and interests may vary over time, in as little as 3 to 6 months.
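Definition 1 can be illustrated with a minimal toy example: a learner whose "hypothesis" is a running mean, updated one example at a time without storing past examples. This is purely illustrative and is not the clustering method of this paper.

```python
class RunningMeanLearner:
    """Toy incremental learner: its hypothesis is the mean seen so far.

    Each update uses only the previous hypothesis and the current example,
    so no past examples need to be stored.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, example):
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.n += 1
        self.mean += (example - self.mean) / self.n
        return self.mean


learner = RunningMeanLearner()
for x in [2.0, 4.0, 6.0]:
    learner.update(x)  # hypotheses: 2.0, 3.0, 4.0
print(learner.mean)    # 4.0
```

The examples arrive one at a time, and the learner never waits for the full training set, which is exactly the setting Definition 1 describes.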
Hence, according to experts, an incremental learning algorithm should be well equipped with continuous monitoring of user behavior, which is useful for prediction and learning. In the case of intelligent devices based on robotics principles and AI technology, the environment in which they operate may change constantly; to accomplish its tasks successfully, an intelligent device should adapt incrementally to these environmental changes and act accordingly. These examples suggest the main characteristics of an incremental learning task:
- incremental learning is indefinite, and
- new examples become available over time, generally in batches.

Definition 2. A learning algorithm is incremental if, for any given training sample e1, ..., en, it produces a sequence of hypotheses h0, h1, ..., hn such that hi+1 depends only on hi and the current example ei. [1]

In addition to these definitions, the author of [1] characterizes incremental learning algorithms as memoryless online algorithms. When developing a new incremental learning algorithm, continuous online monitoring for new information is preferred, under the assumption of ample free memory.

IV. INCREMENTAL LEARNING ALGORITHMS
Incremental learning algorithms are required to enhance the knowledge needed for forecasting and estimation in dynamic, non-stationary organizational environments. An ideal incremental learning algorithm should:
- accommodate new information as and when it becomes available,
- work with unlabelled data,
- handle multidimensional data,
- have bounded complexity,
- learn incrementally from empirical data, and
- handle changes in concepts.

The following incremental learning algorithms implement the characteristics listed above.

a. K-NEAREST NEIGHBOUR
This is one of the pioneering algorithms for classifying cases based on their closeness in the problem space. It is also called a case-based reasoning machine learning algorithm.
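A minimal sketch of this case-based approach follows, assuming numeric attribute vectors, Euclidean closeness, and illustrative class labels and k value:

```python
import math
from collections import Counter

class IncrementalKNN:
    """Case-based classifier: stores cases as they arrive; no retraining."""

    def __init__(self, k=3):
        self.k = k
        self.cases = []  # list of (attribute_vector, label) pairs

    def add_case(self, attributes, label):
        # Learning is simply remembering the new case.
        self.cases.append((attributes, label))

    def classify(self, attributes):
        # Rank stored cases by Euclidean distance, vote among the k closest.
        nearest = sorted(
            self.cases,
            key=lambda case: math.dist(attributes, case[0]),
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


knn = IncrementalKNN(k=3)
for vec, cls in [((1, 1), "A"), ((1, 2), "A"), ((8, 8), "B"),
                 ((8, 9), "B"), ((2, 1), "A")]:
    knn.add_case(vec, cls)
print(knn.classify((1.5, 1.5)))  # "A": the three closest cases are class A
```

Note that every query scans all stored cases, which foreshadows the efficiency drawbacks discussed below as the case base grows.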
To explain further, a case is assigned a class based on how close the values of certain of its attributes are to those of previously seen cases; the final classification is the most common class among the k closest cases. This algorithm learns incrementally, and knowledge is enhanced with each new case without rescanning from scratch. Its disadvantages include:
- the efficiency of the method decreases substantially as the number of cases grows,
- it does not cope well with large numbers of attributes and cases,
- the output it produces is not user friendly, and
- it may cause the system to become inefficient. [2]

b. INCREMENTAL SVM ALGORITHM
As the authors of [3] note, the major application of incremental SVM is online learning. Incremental SVM is a fast, numerically stable and robust algorithm. The authors of [3] propose a design improved over the previous L1 and L2 versions, which improves convergence and complexity and speeds up training by a factor of 5 to 20. Incremental SVM has been shown to work effectively with limited resources and to perform active learning in dynamic environments. Embryonic applications of incremental SVM learning include, among others, network intrusion detection, network surveillance and the monitoring of non-stationary time series. The major drawback of this algorithm is its large memory requirement, since the set of support vectors must be stored throughout the learning process. Because of this, the algorithm works best with at most tens of thousands of examples; within these bounds, incremental SVM provides an exact solution for non-stationary data.

c. CANDIDATE-ELIMINATION
The input to this incremental algorithm is a set E of examples and a hypothesis space H. The algorithm incrementally builds the version space from this input [4], adding examples one at a time.
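A highly simplified sketch of candidate elimination for conjunctive hypotheses over discrete attributes is shown below. Here "?" denotes a wildcard; the toy data and attribute layout are assumptions, and the specific boundary is reduced to a single hypothesis for brevity.

```python
WILDCARD = "?"

def matches(hypothesis, example):
    """A conjunctive hypothesis matches if every non-wildcard slot agrees."""
    return all(h == WILDCARD or h == e for h, e in zip(hypothesis, example))

def candidate_elimination(examples, n_attrs):
    """Simplified candidate elimination, one example at a time.

    examples: iterable of (attribute_tuple, is_positive).
    Assumes the first example is positive. Returns (S, G): the specific
    hypothesis and the general boundary.
    """
    S = None                        # most specific hypothesis so far
    G = [(WILDCARD,) * n_attrs]     # most general boundary
    for example, is_positive in examples:
        if is_positive:
            # Drop general hypotheses inconsistent with the positive example.
            G = [g for g in G if matches(g, example)]
            if S is None:
                S = example         # first positive: adopt it verbatim
            else:
                # Minimally generalize S: wildcard any disagreeing slot.
                S = tuple(s if s == e else WILDCARD
                          for s, e in zip(S, example))
        else:
            # Specialize each general hypothesis that wrongly covers the
            # negative example, constraining one slot to S's value.
            new_G = []
            for g in G:
                if not matches(g, example):
                    new_G.append(g)
                    continue
                for i in range(n_attrs):
                    if (g[i] == WILDCARD and S is not None
                            and S[i] != WILDCARD and S[i] != example[i]):
                        new_G.append(g[:i] + (S[i],) + g[i + 1:])
            G = new_G
    return S, G

# Toy concept over (sky, temperature): positive iff sky == "sunny".
data = [(("sunny", "warm"), True),
        (("rainy", "cold"), False),
        (("sunny", "cold"), True)]
S, G = candidate_elimination(data, n_attrs=2)
print(S)  # ('sunny', '?')
print(G)  # [('sunny', '?')]
```

On this toy data the two boundaries converge to the same hypothesis, so the target concept has been identified exactly.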
The version space is minimized by removing hypotheses inconsistent with each example; this is achieved by updating the general and specific boundaries for each new example. The algorithm requires noise-free positive and negative examples. Its inductive bias is that it can classify a new piece of data only if every hypothesis contained in its version space gives the data the same classification; this bias therefore imposes a limitation on the learning method [5].

d. COBWEB
This incremental algorithm is best suited to categorical data. It forms a cluster tree in which the leaf nodes are the actual clusters. On arrival of new information, the entire tree must be searched. The number of clusters and the depth of category memberships are decided by a category utility function. Traversing the entire tree to find a suitable cluster match may produce a skewed or imbalanced tree, and the mandatory traversal with each new piece of information is time consuming and increases memory utilization and complexity, along with the merging and splitting of tree nodes.

There are many more incremental algorithms, including ID5, ID5R, ILA, IOLIN, IPYRAMID, GDCP, incremental k-means, INTHELEX and naive-based approaches. The authors of [12] note: "In every learning stage, an algorithmic learner takes as input one element of an information sequence for some target concept and its previously made hypothesis and outputs a new hypothesis. The sequence of hypotheses has to converge to a hypothesis describing the target concept correctly."

V. DATA INCREMENTAL LEARNING
It has been observed that on certain difficult learning tasks, where non-incremental learning has a low probability of success, incremental learning is more efficient. Why and when incremental learning works depends on several factors inherent to the specific learning task. SELF (Selective Learning with Flexible Neural Architectures) works on data and network data sets.
A small subset of the training set is given as input initially and is increased incrementally; if training fails to converge, additional network units are added to increase capacity. Another form of incremental learning is termed adaptive or parameter learning, which involves adjusting the parameters of the learning rule. It comes in two categories: local and global adaptive learning. RProp and QuickProp are examples of local adaptive learning techniques; they are called local because they consider only local information, such as the weight of a single attribute, while learning and processing in parallel (neural network learning also uses parallel processing). In contrast, global techniques use information about the entire network, i.e. the overall or average weights of impactful attributes, as in conjugate gradients. Because it incorporates the weights of all attributes and considers the whole network, this approach requires more computation; its important advantage is a faster convergence rate compared with back-propagation learning, the benchmark technique.

Epoch learning and pattern learning are further terms used in connection with incremental learning, as mentioned in [6]. If the variable weights are updated after each presentation of a single pattern, the scheme is termed pattern learning; if they are updated only after presentation of the entire training set, it is epoch learning. If the training set contains a lot of redundant information, pattern learning is preferred. Online learning is another term used synonymously with incremental and pattern learning, although pattern learning need not be online: reuse of data is feasible in pattern learning but not in online learning. More details can be found in [6].

VI. COMPONENTS REQUIRED FOR LEARNING FROM EXPERIENCE [7]
The set of learning tasks required varies with the domain, application, environment, data category, and so on.
In machine learning, most learning methods are similar in many respects. In [7] and [8] the authors suggest general components required for learning from experience, so that a specific learning method can be compared with others. The suggested components are:
- clustering
- initialization
- projection
- evaluation
- refinement
- aggregation
- storage

This is not a complete list of learning components, but a suggested list for achieving learning and knowledge expansion based on concepts formed from pruned parts.

VII. CONCLUDING REMARKS
Computer systems, networks of systems, technological advances and modern gadgets are becoming an ever larger part of people's lives. These systemic changes open new and challenging areas of research, and hence require more effective and flexible learning algorithms. Incrementality therefore needs to be entrenched in the design of artificial learning systems, and our research suggests that incrementality only adds flexibility and broadens applicability. As described in the introduction to this tutorial, an algorithm possesses incremental learning capabilities if it meets the following criteria [10]:
1. the ability to acquire additional knowledge when new datasets are introduced;
2. the ability to retain previously learned information; and
3. the ability to learn new classes if they are introduced by new data.

Our research proposes a statistical, order-independent, closeness-based, simple incremental clustering algorithm that achieves incremental learning and knowledge augmentation effectively.

REFERENCES
[1] Christophe Giraud-Carrier, "A Note on the Utility of Incremental Learning," AI Communications, IOS Press, ISSN 0921-7126, 1998. Department of Computer Science, University of Bristol, Bristol BS8 1UB, U.K. Available: http://www.cs.bris.ac.uk/Publications/Papers/1000535.pdf
[2] D.J. Hand and V. Vinciotti, "Choosing k for two-class nearest neighbour classifiers with unbalanced classes," Pattern Recognition Letters, vol. 24, pp. 1555-1562, 2003.
[3] Pavel Laskov, Christian Gehl and Stefan Krüger, "Incremental Support Vector Learning: Analysis, Implementation and Applications," Journal of Machine Learning Research, vol. 7, pp. 1909-1936, 2006 (submitted 10/05, revised 3/06, published 9/06). Fraunhofer FIRST.IDA, Kekuléstrasse, 12489 Berlin, Germany. Available: http://jmlr.csail.mit.edu/papers/volume7/laskov06a/laskov06a.pdf
[4] David Poole and Alan Mackworth, "Candidate Elimination Algorithm," in Artificial Intelligence: Foundations of Computational Agents, 1st ed., 2010. Available: http://artint.info/html/ArtInt_193.html
[5] Ben Coppin, Artificial Intelligence Illuminated, 1st ed.
[6] Stephan K. Chalup, "Incremental Learning in Biological and Machine Learning Systems," International Journal of Neural Systems, vol. 12, no. 6, pp. 447-465, World Scientific Publishing Company, 2002. Available: http://www.cs.newscastle.edu.au/~chalup/papers/chalupIJNS2002.pdf
[7] Jeffrey C. Schlimmer and Richard H. Granger, Jr., "Incremental Learning from Noisy Data," Machine Learning, vol. 1, pp. 317-354, 1986. Available: http://link.springer.com/content/pdf/10.1007%2FBF00116895
[8] Tristan Ronald Ling, "An Incremental Learning Method for Data Mining from Large Databases," University of Tasmania, 2006. Available: http://eprints.utas.edu.au/793/trling_Honours_Thesis.pdf
[9] Samaneh Khoushrou, Jaime Cardoso and Liis Teixerira, "Evolution of different Incremental Learning Methods for Video Surveillance Scenarios," Faculdade de Engenharia, Universidade do Porto, 2010. Available: http://www.inescporto.pt/~jsc/publictions/conferences/2012SKhoushrouRECPAD.pdf
[10] Prerana Gupta, Amit Thakkar and Amit Ganatra, "Comprehensive study on techniques of incremental learning with decision trees for streamed data," International Journal of Engineering and Advanced Technology (IJEAT), ISSN 2249-8958, vol. 1, no. 3, February 2012. Available: http://www.ijeat.org/attachments/File/V1Issue3/C0201021312.pdf
[11] Antoine Cornuéjols, "Getting Order Independence in Incremental Learning," reprinted from the European Conference on Machine Learning, pp. 196-212, 1993. Available: http://www.lri.fr/~antoine/Papers/ac-ecml-93.pdf
[12] Steffen Lange and Gunter Grieser, "On the Power of Incremental Learning," Theoretical Computer Science, vol. 288, no. 2, pp. 277-307, 17 September 2002. Available: http://www-alg.ist.hokudai.ac.jp/~thomas/ALT99/ABS/lgtcs.html
[13] Nong Ye and Xiangyang Li, "A Scalable, Incremental Learning Algorithm for Classification Problems," Computers & Industrial Engineering, Elsevier, vol. 43, pp. 677-692, 2002. Available: http://enpub.fulton.asu.edu/ye/Published_journal_Papers/Ye_39.pdf

Author: Preeti Mulay is a full-time research student at Bharati Vidyapeeth University, Pune, India. Her research areas include machine learning, incremental clustering and incremental learning. She completed her MS in Software Engineering at Wayne State University, MI, USA, an M.Tech in Software Engineering at JNTU Hyderabad, and a BE in Computer Technology. She has several renowned publications to her name and has presented papers at various conferences and in journals. She has been in the education field, teaching at various levels including corporate training, since 1995.