Tutorial on
Incremental Learning
Ms. Preeti Mulay
I. MOTIVATION
Life-long learning, that is, learning from examples and from historical or empirical data, is an interesting research problem, and our research aims at solving such problems. Our goal is to develop a new clustering method that allows incremental learning when new examples of a known cluster structure become available; completely new clusters are discovered as new examples arrive. Consequently, we aim to develop a method that effectively incorporates new knowledge without loss of the information that is already known.
Incremental learning aims at the enhancement of knowledge. The aim of our research is to develop a self-adaptive algorithm whose performance improves over time. To explain with an example: knowledge in humans is enhanced over time, because all the information necessary to learn many concepts is rarely available a priori. Knowledge is constantly revised (it evolves) as new pieces of information become available. We term this "knowledge augmentation", and it is feasible to achieve it via incremental learning.
II. INTRODUCTION
Modern computing, technological advances and information technology have enabled the collection of large amounts of engineering data. Discovering the hidden patterns needed for learning, knowledge enhancement, forecasting and estimation requires various data mining and advanced machine learning techniques. These techniques focus on pre-processing raw data to remove noise and select impactful factors, and on discovering patterns, models and so on. Clustering is one of the important areas of data mining, and "incremental clustering" is the area of advanced data mining that enables "incremental learning". Patterns embedded in large volumes of organizational data may change over time, so an ideal advanced data mining technique should have the incremental learning ability to update existing patterns with the influx of new data.
This research introduces a new incremental clustering method based on the closeness of data points. The method works in two phases. In the initial phase, clusters are formed from the raw data set after pre-processing and are stored in a cluster database for future reuse, together with their characteristics (cluster members, cluster centres, inter-cluster distances, threshold values, etc.). Once the basic clusters are formed, each influx of new data tuples either updates existing clusters or forms new ones. Such continuous updating of cluster details supports incremental learning and knowledge augmentation.
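To make the two-phase idea concrete, here is a minimal sketch in Python. The Euclidean distance measure, the single global threshold and the running-centroid update are illustrative assumptions; the paper does not fix these details, so this is not the exact proposed method.

```python
import math

# Minimal sketch of two-phase, closeness-based incremental clustering.
# Assumptions (not fixed by the text): Euclidean distance, one global
# distance threshold, clusters summarized by a running centroid.

class IncrementalClusterer:
    def __init__(self, threshold):
        self.threshold = threshold
        self.clusters = []  # the "cluster database": list of dicts

    def add(self, point):
        """Phase 2: update an existing cluster or open a new one."""
        best, best_d = None, float("inf")
        for c in self.clusters:
            d = math.dist(point, c["centre"])
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= self.threshold:
            best["members"].append(point)
            n = len(best["members"])
            # incremental centroid update: new mean from old mean and new point
            best["centre"] = tuple(m + (p - m) / n
                                   for m, p in zip(best["centre"], point))
        else:
            self.clusters.append({"centre": tuple(point), "members": [point]})

    def fit_initial(self, data):
        """Phase 1: build basic clusters from the pre-processed raw data set."""
        for point in data:
            self.add(point)

# usage with hypothetical 2-D data
clusterer = IncrementalClusterer(threshold=1.0)
clusterer.fit_initial([(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)])
clusterer.add((5.1, 4.9))           # a new tuple updates an existing cluster
print(len(clusterer.clusters))      # -> 2
```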
III. DEFINITIONS OF INCREMENTAL LEARNING
As mentioned by the author of [1], the first definition of incremental learning is:
Definition 1 A learning task is incremental if the training examples used to solve it become available over time, usually one at a time [1].
Note that having to wait a long time before learning can occur is undesirable; hence, in designing a new incremental learning algorithm, there is an implicit assumption that waiting to learn is undesirable. Here are a few scenarios of incremental learning. In all of them, incrementality arises from human behaviour, unpredictable situations or past experience, to name a few sources. For example, to improve the accuracy of software project estimates of cost, effort, duration, etc., prior experience and/or empirical project data are essential. Software project development generally takes a long time, so useful quantitative data become available in bits and pieces over time. Forecasting software project details is one of the case studies of our research.
Another example concerns changes in human behaviour. Users' behaviour and interests may vary over time, in as little as 3-6 months. Hence, according to experts, an incremental learning algorithm should be equipped with continuous monitoring of user behaviour, which is useful for prediction and learning.
In the case of intelligent devices based on robotics principles and AI technology, the environment in which they are used may vary constantly. To accomplish its tasks successfully, an intelligent device should adapt incrementally to changes in the environment and act accordingly.
With these examples in mind, the main characteristics of an incremental learning task are as follows:
- incremental learning is indefinite, and
- new examples become available over time, generally in batches.
Definition 2 A learning algorithm is incremental if, for any given training sample e_1, ..., e_n, it produces a sequence of hypotheses h_0, h_1, ..., h_n, such that h_{i+1} depends only on h_i and the current example e_{i+1} [1].
In addition to these definitions of incremental learning, the author of [1] also describes incremental learning algorithms as memoryless on-line algorithms. When developing a new incremental learning algorithm, continuous on-line monitoring for new information is preferred, under the assumption of ample free memory.
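As an illustration of Definition 2 and the memoryless view, the sketch below implements a learner whose hypothesis is a running mean: each update uses only the previous hypothesis and the current example. The running mean is a hypothetical stand-in for a real hypothesis class.

```python
# A hypothesis here is just a running mean (count, mean); each update uses
# only the previous hypothesis and the current example, as in Definition 2.

def initial_hypothesis():
    return (0, 0.0)                      # h_0: no examples seen yet

def update(h, e):
    """Produce h_{i+1} from h_i and the current example e_{i+1} alone."""
    count, mean = h
    count += 1
    mean += (e - mean) / count           # incremental mean update
    return (count, mean)

h = initial_hypothesis()
for e in [2.0, 4.0, 9.0]:                # examples arrive one at a time
    h = update(h, e)
print(h)                                 # -> (3, 5.0)
```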
IV. INCREMENTAL LEARNING ALGORITHMS
Incremental learning algorithms are required to enhance the knowledge needed for forecasting and estimation in dynamic, non-stationary organizational environments. An ideal incremental learning algorithm should:
- accommodate new information as and when it becomes available
- be able to work with unlabelled data
- be able to handle multidimensional data
- have bounded complexity
- learn incrementally from empirical data
- handle changes in concepts, etc.
The following is a list of incremental learning algorithms exhibiting the above characteristics.
a. K-NEAREST NEIGHBOUR
This is one of the pioneering algorithms for classifying cases based on their closeness in the problem space; it is also called a case-based reasoning machine learning algorithm. A case is assigned a class based on how close the values of certain of its attributes are to the attribute values of previously seen cases, and the final classification is the most common class among the k closest cases. The algorithm learns incrementally: knowledge is enhanced with each new case without rescanning from scratch (a minimal sketch is given after the list below). Its disadvantages include:
- the efficiency of the method decreases substantially as the number of cases grows
- the method cannot work with large numbers of attributes and cases
- the output produced by the method is not user friendly
- it may cause the system to become inefficient [2]
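Here is a minimal sketch of that incremental behaviour, assuming the usual Euclidean distance and majority vote; learning amounts to storing the new case.

```python
from collections import Counter
import math

# Minimal sketch of incremental k-NN: learning is just storing the new case,
# so knowledge grows with each case without rescanning from scratch.
# Euclidean distance and majority voting are the usual (assumed) choices.

class KNN:
    def __init__(self, k=3):
        self.k = k
        self.cases = []                      # (attributes, class) pairs

    def learn(self, attributes, label):
        self.cases.append((tuple(attributes), label))

    def classify(self, attributes):
        # most common class among the k closest previously seen cases
        nearest = sorted(self.cases,
                         key=lambda c: math.dist(c[0], attributes))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

knn = KNN(k=3)
for x, y in [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b")]:
    knn.learn(x, y)
print(knn.classify((0.5, 0.5)))              # -> "a"
```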
b. INCREMENTAL SVM ALGORITHM
As mentioned by the authors of [3], the major application of incremental SVM is in on-line learning. Incremental SVM is a fast, numerically stable and robust algorithm. The authors of [3] suggest a new design which, compared with the previous L1 and L2 versions, improves convergence and complexity and speeds up training by a factor of 5 to 20. Incremental SVM has been shown to work effectively with limited resources and to perform active learning in dynamic environments. Emerging applications of incremental SVM learning include network intrusion detection, network surveillance and the monitoring of non-stationary time series.
The major drawback of this incremental algorithm is its large memory requirement, since the set of support vectors must be stored during the entire learning process. Because of this, the algorithm works best with at most tens of thousands of examples; within these limits, incremental SVM provides an exact solution for non-stationary data.
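The exact incremental SVM of [3] maintains the support vector set as examples arrive; as a lighter illustration of on-line SVM-style learning, the sketch below trains a linear hinge-loss classifier batch by batch with scikit-learn's SGDClassifier and partial_fit. The data stream is synthetic, and this approximates on-line SVM training rather than reproducing the algorithm of [3].

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# On-line learning with a linear hinge-loss (SVM-style) classifier.
# Each call to partial_fit updates the model with one batch of the stream.

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="hinge")            # linear SVM trained by SGD
classes = np.array([0, 1])

for _ in range(50):                          # 50 batches arriving over time
    y = rng.integers(0, 2, size=32)
    X = rng.normal(size=(32, 5)) + y[:, None]  # class-dependent shift
    clf.partial_fit(X, y, classes=classes)   # classes required on first call

# evaluate on a fresh batch drawn from class 1
print(clf.score(rng.normal(size=(100, 5)) + 1, np.ones(100, dtype=int)))
```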
IJACKD JOURNAL OF RESEARCH | VOL 2 | ISSUE 1 |FEBRUARY 2013|Page 2 8 ISSN (Print):2278-5698
c. CANDIDATE-ELIMINATION
The input to this incremental algorithm is a set E of examples and a hypothesis space H. The algorithm incrementally builds the version space [4] from this input. Examples are added one at a time, and the version space is minimized by removing hypotheses inconsistent with each example; this is achieved by updating the general and specific boundaries for each new example. The algorithm requires noise-free positive and negative examples. Its inductive bias is that it can classify a new piece of data only if every hypothesis contained in its version space gives that data the same classification; hence the inductive bias does impose a limitation on the learning method [5].
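The following sketch illustrates the boundary updates for conjunctive hypotheses over categorical attributes, with "?" matching any value and None matching no value; the noise-free assumption follows the description above, while the three training examples are hypothetical.

```python
# A minimal sketch of candidate elimination for conjunctive hypotheses over
# categorical attribute vectors. This illustrates the boundary updates
# described above, not a particular published implementation.

def covers(h, x):
    """True if hypothesis h covers example x."""
    return all(a == "?" or a == b for a, b in zip(h, x))

def update(S, G, x, positive):
    """Shrink the version space (S, G) with one labelled example."""
    if positive:
        G = [g for g in G if covers(g, x)]   # drop general hypotheses that miss x
        # minimally generalize S so that it covers x
        S = tuple(v if s is None else (s if s == v else "?")
                  for s, v in zip(S, x))
    else:
        if covers(S, x):
            raise ValueError("noise: a negative example is covered by S")
        newG = []
        for g in G:
            if not covers(g, x):
                newG.append(g)
            else:
                # minimally specialize g, guided by S, so it excludes x
                for i, (gi, si) in enumerate(zip(g, S)):
                    if gi == "?" and si not in ("?", None) and si != x[i]:
                        newG.append(g[:i] + (si,) + g[i + 1:])
        G = newG
    return S, G

S, G = (None, None, None), [("?", "?", "?")]       # most specific / most general
examples = [(("sunny", "warm", "normal"), True),
            (("sunny", "warm", "high"), True),
            (("rainy", "cold", "high"), False)]    # added one at a time
for x, label in examples:
    S, G = update(S, G, x, label)
print(S)   # -> ('sunny', 'warm', '?')
print(G)   # -> [('sunny', '?', '?'), ('?', 'warm', '?')]
```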
d. COBWEB
This incremental algorithm is best suited to categorical data. It forms a cluster tree whose leaf nodes indicate the actual clusters. On the arrival of new information, the entire tree needs to be searched; the number of clusters and the depth of category memberships are decided by the category utility function. While traversing the tree to find a suitable cluster match, the algorithm may form a skewed or imbalanced tree. Traversing the tree for each new piece of information is time consuming and increases memory utilization, along with the complexity of merging and splitting tree nodes.
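For reference, the sketch below computes category utility for a candidate partition of categorical tuples, following the usual COBWEB formulation (average per-cluster gain in the expected number of correctly guessed attribute values); the example partition is hypothetical.

```python
from collections import Counter

# Category utility, the function COBWEB uses to decide how to place a new
# instance (host an existing cluster, create, merge or split). Each cluster
# is a list of equal-length tuples of categorical attribute values.

def expected_correct_guesses(cluster):
    """Sum over attributes i and values v of P(A_i = v | cluster)^2."""
    total = 0.0
    for i in range(len(cluster[0])):
        counts = Counter(x[i] for x in cluster)
        total += sum((c / len(cluster)) ** 2 for c in counts.values())
    return total

def category_utility(partition):
    data = [x for cluster in partition for x in cluster]
    base = expected_correct_guesses(data)        # score without any clusters
    gain = sum(len(c) / len(data) * (expected_correct_guesses(c) - base)
               for c in partition)
    return gain / len(partition)                 # average gain per cluster

partition = [[("sunny", "warm"), ("sunny", "cold")],
             [("rainy", "cold"), ("rainy", "cold")]]
print(category_utility(partition))               # -> 0.3125
```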
There are many more incremental algorithms, including ID5, ID5R, ILA, IOLIN, IPYRAMID, GDCP, incremental k-means, INTHELEX and Naive-Bayes-based approaches.
The authors of [12] state: "In every learning stage, an algorithmic learner takes as input one element of an
information sequence for some target concept and its previously made hypothesis and outputs a new hypothesis.
The sequence of hypotheses has to converge to a hypothesis describing the target concept correctly.”
V. DATA INCREMENTAL LEARNING
It has been observed that, on certain difficult learning tasks where non-incremental learning has a low probability of success, incremental learning is more efficient. Why and when incremental learning works depends on several factors inherent to the specific learning task.
SELF (Selective Learning with Flexible neural architectures) applies incrementality to both the data and the network. A small subset of the training set is given as input initially and is increased incrementally; if training fails to converge, further network units are added to increase capacity.
Another form of incremental learning is termed adaptive or parameter learning. Such learning involves adjusting parameters related to the learning rule, and it falls into two categories: local and global adaptive learning. RProp and QuickProp are examples of local adaptive learning techniques; they are called local because they consider only local information, such as a single weight, while learning, and they lend themselves to parallel processing (neural network learning also uses parallel processing). In contrast, global techniques use information about the entire network, i.e. the overall or average weights of impactful attributes, as in conjugate gradients. Because it incorporates the weights of all attributes and considers the overall network, this method requires more computation. Its important advantage is a faster convergence rate compared with back-propagation learning, the benchmark technique.
Epoch learning and pattern learning are further terms used in connection with incremental learning, as mentioned in [6]. If the variable weights are updated after each presentation of a single pattern, this is termed pattern learning; in contrast, if the weights are updated after presentation of the entire training set, it is epoch learning. If the training set contains a lot of redundant information, pattern learning is preferred. On-line learning is another term used alongside incremental and pattern learning, but pattern learning need not be on-line, since reuse of data is feasible in pattern learning and not in on-line learning. More details can be found in [6].
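The distinction can be stated directly in code: the sketch below contrasts pattern learning (one weight update per example) with epoch learning (one update per pass over the training set) for a linear model trained by gradient descent on synthetic data; the learning rates and epoch counts are arbitrary illustrative choices.

```python
import numpy as np

# Pattern learning updates the weights after every single pattern;
# epoch learning updates them once per pass over the whole training set.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def pattern_learning(X, y, lr=0.01, epochs=20):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):           # update after each single pattern
            w += lr * (yi - xi @ w) * xi
    return w

def epoch_learning(X, y, lr=0.1, epochs=500):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                # one update per full training set
        w += lr * X.T @ (y - X @ w) / len(y)
    return w

print(pattern_learning(X, y))              # both approach [1.0, -2.0, 0.5]
print(epoch_learning(X, y))
```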
VI. COMPONENTS REQUIRED FOR LEARNING FROM EXPERIENCE [7]
The set of learning tasks required varies with the domain, application, environment, data category, etc. Even so, most learning methods in machine learning are similar in many respects. In [7] and [8], the authors suggest general components required for learning from experience, so that specific learning methods can be compared with one another. Here is the list of suggested components:
- clustering
- initialization
- projection
- evaluation
- refinement
- aggregation
- storage
This is not the full list of learning components, but a suggested list for achieving learning and expanding knowledge from the concepts formed.
VII. CONCLUDING REMARKS
Various computer systems, networks of systems, technological advances and modern gadgets have become an ever increasing part of people's lives. These systemic changes open new and challenging areas of research and hence require more effective and flexible learning algorithms. Incrementality therefore needs to be entrenched in the design of artificial learning systems. Our research suggests that incrementality only adds flexibility and broadens applicability.
As described in the introduction to this tutorial, an algorithm possesses incremental learning capabilities if it meets the following criteria [10]:
1. Ability to acquire additional knowledge when new datasets are introduced.
2. Ability to retain previously learned information.
3. Ability to learn new classes if introduced by new data.
Our research proposes a simple, statistical, order-independent, closeness-based incremental clustering algorithm which achieves incremental learning and knowledge augmentation effectively.
REFERENCES
[1] Christophe Giraud-Carrier, "A Note on the Utility of Incremental Learning," Department of Computer Science, University of Bristol, Bristol BS8 1UB, U.K., AI Communications, ISSN 0921-7126, IOS Press, 1998. Available: http://www.cs.bris.ac.uk/Publications/Papers/1000535.pdf
[2] D. J. Hand and V. Vinciotti, "Choosing k for two-class nearest neighbour classifiers with unbalanced classes," Pattern Recognition Letters, vol. 24, pp. 1555-1562, 2003.
[3] Pavel Laskov, Christian Gehl, Stefan Krüger, "Incremental Support Vector Learning: Analysis, Implementation and Applications," Fraunhofer-FIRST.ID, Kekuléstrasse, 12489 Berlin, Germany, Journal of Machine Learning Research, vol. 7, pp. 1909-1936, 2006 (submitted 10/05; revised 3/06; published 9/06). Available: http://jmlr.csail.mit.edu/papers/volume7/laskov06a/laskov06a.pdf
[4] David Poole and Alan Mackworth, "Candidate Elimination Algorithm," in Artificial Intelligence: Foundations of Computational Agents, 1st ed., 2010. Available: http://artint.info/html/ArtInt_193.html
[5] Ben Coppin, "Artificial Intelligence Illuminated," 1st ed.
[6] Stephan K. Chalup, "Incremental Learning in Biological and Machine Learning Systems," International Journal of Neural Systems, vol. 12, no. 6, pp. 447-465, World Scientific Publishing Company, 2002. Available: http://www.cs.newscastle.edu.au/~chalup/papers/chalupIJNS2002.pdf
[7] Jeffrey C. Schlimmer and Richard H. Granger, Jr., "Incremental Learning from Noisy Data," Machine Learning, vol. 1, pp. 317-354, 1986. Available: http://link.springer.com/content/pdf/10.1007%2FBF00116895
[8] Tristan Ronald Ling, "An Incremental Learning Method for Data Mining from Large Databases," University of Tasmania, 2006. Available: http://eprints.utas.edu.au/793/trling_Honours_Thesis.pdf
[9] Samaneh Khoushrou, Jaime Cardoso, Liis Teixerira, "Evolution of different Incremental Learning Methods for Video Surveillance Scenarios," Faculdade de Engenharia, Universidade do Porto, 2010. Available: http://www.inescporto.pt/~jsc/publictions/conferences/2012SKhoushrouRECPAD.pdf
[10] Prerana Gupta, Amit Thakkar, Amit Ganatra, "Comprehensive study on techniques of Incremental learning with decision trees for streamed data," International Journal of Engineering and Advanced Technology (IJEAT), ISSN 2249-8958, vol. 1, issue 3, February 2012. Available: http://www.ijeat.org/attachments/File/V1Issue3/C0201021312.pdf
[11] Antoine Cornuéjols, "Getting Order Independence in Incremental Learning," reprinted from the European Conference on Machine Learning (1993), pp. 196-212. Available: http://www.lri.fr/~antoine/Papers/ac-ecml-93.pdf
[12] Steffen Lange and Gunter Grieser, "On the Power of Incremental Learning," Theoretical Computer Science, vol. 288, issue 2, pp. 277-307, 17 September 2002. Available: http://www-alg.ist.hokudai.ac.jp/~thomas/ALT99/ABS/lgtcs.html
[13] Nong Ye and Xiangyang Li, "A Scalable, Incremental Learning Algorithm for Classification Problems," Computers & Industrial Engineering, Elsevier, vol. 43, pp. 677-692, 2002. Available: http://enpub.fulton.asu.edu/ye/Published_journal_Papers/Ye_39.pdf
Author:
Preeti Mulay is a full-time research student at Bharati Vidyapeeth University, Pune, India. Her research areas include machine learning, incremental clustering and incremental learning. She completed her MS in Software Engineering at Wayne State University, MI, USA, as well as an M.Tech in Software Engineering at JNTU Hyderabad and a BE in Computer Technology. She has a couple of renowned publications to her name and has presented papers at various conferences and in journals. She has been in the education field, teaching at various levels including corporate training, since 1995.