Download Improved And Ensemble Methods For Time Series Classification

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Cluster analysis wikipedia , lookup

Nonlinear dimensionality reduction wikipedia , lookup

K-nearest neighbors algorithm wikipedia , lookup

Transcript
International Journal of Science and Engineering Research (IJ0SER),
Vol 3 Issue 11 November-2015
3221 5687, (P) 3221 568X
Improved And Ensemble Methods For Time Series
Classification With Cote
1
Yamunadevi S1, Dr.S.Uma2 ,Bhuvaneshwari P3
Student, 2Head of the deportment, 3 Assistant Professor,
Department of Computer Science and Engineering
Hindusthan Institute of Technology,
Coimbatore, Tamilnadu
India
Abstract -The main objective of this research is to analyze the state of the art techniques for time series classification dataset and discover an
efficient method that overcomes the major issues. And also this scenario is used to improve the classification accuracy and reducing the time
complexity using ensemble algorithm. Methods:The analysis has been done by different methods to provide the efficient time series
classification results. The diverseefficientmethods are considered to find the appropriateand effectivemethod. Findings:The various research
works has been analyzed and evaluated. From the analysis, the voting ensemble methodis found to be better for time series classification also
superior performance is achieved in terms of precision, recall, f-measure and accuracy. Application/Improvements:The findings of this work
PROVE that the voting method provides better result than other approaches.
Keyword:Data mining, time series classification, ensemble and voting method
1. INTRODUCTION
Data mining is used to extract the important data from the large
repositories. Data mining can be executed on data signified in
quantitative, textual or multimedia forms. Data mining
application could use several parameters to inspect the data.
They contain the concepts such as association, sequence
analysis, classification, clustering and forecasting. Association is
defined as patterns where one event is connected to other event,
like purchasing pen and bag. Path analysis or sequence is
described as patterns where one event guides to other event, like
the birth of a child and buying diapers. Classification is defined
as the identification of new patterns such as coincidence among
duct type purchases and plastic sheeting purchases. Clustering is
described as determining and visually documenting groups of
previously unknown facts, like geographic location and brand
preferences. Forecasting is nothing but determining patterns
from which one can make reasonable predictions regarding
future activities. Data mining tools are used to forecast the future
trends and behaviors.
In general, data mining is used in extensive series of profiling
practices like marketing, surveillance, fraud detection and
scientific discovery. A prominent concept is that constructing an
extracting model is part of superior process which contains
everything from defining the basic crisis that the representation
will resolve, to arranging the model into a functioning
environment.Data mining methods are the result of long research
process and product development. This development start while
business data was first saved on computers, continued with
improvements in data access, and more recently, generated
expertise’s that permit users for navigating via their data in real
time.
Yamunadevi ,Bhuvaneshwari,Uma…. (IJ0SER) November - 2015
Classification is defined as the process of analysis and
categorized into different classes. It is used to predict class labels
based on the specified model. The time series classification is
defined as constructing the classification model depends on
labeled time series and utilizes this model to classify the label of
unlabeled time series. The feature extraction methods applied
and build features from the time series data initially, then use the
classification methods such as support vector machine, k nearest
neighbor, regression and decision tree to the feature set.A novel
time series primitive called as time series shapeletswhich is used
to predict most informative data [1].
Time series similarity is an important and great attention in data
mining applications. Several time series representation and
distance measures are produced. The preceding research was
focused on the shape based similarity matching for low
dimensional dataset [2]. For high dimensional dataset, the
ensemble approaches and shapelets approaches are introduced.
The several data mining concepts and techniques are utilized to
evaluate the time series classification data.
2. ANALYSIS OF RESEARCH METHODOLOGY
In [3], AbdhullahMueen et.al (2011) discussed about logicalshapelets for time series classification. This scenario is notices
the issue through introducing a new algorithm that discovers
shapelets in a smaller amount of time than present methods via
an order of magnitude. The proposed algorithm is depends on
intelligent caching and reclaim of calculations, and the allowable
pruning of the seek space. Since the current algorithm is so
speedy, it makes an opening to consume extra communicative
shapelet queries. In specific, the scenario has shown for the first
time an amplified shapelet illustration that differentiates the
records depends on conjunctions or disjunctions of shapelets. In
International Journal of Science and Engineering Research (IJ0SER),
Vol 3 Issue 11 November-2015
3221 5687, (P) 3221 568X
this research, use a novel representation logicalshapelets.
Itdisplay the effectiveness of this approach on the
typicalstandard datasets utilized for these problems, and
demonstratenumerous case studies where logical shapelets
extensively achieve the innovative shapelet representation and
other time series categorizationmethods. We illustrate the utility
of thoughts in domains as various as motionidentification,
robotics, and biometrics. Consequently this technique has been
identified as the improvedmethod as the approach presents deep
study of superior accuracy classification for the given datasets.
The effectiveness of this scenario is better than the conventional
approaches. However it does not improve the worst case
complexity.
In [4], Lexiang Ye et.al (2011) suggested a novel method for
time series shapelets. Classification of time series has been
attracting great interest over the past decade. The simple nearest
neighbor approach is hard to handle the several time series
problems, particularly for high dimensional datasets. Initially,
the nearest neighbor approachneeds storing and seeking the
complete dataset, resulting in a long time and space
difficultywhich restricts its applicability, particularly on resource
restricted sensors. Subsequently, beyond simplecategorization
accuracy, we frequentlydesire to achieve some impending into
the information and to construct the categorization result more
understandable, which global uniqueness of the nearest neighbor
cannot make available. In this research, we introduced novel
time series primitive, time series shapelets, which focused these
restrictions. Casually, in some sense shapelets are time series
subsequences which are maximally commissioner of a group.
This scenario can utilize the distance to the shapelet, than the
distance to the nearest neighbor to categorize objects. It has
shown along with widespread empirical estimations in various
domains, categorization algorithms depends on the time series
shapelet primitives can be interpretable, more precise, and
considerablyquicker.
In [5], Jason Lines et.al (2012) presented a shapelet transform for
time series classification. The problem of time series
classification (TSC), the research consider any real-valued
structuredinformation a time series, provides a particular
machine learning task as the organizing of variables is
frequentlyessential in discovering the optimalselective features.
One of the most emerging current techniques is to discover shape
lets within a data set. A shape let is a time series subsequence
which is recognized as being characterized of class association.
The research scenario in this area embedded the process of
determining shape lets within a decision tree. We recommend
disconnecting the procedure of decision shape lets from the
categorization algorithm through proposing a shape let
transformation. Itportrayhowmined the k best shape lets from a
data set in a single pass, and utilize these shape lets to change
information by means of computing the distances from a series
to every shape let. This research demonstrateswhich
transformation
into
this
novel
data
space
couldprogresscategorizationaccurateness, whilemaintaining the
descriptive power presented through shape lets.
Yamunadevi ,Bhuvaneshwari,Uma…. (IJ0SER) November - 2015
In [6], Mustafa GokceBaydogan (2013) discussed a bag of
features structure to categorize time series.Time series
categorization is an essentialprocess along with several
challenging applications. A nearest neighbor (NN) classifier
along with dynamic time warping (DTW) distance is a powerful
solution in this research area. On the other hand, a feature-based
technique is enhanced as both classifiers and to presentimminent
into the series, however these methods has issue indealing
translations and dilations in local patterns. To overcome the
above mentioned issues,the scenario proposed a structure to
categorize time series based on a bag-of-features representation
(TSBF). Multiple subsequences chosen from random settings
and of random lengths are separated into smaller intervals to
hold the local information. Therefore, features calculated from
these subsequences determine properties at diverse locations and
dilations when inspected from the novel series. It presents a
feature-based method which can deal warping though in a
different way from DTW. Furthermore, a supervised learner
whichdeals mixed data types, diverse units and incorporates
location data into a compact codebook via class probability
computes. Moreover, applicable global features couldsimply
supplement the codebook. TSBF is evaluated to NN classifiers
and other alternatives such as bag-of words strategies, sparse
spatial sample kernels alsoshapelets.
In [7], Houtao Deng (2013)presented a time series forest for
classification and feature extraction. A tree-ensemble technique,
named as time series forest (TSF), is introduced for time series
categorization. TSF uses a mixture of entropy gain and a
distance measure, called as the entry (entropy and distance) gain,
for estimating the splits. An important feature is extracted which
captures the temporal characteristics for high feature space. In
this research work, the method is used to illustrate that the
entrance gain increases the accuracy of TSF. TSF arbitrarily
samples features ineverytree node and has execution complexity
linear in the length of time series, and couldconstructedby using
parallel computing methods. The temporal significance curve is
enhanced to obtain the temporal distinctiveness useful for
categorization. The scenario finallydisplay that TSF using simple
features such as mean, standard deviation and slope is
computationally resourceful and performs powerful competitors
such as one-nearest-neighbor classifiers along with dynamic time
warping.
In [8], Jon Hills et.al (2014) discussed classification of time
series by shapelet transformation. Time-series classification
(TSC) problems provide a particular challenge for classification
algorithms that how to calculatelikenessamong series. A
shapeletis a time-series subsequence which permits for TSC
depends on local, phase-independent likeness in shape. Shapeletbased categorization utilizes the similarity among a shapelet and
a series as aunbalanced feature. One advantage of the shape let
method is that shapelets are understandable, and
couldpresentimminent into the problem domain. The original
shapelet-based classifier inserts the shapelet-discovery approach
in a decision tree, and utilizes information gain to evaluate the
quality of candidates, discovering a novelshapeletbyall node of
International Journal of Science and Engineering Research (IJ0SER),
Vol 3 Issue 11 November-2015
3221 5687, (P) 3221 568X
the tree via an enumerative search. Subsequent scenario is
concentrated mainly on methods to speed up the search. This
scenario has analysis how most excellent to use the shapelet
primitive to build classifiers. In this research, method introduced
named as single-scan shapelet algorithm which determines the
best k shapelets,which are utilized to construct a transformed
dataset, where each of the k features symbolized the distance
among a time series and a shapelet. The main benefits over the
embedded method are that transformed data is used in
conjunction along with any classifier, and that there is no
recursive search for shapelets. Itreveals that the transformed
data, in conjunction along with more complex classifiers,
producessuperior accuracy rather than the embedded shapelet
tree. This scenario also executes three similarity measures that
generate equivalent results to information gain in small amount
of time.
In [9], Anthony Bagnall et.al (2015) suggested time series
classification with collective of transformations based ensemble
methods. It is mainly used for increasing the classification
accuracy than preceding research. Another algorithm is named as
Time series classification (TSC) which is used for transformation
process which is based on comparative features. COTE contains
classifiers constructed in the time, frequency, change, and
shapelet transformation domains combined in alternative
ensemble structures.The flat collective of transform-based
ensembles (flat-COTE) weights the vote of each classifier by its
cross-validation accuracy on the training data.We compare the
accuracy ranks of ensembles constructed on each transform
domain, flat-COTE, and, for bench marking, 1-NN with
Euclidean distance and dynamic time warping with warping
window set through cross validation. The mean rank of flatCOTE is significantly higher than all of the other classifiers.
The information provided by the shapelet transform and, to a
lesser extent, the power spectrum and change domains, provides
discriminatory features that are hard, if not impossible, to detect
in the time domain. COTE is used to provide better performance
for all specified dataset than other datasets. Voting method is
used to produce higher performance in terms of accuracy,
precision, recall and f-measure rather the preceding research.
Thus this method has been identified as the better technique as
the approach provides deep analysis of higher classification
performance for the specified datasets. The efficiency of this
approach is better than the other techniques and also it provides
pathway for further improvement.
provisioning the future enhanced than can be performed to
overcome its shortcomings.
Table 1. Comparison of Research Methodologies
S.N
O
TITLE
PAPER
OF
AUTHO
R NAME
MERITS
DEMERIT
S
1
Logicalshapelets: an
expressive
primitive for
time
series
classification
Mueen,
Abdullah
, Eamonn
Keogh,
and Neal
Young
Efficiency
is improved
It has worst
case
complexity.
2
Time
series
shapelets:
a
novel
technique that
allows
accurate,
interpretable
and
fast
classification
A
shapelet
transform for
time
series
classification
Ye,
Lexiang,
and
Eamonn
Keogh
Space
complexity
is reduced
Training
time takes
long time
Lines,
Jason, et
al
Increased
the
classificatio
n accuracy
It degrades
the
multi
class
problems
Baydoga
n,
Mustafa
Gokce,
George
Runger,
and
Eugene
Tuv
It
deals
with high
dimensiona
l data
It does not
satisfy
particular
datasets
A time series
forest
for
classification
and
feature
extraction
Deng,
Houtao,
et al
High
quality
results
Classification
of time series
by
shapelet
transformation
Hills,
Jon, et al.
3
4
5
A
bag-offeatures
framework to
classify time
series
Yamunadevi ,Bhuvaneshwari,Uma…. (IJ0SER) November - 2015
6
Essential
features are
selected
of
Space
efficiency is
reduced
High
interpretabl
e results
3. COMPARISON METHODOLOGIES
This section provides an overview about the pros and cons that
are occurred in the research methodologies whose functional
scenarios are discussed in depth in the previous section. From
the following table, it can be predicted a better approach that
provides considerable improvement in the proposed scenarios.
The table helps in detecting the efficient technique as well as in
It handles
noisy
dataset
effectively
It handles
large
volume of
dataset
It is slow
process
International Journal of Science and Engineering Research (IJ0SER),
Vol 3 Issue 11 November-2015
7
Time-Series
Classification
with COTE:
The Collective
of
Transformatio
n-Based
Ensembles.
Bagnall,
Anthony,
et al
Accuracy is
increased
Space
storage is
more robust
It requires
further
improveme
nt
Overall
performanc
e
is
improved
Time
complexity
is reduced
4. CONCLUSION
In this section, the conclusion decides that various recent
progress in time series classification problems. The time series
classification is an important factor and various classification
methods are applied to research. The classification accuracy and
performance is improved significantly by using various
techniques. In this analysis, survey of different methodologies is
conducted and various performance parameters are used to
discover the better approach for better classification results. The
voting ensemble method provides higher classification
performance rather than other techniques.
5. ACKNOWLEDGEMENT
We the authors assure you that, this is our own work and also
assure you there is no conflict of interest.
6. REFERENCES
1. Ye, Lexiang, and Eamonn Keogh, Time series shapelets: a new primitive for
data mining, Proceedings of the 15th ACM SIGKDD international conference on
Knowledge discovery and data mining, ACM, 2009.
2. Lin, Jessica, RohanKhade, and Yuan Li, Rotation-invariant similarity in time
series using bag-of-patterns representation, Journal of Intelligent Information
Systems 39.2 (2012): 287-315.
3. Mueen, Abdullah, Eamonn Keogh, and Neal Young, Logical-shapelets: an
expressive primitive for time series classification, Proceedings of the 17th ACM
SIGKDD international conference on Knowledge discovery and data mining,
ACM, 2011.
4. Ye, Lexiang, and Eamonn Keogh, Time series shapelets: a novel technique
that allows accurate, interpretable and fast classification, Data mining and
knowledge discovery 22.1-2 (2011): 149-182.
5. Lines, Jason, et al, A shapelet transform for time series classification,
Proceedings of the 18th ACM SIGKDD international conference on Knowledge
discovery and data mining, ACM, 2012.
Yamunadevi ,Bhuvaneshwari,Uma…. (IJ0SER) November - 2015
3221 5687, (P) 3221 568X
6. Baydogan, Mustafa Gokce, George Runger, and Eugene Tuv, A bag-offeatures framework to classify time series, Pattern Analysis and Machine
Intelligence, IEEE Transactions on 35.11 (2013): 2796-2802.
7. Deng, Houtao, et al, A time series forest for classification and feature
extraction, Information Sciences 239 (2013): 142-153.
8. Hills, Jon, et al, Classification of time series by shapelet transformation, Data
Mining and Knowledge Discovery 28.4 (2014): 851-881.
9. Bagnall, Anthony, et al, Time-Series Classification with COTE: The
Collective of Transformation-Based Ensembles, IEEE transactions on knowledge
and data engineeri l. 27, no. 9, 2015.