Activity Recognition Using, Smartphone Based, Accelerometer Sensors
by
Ricardo Emanuel Gouveia da Costa Cachucho
Thesis of MADSAD on Data Mining
Supervised by
João Mendes Moreira
João Gama
2011
I was born in Madeira in 1986 and lived there until 2004, when I moved to Porto to
study economics and then Data Mining at the Faculty of Economics of Porto (FEP).
In Madeira I took part in several groups connected to outdoor activities: I was a member
of the first paintball team in Madeira and of MadeiraEcoChallenges, a group created to
organize eco-friendly activities such as bike tourism (Porto-Algarve 2006 edition), trekking
(crossing Madeira on foot in 2008) and other nature-related activities.
During my bachelor's degree in Porto I developed a particular desire to learn how to model
complex problems, and that was one of the main reasons for enrolling in the master in
Data Analysis and Decision Support Systems. During this period I have also been a
member of the institution's choir, eCOROmia, a great school in terms of human relations
and public performances. It is also with great pleasure that I am a member of the
Pedagogic Board of FEP, which has given me the opportunity to take part in very important
initiatives at FEP over the past year and a half.
After almost 7 years living in the beautiful city of Porto I moved to Leiden, where I plan
to stay for the next few years to improve my knowledge of Data Mining with the Pattern
Recognition group of LIACS while living the Dutch dream: “going by bike to work”.
(a) Mestrado em Análise de Dados e Sistemas de Apoio à Decisão
(b) Ramo do Conhecimento: Data Mining
Acknowledgments
Writing the acknowledgments is the nicest part! A lot of people helped me in this work,
directly and indirectly, and I will try to thank them all, even if they don't appear here.
This work is the result of a partnership between the master I took part in (MADSAD) and
LIACS. I have to start by thanking Professor Carlos Soares for pushing this partnership
forward. I thank both my supervisors, Professor João Gama and João Moreira, for their
patience, encouragement and sharing of knowledge during this last year; I hope we can
continue this relationship in the future, improving this work.
To Arno Knobbe, my supervisor in Leiden, a special thanks for receiving me, making
me part of your group, encouraging the sharing of knowledge and giving me an opportunity
for my life; I hope our expectations come true in the near future. To all the rest of the
guys in the Pattern Recognition group of LIACS, Ugo Vespier, Shenfa Miao, Marvin
Meeng, Joaquin Vanschoren and the unpronounceable Wouter Duivesteijn: thank you
for your daily support in sharing knowledge, brainstorming and reviewing my work.
To my group of friends in Madeira, Porto and Leiden: you make my life enjoyable;
thank you for inspiring me and making me believe in the future. A special thanks to
Irene, who has been by my side, loving me, listening and inspiring my work; being
next to you is my daily inspiration.
Last but not least, to my family, who support me in every way possible: Alexandra,
Miguel, Cristina and Manuel, you have been my support in Porto in these last years; mother
Cecília and father Cachucho, to you I owe almost all I have done until now. The family
you have built is admirable, and I thank you for that. And finally to little
João Tomé: you were born during one night of writing and reviewing this thesis, and
when I learned that you were so perfect I found the strength to work several days without
sleeping; so young and yet already so inspiring!
Resumo
Activity Recognition is a research area that aims to monitor individuals in an efficient
and non-intrusive way. One of the most discussed approaches is for individuals to wear
sensors, for example accelerometers, which collect information for building models and
subsequently classifying unlabeled data.
The information from this modeling can serve applications in social interaction,
marketing or health services. In healthcare, sensors are already widely deployed to
capture information, but applications that are non-intrusive and produce up-to-date
results can help in disease prevention, health education (through recommender systems),
alert systems for elderly people who wish to live independently but are at high risk of
accidents, or monitoring of patient recovery.
The goal of this work is to create a general model that can be applied in a mobile
environment (smartphones). Proposals for this problem already exist in the literature,
but none of them considers the problem of the phone's orientation during data collection.
In addition, we propose that this problem be interpreted as a hierarchical classification
problem, in which we go from more general characteristics of the activities (passive or
active activities) to more specific ones (walking, running, sitting, standing still).
The experiments aim to show that knowledge about the problem increases through
hierarchical learning and that it is possible to produce better results than the usual
approach to classification problems. We also try to combine the hierarchical approach
with the construction of attributes that can deal with the orientation problems resulting
from the rotation of the sensors used to capture the data.
Abstract
Activity Recognition is the task of predicting which activities are taking place at a
given moment, considering a single individual user. A common approach to this
problem is to collect data with body-worn sensors. The development of small,
inexpensive sensors has made this area a reality, and most approaches propose the use
of accelerometer sensors. For example, considering the two biggest smartphone
operating systems, all of their available smartphones have a built-in triaxial
accelerometer.
With models for activity recognition it is possible to build new applications for
social interaction, marketing or healthcare systems. Focusing on the healthcare domain,
it is easy to see that this is already one of the areas that makes use of sensing.
There is, however, the opportunity to create applications that can help, in an efficient
and less expensive way, in the recovery of patients, in preventing diseases, in educating
individuals for a healthier life or in monitoring elderly people who want to live
independently.
The main goal of this thesis is to create a generalized model to be applied in a mobile
environment. There are some proposals in the literature for this problem, but they do not
consider the orientation of the device collecting the data. The rotation of the sensor
needs to be taken into account when collecting data from smartphones. We also found
that this problem can be interpreted as a hierarchical model, in which the classes are
described from more general characteristics down to lower levels of classification with
more specific information.
The purpose of the experiments is to show the qualities of the hierarchical approach to
the problem of activity recognition, and to develop features that can deal with
orientation problems, so that this model can be applied in a mobile environment.
Contents
Chapter 1: Introduction
  1.1 Our Problem: Sensor-based, single-user activity recognition
  1.2 Motivation
  1.3 Challenges
  1.4 Contributions
  1.5 Outline
Chapter 2: Related Work
  2.1 Single User, Sensor-based Activity Recognition
  2.2 Feature Transformation
    2.2.1 Feature Transformation for Activity Recognition
  2.3 Modeling
    2.3.1 Hierarchical Classification
Chapter 3: Methodology
  3.1 Data Mining Overview: CRISP-DM
  3.2 Features Construction and Selection
    3.2.1 Sliding Windows
    3.2.2 The orientation problem
  3.3 Modeling
    3.3.1 Learning Algorithm
    3.3.2 Hierarchical classification
  3.4 Evaluation
    3.4.1 Model Selection and Assessment
  3.5 Conclusion
Chapter 4: Data Analysis
  4.1 Data Collected
  4.2 Data Description
    4.2.1 Time Series
    4.2.2 Non Temporal Analysis, Boxplots
    4.2.3 Non Temporal Analysis, Scatterplots
    4.2.4 Non Temporal Analysis, Pearson Correlations
Chapter 5: Experiments and Modeling
  5.1 Experimental Setups
  5.2 Baseline Experiments
    5.2.1 All Activities Classification
    5.2.2 Active Activities Classification
    5.2.3 Passive Activities Classification
  5.3 First set of Statistical Measures
    5.3.1 All Activities
    5.3.2 Active Activities
    5.3.3 Passive Activities
  5.4 Hierarchical Classification
    5.4.1 Sub-grouping into Passive and Active Activities
  5.5 The Orientation Problem
    5.5.1 Implementing 1st Derivatives
  5.6 Model Assessment
    5.6.1 Assessment of the Flat Classifier
    5.6.2 Assessment of the Hierarchical Classifier
  5.7 Discussion
Chapter 6: Conclusions and Future Work
References
Chapter 1: Introduction
In this chapter, we will give an overview of the topic discussed in this thesis. We will
start with a definition of the central problem of activity recognition, and provide a
motivation for this work in terms of the interactions between man and machine. We will
then present the main challenges in activity recognition, as inspired by our initial
investigation and analysis of the relevant literature. In Section 1.4, we present the main
contributions of this thesis, in terms of single-user activity recognition, with a particular
focus on triaxial accelerometers (sensors for measuring acceleration in three
perpendicular directions). We conclude with an outline of the remainder of this thesis,
in Section 1.5.
1.1 Our Problem: Sensor-based, single-user activity recognition
Activity recognition is the task of recognizing the actions of individuals, based on the
limited information from sensors on and around the individual. We can identify two
main sources of information for this recognition task: environmental inputs (e.g. Wi-Fi,
vision-based recognition, or other sensors placed in the environment [1]) or body-worn
sensors (smartphone inputs, networks of wearable sensors). In this thesis, we will focus
on the second source of input. We will specifically consider the applicability of a
practical mobile device already in use by the large majority of the population in
developed countries: the mobile phone, and more specifically the smartphone. This choice
makes sense because smartphones are typically equipped with a triaxial accelerometer.
An accelerometer is a sensor that measures the force acting upon it, be it from
physical acceleration or from the Earth's gravity. Most accelerometers measure
acceleration along either 2 (X and Y) or 3 axes (X, Y and Z). The majority of
smartphones are fitted with a triaxial accelerometer (three axes), since the two most
used smartphone software platforms (Android and iPhone) require this type of sensor.
The accelerometer in smartphones is used by the operating system of the phone to
perform orientation-sensitive tasks (such as rotating the screen to match the view of the
user), as well as by various applications installed on the phone. With this useful
capability already built into the phone, our aim is to 'abuse' the sensors for the task of
continuously capturing the forces acting on the phone due to the owner's activities.
The data thus produced will be the starting point for training an automated
system to recognize the person's activity from acceleration data. To obtain a system
that recognizes activities, we build upon the large body of work in the Data Mining field
[references].
The input data considered for this mining exercise will be streaming data from
accelerometer sensors collected from smartphones with Android systems (although the
proposed method could work just as well on alternative smartphones). More formally,
the definition of our initial problem is: from a stream of accelerometer data
$S = \{s_0, \dots\}$ with $s_i = (x_i, y_i, z_i)$, try to predict one activity from a
predefined set of activities $A = \{a_1, \dots, a_k\}$. In our experiments, the set of
activities will cover simple activities from everyday life:
$A = \{\text{walking}, \text{running}, \text{standing}, \text{sitting}\}$.
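The formal definition above can be made concrete with a small data model. The following is only a minimal sketch: the names `Activity`, `Sample` and `parse_stream`, and the comma-separated text format of the input rows, are assumptions made for illustration and are not taken from the data collection software used in this work.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator, List


class Activity(Enum):
    """The predefined activity set A used in the experiments."""
    WALKING = "walking"
    RUNNING = "running"
    STANDING = "standing"
    SITTING = "sitting"


@dataclass(frozen=True)
class Sample:
    """One accelerometer reading s_i = (x_i, y_i, z_i), in units of g."""
    x: float
    y: float
    z: float


def parse_stream(rows: Iterable[str]) -> Iterator[Sample]:
    """Turn 'x,y,z' text rows (a hypothetical format) into Sample objects."""
    for row in rows:
        x, y, z = (float(value) for value in row.split(","))
        yield Sample(x, y, z)


# A tiny made-up stream S = {s_0, s_1}
stream: List[Sample] = list(parse_stream(["0.0,1.0,0.0", "0.1,0.9,0.2"]))
```

A classifier in this setting is then any function that maps part of `stream` to one member of `Activity`.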
The aim of this work is to find a generalized activity recognition model for simple
daily-life activities such as walking, running, standing and sitting. Our working
hypothesis is that exploiting the hierarchy of the classes improves the model, which we
evaluate with several measures of accuracy and efficiency.
To explore this problem and hypothesis, section 1.3 presents the challenges I found to
be important. Building on these challenges, chapter 3 presents the modeling
methodology, where some aspects of our hypothesis are justified.
1.2 Motivation
During the last decade, the evolution of inexpensive and wearable sensors such as the
accelerometer, GPS receiver, cameras and microphones, along with computational
developments in terms of hardware and software, has opened a new field of
opportunities in the mobile applications domain. Any of these sensors can easily be
found in any smartphone, and smartphones are widespread in almost all mobile
telecommunications markets. This world of sensing is changing the paradigm of the human
relationship with the machine.
This new paradigm is what the literature calls Ubiquitous Computing or
Pervasive Computing. Until now, our relationship with machines has been
characterized by the fact that we receive a reaction, or a set of reactions, to a direct
instruction that we give. For example, when we drive a car we give instructions to
the car and it responds according to these instructions, and when we type something into
a computer we normally expect some result from it. The development of activity
recognition models is one of the steps needed to turn our actions into reactions
from computers and other automated systems without our having to give direct
instructions, and often without our being conscious of it.
The most promising field in which to apply these activity recognition models is the
healthcare domain, because of the high cost of monitoring people today. We could
consider automatically and constantly checking the recovery progress of patients whose
activities are constrained, or monitoring elderly people who want to live independently
but at the same time need some monitoring because their risk of falling is high. Another
possibility is the implementation of recommender systems that help people avoid
sedentary ways of life. The enormous field of application, along with the need societies
have to make healthcare systems more efficient, should be enough for activity
recognition to be considered a hot topic.
There are also opportunities to enrich activity recognition models with information
about networks of people, businesses and places. By classifying activities, determining
location and crossing this information with networks, it is possible to create efficient
connections. One could imagine a ubiquitous computing system connecting people with
people, people with services and people with places.
The development of systems in which machines adapt to the characteristics of the people
interacting with them has such potential that I believe they will be developed within a
short period of time. On the other hand, this area is also raising discussions about
privacy for the subjects being monitored, and therefore there is a need to develop an
ethical methodology for Ubiquitous Computing. One good reason to do this is that
societies are more and more aware of these systems, and a lack of confidence in the
systems' privacy can be a setback for the development of this area. Although it is my
belief that ethical methodological standards for this area must be discussed, it is not
my aim to discuss this topic in this thesis.
1.3 Challenges
In this section we give an overview of the challenges to consider when facing
a single-user activity recognition problem built from accelerometer data.
What kind of transformations can be applied to the raw data streams in order to classify
the activities? Feature construction and selection is a much discussed area in the Data
Mining literature and one of the biggest discussions within the topic of activity
recognition. The approaches taken so far can be found in section 2.2.
Is the orientation of the smartphone important for a good classification? Yes! It is
possible to detect the orientation of the device because the accelerometer can detect the
direction of Earth's gravity. With that in mind, we will build a classifier for
smartphones that can be placed in any orientation in the pants pocket. We therefore face
the problem of which features to create in order to achieve good accuracy for the
classifier. Figure 1-1 shows the possible axis rotations inside a pocket.
Figure 1-1 Example of a smartphone rotation.
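To illustrate why orientation matters for feature construction, the sketch below rotates a reading of a phone at rest and compares the per-axis values with the magnitude of the acceleration vector. The magnitude is used here only as a well-known example of a rotation-invariant quantity; it is an assumption for illustration and not necessarily one of the features constructed in this thesis. The helper names `magnitude` and `rotate_about_z` are hypothetical.

```python
import math
from typing import Tuple

Vector = Tuple[float, float, float]


def magnitude(sample: Vector) -> float:
    """Euclidean norm of one reading; unchanged by any rotation of the axes."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def rotate_about_z(sample: Vector, angle_rad: float) -> Vector:
    """Rotate a reading about the Z axis, as if the phone turned in the pocket."""
    x, y, z = sample
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y, z)


upright = (0.0, 1.0, 0.0)                           # at rest, gravity on the Y axis
turned = rotate_about_z(upright, math.radians(90))  # same phone, turned by 90 degrees

# Per-axis values change completely, while the magnitude stays at 1 g.
print(upright, magnitude(upright))
print(turned, magnitude(turned))
```

Features computed directly on X, Y and Z would differ between the two readings even though the activity is the same, which is the difficulty described above.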
What kind of activities can be recognized using accelerometer data? Some works
have considered many basic daily activities [2, 3] (walking, running, standing,
sitting, walking upstairs, walking downstairs…), but as a start I will only consider
simple activities that can be classified at any time: walking, running, standing and
sitting.
Do different placements of the smartphone change the signal pattern of the
accelerometer for a single activity? So far, all the works consider only one specific
placement and orientation for the accelerometer [3, 4] or network of sensors [2, 5].
When the data are collected from phones, they all consider the pants pocket to be the
best place, because the reference work [2] claims that the sensor placed on the hip is
the most informative in the network of sensors used, and also because it is a typical
location for carrying a smartphone.
How should we collect a sample in order to make inferences about a general population?
To build a general activity recognition model we need to think about the placement of
the device and also about whether the subjects are representative of the
population (for example, are the patterns of activities the same for males and females?).
In practice it is very difficult to collect data from many people: either we have the
resources to pay subjects, or finding candidates who will actually carry out the
collection can be a very long process.
What kind of model offers a good tradeoff between accuracy and computational cost
on mobile devices? Section 3.3.1 discusses which characteristics a learning algorithm
should have in order to balance accuracy and performance.
Which kind of learning procedure can we develop when only a small amount of
labeled data is available? There is a very good work [6] suggesting that it is possible
to train a model with 5% of labeled data. The problem I see here is that those 5%
must be representative of the activity patterns of the population, so we should first
search for a good generalized model and only then move to semi-supervised learning. This
approach looks promising because each model would then fit its own user, which brings
improvements in accuracy and better information about each user.
Should we consider the temporal continuity of the activities? If temporal continuity
is taken into account, the classifier can be improved by considering a chain of
activities. Consider a sequence of five classifications, “walking”, “walking”,
“running”, “walking”, “walking”: the probability that “running” is a misclassification
is high, and temporal continuity could help to solve this problem.
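One simple way to exploit temporal continuity is a majority vote over neighbouring predictions. This is a minimal sketch of that idea under stated assumptions, not a method evaluated in this thesis; the function name `smooth_predictions` and the window width of 3 are illustrative choices.

```python
from collections import Counter
from typing import List, Sequence


def smooth_predictions(labels: Sequence[str], width: int = 3) -> List[str]:
    """Replace each label by the most common label in a window centred on it.

    The window is truncated at both ends of the sequence. Ties are resolved
    in favour of the label that appears first inside the window.
    """
    half = width // 2
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


predicted = ["walking", "walking", "running", "walking", "walking"]
corrected = smooth_predictions(predicted)
print(corrected)  # every entry is "walking"
```

A wider window removes longer runs of isolated errors, at the price of reacting more slowly when the user really changes activity.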
1.4 Contributions
The claim of this thesis is that it is possible to create a simple generalized
hierarchical classification model for activity recognition, using a few time-domain
features extracted from sliding windows.
We build this model from a dataset collected partially for previous work in activity
recognition [7] and partially for this research. Its inputs are the timestamp, the
acceleration values on the X, Y and Z axes, and a label stating which activity is taking
place for each record. The framework for feature construction relies on knowledge about
each activity considered, in order to capture the characteristics that best discriminate
between activities.
We then present a modeling approach not previously used for activity recognition, which
makes it possible to minimize the number of features needed for a good classification
model. The state of the art in activity recognition proposes a flat classification
model: the transformed input data lead directly to a class prediction. In these previous
works, however, most of the misclassifications are made between similar
classes, for example walking upstairs, walking downstairs and regular walking
(see the confusion matrices in Kwapisz et al. [3]).
[Figure: accelerometer data streams, frequency = 20 Hz. Panels: All Activities; Active and
Walking; Active and Running; Passive and Sitting; Passive and Standing. Vertical axis:
acceleration in G force, from -2 to 2. Red = X axis; Green = Y axis; Yellow = Z axis.]
Figure 1-2 Example of accelerometer data for different activities.
Considering the classification of four activities (walking, sitting, standing and
running), it is possible to see, by visualizing the signal produced by the accelerometer
for each activity, that we are facing two different kinds of upper-level activities:
Passive (activities that do not involve movement: standing and sitting) and Active
(activities that involve movement: walking and running). From this hierarchical
conceptualization we developed a model that first classifies any activity as Passive or
Active, and then applies a lower-level classification model: standing versus sitting for
a Passive activity, or walking versus running for an Active one.
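The two-level structure described above can be sketched as a small wrapper around three classifiers. This is an illustration only: the class `HierarchicalClassifier`, the two input features and every threshold are hypothetical stand-ins for trained models, chosen so that the example is self-contained.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


class HierarchicalClassifier:
    """Two-level classifier: a top-level model picks a group, then a
    group-specific model picks the final activity."""

    def __init__(self, top: Classifier, children: Dict[str, Classifier]) -> None:
        self.top = top
        self.children = children

    def predict(self, features: Features) -> str:
        group = self.top(features)             # "passive" or "active"
        return self.children[group](features)  # e.g. "sitting" or "running"


# Hypothetical stand-ins for trained models. Features are assumed to be
# (standard deviation of the acceleration, mean acceleration on the Y axis);
# every threshold below is invented for illustration.
def passive_or_active(f: Features) -> str:
    return "active" if f[0] > 0.1 else "passive"


def walking_or_running(f: Features) -> str:
    return "running" if f[0] > 0.6 else "walking"


def sitting_or_standing(f: Features) -> str:
    return "standing" if f[1] > 0.7 else "sitting"


model = HierarchicalClassifier(
    top=passive_or_active,
    children={"active": walking_or_running, "passive": sitting_or_standing},
)
print(model.predict((0.8, 0.2)))    # active branch, high variation
print(model.predict((0.02, 0.95)))  # passive branch, gravity mostly on Y
```

Because each lower-level model only has to separate two similar classes, it can in principle rely on fewer features than a flat four-class model, which is the motivation given above.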
1.5 Outline
We briefly provide some background information in the next chapter. Section 2.1
introduces the related work on activity recognition based on sensor information.
Section 2.2 describes concepts and literature on a major task in activity recognition,
feature transformation. Section 2.3 presents literature related to the following step in
activity recognition: the application of learning algorithms in order to build a
generalized model. It includes a short section, 2.3.1, on hierarchical classification,
which had not previously been applied to the problem we are dealing with.
Chapter 3 explains our framework and the methodologies proposed to address our problem
in more detail. Section 3.1 describes some notions of the CRISP-DM standard model
and how it helps to give an overview of what was developed during this
work. The techniques for transforming the original data in order to build a good
model are explained in Section 3.2, where we present the concept of the sliding window
and how the construction of features is directly related to the orientation problem
described in section 1.3.
Chapter 4 is dedicated to data analysis, as preparation for the chapter in which we
present the experimental procedures. In section 4.1 we present the characteristics of
the dataset initially collected. Section 4.2 starts by interpreting the dataset as a
time series and then moves to a non-temporal analysis, since section 3.2.1 proposes
sliding windows to extract timeless features.
Chapter 5 summarizes the experimental setup and outcomes. We start the experimental
procedures by presenting the baseline results and then move through sections 5.2 to
5.5, a chain of experiments that leads us to the selection of two models to
compare in the model assessment in section 5.6. The chapter ends with a
brief discussion of some of the choices made during the experimental procedures or
while studying methodologies and related work. Finally, we conclude our work in
Chapter 6 with some conclusions and ideas for future work.
Chapter 2: Related Work
In this chapter we start by analyzing the state of the art in activity recognition
for a single-user classifier model that takes accelerometer data as input. The overview
of the related work on this topic can be found in section 2.1.
The data mining approach to activity recognition involves many techniques, such as
feature transformation (feature construction and selection) and learning methods.
Feature transformation is addressed in section 2.2, first as a general data mining task
and then focusing on what has been done in activity recognition.
Finally, we move to the machine learning area in section 2.3. Since it is impossible to
give a complete overview of the related work on this topic for the purposes I intend,
the starting point for deciding which learning algorithms to consider is the
characteristics of the dataset and their influence on that decision. This section also
contains some references to the topic of mobile data mining.
2.1 Single User, Sensor-based Activity Recognition
The topic of single-user activity recognition based on accelerometer data is recent, but
there have already been many approaches to it. Scientific conferences that can be used
as references on this topic include UbiComp, ISWC, Pervasive, ACM SIGKDD and
SensorKDD.
The common approach to the single-user activity recognition problem [2-4, 8, 9] is a
two-stage process. First, features are derived from the raw accelerometer data,
extracted from batches of data (normally called sliding windows). Then one or more
classifiers are applied to recognize the different activities considered in each work.
Bao & Intille [2] have the most referenced work on this topic, cited in almost all the
works published after 2004. They collected a sample from 20 subjects in an
unsupervised way, using a network of sensors placed simultaneously on different parts of
the body. In this work [2] they reach two conclusions that are very important for the direction of
the research done afterwards. The first indicates that there are good possibilities of
creating a generalized model. Second, they concluded that the most discriminative
accelerometer, in terms of which activity is taking place, was the one placed on the hip.
This is a good indicator that the pants pocket is a suitable placement to collect the input
data for a good model, and this conclusion can also be found in other works [3, 4]. If such
a model is to be taken to the application field, we also need to consider other common
placements for the smartphone (e.g. a handbag or a suit pocket).
Inspired by Bao & Intille's use of a network of sensors to measure accelerations of the
body in order to classify human activities, many works were developed within this
framework [9-15], where the data is collected from networks of sensors and then sent to a
desktop computer where the classification takes place. There is also the approach of using
sensors placed on the body while the classification takes place on the mobile phone, after
a generalized model has been learnt on a desktop computer [16].
The work of Győrbíró et al. [16] is also interesting, since placing a wearable sensor on
the wrist solved their orientation problems. When smartphones are used to collect data
from the built-in accelerometer sensor, it is easy to imagine rotations inside pockets, or
even different placements of the mobile, causing problems in the classification process.
But the question here is: are people really willing to wear a sensor on the wrist every
day for the assessment of their activities?
Another approach was taken by Miluzzo et al. [4], who developed a mobile application
involving several classifiers, some of them running on the phone and some in the backend,
producing several levels of classification. From the raw data collected by the built-in
accelerometer, the mobile calculates the mean and the standard deviation of the
acceleration and the number of peaks in each batch of data, applying sequence-based
sliding windows [17]. From these features, a decision tree was trained using J48 [18] in
the WEKA [19] workbench to classify which activity is taking place. In this work they deal
with the activity recognition problem in a different framework from the previous works,
where all the sensing and classification
takes place on the mobile device that people already use, but they don't deal with the
mobile orientation problem, and the results on the test dataset are not that good.
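As an illustration of this kind of lightweight feature extraction, the following is a
minimal sketch and not the implementation used in [4]. It assumes non-overlapping
sequence-based windows and counts a peak as a sample strictly larger than both of its
neighbours; the helper names `count_peaks`, `window_features` and `sequence_windows` and
the sample values are hypothetical.

```python
from statistics import mean, pstdev


def count_peaks(values):
    """Count strict local maxima (assumed peak definition, not taken from [4])."""
    return sum(
        1
        for prev, cur, nxt in zip(values, values[1:], values[2:])
        if cur > prev and cur > nxt
    )


def window_features(values):
    """Summarise one batch of acceleration samples."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "peaks": count_peaks(values),
    }


def sequence_windows(samples, size):
    """Split the samples into consecutive, non-overlapping batches of `size`."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


# Hypothetical single-axis acceleration samples.
signal = [0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 3.0]
features = [window_features(w) for w in sequence_windows(signal, 4)]
```

Each dictionary in `features` would then be one instance handed to the classifier.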
The WISDM (Wireless Sensor Data Mining) group [3] published in 2010 a work in which they
collected a sample of accelerometer-based activity data from 29 subjects. All these
subjects were asked to perform some daily activities for specific periods of time while
carrying a smartphone in their right pants pocket, facing forward. This is by far the most
complete sample that we can find in the literature about single-user, sensor-based
activity recognition. Experimental results on batches of instances of 20 and 10 seconds,
extracted with time-based sliding windows [17], showed that 10 seconds is better for
classifying the activities. They didn't consider using some technique to find the best
length for the sliding window. The classification techniques proposed were J48 decision
trees, logistic regression and multilayer perceptron, and they concluded that no
classification model outperforms all the others. This raises the question of which model
to choose, which is addressed in section 3.3.1.
One of the solutions proposed to improve the classification of the activities is to add a
temporal continuity probability that reflects a level of confidence in the prediction.
This approach can be found in [9], where Krishnan & Panchanathan propose a method to aid
the classification of successive, temporally close frames.
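The idea of exploiting temporal continuity can be illustrated with a much simpler stand-in
than the method proposed in [9]: a majority vote over the most recent frame predictions.
The sketch below is only an assumption of how such smoothing could look; the function name
`smooth` and the `history` parameter are hypothetical.

```python
from collections import Counter, deque


def smooth(predictions, history=3):
    """Replace each prediction by the majority label of the last `history` frames."""
    recent = deque(maxlen=history)  # a full deque drops its oldest label on append
    smoothed = []
    for label in predictions:
        recent.append(label)
        majority_label, _count = Counter(recent).most_common(1)[0]
        smoothed.append(majority_label)
    return smoothed


# A single isolated "run" frame between "walk" frames is voted away.
frames = ["walk", "walk", "run", "walk", "walk"]
smoothed_frames = smooth(frames)
```

A longer `history` removes more isolated errors but reacts more slowly to a real change of
activity.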
2.2 Feature Transformation
There is a naïve approach to the data mining process. In this approach, given a dataset,
the first thing to do is to look inside a machine learning software package, e.g. Weka
Explorer [19], and try to find the algorithm that best fits the dataset. Although there
are several proposals in the literature [20] and software packages [19, 21] to deal with
different kinds of problems and dataset characteristics, there will always be a problem
when dealing with a noisy dataset and, even more, with an inadequate description of the
space domain. There are already learning techniques that deal with noisy datasets [22],
such as decision trees or K-NN, but the construction of features that represent the space
in the optimal way, creating domains in the represented space
without overlapping, will always improve the results of a classifier. This search for an
adequate representation language of the space can be found in the literature as
feature transformation [23-25], and it is normally separated into two different topics:
feature construction and feature selection.
Conventional learning algorithms normally rely on the existing features provided by the
user. In this case, data analysts assume the task of analyzing the original dataset and
extracting the unique characteristics of each class in order to provide the learning
algorithm with the features needed to learn a robust model. This procedure relies on
domain knowledge, but there are implemented methods that use multiple operators to improve
the representation space of the dataset [23] with less intervention and effort from the
data analyst.
Feature construction procedures are normally integrated with a selection method in order
to minimize the dimension of the space representation without compromising the predictive
performance of the model. A good example of this interaction is the use of a selective
learner, like C4.5 decision trees, with the sophisticated constructive induction
components proposed by Pfahringer [26].
Whereas feature construction produces better descriptions of the space domain than the
original dataset, improving the learning and classification process, feature selection
eliminates irrelevant features, either because they are insignificant for the problem or
because they are redundant in relation to others. Most data mining problems start with an
original dataset rich in features, many of which won't be useful for the classification
process, so this topic is found in the general data mining literature [20, 22] more often
than feature construction.
Some learning algorithms implement feature selection procedures internally, such as C4.5
decision trees [18] or MARS [22], but many others are not able to deal with irrelevant
features, such as neural networks, SVMs or KNN [22] (see section 3.3.1). To use learning
algorithms that cannot deal with irrelevant features there is a preprocessing stage,
normally called feature or attribute selection in the data mining literature [20, 22] and
software [19, 21], where several techniques can be applied [23].
2.2.1 Feature Transformation for Activity Recognition
The starting point for my review of feature transformation in the context of my problem is
Preece and Goulermas [5]. This work is the main reference for feature extraction in this
thesis, since it is the only work comparing different feature extraction techniques. The
authors compared 14 different works in which the feature extraction techniques used to
classify activities from accelerometer data were grouped into analysis of the time-varying
acceleration signal, frequency analysis and wavelet analysis:
• Time domain features: mean acceleration, standard deviation, interquartile range,
correlation between axes, number or average of peaks (considering sliding windows), time
between peaks;
• Spectral features: energy, frequency-domain entropy, principal frequency;
• Wavelet features (not developed in this thesis).
Time domain features are what we normally know as statistical descriptions of a dataset,
but in this case they are extracted from batches of data limited by time (see section
3.2.1 for sliding windows). Features extracted from the three axes, such as means,
standard deviations [14], first and third quartiles [10] or correlations between axes, are
commonly used in the classification process, but they need some modifications in order to
be reliable in a real application environment (see the orientation problem in
section 3.2.2).
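The time domain features listed above could be computed per window roughly as in the
sketch below. It is a minimal example under stated assumptions: a window is a list of
(x, y, z) samples, the quartiles come from the default method of Python's
`statistics.quantiles`, and a constant axis is given a correlation of 0.0 by convention.
The function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev, quantiles


def interquartile_range(values):
    """Third quartile minus first quartile of one axis."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1


def pearson(a, b):
    """Pearson correlation between two axes of the same window."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for a constant axis; 0.0 is a convention chosen here
    return cov / sqrt(var_a * var_b)


def time_domain_features(window):
    """window: list of (x, y, z) acceleration samples."""
    x, y, z = (list(axis) for axis in zip(*window))
    feats = {}
    for name, axis in (("x", x), ("y", y), ("z", z)):
        feats["mean_" + name] = mean(axis)
        feats["std_" + name] = pstdev(axis)
        feats["iqr_" + name] = interquartile_range(axis)
    feats["corr_xy"] = pearson(x, y)
    feats["corr_xz"] = pearson(x, z)
    feats["corr_yz"] = pearson(y, z)
    return feats


# Hypothetical window in which y follows x and z moves against x.
example = time_domain_features([(1, 2, 5), (2, 4, 4), (3, 6, 3), (4, 8, 2)])
```

Note that every feature here is tied to a named axis, which is exactly why these features
are sensitive to the orientation of the device.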
Since the data collected from the accelerometer is often treated as a signal, many
approaches extract the frequency of the cycles from this data. Frequency-domain features
normally involve the Fourier transform, from which one can extract, for example, the
principal frequencies of the signal or its spectral energy. One of the most referenced
works in activity recognition [2] used a mixture of time and frequency-domain features as
a way to improve results, but by using networks of sensors placed in specific places they
didn't have to deal with most of the
challenges presented in section 1.3, for example the orientation problem that is
conceptualized in section 3.2.2 and addressed in chapter 5 during the experiments.
2.3 Modeling
When modeling with machine learning techniques, it is important to keep in mind that the
algorithm must be chosen according to the characteristics of the dataset used. According
to Friedman et al. [22], when facing a large dataset such as a data stream, with a set of
features that have not been selected yet, there are not many options. The authors suggest
the use of decision trees or multivariate adaptive regression splines (MARS), a method
created by Friedman and available in the R language.
But there are different approaches when dealing with large datasets whose features have
not been selected yet. When analyzing the problem through the data mining perspective, it
is usual to prepare the dataset so that it can be handled by a specific algorithm [20].
According to Witten and Frank, it would be normal to consider applying feature selection
and down-sampling techniques and then applying the algorithm that best suits the interests
of the data analyst. The use of KNN might be interesting if we consider semi-supervised
learning, but for now, having in mind the goal of a generalized model for activity
recognition, we don't see the need to transform the characteristics of our dataset in
order to use a particular learning algorithm.
All the previous works in single-user, sensor-based activity recognition that we analyzed
proposed a two-step methodology: a number of features are constructed from the raw data by
means of sliding windows and then used to learn a model. We have seen different approaches
to building features, but there were also different approaches when it came to choosing a
learning algorithm.
When deciding which learning algorithm to use, some authors decided to apply a set of
different learning techniques [3, 13], while others proposed only decision trees such as
J48 [4] in the WEKA workbench, because they are efficient and lightweight in the
classification process, or kNN [5], because of its conceptual simplicity, even if kNN is not
efficient for a large dataset [22]. Once more, it is important to remember that the
original dataset can be transformed so that most of the traditional algorithms found in
the literature can be applied to it.
Since labeling the datasets for a general classifier is very time consuming, there is
already a very interesting proposal [27] to overcome this problem through the ability to
learn from unlabeled data for activity recognition; its only drawback is that it has not
been implemented in a mobile environment. Masud [6] also has a proposal for
semi-supervised learning from dynamic datasets in general, which could be applied to the
problem of activity recognition from accelerometer data streams.
Looking for directions with the implementation on a mobile device in mind, we found only a
few authors who implement activity classifiers on the mobile device [4]. They implemented
decision trees, arguing that this model is lightweight, efficient and fast to implement.
There is a very recent implementation [28] where the training data is provided by the
users themselves, who label the activities; a non-linear model called GMT [29] (Geometric
Matching Template) is then learnt on the mobile device. This leads us to the area of
mobile data mining.
The development of smartphones in recent years, in terms of their numbers in the
telecommunications market and their computational and sensing capacity, presents an
unprecedented opportunity for mobile data mining. At this moment the number of
applications performing real-time analysis of data in a mobile environment is growing at a
very fast rate and across different application domains, such as healthcare systems [30].
Mobile data mining involves the generation of interesting patterns from datasets collected
by mobile devices. One characteristic of this area is that the datasets grow very fast, so
efficiency must be pursued in order to implement data mining models on the mobile device.
There are proposals to reduce the source data to statistical summaries before performing
data mining [31], which can be considered during the implementation of sliding windows.
2.3.1 Hierarchical Classification
Hierarchical classification deals with problems where classes are organized in
hierarchies. These classes are organized from more generic to more specific. There is a
good tutorial on hierarchical classification techniques done by Freitas and Carvalho
[32].
Some studies indicate that it is possible to distinguish static activities, such as
sitting and standing, from active ones by applying a threshold to some measure of
acceleration [12]. This means that the problem can be decomposed first into active and
passive activities and then into other subgroups within these two major classes. Given
this decomposition, it is my belief that work in this area can proceed from more general
activities to more specific ones, considering the characteristics of each activity.
Chapter 3: Methodology
In this chapter we explain some of the CRISP-DM [33] stages implemented in this
project as a framework for the implementation of some techniques and, most of all, to
explain the dynamics between the construction of features and the modeling
experiments. Then the most relevant stages of this project are shown, feature
construction and modeling, discussing the most important concepts for the development
of this thesis. Finally, we present some important concepts needed to
evaluate the learnt models presented in chapter 4.
3.1 Data Mining Overview: CRISP-DM
The aim of this thesis is not to give an overview of the data mining world, but it was
considered important to define a framework as a guideline for the research project. The
standard process model proposed by the CRISP-DM consortium [33] was taken as a reference,
as seen in figure 3-1.
Figure 3-1: CRISP-DM procedures[32].
Certainly this work was not developed in an enterprise environment, but assuming that
our “business” is activity recognition as described in section 1.1 and that we are highly
motivated by the reasons explored in section 1.2, our starting point here will be a
labeled dataset collected by the accelerometers of multiple smartphones, in which the
activities proposed in our problem are recorded.
Our next stage is to gain a good understanding of the data, which can be found in chapter
4. Since we are working with a data stream that can be considered a multivariate time
series (timestamp, X, Y, Z), we start with a data analysis that characterizes the time
series we will be working with. Then a non-temporal data analysis is also shown, to
explore as much as possible the characteristics of this type of data.
The next two phases, Data Preparation and Modeling, are the most time consuming and
interact strongly with each other. The result of this interaction is described in the
Experiments chapter of this thesis (chapter 5), and its development can be followed along
the sections of that chapter.
The last phase considered for this project was Evaluation. During the period of
experiments we found some results that looked very good in terms of model evaluation, but
conceptualizing the performance of these models in a real mobile environment was time
consuming.
3.2 Features Construction and Selection
One of the main topics in the data mining literature is feature transformation [34].
This area addresses two different kinds of problems that are typical in data mining:
feature extraction and/or feature construction.
For feature extraction, let's assume that we have a large dataset with many features,
some of which carry irrelevant or redundant information. The aim here is to select from
the original dataset the relevant features (for example with sub-group discovery) or
combinations of features, reducing the dimension of the problem. Common approaches for
building such combinations are PCA (Principal Component Analysis) for numerical
attributes and MCA (Multiple Correspondence Analysis) for categorical attributes.
Another problem can be that the original dataset of a classification problem is composed
of features that need to be transformed in order to expand the space representation with
new, more predictive features. In this case we are facing a feature construction problem,
that is, a way of modifying the representation of the space provided by the original
data.
Constructive induction is a general approach to deal with inadequate attributes in the
original dataset where feature construction and feature selection interact in a dynamic
process in order to achieve the best set of features for a specific problem. This approach
can be addressed by automatic methods [34] or manually by the data analyst. The data
analyst in this case can assume the task of determining which could be the relevant
attributes transformed from the original space. There is a general approach explained by
Bloedorn et al. [23] and shown in figure 3-2.
Figure 3-2: General approach for constructive induction by Bloedorn [32].
The process of feature construction is an important issue because we are trying to avoid
many of the assumptions made in the past, such as a fixed placement and orientation of the
sensor [2, 3], a fixed data collection frequency [3] and the use of a network of
sensors [2], leading us to an implementable model.
3.2.1 Sliding Windows
When extracting features from a data stream it is important to have a sampling
technique that keeps the model up to date. A general approach is to slide a window over
the data, extracting timeless statistical information from those batches of data [35] and
then discarding past examples for the classification process. This sampling decision can
be a sequence-based window with a fixed size of k instances or a timestamp-based window of
a fixed duration, as explained by Babcock et al. [17].
The sequence-based window of size $k$ consists of choosing a dimension, $k$, for
extracting information from the incoming data. At the same rate at which new data arrives
at the processor, older data is dismissed. Let's assume a data stream
$x_1, x_2, \dots, x_t, \dots$ where $x$ is the variable, and a window of dimension $k$.
The extraction of features in this case starts with instances $1$ to $k$, and then, when
instance $k+1$ arrives, the first instance is dismissed for this local classification.
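The behaviour of a sequence-based window can be sketched in a few lines. The example is
only illustrative: it assumes the window advances one instance at a time and uses the mean
as the extracted feature; the function name `sliding_means` is hypothetical.

```python
from collections import deque
from statistics import mean


def sliding_means(stream, k):
    """Yield the mean of the last k instances each time the window is full."""
    window = deque(maxlen=k)  # appending to a full deque dismisses the oldest instance
    for sample in stream:
        window.append(sample)
        if len(window) == k:
            yield mean(window)


# Windows seen: [1, 2, 3], then [2, 3, 4], then [3, 4, 5].
window_means = list(sliding_means([1, 2, 3, 4, 5], k=3))
```

Only the last k instances are ever kept in memory, which is what makes this technique
suitable for a data stream.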
To extract features, the main question is the dimension of this window. Normally this
decision is made from an understanding of the problem we are trying to classify, or from
recommendations in the literature if this technique has already been applied to the
problem.
Although there are different approaches to deciding the dimension of this window, it is
always convenient to keep in mind that longer windows are richer in information and thus
normally produce better features for the classification process, whereas smaller windows
reflect changes of class more quickly. There will always be a trade-off between the
quality of the features produced and the ability to recognize changes in the
classification.
The timestamp-based window is used when we want to classify a period of time chosen from
an understanding of a particular problem, but the data analyst doesn't know at what
frequency new data will arrive. In the case of our sensor there are some variations in the
collection frequency, but they can be controlled, so there was no major reason to apply
this technique.
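For comparison, a timestamp-based window could be sketched as follows. This is a minimal
illustration, assuming timestamps in seconds that arrive in increasing order; the class
name `TimeWindow` and the sample values are hypothetical.

```python
from collections import deque


class TimeWindow:
    """Keep only the samples received during the last `duration` seconds."""

    def __init__(self, duration):
        self.duration = duration
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))
        # Dismiss samples that fall outside the time span ending at `timestamp`.
        while self.samples[0][0] <= timestamp - self.duration:
            self.samples.popleft()

    def values(self):
        return [value for _, value in self.samples]


# Irregularly spaced samples: the window holds however many fit into 2 seconds.
time_window = TimeWindow(duration=2.0)
for timestamp, value in [(0.0, 10), (0.5, 11), (1.9, 12), (2.4, 13)]:
    time_window.add(timestamp, value)
```

Unlike the sequence-based window, the number of instances per window varies with the
collection frequency, so features should not depend on a fixed count of samples.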
3.2.2 The orientation problem
To start this section it is important to recall the challenge of building features with
the orientation problem in mind. Placing the mobile device in a pants pocket does not
solve the orientation problem because the mobile can still rotate inside the pocket,
transferring information from the X to the Y axis or vice versa.
The initial idea to solve the mobile device orientation problem is to aggregate the
information of the X and Y axes in order to always have the same information regardless of
how the mobile phone rotates inside the pocket.
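One possible aggregation, shown here only as an assumption and not necessarily the one
adopted later in this thesis, is the magnitude of the (X, Y) vector: a rotation of the
phone in the X-Y plane moves acceleration from one axis to the other but leaves this
magnitude unchanged. The function names and values below are hypothetical.

```python
from math import cos, hypot, radians, sin


def rotate_xy(x, y, angle_deg):
    """Readings the sensor would report after the phone rotates in the X-Y plane."""
    a = radians(angle_deg)
    return x * cos(a) - y * sin(a), x * sin(a) + y * cos(a)


def xy_magnitude(x, y):
    """Aggregate X and Y into one value that ignores in-plane rotation."""
    return hypot(x, y)


# The same physical acceleration seen before and after a 37 degree rotation.
original = (3.0, 4.0)
rotated = rotate_xy(*original, angle_deg=37)
magnitude_before = xy_magnitude(*original)
magnitude_after = xy_magnitude(*rotated)
```

The individual axis values change with the rotation, while the aggregated value stays the
same up to floating point rounding.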
In order to achieve good performance in the classification process, we realized that
statistical measures of location distinguish between the passive activities, standing and
sitting. The explanation is that passive activities do not involve movement, so all the
collected data is concentrated around some specific location that measures the
gravitational force. The question here must then be: how can we aggregate the
information of several axes while maintaining the discriminative power of the
features?
Statistical measures of dispersion are naturally useful when considering active
activities. It is easy to see that some activities are more vigorous than others, thus
involving more movement, as can be seen in figure 1-2. Our question here is then: is it
possible to aggregate the X and Y axes for all the statistical measures proposed in the
related work on activity recognition [5] when considering the orientation problem of the
mobile device?
This thesis proposes a new feature able to measure the amount of movement without losing
sight of the challenge posed by the orientation of the mobile inside the pants pocket. If
we consider the amount of movement in the time domain, the sum of the first derivatives of
each axis with respect to time captures the changes in acceleration regardless of the
orientation. As a result there will be a measure of the amount of movement in each window
of data:
$$\left(\frac{dx}{dt}\right)^{2} + \left(\frac{dy}{dt}\right)^{2} + \left(\frac{dz}{dt}\right)^{2}$$
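One way to read this feature in code is sketched below. It is an interpretation under
explicit assumptions: the derivatives are approximated by finite differences between
consecutive samples, the differences are squared so that direction does not matter, and
the terms are accumulated over the whole window. The function name `amount_of_movement`
and the sample windows are hypothetical.

```python
def amount_of_movement(window):
    """window: list of (t, x, y, z) samples ordered by time, t in seconds."""
    total = 0.0
    for (t0, x0, y0, z0), (t1, x1, y1, z1) in zip(window, window[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        total += ((x1 - x0) / dt) ** 2 + ((y1 - y0) / dt) ** 2 + ((z1 - z0) / dt) ** 2
    return total


# A phone at rest only measures gravity, so nothing changes between samples.
still = [(0, 0.0, 0.0, 9.8), (1, 0.0, 0.0, 9.8), (2, 0.0, 0.0, 9.8)]
# The same movement recorded with the X and Y axes exchanged.
moving = [(0, 0.0, 0.0, 0.0), (1, 1.0, 0.0, 0.0), (2, 0.0, 2.0, 0.0)]
moving_swapped = [(0, 0.0, 0.0, 0.0), (1, 0.0, 1.0, 0.0), (2, 2.0, 0.0, 0.0)]
```

Because the three axes are treated symmetrically, exchanging them does not change the
value, which is the property needed to cope with the orientation problem.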
3.3 Modeling
3.3.1 Learning Algorithm
When discovering the data mining area, it is fascinating to find so many learning
algorithms, starting with simple ones such as Naïve Bayes, K Nearest Neighbors and linear
regression, continuing with non-linear models such as decision trees and neural networks,
and then moving to ensemble learning methods; it is easy to conclude that this area is
already huge and still growing. At the same time, the data analyst faces the decision of
which learning methodology should be applied to a specific problem. Some dataset
characteristics [36] can help us decide which kind of model should be used.
When working with data streams, the first characteristic that stands out is the amount of
data we will be dealing with, so the first concern when choosing the learning algorithm
should be computational scalability. Figure 3-3 shows that, when facing a large number of
instances (large N), decision trees are one of the few good choices among the traditional
learning methods.
Figure 3-3: Some characteristics of different learning methods dealing with datasets [36].
Let's assume a classification problem where $Y$ is our target variable with two classes
($Y \in \{c_1, c_2\}$) and $p$ features ($X_1, \dots, X_p$) to discriminate between
classes. From the set of $N$ observations, where each observation is
$(x_{i1}, \dots, x_{ip}, y_i)$, the
goal is to build a classifier in order to predict on unlabeled data.
Decision tree learning algorithms, for example C4.5 [18] or CART [37], follow a top-down
induction approach, where the classification model grows from the root (the most
discriminative feature) to the leaves (classes). This procedure is done by binary splits,
where each node represents a decision that separates the dataset into two subgroups; for
each of these subgroups a new splitting decision is made, until a stopping threshold is
reached and the path from the root to that node becomes a set of hierarchical rules that
yields a classification.
To make a splitting decision we need a heuristic to compare different splitting options.
Normally these heuristics have two extremes: a split that keeps the proportion of classes
in each subgroup scores the minimum, and a split that separates the two classes perfectly
scores the maximum. These measures can be:
1. Measures that compare the distribution of classes in a dataset and in a subgroup of the
same dataset, normally called impurity measures. Two good examples are the
Gini Index [37] and the Information Gain [18];
2. Misclassification error measures [36];
3. Statistical measures of independence, normally based on the Chi-squared test,
between the proportions of classes in the dataset and in the subgroups created by
the splits.
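The first two families of measures can be illustrated with a short sketch. It assumes a
binary split into a left and a right subgroup and scores the split by the reduction in
impurity, weighted by subgroup size; the Chi-squared family is left out for brevity. The
function names and the labels are hypothetical.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index: 0 for a pure group, larger for mixed groups."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits, the impurity behind information gain."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def misclassification(labels):
    """Error rate obtained by predicting the majority class."""
    return 1.0 - max(Counter(labels).values()) / len(labels)


def split_gain(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the subgroups."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted


parent = ["walk"] * 4 + ["sit"] * 4
perfect_gain = split_gain(parent, ["walk"] * 4, ["sit"] * 4, entropy)
useless_gain = split_gain(parent, ["walk", "walk", "sit", "sit"],
                          ["walk", "walk", "sit", "sit"], entropy)
```

A split that keeps the class proportions gains nothing, while a split that separates the
two classes perfectly gains the whole impurity of the parent, matching the two extremes
described above.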
During the splitting process there is a risk of overfitting the classification model to
the dataset used in the learning process. If we grow the decision tree too much on the
training sample, we will have a model completely adapted to the training dataset and, as a
consequence, not general enough to be applied in a real environment. This overfitting is
represented in figure 3-4 from the point where the prediction error rate on the test
samples reaches its minimum and then starts to increase along with the growth of the
decision tree (model complexity).
Figure 3-4: Relation between model complexity and prediction error rates [36].
To solve this overfitting problem there are several proposals for pruning the models
created by decision trees. There are two pruning strategies: pre-pruning and
post-pruning. Most of them can be found in Esposito et al. [38], where several rules to
stop growing decision trees are discussed.
For pre-pruning, the usual stopping criterion is that all instances in the leaf belong to
the same class, or that a minimum number of instances per leaf has been reached. The
second pruning method is the post-pruning strategy. A very simple method [39] for this
kind of pruning uses the misclassification rate. First we calculate the misclassification
rate of the bottom-level nodes, considering the majority class in each node. Then, if the
upper-level node has a smaller or equal misclassification error, the tree is pruned
and this upper-level node becomes a leaf.
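The post-pruning rule just described can be sketched as a bottom-up pass over the tree.
This is a simplified reading and not the exact method of [39]: it assumes each node stores
the class counts of the sample used to estimate the errors, and it compares the number of
errors an upper node would make as a leaf with the errors of its subtree. The dictionary
layout and the function names are hypothetical.

```python
def leaf_errors(node):
    """Errors made if the node predicts its majority class."""
    counts = node["counts"]
    return sum(counts.values()) - max(counts.values())


def prune(node):
    """Prune bottom-up and return the number of errors left in the subtree."""
    children = node.get("children", [])
    if not children:
        return leaf_errors(node)
    subtree_errors = sum(prune(child) for child in children)
    if leaf_errors(node) <= subtree_errors:
        node["children"] = []  # the upper-level node becomes a leaf
        return leaf_errors(node)
    return subtree_errors


# The split below does not reduce the errors, so it is pruned away.
weak_split = {
    "counts": {"walk": 8, "run": 2},
    "children": [{"counts": {"walk": 5, "run": 1}}, {"counts": {"walk": 3, "run": 1}}],
}
# This split separates the classes perfectly, so it is kept.
good_split = {
    "counts": {"walk": 5, "run": 5},
    "children": [{"counts": {"walk": 5, "run": 0}}, {"counts": {"walk": 0, "run": 5}}],
}
weak_errors = prune(weak_split)
good_errors = prune(good_split)
```

When the counts come from a sample that was not used to grow the tree, this pass removes
the splits that only fitted noise in the training data.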
Finally, we must look at the problems we might have when working with decision trees.
One major known problem of decision trees is their high variance: small changes in the
data often produce big differences in the models created. This is
caused by the hierarchical splitting of the dataset: one error at the top of the tree will
be propagated down to the rest of the decision tree. Another limitation of this learning
algorithm is the lack of smoothness of the decision boundaries. Even though decision
trees are a non-parametric method that can create non-linear models, the regions they
create are always rectangles (when comparing two features) or boxes (when comparing three
features). According to Hastie et al. [36], the MARS learning algorithm can be seen as a
smoother solution to this discriminative problem.
3.3.2 Hierarchical classification
Normally a classification problem is addressed as flat classification, where each
example is assigned to one class out of a finite set of classes. But some problems can be
addressed as a hierarchical classification process, where there is first a classification
into major classes and then, inside each major class, a classifier into sub-classes, as
shown in figure 3-5. Costa et al. [32] offer an introduction to the characteristics of a
hierarchical classification problem.
Figure 3-5: Hierarchical classification conceptualization.
Figure 3-5 shows that the hierarchical classification problem starts without any kind
of classification and that the first level of nodes has three major classes (1, 2, 3).
After the first prediction level it is possible to take this classification process to lower
levels of classification with more specific information about each example. For
example, an instance can be classified as 1 and then there is another classification
process to classify it as 1.1 or 1.2. From this figure we can conclude then that this
organization of the classification decomposes what would be a complex decision
problem into smaller problems.
From this structure we are in a position to require that an example be classified
into a leaf, which in this case would imply one of the following classifications: 1.1, 1.2,
2, 3.1, 3.2 or 3.3. Alternatively, we can be more flexible and allow the classifier to predict
a sub-class only above some confidence threshold, otherwise classifying the example only
as 1, 2 or 3. This decision implies a trade-off between accuracy (predicting
1, 2 or 3 is, in general, easier than predicting sub-classes) and usefulness (classifying
into a sub-class normally provides more information than predicting a major
class) of the classification.
Organizing the classification in this way also makes it more efficient in terms of
computational requirements. Let's assume that there is a set of features, calculated from
the raw data, that distinguishes between classes 1, 2 and 3, but that in order to classify
between 1.1 and 1.2 we need a different set of features calculated from the raw data. If
an instance is classified as 2 then there is no need to calculate the set of features
needed to distinguish between 1.1 and 1.2.
Finally, if we consider the feature construction process proposed in section 3.2, it is
clear that this hierarchical approach will be useful in order to build the best set of
features to discriminate first between passive and active activities and then between
standing and sitting or between running and walking.
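The computational saving of the hierarchical approach can be sketched as follows. This is a minimal illustration in Python, not the implementation used in this thesis: the function names, the two thresholds (`passive_sd`, `running_sd`) and the rule that separates sitting from standing are assumptions chosen only for the example. The point is that the second-level features are computed only for the branch selected at the first level.

```python
from statistics import mean, pstdev

def dispersion(window):
    """First-level features: standard deviation of each axis."""
    return {axis: pstdev(values) for axis, values in window.items()}

def location(window):
    """Second-level features for the passive branch: mean of each axis."""
    return {axis: mean(values) for axis, values in window.items()}

def classify(window, passive_sd=0.1, running_sd=1.0):
    """Return (major class, sub-class) for one window of X, Y, Z samples.

    Both thresholds are hypothetical values, not results from the thesis.
    """
    strongest = max(dispersion(window).values())
    if strongest < passive_sd:
        # Passive branch: location features are computed only here.
        loc = location(window)
        sub = "Sitting" if abs(loc["Z"]) > abs(loc["Y"]) else "Standing"
        return "Passive", sub
    # Active branch: the dispersion already computed is reused.
    return "Active", "Running" if strongest > running_sd else "Walking"

sitting = {"X": [0.02, 0.01] * 2, "Y": [0.10, 0.11] * 2, "Z": [0.98, 0.99] * 2}
walking = {"X": [-0.3, 0.3] * 2, "Y": [0.5, 1.5] * 2, "Z": [-0.2, 0.2] * 2}
print(classify(sitting))
print(classify(walking))
```

An instance that falls in the active branch never triggers the computation of the location features, which is the saving described above.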
3.4 Evaluation
“Evaluation is the key to make real progress in data mining” [20]. From all the
techniques that we can find in literature and software it is necessary to decide which
techniques to use and whether the produced models are reliable.
The paragraph above contains two different goals for a specific data
mining problem. First, it is necessary to estimate the performance of different models in
order to choose the best one, a process usually called model selection in the literature.
Then there is the model assessment: after choosing a final model, it is necessary to
test it on an independent data set and estimate its prediction error.
3.4.1 Model Selection and Assessment
The evaluation procedures used for model selection estimate measures of
accuracy and prediction error from the training dataset. There are different methods to
estimate these measures, and different kinds of measures that can be used to
select the best model.
The simplest method is to decide a partition between the training set and the testing set.
Let's assume a dataset, $D$, and a decision to use 25% of this dataset to test the model.
There will be a random selection from $D$ to extract 75% of the instances, $D_{train}$, that
will be used to learn a model. The remaining dataset, $D_{test} = D \setminus D_{train}$, is
then used to evaluate the learnt model.
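The 75%/25% partition can be sketched in a few lines of Python. The function name `holdout_split` and the fixed random seed are assumptions made for this illustration; they are not part of the experimental setup of this thesis.

```python
import random

def holdout_split(dataset, test_fraction=0.25, seed=0):
    """Randomly split a list of instances into (train, test)."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)        # random selection of instances
    n_test = round(len(dataset) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(dataset) if i not in test_idx]
    test = [row for i, row in enumerate(dataset) if i in test_idx]
    return train, test

data = list(range(100))
train, test = holdout_split(data)
print(len(train), len(test))
```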
A well known method is k-fold cross-validation. In machine learning one normally
uses a holdout set of data in order to measure the performance of a model. After choosing
a learning algorithm $A$ to learn a predictor $f$ from the training dataset $D$, it is necessary
to evaluate the quality of this predictor. In order to do so, the k-fold cross-validation method
partitions $D$ into $k$ folds, $D_1, \dots, D_k$. It uses $k-1$ folds of data to
train, $D \setminus D_i$, and the remaining fold to test, $D_i$, repeating this $k$ times until
all of $D$ has been used to train and test the predictor $f$.
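As an illustration of the procedure, the following Python sketch generates the k train/test partitions. The name `k_fold_splits`, the default of 10 folds and the shuffling seed are assumptions made for the example; Weka performs this partition internally.

```python
import random

def k_fold_splits(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k folds of near-equal size
    for i in range(k):
        test = folds[i]                          # fold i is held out
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_splits(23, k=5):
    print(len(train), len(test))
```

Every instance is used exactly once for testing and k-1 times for training.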
Besides the partition methods used to evaluate the models created, it is necessary to decide
which kind of evaluation measures should be used to choose the best
model. Considering a binary classification problem, a very useful way to interpret
the evaluation results is to look at the confusion matrix (example in figure 3-6) and see what
kind of errors were produced by the model. But this interpretation is not a reliable
comparison measure from which we can make a selection among different models.
Figure 3-6: Confusion matrix [36]: true positive = TP; true negative = TN; false positive = FP; false negative = FN.
There is a naïve measure for model selection, the accuracy of the model,
which is the rate of correct predictions from the model: $(TP + TN) / (TP + TN + FP + FN)$.
There is the possibility in unbalanced datasets that the major class (for example, class
“yes”) is all well predicted, reflecting a high accuracy measure, and the less represented
class (class “no”) is being misclassified. This will give us a good accuracy measure but
a poor classification model. In order to solve this problem it is useful to interpret other
measures, such as the Area under ROC curve (AUC) or the relative absolute error
(RAE).
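The effect of an unbalanced dataset on accuracy can be illustrated with a small Python sketch. The data below is invented for the example (95 instances of class "yes", 5 of class "no", and a model that always predicts "yes"); the function names are also assumptions.

```python
def confusion_matrix(actual, predicted, positive="yes"):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, tn, fp, fn):
    return tp / (tp + fn)      # true positive rate

def specificity(tp, tn, fp, fn):
    return tn / (tn + fp)      # true negative rate

actual = ["yes"] * 95 + ["no"] * 5
predicted = ["yes"] * 100      # model that ignores the minority class
counts = confusion_matrix(actual, predicted)
print(accuracy(*counts), sensitivity(*counts), specificity(*counts))
```

The accuracy is high although no instance of the minority class is recognized, which is why accuracy alone is not used for model selection here.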
A ROC (Receiver Operating Characteristic) curve has two dimensions: (1 - specificity),
the false positive rate, on the x-coordinate and the sensitivity, the true positive rate,
on the y-coordinate. When comparing ROC curves, one model dominates another if its curve is
above and to the left of the other (higher true positive rate and lower false positive rate).
The ROC curve displays the relationship between predictions and outcomes by plotting the
estimates of sensitivity versus (1 - specificity) for all possible threshold values. There are
two boundaries for the AUC measure [40]: 0.5 when we are facing a random prediction and 1
when we achieve the best modeling from the dataset available to make the model selection.
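A minimal Python sketch of these two notions follows. It assumes a binary problem with labels 1 (positive) and 0 (negative) and a numeric score per instance; the function names are chosen for the example. The AUC is computed through its rank interpretation (the probability that a positive instance receives a higher score than a negative one, ties counting one half), which is equivalent to the area under the curve.

```python
def roc_point(scores, labels, threshold):
    """Return (1 - specificity, sensitivity) when score >= threshold means positive."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))   # perfectly separated scores
print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))   # uninformative scores
```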
Suppose that for a single instance we have a classification probability vector
$(p_1, \dots, p_k)$, where $k$ is the number of classes, and a vector $(a_1, \dots, a_k)$ that
marks the real class, with $a_i = 1$ for the real class $i$ and 0 elsewhere. Considering that
the learning algorithm is a decision tree and makes
deterministic predictions where one of the probabilities will be 1 and all the rest will be
0, we have as a quadratic loss function for each instance
$\sum_{j=1}^{k} (p_j - a_j)^2 = (1 - p_i)^2 + \sum_{j \neq i} p_j^2$, where the
correct prediction is $i$ and the sum contains the incorrect predictions. From here it is
possible to apply the quadratic loss function to the test sample, accumulating the loss over
all of its instances.
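From the quadratic loss function it is possible to derive another good evaluation
measure to select the best model: the RAE (Relative Absolute Error). This measure is
obtained by applying the absolute value, instead of the square, to the differences in the
quadratic loss function, $\sum_{j=1}^{k} |p_j - a_j|$, and, as reported by Weka, expressing the
total relative to the absolute error of a predictor that always outputs the class prior
probabilities. So, in order to select the best model it is important to minimize the RAE and
maximize the area under the ROC curve.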
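The two loss functions can be sketched in Python as follows. This is an illustration under assumptions: the function names are invented, and the normalization of the RAE by a predictor that always outputs the class priors follows the way Weka reports this measure, not a formula taken from this text.

```python
def quadratic_loss(probs, true_index):
    """Sum over classes of (p_j - a_j)^2 for one instance."""
    return sum((p - (1.0 if j == true_index else 0.0)) ** 2
               for j, p in enumerate(probs))

def absolute_loss(probs, true_index):
    """Sum over classes of |p_j - a_j| for one instance."""
    return sum(abs(p - (1.0 if j == true_index else 0.0))
               for j, p in enumerate(probs))

def relative_absolute_error(prob_rows, true_indices, priors):
    """Absolute error of the model as a percentage of the error of the priors."""
    model = sum(absolute_loss(p, t) for p, t in zip(prob_rows, true_indices))
    baseline = sum(absolute_loss(priors, t) for t in true_indices)
    return 100.0 * model / baseline

# Deterministic predictions for four instances of a two-class problem.
rows = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
truth = [0, 0, 1, 1]                      # the last instance is misclassified
print(relative_absolute_error(rows, truth, priors=[0.5, 0.5]))
```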
Having chosen the best model through the evaluation that was explained in this section
there is one last but very important evaluation to make: the model assessment. Using the
cross-validation method we will look for the lowest error rates. This comparison will
only be performed after the last model is chosen.
3.5 Conclusion
In this chapter we have seen some of the most important techniques and methods that
will be used in the next chapters 4 and 5. It has been clearly shown that the relation
between feature construction and model selection will be dynamic.
Chapter 4: Data Analysis
The purpose of this chapter is to understand the characteristics of the data we will be
using to select and assess our activity recognition model, and to help the development of
the experiments that will be described in chapter 5. It starts with a short description
of the dataset collected, the application used to collect new data, and how the data
collection was carried out.
In section 4.2 we start the statistical analysis of the dataset, beginning with a time series
analysis and then moving to a non temporal analysis, since sliding windows will be applied
to the data streams in order to extract timeless statistical information.
4.1 Data Collected
One of the applications [41] available to collect data from Android-based smartphones,
whose interface can be found in figure 4-1, collects the acceleration (in m/s^2) for each
of the three axes that these sensors can measure: X, Y and Z.
Figure 4-1: AccDataRec interface and QR Code (that can be used to download the application).
The starting dataset used for this project is composed of 4 different activities {Walking;
Running; Sitting; Standing} and has 293970 labeled examples, collected at 20Hz,
which corresponds to 20 instances per second, giving us approximately 4 hours of
activities in total, collected with the AccDataRec application [42], which measures the
acceleration (in m/s^2) that the accelerometer can recognize, including the gravitational
force, and then scaled to G-force. This data
was partially collected for this thesis and partially collected for previous works [7].
There is a constraint in this data collection concerning the frequency at which it is
collected. The intuition behind this problem is that the Android platform changes the
power allocated to the sensors depending on the status of the mobile (active, stand-by, ...),
although, depending on the smartphone used, the resulting changes in the collection
frequency are not significant.
4.2 Data Description
The first characteristic of this dataset is the discrepancy in the distribution of examples
among the activities considered, as figure 4-2 shows. About 78% of the
examples collected are walking, running is about 13% of the dataset, and finally
the passive activities (Sitting and Standing) represent only 9% of the dataset.
[Chart "Percentage of instances for each Class": Walking 78%, Running 13%, Sitting, Standing]
Figure 4-2: Distribution of examples for the activities.
4.2.1 Time Series
To start our data analysis, we present chronograms for each activity (in figures 4-3 and
4-4), with a sample of 100 instances per activity, from which we can distinguish
active activities (that involve movement) and passive activities.
Modeling these time series has not been considered because our problem is not to
predict what the next activity of a particular user will be, but to classify the activity that is
taking place at the moment.
The interest of the chronograms in figures 4-3 and 4-4 for this research is that they indicate
which kind of features should be calculated in order to discriminate between classes.
In figure 4-4 we can see clear gravitational information in the passive activities,
which can be exploited by statistical measures of location, whereas in figure 4-3 the different
amounts of movement in running and walking might be exploited by
statistical measures of dispersion.
[Plots "Running Accelerometer Data" and "Walking Accelerometer Data". x axis: instance, 0 to 100 (Frequency: 20Hz => 5 seconds of activity); y axis: Acceleration m/s^2, -2 to 2. Red=X Axis; Blue=Y Axis; Yellow=Z Axis]
Figure 4-3: Sample of Active data: Running (left) and Walking (right).
[Plots "Sitting" and "Standing". x axis: instance, 0 to 100 (Frequency: 20Hz => 5 seconds of activity); y axis: Acceleration m/s^2, -1.0 to 1.0. Red=X Axis; Blue=Y Axis; Yellow=Z Axis]
Figure 4-4: Sample of Passive data: Sitting (left) and Standing (right).
Finally, it is important to say that these time series do not show any kind of trend for any
activity. Studying seasonality might look appealing for the active activities, but after
producing the total and partial autocorrelations for each axis we found them to be
insignificant for time series of these dimensions, as the example in figure 4-5 shows,
where the ACF and PACF of walking were calculated for the Y axis.
[Plots of the ACF and PACF for "Series: walk[3]". x axis: LAG, 0 to 100; y axis: -2.0 to 1.0]
Figure 4-5: ACF and PACF of walking with a lag of 100 records.
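For reference, the total autocorrelation of a series can be computed as in the Python sketch below. The function names are assumptions, the test series is synthetic (it is not walking data), and the bound 1.96/sqrt(n) is the usual approximate 95% limit drawn on ACF plots, not a value taken from this analysis.

```python
from math import sqrt
from statistics import mean

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    m = mean(series)
    centered = [x - m for x in series]
    denom = sum(c * c for c in centered)
    return [sum(centered[t] * centered[t + lag]
                for t in range(len(series) - lag)) / denom
            for lag in range(max_lag + 1)]

def significance_bound(n):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return 1.96 / sqrt(n)

series = [0.0, 1.0, 0.0, -1.0] * 25       # synthetic signal with period 4
r = acf(series, max_lag=4)
print([round(v, 2) for v in r], round(significance_bound(len(series)), 3))
```

Autocorrelations whose absolute value stays below the bound are usually regarded as insignificant.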
4.2.2 Non Temporal Analysis, Boxplots
A good way to get an intuition of how different the activities are, and of how the statistical
measures (such as means, standard deviations and interquartile ranges) will behave when
applying a sliding window to extract features, is to look at the boxplots of the whole
training dataset for each axis in relation to each activity.
The boxplots for each axis, represented in figures 4-6, 4-7 and 4-8, indicate that
discriminating activities with a simple mean over all the data of each
activity works considerably well if we only consider the passive activities, sitting and
standing. Taking into account all the activities, it can be very difficult to discriminate
between running and walking, since the distributions in each axis domain for
the active activities are similar.
[Plot "Boxplot of X Axis". x axis: Activities (Running, Sitting, Standing, Walking); y axis: X Axis Values, -2 to 2]
Figure 4-6: Boxplot for the X axis data in relation to the different activities.
The difference between the passive activities is even bigger when analyzing the plots in
figures 4-7 and 4-8. This happens because the gravitational force when standing
is applied, for this dataset, on the Y axis, while when sitting this force is measured on
the Z axis, which can be understood (see figure 1-1) from the way the sensor
measures the acceleration.
[Plot "Boxplot of Y Axis". x axis: Activities (Running, Sitting, Standing, Walking); y axis: Y Axis Distribution, -2 to 2]
Figure 4-7: Boxplot for the Y axis data in relation to the different activities.
[Plot "Boxplot of Z Axis". x axis: Activities (Running, Sitting, Standing, Walking); y axis: Z Axis Distribution, -2 to 2]
Figure 4-8: Boxplot for the Z axis data in relation to the different activities.
It can be concluded that features containing statistical measures of location will not be
useful to distinguish between active activities (walking and running), and that there must
first be a separation between passive and active activities in order to use statistical
location measures for the classification of standing and sitting.
4.2.3 Non Temporal Analysis, Scatterplots
In our analysis of the scatterplots it is important to search for relations between
different axes in the raw data, in order to define which characteristics the features that
will be computed should have. To start this analysis, consider a scatterplot with only the
passive activities (standing and sitting), in figure 4-9, since we already know that
measures of location can discriminate these two activities well; the
remaining activities are added afterwards.
Figure 4-9: Scatterplot of all axes considering only passive activities.
As suspected when analyzing the boxplots, in the scatterplots we can see two
different groups in the passive activities taking into account only the raw data. With
location measures as features, like the mean or the median for each axis, it will be
possible to reinforce the discrimination between the two passive activities, although even
from the raw data it would be possible to build a good model with a non-linear algorithm
such as KNN. This is another good indication of which kind of features
should be calculated to discriminate these passive activities, but there is still the
problem discussed in the previous section of discriminating between passive and active
activities. In figure 4-10 we can see part of the problem when running is added to
the scatterplot.
Figure 4-10: Scatterplot of all axes considering passive activities and running.
By adding the activity running to the scatterplot we can see overlapping problems in the
domain of the three axes. This is natural if we conceptualize the cyclical
movement of the body while running: running involves cyclical accelerations in each
axis of the accelerometer, so the challenge is to develop features that capture
peculiarities in these distributions, solving at the same time the orientation problem of
the smartphone. If we focus only on the distribution of Y it is possible to see a
saturation problem in this axis, something natural when the measuring range of the sensors
is smaller than some of the accelerations produced by active activities. The
problem grows when we consider all the activities in the same scatterplot, as
can be seen in figure 4-11.
Figure 4-11: Scatterplot of all axes considering all the activities.
With this last scatterplot the problem of classifying using only raw
data becomes clear. The activity walking has a huge majority of the instances, and its
distribution in the scatterplots tells us that this activity covers almost the whole domain
of each axis, as happened with running in figure 4-10.
The problem of saturation is even more recognizable here. In the distributions of all the
axes we can see a peak at the extremes, caused by the measuring limitation of the
accelerometers placed in almost all smartphones. For these collections the
range of the sensors was from -2.3 g to 2.3 g.
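One simple way to quantify this saturation is to measure the fraction of samples that sit at the limits of the sensor range. The Python sketch below is only an illustration: the function name and the tolerance of 0.05 g are assumptions, and the sample values are invented.

```python
def saturation_rate(values, limit=2.3, tolerance=0.05):
    """Fraction of samples within `tolerance` of the sensor limits (+/- limit, in g)."""
    clipped = sum(abs(v) >= limit - tolerance for v in values)
    return clipped / len(values)

y_axis = [0.0, 2.3, -2.3, 1.0]            # invented readings, in g
print(saturation_rate(y_axis))
```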
4.2.4 Non Temporal Analysis, Pearson Correlations
To finish this non temporal data description we will take a look at the cross-correlations
between axes for each activity. Since all the variables are numeric, the Pearson correlation
coefficient was used, from which it is possible to deduce whether there is a
linear dependency between two numerical variables; in this case between each pair of
axes for each activity.
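The Pearson correlation coefficient between two axes can be computed as in the following Python sketch. The function name and the sample values are assumptions made for the example; the values imitate a gravitational component moving from one axis to another.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

y_axis = [0.0, 0.1, 0.2, 0.3]             # invented values, in g
z_axis = [1.0, 0.9, 0.8, 0.7]             # decreases as the Y axis increases
print(round(pearson(y_axis, z_axis), 3))
```

A coefficient close to -1 or 1 indicates a strong linear dependency, while a value close to 0 indicates none.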
[Plot "Correlations". x axis: Axis in Relations (X/Y, X/Z, Y/Z); y axis: Correlation degree, -1.0 to 1.0. Red=Walking; Green=Running; Yellow=Sitting; Purple=Standing]
Figure 4-12: Cross-correlation for each pair of axes calculated for all the activities.
Looking at the correlations between axes when taking into account the whole dataset, we
find that there are no linear relations between different axes for the activities
walking and running.
Although we cannot find strong correlations between axes for the two active activities,
these correlations will still be calculated over smaller portions of
data, using sliding windows, because these features were proposed by related works [5] on
activity recognition.
A very interesting relation in the passive activities is that they reflect a linear relation
between the Y and Z axes. This happens because when someone who is sitting moves to
adjust their position, the gravitational force rises in the Z axis by almost
the same amount as it decreases in the Y axis.
At this stage of the document we can already see what the first set of
experiments after the baseline (which will be done with the raw data only) will be. The data
analysis shows that we must focus on dispersion features in order to characterize active
activities and location measures in order to find where the gravitational force is being
applied and from that predict if the person is Sitting (gravitational force measured in Z
axis) or standing (where the gravitational force will be recorded in Y axis, in X axis or
partially recorded by both these axes).
Chapter 5: Experiments and Modeling
In this chapter I present and analyze all the experiments carried out in search of the best
set of features to learn a generalized model that recognizes the activities I am proposing:
walking, running, sitting and standing. The experimental setup works as a methodology
guide for the implementations done in this chapter to achieve the best set of techniques
along the experimental procedures. All the experimental results can then
be found from section 5.2, with the baseline experiments, to the model
assessment in section 5.6, with the best set of features and techniques proposed, followed
by a brief discussion in section 5.7.
5.1 Experimental Setups
In this section we start by describing the dataset used in these experimental procedures,
the reasons to try a hierarchical approach in this classification problem, the learning
algorithm implemented, the attribute selection techniques used in order to validate the
construction of new features, and the evaluation of our experimental results.
The final dataset used in this work was provided by 4 subjects who recorded all 4
activities several times. The collection was done with smartphones with built-in
triaxial accelerometers. The baseline dataset is composed of almost 300000 labeled
instances collected at an average frequency of 20Hz, which corresponds to one second for
each 20 instances. This gives us a total of about 4 hours of activities collected. The
number of instances of each activity used in the baseline experiments is shown in table 5-1:
             Walking   Running   Sitting   Standing
Instances    193840    2517      17776     8070
Percentage   87.23%    1.1%      8%        3.63%
Table 5-1: Number of instances and percentages used for the baseline experiments.
Since the chronograms in section 4.2.1 show that we need to construct different
features when considering different activities, it is easy to see that sub-grouping the
dataset first into Passive activities (involving a low amount of movement: sitting and
standing) and Active activities (involving some or much movement: walking and
running) can be a good way to look for the best set of features.
In the baseline experiments we will use the software Weka [19] (Waikato Environment for
Knowledge Analysis) to test the chosen algorithm, the J48 decision tree [18], on the
following datasets: “All.csv”, “Active.csv” and “Passive.csv”; and from these results
see if the hierarchical approach should be used.
The choice of decision trees for activity recognition in mobile applications comes
from some evidence that, until now, no learning algorithm has obtained better results than
all the others in activity recognition [3], and also from the aim of implementing this model
in a mobile environment, where there are computational limitations and decision trees
have already proved to be efficient [4].
Decision trees are grown with divide and conquer methods that select the best
features, starting at the root of the tree and growing until the entire
dataset domain is conquered and all the classes are separated as well as possible. In order
to confirm the attribute selection of the divide and conquer methodology of C4.5,
implemented in Weka as the J48 algorithm, it was decided to use CfsSubsetEval for
attribute selection in the Weka “Select Attributes” menu for the baseline experiments.
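The idea behind correlation-based feature subset selection (CFS), on which CfsSubsetEval is based, is to prefer subsets of features that are highly correlated with the class and weakly correlated among themselves. The Python sketch below evaluates the usual CFS merit heuristic, k * r_cf / sqrt(k + k * (k - 1) * r_ff), for given average correlations. It is an illustration only: the function name and the correlation values are assumptions, and Weka estimates the correlations and searches the subsets with its own procedures.

```python
from math import sqrt

def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a subset.

    feature_class_corr: correlation of each feature with the class.
    feature_feature_corr: correlation of each pair of features in the subset.
    """
    k = len(feature_class_corr)
    r_cf = sum(feature_class_corr) / k
    r_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
            if feature_feature_corr else 0.0)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

# Hypothetical correlations for illustration only.
print(cfs_merit([0.6], []))               # one feature
print(cfs_merit([0.6, 0.6], [1.0]))       # adding a fully redundant feature
print(cfs_merit([0.6, 0.6], [0.0]))       # adding an uncorrelated feature
```

A redundant feature leaves the merit unchanged, while an equally predictive but uncorrelated feature increases it.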
Section 3.4 discussed the evaluation of model performance in order to
select the best model for assessment. From the learning results of Weka, using
cross-validation, we will try to minimize the Relative Absolute Error (RAE) and maximize
the area under the ROC curve, combining these two evaluation measures. The
best set of techniques will then be the one that minimizes the RAE and maximizes the ROC area.
In order to do this optimization I will start by comparing the first set of experiments
with features against the results of the baseline experiments, and build the best model
possible from those results.
5.2 Baseline Experiments
In this baseline experiment we have only used the original data recorded from the
mobiles' accelerometers. There are three input features, the X, Y and Z axes, and the target
variable is the label of the activity recorded. We start by testing a divide and
conquer learning algorithm (J48) on all the activities collected and analyzing the
evaluation measures from the cross-validation process.
Then, since the data description showed that there are different
patterns in the data if we separate the dataset between active and passive
activities, the problem will be divided into these two subgroups and the same learning
procedure (the J48 decision tree of Weka) will be applied to classify each subgroup separately.
5.2.1 All Activities Classification
Taking into account all the activities, we can see that the attribute selection process
selected all the axes as useful information to learn a classification model, as
shown in figure 5-1.
Figure 5-1: Feature Selection using CfsSubsetEval
The first classification model learnt was a C4.5 decision tree with a pre-pruning
decision of requiring a minimum number of objects in each leaf: 1% of the instances,
considering the smallest class. From the results in figure 5-2, if we took into
account only the percentage of correctly classified instances, this model would look perfect.
Taking into account that the sample is composed of a large majority of the class
walking and that the passive activities have a steady signal that helps the decision tree
decide where to cut for these two classes, it is clear that this decision
tree was grown considering almost only three classes: walking, sitting and standing. The
class running, according to its ROC area (70%), is classified only slightly better than by a
random guess (which would be 50%).
Figure 5-2: Results from the baseline experiment for all the activities using J48 algorithm.
From the confusion matrix in figure 5-2 we can see that the biggest problem of this top
down approach, with overlapping classes in the axes domains and an unbalanced
dataset, is that the most represented class causes classification problems. We can see,
for example, that most of the instances labeled as running were classified as walking.
This problem would be even more pronounced if we considered only active
activities.
The steady signal of the acceleration for the passive activities is a good indication that,
after discriminating between passive and active activities, classifying between standing
and sitting should be a trivial problem; so we need to focus our efforts mostly on finding
one measure that distinguishes between passive and active activities, and also a set of
features to distinguish between the classes running and walking.
5.2.2 Active Activities Classification
Another approach in this phase of experiments is to grow a decision tree only with
active activities and an unbalanced dataset (more recordings of walking than of running).
Once more J48 was used as the decision tree learning algorithm, with a pre-pruning
decision of requiring a minimum number of instances in each leaf: at least 25 examples,
corresponding to 1% of running. The results of this experiment are in figure 5-3.
Figure 5-3: Results from the baseline experiment only for active activities using J48 algorithm.
The results for this sub-group classification problem are, as expected in section 5.2.1,
very poor. The RAE is nearly 100% and the ROC area shows a model no better than a
random guess.
From the evaluation measures of this model, we can point to two kinds of problems we
are facing with our baseline dataset. First, there are similar distributions caused by the
ambulatory processes of the active activities. Secondly, the disparity between the number of
samples collected for the activity “Running” (1.3% of the dataset “Active.csv”) and for
“Walking” is causing problems in the divide and conquer method of this learning algorithm.
In order to solve these two problems, the development of features will be very important
to discriminate these two activities, and the composition of the dataset must
be more balanced. Since it is very difficult to find subjects willing to collect
accelerometer data and there are no online datasets for activity recognition, it was
decided to balance the dataset in the following experiments by down sampling the
most represented activity, walking, and to focus the sampling efforts on collecting
only running activities.
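Down sampling the majority class can be sketched as below. This Python example is an assumption about one possible procedure (random selection of individual instances), not a description of how the dataset of this thesis was reduced; the function name and the seed are invented. When features are later computed over sliding windows, removing whole contiguous recordings instead of isolated instances would keep the windows intact.

```python
import random

def downsample(instances, majority_label, keep, seed=0):
    """Keep only `keep` randomly chosen instances of the majority class.

    instances: list of (label, sample) pairs; the original order is preserved.
    """
    rng = random.Random(seed)
    majority = [i for i, (label, _) in enumerate(instances) if label == majority_label]
    kept = set(rng.sample(majority, keep))
    return [inst for i, inst in enumerate(instances)
            if inst[0] != majority_label or i in kept]

data = [("Walking", i) for i in range(90)] + [("Running", i) for i in range(10)]
balanced = downsample(data, "Walking", keep=30)
print(len(balanced))
```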
5.2.3 Passive Activities Classification
As at the beginning of this chapter, we used the attribute selection toolkit in
Weka Explorer to find which features to use in the learning algorithm. From the three raw
attributes {X, Y and Z}, CfsSubsetEval
selected only two axes (Y and Z) to discriminate between the activities standing and
sitting, as shown in figure 5-4.
Figure 5-4: Attribute selection from the raw data using CfsSubsetEval and considering only passive activities.
The approach for this subgroup classification problem is to use again the hierarchical
learning algorithm J48 to learn a model. The pre-pruning decision in this case was to
grow a decision tree where each leaf must have a minimum number of
objects: at least 1% of the cases of the activity with fewer examples (in
this case standing: 8273*0.01 ≈ 82 examples per leaf). The input data was the {X, Y and
Z} accelerometer signal, and the results and respective decision tree can be seen in
figure 5-5.
Figure 5-5: Results from the baseline experiment only for passive activities using J48 algorithm.
In the case of passive activities we can see that it is possible to grow a very
simple decision tree that uses the Z axis only once. With these results in terms of accuracy
(≈ 100%), Relative Absolute Error (RAE ≈ 2%) and ROC areas (≈ 1), it is possible to
confirm that there will be no problems classifying passive activities using only
statistical measures of location, after solving the problem of discrimination between
passive and active activities.
5.3 First set of Statistical Measures
The related work on activity recognition shows a huge variety of feature construction
approaches. Most works [3-5] used statistical features or combinations of statistical
features and frequency-domain features.
For this first set of experiments a set of statistical features was chosen that captures
location, dispersion and relations between axes: means and standard deviations [4, 14],
interquartile ranges [5] and correlation measures [2] were computed in sliding windows
of 100 instances, which correspond to 5 seconds of information on average. This first
set of features was computed in the R language [21], and the idea here is to repeat the
experimental setup of the baseline after incorporating the new set of features and
balancing the dataset.
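The feature computation described above can be sketched in R as follows. This is only an illustration under stated assumptions: the data frame `acc` with columns `X`, `Y` and `Z`, the helper name `window_features`, and the choice of non-overlapping windows are all hypothetical, since the text does not say whether consecutive windows overlap.

```r
# Sliding-window statistical features for a tri-axial accelerometer signal.
# Assumptions: `acc` is a data frame with numeric columns X, Y and Z, and
# windows are non-overlapping blocks of `width` consecutive samples.
window_features <- function(acc, width = 100) {
  n_windows <- floor(nrow(acc) / width)
  rows <- lapply(seq_len(n_windows), function(i) {
    w <- acc[((i - 1) * width + 1):(i * width), c("X", "Y", "Z")]
    data.frame(
      Mean.X = mean(w$X), Mean.Y = mean(w$Y), Mean.Z = mean(w$Z),
      SD.X = sd(w$X), SD.Y = sd(w$Y), SD.Z = sd(w$Z),
      IQR.X = IQR(w$X), IQR.Y = IQR(w$Y), IQR.Z = IQR(w$Z),
      Cor.XY = cor(w$X, w$Y), Cor.XZ = cor(w$X, w$Z), Cor.YZ = cor(w$Y, w$Z)
    )
  })
  do.call(rbind, rows)
}

# Synthetic signal: 300 samples, i.e. 15 seconds at 20 Hz
set.seed(1)
acc <- data.frame(X = rnorm(300), Y = rnorm(300, mean = 9.8), Z = rnorm(300))
feats <- window_features(acc)
```

Each row of `feats` describes one 100-sample window, so the 300 synthetic samples yield three rows of twelve features.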
The final train and test dataset is more balanced than the dataset used for the data
analysis in section 4.1 and in the baseline experiments: it has more instances of
running and a down-sampling of walking, which reduces the number of instances to about
150000. This gives about 2 hours of information over all the activities from which to
extract features. Table 5-2 describes the new dataset chosen for the following
experimental procedures.
             Walking   Running   Sitting   Standing
Instances    87630     37647     17776     8070
Percentage   58%       24.9%     11.78%    5.35%
Table 5-2: Number of instances and percentages from the dataset used from section 5.3 until section 5.5.
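As a quick check of the figures in table 5-2, the following sketch recomputes the class shares and the recording time from the instance counts. It assumes the constant 20 Hz sampling rate implied by windows of 100 instances lasting 5 seconds; the recomputed shares may differ in the last digit from the rounded values in the table.

```r
# Class counts as reported in table 5-2
instances <- c(Walking = 87630, Running = 37647, Sitting = 17776, Standing = 8070)

total_instances <- sum(instances)
percentages <- round(100 * instances / total_instances, 2)

# Duration of the recording, assuming a constant 20 Hz sampling rate
sampling_rate_hz <- 20
duration_hours <- total_instances / sampling_rate_hz / 3600

# Number of non-overlapping 100-sample windows available per activity
n_windows <- floor(instances / 100)
```

With these counts the total is 151123 instances, which at 20 Hz corresponds to roughly two hours of signal.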
5.3.1 All Activities
When our problem is considered as a flat classification problem, the attribute selection
procedure, having as target variable all the activities {Running, Walking, Standing and
Sitting} and as input features {Mean.X, Mean.Y, Mean.Z, SD.X, SD.Y, SD.Z, IQR.X, IQR.Y,
IQR.Z, Cor.XY, Cor.XZ and Cor.YZ}, selects only information about the Y axis. This
selection, shown in figure 5-6, results from the different oscillation amplitudes of
running and walking captured by SD.Y (standard deviation of Y) and IQR.Y (interquartile
range of Y), and from the gravitational force measured in the Y axis, which can be used
to distinguish between standing and sitting.
Figure 5-6: Attribute selection from the first set of features using CfsSubsetEval and considering all the activities.
These results do not take into account the rotation that the mobile can undergo inside
the pockets. The information measured in the Y axis can move partially or totally to the
X axis; this problem will be addressed in section 5.5.
After this attribute selection stage, J48 was applied to all the activities, with a
pre-pruning decision of a minimum number of objects in each leaf: at least 1% of the
instances of the least represented activity (Standing). The setup for this experiment is
shown in figure 5-7, and figure 5-8 presents the results of the learning process.
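The pre-pruning threshold used throughout these experiments can be derived from the class counts as sketched below. The helper `min_leaf_size` is hypothetical (it is not a Weka function), and rounding up is an assumption; the resulting value would be supplied to J48 as its minimum number of objects per leaf (`minNumObj`, the `-M` option).

```r
# Minimum number of objects per leaf as a fraction of the smallest class.
# `min_leaf_size` is a hypothetical helper, not a Weka function.
min_leaf_size <- function(class_counts, fraction = 0.01) {
  ceiling(fraction * min(class_counts))
}

counts <- c(Walking = 87630, Running = 37647, Sitting = 17776, Standing = 8070)
leaf_flat <- min_leaf_size(counts)   # 1% of the Standing class
```

Tying the threshold to the smallest class keeps the tree from being pruned so hard that the rarest activity can no longer get a leaf of its own.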
Figure 5-7: Experimental setup of the flat classification process with the J48 algorithm.
Figure 5-8: Results from the first set of experiments using statistical features of location, dispersion and relation
between axes, considering all the activities and using the J48 algorithm.
From the pruned decision tree created by the J48 algorithm, the first conclusion is that
the improvements from the baseline experiments to this stage are remarkable: the ROC
areas are all around 1 and the RAE decreased by 4%.
The decision tree created has 31 leaves and a total of 61 nodes, as can be seen in
figure 5-9, and it uses almost all the features extracted from the axes: means, standard
deviations and interquartile ranges.
Figure 5-9: Information about the decision tree created by this experiment.
Apparently, a simple classification problem with only 4 activities can work pretty well
with these statistical features proposed in the literature. However, since we want to
implement this system in a mobile environment, where there are computational
limitations, the question now is whether we can keep these good results with a simpler
model.
In most classification problems these results would be considered close to perfect, but
we need to point out that in a real application these features would not withstand the
orientation problems that the rotation of the mobile could cause. It would also be good
to improve the classification model in order to obtain a cleaner confusion matrix, since
it is clear that the discrimination between walking and running can still be improved.
Finally, comparing the attributes selected by CfsSubsetEval, which chose only features
extracted from the Y axis to discriminate between groups, with the features used in the
pruned decision tree reveals a big discrepancy between the number of features used by
the tree (9 features to build this classifier) and the number of features chosen by
CfsSubsetEval.
5.3.2 Active Activities
When the experimental setup of section 5.3.1 is applied only to active activities, it is
expected that the statistical measures used to discriminate between walking and running
will be measures of dispersion in each axis. The evaluation measures of this learning
procedure can be found in figure 5-10.
Our initial expectations were correct, as can be seen in figure 5-11, with the
additional information that the classifier now uses only statistical features of
dispersion containing information from the Y axis.
Figure 5-10: Results from the first set of experiments using statistical features of location, dispersion and relation
between axes, considering only active activities and using the J48 algorithm.
The results considering only the active activities are surprising: the RAE decreased
from almost 100% in the baseline experiments to 3% when considering statistical measures
of dispersion calculated from sliding windows, and the areas under the ROC curves are
almost perfect. These results are clearly good, and the decision tree produced does not
show signs of overfitting.
Figure 5-11: Decision tree produced by the J48 classifier only for active activities.
With the results from this dataset it is easy to say that the problem is almost solved,
but it is necessary to keep in mind that all the smartphones were oriented in the same
way, which makes it possible to obtain good results only with features created from the
Y axis. This happens because the vertical oscillations are bigger when running than when
the subject is walking. But what if the orientation of the phone changes inside the
pockets? This problem will be addressed in section 5.5.
5.3.3 Passive Activities
From the baseline experiments it was expected that location measures of the Z axis
would be enough to distinguish between standing and sitting. The algorithm used for
this classification problem, J48, was defined, as before, with a pre-pruning decision of
a minimum number of objects in each leaf: at least 1% of the instances of the smallest
class in this dataset, standing.
This experiment was conducted on a dataset with only passive activities, where the
input features are the same as in section 5.3.1 and the target variable is composed
only of {Standing, Sitting}. The evaluation measures for this learning process can be
found in figure 5-12, along with the description of the decision tree created in
figure 5-13.
The expectations were met: a perfect model is possible with a single mean extracted
from the Z axis. The ROC areas are perfect ( =1) and the RAE is 0%, which is in
accordance with the expectations created in section 5.2.3.
Figure 5-12: Results from the first set of experiments using statistical features of location, dispersion and relation
between axes, considering only passive activities and using the J48 algorithm.
Figure 5-13: Decision tree produced by the J48 classifier only for passive activities.
It is possible to assure the robustness of this classifier without the need to create a
feature in which the rotation of the mobile inside the pockets is taken into account.
If the subject is sitting, the gravitational force will be mostly measured in the Z
axis; when standing, the gravitational force can be measured in the Y axis, the X axis
or a combination of both, but never in the Z axis.
5.4 Hierarchical Classification
The Data Analysis in chapter 4 demonstrated that, in the flat classification problem of
four activities, it is possible to aggregate the classes into two major groups, Passive
and Active activities; this methodology has not been implemented until now.
The idea in this section is to run experiments where the problem is first divided into
Passive and Active activities. The classifier from section 5.3.2 is then used to
classify Walking and Running (inside active activities), and the classifier from section
5.3.3 is used to classify Standing and Sitting (inside passive activities). Until now
the experiments have been developed in comparison with the baseline; the aim now is to
see whether there is a feature in the set used in section 5.3 that is able to
discriminate between passive and active activities.
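The two-level scheme described above can be sketched as a small dispatch function. This is an illustration only: the three rules below are made-up threshold placeholders standing in for the J48 trees, and every function name is hypothetical.

```r
# Two-level (hierarchical) classification with pluggable classifiers.
# Each classifier is a function taking a one-row data frame of features and
# returning a label. The threshold rules below are made-up placeholders,
# not the trees learnt in the thesis.
classify_hierarchical <- function(features, upper, lower_active, lower_passive) {
  vapply(seq_len(nrow(features)), function(i) {
    row <- features[i, , drop = FALSE]
    group <- upper(row)
    if (group == "Active") lower_active(row) else lower_passive(row)
  }, character(1))
}

upper_rule   <- function(r) if (r$SD.Y > 1) "Active" else "Passive"
active_rule  <- function(r) if (r$SD.Y > 4) "Running" else "Walking"
passive_rule <- function(r) if (r$Mean.Z > 5) "Sitting" else "Standing"

windows <- data.frame(SD.Y = c(6.0, 2.0, 0.1, 0.2), Mean.Z = c(1, 1, 9, 0.5))
labels <- classify_hierarchical(windows, upper_rule, active_rule, passive_rule)
```

Because each level is passed in as a function, a lower-level classifier can be replaced, or a new group of activities added, without touching the other levels.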
5.4.1 Sub-grouping into Passive and Active Activities
In the dataset created for the last section, the names of the classes associated with
the computed features were changed. The instructions that rename the classes running
and walking to “Active”, and standing and sitting to “Passive”, are:
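> # Copy each feature table and overwrite the class label (last column)
> Passive_feat1 <- Passive_feat
> Passive_feat1[, ncol(Passive_feat1)] <- "Passive"
> Active_feat1 <- Active_feat
> Active_feat1[, ncol(Active_feat1)] <- "Active"
> # Combine both groups into one dataset and export it for Weka
> All_feat_bin <- merge(Passive_feat1, Active_feat1, all = TRUE)
> write.csv(All_feat_bin, file = "All_feat_bin.csv")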
After this transformation, the goal of the experiments was to find out whether it is
possible to obtain a good discrimination between passive and active activities. The
hypothesis at this stage was that, if this implementation reduced the number of features
needed for the classification process, it would become part of the methodology proposed
to address the problem.
First, CfsSubsetEval was applied in the Select Attributes menu of the Weka Explorer in
order to see which features would best solve this problem. Once more, the features from
the Y axis proved to be the most important, as can be seen in figure 5-14.
Figure 5-14: Attribute selection from the first set of features using CfsSubsetEval and considering all the
activities after renaming them into Passive and Active activities.
Before stepping into the classification problem it is necessary to recall the problem
discussed at the end of section 5.3.2: there are no guarantees that the mobile will
always be carried in the pockets with the same orientation, so the information collected
in the Y axis can be partially or totally transferred to the X axis. This problem will
be addressed in section 5.5.
Regardless of the problem described in the paragraph above, the next step is to apply
the chosen learning algorithm, J48, with the same pre-pruning decision used until now, a
minimum number of examples in each leaf. In this case the threshold is 256 cases in each
leaf, the target variable is the class {Passive, Active}, and the input features are the
same as in section 5.3. The evaluation measures of the model produced are shown in
figure 5-15.
Figure 5-15: Results from the upper level of the hierarchical approach using the same statistical features of
section 5.3, considering all the activities renamed into Passive and Active activities and using the J48 algorithm.
The evaluation measures of this model are an RAE of around 1.7% and areas under the ROC
curves of almost 1. These results must be considered together with the RAEs in sections
5.3.2 and 5.3.3 but, since the RAEs of those sections are almost 0%, we can say that
these results are significantly better than the 10% RAE from section 5.3.1, where all
the activities are considered together as a flat classification problem.
Secondly, it is also important to compare the number of features needed when all the
activities are classified at once with the number needed when the sub-group
decomposition approach is implemented. From section 5.3.1 we know that, if we consider
all the activities in the classification problem, the decision tree incorporates 9
features with information from all the axes: means, standard deviations and
interquartile ranges.
As can be seen in figure 5-16, in the hierarchical approach the first classification,
between passive and active activities, needs 4 features. At the lower level, a passive
activity needs only one feature, the mean of the Z axis (see figure 5-13), to
discriminate between standing and sitting, while an active activity needs 2 features,
taken from the Y axis (see figure 5-11), to discriminate between walking and running.
The features of the lower level are also used to classify between passive and active
activities, so the 9 features needed in section 5.3.1 are reduced to 4 features when
using the hierarchical classifier.
Figure 5-16: Decision tree produced by the J48 classifier for all the activities in the upper level classification of the
hierarchical approach {Passive, Active}.
The conclusion from this section is that the subgroup approach for these 4 activities is
useful, because it improves the results of the classification model and at the same time
reduces the computational needs of the problem, which can be helpful in a mobile
environment.
5.5 The Orientation Problem
In this section we present the development of features that address the orientation
problem caused by the mobile device moving inside the subjects' pants pockets. We still
assume that the mobile is placed in one of the pants pockets of an individual, but this
can be considered a step towards implementing the model in a real environment.
We now propose a feature called the sum of the first derivatives with respect to time.
This feature is proposed because it measures the amount of movement recorded in all
axes, and I believe it can provide the threshold needed to discriminate between Passive
and Active activities, as presented in section 3.2.2:
$$\left(\frac{dx}{dt}\right)^{2} + \left(\frac{dy}{dt}\right)^{2} + \left(\frac{dz}{dt}\right)^{2}$$
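A possible computation of this feature over one window is sketched below. Several things are assumptions here rather than the definition from section 3.2.2: the derivative is approximated by the difference between consecutive samples divided by the sampling period, the squared derivatives of the three axes are added, and the result is summed over the window. The function name and the synthetic signals are hypothetical.

```r
# Sum of squared first differences over a window, as a proxy for the
# "amount of movement". Assumptions: the derivative is approximated by the
# difference between consecutive samples divided by the sampling period, and
# the squared derivatives of the three axes are added and summed per window.
sum_first_derivative <- function(x, y, z, sampling_rate_hz = 20) {
  dt <- 1 / sampling_rate_hz
  dx <- diff(x) / dt
  dy <- diff(y) / dt
  dz <- diff(z) / dt
  sum(dx^2 + dy^2 + dz^2)
}

# Synthetic windows: a motionless device, a moving one, and the same movement
# recorded with the X and Y axes swapped (a 90 degree rotation in the pocket)
still   <- sum_first_derivative(rep(0, 100), rep(9.8, 100), rep(0, 100))
moving  <- sum_first_derivative(sin(1:100), 9.8 + cos(1:100), rep(0, 100))
swapped <- sum_first_derivative(9.8 + cos(1:100), sin(1:100), rep(0, 100))
```

Under these assumptions the constant gravity component cancels in the differences, and the value does not depend on which axis records the movement, which is the property needed to cope with a phone rotating in the pocket.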
5.5.1 Implementing 1st Derivatives
At this stage of the experimental procedures the set of features is almost the same as
in the last section. The only difference is the extraction of a new feature, named the
sum of the 1st Derivatives (Sum.1st.Derivative). The boxplot presented in figure 5-17
gives an idea of the discriminative power of this feature.
[Boxplot: Sum of the 1st Derivatives. Horizontal axis: Activities (Active, Passive); vertical axis: Sum of 1st Derivatives Distribution, 0 to 4000.]
Figure 5-17: Boxplot presenting the Sum.1st.Derivative when considering the upper level of the hierarchical
classification problem.
The experimental setup of this section uses the input features {Mean.X, Mean.Y, Mean.Z,
SD.X, SD.Y, SD.Z, IQR.X, IQR.Y, IQR.Z, Sum.1st.Derivative} to classify the target
variable {Active, Passive} with the learning algorithm J48, from Weka, with a
pre-pruning decision of at least 1% of objects in each leaf (considering the smaller
class: Passive activities). The goal of this experiment is to see whether the new
feature reduces the number of features needed to discriminate between passive and active
activities and, at the same time, solves the orientation problems detected previously.
The results of this experiment and the decision tree produced can be seen in
figure 5-18.
Figure 5-18: Results from the upper level of the hierarchical approach after the implementation of the feature
Sum.1st.Derivative, and the respective decision tree produced by the J48 algorithm.
From the figure above we can conclude that “Sum.1st.Derivative” becomes the most
powerful feature to discriminate between passive and active activities, since it appears
as the root of the decision tree.
The next comparisons are made in relation to section 5.4.1, since the algorithm and
specifications are the same and the only difference is the introduction of the new
feature. The evaluation measures of the model also improved in terms of the RAE
(0.7208%), and the number of features needed to build this classification model was
reduced from 4 features in section 5.4.1 to 3 features now.
This experiment was repeated once more with a new pre-pruning decision of at least 5% of
instances in each leaf, considering the smaller class (in this case the passive
activities). The minimum number of objects in the leaves this time was 1300, and the
results can be seen below in figure 5-19.
Figure 5-19: Results from the upper level of the hierarchical approach with a more severe pre-pruning decision,
and the respective decision tree produced by the J48 algorithm.
With evaluation measures similar to those of section 5.4.1, the major conclusion of this
experiment is that it is possible to use this feature as a threshold to distinguish
between passive and active activities. The confusion matrix shows that most of the
incorrectly classified instances were active activities classified as passive, which can
be explained by transition moments when the subject faced some environmental obstacle.
Finally, this section should present transformations of the statistical features of
section 5.3 following the conceptualization described in section 3.2.2. These
transformations are meant to capture all the information of axes X and Y, since the
mobile device can rotate inside the pockets (see figure 1-1). However, since the dataset
used was always collected with the same placement and orientation, it would be worthless
to present these transformations in this thesis with the dataset available.
5.6 Model Assessment
The previous sections of experiments present two different approaches that must be
compared: flat classification and the hierarchical model. In this section the two
approaches are compared having as target variable the four activities considered
{walking, running, standing, sitting} and as input features {Mean.X, Mean.Y, Mean.Z,
SD.X, SD.Y, SD.Z, IQR.X, IQR.Y, IQR.Z, Sum.1st.Derivative}.
A new flat classification model was learnt in order to include the feature developed in
section 5.5 (Sum.1st.Derivative), and this model is then compared with the hierarchical
model. The comparison is done by estimating the error rates using the cross-validation
method, by comparing the size of the models in number of nodes, and by comparing the
time taken to learn each approach.
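The cross-validation estimate mentioned above can be illustrated with a short base R sketch. The number of folds (k = 10), the one-threshold learner and the synthetic data are assumptions made for this example; in the thesis the models are J48 trees evaluated in Weka.

```r
# k-fold cross-validation of an arbitrary learner (base R only).
# `fit` builds a model from a training data frame; `predict_fn` returns
# predicted labels for a test data frame. Both are supplied by the caller.
cv_error <- function(data, label, fit, predict_fn, k = 10, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  errors <- vapply(seq_len(k), function(f) {
    train <- data[folds != f, , drop = FALSE]
    test  <- data[folds == f, , drop = FALSE]
    model <- fit(train)
    mean(predict_fn(model, test) != test[[label]])
  }, numeric(1))
  mean(errors)
}

# Hypothetical one-threshold learner on synthetic, well separated data
set.seed(42)
toy <- data.frame(
  Sum.1st.Derivative = c(rnorm(50, 10, 1), rnorm(50, 1000, 50)),
  Class = rep(c("Passive", "Active"), each = 50)
)
fit_threshold <- function(train) {
  mean(tapply(train$Sum.1st.Derivative, train$Class, mean))
}
predict_threshold <- function(model, test) {
  ifelse(test$Sum.1st.Derivative > model, "Active", "Passive")
}
err <- cv_error(toy, "Class", fit_threshold, predict_threshold, k = 10)
```

Every instance is used for testing exactly once, so the averaged fold error estimates how the model would behave on data it has not seen.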
5.6.1 Assessment of the Flat Classifier
To start this comparison, a new flat model was learnt using one more feature than the
ones used in section 5.3.1: Sum.1st.Derivative. The input features for this model are
listed at the beginning of section 5.6, and the target variable is composed of the
activities {walking, running, standing, sitting}. The learning algorithm used was J48,
and the pre-pruning decision was a minimum number of objects in each leaf: at least 1%
of the smallest class (in this case standing). The setup can be seen in figure 5-20.
The decision tree produced has 53 nodes (see figure 5-21), fewer than the 61 nodes of
the tree produced in experiment 5.3.1. Since the only change from that experiment to
this one is the introduction of the feature developed in section 5.5, it is possible to
claim that this improvement was caused by the feature Sum.1st.Derivative. Figure 5-21
also shows that it took 27 seconds to build this flat classification model.
Figure 5-20: Setup for the flat classification model.
Figure 5-21: Time taken to build and size of the flat classification model.
The evaluation results of the flat classifier can be seen in figure 5-22.
Figure 5-22: Evaluation results for the flat classifier.
As figure 5-22 shows, the estimated RAE is 3.1325% for the flat classifier. All areas
under the ROC curves have a value close to one, and in general it could be considered
an almost perfect model. These results now need to be compared with the results of the
hierarchical model.
5.6.2 Assessment of the Hierarchical Classifier
In order to estimate the error rate of the hierarchical classifier, a three-step
procedure was implemented. First, we used the model created in section 5.5 (see figure
5-18) to estimate the error rate of the upper classification level. The second step was
to split the dataset into passive and active activities and then build a lower level
classifier for each dataset. Finally, after building the models in the second step, the
weighted error (RAE) was calculated for the hierarchical model in order to compare it
with the error of the flat classification model.
The first step only involves the interpretation of the results presented in figure 5-18.
The RAE of the upper level classifier is 0.7208%. The size of this tree is seven nodes
and the building procedure took 8.41 seconds. Figure 5-23 shows the confusion matrix of
this model; the error at this stage will be propagated down to the leaves of the lower
levels.
Figure 5-23: Confusion matrix of the upper level classifier, from the experiment shown in figure 5-18.
In the second step, after splitting the dataset into passive and active activities, two
lower level models were produced. For the passive activities dataset the target variable
was {standing, sitting} and for the active dataset it was {walking, running}. The input
features for these models were the same, as shown in figure 5-24. The purpose of these
models is to integrate them in the classification procedure after the activities have
been classified as active or passive, as figure 5-25 explains.
Figure 5-24: Passive and active experimental setups.
Figure 5-25: Conceptualization of the hierarchical model.
The output and evaluation results for the passive and active activities models are
presented in figure 5-26. For the active model, the J48 algorithm selected as relevant
features {SD.Y, IQR.Y, SD.Z, IQR.X}, and for the passive model the only feature selected
was {Mean.Z}. The sizes of the trees produced were 11 nodes for the active model and 3
nodes for the passive model. The time taken to build these models was 0.21 seconds for
passive activities and 21.4 seconds for active activities.
Figure 5-26: Full output results for passive and active models, using J48 algorithm.
Finally, it is necessary to calculate a weighted error for the hierarchical model in
order to compare it with the flat classifier. The errors of the upper level classifier
are propagated down to the leaves of the lower level classifiers, so this error must be
fully included in the estimated RAE of the hierarchical model.
On the other hand, in order to estimate the error of the lower level classifiers, we
must take into account only the instances correctly classified by the upper level
classifier that might be misclassified at this stage, through the sensitivity and
specificity of the upper level classifier.
The weighted RAE for this hierarchical classification problem is 4.0084%.
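The reasoning behind the weighted error can be illustrated with the sketch below. It is not the formula used to obtain the 4.0084% above: it works with plain misclassification rates rather than the RAE, it treats Active as the positive class of the upper level (so `sens` is the share of active instances routed correctly and `spec` the share of passive ones), and every number in it is hypothetical.

```r
# Overall error of a two-level hierarchy from its component error rates.
# Upper-level mistakes count in full; lower-level mistakes only count for
# instances the upper level routed correctly. All numbers are hypothetical.
hierarchical_error <- function(p_active, sens, spec, err_active, err_passive) {
  p_passive <- 1 - p_active
  upper_error <- p_active * (1 - sens) + p_passive * (1 - spec)
  lower_error <- p_active * sens * err_active + p_passive * spec * err_passive
  upper_error + lower_error
}

total <- hierarchical_error(p_active = 0.8, sens = 0.99, spec = 0.98,
                            err_active = 0.03, err_passive = 0.00)
```

The sketch makes explicit why a hierarchy can score slightly worse than a flat model even when each of its classifiers is very accurate: the errors of the two levels add up.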
5.7 Discussion
The first conclusion from this set of experiments is that feature construction is the
major problem to solve in order to build good classifiers. The feature developed in
section 5.5 addresses the orientation problem and at the same time produces better
discrimination between active and passive activities. This feature is also important
when we build the lower level classifier to classify between running and walking.
Secondly, from the results seen in section 5.6 it is difficult to make a strong claim
about which modeling approach is better. It is our belief that the hierarchical model is
better than the flat classification model, even if it produces slightly worse results (a
weighted RAE of 4.0084% against 3.1325% for the flat classifier). The hierarchical
approach brings some advantages in terms of knowledge about the activity recognition
problem and, when considering new activities not covered in this thesis, it will give a
better framework to develop new features.
There might also be some discussion about the sampling procedure based on
sequence-based sliding windows. This method was implemented in this work only because
the collection frequency of the accelerometer was fixed at 20Hz. Considering the need
to preserve battery in smartphones, time-based sliding windows should be taken into
account when implementing this model in a mobile application.
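The difference between the two windowing schemes can be seen in a small sketch of time-based windows, where samples are grouped by their timestamps instead of by their position in the sequence. The column names `time`, `X`, `Y` and `Z`, the helper name and the 5 second window length are assumptions for this example.

```r
# Assign samples to time-based windows using their timestamps (in seconds),
# so that window length stays constant even if the sampling rate varies.
# Column names `time`, `X`, `Y`, `Z` are assumptions for this sketch.
time_window_means <- function(samples, window_seconds = 5) {
  window_id <- floor((samples$time - min(samples$time)) / window_seconds)
  aggregate(samples[, c("X", "Y", "Z")], by = list(window = window_id), FUN = mean)
}

# Irregularly sampled synthetic signal covering about 15 seconds
samples <- data.frame(
  time = c(0.0, 0.1, 2.5, 4.9, 5.0, 7.2, 9.9, 10.0, 14.9),
  X = 1:9, Y = rep(9.8, 9), Z = rep(0, 9)
)
by_window <- time_window_means(samples)
```

With a variable sampling rate each window then covers the same time span but may contain a different number of samples, so dispersion features computed on it rest on fewer observations when the device samples more slowly.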
Finally, our claim at this moment is that the model is not yet ready to be used in an
application, given the other problems described as challenges in section 1.3. Most of
these challenges need to be solved first, and only then can we make a strong claim about
the robustness of this activity recognition model in a real mobile environment.
Chapter 6: Conclusions and Future Work
The development of this work was very helpful for developing the author's Data Mining
knowledge, thanks to the working environment and the knowledge shared by the
supervisors. It should be seen as an incomplete work, as all master theses are, but it
is our belief that it can be considered a good framework for future work.
As future work it is important to set at least five main goals. First, it is important
to develop the model so that it can be implemented as an application for the mobile
environment. This should be done by addressing most of the challenges in section 1.3 and
by developing programming skills. Second, once the first goal is achieved, it is
necessary to take new activities into account in the model in order to take advantage of
the hierarchical approach.
In order to consider new activities, better sampling methodologies must be developed.
One approach could be the development of new mobile applications to collect labeled
accelerometer data, making collection easier for the user and for the group developing
the model.
One of the most interesting directions this work could take is the implementation of a
semi-supervised model, where the relations between features can be considered as
constraints so that the model adapts better to a single user by learning from the data
collected by him/her. It is our belief that this approach can give useful information
through the interpretation of the adapted models.
Finally, to address the activity recognition problem, it would be interesting to enrich
the information of the model. Several approaches could be considered. For example, more
information could be collected if more sensors were used; the choice of sensors should
focus on those already integrated in mobile devices. Another approach could be to enrich
the model with server-based information in order to interact with the users. A further
option is not to collect more data, but to use the outputs of the model to track the
routines of the users.
References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
Marc Mertens, G.D., Jonas Van Den Bergh, Toon Goedemé, Koen Milisen, Jos Tournoy,
Jesse Davis, Tom Croonenborghs and Bart Vanrumste. Towards automatic monitoring
of activities using contactless sensors,. in 20th Annual Belgian-Dutch Conference on
Machine Learning, Benelearn 2011. 2011. The Hague, The Netherlands.
Intille, L.B.a.S.S., Activity Recognition from User-Annotated Acceleration Data.
PERVASIVE COMPUTING: Lecture Notes in Computer Science, 2004. Volume
3001/2004: p. 1-17.
Jennifer R. Kwapisz, G.M.W., Samuel A. Moore. Activity Recognition using Cell Phone
Accelerometers. in Fourth International Workshop on Knowledge Discovery from
Sensor Data. 2010. Washington, DC.
Miluzzo, E., et al., Sensing meets mobile social networks: the design, implementation
and evaluation of the CenceMe application, in Proceedings of the 6th ACM conference
on Embedded network sensor systems. 2008, ACM: Raleigh, NC, USA. p. 337-350.
Preece, S.J.G., J.Y.; Kenney, L.P.J.; Howard, D.;, A Comparison of Feature Extraction
Methods for the Classification of Dynamic Activities From Accelerometer Data. IEEE
Transactions on Biomedical Engineering, 2009. 56(3): p. 871 - 879.
Masud, M.M., et al., A Practical Approach to Classify Evolving Data Streams: Training
with Limited Amount of Labeled Data, in Proceedings of the 2008 Eighth IEEE
International Conference on Data Mining. 2008, IEEE Computer Society. p. 929-934.
Santos, A.C., et al., Providing user context for mobile and social networking
applications. Pervasive Mob. Comput., 2010. 6(3): p. 324-341.
Jhun-Ying Yang, Y.-P.C., Gwo-Yun Lee, Shun-Nan Liou and Jeen-Shing Wang, Activity
Recognition Using One Triaxial Accelerometer: A Neuro-fuzzy Classifier with Feature
Reduction. Lecture Notes in Computer Science, 2007. Volume 4740/2007: p. 395-400.
Krishnan, N.C.P., S.;, Analysis of low resolution accelerometer data for continuous
human activity recognition, in IEEE International Conference on Acoustics, Speech and
Signal Processing, 2008 (ICASSP 2008). 2008: Las Vegas, NV. p. 3337 - 3340.
Ermes, M., et al., Detection of Daily Activities and Sports With Wearable Sensors in
Controlled and Uncontrolled Conditions. Information Technology in Biomedicine, IEEE
Transactions on, 2008. 12(1): p. 20-26.
Felicity, R.A. and et al., Classification of a known sequence of motions and postures
from accelerometry data using adapted Gaussian mixture models. Physiological
Measurement, 2006. 27(10): p. 935.
Mathie, M.J., B.G. Celler, N.H. Lovell, and A.C.F. Coster, Classification of basic daily
movements using a triaxial accelerometer. Medical and Biological Engineering
and Computing, 2004. 42(5): p. 679-687.
Preece, S.J., J.Y. Goulermas, L.P.J. Kenney, D. Howard, K. Meijer, and R. Crompton,
Activity identification using body-mounted sensors: a review of classification
techniques. Physiol Meas, 2009. 30(4): p. 1-33.
Pirttikangas, S., K. Fujinami, and T. Nakajima, Feature Selection and Activity
Recognition from Wearable Sensors. Lecture Notes in Computer Science, 2006. 4239:
p. 516-527.
Maurer, U., A. Rowe, A. Smailagic, and D. Siewiorek, Location and Activity
Recognition Using eWatch: A Wearable Sensor Platform. Lecture Notes in Computer
Science, 2006. 3864: p. 86-102.
Győrbíró, N., Á. Fábián, and G. Hományi, An Activity Recognition System For Mobile Phones.
Mobile Networks and Applications, 2008. 14(1): p. 82-91.
17. Babcock, B., M. Datar, and R. Motwani. Sampling From a Moving Window Over Streaming
Data. in SODA. 2002.
18. Quinlan, J.R., C4.5: programs for machine learning. 1993, San Francisco, CA, USA:
Morgan Kaufmann Publishers Inc.
19. Machine Learning Group at University of Waikato, WEKA. 2011.
20. Witten, I.H., E. Frank, and M.A. Hall, Data Mining: Practical Machine Learning Tools
and Techniques. 3rd ed. 2011, San Francisco, CA, USA: Morgan Kaufmann Publishers
Inc.
21. R: A Language and Environment for Statistical Computing. 2009, R Development Core
Team: Vienna, Austria.
22. Hastie, T., R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Springer Series in Statistics. 2001, Springer.
23. Bloedorn, E. and R.S. Michalski, Data-Driven Constructive Induction. IEEE Intelligent
Systems, 1998. 13(2): p. 30-37.
24. Dietterich, T.G. and R.S. Michalski, Inductive Learning of Structural Descriptions:
Evaluation Criteria and Comparative Review of Selected Methods. Artificial Intelligence,
1981. 16(3): p. 257-294.
25. Mehra, P., L.A. Rendell, and B.W. Wah. Principled Constructive Induction. in Eleventh
International Joint Conference on Artificial Intelligence. 1989: Morgan Kaufmann.
26. Pfahringer, B. CIPF 2.0: A Robust Constructive Induction System. in ML-COLT'94. 1994.
27. Guan, D., W. Yuan, Y.-K. Lee, A. Gavrilov, and S. Lee. Activity Recognition Based on
Semi-supervised Learning. in 13th IEEE International Conference on Embedded and
Real-Time Computing Systems and Applications. 2007: IEEE Computer Society.
28. Frank, J., S. Mannor, and D. Precup, Activity Recognition With Mobile Phones. 2011.
29. Frank, J., S. Mannor, and D. Precup. A novel similarity measure for time series data
with applications to gait and activity recognition. in 12th ACM international conference
adjunct papers on Ubiquitous computing. 2010. Copenhagen, Denmark: ACM.
30. Haghighi, P.D., A. Zaslavsky, S. Krishnaswamy, and M.M. Gaber. Mobile
Data Mining for Intelligent Healthcare Support. in 42nd Hawaii International
Conference on System Sciences. 2009. Hawaii: IEEE Computer Society.
31. Goh, J. and D. Taniar, An Efficient Mobile Data Mining Model, in Parallel and
Distributed Processing and Applications. 2005. p. 54-58.
32. Costa, E.P., et al., Comparing several approaches for hierarchical classification of
proteins with decision trees, in Proceedings of the 2nd Brazilian conference on
Advances in bioinformatics and computational biology. 2007, Springer-Verlag: Angra
dos Reis, Brazil. p. 126-137.
33. Chapman, P., et al., CRISP-DM 1.0 Step-by-step data mining guide. 2000.
34. Liu, H. and H. Motoda, Feature Extraction, Construction and Selection: A Data Mining
Perspective. 1998, Norwell, MA, USA: Kluwer Academic Publishers.
35. Datar, M., A. Gionis, P. Indyk, and R. Motwani, Maintaining Stream Statistics over
Sliding Windows. SIAM Journal on Computing, 2002: p. 635-644.
36. Hastie, T., R. Tibshirani, and J.H. Friedman, The elements of statistical learning: data
mining, inference, and prediction. 2009: Springer.
37. Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and
Regression Trees. 1984: Wadsworth International Group.
38. Esposito, F., D. Malerba, and G. Semeraro, Decision Tree Pruning as a Search in the
State Space, in Proceedings of the European Conference on Machine Learning. 1993,
Springer-Verlag. p. 165-184.
39. Bratko, I., Prolog: programming for artificial intelligence. 3rd ed. 2001, Boston, MA,
USA: Addison-Wesley Longman Publishing Co., Inc.
40. Fawcett, T., An introduction to ROC analysis. Pattern Recogn. Lett., 2006. 27(8): p. 861-874.
41. Santos, A.C. AndroLib: Accelerometer Data Recorder. 2010; Available from:
http://www.androlib.com/android.application.pt-acoelhosantos-android-accnFwm.aspx.
42. Santos, A.C., AccDataRec. 2010, Android Market.