THE OPEN SOURCE MATLAB TOOLBOX Gait-CAD AND ITS APPLICATION TO BIOELECTRIC SIGNAL PROCESSING

R. Mikut, O. Burmeister, S. Braun, M. Reischl
Institute for Applied Computer Science, Forschungszentrum Karlsruhe GmbH, Germany
E-Mail [email protected]

Abstract
In this paper, the open source Matlab toolbox Gait-CAD is presented. The toolbox is designed for the visualization and analysis of time series and single features, with a special focus on classification problems. The aim is to provide an open platform for the development and improvement of data mining methods and for their application to various medical and technical problems.

Keywords
Data Mining, Tools, Neuroprostheses

Introduction
In many applications, large data sets of time series and single features are recorded. An at least semi-automatic search for unknown or partially known relations requires the use of data mining methods [1]. In recent years, a huge number of potentially useful methods and software tools have been proposed, including methods for feature extraction, classification, and regression. Many existing software tools are very powerful, but each of them covers only a limited subset of these methods. In addition, the coupling between the different necessary processing steps (e.g. feature extraction from time series and classification) is rather weak. This often leads to the reimplementation of existing methods or to a stepwise transfer of partial results between different tools. Some tools focus on script-based processing, which makes a transfer to other applications difficult because the implemented algorithms have to be adapted manually, which is time-consuming. A generally accepted tool platform does not exist at the moment. These facts make a fast comparison of newly developed methods against a broader set of existing methods very time-consuming. As a consequence, new methods are compared only with a small number of competing approaches, and a broad comparison is not feasible.
In our opinion, an ideal data mining tool
• has to contain various data mining methods from feature extraction to classification and regression, ranging from statistical approaches to newer approaches from computational intelligence,
• has to be free and open source to guarantee a wide acceptance in the scientific community and the fast integration of new methods,
• needs to be modular with well-documented interfaces to integrate various methods useful for highly specialized application domains, and
• has to support a GUI-based exploration of the data set as well as a highly automated script-based processing of routine operations.

This paper presents the Matlab toolbox Gait-CAD as a first step in this direction. It focuses on the visualization and analysis of time series and features, especially for classification but also for regression problems. Our intention is to design an open platform as a framework for the development and improvement of data mining methods.

Methods
The toolbox Gait-CAD is based on Matlab (tested with versions 5.3 and 2007b). A Matlab-based solution was chosen in order to use the wide mathematical functionality of this package provided by The MathWorks, Inc. A main disadvantage is the need for a Matlab license. The toolbox is operated via a graphical user interface (GUI) with menu items and control elements such as popup lists, checkboxes, and edit elements (Figure 1). This enables inexperienced users to work with the toolbox. However, the implemented algorithms work independently of the GUI. Thus, the Matlab-typical way of programming using a command prompt and variables is also possible. Furthermore, analyses can be automated and standardized in batch mode by designing individual macros. More details on the handling are explained in a comprehensible PDF handbook.

Figure 1: Gait-CAD screenshot

Gait-CAD is open source software.
The German version has been available since November 2006, the English one since January 2008. It is licensed under the conditions of the GNU General Public License (GNU-GPL) of the Free Software Foundation. The toolbox can be downloaded from the download section at http://www.iai.fzk.de/projekte/biosignal/index.html.

To use the toolbox for the design of a data mining algorithm, a training data set is required. This data set is normally given by a binary Matlab project file containing matrices and vectors with predefined structures and names. Additionally, the user can add textual identifiers and further information to the matrices and structures. Missing information is replaced by standard values and identifiers. Data can also be imported from text files (single files or complete directories, single features or time series).

The training data set is organized with n = 1, ..., N data points, each containing
• s_z time series (described by a matrix with the dimension s_z × K, where K is the number of sample points),
• s single features (vector with the dimension s), and
• s_y discrete output variables (vector with the dimension s_y).

The management of multiple output variables for each data point (e.g. diagnoses with respect to diseases in medical applications, decisions for therapies, qualitative evaluations of therapy successes, gender, age groups etc.) allows a flexible selection among multiple classification problems. Additionally, input and output variables may be switched depending on the problem.

Gait-CAD implements the standardized data mining process proposed by [2]. The main components are shown in Figure 2. Gait-CAD permits a convenient handling of numerous algorithms for
• the selection of data points (e.g. detection of outliers, discarding of incomplete data points and features, selection of parts of data sets),
• feature extraction (e.g. spectrograms, FFT analysis, correlation analysis, linear filtering, calculation of extrema, mean values, fuzzification etc.),
• the evaluation and selection of features and time series (e.g. multivariate analysis of variances, t-test, information measures, regression analysis),
• feature aggregation (e.g. discriminant analysis, principal component analysis (PCA), independent component analysis (ICA)),
• supervised and unsupervised classification (e.g. decision trees, cluster algorithms, Bayes classifier, artificial neural networks (ANN), nearest neighbour algorithms, support vector machines (SVM), fuzzy systems), and
• validation strategies (e.g. cross-validation, bootstrap).

Additionally, there are various possibilities to visualize results, to log results and process steps automatically in text and LaTeX files, to rename variables etc. For some functions, Gait-CAD uses additional commercial Matlab toolboxes (e.g. the Signal, Statistics, Neural Network, and Wavelet toolboxes from The MathWorks, Inc.) or freely available GNU-GPL toolboxes. However, most of the self-implemented functions require only a standard Matlab installation.

Figure 2: Design process of a data mining algorithm [2] (blocks: Database; Problem formulation (verbalized); Collecting training data set; Problem formulation (formalized); Evaluation measures; Data point selection; Feature extraction; Feature selection; Feature aggregation; Classification/Regression; Validation strategies; Visualization; Design of a data mining method (Gait-CAD))

The feature extraction is realized with plugins. Plugins are single Matlab functions called plugin_*.m, which are located in a special directory or in the working directory. They generate
• new time series from one (e.g. by low-pass or high-pass filtering, segmentation) or more (e.g. minimum, mean or maximum value) existing time series, or
• new single features from one time series in a predefined segment (e.g. mean value for the complete time series or the first 50% of sampling points).
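The data organization and the plugin idea described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not reproduce Gait-CAD's actual project file format or plugin interface; the names `DataPoint`, `segment_mean`, and `extract_feature` are hypothetical. Each data point holds several time series, a vector of single features, and discrete output variables, and a plugin-like function derives a new single feature from one time series in a segment (here the mean over the first 50% of the sample points).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataPoint:
    """One data point of a training data set (hypothetical layout)."""
    time_series: List[List[float]]  # one row per time series, K samples each
    single_features: List[float] = field(default_factory=list)
    outputs: List[int] = field(default_factory=list)  # discrete output variables


def segment_mean(series: List[float], start_frac: float = 0.0,
                 end_frac: float = 1.0) -> float:
    """Mean of a time series over a segment given as fractions of its length."""
    k = len(series)
    if k == 0:
        raise ValueError("time series must contain at least one sample")
    start = min(int(start_frac * k), k - 1)
    end = max(start + 1, int(end_frac * k))  # keep at least one sample
    segment = series[start:end]
    return sum(segment) / len(segment)


def extract_feature(data: List[DataPoint], series_index: int,
                    start_frac: float, end_frac: float) -> None:
    """Append the segment mean of one time series as a new single feature."""
    for point in data:
        series = point.time_series[series_index]
        point.single_features.append(segment_mean(series, start_frac, end_frac))


# Two data points, each with two time series of K = 4 samples and one output.
data = [
    DataPoint(time_series=[[1.0, 3.0, 5.0, 7.0], [0.0, 0.0, 2.0, 2.0]], outputs=[1]),
    DataPoint(time_series=[[4.0, 2.0, 8.0, 8.0], [1.0, 1.0, 1.0, 1.0]], outputs=[2]),
]

# New single feature: mean of the first time series over its first 50% of samples.
extract_feature(data, series_index=0, start_frac=0.0, end_frac=0.5)
```

After the call, every data point carries one additional single feature that can be passed on to feature selection or classification. Because the extraction works directly on the data structure, such a step could equally be triggered from a GUI or from a script.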
The segment can be defined by a special file or interactively by selecting a region of interest. Gait-CAD contains a large number of predefined plugins and segments. This structure allows users to add special feature types for each specific application field.

Macros are recorded sequences of clicked menu items and control elements. Their main advantages are the automation of long sequences of operations (e.g. for use in different projects) and the opportunity to integrate user-defined functions. Macros can be modified manually because they are stored in textual Matlab syntax.

Application-specific extension packages can be easily integrated into the graphical user interface. Gait-CAD contains templates for new menu items and control elements as a starting point for manual modification. These templates allow the integration of user-written functions that use any parameter from the control elements or any available variable. An example is a special package for electroneurography provided by the University of Freiburg. It contains the algorithms described in [3].

Results
In many clinical applications, the available data set contains time series of recorded bioelectric signals such as muscle, nerve, or brain signals. The automatic design of data mining solutions offers an objective and reliable method for the generation of hypotheses for clinical trials, the data-based design of clinical decision support systems for diagnosis and therapy planning, and the adaptation of medical devices to individual patients. An example of the latter task is the detection of user intentions from brain, nerve, or muscle signals, or the information processing of nerve signals from natural limbs for neuroprostheses (Figure 3).
Figure 3: Interface for the design of neuroprostheses [4] (blocks: Intentions; Central Nervous System; Neural Interface; Sensor; Interface Software; Data analysis; Control; Stimulator; Feedback; Pattern generator; Artificial Prostheses; Natural Limbs)

Table 1: Examples for recent applications of Gait-CAD to bioelectric signals (EMG: electromyography, ENG: electroneurography, EEG: electroencephalography, ECoG: electrocorticography)

Applications                                                                  Signals
Hand prosthesis control [5]                                                   EMG
Detection of mechanical stimuli from nerve signals with cuff electrodes [6]   ENG
Detection of artefacts from Functional Electrical Stimulation (FES) [7]       EMG, ENG
Analysis of Central Pattern Generators [8]                                    EMG
Design algorithms for Brain Computer Interfaces [5, 9]                        EEG, ECoG
Gait analysis [10]                                                            EMG

Data analysis plays a key role in this concept, both for the data-based detection of human intentions from bioelectric signals and for the use of biosensors. Gait-CAD has supported these steps in a number of different scenarios (Table 1).

Brain Computer Interfaces are often controlled by imagined movements. The brain signals can be recorded by surface (EEG) or invasive (ECoG) electrode arrays, resulting in a set of time series. The data mining task consists of the extraction of new time series (e.g. by bandpass filters) and a classification to differentiate the movement intentions. In addition, an analysis of the local and temporal information content is useful to understand the processes [9].

Hand prostheses are usually controlled by muscle signals originating from two electrodes. Here, classification problems exist for the switching between different grasp types [5].

For future neuroprostheses, a scenario including functional electrical stimulation and a recording of afferent nerve signals induced by mechanical stimuli is intended. The nerve signals are recorded by cuff electrodes.
Here, very high sampling frequencies (50 kHz) are necessary to extract useful information. The problem is the detection and localization of mechanical stimuli, which is formulated as a classification task [6].

Besides these applications, Gait-CAD is now used in many medical, biological, and technical application scenarios. From a data mining point of view, these very different applications can be unified, and the presented platform makes it possible to exploit the synergies between them.

Discussion
The aim of Gait-CAD is to provide an interface to apply and compare data mining methods. Its architecture allows the toolbox to be extended with further algorithms. Everyone is invited to support the further development of Gait-CAD.

Acknowledgements
Thanks to all the busy programmers, developers of algorithms, and testers, especially to Tobias Loose and Sebastian Gollmer. The support by the Deutsche Forschungsgemeinschaft (German Research Foundation) within the project "Diagnosis support in gait analysis" and the Collaborative Research Center "Humanoid Robots" was a great help in building the basis for the further development of the toolbox.

References
[1] Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P.: From Data Mining to Knowledge Discovery in Databases. AI Magazine, vol. 17, pp. 37–54, 1996.
[2] Mikut, R.; Reischl, M.; Burmeister, O.; Loose, T.: Data Mining in Medical Time Series. Biomedizinische Technik, vol. 51, pp. 288–293, 2006.
[3] Krüger, T. B.; Levchuk, O.; Stieglitz, T.: Decoding of Neural Signals with MATLAB - Onset Detection and Classification as a Guided Tool. Biomedizinische Technik, vol. 52, Ergänzungsband, 2007.
[4] Mikut, R.; Krüger, T.; Reischl, M.; Burmeister, O.; Rupp, R.; Stieglitz, T.: Regelungs- und Steuerungskonzepte für Neuroprothesen am Beispiel der oberen Extremitäten. at - Automatisierungstechnik, vol. 54, pp. 523–536, 2006.
[5] Reischl, M.: Ein Verfahren zum automatischen Entwurf von Mensch-Maschine-Schnittstellen am Beispiel myoelektrischer Handprothesen. Dissertation, Universität Karlsruhe, Universitätsverlag Karlsruhe, 2006.
[6] Krüger, T.; Reischl, M.; Lago, N.; Burmeister, O.; Mikut, R.; Ruff, R.; Hoffmann, K.-P.; Navarro, X.; Stieglitz, T.: Analysis of Microelectrode-Signals in the Peripheral Nervous System, In-Vivo and Post-Processing. In: Proc., Mikrosystemtechnik Kongress Deutschland, pp. 69–72. Freiburg: VDE-Verlag, 2005.
[7] Rohm, M.: Evaluierung und Inbetriebnahme von Sensorkonzepten für die Steuerung von funktionellen Orthesen der oberen Extremität. Diplomarbeit, Universität Darmstadt, Forschungszentrum Karlsruhe, 2008.
[8] Chen, Y.: A Concept for the Application of Neural Oscillators and Spinal Reflexes to Humanoid Robots and Neuroprostheses. Diplomarbeit, Universität Karlsruhe (TH), in preparation, 2008.
[9] Burmeister, O.; Reischl, M.; Mikut, R.: Application of Time-Variant Classifiers to Invasively Recorded Signals from Brain and Peripheral Nerve. Biomedizinische Technik, vol. 52, Ergänzungsband, 2007.
[10] Wolf, S.; Loose, T.; Schablowski, M.; Döderlein, L.; Rupp, R.; Gerner, H. J.; Bretthauer, G.; Mikut, R.: Automated feature assessment in instrumented gait analysis. Gait & Posture, vol. 23 (3), pp. 331–338, 2006.