An adaptive modular approach to the mining of sensor network data

G. Bontempi, Y. Le Borgne (1)
{gbonte,yleborgn}@ulb.ac.be
Machine Learning Group, Université Libre de Bruxelles – Belgium
(1) Supported by the COMP2SYS project, sponsored by the HRM program of the European Community (MEST-CT-2004-505079)

Outline
• Wireless sensor networks: overview
• Machine learning in WSN
• An adaptive two-layer architecture
• Simulation and results
• Conclusion and perspectives

Sensor networks: overview
• Goal: carry out a sensing task over an environment
• Desiderata for the nodes:
  – Autonomous power
  – Wireless communication
  – Computing capabilities

Smart Dust project
• Smart dust: get mote size down to 1 mm³
• Berkeley – Deputy dust (2001):
  – 6 mm³, solar powered
  – Acceleration and light sensors
  – Optical communication
  – Low cost in large quantities

Currently available sensors
• Crossbow: Mica / Mica dot
  – µProc: 4 MHz, 8-bit Atmel RISC
  – Radio: 40 kbit/s at 900/450/300 MHz, or 250 kbit/s at 2.4 GHz (MicaZ, 802.15.4)
  – Memory: 4 KB RAM / 128 KB program flash / 512 KB data flash
  – Power: 2 x AA or coin cell
• Intel: iMote
  – µProc: 12 MHz, 16-bit ARM
  – Radio: Bluetooth
  – Memory: 64 KB SRAM / 512 KB data flash
  – Power: 2 x AA
• MoteIV: Telos
  – µProc: 8 MHz, 16-bit TI RISC
  – Radio: 250 kbit/s at 2.4 GHz (802.15.4)
  – Memory: 2 KB RAM / 60 KB program flash / 512 KB data flash
  – Power: 2 x AA

Applications
• Wildfire monitoring
• Ecosystem monitoring
• Earthquake monitoring
• Precision agriculture
• Object tracking
• Intrusion detection
• …

Challenges for…
• Electronics
• Networking
• Systems
• Databases
• Statistics
• Signal processing
• …

Machine learning and WSN: local scale
• Spatio-temporal correlations
• Local predictive model identification
• Can be used to:
  – Reduce sensor communication activity
  – Predict values for malfunctioning sensors

Machine learning and WSN: global scale
• The network as a whole can achieve high-level tasks
• Sensor network <-> image

Supervised learning and WSN
• Classification (traffic type classification)
• Prediction (pollution forecast)
• Regression (wave intensity, population density)

A supervised learning scenario
• S: network of S sensors
• x(t) = {s_1(t), s_2(t), …, s_S(t)}: snapshot at time t
• y(t) = f(x(t)) + ε(t): the value associated with S at time t (ε standing for noise)
• Let D_N be a set of N observations (x(t), y(t))
• Goal: find a model that predicts y for any new x

Centralized approach
• High transmission overhead

Two-layer approach
• Use of compression to reduce transmission overhead
• Spatial correlation induces low loss in compression
• Reduction of the learning problem dimensionality

Two-layer adaptive approach
• PAST: online compression
• Lazy learning: online learning

Compression: PCA
• PCA transforms the set of n input variables x into a set of m variables z, with m < n
• Linear transformation: z = Wᵀx, with W an n × m matrix
• Variance-preserving criterion: maximization of E[‖Wᵀx‖²] under WᵀW = I
• Solution: the m first eigenvectors of the correlation matrix of x, or equivalently minimization of the reconstruction error E[‖x − WWᵀx‖²]

PAST – Recursive PCA
• Projection Approximation Subspace Tracking [YAN95]
• Online formulation
• Low memory requirement and computational complexity: O(nm) + O(m²)

PAST algorithm
• Recursive formulation [HYV01]: at each time t, with forgetting factor β,
  y(t) = W(t−1)ᵀ x(t)
  h(t) = P(t−1) y(t)
  g(t) = h(t) / (β + y(t)ᵀ h(t))
  P(t) = β⁻¹ [P(t−1) − g(t) h(t)ᵀ]
  W(t) = W(t−1) + [x(t) − W(t−1) y(t)] g(t)ᵀ
  (see the first sketch below)

Learning algorithm
• Lazy learning: a k-NN approach
• Storage of the observation set D_N
• When a query q is asked:
  – Take the k nearest neighbours of q in D_N
  – Build a local linear model ŷ = β̂ᵀx by least squares on these k neighbours
  – Compute the output at q by applying ŷ(q) = β̂ᵀq
  (see the second sketch below)
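To make the PAST recursion above concrete, here is a minimal NumPy sketch of one update step. The forgetting factor value, the random orthonormal initialisation, and the synthetic snapshots are illustrative assumptions, not details from the talk.

```python
import numpy as np

def past_update(W, P, x, beta=0.99):
    """One PAST step: update the n x m subspace estimate W with snapshot x.

    W : current estimate of the m principal directions (n x m)
    P : inverse correlation matrix of the projections (m x m)
    beta : forgetting factor in (0, 1]; values below 1 discount old data,
           which is what makes the tracker adaptive to non-stationarity.
    """
    y = W.T @ x                      # project the snapshot: y = W'x
    h = P @ y
    g = h / (beta + y @ h)           # RLS-style gain vector
    P = (P - np.outer(g, h)) / beta  # update inverse correlation matrix
    e = x - W @ y                    # reconstruction error of the snapshot
    W = W + np.outer(e, g)           # move the subspace toward the error
    return W, P

# Illustrative usage: track an m = 2 subspace over n = 50 sensors.
n, m = 50, 2
W = np.linalg.qr(np.random.randn(n, m))[0]  # random orthonormal start
P = np.eye(m)
for _ in range(1000):
    x = np.random.randn(n)           # stand-in for a sensor snapshot x(t)
    W, P = past_update(W, P, x)
z = W.T @ x                          # compressed m-dimensional signal
```

The per-step cost matches the O(nm) + O(m²) figure on the slide: two matrix-vector products with W plus small m × m updates.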
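And a matching sketch of the lazy learning step just described, with k chosen by leave-one-out as on the following slides. The candidate range for k is an assumption, and the naive refitting loop stands in for the closed-form PRESS / recursive-least-squares computation cited later ([ALL74], [GOO84]).

```python
import numpy as np

def lazy_predict(X, y, q, k_candidates=range(4, 11)):
    """Predict y(q) with a local linear model on the k nearest neighbours.

    X : stored observations (N x d), e.g. compressed snapshots z(t)
    y : stored outputs (N,)
    k is selected by leave-one-out cross-validation among k_candidates.
    """
    order = np.argsort(np.linalg.norm(X - q, axis=1))  # neighbours by distance
    best_k, best_loo = None, np.inf
    for k in k_candidates:
        idx = order[:k]
        Z = np.c_[np.ones(k), X[idx]]                  # design matrix w/ intercept
        errs = []
        for i in range(k):                             # naive leave-one-out:
            keep = np.delete(np.arange(k), i)          # refit without point i
            beta, *_ = np.linalg.lstsq(Z[keep], y[idx][keep], rcond=None)
            errs.append((y[idx][i] - Z[i] @ beta) ** 2)
        loo = np.mean(errs)
        if loo < best_loo:                             # keep the best local model
            best_loo, best_k = loo, k
    idx = order[:best_k]
    Z = np.c_[np.ones(best_k), X[idx]]
    beta, *_ = np.linalg.lstsq(Z, y[idx], rcond=None)
    return float(np.r_[1.0, q] @ beta)                 # prediction at the query
```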
How many neighbours?
• Example: y = sin(x) + e, where e is Gaussian noise with σ = 0.1
• What is the y value at x = 1.5?
• K = 2: overfitting
• K = 3: overfitting
• K = 4: overfitting
• K = 5: good
• K = 6: underfitting

Automatic model selection ([BIR99], [BON99], [BON00])
• Starting with a low k, local models are identified
• Their quality is assessed by a leave-one-out procedure
• The best model(s) are kept for computing the prediction
• Low computational cost:
  – PRESS statistic [ALL74]
  – Recursive least squares [GOO84]

Advantages of PAST and lazy learning
• No assumption on the process underlying the data
• On-line learning capability
• Adaptive to non-stationarity
• Low computational and memory costs

Simulation
• Modeling a wave propagation phenomenon
• Helmholtz equation: ∇²u + k²u = 0, where k is the wave number
• 2372 sensors
• 30 values of k between 1 and 146; 50 time instants
• 1500 observations
• The output k is noisy

Test procedure
• Prediction error measurement: normalized mean squared error (NMSE)
• 10-fold cross-validation (1350/150)
• [Figure: example of a learning curve]

Experiment 1
• Centralized configuration
• Comparison of PCA and PAST for the 1 to 16 first principal components

Results

  m          1      2      3      4      5      6      8      12     16
  NMSE PCA   0.621  0.266  0.181  0.144  0.138  0.134  0.133  0.124  0.116
  NMSE PAST  0.782  0.363  0.257  0.223  0.183  0.196  0.132  0.124  0.115

• Prediction accuracy is similar if the number of principal components is sufficient

Clustering
• The number of clusters involves a tradeoff between:
  – The routing costs between clusters and the gateway
  – The final prediction accuracy
  – The robustness of the architecture

Experiment 2
• Partitioning into geographical clusters: P varies from P(2) to P(7)
• 2 main components for each cluster
• Ten-fold cross-validation, 1500 data
• [Figure: example of a P(2) partitioning]

Results

         P(2)   P(3)   P(4)   P(5)   P(6)   P(7)
  NMSE   0.140  0.118  0.118  0.118  0.116  0.114

• [Figure: comparison of P(2) (top) and P(5) (bottom) error curves]
• As the number of clusters increases:
  – Better accuracy
  – Faster convergence

Experiment 3
• Simulation: at each time instant,
  – Probability of 10% for a sensor failure
  – Probability of 1% for a supernode failure
• Recursive PCA and lazy learning deal efficiently with input-space dimension variations
• Robust to random sensor malfunctioning

Results

         P(2)   P(3)   P(4)   P(5)   P(6)   P(7)
  NMSE   0.501  0.132  0.119  0.116  0.116  0.117

• [Figure: comparison of P(2) (top) and P(5) (bottom) error curves]
• The number of clusters increases the robustness

Experiment 4
• Time-varying changes in the sensor measures: 2700 time instants
• Sensor response decreases linearly from a factor 1 to a factor 0.4
• A temporal window: only the last 1500 measures are kept

Results
• Due to the concept drift, the fixed model (shown in black in the figure) becomes outdated
• The lazy characteristic of the proposed architecture deals with this drift very easily

Conclusion
• The architecture:
  – Yields good results compared to its batch equivalent
  – Is computationally efficient
  – Is adaptive to appearing and disappearing units
  – Handles non-stationarity easily
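To recap the evaluation and adaptation machinery used in the experiments, here is a small sketch of the NMSE score and of the fixed-length temporal window from Experiment 4. The helper names are hypothetical; the window length of 1500 comes from the slides.

```python
import numpy as np
from collections import deque

def nmse(y_true, y_pred):
    """Normalized mean squared error: MSE divided by the variance of y."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

# Temporal window (Experiment 4): keep only the last 1500 observations so
# the lazy learner's neighbourhoods reflect the current sensor response.
window = deque(maxlen=1500)
# At each time instant t: window.append((z_t, y_t)), where z_t is the
# compressed snapshot; older observations fall out automatically.
```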
Future work
• Extension of the tests to real-world data
• Improvement of the clustering strategy:
  – Taking costs (routing/accuracy) into consideration
  – Making use of the ad-hoc features of the network
• Test of other compression procedures:
  – Robust PCA
  – ICA

References
• Smart Dust project: http://www-bsac.eecs.berkeley.edu/archive/users/warneke-brett/SmartDust/
• Crossbow: http://www.xbow.com/
• [BON99] G. Bontempi. Local Techniques for Modeling, Prediction and Control. PhD thesis, IRIDIA, Université Libre de Bruxelles, 1999.
• [YAN95] B. Yang. Projection approximation subspace tracking. IEEE Transactions on Signal Processing, 43(1):95-107, 1995.
• [ALL74] D. M. Allen. The relationship between variable selection and data augmentation and a method of prediction. Technometrics, 16:125-127, 1974.
• [GOO84] G. C. Goodwin and K. S. Sin. Adaptive Filtering, Prediction and Control. Prentice-Hall, 1984.
• [HYV01] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, 2001.

References on lazy learning
• [BIR99] M. Birattari, G. Bontempi, and H. Bersini. Lazy learning meets the recursive least squares algorithm. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, NIPS 11, pages 375-381, Cambridge, MA, 1999. MIT Press.
• [BON99] G. Bontempi, M. Birattari, and H. Bersini. Local learning for iterated time-series prediction. In I. Bratko and S. Dzeroski, editors, Machine Learning: Proceedings of the 16th International Conference, pages 32-38, San Francisco, CA, 1999. Morgan Kaufmann.
• [BON00] G. Bontempi, M. Birattari, and H. Bersini. A model selection approach for local learning. Artificial Intelligence Communications, 121(1), 2000.

Thanks for your attention!