Data Management for Sensor Networks
Elke A. Rundensteiner
Based on papers on TinyDB and Cougar, among others, and on material by S. Nittel
CS525

Overview
• Programming sensor networks
• In-network data aggregation
• In-network query processing
• In-network data storage

Data Collection Scenarios
• Embed numerous distributed devices to monitor and interact with the physical world
• Exploit spatially and temporally dense, in situ sensing and actuation
• Network these devices so that they can coordinate to perform higher-level identification and tasks
• Requires robust distributed systems of hundreds or thousands of devices
(Deborah Estrin, UCLA)

Indoor Applications: Intel cubicle space
• Sensing: light and sound sensors on the ceiling or cubicle walls
• Actuation: detecting occupied cubicles and disturbing conversations outside of cubicles

Outdoor Applications: Napa Valley vineyard
• Sensing: humidity and temperature sensors at the vines
• Actuation: ventilators to remove fog, and localized heaters
• Queries: monitoring micro-climates at the vines

Example: A "Macroscope" in the Redwoods (ACM SenSys 2005)
• Observation over ca. 60 days

Viewing SN as DBS

Viewing a SN as a DB System
Assumption: tiny-footprint database management systems running on the sensor nodes are available, so a sensor network can be viewed as a distributed database system.
• Each sensor node is a database system that can
  – accept, process, and answer queries
  – participate in the execution of global, distributed queries
• The user poses declarative queries to the SN as a whole; the DBMS figures out how to process the query.
• Examples: TinyDB (UC Berkeley), Cougar (Cornell University)
• However: a constrained computing environment
• Adapting existing DB technology for in-network processing!
Sensor Network DBMS
Objectives:
• Users model the application data and data needs
• No low-level programming of the sensor nodes or of data-gathering details
• "What should be done", not "how should it be done"
Approach:
• Declarative SQL-style queries
• Intelligent query processing
• Fault mitigation

Example query:
SELECT MAX(temperat)
FROM sensors
WHERE temperat > thresh
SAMPLE PERIOD 64ms

[Figure: the application sends queries and triggers to TinyDB, which runs on the sensor network and returns data. © S. Madden, 2005]

TinyDB Architecture
[Figure: the query SELECT AVG(temp) WHERE light > 400 is disseminated over the multihop network, and results are returned per epoch (T:1, AVG: 225; T:2, AVG: 250). On each node, the query processor evaluates Agg avg(temp) over Filter light > 400 over samples acquired with get('temp'), which TinyOS serves via getTempFunc(…).]
• Schema: a "catalog" of commands and attributes, e.g., the entry for temp:
  Name: temp; Time to sample: 50 uS; Cost to sample: 90 uJ; Calibration table: 3; Units: Deg. F; Error: ± 5 Deg F; Get f: getTempFunc()…

Declarative Queries for Sensor Networks
Example 1: "Find the sensors in bright nests."
SELECT nodeid, nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Sensors
Epoch  Nodeid  nestNo  Light
0      1       17      455
0      2       25      389
1      1       17      422
1      2       25      405

Aggregation Queries
Example 2:
SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s

Example 3: "Count the number of occupied nests in each loud region of the island."
SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Regions with AVG(sound) > 200
Epoch  region  CNT(…)  AVG(…)
0      North   3       360
0      South   3       520
1      North   3       370
1      South   3       520

Queries over Sensor Networks
Query types:
• Snapshot queries: "Report the current temperature reading of sensor node #1."
• Continuous queries: "Report the temperature readings of sensor nodes #1 to #10 over the next 10 minutes at 1-minute intervals."
• Event queries: "Report when temperature values are above a threshold."
• Meta queries: lifetime estimation, etc.
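To make the epoch semantics of the bright-nests query (Example 1) concrete, here is a minimal Python sketch that evaluates SELECT nodeid, nestNo, light ... WHERE light > 400 over the Sensors readings listed with that query, producing one batch of results per epoch. It simulates the query centrally and is not TinyDB code; the names `Reading`, `SENSORS`, and `run_selection` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Reading(NamedTuple):
    """One row of a hypothetical in-memory sensors relation."""
    epoch: int
    nodeid: int
    nest_no: int
    light: int


# The readings from the Sensors table of Example 1.
SENSORS = [
    Reading(0, 1, 17, 455),
    Reading(0, 2, 25, 389),
    Reading(1, 1, 17, 422),
    Reading(1, 2, 25, 405),
]


def run_selection(readings, threshold=400):
    """Evaluate SELECT nodeid, nestNo, light WHERE light > threshold.

    Returns a dict mapping each epoch to the result tuples produced in
    that epoch; epochs with no qualifying readings are absent.
    """
    results = defaultdict(list)
    for r in readings:
        if r.light > threshold:  # the WHERE clause
            results[r.epoch].append((r.nodeid, r.nest_no, r.light))  # projection
    return dict(results)


if __name__ == "__main__":
    for epoch, rows in sorted(run_selection(SENSORS).items()):
        print(epoch, rows)
```

Node 2's reading of 389 in epoch 0 is filtered out, while its reading of 405 in epoch 1 qualifies, so the result set of a continuous query changes from epoch to epoch.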
Common: spatio-temporal queries
• Point queries ("report the temperature in room 324")
• Spatial window queries ("report the temperature over time from region A")

Queries over Sensor Networks (cont.)
Common (cont.): spatio-temporal aggregation (average, max, min, etc.)
• Temporal aggregation ("max temperature value in the last 24h")
• Spatial aggregation ("average temperature value of all sensors on the first floor")
• Basic aggregates: min, max, average, sum, count, etc.
• Holistic aggregates: estimation

In-Network Data Aggregation
Each sensor node:
• produces a data stream
• processes the data stream locally
• processes aggregated data
• minimizes communication
Collection points: local and locally coordinated processing of data "in the network". Computation is pushed to the data.
[Figure: aggregation tree rooted at A over nodes A to F. Partial state records flow upward: D's record covers {D,E,F}, B's covers {B,D,E,F}, and the query result at A covers {A,B,C,D,E,F}.]

Execution of Aggregates
• Flexible communication topology (network level)
• Aggregation computation over sensor networks consists of two phases:
  – a (query) distribution phase, in which aggregate queries are pushed down into the network, and
  – a (data) collection phase, in which the aggregate values are continually routed up from children to parents.
• Query semantics:
  1. Partition time into epochs of a given duration.
  2. Produce a single aggregate value (when not grouping) that combines the readings of all devices in the network during that epoch.

Distribution Phase
1. When a sensor node n receives a request to aggregate r (e.g., max(temp)), it awakens, synchronizes its clock according to the timing information in the message, and prepares to participate in aggregation. In a tree-based routing scheme, n chooses the sender s of the message as its parent. Query r also includes the interval during which s expects to hear partial state records from n.

Distribution Phase (cont.)
2. n forwards query r down the network, setting the delivery interval for its children to be slightly before the time its parent expects to see n's own partial state record.
In the tree-based approach, this forwarding can be a broadcast of r, which reaches any nodes that did not hear the previous round and includes them as children (if there are any). Nodes continue to forward the request until the query has been propagated throughout the network.

Collection Phase
3. During each epoch, each sensor node
• listens for messages from its children during the interval it specified when forwarding the query;
• acquires its own data (sensing);
• computes a partial state record that combines any child values it heard with its own local sensor readings (aggregation);
• finally, during the transmission interval requested by its parent, transmits its partial state record up the network.

Acquisitional Query Processing
Cynical question: what's really different about sensor networks?
– Low power? Laptops!
– Lots of nodes? Distributed DBs!
– Limited processing capabilities? Moore's Law!
So what is it?

In-Network Query Processing
• The closed-world assumption does not hold: the network could generate an infinite number of samples.
• Key: acquisitional query processing
  – Traditional query processing: query processing on stored data
  – Sensor network query processing: acquiring the data from sensors
• An acquisitional query processor controls when, where, and with what frequency data is collected.

ACQP: What's Different?
• How does the user control acquisition? Rates or lifetimes; event-based triggers
• How should the query be processed? Sampling as an operator; events as joins
• Which nodes have relevant data? Semantic Routing Tree: nodes that are queried together route together
• Which samples should be transmitted? Pick the most "valuable"?

Lifetime Queries
Lifetime vs. sample rate:
SELECT …
LIFETIME 30 days

SELECT …
LIFETIME 10 days
MIN SAMPLE INTERVAL 1s
(implies that not all data is transmitted)

Processing Lifetimes
• At the root: compute a SAMPLE PERIOD that satisfies the lifetime. If it exceeds the MIN SAMPLE PERIOD (MSP), use the MSP and compute a transmission rate.
• At other nodes: use the root's values or slower.

Lifetime-Based Queries
[Figure]

Event-Based Processing
ACQP: initiate queries in response to events.
CREATE BUFFER birds(uint16 cnt)
SIZE 1

ON EVENT bird-enter(…)
SELECT b.cnt+1
FROM birds AS b
OUTPUT INTO b
ONCE
• In-network storage
• Subject to optimization

More Events
ON EVENT bird_detect(loc) AS bd
SELECT AVG(s.light), AVG(s.temp)
FROM sensors AS s
WHERE dist(bd.loc, s.loc) < 10m
SAMPLE PERIOD 1s for 10

Optimizing in ACQP
• Sampling/sensing = "expensive predicate"
• Some subtleties:
  – Which predicate should be "charged" for the sample?
  – Predicates cannot be evaluated without samples.
• Solution: treat sampling as a separate task.

Operator Ordering: Interleave Sampling + Selection
SELECT light, mag
FROM sensors
WHERE pred1(mag) AND pred2(light)
EPOCH DURATION 1s
• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
• At 1 sample/sec, the total power savings could be as much as 3.5 mW, comparable to the processor!
• Traditional DBMS plan: sample mag and light, then apply pred1 and pred2.
• ACQP plan: sample light and apply the cheap pred2 first; sample mag and apply the costly pred1 only for tuples that pass. This is the correct ordering unless pred1 is very selective and pred2 is not.

Exemplary Aggregate Pushdown
SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s
• Traditional DBMS plan: sample mag and light, filter on mag > x, then compute WINMAX.
• ACQP plan: sample light first; only if light > MAX (the current window maximum) sample mag and test mag > x before updating WINMAX.
• Novel, general pushdown technique
• Mag sampling is the most expensive operation!
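The operator-ordering argument can be checked with a simple expected-cost model, a sketch under stated assumptions rather than TinyDB's actual optimizer. If a plan samples one attribute, tests its predicate, and samples the next attribute only when the predicate passed, the expected sampling energy per epoch is c1 + s1 * c2, where c is a per-sample cost and s a selectivity (fraction of tuples passing). The costs below are the slide's 1500 uJ (mag) and 90 uJ (light); the selectivities are made-up inputs, and the names `expected_cost` and `best_order` are hypothetical.

```python
from itertools import permutations

# Per-sample energy costs in microjoules, taken from the slide.
SAMPLE_COST_UJ = {"mag": 1500.0, "light": 90.0}


def expected_cost(order, selectivity, cost=SAMPLE_COST_UJ):
    """Expected sampling energy (uJ) for acquiring attributes in `order`.

    An attribute is sampled only if every predicate evaluated before it
    passed; selectivity[a] is the assumed fraction of tuples passing the
    predicate on attribute a.
    """
    total = 0.0
    p_reach = 1.0  # probability that evaluation reaches this attribute
    for attr in order:
        total += p_reach * cost[attr]
        p_reach *= selectivity[attr]
    return total


def best_order(selectivity, cost=SAMPLE_COST_UJ):
    """Return (order, expected cost) for the cheapest acquisition order."""
    return min(
        ((order, expected_cost(order, selectivity, cost))
         for order in permutations(cost)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    sel = {"mag": 0.5, "light": 0.5}  # hypothetical selectivities
    for order in permutations(SAMPLE_COST_UJ):
        print(order, expected_cost(order, sel))
    print("best:", best_order(sel))
```

With both selectivities at 0.5, sampling mag first costs 1500 + 0.5 * 90 = 1545 uJ, while sampling light first costs 90 + 0.5 * 1500 = 840 uJ. Only when the mag predicate is very selective and the light predicate is not (e.g., 0.01 vs. 0.99) does mag-first win, matching the caveat on the slide.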
Acquisitional Query Processing
Optimization strategies:
• Avoiding unnecessary acquisition
• Sampling as a query operator
• Choosing where to sample via co-acquisition
• Index-like data structures
• Turning frequent event triggering into a continuous join

Event-Join Duality
ON EVENT E(nodeid)
SELECT a
FROM sensors AS s
WHERE s.nodeid = e.nodeid
SAMPLE INTERVAL d FOR k

can be rewritten as

SELECT s.a
FROM sensors AS s, events AS e
WHERE s.nodeid = e.nodeid
AND e.type = E
AND s.time - e.time < k
AND s.time > e.time
SAMPLE INTERVAL d

• Problem: multiple outstanding queries (lots of samples)
• High event frequency → use the rewrite
• Rewrite problem: phase alignment!
• Solution: subsample
[Figure: sampling timeline with interval d and subsampling at d/2]

Adaptive Rate Control
[Figure: "Sample Rate vs. Delivery Rate". Aggregate delivery rate (packets/second) vs. samples per second (per mote) for 1 mote, 4 motes, and 4 motes with adaptive rate control; adaptive control yields 2x successful transmissions.]

Delta Encoding
• Must pick the most valuable data
• How? Domain dependent, e.g., largest, average, shape preserving, frequency preserving, most samples, etc.
• Simple idea for time series: order biggest-change-first

Aggregate Prioritization
• Insight: the shared channel enables nodes to hear neighbor values
• Suppress values that won't affect the aggregate, e.g., MAX
• Applies to all exemplary, monotonic aggregates, e.g., top N, MIN, MAX, etc.

In-Network Data Storage
Storage challenges:
• Centralized storage: transmit all measurements to a central DB for storage
  – Advantage: unconstrained search on historic data
  – Disadvantage: high power consumption
• Hierarchical in-network storage: queries at different levels of detail

Summary
• Declarative query processing
  – Simplifies data collection in sensor nets
  – In-network processing
  – Query optimization for performance
• Acquisitional query processing
  – Focuses on the costs associated with sampling data
  – New challenges of sensor nets
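The aggregate-prioritization idea (suppressing values that cannot change a MAX) can be sketched as below. The sketch assumes an idealized single broadcast domain in which nodes transmit one after another and every node overhears every earlier transmission; real radios lose packets and have limited range, so this is an illustration only. The function name `max_with_suppression` and the sample readings are hypothetical.

```python
def max_with_suppression(readings):
    """Compute MAX over `readings` with overhearing-based suppression.

    Nodes transmit in list order on an idealized shared channel. Each node
    hears all earlier transmissions and stays silent if its value is not
    larger than the largest value heard so far.
    Returns (max_value, number_of_transmissions).
    """
    best = None
    transmissions = 0
    for value in readings:
        if best is None or value > best:
            best = value  # this value can still change the aggregate: send it
            transmissions += 1
        # otherwise the value is suppressed; it cannot affect MAX
    return best, transmissions


if __name__ == "__main__":
    readings = [3, 7, 5, 9, 2]  # hypothetical sensor readings
    result, sent = max_with_suppression(readings)
    print(f"MAX = {result}, transmissions = {sent} of {len(readings)}")
```

The savings depend on transmission order: if the largest value goes first, every other node stays silent, while in ascending order nothing is suppressed. The same test carries over to MIN (compare against the smallest value heard) and to top N (compare against the N-th largest value heard).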