Data Management for Sensor
Networks
Elke A. Rundensteiner
Based on Papers on TinyDB and Cougar, and others;
and on material by S. Nittel
CS525
Overview




Programming sensor networks
In-network data aggregation
In-network query processing
In-network data storage
Data Collection Scenarios
• Embed numerous distributed devices to monitor and interact with the physical world
• Exploit spatially and temporally dense, in situ sensing and actuation
• Network these devices so that they can coordinate to perform higher-level identification and tasks
• Requires robust distributed systems of hundreds or thousands of devices
Deborah Estrin, UCLA
Indoor Applications



Intel cubicle space
Sensing: light and sound sensors on the ceiling or cubicle walls
Actuation: detecting occupied cubicles and disturbing conversations outside of cubicles
Outdoor Applications




Napa Valley vineyard
Sensing: humidity and temperature sensors at vines
Actuation: ventilators to remove fog, and localized heaters
Queries: monitoring micro-climates at vines
Example



A “Macroscope” in the Redwoods. ACM SenSys 2005.
Observation over ca. 60 days.
Viewing SN as DBS
Viewing a SN as a DB System

• Assumption: a sensor network can be viewed as a distributed database system
  • Each sensor node is a database system that can
    • accept, process, and answer queries
    • participate in the execution of global, distributed queries
• Tiny (small-footprint) database management systems running on sensor nodes are available
• The user poses declarative queries to the SN as a whole
• The DBMS figures out how to process the query
• Examples: TinyDB (UC Berkeley), Cougar (Cornell University)
• However: constrained computing environment
  • Adapting existing DB technology for in-network processing!
Sensor Network DBMS

Objectives:
• Users: model the application data and data needs
• No low-level programming of the sensor nodes or of the data gathering details
• “What should be done”, not “how should it be done”
Approach:
• Declarative SQL-style queries
• Intelligent query processing
• Fault mitigation
SELECT MAX(temperat)
FROM sensors
WHERE temperat > thresh
SAMPLE PERIOD 64ms
[Figure: the app sends queries and triggers to TinyDB and receives data back; TinyDB runs over the sensor network. © S. Madden, 2005.]
TinyDB Architecture
[Figure: TinyDB architecture. Queries (e.g., SELECT AVG(temp) WHERE light > 400) arrive over the multihop network and results (T:1, AVG: 225; T:2, AVG: 250) are returned. On each node, TinyDB runs on top of TinyOS; its query processor contains a filter (light > 400) and an aggregate (avg(temp)) and obtains samples through get('temp'), which the schema maps to getTempFunc(…).]
Schema:
• “Catalog” of commands & attributes
Example catalog entry:
Name: temp
Time to sample: 50 us
Cost to sample: 90 uJ
Calibration Table: 3
Units: Deg. F
Error: ± 5 Deg F
Get f: getTempFunc()…
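The catalog idea on this slide can be sketched in Python. This is only an illustration under stated assumptions, not TinyDB's actual implementation: the `Attribute` class, the `catalog` dictionary, the `get` helper, and the stub `get_temp` function (with its fixed return value) are hypothetical names. Only the metadata values for `temp` come from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Attribute:
    """One catalog entry: metadata plus the function that acquires a sample."""
    name: str
    time_to_sample_us: float   # time needed to take one sample
    cost_to_sample_uj: float   # energy needed to take one sample
    units: str
    getter: Callable[[], float]


def get_temp() -> float:
    # Hypothetical stand-in for getTempFunc() on the slide; a real node
    # would read the hardware sensor here.
    return 72.0


# Catalog keyed by attribute name; the values for "temp" are from the slide.
catalog: Dict[str, Attribute] = {
    "temp": Attribute("temp", 50.0, 90.0, "Deg. F", get_temp),
}


def get(name: str) -> float:
    """Resolve an attribute name through the catalog and acquire one sample."""
    return catalog[name].getter()
```

Keeping the cost and time metadata next to the getter is what lets a query processor reason about how expensive an attribute is before it decides to sample it.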
Declarative Queries for Sensor
Networks
Examples:
1. “Find the sensors in bright nests.”
SELECT nodeid, nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Sensors
Epoch  Nodeid  nestNo  Light
0      1       17      455
0      2       25      389
1      1       17      422
1      2       25      405
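To make the per-epoch semantics of the selection query concrete, here is a small Python sketch that applies `light > 400` to the four rows of the Sensors table above. The function name `bright_nests` and the dictionary layout are illustrative choices, not part of TinyDB.

```python
from typing import Dict, List, Tuple

# Rows of the example "Sensors" table from the slide.
sensors: List[Dict[str, int]] = [
    {"epoch": 0, "nodeid": 1, "nestNo": 17, "light": 455},
    {"epoch": 0, "nodeid": 2, "nestNo": 25, "light": 389},
    {"epoch": 1, "nodeid": 1, "nestNo": 17, "light": 422},
    {"epoch": 1, "nodeid": 2, "nestNo": 25, "light": 405},
]


def bright_nests(
    rows: List[Dict[str, int]], threshold: int = 400
) -> Dict[int, List[Tuple[int, int, int]]]:
    """Per epoch, keep (nodeid, nestNo, light) for rows with light > threshold."""
    result: Dict[int, List[Tuple[int, int, int]]] = {}
    for row in rows:
        if row["light"] > threshold:
            result.setdefault(row["epoch"], []).append(
                (row["nodeid"], row["nestNo"], row["light"])
            )
    return result


answers = bright_nests(sensors)
```

With this data, node 2 is filtered out in epoch 0 (light 389) but reported in epoch 1 (light 405), so the answer set changes from one epoch to the next.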
Aggregation Queries
2. SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s

3. “Count the number of occupied nests in each loud region of the island.”
SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Result (regions w/ AVG(sound) > 200):
Epoch  region  CNT(…)  AVG(…)
0      North   3       360
0      South   3       520
1      North   3       370
1      South   3       520
Queries over Sensor Networks

Query types:
• Snapshot queries
  • Report the current temperature reading of sensor node #1
• Continuous queries
  • Report the temperature readings of sensor nodes #1 to #10 over the next 10 minutes at 1-minute intervals
• Event queries
  • Report when temperature values are above threshold 1
• Meta queries
  • Lifetime estimation, etc.
Common:
• Spatio-temporal queries
  • Point queries (“report temperature in room 324”)
  • Spatial window queries (“report temperature over time from region A”)
Queries over Sensor Networks

Common (cont.):
• ST aggregation (average, max, min, etc.)
  • Temporal aggregation (“max temperature value in the last 24h”)
  • Spatial aggregation (“average temperature value of all sensors on the first floor”)
• Basic aggregation:
  • Min, max, average, sum, count, etc.
• Holistic aggregates: estimation
In-Network Data Aggregation
Each sensor node:
• production of data stream
• processing of data stream locally
• processing of aggregated data
• minimize communication
Collection points: local and locally coordinated processing of data “in the network”
Computation is pushed to data
[Figure: routing tree over nodes A to F with the query injected at the root. Partial state records flow up the tree: D forwards {D,E,F}, B forwards {B,D,E,F}, and the root A holds {A,B,C,D,E,F}.]
Execution of Aggregates


• Flexible communication topology (network level)
• Aggregation computation over sensor networks consists of two phases:
  • a (query) distribution phase, in which aggregate queries are pushed down into the network, and
  • a (data) collection phase, where the aggregate values are continually routed up from children to parents.
• Query semantics:
  1. partition time into epochs of a given duration
  2. produce a single aggregate value (when not grouping) that combines the readings of all devices in the network during that epoch.
Distribution Phase

1. When a sensor node n receives a request
to aggregate r (e.g. max(temp)),



it awakens, synchronizes its clock according to
timing information in the message, and prepares
to participate in aggregation.
In tree-based routing scheme, n chooses sender s
of the message as its parent.
Also, query r includes interval when sender s is
expecting to hear partial state records from n .
Distribution Phase

2. n forwards query r down the network,
• setting the delivery interval for its children to be slightly before the time its parent expects to see n's own partial state record.
• In the tree-based approach, this forwarding can be a broadcast of r to n's children (if it has any), which also reaches nodes that did not hear the previous round so that they can join as children.
• Nodes continue to forward the request until the query has been propagated throughout the network.
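The two distribution-phase steps can be sketched as a flood from the root. This is a simplified model, not the protocol's real implementation: the function `distribute`, the `slot_s` parameter (how much earlier each level must report), and the topology are assumptions made for illustration, and radio loss and clock synchronization are ignored. The node names A to F are borrowed from the aggregation figure.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def distribute(
    neighbors: Dict[str, List[str]],
    root: str,
    epoch_s: float,
    slot_s: float,
) -> Tuple[Dict[str, Optional[str]], Dict[str, float]]:
    """Flood a query from the root and build the routing tree.

    Returns (parent, deadline): parent[n] is the node that n first heard the
    query from, and deadline[n] is the offset within the epoch by which n
    must have delivered its partial state record.
    """
    parent: Dict[str, Optional[str]] = {root: None}
    deadline: Dict[str, float] = {root: epoch_s}
    queue = deque([root])
    while queue:
        sender = queue.popleft()
        for node in neighbors[sender]:
            if node in parent:
                continue  # already joined the tree via an earlier broadcast
            parent[node] = sender
            # Children must report slightly before the sender's own deadline.
            deadline[node] = deadline[sender] - slot_s
            queue.append(node)
    return parent, deadline


# Hypothetical radio connectivity between the nodes.
topology = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A"],
    "D": ["B", "E", "F"], "E": ["D"], "F": ["D"],
}
parent, deadline = distribute(topology, "A", epoch_s=1.0, slot_s=0.125)
```

Deeper nodes get earlier deadlines, which is what allows every parent to hear all of its children before its own transmission interval begins.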
Collection Phase

3. During each epoch,
• Each sensor node listens for messages from its children during the interval it specified when forwarding the query.
• It also acquires its own data (sensing).
• It computes a partial state record that combines any child values it heard with its own local sensor readings (aggregation).
• Finally, during the transmission interval requested by its parent, the mote transmits its partial state record up the network.
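One epoch of the collection phase can be sketched for AVG, where the partial state record is a (sum, count) pair. The recursion below stands in for the message exchange between children and parents; the tree shape and the readings are made-up illustrative values, and the helper names `merge`, `collect`, and `evaluate` are hypothetical.

```python
from typing import Dict, List, Tuple

PSR = Tuple[float, int]  # partial state record for AVG: (sum, count)


def merge(a: PSR, b: PSR) -> PSR:
    """Combine two partial state records into one."""
    return (a[0] + b[0], a[1] + b[1])


def collect(node: str, children: Dict[str, List[str]],
            readings: Dict[str, float]) -> PSR:
    """Return the record a node sends to its parent in one epoch: its own
    reading merged with the records received from all of its children."""
    psr: PSR = (readings[node], 1)
    for child in children.get(node, []):
        psr = merge(psr, collect(child, children, readings))
    return psr


def evaluate(psr: PSR) -> float:
    """Turn the final partial state record into the AVG value at the root."""
    return psr[0] / psr[1]


# Hypothetical routing tree and sensor readings for a single epoch.
children = {"A": ["B", "C"], "B": ["D"], "D": ["E", "F"]}
readings = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0, "E": 50.0, "F": 60.0}

record_from_d = collect("D", children, readings)
network_avg = evaluate(collect("A", children, readings))
```

Each node sends exactly one fixed-size record per epoch regardless of how many descendants it has, which is where the communication savings over forwarding raw readings come from.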
Acquisitional Query Processing

Cynical question: what’s really different about sensor networks?
• Low power? Laptops!
• Lots of nodes? Distributed DBs!
• Limited processing capabilities? Moore’s Law!
So what is it?
In-Network Query Processing

• Closed world assumption does not hold
  • Could generate an infinite number of samples
• Key: Acquisitional Query Processing
• Traditional query processing:
  • query processing on stored data
• Sensor network query processing:
  • acquiring the data from sensors
• Acquisitional query processor controls
  • when,
  • where,
  • and with what frequency data is collected
ACQP: What’s Different?

• How does the user control acquisition?
  • Rates or lifetimes
  • Event-based triggers
• How should the query be processed?
  • Sampling as an operator!
  • Events as joins
• Which nodes have relevant data?
  • Semantic Routing Tree
    • Nodes that are queried together route together
• Which samples should be transmitted?
  • Pick most “valuable”?
Lifetime Queries

Lifetime vs. sample rate
SELECT …
LIFETIME 30 days

SELECT …
LIFETIME 10 days
MIN SAMPLE INTERVAL 1s
(Implies not all data is transmitted)
Processing Lifetimes

At root
• Compute the SAMPLE PERIOD that satisfies the lifetime
• If it exceeds the MIN SAMPLE PERIOD (MSP), use MSP and compute the transmission rate
At other nodes
• Use the root’s values or slower
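The root's computation can be illustrated with a deliberately simple energy model. Everything in this sketch is an assumption made for illustration: the function `plan_lifetime`, its parameters, and the model itself (a fixed battery budget spent only on sampling and transmitting, with constant per-operation costs). The slides do not give the actual formula.

```python
from typing import Optional, Tuple


def plan_lifetime(
    battery_j: float,
    lifetime_s: float,
    e_sample_j: float,
    e_tx_j: float,
    min_sample_period_s: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (sample_period_s, transmit_period_s) that meet the lifetime.

    Assumed model: the node may spend battery_j / lifetime_s joules per
    second on average, and each epoch costs one sample plus one transmission.
    """
    budget_w = battery_j / lifetime_s
    period_s = (e_sample_j + e_tx_j) / budget_w
    if min_sample_period_s is None or period_s <= min_sample_period_s:
        return period_s, period_s  # sample and transmit every epoch
    # The lifetime alone would force sampling slower than the user allows:
    # sample at the minimum period and transmit less often instead.
    remaining_w = budget_w - e_sample_j / min_sample_period_s
    if remaining_w <= 0:
        raise ValueError("lifetime cannot be met even without transmitting")
    return min_sample_period_s, e_tx_j / remaining_w


# Hypothetical numbers chosen to keep the arithmetic easy to follow.
unconstrained = plan_lifetime(1000.0, 1000.0, 0.5, 1.5)
constrained = plan_lifetime(1000.0, 1000.0, 0.5, 1.5, min_sample_period_s=1.0)
```

In the constrained case the node samples every second but transmits only every third second, which corresponds to the slide's remark that a minimum sample interval implies not all data is transmitted.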
Lifetime Based Queries
Event Based Processing

ACQP: want to initiate queries in response to events
CREATE BUFFER birds(uint16 cnt)
SIZE 1
(in-network storage)

ON EVENT bird-enter(…)
SELECT b.cnt+1
FROM birds AS b
OUTPUT INTO b
ONCE
(subject to optimization)
More Events
ON EVENT bird_detect(loc) AS bd
SELECT AVG(s.light), AVG(s.temp)
FROM sensors AS s
WHERE dist(bd.loc,s.loc) < 10m
SAMPLE PERIOD 1s for 10
Optimizing in ACQP


• Sampling/sensing = “expensive predicate”
• Some subtleties:
  • Which predicate to “charge”?
  • Can’t operate without samples
• Solution:
  • Treat sampling as a separate task
Operator Ordering: Interleave Sampling + Selection
SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
EPOCH DURATION 1s
• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
• At 1 sample / sec, total power savings could be as much as 3.5mW. Comparable to processor!
[Figure: three query plans. Traditional DBMS: sample mag and light, then apply pred1 and pred2. ACQP, costly: sample mag, apply pred1, then sample light and apply pred2. ACQP, cheap: sample light, apply pred2, then sample mag and apply pred1. The cheap plan is the correct ordering unless pred1 is very selective and pred2 is not.]
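The ordering argument can be checked with a small expected-energy calculation. The two per-sample energies are the figures from the slide; the selectivities (the fraction of readings that pass each predicate) are hypothetical inputs, and the function names are illustrative.

```python
from typing import Dict

E_MAG_UJ = 1500.0   # energy to sample the magnetometer (from the slide)
E_LIGHT_UJ = 90.0   # energy to sample the light sensor (from the slide)


def expected_energy_uj(first_cost: float, second_cost: float,
                       first_selectivity: float) -> float:
    """Expected energy per epoch when the second sensor is sampled only if
    the predicate on the first sensor's reading passes."""
    return first_cost + first_selectivity * second_cost


def plan_costs(sel_pred1_mag: float, sel_pred2_light: float) -> Dict[str, float]:
    """Expected per-epoch sampling energy of the three plans."""
    return {
        "sample both, then filter": E_MAG_UJ + E_LIGHT_UJ,
        "mag first": expected_energy_uj(E_MAG_UJ, E_LIGHT_UJ, sel_pred1_mag),
        "light first": expected_energy_uj(E_LIGHT_UJ, E_MAG_UJ, sel_pred2_light),
    }


# Both predicates pass half of the time (hypothetical selectivities).
costs = plan_costs(sel_pred1_mag=0.5, sel_pred2_light=0.5)
best = min(costs, key=costs.get)

# The exception from the slide: pred1 very selective, pred2 not selective.
exception_costs = plan_costs(sel_pred1_mag=0.01, sel_pred2_light=1.0)
exception_best = min(exception_costs, key=exception_costs.get)
```

Sampling the cheap sensor first wins whenever its predicate filters out enough epochs to avoid most magnetometer samples; the balance flips only in the exception case.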
Exemplary Aggregate Pushdown
SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s
• Novel, general pushdown technique
• Mag sampling is the most expensive operation!
[Figure: two query plans. Traditional DBMS: sample mag and light, apply (mag>x), then WINMAX. ACQP: sample light, apply (light > MAX), then sample mag, apply (mag>x), then WINMAX.]
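A sketch of the pushdown for a single window: the expensive mag sample is taken only when the light reading could raise the current window maximum. The function `winmax_pushdown`, the on-demand `sample_mag` callables, and the readings are hypothetical; the window is assumed to cover eight one-second epochs.

```python
from typing import Callable, Iterable, Optional, Tuple


def winmax_pushdown(
    epochs: Iterable[Tuple[float, Callable[[], float]]],
    x: float,
) -> Tuple[Optional[float], int]:
    """Compute max(light) over one window, counting only epochs with mag > x.

    Each epoch supplies the light reading and a function that samples mag on
    demand. Returns (window max or None, number of mag samples taken).
    """
    current_max: Optional[float] = None
    mag_samples = 0
    for light, sample_mag in epochs:
        if current_max is not None and light <= current_max:
            continue  # cannot change the max, so skip the costly mag sample
        mag_samples += 1
        if sample_mag() > x:
            current_max = light
    return current_max, mag_samples


# Hypothetical readings for one window of eight epochs.
lights = [100.0, 90.0, 120.0, 110.0, 80.0, 130.0, 125.0, 60.0]
mags = [5.0, 9.0, 7.0, 8.0, 2.0, 1.0, 9.0, 9.0]
window = [(light, (lambda m=m: m)) for light, m in zip(lights, mags)]

result, samples_taken = winmax_pushdown(window, x=4.0)
```

With these readings the magnetometer is sampled in four of the eight epochs instead of all eight, and the result equals what filtering first and then taking the maximum would produce.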
Acquisitional Query Processing

Optimization Strategies:
• Avoiding unnecessary acquisition
• Sampling as a query operator
• Choosing where to sample via co-acquisition
• Index-like data structures
• Turn frequent event-triggering into a continuous join
Event-Join Duality
ON EVENT E(nodeid)
SELECT a
FROM sensors AS s
WHERE s.nodeid = e.nodeid
SAMPLE INTERVAL d FOR k

Rewritten as a join:
SELECT s.a
FROM sensors AS s,
events AS e
WHERE s.nodeid = e.nodeid
AND e.type = E
AND s.time - e.time < k
AND s.time > e.time
SAMPLE INTERVAL d

• Problem: multiple outstanding queries (lots of samples)
• High event frequency → use the rewrite
• Rewrite problem: phase alignment!
• Solution: subsample
[Figure: timeline (t) showing sample intervals d and d/2.]
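The join form of the query can be sketched as a nested loop over a stream of samples and a table of recent events. The function `event_join`, the dictionary layout, and the data are illustrative assumptions; the predicate mirrors the WHERE clause of the rewritten query.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[int, str]]


def event_join(samples: List[Row], events: List[Row],
               event_type: str, k: int) -> List[int]:
    """Return s.a for every (sample, event) pair where the sample comes from
    the event's node and was taken less than k time units after the event."""
    out: List[int] = []
    for s in samples:
        for e in events:
            if (s["nodeid"] == e["nodeid"]
                    and e["type"] == event_type
                    and s["time"] > e["time"]
                    and s["time"] - e["time"] < k):
                out.append(s["a"])
    return out


# Hypothetical data: node 1 samples once per time unit; one event on node 1
# at time 1 and one on node 2 at time 0.
samples = [{"nodeid": 1, "time": t, "a": 10 + t} for t in range(6)]
events = [
    {"nodeid": 1, "time": 1, "type": "E"},
    {"nodeid": 2, "time": 0, "type": "E"},
]

joined = event_join(samples, events, event_type="E", k=3)
```

Because one continuously running query serves every event, frequent events no longer create many outstanding queries; the cost is that sampling runs even when no event is pending.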
Adaptive Rate Control
[Figure: Sample Rate vs. Delivery Rate. x-axis: Samples Per Second (Per Mote), 0 to 16; y-axis: Aggregate Delivery Rate (Packets/Second), 0 to 8; series: 1 mote, 4 motes, and 4 motes, adaptive. Annotation: Adaptive = 2x Successful Xmissions.]
Delta Encoding


• Must pick most valuable data
• How?
  • Domain dependent
    • E.g., largest, average, shape preserving, frequency preserving, most samples, etc.
  • Simple idea for time-series:
    • order biggest-change-first
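The biggest-change-first idea for a time series can be sketched as follows. This is one plausible reading of the slide, not a confirmed algorithm: the function `biggest_change_first`, the `budget` parameter (how many samples fit in the available bandwidth), and the assumption that the first sample has already been delivered are all illustrative.

```python
from typing import List, Tuple


def biggest_change_first(series: List[float],
                         budget: int) -> List[Tuple[int, float]]:
    """Return up to `budget` (index, value) pairs, ordered by how much each
    sample differs from its predecessor, largest change first.

    Assumes series[0] is already known to the receiver as a baseline.
    """
    changes = [
        (abs(series[i] - series[i - 1]), i) for i in range(1, len(series))
    ]
    # Sort by descending change; ties are broken by the earlier index.
    changes.sort(key=lambda c: (-c[0], c[1]))
    return [(i, series[i]) for _, i in changes[:budget]]


# Hypothetical readings; only two samples fit in the transmission budget.
readings = [20.0, 21.0, 30.0, 29.0, 29.0, 15.0]
to_send = biggest_change_first(readings, budget=2)
```

Here the drop to 15 and the jump to 30 are sent first, so the receiver sees the shape of the series even though most samples are dropped.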
Aggregate Prioritization


• Insight: shared channel enables nodes to hear neighbor values
• Suppress values that won’t affect the aggregate
  • E.g., MAX
  • Applies to all exemplary, monotonic aggregates, e.g. top N, MIN, MAX, etc.
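Suppression for MAX can be sketched under a strong simplifying assumption: every node hears every transmission on the shared channel, and nodes take turns in a fixed order. The function `transmit_with_snooping` and the readings are hypothetical.

```python
from typing import List, Optional, Tuple


def transmit_with_snooping(
    readings: List[Tuple[str, float]],
) -> List[Tuple[str, float]]:
    """Return the (node, value) pairs that are actually transmitted.

    Nodes are listed in the order they get the channel. A node stays silent
    when a value at least as large as its own has already been overheard,
    because its value cannot change the MAX.
    """
    heard_max: Optional[float] = None
    sent: List[Tuple[str, float]] = []
    for node, value in readings:
        if heard_max is None or value > heard_max:
            sent.append((node, value))
            heard_max = value
    return sent


# Hypothetical readings in channel-access order.
readings = [("n1", 70.0), ("n2", 65.0), ("n3", 82.0), ("n4", 82.0), ("n5", 40.0)]
sent = transmit_with_snooping(readings)
```

Only two of the five nodes transmit, yet the maximum of the transmitted values is still the true maximum. The same test works for MIN with the comparison reversed.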
In-Network Data Storage

Storage challenges:
• Method: transmit all measurements to a central db for storage
• Advantage: unconstrained search on historic data
• Disadvantage: high power consumption
[Figure: centralized storage (a single db) vs. hierarchical in-network storage, which supports queries at different levels of detail.]
Summary

• Declarative Query Processing
  • Simplify data collection in sensor-nets
  • In-network processing
  • Query optimization for performance
• Acquisitional Query Processing
  • Focus on costs associated with sampling data
• New challenges of sensor nets