Data-centric view of
sensornets: An Overview
Puru Kulkarni
Vijay Sundaram
Bhuvan Urgaonkar
Motivation

Ubiquitous presence of sensor networks
– Communication, computation, limited
storage, sensing capabilities
– Used to sense, actuate, control
– Sensors everywhere = Data everywhere!

Require an infrastructure for data
access and storage
Overview



Sensors sense/generate data
Users/Applications interested in data or
some measure of data
Common user operations are:
– Queries and Monitoring
– Actuate and Control
Typical Queries

Historical
– What is the average rainfall over past 2
days?

Current
– What is the current temperature in Rm#
226?

Long Running
– Temperature in Rm# 226 over the next 4
hours every 30 seconds
Issues


How to identify relevant sensors?
Computation vs. Communication tradeoff
– Where to process query?
• inside the sensor network (route query)
– Need new techniques
• at a centralized location (route data)
– Large amounts of data transfer (not efficient)
– Data gathering may not reflect query rate
– How to process query?
• queries on streaming data
DataSpace: Querying and Monitoring
Deeply Networked Collections in Physical Space
T. Imielinski and S. Goel, Rutgers University





Billions of objects
populate space
Each produces and locally
stores data
Location aware
Can be selectively
monitored, queried and
controlled
Physical world enhanced with data
Characteristics

Dataspace
– Data lives on the object
– Users access not only “local” information
but can navigate entire dataspace
– Spatial world divided into 3-D datacubes
• CS Bldg., street, block, etc.
– Communication, messaging and computation
techniques for querying and monitoring
required
Querying and Monitoring


Queries are spatially driven
Steps:
– Identify relevant datacubes
– Identify relevant nodes (dataflocks)
• Datacube directory service
– Aggregation for queries on several
datacubes
• e.g.: Information about Manhattan taxi cabs
Architecting DataSpace

Network as DataSpace engine
– multicast mechanisms
(each node has an IP address!)
– group membership based on
• physical location
• attribute (temperature, #vehicles etc)
– multicast fits selective node addressing criteria to
access relevant data
• e.g.: what is average temperature in CS Bldg?
• Query reaches only sensors that are in the CS Bldg datacube and
have the corresponding group address
Network as DataSpace engine
• Space Handle encodes datacube information
• Subject Handle encodes attributes that are part of a
multicast group
• Dataspace address is an IPv6 multicast address
DataSpace address = <space-handle> followed by <subject-handle>
– <space-handle>: based on location of datacube
– <subject-handle>: based on attribute of interest
E.g.: Space handle: 224.4.5
Subject handle: 8
Dataspace address: 224.4.5.8
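The handle concatenation in the example above can be sketched as follows. The function name `dataspace_address` is hypothetical, and the dotted-string format only mirrors the slide's example; it is not a specified encoding.

```python
def dataspace_address(space_handle: str, subject_handle: int) -> str:
    """Concatenate a space handle and a subject handle into one address.

    Hypothetical helper: the dotted format mirrors the slide example only.
    """
    return f"{space_handle}.{subject_handle}"


# Example values taken from the slide.
address = dataspace_address("224.4.5", 8)
print(address)  # 224.4.5.8
```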
Geographic Routing infrastructure

Route message based on physical
location rather than IP address
– Use GPS coordinates for locations


Avoids use of multicast for routing
queries to datacubes
Once query reaches a region use
multicast
Geographic Routing infrastructure
– Geo-router (routes based on datacube
location)
– Geo-node (issues query to nodes in datacube)
– Geo-host (processes geographic messages)
– Approach
• Route query to datacube
• Geo-nodes route query within datacube
– multicast with a TTL of 1

The Sensor Network as a Database
• Govindan, Hellerstein, Hong, Madden, Franklin,
Shenker

Querying the Physical World
• Bonnet, Gehrke, Seshadri
Sensornet Database architecture


Given a routing and access mechanism, how to
process queries?
Provide a DB-view to users/apps
– well understood programming interface
– common data operations use computation in network
• help energy-efficiency
– allow users to be unaware of actual network, but
treat it as a database
– Sensor Network + Data => Sensor Network Database
What is required?


Core DB operations tailored for sensor
networks
Design appropriate building blocks for
DB operations
– Join, aggregation, grouping, selection etc
Sensornet Database
Architecture

Two important ideas:
1. In-network implementations of primitive
database query operators such as
grouping, aggregation, and joins
– group communication and routing protocols
with possible processing at intermediate
nodes implement the operator in an
application-independent way
Sensornet Database
Architecture
2. Relax the semantics of database queries
to allow approximate results
– relaxation enables energy-efficient
implementations even given the expected
high level of network dynamics
– A sensor network is a proxy for a
continuous real-world phenomenon, and by
nature samples that phenomenon
discretely at some rate, with some
degree of error.
In-network Implementation

JOIN operator
– selection over cross-product of a pair of tables
– Tuples generated at different nodes might be
joined at a single node
– Some JOIN implementations are blocking

Blocking is infeasible in sensor networks
– tables can contain unbounded streams of data
– amount of memory available is limited

Need to retool these operations
– Pipelining
– Partitioning
Non-Blocking Pipelined Joins

Symmetric hash-join:
– Maintains two hash tables (keyed by the column(s) used for the
join)
– On an input tuple, looks up matching tuples from other input’s hash
table
– Outputs any matching results

Ripple joins:
– Statistically sample the two tables to be joined, in order to
produce a stream of joined tuples
– Relative rates at which the two tables are sampled adapt to match
the variance produced by the data in each
– low energy approach to obtain approximate answers
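A minimal sketch of the symmetric hash-join described above, assuming an equi-join on a single key and tuples that arrive one at a time from either input. The class and method names are illustrative, not taken from the cited papers.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Non-blocking equi-join over two tuple streams (illustrative sketch)."""

    def __init__(self, left_key, right_key):
        # One key-extraction function and one hash table per input.
        self.keys = {"left": left_key, "right": right_key}
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, tup):
        """Add a tuple arriving on `side`; return the join results it produces."""
        other = "right" if side == "left" else "left"
        key = self.keys[side](tup)
        # Store the tuple so later arrivals on the other input can match it.
        self.tables[side][key].append(tup)
        # Probe the other input's table for tuples already seen.
        matches = self.tables[other].get(key, [])
        if side == "left":
            return [(tup, m) for m in matches]
        return [(m, tup) for m in matches]


join = SymmetricHashJoin(left_key=lambda t: t[0], right_key=lambda t: t[0])
out = []
out += join.insert("left", ("room226", 21.5))    # no match yet
out += join.insert("right", ("room226", "lit"))  # matches the stored left tuple
out += join.insert("left", ("room226", 22.0))    # matches the stored right tuple
print(out)
```

Results are emitted as soon as a match exists, so neither input has to be read to completion. The hash tables in this sketch grow without bound; on memory-limited nodes they would have to be bounded somehow (for example by a time window), which is not shown here.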
Partitioning

Partitioning:
– tuples are partitioned based on their join-column values and
redistributed on the fly across multiple nodes;
– the work of joining the individual partitions is done in parallel
by each of the nodes

Partitions can be defined by value, geographically, or
by sensor type, and a node (or nodes) can be
designated to perform the join for the partition
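The partition-by-value case can be sketched as below, assuming hash partitioning on the join column and a fixed number of nodes. The function names are hypothetical, and the loop over nodes only simulates work that would run in parallel in the network.

```python
from collections import defaultdict


def partition_by_key(tuples, key, num_nodes):
    """Assign each tuple to a node index based on its join-column value."""
    partitions = defaultdict(list)
    for tup in tuples:
        # Tuples with equal join keys always land on the same node.
        partitions[hash(key(tup)) % num_nodes].append(tup)
    return partitions


def local_join(r_part, s_part, key):
    """Join the two fragments that one node received."""
    return [(r, s) for r in r_part for s in s_part if key(r) == key(s)]


def partitioned_join(r, s, key, num_nodes):
    r_parts = partition_by_key(r, key, num_nodes)
    s_parts = partition_by_key(s, key, num_nodes)
    results = []
    # Each iteration stands in for the work one node would do in parallel.
    for node in range(num_nodes):
        results.extend(
            local_join(r_parts.get(node, []), s_parts.get(node, []), key)
        )
    return results


# Hypothetical tuples: (sensor id, attribute, value).
temps = [(1, "temp", 20), (2, "temp", 25)]
lights = [(1, "light", 300), (3, "light", 100)]
joined = partitioned_join(temps, lights, key=lambda t: t[0], num_nodes=2)
print(joined)
```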
In-network Implementation

Aggregation operators
• summarization of a column(s) into a single
numerical value E.g. SUM, COUNT, AVERAGE,
MIN, MAX etc
• query flooded in the network and the responses
are routed on the reverse path trees,
• results aggregated across several nodes
• E.g: to calculate AVERAGE each node returns
(SUM, COUNT) values to parent
• Can be a very common operator
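The AVERAGE example above can be sketched as a recursive walk over a routing tree in which every node passes a (SUM, COUNT) pair to its parent. The tree layout and readings are made-up illustration values.

```python
def partial_aggregate(node, children, readings):
    """Return (SUM, COUNT) for the subtree rooted at `node`."""
    total, count = readings[node], 1
    for child in children.get(node, []):
        child_sum, child_count = partial_aggregate(child, children, readings)
        total += child_sum
        count += child_count
    return total, count


# Hypothetical routing tree rooted at node 0, with one reading per node.
children = {0: [1, 2], 1: [3, 4]}
readings = {0: 20.0, 1: 22.0, 2: 18.0, 3: 24.0, 4: 16.0}

total, count = partial_aggregate(0, children, readings)
average = total / count
print(total, count, average)  # 100.0 5 20.0
```

Forwarding (SUM, COUNT) rather than a local average keeps the result exact: averaging the children's averages would weight small and large subtrees equally.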
Distributed Sensnet DBs

How to represent devices in DBs on
sensornets?
– ADTs (Abstract Data Types)
– Methods correspond to sensing
functionality
– Virtual Relations (VRs) store local data
– Network used for query operations
Virtual Relation





VR with attributes as
– Inputs to an ADT (device) function
– Arguments to an ADT function
– Output of the function
– Timestamp of the function
Virtual Relation

Some VR properties
– records are never updated or deleted
– is naturally partitioned over the sensnet (each
device takes care of its set of VR records)


What does this mean? – a distributed DB
Records from the VRs (distributed over the
devices) are processed using distributed
query execution plans
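One way to picture a device's fragment of a virtual relation is as an append-only record list. The sketch below assumes a record layout of device identifier, arguments, output and timestamp; the class names and that layout are illustrative assumptions, not the papers' definitions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class VRRecord:
    """One row of a virtual relation; frozen because rows are never updated."""
    device_id: str
    args: Tuple
    output: float
    timestamp: float


class VirtualRelation:
    """Append-only fragment of a virtual relation held by one device."""

    def __init__(self, device_id: str, method: Callable[..., float]):
        self.device_id = device_id
        self.method = method  # stands in for an ADT sensing method
        self._records: List[VRRecord] = []

    def invoke(self, timestamp: float, *args) -> VRRecord:
        """Call the sensing method and append the result as a new record."""
        record = VRRecord(self.device_id, args, self.method(*args), timestamp)
        self._records.append(record)
        return record

    def scan(self) -> Tuple[VRRecord, ...]:
        """Read-only view; there is deliberately no update or delete."""
        return tuple(self._records)


# Hypothetical device whose sensing method returns a fixed temperature.
vr = VirtualRelation("sensor-1", lambda unit: 21.5 if unit == "C" else 70.7)
vr.invoke(0.0, "C")
vr.invoke(30.0, "F")
print(vr.scan())
```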
Approximate Results


Energy-efficiency can be achieved using approximate
aggregates
Uniform sampling:
– Tuples are uniformly sampled and the resulting average is
assumed to represent the actual average
– Packet loss might invalidate the statistical assumptions that
the estimate's confidence intervals depend on.

Logarithmic sampling
– The number of respondents (or the size of memory needed
for the count) scales logarithmically with the size of the
network
– Provides looser error bounds but uses significantly less
memory or communication.
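The uniform-sampling idea can be sketched as below: only a random subset of nodes reports, and the subset's mean stands in for the network-wide mean. The function name, the sample size and the readings are assumptions for illustration, and the sketch ignores packet loss.

```python
import random


def sampled_average(readings, sample_size, seed=None):
    """Estimate the mean of `readings` from a uniform random sample."""
    rng = random.Random(seed)
    # Sample without replacement: each node is asked at most once.
    sample = rng.sample(readings, sample_size)
    return sum(sample) / len(sample)


# Hypothetical temperature readings, one per sensor.
readings = [19.5, 20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 23.0]
estimate = sampled_average(readings, sample_size=4, seed=1)
exact = sum(readings) / len(readings)
print(estimate, exact)
```

Only `sample_size` nodes transmit instead of all of them, which is where the energy saving comes from; the price is an estimate that can differ from the exact average.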
Complex query evaluation

RxSxT
– What order to follow?
• (RxS)xT or Rx(SxT) or (RxT)xS
– Decided by query optimizer
• Usually depends on table size

With Sensornet DB
• Need adaptive policy to route tuples based on
– Energy consumption
– Topology
– Loss rates
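For the traditional case, where the order usually depends on table size, a simple heuristic is to join first the pair with the smallest cross product. The sketch below shows only that size-based choice, with made-up table sizes; it does not model the adaptive, energy-aware policy a sensornet DB would need.

```python
from itertools import combinations


def cheapest_first_join(sizes):
    """Pick the pair of tables whose cross product is smallest."""
    return min(
        combinations(sorted(sizes), 2),
        key=lambda pair: sizes[pair[0]] * sizes[pair[1]],
    )


# Hypothetical table sizes (number of tuples).
sizes = {"R": 1000, "S": 10, "T": 100}
first = cheapest_first_join(sizes)
print(first)  # ('S', 'T'), i.e. evaluate Rx(SxT)
```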
Conclusions


Explosion of data from sensor networks needs
an infrastructure for access, storage etc
Organizing sensors
– Datacubes
– Other techniques ?

Identifying relevant sensors is a prerequisite for
fetching data
– Dataspace provided two solutions
– Other approaches ?
Conclusions

Sensornets as Distributed DB
– Provide a database view to sensornet data
– Pros
• App development easy
• In-network processing helps resource usage
– Cons
• Distributed DB can be difficult
• Requires retooling DB operations for sensornets
• Other approaches?
Representations for Devices
Functions




Internal Representation
We can’t use traditional OO DB methods
- they all demand immediate access
- given the asynchronous nature of sensnets
this is unacceptable
Overview

Direction of sensor networks progress
– Small form-factor devices
– On-board computation
– Wireless communication
– Increased sensing capabilities
– Improved OS and networking functionalities
Prediction:
– Every device (> $1) will have some sensor
– Ubiquitous presence of sensor networks
Overview

Typical sensor networks usage:
– Sense, collect and convey data
– Provides a ubiquitous computing platform
– Applications query/monitor sensed data
• Ecosystem dynamics
• Temperature/weather sensing
• Automobile traffic analysis
– Data-centric network, generated data more
important than node identity
Requirements

Addressing
– Identify relevant sensors

How to access/process data?
– Communicate data and process centrally
– Compute query at node and perform DB
operations

Interface for querying/monitoring and
control
What to do with data?


Answer queries/give useful info
How ??
– Centralized approach
• Communicate data
• Store and process all data at central location
(traditional DB approach)
• Is all temporal data to be stored?
• Communication overhead?
What to do with data?
– De-centralized approach
• Communicate query (query routing)
• Required data attribute of node
• Node stores and communicates data to queries
• Processing at node
• Computation overhead
– Computation overhead smaller than communication!
• How to aggregate data?
• How to route queries?
• How to map nodes to addresses for
communication purposes?
Need for Decentralization

Centralized (Traditional databases)
– Inefficient use of resources
• Large amounts of data communicated to central location
• All sensors send data all the time
– Dissociates access to device from query load
– Communication more expensive than computation

Decentralized (Distributed DBs)
– Data on devices
– In-network query processing
Pipelining Benefits


Provide streamed partial answers, hence,
can enable query refinement
Schemes like ripple joins form a low energy
approach to obtain approximate answers
and can be used together with sampling