Data-centric view of sensornets: An Overview
Puru Kulkarni, Vijay Sundaram, Bhuvan Urgaonkar

Motivation
- Ubiquitous presence of sensor networks
  - Communication, computation, limited storage, sensing capabilities
  - Used to sense, actuate, control
  - Sensors everywhere = Data everywhere!
- Require an infrastructure for data access and storage

Overview
- Sensors sense/generate data
- Users/Applications interested in data or some measure of data
- Common user operations are:
  - Queries and Monitoring
  - Actuate and Control

Typical Queries
- Historical
  - What is the average rainfall over the past 2 days?
- Current
  - What is the current temperature in Rm# 226?
- Long Running
  - Temperature in Rm# 226 over the next 4 hours, every 30 seconds

Issues
- How to identify relevant sensors?
- Computation vs. Communication tradeoff
  - Where to process query?
    - inside the sensor network (route query)
      - Need new techniques
    - at a centralized location (route data)
      - Large amounts of data transfer (not efficient)
      - Data gathering may not reflect query rate
  - How to process query?
    - queries on streaming data

DataSpace: Querying and Monitoring Deeply Networked Collections in Physical Space
T. Imielinski and S. Goel, Rutgers University
- Billions of objects populate space
- Each produces and locally stores data
- Location aware
- Can be selectively monitored, queried and controlled
- Physical world enhanced with data

Characteristics
- Dataspace
  - Data lives on the object
  - Users access not only "local" information but can navigate the entire dataspace
  - Spatial world divided into 3-D datacubes
    - CS Bldg., street, block, etc.
  - Communication, messaging and computation techniques for querying and monitoring required

Querying and Monitoring
- Queries are spatially driven
- Steps:
  - Identify relevant datacubes
  - Identify relevant nodes (dataflocks)
    - Datacube directory service
  - Aggregation for queries on several datacubes
    - e.g.: Information about Manhattan taxi cabs

Architecting DataSpace
- Network as DataSpace engine
  - multicast mechanisms (each node has an IP address!)
  - group membership based on
    - physical location
    - attribute (temperature, #vehicles, etc.)
  - multicast fits selective node addressing criteria to access relevant data
    - e.g.: what is the average temperature in CS Bldg?
    - Query reaches only sensors that are in the CS Bldg datacube and have the corresponding group address

Network as DataSpace engine
- Space handle encodes datacube information
- Subject handle encodes the attributes that are part of a multicast group
- Dataspace address is an IPv6 multicast address
- DataSpace address = <space-handle> <subject-handle>
  - <space-handle>: based on location of datacube
  - <subject-handle>: based on attribute of interest
- E.g.:
  - Space handle: 224.4.5
  - Subject handle: 8
  - Dataspace address: 224.4.5.8

Geographic Routing infrastructure
- Route message based on physical location rather than IP address
  - Use GPS coordinates for locations
- Avoids use of multicast for routing queries to datacubes
- Once query reaches a region, use multicast

Geographic Routing infrastructure
  - Geo-router (routes based on datacube location)
  - Geo-node (issues query to nodes in datacube)
  - Geo-host (processes geographic messages)
  - Approach
    - Route query to datacube
    - Geo-nodes route query within datacube
      - multicast with a TTL of 1

- The Sensor Network as a Database
  - Govindan, Hellerstein, Hong, Madden, Franklin, Shenker
- Querying the Physical World
  - Bonnet, Gehrke, Seshadri

Sensornet Database architecture
- Given a routing and access mechanism, how to process queries?
- Provide a DB-view to users/apps
  - well understood programming interface
  - common data operations use computation in network
    - helps energy-efficiency
  - allow users to be unaware of actual network, but treat it as a database
  - Sensor Network + Data => Sensor Network Database

What is required?
- Core DB operations tailored for sensor networks
- Design appropriate building blocks for DB operations
  - Join, aggregation, grouping, selection, etc.

Sensornet Database Architecture
- Two important ideas:
  1. In-network implementations of primitive database query operators such as grouping, aggregation, and joins
     - group communication and routing protocols, with possible processing at intermediate nodes, implement the operator in an application-independent way

Sensornet Database Architecture
  2. Relax the semantics of database queries to allow approximate results
     - relaxation enables energy-efficient implementations even given the expected high level of network dynamics
     - A sensor network is a proxy for a continuous real-world phenomenon, and by nature samples that phenomenon discretely at some rate, with some degree of error.

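As a rough illustration of trading exactness for energy, the sketch below estimates an average from a uniform random sample of nodes instead of collecting every reading. The readings, the sample size, and the function name approximate_average are made up for illustration; they are not taken from the papers the deck summarizes.

```python
import random


def approximate_average(readings, sample_size, rng):
    """Estimate the mean from a uniform sample.

    Returns (estimate, messages_used), where messages_used stands in
    for the number of nodes that had to report a value.
    """
    sample = rng.sample(readings, sample_size)
    return sum(sample) / len(sample), len(sample)


# Hypothetical temperature readings, one per node.
readings = [20 + (i % 10) * 0.5 for i in range(1000)]
exact = sum(readings) / len(readings)

# A seeded generator keeps the sketch reproducible.
estimate, messages = approximate_average(readings, 100, random.Random(0))
print(f"exact={exact:.2f} estimate={estimate:.2f} "
      f"messages={messages} of {len(readings)}")
```

Only a tenth of the nodes report here, at the cost of some error in the estimate.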
In-network Implementation
- JOIN operator
  - selection over cross-product of a pair of tables
  - Tuples generated at different nodes might be joined at a single node
  - Some JOIN implementations are blocking
- Blocking is infeasible in sensor networks
  - tables can contain unbounded streams of data
  - amount of memory available is limited
- Need to retool these operations
  - Pipelining
  - Partitioning

Non-Blocking Pipelined Joins
- Symmetric hash-join:
  - Maintains two hash tables (keyed by the column(s) used for the join)
  - On an input tuple, inserts it into its own input's hash table and looks up matching tuples in the other input's hash table
  - Outputs any matching results
- Ripple joins:
  - Statistically sample the two tables to be joined, in order to produce a stream of joined tuples
  - Relative rates at which the two tables are sampled adapt to match the variance produced by the data in each
  - low energy approach to obtain approximate answers

Partitioning
- Partitioning:
  - tuples are partitioned based on their join-column values and redistributed on the fly across multiple nodes
  - the work of joining the individual partitions is done in parallel by each of the nodes
- Partitions can be defined by value, geographically, or by sensor type, and a node (or nodes) can be designated to perform the join for the partition

In-network Implementation
- Aggregation operators
  - summarization of a column(s) into a single numerical value, e.g. SUM, COUNT, AVERAGE, MIN, MAX, etc.
  - query is flooded in the network and the responses are routed on the reverse path trees
  - results aggregated across several nodes
  - E.g.: to calculate AVERAGE, each node returns (SUM, COUNT) values to its parent
  - Can be a very common operator

Distributed Sensornet DBs
- How to represent devices in DBs on sensornets?
  - ADTs (Abstract Data Types)
  - Methods correspond to sensing functionality
  - Virtual Relations (VRs) store local data
  - Network used for query operations

Virtual Relation
- VR with attributes as
  - Inputs to an ADT (device) function
  - Arguments to an ADT function
  - Output of the function
  - Timestamp of the function

Virtual Relation
- Some VR properties
  - records are never updated or deleted
  - is naturally partitioned over the sensornet (each device takes care of its set of VR records)
- What does this mean?
  - a distributed DB
- Records from the VRs (distributed over the devices) are processed using distributed query execution plans

Approximate Results
- Energy-efficiency can be achieved using approximate aggregates
- Uniform sampling:
  - Tuples are uniformly sampled and the resulting average is assumed to represent the actual average
  - Packet loss might invalidate the statistical assumptions that the confidence intervals on the estimate depend on
- Logarithmic sampling:
  - The number of respondents (or the size of memory needed for the count) scales logarithmically with the size of the network
  - Provides looser error bounds but uses significantly less memory or communication

Complex query evaluation
- RxSxT
  - What order to follow?
    - (RxS)xT or Rx(SxT) or (RxT)xS
  - Decided by query optimizer
    - Usually depends on table size
- With Sensornet DB
  - Need adaptive policy to route tuples based on
    - Energy consumption
    - Topology
    - Loss rates

Conclusions
- Explosion of data from sensor networks needs an infrastructure for access, storage, etc.
- Organizing sensors
  - Datacubes
  - Other techniques?
- Identifying relevant sensors is a prerequisite to fetching data
  - Dataspace provided two solutions
  - Other approaches?

Conclusions
- Sensornets as Distributed DB
  - Provide a database view to sensornet data
  - Pros
    - App development easy
    - In-network processing helps resource usage
  - Cons
    - Distributed DB can be difficult
    - Requires retooling DB operations for sensornets
    - Other approaches?

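To make the resource argument for in-network processing concrete, here is a minimal sketch of the AVERAGE aggregation from the aggregation slide, where each node returns a (SUM, COUNT) pair to its parent. The routing tree, the readings, and the function name partial_average are hypothetical, and the recursion only stands in for messages travelling up the reverse-path tree.

```python
def partial_average(node, children, readings):
    """Return the (sum, count) pair for the subtree rooted at node.

    Each node combines its own reading with the pairs reported by its
    children, so a parent receives one pair per child rather than
    every raw reading below it.
    """
    total, count = readings[node], 1
    for child in children.get(node, []):
        child_sum, child_count = partial_average(child, children, readings)
        total += child_sum
        count += child_count
    return total, count


# Hypothetical routing tree: node 0 is the root where the query entered.
children = {0: [1, 2], 1: [3, 4], 2: [5]}
readings = {0: 21.0, 1: 22.0, 2: 20.0, 3: 23.0, 4: 19.0, 5: 21.0}

total, count = partial_average(0, children, readings)
average = total / count
print(total, count, average)
```

Sending (SUM, COUNT) rather than a local average matters: averaging the children's averages would weight subtrees of different sizes equally and give the wrong result.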
Representations for Devices Functions
- Internal Representation
  - We can't use traditional OO DB methods: they all demand immediate access, which is unacceptable given the asynchronous nature of sensornets

Overview
- Direction of sensor networks progress
  - Small form-factor devices
  - On-board computation
  - Wireless communication
  - Increased sensing capabilities
  - Improved OS and networking functionalities
- Prediction:
  - Every device (> 1 $) will have some sensor
  - Ubiquitous presence of sensor networks

Overview
- Typical sensor networks usage:
  - Sense, collect and convey data
  - Provides a ubiquitous computing platform
  - Applications query/monitor sensed data
    - Ecosystem dynamics
    - Temperature/weather sensing
    - Automobile traffic analysis
  - Data-centric network, generated data more important than node identity

Requirements
- Addressing
  - Identify relevant sensors
- How to access/process data?
  - Communicate data and process centrally
  - Compute query at node and perform DB operations
- Interface for querying/monitoring and control

What to do with data?
- Answer queries/give useful info
- How?
  - Centralized approach
    - Communicate data
    - Store and process all data at central location (traditional DB approach)
    - Is all temporal data to be stored?
    - Communication overhead?

What to do with data?
  - De-centralized approach
    - Communicate query (query routing)
    - Required data attribute of node
    - Node stores and communicates data to queries
    - Processing at node
    - Computation overhead
      - Computation overhead smaller than communication!
    - How to aggregate data?
    - How to route queries?
    - How to map nodes to addresses for communication purposes?

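One possible answer to the addressing question is the DataSpace scheme from earlier in the deck, where a space handle and a subject handle are combined into a group address. The sketch below mirrors the deck's own example (224.4.5 plus 8 gives 224.4.5.8). The function names, the dotted-string format, and the in-memory groups table are assumptions for illustration; this does not construct a real IPv6 multicast address or use any networking API.

```python
def dataspace_address(space_handle, subject_handle):
    """Combine a datacube's space handle with an attribute's subject handle."""
    return f"{space_handle}.{subject_handle}"


# Hypothetical group table: address -> set of member node names.
groups = {}


def join_group(node, space_handle, subject_handle):
    """Register a node under the address for its datacube and attribute."""
    address = dataspace_address(space_handle, subject_handle)
    groups.setdefault(address, set()).add(node)


def recipients(space_handle, subject_handle):
    """Nodes that a query sent to this datacube/attribute would reach."""
    address = dataspace_address(space_handle, subject_handle)
    return sorted(groups.get(address, set()))


# Two temperature sensors in one datacube, one sensor in another datacube.
join_group("sensor-a", "224.4.5", 8)
join_group("sensor-b", "224.4.5", 8)
join_group("sensor-c", "224.4.6", 8)

address = dataspace_address("224.4.5", 8)
reached = recipients("224.4.5", 8)
print(address, reached)
```

Only the members of the matching group are reached, which is the selective addressing property the multicast slides rely on.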
Need for Decentralization
- Centralized (Traditional databases)
  - Inefficient use of resources
    - Large amounts of data communicated to central location
    - All sensors send data all the time
  - Dissociates access to device from query load
  - Communication more expensive than computation
- Decentralized (Distributed DBs)
  - Data on devices
  - In-network query processing

Pipelining Benefits
- Provide streamed partial answers, hence, can enable query refinement
- Schemes like ripple joins form a low energy approach to obtain approximate answers and can be used together with sampling
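The streamed partial answers mentioned above can be seen in a small sketch of the symmetric hash join from the pipelined-joins slide: every arriving tuple is stored in its own side's hash table and immediately probed against the other side, so matches are emitted without waiting for either input to finish. The class name, the tuple layout, and the sample readings are hypothetical. The sketch also keeps every tuple forever, which a memory-limited sensor node could not do; some windowing or eviction policy would be needed and is left out here.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Non-blocking equi-join over two input streams, 'left' and 'right'."""

    def __init__(self):
        # One hash table per input, keyed by the join column value.
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, key, row):
        """Store the row, then return the joined pairs it produces right away."""
        other = "right" if side == "left" else "left"
        self.tables[side][key].append(row)
        matches = self.tables[other].get(key, [])
        if side == "left":
            return [(row, match) for match in matches]
        return [(match, row) for match in matches]


join = SymmetricHashJoin()
results = []
# Hypothetical streams joined on room number.
results += join.insert("left", 226, "temp=21.5")    # no match yet
results += join.insert("right", 226, "light=300")   # joins with temp=21.5
results += join.insert("right", 226, "light=310")   # joins with temp=21.5
results += join.insert("left", 227, "temp=19.0")    # no match yet
print(results)
```

Because each call returns its matches immediately, a consumer can start refining the query after the first few results instead of waiting for a full scan.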
