CS 862 Presentation
Querying the Physical World ------ Cornell University
Event Detection Services Using Data Service Middleware in Distributed Sensor Networks ------ University of Virginia
Presented By Gary Zhou @ UVA

Comparison between these two papers
Query the physical world
  Provides a database-like abstraction to applications
  No avi (absolute validity interval) value for each data item, so it is not really real-time based
  Concentrates on the individual mote
  Special point of interest: represents device functions
Event Detection Service
  Provides a database-like abstraction to applications
  There is an avi value for each data item, so it is really real-time based
  Group-based robust coordination
  Special point of interest: provides an event detection service

Outline --- Querying the Physical World
Device Networks & Their Query Processing
  Description of Device Networks
  Three kinds of queries
  Two approaches
Device Database System
  Device & Function
  User representation
  Internal representation
  Queries
Query Processing over Device Database System
  Performance Metrics
  Distributed Query Execution Plans
  Experiments
Discussions

Outline --- Event Detection Service
Motivation
Data services in sensor networks
Data Service Middleware (DSWare)
Pay more attention to Event Detection Service
Experiments and performance
Discussions

Device Networks & Their Query Processing
Description of Device Network
  The widespread deployment of sensors, actuators and mobile devices is transforming the physical world into a computing platform.
  Emerging networking techniques ensure that devices are interconnected and accessible from local- or wide-area networks.
  Using this new computing platform, users interact with portions of the physical world.

Three kinds of Queries
Historical queries
  These are typically aggregate queries over historical data obtained from the device network.
  An example --- For each rainfall sensor in 1800 JPA, display the average level of rainfall for 1999.
Snapshot queries
  These queries concern the device network at a given point in time.
  An example --- Retrieve the current rainfall level for all sensors in 1800 JPA.
Long-running queries
  These queries concern the device network over a time interval.
  An example --- For the next 5 hours, retrieve every 30 seconds the rainfall level for all sensors in 1800 JPA.

Two Approaches
The warehousing approach
  Definition --- Data are extracted from the devices in a predefined way and stored in a centralized database system that is responsible for query processing.
Device database system
  Definition --- A database system that enables distributed query processing over a device network.

Two Approaches --- warehousing
Advantages of the warehousing approach
  It is well suited for aggregate queries over historical data.
Disadvantages of the warehousing approach
  It dissociates access to the devices from the query workload.
  It uses valuable resources to transfer large amounts of raw data from the devices to the database server.

Two Approaches --- Device database system
Device database system
  Device & Function
  User representation
  Internal representation
  Queries

Device & Function
Device
  Each device is a mini-server that supports a set of functions and can process portions of the queries directly at the device.
Function
  A function either
    a) acquires, stores and processes data, or
    b) triggers an action in the physical world.
Synchronous function
  It returns its result immediately, on demand.
  It is used to monitor continuous phenomena, for example, a function that returns the rainfall level.
Asynchronous function
  It returns its result after an arbitrary period of time.
  It is used to monitor threshold events, for example, a function that detects an abnormal rainfall level.
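The difference between synchronous and asynchronous device functions can be made concrete with a small simulation. This is a minimal sketch, not the paper's interface: the class `RainfallSensor` and its methods `get_rainfall_level`, `advance` and `detect_abnormal_rainfall` are hypothetical, and "waiting an arbitrary period of time" is modeled by a Python generator that steps through pre-recorded readings until one crosses the threshold.

```python
from typing import Iterator, List, Sequence, Tuple


class RainfallSensor:
    """Hypothetical device wrapping a pre-recorded sequence of rainfall readings (mm).

    Each call to advance() simulates one sensing interval.
    """

    def __init__(self, sensor_id: str, readings: Sequence[float]) -> None:
        if not readings:
            raise ValueError("need at least one reading")
        self.sensor_id = sensor_id
        self._readings: List[float] = list(readings)
        self._t = 0  # index of the current sensing interval

    def get_rainfall_level(self) -> float:
        """Synchronous function: returns the current level immediately, on demand."""
        return self._readings[self._t]

    def advance(self) -> bool:
        """Move to the next sensing interval; returns False once readings run out."""
        if self._t + 1 < len(self._readings):
            self._t += 1
            return True
        return False

    def detect_abnormal_rainfall(self, threshold: float) -> Iterator[Tuple[int, float]]:
        """Asynchronous function, modeled as a generator: each next() returns only
        when a reading exceeds the threshold, after an arbitrary number of intervals."""
        while True:
            level = self.get_rainfall_level()
            if level > threshold:
                yield self._t, level
            if not self.advance():
                return


sensor = RainfallSensor("s1", [10.0, 20.0, 60.0, 30.0, 80.0])
print(sensor.get_rainfall_level())                  # 10.0 (answered at once)
print(list(sensor.detect_abnormal_rainfall(50.0)))  # [(2, 60.0), (4, 80.0)]
```

The synchronous call suits continuous phenomena because the caller decides when to sample; the asynchronous one suits threshold events because the device decides when there is something to report.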
User representation
Devices are represented as ADTs (Abstract Data Type objects)
  ADT objects are single attribute values encapsulating a collection of related data.
  ADT objects provide controlled access to the encapsulated data through a well-defined interface.
  An example: RFSensors(Sensor, X, Y), where Sensor provides Sensor.getRainfallLevel()

Internal representation
Device functions are represented as virtual relations
Virtual relation
  It is a tabular representation of a function.
  A record in it contains the input arguments and the output argument of the function it is associated with.
  Arguments of device function    Attributes of virtual relation
  Device ADT                      ID
  a1 …… aM                        a1 …… aM
                                  Output value
                                  Time stamp
Properties of a virtual relation
  It is append-only.
  It is naturally partitioned across all devices represented by the same device ADT.

Queries
Historical and snapshot queries are naturally formulated as declarative queries in SQL.
An example of a long-running query:
  SELECT R.Sensor.getRainfallLevel()
  FROM RFSensors R
  WHERE R.Sensor.getRainfallLevel() > 50 AND $every(30)
The function $every(30) specifies that a new record is inserted every 30 seconds into the append-only virtual relation corresponding to the function RFSensor.getRainfallLevel().

Query Processing over Device Database System
Performance Metrics
Traditional performance metrics
  Throughput --- average number of queries processed per unit of time.
  Response time --- time needed by the system to produce all answer records to a query.
New performance metrics
  Resource usage --- the total amount of energy consumed by the devices when executing a query.
  Reaction time --- the interval between the time a function called on the devices returns a value and the time the corresponding answer is produced on the front-end.

Distributed Query Execution Plans
Query --- Retrieve every 30 seconds the rainfall level if it is greater than 50 mm.
  SELECT VR.value
  FROM VRFSensorsGetRainfallLevel VR, RFSensors R
  WHERE VR.Sensor = R.Sensor AND VR.value > 50 AND $every(30)

Plan T
  Data extracted from the devices are materialized in the relation VR, which is located on the front-end.
  Relation R is joined with relation VR (join condition VR.Sensor = R.Sensor AND VR.value > 50).
  Both R and VR are on the front-end, and the join is executed on the front-end.

Plan A
  A simple tree where R is joined on the front-end with relation VR, which is partitioned across a set of devices.
  The front-end asks each device to measure the rainfall level and to transfer the resulting virtual records back to the front-end.
  Each virtual record arriving at the front-end is then joined with relation R.
  Disadvantage --- All devices with rainfall sensors transmit data to the front-end, while the query only concerns the sensors that measure a rainfall level greater than 50.

Plan B
  Defines a semi-join between R and the partitions of VR located on the devices.
  The semi-join projects out the joining attribute from R (here the device ID Sensor) and sends it to all devices.
  On the devices, whenever the rainfall level is measured, a virtual record is generated and joined with the portion of relation R sent by the front-end (joining condition R.Sensor = VR.Sensor AND VR.value > 50).
  If the joining condition is satisfied, the virtual record is sent back to the front-end to be joined with the complete records of relation R.

Plan C
  Only pushes the selection (VR.value > 50) onto the devices.
  Only records that satisfy the condition are sent back to the front-end, where they are joined with relation R.
  Compared to Plan B, no subset of relation R is transmitted to the devices.

Resource usage for sensors located outside a flood area
  With Plan A, data is sent back to the front-end whenever it is generated.
  With Plan B, the semi-join pays the initial cost of transferring a fragment of relation R to the devices. This initial cost is amortized (compared to Plan A) during the lifespan of the long-running query. Also, no data is sent back because the sensors are located outside of the flood.
  With Plan C, the selection is pushed to the device. The condition on the rainfall level is checked on the device, and no data is sent back because the sensors are located outside of the flood.

Resource usage for sensors located inside a flood area
  With all plans, data is always sent back to the front-end.
  The initial cost of Plan B is never amortized here, so the curve of Plan B rises rapidly as time increases.
  Question: Why do Plan C and Plan A have almost similar curves?
  Because the cost of performing a selection is low compared to the cost of sending data.

Conclusion of Plans
  Pushing a selection, as in Plan C, is optimal. This is intuitive, since the query filters out uninteresting events generated on the devices.
  Pushing the selection allows the device database system to efficiently trade increased processing on the devices for reduced communication.

Discussions
  I love the idea of using virtual relations to represent device functions.
  The complete query semantics over a device database are not given here.
  There is no avi value for each data item, so it is not really real-time based.
  Individual nodes are not important, and a mote's sensor may get damaged and report wrong values. So group-based coordination should be introduced.
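The communication trade-off between Plans A, B and C can be illustrated with a toy cost model. This is a sketch under stated assumptions, not the paper's experimental setup: each transmitted virtual record or Sensor ID costs one unit, on-device processing energy is ignored, Plan T is not modeled, and all names (`make_partitions`, `plan_a`, `plan_b`, `plan_c`, the sample readings) are hypothetical. Each device holds its own append-only partition of VR, with one record per `$every(30)` period.

```python
from typing import Dict, List, Tuple

VirtualRecord = Tuple[str, float, int]                 # (Sensor, value, timestamp in s)
Answer = Tuple[str, float, int, Tuple[float, float]]   # record joined with R's (X, Y)
Partitions = Dict[str, List[VirtualRecord]]            # device ID -> its VR partition
Relation = Dict[str, Tuple[float, float]]              # R: Sensor -> (X, Y)


def make_partitions(readings: Dict[str, List[float]], every: int = 30) -> Partitions:
    """Each device appends one virtual record per sampling period ($every(30))."""
    return {s: [(s, v, i * every) for i, v in enumerate(vals)]
            for s, vals in readings.items()}


def front_end_join(shipped: List[VirtualRecord], R: Relation) -> List[Answer]:
    """VR.Sensor = R.Sensor AND VR.value > 50, evaluated on the front-end."""
    return [(s, v, t, R[s]) for (s, v, t) in shipped if s in R and v > 50]


def plan_a(parts: Partitions, R: Relation) -> Tuple[List[Answer], int]:
    # Every virtual record travels to the front-end.
    shipped = [rec for part in parts.values() for rec in part]
    return front_end_join(shipped, R), len(shipped)


def plan_b(parts: Partitions, R: Relation) -> Tuple[List[Answer], int]:
    # Semi-join: project Sensor out of R and send the IDs to every device.
    ids = set(R)
    initial_cost = len(ids) * len(parts)
    # Devices evaluate R.Sensor = VR.Sensor AND VR.value > 50 locally.
    shipped = [rec for part in parts.values() for rec in part
               if rec[0] in ids and rec[1] > 50]
    return front_end_join(shipped, R), initial_cost + len(shipped)


def plan_c(parts: Partitions, R: Relation) -> Tuple[List[Answer], int]:
    # Only the selection VR.value > 50 is pushed onto the devices.
    shipped = [rec for part in parts.values() for rec in part if rec[1] > 50]
    return front_end_join(shipped, R), len(shipped)


R = {"s1": (0.0, 0.0), "s2": (1.0, 0.0), "s3": (2.0, 0.0)}
# s1 and s3 are outside the flood area; s2 is inside it.
parts = make_partitions({"s1": [10.0, 20.0, 30.0],
                         "s2": [60.0, 70.0, 80.0],
                         "s3": [5.0, 5.0, 5.0]})
for name, plan in [("A", plan_a), ("B", plan_b), ("C", plan_c)]:
    answers, cost = plan(parts, R)
    print(name, cost, len(answers))  # A 9 3, then B 12 3, then C 3 3
```

All three plans return the same answers; only the transmitted volume differs. For a sensor that stays outside the flood area, Plan A's cost grows with every sampling period, while Plan B pays only its fixed initial cost and Plan C sends nothing, which matches the qualitative ranking on the slides.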
Event Detection Service

Motivation
  Sensor networks are data-centric and real-time based -- an abstraction of real-time data semantics is needed.
  Individual nodes in sensor networks are unreliable -- group-based robust coordination is needed.
  Detection of some events relies on more than one type of sensor data -- the relationship among them can help to increase the reliability of data decisions.

Data Services in sensor networks
  Queries (location, frequency, duration)
  Data/Event dissemination
  Data Aggregation
  Data-centric Storage/Caching
  Event Detection
  Data Security and Access Authorization

Data Service Middleware (DSWare)
[Figure: DSWare layers. Applications; Services in Data Service Middleware (database-like abstraction, real-time scheduling, event detection, group management, data storage, caching, subscription, aggregation, authorization); Sensor nodes]
Compare Data Storage and Caching
  Data Storage: map the key to a logical node; map a logical node to multiple physical nodes; static copies, providing reliability.
  Caching: spread copies along the routing path; variable copies, improving performance.

Problems with current event detection schemes
  An external node collects reports of atomic events and determines whether the compound event occurs.
    This reduces possible in-network processing and increases unnecessary concentrated traffic around the decision node.
    It increases detection delay (unacceptable for some time-critical applications).
  [Figure: explosion; atomic event reports; an external node determines the occurrence of compound events]

Event Detection Service in DSWare
  Event: an application-interested activity in the environment that can be monitored or detected.
  Hierarchy of events
    Atomic event: detected through a single sensor's observation, e.g. high temperature, light intensity change, acoustic change.
    Compound event: consists of a set of atomic events and is detected based on the detection of the atomic events it consists of, e.g.
    Explosion: high temperature, light intensity change and acoustic changes detected in the area.

Event Detection Scheme in DSWare
Confidence
  Every compound event detection report has a confidence value, which indicates the reliability of the report.
  The confidence function is designed based on data semantics:
    relative importance of different atomic sub-events
    temporal continuity of events
    statistical models
    similarity among adjacent regions
Waiting Time Window (TW)
  The time that an aggregation node waits for the arrival of all possible atomic event reports.
  When the TW times out, a compound event is reported if the confidence value reaches the minimum confidence requirement of this event.
  This avoids endless waiting caused by message loss and enables event detection based on partially collected information.

A Simple Example: Explosion (E)
  Sub-events: high temperature (T), special light (L), acoustic changes (A)
  Confidence function: f = [0.6 * BOOL(T) + 0.3 * BOOL(L) + 0.3 * BOOL(A)] * h
    (h: history factor, which increases if the explosion event has been detected in the previous waiting time window; assume 1 ≤ h ≤ 2)
  Minimum confidence: 0.8
  [Figure: timeline of atomic reports (T, L, A) over successive time windows, annotated with confidence values from f = 0.3h to f = 1.2h and with the labels "Report E", "No reports", "Lost Group Leader" and "Shift time window"]

Some other issues in event detection
  Temporal resolution -- Some events last much longer than the sensing interval of a sensor, so some applications may report a single event repeatedly, which is unnecessary.
  Spatial resolution -- If the size of a detection group is too small compared to the event, several groups within the event's coverage may report the same event.

Performance in Reduction of Communication
  Baseline:
    Only one report of an environment property is generated from a group during each sensing interval.
    All reports are sent to an outside node, and the entire analysis is done there.
  DSWare has less communication.
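The explosion example combines the confidence function with the waiting time window. The sketch below evaluates f for each window when it times out, using the weights (0.6, 0.3, 0.3), the bound 1 ≤ h ≤ 2 and the minimum confidence 0.8 from the slide. Two parts are assumptions, labeled in the code as well: the windows are consecutive, non-overlapping buckets of length `tw` (the slide's figure shows shifting windows), and the update rule for h (add 0.5 after a detection, reset to 1 otherwise) is hypothetical, because the slide only says that h increases after a detection. The function names are illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple

WEIGHTS: Dict[str, float] = {"T": 0.6, "L": 0.3, "A": 0.3}  # from the explosion example
MIN_CONFIDENCE = 0.8
H_MIN, H_MAX = 1.0, 2.0


def confidence(window: Set[str], h: float) -> float:
    """f = [0.6*BOOL(T) + 0.3*BOOL(L) + 0.3*BOOL(A)] * h"""
    return sum(w for event, w in WEIGHTS.items() if event in window) * h


def detect_explosions(reports: Iterable[Tuple[float, str]], tw: float,
                      n_windows: int) -> List[Tuple[float, bool]]:
    """Bucket atomic reports (timestamp, sub-event) into consecutive windows of
    length tw and decide, when each window times out, whether to report E."""
    windows: List[Set[str]] = [set() for _ in range(n_windows)]
    for t, event in reports:
        idx = int(t // tw)
        if 0 <= idx < n_windows:  # reports outside the observed windows are ignored
            windows[idx].add(event)

    h = H_MIN
    decisions: List[Tuple[float, bool]] = []
    for window in windows:  # each iteration = the moment the TW times out
        f = confidence(window, h)
        report = f >= MIN_CONFIDENCE  # decide on whatever partial information arrived
        decisions.append((f, report))
        # Hypothetical history rule: raise h by 0.5 after a detection, reset otherwise.
        h = min(H_MAX, h + 0.5) if report else H_MIN
    return decisions


reports = [(1.0, "T"), (3.0, "L"), (11.0, "L"), (14.0, "A"), (25.0, "T")]
for f, report in detect_explosions(reports, tw=10.0, n_windows=4):
    print(round(f, 2), report)
```

In the second window only L and A arrive, giving 0.6 before the history factor; under the assumed rule h is 1.5 at that point, so f = 0.9 and E is still reported. This shows how the history factor lets the aggregation node decide on partial information when earlier windows already indicated an explosion.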
Performance in Differentiating Events and Event-like Factors
  How can repeated reports of an event be differentiated from event-like factors?
  How does performance vary with different time window sizes and different minimum confidence values?

Discussions
  The idea of the event detection service is well developed and thoroughly discussed.
  In DSWare, data is replicated in multiple physical nodes that are mapped to a single logical node, so consistency among these nodes is a key issue. The paper mentions "weak consistency", but what is the definition of "weak consistency" in a sensor network?
  Since multiple physical nodes are mapped to a single logical node, why is data caching needed? What are the different purposes of introducing both?
  The paper mentions that applications can specify the actual scheduling scheme in the sensor network based on their most important concerns. But is it a good idea for applications to do that? It does not seem to be a simple task.

Discussions --- (cont.)
  What is the position of real-time scheduling in the system? How is real-time behavior provided?
  Two questions about Fig 5:
    How can repeated reports of an event be differentiated from event-like factors?
    How does performance vary with different time window sizes and different minimum confidence values?
  A small typo: in the last sentence before Section 5.1, "an explosion event will be reported if the Confidence_E is not less than 0.9" should be "an explosion event will be reported if the Confidence_E is no less than 0.9".