CS 862 Presentation
Querying the Physical World
------ Cornell University
Event Detection Services Using Data Service
Middleware in Distributed Sensor Networks
------
University of Virginia
Presented By Gary Zhou @ UVA
1
Comparison between these two papers
Querying the Physical World
 Provides a database-like abstraction to applications
 No absolute validity interval (avi) is attached to each data item, so it is not truly real-time
 Concentrates on individual motes
 Point of special interest: representing device functions
Event Detection Service
 Provides a database-like abstraction to applications
 An avi value is attached to each data item, so it is truly real-time
 Group-based robust coordination
 Point of special interest: provides an event detection service
2
Outline --- Querying the Physical World
Device Networks & Their Query Processing
 Description of Device Networks
 Three kinds of queries
 Two approaches
Device Database System
 Device & Function
 User representation
 Internal representation
 Queries
Query Processing over Device Database System
 Performance Metrics
 Distributed Query Execution Plans
 Experiments
Discussions
3
Outline --- Event Detection Service
Motivation
Data services in sensor networks
Data Service Middleware (DSWare)
Focus on the Event Detection Service
Experiments and performance
Discussions
4
Device Networks & Their Query Processing
Description of Device Network
The widespread deployment of sensors, actuators and mobile devices is
transforming the physical world into a computing platform.
Emerging networking techniques ensure that devices are
interconnected and accessible from local- or wide-area networks.
Using this new computing platform, users interact with portions of the
physical world.
5
Three kinds of Queries
Historical queries
These are typically aggregate queries over historical data obtained from
the device network.
An example --- For each rainfall sensor in 1800 JPA, display the
average level of rainfall for 1999.
Snapshot queries
These queries concern the device network at a given point in time.
An example --- Retrieve the current rainfall level for all sensors in
1800 JPA.
Long-running queries
These queries concern the device network over a time interval.
An example --- For the next 5 hours, retrieve the rainfall level every
30 seconds for all sensors in 1800 JPA.
6
Two Approaches
The warehousing approach
Definition --- In this approach, data are extracted from the
devices in a predefined way and stored in a centralized database
system that is responsible for query processing.
Device database system
Definition --- A database system that enables distributed query
processing over a device network.
7
Two Approaches --- warehousing
Advantages of warehousing approach
It is well suited to aggregate queries over historical data.
Disadvantages of warehousing approach
It dissociates access to the devices from the query workload.
It uses valuable resources to transfer large amounts of raw data
from the devices to the database server.
8
Two Approaches --- Device database system
Device database system
Device & Function
User representation
Internal representation
Queries
9
Device & Function
Device
Each device is a mini-server that supports a set of functions and can
process portions of the queries directly at the device.
Function
A function either
a) Acquires, stores and processes data, or
b) Triggers an action in the physical world
Synchronous function
 It returns its result immediately, on demand.
 It is used to monitor continuous phenomena, for example, a
function that returns the rainfall level.
Asynchronous function
 It returns its result after an arbitrary period of time.
 It is used to monitor threshold events, for example, a function that
detects an abnormal rainfall level.
10
User representation
Devices are represented as ADTs
Abstract Data Type (ADT) objects
ADT objects are objects that are single attribute values encapsulating a
collection of related data. ADT objects provide controlled access to
encapsulated data through a well-defined interface.
An example: RFSensors (Sensor,X,Y) provides
Sensor.getRainfallLevel()
11
Internal representation
Device functions are represented as virtual relations
Virtual relation
It is a tabular representation of a function. A record in it contains the
input arguments and the output argument of the function it is
associated with.
Arguments of device function: a1, ……, aM
Attributes of virtual relation: Device ADT ID, a1, ……, aM, Output value, Time stamp
Properties of Virtual relation
It is append-only
It is naturally partitioned across all devices represented by the same
device ADT
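The virtual relation described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the class names VirtualRecord and VirtualRelation and the method names append, partition and scan are assumptions made for this example. It shows the two properties listed on the slide: records are only ever appended, and the relation is partitioned by device.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class VirtualRecord:
    """One row: device ID, input arguments, output value, time stamp."""
    device_id: str
    args: Tuple
    value: float
    timestamp: float


class VirtualRelation:
    """Hypothetical append-only table for a single device function."""

    def __init__(self, function_name: str) -> None:
        self.function_name = function_name
        # One partition per device; records are never updated or deleted.
        self._partitions: Dict[str, List[VirtualRecord]] = {}

    def append(self, device_id: str, args: Tuple, value: float,
               timestamp: float) -> VirtualRecord:
        record = VirtualRecord(device_id, tuple(args), value, timestamp)
        self._partitions.setdefault(device_id, []).append(record)
        return record

    def partition(self, device_id: str) -> Tuple[VirtualRecord, ...]:
        """Read-only view of the records produced by one device."""
        return tuple(self._partitions.get(device_id, ()))

    def scan(self) -> Iterator[VirtualRecord]:
        """Iterate over all partitions, device by device."""
        for records in self._partitions.values():
            yield from records


vr = VirtualRelation("getRainfallLevel")
vr.append("s1", (), 42.0, 0.0)
vr.append("s2", (), 60.0, 0.0)
vr.append("s1", (), 55.0, 30.0)
```

Exposing only append and read methods is one simple way to keep the relation append-only in this sketch.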
12
Queries
Historical queries
Snapshot queries
 Both are naturally formulated as declarative queries in SQL
An example of long-running query
SELECT R.Sensor.getRainfallLevel()
FROM RFSensors R
WHERE R.Sensor.getRainfallLevel() > 50
AND $every(30)
The function $every(30) specifies that a new record is
inserted every 30 seconds into the append-only
virtual relation corresponding to the function
RFSensor.getRainfallLevel().
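The effect of $every(30) can be imitated with a simulated clock. The sketch below is only an illustration under stated assumptions: the function name run_long_query, the sensor callables and the choice to take the first reading at time 0 are all invented for this example, and the real system pushes records into the virtual relation rather than polling in a loop.

```python
from typing import Callable, Dict, List, Tuple


def run_long_query(sensors: Dict[str, Callable[[float], float]],
                   period_s: float,
                   duration_s: float,
                   threshold: float) -> List[Tuple[float, str, float]]:
    """Simulate a long-running query over a set of rainfall sensors.

    Every `period_s` seconds of simulated time, each sensor produces one
    new virtual record; records above `threshold` become query answers.
    """
    answers = []
    t = 0.0
    while t < duration_s:
        for name, read_level in sensors.items():
            level = read_level(t)          # new virtual record at time t
            if level > threshold:          # WHERE ... > threshold
                answers.append((t, name, level))
        t += period_s                      # $every(period_s)
    return answers


# Two hypothetical sensors: one with rising rainfall, one that stays dry.
sensors = {"s1": lambda t: 40 + t, "s2": lambda t: 10}
result = run_long_query(sensors, period_s=30, duration_s=90, threshold=50)
```

With these inputs, s1 exceeds the threshold at the second and third readings, and s2 never does.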
13
Query Processing over Device Database System
Performance Metrics
Traditional performance metrics
 Throughput --- average number of queries processed per unit of time
 Response time --- time needed by the system to produce all answer
records to a query.
New performance metrics
 Resource Usage --- The total amount of energy consumed by the
devices when executing a query.
 Reaction Time --- The interval between the time a function, called
on devices, returns the value and the time the corresponding
answer is produced on the front-end.
14
Distributed Query Execution Plans
Query --- Retrieve every 30 seconds the rainfall level if it is
greater than 50 mm.
SELECT VR.value
FROM VRFSensorsGetRainfallLevel VR, RFSensors R
WHERE VR.Sensor = R.Sensor AND VR.value > 50
AND $every(30)
15
Plan T

Data extracted from the devices are materialized in the relation VR that is
located on the front-end.

Join relation R and relation VR (using join condition VR.Sensor = R.Sensor
AND VR.value > 50)

Both R and VR are on the front-end, and the join is executed on the front-end.
16
Plan A

It is a simple tree where R is joined on the front-end with relation VR
partitioned across a set of devices.

The front-end asks each device to measure the rainfall level and to transfer
the resulting virtual records back to the front-end.

Each virtual record arriving on the front-end is then joined with relation R.

Disadvantage --- All devices with rainfall sensors transmit data to the
front-end, while the query only concerns the sensors that measure a
rainfall level greater than 50.
17
Plan B

Define a semi-join between R and the partitions of VR located on the
devices. The semi-join projects out the joining attribute from R (here the
device ID Sensor) and sends it to all devices.

On the devices, whenever the rainfall level is measured, a virtual record is
generated and joined with the portion of relation R sent by the front-end
(using joining condition R.Sensor = VR.Sensor and VR.value > 50)

If the joining condition is satisfied, the virtual record is sent back to the front-end to be joined with the complete records from relation R.
18
Plan C

It only pushes the selection (VR.value > 50) onto the devices. Only records
that satisfy the condition are sent back to the front-end, where they are
joined with relation R.

Compared to Plan B, there is no subset of relation R transmitted to the
devices.
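The difference between Plans A, B and C can be made concrete by counting messages in a toy model. The sketch below assumes that every transfer costs one message, that every device appears in relation R, and that Plan B sends one initial message per device; the function name messages_per_plan and the sample readings are invented for illustration and are not taken from the paper's experiments.

```python
from typing import Dict, List


def messages_per_plan(readings: Dict[str, List[float]],
                      threshold: float) -> Dict[str, int]:
    """Count messages exchanged under each plan in a simplified model.

    readings maps a device ID to the levels it measures, one per round.
    """
    n_devices = len(readings)
    total = sum(len(levels) for levels in readings.values())
    matching = sum(1 for levels in readings.values()
                   for level in levels if level > threshold)
    return {
        # Plan A: every measurement is shipped to the front-end.
        "A": total,
        # Plan B: one initial message per device carrying the fragment
        # of R, then only records that pass the join condition.
        "B": n_devices + matching,
        # Plan C: only records that pass the pushed-down selection.
        "C": matching,
    }


readings = {"dry": [10, 12, 11], "flood": [60, 70, 80]}
costs = messages_per_plan(readings, threshold=50)
```

In this model Plan C never sends more than Plan B, and Plan B's initial cost only pays off against Plan A when enough readings are filtered out on the devices.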
19
Resource usage for sensors located
outside a flood area
With Plan A, data is sent back to the front-end whenever it is generated.
With Plan B (semi-join), no data is sent back either, because the sensors are
located outside the flood.
Plan B pays the initial cost of transferring a fragment of relation R to the
devices. This initial cost is amortized (compared to Plan A) during the
lifespan of the long-running query.
With Plan C, the selection is pushed to the device. The condition on the
rainfall level is checked on the device and no data is sent back, because the
sensors are outside the flood.
20
Resource usage for sensors located
inside a flood area
With all plans, data is always sent back to the front-end.
The initial cost of Plan B is never amortized here, so the curve for Plan B
rises rapidly over time.
Question: Why do Plan C and Plan A have almost the same curves?
Because the cost of performing a selection is low compared to the cost of
sending data.
21
Conclusion of Plans
 Pushing a selection, as in Plan C, is optimal. This is
intuitive since the query filters out uninteresting events generated
on the devices.
 Pushing the selection allows the device database system to
efficiently trade increased processing on the devices for reduced
communication.
22
Discussions
I love the idea of using virtual relations to represent device
functions
The complete query semantics over a Device Database are not
given here.
No absolute validity interval (avi) is attached to each data item, so it is not truly real-time.
Individual nodes are not important, and a mote's sensor may get
damaged and report wrong values. So group-based coordination
should be introduced.
23
Event Detection Service
24
Motivation
Sensor networks are data-centric and real-time
– Abstraction of real-time data semantics needed
Individual nodes in sensor networks are unreliable
-- Group-based robust coordination needed
Detection of some events relies on more than one type of
sensor data
-- The relationship can help to increase the reliability of
data decisions
25
Data Services in sensor networks
Queries (location, frequency, duration)
Data/Event dissemination
Data Aggregation
Data-centric Storage/Caching
Event Detection
Data Security and Access Authorization
26
Data Service Middleware (DSWare)
Layers (top to bottom): Application, DSWare, Sensor nodes
Services in Data Service Middleware
 Database-like abstraction
 Real-time Scheduling
 Event Detection
 Group Management
 Data Storage
 Caching
 Subscription
 Aggregation
 Authorization
Data Storage
 Map the key to a logical node
 Map a logical node to multiple physical nodes
 Static copies that provide reliability
Caching
 Spread copies along the routing path
 Variable copies that improve performance
Compare?
27
Problems with current event detection schemes
An external node collects reports of atomic events and
determines whether the compound event occurs
 Reduces possible in-network processing and increases
unnecessarily concentrated traffic around the decision node
 Increases detection delay (unacceptable for some time-critical
applications)
[Figure: atomic event reports for an explosion are sent to an external node, which determines the occurrence of compound events]
28
Event Detection Service in DSWare
Event: an activity in the environment that is of interest to the application
and can be monitored or detected
Hierarchy of events
 Atomic event:
detected through a single sensor's observation
e.g. high temperature, light intensity change, acoustic change
 Compound event:
consists of a set of atomic events
detected based on the detection of its constituent atomic events
e.g. Explosion
Explosion: detected in the area through high temperature, light intensity
change, and acoustic changes
29
Event Detection Scheme in DSWare
Confidence
 Every compound event detection report has a confidence value,
which indicates the reliability of the report
 The confidence function is designed based on data semantics
Relative importance of different atomic sub-events
Temporal continuity of events
Statistical models
Similarity among adjacent regions
Waiting Time Window
 The time that an aggregation node waits for the arrival of all
possible atomic event reports
 When the time window (TW) expires, a compound event is reported if the
confidence value reaches the minimum confidence required for this event
 Avoids waiting endlessly for lost messages
 Enables event detection based on the partial information collected
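The waiting time window can be sketched as a decision taken once the window closes, using whatever atomic reports arrived in time. This is a simplified illustration, not DSWare's code: the function name detect_compound, the (arrival_time, event_name) report format and the toy confidence function below are assumptions made for this example.

```python
from typing import Callable, Iterable, Set, Tuple


def detect_compound(reports: Iterable[Tuple[float, str]],
                    window_start: float,
                    window_len: float,
                    confidence_fn: Callable[[Set[str]], float],
                    min_confidence: float) -> Tuple[bool, float]:
    """Decide on a compound event when the waiting time window expires.

    Only reports that arrived inside [window_start, window_start + window_len)
    are used, so a lost or late report cannot block the decision.
    """
    window_end = window_start + window_len
    seen = {name for arrival, name in reports
            if window_start <= arrival < window_end}
    confidence = confidence_fn(seen)
    return confidence >= min_confidence, confidence


# Toy confidence: fraction of three expected sub-events that were seen.
def toy_confidence(seen: Set[str]) -> float:
    return len(seen) / 3


# The report for "A" arrives after the window has closed.
reports = [(1.0, "T"), (2.0, "L"), (12.0, "A")]
reported, confidence = detect_compound(reports, 0.0, 10.0, toy_confidence, 0.6)
```

Here the decision is made on partial information: two of the three sub-events are enough to reach the assumed minimum confidence of 0.6.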
30
A Simple Example: Explosion (E)
Sub-events: high temperature (T), special light (L), acoustic
changes (A)
Confidence function:
f = [0.6 * BOOL(T) + 0.3 * BOOL(L) + 0.3 * BOOL(A)] * h
(h: history factor, which increases if the explosion event was
detected in the previous waiting time window. Assume 1≤h≤2)
Minimum Confidence: 0.8
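The confidence function above is easy to evaluate by hand, and a short Python sketch makes the threshold behaviour explicit. The weights, the range of h and the minimum confidence come from the slide; the function names explosion_confidence and should_report and the use of a set of detected sub-events are assumptions made for this example.

```python
from typing import Set

# Weights and threshold as given on the slide.
WEIGHTS = {"T": 0.6, "L": 0.3, "A": 0.3}
MIN_CONFIDENCE = 0.8


def explosion_confidence(detected: Set[str], h: float = 1.0) -> float:
    """f = [0.6*BOOL(T) + 0.3*BOOL(L) + 0.3*BOOL(A)] * h, with 1 <= h <= 2."""
    if not 1.0 <= h <= 2.0:
        raise ValueError("history factor h is assumed to lie in [1, 2]")
    base = sum(weight for name, weight in WEIGHTS.items() if name in detected)
    return base * h


def should_report(detected: Set[str], h: float = 1.0) -> bool:
    """Report the explosion if f is no less than the minimum confidence."""
    return explosion_confidence(detected, h) >= MIN_CONFIDENCE


temperature_and_light = should_report({"T", "L"})        # f = 0.9h with h = 1
light_and_sound = should_report({"L", "A"})              # f = 0.6h with h = 1
light_and_sound_with_history = should_report({"L", "A"}, h=1.5)
```

With h = 1, high temperature plus one other sub-event is enough to report, while light and acoustic changes alone are not. A larger history factor can lift the same evidence over the threshold.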
[Figure: timeline of atomic event reports T, L and A arriving at the group leader, with one report lost and a shifting time window. Annotated confidence values range from f=0.3h to f=1.2h; E is reported at f=0.9h and f=1.2h.]
31
Some other issues in event detection
Temporal resolution
– Some events last much longer than the sensing interval of a
sensor. So probably some applications will report a single event
repetitively, which is unnecessary.
Spatial resolution
– If the size of a detection group is too small compared to the
event, there might be several groups in this event’s coverage
that will report the same event.
32
Performance in Reduction of Communication
Baseline:
– Only one report of an environment
property is generated from a group
during each sensing interval.
– Send all reports to an outside
node and the entire analysis will
be done there.
DSWare requires less communication.
33
Performance in Differentiating Events and Event-like Factors
How can repeated reports of an event be differentiated from event-like
factors?
How does performance change with different time window sizes and
different minimum confidence values?
34
Discussions
The idea of the event detection service is well developed and
thoroughly discussed.
In DSWare, data is replicated in multiple physical nodes that can be
mapped to a single logical node. So consistency among these
nodes is a key issue. In this paper, “weak consistency” is mentioned.
But what is the definition of “weak consistency” in a sensor network?
Since multiple physical nodes are mapped to a single logical
node, why is data caching needed? What are the different purposes of
introducing both of them?
It is mentioned that the application can specify the actual scheduling
scheme in the sensor network based on its most important
concerns. But is it a good idea for the application to do that? It does
not seem to be a simple task.
35
Discussions --- (cont.)
What is the position of real-time scheduling in the system? How is
real-time behavior provided?
Two questions about Fig 5.
 How can repeated reports of an event be differentiated from event-like
factors?
 How does performance change with different time window sizes and
different minimum confidence values?
A small typo:
 In the last sentence before 5.1, “an explosion event will be
reported if the Confidence_E is not less than 0.9” should be “an
explosion event will be reported if the Confidence_E is no less
than 0.9”
36