When processing is cheaper than transmitting
Daniel V Uhlig
Maryam Rahmaniheris

How to gather interesting data from thousands of Motes?
• Tens to thousands of motes
• Unreliable individually
To collect and analyze data
• Long term low energy deployment
• Can use processing power at each Mote
  ◦ Analyze locally before sharing data

Transmission of data is expensive compared to CPU cycles
• 1Kb transmitted 100 meters = 3 million CPU instructions
• AA-powered Mote can transmit 1 message per day for about two months (assuming no other power draws)
• Power density is growing very slowly compared to computation power, storage, etc.
Analyze and process locally, only transmitting what is required

Minimize communications
◦ Minimize broadcast/receive time
◦ Minimize message size
◦ Move computations to individual nodes
Nodes pass data in multi-hop fashion towards a root
Select connectivity so graph helps with processing
Handle faulty nodes within network

[Figure: tree diagram with numeric node values]
Max is very simple
What about Count?
◦ Need to avoid double counting due to redundant paths
What about spatial events?
◦ Need to evaluate readings across multiple sensors
◦ Correlation between events
Failures of nodes can lose branches of the tree

• Connectivity Graph – unstructured or how to structure
• Diffusion of requests and how to combine data
• Maintenance messages vs Query messages
• Reliability of results
• Load balancing
  – message traffic
  – storage
• Storage costs at different nodes

S.Madden, M.Franklin, J.Hellerstein, and W.Hong
Intel Research, 2002

• Aggregates values in low power, distributed network
• Implemented on TinyOS Motes
• SQL-like language to search for values or sets of values
  – Simple declarative language
• Energy savings
• Tree-based methodology
  – Root node generates requests and disseminates them down to the children

• Three functions to aggregate results
  – f (merge function)
    • Each node runs f to combine values
    • <z> = f(<x>, <y>)
    • EX: <SUM1+SUM2, COUNT1+COUNT2> = f(<SUM1, COUNT1>, <SUM2, COUNT2>)
  – i (initialize function)
    • Generates state record at lowest level of tree
    • EX: <SUM, COUNT>
  – e (evaluator function)
    • Root uses e to generate the final result
    • RESULT = e(<z>)
    • EX: SUM/COUNT
• Functions must be preloaded on Motes or distributed via software protocols

[Figure: Count via tree (Count = 10); Max via tree]

All searches have different properties that affect aggregate performance
• Duplicate insensitive – unaffected by double counting: (Max, Min) vs (Count, Average)
  – Restricts network properties
• Exemplary – return one value (Max/Min)
  – Sensitive to failure
• Summary – computation over values (Average)
  – Less sensitive to failure

• Distributive – Partial states are the same as final state (Max)
• Algebraic – Partial states are of fixed size but differ from final state (Average: Sum, Count)
• Holistic – Partial states contain all sub-records (Median)
  – Unique – similar to Holistic, but partial records may be smaller than holistic
• Content Sensitive – Size of partial records depends on content (Count Distinct)

Diffusion of requests and then collection of information
Epochs subdivided for each level to complete task
◦ Saves energy
◦ Limits rate of data flow

Snooping – Broadcast messages so others can hear messages
◦ Rejoin tree if parent fails
◦ Listen to other broadcasts and only broadcast if its values are needed
  In case of MAX, do not broadcast if a peer has transmitted a higher value
Hypothesis testing – root guesses at value to minimize traffic

Theoretical results for 2500 nodes
◦ Savings depend on function
◦ Duplicate insensitive, summary best
◦ Distributive helps
◦ Holistic is the worst

• 16 Mote network
• Count number of motes in 4 sec epochs
• No optimizations
• Quality of count is due to less radio contention in TAG
• Centralized used 4685 messages vs TAG's 2330
• 50% reduction, but less than theoretical results
  – Different loss model, node placement

• Loss of nodes and subtrees
  – Maintenance for structured connectivity
• Single message per node per epoch
  – Message size might increase at higher level nodes
  – Root gets overloaded (Does it always matter?)
• Epochs give a method for idling nodes
  – Snooping not included, timing issues

S.Nath, P.Gibbons, S.Seshan, Z.Anderson
Microsoft Research, 2008

TAG
◦ Not robust against node or link failure
◦ A single node failure leads to loss of the entire sub-branch's data
Synopsis Diffusion
◦ Exploiting the broadcast nature of the wireless medium to enhance reliability
◦ Separating routing from aggregation
◦ The final aggregated data at the sink is independent of the underlying routing topology
◦ Synopsis diffusion can be used on top of any routing structure
◦ The order of evaluations and the number of times each datum is included in the result are irrelevant

[Figure: tree-based Count (Count = 10)]
Not robust against node or link failure

Multi-path routing
◦ Benefits
  Robust
  Energy-efficient
◦ Challenges
  Duplicate sensitivity
  Order sensitivity
[Figure: Count computed over a multi-path topology]

A novel aggregation framework
◦ ODI (order- and duplicate-insensitive) synopsis: small-sized digest of the partial results
  Bit-vectors
  Sample
  Histogram
Better aggregation topologies
◦ Multi-path routing
◦ Implicit acknowledgment
◦ Adaptive rings
Example aggregates
Performance evaluation

SG: Synopsis Generation
SF: Synopsis Fusion
SE: Synopsis Evaluation
The exact definition of these functions depends on the particular aggregation function:
◦ SG(.) takes a sensor reading and generates a synopsis
◦ SF(.,.) takes two synopses and generates a new one
◦ SE(.) translates a synopsis into the final answer

Distribution phase
◦ The aggregate query is flooded
◦ The aggregate topology is constructed
Aggregation phase
◦ Aggregated values are routed toward the sink
◦ SG() and SF() functions are used to create partial results

The sink is in R0
A node is in Ri if it is i hops away from the sink
Nodes in Ri-1 should hear the broadcast by nodes in Ri
Loose synchronization between nodes in different rings
Each node transmits only once
◦ Energy cost same as tree
[Figure: rings R0 to R3 around the sink, with example nodes A, B, C]

Coin tossing experiment CT(x) used in Flajolet and Martin's algorithm:
◦ For i = 1, …, x-1: CT(x) = i with probability $2^{-i}$
◦ Simulates the behavior of the exponential hash function
◦ Synopsis: a bit vector of length k > log(n), where n is an upper bound on the number of sensor nodes in the network
◦ SG(): a bit vector of length k with only the CT(k)th bit set
◦ SF(): bitwise Boolean OR
◦ SE(): if i is the index of the lowest-order 0 in the bit vector, output $2^{i-1} / 0.77$ (0.77 is the "magic constant")

The number of live sensor nodes, N, is proportional to $2^{i-1}$
Intuition: the probability of N nodes all failing to set the ith bit is $(1 - 2^{-i})^N$, which is approximately 0.37 when $N = 2^i$ and even smaller for larger N.
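
The Count synopsis above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `coin_toss`, `sg`, `sf`, `se` and the vector length `K = 16` are chosen here for the example, the bit vector is stored as a Python integer with bit 1 as the lowest-order bit, and `coin_toss` returns x with the leftover probability when no earlier toss succeeds (a detail the slide does not spell out).

```python
import random

K = 16  # assumed synopsis length in bits; must exceed log2 of the node-count bound n


def coin_toss(x, rng):
    """CT(x): return i in 1..x-1 with probability 2^-i, otherwise x."""
    for i in range(1, x):
        if rng.random() < 0.5:
            return i
    return x


def sg(rng, k=K):
    """Synopsis generation: a k-bit vector with only bit CT(k) set."""
    return 1 << (coin_toss(k, rng) - 1)


def sf(s1, s2):
    """Synopsis fusion: bitwise OR, so order and duplicates do not matter."""
    return s1 | s2


def se(s):
    """Synopsis evaluation: find the lowest-order 0 bit i, return 2^(i-1) / 0.77."""
    i = 1
    while s & (1 << (i - 1)):
        i += 1
    return 2 ** (i - 1) / 0.77


rng = random.Random(0)
synopses = [sg(rng) for _ in range(1000)]  # one synopsis per simulated node

once = 0
for s in synopses:
    once = sf(once, s)

# Fuse every synopsis twice, in reverse order, as redundant paths might.
with_duplicates = 0
for s in reversed(synopses + synopses):
    with_duplicates = sf(with_duplicates, s)

print(once == with_duplicates)  # True: the fused synopsis is unchanged
print(se(once))                 # rough estimate of the node count
```

A single bit vector gives only a coarse, power-of-two estimate, so the printed count should be read as an order-of-magnitude figure rather than an exact answer.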
[Figure: Count example with bit-vector synopses]

[Figure: an aggregation DAG over readings r1 to r5 built from SG and SF, and the corresponding canonical left-deep tree]

Theorem: Properties P1-P4 are necessary and sufficient for ODI-correctness
◦ P1: SG() preserves duplicates
  If two readings are considered duplicates, then the same synopsis is generated
◦ P2: SF() is commutative
  SF(s1, s2) = SF(s2, s1)
◦ P3: SF() is associative
  SF(s1, SF(s2, s3)) = SF(SF(s1, s2), s3)
◦ P4: SF() is same-synopsis idempotent
  SF(s, s) = s

Uniform Sample of Readings
◦ Synopsis: A sample of size K of <value, random number, sensor id> tuples
◦ SG(): Output the tuple <val_u, r_u, id_u>
◦ SF(s, s'): Output the K tuples in s ∪ s' with the K largest r_i
◦ SE(s): Output the set of values val_i in s
◦ Useful for holistic aggregation

Frequent Items (items occurring at least T times)
◦ Synopsis: A set of <val, weight> pairs; the values are unique and the weights are at least log(T)
◦ SG(): Compute CT(k) where k > log(n) and call this the weight; if it is at least log(T), output <val, weight>
◦ SF(s, s'): For each distinct value, discard all but the pair <value, weight> with maximum weight. Output the remaining pairs.
◦ SE(s): Output <value, $2^{weight}$> for each <val, weight> pair in s as a frequent value and its approximate count
◦ Intuition: A value occurring at least T times is expected to have at least one of its calls to CT() return at least log(T) (p = 1/T)

Communication error
◦ 1 - (percent contributing)
◦ h: height of DAG
◦ k: the number of neighbors each node has
◦ p: probability of loss
◦ The overall communication error upper bound: $1 - (1 - p^k)^h$
◦ If p = 0.1 and h = 10, then the error is negligible with k = 3 (the bound evaluates to about 1%)
Approximation error
◦ Introduced by SG(), SF(), and SE() functions
◦ Theorem 2: any approximation error guarantee provided for the centralized data stream scenario immediately applies to a synopsis diffusion algorithm, as long as the data stream synopsis is ODI-correct.

Implicit acknowledgement provided by ODI synopses
◦ Retransmission
  High energy cost and delay
◦ Adapting the topology
  When the number of times a node's transmission is included in its parents' transmissions is below a threshold
  Assigning the node to a ring that can have a good number of parents
  Assign a node in ring i, with probability p, to:
    Ring i+1 if $n_i > n_{i-1}$, $n_{i+1} > n_{i-1}$ and $n_{i+2} > n_i$
    Ring i-1 if $n_{i-2} > n_{i-1}$, $n_{i-1} < n_{i+1}$ and $n_{i-2} > n_i$

[Figures: Rings; Adaptive Rings]

The algorithms are implemented in the TAG simulator
600 sensors deployed randomly in a 20 ft × 20 ft grid
The query node is in the center
Loss probabilities are assigned based on the distance between nodes

[Figures: RMS Error; % Value Included]

Pros
◦ High reliability and robustness
◦ More accurate answers
◦ Implicit acknowledgment
◦ Dynamic topology adaptation
◦ Moderately affected by mobility
Cons
◦ Approximation error
◦ Low node density decreases the benefits
◦ The fusion functions should be defined for each aggregation function
◦ Increased message size

Is there any benefit in coupling routing with aggregation?
◦ Choosing the paths and finding the optimal aggregation points
◦ Routing the sensed data along a longer path to maximize aggregation
◦ Finding the optimal routing structure
  Considering energy cost of links
  NP-Complete
  Heuristics (Greedy Incremental)
Considering data correlation in the aggregation process
◦ Spatial
◦ Temporal
Defining a threshold
  TiNA

Could energy saving gained by aggregation be outweighed by the cost of it?
◦ Aggregation function cost
  Storage cost
  Computation cost (Number of CPU cycles)
No mobility
◦ Static aggregation tree
Structure-less or structured? That is the question…
◦ Continuous
◦ On-demand

Transmitting large amounts of data on the internet is slow
◦ Better to process locally and transmit the interesting parts only

How does query rate affect design decisions?
Load balancing between levels of the tree
◦ Overload root and main nodes
How will video capabilities of Imote affect aggregation models?
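
As a closing illustration of the "process locally, transmit only what is required" theme, the TAG decomposition into an initializer i, a merge function f and an evaluator e can be sketched for AVERAGE. This is a hedged sketch, not TAG's actual code: the six-node routing tree, the readings, and the function names `init`, `merge`, `evaluate`, `aggregate` and `centralized_messages` are all hypothetical, and the message counts ignore losses, retransmissions and query dissemination.

```python
def init(reading):
    """i: build the leaf state record <SUM, COUNT> from one reading."""
    return (reading, 1)


def merge(a, b):
    """f: combine two partial state records component-wise."""
    return (a[0] + b[0], a[1] + b[1])


def evaluate(state):
    """e: the root turns the final <SUM, COUNT> record into the average."""
    return state[0] / state[1]


def aggregate(tree, readings, node):
    """Each node merges its own record with one partial record per child."""
    state = init(readings[node])
    for child in tree.get(node, []):
        state = merge(state, aggregate(tree, readings, child))
    return state


def centralized_messages(tree, node=0, depth=0):
    """Messages needed if every raw reading is forwarded hop by hop to the root."""
    return depth + sum(
        centralized_messages(tree, child, depth + 1) for child in tree.get(node, [])
    )


# Hypothetical routing tree: node 0 is the root; keys map a parent to its children.
tree = {0: [1, 2], 1: [3, 4], 2: [5]}
readings = {0: 10, 1: 20, 2: 30, 3: 40, 4: 50, 5: 60}

state = aggregate(tree, readings, 0)
print(state)                       # (210, 6)
print(evaluate(state))             # 35.0
print(len(readings) - 1)           # 5 messages: one per non-root node per epoch
print(centralized_messages(tree))  # 8 messages when raw readings are relayed
```

Because AVERAGE is algebraic, every partial record stays the same fixed size no matter how many readings it summarizes, which is why in-network aggregation sends one small message per node per epoch in this sketch.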