Peer-to-Peer (P2P) and Sensor Networks
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute
[email protected], http://www.ecse.rpi.edu/Homepages/shivkuma
Based in part upon slides of Don Towsley, Ion Stoica, Scott Shenker, Joe Hellerstein, Jim Kurose, Hung-Chang Hsiao, Chung-Ta King

Overview
- P2P networks: Napster, Gnutella, Kazaa
- Distributed Hash Tables (DHTs)
- Database perspectives: data-centricity, data independence
- Sensor networks and their connection to P2P

P2P: Key Idea
Share the content, storage, and bandwidth of individual (home) users across the Internet.

What is P2P (Peer-to-Peer)?
- P2P as a mindset: Slashdot
- P2P as a model: Gnutella
- P2P as an implementation choice: application-layer multicast
- P2P as an inherent property: ad-hoc networks

P2P Application Taxonomy
- Distributed computing: SETI@home
- File sharing: Gnutella
- Collaboration: Jabber
- Platforms: JXTA

How to Find an Object in a Network?
A straightforward idea: use a BIG server, which either stores the objects or provides a directory. But how to do it in a distributed way?

Why Distributed?
In the client-server model, the client is dumb and the server does most things (compute, store, control). Centralization makes things simple, but introduces a single point of failure, a performance bottleneck, tighter control, access fees and management cost, and awkward ad hoc participation.
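The "BIG server" directory idea can be sketched in a few lines. This is a minimal illustration, assuming made-up names (CentralDirectory, register, lookup); it is not Napster's actual protocol.

```python
# Minimal sketch of a centralized directory (Napster-style):
# peers register the objects they hold; every lookup hits one server.
# All names here are illustrative.

class CentralDirectory:
    def __init__(self):
        self.index = {}  # object name -> set of peers holding it

    def register(self, peer, obj):
        self.index.setdefault(obj, set()).add(peer)

    def lookup(self, obj):
        # Single point of failure: if this server dies, no lookups work.
        return self.index.get(obj, set())

d = CentralDirectory()
d.register("peer-a", "song.mp3")
d.register("peer-b", "song.mp3")
print(d.lookup("song.mp3"))
```

The directory is trivial to build, which is exactly why Napster started here; the rest of the lecture is about removing this central point.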
Estimates of networked PCs: 10 billion MHz of CPU and 10,000 terabytes of storage. Clients are not that dumb after all; use the resources in the clients, at the network edges.

First Idea: Napster
Distribute the objects, but centralize the directory.

Today: P2P video traffic is dominant (source: CacheLogic): video over BitTorrent and eDonkey, with 40-60%+ of traffic being P2P.

2006 P2P Data
- Between 50 and 65 percent of all download traffic is P2P related.
- Between 75 and 90 percent of all upload traffic is P2P related.
- More people are using P2P: in 2004 one CacheLogic server registered 3 million IP addresses in 30 days; in 2006 it registered 3 million IP addresses in 8 days.
- What do people download? 61.4 percent video, 11.3 percent audio, 27.2 percent games/software/etc. The average size of shared files is 1 gigabyte!
Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/

A More Aggressive Idea
Distribute both the objects and the directory. But how to find objects without blind flooding?
Gnutella
Distribute the file location; the idea is to flood the request. How to find a file:
- Send the request to all neighbors.
- Neighbors recursively multicast the request.
- Eventually a machine that has the file receives the request and sends back the answer.
Advantages: totally decentralized, highly robust.
Disadvantages: not scalable; the entire network can be swamped with requests (to alleviate this problem, each request carries a TTL).

Gnutella: Unstructured P2P
Ad-hoc topology; queries are flooded for a bounded number of hops; no guarantees on recall.

Now BitTorrent and eDonkey2000 dominate (2006).

Lessons and Limitations
Client-server performs well, but is not always feasible, and ideal performance is often not the key issue!
Things that flood-based systems do well:
- Organic scaling
- Decentralization of visibility and liability
- Finding popular stuff
- Fancy local queries
Things that flood-based systems do poorly:
- Finding unpopular stuff [Loo et al., VLDB 04]
- Fancy distributed queries
- Vulnerabilities: data poisoning, tracking, etc.
- Guarantees about anything (answer quality, privacy, etc.)

Detour: BitTorrent

BitTorrent: joining a torrent
Peers are divided into seeds (have the entire file) and leechers (still downloading). A new leecher:
1. obtains the metadata file from a website;
2. contacts the tracker;
3. obtains a peer list (containing seeds and leechers);
4. contacts peers from that list for data.

BitTorrent: exchanging data
- Verify pieces using hashes.
- Download sub-pieces in parallel.
- Advertise received pieces to the entire peer list.
- Look for the rarest pieces.

BitTorrent: unchoking
- Periodically calculate data-receiving rates.
- Upload to (unchoke) the fastest downloaders.
- Optimistic unchoking: periodically select a peer at random and upload to it, continuously looking for the fastest partners.

End of detour.

Back to P2P Structures
Unstructured P2P architectures (Napster, Gnutella, Freenet) have no "logically" deterministic structure organizing the participating peers, and no guarantee that objects will be found. How to find objects within some bounded number of hops? Extend hashing. Structured P2P architectures (CAN, Chord, Pastry, Tapestry, Tornado, ...) can be viewed as a distributed hash table serving as a directory.

How to Bound Search Quality?
Many ideas; again, work on placement!

High-Level Idea: Indirection
Indirection in space: logical (content-based) IDs, and routing to those IDs. This gives a "content-addressable" network, tolerant of churn (nodes joining and leaving the network).
Indirection in time: we want some scheme to temporally decouple send and receive. Persistence is required.
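The unchoking policy described above can be sketched as follows. The slot count and peer names are illustrative, and real BitTorrent clients add more machinery (snubbing, rotation intervals); this is only the core idea.

```python
import random

def choose_unchoked(rates, slots=3, rng=random):
    """rates: dict peer -> measured download rate from that peer.
    Unchoke the `slots` fastest downloaders, plus one random other
    peer (optimistic unchoke) so newcomers get a chance to prove
    themselves as fast partners."""
    fastest = sorted(rates, key=rates.get, reverse=True)[:slots]
    others = [p for p in rates if p not in fastest]
    optimistic = [rng.choice(others)] if others else []
    return set(fastest) | set(optimistic)

# Made-up receiving rates, recalculated periodically in a real client.
rates = {"A": 50, "B": 10, "C": 80, "D": 5, "E": 30}
unchoked = choose_unchoked(rates)
```

The tit-for-tat flavor is visible here: upload capacity goes to whoever gives you data fastest, with the optimistic slot keeping the market discoverable.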
The typical Internet solution is soft state: a combination of persistence via storage and via retry. The "publisher" requests a TTL on storage and republishes as needed. Metaphor: a Distributed Hash Table.

Basic Idea
Objects have hash keys, and peer nodes have hash keys in the same hash space. An object "y" is published with Publish(H(y)); a peer "x" joins with Join(H(x)). Each object is placed at the peer with the closest hash key.

Distributed Hash Tables (DHTs)
Abstraction: a distributed hash-table data structure:
- insert(id, item);
- item = query(id); (or lookup(id);)
Note: the item can be anything: a data object, document, file, pointer to a file, ...
Proposals: CAN, Chord, Kademlia, Pastry, Tapestry, etc.
Goals:
- Make sure that an identified item (file) is always found
- Scale to hundreds of thousands of nodes
- Handle rapid arrival and failure of nodes

Viewed as a Distributed Hash Table
The hash space (0 to 2^128 - 1) is divided among the peers: each peer is responsible for a range of the hash table, according to its hash key, and objects are placed at the peer with the closest key. Note that the peers sit at the Internet edges.

How to Find an Object?
We want only one hop to find the object, but also to keep only a few routing entries. The simplest idea: everyone knows everyone else!
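The placement rule on these slides, put each object at the peer whose hash key is closest, might be sketched like this. A 128-bit hash is used to match the 0..2^128-1 space shown; the peer names are made up.

```python
import hashlib

SPACE = 2 ** 128  # hash space 0 .. 2^128 - 1, as on the slides

def key(name):
    # Map any name (peer or object) into the shared 128-bit space.
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

def owner(obj, peers):
    """Place the object at the peer whose key is closest to the
    object's key, measuring distance with wrap-around on the ring."""
    k = key(obj)
    def dist(p):
        d = abs(key(p) - k)
        return min(d, SPACE - d)
    return min(peers, key=dist)

peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
print(owner("report.pdf", peers))
```

Because the mapping depends only on the hash, every node that knows the peer list computes the same owner, with no directory server involved.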
Structured Networks
Distributed Hash Tables (DHTs) offer a hash-table interface, put(key, item) and get(key), with O(log n) hops and guarantees on recall.

Content Addressable Network (CAN)
A distributed hash table laid out in a Cartesian coordinate space: a peer only needs to know its logical neighbors, and dimension-ordered multihop routing is used. Associate to each node and each item a unique id in a d-dimensional Cartesian space on a d-torus.
Properties:
- Routing table size O(d)
- Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes

CAN Example: Two-Dimensional Space
The space is divided between the nodes, and together the nodes cover the entire space; each node covers either a square or a rectangle with side ratio 1:2 or 2:1.
- Node n1:(1, 2), the first node to join, covers the entire space.
- Node n2:(4, 2) joins: the space is divided between n1 and n2.
- Node n3:(3, 5) joins: n1's zone is split again.
- Nodes n4:(5, 5) and n5:(6, 6) join.
- With nodes n1:(1,2), n2:(4,2), n3:(3,5), n4:(5,5), n5:(6,6) and items f1:(2,3), f2:(5,1), f3:(2,1), f4:(7,5), each item is stored by the node that owns its mapping in the space.

CAN: Query Example
Each node knows its neighbors in the d-space and forwards a query to the neighbor closest to the query id. Example: assume n1 queries f4. CAN can route around some failures.

Another Design: Chord
- Node and object keys: random locations around a circle.
- Neighbors: nodes at distances 2^-i around the circle, found by routing to the desired key.
- Routing: greedy; pick the neighbor closest to the destination.
- Storage: each node "owns" an interval, the key range between its key and the previous node's key.

OpenDHT
A shared DHT service: the Bamboo DHT hosted on PlanetLab, with a simple RPC API. You don't need to deploy or host anything to play with a real DHT!

Review: DHTs vs. Unstructured P2P
DHTs are good at exact match for "rare" items; they are bad at keyword search, etc. (you can't construct a DHT-based Google) and at tolerating extreme churn. Gnutella and other unstructured P2P systems are good at general search, finding common objects, and very dynamic environments; they are bad at finding "rare" items.

Distributed Systems Pre-Internet
Connected by LANs (low loss and delay), at small scale (10s, maybe 100s per server). The PODC literature focused on algorithms to achieve strict semantics in the face of failures: two-phase commit, synchronization, Byzantine agreement, etc.
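The CAN query example above (n1 routing a query toward f4) can be sketched as greedy forwarding: each node hands the query to whichever neighbor is closest to the target point. The node coordinates come from the slides' figure, but the neighbor lists below are an assumption read off the drawn zones.

```python
import math

# Node coordinates from the slides' 2-D CAN example.
nodes = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
# Zone adjacencies, approximated from the figure (an assumption).
neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1", "n4"],
             "n4": ["n2", "n3", "n5"], "n5": ["n4"]}

def route(start, target, max_hops=10):
    """Greedy CAN routing: keep forwarding to the neighbor closest to
    the target point until no neighbor is an improvement."""
    def dist(name):
        x, y = nodes[name]
        return math.hypot(x - target[0], y - target[1])
    path, cur = [start], start
    for _ in range(max_hops):
        best = min(neighbors[cur], key=dist)
        if dist(best) >= dist(cur):
            break  # current node's zone contains (or is nearest) the target
        path.append(best)
        cur = best
    return path

print(route("n1", (7, 5)))  # n1 queries f4 at (7, 5)
```

Each hop only consults local neighbor state, which is the point of CAN: O(d) routing state per node in exchange for multihop paths.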
Distributed Systems Post-Internet
A very different context: huge scales (thousands if not millions), highly variable connectivity, common failures, and organic growth. Strict distributed semantics were abandoned in favor of adaptive apps rather than "guaranteed" infrastructure. The pairwise client-server approach was adopted: the server is centralized (even if a server farm), the approach is relatively primitive (no sophisticated distributed algorithms), and there is little support from infrastructure or middleware.

A Database Viewpoint on DHTs: Towards Data-Centricity and Data Independence

Host-centric Protocols
Protocols are defined in terms of IP addresses: for unicast, an IP address is a host; for multicast, an IP address is a set of hosts. The destination address is given to the protocol, which delivers data from one host to another. Unicast is conceptually trivial; for multicast, the address is logical, not physical.

Host-centric Applications
In the classic applications, the destination is "intrinsic": telnet (the target machine), FTP (the location of files), electronic mail (the email address turns into a mail server), multimedia conferencing (the machines of the participants). The destination is specified by the user (not the network), usually by hostname rather than address; DNS translates names into addresses.

Domain Name System (DNS)
DNS is built around recursive delegation. Top-level domains (TLDs): .com, .net, .edu, etc.
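Recursive delegation can be sketched as a walk down a tree of zones, each of which knows only its immediate children. The zone contents and the address record below are made up for illustration; real DNS adds record types, caching, and many servers per zone.

```python
# A toy walk down DNS's delegation tree (zone data entirely made up).
# Each zone knows only its immediate children: recursive delegation.
zones = {
    ".": {"edu"},
    "edu": {"berkeley.edu"},
    "berkeley.edu": {"cs.berkeley.edu"},
    "cs.berkeley.edu": set(),  # authoritative: no further delegation
}
addresses = {"cs.berkeley.edu": "128.32.0.1"}  # made-up A record

def resolve(name):
    """Start at the root and follow delegations label by label,
    just as a resolver walks root -> TLD -> subdomain servers."""
    labels = name.split(".")
    zone, trail = ".", ["."]
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        if suffix in zones[zone]:
            zone = suffix
            trail.append(zone)
    return trail, addresses.get(name)

print(resolve("cs.berkeley.edu"))
```

Note how every step is resolved by a locally administered zone; this decentralized control is exactly what the slide credits for DNS's efficiency.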
TLDs delegate authority to subdomains (berkeley.edu), and subdomains can further delegate (cs.berkeley.edu). The hierarchy fits the hosts' administrative structure, and local decentralized control is crucial to efficient hostname resolution.

Modern Web Data-Centricity
URLs often function as names of data: users think of www.cnn.com as data, not a host, and the fact that www.cnn.com is a hostname is irrelevant. Users want data, not access to a particular host. The web is now data-centric.

Data-centric App in a Host-centric World
Data is still associated with host names (URLs), so the administrative structure of data is the same as that of hosts: a weak point in the current web. The key enabler is search engines, whose searchable databases map keywords to URLs and allow users to find the desired data. Networkers focused on the technical problems: HTTP, persistence (URNs), replication (CDNs), ...

A DNS for Data? DHTs...
Can we map data names into addresses? A data-centric DNS, distributed and scalable, would not alter the network protocols but would aid data location; this is not just about stolen music, but a general facility. A formidable challenge: data does not have a clear administrative hierarchy, so we likely need to support a flat namespace. Can one do this scalably? Data-centrism requires scalable flat lookups, hence DHTs.

Data Independence in DB Design
Decouple the app-level API from the data organization, so that changes can be made to the data layout without modifying applications. The simple version: location-independent names. Fancier: declarative queries. "As clear a paradigm shift as we can hope to find in computer science" - C. Papadimitriou

The Pillars of Data Independence (DBMS)
- Indexes (B-trees): value-based lookups have to compete with direct access, must adapt to shifting data distributions, and must guarantee performance.
- Query optimization (join ordering, access-method selection, etc.): supports declarative queries beyond lookup/search, and must adapt to shifting data distributions and to changes in the environment.

Generalizing Data Independence
A classic "level of indirection" scheme: indexes are exactly that, and complex queries are a richer indirection. The key for data independence is that it's all about rates of change. Hellerstein's Data Independence Inequality: data independence matters when d(environment)/dt >> d(app)/dt.

Data Independence in Networks
In databases, the right-hand side d(app)/dt is unusually small; this drove the relational database revolution. In extreme networked systems the left-hand side is unusually high, and the applications are increasingly complex and data-driven, so simple indirections (e.g., local lookaside tables) are insufficient.

Hierarchical Networks (& Queries)
IP: hierarchical name space (www.vldb.org, 141.12.12.51) and hierarchical routing; Autonomous Systems correlate with the name space (though not perfectly).
DNS: hierarchical name space ("clients" plus a hierarchy of servers), hierarchical routing with aggressive caching, and 13 managed "root servers".
Traditional pros/cons of hierarchical data management: it works well for things aligned with the hierarchy (especially physical locality, a la Astrolabe), but it is inflexible: no data independence!
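The index pillar, value-based lookup as a level of indirection, can be illustrated with a toy sorted index: a stand-in for a B-tree, not a real one. The point is data independence: the application keeps calling lookup even if the structure underneath is later swapped for a hash index or a DHT.

```python
import bisect

class SortedIndex:
    """Toy stand-in for a B-tree: value-based lookup over sorted keys.
    The app-facing interface (lookup) is the level of indirection; the
    layout behind it can change without touching the application."""
    def __init__(self, pairs):
        self.keys = sorted(k for k, _ in pairs)
        self.vals = dict(pairs)

    def lookup(self, key):
        # Binary search over the sorted keys, as a B-tree would descend.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.vals[key]
        return None

idx = SortedIndex([(3, "c"), (1, "a"), (2, "b")])
```

A DHT's get(key) plays exactly this role in the P2P column of the "pillars" slide: same interface, radically different machinery underneath.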
The Pillars of Data Independence (DBMS vs. P2P)
- Indexes: B-trees in a DBMS; content-addressable overlay networks (DHTs) in P2P. Value-based lookups have to compete with direct access, must adapt to shifting data distributions, and must guarantee performance.
- Query optimization: join ordering, access-method selection, etc. in a DBMS; perhaps multiquery dataflow sharing in P2P. This supports declarative queries beyond lookup/search, adapting to shifting data distributions and to changes in the environment.

Sensor Networks: The Internet Meets the Environment

Today: the Internet meets mobile wireless computing
iPod (the impact of disk size/cost), Samsung cameraphone with camcorder, Sony PSP (mobile gaming), Blackberry (phone + PDA). Computing is smaller and faster; disks have larger capacity in smaller form factors; communications carry wireless voice and data; multimedia integration spans voice, data, video, and games.

Tomorrow: Embedded Networked Sensing Apps
Seismic structure response, marine microorganisms, contaminant transport, ecosystems and biocomplexity. Micro-sensors, on-board processing, and wireless interfaces are feasible at very small scale and can monitor phenomena "up close", enabling spatially and temporally dense environmental monitoring. Embedded networked sensing will reveal previously unobservable phenomena.

Embedded Networked Sensing: Motivation
Imagine:
- high-rise buildings that self-detect structural faults (e.g., weld cracks)
- schools that detect airborne toxins at low concentrations and trace contaminant transport to its source
- buoys that alert swimmers to dangerous bacterial levels
- an earthquake-rubbled building infiltrated with robots and sensors to locate survivors and evaluate structural damage
- ecosystems infused with chemical, physical, acoustic, and image sensors to track global change parameters
- a battlefield sprinkled with sensors that identify and track friendly/foe air and ground vehicles and personnel

Embedded Sensor Nets: Enabling Technologies
Embed numerous distributed devices to monitor and interact with the physical world, and network the devices to coordinate and perform higher-level tasks, exploiting collaborative sensing and action. These are control systems with small form factor and untethered nodes, with sensing tightly coupled to the physical world, exploiting spatially/temporally dense, in situ/remote sensing and actuation.

Sensornets Vision
Many sensing devices with radio and processor enable fine-grained measurements over large areas, with huge potential impact on science and society. Technical challenges: untethered (power consumption must be limited), unattended (robust and self-configuring), wireless (ad hoc networking).

Similarity with P2P Networks
Sensornets are inherently data-centric: users know what data they want, not where it is. [Estrin, Govindan, Heidemann, 2000, etc.]
A centralized database is infeasible: there is a vast amount of data, constantly being updated; only a small fraction of the data will ever be queried; and sending everything to a single site expends too much energy.

Sensor Nets: New Design Themes
- Self-configuring systems that adapt to an unpredictable environment: dynamic, messy (hard to model) environments preclude preconfigured behavior.
- Leverage data processing inside the network: exploit computation near the data to reduce communication, use collaborative signal processing, and achieve the desired global behavior with localized algorithms (distributed control).
- Long-lived, unattended, untethered, low-duty-cycle systems: energy is a central concern, and communication is the primary consumer of the scarce energy resource.

From Embedded Sensing to Embedded Control
Embedded in unattended "control systems", these networks control, network, and act in the environment. Critical applications extend beyond sensing to control and actuation: transportation, precision agriculture, medical monitoring and drug delivery, battlefield applications. Concerns extend beyond those of traditional networked systems and applications to usability, reliability, and safety, and a systems architecture is needed to manage the interactions. Current system development is one-off, incrementally tuned, and stove-piped; the repercussions of piecemeal, uncoordinated design are insufficient longevity, interoperability, safety, robustness, and scaling.

Why can't we simply adapt Internet protocols and the "end-to-end" architecture?
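The theme "exploit computation near data to reduce communication" can be sketched as tree aggregation: each node forwards only the max of its subtree toward the sink, so one value crosses each link instead of every raw reading. The topology and readings below are made up.

```python
# In-network aggregation sketch: routing tree rooted at the sink.
# Each node fuses its children's aggregates locally, so a single
# number crosses each radio link (topology and readings made up).
tree = {"sink": ["a", "b"], "a": ["c", "d"], "b": [], "c": [], "d": []}
reading = {"sink": 10, "a": 42, "b": 17, "c": 99, "d": 3}

def subtree_max(node):
    # Fuse locally: compare own reading with children's aggregates.
    return max([reading[node]] + [subtree_max(c) for c in tree[node]])

print(subtree_max("sink"))  # 99
```

With n nodes, raw collection sends O(n) values over the links near the sink, while aggregation sends one per link; since communication dominates the energy budget, this is the central trick of in-network processing.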
The Internet routes data using IP addresses in packets and lookup tables in routers; humans get data by "naming data" to a search engine, with many levels of indirection between a name and an IP address. Embedded, energy-constrained (untethered, small-form-factor), unattended systems can't tolerate the communication overhead of that indirection. These are special-purpose systems: they don't need or want the Internet's general-purpose functionality, which was designed for elastic applications.

Sample Layered Architecture
From top to bottom: user queries and external databases; in-network application processing, data aggregation, and query processing; data dissemination, storage, and caching; adaptive topology and geo-routing; MAC, time, and location; physical layer (communication, sensing, actuation, signal processing). Resource constraints call for more tightly integrated layers. Open question: what are the defining architectural principles?

Coverage Measures
- Area coverage: the fraction of the area covered by sensors.
- Detectability: the probability that sensors detect moving objects.
- Node coverage: the fraction of sensors covered by other sensors.
- Control: given a sensor field (either known sensor locations or a spatial density), decide where to add new nodes for maximum coverage, and how to move existing nodes for maximum coverage.

In-Network Processing
Communication is expensive when power and bandwidth are limited, so perform (data) processing in the network, close to (or at) the data, and forward only fused/synthesized results, e.g., the max of the data: distributed data, distributed computation.

Distributed Representation and Storage
Data-centric protocols and in-network processing. Goals: pattern-triggered data collection, and interpretation of spatially distributed data over time (per-node processing alone is not enough). The network does in-network processing based on the distribution of the data, and queries are automatically directed towards the nodes that maintain relevant/matching data. Further goals: multi-resolution data storage and retrieval, distributed edge/feature detection, indexing data for easy temporal and spatial searching, and finding global statistics (e.g., a distribution).

Directed Diffusion: Data-Centric Routing
Basic idea: name data (not nodes) with externally relevant attributes (data type, time, location of node, SNR), and diffuse requests and responses across the network using application-driven routing (e.g., geo-sensitive or not), supporting in-network aggregation and processing. Data sources publish data and data clients subscribe to data; however, all nodes may play both roles: a node that aggregates/combines/processes incoming sensor node data becomes a source of new data, and a node that only publishes when a combination of conditions arises is a client for the triggering event data. A true peer-to-peer system?
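Directed diffusion's "name data, not nodes" idea can be sketched as publish/subscribe matching on attributes rather than node addresses. The attribute names and clients here are illustrative, not the actual diffusion API, and real diffusion also sets up gradients along the reverse paths.

```python
# Data-centric matching sketch: interests and data are both bags of
# attributes; delivery is by attribute match, not by node address.
interests = []  # list of (attribute-predicate dict, subscriber)

def subscribe(attrs, subscriber):
    # A client expresses an interest as required attribute values.
    interests.append((attrs, subscriber))

def publish(data):
    """Deliver the data to every interest all of whose attributes
    match; no destination address appears anywhere."""
    delivered = []
    for attrs, sub in interests:
        if all(data.get(k) == v for k, v in attrs.items()):
            delivered.append(sub)
    return delivered

subscribe({"type": "temperature", "region": "north"}, "client-1")
subscribe({"type": "seismic"}, "client-2")
hit = publish({"type": "temperature", "region": "north", "value": 21.5})
```

Because matching is local, any node can evaluate interests against data passing through it, which is what lets intermediate nodes aggregate and become sources of new data.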
Traditional Approach: Warehousing
Data is extracted from the sensors and stored on a server; query processing takes place on that server (warehouse, front-end, sensor nodes).

Sensor Database System
A sensor database system instead supports distributed query processing over the sensor network: each sensor node runs a Sensor DB component, coordinated through a front-end.
Characteristics of a sensor network: streams of data; uncertain data; a large number of nodes; a multi-hop network; no global knowledge about the network; common node failure and interference; energy as the scarce resource; limited memory; no administration; ...
Can existing database techniques be reused? What are the new problems and solutions? Representing sensor data; representing sensor queries; processing query fragments on sensor nodes; distributing query fragments; adapting to changing network conditions; dealing with site and communication failures; deploying and managing a sensor database system.

Summary
- P2P networks: Napster, Gnutella, Kazaa
- Distributed Hash Tables (DHTs)
- Database perspectives: data-centricity, data independence
- Sensor networks and their connection to P2P