Digital Enterprise Research Institute
Introduction to Peer-to-Peer Networks
Manfred Hauswirth, Marcel Karnstedt
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
www.deri.ie

Goals of the Tutorial
• Position the P2P paradigm in the design space of distributed systems
• Get an overview of P2P systems and the underlying concepts
• Understand the problem of decentralized data management in P2P systems

What is P2P?
• Clay Shirky (The Accelerator Group): “Peer-to-peer is a class of applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy from central servers.”
• P2P “litmus test”:
  – Does it allow for variable connectivity and temporary network addresses?
  – Does it give the nodes at the edges of the network significant autonomy?
• P2P ~ an application-level Internet on top of the Internet

P2P in a Historical Context
• The original Internet was designed as a P2P system
  – any 2 computers could send packets to each other
  – no firewalls, no network address translation
  – no asymmetric connections (V.90, ADSL, cable, etc.)
• The “killer apps” of the time, FTP and telnet, are C/S, but anyone could telnet/FTP anyone else
  – servers acted as clients and vice versa
• Cooperation was a central goal and “value”: no spam or exhaustive bandwidth consumption
• Typical examples of “old-fashioned P2P”: Usenet News, DNS
• The emergence of P2P can be seen as a renaissance of the original Internet model

What is P2P?
• Every participating node acts as both a client and a server (“servent”)
• Every node “pays” for its participation by providing access to (some of) its resources
• Properties:
  – no central coordination
  – no central database
  – no peer has a global view of the system
  – global behavior emerges from local interactions
  – all existing data and services are accessible from any peer
  – peers are autonomous
  – peers and connections are unreliable

Where is P2P? System Layers
• User layer: commerce and society are P2P
• Application layer: e-commerce systems can be P2P or centralized
• Information management layer: directories and databases can be P2P or centralized
• Network layer (Internet): networks are P2P
• Users use the application, the application exploits information management, and information management exploits the network; QoS is a concern at each layer

Types of P2P Systems
• E-commerce systems: eBay, B2B market places, B2B integration servers, …
• File sharing systems: Napster, Gnutella, Freenet, …
• Distributed databases: Mariposa [Stonebraker96], …
• Networks: Arpanet, mobile ad-hoc networks

How much P2P is involved?
System               P2P user interaction   P2P application   P2P information management
eBay                 yes                    no                no
Napster              yes                    yes               no
Gnutella, Freenet    yes                    yes               yes

Related Approaches
• Related distributed information system approaches:
  – Event-based systems
  – Push systems
  – Mobile agents
  – Distributed databases

Event-based (publish/subscribe)
• System model
  – Components (peers) interact by generating and receiving events
  – Components declare interest in receiving specific (patterns of) events and are notified upon their occurrence
  – Supports a highly flexible interaction between loosely-coupled components
[Figure: a component subscribes to “X followed by Y” and is notified once events X and Y have occurred]

Event-based vs.
Peer-to-Peer
• Common properties:
  – symmetric communication style
  – dynamic binding between producers and consumers
• Subscription to events ~ “passive” queries
  – EB: notification
  – P2P: active discovery
  – the subscription language supports more sophisticated queries and pattern matching (event patterns with time dependencies)
• Event-based systems typically have a specialized event distribution infrastructure
  – EB: 2 node types, P2P: 1 node type
  – the EB infrastructure must be deployed

Push Systems
• A set of designated broadcasters offer information that is pre-grouped in channels (weather, news, etc.)
• Receivers subscribe to channels of their interest and receive channel information as it is being “broadcast” (timely distribution)
• Receivers may have to pay prior to receiving the information (pay-per-view, flat fee, etc.)
[Figure: pull vs. push]

Push Systems vs. Peer-to-Peer
• Asymmetric communication style (P2P: symmetric)
• Focus is on timely data distribution, not on discovery
  – filtering may be deployed to reduce data transmission requirements
  – subscription to channels is a prerequisite
  – producer/consumer binding is static
• Push systems require a specialized distribution infrastructure
  – Push: 3 node types, P2P: 1 node type
  – the push infrastructure must be deployed

Mobile Agents
• A mobile agent is a computational entity that
  – moves around in a network at its own volition
  – accomplishes a task on behalf of its owner
  – can cooperate with other agents
  – “learns” (“Whom to visit next?”)
• Mobility (heterogeneous network!)
  – Weak: code, data
  – Strong: code, data, execution stack

Mobile Agents vs.
Peer-to-Peer
• Very similar in terms of search and navigation
  – P2P: the peers propagate requests (search, update)
  – MA: the nodes propagate the agents
  – Mobile agent ~ “active” query
• Mobile agent systems require a considerably more sophisticated environment
  – mobile code support (heavy)
  – security (protect the receiving node from malicious mobile agents and vice versa)
• In many domains P2P systems can take over
  – more apt for distributed data management
  – fewer requirements (sending code requires much bandwidth, security, etc.)

Distributed Databases
• Fragmenting large databases (e.g., relational) over physically distributed nodes
• Efficient processing of complex queries (e.g., SQL) by decomposing them
• Efficient update strategies (e.g., lazy vs. eager)
• Consistent transactions (e.g., 2 phase commit)
• Normally, approaches rely on central coordination

Distributed Databases vs. Peer-to-Peer
• Data distribution is a key issue for P2P systems
• Approaches in distributed DB that address scalability
  – LH* family of scalable hash index structures [Litwin97]
  – Snowball: scalable storage system for workstation clusters [Vingralek98]
  – Fat-Btree: a scalable B-Tree for parallel DB [Yokota 9]
• Approaches in distributed DB that address autonomy (and scalability)
  – Mariposa: distributed relational DBMS based on an underlying economic model [Stonebraker96]
• P2P data management has to address both scalability and autonomy

Usage Patterns to Position P2P
• Discovering information is the predominant problem
• Occasional discovery: search engines, P2P, MA
  – ad hoc requests, irregular
  – E.g., in a new town: where is the next car rental?
• Notification: event-based systems, push
  – notification for (correlated) events (event patterns)
  – E.g., notify me when my stocks drop below a threshold
• Systematic discovery: P2P systems,
  search engines, MA
  – find a certain type of information on a regular basis
  – E.g., search for MP3 files of Jethro Tull regularly
• Continuous information feed: push systems, event-based
  – subscription to a certain information type
  – E.g., sports channel, updates are sent as soon as available

The Interaction Spectrum
[Figure: spectrum from passive to active; event-based systems and push systems on the passive side, mobile agents and peer-to-peer systems on the active side]

Peer-to-Peer vs. C/S and Web
                     Client-Server       Client-Server       Peer-to-Peer
                     (session-based)     (Web-based)
Coupling             tight               loose               very loose
Comm. style          asymmetric          asymmetric          symmetric
Number of clients    moderate (1000)     high (1,000,000)    high (1,000,000)
Number of servers    few (10)            many (100,000)      none (0)

Coupling vs. Scalability
[Figure: coupling plotted against scalability for session-based, push-based, event-based, web-based, and peer-to-peer systems]

P2P System Models
• Centralized model
  – global index held by a central authority (single point of failure)
  – direct contact between requestors and providers
  – Example: Napster
• Decentralized model
  – Examples: Freenet, Gnutella
  – no global index, no central coordination, global behavior emerges from local interactions, etc.
  – direct contact between requestors and providers (Gnutella) or mediated by a chain of intermediaries (Freenet)
• Hierarchical model
  – introduction of “super-peers”
  – mix of centralized and decentralized model

Centralized Information Systems
• Web search engine: global scale application
• Example: Google (April 2001)
  – 150 million searches/day
  – 1-2 terabytes of data
  – Google: 15000 servers
[Figure: many clients around one Google server; to find the home page of Karl Aberer, a client (1) sends the query "aberer" and (2) receives the result]
• Strengths
  – global ranking
  – fast response time
• Weaknesses
  – infrastructure, administration, cost
  – a new company for every global application?
(Semi-)Decentralized Information Systems
• P2P music file sharing: global scale application
• Example: Napster (February 2001)
  – 1.57 million users
  – 10 terabytes of data (2 million songs, 220 songs per user)
  – Napster: 100 servers
• Schema: <title> "brick in the wall", <artist> "pink floyd", <size> "1 MB", <category> "rock"
[Figure: peers around the Napster server; (1) find, (2) result: you find f.mp3 at peer X, (3) request and transfer file f.mp3 from peer X directly]

Lessons Learned from Napster
• Strengths: resource sharing
  – Every node “pays” for its participation by providing access to its resources: physical resources (disk, network), knowledge (annotations), ownership (files)
  – Every participating node acts as both a client and a server (“servent”): P2P
  – global information system without huge investment
  – decentralization of cost and administration = avoiding resource bottlenecks
• Weaknesses: centralization
  – server is a single point of failure
  – unique entity required for controlling the system = design bottleneck
  – copying copyrighted material made Napster a target of legal attack
[Figure: increasing degree of resource sharing and decentralization, from centralized system to decentralized system]

Fully Decentralized Information Systems
• P2P file sharing: global scale application
• Example: Gnutella (August 2000)
  – 40,000 nodes, 3 million files
• Strengths
  – good response time, scalable
  – no infrastructure, no administration
  – no single point of failure
• Weaknesses
  – high network traffic
  – no structured search
  – free-riding
[Figure: a peer asks: Find "brick in the wall"; another peer answers: I have "brick_in_the_wall.mp3"]
• Self-organizing system; Gnutella: no servers

Self-Organization
• Self-organized systems are well known from physics, biology, cybernetics
  – distribution of control (= decentralization = symmetry in roles = P2P)
  – local interactions, information and decisions
  – emergence of global structures
  – failure resilience

P2P Architectures
• The principle of self-organization can be applied at different system layers
  – Networking layer: Internet; routing; TCP/IP, DNS
  – Data access layer: overlay networks; resource location; Gnutella, FreeNet
  – Service layer: P2P applications; messaging, distributed processing; Napster, Seti, Groove
  – User layer: user communities; collaboration; eBay, Ciao
• Original Internet designed as decentralized system: P2P overlay networks ~ application-level Internet on top of the Internet
  – support application-specific addresses

Resource Location in P2P Systems
• Problem: peers need to locate distributed information
  – Peers with address p store data items d that are identified by a key k_d
  – Given a key k_d (or a predicate on k_d), locate a peer that stores d, i.e. locate the index information (k_d, p)
• Thus, the data we have to manage consists of the key-value pairs (k_d, p)
• Can such a distributed database be maintained and accessed by a set of peers without central control?
[Figure: peers P1 to P8; P8 stores d="jingle-bells.mp3" with k_d="jingle-bells", p="P8"; a query for k_d="jingle" is answered with ("jingle", P8)]

Resource Location Problem
• Operations
  – search for a key at a peer: p->search(k)
  – update a key at a peer: p->update(k,p')
  – peers joining and leaving the network: p->join(p')
• Performance Criteria (for search)
  – search latency: e.g. searchtime(query) ~ Log(size(database))
  – message bandwidth, e.g. messages(query) ~ Log(size(database)), messages(update) ~ Log(size(database))
  – storage space used, e.g.
    storagespace(peer) ~ Log(size(database))
  – resilience to failures (network, peers)
• Qualitative Criteria
  – complex search predicates: equality, prefix, containment, similarity search
  – use of global knowledge
  – peer autonomy
  – peer anonymity and trust
  – security (e.g. denial of service attacks)

Summary
• What is a P2P system?
• What is emergence?
• At which layers can the P2P architecture occur?
• How do we define efficiency for a P2P resource location system?

Unstructured P2P Overlay Networks
• No index information is used, i.e. the information (k, p) is only available directly from p
• Simplest approach: Message Flooding (Gossiping)
  – send query message to C neighbors
  – messages have limited time-to-live TTL
  – messages have IDs to eliminate cycles
[Figure: flooding a query for k="jingle-bells"; example with C=3, TTL=2]

Gnutella
• Developed in 14 days as a “quick hack” by Nullsoft (winamp)
• Originally intended for exchange of recipes
• Evolution of Gnutella
  – Published under GNU General Public License on the Nullsoft web server
  – Taken off after a couple of hours by AOL (owner of Nullsoft)
  – This was enough to “infect” the Internet
  – The Gnutella protocol was reverse engineered from downloaded versions of the original Gnutella software
  – Third-party clients were published and Gnutella started to spread
• Based on message flooding
  – Typical values: C=4, TTL=7
  – One request leads to $2 \sum_{i=0}^{TTL} C\,(C-1)^i = 26{,}240$ messages
• Hooking up to the Gnutella system requires that a new peer knows at least one Gnutella host (gnutellahosts.com:6346; outside the Gnutella protocol specification)
• Neighbors are found using a basic discovery protocol (ping-pong messages)

Gnutella: Protocol Message Types
• Ping: announces availability and probes for other servents. Contained information: none
• Pong: response to a ping. Contained information: IP address and port# of
  responding servent; number and total kb of files shared
• Query: search request. Contained information: minimum network bandwidth of responding servent; search criteria
• QueryHit: returned by servents that have the requested file. Contained information: IP address, port# and network bandwidth of responding servent; number of results and result set
• Push: file download requests for servents behind a firewall. Contained information: servent identifier; index of requested file; IP address and port to send file to

Gnutella: Meeting Peers (Ping/Pong)
[Figure: peers A to E; A’s ping propagates through the network, and the pongs of B, C, D and E are returned to A]

Gnutella: Searching (Query/QueryHit/GET)
[Figure: peers A to E; A’s query (e.g., X.mp3) propagates through the network, C and E return query hits, and A fetches the file with GET X.mp3]

Popularity of Queries [Sripanidkulchai01]
• Very popular documents are approximately equally popular
• Less popular documents follow a Zipf-like distribution (i.e., the probability of seeing a query for the i-th most popular query is proportional to $1/i^{\alpha}$)
• Access frequency of web documents also follows Zipf-like distributions, so caching might work for Gnutella

Free-riding on Gnutella [Adar00]
• 24 hour sampling period:
  – 70% of Gnutella users share no files
  – 50% of all responses are returned by the top 1% of sharing hosts
• A social problem, not a technical one
• Problems:
  – degradation of system performance: collapse?
  – increase of system vulnerability
  – “centralized” (“backbone”) Gnutella: copyright issues?
• Verified hypotheses:
  – H1: A significant portion of Gnutella peers are free riders.
  – H2: Free riders are distributed evenly across domains
  – H3: Hosts often share files nobody is interested in (files that are not downloaded)

Free-riding Statistics - 1 [Adar00]
• H1: Most Gnutella users are free riders
• Of 33,335 hosts:
  – 22,084 (66%) of the peers share no files
  – 24,347 (73%) share ten or fewer files
  – Top 1 percent (333) hosts share 37% (1,142,645) of total files shared
  – Top 5 percent (1,667) hosts share 70% (1,142,645) of total files shared
  – Top 10 percent (3,334) hosts share 87% (2,692,082) of total files shared

Free-riding Statistics - 2 [Adar00]
• H3: Many servents share files nobody downloads
• Of 11,585 sharing hosts:
  – Top 1% of sites provide nearly 47% of all answers
  – Top 25% of sites provide 98% of all answers
  – 7,349 (63%) never provide a query response

Topology of Gnutella [Jovanovic01]
• Small-world properties verified (“find everything close by”)
• Backbone + outskirts

Gnutella Backbone [Jovanovic01]
[Figure: Gnutella backbone]

Categories of Queries
• Categorized top 20 queries [Sripanidkulchai01]

Caching in Gnutella [Sripanidkulchai01]
• Average bandwidth consumption in tests: 3.5 Mbps
• Best case: trace 2 (73% hit rate = 3.7 times traffic reduction)

Gnutella: Bandwidth Barriers
• Clip2 measured Gnutella over 1 month:
  – a typical query is 560 bits long (including TCP/IP headers)
  – 25% of the traffic are queries, 50% pings, 25% other
  – on average each peer seems to have 3 other peers actively connected
• Clip2 found a scalability barrier with substantial performance degradation if queries/sec > 10:
  – 10 queries/sec * 560 bits/query * 4 (to account for the other 3 quarters of message traffic) * 3 simultaneous connections = 67,200 bps
  – 10 queries/sec maximum in the presence of many
    dialup users
  – won’t improve (more bandwidth, larger files)

Gnutella: Summary
• Completely decentralized
• Hit rates are high
• High fault tolerance
• Adapts well and dynamically to changing peer populations
• Protocol causes high network traffic (e.g., 3.5 Mbps). For example:
  – 4 connections C per peer, TTL = 7
  – 1 ping packet can cause $2 \sum_{i=0}^{TTL} C\,(C-1)^i = 26{,}240$ packets
• No estimates on the duration of queries can be given
• No probability for successful queries can be given
• Topology is unknown, so algorithms cannot exploit it
• Free riding is a problem
• Reputation of peers is not addressed
• Simple, robust, and scalable (at the moment)

Modern Gnutella
• Lots of improvements
• Hybrid super-peer architecture
• “Gnutella + DHT”

Improvements of Message Flooding
• Expanding Ring
  – start search with small TTL (e.g. TTL = 1)
  – if no success, iteratively increase TTL (e.g. TTL = TTL + 2)
• k-Random Walkers
  – forward query to one randomly chosen neighbor only, with large TTL
  – start k random walkers
  – each random walker periodically checks with the requester whether to continue

Discussion Unstructured Networks
• Performance
  – Search latency: low (graph properties)
  – Message bandwidth: high; improvements through random walkers, but essentially the whole network needs to be explored
  – Storage cost: low (only local neighborhood)
  – Update and maintenance cost: low (only local updates)
  – Resilience to failures: good; multiple paths are explored and data is replicated
• Qualitative Criteria
  – search predicates: very flexible, any predicate is possible
  – global knowledge: none required
  – peer autonomy: high

Summary
• How are unstructured P2P networks characterized?
• What is the purpose of the ping/pong messages in Gnutella?
• Why is search latency in Gnutella low?
• Which methods reduce message bandwidth in unstructured networks?

Hierarchical P2P Overlay Networks
• Servers provide index information, i.e. the information (k, p) is available from dedicated servers
• Simplest approach
  – one central server
  – users register files
  – service (file exchange) is organized as P2P architecture
[Figure: peers around an index server; query for k="jingle-bells"]

Napster
• Central (virtual) database which holds an index of offered MP3/WMA files
• Clients connect to this server, identify themselves (account) and send a list of MP3/WMA files they are sharing (C/S)
• Other clients can search the index and learn from which clients they can retrieve the file (P2P)
• Additional services at server (chat etc.)
[Figure: A sends register (user, files) to the Napster server; the server tells B “A has X.mp3”; B downloads X.mp3 from A]

Superpeer Networks
• Improvement of central index server (Morpheus, Kazaa)
  – multiple index servers build a P2P network
  – clients are associated with one (or more) superpeers
  – superpeers use message flooding to forward search requests
• Experiences
  – redundant superpeers are good
  – superpeers should have high outdegree (>20)
  – TTL should be minimized

Discussion
• Performance
  – Search latency: very low (index)
  – Message bandwidth: low; with superpeers flooding occurs, but the number of superpeers is comparatively small
  – Storage cost: low at client, high at index server
  – Update cost: low (no replication)
  – Resilience to failures: bad (system has a single point of failure)
• Qualitative Criteria
  – search predicates: very flexible, any predicate is possible
  – global knowledge: server
  – peer autonomy: low

Summary
• Which are the two levels of P2P networks in superpeer networks, and to which functional layers are they related?
• Which problem of distribution is avoided in superpeer networks and addressed in structured networks? What is the impact on the relation between nodes and functional layers?

Structured P2P Overlay Networks
• Unstructured overlay networks: what we learned
  – simplicity (simple protocol)
  – robustness (almost impossible to “kill”, no central authority)
• Performance
  – search latency O(log n), n number of peers
  – update and maintenance cost low
• Drawbacks
  – tremendous bandwidth consumption for search
  – free riding
• Can we do better?

Efficient Resource Location
[Figure: update cost (low to high) plotted against search cost (low to high), positioning full replication, structured P2P overlay networks (e.g. prefix routing), unstructured P2P overlay networks (e.g. Gnutella), and server (e.g. Napster); label: maximal bandwidth]

Distribution of Index Information
• Goal: provide efficient search using few messages without using designated servers
• Easy: distribution of index information over all peers, i.e. every peer maintains and provides part of the index information (k, p)
• Difficult: distributing the data access structure to support efficient search
[Figure: with a server, search starts at the server, which holds the data access structure and the index information I, while the peers store data; without a server, the index information is split into I1 to I4 over peers storing data and index information, and the question is where to start the search]

Approaches
• Different strategies
  – P-Grid: distributing a binary search tree
  – Chord: constructing a distributed hash table
  – CAN: routing in a d-dimensional space
  – Freenet: caching index information along search paths
• Commonalities
  – each peer maintains a small part of the index information (routing table)
  – searches are performed by directed message forwarding
• Differences
  – performance and qualitative criteria

P-Grid
• Search tree (prefix tree)
[Figure: binary prefix tree over the keys 000 to 111 with root ???, inner nodes such as 0??, 1??, 00?, 01?, and extra data at the leaves; the query “101 ?” enters at the root]
• N objects: log2(N) steps; the query “101 ?” descends the tree via 1?? and 10? to the leaf 101 (“101 !”)

Scalable data access structures
• Assume number of data objects >> storage of one node
  – distributed storage
• Given a data access structure
  – size of data access structure = number of data objects
  – size of data access structure >> storage of one node
• Problem: where to store?

Non-scalable Distribution of Search Tree
• Distribute search tree over peers
[Figure: the prefix tree (???; 0??, 1??; 00?, 01?, 10?, 11?; leaves 000 to 111) with its leaves assigned to peers 1 to 4; label: bottleneck]

Scalable Distribution of Search Tree
[Figure: the same prefix tree with leaves assigned to peers 1 to 4; label: "Napster" bottleneck]

Scalable data access structures
• Associate each peer with a complete path
[Figure: the prefix tree with leaves assigned to peers 1 to 4, each peer associated with a complete path from the root]

Scalable data access structures
[Figure: paths ???, 1??, 10? with leaves 100 and 101; peers 1 to 4; annotations: “peer 2 knows more about this part of the tree” and “peer 4 knows more about this part of the tree”]

The result is P-Grid
• Peers cooperate in search
[Figure: the query “101 ?” is passed between the peers, including a message to peer 3, until the answer “101 !” is found]
Construction
• Splitting approach (P-Grid)
  – peers meet and decide whether to extend the search tree by splitting the data space
  – peers can perform load balancing considering their storage load
  – networks with different origins can merge, like Gnutella, Freenet (loose coupling)
• Node insertion approach (Chord, CAN, …)
  – peers determine their "leaf position" based on their IP address
  – nodes route from a gateway node to their node-id to populate the routing table
  – network has to start from a single origin (strong coupling)
• Replication of data items and routing table entries is used to increase failure resilience

P-Grid Discussion
• Performance
  – Search latency: O(log n) (with high probability, provable)
  – Message bandwidth: O(log n) (selective routing)
  – Storage cost: O(log n) (routing table)
  – Update cost: low (like search)
• Qualitative Criteria
  – search predicates: prefix searches
  – global knowledge: key hashing
  – peer autonomy: peers can locally decide on their role (splitting decision)

DHT example: Chord
• Hashing of search keys AND peer addresses on binary keys of length m
  – e.g. m=8, key("jingle-bells.mp3")=17, key(196.178.0.1)=3
• Data keys are stored at the next larger node key
  – peer with hashed identifier p, data with hashed identifier k: k is stored at p if $k \in\ ]\mathrm{predecessor}(p),\ p]$
[Figure: identifier ring with peers p, p2, p3, showing predecessor(p) and the key k stored at p; figure labels: m=8, 32 keys]
• Search possibilities
  1. every peer knows every other: O(n) routing table size
  2.
     peers know their successor: O(n) search cost

Routing Tables
• Every peer knows m peers with exponentially increasing distance
• Each peer p stores a routing table
  – the i-th entry is the first peer with hashed identifier $s_i$ such that $s_i = \mathrm{successor}(p + 2^{i-1})$ for $i = 1, \ldots, m$
  – we also write $s_i = \mathrm{finger}(i, p)$
[Figure: ring positions p, p+1, p+2, p+4, p+8, p+16 and peers p2, p3, p4; s1, s2, s3 point to p2, s4 to p3, s5 to p4]
  i    s_i
  1    p2
  2    p2
  3    p2
  4    p3
  5    p4
• O(log n) routing table size

Search
  search(p, k)
    find in routing table largest (i, p*) such that p* ∈ [p, k[
      /* largest peer key smaller than the searched data key */
    if such a p* exists then search(p*, k)
    else return (successor(p)) // found
[Figure: ring positions p, p+1, p+2, p+4, p+8, p+16 with fingers s1 to s5, peers p2, p3, p4 and keys k1, k2]
• O(log n) search cost: routing table with exponentially increasing distance, O(log n) with high probability

Node Insertion
• New node q joining the network
  – q asks an existing node p to find its predecessor and fingers
  – cost: $O(\log^2 n)$
[Figure: ring positions p, p+1, p+2, p+4, p+8, p+16 with the new node q and peers p2, p3, p4]
  routing table of p      routing table of q
  i    s_i                i    s_i
  1    q                  1    p2
  2    q                  2    p2
  3    p2                 3    p3
  4    p3                 4    p3
  5    p4                 5    p4

Load Balancing in Chord
• Network size n = 10^4, 5 · 10^5 keys, uniform data distribution
• 50 keys per node? NO, as IP addresses do not map uniformly into data key space.
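The finger-table construction and the search procedure from the Chord slides above can be tried out with a small sketch. This is only an illustration under assumptions that are not in the slides: the whole ring is simulated in one process as a sorted Python list, the identifier length `M = 5` and the peer identifiers are made up, and the helper names (`successor`, `fingers`, `in_open_interval`, `search`) are hypothetical. The sketch also looks for fingers in the open interval ]p, k[ rather than [p, k[ so that a peer never forwards a query to itself.

```python
from bisect import bisect_left

M = 5            # identifier length in bits (assumed for the example)
RING = 2 ** M    # size of the identifier space


def successor(nodes, x):
    """First peer identifier >= x on the ring; `nodes` is a sorted list."""
    i = bisect_left(nodes, x % RING)
    return nodes[i % len(nodes)]      # wrap around past the largest identifier


def fingers(nodes, p):
    """Routing table of p: s_i = successor(p + 2^(i-1)) for i = 1..M."""
    return [successor(nodes, p + 2 ** (i - 1)) for i in range(1, M + 1)]


def in_open_interval(x, a, b):
    """True if x lies strictly between a and b when walking clockwise from a."""
    if a < b:
        return a < x < b
    return x > a or x < b             # interval wraps around 0


def search(nodes, p, k, hops=0):
    """Return (peer responsible for key k, number of forwarding steps)."""
    if k == p:
        return p, hops                # p itself stores its own identifier
    table = fingers(nodes, p)
    for s in reversed(table):         # try the farthest finger first
        if in_open_interval(s, p, k):
            return search(nodes, s, k, hops + 1)
    return table[0], hops             # no finger precedes k: successor(p) stores k


if __name__ == "__main__":
    peers = [1, 8, 14, 21, 28]        # made-up peer identifiers
    print(fingers(peers, 1))          # [8, 8, 8, 14, 21]
    print(search(peers, 1, 17))       # (21, 1)
```

Starting at peer 1, the query for key 17 is forwarded once (to peer 14), which then reports its successor 21 as the peer storing the key. A real deployment would replace the shared `nodes` list with per-peer routing tables and network messages.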
Length of Search Paths
• Network size n = 2^12, 100 · 2^12 keys
• Path length: ½ log2(n)
• RTs can be seen as an embedding of search trees into the network, and thus search starts at a randomly selected tree depth

Chord Discussion
• Performance
  – Search: like P-Grid
  – Node join/leave cost: $O(\log^2 n)$
  – Resilience to failures: replication to successor nodes
• Qualitative Criteria
  – search predicates: equality of keys only
  – global knowledge: key hashing, network origin
  – peer autonomy: nodes have, by virtue of their address, a specific role in the network

Topological Routing (CAN)
• Based on hashing of keys into a d-dimensional space (a torus)
  – Each peer is responsible for the keys of a subvolume of the space (a zone)
  – Each peer stores the addresses of the peers responsible for the neighboring zones for routing
  – Search requests are greedily forwarded to the peers in the closest zones
• Assignment of peers to zones depends on a random selection made by the peer

Network Search and Join
• Node 7 joins the network by choosing a coordinate in the volume of 1 => O(d) updates of RTs

CAN Refinements
• Multiple realities
  – We can have r different coordinate spaces
  – Nodes hold a zone in each of them
  – Creates r replicas of the (key, value) pairs
  – Increases robustness
  – Reduces path length, as search can be continued in the reality where the target is closest
• Overloading zones
  – Different peers are responsible for the same zone
  – Splits are only performed if a maximum occupancy (e.g.
    4) is reached
  – Nodes know all other nodes in the same zone
  – but only one of the nodes in each neighboring zone

CAN Path Length
[Figure: CAN path length]

CAN Discussion
• Performance
  – Search latency: $O(d\, n^{1/d})$, depends on choice of d (with high probability, provable)
  – Message bandwidth: $O(d\, n^{1/d})$ (selective routing)
  – Storage cost: O(d) (routing table)
  – Update cost: low (like search)
  – Node join/leave cost: $O(d\, n^{1/d})$
  – Resilience to failures: realities and overloading
• Qualitative Criteria
  – search predicates: spatial distance of multidimensional keys
  – global knowledge: key hashing, network origin
  – peer autonomy: nodes can decide on their position in the key space

Dynamical Clustering (Freenet)
• Freenet background
  – P2P system which supports publication, replication, and retrieval of data
  – Protects anonymity of authors and readers: infeasible to determine the origin or destination of data
  – Nodes are not aware of what they store (keys and files are sent and stored encrypted)
• Uses an adaptive routing and caching strategy
• Index information maintained at each peer (limited cache size)
  – table columns: Key, Data, Address
  – example key and data entries: 8e47683isdd0932uje89, 456r5wero04d903iksd0, f3682jkjdn9ndaqmmxia, wen09hjfdh03uhn4218, 712345jb89b8nbopledh, d0ui43203803ujoejqhh, ZT38hwe01h02hdhgdzu, Rhweui12340jhd091230, eqwe1089341ih0zuhge3, erwq038382hjh3728ee7
  – example address entries: tcp/125.45.12.56:6474, tcp/67.12.4.65:4711, tcp/127.156.78.20:8811, tcp/78.6.6.7:2544, tcp/40.56.123.234:1111, tcp/128.121.89.12:9991

Freenet Routing
• If a search request arrives
  – either the data is in the table
  – or the request is forwarded to the addresses with the most similar keys (lexicographic similarity, edit distance) until an answer is found or the TTL is reached (e.g.
TTL = 500)
If an answer arrives
– The key, address and data of the answer are inserted into the table
– The least recently used key and data are evicted
Quality of routing should improve over time
– A node is listed under a certain key in routing tables
– Therefore it gets more requests for similar keys
– Therefore it tends to store more entries with similar keys (clustering) when receiving results and caching them
Dynamic replication of data

Freenet Routing
[Figure: peer p has k, peer p' has k'. A search for k returns the response (k,p) and a new link is established; a later search for k', with k' similar to k, follows that link.]

Freenet: Inserting Files
First, the key of the file is calculated
An insert message with this proposed key and a hops-to-live value is sent to the neighbor with the most similar key
Then every peer checks whether the proposed key is already present in its local store
– yes: return the stored file (the original requester must propose a new key)
– no: route to the next peer for further checking (routing uses the same key similarity measure as searching)
– continue until hops-to-live reaches 0 or failure

Freenet: Evolution of Path Length
1000 identical nodes
max 50 data items/node
max 200 references/node
Initial references: (i-1, i-2, i+1, i+2) mod n
Each time-step: a random insert with TTL=20
Every 100 time-steps: 300 requests (TTL=500) from random nodes, measuring the actual path length (failure=500)
[Figure: median path length over time; the axis values 500 and 6 survive from the plot]

Freenet Discussion
Performance
– Search latency: low (small world property)
– Message bandwidth: low (selective routing)
– Storage cost: relatively low (experimentally not validated !)
– Update cost: low (like search), but a bootstrapping phase is required
– Resilience to failures: good (high degree of replication of data and keys)
Qualitative Criteria
– search predicates: with encryption, only equality of keys
– global knowledge: none
– peer autonomy: high (with encryption, risk of storing undesired data)

Comparison
System | Paradigm | Search Type | Search Cost (messages)
Gnutella | Breadth-first search on graph | String comparison | 2 · Σ_{i=0..TTL} C·(C−1)^i
Freenet | Depth-first search on graph | Equality | O(log n) ?
Chord | Implicit binary search trees | Equality | O(log n)
CAN | d-dimensional space | Equality | O(d n^(1/d))
P-Grid | Binary prefix trees | Prefix | O(log n)

Small World Graphs
Each P2P system can be interpreted as a directed graph (overlay network)
– peers correspond to nodes
– routing table entries correspond to directed links
Task
– Find a decentralized algorithm (greedy routing) that routes a message from any node A to any other node B with few hops compared to the size of the graph
– Requires the existence of short paths in the graph

Milgram’s Experiment
Finding short chains of acquaintances linking pairs of people in the USA who did not know each other
– Source person in Nebraska
– Sends a message with the first name and location of the target
– Target person in Massachusetts
Average length of the chains that were completed was between 5 and 6 steps
“Six degrees of separation” principle
BIG QUESTION: Why should there be short chains of acquaintances linking arbitrary pairs of strangers?
Random Graphs
For many years the typical explanation was random graphs
– Low diameter: the expected distance between two nodes is log_k N, where k is the outdegree and N the number of nodes
– When pairs of vertices are selected uniformly at random, they are connected by a short path with high probability
But there are some inaccuracies
– If A and B have a common friend C, it is more likely that they themselves will be friends (clustering)
– Many real-world networks (social networks, biological networks in nature, artificial networks such as the power grid and the WWW) exhibit this clustering property
– Random networks are NOT clustered

Clustering
Clustering measures the fraction of neighbors of a node that are connected themselves
Regular graphs have a high clustering coefficient but also a high diameter
Random graphs have a low clustering coefficient but a low diameter
Neither model matches the properties expected from real networks!
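The clustering measure described above can be made concrete with a small sketch. It assumes an undirected graph stored as a dictionary of neighbor sets; the helper names `ring_lattice` and `clustering_coefficient` are illustrative choices and do not come from the tutorial.

```python
from itertools import combinations


def ring_lattice(n, k):
    """Regular graph: every node links to its k nearest ring neighbors (k even)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def clustering_coefficient(adj):
    """Average over all nodes of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for u, nbrs in adj.items():
        pairs = list(combinations(nbrs, 2))
        if not pairs:
            continue  # a node with fewer than two neighbors contributes 0
        linked = sum(1 for a, b in pairs if b in adj[a])
        total += linked / len(pairs)
    return total / len(adj)


lattice = ring_lattice(20, 4)
c_regular = clustering_coefficient(lattice)
print(c_regular)  # 0.5
```

For k = 4 each node has six neighbor pairs, three of which are linked, so the coefficient is exactly 0.5. The value 3/4 usually quoted for regular ring lattices is the limit of 3(k−2)/(4(k−1)) for large k.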
Regular graph (k=4): long paths, L ~ n/(2k); highly clustered, C ~ 3/4
Random graph (k=4): short path length, L ~ log_k N; almost no clustering, C ~ k/n

Small-World Networks
Random rewiring of a regular graph (by Watts and Strogatz)
– With probability p, rewire each link in a regular graph to a randomly selected node
– The resulting graph has properties of both regular and random graphs: high clustering and short path length
Freenet has been shown to result in small-world graphs

Flashback: Freenet Search Performance
Modifying routing tables in Freenet through caching has a "rewiring effect"
Studies show that Freenet graphs have small-world properties
This explains the improving search performance
– Regular graph: n nodes, k nearest neighbors; path length ~ n/(2k), e.g. 4096/16 = 256
– Rewired graph (1% of nodes): path length ~ random graph, clustering ~ regular graph (small-world graph)
– Random graph: path length ~ log(n)/log(k) ~ 4

Search in Small World Graphs
BUT! Watts-Strogatz can provide a model for the structure of the graph
– existence of short paths
– high clustering
It does not explain how the shortest paths are found
– Gnutella networks are also small-world graphs
– Why can search be efficient in Freenet?
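The rewiring step described for small-world networks can be sketched as follows. This is a simplified illustration, not the exact Watts-Strogatz procedure: it assumes an undirected ring lattice, moves one endpoint of each chosen edge to a uniformly random non-neighbor, and measures the average path length with breadth-first search over reachable pairs. All function names are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Regular graph: every node links to its k nearest ring neighbors (k even)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def rewire(adj, p, rng):
    """Return a copy in which each lattice edge is rewired with probability p."""
    new = {u: set(vs) for u, vs in adj.items()}
    nodes = list(new)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in edges:
        if rng.random() >= p:
            continue  # keep this edge as it is
        candidates = [w for w in nodes if w != u and w not in new[u]]
        if not candidates:
            continue  # u is already linked to every other node
        w = rng.choice(candidates)
        new[u].discard(v)
        new[v].discard(u)
        new[u].add(w)
        new[w].add(u)
    return new


def average_path_length(adj):
    """Mean hop count over all ordered pairs that are reachable via BFS."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(42)  # fixed seed only to make runs repeatable
regular = ring_lattice(200, 4)
small_world = rewire(regular, 0.05, rng)
print(average_path_length(regular), average_path_length(small_world))
```

Comparing the two printed values for a few settings of p is a quick way to see how rewiring a small fraction of the links changes the path length while most of the lattice, and hence most of the clustering, stays in place.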
P2P Overlay Networks as Graphs
Each P2P system can be interpreted as a directed graph …
– peers correspond to nodes
– routing table entries correspond to directed links
… embedded in some space
– P-Grid: interval [0,1]
– Chord: ring [0,1)
– CAN: d-dimensional torus
– Freenet: strings + lexicographical distance

Kleinberg’s Small-World Model
Embed the graph into an r-dimensional grid
– a constant number p of short-range links (neighborhood)
– q long-range links: choose long-range links such that the probability of having a long-range contact at distance d is proportional to 1/d^r
Importance of r!
– Decentralized (greedy) routing performs best iff r = dimension of the space (figure: r = 2)

Influence of “r” (1)
• Each peer u has a link to the peer v with probability proportional to 1/d(u,v)^r, where d(u,v) is the distance between u and v.
• Optimal value: r = dim = dimension of the space
• If r < dim, we tend to choose neighbors that are farther away (the decentralized algorithm can quickly approach the neighborhood of the target, but then slows down until it finally reaches the target itself).
• If r > dim, we tend to choose closer neighbors (the algorithm quickly finds a target in its neighborhood, but reaches it slowly if it is far away).
• When r = 0, long-range contacts are chosen uniformly. Random graph theory proves that short paths exist between every pair of vertices, BUT there is no decentralized algorithm capable of finding these paths.

Influence of “r” (2)
Given a node u, we can partition the remaining peers into sets A1, A2, A3, … , AlogN, where Ai consists of all nodes whose distance from u is between 2^i and 2^(i+1), i=0..log(N-1).
Then, given r = dim, each long-range contact of u is nearly equally likely to belong to any of the sets Ai
When q = log N, each node will on average have a link into each set Ai
[Figure: concentric sets A1, A2, A3, A4 around u]

DHTs and Kleinberg model
[Figure: P-Grid’s model compared with Kleinberg’s model]

Conclusions from Kleinberg's Model
With respect to the Watts and Strogatz model
– There is no decentralized algorithm capable of performing effective search in the class of small-world networks constructed according to Watts and Strogatz
– J. Kleinberg presented an infinite family of small-world networks that generalizes the Watts and Strogatz model and showed that decentralized search algorithms can find short paths with high probability
– There exists exactly one model within that family for which decentralized algorithms are effective
With respect to overlay networks
– Many of the structured P2P overlay networks are similar to Kleinberg’s model (e.g. Chord, randomized version, q=log N, r=1)
– Unstructured overlay networks also fit into the model (e.g. Gnutella q=5, r=0)
– Some variants of structured P2P overlay networks have no neighborhood lattice (e.g. P-Grid, p=0)
– Extensions to spaces beyond regular grids are possible (e.g. arbitrary metric spaces)

Summary
How can we characterize P2P overlay networks such that we can study them using graph-theoretic approaches?
What is the main difference between a random graph and a small-world graph?
What is the main difference between the Watts/Strogatz and the Kleinberg model?
What is the relationship between structured overlay networks and small-world graphs?
What are possible variations of the small-world graph model?
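Kleinberg's construction and greedy routing, as discussed in the preceding slides, can be sketched for the one-dimensional case (a ring, so the favorable exponent is r = 1). This is a minimal illustration under stated assumptions: nodes are the integers 0..n−1, every node keeps its two ring neighbors as short-range links, and `build_kleinberg_ring`, `greedy_route`, and `mean_hops` are hypothetical names introduced here.

```python
import random


def ring_distance(a, b, n):
    """Distance between two positions on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def build_kleinberg_ring(n, q, r, rng):
    """Ring neighbors plus q long-range links per node.

    A long-range contact v of u is drawn with probability
    proportional to 1 / d(u, v)**r.
    """
    links = {}
    for u in range(n):
        others = [v for v in range(n) if v != u]
        weights = [ring_distance(u, v, n) ** -r for v in others]
        long_range = rng.choices(others, weights=weights, k=q)
        links[u] = {(u - 1) % n, (u + 1) % n, *long_range}
    return links


def greedy_route(links, src, dst, n):
    """Forward to the known contact closest to dst; return the hop count."""
    hops, cur = 0, src
    while cur != dst:
        cur = min(links[cur], key=lambda v: ring_distance(v, dst, n))
        hops += 1
    return hops


def mean_hops(n, q, r, trials, rng):
    """Average greedy hop count over random source/target pairs."""
    links = build_kleinberg_ring(n, q, r, rng)
    total = 0
    for _ in range(trials):
        s, t = rng.randrange(n), rng.randrange(n)
        total += greedy_route(links, s, t, n)
    return total / trials


rng = random.Random(7)  # fixed seed only to make runs repeatable
for r in (0, 1, 2):
    print(r, mean_hops(256, 8, r, 500, rng))
```

Greedy routing always terminates here because one of the two ring neighbors is strictly closer to the target at every step; the long-range links only shorten the route. With q = log N = 8 and r = 1 the setup corresponds to the randomized Chord variant mentioned in the conclusions.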
Specific problems: Identity management
Definition: Consistent mapping of a set of attributes onto an identifier in a unique, deterministic, and secure way
Identification is an essential building block in distributed (information) systems
Examples: directory services
– DNS: symbolic host name → IP address
– X.500: distinguished name → object (attributes)
– UDDI: query → web service specification

Identity management issues
Data management
– Uniqueness of identifiers
– Centralized vs. distributed data management: degree of decentralization (orthogonal to the distribution of data!)
– Update consistency
Security
– Access permissions (+ management)
– Requires unique identification
– Resilience against attacks
Infrastructure
– Third party?
– Scalability, robustness, 24/7 availability, etc.

Use case: Dynamic IP addresses
Most computers on the Internet have a dynamic IP address
– Limited number of IP addresses
– Dynamic Host Configuration Protocol (lease time)
– Host mobility (physical mobility)
Problem for any system that builds a distributed management structure on top of such networks
Use case: management of dynamic IP addresses in structured P2P systems (Chord, DKS, Pastry, P-Grid, etc.)

P2P systems and dynamic IP addresses
Structured P2P systems (Chord, Pastry, P-Grid, DKS, etc.)
– These systems construct a distributed index → routing tables
– Dynamic IP addresses → routing tables become inconsistent → the system can break down
Unstructured (Gnutella) and hierarchical (FastTrack) systems
– Less of a problem
– But they pay with high bandwidth consumption, or single points of failure, or high infrastructure costs, etc.

Problems to address
How to find out that an IP address has become invalid?
– No response – Network problem or did the peer get a new address?
– Response – Is it still the same peer? (authenticity, replay, man-in-the-middle attacks)
Frequency of address changes is crucial
– Peers can join and leave at any moment
– IP addresses can change at any moment
Security: DoS attacks are very simple
– Assume peers report back their new IP address
– EvilHacker.org participates in the overlay and thus finds out IP addresses
– EvilHacker.org reports all IP addresses it finds as pointing to random hosts or to itself
A scalable and secure infrastructure is required

Problems of current P2P approaches
Third-party infrastructures are required
Very costly maintenance protocols
Maintenance protocols may compromise structural properties (e.g., load-balancing)
Previous knowledge is lost (e.g., reputation of the peer, QoS, etc.)
No current approach addresses security
– Only the owner should be allowed to update the mapping
– DoS, replay, man-in-the-middle, etc. are not addressed

IPv6 as an alternative
With IPv6, dynamic addresses and NAT are no longer necessary
– The IPv6 address space is ~3.4 * 10^38 (or 10^30 addresses per person on the planet)
– The IPv4 (current) address space is 2^32
IPsec (included in IPv6) solves the authentication problem
DoS attacks are more difficult
Mobility is addressed
– IPv6: home/foreign address
– IPv4: mobility extension, but not supported on a large scale
Problem: IPv6 has not been deployed yet

DNS as an alternative
DNS extensions
– Several RFCs extend the original DNS specification so that DNS can support secure updates
– Problems: very heavy-weight; configuration is very difficult and error-prone; not for the “normal user” (as in a P2P system)
DynDNS (and similar services)
– Hosts maintain a consistent name/address mapping in a special DNS domain via a special client
– Problems: centralized → scalability; the service may go out of business

Basic idea
DNS lookup IP
address: FQDN → IP address (static mapping); routing based on FQDN (any overlay)
P-Grid (index and data) lookup IP address: logical identifier → IP address (DYNAMIC mapping); routing based on logical identifier

Informal discussion
Use unique logical identifiers (UUIDs) instead of physical identifiers (IP addresses)
– Peer identifiers
– Routing based on UUIDs
Use the overlay itself to securely maintain mappings between the logical and the physical identifier
– Self-referential approach
– Rate of changes < self-healing rate → dynamic equilibrium
Advantages
– General identification facility disentangling logical identifiers from network structure
– Tracking of changes (for example, reputation)

Maintenance and routing
Universally unique identifiers (UUIDs) are generated locally
– Cryptographically secure hash function → global uniqueness
– Index / routing tables: UUID
Peers maintain up-to-date UUID-IP mappings in P-Grid
Routing
– Peers cache known UUID-IP mappings
– Mapping exists in cache → the identity of the target peer is checked before forwarding
– No mapping is known → query P-Grid for the mapping

Security issues
Peer generates a public/private key pair
UUID-public key → P-Grid
Why is it secure?
Data in P-Grid is stored at a number of random replicas
– Hard to attack
Requests are
– signed (private key) → authenticity & access permissions
– time-stamped → no replay possible
Quorums are required for each request
Better security than PGP
– Quorums
– Independent, random paths avoid the weakest-link problem
– Revocation and update of security-relevant information is possible

Directory maintenance: self-healing
Repair strategies
– Eager repair: repair each stale entry immediately when it is encountered
– Lazy repair: repair a routing table when all references at one level become stale
[Figure: DNS: look up the IP address, then route based on FQDN (any overlay). P-Grid: route based on the logical identifier, look up the IP address in case of failure, and maintain the logical identifier → IP address mapping eagerly or lazily.]

[Figure legend: ID presently online; ID presently offline; stores mappings of peers; up-to-date cache; stale cache; sample cache entry 1 : 2,12]
query(01*) @ 7
…query(0101) @ 7 (for stale entry 5, cycle -> abort)
…query(1110) @ 7 (for stale entry 14, forward to 12 or 13)
…query(1110) @ 12 (is offline)
…query(1110) @ 13 (for stale entry 2)
……query(0010) @ 13 (forward to 5)
……query(0010) @ 5 (forward to 7)
……query(0010) @ 7 (forward to 9)
……query(0010) @ 9 (new entry for 2 found !)
…query(1110) @ 2 (new entry for 14 found !)
query(01*) @ 14 (finally)
[Figure: example P-Grid trie with leaf paths from 000 to 11; for each peer it shows the ID mappings stored there and the routing table entries per level, e.g. 1 : 12,13; 01 : 5,14; 001 : 9,4]

Eager repair strategy
The system is in a dynamic equilibrium if the rate of changes due to changing mappings and the rate of repairs are equal
Dynamic equilibrium equation
LHS: rate at which repairs of stale routing entries occur
– r_up changes per 1 − r_up queries
– N_rec − 1 additional recursive queries
– A repair makes sense only if the routing entry to be repaired corresponds to an online peer
– A repair is possible only if the recursive query succeeds
RHS: rate of entries turning stale
– r_up changes
– 1 − p_dyn: probability of non-stale references (only these can turn stale)
– r references at each peer for each of log2 n levels

Lazy repair strategy
Repair only if all references of a level are stale
Not all routing entries are treated uniformly
The number of stale entries for each routing level at each peer defines the state of that level
Markovian model
[Figure: state chain 0 ref stale → 1 ref stale → 2 ref stale → … → r ref stale, each transition caused by an ID change; repairs lead back from r ref stale]
Dynamic equilibrium equation
– inflow = outflow (for each state)
– At dynamic equilibrium, the number of routing levels with a given number of stale entries over the whole system should not change
N.B. We distinguish stale entries from offline peers

Lazy repair: Analytical vs. simulation results
Number of messages vs.
rate of change (N=128,256,512,1024, replication factor is 8)
[Figure: lazy repair, number of messages (Msg) vs. r_up for p_on=1; simulated (sim) and analytical (ana) curves for N=128, 256, 512, 1024, with n = N/8]

Effect of p_on on stability and message overhead
In networks with more online peers the lazy strategy is advantageous but collapses earlier
[Figure: lazy vs. eager repair, number of messages (Msg) vs. p_on for r_up=0.2, log2 n = 5; regions marked directory "unstable" and directory "stable"]

Summary
Decentralized, self-maintaining, light-weight, and secure directory service
Robust and applicable in unreliable environments
Contributions
– Logical independence of identity from network properties
– General approach for identification
– Structural properties are maintained
– Existing knowledge is retained
– Dynamic resilience of a P2P system under “churn”

GridVine: Peer Data Management
Searching semantically richer objects in large-scale heterogeneous networks
date?
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<es:DofCreation> 05/08/2004 </es:DofCreation>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
➠ Lack of semantic interoperability

Information Heterogeneity
Syntactic discrepancies
ImageGUID cDate: A0657B25 05.08.04
VS
<es:cDate> 05/08/2004 </es:cDate>
Semantic heterogeneity
Extensible standards (XML, RDF, XMP, PSA, WinFS...)
<rdf:Property rdf:ID="width"> <rdfs:label>Width</rdfs:label> <rdfs:subPropertyOf rdf:resource="#length"/> </rdf:Property>
VS
<rdf:Property rdf:ID="Length-Y"> <rdfs:label>Length-Y</rdfs:label> <rdfs:subPropertyOf rdf:resource="#length"/> </rdf:Property>
100s of evolving schemas for one particular domain (e.g., protein information, picture metadata)
➠ Shared representation is not enough

Integrating Data in Distributed Databases
The Wrapper-Mediator architecture
[Figure: sources with differing Date attributes behind wrappers and a mediator]

Integrating Data in the new Web Ecology
Criterion | Mediated Architectures | Large Scale Information Systems (e.g., WWW)
Scale | Number of sources < 100 | Number of sources > 1000
Uncertainty | Consistent data: coordination, manually curated data; schemas created by administrators | Uncertain data: autonomy, semi-automatic creation of data; schemas created by end users
Dynamicity | Relatively stable set of sources: stable mediator; sources known a priori | Network churn: node failures; unknown sources
Expressiveness | Relational data, structured schemas: integrity constraints; structured queries | Semi-structured data, schemata: few integrity constraints; simple S-P queries

Peer Data Management Systems
date?
<es:cDate> 05/08/2004 </es:cDate>
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
Pairwise mappings, e.g. es:cDate to xap:CreateDate
Peer Data Management Systems (PDMS)
– Local mappings overcome global heterogeneity
– Iterative query reformulation

3-Tier Network
Semantic Mediation Layer
Overlay Layer
Internet Layer
Jupp / P-Grid DHTs

Data-Centric P2P Systems
Piazza / Hyperion
– More expressive mapping languages – LAV-style query reformulation in P2P settings?
– Network-intensive – Large-scale deployment?
– Perfect mappings
PIER
– Scales a relational engine on top of a DHT
– Fixed schema
RDFPeers
– Indexes RDF triples in a DHT
– No schemas

Credits
Karl Aberer, Philippe Cudre-Mauroux, Anwitaman Datta, Roman Schmidt

Are you still awake?

Introduction to Peer-to-Peer Networks – Part 2
Manfred Hauswirth, Marcel Karnstedt
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Outline – Part 2
The Vision: A Universal Storage for Web Data
A Distributed Universal Storage
– Data & Query Model
– Operators
– Query Engine
– Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook

Examples
Huge heterogeneous data sets, collaboration, dynamic, structured, deep, linked, …
Clique project: integrated structured storage
– Data pre-processing, integration, restructuring, indexing
– Data access API and query processor for complex queries
(Marcel Karnstedt, IFIP Database Meeting, Nicosia, Cyprus, 2009)

Public Data Management
... Semantic & social Web, encyclopedias, recommender systems ...
“The world is a database”
Datasets which are
– Maintained by large communities in a distributed way
– Of public interest
“Homogenized” database, extensible and flexible, distributed, scalable, structured data and queries

Main Challenges
Data management
– Scalability and robustness
– Security, trust, fairness
– Guarantees, consistency, integrity
– CAP theorem [Gilbert et al. 2002], ACID vs. BASE
Query expressiveness
– DB-like queries with advanced functionality
– Support of IR queries and similarity is mandatory
– Schema-unaware queries and/or queries on the schema
Efficient processing
– Efficient query operators
– Cost awareness in changing situations

Approaches
[Figure: design space of existing systems with their open questions]
– Centralized stores (Sindice, YARS, Jena, Oracle, SW-Store): Who pays the load? Who owns the data?
– Mediator: views over 100.000 data sources? Do we trust them?
– P2P systems (PIER, PeerDB, AmbientDB, RDFPeers, UniStore): Efficient query processing?

Influences
P2P paradigm: robustness, self-organization, scalability
DHTs & SDDS, index structures: efficient lookups
Sensor networks, data streams
PDMS, distributed DBS: transparency, query processing

The Big Picture
DB: Who wrote an article for cool movies?
Wikipedia(article,author): “Pulp Fiction”,”MK”
Del.icio.us(bookmark,tag,creator): “http://…pfiction”,”cool”,”MKa”
DBPedia(link,wikilink,category): “Pulp Fictoon”,”Q.
Tarantino”,”movie”

Layers of Processing
Processing strategies: scheduling, adaptation, costs
Query operators: similarity / approximate operators
Routing: multicast, aggregation, range

Universal Relation Model
Since the eighties: a model for simplified retrieval in (relational) databases
Universal relation containing all attributes: A1, A2, A3, B1, B2, C1, ...
Simplifies navigation over multiple relations during query formulation
Related: SW-Store

Triple Store
Universal relation model
Storing each tuple as a set of triples (oid, attribute, value)
Similar to RDF: subject, predicate, object
Tuple:
OID | Car | Mileage | HP | Price
232 | Volvo V70 | 34.000 | 180 | 28.000
Triples:
232 | Car | Volvo V70
232 | Mileage | 34.000
232 | HP | 180
232 | Price | 28.000
Extensible, flexible, self-descriptive; no need for representing null values
Related: SW-Store, RDFPeers, Sindice, YARS, ...

Indexing
Indexing of attributes = key for hashing
Which attributes? All!
For a tuple (oid, v1, v2, ...) of R(OID, A1, A2, ...), e.g. the triples (232, Car, Volvo V70), (232, Mileage, 34.000), (232, HP, 180), (232, Price, 28.000), the keys are
– h(oid): for object lookup
– h(A1 || v1), h(A2 || v2), ...: for Ai ≥ v (prefix search)
– ...
Trade-off: storage vs. performance
Related: YARS, Hexastore

P-Grid: Range Queries
[Datta et al.
2005]

VQL
Query language VQL
– Inspired by the RDF query language SPARQL
– Conjunctive queries
– Enhanced by advanced operators
– Operations both on instance and schema level
Basic query form
SELECT ?oid, ?val WHERE { ?oid price ?val } ORDER BY ... LIMIT ...

Similarity Queries
WHERE { ?o attrib ?value FILTER (edist(?value, v) < 2) }
Numerical similarity: value distance
String similarity: edit distance using (positional) q-grams [Gravano et al. 2001, Schallehn et al. 2004]
– Requires additional key-value pairs in P-Grid
– For each triple (oid, A, v): h(q-gram_i(A)) → oid and h(q-gram_i(v)) → oid
– Approach for instance and schema level
Related: LSH forest, SWAM

String Similarity
[Figure: P-Grid trie from 00…0 to 11…1; "all values in distance d=1…": query range for the attribute vs. querying d+1 q-grams] [NetDB06]

More Operators
Similarity joins [NetDB06]
WHERE { ?o1 attr1 ?v1 . ?o2 attr2 ?v2 FILTER (edist(?v1, ?v2) < k) }
Ranking queries: top-k, skyline [DBRank07]
WHERE { ?o attr ?v } ORDER BY ?v LIMIT k
WHERE { ?o attr ?v } ORDER BY ?v NN “A String” LIMIT k
WHERE { ?o attr1 ?x .
?o attr2 ?y} SKYLINE OF ?x MIN, ?y MAX

String Similarity Joins
[Figure: join strategies compared: doubled parallel, doubled sequential, parallel and sequential, sequential and parallel] (related: cloud services)

Skyline Queries
Objects that are not “dominated” by other objects
Scoring function on multiple attributes, no weighting
[Figure: skyline over price and mileage with dominated objects]

Skylines: Basic Idea
“Frame Skyline” algorithm over 2 dimensions
The minimum of the first dimension defines the maximum for the second dimension
Minima/maxima provide a frame narrowing the search space
Related: DSL, Skyframe

Skylines: Processing
[Figure: x/y plane with Min x and Min y marking the search frame]
1. Find the minimum in one selective dimension x
2. Use the y value of min(x) to limit the search range
3. Use range query routing to build local skylines
4. Always ship the current skyline with the query
5. Determine the global skyline at one peer
6. Optionally: distributed range querying “on the way” to min(x)

Skylines: More Dimensions
All projections to 2-d sub-spaces are skyline candidates
Objects of the searched frame can dominate projections
Projections cannot dominate objects of the searched frame

Query Execution
Goal: stateless processing
“Push” approach
Messages contain both the plan and intermediate results (based on Mutant Query Plans [Papadimos et al.
2002])
The receiver peer is identified by applying the hash function
Multiple instances of the plan travel through the network
[Figure: instances of a plan joining A and B travel from p0 through peers p1 to p5 and back to p0, carrying intermediate results such as {(A,2)}, {(A,2,B,2,C,1), (A,2,B,2,C,4)} and {(A,5),(B,5)}]
Related: PIER, PeerDB, DARQ

Cost-Based Planning
[NetDB06, P2P06, DBRank07]
Find all values of attribute A within max. distance d=1
– Query all peers for A (region h(A)) in parallel or in sequence: #msgs = ml + |A| − 1
– Query d+1 q-grams in parallel (regions h(A#q-gram1), h(A#q-gram2)): #msgs = (d+1)*ml
Related: ObjectGlobe, DARQ

Guarantees: Completeness
Problem: the “fire and forget” query strategy doesn’t guarantee complete results
Goal: allow estimating result completeness
– For the user (“98% of all possible answers”)
– For blocking query operators (aggregators, ranking-based operators), in order to guarantee a certain level of completeness
Idea: not feasible on the data level, but perhaps on the peer level?!

Completeness Estimation
Estimation on peer level
– Based on routing graphs and routing methods
– Support of probabilistic guarantees
– Accuracy improved by milestone messages (MiMes) [CIKM08, WIDM08]
Related: Seaweed
Figure: query graph Join(A=B) over Extract(A)sequ and Extract(B)range; routing graph over peers P1B P2B P3B ... PmB and P1A P2A P3A ...
PnA, starting from P0 (routing graph legend: PxY = routing point; rows = routing levels)

CERQ: Initial Estimation
[Figure: P-Grid trie with paths 0000, 0001, 001, 01, 1 and peers P0 to P4]
Example: P-Grid range query 00100-1101 at P0
Predict the trie based on information from:
1) the local path: 0001
2) the local routing table (at least one node per level/sub-tree)
=> estimates 8 (out of 10)
... the more information is kept in each routing table, the better [P2P07]

CERQ: Estimation Refinement
The initial estimate might not be correct
Refinement by other (intermediate) peers
Piggy-backed information
– The query contains an estimate of the peers in the sub-tree
– Query replies can contain corrections
Example: sub-query P0 -> P3 for sub-tree 001*, estimate: 3 peers; P3’s routing table contains peer(s) of sub-tree 0010*

CERQ: Further Improvements
Use of structural replication
– Peers have more than one entry per level
– Each entry might have a different path
– Every path allows peers to learn more about sub-trees
– More entries mean better initial estimates
Use of caching
– P2P networks are dynamic
– The structure, though, is likely to be stable
– The learned structure can be cached for later estimates

CERQ: Other Overlays
SkipGraphs
– Most similar to P-Grid
– Routing information of multiple levels
– Unknown number of peers in the bucket layer
Prefix hash tree (Chord)
– Peers build a tree hierarchy
– Only applicable if the number of children is known
CAN and Mercury
– Forwarding along neighbors
– No estimation can be given
=> The idea can be mapped under certain conditions

Representing Mappings
Simple kind of attribute correspondences
[Figure: attributes A1 to A6 linked by "subsumes" and "equiv" correspondences]
Triple representation
– (A4, equiv, A5)
– (A3, subsumes, A6)
Extensible to ontologies and views [Ideas08]
Related systems: PeerDB, SQPeer

Query Expansion
[Figure: unexpanded query; first mapping, map operators added; expanded query]
Related systems: GridVine, PDMS

UniStore [ICDE07]

Evaluation: Similarity Joins
c1: seq & par/seq
c2: par & par/seq
c3: seq & par/par
c4: par & par/par
c5: par & local

Evaluation: CERQ
[Figure panels: 1 reference, 3 references, 5 references]
Estimation is correct after a few corrections
More references lead to (as expected)
– Better initial estimate
– Fewer corrections
– Smaller errors
Better estimates for smaller ranges (q1 and q2)
Replication factor 1 (similar results for factor 2)

Evaluation: Completeness
[Figure panels: without MiMes (min, max, avg; 74 peers); with MiMes (min, max, avg; 50 peers)]

Summary & Outlook
Web data is huge, heterogeneous, structured, linked
Modern applications require a universal and flexible storage
DB-like and RDF-liking DHTs well-suited for large-scale data management
UniStore as one solution
– Robust and scalable, universal and light-weight
– Sophisticated query capabilities
– Adaptive, cost-based, stateless and parallel QP
– Guarantees, semantic layer
– …and all on totally decentralised and self-organising P2P!!
Open issues
– Privacy & Trust, reputation
– Guarantees, consistency, integrity

Acknowledgements
Kai-Uwe Sattler, Manfred Hauswirth, Katja Hose, Roman Schmidt, Renault John, Brahmananda Sapkota, Conor Hayes, ...
Students: Martin Richtarsky, Michael Haß, Jessica Müller, Mario Wiegandt, Stefan Schwalm, Matthias Marx, Thomas Kreyling, ...
Supported by the Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2) and under Grant No. 08/SRC/I1407 (Clique: Graph & Network Analysis Cluster)

Related Systems
Sindice: Sindice. The semantic web index. http://sindice.com/
YARS: A. Harth, J. Umbrich, A. Hogan, S. Decker, Yars2: A federated repository for querying graph structured data from the web, in: Proc. of ISWC/ASWC, 2007.
Jena: Wilkinson, K., Sayers, C., Kuno, H., Reynolds, D.: Efficient RDF storage and retrieval in Jena2. In: SWDB, pp. 131–150 (2003)
Oracle: Chong, E.I., Das, S., Eadon, G., Srinivasan, J.: An efficient SQL-based RDF querying scheme. In: VLDB, pp. 1216–1227 (2005)
SW-Store: D. J. Abadi, A. Marcus, S. R. Madden, K. Hollenbach. SW-Store: a vertically partitioned DBMS for Semantic Web data management. The VLDB Journal (2009) 18:385–406
PIER: R. Huebsch, J. M. Hellerstein, N. Lanham, B. Thau Loo, S. Shenker, and I. Stoica. Querying the Internet with PIER. In VLDB’03, pages 321–332, 2003.
RDFPeers: M. Cai and M. Frank. RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In WWW’04, pages 650–657, 2004.
PeerDB: W. S. Ng, B. Ch. Ooi, and K.-L. Tan. PeerDB: A P2P-based System for Distributed Data Sharing. In ICDE ’03, pages 633–644, 2003.
AmbientDB: P. Boncz and C. Treijtel. AmbientDB: Relational Query Processing in a P2P Network. In Workshop On Databases, Information Systems and P2P Computing, (DBISP2P’03), pages 153–168, 2003.
Hexastore: Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. VLDB, 2008
SPARQL: E. Prud’hommeaux and A. Seaborne. SPARQL Query Language for RDF, 2006. W3C Candidate Recommendation.

Related Systems /2
LSH forest: M. Bawa, T. Condie, and P. Ganesan. LSH forest: self-tuning indexes for similarity search. In WWW’05, pages 651–660, 2005.
SWAM: F. Banaei-Kashani and C. Shahabi. SWAM: a family of access methods for similarity-search in peer-to-peer data networks. In CIKM’04, pages 304–313, 2004.
Cloud services: M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. In SIGMOD ’08, pages 251–264, 2008.
DSL: P. Wu, C. Zhan, Y. Feng, B. Zhao, D. Agrawal, and A. El Abbadi. Parallelizing skyline queries for scalable distribution. In EDBT’06, pages 112–130, 2006.
Skyframe: S. Wang, Q. H. Vu, B. Ch. Ooi, A. K. H. Tung, and L. Xu. Skyframe: a framework for skyline query processing in peer-to-peer systems. The VLDB Journal, 18(1):345–362, 2009.
DARQ: B. Quilitz and U. Leser. Querying Distributed RDF Data Sources with SPARQL. In ESWC’08, pages 524–538, 2008.
ObjectGlobe: R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam, and K. Stocker. ObjectGlobe: Ubiquitous query processing on the Internet. VLDB Journal, 10(1):48–71, 2001.
Seaweed: D. Narayanan, A. Donnelly, R. Mortier, and A. Rowstron. Delay aware querying with seaweed. In VLDB’06, pages 727–738, 2006.
SQPeer: G. Kokkinidis, E. Sidirourgos, and V. Christophides. Query Processing in RDF/S-Based P2P Database Systems. Semantic Web and Peer-to-Peer, chapter 4, pages 59–81. Springer, 2006.
GridVine: K. Aberer, P. Cudré-Mauroux, M. Hauswirth, and T. Van Pelt. GridVine: Building Internet-Scale Semantic Overlay Networks. In ISWC’04, pages 107–121, 2004.
PDMS: A. Y. Halevy, Z. G. Ives, P. Mork, and I. Tatarinov. Piazza: data management infrastructure for semantic web applications. In WWW’03, pages 556–567, 2003.

Thank you!
[CIKM08]: M. Karnstedt, K. Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt: Estimating the Number of Answers with Guarantees for Structured Queries in P2P Databases, CIKM 2008, Napa, USA.
[WIDM08]: M. Karnstedt, K. Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt: Approximating Query Completeness by Predicting the Number of Answers in DHT-based Web Applications, WIDM'08 in conjunction with CIKM'08, Napa, USA, 2008.
[Ideas08]: M. Karnstedt, K. Sattler, M. Hauswirth, B. Sapkota, R. Schmidt: Ad-hoc Integration and Querying of Semantic Web Data, Ideas 2008, Coimbra, Portugal.
[DBRank07]: M. Karnstedt, J. Müller, K. Sattler: Cost-Aware Skyline Queries in Structured Overlays, DBRank'07 @ ICDE 2007, Istanbul, Turkey.
[ICDE07]: M. Karnstedt, K. Sattler, M. Richtarsky, J. Müller, M. Hauswirth, R. Schmidt, R. John: UniStore: Querying a DHT-based Universal Storage, ICDE 2007 Demonstration.
[P2P07]: M. Karnstedt, K. Sattler, R. Schmidt: Completeness Estimation of Range Queries in Structured Overlays, P2P 2007, Galway, Ireland.
[P2P06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Cost-Aware Processing of Similarity Queries in Structured Overlays, P2P 2006, Cambridge, UK.
[NetDB06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Similarity Queries on Structured Data in Structured Overlays, NetDB'06 @ ICDE 2006, Atlanta, GA.
[Gilbert et al. 2002]: S. Gilbert and N. Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59, 2002.
[Datta et al. 2005]: A. Datta, M. Hauswirth, R. Schmidt, R. John, and K. Aberer. Range queries in trie-structured overlays. In P2P’05, pages 57–66, 2005.
[Papadimos et al. 2002]: V. Papadimos, D. Maier. Mutant Query Plans. Information and Software Technology, 44(4):197–206, April 2002.
[Schallehn et al. 2004]: E. Schallehn, I. Geist, K. Sattler: Supporting Similarity Operations based on Approximate String Matching on the Web, CoopIS 2004, Larnaca.
[Gravano et al. 2001]: L. Gravano et al.: Approximate String Joins in a Database (Almost) for Free, VLDB 2001, Roma.