Harvard University CSCI E-2a
Life, Liberty, and Happiness After the Digital Explosion
4: Search

• google.com
• google.cn
• baidu.cn

ARPAnet, 1971
[Figure: image not recoverable]

Clients and Servers
[Diagram: a web server (www.google.com) and two e-mail servers (Mail.yahoo.com, smtp.fas.harvard.edu) joined through THE INTERNET to client computers; labels: download, upload]

IP = Internet Protocol
• Store and Forward Switch = Router
• A router in the network core receives incoming packets and stores them in a “buffer” (temporary storage)
• Routes packets on outgoing links
• May throw packets away if the buffer is full
[Diagram label: Routing Table]

“End to End”: Intelligence at Edge of Network
Routers are relatively dumb and rely on intelligence at the edge to compensate
Client application (email, web browser, iTunes):
• Packetize
• Add serial #s
• Add fingerprint
• Add destination address
• Insert into network
In between: BEST EFFORT
Server application:
• Reassemble packets
• (Maybe) report missing packets
• (Maybe) report damaged packets
• Deliver to application

Packets
• Packet size (1.5 KB max) is a compromise
• Small enough that packets can be “handled” quickly and with relatively low odds of being damaged
• Large enough that the packaging does not outweigh the contents or “payload”

IP Addresses
• IPv4: 32 bits written as 4 decimal numerals less than 256, e.g. 141.211.125.22 (UMich)
• About 4 billion addresses: not enough
• IPv6: 128 bits written as 8 blocks of 4 hex digits each, e.g. AF43:23BC:CAA1:0045:A5B2:90AC:FFEE:8080
• At the edge, translate the host names in URLs --> IP addresses, e.g. umich.edu --> 141.211.125.22
• Authoritative sites for address translation = Domain Name System (DNS) servers
• In the network core, IP addresses are used to route packets using routing tables

But who controls the names and numbers?
• ICANN = Internet Corporation for Assigned Names and Numbers
• A US nonprofit … but it’s a long story.

The Internet is IP
• Routers do not know what the bits in the packets represent
• They do not know if they are email, streaming video, or HTML web pages
• They do not know if they are encrypted or unencrypted
• You can invent your own new service adhering to IP standards
• You gain the Internet’s best-effort service, along with the possibility of undelivered packets

Striping
• Smallish packets also make better use of the network, since later packets can leave before earlier packets arrive
[Animation over several slides: packets 1, 2, 3, 4 cross the network]

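
The striping claim can be made concrete with a little arithmetic. The sketch below is only an illustration under assumptions that are not on the slides: every link has the same speed, there is no propagation or queueing delay, headers cost nothing, and all packets are the same size. The function names and the example numbers (a 12,000-byte message, 4 links, 1 Mbit/s) are hypothetical; only the 1.5 KB packet size comes from the deck.

```python
def whole_message_time(message_bits, hops, link_bps):
    """Seconds to deliver a message when every hop must receive all of it
    before passing it on (store and forward of the entire message)."""
    return hops * message_bits / link_bps


def striped_time(message_bits, packet_bits, hops, link_bps):
    """Seconds to deliver the same message cut into equal-size packets.

    Packets are pipelined: while packet 1 crosses the second link,
    packet 2 can already cross the first one.
    """
    packets = -(-message_bits // packet_bits)   # ceiling division
    per_packet = packet_bits / link_bps         # time for one packet on one link
    # The first packet needs `hops` slots; each later packet lands one slot after.
    return (hops + packets - 1) * per_packet


if __name__ == "__main__":
    message_bits = 12_000 * 8   # hypothetical 12,000-byte message
    packet_bits = 1_500 * 8     # 1.5 KB packets, as on the Packets slide
    hops = 4                    # hypothetical: 3 routers, so 4 links
    link_bps = 1_000_000        # hypothetical 1 Mbit/s links

    print(f"whole message: {whole_message_time(message_bits, hops, link_bps):.3f} s")
    print(f"striped:       {striped_time(message_bits, packet_bits, hops, link_bps):.3f} s")
```

With these made-up numbers the unsplit message pays the full transmission time on each of the 4 links, while the striped version pays it roughly once plus a small pipeline fill, which is the effect the animation is showing.
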
Striping Utilizes the Network
• Store-and-forward delays would add up if the entire message had to be buffered at every router
[Animation over several slides: packets 1, 2, 3, 4 in transit]

TCP = Transmission Control Protocol
• Creates a logical connection between two machines on the edge of the network
• Connected machines seem to have a circuit connecting them even though they do not tie up the network
• Provides reliable, perfect transport of messages, even though IP may drop packets
• Regulates the rate at which packets are inserted into the network

TCP, Basic Idea
• “3-Way Handshaking”
[Animation over several slides: markers “+” and “*” travel between hosts 1 and 2]

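
The reliability goal on the TCP slide (perfect transport even though IP may drop packets) can be illustrated with a toy sender that numbers its packets, waits for an acknowledgement, and resends when none arrives. This is a deliberately simplified sketch, not real TCP: it keeps only one packet in flight, assumes ACKs are never lost, and models a timeout as simply noticing the loss. The function name and the `drop_attempts` parameter are invented for the example.

```python
def send_reliably(message, packet_size, drop_attempts):
    """Deliver `message` over a pretend lossy network.

    `drop_attempts` is a set of transmission attempt numbers (0, 1, 2, ...)
    that the network throws away, standing in for routers with full buffers.
    Returns (reassembled_message, total_transmissions).
    """
    # Packetize and add serial numbers.
    packets = [
        (seq, message[start:start + packet_size])
        for seq, start in enumerate(range(0, len(message), packet_size))
    ]

    received = {}   # receiver side: serial number -> payload
    attempt = 0     # counts every transmission, including repeats

    for seq, payload in packets:
        acked = False
        while not acked:
            lost = attempt in drop_attempts
            attempt += 1
            if lost:
                continue              # no ACK comes back: time out and resend
            received[seq] = payload   # receiver stores the packet ...
            acked = True              # ... and its ACK reaches the sender

    # Reassemble in serial-number order before delivering to the application.
    reassembled = "".join(received[seq] for seq in sorted(received))
    return reassembled, attempt


if __name__ == "__main__":
    text, sends = send_reliably("HELLO, WORLD", 4, drop_attempts={1, 2})
    print(text, sends)
```

In the usage line the second packet is dropped twice, so three packets cost five transmissions, yet the application still sees the original text.
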
TCP, Basic Idea
• “Virtual Circuit” now established between the two hosts, though the routers in between are not aware of it and the same path need not be followed by all packets
[Animation over several slides: packets 1 and 2 cross between the two hosts, and ACK1 and ACK2 come back]

Dropped Packets and Retransmission
[Animation over several slides: packets 1 and 2 are sent; packet 1 disappears in transit; TIMEOUT]

The World Wide Web
• One of the facilities or services provided by certain of the computers on the Internet
• A logical network of web pages that need not be on physically connected computers

[Diagram: linked web pages]
• http://www.harvard.edu
• http://www.president.harvard.edu/
• http://www.ksg.harvard.edu/
• http://www.news.harvard.edu/gazette/…
• http://www.brighamandwomens.org/PressReleases/…

URL = Uniform Resource Locator
[Diagram: your computer sends the request “www.google.com” across the Internet to Google’s computer and receives HTML code back]

Searching the Web
• Finding pages referring to the search terms
• Deciding which pages are the most “relevant”

Finding Relevant Pages
1. Build an index ahead of time
   Eddington   URL, URL, …
   Edison      URL, URL, …
   Edmonton    URL, URL, …
2. When queried, look up in the index

Building the Index
• Google “crawls” the entire Web, following links and loading the pages they point to
• Every time it retrieves a page, it indexes everything on the page
• It may keep a “cached” copy of the page
• A complete crawl probably takes a week or two
• Opt-out
• Caching and copyrights?

Search = lookup + ranking
• Look a term up in a huge index to retrieve a set of URLs
• Or several terms …
• Rank the results in order of “usefulness” or “desirability”
• Or political correctness! Try “falun gong” on google.com and google.cn
• Or profitability??

Basic Structure of the Index
• The Lexicon (Primary Memory): Eddington, Edison, Edmonton
• The Lists of Pages (Secondary Memory):
   Eddington   URL, URL, …
   Edison      URL, URL, …
   Edmonton    URL, URL, …

Page Ranking
• Hugely important commercially
• Page rank is really a new kind of capital
• People try to “spoof” ranking algorithms
• Search engineers try to detect and discount spoofing
• Endless game of cat and mouse …

“A page is important if a lot of pages point to it”
• Probably wrong.
• Also easy to spoof

“A page is important if a lot of important pages point to it”
• Circular? Not really.
• Can calculate a consistent meaning of “importance” where every page’s importance is the sum of the importance of the pages pointing to it
• Like scholarly citations of scholarly papers

Did we mention that searches are logged?
• Google Analytics: for marketing
• To help tune the search engine
• But many searches are personally identifiable!

The AOL search data release

What should happen?
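
Returning to the “Search = lookup + ranking” slides, the two steps can be sketched in a few lines. This is a toy illustration, not a description of how Google works. The page texts, the `.example` URLs, and the function names are all hypothetical. The importance calculation also makes two assumptions that are not on the slides: each page splits its importance evenly across its outgoing links, and a damping factor of 0.85 is mixed in, as in the commonly published form of PageRank.

```python
from collections import defaultdict


def build_index(pages):
    """Build the index ahead of time: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def importance(links, damping=0.85, rounds=50):
    """Score pages so that a page is important if important pages point to it.

    `links` maps every URL to the list of URLs it points to; every target
    must also appear as a key. Scores are refined by repeated passes.
    """
    urls = list(links)
    score = {u: 1.0 / len(urls) for u in urls}
    for _ in range(rounds):
        new = {u: (1.0 - damping) / len(urls) for u in urls}
        for src, targets in links.items():
            # A page with no outgoing links spreads its score over all pages.
            receivers = targets if targets else urls
            share = damping * score[src] / len(receivers)
            for target in receivers:
                new[target] += share
        score = new
    return score


def search(term, index, score):
    """Lookup, then ranking: URLs containing `term`, most important first."""
    hits = index.get(term.lower(), ())
    return sorted(hits, key=lambda u: score[u], reverse=True)


# Hypothetical three-page web.
PAGES = {
    "http://a.example/": "Edison light bulb",
    "http://b.example/": "Edison phonograph",
    "http://c.example/": "Eddington eclipse",
}
LINKS = {
    "http://a.example/": ["http://b.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": ["http://b.example/"],
}

if __name__ == "__main__":
    idx = build_index(PAGES)
    scores = importance(LINKS)
    print(search("Edison", idx, scores))
```

Both Edison pages match the query, but the page that two others point to is listed first; the page nobody points to comes last. The circular definition is handled by starting from equal scores and repeating the calculation until the numbers settle.
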