Harvard University
CSCI E-2a
Life, Liberty, and
Happiness
After the Digital Explosion
4: Search
• google.com
• google.cn
• baidu.cn
ARPAnet, 1971
Clients and Servers
Web Server: www.google.com
e-mail Server: Mail.yahoo.com
e-mail Server: smtp.fas.harvard.edu
download
THE INTERNET
upload
Client Computers
IP = Internet Protocol
Store and Forward Switch =
Router
Router in network core receives
incoming packets and stores them in
“buffer” (temporary storage)
Routes packets on outgoing links
May throw packets away if buffer is full
Routing Table
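The router behavior described on this slide can be sketched in a few lines of Python. This is a toy model, not how any real router is built: the `Router` class, the buffer size of 2, and the link name `link-A` are all made up for illustration.

```python
from collections import deque


class Router:
    """Toy store-and-forward router with a finite packet buffer."""

    def __init__(self, buffer_size, routing_table):
        self.buffer = deque()               # temporary storage for packets
        self.buffer_size = buffer_size
        self.routing_table = routing_table  # destination address -> outgoing link
        self.dropped = 0

    def receive(self, packet):
        """Store an incoming packet, or throw it away if the buffer is full."""
        if len(self.buffer) >= self.buffer_size:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def forward_one(self):
        """Take the oldest buffered packet and choose its outgoing link."""
        if not self.buffer:
            return None
        packet = self.buffer.popleft()
        link = self.routing_table.get(packet["dst"], "default")
        return link, packet


# Hypothetical example: three packets arrive at a router that can hold two.
router = Router(buffer_size=2, routing_table={"141.211.125.22": "link-A"})
results = [router.receive({"dst": "141.211.125.22", "payload": n}) for n in range(3)]
first = router.forward_one()
```

The third packet is discarded without any notice to the sender, which is why the edge of the network has to detect the loss.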
“End to End”:
Intelligence at Edge of
Network
Routers are relatively dumb and
rely on intelligence at the edge to
compensate
Client application: email, web browser, iTunes
• Packetize
• Add serial #s
• Add fingerprint
• Add destination address
• Insert into network
BEST EFFORT
Server application
• Reassemble packets
• (Maybe) report missing packets
• (Maybe) report damaged packets
• Deliver to application
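The edge-of-network steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the "fingerprint" is taken to be a CRC-32 checksum, the 1500-byte limit is applied to the payload alone for simplicity, and the dictionary layout and function names are invented for the example.

```python
import zlib

MAX_PAYLOAD = 1500  # bytes; the slides give 1.5 KB as the maximum packet size


def packetize(message: bytes, dst: str, size: int = MAX_PAYLOAD):
    """Split a message into packets with serial number, fingerprint, destination."""
    packets = []
    for serial, start in enumerate(range(0, len(message), size)):
        payload = message[start:start + size]
        packets.append({
            "dst": dst,
            "serial": serial,
            "fingerprint": zlib.crc32(payload),  # assumed checksum choice
            "payload": payload,
        })
    return packets


def reassemble(packets, expected_count):
    """Return (message or None, missing serials, damaged serials)."""
    good = {}
    damaged = []
    for p in packets:
        if zlib.crc32(p["payload"]) == p["fingerprint"]:
            good[p["serial"]] = p["payload"]
        else:
            damaged.append(p["serial"])
    missing = [s for s in range(expected_count)
               if s not in good and s not in damaged]
    if missing or damaged:
        return None, missing, damaged
    # Serial numbers let the receiver restore the order, whatever order arrived.
    return b"".join(good[s] for s in range(expected_count)), missing, damaged


message = b"x" * 4000
packets = packetize(message, dst="141.211.125.22")
in_order_again, _, _ = reassemble(list(reversed(packets)), len(packets))
_, missing, _ = reassemble([packets[0], packets[2]], len(packets))
```

Here the packets arrive in reverse order and are still reassembled correctly, while leaving one out is reported as a missing serial number.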
Packets
Packet size (1.5 KB max) a compromise
Small enough that they can be
“handled” quickly and with relatively
low odds of being damaged
Large enough that packaging does not
outweigh the contents or “payload”
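A rough calculation shows the "packaging" side of this compromise. The 40-byte header figure is an assumption not taken from the slides (a 20-byte IPv4 header plus a 20-byte TCP header, both without options); the function name is illustrative.

```python
HEADER_BYTES = 40  # assumption: 20-byte IPv4 header + 20-byte TCP header, no options


def overhead_fraction(payload_bytes: int, header_bytes: int = HEADER_BYTES) -> float:
    """Fraction of each packet that is packaging rather than payload."""
    return header_bytes / (header_bytes + payload_bytes)


for payload in (10, 100, 1460):
    print(payload, round(overhead_fraction(payload), 3))
```

Under this assumption, a 10-byte payload is mostly packaging, while a 1460-byte payload (a 1500-byte packet) spends under 3% of its bytes on headers.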
IP Addresses
• IPv4: 32 bits written as 4 decimal numerals less than
256, e.g. 141.211.125.22 (UMich)
• About 4 billion addresses: not enough
• IPv6: 128 bits written as 8 blocks of 4 hex digits each,
e.g. AF43:23BC:CAA1:0045:A5B2:90AC:FFEE:8080
• At edge, translate the host names in URLs --> IP addresses, e.g.
umich.edu --> 141.211.125.22
• Authoritative sites for address translation = “Domain
Name Server” (DNS)
• In the network core, IP addresses are used to route
packets using routing tables
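The address formats above can be checked with Python's standard `ipaddress` module. The `NAME_TABLE` dictionary is a hypothetical stand-in for DNS that uses only the mapping given on the slide; a real lookup would query a name server.

```python
import ipaddress


def ipv4_to_int(dotted: str) -> int:
    """Pack four decimal numerals (each less than 256) into one 32-bit number."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


umich = "141.211.125.22"
as_int = ipv4_to_int(umich)
same = int(ipaddress.IPv4Address(umich))  # the library agrees with the hand packing

ipv4_space = 2 ** 32  # about 4 billion possible IPv4 addresses

ipv6 = ipaddress.IPv6Address("AF43:23BC:CAA1:0045:A5B2:90AC:FFEE:8080")
ipv6_bits = ipv6.max_prefixlen  # 128

# Hypothetical local table standing in for a DNS lookup.
NAME_TABLE = {"umich.edu": "141.211.125.22"}


def resolve(name: str) -> str:
    """Translate a host name to an IP address using the toy table."""
    return NAME_TABLE[name]
```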
But who controls the
names and numbers?
ICANN = Internet Corporation
for Assigned Names and
Numbers
A US nonprofit … but it’s a long
story.
The Internet is IP
• Routers do not know what the bits in the
packets represent
• Do not know if they are email, streaming
video, html web pages
• Do not know if they are encrypted or
unencrypted
• You can invent your own new service
adhering to IP standards
• Gain Internet’s best-effort service
• and possibility of undelivered packets
Striping
Smallish packets also make better
use of the network since later
packets can leave before earlier
packets arrive
[Animation: packets 1, 2, 3, 4 in transit across the network]
Striping Utilizes the
Network
Store and Forward delays would
add up if entire message had to be
buffered at every router
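A back-of-the-envelope model makes the saving concrete. The sketch below assumes every link has the same rate, the packets are equal in size, and propagation time and header overhead are ignored; the function names and the numbers in the example are illustrative only.

```python
def whole_message_delay(message_bytes, links, rate):
    """Delay if each router must buffer the entire message before forwarding."""
    return links * message_bytes / rate


def pipelined_delay(message_bytes, links, rate, packets):
    """Delay if the message is split into equal packets forwarded one by one."""
    per_packet = message_bytes / packets / rate
    # The first packet needs `links` transmissions; each later packet
    # follows one packet-time behind the one before it.
    return (links + packets - 1) * per_packet


# Hypothetical numbers: 6000 bytes over 3 links at 1000 bytes per second.
unsplit = whole_message_delay(6000, links=3, rate=1000)
split = pipelined_delay(6000, links=3, rate=1000, packets=4)
```

With these numbers the unsplit message takes 18 seconds and the four-packet version takes 9, because the links work on different packets at the same time.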
TCP
Transmission Control Protocol
• Creates logical connection between two
machines on the edge of the network
• Connected machines seem to have a circuit
connecting them even though they do not tie
up the network
• Provides reliable, perfect transport of
messages, even though IP may drop packets
• Regulates the rate at which packets are
inserted into the network
TCP, Basic Idea
[Animation: “3-Way Handshaking” messages pass between hosts 1 and 2]
“Virtual Circuit” now established between two hosts
though the routers in between are not aware of it and
the same path need not be followed by all packets
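The three messages of the handshake can be written out as data. The slides do not name them; the SYN, SYN+ACK, ACK labels and the "acknowledge the other side's number plus one" rule are standard TCP behavior. The starting sequence numbers 100 and 300 are arbitrary, and the tuple layout is just for illustration.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments as (sender, flags, seq, ack) tuples."""
    syn = ("client", "SYN", client_isn, None)
    syn_ack = ("server", "SYN+ACK", server_isn, client_isn + 1)
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)
    return [syn, syn_ack, ack]


# Arbitrary initial sequence numbers chosen for the example.
segments = three_way_handshake(client_isn=100, server_isn=300)
for sender, flags, seq, ack in segments:
    print(sender, flags, seq, ack)
```

Only the two hosts keep track of this exchange; the routers in between simply forward the three packets like any others.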
[Animation: packets 1 and 2 cross the network and are acknowledged with ACK1 and ACK2]
Dropped Packets and
Retransmission
[Animation: packets 1 and 2 are sent, packet 1 is lost, and after a TIMEOUT packet 1 is sent again]
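Retransmission after a timeout can be sketched with a deliberately simplified scheme. The code below is stop-and-wait (one packet outstanding at a time), which is not what TCP actually does: TCP keeps several packets in flight. The toy network "drops" the first transmission of whichever serial numbers are listed, and all names are hypothetical.

```python
def send_reliably(packets, drop_first_attempt_of):
    """Stop-and-wait sketch: resend a packet whenever its ACK does not arrive.

    drop_first_attempt_of: serial numbers whose first transmission is lost.
    """
    log = []
    delivered = []
    for serial, payload in enumerate(packets, start=1):
        attempt = 1
        while True:
            lost = attempt == 1 and serial in drop_first_attempt_of
            if lost:
                # No ACK comes back, so the sender's timer expires.
                log.append(f"send {serial} (attempt {attempt}): dropped, TIMEOUT")
                attempt += 1
                continue
            delivered.append(payload)
            log.append(f"send {serial} (attempt {attempt}): ACK{serial}")
            break
    return delivered, log


delivered, log = send_reliably(["first", "second"], drop_first_attempt_of={1})
```

The receiver ends up with both payloads in order even though the network threw one transmission away.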
The World Wide Web
One of the facilities or services
provided by certain of the
computers on the Internet
A logical network of web pages
that need not be on physically
connected computers
http://www.harvard.edu
http://www.president.harvard.edu/
http://www.ksg.harvard.edu/
http://www.harvard.edu
http://www.news.harvard.edu/gazette/…
http://www.brighamandwomens.org/PressReleases/…
URL = Uniform Resource Locator
Request “www.google.com”
The Internet
Receive html code
Your computer
Google’s computer
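What "your computer" sends can be sketched by pulling a URL apart and writing the request as text. This example only builds the text and does not contact any server; `describe_request` is an illustrative helper, and a real browser sends more header lines than the two shown.

```python
from urllib.parse import urlsplit


def describe_request(url: str) -> str:
    """Build the text of a minimal HTTP GET request for a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the site's front page
    return f"GET {path} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n"


parts = urlsplit("http://www.president.harvard.edu/")
request_text = describe_request("http://www.harvard.edu")
```

The host name is what gets translated to an IP address, and the server's reply carries the html code that the browser displays.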
Searching the Web
Finding pages referring to the
search terms
Deciding which pages are the most
“relevant”
Finding Relevant Pages
1. Build an index ahead of time
Eddington: URL, URL, …
Edison: URL, URL, …
Edmonton: URL, URL, …
2. When queried, look up in the index
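The two steps can be sketched as an inverted index: a table from each word to the set of pages containing it. The three pages and their `example.org` URLs are invented for the example, and real search engines do far more text processing than this.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Step 1: map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def lookup(index, term):
    """Step 2: answer a query by looking the term up, without rereading pages."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical pages standing in for the Web.
PAGES = {
    "http://example.org/a": "Edison patented the phonograph",
    "http://example.org/b": "Eddington observed the 1919 eclipse",
    "http://example.org/c": "Edmonton is north of Calgary; Edison never lived there",
}
index = build_index(PAGES)
edison_pages = lookup(index, "Edison")
```

All the slow work happens while building the index, so a query costs one table lookup.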
Building the Index
Google “crawls” the entire Web, following
links and loading the pages they point to
Every time it retrieves a page, it
indexes everything on the page
and may keep a “cached” copy of the page
A complete crawl probably takes a week or
two
Opt-out
Caching and copyrights?
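A crawl can be sketched as a breadth-first walk over links. Nothing here reflects Google's actual crawler: the `WEB` dictionary is a three-page toy web held in memory, `fetch` stands in for downloading a page, and "opt-out" is assumed to mean the crawler skips that page entirely.

```python
from collections import deque


def crawl(start_url, fetch, opted_out=frozenset()):
    """Follow links from start_url, indexing and caching each page once."""
    index = {}  # word -> set of URLs containing it
    cache = {}  # URL -> stored copy of the page text
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in opted_out:
            continue  # assumed meaning of opt-out: do not load or index
        page = fetch(url)
        if page is None:
            continue  # broken link
        text, links = page
        cache[url] = text
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index, cache


# Hypothetical toy web: URL -> (page text, links on the page).
WEB = {
    "home": ("harvard home", ["news", "private"]),
    "news": ("harvard gazette news", ["home"]),
    "private": ("do not index", []),
}
index, cache = crawl("home", WEB.get, opted_out={"private"})
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.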
Search = lookup +
ranking
• Look a term up in a huge index to
retrieve a set of URLs
• Or several terms …
• Rank the results in order of
“usefulness” or “desirability”
• Or political correctness!
• Try “falun gong” on google.com and
google.cn
• Or profitability??
Basic Structure of the Index
The Lexicon (Primary Memory)
Eddington
Edison
Edmonton
The Lists of Pages (Secondary Memory)
Eddington: URL, URL, …
Edison: URL, URL, …
Edmonton: URL, URL, …
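The split between a small lexicon kept in memory and long page lists kept on disk can be sketched as follows. An `io.BytesIO` object stands in for the disk file, the JSON encoding is an arbitrary choice, and the postings are invented; none of this is taken from a real search engine.

```python
import io
import json


def write_index(postings, disk):
    """Write each term's URL list to `disk`; return the in-memory lexicon."""
    lexicon = {}
    for term in sorted(postings):
        data = json.dumps(postings[term]).encode("utf-8")
        lexicon[term] = (disk.tell(), len(data))  # where the list starts, its size
        disk.write(data)
    return lexicon


def read_urls(term, lexicon, disk):
    """Find the term in memory, then read only its list from disk."""
    entry = lexicon.get(term)
    if entry is None:
        return []
    offset, length = entry
    disk.seek(offset)
    return json.loads(disk.read(length).decode("utf-8"))


# Hypothetical postings; BytesIO stands in for a file in secondary memory.
POSTINGS = {
    "eddington": ["http://example.org/b"],
    "edison": ["http://example.org/a", "http://example.org/c"],
    "edmonton": ["http://example.org/c"],
}
disk = io.BytesIO()
lexicon = write_index(POSTINGS, disk)
edison_urls = read_urls("edison", lexicon, disk)
```

The lexicon holds one small entry per word, so it fits in fast memory even when the lists of pages are enormous.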
Page Ranking
Hugely important commercially
Page rank is really a new kind of
capital
People try to “spoof” ranking
algorithms
Search engineers try to detect and
discount spoofing
Endless game of cat and mouse …
“A page is important if a
lot of pages point to it”
Probably wrong. Also easy to spoof
“A page is important if a
lot of important pages
point to it”
Circular?
Not really. Can calculate a consistent
meaning of “importance” where every
page’s importance is the sum of the
importance of the pages pointing to it
Like scholarly citations of scholarly
papers
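The circular definition can be resolved by repeated calculation: start with equal importance everywhere and keep recomputing until the numbers settle. The sketch below is a PageRank-style variant rather than the plain sum described above: each page divides its importance among the pages it links to, and a damping factor of 0.85 is added. Both refinements, and the four-page graph, are assumptions not taken from the slides.

```python
def importance(links, damping=0.85, iterations=50):
    """Iteratively compute importance scores for a small link graph.

    links: page -> list of pages it points to (every target must be a key).
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
            else:
                # A page with no links spreads its importance evenly.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical graph: A, B and D all point to C; C points back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = importance(LINKS)
```

Page C comes out on top because three pages point to it, and A outranks B and D because its single incoming link comes from the important page C.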
Did we mention that
searches are logged?
Google Analytics: for marketing
To help tune the search engine
But many searches are personally
identifiable!
The AOL search data
release
What should happen?