
Analyzing the Internet
Seminar 37-310
Polly Huang
[email protected]
11 April, 2000
1
Not quite yet
• We don’t quite know how to analyze the
Internet yet.
• What do I mean by analyzing the Internet?
– determining how much buffer space certain queues need
– determining which form of flow control is more
appropriate
– deciding where to place web caches
2
Series of two talks
• current status: what do we know
– overview
– review basic statistics
• the future: what can we do next
– identify the missing pieces
– put all the pieces together
3
Outline
• Internet
• different from the telephone network
• the telephone network experience doesn’t
help much
• from exponential to heavy-tailed
4
Internet basic components
• like the postal system
• nodes
– end hosts and a smaller number of routers
– homes and local/remote post offices
• links
– connecting nodes (Ethernet, T1, T3, OC3,
OC12, etc)
– roads/streets between homes and post offices
5
Internet basic constructs
• packets
– with IP addresses (129.132.66.28)
– with postal addresses (Gloriastrasse 35)
• protocols
– packets sent with TCP (reliable)
– letters sent by registered mail with
confirmation
– but, unlike TCP, no congestion control
6
Outline
• Internet
• different from the telephone network
• the telephone network experience doesn’t
help much
• from exponential to heavy-tailed
7
Telephone network
• nodes
– telephones and switches
• links
– connecting nodes
8
But connection-oriented
• circuits reserved fully from one end to the other
• no need for congestion control
• doesn't need to be 100% reliable
• calls blocked from time to time
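Blocked calls are the kind of quantity classical teletraffic theory computes with pen and paper. As an illustration that is not on the slides, the standard Erlang B recursion gives the probability that a new call is blocked when Poisson call arrivals offer E Erlangs of load to m circuits and blocked calls are cleared. The function name and the example numbers are hypothetical.

```python
def erlang_b(offered_load: float, circuits: int) -> float:
    """Probability that a new call finds all `circuits` busy (Erlang B).

    offered_load is in Erlangs (call arrival rate x mean holding time).
    Uses the numerically stable recursion
        B(E, 0) = 1,  B(E, m) = E * B(E, m-1) / (m + E * B(E, m-1)).
    Assumes Poisson call arrivals and that blocked calls are cleared (not queued).
    """
    if offered_load < 0 or circuits < 0:
        raise ValueError("offered_load and circuits must be non-negative")
    blocking = 1.0
    for m in range(1, circuits + 1):
        blocking = offered_load * blocking / (m + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # Hypothetical trunk group: 10 Erlangs offered to 10, 15 and 20 circuits.
    for n in (10, 15, 20):
        print(n, round(erlang_b(10.0, n), 4))
```

Adding circuits drives blocking down quickly; a closed form like this is what makes dimensioning a circuit-switched network tractable.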
9
Different in many ways
• network topology expansion
– centralized vs. highly distributed
• traffic
– although humans still initiate calls/web sessions
– computers are doing most of the talking
– (a sign that Poisson models no longer work)
• we probably can’t use the nice queuing
theory developed for telephony!!!
10
Outline
• Internet
• different from the telephone network
• the telephone network experience doesn’t
help much
• from exponential to heavy-tailed
11
Theory for data network
• Maybe the telephone network experience
will help
12
Planning telephone network
• Topology
– big telephone companies know their own telephone
networks
• Traffic
– voice phone connections were quickly
identified as Poisson
13
Queuing theory
• queuing theory quickly emerged
– Poisson call arrivals
– exponential call duration
– Poisson superposed with Poisson is still Poisson
• pen and paper only
• (though problems started to appear with fax and
Internet access traffic)
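The property "Poisson mixed with Poisson is still Poisson" can be checked numerically. This sketch (function names hypothetical, rates and seed arbitrary) builds two independent Poisson streams from exponential gaps, merges them, and checks that the merged stream has the summed rate and exponential-looking gaps with mean 1/(λ1 + λ2).

```python
import random


def poisson_arrivals(rate: float, horizon: float, rng: random.Random) -> list[float]:
    """Arrival times of a Poisson process on [0, horizon), built from exponential gaps."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def merge(*streams: list[float]) -> list[float]:
    """Superpose several arrival streams into one time-ordered stream."""
    return sorted(t for stream in streams for t in stream)


if __name__ == "__main__":
    rng = random.Random(1)
    horizon = 10_000.0
    merged = merge(poisson_arrivals(3.0, horizon, rng), poisson_arrivals(5.0, horizon, rng))
    gaps = [later - earlier for earlier, later in zip(merged, merged[1:])]
    print("merged rate:", len(merged) / horizon)   # close to 3 + 5 = 8
    print("mean gap:", sum(gaps) / len(gaps))       # close to 1 / 8
```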
14
Then, we ask:
• Is it as easy for the data network (a.k.a. the
Internet)?
15
NO!
• Difficulties
– topology
– traffic
16
For topology
• changes are highly decentralized and highly
dynamic
• no one knows what the entire network looks
like at any given moment
17
For data traffic
• computers are now doing most of the
talking
• several studies showed that data
connections are not Poisson (or exponential)
• bye-bye queuing theory
18
Outline
• Internet
• different from the telephone network
• the telephone network experience doesn’t
help much
• from exponential to heavy-tailed
19
Curve fitting?
• too many parameters for a perfect fit
• nothing seems to be typical
– MCI backbone, Sept. 1997, 70% HTTP
– UCB Internet link, Dec. 1997, 37% HTTP
– Mar 1998, LBNL, median transfer 10,900 bytes
– Dec 1998, LBNL, median transfer 5,600 bytes
20
Traces?
• the Internet changes as we speak
• a small difference in network
conditions may lead to very different results
(a non-linear system)
21
Invariants
• we must search for 'invariants' that don't
change with time or location
22
Heavy-tailed
• it turned out that many quantities in computer systems tend to be
heavy-tailed, or power-law distributed!
– CPU time consumed by Unix processes
– size of Unix files
– size of compressed video frames
– size of FTP bursts
– Telnet packet interarrivals
– size of Web items
– Ethernet bursts
23
How to tell?
24
Review some Statistics
• density vs. distribution
• Poisson
• exponential
• Pareto
• self-similarity
25
Density vs. Distribution
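• Density is the probability of a given value (event)
occurring; for continuous variables, the probability per unit of x
– $f(x)$
• Distribution usually refers to the
cumulative density
– $F(x) \approx f(0)\,dz + f(dz)\,dz + f(2\,dz)\,dz + \dots + f(x)\,dz$
– $F(x) = \int_0^x f(z)\,dz$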
26
exponential
• # of time units between events
• $f(x) = c\,e^{-cx}$
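Exponential gaps between events can be generated by inverting the distribution $F(x) = 1 - e^{-cx}$. This sketch (function name, rate, and seed are illustrative) does the inversion by hand; Python's `random.expovariate(c)` computes the same thing.

```python
import math
import random


def sample_exponential(c: float, rng: random.Random) -> float:
    """One gap between events, drawn by inverting F(x) = 1 - exp(-c x):
    x = -ln(1 - U) / c with U uniform on [0, 1)."""
    return -math.log(1.0 - rng.random()) / c


if __name__ == "__main__":
    rng = random.Random(7)
    c = 4.0                                       # illustrative: 4 events per time unit
    gaps = [sample_exponential(c, rng) for _ in range(100_000)]
    print("mean gap:", sum(gaps) / len(gaps))     # should be near 1 / c = 0.25
```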
27
Example exponential process
28
Poisson
• # of events per time unit
• $f(x) = \frac{c^x e^{-c}}{x!}$
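The two slides describe the same process from two sides: exponential time between events, Poisson number of events per time unit. A sketch under illustrative parameters (rate 3, seed and run length arbitrary, names hypothetical) that counts events per unit interval from exponential gaps and compares the observed frequencies with $f(x) = c^x e^{-c}/x!$.

```python
import math
import random
from collections import Counter


def poisson_pmf(x: int, c: float) -> float:
    """f(x) = c^x e^(-c) / x!: probability of exactly x events in one time unit."""
    return c ** x * math.exp(-c) / math.factorial(x)


def counts_per_unit(c: float, units: int, rng: random.Random) -> list[int]:
    """Events per unit interval when the gaps between events are exponential with rate c."""
    counts = [0] * units
    t = rng.expovariate(c)
    while t < units:
        counts[int(t)] += 1
        t += rng.expovariate(c)
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    c, units = 3.0, 50_000                      # illustrative rate and run length
    freq = Counter(counts_per_unit(c, units, rng))
    for x in range(7):
        print(x, round(freq[x] / units, 4), round(poisson_pmf(x, c), 4))
```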
29
Example Poisson process
30
Pareto
• one of the heavy-tailed distributions
• $f(x) = \frac{c\,k^c}{x^{c+1}}$, for $x \ge k$
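Pareto samples come from inverting $F(x) = 1 - (k/x)^c$. In this sketch the shape c = 1.5 and scale k = 1 are illustrative; with c below 2 the variance is infinite, so a handful of very large values dominate. Python's `random.paretovariate(c)` draws the k = 1 case directly.

```python
import random


def sample_pareto(c: float, k: float, rng: random.Random) -> float:
    """Draw from f(x) = c k^c / x^(c+1), x >= k, by inverting
    F(x) = 1 - (k / x)^c:  x = k * (1 - U)^(-1/c)."""
    return k * (1.0 - rng.random()) ** (-1.0 / c)


if __name__ == "__main__":
    rng = random.Random(3)
    c, k = 1.5, 1.0          # illustrative shape and scale; c < 2 means infinite variance
    xs = [sample_pareto(c, k, rng) for _ in range(100_000)]
    print("fraction above 2k:", sum(x > 2 * k for x in xs) / len(xs))  # near (1/2)^c
    print("largest sample:", max(xs))                                   # typically huge
```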
31
Example Pareto process
32
Distinguishing them
• density
• log density
• log-log density
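A sketch of the same test using the complementary distribution P(X > x) instead of the density (an assumption made because the CCDF is less noisy to estimate from samples): a Pareto tail is a straight line of slope -c on log-log axes, while an exponential is a straight line of slope -c on log-linear axes. Sample sizes, seeds, and evaluation points are arbitrary; function names are hypothetical.

```python
import bisect
import math
import random


def ccdf(sorted_samples, x):
    """Empirical P(X > x) from an ascending list of samples."""
    n = len(sorted_samples)
    return (n - bisect.bisect_right(sorted_samples, x)) / n


def ls_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


def loglog_slope(samples, points):
    """Slope of log P(X > x) vs log x: roughly constant (-c) for a Pareto tail."""
    s = sorted(samples)
    return ls_slope([math.log(p) for p in points], [math.log(ccdf(s, p)) for p in points])


def semilog_slope(samples, points):
    """Slope of log P(X > x) vs x: roughly constant (-c) for an exponential."""
    s = sorted(samples)
    return ls_slope(list(points), [math.log(ccdf(s, p)) for p in points])


if __name__ == "__main__":
    rng = random.Random(11)
    pareto = [rng.paretovariate(1.5) for _ in range(100_000)]   # c = 1.5, k = 1
    expo = [rng.expovariate(1.0) for _ in range(100_000)]       # c = 1
    print("Pareto, log-log slope:", loglog_slope(pareto, [2 ** i for i in range(1, 7)]))
    print("exponential, semi-log slope:", semilog_slope(expo, [0.5 * i for i in range(1, 13)]))
```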
33
[Figure: the example processes shown as density, log density, and log-log density plots]
34
Teletraffic vs. Data traffic
• Teletraffic
– interarrival times: exponential
– durations: exponential
• Data traffic
– interarrival times: exponential
– durations: heavy-tailed
35
Animation
• Show the telephone vs. Internet traffic demo
36
37
Self-similarity
• Distributions of #packets per time unit look alike at
different time scales
[Figure: Sierpinski triangles]
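One classic way to check whether traffic "looks alike" across time scales, not described on the slides, is the variance-time plot: average the per-interval counts over blocks of m intervals; for self-similar traffic with Hurst parameter H the variance falls off as $m^{2H-2}$, rather than the $1/m$ of independent counts (H = 0.5). This sketch uses independent Gaussian counts as a stand-in input, so it should report H near 0.5; names, scales, and seed are illustrative.

```python
import math
import random
from statistics import pvariance


def aggregate(series, m):
    """Average over non-overlapping blocks of m samples (the series at a coarser time scale)."""
    blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(blocks)]


def hurst_variance_time(series, scales=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from Var(X^(m)) ~ m^(2H - 2) by a least-squares fit on log2 axes."""
    xs = [math.log2(m) for m in scales]
    ys = [math.log2(pvariance(aggregate(series, m))) for m in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return 1.0 + slope / 2.0          # slope = 2H - 2


if __name__ == "__main__":
    rng = random.Random(5)
    # Independent per-interval counts: no long-range dependence, so H should be near 0.5.
    counts = [rng.gauss(100.0, 10.0) for _ in range(2 ** 16)]
    print("H estimate:", hurst_variance_time(counts))
```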
38
Wavelet Analysis
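• FFT: frequency decomposition, coefficients $d_j$
• WT: frequency and time decomposition, coefficients $d_{j,k}$
• energy at scale $j$: $E_j = \frac{1}{N_j}\sum_k d_{j,k}^2$
• $E_j = 2^{j(2H-1)}\,C$ (The magic!!)
• $\log_2 E_j = (2H-1)\,j + \log_2 C$
[Figure: $\log_2 E_j$ plotted against scale $j$; a straight line indicates self-similarity]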
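The energy-scale relation of this slide can be turned into an estimator of H: compute $E_j$ at each octave, fit a line to $\log_2 E_j$ against $j$, and solve slope $= 2H - 1$. This sketch uses the orthonormal Haar wavelet for simplicity (an assumption; any suitable wavelet works) and white noise as input, so it should report H near 0.5. Function names and parameters are hypothetical.

```python
import math
import random


def haar_energies(series, octaves):
    """E_j = (1/N_j) * sum_k d_{j,k}^2 for j = 1..octaves, using the orthonormal Haar wavelet."""
    approx = list(series)
    energies = []
    for _ in range(octaves):
        half = len(approx) // 2
        details = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        energies.append(sum(d * d for d in details) / half)
    return energies


def hurst_wavelet(series, octaves=8):
    """Fit log2 E_j = (2H - 1) j + log2 C by least squares and solve for H."""
    js = list(range(1, octaves + 1))
    ys = [math.log2(e) for e in haar_energies(series, octaves)]
    mj, my = sum(js) / len(js), sum(ys) / len(ys)
    slope = sum((j - mj) * (y - my) for j, y in zip(js, ys)) / sum((j - mj) ** 2 for j in js)
    return (slope + 1.0) / 2.0


if __name__ == "__main__":
    rng = random.Random(9)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 15)]   # no long-range dependence
    print("H estimate:", hurst_wavelet(noise))
```

With self-similar traffic the same fit would give a slope above 0 and H above 0.5; deviations from a straight line at some scales are what the "shape" of self-similarity refers to.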
39
'Shape' of self-similarity
[Figure: scaling plots illustrating self-similar, periodic, and multifractal(?) behavior]
40
Revisit the original goal
• Can we analyze data networks?
– Topology???
– Traffic?
• Poisson arrival
• heavy-tailed duration
• self-similar aggregated traffic
• Pure analytical modeling for data networks?
– a.k.a. only pen and paper?
41
NO!
• probably not in a few years
• confirmed by the experts
42
A few reasons
• can't use well-known self-similar (or fractal)
processes
• traffic is not exactly self-similar
• the 'shape' of self-similarity changes with
network conditions
• don't know what 'self-similar' processes add
up to (mathematically difficult)
43
No more math!
20 min break
44
Series of two talks
• current status: what do we know
– overview
– basic statistics review
• the future: what can we do (a new
research project)
– identify the missing pieces
– put all the pieces together
45
The project
• Modeling and Simulation for Large-scale
Data Networks
46
Project goals
• identify (at least) high-level user and system
characteristics
• run simulations at the scale of the Internet
with significant packet-level detail on a PC
• or on a cluster of low-cost PCs
47
A beautiful picture
• graduate students happily analyzing
protocols in realistic Internet topology and
traffic setups on their home PC (or a cluster
of low-cost PCs)
• (might not come true exactly)
48
Two parts
• modeling
• simulations
49
Review the models we know
• characteristics of connectivity and routing
• characteristics of bandwidth
• characteristics of user behavior
• characteristics of object size distribution
• characteristics of client/server location
50
Take the simulation approach
• set up the network topology
• select client and server
• generate traffic
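The three steps can be sketched as a toy traffic generator. Everything here is hypothetical: the topology step is reduced to a list of node IDs, servers are picked uniformly at random, sessions arrive as a Poisson process, and transfer sizes are Pareto (heavy-tailed); the defaults `shape=1.2` and `min_size=1000.0` are illustrative, not measured values.

```python
import random
from dataclasses import dataclass


@dataclass
class Transfer:
    start: float   # arrival time in seconds
    client: int
    server: int
    size: float    # bytes


def generate_traffic(nodes, n_servers, session_rate, horizon, rng,
                     shape=1.2, min_size=1000.0):
    """Toy version of: set up topology (nodes), select clients/servers, generate traffic."""
    servers = rng.sample(nodes, n_servers)               # step 2: pick servers...
    server_set = set(servers)
    clients = [n for n in nodes if n not in server_set]  # ...all other nodes are clients
    transfers = []
    t = rng.expovariate(session_rate)                    # step 3: Poisson arrivals
    while t < horizon:
        size = min_size * (1.0 - rng.random()) ** (-1.0 / shape)   # Pareto sizes
        transfers.append(Transfer(t, rng.choice(clients), rng.choice(servers), size))
        t += rng.expovariate(session_rate)
    return transfers


if __name__ == "__main__":
    rng = random.Random(5)
    topology_nodes = list(range(20))                     # step 1: stand-in topology
    for tr in generate_traffic(topology_nodes, 3, 2.0, 3.0, rng):
        print(tr)
```

In a real study each step would be replaced by the measured models listed on the previous slide; the point here is only the order of the steps.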
51
Validation
• verify simulation results against the self-similar (or multifractal) structure of aggregated traffic
52
Two parts
• modeling
• simulations
53
Scalable Simulation
“[To simulate] five minutes of activity on a
network the size of today’s Internet would
require gigabytes of real memory and
months of computation on today’s 100 MIPS
uniprocessors.”
--- Ahn and Danzig, 1996
54
Scaling Solutions
• Parallel and distributed simulation
• Implementation Tuning
• Abstraction
55
Abstraction Techniques
• Large-scale Network Topology
– Algorithmic routing
• Large-scale Network Traffic
– Finite state automata modeling
56
Bottlenecks
• Topology - routing information
• Traffic - TCP
57
Routing Table Cost
• All-pairs shortest-path routing
– $O(N^2)$
• Hierarchical routing
– $O(N \lg N)$
• Algorithmic routing
– $O(N)$
58
Algorithmic Routing
• Next hop lookup
• Topology mapping: arbitrary -> tree
59
Route Lookup
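[Figure: binary tree rooted at 0 in which node a has parent (a-1)/2 and children 2a+1 and 2a+2; e.g. node 10 has children 21 and 22, and node 21 has children 43 and 44]
• next_hop(a,b)
– walk up from b toward the root by b = (b-1)/2 (integer division)
– if the walk reaches a, return the last node visited before a
– else, return (a-1)/2 (the parent of a)
• Next_hop(10,44)=21
• Next_hop(1,45)=4
• Next_hop(5,43)=2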
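The lookup rule translates directly into code. This sketch assumes the binary-tree numbering from the figure (root 0, node a has children 2a+1 and 2a+2) and uses integer division; it stops walking as soon as b drops below a, since every descendant of a has a larger number than a. No routing table is stored at all, only the node numbers.

```python
def next_hop(a: int, b: int) -> int:
    """Next hop from node a toward node b in a binary tree where node n has
    children 2n+1 and 2n+2 and parent (n-1)//2 (the root is node 0)."""
    if a == b:
        raise ValueError("source and destination are the same node")
    last = b
    while b > a:              # descendants of a always have larger numbers than a
        last = b
        b = (b - 1) // 2      # walk one level up toward the root
    # Reached a: go down toward the child we came from. Otherwise b is not below a: go up.
    return last if b == a else (a - 1) // 2


if __name__ == "__main__":
    print(next_hop(10, 44), next_hop(1, 45), next_hop(5, 43))
```

The three calls reproduce the slide's examples (21, 4, and 2).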
60
Topology Mapping
[Figure: an arbitrary topology is converted to a tree by BFS in O(N), and its nodes are then re-assigned IDs to follow the tree numbering]
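A possible sketch of the mapping step, under stated assumptions: the input is a connected graph given as an adjacency dict, BFS from a chosen root keeps only tree edges, and nodes are renumbered as a k-ary tree (child j of the node with ID p gets ID k*p + 1 + j, which reduces to 2a+1 and 2a+2 for k = 2). The k-ary generalization and the function name are illustrative choices, not taken from the slides. Links outside the BFS tree are not used for routing, which is one reason tree routes can be longer than shortest paths.

```python
from collections import deque


def bfs_tree_ids(adjacency, root):
    """Map a connected topology onto a k-ary tree numbering.

    1. BFS from root builds a spanning tree (linear in nodes + links).
    2. k = the largest number of children any node has in that tree.
    3. The j-th child (0-based) of the node with ID p gets ID k*p + 1 + j.
    Returns (k, {original node: new ID}).
    """
    children = {root: []}
    order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in children:        # first time v is seen: (u, v) is a tree edge
                children[v] = []
                children[u].append(v)
                order.append(v)
                queue.append(v)
    k = max(1, max(len(c) for c in children.values()))
    new_id = {root: 0}
    for u in order:                      # BFS order: a parent is numbered before its children
        for j, v in enumerate(children[u]):
            new_id[v] = k * new_id[u] + 1 + j
    return k, new_id


if __name__ == "__main__":
    topo = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4], 4: [2, 3]}
    print(bfs_tree_ids(topo, 0))
```

With the new IDs, a next-hop lookup works as in the binary case, using (n - 1) // k as the parent of n. The IDs may be sparse when some nodes have fewer than k children.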
61
Evaluation
[Figure: (left) memory in MB vs. # nodes (0 to 600) for Flat, Hier, and Algo routing, with an annotated ns-2 routing allocation artifact; (right) route length difference (diff in # hops, diff%) vs. # nodes]
• Transit-stub Topologies
• Short cycles
62
TCP
[Figure: a TCP connection between nodes n0 and n1]
• Slow start -> congestion avoidance
• Retransmission and timeouts
• A lot of variables per TCP connection!!
• A batch of packets per RTT or timeout
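To make "a batch of packets per RTT" concrete, this loss-free sketch lists how many packets a short connection sends in each round trip. Assumptions not taken from the slides: the window is counted in packets, slow start doubles the window each RTT but is capped at ssthresh, and congestion avoidance adds one packet per RTT. Real TCP grows its window per ACK, so this is only an approximation; the function name is hypothetical.

```python
def batches_per_rtt(total_packets: int, ssthresh: int, initial_cwnd: int = 1) -> list[int]:
    """Packets sent in each round trip by an idealized, loss-free TCP sender."""
    if total_packets < 0:
        raise ValueError("total_packets must be non-negative")
    batches = []
    cwnd, remaining = initial_cwnd, total_packets
    while remaining > 0:
        sent = min(cwnd, remaining)          # the last batch may be partial
        batches.append(sent)
        remaining -= sent
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 per RTT
    return batches


if __name__ == "__main__":
    print(batches_per_rtt(20, ssthresh=8))   # [1, 2, 4, 8, 5]
```

A whole short transfer collapses to a handful of batch sizes, which is what makes a coarse-grain representation attractive.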
63
FSA TCP
• Coarse-grain TCP behavior
• FSA for Short TCP connections
– Numbers of packets sent per round trip time or
timeout
– Combinations of packet drops
• Preservation of the closed-loop feedback
control (the KEY property)
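A minimal sketch of the FSA idea, assuming simplified Reno-like rules rather than the exact automaton in the talk: each connection keeps only its current (wnd, ssthresh) state, the transition depends only on the state and the number of drops in the last round, and the transition table is computed once and shared by all connections through memoization. The class and function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def transition(state: tuple[int, int], drops: int) -> tuple[int, int]:
    """Next (wnd, ssthresh) after one round in which `drops` packets were lost.

    Simplified Reno-like rules (an assumption, not the talk's exact automaton):
      no loss  -> slow start doubles wnd up to ssthresh, then +1 per round
      one loss -> fast retransmit/recovery: ssthresh = wnd // 2, wnd = ssthresh
      more     -> timeout: ssthresh = wnd // 2, wnd = 1
    ssthresh never drops below 2.
    """
    wnd, ssthresh = state
    if drops == 0:
        return (min(2 * wnd, ssthresh), ssthresh) if wnd < ssthresh else (wnd + 1, ssthresh)
    new_ssthresh = max(wnd // 2, 2)
    return (new_ssthresh, new_ssthresh) if drops == 1 else (1, new_ssthresh)


class FsaConnection:
    """A short TCP connection that stores only its current automaton state."""

    __slots__ = ("state",)

    def __init__(self, ssthresh: int = 64):
        self.state = (1, ssthresh)

    def round(self, drops: int = 0) -> int:
        """Send one batch; return its size and advance the shared automaton."""
        batch = self.state[0]
        self.state = transition(self.state, drops)
        return batch


if __name__ == "__main__":
    conn = FsaConnection(ssthresh=8)
    print([conn.round() for _ in range(5)], conn.round(drops=1), conn.state)
```

The feedback loop is preserved because drops observed in the simulation still drive the next state; only the per-connection bookkeeping shrinks.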
64
Reno TCP (Partial)
[Figure: partial finite state automaton for Reno TCP; states are labeled (wnd, ssh), starting from (wnd, ssh) = (1,2) and including (3,3) through (7,7)]
65
Evaluation
[Figure: (left) memory in MB vs. # web sessions (0 to 100) for detailed TCP and FSA TCP; (right) % difference in throughput vs. # of web sessions]
• ISP-like topology
• Each web session generates ~200 TCP
connections
66
Case Study : Self-similar Traffic
• SIGCOMM 99, Anja Feldmann, et al.
• Internet traffic characteristics
– Large scale: self-similarity (user-related
factors)
– RTT scale: periodicity (TCP closed-loop control)
– Small scale: multifractal? (TCP ack clocking?)
• Wavelet-based analysis: global scaling plot
to detect self-similarity and periodicity
67
Evaluation
[Figure: global scaling plot with regions labeled self-similar, periodic, and multifractal?]
• FSA TCP’s delay difference is ~10 msec!!
• Time series are taken every 10 msec!!
• So FSA TCP is not appropriate for multifractal analysis!!!
68
Future in modeling
• client and server location
• aggregated traffic
– explaining the shapes of self-similarity
– temporal and spatial correlation
• impact of web caching, IP telephony,
pricing, diffserv (QoS) on existing models
• classifying the Internet
69
Future in simulation
• Algorithmic Routing
– tree search algorithm
– optimizations
– Internet topology
• FSA TCP
– automatic FSA generation (Markov chain
model for short TCP connections)
– packet batch representatives
70
You can be in that future!
71
A Few Tips to Prepare Slides
• must have outlines/roadmaps/overview
• a picture is worth a thousand words
• keep it to fewer than 3 bullets per slide
• http://www.diz.ethz.ch/dienstleistungen/unterlagen/ssp_unterlagen.html
72