Integrated Approach to Improving Web Performance
Lili Qiu
Cornell University
1
Outline

- Motivation & Open Issues
- Solutions
  - Study Web workload, and properly provision content distribution networks
  - Optimize TCP performance for Web transfers
  - Fast packet classification
- Summary
- Other Work
2
Motivation

- Web is the dominant traffic in the Internet today
- Web performance is often unsatisfactory
  - WWW – World Wide Wait
  - Consequence: losing potential customers!
[Figure: network congestion and an overloaded Web server]
3
Why is the Web so slow?

- Application layer
  - Web servers are overloaded …
- Transport layer
  - Web transfers are short and bursty, and interact poorly with TCP
- Network layer
  - Routers are not fast enough
  - Network congestion
  - Route flaps and routing instabilities
  - …

Inefficiency in any layer of the protocol stack can slow down the Web!
4
Our Solutions

- Application layer
  - Study Web workload
  - Properly provision content distribution networks (CDNs)
- Transport layer
  - Optimize TCP startup performance for Web transfers
- Network layer
  - Speed up packet classification (useful for firewall & diff-serv)
5
Part I: Application Layer Approach

- Study the workload of busy Web servers
  - The Content and Access Dynamics of a Busy Web Site: Findings and Implications. Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, August 2000. (Joint work with V. N. Padmanabhan)
- Properly provision content distribution networks
  - On the Placement of Web Server Replicas. Submitted to INFOCOM 2001. (Joint work with V. N. Padmanabhan and G. M. Voelker)
6
Introduction

- A solid understanding of Web workload is critical for designing robust and scalable systems
- The workload of popular Web servers is not well understood
- Study the content and access dynamics of the MSNBC Web site
  - a large news server
  - one of the busiest sites on the Web
  - 25 million accesses a day (HTML content alone)
  - Period studied: Aug. – Oct. '99 & the Dec. 17, '98 flash crowd
- Properly provision content distribution networks
  - Where to place the edge servers in CDNs
7
Temporal Stability of File Popularity

- Methodology
  - Consider the traces from a pair of days
  - Pick the top n popular documents from each day
  - Compute the overlap
[Figure: extent of overlap vs. number of popular documents picked, for the day pairs 17DEC98–18OCT99, 01AUG99–18OCT99, and 17OCT99–18OCT99]
- Results
  - One day apart: significant overlap (80%)
  - Two months apart: smaller overlap (20-80%)
  - Ten months apart: very small overlap (mostly below 20%)

The set of popular documents remains stable for days
8
Spatial Locality in Client Accesses

[Figures: fraction of requests shared vs. domain ID for a normal day and for Dec. 17, 1998, comparing the trace against a random assignment]

Domain membership is significant, except when there is a "hot" event of global interest
9
Spatial Distribution of Client Accesses

- Cluster clients using network-aware clustering [KW00]
  - IP addresses with the same address prefix belong to a cluster (see the sketch after this slide)
- The top 10, 100, 1000, and 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests, respectively

A small number of client clusters contribute most of the requests.
10
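A minimal Python sketch (not from the paper) of prefix-based client clustering. It assumes a request log that yields one IPv4 client address per request, and it uses a fixed 24-bit prefix as a stand-in for the BGP-prefix lookup that real network-aware clustering [KW00] performs; the 24-bit heuristic also appears later in the talk.

    # Group client addresses by network prefix and report how concentrated
    # the request load is across the resulting clusters.
    from collections import Counter

    def cluster_id(ip: str, prefix_bits: int = 24) -> str:
        """Return the network prefix of an IPv4 address as the cluster key."""
        value = 0
        for octet in ip.split("."):
            value = (value << 8) | int(octet)
        mask = (0xFFFFFFFF >> (32 - prefix_bits)) << (32 - prefix_bits)
        return f"{value & mask:08x}/{prefix_bits}"

    def top_cluster_share(client_ips, top_n: int) -> float:
        """Fraction of requests contributed by the top_n busiest clusters."""
        counts = Counter(cluster_id(ip) for ip in client_ips)
        top = sum(count for _, count in counts.most_common(top_n))
        return top / sum(counts.values())

    # Example with made-up addresses: share of requests from the busiest cluster.
    print(top_cluster_share(["10.1.2.3", "10.1.2.77", "192.168.5.9"], top_n=1))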
The Applicability of Zipf's Law to Web Requests

[Figure: Zipf exponent α for MSNBC, proxies, and less popular servers]

- Web requests follow a Zipf-like distribution
  - Request frequency ∝ 1/i^α, where i is a document's popularity ranking
- The value of α is much larger in the MSNBC traces (see the fitting sketch after this slide)
  - 1.4 – 1.8 in MSNBC traces
  - smaller than or close to 1 in the proxy traces
  - close to 1 in the small departmental server logs [ABC+96]
  - highest when there is a hot event
11
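A minimal sketch of how the Zipf exponent α could be estimated from a server log, assuming per-document request totals are already available. It fits log(frequency) = c − α·log(rank) by least squares over all ranks; an actual workload study would typically restrict the fit to the straight-line region of the rank–frequency plot.

    import math

    def zipf_alpha(request_counts) -> float:
        """Least-squares slope of log(frequency) vs. log(rank), negated."""
        counts = sorted(request_counts, reverse=True)
        xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
        ys = [math.log(c) for c in counts]
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var = sum((x - mean_x) ** 2 for x in xs)
        return -cov / var

    # Example with made-up counts that roughly follow 1/i (alpha near 1):
    print(zipf_alpha([1000, 480, 330, 260, 200, 170, 150, 130, 110, 100]))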
Impact of Larger α

[Figure: cumulative percentage of requests vs. percentage of documents (sorted by popularity) for the 12/17/98 and 08/01/99 server traces and the 10/06/99 proxy traces]

- Accesses in the MSNBC traces are much more concentrated
- 90% of the accesses are accounted for by
  - the top 2-4% of files in the MSNBC traces
  - the top 36% of files in the proxy traces (Microsoft proxies and the proxies studied in [BCF+99])
  - the top 10% of files in the small departmental server logs reported in [AW96]

Popular news sites like MSNBC see much more concentrated accesses → reverse caching and replication can be very effective!
12
Introduction to Content Distribution Networks (CDNs)

[Figure: content providers' servers, CDN servers, and clients]

- Content providers want to offer better service to their clients at lower cost
- Increasing deployment of content distribution networks (CDNs)
  - Akamai, Digital Island, Exodus …
- Idea: a network of servers
- Features:
  - Outsourcing infrastructure
  - Improve performance by moving content closer to end users
  - Flash crowd protection
13
Placement of CDN Servers

[Figure: content providers, CDN servers, and clients]

- Goal
  - Minimize users' latency or bandwidth usage
- Minimum K-median problem
  - Select K centers to minimize the sum of assignment costs
  - Cost can be latency, bandwidth, or any other metric we want to optimize
  - NP-hard problem
14
Placement Algorithms

- Tree-based algorithm [LGG+99]
  - Assume the underlying topologies are trees, and model it as a dynamic programming problem
  - O(N³M²) for choosing M replicas among N potential places
- Random
  - Pick the best among several random assignments
- Hot spot
  - Place replicas near the clients that generate the largest load
15
Placement Algorithms (Cont.)

- Greedy algorithm

  Greedy(N, M) {
      for i = 1 .. M {
          for each remaining candidate site R {
              cost[R] = cost after placing an additional replica at R
          }
          select the candidate site with the lowest cost
      }
  }
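A runnable Python sketch of the greedy placement above, under the assumption that client-to-site costs (e.g., latency or hop count weighted by request load) are given as a matrix; the function and variable names are illustrative, not from the paper.

    def greedy_placement(cost, num_replicas):
        """cost[c][s]: cost of serving client c from candidate site s.
        Returns the chosen sites and the resulting total assignment cost."""
        num_clients, num_sites = len(cost), len(cost[0])
        chosen = []
        best = [float("inf")] * num_clients   # cost to each client's nearest chosen replica

        for _ in range(num_replicas):
            best_site, best_total = None, float("inf")
            for s in range(num_sites):
                if s in chosen:
                    continue
                # Total assignment cost if site s were added to the placement.
                total = sum(min(best[c], cost[c][s]) for c in range(num_clients))
                if total < best_total:
                    best_site, best_total = s, total
            chosen.append(best_site)
            best = [min(best[c], cost[c][best_site]) for c in range(num_clients)]
        return chosen, sum(best)

    # Example with 3 clients, 4 candidate sites, and 2 replicas to place:
    print(greedy_placement([[3, 1, 4, 2], [2, 5, 1, 3], [4, 2, 2, 1]], 2))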

- Super-optimal algorithm
  - Lagrangian relaxation + subgradient method
16
Simulation Methodology

- Network topology
  - Randomly generated topologies
    - Using the GT-ITM Internet topology generator
  - Real Internet topology
    - AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers
- Web workload
  - Real server traces
    - MSNBC, ClarkNet, NASA Kennedy Space Center
- Performance metric
  - Relative performance: cost_practical / cost_super-optimal
17
Simulation Results in Random Tree Topologies
18
Simulation Results in Random Graph Topologies
19
Simulation Results in Real Internet Topologies
20
Effects of Imperfect Knowledge about Input Data

- Predict load using a moving window average (see the sketch after this slide)
[Figures: (a) perfect knowledge about topology; (b) topology information accurate to within a factor of 2]
21
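A minimal sketch of moving-window-average load prediction, assuming per-cluster request counts measured over fixed intervals; the window length and class name are illustrative.

    from collections import deque

    class MovingWindowPredictor:
        def __init__(self, window: int = 6):
            self.samples = deque(maxlen=window)   # most recent load measurements

        def observe(self, load: float) -> None:
            self.samples.append(load)

        def predict(self) -> float:
            """Predicted load for the next interval: mean of the window."""
            return sum(self.samples) / len(self.samples) if self.samples else 0.0

    # Example: feed three intervals of request counts for one client cluster.
    p = MovingWindowPredictor(window=3)
    for load in [120, 150, 90]:
        p.observe(load)
    print(p.predict())   # -> 120.0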
Conclusion

- Characterize Web workload using MSNBC traces
- Placement of CDN servers
  - Knowledge about client workload and topology is crucial for provisioning CDNs
  - The greedy algorithm performs the best
    - Within a factor of 1.1 – 1.5 of the super-optimal
  - The greedy algorithm is insensitive to noise
    - Stays within a factor of 2 of the super-optimal when the salted error is a factor of 4
  - The hot-spot algorithm performs nearly as well
    - Within a factor of 1.6 – 2 of the super-optimal
  - How to obtain inputs
    - Moving window average for load prediction
    - BGP routing data to obtain topology information
22
Part II: Transport Layer Approach

- Speeding Up Short Data Transfers: Theory, Architectural Support, and Simulation Results. Proceedings of NOSSDAV 2000. (Joint work with Yin Zhang and Srinivasan Keshav)
23
Motivation

- Characteristics of Web data transfers
  - Short & bursty [Mah97]
  - Use TCP
- Problem: short data transfers interact poorly with TCP!
24
TCP/Reno Basics

- Slow Start
  - Exponential growth in the congestion window
  - Slow: log(n) round trips for n segments
- Congestion Avoidance
  - Linear probing of bandwidth
- Fast Retransmission
  - Triggered by 3 duplicate ACKs
25
Related Work

- P-HTTP [PM94]
  - Reuses a single TCP connection for multiple Web transfers, but still pays the slow start penalty
- T/TCP [Bra94]
  - Caches connection count, RTT
- TCP Control Block Interdependence [Tou97]
  - Caches cwnd, but large bursts cause losses
- Rate Based Pacing [VH97]
- 4K Initial Window [AFP98]
- Fast Start [PK98, Pad98]
  - Needs router support to ensure TCP friendliness
26
Our Approach

- Directly enter Congestion Avoidance
- Choose the optimal initial congestion window
  - A geometry problem: fitting a block to the service rate curve to minimize completion time
27
Optimal Initial cwnd

- Minimize completion time by having the transfer end at an epoch boundary
28
Shift Optimization

- Minimize the initial cwnd while keeping the same integer number of RTTs (see the sketch after this slide)
  - Before optimization: cwnd = 9
  - After optimization: cwnd = 5
29
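A minimal sketch of the shift-optimization idea, under the simplifying assumption that the sender starts directly in congestion avoidance and the window grows by one segment per RTT. Given a transfer size and an initial-window estimate (e.g., from shared measurements), it finds the number of RTTs the transfer needs and then shrinks the initial window as far as possible without adding an RTT; this is an illustration, not the paper's exact formulation.

    import math

    def rtts_needed(segments: int, init_cwnd: int) -> int:
        """Smallest k such that init_cwnd + (init_cwnd+1) + ... + (init_cwnd+k-1) >= segments."""
        k, sent = 0, 0
        while sent < segments:
            sent += init_cwnd + k
            k += 1
        return k

    def shift_optimized_cwnd(segments: int, init_cwnd: int) -> int:
        """Minimal initial cwnd that still finishes in the same number of RTTs."""
        k = rtts_needed(segments, init_cwnd)
        # Over k RTTs with initial window w, the sender can emit k*w + k*(k-1)/2 segments.
        return max(1, math.ceil((segments - k * (k - 1) // 2) / k))

    # Example: an 11-segment transfer with an estimated window of 9 segments
    # still takes 2 RTTs when the initial cwnd is reduced to 5.
    print(rtts_needed(11, 9), shift_optimized_cwnd(11, 9))   # -> 2 5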
Effect of Shift Optimization
30
TCP/SPAND

- Estimate network state by sharing performance information
  - SPAND: Shared PAssive Network Discovery [SSK97]
[Figure: Web servers sharing measurements with a performance server across the Internet]
- Directly enter Congestion Avoidance, starting with the optimal initial cwnd
- Avoid large bursts by pacing
31
Implementation Issues

- Scope for sharing and aggregation
  - 24-bit heuristic
  - Network-aware clustering [KW00]
- Collecting performance information
  - Performance reports, new TCP option, Windmill's approach, …
- Information aggregation
  - Sliding window average
- Retrieving estimates of network state
  - Explicit query, active push, …
- Pacing
  - Leaky-bucket based pacing (see the sketch after this slide)
32
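A minimal sketch of leaky-bucket pacing: instead of emitting a whole window back-to-back, packets are released at a steady rate of one window per RTT. The sleep-based loop and the send callback are illustrative, not from any TCP/SPAND implementation.

    import time

    def paced_send(segments, cwnd_segments: int, rtt_seconds: float, send_one):
        """Send `segments` at a rate of cwnd_segments per RTT using sleep-based pacing."""
        interval = rtt_seconds / cwnd_segments   # spacing between consecutive packets
        next_time = time.monotonic()
        for seg in segments:
            now = time.monotonic()
            if now < next_time:
                time.sleep(next_time - now)      # wait until the next send slot
            send_one(seg)
            next_time += interval

    # Example: pace 10 dummy segments at 5 segments per 100-ms RTT.
    paced_send(range(10), cwnd_segments=5, rtt_seconds=0.1,
               send_one=lambda s: print("sent", s))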
Opportunity for Sharing

- MSNBC: 90% of requests arrive within 5 minutes of the most recent request from the same client network (using the 24-bit heuristic)
33
Cost for Sharing

- MSNBC: 15,000-25,000 distinct client networks in a 5-minute interval during peak hours (using the 24-bit heuristic)
34
Simulation Results

- Methodology
  - Download files in rounds
- Performance metric
  - Average completion time
- TCP flavors considered
  - reno-ssr: Reno with slow start restart
  - reno-nssr: Reno w/o slow start restart
  - newreno-ssr: NewReno with slow start restart
  - newreno-nssr: NewReno w/o slow start restart
35
Simulation Topologies
36
T1 Terrestrial WAN Link with Single Bottleneck
37
T1 Terrestrial WAN Link with Multiple Bottlenecks
38
T1 Terrestrial WAN Link with Multiple Bottlenecks and Heavy Congestion
39
TCP Friendliness (I): Against reno-ssr with 50-ms Timer
40
TCP Friendliness (II): Against reno-ssr with 200-ms Timer
41
Conclusions

- TCP/SPAND significantly reduces latency for short data transfers
  - 35-65% compared to reno-ssr / newreno-ssr
  - 20-50% compared to reno-nssr / newreno-nssr
  - Even higher for fatter pipes
- TCP/SPAND is TCP-friendly
- TCP/SPAND is incrementally deployable
  - Server-side modification only
  - No modification at the client side
42
Part III: Network Layer Approach

- Fast Packet Classification on Multiple Dimensions. Cornell CS Technical Report 2000-1805, July 2000. (Joint work with G. Varghese and S. Suri; in progress)
43
Motivation

- Traditionally, routers forward packets based on the destination field only
- Diff-serv and firewalls require layer-4 switching
  - Forward packets based on multiple fields in the packet header, e.g. source IP address, destination IP address, source port, destination port, protocol, type of service (tos) …
- The general packet classification problem has poor worst-case cost:
  - Given N arbitrary filters with k packet fields,
    - either the worst-case search time is Ω((log N)^(k-1)),
    - or the worst-case storage is O(N^k)
44
Problem Specification

- Given a set of filters (or rules), where each filter specifies
  - a class of packet headers based on K fields
  - an associated directive, which specifies how to forward packets matching this filter
- Goal: find the best matching filter for each incoming packet
- A packet P matches a filter F if every field of P matches the corresponding field of F
  - Exact match, prefix match, or range match
  - Assume prefix matching
45
Problem Specification (Cont.)

- Example of a Cisco Access Control List (ACL)
  1. access-list 100 deny udp 26.145.168.192 255.255.255.255 74.199.168.192 255.255.255.0 eq 2049
  2. access-list 100 permit ip 74.199.191.192 255.255.0.0 74.199.168.192 255.255.0.0
  3. access-list 100 permit tcp 250.197.149.202 255.0.0.0 74.199.20.76 255.0.0.0
- Packet: tcp 250.19.34.34 74.23.5.12 matches filter 3
46
Backtracking Search

[Figure: a trie containing the prefixes F1 = 00* and F2 = 10*]

- A trie is a binary branching tree, with each branch labeled 0 or 1
- The prefix associated with a node is the concatenation of all the bits on the path from the root to that node
47
Backtracking Search (Cont.)

- Extend to multiple dimensions
- Backtracking is a depth-first traversal of the tree that visits all the nodes satisfying the given constraints (see the sketch after this slide)
- Example: search for [00*, 0*, 0*]
48
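A minimal sketch of the one-dimensional building block behind backtracking search: a binary trie of prefixes plus a routine that enumerates every stored prefix matching a lookup key. Multi-dimensional classification then performs a depth-first traversal, recursing into the next field's trie at each matching node. Class and field names are illustrative.

    class TrieNode:
        def __init__(self):
            self.children = {}     # '0' / '1' -> TrieNode
            self.filter_id = None  # set if a prefix ends at this node

    def insert(root: TrieNode, prefix: str, filter_id) -> None:
        """Insert a prefix such as '00' (meaning 00*) labelled with filter_id."""
        node = root
        for bit in prefix:
            node = node.children.setdefault(bit, TrieNode())
        node.filter_id = filter_id

    def matching_filters(root: TrieNode, key: str):
        """All filters whose prefix matches the key, shortest prefix first."""
        node, matches = root, []
        if node.filter_id is not None:       # the empty prefix (*) matches everything
            matches.append(node.filter_id)
        for bit in key:
            node = node.children.get(bit)
            if node is None:
                break
            if node.filter_id is not None:
                matches.append(node.filter_id)
        return matches

    # Example with the prefixes from the previous slide, F1 = 00* and F2 = 10*:
    root = TrieNode()
    insert(root, "00", "F1")
    insert(root, "10", "F2")
    print(matching_filters(root, "0010"))   # -> ['F1']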
Trie Compression Algorithm

- If a path A→B satisfies the Compressible Property:
  - all nodes on its left point to the same place L
  - all nodes on its right point to the same place R
  then we compress the entire branch into 3 edges:
  - a center edge with value = (A→B) pointing to B
  - a left edge with value < (A→B) pointing to L
  - a right edge with value > (A→B) pointing to R
- Advantages of compression: save time & storage
49
Trading Storage for Time

- Smoothly trade off storage for time
[Figure: spectrum from exponential time to exponential space]
- Selective push
  - Push down the filters with large backtracking time
  - Iterate until the worst-case backtracking time satisfies our requirement
50
Example of Selective Push

Goal: worst-case memory accesses < 12

- The filter [0*, 0*, 0000*] has 12 memory accesses. Push the filter down → reduce lookup time
- Now the search cost of the filter [0*, 0*, 001*] becomes 12 memory accesses, so we push it down too. Done!
51
Using Available Hardware

- So far, we have focused on software techniques for packet classification
- Further improve performance by taking advantage of limited hardware if it is available
  - By moving some filters (or rules) from software to hardware
- Key issue: which filters to move from software to hardware?
- Answer:
  - To reduce lookup time, move the filters that require the largest number of memory accesses under the software approach
52
Summary

Approach: Trie compression algorithm
  Description: Effectively exploit redundancy in trie nodes
  Performance gain: Reduces lookup time by a factor of 2 – 5 and saves storage by a factor of 2.8 – 8.7

Approach: Selective push
  Description: "Push down" the filters with large backtracking time
  Performance gain: Reduces lookup time by 10 – 25% with only a marginal increase in storage

Approach: Moving filters from software to hardware
  Description: Heuristics to move a small number of filters from software to hardware
  Performance gain: Moving 10 – 20 rules to hardware cuts storage by 33% – 50%, or lookup time by 10% – 20%
53
Contributions

- Application layer
  - Study the Web workload of busy Web servers
  - Properly provision content distribution networks
- Transport layer
  - Optimize TCP startup performance for short Web transfers
- Network layer
  - Speed up packet classification
54
Other Work

Available at http://www.cs.cornell.edu/lqiu/papers/papers.html

- Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms on the Internet. Proceedings of IEEE INFOCOM 2000, Tel-Aviv, Israel, March 2000.
- On Individual and Aggregate TCP Performance. 7th International Conference on Network Protocols (ICNP '99), Toronto, Canada, October 1999.
- Understanding the End-to-End Performance Impact of RED in a Heterogeneous Environment. July 2000. Submitted to INFOCOM 2001.
55
Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms

- Internet telephony is subject to
  - variable loss rate
  - variable delay
- Previous work has addressed the two problems separately
  - Use FEC for loss recovery
  - Use playout buffer adaptation for delay jitter compensation
56
Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms (Cont.)

- Our work
  - Demonstrate the interaction between the playout algorithm and FEC
    - The playout algorithm should depend on the FEC scheme as well as on network loss conditions and network jitter
  - Propose several playout algorithms that provide this coupling
  - Demonstrate the effectiveness of the algorithms through simulations
57
On Individual and Aggregate TCP Performance

- Motivation
  - TCP behavior under many competing TCP connections has not been sufficiently explored
- Our work
  - Use extensive simulations to investigate the individual and aggregate TCP performance of many concurrent connections
58
On Individual and Aggregate TCP Performance (Cont.)

- Major findings
  - When all connections have the same RTT:
    - Wc > 3*Conn → global synchronization
    - Conn < Wc < 3*Conn → local synchronization
    - Wc < Conn → some connections are shut off
  - Adding random processing time → synchronization and consistent discrimination become less pronounced
  - Derive a general characterization of overall throughput, goodput, and loss probability
  - Quantify the round-trip bias for connections with different RTTs
59
Understanding the End-to-End Performance Impact of RED in a Heterogeneous Environment

- Motivation
  - IETF recommends widespread deployment of RED in routers
  - Most previous work studies RED in relatively homogeneous environments
- Our work
  - Investigate the interaction of RED with five types of heterogeneity
60
Understanding the End-to-End Performance Impact of RED in a Heterogeneous Environment (Cont.)

- Major findings
  - Mix of short and long TCP connections
    - Short TCP connections get higher goodput with RED than with Drop Tail
  - Mix of TCP and UDP
    - Bursty UDP tends to get a lower loss rate with RED than with Drop Tail
  - Mix of ECN-capable and non-ECN-capable traffic
    - ECN-capable TCP connections get higher goodput than non-ECN-capable TCP connections
  - Effect of different RTTs
    - RED reduces the bias against long-RTT bulk transfers
  - Effect of two-way traffic
    - When the ACK path is congested, TCP gets higher goodput with RED than with Drop Tail
61
Effects of Imperfect Knowledge about Input Data
62
Effects of Imperfect Knowledge about Input Data (Cont.)

- The effect of imperfect topology information
  - Randomly remove from 0 up to 50% of the edges in the AS topology derived from the BGP routing tables
  - The greedy algorithm is insensitive to edge removal
    - Performs within a factor of 2.6 of optimal even when 50% of the edges are removed
63