Berkeley-Helsinki Summer Course
Lecture #12: Introspection and
Adaptation
Randy H. Katz
Computer Science Division
Electrical Engineering and Computer Science Department
University of California
Berkeley, CA 94720-1776
1
Outline
• Introspection Concept and Methods
• SPAND Content-Level Adaptation
• MIT Congestion Manager/TCP-Layer Adaptation
• iCAP Cache-Layer Adaptation
2
Outline
• Introspection Concept and Methods
• SPAND Content-Level Adaptation
• MIT Congestion Manager/TCP-Layer Adaptation
• iCAP Cache-Layer Adaptation
3
Introspection
• From Latin introspicere, “to look within”
– Process of observing the operations of one’s own mind with
a view to discovering the laws that govern the mind
• Within the context of computer systems
– Observing how a system is used (observe): usage patterns,
network activity, resource availability, denial of service
attacks, etc.
– Extracting a behavioral model from such use (discover)
– Using this model to improve the behavior of the system (act),
making it more proactive, rather than reactive, to how it is
used
– Improve performance and fault tolerance, e.g., deciding
when to make replicas of objects and where to place them
4
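As an annotation to the slide above: the observe/discover/act loop it describes can be made concrete in a few lines. The Python sketch below is purely illustrative; the names (UsageModel, make_replica, the threshold) are invented for this example and belong to no real system.

from collections import Counter

class UsageModel:
    """Toy behavioral model: per-object access counts."""
    def __init__(self):
        self.accesses = Counter()

    def observe(self, object_id):
        # observe: record each use of an object
        self.accesses[object_id] += 1

    def hot_objects(self, threshold=10):
        # discover: extract a simple pattern from the observations
        return [o for o, n in self.accesses.items() if n >= threshold]

def adapt(model, make_replica):
    # act: proactively replicate frequently used objects
    for obj in model.hot_objects():
        make_replica(obj)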
Introspection in Computer
Systems
• Locality of Reference
– Temporal: objects that are used are likely to be used again
in the near future
– Geographic: objects near each other are likely to be used
together
• Exploited in many places
– Hardware caches, virtual memory mechanisms, file caches
– Object interrelationships
– Adaptive name resolution
– Mobility patterns
• Implications
– Prefetching/prestaging
– Clustering/grouping
– Continuous refinement of behavioral model
5
Example: Wide-Area Routing and
Data Location in OceanStore
• Requirements
– Find data quickly, wherever it might reside
» Locate nearby data without global communication
» Permit rapid data migration
– Insensitive to faults and denial of service attacks
» Provide multiple routes to each piece of data
» Route around bad servers and ignore bad data
– Repairable infrastructure
» Easy to reconstruct routing and location information
• Technique: Combined Routing and Data Location
– Packets are addressed to GUIDs, not locations
– Infrastructure gets the packets to their destinations and
verifies that servers are behaving
John Kubiatowicz
6
Two-levels of Routing
• Fast, probabilistic search for “routing cache”
– Built from attenuated Bloom filters
– Approximation to gradient search
– Not going to say more about this today
• Redundant Plaxton Mesh used for underlying
routing infrastructure:
– Randomized data structure with locality properties
– Redundant, insensitive to faults, and repairable
– Amenable to continuous adaptation to adjust for:
» Changing network behavior
» Faulty servers
» Denial of service attacks
John Kubiatowicz
7
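An aside on the attenuated Bloom filters mentioned above, for readers who have not seen them: each neighbor link keeps an array of Bloom filters in which level i summarizes the documents reachable within i+1 hops through that link, and the probabilistic search forwards a query on the link that matches at the shallowest level. The Python sketch below is a minimal illustration under those assumptions, not Tapestry's actual data structure; filter sizes and hash choices are arbitrary.

import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _hashes(self, key):
        # k independent hash positions derived from SHA-1
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, key):
        for h in self._hashes(key):
            self.bits |= 1 << h

    def __contains__(self, key):
        return all(self.bits >> h & 1 for h in self._hashes(key))

class AttenuatedBloomFilter:
    """Level i summarizes documents reachable within i+1 hops on one link."""
    def __init__(self, depth=4):
        self.levels = [BloomFilter() for _ in range(depth)]

    def first_match(self, key):
        # Depth of the shallowest level claiming the key, or None
        for i, f in enumerate(self.levels):
            if key in f:
                return i
        return None

def best_link(links, key):
    # Gradient-style search: forward the query on the link whose
    # filter matches at the smallest depth. links: name -> filter.
    matches = {n: f.first_match(key) for n, f in links.items()}
    candidates = {n: d for n, d in matches.items() if d is not None}
    return min(candidates, key=candidates.get) if candidates else None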
Basic Plaxton Mesh
Incremental suffix-based routing
[Figure: a Plaxton mesh of nodes with 4-digit hex IDs (0x79FE, 0x23FE, 0x43FE, 0x73FE, 0x44FE, 0x04FE, 0x13FE, 0xABFE, 0x993E, 0x035E, 0x555E, 0x423E, 0x239E, 0x73FF, 0xF990, 0x9990, 0x1290); links are labeled 1-4 by suffix level, so each hop matches one more trailing digit of the destination ID]
John Kubiatowicz
8
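The incremental suffix-based routing in the figure above can be sketched as follows. This is a hedged illustration of the Plaxton idea, assuming each node keeps a table indexed by suffix level and next digit; it is not the actual Tapestry implementation.

def shared_suffix_len(a, b):
    # Number of trailing characters the two IDs share
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(node_id, dest_id, routing_table):
    # routing_table[level][digit] -> neighbor whose ID already matches
    # the destination in the last `level` digits and has `digit` next
    level = shared_suffix_len(node_id, dest_id)
    if level == len(dest_id):
        return None                     # arrived at the destination/root
    digit = dest_id[-1 - level]         # next suffix digit to resolve
    return routing_table[level].get(digit)

# E.g., toward 0x43FE a route fixes digits right to left:
# xxxE -> xxFE -> x3FE -> 43FE, one more suffix digit per hop.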
Use of Plaxton Mesh
Randomization and Locality
John Kubiatowicz
9
Use of the Plaxton Mesh
(Tapestry Infrastructure)
• As in original Plaxton scheme:
– Scheme to directly map GUIDs to root node IDs
– Replicas publish toward a document root
– Search walks toward root until a pointer is located → locality!
• OceanStore enhancements for reliability:
– Documents have multiple roots (Salted hash of GUID)
– Each node has multiple neighbor links
– Searches proceed along multiple paths
» Tradeoff between reliability and bandwidth?
– Routing-level validation of query results
• Dynamic node insertion and deletion algorithms
– Continuous repair and incremental optimization of links
John Kubiatowicz
10
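A minimal sketch of the "salted hash of GUID" idea above: hashing the GUID with several well-known salts yields several independent root IDs, so publishes and searches can proceed along multiple paths. The hash function and ID width below are arbitrary choices for illustration.

import hashlib

def root_ids(guid: str, n_roots: int = 3, id_hex_digits: int = 4):
    # All replicas and searchers compute the same roots for a GUID,
    # so any one reachable root suffices to locate the document.
    roots = []
    for salt in range(n_roots):
        h = hashlib.sha1(f"{salt}:{guid}".encode()).hexdigest()
        roots.append(h[:id_hex_digits].upper())
    return roots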
OceanStore Domains for
Introspection
• Network Connectivity, Latency
– Location tree optimization, link failure recovery
• Neighbor Nodes
– Clock synchronization, node failure recovery
• File Usage
– File migration
– Clustering related files
– Prefetching, hoarding
• Storage Peers
– Accounting, archive durability, blacklisting
• Meta-Introspection
– Confidence estimation, stability
Dennis Geels, [email protected]
11
Common Functionality
• These targets share some requirements:
– High input rates
» Watch every file access, heartbeat, packet
transmission
– Both short- and long-term decisions
» Respond to changes immediately
» Extract patterns from historical information
– Hierarchical, Distributed Analysis
» Low levels make decisions based on local information
» Higher levels possess broader, approximate knowledge
» Nodes must cooperate to solve problem
• We can build shared infrastructure
Dennis Geels, [email protected]
12
Architecture for Wide-Area
Introspection
• Fast Event-Driven Handlers
– Filter and aggregate incoming events
– Respond immediately if necessary
• Local Database, Periodic Analysis
– Store historical information for trend-watching
– Allow more complicated, off-line algorithms
• Location-Independent Routing
– Flexible coordination, communication
Dennis Geels, [email protected]
13
Event-Driven Handlers
• Treat all incoming data as events: messages,
timeouts, etc.
– Leads to natural state-machine design
– Events cause state transitions, finite processing time
– A few common primitives could be powerful: average, count,
filter by predicate, etc.
• Implemented in “small language”
– Contains important primitives for aggregation, database access
– Facilitates implementation of introspective algorithms
» Allows greater exploration, adaptability
– Can verify security, termination guarantees
• E.g., EVENT.TYPE==“file access”: increment COUNT in EDGES
where SRC==EVENT.SRC and DST==EVENT.DST (sketched in Python
below)
Dennis Geels, [email protected]
14
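The handler rule on the slide above translates directly into ordinary code. A sketch in Python (the "small language" itself is not specified further in these slides):

from collections import defaultdict

EDGES = defaultdict(int)   # (src, dst) -> access count

def handle(event):
    # Events cause a state transition in bounded time; this one just
    # aggregates, feeding the off-line clustering on the next slide.
    if event["type"] == "file access":
        EDGES[(event["src"], event["dst"])] += 1

handle({"type": "file access", "src": "dirA", "dst": "fileB"})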
Local Database, Periodic
Analysis
• Database Provides Powerful, Flexible Storage
– Persistent data allows long-term analysis
– Standard interface for event handler scripting language
– Leverage existing aggregation functionality
» Considerable work from Telegraph Project
– Can be lightweight
• Sophisticated Algorithms Run On Databases
– Too resource-intensive to operate directly on events
– Allow use of full programming language
– Security, termination still checkable; should use common mechanisms
• E.g., expensive clustering algorithm operating over edge graph,
using sparse-matrix operations to extract eigenvectors
representing related files
Dennis Geels, [email protected]
15
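For concreteness, the kind of off-line analysis described above (clustering the file-access edge graph via sparse-matrix eigenvectors) might look like the spectral-partitioning sketch below. It assumes SciPy and is only an illustration; the slides do not give OceanStore's actual algorithm.

import numpy as np
from scipy.sparse import coo_matrix, diags
from scipy.sparse.linalg import eigsh

def related_files(edges, n_files):
    """Two-way partition of files by the sign of the Fiedler vector of
    the access-graph Laplacian. edges: dict (src, dst) -> count, over
    file indices 0..n_files-1; requires n_files > 2."""
    rows, cols, vals = zip(*((s, d, float(c)) for (s, d), c in edges.items()))
    w = coo_matrix((vals, (rows, cols)), shape=(n_files, n_files))
    w = (w + w.T).tocsc()                        # symmetrize the graph
    lap = diags(np.ravel(w.sum(axis=1))) - w     # graph Laplacian
    _, vecs = eigsh(lap, k=2, which="SM")        # two smallest eigenpairs
    return vecs[:, 1] > 0                        # sign of Fiedler vector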
Location-Independent Routing
• Not a very good name for a rather simple idea.
Interesting introspective problems are inherently distributed;
coordination among nodes is difficult. Needed:
– Automatically create/locate parents in aggregation hierarchy
– Path redundancy for stability, availability
– Scalability
– Fault tolerance, responsiveness to fluctuation in workload
• OceanStore's data location system shares these
requirements. This coincidence is not surprising,
as each is an instance of wide-area distributed
problem solving.
• Leverage OceanStore Location/Routing System
Dennis Geels, [email protected]
16
Summary: Introspection in
OceanStore
• Recognize and share a few common mechanisms
– Efficient event-driven handlers
– More powerful, database-driven algorithms
– Distributed, location-independent routing
• Leverage common architecture to allow system
designers to concentrate on developing &
optimizing domain-specific algorithms
Dennis Geels, [email protected]
17
Outline
• Introspection Concept and Methods
• SPAND Content-Level Adaptation
• MIT Congestion Manager/TCP-Layer Adaptation
• iCAP Cache-Layer Adaptation
18
SPAND Architecture
Mark Stemm
19
SPAND Architecture
Mark Stemm
20
What is Needed
• An efficient, accurate, extensible, and time-aware system that makes shared, passive
measurements of network performance
• Applications that use this performance
measurement system to enable or improve
their functionality
Mark Stemm
21
Issues to Address
• Efficiency: What are the bandwidth and
response time overheads of the system?
• Accuracy: How closely does predicted value
match actual client performance?
• Extensibility: How difficult is it to add new
types of applications to the measurement
system?
• Time-aware: How well does the system adapt
to and take advantage of temporal changes in
network characteristics?
Mark Stemm
22
SPAND Approach: Shared
Passive Measurements
Mark Stemm
23
Related Work
• Previous work to solve this problem
– Use active probing of network
– Depend on results from a single host (no sharing)
– Measure the wrong metrics (latency, hop count)
• NetDyn, NetNow, Imeter
– Measure latency and packet loss probability
• Packet Pair, bprobes
– With Fair Queuing, measures “fair share” of bottleneck link b/w
– Without Fair Queuing, unknown (min is close to link b/w)
• Pathchar
– Combines traceroute & packet pair to find hop-by-hop latency &
link b/w
• Packet Bunch Mode
– Extends back-to-back technique to multiple packets for greater
accuracy
Mark Stemm
24
Related Work
• Probing Algorithms
– Cprobes: sends small group of echo packets as a simulated
connection (w/o flow or congestion control)
– Treno: like above, but with TCP flow/congestion control
algorithms
– Network Probe Daemon: traces route or makes short
connection to other network probe daemons
– Network Weather Service: makes periodic transfers to
distributed servers to determine b/w and CPU load on
each
Mark Stemm
25
Related Work
• Server Selection Systems
– DNS to map name to many servers
» Either round-robin or load balancing
– Boston University: uses cprobes, bprobes
– Harvest: uses round trip time
– Harvard: uses geographic location
– Using routing metrics:
» IPv6 Anycast
» HOPS
» Cisco Distributed Director
» University of Colorado
– IBM WOM: uses ping times
– Georgia Tech: uses per-application, per-domain probe
clients
Mark Stemm
26
Comparison with Shared
Passive Measurement
• What is measured?
– Others: latency, link b/w, network b/w
– SPAND: actual response time, application specific
• Where is it implemented?
– Others: internal network, at server
– SPAND: only in client domain
• How much additional traffic is introduced?
– Others: tens of Kbytes per probe
– SPAND: small performance reports and responses
• How realistic are the probes?
– Others: artificially generated probes that don’t
necessarily match realistic application workloads
– SPAND: actual observed performance from applications
Mark Stemm
27
Comparison with Shared
Passive Measurement
• Does the probing use flow/congestion control?
– Others: no
– SPAND: whatever the application uses (usually yes)
• Do clients share performance information?
– Others: no; sometimes probes are made on behalf of
clients
– SPAND: yes
Mark Stemm
28
Benefits of Sharing and
Passive Measurements
• Two similarly connected hosts are likely to
observe the same performance to distant hosts
• Sharing measurements implies redundant
probes can be eliminated
Mark Stemm
29
Benefits of Passive Measurements
Mark Stemm
30
Design of SPAND
Mark Stemm
31
Design of SPAND
• Modified Clients
– Make Performance Reports to Performance Servers
– Send Performance Requests to Performance Servers
• Performance Servers
– Receive reports from clients
– Aggregate/post process reports
– Respond to requests with Performance Responses
• Packet Capture Host
– Snoops on local traffic
– Makes Performance Reports on behalf of unmodified clients
Mark Stemm
32
Design of SPAND
33
Design of SPAND
• Application Classes
– Way in which an application uses the network
– Examples:
» Bulk transfer: uses flow control, congestion control,
reliable delivery
» Telnet: uses reliability
» Real-time: uses flow control and reliability
– (Addr, Application Class) is target of a Performance
Request/Report
Mark Stemm
34
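Pulling the last two slides together, a toy Performance Server keyed by (Addr, Application Class) might look like this. The message names (Performance Report/Request/Response) follow the slides; everything else (the median aggregate, field types) is an illustrative assumption.

from collections import defaultdict
from statistics import median

class PerformanceServer:
    def __init__(self):
        self.reports = defaultdict(list)   # (addr, app_class) -> samples

    def report(self, addr, app_class, response_time_s):
        # Performance Report: passively measured application response time
        self.reports[(addr, app_class)].append(response_time_s)

    def request(self, addr, app_class):
        # Performance Response: aggregate of prior shared reports, or None
        samples = self.reports.get((addr, app_class))
        return median(samples) if samples else None

ps = PerformanceServer()
ps.report("server.example.com", "bulk transfer", 2.4)
ps.report("server.example.com", "bulk transfer", 3.1)
print(ps.request("server.example.com", "bulk transfer"))  # 2.75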
Issues
• Accuracy
– Is net performance stable enough to make meaningful
Performance Reports?
– How long does it take before the system can service the
bulk of the Performance Requests?
– In steady state, what % of Performance Requests does
the system service?
– How accurate are Performance Responses?
• Stability
– Performance results must not vary much with time
• Implications of Connection Lengths
– Short TCP connections dominated by round trip time; long
connections by available bandwidth
Mark Stemm
35
Application of SPAND:
Content Negotiation
Web pages look good
on server LAN
Mark Stemm
36
Implications for Distant Access,
Overwhelmed Servers
Mark Stemm
37
Content Negotiation
Mark Stemm
38
Client-Side Negotiation Results
Mark Stemm
39
Server-Side Dynamics
Mark Stemm
40
Server-Side Negotiation:
Results
Mark Stemm
41
Content Negotiation Results
• Network is the bottleneck for clients and
servers
• Content negotiation can reduce download times
of web clients
• Content negotiation can increase throughput
of web servers
• Actual benefit depends on fraction of
negotiable documents
Mark Stemm
42
Outline
•
•
•
•
Introspection Concept and Methods
SPAND Content Level Adaptation
MIT Congestion Manager/TCP Layer Adaptation
ICAP Cache-Layer Adaptation
43
Congestion Manager
(Hari@MIT, Srini@CMU)
• The Problem:
– Concurrent flows compete for the same limited bandwidth
(especially on slow start!), each implementing its own congestion
response within the end node, with no shared learning: inefficient
• The Power of Shared Learning and Information Sharing
[Figure: flows f1, f2, ..., f(n) from a Server across the Internet to a Client]
44
Adapting to Network
[Figure: a new application flow f1 from Server to Client faces an unknown (?) Internet]
• New applications may not use TCP
– Implement new protocol
– Often do not adapt to congestion: not “TCP-friendly”
Need a system that helps applications learn
and adapt to congestion
45
State of Congestion Control
• Increasing number of concurrent flows
• Increasing number of non-TCP apps
Congestion Manager (CM): An end-system
architecture for congestion management
46
The Big Picture
[Figure: HTTP, TCP1, TCP2, Audio, Video1, Video2, and UDP flows share the Congestion Manager above IP through its API; the CM keeps per-macroflow statistics (cwnd, rtt, etc.)]
All congestion management tasks performed in CM
Applications learn and adapt using API
47
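The picture above implies a callback-style API: the application stays in control of what to send, while the CM decides when. The sketch below paraphrases that contract in Python; the method names are invented for illustration and are not the published CM API.

class CongestionManager:
    def __init__(self):
        self.cwnd, self.outstanding, self.queue = 4, 0, []

    def request_send(self, app_callback):
        # App asks for permission to send; CM calls back when allowed
        self.queue.append(app_callback)
        self._drain()

    def update(self, acked, lost):
        # Apps/protocols report network feedback; CM adapts cwnd (AIMD)
        self.outstanding -= acked + lost
        self.cwnd = max(1, self.cwnd // 2) if lost else self.cwnd + 1
        self._drain()

    def _drain(self):
        while self.queue and self.outstanding < self.cwnd:
            self.outstanding += 1
            cb = self.queue.pop(0)
            cb()   # CM grants a send; the app picks the data to transmit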
Problems
• How does CM control when and whose
transmissions occur?
– Keep application in control of what to send
• How does CM discover network state?
– What information is shared?
– What is the granularity of sharing?
Key issues: API and information sharing
48
The CM Architecture
[Figure: applications (TCP, conferencing app, etc.) sit above the API; on the sender, the Congestion Controller, Prober, and Scheduler communicate via the CM protocol with the Congestion Detector and Responder on the receiver]
49
Feedback about Network State
• Monitoring successes and losses
– Application hints
– Probing system
• Notification API (application hints)
50
Probing System
• Receiver modifications necessary
– Support for separate CM header
– Uses sequence number to detect losses
– Sender can request count of packets received
• Receiver modifications detected/negotiated via
handshake
– Enables incremental deployment
[Figure: the CM header is inserted between the IP header and the IP payload]
51
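A sketch of the receiver-side accounting the CM header enables: the sequence number lets the receiver detect gaps, and the sender can request a count of packets received. Field and method names below are illustrative.

class CMReceiver:
    def __init__(self):
        self.received = 0
        self.highest_seq = -1

    def on_packet(self, cm_seq):
        # Count every arrival and track the highest sequence number seen
        self.received += 1
        self.highest_seq = max(self.highest_seq, cm_seq)

    def loss_report(self):
        # Sender can request this count; gaps imply losses
        sent_upper_bound = self.highest_seq + 1
        return {"received": self.received,
                "lost_at_least": sent_upper_bound - self.received}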
Congestion Controller
• Responsible for deciding when to send a packet
• Window-based AIMD with traffic shaping
• Exponential aging when feedback low
– Halve window every RTT (minimum)
• Plug in other algorithms
– Selected on a “macro-flow” granularity
52
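The controller's default behavior, sketched: window-based AIMD plus exponential aging, halving the window for every RTT without feedback. Constants are illustrative.

class AIMDController:
    def __init__(self, cwnd=4.0):
        self.cwnd = cwnd

    def on_ack(self):
        self.cwnd += 1.0 / self.cwnd          # additive increase per window

    def on_loss(self):
        self.cwnd = max(1.0, self.cwnd / 2)   # multiplicative decrease

    def on_rtt_without_feedback(self):
        # Exponential aging: halve the window every silent RTT
        self.cwnd = max(1.0, self.cwnd / 2)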
Scheduler
• Responsible for deciding who should send a packet
• Hierarchical round robin
• Hints from application or receiver
– Used to prioritize flows
• Plug in other algorithms
– Selected on a “macro-flow” granularity
– Prioritization interface may be different
53
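A simplified, one-level sketch of the scheduler idea: weighted round-robin where the weights come from application or receiver hints. The real CM scheduler is hierarchical; this flattens it for illustration.

from itertools import cycle

class RoundRobinScheduler:
    def __init__(self, flows):
        self.flows = flows                 # flow name -> weight (from hints)

    def schedule(self, n_slots):
        # Each flow appears in the rotation once per unit of weight
        order = [f for f, w in self.flows.items() for _ in range(w)]
        slots = cycle(order)
        return [next(slots) for _ in range(n_slots)]

rr = RoundRobinScheduler({"audio": 2, "tcp1": 1, "tcp2": 1})
print(rr.schedule(8))   # audio gets half the transmission slots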
CM Web Performance
[Plot: sequence number vs. time (s) for TCP NewReno with and without CM]
CM greatly improves predictability and consistency
54
Layered Streaming Audio
[Plot: sequence number (0-450) vs. time (0-25 s) for a competing TCP flow, a TCP/CM flow, and an Audio/CM flow]
Audio adapts to available bandwidth
The combination of TCP and audio competes equally with normal TCP
55
Congestion Manager Summary
• CM enables proper & stable congestion behavior
• Simple API enables app to learn/adapt to network
state
• Improves consistency/predictability of net xfers
• CM provides benefit even when deployed at senders
alone
56
Outline
• Introspection Concept and Methods
• SPAND Content-Level Adaptation
• MIT Congestion Manager/TCP-Layer Adaptation
• iCAP Cache-Layer Adaptation
57
How Internet Content is
Delivered Today
[Figure: centralized servers (server farm, database, mainframe) in Boston and New York hold applications and multiple versions of content (English, Spanish); users reach them over last-mile broadband access (cable modems, DSL, dial-up, wireless); Internet caching and content delivery localizes content]
58
What is iCAP?
• iCAP lets clients send HTTP messages to
servers for “adaptation”
– In essence, an “RPC” mechanism (Remote Procedure Call)
for HTTP messages
• An adapted message might be a request:
– Modify request method, URL being requested, etc.
• ...or, it might be a reply
– Change any aspect of delivered content
• iCAP enables edge services
59
What iCAP is not (for now)
• A way to specify adaptation policy
• A configuration protocol
• A protocol that establishes trust between
previously unrelated parties
• In other words:
iCAP defines the how, not the who, when, or why
60
iCAP Makes Content Smarter!
iCAP Enables Local Services
[Figure: clients connect through an ISP network; a content distribution network or cache serves content locally, avoiding a congested, slow, distant, and/or expensive link across a large backbone ISP to the server farms]
Local sources of content: Better for everyone
(client, network, server)
61
Why iCAP?
[Figure: a web server or proxy hands messages to iCAP servers for compute-intensive operations: ad insertion, language translation, virus checking, transcoding, content filtering]
• Fast, Simple, Scalable
• Allows services to be customized
62
iCAP Benefits
• Very simple operation
– ICAP builds on HTTP GET and POST
• No proprietary APIs required
• Standards-based
• Leverages the latest Internet infrastructure
developments
• Fast, simple, scalable, and reliable
• Allows you to customize services
63
iCAP general design
• Simple, simple, simple: a CGI script should be able to
turn a web server into an iCAP server
• Based on HTTP (+special headers)
• Three modes:
– Modify a request
– Satisfy a request (like any other proxy)
– Modify a response
64
Request Modification
• The request is passed to the iCAP server
(almost) unmodified, just as a proxy would pass it
• The ICAP server sends back a modified
request, encapsulated in response headers
– Body, if any (e.g. POST), may also be modified
65
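A toy example of request modification, assuming the encapsulated request is just the raw HTTP request text (the actual iCAP encapsulation headers are not detailed in these slides). The rewriting policy shown, forcing a language variant of the URL, echoes the content-negotiation discussion earlier; it is purely hypothetical.

def modify_request(encapsulated_request: str) -> str:
    # Parse the request line, rewrite the URL, leave the rest intact
    lines = encapsulated_request.split("\r\n")
    method, url, version = lines[0].split(" ")
    if "lang=" not in url:
        url += ("&" if "?" in url else "?") + "lang=en"
    return "\r\n".join([f"{method} {url} {version}"] + lines[1:])

original = "GET http://example.com/page HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(modify_request(original).splitlines()[0])
# GET http://example.com/page?lang=en HTTP/1.1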
Request Modification
[Figure: (1) client sends a request to the proxy cache (iCAP client); (2-3) the proxy passes the request to the iCAP server, which returns a modified request; (4-5) the modified request continues to the origin server and the response returns; (6) the proxy delivers the response to the client]
ICAP server modifies a request;
modified request continues on its
way
66
Response Modification
• ICAP client always uses POST to send body
• Encapsulated in POST headers may also be:
– Headers used by user to request the object
– Headers used by origin server in its reply
• ICAP server replies with modified content
67
Response Modification
[Figure: (1) client sends a request to the proxy cache (iCAP client); (2-3) the proxy fetches the response from the origin server; (4-5) the proxy passes the response to the iCAP server, which returns modified content; (6) the proxy delivers it to the client]
ICAP server modifies a response from an origin server;
might be once, as the object is cached, or once per client served
68
Request Satisfaction
[Figure: (1) client sends a request to the proxy cache (iCAP client); (2) the proxy forwards it to the iCAP server, which (3-4) may contact the origin server itself; (5) the iCAP server returns the response, and (6) the proxy delivers it to the client]
ICAP server satisfies a request just like a proxy;
origin server MAY be contacted by ICAP server (or not)
69
Infinite Variations
• Allows innovations. You choose 3rd party
applications.
• iCAP enables many different kinds of apps!
– Edge content sources can pass pages to ad servers
– Expensive operations can be offloaded
– Content filters can respond either with an unmodified
request or HTML (“Get Back to Work!”)
70
Next Steps
• iCAP supporters continue to enhance the protocol
– Learn from solutions and fix “bugs”
– Build future functionality later
• IETF
– The ICAP Forum will submit the specification to the IETF for
draft RFC status in mid-2000
• Additional partners
– Software developers, infrastructure companies and
Internet content delivery service providers will be
solicited for participation
– Need to get on same page
71
More on iCAP
– Important iCAP info at http://www.i-cap.org
– Become an iCAP participant by sending e-mail to
[email protected]; a reply will be sent outlining
requirements
72