Download IPAM-topology - Computer Science

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

List of wireless community networks by region wikipedia , lookup

Transcript
New Directions in Internet Topology
Measurement and Modeling
John Byers
Topology Modeling Group, Boston University
CS: Mark Crovella, Marwan Fayed, Anukool Lakhina,
Ibrahim Matta, Alberto Medina
Physics: Paul Krapivsky, Sid Redner
Statistics: Eric Kolaczyk
Some observations about the Internet
• Rapid, decentralized growth:
– 90% of Internet systems were added in the last four
years
– Connecting to the network can be a purely local operation
• This rapid, decentralized growth has opened
significant questions about the physical structure
of the network; e.g.,
–
–
–
–
–
The number of hosts connected to the network
The properties of network links (delay, bandwidth)
The interconnection pattern of hosts and routers
The interconnection relationships of ISPs
The geographic locations of hosts, routers and links
Approaching the Internet Scientifically
• Engineering or Science?
Engineering: study of things made
Science: study of things found
• Although the Internet is an engineered
artifact, it now presents us with questions
that are better approached from a scientific
posture.
• Worthwhile scientific goals
– Understand what drives Internet growth
– Basic investigations pay off in unexpected ways
Talk Organization
• A brief retrospective.
• Towards a scientific understanding of the
Internet.
• Specific directions:
– Geometry/geography-driven topology generation.
• Where are the nodes/links in the Internet (recap)
• What is the geographical extent of ASes?
– Measuring and modelling the time-evolution of AS sizes.
Case Study: “Origins” paper, [MMB ‘00]
• Goal: Causal explanation for then unexplained
power-laws in [FFF ‘99].
• Our hypothesis: Simple Barabasi-Albert model of
incremental growth, preferential attachment.
• Model led to topologies which fit known metrics…
• BUT
– How to validate this explanation?
– [FFF ‘99] snapshots inadequate for testing hypotheses
about time-evolution of the system.
– In fact, no adequate set of measurements available.
• Also, much easier to invalidate than validate.
Case Study: “Origins” paper, [MMB ‘00]
• Not an entirely wasted effort...
• Some positive outcomes:
– BRITE/BRIANA topology generation framework /
analysis engine for testing wide assortment of models.
– Motivation to focus on modeling problems where
validation was possible…
– … or better yet, to start from measurements themselves.
• And some future considerations:
– Do topology models need to be explanatory, or just
descriptive?
– How to place value on a model that cannot be validated?
Our current approach
• Measurement:
– understand topological features from direct study
– leverage measurements from others as possible
– measurements of time-evolution of a system are
especially helpful
• Characterization and Modeling
• Validation: provide empirical confirmation that
model predictions fit measured data
• Tool-Building: build our models / other models
into BRITE (open source).
Breathing life into topology generation
• Raw topology analogous to a skeleton
– presents coarse structure, but incomplete, inanimate
– inadequate for conducting most simulations
• Flesh out by building annotated graphs:
–
–
–
–
Label nodes with autonomous system (AS) ID’s.
Label edges with link bandwidths.
Label edges with latencies.
Do this in a representative manner.
• Animate the topology:
– Generate representative traffic workloads across the
annotated graph.
– Consider other dynamic factors (churn, link failures)
• Now we’re ready to conduct a simulation.
One primary direction: Geography
• Long-term goal: annotated graph generation:
– Label nodes with autonomous system (AS) ID’s.
– Label edges with link bandwidths and latencies.
• How?
– All of the problems seem more tractable when we
consider the underlying geometry of the network.
• But next to nothing is known about the geometry/
geography of today’s Internet.
– Geographic extent of ASes?
– Distribution of link lengths? Inter-AS link lengths?
• Our first step (now complete): measurements.
Where is the Internet?
?
?
?
?
?
Assumptions and Definitions
• We treat the Internet as an undirected
graph embedded on the Earth’s surface
–
–
–
–
Nodes correspond to routers or interfaces
Edges correspond to physical router-router links
Routers associated with an administering AS
Not concerned with hosts (end systems)
• We will ignore many higher and lower level
questions
Our Basic Approach
• Obtain IP-level router maps
Mercator and Skitter
• Find geographic location of each router
Ixia’s IxMapping; Akamai’s EdgeScape
• Identify AS associated with each router
RouteViews
Mercator: Govindan et al., USC/ISI, ICSI
• Based on active probing from a single site
• Resolves aliases
• Uses loose source routing to explore alternate
paths
Skitter: Moore et al., CAIDA
• Traceroutes from 19 monitors to large set of
destinations
• Does not resolve aliases
• Destinations attempt to cover IP address space
Datasets
Mercator
• Collected August 1999
• 228,263 routers
• 320,149 links
Skitter
• Collected January 2002
• 704,107 interfaces
• 1,075,454 links
IxMapping: Moore et al., CAIDA
• Given an IP address, infers geographic location
based on a variety of heuristics
– Hostnames, DNS LOC, whois
e.g., 190.ATM8-0-0.GW3.BOS1.ALTER.NET is in Boston
• Able to map over 98% of routers/interfaces
• Similar to GeoTrack [Padmanabhan] which exhibits
reasonable accuracy
– Median error of 64 mi
– 90% queries within 250 mi
EdgeScape: Akamai
• Given an IP address, returns lat./long.
• Methods employed not currently published;
available as a commercial service.
• Claims mean error of < 50 miles
• Able to map over 99% of routers/interfaces
RouteViews
• Provides daily BGP table snapshots
• For each of the router/interface inventories, we
pull a BGP snapshot from the same date.
• Then, for each interface, infer the associated AS
by the AS advertising the containing block.
• For routers with multiple interfaces, use the
majority vote; discard if there is no majority vote
(2% of all routers).
Where are the routers?
USA
Europe
Interfaces and People: USA, Skitter
Grid size: ~90 mi x 90 mi
Routers and People
Upper, Mercator; Lower, Skitter
USA
Europe
Japan
Router Location: Summary
• Router location is strongly driven by
population density
• Superlinear relationship between router
and population density:
R  k Pa
k varies with economic development (users online)
a is greater than one (1.2 - 1.7)
• More routers per person in more densely populated
areas
Link Preference Function
• Interested in influence of distance on link
formation:
f(d) = P[C|d]
i.e., Probability two nodes separated by
distance d are directly connected by a link.
• Estimated as:
number links of length d
f(d) = ------------------------------------------number of router pairs separated by d
f(d) for USA (Skitter)
Distance Sensitive
Distance Insensitive
Link Distance Preference for USA
Skitter, d < 250, semi-log plot
L  140 mi.
Large d: distance insensitivity
USA data, Skitter
d
F(d) =  f(u)
u=1
Link Formation: Summary
• Link formation seems to be a mixture of
distance-dependent and –independent
processes
• Waxman (exponential) model remarkably
good for large fraction of all links!
– But, crucial difference is that we are using a
very irregular spatial distribution of nodes
• Small fraction of non-local links are very
important (structural)
Where are the ASes?
• Two measures:
– # of distinct locations (grid cells)
– area of the convex hull of the set of distinct
locations
• Computing convex hull
– cut earth along Int’l Date Line and unroll
– use Albers equal area projection to
approximately preserve areas
AS Findings (1)
• [TDGJSW ‘01] Size of an AS (in routers)
and AS degree are well correlated.
• We find 3-way correlation between size,
degree and # of distinct locations.
• Distribution of number of locations is longtailed, highly variable.
AS Findings (2)
• 80% of ASes have 0 area -- two or fewer
distinct locations.
• The rest of the ASes fall in two regimes:
– small ASes have considerable variability
– largest ASes are fully dispersed
– cutoff: degree > 100 or interfaces > 1000.
• size, degree and # of distinct locations.
Direction 2: AS Size Distribution
• Goal: Model the growth and evolution of
ASes and their sizes.
• Bonus: RouteViews BGP logs may later help
validate model predictions.
• Hosts enter the system and either:
– Create a new AS or
– Join an existing AS
• At each timestep, a pair of ASes may also
merge.
The Simplest Plausible Model
• N(t) = number of Ases
• M(t) = number of hosts
dN/dt = (q-r)N
dM/dt = pM + qN
• where q is the rate of new AS creation
• r is the rate of AS coalescence
• p is the rate of creation of new nodes
• Relative values of p, q and r determine
average AS size.
• When p > q - r, average AS size grows as
N^((p-q+r)/(q-r)).
Preliminary Findings
• Model behavior tractable to analyze.
• AS births, deaths and mergers can be
identified with some degree of confidence
from RouteViews logs.
– But… differentiating BGP churn from bona fide
events can be challenging
– Statistical de-noising methods may apply (?)
• Simple model makes reasonable predictions
– But… coalescence kernel needs fine-tuning, i.e.
measurements indicate that r is not size-indep.
In Conclusion
• Generating test networks rather than test
topologies is a natural next step.
• Geometry/geography provides leverage.
• Plenty of unexplored territory.
• Validation and measurement continues to
be underappreciated.
• Measurements of time-evolving systems
are in particularly short supply.
• Modeling problems can be a bonanza for
statisticians and physicists.