Trends in Internet Measurement
Fall, 2004
Paul Barford
Assistant Professor
Computer Science
University of Wisconsin

Motivation
• The Internet is gigantic, complex, and constantly evolving
  – Began as something quite simple
• Infrequent use of “scientific method” in network research
  – Historical artifact
  – Lack of inherent measurement capability
  – Decentralization and privacy concerns
• Recognition of importance of empirically-based research
  – Critical trend over past five years (Internet Measurement Conf.)
• Good research hypothesis + good data + good analysis = good research results
  – Focus of this talk: “good data” - where we’ve been and where we’re going

wail.cs.wisc.edu

In the beginning…
• Measurement was part of the original ARPAnet in ’70
  – Kleinrock’s Network Measurement Center at UCLA
  – Resources in the network were reserved for measurement
• Formation of Network Measurement Group in ’72
  – RFC 323 – who is involved and what is important
• First network measurement publication in ’74
  – “On Measured Behavior of the ARPA Network,” Kleinrock and Naylor
• No significant difference between operations and research
  – Size kept things tractable

From ARPAnet to Internet
• In the 80’s, measurement-based publications increased
  – “The Experimental Literature of the Internet: An Annotated Bibliography”, J. Mogul, ’88.
• RFC 1262 – Guidelines for Internet Measurement Activities, 1991
  – V. Cerf, “Measurement of the Internet is critical for future development, evolution and deployment planning.”
• What happened?
• “On the Self-Similar Nature of Ethernet Traffic”, Leland et al., ’94.
  – Novel measurement combined with thorough analysis
  – A transition point between operational and research measurement (?)

Gold in the streets in the 90’s
• Lots of juicy problems garnered much attention in the 90’s
  – Transport, ATM, QoS, Multicast, Lookup scalability, etc.
• The rise of simulation (aaagggghhhhh!!!!)
• Measurement activity didn’t die…
  – Research focus on Internet behavior and structure
    • Self-similarity established as an invariant in a series of studies
    • Paxson’s NPD study from ’93 to ’97
    • Routing (BGP) studies by Labovitz et al.
    • Structural properties (the Internet as a graph) by Govindan et al.
  – Organizations focused on measurement
    • National Laboratory for Applied Network Research (’95)
    • Cooperative Association for Internet Data Analysis (’97)

Measurement must be hard…
• Well, not really…lots of folks are measuring the Internet
  – See CAIDA or SLAC pages
  – Businesses get paid to measure the Internet
• Lots of tools are available for Internet measurement
  – See CAIDA and SLAC pages
  – Dedicated hardware
  – Public infrastructures

So, what’s the problem?
• “Strategies for Sound Internet Measurement,” Paxson ’04.
  – Lack of consistent methods for measurement-based experiments
  – Problems faced in other sciences years ago
• Issues of scale in every direction
  – What is representative?
  – HUGE, HIGH-DIMENSION data sets make things break
• Disconnect between measurements for operations and measurements for research
  – Operational interests: SLAs, billing, privacy, …
  – Research interests: network-wide properties

Current measurement trends
1. Open end host network measurement infrastructures
   • Available for a variety of uses
2. Large public data repositories
   • Domain specific
   • Suitable for longitudinal studies
3. Network telescope monitors
   • Malicious traffic
4. Laboratory-based testbeds
   • Bench environments
5. Standard anonymization methods
   • Address privacy concerns

End host infrastructures
• Paxson’s NPD study; an end-host prototype
  – Accounts on 35 systems distributed throughout the Internet
  – Active, end-to-end measurement focus
• National Internet Measurement Infrastructure (NIMI) and others evolved from NPD
  – Perhaps a bit too ambitious at the time
• Today’s end host infrastructure “success story”: PlanetLab

PlanetLab overview
• Collaboration between Intel, Princeton, Berkeley, Washington, others starting in early ’02
• Began as a distributed, virtualized system project
  – Peer-to-peer overlay systems were getting hot
  – Applications BOF at SIGCOMM ’02 had only 6/80 people
• Systems were donated to an initial set of sites in ’02
  – Most major universities and Abilene POPs
• Available to members who host systems
• Developers have done a fine job creating a management environment
  – Isolates individual experiments from each other

PlanetLab sites
[Figure: map of PlanetLab sites]
449 nodes at 209 sites (source: www.planet-lab.org)

End host infrastructures & SP
• End host infrastructures are primarily for active measurement
  – Generate probes and measure responses
• Problem domains
  – Network structure via tomography
  – Network distance estimation
  – End-to-end resource estimation
  – End-to-end packet dynamics

Large public data repositories
• First data repository - Internet Traffic Archive (LBL)
  – Hodgepodge of traces from various projects
• Current projects are more focused
  – Passive Measurement and Analysis Project: packet traces from high performance monitors
  – Abilene Observatory: flow traces from the Internet2 backbone routers
  – Route Views/RIPE: BGP routing updates from ~150 networks
  – Datasets for network security: DHS project focused on making attack and intrusion data available for research

Data repositories & SP
• Most of the data in the aforementioned repositories was gathered via passive means
  – Counters/logs on devices
  – Installed instrumentation
  – Configuration to measure specific traffic (BGP)
• Problem domains
  – Anomaly detection
  – Traffic dynamics
  – Routing dynamics

Network telescopes
• Simple observation 1: number of globally routed IP addresses ≠ number of live hosts
  – Network address translation
  – Networks (ranges of IP addresses) are routed
• Simple observation 2: traffic to/from standard services should only arrive at live hosts
  – Misconfigurations and malicious traffic are the exceptions
• Network telescope = traffic monitor on routed but otherwise unused IP addresses
  – This traffic is otherwise usually dropped at the border router

So, what’s the point?
• Bad guys don’t know which IP addresses in a network are live
  – Random and systematic scanning commonly used
  – Spoofed source addresses are used in DoS attacks
  – Misconfigurations are fairly rare
• Ergo, network telescopes can provide important perspective on malicious traffic
  – Most importantly, a clean signal
• Implementation is fairly simple
  – Honeypots of live systems or honeypot specific monitors

What do we see?
• “Characteristics of Internet Background Radiation,” Yegneswaran et al., ’04.
  – Active monitors (small, medium, large) at 3 sites
• Traffic is dominated by activity on common services
  – Worms and probes targeting HTTP and NetBIOS
  – The focus of our study
• Traffic is highly variable and diverse
  – Perspectives from 3 monitors are quite different
• Traffic mutates rapidly
• Much deeper analysis is necessary

Network telescopes & SP
• An emerging, rich source of data
• Network security is critically important
• Problem domains
  – Outbreak and attack detection
  – Collaborative monitoring
  – Dynamic quarantine
  – (Misconfiguration analysis)

Laboratory-based testbeds
• Most scientific disciplines commonly use bench environments to conduct research
  – Control
  – Instrumentation
  – Repeatability
• Network research community has relied on analytic modeling, simulation and empirical measurement
• Openly available bench environments for network research are emerging
  – EMULAB at Utah - collection of end hosts
  – Wisconsin Advanced Internet Lab - collection of routers and end hosts

Laboratory testbeds & SP
• The effectiveness of bench research hinges on research hypothesis and experimental design
  – Aspects of scale (emergent behavior) are difficult to capture
• Problem domains
  – Inference tool analysis
  – Protocol (implementation) analysis
  – Anomaly detection
  – Network system evaluation

Data anonymization
• Lots of people measure, most are scared s*!#less about sharing data
  – This is a legal issue
  – No standards (sure, payloads are off limits, but addresses?)
  – Don’t ask, don’t tell?
• The community is developing tools for trace anonymization
  – “A High-Level Programming Environment for Packet Trace Anonymization and Transformation,” Pang et al., ’03.
  – Prefix preserving address anonymization
  – Payload hashing
• Probably no direct SP application
  – But, implications vis-à-vis future data availability

Summary
• Trends over past 30 years
  – Divergence of research and operations
  – Decline of importance of measurement in research
  – Empirical study of the Internet as an artifact
• Current trends
  – Rise of measurement as a discipline
  – Open infrastructures and network testbeds
  – Large-scale domain specific data repositories
  – Novel measurement methods
• Future trends
  – ??
  – Embedded measurement systems
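
Illustration: prefix preserving address anonymization

The data anonymization slide names prefix preserving address anonymization without showing what it involves. The sketch below is one possible illustration of the idea, not the tool from Pang et al.: each output bit is the input bit XORed with a keyed pseudorandom bit computed from the input bits that precede it, so two addresses that share a k-bit prefix still share exactly a k-bit prefix after anonymization. Schemes of this kind are often built on a block cipher; this sketch swaps in HMAC-SHA256 from the Python standard library as the keyed function. The function names and the demo key are assumptions made for illustration only.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to another one while preserving shared prefix lengths."""
    original = int(ipaddress.IPv4Address(addr))
    anonymized = 0
    for i in range(32):
        # The i most significant bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # Include the prefix length so that prefixes "0" and "00" hash differently.
        message = bytes([i]) + prefix.to_bytes(4, "big")
        # Keyed pseudorandom bit that depends only on the preceding bits.
        flip = hmac.new(key, message, hashlib.sha256).digest()[0] & 1
        bit = (original >> (31 - i)) & 1
        anonymized = (anonymized << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(anonymized))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    key = b"hypothetical-demo-key"  # illustrative only; a real key must stay secret
    a, b = "192.168.1.10", "192.168.1.77"
    anon_a, anon_b = anonymize_ipv4(a, key), anonymize_ipv4(b, key)
    print(anon_a, anon_b)
    print(common_prefix_len(a, b), common_prefix_len(anon_a, anon_b))
```

Because the mapping is deterministic for a given key, the same address always anonymizes to the same value across a trace, which keeps subnet structure usable for analysis. The trade-off is that anyone who learns the key, or who can match known traffic against the trace, may be able to undo parts of the mapping.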