Large-scale Internet measurement Laki, Sándor Created by XMLmind XSL-FO Converter. Large-scale Internet measurement írta Laki, Sándor Publication date 2015 Szerzői jog © 2015 Laki Sándor Created by XMLmind XSL-FO Converter. Tartalom Large-scale Internet measurement ...................................................................................................... 1 1. 1 Introduction to Internet measurements ............................................................................... 1 1.1. Course Information ................................................................................................... 1 1.2. Grading ..................................................................................................................... 1 1.3. Term project ............................................................................................................. 2 1.4. What is this course about? ........................................................................................ 2 1.5. Reading ..................................................................................................................... 3 1.6. INTRODUCTION .................................................................................................... 4 1.7. Once upon a time... ................................................................................................... 4 1.8. And no... ................................................................................................................... 5 1.9. Another aspect of Internet evolution ........................................................................ 6 1.10. Today’s Internet ...................................................................................................... 8 1.11. Why do we need Internet measurements? .............................................................. 8 1.12. Why do we need Internet measurements? .............................................................. 9 1.13. 
What to measure? ................................................................................................... 9 1.14. Why is it challenging to measure the Internet? ....................................................... 9 1.15. Core simplicity ....................................................................................................... 9 1.16. Layered architecture and hidden network elements .............................................. 10 1.17. IP centric ............................................................................................................... 10 1.18. Middleboxes in the carriers’ networks .................................................................. 10 1.19. Administrative boundaries .................................................................................... 11 1.20. Applications .......................................................................................................... 11 1.21. Network measurements ........................................................................................ 11 1.22. Infrastructure measurements ................................................................................. 11 1.23. Traffic measurements ........................................................................................... 12 1.24. Application measurements .................................................................................... 12 1.25. Active and passive measurements ........................................................................ 12 1.26. Internet Measurements .......................................................................................... 13 1.27. Related Conferences and Journals ........................................................................ 13 2. 2 Analytical background ..................................................................................................... 13 2.1. 
Analytical background ............................................................................................ 13 2.2. LINEAR ALGEBRA .............................................................................................. 14 2.3. Notations ................................................................................................................. 14 2.4. Norms and orthogonality ........................................................................................ 14 2.5. Matrices .................................................................................................................. 14 2.6. Eigenvectors and eigenvalues ................................................................................. 15 2.7. Alternate algebras ................................................................................................... 15 2.8. PROBABILITY AND STATISTICS ..................................................................... 15 2.9. Why do we need statistics and probability theory? ................................................. 15 2.10. Notations ............................................................................................................... 16 2.11. Definitions ............................................................................................................ 16 2.12. Definitions - II ...................................................................................................... 16 2.13. Expected values and moments .............................................................................. 16 2.14. Variance and standard deviation ........................................................................... 16 2.15. Joint probability .................................................................................................... 16 2.16. Conditional probability ......................................................................................... 16 2.17. 
Central limit theorem ............................................................................................ 16 2.18. Distributions for Internet measurements ............................................................... 16 2.19. Stochastic processes ............................................................................................. 17 2.20. Stochastic processes ............................................................................................. 18 2.21. Stochastic processes ............................................................................................. 18 2.22. Characterization of a stochastic process ............................................................... 18 2.23. Simpler stationary conditions ............................................................................... 18 2.24. Measures of dependence ....................................................................................... 18 2.25. Measures of dependence ....................................................................................... 18 2.26. Measures of dependence ....................................................................................... 19 iii Created by XMLmind XSL-FO Converter. Large-scale Internet measurement 2.27. Modeling network traffic and user activity ........................................................... 2.28. Modeling network traffic and user activity ........................................................... 2.29. Short and long tailed distributions ........................................................................ 2.30. Short and long tailed distributions ........................................................................ 2.31. Short and long tailed distributions ........................................................................ 2.32. Heavy tailed/power-law distribution .................................................................... 2.33. 
Heavy tailed distribution ....................................................................................... 2.34. Measured data ....................................................................................................... 2.35. Describing data ..................................................................................................... 2.36. More detailed descriptions .................................................................................... 2.37. Histogram ............................................................................................................. 2.38. Empirical cumulative distribution function (CDF) ............................................... 2.39. Categorical data description ................................................................................. 2.40. Describing memory and stability .......................................................................... 2.41. High variability in Internet data ............................................................................ 2.42. Zipf’s law .............................................................................................................. 2.43. GRAPH THEORY ............................................................................................... 2.44. Graph theory ......................................................................................................... 2.45. Graphs ................................................................................................................... 2.46. Subgraphs ............................................................................................................. 2.47. Connected graphs ................................................................................................. 2.48. Metrics for characterization .................................................................................. 2.49. 
Metrics for characterization .................................................................................. 2.50. Matrix representation ............................................................................................ 2.51. Applications of Routing Matrix ............................................................................ 2.52. Applications of routing matrix .............................................................................. 2.53. Artificial graph constructions ............................................................................... 2.54. Erdős-Rényi random graph ................................................................................... 2.55. Erdős-Rényi random graph ................................................................................... 2.56. Generalized random graph .................................................................................... 2.57. Preferential attachment model .............................................................................. 2.58. Preferential attachment model .............................................................................. 2.59. Regular vs Random graphs ................................................................................... 2.60. AS level topology ................................................................................................. 2.61. AS level topology ................................................................................................. 2.62. AS level topology ................................................................................................. 2.63. MODELING ......................................................................................................... 2.64. Measurement and modeling .................................................................................. 2.65. Descriptive data model ......................................................................................... 2.66. 
Constructive data model ....................................................................................... 2.67. Data model ............................................................................................................ 2.68. Why build models ................................................................................................. 2.69. Probability models ................................................................................................ 3. 3 Network measurement infrastructures ETOMIC and SONoMA ..................................... 3.1. Why Internet experimental facilities are needed? ................................................... 3.2. Existing TestBeds and Network Measurement Infrastructures ............................... 3.3. Lifecycle of network measurements ....................................................................... 3.4. ETOMIC ................................................................................................................. 3.5. The ETOMIC system .............................................................................................. 3.6. System architecture ................................................................................................. 3.7. Evolution of measurement nodes ............................................................................ 3.8. ETOMs ................................................................................................................... 3.9. APE boxes .............................................................................................................. 3.10. Measurement boxes .............................................................................................. 3.11. Central Management System ................................................................................ 3.12. Slices VS Unique timeslots .................................................................................. 3.13. 
The ETOMIC system ............................................................................................ 3.14. One day on the Internet ......................................................................................... 3.15. Experimental use cases in ETOMIC ..................................................................... 3.16. HOW TO USE ETOMIC? .................................................................................... iv Created by XMLmind XSL-FO Converter. 19 19 19 19 19 19 19 20 21 21 22 22 23 24 24 25 25 25 26 26 26 26 26 26 27 27 28 28 28 29 29 29 29 29 30 30 31 31 31 32 32 32 32 33 33 33 34 34 34 35 36 36 37 37 37 38 39 43 43 45 Large-scale Internet measurement 3.17. Performing an experiment from the system’s perspective .................................... 3.18. Measurement types ............................................................................................... 3.19. Necessary steps for submitting an experiment ...................................................... 3.20. Creating a bundle .................................................................................................. 3.21. Creating an experiment and querying its status .................................................... 3.22. Downloading the results ....................................................................................... 3.23. Programming DAG cards ..................................................................................... 3.24. PUBLISHING DATA .......................................................................................... 3.25. Experimental facilities .......................................................................................... 3.26. Traditional approach ............................................................................................. 3.27. Sharing science ..................................................................................................... 3.28. 
Related work: CAIDA/DatCat .............................................................................. 3.29. Related work: MoMe database ............................................................................. 3.30. Related work: MAWI repository .......................................................................... 3.31. Data publication efforts ........................................................................................ 3.32. Key ideas in data handling .................................................................................... 3.33. VO approach ......................................................................................................... 3.34. Unified interface ................................................................................................... 3.35. Casjobs User Interface for accessing data ............................................................ 3.36. SONOMA ............................................................................................................. 3.37. SONoMA v1.0 ...................................................................................................... 3.38. Why do we need another network measurement platform? .................................. 3.39. SONoMA .............................................................................................................. 3.40. System components .............................................................................................. 3.41. Management Layer ............................................................................................... 3.42. Measurement methods .......................................................................................... 3.43. Web client ............................................................................................................. 3.44. Case study: A full mesh topology measurement ................................................... 3.45. 
Case study: A full mesh topology measurement ................................................... 3.46. What happens in the background? A full mesh topology measurement ............... 3.47. Another use case: Spotter ..................................................................................... 3.48. SONoMA 2.0 ........................................................................................................ 3.49. Literature .............................................................................................................. 4. 4 Network measurement infrastructures PlanetLab ............................................................ 4.1. PlanetLab ................................................................................................................ 4.2. The main goal ......................................................................................................... 4.3. What is PlanetLab? ................................................................................................. 4.4. PlanetLab architecture ............................................................................................ 4.5. Slices ...................................................................................................................... 4.6. Slices ...................................................................................................................... 4.7. Slices ...................................................................................................................... 4.8. User Opt-in ............................................................................................................. 4.9. Services running in your slice ................................................................................. 4.10. Services running in your slice ............................................................................... 4.11. 
Services running in your slice ............................................................................... 4.12. Services running in your slice ............................................................................... 4.13. Services running in your slice ............................................................................... 4.14. Virtualization solutions ......................................................................................... 4.15. VServers in a PlanetLab node ............................................................................... 4.16. VServers in a PlanetLab node ............................................................................... 4.17. Low-level network access ..................................................................................... 4.18. Getting started ....................................................................................................... 4.19. Create your SSH Key ........................................................................................... 4.20. Create your slice ................................................................................................... 4.21. Login to your slice ................................................................................................ 4.22. Install additional packages .................................................................................... 4.23. Deploying your app .............................................................................................. 4.24. Configuring a server for automatic startup ........................................................... 4.25. Other useful tools .................................................................................................. 4.26. PSSH .................................................................................................................... v Created by XMLmind XSL-FO Converter. 
46 46 48 49 49 49 49 50 50 50 51 52 52 53 53 54 54 54 54 55 55 55 56 56 57 57 57 58 59 59 60 60 61 61 61 61 62 62 62 62 62 62 62 63 64 65 66 67 68 68 69 69 69 70 70 70 71 71 71 71 Large-scale Internet measurement 4.27. PSSH Demo .......................................................................................................... 72 4.28. PlanetLab Slice Deploy Toolkit ............................................................................ 72 4.29. vxargs ................................................................................................................... 72 4.30. Nixes Tool Set ...................................................................................................... 72 4.31. Long-Running Services In PlanetLab ................................................................... 73 4.32. Services (cont) ...................................................................................................... 73 4.33. Services (cont) ...................................................................................................... 74 4.34. Further available testbeds with PlanetLab Europe account .................................. 74 4.35. NITOS Wireless Testbed ...................................................................................... 74 4.36. w-iLab.t ................................................................................................................ 75 5. 5 Network measurement infrastructures FEDERICA, SFA, OpenFlow ............................. 75 5.1. Federica .................................................................................................................. 76 5.2. Federica .................................................................................................................. 76 5.3. The physical topology ............................................................................................ 76 5.4. 
Core elements ......................................................................................................... 77 5.5. SFA – SLICE-BASED FACILITY ARCHITECTURE ......................................... 77 5.6. Slice-based Facility Architecture SFA ................................................................... 77 5.7. Slice-based Facility Architecture SFA ................................................................... 78 5.8. Experiment lifetime in general ............................................................................... 78 5.9. What can SFA help with? ....................................................................................... 78 5.10. SFA for federated testbeds .................................................................................... 79 5.11. SFA for federated testbeds .................................................................................... 81 5.12. SFA – Available resources ................................................................................... 82 5.13. SFA functionalities ............................................................................................... 83 5.14. Hierarchical naming ............................................................................................. 83 5.15. Authentication ...................................................................................................... 85 5.16. SFA API ............................................................................................................... 86 5.17. SFA Components .................................................................................................. 86 5.18. Resource Specification (RSpec) Documents ........................................................ 87 5.19. SFI and SFA client ............................................................................................... 88 5.20. 
Installation and configuration ............................................................................... 88 5.21. List records from the registry ............................................................................... 89 5.22. Detailed record information .................................................................................. 89 5.23. Get resources ........................................................................................................ 90 5.24. Get resources ........................................................................................................ 90 5.25. Allocate resources for a given slice ...................................................................... 91 5.26. Allocate resources for a given slice ...................................................................... 91 5.27. Deallocate resources ............................................................................................. 92 5.28. OPENFLOW CAPABILITIES IN PLANETLAB EUROPE ............................... 92 5.29. What is the problem with existing networks? ....................................................... 92 5.30. What is the problem with existing networks? ....................................................... 92 5.31. Software Defined Networking .............................................................................. 93 5.32. OpenFlow ............................................................................................................. 93 5.33. OpenFlow ............................................................................................................. 94 5.34. Plumbing primitives ............................................................................................. 95 5.35. Network OSes ....................................................................................................... 95 5.36. 
OpenFlow support in PlanetLab ........................................................................... 95 5.37. How to use it in PlanetLab? .................................................................................. 96 5.38. How to use it in PlanetLab? .................................................................................. 96 5.39. Create the topology ............................................................................................... 97 5.40. Create the topology ............................................................................................... 97 5.41. Modify the topology ............................................................................................. 98 5.42. Literature .............................................................................................................. 99 6. 6 Bandwidth measurement methods Network path characterization .................................. 99 6.1. Methods to measure path characteristics ................................................................ 99 6.2. Capacity ................................................................................................................ 101 6.3. Available bandwidth ............................................................................................. 101 6.4. Capacity and Available Bandwidth ...................................................................... 101 6.5. Passive Techniques ............................................................................................... 102 6.6. Active probing methods ........................................................................................ 102 vi Created by XMLmind XSL-FO Converter. Large-scale Internet measurement 6.7. Basic ideas ............................................................................................................ 102 6.8. State of the art Bandwidth estimation methods .................................................... 103 6.9. 
SLoPS Self-Loading Periodic Streams ................................................................. 104 6.10. SLoPS Self-Loading Periodic Streams ............................................................... 105 6.11. SLoPS ................................................................................................................. 105 6.12. SLoPS ................................................................................................................. 105 6.13. OWD variations .................................................................................................. 106 6.14. How it works? ..................................................................................................... 107 6.15. How to determine parameters K,L and T? .......................................................... 107 6.16. Fleets of streams ................................................................................................. 107 6.17. How to detect the increasing trend of OWDs? ................................................... 107 6.18. PathLoad uses two metric to recognize increasing trend .................................... 107 6.19. PDT and PCT examples ...................................................................................... 108 6.20. PCT variations examples .................................................................................... 108 6.21. PDT variations example ..................................................................................... 109 6.22. Rate adjustment .................................................................................................. 110 6.23. Performance ........................................................................................................ 110 6.24. Packet Pair-based methods ................................................................................. 111 6.25. 
PathChirp Chirp Packet Trains ........................................................................... 111 6.26. PathChirp ............................................................................................................ 112 6.27. PathChirp Methodology ...................................................................................... 112 6.28. Self-Induced Congestion .................................................................................... 113 6.29. Excursions .......................................................................................................... 114 6.30. pathChirp Tool .................................................................................................... 115 6.31. Comparison with Pathload .................................................................................. 115 6.32. PathSensor: Granular model-based bandwidth estimation ................................. 116 6.33. Estimating output spacing with fluid traffic for a single-hop scenario ............... 117 6.34. Fluid curves for single-hop ................................................................................. 117 6.35. How to simulate cross traffic? ............................................................................ 118 6.36. Output spacing .................................................................................................... 119 6.37. Output spacing .................................................................................................... 119 6.38. Explicit solution for M/D/1 queues .................................................................... 119 6.39. Explicit solution for M/D/1 queues .................................................................... 119 6.40. Literature ............................................................................................................ 121 7. 
7 Topology discovery in large-scale networks
7.1. Topology discovery
7.2. Challenges
7.3. Naive approaches
7.4. CAIDA’s Skitter
7.5. NetDimes
7.6. Expectations
7.7. Different methods
7.8. ROUTE DISCOVERY
7.9. Traceroute
7.10. How traceroute works?
7.11. How traceroute works?
7.12. How traceroute works?
7.13. How traceroute works?
7.14. Problems
7.15. Problems with load balancers
7.16.
Problems with load balancers
7.17. What causes this anomaly?
7.18. A more complex example
7.19. What can we do?
7.20. Paris Traceroute Algorithm
7.21. Finding the NEXTHOP
7.22. The key ideas behind NEXTHOP
7.23. Number of probes and the expected number of interfaces at 95 percent confidence level
7.24. SELECTFLOW: Selecting a flow
7.25. SELECTFLOW: discovering new flows crossing router r
7.26. PERPACKET
7.27. Discovering nexthop interfaces in presence of a load balancer
7.28. Discovering nexthop interfaces in presence of a load balancer
7.29. Performance of Paris traceroute
7.30. Load balancers
7.31. TOPOLOGY DISCOVERY
7.32. Topology discovery
7.33.
DoubleTree
7.34. The actual topology
7.35. Intra-monitor redundancy
7.36. Inter-monitor redundancy
7.37. Tree-like structures
7.38. Monitor rooted tree
7.39. Destination rooted tree
7.40. DoubleTree
7.41. Maintaining trees
7.42. DoubleTree results
7.43. Literature
8. 8 Network tomography
8.1. What does tomography mean?
8.2. Network tomography?
8.3. Network tomography?
8.4. How does it work?
8.5. How does it work?
8.6. Network Tomography
8.7. Network Tomography
8.8. Network Tomography
8.9. Network Tomography
8.10. What else is needed?
8.11. Estimating Source-destination traffic intensities
8.12. Estimating Source-Destination traffic intensities
8.13. A toy example
8.14. EM algorithm
8.15. MLE and Normal Approximations
8.16. MultiCast-based loss inference
8.17. Loss model
8.18. Loss inference
8.19. Solution with EM
8.20. Convergence
8.21. Convergence
8.22.
Unicast network tomography
8.23. Sandwich probing
8.24. Sandwich probing
8.25. Measurement framework
8.26. Topology Identification
8.27. Simplifying the problem
8.28. Find the tree
8.29. Illustration
8.30. Literature
9. 9 Network coordinates systems
9.1. Introduction
9.2. The key idea of an NCS
9.3. Localization Techniques
9.4. Localization Techniques
9.5. Localization Techniques
9.6. Network Coordinates System Basics
9.7. Network Coordinates System Basics
9.8. Network Coordinates Systems Advantages
9.9. LANDMARK BASED NCS
9.10. IDMaps
9.11. Landmark based NCSs
9.12. Global Network Positioning
9.13. Lighthouses
9.14. Lighthouses
9.15. Network Positioning System
9.16. Internet Coordinate System
9.17. Internet Coordinate System
9.18. Virtual landmarks
9.19. Internet Distance Estimation Service
9.20. DISTRIBUTED NCS
9.21. Distributed NCSs
9.22.
Practical Internet Coordinates
9.23. Big-Bang Simulation
9.24. Big-Bang Simulation
9.25. Vivaldi
9.26. Vivaldi
9.27. Vivaldi
9.28. Vivaldi – Centralized algorithm
9.29. Vivaldi – Centralized algorithm
9.30. Distributed Vivaldi with constant timesteps
9.31. Vivaldi – Adaptive timesteps
9.32. Decentralized Vivaldi with adaptive timestep
9.33. Latency data for performance analysis
9.34. Timestep choice
9.35. Convergence and robustness against high-error nodes
9.36. Communication patterns
9.37. Triangle Inequality Violations
9.38. Euclidean spaces
9.39.
Spherical coordinates
9.40. Height model
9.41. Height model
9.42. Pharos – Hierarchical Vivaldi
9.43. Pharos – The algorithm
9.44. Hierarchical distance prediction
9.45. A two-tier ICS
9.46. Triangular inequality violation
9.47. Triangular inequality violation
9.48. Two-tier Vivaldi
9.49. Two-tier Vivaldi
9.50. Limitations
9.51. Benefits
9.52. Comparison of different techniques
9.53. Security in NCS
9.54. Security in NCS
9.55.
Security in NCS
9.56. Future directions
9.57. Literature
10. 10 IP geolocation
10.1. Motivation
10.2. IP Geolocation in general
10.3. Whois based location estimation example for passive geolocation
10.4. IP Geolocation in general
10.5. IP Geolocation in general
10.6. THE FIRST STEPS
10.7. IP2Geo – Single point localization
10.8. GeoTrack – main idea
10.9. GeoTrack
10.10. GeoPing – Delay based localization
10.11. GeoPing – details
10.12. GeoCluster
10.13. GeoCluster
10.14. GeoCluster – Clustering IP addresses
10.15. Performance of GeoCluster
10.16. ADVANCED TECHNIQUES
10.17. Constraint Based Geolocation
10.18. Constraint Based Geolocation
10.19. Octant IP geolocation framework
10.20. It is more than a simple method, it is a framework
10.21. Notations
10.22. Octant – Landmarks and constraints
10.23. Estimated location
10.24. Mapping latencies to distances
10.25. Mapping latencies to distances
10.26. Mapping latencies to distances
10.27.
Last hop delays
10.28. Eliminating last hop delays in Octant
10.29. Last hop delays in Octant
10.30. Last hop delays
10.31. Results
10.32. Results
10.33. Spotter – a probabilistic approach
10.34. Travel time – distance relation
10.35. Travel time – distance relation
10.36. Statistical delay-distance model
10.37. Statistical delay-distance model
10.38. Statistical delay-distance model
10.39. Evaluation – "Probabilistic triangulation"
10.40. Performance analysis
10.41. Topology-based Geolocation
10.42. Topology based geolocation
10.43. Summary of techniques
10.44.
Estimate hop latencies
10.45. Estimate hop latencies
10.46. Clustering interfaces
10.47. Clustering interfaces
10.48. Validating location hints
10.49. Constraint optimization
10.50. Constraint optimization
10.51. Results
10.52. Results
10.53. Other issues to be handled: Indirect routes
10.54. Other issues to be handled: Indirect routes discovery
10.55. Other issues to be handled: Handling uncertainty
10.56. Other issues to be handled: Iterative refinement
10.57. Literature
11. 11 Geography of the Internet: On the spatial properties of network topology
11.1. Network research
11.2. The distance is what really counts.
11.3.
Data collection
11.4. Data collection
11.5. Covered areas
11.6. Histogram maps
11.7. Transforming spatial distributions
11.8. Transforming spatial distributions
11.9. Transforming spatial distributions
11.10. Transforming spatial distributions
11.11. A router-likelihood map
11.12. Likelihood of router positions - US
11.13. Likelihood of router positions - US
11.14. Characterizing the link length
11.15. Characterizing the network links
11.16. Frequency of link lengths
11.17.
Frequency of link lengths
11.18. Frequency of link lengths
11.19. Frequency of link lengths
11.20. Frequency of link lengths
11.21. Frequency of link lengths
11.22. Frequency of link lengths
11.23. Frequency of link lengths
11.24. Distribution of link lengths
11.25. Distribution of link lengths
11.26. Distribution of link lengths
11.27. Distribution of link lengths
11.28. Distribution of link lengths
11.29. The embedded topology
11.30. Characterizing network paths
11.31. Aggregated path length
11.32. Circuitousness
11.33. Symmetry
11.34.
Symmetry
11.35. Direction dependence of lateral deviations
11.36. Unfamiliar routing phenomenon?
11.37. Literature
12. 12 Network traffic analysis, clustering and classification
12.1. Traffic
12.2. Traffic classification
12.3. Traffic classification
12.4. Quality of Service (QoS)
12.5. Traffic Classification
12.6. Traffic Classification
12.7. Different approaches
12.8. Deep Packet Inspection
12.9. Deep Packet Inspection Basics
12.10. Multi-byte pattern matching
12.11. Deploying multiple multi-byte DFAs
12.12. True positive VS False positive etc.
12.13.
Performance of different DPI tools
12.14. Classical recipe for flow statistic-based traffic classification
12.15. Statistical payload analysis
12.16. KISS: Stochastic Packet Inspection
12.17. Chi square statistics
12.18. Decision process
12.19. Validation on a real traffic trace
12.20. Early Identification of Peer-To-Peer Traffic
12.21. Modeling a flow
12.22. Classification via probabilistic models
12.23. Data Collection for ground truth
12.24. Experiments
12.25. Feasibility test
12.26. Feasibility test
12.27. How much data is needed?
12.28. How much data is needed?
12.29. How much data is needed?
12.30.
Robustness
12.31. Training set sizes
12.32. Is it protocol independent?
12.33. Robustness: Asymmetric routing
12.34. Robustness: Unknown traffic
12.35. Real traffic traces
12.36. Confusion matrix
12.37. Literature
13. 13 Measurements in peer-to-peer networks
13.1. Centralized VS P2P
13.2. Peer-to-peer networks
13.3. Peer-to-peer networks
13.4. Some P2P protocols
13.5. What do we want to measure?
13.6. How can we do that?
13.7. Gnutella
13.8. Gnutella vs Napster
13.9. Gnutella vs Napster Lifetime of the peers
13.10. Gnutella vs Napster Shared files vs Shared data
13.11. Gnutella Latencies and downstream bandwidth
13.12. Kademlia
13.13. Kademlia
13.14. Kademlia
13.15. Kademlia
13.16. Kademlia Collected data
13.17. Kademlia Peers with dynamic IPs
13.18. Kademlia Peer availability
13.19. IP-based availability is similar to what we have seen for Gnutella
13.20. How duration can affect the availability?
13.21. Time of day effects
13.22. BitTorrent
13.23.
File Sharing
13.24. *.torrent
13.25. The Tracker
13.26. BitTorrent
13.27. An example
13.28. File sharing
13.29. Lifetime of a torrent Seeders and leechers
13.30. Pieces and sub-pieces
13.31. Piece Selection
13.32. Piece Selection
13.33. Choking
13.34. Choking algorithm
13.35. Optimistic unchoke
13.36. Upload only mode
13.37. Lifetime of a torrent
13.38. Peer behaviour
13.39.
Peer behaviour
13.40. Literature
14. 14 Analysis of online social networks
14.1. Be socialized
14.2. Increasing interest
14.3. Twitter users
14.4. Social Flow
14.5. Twitter
14.6. Why do we analyze it?
14.7. Is the data available?
14.8. Quantifying influence
14.9. How to measure influence?
14.10. Cascades
14.11. Cascade sizes and depths
14.12. How to predict influence?
14.13. Regression tree for influence prediction
14.14.
Past influences vs followers
14.15. Information flow on Twitter
14.16. How to identify Elite users?
14.17. Snowball sample of Twitter lists
14.18. Activity sample of Twitter users
14.19. Who listens to whom?
14.20. Who listens to whom?
14.21. Two step information flow
14.22. Who are the intermediaries?
14.23. Who are the intermediaries?
14.24. Network Dynamics
14.25. Memetracking
14.26. Memetracking
14.27. Collective attention on Twitter
14.28.
Collective attention on Twitter
14.29. Collective attention on Twitter
14.30. Collective attention on Twitter
14.31. Literature
15. 15 Measurements in mobile and cellular networks
15.1. Internet and cellular networks
15.2. What can measurements reveal?
15.3. A widely heterogeneous environment
15.4. How do the different access technologies affect the performance?
15.5. HSDPA downlink
15.6. HSDPA Uplink
15.7. LTE Downlink
15.8. LTE Uplink
15.9. What is the problem with the first 10 seconds?
15.10. Large scale measurements
15.11. Performance of different access technologies
15.12. Performance of different access technologies
15.13. Performance of different access technologies
15.14. Performance of different access technologies
15.15. Performance of different access technologies
15.16. Performance of different access technologies
15.17. Performance of different access technologies
15.18. Long-term trend and daily patterns
15.19. Long-term trend and daily patterns
15.20. Long-term trend and daily patterns
15.21. Measuring DNS lookup time in 3G networks
15.22. Measuring DNS lookup time in 3G networks
15.23. Downlink throughput of major carriers in the U.S.
15.24. Cellular network policies
15.25. Port scans for large carriers in the U.S.
15.26. Port scans for large carriers in the U.S.
15.27. Port scans for large carriers in the U.S.
15.28. FTP blocking in T-Mobile’s network
15.29. HTTP proxy port blocking in T-Mobile’s network
15.30. How can these middleboxes affect user experience?
15.31. IP spoofing
15.32.
IP spoofing measurement
15.33. Short TCP connection timeout
15.34. How short TCP connection timeouts affect energy consumption?
15.35. Packet reordering in the middleboxes
15.36. Packet reordering in the middleboxes
15.37. Packet reordering in the middleboxes
15.38. NAT traversal
15.39. NAT mapping in cellular networks
15.40. Mobile network measurement projects
15.41. Literature

Large-scale Internet measurement

1. 1 Introduction to Internet measurements

1.1. Course Information
• Instructor: Sándor Laki
• E-mail: [email protected]
• Office hours: Thursday, 10:00-12:00, DT. 2.506
• Lecture: T/Th, 10:00-12:00
• Location: DT. 2.516
• Web site: http://lakis.web.elte.hu/EIT/lsim-2012autumn
• Mailing list: [email protected]

1.2.
Grading
• Prerequisites
  • Graduate level Computer Networking courses
    • http://lakis.web.elte.hu/comnet-eng-bsc/
    • http://people.inf.elte.hu/lukovszki/Courses/1314BSC/
• Credits: 6 ECTS
• Grading
  • Midterm: 20%
  • Good participation: 10%
  • Term project: 50%
    • Specification: 10%
    • Work-in-progress report and midterm presentation: 15%
    • Final report and presentation: 25%
  • Final exam: 20%
• Photo by Sage Ross

1.3. Term project
• Work in a team of two or more students
• Decide what to measure and specify how to do it
• Build measurement tools or use existing platforms
• Perform measurements
• Collect and analyze measurement data
• Identify potential applications or further research directions

1.4. What is this course about?
• Not an introduction to the Internet
• Focus on Internet measurements
• What to measure?
  • Traffic, infrastructure
  • Applications, performance
• Why is it important?
  • Traffic engineering, capacity planning, topology mapping
• What does the Internet look like?
  • Application and traffic characteristics, topology/route choice
• How to measure and how to interpret measurement results?
  • Measurement methodologies and challenges
  • Design trade-offs
  • Design of measurement/monitoring systems
  • Tools: data collection, modeling, statistical inference, etc.
• Image courtesy of Michal Marcol / FreeDigitalPhotos.net

1.5. Reading
• Mark Crovella, Balachander Krishnamurthy: Internet Measurement: Infrastructure, Traffic and Applications. Wiley, 2006.
• Raj Jain: The Art of Computer Systems Performance Analysis. Wiley and Sons, New York, 1991.
• Kurose and Ross: Computer Networking: A Top-Down Approach Featuring the Internet. Fifth edition, Addison-Wesley, 2009.

1.6. INTRODUCTION

1.7. Once upon a time...
1.8. And now...
• Source: wikipedia.org

1.9. Another aspect of Internet evolution
• Once
• and now…

1.10. Today’s Internet
• Tier I (Core network)
• Tier II providers
• Customer IP networks
• ISP-1, ISP-2, ISP-3
• IXPs
• Hyper Giants: Google, Akamai, CDN, etc.
• National and Global Transit Backbones

1.11. Why do we need Internet measurements?
• The Internet seems to work well
  • Despite the exponential growth in its size
  • Despite the high variety of applications
    • Email, Web, Instant messaging, File sharing, Social networks, Games, etc.
• Why do we bother measuring various aspects of it then?

1.12. Why do we need Internet measurements?
• The Internet is far from being ideal
• Internet measurements help us
  • To better understand why and how the Internet works
  • To design new features that may lead to the next generation Internet
  • To identify the weaknesses of network protocols

1.13. What to measure?
• Physical properties
  • Network devices: routers, NAT boxes, firewalls, switches
  • Links: wired, wireless
• Topology properties
  • Various levels: Autonomous Systems (AS), Points of Presence (PoP), routers, interfaces
• Traffic properties
  • Delays: transmission, propagation, queuing, processing, etc.
  • Losses, throughput, jitter, etc.

1.14. Why is it challenging to measure the Internet?
• Poor observability of network characteristics
• The reasons behind it
  • Core simplicity
  • Layered architecture and hidden elements
  • Administrative boundaries

1.15. Core simplicity
• The network is built up from very simple elements
• "Keep it simple" design concept
• Stateless nature
• Generally end-to-end arguments
• Packets are not tracked
• The interaction with the network is hard to observe

1.16. Layered architecture and hidden network elements
• Hourglass model hides the details of the lower layers
• IP everywhere, few transport protocols
• It is almost impossible to measure the layers below IP
• Layers of the hourglass:
  • HTTP, Email, FTP, DNS, RTP, SMTP, WWW, VoIP, BitTorrent, ...
  • TCP, UDP, ...
  • IP
  • Ethernet, CSMA
  • MPLS, PPP, SONET, ...
  • WiFi, WiMax, LTE, UMTS, copper, fiber, ...

1.17. IP centric
• You must be familiar with IPv4 and IPv6
• IPv4 header: (figure)

1.18. Middleboxes in the carriers’ networks
• Hidden elements
  • Firewalls: filter out traffic, block ports, etc.
  • Proxies and IP sniffers: improve performance
  • Traffic shapers: improve traffic management
  • NAT boxes: utilize IP space more efficiently
• Active network measurements have to take the presence of hidden middleboxes into account
  • Probe traffic may be blocked
  • Traffic shapers may affect probe traffic
  • NATs hide the internal structure and size of the network

1.19. Administrative boundaries
• System of systems
  • Interconnected networks operated by different organizations
• ISPs hide the details of their networks
  • E.g. only PoP-level topologies are available instead of router-level ones
• Inter-AS routing is based on business decisions
  • Economic and political aspects

1.20. Applications
• Traffic engineering
• Troubleshooting
• Anomaly detection
• Security forensics
• Feasibility check of new ideas

1.21. Network measurements
• Infrastructure
• Traffic
• Application

1.22. Infrastructure measurements
• Basic path characteristics
  • Loss, delay, jitter, bandwidth, etc.
• Topology measurements
• Network tomography
• Network coordinate systems
• IP geolocation
• Wireless mesh networks
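The basic path characteristics named above (loss, delay, jitter) are typically derived from a series of probe results. The following is a minimal sketch under stated assumptions: the function name `path_stats` and the sample values are hypothetical, a lost probe is marked with `None`, and jitter is taken as the mean absolute difference between consecutive received round-trip times, which is only one of several definitions in use.

```python
from statistics import mean


def path_stats(rtts_ms):
    """Summarize a probe series; each entry is an RTT in ms or None for a lost probe."""
    received = [r for r in rtts_ms if r is not None]
    # Loss: fraction of probes that never came back.
    loss = (len(rtts_ms) - len(received)) / len(rtts_ms)
    # Delay: average RTT over the probes that were answered.
    delay = mean(received) if received else None
    # Jitter (assumed definition): mean absolute change between consecutive received RTTs.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"loss": loss, "mean_delay_ms": delay, "jitter_ms": jitter}


# Hypothetical probe series: five probes, the third one lost.
samples = [20.0, 22.0, None, 21.0, 25.0]
stats = path_stats(samples)
print(stats)
```

Bandwidth is deliberately left out: estimating it needs dedicated probing schemes rather than a simple summary of RTT samples.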
1.23. Traffic measurements
• Packet traces
• Sampling
• Flow characteristics
  • Inter-arrival times, packet size distribution, etc.
• Traffic Matrix Estimations
• Deep Packet Inspection
• Statistical Traffic Classification
• Statistical Payload Analysis

1.24. Application measurements
• Content Delivery Networks
• Web content clustering
• Skype and other VoIP measurements
• File sharing
• Video On Demand, IPTV
• Malware
• Social networks

1.25. Active and passive measurements
• Active measurements
  • Methods that inject probe traffic into the network for the purpose of the measurement, and examine how the network affects the properties of the probe traffic
  • Typically end-to-end
  • Some tools: ping, owamp, traceroute, iperf, etc.
• Passive measurements
  • Methods that capture traffic generated by other users and applications to calculate network related metrics
  • Examples
    • Route Views repositories store BGP tables from a large set of ASes
    • Traffic trace captured by pcap at a given point of the network
    • Flow (byte) counters in routers

1.26. Internet Measurements
• Internet measurement is key to designing the next generation communication network
• Fundamental design principles of the current Internet make it hard to measure various aspects of it
• Preliminary research has resulted in a set of basic tools and methods to measure aspects like topology, traffic, etc.
• The accuracy of such methods is still an open question
• There is still a lot of ground to cover in this direction, and this is where researchers like you come into the equation

1.27. Related Conferences and Journals
• Conferences
  • Internet Measurement Conference
  • Passive and Active Measurement Workshop
  • ACM SIGMETRICS
  • Network and Distributed System Security Symposium
  • ACM SIGCOMM
  • IEEE INFOCOM
• Journals
  • Computer Networks (ComNet)
  • IEEE/ACM Transactions on Networking (ToN)
  • IEEE Journal on Selected Areas in Communications (JSAC)

2.
2 Analytical background

2.1. Analytical background
We need tools to study the Internet in a quantitative fashion:
• Linear algebra
• Probability and statistics
• Graph theory
Further readings in these topics:
• Linear algebra wikibook: http://en.wikibooks.org/wiki/Linear_Algebra
• Mario F. Triola: Elementary Statistics
• Reinhard Diestel: Graph Theory http://diestel-graph-theory.com/index.html

2.2. LINEAR ALGEBRA

2.3. Notations

2.4. Norms and orthogonality

2.5. Matrices

2.6. Eigenvectors and eigenvalues

2.7. Alternate algebras

2.8. PROBABILITY AND STATISTICS

2.9. Why do we need statistics and probability theory?
• Most of the mechanisms in networks are not deterministic
  • Randomized algorithms
    • Improved robustness, load balancing, etc.
  • Stochastic behavior of incoming traffic
• Without probability theory and statistics it would be hard to analyze them

2.10. Notations

2.11. Definitions

2.12. Definitions - II

2.13. Expected values and moments

2.14. Variance and standard deviation

2.15. Joint probability

2.16. Conditional probability

2.17. Central limit theorem

2.18. Distributions for Internet measurements

2.19. Stochastic processes
• Typically, Internet measurements arrive over time, in some order
• To use the tools of probability in this setting, we need to define a sequence of random variables, which is called a stochastic process.

2.20. Stochastic processes

2.21. Stochastic processes

2.22. Characterization of a stochastic process

2.23. Simpler stationary conditions

2.24. Measures of dependence

2.25. Measures of dependence
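A widely used measure of dependence for a sequence of measurements is the autocorrelation at lag k, which relates values that are k steps apart. The sketch below is only an illustration under stated assumptions: the function name `autocorrelation` and the toy series are hypothetical, and the estimator is the common biased sample version (one overall mean, normalization by the total sum of squares), not necessarily the form used in the lecture.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given non-negative lag."""
    n = len(x)
    m = sum(x) / n
    # Total sum of squared deviations from the mean (normalizing term).
    var = sum((v - m) ** 2 for v in x)
    # Sum of products of deviations that are `lag` steps apart.
    cov = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return cov / var


# Hypothetical, slowly varying series: neighbouring values are similar,
# so the lag-1 autocorrelation comes out positive.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
r0 = autocorrelation(series, 0)
r1 = autocorrelation(series, 1)
print(r0, r1)
```

A value near zero at every positive lag suggests that successive measurements carry little information about each other.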
2.26. Measures of dependence

2.27. Modeling network traffic and user activity

2.28. Modeling network traffic and user activity

2.29. Short and long tailed distributions

2.30. Short and long tailed distributions

2.31. Short and long tailed distributions

2.32. Heavy tailed/power-law distribution

2.33. Heavy tailed distribution
• New York City area road map
• Link lengths in km

2.34. Measured data
• Describing data
  • For example: "mean of a dataset"
  • An objectively measurable quantity which is the average of a set of known values
• Describing probability models
  • For example: "mean of a random variable"
  • A property of an abstract mathematical construct
• To emphasize the distinction, we add the adjective "empirical" to describe data
  • Empirical mean vs. mean
• Classification of measured data
  • Numerical: i.e. numbers
  • Categorical: i.e. symbols, names, tokens, etc.

2.35. Describing data

2.36. More detailed descriptions
• Quantiles
  • The pth quantile is the value below which the fraction p of the values lies.
  • The median is the 0.5-quantile
• Percentile
  • A quantile can be expressed as a percentile as well.
  • E.g. the 90th percentile is the value that is larger than 90 percent of the data

2.37. Histogram
• Defined in terms of bins, which are a partition of the observed values
• Counts how many values fall in each bin
• A natural empirical analog of a random variable’s probability density function (PDF) or distribution function
• Practical problem:
  • How to determine the bin boundaries

2.38. Empirical cumulative distribution function (CDF)
• Involves no binning or averaging of data values
• Provides more information about the dataset than the histogram.
• For each unique value in the data set, the fraction of data items that are smaller than that value (quantile).
• Empirical CCDF can be used similarly

2.39. Categorical data description
• Probability distribution
  • An analog of the histogram for categorical data
  • Measure the empirical probability of each symbol in the dataset
  • Plot the histogram in decreasing order of probability

2.40. Describing memory and stability
Time series data
• Question: Do successive measurements tend to have any relation to each other?
Memory
• When the value of a measurement tends to give some information about the likely values of future measurements
• Empirical autocorrelation function (ACF)
Stability
• A dataset is stable if its empirical statistics do not seem to be changing over time.
• Subjective
• Objective measures
  • A typical approach is to break the dataset into windows
  • E.g. a set of 1000 observations can be divided into 10 windows consisting of the first 100 observations, the second 100 observations, and so on.
  • Empirical statistics are calculated for each window, then examined for consistency, trends, predictable variation, etc.

2.41. High variability in Internet data
• Traditional statistical methods focus on data with low or moderate variability, e.g. the Normal distribution
• However, Internet data shows high variability
  • It consists of many small values mixed with a small number of large values
  • A significant fraction of the data may fall many standard deviations from the mean
  • The empirical distribution is highly skewed, and the empirical mean and variance are strongly affected by the rare, large observations
  • It may be modeled with a subexponential or heavy-tailed distribution
• Mean and variance are not good metrics for high-variability data, while quantiles and the empirical distribution are better,
  • e.g.
empirical CCDF on log-log axes for long-tailed distributions

2.42. Zipf’s law
• Categorical distributions can also be highly skewed
• A model for the shape of a categorical distribution when data values are ordered by decreasing empirical probability,
  • e.g. URLs of Web pages
• Zipf’s law refers to the situation where p(r) = c · r^(-B) for some positive constants c and B, where p(r) is the empirical probability of the value with rank r

2.43. GRAPH THEORY

2.44. Graph theory
• Generally, networks can be handled as directed or undirected graphs
• However, other phenomena can also be analysed with graph theory
  • E.g. retweet graph in social network analysis
• Graph theory can help us to characterize networks and other phenomena and to analyze their properties

2.45. Graphs
• A graph is a pair G = (V, E) of a set of vertices V and a set of edges E
• Undirected and directed
• Unweighted and weighted

2.46. Subgraphs

2.47. Connected graphs

2.48. Metrics for characterization

2.49. Metrics for characterization

2.50. Matrix representation

2.51. Applications of Routing Matrix
• Origin-destination flow

2.52. Applications of routing matrix
• Delay of paths
  • The routing and end-to-end delays are known

2.53. Artificial graph constructions
• To model real networks by generating random graphs
• Erdős-Rényi model
  • Random graphs
  • Theoretical relevance
  • Binomial degree distribution
• Barabási-Albert model
  • Random scale-free networks
  • Modeling natural and human-made systems
  • Power-law degree distribution
• Other models, such as the Watts-Strogatz model

2.54. Erdős-Rényi random graph

2.55. Erdős-Rényi random graph

2.56. Generalized random graph

2.57. Preferential attachment model

2.58. Preferential attachment model
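The contrast between the Erdős-Rényi and the preferential attachment constructions shows up directly in the node degrees. The following is a minimal sketch, assuming simplified variants of both models; the function names, the parameters (`n`, `p`, `m`) and the seed are illustrative choices, not taken from the lecture. Each graph is built with a mean degree of about 4 and the largest degrees are compared.

```python
import random
from collections import Counter


def erdos_renyi(n, p, rng):
    """G(n, p): every possible edge is present independently with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def barabasi_albert(n, m, rng):
    """Growth with preferential attachment: each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    # Start from a small clique on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    ends = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges


def degrees(n, edges):
    """Degree of every node 0..n-1 in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[i] for i in range(n)]


rng = random.Random(1)  # fixed seed, so repeated runs give the same graphs
n = 1000
er_edges = erdos_renyi(n, 4 / (n - 1), rng)
ba_edges = barabasi_albert(n, 2, rng)
er_deg = degrees(n, er_edges)
ba_deg = degrees(n, ba_edges)
print("ER: mean degree %.2f, max degree %d" % (sum(er_deg) / n, max(er_deg)))
print("BA: mean degree %.2f, max degree %d" % (sum(ba_deg) / n, max(ba_deg)))
```

With similar mean degrees, the preferential attachment graph is expected to contain a few hubs whose degree is far above anything seen in the Erdős-Rényi graph, which is the high-variability behaviour that the power-law degree distribution describes.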
2.59. Regular vs Random graphs
• Regular graph
  • Long characteristic path length
  • High degree of clustering
• Random graph
  • Short paths
  • Low degree of clustering
• Small world graph
  • Short characteristic path length
  • High degree of clustering

2.60. AS level topology

2.61. AS level topology
• High variability in degree distribution
  • Some ASes are very highly connected
  • Different ASes have dramatically different roles in the network
  • Node degree seems to be highly correlated with AS size
• Generative models of the AS graph
  • "Rich get richer" model
  • Newly added nodes connect to existing nodes in a way that tends to simultaneously minimize the physical length of the new connection and the average number of hops to other nodes
  • New ASes appear at an exponentially increasing rate, and each AS grows exponentially as well

2.62. AS level topology

2.63. MODELING

2.64. Measurement and modeling
• Model
  • Simplified version of something else
• Classification
  • System models: simplified descriptions of computer systems
  • Data models: simplified descriptions of measurements
• Data models
  • Descriptive data models
  • Constructive data models

2.65. Descriptive data model
• Compact summary of a set of measurements
  • E.g. summarize the variation of traffic on a particular network as "a sinusoid with period 24 hours"
• An underlying idealized representation
• Contains parameters whose values are based on the measured data
• Drawback
  • Cannot use all available information
  • Hard to answer "why is the data like this?" and "what will happen if the system changes?"

2.66. Constructive data model
• Succinct description of a process that gives rise to an output of interest
  • E.g.
model network traffic as "the superposition of a set of flows arriving independently, each consisting of a random number of packets"
• The main purpose is to concisely characterize a dataset, instead of representing or simulating the real system
• Drawback
  • The model is hard to generalize: such models may have many parameters
  • The nature of the output is not obvious without simulation or analysis
  • It is impossible to match the data in every aspect

2.67. Data model
• "All models are wrong, but some models are useful"
  • A model is approximate
  • Models omit details of the data by their very nature
  • Modeling introduces a tension between the simplicity and the utility of a model
• Under which model is the observed data more likely?
  • Models involve a random process or component
• Three key steps in building a data model:
  • Model selection
    • Parsimonious: prefer models with fewer parameters over those with a larger number of parameters
  • Parameter estimation
  • Validating the model

2.68. Why build models
• Provides a compact summary of a set of measurements
• Exposes properties of measurements that are important for particular engineering problems, when parameters are interpretable
• Serves as a starting point to generate random but "realistic" data as input for simulation

2.69. Probability models
• Why use random models in the Internet?
  • Fundamentally, the processes involved are random
  • The value is determined by an immense number of particular system properties that are far too tedious to specify
• Random models and real systems are very different things
  • It is important to distinguish between the properties of a probabilistic model and the properties of real data.

3. 3 Network measurement infrastructures ETOMIC and SONoMA

3.1. Why Internet experimental facilities are needed?
• The Internet has become a large-scale and complex network
  • inefficient protocols
  • in the Internet it is often not possible to measure traffic flows and other aspects of usage
  • injecting active probes to discover these hidden properties
• Understanding the details of network and traffic dynamics
  • Topology changes
  • Queueing delay variations
  • Available bandwidth
  • One-way delay variations
  • etc.
• Models and analysis of measurement data and traffic dynamics could lead to a better design of Future Internet protocols
3.2. Existing TestBeds and Network Measurement Infrastructures
• DIMES
3.3. Lifecycle of network measurements
3.4. ETOMIC
3.5. The ETOMIC system
• The European Traffic Observatory Measurement InfrastruCture (ETOMIC) was created in 2004 within the EU FP6 Evergrow Integrated Project
• ETOMIC also takes part in EIT ICTLabs FITTING and EU FP7 OpenLab
• Its goals:
  • open access, public testbed for researchers
  • high precision timestamping
  • GPS synchronized
• Visit: www.etomic.org
• Best Testbed Award
3.6. System architecture
• Measurement nodes/agents
  • Geographically dispersed machines are ready to be used by the users
  • Advanced probing nodes called ETOMs
  • Lightweight APE boxes
• Central Management System
  • Schedule experiments, authenticate users, etc.
• Data repository
  • Network Measurement Virtual Observatory
3.7. Evolution of measurement nodes
3.8. ETOMs
• ETOMs with DAG
  • Intel S875WP1E server
  • Debian Linux
  • Endace DAG 3.6GE
    • 60 ns precision
    • GPS synchronized
  • Special C API for programming the DAG card
  • User space applications
    • Packet sender, capturer
• ETOMs with ARGOS
  • HP ProLiant ML370 server
  • Ubuntu Linux
  • Quad core processor
  • ARGOS card (dev.
at UAM)
    • 10 ns precision
    • based on netFPGA
    • GPS synchronized
  • No special API needed
    • Standard pcap library can be used
3.9. APE boxes
• Active Probing Equipment
• low cost network measurement device
  • ca. 300 Euro
• based on a Blackfin programmable board
• developed at Eötvös Loránd University
• 100 ns precision
• GPS synchronized
• uClinux: a Linux operating system for embedded systems
• Low energy consumption
• web service interface for performing predefined measurements (ping, traceroute, packet train sender, capturer, etc.)
3.10. Measurement boxes
3.11. Central Management System
• IBM Blade server
• Key tasks
  • User management
  • Node maintenance
  • Experiment scheduling
  • Storing experimental results (temporarily)
  • Web GUI
3.12. Slices VS Unique timeslots
• In PlanetLab
  • Virtualization
  • Slices
  • Sharing the resources
  • Introducing too much unpredictability in timing measurements
  • Low precision timestamping
• In ETOMIC
  • No virtualization
  • No slices
  • Unique timeslots
  • You own all the resources you need during the experimentation
  • High precision timestamping
3.13. The ETOMIC system
• ETOMIC
• www.etomic.org
• ANME
• www.onelab.eu
• www.etomic.org
• Onelab-2:
  • 26 partners
  • in 13 countries
• ETOMIC:
  • ca. 40 ETOMs and 20 APE boxes
  • on more than 20 different sites
3.14. One day on the Internet
3.15.
Experimental use cases in ETOMIC
• one way delay (60 ns resolution)
• tracking topology changes
• available bandwidth meter
• transport protocol testing
• queuing delay tomography
• geolocation experiments
• …
3.16. HOW TO USE ETOMIC?
3.17. Performing an experiment from the system’s perspective
• Setting up an experiment
  • Uploading scripts, data files
  • Selecting measurement agents
  • Reserving one or more timeslots
• Initialization phase
  • Reserving the selected measurement agents
  • Uploading measurement scripts and other files needed for the experiment
• Execution phase
  • Running the uploaded scripts with the preconfigured settings on the ETOMIC nodes
• Data collection phase
  • Downloading and storing the resulting data files in the CMS database
3.18. Measurement types
• User specific measurements
  • Customized experiments defined by the end-users
  • Almost full control over the measurement agents
• User specific periodic measurements
  • Customized periodic experiments defined by the users
  • Repeating an existing experiment multiple times
    • inter-experiment times
    • repetition count
• Kernel level periodic measurements
  • Experiments carried out by the CMS itself
  • Low priority task
    • executed only if the nodes are idle
    • canceled if a user level experiment arrives
  • Invisible to end-users
3.19.
Necessary steps for submitting an experiment
• Create a bundle
  • Choose the agents
  • Configure them to run your experiment
• Create a new experiment
  • Choose a bundle to be executed
  • Schedule your experiment by selecting a start time for it
• Running the experiment
  • The current state of your experiment can be seen
• Downloading and analyzing the results
  • When the status of your experiment is finished, the resulting files can be downloaded
3.20. Creating a bundle
• Online demo at www.etomic.org
3.21. Creating an experiment and querying its status
• Online demo at www.etomic.org
3.22. Downloading the results
• When the status of the experiment is finished, click on the results icon to go to the Result files tab in the Edit/View files section, where you can download all the files.
3.23. Programming DAG cards
• The libeverdag C library is provided in the Evergrow project to ease the use of the DAG 3.6GE cards.
• The unique features provided by these cards:
  • Synchronized send with high precision timestamps inserted in the payload
  • High performance capture with precision timestamps of receive time
• More details in the API documentation
3.24. PUBLISHING DATA
3.25. Experimental facilities
• most network measurement projects:
  • use a single dedicated infrastructure
  • scan only narrow subsegments
  • analyze a limited set of network characteristics
  • are centralized and separated from each other
• key idea: try to interconnect separate measurement data!
  • large-scale behaviour
  • long-term evolution
3.26. Traditional approach
• Traditionally, measurements are designed to collect only specific data that is important from the point of view of the researcher’s agenda.
3.27. Sharing science
• Genome databases
• Astronomy
3.28. Related work: CAIDA/DatCat
3.29. Related work: MoMe database
3.30. Related work: MAWI repository
3.31. Data publication efforts
• DatCat (USA):
  • searchable catalog of metadata about measurements
  • passive traffic traces, traceroutes, BGP tables, virus propagation studies
• MOME (EU):
  • database for meta-measurement data
  • packet and flow traces, routing data, HTTP traces
  • standardization efforts
  • sharing of analysis tools is possible (e.g. jitter calculation)
• MAWI (Japan):
  • repository of passive traces from the WIDE backbone (collected since 1999)
• raw data is not stored
• raw data from single infrastructure
3.32. Key ideas in data handling
• store and share raw data
  • joint analysis of different types of measurement data
  • reanalysis (with new evaluation methods)
  • reference data (historical comparison)
• share analysis tools
  • server side processing simplifies client applications
  • no need to transfer bulk data packages: online processing
• standardization, network XML
• Network Measurement Virtual Observatory (nmVO)
3.33. VO approach
• The modern approach is to collect and store all measurable data and make it available for "virtual observation". Virtual measurements can have a set of goals different from the original ones.
3.34. Unified interface
• A VO can be realized by collecting measurement data from different infrastructures. Data structures should be standardized (netXML).
3.35. Casjobs User Interface for accessing data
• The Network Measurement Virtual Observatory is available at
  • http://nm.vo.elte.hu
3.36. SONOMA
3.37.
SONoMA v1.0
• Service Oriented NetwOrk Measurement Architecture
• It was originally developed to instrument APE measurement boxes
• Provides a web service interface for the users to perform predefined network measurements
• It can easily be extended with new measurement agents
• Standardized communication via web services
• Visit: sonoma.etomic.org
3.38. Why do we need another network measurement platform?
• The state of the art measurement systems:
  • ETOMIC: requires know-how to program the card
  • PlanetLab: scheduling, data collection and storage are up to the user
  • DIMES: few tools; Penny; cannot predict startup
  • Scriptroute: lacks cooperation and synchronization between nodes; no repository
  • perfSONAR: focuses on monitoring in contrast to complex measurements
  • Looking glass like: few metrics, no common interface
3.39. SONoMA
• "Its key objective is to integrate the complexity of large testbeds and the popularity of the lightweight services."
• Federates heterogeneous measurement agents (19 APE, 230 PlanetLab); dynamic accounting
• Distributes tasks in a robust and efficient way
• Provides an easy-to-extend framework
  • A collection of off-the-shelf measurement and evaluation methods
  • Decreases client side development effort (SOA)
• A tool for novel Internet applications that require real-time large-scale measurement data
• Archives data in an automated fashion
  • In a transparent way
  • Supporting both short and long term experiments
3.40. System components
• Actors
  • A user or a user application
• Management Layer
  • Account users (allow PLE) and Measurement Agents
  • Authenticate users, authorize calls
  • Handle sessions, decompose and schedule processes
  • Hybrid resource management: time sharing / reserving
• Measurement Agents
  • currently two kinds of agents are deployed: APE boxes, PlanetLab nodes
3.41.
Management Layer
3.42. Measurement methods
MA methods defined:
• ping, traceroute, chirp, train, capture, dnslookup
• Synchronous / Asynchronous
• Cannot be accessed by the actors directly
ML method composition:
• Synchronous/Asynchronous measurements = short/long ones
• Atomic measurements: short*, long*, parallel*, ensemble*, ensembleNSlookup (* = Ping or Traceroute)
• Complex measurements /require synchronization/: short*, long* (* = Chirp, Train)
• Statistical evaluations /measurement(s) + model/: getAvailableBandwidth, topology, tomography
3.43. Web client
• http://sonoma.etomic.org
3.44. Case study: A full mesh topology measurement
A Python-like example for obtaining the full mesh topology spanned by the MAs:
    # Service object
    ws = ServiceServerSOAP( url="https://sonoma.etomic.org:8888" )
    # Open a session and authenticate
    sessionID = ws.requestSession( user="User", zip=True, format="CSV" )
    # Submit a topology measurement with the list of MAs to be involved in the experiment.
    # The returned procID is a unique id which refers to the given experiment.
    procID = ws.topology( sessionID, nodeList = [ "157.181.175.247", "132.65.240.38", ... ] )
    # Wait until the measurement ends
    ready = False
    while not ready:
        # Receive the current status of the experiment
        (duration, working, ready, dead) = ws.getProcessInfo( sessionID, procID )
        # ...do some smart waiting...
    # If the measurement is over, the resulting data can be downloaded
    result = ws.getData( sessionID, procID )
    # Finally, the session has to be closed
    ws.closeSession( sessionID )
3.45. Case study: A full mesh topology measurement
3.46. What happens in the background? A full mesh topology measurement
3.47.
Another use case: Spotter
• IP geolocation service (http://spotter.etomic.org)
• based on active delay measurements
• uses SONoMA as its measurement platform to perform real-time network measurements
3.48. SONoMA 2.0
• Being developed within the projects
  • EU FP7 NOVI and OpenLab
  • EIT ICTLabs FITTING
    • Supporting FITTING testbeds and FITTING EduLab
• To provide a measurement/monitoring infrastructure for federated testbeds
• Ontology driven measurement description
  • New tools/experiment types can be added on the fly at run-time by only extending the information model
3.49. Literature
• I. Csabai et al., ETOMIC Advanced Network Monitoring System for Future Internet Experimentation, In Proceedings of TridentCom 2010 Conference, May 18-20, 2010, Berlin, Germany (2010)
• B. Hullár et al., SONoMA: A Service Oriented Network Measurement Architecture, In Proceedings of TridentCom 2011, April 17-19, 2011, Shanghai, China (2011)
• P. Mátray et al., Building a Prototype for Network Measurement Virtual Observatory
4. 4 Network measurement infrastructures PlanetLab
4.1. PlanetLab
• Planetary scale distributed testbed
• More than 1000 users from 40+ countries
• Today: 1157 nodes at 547 sites around the world
4.2. The main goal
• "...to support seamless migration of an application from an early prototype, through multiple design iterations, to a popular service that continues to evolve."
4.3. What is PlanetLab?
• PlanetLab is an overlay test bed
  • designed to allow researchers to experiment with network applications and services
• Benefits
  • Distribution across a wide geographic area
  • Diversity of network, e.g. edge-sites, co-location, etc.
  • Flexibility, maximal control over PlanetLab nodes
• PlanetLab Consortium
  • is a collection of academic, industrial, and government institutions cooperating to support and enhance the PlanetLab overlay network
• Visit http://www.planet-lab.eu
4.4. PlanetLab architecture
• PlanetLab nodes
  • Several virtual machines (VM) on each node
  • Resources are distributed fairly (disk, memory, CPU)
  • Services running in different VMs are isolated from each other
• Elements for managing the overlay network
  • Node Managers, Brokers, agents and service managers
4.5. Slices
4.6. Slices
4.7. Slices
4.8. User Opt-in
• Server
• NAT
• Client
4.9. Services running in your slice
• PlanetLab nodes (physical devices)
  • planet1.elte.hu
  • planetlab2.cse.msu.edu
  • planetlab5.cs.duke.edu
4.10. Services running in your slice
• PlanetLab nodes (physical devices)
• Slices (virtual nodes)
  • elteple_twitter
  • inria_priv
4.11. Services running in your slice
• PlanetLab nodes (physical devices)
• Slices (virtual nodes)
• Applications or services running in the slices
4.12. Services running in your slice
4.13. Services running in your slice
4.14. Virtualization solutions
• Software runtime (e.g. Java VM)
  • High level API
  • Depends on the OS to provide protection and resource allocation
  • Not flexible
• Complete Virtual Machine (e.g.
VMware)
  • Low level API (hardware)
  • Maximum flexibility and excellent protection
  • But high CPU/memory overhead, cannot share common resources among virtual machines
• PlanetLab’s Vserver
  • Kernel patch to mainstream OS (Linux)
  • Gives appearance of separate kernel for each virtual machine
  • Root privileges restricted to activities that do not affect other vservers
  • Resource control (e.g., file handles, port numbers) and protection facilities added
4.15. VServers in a PlanetLab node
[Figure: VServers in a PlanetLab node. Labels: Hardware; Linux (Fedora Core 2); Vservers (one per slice, e.g. Slice 3, Slice 4); Combined Isolation and Application Interface; Resource Isolation; Safe Raw Sockets; Instrumentation; ...]
4.16. VServers in a PlanetLab node
• Extend the idea of chroot(2)
  • New vserver created by system call
  • Descendant processes inherit vserver
  • Efficient solution
• Unique filesystem, SYSV IPC, UID/GID space
• Limited root privilege
  • Can’t control host node
4.17. Low-level network access
• Some applications may need low-level network access
  • Ensure that they cannot access traffic generated by other services
• PlanetLab provides "safe raw sockets"
  • TCP/UDP bound to local port
  • For each IP, ports are either free or owned by a slice
  • Incoming packets delivered only to the service with the corresponding port registered
  • Outgoing packets scanned to prevent spoofing
• ICMP is also supported
  • 16-bit identifier placed in ICMP header
• Like any other resource, ports and ICMP IDs can be reserved by the slice itself
4.18. Getting started
• Register for a PlanetLab Europe account at the following site:
  • http://planet-lab.eu
  • Choose Eotvos Lorand University as your site
• Contact your PlanetLab Principal Investigator to enable your account and add you to a slice
  • Sandor Laki: [email protected]
4.19.
Create your SSH Key
• You need an SSH Key to access PlanetLab nodes remotely
  • SSH login using RSA authentication
• To generate an SSH key pair, use the ssh-keygen program on a secure UNIX system:
    ssh-keygen -t rsa -f ~/.ssh/pl_rsa
  • ssh-keygen asks for a passphrase. For practical reasons, you can leave it empty.
• Upload the public key file to the PlanetLab Europe website
  • Use the Manage My Keys tab
  • Public key in the above case: ~/.ssh/pl_rsa.pub
4.20. Create your slice
• You should ask your PI to create a slice for you, or to associate your account with an existing slice
• After you have been associated with the slice, you may assign nodes to the slice
  • e.g. by using the Manage Nodes tab on the website.
• Slices expire after two months
  • an expired slice is destroyed
  • all files in the slice are removed on all nodes assigned to that slice
  • The lifetime can be extended by using the Renew Slice tab
4.21. Login to your slice
• A slice is initially populated with a minimal Fedora Core 2 Linux installation
• To access your node through SSH, you should use the slice name as login name (e.g. elteple_tutor), your private key and the PlanetLab node to be accessed:
    ssh -l elteple_tutor -i ~/.ssh/pl_rsa planet1.elte.hu
• After getting in, su/sudo commands can be used to control services, install new packages, mount certain resources, etc.
  • Note that this root user has limited privileges…
4.22. Install additional packages
• Additional standard packages can be installed in the slice using yum.
• For example, to install vim:
    yum install vim
• To bring your slice up-to-date with the latest versions of all packages:
    yum update
4.23.
Deploying your app
• The most straightforward way of deploying your app to a single node is with scp:
    scp -i ~/.ssh/pl_rsa myapp [email protected]:
• Or with rsync, which copies just those files that do not exist or have changed on the remote slice:
    rsync -a -e "ssh -l elteple_tutor -i ~/.ssh/pl_rsa" myapp planet1.elte.hu:
• Or with codeploy (http://codeen.cs.princeton.edu/codeploy/), which enables you to deploy your app to multiple nodes
4.24. Configuring a server for automatic startup
• Nodes reboot periodically
• If your app is a long running service, you may want it to restart automatically after reboot
  • set up a System V Init script for it
    • in /etc/rc.d/init.d/
    • modify the line beginning with chkconfig:
      • 1st number is the list of runlevels the script should be run in;
      • 2nd number is the relative order in which it should start;
      • 3rd number is the relative order in which it should stop.
  • Enable it with chkconfig:
    chkconfig --add myapp
    chkconfig myapp on
4.25. Other useful tools
• Parallel SSH
• PlanetLab Slice Deploy Toolkit
• vxargs
• Nixes Tool Set
• A complete list can be found here:
  • http://planet-lab.org/tools
4.26. PSSH
• Developed at Intel Research, Berkeley
• This package provides parallel versions of the openssh tools. Included in the distribution:
  • Parallel ssh (pssh)
  • Parallel scp (pscp)
  • Parallel rsync (prsync)
  • Parallel nuke (pnuke)
  • Parallel slurp (pslurp)
4.27. PSSH Demo
    pssh -h nodes.txt -l elteple_tutor -o /tmp/foo hostname
    pscp -h hosts.txt -l ufl_ipop foo.txt /home/ufl_ipop/foo.txt
    pnuke -h ips.txt -l ufl_ipop java
4.28. PlanetLab Slice Deploy Toolkit
• The PlanetLab Slice Deploy Toolkit consists of three scripts used to manage slices:
  • plslice: create and manage a slice
  • pldeploy: manage a collection of cogs deployed in a slice
  • pladdnodes: example of a script to push a cog to all nodes
• More details:
  • http://psepr.org/tools/
4.29.
vxargs
• It provides parallel versions of any arbitrary command, including ssh, rsync, scp, wget, curl, and so on
• The main features are:
  • parallelism: run many jobs at the same time
  • flexibility: arbitrary command with arbitrary options
  • visualization: monitor the total/per job progress in a curses-based UI
  • redirection: stdout and stderr of each individual job are redirected to separate files for further analysis.
• More details:
  • http://vxargs.sourceforge.net/
4.30. Nixes Tool Set
• More details:
  • http://www.aqualab.cs.northwestern.edu/projects/149-nixes-tool-set
• plsetup node-list: bootstraps the slice with yum and gzip
• plinstallrpm "rpms" node-list: installs all the rpms on all the nodes
• pldeploy node-list: deploys any file structure to the nodes
• plcmd command node-list: executes any set of commands on all the nodes.
4.31. Long-Running Services In PlanetLab
• Content Distribution
  • CoDeeN: Princeton
  • Coral: NYU, Stanford
  • Cobweb: Cornell
• Storage and Large File Transfer
  • LOCI: Tennessee
  • CoBlitz: Princeton
• Information Plane
  • PIER: Berkeley, Intel
  • PlanetSeer: Princeton
  • iPlane: Washington
• DHT
  • Bamboo (OpenDHT): Berkeley, Intel
  • Chord (DHash): MIT
4.32. Services (cont)
• Routing / Mobile Access
  • i3: Berkeley
  • DHARMA: UIUC
  • VINI: Princeton
• DNS
  • CoDNS: Princeton
  • CoDoNs: Cornell
• Multicast
  • End System Multicast: CMU
  • Tmesh: Michigan
• Anycast / Location Service
  • Meridian: Cornell
  • Oasis: NYU
4.33.
Services (cont)
• Internet Measurement
  • ScriptRoute: Washington, Maryland
• Pub-Sub
  • Corona: Cornell
• Email
  • ePost: Rice
• Management Services
  • Stork (environment service): Arizona
  • Emulab (provisioning service): Utah
  • Sirius (brokerage service): Georgia
  • CoMon (monitoring service): Princeton
  • PlanetFlow (auditing service): Princeton
  • SWORD (discovery service): Berkeley, UCSD
4.34. Further available testbeds with PlanetLab Europe account
4.35. NITOS Wireless Testbed
• Features:
  • 40 nodes equipped with Atheros Wi-Fi cards, running on Linux and open-source wireless drivers
  • Extra available features: GNU-radios, cameras, temperature/humidity sensors
  • Mobility: two programmable robots carrying Wi-Fi nodes
  • LTE and 3G-femtocell testbeds to be deployed
  • Remotely controlled, OMF-driven, provides spectrum slicing
  • Interconnected with PlanetLab Europe
• Suitable for:
  • Experimentation with Wi-Fi in a real world environment
  • PHY-layer experimentation through the 6 available GNU-radio-equipped nodes
  • Video over wireless experiments
  • Sensor experiments
  • Slow mobility experiments through the two programmable robots
• http://nitlab.inf.uth.gr/NITlab/index.php/testbed
4.36. w-iLab.t
• A heterogeneous, generic, wireless testbed deployed at two locations: an office (200 fixed nodes) and a pseudo-shielded environment (60 fixed, 20 mobile nodes).
• Features:
  • A node = wireless sensor node + embedded PC with wireless interfaces
  • Wireless, wired and sensor technologies available
  • Testbed functionality fully configurable by user
  • Rich set of tools
• Run wireless protocol design experiments using:
  • reproducible wireless environments and benchmarking strategies
  • reproducible real-life mobility, or fully emulated mobility
  • simultaneous use of different technologies and node types
• http://www.ibbt.be/en/develop-test/ilab-t/wireless-lab
5.
5 Network measurement infrastructures FEDERICA, SFA, OpenFlow
5.1. Federica
• e-infrastructure based on virtualization
• Complete control over the protocol stack
  • Enabling experiments at all the communication layers
  • Including the characteristics of the underlying physical layer (e.g. bandwidth, delay, loss)
• Reproducibility of experiments
  • In contrast to PlanetLab or ETOMIC, where the conditions vary from one experiment to another
• Built on existing infrastructures
  • Géant
5.2. Federica
5.3. The physical topology
• 4 core PoPs
  • DFN, DE
  • PSNC, PL
  • GARR, IT
  • CESNET, CZ
• Each PoP
  • Juniper router
  • 2 or more V-Nodes
• High speed connection
  • 1 Gbps
5.4. Core elements
• Switch:
  • Juniper MX480, Dual CPU, 1 line card with 32 ports at 1Gb Ethernet.
  • Virtual and logical routing, MPLS, VLANs, IPv4, IPv6; 2 of the 4 line cards have hardware QoS capabilities
• V-Node:
  • Quad core AMD @ 2GHz, 32GB RAM, 8 network interfaces, 2x500GB disks, Virtualization SW
  • 2 of them are deployed near each switch
5.5. SFA – SLICE-BASED FACILITY ARCHITECTURE
5.6. Slice-based Facility Architecture SFA
• A great many testbeds have emerged in the past decade
  • EU, US, and national efforts
• What if we want to use more than one?
  • We need an account for each of them
  • Each testbed provides its own tools to create slices, reserve and access resources, so we have to learn these tool sets
  • Various ways to deploy and perform experiments
  • Combining various resources located in different testbeds is impossible or requires manual configuration
• This makes it difficult to try out new ideas.
5.7.
Slice-based Facility Architecture SFA
• Its objective
  • SFA aims at providing a secure common API with the minimum possible functionality to enable global federation
• SFA is a specification
  • Many implementations exist
  • PlanetLab (Python), ProtoGENI (Perl), Federica (Java)
• It was originally conceived by Larry Peterson (Princeton)
  • Developed by Princeton and INRIA
5.8. Experiment lifetime in general
• Authentication
• Resource discovery
• Slice creation
• Resource reservation
• Resource reconfiguration
• Execution of the experiment
• Result collection
• Resource release
• What can SFA help with?
5.9. What can SFA help with?
• Authentication
  • By the local authority
• Resource discovery
  • Providing a general description of the available resources, even in a federated environment
• Slice creation
  • Creating federated slices
• Resource reservation
  • Reserving different resources located in various testbeds and selected by the user
• Resource reconfiguration
• Execution of the experiment
• Result collection
• Resource release
  • Releasing reserved resources
• Delete slice
  • Slice deletion
5.10. SFA for federated testbeds
[Figure: SFA for federated testbeds. Labels: Authorities, Users, Resources, Agreements, Authentication, Resource discovery]
5.11. SFA for federated testbeds
[Figure: SFA for federated testbeds. Labels: Authorities, Users, Resources, Authentication, Resource reservation]
5.12. SFA – Available resources
• Flack: a client side SFA tool
5.13.
SFA functionalities
• Naming (slices, users, resources, authorities)
  • hierarchical naming space
• Authentication and authorization
  • X.509 certificates and signed XML credentials
  • federation links = exchange of certificates
• A standard XML-RPC API
  • Object management: users, resources, slices, authorities
  • Resource management: browse, acquire, manifest
  • Slice management: create, delete, start, stop
• Resource description (RSpecs)
  • only the language (XML), not the semantics
5.14. Hierarchical naming
• Object types:
  • Resources, authorities, users, slices
• Unique names
  • HRN (Human Readable Name)
  • ple.elte.userA
[Figure: naming hierarchy tree. Labels: ple, plc, ...; elte, upmc, princeton, mit, ...; userA, userB, userC, ...; nodeA, nodeB, nodeC, ...; sliceA, sliceB, sliceC, ...]
5.15. Authentication
• X.509 certificates
  • signed by an SFA authority
  • HRN + public key (self signed in the beginning)
• GID = signed chain of certificates
• Example: structure of ple.elte.userA.gid
    subject= /CN=ple.elte.userA   Issuer= /CN=ple.elte
    subject= /CN=ple.elte         Issuer= /CN=ple
    subject= /CN=ple              Issuer= /CN=ple
[Figure: naming hierarchy tree. Labels: ple, ...; elte, upmc; userA, userB, userC, …; nodeA, nodeB, nodeC, …; sliceA, sliceB, sliceC, …]
5.16. SFA API
• Object management
    Register/Remove/Update(Record, Credential)
    Record = Resolve(xrn, Credential)
    Record[] = List(xrn, Credential, options)
• Resource management
    Rspec = ListResources(Credential, [hrn])
    CreateSliver(xrn, Credential, RSpec, users, options)
    DeleteSliver(xrn, Credential, options)
• Slice management
    ListSlices(Credential, options)
    Start/Stop/Shutdown(xrn, Credential)
    SliverStatus(xrn, Credential, options)
5.17.
SFA Components
• Aggregate Manager (AM)
  • Resource management
• Registry (R)
  • Object management
• Slice Manager (SM)
  • Slice management
• Component Manager (CM)
[Figure: SFA components. Labels: AM, SM and R (each shown twice), Local resources (shown twice), CM (shown eight times)]
5.18. Resource Specification (RSpec) Documents
• RSpec
  • allows interoperability among different AMs
  • a common language for describing resources, resource requests, and reservations.
• Standardized XML documents
  • Aggregate or resource specific extensions
• Usage
  • Advertisement RSpec
    • This is the document returned by an AM that describes the resources that the AM has (e.g. calling listresources).
  • Request RSpec
    • This is the document that a user sends to an AM to describe the resources that they want to reserve (e.g. calling createsliver).
  • Manifest RSpec
    • This is the document returned by an AM that describes the resources that a user has reserved at an AM (e.g. calling listresources for a given sliver).
5.19. SFI and SFA client
• Slice Federation Interface (SFI)
  • PlanetLab (PLC)
  • PlanetLab Europe (PLE)
  • PlanetLab Japan (PLJ)
  • VINI
  • protoGENI
• Registry Interface for managing records
  • Add, Update, Remove, Show, List
• Slice Interface for managing slices
  • Resources, Create, Delete, Start, Stop
5.20.
Installation and configuration
• You need a Linux or Mac OS X platform
• Follow the instructions at
  • http://svn.planet-lab.org/wiki/SFAInstallationGuide
  # sfi.py -h
  Usage: sfi [options] command [command_options] [command_args]
  Commands: list,show,remove,add,update,nodes,slices,resources,create,delete,start,stop,reset
  Options:
    -h, --help                          show this help message and exit
    -r URL, --registry=URL              root registry
    -s URL, --slicemgr=URL              slice manager
    -d PATH, --dir=PATH                 config and working directory - default is /Users/soltesz/.sfi/
    -u HRN, --user=HRN                  user name
    -a HRN, --auth=HRN                  authority name
    -v, --verbose                       verbose mode
    -p PROTOCOL, --protocol=PROTOCOL    RPC protocol (xmlrpc or soap)
5.21. List records from the registry
• To list all records in the registry for a given authority, e.g. "ple.elte":
  # sfi.py list ple.elte
  ple.elte.twitter (slice)
  ple.elte.sonoma (slice)
  ple.elte.tutor (slice)
  ...
  ple.elte.laki (user)
  ple.elte.ZZZ (user)
  ...
  ple.elte.planet1 (node)
  ple.elte.planetl2 (node)
• Each line consists of an HRN and the record type in parentheses (slice, user, node or authority).
• To get authority records, query the HRN of a top-level authority such as plc or ple.
5.22. Detailed record information
• To see more information, you can show the record of any object that you have permission to access.
• Typically this is used for fetching slice, node and user (actual user) records
  # sfi.py show -t slice plc.princeton.slicename
  gid:
      hrn: plc.princeton.slicename
      uuid: 229167493191239517049866786687974995079
  last_updated: 2009-07-15 16:15:35.85365
  hrn: plc.princeton.slicename
  type: slice
  date_created: 2009-07-15 16:15:35.85365
  description: A test slice for personal tests.
  researcher: ['plc.princeton.sliceuser']
  expires: 1871836937
  name: princeton_slicename
  url: http://www.cs.princeton.edu/~sliceuser
• Here we request a slice record (-t slice), but the type could also be user or node.
• The slice HRN is plc.princeton.slicename
• The detailed record lists information on
  • the users assigned to the slice, the expiration date,
  • the login name of the slice, etc.
5.23. Get resources
• Slices are defined as the binding between a set of resources and users
• To find out which resources are currently bound to a slice, use the resources command
• For a given slice HRN, this command returns an RSpec listing the nodes that belong to the slice
  # sfi.py resources -o slice.rspec plc.princeton.slicename
  <?xml version='1.0' encoding='ASCII'?>
  <RSpec type="SFA">
    <network name="plc">
      <site id="s10242">
        <name>HU Berlin - IWI</name>
        <node id="n10855">
          <hostname>planetlab1.wiwi.hu-berlin.de</hostname>
        </node>
        <node id="n10856">
          <hostname>planetlab2.wiwi.hu-berlin.de</hostname>
        </node>
      </site>
    </network>
  </RSpec>
• The listing is the content of the output file slice.rspec
• It contains the nodes of the slice plc.princeton.slicename
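The hostnames in an RSpec such as the one above can also be extracted programmatically. The following is a minimal Python sketch and not part of the SFA tools: the function name nodes_by_site is hypothetical, the sample is the slice.rspec listing with the XML declaration omitted, and the code assumes the simple RSpec type="SFA" layout (network, site, node, hostname) shown in the listing.

```python
import xml.etree.ElementTree as ET

# Sample data: the slice.rspec listing from the slide (XML declaration omitted).
RSPEC = """
<RSpec type="SFA">
  <network name="plc">
    <site id="s10242">
      <name>HU Berlin - IWI</name>
      <node id="n10855">
        <hostname>planetlab1.wiwi.hu-berlin.de</hostname>
      </node>
      <node id="n10856">
        <hostname>planetlab2.wiwi.hu-berlin.de</hostname>
      </node>
    </site>
  </network>
</RSpec>
"""


def nodes_by_site(rspec_text):
    """Map each site name to the hostnames of its nodes (hypothetical helper)."""
    root = ET.fromstring(rspec_text.strip())
    result = {}
    for site in root.iter("site"):
        site_name = site.findtext("name", default="").strip()
        hostnames = [node.findtext("hostname", default="").strip()
                     for node in site.iter("node")]
        result[site_name] = hostnames
    return result


if __name__ == "__main__":
    for site_name, hostnames in nodes_by_site(RSPEC).items():
        print(site_name)
        for hostname in hostnames:
            print("  ", hostname)
```

A file written by sfi.py could be handled the same way by reading it into a string first; advertisement RSpecs from other aggregates may use a different schema, so the element names would need to be checked.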
• To obtain all available resources (resource discovery) use
  # sfi.py resources -o all.rspec
  • all.rspec will contain all the nodes with detailed information
5.25. Allocate resources for a given slice
• You need to prepare an RSpec describing the resources you want to allocate
  • You can do it manually by adding "sliver" tags
  • or by using command line tools
  <?xml version='1.0' encoding='ASCII'?>
  <RSpec type="SFA">
    <network name="plc">
      <site id="s15">
        <name>CarnegieMellon</name>
        <node id="n40">
          <hostname>planetlab-1.cmcl.cs.cmu.edu</hostname>
          <bw_limit units="kbps">5000</bw_limit>
          <sliver/>
        </node>
        <node id="n41">
          <hostname>planetlab-2.cmcl.cs.cmu.edu</hostname>
          <bw_limit units="kbps">5000</bw_limit>
          <sliver/>
        </node>
      </site>
    </network>
  </RSpec>
• You can modify the XML returned by the resources command by adding extra <sliver/> tags to the nodes you want to have in your slice.
5.26. Allocate resources for a given slice
• Instead of modifying the RSpec by hand, command line tools can be used
• Get the available resources
  # sfi.py resources -o nodes.rspec
• Create a text file of hostnames
  # sfiListNodes.py -i nodes.rspec -o nodes.txt
• Remove hostnames from nodes.txt or add hostnames to it
• Create an RSpec with the requested resources
  # sfiAddSliver.py -i nodes.rspec -n nodes.txt -o my-nodes.rspec
• And then create the slice with the specified resources:
  # sfi.py create plc.princeton.slicename my-nodes.rspec
5.27. Deallocate resources
• To release or deallocate resources you can use
  # sfi.py delete plc.princeton.slicename new-slice.rspec
  • where new-slice.rspec contains the nodes to be released.
• To deallocate all the resources of a slice, use the above command without the RSpec:
  # sfi.py delete plc.princeton.slicename
5.28.
OPENFLOW CAPABILITIES IN PLANETLAB EUROPE
5.29. What is the problem with existing networks?
• Commodity hardware
  • Difficult to program
  • Always behind the industry
  • Too expensive
• Routers are not only forwarding elements
  • They implement the routing algorithms too
  • Control and forwarding, i.e. the control and data planes, are not separated
• Most of the routes are static
• Traffic differentiation is expensive and not always possible
• No way to test new ideas on a large scale
  • This hinders innovation and the introduction of new technologies
• Reconfiguring routers is expensive and always tiresome
• There is no opportunity for radical changes
• Many good ideas come from research, but there is almost no technology transfer
• This is where Software Defined Networking could help!
5.31. Software Defined Networking
• Separating control and data planes
  • The routers are simple forwarding elements
  • Decisions are made by a programmable, logically centralized element called the controller
[Figure: SDN architecture. Applications (App, App, App, ...) run on top of a Network OS (control), which programs the packet forwarding elements through an open interface between the control and data planes.]
5.32. OpenFlow
• What is OpenFlow?
  • A protocol for controlling the forwarding behavior of Ethernet switches in an SDN
  • Initially released by Stanford, now maintained by the Open Networking Foundation
  • OpenFlow 1.3 was approved in 2012, but most devices support only OpenFlow 1.0
[Figure: OpenFlow architecture. A controller (server software) hosting apps talks via the OpenFlow protocol to the control path of an Ethernet switch, which sits on top of the hardware data path.]
5.33. OpenFlow
[Figure: an innovative app on the controller installs entries in the flow table of a switch.]
• Flow Table:
  • Filter: header=x, Action: send to port 2
  • Filter: header=y, Action: overwrite IP field and send to ports 4 and 5
  • In other cases ask the controller
5.34. Plumbing primitives
• Matching arbitrary bits in the header
  • E.g. 100010110xxx0110x001xxxx
  • Any header or new header
  • Allows any flow granularity
• Possible actions: a small set of forwarding primitives
  • Forward to ports, drop flows, send to the controller
  • Overwrite header with mask, pop or push
  • Forward at a specific bit rate
5.35. Network OSes
• Open source for research
  • NOX (C++/Python)
    • http://www.noxrepo.org/
  • POX
  • Maestro
  • Helios
  • Beacon
  • Etc.
• Commercial
  • ONIX (OSDI 2010, Google, Nicira, NEC)
  • ...
5.36. OpenFlow support in PlanetLab
• Goals
  • Easily deploy a virtual L2 topology
  • Having as many virtual switches as PlanetLab nodes
  • They are connected with virtual cables
• What do we get?
  • User-level switches
  • sliver-ovs: a modified Open vSwitch
  • Support for Ethernet over UDP tunnels
  • One tap device per switch (tunnel access point)
  • Bridging by the OVS instance running in the sliver
5.37. How to use it in PlanetLab?
Installation
• sliver-ovs comes preinstalled with a new slice
• Send a mail to PlanetLab Europe Support to request a private IP subnet for your slice
• On your laptop, the following tools are needed
  • GNU make
  • the openssh client
  • the host program (usually distributed in bind-tools)
  • The Makefile contained in the git repository http://git.onelab.eu/?p=sliver-openvswitch.git;a=tree;f=planetlab/exp-tool;hb=HEAD
5.38. How to use it in PlanetLab?
Assume that we have a slice named elteple_example with four nodes
• onelab7.iet.unipi.it
• planet2.elte.hu
• planetlab2.ics.forth.gr
• planetlab2.urv.cat
The subnet we obtained is 192.168.0.0/24
5.39. Create the topology
• Put a conf.mk file in the same directory as the Makefile, with the following content:
  SLICE=example_slice
  HOST_1=onelab7.iet.unipi.it
  IP_1=192.168.0.1/24
  HOST_2=planet2.elte.hu
  IP_2=192.168.0.2/24
  HOST_3=planetlab2.ics.forth.gr
  IP_3=192.168.0.3/24
  HOST_4=planetlab2.urv.cat
  IP_4=192.168.0.4/24
  LINKS :=
  LINKS += 1-2
  LINKS += 2-3
  LINKS += 2-4
• And then just type make -j.
• You can easily test it: ssh -l elteple_example onelab7.iet.unipi.it ping 192.168.0.4.
5.41. Modify the topology
• You can dynamically add and remove virtual links
• To remove the link between nodes 2 and 3: make -j U/2-3.
• To create a new link between nodes 3 and 4: make -j L/3-4.
5.42. Literature
• SFA User guide:
  • http://svn.planet-lab.org/wiki/SFAGuide
• OpenFlow website:
  • http://www.openflow.org/
• OpenFlow in PlanetLab:
  • https://www.planet-lab.eu/doc/guides/user/practices/openflow
6. Bandwidth measurement methods
Network path characterization
6.1. Methods to measure path characteristics
• Passive methods
  • monitoring
  • Carried out at a certain point of the network
  • Non-invasive
  • Special permissions are needed
• Active methods
  • Artificial probes
  • end-to-end measurements
  • Invasive
  • No special permission is needed
• The network traffic is subdivided into packets:
  • Packet level properties:
    • source, destination
    • Packet size (payload)
    • Timestamps (sending, receiving)
    • + others
6.2. Capacity
[Figure: path from sender S to receiver R over a Fast Ethernet link (100), a DS3/ATM link (45) and another Fast Ethernet link (100); the DS3/ATM link is annotated with 36 (IP) + 9 (ATM).]
6.3. Available bandwidth
6.4. Capacity and Available Bandwidth
[Figure: a path with link capacities C1, C2 and C3, where the k-th link is the narrow one; A marks the available bandwidth.]
6.5. Passive Techniques
• Network managers are very interested in available bandwidth
• It can be measured at each link from router utilization statistics (via SNMP)
  • Multi Router Traffic Grapher
  • MRTG graph example: Weekly traffic
• BUT users do not generally see this data, and it is not end-to-end
• Visit: www.bix.hu (Budapest Internet eXchange)
6.6. Active probing methods
• Goal:
  • estimate network parameters
    • available bandwidth, physical bandwidth, statistical properties of cross traffic, etc.
  • with end-to-end methods
[Figure: a sender and a receiver, each with a monitor (Sender Monitor, Receiver Monitor), exchanging probes over a path that carries background traffic.]
6.7. Basic ideas
• Blast the path with UDP traffic
  • bad and very harmful practice
• Measure the throughput of a large TCP transfer
  • Widely used BUT has many drawbacks
  • TCP does not get the available bandwidth in under-buffered paths
  • TCP gets more than the available bandwidth in over-buffered paths
  • TCP saturates the path
    • intrusive measurements
6.8. State of the art Bandwidth estimation methods
• SLoPS-based:
  • PathLoad 2002
• Packet pair-based:
  • PathChirp 2003
  • PathSensor 2007
  • Cprobe
• TCP throughput measurement tools:
  • Treno
  • Iperf
  • Cap
6.9. SLoPS: Self-Loading Periodic Streams
6.10. SLoPS: Self-Loading Periodic Streams
6.11. SLoPS
[Figure: a periodic stream of probe packets 1 to 4 and their one-way delays D1 to D4.]
6.12. SLoPS
[Figure: a periodic stream of probe packets 1 to 4 and their one-way delays D1 to D4.]
6.13. OWD variations
6.14. How does it work?
6.15. How to determine the parameters K, L and T?
• L: size of the probe packets
  • cannot be less than a certain number of bytes
  • should not be greater than the path MTU
    • to avoid fragmentation
• T: transmission time of the stream
  • should be small
    • to complete the transmission of the stream before a context switch
• K: number of probe packets to be transmitted
  • A large K may overflow the queue of the tight link
  • A small K does not give enough samples
    • to infer the trend robustly
6.16. Fleets of streams
6.17. How to detect the increasing trend of OWDs?
6.18. PathLoad uses two metrics to recognize an increasing trend
6.19. PDT and PCT examples
6.20. PCT variations examples
6.21. PDT variations example
6.22.
Rate adjustment
• Grey region
6.23. Performance
• Source: Jain, Manish et al. (see the last slide)
6.24. Packet Pair-based methods
[Figure: probe pairs are sent with a fixed inter-packet delay (input spacing, sender node) through background traffic modelled as a stochastic process; the output spacing is measured at the receiver node.]
6.25. PathChirp: Chirp Packet Trains
6.26. PathChirp
• Goal: exploit information in the queuing delay signature
• Excursions
6.27. PathChirp Methodology
6.28. Self-Induced Congestion
6.29. Excursions
• Only a few packets are involved
  • (not valid excursions)
6.30. pathChirp Tool
• Available as a standalone tool
  • Open source
  • http://www.spin.rice.edu/Software/pathChirp/
• How does it work?
  • Two instances: a client side and a server side
  • It sends UDP probe packets between the end points
  • No clock synchronization is required, since it only uses the relative queuing delay within a chirp duration
  • Computation at the receiver
  • User-specified average probing rate
6.31. Comparison with Pathload
• Experimentation setup:
  • 100 Mbps links
  • Poisson cross traffic
  • Y-topology
• Approx. 10 times fewer bytes are enough
6.32. PathSensor: Granular model-based bandwidth estimation
• fluid model
  • the asymptotic behavior is correct
  • but it is unable to describe the transition region
• new analytic description of the transition region
  • Introducing granularity, the effective cross traffic packet size
6.33. Estimating output spacing with fluid traffic for a single-hop scenario
6.34.
Fluid curves for single-hop
6.35. How to simulate cross traffic?
6.36. Output spacing
6.37. Output spacing
6.38. Explicit solution for M/D/1 queues
• The simplest M/G/1 type case is an M/D/1 queue:
  • single cross traffic packet size: P
  • Poisson arrival rate: λ
6.39. Explicit solution for M/D/1 queues
• Validation with packet level simulation
  • M/D/1 queue:
    - Poisson arrival process
    - single cross traffic packet size
    - theoretical curves from the M/D/1 result
  • Trimodal packet size distribution:
    - packet sizes are 40, 576 and 1500 bytes, while the corresponding probabilities are 0.59, 0.23 and 0.23
    - theoretical curves from the numerical integration of the Takács equation
  • Uniform packet size distribution:
    - theoretical curves from the numerical integration of the Takács equation
• Parameterization with the granularity
  • We can substitute the Taylor expansion of the queue length distribution function
  • into the Takács equation
  • and introduce the granularity parameter Pg
  • The exact form of the cross traffic packet size distribution is not necessary; the value of the granularity is enough to describe the transitional range of the dispersion curve as well
• Parameterization with the granularity
  • M/D/1 curves for:
    • single packet size
    • uniform
    • Tri-modal, real Internet
• Parameterization with the granularity
• Granularity in a multihop scenario
  • Granularity determines the deviation from the fluid model even in a multihop case
  • Cross traffic with different granularities
  • Cross traffic with the same granularities
• Estimating the network parameters: granular vs. fluid estimates
  • simulated scenarios:
    - fluid and granular estimates on the same measurement data
    - comparing the estimated network parameters to the real values
• Laboratory experiments
  • Experiment settings: C = 10 Mbps, Cc = 4 Mbps, Pg = 12000 bits
    - fitted granular parameters: C = 10.0 Mbps, Cc = 3.7 Mbps, Pg = 12000 bits
    - fluid parameters: C = 10.5 Mbps, Cc = 5.1 Mbps
    - 100 packet pairs were averaged
  • Experiment settings: C = 100 Mbps, Cc = 22 Mbps, Pg = 12000 bits
    - fitted granular parameters: C = 100 Mbps, Cc = 22.5 Mbps, Pg = 15000 bits
    - fluid parameters: C = 120 Mbps, Cc = 58 Mbps
    - 20 packet pairs were averaged
• Internet experiments
  • Carried out in ETOMIC
  • ETOMIC nodes located in Stockholm, Sweden and Jerusalem, Israel: estimated granular parameters compared with fluid parameters
  • ETOMIC nodes located in Pamplona, Spain and Budapest, Hungary: estimated granular parameters compared with fluid parameters
• Laboratory and Internet experiments
  • Comparison to existing tools:
    - pathload
    - pathChirp
    - modified pathChirp tool with granular model based bandwidth estimation
6.40. Literature
• Jain, Manish, and Constantinos Dovrolis. Pathload: A measurement tool for end-to-end available bandwidth. In Proceedings of Passive and Active Measurements (PAM) Workshop. 2002.
• Ribeiro, Vinay Joseph, et al. pathChirp: Efficient available bandwidth estimation for network paths. In Proceedings of Passive and Active Measurements (PAM) Workshop. 2003.
• Strauss, Jacob, Dina Katabi, and Frans Kaashoek.
A measurement study of available bandwidth estimation tools. Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement. ACM, 2003.
• Hága, Péter, et al. Understanding packet pair separation beyond the fluid model: The key role of traffic granularity. IEEE INFOCOM. 2006.
• Hága, Péter, et al. Granular model of packet pair separation in Poissonian traffic. Computer Networks 51.3 (2007): 683-698.
• Hága, Péter, et al. PathSensor: Towards Efficient Available Bandwidth Measurement. Proceedings of IPSMoMe 2005. 2005.
7. Topology discovery in large-scale networks
7.1. Topology discovery
Why are we interested in discovering topology?
• Reverse engineering the network
  • Better understanding of Internet routing
  • Forwarding and route dynamics
• Simulation tools for the Internet
  • Topology and routing behaviour
• Network management
  • Discovering multicast trees
  • Monitoring path properties
• Failure localization
  • Predict path failures
• Topology aware algorithms
7.2. Challenges
• The topology is determined by the routing algorithms of the Internet
• Large scale: over 1 billion hosts
  • Impossible to store each IP address
• Administrative regions
  • Internet = network of networks
  • ISPs want to hide the details of their own networks
• Routes are not unique and may change
  • Load balancer routers
  • AS level or intra-domain changes
7.3. Naive approaches
• Traceroute measurements
  • Sometimes misleading
  • Huge traffic overhead
  • Not suitable for detecting path changes
    • Limited probing frequency
• BGP tables
  • AS level topology
  • BGP data lacks visibility
  • RIPE and other organizations collect and publish BGP data from some routers
7.4. CAIDA's Skitter
7.5. NetDimes
7.6.
Expectations
• A topology discovery method should be
  • Efficient
    • Low traffic overhead
  • Accurate
    • Handling the effect of load balancing
    • Providing an accurate Internet map
  • Fast enough
    • To keep track of route changes
    • Probes with high frequency
7.7. Different methods
[Figure: route discovery tools (Traceroute, Tracetree, DoubleTree, ParisTraceroute, DTrack) positioned by efficiency and accuracy.]
7.8. ROUTE DISCOVERY
7.9. Traceroute
• Try the traceroute tool
  • It measures the path from a source to a destination
  • ICMP or UDP probes
• How does it work?
  • It sends UDP probes to each hop on the route by varying the TTL field of the IP header
  • When the TTL reaches 0 at a given router, the router sends an ICMP Time Exceeded (TE) message back to the source, with the router's address as the source address.
  • When the probe arrives at the destination, a Port Unreachable (PU) message is sent back, indicating that the measurement is over.
  • The source increases the TTL field of the probes from 1 up to a given limit (30)
    • to test all the hops along the path
7.10. How traceroute works?
[Figure: path S, R1, R2, R3, D.]
• UDP probe: TTL: 1, Dest.: D, Port: 6761
• ICMP TE reply: Dest.: S, Source: R1
7.11. How traceroute works?
[Figure: path S, R1, R2, R3, D.]
• UDP probe: TTL: 2, Dest.: D, Port: 6762
• ICMP TE reply: Dest.: S, Source: R2
7.12. How traceroute works?
[Figure: path S, R1, R2, R3, D.]
• UDP probe: TTL: 3, Dest.: D, Port: 6763
• ICMP TE reply: Dest.: S, Source: R3
7.13. How traceroute works?
[Figure: path S, R1, R2, R3, D.]
• UDP probe: TTL: 4, Dest.: D, Port: 6764
• ICMP PU reply: Dest.: S, Source: D
  • Port 6764 is unreachable
  • D has been reached
7.14.
Problems
• Missing ICMP replies
  • Some routers do not send TE messages
  • Or the destination does not send PU messages
• Sometimes the route is not unique
  • In the presence of load balancers the discovered path may be misleading
    • Per-packet
    • Per-flow
7.15. Problems with load balancers
[Figure: path from S to D through the routers R1 to R5, one of which is a load balancer with alternative branches.]
7.16. Problems with load balancers
[Figure: the same topology with the path reported by traceroute highlighted.]
• The inferred path is not valid
7.17. What causes this anomaly?
• Traceroute uses the destination port as an identifier
  • This is why the destination port increases hop by hop
• However, per-flow load balancers use the destination port as part of the flow identifier
  • The decision about which alternative route is chosen also depends on this port number
• Per-packet load balancers
  • E.g. selecting alternative routes uniformly at random for all the packets
  • Their presence is not too significant
[Figure: the topology S, R1 to R5, D with probes carrying DPort: 1, DPort: 2, DPort: 3 and DPort: 4 taking different branches.]
7.18. A more complex example
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet
• The discovered path is not a real one.
7.19. What can we do?
• Solution: Paris Traceroute
• Idea:
  • Keep the destination port fixed for every hop
  • Use the checksum field to identify the probes instead
• It also handles load balancers
  • Sending multiple probes to each hop with different destination ports
  • Multipath Detection Algorithm (MDA)
[Figure: the topology S, R1 to R5, D with all probes carrying DPort: 1 and following the same branch.]
7.20. Paris Traceroute Algorithm
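The anomaly from 7.17 and the fix from 7.19 can be reproduced with a small simulation. This is a minimal Python sketch under stated assumptions, not the Paris traceroute implementation: the diamond topology (routers named L, A, B, C, D, E), the CRC32-based branch selection and the addresses are all hypothetical, and the checksum-based probe matching is not modelled. Only the port numbers 6761 to 6764 come from the slides.

```python
import zlib

# Hypothetical diamond topology: the load balancer "L" is the first hop and
# splits traffic over two branches (A-C and B-D) that merge again at "E".
BRANCHES = [["A", "C"], ["B", "D"]]
REAL_PATHS = [["L"] + branch + ["E"] for branch in BRANCHES]


def path_for_flow(flow):
    """Per-flow load balancing: a hash of the flow identifier picks the branch."""
    key = "|".join(str(field) for field in flow).encode()
    branch = BRANCHES[zlib.crc32(key) % len(BRANCHES)]
    return ["L"] + branch + ["E"]


def trace(dports, src="10.0.0.1", dst="10.0.9.9", sport=40000):
    """Send one probe per TTL and record which router answers at that hop."""
    hops = []
    for ttl, dport in enumerate(dports, start=1):
        path = path_for_flow((src, dst, sport, dport, "udp"))
        hops.append(path[ttl - 1])  # the probe expires at the ttl-th router
    return hops


# Classic traceroute: the destination port grows with every probe, so each
# probe is a different flow and may be forwarded over a different branch.
classic = trace(dports=[6761, 6762, 6763, 6764])

# Paris traceroute: the destination port is kept fixed, so every probe
# belongs to the same flow and follows the same branch.
paris = trace(dports=[6761, 6761, 6761, 6761])

if __name__ == "__main__":
    print("classic:", classic, "real path" if classic in REAL_PATHS else "mixed path")
    print("paris:  ", paris, "real path" if paris in REAL_PATHS else "mixed path")
```

With a fixed flow identifier the reported hops always form one of the real paths; with varying ports the result depends on how the hash happens to map each probe, which is why the classic output may combine routers from both branches.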
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet
• Test if interface r at hop h-1 belongs to a per-packet load balancer
7.21. Finding the NEXTHOP
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet, in IEEE/ACM Transactions on Networking (TON), vol. 19, issue 3, pp. 830-840, June 2011
• Generates a hash to randomly sample next hop interfaces of r at hop h
• Probe retransmission, discovering a next hop interface s
• Maintaining a set of hash values for hop h and interface s
• The set of next hop interfaces of r
7.22. The key ideas behind NEXTHOP
7.23. Number of probes and the expected number of interfaces at 95 percent confidence level
• n: number of expected interfaces
• k: number of probes to send
• k is easy to compute with a binary search.
7.24. SELECTFLOW: Selecting a flow
• To determine the nexthop interfaces of router r
• Each probe has a different flow identifier
  • Different source ports in case of UDP and TCP
  • Sequence number in case of ICMP probes
• Hash based identifier selection
  • The probe's index is mapped to the range 10000-65535
• The destination address of TCP and UDP probes is fixed
  • It helps our probes to get through firewalls
• Important for scanning a hop
  • selecting identifiers that go through the router r as well
  • If there are not enough such identifiers, new measurements need to be carried out
7.25. SELECTFLOW: discovering new flows crossing router r
7.26. PERPACKET
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet, in IEEE/ACM Transactions on Networking (TON), vol. 19, issue 3, pp. 830-840, June 2011
7.27.
Discovering nexthop interfaces in presence of a load balancer
• Looking for the nexthop interfaces of Rx
• First assumption: Rx has two interfaces
  • Sending 6 probes through Rx (95 percent confidence)
• Then we assume that there is another interface
  • So another 5 probes are sent through Rx
• We did not find another interface
  • So the route discovery is finished
[Figure: S sends probes with source ports 1 to 11 through Rx; the known next hops are Ry and Rz, and the hypothesised further interfaces are marked with "?".]
7.28. Discovering nexthop interfaces in presence of a load balancer
[Figure: the same scenario one step further, with the probes (source ports 1 to 16) distributed over the discovered interfaces and the next unknown interfaces marked with "?".]
7.29. Performance of Paris traceroute
• Load balancer routers discovered:
  • per-flow: 30 percent
  • per-packet: 2 percent
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet, in IEEE/ACM Transactions on Networking (TON), vol. 19, issue 3, pp. 830-840, June 2011
7.30. Load balancers
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet, in IEEE/ACM Transactions on Networking (TON), vol. 19, issue 3, pp. 830-840, June 2011
7.31. TOPOLOGY DISCOVERY
7.32. Topology discovery
• To discover tree topologies in the source to receivers direction
• End-to-end paths may overlap
  • Some links are scanned several times
  • This generates a higher traffic load on the network than topology discovery requires
• How to handle this redundancy?
7.33.
DoubleTree
• Efficient cooperative topology discovery algorithm
• Handles two kinds of redundancy
  • Intra-monitor redundancy
  • Inter-monitor redundancy
7.34. The actual topology
[Figure: topology with three monitors and the destinations D1 to D4.]
7.35. Intra-monitor redundancy
[Figure: the same topology with the paths probed from one monitor highlighted.]
• If probes are sent from a specific source to multiple targets, many links (close to the source) are probed several times, resulting in intra-monitor redundancy.
7.36. Inter-monitor redundancy
[Figure: the same topology with the paths probed towards one destination highlighted.]
• If probes are sent from multiple sources to a specific target, many links (close to the destination) are probed many times, resulting in inter-monitor redundancy.
7.37. Tree-like structures
• Suggesting two different kinds of probing strategies
  • To discover two kinds of tree-like structures
• Monitor rooted tree
  • For the intra-monitor case
  • Govindan et al. and TraceTree
• Destination rooted tree
  • For the inter-monitor case
7.38. Monitor rooted tree
[Figure: the topology with a tree rooted at a monitor.]
• Reducing redundancy by scanning each link once
7.39. Destination rooted tree
[Figure: the topology with a tree rooted at a destination.]
• Reducing redundancy by scanning each link once
7.40. DoubleTree
• Forward and backward probing require different strategies
• How to handle both kinds of redundancy?
• DoubleTree
  • starts probing at a given hop h
  • First performs forward probing from h
  • Then continues with backward probing from h-1
7.41. Maintaining trees
7.42. DoubleTree results
• Measurement load reduction up to 76 percent
• Interface and link coverage above 90 percent
[Figure: sources S1 and S2 and destinations D1 and D2 connected through two networks and the intermediate nodes A and B, with the starting hop h marked; 4 partial measurements instead of 8 full paths.]
7.43. Literature
• Augustin, Brice, et al. Avoiding traceroute anomalies with Paris traceroute.
Proceedings of the 6th ACM SIGCOMM conference on Internet measurement. ACM, 2006.
• Augustin, Brice, Timur Friedman, and Renata Teixeira. Measuring multipath routing in the Internet. IEEE/ACM Transactions on Networking (TON) 19.3 (2011): 830-840.
• Donnet, Benoit, et al. Efficient algorithms for large-scale topology discovery. ACM SIGMETRICS Performance Evaluation Review. Vol. 33. No. 1. ACM, 2005.
8. 8 Network tomography
8.1. What does tomography mean?
• Noninvasive imaging technique
• X-ray or other beam directed through a part of the body onto sensors
• The sensor data are used to reconstruct cross-sectional views of the body
• Source: http://en.wikipedia.org/wiki/File:BrainMRI3planes.gif
8.2. Network tomography?
• Who knows what is going on in the network?
• The network is a weighted graph
• Weighting could be
  • Loss
  • Delay
  • Traffic rate
  • Etc.
8.3. Network tomography?
• Reveal the internal state and structure of the network from
  • End-to-end/external measurements
  • Network models
• The main goal is network optimization:
• Performance metrics
  • Find bottleneck links, link characteristics
• Diagnostics
  • Detect if something is broken or slow
• Security
  • How to know if a malicious element (e.g. a sniffer) was added to the network
8.4. How does it work?
[Figure: the Internet]
• Sending probes from different sources to some receivers
8.5. How does it work?
[Figure: the Internet]
• A limited number of measurements to infer the underlying topology and link performance metrics
8.6. Network Tomography
• Why end-to-end measurements?
  • ISPs do not share much about their networks and do not allow internal measurements
  • The Internet is a decentralized, stateless network
• Measurement techniques
  • Unicast probes
    • Inferring path characteristics
  • Multicast probes
    • Inferring characteristics for multicast tree segments
  • Unicast back-to-back probes
    • Try to simulate multicast probes
8.10. What else is needed?
8.11. Estimating Source-destination traffic intensities
8.12. Estimating Source-Destination traffic intensities
8.13. A toy example
8.14. EM algorithm
8.15.
MLE and Normal Approximations
8.16. Multicast-based loss inference
• Cáceres, Ramón, et al. Multicast-based inference of network-internal loss characteristics. Information Theory, IEEE Transactions on 45.7 (1999): 2462-2480.
• EM-based approach to estimate loss rates on internal links from end-to-end measurements.
• Multicast from one source (0) to multiple destinations (4, 5, 6, 7)
8.17. Loss model
8.18. Loss inference
8.19. Solution with EM
8.20. Convergence
8.21. Convergence
8.22. Unicast network tomography
• Coates, Mark, et al. Maximum likelihood network topology identification from edge-based unicast measurements. ACM SIGMETRICS Performance Evaluation Review 30.1 (2002): 11-20.
• Sandwich probing measurement scheme
• Stochastic search method for topology identification
8.23. Sandwich probing
• Sending three unicast probe packets
  • Two small packets
  • And a large one in the middle
• The main idea:
  • Because of queuing, extra separation between the small packets is induced on shared links
[Figure: probe packets p1, p3, p2]
8.24. Sandwich probing
[Figure: probe packets p1, p3, p2; labels: Branching point; Queues, but not branching points]
8.25. Measurement framework
8.26. Topology Identification
8.27. Simplifying the problem
8.28. Find the tree
8.29. Illustration
• True topology
• Inferred topology
8.30. Literature
• Vardi, Yehuda.
Network tomography: Estimating source-destination traffic intensities from link data. Journal of the American Statistical Association 91.433 (1996): 365-377.
• Cáceres, Ramón, et al. Multicast-based inference of network-internal loss characteristics. Information Theory, IEEE Transactions on 45.7 (1999): 2462-2480.
• Coates, Mark, et al. Maximum likelihood network topology identification from edge-based unicast measurements. ACM SIGMETRICS Performance Evaluation Review 30.1 (2002): 11-20.
9. 9 Network coordinates systems
9.1. Introduction
• Aims
  • Estimate delays without performing direct measurements
  • Reduce the consumption of network resources
• Usage
  • Closest mirror selection
    • Such as the closest game server
  • Construction of application-level multicast trees
  • OASIS (a distributed anycast system)
  • Peer-to-peer networks
    • Azureus (now called Vuze, a BitTorrent client)
  • Distribute web crawling to nearby hosts
  • Provide inter-node latency bounds for clusters
9.2. The key idea of an NCS
9.3. Localization Techniques
• Global Positioning System
• Geolocation approaches
  • Already discussed
• Meridian Approach
  • Wong et al. (SIGCOMM 2005) [23]
  • A framework for hosts to look up their nearest peers in an overlay network
  • Node selection directly, without computing coordinates
  • Constructs a multi-resolution ring structure
9.4. Localization Techniques
• An example: Meridian
• Scalable, gossip-based node discovery
• Example for closest node selection:
9.5. Localization Techniques
• Main drawbacks
  • Huge number of wide-area-spanning e2e paths
  • This makes performing on-demand measurements impractical
  • High-frequency probing, costly
  • Time-consuming
9.6. Network Coordinates System Basics
9.7. Network Coordinates System Basics
9.8.
Network Coordinates Systems Advantages
• Easy and practical support for P2P applications
  • Characterizing the proximity among peers
  • Neighbour selection
• Scalability
  • Direct measurements are eliminated
• Acceptable accuracy
  • The accuracy is not perfect, but acceptable
• NCS families
  • Landmark based
    • Fixed set of well-known trusted nodes
  • Decentralized
    • Any node may be used as a landmark
9.9. LANDMARK BASED NCS
9.10. IDMaps
• Internet Distance Map Service (IDMaps)
• Francis et al. (INFOCOM 1999) [13]
• The first complete system
• Predecessor of landmark-based NCSs
• HOPS servers – Tracers – ordinary hosts
• Virtual topology map of the Internet
[Figure: T1, T2]
9.11. Landmark based NCSs
• Landmarks
  • All-to-all measurements
  • Each landmark computes its own coordinates (Fig. a)
• Ordinary hosts
  • Measurements towards the landmarks
  • Each host evaluates its own coordinates based on those of the landmarks (Fig. b)
9.12. Global Network Positioning
• Phase 1
  • Measuring inter-landmark latencies
  • Calculating the landmark coordinates
  • Minimizing the relative error function with the downhill simplex method
• Phase 2
  • Measuring the latencies between node A and each of the landmarks (K required)
  • Host A computes its own coordinates using the downhill simplex method
9.13. Lighthouses
Lighthouses is a GNP extension
• Pias et al. (IPTPS 2003)
• Problems with GNP
  • Huge volume of measurement traffic
  • Growing number of target nodes
• Solution
  • Using multiple landmark sets
  • Each host measures distances to only one set
• It is based on the concept of
  • Multiple local bases with a transition matrix
9.14. Lighthouses
9.15. Network Positioning System
Network Positioning System (NPS)
• Zhang et al.
(USENIX ATC 2004) [26]
• Extends GNP into a hierarchical coordinate system
• All nodes can serve as landmarks
• Introduces layers and dependencies
• Recovery mechanism
  • Landmark failures
  • Performance bottlenecks
9.16. Internet Coordinate System
9.17. Internet Coordinate System
• Clustering scheme for landmarks
  • An administrative node groups landmarks that are close to each other into clusters
  • The median nodes of the clusters are used by a joining node A
  • Choosing the most representative landmarks
  • Reducing the number of measurements
9.18. Virtual landmarks
9.19. Internet Distance Estimation Service
Internet Distance Estimation Service (IDES)
• Mao et al. (IMC 2004)
• Provides two learning algorithms that apply linear dimensionality reduction to matrices
  • Singular Value Decomposition (SVD)
  • Non-Negative Matrix Factorization (NMF)
• Instead of Euclidean embedding
[Figure: hosts H1 to H4 with distances of 1 between them, and one possible 2-D embedding]
• The estimated distance between H1 and H4 is 1.414 while the real distance is 2.0
• Extra dimensions don't help
9.20. DISTRIBUTED NCS
9.21. Distributed NCSs
• Generalizing the role of landmarks
  • to any node existing in the system
  • or by eliminating the permanent infrastructure
• It can be seen as a peer-to-peer positioning system.
9.22. Practical Internet Coordinates
9.23. Big-Bang Simulation
• Big-Bang Simulation (BBS)
• Shavitt et al.
(INFOCOM 2003)
• Modeling the network nodes as a set of particles
• Particles travel in the space under the effect of a potential force field
• They are initially placed at the origin of the space
• The force field is derived from the total embedding error
• Particles attract and repel each other depending on the distance error between them
• Reducing the potential energy of the whole system
• At the end of each phase an equilibrium is reached
9.24. Big-Bang Simulation
9.25. Vivaldi
• Dabek et al. (ACM SIGCOMM 2004)
• The most successful NCS so far
• Used by some BitTorrent clients
• Does not require any fixed network infrastructure
• Computing the coordinates for a node A
  • Collecting distance information for a set of neighbors
  • Calculating its new coordinates using the above measurements
• Each edge is modeled as a spring
• Handling high-error nodes
  • Weights for each RTT sample
• Vivaldi converges quickly when latencies satisfy the triangle inequality
9.26. Vivaldi
A variant of Euclidean coordinates
• Notion of height
• Height space
  • Euclidean coordinate + a height vector
• To model the latency penalty of network access links
  • Such as queuing delay, DSL lines or cable modems
  • Representing the last-hop delays
• Distance between nodes
  • Euclidean distance + a positive value of the height vector
Some further extensions:
• E.g. using hyperbolic space
9.27. Vivaldi
9.28. Vivaldi – Centralized algorithm
9.29. Vivaldi – Centralized algorithm
9.30. Distributed Vivaldi with constant timesteps
9.31. Vivaldi – Adaptive timesteps
9.32. Decentralized Vivaldi with adaptive timestep
9.33. Latency data for performance analysis
• PlanetLab all-to-all ping measurements
  • 192x192 delay matrix
• The King dataset involves 1740 DNS servers
9.34. Timestep choice
9.35.
Convergence and robustness against high-error nodes
9.36. Communication patterns
9.37. Triangle Inequality Violations
9.38. Euclidean spaces
• The first question is how many dimensions to use
• 2 or 3 are sufficient
9.39. Spherical coordinates
9.40. Height model
9.41. Height model
9.42. Pharos - Hierarchical Vivaldi
• Y. Chen et al. (GLOBECOM 2007)
• A two-layer model based on Vivaldi
  • Base overlay (Vivaldi)
  • Local cluster
• Binning method with anchor nodes
• Each node has two sets of coordinates
  • Global NC
  • Local NC
9.43. Pharos – The algorithm
9.44. Hierarchical distance prediction
9.45. A two-tier ICS
• M. A. Kaafar et al. (IFIP-TC6 Networking ’08)
• Motivation
  • TIVs have negative effects on the distance prediction
  • The paper analyzes the proportion of TIVs
9.46. Triangular inequality violation
• M. A. Kaafar et al. (IFIP-TC6 Networking ’08)
• TIVs have negative effects on the distance prediction
• Analyzing the proportion of TIVs
9.47. Triangular inequality violation
• Impact of TIV severity on the embedding
9.48. Two-tier Vivaldi
• Two levels
• Higher level - flat Vivaldi
  • Global coordinates
• Lower level – Vivaldi on clusters
  • Local coordinates
• Clustering
  • Flat Vivaldi – Clustering the delay space
  • Diameter ms
  • Cross-checking and removing nodes with high errors
9.49. Two-tier Vivaldi
9.50.
Limitations
• Expensive maintenance
• More or less accurate prediction
• Triangle inequality violations (TIV)
  • The matrix factorization introduced in IDES can represent asymmetric distances and distances that violate the triangle inequality
  • Better accuracy when considering a lower-dimensional space (e.g. 2-dim.) [50], [51]
  • The absolute relative error may not be the major indicator of the quality of an embedding
• Euclidean space vs other spaces
  • Surface of spheres or tori
  • Hyperbolic model
  • Instead of the height model
9.51. Benefits
• Benefit for P2P applications and overlays
• Azureus uses Vivaldi
  • More than one million nodes
  • It improves the efficiency of Azureus
• To improve the accuracy and stability
  • Latency filters and application-specific coordinate updates
  • Gossip-based coordinate updates
• TIV exclusion or awareness
  • Azureus example: 0.5 percent of TIV nodes leads to a 20 percent improvement in global accuracy
9.52. Comparison of different techniques
9.53. Security in NCS
PIC
• Providing a test based on TIV
  • Removing the nodes that most violate the triangle inequality
• For all landmarks two bounds are introduced
  • Upper and lower bounds
• This mechanism may degrade the performance of a clean system
9.54. Security in NCS
Internal attacks
• Participants are not completely trusted entities
• Attack families
  • Isolation
    • Independent isolation attack
    • The M node delays the measured RTT such that it is consistent with the random coordinates claimed for the victim
  • Repulsion
    • The M node claims a position that is far away from the actual one and then delays the measured RTT
  • Disorder
    • Random attack, like a DoS attack
  • System control
    • Colluding isolation attack
    • M nodes cooperate with each other
    • First they behave in a correct and honest way until enough of them become landmarks
9.55.
Security in NCS
Generic security mechanisms
• Surveyor infrastructure
  • Surveyor nodes form a group of trusted entities
  • Scattered across the network
• Formal reputation model to detect misbehaving nodes
  • Reliability of nodes
  • Reputation Computation Agent
  • Certificate agent
• RVivaldi
• Veracity
9.56. Future directions
• NCSs focus on predicting network latency
• Other network characteristics
  • Jitter
  • Bandwidth
• Additional dimensions in the existing latency space
  • Inverse correlation between latency and bandwidth
  • Lee et al. (PAM 2005) [72]
• Embedding additional performance indicators in their own metric space
  • E.g. bandwidth
  • Streaming or downloading
  • Server selection
• Sequoia [74]
  • Assuming that bandwidth is a tree metric
  • Prediction trees
9.57. Literature
• F. Dabek, R. Cox, F. Kaashoek, R. Morris: Vivaldi: A Decentralized Network Coordinate System, Proceedings of ACM SIGCOMM ’04, 2004, Portland, Oregon, USA
• Y. Chen, Y. Xiong, X. Shi, B. Deng, X. Li: Pharos: A Decentralized and Hierarchical Network Coordinate System for Internet Distance Prediction, Proceedings of IEEE GLOBECOM 07, 2007, Washington, DC, USA
• M. A. Kaafar, B. Gueye, F. Cantin, G. Leduc, L. Mathy: Towards a Two-Tier Internet coordinate system to mitigate the impact of Triangle Inequality Violations, Proceedings of IFIP-TC6 Networking 08, 2008, Singapore, LNCS
10. 10 IP geolocation
10.1. Motivation
• Location information can be useful for both private and corporate users
  • Targeted advertising on the web
  • Restricted content delivery
  • Location-based security check
• Scientific applications
  • Measurement visualization
  • Network diagnostics
  • Analysing spatial properties of the Internet
10.2.
IP Geolocation in general
• Passive Geolocation
  • Geolinguistic
  • Registry based
    • Whois, DNS
    • Organizational information
  • Commercial databases
    • Maxmind, IPLigence, IP2Location, etc.
• Active Geolocation
[Figure: rt1.lon.uk.geant2.net, London, UK]
10.3. Whois based location estimation example for passive geolocation
• Cumulative distribution of the maximal distances from Pamplona, Spain to 4000 Google IPs. The maximal distances are calculated from the network delays, assuming a signal propagation speed of 200,000 km/s. The vertical line represents the real geographical distance between Pamplona and Mountain View, CA, showing that 47 percent of the nodes must be closer to Pamplona than Mountain View is.
10.4. IP Geolocation in general
• Passive Geolocation
  • Geolinguistic
  • Registry based
    • Whois, DNS
    • Organizational information
  • Commercial databases
    • Maxmind, IPLigence, IP2Location, etc.
• Active Geolocation
• Large and geographically dispersed IP blocks can be allocated to a single entity
[Figure: rt1.lon.uk.geant2.net]
10.5. IP Geolocation in general
• Passive Geolocation
  • Geolinguistic
  • Registry based
    • Whois, DNS
    • Organizational information
  • Commercial databases
    • Maxmind, IPLigence, IP2Location, etc.
• Active Geolocation
  • Active probing
    • Delay, topology, etc.
  • Landmarks
    • With known location
  • Location estimates for each individual IP address
• Large and geographically dispersed IP blocks can be allocated to a single entity
[Figure: rt1.lon.uk.geant2.net]
[Figure: target to be localized (182.214.37.1) and landmarks with known location; delay labels 20 ms and 39 ms]
[Figure: delay labels 31 ms and 50 ms]
• Transforming delays into geographic constraints
10.6. THE FIRST STEPS
10.7. IP2Geo – Single point localization
• Multi-pronged approach that exploits various "properties" of the Internet
  • DNS names of router interfaces often indicate location
  • network delay tends to correlate with geographic distance
  • hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically
• GeoTrack
  • determine the location of the closest router with a recognizable DNS name
• GeoPing
  • use delay measurements to estimate location
• GeoCluster
  • extrapolate partial (and possibly inaccurate) IP-to-location mapping information using BGP prefix clusters
10.8. GeoTrack – main idea
• Extract geographical information from the DNS names of routers on the path
• Localizes the target to the last router whose position is known
• Example
  ngcore1-serial8-0-0-0.Seattle.cw.net => Seattle
  184.atm6-0.xr2.ewr1.alter.net => New York
  dnvr-scrm.abilene.ucaid.edu => Denver
10.9. GeoTrack
• GeoTrack operation
  • do a traceroute to the target IP address
  • determine the location of the last recognizable router along the path
• Key ideas in GeoTrack
  • partitioned city code database to minimize the chance of a false match
  • ISP-specific parsing rules
  • delay-based correction
• Limitations
  • routers may not respond to traceroute
  • the DNS name may not contain location information, or the lookup may fail
  • the target host may be behind a proxy or a firewall
10.10.
GeoPing - Delay based localization
• Delay-based triangulation is conceptually simple
  • delay to distance
  • distance from 3 or more non-collinear points - target location
• But there are practical difficulties
  • the network path may be circuitous
  • transmission and queuing delays may corrupt the delay estimate
  • one-way delay (OWD) is hard to measure
  • because of routing asymmetry
10.11. GeoPing - details
• Measure the network delay to the target host from several geographically distributed probes
  • typically more than 3 probes are used
  • round-trip delay measured using the ping utility
  • small-sized packets - transmission delay is negligible
  • pick the minimum among several delay samples
• Nearest Neighbor in Delay Space (NNDS)
  • akin to Nearest Neighbor in Signal Space (NNSS) in RADAR
  • construct a delay map containing (delay vector, location) tuples
  • given a vector of delay measurements, search through the delay map for the NNDS
  • the location of the NNDS is our estimate for the location of the target host
• More robust than directly trying to map from delay to distance
10.12. GeoCluster
• A passive technique, unlike GeoTrack and GeoPing
• Basic idea:
  • break the IP address space into clusters
  • assign a geographical location to each cluster based on third-party IP-to-location databases
  • given a target IP address, first find the matching cluster using longest-prefix match
  • the location of the matching cluster is our estimate of the host location
10.13. GeoCluster
• Example:
  • consider the cluster 128.95.0.0/16 (containing 65536 IP addresses)
  • suppose we know that the location corresponding to a few IP addresses in this cluster is Seattle
  • then given a new address, say 128.95.4.5, we deduce that it is likely to be in Seattle too
10.14.
GeoCluster – Clustering IP addresses
• Exploit the hierarchical nature of Internet routing
  • inter-domain routing in the Internet uses the Border Gateway Protocol (BGP)
  • BGP operates on address aggregates
  • we treat these aggregates as clusters
  • in all we had about 100,000 clusters of different sizes
10.15. Performance of GeoCluster
• Median errors:
  • GeoCluster 30 km
  • GeoPing 300 km
  • GeoTrack 100 km
10.16. ADVANCED TECHNIQUES
10.17. Constraint Based Geolocation
• Constraint Based Geolocation (B. Gueye et al.)
• Strict geographic constraints based on the bestlines
• Calibration for each landmark
• A few hundred reference nodes
10.18. Constraint Based Geolocation
10.19. Octant IP geolocation framework
• Octant (B. Wong et al.)
• Similar to CBG, but it introduces negative constraints
• Calibration for each landmark
• A few hundred reference nodes
10.20. It is more than a simple method, it is a framework
• Combines very different techniques
  • Active and passive
• Constraint-based
  • Weighted positive and negative constraints
• Constraint - region
  • Using Bézier regions
  • Efficient implementations of clipping and union operations are available
10.21. Notations
10.22. Octant – Landmarks and constraints
10.23. Estimated location
10.24. Mapping latencies to distances
10.25. Mapping latencies to distances
10.26. Mapping latencies to distances
10.27.
Last hop delays
• Mapping is further complicated by queuing and transmission delays associated with the last hop
  • Cable and DSL connections
  • Overloaded PlanetLab nodes
• Goal: isolate the delay components which artificially inflate latencies
  • Detailed maps of the underlying physical network, as in network tomography (not in Octant)
  • Octant introduces a simple metric called height
10.28. Eliminating last hop delays in Octant
10.29. Last hop delays in Octant
10.30. Last hop delays
10.31. Results
10.32. Results
10.33. Spotter – a probabilistic approach
• Spotter (S. Laki et al.)
• Idea: the distances are not uniformly likely within a constraint
• For a given delay, do the distances follow a special distribution?
• Bayesian approach to calculate the spatial distribution of a target
10.34. Travel time – distance relation
• reference dataset (nodes with known location)
  • known distance between the source and destination
  • measured RTTs
10.35. Travel time – distance relation
• reference dataset (nodes with known location)
  • known distance between the source and destination
  • measured RTTs
• "slow" packets
• "fast" packets
10.36. Statistical delay-distance model
10.37. Statistical delay-distance model
• The distances are normally distributed for a given RTT
10.38. Statistical delay-distance model
• The distances are normally distributed for a given RTT
• (after standardization)
10.39. Evaluation – "Probabilistic triangulation"
10.40.
Performance analysis
• Estimated accuracy for a geolocation ground truth
  • CAIDA's Geolocation Comparison Survey
  • More than 20000 reference nodes
  • Located in North America and in Europe
[Figure: accuracy results; surviving labels: In North America; 35 percent, 9 percent, 2 percent, 70 percent, 40 percent, 27 percent]
10.41. Topology-based Geolocation
10.42. Topology based geolocation
• TBG (E. Katz-Bassett et al.)
• Problems with CBG
  • Uses constraints that are less than the speed of light
  • Risk of underestimates
  • When an underestimate occurs, the final region does not contain the true location
• Topology based geolocation
  • uses the speed of light to generate constraints
  • inspired by sensor network localization
10.43. Summary of techniques
• Traceroute from landmarks
• Map topology
• Estimate hop latency
• Improve accuracy
• Cluster network interfaces
• Increase structuring
• Validate location hints
• Incorporate location hints
• Constraint optimization
• Geolocate targets
10.44. Estimate hop latencies
• Using the traceroute tool to infer link latency
• Estimate hop latency from the difference in RTT to adjacent routers
• Accurate only if the link is traversed in both directions (symmetric routing)
• How can we discover this property?
• Three different techniques
10.45. Estimate hop latencies
• Observing the reverse TTL values
  • Most routers initialize the TTL values for their packets from a small set.
  • 30, 32, 64, 128, 150, 255
  • If the TTL value changes significantly from one node to the next - discard the link estimate
• Measuring paths in both directions between pairs of landmarks
  • If both paths traverse a particular link - take the difference of the measurements to the two endpoints
  • This estimate has high confidence
• Increasing the number of vantage points from which we probe a certain link
  • For every link on a path from a landmark, we probe both endpoints from all other landmarks
  • If these probes pass over the link - estimate for the link
10.46. Clustering interfaces
• Clustering interfaces that belong to the same router (IP aliases)
10.47. Clustering interfaces
• Two IP alias resolution techniques
• Mercator
  • UDP probes are sent to high-numbered ports on a set of interfaces
  • Routers send back a port-unreachable ICMP message with the source address
  • If two different interfaces reply with the same source address - aliases
• Ally
  • Used on pairs of interfaces
  • Sends probes to the two interfaces
  • Examines the IP-ID
  • Most routers generate the IP-ID using a single counter that is incremented after each packet is created
10.48. Validating location hints
• DNS names - locations
• Some names are incorrect
  • Misnaming, reconfiguration, reassignment of IP addresses
• Topology constraints can be used to verify location hints
  • RTT measurements - upper bounds...
  • Clustering - aliases
  • Hop latencies
10.49. Constraint optimization
10.50. Constraint optimization
10.51. Results
10.52. Results
10.53. Other issues to be handled
Indirect routes
• The preceding assumption
  • Route lengths are proportional to great circle distances
  • Not the case in practice, due to policy routing
• Example: a subscriber in Ithaca, NY - Cornell Univ. (Ithaca)
  • Syracuse, NY - Brockport, IL - New York City - Cornell Univ.
  • 1 mile of physical distance vs. an 800-mile path
10.54. Other issues to be handled
Indirect routes discovery
• Landmark's height can indicate?
• Localizing routers on the network path
• Secondary landmarks
• Localization by latencies
• Extract location from router names
  • Reverse DNS lookup – undns tool
  • Using ZIP codes to determine geographical location
10.55. Other issues to be handled
Handling uncertainty
• Filter out erroneous constraints
• Latency-based constraints
  • Weight system that decreases exponentially with increasing latency
  • Weight threshold
10.56. Other issues to be handled
Iterative refinement
• Two phases:
  • First, we use accurate and mostly conservative constraints
  • Second, less accurate and more aggressive constraints to obtain a better estimate (inside the initially estimated region)
  • And so on...
10.57. Literature
• Padmanabhan, Venkata N., and Lakshminarayanan Subramanian. An investigation of geographic mapping techniques for internet hosts. ACM SIGCOMM Computer Communication Review 31.4 (2001): 173-185.
• Gueye, Bamba, et al. Constraint-based geolocation of internet hosts. Networking, IEEE/ACM Transactions on 14.6 (2006): 1219-1232.
• Wong, Bernard, Ivan Stoyanov, and Emin Sirer. Octant: A comprehensive framework for the geolocalization of Internet hosts. Proceedings of the NSDI. Vol. 7. 2007.
• Katz-Bassett, Ethan, et al. Towards IP geolocation using delay and topology measurements. Proceedings of the 6th ACM SIGCOMM conference on Internet measurement. ACM, 2006.
• Laki, Sándor, et al. Spotter: A model based active geolocation service. INFOCOM, 2011 Proceedings IEEE. IEEE, 2011.
11. 11 Geography of the Internet
On the spatial properties of network topology
11.1. Network research
• 1959: Erdős and Rényi
• 1998: Watts and Strogatz
• 1999: Barabási and Albert
• Transport
• Biological
• Social
• Internet
11.2. The distance is what really counts.
• Is that really true?
• What can we say about the spatial structure of the Internet?
• P. Mátray, P. Hága, S. Laki, I. Csabai, G. Vattay: On the Spatial Properties of Internet Routes, Elsevier Computer Networks, Volume 56, Issue 9 (2012)
11.3. Data collection
• 700 PlanetLab nodes
• 400,000 traceroutes
• 16,000 unique IP addresses
11.4. Data collection
• Spotter
  • S. Laki et al.: Spotter: A Model Based Active Geolocation Service, IEEE INFOCOM 2011, April 2011, Shanghai, China
• 13,000 filtered addresses
• 44,000 links
• How to visualize such a data set?
11.5. Covered areas
11.6. Histogram maps
• Simple aggregation: $\sum_i g_i$ (the figure shows the cells $g_1, \dots, g_6$)
• [Figure: a histogram map; labelled peaks: San Francisco, Chicago, New York, Washington]
• [Figure: a histogram map on log-scale; additional labelled peaks: Los Angeles, Plano]
11.7. Transforming spatial distributions
11.8. Transforming spatial distributions
11.9. Transforming spatial distributions
11.10. Transforming spatial distributions
11.11. A router-likelihood map
11.12. Likelihood of router positions - US
11.13.
Likelihood of router positions - US
• Green dots represent the most populated areas
11.14. Characterizing the link length
• Link length is approximated by the spherical distance between the two routers
11.15. Characterizing the network links
• Which links are important?
• Which cities are the most interconnected?
• Which link length is the most frequent?
• How to model the link length distribution?
• What can be said about the spatial structure of the network?
• etc.
11.16. Frequency of link lengths
• which links are frequent and important?
11.17. Frequency of link lengths
• each link is represented once
• which links are frequent and important?
11.18. Frequency of link lengths
• each link is represented once
• which links are frequent and important?
11.19. Frequency of link lengths
• each link is represented once
• [Figure labels: Urban range, Intracontinental, Atlantic Ocean, Pacific Ocean]
• which links are frequent and important?
11.20. Frequency of link lengths
• each link is represented once
• links are weighted by their prevalence in the traceroute data set
11.21. Frequency of link lengths
• each link is represented once
• links are weighted by their prevalence in the traceroute data set
• LA-Houston
  • 121,000 occurrences
  • 39 unique links
  • 1.5 percent of all traffic!
11.22. Frequency of link lengths
• each link is represented once
• links are weighted by their prevalence in the traceroute data set
• Amsterdam - New York and Frankfurt - Washington
• Copenhagen - New York and Paris - Washington
11.23. Frequency of link lengths
• each link is represented once
• links are weighted by their prevalence in the traceroute data set
• Amsterdam - New York and Frankfurt - Washington
• Copenhagen - New York and Paris - Washington
• Only a handful of gateway cities: "spatial hubs".
11.24. Distribution of link lengths
• do router distances follow specific rules?
11.25. Distribution of link lengths
• logarithmic relation
• do router distances follow specific rules?
11.26. Distribution of link lengths
• logarithmic relation
• do router distances follow specific rules?
• Similar phenomena found in:
  • Social networks: D. Liben-Nowell et al., PNAS (2005)
  • Mobile communication networks: R. Lambiotte et al., Physica A (2008)
  • E-mail networks: J. Goldenberg, M. Levy, arXiv:0906.3202
  • Early Internet data: S. H. Yook et al., PNAS (2001)
11.27. Distribution of link lengths
• logarithmic relation
• do router distances follow specific rules?
• J. Kleinberg: Navigation in a small world, Nature (2000)
  • Connection to navigability
11.28. Distribution of link lengths
• logarithmic relation
• power law
• do router distances follow specific rules?
11.29. The embedded topology
11.30. Characterizing network paths
• Circuitousness
• Direction dependence of lateral deviations
• Hop distance analysis
• Symmetry of Internet routes
11.31.
Aggregated path length
• The sum of the lengths of the consecutive links.
11.32. Circuitousness
• Geographic, geopolitical and economic factors also affect routing
11.33. Symmetry
11.34. Symmetry
• A: United Kingdom – Hong Kong
• B: California, USA – Hong Kong
• C: California, USA – Singapore
11.35. Direction dependence of lateral deviations
11.36. Unfamiliar routing phenomenon?
• RTT 300 ms
11.37. Literature
• Kleinberg, Jon M. Navigation in a small world. Nature 406.6798 (2000): 845-845.
• Laki, Sándor, et al. Spotter: A model based active geolocation service. INFOCOM, 2011 Proceedings IEEE. IEEE, 2011.
• Zhang, Yin, and Nick Duffield. On the constancy of Internet path properties. Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement. ACM, 2001.
• Mátray, Péter, et al. On the network geography of the internet. INFOCOM, 2011 Proceedings IEEE. IEEE, 2011.
• Mátray, Péter, et al. On the spatial properties of internet routes. Computer Networks (2012).
12. 12 Network traffic analysis, clustering and classification
12.1. Traffic
• ISP
• Internet
12.2. Traffic classification
• ISP
• Internet
• What protocol and application has generated the traffic?
12.3. Traffic classification
• Identifying the applications and protocols generating IP flows
• Why is it important?
• Adaptive, network-based QoS mapping
• Lawful interception
• User behavior analysis
• Interest from industry
  • ISPs want to know who is using their network and what for
12.4. Quality of Service (QoS)
• Quality of service is the ability to provide different priority to
  • different applications
  • users
  • data flows
• or to guarantee a certain level of performance to a data flow.
• Different characteristics can be guaranteed
  • Bit rate, delay, jitter, packet dropping, etc.
• QoS is often referred to as a quality measure, but there are many different definitions.
12.5. Traffic Classification
• Many new challenges
  • Various peer-to-peer applications
  • Traffic encryption
  • Standard port numbers can be misleading
• Standard "static" methods cannot be applied anymore
• New approaches are needed
12.6. Traffic Classification
• Port
  • Transport layer ports
• Payload
  • Pattern matching
  • DPI tools
• Flow statistics
  • Inter-arrival times, packet sizes, etc.
• Social
  • Connection patterns
  • User/host level statistics
• Very old-fashioned!!!
• Recently used
• Experimental
• Not too efficient
• Hard to implement in a real network
12.7. Different approaches
12.8. Deep Packet Inspection
• What is it about?
  • Stateful inspection of packet header and payload
  • Signature-based pattern matching
• Why is it a big deal?
  • High-speed in-line processing (at wire speed)
  • Low memory and storage consumption
  • Low false positive and miss rates
  • Good performance even in worst cases
• Why is it so important?
  • Network Intrusion Detection, Lawful Interception, QoS, Censorship, Traffic Blocking, etc.
12.9. Deep Packet Inspection Basics
• Classical solution is based on DFA
  • Aho-Corasick DFA algorithm (1975)
• Word set: a, ab, bc, bca, c, caa
• Consumes one byte/character per lookup cycle
  • 10GbE/OC192: roughly 1.2 gigabytes/sec.
• Too many state transitions even for such a small set
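The word set from section 12.9 can be used to illustrate Aho-Corasick matching. The sketch below is a minimal Python illustration, not the implementation of any DPI tool: it builds the goto, failure and output functions (the classical form of the automaton, rather than a fully expanded DFA table) and consumes one character per step. The function names `build_automaton` and `search` and the sample text are assumptions made for this example.

```python
from collections import deque


def build_automaton(words):
    """Build goto/fail/output tables for the given word set."""
    goto = [{}]   # goto[state][char] -> next state (state 0 is the initial state)
    fail = [0]    # fail[state] -> state for the longest proper suffix that is a prefix
    out = [[]]    # out[state] -> words that end when this state is reached

    # Phase 1: insert every word into a trie.
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Phase 2: breadth-first computation of the failure links.
    queue = deque(goto[0].values())   # depth-1 states fail to the initial state
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]   # inherit matches of the suffix
            queue.append(nxt)
    return goto, fail, out


def search(text, automaton):
    """Return (start_offset, word) for every occurrence of every word."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


words = ["a", "ab", "bc", "bca", "c", "caa"]
for start, word in search("abccab", build_automaton(words)):
    print(start, word)
```

A table-driven DFA, as used for wire-speed processing, would precompute the failure transitions into a full state-by-byte table so that each input byte costs exactly one lookup; that expansion is what makes the number of stored transitions grow so quickly.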
12.10. Multi-byte pattern matching
• w1: apple
• w2: application
• w3: appeal
• w4: peal
• w5: appreciate
• [Figure: multi-byte transition graph for w1 to w5; edge labels: ication, peal, app, l, reciate, e, eal]
• One character per lookup, but some speed-up can be achieved
12.11. Deploying multiple multi-byte DFAs
• Input stream: ... x y z a p p l i c a t i o n z y …
• Table replicas for different offsets
• Higher memory complexity
• One lookup for each offset
12.12. True positive VS False positive etc.
• A ground truth data set with true labels
  • Generated by a reliable DPI or manually
• [Figure: classifying flows as KNOWN or OTHER; the definitions of TP and FN appear in the figure]
12.13. Performance of different DPI tools
• Note
  • RF is a Random Forest-based classification approach
  • The TP ratio of RF is usually at least as good as that of the DPI methods, while the DPIs are usually better in FP ratio
  • Crucial for protocol-dependent policies
12.14. Classical recipe for flow statistic-based traffic classification
• Calculate features based on flow statistics
  • Inter-packet delays
  • min., max., avg. packet sizes
  • Total amount of bytes in the flow
  • Number of certain TCP flags
  • Etc.
• Train a classifier on a ground truth dataset
  • E.g. a Support Vector Machine or Random Forest
• Validate the approach using e.g. ten-fold cross-validation
  • Split the training data into 10 sets of equal size
  • Train on 9 selected sets and test on the remaining one
  • Repeat this procedure for each of the 10 possible choices of test set
• Which features are the best?
• How to decide?
• What about overfitting?
• Is the ground truth general enough?
• For more details read the survey by Thuy T.T. Nguyen et al.
12.15.
Statistical payload analysis
• Statistical characterization of data in a flow
• Modeling a protocol
  • Byte distribution on flows
• A useful protocol model is
  • Expressive
  • Compact
  • Automatic: no human expertise/work is needed.
• Few attempts: KISS, Markov models
12.16. KISS: Stochastic Packet Inspection
• A. Finamore et al.
• The method consists of three phases:
  • Statistical characterization of traffic
  • Look for the behavior of unknown traffic and assign the class that best fits it
  • Check for false positives
12.17. Chi square statistics
• [Figure: Chi-square statistics over time for 4-bit chunks of the payload bytes; chunks are labelled Deterministic, Counter or Random. Source: A. Finamore et al.]
12.18. Decision process
• Statistical characterization of bits in the flow
  • Each flow is a vector in a G-dimensional space
• Decision process
  • Classification of these vectors based on
    • Euclidean distance – minimum distance
    • Maximum likelihood – Support Vector Machine
12.19. Validation on a real traffic trace
• RTP errors are due to the unreliable training data
  • (the DPI did not identify RTP v1)
• DNS errors are due to an impure training set
  • (for the oracle, all port 53 traffic is DNS traffic)
• EDK errors are (maybe) Xbox Live
  • (proper training for "other")
• FN are always below 3 percent!
• Source: A. Finamore et al.
12.20. Early Identification of Peer-To-Peer Traffic
• B. Hullár et al. (IEEE ICC 2011)
• Similar goals to KISS
• Based on the statistical analysis of packet payloads
• Modeling a protocol
  • Byte distribution on flows
12.21. Modeling a flow
• Feature vector
  • The first X payload bytes of the first Y packets per flow, where X·Y is small
• [Figure: Packet 1, Packet 2, ..., Packet Y; X·Y bytes as features]
12.22.
Classification via probabilistic models
• Krichevsky-Trofimov (KT) estimator
  • Zero-order model
  • Memoryless distribution over blocks of the payload
  • The KT estimator provides smoothed estimates for unseen data
• Markov
  • First-order Markov model
  • Low memory footprint
• MarkovKT
  • First-order model
  • Using the Krichevsky-Trofimov estimator
  • Low memory footprint
• CTW
  • Context Tree Weighting method
  • Lossless data compression
  • Higher-order model (at most 5 in our case)
  • Combining exponentially many variable-order Markov models.
• Random Forest
  • State-of-the-art classification technique
  • Using the estimates of many decision trees to improve accuracy
  • Robust against noise
  • Prone to overfitting
12.23. Data Collection for ground truth
• Ground truth data
  • captured in a fully controlled environment
  • labeled by a modified kernel module
  • Fully trusted
  • recorded at ELTE
  • disadvantage: not too diverse data
  • advantage: exact class information (uncommon in the literature)
• WIRELESS data set
  • WiFi and 3G
  • traces with full payload
  • July 2010
• LAN data set
  • high-speed LAN
  • only the first 16 payload bytes per packet
  • November 2009
12.24. Experiments
• Evaluation on the labeled traces
  • True Positive and False Positive Ratio in bytes and flow numbers
  • Cross-validation
• Measurements with different parameter values
  • training set size
  • used bytes and packets
• Robustness
12.25. Feasibility test
• Classification from the first 16 bytes of the first packet of each flow
12.26. Feasibility test
• Classification from the first 16 bytes of the first packet of each flow
12.27. How much data is needed?
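As a concrete illustration of the zero-order KT model listed in section 12.22, the following sketch trains one byte-frequency model per class and assigns a payload to the class with the highest log-likelihood. It is a simplified illustration and not the classifier of Hullár et al.: it works on single bytes rather than blocks, uses the standard KT smoothing (count + 1/2) / (total + alphabet size / 2), and the function names and the toy training payloads are assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET_SIZE = 256  # one symbol per possible byte value


def train(payloads):
    """Count byte occurrences over all training payloads of one class."""
    counts = Counter()
    for payload in payloads:
        counts.update(payload)  # iterating over bytes yields integers 0..255
    return counts, sum(counts.values())


def log_likelihood(model, payload):
    """Log-probability of the payload under a zero-order model with KT smoothing."""
    counts, total = model
    denominator = total + ALPHABET_SIZE / 2
    return sum(math.log((counts[b] + 0.5) / denominator) for b in payload)


def classify(models, payload):
    """Return the class name whose model gives the payload the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(models[name], payload))


# Toy training data (hypothetical, for illustration only).
models = {
    "http": train([b"GET / HTTP/1.1", b"GET /index"]),
    "bittorrent": train([b"\x13BitTorrent protocol"]),
}
print(classify(models, b"GET /a"))
print(classify(models, b"BitTorrent"))
```

Because every byte value receives a non-zero probability, a payload containing bytes never seen in training still gets a finite score, which is the smoothing property the slide refers to.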
12.28. How much data is needed?
• Ten-fold cross-validation
• Experiments
12.29. How much data is needed?
• Ten-fold cross-validation
• Experiments
12.30. Robustness
• Similar results were obtained in the following scenarios:
  • Unknown traffic
  • Trained on Wireless, tested on LAN
  • Asymmetric routing: using the first reverse packet/reverse flow
  • Real traffic traces
12.31. Training set sizes
• How many training flows are needed if only the first packet is considered?
12.32. Is it protocol independent?
• Using around 30 bytes of the payload is sufficient
  • in contrast to the 300-400 bytes used by KISS
12.33. Robustness Asymmetric routing
• Observing traffic from just one direction of a flow
• Classification based on the first backward packet
12.34. Robustness Unknown traffic
• A new class has been introduced
  • "Others", containing unknown traffic types
12.35. Real traffic traces
• Various protocols:
  • FTP, XMPP, DirectConnect, Gnutella, SIP, SSH, RTSP, POP3, UPnP, Windows, Source-engine, xbox, Opera-Mini-sockets, DNS, HTTP, IMAP, RTMP, BitTorrent, WAP, WoW, RTP, PPStream, SMTP
• 16 GB of traffic in 365k flows
• Some classes with a very high percentage (HTTP, BitTorrent), others with few flows (e.g. FTP 80 flows, XMPP 70 flows)
12.36. Confusion matrix
12.37. Literature
• Aho, Alfred V., and Margaret J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM 18.6 (1975): 333-340.
• Nguyen, Thuy TT, and Grenville Armitage. A survey of techniques for internet traffic classification using machine learning.
Communications Surveys and Tutorials, IEEE 10.4 (2008): 56-76.
• Finamore, Alessandro, et al. Kiss: Stochastic packet inspection. Traffic Monitoring and Analysis. Springer Berlin Heidelberg, 2009. 117-125.
• Hullár, Béla, Sándor Laki, and András György. Early identification of peer-to-peer traffic. Communications (ICC), 2011 IEEE International Conference on. IEEE, 2011.
13. 13 Measurements in peer-to-peer networks
13.1. Centralized VS P2P
13.2. Peer-to-peer networks
• Peers can act as clients and servers
• Different approaches
  • Centralized, decentralized
  • Structured and unstructured
  • Tracker-based
• Dynamic membership
  • Voluntary
• Scalable and reliable
13.3. Peer-to-peer networks
• P2P apps generate the majority of the Internet traffic
• Issues: legality, volatility, scalability
13.4. Some P2P protocols
• Napster
  • Pseudo P2P, centralized index, mp3 distribution
• Gnutella (Limewire, Skype, Bearshare, WinMX)
  • Fully decentralized, distributed searching
• Kademlia (eMule, Overnet)
  • Decentralized, DHT for lookup, XOR of node keys as distance metric, structured
• Kazaa
  • Supernodes, closed source, hierarchical approach
• BitTorrent
  • Tracker-based, decentralized, tit-for-tat, choking
13.5. What do we want to measure?
• P2P applications are widely used nowadays
  • The majority of the traffic
• It is important to understand their characteristics in order to develop better algorithms
• ISPs also want to know the effect of these applications on their network
13.6. How can we do that?
• Collect traffic traces and classify P2P traffic
• Build a P2P crawler to crawl a real-world application and then collect the crawler's traffic
• Different P2P systems require different solutions
13.7.
Gnutella
• How does it work?
• Connection
  • Host list from GWebCache or a locally stored file
  • Ping/pong messages between potential neighbors
• Content lookup
  • Query messages are flooded over the network
  • A QueryHit message propagates back to the source from peers having the content
• Download
  • The source downloads the file directly from peers having the content
13.8. Gnutella vs Napster
• S. Saroiu et al. (MMCN '02)
• What can we say about ...?
  • Latency
  • Lifetime of peers
  • Bottleneck bandwidth
  • Neighborhood size
  • Etc.
• An active crawler was used for data collection
13.9. Gnutella vs Napster Lifetime of the peers
• 20 percent of the peers have an Internet host uptime of more than 90 percent
• Application uptime is higher for Napster peers than for Gnutella ones
• Median session duration is the same for both P2P networks
13.10. Gnutella vs Napster Shared files vs Shared data
• Strong correlation between the number of shared files and the amount of shared data
• The slope of both lines is 3.7 MB per file, which was the typical size of an mp3 file
13.11. Gnutella Latencies and downstream bandwidth
• 60 percent of the peers have a latency between 70 and 280 ms
• Two clusters can be identified
  • 20-60 Kbps modem
  • Mbps broadband
13.12. Kademlia
13.13. Kademlia
• Subtrees for node 0011....
• For each subtree the node keeps a k-bucket (up to k neighboring nodes)
13.14. Kademlia
• Node 0011... wants to search for 1110...
13.15. Kademlia
• R. Bhagwan et al.
• Focusing on the Overnet network
• Using peer IDs to calculate availability
• Availability: the percentage of time a user or a machine is online
• How to measure?
  • The Crawler takes a snapshot of all the peers by requesting 50 random IDs and repeats this regularly
  • The Prober goes through the list of available IDs and checks their availability by sending them a request
13.16. Kademlia Collected data
• Data was collected from January 14 to January 28, 2003
• About 40,000 hosts discovered by one crawl
• Per day (6 crawls), between 70,000 and 90,000 unique hosts
• 1,468 of the 2,400 randomly selected hosts that were probed responded at least once to the Prober's requests
13.17. Kademlia Peers with dynamic IPs
• Percentage of hosts that have multiple IPs during a longer period of time
13.18. Kademlia Peer availability
• Availability values for 7 days
  • Based on host IDs
  • Based on the first IP seen for each host ID
• [Figure: the two availability curves; marked values 0.07 and 0.3]
• Using IP addresses would thus underestimate availability.
13.19. IP-based availability is similar to what we have seen for Gnutella
13.20. How duration can affect the availability?
• The longer the period of time, the greater the chances of a host being unavailable
13.21. Time of day effects
• 80 percent of all host pairs lie in this interval
• Strong independence
13.22. BitTorrent
• Efficient and very popular file sharing system
• Unstructured P2P network
• Where the story began
  • 2001 – Bram Cohen – BitTorrent Inc.
• And nowadays...
  • In 2009: 27-55 percent of the overall Internet traffic
• Fundamentals:
  • Tit-for-tat
  • Incentive
  • Pieces and Blocks
13.23.
File Sharing
• How to share content:
  • The peer creates a .torrent file:
    • (1) metainformation about the file to be shared
    • (2) information about the tracker storing the peers
• Downloading:
  • The .torrent file needs to be downloaded
  • The peer connects to the tracker and obtains a set of neighboring peers
• [Figure: a Web Server hosting Harry Potter.torrent, Transformer.torrent, The Lord of Ring.torrent]
13.24. *.torrent
• URL of the tracker
• Pieces
• Size of a piece
• Filenames
• File sizes
13.25. The Tracker
• Maintains a list of peers in the network
  • IP address, port, peer id
  • State information (Completed or Downloading)
• It answers requests with a random list of peers
  • Neighboring peers
13.26. BitTorrent
• Seeder
  • a peer that has the whole file (all the pieces)
• Initial seeder
  • the peer that has the initial copy of the file to be shared
• Leecher
  • a peer that downloads data from others
• [Figure labels: Initial seeder, Seeder, Leecher, Leecher]
13.27. An example
• Seeder A: pieces 1,2,3,4,5,6,7,8,9,10
• Leecher B: pieces 1,2,3
• Leecher C: pieces 1,2,3
• [Figure: subsequent leecher piece sets 1,2,3,4 and 1,2,3,9, then 1,2,3,4,9]
13.28. File sharing
• The initial seeder subdivides the file into pieces
• Leecher
  • Finds the tracker to obtain a neighborhood
  • As soon as a piece has been downloaded, it can be shared with others
  • After obtaining all the pieces, the file can be assembled and the peer becomes a seeder.
• The more pieces are downloaded, the more replicas are available in the network
13.29. Lifetime of a torrent Seeders and leechers
• Initial interest
13.30. Pieces and sub-pieces
• Pieces and sub-pieces
  • Generally, a piece consists of sub-pieces of 16 KB.
  • While a piece is not complete, its sub-pieces are downloaded with high priority
  • The goal is to have the entire piece as soon as possible
• Transmission
  • Data transmission is over TCP (or UDP in some new clients)
  • Many requests can be handled in parallel
    • Typically five
  • When a sub-piece arrives, a new request is sent out
13.31. Piece Selection
• It is crucial for good performance
• Rare pieces issue
  • The worst case is when some pieces are missing from the network.
  • In such a case, if the initial seeder stops, the file cannot be reassembled.
• What is a good strategy?
13.32. Piece Selection
• Strict Priority
  • Once the peer has received a sub-piece of a new piece, the remaining sub-pieces of this piece are downloaded with high priority.
  • Goal: having a new complete piece
  • Primary rule
• Rarest First
  • General rule! A peer always asks for the locally rarest piece (locally = in its neighborhood)
  • Goal: avoid the situation when some pieces are missing…
• Random First Piece
  • This policy is applied to download the first piece. A peer chooses a random piece first and downloads it from one or multiple neighboring peers.
  • Goal: having the first piece to be distributed as soon as possible
• Endgame Mode
  • If only a few pieces are missing, the peer sends requests to multiple peers in its neighborhood to obtain the sub-pieces. Once the piece has been downloaded, it stops requesting.
13.33. Choking
• There is no central resource management
• Each peer tries to maximize its download rate
• Tit-for-tat:
  • Upload
  • Choking
• Choking
  • The peer can deny uploading temporarily
  • Handling free-riders
    • Peers that only download
• [Figure: Alice and Bob, both choked]
13.34.
Choking algorithm
• Peer A sends a choke message to peer B
  • If A decides to refuse uploading to B
• The rechoking period is 10 sec
  • Based on the download rate
• Each peer uploads to at most four neighboring peers
  • Three of them are those with the highest download rates
    • Tit-for-tat
  • The fourth is chosen randomly (Optimistic Unchoke)
    • The worst neighbor changes periodically
13.35. Optimistic unchoke
• New connections can be tried out
  • There is a chance to find better neighbors
• Avoiding starvation of new peers
  • New peers also have to download pieces
• The worst peer is choked every 30 sec
13.36. Upload only mode
• After downloading all the pieces, the leecher becomes a seeder
• Who shall we upload to?
  • To the peer having the highest upload rate
  • The goal is to have a new seeder in the network as soon as possible
  • Assuming that the user will not stop sharing the file after becoming a seeder
13.37. Lifetime of a torrent
• L. Guo et al. (IMC 2005)
• Exponential decay of peer requests
  • Initial interest, but then the number of requests decreases rapidly
• The download rates of leechers show high variance over different periods of time
13.38. Peer behaviour
• The overall contribution of a peer decreases as its download rate increases
• The probability that a peer exits is basically independent of its download speed and the amount of already downloaded data.
13.39. Peer behaviour
• The lifetime of most of the torrents is between 30 and 300 hours, the average is 8.5 hours
• The average population size is 102 peers, which is not too large
• The average seeding time is 8.42 hours
13.40. Literature
• Saroiu, Stefan, P. Krishna Gummadi, and Steven D. Gribble.
Measurement study of peer-to-peer file sharing systems. Electronic Imaging 2002. International Society for Optics and Photonics, 2001.
• Bhagwan, Ranjita, Stefan Savage, and Geoffrey M. Voelker. Understanding availability. Peer-to-Peer Systems II. Springer Berlin Heidelberg, 2003. 256-267.
• Guo, Lei, et al. Measurements, analysis, and modeling of BitTorrent-like systems. Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement. USENIX Association, 2005.
14. 14 Analysis of online social networks
14.1. Be socialized
• Social networks are basically graphs
  • Vertices are people
  • Edges are relations between them
    • Friends, followers, etc.
• Different online social networks
  • For different communities with the same interest
  • With different edge types
• Many more connections than a user has in real life
  • Online friends may never meet
14.2. Increasing interest
• Source: B2Ce.Consultancy company websites
14.3. Twitter users
14.4. Social Flow
• Twitter network related to #SahelNow.
14.5. Twitter
• Microblog system
  • People can write short messages (tweets) and share their thoughts with their followers
  • People can subscribe to follow others (friends)
  • People can retweet tweets from others
  • Hashtags can be used to mark the topic of a tweet
  • Tweets can also contain URLs (bit.ly, tinyurl)
• The structure of this network can be represented as a directed graph where
  • Vertices are Twitter users
  • Directed edges connect users
  • Two names for the same edge: friend and follower
    • B is a friend of A if A follows B
    • A is a follower of B if A follows B
• There are other graph structures which can be used to analyze other properties of this network
  • E.g. retweet graphs, etc.
14.6. Why do we analyze it?
• Advantages
  • Representative
    • Full spectrum of communication from mass media and celebrities to ordinary users
  • Easy tracking of information flow
    • Retweets, URLs, etc.
  • Available
• Drawbacks
  • Twitter is not the most popular social medium
  • It is only one communication channel
  • Hard to measure its effect on real life
14.7. Is the data available?
• Twitter Firehose Stream
  • 1 percent of all the tweets is freely available
  • 5 million tweets per day
  • 0.5 GB of data per day
  • One year is 180 GB
• The Twitter graph is not fully available
  • But we can crawl it using the Twitter API
  • And there are follower graph snapshots
  • Time consuming, and there are several limitations
  • Getting the entire graph is almost impossible
14.8. Quantifying influence
• E. Bakshy et al. (WSDM'11)
• Twitter follower graph (July 2009) + 1.03B tweets
• Analyzing the URLs posted
  • 87M tweets with bit.ly URLs
  • 1.6M users seeded an average of 46.33 bit.ly URLs each
14.9. How to measure influence?
• For a given URL, let's define an influence score
• The posting time is known for each URL posted
• If B follows A, A posted the URL before B, and A was the only one of B's friends to do so, we say A influenced B to post the URL.
• What if B has several friends who posted the URL?
• We can consider three different ways: First, Last, Shared
• [Figure: example influence scores on a timeline under the three rules; values shown: 2, 0, 1, 1, 1, 1.5]
14.10. Cascades
• We can construct disjoint influence trees called cascades for each initial posting of a URL.
14.11. Cascade sizes and depths
• Cascade sizes approximately follow a power-law distribution
• Depths resemble an exponential distribution
• Both figures imply that the vast majority of posted URLs do not spread at all
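The three attribution rules from section 14.9 and the cascade idea from section 14.10 can be sketched in a few lines of Python. This is an illustrative sketch and not the code of Bakshy et al.: the data layout (`posts` maps a user to the time they posted the URL, `friends` maps a user to the set of users they follow), the function names, and the choice to propagate credit recursively up the tree are assumptions made for this example.

```python
def assign_credit(posts, friends, rule="first"):
    """For each poster, decide which earlier-posting friends get credit.

    Returns a dict: user -> {influencer: weight}. Users without an
    earlier-posting friend get an empty dict and are roots of a cascade.
    """
    credit = {}
    for user, t in posts.items():
        earlier = [f for f in friends.get(user, ()) if f in posts and posts[f] < t]
        if not earlier:
            credit[user] = {}
        elif rule == "first":      # all credit to the friend who posted first
            credit[user] = {min(earlier, key=posts.get): 1.0}
        elif rule == "last":       # all credit to the most recent poster
            credit[user] = {max(earlier, key=posts.get): 1.0}
        elif rule == "shared":     # credit split equally among earlier posters
            credit[user] = {f: 1.0 / len(earlier) for f in earlier}
        else:
            raise ValueError(f"unknown rule: {rule}")
    return credit


def influence(posts, credit):
    """Sum direct and indirect credit for every user (assumed recursive definition)."""
    score = {user: 0.0 for user in posts}
    # Visit the latest posters first, so a user's score is final before it is passed up.
    for user in sorted(posts, key=posts.get, reverse=True):
        for influencer, weight in credit[user].items():
            score[influencer] += weight * (1.0 + score[user])
    return score


# Hypothetical example: B follows A; C follows both A and B.
posts = {"A": 1, "B": 2, "C": 3}
friends = {"B": {"A"}, "C": {"A", "B"}}
for rule in ("first", "last", "shared"):
    print(rule, influence(posts, assign_credit(posts, friends, rule)))
```

Under the first and last rules every poster has at most one influencer, so the credit assignments form disjoint trees (the cascades) and a root's score equals the number of other users in its tree; under the shared rule the same total credit is divided among several influencers.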
14.12. How to predict influence?
• Aggregate all URLs by user
• The influence of a user is the logarithm of the average size of all cascades for that user
• Regression tree method to estimate influence
• Five-fold cross-validation
14.13. Regression tree for influence prediction
14.14. Past influences vs followers
• Past local influence versus number of followers for all users
• The size of the circles represents the actual average influence
• For the top 25 users having the highest actual influence values
14.15. Information flow on Twitter
• S. Wu et al. (ACM WWW '11)
• Data
• Full follower graph
• 5B tweets collected between 2009 and 2010
• Elite Twitter users categorized into four classes
• Celebrities, media, organizations, bloggers
• Snowball sample of Twitter lists
• Analyzing bit.ly URL spreading among different user groups
• Two-step flow of information
• 46 percent of URLs spread along the class chain:
• Elite - intermediate - ordinary
14.16. How to identify Elite users?
14.17. Snowball sample of Twitter lists
• Users who appear in the pruned list of lists together with Lady Gaga (e.g. lists named celebrity or celebs)
• And so on
14.18. Activity sample of Twitter users
• Snowball sampling is potentially biased by our particular choice of seeds.
• Let's crawl all lists associated with all users who tweeted at least once every week for our entire observation period.
• This sample is biased towards users who are consistently active
• But this bias is likely to be quite different
14.19. Who listens to whom?
• Share of tweets received between elite categories
• It shows only how many URLs are received by category i from category j; it is a weak measure of attention because many tweets go unread.
14.20. Who listens to whom?
• Retweets between elite categories
14.21. Two step information flow
• Two-step flow theory (Katz and Lazarsfeld 1955)
• Media exerts indirect influence on the masses via an intermediate layer of opinion leaders
• A typical information flow on Twitter
• Elite - intermediate - ordinary
• Direct flows
• 46 percent of media-originated information is received through intermediaries
(Figure: information flow between the Elite, Intermediate, and Ordinary layers.)
14.22. Who are the intermediaries?
• A large population (490K users) acts as intermediaries for 600K users
• Most (99 percent) are ordinary
• Also receive information via two-step flows
• More exposed to the media
14.23. Who are the intermediaries?
• Opinion leadership is not a binary value.
• Consistent with the original two-step flow theory
14.24. Network Dynamics
• J. Leskovec et al. (ACM SIGKDD '09)
• Memetracking
• Tracking new topics, ideas, and "memes" across the social network
• Finding exact quotes from blogs and graphing their volumes over time
• Possible to see news cycles
14.25. Memetracking
14.26. Memetracking
14.27. Collective attention on Twitter
• J. Lehmann et al.
(WWW'12)
• Data
• 130M tweets and a follower graph with 2.7M users
• Two parameters
• Number of mentions before the peak
• Number of mentions after the peak
• Clustering in this two-dimensional space
• They identified four clusters
14.28. Collective attention on Twitter
14.29. Collective attention on Twitter
14.30. Collective attention on Twitter
• Semantic makeup of the hashtag classes:
• Columns represent peak types and rows correspond to topics, i.e., concepts in the WordNet semantic lexicon.
• The radius of a circle is proportional to the average normalized frequency of the topic in the corresponding hashtag class.
• The displayed topics represent the most frequently observed generic concepts.
• Sample terms subsumed by them are reported in parentheses.
14.31. Literature
• Bakshy, E., and Hofman, J. (2011). Everyone’s an influencer: quantifying influence on twitter. Proceedings of the fourth ACM international conference on Web search and data mining.
• Wu, S., Hofman, J. M., Mason, W. A., and Watts, D. J. (2011). Who says what to whom on twitter. Proceedings of the 20th international conference on World wide web - WWW ’11, 705. doi:10.1145/1963405.1963504
• Leskovec, J., Backstrom, L., and Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
• Yang, J., and Leskovec, J. (2011). Patterns of temporal variation in online media. Proceedings of the fourth ACM international conference on Web search and data mining.
• Lehmann, J., and Gonçalves, B. (2012). Dynamical classes of collective attention in twitter. Proceedings of the 21st international conference on World Wide Web.
• Wu, S., Tan, C., Kleinberg, J., and Macy, M. (2011). Does bad news go away faster. Proc.
5th International AAAI Conference on Weblogs and Social Media, 2011
15. 15 Measurements in mobile and cellular networks
15.1. Internet and cellular networks
• Internet usage has changed dramatically
• Mobile traffic is expected to grow rapidly in the near future
• 4G/LTE networks will provide much higher bandwidth (100/50 Mbps down/up) and lower latencies on the last hop
• More than 1B Internet-enabled smartphones worldwide
• Out of a total of 5B mobile phones
• According to go-gulf.com in 2012
15.2. What can measurements reveal?
• For ISPs to improve their services
• Resource provisioning and allocation
• Identification of bottlenecks
• More user-friendly policies
• A direct way to measure QoE
• For end users and governments to enforce contracts and law
• QoS provided by the ISP
• Network neutrality
• Effect of firewalls and other middleboxes
15.3. A widely heterogeneous environment
• Various mobile devices
• Android, iOS, RIM, MS, Symbian
• Various access technologies
• WiMax, Wifi, LTE, CDMA2000, HSDPA, UMTS
• A device context can consist of
• Network type, signal strength, cell ID, RRC/DRX state, etc.
• Device type, screen state, battery state, time of day, etc.
• Sensor data like GPS coordinates, acceleration, etc.
15.4. How do the different access technologies affect the performance?
• Y. Guo et al. (2013)
• Measurement setting
• The MobiPerf mobile application was used to collect data
• Available at Google Play
• TCP connections with randomized data transfer in 2-5 minutes
• The phone was kept stationary during the transfer
• Downlink: server to mobile device; uplink: vice versa
• Laboratory experiments
• Throughput was sampled every 0.5 seconds
• Measurements from the first 10 seconds were dropped
• Time is needed to stabilize the connection
• High variations can be experienced in the first 10 seconds
15.5.
HSDPA downlink
• Positive correlation between signal strength and throughput
15.6. HSDPA Uplink
• Basically there is no correlation, but the throughput is very low
15.7. LTE Downlink
15.8. LTE Uplink
• Signal strength is a factor that affects the performance of wireless access, especially LTE
15.9. What is the problem with the first 10 seconds?
• LTE Downlink
• The TCP slow start period can be long
• For other access technologies it may be even worse
15.10. Large scale measurements
• J. Huang et al. (2013)
• MobiPerf had 99k users from across the world in 2009
• This much larger data set enables us to analyze characteristics on a larger scale
• Downlink/Uplink
• LDNS Lookup and Coverage
• Latency and reachability
15.11. Performance of different access technologies
• Access technologies used by MobiPerf users
• Wifi, 3G (UMTS family and CDMA family), EDGE and GPRS
• Within 3G: HSDPA, pure UMTS, 1xRTT, EVDOA
15.12. Performance of different access technologies
• Wifi has the highest throughput (median: 1.46 Mbps)
• GPRS and EDGE perform the worst
• The UMTS family outperforms the CDMA family (median: 964 and 368 Kbps resp.)
15.13. Performance of different access technologies
• Why does the UMTS family outperform the CDMA family?
15.14. Performance of different access technologies
• The UMTS family yields smaller RTT values than the CDMA family (median: 495 and 680 ms resp.)
• TCP throughput is lower with higher RTT and loss rate
15.15. Performance of different access technologies
• 1xRTT is one of the earliest CDMA 3G technologies
• It results in high RTT and retransmission rates that degrade TCP throughput.
15.16. Performance of different access technologies
• The variation of TCP downlink RTT is often called jitter.
• Some applications do not tolerate high jitter, so it is also important for the design of mobile applications
• The experienced jitter
• Wifi - 41 ms
• UMTS family - 93 ms
• CDMA family - 233 ms
• It influences the user experience
15.17. Performance of different access technologies
• The uplink throughput difference is less obvious.
• The median is 110 Kbps for the UMTS family and 120 Kbps for the CDMA family.
• Within the 3G family, all the median uplink throughputs are below 150 Kbps.
15.18. Long-term trend and daily patterns
• Downlink throughput degrades in the network of AT&T during peak time
• However, T-Mobile and Verizon users do not experience much difference.
• There is only some slight degradation
15.19. Long-term trend and daily patterns
• The RTT values for the T-Mobile network seem to be stable.
• Downlink RTT explains the throughput fluctuation for AT&T and Verizon.
15.20.
Long-term trend and daily patterns
• There is a significant increase in jitter during peak hours for AT&T and Verizon, which may degrade the user experience for applications having little tolerance for jitter.
15.21. Measuring DNS lookup time in 3G networks
• DNS queries to resolve the IP of a server located in the U.S.
• Median DNS lookup time in ms for different areas (cell size is 50 km x 50 km)
15.22. Measuring DNS lookup time in 3G networks
• High DNS overhead could lead to longer delays for downloading websites using CDNs with DNS-based load balancing (e.g. Akamai)
15.23. Downlink throughput of major carriers in the U.S.
• In Kbps
• For highly populated areas, service carriers provide better infrastructure with newer technology than in rural areas with few users.
15.24. Cellular network policies
• Do the mobile operators use traffic differentiation?
• Methodology
• Scanning a set of ports from mobile devices
• Well-known ports like FTP (21), SSH (22), SMTP (25), HTTP (80), etc.
• Both TCP and UDP
• Measure
• TCP Connect: the time between the sending of a TCP SYN packet and the receipt of the SYN-ACK
• TCP Data: the time for the client to send a short unique message to the server and receive the response message
• UDP Data: similarly to TCP Data, the data transmission time for UDP
• TTL at client side: the TTL value in the probe packet received by the client. All the packets sent from the server have an initial TTL of 64
15.25. Port scans for large carriers in the U.S.
• AT&T and Verizon show very similar behavior and there is no obvious difference across the ports.
15.26. Port scans for large carriers in the U.S.
• Two levels of connection times: around 70 and 100 ms
15.27. Port scans for large carriers in the U.S.
• Blocked TCP ports
• All the IP packets sent by the server have the TTL field set to 64, but the client receives packets with much higher TTL values.
• 197 for SMTP, POP, HTTP, ...
• 253 for FTP, ...
• There must be a middlebox that rewrites the packets passing through it.
15.28. FTP blocking in T-Mobile’s network
• Establishing an FTP connection on port 21
• UDP port 21 is not blocked
• Address spoofing by the middlebox
(Figure: message sequence SYN, SYN/ACK, ACK, Data, ACK; the three-way handshake is labeled "Connection established".)
• DPI-like solution: if the payload contains FTP commands, the packets can go through the middlebox.
15.29. HTTP proxy port blocking in T-Mobile’s network
• HTTP proxy is on TCP port 8080
• Address spoofing by the middlebox
(Figure: message sequence SYN, SYN/ACK.)
• Web servers running on port 8080 cannot be accessed from T-Mobile’s network. The port is totally blocked.
15.30. How can these middleboxes affect user experience?
• Z. Wang et al. (2011)
• Problems with middleboxes
• Policies
• Application performance
• Peer-to-peer behind NAT
• Smartphone energy cost
• Security
• NetPiculet measurement system
• Former version of MobiPerf
15.31. IP spoofing
(Figure: spoofing example with hosts 10.5.6.100 and 10.5.6.102; packet labels SRC IP=10.5.6.102 and DST IP=10.5.6.102.)
• It can significantly reduce the lifetime of the victim's battery
• 4 out of 60 carriers allow IP spoofing, which could make their networks vulnerable.
15.32. IP spoofing measurement
(Figure: measurement setup with hosts 10.5.6.100 and 10.5.6.102.)
• If the packet is received, IP spoofing is allowed.
15.33.
Short TCP connection timeout
• The default TCP keep-alive timer of 2 hours is too long for cellular networks
• Firewalls terminate idle connections after a much shorter time period
• By sending RST packets, which forces the client to re-establish the connection
• This takes much more energy and time than sending keep-alive messages more frequently.
• But idle connections occur: Facebook, Gmail, Gtalk, instant messengers, etc.
(Figure: a DATA exchange followed by periodic KEEP-ALIVE messages.)
• Sending keep-alive messages consumes a lot of energy, but it is still better than re-establishing the connection.
• Some carriers apply TCP timeout timers of 5 minutes or shorter.
• And more than 10 percent of them have timeouts shorter than 10 minutes.
15.34. How do short TCP connection timeouts affect energy consumption?
• Assuming long-lived TCP connections
• And a battery of 1350 mAh
15.35. Packet reordering in the middleboxes
• Some firewalls buffer out-of-order TCP packets and send them in order to the destination once they are available
• Packet reordering along the path
• Packet loss, which happens more frequently
• The main problem is that it disables TCP fast retransmission, since the sender never receives duplicate ACKs
(Figure: packets P1, P2, P3; the middlebox buffers the out-of-order packets while P1 is missing and waits for P1.)
15.36. Packet reordering in the middleboxes
• The effect of packet loss
15.37. Packet reordering in the middleboxes
• 3G was emulated on WiFi
• 400 ms RTT and 1 percent loss rate
• +44 percent downloading time
• The longer the downloading time, the more energy is consumed!
15.38.
NAT traversal
• NAT mapping is crucial for NAT traversal
• Peer-to-peer applications, Skype, instant messengers
• It defines how the NAT assigns an external port to each connection
• Different NAT mapping types
• Treated as random by existing traversal techniques
• Thus impossible to predict the port
• Linearly increasing port numbers
15.39. NAT mapping in cellular networks
• Based on TCP connections to the server at random intervals
• The server records the observed source ports
• It's not random, so port prediction is feasible.
15.40. Mobile network measurement projects
• MobiPerf
• University of Michigan
• http://mobiperf.com
• PORTOLAN
• University of Pisa
• http://portolan.iet.unipi.it
• MySpeedTest
• Georgia Institute of Technology
• https://play.google.com/store/apps/details?id=com.num
15.41. Literature
• Y. Guo et al., "An In-depth Study of LTE: Effect of Network Protocol and Application Behavior on Performance", ACM SIGCOMM, 2013. Accepted.
• Huang, Junxian, et al. Anatomizing application performance differences on smartphones. Proceedings of the 8th international conference on Mobile systems, applications, and services. ACM, 2010.
• Huang, Junxian, et al. Mobiperf: Mobile network measurement system. Technical report, University of Michigan and Microsoft Research, 2011.
• Wang, Zhaoguang, et al. An untold story of middleboxes in cellular networks. ACM SIGCOMM Computer Communication Review. Vol. 41. No. 4. ACM, 2011.
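The observation in section 15.39 (the server records the source ports and finds that they are not random) can be illustrated with a short Python sketch. It is only a toy under stated assumptions: the function name `predict_next_port`, the three-sample minimum, and the sample port numbers are invented for this example, and a real NAT may interleave ports of other clients between two connections of the same device.

```python
def predict_next_port(observed_ports, max_port=65535):
    """Guess the next external port if the NAT allocates ports linearly.

    observed_ports: external source ports seen by the server for
                    successive connections of one client, in order.
    Returns the predicted port, or None when no constant step is found.
    """
    if len(observed_ports) < 3:
        return None  # too few samples to tell a pattern from chance
    # Differences between consecutive observations.
    steps = {b - a for a, b in zip(observed_ports, observed_ports[1:])}
    if len(steps) != 1:
        return None  # increments vary: treat the mapping as unpredictable
    predicted = observed_ports[-1] + steps.pop()
    # Reject predictions outside the valid port range.
    return predicted if 0 < predicted <= max_port else None

# Invented sample data: a linearly increasing mapping and an irregular one.
print(predict_next_port([40000, 40001, 40002]))
print(predict_next_port([40000, 40007, 40003]))
```

A traversal technique could try the predicted port first and fall back to a relay when the function returns None.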
