Large-scale Internet measurement
Laki, Sándor
Created by XMLmind XSL-FO Converter.
Publication date 2015
Copyright © 2015 Laki Sándor
Contents
Large-scale Internet measurement
1. Introduction to Internet measurements
1.1. Course Information
1.2. Grading
1.3. Term project
1.4. What is this course about?
1.5. Reading
1.6. INTRODUCTION
1.7. Once upon a time...
1.8. And no...
1.9. Another aspect of Internet evolution
1.10. Today’s Internet
1.11. Why do we need Internet measurements?
1.12. Why do we need Internet measurements?
1.13. What to measure?
1.14. Why is it challenging to measure the Internet?
1.15. Core simplicity
1.16. Layered architecture and hidden network elements
1.17. IP centric
1.18. Middleboxes in the carriers’ networks
1.19. Administrative boundaries
1.20. Applications
1.21. Network measurements
1.22. Infrastructure measurements
1.23. Traffic measurements
1.24. Application measurements
1.25. Active and passive measurements
1.26. Internet Measurements
1.27. Related Conferences and Journals
2. Analytical background
2.1. Analytical background
2.2. LINEAR ALGEBRA
2.3. Notations
2.4. Norms and orthogonality
2.5. Matrices
2.6. Eigenvectors and eigenvalues
2.7. Alternate algebras
2.8. PROBABILITY AND STATISTICS
2.9. Why do we need statistics and probability theory?
2.10. Notations
2.11. Definitions
2.12. Definitions - II
2.13. Expected values and moments
2.14. Variance and standard deviation
2.15. Joint probability
2.16. Conditional probability
2.17. Central limit theorem
2.18. Distributions for Internet measurements
2.19. Stochastic processes
2.20. Stochastic processes
2.21. Stochastic processes
2.22. Characterization of a stochastic process
2.23. Simpler stationary conditions
2.24. Measures of dependence
2.25. Measures of dependence
2.26. Measures of dependence
2.27. Modeling network traffic and user activity
2.28. Modeling network traffic and user activity
2.29. Short and long tailed distributions
2.30. Short and long tailed distributions
2.31. Short and long tailed distributions
2.32. Heavy tailed/power-law distribution
2.33. Heavy tailed distribution
2.34. Measured data
2.35. Describing data
2.36. More detailed descriptions
2.37. Histogram
2.38. Empirical cumulative distribution function (CDF)
2.39. Categorical data description
2.40. Describing memory and stability
2.41. High variability in Internet data
2.42. Zipf’s law
2.43. GRAPH THEORY
2.44. Graph theory
2.45. Graphs
2.46. Subgraphs
2.47. Connected graphs
2.48. Metrics for characterization
2.49. Metrics for characterization
2.50. Matrix representation
2.51. Applications of Routing Matrix
2.52. Applications of routing matrix
2.53. Artificial graph constructions
2.54. Erdős-Rényi random graph
2.55. Erdős-Rényi random graph
2.56. Generalized random graph
2.57. Preferential attachment model
2.58. Preferential attachment model
2.59. Regular vs Random graphs
2.60. AS level topology
2.61. AS level topology
2.62. AS level topology
2.63. MODELING
2.64. Measurement and modeling
2.65. Descriptive data model
2.66. Constructive data model
2.67. Data model
2.68. Why build models
2.69. Probability models
3. Network measurement infrastructures: ETOMIC and SONoMA
3.1. Why are Internet experimental facilities needed?
3.2. Existing Testbeds and Network Measurement Infrastructures
3.3. Lifecycle of network measurements
3.4. ETOMIC
3.5. The ETOMIC system
3.6. System architecture
3.7. Evolution of measurement nodes
3.8. ETOMs
3.9. APE boxes
3.10. Measurement boxes
3.11. Central Management System
3.12. Slices vs. unique timeslots
3.13. The ETOMIC system
3.14. One day on the Internet
3.15. Experimental use cases in ETOMIC
3.16. HOW TO USE ETOMIC?
3.17. Performing an experiment from the system’s perspective
3.18. Measurement types
3.19. Necessary steps for submitting an experiment
3.20. Creating a bundle
3.21. Creating an experiment and querying its status
3.22. Downloading the results
3.23. Programming DAG cards
3.24. PUBLISHING DATA
3.25. Experimental facilities
3.26. Traditional approach
3.27. Sharing science
3.28. Related work: CAIDA/DatCat
3.29. Related work: MoMe database
3.30. Related work: MAWI repository
3.31. Data publication efforts
3.32. Key ideas in data handling
3.33. VO approach
3.34. Unified interface
3.35. CasJobs User Interface for accessing data
3.36. SONOMA
3.37. SONoMA v1.0
3.38. Why do we need another network measurement platform?
3.39. SONoMA
3.40. System components
3.41. Management Layer
3.42. Measurement methods
3.43. Web client
3.44. Case study: A full mesh topology measurement
3.45. Case study: A full mesh topology measurement
3.46. What happens in the background? A full mesh topology measurement
3.47. Another use case: Spotter
3.48. SONoMA 2.0
3.49. Literature
4. Network measurement infrastructures: PlanetLab
4.1. PlanetLab
4.2. The main goal
4.3. What is PlanetLab?
4.4. PlanetLab architecture
4.5. Slices
4.6. Slices
4.7. Slices
4.8. User Opt-in
4.9. Services running in your slice
4.10. Services running in your slice
4.11. Services running in your slice
4.12. Services running in your slice
4.13. Services running in your slice
4.14. Virtualization solutions
4.15. VServers in a PlanetLab node
4.16. VServers in a PlanetLab node
4.17. Low-level network access
4.18. Getting started
4.19. Create your SSH Key
4.20. Create your slice
4.21. Log in to your slice
4.22. Install additional packages
4.23. Deploying your app
4.24. Configuring a server for automatic startup
4.25. Other useful tools
4.26. PSSH
4.27. PSSH Demo
4.28. PlanetLab Slice Deploy Toolkit
4.29. vxargs
4.30. Nixes Tool Set
4.31. Long-Running Services In PlanetLab
4.32. Services (cont.)
4.33. Services (cont.)
4.34. Further available testbeds with PlanetLab Europe account
4.35. NITOS Wireless Testbed
4.36. w-iLab.t
5. Network measurement infrastructures: FEDERICA, SFA, OpenFlow
5.1. Federica
5.2. Federica
5.3. The physical topology
5.4. Core elements
5.5. SFA – SLICE-BASED FACILITY ARCHITECTURE
5.6. Slice-based Facility Architecture (SFA)
5.7. Slice-based Facility Architecture (SFA)
5.8. Experiment lifetime in general
5.9. What can SFA help with?
5.10. SFA for federated testbeds
5.11. SFA for federated testbeds
5.12. SFA – Available resources
5.13. SFA functionalities
5.14. Hierarchical naming
5.15. Authentication
5.16. SFA API
5.17. SFA Components
5.18. Resource Specification (RSpec) Documents
5.19. SFI and SFA client
5.20. Installation and configuration
5.21. List records from the registry
5.22. Detailed record information
5.23. Get resources
5.24. Get resources
5.25. Allocate resources for a given slice
5.26. Allocate resources for a given slice
5.27. Deallocate resources
5.28. OPENFLOW CAPABILITIES IN PLANETLAB EUROPE
5.29. What is the problem with existing networks?
5.30. What is the problem with existing networks?
5.31. Software Defined Networking
5.32. OpenFlow
5.33. OpenFlow
5.34. Plumbing primitives
5.35. Network OSes
5.36. OpenFlow support in PlanetLab
5.37. How to use it in PlanetLab?
5.38. How to use it in PlanetLab?
5.39. Create the topology
5.40. Create the topology
5.41. Modify the topology
5.42. Literature
6. Bandwidth measurement methods: Network path characterization
6.1. Methods to measure path characteristics
6.2. Capacity
6.3. Available bandwidth
6.4. Capacity and Available Bandwidth
6.5. Passive Techniques
6.6. Active probing methods
6.7. Basic ideas
6.8. State-of-the-art bandwidth estimation methods
6.9. SLoPS: Self-Loading Periodic Streams
6.10. SLoPS: Self-Loading Periodic Streams
6.11. SLoPS
6.12. SLoPS
6.13. OWD variations
6.14. How does it work?
6.15. How to determine parameters K, L and T?
6.16. Fleets of streams
6.17. How to detect the increasing trend of OWDs?
6.18. Pathload uses two metrics to recognize an increasing trend
6.19. PDT and PCT examples
6.20. PCT variation examples
6.21. PDT variation example
6.22. Rate adjustment
6.23. Performance
6.24. Packet Pair-based methods
6.25. PathChirp: Chirp Packet Trains
6.26. PathChirp
6.27. PathChirp Methodology
6.28. Self-Induced Congestion
6.29. Excursions
6.30. pathChirp Tool
6.31. Comparison with Pathload
6.32. PathSensor: Granular model-based bandwidth estimation
6.33. Estimating output spacing with fluid traffic for a single-hop scenario
6.34. Fluid curves for single-hop
6.35. How to simulate cross traffic?
6.36. Output spacing
6.37. Output spacing
6.38. Explicit solution for M/D/1 queues
6.39. Explicit solution for M/D/1 queues
6.40. Literature
7. 7 Topology discovery in large-scale networks .................................................................. 122
7.1. Topology discovery .............................................................................................. 122
7.2. Challenges ............................................................................................................ 122
7.3. Naive approaches ................................................................................................... 123
7.4. CAIDA’s Skitter ................................................................................................... 123
7.5. NetDimes .............................................................................................................. 124
7.6. Expectations ......................................................................................................... 125
7.7. Different methods ................................................................................................. 125
7.8. ROUTE DISCOVERY ......................................................................................... 126
7.9. Traceroute ............................................................................................................. 126
7.10. How does traceroute work? ....................................................................................... 126
7.11. How does traceroute work? ....................................................................................... 127
7.12. How does traceroute work? ....................................................................................... 127
7.13. How does traceroute work? ....................................................................................... 127
7.14. Problems ............................................................................................................. 128
7.15. Problems with load balancers ............................................................................. 128
7.16. Problems with load balancers ............................................................................. 129
7.17. What causes this anomaly? ................................................................................. 129
7.18. A more complex example ................................................................................... 130
7.19. What can we do? ................................................................................................. 130
7.20. Paris Traceroute Algorithm ................................................................................ 130
7.21. Finding the NEXTHOP ...................................................................................... 131
7.22. The key ideas behind NEXTHOP ....................................................................... 132
7.23. Number of probes and the expected number of interfaces at 95 percent confidence level ............ 132
7.24. SELECTFLOW: Selecting a flow ...................................................................... 133
7.25. SELECTFLOW: discovering new flows crossing router r ................................. 133
7.26. PERPACKET ..................................................................................................... 133
7.27. Discovering nexthop interfaces in presence of a load balancer .......................... 133
7.28. Discovering nexthop interfaces in presence of a load balancer .......................... 134
7.29. Performance of Paris traceroute .......................................................................... 135
7.30. Load balancers .................................................................................................... 136
7.31. TOPOLOGY DISCOVERY ............................................................................... 137
7.32. Topology discovery ............................................................................................ 137
7.33. DoubleTree ......................................................................................................... 137
7.34. The actual topology ............................................................................................ 138
7.35. Intra-monitor redundancy ................................................................................... 138
7.36. Inter-monitor redundancy ................................................................................... 138
7.37. Tree-like structures ............................................................................................. 138
7.38. Monitor rooted tree ............................................................................................. 139
7.39. Destination rooted tree ........................................................................................ 139
7.40. DoubleTree ......................................................................................................... 139
7.41. Maintaining trees ................................................................................................ 140
7.42. DoubleTree results .............................................................................................. 140
7.43. Literature ............................................................................................................ 140
8. 8 Network tomography ..................................................................................................... 140
8.1. What does tomography mean? .............................................................................. 140
8.2. Network tomography? .......................................................................................... 141
8.3. Network tomography? .......................................................................................... 142
8.4. How does it work? ................................................................................................ 142
8.5. How does it work? ................................................................................................ 143
8.6. Network Tomography ........................................................................................... 144
8.7. Network Tomography ........................................................................................... 145
8.8. Network Tomography ........................................................................................... 146
8.9. Network Tomography ........................................................................................... 147
8.10. What else is needed? ........................................................................................... 148
8.11. Estimating Source-Destination traffic intensities ................................................ 148
8.12. Estimating Source-Destination traffic intensities ............................................... 148
8.13. A toy example ..................................................................................................... 148
8.14. EM algorithm ...................................................................................................... 149
8.15. MLE and Normal Approximations ..................................................................... 149
8.16. Multicast-based loss inference ........................................................................... 149
8.17. Loss model .......................................................................................................... 150
8.18. Loss inference ..................................................................................................... 151
8.19. Solution with EM ................................................................................................ 151
8.20. Convergence ....................................................................................................... 151
8.21. Convergence ....................................................................................................... 152
8.22. Unicast network tomography .............................................................................. 153
8.23. Sandwich probing ............................................................................................... 154
8.24. Sandwich probing ............................................................................................... 154
8.25. Measurement framework .................................................................................... 155
8.26. Topology Identification ...................................................................................... 155
8.27. Simplifying the problem ..................................................................................... 155
8.28. Find the tree ........................................................................................................ 155
8.29. Illustration ........................................................................................................... 156
8.30. Literature ............................................................................................................ 157
9. 9 Network coordinate systems ......................................................................................... 157
9.1. Introduction .......................................................................................................... 157
9.2. The key idea of an NCS ........................................................................................ 158
9.3. Localization Techniques ....................................................................................... 158
9.4. Localization Techniques ....................................................................................... 158
9.5. Localization Techniques ....................................................................................... 158
9.6. Network Coordinate System Basics .................................................................... 158
9.7. Network Coordinate System Basics .................................................................... 159
9.8. Network Coordinate Systems: Advantages .......................................................... 160
9.9. LANDMARK BASED NCS ................................................................................ 160
9.10. IDMaps ............................................................................................................... 160
9.11. Landmark based NCSs ....................................................................................... 161
9.12. Global Network Positioning ............................................................................... 161
9.13. Lighthouses ......................................................................................................... 162
9.14. Lighthouses ......................................................................................................... 163
9.15. Network Positioning System .............................................................................. 164
9.16. Internet Coordinate System ................................................................................ 165
9.17. Internet Coordinate System ................................................................................ 165
9.18. Virtual landmarks ............................................................................................... 165
9.19. Internet Distance Estimation Service .................................................................. 166
9.20. DISTRIBUTED NCS ......................................................................................... 167
9.21. Distributed NCSs ................................................................................................ 167
9.22. Practical Internet Coordinates ............................................................................. 167
9.23. Big-Bang Simulation .......................................................................................... 167
9.24. Big-Bang Simulation .......................................................................................... 168
9.25. Vivaldi ................................................................................................................ 169
9.26. Vivaldi ................................................................................................................ 170
9.27. Vivaldi ................................................................................................................ 170
9.28. Vivaldi – Centralized algorithm ......................................................................... 170
9.29. Vivaldi – Centralized algorithm ......................................................................... 171
9.30. Distributed Vivaldi with constant timesteps ....................................................... 171
9.31. Vivaldi – Adaptive timesteps .............................................................................. 171
9.32. Decentralized Vivaldi with adaptive timestep .................................................... 171
9.33. Latency data for performance analysis ............................................................... 171
9.34. Timestep choice .................................................................................................. 171
9.35. Convergence and robustness against high-error nodes ....................................... 171
9.36. Communication patterns ..................................................................................... 171
9.37. Triangle Inequality Violations ............................................................................ 172
9.38. Euclidean spaces ................................................................................................. 173
9.39. Spherical coordinates .......................................................................................... 173
9.40. Height model ...................................................................................................... 174
9.41. Height model ...................................................................................................... 174
9.42. Pharos - Hierarchical Vivaldi ............................................................................. 174
9.43. Pharos – The algorithm ....................................................................................... 175
9.44. Hierarchical distance prediction ......................................................................... 175
9.45. A two-tier ICS .................................................................................................... 175
9.46. Triangular inequality violation ........................................................................... 176
9.47. Triangular inequality violation ........................................................................... 176
9.48. Two-tier Vivaldi ................................................................................................. 176
9.49. Two-tier Vivaldi ................................................................................................. 177
9.50. Limitations .......................................................................................................... 177
9.51. Benefits ............................................................................................................... 177
9.52. Comparison of different techniques .................................................................... 178
9.53. Security in NCS .................................................................................................. 178
9.54. Security in NCS .................................................................................................. 178
9.55. Security in NCS .................................................................................................. 179
9.56. Future directions ................................................................................................. 179
9.57. Literature ............................................................................................................ 180
10. 10 IP geolocation ............................................................................................................ 180
10.1. Motivation .......................................................................................................... 180
10.2. IP Geolocation in general ................................................................................... 181
10.3. Whois based location estimation example for passive geolocation .................... 183
10.4. IP Geolocation in general ................................................................................... 184
10.5. IP Geolocation in general ................................................................................... 184
10.6. THE FIRST STEPS ............................................................................................ 188
10.7. IP2Geo – Single point localization ..................................................................... 188
10.8. GeoTrack – main idea ......................................................................................... 188
10.9. GeoTrack ............................................................................................................ 189
10.10. GeoPing - Delay based localization .................................................................. 189
10.11. GeoPing - details .............................................................................................. 189
10.12. GeoCluster ........................................................................................................ 190
10.13. GeoCluster ........................................................................................................ 190
10.14. GeoCluster – Clustering IP addresses ............................................................... 190
10.15. Performance of GeoCluster .............................................................................. 190
10.16. ADVANCED TECHNIQUES .......................................................................... 191
10.17. Constraint-Based Geolocation .......................................................................... 191
10.18. Constraint-Based Geolocation .......................................................................... 192
10.19. Octant IP geolocation framework ..................................................................... 192
10.20. It is more than a simple method: it is a framework ........................................... 193
10.21. Notations ........................................................................................................... 193
10.22. Octant – Landmarks and constraints ................................................................. 194
10.23. Estimated location ............................................................................................ 194
10.24. Mapping latencies to distances ......................................................................... 194
10.25. Mapping latencies to distances ......................................................................... 194
10.26. Mapping latencies to distances ......................................................................... 195
10.27. Last hop delays ................................................................................................. 195
10.28. Eliminating last hop delays in Octant ............................................................... 195
10.29. Last hop delays in Octant ................................................................................. 196
10.30. Last hop delays ................................................................................................. 196
10.31. Results .............................................................................................................. 196
10.32. Results .............................................................................................................. 196
10.33. Spotter – a probabilistic approach .................................................................... 197
10.34. Travel time – distance relation ......................................................................... 197
10.35. Travel time – distance relation ......................................................................... 198
10.36. Statistical delay-distance model ........................................................................ 199
10.37. Statistical delay-distance model ........................................................................ 199
10.38. Statistical delay-distance model ........................................................................ 200
10.39. Evaluation – "Probabilistic triangulation" ........................................................ 200
10.40. Performance analysis ........................................................................................ 201
10.41. Topology-based Geolocation ............................................................................ 202
10.42. Topology-based geolocation ............................................................................. 202
10.43. Summary of techniques .................................................................................... 203
10.44. Estimate hop latencies ...................................................................................... 203
10.45. Estimate hop latencies ...................................................................................... 203
10.46. Clustering interfaces ......................................................................................... 204
10.47. Clustering interfaces ......................................................................................... 204
10.48. Validating location hints ................................................................................... 205
10.49. Constraint optimization .................................................................................... 205
10.50. Constraint optimization .................................................................................... 205
10.51. Results .............................................................................................................. 205
10.52. Results .............................................................................................................. 206
10.53. Other issues to be handled: indirect routes ........................................................ 206
10.54. Other issues to be handled: indirect route discovery ........................................ 206
10.55. Other issues to be handled: handling uncertainty .............................................. 207
10.56. Other issues to be handled: iterative refinement ................................................ 207
10.57. Literature .......................................................................................................... 207
11. 11 Geography of the Internet: On the spatial properties of network topology ................. 208
11.1. Network research ................................................................................................ 208
11.2. The distance is what really counts. ..................................................................... 210
11.3. Data collection .................................................................................................... 210
11.4. Data collection .................................................................................................... 212
11.5. Covered areas ..................................................................................................... 214
11.6. Histogram maps .................................................................................................. 214
11.7. Transforming spatial distributions ...................................................................... 216
11.8. Transforming spatial distributions ...................................................................... 217
11.9. Transforming spatial distributions ...................................................................... 217
11.10. Transforming spatial distributions .................................................................... 218
11.11. A router-likelihood map ................................................................................... 219
11.12. Likelihood of router positions - US ................................................................. 219
11.13. Likelihood of router positions - US ................................................................. 220
11.14. Characterizing the link length ...........................................................................
11.15. Characterizing the network links ......................................................................
11.16. Frequency of link lengths .................................................................................
11.17. Frequency of link lengths .................................................................................
11.18. Frequency of link lengths .................................................................................
11.19. Frequency of link lengths .................................................................................
11.20. Frequency of link lengths .................................................................................
11.21. Frequency of link lengths .................................................................................
11.22. Frequency of link lengths .................................................................................
11.23. Frequency of link lengths .................................................................................
11.24. Distribution of link lengths ...............................................................................
11.25. Distribution of link lengths ...............................................................................
11.26. Distribution of link lengths ...............................................................................
11.27. Distribution of link lengths ...............................................................................
11.28. Distribution of link lengths ...............................................................................
11.29. The embedded topology ...................................................................................
11.30. Characterizing network paths ...........................................................................
11.31. Aggregated path length .....................................................................................
11.32. Circuitousness ...................................................................................................
11.33. Symmetry .........................................................................................................
11.34. Symmetry .........................................................................................................
11.35. Direction dependence of lateral deviations .......................................................
11.36. Unfamiliar routing phenomenon? .....................................................................
11.37. Literature ..........................................................................................................
12. 12 Network traffic analysis, clustering and classification ...............................................
12.1. Traffic .................................................................................................................
12.2. Traffic classification ...........................................................................................
12.3. Traffic classification ...........................................................................................
12.4. Quality of Service (QoS) ....................................................................................
12.5. Traffic Classification ..........................................................................................
12.6. Traffic Classification ..........................................................................................
12.7. Different approaches
12.8. Deep Packet Inspection
12.9. Deep Packet Inspection Basics
12.10. Multi-byte pattern matching
12.11. Deploying multiple multi-byte DFAs
12.12. True positive VS False positive etc.
12.13. Performance of different DPI tools
12.14. Classical recipe for flow statistic-based traffic classification
12.15. Statistical payload analysis
12.16. KISS: Stochastic Packet Inspection
12.17. Chi square statistics
12.18. Decision process
12.19. Validation on a real traffic trace
12.20. Early Identification of Peer-To-Peer Traffic
12.21. Modeling a flow
12.22. Classification via probabilistic models
12.23. Data Collection for ground truth
12.24. Experiments
12.25. Feasibility test
12.26. Feasibility test
12.27. How much data is needed?
12.28. How much data is needed?
12.29. How much data is needed?
12.30. Robustness
12.31. Training set sizes
12.32. Is it protocol independent?
12.33. Robustness Asymmetric routing
12.34. Robustness Unknown traffic
12.35. Real traffic traces
12.36. Confusion matrix
12.37. Literature
13. Measurements in peer-to-peer networks
13.1. Centralized VS P2P
13.2. Peer-to-peer networks
13.3. Peer-to-peer networks
13.4. Some P2P protocols
13.5. What do we want to measure?
13.6. How can we do that?
13.7. Gnutella
13.8. Gnutella vs Napster
13.9. Gnutella vs Napster Lifetime of the peers
13.10. Gnutella vs Napster Shared files vs Shared data
13.11. Gnutella Latencies and downstream bandwidth
13.12. Kademlia
13.13. Kademlia
13.14. Kademlia
13.15. Kademlia
13.16. Kademlia Collected data
13.17. Kademlia Peers with dynamic IPs
13.18. Kademlia Peer availability
13.19. IP-based availability is similar to what we have seen for Gnutella
13.20. How duration can affect the availability?
13.21. Time of day effects
13.22. BitTorrent
13.23. File Sharing
13.24. *.torrent
13.25. The Tracker
13.26. BitTorrent
13.27. An example
13.28. File sharing
13.29. Lifetime of a torrent Seeders and leechers
13.30. Pieces and sub-pieces
13.31. Piece Selection
13.32. Piece Selection
13.33. Choking
13.34. Choking algorithm
13.35. Optimistic unchoke
13.36. Upload only mode
13.37. Lifetime of a torrent
13.38. Peer behaviour
13.39. Peer behaviour
13.40. Literature
14. Analysis of online social networks
14.1. Be socialized
14.2. Increasing interest
14.3. Twitter users
14.4. Social Flow
14.5. Twitter
14.6. Why do we analyze it?
14.7. Is the data available?
14.8. Quantifying influence
14.9. How to measure influence?
14.10. Cascades
14.11. Cascade sizes and depths
14.12. How to predict influence?
14.13. Regression tree for influence prediction
14.14. Past influences vs followers
14.15. Information flow on twitter
14.16. How to identify Elite users?
14.17. Snowball sample of Twitter lists
14.18. Activity sample of Twitter users
14.19. Who listens to whom?
14.20. Who listens to whom?
14.21. Two step information flow
14.22. Who are the intermediaries?
14.23. Who are the intermediaries?
14.24. Network Dynamics
14.25. Memetracking
14.26. Memetracking
14.27. Collective attention on Twitter
14.28. Collective attention on Twitter
14.29. Collective attention on Twitter
14.30. Collective attention on Twitter
14.31. Literature
15. Measurements in mobile and cellular networks
15.1. Internet and cellular networks
15.2. What can measurements reveal?
15.3. A widely heterogeneous environment
15.4. How do the different access technologies affect the performance?
15.5. HSDPA downlink
15.6. HSDPA Uplink
15.7. LTE Downlink
15.8. LTE Uplink
15.9. What is the problem with the first 10 seconds?
15.10. Large scale measurements
15.11. Performance of different access technologies
15.12. Performance of different access technologies
15.13. Performance of different access technologies
15.14. Performance of different access technologies
15.15. Performance of different access technologies
15.16. Performance of different access technologies
15.17. Performance of different access technologies
15.18. Long-term trend and daily patterns
15.19. Long-term trend and daily patterns
15.20. Long-term trend and daily patterns
15.21. Measuring DNS lookup time in 3G networks
15.22. Measuring DNS lookup time in 3G networks
15.23. Downlink throughput of major carriers in the U.S.
15.24. Cellular network policies
15.25. Port scans for large carriers in the U.S.
15.26. Port scans for large carriers in the U.S.
15.27. Port scans for large carriers in the U.S.
15.28. FTP blocking in T-Mobile’s network
15.29. HTTP proxy port blocking in T-Mobile’s network
15.30. How can these middleboxes affect user experience?
15.31. IP spoofing
15.32. IP spoofing measurement
15.33. Short TCP connection timeout
15.34. How short TCP connection timeouts affect energy consumption?
15.35. Packet reordering in the middleboxes
15.36. Packet reordering in the middleboxes
15.37. Packet reordering in the middleboxes
15.38. NAT traversal
15.39. NAT mapping in cellular networks
15.40. Mobile network measurement projects
15.41. Literature
1. Introduction to Internet measurements
1.1. Course Information
• Instructor: Sándor Laki
• E-mail: [email protected]
• Office hours: Thursday, 10:00-12:00, DT. 2.506
• Lecture: T/Th, 10:00 – 12:00
• Location: DT. 2.516
• Web site: http://lakis.web.elte.hu/EIT/lsim-2012autumn
• Mailing list: [email protected]
1.2. Grading
• Prerequisites
• Graduate level Computer Networking courses
• http://lakis.web.elte.hu/comnet-eng-bsc/
• http://people.inf.elte.hu/lukovszki/Courses/1314BSC/
• Credits: 6 ECTS
• Grading
• Midterm: 20%
• Good participation: 10%
• Term project: 50%
• Specification: 10%
• Work-in-progress report and midterm presentation: 15%
• Final report and presentation: 25%
• Final exam: 20%
• Photo by Sage Ross
1.3. Term project
• Work in teams of two or more students
• Decide what to measure and specify how to do that
• Build measurement tools or use existing platforms
• Perform measurements
• Collect and analyze measurement data
• Identify potential applications or further research directions
1.4. What is this course about?
• Not an introduction to the Internet
• Focus on Internet Measurements
• What to measure?
• traffic, infrastructure
• applications, performance
• Why is it important?
• Traffic engineering, capacity planning, topology mapping
• What does the Internet look like?
• Application and traffic characteristics, Topology/route choice
• How to measure and how to interpret measurement results?
• Measurement methodologies and challenges
• Design trade-offs
• Design of measurement/monitoring systems
• Tools: data collection, modeling, statistical inference, etc.
• Image courtesy of Michal Marcol / FreeDigitalPhotos.net
1.5. Reading
• Mark Crovella, Balachander Krishnamurthy:
• Internet Measurement: Infrastructure, Traffic and Applications.
• Wiley, 2006.
• Raj Jain:
• The Art of Computer Systems Performance Analysis.
• Wiley and Sons, New York, 1991
• Kurose and Ross:
• Computer Networking: A Top-Down Approach Featuring the Internet.
• Fifth edition, Addison-Wesley, 2009
1.6. INTRODUCTION
1.7. Once upon a time...
1.8. And no...
• Source: wikipedia.org
1.9. Another aspect of Internet evolution
• Once
• and now…
1.10. Today’s Internet
• Tier I providers (core network)
• National and global transit backbones
• Tier II providers
• ISPs (ISP-1, ISP-2, ISP-3) and customer IP networks
• Internet exchange points (IXPs)
• Hyper Giants
• Google, Akamai, CDNs, etc.
1.11. Why do we need Internet measurements?
• Internet seems to work well
• Despite the exponential growth in its size
• Despite the high variety of applications
• Email, Web, Instant messaging, File sharing, Social networks, Games, etc.
• Why do we bother measuring various aspects of it then?
1.12. Why do we need Internet measurements?
• Internet is far from being ideal
• Internet measurements help us
• To better understand how and why the Internet works
• To design new features that may lead to the next-generation Internet
• To identify the weaknesses of network protocols
1.13. What to measure?
• Physical Properties
• Network devices
• routers, NAT boxes, firewalls, switches
• Links
• wired, wireless
• Topology Properties
• Various levels
• Autonomous Systems (AS), Points of Presence (PoP), Routers, Interfaces
• Traffic Properties
• Delays
• Transmission, Propagation, Queuing, Processing etc.
• Losses, Throughput, Jitter, etc.
1.14. Why is it challenging to measure the Internet?
• Poor observability of network characteristics
• The reasons behind
• Core simplicity
• Layered architecture and hidden elements
• Administrative boundaries
1.15. Core simplicity
• The network is built from very simple elements
• "Keep it simple" design principle
• Stateless nature
• Design follows the end-to-end arguments: functionality is placed in the end hosts
• Packets are not tracked
• The interaction with the network is hard to observe
1.16. Layered architecture and hidden network elements
• Hourglass model hides the details of the lower layers
• IP everywhere, few transport protocols
• It is almost impossible to measure the layers below IP
• HTTP, Email, FTP, DNS, RTP, SMTP, WWW, VoIP, BitTorrent, ... (applications)
• TCP, UDP, ... (transport)
• IP (the narrow waist of the hourglass)
• Ethernet, CSMA, ...
• MPLS, PPP, SONET, ...
• WiFi, WiMax, LTE, UMTS, copper, fiber, ...
1.17. IP centric
• You must be familiar with IPv4 and IPv6
• IPv4 header:
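Since the IPv4 header is the one structure every measurement tool ends up decoding, here is a minimal sketch that unpacks its fixed 20-byte part with Python's standard `struct` module and verifies the header checksum (one's-complement sum of 16-bit words, as specified in RFC 791). The function names `parse_ipv4_header` and `ipv4_checksum` are hypothetical helpers for illustration; IP options beyond the first 20 bytes are not decoded.

```python
import socket
import struct

def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    if len(data) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,                   # header length in 32-bit words
        "tos": tos,
        "total_length": total_len,               # header + payload, in bytes
        "identification": ident,
        "flags": flags_frag >> 13,               # 3 bits: reserved, DF, MF
        "fragment_offset": flags_frag & 0x1FFF,  # in units of 8 bytes
        "ttl": ttl,
        "protocol": proto,                       # e.g. 1 = ICMP, 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

def ipv4_checksum(header: bytes) -> int:
    """One's complement of the one's-complement sum of the 16-bit words.

    With the checksum field zeroed this yields the value to store in it;
    over a header that already carries a valid checksum it yields 0.
    """
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                           # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For example, the header bytes `45000073000040004011b861c0a80001c0a800c7` decode to a UDP packet (protocol 17) from 192.168.0.1 to 192.168.0.199 with TTL 64 and the DF flag set, and `ipv4_checksum` over those 20 bytes returns 0.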
1.18. Middleboxes in the carriers’ networks
• Hidden elements
• Firewalls
• Filter out traffic, block ports, etc.
• Proxies and IP sniffers
• Improve performance
• Traffic shapers
• Improve traffic management
• NAT boxes
• Utilize IP space more efficiently
• Active network measurements have to take the presence of hidden middleboxes into account
• Probe traffic may be blocked
• Traffic shapers may affect probe traffic
• NATs hide the internal structure and size of the network
1.19. Administrative boundaries
• System of systems
• Interconnected networks operated by different organizations
• ISPs hide the details of their network
• E.g. only PoP-level topologies are available instead of router-level ones
• Inter-AS routing is based on business decisions
• Economic and political aspects
1.20. Applications
• Traffic engineering
• Troubleshooting
• Anomaly detection
• Security forensics
• Feasibility check of new ideas
1.21. Network measurements
• Infrastructure
• Traffic
• Application
1.22. Infrastructure measurements
• Basic path characteristics
• Loss, delay, jitter, bandwidth, etc.
• Topology measurements
• Network tomography
• Network coordinate systems
• IP geolocation
• Wireless mesh networks
1.23. Traffic measurements
• Packet traces
• Sampling
• Flow characteristics
• Inter-arrival times, packet size distribution, etc.
• Traffic Matrix Estimation
• Deep Packet Inspection
• Statistical Traffic Classification
• Statistical Payload Analysis
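To make "flow characteristics" concrete, the sketch below groups packet records into flows by their 5-tuple and computes per-flow packet and byte counts, duration, mean inter-arrival time and mean packet size. It assumes the trace has already been parsed into `(timestamp_s, five_tuple, size_bytes)` records; the record layout and the `flow_stats` name are hypothetical. Real flow exporters also split flows on idle and active timeouts, which this sketch does not do.

```python
from collections import defaultdict
from statistics import mean

def flow_stats(packets):
    """packets: iterable of (timestamp_s, five_tuple, size_bytes), where
    five_tuple = (src IP, dst IP, src port, dst port, protocol)."""
    flows = defaultdict(list)
    for ts, key, size in packets:
        flows[key].append((ts, size))
    stats = {}
    for key, pkts in flows.items():
        pkts.sort()                               # order packets by timestamp
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
        stats[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "duration": times[-1] - times[0],
            "mean_interarrival": mean(gaps) if gaps else None,
            "mean_size": mean(sizes),
        }
    return stats
```

Collecting the `gaps` and `sizes` lists across all flows, instead of averaging them, gives the empirical inter-arrival time and packet size distributions.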
1.24. Application measurements
• Content Delivery Networks
• Web content clustering
• Skype and other VoIP measurements
• File sharing
• Video On Demand, IPTV
• Malware
• Social networks
1.25. Active and passive measurements
• Active measurements
• Methods that inject probe traffic into the network for the purpose of measurement and examine how the network affects the properties of the probe traffic
• Typically end-to-end
• Some tools: ping, owamp, traceroute, iperf, etc.
• Passive measurements
• Methods that capture traffic generated by other users and applications to calculate network-related metrics
• Examples
• RouteViews repositories store BGP tables from a large set of ASes
• Traffic traces captured with pcap at a given point in the network
• Flow (byte) counters in routers
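As a small illustration of an active measurement, the sketch below times TCP handshakes to a host and port. It is not one of the tools listed above: ICMP-based ping needs raw sockets (usually root privileges), so this hypothetical `tcp_connect_rtts` helper instead times `socket.create_connection`, which returns once the SYN/SYN-ACK exchange has completed. The elapsed time therefore approximates one round-trip time plus end-host processing; passing an IP address rather than a hostname keeps DNS resolution out of the measurement.

```python
import socket
import time

def tcp_connect_rtts(host, port, count=5, timeout=2.0):
    """Time `count` TCP handshakes to host:port; return seconds per probe,
    or None for probes that were refused, lost or timed out."""
    rtts = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                # connect() returned: the three-way handshake is complete
                rtts.append(time.perf_counter() - start)
        except OSError:
            rtts.append(None)
    return rtts
```

Because each probe opens a real connection, the probe traffic may be blocked, shaped or answered by a middlebox rather than the target host, which is exactly the caveat raised for hidden middleboxes earlier in this chapter.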
1.26. Internet Measurements
• Internet measurement is key to designing the next-generation communication network
• Fundamental design principles of the current Internet make it hard to measure various aspects of it
• Early research has produced a set of basic tools and methods to measure aspects such as topology, traffic, etc.
• Accuracy of such methods is still an open question
• There is still a lot of ground to cover in this direction, and this is where researchers like you come into the equation
1.27. Related Conferences and Journals
• Conferences
• Internet Measurement Conference
• Passive and Active Measurement Workshop
• ACM SIGMETRICS
• Network and Distributed System Security Symposium
• ACM SIGCOMM
• IEEE INFOCOM
• Journals
• Computer Networks (ComNet)
• IEEE/ACM Transactions on Networking (ToN)
• IEEE Journal on Selected Areas in Communications (JSAC)
2. Analytical background
2.1. Analytical background
We need tools to study the Internet quantitatively
• Linear algebra
• Probability and statistics
• Graph theory
Further readings in these topics:
• 1. Linear algebra wikibook: http://en.wikibooks.org/wiki/Linear_Algebra
• 2. Mario F. Triola: Elementary Statistics
• 3. Reinhard Diestel: Graph Theory http://diestel-graph-theory.com/index.html
2.2. LINEAR ALGEBRA
2.3. Notations
2.4. Norms and orthogonality
2.5. Matrices
2.6. Eigenvectors and eigenvalues
2.7. Alternate algebras
2.8. PROBABILITY AND STATISTICS
2.9. Why do we need statistics and probability theory?
• Most of the mechanisms in networks are not deterministic
• Randomized algorithms
• Improved robustness, load balancing, etc.
• Stochastic behavior of incoming traffic
• Without probability theory and statistics it would be hard to analyze them
2.10. Notations
2.11. Definitions
2.12. Definitions - II
2.13. Expected values and moments
2.14. Variance and standard deviation
2.15. Joint probability
2.16. Conditional probability
2.17. Central limit theorem
2.18. Distributions for Internet measurements
2.19. Stochastic processes
• Typically, Internet measurements arrive over time, in some order
• To use the tools of probability in this setting, we define a sequence of random variables, called a stochastic process.
2.20. Stochastic processes
2.21. Stochastic processes
2.22. Characterization of a stochastic process
2.23. Simpler stationary conditions
2.24. Measures of dependence
2.25. Measures of dependence
2.26. Measures of dependence
2.27. Modeling network traffic and user activity
2.28. Modeling network traffic and user activity
2.29. Short and long tailed distributions
2.30. Short and long tailed distributions
2.31. Short and long tailed distributions
2.32. Heavy tailed/power-law distribution
2.33. Heavy tailed distribution
• Figure: New York City area road map (link lengths in km)
2.34. Measured data
• Describing data
• For example: "mean of a dataset"
• An objectively measurable quantity which is the average of a set of known values
• Describing probability models
• For example: "mean of a random variable"
• A property of an abstract mathematical construct
• To emphasize the distinction, we add the adjective "empirical" when describing data
• Empirical mean vs. mean
• Classification of measured data
• Numerical, i.e. numbers
• Categorical, i.e. symbols, names, tokens, etc.
2.35. Describing data
2.36. More detailed descriptions
• Quantiles
• The p-th quantile is the value below which the fraction p of the values lies
• Median is the 0.5-quantile
• Percentile
• A quantile can also be expressed as a percentile
• E.g. the 90th percentile is the value below which 90 percent of the data lies
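Several conventions exist for empirical quantiles (some interpolate between neighbouring observations). The sketch below uses the simple nearest-rank definition, which matches the wording above: the p-quantile is the smallest observation x such that at least a fraction p of the data is less than or equal to x. The function names are hypothetical; Python's `statistics.quantiles` offers interpolating variants instead.

```python
import math

def empirical_quantile(data, p):
    """Nearest-rank p-quantile: the smallest observation x such that at
    least a fraction p of the data is <= x, for 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    xs = sorted(data)
    # round() absorbs floating-point noise, so that a product p * n landing
    # just above an integer does not bump the rank by one
    rank = max(1, math.ceil(round(p * len(xs), 9)))
    return xs[rank - 1]

def percentile(data, q):
    """q-th percentile, e.g. q = 90 for the 90th percentile."""
    return empirical_quantile(data, q / 100)
```

With `data = [7, 1, 3, 9, 5]`, `empirical_quantile(data, 0.5)` returns the median 5, and `percentile(range(1, 101), 90)` returns 90.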
2.37. Histogram
• Defined in terms of bins, which partition the range of observed values
• Counts how many values fall in each bin
• A natural empirical analog of a random variable’s probability density function (PDF) or distribution function
• Practical problem:
• How to determine the bin boundaries
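A minimal equal-width histogram in Python shows where the bin-boundary problem appears: everything depends on the number of bins. As an assumption for illustration, the default below uses Sturges' rule, ceil(log2 n) + 1, which is only one of several heuristics; the `histogram` name is hypothetical.

```python
import math

def histogram(data, bins=None):
    """Equal-width histogram; returns (bin_edges, counts).

    If bins is None, Sturges' rule ceil(log2(n)) + 1 is used.
    """
    xs = list(data)
    if not xs:
        raise ValueError("empty dataset")
    if bins is None:
        bins = math.ceil(math.log2(len(xs))) + 1
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0          # guard against all-equal data
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in xs:
        # the maximum falls exactly on the last edge; keep it in the last bin
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return edges, counts
```

Dividing each count by `n * width` turns the histogram into an estimate of the probability density function, which is the sense in which it is the empirical analog of the PDF.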
2.38. Empirical cumulative distribution function (CDF)
• Involves no binning or averaging of data values
• Provides more information about the dataset than the histogram.
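• For each unique value x in the dataset, the fraction of data items that are smaller than or equal to x (the quantile function is its inverse)
• The empirical complementary CDF (CCDF), i.e. the fraction of data items larger than x, can be used similarly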
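The empirical CDF needs no binning: sort the data once and, for each unique value x, count how many items are less than or equal to x. The sketch below (hypothetical helper names) does this with binary search and derives the CCDF as 1 - F(x).

```python
from bisect import bisect_right

def ecdf(data):
    """Return (sorted unique values, F(x) = fraction of items <= x)."""
    xs = sorted(data)
    n = len(xs)
    values = sorted(set(xs))
    return values, [bisect_right(xs, v) / n for v in values]

def eccdf(data):
    """Empirical complementary CDF: fraction of items > x."""
    values, F = ecdf(data)
    return values, [1.0 - f for f in F]
```

When plotting the CCDF on log-log axes, the last point (where the CCDF is 0) has to be dropped, since log 0 is undefined.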
2.39. Categorical data description
• Probability distribution
• An analog of the histogram for categorical data
• Measure the empirical probability of each symbol in the dataset
• Plot the histogram with the symbols sorted in decreasing order of probability
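For categorical data, the empirical probability of each symbol is just its count divided by the number of observations. The sketch below (hypothetical `categorical_distribution` helper) uses `collections.Counter.most_common()`, which already returns symbols in decreasing order of count, i.e. the order used for plotting.

```python
from collections import Counter

def categorical_distribution(symbols):
    """Empirical probability of each symbol, in decreasing order of probability."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return [(sym, c / n) for sym, c in counts.most_common()]
```

For instance, the protocol labels `["http", "dns", "http", "ssh", "http", "dns"]` give `http` with probability 0.5 first, then `dns`, then `ssh`.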
2.40. Describing memory and stability
Time series data
• Question: Do successive measurements tend to have any relation to each other?
Memory
• When the value of a measurement tends to give some information about the likely values of future measurements
• Empirical autocorrelation function (ACF):
Stability
• A dataset is stable if its empirical statistics do not seem to change over time.
• Subjective
• Objective measures
• A typical approach is to break the dataset into windows
• E.g. a set of 1000 observations can be divided into 10 windows consisting of the 1st 100 observations, the 2nd 100 observations, and so on.
• Empirical statistics are computed for each window and then compared, looking for consistency, trends, predictable variation, etc.
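Both ideas can be sketched in a few lines. For memory, the code assumes the common ACF estimator $r(k) = \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) / \sum_{t=1}^{n} (x_t - \bar{x})^2$ (other normalizations exist). For stability, it splits the series into equal consecutive windows, as in the 1000-observation example, and reports each window's mean and standard deviation. Function names are hypothetical.

```python
from statistics import mean, pstdev

def acf(xs, max_lag):
    """Empirical autocorrelation r(k) for k = 0..max_lag
    (undefined, i.e. ZeroDivisionError, for a constant series)."""
    n = len(xs)
    m = mean(xs)
    denom = sum((x - m) ** 2 for x in xs)
    return [
        sum((xs[t] - m) * (xs[t + k] - m) for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

def window_stats(xs, num_windows):
    """Split xs into equal consecutive windows (leftover items are dropped)
    and return the (mean, standard deviation) of each window."""
    size = len(xs) // num_windows
    windows = (xs[i * size:(i + 1) * size] for i in range(num_windows))
    return [(mean(w), pstdev(w)) for w in windows]
```

An alternating series `[1, -1] * 50` has r(1) = -0.99 and r(2) = 0.98, i.e. strong memory; `window_stats(list(range(1000)), 10)` shows window means rising from 49.5, a trend that marks the series as non-stable.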
2.41. High variability in Internet data
• Traditional statistical methods focus on low or moderate variability of the data, e.g. the Normal distribution
• However, Internet data shows high variability
• It consists of many small values mixed with a small number of large values
• A significant fraction of the data may fall many standard deviations from the mean
• The empirical distribution is highly skewed, and the empirical mean and variance are strongly affected by the rare, large observations
• It may be modeled with a subexponential or heavy-tailed distribution
• Mean and variance are not good metrics for highly variable data; quantiles and the empirical distribution are better,
• e.g. the empirical CCDF on log-log axes for long-tailed distributions
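A toy example makes the point about mean and variance. The sample below is hypothetical: 99 transfers of 1,000 bytes plus a single transfer of 100,000,000 bytes. One rare observation moves the mean by a factor of about 1000 and inflates the standard deviation to roughly ten times the mean, while the median does not move.

```python
from statistics import mean, median, pstdev

def summarize(xs):
    return {"mean": mean(xs), "median": median(xs), "std": pstdev(xs)}

small = [1_000] * 99                      # many small values (bytes)
with_outlier = small + [100_000_000]      # plus one rare, very large value

print(summarize(small))
print(summarize(with_outlier))
```

Here the mean jumps from 1,000 to 1,000,990 bytes, while the median stays at 1,000.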
2.42. Zipf’s law
• Categorical distributions can also be highly skewed
• A model for the shape of a categorical distribution when data values are ordered by decreasing empirical
probability,
• e.g. URLs of Web pages
• Zipf’s law refers to the situation where $p_i = c \, i^{-B}$ for some positive constants $c$ and $B$, with $p_i$ the empirical probability of the $i$-th most frequent value
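Taking logarithms of $p_i = c \, i^{-B}$ gives $\log p_i = \log c - B \log i$, a straight line on log-log axes. A quick (and admittedly rough) way to estimate the constants is therefore an ordinary least-squares fit of log probability against log rank, sketched below with a hypothetical `zipf_fit` helper. Least squares on log-log data is known to be biased for noisy tails, so treat the result as a first look rather than a rigorous estimate.

```python
import math
from collections import Counter

def zipf_fit(symbols):
    """Estimate (c, B) in p_i = c * i**(-B) by least squares on
    log p_i versus log i, ranks ordered by decreasing frequency.
    Needs at least two distinct symbols."""
    counts = sorted(Counter(symbols).values(), reverse=True)
    n = sum(counts)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(cnt / n) for cnt in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope   # (c, B)
```

For symbols whose counts are exactly 60, 30, 20, 15, 12, 10 (i.e. proportional to 1/i), the fit returns B = 1 and c = 60/147.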
2.43. GRAPH THEORY
2.44. Graph theory
• Generally, networks can be handled as directed or undirected graphs
• However, other phenomena can also be analysed with graph theory
• E.g. the retweet graph in social network analysis
• Graph theory helps us characterize networks and other phenomena and analyze their properties
2.45. Graphs
• A graph is a pair $G = (V, E)$ of a vertex set $V$ and an edge set $E$
• Undirected and directed
• Unweighted and weighted
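The four graph variants (directed or undirected, weighted or unweighted) map onto two standard data structures. The sketch below builds both from an edge list, assuming vertices are numbered 0..n-1 and each edge is `(u, v, w)` with `w = 1` for unweighted graphs; the helper names are hypothetical.

```python
def adjacency_matrix(n, edges, directed=True):
    """n x n matrix with A[u][v] = weight of edge u -> v, 0 if absent
    (so zero-weight edges cannot be represented this way)."""
    A = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        A[u][v] = w
        if not directed:
            A[v][u] = w          # undirected: store both directions
    return A

def adjacency_list(edges, directed=True):
    """Map each vertex to a list of (neighbour, weight) pairs."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        if not directed:
            adj.setdefault(v, []).append((u, w))
    return adj
```

The matrix costs O(n²) memory but answers "is there an edge u-v?" in constant time; the list costs O(|V| + |E|), which is why large measured topologies (e.g. AS or router graphs) are usually stored as adjacency lists.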
2.46. Subgraphs
2.47. Connected graphs
2.48. Metrics for characterization
2.49. Metrics for characterization
2.50. Matrix representation
2.51. Applications of Routing Matrix
• Origin-destination flow
2.52. Applications of routing matrix
• Delay of paths
• The routing and end-to-end delays are known
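A sketch of how a routing matrix is typically used, under stated assumptions: let R have one row per path and one column per link, with R[p][l] = 1 if path p traverses link l. Then end-to-end path delays are y = R x for link delays x (the same product with a links × OD-flows matrix maps origin-destination flow volumes to link loads). When the routing and y are known, x can be estimated from the normal equations (Rᵀ R) x = Rᵀ y, provided R has full column rank; in real networks it often does not, which is a core difficulty of network tomography. Helper names are hypothetical, and plain Python is used instead of a linear-algebra library.

```python
def matvec(R, x):
    """y = R x for a list-of-rows matrix."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]

def solve(A, b):
    """Solve a square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("routing matrix does not identify all link delays")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def infer_link_delays(R, y):
    """Least-squares link delays from path delays y = R x."""
    Rt = [list(col) for col in zip(*R)]                 # R transposed
    RtR = [[sum(a * b for a, b in zip(ri, rj)) for rj in Rt] for ri in Rt]
    return solve(RtR, matvec(Rt, y))
```

For three links with delays 10, 20 and 5 ms and paths {1,2}, {2,3}, {1,3}, the path delays are 30, 25 and 15 ms, and `infer_link_delays` recovers the per-link values.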
2.53. Artificial graph constructions
• Goal: model real networks by generating random graphs
• Erdős-Rényi model
• Random graphs
• Theoretical relevance
• Binomial degree distribution
• Barabási-Albert model
• Random scale-free networks
• Modeling natural and human-made systems
• Power-law degree distribution
• Other models, e.g. the Watts-Strogatz model
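The two models can be generated in a few lines each. The sketch below is one common formulation, not necessarily the one used on the slides: `erdos_renyi` keeps each possible edge independently with probability p, so degrees are Binomial(n-1, p); `barabasi_albert` attaches each new node to m distinct existing nodes chosen proportionally to degree, implemented by sampling uniformly from a list in which every node appears once per incident edge. Names and the seeding convention (the first new node links to m isolated seed nodes) are assumptions.

```python
import random
from collections import Counter

def erdos_renyi(n, p, rng):
    """G(n, p): each of the n(n-1)/2 possible undirected edges is kept
    independently with probability p."""
    return {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}

def barabasi_albert(n, m, rng):
    """Preferential attachment: every new node links to m distinct existing
    nodes, picked with probability proportional to their current degree."""
    edges = set()
    targets = list(range(m))    # the first new node attaches to m seed nodes
    repeated = []               # node v appears here once per incident edge
    for new in range(m, n):
        for t in targets:
            edges.add((t, new))
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # uniform pick from `repeated` = degree-proportional
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges

def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

For example, `erdos_renyi(200, 0.05, random.Random(1))` has mean degree close to 0.05 × 199 ≈ 10, while `barabasi_albert(500, 2, random.Random(1))` has exactly 2 × 498 edges and a few early nodes with far higher degree than any node of a comparable Erdős-Rényi graph, reflecting its power-law tail.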
2.54. Erdős-Rényi random graph
2.55. Erdős-Rényi random graph
2.56. Generalized random graph
2.57. Preferential attachment model
2.58. Preferential attachment model
2.59. Regular vs Random graphs
• Regular graph
• Long characteristic path length
• High degree of clustering
• Random graph
• Short paths
• Low degree of clustering
• Small-world graph
• Short characteristic path length
• High degree of clustering
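The contrast between regular and random graphs can be checked numerically with two metrics: the average clustering coefficient (the fraction of a node's neighbour pairs that are themselves linked, averaged over nodes) and the characteristic path length (mean hop distance, here over reachable pairs only, computed by BFS). The sketch below compares a ring lattice, where each node links to its k nearest neighbours, with a G(n, p) random graph of the same mean degree. All names and parameters are illustrative assumptions.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Regular graph: each node linked to its k nearest ring neighbours (k even)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj

def random_graph(n, p, rng):
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def avg_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links);
    nodes with degree < 2 count as 0."""
    total = 0.0
    for u, nbrs in adj.items():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += links / (d * (d - 1) / 2)
    return total / len(adj)

def avg_path_length(adj):
    """Mean BFS hop distance over all reachable ordered pairs."""
    dist_sum = pairs = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs
```

With n = 100 and k = 4, the ring lattice has clustering exactly 0.5 and a characteristic path length of about 12.9 hops, whereas a random graph with p = 4/99 typically has clustering near 0.04 and paths of only a few hops. A small-world graph (e.g. Watts-Strogatz, obtained by rewiring a small fraction of lattice edges) combines the lattice's clustering with the random graph's short paths.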
2.60. AS level topology
2.61. AS level topology
• High variability in degree distribution
• Some ASes are very highly connected
• Different ASes have dramatically different roles in the network
• Node degree seems to be highly correlated with AS size
• Generative models of AS graph
• "Rich get richer" model
• Newly added nodes connect to existing nodes in a way that tends to simultaneously minimize the
physical length of the new connection, as well as the average number of hops to other nodes
• New ASes appear at an exponentially increasing rate, and each AS grows exponentially as well
2.62. AS level topology
2.63. MODELING
2.64. Measurement and modeling
• Model
• Simplified version of something else
• Classification
• A system model: simplified descriptions of computer systems
• Data models: simplified descriptions of measurements
• Data models
• Descriptive data models
• Constructive data models
2.65. Descriptive data model
• Compact summary of a set of measurements
• E.g. summarize the variation of traffic on a particular network as “a sinusoid with period 24 hours"
• An underlying idealized representation
• Contains parameters whose values are based on the measured data
• Drawback
• Cannot use all available information
• Hard to answer "why is the data like this?" and "what will happen if the system changes?"
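The "sinusoid with period 24 hours" example can be turned into a concrete descriptive model y(t) = a + b sin(2πt/24) + c cos(2πt/24). The sketch below fits its three parameters by least squares. It assumes one sample per hour covering whole days; under that assumption the regressors are orthogonal and the fit reduces to simple averages. The traffic values are synthetic.

```python
import math
from typing import List, Tuple

def fit_daily_sinusoid(samples: List[float], period: int = 24) -> Tuple[float, float, float]:
    """Least-squares fit of y(t) = a + b*sin(2*pi*t/period) + c*cos(2*pi*t/period).

    Assumes one sample per time unit covering a whole number of periods, so the
    constant, sine and cosine regressors are orthogonal."""
    n = len(samples)
    if n % period != 0 or period < 3:
        raise ValueError("need a whole number of periods and period >= 3")
    w = 2 * math.pi / period
    a = sum(samples) / n
    b = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(samples))
    c = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(samples))
    return a, b, c

# Synthetic example: two days of hourly traffic (Mbit/s) generated from a known sinusoid.
traffic = [100 + 30 * math.sin(2 * math.pi * t / 24) - 10 * math.cos(2 * math.pi * t / 24)
           for t in range(48)]
a, b, c = fit_daily_sinusoid(traffic)
print(round(a, 6), round(b, 6), round(c, 6))  # 100.0 30.0 -10.0
```

The three fitted numbers (mean level a, amplitude √(b² + c²) and the phase derived from b and c) are the compact summary. They say nothing about *why* traffic follows this shape, which is exactly the drawback listed above.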
2.66. Constructive data model
• Succinct description of a process that gives rise to an output of interest
• E.g. model network traffic as "the superposition of a set of flows arriving independently, each consisting of
a random number of packets"
• The main purpose is to concisely characterize a dataset, instead of representing or simulating the real system
• Drawback
• The model is hard to generalize; such models may have many parameters
• The nature of the output is not obvious without simulation or analysis
• It is impossible to match the data in every aspect
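The constructive model quoted above can be simulated with a short sketch. Flows arrive independently as a Poisson process, each flow carries a random number of packets, and the observed traffic is the superposition of all flows. The geometric flow-size distribution and the rule that all packets of a flow fall into its arrival bin are simplifying assumptions chosen for this example.

```python
import math
import random
from typing import List

def simulate_packet_counts(rate: float, mean_size: float, duration: int, seed: int = 0) -> List[int]:
    """Packets per 1-second bin from flows arriving as a Poisson process (rate flows/s),
    each carrying a geometric number of packets (mean mean_size, minimum 1).
    Simplification: all packets of a flow are counted in the bin where the flow starts."""
    if mean_size < 1:
        raise ValueError("mean_size must be >= 1")
    rng = random.Random(seed)
    p = 1.0 / mean_size                # success probability of the geometric distribution
    bins = [0] * duration
    t = rng.expovariate(rate)          # exponential inter-arrival times => Poisson arrivals
    while t < duration:
        u = 1.0 - rng.random()         # u in (0, 1]
        size = 1 if p >= 1.0 else int(math.log(u) / math.log(1.0 - p)) + 1  # inverse-CDF sample
        bins[int(t)] += size
        t += rng.expovariate(rate)
    return bins

counts = simulate_packet_counts(rate=5.0, mean_size=10.0, duration=10_000, seed=7)
print(sum(counts) / len(counts))  # close to rate * mean_size = 50 packets/s
```

The output's properties (mean, variance and correlation of the per-bin counts) are not obvious from the three parameters alone and have to be obtained by simulation or analysis, as noted above.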
2.67. Data model
• "All models are wrong, but some models are useful"
• A model is approximate
• Models omit details of the data by their very nature
• Modeling introduces the tension between the simplicity and utility of a model
• Under which model is the observed data more likely?
• Models involve a random process or component
• Three key steps in building a data model:
• Model selection
• Parsimonious: prefer models with fewer parameters over those with a larger number of parameters
• Parameters estimation
• Validating the model
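The three steps can be illustrated on a toy case. The selected model is a one-parameter exponential distribution for inter-arrival times, its rate is estimated by maximum likelihood, and the fit is validated with the Kolmogorov-Smirnov (KS) distance. The data are synthetic, and the exponential model is only an example choice.

```python
import math
import random
from typing import List

def fit_exponential(samples: List[float]) -> float:
    """Parameter estimation: the maximum-likelihood rate of an exponential model is 1 / sample mean."""
    return len(samples) / sum(samples)

def ks_statistic(samples: List[float], rate: float) -> float:
    """Validation: largest gap between the empirical CDF and F(x) = 1 - exp(-rate * x)."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - math.exp(-rate * x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Synthetic data: 5000 inter-arrival times from an exponential distribution with rate 2.0.
rng = random.Random(3)
data = [rng.expovariate(2.0) for _ in range(5000)]
lam = fit_exponential(data)
print(lam, ks_statistic(data, lam))  # rate near 2.0, small KS distance
```

Standard KS critical values assume that the parameters were not estimated from the same data. A test against the fitted model is therefore somewhat optimistic, and corrected critical values (e.g. Lilliefors-type) are more appropriate.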
2.68. Why build models
• Provides a compact summary of a set of measurements
• Exposes properties of measurements that are important for particular engineering problems, when parameters
are interpretable
• Serves as a starting point for generating random but "realistic" input data for simulations
2.69. Probability models
• Why use random models in the Internet?
• Fundamentally, the processes involved are random
• Their value: they summarize an immense number of particular system properties that would be far too tedious to specify individually
• Random models and real systems are very different things
• It is important to distinguish between the properties of a probabilistic model and the properties of real data.
3. Network measurement infrastructures ETOMIC and SONoMA
3.1. Why Internet experimental facilities are needed?
• The Internet has become a large-scale, complex network
• inefficient protocols
• in the Internet it is often not possible to measure traffic flows and other aspects of usage
• active probes are injected to discover these hidden properties
• Understanding the details of network and traffic dynamics
• Topology changing
• Queueing delay variations
• Available bandwidth
• One-way delay variations
• etc.
• Models and analysis of measurement data and traffic dynamics could lead to a better design of Future Internet
protocols
3.2. Existing TestBeds and Network Measurement Infrastructures
• DIMES
3.3. Lifecycle of network measurements
3.4. ETOMIC
3.5. The ETOMIC system
• The European Traffic Observatory Measurement InfrastruCture (ETOMIC) was created in 2004 within the EU FP6 Evergrow Integrated Project
• ETOMIC also takes part in EIT ICTLabs FITTING and EU FP7 OpenLab
• Its goals:
• open access, public testbed for researchers
• high precision timestamping
• GPS synchronized
• Visit: www.etomic.org
• Best Testbed Award
3.6. System architecture
• Measurement nodes/agents
• Geographically dispersed machines available to users
• Advanced probing nodes called ETOMs
• Lightweight APE boxes
• Central Management System
• Schedule experiments, authenticate users, etc.
• Data repository
• Network Measurement Virtual Observatory
3.7. Evolution of measurement nodes
3.8. ETOMs
• ETOMs with DAG
• Intel S875WP1E server
• Debian Linux
• Endace DAG 3.6GE
• 60 ns precision
• GPS synchronized
• Special C API for programming the DAG card
• User space applications
• Packet sender, capturer
• ETOMs with ARGOS
• HP ProLiant ML370 server
• Ubuntu Linux
• Quad core processor
• ARGOS card (dev. at UAM)
• 10 ns precision
• based on netFPGA
• GPS synchronized
• No special API needed
• Standard pcap library can be used
3.9. APE boxes
• Active Probing Equipment
• low-cost network measurement device
• ca. 300 Euro
• based on a Blackfin programmable board
• developed at Eötvös Loránd University
• 100 ns precision
• GPS synchronized
• uClinux - Linux operating system for embedded systems
• Low energy consumption
• web service interface for performing predefined measurements (ping, traceroute, packet train sender,
capturer, etc.)
3.10. Measurement boxes
3.11. Central Management System
• IBM Blade server
• Key tasks
• User management
• Node maintenance
• Experiment scheduling
• Storing experimental results (temporarily)
• Web GUI
3.12. Slices VS Unique timeslots
• In PlanetLab
• Virtualization
• Slices
• Sharing the resources
• Introducing too much unpredictability in timing measurements
• Low precision timestamping
• In etomic
• No virtualization
• No slices
• Unique timeslots
• You own all the resources you need during the experimentation
• High precision timestamping
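The difference between shared slices and unique timeslots can be made concrete with a small, hypothetical reservation book. This is not the ETOMIC CMS implementation. A node may belong to at most one experiment in any time interval, and a request either gets all of its nodes for the whole slot or is rejected. Node and experiment names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class ReservationBook:
    """Hypothetical sketch of exclusive timeslot booking (no slicing or sharing)."""
    # node -> list of (start, end, experiment) reservations, half-open intervals [start, end)
    slots: Dict[str, List[Tuple[int, int, str]]] = field(default_factory=dict)

    def reserve(self, experiment: str, nodes: Set[str], start: int, end: int) -> bool:
        """All-or-nothing reservation of every node for [start, end); False on any conflict."""
        if start >= end:
            raise ValueError("empty interval")
        for node in nodes:
            for s, e, _ in self.slots.get(node, []):
                if start < e and s < end:  # the two half-open intervals overlap
                    return False
        for node in nodes:
            self.slots.setdefault(node, []).append((start, end, experiment))
        return True

book = ReservationBook()
print(book.reserve("exp1", {"etom-a", "ape-b"}, 0, 60))  # True
print(book.reserve("exp2", {"ape-b"}, 30, 90))           # False: ape-b is busy until 60
print(book.reserve("exp2", {"ape-b"}, 60, 90))           # True: back-to-back slots do not overlap
```

Because no other experiment can run on a reserved node at the same time, timing measurements are not disturbed by co-located workloads. This is the property that high-precision timestamping relies on.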
3.13. The ETOMIC system
• ETOMIC (www.etomic.org)
• ANME
• OneLab (www.onelab.eu)
• OneLab-2: 26 partners in 13 countries
• ETOMIC: ca. 40 ETOMs and 20 APE boxes on more than 20 different sites
3.14. One day on the Internet
3.15. Experimental use cases in ETOMIC
• one-way delay (60 ns resolution)
• tracking topology changes
• available bandwidth meter
• transport protocol testing
• queuing delay tomography
• geolocation experiments
• …
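The first use case, one-way delay, is simple arithmetic once both ends have synchronized clocks. Each probe's delay is its receive timestamp minus its send timestamp. A common heuristic then treats the minimum observed delay as the empty-path (propagation plus transmission) delay and reads the excess over it as queuing delay. The timestamps below are hypothetical.

```python
from typing import List

def one_way_delays(tx_ns: List[int], rx_ns: List[int]) -> List[int]:
    """One-way delay per probe = receive timestamp - send timestamp.
    Only meaningful if sender and receiver clocks are synchronized (e.g. via GPS)."""
    return [rx - tx for tx, rx in zip(tx_ns, rx_ns)]

def queuing_delays(owd_ns: List[int]) -> List[int]:
    """Heuristic queuing delay: each OWD minus the minimum observed OWD."""
    base = min(owd_ns)
    return [d - base for d in owd_ns]

# Hypothetical nanosecond timestamps of four probes.
tx = [0, 1_000_000, 2_000_000, 3_000_000]
rx = [5_200_000, 6_150_000, 7_900_000, 8_150_000]
owd = one_way_delays(tx, rx)
print(owd)                  # [5200000, 5150000, 5900000, 5150000]
print(queuing_delays(owd))  # [50000, 0, 750000, 0]
```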
3.16. HOW TO USE ETOMIC?
3.17. Performing an experiment from the system’s perspective
• Setting up an experiment
• Uploading scripts, data files
• Selecting measurement agents
• Reserving one or more timeslots
• Initializing phase
• Reserving the selected measurement agents
• Uploading measurement scripts and other files needed for the experiment
• Execution phase
• Running the uploaded scripts with the preconfigured settings on the etomic nodes
• Data collection phase
• Downloading and storing the resulting data files in the CMS database
3.18. Measurement types
• User specific measurements
• Customized experiments defined by the end-users
• Almost full control over the measurement agents
• User specific periodic measurements
• Customized periodic experiments defined by the users
• Repeating an existing experiment several times
• inter-experiment times
• repetition count
• Kernel level periodic measurements
• Carrying out experiments by the CMS itself
• Low priority task
• executed only if the nodes are idle
• canceled if a user-level experiment arrives
• Invisible for end-users
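The priority rule for kernel-level periodic measurements can be sketched as a tiny per-node scheduler. The class and method names are hypothetical and do not come from ETOMIC. A kernel-level job starts only on an idle node, and a user-level experiment preempts (cancels) it.

```python
from typing import Dict, Optional, Tuple

class NodeScheduler:
    """Hypothetical sketch: kernel-level periodic jobs run only on idle nodes and are
    cancelled when a user-level experiment arrives on the same node."""

    def __init__(self) -> None:
        # node -> (level, job) currently running; missing or None means idle
        self.running: Dict[str, Optional[Tuple[str, str]]] = {}

    def submit_kernel(self, node: str, job: str) -> bool:
        """Start a kernel-level job if the node is idle; otherwise skip this round."""
        if self.running.get(node) is None:
            self.running[node] = ("kernel", job)
            return True
        return False

    def submit_user(self, node: str, job: str) -> Optional[str]:
        """Start a user experiment; return the name of the cancelled kernel job, if any."""
        current = self.running.get(node)
        if current is not None and current[0] == "user":
            raise RuntimeError(f"{node} is already used by user experiment {current[1]}")
        self.running[node] = ("user", job)
        return current[1] if current is not None else None

    def finish(self, node: str) -> None:
        self.running[node] = None

s = NodeScheduler()
print(s.submit_kernel("ape-1", "periodic-ping"))  # True: the node is idle
print(s.submit_user("ape-1", "exp-42"))           # periodic-ping (cancelled)
print(s.submit_kernel("ape-1", "periodic-ping"))  # False: busy with a user experiment
```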
3.19. Necessary steps for submitting an experiment
• Create a bundle
• Choose the agents
• Configure them to run your experiment
• Create a new experiment
• Choose a bundle to be executed
• Schedule your experiment by selecting a start instant for it
• Running the experiment
• The current state of your experiment can be seen
• Downloading and analyzing the results
• When the status of your experiment is "finished", the resulting files can be downloaded
3.20. Creating a bundle
• Online demo at www.etomic.org
3.21. Creating an experiment and querying its status
• Online demo at www.etomic.org
3.22. Downloading the results
• When the status of the experiment is finished, click on the results icon to go to the Result files tab in the
Edit/View files section where you can download all the files.
3.23. Programming DAG cards
• The libeverdag C library is provided in the Evergrow project to ease the use of the DAG 3.6GE cards.
• The unique features provided by these cards:
• Synchronized send with high precision timestamps inserted in the payload.
• High performance capture with precision timestamps of receive time
• More details in the API documentation
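To illustrate what "timestamps inserted in the payload" means, the sketch below packs a sequence number and a nanosecond send timestamp into a probe payload and unpacks them at the receiver. The receiver can then subtract the embedded send time from its own capture timestamp. The byte layout is a hypothetical example; it is not the libeverdag format, whose API is described in its own documentation.

```python
import struct
from typing import Tuple

# Hypothetical payload layout (network byte order): 32-bit sequence number,
# 64-bit seconds and 32-bit nanoseconds of the send timestamp. NOT the libeverdag format.
PROBE_FMT = "!IQI"
PROBE_LEN = struct.calcsize(PROBE_FMT)  # 16 bytes

def pack_probe(seq: int, ts_ns: int, pad_to: int = 64) -> bytes:
    """Build a probe payload carrying seq and the send time, zero-padded to pad_to bytes."""
    sec, nsec = divmod(ts_ns, 1_000_000_000)
    header = struct.pack(PROBE_FMT, seq, sec, nsec)
    return header + b"\x00" * max(0, pad_to - PROBE_LEN)

def unpack_probe(payload: bytes) -> Tuple[int, int]:
    """Return (seq, send timestamp in ns) from a received payload."""
    seq, sec, nsec = struct.unpack_from(PROBE_FMT, payload)
    return seq, sec * 1_000_000_000 + nsec

payload = pack_probe(7, 1_300_000_000_123_456_789)
print(len(payload), unpack_probe(payload))  # 64 (7, 1300000000123456789)
```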
3.24. PUBLISHING DATA
3.25. Experimental facilities
• most network measurement projects:
• use a single dedicated infrastructure
• scan only narrow subsegments
• analyze a limited set of network characteristics
• centralized and separated from each other
• key idea: try to interconnect separate measurement data!
• large-scale behaviour
• long-term evolution
3.26. Traditional approach
• Traditionally, measurements are designed to collect only specific data that is important from the point of view of the researcher's agenda.
3.27. Sharing science
• Genome databases
• Astronomy
3.28. Related work: CAIDA/DatCat
3.29. Related work: MoMe database
3.30. Related work: MAWI repository
3.31. Data publication efforts
• DatCat (USA):
• searchable catalog of metadata about measurements
• passive traffic traces, traceroutes, BGP tables, virus propagation studies
• MOME (EU):
• database for meta-measurement data
• packet and flow traces, routing data, HTTP traces
• standardization efforts
• sharing of analysis tools is possible (e.g. jitter calculation)
• MAWI (Japan):
• repository of passive traces from the WIDE backbone (collected since 1999)
• raw data is not stored
• raw data from a single infrastructure
3.32. Key ideas in data handling
• store and share raw data
• joint analysis of different types of measurement data
• reanalysis (with new evaluation methods)
• reference data (historical comparison)
• share analysis tools
• server side processing simplifies client applications
• no need to transfer bulk data packages: online processing
• standardization, network XML
• Network Measurement Virtual Observatory (nmVO)
3.33. VO approach
• The modern approach is to collect and store all measurable data and make it available for "virtual observation". Virtual measurements can have a set of goals different from those of the original measurements.
3.34. Unified interface
• VO can be realized by collecting measurement data from different infrastructures. Data structures should be standardized (netXML).
3.35. CasJobs user interface for accessing data
• The Network Measurement Virtual Observatory is available at
• http://nm.vo.elte.hu
3.36. SONOMA
3.37. SONoMA v1.0
• Service Oriented NetwOrk Measurement Architecture
• It was originally developed to instrument APE measurement boxes
• Provide a web service interface for the users to perform predefined network measurements
• It can easily be extended with new measurement agents
• Standardized communication via web services
• Visit: sonoma.etomic.org
3.38. Why do we need another network measurement platform?
• The state of the art measurement systems:
• ETOMIC: requires know-how to program the measurement cards
• PlanetLab: scheduling, data collection and storage are up to the user
• DIMES: few tools; Penny; cannot predict startup
• Scriptroute: lacks cooperation and synchronization between nodes; no repository
• perfSONAR: focuses on monitoring rather than complex measurements
• Looking-glass-like services: few metrics, no common interface
3.39. SONoMA
• "Its key objective is to integrate the complexity of large testbeds and the popularity of the lightweight services."
• Federates heterogeneous measurement agents
• (19 APE, 230 PlanetLab); dynamic accounting
• Distributes tasks in a robust and efficient way
• Provides an easy-to-extend framework
• A collection of off the shelf measurement and evaluation methods
• Decreases client side development effort (SOA)
• A tool for novel Internet applications
• that require real-time large-scale measurement data
• Archives data in an automated fashion
• In a transparent way
• Supporting both short and long term experiments
3.40. System components
• Actors
• A user or a user application
• Management Layer
• Account users (allow PLE) and Measurement Agents
• Authenticate users, authorize calls
• Handle sessions, decompose and schedule processes
• Hybrid resource management:
• time sharing / reserving
• Measurement Agents
• currently two kinds of agents are deployed:
• APE boxes, PlanetLab nodes
3.41. Management Layer
3.42. Measurement methods
MA methods defined:
• ping, traceroute, chirp, train, capture, dnslookup
• Synchronous / Asynchronous
• Cannot be accessed by the actors directly
ML method composition:
• Synchronous/asynchronous measurements = short/long ones
• Atomic measurements:
short*, long*, parallel*, ensemble*, ensembleNSlookup
* = Ping or Traceroute
• Complex measurements (require synchronization):
short*, long*
* = Chirp, Train
• Statistical evaluations (measurement(s) + model): getAvailableBandwidth, topology, tomography
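Packet-train measurements are typically evaluated through the dispersion of the train at the receiver. The textbook calculation divides the (N - 1) packets' worth of bits by the time between the first and last arrival. SONoMA's actual evaluation methods (e.g. getAvailableBandwidth) are not described here; this sketch shows only the basic dispersion arithmetic. The resulting rate is generally bounded by the bottleneck capacity and is not by itself the available bandwidth.

```python
from typing import List

def dispersion_rate_bps(rx_times_s: List[float], packet_size_bytes: int) -> float:
    """Rate implied by the dispersion of a packet train at the receiver:
    bits of (N - 1) packets divided by the time between first and last arrival."""
    if len(rx_times_s) < 2:
        raise ValueError("need at least two packets")
    spread = rx_times_s[-1] - rx_times_s[0]
    if spread <= 0:
        raise ValueError("non-increasing timestamps")
    return (len(rx_times_s) - 1) * packet_size_bytes * 8 / spread

# Hypothetical train: 11 packets of 1500 bytes arriving 120 microseconds apart.
rx = [i * 120e-6 for i in range(11)]
rate = dispersion_rate_bps(rx, 1500)
print(round(rate / 1e6, 3))  # 100.0 (Mbit/s)
```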
3.43. Web client
• http://sonoma.etomic.org
3.44. Case study: A full mesh topology measurement
A Python example for obtaining the full mesh topology spanned by the MAs (ServiceServerSOAP, the SOAP client of the SONoMA web service, is assumed to be available and is not defined here):
import time

# Service object:
ws = ServiceServerSOAP( url="https://sonoma.etomic.org:8888" )
# Open a session and authenticate
sessionID = ws.requestSession( user="User", zip=True, format="CSV" )
# Submit a topology measurement with the list of MAs to be involved in the experiment
# The returned procID is a unique id which refers to the given experiment
procID = ws.topology( sessionID, nodeList = [ "157.181.175.247", "132.65.240.38", ... ] )
# Wait until the measurement ends
ready = False
while not ready:
    # Receive the current status of the experiment
    (duration, working, ready, dead) = ws.getProcessInfo( sessionID, procID )
    if not ready:
        time.sleep( 10 )  # simple fixed-interval polling; the interval is arbitrary
# When the measurement is over, the resulting data can be downloaded
result = ws.getData( sessionID, procID )
# Finally, the session has to be closed
ws.closeSession( sessionID )
3.45. Case study: A full mesh topology measurement
3.46. What happens in the background? A full mesh topology measurement
3.47. Another use case: Spotter
• IP geolocation service (http://spotter.etomic.org)
• based on active delay measurements
• uses SONoMA as its measurement platform to perform real-time network measurements
3.48. SONoMA 2.0
• Being developed within the projects
• EU FP7 NOVI and OpenLab
• EIT ICTLabs FITTING
• Supporting FITTING testbeds and FITTING EduLab
• To provide a measurement/monitoring infrastructure for federated testbeds
• Ontology driven measurement description
• New tools/experiment types can be added on the fly at run time, simply by extending the information model
3.49. Literature
• I. Csabai et al., ETOMIC Advanced Network Monitoring System for Future Internet Experimentation, In
Proceedings of TridentCom 2010 Conference, May 18-20, 2010, Berlin, Germany (2010)
• B. Hullár et al., SONoMA: A Service Oriented Network Measurement Architecture In Proceedings of
TridentCom 2011, April 17-19, 2011, Shanghai, China (2011)
• P. Mátray et al., Building a Prototype for Network Measurement Virtual Observatory
4. Network measurement infrastructures PlanetLab
4.1. PlanetLab
• Planetary scale distributed testbed
• More than 1000 users from 40+ countries
• Today: 1157 nodes at 547 sites around the world
4.2. The main goal
• "...to support seamless migration of an application from an early prototype,
• through multiple design iterations,
• to a popular service that continues to evolve."
4.3. What is PlanetLab?
• PlanetLab is an overlay testbed
• designed to allow researchers to experiment with network applications and services
• Benefits
• Distribution across a wide geographic area
• Network diversity, e.g. edge sites, co-location, etc.
• Flexibility: maximal control over PlanetLab nodes
• PlanetLab Consortium
• is a collection of academic, industrial, and government institutions cooperating to support and enhance the
PlanetLab overlay network
• Visit http://www.planet-lab.eu
4.4. PlanetLab architecture
• PlanetLab nodes
• Several virtual machines (VM) on each node
• Resources are distributed fairly (disk, memory, cpu)
• Services running in different VMs are isolated from each other
• Elements for managing the overlay network
• Node Managers, Brokers, agents and service managers
4.5. Slices
4.6. Slices
4.7. Slices
4.8. User Opt-in
• Server
• NAT
• Client
4.9. Services running in your slice
• PlanetLab nodes
• (physical devices)
• planet1.elte.hu
• planetlab2.cse.msu.edu
• planetlab5.cs.duke.edu
4.10. Services running in your slice
• PlanetLab nodes
• (physical devices)
• Slices
• (virtual nodes)
• elteple twitter
• inria priv
4.11. Services running in your slice
• PlanetLab nodes
• (physical devices)
• Slices
• (virtual nodes)
• Applications or services running in the slices
4.12. Services running in your slice
• PlanetLab nodes
• (physical devices)
• Slices
• (virtual nodes)
• Applications or services running in the slices
4.13. Services running in your slice
• PlanetLab nodes
• (physical devices)
• Slices
• (virtual nodes)
• Applications or services running in the slices
4.14. Virtualization solutions
• Software runtime (e.g. Java VM)
• High level API
• Depend on OS to provide protection and resource allocation
• Not flexible
• Complete Virtual Machine (e.g. VmWare)
• Low level API (hardware)
• Maximum flexibility and excellent protection
• But high CPU/memory overhead; cannot share common resources among virtual machines
• PlanetLab’s Vserver
• Kernel patch to mainstream OS (Linux)
• Gives appearance of separate kernel for each virtual machine
• Root privileges restricted to activities that do not affect other vservers
• Resource control (e.g., File handles, port numbers) and protection facilities added
4.15. VServers in a PlanetLab node
(Figure: a PlanetLab node. The hardware runs Linux (Fedora Core 2) with one Vserver per slice (e.g. Slice 3, Slice 4) on top; a combined isolation and application interface provides resource isolation, safe raw sockets, instrumentation, ...)
4.16. VServers in a PlanetLab node
• Extend the idea of chroot(2)
• New vserver created by system call
• Descendant processes inherit the vserver
• Efficient solution
• Unique filesystem, SYSV IPC, UID/GID space
• Limited root privilege
• Can’t control host node
4.17. Low-level network access
• Some applications may need low-level network access
• Ensure that they cannot access traffic generated by other services
• PlanetLab provides "safe raw sockets"
• TCP/UDP bound to local port
• For each IP, ports are either free or owned by a slice
• Incoming packets delivered only to service with corresponding port registered
• Outgoing packets scanned to prevent spoofing
• ICMP is also supported
• 16-bit identifier placed in ICMP header
• Like any other resource, ports and ICMP IDs can be reserved by the slice itself
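The ownership rules of safe raw sockets can be modelled with a small table. This is a hypothetical sketch of the rules listed above, not PlanetLab's kernel implementation. Each (IP, protocol, port or ICMP identifier) key is either free or owned by exactly one slice. Incoming packets go only to the owner, and outgoing packets are allowed only from keys the sending slice owns.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (local IP, protocol, port or ICMP identifier)

class SafeRawSocketTable:
    """Hypothetical model of safe raw socket ownership: one owning slice per key."""

    def __init__(self) -> None:
        self.owner: Dict[Key, str] = {}

    def bind(self, slice_name: str, ip: str, proto: str, port: int) -> bool:
        """Reserve a port/ICMP id for a slice; fails if another slice already owns it."""
        key = (ip, proto, port)
        if self.owner.get(key, slice_name) != slice_name:
            return False
        self.owner[key] = slice_name
        return True

    def deliver(self, ip: str, proto: str, port: int) -> Optional[str]:
        """Incoming packet: the only slice allowed to see it, or None (not delivered)."""
        return self.owner.get((ip, proto, port))

    def may_send(self, slice_name: str, src_ip: str, proto: str, src_port: int) -> bool:
        """Outgoing packet: allowed only from a source port/id the slice owns (anti-spoofing)."""
        return self.owner.get((src_ip, proto, src_port)) == slice_name

t = SafeRawSocketTable()
print(t.bind("elteple_tutor", "157.181.175.247", "udp", 5000))    # True
print(t.bind("other_slice", "157.181.175.247", "udp", 5000))      # False
print(t.deliver("157.181.175.247", "udp", 5000))                  # elteple_tutor
print(t.may_send("other_slice", "157.181.175.247", "udp", 5000))  # False
```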
4.18. Getting started
• Register for a PlanetLab Europe account at the following site:
• http://planet-lab.eu
• Choose Eotvos Lorand University as your site
• Contact your PlanetLab Principal Investigator to enable your account and add you to a slice
• Sandor Laki: [email protected]
4.19. Create your SSH Key
• You need an SSH Key to access PlanetLab nodes remotely
• SSH login using RSA authentication
• To generate an SSH key pair, use the ssh-keygen program on a secure UNIX system:
ssh-keygen -t rsa -f ~/.ssh/pl_rsa
• ssh-keygen asks for a passphrase. For practical reasons, you can leave it empty.
• Upload the public key file to the PlanetLab Europe website
• Use the Manage My Keys tab
• Public key in the above case:
~/.ssh/pl_rsa.pub
4.20. Create your slice
• You should ask your PI to create a slice for you, or to associate your account with an existing slice
• After you have been associated with the slice, you may assign nodes to the slice
• e.g. by using the Manage Nodes tab on the website.
• Slices expire after two months
• When a slice expires, it is destroyed and all files in the slice are removed on all nodes assigned to that slice
• The lifetime can be extended by using the Renew Slice tab
4.21. Login to your slice
• A slice is initially populated with a minimal Fedora Core 2 Linux installation
• To access your node through SSH, use the slice name as the login name (e.g. elteple_tutor), your private key and the PlanetLab node to be accessed:
ssh -l elteple_tutor -i ~/.ssh/pl_rsa planet1.elte.hu
• Once logged in, su/sudo can be used to control services, install new packages, mount certain resources, etc.
• Note that this root user has limited privileges…
4.22. Install additional packages
• Additional standard packages can be installed in the slice using yum.
• For example, to install vim:
yum install vim
• To bring your slice up to date with the latest versions of all packages:
yum update
4.23. Deploying your app
• The most straightforward way of deploying your app to a single node is with scp:
scp -i ~/.ssh/pl_rsa myapp elteple_tutor@planet1.elte.hu:
• Or with rsync, which copies only those files that do not exist or have changed on the remote slice:
rsync -a -e "ssh -l elteple_tutor -i ~/.ssh/pl_rsa" myapp planet1.elte.hu:
• Or with codeploy, which enables you to deploy your app to multiple nodes:
http://codeen.cs.princeton.edu/codeploy/
4.24. Configuring a server for automatic startup
• Nodes reboot periodically
• If your app is a long-running service, you may want it to restart automatically after a reboot
• Set up a System V init script for it in /etc/rc.d/init.d/
• Modify the line beginning with chkconfig:
• 1st number is the list of runlevels the script should be run in;
• 2nd number is the relative order in which it should start;
• 3rd number is the relative order in which it should stop.
• Enable it with chkconfig:
chkconfig --add myapp
chkconfig myapp on
4.25. Other useful tools
• Parallel SSH
• PlanetLab Slice Deploy Toolkit
• vxargs
• Nixes Tool Set
• A complete list can be found here:
• http://planet-lab.org/tools
4.26. PSSH
• Developed at Intel Research, Berkeley
• This package provides parallel versions of the openssh tools. Included in the distribution:
• Parallel ssh (pssh)
• Parallel scp (pscp)
• Parallel rsync (prsync)
• Parallel nuke (pnuke)
• Parallel slurp (pslurp)
4.27. PSSH Demo
pssh -h nodes.txt -l elteple_tutor -o /tmp/foo hostname
pscp -h hosts.txt -l ufl_ipop foo.txt /home/ufl_ipop/foo.txt
pnuke -h ips.txt -l ufl_ipop java
4.28. PlanetLab Slice Deploy Toolkit
• The PlanetLab Slice Deploy Toolkit consists of three scripts used to manage slices:
• plslice: create and manage a slice
• pldeploy: manage a collection of cogs deployed in a slice
• pladdnodes: example of a script to push a cog to all nodes
• More details:
• http://psepr.org/tools/
4.29. vxargs
• It provides parallel versions of any arbitrary command, including ssh, rsync, scp, wget, curl, and any other command
• The main features are:
• parallelism: run many jobs at the same time
• flexibility: arbitrary command with arbitrary options
• visualization: monitor the total/per-job progress in a curses-based UI
• redirection: stdout and stderr of each individual job are redirected to separate files for further analysis.
• More details:
• http://vxargs.sourceforge.net/
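The core of the vxargs idea (run one command per node in parallel, with each job's stdout and stderr redirected to its own file) can be sketched with the Python standard library. This is not vxargs itself, and the function name and output layout are hypothetical. `{}` in the command template is replaced by the node name. `ECHO_DEMO` is a local command for dry runs without ssh.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List

# Local test command that just prints its argument (useful for a dry run without ssh).
ECHO_DEMO = [sys.executable, "-c", "import sys; print(sys.argv[1])", "{}"]

def run_parallel(template: List[str], items: List[str], outdir: str, max_jobs: int = 8) -> Dict[str, int]:
    """Run `template` once per item ('{}' replaced by the item), at most max_jobs at a time.
    stdout/stderr of each job go to <outdir>/<item>.out and <item>.err; returns exit codes."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)

    def job(item: str) -> int:
        cmd = [arg.replace("{}", item) for arg in template]
        with open(out / f"{item}.out", "w") as fo, open(out / f"{item}.err", "w") as fe:
            return subprocess.run(cmd, stdout=fo, stderr=fe).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return dict(zip(items, pool.map(job, items)))

# Example (not executed here): run `hostname` on two slice nodes over ssh.
# key = os.path.expanduser("~/.ssh/pl_rsa")
# run_parallel(["ssh", "-l", "elteple_tutor", "-i", key, "{}", "hostname"],
#              ["planet1.elte.hu", "planetlab2.cse.msu.edu"], "/tmp/vx-out")
```

Threads are sufficient here because each worker just waits on an external process. Non-zero exit codes in the returned dictionary point to the nodes whose `.err` files need a look.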
4.30. Nixes Tool Set
• More details:
• http://www.aqualab.cs.northwestern.edu/projects/149-nixes-tool-set
• plsetup node-list: bootstraps the slice with yum and gzip
• plinstallrpm “rpms" node-list: installs all the rpms on all the nodes
• pldeploy node-list: deploys any file structure to the nodes
• plcmd command node-list: executes any set of commands on all the nodes.
4.31. Long-Running Services In PlanetLab
• Content Distribution
• CoDeeN: Princeton
• Coral: NYU, Stanford
• Cobweb: Cornell
• Storage and Large File Transfer
• LOCI: Tennessee
• CoBlitz: Princeton
• Information Plane
• PIER: Berkeley, Intel
• PlanetSeer: Princeton
• iPlane: Washington
• DHT
• Bamboo (OpenDHT): Berkeley, Intel
• Chord (DHash): MIT
4.32. Services (cont)
• Routing / Mobile Access
• i3: Berkeley
• DHARMA: UIUC
• VINI: Princeton
• DNS
• CoDNS: Princeton
• CoDoNs: Cornell
• Multicast
• End System Multicast: CMU
• Tmesh: Michigan
• Anycast / Location Service
• Meridian: Cornell
• Oasis: NYU
4.33. Services (cont)
• Internet Measurement
• ScriptRoute: Washington, Maryland
• Pub-Sub
• Corona: Cornell
• Email
• ePost: Rice
• Management Services
• Stork (environment service): Arizona
• Emulab (provisioning service): Utah
• Sirius (brokerage service): Georgia
• CoMon (monitoring service): Princeton
• PlanetFlow (auditing service): Princeton
• SWORD (discovery service): Berkeley, UCSD
4.34. Further available testbeds with PlanetLab Europe account
4.35. NITOS Wireless Testbed
• Features:
• 40 nodes equipped with Atheros Wi-Fi cards, running Linux and open-source wireless drivers
• Extra available features: GNU-radios, cameras, temperature/humidity sensors
• Mobility: two programmable robots carrying Wi-Fi nodes
• LTE and 3G femtocell testbeds to be deployed
• Remotely controlled, OMF-driven, provides spectrum slicing
• Testbeds interconnected with PlanetLab Europe
• Suitable for:
• Experimentation with Wi-Fi in a real world environment
• PHY-layer experimentation through the 6 available GNU-radio-equipped nodes
• Video over wireless experiments
• Sensor experiments
• Slow mobility experiments through the two programmable robots
• http://nitlab.inf.uth.gr/NITlab/index.php/testbed
4.36. w-iLab.t
• A heterogeneous, generic, wireless testbed deployed at two locations:
• an office (200 fixed nodes) and a pseudo-shielded environment (60 fixed, 20 mobile nodes).
• Features:
• A node = wireless sensor node + embedded PC with wireless interfaces
• Wireless, wired and sensor technologies available
• Testbed functionality fully configurable by user
• Rich set of tools
• Run wireless protocol design experiments using:
• reproducible wireless environments - benchmarking strategies
• reproducible real-life mobility, or fully emulated mobility
• simultaneous use of different technologies and node types
• http://www.ibbt.be/en/develop-test/ilab-t/wireless-lab
5. Network measurement infrastructures FEDERICA, SFA, OpenFlow
5.1. Federica
• e-infrastructure based on virtualization
• Complete control over the protocol stack
• Enabling experiments at all the communication layers
• Including the characteristics of the underlying physical layer (e.g. bandwidth, delay, loss)
• Reproducibility of experiments
• In contrast to PlanetLab or ETOMIC, where the conditions vary from one experiment to another
• Built on existing infrastructures
• Géant
5.2. Federica
5.3. The physical topology
• 4 core PoPs
• DFN, DE
• PSNC, PL
• GARR, IT
• CESNET, CZ
• Each PoP
• Juniper router
• 2 or more V-Nodes
• High speed connection
• 1 Gbps
5.4. Core elements
• Switch:
• Juniper MX480, Dual CPU, 1 line card with 32 ports at 1Gb Ethernet.
• Virtual and logical routing, MPLS, VLANs, IPv4, IPv6, 2 of the 4 line cards have hardware QoS
capabilities
• V-Node:
• Quad core AMD @ 2GHz, 32GB RAM, 8 network interfaces, 2x500GB disks, Virtualization SW
• 2 of them are deployed near each switch
5.5. SFA – SLICE-BASED FACILITY ARCHITECTURE
5.6. Slice-based Facility Architecture SFA
• A great many testbeds have emerged in the past decade
• EU, US, and national efforts
• What if we want to use more than one?
• We need an account for each of them
• Each testbed provides its own tools to create slices and to reserve and access resources, so we have to learn each of these tool sets
• Each has its own way to deploy and perform experiments
• Combining resources located in different testbeds is impossible or requires manual configuration
• This makes it difficult to try out new ideas.
5.7. Slice-based Facility Architecture SFA
• Its objective
• SFA aims at providing a secure common API with the minimum possible functionality to enable global
federation
• SFA is a specification
• Many implementations exist
• PlanetLab (python), ProtoGENI(Perl), Federica (Java)
• It was originally conceived by Larry Peterson (Princeton)
• Developed by Princeton and INRIA
5.8. Experiment lifetime in general
• Authentication
• Resource discovery
• Slice creation
• Resource reservation
• Resource reconfiguration
• Execution of the experiment
• Result collection
• Resource release
• What can SFA help with?
5.9. What can SFA help with?
• Authentication
• Resource discovery
• Slice creation
• Resource reservation
• Resource reconfiguration
• Execution of the experiment
• Result collection
• Resource release
• Delete slice
• By the local authority
• Providing a general description of the available resources, even in a federated environment
• Creating federated slices
• Reserving different resources located in various testbeds and selected by the user
• Releasing reserved resources
• Slice deletion
5.10. SFA for federated testbeds
[Figure: federated testbeds, each with its own authority, users and resources; the authorities are linked by agreements. Highlighted steps: authentication, resource discovery]
5.11. SFA for federated testbeds
[Figure: federated testbeds (authorities, users, resources). Highlighted steps: authentication, resource reservation]
5.12. SFA – Available resources
• Flack, a client-side SFA tool
5.13. SFA functionalities
• Naming (slices, users, resources, authorities)
• hierarchical namespace
• Authentication and authorization
• X.509 certificates and signed XML credentials
• federation links = exchange of certificates
• A standard XMLRPC API
• Object management: users, resources, slices, authorities
• Resource management: browse, acquire, manifest
• Slice management: create, delete, start, stop
• Resource description (RSpecs)
• only the language (XML), not the semantics
5.14. Hierarchical naming
• Object types:
• Resources, authorities, users, slices
• Unique names
• HRN (Human Readable Name)
• ple.elte.userA
[Figure: the naming tree. Top-level authorities (e.g. ple, plc) contain sub-authorities (e.g. elte, upmc, princeton, mit), which in turn contain users (userA, userB, userC, ...), nodes (nodeA, nodeB, nodeC, ...) and slices (sliceA, sliceB, sliceC, ...)]
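The hierarchical namespace can be navigated with plain string operations. The sketch below assumes that an HRN is a dot-separated path such as ple.elte.userA in which every proper prefix names an authority; the helper names are hypothetical and not taken from any SFA implementation.

```python
from typing import List, Optional

# Hypothetical helpers; assumption: an HRN is a dot-separated path in which
# every proper prefix (e.g. "ple", "ple.elte") names an authority.


def parent_authority(hrn: str) -> Optional[str]:
    """Return the HRN one level up, or None for a top-level authority."""
    head, sep, _ = hrn.rpartition(".")
    return head if sep else None


def authority_chain(hrn: str) -> List[str]:
    """List the authorities from the root down to the object's parent."""
    parts = hrn.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_under(hrn: str, authority: str) -> bool:
    """True if hrn lies in the subtree managed by authority."""
    return hrn.startswith(authority + ".")


print(parent_authority("ple.elte.userA"))
print(authority_chain("ple.elte.userA"))
print(is_under("ple.elte.sliceA", "ple"))
```

Treating names as paths is what lets an authority be responsible for everything below it, which the authentication scheme on the next slide relies on.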
5.15. Authentication
• X.509 certificates
• signed by an SFA authority
• HRN + public key (self-signed in the beginning)
• GID = signed chain of certificates
• Example: structure of ple.elte.userA.gid
subject=/CN=ple.elte.userA   issuer=/CN=ple.elte
  subject=/CN=ple.elte       issuer=/CN=ple
    subject=/CN=ple          issuer=/CN=ple
[Figure: the naming tree, highlighting the authorities ple and ple.elte that sign the certificate chain of ple.elte.userA]
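The naming structure of a GID can be checked independently of the cryptography. This is a simplified sketch: it models a GID as hypothetical (subject, issuer) pairs, leaf first, and deliberately leaves out the X.509 signature verification that a real implementation must perform.

```python
from typing import List, Tuple

# Simplified, hypothetical model of a GID: (subject HRN, issuer HRN) pairs,
# leaf certificate first. A real GID is an X.509 chain; cryptographic
# signature verification is deliberately left out of this sketch.
Cert = Tuple[str, str]


def check_gid_chain(hrn: str, chain: List[Cert]) -> bool:
    """Check the naming structure of a GID chain for the object `hrn`."""
    if not chain or chain[0][0] != hrn:
        return False  # the leaf certificate must name the object itself
    for (subject, issuer), (next_subject, _) in zip(chain, chain[1:]):
        if issuer != next_subject:
            return False  # each certificate is issued by the next one
        if subject.rpartition(".")[0] != issuer:
            return False  # ...which must be the parent authority in the tree
    root_subject, root_issuer = chain[-1]
    return root_subject == root_issuer  # the top-level authority is self-signed


gid = [("ple.elte.userA", "ple.elte"), ("ple.elte", "ple"), ("ple", "ple")]
print(check_gid_chain("ple.elte.userA", gid))
```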
5.16. SFA API
• Object management
Register/Remove/Update(Record, Credential)
Record = Resolve(xrn, Credential)
Record[] = List(xrn, Credential, options)
• Resource management
Rspec = ListResources(Credential, [hrn])
CreateSliver(xrn, Credential, RSpec, users, options)
DeleteSliver(xrn, Credential, options)
• Slice management
ListSlices(Credential, options)
Start/Stop/Shutdown(xrn, Credential)
SliverStatus(xrn, Credential, options)
5.17. SFA Components
• Aggregate Manager (AM)
• Resource management
• Registry (R)
• Object management
• Slice Manager (SM)
• Slice management
• Component Manager (CM)
[Figure: two federated testbeds, each with an Aggregate Manager (AM), a Slice Manager (SM) and a Registry (R) in front of its local resources; the individual resources run Component Managers (CM)]
5.18. Resource Specification (RSpec) Documents
• RSpec
• allow interoperability among different AMs
• a common language for describing
• resources, resource requests, and reservations.
• Standardized XML documents
• Aggregate or resource specific extensions
• Usage
• Advertisement RSpec
• The document returned by an AM that describes the resources the AM has (e.g. when calling ListResources)
• Request RSpec
• The document that a user sends to an AM to describe the resources they want to reserve (e.g. when calling CreateSliver)
• Manifest RSpec
• The document returned by an AM that describes the resources a user has reserved at that AM (e.g. when calling ListResources for a given sliver)
5.19. SFI and SFA client
• Slice Federation Interface (SFI)
• PlanetLab (PLC)
• PlanetLab Europe (PLE)
• PlanetLab Japan (PLJ)
• VINI
• protoGENI
• Registry Interface for managing records
• Add, Update, Remove, Show, List
• Slice Interface for managing slices
• Resources, Create, Delete, Start, Stop
5.20. Installation and configuration
• You need a Linux or Mac OS X platform
• Follow the instructions at
• http://svn.planet-lab.org/wiki/SFAInstallationGuide
# sfi.py -h
Usage: sfi [options] command [command_options] [command_args]
Commands:
  list,show,remove,add,update,nodes,slices,resources,create,delete,start,stop,reset
Options:
  -h, --help            show this help message and exit
  -r URL, --registry=URL
                        root registry
  -s URL, --slicemgr=URL
                        slice manager
  -d PATH, --dir=PATH   config and working directory - default is
                        /Users/soltesz/.sfi/
  -u HRN, --user=HRN    user name
  -a HRN, --auth=HRN    authority name
  -v, --verbose         verbose mode
  -p PROTOCOL, --protocol=PROTOCOL
                        RPC protocol (xmlrpc or soap)
5.21. List records from the registry
• To list all records in the registry for a given authority, e.g. "ple.elte":
# sfi.py list ple.elte
ple.elte.twitter (slice)
ple.elte.sonoma (slice)
ple.elte.tutor (slice)
...
ple.elte.laki (user)
ple.elte.ZZZ (user)
...
ple.elte.planet1 (node)
ple.elte.planet2 (node)
• Each line consists of an HRN and the record type in parentheses (slice, user, node or authority).
• To get authority records, query the HRN of a top-level authority such as plc or ple.
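When the listing is long, it can help to group it by record type. A minimal sketch, assuming that `sfi.py list` prints exactly one "<hrn> (<type>)" record per line as shown above; the function name is hypothetical, and the input would be the captured text output of the command.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumption: `sfi.py list` prints one record per line as "<hrn> (<type>)",
# exactly as in the listing above; other lines (e.g. "...") are ignored.
RECORD_LINE = re.compile(
    r"^\s*(?P<hrn>\S+)\s+\((?P<type>slice|user|node|authority)\)\s*$"
)


def group_records(output: str) -> Dict[str, List[str]]:
    """Group the HRNs printed by `sfi.py list` by record type."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        match = RECORD_LINE.match(line)
        if match:
            groups[match.group("type")].append(match.group("hrn"))
    return dict(groups)
```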
5.22. Detailed record information
• To see more information, you can show the record of any object you have permission to access.
• It is typically used for fetching slice, node and user records
# sfi.py show -t slice plc.princeton.slicename
gid:
hrn: plc.princeton.slicename
uuid: 229167493191239517049866786687974995079
last_updated: 2009-07-15 16:15:35.85365
hrn: plc.princeton.slicename
type: slice
date_created: 2009-07-15 16:15:35.85365
description: A test slice for personal tests.
researcher: ['plc.princeton.sliceuser']
expires: 1871836937
name: princeton_slicename
url: http://www.cs.princeton.edu/~sliceuser
• Here we request a slice record (-t slice); the type could also be user or node.
• The slice HRN is plc.princeton.slicename
• The detailed record lists information such as
• the users assigned to the slice, the expiration date (expires, a Unix timestamp),
• the login name of the slice, etc.
5.23. Get resources
• Slices are defined as the binding between a set of resources and users
• To find out what resources are currently bound to a slice the command resources can be used
• For a given slice HRN, this command returns an RSpec listing the nodes that belong to the slice
# sfi.py resources -o slice.rspec plc.princeton.slicename
<?xml version='1.0' encoding='ASCII'?>
<RSpec type="SFA">
<network name="plc">
<site id="s10242">
<name>HU Berlin - IWI</name>
<node id="n10855">
<hostname>planetlab1.wiwi.hu-berlin.de</hostname>
</node>
<node id="n10856">
<hostname>planetlab2.wiwi.hu-berlin.de</hostname>
</node>
</site>
</network>
</RSpec>
• The content of the output file slice.rspec
• The nodes for the slice plc.princeton.slicename
• To obtain all available resources (resource discovery) use
• # sfi.py resources -o all.rspec
• all.rspec will contain all the nodes with detailed information
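Extracting the hostnames from such an RSpec is a small XML exercise, similar in spirit to the sfiListNodes.py step in 5.26 (whose actual implementation may differ). The sketch assumes the un-namespaced SFA RSpec layout shown above (network/site/node/hostname); for a file written by `sfi.py resources -o slice.rspec`, pass the bytes read from that file.

```python
import xml.etree.ElementTree as ET
from typing import List


def list_hostnames(rspec: bytes) -> List[str]:
    """Return the hostnames of all <node> elements in an SFA RSpec document."""
    root = ET.fromstring(rspec)
    return [
        node.findtext("hostname", default="").strip()
        for node in root.iter("node")
        if node.findtext("hostname")
    ]


SAMPLE = b"""<?xml version='1.0' encoding='ASCII'?>
<RSpec type="SFA">
  <network name="plc">
    <site id="s10242">
      <name>HU Berlin - IWI</name>
      <node id="n10855"><hostname>planetlab1.wiwi.hu-berlin.de</hostname></node>
      <node id="n10856"><hostname>planetlab2.wiwi.hu-berlin.de</hostname></node>
    </site>
  </network>
</RSpec>"""

print("\n".join(list_hostnames(SAMPLE)))
```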
5.25. Allocate resources for a given slice
• You need to prepare an RSpec describing the resources you want to allocate
• You can do it manually by adding "sliver" tags
• or using command line tools
<?xml version='1.0' encoding='ASCII'?>
<RSpec type="SFA">
<network name="plc">
<site id="s15">
<name>CarnegieMellon</name>
<node id="n40">
<hostname>planetlab-1.cmcl.cs.cmu.edu</hostname>
<bw_limit units="kbps">5000</bw_limit>
<sliver/>
</node>
<node id="n41">
<hostname>planetlab-2.cmcl.cs.cmu.edu</hostname>
<bw_limit units="kbps">5000</bw_limit>
<sliver/>
</node>
</site>
</network>
</RSpec>
• You can modify the XML returned by the resources command by adding a <sliver/> tag to each node you want to have in your slice.
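Adding the <sliver/> tags can also be scripted. This is a minimal sketch of the idea behind the sfiAddSliver.py step in 5.26, not its actual code: it assumes the RSpec layout shown above and a set of wanted hostnames, and it skips nodes that already carry a sliver tag.

```python
import xml.etree.ElementTree as ET
from typing import Set


def add_slivers(rspec: bytes, wanted: Set[str]) -> bytes:
    """Return a request RSpec: the input with <sliver/> added to wanted nodes."""
    root = ET.fromstring(rspec)
    # Collect the nodes first so the tree is not modified while iterating it.
    for node in list(root.iter("node")):
        hostname = node.findtext("hostname", default="").strip()
        if hostname in wanted and node.find("sliver") is None:
            ET.SubElement(node, "sliver")  # mark this node for allocation
    return ET.tostring(root, encoding="us-ascii", xml_declaration=True)
```

The result can be written to a file such as my-nodes.rspec and passed to `sfi.py create`.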
5.26. Allocate resources for a given slice
• Instead of modifying the RSpec by hand, command line tools can be applied
• Get available resources
• # sfi.py resources -o nodes.rspec
• Create a text file of hostnames
• # sfiListNodes.py -i nodes.rspec -o nodes.txt
• Remove/add hostnames from/to nodes.txt
• Create an RSpec with requested resources
• # sfiAddSliver.py -i nodes.rspec -n nodes.txt -o my-nodes.rspec
• And then create the slice with the specified resources:
• # sfi.py create plc.princeton.slicename my-nodes.rspec
5.27. Deallocate resources
• To release or deallocate resources you can use
• # sfi.py delete plc.princeton.slicename new-slice.rspec
• where new-slice.rspec lists the nodes to be released.
• To deallocate all the resources of a slice just use the above command without the rspec:
• # sfi.py delete plc.princeton.slicename
5.28. OPENFLOW CAPABILITIES IN PLANETLAB EUROPE
5.29. What is the problem with existing networks?
• Commodity hardware
• Difficult to program
• Always behind the industry
• Too expensive
• Routers are not only forwarding elements
• They implement the routing algorithms too
• Control and forwarding, i.e. the control and data planes, are not separated
• Most of the routes are static
• Traffic differentiation is expensive and not always possible
• No way to test new ideas on a large scale
• Hindering innovation and the introduction of new technologies
• Reconfiguration of routers is expensive and always a tiresome business
• There is no opportunity for radical changes
• Many good ideas from research, but almost no technology transfer
• This is where Software Defined Networking can help!
5.31. Software Defined Networking
• Separating control and data planes
• The routers are simple forwarding elements
• The decision is made by a programmable logically centralized element called the controller
[Figure: applications (App) on top of a Network OS (control) that drives multiple packet forwarding elements]
• Open interface between the control and data planes
5.32. OpenFlow
• What is OpenFlow?
• a protocol for controlling the forwarding behavior of Ethernet switches in an SDN
• Initially released by Stanford, now maintained by the Open Networking Foundation (ONF)
• OpenFlow 1.3 was approved in 2012, but most devices support OpenFlow 1.0
[Figure: a controller (server software) running applications (App) speaks the OpenFlow protocol to the control path of an Ethernet switch, whose data path is implemented in hardware]
5.33. OpenFlow
• Innovative app
• Flow Table:
• Filter: header=x, Action: send to port 2
• Filter: header=y, Action: overwrite IP field and send to port 4,5
• Otherwise (no matching entry): ask the controller
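The flow-table behavior above can be modelled in a few lines. This is a toy sketch, not the OpenFlow wire protocol: entries match exact header values (no priorities, first match wins), the action strings are hypothetical labels, and on a table miss the controller's decision is cached as a new entry, as a reactive controller would do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Headers = Dict[str, str]


@dataclass
class FlowEntry:
    match: Headers          # header fields that must all be equal
    actions: List[str]      # e.g. ["output:2"]; action names are illustrative


class FlowTable:
    """Toy switch flow table: first matching entry wins (no priorities)."""

    def __init__(self) -> None:
        self.entries: List[FlowEntry] = []

    def add(self, entry: FlowEntry) -> None:
        self.entries.append(entry)

    def lookup(self, headers: Headers) -> Optional[List[str]]:
        for entry in self.entries:
            if all(headers.get(k) == v for k, v in entry.match.items()):
                return entry.actions
        return None


def handle_packet(table: FlowTable, headers: Headers,
                  controller: Callable[[Headers], List[str]]) -> List[str]:
    actions = table.lookup(headers)
    if actions is None:
        # Table miss: ask the controller, then cache its decision as a new entry.
        actions = controller(headers)
        table.add(FlowEntry(match=dict(headers), actions=actions))
    return actions
```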
5.34. Plumbing primitives
• Matching arbitrary bits in the header
• E.g. 100010110xxx0110x001xxxx
• Any header or new header
• Allows any flow granularity
• Possible actions – small set of forwarding primitives
• Forward to ports, drop flows, send to the controller
• Overwrite header with mask, pop or push
• Forward at specific bit rate
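Matching arbitrary header bits with wildcards is a ternary match. The sketch shows it on the example pattern from the slide in two equivalent forms: a character-by-character comparison and the (value, mask) encoding that hardware typically uses. The function names are hypothetical.

```python
from typing import Tuple


def ternary_match(pattern: str, bits: str) -> bool:
    """Match a header bit string against a pattern of '0', '1' and 'x' (wildcard)."""
    return len(pattern) == len(bits) and all(
        p == "x" or p == b for p, b in zip(pattern, bits)
    )


def to_value_mask(pattern: str) -> Tuple[int, int]:
    """Encode the pattern as (value, mask); mask bits are 1 where a bit must match."""
    value = mask = 0
    for ch in pattern:
        value <<= 1
        mask <<= 1
        if ch != "x":
            mask |= 1
            value |= int(ch)
    return value, mask


def masked_match(value: int, mask: int, header: int) -> bool:
    return (header & mask) == value


PATTERN = "100010110xxx0110x001xxxx"   # the example pattern from the slide
```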
5.35. Network OSes
• Open Source for research
• NOX (C++/Python)
• http://www.noxrepo.org/
• POX
• Maestro
• Helios
• Beacon
• Etc.
• Commercial
• ONIX (OSDI 2010, Google, Nicira, NEC)
• ...
5.36. OpenFlow support in PlanetLab
• Goals
• Easily deploy a virtual L2 topology
• Having as many virtual switches as PlanetLab nodes
• They are connected with virtual cables
• What do we get?
• User level switches
• sliver-ovs, a modified Open vSwitch
• Support for Ethernet over UDP tunnels
• One tap device per switch (tunnel access point)
• Bridging by OVS instance running in the sliver
5.37. How to use it in PlanetLab?
Installation
• sliver-ovs comes preinstalled with a new slice
• Send a mail to PlanetLab Europe Support to request a private IP subnet for your slice
• On your laptop, the following tools are needed
• GNU make
• the openssh client
• the host program (usually distributed in bind-tools)
• The Makefile contained in the git repository http://git.onelab.eu/?p=sliver-openvswitch.git;a=tree;f=planetlab/exp-tool;hb=HEAD
5.38. How to use it in PlanetLab?
Assume that we have a slice named elteple_example with four nodes:
• onelab7.iet.unipi.it
• planet2.elte.hu
• planetlab2.ics.forth.gr
• planetlab2.urv.cat
The subnet we obtained is 192.168.0.0/24.
5.39. Create the topology
• Put a conf.mk file in the same directory as the Makefile, with the following content:
SLICE=elteple_example
HOST_1=onelab7.iet.unipi.it
IP_1=192.168.0.1/24
HOST_2=planet2.elte.hu
IP_2=192.168.0.2/24
HOST_3=planetlab2.ics.forth.gr
IP_3=192.168.0.3/24
HOST_4=planetlab2.urv.cat
IP_4=192.168.0.4/24
LINKS :=
LINKS += 1-2
LINKS += 2-3
LINKS += 2-4
• Then just type: make -j
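Before running make, it can be useful to check which hosts the virtual links actually connect. The hypothetical helper below (not part of the PlanetLab Europe tools) parses the HOST_i, IP_i and LINKS entries of a conf.mk like the one above and finds the shortest chain of virtual switches between two nodes; for example, traffic from node 1 to node 4 has to cross the switch on node 2.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def parse_conf(text: str) -> Tuple[Dict[int, str], Dict[int, str], List[Tuple[int, int]]]:
    """Extract HOST_i, IP_i and LINKS += a-b entries from a conf.mk file."""
    hosts: Dict[int, str] = {}
    ips: Dict[int, str] = {}
    links: List[Tuple[int, int]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("LINKS +="):
            a, b = line.split("+=", 1)[1].strip().split("-")
            links.append((int(a), int(b)))
        elif "=" in line and not line.startswith("LINKS"):
            key, value = (part.strip() for part in line.split("=", 1))
            if key.startswith("HOST_"):
                hosts[int(key[len("HOST_"):])] = value
            elif key.startswith("IP_"):
                ips[int(key[len("IP_"):])] = value
    return hosts, ips, links


def shortest_path(links: List[Tuple[int, int]], src: int, dst: int) -> Optional[List[int]]:
    """Breadth-first search over the virtual links (treated as undirected)."""
    adjacency: Dict[int, List[int]] = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    previous: Dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    if dst not in previous:
        return None
    path: List[int] = []
    current: Optional[int] = dst
    while current is not None:
        path.append(current)
        current = previous[current]
    return path[::-1]
```

Removing or adding a pair in the link list mirrors the U/2-3 and L/3-4 make targets of the next slide.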
• You can easily test it: ssh -l elteple_example onelab7.iet.unipi.it ping 192.168.0.4
5.41. Modify the topology
• You can dynamically add and remove virtual links
• To remove the link between nodes 2 and 3: make -j U/2-3
• To create a new link between nodes 3 and 4: make -j L/3-4
5.42. Literature
• SFA User guide:
• http://svn.planet-lab.org/wiki/SFAGuide
• OpenFlow website:
• http://www.openflow.org/
• OpenFlow in PlanetLab:
• https://www.planet-lab.eu/doc/guides/user/practices/openflow
6. Bandwidth measurement methods: network path characterization
6.1. Methods to measure path characteristics
• Passive methods
• monitoring
• Carried out at a certain point of the network
• Non-invasive
• Special permissions are needed
• Active methods
• Artificial probes
• end-to-end measurements
• Invasive
• No special permission is needed
• The network traffic is subdivided into packets:
• Packet level properties:
• source, destination
• Packet size (payload)
• Timestamps (sending, receiving)
• and others
6.2. Capacity
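[Figure: example path from sender S to receiver R over a 45 Mbps DS3/ATM link and two 100 Mbps Fast Ethernet links; of the 45 Mbps DS3 rate, about 36 Mbps is usable at the IP layer and 9 Mbps is ATM overhead]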
6.3. Available bandwidth
6.4. Capacity and Available Bandwidth
[Figure: a path of links with capacities C1, C2, C3, ...; the k-th link is the narrow one, and A marks the available bandwidth]
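The two quantities are commonly defined per hop as C = min_i C_i (set by the narrow link) and A = min_i C_i(1 - u_i), where u_i is the utilization of link i (set by the tight link). The sketch below computes both on a hypothetical three-link path, chosen so that the narrow link and the tight link are different links.

```python
from typing import List, Tuple

# Each link is described by (capacity in Mbps, utilization in [0, 1]).
Link = Tuple[float, float]


def path_capacity(links: List[Link]) -> float:
    """End-to-end capacity: C = min_i C_i (set by the narrow link)."""
    return min(c for c, _ in links)


def available_bandwidth(links: List[Link]) -> float:
    """Available bandwidth: A = min_i C_i * (1 - u_i) (set by the tight link)."""
    return min(c * (1.0 - u) for c, u in links)


def narrow_and_tight(links: List[Link]) -> Tuple[int, int]:
    """Indices of the narrow link and of the tight link."""
    indices = range(len(links))
    narrow = min(indices, key=lambda i: links[i][0])
    tight = min(indices, key=lambda i: links[i][0] * (1.0 - links[i][1]))
    return narrow, tight


# Hypothetical path: a busy 100 Mbps link, a lightly used 45 Mbps link,
# and a half-loaded 100 Mbps link.
path = [(100.0, 0.9), (45.0, 0.2), (100.0, 0.5)]
print(path_capacity(path), available_bandwidth(path), narrow_and_tight(path))
```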
6.5. Passive Techniques
• Network managers are very interested in available bandwidth
• Can be measured at each link from router utilization statistics (via SNMP)
• Multi Router Traffic Grapher
• MRTG graph example: Weekly traffic
• However, users generally do not see this data, and it is not end-to-end
• Visit: www.bix.hu – Budapest Internet eXchange
6.6. Active probing methods
• Goal:
• estimate network parameters
• available bandwidth, physical bandwidth, cross traffic statistical properties, etc.
• with end-to-end methods
[Figure: sender and receiver, each with a monitor, exchanging probes over a path that carries background traffic]
6.7. Basic ideas
• Blast the path with UDP traffic
• bad and very harmful practice
• Measure throughput of large TCP transfer
• Widely used BUT has many drawbacks
• TCP does not get available bandwidth in under-buffered paths
• TCP gets more than available bandwidth in over-buffered paths
• TCP saturates the path
• intrusive measurements
6.8. State-of-the-art bandwidth estimation methods
• SLoPS-based:
• PathLoad 2002
• Packet pair-based:
• PathChirp 2003
• PathSensor 2007
• Cprobe
• TCP throughput measurement tools:
• Treno
• Iperf
• Cap
6.9. SLoPS Self-Loading Periodic Streams
6.10. SLoPS Self-Loading Periodic Streams
6.11. SLoPS
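[Figure: packets 1 to 4 of a periodic stream cross the tight link; D1 to D4 are their one-way delays. When the stream rate R exceeds the available bandwidth A, the delays increase: D1 < D2 < D3 < D4]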
6.12. SLoPS
[Figure: packets 1 to 4 and their one-way delays D1 to D4 in the opposite case, when the stream rate R is below the available bandwidth A, so the delays show no increasing trend]
6.13. OWD variations
6.14. How it works?
6.15. How to determine parameters K,L and T?
• L: size of the probe packets in the stream
• cannot be less than a certain number of bytes
• should not be greater than the path MTU
• to avoid fragmentation
• T: transmission time of the stream
• should be small
• to complete the transmission of the stream before a context switch
• K: number of probe packets to be transmitted
• A large K may overflow the queue of the tight link
• when the stream rate R is higher than the available bandwidth A
• A small K does not give enough samples
• to infer the trend robustly
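The three parameters are tied together by the stream rate. This sketch assumes packets of L bytes are sent every L/R seconds, so that the stream's transmission time is T = K·L/R. The 1500-byte path MTU is an assumed default, and the function name is hypothetical.

```python
from typing import List, Tuple


def plan_stream(rate_bps: float, packet_bytes: int, k: int,
                path_mtu: int = 1500) -> Tuple[float, float, List[float]]:
    """Return (inter-packet spacing, stream duration, send offsets) in seconds.

    Packets of L bytes sent every L/R seconds form a periodic stream of rate R;
    the stream lasts K * L / R seconds.
    """
    if packet_bytes > path_mtu:
        raise ValueError("L exceeds the path MTU: probes would be fragmented")
    spacing = packet_bytes * 8 / rate_bps
    offsets = [i * spacing for i in range(k)]
    return spacing, k * spacing, offsets


spacing, duration, offsets = plan_stream(10e6, 1000, 100)
print(spacing, duration)
```

At 10 Mbps with 1000-byte packets, for instance, 100 packets already take about 80 ms, which shows why K and L cannot be chosen independently of the tested rate.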
6.16. Fleets of streams
6.17. How to detect the increasing trend of OWDs?
6.18. Pathload uses two metrics to recognize an increasing trend
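The two metrics can be sketched as follows, following the definitions in the Pathload paper listed in the Literature. The K one-way delays are split into about sqrt(K) groups, and D_k is the median of group k. PCT is the fraction of consecutive pairs with D_k > D_(k-1). PDT is (D_last - D_first) / sum|D_k - D_(k-1)|. The thresholds below (0.66/0.54 for PCT, 0.55/0.45 for PDT) are the values we recall from that paper; treat them as tunable assumptions.

```python
import math
from statistics import median
from typing import List, Tuple


def group_medians(owds: List[float]) -> List[float]:
    """Split K one-way delays into about sqrt(K) groups and take each median."""
    if len(owds) < 4:
        raise ValueError("need at least 4 OWD samples")
    groups = max(2, int(math.sqrt(len(owds))))
    size = len(owds) // groups
    return [median(owds[i * size:(i + 1) * size]) for i in range(groups)]


def pct(d: List[float]) -> float:
    """Pairwise Comparison Test: fraction of consecutive increases."""
    return sum(1 for a, b in zip(d, d[1:]) if b > a) / (len(d) - 1)


def pdt(d: List[float]) -> float:
    """Pairwise Difference Test: net change relative to total variation."""
    total = sum(abs(b - a) for a, b in zip(d, d[1:]))
    return 0.0 if total == 0 else (d[-1] - d[0]) / total


def verdict(value: float, high: float, low: float) -> str:
    if value > high:
        return "increasing"
    if value < low:
        return "non-increasing"
    return "ambiguous"


def classify(owds: List[float]) -> Tuple[str, str]:
    """Per-metric verdicts (PCT, PDT) for one stream of one-way delays."""
    d = group_medians(owds)
    return verdict(pct(d), 0.66, 0.54), verdict(pdt(d), 0.55, 0.45)
```

Only relative delays matter here, which is why the sender and receiver clocks do not need to be synchronized.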
6.19. PDT and PCT examples
6.20. PCT variations examples
6.21. PDT variations example
6.22. Rate adjustment
• Grey region
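The rate adjustment works like a binary search on the stream rate, with a grey region for rates at which the trend is ambiguous. The sketch below is loosely modelled on Pathload's iterative algorithm but simplified: the probe callback stands in for sending a fleet and evaluating PCT/PDT, it is assumed to answer consistently, and the termination rule is our own simplification.

```python
from typing import Callable, Optional, Tuple

Verdict = str  # "increasing", "non-increasing" or "grey"


def estimate_range(probe: Callable[[float], Verdict], r_min: float, r_max: float,
                   resolution: float) -> Tuple[float, float]:
    """Binary-search the available bandwidth, tracking a grey region.

    Assumes a consistent probe: "increasing" only above the grey region and
    "non-increasing" only below it.
    """
    g_lo: Optional[float] = None  # lowest rate that gave an ambiguous verdict
    g_hi: Optional[float] = None  # highest rate that gave an ambiguous verdict
    while r_max - r_min > resolution:
        if g_lo is None or g_hi is None:
            rate = (r_min + r_max) / 2
        elif g_lo - r_min > r_max - g_hi:
            rate = (r_min + g_lo) / 2      # refine below the grey region
        else:
            rate = (g_hi + r_max) / 2      # refine above the grey region
        result = probe(rate)
        if result == "increasing":          # stream rate > available bandwidth
            r_max = rate
        elif result == "non-increasing":    # stream rate < available bandwidth
            r_min = rate
        else:                                # grey: widen the grey region
            g_lo = rate if g_lo is None else min(g_lo, rate)
            g_hi = rate if g_hi is None else max(g_hi, rate)
        if (g_lo is not None and g_hi is not None
                and g_lo - r_min <= resolution and r_max - g_hi <= resolution):
            break                            # only the grey region is left
    return r_min, r_max
```

The returned pair brackets the available bandwidth; when a grey region exists, the range reflects the variability of the cross traffic rather than a single value.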
6.23. Performance
• Source: Jain, Manish et al. (see the Literature slide)
6.24. Packet Pair-based methods
[Figure: probe pairs with a fixed inter-packet delay (input spacing) leave the sender node, share a queue with background traffic modeled as a stochastic process, and reach the receiver node with a changed output spacing]
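In the simplest use of packet pairs, two back-to-back packets of L bits leave the narrow link L/C seconds apart, so each pair yields the estimate C ≈ L/Δout. A minimal sketch, assuming the dispersions were measured at the receiver. The median is only a simple robust choice; real tools filter the samples more carefully, because cross traffic can both widen and compress individual dispersions.

```python
from statistics import median
from typing import Iterable


def capacity_from_pairs(packet_bytes: int, dispersions_s: Iterable[float]) -> float:
    """Estimate the narrow-link capacity (bit/s) from packet-pair output spacings."""
    estimates = [packet_bytes * 8 / d for d in dispersions_s if d > 0]
    if not estimates:
        raise ValueError("no valid dispersion samples")
    return median(estimates)


# Hypothetical measurements for 1500-byte pairs on a 10 Mbps narrow link;
# two samples were stretched by cross traffic.
samples = [0.0012, 0.0012, 0.0015, 0.0012, 0.0024]
print(capacity_from_pairs(1500, samples))
```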
6.25. PathChirp Chirp Packet Trains
6.26. PathChirp
• Goal: exploit information in queuing delay signature
• Excursions
6.27. PathChirp Methodology
6.28. Self-Induced Congestion
6.29. Excursions
• Only a few packets are involved
• (not valid excursions)
6.30. pathChirp Tool
• Available as a standalone tool
• Open source
• http://www.spin.rice.edu/Software/pathChirp/
• How it works?
• Two instances: a client and a server side
• It sends UDP probe packets between the end points
• No clock synchronization required, since it only uses relative queuing delay within a chirp duration
• Computation at receiver
• User specified average probing rate
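A chirp is a train whose inter-packet gaps shrink geometrically by a spread factor γ, so the instantaneous probing rate within one chirp sweeps a wide range. The sketch generates the send offsets of one chirp. The lowest rate, γ and the packet count are illustrative parameters, not pathChirp's defaults.

```python
from typing import List, Tuple


def chirp(packet_bytes: int, low_rate_bps: float, spread: float,
          n_packets: int) -> Tuple[List[float], List[float]]:
    """Send offsets (s) and instantaneous probing rates (bit/s) of one chirp.

    Inter-packet gaps shrink geometrically by the spread factor, so the
    instantaneous rate L/gap grows geometrically from low_rate_bps.
    """
    first_gap = packet_bytes * 8 / low_rate_bps
    gaps = [first_gap / spread ** k for k in range(n_packets - 1)]
    offsets = [0.0]
    for gap in gaps:
        offsets.append(offsets[-1] + gap)
    rates = [packet_bytes * 8 / gap for gap in gaps]
    return offsets, rates


offsets, rates = chirp(1000, 1e6, 1.2, 15)
print(f"rates from {rates[0] / 1e6:.1f} to {rates[-1] / 1e6:.1f} Mbps")
```

Because a single chirp covers many rates, one train can reveal where queuing delays start to build up, which is why pathChirp needs far fewer probe bytes than Pathload.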
6.31. Comparison with Pathload
• Experimentation setup:
• 100Mbps links
• Poisson cross traffic
• Y-topology
• Approx. 10 times fewer bytes are enough
6.32. PathSensor: Granular model-based bandwidth estimation
• fluid model
• the asymptotic behavior is correct
• but unable to describe the transition region
• new analytic description of the transition region
• Introducing granularity, the effective cross traffic packet size
6.33. Estimating output spacing with fluid traffic for a single-hop scenario
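In the fluid model, the cross traffic is an infinitely divisible flow of rate Cc on a link of capacity C. A probe pair of L bits sent with input spacing Δin leaves with Δout = max(Δin, (L + Cc·Δin)/C). The spacing is therefore unchanged while the probing rate L/Δin stays below the available bandwidth C - Cc, and it grows linearly above that point. A minimal sketch with illustrative values (C = 10 Mbps, Cc = 4 Mbps, L = 12000 bits).

```python
def fluid_output_spacing(delta_in: float, probe_bits: float,
                         capacity: float, cross_rate: float) -> float:
    """Single-hop fluid model: output spacing (s) of a probe pair."""
    return max(delta_in, (probe_bits + cross_rate * delta_in) / capacity)


def turning_point_spacing(probe_bits: float, capacity: float, cross_rate: float) -> float:
    """Input spacing at which the probing rate equals the available bandwidth C - Cc."""
    return probe_bits / (capacity - cross_rate)


C, Cc, L = 10e6, 4e6, 12000.0
for delta_in in (0.01, 0.002, 0.0005):
    print(delta_in, fluid_output_spacing(delta_in, L, C, Cc))
```

The granular model discussed next corrects this curve around the turning point, where the fluid assumption breaks down.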
6.34. Fluid curves for single-hop
6.35. How to simulate cross traffic?
6.36. Output spacing
6.37. Output spacing
6.38. Explicit solution for M/D/1 queues
• The simplest M/G/1-type case is the M/D/1 queue:
• single cross-traffic packet size: P
• Poisson arrival rate: λ
6.39. Explicit solution for M/D/1 queues
• Validation with packet level simulation
• M/D/1 queue: Poisson arrival process, single cross-traffic packet size (P); theoretical curves from the M/D/1 result
• Trimodal packet size distribution: packet sizes are 40, 576 and 1500 bytes, with corresponding probabilities 0.59, 0.23 and 0.23; theoretical curves from the numerical integration of the Takács equation
• Validation with packet level simulation
• Uniform packet size distribution: theoretical curves from the numerical integration of the Takács equation
• Validation with packet level simulation
• Parameterization with the granularity
• We can substitute the Taylor expansion of the queue length distribution function into the Takács equation and introduce the granularity parameter Pg
• The exact form of the cross-traffic packet size distribution is not necessary: the value of the granularity alone is enough to describe the transitional range of the dispersion curve as well
• Parameterization with the granularity
• M/D/1 curves for the granularity values (in bits) of:
• a single packet size,
• a uniform packet size distribution,
• a tri-modal (real Internet) packet size distribution
• Parameterization with the granularity
• Granularity in a multihop scenario
• Granularity determines the deviation from the fluid model even in a multihop case
• CT with different granularities
• with same granularities
• Estimating the network parameters: granular vs. fluid estimates
• Simulated scenarios: fluid and granular estimates computed on the same measurement data, with the estimated network parameters compared to the real values
• Laboratory experiments
• Experiment settings: C = 10 Mbps, Cc = 4 Mbps, Pg = 12000 bits (100 packet pairs averaged)
• Fitted granular parameters: C = 10.0 Mbps, Cc = 3.7 Mbps, Pg = 12000 bits
• Fluid parameters: C = 10.5 Mbps, Cc = 5.1 Mbps
• Experiment settings: C = 100 Mbps, Cc = 22 Mbps, Pg = 12000 bits (20 packet pairs averaged)
• Fitted granular parameters: C = 100 Mbps, Cc = 22.5 Mbps, Pg = 15000 bits
• Fluid parameters: C = 120 Mbps, Cc = 58 Mbps
• Internet experiments
• Carried out in ETOMIC
• ETOMIC nodes located in Stockholm, Sweden and Jerusalem, Israel: estimated granular parameters vs. fluid parameters
• ETOMIC nodes located in Pamplona, Spain and Budapest, Hungary: estimated granular parameters vs. fluid parameters
• Laboratory and Internet experiments
• Comparison to existing tools: pathload, pathChirp, and a modified pathChirp tool with granular-model-based bandwidth estimation
6.40. Literature
• Jain, Manish, and Constantinos Dovrolis. Pathload: A measurement tool for end-to-end available bandwidth.
In Proceedings of Passive and Active Measurements (PAM) Workshop. 2002.
• Ribeiro, Vinay Joseph, et al. pathchirp: Efficient available bandwidth estimation for network paths. In
Proceedings of Passive and Active Measurements (PAM) Workshop. 2003.
• Strauss, Jacob, Dina Katabi, and Frans Kaashoek. A measurement study of available bandwidth estimation
tools. Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement. ACM, 2003.
• Hága, Péter, et al. Understanding packet pair separation beyond the fluid model: The key role of traffic
granularity. IEEE INFOCOM. 2006.
• Hága, Péter, et al. Granular model of packet pair separation in Poissonian traffic. Computer Networks 51.3
(2007): 683-698.
• Hága, Péter, et al. PathSensor: Towards Efficient Available Bandwidth Measurement. Proceedings of IPSMoMe 2005. 2005.
7. Topology discovery in large-scale networks
7.1. Topology discovery
Why are we interested in discovering topology?
• Reverse engineering the network
• Better understanding of Internet routing
• Forwarding and route dynamics
• Simulation tools for the Internet
• Topology and routing behaviour
• Network management
• Discovering multicast trees
• Monitoring path properties
• Failure localization
• Predict path failures
• Topology aware algorithms
7.2. Challenges
• Determined by the routing algorithms of the Internet
• Large-scale – over 1 billion hosts
• Impossible to store each IP address
• Administrative regions
• Internet = network of networks
• ISPs want to hide the details of their own networks
• Non-unique routes and changes may occur
• Load balancer routers
• AS level or intra-domain changes
7.3. Naive approaches
• Traceroute measurements
• Sometimes misleading
• Huge traffic overhead
• Not suitable for detecting path changes
• Limited probing frequency
• BGP tables
• AS level topology
• BGP data lacks visibility
• RIPE and other orgs collect and publish BGP data from some routers
7.4. CAIDA’s Skitter
7.5. NetDimes
7.6. Expectations
• A topology discovery method should be
• Efficient
• Low traffic overhead
• Accurate
• Handling the effect of load balancing
• Providing an accurate Internet map
• Fast enough
• To keep tracking route changes
• Probes with high frequency
7.7. Different methods
[Figure: route discovery tools (Traceroute, Tracetree, DoubleTree, Paris Traceroute, DTrack) compared along two axes: efficiency and accuracy]
7.8. ROUTE DISCOVERY
7.9. Traceroute
• Try the traceroute tool
• It measures the path from a source to a destination
• using ICMP or UDP probes
• How does it work?
• It sends UDP probes to each hop on the route by varying the TTL field of the IP header
• Each router decrements the TTL; when it reaches 0 at a given router, the router sends an ICMP Time Exceeded (TE) message back to the source, with the router's address as the source address of the reply.
• When the probe arrives at the destination, an ICMP Port Unreachable (PU) message is sent back, indicating that the measurement is over.
• The source increases the TTL field of the probes from 1 up to a given limit (typically 30)
• to test all the hops along the path
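The probing loop can be illustrated without raw sockets, which need special privileges, by simulating the network's replies. In this sketch the path, the silent router (None) and the base port 6760 are hypothetical, with the port chosen to match the figures below. A real implementation matches each ICMP reply to its probe using the UDP header quoted inside the reply.

```python
from typing import List, Optional, Tuple

# Simulated path as seen from S: routers followed by the destination.
# None marks a node that never answers (see 7.14).
Path = List[Optional[str]]


def send_probe(path: Path, ttl: int, dst_port: int) -> Tuple[str, Optional[str]]:
    """Simulate one UDP probe and return (reply type, responder address).

    dst_port only identifies the probe; this simulation does not use it.
    """
    hop = path[min(ttl, len(path)) - 1]
    if hop is None:
        return "timeout", None
    if ttl < len(path):
        return "time-exceeded", hop       # TTL expired at an intermediate router
    return "port-unreachable", hop        # the destination itself answered


def traceroute(path: Path, base_port: int = 6760, max_ttl: int = 30) -> List[str]:
    hops = []
    for ttl in range(1, max_ttl + 1):
        # Classic traceroute: a different destination port for every hop.
        kind, addr = send_probe(path, ttl, base_port + ttl)
        hops.append(addr if addr is not None else "*")
        if kind == "port-unreachable":
            break
    return hops


print(traceroute(["R1", "R2", "R3", "D"]))
```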
7.10. How traceroute works?
[Figure: path S, R1, R2, R3, D. S sends a UDP probe (TTL 1, destination D, port 6761); R1 answers with an ICMP Time Exceeded (TE) reply (source R1, destination S)]
7.11. How traceroute works?
[Figure: S sends a UDP probe (TTL 2, destination D, port 6762); R2 answers with an ICMP TE reply (source R2, destination S)]
7.12. How traceroute works?
[Figure: S sends a UDP probe (TTL 3, destination D, port 6763); R3 answers with an ICMP TE reply (source R3, destination S)]
7.13. How traceroute works?
[Figure: S sends a UDP probe (TTL 4, destination D, port 6764); since port 6764 is unreachable at D, D answers with an ICMP Port Unreachable (PU) reply (source D, destination S): D has been reached]
7.14. Problems
• Missing ICMP replies
• Some routers do not send TE messages
• Or the destination does not send PU messages
• Sometimes the route is not unique
• In presence of load balancers the discovered path may be misleading
• Per-packet
• Per-flow
7.15. Problems with load balancers
[Figure: a path from S to D through routers R1 to R5 that contains a load balancer with two alternative branches]
7.16. Problems with load balancers
[Figure: a path from S to D through routers R1 to R5 with a load balancer; probes for consecutive hops take different branches, so the inferred path is not valid]
7.17. What causes this anomaly?
• Classic traceroute uses the destination port as the probe identifier
• This is why the destination port increases hop by hop
• However, per-flow load balancers use the destination port as part of the flow identifier
• The choice among the alternative routes is also based on this port number
• Per-packet load balancers
• E.g. selecting alternative routes uniformly at random for each packet
• They are relatively rare
[Figure: classic traceroute probes with destination ports 1, 2, 3 and 4 are forwarded by the load balancer along different branches of the path from S through R1 to R5 to D]
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet
• The discovered path is not a real one.
7.19. What can we do?
• Solution: Paris Traceroute
• Idea:
• Keep the destination port fixed for every hop
• Use the checksum field to identify the probes instead
• It also handles load balancers
• by sending multiple probes to each hop with different flow identifiers (e.g. source ports)
• Multipath Detection Algorithm (MDA)
[Figure: Paris traceroute keeps the destination port fixed (DPort 1) for all probes, so the per-flow load balancer forwards every probe along the same branch]
7.20. Paris Traceroute Algorithm
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet
• Test if interface r at hop h-1 belongs to a per-packet load balancer
7.21. Finding the NEXTHOP
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet, in
IEEE/ACM Transactions on Networking (TON), vol. 19, issue 3, pp. 830-840, June 2011
• Generates a hash to randomly sample next hop interfaces of r at hop h
• Probe retransmission, discovering a next hop interface s
• Maintaining a set of hash values for hop h and interface s
• The set of next hop interfaces of r
7.22. The key ideas behind NEXTHOP
7.23. Number of probes and the expected number of interfaces at 95 percent confidence level
• n: Number of expected interfaces
• k: Number of probes to send
• k is easy to compute with a binary search.
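A simple model for k assumes that a load balancer spreads probes uniformly over n next-hop interfaces. The probability that k probes miss at least one of them then follows from inclusion-exclusion. k(n) is the smallest k that pushes this probability to 5% or below, and since the probability decreases with k, binary search applies. This model gives 6 probes for two interfaces and 11 for three, matching the example in 7.27. The function names are hypothetical.

```python
from math import comb


def miss_probability(n: int, k: int) -> float:
    """P(k uniformly random probes fail to reveal all n next-hop interfaces)."""
    return sum((-1) ** (j + 1) * comb(n, j) * ((n - j) / n) ** k for j in range(1, n))


def probes_needed(n: int, alpha: float = 0.05) -> int:
    """Smallest k with miss_probability(n, k) <= alpha, found by binary search."""
    lo, hi = 1, 1
    while miss_probability(n, hi) > alpha:   # exponential search for an upper bound
        hi *= 2
    while lo < hi:
        mid = (lo + hi) // 2
        if miss_probability(n, mid) <= alpha:
            hi = mid
        else:
            lo = mid + 1
    return hi


print([probes_needed(n) for n in range(2, 7)])
```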
7.24. SELECTFLOW: Selecting a flow
• To determine the next-hop interfaces of router r
• Each probe has a different flow identifier
• Different source ports in case of UDP and TCP
• Sequence number in case of ICMP probes
• Hash-based identifier selection
• The probe's index is mapped to the range 10000-65535
• The destination port of TCP and UDP probes is fixed
• This helps the probes get through firewalls
• Important for scanning a hop
• Selecting identifiers that also go through router r
• If there are not enough such identifiers, new measurements need to be carried out
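A sketch of hash-based flow selection. The slides only state that the probe index is mapped into 10000-65535; the use of SHA-256, the helper names (flow_port, select_flows, probe_previous_hop) and the toy probing function are assumptions.

```python
import hashlib

PORT_MIN, PORT_MAX = 10000, 65535


def flow_port(index):
    """Map a probe index to a source port in [10000, 65535] via a hash."""
    digest = hashlib.sha256(index.to_bytes(8, "big")).digest()
    return PORT_MIN + int.from_bytes(digest[:4], "big") % (PORT_MAX - PORT_MIN + 1)


def select_flows(known, r, needed, probe_previous_hop, max_new=1000):
    """Return `needed` flow ids known to cross interface r at hop h-1.

    known: dict flow id -> interface observed at hop h-1 (updated in place)
    probe_previous_hop(flow): sends a new probe to hop h-1, returns the interface
    """
    chosen = [f for f, iface in known.items() if iface == r][:needed]
    index = 0
    while len(chosen) < needed:              # not enough ids: new measurements
        if index >= max_new:
            raise RuntimeError("could not find enough flows through r")
        flow = flow_port(index)
        index += 1
        if flow in known:                    # already measured (or a hash collision)
            continue
        known[flow] = probe_previous_hop(flow)
        if known[flow] == r:
            chosen.append(flow)
    return chosen


# Toy hop h-1 behind a per-flow balancer: even ports reach R2, odd ports R1.
known = {}
flows = select_flows(known, "R2", 5, lambda port: ("R2", "R1")[port % 2])
print(flows)
```

Keeping the dictionary of past measurements means later hops can reuse flow identifiers already known to pass through r instead of probing again.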
7.25. SELECTFLOW: discovering new flows crossing router r
7.26. PERPACKET
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet, in
IEEE/ACM Transactions on Networking (TON), vol. 19, issue 3, pp. 830-840, June 2011
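A sketch of the per-packet test from 7.20, assuming it works by sending probes that all carry the same flow identifier and checking whether the next hop changes. The default of six probes mirrors the 95 percent value for two interfaces used in the example in 7.27; the routers are toy functions and the names are hypothetical.

```python
import random


def is_per_packet(send_probe, flow_id, probes=6):
    """Send `probes` probes that all carry the same flow identifier through r.

    A per-flow balancer (or no balancer) always answers from the same next-hop
    interface; seeing more than one interface indicates per-packet balancing.
    """
    seen = {send_probe(flow_id) for _ in range(probes)}
    return len(seen) > 1


rng = random.Random(7)


def per_flow_router(flow_id):        # same flow id -> same next hop
    return ("Ry", "Rz")[flow_id % 2]


def per_packet_router(flow_id):      # every packet picks a next hop at random
    return rng.choice(("Ry", "Rz"))


print(is_per_packet(per_flow_router, 4))      # False
print(is_per_packet(per_packet_router, 4))    # True with probability 31/32
```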
7.27. Discovering next-hop interfaces in the presence of a load balancer
• Looking for the next-hop interfaces of Rx
• First hypothesis: Rx has two next-hop interfaces
• Sending 6 probes through Rx (95 percent confidence)
• Two interfaces are found, so we assume there is a third one
• Another 5 probes are sent through Rx
• No further interface is found
• So the discovery is finished
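The walk-through above can be reproduced with a short MDA-style loop. The stopping values assume uniform load balancing at 95 percent confidence, the router Rx is a toy function that splits flows by source-port parity, and discover_next_hops is a hypothetical name.

```python
# Stopping points at 95 percent confidence: once k interfaces have been found,
# keep probing until STOP[k] probes have been sent in total (6 and 11 are the
# counts used in the example above).
STOP = {1: 6, 2: 11, 3: 16, 4: 21, 5: 27}


def discover_next_hops(send_probe, first_port=1):
    """MDA-style discovery of the next-hop interfaces behind one router.

    send_probe(sport) returns the interface answering a probe whose flow
    identifier is the source port `sport`; every probe uses a new port.
    """
    found, sent, sport = set(), 0, first_port
    while True:
        if found:
            if len(found) not in STOP:
                raise ValueError("extend STOP to handle more interfaces")
            if sent >= STOP[len(found)]:
                return found, sent
        found.add(send_probe(sport))
        sent += 1
        sport += 1


def rx(sport):
    """Toy per-flow balancer at Rx: even source ports to Ry, odd ports to Rz."""
    return ("Ry", "Rz")[sport % 2]


found, sent = discover_next_hops(rx)
print(sorted(found), sent)    # ['Ry', 'Rz'] 11
```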
[Figure: probes with source ports 1-11 sent from S through Rx; the answers reveal the next-hop interfaces Ry and Rz]
7.28. Discovering next-hop interfaces in the presence of a load balancer
[Figure: continuation of the example from S through Rx, Ry and Rz, with further probes using source ports up to 16]
7.29. Performance of Paris traceroute
• Load balancer routers discovered:
• per-flow: 30 percent
• per-packet: 2 percent
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet, in
IEEE/ACM Transactions on Networking (TON), vol. 19, issue 3, pp. 830-840, June 2011
7.30. Load balancers
• Source: Brice Augustin, Timur Friedman, and Renata Teixeira, Measuring multipath routing in the internet, in
IEEE/ACM Transactions on Networking (TON), vol. 19, issue 3, pp. 830-840, June 2011
7.31. TOPOLOGY DISCOVERY
7.32. Topology discovery
• Goal: discover tree topologies in the source-to-receivers direction
• End-to-end paths may overlap
• Some links are scanned several times
• This generates more traffic on the network than topology discovery requires
• How to handle this redundancy?
7.33. DoubleTree
• Efficient cooperative topology discovery algorithm
• Handles two kinds of redundancy
• Intra-monitor redundancy
• Inter-monitor redundancy
7.34. The actual topology
[Figure: monitors and destinations D1-D4]
7.35. Intra-monitor redundancy
[Figure: monitors and destinations D1-D4]
• If probes are sent from a specific source to multiple targets, many links (close to the source) are probed
several times, resulting in intra-monitor redundancy.
7.36. Inter-monitor redundancy
[Figure: monitors and destinations D1-D4]
• If probes are sent from multiple sources to a specific target, many links (close to the destination) are probed
many times, resulting in inter-monitor redundancy.
7.37. Tree like structures
• Two different kinds of probing strategies are suggested
• To discover two kinds of tree-like structures
• Monitor rooted tree
• For intra-monitor case
• Govindan et al. and TraceTree
• Destination rooted tree
• For inter-monitor case
7.38. Monitor rooted tree
[Figure: monitor-rooted tree towards destinations D1-D4]
• Reducing redundancy by scanning each link once
7.39. Destination rooted tree
[Figure: destination-rooted tree]
• Reducing redundancy by scanning each link once
7.40. DoubleTree
• Forward and backward probing require different strategies
• How to handle both kinds of redundancy?
• DoubleTree
• starts probing at a given hop h
• First performing forward probing from h
• Continuing with backward probing from h-1
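A minimal sketch of the two stopping rules, under our reading of Donnet et al.: forward probing stops at an (interface, destination) pair already in the global stop set shared by all monitors, and backward probing stops at an interface already in the monitor's own local stop set. Paths are given instead of measured, and all names are hypothetical.

```python
def doubletree_probe(path, dest, h, local_stop, global_stop):
    """Probe towards `dest` with DoubleTree-style stopping rules.

    path: interfaces at hops 1..len(path) (path[0] is hop 1, path[-1] is dest)
    h: hop where probing starts
    local_stop: interfaces this monitor has already seen (backward rule)
    global_stop: (interface, destination) pairs shared by all monitors
    Returns the hops actually probed, in probing order.
    """
    probed = []
    for hop in range(h, len(path) + 1):          # forward: h, h+1, ...
        probed.append(hop)
        pair = (path[hop - 1], dest)
        if pair in global_stop:
            break
        global_stop.add(pair)
    for hop in range(h - 1, 0, -1):              # backward: h-1, h-2, ..., 1
        probed.append(hop)
        iface = path[hop - 1]
        if iface in local_stop:
            break
        local_stop.add(iface)
    return probed


shared = set()                                   # global stop set
m1 = set()                                       # local stop set of monitor M1
print(doubletree_probe(["a", "b", "c", "D1"], "D1", 3, m1, shared))  # [3, 4, 2, 1]
print(doubletree_probe(["a", "b", "e", "D2"], "D2", 3, m1, shared))  # [3, 4, 2]
m2 = set()                                       # a second monitor, M2
print(doubletree_probe(["x", "y", "c", "D1"], "D1", 3, m2, shared))  # [3, 2, 1]
```

The second call saves probes near the monitor (intra-monitor redundancy), the third saves probes near the destination (inter-monitor redundancy).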
7.41. Maintaining trees
7.42. DoubleTree results
• Measurement load reduction up to 76 percent
• Interface and link coverage above 90 percent
[Figure: sources S1 and S2 and destinations D1 and D2; starting at hop h, only 4 partial measurements are needed instead of 8 full paths]
7.43. Literature
• Augustin, Brice, et al. Avoiding traceroute anomalies with Paris traceroute. Proceedings of the 6th ACM
SIGCOMM conference on Internet measurement. ACM, 2006.
• Augustin, Brice, Timur Friedman, and Renata Teixeira. Measuring multipath routing in the Internet.
IEEE/ACM Transactions on Networking (TON) 19.3 (2011): 830-840.
• Donnet, Benoit, et al. Efficient algorithms for large-scale topology discovery. ACM SIGMETRICS
Performance Evaluation Review. Vol. 33. No. 1. ACM, 2005.
8. Network tomography
8.1. What does tomography mean?
• Noninvasive imaging technique
• X-ray or other beam directed through a part of the body onto sensors
• The sensor data are used to reconstruct cross-sectional views of the body
• Source: http://en.wikipedia.org/wiki/File:BrainMRI3planes.gif
8.2. Network tomography?
• Who knows what is going on in the network?
• The network is a weighted graph
• Weighting could be
• Loss
• Delay
• Traffic rate
• Etc.
8.3. Network tomography?
• Reveal the internal state and structure of the network from
• End-to-end/external measurements
• Network models
• The main goal is network optimization:
• Performance metrics
• Find bottleneck links, link characteristics
• Diagnostics
• Detect whether something is broken or slow
• Security
• Detect whether a malicious element (e.g. a sniffer) has been added to the network
8.4. How does it work?
• Sending probes from different sources to some receivers
8.5. How does it work?
• A limited number of measurements to infer the underlying topology and link performance metrics
8.6. Network Tomography
• Why end-to-end measurements?
• ISPs share little information about their networks and do not allow internal measurements
• The Internet is a decentralized, stateless network
• Measurement techniques
• Unicast probes
• Inferring path characteristics
• Multicast probes
• Inferring characteristics for multicast tree segments
• Unicast back-to-back probes
• Try to simulate multicast probes
8.10. What else is needed?
8.11. Estimating Source-destination traffic intensities
8.12. Estimating Source-Destination traffic intensities
8.13. A toy example
8.14. EM algorithm
8.15. MLE and Normal Approximations
8.16. MultiCast-based loss inference
• Cáceres, Ramón, et al. Multicast-based inference of network-internal loss characteristics. Information Theory,
IEEE Transactions on 45.7 (1999): 2462-2480.
• EM-based approach to estimate loss rates on internal links from end-to-end measurements.
• Multicast from one source (0) to multiple destinations (4, 5, 6, 7)
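The smallest instance of the problem, one branch point with two receivers, can be solved in closed form and shows why correlated receptions carry information about internal links: receiver 2 sees a1*a2, receiver 3 sees a1*a3, and both together see a1*a2*a3. This sketch assumes independent Bernoulli losses on each link; larger trees, such as the tree from source 0 to receivers 4-7, need the EM-based procedure. The link rates and the seed are hypothetical.

```python
import random


def simulate(n, a1, a2, a3, seed=1):
    """Send n multicast probes down a two-receiver tree.

    Link 1 (source -> branch point) delivers a probe with probability a1;
    links 2 and 3 (branch point -> receivers 2 and 3) with a2 and a3.
    """
    rng = random.Random(seed)
    got2, got3 = [], []
    for _ in range(n):
        at_branch = rng.random() < a1
        got2.append(at_branch and rng.random() < a2)
        got3.append(at_branch and rng.random() < a3)
    return got2, got3


def estimate(got2, got3):
    """Recover the three link success rates from end-to-end observations only."""
    n = len(got2)
    g2 = sum(got2) / n                                  # estimates a1 * a2
    g3 = sum(got3) / n                                  # estimates a1 * a3
    g23 = sum(x and y for x, y in zip(got2, got3)) / n  # estimates a1 * a2 * a3
    return g2 * g3 / g23, g23 / g3, g23 / g2


print(estimate(*simulate(200_000, 0.9, 0.95, 0.8)))    # close to (0.9, 0.95, 0.8)
```

The loss rate of each link is one minus its estimated success rate.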
8.17. Loss model
8.18. Loss inference
8.19. Solution with EM
8.20. Convergence
8.21. Convergence
8.22. Unicast network tomography
• Coates, Mark, et al. Maximum likelihood network topology identification from edge-based unicast
measurements. ACM SIGMETRICS Performance Evaluation Review 30.1 (2002): 11-20.
• Sandwich probing measurement scheme
• Stochastic search method for topology identification
8.23. Sandwich probing
• Sending three unicast probe packets
• Two small packets (p1 and p3) to one receiver
• And a large one (p2) in the middle, to another receiver
• The main idea:
• Because p3 queues behind p2, extra separation between the small packets is induced on the links shared by the two paths
8.24. Sandwich probing
[Figure: probes p1, p2 and p3 on their way to the receivers; labels mark a branching point and queues that are not branching points]
8.25. Measurement framework
8.26. Topology Identification
8.27. Simplifying the problem
8.28. Find the tree
8.29. Illustration
[Figure: true topology and inferred topology]
8.30. Literature
• Vardi, Yehuda. Network tomography: Estimating source-destination traffic intensities from link data. Journal
of the American Statistical Association 91.433 (1996): 365-377.
• Cáceres, Ramón, et al. Multicast-based inference of network-internal loss characteristics. Information Theory,
IEEE Transactions on 45.7 (1999): 2462-2480.
• Coates, Mark, et al. Maximum likelihood network topology identification from edge-based unicast
measurements. ACM SIGMETRICS Performance Evaluation Review 30.1 (2002): 11-20.
9. Network coordinates systems
9.1. Introduction
• Aims
• Estimate delays without performing direct measurements
• Reduce the consumption of network resources
• Usage
• Closest mirror selection
• Such as closest game server
• Construction of application level multicast trees
• OASIS (a distributed anycast system)
• Peer-to-peer networks
• Azureus (now called Vuze, a BitTorrent client)
• Distribute web-crawling to nearby hosts
• Provide inter-node latency bounds for clusters
9.2. The key idea of an NCS
9.3. Localization Techniques
• Global Positioning System
• Geolocation approaches
• Already discussed
• Meridian Approach
• Wong et al. (SIGCOMM 2005) [23]
• A framework for hosts to look up their nearest peers in an overlay network
• Selects nodes directly, without computing coordinates
• Constructs a multi-resolution ring structure
9.4. Localization Techniques
• An example: Meridian
• Scalable, gossip-based node discovery
• Example for closest node selection:
9.5. Localization Techniques
• Main drawbacks
• Huge number of wide-area-spanning e2e paths
• It makes performing on-demand measurements impractical
• High frequency probing, costly
• Time-consuming
9.6. Network Coordinates System Basics
9.7. Network Coordinates System Basics
9.8. Network Coordinates Systems Advantages
• Easy and practical support for P2P applications
• Characterizing the proximity among peers
• Neighbour selection
• Scalability
• Direct measurements are eliminated
• Acceptable accuracy
• The accuracy is not perfect, but acceptable
• NCS families
• Landmark based
• Fixed set of well-known trusted nodes
• Decentralized
• Any node may be used as a landmark
9.9. LANDMARK BASED NCS
9.10. IDMaps
• Internet Distance Map Service (IDMaps)
• Francis et al. (INFOCOM 1999) [13]
• The first complete system
• Predecessor of landmark based NCSs
• HOPS servers – Tracers – ordinary hosts
• Virtual topology map of the Internet
9.11. Landmark based NCSs
• Landmarks
• All-to-all measurements
• Compute their own coordinates (Fig. a)
• Ordinary hosts
• Measure latencies towards the landmarks
• Compute their own coordinates from the landmarks' coordinates (Fig. b)
9.12. Global Network Positioning
• Phase 1
• Measuring inter-landmark latencies
• Calculating the landmark coordinates
• Minimizing the relative error function with the Downhill Simplex method
• Phase 2
• Measuring the latencies between node A and each of the landmarks (K required)
• Host A computes its own coordinates using the Downhill Simplex method
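Phase 2 for a single host can be sketched with a small Downhill Simplex (Nelder-Mead) implementation. Assumptions: a two-dimensional space, the sum of squared relative errors as the objective (our reading of "relative error function"), hypothetical landmark coordinates standing in for the output of phase 1, and RTTs that embed exactly. Phase 1 would apply the same minimiser to all landmark coordinates at once.

```python
import math


def nelder_mead(f, x0, step=10.0, iters=500):
    """Minimal Downhill Simplex with the standard coefficients (1, 2, 0.5, 0.5)."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if k == i else 0.0) for k, x in enumerate(x0)]
                            for i in range(n)]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[k] for p in simplex[:-1]) / n for k in range(n)]
        refl = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(refl) < f(best):                                # try to expand
            expd = [c + 2 * (r - c) for c, r in zip(centroid, refl)]
            simplex[-1] = expd if f(expd) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):                       # accept reflection
            simplex[-1] = refl
        else:                                                # contract towards centroid
            con = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(con) < f(worst):
                simplex[-1] = con
            else:                                            # shrink towards best
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
    return min(simplex, key=f)


def gnp_host_coordinates(landmarks, rtts):
    """Place a host so that its distances to the landmarks match its RTTs."""
    def error(x):
        return sum(((math.dist(x, lm) - d) / d) ** 2 for lm, d in zip(landmarks, rtts))
    start = [sum(c) / len(landmarks) for c in zip(*landmarks)]   # landmark centroid
    return nelder_mead(error, start)


landmarks = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]   # hypothetical phase-1 output
rtts = [math.dist((30.0, 40.0), lm) for lm in landmarks]
print(gnp_host_coordinates(landmarks, rtts))           # approximately [30.0, 40.0]
```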
9.13. Lighthouses
Lighthouses is a GNP extension
• Pias et al. (IPTPS 2003)
• Problems with GNP
• Huge volume of measurement traffic
• Growing number of target nodes
• Solution
• Using multiple landmark sets
• Each host measures distances to only one set
• It is based on the concept of
• Multiple local bases with a transition matrix
9.14. Lighthouses
9.15. Network Positioning System
Network Positioning System (NPS)
• Zhang et al. (USENIX ATC 2004) [26]
• Extends GNP into a hierarchical coordinate system
• All nodes could serve as landmarks
• Introduces layers and dependencies
• Recovering mechanism
• Landmark failures
• Performance bottlenecks
9.16. Internet Coordinate System
9.17. Internet Coordinate System
• Clustering scheme for landmarks
• An administrative node groups landmarks that are close to each other into clusters
• A joining node A uses the median nodes of the clusters
• Choosing the most representative landmarks
• Reducing the number of measurements
9.18. Virtual landmarks
9.19. Internet Distance Estimation Service
Internet Distance Estimation Service (IDES)
• Mao et al. (IMC 2004)
• Provides two learning algorithms that apply linear dimensionality reduction to the distance matrix
• Singular Value Decomposition (SVD)
• Non-Negative Matrix Factorization (NMF)
• Instead of Euclidean embedding
[Figure: four hosts H1-H4 with pairwise distances of 1 and 2, and one possible 2-D embedding]
• In this embedding, the estimated distance between H1 and H4 is 1.414, while the real distance is 2.0
• Extra dimensions don't help
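A small worked example, under an assumed reading of the figure: H1-H2-H4-H3-H1 form a ring of unit links, so H1-H4 and H2-H3 are 2 apart. In a Euclidean space of any dimension, d(H1,H4) = d(H1,H2) + d(H2,H4) forces H2 onto the midpoint of H1 and H4, and the same holds for H3, so H2 and H3 would coincide although they are 2 apart. An IDES-style model instead gives each host an outgoing vector X and an incoming vector Y and predicts distances as dot products; the 3-D vectors below were written by hand (they contain negative entries, so they correspond to an SVD-like factorization rather than NMF), whereas IDES learns them from measurements.

```python
import math

REAL = {frozenset(("H1", "H2")): 1, frozenset(("H1", "H3")): 1,
        frozenset(("H1", "H4")): 2, frozenset(("H2", "H3")): 2,
        frozenset(("H2", "H4")): 1, frozenset(("H3", "H4")): 1}


def real(a, b):
    return 0 if a == b else REAL[frozenset((a, b))]


# Euclidean embedding from the figure: the unit square.
square = {"H1": (0, 0), "H2": (1, 0), "H3": (0, 1), "H4": (1, 1)}
print(math.dist(square["H1"], square["H4"]))      # 1.4142135623730951

# Outgoing vectors X[a], incoming vectors Y[b]; distance = X[a] . Y[b].
X = {"H1": (1, 1, 0), "H2": (1, 0, 1), "H3": (1, 0, -1), "H4": (1, -1, 0)}
Y = {h: (v[0], -v[1], -v[2]) for h, v in X.items()}


def ides_distance(a, b):
    return sum(x * y for x, y in zip(X[a], Y[b]))


hosts = sorted(X)
print(all(ides_distance(a, b) == real(a, b) for a in hosts for b in hosts))   # True
```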
9.20. DISTRIBUTED NCS
9.21. Distributed NCSs
• Generalizing the role of landmarks
• to any node existing in the system
• or by eliminating the permanent infrastructure
• It can be seen as a peer-to-peer positioning system.
9.22. Practical Internet Coordinates
9.23. Big-Bang Simulation
• Big-Bang Simulation (BBS)
• Shavitt et al. (INFOCOM 2003)
• Modeling the network nodes as a set of particles
• Particles travel in the space under the effect of a potential force field
• They are initially placed at the origin of the space
• The field force is derived from the total embedding error
• Particles attract and repel each other depending on the distance error between them
• Reducing the potential energy of the whole system
• At the end of each phase an equilibrium is achieved
9.24. Big-Bang Simulation
9.25. Vivaldi
• Dabek et al. (ACM SIGCOMM 2004)
• The most successful NCS so far
• Used by some BitTorrent clients
• Does not require any fixed network infrastructure
• Computing the coordinates of a node A
• Collecting distance information for a set of neighbors
• Calculating its new coordinates from these measurements
• Each edge is modeled as a spring
• Handling high-error nodes
• Weights for each RTT sample
• Vivaldi converges quickly when latencies satisfy the triangle inequality
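A compact simulation of decentralized Vivaldi with the adaptive timestep. It follows the update rule of Dabek et al. as we understand it (sample weight w = e_i / (e_i + e_j), a moving average of the local error, timestep c_c * w); the constants c_c = c_e = 0.25, the three-dimensional coordinate space, the synthetic node positions and the random choice of neighbours are assumptions.

```python
import math
import random

CC, CE = 0.25, 0.25   # tuning constants c_c and c_e (assumed values)


def unit_vector(v, rng):
    norm = math.hypot(*v)
    if norm == 0.0:                     # co-located nodes: pick a random direction
        v = [rng.uniform(-1.0, 1.0) for _ in v]
        norm = math.hypot(*v)
    return [c / norm for c in v]


def vivaldi_sample(i, j, rtt, coords, errors, rng):
    """Node i processes one RTT sample to node j (adaptive timestep variant)."""
    dist = math.dist(coords[i], coords[j])
    w = errors[i] / (errors[i] + errors[j])            # trust in this sample
    rel_err = abs(dist - rtt) / rtt                    # error of this sample
    errors[i] = rel_err * CE * w + errors[i] * (1 - CE * w)
    delta = CC * w                                     # adaptive timestep
    u = unit_vector([a - b for a, b in zip(coords[i], coords[j])], rng)
    # Spring force: push i away from j if too close, pull it closer if too far.
    coords[i] = [a + delta * (rtt - dist) * c for a, c in zip(coords[i], u)]


def run(true_pos, updates=20000, dims=3, seed=1):
    """Simulate Vivaldi on RTTs generated from hidden 2-D positions."""
    rng = random.Random(seed)
    n = len(true_pos)
    coords = [[0.0] * dims for _ in range(n)]          # all nodes start at the origin
    errors = [1.0] * n
    for _ in range(updates):
        i, j = rng.sample(range(n), 2)
        vivaldi_sample(i, j, math.dist(true_pos[i], true_pos[j]), coords, errors, rng)
    return coords


def mean_relative_error(true_pos, coords):
    pairs = [(i, j) for i in range(len(coords)) for j in range(i + 1, len(coords))]
    return sum(abs(math.dist(coords[i], coords[j]) - math.dist(true_pos[i], true_pos[j]))
               / math.dist(true_pos[i], true_pos[j]) for i, j in pairs) / len(pairs)


nodes = [(30.0 * x, 30.0 * y) for x in range(4) for y in range(3)]   # hypothetical
print(round(mean_relative_error(nodes, run(nodes)), 3))
```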
9.26. Vivaldi
A variant of Euclidean coordinates
• Notion of height
• Height space
• Euclidean coordinate + a height vector
• To model the latency penalty of network access links
• Such as queuing delay, DSL lines or cable modems
• Representing the last-hop delays
• Distance between nodes
• Euclidean distance + the (positive) heights of both nodes
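The height-model distance can be written down directly. The node values are hypothetical; the point is that each node's height is paid once at its end of the path, so two nodes at the same Euclidean position are still h_a + h_b apart.

```python
import math


def height_distance(a, b):
    """Predicted RTT between two nodes in the height model.

    Each node is (euclidean_coords, height); the height models the node's
    access link, so it is added once for each endpoint.
    """
    (xa, ha), (xb, hb) = a, b
    return math.dist(xa, xb) + ha + hb


home = ((0.0, 0.0), 10.0)     # hypothetical DSL host with a 10 ms access penalty
server = ((30.0, 0.0), 5.0)   # hypothetical server 30 ms away in the core
print(height_distance(home, server))   # 45.0
```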
Some further extensions:
• E.g. Using hyperbolic space
9.27. Vivaldi
9.28. Vivaldi – Centralized algorithm
9.29. Vivaldi – Centralized algorithm
9.30. Distributed Vivaldi with constant timesteps
9.31. Vivaldi – Adaptive timesteps
9.32. Decentralized Vivaldi with adaptive timestep
9.33. Latency data for performance analysis
• PlanetLab all-to-all ping measurements
• 192x192 delay matrix
• King dataset involves 1740 DNS servers
9.34. Timestep choice
9.35. Convergence and robustness against high-error nodes
9.36. Communication patterns
9.37. Triangle Inequality Violations
9.38. Euclidean spaces
• The first question is how many dimensions to use
• 2 or 3 dimensions are sufficient
9.39. Spherical coordinates
9.40. Height model
9.41. Height model
9.42. Pharos - Hierarchical Vivaldi
• Y. Chen et al. (GLOBECOM 2007)
• A two-layer model based on Vivaldi
• Base overlay (Vivaldi)
• Local cluster
• Binning method with anchor nodes
• Each node has two sets of coordinates
• Global NC
• Local NC
9.43. Pharos – The algorithm
9.44. Hierarchical distance prediction
9.45. A two-tier ICS
• M. A. Kaafar et al. (IFIP-TC6 Networking ’08)
• Motivation
• Triangle inequality violations (TIVs) have negative effects on distance prediction
• The paper analyzes the proportion of TIVs
9.46. Triangular inequality violation
• M. A. Kaafar et al. (IFIP-TC6 Networking ’08)
• TIVs have negative effects on the distance prediction
• Analyzing the proportion of TIVs
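A sketch of how the proportion of TIVs can be measured on a delay matrix. The violation test is the standard triangle inequality; counting ordered triples is our choice, the matrix is made up, and the severity of each violation, which Kaafar et al. also study, is not modelled.

```python
from itertools import permutations


def tiv_fraction(d):
    """Fraction of ordered triples (i, j, k) with d[i][j] > d[i][k] + d[k][j],
    i.e. where the detour through k is shorter than the direct path."""
    triples = list(permutations(range(len(d)), 3))
    return sum(d[i][j] > d[i][k] + d[k][j] for i, j, k in triples) / len(triples)


# Hypothetical RTTs (ms): A-B is 100 ms, but A-C-B takes only 30 + 40 ms.
rtt = [[0, 100, 30],
       [100, 0, 40],
       [30, 40, 0]]
print(tiv_fraction(rtt))   # 0.3333333333333333
```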
9.47. Triangular inequality violation
• Impact of TIV severity on the embedding
9.48. Two-tier Vivaldi
• Two levels
• Higher level - flat Vivaldi
• Global coordinates
• Lower level – Vivaldi on clusters
• Local coordinates
• Clustering
• Flat Vivaldi – Clustering the delay space
• Bounded cluster diameter (in ms)
• Cross-checking and removing nodes with high errors
9.49. Two-tier Vivaldi
9.50. Limitations
• Expensive maintenance
• Only approximately accurate prediction
• Triangle inequality violations (TIV)
• The matrix factorization introduced in IDES can represent distances that violate the triangle inequality, as well as
asymmetric distances
• Better accuracy when considering a lower-dimensional space (e.g. 2 dimensions) [50], [51]
• The absolute relative error may not be the main indicator of the quality of an embedding
• Euclidean space vs. other spaces
• Surface of spheres or tori
• Hyperbolic model
• Instead of the height model
9.51. Benefits
• Benefit for P2P applications and overlays
• Azureus uses Vivaldi
• More than one million nodes
• It improves the efficiency of Azureus
• To improve the accuracy and stability
• Latency filters and application-specific coordinate updates
• Gossip-based coordinate updates
• TIV exclusion or awareness
• Azureus example: excluding 0.5 percent of the nodes (those causing TIVs) leads to a 20 percent improvement in global accuracy
9.52. Comparison of different techniques
9.53. Security in NCS
PIC
• Provides a test based on the triangle inequality
• Removes the nodes that violate the triangle inequality the most
• For all landmarks two bounds are introduced
• Upper and lower bounds
• This mechanism may degrade the performance of a clean system
9.54. Security in NCS
Internal attacks
• Participants are not completely trusted entities
• Attack families
• Isolation
• Independent isolation attack
• A malicious node M delays the measured RTT so that it is consistent with the random coordinates claimed for the
victim
• Repulsion
• A malicious node M claims a position far away from its actual one and then delays the measured RTT accordingly
• Disorder
• Random attack, similar to a DoS attack
• System control
• Colluding isolation attack
• Malicious nodes cooperate with each other
• First they behave correctly and honestly until enough of them become landmarks
9.55. Security in NCS
Generic security mechanisms
• Surveyor Infrastructure
• Surveyor nodes form a group of trusted entities
• Scattered across the network
• Formal reputation model to detect misbehaving nodes
• Reliability of nodes
• Reputation Computation Agent
• Certificate agent
• RVivaldi
• Veracity
9.56. Future directions
• NCSs focus on predicting network latency
• Other network characteristics
• Jitter
• Bandwidth
• Additional dimensions in existing latency space
• Inverse correlation between latency and bandwidth
• Lee et al. (PAM 2005) [72]
• Embedding additional performance indicators in their own metric space
• E.g. bandwidth
• Streaming or downloading
• Server selection
• Sequoia [74]
• Assuming that bandwidth is a tree metric
• Prediction trees
9.57. Literature
• F. Dabek, R. Cox, F. Kaashoek, R. Morris: Vivaldi: A Decentralized Network Coordinates System,
Proceedings of ACM SIGCOMM ’04, 2004, Portland, Oregon, USA
• Y. Chen, Y. Xiong, X. Shi, B. Deng, X. Li: Pharos: A Decentralized and Hierarchical Network Coordinate
System for Internet Distance Prediction, Proceedings of IEEE GLOBECOM 07, 2007, Washington, DC, USA
• M. A. Kaafar, B. Gueye, F. Cantin, G. Leduc, L. Mathy, Towards a Two-Tier Internet coordinate system to
mitigate the impact of Triangle Inequality Violations, In proceedings of IFIP-TC6 Networking 08, 2008,
Singapore, LNCS
10. IP geolocation
10.1. Motivation
• Location information can be useful for both private and corporate users
• Targeted advertising on the web
• Restricted content delivery
• Location-based security check
• Scientific applications
• Measurement visualization
• Network diagnostics
• Analysing spatial properties of the Internet
10.2. IP Geolocation in general
• Passive Geolocation
• Geolinguistic
• Registry based
• Whois, dns
• Organizational information
• Commercial databases
• Maxmind, IPLigence, IP2Location, etc.
• Active Geolocation
[Figure: example of a location hint in a DNS name: rt1.lon.uk.geant2.net => London, UK]
10.3. Whois based location estimation example for passive geolocation
• Cumulative distribution of the maximal distances from Pamplona, Spain to 4000 Google IPs. The maximal
distances are calculated from the network delays assuming a signal propagation speed of 200000 km/s. The
vertical line marks the real geographical distance between Pamplona and Mountain View, CA, showing
that 47 percent of the nodes must be closer to Pamplona than Mountain View is.
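The bound used in the caption is simple arithmetic: at 200000 km/s a signal covers 200 km per millisecond, and only half of the RTT is spent in one direction, so each RTT millisecond allows at most 100 km. A sketch, with a haversine great-circle distance (Earth radius of 6371 km assumed) to compare against; the RTT values in the example are made up.

```python
import math

KM_PER_MS = 200.0        # 200,000 km/s propagation speed, as in the caption


def max_distance_km(rtt_ms):
    """Largest distance compatible with an RTT: at most rtt/2 is spent one way."""
    return rtt_ms / 2 * KM_PER_MS


def great_circle_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Haversine distance between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def share_closer_than(rtts_ms, distance_km):
    """Fraction of targets whose RTT proves they are closer than distance_km."""
    return sum(max_distance_km(r) < distance_km for r in rtts_ms) / len(rtts_ms)


print(max_distance_km(20))                            # 2000.0
print(share_closer_than([10, 40, 120, 200], 9000))    # 0.5
```

Applied to the experiment above, distance_km would be the great-circle distance between Pamplona and Mountain View and rtts_ms the measured RTTs; those measurements are not reproduced here.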
10.4. IP Geolocation in general
• Passive geolocation (geolinguistic, registry based, commercial databases)
• Large and geographically dispersed IP blocks can be allocated to a single entity
10.5. IP Geolocation in general
• Active geolocation
• Active probing
• Delay, topology, etc.
• Landmarks
• With known location
• Location estimates for each individual IP address
[Figure: a target to be localized (182.214.37.1) and landmarks with known location, measuring RTTs of 20 ms, 39 ms, 31 ms and 50 ms]
• Transforming delays into geographic constraints
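Turning delays into constraints and intersecting them can be sketched with a grid search. Assumptions: each constraint uses the plain 200000 km/s bound rather than calibrated bestlines, the region is approximated by grid points, the estimate is the naive centroid of the region, and the landmark coordinates and RTTs are synthetic (the target's RTTs are its true distances inflated by 20 percent).

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Haversine distance between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def feasible_region(landmarks, rtts_ms, bbox, step=0.5, km_per_rtt_ms=100.0):
    """Grid points inside every delay constraint.

    Each RTT r becomes a disk of radius r * km_per_rtt_ms around its landmark
    (100 km per RTT millisecond corresponds to 200,000 km/s one way).
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    region = []
    for i in range(int(round((lat_max - lat_min) / step)) + 1):
        for j in range(int(round((lon_max - lon_min) / step)) + 1):
            p = (lat_min + i * step, lon_min + j * step)
            if all(great_circle_km(*p, *lm) <= r * km_per_rtt_ms
                   for lm, r in zip(landmarks, rtts_ms)):
                region.append(p)
    return region


def centroid(region):
    """Naive point estimate: mean latitude and longitude of the region."""
    return (sum(p[0] for p in region) / len(region),
            sum(p[1] for p in region) / len(region))


LANDMARKS = [(51.5, -0.1), (48.9, 2.35), (52.5, 13.4), (41.9, 12.5)]  # hypothetical
target = (47.5, 19.0)
rtts = [great_circle_km(*target, *lm) / 100.0 * 1.2 for lm in LANDMARKS]
region = feasible_region(LANDMARKS, rtts, bbox=(35.0, 60.0, -10.0, 30.0))
print(len(region), centroid(region))
```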
10.6. THE FIRST STEPS
10.7. IP2Geo – Single point localization
• Multi-pronged approach that exploits various "properties" of the Internet
• DNS names of router interfaces often indicate location
• network delay tends to correlate with geographic distance
• hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically
• GeoTrack
• determine location of closest router with a recognizable DNS name
• GeoPing
• use delay measurements to estimate location
• GeoCluster
• extrapolate partial (and possibly inaccurate) IP-to-location mapping information using BGP prefix clusters
10.8. GeoTrack – main idea
• Extract geographical information from DNS names of routers on the path
• Localizes the target to the last router whose position is known
• Example
ngcore1-serial8-0-0-0.Seattle.cw.net => Seattle
184.atm6-0.xr2.ewr1.alter.net => New York
dnvr-scrm.abilene.ucaid.edu => Denver
10.9. GeoTrack
• GeoTrack operation
• do a traceroute to the target IP address
• determine location of last recognizable router along the path
• Key ideas in GeoTrack
• partitioned city code database to minimize chance of false match
• ISP-specific parsing rules
• delay-based correction
• Limitations
• routers may not respond to traceroute
• DNS name may not contain location information or lookup may fail
• target host may be behind a proxy or a firewall
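A sketch of GeoTrack's core step. The three-entry city-code table is built from the examples in 10.8 and is hypothetical; GeoTrack itself relies on a large, partitioned database, ISP-specific parsing rules and a delay-based correction, none of which is modelled here.

```python
import re

CITY_CODES = {"seattle": "Seattle", "ewr": "New York", "dnvr": "Denver"}


def location_hint(hostname):
    """Return the first city code found in a router's DNS name, or None."""
    for token in re.split(r"[.\-]", hostname.lower()):
        city = CITY_CODES.get(token.rstrip("0123456789"))   # "ewr1" -> "ewr"
        if city:
            return city
    return None


def geotrack(traceroute_hostnames):
    """Locate the target at the last router on the path with a recognizable name."""
    location = None
    for name in traceroute_hostnames:
        location = location_hint(name) or location
    return location


path = ["dnvr-scrm.abilene.ucaid.edu",
        "ngcore1-serial8-0-0-0.Seattle.cw.net",
        "unnamed-router.example.net"]       # hypothetical hop without a hint
print(geotrack(path))   # Seattle
```

Returning the first match is a simplification: in "dnvr-scrm" both tokens could be city codes, which is the kind of false match the partitioned database is meant to avoid.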
10.10. GeoPing - Delay based localization
• Delay-based triangulation is conceptually simple
• delay to distance
• distances from 3 or more non-collinear points determine the target location
• But there are practical difficulties
• network path may be circuitous
• transmission and queuing delays may corrupt delay estimate
• OWD (one-way delay) is hard to measure because of routing asymmetry
10.11. GeoPing - details
• Measure the network delay to the target host from several geographically distributed probes
• typically more than 3 probes are used
• round-trip delay measured using ping utility
• small-sized packets - transmission delay is negligible
• pick minimum among several delay samples
• Nearest Neighbor in Delay Space (NNDS)
• akin to Nearest Neighbor in Signal Space (NNSS) in RADAR
• construct a delay map containing (delay vector,location) tuples
• given a vector of delay measurements, search through the delay map for the NNDS
• location of the NNDS is our estimate for the location of the target host
• More robust than directly trying to map from delay to distance
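A sketch of Nearest Neighbor in Delay Space. Using Euclidean distance between delay vectors and the three-entry delay map are assumptions; a real delay map contains many hosts with known location, measured from the same probes.

```python
import math


def delay_vector(samples_per_probe):
    """Minimum RTT per probe: the minimum is least affected by queuing."""
    return [min(samples) for samples in samples_per_probe]


def nnds(delay_map, target_vector):
    """Location of the entry whose delay vector is closest to the target's.

    delay_map: (delay_vector, location) pairs for hosts with known location,
    measured from the same probes in the same order as target_vector.
    """
    _, location = min(delay_map, key=lambda entry: math.dist(entry[0], target_vector))
    return location


# Hypothetical delay map for three probes (RTTs in ms).
delay_map = [([10, 45, 80], "Seattle"), ([70, 12, 40], "Denver"), ([85, 50, 8], "New York")]
target = delay_vector([[12, 15, 11], [48, 44], [83, 90]])   # [11, 44, 83]
print(nnds(delay_map, target))   # Seattle
```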
10.12. GeoCluster
• Unlike GeoTrack and GeoPing, a passive technique
• Basic idea:
• breaks the IP address space into clusters
• assign a geographical location to each cluster based on third-party IP-to-location databases
• given a target IP address, first find the matching cluster using longest-prefix match.
• location of matching cluster is our estimate of host location
10.13. GeoCluster
• Example:
• consider the cluster 128.95.0.0/16 (containing 65536 IP addresses)
• suppose we know that the location corresponding to a few IP addresses in this cluster is Seattle
• then given a new address, say 128.95.4.5, we deduce that it is likely to be in Seattle too
10.14. GeoCluster – Clustering IP addresses
• Exploit the hierarchical nature of Internet routing
• inter-domain routing in the Internet uses the Border Gateway Protocol (BGP)
• BGP operates on address aggregates
• we treat these aggregates as clusters
• in all we had about 100,000 clusters of different sizes
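The lookup step is a longest-prefix match, which Python's standard ipaddress module makes easy to sketch. The 128.95.0.0/16 to Seattle entry comes from the example in 10.13; the 203.0.113.0/24 documentation prefixes and their labels are hypothetical and only show that the most specific cluster wins.

```python
import ipaddress

CLUSTERS = {
    ipaddress.ip_network("128.95.0.0/16"): "Seattle",
    ipaddress.ip_network("203.0.113.0/24"): "Location A",
    ipaddress.ip_network("203.0.113.128/25"): "Location B",
}


def geocluster(address):
    """Location of the longest matching cluster prefix, or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in CLUSTERS if addr in net]
    if not matches:
        return None
    return CLUSTERS[max(matches, key=lambda net: net.prefixlen)]


print(geocluster("128.95.4.5"))      # Seattle
print(geocluster("203.0.113.200"))   # Location B (the /25 wins over the /24)
```

A linear scan is fine for a sketch; with about 100,000 clusters a radix tree would be the usual choice.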
10.15. Performance of GeoCluster
• Median errors:
• GeoCluster 30 km
• GeoPing 300 km
• GeoTrack 100 km
10.16. ADVANCED TECHNIQUES
10.17. Constraint Based Geolocation
• Constraint Based Geolocation (B. Gueye et al.)
• Strict geographic constraints derived from per-landmark bestlines
• Calibration for each landmark
• A few hundred reference nodes
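A sketch of per-landmark calibration under our reading of the bestline: the line RTT = m * distance + b that stays below all calibration samples, is at least as steep as the speed-of-light baseline (m >= 0.01 ms per km, i.e. 200000 km/s one way), has b >= 0, and lies as close to the samples as possible. This is a linear program whose optimum sits on a vertex, so the sketch simply checks all vertices. The calibration samples are made up.

```python
from itertools import combinations

BASELINE_SLOPE = 0.01   # ms of RTT per km at 200,000 km/s one way


def bestline(samples, m0=BASELINE_SLOPE, eps=1e-9):
    """samples: (great-circle distance km, RTT ms) pairs for one landmark."""
    # Each constraint line is (coef_m, coef_b, rhs): coef_m * m + coef_b * b = rhs.
    lines = [(g, 1.0, d) for g, d in samples] + [(1.0, 0.0, m0), (0.0, 1.0, 0.0)]
    best, best_score = None, None
    for (a1, c1, e1), (a2, c2, e2) in combinations(lines, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < eps:                      # parallel constraints: no vertex
            continue
        m = (e1 * c2 - e2 * c1) / det           # Cramer's rule
        b = (a1 * e2 - a2 * e1) / det
        feasible = m >= m0 - eps and b >= -eps and all(
            d >= m * g + b - eps for g, d in samples)
        # Minimising sum(d - m*g - b) is the same as maximising m*sum(g) + n*b.
        score = m * sum(g for g, _ in samples) + len(samples) * b
        if feasible and (best_score is None or score > best_score):
            best, best_score = (m, b), score
    return best


def distance_bound_km(rtt_ms, line):
    """Distance constraint implied by a measured RTT under this bestline."""
    m, b = line
    return (rtt_ms - b) / m


line = bestline([(100.0, 7.0), (1000.0, 25.0), (500.0, 20.0)])   # hypothetical
print(line, distance_bound_km(15.0, line))
```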
10.18. Constraint Based Geolocation
10.19. Octant IP geolocation framework
• Octant (B. Wong et al.)
• Similar to CBG, but it introduces negative constraints
• Calibration for each landmark
• A few hundred reference nodes
10.20. It is more than a simple method, it is a framework
• Combines very different techniques
• Active and passive
• Constraint-based
• Weighted positive and negative constraints
• Each constraint is a region
• Using Bézier regions
• Efficient implementations of clipping and union operations are available
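Octant intersects weighted Bézier regions; the sketch below swaps these for a flat grid and ring-shaped constraints to show how weighted positive and negative constraints combine. The coordinates, distances and weights are hypothetical, and scoring each point by the sum of the weights it satisfies is a simplification of Octant's method.

```python
import math


def score(point, constraints):
    """Sum of the weights of all constraints that a candidate point satisfies.

    constraints: (landmark_xy, min_km, max_km, weight). max_km is a positive
    constraint (the target is at most that far), min_km a negative one (the
    target is at least that far), so each constraint is a ring-shaped region.
    """
    return sum(w for lm, lo, hi, w in constraints if lo <= math.dist(point, lm) <= hi)


def best_region(constraints, size_km=1000, step_km=10):
    """Grid points on a flat size_km x size_km map with the highest total weight."""
    grid = [(x, y) for x in range(0, size_km + 1, step_km)
            for y in range(0, size_km + 1, step_km)]
    scores = {p: score(p, constraints) for p in grid}
    top = max(scores.values())
    return [p for p, s in scores.items() if s == top], top


# A strong and a weak constraint around (0, 0), and a conflicting one elsewhere.
constraints = [((0, 0), 0, 100, 1.0),
               ((0, 0), 0, 150, 0.5),
               ((1000, 1000), 0, 100, 1.0)]
points, top = best_region(constraints)
print(top, len(points))
```

Weights let a majority of consistent constraints outvote an erroneous one instead of producing an empty intersection.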
10.21. Notations
10.22. Octant – Landmarks and constraints
10.23. Estimated location
10.24. Mapping latencies to distances
10.25. Mapping latencies to distances
10.26. Mapping latencies to distances
10.27. Last hop delays
• Mapping is further complicated by queuing and transmission delays associated with the last hop
• Cable and DSL connections
• Overloaded PlanetLab nodes
• Goal: isolate the delay components which artificially inflate latencies
• Detailed maps of the underlying physical network, as in network tomography (not in Octant)
• Octant introduces a simple metric called height
10.28. Eliminating last hop delays in Octant
10.29. Last hop delays in Octant
10.30. Last hop delays
10.31. Results
10.32. Results
10.33. Spotter – a probabilistic approach
• Spotter (S. Laki et al.)
• Idea: the distances are not uniformly likely within a constraint
• For a given delay, do the distances follow a special distribution?
• Bayesian approach to calculate the spatial distribution of a target
10.34. Travel time – distance relation
• reference dataset (nodes with known location)
• known distance between the source and destination
• measured RTTs
10.35. Travel time – distance relation
• reference dataset (nodes with known location)
• known distance between the source and destination
• measured RTTs
• "slow" packets
• "fast" packets
10.36. Statistical delay-distance model
10.37. Statistical delay-distance model
• The distances are normally distributed for a given RTT
10.38. Statistical delay-distance model
• The distances are normally distributed for a given RTT
• (after standardization)
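A Spotter-style probabilistic triangulation can be sketched as follows: for each landmark, the distance to the target given the measured RTT is modelled as a normal distribution, and the unnormalized posterior over a grid is the product of these likelihoods under a uniform prior. The calibration functions `mu_km` and `sigma_km` below are made-up placeholders (Spotter fits its delay-distance model on the reference dataset); the grid search and all function names are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical calibration curves (placeholders, not Spotter's fitted values):
# expected distance and its spread for a given RTT.
def mu_km(rtt_ms):
    return 50.0 * rtt_ms


def sigma_km(rtt_ms):
    return 20.0 + 15.0 * rtt_ms


def log_posterior(lat, lon, measurements):
    """Unnormalized log-posterior of the target at (lat, lon), uniform prior.

    measurements: list of (landmark_lat, landmark_lon, rtt_ms).
    """
    total = 0.0
    for la, lo, rtt in measurements:
        z = (great_circle_km(la, lo, lat, lon) - mu_km(rtt)) / sigma_km(rtt)
        total += -0.5 * z * z - math.log(sigma_km(rtt))  # Gaussian log-density
    return total


def most_likely_location(measurements, step_deg=0.5, bbox=(-90.0, 90.0, -180.0, 180.0)):
    """Grid point (lat, lon) with the highest posterior inside bbox."""
    lat_min, lat_max, lon_min, lon_max = bbox
    best, best_lp = None, -math.inf
    for i in range(int(round((lat_max - lat_min) / step_deg)) + 1):
        for j in range(int(round((lon_max - lon_min) / step_deg)) + 1):
            lat, lon = lat_min + i * step_deg, lon_min + j * step_deg
            lp = log_posterior(lat, lon, measurements)
            if lp > best_lp:
                best, best_lp = (lat, lon), lp
    return best
```

Keeping the whole grid of posterior values, rather than only the maximum, gives the spatial distribution of the target.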
10.39. Evaluation – "Probabilistic triangulation"
10.40. Performance analysis
• Estimated accuracy for a geolocation ground truth
• CAIDA's Geolocation Comparison Survey
• More than 20000 reference nodes
• Located in North America and in Europe
• In North America
• 35 percent
• 9 percent
• 2 percent
• 70 percent
• 40 percent
• 27 percent
10.41. Topology-based Geolocation
10.42. Topology-based geolocation
• TBG (E. Katz-Bassett et al.)
• Problems with CBG
• Uses constraints based on propagation speeds lower than the speed of light
• Risk of underestimates
• When an underestimate occurs, the final region does not contain the true location
• Topology-based geolocation
• Uses the speed of light to generate constraints
• Inspired by sensor network localization
10.43. Summary of techniques
• Traceroute from landmarks
• Map topology
• Estimate hop latency
• Improve accuracy
• Cluster network interfaces
• Increase structuring
• Validate location hints
• Incorporate location hints
• Constraint optimization
• Geolocate targets
10.44. Estimate hop latencies
• Use traceroute to infer link latencies
• Estimate hop latency from the difference in RTT to adjacent routers
• Accurate only if the link is traversed in both directions (symmetric routing)
• How can we discover this property?
• Three different techniques
10.45. Estimate hop latencies
• Observing the reverse TTL values
• Most routers initialize the TTL values of their packets from a small set:
• 30, 32, 64, 128, 150, 255
• If the TTL value changes significantly from one node to the next, discard the link estimate
• Measuring paths in both directions between pairs of landmarks
• If both paths traverse a particular link, take the difference between the measurements to its two endpoints
• This estimate has high confidence
• Increasing the number of vantage points from which a link is probed
• For every link on a path from a landmark, we probe both endpoints from all other landmarks…
• If these probes traverse the link, they yield an estimate for it
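The reverse-TTL check and the RTT-difference estimate can be combined in a small helper. This sketch assumes that the initial TTL is the smallest listed value not below the observed one, uses a hypothetical tolerance of one hop between adjacent routers, and halves the RTT difference to obtain a one-way link latency. It illustrates the technique and is not the TBG code; the function names are hypothetical.

```python
COMMON_INITIAL_TTLS = (30, 32, 64, 128, 150, 255)  # values listed on the slide


def reverse_hops(observed_ttl):
    """Hop count of the reply path, guessing the initial TTL as the smallest
    common value that is >= the observed TTL."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial - observed_ttl
    raise ValueError("TTL cannot exceed 255")


def link_latency_ms(rtt_near_ms, rtt_far_ms, ttl_near, ttl_far, max_hop_change=1):
    """One-way latency estimate for the link between adjacent hops near -> far.

    Returns None if the reverse hop counts differ by more than max_hop_change
    (a hypothetical tolerance), i.e. the replies probably took different paths.
    """
    if abs(reverse_hops(ttl_far) - reverse_hops(ttl_near)) > max_hop_change:
        return None
    # The RTT difference covers the link twice (out and back); clamp noise at zero.
    return max(0.0, (rtt_far_ms - rtt_near_ms) / 2.0)
```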
10.46. Clustering interfaces
• Clustering interfaces that belong to the same router (IP aliases)
10.47. Clustering interfaces
• Two IP alias resolution techniques
• Mercator
• UDP probes are sent to high-numbered ports on a set of interfaces
• Routers send back an ICMP port-unreachable message whose source address is that of the router's outgoing interface
• If two different interfaces reply with the same source address, they are aliases
• Ally
• Used on pairs of interfaces
• Sends probes to the two interfaces
• Examines the IP-ID
• Most routers generate the IP-ID using a single counter that is incremented for each packet they create
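Once the probe replies are collected, both alias-resolution tests reduce to simple comparisons. The sketch below assumes the probing has already been done: for Mercator it takes the source address of each port-unreachable reply, and for Ally the IP-IDs of three replies received in the order x, y, x. The 200-packet gap threshold and the function names are assumptions, not values taken from Mercator or Ally.

```python
IPID_MOD = 1 << 16  # the IP-ID field is 16 bits wide


def mercator_aliases(replies):
    """replies: probed address -> source address of its ICMP port-unreachable reply.

    Returns groups of addresses that appear to belong to the same router.
    """
    groups = {}
    for probed, source in replies.items():
        groups.setdefault(source, {source}).add(probed)
    return [g for g in groups.values() if len(g) > 1]


def ipid_gap(earlier, later):
    """Forward distance between two IP-IDs, allowing for counter wraparound."""
    return (later - earlier) % IPID_MOD


def ally_aliases(id_x1, id_y, id_x2, max_gap=200):
    """Ally-style check on replies received from x, then y, then x again.

    A shared counter yields IDs that increase in that order and stay close;
    max_gap is an assumed threshold.
    """
    return 0 < ipid_gap(id_x1, id_y) <= max_gap and 0 < ipid_gap(id_y, id_x2) <= max_gap
```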
10.48. Validating location hints
• DNS names can encode locations
• Some names are incorrect
• Misnamed routers, reconfiguration, reassignment of IP addresses
• Topology constraints can be used to verify location hints
• RTT measurements: upper bounds on distance...
• Clustering: aliases
• Hop latencies
10.49. Constraint optimization
10.50. Constraint optimization
10.51. Results
10.52. Results
10.53. Other issues to be handled: indirect routes
• The preceding assumption
• Route lengths are proportional to great-circle distances
• Not the case in practice, due to policy routing
• Example: a subscriber in Ithaca, NY reaching Cornell Univ. (Ithaca)
• The route goes Syracuse, NY - Brockport, IL - New York City - Cornell Univ.
• 1 mile of physical distance vs. an 800-mile path
10.54. Other issues to be handled: discovering indirect routes
• Can a landmark's height indicate them?
• Localizing routers on the network path
• Secondary landmarks
• Localization by latencies
• Extracting locations from router names
• Reverse DNS lookup – undns tool
• Using ZIP code to determine geographical location
10.55. Other issues to be handled: handling uncertainty
• Filter out erroneous constraints
• Latency-based constraints
• Weights that decrease exponentially with increasing latency
• Weight threshold
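Exponentially decaying constraint weights with a cut-off take only a few lines. The time constant `tau_ms` and the `threshold` below are hypothetical tuning parameters, not values from Octant, and `constraint_weights` is an illustrative name.

```python
import math


def constraint_weights(latencies_ms, tau_ms=20.0, threshold=0.1):
    """Weight each latency-based constraint by exp(-latency / tau_ms).

    Constraints whose weight falls below threshold are dropped; the result is
    a list of (constraint index, weight) pairs for the kept constraints.
    """
    kept = []
    for i, rtt in enumerate(latencies_ms):
        w = math.exp(-rtt / tau_ms)
        if w >= threshold:
            kept.append((i, w))
    return kept
```

Long-latency measurements, whose distance bounds are loose and more likely to be inflated by indirect routes, thus contribute little or nothing.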
10.56. Other issues to be handled: iterative refinement
• Two phases:
• First, use accurate and mostly conservative constraints
• Second, use less accurate and more aggressive constraints to obtain a better estimate (inside the initially estimated region)
• And so on...
10.57. Literature
• Padmanabhan, Venkata N., and Lakshminarayanan Subramanian. An investigation of geographic mapping techniques for internet hosts. ACM SIGCOMM Computer Communication Review 31.4 (2001): 173-185.
• Gueye, Bamba, et al. Constraint-based geolocation of internet hosts. IEEE/ACM Transactions on Networking 14.6 (2006): 1219-1232.
• Wong, Bernard, Ivan Stoyanov, and Emin Sirer. Octant: A comprehensive framework for the geolocalization of Internet hosts. Proceedings of the NSDI. Vol. 7. 2007.
• Katz-Bassett, Ethan, et al. Towards IP geolocation using delay and topology measurements. Proceedings of the 6th ACM SIGCOMM conference on Internet measurement. ACM, 2006.
• Laki, Sándor, et al. Spotter: A model based active geolocation service. INFOCOM, 2011 Proceedings IEEE. IEEE, 2011.
11. Geography of the Internet: On the spatial properties of network topology
11.1. Network research
• 1959: Erdős and Rényi
• 1998: Watts and Strogatz
• 1999: Barabási and Albert
• Transport
• Biological
• Social
• Internet
11.2. The distance is what really counts.
• Is that really true?
• What can we say about the spatial structure of the Internet?
• P. Mátray, P. Hága, S. Laki, I. Csabai, G. Vattay
• On the Spatial Properties of Internet Routes
• Elsevier Computer Networks, Volume 56, Issue 9 (2012)
11.3. Data collection
• 700 PlanetLab nodes
• 400,000 traceroutes
• 16,000 unique IP addresses
11.4. Data collection
• Router locations estimated with Spotter
• S. Laki et al.: Spotter: A Model Based Active Geolocation Service, IEEE INFOCOM 2011, April 2011, Shanghai, China
• 13,000 filtered addresses
• 44,000 links
• How can such a data set be visualized?
11.5. Covered areas
11.6. Histogram maps
• Simple aggregation over grid cells (g1 ... g6 in the figure)
• Figure: a histogram map (labeled: San Francisco, Chicago, New York, Washington)
• Figure: a histogram map on log-scale (labeled: Los Angeles, Plano)
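A histogram map is, in its simplest form, a count of routers per grid cell. This sketch assumes 1-degree cells keyed by their south-west corner and uses log10(1 + count) for the log-scale variant; both choices and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def histogram_map(coords, cell_deg=1.0):
    """Count routers per lat/lon cell; each key is the cell's south-west corner."""
    counts = Counter()
    for lat, lon in coords:
        cell = (math.floor(lat / cell_deg) * cell_deg, math.floor(lon / cell_deg) * cell_deg)
        counts[cell] += 1
    return counts


def log_scale(counts):
    """log10(1 + count) per cell, for a log-scale histogram map."""
    return {cell: math.log10(1 + n) for cell, n in counts.items()}
```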
11.7. Transforming spatial distributions
11.8. Transforming spatial distributions
11.9. Transforming spatial distributions
11.10. Transforming spatial distributions
11.11. A router-likelihood map
11.12. Likelihood of router positions - US
11.13. Likelihood of router positions - US
• Green dots represent the most populated areas
11.14. Characterizing the link length
• Link length is approximated by the spherical distance between the two routers
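Link lengths can be computed with the haversine formula on a spherical Earth, and the link-length frequency plots on the following slides are histograms of these values, either counting each unique link once or weighting it by its number of occurrences in the traceroutes. A minimal sketch, assuming a mean Earth radius of 6371 km and 100 km bins; `link_length_histogram` and its input format are hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def link_length_histogram(links, bin_km=100.0, weighted=False):
    """Histogram of link lengths as bin index -> count.

    links: iterable of ((lat1, lon1), (lat2, lon2), occurrences), one entry per
    unique router-level link. With weighted=False each link counts once; with
    weighted=True it counts as often as it appeared in the traceroutes.
    """
    hist = Counter()
    for (lat1, lon1), (lat2, lon2), occurrences in links:
        length = great_circle_km(lat1, lon1, lat2, lon2)
        hist[int(length // bin_km)] += occurrences if weighted else 1
    return hist
```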
11.15. Characterizing the network links
• Which links are important?
• Which cities are the most interconnected?
• Which link length is the most frequent?
• How to model link length distribution?
• What can be said about the spatial structure of the network?
• etc.
11.16. Frequency of link lengths
• which links are frequent and important?
11.17. Frequency of link lengths
• each link is represented once
• which links are frequent and important?
11.18. Frequency of link lengths
• each link is represented once
• which links are frequent and important?
11.19. Frequency of link lengths
• each link is represented once
• Length ranges marked in the figure: urban range, intracontinental, Atlantic Ocean, Pacific Ocean
• which links are frequent and important?
11.20. Frequency of link lengths
• each link is represented once
• links are weighted by their prevalence in the traceroute data set
11.21. Frequency of link lengths
• each link is represented once
• links are weighted by their prevalence in the traceroute data set
• LA - Houston: 121,000 occurrences, 39 unique links, 1.5 percent of all traffic!
11.22. Frequency of link lengths
• each link is represented once
• links are weighted by their prevalence in the traceroute data set
• Amsterdam - New York and Frankfurt - Washington
• Copenhagen - New York and Paris - Washington
11.23. Frequency of link lengths
• Only a handful of gateway cities: "spatial hubs".
11.24. Distribution of link lengths
• do router distances follow specific rules?
11.25. Distribution of link lengths
• logarithmic relation, where
• do router distances follow specific rules?
11.26. Distribution of link lengths
• logarithmic relation, where
• do router distances follow specific rules?
• Similar phenomena found in:
• Social networks: D. Liben-Nowell et al., PNAS (2005)
• Mobile communication networks: R. Lambiotte et al., Physica A (2008)
• E-mail networks: J. Goldenberg, M. Levy, arXiv:0906.3202
• Early Internet data: S. H. Yook et al., PNAS (2001)
11.27. Distribution of link lengths
• logarithmic relation, where
• do router distances follow specific rules?
• J. Kleinberg: Navigation in a small world, Nature (2000)
• Connection to navigability
11.28. Distribution of link lengths
• logarithmic relation, where
• power law, where
• do router distances follow specific rules?
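A quick way to check a power-law relation P(d) ∝ d^(-α) on a binned link-length histogram is a least-squares line fit in log-log space. This is a rough sketch and does not reproduce the specific formulas from these slides: log-log regression is known to be biased compared with maximum-likelihood estimators, and `fit_power_law` is a hypothetical name.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log y = log C - alpha * log x.

    xs, ys: positive bin centers and frequencies. Returns (C, alpha).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope
```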
11.29. The embedded topology
11.30. Characterizing network paths
• Circuitousness
• Direction dependence of lateral deviations
• Hop distance analysis
• Symmetry of Internet routes
11.31. Aggregated path length
• The sum of the lengths of the consecutive links.
11.32. Circuitousness
• Geographic, geopolitical and economical factors also affect routing
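With the aggregated path length defined as the sum of the consecutive link lengths, a natural circuitousness measure is its ratio to the great-circle distance between the two endpoints, so that a value of 1 means a perfectly direct route. The ratio definition is an assumption here, since the slides do not spell out the formula; the sketch uses the haversine distance on a spherical Earth, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def aggregated_path_length_km(path):
    """Sum of the lengths of consecutive links; path is a list of (lat, lon)."""
    return sum(great_circle_km(*a, *b) for a, b in zip(path, path[1:]))


def circuitousness(path):
    """Aggregated path length divided by the endpoint-to-endpoint distance."""
    direct = great_circle_km(*path[0], *path[-1])
    if direct == 0:
        raise ValueError("endpoints coincide")
    return aggregated_path_length_km(path) / direct
```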
11.33. Symmetry
11.34. Symmetry
• A: United Kingdom – Hong Kong
• B: California, USA – Hong Kong
• C: California, USA – Singapore
11.35. Direction dependence of lateral deviations
11.36. Unfamiliar routing phenomenon?
• RTT 300 ms
11.37. Literature
• Kleinberg, Jon M. Navigation in a small world. Nature 406.6798 (2000): 845-845.
• Laki, Sándor, et al. Spotter: A model based active geolocation service. INFOCOM, 2011 Proceedings IEEE. IEEE, 2011.
• Zhang, Yin, and Nick Duffield. On the constancy of Internet path properties. Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement. ACM, 2001.
• Mátray, Péter, et al. On the network geography of the internet. INFOCOM, 2011 Proceedings IEEE. IEEE, 2011.
• Mátray, Péter, et al. On the spatial properties of internet routes. Computer Networks (2012).
12. Network traffic analysis, clustering and classification
12.1. Traffic
• ISP
• Internet
12.2. Traffic classification
• ISP
• Internet
• What protocol and application has generated the traffic?
12.3. Traffic classification
• Identifying the applications and protocols generating IP flows
• Why is it important?
• Adaptive, network-based QoS mapping
• Lawful interception
• User behavior analysis
• Interest from industry
• ISPs want to know who is using their network and what for
12.4. Quality of Service (QoS)
• Quality of service is the ability to provide different priorities to
• Different applications
• Users
• data flows
• or to guarantee a certain level of performance to a data flow.
• Different characteristics can be guaranteed
• Bit rate, delay, jitter, packet dropping, etc.
• QoS is often referred to as a quality measure, but there are many different definitions.
12.5. Traffic Classification
• Many new challenges
• Various peer-to-peer applications
• Traffic encryption
• Standard port numbers can be misleading
• Standard “static” methods cannot be applied anymore
• New approaches are needed
12.6. Traffic Classification
• Port
• Transport layer ports
• Payload
• Pattern matching
• DPI tools
• Flow statistics
• Inter arrival times, packet sizes, etc.
• Social
• Connection patterns
• User/host level statistics
• Very old-fashioned!!!
• Recently used
• Experimental
• not too efficient
• Hard to implement in a real network
12.7. Different approaches
12.8. Deep Packet Inspection
• What is it about?
• Stateful inspection on packet header and payload
• Signature based pattern matching
• Why is it a big deal?
• High speed in-line processing (at wire speed)
• Low memory and storage consumption
• Low false positive and miss rates
• Good performance even in worst cases
• Why is it so important?
• Network Intrusion Detection, Lawful Interception, QoS, Censorship, Traffic Blocking, etc.
12.9. Deep Packet Inspection Basics
• The classical solution is based on a DFA
• Aho-Corasick DFA algorithm (1975)
• Word set: a, ab, bc, bca, c, caa
• Consumes one byte (character) per lookup cycle
• 10GbE/OC-192: about 1 gigabyte/sec.
• Too many state transitions even for such a small set
• Figure: the DFA for this word set, starting from the initial state
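A compact Aho-Corasick automaton for the slide's word set can be built from a trie (the goto function), failure links computed breadth first, and output sets. This Python version also consumes one character per step, but it is an illustrative sketch: it follows failure links at search time instead of precomputing the full DFA transition table that a wire-speed implementation would use. The function names are hypothetical.

```python
from collections import deque


def build_automaton(words):
    """Return (goto, fail, out) for the Aho-Corasick automaton of `words`."""
    goto, fail, out = [{}], [0], [set()]
    for word in words:  # 1) build the trie (goto function)
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    queue = deque(goto[0].values())  # 2) failure links, breadth first
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches that end here as well
    return goto, fail, out


def search(text, goto, fail, out):
    """Return sorted (start_index, word) pairs for every occurrence in text."""
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return sorted(matches)
```

For example, scanning "abcaa" with the word set a, ab, bc, bca, c, caa reports every overlapping occurrence in a single pass.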
12.10. Multi-byte pattern matching
• w1: apple
• w2: application
• w3: appeal
• w4: peal
• w5: appreciate
• Figure: the words stored in a trie with multi-character edges (app, l, e, ication, eal, reciate, peal)
• One character per lookup, but some speed-up can be achieved
12.11. Deploying multiple multi-byte DFAs
• Example input stream: ...xyzapplicationzy...
• Table replicas for different offsets
• Higher memory complexity
• One lookup for each offset
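12.12. True positives vs. false positives etc.
• A ground truth data set with true labels
• Generated by a reliable DPI tool or manually
• Classifying flows of a KNOWN class X versus OTHER traffic
• TP(X) = number of flows of class X classified as X
• FN(X) = number of flows of class X classified as something else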
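Per-class TP and FP ratios can be computed from the ground-truth labels and the classifier output, either per flow or weighted by bytes. The sketch assumes the common definitions TP ratio = TP/(TP+FN) and FP ratio = FP/(FP+TN); the exact definitions used in the cited evaluations may differ, and `class_ratios` is a hypothetical name.

```python
def class_ratios(truth, predicted, cls, weights=None):
    """Return (TP ratio, FP ratio) for class cls.

    Counts are per flow, or in bytes when weights (per-flow byte counts) are given.
    """
    if weights is None:
        weights = [1] * len(truth)
    tp = fn = fp = tn = 0
    for t, p, w in zip(truth, predicted, weights):
        if t == cls:
            if p == cls:
                tp += w
            else:
                fn += w
        elif p == cls:
            fp += w
        else:
            tn += w
    tp_ratio = tp / (tp + fn) if tp + fn else 0.0
    fp_ratio = fp / (fp + tn) if fp + tn else 0.0
    return tp_ratio, fp_ratio
```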
12.13. Performance of different DPI tools
• Note
• RF is a Random Forest-based classification approach
• RF's TP ratio is usually at least as good as that of the DPI methods,
• while the DPIs usually have a better FP ratio
• This is crucial for protocol-dependent policies
12.14. Classical recipe for flow statistic-based traffic classification
• Calculate features based on flow statistics
• Inter-packet delays
• Min., max., avg. packet sizes
• Total number of bytes in the flow
• Number of certain TCP flags
• Etc.
• Train a classifier on a ground truth dataset
• E.g. a Support Vector Machine or Random Forest
• Validate the approach using e.g. ten-fold cross-validation
• Split the training data into 10 sets of equal size
• Train on 9 of the sets and test on the remaining one
• Repeat this procedure in all 10 possible ways
• Which features are the best?
• How to decide?
• What about overfitting?
• Is the ground truth general enough???
• For more details read the survey by Thuy T.T. Nguyen et al.
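The recipe above can be sketched with the standard library: a few flow-statistic features and a ten-fold split. The packet representation `(timestamp_s, size_bytes, tcp_flags)`, with flags as a string such as "SA", is a hypothetical format; the classifier itself (SVM, Random Forest) is left out, and the function names are illustrative.

```python
import random
from statistics import mean


def flow_features(packets):
    """packets: list of (timestamp_s, size_bytes, tcp_flags) for one flow."""
    times = [t for t, _, _ in packets]
    sizes = [s for _, s, _ in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-packet delays
    return {
        "min_size": min(sizes),
        "max_size": max(sizes),
        "avg_size": mean(sizes),
        "total_bytes": sum(sizes),
        "avg_gap": mean(gaps) if gaps else 0.0,
        "syn_count": sum("S" in flags for _, _, flags in packets),
    }


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices); every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]
```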
12.15. Statistical payload analysis
• Statistical characterization of data in a flow
• Modeling a protocol
• Byte distribution on flows
• A useful protocol model
• Expressive
• Compact
• Automatic
• no human expertise/work is needed.
• A few attempts: KISS, Markov models
12.16. KISS: Stochastic Packet Inspection
• A. Finamore et al.
• The method consists of three phases:
• Statistical characterization of traffic
• Look for the behavior of unknown traffic
• Assign the class that best fits it
• Check for false positives
12.17. Chi-square statistics
• Figure: chi-square value of each 4-bit chunk of the payload bytes; chunks are labeled Time, Deterministic, Counter or Random (Source: A. Finamore et al.)
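The chi-square signature used by KISS compares, for each 4-bit group of the payload, the observed value counts over C packets of a flow with a uniform distribution (an expected C/16 per value). A constant field reaches the maximum value 15·C, while random or counter-like fields give much smaller values. This is a minimal sketch assuming every payload has at least n_groups/2 bytes; the default of 24 groups and the function names are illustrative, not KISS's exact settings.

```python
def nibbles(payload, n_groups):
    """First n_groups 4-bit groups of a payload, high nibble first."""
    out = []
    for byte in payload:
        out.extend((byte >> 4, byte & 0x0F))
        if len(out) >= n_groups:
            break
    return out[:n_groups]


def kiss_signature(payloads, n_groups=24):
    """Chi-square statistic per 4-bit group against a uniform distribution.

    payloads: payloads (bytes) of C packets of the same flow.
    """
    c = len(payloads)
    expected = c / 16.0
    rows = [nibbles(p, n_groups) for p in payloads]
    signature = []
    for g in range(n_groups):
        counts = [0] * 16
        for row in rows:
            counts[row[g]] += 1
        signature.append(sum((o - expected) ** 2 / expected for o in counts))
    return signature
```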
12.18. Decision process
• Statistical characterization of bits in the flow
• Each flow is a vector in a G-dimensional space
• Decision process
• Classification of these vectors based on
• Euclidean distance – minimum distance
• Maximum likelihood – Support Vector Machine
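The minimum-distance variant of the decision process can be sketched as a nearest-centroid classifier: each class is represented by the mean of its training vectors, and a flow is assigned to the class with the closest centroid in Euclidean distance. The function names are hypothetical, and the SVM alternative is not shown.

```python
import math


def centroids(training):
    """training: class name -> list of equal-length feature vectors."""
    return {cls: [sum(col) / len(vecs) for col in zip(*vecs)]
            for cls, vecs in training.items()}


def classify_min_distance(vec, cents):
    """Class whose centroid is closest to vec (Euclidean distance)."""
    return min(cents, key=lambda cls: math.dist(vec, cents[cls]))
```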
12.19. Validation on a real traffic trace
• RTP errors are due to unreliable training data
• (the DPI did not identify RTP v1)
• DNS errors are due to an impure training set
• (the oracle considered all port 53 traffic to be DNS)
• EDK errors are (maybe) Xbox Live traffic
• (requires proper training for "other")
• FN are always below 3 percent!!!
• Source: A. Finamore et al.
12.20. Early Identification of Peer-To-Peer Traffic
• B. Hullár et al. (IEEE ICC 2011)
• Similar goals to KISS
• Based on the statistical analysis of packet payloads
• Modeling a protocol
• Byte distribution on flows
12.21. Modeling a flow
• Feature vector
• The first X payload bytes of the first Y packets per flow, where XY is small
• Figure: packets 1, 2, ..., Y; their first X bytes give XY bytes as features
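Building the feature vector from the first X payload bytes of the first Y packets takes only a few lines. The sketch zero-pads short payloads and missing packets, which is an assumption rather than a documented choice of the paper; `payload_features` is a hypothetical name.

```python
def payload_features(packet_payloads, x_bytes=16, y_packets=2, pad=0):
    """Concatenate the first x_bytes of the first y_packets payloads (XY values).

    Short payloads and missing packets are filled with `pad`.
    """
    feats = []
    for i in range(y_packets):
        payload = packet_payloads[i] if i < len(packet_payloads) else b""
        chunk = list(payload[:x_bytes])  # bytes -> list of ints 0..255
        feats.extend(chunk + [pad] * (x_bytes - len(chunk)))
    return feats
```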
12.22. Classification via probabilistic models
• Krichevsky-Trofimov (KT) estimator
• Zero order model
• Memoryless distribution over blocks of the payload
• The KT estimator provides smoothed estimates for unseen data
• Markov
• First order Markov-model
• Low memory footprint
• MarkovKT
• First order model
• Using Krichevsky-Trofimov estimator
• Low memory footprint
• CTW
• Context Tree Weighting Method
• Lossless data compression
• Higher order model (at most 5 in our case)
• Combining exponentially many variable-order Markov models.
• Random Forest
• State-of-the-art classification technique
• Using the estimate of many decision trees for improving accuracy
• Robust against noise
• Prone to overfitting
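A zero-order (memoryless) byte model with Krichevsky-Trofimov smoothing assigns probability (n_b + 1/2) / (n + 256/2) to byte value b, where n_b is its training count and n the total number of training bytes; a flow is assigned to the class whose model gives the highest log-likelihood. This sketch applies the KT formula as a static smoothed estimate from the training data, a simplification of the sequential KT estimator; the class and function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = 256  # byte values


class KTZeroOrderModel:
    """Memoryless byte model with Krichevsky-Trofimov (add-1/2) smoothing."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def fit(self, samples):
        for sample in samples:  # each sample is a bytes object
            self.counts.update(sample)
            self.total += len(sample)
        return self

    def log_prob(self, sample):
        denom = self.total + ALPHABET / 2.0
        return sum(math.log((self.counts[b] + 0.5) / denom) for b in sample)


def classify(sample, models):
    """Class whose model assigns the highest log-likelihood to sample."""
    return max(models, key=lambda cls: models[cls].log_prob(sample))
```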
12.23. Data Collection for ground truth
• Ground truth data
• captured in a fully controlled environment
• labeled by a modified kernel module
• Fully trusted
• recorded at ELTE
• disadvantage: not very diverse data
• advantage: exact class information (uncommon in the literature)
• WIRELESS data set
• WiFi and 3G
• traces with full payload
• July 2010
• LAN data set
• high-speed LAN
• only the first 16 payload bytes per packet
• November 2009
12.24. Experiments
• Evaluation on the labeled traces
• True Positive and False Positive Ratio in bytes and flow numbers
• Cross validation
• Measurements with different parameter values
• training set size
• used bytes and packets
• Robustness
12.25. Feasibility test
• Classification from the first 16 bytes of the first packet of each flow
12.26. Feasibility test
• Classification from the first 16 bytes of the first packet of each flow
12.27. How much data is needed?
12.28. How much data is needed?
• Ten-fold cross validation
• Experiments
12.29. How much data is needed?
• Ten-fold cross validation
• Experiments
12.30. Robustness
• Similar results were obtained in the following scenarios:
• Unknown traffic
• Trained on Wireless tested on LAN
• Asymmetric routing: using the first reverse packet/reverse flow
• Real traffic traces
12.31. Training set sizes
• How many training flows are needed if only the first packet is considered?
12.32. Is it protocol independent?
• Using around 30 bytes of the payload is sufficient
• in contrast to the 300-400 bytes used by KISS
12.33. Robustness Asymmetric routing
• Observing traffic from just one direction of a flow
• Classification based on the first backward packet
12.34. Robustness Unknown traffic
• A new class has been introduced
• "Others" containing unknown traffic types
12.35. Real traffic traces
• Various protocols:
• FTP, XMPP, DirectConnect, Gnutella, SIP, SSH, RTSP, POP3, UPnP, Windows, Source-engine, xbox, Opera-Mini-sockets, DNS, HTTP, IMAP, RTMP, BitTorrent, WAP, WoW, RTP, PPStream, SMTP
• 16GB traffic in 365k flows
• Some classes make up a very high percentage (HTTP, BitTorrent),
• others have only a few flows (e.g. FTP 80 flows, XMPP 70 flows)
12.36. Confusion matrix
12.37. Literature
• Aho, Alfred V., and Margaret J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM 18.6 (1975): 333-340.
• Nguyen, Thuy TT, and Grenville Armitage. A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys and Tutorials 10.4 (2008): 56-76.
• Finamore, Alessandro, et al. Kiss: Stochastic packet inspection. Traffic Monitoring and Analysis. Springer Berlin Heidelberg, 2009. 117-125.
• Hullár, Béla, Sándor Laki, and András György. Early identification of peer-to-peer traffic. 2011 IEEE International Conference on Communications (ICC). IEEE, 2011.
13. Measurements in peer-to-peer networks
13.1. Centralized VS P2P
13.2. Peer-to-peer networks
• Peers can act as clients and servers
• Different approaches
• Centralized, decentralized
• Structured and unstructured
• Tracker-based
• Dynamic membership
• Voluntary
• Scalable and reliable
13.3. Peer-to-peer networks
• P2P apps generate the majority of the Internet traffic
• Issues: legality, volatility, scalability
13.4. Some P2P protocols
• Napster
• Pseudo p2p, centralized index, mp3 distribution
• Gnutella (LimeWire, BearShare, WinMX)
• Fully decentralized, distributed searching
• Kademlia (eMule, Overnet)
• Decentralized, DHT for lookup, XOR of node keys as distance metric, structured
• Kazaa
• Supernodes, closed source, hierarchical approach
• BitTorrent
• Tracker-based, decentralized, tit-for-tat, choking
13.5. What do we want to measure?
• P2P applications are widely used nowadays
• The majority of the traffic
• Important to understand their characteristics to develop better algorithms
• ISPs also want to know the effect of these applications on their network
13.6. How can we do that?
• Collect traffic traces and classify p2p traffic
• Build a P2P crawler to crawl a real world application and then collect the crawler’s traffic
• Different P2P systems require different solutions
13.7. Gnutella
• How does it work?
• Connection
• Host list from GWebCache or a locally stored file
• Ping/pong messages between potential neighbors
• Content lookup
• Query messages flooding on the network
• QueryHit message propagates back to the source from peers having the content
• Download
• The source directly downloads the file from peers having the content
13.8. Gnutella vs Napster
• S. Saroiu et al. (MMCN '02)
• What can we say about ...?
• Latency
• Lifetime of peers
• Bottleneck bandwidth
• Neighborhood size
• Etc.
• An active crawler was used for data collection
13.9. Gnutella vs Napster Lifetime of the peers
• 20 percent of the peers have an Internet host uptime of more than 90 percent
• Application uptime is higher for Napster peers than for Gnutella ones
• Median session duration is the same for both p2p networks
13.10. Gnutella vs Napster Shared files vs Shared data
• Strong correlation between the number of shared files and the amount of shared data
• The slope of both lines is 3.7 MB, the typical size of an MP3 file
13.11. Gnutella Latencies and downstream bandwidth
• 60 percent of the peers have a latency between 70 and 280 ms
• Two clusters can be identified
• Figure labels: 70 ms, 280 ms; 20-60 Kbps modem connections; Mbps broadband connections
13.12. Kademlia
13.13. Kademlia
• Subtrees for node 0011...
• Each subtree has a k-bucket (up to k neighboring nodes)
13.14. Kademlia
• Node 0011... searches for 1110...
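With the XOR metric, the distance between two IDs is their bitwise XOR, and the subtree (k-bucket) a contact falls into is given by the highest bit in which the two IDs differ. The sketch uses 4-bit toy IDs matching the prefixes on the slides (real Kademlia IDs are 160 bits); the function names are hypothetical.

```python
def xor_distance(a, b):
    """Kademlia distance between two integer IDs."""
    return a ^ b


def bucket_index(own_id, other_id):
    """Index of the highest differing bit, i.e. which subtree other_id lies in.

    The largest index denotes the half of the ID space not containing own_id.
    """
    d = own_id ^ other_id
    if d == 0:
        raise ValueError("identical IDs")
    return d.bit_length() - 1


def closest(target, known_ids, k=3):
    """The k known IDs closest to target under the XOR metric."""
    return sorted(known_ids, key=lambda n: n ^ target)[:k]
```

A lookup for 1110... from node 0011... therefore starts in the far subtree (bucket 3) and then repeatedly asks the closest known contacts for even closer ones.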
13.15. Kademlia
• R. Bhagwan et al.
• Focusing on the Overnet network
• Using peer IDs to calculate availability
• Availability: percentage of time a user or a machine is online
• How can it be measured?
• The Crawler takes a snapshot of all the peers by requesting 50 random IDs and repeats this regularly
• The Prober goes through the list of available IDs and checks their availability by sending requests to them
13.16. Kademlia Collected data
• Data was collected from January 14 to January 28, 2003
• About 40,000 hosts discovered by one crawl
• Per day (6 crawls): between 70,000 and 90,000 unique hosts
• 1468 of the 2400 randomly selected hosts responded at least once to the Prober’s requests
13.17. Kademlia Peers with dynamic IPs
• Percentage of hosts that have multiple IPs during a longer period of time
13.18. Kademlia Peer availability
• Availability values for 7 days
• Based on host IDs
• Based on the first IP seen for each host ID
• 0.07
• 0.3
• Using IP addresses would thus underestimate availability.
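The difference between ID-based and IP-based availability appears as soon as a host changes its address. In this sketch each probe round is a set of (host ID, IP) pairs seen online; the snapshot format and the function names are hypothetical.

```python
def availability_by_id(snapshots, host_id):
    """Fraction of probe rounds in which host_id was seen online."""
    seen = sum(any(h == host_id for h, _ in snap) for snap in snapshots)
    return seen / len(snapshots)


def availability_by_first_ip(snapshots, host_id):
    """Fraction of rounds in which the first IP seen for host_id was online."""
    first_ip = next((ip for snap in snapshots for h, ip in sorted(snap) if h == host_id), None)
    if first_ip is None:
        return 0.0
    seen = sum(any(ip == first_ip for _, ip in snap) for snap in snapshots)
    return seen / len(snapshots)
```

A host that is online in three of four rounds but gets a new address each time scores 0.75 by ID and only 0.25 by its first IP.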
13.19. IP-based availability is similar to what we have seen for Gnutella
13.20. How can duration affect availability?
• the longer the period of time, the greater the chances of a host being unavailable
13.21. Time of day effects
• 80 percent of all host pairs lie in this interval
• Strong independence
13.22. BitTorrent
• Efficient and very popular file sharing system
• Unstructured P2P network
• Where the story began
• 2001 – Bram Cohen – BitTorrent Inc.
• And nowadays...
• In 2009, 27-55 percent of the overall Internet traffic
• Fundamentals:
• Tit-for-tat
• Incentive
• Pieces and Blocks
13.23. File Sharing
• How to share content:
• The peer creates a .torrent file containing:
• (1) metainformation about the file to be shared
• (2) information about the tracker, which stores the peers
• Downloading:
• The .torrent file needs to be downloaded
• The peer connects to the tracker and obtains a set of neighboring peers
• Figure: a web server hosting .torrent files (Harry Potter.torrent, Transformer.torrent, The Lord of Ring.torrent)
13.24. *.torrent
• URL of the tracker
• Pieces
• Size of a piece
• Filenames
• File sizes
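The .torrent file is a bencoded dictionary. The sketch below encodes a minimal single-file metainfo with the `announce`, `name`, `length`, `piece length` and `pieces` (concatenated SHA-1 digests of the pieces) keys from the BitTorrent specification; the tracker URL and file contents are made up, optional keys are omitted, and the function names are hypothetical.

```python
import hashlib


def bencode(value):
    """Bencode ints, str/bytes, lists and dicts (dict keys sorted as raw bytes)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(((k.encode("utf-8") if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def make_metainfo(tracker_url, name, data, piece_length=262144):
    """Minimal single-file .torrent content for `data` split into pieces."""
    pieces = b"".join(hashlib.sha1(data[i:i + piece_length]).digest()
                      for i in range(0, len(data), piece_length))
    return bencode({"announce": tracker_url,
                    "info": {"name": name, "length": len(data),
                             "piece length": piece_length, "pieces": pieces}})
```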
13.25. The Tracker
• Maintains a list of peers in the network
• IP address, port, peer id
• State information (Completed or Downloading)
• It answers requests with a random list of peers
• Neighboring peers
13.26. BitTorrent
• Seeder
• a peer that has the whole file (all the pieces)
• Initial seeder
• the peer that has the initial copy of the file to be shared
• Leecher
• a peer that downloads data from others
• Figure: an initial seeder, a seeder and two leechers
13.27. An example
• Seeder: A
• Leecher B
• 1,2,3,4,5,6,7,8,9,10
• 1,2,3
• Leecher C
• 1,2,3
• 1,2,3,4
• 1,2,3,9
• 1,2,3,4,9
13.28. File sharing
• Initial seeder subdivides the file into pieces
• Leecher
• Contacts the tracker to obtain a neighborhood
• As soon as a piece has been downloaded, it can be shared with others
• After obtaining all the pieces, the peer assembles the file and becomes a seeder.
• The more pieces are downloaded, the more replicas are available in the network
13.29. Lifetime of a torrent Seeders and leechers
• Initial interest
13.30. Pieces and sub-pieces
• Pieces and sub-pieces
• Generally, a piece consists of sub-pieces of 16 KB.
• While a piece is incomplete, its sub-pieces are downloaded with high priority
• The goal is to have the entire piece as soon as possible
• Transmission
• Data is transmitted over TCP (or UDP in some newer clients)
• Many requests can be handled in parallel
• Typically five
• When a sub-piece arrives, a new request is sent out
13.31. Piece Selection
• It is crucial for good performance
• Rare pieces issue
• The worst case is when some pieces are missing from the network.
• In such case, if the initial seeder stops, the file cannot be reassembled.
• What is the good strategy?
13.32. Piece Selection
• Strict Priority
• Once a peer has received a sub-piece of a new piece, the remaining sub-pieces of that piece are downloaded with high priority.
• Goal: having a new complete piece
• Primary rule
• Rarest First
• General rule: a peer always requests the locally rarest piece (locally = within its neighborhood)
• Goal: avoid the situation where some pieces are missing from the network
• Random First Piece
• This policy is applied to download the first piece. A peer chooses its first piece at random and downloads it from one or more neighboring peers.
• Goal: obtaining a complete first piece as soon as possible, so that the peer has something to share
• Endgame Mode
• If only a few pieces are missing, the peer requests the remaining sub-pieces from several peers in its neighborhood at once. Once the piece has been downloaded, it stops requesting.
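The first three policies can be combined into one selection function. The sketch below makes some simplifying assumptions: each neighbor's advertised pieces are given as a set, a single random generator breaks ties, and endgame mode is left out. The function and parameter names are illustrative.

```python
import random
from collections import Counter
from typing import Optional


def select_piece(have: set[int], neighbor_bitfields: list[set[int]],
                 partial: set[int], rng: random.Random) -> Optional[int]:
    """Choose the next piece to request.

    have: pieces this peer has completed
    neighbor_bitfields: pieces advertised by each neighbor
    partial: pieces of which some sub-pieces are already downloaded
    """
    available = set().union(*neighbor_bitfields) - have
    if not available:
        return None
    # Strict priority: finish a piece that is already in progress.
    in_progress = sorted(partial & available)
    if in_progress:
        return in_progress[0]
    # Random first piece: with nothing to share yet, get any piece quickly.
    if not have:
        return rng.choice(sorted(available))
    # Rarest first: fewest copies in the neighborhood, ties broken at random.
    copies = Counter(p for bf in neighbor_bitfields for p in bf if p in available)
    fewest = min(copies.values())
    return rng.choice(sorted(p for p, c in copies.items() if c == fewest))


if __name__ == "__main__":
    rng = random.Random(1)
    neighbors = [{1, 2, 3, 4}, {1, 2, 3, 4, 9}]
    print(select_piece({1, 2, 3}, neighbors, set(), rng))  # piece 9 has one copy
```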
13.33. Choking
• There’s no central resource management
• Each peer tries to maximize its download rate
• Tit-for-tat: a peer uploads to the peers that upload to it
• Choking
• A peer can temporarily refuse to upload
• Handling free-riders
• Peers that only download
[Figure: peers Alice and Bob, with choked connections marked]
13.34. Choking algorithm
• Peer A sends a choke message to peer B
• This happens when A decides to stop uploading to B
• The rechoking period is 10 s
• Based on the download rate
• Each peer uploads to at most four neighboring peers
• Three of them are the peers with the highest download rates
• Tit-for-tat
• The fourth is chosen randomly (optimistic unchoke)
• The worst neighbor is replaced periodically
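A minimal sketch of the rechoking step follows. It assumes that each peer tracks a recent download rate for every interested neighbor. The function names are hypothetical. The three regular slots, the random fourth slot, and the 10 s and 30 s periods follow these notes. The timer loop that would call the functions is omitted.

```python
import random
from typing import Optional

REGULAR_SLOTS = 3  # tit-for-tat slots; the fourth slot is the optimistic unchoke


def rechoke(download_rate: dict[str, float], optimistic: Optional[str]) -> set[str]:
    """Called every rechoking period (10 s); returns the unchoked neighbors.

    download_rate: recent bytes/s received from each interested neighbor
    optimistic: the current optimistic-unchoke peer, rotated separately
    """
    ranked = sorted(download_rate, key=lambda p: download_rate[p], reverse=True)
    unchoked = set(ranked[:REGULAR_SLOTS])  # reciprocate the best uploaders
    if optimistic is not None and optimistic in download_rate:
        unchoked.add(optimistic)
    return unchoked  # every other neighbor is choked


def rotate_optimistic(neighbors: list[str], regular: set[str],
                      rng: random.Random) -> Optional[str]:
    """Called less often (every 30 s): pick a random neighbor outside the top slots."""
    candidates = [p for p in neighbors if p not in regular]
    return rng.choice(candidates) if candidates else None


if __name__ == "__main__":
    rates = {"B": 50.0, "C": 20.0, "D": 80.0, "E": 5.0, "F": 10.0}
    top = rechoke(rates, None)
    opt = rotate_optimistic(list(rates), top, random.Random(7))
    print(sorted(rechoke(rates, opt)))
```

Keeping the optimistic slot separate from the rate ranking is what lets new or slow peers be tried out without displacing the best reciprocating neighbors.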
13.35. Optimistic unchoke
• New connections can be tried out
• There is a chance to find better neighbors
• Avoiding starvation of new peers
• New peers also have to download pieces
• The worst peer is choked every 30 s
13.36. Upload only mode
• After downloading all the pieces, the leecher becomes a seeder
• Whom should it upload to?
• To the peer with the highest upload rate
• The goal is to create a new seeder in the network as soon as possible
• This assumes that the user will not stop sharing the file after becoming a seeder
13.37. Lifetime of a torrent
• L. Guo et al. (IMC 2005)
• Exponential decay of peer requests
• Initial interest, but then the number of requests decreases rapidly
• The download rates of leechers show high variance across different time periods
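One simple way to check a request trace for exponential decay is to fit a straight line, by least squares, to the logarithm of the request counts. The sketch below assumes a model of the form N(t) = N0 · exp(-t / tau) with counts given per time bin. It is a generic fit and not necessarily the procedure used by Guo et al. The function name is hypothetical.

```python
import math


def fit_exponential_decay(times: list[float], counts: list[float]) -> tuple[float, float]:
    """Least-squares fit of counts ~ n0 * exp(-t / tau) on a log scale.

    Returns (n0, tau). Zero counts are skipped because log(0) is undefined.
    """
    pts = [(t, math.log(c)) for t, c in zip(times, counts) if c > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two non-zero observations")
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx  # equals -1 / tau for a decaying series
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


if __name__ == "__main__":
    hours = list(range(0, 200, 10))
    synthetic = [1000 * math.exp(-t / 24) for t in hours]  # synthetic data, not measured
    print(fit_exponential_decay(hours, synthetic))
```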
13.38. Peer behaviour
• The overall contribution of a peer decreases as its download rate increases
• The probability that a peer exits is basically independent of its download speed and the amount of already
downloaded data.
13.39. Peer behaviour
• The lifetime of most torrents is between 30 and 300 hours; the average is 8.5 hours
• The average population size is 102 peers, which is fairly small
• The average seeding time is 8.42 hours
13.40. Literature
• Saroiu, Stefan, P. Krishna Gummadi, and Steven D. Gribble. Measurement study of peer-to-peer file sharing
systems. Electronic Imaging 2002. International Society for Optics and Photonics, 2001.
• Bhagwan, Ranjita, Stefan Savage, and Geoffrey M. Voelker. Understanding availability. Peer-to-Peer
Systems II. Springer Berlin Heidelberg, 2003. 256-267.
• Guo, Lei, et al. Measurements, analysis, and modeling of BitTorrent-like systems. Proceedings of the 5th
ACM SIGCOMM conference on Internet Measurement. USENIX Association, 2005.
14. 14 Analysis of online social networks
14.1. Be socialized
• Social networks are basically graphs
• Vertices are people
• Edges are relations between them
• Friends, followers, etc.
• Different online social networks
• For different communities
• with shared interests
• With different edge types
• Many more connections than a user has in real life
• Online friends may never meet
14.2. Increasing interest
• Source: B2Ce.Consultancy company websites
14.3. Twitter users
14.4. Social Flow
• Twitter network related to #SahelNow.
14.5. Twitter
• Microblog system
• People can write short messages (tweets) and share their thoughts with their followers
• People can subscribe to follow others, who then become their friends
• People can retweet tweets from others
• Hashtags can be used to mark the topic of a tweet
• Tweets can also contain URLs (bit.ly, tinyurl)
• The structure of this network can be represented as a directed graph where
• Vertices are Twitter users
• Directed edges between users
• Two names for the same edge: Friend and follower
• B is a friend of A if A follows B
• A is a follower of B if A follows B
• There are other graph structures which can be used to analyze other properties of this network
• E.g. retweet graphs, etc.
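The friend/follower terminology maps directly onto a directed graph stored as two adjacency sets, one for each edge direction. Here is a minimal sketch. The class and method names are hypothetical.

```python
from collections import defaultdict


class FollowGraph:
    """Directed Twitter-style graph: an edge A -> B means 'A follows B'."""

    def __init__(self) -> None:
        self.out_edges: dict[str, set[str]] = defaultdict(set)  # whom a user follows
        self.in_edges: dict[str, set[str]] = defaultdict(set)  # who follows a user

    def follow(self, a: str, b: str) -> None:
        self.out_edges[a].add(b)
        self.in_edges[b].add(a)

    def friends(self, user: str) -> set[str]:
        """B is a friend of A if A follows B."""
        return set(self.out_edges.get(user, ()))

    def followers(self, user: str) -> set[str]:
        """A is a follower of B if A follows B."""
        return set(self.in_edges.get(user, ()))
```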
14.6. Why do we analyze it?
• Advantages
• Representative
• Full spectrum of communication from mass media and celebrities to ordinary users
• Easy tracking of information flow
• Retweets, URLs, etc.
• Data is available
• Drawbacks
• Twitter is not the most popular social media platform
• It is only one communication channel
• Hard to measure its effect on real life
14.7. Is the data available?
• Twitter streaming API (the complete stream is called the Firehose)
• A 1 percent sample of all tweets is freely available
• 5 million tweets per day
• 0.5 GB of data per day
• About 180 GB per year
• Twitter graph is not fully available
• But it can be crawled using the Twitter API
• And there are follower graph snapshots
• Crawling is time-consuming and subject to several limitations (e.g. API rate limits)
• Getting the entire graph is almost impossible
14.8. Quantifying influence
• E. Bakshy et al. (WSDM’11)
• Twitter follower graph (July 2009) + 1.03B tweets
• Analyzing the URLs posted
• 87M tweets with bit.ly URLs
• 1.6M users seeded an average of 46.33 bit.ly URLs each
14.9. How to measure influence?
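• For a given URL, let's define the influence score as follows.
• The posting time is known for each URL
• If B follows A, A posted the URL before B, and A was the only one of B's friends to post it, we say A influenced B to post the URL.
• What if several of B's friends posted the URL?
• We can consider three different ways of assigning credit:
• First: all credit goes to the friend who posted the URL first
• Last: all credit goes to the friend who posted it most recently before B
• Shared: the credit is split equally among all friends who posted it before B
[Figure: example credit assignment along a time axis under the First, Last and Shared schemes]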
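The three credit-assignment schemes can be written as one small function. This is a sketch that assumes posting times are known for B and for each of B's friends. The function name and the string scheme labels are illustrative.

```python
def assign_credit(friend_post_times: dict[str, float], b_time: float,
                  scheme: str) -> dict[str, float]:
    """Distribute one unit of influence credit for B's post of a URL.

    friend_post_times: when each of B's friends posted the URL
    b_time: when B posted it
    scheme: "first", "last" or "shared"
    """
    earlier = {f: t for f, t in friend_post_times.items() if t < b_time}
    if not earlier:
        return {}  # no friend posted before B: B seeded the URL
    if scheme == "first":
        return {min(earlier, key=earlier.get): 1.0}
    if scheme == "last":
        return {max(earlier, key=earlier.get): 1.0}
    if scheme == "shared":
        return {f: 1.0 / len(earlier) for f in earlier}
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    times = {"A": 1.0, "C": 3.0, "D": 5.0}  # D posted after B, so D gets no credit
    for s in ("first", "last", "shared"):
        print(s, assign_credit(times, 4.0, s))
```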
14.10. Cascades
• We can construct disjoint influence trees, called cascades, for each initial posting of a URL.
14.11. Cascade sizes and depths
• Cascade sizes approximately follow a power-law distribution
• Depths resemble an exponential distribution
• Both figures imply that the vast majority of posted URLs do not spread at all
[Figure: distributions of cascade sizes and depths; large and deep cascades are marked as extremely rare]
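When each user has exactly one credited parent (for example, under the First scheme), the cascades form disjoint trees. Their sizes and depths can then be computed by traversing each tree. The sketch assumes a `parent` dictionary (a hypothetical name) in which seeds map to `None`. The seed is counted in the size, and a URL that never spread has depth 0.

```python
from collections import defaultdict
from typing import Optional


def cascade_stats(parent: dict[str, Optional[str]]) -> dict[str, tuple[int, int]]:
    """(size, depth) of every cascade, keyed by its seed user.

    parent maps each user who posted the URL to the friend credited with
    influencing them, or to None for an initial (seed) posting.
    """
    children: dict[str, list[str]] = defaultdict(list)
    for user, p in parent.items():
        if p is not None:
            children[p].append(user)
    stats = {}
    for root in (u for u, p in parent.items() if p is None):
        size, depth = 0, 0
        stack = [(root, 0)]
        while stack:  # iterative depth-first traversal of one tree
            node, d = stack.pop()
            size += 1
            depth = max(depth, d)
            stack.extend((c, d + 1) for c in children[node])
        stats[root] = (size, depth)
    return stats
```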
14.12. How to predict influence?
• Aggregate all URLs posted by a user
• The influence of a user
• the logarithm of the average size of all cascades seeded by the user
• A regression tree is used to predict influence
• Five-fold cross-validation
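The influence score and the five-fold split can be sketched using only the standard library. The regression tree itself is not reproduced here. One option, which is an assumption rather than the authors' setup, would be to fit scikit-learn's `DecisionTreeRegressor` on each training fold. Cascade sizes are assumed to include the seed, so the logarithm is always defined. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Iterator


def user_influence(cascades: list[tuple[str, int]]) -> dict[str, float]:
    """Log of the average cascade size over all cascades seeded by each user.

    cascades: (seed_user, cascade_size) pairs; sizes include the seed, so >= 1.
    """
    sizes: dict[str, list[int]] = defaultdict(list)
    for user, size in cascades:
        sizes[user].append(size)
    return {u: math.log(sum(s) / len(s)) for u, s in sizes.items()}


def k_fold_splits(items: list, k: int = 5, seed: int = 0) -> Iterator[tuple[list, list]]:
    """Yield (train, test) lists for k-fold cross-validation."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```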
14.13. Regression tree for influence prediction
14.14. Past influences vs followers
• Past local influence versus number of followers for all users
• The size of the circles represents the actual average influence
• For the top 25 users having the highest actual influence values
14.15. Information flow on twitter
• S. Wu et al. (ACM WWW ‘11)
• Data
• Full follower graph
• 5B tweets collected between 2009 and 2010
• Elite Twitter users categorized into four classes
• Celebrities, media, organizations, bloggers
• Snowball sample of Twitter lists
• Analyzing how bit.ly URLs spread among different user groups
• Two-step flow of information
• 46 percent of URLs spread along the class chain:
• Elite - intermediate - ordinary
14.16. How to identify Elite users?
14.17. Snowball sample of Twitter lists
• Users who appear in the pruned set of lists together with Lady Gaga (e.g. lists named "celebrity" or "celebs")
• And so on
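Snowball sampling over lists alternates between users and the lists they appear in. The sketch below assumes that the crawled data are available as two dictionaries (hypothetical names). It models the pruning step as a `keep` predicate on list names, for example keeping only lists called "celebs".

```python
from typing import Callable


def snowball_sample(seeds: list[str],
                    lists_of_user: dict[str, list[str]],
                    members_of_list: dict[str, list[str]],
                    rounds: int,
                    keep: Callable[[str], bool] = lambda name: True) -> set[str]:
    """Expand users -> lists -> users for a fixed number of rounds."""
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(rounds):
        next_frontier = []
        for user in frontier:
            for name in lists_of_user.get(user, []):
                if not keep(name):  # pruning step
                    continue
                for member in members_of_list.get(name, []):
                    if member not in sampled:
                        sampled.add(member)
                        next_frontier.append(member)
        frontier = next_frontier
    return sampled
```

The result depends on the seeds, which is exactly the bias the next slide addresses.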
14.18. Activity sample of Twitter users
• Snowball sampling is potentially biased by our particular choice of seeds.
• Let’s crawl all lists associated with all users who tweeted at least once every week for our entire observation
period.
• This sample is biased towards users who are consistently active
• but the bias is likely to be quite different from that of the snowball sample
14.19. Who listens to whom?
• Share of tweets received between elite categories
• It only shows how many URLs category i receives from category j; this is a weak measure of attention, because many tweets go unread.
14.20. Who listens to whom?
• Retweets between elite categories
14.21. Two step information flow
• Two-step flow theory (Katz and Lazarsfeld 1955)
• Media exerts indirect influence on the masses via an intermediate layer of opinion leaders
• A typical information flow on Twitter
• Elite - intermediate - ordinary
• Direct flows
• 46 percent of media-originated information is received through intermediaries
[Figure: information flow from elite users via intermediate users to ordinary users]
14.22. Who are the intermediaries?
• A large population (490K users) acts as intermediaries for 600K users
• Most (99 percent) are ordinary
• Also receive information via two-step flows
• More exposed to the media
14.23. Who are the intermediaries?
• Opinion leadership is not a binary value.
• Consistent with the original two-step flow theory
14.24. Network Dynamics
• J. Leskovec et al. (ACM SIGKDD ‘09)
• Memetracking
• Tracking new topics, ideas, and "memes" across the social network
• Finding exact quotes in blogs and plotting their volume over time
• This makes news cycles visible
14.25. Memetracking
14.26. Memetracking
14.27. Collective attention on Twitter
• J. Lehmann et al. (WWW'12)
• Data
• 130 M Tweets and a follower graph with 2.7M users
• Two parameters
• Number of mentions before the peak
• Number of mentions after the peak
• Clustering in this two dimensional space
• They identified four clusters
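The two features can be computed from a per-hashtag time series, as sketched below. Daily bins are an assumption, and raw counts are returned, as in the description above. Normalizing the counts by total volume before clustering would be a further, hypothetical step. The function name is illustrative.

```python
def peak_features(daily_mentions: list[int]) -> tuple[int, int, int]:
    """(peak day, mentions before the peak, mentions after the peak)."""
    if not daily_mentions:
        raise ValueError("empty time series")
    # The first day with the maximum count is taken as the peak.
    peak = max(range(len(daily_mentions)), key=daily_mentions.__getitem__)
    before = sum(daily_mentions[:peak])
    after = sum(daily_mentions[peak + 1:])
    return peak, before, after
```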
14.28. Collective attention on Twitter
14.29. Collective attention on Twitter
14.30. Collective attention on Twitter
• Semantic makeup of the hashtag classes:
• columns represent peak types and rows correspond to topics,
• i.e., concepts in the WordNet semantic lexicon.
• The radius of a circle is proportional to the average normalized frequency of the topic in the corresponding
hashtag class.
• The displayed topics represent the most frequently observed generic concepts.
• Sample terms subsumed by them are given in parentheses.
14.31. Literature
• Bakshy, E., and Hofman, J. (2011). Everyone’s an influencer: quantifying influence on twitter. Proceedings of
the fourth ACM international conference on Web search and data mining.
• Wu, S., Hofman, J. M., Mason, W. a., and Watts, D. J. (2011). Who says what to whom on twitter.
Proceedings of the 20th international conference on World wide web - WWW ’11, 705.
doi:10.1145/1963405.1963504
• Leskovec, J., Backstrom, L., and Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle.
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
• Yang, J., and Leskovec, J. (2011). Patterns of temporal variation in online media. Proceedings of the fourth
ACM international conference on Web search and data mining
• Lehmann, J., and Gonçalves, B. (2012). Dynamical classes of collective attention in twitter. Proceedings of
the 21st international conference on World Wide Web
• Wu, S., Tan, C., Kleinberg, J., and Macy, M. (2011). Does bad news go away faster. Proc. 5th International
AAAI Conference on Weblogs and Social Media, 2011
15. 15 Measurements in mobile and cellular networks
15.1. Internet and cellular networks
• Internet usage has changed dramatically
• Mobile traffic is expected to grow rapidly in the near future
• 4G/LTE networks will provide much higher bandwidth (100/50 Mbps downlink/uplink) and lower latencies on the last hop
• More than 1B Internet-enabled smartphones worldwide
• Out of the total 5B mobile phones
• According to go-gulf.com in 2012
15.2. What can measurements reveal?
• For ISPs to improve their services
• Resource provisioning and allocation
• Identification of bottlenecks
• More user friendly policies
• A direct way to measure QoE
• For end users and governments, to enforce contracts and laws
• QoS provided by the ISP
• Network neutrality
• Effect of firewall and other middleboxes
15.3. A widely heterogeneous environment
• Various mobile devices
• Android, iOS, RIM, MS, Symbian
• Various access technologies
• WiMax, Wifi, LTE, CDMA2000, HSDPA, UMTS
• A device context can consist of
• Network type, signal strength, cell ID, RRC/DRX state, etc.
• Device type, screen state, battery state, time of day, etc.
• Sensor data like GPS coordinates, acceleration, etc.
15.4. How do different access technologies affect performance?
• Y. Guo et al. (2013)
• Measurement setting
• The MobiPerf mobile application was used to collect data
• Available on Google Play
• TCP connections transferring randomized data for 2 to 5 minutes
• The phone was kept stationary during the transfer
• Downlink: server to mobile device; uplink: vice versa
• Laboratory experiments
• Throughput was sampled every 0.5 seconds
• Measurements from the first 10 seconds were dropped
• The connection needs time to stabilize
• Throughput varies strongly in the first 10 seconds
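The post-processing described above (0.5 s samples, with the first 10 s discarded) can be sketched as follows. The sketch assumes that the client logs the cumulative number of received bytes every 0.5 seconds. Only the two constants come from the text; the function name and input format are illustrative.

```python
import statistics

SAMPLE_INTERVAL_S = 0.5
WARMUP_S = 10.0


def throughput_samples(cum_bytes: list[int]) -> list[float]:
    """Mbps per 0.5 s interval, skipping intervals that start in the first 10 s.

    cum_bytes[i] is the total number of bytes received at time i * 0.5 s.
    """
    samples = []
    for i in range(1, len(cum_bytes)):
        start = (i - 1) * SAMPLE_INTERVAL_S
        if start < WARMUP_S:
            continue  # connection still ramping up (slow start)
        delta_bits = (cum_bytes[i] - cum_bytes[i - 1]) * 8
        samples.append(delta_bits / SAMPLE_INTERVAL_S / 1e6)
    return samples


if __name__ == "__main__":
    log = [125_000 * i for i in range(41)]  # synthetic 20 s trace at a constant rate
    print(statistics.median(throughput_samples(log)))
```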
15.5. HSDPA downlink
• Positive correlation between signal strength and throughput
15.6. HSDPA Uplink
• Basically there is no correlation,
• but the throughput is very low
15.7. LTE Downlink
15.8. LTE Uplink
• Signal strength is a factor that affects the performance of wireless access, especially LTE
15.9. What is the problem with the first 10 seconds?
• LTE Downlink
• TCP slow start can take a long time
• For other access technologies it may be even worse
15.10. Large scale measurements
• J. Huang et al. (2013)
• MobiPerf had 99k users from across the world in 2009
• This much larger data set enables us to analyze characteristics on a larger scale
• Downlink/Uplink
• LDNS Lookup and Coverage
• Latency and reachability
15.11. Performance of different access technologies
• Access technologies used by MobiPerf users
• WiFi, 3G (UMTS family and CDMA family), EDGE and GPRS
• Within 3G: HSDPA, pure UMTS, 1xRTT, EVDOA
15.12. Performance of different access technologies
• WiFi has the highest throughput (median: 1.46 Mbps)
• GPRS and EDGE perform the worst
• The UMTS family outperforms the CDMA family (median: 964 and 368 Kbps, respectively)
15.13. Performance of different access technologies
• Why does the UMTS family outperform the CDMA family?
15.14. Performance of different access technologies
• The UMTS family has smaller RTTs than the CDMA family (median: 495 and 680 ms, respectively)
• TCP throughput is lower with higher RTT and loss rate
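The link between RTT, loss and throughput can be made concrete with the well-known approximation by Mathis et al. (1997): throughput ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p). This model is brought in here for illustration and does not appear in the original measurements. It ignores timeouts and receive-window limits, and the MSS and loss rate used below are hypothetical.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput: (MSS / RTT) * sqrt(3/2) / sqrt(p)."""
    if rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("rtt must be positive and 0 < loss_rate < 1")
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


if __name__ == "__main__":
    # Median RTTs from the measurements; MSS and the 1 percent loss rate are hypothetical.
    for family, rtt in (("UMTS", 0.495), ("CDMA", 0.680)):
        print(family, round(mathis_throughput_bps(1460, rtt, 0.01) / 1e3), "Kbps")
```

With equal loss, the model predicts that throughput scales with 1/RTT. The measured gap between the two families is larger than this ratio, which is consistent with the higher retransmission rates discussed next.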
15.15. Performance of different access technologies
• 1xRTT is one of the earliest CDMA 3G technologies
• It has high RTTs and retransmission rates, which degrade TCP throughput.
15.16. Performance of different access technologies
• The variation of TCP downlink RTT is often called jitter.
• Some applications do not tolerate high jitter, so it also matters for the design of mobile applications
• The observed jitter
• WiFi: 41 ms
• UMTS family: 93 ms
• CDMA family: 233 ms
• Jitter affects the user experience
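The text above does not state which jitter estimator was used. The sketch below therefore shows two common choices: the standard deviation of the RTT samples, and the mean absolute difference between consecutive samples. Both function names are illustrative.

```python
import statistics


def jitter_ms(rtts_ms: list[float]) -> float:
    """Jitter as the sample standard deviation of the RTTs."""
    if len(rtts_ms) < 2:
        raise ValueError("need at least two RTT samples")
    return statistics.stdev(rtts_ms)


def mean_consecutive_diff_ms(rtts_ms: list[float]) -> float:
    """Alternative: mean absolute difference between consecutive RTT samples."""
    if len(rtts_ms) < 2:
        raise ValueError("need at least two RTT samples")
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return sum(diffs) / len(diffs)
```

The two measures can rank paths differently. Slow drift raises the standard deviation but barely affects consecutive differences, whereas rapid alternation affects both.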
15.17. Performance of different access technologies
• The uplink throughput difference is less obvious.
• The median is 110 Kbps for the UMTS family and 120 Kbps for the CDMA family.
• Within the 3G family, all the median uplink throughputs are below 150 Kbps.
15.18. Long-term trend and daily patterns
• Downlink throughput degrades in AT&T's network during peak hours
• However, T-Mobile and Verizon users do not experience much difference
• There is only slight degradation
15.19. Long-term trend and daily patterns
• The RTT values in T-Mobile's network seem to be stable.
• Downlink RTT explains the throughput fluctuation for AT&T and Verizon.
15.20. Long-term trend and daily patterns
• There is a significant increase in jitter during peak hours for AT&T and Verizon, which may degrade the user experience for applications with little tolerance for jitter.
15.21. Measuring DNS lookup time in 3G networks
• DNS queries to resolve the IP of a server located in the U.S.
• Median DNS lookup time for different areas (cell size is 50 km x 50 km)
15.22. Measuring DNS lookup time in 3G networks
• High DNS overhead could lead to longer delays when downloading websites from CDNs that use DNS-based load balancing (e.g. Akamai)
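A lookup-time measurement of this kind can be approximated on a client by timing a call to the system resolver. This is a sketch only. `socket.getaddrinfo` goes through the operating system's resolver and caches, so repeated queries may be much faster than the first one, and they do not necessarily isolate the latency of the local DNS server. The function name is hypothetical.

```python
import socket
import time


def dns_lookup_ms(hostname: str, port: int = 80) -> float:
    """Wall-clock time of one name resolution through the system resolver."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - start) * 1000.0
```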
15.23. Downlink throughput of major carriers in the U.S.
• In Kbps
• For highly populated areas, service carriers provide better infrastructure with newer technology than in rural
areas with few users.
15.24. Cellular network policies
• Do mobile operators use traffic differentiation?
• Methodology
• Scanning a set of ports from mobile devices
• Well-known ports such as FTP (21), SSH (22), SMTP (25), HTTP (80), etc.
• Both TCP and UDP
• Measure
• TCP Connect: the time between the sending of a TCP SYN packet and the receipt of SYN-ACK
• TCP Data: the time for the client to send a short unique message to the server and receive the response
message
• UDP Data: similar to TCP Data, the data transmission time for UDP
• TTL at client side: the TTL value in the probe packet received by the client. All the packets sent from the
server have an initial TTL of 64
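The TCP Connect metric can be approximated from user space by timing a blocking connect. The call returns once the SYN-ACK has arrived and the final ACK has been sent. This is a sketch, and the function name is hypothetical. If a middlebox answers the SYN itself, the value reflects the path to that middlebox rather than to the server.

```python
import socket
import time


def tcp_connect_ms(host: str, port: int, timeout_s: float = 5.0) -> float:
    """Time from starting the handshake (SYN) until connect() returns."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout_s):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0


if __name__ == "__main__":
    # Local listener so the example runs offline; the kernel completes the handshake.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    print(f"{tcp_connect_ms('127.0.0.1', server.getsockname()[1]):.3f} ms")
    server.close()
```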
15.25. Port scans for large carriers in the U.S.
• AT&T and Verizon show very similar behavior, and there is no obvious difference across the ports.
15.26. Port scans for large carriers in the U.S.
• Two levels of connection times: around 70 and 100 ms
15.27. Port scans for large carriers in the U.S.
• Blocked TCP ports
• All the IP packets sent by the server set the TTL field to 64, but the client receives packets with much higher
TTL values.
• Received TTL of 197 for SMTP, POP, HTTP,...
• Received TTL of 253 for FTP...
• There must be a middlebox that rewrites the packets passing through it.
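The reasoning behind the TTL test fits in a few lines. Routers only ever decrement the TTL, so if the initial value is 64, a client can never legitimately observe a larger one. Here is a sketch with a hypothetical function name.

```python
INITIAL_TTL = 64  # every probe sent by the measurement server starts at 64


def interpret_ttl(received_ttl: int, initial_ttl: int = INITIAL_TTL) -> str:
    """Explain a TTL value observed at the client."""
    if received_ttl > initial_ttl:
        # TTL only decreases hop by hop, so a higher value means some box
        # on the path generated or rewrote the packet.
        return "rewritten by a middlebox"
    return f"{initial_ttl - received_ttl} hops from the server"
```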
15.28. FTP blocking in T-Mobile’s network
• Establishing an FTP connection on port 21
• UDP port 21 is not blocked
• Address spoofing by the middlebox
[Figure: SYN, SYN/ACK and ACK (three-way handshake, connection established), followed by Data and ACK]
• A DPI-like solution: if the payload contains FTP commands, the packets can pass through the middlebox.
15.29. HTTP proxy port blocking in T-Mobile’s network
• HTTP proxy is on TCP port 8080
• Address spoofing by the middlebox
[Figure: SYN and SYN/ACK exchange]
• Web servers running on port 8080 cannot be accessed from T-Mobile's network; the port is completely blocked.
15.30. How can these middleboxes affect user experience?
• Z. Wang et al. (2011)
• Problems with middleboxes
• Policies
• Application performance
• Peer-to-peer behind NAT
• Smartphone energy cost
• Security
• NetPiculet measurement system
• Former version of MobiPerf
15.31. IP spoofing
[Figure: hosts 10.5.6.100 and 10.5.6.102; a packet labeled SRC IP=10.5.6.102 and DST IP=10.5.6.102]
• It can significantly shorten the battery life of the victim
• 4 out of 60 carriers allow IP spoofing, which could make their networks vulnerable.
15.32. IP spoofing measurement
[Figure: probe packet between hosts 10.5.6.100 and 10.5.6.102]
• If the packet is received, IP spoofing is allowed.
15.33. Short TCP connection timeout
• The default TCP keep-alive timer of 2 hours is too long for cellular networks
• Firewalls terminate idle connections after a much shorter time
• By sending RST packets to force the client to re-establish the connection
• It takes much more energy and time than sending keep-alive messages more frequently.
• But idle connections are common: Facebook, Gmail, Gtalk, instant messengers, etc.
[Figure: a DATA packet followed by periodic KEEP-ALIVE messages]
• Sending keep-alive messages consumes a lot of energy, but it is still better than re-establishing the connection.
• Some carriers apply TCP timeouts of 5 minutes or shorter.
• More than 10 percent of the carriers have timeouts shorter than 10 minutes.
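On the client side, an application can enable TCP keep-alives with an idle time shorter than the carrier's timeout, instead of relying on the 2-hour default. The sketch below shows this in Python. The 4-minute idle time and 30 s probe interval are hypothetical values. The per-socket timer options are platform-specific, so the code sets them only where the `socket` module exposes them.

```python
import socket

KEEPALIVE_IDLE_S = 4 * 60  # hypothetical: below a 5-minute carrier timeout
KEEPALIVE_INTERVAL_S = 30  # hypothetical: gap between unanswered probes


def enable_keepalive(sock: socket.socket, idle_s: int = KEEPALIVE_IDLE_S,
                     interval_s: int = KEEPALIVE_INTERVAL_S) -> None:
    """Turn on TCP keep-alives with a shorter idle time than the 2-hour default."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timers are platform-specific; set them only where exposed.
    if hasattr(socket, "TCP_KEEPIDLE"):  # Linux and others
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):  # macOS name for the idle time
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
```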
15.34. How do short TCP connection timeouts affect energy consumption?
• Assuming long-lived TCP connections and a 1350 mAh battery
15.35. Packet reordering in the middleboxes
• Some firewalls buffer out-of-order TCP packets and forward them to the destination in order once the missing packets have arrived
• Out-of-order arrival can be caused by
• Packet reordering along the path
• Packet loss, which happens more frequently
• The main problem is that this disables TCP fast retransmission, since the sender never receives duplicate ACKs
• Buffering out-of-order packets (P1 is missing)
• The middlebox is waiting for P1
[Figure: packets P1, P2, P3 pass through the middlebox; P2 and P3 are held until P1 arrives, then all three are forwarded in order]
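The loss of fast retransmission can be reproduced with a toy model of a cumulative-ACK receiver. The sketch assumes that segments are numbered from 1, that P1 is lost, and that P2 to P5 are sent after it. P4 and P5 are added to the example above so that the usual threshold of three duplicate ACKs can be reached. All names are hypothetical.

```python
def receiver_acks(arrivals: list[int]) -> list[int]:
    """Cumulative ACKs ("next segment expected") for segments numbered 1, 2, 3, ..."""
    received, next_expected, acks = set(), 1, []
    for seg in arrivals:
        received.add(seg)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def duplicate_acks(acks: list[int]) -> int:
    """Repeated ACK values within this window (earlier ACKs are not modeled)."""
    return sum(1 for a, b in zip(acks, acks[1:]) if a == b)


def middlebox_reorder(arrivals: list[int]) -> list[int]:
    """Hold out-of-order segments and release them only in sequence."""
    held, next_expected, out = set(), 1, []
    for seg in arrivals:
        held.add(seg)
        while next_expected in held:
            out.append(next_expected)
            held.remove(next_expected)
            next_expected += 1
    return out


if __name__ == "__main__":
    after_loss = [2, 3, 4, 5]  # P1 was lost
    print(duplicate_acks(receiver_acks(after_loss)))  # enough for fast retransmit
    print(receiver_acks(middlebox_reorder(after_loss)))  # nothing reaches the receiver
```

Without the middlebox, the repeated ACKs for P1 let the sender retransmit immediately. With the middlebox, the receiver sees nothing until P1 arrives, so the sender has to wait for a retransmission timeout.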
15.36. Packet reordering in the middleboxes
• The effect of packet loss
15.37. Packet reordering in the middleboxes
• 3G was emulated on WiFi
• 400 ms RTT and 1 percent loss rate
• +44 percent download time
• The longer the download takes, the more energy is consumed.
15.38. NAT traversal
• NAT mapping is crucial for NAT traversal
• Peer-to-peer applications, Skype, instant messengers
• It defines how the NAT assigns an external port to each connection
• Different NAT mapping types
• Treated as random by existing traversal techniques
• Thus the port is considered impossible to predict
• Linearly increasing port numbers
15.39. NAT mapping in cellular networks
• Based on TCP connections to the server at random intervals
• The server records the observed source ports
• It's not random, so port prediction is feasible.
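The port-prediction idea can be sketched as a check for a constant increment between consecutively observed source ports. This simple test is an assumption and not the authors' method. On a busy carrier NAT, other subscribers' connections would also consume ports, so a practical predictor would need to tolerate gaps.

```python
from typing import Optional


def predict_next_port(observed: list[int]) -> Optional[int]:
    """Predict the next external port if the NAT assigns ports linearly.

    Returns None when the increments between consecutive observations differ,
    i.e. when the mapping looks random to this simple test.
    """
    if len(observed) < 3:
        return None  # too few samples to see a pattern
    steps = {b - a for a, b in zip(observed, observed[1:])}
    if len(steps) != 1:
        return None
    nxt = observed[-1] + steps.pop()
    return nxt if 0 < nxt <= 65535 else None
```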
15.40. Mobile network measurement projects
• MobiPerf
• University of Michigan
• http://mobiperf.com
• PORTOLAN
• University of Pisa
• http://portolan.iet.unipi.it
• MySpeedTest
• Georgia Institute of Technology
• https://play.google.com/store/apps/details?id=com.num
15.41. Literature
• Y. Guo et al., "An In-depth Study of LTE: Effect of Network Protocol and Application Behavior on
Performance", ACM SIGCOMM, 2013. Accepted.
• Huang, Junxian, et al. Anatomizing application performance differences on smartphones. Proceedings of the
8th international conference on Mobile systems, applications, and services. ACM, 2010.
• Huang, Junxian, et al. MobiPerf: Mobile network measurement system. Technical report, University of Michigan and Microsoft Research, 2011.
• Wang, Zhaoguang, et al. An untold story of middleboxes in cellular networks. ACM SIGCOMM Computer
Communication Review. Vol. 41. No. 4. ACM, 2011.