Institutionen för datavetenskap
Department of Computer and Information Science
Final thesis
Estimating Internet-scale Quality of Service
Parameters for VoIP
by
Markus Niemelä
LIU-IDA/LITH-EX-A--16/013--SE
2016-03-24
Linköpings universitet
SE-581 83 Linköping, Sweden
Supervisor: Mikael Asplund
Examiner: Simin Nadjm-Tehrani
Presentation Date: 2016-03-24
Publishing Date (Electronic version): 2016-04-22
Department and Division: Software and Systems, Department of Computer and Information Science
Language: English
Type of Publication: Degree thesis, D-level
ISRN: LIU-IDA/LITH-EX-A--16/013--SE
Number of Pages: 50
URL, Electronic Version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-127360
Publication Title: Estimating Internet-scale Quality of Service Parameters for VoIP
Author: Markus Niemelä
Keywords: Voice over IP (VoIP), Quality of Service (QoS), cost estimation, QoS-driven routing
Abstract
With the rising popularity of Voice over IP (VoIP) services, understanding the
effects of a global network on Quality of Service is critical for the providers of
VoIP applications. This thesis builds on a model that analyzes the round trip
time, packet delay jitter, and packet loss between endpoints on an Autonomous
System (AS) level, extending it by mapping AS pairs onto an Internet topology.
This model is used to produce a mean opinion score estimate. The mapping is
introduced to reduce the size of the problem in order to improve computation
times and improve accuracy of estimates. The results of testing show that
estimating mean opinion score from this model is not desirable. It also shows
that the path mapping does not affect accuracy, but does improve computation
times as the input data grows in volume.
Contents

List of Abbreviations

1 Introduction
  1.1 Problem description
  1.2 Approach
    1.2.1 Network modeling
    1.2.2 Cost-based QoS evaluation
    1.2.3 Evaluation of model
  1.3 Related work
  1.4 Limitations

2 Background
  2.1 Autonomous systems
  2.2 Routing
  2.3 Route Views
  2.4 AS relationship inference
  2.5 Quality of Service metrics
  2.6 Additivity of QoS parameters
  2.7 Machine learning algorithms
    2.7.1 Clustering
    2.7.2 Supervised learning

3 System design
  3.1 System overview
  3.2 Network model
    3.2.1 Path mapping
    3.2.2 Path component cost estimation
  3.3 Mean opinion score estimator
    3.3.1 Evaluation of alternatives
  3.4 Design summary

4 Evaluation
  4.1 Test objectives and parameters
  4.2 Data set selection
  4.3 Network model
  4.4 Mean opinion score estimator
  4.5 Full system

5 Summary and conclusions
  5.1 Summary of the outcomes
  5.2 Lessons learned
  5.3 Conclusions
  5.4 Future work
    5.4.1 Path mapping
    5.4.2 Parameter exploration
    5.4.3 Alternative models

Bibliography

Appendices

A Source code
  A.1 Constrained BFS Java implementation
  A.2 MATLAB estimate tests
List of Abbreviations
AS Autonomous System.
BGP Border Gateway Protocol.
C2P customer-to-provider.
CAIDA The Cooperative Association for Internet Data Analysis.
CI confidence interval.
ICMP Internet Control Message Protocol.
IGP Interior Gateway Protocol.
IP Internet Protocol.
ISP Internet Service Provider.
ITU The International Telecommunication Union.
LLSE Linear Least Squares Estimation.
MOS Mean Opinion Score.
MSE Mean Square Error.
P2C provider-to-customer.
P2P peer-to-peer.
QoS Quality of Service.
RMSE Root Mean Square Error.
RTT Round Trip Time.
S2S sibling-to-sibling.
VoIP Voice over IP.
Chapter 1
Introduction
Real-time media quality between two or more endpoints is critical for the success
of Voice over IP (VoIP) applications. As providers of these services control only parts of the network on which they operate, they do not today control how media packets are routed between endpoints in individual sessions. This is instead determined by the routing policies of Internet entities outside
of the product's control.
These global packet routes have a tremendous impact on real-time communication quality. To be more competitive and provide guaranteed real-time media communication quality, control must therefore be established over real-time media packet routing. The first step towards such control is understanding the global network's impact on real-time media quality.
1.1 Problem description
Today, a large amount of data is logged for millions of VoIP calls every day to monitor and give insight into call quality, among other things. With such
a large amount of data, it is possible to infer quality properties of individual
links, such as Mean Opinion Score (MOS), Round Trip Time (RTT), packet loss,
packet delay jitter, and available bandwidth. However, even on the highest level
where connections between Autonomous System (AS) pairs are considered, there
are thousands of endpoints and millions of pairs of such endpoints, numbers
that are steadily growing. Trying to infer parameters for all of these pairs and
endpoints is a large problem, which takes a long time to calculate.
Additionally, many pairs may not have up-to-date data reflecting the true
characteristics of the connection between the pairs’ endpoints. This is because
the Internet is constantly changing, and the number of calls logged in a day is
dwarfed by the number of possible AS pairs.
As a first step toward controlling real-time media packet routing, the goal
is to be able to make an informed decision on whether to relay a call through
company-controlled ASs or not, weighing costs against quality of service benefits. This means that identifying calls where the experience is expected to be poor, and being able to offer alternative routes for these, is of particular interest.
Figure 1.1: Two potential paths for a VoIP call to take between A and B.
The choice is illustrated in Figure 1.1. A call between AS A and AS B
normally takes a path through other ASs on the Internet over which the VoIP
provider has no control, represented by the solid edge in the figure. The dotted
edges represent the alternative path where the call is relayed through a network
over which the provider does have control. While the provider still has no control
over edges AC and DB, it can now influence edge CD. Data for the full dotted
path is not available however, as traffic has not yet been routed through this
path. Therefore it is desirable to infer this from the individual components of
the path.
The problem’s complexity lies in the fact that there is a large number of
combinations of endpoints, and it keeps growing. There are billions of potential
pairings, and trying to solve for all of the Quality of Service (QoS) contributions
of these is infeasible for two reasons: data will not be available for all these pairs,
and the resources needed to process the required amounts of data would be immense. How, then, can these billions of decisions be handled efficiently?
Internally at Skype, one provider of VoIP services, a model has been developed that tries to keep track of all of these pairs for which there is data. However, the company considers this model to be slow and not scalable. To be able to put it into practical use, the complexity must be reduced somehow.
This thesis proposes and evaluates an improved model that aims to be faster
at processing the large number of AS pairs required to make well-informed
decisions. The goal is to reduce it to a more manageable state in order to be
able to analyze the data quickly and reliably. The model would be applied to
the large amounts of data prior to call set-up, and could then be used to provide
real-time information about the expected quality of service of route alternatives
during call set-up.
The purpose of the thesis is to provide a basis for evaluating whether such a
model can be of use in making relay decisions. This will be done by comparing
it to the existing model in accuracy of estimates as well as computation times.
1.2 Approach
To evaluate QoS given the problem description, we approach the problem in
two parts as described in this section.
1.2.1 Network modeling
The existing model describes all possible AS pairs and the QoS metrics associated with them by keeping track of every individual pair. This model represents
the network as a complete graph of AS vertices. That is, each of the vertices
is directly connected to all other vertices, as in Figure 1.2. Recognizing that
quality metrics can be affected by network connections within an AS as well as
between a pair, inferring these costs from call data becomes an expensive problem. With n vertices, there are N = n^2 cost variables to calculate. For C data points, where C > N, using a basic Linear Least Squares Estimation (LLSE) algorithm to solve for the costs has a time complexity of O(N^2·C) = O(n^4·C). Currently, there are over 50,000 ASs present on the Internet [1], which gives the following:

n > 50,000 ⟹ n^4 > 6.25 × 10^18

It is unlikely that all of these AS pairs will be directly involved with VoIP calls, but as the number of ASs is constantly growing, this may well become the case in the future if we assume that the proportion of ASs involved in VoIP calls does not drop. Adding that the number of data points C must be greater than N = n^2, this is a daunting size. This model will henceforth be referred to as the naïve model, as the way it represents the Internet is simplified compared to the next model.
Figure 1.2: A naïve AS relationship graph.
Clearly the Internet is not a complete graph, and in practice most ASs are
directly connected to very few other ASs, routing traffic towards each other
using the Border Gateway Protocol (BGP). The naïve model would consider a path between pairs to be a direct connection, independent of all other AS
pairs, but knowing that this is not the case, we can model it differently. A path
between two ASs is in fact a combination of direct connections between ASs. An
example is shown in Figure 1.3. Here, a path between D and A is D → B → A,
including two direct connections. Note that both of these connections are also
used in paths connecting other AS pairs in the graph. Compared to Figure 1.2, there are far fewer direct connections, but all ASs still have paths to each other. With n vertices, there are still n^2 pairs, but each pair maps to a path P that is a subset of the m direct connections e between vertices on the Internet:

(AS_i, AS_j) ↦ P_{i,j} ⊂ {e_k : k ∈ [1, m]}, ∀ i, j ∈ [1, n]

Necessarily, m ≤ n^2, and looking at BGP routing tables reveals that m appears to grow closer to O(n) than to O(n^2). With the numbers from earlier, this could mean a potential improvement of 9 orders of magnitude when scaled up fully. It should be noted that while the number of possible subsets is 2^n, there is generally only one correct path for each pair.
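As a quick sanity check of that figure (an added illustration, assuming m grows roughly linearly with n, say m ≈ c·n for a small constant c):

O(n^4·C) / O(m^2·C) ≈ n^4 / (c·n)^2 = n^2 / c^2,   with n = 5 × 10^4 ⟹ n^2 = 2.5 × 10^9 ≈ 10^9.4

which, up to the constant c, matches the roughly nine orders of magnitude quoted above.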
Figure 1.3: A topological AS relationship graph.
With this approach, calculating the cost of connections between pairs turns
into two problems. The problem of inferring costs for edges and vertices of the
graph remains, but with the new time complexity O(m^2·C). To achieve this, the
problem of mapping AS pairs to paths is introduced. Consequently, finding a
good method of solving this is necessary for this proposed model to be feasible
to implement.
When evaluating the chosen model, we will measure the computation speedup of the model, as well as look at whether or not the produced results are
able to identify poor paths and good alternative paths, which will help in the
decision-making process.
1.2.2 Cost-based QoS evaluation
Given a graph that has been populated with costs, in order to be able to make
a decision about whether to relay a call or not, we need to be able to quantify
the QoS of a call between a pair of ASs somehow. It may not be immediately
obvious what impact the QoS metrics in the model have on the actual perceived
quality of a call; even if a relayed call might improve the metrics, a non-relayed
call may still provide a sufficiently high QoS.
MOS is generally used to measure the subjective QoS, but it cannot be modeled using the above approach, as it is modeled as a more complex function of other QoS parameters. There is, however, a relationship between the measurable network metrics and MOS, meaning the decision-making could be reduced to comparing single numbers. This relationship is not obvious, as noted in Section 2.5, and other explanatory variables may be necessary to obtain satisfactory results.
In order to evaluate the implemented model’s usefulness for decision-making,
MOS will be used. As such, models to map network metric measurements to
MOS will be explored and their accuracy evaluated.
1.2.3 Evaluation of model
To evaluate the network model and MOS estimation model, the following questions need to be answered:
• Is the proposed network model as reliable as the current naïve model?
• Does performance in terms of computation time improve in practice when
comparing the proposed model to the current naïve model?
• Can the model directly predict user experience by translating network
metrics into MOS?
1.3 Related work
The behavior and performance of traffic in networks has long been studied, and
advances are made constantly.
The Cooperative Association for Internet Data Analysis (CAIDA) runs a
project that studies the topology of the Internet as a whole, trying to identify
the relationships between entities in the network [10, 11]. Observing and inferring these relationships is a challenging and often imprecise endeavour, but it has served as an inspiration for the ideas in this thesis. The topology dataset [2] is
primarily concerned with classifying ASs and the connections between them.
Several other papers have looked at Internet topology and AS relationships
as well [13, 14] and the effects of the paths traffic takes as a result [14, 23, 24].
These studies make it quite clear that the Internet is organized in a way that is suboptimal for today's traffic types, but no solutions are found that would be easy to implement at the scale of the Internet. As the premise of the model proposed
in this thesis assumes knowledge of Internet topology, these studies are highly
interesting to investigate. They are discussed further in Sections 2.3 and 2.4.
Methods for real-time measurements of some QoS metrics have been studied
as well. Using DNS servers, reasonably accurate estimates of RTT and packet
loss have been obtained [15, 25]. These measurements enable further research, but the approach is not appropriate for performing real-time analysis at call set-up in VoIP applications, as it is much too time consuming.
The Online Network Traffic Characterization (ONTIC) project is an ongoing
project which seeks to explore new ways to characterize network traffic online
using large scale data analysis. In one of the project deliverables [4], they
provide a collection of the current state of the art in offline algorithms used to
this end. The document, based on 101 sources, gathers not only the algorithms,
but also the frameworks primarily used to implement them. As it stands today,
Apache Hadoop and Apache Spark are the most important frameworks used,
taking advantage of distributed computing. Apache Hadoop in particular has
been widely used for many years, and is based on the MapReduce paradigm, in
which mappers in one step perform transformations of independent data, and
reducers then aggregate the results.
The ONTIC project is very interesting for the field of study; however, the algorithms described in the deliverable are for classification only. As the
problem statement for this thesis involves the comparison of routes between
two points, classification does not seem suitable, as there could be potential
improvements even within the elements mapped to the same class.
Component-based QoS effects are studied in [26], where real-time application systems are considered. QoS effects are there attributed to components on a much lower level than in this thesis, but the original naïve model described in this thesis has been developed based on that work. This type of
model is found to be useful in identifying long-term QoS issues, shifting focus
away from short-term problems with components.
1.4 Limitations
• The parameters considered in the model are RTT, packet delay jitter, and
packet loss. Bandwidth as a cost variable is omitted from the model due
to the complexity in both measuring and modeling it. Unlike the other
parameters, it is not additive and would require a different approach. (See
Section 2.6)
• Given the time frame and amounts of data processed, it is not possible to
request any new kind of data collection to be performed. The evaluation
must therefore rely on data already available. For call data, this means
that MOS entries are available for only a small fraction of the potential
data. Additionally, all route data has been gathered externally (See Section 2.3), even though this could potentially be gathered via calls in the
future. As a result of these two data limitations, the actual number of AS
pairs that would be involved in a large-scale implementation of this model
will not be reached.
• The actual decision-making process is not in the scope of this thesis, as it is
largely a business decision. That is, no attempt to compare the potential
paths is made, as this requires information about what paths could be
used for relaying the calls, as well as when it would be profitable to relay.
The intent is solely to provide a basis to make such a decision.
Chapter 2
Background
This chapter will first introduce some concepts related to Internet routing to
understand how the Internet can be modeled appropriately (Section 2.1-2.4).
Then the QoS metrics that will be considered for the model and the methods
with which to analyze and estimate them will be presented (Section 2.5-2.7).
The terms route and path are used interchangeably throughout the thesis.
2.1 Autonomous systems
The Internet is a large network of routers connecting hosts to each other. In
order to manage the scale of the Internet and allow traffic to be routed correctly,
the Internet has been divided up into autonomous systems, all with independently managed internal routing. These ASs connect to each other through
Internet Exchange Points or direct links. Formally, an autonomous system is
defined by RFC 1930 to be “a connected group of one or more Internet Protocol (IP) prefixes run by one or more network operators which has a single and clearly defined routing policy” [17], where prefixes are contiguous blocks of IP
addresses.
At least 50,507 ASs were in use as of May 6th, 2015, a number that has seen
a consistent growth of roughly 3,000 per year over the last decade [1].
2.2 Routing
When routing traffic within an AS, any Interior Gateway Protocol (IGP) may
be used to determine the most suitable path. Some ASs may opt to consider
router hop count, while others use more complex metrics when selecting paths
[23].
When routing traffic between ASs, the protocol used is the path vector
protocol BGP. Paths are found by advertising routes to neighboring ASs, and
selecting the best route among the ones known. While ties between best routes
are broken by AS path length, the protocol first considers a local preference
value for path selection. This value depends on a local policy, the specifics
of which are decided upon at the discretion of the AS administrator. The
policy can be affected by any number of things, not all of which are related to
path performance optimality [24]. For example, network operators may have
agreements with each other that have to be honored, or they might have to
handle large amounts of traffic through load balancing [23].
Studies have shown that packets very frequently are not routed using the
shortest path, neither on the router level [24] nor on the AS level [14]. Understanding the routing decisions made is clearly quite a complex problem,
especially as routing policies are not obvious to an observer.
2.3 Route Views
University of Oregon’s Route Views [3] is a project which collects information
about the global routing system, and publishes it for public use. It has a number
of participating routers around the world that contribute with their BGP routing
tables every two hours, showing snapshots of their point of view of the Internet.
Each entry in these tables contains the following information:
• Network - The IP network reachable through this route
• Next Hop - The IP address to which traffic will be forwarded on this route
• Metric - A value used to discriminate between different points of access
to a neighboring AS
• LocPrf - The local preference value of the route, determined by BGP
policy
• Weight - A local, Cisco specific preference value, not shared through BGP
• Path - The AS path taken to reach the network through this path
Of particular interest for this thesis is the Path field, which provides information
on the connections between ASs. The Route Views data is not a complete set
of paths, as it only contains a few viewpoints, but with this information it is
possible to construct a graph representation of the Internet, which can be used
to infer additional possible paths for traffic to flow. However, as the next section
will elaborate on, due to routing policies such a graph is not sufficient to find
the paths that traffic actually takes.
2.4 AS relationship inference
ASs are typically owned by a company or an Internet Service Provider (ISP),
who pays a higher-tier ISP to transit its traffic to the rest of the Internet. Some
ASs might be connected to multiple higher level ISPs, yet they do not advertise
to these ISPs that these additional connections are possible routes [23]. That is,
the ISPs will only see the paths through the AS to lower-tier ISPs as available.
It is clear that the existence of a connection does not mean that traffic will be
routed through it for various reasons. To better understand how ASs interact,
a classification scheme for AS relationships was proposed by Gao [13].
The proposed scheme divides direct connections between ASs into the following four categories: provider-to-customer (P2C), customer-to-provider (C2P),
peer-to-peer (P2P), and sibling-to-sibling (S2S). X is considered a provider for
Y, considered the customer, if X transits traffic for Y but not the other way
around. If X and Y transit traffic for each other, they are considered siblings,
and if neither transits traffic for the other, they are considered peers.
CAIDA analyzes Route Views BGP tables in order to gain a better understanding of how ASs are related to one another. They base this on the classifications of Gao, but apply more recent algorithms to infer the relationships [10, 11]. The results of their analysis are continually published in the form
of relationship data, where they list all links between ASs, and whether they
are provider-to-customer or peer-to-peer. Customer-to-provider relationships
are implicit in the data sets. Sibling-to-sibling relationships are not reported
in the data sets, although they state that they infer such relationships in their
analysis as well. Since they are not present in the data sets however, this type
of relationship will be omitted from further discussion.
Figure 2.1: An AS graph with relationships (edges are labeled customer-to-provider or peer-to-peer).
Within this framework, some conclusions can be drawn about valid paths in
an AS graph. Valid paths must be valley-free, which is defined as: after passing
a provider-to-customer or peer-to-peer edge, a path may not pass a peer-to-peer
or customer-to-provider edge anymore. Violations of this principle would mean
that an ISP would be transiting traffic for other ISPs without getting paid by
anyone, which would not be in their best interests.
To illustrate, in Figure 2.1, an example of AS relationships is shown with ASs
organized in vertical levels by tier. Consider the case where AS F wishes to send
traffic to AS H. There are two possible valid paths: F → C → A → D → H and
F → C → D → H. These paths first travel up toward providers, then across 0
or 1 peer-to-peer edge, and finally down toward customers.
There are also four invalid paths between the two ASs, ignoring looping
paths. For example, in the path F → C → A → D → B → E → H, the
section A → D → B first travels across a provider-to-customer edge, and then
across a customer-to-provider, creating a valley. The other paths contain similar
violations, causing D to not act as a provider for anyone, yet still transit traffic.
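To make the valley-free rule concrete, the sketch below checks it for a path given as a list of edge types seen in the direction of travel (the three-valued encoding and the example paths are assumptions for illustration; this is not code from the thesis).

```java
import java.util.List;

// Minimal sketch of the valley-free rule from Section 2.4.
// Edge types are given in the direction of travel along the path.
enum EdgeType { C2P, P2P, P2C }

public class ValleyFreeCheck {
    // A path is valley-free if, once a P2P or P2C edge has been crossed,
    // no further C2P or P2P edge appears.
    static boolean isValleyFree(List<EdgeType> edges) {
        boolean descending = false; // set after the first P2P or P2C edge
        for (EdgeType e : edges) {
            if (descending && (e == EdgeType.C2P || e == EdgeType.P2P)) {
                return false; // going up or across again after going down/across: a valley
            }
            if (e == EdgeType.P2P || e == EdgeType.P2C) {
                descending = true;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Up, up, across one peer link, then down: a valid valley-free path.
        System.out.println(isValleyFree(List.of(
                EdgeType.C2P, EdgeType.C2P, EdgeType.P2P, EdgeType.P2C))); // true
        // Going down and then up again creates a valley and is rejected.
        System.out.println(isValleyFree(List.of(
                EdgeType.C2P, EdgeType.P2C, EdgeType.C2P, EdgeType.P2C))); // false
    }
}
```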
2.5 Quality of Service metrics
The following network metrics that are known to impact QoS [19] will be considered in this thesis:
• Round-trip time (RTT) - The time it takes for a packet to travel from the
sender to the receiver, and a response to return.
• Packet delay jitter - The variance of the time it takes for a packet to
travel one way between sender and receiver. Also known as Packet Delay
Variation, but will be referred to as simply jitter in this thesis.
• Packet loss - The percentage of packets that are lost in transit between
sender and receiver.
These metrics have been collected alongside a subjective evaluation of call quality made by the call participant after the call has concluded, which is known as
Mean Opinion Score (MOS). The participant is asked to rate the call on a scale
from 1 to 5, with 1 being “bad”, 2 being “poor”, 3 being “fair”, 4 being “good”, and 5 being “excellent”. MOS is the arithmetic mean of these subjective ratings.
When the setting for calls is controlled, the expected MOS can be calculated
using a formula defined in ITU-T G.107 [18]. This formula depends on additional metrics however, some of which, like codec-specific parameters, cannot
be measured or calculated easily [5] and are not part of the data available.
2.6 Additivity of QoS parameters
An additive function is a function f that satisfies the following equality:
f(x + y) = f(x) + f(y)
for any x, y in the domain of the function.
In order to apply linear regression to the parameters under observation, we
must have them in an additive form (See Section 2.7.2).
RTT can be assumed additive as is: twice the geographical distance should
result in twice the round trip time.
Jitter is a measurement of the variance of the one-way delay, and the performance of individual components in our network is assumed to be independent of the others. As a result, it too can be considered additive, as the following is true for all components X, Y:

Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y),  with Cov(X, Y) = 0
The total packet loss over a path is a product of individual packet loss factors, with a fraction of packets being dropped in individual components in a path. It can be calculated as:

L_P = ∏_{i∈P} (1 − L_i)

where L_P is the total packet loss over a path P, and L_i is the packet loss over component i. We can transform this into the following additive form:

log(L_P) = ∑_{i∈P} log(1 − L_i)
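As an illustration of how these additive forms let per-component costs be combined along a path, here is a minimal sketch (not from the thesis; the component values are made up): RTT and jitter variance are summed directly, while per-component losses are combined multiplicatively, which corresponds to summing log(1 − L_i) terms.

```java
// Minimal sketch: combining per-component QoS costs along a path (Section 2.6).
public class PathCostAggregation {
    // Per-component costs: RTT in ms, jitter variance in ms^2, loss as a fraction in [0, 1).
    record ComponentCost(double rttMs, double jitterVar, double loss) {}

    static double totalRtt(ComponentCost[] path) {
        double sum = 0;
        for (ComponentCost c : path) sum += c.rttMs();      // RTT is additive
        return sum;
    }

    static double totalJitterVar(ComponentCost[] path) {
        double sum = 0;
        for (ComponentCost c : path) sum += c.jitterVar();  // variances add when components are independent
        return sum;
    }

    static double totalLoss(ComponentCost[] path) {
        // Multiplying survival probabilities is equivalent to summing log(1 - L_i).
        double delivered = 1.0;
        for (ComponentCost c : path) delivered *= (1.0 - c.loss());
        return 1.0 - delivered;                             // fraction lost end to end
    }

    public static void main(String[] args) {
        ComponentCost[] path = {
            new ComponentCost(40, 4, 0.01),
            new ComponentCost(25, 1, 0.02),
            new ComponentCost(60, 9, 0.00)
        };
        System.out.printf("RTT=%.0f ms, jitter var=%.0f ms^2, loss=%.4f%n",
                totalRtt(path), totalJitterVar(path), totalLoss(path));
    }
}
```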
As a note, the bandwidth of a path is determined by the lowest bandwidth
of the components on the path:
B_P = min_{i∈P} (B_i)

where B_P is the bandwidth available over a path P, and B_i is the bandwidth
available over component i. There is no way to reasonably transform this in
order to make it additive, which is why it is not considered in the proposed
model.
2.7 Machine learning algorithms
Machine learning as a field is concerned with learning from data and being able
to make predictions based on this, utilizing algorithms that can adapt to the data they are provided. The algorithms are often iterative, requiring more computation to reach a good result, but may require fewer assumptions to be made about the data. Algorithms that have been considered suitable for estimating MOS
are described in this section, as well as the LLSE algorithm used to compare
the two network models.
2.7.1 Clustering
Clustering algorithms are algorithms that aim to group items into sets in such
a way that the items in each set are more closely related to each other than to
those of other sets. This definition is quite broad, and as such, there are many
clustering algorithms to accommodate the many different kinds of clustering
that may be desired [12]. Clustering algorithms can be beneficial when it is not
obvious how the data points are related, and may as such be interesting for MOS
calculations, where several parameters together map into [1, 5]. Two clustering
algorithms that appeared to be good candidates for the kind of clustering desired
are described here.
k-means clustering
k-means clustering is a centroid clustering algorithm [16]. Its goal is to partition
all data points into sets, minimizing the squared Euclidean distance from each
member of the set to the mean of the set, the centroid. Initially, k centroids
are randomly generated. The algorithm then alternates between two steps until
convergence:
1. Assign each of the data points to the cluster whose centroid is closest in Euclidean distance.
2. Recalculate the position of the centroid for each cluster.
k-means clustering is considered a hard clustering algorithm, meaning each
data point is assigned to a single cluster.
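As an illustration of the two alternating steps, here is a compact k-means sketch (Lloyd's algorithm); the toy data, the choice of k, and the convergence test are assumptions made for the example, not the implementation used in the thesis.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal k-means sketch: assign points to the nearest centroid,
// then recompute centroids, until assignments stop changing.
public class KMeansSketch {
    static int[] cluster(double[][] points, int k, long seed) {
        Random rnd = new Random(seed);
        int dim = points[0].length;
        // Initialize centroids as k randomly chosen data points.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[rnd.nextInt(points.length)].clone();

        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Step 1: assign each point to the nearest centroid (squared Euclidean distance).
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[i][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            // Step 2: recompute each centroid as the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Toy data: (RTT, jitter, loss) triples, purely illustrative.
        double[][] points = { {50, 2, 0.00}, {55, 3, 0.01}, {200, 30, 0.05}, {220, 25, 0.07} };
        System.out.println(Arrays.toString(cluster(points, 2, 42)));
    }
}
```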
Gaussian mixture model clustering
Gaussian mixture model (GMM) clustering is a distribution-based clustering
model which uses the expectation-maximization algorithm to train the model
[16]. This algorithm will produce probabilities that a data point belongs to
a certain distribution, having iteratively maximized the log-likelihood of these
distribution predictions. Initially, k distributions are randomly generated. The
algorithm then alternates between two steps until convergence:
1. Compute the probabilities of belonging to each cluster for every data point
using current model parameters.
2. Based on the probability that a data point belongs to a cluster, recompute
the mean and variances of the distributions.
GMM clustering is considered a soft clustering algorithm, meaning each data
point may be assigned to several clusters. It can however be used as a hard
clustering algorithm by assigning each data point to the cluster it has the highest
probability of belonging to.
2.7.2 Supervised learning
Supervised learning algorithms are algorithms that seek to infer a function from
a set of training data. The training data takes the form of input values X, and
corresponding output values Y. The function produced should then be able to
map unseen input values to the correct output value.
There are many supervised learning algorithms with different strengths and
weaknesses. They can be divided into two different categories: regression, which
deals with functions mapping to values, and classification, which deals with functions mapping to different categories. Since a comparison between the values
of QoS parameters is sought, we want to map to values, so we need regression
algorithms. Here we will look at two ensemble learning algorithms, as these
are well-established methods for increasing the accuracy of machine learning algorithms [9]. The Linear Least Squares Estimation algorithm is also described,
but it is not considered appropriate for MOS estimation, as MOS is not a linear
function of other QoS parameters [18].
Ensemble learning
In ensemble learning, an ensemble can be considered a ”committee” of learning
algorithms. The individual algorithms in the ensemble weight their results and
produce a common result that in many cases is better than that of the individual
learning algorithms [16].
Bagging Bagging, or bootstrap aggregating, is an ensemble method that averages a number of bootstrapped models to reduce the variance in its predictions
[16]. Bootstrapping refers to sampling from the data set with replacement, resulting in training sets for the model that have a number of duplicate elements.
After training the individual models, their outputs are averaged to obtain the bagged model's final output.
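A minimal sketch of the bagging idea follows (illustrative only; the Regressor interface, the placeholder mean learner, and the toy data are assumptions, not the thesis code): bootstrap samples are drawn with replacement, one model is trained per sample, and predictions are averaged.

```java
import java.util.Random;
import java.util.function.BiFunction;

// Minimal bagging sketch: train B models on bootstrap samples and average their predictions.
public class BaggingSketch {
    interface Regressor { double predict(double[] x); }

    static Regressor bag(double[][] X, double[] y, int numModels,
                         BiFunction<double[][], double[], Regressor> trainer, long seed) {
        Random rnd = new Random(seed);
        Regressor[] models = new Regressor[numModels];
        int n = X.length;
        for (int b = 0; b < numModels; b++) {
            // Bootstrap sample: draw n rows with replacement (duplicates are expected).
            double[][] Xb = new double[n][];
            double[] yb = new double[n];
            for (int i = 0; i < n; i++) {
                int idx = rnd.nextInt(n);
                Xb[i] = X[idx];
                yb[i] = y[idx];
            }
            models[b] = trainer.apply(Xb, yb);
        }
        // The bagged model averages the individual models' outputs.
        return x -> {
            double sum = 0;
            for (Regressor m : models) sum += m.predict(x);
            return sum / models.length;
        };
    }

    public static void main(String[] args) {
        double[][] X = { {50, 2, 0.00}, {200, 30, 0.05}, {120, 10, 0.02} };
        double[] y = { 4.5, 2.0, 3.5 };
        // Placeholder base learner: predicts the mean of its (bootstrapped) training targets.
        BiFunction<double[][], double[], Regressor> meanLearner = (Xb, yb) -> {
            double sum = 0;
            for (double v : yb) sum += v;
            double mean = sum / yb.length;
            return x -> mean;
        };
        Regressor bagged = bag(X, y, 10, meanLearner, 7);
        System.out.println(bagged.predict(new double[]{100, 5, 0.01}));
    }
}
```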
Boosting Boosting is an ensemble method that trains itself iteratively on
the provided data set by focusing on data points that were poorly predicted
[16]. Initially, all data points are weighted equally, and the underlying model is
trained on the set. In each subsequent step, every data point is weighted by the squared error between the aggregated prediction of the models trained so far and the actual
value. This causes the next iteration to focus more on data points that were
poorly predicted previously.
Decision trees Decision trees are often used as the underlying model for
ensemble learning, as they are fast to build and to query [16]. The trees are built by analyzing
all possible binary splits of the training data, and then selecting the one that
gives the lowest mean square error. This is then done recursively for each node
in the tree until a stopping criterion is reached, such as maximum tree depth, or
the number of observations in the node being too small to continue. The model
can then be queried for predictions easily by following the correct splits down
the tree.
Linear least squares estimation (LLSE)
LLSE is a way to fit a linear model to data that isn’t fully explained by the
model in question by minimizing the square of the residual errors.
Consider the case where we have a relationship of the form
x_1·β_1 + x_2·β_2 + ... + x_N·β_N = y

where x_i and y can be measured, and we wish to determine β_i for i ∈ [1, N]. To do so, we require C equations of this type, where C > N. This can be written using matrix form as

Xβ = y

We are then interested in finding the best set of coefficients β̂ that minimizes

∑_{i=1}^{C} | y_i − ∑_{j=1}^{N} X_ij·β̂_j |^2

The solution to this minimization problem, given that the columns of X are linearly independent, is given by solving the equation

(X^T X)·β̂ = X^T·y

The time complexity of solving this when utilizing parallel computing is given in [7] as

O( N^3/P + C·N^2/P + N^2·log(P) )

with P cores. As C > N, N^3/P is dominated by C·N^2/P. In any practical application, C/P > log(P) as well, so N^2·log(P) is dominated by C·N^2/P. Thus, the time complexity of the algorithm can be considered

O( C·N^2/P )
This is of interest as we seek to compare the models on their computation
times in particular.
Chapter 3
System design
There are many ways the problem described in Section 1.1 could be approached,
with their own advantages and disadvantages. This chapter will cover the design
choices made in the implementation of the approach described in Section 1.2.
A brief summary of the design chosen is included at the end of the chapter.
3.1 System overview
The two main components of the system are the network model, which is responsible for estimating the network QoS metrics of the expected path for the
traffic, and the MOS model, which is responsible for translating these metrics
into a single measurement of QoS.
Figure 3.1: Full system query.
Figure 3.1 illustrates the expected use case of the system. It will be queried
with a list of at least 2 ASs: the start AS, the end AS, and an optional list of
relay ASs. This set is evaluated by the network model, which calculates network
metrics that are passed on to the MOS model for translation. The estimated
MOS value is then returned as the response to our initial query.
If the optimal relay points for a call are known, then this system can provide
enough information for a relay decision to be made with two queries: one with
only the start and end ASs provided, and one with the relay ASs included.
If there are certain known thresholds on QoS metrics, it is plausible that
a decision could be made without the presence of the MOS model; no such
assumption is made here however. Should such information be available though,
the two components are easily decoupled by design.
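As a sketch of that decoupling (interface and type names are assumptions for illustration, not the thesis code), the full query of Figure 3.1 can be expressed as two narrow interfaces composed in sequence:

```java
import java.util.List;

// Minimal sketch of the two-component query path of Figure 3.1.
public class SystemQuerySketch {
    // Network metrics produced by the network model (in natural units).
    record NetworkMetrics(double rttMs, double jitterVar, double loss) {}

    interface NetworkModel {
        // start, end, and optional relays, all given as AS numbers
        NetworkMetrics estimate(int startAs, int endAs, List<Integer> relayAses);
    }

    interface MosModel {
        double estimateMos(NetworkMetrics metrics);
    }

    // The full system query: network model first, MOS model second.
    static double query(NetworkModel network, MosModel mos,
                        int startAs, int endAs, List<Integer> relayAses) {
        return mos.estimateMos(network.estimate(startAs, endAs, relayAses));
    }

    public static void main(String[] args) {
        // Stub implementations, just to show the composition; values are made up.
        NetworkModel network = (s, e, relays) -> new NetworkMetrics(150, 20, 0.02);
        MosModel mos = m -> 4.0 - m.loss() * 10 - m.rttMs() / 1000.0; // placeholder mapping
        System.out.println(query(network, mos, 13456, 7442, List.of()));
    }
}
```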
3.2 Network model
The network model is the component of the system that is of particular interest
in this thesis. How can we attempt to accurately model the connections in
such a way that useful information is obtained? What are the pros and cons of
different approaches?
A straightforward way of designing the network model would be to directly
assign the network metrics to the input set of ASs. This might be done through
simply collecting data on previous calls for that connection and performing some
analysis on it. Figure 3.2 illustrates how this would be designed.
Figure 3.2: A simple network model design alternative.
While such a direct approach would likely be quite successful given enough
data, there are some concerns with it. For example, if little or no data exists
for a particular path, estimating network metrics becomes hard or impossible.
This model does not require any information about the network infrastructure,
but consequently does not give any new information about it either.
Another approach is to utilize information of the network infrastructure and
divide paths into network components, assigning network metric costs to each
component. Figure 3.3 shows what a query would look like in that system.
Figure 3.3: A path mapping network model design alternative.
There is additional complexity in this design, as it requires the addition of
the path mapping component, but in return it has the potential to return some
information about the behavior of individual components of the infrastructure.
It could also allow for the prediction of call metrics of previously unseen AS
pairs. To illustrate, consider the connections in Figure 3.4. Assume that we
have previously seen a number of calls from AS A to AS C, and from AS B to
AS D. As each node and edge in this model is considered a component with
associated costs, we could now infer costs for a call between B and C by adding
the component costs together.
Figure 3.4: Four ASs interconnected through a common AS.
While both of these approaches would be interesting to explore, the latter
has been chosen for its greater perceived potential. The following subsections
will go into further detail about the path mapping and cost estimation system
components and their design.
3.2.1 Path mapping
With the chosen approach, mapping paths to the correct set of components is
crucial. If this is not done correctly, large components might be erroneously
assessed and applied to other paths, leading to unreliable model output. Some
different options have been explored to approach the mapping problem, described below.
Least hops model
The first approach to modeling the Internet infrastructure is based on the
CAIDA dataset of AS relationships [2]. It uses the known relationships between ASs to infer paths between any given pair. While it is known that the
shortest path using direct connections between ASs is often not the path used
in practice [14], with this model we seek to investigate whether the relationships
presented in Section 2.4 can be applied to produce valid path results.
The first thing to do is construct a graph representation of the network. The
CAIDA dataset provides two files, both of which we use to construct this graph.
The first contains a list of all ASs reachable downstream from every AS. More
importantly, it lists every AS in the dataset, so we can construct the set of all
nodes from this file. The other file contains a list of every relationship that has
been inferred, and its type: peer-to-peer or provider-to-customer.
Figure 3.5: An AS graph with relationships and AS numbers.
Index  AS number  Adjacency list
0      1          6,C2P
1      4          6,C2P
2      8          3,C2P 4,C2P
3      15         2,P2C 4,P2P 7,C2P
4      32         2,P2C 3,P2P 5,C2P 6,P2P 7,C2P
5      43         4,P2C 6,P2C
6      55         0,P2C 1,P2C 4,P2P 5,C2P
7      79         3,P2C 4,P2C

Figure 3.6: An AS number list and an adjacency list corresponding to the graph in Figure 3.5. Indices are on the left.
The number of ASs and relationships is small, so an adjacency list can be
used to represent the graph in-memory. To illustrate, consider the graph in
Figure 3.5. It is the same graph as in Figure 2.1, but with ASs numbered. The
way this graph would be represented is shown in Figure 3.6. We keep a sorted
list of the AS numbers present in the graph, and have the indices of that list
correspond to the indices of the adjacency list. Each entry in the adjacency list
is then a list of the relationships that AS has, in the form of the AS index and
the relationship type.
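A small sketch of this representation follows (illustrative only; the simplified input of pre-parsed (AS, AS, relationship) triples is an assumption, not the CAIDA file format): AS numbers are kept in a sorted array whose positions double as indices into the adjacency list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Minimal sketch of the adjacency-list representation in Figure 3.6.
// Input is assumed to be pre-parsed (asA, asB, type) triples, where type is
// "P2C" (asA is provider of asB) or "P2P"; dataset parsing is omitted.
public class AsGraphSketch {
    record Relationship(int asA, int asB, String type) {}
    record Neighbor(int index, String type) {}

    final int[] asNumbers;                 // sorted AS numbers; position = node index
    final List<List<Neighbor>> adjacency;  // one neighbor list per node index

    AsGraphSketch(List<Relationship> rels) {
        TreeSet<Integer> ases = new TreeSet<>();
        for (Relationship r : rels) { ases.add(r.asA()); ases.add(r.asB()); }
        asNumbers = ases.stream().mapToInt(Integer::intValue).toArray();

        adjacency = new ArrayList<>();
        for (int i = 0; i < asNumbers.length; i++) adjacency.add(new ArrayList<>());
        for (Relationship r : rels) {
            int a = indexOf(r.asA());
            int b = indexOf(r.asB());
            if (r.type().equals("P2P")) {
                adjacency.get(a).add(new Neighbor(b, "P2P"));
                adjacency.get(b).add(new Neighbor(a, "P2P"));
            } else { // P2C: a is the provider of b, so from b's side the edge is C2P
                adjacency.get(a).add(new Neighbor(b, "P2C"));
                adjacency.get(b).add(new Neighbor(a, "C2P"));
            }
        }
    }

    int indexOf(int asNumber) {
        return Arrays.binarySearch(asNumbers, asNumber); // valid because asNumbers is sorted
    }

    public static void main(String[] args) {
        // A few edges from Figure 3.5: 43 provides transit for 32, 79 for 15, and 15 peers with 32.
        AsGraphSketch g = new AsGraphSketch(List.of(
                new Relationship(43, 32, "P2C"),
                new Relationship(79, 15, "P2C"),
                new Relationship(15, 32, "P2P")));
        System.out.println(g.adjacency.get(g.indexOf(32)));
    }
}
```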
Given this information, we can do a breadth first search on the graph to
find viable paths, subject to the constraints described in Section 2.4. A Java
implementation of this can be found in Appendix A.1. This will return two
different sets of distances and parents for each of the nodes in the graph: one
where future path options are constrained by the path taken thus far, and one
where they are not. This is of interest as the shortest path to a particular node
might not be part of the shortest path to its neighbor. Therefore, in order to
properly reconstruct paths, we must keep track of the best paths for both these
cases.
Having computed this from a source node, we can then recursively reconstruct the set of paths between the source and any other node in the graph as
shown in Algorithm 1. On the initial call, the parameter d is set to the shorter of the two distances that have been found for the node. On the recursive calls, however, it is simply decremented by 1 for each call, preventing
an invalid path from being reconstructed by picking the locally optimal distance
for the current node.
Algorithm 1 Path Reconstruction
Require: An end node e. The distance d to e on this path. The distance Df and the direct parents Af for each node on the path from a start node without traversing a P2P or P2C link. The distance Dc and the direct parents Ac for each node on the path from a start node allowing for traversing P2P or P2C links. A function rel(u, v) returning the edge connecting nodes u and v.
Ensure: The set of shortest paths to e, P.

procedure ReconstructPaths(e, d, Df, Af, Dc, Ac)
    P ← {∅}
    Let D, A be Df, Af or Dc, Ac so that D = d
    for v ∈ A do
        P′ ← ReconstructPaths(v, d − 1, Df, Af, Dc, Ac)
        for q ∈ P′ do
            q ← q ∪ {e, rel(v, e)}
            P ← P ∪ {q}
    if P = {∅} then
        P ← {{e}}
    return P
The main advantage of this model is that the topology information is readily
available and requires little memory to hold the graph. However, computation
times can be large unless the input data is sorted appropriately in order to minimize the number of times that the constrained breadth first search algorithm
must run. For an initial run it is possible to sort large amounts of call data on
source AS, but for random queries no such sorting can be relied on.
Unfortunately, the output of this model will in the general case find multiple
paths of the same length despite the added constraints imposed. With the
topology data used, there is no way to identify all policies applied, so paths
that other traffic takes can appear as a valid path for the pair we are interested
in. It may also be the case that none of the paths returned is
the correct one, as policies could further inflate path length [14]. With such
unpredictable results, this model is clearly unsuitable.
Known path model
An alternative approach is to collect data on actual paths taken by traffic and
use this for the mapping. This is equivalent to keeping routing tables for all ASs, which
can then be used to construct the paths. As stated in Section 1.4, complete
traceroute data that could give a near complete view of the relevant Internet
routes is not available to us. Thus, for this thesis, the Route Views data set
described in Section 2.3 is used. This limits the amount of paths that are known
to the ones visible through the Route Views routers, but should provide enough
known paths to evaluate the model.
To construct the routing tables, the AS Path field of the Route Views BGP
routing table entries is used. As it gives the complete path from the start AS
to the end AS, the path can simply be stepped through to create the next hop
entry for each AS in the path. Since BGP propagates paths, it is certain that
a path that is a tail of this AS path will be a correct path as well. Had there
been a different preferred path for such a tail, then that would also have been
reflected in the full path. A map data structure is used to store the (start, end)
AS pair and its next hop AS. Reconstructing the path is then a trivial task, as
seen in Algorithm 2.
Algorithm 2 Known Path Construction
Require: A start node s. An end node e. A function NextHop(s, e) that returns the next hop from s on the path to e. A function rel(u, v) returning the edge connecting nodes u and v.
Ensure: The components on the path P from s to e.

procedure ConstructPath(s, e)
    P ← {∅}
    while s ≠ e do
        P ← P ∪ {s, rel(s, e)}
        s ← NextHop(s, e)
    P ← P ∪ {e}
    return P
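A small Java sketch of this idea is shown below (an illustration under assumed types and made-up AS paths, not the thesis implementation): next-hop entries are recorded by stepping through observed AS paths, and a path is then reconstructed by repeatedly looking up the next hop, mirroring Algorithm 2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the known path model: a next-hop table keyed by
// (current AS, destination AS), filled from observed AS paths.
public class KnownPathSketch {
    private final Map<Long, Integer> nextHop = new HashMap<>();

    private static long key(int current, int destination) {
        return (((long) current) << 32) | (destination & 0xffffffffL);
    }

    // Step through an observed AS path and record the next hop toward the final AS
    // for every AS on the path (tails of a BGP path are themselves valid paths).
    void addObservedPath(int[] asPath) {
        int destination = asPath[asPath.length - 1];
        for (int i = 0; i + 1 < asPath.length; i++) {
            nextHop.put(key(asPath[i], destination), asPath[i + 1]);
        }
    }

    // Algorithm 2, returning the sequence of ASs from start to end,
    // or null if some hop is unknown.
    List<Integer> constructPath(int start, int end) {
        List<Integer> path = new ArrayList<>();
        int current = start;
        while (current != end) {
            path.add(current);
            Integer hop = nextHop.get(key(current, end));
            if (hop == null) return null; // no routing information for this pair
            current = hop;
        }
        path.add(end);
        return path;
    }

    public static void main(String[] args) {
        KnownPathSketch sketch = new KnownPathSketch();
        sketch.addObservedPath(new int[]{13456, 701, 1239, 7442}); // hypothetical AS path
        System.out.println(sketch.constructPath(701, 7442)); // a tail of the observed path
    }
}
```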
The most important advantage of this model is that the paths are accurate.
While paths may change over time, and changes should be monitored, the model
provides one path as a response. The case of load balancing has not been
considered here, though allowing for the construction of multiple paths would
require only small modifications to the model. More interesting is how the
multiple paths would be handled by the cost inference, as well as how a decision
of path selection would be made if the paths differ in quality.
This model is also faster than the least hops model, as it does not need
to search a graph every time it is queried, but at the cost of having all path
information stored. This may require quite a bit of memory as the available
routing information grows.
The big disadvantage of course is the data collection, requiring a reliable
way of obtaining path information. Given that obtaining such data is realistic though, this model is the one that has been chosen as the path mapping
component.
3.2.2 Path component cost estimation
Going back to Figure 3.3, we now turn to the second part of the network model.
Given the path components from the mapping, network metrics are to be estimated.
Training the cost estimation component can be visualized as in Figure 3.7.
The first step is again to collect call data to be used for inferring component
costs. Before we can continue with that however, the AS pairs must be provided
to the mapping system component in order for them to be translated into path
components, which are returned to the cost inference step. Now the call
data can be analyzed and finally the inferred component costs can be provided
for the cost estimation.
Figure 3.7: Training the cost estimation component.
Cost inference
LLSE has been chosen as the method to analyze the call data. This is due
to the naïve model having used this method as well, and we wish to be able to
make a fair comparison between the models. We have assumed that our network
metrics are additive, and that components are independent of one another, so
this is a valid choice. We also know that the algorithm is scalable when our
data set grows [7].
The problem to solve is, as stated in Section 2.7.2,
Xβ = Y
where X is a matrix with rows corresponding to data points, and columns to
path components. The matrix simply consists of 1’s and 0’s, signifying whether a
component is part of a row’s path or not. Y is a matrix with rows corresponding
to those of X, and columns to the network metrics, transformed according to
Section 2.6.
Cost estimation
Having calculated β, cost estimation is trivial. With a query x, equivalent to a
row in X previously, the estimate ŷ is obtained by
xβ = ŷ
Of course, having transformed the metrics to their additive representations,
they must be transformed back before they are returned from this step.
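The sketch below illustrates how a query is turned into a 0/1 indicator row over path components and multiplied with the fitted coefficients, followed by transforming the loss metric back from its additive log form (the component indexing, the precomputed β values, and the interpretation of the back-transformed loss as the end-to-end fraction lost are assumptions for illustration; the actual LLSE fit is not shown).

```java
import java.util.List;
import java.util.Map;

// Minimal sketch of the cost estimation step: indicator row x times fitted coefficients beta.
public class CostEstimationSketch {
    // One coefficient column per metric, indexed by path component.
    // Loss coefficients are assumed to be in the transformed log(1 - L) form.
    record Beta(double[] rtt, double[] jitter, double[] logSurvival) {}
    record Estimate(double rttMs, double jitterVar, double loss) {}

    static Estimate estimate(List<String> pathComponents, Map<String, Integer> componentIndex, Beta beta) {
        // The 0/1 indicator row is applied implicitly: only components on the path contribute.
        double rtt = 0, jitter = 0, logSurvival = 0;
        for (String component : pathComponents) {
            int j = componentIndex.get(component);
            rtt += beta.rtt()[j];
            jitter += beta.jitter()[j];
            logSurvival += beta.logSurvival()[j];
        }
        // Transform the loss metric back from its additive (log) representation.
        double loss = 1.0 - Math.exp(logSurvival);
        return new Estimate(rtt, jitter, loss);
    }

    public static void main(String[] args) {
        // Hypothetical components: two AS nodes and the edge between them.
        Map<String, Integer> index = Map.of("AS13456", 0, "AS13456-AS7442", 1, "AS7442", 2);
        Beta beta = new Beta(
                new double[]{30, 90, 40},             // RTT contributions (ms)
                new double[]{2, 10, 3},               // jitter variance contributions
                new double[]{-0.005, -0.02, -0.001}); // log(1 - L) contributions
        Estimate e = estimate(List.of("AS13456", "AS13456-AS7442", "AS7442"), index, beta);
        System.out.printf("RTT=%.0f ms, jitter var=%.0f, loss=%.4f%n", e.rttMs(), e.jitterVar(), e.loss());
    }
}
```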
3.3 Mean opinion score estimator
The second main component of the system is the MOS estimator, which helps
with interpreting the values from the network model. As no models that describe
MOS using only the parameters available in our data have been found, we will
apply statistical analysis to attempt to produce a useful estimate.
Several methods are proposed and tested to identify the model with the
best results. The tests and results are detailed in Chapter 4, while the resulting
design decisions are outlined here.
Figure 3.8: A MOS estimation model with ensemble learning applied directly to call data.
The first method considered is to apply supervised learning algorithms directly to the call data. In doing so, no assumptions about the relationship
between network metrics and MOS are made.
The supervised learning algorithms considered are the ones from Section 2.7,
the ensemble learning algorithms. In Figure 3.8, a boosting or bagging ensemble
is trained with RTT, jitter, and packet loss as input parameters, and reported
MOS values as output values. The ensemble is then directly queried to obtain
an estimated MOS value.
Figure 3.9: A MOS estimation model with clustering applied to call data.
The second method considered is to cluster data points by network metric
values, effectively discretizing the space of MOS values. The motivation behind
clustering is that the MOS values in the call data are observations of a random
variable, the mean of which is what is of interest to us. The clustering algorithms
described in Section 2.7 require the number of clusters to be specified. However,
there is no intuitive way of knowing how many clusters could be appropriate for
our purposes. A starting point for thepamount of clusters is therefore chosen
through a rule of thumb [22] to be k = n/2, where n is the data set size. The
rule of thumb was chosen over other ways of selecting k mainly for its simplicity,
as computation times were already growing quite large.
Figure 3.10: A MOS estimation model with supervised learning applied to call data with MOS estimates from clustering.
Figure 3.9 shows how this method functions similarly to the previous one, though the training data is here instead clustered with either k-means or Gaussian mixture clustering. Queries are then assigned to the most appropriate cluster in accordance with the algorithm, and the mean MOS value of the cluster's data points is returned as an estimate.
The final method considered is where clustering is first used, and supervised
learning is then applied to the resulting data. Here we wish to see if the clustered
results can be improved upon by increasing the set of possible results. Figure
3.10 illustrates how this would look.
3.3.1 Evaluation of alternatives
All of these approaches were tested using the full evaluation data set that will
be described in Section 4.2. When reviewing the results, the first method of
applying supervised learning to the call data directly is found to be the best
performing. Of the two algorithms tried for this method, the boosting algorithm
produced the best result while also being faster and requiring fewer resources.
Consequently, the MOS estimator component of the system is chosen to be
that of Figure 3.8 with a boosting ensemble. The mean square errors of all
approaches are shown in Figure 3.11. The test results will be shown in more
detail in Section 4.4.
Figure 3.11: Mean square errors of all MOS estimation approaches (GMM, GMM+Bagging, GMM+Boosting, k-means, k-means+Bagging, k-means+Boosting, Bagging, Boosting).
3.4 Design summary
Figure 3.12 shows an overview of the system’s design after all choices have been
considered. Three main components make up the system: path mapping, path
component cost estimation, and MOS estimation. The network model, which is
Figure 3.12: Final system design with three main components and their inputs and outputs.
of the greatest interest, consists of the first two components, which work together
to produce the network metric estimate. The cost estimation component is
tightly coupled with path mapping, providing estimates based on the particular
mappings from the previous step. The MOS estimation component however
could be removed from the system easily, should a decision be possible from
network metrics alone.
To train the model, two sets of data are required: the AS-level paths taken
by calls, and call data with network metrics and MOS.
Chapter 4
Evaluation
In Chapter 1, we presented a number of questions to be answered in this thesis.
Is the proposed network model as reliable as the current naïve model? Does
performance in terms of computation time improve in practice when comparing
the proposed model to the current naı̈ve model? Can the model directly predict
user experience by translating network metrics into MOS? Having decided on a
design for the system, this chapter will answer these questions.
The chapter will describe how the system and its components have been
tested and evaluated, and present the outcome. First, the objectives of the
tests and what the tests will cover are defined. Then, the choice of data sets is
presented. Next, the two main components are tested separately. Finally, the
system as a whole is tested.
4.1 Test objectives and parameters
The goal of testing the model is to ascertain how well the model cost predictions
match real data. This will be done by measuring the Mean Square Error (MSE)
and Root Mean Square Error (RMSE) of the model outputs compared to the
true observed values in the test sets.
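For reference, the two error metrics reduce to the following MATLAB one-liners,
assuming est and obs are equally sized vectors of estimates and observations
(illustrative names only).

mse  = mean((est - obs).^2);   % Mean Square Error
rmse = sqrt(mse);              % Root Mean Square Error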
For the network model, what will be tested is the estimated network metric
values for AS pairs, and the computation times required. In addition to the
MSE, plotting the estimates against observations is of interest to see what the
nature of the errors is, and how the estimator behaves. What is most interesting is how the path mapped model compares to the naïve model, particularly
in computation times, as this is where improvements can be expected.
For the MOS estimator, it is naturally the estimated MOS value that is
compared to the observed MOS values for the test data, with network metrics
as input values.
For the complete system it is the error when comparing the MOS estimate
for an AS pair and the mean observed MOS for that pair that is to be measured.
Similar to the network model, the spread of MOS values is also valuable to look
at, as it shows whether the user experience is somewhat consistent for a pair as
well. The metrics used to evaluate the models have been chosen based on prior
experience using them for statistical analysis.
4.2 Data set selection
The measured metrics are not the only factors that determine a user’s experience
of a call, as noted in Section 2.5. There are factors on the user’s side that affect
the perceived experience substantially [18].
In order to reduce the number of factors that affect the data, the platforms
were constrained. Only calls between two Windows desktop clients are used in
the test data. Unfortunately, this does not rule out differences in client versions,
which can affect quality. It also does not differentiate between wired and wireless
Internet connections, which means that things like wireless access point location
planning may affect MOS values.
The data is based on calls performed throughout February 2015, during
which time a fraction of users were randomly prompted to rate the call on a
scale from 1 to 5. If a user opted to rate the call, then the call data was logged.
To match the time period, the Route Views data used is from a single point
in time at the middle of the time period. Some routes may have changed prior
to or after this midpoint, introducing some possibility of error.
Source AS   Remote AS   Jitter (ms)   RTT (ms)   Send loss (%)   Receive loss (%)   MOS
13456       7442        20            152        0               10                 4

Table 4.1: An example of the call data entries used.
Table 4.1 shows an example of the call data used as the basis for training
and evaluating the model. ASs are identified by their AS number, and jitter
and RTT are measured in milliseconds. Packet loss is measured separately for
packets sent and received.
This data was further filtered based on AS pairs in order to be able to
perform the tests. First, only AS pairs whose paths were present in the Route
Views data could be used, as path mapping could not be performed otherwise.
Second, only AS pairs that were present at least 100 times in the call data were
included when testing the model errors. This is to have some confidence in the
data for the pairs, as well as being able to cross validate the model without
having AS pairs in the test set that were not present in the training set.
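A sketch of this frequency filter, mirroring the FREQ_CUTOFF logic in Appendix
A.2, is shown below; pairs and metrics are illustrative names for an n-by-2 matrix
of AS numbers and the corresponding metric rows, not identifiers from the thesis
code.

FREQ_CUTOFF = 100;
[~, ~, ic] = unique(pairs, 'rows');      % index of each row's AS pair
counts = histc(ic, unique(ic));          % occurrences per distinct AS pair
keep = counts(ic) >= FREQ_CUTOFF;        % rows whose pair occurs often enough
pairs = pairs(keep, :);
metrics = metrics(keep, :);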
The final data set used for the evaluation of accuracy consisted of 2,549,900
calls from 2,178 unique AS pairs. For performance evaluation, the data set
contained 2,934,919 calls from 65,579 unique AS pairs.
4.3 Network model
In testing the network model, 10-fold cross validation [16] has been performed
with the data set, training both a naı̈ve model and a path mapped model. The
evaluation was performed in MATLAB, and the implementation is available in
Appendix A.2. Table 4.2 shows the resulting errors for both models. There is
no notable impact on the average error from the path mapping, with the errors
being very similar. At a glance, the errors appear quite large however.
Model         Error   Jitter (ms)   RTT (ms)   Send loss (%)   Receive loss (%)
Path Mapped   MSE     2665          2900       0.2             0.14
              RMSE    163.3         170.3      4.5             3.8
Naïve         MSE     2666          2913       0.2             0.14
              RMSE    163.3         170.7      4.5             3.8

Table 4.2: Mean square errors and root mean square errors in 10-fold cross
validation of the network models.
              Path Mapped          Naïve
              ∈ CI      ∉ CI       ∈ CI      ∉ CI
RTT           1750      428        1749      429
Jitter        432       1746       435       1743
Send loss     1096      1082       1105      1073
Receive loss  1380      798        1385      793

Table 4.3: The number of network parameter estimates that fall inside of and
outside of the 95 % confidence intervals for the mean of the observed values for
each AS pair.
Having done cross validation, the plots in Figures 4.1-4.8 show the estimates
generated by the models plotted against the means of the data points for every
AS pair in the data set.
Table 4.3 shows the number of estimates that fall within and outside of the
95 % confidence interval (CI) for the mean of the observed values, assuming
that the observations are normally distributed around the true value. Quite a
large number of estimates fall outside of the confidence interval, indicating that
the confidence of the model’s estimates is not that great. The data shown is for
one of the cross validation folds, though the others behave similarly.
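The confidence interval check can be sketched as below, following the
fitdist/paramci usage in Appendix A.2 and assuming the observations for one AS
pair are normally distributed around the true mean; obsForPair and estimate are
illustrative names.

pd = fitdist(obsForPair, 'Normal');    % fit a normal distribution to the pair's observations
ci = paramci(pd);                      % 95 % CIs; the first column is the interval for the mean
insideCI = estimate >= ci(1,1) && estimate <= ci(2,1);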
Figures 4.1 and 4.2 show the results of the RTT estimation. The two models
produce very similar results, and seem to overall follow the diagonal line that
would mean a perfect prediction of the means. It would appear that the models
underestimate large RTT values, though the number of data points above 600
ms is not large enough to be conclusive. In the more populated ranges, the
spread is more uniform, though somewhat large, as seen in Table 4.2.
Depending on the required accuracy, this may be acceptable, especially as the
most interesting data points are the ones with high RTT. 20 % of the estimates
fall outside of the 95 % confidence interval, which is the best result of the
estimated metrics, but still a large amount. The expected value from a perfect
fit would be 95 % accuracy.

Figure 4.1: Path mapping network model RTT estimates plotted against the
means of test point values.

Figure 4.2: Naïve network model RTT estimates plotted against the means of
test point values.

Figure 4.3: Path mapping network model jitter estimates plotted against the
means of test point values.

Figure 4.4: Naïve network model jitter estimates plotted against the means of
test point values.
Figures 4.3 and 4.4 show the results of the jitter estimation. Again, the two
models have generated very similar results, but the fit is much worse than that
of RTT. The regression models have a tendency to overestimate this parameter,
and have comparatively few error cases where they have underestimated it. Considering that the majority of the observed means are below 200 ms, the RMSE of 163
ms makes this parameter's estimation very untrustworthy. In fact, 80 % of the
estimated values fall outside of the confidence intervals for the observed means.

Figure 4.5: Path mapping network model sending packet loss estimates plotted
against the means of test point values.

Figure 4.6: Naïve network model sending packet loss estimates plotted against
the means of test point values.

Figure 4.7: Path mapping network model receiving packet loss estimates plotted
against the means of test point values.

Figure 4.8: Naïve network model receiving packet loss estimates plotted against
the means of test point values.
Figures 4.5 and 4.6 show the results of the send packet loss estimation. Yet
again, the models generate very similar results, though quite concentrated close
to zero. The estimator has a tendency to overestimate the packet loss, not
completely unlike the jitter estimator. This is particularly noticeable when the
mean is zero, as there are plenty of estimates along the X-axis. Other than that
slight bias though, the estimator behaves quite randomly, with widely scattered
values. Half of the estimates fall within the 95 % confidence interval, which
again is poor.
Figures 4.7 and 4.8 show the results of the receiving packet loss estimation.
Unsurprisingly, the models match closely also for the last parameter. Like the
other packet loss parameter, the values are concentrated close to zero. The
estimator seems to work slightly better for this data, though the plot is still
very scattered. The RMSE is down slightly as well, and 63 % of estimated
values fall within the confidence intervals here. The performance must still be
considered poor however.
Overall, it can be said that the path mapping step has not noticeably influenced
the linear regression step of the network model. The regression results appear
quite poor; however, it can be noted that in all of the cases the models produce
large estimates for the large observed values as well. If these cases with large,
i.e. poor, values are compared to significantly lower estimates, then there can be
some value derived from these estimates. This is because the significance of the
differences between estimates likely outweighs the probability of error in the
individual estimates.
Figure 4.9: Total number of ASs present as a function of included distinct AS
pairs.
Remaining then is to compare computation times between the two models,
which was what we hoped to improve. Figure 4.9 shows how the total number of
ASs grows as a function of the number of distinct AS pairs. This is of interest to
us as it reflects the difference in the number of variables to solve for in the naïve and the
proposed model. As expected, the growth rate decreases fairly rapidly, having
included 40,000 of the around 50,000 ASs already at slightly more than 60,000
distinct pairs. At the beginning though, the total number of ASs is naturally
higher, as many paths consist of more than two ASs. At 5000 distinct pairs, the
two are roughly equal.
Figure 4.10 shows the computation times of running the naı̈ve model and the
path mapped model as a function of distinct AS pairs. It should be noted that
the number of equations in the system was not held constant as the pair count
increased, but it was the same between the models at each measured point. The
naïve model is in the beginning slightly faster, though by very small amounts.
The path mapped model starts to perform better after a while though, at about
the same point as AS count growth visibly starts to decline in Figure 4.9. The
difference appears to be growing at a consistent rate, though the slope of the
graphs could be affected by the varying number of equations.

Figure 4.10: Time to solve the LLSE system as a function of distinct AS pairs
included.

This confirms that the path mapped model performs better than the naïve
model in the tests as the number of distinct AS pairs grows.
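The timing comparison itself amounts to solving the two sparse least squares
systems and measuring the elapsed time, roughly as sketched below; A, A_n, and
b are the matrices loaded in Appendix A.2, where A is assumed to be the path
mapped system and A_n the naïve one.

tic; x   = A   \ b; tPath  = toc;   % path mapped: one column per path component
tic; x_n = A_n \ b; tNaive = toc;   % naive: one column per distinct AS pair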
4.4 Mean opinion score estimator
To test the MOS estimator, we again use 10-fold cross validation. However,
the data set had to be reduced in order to allow for computations within the
limited memory space and time available. The MOS data set consists of 320,000
data points sampled from the previously used data set.
Beginning with the clustering algorithms, the optimal number of clusters was
found through trial and error. Using the rule of thumb mentioned in Section 3.3,
we use k = √(320,000/2) = 400 as a starting point. Table 4.4 shows the results
of these initial tests. The results of clustering, as stated in Section 3.3, are the
mean values of the resulting clusters’ training data points, and it is against this
mean value that the tested values are compared.
The Gaussian mixture algorithm does not seem affected by varying the number of clusters, whereas k-means loses accuracy as the number of clusters increases. Since the clustering algorithms are computationally intensive, greater
granularity in the cluster numbers is not feasible to test. Therefore, the best
result for each algorithm in this table is selected as the number of clusters to be
used, which was 300 in both cases.
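For the Gaussian mixture alternative, a minimal MATLAB sketch could look as
follows, assuming fitgmdist and cluster from the Statistics and Machine Learning
Toolbox; the thesis does not state the exact options used, so the regularization
value and the variable names here are only illustrative.

gm = fitgmdist(Xtrain, 300, 'RegularizationValue', 1e-5);   % fit a 300-component mixture
assignTrain = cluster(gm, Xtrain);                          % hard assignment of training points
clusterMos  = accumarray(assignTrain, mosTrain, [300 1], @mean);
mosEstimate = clusterMos(cluster(gm, Xquery));              % query assigned to its most likely component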
For these values, the results of the clustering are used to train the boosting
and bagging learner ensembles, as depicted in Figure 3.10. We will see that the
33
results would have to be significantly improved to be of interest compared to
not using clustering, which seems unlikely.
Clusters            300      350      400      450      500
Gaussian mixture    2.0023   2.0024   2.0023   2.0024   2.0024
k-means             2.0033   2.0036   2.0039   2.0046   2.0053

Table 4.4: Mean square errors for the two clustering algorithms with different
cluster sizes using 10-fold cross validation. The first row contains the cluster
sizes.
Below, Table 4.5 shows the results of running the two supervised learning
algorithms after processing the training data by either of the clustering algorithms as well as with no pre-processing. In the clustered data cases, the model
is trained using the means of the parameters belonging to a cluster. The models
are trained with 100 decision trees, limited by memory space.
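A hedged sketch of how the two ensembles could be trained in MATLAB is given
below, assuming the fitensemble function with 100 bagged or least-squares boosted
trees; the thesis does not name the exact calls used, and Xtrain, mosTrain, and
Xquery are illustrative names.

bagged  = fitensemble(Xtrain, mosTrain, 'Bag', 100, 'Tree', 'Type', 'regression');
boosted = fitensemble(Xtrain, mosTrain, 'LSBoost', 100, 'Tree');
mosHat  = predict(boosted, Xquery);   % MOS estimates from the boosted ensemble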
           Raw data   Gaussian mixture   k-means
Bagged     1.8683     2.0055             2.0182
Boosted    1.8502     2.1820             2.0165

Table 4.5: Mean square errors for the two supervised learning algorithms using
different training data.
The extra model on top of clustering seems to somewhat increase the MSE,
though only by a small amount. This is largely unimportant though, as it is clear
that the model produces the best results when the data has not been clustered
prior to training. The boosted model is slightly better than the bagged model
for the raw data in the tests, and also has the benefit of being faster to train
and requiring less memory.
Figure 4.11 shows the result of using the best model to predict MOS on the
test partition of one of the cross validation folds. Most data points have values
that are high on the scale (indicating high quality), with a large spike at the
top values. It stands to reason that a large user base is more likely where the
quality is high, so more data points would be collected for such calls.
Plotting the MOS estimates of the model against the averages for each AS
pair paints a less positive picture, however, as shown in Figure 4.12. The estimated
values are overall in a quite slim range compared to the observed values, with
plenty of values close to 5, where the estimator has none at all. Only 61 % of
predictions fall within the 95 % confidence intervals of mean of the observed
values (1324 within CI compared to 854 outside CI).
Figure 4.11: Distribution of MOS predictions using the boosted raw data model.
Figure 4.12: Boosted raw data model estimates plotted against the means of
test point values for AS pairs.
4.5 Full system
For the full system test, the MOS estimator is trained with the full data set, as
this particular model for MOS estimation had no issues with the size. 10-fold
cross validation is used here as well.
MSE     0.196
RMSE    0.443

Table 4.6: Mean square error and root mean square error for the full system's
predictions of AS pair MOS values compared to their observed means.
In Table 4.6, the error values for the tests are shown. While smaller than
those listed earlier, note that these values are for the means of AS pairs, which
would reduce the influence of outliers on the errors.
When the estimated values are plotted against the observed means for AS pairs
in Figure 4.13, the result appears quite similar to Figure 4.12. The estimates from
the network model have slightly improved the fit though, aligning the data
points better overall with the optimal diagonal line. Now 64 % of the estimates
fall within the 95 % confidence intervals for the mean of observed values (1387
within CI compared to 791 outside CI).
While few AS pairs with low observed mean MOS scores are present in
the data, the ones that do appear indicate that the system would overestimate
those values by multiples of the RMSE, with degrading performance the worse
the quality gets.
Figure 4.13: System estimates plotted against the means of test point values.
Chapter 5
Summary and conclusions
This chapter will discuss the results from the previous chapter and how the
choices made during the design and implementation of the models have affected
the outcome of the process. It will also relate back to the original problem,
evaluating whether the problem has been solved and what conclusions can be
drawn from the results produced. Finally, future directions and alternatives are
discussed.
5.1 Summary of the outcomes
As noted for the network model previously, none of the parameter estimations
were notably affected by the path mapping step in the estimation. Whether
this is because of the number of ASs used in training the model, or because no
notable amount of information stands to be gained from the path mapping step
is unclear, as a larger set of AS paths was unavailable at the time of testing.
With the used set of paths, the number of path components in the regression
model was larger than the number of unique ASs and paths, which would not
be the case with a larger set of paths. With more paths than components,
the probability that a component is traversed in multiple paths should increase,
thereby affecting the estimation of the path network metrics.
Another potential issue with the data is that some AS pairs are a lot more
common than others, causing their performance to influence the results more
than is desirable. However, the fact that the network model and the naïve model
differ only slightly in their predictions, even though the network model has many
more shared components, implies that this has a relatively small impact, if any. To
what extent this is a consequence of the properties of the test data is therefore
an open question.
Looking at the errors for the network parameter estimates, RTT estimation
performs significantly better than jitter and packet loss estimation. Unfortunately, RTT has also been found to be less influential relative to jitter and packet
loss in affecting user satisfaction [6]. RTT is directly related to the physical
distance that the packets have to travel, whereas jitter and packet loss are not,
which could be the cause of a greater degree of consistency in RTT data points.
As stated earlier however, the estimations for the observed large values are in
most cases accurate enough to be able to compare to estimates of a path that
is significantly better. This means that for the pairs with the largest (worst)
network metric measurements, that is, the most interesting ones, this model can
improve QoS.
The measured predictive power of the model is also affected by the choice
of evaluating metrics (Mean Square Error and Confidence Intervals). Other
metrics may have given additional insights into how well the model can be
expected to perform, but the current scope was set given the time constraints.
The performance of the path mapped model was found to be better than
that of the naı̈ve model, though the theoretically expected very large difference
was not visible. It is possible that the data set was not large enough to fully
illustrate the difference between the models, since Figure 4.9 shows that the
difference in growth rate of the number of variables has just begun to manifest
itself. It may also be that hardware resources affected computation times, as
computations often consumed all available resources on the test machine.
The poor results in the MOS estimation are not very surprising, seeing as
there are many factors that were not considered in the estimation that affect
user satisfaction, as mentioned in Section 2.5. There are other models internally
at Skype that are constructed similarly, but take into account a number of additional factors, amongst them bandwidth, that work well for estimating MOS.
It stands to reason then that the parameters used are insufficient for producing
a useful estimate, and that, given this, the network model must be utilized based
on the network parameters it produces.
5.2 Lessons learned
A conscious design decision that was made was to retain the simple LLSE in
order to be able to compare the effect of path mapping with the naı̈ve model
which had utilized it previously. Trying other models would have been an addition to the scope that would have required additional time. Other models may
have been more suitable for producing estimates on their own, or might have
made use of the path mapping in a better way.
It may have been more beneficial to have omitted the MOS estimation component from the scope, and instead solely focused on finding a good model for
network parameters. This would not have been sufficient to produce a full system as it was originally intended, but could potentially have laid the foundation
for building a more reliable system in the future.
When evaluating the system, only a single data point was used
for the Internet topology. Ideally, multiple data points would have been tested,
however converting the data from its source format took two days per data point
with the tools available. This was deemed too expensive considering the time
constraints. Given that the effect of path mapping was negligible in tests, the
38
use of a single data point appears acceptable.
The majority of the implementation was done in MATLAB, which provided
easy access to a multitude of algorithms and functions necessary. However, this
also limited what was possible to some degree. MATLAB’s maximum memory
limit was often reached with the amounts of data that the model processes,
leaving some algorithms unusable for large calculations. It is possible that more
efficient implementations of algorithms that are customized for our type of data,
and written for a platform with more memory, could lead to better accuracy.
Even in cases where MATLAB was able to run an algorithm on the full
data set and store the results, the available hardware was often a bottleneck. With some tests taking several days, the number of possible models
and parameters that could be tested was severely limited. The consequences of
implementation errors were also harsh, delaying progress at times and necessitating re-prioritization of tasks.
In planning for the implementation and testing, there was clearly a need to
define how the models should be evaluated with smaller and larger data sets
to determine their viability. This would have made it easier to eliminate poor
models early, and also to plan for the possible necessity of additional hardware
to run large tests more efficiently.
5.3 Conclusions
The purpose of this thesis has been to evaluate whether or not a model such
as the one constructed here can be used to understand what impact the global
networks can have on real-time media quality. With a high RMSE at best, and
seemingly random estimations at worst, this model can be said to not work well
for data in general, but does have the potential to improve the worst performing
cases. The proposed model is also a step forward compared to the previous naïve
model, as it scales better.
Adding information to the network data in the form of a network topology
has had no discernible effect on the accuracy of the results, and appears from
the results to affect only the computation times. This does however
introduce an additional cost of collecting more network data, and building and
maintaining a system that accurately maps data onto this network topology. As
the data set was limited by the available BGP route data, this behavior is not
necessarily representative of how a full topology would behave, however.
MOS estimation is a complex endeavor, and the results in this thesis show
that these supervised learning models cannot accurately model MOS with the
limited amount of information that was provided. Since MOS is a subjective measurement, it is not surprising that three objective measurements cannot model it
well in an uncontrolled user environment. For the proposed network model to be
of use, decisions must be made from the estimated network performance metrics
alone.
5.4 Future work
Going forward, there are several things that could be done that either continue
down the same path, or try a different approach. Here we will list some ideas
that we have found interesting.
5.4.1 Path mapping
While there has been no sign of any tangible effect on the accuracy of the results
in this thesis, there remains the possibility that a larger data set, or a different
regression model would produce more favorable results. To fully rule out this
possibility, these options should be explored. Procuring a larger data set is a
problem in itself however, as deriving accurate AS-level paths from traceroutes
has been an open problem for years [21, 27].
Hybrid mapping model
If large scale mapping does seem beneficial, there may still be some issues in
collecting path information. Traceroutes as a way of collecting path information
are not completely reliable, as there may be routers filtering Internet Control
Message Protocol (ICMP) requests on the path [20]. In the case where full
path information is not available, the two path mapping methods described in
Section 3.2.1 could be used together to arrive at an estimate of the path. With
a small neighborhood search in the cases where the true path is not known, the
number of paths found should be substantially less than a full network search,
and the likelihood of it being a correct path higher. The viability of this method
would be interesting to explore.
5.4.2 Parameter exploration
While RTT is directly related to the physical distance that the packets have to
travel, jitter and packet loss are not as intuitive on a macro level. For example,
jitter has been found to correlate with bandwidth [6], which can differ between
links over the path. Furthermore, several sources do not treat packet loss and
jitter as separate variables [6, 8, 18]. How jitter and packet loss are handled is
a result of how the codec is designed [8, 18], which could further affect modeling.
Researching what network parameters influence jitter and packet loss, and
gathering data on these could aid in the estimation of them, and perhaps make
the model explored in this thesis more viable.
5.4.3 Alternative models
A path mapping approach is only one possible way of approaching this problem,
and this type of inference also restricts the possible parameters to additive ones.
Given that MOS estimation likely requires more parameters, a model that could
handle non-additive parameters such as bandwidth would be interesting. The
concern then would be to ensure that data could be processed fast enough, as
more complex algorithms will require heavy computations for the sizable data
sets used.
One alternative that was considered that does not require additivity is an
A/B testing model, where a relayed path and a directly routed path are alternated. Eventually, enough samples would be gathered to distinguish the true
metric values from each other, and a decision could be made about which path
to choose going forward. An advantage of this approach is that the underlying
network metrics do not need to be considered if one so desires. As any numerical
information tracked could be used for the model, a path’s MOS value could be
estimated directly instead of estimating it from the network metrics, reducing
the errors occurring due to missing information.
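As a sketch of this idea, the decision could be made with a simple two-sample
test once enough ratings have been collected on each path; mosRelay and
mosDirect are illustrative sample vectors, and the choice of test is an assumption
rather than part of the thesis.

[h, p] = ttest2(mosRelay, mosDirect);        % do the two paths differ at the 5 % level?
if h == 1 && mean(mosRelay) > mean(mosDirect)
    choice = 'relay';                        % enough evidence that the relayed path is better
else
    choice = 'direct';
end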
Bibliography
[1] CIDR Report. http://www.cidr-report.org/as2.0/ (Retrieved on 2015-05-06).
[2] The CAIDA AS Relationships Dataset, 2015-02-01.
http://www.caida.org/data/as-relationships/ (Retrieved on 2015-03-10).
[3] University of Oregon Route Views Project. http://www.routeviews.org/
(Retrieved on 2015-03-10).
[4] Daniele Apiletti, Elena Baralis, Fabio Pulvirenti, Silvia Chiusano, Tania
Cerquitelli, Paolo Garza, Luigi Grimaudo, and Luca Venturini. D3.1 Available
algorithms identification. Public deliverable, ONTIC Project (GA
number 619633), January 2015.
[5] H. Assem, M. Adel, B. Jennings, D. Malone, J. Dunne, and P. O’Sullivan. A
generic algorithm for mid-call audio codec switching. In Integrated Network
Management (IM 2013), 2013 IFIP/IEEE International Symposium on,
pages 1276–1281, May 2013.
[6] Kuan-Ta Chen, Chun-Ying Huang, Polly Huang, and Chin-Laung Lei.
Quantifying Skype User Satisfaction. In Proceedings of the 2006 Conference
on Applications, Technologies, Architectures, and Protocols for Computer
Communications, SIGCOMM ’06, pages 399–410, New York, NY, USA,
2006. ACM.
[7] Cheng T. Chu, Sang K. Kim, Yi A. Lin, Yuanyuan Yu, Gary R. Bradski,
Andrew Y. Ng, and Kunle Olukotun. Map-Reduce for Machine Learning
on Multicore. In NIPS, pages 281–288. MIT Press, 2006.
[8] R. G. Cole and J. H. Rosenbluth. Voice over IP Performance Monitoring.
SIGCOMM Comput. Commun. Rev., 31(2):9–24, April 2001.
[9] Thomas G. Dietterich. Ensemble Methods in Machine Learning. In Proceedings of the First International Workshop on Multiple Classifier Systems,
MCS ’00, pages 1–15, London, UK, UK, 2000. Springer-Verlag.
[10] X. Dimitropoulos, D. Krioukov, B. Huffaker, k. claffy, and G. Riley. Inferring AS Relationships: Dead End or Lively Beginning? In 4th Workshop on
Efficient and Experimental Algorithms (WEA), volume 3503, pages 113–
125, Santorini, Greece, May 2005. Springer Lecture Notes in Computer
Science.
[11] Xenofontas Dimitropoulos, Dmitri Krioukov, Marina Fomenkov, Bradley
Huffaker, Young Hyun, kc claffy, and George Riley. AS Relationships:
Inference and Validation. SIGCOMM Comput. Commun. Rev., 37(1):29–
40, January 2007.
[12] Vladimir Estivill-Castro. Why So Many Clustering Algorithms: A Position
Paper. SIGKDD Explor. Newsl., 4(1):65–75, June 2002.
[13] Lixin Gao. On Inferring Autonomous System Relationships in the Internet.
IEEE/ACM Trans. Netw., 9(6):733–745, December 2001.
[14] Lixin Gao and Feng Wang. The extent of AS path inflation by routing
policies. In Global Telecommunications Conference, 2002. GLOBECOM
’02. IEEE, volume 3, pages 2180–2184 vol.3, Nov 2002.
[15] Krishna P. Gummadi, Stefan Saroiu, and Steven D. Gribble. King: Estimating Latency Between Arbitrary Internet End Hosts. In Proceedings of
the 2nd ACM SIGCOMM Workshop on Internet Measurment, IMW ’02,
pages 5–18, New York, NY, USA, 2002. ACM.
[16] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of
Statistical Learning. Springer Series in Statistics. Springer New York Inc.,
New York, NY, USA, 2001.
[17] J. Hawkinson and T. Bates. Guidelines for creation, selection, and registration of an Autonomous System (AS). RFC 1930, RFC Editor, March
1996.
[18] ITU-T. Recommendation G.107 : The E-Model, a computational model
for use in transmission planning. Technical report, 2011.
[19] ITU-T. Recommendation Y.1540 : Internet protocol data communication
service – IP packet transfer and availability performance parameters. Technical report, 2011.
[20] Priya Mahadevan, Dmitri Krioukov, Marina Fomenkov, Xenofontas Dimitropoulos, k c claffy, and Amin Vahdat. The Internet AS-level Topology:
Three Data Sources and One Definitive Metric. SIGCOMM Comput. Commun. Rev., 36(1):17–26, January 2006.
[21] Zhuoqing Morley Mao, Jennifer Rexford, Jia Wang, and Randy H. Katz.
Towards an Accurate AS-level Traceroute Tool. In Proceedings of the 2003
Conference on Applications, Technologies, Architectures, and Protocols for
Computer Communications, SIGCOMM ’03, pages 365–378, New York,
NY, USA, 2003. ACM.
[22] Kanti V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis
(Probability and Mathematical Statistics). Academic Press, January 1980.
[23] Stefan Savage, Andy Collins, Eric Hoffman, John Snell, and Thomas Anderson. The End-to-end Effects of Internet Path Selection. In Proceedings
of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM ’99, pages 289–299, New
York, NY, USA, 1999. ACM.
[24] H. Tangmunarunkit, R. Govindan, S. Shenker, and D. Estrin. The Impact
of Routing Policy on Internet Paths. In INFOCOM 2001. Twentieth Annual
Joint Conference of the IEEE Computer and Communications Societies.
Proceedings. IEEE, volume 2, pages 736–742 vol.2, 2001.
[25] Y. Angela Wang, Cheng Huang, Jin Li, and Keith W. Ross. Queen: Estimating Packet Loss Rate Between Arbitrary Internet Hosts. In Proceedings
of the 10th International Conference on Passive and Active Network Measurement, PAM ’09, pages 57–66, Berlin, Heidelberg, 2009. Springer-Verlag.
[26] Ye Wang, Cheng Huang, Jin Li, Philip A. Chou, and Y. Richard Yang.
QoSaaS: Quality of Service as a Service. In Proceedings of the 11th USENIX
Conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services, Hot-ICE’11, pages 6–6, Berkeley, CA, USA,
2011. USENIX Association.
[27] Baobao Zhang, Jun Bi, Yangyang Wang, Yu Zhang, and Jianping Wu.
Revisiting IP-to-AS Mapping for AS-level Traceroute. In Proceedings of
The ACM CoNEXT Student Workshop, CoNEXT ’11 Student, pages 16:1–
16:2, New York, NY, USA, 2011. ACM.
Appendices
Appendix A
Source code
A.1 Constrained BFS Java implementation

// Returns distances and parents for all end nodes in the graph
private ArrayList<NodeInfo> constrainedBFS(int startIndex) {
    ArrayList<NodeInfo> dist = new ArrayList<NodeInfo>(s.nodeMapping.size());
    for (int i = 0; i < s.nodeMapping.size(); i++)
        dist.add(new NodeInfo());
    dist.get(startIndex).dist = 0;

    Queue<QueueEntry> queue = new LinkedList<QueueEntry>();
    queue.add(new QueueEntry(0, startIndex, false));
    QueueEntry front;
    while ((front = queue.poll()) != null) {
        int savedDist = front.constrained ?
                dist.get(front.index).cDist : dist.get(front.index).dist;
        if (savedDist > front.dist)
            continue;
        int newDist = savedDist + 1;

        for (Relation rel : s.graph.get(front.index)) {
            NodeInfo neighbor = dist.get(rel.end);
            if (rel.type == RelType.C2P) {
                if (front.constrained)
                    continue;
                if (newDist == neighbor.dist
                        && !neighbor.parents.contains(front.index)) {
                    neighbor.parents.add(front.index);
                } else if (newDist < neighbor.dist) {
                    neighbor.parents.clear();
                    neighbor.parents.add(front.index);
                    neighbor.dist = newDist;
                    queue.add(new QueueEntry(newDist, rel.end, false));
                }
            } else {
                if (front.constrained && rel.type == RelType.P2P)
                    continue;
                if (newDist == neighbor.cDist
                        && !neighbor.cParents.contains(front.index)) {
                    neighbor.cParents.add(front.index);
                } else if (newDist < neighbor.cDist) {
                    neighbor.cParents.clear();
                    neighbor.cParents.add(front.index);
                    neighbor.cDist = newDist;
                    queue.add(new QueueEntry(newDist, rel.end, true));
                }
            }
        }
    }
    return dist;
}

// Return the set of paths to a specified end node
private Set<ArrayList<Integer>> constructPaths(int index,
        ArrayList<NodeInfo> dist) {
    Set<ArrayList<Integer>> paths = new HashSet<ArrayList<Integer>>();
    NodeInfo node = dist.get(index);
    ArrayList<Integer> parents = node.dist < node.cDist ? node.parents : node.cParents;
    for (int parent : parents)
        paths.addAll(constructPaths(parent, dist));
    if (paths.size() == 0)
        paths.add(new ArrayList<Integer>());
    for (List<Integer> path : paths)
        path.add(index);
    return paths;
}
A.2 MATLAB estimate tests

%% Load data
load A.dat;
load A_n.dat;
load b.dat;

A = spconvert(A);
A_n = spconvert(A_n);

%% Filter data

FREQ_CUTOFF = 100;

[C, ia, ic] = unique(A, 'rows');
cc = histc(ic, unique(ic));

rowmask = find(cc(ic) >= FREQ_CUTOFF);
A = A(rowmask, :);
A_n = A_n(rowmask, :);
b = b(rowmask, :);

colmask = find(any(A));
colmask_n = find(any(A_n));
A = A(:, colmask);
A_n = A_n(:, colmask_n);
[C, ia, ic] = unique(A, 'rows');

%% Run cross validation

FOLD_CNT = 10;

cv = cvpartition(size(A, 1), 'KFold', FOLD_CNT);
folds = cell(FOLD_CNT, 1);
folds_n = cell(FOLD_CNT, 1);

for i = 1:FOLD_CNT
    folds{i} = cvprocess(A, b, ic, cv.training(i), cv.test(i));
    folds_n{i} = cvprocess(A_n, b, ic, cv.training(i), cv.test(i));
end

mse = zeros(1,4);
mse_n = zeros(1,4);
for i = 1:FOLD_CNT
    mse = mse + calcmse(folds{i});
    mse_n = mse_n + calcmse(folds_n{i});
end
mse = mse ./ 10;
mse_n = mse_n ./ 10;

%% Fold-specific data
% means, model predictions, and confidence intervals for means based on
% test data points

FOLD_IDX = 1;

means = zeros(4, size(folds{FOLD_IDX}, 2));
preds = zeros(4, size(folds{FOLD_IDX}, 2));
ci1 = zeros(2, size(folds{FOLD_IDX}, 2));
ci2 = zeros(2, size(folds{FOLD_IDX}, 2));
ci3 = zeros(2, size(folds{FOLD_IDX}, 2));
ci4 = zeros(2, size(folds{FOLD_IDX}, 2));
for i = 1:size(folds{FOLD_IDX}, 2)
    [p, m, ci] = parsefelement(folds{FOLD_IDX}{i});
    means(:,i) = m;
    preds(:,i) = p;
    ci1(:,i) = ci(:,1);
    ci2(:,i) = ci(:,2);
    ci3(:,i) = ci(:,3);
    ci4(:,i) = ci(:,4);
end

means_n = zeros(4, size(folds_n{FOLD_IDX}, 2));
preds_n = zeros(4, size(folds_n{FOLD_IDX}, 2));
ci1_n = zeros(2, size(folds_n{FOLD_IDX}, 2));
ci2_n = zeros(2, size(folds_n{FOLD_IDX}, 2));
ci3_n = zeros(2, size(folds_n{FOLD_IDX}, 2));
ci4_n = zeros(2, size(folds_n{FOLD_IDX}, 2));
for i = 1:size(folds_n{FOLD_IDX}, 2)
    [p, m, ci] = parsefelement(folds_n{FOLD_IDX}{i});
    means_n(:,i) = m;
    preds_n(:,i) = p;
    ci1_n(:,i) = ci(:,1);
    ci2_n(:,i) = ci(:,2);
    ci3_n(:,i) = ci(:,3);
    ci4_n(:,i) = ci(:,4);
end

function [c] = cvprocess(X, Y, i, cvtr, cvte)
    Xp = X(cvtr, :);
    Yp = Y(cvtr, :);
    idx = i(cvtr, :);
    b = Xp \ Yp;
    c = {};
    y_hat = Xp * b;
    for j = 1:size(Xp, 1)
        c{idx(j)} = y_hat(j,:);
    end
    Xt = X(cvte, :);
    Yt = Y(cvte, :);
    it = i(cvte, :);
    for j = 1:size(Xt, 1)
        c{it(j)} = [c{it(j)}; Yt(j,:)];
    end
end

function [prediction, means, ci] = parsefelement(el)
    el(:,2:3) = 1 - exp(el(:,2:3));
    el(:,1) = sqrt(el(:,1));
    prediction = el(1,:);
    ci = zeros(2, size(el, 2));
    means = zeros(1, size(el, 2));

    for i = 1:size(el, 2)
        pd = fitdist(el(2:end, i), 'Normal');
        means(i) = mean(el(2:end, i));
        cis = paramci(pd);
        ci(:,i) = cis(:,1);
    end
end

function [mse] = calcmse(cvt)
    cnt = 0;
    mse = zeros(1,4);
    for i = 1:size(cvt, 2)
        cvt{i}(:,1) = sqrt(max(0, cvt{i}(:,1)));
        cvt{i}(:,2:3) = 1 - exp(cvt{i}(:,2:3));
        pred = cvt{i}(1,:);
        test = cvt{i}(2:end,:);
        for j = 1:size(test, 1)
            mse = mse + (pred - test(j,:)).^2;
            cnt = cnt + 1;
        end
    end
    mse = mse ./ cnt;
end
In English
The publishers will keep this document online on the Internet - or its possible
replacement - for a considerable time from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission for
anyone to read, to download, to print out single copies for your own use and to
use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses
of the document are conditional on the consent of the copyright owner. The
publisher has taken technical and administrative measures to assure authenticity,
security and accessibility.
According to intellectual property law the author has the right to be
mentioned when his/her work is accessed as described above and to be protected
against infringement.
For additional information about the Linköping University Electronic Press
and its procedures for publication and for assurance of document integrity,
please refer to its WWW home page: http://www.ep.liu.se/