Project Deliverable
CELTIC TRAMMS CP4-025
TRAMMS – TRAFFIC MEASUREMENTS AND MODELS IN MULTISERVICE NETWORKS
DELIVERABLE D3.2 - TRAFFIC MODELS
Editor full name
Perényi Marcell/ Kåre Gustafsson
Editor affiliation
BUTE/EABS
Editor email address
[email protected]/[email protected]
Contributors
Tord Westholm (EABS), Andreas Aurelius (Acreo), Felipe Mata (UAM)
Iñigo Sedano (ROB), Jens Andersson (LTH), Tamás Éltető (BUTE),
Sándor Molnár (BUTE)
Identifier:
Deliverable D3.2
Class:
Report
Version:
Version Date:
02/04/2009
Distribution:
Public
Responsible Partner:
EABS
D3.2 TRAFFIC MODELS
Public
1 (74)
TRAMMS PROJECT
The Celtic TRAMMS project (http://projects.celtic-initiative.org/tramms/) measures and analyzes IP
traffic in European access networks. TRAMMS aims to increase insight into the nature of the data
traffic in today’s and tomorrow’s IP networks. In order to cope with the demands from emerging
applications, the architecture of the underlying networks must be laid out with deep knowledge of the
applications that are used and the traffic that will be flowing through the networks.
The idea behind the concept of a converged infrastructure is that a single network should support (in
principle) all applications. It will have to carry traffic from different terminals and a great variety of
applications. Traditionally, the lack of knowledge regarding traffic patterns in multiservice IP networks has
been compensated for by massive over-provisioning of resources in order to decrease the likelihood of
QoS violations. Understanding user traffic patterns and how they aggregate at different levels will
give a competitive advantage when deploying broadband networks and applications, since the
investment costs will be lower.
The main objective of TRAMMS is to model traffic in multi-service IP networks, and to use the models
as input for capacity planning of tomorrow’s networks. The models will be built upon data acquired with
advanced traffic measurements at the application level, using deep packet/deep flow inspection in
different parts of Europe, combined with bottleneck analysis and interdomain routing analysis. The
project studies the traffic generated by end-users in fixed access network infrastructures by specifying
traffic parameters to be measured and analysed at the different test sites, jointly evaluating the results
and developing traffic models built upon them. Based on the traffic models, dimensioning rules for
capacity planning of IP networks will be created.
To achieve the goals of TRAMMS, work will be performed in the following main areas:
• Traffic measurements in fixed metro/access and wireless access networks.
• Traffic analysis and models for fixed metro/access and wireless access networks.
ABSTRACT
Traffic measurements made in two municipal networks in Sweden, in one Spanish residential network
and in the RedIRIS network in Spain are presented and analyzed in this deliverable. The usage of
different applications in the four networks is described, as well as the traffic locality, the sources and
the geographical destinations.
TABLE OF CONTENTS
TRAMMS PROJECT
ABSTRACT
TABLE OF CONTENTS
ABBREVIATIONS
1 EXECUTIVE SUMMARY
2 STATE OF THE ART
3 MEASUREMENTS, TOOLS AND NETWORKS
4 NETWORK TRAFFIC LOCALITY
4.1 OUTGOING DIRECTION
4.1.1 Packet count
4.1.2 Byte count
4.1.3 Flow count
4.2 INCOMING DIRECTION
4.2.1 Packet count
4.2.2 Byte count
4.2.3 Flow count
4.3 BOTH DIRECTIONS
4.3.1 Packet count
4.3.2 Byte count
4.3.3 Flow count
4.4 DISCUSSION OF THE RESULTS
5 MAIN TRAFFIC DESCRIPTOR
5.1 DAILY AND WEEKLY PROFILES
5.1.1 Daily profiles Spanish network
5.1.2 Comparison between the daily profiles of the different networks
5.2 UL/DL VOLUME AND PACKET
5.2.1 Spanish network
5.3 SUBSCRIBER CLUSTERS
5.3.1 Separation using the total traffic volume
5.3.2 Separation using cluster analysis
5.3.3 Clustering of users by setting up traffic limits
5.3.4 Analysis of “minimal users”
5.3.5 Separation using cluster analysis (Swedish network)
5.3.6 Separation using the traffic volume of popular applications
5.4 SUBSCRIBER ACTIVITIES
5.5 APPLICATION VOLUME, PACKET AND SESSION SHARE
5.5.1 Comparison of applications usage in different networks and technologies
5.5.2 Applications with high user penetration
5.5.3 Traffic volume distribution
6 APPLICATION CHARACTERISTICS
6.1 WEB VIDEO ON DEMAND
6.1.1 YouTube content popularity analysis
6.2 VIDEO STREAMING
6.3 WEB TRAFFIC ANALYSIS
6.3.1 Top domain analysis
6.4 P2P FILE SHARING
6.5 P2P TELEPHONY AND VOIP
6.5.1 Skype traffic
6.5.2 MSN Messenger (Windows Live Messenger) traffic
7 CONCLUSIONS/DISCUSSION
7.1 DESCRIPTION OF AGGREGATE TRAFFIC
7.2 APPLICATION USAGE
7.3 CLUSTERING OF USERS
8 REFERENCES
ABBREVIATIONS
AC        Autonomous Community
ADSL      Asymmetric Digital Subscriber Line
AL        Aggregation Level
CMTS      Cable Modem Termination System
CoS       Class of Service
DiffServ  Differentiated Services
DL        Downlink
DPI       Deep Packet Inspection
DRDL      Datastream Recognition Definition Language
DRG       Digital Residential Gateway
DSL       Digital Subscriber Line
FBM       Fractional Brownian Motion
FTP       File Transfer Protocol
FTTC      Fiber To The Cabinet
FTTH      Fiber To The Home
Gbps      Gigabits per second
GGSN      Gateway GPRS Support Node
GPRS      General Packet Radio Service
IntServ   Integrated Services
IPTV      Internet Protocol TV
MRTG      Multi Router Traffic Grapher
P2P       Peer-to-peer
PoP       Point of Presence
POTS      Plain Old Telephone Service
QoS       Quality of Service
RSVP      Resource ReSerVation Protocol
SLA       Service Level Agreement
VoIP      Voice over IP
1
EXECUTIVE SUMMARY
Internet usage is evolving from traditional WWW usage (i.e. downloading web pages) to triple-play usage,
where households may have all their communication services (telephony, data, TV) through their
broadband access connection. The challenge is to design IP access networks so that they can deliver
services with strict QoS demands, such as IPTV, while at the same time having capacity for traffic that
users demand but that is unwanted from the operator's perspective, for example file sharing.
One important part of meeting this research challenge is to identify and monitor Internet usage. The
traffic patterns and applications need to be investigated and reported on. The experience from earlier
traffic measurements is that it is no longer sufficient to investigate aggregated traffic at the IP level. In
order to capture user behavior and traffic patterns in IP access networks, the measurements need to
be performed close to the users and must be able to identify specific applications.
Traffic modeling is tightly coupled both to traffic measurements and to engineering and
techno-economics. In most cases a theoretical or practical problem motivates and defines the
modeling, and depending on the problem, the model may take very different shapes. For example,
when studying queueing disciplines, detailed dynamic models are needed for reliable results, while for
network planning and capacity dimensioning, high-level traffic models in combination with estimates
of the traffic evolution are needed. However, independent of the type of model, traffic measurements
are a common denominator that provides input for the model parameters. Without measurements, the
parameters are very likely to be wrong or not detailed enough.
It should be pointed out that we model user data traffic, i.e. traffic going to and from subscriber clients.
This includes application signaling traffic that is normally considered as user data from a network
traffic handling point of view. Thus, there are many traffic types not considered here such as link
control traffic, mobility traffic for cellular networks, operation and maintenance traffic, etc.
The main traffic descriptors aim at describing the traffic profile either of a subscriber/subscriber line or
at some aggregate level of subscribers/subscriber lines, while the application level model describes
the traffic characteristics of individual applications or application types/classes.
In this report, traffic measurements from four different networks were collected and analyzed. Two of
the networks are in Spain, one commercial and one university network, and two are in Sweden:
• The first Swedish municipal network is an open fibre-based network with approximately 2600
FTTH and 200 DSL customers. The FTTH customers represent many social and ethnic
groups, while the DSL customers constitute a more homogeneous group of Swedish middle-class
households living in single-family houses.
• The second Swedish municipal network is an FTTH network with 350 IPTV users. It was
used only to study IPTV user behavior.
• The commercial Spanish network contains both fixed and wireless access networks. The
wireline part consists of fibre to the cabinet (FTTC), with a last mile based on cable (via a
Cable Modem Termination System, CMTS) and ADSL. The wireless access is a combined
GPRS and UMTS system.
• The Spanish university network RedIRIS interconnects more than 300 institutions with 2.7
million users and provides them with Internet access. The network is SDH-based with link
speeds from 2.5 Gbps up to 10 Gbps.
The current document (Deliverable 3.2) contains several sections describing new results and
achievements since the previous deliverable (D3.1) was published. A number of sections were
updated and extended significantly. Nevertheless, D3.2 also summarizes some of the results and
findings presented in D3.1, focusing on the major conclusions.
Chapter 2 (State of the Art) gives a brief overview of the network capacity planning methods used in
today’s networks to assure Quality of Service (QoS).
Chapter 3 describes the measurement tools and techniques used to capture traffic in different
networks and measurement points. It also discusses the steps of the data processing, including
anonymization.
Chapter 4 contains an analysis of network traffic locality based on measurements in the RedIRIS
network. Traffic sent to and received from six universities within the RedIRIS network has been
analyzed. IP addresses were mapped to countries using the free public geolocation database of
MaxMind i, which has an accuracy of 99.5%.
Chapter 5 (Main Traffic Descriptors) focuses on properties of network traffic and users at an
aggregate level. Section 5.1 presents a comparison between the daily profiles of the different networks,
along with an analysis of the average traffic rate per active user in the Spanish CMTS
network. Section 5.2 summarizes the findings (presented in D3.1) on the relationship between the
uplink and downlink traffic volumes per subscriber in the Spanish network. Section 5.3 extends the
cluster analysis work (started in D3.1) with new clustering results that consider users above a certain
traffic limit. Furthermore, it contains a characterization of “minimal users” and a cluster analysis of the
Swedish subscribers. Section 5.4 analyzes the number of active MAC addresses for the total traffic
and for some popular application categories. Finally, Section 5.5 compares the share of important
application groups in different technologies and networks. It also investigates the applications with the
highest user penetration and the share of the most popular application in the traffic volume.
Chapter 6 (Application Characteristics) concentrates on individual applications (or application
categories) instead of the aggregate traffic. Section 6.1 contains an analysis of the traffic of
popular web-based video sharing websites (namely YouTube and Metacafe). It also presents a
content popularity study of YouTube videos as well as user activity and traffic intensity
charts. Section 6.2 investigates the characteristics of video streaming traffic. Section 6.3 studies the
distribution of web traffic among different websites and domains; the study reveals the total traffic of
websites that distribute their traffic among several servers and sub-domains for load balancing.
The findings about the characteristics of P2P file sharing applications are summarized in Section 6.4.
Section 6.5 focuses on VoIP (and instant messaging) applications, e.g. Skype and MSN Messenger.
The section presents findings about the weekly fluctuation of Skype traffic in the Spanish fixed and
mobile networks, user rankings according to generated traffic, and a detailed daily profile of hosts
using Skype in the Swedish network.
Finally, Chapter 7 collects the most important findings and conclusions of the document.
2
STATE OF THE ART
Network capacity planning is the process of determining the amount of resources needed in every link
of a network in order to guarantee certain Quality of Service (QoS) constraints. Usually, the resources
are the bandwidth of the different links, and the QoS constraints are defined in order to satisfy users’
performance requirements. Two different approaches have appeared to deal with this problem. In the
first approach, protocols and architectures that guarantee the QoS constraints are used to reserve
bandwidth for every new stream or to give priority to some streams over others. This alternative
makes efficient use of the resources at the expense of complexity in management and
maintenance. In the second approach, the overprovisioning alternative, links are dimensioned with
more bandwidth than is needed for the aggregate stream, which wastes resources but is easy to
handle.
Two techniques have become popular for guaranteeing QoS without overprovisioning, namely
IntServ ii and DiffServ iii. IntServ (Integrated Services) reserves bandwidth by means of the RSVP
protocol to ensure QoS. A new application can request a reservation of a required amount of bandwidth,
and if there is enough bandwidth along the path between its source and its destination, the
application is guaranteed those resources in the network and the target QoS. DiffServ
(Differentiated Services) uses a field in the IP header (the DS field) to set a QoS level for an IP packet.
Routers use this information to prioritize traffic with a higher QoS level over traffic with a lower QoS
level. The main difference between IntServ and DiffServ is that DiffServ offers a relative level of QoS,
because the QoS of a given Class of Service (CoS) depends on the amount of traffic in the other
classes, while IntServ gives a QoS that does not depend on the remaining traffic.
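The reservation logic described for IntServ can be illustrated as an all-or-nothing admission check: a request is admitted only if every link on its path has enough unreserved bandwidth, and only then is the bandwidth committed on all of them. This is a minimal sketch under stated assumptions; the `Link` and `admit` helpers, the link names and the capacities are hypothetical, and RSVP signalling itself (PATH/RESV messages, soft-state refresh) is not modelled.

```python
from dataclasses import dataclass

@dataclass
class Link:
    """A link with a fixed capacity and the bandwidth already reserved on it (Mbps)."""
    capacity: float
    reserved: float = 0.0

    def free(self) -> float:
        return self.capacity - self.reserved

def admit(links: dict[str, Link], path: list[str], demand: float) -> bool:
    """Admit a flow needing `demand` Mbps on every link of `path`, or reject it.

    Resources are committed only if every hop can accommodate the request,
    so a rejected request leaves all reservations unchanged.
    """
    if any(links[name].free() < demand for name in path):
        return False
    for name in path:
        links[name].reserved += demand
    return True

if __name__ == "__main__":
    net = {"A-B": Link(100.0), "B-C": Link(50.0)}   # hypothetical topology
    print(admit(net, ["A-B", "B-C"], 40.0))  # both links have room
    print(admit(net, ["A-B", "B-C"], 20.0))  # B-C has only 10 Mbps left
    print(net["A-B"].reserved)               # the rejected request reserved nothing
```

A DiffServ router performs no such per-flow check; it only schedules packets according to their class, which is why the QoS it offers is relative.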
Although the former approach seems to meet the requirements of a network provider, since the
resources are managed efficiently to achieve the desired level of QoS and the investment in the
network is thus reduced, in reality this is not the case. The network equipment must be more complex
in order to support these architectures, which increases the equipment investment. This also results
in additional costs for the operation and management of the network: network operators must be trained
to configure and manage the different classes of service, and equipment installers have to be
instructed how to configure the routers properly.
These reasons make the overprovisioning approach more attractive than it might seem at first. The
usual approach to bandwidth overprovisioning follows several stages. In the first stage, the
performance parameters and their target values are determined. These targets are commonly agreed
on in a Service Level Agreement (SLA), which is a contract in which the client and the network operator
formally define the level of service that the network operator is obliged to give to the user. In the
second stage, measurements are made in order to analyze the current capacity and whether the QoS
targets are met. In the third stage, a network model together with the actual measurements is used to
predict or estimate the future bandwidth demand. Next, the model is validated in order to assess the
correctness of the predictions already made. Finally, conclusions are drawn from the model and the
data, and the amount of resources is determined.
Following this approach, what operators commonly do is measure the average used bandwidth over a
link and then apply the following rule of thumb:
$$C = d \cdot M \qquad \text{(Equation 1)}$$
where C is the link capacity that satisfies the requirements, M is the average traffic load and d is
an overprovisioning constant larger than 1. In this naive approach, the measurements can be of
several kinds, but MRTG iv values with a granularity of 5 minutes are sufficient. The constant d is
commonly much greater than 1 because operators want to take into account the fluctuations of the
traffic around the mean value (burstiness). The traffic model in this case is simply a constant load M,
taken for instance as the average load during the busiest hour of a given period. It has been
reported v that present networks are very lightly utilized, with less than 40% utilization even on highly
loaded days, resulting in a capacity of about 30 times the average traffic rate.
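As an illustration of Equation 1, the sketch below takes a day of 5-minute average rates (as MRTG would report them), finds the busiest hour, uses its average as M and multiplies it by an overprovisioning constant d. The sliding one-hour window, the sample values and the choice d = 2 are assumptions made for the example; the text only states that d is larger than 1, and often much larger.

```python
def busiest_hour_average(samples_mbps: list[float], per_hour: int = 12) -> float:
    """Highest mean over any `per_hour` consecutive 5-minute samples (sliding window)."""
    if len(samples_mbps) < per_hour:
        raise ValueError("need at least one hour of samples")
    window = sum(samples_mbps[:per_hour])
    best = window
    for i in range(per_hour, len(samples_mbps)):
        # Slide the window by one sample: add the newest, drop the oldest.
        window += samples_mbps[i] - samples_mbps[i - per_hour]
        best = max(best, window)
    return best / per_hour

def rule_of_thumb_capacity(samples_mbps: list[float], d: float) -> float:
    """Equation 1: C = d * M, with M taken as the busiest-hour average load."""
    if d <= 1:
        raise ValueError("the overprovisioning constant d must exceed 1")
    return d * busiest_hour_average(samples_mbps)

# Hypothetical day (288 samples): 10 Mbps baseline with a 2-hour evening peak at 40 Mbps.
day = [10.0] * 288
day[240:264] = [40.0] * 24
print(busiest_hour_average(day))           # M
print(rule_of_thumb_capacity(day, d=2.0))  # C
```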
More complex approaches to network dimensioning have appeared in the literature, for instance that of
Fraleigh et al. vi. In this work, the QoS target is defined in terms of the delay between PoPs
(Points of Presence). A backbone network consists of a set of nodes (PoPs) that are connected via
high-speed links. The QoS requirement is of the form
$$P[d(i,j) > D_{\text{target}}] < \tau \qquad \text{(Equation 2)}$$
i.e. the probability that the delay d(i,j) between PoP i and PoP j exceeds the target delay D_target is
less than a given threshold τ. Modeling the traffic load with a Gaussian process, specifically a two-scale
Fractional Brownian Motion (FBM) process, they compute the delay distribution for a single queue,
and use this result to compute the end-to-end queueing delay through the network. Assuming that the
characteristics of a traffic demand remain the same throughout the network and that the delays at
each queue are independent (assumptions first validated in Fraleigh’s thesis vii), they compute the
distribution of the end-to-end queueing delay as the convolution of the queueing delay distributions of
the individual queues that connect both ends. The traffic measurements in this paper are
packet-level measurements from the Sprint IP network, containing the arrival time, packet size and the
first 40 bytes of every transmitted packet. These measurements were used to derive and validate the
model described above, and also to compute the capacities of the links connecting PoPs. To
achieve this, they solve a capacity assignment problem that takes into account the traffic demand
between PoPs. The results of this work are that links with high capacity (greater than 1 Gbps) can
reach 80%-90% utilization and still meet the delay requirements, and that with extra capacity
of 5%-15% an end-to-end delay requirement of 4 ms is satisfied in the Sprint IP network.
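The end-to-end step can be illustrated with discrete delay distributions. Under the independence assumption stated above, the end-to-end queueing delay distribution is the convolution of the per-queue distributions, and Equation 2 is checked by comparing its tail beyond the target delay with τ. This is only a sketch: the per-hop probability mass functions, the 1 ms grid and the values of τ and the target delay are hypothetical, and the two-scale FBM model that Fraleigh et al. use to obtain the per-queue distributions is not reproduced.

```python
def convolve(a: list[float], b: list[float]) -> list[float]:
    """Discrete convolution: PMF of the sum of two independent delays on the same grid."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def end_to_end_pmf(per_hop_pmfs: list[list[float]]) -> list[float]:
    """Combine per-queue delay PMFs, assuming the per-queue delays are independent."""
    total = [1.0]  # zero delay with probability 1
    for pmf in per_hop_pmfs:
        total = convolve(total, pmf)
    return total

def violation_probability(pmf: list[float], d_target_bins: int) -> float:
    """P[d > D_target], where index k holds the probability of a delay of k bins."""
    return sum(pmf[d_target_bins + 1:])

# Hypothetical per-hop delay PMFs on a 1 ms grid (index = delay in ms).
hop1 = [0.7, 0.2, 0.1]
hop2 = [0.8, 0.15, 0.05]
e2e = end_to_end_pmf([hop1, hop2])
tau = 0.05
p = violation_probability(e2e, d_target_bins=3)
print(e2e, p, p < tau)
```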
Another interesting work in capacity planning was published in 2006 by Hans van den Berg et al. viii.
The authors define the QoS requirement as
$$P[A(T) \ge CT] \le \gamma \qquad \text{(Equation 3)}$$
i.e. the probability that the amount of traffic A(T) offered in the interval [0, T], for small T, is at least
the maximum amount of traffic that the link can carry over that period (CT) does not exceed a given
threshold γ. The QoS target is thus directly related to the capacity C of the link.
They model the amount of traffic offered in [0, T] as Gaussian,
$$A(T) \sim \mathrm{Norm}(\rho T, \nu(T)) \qquad \text{(Equation 4)}$$
with load ρ in Mbps and variance ν(T) in Mbit². Using this model, they show that the required link
capacity is
$$C = \rho + \alpha\sqrt{\rho} \qquad \text{(Equation 5)}$$
Here, ρ is estimated over intervals of length T, and α is a parameter that depends on the ratio between
peak and mean bandwidth, the time interval T, the target probability γ and the mean service
time (assuming an M/G/∞ queue). Surprisingly, they show that α does not depend
on the arrival rate, so an increase in the number of users in the network affects the capacity
only through the increase of the average load ρ. To calculate ρ they use MRTG
measurements with a granularity of 5 minutes. They estimate α by fitting Equation 5 so that it holds
for 99% of the data, but they point out that α can also be computed theoretically if flow-level
measurements are available.
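One way to obtain a capacity satisfying Equation 3 under the Gaussian model of Equation 4 is the Chernoff bound for a Gaussian tail, P[A(T) ≥ CT] ≤ exp(−(CT − ρT)²/(2ν(T))), which gives C = ρ + √(−2 ln γ · ν(T))/T. If ν(T) grows linearly with ρ, say ν(T) = βρ with β independent of the arrival rate (our assumption for an M/G/∞-type input), this takes the C = ρ + α√ρ form of Equation 5 with α = √(−2 ln γ · β)/T. The sketch below implements both forms and an empirical estimate of α from per-interval mean and peak rates. The Chernoff step, the parameter β and the quantile-based fitting are assumptions for illustration, not necessarily the exact procedure of van den Berg et al.

```python
import math

def capacity_gaussian(rho: float, var_T: float, T: float, gamma: float) -> float:
    """Capacity C at which the Chernoff bound on P[A(T) >= C*T] equals gamma.

    Assumes A(T) ~ Norm(rho*T, var_T): rho in Mbps, var_T in Mbit^2, T in seconds.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    return rho + math.sqrt(-2.0 * math.log(gamma) * var_T) / T

def alpha_linear_variance(beta: float, T: float, gamma: float) -> float:
    """alpha of Equation 5 if var_T = beta * rho (assumed linear variance scaling)."""
    return math.sqrt(-2.0 * math.log(gamma) * beta) / T

def capacity_eq5(rho: float, alpha: float) -> float:
    """Equation 5: C = rho + alpha * sqrt(rho)."""
    return rho + alpha * math.sqrt(rho)

def fit_alpha(mean_rates, peak_rates, coverage: float = 0.99) -> float:
    """Smallest alpha (empirical quantile) for which Equation 5 covers `coverage` of intervals.

    mean_rates[k]: average rate of interval k (e.g. a 5-minute MRTG value, must be > 0).
    peak_rates[k]: highest rate observed in interval k at the short timescale T.
    """
    needed = sorted((p - m) / math.sqrt(m) for m, p in zip(mean_rates, peak_rates))
    if not needed:
        raise ValueError("no intervals given")
    idx = max(math.ceil(coverage * len(needed)) - 1, 0)
    return needed[idx]

if __name__ == "__main__":
    g = math.exp(-2.0)                       # hypothetical target probability gamma
    print(capacity_gaussian(100.0, 25.0, 1.0, g))
    a = alpha_linear_variance(0.25, 1.0, g)  # hypothetical beta = 0.25
    print(capacity_eq5(100.0, a))            # same capacity, since var_T = 0.25 * 100
```

Note how, once α is fixed, the relative overprovisioning α/√ρ shrinks as the load ρ grows, which is consistent with the observation that more users affect the capacity only through ρ.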
3
MEASUREMENTS, TOOLS AND NETWORKS
The analysis in this report is based on measurements performed in access networks in Sweden and
Spain. The residential user measurements include the access technologies DSL and FTTH (Swedish
municipal network), and CMTS and mobile (Spanish operator network). Measurements were also
performed in the Spanish National Academic and Research Network (NARN), RedIRIS. Details on the
networks and subscribers can be found in the TRAMMS Deliverable D3.1 ix.
The tools used in the measurements are the following (references x, xi, xii and xiii):
• PacketLogic
• Cisco NetFlow
• Wireshark
• Traffic databases
PacketLogic and Cisco NetFlow are described in D3.1 ix.
Wireshark is a passive software tool that can be used for real-time monitoring or for offline
analysis of captured files. This report places more emphasis on packet-level measurements than
D3.1 did, and therefore introduces one new measurement technique: packet capture via firewall
rules.
In this technique, a firewall rule is created that matches certain criteria, e.g. the URL visited or the
application used. Packets that match the criteria are dumped to pcap files xiv. These files are
anonymized and then post-processed with Wireshark, Python or similar tools to extract statistics
from the files.
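As an example of the Python post-processing step, the sketch below reads a capture with only the standard library and counts packets and bytes per IPv4 destination address. It assumes the classic libpcap file format (not pcapng), Ethernet link type and untagged frames; it does not perform the anonymization mentioned above, and the function names are hypothetical.

```python
import ipaddress
import struct
from collections import Counter

LINKTYPE_ETHERNET = 1
ETHERTYPE_IPV4 = 0x0800

def count_pcap(data: bytes):
    """Return (packets, bytes) Counters keyed by IPv4 destination address.

    Handles classic pcap (microsecond or nanosecond magic, either byte order)
    with Ethernet framing; non-IPv4 frames (ARP, IPv6, VLAN-tagged) are skipped.
    """
    if len(data) < 24:
        raise ValueError("truncated pcap global header")
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    packets, volume = Counter(), Counter()
    off = 24
    while off + 16 <= len(data):
        # Record header: timestamp seconds, timestamp fraction, captured length, original length.
        _ts_sec, _ts_frac, incl_len, orig_len = struct.unpack_from(endian + "IIII", data, off)
        frame = data[off + 16 : off + 16 + incl_len]
        off += 16 + incl_len
        if len(frame) < 34:  # 14-byte Ethernet header + 20-byte minimal IPv4 header
            continue
        if struct.unpack_from("!H", frame, 12)[0] != ETHERTYPE_IPV4:
            continue
        dst = ipaddress.IPv4Address(frame[30:34])  # IPv4 destination is at IP offset 16
        packets[dst] += 1
        volume[dst] += orig_len  # on-the-wire length, even if the capture was truncated
    return packets, volume

def count_pcap_file(path: str):
    """Convenience wrapper: read a whole capture file into memory and count it."""
    with open(path, "rb") as f:
        return count_pcap(f.read())
```

Reading the whole file into memory keeps the parser simple; for very large captures, a streaming reader that processes one record at a time would be preferable.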
The traffic databases mentioned in the list above are a way of retrieving the traffic data from the
PacketLogic tool and storing it in a database, which makes analysis easy and fast. The information in
the databases is traffic data per household, per IP address and per application.
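A minimal sketch of such a traffic database, using SQLite from the Python standard library, stores one row per household, IP address, application and hour, so that per-application or per-household aggregation becomes a single query. The table layout, column names and sample rows are hypothetical; the actual schema used with PacketLogic is not described here.

```python
import sqlite3

def build_db(rows):
    """Create an in-memory traffic table and load
    (household, ip, application, hour, bytes_in, bytes_out) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE traffic (
               household TEXT, ip TEXT, application TEXT,
               hour INTEGER, bytes_in INTEGER, bytes_out INTEGER)"""
    )
    con.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?, ?, ?)", rows)
    return con

def volume_per_application(con):
    """Total bytes (both directions) per application, largest first."""
    return con.execute(
        """SELECT application, SUM(bytes_in + bytes_out) AS total
           FROM traffic GROUP BY application ORDER BY total DESC"""
    ).fetchall()

def volume_per_household(con):
    """Total bytes (both directions) per household, largest first."""
    return con.execute(
        """SELECT household, SUM(bytes_in + bytes_out) AS total
           FROM traffic GROUP BY household ORDER BY total DESC"""
    ).fetchall()

# Hypothetical sample rows.
rows = [
    ("hh1", "10.0.0.1", "HTTP", 20, 5_000_000, 200_000),
    ("hh1", "10.0.0.1", "BitTorrent", 21, 40_000_000, 30_000_000),
    ("hh2", "10.0.0.2", "HTTP", 20, 8_000_000, 300_000),
]
con = build_db(rows)
print(volume_per_application(con))
print(volume_per_household(con))
```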
4
NETWORK TRAFFIC LOCALITY
The analysis of network traffic locality has been performed with measurements in the RedIRIS
network. Traffic sent to and received from six universities within the RedIRIS network has been
analyzed. IP addresses were mapped to countries using MaxMind xv, the free public database for
geographical localization of IP addresses, which has an accuracy of 99.5%.
To clarify the description, the results have been split into three cases: one for the incoming
direction of traffic, one for the outgoing direction and one for both directions. Incoming traffic is
traffic whose destination IP address belongs to one of the universities under study. Outgoing traffic,
on the other hand, is generated in the universities, so its source IP address belongs to one of them.
The results for both directions are the aggregate of all the traffic analyzed in the two former cases.
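The direction rules above can be written as a small classifier over flow records. In this sketch, the university prefixes are hypothetical documentation ranges (RFC 5737), and labelling a flow between two studied universities as "internal" is our assumption, since the text does not say how such flows are counted.

```python
import ipaddress

# Hypothetical prefixes standing in for the studied universities (RFC 5737 test ranges).
UNIVERSITY_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def in_universities(addr: str) -> bool:
    """True if `addr` falls inside one of the (hypothetical) university prefixes."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in UNIVERSITY_PREFIXES)

def direction(src: str, dst: str):
    """Classify a flow as 'outgoing' (source in a university), 'incoming'
    (destination in a university), 'internal' (both; an assumption) or None."""
    s, d = in_universities(src), in_universities(dst)
    if s and d:
        return "internal"
    if s:
        return "outgoing"
    if d:
        return "incoming"
    return None

print(direction("192.0.2.10", "203.0.113.5"))   # source is a university address
print(direction("203.0.113.5", "198.51.100.7")) # destination is a university address
```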
In all cases, three different measurements have been performed, namely packet count, byte count
and flow count. This is done to circumvent (and, when possible, illustrate) the traffic behavior
commonly referred to as “the elephants and mice phenomenon”, that is, that a very small proportion of
the flows carries most of the bytes.
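The elephants-and-mice effect is commonly quantified by the share of bytes carried by the largest few percent of flows. A minimal sketch with made-up flow sizes; the 1% threshold is an arbitrary example, not a value taken from the measurements.

```python
import math

def top_flow_byte_share(flow_bytes: list[int], top_fraction: float) -> float:
    """Share of the total bytes carried by the largest `top_fraction` of flows."""
    if not flow_bytes or not 0 < top_fraction <= 1:
        raise ValueError("need at least one flow and 0 < top_fraction <= 1")
    ranked = sorted(flow_bytes, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))  # at least one flow
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical sample: 99 small flows of 1 kB ("mice") and one 10 MB transfer ("elephant").
flows = [1_000] * 99 + [10_000_000]
print(f"{top_flow_byte_share(flows, 0.01):.1%} of bytes in the top 1% of flows")
```

Because of this skew, a ranking of destinations by bytes can differ noticeably from a ranking by flows, which is why all three counts are reported.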
The chapter is structured as follows. In the following three sections we present the results of the
aforementioned cases and in the last section we discuss the obtained results.
4.1
Outgoing direction
Here we present the results obtained from the Network Traffic Locality analysis in the outgoing
direction for the three measurements described above.
4.1.1
Packet count
In Figure 4-1 we show the percentage of packets per destination country sent from the universities. The
countries in the figure are the thirteen that receive the largest number of packets (each accounting for at
least 1% of the total number of packets), together with the not classified packets (i.e. the packets whose IP
addresses did not match in the database) and the percentage of packets that were sent to other
countries not shown in the graph (more than 12% of the traffic). As can be seen, most of the traffic is
sent to locations within Spain (nearly 45% of the total number of packets), and the second most visited
country is the United States, accounting for nearly 20% of the packet count. After them there is a large
drop in percentage, with Germany being the third most visited country with less than 4% of the
packets. It is also worth mentioning that the percentage of not classified packets was nearly negligible
(less than 0.1%).
Figure 4-1: Percentage of packets per destination for the traffic from RedIRIS.
As Spain and the United States jointly account for more than 65% of the analyzed packets, we show in
Figure 4-2 the percentage of traffic that is sent to the Top 15 countries after removing Spain and the
United States (that is, the percentage of traffic sent to the 15 most visited countries without taking
Spain and the United States into account), in order to remove their masking effect.
Finally, we have placed this information on a color-coded map. Spain and the United States have been
removed from this map so that the color scale remains meaningful. We have also excluded the
countries that account for less than 0.001% of the packets. The results are shown in Figure 4-3.
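The per-country percentages behind Figures 4-1 to 4-3 follow from simple aggregation. The sketch below computes shares from per-country counts, pools countries below 1% into "Other" while keeping the unclassified share separate, builds a Top 15 list excluding Spain and the United States, and drops countries under 0.001% for the map. The counts are invented, unclassified addresses are represented by the key None, and renormalizing the Top 15 percentages over the remaining traffic is an assumption about how Figure 4-2 was computed.

```python
from collections import Counter

def shares(counts: dict) -> dict:
    """Percentage of the total for each key (country code, or None if unclassified)."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}

def pie_view(counts: dict, min_pct: float = 1.0) -> dict:
    """Keep countries with at least `min_pct` percent and the unclassified share;
    pool every other country into 'Other'."""
    result, other = {}, 0.0
    for k, pct in shares(counts).items():
        if k is None or pct >= min_pct:
            result[k] = pct
        else:
            other += pct
    result["Other"] = other
    return result

def top_excluding(counts: dict, exclude=("ES", "US"), n: int = 15) -> list:
    """Top-n countries once `exclude` and unclassified traffic are removed,
    with percentages renormalized over the remaining traffic (an assumption)."""
    rest = {k: v for k, v in counts.items() if k is not None and k not in exclude}
    return Counter(shares(rest)).most_common(n)

def map_view(counts: dict, exclude=("ES", "US"), min_pct: float = 1e-3) -> dict:
    """Shares of the full total for the map: drop `exclude`, unclassified
    traffic and countries below `min_pct` percent."""
    return {k: pct for k, pct in shares(counts).items()
            if k is not None and k not in exclude and pct >= min_pct}

# Invented packet counts per destination country.
counts = {"ES": 450, "US": 200, "DE": 35, "FR": 30, "GB": 20, "IT": 4, None: 1}
print(pie_view(counts))
print(top_excluding(counts))
```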
4.1.2 Byte count
In Figure 4-4 we present results analogous to those shown in Figure 4-1. As for the packet count, most of the bytes are sent to Spain and the United States (again more than 65% of the total number of bytes), but in this case the percentage of the United States is halved. Again, thirteen countries each receive at least 1% of the bytes, but they are not the same countries as those for the packet count.
Figure 4-2: Percentage of packets of the 15 most visited countries excluding Spain & USA.
Figure 4-3: Map plot of Outgoing Packet Locality.
We again remove Spain and the United States and show the results in Figure 4-5. We find approximately the same countries as for the packet count, and the percentages of bytes of the remaining countries are also similar.
Figure 4-4: Percentage of bytes per destination of the traffic from RedIRIS.
Figure 4-5: Percentage of bytes for the 15 most visited countries excluding Spain & USA.
Finally, in Figure 4-6 we show the geographic plot of the countries that account for more than 10⁻³ % of the bytes sent from the universities under study.
Figure 4-6: Map plot of Outgoing Byte Locality.
Figure 4-7: Percentage of flows per destination for the traffic from RedIRIS.
4.1.3 Flow count
In Figure 4-7 we show the results of the traffic locality analysis by flows. As in the previous measurements, Spain and the United States account for the majority of the flows, but in this case their joint percentage does not reach the 65% seen before. There is also another difference from the previous results: in this case, fourteen countries account for at least 1% of the total number of flows.
Figure 4-8 shows the Top 15 countries by number of flows. As in the former cases, neither the countries nor their percentages of flows are exactly the same.
Figure 4-8: Percentage of flows of the 15 most visited countries after removing Spain & USA.
Finally, as in the packet and byte cases, Figure 4-9 shows on a world map the countries that account for more than 10⁻³ % of the flows.
4.2 Incoming direction
We now turn to the results for the incoming traffic, i.e. the flows whose destination was one of the universities under study.
4.2.1 Packet count
Here we present results equivalent to those of Section 4.1.1 for the incoming direction. Figure 4-10 shows the countries that contribute more than 1% of the total number of packets. Compared with Figure 4-1, fewer countries contribute more than 1% of the packet count in the incoming direction than in the outgoing direction. However, the set of countries contributing more than 1% of the packets in the incoming direction is contained in the equivalent set for the outgoing direction.
In Figure 4-11 we present the 15 most contributing countries excluding Spain and the United States, because, as in the previous cases, these two account for more than 60% of the total number of packets. The sets of countries differ between the two directions, but only among the lower-ranked countries of the Top 15.
Finally, we show this information in a world map in Figure 4-12.
Figure 4-9: Map plot of Outgoing Flow Locality.
Figure 4-10: Percentage of packets per destination of the traffic to RedIRIS.
Figure 4-11: Percentage of packets of the 15 most contributing countries excluding Spain & USA.
Figure 4-12: Map plot of Incoming Packet Locality.
4.2.2 Byte count
Figure 4-13 shows the percentage of bytes sent from the most contributing countries (those that sent at least 1% of the bytes). First, it is worth mentioning that in this case the most contributing country is not Spain but the United States, although each has nearly 30% of the total bytes. Another difference compared with Figure 4-4 is that fewer countries contribute more than 1% of the total.
Figure 4-13: Percentage of bytes per destination of the traffic to RedIRIS.
Figure 4-14 is the incoming counterpart of Figure 4-5. There are noticeable differences between them: in the incoming direction, the most contributing countries contribute more than they did in Figure 4-5, while the least contributing countries of the Top 15 contribute hardly 1% of the remaining traffic (excluding Spain and the United States).
Finally, as in the previous cases, we present a map of the locations of the traffic sources in Figure 4-15.
4.2.3 Flow count
In Figure 4-16 we present the countries that contribute at least 1% of the total number of flows. Comparing it with Figure 4-7, the percentages accounted for by Spain and the United States are nearly the same, and so is the percentage of the remaining countries. The only slight difference is that Brazil is one of the most contributing countries in the incoming direction, which it was not in the outgoing direction. The other countries are the same in both cases, with small changes in their ranking.
Figure 4-17 shows the 15 most contributing countries, again after removing Spain and the United States. The percentages of the countries are nearly the same as in Figure 4-8; the only difference is that in the incoming direction Brazil has replaced Switzerland.
Finally, Figure 4-18 shows a map of the geographic location of the flows on a color-intensity scale.
Figure 4-14: Percentage of bytes of the 15 most contributing countries excluding Spain & USA.
Figure 4-15: Map plot of Incoming Byte Locality.
Figure 4-16: Percentage of flows per destination of the traffic to RedIRIS.
Figure 4-17: Percentage of flows of the 15 most contributing countries excluding Spain & USA.
Figure 4-18: Map plot of Incoming Flow Locality.
4.3 Both directions
In this section we present the results of the joint analysis of traffic in both the incoming and the
outgoing directions. The discussion of the results is deferred to the following section.
4.3.1 Packet count
Here we present the geographic locality results for the packet count. Figure 4-19 shows the percentages of the countries that contribute more than 1% in both directions. Figure 4-20 shows the 15 most contributing countries after removing Spain and the United States, whose very large percentages would otherwise dominate the figure. Finally, Figure 4-21 displays, on a world map with color intensities, the countries that account for more than 10⁻³ % of the total number of packets.
4.3.2 Byte count
For the byte count in both directions, Figure 4-22 shows the countries that contribute at least 1% of the bytes, Figure 4-23 presents the fifteen most contributing countries after removing Spain and the United States, and Figure 4-24 presents this information on a world map.
4.3.3 Flow count
The last study we present in this chapter is the locality analysis of the incoming and outgoing traffic combined, measured by the number of flows. Figure 4-25 presents the countries that account for more than 1% of the total number of flows, Figure 4-26 shows the fifteen most contributing countries without Spain and the United States, as these two together account for more than half of the flows, and Figure 4-27 presents this information on a world map.
Figure 4-19: Percentage of packets per destination in both directions.
Figure 4-20: Percentage of packets of the 15 most contributing countries after removing Spain & USA.
Figure 4-21: Map plot of both directions packet Locality.
Figure 4-22: Percentage of bytes per destination in both directions.
Figure 4-23: Percentage of bytes of the 15 most contributing countries excluding Spain & USA.
Figure 4-24: Map plot of both directions byte Locality.
Figure 4-25: Percentage of flows per destination in both directions.
Figure 4-26: Percentage of flows of the 15 most contributing countries excluding Spain & USA.
Figure 4-27: Map plot of both directions flow Locality.
4.4 Discussion of the results
In this section we analyze the results presented in the previous sections in more depth and provide some insight into what they show. First of all, it is worth recalling the results of D3.1 for the daily and weekly profiles of the RedIRIS network. That analysis showed that in the RedIRIS network the outgoing traffic is greater than the incoming traffic during night hours, but smaller during working hours. It also showed that the average incoming traffic was slightly greater than the average outgoing traffic. We refer the reader to D3.1 for further details.
The largest share of the packets, approximately 40%, is sent to and received from Spain. This is reasonable, because all the universities connected to the network are located in Spain, and it is logical that much of their Internet traffic is exchanged with Spanish sites. In the RedIRIS network, P2P traffic is negligible; instead, the largest traffic volumes come from email and web services. Email is mostly exchanged between Spanish users, and web traffic is generated by Spanish users who mostly visit Spanish sites. Studies of human social interaction support these results xvi.
In second place we find the United States, which is responsible for about 20% of the total packet volume. As the United States accounts for most research and development and is the world leader in the information society, it is understandable that users looking for first-hand information turn to sites in the United States. Moreover, the majority of the most visited web pages are hosted in the United States, which supports these results.
In third place comes the group of the most important countries of the European Community, for example France, Italy, the United Kingdom and Germany. From a research perspective, the most common kind of project apart from national projects is the European project, such as this one. Such projects require RedIRIS users to communicate with these countries and keep up to date with news from them, which explains the high share of traffic to and from them: nearly 20% of the total traffic.
In fourth place come the Latin American countries, for instance Mexico, Chile and Argentina. As Spanish is the official language in these countries, searches for information in Spanish often lead to web pages hosted there. Moreover, many researchers in Spain come from Latin American countries, which also helps to explain the share of traffic from this region, nearly 15% of the total traffic volume.
Finally, we find that nearly all countries are present in the study. Although their individual shares of the traffic are very small (less than 10⁻³ % of the traffic), together they account for nearly 5% of the traffic. This traffic can be described as sporadic.
In the section on incoming traffic, we found that the United States, and not Spain as expected, was the largest contributor to the byte count (see Figure 4-13). This may seem anomalous compared with the flow and packet counts for the same direction, where Spain has a greater share of the traffic than the United States. However, it is not an anomaly but an example of a well-studied phenomenon, the aforementioned "elephants and mice" phenomenon. This phenomenon xvii is very common in real networks, where a small fraction of the flows carries most of the traffic. In our case we find "elephants" in the traffic from the United States: there are flows with a comparable number of packets whose packets carry very large payloads. So although there are more flows and packets from Spain, those from the United States carry a greater percentage of the total number of bytes.
The impact of these "elephants" is smaller when both the incoming and the outgoing directions are taken into account: for all the studied metrics, Spain then accounts for the largest percentage.
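The elephants-and-mice argument can be made concrete by measuring how much of the byte volume the largest flows carry. The sketch below is a minimal Python illustration assuming a plain list of per-flow byte counts; the function name and the 10% cut-off are our own choices, not values from the study.

```python
import math

def top_flow_byte_share(flow_bytes, fraction):
    """Share of all bytes carried by the largest `fraction` of flows (0 < fraction <= 1)."""
    sizes = sorted(flow_bytes, reverse=True)
    total = sum(sizes)
    if total == 0:
        return 0.0
    k = max(1, math.ceil(fraction * len(sizes)))  # number of "elephant" flows
    return sum(sizes[:k]) / total

# Illustrative only: one large transfer and nine small flows.
flows = [5_000_000] + [10_000] * 9
print(top_flow_byte_share(flows, 0.1))
```

Computed per country, a high value for a small fraction of flows would be consistent with a country leading the byte count while trailing in flows and packets.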
5 MAIN TRAFFIC DESCRIPTOR
5.1 Daily and weekly profiles
5.1.1 Daily profiles Spanish network
In this section we investigate the daily profile of the average traffic per active user in the Spanish fixed network (CMTS). The measurements used in this analysis were carried out from 2008-03-07 to 2008-03-30.
The CMTS daily profile in the Spanish fixed network is shown in Figure 5-1.
[Figure: traffic rate (Mbps) versus hours of the day; curves IN, OUT and TOTAL]
Figure 5-1. CMTS daily profile (Spanish Network measurement from 2008-03-07 to 2008-03-30)
The number of active users for each hour of the day was calculated. A subscriber was considered to
be active if it generated any uplink or downlink traffic within the hour. This was done for all the days of
the measurement period and then averaged. The results are plotted in Figure 5-2:
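The active-user count described above can be sketched as follows. The record layout (day, hour, subscriber, uplink bytes, downlink bytes) is a hypothetical representation of the measurement data, and averaging over the days that appear in the records is our assumption about how the measurement days were counted.

```python
from collections import defaultdict

def average_active_users_per_hour(records):
    """Average number of active subscribers for each hour of the day.

    records: iterable of (day, hour, subscriber_id, bytes_up, bytes_down) tuples
             (hypothetical format). A subscriber is active in a (day, hour) if it
             generated any uplink or downlink traffic in that hour.
    Returns {hour: average number of active users over all days}.
    """
    active = defaultdict(set)  # (day, hour) -> set of active subscribers
    days = set()
    for day, hour, subscriber, bytes_up, bytes_down in records:
        days.add(day)
        if bytes_up > 0 or bytes_down > 0:
            active[(day, hour)].add(subscriber)
    n_days = len(days)
    result = {}
    for hour in range(24):
        total = sum(len(active.get((d, hour), ())) for d in days)
        result[hour] = total / n_days if n_days else 0.0
    return result

# Illustrative records for two days and two subscribers.
records = [
    ("d1", 0, "a", 10, 0),
    ("d1", 0, "b", 0, 0),   # no traffic: not active
    ("d2", 0, "a", 0, 5),
    ("d2", 0, "b", 1, 1),
]
print(average_active_users_per_hour(records)[0])
```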
[Figure: number of active users versus hours of the day]
Figure 5-2. Number of active users per hour of the day (Spanish Network CMTS measurement from 2008-03-07 to 2008-03-30)
Using the data from Figure 5-1 and Figure 5-2, we calculated the average traffic rate per active user for each hour of the day. The results are shown in Figure 5-3.
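The calculation is an hour-by-hour division of the aggregate rate by the number of active users. A minimal sketch, assuming both inputs are dictionaries keyed by hour of the day (a hypothetical representation of the data behind Figure 5-1 and Figure 5-2); the numbers in the example are illustrative, not measured values.

```python
def rate_per_active_user(rate_bps, active_users):
    """Average traffic rate per active user for each hour (bps).

    rate_bps:     {hour: aggregate traffic rate in bps} (Figure 5-1 values converted from Mbps)
    active_users: {hour: average number of active users} (Figure 5-2 values)
    Hours without active users are skipped to avoid division by zero.
    """
    return {h: rate_bps[h] / active_users[h]
            for h in rate_bps if active_users.get(h, 0) > 0}

# Illustrative numbers only: 400 Mbps shared by 2000 active users.
print(rate_per_active_user({20: 400e6}, {20: 2000}))
```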
[Figure: average traffic rate per active user (bps, axis scale ×10⁵) versus hours of the day; curves IN, OUT and TOTAL]
Figure 5-3. Daily average traffic rate per active user (Spanish Network CMTS measurement from 2008-03-07 to 2008-03-30)
The average traffic rate per active user remains stable during most of the day. However, during the night (from 1 a.m. to 7 a.m.) it increases significantly (by around 60%). This indicates that heavy users make up a much higher percentage of the active users during that time.
5.1.2 Comparison between the daily profiles of the different networks
In this analysis the following measurements were considered (only fixed networks):
− DSL in Swedish municipal network No. 1, measurements from 2007-12-10 to 2007-01-30.
− FTTH in Swedish municipal network No. 1, measurements from 2007-10-01 to 2007-11-06.
− RedIRIS university, measurements from 2008-06-11 to 2008-07-22.
− CMTS in Spanish Network, measurements from 2008-03-07 to 2008-03-30.
The daily traffic pattern of each network was normalized by dividing it by its maximum value. The average of the normalized patterns over the networks was then calculated and normalized in the same way, by dividing it by its maximum value. This was done separately for the downlink, uplink and total traffic. The results are shown in Figure 5-4, Figure 5-5 and Figure 5-6.
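The two-stage normalization can be written in a few lines. This sketch assumes each network's daily pattern is a list of equally spaced values (for example one per hour): every profile is first scaled to a peak of 1, the scaled profiles are averaged point by point, and the average is scaled to a peak of 1 again. It would be applied separately to downlink, uplink and total traffic; the toy profiles below are illustrative.

```python
def normalize_by_max(profile):
    """Divide a daily profile by its maximum so that the peak becomes 1."""
    peak = max(profile)
    return [v / peak for v in profile]

def average_normalized_profile(profiles):
    """Normalize each network's profile, average them point by point, then normalize again."""
    normalized = [normalize_by_max(p) for p in profiles]
    n = len(normalized)
    average = [sum(values) / n for values in zip(*normalized)]
    return normalize_by_max(average)

# Two toy three-point profiles from two hypothetical networks.
print(average_normalized_profile([[1, 2, 4], [2, 2, 1]]))
```

Normalizing each network first keeps a large network from dominating the average, so the result compares the shapes of the patterns rather than their absolute volumes.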
[Figure: normalized downlink traffic versus hours of the day; curves RedIRIS, CMTS, FTTH, DSL and Average]
Figure 5-4. Normalized daily downlink traffic pattern for the different networks and normalized average daily downlink traffic pattern
[Figure: normalized uplink traffic versus hours of the day; curves RedIRIS, CMTS, FTTH, DSL and Average]
Figure 5-5. Normalized daily uplink traffic pattern for the different networks and normalized average daily uplink traffic pattern
[Figure: normalized total traffic versus hours of the day; curves RedIRIS, CMTS, FTTH, DSL and Average]
Figure 5-6. Normalized daily total traffic pattern for the different networks and normalized average daily total traffic pattern
These figures show that the Spanish fixed network and the Swedish network, despite using different access technologies (CMTS, DSL, FTTH), have similar daily traffic patterns for both uplink and downlink traffic. However, the RedIRIS academic network shows a very different daily pattern for downlink traffic: its downlink traffic varies much more over the day than in the other networks (dropping to 10% of its maximum at 5 a.m.), and from 1 p.m. to 9 p.m. it decreases while it increases in the other networks.
The main conclusion is that the shape of the daily traffic pattern depends on the type of subscribers in the network (residential, enterprise, academic), and that networks with mainly residential users share a common daily traffic pattern.
Taking into account the results shown in Deliverable D3.1, it can be deduced that the amounts of downlink and uplink traffic depend on the access technology (CMTS, DSL or FTTH).
5.2 UL/DL volume and packet share
This section describes the UL/DL traffic volume and packet share for each measurement set.
5.2.1 Spanish network
In Deliverable D3.1, a sample of 6,000 subscribers was studied to obtain the relationship between the downlink and the total traffic volume per subscriber (Spanish Network measurement from 2008-03-07 to 2008-03-30). This analysis was done for the fixed access network, and it was found that the common profile is to have much more downlink than uplink traffic:
1. In 25.95% of the households, the downlink traffic volume was similar to the uplink traffic volume.
2. In 58.99% of the households, the downlink traffic volume was greater than the uplink traffic volume. Within this group, the households where the downlink traffic volume was much greater than the uplink traffic volume accounted for 34.31% of the total.
3. In 15.05% of the households, the uplink traffic volume was greater than the downlink traffic volume. Within this group, the households where the uplink traffic volume was much greater than the downlink traffic volume accounted for 2.92% of the total.
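The three household categories can be reproduced from per-household volumes with a ratio test. The thresholds are not given in this section, so the factors below (within a factor of 2 counts as "similar", more than a factor of 10 as "much greater") are hypothetical placeholders, not the values used in Deliverable D3.1.

```python
from collections import Counter

def classify_household(down_bytes, up_bytes, similar=2.0, much=10.0):
    """Classify a household by its downlink/uplink volume ratio.

    `similar` and `much` are hypothetical thresholds chosen for illustration.
    The "downlink greater" group of the text corresponds to the categories
    "downlink greater" plus "downlink much greater" here (same for uplink).
    """
    if down_bytes == 0 and up_bytes == 0:
        return "no traffic"
    if up_bytes == 0:
        return "downlink much greater"
    ratio = down_bytes / up_bytes
    if 1 / similar <= ratio <= similar:
        return "similar"
    if ratio > much:
        return "downlink much greater"
    if ratio > similar:
        return "downlink greater"
    if ratio < 1 / much:
        return "uplink much greater"
    return "uplink greater"

def category_shares(households, **thresholds):
    """Percentage of households per category; households = [(down_bytes, up_bytes), ...]."""
    counts = Counter(classify_household(d, u, **thresholds) for d, u in households)
    n = sum(counts.values())
    return {category: 100.0 * c / n for category, c in counts.items()}

print(category_shares([(100, 100), (300, 100), (1000, 10), (10, 1000)]))
```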
However, as seen in Deliverable D3.1, Section 5.1.2 "Daily profiles Spanish network", in the early hours of the day the uplink traffic volume is greater than the downlink traffic volume, probably due to P2P application usage. Therefore, there may be a relationship between the type of subscriber (heavy user, applications used) and the downlink/uplink ratio of the subscriber, which could be explored in a further subscriber cluster analysis.
5.3 Subscriber clusters
This section summarizes the cluster analysis results and conclusions presented in Deliverable D3.1 and adds new results from an analysis assuming traffic limits and an analysis of minimal users.
The section analyses different Spanish network measurements in order to identify population groups that behave similarly from different points of view. Section 5.3.1 considers the total traffic volume per MAC address in the uplink and downlink directions. Section 5.3.2 applies cluster analysis to the total downlink traffic volume per MAC address and the number of observed applications per MAC address. Section 5.3.3 performs cluster analysis assuming traffic limits. Section 5.3.4 analyses "minimal users". Finally, Section 5.3.6 is based on the traffic volume of several popular applications. Sections 5.3.1, 5.3.2 and 5.3.6 summarize the conclusions of Deliverable D3.1, while Sections 5.3.3 and 5.3.4 present new results.
A few general conclusions can be drawn from the different kinds of analyses detailed below. First of all, there is a small group of subscribers who generate a huge amount of traffic, mainly P2P, while the traffic demands of the other subscribers are much more moderate and web browsing makes up a more significant part of their traffic mix, although they use P2P applications as well.
Another general conclusion is that the traffic of the heavy users seems to be constant over the
repeated measurements, while the moderate subscribers seem to gradually increase their activity.
5.3.1 Separation using the total traffic volume
The exploratory data analysis for identifying user groups was performed on the Spanish Network measurements detailed below. The main statistic analysed is defined as follows:
Average incoming/outgoing daily traffic volume per MAC address
This value is calculated as the average of the incoming/outgoing daily traffic volumes generated by a
MAC address during the measurement.
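The statistic can be computed from per-day volumes. A minimal sketch, assuming hypothetical records of (MAC address, day, incoming bytes, outgoing bytes), and assuming that days without traffic count as zero, i.e. totals are divided by the number of days in the measurement period; the section does not state which convention was used.

```python
from collections import defaultdict

def average_daily_volume(records, n_days):
    """Average incoming/outgoing daily traffic volume per MAC address.

    records: iterable of (mac, day, bytes_in, bytes_out), one row per MAC and day
    n_days:  number of days in the measurement period (days without a row count as zero)
    Returns {mac: (avg_in, avg_out)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for mac, _day, bytes_in, bytes_out in records:
        totals[mac][0] += bytes_in
        totals[mac][1] += bytes_out
    return {mac: (t[0] / n_days, t[1] / n_days) for mac, t in totals.items()}

# Illustrative rows for two MAC addresses over a two-day period.
rows = [("aa", 1, 100, 10), ("aa", 2, 300, 30), ("bb", 1, 50, 0)]
print(average_daily_volume(rows, n_days=2))
```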
Table 5-1 shows an example of the basic statistics for the aggregated traffic volume of the different
MAC addresses. The population here consists of 5801 different MAC addresses.
Comparing the results of the different measurements, we observed a slight increase in the maximum traffic, while the mean traffic remained more or less the same in all measurements.
Variable | N | Minimum | Maximum | Mean | Std. Deviation
Average incoming daily traffic per MAC address | 5801 | 0 | 19.2 GB | 502.1 MB | 1.1 GB
Average outgoing daily traffic per MAC address | 5801 | 0 | 7.0 GB | 437.0 MB | 874.3 MB
Table 5-1 Descriptive statistics of the average daily traffic of MAC addresses (Spanish Network measurement from 2008-02-19 to 2008-02-29)
Table 5-2 shows the deciles (10th, 20th, ..., 90th percentiles) of the distributions of the average incoming and outgoing daily traffic per MAC address. In the lower percentiles, the downlink (incoming) traffic is considerably larger than the uplink (outgoing) traffic. For example, 30% of the MAC addresses downloaded less than 36 MB per day, while the bottom 30% uploaded less than 6 MB. At the 90th percentile, however, the uplink and downlink traffic volumes are quite balanced: 1.3 GB for downlink and 1.4 GB for uplink. One can also observe the different scales of the traffic volumes in the two directions: the median (50th percentile) is 113 MB / 46 MB, which is small compared to the mean traffic volume of 502 MB / 437 MB. This indicates that the distribution of the average daily traffic volume has a heavy tail.
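The gap between median and mean can be checked directly on a list of per-MAC averages. This sketch uses Python's `statistics` module; its `method="inclusive"` interpolation may differ slightly from the percentile definition used by SPSS, so it illustrates the calculation rather than reproducing Table 5-2. The data are made up.

```python
import statistics

def decile_summary(volumes):
    """Deciles (10th to 90th percentile), median and mean of per-MAC daily volumes."""
    return {
        "deciles": statistics.quantiles(volumes, n=10, method="inclusive"),
        "median": statistics.median(volumes),
        "mean": statistics.fmean(volumes),
    }

# Made-up data: nine light users and one heavy user (MB per day).
summary = decile_summary([1] * 9 + [1000])
print(summary["median"], summary["mean"])
```

A mean far above the median, as with 502 MB versus 113 MB for the downlink in the tables, is the signature of a heavy tail: a few large values pull the mean up while leaving the median unchanged.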
Percentile | Average incoming daily traffic per MAC address | Average outgoing daily traffic per MAC address
10 | 1.7 MB | 0.2 MB
20 | 14.7 MB | 2.0 MB
30 | 35.6 MB | 5.7 MB
40 | 64.7 MB | 15.1 MB
50 | 113.2 MB | 45.7 MB
60 | 195.8 MB | 126.7 MB
70 | 351.4 MB | 298.2 MB
80 | 648.0 MB | 635.2 MB
90 | 1337.7 MB | 1372.9 MB
Table 5-2 10 percentile values of the average daily traffic volume per MAC addresses (Spanish Network measurement from 2008-02-19 to 2008-02-29)
The comparison with further measurements showed that the 20-70 percentile regions, i.e. the low traffic segments, increased their traffic volume over the measurements. Unlike the low traffic segments, the traffic of the heavy users seems to have stayed at a constant level, around 1.2-1.3 GB.
Figure 5-7 shows the complementary cumulative distribution function (CCDF) of the average daily downlink and uplink traffic volume per subscriber in the same Spanish Network measurement. Since Table 5-1 and Table 5-2 indicate that the subscribers generate traffic on significantly different scales, the CCDF is plotted on a linear-logarithmic scale (logarithmic traffic axis) in order to visualize the distribution over several orders of magnitude.
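A lin-log CCDF of this kind only needs the fraction of subscribers above each threshold, evaluated on logarithmically spaced thresholds. A minimal sketch with made-up volumes in GB; the threshold range is chosen to resemble the figure's axis and is our assumption.

```python
import bisect

def empirical_ccdf(values, thresholds):
    """Empirical CCDF: fraction of values strictly greater than each threshold."""
    data = sorted(values)
    n = len(data)
    return [(n - bisect.bisect_right(data, t)) / n for t in thresholds]

# Logarithmically spaced thresholds from 1e-5 GB to 100 GB.
log_thresholds = [10.0 ** k for k in range(-5, 3)]
print(empirical_ccdf([0.5, 1.0, 2.0, 4.0], log_thresholds))
```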
Figure 5-8 shows the pairs of uplink and downlink traffic volume per MAC address for the Spanish Network measurement from 2008-02-19 to 2008-02-29. As already seen in Table 5-2, the traffic is strongly downlink-dominated for MAC addresses with low traffic volumes, while it is fairly symmetric for MAC addresses with high traffic volumes.
[Figure: probability versus average traffic volume per day per MAC address (GB, logarithmic axis from 10⁻⁵ GB to 100 GB); curves: traffic volume in downlink direction, traffic volume in uplink direction]
Figure 5-7: Complementary cumulative distribution function of users based on generated traffic (Lin-log scale, Spanish Network measurement from 2008-02-19 to 2008-02-29)
[Figure: average uplink traffic volume per MAC address versus average downlink traffic volume per MAC address (GB, both axes logarithmic)]
Figure 5-8 The relation of the uplink and downlink traffic per MAC address (Log-log scale, Spanish Network measurement from 2008-02-19 to 2008-02-29)
5.3.2 Separation using cluster analysis
The aim of this subsection is to divide the total population into groups (clusters) using cluster analysis. For this purpose, the following statistic is also computed for each MAC address, in addition to the average traffic volume statistics:
Number of active applications per MAC address
This is the number of applications identified by PacketLogic per MAC address that have
nonzero traffic volume.
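Counting active applications amounts to counting, per MAC address, the distinct applications with non-zero volume. A minimal sketch, assuming hypothetical (MAC address, application name, volume) rows; this row format is an assumption for illustration, not PacketLogic's actual export format.

```python
from collections import defaultdict

def active_application_counts(rows):
    """Number of distinct applications with non-zero traffic volume per MAC address.

    rows: iterable of (mac, application, volume) tuples (hypothetical format).
    MAC addresses whose rows all have zero volume are reported with 0.
    """
    apps = defaultdict(set)
    for mac, application, volume in rows:
        seen = apps[mac]  # creates the entry even if the volume is zero
        if volume > 0:
            seen.add(application)
    return {mac: len(seen) for mac, seen in apps.items()}

rows = [("m1", "HTTP", 10), ("m1", "eDonkey", 0), ("m1", "HTTP", 5), ("m2", "POP3", 0)]
print(active_application_counts(rows))
```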
Table 5-3 shows an example result of the cluster analysis (TwoStep Cluster in SPSS) run on the population of MAC addresses. The average downlink daily traffic volume per MAC address and the number of active applications per MAC address were used as continuous variables. The cluster analysis identified three distinct clusters in both measurements.
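The study used SPSS TwoStep Cluster, which is not reproduced here. As a rough stand-in, the following sketch runs plain k-means (Lloyd's algorithm) with k = 3 on the same two variables, after taking log10 of the traffic volume (which spans several orders of magnitude) and standardizing both variables. The log transform, the standardization, the deterministic initialization from the first k points and the made-up data are all our assumptions; unlike TwoStep, which can select the number of clusters automatically, k-means needs k to be fixed in advance.

```python
import math
import statistics

def standardize(column):
    """Z-scores of one variable (population standard deviation)."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd for v in column]

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Made-up users: (average downlink MB per day, number of active applications).
# Real data with zero volumes would need an offset before taking log10.
users = [(60, 14), (350, 36), (3500, 42), (70, 12), (400, 38),
         (3900, 44), (65, 15), (380, 35), (3600, 41)]
log_volume = standardize([math.log10(mb) for mb, _ in users])
n_apps = standardize([apps for _, apps in users])
labels, _ = kmeans(list(zip(log_volume, n_apps)), k=3)
print(labels)
```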
Cluster | N | % of Total
1 | 1768 | 30.5%
2 | 3645 | 62.8%
3 | 388 | 6.7%
Total | 5801 | 100.0%
Table 5-3 Cluster distribution for Spanish Network measurement from 2008-02-19 to 2008-02-29
Cluster | Average downlink daily traffic volume per MAC address | Number of active applications per MAC address
1 | 67.9 MB | 14.4
2 | 372.3 MB | 36.3
3 | 3700.0 MB | 42.8
Combined | 502.1 MB | 30.1
Table 5-4 Cluster centroids for Spanish Network measurement from 2008-02-19 to 2008-02-29
Table 5-4 shows the coordinates of the cluster centroids for the first Spanish measurement (2008-02-19 to 2008-02-29). The centroid of Cluster 3 indicates that this cluster contains MAC addresses with a rather high average downlink traffic volume; these can be identified as "heavy users" or "high profile" subscribers.
Table 5-3 shows that this cluster contains 6.7% of the population in the first Spanish measurement; the corresponding ratio is 6.6% for the second Spanish measurement.
Another cluster (Cluster 1) contains subscribers who generate a rather limited traffic volume and also use a limited number of applications. This cluster contains around 30% of the population in both the first and the second measurement. We suggest referring to these subscribers as "low profile" subscribers.
The centroid of the third cluster (Cluster 2) shows that its subscribers generate a considerably smaller downlink traffic volume than the heavy users. However, the members of Cluster 2 use a large set of applications, almost 40, while the members of Cluster 1 use far fewer. That is, there is a group whose members generate a relatively low traffic volume but use many applications. We suggest referring to these subscribers as "medium profile" subscribers.
Comparing the centroids of the low profile clusters in the two measurements, we found that they are quite similar: the average traffic of the low profile subscribers is 68 MB in the first measurement and around 60 MB in the second one. The number of observed applications is around 10-20 for this group. That is, the low profile subscribers seem to behave in a similar manner over the measurement periods.
Examining the centroids of the medium profile clusters, we found that the traffic of the medium profile subscribers increased in the second measurement compared to the first one. For the high profile subscribers, by contrast, the cluster centroids show similar traffic volumes in both measurements.
We note that the number of the observed applications in medium and high profile subscribers seems
to stay around 40 in all the measurements.
Figure 5-9 shows the structure of the different clusters: the average traffic per MAC address versus the number of applications used by each subscriber. There is a small population of heavy users and two larger populations of medium and low profile users, and the three populations are distinctly separated.
[Figure: average incoming daily traffic per MAC address (MB, logarithmic axis) versus number of active applications per MAC address, marked by TwoStep cluster number 1, 2 and 3]
Figure 5-9. The average traffic and application number of the different clusters
A natural question is which applications generate the most traffic in the different groups. Table 5-5 lists these applications together with their average traffic volumes in the first Spanish measurement. HTTP traffic is quite significant for the low and medium profile subscribers, while P2P traffic types appear in first place for the medium and high profile subscribers. Nevertheless, some P2P traffic can also be observed in the group of low profile subscribers. Similar conclusions can be drawn from the corresponding lists for the other measurements.
Rank | Cluster 1 | Cluster 2 | Cluster 3
1 | Encapsulated (22.1 MB) | eDonkey (139.8 MB) | eDonkey (1692.2 MB)
2 | HTTP (15.5 MB) | HTTP (60.8 MB) | BitTorrent transfer (625.4 MB)
3 | eDonkey (11.1 MB) | BitTorrent transfer (46.9 MB) | Unknown (286.3 MB)
4 | HTTP media stream (6.4 MB) | eDonkey encrypted (28.4 MB) | eDonkey encrypted (269.0 MB)
5 | Unknown (4.1 MB) | HTTP media stream (24.8 MB) | HTTP (240.5 MB)
6 | BitTorrent transfer (3.3 MB) | Unknown (17.0 MB) | BitTorrent encrypted transfer (141.5 MB)
7 | POP3 (0.8 MB) | Ares (9.9 MB) | Encapsulated (71.3 MB)
8 | Untracked (0.6 MB) | BitTorrent encrypted transfer (7.4 MB) | HTTP media stream (53.5 MB)
9 | eDonkey encrypted (0.4 MB) | Encapsulated (3.5 MB) | Ares (47.1 MB)
10 | SSL v3 (0.4 MB) | Ares tcp (3.2 MB) | Thunder UDP (46.0 MB)
11 | BitTorrent encrypted transfer (0.3 MB) | Pando (2.4 MB) | Pando (38.3 MB)
12 | Soulseek (0.2 MB) | BitTorrent KRPC (2.2 MB) | Untracked (29.3 MB)
13 | PPLive (0.2 MB) | Kademlia (2.0 MB) | BitTorrent KRPC (19.3 MB)
14 | TFTP transfer (0.2 MB) | Untracked (1.7 MB) | QQ live (17.0 MB)
15 | RTMP (0.2 MB) | POP3 (1.5 MB) | PPStream (15.9 MB)
Table 5-5 Most voluminous traffic types of the clusters for Spanish Network measurement from 2008-02-19 to 2008-02-29
5.3.3 Clustering of users by setting up traffic limits
This subsection is an extension of the previous one. We were interested in how the clustering results
change if we set a lower limit on the generated traffic per user, thus excluding those users whose
traffic was low in the measurement interval.
Table 5-6 shows the descriptive statistics of the total traffic considering different lower bounds on the
total traffic.
Descriptive statistics for the total traffic
Lower limit on traffic per day | Number of users | Minimum | Maximum | Mean | Std. Deviation | Number of applications
0 MB | 5801 | 0,0 GB | 19,2 GB | 0,5 GB | 1,1 GB | 312
10 MB | 5137 | 10,07 MB | 19,40 GB | 1,05 GB | 1,58 GB | 164
30 MB | 4847 | 30,00 MB | 19,72 GB | 1,18 GB | 1,67 GB | 131
50 MB | 4577 | 50,10 MB | 19,69 GB | 1,28 GB | 1,73 GB | 114
Table 5-6. Descriptive statistics of the total traffic considering different lower bounds on the
total traffic
The table shows that setting a limit on the daily traffic considerably reduces the number of users
meeting the criterion. There are 664 users generating less than 10 MB of traffic per day, which is a
surprisingly high number. (The traffic statistics of this user group are analyzed in detail in
Section 5.3.4.) There are 954 users under 30 MB and 1224 under 50 MB.
As an obvious consequence, the average traffic per user rises. The standard deviation of the
generated traffic per user increases significantly as well.
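In principle, the statistics in Table 5-6 can be reproduced by filtering the per-user average daily volumes with a lower limit and summarizing the remaining users. The sketch below assumes a plain list of per-user averages in MB. The inclusive comparison (>=) and the use of the sample standard deviation are assumptions, since the report specifies neither, and the input values are hypothetical.

```python
import statistics
from typing import Dict, Sequence

def describe_above_limit(daily_mb: Sequence[float], lower_limit_mb: float) -> Dict[str, float]:
    """Summarize the users whose average daily traffic is at least lower_limit_mb."""
    kept = [v for v in daily_mb if v >= lower_limit_mb]  # inclusive limit: an assumption
    return {
        "users": len(kept),
        "excluded": len(daily_mb) - len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": statistics.fmean(kept),
        "std": statistics.stdev(kept),  # sample standard deviation (needs at least 2 users)
    }


# Hypothetical per-user averages in MB.
sample = [1.0, 5.0, 12.0, 30.0, 58.0]
for limit in (0, 10, 30):
    print(limit, describe_above_limit(sample, limit))
```

With the counts from Table 5-6, the number of excluded users follows directly: for example, 5801 - 5137 = 664 users fall below the 10 MB limit.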
As with the number of users, the number of applications drops considerably. Only 164 applications
(out of the 312 found in the measurement) generate more than 10 MB of traffic, 131 generate more
than 30 MB, and only 114 applications exceed the 50 MB limit of average daily traffic. Without a
traffic limit, the average number of applications per user in the “minimal”, “medium” and “heavy user”
clusters was 14, 36 and 42, respectively. As expected, after a 10 MB traffic limit is set, these numbers
fall to 2, 6 and 10.
Cluster centroids
Lower limit on traffic per day | Cluster | traffic_mean_sum Mean | traffic_mean_sum Std. Dev. | APPL Mean | APPL Std. Dev.
0 MB | 1 | 67,9 MB | 202,9 MB | 14,42 | 7,68
0 MB | 2 | 372,3 MB | 429,1 MB | 36,32 | 7,32
0 MB | 3 | 3700,0 MB | 2223,8 MB | 42,79 | 10,20
10 MB | 1 | 207,0 MB | 270,8 MB | 2,53 | 1,08
10 MB | 2 | 941,0 MB | 656,2 MB | 6,69 | 1,55
10 MB | 3 | 3831,4 MB | 2377,7 MB | 10,57 | 2,80
30 MB | 1 | 374,14 MB | 0,39 GB | 2,42 | 1,09
30 MB | 2 | 1761,36 MB | 0,97 GB | 6,36 | 1,58
30 MB | 3 | 6448,76 MB | 2,49 GB | 8,78 | 2,97
50 MB | 1 | 400,90 MB | 398,00 MB | 1,92 | 0,81
50 MB | 2 | 1688,50 MB | 933,43 MB | 5,02 | 1,21
50 MB | 3 | 5607,78 MB | 2627,18 MB | 7,85 | 2,29
Table 5-7. Cluster centroids in the cases of different lower bounds on traffic per day
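Cluster distribution (N = number of users, % = share of users)
Cluster | 0 MB N | 0 MB % | 10 MB N | 10 MB % | 30 MB N | 30 MB % | 50 MB N | 50 MB %
1 | 1768 | 30,477 | 2228 | 43,372 | 2922 | 60,285 | 2577 | 56,303
2 | 3645 | 62,833 | 2154 | 41,931 | 1661 | 34,269 | 1628 | 35,569
3 | 388 | 6,688 | 755 | 14,697 | 264 | 5,447 | 372 | 8,128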
Table 5-8. Distribution of users between clusters
[Figure 5-10 panels: TwoStep cluster size by cluster number (1, 2, 3) for a 0 MB, 10 MB, 30 MB and 50 MB lower limit on traffic per day]
Figure 5-10: Pie diagram of the distribution of users between clusters
Table 5-7 shows the cluster centroids for the different lower bounds on traffic per day, while
Table 5-8 contains the distribution of users between clusters.
The clustering process classifies more users as “minimal (light) users” at the expense of the “medium
users”, because the real minimal users have practically been filtered out. As the cluster centroids
shift, the average traffic of the “minimal users” increases.
The percentage of heavy users did not change.
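The figure labels suggest that the clusters were produced with a TwoStep cluster procedure (as found in SPSS) on two variables, traffic_mean_sum and APPL. TwoStep is not reproduced here. As a rough stand-in, the sketch below applies plain k-means with k = 3 to z-scored versions of the two variables, using a deterministic initialization chosen only for this illustration. It is a different algorithm from TwoStep, so cluster boundaries will not match the report. Re-running it after dropping users below a lower limit shows how the centroids move when the lightest users are filtered out. All function names and the synthetic input are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def standardize(column: Sequence[float]) -> List[float]:
    """z-scores, so that traffic (MB) and application counts weigh equally."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column) or 1.0  # avoid division by zero for a constant column
    return [(x - mu) / sigma for x in column]

def kmeans(points: List[Point], k: int = 3, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means in two dimensions; returns a raw label 0..k-1 per point."""
    # Deterministic seeding: points at evenly spaced ranks of the first feature.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    centroids = [points[order[(2 * j + 1) * len(points) // (2 * k)]] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = (statistics.fmean(m[0] for m in members),
                                statistics.fmean(m[1] for m in members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def cluster_users(traffic_mb: Sequence[float], n_apps: Sequence[float], k: int = 3) -> List[int]:
    """Cluster users on (traffic, number of applications); 1 = lowest mean traffic."""
    raw = kmeans(list(zip(standardize(traffic_mb), standardize(n_apps))), k)
    mean_traffic = {c: statistics.fmean(t for t, lab in zip(traffic_mb, raw) if lab == c)
                    for c in set(raw)}
    rank = {c: r + 1 for r, c in enumerate(sorted(mean_traffic, key=mean_traffic.get))}
    return [rank[c] for c in raw]


# Hypothetical users: average daily MB and number of applications used.
traffic = [10, 12, 15, 300, 320, 350, 4000, 4200, 3900]
apps = [3, 4, 2, 20, 22, 25, 40, 42, 38]
print(cluster_users(traffic, apps))
```

To mimic a lower limit as in this subsection, filter both lists with the same traffic threshold before calling cluster_users. Standardizing first prevents the traffic variable, measured in MB, from dominating the distance.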
Lower limit on traffic per day: 0 MB
Rank | TOTAL | Volume | Cluster 1 | Volume | Cluster 2 | Volume | Cluster 3 | Volume
1 | eDonkey | 204,4 MB | Encapsulated | 22,1 MB | eDonkey | 139,8 MB | eDonkey | 1692,2 MB
2 | BitTorrent transfer | 72,3 MB | HTTP | 15,5 MB | HTTP | 60,8 MB | BitTorrent transfer | 625,4 MB
3 | HTTP | 59,0 MB | eDonkey | 11,1 MB | BitTorrent transfer | 46,9 MB | Unknown | 286,3 MB
4 | eDonkey encrypted | 36,0 MB | HTTP media stream | 6,4 MB | eDonkey encrypted | 28,4 MB | eDonkey encrypted | 269,0 MB
5 | Unknown | 31,1 MB | Unknown | 4,1 MB | HTTP media stream | 24,8 MB | HTTP | 240,5 MB
6 | HTTP media stream | 21,1 MB | BitTorrent transfer | 3,3 MB | Unknown | 17,0 MB | BitTorrent encrypted transfer | 141,5 MB
7 | BitTorrent encrypted transfer | 14,2 MB | POP3 | 0,8 MB | Ares | 9,9 MB | Encapsulated | 71,3 MB
8 | Encapsulated | 13,7 MB | Untracked | 0,6 MB | BitTorrent encrypted transfer | 7,4 MB | HTTP media stream | 53,5 MB
9 | Ares | 9,4 MB | eDonkey encrypted | 0,4 MB | Encapsulated | 3,5 MB | Ares | 47,1 MB
10 | Pando | 4,1 MB | SSL v3 | 0,4 MB | Ares tcp | 3,2 MB | Thunder UDP | 46,0 MB
11 | Thunder UDP | 3,8 MB | BitTorrent encrypted transfer | 0,3 MB | Pando | 2,4 MB | Pando | 38,3 MB
12 | Untracked | 3,2 MB | Soulseek | 0,2 MB | BitTorrent KRPC | 2,2 MB | Untracked | 29,3 MB
13 | BitTorrent KRPC | 2,7 MB | PPLive | 0,2 MB | Kademlia | 2,0 MB | BitTorrent KRPC | 19,3 MB
14 | Ares tcp | 2,6 MB | TFTP transfer | 0,2 MB | Untracked | 1,7 MB | QQ live | 17,0 MB
15 | Kademlia | 1,5 MB | RTMP | 0,2 MB | POP3 | 1,5 MB | PPStream | 15,9 MB
10 MB
TOTAL
Cluster 1
eDonkey
373,9 MB
BitTorrent transfer
164,5 MB
HTTP
Cluster 2
52,9 MB
Cluster 3
eDonkey
368,8 MB
eDonkey
1363,0 MB
eDonkey
43,7 MB
BitTorrent transfer
104,4 MB
BitTorrent transfer
807,5 MB
HTTP media stream
37,1 MB
HTTP
93,0 MB
eDonkey encrypted
309,0 MB
78,2 MB
Encapsulated
28,1 MB
eDonkey encrypted
74,1 MB
Unknown
241,2 MB
59,9 MB
Unknown
6,6 MB
HTTP media stream
64,4 MB
HTTP
212,6 MB
HTTP media stream
55,7 MB
BitTorrent transfer
4,8 MB
Unknown
51,5 MB
BitTorrent encrypted transfer
163,7 MB
BitTorrent encrypted transfer
30,5 MB
eDonkey encrypted
3,9 MB
Ares
24,7 MB
Ares
111,8 MB
Ares
27,7 MB
POP3
2,8 MB
BitTorrent encrypted transfer
15,3 MB
Untracked
95,1 MB
Encapsulated
23,6 MB
Adobe Update Manager
2,7 MB
Untracked
15,1 MB
HTTP media stream
85,8 MB
Untracked
21,3 MB
49,1 MB
HTTP
93,2 MB
eDonkey encrypted
Unknown
Untracked
2,2 MB
Encapsulated
10,0 MB
Encapsulated
Pando
8,3 MB
RTSP media stream
2,2 MB
Ares tcp
8,2 MB
Thunder UDP
42,4 MB
Thunder UDP
7,8 MB
RTMP
2,2 MB
PPLive
6,9 MB
Pando
38,1 MB
Ares tcp
6,6 MB
Ares
2,0 MB
Adobe Update Manager
6,8 MB
BitTorrent KRPC
27,5 MB
BitTorrent KRPC
6,0 MB
PPLive
1,5 MB
RTSP media stream
6,2 MB
Ares tcp
19,9 MB
PPLive
6,0 MB
BitTorrent tracker
1,4 MB
RTMP
6,2 MB
FTP transfer
19,6 MB
30 MB
TOTAL
Cluster 1
Cluster 2
eDonkey
413,9 MB
eDonkey
eDonkey
649,01 MB
eDonkey
2,30 GB
BitTorrent transfer
178,8 MB
HTTP
85,3 MB
BitTorrent transfer
269,51 MB
BitTorrent transfer
1,34 GB
HTTP
122,5 MB
HTTP media stream
51,9 MB
eDonkey encrypted
157,11 MB
eDonkey encrypted
0,51 GB
Encapsulated
24,3 MB
HTTP
148,86 MB
Unknown
0,46 GB
eDonkey encrypted
91,8 MB
110,3 MB
Cluster 3
Unknown
69,6 MB
BitTorrent transfer
22,5 MB
Unknown
106,65 MB
HTTP
0,37 GB
HTTP media stream
67,4 MB
eDonkey encrypted
16,6 MB
HTTP media stream
88,16 MB
BitTorrent encrypted transfer
0,29 GB
BitTorrent encrypted transfer
34,4 MB
Unknown
13,2 MB
Ares
58,59 MB
Untracked
0,21 GB
Ares
31,3 MB
Ares
BitTorrent encrypted transfer
51,40 MB
Encapsulated
0,13 GB
Encapsulated
25,6 MB
Untracked
3,8 MB
Untracked
30,75 MB
Ares
0,13 GB
Untracked
24,0 MB
Adobe Update Manager
3,3 MB
Ares tcp
14,83 MB
Thunder UDP
0,12 GB
7,2 MB
Thunder UDP
9,2 MB
RTSP media stream
3,1 MB
Pando
14,27 MB
HTTP media stream
0,11 GB
Pando
9,1 MB
RTMP
2,8 MB
PPLive
11,78 MB
Pando
0,06 GB
Ares tcp
7,4 MB
FTP transfer
2,4 MB
BitTorrent KRPC
11,60 MB
BitTorrent KRPC
0,04 GB
BitTorrent KRPC
6,6 MB
POP3
2,3 MB
Encapsulated
10,80 MB
NNTP
0,04 GB
PPLive
6,4 MB
PPLive
2,0 MB
FTP transfer
9,64 MB
QQ live
0,03 GB
50 MB
TOTAL
eDonkey
Cluster 1
449,1 MB
eDonkey
Cluster 2
127,7 MB
eDonkey
Cluster 3
628,8 MB
eDonkey
1888,7 MB
1166,7 MB
BitTorrent transfer
192,1 MB
HTTP
99,6 MB
BitTorrent transfer
240,1 MB
BitTorrent transfer
HTTP
144,0 MB
HTTP media stream
56,5 MB
HTTP
168,7 MB
eDonkey encrypted
461,8 MB
eDonkey encrypted
102,3 MB
Encapsulated
28,3 MB
eDonkey encrypted
156,8 MB
Unknown
405,3 MB
105,2 MB
HTTP
343,9 MB
BitTorrent encrypted transfer
259,6 MB
Unknown
76,0 MB
BitTorrent transfer
21,1 MB
Unknown
HTTP media stream
75,5 MB
eDonkey encrypted
16,0 MB
HTTP media stream
92,3 MB
BitTorrent encrypted transfer
38,0 MB
Unknown
Ares
55,9 MB
Untracked
177,9 MB
Ares
34,2 MB
Ares
7,6 MB
BitTorrent encrypted transfer
46,7 MB
HTTP media stream
134,6 MB
Encapsulated
27,1 MB
RTSP media stream
2,9 MB
Untracked
25,8 MB
Ares
123,9 MB
Untracked
25,2 MB
Untracked
2,7 MB
Ares tcp
14,7 MB
Thunder UDP
94,7 MB
Thunder UDP
10,2 MB
Pando
9,8 MB
10,0 MB
RTMP
2,5 MB
Pando
13,1 MB
Encapsulated
82,1 MB
FTP transfer
2,4 MB
Encapsulated
12,7 MB
Pando
58,2 MB
10,5 MB
Ares tcp
7,9 MB
POP3
1,8 MB
PPLive
BitTorrent KRPC
38,5 MB
BitTorrent KRPC
6,6 MB
Zattoo TCP
1,8 MB
BitTorrent KRPC
9,2 MB
NNTP
27,4 MB
PPLive
6,4 MB
Ares tcp
1,7 MB
FTP transfer
8,3 MB
FTP transfer
26,1 MB
Table 5-9. Top applications per cluster assuming different lower bounds on traffic per day
[Figure 5-11 plot: APPL (number of applications, 0 to 30) vs. traffic_mean_sum (0 to 20 GB), points colored by TwoStep cluster number 1, 2, 3]
Figure 5-11 The average traffic and the number of used applications per user on a 2D plot of
those generating at least 10 MB average traffic per day
[Figure 5-12 plot: APPL (number of applications, 0 to 25) vs. traffic_mean_sum (0 to 20 GB), points colored by TwoStep cluster number 1, 2, 3]
Figure 5-12 The average traffic and the number of used applications per user on a 2D plot of
those generating at least 30 MB average traffic per day
Figure 5-11 and Figure 5-12 show the clustering results on a 2D plane assuming a traffic limit of 10 MB
and 30 MB, respectively. The dimensions are the average generated traffic and the average number of
applications. Each point represents one user, and the clusters are indicated by different colors.
Table 5-9 shows the top applications per cluster assuming different lower bounds on the average traffic
per day. Despite the traffic limits, we observed the same contributing applications, but with higher
average traffic.
Regarding the light users, the higher the traffic limit, the lower HTTP traffic ranks among the top
applications, because the users generating the least amount of traffic are filtered out. Nonetheless,
HTTP traffic remains important. P2P applications become more dominant as the traffic limit
increases. At the same time, new applications appear among the top 15 applications (e.g., RTSP
media stream, BitTorrent tracker). Interestingly, the “Adobe Update Manager” application also earned
a position in the top 15 by generating 3 MB of traffic per day on average.
Considering the heavy users, the list of top applications remained unchanged irrespective of the
traffic limits.
5.3.4 Analysis of “minimal users”
Users who generated less than 10 MB of daily traffic on average are regarded as “minimal users”.
This section gives insight into the statistics of the “minimal user” group.
Table 5-10 contains the percentile values, in steps of 10%, of the average daily traffic volume per MAC
address for the “minimal users”, while Figure 5-13 shows the complementary cumulative distribution
function of the traffic generated by light users. Both suggest that the traffic of minimal users is
distributed unevenly (just like the total traffic).
Statistics: traffic_mean_sum
N valid | 5801
N missing | 0
Percentile | Value
10 | 0,88 MB
20 | 2,12 MB
30 | 3,47 MB
40 | 4,64 MB
50 | 5,92 MB
60 | 7,34 MB
70 | 9,01 MB
80 | 11,23 MB
90 | 15,22 MB
Table 5-10 Percentile values in 10% steps of the average daily traffic volume per MAC address in the
case of “minimal users”
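Percentiles in 10% steps such as those in Table 5-10 can be obtained with the Python standard library, as in the sketch below. Several percentile definitions exist; the sketch uses statistics.quantiles with method="exclusive", which interpolates at rank (n + 1)p. Whether this matches the definition used by the tool behind Table 5-10 is an assumption, and the input values are hypothetical.

```python
import statistics
from typing import Dict, Sequence

def decile_table(daily_mb: Sequence[float]) -> Dict[int, float]:
    """Percentiles in 10% steps of the average daily traffic per MAC address."""
    # method="exclusive" interpolates at rank (n + 1) * p; other definitions differ slightly.
    cuts = statistics.quantiles(daily_mb, n=10, method="exclusive")
    return {10 * (i + 1): cut for i, cut in enumerate(cuts)}


print(decile_table([0.5, 1.2, 2.0, 3.4, 4.6, 5.9, 7.3, 9.0, 9.8]))  # hypothetical MB values
```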
Figure 5-13: Complementary cumulative distribution function of the traffic generated by light
users
Application | N | Minimum | Maximum | Mean | Std. Deviation
HTTP | 5801 | 0,00 MB | 9,94 MB | 2,11 MB | 2,16 MB
HTTP media stream | 5801 | 0,00 MB | 9,68 MB | 0,65 MB | 1,02 MB
SSL v3 | 5801 | 0,00 MB | 8,16 MB | 0,54 MB | 0,83 MB
Unknown | 5801 | 0,00 MB | 9,64 MB | 0,53 MB | 1,09 MB
Undetermined | 5801 | 0,00 MB | 9,49 MB | 0,42 MB | 1,09 MB
POP3 | 5801 | 0,00 MB | 7,71 MB | 0,33 MB | 0,79 MB
eDonkey encrypted | 5801 | 0,00 MB | 9,71 MB | 0,33 MB | 0,90 MB
Kademlia | 5801 | 0,00 MB | 9,94 MB | 0,21 MB | 0,82 MB
eDonkey | 5801 | 0,00 MB | 9,71 MB | 0,16 MB | 0,66 MB
ICMP | 5801 | 0,00 MB | 9,15 MB | 0,15 MB | 0,54 MB
BitTorrent KRPC | 5801 | 0,00 MB | 9,98 MB | 0,12 MB | 0,62 MB
Untracked | 5801 | 0,00 MB | 9,68 MB | 0,12 MB | 0,37 MB
Encapsulated | 5801 | 0,00 MB | 9,15 MB | 0,11 MB | 0,47 MB
RTMP | 5801 | 0,00 MB | 4,84 MB | 0,11 MB | 0,34 MB
PPLive | 5801 | 0,00 MB | 4,74 MB | 0,10 MB | 0,31 MB
Table 5-11. Descriptive statistics of applications used by “minimal users”
Table 5-11 shows statistics of the applications used by “minimal users”. The table suggests that most
minimal users browse the web (HTTP and HTTP secure) and read e-mail. These applications are
followed by file sharing and P2P applications, which means that even light users use them.
5.3.5 Separation using cluster analysis (Swedish network) xviii
Cluster measurement data present household usage based on the number of unique applications
used together with the amount of data transferred. The bandwidth axis uses a logarithmic scale to
resolve users with low bandwidth usage and to prevent extreme bandwidth usage from dominating
the graph. Below, inbound data refers to data downloaded by a household and outbound data to data
leaving the household. We separate the two directions to detect differences and similarities in the
shape of the clusters, which makes it possible to identify common user habits and extreme cases.
Figure 5-14 The graph shows a number of households, plotted based on the number of
applications used and inbound bandwidth consumed. Technology is FTTH. Measurement from
the Swedish network No. 1 between 2007-09-01 00:00 and 2007-10-01 00:00. Total number of
households in measurement is 2081
Figure 5-14 and Figure 5-15 show a measurement from the FTTH part of the Swedish municipal
network No. 1, taken over 30 days between 2007-09-01 00:00 and 2007-10-01 00:00. Traffic is
separated by direction: Figure 5-14 describes inbound traffic and Figure 5-15 outbound traffic. The
upper 10% of the households by bandwidth consumption are colored red; similarly, the lower 10% are
colored blue. The total number of households measured was 2081. The upper boundary was
calculated as 2 GB of inbound data per day and household, and the lower boundary as approximately
4 MB of total data per day and household. For outbound data, the boundaries were calculated as
9 GB and 4 MB. We see a distinct difference between inbound and outbound volume. To a large
extent, the traffic dominating this difference is P2P file sharing from computers left on around the clock.
Another characteristic of both inbound and outbound traffic in the FTTH measurement, which is also
present in the DSL measurement shown in Figure 5-16, is a slope from users with few applications and
low bandwidth to users with many applications and high bandwidth.
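The red and blue coloring in Figures 5-14 to 5-16 marks households above the 90th and below the 10th percentile of bandwidth consumption. The sketch below shows one way to derive such boundaries and labels. The percentile definition (statistics.quantiles with method="exclusive"), the strict comparisons and the "grey" label for the remaining households are assumptions for illustration, and the input values are hypothetical.

```python
import statistics
from typing import List, Sequence

def color_by_decile(daily_volume: Sequence[float]) -> List[str]:
    """Label each household red (top 10%), blue (bottom 10%) or grey (the rest)."""
    cuts = statistics.quantiles(daily_volume, n=10, method="exclusive")
    lower, upper = cuts[0], cuts[-1]  # 10th and 90th percentiles
    colors = []
    for v in daily_volume:
        if v > upper:
            colors.append("red")    # high bandwidth consumption
        elif v < lower:
            colors.append("blue")   # low bandwidth consumption
        else:
            colors.append("grey")   # hypothetical label for everyone else
    return colors


households = list(range(1, 20))  # hypothetical daily volumes in MB
print(color_by_decile(households))
```

Because the boundaries are percentiles, they adapt to each measurement, which is why the FTTH inbound, FTTH outbound and DSL graphs end up with different boundary values.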
We also see that application and protocol usage is quite high: the majority of households use more
than 20 applications or protocols. One reason for such a high number is that the PacketLogic
distinguishes several protocols for one specific application. For example, Skype has seven different
sub-protocols, while web browsing is split into HTTP and HTTP media stream and, with the use of SSL,
into SSL v2 and SSL v3. Common web browsing is therefore listed as four different applications.
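The effect of the classifier's granularity on the application count can be illustrated with a small sketch: count the distinct labels seen for a household, once as reported and once after mapping sub-protocols to a common group. The grouping of the four web-browsing labels into a single "Web browsing" entry and the example household are hypothetical; PacketLogic itself is not used here.

```python
from typing import Dict, Iterable

def count_applications(labels: Iterable[str], merge: Dict[str, str]) -> int:
    """Number of distinct applications after mapping classifier labels to groups."""
    return len({merge.get(label, label) for label in labels})


# Hypothetical grouping of the four web-browsing labels named in the text.
WEB = {"HTTP", "HTTP media stream", "SSL v2", "SSL v3"}
merge = {label: "Web browsing" for label in WEB}

household = ["HTTP", "HTTP media stream", "SSL v2", "SSL v3", "BitTorrent", "DNS"]
print(count_applications(household, {}), count_applications(household, merge))
```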
Figure 5-15 The graph shows a number of households, plotted based on the number of
applications used and outbound bandwidth consumed. Technology is FTTH. Measurement
from the Swedish network No. 1 between 2007-09-01 00:00 and 2007-10-01 00:00. Total number
of households in measurement is 2081.
Looking further into Figure 5-15, many households are quite high on the data axis. To put that scale in
perspective, every household consuming more than 10^6 KB (about 1 GB) uses at least the equivalent
of one full-length movie downloaded or uploaded per day during one month. As Figure 5-16 shows,
where the technology is DSL instead of FTTH, fewer households reach that extreme level. This may
depend on many factors, including the type and age of the people in the households and the
bandwidth of the Internet connection. In the DSL case, we have not divided the traffic into inbound and
outbound graphs because we could not find any distinct differences when doing so. Hence, Figure
5-16 represents the total data measured. The DSL measurements were performed between 2008-02-02
00:00 and 2008-02-20 00:00, and the total number of households in the measurement was 104.
The same color indexing was used in Figure 5-16 as in Figure 5-15, but the boundaries for the total
data in the DSL case were calculated as 2 GB and 5 MB for the upper and lower boundary, respectively.
Figure 5-16 The graph shows a number of households, plotted based on the number of
applications used and total bandwidth consumed. Technology is DSL. Measurement from the
Swedish network No. 1 between 2008-02-02 00:00 and 2008-02-20 00:00. Total number of
households in measurement is 104.
5.3.6 Separation using the traffic volume of popular applications
Figure 5-17, Figure 5-18, Figure 5-19 and Figure 5-20 show histograms of the number of users as a
function of traffic volume for some popular applications.
Both Figure 5-17 and Figure 5-18 show that the complementary cumulative distribution functions of the
FTP control and data traffic volume have linear parts on a log-log scale, indicating that the traffic
volume distributions follow a power-law (polynomial) relation over two to three orders of magnitude.
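A power law P(X > x) ∝ x^(-α) appears as a straight line of slope -α on log-log axes. The sketch below computes an empirical complementary cumulative distribution function and a least-squares slope over a chosen volume window. The figure captions report that a share of the MAC addresses generate no traffic of a given type, so zero volumes are reported separately and excluded from the curve. Fitting a line to the CCDF is only a rough check. The window limits and the function names are assumptions for illustration, not the method used for the figures.

```python
import bisect
import math
from typing import List, Sequence, Tuple

def ccdf(volumes: Sequence[float]) -> Tuple[float, List[Tuple[float, float]]]:
    """Return (share of zero-volume MAC addresses, [(x, P(X > x)), ...]) over non-zero volumes."""
    nonzero = sorted(v for v in volumes if v > 0)
    zero_share = 1 - len(nonzero) / len(volumes)
    n = len(nonzero)
    # P(X > x) = fraction of non-zero volumes strictly larger than x (ties handled by bisect_right).
    points = [(x, (n - bisect.bisect_right(nonzero, x)) / n) for x in sorted(set(nonzero))]
    return zero_share, points

def loglog_slope(points: Sequence[Tuple[float, float]], x_min: float, x_max: float) -> float:
    """Least-squares slope of log10 P(X > x) versus log10 x within [x_min, x_max]."""
    pts = [(math.log10(x), math.log10(p)) for x, p in points if x_min <= x <= x_max and p > 0]
    mean_x = sum(a for a, _ in pts) / len(pts)
    mean_y = sum(b for _, b in pts) / len(pts)
    sxx = sum((a - mean_x) ** 2 for a, _ in pts)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pts)
    return sxy / sxx  # needs at least two distinct x values in the window


# An exact power law P(X > x) = x ** -1.5 gives a straight line of slope -1.5 on log-log axes.
print(loglog_slope([(x, x ** -1.5) for x in (1, 10, 100, 1000)], 1, 1000))
```

For real measurement data, the fitting window should cover only the visibly linear part of the curve, since the extreme tail contains very few MAC addresses.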
[Figure 5-17 plot: probability (log scale) vs. FTP control average daily traffic volume, 0.001 MB to 10 MB (log scale)]
Figure 5-17: Complementary cumulative distribution function of MAC addresses based on
generated FTP control traffic (50% of the MAC addresses do not generate FTP control traffic)
[Figure 5-18 plots (body and tail): probability (log scale) vs. FTP data average daily traffic volume, 0.01 MB to 1000 MB and 10 MB to 1000 MB (log scale)]
Figure 5-18: Body and tail of the complementary cumulative distribution function of MAC
addresses based on generated FTP data traffic (43% of the MAC addresses do not generate
FTP data traffic)
Figure 5-19 shows the complementary cumulative distribution function of the traffic volume of MAC
addresses that generate HTTP traffic. Approximately 80% of these MAC addresses generate less than
10 MB of HTTP traffic per day on average. Nevertheless, a few MAC addresses generate more than
1 GB of HTTP traffic per day on average. This is not consistent with the rest of the population and
might be the result of data transfer applications running over HTTP connections rather than real web
browsing.
[Figure 5-19 plot: probability (0 to 1) vs. HTTP average daily traffic volume, 0.01 MB to 10000 MB (log scale)]
Figure 5-19: Complementary cumulative distribution function of MAC addresses based on
generated HTTP traffic (5% of the MAC addresses do not generate HTTP traffic)
Figure 5-20 shows the complementary cumulative distribution function of the traffic volume of MAC
addresses that generate P2P traffic. Approximately 80% of these MAC addresses generate less than
100 MB of data per day on average. The remaining 20% contains the “heavy users”, whose average
daily traffic volume is on the order of gigabytes.
[Figure 5-20 plot: probability (0 to 1) vs. average daily traffic volume of P2P file transfer, 1 MB to 100000 MB (log scale)]
Figure 5-20: Complementary cumulative distribution function of MAC addresses based on
generated P2P traffic (BitTorrent, eDonkey, BitTorrent encrypted, eDonkey encrypted, PPLive)
This subsection showed the distributions of the population of MAC addresses based on the traffic
volume of three popular applications. There is a group of MAC addresses with “average” behavior and
another group of MAC addresses with “heavy user” behavior. Further analysis is needed to determine
whether the groups of “average HTTP” and “average P2P” MAC addresses, and the groups of “heavy
user HTTP” and “heavy user P2P” MAC addresses, correlate with each other. It is also a matter of
further analysis whether these groups correlate with the groups defined by the cluster analysis in
Section 5.3.1.
5.4 Subscriber activities
In Deliverable 3.1 we analyzed the number of active MAC addresses contributing to the total traffic
and to some popular application categories, namely FTP, HTTP and P2P traffic. Here we summarize
the main conclusions.
The number of active MAC addresses was available on a daily basis in an 11-day measurement.
A MAC address is assumed to be active if at least one byte of traffic is generated by that MAC address
during the measurement interval.
User activity was quite steady on every day of the measurement, though some minor drops in user
activity were observed during the weekend. This observation applies to the total traffic and to the
individual application categories as well.
Regarding FTP traffic, the daily number of active MAC addresses ranged between 300 and 500, which
is 5-9% of the total population. On the other hand, 2900 MAC addresses generated FTP traffic at least
once during the 11 days, which means 50% penetration. The difference between the two values
suggests that many of the MAC addresses generated FTP traffic on a few days only and did not return
later in the measurement.
Considering HTTP traffic, the number of daily users was around 3500-4500 (60-77%), while the total
penetration (meaning those users who have generated HTTP traffic at least once) was 5500 (95% of
the population). The numbers indicate that, compared to the case of FTP, more MAC addresses
returned and generated HTTP traffic several times during the measurement.
For file sharing traffic, we observed practically no drop in activity on weekend days. This may be
because P2P applications can work autonomously, without human control. The number of daily users
varies between 4100 and 4200 (70-72%), compared with a total penetration of 5300 (92% of the
population). This indicates that the ratio of returning subscribers is even higher than for HTTP traffic.
The user activity statistics again show the significant difference between HTTP (mostly web traffic) and
file sharing traffic: file sharing traffic is steady in time, though its penetration is lower than that of web
surfing. Web traffic, on the other hand, varies more in time and has a higher total penetration. For file
sharing traffic, the difference between the daily and the total penetration is the smallest, suggesting
that file sharing applications are “always on”.
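One simple way to express how many users return is the ratio between the typical daily number of active MAC addresses and the number of MAC addresses seen at least once during the 11 days. The sketch below computes this ratio from the figures quoted above. Taking the midpoint of each daily range is an assumption made only for this illustration. A higher ratio means that a larger part of the users are active on a typical day.

```python
from typing import Tuple

def daily_to_total_ratio(daily_range: Tuple[int, int], total: int) -> float:
    """Midpoint of the daily active MAC range divided by the MACs seen at least once."""
    low, high = daily_range
    return ((low + high) / 2) / total


# Figures quoted above for the 11-day measurement: (daily range, MACs seen at least once).
reported = {
    "FTP": ((300, 500), 2900),
    "HTTP": ((3500, 4500), 5500),
    "File sharing": ((4100, 4200), 5300),
}
ratios = {app: round(daily_to_total_ratio(*values), 2) for app, values in reported.items()}
print(ratios)
```

The resulting ordering (FTP lowest, file sharing highest) matches the conclusion that file sharing applications behave as if they are “always on”.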
5.5 Application volume, Packet and Session share
5.5.1 Comparison of application usage in different networks and technologies
To avoid differences in the amount of unrecognized traffic between the Swedish network No. 1 and the
Spanish network, only the traffic from the following application groups has been considered for this
comparison: web browsing, P2P file sharing and multimedia streaming. In all the networks and
technologies, most of the traffic belongs to one of these groups.
Table 5-12, Table 5-13 and Table 5-14 show the share of the application groups in the downlink, uplink
and total traffic for the different technologies and networks:
Downlink traffic
Application group | Swedish network No.1 FTTH | Swedish network No.1 DSL | Spanish network CMTS | Spanish network GGSN
Web Browsing | 7,06% | 20,62% | 12,10% | 60,49%
P2P File Sharing | 88,27% | 65,98% | 80,85% | 26,56%
Multimedia streaming | 4,67% | 13,40% | 7,05% | 12,95%
Table 5-12. Share of the application groups in the downlink traffic for different technologies
and networks (only web browsing, P2P file sharing and multimedia streaming traffic are
considered, measurements in the period from October 2007 to March 2008).
Uplink traffic
Application group | Swedish network No.1 FTTH | Swedish network No.1 DSL | Spanish network CMTS | Spanish network GGSN
Web Browsing | 0,41% | 2,65% | 1,86% | 20,70%
P2P File Sharing | 99,39% | 96,74% | 96,97% | 76,51%
Multimedia streaming | 0,20% | 0,61% | 1,17% | 2,79%
Table 5-13. Share of the application groups in the uplink traffic for different technologies and
networks (only web browsing, P2P file sharing and multimedia streaming traffic are
considered, measurements in the period from October 2007 to March 2008).
Total traffic
Application group | Swedish network No.1 FTTH | Swedish network No.1 DSL | Spanish network CMTS | Spanish network GGSN
Web Browsing | 2,38% | 13,28% | 7,42% | 50,02%
P2P File Sharing | 96,07% | 78,65% | 88,22% | 39,71%
Multimedia streaming | 1,55% | 8,07% | 4,36% | 10,27%
Table 5-14. Share of the application groups in the total traffic for different technologies and
networks (only web browsing, P2P file sharing and multimedia streaming traffic are
considered, measurements in the period from October 2007 to March 2008).
As far as the uplink traffic is concerned, P2P file sharing is responsible for more than 96% of the
traffic in the fixed networks (FTTH, DSL and CMTS), regardless of the technology.
In the downlink direction, P2P file sharing in the fixed networks (FTTH, DSL and CMTS) generates a
large share of the traffic, between 66% and 88% depending on the technology. Approximately 60% of
the remaining traffic corresponds to web browsing and 40% to multimedia streaming.
In the mobile network (GGSN), web browsing generates about five times as much traffic as multimedia
streaming. Compared with the fixed networks, the share of P2P file sharing traffic in the mobile network
is lower in the uplink (77% of the traffic) and much lower in the downlink (27% of the traffic). In the
downlink direction, the mobile network traffic is mainly web browsing (61% of the traffic).
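The statements above can be checked against Tables 5-12 and 5-14 with a few lines of arithmetic. The sketch below copies the published shares, written with decimal points instead of decimal commas. It verifies that each column sums to 100%, then computes the web-browsing fraction of the non-P2P downlink traffic and the GGSN web-to-multimedia ratio for the total traffic.

```python
# Shares in % from Table 5-12 (downlink): (web browsing, P2P file sharing, multimedia streaming).
DOWNLINK = {
    "FTTH": (7.06, 88.27, 4.67),
    "DSL": (20.62, 65.98, 13.40),
    "CMTS": (12.10, 80.85, 7.05),
    "GGSN": (60.49, 26.56, 12.95),
}
GGSN_TOTAL = (50.02, 39.71, 10.27)  # Table 5-14, total traffic

def web_share_of_rest(web: float, media: float) -> float:
    """Fraction of the non-P2P traffic that is web browsing."""
    return web / (web + media)

for network, (web, p2p, media) in DOWNLINK.items():
    assert abs(web + p2p + media - 100) < 0.01  # each column of the table sums to 100 %
    print(network, round(web_share_of_rest(web, media), 2))

web, _, media = GGSN_TOTAL
print("GGSN web/multimedia ratio (total traffic):", round(web / media, 1))
```

For the fixed networks the web fraction of the non-P2P downlink traffic lies between about 0.60 and 0.63, and the GGSN ratio for the total traffic is close to five.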
5.5.2 Applications with high user penetration xviii
User penetration is a very important factor when looking at network traffic. Applications with high user
penetration often start automatically when the computer is switched on and run in the background
when the user is not actively using the computer.
An excerpt from a database measurement in the Swedish municipal network No. 1 between 2007-09-01
00:00 and 2007-10-01 00:00 is displayed in Table 5-15. During this time, the total number of
households seen sending or receiving traffic was 2081. The Internet access technology used was
FTTH.
Not surprisingly, the HTTP protocol sits at the top of the list; HTTP, as used by the World Wide Web, is
widely adopted by Internet users. SSL probably ranks near the top because most e-business and
web-based e-mail sites use it for security. Some protocols, like SSL, are separated by the PacketLogic
into versions (SSL v2 and SSL v3); in Table 5-15 and Table 5-16 they are merged into one. BitTorrent,
the most popular P2P file sharing protocol in Sweden, can also be found near the top. The e-mail
protocol POP3 is found rather far down the list. Reasons for this include the growing popularity of
web-based e-mail and the fact that another protocol, IMAP, is often used instead of POP3.
The HTTP protocol ranks higher on the list than the DNS protocol in the FTTH measurement. Users of
HTTP and the World Wide Web mainly use domain names instead of IP addresses, and translating a
domain name to an IP address involves the DNS protocol. The rather low usage of DNS compared with
that of HTTP is explained by the fact that users of the FTTH network can choose their ISP, and different
ISPs use different setups for their DNS servers. Since some ISPs' DNS servers are located on the user
side of the PacketLogic measurement point, not all DNS traffic passes the measurement equipment,
and that traffic therefore does not contribute to the measurement.
In Table 5-16, which shows the top 13 applications or protocols used by the DSL customers in the
Swedish municipal network No. 1, HTTP and SSL are as widely used as in Table 5-15. Compared with
Table 5-15, a Windows update signature had been added to the signature database by Procera
Networks; Windows update appears at 95%, which gives a good indication of Microsoft Windows usage
in this rather small population. The total number of hosts active during the measurement interval was
104. The measurements for the results in Table 5-16 were done between 2008-02-02 00:00 and
2008-02-20 00:00.
Number of active households | Percent | Application or protocol
2068 | 99.3% | HTTP
1975 | 94.9% | SSL
1911 | 91.8% | ICMP
1850 | 88.9% | HTTP media stream
1795 | 86.3% | BitTorrent
1794 | 86.2% | NTP
1769 | 85.0% | DNS
1768 | 85.0% | SOAP over HTTP
1630 | 78.3% | Ares
1593 | 76.6% | eDonkey
1571 | 75.5% | MSN messenger
1287 | 61.8% | RTP
1273 | 61.2% | Napster
1239 | 60.0% | RTSP media stream
805 | 38.7% | Skype
752 | 36.1% | POP3
Table 5-15 User penetration results from the traffic database. The measurements are from the
Swedish network No. 1 between 2007-09-01 00:00 and 2007-10-01 00:00. The total number of
households was 2081 during this time and the Internet access technology was FTTH.
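User penetration, as used in Tables 5-15 and 5-16, is the number of households in which an application was seen divided by the number of households active during the measurement. The sketch below computes such a ranking, assuming a mapping from each active household to the set of applications observed for it; the household identifiers and application sets are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def penetration(apps_by_household: Dict[str, Set[str]]) -> List[Tuple[str, int, float]]:
    """Rank applications by the share of active households in which they were seen."""
    total = len(apps_by_household)
    counts = Counter(app for apps in apps_by_household.values() for app in apps)
    return [(app, n, round(100 * n / total, 1)) for app, n in counts.most_common()]


# Hypothetical households identified by MAC address.
seen = {
    "mac-1": {"HTTP", "DNS", "BitTorrent"},
    "mac-2": {"HTTP", "DNS"},
    "mac-3": {"HTTP", "SSL"},
    "mac-4": {"HTTP", "DNS", "SSL"},
}
print(penetration(seen))

# Cross-check with Table 5-15: SSL was seen in 1975 of 2081 FTTH households.
print(round(100 * 1975 / 2081, 1))
```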
Number of active households | Percent | Application or protocol
104 | 100.0% | HTTP
104 | 100.0% | DNS
102 | 98.1% | SSL
99 | 95.1% | Windows update
94 | 90.4% | HTTP media stream
85 | 81.7% | NTP
83 | 79.8% | ICMP
73 | 70.2% | BitTorrent
69 | 66.3% | RTP
63 | 60.6% | MSN messenger
58 | 55.8% | RTSP media stream
54 | 51.9% | POP3
51 | 49.0% | Microsoft Online Crash Analysis
Table 5-16 User penetration results from the traffic database. The measurements are from the
Swedish network No. 1 between 2008-02-02 00:00 and 2008-02-20 00:00. The total population
was 104 and the Internet access technology was DSL.
5.5.3 Traffic volume distribution xviii
Analyzing the traffic per application and protocol is very important for understanding network traffic.
Capacity planning for new access networks and QoS configuration are both examples that can benefit
from volume distribution analysis.
There are several ways to display volume measurements, e.g. pie charts, tables and bar charts. We
have chosen bar charts since they convey a lot of information, in particular about inbound and
outbound traffic volumes.
Figure 5-21 and Figure 5-22 show results from measurements in the Swedish municipal network No.
1. In Figure 5-21, the names BT trans and BT enc refer to BitTorrent transfer and BitTorrent
encrypted transfer, respectively, and HTTP ms stands for HTTP media stream. In Figure 5-22, DC trans
means Direct Connect transfer; Direct Connect is a P2P file sharing application.
Looking at Figure 5-21 and Figure 5-22, BitTorrent is, unsurprisingly, clearly the most bandwidth-
dominant protocol. Many research articles in the area of traffic measurements report P2P file sharing
as the most bandwidth-intensive category. Compared with other countries, BitTorrent is the
near-standard way of file sharing in Sweden. For example, TRAMMS partners in Spain report that
eDonkey and Direct Connect are the most popular file sharing applications there, as shown in
TRAMMS D3.1. Figure 5-21 also shows that, compared with other traffic measurements, HTTP media
stream is becoming rather bandwidth-intensive. YouTube and similar video sites, which attract a
growing number of viewers, are the probable explanation.
Figure 5-21 The graph shows traffic divided into application and protocol categories based on
volume. Technology is DSL. Measurement from the Swedish network No. 1 between 2008-02-02
00:00 and 2008-02-20 00:00
Figure 5-22 The graph shows traffic divided into application and protocol categories based on
volume. Technology is FTTH. Measurement from the Swedish network No. 1 between 2007-09-01
00:00 and 2007-10-01 00:00.
6 APPLICATION CHARACTERISTICS
6.1 Web Video on Demand
In this subsection we investigate the traffic of two popular web-based video sharing systems, YouTube and Metacafe xix. Compared with D3.1, the analysis has been extended with a content popularity analysis, presented in Section 6.1.1.
YouTube, which is described in a number of publications xx,xxi,xxii, is a video sharing website where users can upload, view and share video clips. YouTube was created in February 2005 by three former PayPal employees. The service uses Adobe Flash technology to display a wide variety of user-generated video content, including movie clips, TV clips and music videos, as well as amateur content such as video blogs and short original videos. In October 2006, Google Inc. acquired the company for US$1.65 billion in Google stock.
Unregistered users can watch most videos on the site, while registered users are permitted to upload
an unlimited number of videos. Some videos are available only to users of age 18 or older.
Few statistics are publicly available regarding the number of videos on YouTube. However, in July 2006, the company revealed that more than 100 million videos were being watched every day, and 2.5 billion videos were watched in June 2006. In May 2006, 50,000 videos were being added per day, and this increased to 65,000 by July. In January 2008 alone, nearly 79 million users watched over 3 billion videos on YouTube.
YouTube's video playback technology is based on Macromedia's Flash Player 9 and uses the
Sorenson Spark H.263 video codec.
YouTube files contain an MP3 audio stream. By default, it is encoded in mono at a bit rate of 64 kbps, sampled at 22 kHz, giving an audio bandwidth of around 10 kHz. The default bit rate delivers acceptable, but not hi-fi, audio quality.
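The audio figures above can be checked with simple arithmetic. The sketch below assumes that "22 kHz" denotes the common 22.05 kHz sampling rate. The Nyquist limit (half the sampling rate, about 11 kHz) is an upper bound on the representable audio bandwidth, which is presumably why the practical bandwidth is quoted as around 10 kHz.

```python
def audio_bytes_per_minute(bit_rate_bps: int) -> float:
    """Bytes needed for one minute of a constant-bit-rate audio stream."""
    return bit_rate_bps * 60 / 8


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented at a given sampling rate."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    print(audio_bytes_per_minute(64_000))  # 480000.0 bytes, i.e. about 0.48 MB
    print(nyquist_limit_hz(22_050))        # 11025.0 Hz
```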
Figure 6-1: Comparison of YouTube and Metacafe web-based video sharing systems
(Spanish Network measurement (GGSN+CMTS) from 2008-02-19 to 2008-02-29)
Figure 6-2: Comparison of YouTube and Metacafe web-based video sharing systems
(Spanish Network measurement (GGSN+CMTS) from 2008-03-07 to 2008-03-30)
Figure 6-3: Comparison of YouTube and Metacafe web-based video sharing systems
(Spanish Network measurement (GGSN+CMTS) from 2008-02-19 to 2008-02-29)
Figure 6-4: Comparison of YouTube and Metacafe web-based video sharing systems
(Spanish Network measurement (GGSN+CMTS) from 2008-03-07 to 2008-03-30)
Figure 6-1 shows the total traffic generated by the YouTube and Metacafe websites in both the downlink and uplink directions. YouTube generates several orders of magnitude more traffic than Metacafe, which indicates that it is far more popular among Spanish users.
Figure 6-2 shows the result of a second, 24-day measurement. Although this measurement was longer, the traffic ratios are very similar. Interestingly, according to Figure 6-4, the daily video sharing traffic fluctuates quite unpredictably and shows no clear weekly periodicity.
In fact, YouTube is more popular not only than Metacafe but than any other website: it generated almost 600 GB of downlink traffic in an 11-day period, which is about 14.8% of the total downlink web traffic and 2.0% of the total downlink traffic.
The traffic seems to be somewhat lower on weekdays and higher on the weekend (23 and 24 February appear as a peak in the curve). The calculated downlink-to-uplink traffic ratio is 43.14 for YouTube and 47.31 for Metacafe.
6.1.1 YouTube content popularity analysis
The aim of this analysis is to investigate when users are viewing YouTube videos and what the most
popular contents are. This way we can draw conclusions about user activity, user behavior, traffic
intensity and content popularity. Naturally, we are not focusing on the content of the video itself, but
want to determine certain properties (e.g., intensity, popularity distribution) of the collection of videos.
To achieve this, we set up a special firewall rule in PacketLogic that matched all HTTP GET requests containing the query string “http://www.youtube.com/watch?v=”. The equals sign is followed by the 11-character YouTube content ID, which is the subject of the investigation. PacketLogic logged and dumped all IP packets containing this pattern. By processing the trace, it is possible to extract the content IDs and the times when the videos were viewed; the latter are given by the packet arrival times.
The output of the packet dump processing is a text file containing only the content IDs and times; this text file is later loaded into a database system for further analysis.
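The processing step itself is not described in detail here. A minimal sketch, assuming the dump has already been decoded into (arrival time, payload text) pairs, could extract the content IDs with a regular expression built from the pattern of the firewall rule. The ID alphabet (letters, digits, '-' and '_') and the tab-separated output layout are assumptions. Note also that in a plain HTTP request the URL may appear only as a path (/watch?v=...) with the host in a separate Host header, in which case the pattern would need loosening.

```python
import re
from datetime import datetime, timezone

# The pattern used in the PacketLogic rule, followed by an 11-character
# content ID. The ID alphabet (letters, digits, '-' and '_') is an assumption.
WATCH_URL = re.compile(r"http://www\.youtube\.com/watch\?v=([A-Za-z0-9_-]{11})")


def extract_views(packets):
    """Yield (arrival time, content ID) for every match in the dumped packets.

    `packets` is an iterable of (POSIX timestamp, decoded payload text) pairs;
    the arrival time of the packet is taken as the viewing time.
    """
    for arrival, payload in packets:
        for match in WATCH_URL.finditer(payload):
            yield datetime.fromtimestamp(arrival, tz=timezone.utc), match.group(1)


def to_text_lines(views):
    """Format views as tab-separated lines ready for loading into a database."""
    return [f"{when.isoformat()}\t{content_id}" for when, content_id in views]


if __name__ == "__main__":
    # Invented payloads with a made-up content ID.
    dump = [
        (1226939460.0, "GET http://www.youtube.com/watch?v=AbC-123_xyz HTTP/1.1"),
        (1226939520.0, "GET /index.html HTTP/1.1"),
    ]
    for line in to_text_lines(extract_views(dump)):
        print(line)
```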
Figure 6-5 shows the number of viewed videos per hour throughout the 16-day measurement, which is an estimate of the user activity (and of the traffic intensity as well). User activity seems more intense on weekdays and lower on the weekend (as Figure 6-6 confirms). In Figure 6-6 the final day of the measurement is cut so that the data contain samples of exactly two weeks; in this way no distortion is introduced in the sampling. We applied the same technique when we calculated the daily distribution of the access times (see Figure 6-7 below).
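The trimming to whole weeks and the hourly and weekly counts can be sketched as follows. This is an illustration under the assumption that the view times are available as Python datetime objects; the function names are hypothetical and not taken from the tool chain used in the measurement.

```python
from collections import Counter
from datetime import datetime, timedelta


def trim_to_whole_weeks(times: list[datetime]) -> list[datetime]:
    """Drop the tail of the measurement so that it covers whole weeks only.

    Keeping exactly N*7 days from the first view means every weekday (and
    every time of day) is sampled equally often. Returns an empty list if
    the measurement is shorter than one week.
    """
    if not times:
        return []
    start = min(times)
    whole_weeks = (max(times) - start).days // 7
    end = start + timedelta(days=7 * whole_weeks)
    return sorted(t for t in times if t < end)


def views_per_hour(times) -> Counter:
    """Number of views in each clock hour (the series plotted over time)."""
    return Counter(t.replace(minute=0, second=0, microsecond=0) for t in times)


def views_per_weekday(times) -> list[int]:
    """Number of views per weekday, Monday first."""
    counts = Counter(t.weekday() for t in times)
    return [counts.get(day, 0) for day in range(7)]
```

Using whole days from the first view (rather than calendar days) keeps the start and end at the same time of day, so no part of the day is counted more often than another.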
[Figure 6-5 plot: number of YouTube videos viewed per hour (0-70) versus time, from Mon, 17 Nov 2008 16:31 to Tue, 2 Dec 2008 09:56]
Figure 6-5: Number of viewed YouTube videos per hour
(Swedish network measurement (DSL) from 2008-11-17 to 2008-12-02)
Figure 6-6: Weekly distribution of viewed YouTube videos per day
(Swedish network measurement (DSL) from 2008-11-17 16:31 to 2008-12-01 16:34)
Figure 6-7 shows the daily distribution of the views; user activity is apparently higher in the afternoon and evening hours. The busy hours start at around 4 PM, which is the typical time when people return home from work, and the peak can be observed around 6-7 PM. The measured population is known to consist mostly of home users, so this behavior was expected.
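The daily distribution can be obtained by folding all view times onto a single 24-hour axis. The sketch below uses 15-minute bins, matching the axis of Figure 6-7; as stated above, it should be applied to views already trimmed to whole weeks. The helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime

BIN_MINUTES = 15  # bin width used on the time-of-day axis


def time_of_day_bin(t: datetime, bin_minutes: int = BIN_MINUTES) -> int:
    """Index of the time-of-day bin that t falls into (0 = starting at midnight)."""
    return (t.hour * 60 + t.minute) // bin_minutes


def daily_profile(times, bin_minutes: int = BIN_MINUTES) -> list[int]:
    """Views per time-of-day bin, summed over all days of the measurement."""
    if (24 * 60) % bin_minutes:
        raise ValueError("bin width must divide the day evenly")
    n_bins = 24 * 60 // bin_minutes
    counts = Counter(time_of_day_bin(t, bin_minutes) for t in times)
    return [counts.get(i, 0) for i in range(n_bins)]


def busiest_bin_start(profile: list[int], bin_minutes: int = BIN_MINUTES) -> str:
    """Start time (HH:MM) of the bin with the most views."""
    peak = max(range(len(profile)), key=profile.__getitem__)
    minutes = peak * bin_minutes
    return f"{minutes // 60:02d}:{minutes % 60:02d}"
```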
YouTube usage statistics and traffic intensity may therefore differ in other environments, such as business networks, where employees may also watch YouTube videos during working hours.
[Figure 6-7 plot: number of YouTube videos viewed per 15 minutes (0-100) versus time of the day (0-24 h)]
Figure 6-7: Daily distribution of viewed YouTube videos per hour
(Swedish network measurement (DSL) from 2008-11-17 to 2008-12-02)
Finally, Figure 6-8 shows the popularity distribution of the YouTube videos, ranked according to the number of times they were watched. The left subfigure shows the popularity distribution on a linear-linear scale, while the right subfigure shows the same on a linear-logarithmic scale. Both suggest an exponential-like decrease in popularity. Consequently, the popularity of the content is highly uneven: a limited number of videos are extremely popular, while others are watched rarely.
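The report does not state how the exponential-like shape was assessed. One simple check, sketched below, is to rank the videos by view count and fit a straight line to log(views) against rank by least squares: if the points lie close to that line, popularity decreases roughly exponentially with rank. This fit is our illustration, not the method used in the measurement.

```python
import math
from collections import Counter


def popularity_ranking(content_ids) -> list[int]:
    """View counts per video, sorted so that rank 1 (index 0) is the most viewed."""
    return sorted(Counter(content_ids).values(), reverse=True)


def fit_exponential(counts: list[float]) -> tuple[float, float]:
    """Least-squares fit of log(count) = log(a) - b * rank, with rank = 1, 2, ...

    Returns (a, b). If the points lie close to a straight line on a logarithmic
    count axis, popularity decreases roughly exponentially with rank.
    """
    if len(counts) < 2 or min(counts) <= 0:
        raise ValueError("need at least two positive counts")
    xs = range(1, len(counts) + 1)
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(counts)
    mean_y = sum(ys) / len(counts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

The fitted decay rate b summarizes how quickly popularity falls off; a larger b means views are concentrated on fewer videos.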
[Figure 6-8 plots: number of views (0-15) versus YouTube videos (ranked); left panel with a linear rank axis (0-2000), right panel with a logarithmic rank axis (10^0 to 10^4)]
Figure 6-8: Ranking of YouTube videos according to popularity
on linear-linear scale (left) and logarithmic-linear scale (right)
(Swedish network measurement (DSL) from 2008-11-17 to 2008-12-02)
6.2 Video streaming
PacketLogic defines precisely the applications belonging to the “Streaming Media” category; we
decided to use the same classification. According to PacketLogic the “Streaming Media” category
contains the following applications and subcategories (organized in a hierarchy):
• Audio
  o Last.fm client
  o social.fm
• Peer-to-Peer
  o MySee
  o P2P-Radio
  o PeerCast
  o RawFlow
  o SopCast
  o TVUPlayer
  o TvAnts tcp
  o TvAnts udp
• Video
  o Abacast
  o EBS lecture
  o HTTP RealPlayer stream
  o HTTP media stream
  o Joost
  o Live Delivery Network
  o LocationFree player
  o MMS
  o Miro
  o Octoshape
  o Octoshape discovery
  o PPLive
  o PPStream
  o RTCP
  o RTMP
  o RTMPT
  o RTP
  o RTSP
  o RTSP media stream
  o Radegast
  o STTV
  o Slingbox media stream
  o SpotLife
  o StreamerOne
• Toys
  o Chumby
  o Nabaztag
In Deliverable 3.1 we investigated the characteristics of video streaming traffic in different access
networks (CMTS and GGSN measurement points).
The general observation (as Figure 6-9 and Figure 6-10 suggest) was that there is a significant difference in data volume between the two access types. Figure 6-9 and Figure 6-10 show the total amount of video-streaming traffic in the downlink and uplink directions; the downlink traffic significantly exceeds the uplink traffic.
After analyzing several measurements made in the same CMTS network at different times, we found that the daily amounts of data do not differ significantly; they vary between 120 and 180 GB in the downlink direction.
The amount of video-streaming traffic generated in mobile networks (at the GGSN measurement
point) is a fraction of the traffic measured at the CMTS measurement point.
[Figure 6-9 plot: video-streaming traffic (GB, 0-180) per day in the CMTS measurement, inbound and outbound, from Mon 10 March to Sun 23 March]
Figure 6-9: Fluctuation of video-streaming video traffic
(Spanish Network measurement (CMTS) from 2008-03-10 to 2008-03-23)
[Figure 6-10 plot: video-streaming traffic (GB, 0-1.4) per day in the GGSN-Internet measurement, in and out, from Tue 19 February to Fri 29 February]
Figure 6-10: Fluctuation of video-streaming video traffic
(Spanish Network measurement (GGSN) from 2008-02-19 to 2008-02-29)
6.3 Web traffic analysis
In this subsection, the distribution of web traffic among different websites and domains is investigated. Large web services often distribute the traffic load among numerous web servers. These servers usually belong to the same domain, but since their request URLs differ, it is not straightforward to calculate the aggregated traffic. We calculated the aggregated traffic by grouping the request URLs based on the domain part (like “facebook”) and the top domain part (like “com”); the host part and other sub-domains are omitted. In this way we could determine the amount of traffic generated by each service.
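A minimal sketch of this grouping, assuming the request URLs and byte counts are available as (url, bytes) pairs, keeps only the last two labels of the host name. This naive rule is our simplification: it mis-groups multi-label suffixes such as co.uk and bare IP addresses, which would need a public suffix list and a separate IP check.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def service_key(url: str) -> str:
    """Reduce a request URL to 'domain.topdomain', dropping host and sub-domains.

    Naive rule: keep the last two labels of the host name. This mis-groups
    multi-label suffixes such as 'co.uk' and bare IP addresses; a public
    suffix list and an IP check would be needed to handle those.
    """
    # urlsplit only recognises the host after '//', so add it if missing
    if "//" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])


def traffic_by_service(requests) -> list[tuple[str, int]]:
    """Sum bytes per service for (url, bytes) pairs, largest first."""
    totals: dict[str, int] = defaultdict(int)
    for url, nbytes in requests:
        totals[service_key(url)] += nbytes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

For example, requests to different cache hosts under youtube.com all collapse into the single key youtube.com.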
Table 6-1 shows the top 30 websites generating the largest amount of inbound traffic in the 24-day measurement. YouTube is clearly the number one web service, generating around 1.3 TB of traffic during the 24 days. The results also show that web-based file sharing services (e.g., megaupload.com, rapidshare.com) generate significant traffic. Such services work as follows: the user uploads the desired file to the website, and the system sends e-mails to the people with whom the user wants to share the content.
Adult content sites, primarily video sharing sites, are apparently also very popular.
The Google search engine is in 6th place with almost 164 GB of data in 24 days, which is surprising, since it is a service providing mostly textual content. However, the considerable traffic volume can easily be explained by the extreme popularity of the search engine.
Social networking websites also seem very popular among users. Tuenti xxiii is a Madrid-based social networking website that has been referred to as the “Spanish Facebook”. Tuenti is targeted at the Spanish audience, and the site is currently accessible only by invitation.
MySpace appears to be the most popular international social network site, as suggested by the fact that myspacecdn.com (where MySpace stores photos) is at position 22. As expected, some local websites (in this case Spanish sites, since the measurement was made in Spain) are also in the top list. Besides the local social networking site, they include elcorreodigital.com and elmundo.es, which provide news and media content.
The official websites of the software giants Microsoft and Apple also generated large traffic volumes, mostly by offering software downloads to customers. Microsoft's dedicated software update site, which provides the updates for all Windows-based computers, generated the 3rd largest traffic volume in the network. Microsoft's search engine Live.com (which offers various other services as well) is at position 19 in the top list, and Microsoft's information portal msn.com is at position 29.
The traffic of the Panda Security software company's website (pandasoftware.com) most likely stems from antivirus update requests.
Rank  Site name              Traffic inbound (MB)
1     youtube.com            1299373.5
2     megaupload.com          820211.2
3     windowsupdate.com       222035.8
4     megarotic.com           210257.2
5     rapidshare.com          187761.1
6     google.com              163999.3
7     llnwd.net               142528.0
8     redtube.com             107147.6
9     microsoft.com           102370.9
10    playstation.net         100670.2
11    youporn.com              89200.9
12    dailymotion.com          85900.5
13    tuenti.com               69744.5
14    apple.com                66917.4
15    veoh.com                 66325.6
16    gigasize.com             60016.6
17    elcorreodigital.com      56595.0
18    edgesuite.net            53447.4
19    live.com                 48652.6
20    pandasoftware.com        47963.2
21    elmundo.es               45829.0
22    myspacecdn.com           45188.3
23    xvideos.com              41006.6
24    porkolt.com              40401.9
25    pornhub.com              37253.4
26    photobucket.com          37151.5
27    pajilleros.com           35581.3
28    imageshack.us            33271.2
29    msn.com                  31696.8
30    ytimg.com                31530.4
Table 6-1, Top 30 web domains (web services) ranked in the order of generated inbound traffic
(Spanish Network measurement from 2008-03-07 to 2008-03-30)
6.3.1 Top domain analysis
This section summarizes the results and conclusions of the domain analysis presented in Deliverable
3.1.
Figure 6-11, an example from the 1st Spanish network measurement, shows the top 5 domains based on the generated downlink traffic. (The uplink traffic is negligible for HTTP: users download about 20 times more web traffic than they upload.) The largest part of the HTTP traffic comes from the .com domain and other international domains (.net, .org). The Spanish domain (.es) is in the top 5 as well, since the measurement was carried out in Spain.
The share of unknown HTTP traffic (from IP addresses that could not be resolved to domain names) is relatively high. This traffic may have been generated not by websites but by other applications using the HTTP protocol.
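A sketch of the per-top-domain aggregation is shown below, assuming (host, bytes) pairs as input. It skips the reverse-DNS lookup implied by the text and simply treats any host given as a literal IP address as unknown; the label "unknown" is our choice.

```python
import ipaddress
from collections import Counter


def top_domain(host: str) -> str:
    """Top-level domain of a host name, or 'unknown' for bare IP addresses."""
    host = host.strip().rstrip(".").lower()
    try:
        ipaddress.ip_address(host)
        return "unknown"  # no domain name is available for this traffic
    except ValueError:
        pass
    return host.rsplit(".", 1)[-1] if "." in host else "unknown"


def traffic_by_top_domain(records) -> list[tuple[str, int]]:
    """Sum bytes per top-level domain for (host, bytes) pairs, largest first."""
    totals: Counter = Counter()
    for host, nbytes in records:
        totals[top_domain(host)] += nbytes
    return totals.most_common()
```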
[Figure 6-11 bar chart: total inbound traffic (GB, 0-3500) for the domains com, other, net, .es and org]
Figure 6-11: Top 5 domains ranked by generated downlink web traffic
(Spanish Network measurement from 2008-02-19 to 2008-02-29)
Figure 6-12 shows the subsequent domains ranked by the generated download traffic; it contains
mainly the domains of European countries.
Figure 6-12: Subsequent domains ranked by generated downlink web traffic
(Spanish Network measurement from 2008-02-19 to 2008-02-29)
Figure 6-13 shows the contribution of all domains to the total downlink web traffic. The most significant part of the traffic is related to the international domains and the local (Spanish) domain. The downlink and uplink profiles are not significantly different.
[Figure 6-13 pie chart: distribution of total inbound traffic among domains, Spanish Network measurement, 7th March 2008 - 30th March 2008; legend: com, net, .es, org, .tv, .us, .fm, .to, info, .de, .fr, .uk, .br, .ru, .cn, .eu, biz, .ar, .pt, edu, .ve, .nl, .it, .se, .cz, .be, .hu, .ie, .jp, .ws, gov, .cm, .ee, .ch, cat, .ro, .mx, .cl, .is, .co, others; the largest slice is 83.61%, followed by 7.30%, 4.54% and 2.96%; all other slices are below 1%]
Figure 6-13, Share of domains in the total generated downlink traffic
(Spanish Network measurement from 2008-03-07 to 2008-03-30)
6.4 P2P file sharing
There are several popular file sharing applications generating significant traffic volumes on the Internet (often involving illegal content). In this subsection, the aggregated file sharing traffic and the traffic of the most popular applications (e.g., eDonkey, BitTorrent and Gnutella) are investigated in detail in several types of access networks. This section summarizes the analysis results from Deliverable 3.1.
Gnutella xxiv is a file sharing network supported by several clients with varying capabilities. Gnutella is a fully distributed system based on P2P technology. The network is never completely stable, since peers are constantly joining and leaving the system. In the original version of Gnutella, all clients were regarded as equal, and search requests were performed by flooding the search message over the whole overlay network. These features made searching and downloading quite unreliable and inefficient and raised scalability problems. This observation inspired the development of distributed hash tables (which are much more scalable but support only exact-match rather than keyword search) and the introduction of ultra-peers and leaf nodes.
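To illustrate why flooding scales poorly, the sketch below simulates a TTL-limited flood over a small hypothetical overlay. It is a simplified model: real Gnutella descriptors carry a unique ID (used to drop duplicates, modelled here by the `seen` set) together with TTL and hop counters, and the TTL value and graph used here are arbitrary.

```python
from collections import deque


def flood_search(graph: dict[str, list[str]], origin: str, ttl: int) -> tuple[int, int]:
    """Simulate TTL-limited query flooding in an unstructured overlay.

    Each peer forwards a query it sees for the first time to all neighbours
    except the one it came from, with the TTL decreased by one; copies that
    arrive at a peer a second time are counted but not forwarded again.
    Returns (number of peers reached, number of query messages sent).
    """
    seen = {origin}
    messages = 0
    queue = deque([(origin, None, ttl)])  # (peer, sender, remaining TTL)
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue
        for neighbour in graph[peer]:
            if neighbour == sender:
                continue
            messages += 1  # every forwarded copy costs bandwidth, duplicates included
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, peer, remaining - 1))
    return len(seen) - 1, messages


if __name__ == "__main__":
    # Hypothetical four-peer overlay containing a cycle (A-B-C).
    overlay = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
    print(flood_search(overlay, "A", ttl=2))  # (3, 5): 3 peers reached with 5 messages
```

Even in this tiny overlay, the cycle causes duplicate messages; in a large, densely connected overlay the message count grows far faster than the number of useful peers reached.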
The eDonkey network is a decentralized, server-based, peer-to-peer file sharing network used primarily to exchange audio files, video files and computer software. Like most file sharing networks, it is decentralized: files are not stored on a central server but are exchanged directly between users based on the peer-to-peer principle. The most recent eDonkey clients support both server-based and DHT-based searching.
Direct Connect (DC) is a peer-to-peer file sharing protocol. DC clients connect to a central hub and can download files directly from one user to another. Since DC hubs are central servers to which the clients connect, the networks are not as decentralized as Gnutella or FastTrack. Hubs provide
information about the clients, as well as file searching and chat capabilities. File transfers are done
directly between clients, in true peer-to-peer fashion.
BitTorrent is a P2P file sharing communications protocol. It is a method of distributing large amounts of data widely without the original distributor incurring the entire cost of hardware, hosting and bandwidth resources. Instead, when data is distributed using the BitTorrent protocol, each recipient supplies pieces of the data to newer recipients, reducing the cost and burden on any given individual source, providing redundancy against system problems and reducing dependence on the original distributor. Usage of the protocol accounts for significant traffic on the Internet. There are numerous compatible BitTorrent clients, written in a variety of programming languages and running on a variety of computing platforms.
FastTrack is a P2P protocol used by the Kazaa file sharing program and its variants Grokster and iMesh. In 2003, FastTrack was the most popular file sharing network, used mainly for exchanging MP3 music files. Popular features of FastTrack are the ability to resume interrupted downloads and to download segments of one file simultaneously from multiple peers. Also, the keyword search is optimal in the sense that, if the search is not stopped and does not time out, FastTrack finds a source for the search if one exists. The network had approximately 2.4 million concurrent users at its peak in 2003; afterwards the number of users decreased significantly.
We analyzed several measurements originating from different access networks and technologies (e.g., ADSL, FTTH and mobile networks). The figures include the daily and weekly fluctuation of the total file sharing traffic and of specific applications. The figures included in this section are a small selection of those in Deliverable 3.1.
[Figure 6-14 plot: P2P file transfer traffic (GB, 0-2000) per day in the CMTS measurement, inbound and outbound, from Mon 10 March to Sun 23 March]
Figure 6-14: Fluctuation of P2P file sharing traffic
(Spanish Network measurement (CMTS) from 2008-03-10 to 2008-03-23)
Figure 6-14 and Figure 6-15 show the fluctuation of file sharing traffic in CMTS and GGSN networks.
Naturally, significantly less traffic is generated in the mobile network for the same reasons mentioned
earlier: high cost of data transfer and lack of bandwidth.
[Figure 6-15 plot: P2P file transfer traffic (GB, 0-4.5) per day in the GGSN measurement, in and out, from Tue 19 February to Fri 29 February]
Figure 6-15: Fluctuation of P2P file sharing traffic
(Spanish Network measurement (GGSN) from 2008-02-19 to 2008-02-29)
[Figure 6-16 plot: file sharing daily traffic in FTTH, peak hour measurement; traffic (GB, 0-45) per hour of the day (0-23), in, out and total]
Figure 6-16: Average daily fluctuation of P2P file sharing traffic
(Swedish municipal network No1 measurement (FTTH) from 2007-08-20 to 2007-10-21)
Figure 6-16 shows a huge difference between upload and download traffic in the FTTH network: the upload/download ratio is about 3. In file sharing, the uploaded volume is typically larger than the downloaded volume, especially for peers with high uplink bandwidth. In a typical scenario, these peers serve other peers, which generally have a relatively fast downlink but a slow uplink.
A clear weekly profile seems to appear in Figure 6-17: the traffic load is higher on the weekend and lower on weekdays. The uplink traffic seems to show an increasing trend throughout the measurement, although we cannot give a sound explanation for this phenomenon.
[Figure 6-17 plot: file sharing traffic volume per day (GB, 0-8000), inbound and outbound, with x-axis ticks at successive Mondays]
Figure 6-17: Fluctuation of P2P file sharing traffic
(Swedish municipal network No1 measurement (FTTH) from 2007-08-20 to 2007-10-21)
We can also draw some general (and partly obvious) conclusions based on the measurements:
• The higher the access speed, the more traffic is generated. We have the strong impression that if more bandwidth is offered to the user, it will be used mostly for file sharing, or it will not be utilized at all.
• The higher the access speed, the larger the difference between upload and download traffic in favor of upload. This behavior can be explained by the general working of P2P file sharing networks: peers with high uplink bandwidth are rare, so such exceptional peers are well utilized, because many other peers download data from them.
We also analyzed the most voluminous P2P file sharing applications in terms of number of users, transferred data in the uplink and downlink directions, daily and weekly profiles, and ranking of users by traffic volume. The applications investigated, which turned out to generate the largest traffic volumes, were eDonkey, BitTorrent and Gnutella.
According to the results, the most popular file sharing application in the measurement was eDonkey, generating 32.48 TB of total traffic, i.e. 2.95 TB daily (1.39 TB downlink, 1.56 TB uplink). BitTorrent finished in second place with 11.03 TB of total traffic, i.e. 1.00 TB daily (0.52 TB downlink, 0.48 TB uplink). Contrary to eDonkey, the upload/download ratio of BitTorrent was below 1.0.
Traffic volumes and shares of the total traffic were similar in the measurements, since only a short time elapsed between them.
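The daily figures follow from the 11-day totals in Table 6-2. The sketch below reproduces them, assuming decimal units (1 TB = 1000 GB); it also gives the upload/download ratio of each application directly.

```python
# Totals (GB) from Table 6-2: 11-day CMTS measurement, 2008-02-19 to 2008-02-29.
TABLE_6_2_GB = {
    "eDonkey": {"down": 15336, "up": 17144},
    "BitTorrent": {"down": 5755, "up": 5277},
}
DAYS = 11


def daily_summary(down_gb: float, up_gb: float, days: int) -> dict[str, float]:
    """Daily averages in TB (1 TB = 1000 GB) and the upload/download ratio."""
    return {
        "total_tb_per_day": (down_gb + up_gb) / days / 1000,
        "down_tb_per_day": down_gb / days / 1000,
        "up_tb_per_day": up_gb / days / 1000,
        "up_down_ratio": up_gb / down_gb,
    }


if __name__ == "__main__":
    for app, gb in TABLE_6_2_GB.items():
        s = daily_summary(gb["down"], gb["up"], DAYS)
        print(f"{app:10s} {s['total_tb_per_day']:.2f} TB/day, "
              f"up/down ratio {s['up_down_ratio']:.2f}")
```

With these totals, the upload/download ratio is about 1.12 for eDonkey and 0.92 for BitTorrent.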
The 3rd most voluminous application was Gnutella, but its traffic was negligible compared to the
others.
When comparing the number of active users (e.g., Figure 6-18), eDonkey is seen to have more than twice as many users as BitTorrent, while the number of Gnutella users is rather small. In general, the number of users seems to be steady during the first measurement interval, and we saw similar numbers in all measurements.
No clear weekly profile can be recognized in the figures. The most important facts are summarized in
Table 6-2.
[Figure 6-18 plot: number of P2P file sharing users (0-1800) per day, eDonkey and BitTorrent, from Tue 19 to Fri 29 February]
Figure 6-18: Fluctuation of the number of BitTorrent and eDonkey users
(Spanish Network measurement (CMTS) from 2008-02-19 to 2008-02-29)
We ranked BitTorrent and eDonkey users according to the generated traffic. The ranking of BitTorrent users (Figure 6-19) showed a clear exponentially decreasing tail, while the ranking of eDonkey users did not follow a clear trend. It was also interesting that the penetration of both applications was surprisingly high: about 1800 users (31%) used BitTorrent, and almost 3300 users (56%) used eDonkey at least once during the measurement.
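A sketch of the ranking and penetration calculation is given below, assuming per-user traffic totals keyed by a user identifier. The subscriber count is not stated in this section, so it is a parameter here, and the `min_volume` threshold for counting a user is a hypothetical choice. Plotting the ranked list against a logarithmic traffic axis yields a plot of the kind shown in Figure 6-19.

```python
def rank_users(volume_by_user: dict[str, float]) -> list[float]:
    """Per-user traffic volumes in decreasing order (index 0 = heaviest user)."""
    return sorted(volume_by_user.values(), reverse=True)


def penetration(volume_by_user: dict[str, float], subscribers: int,
                min_volume: float = 0.0) -> float:
    """Share of subscribers who used the application at least once.

    A user counts as a user of the application if their total volume over
    the measurement exceeds min_volume.
    """
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    active = sum(1 for volume in volume_by_user.values() if volume > min_volume)
    return active / subscribers
```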
[Figure 6-19 plot: ranking of BitTorrent users; traffic (GB, logarithmic scale, 10^-4 to 10^4) versus rank of users (0-2000)]
Figure 6-19: Ranking of BitTorrent users
(Spanish Network measurement (CMTS) from 2008-02-19 to 2008-02-29)
Table 6-2, Table 6-3 and Table 6-4 show the basic traffic profiles of the file sharing applications in the two Spanish network measurements. BitTorrent and eDonkey are clearly the most popular applications and generate significant traffic volumes. The daily traffic of an average BitTorrent or eDonkey user is also considerable. Direct Connect and the declining Kazaa generate only minor traffic in the Spanish measurements.
Gnutella seems to have few users in Spain. However, the average generated daily traffic per user is
not negligible. This may be due to the fact that Gnutella is popular elsewhere, outside Spain.
Application     Traffic downlink (GB)  Traffic uplink (GB)  Total traffic (GB)  ↓Total number of users
eDonkey         15336                  17144                32481               3246
BitTorrent      5755                   5277                 11032               1759
Kazaa           1                      1                    1                   119
Gnutella        36                     69                   106                 70
Direct Connect  0                      0                    0                   2
Table 6-2, Traffic comparison of file sharing applications
(Spanish Network measurement (CMTS) from 2008-02-19 to 2008-02-29)
Application     Traffic downlink (GB)  Traffic uplink (GB)  Total traffic (GB)  ↓Total number of users
eDonkey         18752                  19487                38240               3289
BitTorrent      6262                   5783                 12045               1814
Kazaa           2                      1                    2                   131
Gnutella        43                     74                   117                 95
Direct Connect  0                      0                    0                   2
Table 6-3, Traffic comparison of file sharing applications
(Spanish Network measurement (CMTS) from 2008-03-10 to 2008-03-23)
                Spanish Network measurement       Spanish Network measurement
                2008-02-19 to 2008-02-29          2008-03-10 to 2008-03-23
Application     Users/day  Traffic/user/day       Users/day  ↓Traffic/user/day
                           (downlink, MB)                    (downlink, MB)
eDonkey         1559       893                    1404       951
BitTorrent      643        813                    551        809
Gnutella        18         185                    17         177
Kazaa           15         4                      15         7
Direct Connect  0          0                      0          53
Table 6-4, Comparison of file sharing applications in terms of average number of daily users and
daily traffic
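The traffic per user per day in Table 6-4 can be approximately reproduced from the downlink totals in Table 6-2 and the average number of daily users. The sketch below assumes decimal units (1 GB = 1000 MB) and an 11-day period. It gives, for example, about 814 MB for BitTorrent against 813 MB in the table. The small remaining differences presumably come from rounding of the table totals or from the exact measurement duration.

```python
# Downlink totals (GB) from Table 6-2 and average daily users from Table 6-4,
# for the 11-day measurement from 2008-02-19 to 2008-02-29.
FIRST_PERIOD = {
    "eDonkey": (15336, 1559),
    "BitTorrent": (5755, 643),
    "Gnutella": (36, 18),
}
DAYS = 11


def traffic_per_user_per_day_mb(down_gb: float, days: int, users_per_day: float) -> float:
    """Average daily downlink traffic of one active user in MB (1 GB = 1000 MB)."""
    return down_gb / days / users_per_day * 1000


if __name__ == "__main__":
    for app, (down_gb, users) in FIRST_PERIOD.items():
        print(f"{app:10s} {traffic_per_user_per_day_mb(down_gb, DAYS, users):6.0f} MB")
```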
6.5 P2P telephony and VoIP
Skype is currently the most popular P2P VoIP network; the number of registered users exceeded 100 million in 2006. Users can initiate and receive voice and video calls to and from other Skype users, or even PSTN users via SkypeOut/SkypeIn. Moreover, instant messaging (chat) and file transfer are also supported within the Skype infrastructure.
The overlay network contains a huge number of peers (or ordinary nodes), some of which are promoted to super nodes. Super nodes are the switching elements of the overlay network and are (among other things) responsible for maintaining the Global Index distributed directory, which allows users to find each other. There are also some dedicated components in the network (e.g., login servers, update servers and buddy-list servers); these central entities are operated by Skype. All communication between Skype network entities is strongly encrypted. More detailed information about Skype network entities, operation and identification techniques can be found in papers xxv,xxvi.
6.5.1 Skype traffic
Several types of Skype traffic are recognized by the PacketLogic traffic analyzer (Figure 6-20). Among them, the P2P component is the dominant part. TCP is used mainly for file transfer, because Skype tries to avoid using TCP for voice and video transfer.
[Figure 6-20 bar chart: Skype traffic components, inbound traffic (GB, 0-30) for discovery, login, version check, Hub2Hub, P2P, SSL, TCP and UDP]
Figure 6-20: Components of Skype (downlink) traffic
(Spanish Network measurement from 2008-02-19 to 2008-02-29)
Comparing Figure 6-21 and Figure 6-22, it can be observed that the amount of Skype traffic is significantly lower in the mobile network (GGSN measurement point) than in the fixed cable TV network (CMTS measurement point). This observation is interesting, since Skype is well suited to, and even cost-effective in, mobile networks for two main reasons. First, the bandwidth provided by high-speed mobile networks (3G and HSDPA) is sufficient for Skype. Second, the cost of Skype usage depends on the transmitted amount of data, which is charged on a per-megabyte basis or even at a flat rate (depending on the tariff package); even when charged per megabyte, it can still be cheaper than traditional calls, which are charged on a rather expensive per-minute basis.
According to Figure 6-21, however, the amount of Skype traffic in the mobile network is still negligible. Weekly periodicity can be seen neither in Figure 6-21 nor in Figure 6-22. The upload/download ratio varies around 1.0, which is in line with theory, since a single Skype call usually generates symmetric traffic.
[Figure 6-21 plot: total Skype traffic in the GGSN network (GB, 0-0.18) per day, inbound and outbound, from Tue 1 January to Thu 31 January]
Figure 6-21: Fluctuation of total Skype traffic
(Spanish GGSN Network measurement from 2008-01-01 to 2008-01-31)
Skype traffic appears to be more significant in fixed networks (Figure 6-22). In the first, 11-day measurement, it generates a daily average traffic of 4.02 GB in the downlink and 3.55 GB in the uplink direction. Assuming that the total traffic consists of voice calls only, this amount of data would correspond to about 200 hours of speech with the old Skype voice codec, or about 100 hours with the latest codec. The equivalent average Minutes of Usage per user (MOU) would be 1.76 and 0.88, respectively.
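The conversion from traffic volume to speech time follows a simple formula, sketched below. The report does not state the codec bit rates, which direction(s) of traffic were counted, or the user base behind the MOU figures, so all of these are parameters here and any concrete values plugged in are assumptions. Because the hours are inversely proportional to the codec rate, the two estimates above imply that, under the same assumptions, the newer codec uses roughly twice the bit rate of the old one.

```python
def speech_hours(traffic_bytes: float, codec_rate_bps: float) -> float:
    """Hours of speech that a traffic volume corresponds to at a given codec rate."""
    if codec_rate_bps <= 0:
        raise ValueError("codec rate must be positive")
    return traffic_bytes * 8 / codec_rate_bps / 3600


def minutes_of_usage_per_user(hours: float, users: int) -> float:
    """Average Minutes of Usage (MOU) per user for a given number of users."""
    return hours * 60 / users
```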
[Figure 6-22 plot: total Skype traffic (GB, 0-5.5) per day, inbound and outbound, from Tue 19 to Fri 29 February]
Figure 6-22: Fluctuation of Skype traffic
(Spanish CMTS Network measurement from 2008-02-19 to 2008-02-29)
The number of Skype users (Figure 6-23) was quite stable during the whole measurement period, varying around 290 per day, which corresponds to a daily penetration of 5%. According to the measurement, about 2200 users generated Skype traffic during the 11-day measurement, which means a total penetration of 38%. However, for many users PacketLogic detected only minor daily traffic, which is rounded to zero but still included in the log, making the penetration appear higher (by contrast, users with no activity at all are not included in the traffic log).
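Penetration here is the share of the subscriber base that used the application. The subscriber count is not stated in this section, so the hypothetical helpers below also back-derive the base implied by a user count and a reported percentage; both Skype figures (290 users at 5%, 2200 users at 38%) point to a base of roughly 5,800 subscribers.

```python
def penetration_pct(users: int, subscriber_base: int) -> float:
    """Percentage of the subscriber base that generated traffic for the application."""
    return 100.0 * users / subscriber_base


def implied_base(users: int, reported_pct: float) -> float:
    """Subscriber base implied by a user count and its reported penetration."""
    return users / (reported_pct / 100.0)


print(round(implied_base(290, 5)))        # 5800
print(round(implied_base(2200, 38)))      # 5789
print(penetration_pct(290, 5800))         # 5.0
```

As the text notes, users whose rounded daily traffic is zero still count towards the user total, so a penetration computed this way tends to overstate meaningful usage.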
[Figure: Skype user activity; number of users versus time (days)]
Figure 6-23: Fluctuation of Skype users
(Spanish Network measurement from 2008-02-19 to 2008-02-29)
Figure 6-24 shows the ranking of Skype users according to their generated traffic. Since the traffic axis is logarithmic and the rank axis is linear, the nearly straight curve over the whole range suggests a clear exponential decay of the generated traffic volume with rank.
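One way to check the "straight line on a semi-log rank plot" reading is an ordinary least-squares fit of ln(traffic) against rank: a good fit (R² close to 1) with a negative slope means that volume falls off exponentially with rank. This is a minimal sketch, not the analysis procedure used for Figure 6-24; the function name is hypothetical and the demonstration data are synthetic.

```python
import math


def fit_semilog_rank(volumes):
    """Least-squares fit of ln(volume) = a + b * rank, ranks 1..n in descending order.

    Returns (a, b, r_squared). Needs at least two strictly positive volumes,
    so users whose traffic rounds to zero have to be dropped first.
    """
    ordered = sorted(volumes, reverse=True)
    xs = list(range(1, len(ordered) + 1))
    ys = [math.log(v) for v in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: decay rate per rank if negative
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic example: volumes decaying exactly exponentially with rank.
synthetic = [100.0 * math.exp(-0.01 * r) for r in range(1, 501)]
a, b, r2 = fit_semilog_rank(synthetic)
print(round(b, 4), round(r2, 4))   # -0.01 1.0
```

A power-law ranking would instead appear straight on a log-log plot, so comparing the R² of the two fits is a quick way to decide which description suits a given rank curve better.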
[Figure: Rank of users; traffic (GB, logarithmic scale) versus rank of users]
Figure 6-24: Rank of Skype users according to the generated (downlink) traffic
(Spanish Network measurement from 2008-02-19 to 2008-02-29)
The daily traffic of an average Skype user and the average number of Skype users per day are listed in Table 6-5, together with the same statistics for other multimedia applications.
Figure 6-25 shows the traffic pattern of active and inactive Skype users in the Swedish municipal network No. 1. The measurements were made with the PacketLogic appliance, which differentiates between users whose average usage over 5-minute intervals is below 1 kbps and users who use more than 1 kbps; this limit was therefore chosen to separate active from inactive users. In other words, a user who has generated more than 1 kbps on average during a 5-minute interval is classified as active. Two curves are shown in Figure 6-25: the upper line shows the number of users that have been seen running Skype in each measured 5-minute period but not using more than 1 kbps of Skype bandwidth, while the lower line shows the number of Skype users that generated more than 1 kbps of Skype traffic during the measured 5 minutes. The measurement was conducted over 10 days. In the peak hour, over 130 users were logged in to their Skype account and 10 were active, out of the 3687 IP addresses investigated. This means that about 8% of the logged-in Skype users use more than 1 kbps of Skype traffic on a 5-minute average. [xviii]
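The active/inactive rule described above can be written down directly. The sketch below only mirrors the rule as stated (more than 1 kbps averaged over a 5-minute interval, taking 1 kbps as 1000 bit/s); it is not PacketLogic's internal implementation, and the function names and the input format (bytes per user per interval) are hypothetical. A 5-minute average of exactly 1 kbps corresponds to 37,500 bytes, which the strict "more than" test classifies as inactive.

```python
INTERVAL_S = 300        # 5-minute averaging interval
THRESHOLD_BPS = 1000    # 1 kbps, assuming 1 kbps = 1000 bit/s


def is_active(bytes_in_interval: int, interval_s: int = INTERVAL_S,
              threshold_bps: float = THRESHOLD_BPS) -> bool:
    """True if the user's average rate over the interval exceeds the threshold."""
    return bytes_in_interval * 8 / interval_s > threshold_bps


def active_share_pct(bytes_per_user: dict) -> float:
    """Percentage of users seen in the interval that are classified as active."""
    if not bytes_per_user:
        return 0.0
    active = sum(1 for b in bytes_per_user.values() if is_active(b))
    return 100.0 * active / len(bytes_per_user)


# Illustrative peak-hour interval: 130 logged-in users, 10 above the threshold.
interval = {f"user{i}": (60_000 if i < 10 else 2_000) for i in range(130)}
print(round(active_share_pct(interval), 1))   # 7.7
```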
Figure 6-25: Average daily traffic pattern showing the active and total number of users seen using Skype during a 10-day measurement in the FTTH part of the Swedish network No. 1.
6.5.2 MSN Messenger (Windows Live Messenger) traffic
Windows Live Messenger (formerly called MSN Messenger) is Microsoft's instant messaging solution. Unlike Skype, MSN is a centralized system; it does not use P2P technology. In addition to instant messaging, it offers other features such as voice calls, video conferencing and file transfer. Several third-party clients have also been released for other platforms, although they may not support all features.
According to Figure 6-26, the amount of traffic produced by MSN is of the same order of magnitude as the Skype traffic. MSN generates an average daily traffic of 3.51 GB downlink and 3.24 GB uplink.
[Figure: Windows Live Messenger total traffic; traffic (GB) versus time (days)]
Figure 6-26: Fluctuation of total MSN Messenger (Windows Live Messenger) traffic
(Spanish Network measurement from 2008-02-19 to 2008-02-29)
Comparing Figure 6-23 and Figure 6-27 shows that MSN has more than twice as many users as Skype: 732 users (12.6% of the total number of users) on an average day, with small variance. About 3800 users generated MSN traffic during the 11-day measurement, which means a total penetration of 65.6%. This is unexpectedly high, even considering that MSN is probably the most popular online application.
The daily number of MSN users was found to be similar in all measurements.
[Figure: Number of Windows Live Messenger users versus time (days)]
Figure 6-27: Fluctuation of the number of Windows Live Messenger users
(Spanish Network measurement from 2008-02-19 to 2008-02-29)
The ranking of Windows Live Messenger users (Figure 6-28) again shows an exponential decrease.
[Figure: Ranking of Windows Live Messenger users; traffic (GB, logarithmic scale) versus rank of users]
Figure 6-28: Ranking of Windows Live Messenger (MSN Messenger) users according to generated total traffic
(Spanish Network measurement from 2008-01-01 to 2008-01-31)
The daily traffic of an average MSN user and the average daily number of MSN users are listed in Table 6-5, together with the same statistics for other multimedia applications.
Application        Users/day   Downlink MB/user/day   Users/day   Downlink MB/user/day
                   (2008-02-19 to 2008-02-29)         (2008-03-10 to 2008-03-23)
Skype              290.10      13.8                   253.50      14.3
Yahoo Messenger    9.7         6.5                    8.8         6.3
MSN (Win. Live)    731.91      4.8                    567.71      5.2
Table 6-5: Comparison of multimedia applications in terms of average number of daily users and daily traffic (Spanish Network measurements)
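Table 6-5 can be cross-checked against the daily volumes quoted earlier: users per day multiplied by downlink traffic per user per day should give the average daily downlink volume. This short sketch does that for the first measurement period, assuming 1 GB = 1000 MB; the dictionary and function names are hypothetical. The results (about 4.00 GB for Skype and 3.51 GB for MSN) are consistent with the 4.02 GB and 3.51 GB daily downlink figures given in the text.

```python
# Users/day and downlink MB/user/day from Table 6-5 (2008-02-19 to 2008-02-29).
TABLE_6_5_FEB = {
    "Skype": (290.10, 13.8),
    "Yahoo Messenger": (9.7, 6.5),
    "MSN (Win. Live)": (731.91, 4.8),
}


def daily_downlink_gb(users_per_day: float, mb_per_user_day: float) -> float:
    """Average aggregate downlink volume per day, assuming 1 GB = 1000 MB."""
    return users_per_day * mb_per_user_day / 1000


for app, (users, mb) in TABLE_6_5_FEB.items():
    print(f"{app}: {daily_downlink_gb(users, mb):.2f} GB/day")
```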
Yahoo Messenger is also included in the table, although its traffic was negligible both in terms of volume and number of users.
Defining an active MSN Messenger user as one whose MSN Messenger traffic alone exceeds 1 kbps, we constructed an average daily traffic pattern graph. The measurement in Figure 6-29 is from the Swedish municipal network No. 1; the data were averaged over each 5-minute period of the day. Two things are notable in the figure: few MSN Messenger users use more than 1 kbps of transferred data, and the average number of online users is quite high during the early hours of the day. This suggests that many computers were left on during the night, probably for file sharing. During the peak hour at 20:00, almost 500 users are logged in to their MSN Messenger account. However, no more than 10 of them are active, i.e., only 2% of MSN Messenger users use more than 1 kbps of MSN Messenger traffic on a 5-minute average. The measurement period was between 2007-09-01 00:00 and 2007-09-11 00:00. [xviii]
Figure 6-29: Average daily traffic pattern showing the number of users consuming more than and less than 1 kbps with the MSN application in the FTTH network (Swedish municipal network No. 1).
7 CONCLUSIONS/DISCUSSION
7.1 Description of Aggregate Traffic
According to the Spanish network measurement, heavy users seem to be active throughout the whole day. They dominate most during the night hours, when light users leave the network. The average traffic rate per active user remains stable during the daytime but increases significantly during the night hours.
By comparing several daily profiles of different countries and technologies, it can be concluded that
the shape of the daily traffic patterns depends on the subscriber type of the network (residential,
enterprise, academic), and that there is a common daily traffic pattern for the networks that have
mainly residential users.
Previous experiments suggest that the amount of downlink and uplink traffic depends on the access
technology (CMTS, DSL or FTTH).
In the Swedish network, all households generate a large amount of file sharing traffic. The FTTH traffic is very asymmetric, with more uplink than downlink traffic. The DSL households show a more balanced traffic pattern, and in the evening they have a more classical Internet traffic pattern, with mostly downlink traffic. Also, the FTTH households seem to use their broadband access mainly for file sharing applications such as BitTorrent, whereas the DSL households have less file sharing and thus a larger share of HTTP traffic.
The CMTS traffic in the commercial Spanish network is also downlink-dominated, though less markedly than the DSL traffic in the Swedish network. The network with the most pronounced downlink dominance is the wireless access network in the commercial Spanish network. In terms of access capacity, the faster the access links, the larger the uplink share of the total traffic.
As far as the Spanish network is concerned, the traffic volume studies have shown that the daily profile is quite different for the fixed and mobile networks. The fixed network shows one peak (in the evening) and one valley (in the morning), whereas the mobile network shows several peaks and valleys throughout the day. Comparing the daily profile of the Spanish fixed network with the other fixed access networks under study, the CMTS access network is quite similar to the DSL network in Sweden. In both cases downlink traffic accounts for most of the total traffic; however, during the early hours of the day uplink traffic exceeds downlink traffic.
7.2 Application usage
Regarding application usage, Peer-to-Peer (P2P) applications are responsible for most of the traffic volume in the Spanish fixed network. eDonkey is the main application in use there, whereas in the Swedish networks BitTorrent is clearly dominant.
The user activity statistics for the Spanish network show a significant difference between HTTP (mostly web) traffic and file sharing traffic. File sharing traffic is steady over time, although its penetration is lower than that of web browsing. Web traffic, on the other hand, varies more over time and has a higher total penetration. For file sharing traffic, the difference between the daily and the total penetration is the smallest, suggesting that file sharing applications are “always on”.
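The “always on” argument compares daily with total (whole-period) penetration. The sketch below computes that gap, plus a daily-to-total ratio as an alternative normalization that is not used in the text; the function names are hypothetical. The file sharing penetration values are not given in this section, so the usage lines use the Skype and MSN figures reported earlier (5% versus 38%, and 12.6% versus 65.6%).

```python
def penetration_gap(daily_pct: float, total_pct: float) -> float:
    """Difference between total and daily penetration, in percentage points.

    A small gap means most users of the application use it every day,
    which is what the text reads as "always on" behaviour.
    """
    return total_pct - daily_pct


def daily_to_total_ratio(daily_pct: float, total_pct: float) -> float:
    """Share of the application's users who are active on an average day."""
    if total_pct <= 0:
        raise ValueError("total penetration must be positive")
    return daily_pct / total_pct


print(penetration_gap(5.0, 38.0))                      # 33.0
print(round(daily_to_total_ratio(12.6, 65.6), 2))      # 0.19
```

The ratio is convenient when comparing applications whose overall penetration differs widely, since an absolute gap in percentage points depends on the scale of the penetration itself.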
The comparative study of application usage in different networks and technologies reveals several interesting findings. In the uplink direction, P2P file sharing is responsible for more than 97% of the considered traffic in all fixed networks (FTTH, DSL and CMTS), regardless of the technology. In the downlink direction, P2P file sharing in the fixed networks generates a large share of the traffic, from 66% to 88% depending on the technology.
The application penetration analysis in the Swedish network shows that the HTTP protocol (used by the World Wide Web) is at the top of the list. BitTorrent is second, as the most popular P2P file sharing protocol in Sweden. The results suggest that the IMAP mail access protocol is more popular than the older POP3. The penetration of Windows Update (95%) gives a good indication of how widely Microsoft Windows is used in this rather small population.
We investigated the distribution of web traffic among different websites and domains in the Spanish network. The results show that YouTube and other web-based file sharing services dominate web traffic in terms of volume. Adult content sites, the Google search engine, and social networking websites are also very popular.
According to the domain statistics of web traffic in Spain, international domains (.com, .org, .net) carry the largest portion of web traffic. However, the local domain (.es) also appeared in the top 5.
YouTube alone produced about 2% of the total traffic and 15% of web traffic. The YouTube content popularity analysis also exposed several interesting findings. We regarded the “number of viewed videos per hour” as a good estimate of both user activity and traffic intensity. User activity seems more intense on weekdays and lower at the weekend, and it is higher in the afternoon and evening hours. The rank curve indicates that content popularity is highly uneven: a limited number of videos are extremely popular, while others are watched rarely.
VoIP and instant messaging applications also have significant penetration in the population (e.g., Skype has a daily penetration of 5%, and MSN 13%). However, they produce much less traffic than file sharing and video distribution applications. In the Spanish network we observed much less Skype traffic in the mobile network than in the fixed cable TV network.
The Skype traffic analysis suggests that around 8% of logged-in Skype users generate traffic higher than 1 kbps, which may indicate ongoing voice calls.
MSN Messenger was found to be the most popular instant messaging application in the Spanish and Swedish networks. However, only 2% of MSN Messenger users generated more than 1 kbps of MSN Messenger traffic, which suggests that MSN Messenger is used mainly for presence and chatting, which do not require high bandwidth.
7.3 Clustering of users
Based on the clustering of subscribers, a few general conclusions can be drawn. First, there is a small group of subscribers who generate a huge amount of (mainly P2P) traffic, while the other subscribers have much more moderate traffic demands and web browsing makes up a larger share of their traffic mix, although they use P2P applications as well.
Another general conclusion is that the traffic of heavy users seems to be constant over the repeated measurements, while moderate subscribers seem to gradually increase their activity.
Some partly expected conclusions can be drawn from the experiment in which a daily traffic limit was applied. The clustering process classified more users as “minimal (light) users” at the expense of “medium users”, because the truly minimal users had practically been filtered out in advance. The percentage of heavy users did not change. The number of users meeting the criterion dropped sharply, the average traffic per user rose, and the standard deviation of the generated traffic per user also increased significantly.
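Why applying a daily traffic limit raises the mean while the user count drops can be seen on a toy example. The sketch below is illustrative only: the `>=` filtering rule, the function names and the five per-user daily volumes are hypothetical and synthetic, not TRAMMS data. Removing the lightest users shrinks the population and raises the mean; in this particular example the (population) standard deviation rises too, although that is not guaranteed for every distribution.

```python
from statistics import mean, pstdev


def apply_daily_limit(daily_mb, limit_mb):
    """Keep only users whose daily traffic reaches the limit (hypothetical >= rule)."""
    return [v for v in daily_mb if v >= limit_mb]


def summarize(values):
    """Return (number of users, mean traffic, population standard deviation)."""
    return len(values), mean(values), pstdev(values)


# Synthetic daily volumes (MB) for five users, from minimal to heavy.
daily = [0.1, 0.5, 1.0, 5.0, 50.0]
for label, values in (("no limit", daily), ("limit 1 MB", apply_daily_limit(daily, 1.0))):
    n, avg, sd = summarize(values)
    print(f"{label}: n={n}, mean={avg:.2f}, sd={sd:.2f}")
```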
Regarding the light users, the higher the traffic limit, the further HTTP traffic drops in the ranking of the top applications, and the more dominant P2P applications become. Some new, interesting applications showed up among the top 15 characteristic applications. For the heavy users, the list of top applications remained unchanged irrespective of the traffic limit.
The analysis of minimal users suggests that their traffic is distributed unevenly (just like the total traffic). Most minimal users browse the web (HTTP and HTTPS) and read e-mail. Even light users use P2P applications.
The analysis of popular applications (HTTP, P2P applications, FTP) shows huge differences between users in the traffic they generate with a given application. In other words, traffic is uneven both at the aggregate level and at the level of individual applications.
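A simple way to quantify such unevenness, offered here as an assumption rather than the metric used in this deliverable, is the share of traffic generated by the heaviest fraction of users. The helper name and the sample volumes below are hypothetical; the same function can be applied to total traffic or to the traffic of a single application.

```python
def top_share_pct(volumes, top_fraction=0.1):
    """Percentage of total traffic generated by the heaviest top_fraction of users."""
    ordered = sorted(volumes, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    k = max(1, round(len(ordered) * top_fraction))  # at least one user
    return 100.0 * sum(ordered[:k]) / total


# Perfectly even traffic: the top 10% of users carry 10% of the volume.
print(top_share_pct([1.0] * 10))
# Highly uneven traffic: one heavy user out of ten carries 91%.
print(top_share_pct([91.0] + [1.0] * 9))
```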
8 REFERENCES
[i] http://www.maxmind.com/app/geolitecountry
[ii] R. Braden, D. Clark, and S. Shenker, “Integrated Services in the Internet Architecture: An Overview”, IETF RFC 1633, 1994.
[iii] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, “An Architecture for Differentiated Services”, IETF RFC 2475, 1998.
[iv] MRTG website: http://oss.oetiker.ch/mrtg/
[v] A. Odlyzko, “Data networks are lightly utilized, and will stay that way”, Review of Network Economics, vol. 2, no. 3, pp. 210-237, Sept. 2003.
[vi] C. Fraleigh, F. Tobagi, and C. Diot, “Provisioning IP Backbone Networks to Support Latency Sensitive Traffic”, Proc. IEEE INFOCOM, San Francisco, USA, 2003.
[vii] C. Fraleigh, “Provisioning Internet Backbone Networks to Support Latency Sensitive Applications”, Ph.D. thesis, Stanford University, June 2002.
[viii] J. van den Berg, M. Mandjes, R. van de Meent, A. Pras, F. Roijers, and P. Venemans, “QoS-aware bandwidth provisioning for IP links”, Computer Networks 50, pp. 631-647, 2006.
[ix] TRAMMS Deliverable “D3.1 – Traffic Characterization”, May 2008.
[x] Procera Networks home page, http://www.proceranetworks.com
[xi] http://www.cisco.com/go/netflow
[xii] http://en.wikipedia.org/wiki/Netflow
[xiii] http://www.wireshark.org/
[xiv] http://www.tcpdump.org/
[xv] http://www.maxmind.com/app/geolitecountry
[xvi] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in organizations”.
[xvii] K. Papagiannaki, N. Taft, S. Bhattacharyya, P. Thiran, K. Salamatian, and C. Diot, “A pragmatic definition of elephants in Internet backbone traffic”, Proc. Internet Measurement Workshop, 2002.
[xviii] T. Bonnedal, “Traffic Measurement and Analysis in Fixed and Mobile Broadband Access Networks”, Master Thesis, LTH, to be published (2009).
[xix] Metacafe website: http://www.metacafe.com/
[xx] P. Gill, M. Arlitt, Z. Li, and A. Mahanti, “YouTube Traffic Characterization: A View From the Edge”, Proc. ACM SIGCOMM Internet Measurement Conference (IMC), San Diego, USA, Oct. 2007.
[xxi] X. Cheng, C. Dale, and J. Liu, “Understanding the Characteristics of Internet Short Video Sharing: YouTube as a Case Study”, cs.NI Networking and Internet Architecture (cs.MM Multimedia), 2007.
[xxii] M. Zink, K. Suh, Y. Gu, and J. Kurose, “Watch global, cache local: YouTube network traffic at a campus network: measurements and implications”, Multimedia Computing and Networking, 2008.
[xxiii] Wikipedia, The Free Encyclopaedia, www.wikipedia.org
[xxiv] Wikipedia, The Free Encyclopaedia, www.wikipedia.org
[xxv] M. Perényi, A. Gefferth, T. D. Dang, and S. Molnár, “Skype Traffic Identification”, Proc. 50th IEEE Global Communications Conference (GLOBECOM), pp. 399-404, Washington, DC, USA, 2007.
[xxvi] M. Perényi and S. Molnár, “Enhanced Skype Traffic Identification”, Proc. 2nd Int. Conf. on Performance Evaluation Methodologies and Tools (VALUETOOLS), Nantes, France, 2007.