Cross-Domain and Cross-Layer Coarse Grained
Quality of Service Support in IP-based Networks
Dissertation
approved by the Faculty of Electrical Engineering and Information Technology
of the Technische Universität Chemnitz
for the award of the academic degree of
Doktoringenieur
(Dr.-Ing.)
presented
by Dipl.-Ing. Thomas Martin Knoll
born 10 January 1973 in Reichenbach
submitted on 27.7.2009
Reviewers:
Univ.-Prof. Dr.-Ing. Thomas Bauschert
Univ.-Prof. Dr.-Ing. Jörg Eberspächer
Univ.-Prof. Dr.-Ing. habil. Klaus Franke
Date of defence: 11.11.2009
Available in MONARCH of TU Chemnitz: http://archiv.tu-chemnitz.de/pub/2009/0165
17.11.2009
Bibliographic description
Thomas Martin Knoll
Cross-Domain and Cross-Layer Coarse Grained Quality of Service Support in IP-based Networks
Dissertation (in English)
166 pages, 155 figures, 21 tables, 185 references
Summary
With the growing popularity of the Internet, the number of users increases, and above all the number of time- and loss-critical services such as "Voice over IP", video transmission and network-based games. The Internet is the interconnection of about 30,000 operator networks, which currently exchange data traffic by means of the "Internet Protocol (IP)" without any Quality of Service support. Massive over-provisioning of network capacity results in a network utilisation of only about 10% and correspondingly good transmission quality. This dissertation expects that, as traffic volumes grow, cost pressure will prevent network operators from keeping up this excessive capacity expansion, so that quality degradation must be anticipated. Traffic separation is already applied within operator networks, but it is discarded at the handover point and, at best, re-established in the neighbouring network by costly analysis.
Within this work, a cross-domain and cross-layer concept for realising coarse-grained Quality of Service in IP networks was therefore designed, proposed for standardisation at the "Internet Engineering Task Force (IETF)", implemented, and partly simulated and tested.
The traffic class information of several network layers is signalled in transitive message elements of the "Border Gateway Protocol (BGP)" and associated across layers.
This dissertation essentially comprises three parts:
1. A comprehensive compilation of existing Quality of Service concepts, including the QoS functional elements already present in available network elements,
2. The detailed specification of the new concept, and
3. The results of the simulation and implementation activities that demonstrate the function and scalability of the design.
Two essential findings and requirements emerged from the work on this topic: simplicity of the concept structure and simplicity of the targeted Quality of Service support. The targeted Quality of Service is therefore limited to primitive traffic separation into several classes, which are held in separate queues in the forwarding nodes and treated with different priorities.
Keywords
Quality of Service (QoS), Class of Service (CoS), Cross-Domain, Cross-Layer, Inter-AS,
Marking Signalling, Ingress limitation Signalling, BGP, Extended Community Attribute
Abstract
The Internet consists of about 30,000 interconnected service provider networks. Its popularity keeps rising, bringing a steadily growing user base, a correspondingly growing traffic load and increasing use for time- and loss-critical services such as voice over IP, video streaming and gaming. The interconnections between these networks are based on the Internet Protocol (IP) and do not distinguish between the different traffic types within the transported load. The currently observed and mostly sufficient service quality is achieved only by over-provisioning the capacity of network-internal and inter-domain links; a resource utilization of about 10% is common practice to achieve stable and uncongested network operation. However, service providers are increasingly deploying Quality of Service (QoS) support mechanisms within their network domains in order to provide traffic separation and differentiated forwarding. Not only IP QoS but also underlying link layer QoS mechanisms are applied. Such QoS support is currently removed at the interconnection link and possibly reapplied in an independent and uncoordinated fashion in the neighbouring domain.
A new cross-domain and cross-layer coarse grained Quality of Service support concept has therefore been drafted. It allows the automated inter-domain exchange of class of service (CoS) support information about the traffic classes distinguished at different networking layers. The concept is based on the standard inter-domain signalling protocol, the Border Gateway Protocol (BGP) version 4. Transitive BGP-based cross-domain signalling and cross-layer CoS mapping are a novel contribution.
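A cross-layer CoS mapping can be pictured as a table that associates each coarse-grained class with one marking per layer. The Python sketch below assumes a hypothetical four-class set and maps it to DSCP (IP), IEEE 802.1Q Priority Code Point (Ethernet) and MPLS Traffic Class values. The class names, the function names and every code point are illustrative assumptions, not the mapping recommended in this thesis.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ClassMarking:
    """Markings one coarse-grained class uses at each layer (illustrative values)."""
    dscp: int     # 6-bit DiffServ Code Point in the IP header
    pcp: int      # 3-bit IEEE 802.1Q Priority Code Point in the VLAN tag
    mpls_tc: int  # 3-bit Traffic Class field in the MPLS label stack entry


# Hypothetical four-class set; the code points are assumptions for this sketch.
CLASS_SET: Dict[str, ClassMarking] = {
    "realtime": ClassMarking(dscp=46, pcp=5, mpls_tc=5),     # EF
    "interactive": ClassMarking(dscp=34, pcp=4, mpls_tc=4),  # AF41
    "best_effort": ClassMarking(dscp=0, pcp=0, mpls_tc=0),   # default
    "background": ClassMarking(dscp=8, pcp=1, mpls_tc=1),    # CS1
}

LAYERS = ("dscp", "pcp", "mpls_tc")


def class_for_dscp(dscp: int) -> str:
    """Return the coarse class for a received DSCP; unknown code points map to best effort."""
    for name, marking in CLASS_SET.items():
        if marking.dscp == dscp:
            return name
    return "best_effort"


def remark_for_layer(dscp: int, layer: str) -> int:
    """Translate a received DSCP into the marking of the same class at another layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}, expected one of {LAYERS}")
    return getattr(CLASS_SET[class_for_dscp(dscp)], layer)


if __name__ == "__main__":
    print(remark_for_layer(46, "pcp"))      # EF traffic entering an Ethernet link
    print(remark_for_layer(12, "mpls_tc"))  # unlisted DSCP falls back to best effort
```

Falling back to best effort for unlisted code points is only one simple design choice; an operator could equally map on the three precedence bits of the DSCP.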
The cross-domain signalling of cross-layer mapped class set information has been submitted for standardization within the Internet Engineering Task Force (IETF). This includes class overload prevention signalling by means of token bucket based ingress limitations. The proposed and implemented concept targets global-scale usage and omnipresent class of service support. Because service providers might be tempted to misuse the offered service classes, overload limitation is part of the concept.
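Token bucket based ingress limitation can be sketched as follows. The Python code assumes that a signalled limit consists of a rate in bit/s and a bucket depth in bytes per class, and that non-conforming packets are downgraded to a fallback class. The names `TokenBucket` and `police` and the downgrade policy are illustrative assumptions; how the thesis handles excess traffic is not reproduced here.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Token bucket limiter: tokens (bytes) refill at a fixed rate up to the bucket depth."""

    def __init__(self, rate_bps: float, burst_bytes: float, now: Optional[float] = None) -> None:
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.capacity = float(burst_bytes)  # bucket depth in bytes
        self.tokens = self.capacity         # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def allow(self, packet_bytes: int, now: Optional[float] = None) -> bool:
        """Consume tokens for a packet if enough are available; return True if it conforms."""
        now = time.monotonic() if now is None else now
        # Refill for the time elapsed since the last packet, capped at the bucket depth.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


def police(buckets: Dict[str, TokenBucket], cos: str, packet_bytes: int,
           now: Optional[float] = None, fallback: str = "best_effort") -> str:
    """Return the class a packet is forwarded in: its own if within limits, else the fallback."""
    bucket = buckets.get(cos)
    if bucket is None or bucket.allow(packet_bytes, now):
        return cos  # class without a signalled limit, or a conforming packet
    return fallback
```

For example, a class limited to 8 kbit/s with a 1000-byte bucket accepts an 800-byte packet at t = 0, while a second 800-byte packet at the same instant exceeds the remaining 200 bytes of tokens and is downgraded.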
Three major contributions are documented within this thesis:
1. A comprehensive compilation of QoS support concepts, including detailed descriptions of network- and node-internal building blocks, which demonstrates the technical readiness of currently deployed devices for an inter-domain CoS-based interconnection.
2. The drafted specification of the new inter-domain CoS concept, including the CoS marking and class overload limitation signalling.
3. Simulations and implementations of vital building blocks of the concept, which underline its functionality and technical feasibility. Resource estimates and successful field trials provide evidence for its scalable and functioning design.
The work on this thesis identified two fundamental design requirements for the concept: simplicity of the design and simplicity of the QoS support it offers.
QoS in this approach therefore refers to primitive traffic separation into several classes, which experience differently prioritized forwarding behaviour in relaying nodes. The intended means is enqueueing each class in a separate queue.
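Separate per-class queues with differently prioritized forwarding can be illustrated by one FIFO queue per class served by a strict priority scheduler. The Python sketch below is one possible realisation under that assumption; relaying nodes may equally use weighted schemes such as CBWFQ or DRR, and the class names and queue limit are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class StrictPriorityScheduler:
    """One FIFO queue per class; dequeue always serves the highest-priority non-empty queue."""

    def __init__(self, classes_by_priority: List[str], queue_limit: int = 64) -> None:
        self.order = list(classes_by_priority)  # index 0 is the highest priority
        self.queues: Dict[str, Deque[bytes]] = {c: deque() for c in self.order}
        self.limit = queue_limit

    def enqueue(self, cos: str, packet: bytes) -> bool:
        """Append a packet to its class queue; tail-drop (return False) when the queue is full."""
        queue = self.queues[cos]
        if len(queue) >= self.limit:
            return False
        queue.append(packet)
        return True

    def dequeue(self) -> Optional[Tuple[str, bytes]]:
        """Return (class, packet) from the highest-priority non-empty queue, or None if idle."""
        for cos in self.order:
            if self.queues[cos]:
                return cos, self.queues[cos].popleft()
        return None
```

Strict priority can starve lower classes when a higher class is overloaded, which is one reason to combine it with per-class ingress limitation as described in the abstract.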
Contents
1 Introduction
2 Fundamentals of IP routing and forwarding
2.1 IP datagram structure and addressing
2.2 Routing basics
2.2.1 Routing protocols and hierarchy
2.2.2 Inter-domain routing using BGP
2.3 Router architecture
2.3.1 Router control plane structure
2.3.2 Router internal interconnection structure
2.3.3 Router internal queuing structure
3 Basic QoS aspects
3.1 Overview
3.1.1 Relative vs. absolute vs. coarse-grained QoS
3.1.2 QoS building blocks
3.2 QoS treatment scope
3.2.1 QoS-based forwarding
3.2.2 QoS-based routing
3.2.3 QoS-based tunnelling
3.3 Architectural scope
3.3.1 Cross-layer QoS
3.3.2 Cross-domain QoS
4 State of the art QoS Concepts
4.1 IP QoS
4.1.1 DiffServ
4.1.2 IntServ
4.1.3 IntServ / DiffServ combination
4.1.4 ITU-T IP QoS concept
4.2 Ethernet QoS
4.3 MPLS QoS
4.4 QoS in access networks
4.5 Summary of expected Class of Service support
5 State of the art AS interconnection
5.1 IP transit
5.2 IP peering
5.3 Internet Routing Registry - IRR
6 Related work
7 New (coarse grained) CoS concept
7.1 Motivation and target
7.2 Usage of BGP for QoS signalling
7.3 Definitions and information processing
7.3.1 BGP extended community attribute for CoS marking
7.3.2 BGP class of service interconnection
8 Mapping strategies
8.1 Problem statement
8.1.1 Mapping between different class sets of the same layer
8.1.2 Mapping between different class sets of different layers
8.2 Existing recommendations
8.3 Coarse grained CoS mapping recommendations
9 Simulation results
9.1 Setup selection for QoS marking and forwarding
9.2 Simulation results for QoS marking and forwarding
9.2.1 Scenario 1: single node interconnection
9.2.2 Scenario 2: AS interconnection – Single AS
9.2.3 Scenario 3: AS interconnection – Multi-AS
9.2.4 Scenario 4: AS interconnection – Multi-AS 2
9.2.5 Scenario 5: AS interconnection – Multi-AS 3
9.2.6 Scenario 6: AS interconnection – Multi-AS 4
9.2.7 Scenario 7: AS interconnection – Cross-Layer
9.3 Setup selection for token bucket ingress filtering
9.4 Simulation results for token bucket ingress filtering
9.5 Summary of simulation results
10 Concept implementation
10.1 Linux implementation
10.2 Wireshark implementation
10.3 Online debug form
11 Implementation test
11.1 Test setup
11.2 Test result and observations
11.3 Ethernet QoS support test at IXPs
11.4 Resource usage estimates
11.4.1 Increase in routing update information size
11.4.2 Increase in memory consumption with routers
12 Summary and outlook
12.1 Contributions and results
12.2 Practical usage
12.3 Outlook
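German title
Domänen- und schichtenübergreifendes Konzept zur Realisierung grob-granularer Dienstgüte in IP-Netzen
(Cross-domain and cross-layer concept for realising coarse-grained Quality of Service in IP networks)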
Introduction
Although the interconnection of today's IP-based data networks is a modern communication technology, it has some shortcomings where the networks are coupled. The following historical analogy illustrates exactly these weaknesses of the Internet, which this work takes up and improves upon.
In the 19th century, communication between the colonies of South Australia and Western Australia was carried by steamships, and delivery could easily take weeks. It was therefore decided to switch to telegraphy.
Construction of the telegraph line began in 1874. South Australia drove the line westwards from Port Augusta to the border, while Western Australia started building eastwards from Albany. In 1877 the two sections were joined at the telegraph station in the small border settlement of Eucla ([145], [166]).
The station was staffed equally by both sides, whose operators sat facing each other along a long table running north to south. The border ran through the middle of the building and the middle of the table. Messages to be exchanged between the colonies were received by the respective staff, passed by hand to the other side of the table and retransmitted there as a new telegraph message.
The reason was that different character codes were used on the two sides: South Australia used American Morse code and Western Australia International Morse code.
The analogy is that today's Internet consists of about 30,000 independently operated IP networks, so-called Autonomous Systems (AS), which pursue Quality of Service concepts in an uncoordinated way and are interconnected, privately or publicly, at the most basic level. Although these ASes often apply freely chosen traffic separation and prioritisation internally, the separation is removed at their interconnection, and traffic is handed over without separation or preferential treatment.
Some ingress routers of the ASes then perform costly classification based on the encapsulated received data in order to estimate the type of received traffic as well as possible and to reapply the appropriate internal traffic separation and prioritisation.
This work therefore investigated, documented and implemented signalling and direct traffic-class-based interconnection of Autonomous Systems.
Summary and outlook
This dissertation considers the interconnection of so-called "Autonomous Systems", which at their interconnection points currently offer no Quality of Service support at all.
The contributions of this work fall essentially into three parts. The first part is a comprehensive compilation of existing Quality of Service concepts, including the QoS functional elements already present in available networks and interconnection devices. These devices are demonstrably suitable for supporting cross-domain, class-based Quality of Service. From these findings, together with verbal statements by leading European, American and Middle Eastern network operators on the acceptable complexity of such Quality of Service schemes, arose the primary requirement for a simple, easily understood and manageable Quality of Service concept. In the second part, the targeted cross-domain Quality of Service concept was specified and submitted to the IETF for standardisation. In the third part, the function and technical feasibility of essential concept components are demonstrated through simulation and implementation. The scalability and functionality of the concept were verified through field trials and resource consumption estimates.
Contributions and results
The following findings and contributions were made in this work:
• The interconnection of autonomous systems into the global Internet is, from a technical and economic point of view, a critical interface between network operators. Current interconnections are based exclusively on the exchange of IP packets without Quality of Service support. Over-provisioning and network-internal Quality of Service support are applied today. Owing to the continuing growth of Internet traffic, the dissertation expects rising network expansion costs and increasing congestion on the interconnection links. A new class-based interconnection concept suitable for global use was therefore developed.
• Simplicity of design was identified as the decisive criterion for the acceptance of the concept in the Internet community. It applies both to the signalling structures and to the actual extent of class support.
• The importance of supporting at least two, or better four, service classes was underpinned by simulations.
• In contrast to existing complex Quality of Service concepts, which aim at guarantees on delay, delay variation and loss rates, the present concept requires only simple traffic separation, for reasons of cost and acceptance.
• The degree of simplicity achieved by dropping Quality of Service guarantees is a central prerequisite for global applicability.
• The decision to use BGP for signalling was made on the basis of an examination of existing and emerging signalling protocols.
• New so-called "Extended Communities" and a new path attribute were defined in BGP; they are used to signal the required cross-domain and cross-layer class information.
• The novel principle of transitively forwarding class of service information by means of the "Extended Communities", together with the operator-configurable association of the Quality of Service settings of different network layers within the signalling, is a fundamental achievement.
• The results of extensive single-node simulations and AS-level simulations are documented in excerpts in this dissertation and are available in full on request.
• The applicability of the concept and its interoperability with existing network elements were demonstrated through tests with the Linux implementation.
• Resource consumption estimates were made, showing that the additional signalling of class of service information has a negligibly small effect on the size of BGP UPDATE messages. A moderate consumption of memory resources was also determined. Under the assumption of realistic scenarios, the applicability of the concept was shown for large network sizes as well.
• The design of the concept does not prevent the additional, targeted use of complex Quality of Service mechanisms with guaranteed Quality of Service. In fact, the universal deployment of the present concept is supported together with the selective deployment of higher-grade concepts on chosen interconnections or transit paths.
• On the basis of the concept, the transformation of today's Internet into an Internet that supports two, or better four, classes is targeted.
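The UPDATE size impact of carrying class information in extended communities can be estimated from the path attribute encoding. Per RFC 4271, a path attribute carries a 1-octet flags field, a 1-octet type code and a 1-octet length (2 octets when the Extended Length flag is set); per RFC 4360, the Extended Communities attribute (type code 16, optional transitive) holds a sequence of 8-octet communities. The Python sketch below computes only this wire-format overhead and assumes the Extended Length flag is used only when needed. It is not the estimation method used in the thesis, and memory consumption in routers, which depends on the implementation, is not modelled.

```python
import struct
from typing import List

EXT_COMMUNITIES_TYPE_CODE = 16  # Extended Communities path attribute (RFC 4360)
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_EXTENDED_LENGTH = 0x10     # length field is 2 octets instead of 1 (RFC 4271)


def attribute_size(n: int) -> int:
    """Octets taken by an Extended Communities attribute with n communities (0 if absent)."""
    if n == 0:
        return 0
    value_len = 8 * n
    # flags + type code + length; assume Extended Length only when the value exceeds 255
    header_len = 4 if value_len > 255 else 3
    return header_len + value_len


def added_update_bytes(existing: int, added: int) -> int:
    """Extra UPDATE octets when `added` communities join `existing` ones on a route."""
    return attribute_size(existing + added) - attribute_size(existing)


def encode_attribute(communities: List[bytes]) -> bytes:
    """Encode the attribute from 8-octet community values (used here to cross-check sizes)."""
    if not communities or any(len(c) != 8 for c in communities):
        raise ValueError("need one or more communities of exactly 8 octets")
    value = b"".join(communities)
    flags = FLAG_OPTIONAL | FLAG_TRANSITIVE
    if len(value) > 255:
        header = struct.pack("!BBH", flags | FLAG_EXTENDED_LENGTH,
                             EXT_COMMUNITIES_TYPE_CODE, len(value))
    else:
        header = struct.pack("!BBB", flags, EXT_COMMUNITIES_TYPE_CODE, len(value))
    return header + value
```

For example, attaching two communities to a route that carried none adds 19 octets (a 3-octet header plus 2 × 8 octets). Since one UPDATE can announce several prefixes that share the same path attributes, this cost arises per UPDATE rather than per prefix.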
Practical usage
Particular attention was paid to the practical use of the concept. The following points list important milestones for its applicability.
• By handing the concept specification over to IETF standardisation, royalty-free reuse without patent restrictions has effectively been enabled. Global deployment of the concept is targeted, and possible cost savings on the operator side add to the benefit achievable through the concept.
• The implementations in the Linux routing software Quagga and in the network analysis tool Wireshark are freely available. The Wireshark extension has already been accepted by the developers and integrated into the current software version. The same is planned for the Quagga extension.
• An online service has been set up that accepts signalled class information in raw data format and decodes it. It can be found at: http://www.bgp-qos.org/draft-knoll/decode_attributes.php
• The numbering authority IANA has already assigned type numbers for the "QoS Marking" and "CoS Capabilities" elements, so that they can officially be used in operators' production networks. The concept has thus already crossed the threshold from laboratory setup to public deployment.
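To indicate what decoding raw class information involves, the Python sketch below splits an Extended Communities attribute value into its 8-octet entries and reads the generic RFC 4360 framing: the high-order type octet and its Transitive bit (0x40, set for non-transitive communities). The field layout of the "QoS Marking" and "CoS Capabilities" elements and their IANA-assigned type values are not reproduced here; the function names are hypothetical, and this is not the code behind the online service.

```python
from typing import Dict, List, Union

TRANSITIVE_BIT = 0x40  # RFC 4360: when set in the type octet, the community is non-transitive

Decoded = Dict[str, Union[int, bool, str]]


def decode_ext_community(raw_hex: str) -> Decoded:
    """Decode the generic framing of one 8-octet extended community given as hex text."""
    data = bytes.fromhex(raw_hex.replace(":", " "))
    if len(data) != 8:
        raise ValueError(f"expected 8 octets, got {len(data)}")
    type_high = data[0]
    return {
        "type_high": type_high,
        "transitive": not (type_high & TRANSITIVE_BIT),
        # Octets 1..7 hold the sub-type (for extended types) and value; meaning depends on the type.
        "value": data[1:].hex(),
    }


def decode_attribute_value(raw_hex: str) -> List[Decoded]:
    """Split an Extended Communities attribute value into 8-octet entries and decode each."""
    data = bytes.fromhex(raw_hex.replace(":", " "))
    if len(data) % 8:
        raise ValueError("attribute value length must be a multiple of 8 octets")
    return [decode_ext_community(data[i:i + 8].hex()) for i in range(0, len(data), 8)]
```

As a usage example, decoding "00 02 fd e8 00 00 00 64", a standard two-octet-AS route target 65000:100, reports type octet 0 and a transitive community.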
Outlook
At present, the application of the new cross-domain and cross-layer concept for realising coarse-grained Quality of Service is limited to Linux-based network elements. Ongoing discussions with network operators and router vendors, however, aim at general support of the concept in commercial routers. The technical feasibility was confirmed in these discussions, and European operators have expressed interest. Future practical experience and change requests will lead to refinements of the concept.
To promote adoption of the concept, work is currently under way on supplementing conventional commercial routers with an interactive Linux-based remote control. Fig. 155 shows the hidden control mechanism in which an internal Linux PC controls the commercial router. Because the signalling elements were defined as transitive, the router can pass them through passively in both directions and delegate the processing and generation of class of service information to the Linux system. Over a second connection, the Linux PC can then reach the router's control interface and issue the commands needed to configure and activate the router's existing internal QoS functions.
This interim solution allows network operators to offer class-based network interconnection without costly software or hardware upgrades.
Fig. 153
Control of a commercial router by a Linux PC
The current debate on "network neutrality" influences the willingness of network operators and vendors to support cross-domain Quality of Service mechanisms. Its focus is on neutral network operation without service limitations, content filters or any preferential treatment of individual users.
Discussions with network operators and various governmental network agencies have indicated that the proposed Quality of Service concept, with its simple and generally applicable structure, might be regarded as a non-discriminatory improvement of the Internet that can be deployed across the board.
Additional techno-economic studies of achievable cost savings will be needed to support operators' decision processes regarding equipment upgrades and the introduction of class-based Quality of Service.
Section 5.2 briefly described a signing process proposed by Google, which uses so-called BGP "communities" to seal participation in new services and concepts. Depending on the success of this initiative, the proposed Quality of Service concept could come to serve as the contractual basis for agreeing class-based Quality of Service between operators.
Acronyms
ABR         Area Border Router
ABR         Available Bit Rate
AD          Administrative Distance
ADSL        Asymmetric DSL
AFI         Address Family Identifier
ARP         Address Resolution Protocol
ASBR        Autonomous System Border Router
ASN         Autonomous System Number
ATM         Asynchronous Transfer Mode
B-ISDN      Broadband ISDN
BA          Behaviour Aggregate
BGP         Border Gateway Protocol
BGRP        Border Gateway Reservation Protocol
BRAS        Broadband Remote Access Server
CAC         Call Admission Control
CAPEX       Capital Expenditure
CBR         Constant Bit Rate
CBWFQ       Class-Based Weighted Fair Queueing
CIDR        Classless Inter-Domain Routing
CIR         Committed Information Rate
CLI         Command Line Interface
CLP         Cell Loss Priority bit
COPS        Common Open Policy Service
CR-LDP      Constraint-based Routed LDP
CS          Class Selector
DE          Discard Eligibility bit (Frame Relay)
DFZ         Default Free Zone
DiffServ    Differentiated Services
DMA         Direct Memory Access
DNS         Domain Name System
DRR         Deficit Round-Robin
DS          Differentiated Services
DSCP        DiffServ Code Point
DSL         Digital Subscriber Line
DV          Distance Vector
E-LSP       EXP-Inferred-PSC LSP / now: Explicitly TC-encoded-PSC LSP
eBGP        external Border Gateway Protocol
ECN         Explicit Congestion Notification
EF          Expedited Forwarding
EGP         Exterior Gateway Protocol
EIGRP       Enhanced Interior Gateway Routing Protocol
FCFS        First Come First Served
FIB         Forwarding Information Base
FIFO        First In First Out
FR          Frame Relay
FSM         Finite State Machine
FTP         File Transfer Protocol
GbE         Gigabit Ethernet
GBR         Guaranteed Bit Rate
GCRA        Generic Cell Rate Algorithm
GIST        General Internet Signalling Transport
GMPLS       Generalized MPLS
GPS         Generalized Processor Sharing
GRE         Generic Routing Encapsulation
HDLC        High Level Data Link Control
HOLB        Head of Line Blocking
IANA        Internet Assigned Numbers Authority
iBGP        internal Border Gateway Protocol
ICMP        Internet Control Message Protocol
IETF        Internet Engineering Task Force
IESG        Internet Engineering Steering Group
IGP         Interior Gateway Protocol
IGRP        Interior Gateway Routing Protocol
IntServ     Integrated Services
IP          Internet Protocol
IPv4        Internet Protocol version 4
IPv6        Internet Protocol version 6
IRR         Internet Routing Registry
IS-IS       Intermediate System to Intermediate System
ISDN        Integrated Services Digital Network
ISO         International Organization for Standardization
ISP         Internet Service Provider
IXP         Internet Exchange Point
L-LSP       Label-only-Inferred-PSC LSP
LAN         Local Area Network
LDP         Label Distribution Protocol
LIB         Label Information Base
LIFO        Last In First Out
Loc-RIB     Local RIB
LQD         Longest Queue Drop
LS          Link State
LSDB        Link State Database
LSP         Label Switched Path
MAC         Media Access Control
MAC-in-MAC  Encapsulation of Ethernet frames in Ethernet frames
MED         Multi-Exit Discriminator
MESCAL      Management of End-to-end Quality of Service Across the Internet at Large
MPLS        Multiprotocol Label Switching
MSS         Maximum Segment Size
MTU         Maximum Transmission Unit
NGN         Next Generation Network
NLRI        Network Layer Reachability Information
NSIS        Next Steps In Signalling
NSLP        NSIS Signalling Layer Protocol
NTLP        NSIS Transport Layer Protocol
OPEX        Operational Expenditure
OS          Operating System
OSI         Open Systems Interconnection
OSPF        Open Shortest Path First
PBB         Provider Backbone Bridges
PBT         Provider Backbone Transport
PC          Personal Computer
PCN         Pre-Congestion Notification
PCP         Priority Code Point
PDB         Per Domain Behaviour
PDP         Policy Decision Point
PDU         Protocol Data Unit
PFC         Priority-based Flow Control
PGPS        Packet-by-packet Generalized Processor Sharing
PHB         Per Hop Behaviour
POTS        Plain Old Telephone Service
PS          Processor Sharing
PSTN        Public Switched Telephone Network
PT          Packet Type
q-BGP       QoS enhanced BGP
Q-in-Q      802.1Q in 802.1Q encapsulation
QoS         Quality of Service
QoE         Quality of Experience
RAM         Random Access Memory
ReaSE       Realistic Simulation Environments for IP-based Networks
RED         Random Early Detection
RFD         Route Flap Damping
RIB         Routing Information Base
RIP         Routing Information Protocol
RPSL        Routing Policy Specification Language
RPSLng      Routing Policy Specification Language next generation
RR          Round Robin
RR          Route Reflector
RS          Route Server
RSVP        Resource Reservation Protocol
RSVP-TE     RSVP-Traffic Engineering
SAFI        Subsequent Address Family Identifier
SDH         Synchronous Digital Hierarchy
SDU         Service Data Unit
SLA         Service Level Agreement
SONET       Synchronous Optical NETwork
SP          Strict Priority
SPF         Shortest Path First
SPI         System Packet Interface
TC          Traffic Class
TCA         Traffic Conditioning Agreement
TCP         Transmission Control Protocol
TOS         Type of Service
TTL         Time To Live
UBR         Unspecified Bit Rate
UDP         User Datagram Protocol
UMTS        Universal Mobile Telecommunications System
URL         Uniform Resource Locator
VBR         Variable Bit Rate
VC          Virtual Channel
VLAN        Virtual LAN
VLSM        Variable Length Subnet Mask
VoIP        Voice over IP
VOQ         Virtual Output Queues
VTYSH       Virtual TeletYpe shell
WAN         Wide Area Network
WDRR        Weighted Deficit Round-Robin
WiMAX       Worldwide Interoperability for Microwave Access
WRED        Weighted Random Early Detection
WRR         Weighted Round Robin
WLAN        Wireless LAN
WLL         Wireless Local Loop
Acknowledgments
The work presented in this thesis was done at Chemnitz University of Technology in Chemnitz, Germany. My interest in the topic and the idea for the proposed concept arose from my lecturing work at the Chair of Communication Networks.
I would like to express my deep thanks to the current and the former head of the chair, Prof. Thomas Bauschert and Prof. Klaus Franke, respectively, for their support over the last years and for invaluable discussions and comments on my work. I am very grateful to Prof. Jörg Eberspächer for his offer to act as a co-examiner of my thesis and for the chance to present this work at his institute.
Special thanks go to David Ward, Dr. Yakov Rekhter, Robert Raszuk and Jie Dong for their support with IANA’s number assignment, fruitful discussions and detailed feedback on the concept.
I am very grateful to Arnold Nipper and Wolfgang Tremmel from DE-CIX as well as Jens
Wengenmayr and Frank Benndorf from envia TEL GmbH for their technical feedback and
support.
Furthermore, I wish to thank Simon Ehnert for the programming support with the Quagga routing suite, my co-worker Daniel Manns for his support in the work with OMNeT++, Uwe Steglich for challenging hours with NS2, and the other co-workers and students at the Chair of Communication Networks for their helpful comments and reflections.
My thanks are also due to Brian Schaefer, who helped me correct my writing.
Finally, I would like to thank my family for their support, patience, and understanding
during these challenging years.
Thomas Martin Knoll
Chemnitz, July 2009
1 Introduction
The internetworking of current IP-based data networks is a modern communication technology with some major interconnection drawbacks. The following historical allegory depicts the weak spot of the widely used Internet that is addressed in this work.
Back in the 19th century, the two colonies of South Australia and Western Australia decided to communicate with each other by telegraph rather than by steamship, which took weeks.
In 1874 both colonies started to erect a new telegraph line to interconnect their independently operating telegraph systems. South Australia started its line from Port Augusta towards the border in the west, and Western Australia erected its line from Albany towards its eastern border.
In 1877, the interconnection was established at the Eucla Telegraph Station ([158], [179]), a small settlement near the border between the colonies. The station was staffed equally by both colonies, and their telegraphists sat at a table oriented from north to south. In fact, the technical border divided the building and the operators’ table in half.
The West Australian operators received their inter-state messages at the western half of the table and passed each message across it to their South Australian colleagues. From there, the message was telegraphed onward into South Australia, and vice versa.
The reason for this manual repeater station was the different character encodings used on either side: South Australia used the American Morse code and Western Australia the International Morse code.
The similarity lies in the fact that the current Internet consists of about 30,000 independently operated IP networks, called Autonomous Systems (AS), which run uncoordinated quality of service concepts and are interconnected privately or publicly in a very basic manner. Although ASes often apply some independently chosen form of traffic separation and prioritization within their respective network clouds, their interconnections remove all such separation and traditionally handle the exchanged traffic without any separation or prioritization.
Some AS ingress routers in turn apply multi-layer ingress classification methods in order to make a good guess at which of the incoming traffic should be separated and/or prioritized.
The signalling for, and the direct traffic class based interconnection of, Autonomous Systems have therefore been investigated, documented and implemented in this work.
2 Fundamentals of IP routing and forwarding
The robust and inexpensive exchange of information between end systems on a global scale is the major achievement of the current Internet. Many networking technologies exist that allow electronic devices to be networked using different layer two technologies. However, local area networks based on such independently chosen technologies require interworking functions in order to communicate with each other. This barrier is removed by the commonly used Internet Protocol (IP), which serves as the lowest common denominator for a primitive datagram-based information exchange.
The Internet is therefore a patchwork of many networking clouds, which all provide the
means for an end-to-end IP-based datagram transmission service.
2.1 IP datagram structure and addressing
In order to understand the capabilities of the globally available IP datagram service, it is best to review the protocol control information carried within the header of each protocol data unit.
Fig. 1 depicts the datagram structure of version 4 of the Internet Protocol, which is currently the predominantly used version. Its original structure was defined in RFC 791 [153].
Fig. 1 IP version 4 datagram structure
The most important elements of the header are the destination and the source IP address,
which are used for a hop-by-hop relay process towards the destination and for backward
error reporting in case of delivery failures, respectively.
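To make the header fields of Fig. 1 more tangible, the following minimal Python sketch decodes the fixed 20 byte part of an IPv4 header with the standard `struct` module. The function `parse_ipv4_header` and the hand-built sample header (with addresses from the documentation ranges) are illustrative assumptions. Header options are ignored and the checksum is not verified.

```python
import socket
import struct


def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20 byte part of an IPv4 header (field layout of RFC 791)."""
    if len(data) < 20:
        raise ValueError("an IPv4 header has at least 20 bytes")
    (ver_ihl, ds_field, total_length, _ident, _flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,  # IHL counts 32 bit words
        "ds_field": ds_field,                   # former ToS byte
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,                   # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),        # used for backward error reporting
        "destination": socket.inet_ntoa(dst),   # used for hop-by-hop relaying
    }


# Hand-built sample header: version 4, IHL 5, TTL 64, protocol TCP, checksum left at 0.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7"))
print(parse_ipv4_header(sample)["destination"])  # 198.51.100.7
```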
IP addresses used to be grouped into address classes – A, B, C, D, E – following the structure given in Fig. 2. Each node belonging to a network cloud was assigned an IP address containing the same network part within the 32 bit number. A router would therefore decide, based on the destination address of the datagram and the network number of its receiving interface, whether the datagram is destined for the originating cloud or needs to be relayed towards a next hop router.
Fig. 2 IPv4 address class system - [22]
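The historic class of an address follows from its leading bits, as shown in Fig. 2. Here is a minimal sketch, assuming dotted-quad input. The function name `address_class` is hypothetical.

```python
def address_class(address: str) -> str:
    """Return the historic address class (A to E) of a dotted-quad IPv4 address."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("not a valid IPv4 address")
    if first_octet < 128:
        return "A"  # leading bit 0, 8 bit network part
    if first_octet < 192:
        return "B"  # leading bits 10, 16 bit network part
    if first_octet < 224:
        return "C"  # leading bits 110, 24 bit network part
    if first_octet < 240:
        return "D"  # leading bits 1110, multicast
    return "E"      # leading bits 1111, reserved


print([address_class(a) for a in ("10.1.2.3", "172.16.0.1", "192.168.1.1", "224.0.0.5")])
# ['A', 'B', 'C', 'D']
```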
The rigid address class regime, together with the very large class A and very small class C networks, led to a revised scheme for network/host differentiation that allows any bit position within the 32 bit field to serve as the network address boundary. The scheme is called “Classless Inter-Domain Routing (CIDR)” [81], [82] and introduces a 32 bit network mask field to support “variable length subnet masks (VLSM)”. Combined with the traditional address classes, it allows the creation of subnets out of one larger network and of supernets out of several consecutive smaller networks. Fig. 3 gives a subnetting example for the creation of 128 subnets out of one class B network.
Class B network part (16 bit, leading bits 10) | Subnet part (7 bit) | Host part (9 bit)
Network mask: 11111111111111111111111 000000000 (23 ones, 9 zeros)
Fig. 3 CIDR example network mask
Routers in CIDR networks now compare the network part of their interface address with
the network part of the currently processed IP destination address using a simple AND
operation with the network mask applied on both addresses.
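The AND comparison and the subnetting example of Fig. 3 can be sketched with Python's standard `ipaddress` module. The helper `same_network` and the example network 172.16.0.0/16 (a private class B block) are illustrative choices. They are not taken from the original figure.

```python
import ipaddress


def same_network(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Compare the network parts of two IPv4 addresses using a bitwise AND with the mask."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    return (a & mask) == (b & mask)


# Subnetting as in Fig. 3: a class B network (/16) with a /23 mask yields 7 subnet bits.
class_b = ipaddress.ip_network("172.16.0.0/16")
subnets = list(class_b.subnets(new_prefix=23))
print(len(subnets), subnets[1], subnets[1].num_addresses)  # 128 172.16.2.0/23 512

print(same_network("172.16.2.10", "172.16.3.200", 23))  # True: both in 172.16.2.0/23
print(same_network("172.16.3.200", "172.16.4.1", 23))   # False: 172.16.4.0/23 differs
```

Seven subnet bits yield the 128 subnets mentioned above, and each subnet has a 9 bit host part of 512 addresses.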
The major advantage of CIDR in global scale routing lies in route aggregation. The IP address ranges (so-called prefix blocks) of Internet service providers or large companies tend to consist of fine-grained address allocations with prefix lengths in the twenties. Routers in the core regions of the Internet, however, might see a number of consecutive address blocks in their routing tables that all resolve towards the same next hop neighbour. Summarizing those table entries into a single larger address block with a shorter network mask saves table storage, table lookup delay and route advertisement messages. Such prefix aggregation by means of CIDR is therefore heavily used in today’s Internet routing.
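Route aggregation can be illustrated with `ipaddress.collapse_addresses`, which merges adjacent prefixes into the fewest covering prefixes. This sketch assumes that all input prefixes share the same next hop. The hypothetical helper `aggregate` does not check this condition, and the 198.18.0.0 example blocks are arbitrary.

```python
import ipaddress


def aggregate(prefixes: list[str]) -> list[str]:
    """Summarize consecutive prefix blocks into as few, shorter prefixes as possible."""
    networks = (ipaddress.ip_network(p) for p in prefixes)
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


# Four consecutive /24 blocks behind the same next hop collapse into one /22.
print(aggregate(["198.18.0.0/24", "198.18.1.0/24", "198.18.2.0/24", "198.18.3.0/24"]))
# ['198.18.0.0/22']

# Blocks that are not aligned on a common boundary cannot be merged into one /23.
print(aggregate(["198.18.1.0/24", "198.18.2.0/24"]))
# ['198.18.1.0/24', '198.18.2.0/24']
```

The second call shows that only aligned blocks can be summarized without also covering foreign address space.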
Further work on IP addressing was performed with the introduction of IP version 6 [63], [64], [5][1]. This new version extends the IP addresses to 128 bit fields and specifies a fixed-size basic header of 40 bytes. A new scheme of extension headers allows additional header information to be incorporated dynamically. Fig. 4 depicts the version 6 datagram structure.
Fig. 4 IP version 6 datagram structure
The CIDR concept of address and netmask is retained, but the IP address classes have been abolished. IP version 6 introduces address types instead [92]. The following types have been defined:
• Unicast Addresses,
o Interface Identifiers,
o The Unspecified Address,
o The Loopback Address,
o Global Unicast Addresses,
o IPv6 Addresses with Embedded IPv4 Addresses,
o Link-Local IPv6 Unicast Addresses,
o Site-Local IPv6 Unicast Addresses (deprecated),
• Anycast Addresses and
• Multicast Addresses.
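The address types listed above can be distinguished with the properties of `ipaddress.IPv6Address`. The following sketch is an illustration only. Anycast addresses are allocated from the unicast space and cannot be recognized from the address itself. Of the embedded-IPv4 forms, only IPv4-mapped addresses are detected. The function name `ipv6_address_type` is hypothetical.

```python
import ipaddress


def ipv6_address_type(address: str) -> str:
    """Map an IPv6 address to one of the address types listed above (anycast excluded)."""
    a = ipaddress.IPv6Address(address)
    if a.is_unspecified:
        return "unspecified"
    if a.is_loopback:
        return "loopback"
    if a.is_multicast:
        return "multicast"
    if a.ipv4_mapped is not None:
        return "IPv4-mapped"
    if a.is_link_local:
        return "link-local unicast"
    if a.is_site_local:
        return "site-local unicast (deprecated)"
    return "global or other unicast"


for addr in ("::", "::1", "ff02::1", "::ffff:192.0.2.1", "fe80::1", "fec0::1", "2001:db8::1"):
    print(addr, "->", ipv6_address_type(addr))
```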
In the light of QoS support in IP-based networks, datagrams need to be differentiated during the hop-by-hop relay process.
One approach could be to reserve address blocks for certain forwarding treatments and to number those end devices that have certain quality of service requirements out of such address blocks. Relaying nodes could react to these special destination addresses within the datagram header and might even make different routing decisions in their relay process.
However, this puts an unnecessary burden on the globally coordinated IP address planning and prevents end devices from concurrently supporting several services with possibly differing QoS requirements.
The original IPv4 “Type of Service (ToS)” and the IPv6 “Traffic Class (TC)” header fields both provide 8 bits for quality of service marking information, but with different encodings. In the course of “Differentiated Services (DiffServ)”, a quality of service concept described in section 4.1.1, the redefinition of both fields by RFC 2474 [142] into a so-called “Differentiated Services field (DS field)” was the decisive step towards a common encoding scheme independent of the IP protocol version. Six bit “Differentiated Services Code Points (DSCP)” were specified for DiffServ purposes.
Since the DSCP occupies only 6 of the 8 bits of the redefined field, a second mechanism uses the remaining two bits. It is called “Explicit Congestion Notification (ECN)” and allows relaying nodes to signal congestion in the forward direction (RFC 3168 [156]).
The combination of both definitions, together with some clarification of the wording and meaning of the specifications, is given in RFC 3260 [85]. Fig. 5 references the major redefinition RFCs as well as the four common differentiated services “Per Hop Behaviour (PHB)” encodings.
Fig. 5 Differentiated Services (DS) field in IPv4 and IPv6 datagram headers
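Because the ToS byte and the Traffic Class byte now both carry the same DS field, a receiver can split either byte into DSCP and ECN bits with one shift and one mask operation. The sketch below assumes the standard codepoint rules of RFC 2474 (class selectors), RFC 2597 (Assured Forwarding) and RFC 3246 (Expedited Forwarding). It does not reproduce Fig. 5, and the helper names are hypothetical.

```python
ECN_CODEPOINTS = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}
EF_DSCP = 0b101110  # Expedited Forwarding, decimal 46 (RFC 3246)


def split_ds_field(ds_field: int) -> tuple[int, str]:
    """Split the 8 bit DS field (IPv4 ToS / IPv6 Traffic Class) into DSCP and ECN."""
    if not 0 <= ds_field <= 0xFF:
        raise ValueError("the DS field is 8 bits wide")
    dscp = ds_field >> 2    # upper six bits (RFC 2474)
    ecn = ds_field & 0b11   # lower two bits (RFC 3168)
    return dscp, ECN_CODEPOINTS[ecn]


def class_selector(precedence: int) -> int:
    """Class Selector codepoint CSx: the old IP precedence in the top three DSCP bits."""
    return precedence << 3


def assured_forwarding(af_class: int, drop_precedence: int) -> int:
    """Assured Forwarding codepoint AFxy = 8x + 2y (RFC 2597)."""
    return 8 * af_class + 2 * drop_precedence


print(split_ds_field(0xB8))                          # (46, 'Not-ECT'): EF, ECN unused
print(split_ds_field(0xBB))                          # (46, 'CE'): EF, congestion marked
print(class_selector(5), assured_forwarding(4, 1))   # 40 34
```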
2.2 Routing basics
2.2.1 Routing protocols and hierarchy
IP datagrams are relayed in a hop-by-hop manner, based solely on the destination address field contained in the IP packet header. The interworking functions only relay the datagram out of the layer two networking cloud if the IP destination address belongs to a different IP network than the one it originated from.
Each interface of such an interworking device is a member of the respective networking cloud. The device relays the datagram on behalf of the original source to the neighbouring relaying node that it believes to be closer to the datagram’s destination. These layer three interworking devices are called “routers” and can be equipped with different relaying capabilities depending on their position within the global, hierarchically organized patchwork of networking clouds.
The relay process of IP datagrams within a router consists of three major steps:
1. IP lookup of the datagram’s IP destination address in a forwarding table to find the
best matching relay path towards the correct next hop router together with the respective output interface connecting to that next hop neighbour,
2. IP header field processing to decrease the time-to-live value and to update the header checksum accordingly and
3. internal transfer of the processed datagram to the output interface for transmission
towards the layer two address of the next hop router.
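Step 2 of the relay process can be sketched in Python as follows. The sketch assumes a plain IPv4 header and recomputes the full RFC 1071 checksum instead of applying an incremental update. The sample header (UDP from 192.168.0.1 to 192.168.0.199, TTL 64, checksum field zeroed) is hand-picked, and the function names are hypothetical.

```python
import struct


def internet_checksum(header: bytes) -> int:
    """RFC 1071 checksum: ones' complement of the ones' complement sum of 16 bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decrement_ttl(datagram: bytes) -> bytes:
    """Step 2 of the relay process: decrease the TTL and recompute the header checksum."""
    header_len = (datagram[0] & 0x0F) * 4
    header = bytearray(datagram[:header_len])
    if header[8] <= 1:
        raise ValueError("TTL expired: discard and report via ICMP Time Exceeded")
    header[8] -= 1                          # TTL is byte 8 of the IPv4 header
    header[10:12] = b"\x00\x00"             # checksum field counts as zero while summing
    struct.pack_into("!H", header, 10, internet_checksum(bytes(header)))
    return bytes(header) + datagram[header_len:]


hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(hdr)))          # 0xb861
```

A receiver can validate a header by checking that the checksum computed over all header words, including the checksum field, is zero, e.g. `internet_checksum(decrement_ttl(hdr)) == 0`.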
These steps relate to the lower half of the functionality depicted in Fig. 6, i.e. the functions of the so-called “forwarding plane”.
The upper half, the control plane, sets up the vital routing information within the forwarding table. The control plane functionality is based on dynamic exchanges of reachability information using specialized routing protocols. Each routing protocol instance of a router performs some form of neighbour discovery and advertises the known IP prefixes to the discovered neighbours.
Fig. 6 IP routing and forwarding functionality
A router typically maintains two “routing tables” internally. The so-called “Routing Information Base (RIB)” stores all valid routes that the update process has learned locally or dynamically from other routers. In a second step, the best route to each advertised prefix is selected out of the RIB and installed into the so-called “Forwarding Information Base (FIB)”, which is used by the forwarding plane.
If best routes for an address range have been learned with several granularities (prefix lengths), all of them are stored in the FIB. During IP lookup, a so-called “longest prefix match” is performed in order to find the most specific best route towards the currently processed IP destination address.
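A longest prefix match over a FIB can be sketched as a linear scan that keeps the most specific covering entry. This is for illustration only. Production routers use trie structures or TCAM hardware for the lookup, and the FIB content below is made up.

```python
import ipaddress


def longest_prefix_match(fib: dict[str, str], destination: str):
    """Return (prefix, next hop) of the most specific FIB entry covering destination."""
    dst = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else (str(best[0]), best[1])


fib = {  # hypothetical FIB with overlapping prefixes of different lengths
    "0.0.0.0/0": "192.0.2.254",
    "198.51.100.0/22": "192.0.2.1",
    "198.51.100.128/25": "192.0.2.2",
}
print(longest_prefix_match(fib, "198.51.100.200"))  # ('198.51.100.128/25', '192.0.2.2')
print(longest_prefix_match(fib, "198.51.101.5"))    # ('198.51.100.0/22', '192.0.2.1')
print(longest_prefix_match(fib, "203.0.113.9"))     # ('0.0.0.0/0', '192.0.2.254')
```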
The dynamic routing process, i.e. the exchange of reachability information for IP prefixes, is organized in a hierarchical fashion. In a flat routing structure, every newly established or lost connectivity to an IP prefix needs to be communicated to all participating routers. This is not feasible for the global Internet, but it is used in small portions of the internetworked clouds.
The routing hierarchy comprises routing areas, routing domains and autonomous systems (AS), as shown in Fig. 7.
Fig. 7 Internet routing hierarchy
Each hierarchy level summarizes its internal reachability changes and communicates them as summary routes to the next upper level. This mode of operation reduces the size and frequency of routing updates between the routing hierarchy levels and makes large-scale use of CIDR with VLSM aggregation.
The lowest level of the hierarchy consists of routing areas, which are solely used to confine routing changes and errors to limited regions. This limits the reach of flooded information, reduces the convergence time after route changes and hides (dampens) routing changes that have only area-local significance. Routing areas are run by the same authority and operate identical routing protocols and policies. Summarized routes are exchanged between areas by so-called “area border routers (ABR)” or, more generally, “gateways”. The typical topology is a hub-and-spoke design, in which one backbone area connects all other areas within a routing domain.
Routing domains are operated consistently by a single authority and normally rely on a single internal routing protocol with the same set of metrics and routing decision policies. A network can still be regarded as a single routing domain if different internal routing protocols are used, as long as it provides a single and consistent routing behaviour to the outside network.
An authority can operate one or more routing domains and request a so-called “autonomous system number (ASN)” for registration in the global Internet. Each autonomous routing domain that has been assigned an ASN becomes an “Autonomous System (AS)”. ASes are therefore characterized by a 16 bit AS number (currently being transitioned to 32 bit [176]) and a unified internal administrative routing policy. RFC 1930 [86] gives guidelines for the creation, selection and registration of an AS.
AS-internal routing protocols are generally referred to as “Interior Gateway Protocols (IGP)”, and those interconnecting ASes are called “Exterior Gateway Protocols (EGP)”. Edge routers at AS interconnection points are referred to as “Autonomous System Border Routers (ASBR)”.
Fig. 8 gives an example of a typical Internet routing architecture.
Fig. 8 Internet routing architecture
Commonly found routing protocols in today’s networks are RIP (Routing Information
Protocol – version 1 [87] and 2 [130]), OSPF (Open Shortest Path First [140]), IS-IS
(Intermediate System to Intermediate System [106], [42]) and BGP (Border Gateway
Protocol version 4 [157]).
Two proprietary protocols, IGRP (Interior Gateway Routing Protocol [48], now obsolete) and EIGRP (Enhanced Interior Gateway Routing Protocol [50]), have also been used, but they are limited to networks that deploy Cisco routers exclusively.
The first inter-domain routing protocol, EGP (Exterior Gateway Protocol [138]), was limited
to a tree topology and is no longer in use.
IP routing protocols – applicability
Intra-domain routing – Interior Gateway Protocols (IGP):
- RIPv1 (obsolete)
- RIPv2
- IGRP (obsolete)
- EIGRP
- OSPF
- IS-IS
- iBGP (version 4)
Inter-domain routing – Exterior Gateway Protocols (EGP):
- EGPv3 (obsolete)
- eBGP (version 4)
Fig. 9 IP routing protocols – classified by applicability
The exchange of reachability information together with path characteristics can be classified into two major working principles: “distance vector routing (DV)” and “link state routing (LS)”. Smaller networks with less stringent convergence requirements and with routers of low processing power or low energy consumption will opt for DV protocols. Otherwise, link state protocols are required. A third principle, “path vector routing”, is a modified distance vector principle, which is currently only used by the Border Gateway Protocol. Fig. 10 gives an overview of this classification and typical examples.
Distance vector routing
All known routes and their associated cost (hop count) are periodically advertised to all neighbours within a broadcast/multicast domain. Each router in turn incorporates the reachable prefixes and the adopted costs into its routing table and sends this new table out to its topological neighbours.
Update processing makes use of the Bellman-Ford algorithm ([22], [78]), which minimizes the hop count in the route selection. The DV principle incurs a low processing load but considerable network traffic, and the evolving update dissemination is vulnerable to routing loops and long convergence times.
Since routers along the advertisement track cannot work out the original source of the information, this principle is also referred to as “routing by rumour”.
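The distance vector update described above can be sketched as a Bellman-Ford relaxation step over the neighbour's advertised vector. The sketch assumes a hop count metric with 16 as “infinity” (as in RIP). It omits timers, split horizon and other loop-prevention measures, and the function and table layout are hypothetical.

```python
INFINITY = 16  # hop count treated as "unreachable", as in RIP


def dv_update(table: dict, neighbour: str, advertised: dict, link_cost: int = 1) -> bool:
    """Bellman-Ford style merge of a neighbour's distance vector into the own table.

    table maps prefix -> (cost, next hop); advertised maps prefix -> cost as
    reported by the neighbour. Returns True if the table changed.
    """
    changed = False
    for prefix, cost in advertised.items():
        new_cost = min(cost + link_cost, INFINITY)
        current = table.get(prefix)
        if current is None:
            if new_cost < INFINITY:
                table[prefix] = (new_cost, neighbour)
                changed = True
        elif new_cost < current[0] or (current[1] == neighbour and new_cost != current[0]):
            # take a cheaper route, or believe the "rumour" from the current next hop
            table[prefix] = (new_cost, neighbour)
            changed = True
    return changed


routes = {"10.0.1.0/24": (0, "direct")}
dv_update(routes, "B", {"10.0.2.0/24": 0, "10.0.3.0/24": 2, "10.0.1.0/24": 1})
print(routes)
# {'10.0.1.0/24': (0, 'direct'), '10.0.2.0/24': (1, 'B'), '10.0.3.0/24': (3, 'B')}
```

The router never learns where a route originated. It simply adopts the cost its neighbour reports, which is exactly the “routing by rumour” property.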
Path vector routing
The major characteristic of this working principle is that the advertisement relay trail is recorded in the exchanged prefix reachability updates. In the inter-domain case, the trail record is the path of AS numbers the advertisement has passed through. Path vector routing can be combined with distance vector or other path selection mechanisms. The prominent example of path vector routing is the Border Gateway Protocol. Its advertisement and route selection process is governed by policy (filter) rules and covers numerous criteria. Reachable IP networks are announced to carefully selected neighbours and may be filtered out selectively, on a per-neighbour basis, by the above-mentioned policy rules.
The knowledge of the advertisement trail mitigates routing loops and enables route selection beyond the hop-by-hop scope. However, it still does not disclose precise network topology information.
The processing load depends on the filter complexity. Since path vector routing is mainly used in inter-domain setups, routing stability is preferred over fast restoration in case of resource failures. Hence, a slow convergence time is regarded as less critical.
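The loop check and the AS path based preference can be sketched as follows. The sketch assumes that AS paths are plain lists of AS numbers (AS_SET segments are ignored) and uses ASNs from the documentation range 64496-64511. Path length is only one of the criteria in the full selection process.

```python
def accept_route(own_asn: int, as_path: list[int]) -> bool:
    """Loop detection: drop advertisements whose AS path already contains the own ASN."""
    return own_asn not in as_path


def export_route(own_asn: int, as_path: list[int]) -> list[int]:
    """Prepend the own ASN when passing the route on to an external neighbour."""
    return [own_asn] + as_path


def shortest_as_path(candidates: list[list[int]]) -> list[int]:
    """Among loop-free candidates, prefer the shortest AS path."""
    return min(candidates, key=len)


received = [[64501, 64502, 64503], [64510, 64503], [64511, 64500, 64503]]
loop_free = [p for p in received if accept_route(64500, p)]
best = shortest_as_path(loop_free)
print(best, export_route(64500, best))  # [64510, 64503] [64500, 64510, 64503]
```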
Link state routing
Link state routers inform all other routers in their routing area about the status of their connected links and relay the flooded link state information of the other routers. LS routers maintain neighbour sessions with each other. After an initial phase in which all link states are exchanged, only changes are flooded within the area. In this way, each router incrementally obtains a complete view of the network’s topology and can compute the shortest path routing table from its own point of view. This computation uses Dijkstra’s “Shortest Path First (SPF)” algorithm [67]. The higher processing load saves network traffic and leads to faster convergence times. The flooded link state updates identify the originator of the information. This routing principle is therefore referred to as “routing by propaganda”.
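The following is a minimal sketch of Dijkstra's SPF computation on a link state database, using Python's `heapq` as the priority queue. The topology and link costs are invented. The returned first hop per destination stands in for the next hop entry of the routing table.

```python
import heapq


def shortest_path_first(lsdb, root):
    """Dijkstra SPF over a link state database.

    lsdb maps router -> {neighbour: link cost}. Returns router -> (cost, first hop),
    i.e. the information a router needs for its own routing table.
    """
    dist = {root: 0}
    result = {}
    heap = [(0, root, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in result:
            continue                      # stale entry, node already settled
        result[node] = (cost, first_hop)
        for neighbour, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            if neighbour not in result and new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                hop = neighbour if node == root else first_hop
                heapq.heappush(heap, (new_cost, neighbour, hop))
    return result


lsdb = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(shortest_path_first(lsdb, "A"))
# {'A': (0, None), 'B': (1, 'B'), 'C': (3, 'B'), 'D': (4, 'B')}
```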
IP routing protocols – working principle
Distance Vector (“routing by rumour”):
- RIPv1, RIPv2
- IGRP
- EIGRP (hybrid)
Path Vector:
- eBGP / iBGP
Link State (“routing by propaganda”):
- OSPF
- IS-IS
- EIGRP (hybrid)
Fig. 10 IP routing protocols – classified by working principle
Routing in the global Internet relies on the meshed interconnection of autonomous systems. Point-to-point connections are the obvious solution for interconnecting ASBRs. However, the vast majority of public interconnections are accomplished by means of Ethernet-based “Internet Exchange Points (IXP)”. As Fig. 11 indicates, hierarchical, redundant and mostly distributed switching clusters are common realizations of these critical peering hubs, each with several hundred interconnected ASes.
Fig. 11
Internet Exchange Point - IXP
2.2.2 Inter-domain routing using BGP
The Border Gateway Protocol is explained in more detail, since it is used exclusively for inter-domain routing and is of central importance for the inter-AS BGP-based signalling concept proposed in this thesis.
BGP is a so-called path vector protocol and distributes the reachability information of network prefixes together with associated attributes. An outstanding characteristic is the AS_PATH attribute, which records the list of relaying ASes for the respective reachability information. In this way, the recipient knows not only the neighbouring AS for a specific prefix, but the whole AS path that needs to be traversed in order to reach the announced network(s). The AS path is therefore used as an important metric for path selection (optimized for minimal path length) and for loop detection. Only advertised network prefixes whose path list does not include the receiving AS’s own number are accepted as valid route updates. BGP relies on TCP for reliable message exchange and sets up so-called “BGP sessions” between interconnected AS border routers. Each end point of a session is called a BGP peer. The BGP neighbour session establishment will only be successful if both parties configure the IP address and AS number of the respective peer in their router-internal BGP process. The Border Gateway Protocol distinguishes two horizons: internal BGP (iBGP), i.e. BGP peerings between edge routers within an AS, and external BGP (eBGP), i.e. BGP peerings between edge routers of adjacent ASes. Since an AS needs consistent knowledge of the reachable prefixes at all its edges, internal peers need to establish and maintain a full mesh of peering sessions. AS confederations and the concept of route reflectors are explained below. Both circumvent the scalability problem of the full mesh in large ASes, which requires $\frac{n \cdot (n-1)}{2}$ iBGP sessions for $n$ internal peers.
The exchanged reachability information is distributed across all BGP sessions, but it is filtered (at each ingress and egress) to apply strategic routing policy decisions. The Border Gateway Protocol is therefore a global interconnection protocol with routing policy enforcement. Each AS decides which of its own prefixes are advertised to which peering partner, which prefixes are accepted from external peers (ingress filter) and which selected best paths are advertised to which external peer (egress filter).
The best path selection procedure listed in Fig. 12 is vital for understanding how BGP selects the active paths out of the received available paths. The algorithm is applied when the same prefix is received several times. The decision points are processed in the given order, and the first differing criterion yields the decision.
One optional route processing extension is BGP Multipath, which allows several paths for a prefix to be committed to the local forwarding table. The best path is still worked out and announced to BGP peers, but multiple active paths are installed and used for node-local packet forwarding.
1. Prefer the path with the highest WEIGHT. (Cisco proprietary)
2. Prefer the path with the highest LOCAL_PREF.
3. Prefer the path that was locally originated via a network or aggregate BGP subcommand or through redistribution from an IGP.
4. Prefer the path with the shortest AS_PATH.
5. Prefer the path with the lowest origin type.
6. Prefer the path with the lowest multi-exit discriminator (MED).
7. Prefer eBGP over iBGP paths. If bestpath is selected, go to Step 9 (multipath).
8. Prefer the path with the lowest IGP metric to the BGP next hop. Continue, even if bestpath is already selected.
9. Determine if multiple paths require installation in the routing table for BGP Multipath. Continue, if bestpath is not yet selected.
10. When both paths are external, prefer the path that was received first (the oldest one).
11. Prefer the route that comes from the BGP router with the lowest router ID.
12. If the originator or router ID is the same for multiple paths, prefer the path with the minimum cluster list length. (for Route Reflector environment)
13. Prefer the path that comes from the lowest neighbour session IP address.
Fig. 12 BGP Best Path Selection Algorithm - [49]
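A simplified version of the decision process in Fig. 12 can be expressed as a lexicographic sort key. The sketch is an approximation under the following assumptions. MED is compared across all paths, whereas by default it is only compared between paths from the same neighbouring AS. Steps 9, 10, 12 and 13 are omitted. The class and field names are hypothetical.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}  # lower origin type preferred


@dataclass
class BgpPath:
    weight: int = 0                  # Cisco-specific, router-local
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "IGP"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0              # IGP cost to the BGP next hop
    router_id: str = "0.0.0.0"


def decision_key(p: BgpPath) -> tuple:
    """Smaller key = more preferred; follows steps 1-8 and 11 of Fig. 12 (simplified)."""
    return (
        -p.weight,                    # 1. highest WEIGHT
        -p.local_pref,                # 2. highest LOCAL_PREF
        not p.locally_originated,     # 3. locally originated first
        len(p.as_path),               # 4. shortest AS_PATH
        ORIGIN_RANK[p.origin],        # 5. lowest origin type
        p.med,                        # 6. lowest MED
        not p.ebgp,                   # 7. eBGP over iBGP
        p.igp_metric,                 # 8. lowest IGP metric to the next hop
        tuple(int(x) for x in p.router_id.split(".")),  # 11. lowest router ID
    )


def best_path(paths: list[BgpPath]) -> BgpPath:
    return min(paths, key=decision_key)


a = BgpPath(local_pref=200, as_path=(64501, 64502, 64503), router_id="192.0.2.2")
b = BgpPath(as_path=(64510,), router_id="192.0.2.1")
print(best_path([a, b]).router_id)  # 192.0.2.2: LOCAL_PREF is decided before AS_PATH length
```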
Five types of BGP messages are exchanged between peers:
• OPEN (initial setup of the peering session and exchange of protocol timer settings),
• UPDATE (reachability and withdrawal advertisements of network prefixes combined with path attributes; a complete update during initialization and triggered updates later on),
• NOTIFICATION (message that closes the BGP session, e.g. after an error has been detected),
• KEEPALIVE (periodic handshake message, sent if no UPDATE is sent) and
• ROUTE-REFRESH [47] (request for a complete reachability information exchange (refresh), e.g. for non-disruptive enforcement of policy changes).
All messages share the same structure, which is depicted in Fig. 13.
Fig. 13
BGP message structure
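All BGP messages start with the common header of Fig. 13: a 16 byte marker, a 2 byte length field and a 1 byte type field (RFC 4271). The following sketch builds and parses this header. Since a KEEPALIVE consists of the header alone, it is exactly 19 bytes long. The helper names are hypothetical.

```python
import struct

MARKER = b"\xff" * 16  # all-ones marker field
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE", 5: "ROUTE-REFRESH"}


def build_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prefix a message body with the common BGP header: marker, length, type."""
    length = 19 + len(body)           # 16 byte marker + 2 byte length + 1 byte type
    if length > 4096:
        raise ValueError("BGP messages are limited to 4096 bytes")
    return MARKER + struct.pack("!HB", length, msg_type) + body


def parse_header(message: bytes) -> tuple[int, str, bytes]:
    """Return (length, type name, body) of a BGP message."""
    if len(message) < 19 or message[:16] != MARKER:
        raise ValueError("invalid BGP message header")
    length, msg_type = struct.unpack("!HB", message[16:19])
    return length, MESSAGE_TYPES.get(msg_type, "unknown"), message[19:length]


keepalive = build_message(4)          # a KEEPALIVE consists of the header only
print(len(keepalive), parse_header(keepalive))  # 19 (19, 'KEEPALIVE', b'')
```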
The UPDATE message structure (see Fig. 15) in particular consists of a fixed length message header, a variable length withdrawn routes section, a variable length path attribute section, and a variable length “network layer reachability information (NLRI)” section. All advertised prefixes within the NLRI are “labelled” with the signalled attributes. Prefixes that require a different attribute association need to be sent in a separate UPDATE message. A new UPDATE message for previously advertised prefixes overrides the stored NLRI and attribute information at the receiving end. All attributes are classified by type numbers. The same type must not be included more than once in a message, and the type number is used as the ordering criterion.
Path attributes are classified into different categories, as shown in Fig. 14.
Path Attributes
- well-known mandatory: Origin, AS-Path, Next Hop
- well-known discretionary: Local Preference, Atomic Aggregate
- optional transitive: Aggregator, Community, Extended Community
- optional non-transitive: Multi-Exit-Discriminator (MED)
Fig. 14 BGP path attribute classification [46], [161]
Note that in the context of path attributes, “transitive” and “non-transitive” refer to
attribute propagation. An optional transitive attribute is relayed across the AS towards
the next neighbour, whereas an optional non-transitive attribute is terminated within the
receiving AS. A non-transitive attribute may thus be sent to a neighbouring AS, but it is
not propagated any further.
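In the encoding defined by RFC 4271, these properties are flag bits in the first byte of every path attribute. The Python sketch below, with illustrative function names, shows how the flags map onto the classes of Fig. 14 and what a BGP speaker does with an optional attribute that it does not recognize.

```python
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20
FLAG_EXTENDED_LENGTH = 0x10  # a two-byte length field follows


def classify(flags: int) -> str:
    """Map the attribute flags byte to the attribute classes of Fig. 14."""
    if not flags & FLAG_OPTIONAL:
        return "well-known"
    if flags & FLAG_TRANSITIVE:
        return "optional transitive"
    return "optional non-transitive"


def relay_unrecognized(flags: int):
    """Flags to propagate for an unrecognized optional attribute, or None if dropped."""
    if not flags & FLAG_OPTIONAL:
        raise ValueError("well-known attributes must be recognized")
    if flags & FLAG_TRANSITIVE:
        return flags | FLAG_PARTIAL  # passed on, marked as partial
    return None  # optional non-transitive: quietly ignored, not passed on
```

A COMMUNITY attribute (flags 0xC0) that a router does not implement is therefore forwarded with flags 0xE0, whereas an unrecognized MED (flags 0x80) is not propagated.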
Fig. 15
BGP UPDATE message structure – after [157]
The optional transitive community [46] and extended community [161] attributes have been
added to BGP for rather free-form signalling, e.g. of policy rule triggers or other mutually
agreed actions. Fixed-size community and extended community structures have been defined,
which are carried within the respective attribute type. Several communities can therefore
be embedded consecutively within a single (extended) community attribute.
Of particular interest for this thesis are extended communities, which are detailed as
follows. Extended communities have a fixed size of 8 bytes and consist of a type field
followed by the remaining community bytes. Extended community types are divided into
“regular types” (1-byte type field) and “extended types” (2-byte type field). The remaining
community value is therefore either 7 or 6 bytes long.
A type number registry for extended community types is administered by the Internet
Assigned Numbers Authority (IANA) [95], which ensures that no assigned regular type
number is the high-order byte of an assigned extended type number.
Extended communities are carried in the optional transitive extended community path
attribute. The communities themselves, however, are classified as “transitive” or
“non-transitive”. Note that in the context of communities, “transitive” and “non-transitive”
refer to whether a community may cross an AS border. That is, transitive extended
communities are relayed both internally and externally. Non-transitive extended
communities, however, must not cross any AS border. Although they are embedded in a
transitive attribute, they are by definition confined to the iBGP of a single AS.
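The following Python sketch illustrates the 8-byte structure and the border-crossing rule. It follows the RFC 4360 layout for a two-octet-AS-specific extended type (type byte, sub-type byte, 2-byte AS number, 4-byte local value), in which the bit 0x40 of the high-order type byte marks a community as non-transitive. The helper names are hypothetical, and the non-transitive community used in the usage example is purely illustrative.

```python
import struct

NON_TRANSITIVE_BIT = 0x40  # in the high-order type byte (RFC 4360)
TWO_OCTET_AS_SPECIFIC = 0x00
SUBTYPE_ROUTE_TARGET = 0x02


def encode_as_specific(subtype: int, asn: int, local_admin: int,
                       transitive: bool = True) -> bytes:
    """8-byte community: type, sub-type, 2-byte AS number, 4-byte local value."""
    type_high = TWO_OCTET_AS_SPECIFIC | (0 if transitive else NON_TRANSITIVE_BIT)
    return struct.pack("!BBHI", type_high, subtype, asn, local_admin)


def split_attribute(value: bytes) -> list:
    """An extended community attribute is a sequence of 8-byte communities."""
    if len(value) % 8:
        raise ValueError("attribute length must be a multiple of 8")
    return [value[i:i + 8] for i in range(0, len(value), 8)]


def may_cross_as_border(community: bytes) -> bool:
    return not community[0] & NON_TRANSITIVE_BIT


def export_to_ebgp(attribute_value: bytes) -> bytes:
    """Strip non-transitive communities before advertising to another AS."""
    kept = [c for c in split_attribute(attribute_value) if may_cross_as_border(c)]
    return b"".join(kept)
```

A route target for AS 65001 with local value 100 encodes as `0002fde900000064`; if it is carried in one attribute together with a non-transitive community, only the route target survives `export_to_ebgp`.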
Fig. 16 depicts the UPDATE message structure with the optional transitive extended community path attribute.
Fig. 16
BGP UPDATE message structure with Extended Community attribute
The scalability problem of fully meshed internal BGP sessions between border routers
mentioned above is addressed in two ways: “Route Reflectors” and “Confederations”.
BGP Route Reflector
BGP route reflection is defined in RFC 4456 [20].
If the BGP speakers are grouped into clusters, a hierarchy of route reflectors and route
reflector clients can be set up. Only the route reflectors still need to be fully meshed.
They relay incoming UPDATEs to their clients as relay nodes and advertise UPDATEs
originating within the cluster on behalf of their clients.
Fig. 17
BGP Route Reflector topology
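The saving can be quantified by simply counting iBGP sessions. The sketch assumes a single level of reflection, reflectors that are fully meshed among themselves, and each client peering with a configurable number of reflectors for redundancy; these are illustrative assumptions rather than requirements of RFC 4456.

```python
def full_mesh_sessions(n_routers: int) -> int:
    """iBGP sessions needed if every router peers with every other router."""
    return n_routers * (n_routers - 1) // 2


def route_reflector_sessions(n_clients: int, n_reflectors: int,
                             reflectors_per_client: int = 1) -> int:
    """Full mesh among the reflectors plus the client-to-reflector sessions."""
    return full_mesh_sessions(n_reflectors) + n_clients * reflectors_per_client
```

For 100 routers, a full mesh requires 4950 sessions, whereas 4 reflectors serving 96 clients, each attached to two reflectors, require only 6 + 192 = 198 sessions.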
BGP Confederation
Autonomous System Confederations for BGP are defined in RFC 5065 [172].
The main idea of the concept is to divide an AS internally into several member ASes.
These internal ASes are not visible externally, so the overall behaviour of the original AS
does not change. Internally, routing convergence and scalability are greatly enhanced
because routes are confined within such artificial (private) ASes. Special rules on
advertisement and attribute handling had to be specified to establish the right procedures
for confederation-internal eBGP and for the representation of the confederation towards
external ASes. One example is the AS path handling: private member AS numbers are
recorded in the AS path during confederation-internal signalling. Such private AS numbers,
however, must not cross the border of the original AS and need to be stripped off outside
the confederation.
Fig. 18 depicts the resulting internal topology of an example AS 4321.
Fig. 18
Autonomous System Confederations for BGP
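A minimal Python sketch of this AS path handling follows. It uses the segment types AS_CONFED_SEQUENCE and AS_CONFED_SET of RFC 5065 and ignores the limit of 255 AS numbers per segment. AS 4321 is taken from Fig. 18; the member AS 64512 and the external neighbour AS 64496 are hypothetical.

```python
# AS path segment types (RFC 4271, RFC 5065)
AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4


def prepend(path, segment_type, asn):
    """Prepend an AS number, extending the first segment if it has the same type."""
    if path and path[0][0] == segment_type:
        first_type, first_asns = path[0]
        return [(first_type, [asn] + first_asns)] + path[1:]
    return [(segment_type, [asn])] + path


def export_to_member_as(path, my_member_as):
    """Confederation-internal eBGP: record the private member AS number."""
    return prepend(path, AS_CONFED_SEQUENCE, my_member_as)


def export_to_external_as(path, confederation_id):
    """Leaving the confederation: strip member AS numbers, add the public AS number."""
    stripped = [seg for seg in path
                if seg[0] not in (AS_CONFED_SEQUENCE, AS_CONFED_SET)]
    return prepend(stripped, AS_SEQUENCE, confederation_id)
```

A route learned from AS 64496 carries `[(AS_CONFED_SEQUENCE, [64512]), (AS_SEQUENCE, [64496])]` inside the confederation, and leaves it as `[(AS_SEQUENCE, [4321, 64496])]`, so the member AS number is never seen externally.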
2.3 Router architecture
A router provides interconnection and relaying functionality between several inputs and
typically the same number of outputs. Major characteristics are the number of supported
ports and the achievable throughput. Fig. 19 depicts the general block diagram structure of
a router.
Fig. 19
IP router block diagram
2.3.1 Router control plane structure
The control plane of an IP router runs one or more IP routing protocol instances and
provides the corresponding routing update generation and processing functionality.
Command line and web-based interfaces allow direct access to the control instance for
configuration and monitoring. Each enabled routing protocol allocates storage and
processing resources according to its working principle. Fig. 20 depicts the typical block
structure of the control plane. Each protocol maintains its specific update information
storage (e.g. the link state database (LSDB)) and performs protocol-local route selection
algorithms. The resulting routing table information is stored in protocol-local routing
information bases (e.g. displayed with the “command line interface (CLI)” command
“show ospf route”). The route redistribution manager is a central building block, which
controls the route selection for the node-local routing table as well as the mutual exchange
of routing information between routing protocol instances. For instance, routes learned via
external BGP can be redistributed into OSPF to announce global connectivity, and vice
versa for prefixes originating inside that routing domain. The routes selected in this
process are stored in the node's local IP routing information base (displayed with the CLI
command “show ip route”). A protocol precedence, called “administrative distance (AD)”,
has been defined, which determines the precedence order of the protocol RIBs: the lower
the AD value, the more trusted the information source. In a final step, the condensed,
forwarding-relevant information is installed in the forwarding information base (FIB)
(displayed with the CLI command “show ip cef”). This FIB is often replicated in the input
port units for fast interface-local lookups.
Fig. 20
IP router internal structure -> route processing
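The interplay of protocol RIBs, administrative distance and FIB can be sketched in Python. The distance values are the Cisco IOS defaults and serve only as an example, since other vendors use other values; the data layout and function names are illustrative.

```python
import ipaddress

# Cisco IOS default administrative distances (vendor specific)
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ebgp": 20, "ospf": 110,
                  "isis": 115, "rip": 120, "ibgp": 200}


def build_rib(protocol_ribs):
    """{protocol: {prefix: next_hop}} -> {prefix: (protocol, next_hop)} by lowest AD."""
    rib = {}
    for protocol, routes in protocol_ribs.items():
        for prefix, next_hop in routes.items():
            best = rib.get(prefix)
            if best is None or ADMIN_DISTANCE[protocol] < ADMIN_DISTANCE[best[0]]:
                rib[prefix] = (protocol, next_hop)
    return rib


def build_fib(rib):
    """Keep only the forwarding-relevant data, longest prefixes first."""
    entries = [(ipaddress.ip_network(prefix), next_hop)
               for prefix, (_, next_hop) in rib.items()]
    return sorted(entries, key=lambda entry: entry[0].prefixlen, reverse=True)


def lookup(fib, destination):
    """Longest prefix match by linear scan (real FIBs use tries or TCAMs)."""
    address = ipaddress.ip_address(destination)
    for network, next_hop in fib:
        if address in network:
            return next_hop
    return None
```

If OSPF and eBGP both offer 10.0.0.0/8, the eBGP route (AD 20) wins over OSPF (AD 110), while a more specific OSPF route for 10.1.0.0/16 is still used for destinations inside that prefix.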
2.3.2 Router internal interconnection structure
This section looks more closely into the internal structure of the generalized block diagram
of Fig. 19. The internal relaying of IP packets between input and output port units can
be implemented using three major interconnection concepts:
- shared memory,
- bus interconnection and
- crossbar interconnection.
In the shared memory concept, packets received at the input port are stored in the
node-local shared main memory and read out again by the output port unit for transmission.
Input and output port units use direct memory access (DMA) for fast copy transactions to
and from the shared memory. A central processing unit performs the IP lookup and the IP
header update operations and informs the respective output port about the pending
transmission. The intermediate copy phase, with a write and a read access to the shared
memory for every packet, limits the achievable throughput, which in turn limits the number
of ports that can be served with such a structure.
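The limit can be estimated with a back-of-the-envelope model, assuming that every packet is written to and read from the shared memory exactly once and that all ports receive at line rate. The memory then has to sustain twice the aggregate port rate. The function names are illustrative.

```python
def required_memory_bandwidth_gbps(num_ports: int, port_rate_gbps: float) -> float:
    """Each packet is written once and read once: twice the aggregate port rate."""
    return 2 * num_ports * port_rate_gbps


def max_ports(memory_bandwidth_gbps: float, port_rate_gbps: float) -> int:
    """Largest number of line-rate ports that a given memory bandwidth can serve."""
    return int(memory_bandwidth_gbps // (2 * port_rate_gbps))
```

Sixteen 10 Gbit/s ports, for example, already require 320 Gbit/s of memory bandwidth, and a memory with 400 Gbit/s serves at most 20 such ports.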
The second implementation variant uses a common bus infrastructure between the port
units, and the packet data is transferred directly between the input and output units. Each
input port unit needs to be equipped with the forwarding table and a route processing unit,
which performs the lookup and header update operations. It independently determines the
output port unit and initiates the bus transfer. Hybrid solutions with bus interconnection for
frequently served prefixes and shared memory operation for error handling and complex
routing operations are possible. Avoiding the intermediate copy operation and processing
packets concurrently in the port units increase both the router's throughput and its
scalability.
The most commonly used implementation variant in commercial routers is the crossbar
interconnection between the input and output port units. A switching matrix, often referred
to as “switching fabric”, provides dynamically arranged interconnections between the
respective input and output units at any given point in time. Such switching fabrics are
classified into blocking and non-blocking cross-connect types.
If two or more input port units independently decide to forward packets towards different
output port units, a non-blocking switching fabric ensures the concurrent transfer of all of
these packets across the crossbar. Such non-blocking operation of the switching fabric can
be achieved using Clos' concept [57] of a multi-stage switching network. The clocked relay
of variable-length packets across the switching fabric is cumbersome. High-speed routers
therefore split the packets into small fixed-length units in the input port unit, which are
reassembled in the output port unit. These small chunks are called “cells” and have a size
of e.g. 64 bytes. The cells can be directed across the fabric either centrally by the switch
fabric controller or, more frequently in high-speed routers, as so-called “self-routed” cells.
The latter solution adds a short header, the so-called “routing tag”, to each cell, which
carries the short fabric-local address of the respective output port unit.
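Segmentation into fixed-size cells with a self-routing tag, and the reassembly in the output port unit, can be sketched as follows. The cell fields (routing tag, sequence number, last-cell flag, valid length) form a hypothetical layout chosen for illustration; real fabrics use vendor-specific cell formats. The 64-byte payload follows the example size given above.

```python
from dataclasses import dataclass

CELL_PAYLOAD = 64  # bytes, following the example size in the text


@dataclass
class Cell:
    routing_tag: int  # fabric-local address of the output port unit
    seq: int          # position of the cell within the packet
    last: bool        # marks the final cell of a packet
    length: int       # valid payload bytes (the last cell may be padded)
    payload: bytes


def segment(packet: bytes, output_port: int, size: int = CELL_PAYLOAD):
    """Input port unit: split a packet into padded fixed-size cells."""
    cells = []
    for seq, offset in enumerate(range(0, len(packet), size)):
        chunk = packet[offset:offset + size]
        cells.append(Cell(output_port, seq, offset + size >= len(packet),
                          len(chunk), chunk.ljust(size, b"\x00")))
    return cells


def reassemble(cells):
    """Output port unit: restore the packet from its cells."""
    ordered = sorted(cells, key=lambda c: c.seq)
    if not ordered or not ordered[-1].last:
        raise ValueError("incomplete packet")
    return b"".join(c.payload[:c.length] for c in ordered)
```

A 200-byte packet becomes four cells carrying 64, 64, 64 and 8 valid bytes, all tagged with the same output port, so that every stage of the fabric can forward them without consulting a central controller.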
2.3.3 Router internal queuing structure
In routers with blocking switching fabrics, as well as in non-blocking switching fabrics
under certain constellations of packet arrivals and common relay destinations, queuing
memory must be provided in the input port units for temporary packet storage. Whenever a
packet cannot traverse the fabric straight away, its servicing is delayed and the input port
queue starts to build up. If two or more input port units independently decide to forward a
packet towards the same output port unit at the same point in time, only one of them can
be served; the path across the fabric towards this output is blocked for the others.
If the arrival rate exceeds the servicing rate, dropping mechanisms must be in place to
handle queue overflow events.
Output port units also need to provide an output queue for line rate adaptation if the
aggregate rate of all traffic routed out of this interface exceeds the available sending rate.
This is particularly important for traffic bursts within the multiplexed streams.
If a blocked, postponed packet at the head of a queue holds back a subsequent packet that
could otherwise be served, this situation is called “Head of Line Blocking (HOLB)”.
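The throughput loss caused by HOLB can be estimated with a small slotted simulation. The sketch assumes a saturated N×N switch with a single FIFO queue per input, uniformly random packet destinations and random arbitration among inputs that contend for the same output. Under these assumptions, the classic analysis by Karol, Hluchyj and Morgan gives a throughput limit of 2 − √2 ≈ 58.6 % for large N, and the simulated values approach this limit as N grows. The function name and parameters are illustrative.

```python
import random


def hol_saturation_throughput(n_ports: int, slots: int = 10000, seed: int = 1) -> float:
    """Fraction of output capacity used when every input FIFO is always backlogged."""
    rng = random.Random(seed)
    # under saturation, only the destination of each head-of-line packet matters
    head = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(head):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)            # one packet per output per slot
            delivered += 1
            head[winner] = rng.randrange(n_ports)  # next packet moves to the head
        # losing inputs keep their blocked packet at the head (HOLB)
    return delivered / (slots * n_ports)
```

A single-port "switch" trivially reaches a throughput of 1.0, two ports reach about 0.75, and 32 ports already come close to the asymptotic limit of roughly 0.59.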
The vendor Cisco describes a multi-queue solution, called “virtual output queues (VOQ)”,
in its 12000 series Internet router architecture documentation [51]. The queue within each
input port unit is replicated once for each output port unit (plus one for multicast). This
way, each postponed packet can be held in the separate queue towards the currently
blocked output destination and gives way to the servicing of another queue of the same
input unit whose output is not blocked in that switching time slot.
Such an optimized high speed router structure is depicted in Fig. 21.
Fig. 21
IP router with non-blocking fabric and virtual output queues
3 Basic QoS aspects
3.1 Overview
The abbreviation “QoS” stands for “Quality of Service” and is extensively used in the
context of telecommunication systems. This wide applicability and usage leads to a lack of
common understanding of what QoS means in a given context. Neither “Quality” nor
“Service” is a standardized term, so both need to be defined in each case.
In packet-switched data networks, the term QoS refers to packet forwarding
characteristics (transmission speed, delay, jitter, packet dropping probability and packet
distortion rate). If such forwarding parameters are quantified by fixed values, this is referred
to as “absolute QoS”.
On the other hand, “relative QoS” distinguishes different kinds of packets or packet flows,
which are treated differently in order to achieve prioritized forwarding characteristics. No
absolute parameter values are applied, but priorities are assigned to the distinguished
kinds of packetized traffic.
“Coarse-grained QoS” combines both QoS types in such a way that not single packets or
packet flows, but groups or classes of traffic are associated with a fixed set of
characteristic parameters. Traffic separation and tunnelling techniques are common ways to
support coarse-grained QoS.
The ability of a packet data network to deliver data packets in time to the right destination,
with acceptable loss and distortion rates and small transfer delay variations, is the main
technical criterion for judging the Quality of (transmission) Service. These technically
measurable parameters, however, do not easily relate to the application service quality
experienced by the users at both end points of the networking system. The term
“Quality of Experience (QoE)” addresses the end user's quality experience of the services,
which rely on the underlying interconnected networks and their QoS in terms of
transmission quality. The mapping rules between measurable transfer QoS parameters and
the resulting QoE are application specific and still under development. Fast-evolving
application services and QoS-adaptive application implementations create a closed control
loop, which obscures the mapping. Quality of Experience is therefore addressed indirectly,
but it is not the focus of this work.
The following sections give an overview of the QoS types, their requirements, building
blocks and scopes.
3.1.1 Relative vs. absolute vs. coarse-grained QoS
There are three general ways to provide a distinct and sufficiently good quality of
transmission. The term “sufficiently” is appropriate, since quality requirements are
application service specific and a networking system that just fulfils those requirements is
“good enough” for the particular service. Any transmission quality provided in excess of
those requirements might be regarded as waste. However, such waste is justified if the
capital and operational expenditures (“CAPEX” and “OPEX”) for QoS control outweigh the
economic gain of a more highly utilized network.
Over-provisioning
The first and easiest way to achieve a sufficiently good quality of transmission is
over-provisioning, which is also referred to as “over-engineering”. In this case, the
transmission capacity (often incorrectly referred to as “bandwidth”) far exceeds the
sustained transmission requirements of any given service.
Link utilizations of less than 40% are common in over-provisioned networks, which ensures
lightly loaded network components, hardly filled input and output queues in relaying nodes
and thus fast, low-latency and low-loss transmission for all services. As long as technical
solutions and the economic trade-off between service revenue and the capital and
operational expenditures for transmission equipment allow for an over-provisioning
business case, it is the easiest way to achieve sufficiently good transmission quality.
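Why low utilization keeps queues nearly empty can be illustrated with the textbook M/M/1 queue. This model assumes Poisson packet arrivals and exponentially distributed packet sizes, which real traffic does not strictly follow, so the numbers are indicative only. The mean time a packet spends in the node grows as 1/(1 − ρ) with the utilization ρ. The packet size and link rate below are illustrative defaults.

```python
def mm1_mean_delay_us(utilization: float, packet_bytes: int = 1500,
                      link_gbps: float = 10.0) -> float:
    """Mean time in system (queueing plus transmission) of an M/M/1 queue in us."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    service_time_us = packet_bytes * 8 / (link_gbps * 1e3)  # 1 Gbit/s = 1000 bit/us
    return service_time_us / (1 - utilization)


for rho in (0.4, 0.9, 0.99):
    print(f"rho={rho}: {mm1_mean_delay_us(rho):.1f} us")
```

With 1500-byte packets on a 10 Gbit/s link, the mean delay is about 2 µs at 40 % utilization, about 12 µs at 90 % and about 120 µs at 99 %, which illustrates why over-provisioned links hardly ever build up queues.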
The ease of planning and network operation (configuration, debugging etc.) as well as the
resulting network stability are major arguments in favour of this approach.
Over-provisioning works equally well for connection-oriented network operation, where
flows of packets are signalled to the network beforehand, and for connection-less
transmission of datagrams.
Relative QoS through Prioritization
As outlined before, the required sufficiently good quality of transmission is application
service specific and therefore allows for a differentiated quality of transmission. As long
as each application service receives sufficiently good transmission quality, this is a
completely equivalent QoS solution.
This differentiation, however, requires control overhead in the form of packet classification,
marking and forwarding treatment rules in each node. The term “relative” reflects the fact
that application service packets no longer receive the same relaying treatment: some are
forwarded preferentially, while others are delayed or even dropped.
Networks with differentiated forwarding treatment support both connection-oriented and
connection-less operation, as long as the association with a traffic class can be derived
from the information carried in the packets/datagrams. Packet marking is common in
networks with relative QoS support, but the same forwarding treatment can also be
derived from an association between combinations of packet header fields and the
required treatment.
The relatively high transmission quality of the prioritized application services comes at the
expense of a lower transmission quality for low-priority services, which ideally can still
cope with the resulting quality.
Relative QoS solutions are only justified if capacity limitations or the economic trade-off
between service revenue and CAPEX and OPEX require selective service discrimination.
Absolute QoS through Reservation
Transmission quality in terms of guaranteed quantified parameter limits for throughput,
loss, delay and delay variation can only be safely achieved by means of resource
reservation with admission control and limited overbooking.
The prerequisites for reservation approaches are as follows. Packet flows need to be
identified as belonging to a pre-established reservation state. All relaying nodes need to
be signalled about resource requests and have to admit them and reserve the granted
resources. They need to keep track of the flows' resource usage in order to detect excess
traffic and to make informed decisions about new incoming reservation requests.
Furthermore, the edge nodes, or even every node, need to implement admission control
functions in order to limit excess traffic and to check compliance with the granted
reservations.
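Admission along a path can be sketched as per-link bookkeeping in which every relaying node must admit the request, and a partial reservation is rolled back if any link along the route rejects it. The class and function names are hypothetical, flow identifiers are assumed to be unique, and the optional overbooking factor stands for the limited overbooking mentioned above.

```python
class LinkAdmission:
    """Bookkeeping of reserved rates on one outgoing link."""

    def __init__(self, capacity_mbps: float, overbooking: float = 1.0):
        self.limit = capacity_mbps * overbooking
        self.reserved = {}  # flow id -> reserved rate in Mbit/s

    def request(self, flow_id, rate_mbps: float) -> bool:
        if sum(self.reserved.values()) + rate_mbps > self.limit:
            return False  # admission denied: the link would be overbooked
        self.reserved[flow_id] = rate_mbps
        return True

    def release(self, flow_id) -> None:
        self.reserved.pop(flow_id, None)


def admit_path(links, flow_id, rate_mbps: float) -> bool:
    """Reserve on every link of the route or on none of them."""
    granted = []
    for link in links:
        if not link.request(flow_id, rate_mbps):
            for done in granted:
                done.release(flow_id)  # roll back partial reservations
            return False
        granted.append(link)
    return True
```

On a route over a 100 Mbit/s and a 50 Mbit/s link, a first 40 Mbit/s flow is admitted; a second 20 Mbit/s flow is rejected by the 50 Mbit/s link, and its tentative reservation on the first link is released again.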
Usage metering needs to account for average rates as well as instantaneous peak
limitations. In order to prevent false alarms caused by burst clustering when packets are
multiplexed in relaying nodes, traffic shapers can smooth out bursts at the expense of
increased transfer delay and delay variation.
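Average-rate and burst metering is commonly implemented with a token bucket. The sketch below is a single-rate variant with illustrative names; a second bucket with a higher rate and a smaller depth could be added to enforce an instantaneous peak rate. Non-conforming packets could be dropped, re-marked or, in a shaper, held back until enough tokens have accumulated.

```python
class TokenBucket:
    """Meter for an average rate with a bounded burst size."""

    def __init__(self, rate_bytes_per_s: float, depth_bytes: float):
        self.rate = rate_bytes_per_s
        self.depth = depth_bytes
        self.tokens = depth_bytes  # the bucket starts full
        self.last = 0.0

    def conforms(self, now: float, size_bytes: int) -> bool:
        # refill according to the elapsed time, capped at the bucket depth
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # excess traffic: drop, re-mark or delay (shaping)
```

With a rate of 1000 bytes/s and a depth of 1500 bytes, a 1500-byte packet at t = 0 conforms, a second one at the same instant does not, and after 1.5 s the bucket has refilled enough for the next one.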
Networks with resource reservation in the forwarding path require a connection-oriented
type of operation. The signalling of resource requests, the resource grants along the route
and the setup of admission control, traffic metering and possibly shaping units need to be
arranged during connection setup, prior to the actual packet transfer phase.
The guaranteed high transmission quality with absolute parameter limits for selected
packet flows of application services comes at the expense of service discrimination for
low-priority (non-reserved) services, which ideally can still cope with the resulting quality.
Absolute QoS solutions are only justified if application services or the economic trade-off
between service revenue and CAPEX and OPEX require service guarantees and make up
for the high control overhead in every node.
QoS parameter guarantees can even be given in over-provisioned networks. Such
commitments are made at the operator's risk and can be met because the network is lightly
loaded.
All three general QoS approaches are often used in combination. Depending on the
carried traffic type and the available link capacity, traffic separation with prioritization, trunk
reservation and flow reservation can be applied concurrently.
Coarse-grained QoS
The level of granularity of absolute or relative QoS is addressed by the “coarse-grained”
QoS approach. Single-flow reservations with absolute QoS guarantees are targeted by
the Resource Reservation Protocol, as described in 4.1.2. This is clearly a fine-grained
approach and far too detailed for core network nodes, where thousands of traffic flows are
aggregated in the stream of IP packets passing through. The Differentiated Services
approach, with traffic being classified into traffic classes, is a feasible QoS concept for
large-scale networking. However, up to 64 classes can be distinguished with the 6-bit DSCP
marking codepoints. This still appears too sophisticated for global-scale internetworking,
where currently no traffic differentiation is used.
The traditional IP precedence approach provided 8 traffic markings within the 3 precedence
bits of the TOS field (see Fig. 1). Although traffic separation into two, three or four classes
is expected to be a sufficient level of differentiation, the term “coarse-grained QoS” is taken
to comprise no more than 8 traffic classes.
This coarse traffic separation is easily configured in relaying nodes, enables class based
tunnelling as described in 3.2.3 and allows for bundle reservations onto the few separated
(and possibly tunnelled) traffic classes.
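The relation between the 6-bit DSCP, the 3 precedence bits and the 2 ECN bits within the former TOS byte (in IPv6, the Traffic Class byte) can be shown with a few bit operations. Since the precedence bits are the three leading DSCP bits, every DSCP value falls into one of at most 8 coarse classes. The function name is illustrative.

```python
def split_tos(tos: int):
    """Split the former TOS byte into (DSCP, ECN, IP precedence)."""
    dscp = tos >> 2          # upper 6 bits
    ecn = tos & 0b11         # lower 2 bits
    precedence = dscp >> 3   # three leading DSCP bits = IP precedence
    return dscp, ecn, precedence
```

Expedited Forwarding (DSCP 46, TOS byte 0xB8) thus maps to precedence 5, and AF41 (DSCP 34) maps to precedence 4.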
3.1.2 QoS building blocks
The quality of service that a packet perceives during the forwarding process along a given
path from one communication end point to the other depends on a number of forwarding
decisions along the way. The major decision points concern the path the packet takes
through the network cloud(s), the mix of equivalently forwarded packets and the per-hop
forwarding behaviour in each relaying node.
Since the QoS treatment scope is addressed in chapter 3.2, the following paragraphs will
address the node local QoS-related treatment mechanisms in detail.
The forwarding of packets in a single node is shown in Fig. 22 and characterized by:
- (1) the classification,
- (2) the routing decision,
- (3) the input enqueuing with (4) dropping,
- (5) the input queue scheduling,
- (6) the fabric transmission,
- (7) the output enqueuing and
- (8) the output queue scheduling.
Fig. 22
Router internal forwarding path per hop behaviour
Each of the decision and treatment steps will be looked at more closely in the following
paragraphs.
Fabric transmission
The node-internal relay of packets (packet cells) is predominantly performed in non-blocking
crossbar structures operating in a contention-free way. It therefore does not negatively
influence the overall QoS characteristic of the router.
Routing strategies
Differentiated forwarding of packets along differing routes is a major QoS treatment, which
is addressed in chapter 3.2.2. It is the result of collaborative routing decisions and FIB
setups among a set of routers and does not directly relate to node internal QoS strategies.
Enqueuing strategies (input and output)
The temporary storage of packets between arrival and transmission is the first major
internal QoS treatment strategy. It depends on the number of queues supported in the input
and output port units as well as on the sorting criteria for them.
This work regards the availability of separate queues for distinguished packet types as the
most basic prerequisite for QoS support.
Whether the queues are implemented as separate hardware components or virtually
distinguished in software within one hardware component is irrelevant for the experienced
separation behaviour.
The number of queues and the possibly varying queue lengths are important QoS
parameters.
The enqueuing itself is a mapping operation between the distinguished types of traffic and
the available set of queues. Routers will ideally match the classification of incoming traffic
with the number of available queues for separate enqueuing.
Output enqueuing follows this decision and performs the mapping based on classification
information carried either in the packet header or in the routing tag.
Classification strategies
If QoS is supported in a node, there needs to be a classification block in the input port unit,
which sorts the incoming traffic into distinguished traffic classes. The easiest way of
operation is based on the DSCP class markings in the IP header, as depicted in Fig. 5. This,
however, requires a class mapping (grouping) operation between the potentially 64
available DSCP classes and the actually provided enqueuing classes. Such class mappings
are addressed in chapter 8.
If DSCP markings are missing or not trusted, the classification can be based on any
combination of IP packet header information. However, since traffic separation is most
likely performed according to application requirements, a more sophisticated multi-layer
classification is performed. In particular, the port information of Transmission Control
Protocol (TCP) and User Datagram Protocol (UDP) packets reveals the type of traffic being
transported within the packet payload.
It is at the operator’s discretion to define the level of deep packet inspection and to
configure the resulting classification rules.
Current commercial router equipment provides input port units (line cards), which allow for
hardware based line rate deep packet inspection even within the TCP and UDP payload.
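A minimal sketch of such a classification block follows. It assumes four output queues (0 = highest priority) and a hypothetical operator policy: trusted DSCP markings are grouped via their three precedence bits, otherwise well-known destination ports decide. Both mapping tables are invented for illustration and do not represent a recommended configuration.

```python
# hypothetical operator policy with four queues, 0 = highest priority
PRECEDENCE_TO_QUEUE = {7: 0, 6: 0, 5: 1, 4: 2, 3: 2, 2: 3, 1: 3, 0: 3}
PORT_TO_QUEUE = {53: 2, 80: 3, 443: 3}  # DNS, HTTP, HTTPS
DEFAULT_QUEUE = 3


def classify(tos: int, protocol: str, dst_port: int, trust_dscp: bool) -> int:
    """Return the queue index for a packet."""
    if trust_dscp:
        return PRECEDENCE_TO_QUEUE[tos >> 5]  # group 64 DSCPs via precedence
    if protocol in ("tcp", "udp"):
        return PORT_TO_QUEUE.get(dst_port, DEFAULT_QUEUE)  # multi-layer rule
    return DEFAULT_QUEUE
```

An EF-marked packet (TOS 0xB8) lands in queue 1 if markings are trusted, whereas the same packet arriving on an untrusted interface is classified by its transport port only.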
Dropping strategies
Because of the burstiness of multiplexed packets, queues can quickly fill up and might
overflow. Normally, packets are then dropped at the tail of the queue, as shown in Fig. 23.
This is called “Drop-Tail queuing”.
Fig. 23
Drop-Tail queue dropping strategy
However, two optimization strategies are common: early dropping of packets as a
congestion indication for responsive transport protocols such as TCP, and selective
(preferential) dropping of less important packets if traffic separation is in place.
TCP interprets any lost packet as a congestion indication and reduces its sending rate
accordingly. The respective congestion window mechanism therefore tries to mitigate
queue overflows in bottleneck nodes. However, with tail drop, the packet loss informs the
sender about the congestion far too late, which leads to slow TCP reaction times.
Sally Floyd and Van Jacobson proposed the so-called “Random Early Detection (RED)”
mechanism [77], which randomly drops packets from a filling queue in order to give an early
congestion indication. The dropping behaviour is divided into three regions, as depicted in
Fig. 24. As soon as the minimum threshold is crossed, the dropping probability of each
enqueued packet rises as a function of the calculated average queue occupancy.
Based on this, the congestion indication to each sender is roughly proportional to the
sender's share of the transmission capacity.
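The three regions of Fig. 24 and the averaging of the queue size can be sketched as follows. Parameter names follow common RED terminology (min_th, max_th, max_p, weight), and the default values are illustrative. The refinement of the original algorithm that spreads drops more evenly by counting packets since the last drop is omitted for brevity.

```python
import random


def update_average(avg: float, queue_len: int, weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue occupancy."""
    return (1 - weight) * avg + weight * queue_len


def drop_probability(avg: float, min_th: float, max_th: float, max_p: float) -> float:
    if avg < min_th:
        return 0.0  # region 1: no early drops
    if avg >= max_th:
        return 1.0  # region 3: every arriving packet is dropped
    return max_p * (avg - min_th) / (max_th - min_th)  # region 2: linear rise


def red_drop(avg, min_th=5, max_th=15, max_p=0.1, rng=random) -> bool:
    """Random drop decision for one arriving packet."""
    return rng.random() < drop_probability(avg, min_th, max_th, max_p)
```

With min_th = 5, max_th = 15 and max_p = 0.1, an average queue of 10 packets yields a dropping probability of 5 %, while the small weight makes the average follow short bursts only slowly.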
There are several flavours of RED available, which will be briefly mentioned here.
“Dropping from the front”: Since congestion indication is time critical, the randomized
dropping decision should target packets at the head of the queue instead of the tail. This is
particularly important if traffic flows are accounted for within the queue management. That
is, if a packet is selected for dropping, flow-optimized congestion indication drops an earlier
enqueued packet of the same flow instead.
“Congestion marking instead of dropping”: The deficient, loss-based congestion detection of
TCP has been tackled by the “Explicit Congestion Notification (ECN)” and “Pre-Congestion
Notification (PCN)” extensions, which are optionally available in IP networks. End points
signal their ECN capability via the two-bit ECN field in the IP header (see Fig. 5), and
relaying routers can mark this field as “Congestion Experienced” to indicate upcoming
congestion. This forward congestion notification avoids the actual packet drop (and the
resulting retransmission) and, with backward signalling by the receiver, achieves exactly the
same sending rate reduction effect as before. According to Fig. 24, all enqueued packets
above the maximum threshold are ECN marked.
Fig. 24
Random Early Detection (RED) for congestion avoidance
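Following the codepoints of RFC 3168 (Not-ECT, ECT(1), ECT(0) and CE), a congested router that would otherwise drop a packet can instead mark it if the end points declared ECN capability. A minimal sketch with an illustrative function name:

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11  # two-bit ECN field


def on_congestion(tos: int):
    """Return ('drop', tos) for non-ECN traffic, else ('mark', tos with CE set)."""
    if (tos & 0b11) == NOT_ECT:
        return "drop", tos
    return "mark", tos | CE  # Congestion Experienced
```

A packet sent with ECT(0) and DSCP EF (TOS 0xBA) leaves the congested router as 0xBB instead of being discarded, while packets without ECN capability are still dropped.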
Pre-Congestion Notification is currently being defined by the IETF pcn working group and
reuses the ECN bits under certain conditions. PCN markings are only applied to IP packets
that carry the so-called “DSCP for Capacity-Admitted Traffic” value [17].
These markings are used for domain-internal admission and flow control between the
ingress and egress routers of a PCN domain. That is, PCN-enabled network domains
differentiate DSCP-marked traffic flows and ensure a smooth transport of admitted flows
even under high load conditions. This is achieved by blocking new flows entering the
domain or even by terminating some existing ones.
A plethora of other dropping strategies is available for operators to choose from when
configuring their per-hop dropping behaviour.
A simple but efficient flow-based dropping scheme, called “Longest Queue Drop (LQD)”
[170], can be applied if flows are recorded, equally weighted and each virtually queued in a
separate queue. Since each queue's length reflects the flow's usage of the link capacity,
the simple scheme of dropping packets from the front of the longest flow queue degrades
each flow according to its link share.
Fig. 25
Longest Queue Drop (LQD) of virtually separated flows
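A sketch of LQD on a shared buffer with one virtual queue per flow, under the assumptions stated above (equally weighted flows, buffer size counted in packets). The class name is illustrative, and ties between equally long queues are broken in favour of the flow seen first.

```python
from collections import deque


class LongestQueueDrop:
    """Shared buffer with one virtual queue per flow."""

    def __init__(self, buffer_packets: int):
        self.capacity = buffer_packets
        self.queues = {}  # flow id -> deque of packets
        self.size = 0

    def enqueue(self, flow, packet):
        """Add a packet; on overflow return the (flow, packet) that was dropped."""
        self.queues.setdefault(flow, deque()).append(packet)
        self.size += 1
        if self.size <= self.capacity:
            return None
        longest = max(self.queues, key=lambda f: len(self.queues[f]))
        self.size -= 1
        return longest, self.queues[longest].popleft()  # drop from the front
```

With a buffer of four packets holding three packets of flow a and one of flow b, the next arrival of flow b causes the oldest packet of flow a to be dropped, because flow a currently uses the largest share.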
Lastly, the “Weighted Random Early Detection (WRED)” strategy is widely used for
differently prioritized packet dropping behaviour (e.g. in [54]). Weighted RED combines
standard RED detection with prioritized dropping behaviour based on DSCP class
markings. The vendor Cisco, for instance, concentrates on the three leading DSCP bits and
performs TOS-conformant IP precedence prioritization (see Fig. 5).
As a result, the RED minimum and maximum thresholds are replicated and configured
differently for the distinguished classes of packet traffic. Low-priority packets are therefore
subject to lower threshold settings and a resulting higher dropping probability.
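WRED can be sketched by giving each IP precedence value its own RED threshold pair. In the sketch, the thresholds are derived from the precedence by an invented rule (they are not Cisco's default values), and the average queue size is assumed to be computed as in plain RED.

```python
def wred_profile(precedence: int):
    """Hypothetical thresholds in packets: higher precedence, later dropping."""
    return 10 + 4 * precedence, 40, 0.1  # (min_th, max_th, max_p)


def wred_drop_probability(avg: float, precedence: int) -> float:
    min_th, max_th, max_p = wred_profile(precedence)
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)
```

At an average queue size of 20 packets, precedence 0 traffic is already dropped with a probability of about 3.3 %, while precedence 5 traffic is not yet affected at all.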
Congestion indication via dropping is only fully effective for responsive transport protocols
such as TCP; for unresponsive traffic, it merely limits the respective share of the link.
Scheduling strategies (input and output)
Stages (5) and (8) in Fig. 22 concern servicing decisions when several queues are waiting
for transmission. Output port units only provide a set of queues if they distinguish between
traffic classes with associated forwarding priorities. Otherwise, only a single queue is in
place for incoming and outgoing rate adaptation, with a simple “First In, First Out (FIFO)”
scheduling strategy.
Input port units provide several queues for the N available output paths to prevent HOLB
and might additionally distinguish C traffic classes in separate queues per path, resulting in
a maximum set of C × N queues.
There are several strategies available, which decide on the servicing order of the queue
set.
The easiest algorithm is called “Round Robin (RR)”, which services the queues in turn,
one enqueued element at a time.
Fig. 26
Round Robin scheduling
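Round Robin servicing can be sketched as a generator that visits the non-empty queues in turn and takes one packet from each. The function name and the representation of queues as lists of packet labels are illustrative.

```python
from collections import deque


def round_robin(queues):
    """Yield packets, visiting the non-empty queues in turn, one packet each."""
    queues = [deque(q) for q in queues]
    while any(queues):
        for queue in queues:
            if queue:
                yield queue.popleft()
```

For the queues `["a1", "a2", "a3"]`, `["b1"]` and `["c1", "c2"]`, the service order is a1, b1, c1, a2, c2, a3.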
If packets are prioritised and require differentiated scheduling behaviour, the most
stringent algorithm is “Strict Priority (SP)” servicing. Here, queues of lower priority are
only serviced if all higher-priority queues are empty at that time (see Fig. 27).
Fig. 27
Strict Priority scheduling
The strict servicing paradigm can, however, starve low-priority queues completely,
which is not intended.
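The starvation effect is easy to reproduce in a slotted model with illustrative names. The model assumes that the link can transmit exactly one packet per slot and that a fixed number of packets arrive at each queue in every slot.

```python
def strict_priority(arrivals_per_slot, n_slots):
    """Packets sent per queue (index 0 = highest priority), one packet per slot."""
    backlog = [0] * len(arrivals_per_slot)
    sent = [0] * len(arrivals_per_slot)
    for _ in range(n_slots):
        for i, arrivals in enumerate(arrivals_per_slot):
            backlog[i] += arrivals
        for i, waiting in enumerate(backlog):
            if waiting:  # the first non-empty queue is served
                backlog[i] -= 1
                sent[i] += 1
                break
    return sent
```

If the high-priority queue receives one packet in every slot, `strict_priority([1, 1], 100)` returns `[100, 0]`: the low-priority queue is never served.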
A plethora of modified round robin and priority scheduling strategies has been developed,
which all aim for some fairness in the scheduling process.
The major flavours called “Weighted Round Robin”, “Deficit Round-Robin”, “Weighted Fair
Queuing” and “Self Clocked Fair Queuing” will be briefly described below.
“Weighted Round Robin (WRR)” is the simplest approximation of the so-called “generalized
processor sharing (GPS)” model and aims to grant each scheduled queue a link share
according to its priority. Since WRR can only service whole packets (more often cells), the
granularity of the priority-guided link share fairness depends on the served packet sizes.
The major difference to strict priority scheduling is that WRR does not necessarily empty
the high-priority queue before advancing to the lower one, but rather calculates the number
of packets of a certain queue that should be serviced in this round, depending on the
queue's link share priority. The number of packets dequeued at a time is the queue's
priority divided by the lowest priority of any queue.
Fig. 28
Weighted Round Robin scheduling
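One WRR round following the rule stated above (packets per round = queue priority divided by the lowest priority) might be sketched as follows, assuming integer priorities; non-integer ratios are simply truncated here, and `wrr_round` is an illustrative name:

```python
from collections import deque


def wrr_round(queues, priorities):
    """One Weighted Round Robin (WRR) round.

    Queue i may release priorities[i] // min(priorities) packets per round
    (integer division, so non-integer ratios are truncated in this sketch).
    Every backlogged queue gets its turn in each round, unlike SP.
    """
    lowest = min(priorities)
    sent = []
    for q, prio in zip(queues, priorities):
        for _ in range(prio // lowest):
            if not q:
                break
            sent.append(q.popleft())
    return sent


high = deque(["H1", "H2", "H3", "H4", "H5"])
low = deque(["L1", "L2"])
print(wrr_round([high, low], [3, 1]))  # ['H1', 'H2', 'H3', 'L1']
print(wrr_round([high, low], [3, 1]))  # ['H4', 'H5', 'L2']
```

The low priority queue is served in every round, so it cannot be starved as under strict priority.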
“Deficit Round-Robin (DRR)” and “Weighted Deficit Round-Robin (WDRR)” are modifications of the two aforementioned strategies (RR and WRR, respectively). The dequeuing granularity is spread across multiple dequeuing rounds, and the difference between the calculated dequeuing share in bytes and the share actually used up by dequeued packet bytes is accounted for. Unused granted bytes are added to the next round’s share, so that a delayed packet is eventually dequeued once the accumulated granted byte count covers its size. The accumulated difference count is called the “deficit count”. The number of bytes granted in each round is called the “quantum” and is either fixed or weighted by the queue’s priority.
All fair queuing approaches follow the idea of an ideally fair mixing of traffic streams with infinitesimally fine granularity. This scheme is called “Generalized Processor Sharing (GPS)” and can be symbolized as in Fig. 29.
Fig. 29
Symbolized fair queuing in an idealized GPS = Fluid-Flow Queuing
“Weighted Fair Queuing (WFQ)” [66], [149] is a Packet-by-packet Generalized Processor Sharing (PGPS) approach, which takes into account that the entire unit of traffic (packet) must be served in a dequeue process. The fluid-flow queuing ideal is approximated by internally simulating the theoretical service finishing time of each scheduled queue and ordering the actual servicing of the packets according to this finishing time order. The difference between the simulated and the actual finishing time is a metric for the service degradation (unfairness) experienced by the respective queue.
Fig. 30
Fluid-flow approximated queuing in WFQ
The scheduling strategy called “Class-Based Weighted Fair Queuing (CBWFQ)” is a modified WFQ in the sense that it explicitly assigns traffic classes to queues and configures different weights and queue lengths for the set of serviced classes.
The calculation of the finishing time in fluid-flow simulations is based on actual (absolute) time steps and is independent of the enqueuing or dequeuing events processed in a scheduler, which makes it computationally expensive. This absolute time scale is therefore replaced by an internally generated virtual time scale. The packet stream becomes “self-clocked”, hence the name “Self Clocked Fair Queuing (SCFQ)” [84].
The self-clocked virtual finishing time $F_i^k$ of the $k$-th packet of queue $i$ is a function of the following parameters:

$$F_i^k = f\left(L_i^k,\; F_i^{k-1},\; v(a_i^k)\right)$$

with $L_i^k$ the packet length, $F_i^{k-1}$ the virtual finishing time of the previous packet of the same queue and $v(a_i^k)$ the system’s virtual time at the packet arrival time $a_i^k$.
In practice this can be implemented by stamping an arriving packet with a service tag
equal to the corresponding virtual finishing time and servicing the packets in ascending
order of service tags.
Linking the virtual time to the work progress in the actual packet-based queuing system is a fast and computationally simple operation, which closely approximates the packet ordering of the original WFQ approach. SCFQ is therefore a preferred implementation variant of WFQ-like scheduling strategies.
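The service tag stamping can be sketched as below. The text only states that the tag is a function of the three parameters; this sketch assumes the tag rule commonly cited for SCFQ, $F_i^k = \max(F_i^{k-1}, v(a_i^k)) + L_i^k/\phi_i$ with queue weight $\phi_i$, where the virtual time $v$ is the tag of the packet most recently taken into service. The class name `SCFQ` is illustrative, and the reset of the virtual time at the end of a busy period is left out:

```python
import heapq


class SCFQ:
    """Self Clocked Fair Queuing sketch (single busy period).

    Assumed tag rule: F = max(F_prev_same_queue, v) + length / weight,
    where the system virtual time v is the tag of the packet most recently
    taken into service. Packets are served in ascending tag order.
    The reset of v at the end of a busy period is omitted for brevity.
    """

    def __init__(self, weights):
        self.weights = dict(weights)
        self.last_tag = {q: 0.0 for q in self.weights}
        self.v = 0.0
        self.heap = []
        self.seq = 0  # tie-breaker: equal tags are served in arrival order

    def enqueue(self, queue, length):
        tag = max(self.last_tag[queue], self.v) + length / self.weights[queue]
        self.last_tag[queue] = tag
        heapq.heappush(self.heap, (tag, self.seq, queue, length))
        self.seq += 1
        return tag

    def dequeue(self):
        if not self.heap:
            return None
        tag, _, queue, length = heapq.heappop(self.heap)
        self.v = tag  # self-clocking: virtual time follows the served tag
        return queue, length


s = SCFQ({"A": 2, "B": 1})
for _ in range(3):
    s.enqueue("A", 1000)
for _ in range(3):
    s.enqueue("B", 1000)
print([s.dequeue()[0] for _ in range(6)])  # ['A', 'A', 'B', 'A', 'B', 'B']
```

Queue A, with twice the weight of B, receives its packets earlier in the service order, while B is still interleaved.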
Given the described variety of building blocks along the router internal forwarding path, a
basic QoS supporting router with non-blocking switching behaviour, virtual output queues
and eight supported QoS classes is sketched in Fig. 31.
Cisco’s 12000 series routers [52] apply weighted random early detection for class-based dropping (congestion avoidance) as well as deficit round robin scheduling within the input port unit, and a proprietary modified deficit round robin for the output port scheduling.
Fig. 31
VoQ with 8 classes CoS support (scheduling and dropping)
As depicted in Fig. 32, the overall per hop forwarding behaviour experienced by any traffic routed across a single router node results from the combination of all router-internal building blocks and their configuration.
[Fig. 32 content: per hop treatment combines separate enqueueing (number of queues; 1:1 or M:N enqueueing), queue scheduling (Round Robin, WRR/WDRR, WFQ, CBFQ) and packet dropping (Drop Tail, RED, WRED); the combination forms the per hop forwarding behaviour.]
Fig. 32
Per hop forwarding behaviour composition in relaying nodes
Additional building blocks are required if absolute QoS parameters are to be guaranteed. These encompass metering, rate-based remarking, possibly shaping, and admission control. All of these blocks rely on measurement algorithms at the input or output of the system, whose results either directly influence the relaying and sending behaviour (packet dropping, delaying or downgrading) or serve as threshold indicators for accounting or other traffic-related configuration decisions.
Generally speaking, a policing block is added to the input port unit and a shaping block to
the output port unit.
“Policing” controls the general admission of a certain traffic type to the node or, less stringently, acts as a rate limitation element. The functionality can be packet based or flow based, depending on whether single packets can be associated with flows by an indication or by a classification outside the policing block. Traffic measurements for the respective packets or flows are taken and compared with a configured threshold in order to distinguish rate conforming (“in”) traffic from excess (“out”) traffic. Policing of excess traffic results either in packet drops or in downgraded markings, if traffic differentiation by packet markings is in place.
A special policing operation is “admission control”. It strictly drops packets, based either on the excess traffic measurement against previously agreed sending rates or on a flow indication and a list of previously granted or refused flows. The agreement on rates or flows can either be implemented with signalling protocols for rate or flow reservations or be part of a written agreement between interconnected parties, called a “Service Level Agreement (SLA)”.
“Shaping” controls the burstiness of transmitted packet streams. A measurement-based delaying of the enqueued outgoing packets ensures a smoothed packet sending rate. Traffic aggregation points are typical causes of traffic bursts due to the statistical multiplexing of independent packet streams.
The required measurements are generally based on two measurement analogies, “leaky bucket” and “token bucket”. Both algorithms consider the filling and draining of an imaginary bucket with controlled throughput characteristics. Both algorithms yield filling levels and timing values, which can in turn control the policing or shaping action.
“Leaky bucket” measurement model (see Fig. 33):
This simple model emulates the filling of a bucket with elements, while elements concurrently drip out at a fixed rate. Two sources of events (and subsequent actions) can be derived from this model. The first is the occurrence of overflowing elements, if more than b elements have been queued up in the bucket at any point in time. This happens if the average filling rate of the variable filling process exceeds the constant drainage rate of the bucket. The bucket size b thereby represents the averaging time interval. Elements arriving in excess of the bucket’s capacity are dropped. The second source of events is the sending time of the dripped-out elements.
The model is most often used for traffic shaping, since it smooths bursty incoming traffic into constant rate outgoing traffic. Traffic peaks are eliminated by the delayed relay procedure.
However, the overflow events can also be used in policing blocks, which detect excess traffic this way and predominantly drop or possibly remark the excess elements.
Leaky bucket implementations are mostly used in “Asynchronous Transfer Mode (ATM)” networks, where they are called the “Generic Cell Rate Algorithm (GCRA)”.
Fig. 33
Leaky bucket algorithm
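Both event sources of the leaky bucket (overflow drops and send times) can be derived with a short sketch, assuming equally sized elements, a drain rate given in elements per second and a bucket size b counting the elements that are still waiting to drip out. `leaky_bucket` is an illustrative function, not the GCRA as specified for ATM:

```python
def leaky_bucket(arrivals, rate, b):
    """Leaky bucket shaper/policer for equally sized elements.

    arrivals: non-decreasing arrival times in seconds
    rate:     constant drain rate in elements per second
    b:        bucket size in elements (elements waiting to drip out)
    Returns (send_times, dropped), i.e. the two event sources of the model.
    """
    interval = 1.0 / rate
    send_times, dropped = [], []
    for t in arrivals:
        # elements still in the bucket at time t have not dripped out yet
        backlog = sum(1 for s in send_times if s > t)
        if backlog >= b:
            dropped.append(t)  # overflow event
            continue
        last = send_times[-1] if send_times else float("-inf")
        send_times.append(max(t, last + interval))  # fixed drip spacing
    return send_times, dropped


print(leaky_bucket([0, 0, 0, 0, 0], rate=1.0, b=3))
# ([0, 1.0, 2.0, 3.0], [0])
```

A burst of five simultaneous elements leaves at a constant one element per second, and the element that finds three others still waiting overflows.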
“Token bucket” measurement model (see Fig. 34):
This most commonly used model also fills and drains a bucket, but with a token-controlled, variable drainage rate. The rate variation is bounded by the bucket size b and leads to a constant average sending rate. The essential parameters of the model are the “Token Bucket Rate – r” and the “Token Bucket Size – b”. Tokens are released into the bucket at a constant rate r, and one token is used up for each drained element. Incoming elements leave the bucket if enough tokens are available for the drainage. Otherwise, the elements are stored in the bucket until enough tokens have been released at the token rate. If b elements are stored in the bucket when a new element arrives, this element overflows and is classified as excess traffic.
Useful additional parameters for a token bucket model are the “Peak Data Rate – p”, the “Minimum Policed Unit – m” and the “Maximum Packet Size – M” (see e.g. [182]). The peak rate is limited by either the interface line rate or the maximum data generation rate at the sender. The maximum packet size is a fixed statement about the largest packet size being processed and must not exceed the link’s maximum transmission unit (MTU). The minimum policed unit is the minimum granularity of the decision process: elements of smaller size are counted as having the minimum size m in the calculations.
Three sources of events (and subsequent actions) can be derived from this model. The first is the occurrence of overflowing elements, if more than b elements have been queued up in the bucket at any point in time. The second source of events is the sending time of the dripped-out elements. A newly arriving packet that finds the bucket filled with tokens is sent out immediately, so a burst can leave at the full incoming rate. However, once the token reserve has been used up, the dripping follows the token rate. The third source of events is the first moment at which an incoming packet cannot be sent out instantly due to a lack of tokens.
Fig. 34
Token bucket algorithm
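The second and third event sources of the token bucket can be traced with a sketch, assuming equally sized elements that consume one token each, a bucket that starts full, and FIFO service; the overflow event (a bounded element store) is left out for brevity. The function name and example values are illustrative:

```python
def token_bucket_shaper(arrivals, r, b):
    """Token bucket shaper for equally sized elements (one token each).

    Tokens accrue at rate r up to the bucket size b; the bucket starts full.
    Elements are served FIFO and leave as soon as a token is available,
    so up to b elements pass at the incoming rate before rate r applies.
    Returns (departures, first_delayed_arrival); the overflow event of the
    text (a bounded element store) is omitted in this sketch.
    """
    tokens, clock = float(b), 0.0  # token count valid at time `clock`
    departures = []
    for t in arrivals:
        start = max(float(t), clock)  # FIFO: not before previous departure
        tokens = min(b, tokens + (start - clock) * r)
        if tokens < 1:  # lack of tokens: wait until one has accrued
            start += (1 - tokens) / r
            tokens = 1.0
        tokens -= 1
        clock = start
        departures.append(start)
    first_delayed = next((a for a, d in zip(arrivals, departures) if d > a), None)
    return departures, first_delayed


print(token_bucket_shaper([0, 1, 2, 3, 4], r=0.5, b=2))
# ([0.0, 1.0, 2.0, 4.0, 6.0], 3)
```

Elements arriving at one per second pass undelayed while the token reserve lasts; from the arrival at t = 3 onwards they are paced at the token rate of 0.5 per second.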
The token bucket metering is used within the integrated services architecture (IntServ)
(see 4.1.2) and is addressed in detail in RFC 2210 [182].
Two widely used marking schemes make use of token buckets:
- RFC 2697 - A Single Rate Three Colour Marker [89] and
- RFC 2698 - A Two Rate Three Colour Marker [90].
Both use two token buckets to determine the degree of conformity of traffic with several marking levels.
Each token bucket compares the stream of packets against a certain token rate and bucket size parameter set. If the stream needs to be classified into several traffic classes, a number of token buckets with differing parameter sets can be used.
The single rate marker uses two token buckets with the same token rate, but different
sizes, in order to base a three colour marking on the differing burst sizes.
The two rate marker also uses two token buckets, however with differing token rates and
bucket sizes.
The three colour markers are often used in differentiated services (DiffServ) (see 4.1.1)
setups.
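A colour-blind single rate marker in the spirit of RFC 2697 can be sketched as follows, assuming byte-based buckets C (size CBS) and E (size EBS) that both start full and share the committed rate CIR, with tokens that do not fit into C spilling over into E. The class name `SrTCM` is hypothetical, and the colour-aware mode of the RFC is not covered:

```python
class SrTCM:
    """Colour-blind Single Rate Three Colour Marker in the spirit of RFC 2697.

    Two token buckets share the Committed Information Rate (CIR, bytes/s):
    the committed bucket C (size CBS) and the excess bucket E (size EBS).
    Both start full; tokens that do not fit into C spill over into E.
    """

    def __init__(self, cir, cbs, ebs):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs
        self.tc, self.te = float(cbs), float(ebs)
        self.last = 0.0

    def _refill(self, now):
        new = (now - self.last) * self.cir
        self.last = now
        to_c = min(new, self.cbs - self.tc)
        self.tc += to_c
        self.te = min(self.ebs, self.te + new - to_c)

    def mark(self, now, size):
        """Return the colour of a packet of `size` bytes arriving at `now`."""
        self._refill(now)
        if self.tc >= size:
            self.tc -= size
            return "green"
        if self.te >= size:
            self.te -= size
            return "yellow"
        return "red"


m = SrTCM(cir=1000, cbs=1500, ebs=1500)
print([m.mark(0.0, 1000) for _ in range(4)], m.mark(1.0, 1000))
# ['green', 'yellow', 'red', 'red'] green
```

The two buckets differ only in their sizes, so the colour reflects how far a burst exceeds the committed burst size.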
The congestion notification scheme PCN mentioned above also uses token bucket metering for its signalling decisions. However, it introduces a filling threshold below the actual bucket size. Early warning signals can then be triggered if the bucket filling crosses that warning threshold. The algorithm is therefore called the “threshold marking algorithm” [70].
Both metering algorithms, leaky bucket and token bucket, are well defined if equally sized elements are used. However, the transition to IP networks introduces a level of uncertainty or unfairness when variable length packets are measured. If the bucket calculations were based on whole packets, decision making would remain simple and clean; however, the metering outcome would then be distorted by the unequal packet sizes and would thus not be useful.
In practice, both models resort to byte or even bit counts for bucket sizes, packet sizes and tokens in order to account for the unequal resource shares of the varying packets. As a consequence, packets can now partially overflow or find only partially available tokens, which introduces varying times for sending, threshold crossing and overflow events.
This phenomenon is addressed in RFC 3290 [28] with the definition of a loose and a strict mode of metering operation.
Strict conformance
“Packets of length L bytes are considered conforming only if there are sufficient tokens available in the bucket at the time of packet arrival for the complete packet
(i.e., the current depth is greater than or equal to L): no tokens may be borrowed
from future token allocations.”
Loose conformance
“Packets of length L bytes are considered conforming if any tokens are available in
the bucket at the time of packet arrival: up to L bytes may then be borrowed from
future token allocations.”
The strict mode of operation is commonly used, since it avoids negative parameter values.
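The difference between the two conformance modes quoted above can be shown with a small sketch, assuming depth and packet length are counted in bytes; `meter` is a hypothetical helper that returns the conformance decision together with the resulting bucket depth:

```python
def meter(depth, length, mode="strict"):
    """Token bucket conformance test in the sense of RFC 3290.

    depth:  current bucket depth in bytes (tokens available)
    length: packet length L in bytes
    strict: conforming only if depth >= L (no borrowing)
    loose:  conforming if any tokens are available (depth > 0); up to L bytes
            are borrowed, so the depth may become negative.
    Returns (conforming, new_depth).
    """
    if mode == "strict":
        ok = depth >= length
    elif mode == "loose":
        ok = depth > 0
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return ok, depth - length if ok else depth


print(meter(400, 1000, "strict"))  # (False, 400)
print(meter(400, 1000, "loose"))   # (True, -600)
```

The loose result shows the negative depth that the strict mode avoids.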
3.2 QoS treatment scope
Quality of Service differentiation of packet streams can be applied at different levels of granularity as well as with different treatment scopes. The QoS per hop treatment options have been described in the preceding sections and will be referred to as “QoS-based forwarding”. Beyond the single node behaviour, the path a packet stream takes through a meshed network can also be guided by QoS differentiation, which will be called “QoS-based routing”. Lastly, groups of packets belonging to a traffic class can be encapsulated using tunnelling technologies either within the IP layer (e.g. Generic Routing Encapsulation (GRE) [72]) or below it (e.g. MPLS LSPs, see 4.3, Ethernet VLANs, see 4.2, etc.). Such tunnelled transport will be referred to as “QoS-based tunnelling”.
Fig. 35 depicts the differences in the control plane routing processing as well as the
resulting FIB differences.
Fig. 35
QoS-based IP lookup variants
As a starting point, Fig. 36 depicts a simple inter-domain network setup, which will be used as an example in the following three sections. It assumes three interconnected ASes, which internally perform traffic separation at certain granularities (AS 1 with 4 classes in layer 2 and 3 / AS 2 without separation / AS 3 with 4 classes in layer 2 and 3). The interconnecting AS border routers remove any separation and provide no QoS-based (i.e. “best effort” only) traffic exchange. From the point of view of AS1 and AS3, AS2 is a transit provider, which also offers a best effort only transit service.
Fig. 36
Best Effort interconnection example
Customer traffic from AS1 towards a prefix originating from AS3 will find AS2BR_1 as next
hop and the respective output interface for the interconnection link during the IP lookup
procedure. Packet markings within the IP header of the traffic have no influence in this
lookup and will be removed or ignored by AS1BR and AS2BR_1. The packet relay within
the transit AS is shortest path per-hop forwarding without traffic class separations.
AS2BR_2 in turn will find AS3BR as next hop and the respective output interface for the
interconnection link during the IP lookup procedure. Packet markings within the IP header
of the traffic have no influence in this lookup and will be removed or ignored by AS2BR_2
and AS3BR. The traffic entering AS3 might either be relayed internally as a best effort
traffic class toward the destination or AS3 might perform costly multi-layer ingress
classification in order to guess the most suitable traffic class out of the supported three
class set.
The three treatment scope variants are described below.
3.2.1 QoS-based forwarding
The simplest QoS-aware packet transmission behaviour is achieved if conventional routing (and thus path selection) remains unchanged and traffic differentiation is performed hop-by-hop within relaying nodes. All QoS building blocks described in 3.1.2 are applicable. Packets will traverse the network along the same paths as if no QoS support were enabled. However, the per hop treatment experienced by each traffic class reveals the differentiated forwarding and dropping behaviour, which results in measurable QoS improvements for higher prioritised classes in terms of delay, loss and throughput compared to the common unclassified best effort network behaviour.
It is the operator’s choice to select and apply a combination of QoS building blocks for forwarding in every node along the chosen route. The standard shortest path first routing behaviour of IP networks remains unchanged, which often results in unbalanced network load distributions.
The exchange of reachability information within the router control plane does not necessarily signal QoS information for advertised IP prefixes and thus allows no QoS-specific best path selection. RIB and FIB contain just one routing entry per IP prefix, pointing to the relevant next hop and port information. Remapping information might additionally be stored if the next hop network requires a different DSCP marking in the IP header for the same traffic class.
The quality of service in QoS-based forwarding solely relies on node internal QoS means
based on the QoS information carried within the IP header.
Fig. 37
QoS-based forwarding interconnection example
In the example of Fig. 37, customer traffic from AS1 towards a prefix originating from AS3 will find AS2BR_1 as next hop and the respective output interface for the interconnection link during the IP lookup procedure. Packet markings within the IP header of the traffic are used in AS1BR for queue selection, scheduling and dropping decisions. Both interconnected border routers respect the QoS packet markings. Since different class sets are supported in AS1 and AS2, either AS1BR or AS2BR_1 is responsible for class mapping and possibly packet remarking. The packet relay within the transit AS is shortest path per-hop forwarding with traffic class separation based on the QoS-based transit class set offered by AS2. AS2BR_2 in turn will find AS3BR as next hop and the respective output interface for the interconnection link during the IP lookup procedure. Packet markings within the IP header of the traffic are used in AS2BR_2 for queue selection, scheduling and dropping decisions. Both interconnected border routers respect the QoS packet markings. Since AS2 and AS3 both support a three class setup, class mapping and remarking may not be necessary. Whether or not identical class sets and markings are used in both systems needs to be checked during a separate QoS signalling information exchange. Multi-layer ingress classification is no longer necessary due to the QoS-based forwarding and consistent marking procedure.
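The class mapping and remarking step at the AS1/AS2 border can be pictured as two table lookups. In this sketch all class names and DSCP values, as well as the assignment of four AS1 classes onto three transit classes, are hypothetical examples and not taken from Fig. 37:

```python
# Hypothetical class sets and DSCP values, for illustration only.
AS1_CLASS_OF_DSCP = {46: "voice", 34: "video", 18: "business", 0: "best_effort"}
AS1_TO_TRANSIT = {            # four AS1 classes onto three transit classes
    "voice": "premium",
    "video": "assured",
    "business": "assured",
    "best_effort": "default",
}
TRANSIT_DSCP = {"premium": 46, "assured": 26, "default": 0}


def remark_at_border(dscp):
    """Class mapping and DSCP remarking at the AS1/AS2 border."""
    cls = AS1_CLASS_OF_DSCP.get(dscp, "best_effort")  # unknown -> best effort
    return TRANSIT_DSCP[AS1_TO_TRANSIT[cls]]


print({d: remark_at_border(d) for d in (46, 34, 18, 0, 10)})
# {46: 46, 34: 26, 18: 26, 0: 0, 10: 0}
```

If both neighbours used identical class sets and markings, as may be the case between AS2 and AS3, the mapping would reduce to the identity and no remarking would be needed.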
3.2.2 QoS-based routing
Shortest path first routing tends to unbalance network loads due to the simple preference of the shortest path algorithm. Where routes lead along congested links, QoS-based forwarding can improve the QoS experienced by prioritised classes. However, reducing the links’ load by diverting the network traversal of differentiated traffic streams is far more effective. Routers would ideally support multi-path routing entries in their RIBs and FIBs and direct traffic of different classes along different next hop routes. If multi-path routing is not supported, the selection of the shortest path could be augmented by QoS-based conditions. That is, paths with signalled QoS support should be preferred over others, even if the non-QoS paths would be shorter. If multiple QoS-supporting paths are discovered towards the same prefix, the selection should prefer the one with the best matching QoS class set. This would lead to a best QoS path selection rather than a shortest path selection. A similar best path selection is already used in inter-domain routing with BGP (see Fig. 12). Applying QoS-based routing to BGP would therefore require multi-path BGP and/or a modified best path selection with added QoS match checking (see Fig. 38).
[Fig. 38 content: QoS-based path selection in BGP, two options. (1) Best path selection process: additional selection condition for the (extent of) QoS support; requires a best path selection modification. (2) Multi-path traffic assignment (“QoS-based load balancing”): available in meshed setups where multiple (instead of “best”) paths are selected for a prefix; requires multi-path support.]
Fig. 38
QoS-based path selection in BGP
The exchange of reachability information within the router control plane would ideally
comprise the signalling of QoS information for advertised IP prefixes and thus allow for
QoS specific best path selection. RIB and FIB either contain the best QoS matching
routing entry per IP prefix or even multiple entries for the same IP prefix with associated
QoS marking selections. Remapping information might additionally be stored, if the next
hop network requires a different DSCP marking in the IP header for the same traffic class.
The quality of service in QoS-based routing relies on QoS-based route selection possibly
combined with node internal QoS means based on the QoS information carried within the
IP header.
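A FIB holding multiple entries for the same IP prefix with associated DSCP selections might be sketched as follows. The prefix 203.0.113.0/24 (a documentation prefix standing in for an AS3 prefix), the table layout and the function name `qos_lookup` are assumptions for illustration; the next hops reuse the border router names of the example in Fig. 39:

```python
import ipaddress

# Hypothetical FIB of AS1BR: several entries per prefix, selected by DSCP.
# 203.0.113.0/24 is a documentation prefix standing in for an AS3 prefix.
FIB = {
    ipaddress.ip_network("203.0.113.0/24"): [
        (frozenset({46}), "AS4BR_1"),  # priority (EF) traffic via AS4
        (None, "AS2BR_1"),             # default entry for all other DSCPs
    ],
}


def qos_lookup(dst, dscp):
    """Longest prefix match, then the first entry whose DSCP set matches."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    for dscps, next_hop in FIB[best]:
        if dscps is None or dscp in dscps:
            return next_hop
    return None


print(qos_lookup("203.0.113.7", 46), qos_lookup("203.0.113.7", 0))
# AS4BR_1 AS2BR_1
```

Keeping a DSCP-independent default entry per prefix preserves plain best effort reachability when no QoS-specific entry matches.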
Fig. 39
QoS-based routing interconnection example
In the example of Fig. 39, customer traffic from AS1 towards a prefix originating from AS3 will find either AS2BR_1 or AS4BR_1 as next hop and the respective output interface for the interconnection link during the IP lookup procedure. This route selection would be based on the DSCP marking of the packets and would direct priority traffic along a different transit route than ordinary traffic. In either case, packet markings within the IP header of the traffic are used in AS1BR for queue selection, scheduling and dropping decisions. Each interconnected border router respects the QoS packet markings. Since different class sets are supported in AS1, AS2 and AS4, either AS1BR, AS2BR_1 or AS4BR_1 is responsible for class mapping and possibly packet remarking. The transit AS2 or AS4 might or might not apply QoS-based routing internally as well. That is, either shortest path per-hop forwarding or best QoS path per-hop forwarding with transit QoS traffic class separation is applied. Either AS2BR_2 or AS4BR_2 will eventually be reached by the relayed packet, and in both cases it will find AS3BR as next hop and the respective output interface for the interconnection link during the IP lookup procedure. Packet markings within the IP header of the traffic are used in AS2BR_2 or AS4BR_2 for queue selection, scheduling and dropping decisions. Each interconnected border router respects the QoS packet markings. Remapping and packet header remarking will be performed by either AS4BR_2 or AS3BR in order to match the available class sets. Whether or not identical class sets and markings are used in the involved autonomous systems needs to be checked during the separate QoS signalling information exchange. Multi-layer ingress classification is no longer necessary due to the QoS-based forwarding and consistent marking procedure.
3.2.3 QoS-based tunnelling
Offering network services to customers and service providers increasingly calls for tunnelled traffic transport. Two major reasons can be observed for the current trend towards encapsulated and route-pinned IP packet transport. The first reason is traffic engineering and enhanced control over IP packet transmissions. Packet forwarding within tunnels enables operators to bypass shortest path first routing. Tunnels can be planned, set up with fixed traversal nodes and dynamically switched without interference from IP forwarding and routing. Maintenance operations and network resilience mechanisms make use of backup tunnel setups and fast switching between the active and the standby tunnels.
The second reason for increased tunnelling usage is transparency. Tunnelled customer
traffic does not interfere with network local addressing or QoS setup peculiarities. Virtual
private networking services are one major application for tunnelled customer traffic
transport.
Tunnelling can be realized either within the IP layer, e.g. using GRE [72], or below the IP layer, most importantly with MPLS (see 4.3) and Ethernet VLANs (see 4.2). If customer IP traffic is encapsulated in transit provider IP-based GRE, the DSCP of the outer IP header becomes the tunnel QoS marking. Similarly, the 3 bit QoS markings of MPLS (“Traffic Class (TC)” marking) and Ethernet VLANs (“Priority Code Point (PCP)” marking) in the respective header formats become the QoS markings of those lower layer tunnelling technologies, limited to 8 classes.
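Squeezing the 6 bit DSCP into a 3 bit TC or PCP field necessarily loses information. The sketch below uses one commonly encountered default convention, copying the three most significant DSCP bits (the class selector / IP precedence part); this is an assumed convention for illustration, not a standardized mapping:

```python
def dscp_to_3bit(dscp):
    """Derive a 3-bit MPLS TC / VLAN PCP value from a 6-bit DSCP by keeping
    its three most significant bits (the class selector / IP precedence
    part). One common default convention, not a standardized mapping."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp >> 3


for name, dscp in [("BE", 0), ("AF11", 10), ("AF12", 12),
                   ("AF41", 34), ("EF", 46), ("CS6", 48)]:
    print(name, dscp, "->", dscp_to_3bit(dscp))
```

AF11 and AF12 both end up as 1, so their drop precedence distinction is lost in the tunnel header.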
From the QoS perspective, two major distinctions should be made for QoS-based tunnelled transport. Either there is just one tunnel available with QoS markings in the tunnel header information, or several tunnels are set up, each representing a certain QoS class. The latter is generally available in any “virtual channel (VC)” based networking technology. With the advent of “Generalized MPLS (GMPLS)”, any time slot, wavelength, fibre etc. can establish a separate channel, which can in turn stand for QoS-based traffic separation. Hence, a channel-per-traffic-class option is expected to be increasingly used in future Internet setups.
Applying the described tunnelled transport to IP forwarding is easily set up within an autonomous system. However, AS interconnections are IP based only, and any tunnelled peering needs mutual agreements between the interconnected peers. IP-based GRE encapsulation is always a valid option. However, inter-AS MPLS tunnels and inter-AS VLANs are more appealing. Point-to-point interconnections might be able to support both lower layer tunnelling options. The predominantly used Internet Exchange Points, however, are critical intersections, where Ethernet VLAN tunnelling would make a major difference for transparent QoS-based customer traffic transport.
Fig. 40 depicts the described tunnelling options. The E-LSP and L-LSP abbreviations will
be explained in detail in chapter 4.3.
[Fig. 40 content: tunnelling options by tunnelling scope. Intra-AS tunnelling (operator’s choice; tunnelled forwarding strongly recommended): E-LSPs, Carrier Eth.+PCP; L-LSPs, VCs, λs, fibres etc. Inter-AS tunnelling (mutual agreement; L2 marking most likely): Inter-AS E-LSPs …; “tunnelling” through L2 marking (VLAN, PCP); unlikely: layered peering.]
Fig. 40
Tunnelling scope
The exchange of reachability information within the router control plane would ideally comprise the signalling of IP layer QoS information for advertised IP prefixes as well as of the supported tunnel QoS information. Routing is then extended into a best tunnel selection process, together with the installation of the respective tunnel mapping information in the router’s RIB and FIB. Depending on a 1:1 or n:1 class-to-tunnel mapping, there is either one best QoS matching routing entry per IP prefix with tunnel selection and tunnel marking adoption, or there are multiple entries for the same IP prefix with class-based tunnel selection. Due to the tunnelled transport, customer traffic no longer needs to be remarked in the header; instead, the tunnel selection and tunnel marking take care of the correct forwarding path and per hop treatment. QoS-based tunnelling relies solely on the tunnel “header” information to select the appropriate QoS building blocks within relaying nodes.
Tunnelled transport can therefore provide several transit QoS class sets to external customers without changing customer header information. It is expected to become the preferred type of transit service in the future.
Major drawbacks of this approach are the currently missing inter-AS tunnel support as well as the missing standardized mapping between encapsulated IP QoS markings and outer tunnel QoS markings. Addressing these is one of the improvements targeted in this thesis.
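The combination of class-based tunnel selection and tunnel marking at an ingress border router can be sketched as below. The tunnel names, the choice of a dedicated tunnel for EF only, and the DSCP to 3 bit rule used for the outer marking are all hypothetical configuration choices, precisely because no standardized mapping exists:

```python
from dataclasses import dataclass


@dataclass
class TunnelledPacket:
    tunnel: str       # selected transit tunnel (e.g. an LSP)
    outer_mark: int   # 3-bit tunnel header marking (e.g. MPLS TC)
    inner_dscp: int   # customer DSCP, carried unchanged


# Hypothetical ingress configuration: EF gets its own tunnel (1:1),
# all other classes share one tunnel and are told apart by the outer mark (n:1).
CLASS_TUNNELS = {46: "tunnel-premium"}
SHARED_TUNNEL = "tunnel-shared"


def encapsulate(dscp):
    tunnel = CLASS_TUNNELS.get(dscp, SHARED_TUNNEL)
    return TunnelledPacket(tunnel, dscp >> 3, dscp)  # assumed DSCP->3-bit rule


print(encapsulate(46))
print(encapsulate(34))
```

The customer DSCP travels unmodified inside the tunnel; only the outer marking and the tunnel choice steer the per hop treatment in the transit AS.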
Fig. 41
QoS-based tunnelling interconnection example
In the example of Fig. 41, customer traffic from AS1 towards a prefix originating from AS3
will find AS2BR_1 as next hop and the respective output interface for the interconnection
link during the IP lookup procedure. The packet markings within the IP header of the traffic
are used in AS1BR for queue selection, scheduling and dropping decisions. Both
interconnected border routers will respect the QoS packet markings. Since different class
sets are supported in AS1 and AS2, either AS1BR or AS2BR_1 is responsible for class
mapping and possibly packet remarking. The packet relay within the transit AS2 is chosen to use two edge-to-edge tunnels provided by the AS. Two transit QoS classes are offered, one class per tunnel. That is, incoming packets at the border
router AS2BR_1 will find two routing table entries for all reachable IP prefixes originating
from AS3, with differing outgoing interface mappings. Depending on the packet’s DSCP
marking, either of the two QoS-based transit tunnels will be selected. This includes IP
packet encapsulation. In case of single tunnel transport and tunnel QoS support, an
additional mapping of IP QoS markings onto the tunnel header QoS marking would be
performed. The encapsulated packets are now transferred across AS2 applying per hop
behaviour in each relaying node as derived from the tunnel associated QoS building block
behaviour. AS2BR_2 will eventually be reached by the relayed packet. The encapsulation
is removed and normal IP lookup will work out AS3BR as next hop together with the
respective output interface for the interconnection link. Packet markings within the IP
header of the traffic are again used in AS2BR_2 for queue selection, scheduling and
dropping decisions. Each interconnected border router will respect the QoS packet
markings. Remapping and packet header remarking will be performed by either AS2BR_2
or AS3BR in order to match the available class sets. Whether or not identical class sets
and markings are used in the involved autonomous systems needs to be checked during
the separate QoS signalling information exchange. Multi-layer ingress classification is no longer necessary due to the QoS-based forwarding and consistent marking procedure.
The described tunnelling procedure is known as the “peer model”, where the tunnelling is confined within the transit AS boundaries. However, if the interconnection links are inter-AS tunnelling enabled, the so-called “overlay model” can be used. That is, customer traffic is encapsulated in the sending AS1 and decapsulated in the egress border router of AS2 or, ideally, in the ingress border router of AS3. Such an inter-AS tunnel would use tunnel QoS markings, which need to be agreed on between at least AS1 and AS2. Multiple inter-AS tunnels are also feasible, where AS2 would offer several entry points for differentiated tunnelled transit.
The QoS-based tunnelled transport of IP traffic with differentiated traffic classes is optimal for the transport of unmodified packets and for consistent QoS marking support. However, it requires cross-domain reachability and QoS marking signalling combined with cross-layer QoS mapping information. Both requirements are addressed in this thesis.
3.3 Architectural scope
3.3.1 Cross-layer QoS
Quality of Service support has been targeted in many networking technologies. Circuit switched networks, such as the “Plain Old Telephone Service (POTS)”, reserve for instance separate lines / fibres, separate time slots, separate frequencies, separate wavelengths, etc. for the interconnection of communication endpoints. Such reserved channels inherently provide excellent quality of service, since they are designed and operated with exactly the reservations required for the targeted service.
Packet switched networks such as “Frame Relay (FR)”, “Asynchronous Transfer Mode
(ATM)”, Ethernet with “virtual local area network (VLAN)” support, “Multi Protocol Label
Switching (MPLS)” and others provide similar channel emulations by means of virtual
channels. Packetized information transported in frames is separated by channel identifiers (FR->DLCI, ATM->VPI/VCI, Ethernet->VLAN-ID, MPLS->Label), which are either preconfigured or dynamically set up during connection setup.
Either way, such separated channels provide excellent lower layer support for QoS-based tunnelling as described in 3.2.3. If separate QoS-inferred transport “lanes” are used for IP packet transport, such setups will be named “Layer 1 QoS support” in the remaining chapters.
All of the named packet switching networks also provide some means for QoS-related frame markings. First of all, the channel identifier can be associated with certain per hop treatments and possibly reservations. Furthermore, explicit “QoS markings” are available, such as:
• FR → “Discard Eligibility (DE)” bit,
• ATM → “Cell Loss Priority (CLP)” bit,
• Ethernet VLAN → 3 bit “Priority Code Point (PCP)” field and
• MPLS → 3 bit “Traffic Class” field [9].
Such quality of service support for tunnelling technologies will be referred to as “Layer 2
QoS support”.
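Two of these markings can be illustrated with a short Python sketch. It assumes the standard bit layouts of the 16-bit 802.1Q tag control information (3-bit priority, 1-bit CFI/DEI flag, 12-bit VLAN ID) and of the 32-bit MPLS label stack entry (20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL); the function names are hypothetical.

```python
import struct


def parse_vlan_tci(tci: int) -> dict:
    """Split the 16-bit 802.1Q Tag Control Information (TCI) into its fields."""
    return {
        "pcp": (tci >> 13) & 0x7,  # 3-bit Priority Code Point (user priority)
        "dei": (tci >> 12) & 0x1,  # 1-bit CFI/DEI flag
        "vid": tci & 0xFFF,        # 12-bit VLAN identifier (channel identifier)
    }


def parse_mpls_lse(entry: bytes) -> dict:
    """Split a 4-byte MPLS label stack entry given in network byte order."""
    (word,) = struct.unpack("!I", entry)
    return {
        "label": word >> 12,       # 20-bit label (channel identifier)
        "tc": (word >> 9) & 0x7,   # 3-bit Traffic Class field
        "bos": (word >> 8) & 0x1,  # bottom-of-stack flag
        "ttl": word & 0xFF,        # 8-bit time to live
    }
```

For instance, a TCI of 0xA064 yields priority 5 and VLAN ID 100, which shows that the channel identifier and the QoS marking travel in the same tag.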
If QoS is targeted in the IP networking layer (layer 3), the underlying QoS support should ideally be incorporated in the QoS-based routing and forwarding process. However, according to the layering concept, IP is normally not aware of the underlying technologies and their QoS support. The mapping between the upper layer QoS class set and marking and the lower layer QoS class set and marking is neither harmonized nor standardized.
The differentiated services architecture mentions this drawback in RFC 2475 [32] and recommends in guideline 15 that specifications of per hop behaviours should include such layer two mapping recommendations. However, the existing PHB specifications do not include them. For more details, see chapter 8.2.
Vendors also give configuration guidelines for such mappings to their customers. Some
examples of Cisco default mappings are given in chapter 8.2.
The Ethernet standardization addresses VLAN-based frame priorities in its standard
802.1p, which is now incorporated in 802.1D [97]. However, IEEE does not provide mapping rules outside the Ethernet VLAN realm. It does standardize the so-called “User Priority Regeneration” [97], [98], which defines how priority classes are to be mutually mapped if different class set granularities are supported. Only the default setting is specified, and it may be changed by port-specific configuration. This is looked at more closely in chapter 8.2 as well.
In summary, many configuration options are offered to network operators for cross-layer QoS settings, and some recommendations and default settings are provided. However, no standardized mapping is defined, and reliable configurations can only be assumed within single administrative domains. All default configurations can be overwritten and
supported markings and mappings need to be dynamically signalled or manually configured based on SLAs.
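Why such dynamic signalling or SLA-based configuration is needed can be sketched in a few lines of Python. The precedence-based default below (taking the three leading DSCP bits as Ethernet priority) is an assumption chosen for illustration, not a standardized or vendor default, and the class and function names are hypothetical.

```python
def default_dscp_to_pcp(dscp: int) -> int:
    """Hypothetical default: reuse the three leading DSCP bits (the IP precedence)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp >> 3


class CrossLayerMap:
    """DSCP -> Ethernet PCP mapping of one administrative domain (illustrative)."""

    def __init__(self, overrides=None):
        # Operator-configured entries replace the default for individual DSCPs.
        self.overrides = dict(overrides or {})

    def pcp_for(self, dscp: int) -> int:
        default = default_dscp_to_pcp(dscp)  # also validates the DSCP range
        return self.overrides.get(dscp, default)


domain_a = CrossLayerMap()          # keeps the default mapping
domain_b = CrossLayerMap({46: 6})   # operator override for EF traffic
```

With these settings, an EF packet (DSCP 46) leaves domain A with priority 5 but domain B with priority 6, so a neighbour can only treat it consistently if the mapping in use is made known to it.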
3.3.2 Cross-domain QoS
The “Resource ReSerVation Protocol (RSVP)” [39] is the predominantly used QoS
signalling protocol for single packet flow reservations as well as for trunk reservations
together with MPLS or other tunnelling mechanisms. The protocol belongs to the integrated services architecture (see 4.1.2) and sends traffic specifications toward the traffic
sink and on the way back receives the actually achieved reservation specification. Strict
parameter based reservations with quality guarantees can easily be set up by this method.
RSVP is an end-to-end applicable reservation protocol, which potentially allows resource reservations across domain boundaries. The protocol has gained further importance since MPLS has chosen RSVP augmented with traffic engineering extensions (RSVP-TE) as the predominantly used path setup protocol for traffic engineered MPLS paths [11]. RFC 5151 [73] addresses the inter-AS setup of MPLS paths using three possible methods: contiguous, nested and stitched paths. One crucial signalling requirement for neighbouring domains is the exchange of class set and marking information during the setup procedure. Within RSVP-TE, this can be achieved by means of the so-called “DIFFSERV” object. Two types have been defined in RFC 3270 [75], for n:1 (E-LSP) and 1:1 (L-LSP) class-to-tunnel mappings.
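The difference between the two DIFFSERV object types can be sketched as a logical Python model. It assumes the RFC 3270 distinction that an E-LSP object lists EXP-to-PHB-ID map entries while an L-LSP object names a single PHB scheduling class; the wire encoding of the RSVP object is deliberately omitted and the class names are hypothetical. The value 0xB800 is the PHB ID of EF as derived from the RFC 3140 layout (recommended DSCP in the upper six bits).

```python
from dataclasses import dataclass, field


@dataclass
class ELspDiffServ:
    """E-LSP (n:1): several classes share one LSP; the 3-bit EXP/TC value selects the PHB."""
    exp_to_phbid: dict = field(default_factory=dict)

    def add_map(self, exp: int, phbid: int) -> None:
        if not 0 <= exp <= 7:
            raise ValueError("EXP/TC is a 3-bit field, so at most 8 map entries exist")
        self.exp_to_phbid[exp] = phbid

    def phbid_for(self, exp: int) -> int:
        return self.exp_to_phbid[exp]


@dataclass
class LLspDiffServ:
    """L-LSP (1:1): the LSP itself carries one PHB scheduling class (PSC)."""
    psc: int


# Hypothetical E-LSP carrying EF (PHB ID 0xB800) and default (0x0000) traffic.
elsp = ELspDiffServ()
elsp.add_map(5, 0xB800)
elsp.add_map(0, 0x0000)
```

An E-LSP can thus carry up to eight classes in one tunnel, whereas an L-LSP dedicates a tunnel to one class and leaves the EXP/TC bits free to carry the drop precedence.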
Besides this inter-AS MPLS approach, there is no generally applicable protocol available that signals available traffic separation class sets and their encodings across AS boundaries and beyond. The setup of a QoS-supporting IP forwarding path across several ASes with potentially different class sets and encodings is currently unsolved and requires explicitly arranged mutual agreements between neighbouring AS peers along the way in order to establish an SLA-guided forwarding path.
Furthermore, the availability of transit QoS in a certain class set extent as well as the
required packet marking for appropriate QoS class selection is not globally known, which
complicates the search for QoS capable peering partners to be contacted for SLA
negotiations.
This major drawback, the current lack of a simple and generally available QoS signalling mechanism across AS boundaries, is resolved by this thesis. Chapter 7 describes the proposed solution.
4 State of the art QoS Concepts
As described in chapter 3, over-provisioning is the simplest and currently prevailing quality of service concept in the Internet. As long as the cost of hugely over-dimensioned transfer capacities (factor 5 to 10) is lower than the cost of any QoS scheme (investment in QoS-capable devices, staff training, operation monitoring, debugging, SLA negotiations etc.), over-dimensioning the network remains a sensible decision.
Nevertheless, QoS concepts have been developed for IP and other networking technologies and are used within administrative domains. The following sections give a brief overview of them.
4.1 IP QoS
From its very beginning, the Internet protocol included a marking option in its header (see Fig. 5) to indicate priority and optimisation preferences for the packet’s forwarding treatment. These precedence and type of service markings, however, were not used by vendors and network users and were widely ignored for many years. The carried IP traffic used to be data traffic only, which is largely insensitive to packet loss or delay and can cope with poor transfer quality through retransmissions and packet buffering. Following the classification in Table 1, this is batch transfer, where sending and receiving have bulk characteristics without any strict timing requirements. Throughput is the main transfer parameter.
Table 1
Transfer demand matrix – after [79]

Source \ Sink | Bulk                                                             | Stream
Bulk          | Batch transfer, e.g. file transfer                               | “Replay” application, e.g. video on demand
Stream        | “Recording” application, e.g. recording of a measurement signal | Signal transfer, e.g. voice over IP, video conferencing
“Recording” applications are streaming services, where the sender needs to regularly
transmit its packetized information. The reception can be delayed since it is not critical for
the recording. If buffering is used within the source, packet retransmissions are feasible to
cope with packet losses.
“Replay” applications are streaming services which require bounded transfer delays. Some delay variation can be compensated by replay buffers and time-shifted replay start points. However, packets that suffer excess delay arrive too late and are useless for the sink. Losses cannot be repaired by retransmissions.
Signal transfer applications are duplex streaming services which require bounded transfer delays in both directions. Only very small sender and receiver buffers can be used to compensate transfer delay variation. In both directions, packet retransmissions are not available for loss compensation. Acceptable one-way transfer delays are investigated, for instance, in ITU-T Recommendation G.114. For very good user satisfaction with voice over IP, the mouth-to-ear delay should ideally stay below 150 ms.
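The 150 ms target is a budget shared by all delay contributions along the path. The following Python arithmetic sketch adds up a set of hypothetical one-way contributions; the component names and values are illustrative assumptions, not measurements or G.114 figures.

```python
TARGET_MS = 150.0  # mouth-to-ear target for very good VoIP satisfaction

# Hypothetical one-way delay contributions in milliseconds.
budget_ms = {
    "codec_and_packetization": 30.0,
    "network_transfer": 70.0,
    "jitter_buffer": 40.0,
}

total_ms = sum(budget_ms.values())
headroom_ms = TARGET_MS - total_ms
within_target = total_ms <= TARGET_MS
print(f"total {total_ms} ms, headroom {headroom_ms} ms, ok={within_target}")
```

With these assumed values the path stays within the target with 10 ms of headroom; a longer jitter buffer or an additional transit domain could easily consume it.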
The demand for IP QoS has risen with the increased usage of IP networks for the latter
three types of time and loss critical services.
4.1.1 DiffServ
The Differentiated Services (DiffServ) architecture is a prioritization concept of aggregated
traffic providing relative QoS (see 3.1.1) in IP networks. Work on DiffServ started in the
late 1990s and resulted in two major standard documents:
• RFC 2475 - An Architecture for Differentiated Services [32] and
• RFC 2474 - Definition of the Differentiated Services Field (DS Field) in the IPv4 and
IPv6 Headers [142].
Three quotations from the work on DiffServ clarify the situation and the intentions of its developers.
“What problem are we solving?
Give “better” service to some traffic (at the expense of giving worse service to the rest).
ATM marketing fantasies to the contrary, QoS is a zero-sum game:
- it does not create bandwidth.
- it does not guarantee that you get better service.”
Van Jacobson [121]
"QoS is managed unfairness."
Kathleen Nichols [141]
“DiffServ wasn't chartered to solve the end to end QoS problem. It was chartered to define
coarse-grained class-of-service differentiation, which is an entirely different (and much
easier) goal.”
Brian E. Carpenter [44]
The formal description of DiffServ’s intention is given in RFC 3086 [143]:
“The differentiated services framework enables quality-of-service provisioning
within a network domain by applying rules at the edges to create traffic aggregates
and coupling each of these with a specific forwarding path treatment in the domain
through use of a codepoint in the IP header”.
An overview of the Differentiated Services architecture is depicted in Fig. 42.
Fig. 42
Differentiated Services regions, domains and nodes
Packets of the different traffic streams are grouped into so-called “Behaviour Aggregates (BA)”, which receive the same per hop behaviour (PHB) in relaying nodes along the path. A BA is identified by a differentiated services codepoint (DSCP). All nodes within a DS domain therefore associate a consistently configured set of treatment policies (queuing, scheduling, dropping) with each specific BA. This applies to core as well as to edge nodes. Core nodes classify behaviour aggregates solely by inspecting the packet’s DSCP. Edge nodes of a DS domain, however, additionally perform multi-field classification and conditioning functions. That is, the classification / grouping in the domain’s ingress node inspects a combination of possibly several header fields and the ingress interface of the packet in order to make a policy-guided decision about the packet’s aggregation into a certain BA. This in turn results in the appropriate DSCP marking (see Fig. 43).
Fig. 43
Behaviour aggregate classification and DSCP marking
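The division of labour between edge and core nodes can be sketched in Python. The sketch assumes the RFC 2474 layout in which the DSCP occupies the six most significant bits of the DS field, and it leaves the two remaining bits (used for ECN) untouched when re-marking; the rule format, interface name, addresses and port number are hypothetical.

```python
def dscp_of(ds_field: int) -> int:
    """The DSCP is the six most significant bits of the 8-bit DS field."""
    return (ds_field >> 2) & 0x3F


def remark(ds_field: int, dscp: int) -> int:
    """Write a new DSCP while leaving the two low-order (ECN) bits untouched."""
    return ((dscp & 0x3F) << 2) | (ds_field & 0x3)


def edge_classify(pkt: dict, rules: list) -> int:
    """Edge node: the first matching multi-field rule decides the DSCP to mark."""
    for match, dscp in rules:
        if all(pkt.get(key) == value for key, value in match.items()):
            return dscp
    return 0  # nothing matched: default PHB (best effort)


def core_classify(ds_field: int, phb_by_dscp: dict) -> str:
    """Core node: the DSCP alone selects the behaviour aggregate."""
    return phb_by_dscp.get(dscp_of(ds_field), "default")


# Hypothetical operator policy: RTP from customer port "cust1" becomes EF.
rules = [({"in_if": "cust1", "proto": "udp", "dst_port": 5004}, 46)]
pkt = {"in_if": "cust1", "src": "192.0.2.1", "proto": "udp", "dst_port": 5004, "ds": 0x00}
pkt["ds"] = remark(pkt["ds"], edge_classify(pkt, rules))
```

After the edge has set DSCP 46, a core node needs nothing but a lookup on the DS field to select the EF treatment.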
The edge node’s conditioning comprises several functional elements, such as meter, marker, shaper and dropper. It is important to understand that DiffServ assumes that classification and, in particular, conditioning are correctly and sufficiently enforced at the edge of a DS domain. This increases the scalability of the concept and takes the burden of computationally intensive multi-field classification and conditioning off the core nodes. Although this approach neither precludes nor addresses the burstiness and congestion that may arise at internal traffic aggregation points, it is a good compromise between strict QoS support and the required control overhead. Internal over-provisioning can be used as a countermeasure.
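As an example of such a meter and marker, the following Python sketch implements the single rate three colour marker of RFC 2697 in colour-blind mode, a published technique that fits the AF drop precedences. The continuous-time token refill, the byte-based units and the colour-to-AF1x mapping are assumptions for illustration, not the only possible conditioning policy.

```python
class SrTCM:
    """Single rate three colour marker (RFC 2697), colour-blind mode."""

    def __init__(self, cir: float, cbs: float, ebs: float):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs  # bytes/s, bytes, bytes
        self.tc, self.te = cbs, ebs                   # both buckets start full
        self.last = 0.0

    def _refill(self, now: float) -> None:
        tokens = (now - self.last) * self.cir
        self.last = now
        # Tokens fill the committed bucket first; the overflow goes to the excess bucket.
        to_committed = min(tokens, self.cbs - self.tc)
        self.tc += to_committed
        self.te = min(self.ebs, self.te + tokens - to_committed)

    def colour(self, now: float, size: int) -> str:
        self._refill(now)
        if self.tc >= size:
            self.tc -= size
            return "green"
        if self.te >= size:
            self.te -= size
            return "yellow"
        return "red"


# Hypothetical policy: colours select the AF1x drop precedence.
AF1_BY_COLOUR = {"green": 10, "yellow": 12, "red": 14}  # AF11, AF12, AF13
```

Whether red packets are re-marked (for example to AF13) or dropped is an operator policy decision that belongs in the traffic conditioning agreement.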
RFC 2475 [32] depicts the classifier and traffic conditioner structure as shown in Fig. 44.
Fig. 44
Logical View of a Packet Classifier and Traffic Conditioner
The policy-based setup of PHB-associated combinations of queuing, scheduling, metering, marking and re-marking, shaping and dropping is made by each DS domain administrator and can make use of the plethora of mechanisms described in chapter 3.1.2. Several DS domains may be operated under the same administration, which simplifies the edge node operation between neighbouring DS domains: if both domains use the same configuration, the ingress edge can simply operate with core node functionality. Otherwise, full ingress operation needs to be applied.
If differently administered DS regions are interconnected, there needs to be an agreement on how to set up the ingress classification and conditioning. This is normally done with SLAs containing a so-called “Traffic Conditioning Agreement (TCA)”.
As described above, the central elements of DS domain operation are behaviour aggregates with associated PHBs. The standards impose no limit on the number of applicable PHBs. However, the DSCP field can only express 64 values, which results in operator-specific PHB-DSCP mappings (see Fig. 45).
Fig. 45
PHB ↔ DSCP mapping
The stated limitation of the “global PHB space” to 4096 entries is not a strict limit. It relates to the definition of so-called “Per Hop Behaviour Identification Codes (PHB-ID)” [31], which are used for PHB signalling outside a consistently managed DS region. A typical example is inter-provider
PHB identification. The PHB ID encoding distinguishes between standard PHBs, as described below, and non-standardized PHBs. The latter are assignable by IANA and currently limited to a 12-bit code, hence the 4096 limit.
Fig. 46
PHB encoding [31]
PHB ID codes can have bit 14 set, which indicates a PHB set rather than a single PHB. This is best understood for the four AF PHB classes described below, each with three drop precedence encodings.
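The PHB ID layout can be illustrated with a small Python encoder. It assumes the bit layout of RFC 3140, which we take to be the encoding shown in Fig. 46: bits are numbered from 0 (most significant) to 15, a standard PHB carries its recommended DSCP in bits 0 to 5, a non-standard PHB carries a 12-bit IANA code in bits 0 to 11 with bit 15 set, and bit 14 marks a PHB set. The function names are hypothetical.

```python
SET_BIT = 1 << 1    # bit 14: the PHB ID refers to a set of PHBs
NON_STANDARD = 1    # bit 15: PHB not defined by standards action


def phb_id_standard(dscp: int, is_set: bool = False) -> int:
    """Standard PHB: recommended DSCP in the upper six bits, other bits zero."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return (dscp << 10) | (SET_BIT if is_set else 0)


def phb_id_iana(code: int, is_set: bool = False) -> int:
    """Non-standard PHB: 12-bit IANA-assigned code in the upper bits (4096 values)."""
    if not 0 <= code < 4096:
        raise ValueError("IANA PHB code is a 12-bit value")
    return (code << 4) | (SET_BIT if is_set else 0) | NON_STANDARD


print(hex(phb_id_standard(46)), hex(phb_id_standard(10, is_set=True)))
```

The EF PHB thus gets the ID 0xB800, while the whole AF1x group can be named by one ID, 0x2802, formed by setting bit 14 on the DSCP of AF11 (assuming, as we read RFC 3140, that the numerically smallest DSCP of the set is used).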
Besides this free choice of PHB definitions, 22 per hop behaviours have been defined with recommended DSCP encoding values, which are briefly explained below.
Default PHB - 000000
RFC 2474 [142] defines the default setting of the DSCP header field to “000000”, which is the standard IP encoding for “best effort” service. This is also the last resort choice if the DS domain ingress node cannot associate another DSCP encoding during its classification process.
Class-Selector (CS) PHB – xxx000
RFC 2474 [142] includes 8 DSCP encodings for backward compatibility with the original IP precedence encoding. The three leading bits of the 6 bit DSCP correspond to the class selector number CS0 to CS7. CS0 is identical to the default PHB. Some minimal requirements have been stated concerning packet ordering and timely delivery for those classes. It should be noted that all CS DSCPs have their lower 3 bits set to 000. However, a DS domain that only supports class selector PHBs classifies incoming packets on the three leading bits only. This way, different DSCPs may be merged into a single CS DSCP.
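The merging effect can be made concrete with a short Python sketch. The function names are hypothetical; the only assumption is the xxx000 class selector layout described above.

```python
def class_selector(dscp: int) -> int:
    """Class selector number (0-7) that a CS-only domain derives from a DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp >> 3  # only the three leading bits are inspected


def cs_dscp(cs: int) -> int:
    """DSCP of class selector CSn: the number n followed by 000."""
    return cs << 3


for name, dscp in [("AF41", 0b100010), ("AF42", 0b100100), ("AF43", 0b100110),
                   ("CS4", 0b100000), ("EF", 0b101110)]:
    cs = class_selector(dscp)
    print(f"{name} ({dscp:06b}) -> CS{cs} ({cs_dscp(cs):06b})")
```

In such a domain AF41, AF42, AF43 and CS4 all collapse into CS4, so the AF drop precedence information is lost, and EF is treated like CS5.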
Expedited Forwarding (EF) PHB - 101110
RFC 3246 [60] defines the expedited forwarding per hop behaviour, which is intended to support low delay, low jitter and low loss services. It is the only PHB that is normally associated with a rate limitation strictly enforced in the DS ingress edge nodes. EF-marked traffic is expected to experience short or even empty queues in relaying nodes, which leads to low packet loss as well as low transfer delay with little jitter. Separate enqueuing and highly prioritized scheduling are the keys to EF forwarding behaviour.
Assured Forwarding (AF) PHB
RFC 2597 [88] standardizes a group of Assured Forwarding PHBs. 12 DS codepoints have been allocated for 4 AF classes (AF1, AF2, AF3, AF4) with 3 drop precedence values (1, 2, 3) each. An important constraint concerns packet ordering: packets marked with the same AF class (possibly with differing drop precedence) must not be reordered during the forwarding process. This is particularly important for multi-path forwarding (e.g. load balancing). Since PHBs apply to aggregates of IP traffic, such AF classes are also referred to as “ordered aggregates”. The recommended encoding is depicted in Fig. 47 and Table 2.
AFcd = XYZ ab 0, where XYZ is the binary class encoding (c = 1 to 4), ab is the binary drop precedence (d = 1 to 3) and the last bit is always ‘0’.
Fig. 47
Encoding of Assured Forwarding PHBs

Table 2
Assured Forwarding DSCP encoding

AF class | DP low        | DP medium     | DP high
1        | AF11 = 001010 | AF12 = 001100 | AF13 = 001110
2        | AF21 = 010010 | AF22 = 010100 | AF23 = 010110
3        | AF31 = 011010 | AF32 = 011100 | AF33 = 011110
4        | AF41 = 100010 | AF42 = 100100 | AF43 = 100110
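The regular structure of Fig. 47 means that the whole of Table 2 can be generated from the class and drop precedence numbers. A minimal Python sketch, with a hypothetical function name:

```python
def af_dscp(af_class: int, drop_prec: int) -> int:
    """DSCP of AFcd: class c in bits XYZ, drop precedence d in bits ab, last bit 0."""
    if af_class not in (1, 2, 3, 4) or drop_prec not in (1, 2, 3):
        raise ValueError("AF classes are 1-4 and drop precedences 1-3")
    return (af_class << 3) | (drop_prec << 1)


for c in range(1, 5):
    print("  ".join(f"AF{c}{d} = {af_dscp(c, d):06b}" for d in range(1, 4)))
```

The loop prints one table row per AF class, e.g. "AF11 = 001010  AF12 = 001100  AF13 = 001110" for class 1.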
All standardized PHBs are “per hop behaviour” descriptions, which only define the forwarding behaviour of a single node. No statement can be made about the overall forwarding behaviour experienced by a packet crossing a DS domain. Such chained forwarding behaviours, called “Per Domain Behaviour (PDB)”, are addressed in RFC 3086 [143]. PDB specifications are required to state specific, measurable metrics that quantify the treatment, so that they can be used in SLAs.
Interestingly, only one PDB is currently standardized, and it does not include any quantifiable metrics. It is called “Lower Effort (LE)” PDB and targets traffic of lower importance than the traditional best effort type.
Another PDB specification approach, called “’Virtual Wire’ Per-Domain Behaviour” (draft-ietf-diffserv-pdb-vw-00), never reached RFC status and expired in 2001.
Lower Effort (LE) PDB – 001000
Traffic that is of lower value than normal best effort traffic is currently marked with the
default PHB. However, such low value packets are allowed to be starved out in times of
congestion and can serve as lowest priority background traffic to effectively use available
link capacity without detrimental interaction with other traffic classes.
No mandatory DSCP encoding is given, but CS1 (001000), mentioned in RFC 3662 [34], is expected to be widely used in LE-supporting DiffServ setups.
Table 3
Currently specified PHBs

No. | DSCP    | PHB
1   | 000 000 | Default PHB / CS0
2   | 001 000 | LE / CS1
3   | 001 010 | AF11
4   | 001 100 | AF12
5   | 001 110 | AF13
6   | 010 000 | CS2
7   | 010 010 | AF21
8   | 010 100 | AF22
9   | 010 110 | AF23
10  | 011 000 | CS3
11  | 011 010 | AF31
12  | 011 100 | AF32
13  | 011 110 | AF33
14  | 100 000 | CS4
15  | 100 010 | AF41
16  | 100 100 | AF42
17  | 100 110 | AF43
18  | 101 000 | CS5
19  | 101 110 | EF
20  | 110 000 | CS6
21  | 111 000 | CS7
It should be clearly stated that all PHB and PDB encodings are recommendations, and network operators may choose alternative DSCP values for the same behaviour. That is why inter-domain PHB signalling should include the global PHB ID together with the locally applied encodings. This approach has been proposed in this work and is documented in chapter 7.3.1.
A second general remark concerns fragmentation. If IP packets are fragmented along the forwarding path, there is no explicit rule on how DiffServ marking, metering, scheduling and dropping should react to a set “more fragments” bit. As RFC 2474 states: “The policy to apply to packet fragments is outside the scope of this document”.
4.1.2 IntServ
The second fundamental QoS architecture for the Internet is the so-called “Integrated Services (IntServ)” architecture [38]. In contrast to DiffServ, IntServ targets fine-grained end-to-end QoS (see 3.1.1) with guaranteed absolute traffic parameters. Guarantees are enabled through flow-specific reservations and connection admission control in each node along the forwarding path. This requires a connection setup procedure with resource request and resource grant messages and leads to flow-specific reservation states in each relaying node. The association of packets with those reserved flow states is generally not based on a single packet header marking, but rather requires multi-field “classification” in each node. In IPv6, the “Flow Label” header
field (see Fig. 4) enables efficient IPv6 flow classification [155]. Reservations are application specific end-to-end flow states, which are signalled by means of a specialised
signalling protocol called “Resource ReSerVation Protocol (RSVP)” [39]. Reservations
exist in hosts and relaying nodes and are unidirectional. RSVP operates on top of IPv4 or
IPv6 and follows standard IP routing. Fixed route selection (source routing) can be
enforced by so called explicit route objects, which list the pinned down sequence of hops
to be used.
Reservations are soft state: they time out automatically unless they are refreshed periodically. Refresh cycles of 30 seconds are common. Reservations are receiver-initiated. Multicast is supported, and upstream reservations are merged at multicast tree junctions on the way towards the sender.
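How a soft state reservation survives lost refreshes can be sketched in Python. The sketch assumes the timer relation given in RFC 2205, where the local state lifetime L is chosen as L ≥ (K + 0.5) · 1.5 · R for refresh period R, with the suggested K = 3 tolerating K - 1 successive lost refresh messages; the class name is hypothetical.

```python
def state_lifetime(refresh_period_s: float, k: int = 3) -> float:
    """Local state lifetime L = (K + 0.5) * 1.5 * R, the lower bound given in RFC 2205."""
    return (k + 0.5) * 1.5 * refresh_period_s


class SoftStateReservation:
    """Reservation state that disappears unless it is refreshed in time (illustrative)."""

    def __init__(self, now: float, refresh_period_s: float = 30.0):
        self.lifetime = state_lifetime(refresh_period_s)
        self.expires = now + self.lifetime

    def refresh(self, now: float) -> None:
        # A received refresh message restarts the cleanup timer.
        self.expires = now + self.lifetime

    def alive(self, now: float) -> bool:
        return now < self.expires
```

With the common 30 s refresh period, the state lives 157.5 s after the last refresh, so two consecutive lost refreshes do not tear down the reservation.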
Five fundamental questions are addressed in the IntServ architecture’s signalling:
• How to identify a flow (associate packets with reserved flows)? → FilterSpec,
• What is the sender’s traffic profile? → TSPEC,
• What guarantee profile is requested by the receiver? → RSPEC,
• Which reservation style is used in the FilterSpec? → fixed, shared, wildcard and
• Which service models are requested for the network element behaviour? → “Controlled-Load Network Element Service” or “guaranteed service”.
Since the resource reservation is flow-oriented, the flow descriptor (see Fig. 48) becomes
a central architectural element.
Flow Descriptor
  FilterSpec (‘filter specification’): rules to associate packets to flows → classifier setup
  FlowSpec:
    TSpec (‘traffic specification’): describes the expected traffic
    RSpec (‘reserve specification’): describes the flow reservations → scheduler setup
Fig. 48
RSVP flow descriptor structure
The FilterSpec is generally not limited to certain header and/or payload fields of the respective IP packets. In practice, however, source IP address and source port are mainly used for flow classification. Three classes of filters (and hence classifications) are specified: “fixed filter”, which leads to separate reservations for each sender, “shared filter”, which allows resource sharing between several named senders, and “wildcard filter”, which allows resource sharing between all senders.
The working principle of RSVP is as follows. A traffic source offers an application service to potential clients and describes the offered traffic flow characteristics by means of token bucket parameters in the so-called TSpec structure. This parameter set is carried unchanged from one node to the next in RSVP PATH messages. The TSpec eventually reaches the potential clients, and those receivers decide individually whether they start a reservation upstream. This is signalled using RESV messages with concrete receiver TSpec values, which are now referred to as “RSpec values”. Intermediate nodes reserve the requested resources according to those RSpec values in their scheduler setup. The TSpec and RSpec parameter encoding is defined in RFC 2210 [182].
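A TSpec is essentially a token bucket, and checking a packet trace against it can be sketched in Python. The sketch assumes the RFC 2210 token bucket parameters rate r and depth b together with the minimum policed unit m and maximum packet size M; the peak rate p is ignored for brevity, and the class name, function name and parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TSpec:
    r: float  # token bucket rate in bytes per second
    b: float  # token bucket depth in bytes
    m: int    # minimum policed unit in bytes
    M: int    # maximum packet size in bytes


def conforms(tspec: TSpec, trace) -> bool:
    """Check a time-ordered trace of (arrival_time_s, size_bytes) against the TSpec.

    The peak rate p is not checked in this sketch.
    """
    tokens, last = tspec.b, None  # the bucket starts full
    for t, size in trace:
        if size > tspec.M:
            return False  # larger than the maximum packet size
        if last is not None:
            tokens = min(tspec.b, tokens + (t - last) * tspec.r)
        last = t
        policed = max(size, tspec.m)  # small packets count as m bytes
        if policed > tokens:
            return False
        tokens -= policed
    return True


flow = TSpec(r=1000.0, b=2000.0, m=64, M=1500)
print(conforms(flow, [(0.0, 1500), (0.5, 1000)]))
```

A trace conforms if, for every interval of length T, no more than r·T + b bytes are sent; the bucket simulation checks this bound arrival by arrival.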
Two QoS service models have been standardized: “Guaranteed Quality of Service” [166], which implies strict bounds on end-to-end datagram queueing delays, and “Controlled-Load” [183], which approximates QoS by means of capacity-based admission control in order to emulate a lightly loaded network for the respective flows.
Fig. 49 and Fig. 50 depict the message flow and node internal block diagram structure.
Fig. 49 also points out the merging of RSpecs at multicast tree junctions. Single receiver requests are combined, and upstream signalling and resource reservation are based on the combined reservation requests.
Fig. 49
RSVP message flow diagram
Fig. 50
RSVP support block diagram – after [39]
The policy control and admission control functions are normally harmonized across an IntServ domain. Their central management is achieved through policy servers (also referred to as “Policy Decision Points (PDP)”). The query/response information exchange makes use of the “Common Open Policy Service (COPS)” protocol [91].
4.1.3 IntServ / DiffServ combination
The fine-granular end-to-end reservations of IntServ cannot be scaled to a globally extended network. However, two solutions exist to tackle this scalability problem:
a) create coarse-grained reservations by applying IntServ to tunnels and
b) setup the interworking of IntServ with DiffServ networks in order to gain some QoS
advantage from non-IntServ enabled networks.
The first approach is commonly applied in MPLS-based networks, which is addressed in
chapter 4.3. The second approach is standardized in RFC 2998 “A Framework for
Integrated Services Operation over DiffServ Networks“ [29] and will be briefly outlined
below.
End-to-end QoS in such an IntServ/DiffServ combined network path is targeted by regarding whole DiffServ domains as virtual links between the IntServ-capable routers or hosts. That is, fine-grained multi-field classification is performed in the IntServ realm, and behaviour aggregate (BA) classification (namely DSCP classification) is applied in the virtual links. Through the definition of supported PHBs and the rate-limited admission control into the DiffServ domain, those virtual links obtain predictable forwarding behaviour.
As an example, the RSVP support in Cisco routers can be switched between the standard
IntServ operation and the RFC2998 mode of operation (see Fig. 51). The difference lies in
the way of classification as well as the low latency queuing in the data plane for the
IntServ/DiffServ mode.
Fig. 51
Cisco’s two RSVP operation models: IntServ and IntServ/DiffServ [53]
4.1.4 ITU-T IP QoS concept
Besides the IETF standardization of IP QoS, ITU-T has also targeted IP QoS in its recommendations Y.1221 [107] and Y.1541 [108]. Generally speaking, Y.1221 distinguishes three transfer capabilities, which are arrangements of traffic control and congestion control functions. A concept of traffic contracts between users and the network is assumed. Y.1541 adds classes of network QoS with objectives for IP network performance parameters in absolute numbers, which form the basis for the respective contracts. Table 4 lists the six specified classes, their parameter limits and the suggested association with DiffServ PHBs.
Table 4
Excerpt of IP QoS class definitions and performance objectives [108]
Two further classes (6 and 7) have already been identified, but with provisional status.
That is, eight classes can be expected for ITU-T’s IP QoS recommendations in the future.
4.2 Ethernet QoS
The predominantly used layer two networking technology in today’s data networks is Ethernet. Several flavours have been standardized with respect to transmission speed and channel allocation behaviour. Commonly used Ethernet variants are Fast Ethernet (100 Mbps) with emulated shared medium operation, Gigabit Ethernet (1 Gbps), which exclusively uses dedicated transmission lines, and 10 Gigabit Ethernet (10 Gbps). 40 and 100 Gigabit Ethernet (40 and 100 Gbps) are currently being specified by the IEEE P802.3ba task force. These Ethernet types differ substantially in their physical and operational specifications, but they all use one and the same frame format (see Fig. 52).
Fig. 52
Ethernet frame format
Almost all local area networks (LAN) and many metro networks rely on Ethernet for device
networking. A special development has been made with Wireless LAN (WLAN) [99], which
uses an Ethernet-like frame structure as well.
Ethernet is not only the major LAN and metro technology. The Internet at global scale also relies mostly on Ethernet-based exchange points for the mass data transmission at
the interconnection of ASes in the core (about 130,000 terabytes of data are exchanged per month [8]). However, native Ethernet does not provide any QoS functionality, and core interconnects are currently based on that standard Ethernet framing. The “virtual LAN (VLAN)” extension to Ethernet added four octets to the frame structure for the purpose of grouping end devices into VLAN-ID groups. That is, an Ethernet node can now belong to one or more virtual LANs and communicate only with peers within the same virtual
LAN(s). This VLAN-ID filtering in the relay nodes has been standardized in IEEE Std
802.1Q [98]. This tag field not only includes the respective VLAN-ID field, but also provides
three priority bits. The actual usage of those user priority bits has been defined by the
IEEE P802.1p task group and is now published in IEEE Std 802.1D [97]. Fig. 53 depicts
the resulting tagged Ethernet frame structure.
Fig. 53
IEEE 802.1p User Priority marking in 802.1Q (VLAN) tagged frames
VLAN enabled Ethernets are therefore able to classify frames into eight different classes
and provide class-based forwarding in Ethernet interconnection devices – called
“switches”. IEEE 802.1D gives detailed specifications on how marked frames are to be
enqueued (strict priority queuing only) and forwarded depending on the available number
of queues in the switching device. Since traffic type “2” is not used, seven so-called traffic types are specified, as listed in Table 5. Interestingly, user priorities “1” and “2” are treated as lower priority than the best effort (“0”) traffic type.
Table 5
Ethernet traffic types [97]
Table 6 shows the mapping and merging of traffic types according to the number of traffic
separating queues.
Table 6
Mapping of traffic types to available queues [97]
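The strict priority behaviour of a port with eight queues can be sketched in Python. The user-priority-to-queue table hard-codes the eight-queue column as we read it from IEEE 802.1D (priorities 1 and 2 below best effort 0, all others in ascending order); it should be checked against Table 6, and the class name is hypothetical.

```python
from collections import deque

# User priority -> queue (traffic class) on a port with eight queues, as read
# from IEEE 802.1D; a higher queue number is served first.
UP_TO_QUEUE = {0: 2, 1: 0, 2: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7}


class StrictPriorityPort:
    def __init__(self, mapping=UP_TO_QUEUE):
        self.mapping = mapping
        self.queues = [deque() for _ in range(8)]

    def enqueue(self, frame, user_priority: int) -> None:
        self.queues[self.mapping[user_priority]].append(frame)

    def dequeue(self):
        # Strict priority: always serve the highest non-empty queue first.
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None  # all queues are empty


port = StrictPriorityPort()
for frame, prio in [("background", 1), ("best-effort", 0), ("voice", 6)]:
    port.enqueue(frame, prio)
print([port.dequeue() for _ in range(3)])
```

Because service is strictly ordered, a permanently busy higher queue can starve the lower ones, which is why high priorities are usually reserved for rate-limited traffic.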
Because of the widespread usage of Ethernet in modern businesses and its available
traffic separation with strict priority queuing, IP QoS and Ethernet QoS are often combined
to achieve maximum QoS support in company internal networks – so called “Intranets”.
Chemnitz University, for example, applies the mapping between Ethernet priorities and IP DSCP markings given in Table 7.
Table 7
Chemnitz University applied Ethernet-priority-to-DSCP mapping
A further development in QoS-based Ethernet deployment is currently being worked on under the title “Carrier-grade Ethernet”. Several companies are pushing the standardization process with different proposals for frame encapsulation and operational variants. Without going into detail, Fig. 54, Fig. 55 and Fig. 56 show the resulting frame structures.
Fig. 54
VLAN Cross Connect / VLAN XC [123]
Fig. 55
Q-in-Q / stacked VLAN / Provider Bridges - IEEE 802.1ad [100]
Fig. 56
MAC-in-MAC / Provider Backbone Bridges (PBB) – IEEE 802.1ah [101]
The most prominent approach is called “PBB Traffic Engineering (PBB-TE)” or “Provider Backbone Transport (PBT)”; it reuses the 802.1ah frame structure.
Whichever approach prevails, they all carry the user priority encoding once or even several times in their respective frame structure. Furthermore, the encapsulation of Ethernet frames nested in a second Ethernet frame (MAC-in-MAC approach) allows for tunnelled transport of customer frames. That is, not only standardized Ethernet QoS forwarding is available, but also QoS-based tunnelling and QoS-based layer two “routing”. This plethora of choice is of particular interest for this thesis due to the importance of Ethernet for future Internet setups as well as the good match to the cross-layer mappings enabled by the new Inter-AS QoS concept (see chapter 7).
QoS-based Ethernet congestion control
The last paragraph on Ethernet QoS touches on a further QoS-related approach currently under development. Ethernet defines a congestion control mechanism that allows switching devices to stop the upstream neighbour’s sending by means of so-called PAUSE control frames. Such a locally generated frame signals a pause request in times of congestion, with the pause duration expressed in units of 512 bit times (“pause quanta”). Upstream neighbours can in turn be forced to pause
59
17.11.2009
their traffic sources creating a chain of backpressure under high network load situations.
However, sending selective PAUSE frames to upstream neighbours with differentiated
pause times for the 7 traffic classes appears to be a simple and promising approach in
order to confine the congestion effects in lower layers to low priority traffic types. Such
“Priority-based Flow Control (PFC)” has already been addressed e.g. in [171] and [126].
IEEE has started the work on PFC within a task group 802.1Qbb by the end of 2007. It
extends the original PAUSE based flow control as of IEEE 802.3x [102].
In principle, a pause request no longer stops all transmission, but only the transmission of the signalled priorities. Given that Ethernet traffic is sorted into eight lanes (priorities), Fig. 57 depicts the idea of the concept.
Fig. 57
Priority Flow Control [56]
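The difference between a classic PAUSE frame and a priority-based one can be sketched in a few lines. The Python sketch below builds only the MAC Control payloads, assuming the encoding commonly used for IEEE 802.3x PAUSE (opcode 0x0001, one 16-bit pause time) and for 802.1Qbb PFC (opcode 0x0101, a class-enable vector and eight 16-bit pause times, one per priority). Pause times are counted in quanta of 512 bit times, so the actual pause duration depends on the link rate. Ethernet header, padding and FCS are omitted, and the function names are hypothetical.

```python
import struct
from typing import Dict

OPCODE_PAUSE = 0x0001   # IEEE 802.3x PAUSE
OPCODE_PFC = 0x0101     # priority-based flow control (802.1Qbb)
BITS_PER_QUANTUM = 512  # one pause quantum = 512 bit times

def pause_payload(quanta: int) -> bytes:
    """Classic PAUSE: a single pause time that stops *all* traffic on the link."""
    return struct.pack("!HH", OPCODE_PAUSE, quanta)

def pfc_payload(pause_quanta: Dict[int, int]) -> bytes:
    """PFC: pause only the priorities given in pause_quanta (priority -> quanta)."""
    enable_vector = 0
    times = [0] * 8
    for prio, quanta in pause_quanta.items():
        if not 0 <= prio <= 7:
            raise ValueError("priority must be in 0..7")
        enable_vector |= 1 << prio  # bit n enables the time field of priority n
        times[prio] = quanta
    return struct.pack("!HH8H", OPCODE_PFC, enable_vector, *times)

def pause_duration_s(quanta: int, link_rate_bps: float) -> float:
    """Convert pause quanta into seconds for a given link rate."""
    return quanta * BITS_PER_QUANTUM / link_rate_bps

# Pause only the two lowest priority lanes (0 and 1) for the maximum time
payload = pfc_payload({0: 0xFFFF, 1: 0xFFFF})
print(payload.hex())
print(f"{pause_duration_s(0xFFFF, 10e9) * 1e3:.3f} ms at 10 Gbit/s")
```

Both frame types are sent with EtherType 0x8808 to the reserved multicast address 01-80-C2-00-00-01; with PFC, lanes whose enable bit is not set keep forwarding.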
The OMNeT++ simulator has been extended to include priority-based PAUSE functionality.
However, this research is not yet completed and is not published in this thesis.
In general, backpressure flow control schemes raise the hazard of congestion spreading if an (intermediate) source sends several flows towards different output directions, only some of which contribute to a single congested output. The resulting PAUSE backpressure, however, stops all flows indiscriminately, which artificially spreads the congestion to practically uncongested forwarding paths as well.
Fig. 58
Congestion spreading [103]
The current 802.1Qbb approach is therefore combined with a congestion notification mechanism as described in 802.1Qau [104]. With this combined approach, the source actually responsible for the congestion eventually receives the congestion notification and slows down selectively.
A simpler mechanism under investigation sends the priority-based pause together with a destination MAC address. That is, the intermediate sending node stops only those prioritized frames that are destined to the MAC address identified as contributing the highest load to the congested output queue. Research results will be published outside of this thesis.
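The effect of undifferentiated backpressure and the idea of a destination-selective pause can be contrasted in a toy model. The Python sketch below is a hypothetical illustration, not the mechanism investigated by the author: an upstream node sends flows towards several destination MAC addresses, one output port of the downstream switch is congested, and the node either pauses everything (classic behaviour) or only the flows towards the destination contributing the most load to the congested port. Flow data and the selection rule are assumptions.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass(frozen=True)
class Flow:
    dst_mac: str
    output_port: str  # output port at the downstream switch
    priority: int
    rate_mbps: float

def undifferentiated_pause(flows: List[Flow]) -> Set[Flow]:
    """Classic PAUSE: every flow from the upstream node is stopped."""
    return set(flows)

def destination_pause(flows: List[Flow], congested_port: str, priority: int) -> Set[Flow]:
    """Hypothetical variant: pause only flows of one priority towards the
    destination MAC that contributes the highest load to the congested port."""
    candidates = [f for f in flows
                  if f.output_port == congested_port and f.priority == priority]
    if not candidates:
        return set()
    load = {}
    for f in candidates:
        load[f.dst_mac] = load.get(f.dst_mac, 0.0) + f.rate_mbps
    top_mac = max(load, key=load.get)
    return {f for f in candidates if f.dst_mac == top_mac}

flows = [
    Flow("00:00:00:00:00:0a", "port1", 0, 600.0),  # main contributor to port1 congestion
    Flow("00:00:00:00:00:0b", "port1", 0, 100.0),
    Flow("00:00:00:00:00:0c", "port2", 0, 500.0),  # uncongested path
]

for name, paused in (("PAUSE", undifferentiated_pause(flows)),
                     ("dest-MAC pause", destination_pause(flows, "port1", 0))):
    still_sending = sorted(f.dst_mac for f in flows if f not in paused)
    print(f"{name}: still sending to {still_sending}")
```

Under classic PAUSE the flow towards port2 is stopped although its path is not congested; the destination-selective variant leaves it, and the small flow into port1, running.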
Generally speaking, the VLAN-based QoS support within Ethernet and the fast-growing deployment of Ethernet technology in all networking areas underline the importance of a close cross-layer coordination of traffic separation and its marking with the ubiquitous IP networking and the proposed CoS concept.
4.3 MPLS QoS
“Multi-Protocol Label Switching (MPLS)” is the widely accepted tunnelling mechanism in IP networks and is used to introduce connection-oriented forwarding behaviour into the datagram network. Although MPLS is capable of encapsulating the data units of any networking protocol, it is currently used predominantly for IP packet forwarding. In most cases, MPLS introduces an additional header, the so-called “shim header”, which encapsulates the packet data behind it. The second important characteristic is a new 20-bit addressing scheme, called “label”, whose values have only local significance on each interconnection. During MPLS forwarding, incoming labels are swapped for outgoing labels using a “label information base (LIB)”. The strict swapping operation, together with a LIB establishment procedure, creates a forwarding chain with fixed relaying nodes, the so-called “Label Switched Path (LSP)”. This concept of fixed-length local addresses with chained relaying is not new; it can also be found, e.g., in ATM and Frame Relay networks. Although MPLS can be used on top of any underlying technology capable of packet transport, separate mapping procedures exist for ATM and FR because of this addressing similarity.
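The label swapping along an LSP can be sketched as a chain of per-node lookup tables. In the hypothetical Python example below, each label switching router holds a label information base that maps an incoming (interface, label) pair to an outgoing interface and label; node names, interfaces and label values are assumptions. Because labels have only local significance, the same value (here 17) can appear on different links with unrelated meanings.

```python
from typing import Dict, List, Optional, Tuple

# (incoming interface, incoming label) -> (outgoing interface, outgoing label)
LibTable = Dict[Tuple[str, int], Tuple[str, Optional[int]]]

# Hypothetical LSP: ingress LER A (pushes label 17) -> LSR B -> LSR C -> egress LER D
LIBS: Dict[str, LibTable] = {
    "B": {("if-from-A", 17): ("if-to-C", 42)},
    "C": {("if-from-B", 42): ("if-to-D", 17)},   # label 17 reused on another link
    "D": {("if-from-C", 17): ("local", None)},   # egress: pop label, deliver IP packet
}
# Which node and incoming interface sit behind each outgoing interface
NEXT_HOP = {"if-to-C": ("C", "if-from-B"), "if-to-D": ("D", "if-from-C")}

def forward(first_node: str, in_if: str, label: int) -> List[str]:
    """Follow the label swaps hop by hop and return a trace of the LSP."""
    trace = []
    node = first_node
    current: Optional[int] = label
    while current is not None:
        out_if, out_label = LIBS[node][(in_if, current)]
        shown = out_label if out_label is not None else "pop"
        trace.append(f"{node}: {current} -> {shown}")
        if out_if == "local":
            break
        node, in_if = NEXT_HOP[out_if]
        current = out_label
    return trace

print("\n".join(forward("B", "if-from-A", 17)))
```

The sketch pops the label at the egress node; many deployments use penultimate hop popping instead, which would move the pop operation to node C.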
The pre-established LSPs are excellent tunnels for traffic engineering. MPLS is therefore widely used to perform traffic engineering (flow steering and fast restoration) in addition to the always available shortest path first IP routing. Fig. 59 depicts the shim header (label stack entry format [160]) and its usage between, e.g., an Ethernet transport in layer two and the IP payload in layer three. The figure also implies the label stacking option, which is explicitly shown in Fig. 60. Such a hierarchy of tunnels is a new and powerful option for scalable, fast restorable and transparent transport, even of already tunnelled customer traffic. This capability is of particular interest, since customer traffic is most elegantly transported in tunnels and might be forwarded in nested tunnels in carrier’s carrier scenarios. The latter implies inter-carrier (normally inter-AS) MPLS LSP scenarios, which are not commonly used today, but are envisioned for the near future.
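The 32-bit label stack entry (20-bit label, 3-bit TC/EXP field, 1-bit bottom-of-stack flag S and 8-bit TTL, following the label stack entry format [160]) can be encoded and decoded with a few bit operations. The Python sketch below builds a two-level stack as used for nested tunnels; the label values are arbitrary examples.

```python
import struct
from typing import List, NamedTuple

class LabelStackEntry(NamedTuple):
    label: int  # 20 bits
    tc: int     # 3 bits, Traffic Class (formerly EXP)
    s: int      # 1 bit, bottom of stack
    ttl: int    # 8 bits

def encode_stack(entries: List[LabelStackEntry]) -> bytes:
    """Serialize label stack entries, outermost first."""
    out = b""
    for e in entries:
        if not (0 <= e.label < 2**20 and 0 <= e.tc < 8 and e.s in (0, 1) and 0 <= e.ttl < 256):
            raise ValueError(f"field out of range: {e}")
        word = (e.label << 12) | (e.tc << 9) | (e.s << 8) | e.ttl
        out += struct.pack("!I", word)
    return out

def decode_stack(data: bytes) -> List[LabelStackEntry]:
    """Decode entries until the one with the bottom-of-stack bit set."""
    entries = []
    for offset in range(0, len(data) - 3, 4):
        (word,) = struct.unpack_from("!I", data, offset)
        entry = LabelStackEntry(word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF)
        entries.append(entry)
        if entry.s:
            break
    return entries

# Outer (provider tunnel) label and inner (customer LSP) label, both with TC 5
stack = [LabelStackEntry(1000, 5, 0, 64), LabelStackEntry(2000, 5, 1, 64)]
wire = encode_stack(stack)
print(wire.hex(), decode_stack(wire) == stack)
```

Each level of the hierarchy carries its own TC field, so the outer tunnel can be scheduled independently of the class marked on the inner customer LSP.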
In terms of QoS support, the shim header “Traffic Class (TC)” field is the obvious class encoding. It is a three-bit field, which allows for eight classes of tunnelled traffic. Historically, this field was named “EXP”, since the bits were reserved for experimental use. However, RFC 3270 [75] explicitly targets MPLS support for Differentiated Services and introduces the so-called “EXP-Inferred-PSC LSPs (E-LSP)”. The eight available behaviour aggregates (BAs) correspond to DiffServ behaviour aggregates with associated per-hop treatments. However, the RFC deliberately leaves the mapping between DSCP encoding and E-LSP encoding to signalling or to a preconfigured setup, without specifying it further.
Fig. 59
MPLS shim header structure and hierarchy usage
Fig. 60
MPLS Label stack structure
Because the EXP bits are now dedicated to QoS, the field has recently been renamed “TC”. RFC 5462 [9] updates the respective DiffServ-related MPLS RFCs, but keeps the “E-LSP” abbreviation.
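Since the DSCP-to-TC mapping for E-LSPs is left to signalling or preconfiguration, an operator has to define such a table itself. The Python sketch below shows a hypothetical preconfigured mapping at an ingress LER: several DSCPs are aggregated onto one TC value because only eight values are available. The particular assignment is an illustrative assumption, not a standardized table; only the DSCP code points themselves (EF = 46, AFxy, CS6/CS7) are standard values.

```python
from typing import Dict

# Hypothetical operator-defined mapping (not standardized): DSCP -> MPLS TC
DSCP_TO_TC: Dict[int, int] = {
    46: 5,                 # EF (e.g. voice)
    34: 4, 36: 4, 38: 4,   # AF41/AF42/AF43 share one TC
    26: 3, 28: 3, 30: 3,   # AF3x
    18: 2, 20: 2, 22: 2,   # AF2x
    10: 1, 12: 1, 14: 1,   # AF1x
    48: 6, 56: 6,          # CS6/CS7 network control
}
DEFAULT_TC = 0  # everything else is treated as best effort

def tc_for_packet(tos_byte: int) -> int:
    """Derive the E-LSP TC value from the IPv4 ToS / IPv6 Traffic Class byte."""
    dscp = tos_byte >> 2  # upper six bits carry the DSCP, lower two bits are ECN
    return DSCP_TO_TC.get(dscp, DEFAULT_TC)

for tos in (0xB8, 0x88, 0x00, 0x2A):
    print(f"ToS 0x{tos:02X} -> DSCP {tos >> 2:2d} -> TC {tc_for_packet(tos)}")
```

With this aggregation, the drop precedences within an AF class (e.g. AF41 versus AF42) share one TC value and are no longer distinguishable inside the E-LSP unless further TC values are spent on them.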
The second option for mapping IP traffic classes onto MPLS tunnels is given by the so-called “Label-Only-Inferred-PSC LSPs (L-LSP)”. Here, each traffic class is associated with its own tunnel, represented by a different entry label. This extends the limit of eight classes to, theoretically, about a million supportable classes (20-bit label space). This L-LSP QoS support takes up the class separation generally available in all “tunnel or virtual channel based” transport technologies, where each class is mapped into a separate encapsulation, which might also involve separate forwarding paths per class.
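With L-LSPs, the class is implied by the label itself, so the ingress only has to select the right tunnel per class. The hypothetical Python sketch below keeps one LSP, identified by its entry label, per traffic class towards the same egress; class names, label values and paths are assumptions. It also computes the usable label space per link, assuming label values 0 to 15 are reserved as in the MPLS label stack encoding.

```python
from dataclasses import dataclass
from typing import Dict, List

LABEL_SPACE = 2**20
RESERVED_LABELS = 16  # label values 0..15 are reserved

@dataclass
class LLsp:
    traffic_class: str
    entry_label: int
    path: List[str]  # may differ per class, e.g. a low-delay path for voice

# Hypothetical per-class L-LSPs from ingress PE1 to egress PE2
L_LSPS: Dict[str, LLsp] = {
    "voice":       LLsp("voice", 3001, ["PE1", "P1", "PE2"]),
    "business":    LLsp("business", 3002, ["PE1", "P1", "PE2"]),
    "best-effort": LLsp("best-effort", 3003, ["PE1", "P2", "P3", "PE2"]),
}

def select_lsp(traffic_class: str) -> LLsp:
    """Ingress classification: the class alone determines the tunnel (and label)."""
    return L_LSPS.get(traffic_class, L_LSPS["best-effort"])

def class_from_label(label: int) -> str:
    """Infer the class from the entry label; later hops keep an equivalent
    per-LSP mapping for their own swapped labels, without looking at TC bits."""
    for lsp in L_LSPS.values():
        if lsp.entry_label == label:
            return lsp.traffic_class
    raise KeyError(label)

print(select_lsp("voice").path, class_from_label(3003))
print("usable labels per link:", LABEL_SPACE - RESERVED_LABELS)
```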
EXP/TC-inferred QoS forwarding, label-inferred QoS tunnelling and LSP hierarchies together enable numerous quality of service support options. This plethora of choice is of particular interest for this thesis due to the importance of MPLS for future Internet setups as well as the good match to the cross-layer mappings enabled by the new Inter-AS QoS concept (see chapter 7).
A crucial component of MPLS QoS support is the control part for path setup (label distribution) and TC marking distribution. The MPLS working group developed two major signalling branches for QoS-related traffic engineering. One branch extended the specifically developed “Label Distribution Protocol (LDP)” [10] into RFC 3212, “Constraint-based Routed LDP (CR-LDP)” [122]. The second branch reused the existing IntServ signalling protocol, RSVP, and specified traffic engineering extensions to it. The resulting “Resource Reservation Protocol-Traffic Engineering (RSVP-TE)” [13] defines several protocol objects, which are most importantly used to convey the label information. For this thesis, it is important to note that RSVP-TE does not directly signal EXP/TC markings during the LSP setup procedure. Instead, a more general “colour” concept is used, which is implemented as a resource class attribute [14]. This new attribute becomes part of a path description and can, for instance, be used for constraint-based routing.
In 2003, the MPLS working group decided to favour RSVP-TE for MPLS LSP path setup and to “refrain from entertaining work that intends to progress RFC 3212 or related RFCs beyond proposed standard” [11].
Two major improvements to MPLS have been tackled during the past few years: Inter-Domain MPLS and GMPLS Traffic Engineering. Both are of importance to the thesis’ class of service concept.
Inter-Domain MPLS
RFC 5151 [73] specifies the RSVP-TE extensions for the inter-domain MPLS TE framework defined in [74]. Taking into account the operators’ independence in terms of MPLS support and configuration, three methods of LSP establishment have been identified, as depicted in Fig. 61. Contiguous LSPs are set up following the procedures of RFC 3209 [13] and RFC 3473 [26]. Nested LSPs follow RFC 4206 [128], and stitched LSPs are described in RFC 5150 [15]. Interestingly, these setup procedures do not specify how inter-domain EXP/TC encoding information is to be exchanged. Instead, coloured routing is again proposed.
Fig. 61
MPLS LSP signalling: contiguous, nested, stitched
Generalized Multi-Protocol Label Switching (GMPLS)
GMPLS generalizes the MPLS approach towards physical representations of “labels”. RFC 3471 [25] describes the functional signalling part of the extended control plane. The targeted physical label switching representations are time-division (e.g., Synchronous Optical Network and Synchronous Digital Hierarchy, SONET/SDH), wavelength (optical lambdas) and spatial switching (e.g., incoming port or fibre to outgoing port or fibre), see Fig. 62.
Fig. 62
GMPLS label representations
Although such generalized labels cannot be marked with traffic class bits, the virtual channel concept that these time-division, lambda or spatial encapsulations represent is well capable of QoS-based tunnelling support, even more so with the decoupled and highly hierarchical generalized encapsulation concept (Fig. 63).
Fig. 63
GMPLS LSP hierarchy
The class of service concept of this thesis therefore encompasses virtual-channel-based traffic class separation as a layer one QoS approach and deliberately includes GMPLS-based tunnelling.
4.4 QoS in access networks
Discussing wired and wireless access networks in a thesis about BGP-signalled class of service support is justified by the increasing use of intra-AS QoS support up to the provider-customer edge of a network. Rapidly increasing access rates on a multi-megabit scale and the fast-growing acceptance of data-heavy application services by the Internet community cause not only large volumes of traffic, but also a sensitive mixture of loss- and time-critical data streams within that volume. Standardized access technologies all provide means for QoS-based traffic separation, and Europe seems to take a leading role in actually applying them on customer lines and channels. The following section briefly lists some typical QoS class sets, which are often even standardized with fixed parameter limits for loss, delay and delay variation.
Wireless access technologies – UMTS
The currently widely available wireless access technology “Universal Mobile Telecommunications System (UMTS)” is a third generation (3G) mobile telecommunications technology. It offers mobile data transfer services that enable mobile devices to participate in IP communication networks. The UMTS standard documents (e.g. 3GPP TS 23.107 [1]) specify a detailed quality of service concept and architecture, which is based on four so-called UMTS bearer services. Table 8 lists the resulting QoS classes with their respective characteristics and applications. The conversational, streaming, interactive and background classes have detailed specifications for requirement and service parameters (see Table 9).
Table 8
UMTS QoS classes [1]
Table 9
UMTS Bearer Service Attributes [1]
Customer traffic entering through prioritised UMTS bearers can be expected to be prioritised in the provider’s DiffServ-capable backhaul network as well, and should be passed on across AS boundaries towards the destination network, e.g. in a class set of four.
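The hand-over from UMTS bearer classes to a DiffServ backhaul and then to an inter-AS class set of four could be configured as a simple lookup. The Python sketch below is a hypothetical mapping: the DSCP values and inter-AS class names are assumptions chosen for illustration, since this section does not fix them. Interactive class traffic is additionally refined by its traffic handling priority (THP, 1 = highest), one of the UMTS bearer service attributes.

```python
from typing import Optional, Tuple

# Hypothetical mapping: UMTS traffic class -> (backhaul DSCP, inter-AS class of a four-class set)
UMTS_TO_DIFFSERV = {
    "conversational": (46, "realtime"),     # EF
    "streaming":      (34, "streaming"),    # AF41
    "interactive":    (18, "interactive"),  # AF21, refined by THP below
    "background":     (0,  "best-effort"),  # default PHB
}

# Hypothetical refinement of the interactive class by traffic handling priority
INTERACTIVE_THP_DSCP = {1: 26, 2: 18, 3: 10}  # AF31, AF21, AF11

def map_bearer(umts_class: str, thp: Optional[int] = None) -> Tuple[int, str]:
    """Return (DSCP for the backhaul, class name used across AS boundaries)."""
    dscp, inter_as_class = UMTS_TO_DIFFSERV[umts_class]
    if umts_class == "interactive" and thp is not None:
        dscp = INTERACTIVE_THP_DSCP[thp]
    return dscp, inter_as_class

for cls, thp in (("conversational", None), ("interactive", 1), ("background", None)):
    print(cls, map_bearer(cls, thp))
```

The THP refinement stays inside the backhaul; across the AS boundary all interactive traffic falls into one of the four inter-AS classes.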
Wireless access technologies – LTE
The demand for QoS concepts in the core of the Internet is highest when core capacity grows more slowly than access speeds. This is particularly true for the 3GPP “Long Term Evolution (LTE)” standard, which is currently being developed as a set of enhancements to UMTS. The official target for achievable customer data rates is 100 Mbps downlink and 50 Mbps uplink [3]. The pressure on the operator’s core network becomes even greater once LTE-Advanced is specified, which promises access rates of up to 1 Gbps for low mobility usage [4].
LTE addresses the different needs of such potentially high speed interactive services by defining nine QoS classes, as shown in Table 10. Some classes repeat the same characteristics; this is due to the distinction between the traffic of premium and non-privileged subscribers.
Table 10
LTE QoS class attributes [2]
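The repetition of classes becomes visible when the QoS class characteristics are grouped by resource type, packet delay budget and packet error loss rate. The Python sketch below assumes the values of the nine Release 8 QoS class identifiers (QCIs) of 3GPP TS 23.203; they should be checked against Table 10 before reuse.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

class Qci(NamedTuple):
    qci: int
    gbr: bool      # guaranteed bit rate bearer?
    priority: int  # 1 = highest
    delay_ms: int  # packet delay budget
    loss: float    # packet error loss rate

# Assumed Release 8 values (3GPP TS 23.203), compare with Table 10
QCIS = [
    Qci(1, True, 2, 100, 1e-2), Qci(2, True, 4, 150, 1e-3),
    Qci(3, True, 3, 50, 1e-3), Qci(4, True, 5, 300, 1e-6),
    Qci(5, False, 1, 100, 1e-6), Qci(6, False, 6, 300, 1e-6),
    Qci(7, False, 7, 100, 1e-3), Qci(8, False, 8, 300, 1e-6),
    Qci(9, False, 9, 300, 1e-6),
]

def repeated_classes(qcis: List[Qci]) -> Dict[Tuple[bool, int, float], List[int]]:
    """Group QCIs that differ only in their priority value."""
    groups = defaultdict(list)
    for q in qcis:
        groups[(q.gbr, q.delay_ms, q.loss)].append(q.qci)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

print(repeated_classes(QCIS))
```

Under these assumed values, QCIs 6, 8 and 9 share delay budget and loss rate and differ only in priority, which lets an operator separate, e.g., premium from non-privileged subscribers’ best-effort traffic.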
Wireless access technologies – WiMAX
The “Worldwide Interoperability for Microwave Access (WiMAX)” wireless telecommunication technology has been standardized as IEEE 802.16 [105] and is regarded as an alternative to wired broadband access solutions. WiMAX defines several QoS setups (provisioned, admitted and active service flows), which are unidirectional, flow-based and either dynamically signalled or statically configured. Generally, each supported “service flow” is signalled with a detailed set of associated QoS parameters. Large-scale WiMAX deployments cannot, however, guarantee that the same service flow setups are consistently available throughout the network at all times. The solution is so-called “global service classes”. A vital part of the WiMAX QoS concept is the naming of service flows, which are then associated with parameter sets. That is, global service classes carry names that follow standardized naming rules. The resulting names can therefore be parsed and reveal pointers into standardized tables of fixed options or parameter settings.
In a way, these parameter tables and their combined referencing constitute the provided class sets. This yields a very large range of possible classes, which is in no way comparable to the simple IP or Ethernet class sets mentioned above.
WiMAX does not provide simple flow prioritization; only when two flows have identical parameter sets but different priorities does a so-called “traffic priority parameter” decide on precedence. The interworking of WiMAX’s detailed QoS approach with simple IP or Ethernet classes of service is therefore not trivial. WiMAX, however, provides a means to associate, e.g., Ethernet priorities with WiMAX service flows via classifier rules. Among others, the Ethernet user priority or the IP DSCP value can be added as classification criteria to a specific flow.
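A classifier of this kind can be sketched as an ordered list of match criteria that point to service flow identifiers (SFIDs). In the hypothetical Python example below, rules match on the Ethernet user priority and/or the IP DSCP and the first matching rule wins; rule order, SFID values and the fallback flow are assumptions, and the actual IEEE 802.16 classifier encoding is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ClassifierRule:
    sfid: int                            # target service flow identifier
    user_priority: Optional[int] = None  # Ethernet user priority to match, if any
    dscp: Optional[int] = None           # IP DSCP to match, if any

    def matches(self, pcp: Optional[int], dscp: Optional[int]) -> bool:
        # A criterion left as None is not checked (a rule without criteria matches all)
        if self.user_priority is not None and self.user_priority != pcp:
            return False
        if self.dscp is not None and self.dscp != dscp:
            return False
        return True

# Hypothetical rules, evaluated in order (first match wins)
RULES: List[ClassifierRule] = [
    ClassifierRule(sfid=101, dscp=46),          # EF -> real-time service flow
    ClassifierRule(sfid=102, user_priority=5),  # Ethernet priority 5 -> video flow
    ClassifierRule(sfid=103, user_priority=3),
]
DEFAULT_SFID = 199  # best-effort service flow

def classify(pcp: Optional[int] = None, dscp: Optional[int] = None) -> int:
    for rule in RULES:
        if rule.matches(pcp, dscp):
            return rule.sfid
    return DEFAULT_SFID

print(classify(dscp=46), classify(pcp=5, dscp=0), classify(pcp=1))
```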
Wired access technologies – DSL / ATM
“Digital Subscriber Line (DSL)” is a family of wired access technologies, all of which use the traditional two-wire access lines for higher speed packetized data access services. The most prominent family members are “Asymmetric Digital Subscriber Line (ADSL)” [112][113], “ADSL2” [114][115], “Very-high-bitrate DSL (VDSL)” [116] and “VDSL2” [117].
The digital subscriber line standards do not address traffic separation and prioritization, but provide bit pipes for higher layer protocols. DSL traditionally uses Asynchronous Transfer Mode (ATM) as framing and networking technology for packetized data transport services. The only natively available prioritization is found within the “High-Level Data Link Control (HDLC)” encapsulated transport of overhead messages.
In order to understand traffic separation in ATM-based DSL access, Fig. 64 depicts the structure of the fixed-length ATM packets, called “cells”. For cell prioritization, only a single header bit is available, the cell loss priority (CLP) bit, which, if set, marks the cell as lower priority, i.e. to be discarded first under congestion. It is a kind of punishment marking for cells and cannot be regarded as traffic class marking as seen with MPLS TC, IP DSCP or Ethernet priority. However, ATM cells carry local scope addresses (VPI/VCI), which are swapped at the forwarding switches and need to be set up during a connection setup phase. That is, the chain of VPI/VCI forwarding tables set up in this way again creates a tunnelling scheme for the ATM cell payload by means of the resulting virtual channels. Several virtual channels can be established across a DSL access line and might be used for traffic class separation. Especially in Europe, it is common for DSL providers to configure e.g. 4 or 6 ATM VCs on DSL lines for QoS purposes.
Fig. 64
ATM cell structure
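The five-byte ATM header at the user-network interface (UNI) carries GFC (4 bits), VPI (8 bits), VCI (16 bits), payload type (3 bits), the CLP bit and a header error control (HEC) byte. The Python sketch below decodes these fields and maps a (VPI, VCI) pair to a traffic class in the way a DSL line with several class-specific VCs might be configured; the VC-to-class table is a hypothetical example and the HEC is not verified.

```python
from typing import NamedTuple

class AtmHeader(NamedTuple):
    gfc: int
    vpi: int
    vci: int
    pti: int
    clp: int  # 1 = cell may be discarded first under congestion
    hec: int

def parse_uni_header(header: bytes) -> AtmHeader:
    """Decode a 5-byte UNI cell header (HEC is returned but not checked)."""
    if len(header) != 5:
        raise ValueError("ATM cell header is 5 bytes")
    word = int.from_bytes(header[:4], "big")
    return AtmHeader(
        gfc=word >> 28,
        vpi=(word >> 20) & 0xFF,
        vci=(word >> 4) & 0xFFFF,
        pti=(word >> 1) & 0x7,
        clp=word & 0x1,
        hec=header[4],
    )

# Hypothetical per-class VC configuration on one DSL line
VC_CLASS = {(8, 35): "best-effort", (8, 36): "voice", (8, 37): "video", (8, 38): "business"}

def cell_class(header: bytes) -> str:
    """The traffic class follows from the virtual channel, not from the CLP bit."""
    h = parse_uni_header(header)
    return VC_CLASS.get((h.vpi, h.vci), "unknown")

# Example header: VPI 8, VCI 36, PTI 0, CLP 1, HEC left at 0
word = (0 << 28) | (8 << 20) | (36 << 4) | (0 << 1) | 1
cell_header = word.to_bytes(4, "big") + b"\x00"
print(parse_uni_header(cell_header), cell_class(cell_header))
```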
Traditionally, ATM transport covered not only the actual access line but also the aggregation network at the provider’s edge. This has changed towards an Ethernet-based aggregation network as specified in TR-101 [68]. Furthermore, TR-101 also includes interface options where the direct use of Ethernet instead of ATM on the DSL line is recommended. Together with this Ethernet transition, Ethernet priority support for QoS has explicitly been standardized. It is included as a “MAY” option.
Given the above configurations and trends, four to eight separated traffic classes can be expected within DSL access networks.
Wired access technologies – GPON
Due to the continuing demand for higher access rates on the one hand and the high transmission capacity of fibre-based technologies on the other hand, network operators push fibre as close to the customer as possible. This includes aggregation points in street cabinets or even fibre into customers’ houses. The predominantly used technology for this purpose today is “Gigabit-capable passive optical networks (GPON)” [111]. Within GPON, four service-bearing transmission containers are provided, which represent four differentiated classes of service. GPON specifically offers an Ethernet data service (see Fig. 65), which includes TR-101 [68] functionality. Furthermore, GPON is often combined with VDSL2, which again can lead to Ethernet transport with optional user priority encoding. Due to the four underlying bearers, this Ethernet transport can additionally be split into four parallel Ethernet transport channels.
Four to eight separated traffic classes can therefore be expected within GPON-based access networks.
Fig. 65
Functional layering structure for the Ethernet data service [111]
4.5 Summary of expected Class of Service support
Given the above listed variants of QoS support in different networking technologies, the
following table summarizes the expected number of separated traffic classes.
Table 11
Overview of available layer 2 and 3 quality of service classes
¹ ATM has not been described extensively in this thesis due to its declining usage. However, it should be mentioned that ATM had, long before, one of the most detailed and thoroughly researched QoS concepts, with parameter negotiation, detailed measurement, traffic conditioning, admission control and management functions. Many recent developments, especially in the field of MPLS, seem to have adopted and learned from ATM. The QoS categories mentioned relate to the classes of service CBR, VBR, ABR and UBR as defined by the ATM Forum.
5 State of the art AS interconnection
The Internet is a patchwork of interconnected autonomous systems (ASes) that exchange IP traffic. By means of an exterior gateway routing protocol, BGP version 4, each AS announces the IP networks, represented by IP prefixes, that are reachable through that AS. Interconnected autonomous systems establish so-called BGP peering sessions for this exchange of reachability information. The “Network Layer Reachability Information (NLRI)” is advertised in BGP UPDATE messages together with the path attributes of the announced routes. Each AS in turn processes the route advertisements and determines which routes it installs in its own routing table and which routes it relays to other external peers. These policy decisions are based on the interconnection topology of that AS as well as on the operator’s policy rules, which process the BGP path attributes. Fig. 66 depicts the basic options for physical AS interconnection, namely direct point-to-point links and Ethernet-based traffic hubs, the Internet Exchange Points (IXPs).
Fig. 66
AS interconnection options
Whether an AS establishes a direct interconnection or an IXP interconnection depends mainly on the link cost as well as on the business gain from interconnecting at central exchange points. The interconnection link technology is increasingly based on Ethernet. Ethernet is mandatory for IXP access because of the IXPs’ Ethernet switch based platforms, but it has also become popular for direct lines. An AS is usually solely responsible for the link towards its IXP switch port or towards the interconnected AS. In the latter case, either the two parties share the cost or the smaller one bears it in order to get connected to the bigger one. This can be either a fixed line cost or part of the traffic cost in non-zero settlements.
The link can be owned or rented. The link speed is normally in the range of 100 Mbps up to 10 Gbps (sometimes bundles of n x 10 Gbps), and links are therefore mostly fibre-based. Dark fibres operated with the AS’s own transmission equipment are more common in the direct interconnection case; otherwise, the vast majority uses rented virtual channels.
If the interconnected parties are not co-located in the same building, where direct Ethernet
interconnect is possible, Ethernet over carrier (wavelength, MPLS, SDH) seems to be the
most often used linking technology.
Whether or not a point-to-point interconnection is chosen depends on several factors.
Driving forces are geographic location, customer base and customer traffic destinations as
well as zero or non-zero settlement based interconnection.
Advantages of point-to-point interconnections (private interconnection):
- One or only a few peering ASes are geographically close and terminate the majority
of traffic loads of the own customer base. So it is worthwhile to rent or even build
the direct link,
- Link speed requirements to only a few peering ASes exceed the normal IXP interconnection speed,
- Private mutual agreements (e.g. QoS requirements, MTU sizes and other technical details) should not be made public or cannot be provided by public platforms,
- High security and confidentiality requirements call for a private interconnection,
- Point-to-point interconnections are set up to important partner ASes for backup purposes and
- Accounting for paid traffic exchange is easy with the link interface counters.
Advantages of IXP based interconnections (public interconnection):
- High link (rental) costs make it prohibitive to set up many single-link interconnections to other ASes for large scale connectivity if a single link to a major exchange point does the job (e.g. a single line from Africa to a large European exchange point),
- The customers’ traffic demand is distributed and most of the high load terminating ASes are present at the chosen IXP,
- The reception of several paths towards the same network increases routing robustness, if one interconnected AS fails to reach that destination,
- Interconnection at IXPs is normally a zero settlement,
- Direct interconnection across a switch towards many other networks reduces latency, which would otherwise build up with up and down transitions through higher
tier ASes,
- Accounting based on source MAC address filtering is possible and
- Route servers at IXPs reduce the number of BGP sessions to a single one.
The “Route Server (RS)” argument needs a brief explanation. Since IXPs allow several hundred ASes to interconnect at one place, the BGP session load on a single AS border router at the IXP is enormous. However, it can be cut down to just one BGP session if the IXP offers a central route server for its customers. Such a route server acts similarly to a route reflector, but peers with the member ASes via eBGP, so that each peering AS applies its usual inbound and outbound filter policies. Route announcements are therefore sent to and received from the central route server only. The actual traffic exchange, however, takes place directly between the interconnected ASes.
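The session savings can be quantified with simple arithmetic. If n ASes at an IXP peer bilaterally with every other member, each border router maintains n - 1 eBGP sessions and the IXP as a whole carries n(n - 1)/2 sessions; with route servers, each member needs only one session per route server. The Python sketch below computes both figures for a hypothetical IXP size.

```python
from typing import Tuple

def bilateral_sessions(n: int) -> Tuple[int, int]:
    """(sessions per member router, total sessions) for a full bilateral mesh."""
    return n - 1, n * (n - 1) // 2

def route_server_sessions(n: int, route_servers: int = 1) -> Tuple[int, int]:
    """(sessions per member router, total sessions) when all members use route servers."""
    return route_servers, n * route_servers

n = 300  # hypothetical number of IXP members
print("bilateral:   ", bilateral_sessions(n))
print("route server:", route_server_sessions(n))
```

For 300 members this gives 299 sessions per router and 44850 sessions in total for the full mesh, compared with one session per router (300 in total) with a single route server.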
In numbers, many ASes are interconnected via public exchanges, but most of the Internet
traffic load is carried over point-to-point interconnections.
Large companies such as Google and Amazon are present at many Internet exchanges for close proximity to users. Transit providers with global networks are also commonly present at IXPs and use the platform as a customer aggregation point.
Quality of service support is currently provided almost exclusively through private interconnections. Public exchanges, however, are starting to support Ethernet-based traffic separation [127]. As an example, Fig. 67 depicts the internal topology of the German Internet exchange DE-CIX in Frankfurt, which is distributed across four locations in Frankfurt and has been shown to support VLAN user priority marking upon request.
Fig. 67
DE-CIX topology 2009 [61]
The interconnection itself does not reveal the associated payment structure, i.e. the settlement, involved in the interconnection. Payment-based terms are:
• “transit” / “non-zero settlement” for paid traffic exchange (see 5.1) and
• “peering” / “zero settlement” for free of charge traffic exchange (see 5.2).
Both terms, “transit” and “peering”, are, however, also used in a purely technical sense. That is, “transit traffic” / “transit network” may simply describe forwarding across a different network towards the destination, regardless of any payments. The same applies to the word “peering”, which can be used to describe any interconnection between adjacent networks (peers). Moreover, the BGP protocol always refers to peers and peering sessions, no matter what settlement is associated with that interconnection.
The level of interconnection of a single AS determines its rank in a global AS hierarchy, the so-called “tier structure”. Lower tiers pay for their traffic exchange with higher tiers (“they buy transit”), whereas ASes of the same tier might “peer” with each other free of charge for mutual benefit.
In theory, the highest tier group, “tier 1”, applies to ASes, which only interconnect via free
of charge peerings with other tier 1 ASes and thereby gain full connectivity to all globally
available Internet destinations. Due to their fully meshed interconnection, the routing table
of those AS border routers does not contain a default route entry, but holds specific routing
entries to all routable prefixes. Routers with such a full routing table make up the so called
“Default Free Zone (DFZ)”.
At the lower end, “tier 3” ASes exchange all their external traffic through transit interconnections with higher tiers. The vast majority of ASes are “tier 2” ASes, which hold some
free of charge peerings to other tier 2 ASes and buy transit for the remaining global
connectivity. Deutsche Telekom is such a tier 2 example, which runs a rather widely
interconnected network with just one upstream transit (AS1239).
Fig. 68 depicts the described Internet hierarchy.
Fig. 68
Internet hierarchy
5.1 IP transit
The easiest way for an AS to achieve global connectivity for its customer base is to buy transit from the global players, either from a widely interconnected tier 2 AS or from one of the roughly 10 tier 1 core ASes. The selling party is the provider and the buying party the customer. Many factors influence this interconnection decision, cost and reliability being the major drivers. Non-zero settlements are not published, and the two parties normally sign non-disclosure agreements about the interconnection specifics and rates.
Two major settlement philosophies are common: classical IP interconnection with volume-based charges and emerging voice interconnection with time-based charges. The latter is a new development driven by the current transition from time-based circuit-switched voice interconnections towards “Voice over IP (VoIP)” voice interconnections. It is assumed to be common today to have two separate interconnections for Internet and voice services, but the drive towards a purely IP-interconnected service platform might lead to volume-based voice interconnection as well.
The simplest volume-based charging model is the “difference in volume” charge. The incoming traffic volume is subtracted from the outgoing traffic volume at the interface of the upper tier. Since large traffic volumes stream down from the upper tier to the lower one, the difference is positive and the lower tier pays for it. This in turn means that traffic sent from the lower tier towards the upper tier reduces the upper tier’s revenue. A second charging model is a base-and-offset model, which includes a fixed monthly transit fee for a “Committed Information Rate (CIR)” and a volume-based excess charge for unexpected excess traffic.
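Both charging models reduce to simple arithmetic. The Python sketch below computes a monthly bill for the lower-tier customer under each model; prices, volumes and the CIR are hypothetical, and real contracts usually define the measurement more precisely (for example percentile-based rate sampling, which is not modelled here).

```python
def difference_in_volume_charge(upper_to_lower_gb: float, lower_to_upper_gb: float,
                                price_per_gb: float) -> float:
    """Lower tier pays for the positive difference measured at the upper tier's interface."""
    return max(0.0, upper_to_lower_gb - lower_to_upper_gb) * price_per_gb

def cir_charge(total_gb: float, cir_mbps: float, base_fee: float,
               excess_price_per_gb: float, days: int = 30) -> float:
    """Fixed fee covers the committed rate; volume above it is charged extra."""
    committed_gb = cir_mbps * 1e6 * days * 86400 / 8 / 1e9  # Mbit/s sustained -> GB per month
    excess_gb = max(0.0, total_gb - committed_gb)
    return base_fee + excess_gb * excess_price_per_gb

# Hypothetical month: 500 TB downstream, 120 TB upstream
print(difference_in_volume_charge(500_000, 120_000, price_per_gb=0.01))
print(round(cir_charge(total_gb=500_000, cir_mbps=1000, base_fee=10_000,
                       excess_price_per_gb=0.02), 2))
```

In the first model, every gigabyte the lower tier sends upstream directly lowers its bill; in the second, a 1 Gbit/s CIR covers 324000 GB per 30-day month, and only the remaining 176000 GB are charged as excess.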
In densely interconnected regions, such as the US, Europe and Japan, a plethora of interconnection choices exists, which enables transit competition thanks to cheaply available interconnection links. In remote or underdeveloped regions of the world, the trade-off between higher transit cost and the expense of alternative interconnection links makes the Internet more expensive. Operators tend to establish several transit interconnections for backup and strategic reasons. Contracts are constantly renegotiated, which leads to an ever-changing AS interconnection topology. Separate companies have specialized in data mining the AS tree in order to provide consultancy to transit selling parties, monitor interconnection changes for debugging or trend analysis, document the topology for fault tracking and usage statistics, etc. Observed connectivity changes and interconnection path announcements even allow rough estimates of customer churn and undisclosed price movements.
The key to selecting among possibly multiple transit paths, as well as the basis for the analysis work mentioned above, is the policy-based routing protocol BGP. Route attributes are used for mutual or one-sided path selection by filtering, e.g. a cheap but unreliable transit for the normal traffic load and a high-priced transit for some important customers or services. This is often prefix-oriented and used in multi-homing scenarios for transit cost optimization.
As a path vector protocol, BGP records the crossed ASes in the route advertisement process, which provides partial knowledge about the AS graph structure.
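A strongly simplified view of such policy routing: the operator assigns a local preference per transit neighbour, overrides it for selected prefixes, and the BGP decision process prefers the highest local preference before the shortest AS path. The Python sketch below models only these two steps of the real decision process (which continues with further tie-breakers); neighbour names and the policy are hypothetical, and AS numbers and prefixes are taken from the documentation ranges.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Route:
    prefix: str
    neighbour: str
    as_path: List[int]
    local_pref: int = 100  # common default value

# Hypothetical policy: the cheap transit is preferred in general ...
NEIGHBOUR_PREF: Dict[str, int] = {"cheap-transit": 200, "premium-transit": 100}
# ... but traffic towards an important customer's prefix uses the premium transit
PREFIX_OVERRIDE: Dict[Tuple[str, str], int] = {("203.0.113.0/24", "premium-transit"): 300}

def apply_import_policy(route: Route) -> Route:
    pref = PREFIX_OVERRIDE.get((route.prefix, route.neighbour),
                               NEIGHBOUR_PREF.get(route.neighbour, 100))
    return Route(route.prefix, route.neighbour, route.as_path, pref)

def best_path(candidates: List[Route]) -> Route:
    """Highest local preference first, then shortest AS path (simplified)."""
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))

received = [
    Route("198.51.100.0/24", "cheap-transit", [64500, 64510, 64511]),
    Route("198.51.100.0/24", "premium-transit", [64501, 64511]),
    Route("203.0.113.0/24", "cheap-transit", [64500, 64502]),
    Route("203.0.113.0/24", "premium-transit", [64501, 64503, 64502]),
]

by_prefix: Dict[str, List[Route]] = {}
for r in map(apply_import_policy, received):
    by_prefix.setdefault(r.prefix, []).append(r)

for prefix, routes in by_prefix.items():
    print(prefix, "->", best_path(routes).neighbour)
```

The AS path lists in the example also show the path vector property: each route records the sequence of ASes it has crossed.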
The introduction of the currently missing quality of service support on transit interconnection raises several issues, which need to be observed and addressed in future analysis.
Expected issues are:
- transit partner selection based on available QoS support,
- transit partner selection and/or negotiation about QoS class granularity, marking/remarking and traffic handling (shaping, scheduling, dropping etc.),
- possible QoS-related charging models instead of the single class model today and
- extended reliability discussion on whether the QoS support is only interconnection
local or extends across further interconnections.
The thesis’ class of service concept provides a simple marking and rate limitation
mechanism, which addresses some basic QoS improvements in a transitive inter-domain
manner.
5.2 IP peering
The interconnection of ASes for mutual benefit at no cost (zero settlement) is called “peering” and happens largely at Internet exchange points.
Partnering ASes agree to set up interconnection links with BGP peering sessions to exchange reachability information. If the customer bases of the two parties frequently exchange traffic with each other, this short-cut interconnection saves both sides transit costs via higher tiers as well as transfer delay due to the shortened forwarding path.
Peering requests can be raised informally and the resulting agreements can be rather loose. That is, no complex service level agreement is mandatory, and as long as both partners are content, even a handshake can be sufficient. The partners exchange e-mails with contact details, and the IP addresses and AS numbers of the peering equipment are all the knowledge required. Such a simple setup leaves questions about interconnection quality, fault handling, service reliability etc. open. If either party becomes unhappy with the way the interconnection is operated, “depeering” can occur.
Generally, peering requests will be refused in the following cases:
• requests from own customers,
• requests from potential customers,
• requests from other peers’ customers,
• requests from providers with bad track records,
• requests from providers with low infrastructure investment policy and
• requests from providers, where the mutual benefit is questioned.
The peering / depeering business is a playground for multi-dimensional optimization strategies and the resulting business models. Social networking and market influence through route manipulation have a strong impact on the request and grant procedure.
In the transit case, it is common practice to attract customers of competing transit providers onto one’s own transit path with financial or technical incentives. Attracting customers of competing transit providers to use one’s own peering agreement, however, is considered bad practice.
The peering policies of network providers are often very short and publicly available. They
are mainly published to attract potential partners. Existing providers and new players use
the information to optimize their peering relationships and to find the right geographical
and technical base for the interconnection.
Examples of publicly available peering policies are [162] and [174].
The most comprehensive resource for publicly available provider information is the so-called “PeeringDB” [151]. It covers geographical presence at exchange points, public policies, interconnection types and speeds etc.
An interesting case study is the “aggressive, user-driven rollout” peering strategy of Google [58]. To cut down delay times and thus provide a good user experience for its services, the company welcomes direct peering interconnections with customer networks at many places around the world. Furthermore, the company is a leading force in the ongoing IPv6 transition. It compensates for the currently missing global IPv6 transit availability by means of direct IPv6 peering.
Google has also proposed a scalable “sign-up” procedure, which is of particular interest for the class of service signalling approach of this thesis. Google is running a large IPv6 connectivity test with so-called “trusted tester” peering partners. However, the company cannot possibly conclude testing agreements with each interested peering AS, considering the roughly 55000 assigned and about 30000 actively participating ASes in the Internet [93]. Therefore, Google proposed that interested networks send the BGP community value “15169:6666” in the advertisement of a specific IPv4 prefix to “sign up” for the trusted tester program [58].
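The following sketch illustrates the community-based sign-up. Standard BGP communities (RFC 1997) are 32-bit values, conventionally written as “ASN:value” with the AS number in the upper 16 bits. “15169:6666” is therefore carried as a single integer. The source does not describe how Google evaluates the community internally. The receiving-side check `wants_trusted_tester` below is a hypothetical illustration.

```python
def encode_community(text):
    """Encode an 'ASN:value' standard community (RFC 1997) as a 32-bit integer."""
    asn_text, value_text = text.split(":")
    asn, value = int(asn_text), int(value_text)
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves of a standard community must fit into 16 bits")
    return (asn << 16) | value


def decode_community(number):
    """Render a 32-bit community value in the usual 'ASN:value' notation."""
    return f"{number >> 16}:{number & 0xFFFF}"


# Sign-up community from the proposal described above
SIGNUP_COMMUNITY = encode_community("15169:6666")


def wants_trusted_tester(route_communities):
    """Hypothetical receiver-side check: does the route carry the sign-up community?"""
    return SIGNUP_COMMUNITY in route_communities


if __name__ == "__main__":
    route = [encode_community("64500:100"), SIGNUP_COMMUNITY]
    print(decode_community(SIGNUP_COMMUNITY), wants_trusted_tester(route))
```

Because the signal rides on an ordinary route attribute, no per-partner contract or additional protocol is needed. This is what makes the procedure scale.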
5.3 Internet Routing Registry - IRR
The “Internet Routing Registry (IRR)” is a database system for storing and globally accessing routing policy information in a structured, human-readable and machine-processable way. Currently, 32 IRRs are listed [137]. They differ in administration and database quality (performance, consistency and availability of database tools). In theory, the distributed registry information of the different world regions would form one consistent information base about all AS routing policies. In practice, operators mainly rely on the information stored in the RIPE database, which is known to be the most advanced and most consistently administered one. The representation of stored IP routing policies has been standardized in RIPE document “ripe-81” [159], which has been republished as RFC 1786 [21]. Many registries make use of the “Routing Policy Specification Language (RPSL)” [6] or the “Routing Policy Specification Language next generation (RPSLng)” [35].
The biggest advantage of this specification language is its use for automated router configuration generation. Network operators register their AS number and assigned prefixes as well as their import and export policies. Direct interconnection partners can generate their filter rules from this structured information. ASes farther away from the originating AS can check incoming route information for plausibility and sanity.
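The following Python sketch illustrates automated filter generation from registry data. It is a simplified illustration, not the behaviour of any particular IRR toolset. Route objects are reduced to (prefix, origin AS) pairs, and documentation prefixes and AS numbers are used throughout. The `max_extra_bits` option, which accepts slightly more-specific announcements, is a hypothetical policy knob.

```python
import ipaddress

# Hypothetical registry extract: (prefix, origin AS) pairs, as they might be
# parsed from the "route:" and "origin:" attributes of RPSL route objects.
REGISTRY = [
    ("192.0.2.0/24", 64500),
    ("198.51.100.0/22", 64501),
]


def build_filter(registry, max_extra_bits=0):
    """Turn route objects into a per-origin prefix filter.

    max_extra_bits (hypothetical policy knob) also accepts more-specific
    announcements up to that many bits longer than the registered prefix.
    """
    table = {}
    for prefix, origin in registry:
        net = ipaddress.ip_network(prefix)
        table.setdefault(origin, []).append((net, net.prefixlen + max_extra_bits))
    return table


def is_plausible(announced, origin, table):
    """Check an incoming announcement against the registered route objects."""
    net = ipaddress.ip_network(announced)
    for registered, max_len in table.get(origin, []):
        if (net.version == registered.version
                and net.subnet_of(registered)
                and net.prefixlen <= max_len):
            return True
    return False
```

With exact matching, two patterns fail the plausibility check: a more-specific prefix, and a registered prefix announced by a different origin AS. Both are typical of route hijacks.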
If all operators relied on this registry mechanism, malicious route advertisements and the spreading of such fake information could largely be filtered out. As the “YouTube outage” [41] in February 2008 showed, however, many operators still do not use this valuable Internet routing registry in their update sanity checks.
6 Related work
A number of QoS improvement approaches have been proposed before, but none has
been standardized and actually used for QoS support in the public inter-domain case.
Private inter-domain QoS setups do exist, but are not made public. In such cases, the QoS
configurations and parameter settings are agreed on offline and documented in service
level agreements.
Three major characteristics have been identified in past QoS improvement approaches:
1. Quality of Service is targeted end-to-end and includes the inter-domain interconnections for the case of sending and receiving parties being in separate ASes,
2. Quality of Service is targeted in a guaranteed quality fashion, which requires detailed parameter signalling, QoS enforcement functions, QoS parameter measurements, violation detection and penalties, and
3. Quality of Service is targeted in a homogeneous fashion, that is, all participating
ASes need to support the same QoS setup. This includes the common signalling
protocol, common setup of class sets with the respective classification, scheduling
and dropping functionality.
Some important examples of those past QoS improvement approaches are addressed
below.
France Telecom and Alcatel submitted an Internet draft, draft-jacquenet-bgp-qos-00 [59], in 2004, which introduced the so-called “QOS_NLRI” attribute in BGP. It is used for propagating QoS-related information associated with the NLRI (Network Layer Reachability Information) conveyed in a BGP UPDATE message. Individual so-called "QoS routes", which fulfil certain QoS requirements, are signalled. Several information types are defined for the attribute, concentrating on rate and delay type parameters. This approach therefore addresses QoS guarantees for selected end-to-end routes. QoS parameters such as packet rates, loss rates, one-way delays and inter-packet delay variation are signalled as absolute numbers. They might need to be re-signalled if end-to-end requirements or network load conditions require adaptation.
Parameter signalling, however, introduces two major drawbacks in global scale operation. The first is resource accounting, which has to register used and available capacity shares and involves triggered signalling and admission control. This effort is justifiable for just a few QoS routes, but unmanageable if most routes have associated parameter sets. The second drawback is protocol stability. BGP has been designed to trade routing dynamics for routing stability. Frequent parameter signalling is therefore counterproductive.
In support of this argument, the BGP route flap dampening behaviour (RFC 2439 [175]) is briefly described here. If enabled, BGP suppresses frequent route advertisements based on a penalty scheme with hysteresis thresholds. In this dampening concept, each route flap (withdraw/announcement pair) accounts for a penalty of 1000 and each attribute change for a penalty of 500. Penalties accumulate with each advertisement event and decay by 50% within a configurable time period (the half-life). A penalty above the suppress limit prevents the given prefix and its attributes from being relayed. Normal route advertisement resumes only once the penalty has decayed below the reuse limit. Fig. 69 depicts the dampening characteristic.
Fig. 69
Route Flap Dampening [167]
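The penalty bookkeeping described above can be sketched as a small Python class. The penalties of 1000 per flap and 500 per attribute change are taken from the text. The other parameters are assumed example values, namely commonly quoted vendor defaults: a half-life of 15 minutes, a suppress limit of 2000 and a reuse limit of 750. The decay is modelled as exponential, halving once per half-life.

```python
FLAP_PENALTY = 1000         # per withdraw/announcement pair (value from the text)
ATTR_CHANGE_PENALTY = 500   # per attribute change (value from the text)


class DampenedRoute:
    """Penalty bookkeeping for one prefix in the spirit of RFC 2439.

    half_life (seconds), suppress_limit and reuse_limit are assumed example
    values; the maximum suppress time is not modelled.
    """

    def __init__(self, half_life=15 * 60, suppress_limit=2000, reuse_limit=750):
        self.half_life = half_life
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay: the penalty halves once per half-life
        self.penalty *= 0.5 ** ((now - self.last_update) / self.half_life)
        self.last_update = now

    def _update_state(self):
        # Hysteresis: suppress above suppress_limit, release only below reuse_limit
        if self.penalty > self.suppress_limit:
            self.suppressed = True
        elif self.penalty < self.reuse_limit:
            self.suppressed = False

    def event(self, now, kind):
        """Register a 'flap' or an attribute change; return the suppression state."""
        self._decay(now)
        self.penalty += FLAP_PENALTY if kind == "flap" else ATTR_CHANGE_PENALTY
        self._update_state()
        return self.suppressed

    def is_suppressed(self, now):
        """Apply the decay up to 'now' and report whether the prefix is suppressed."""
        self._decay(now)
        self._update_state()
        return self.suppressed
```

In this sketch, two flaps and one attribute change within two minutes already push the prefix above the suppress limit. The prefix is released only after the penalty has decayed below the reuse limit, roughly 25 minutes later. Frequent QoS parameter re-signalling through attribute changes would therefore quickly lead to suppressed prefixes.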
The work on this draft is embedded within a European Union-funded project called “Management of End-to-end Quality of Service Across the Internet at Large (MESCAL)” [139]. This extensive project work on inter-domain QoS support goes far beyond the limited class of service approach of this thesis. It is based on a so-called “cascaded signalling approach”. This approach assumes AS-internal DiffServ QoS class support, referred to as “local-QoS-class (l-QC)”, and inter-AS QoS class support with resulting “extended-QoS-classes (e-QC)”. Fig. 70 depicts the cascaded approach with the l-QC and e-QC after interconnection. The MESCAL project targets end-to-end QoS guarantees and therefore signals parameters between ASes. These parameters are part of the l-QC QoS specification. The e-QCs are constructed recursively from them, following parameter-specific combination rules.
Fig. 70
MESCAL - Cascaded Approach [139]
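The recursive construction of e-QCs can be illustrated with a Python sketch. The combination rules used here are assumptions, chosen as typical parameter-specific rules, and not MESCAL's specification:

- delays add up,
- loss ratios combine as independent losses,
- the bottleneck determines the bandwidth.

The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoSClass:
    """Performance figures of one QoS class (illustrative field set)."""
    one_way_delay_ms: float
    loss_ratio: float
    bandwidth_mbps: float


def extend(downstream_eqc: QoSClass, local_lqc: QoSClass) -> QoSClass:
    """Combine the e-QC offered by a downstream AS with the local l-QC.

    Assumed rules: additive delay, independent losses, bottleneck bandwidth.
    """
    return QoSClass(
        one_way_delay_ms=downstream_eqc.one_way_delay_ms + local_lqc.one_way_delay_ms,
        loss_ratio=1 - (1 - downstream_eqc.loss_ratio) * (1 - local_lqc.loss_ratio),
        bandwidth_mbps=min(downstream_eqc.bandwidth_mbps, local_lqc.bandwidth_mbps),
    )


def cascade(lqcs_from_destination):
    """Build the e-QC seen by the first AS, starting at the destination AS."""
    eqc = lqcs_from_destination[0]
    for lqc in lqcs_from_destination[1:]:
        eqc = extend(eqc, lqc)
    return eqc
```

Each AS only needs its own l-QC and the e-QC of its downstream neighbour. The cascade therefore requires parameter signalling between every pair of adjacent ASes along the path.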
As mentioned before, such QoS guarantees with parameter signalling are outside the scope of this thesis’ class of service concept. However, the MESCAL project also includes an option, called “loose guarantees solution”, that renounces end-to-end QoS guarantees.
Even this option, however, still performs mutual negotiations on performance parameters and bandwidth requirements. It also requires either globally understood class indicators (e.g. DSCPs) or SLA-based local indication agreements. Dynamic signalling of (re-)marking information and marking preservation is not provided.
France Telecom started a second Internet draft, draft-boucadair-qos-bgp-spec-01 [37], in 2005. It builds on the QOS_NLRI attribute specified in [59] and introduces some modifications to it. It uses the notion of AS-local and extended QoS classes, which describe the local set of QoS performance parameters and their combined cross-domain result, respectively. Two groups of QoS delivery services are distinguished. The second group concentrates on the propagation of QoS parameters associated with an identifier between adjacent peers. The first group is of more interest for this thesis’ work, since it concentrates on “identifier propagation”, e.g. of the DSCP value. However, this signalling is specified for the information exchange between adjacent peers only. It also assumes the existence of extended QoS classes and offline traffic engineering functions. The limitations of the inherited QOS_NLRI attribute remain.
Christian Jacquenet and Mohamed Boucadair, co-workers at France Telecom, contributed substantially to the mentioned work. In [120], they explain provisioning techniques for the currently topical IP/MPLS networking case. However, the exchange of class identification (marking) is not addressed there either.
Another approach has been proposed by a group of researchers at Johns Hopkins University and is described in [24]. The Internet draft associates a list of QoS metrics with each prefix by extending the existing AS_PATH attribute format. Hop-by-hop metric accumulation is performed as the AS_PATH gets extended in relaying ASes. Metrics are generically specified as a list of TLV-style attribute elements. The draft mentions bandwidth and delay as example metrics.
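The following Python sketch shows one possible TLV layout for such hop-by-hop accumulated metrics. The draft [24] is only described here in general terms, so everything in the sketch is an assumption made for illustration: the type codes, the 1-octet type and length fields, the 32-bit values and the accumulation rules (minimum bandwidth, additive delay).

```python
import struct

# Hypothetical type codes; the draft only names bandwidth and delay as examples.
TLV_BANDWIDTH_KBPS = 1
TLV_DELAY_US = 2


def encode_tlvs(metrics):
    """Encode {type: uint32 value} as type(1) | length(1) | value(4) elements."""
    out = bytearray()
    for tlv_type, value in sorted(metrics.items()):
        out += struct.pack("!BBI", tlv_type, 4, value)
    return bytes(out)


def decode_tlvs(data):
    """Parse a sequence of TLV elements back into a {type: value} dict."""
    metrics, offset = {}, 0
    while offset < len(data):
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        metrics[tlv_type] = int.from_bytes(data[offset:offset + length], "big")
        offset += length
    return metrics


def accumulate(received, local_bw_kbps, local_delay_us):
    """What a relaying AS might do when it prepends itself to the AS_PATH."""
    metrics = decode_tlvs(received)
    metrics[TLV_BANDWIDTH_KBPS] = min(metrics.get(TLV_BANDWIDTH_KBPS, local_bw_kbps),
                                      local_bw_kbps)
    metrics[TLV_DELAY_US] = metrics.get(TLV_DELAY_US, 0) + local_delay_us
    return encode_tlvs(metrics)
```

The generic TLV form lets new metric types be added without changing the parser. The price is that every relaying AS must rewrite the attribute, which triggers a new UPDATE whenever a local value changes.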
One contribution [185] specializes in the signalling of Type of Service (TOS) values, which section 3.2 of that draft in turn maps directly to DSCP values. The TOS value is signalled within an Extended Community Attribute and, if understood correctly, is applied to a certain route. An additional value field identifies which routes belong to which signalled TOS community. Who advertises such attributes, and whether they are of transitive or non-transitive type, remains unspecified. Advertising multiple paths (and associated metrics) for one prefix is addressed, and a new path selection algorithm has been proposed. The concept would therefore support packet classification and class-based route selection. The draft expired in December 2006.
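The TOS-based extended community can be illustrated as an 8-octet value, the size of a BGP extended community (RFC 4360). The mapping to DSCP follows from the DS field definition, which reuses the upper six bits of the former TOS octet. The type, sub-type and value layout are hypothetical: a TOS octet, a reserved octet and a 32-bit route group identifier standing in for the additional value field. Only the transitivity flag follows RFC 4360: bit 0x40 of the type octet is set for non-transitive communities.

```python
import struct

NON_TRANSITIVE_BIT = 0x40   # RFC 4360: set in the type octet = non-transitive
BASE_TYPE = 0x03            # hypothetical type for this sketch
TOS_SUBTYPE = 0x42          # hypothetical sub-type for a TOS community


def tos_to_dscp(tos_octet):
    """The DS field reuses the former TOS octet: DSCP = upper six bits."""
    return (tos_octet >> 2) & 0x3F


def encode_tos_community(tos_octet, route_group_id, transitive=True):
    """Pack type, sub-type, TOS octet, a reserved octet and a 32-bit group ID."""
    type_high = BASE_TYPE if transitive else BASE_TYPE | NON_TRANSITIVE_BIT
    return struct.pack("!BBBBI", type_high, TOS_SUBTYPE, tos_octet, 0, route_group_id)


def decode_tos_community(blob):
    """Return (TOS octet, DSCP, route group ID, transitive flag)."""
    type_high, subtype, tos, _reserved, group = struct.unpack("!BBBBI", blob)
    if (type_high & ~NON_TRANSITIVE_BIT) != BASE_TYPE or subtype != TOS_SUBTYPE:
        raise ValueError("not a TOS extended community")
    transitive = not (type_high & NON_TRANSITIVE_BIT)
    return tos, tos_to_dscp(tos), group, transitive
```

Setting `transitive=False` would confine the community to the neighbouring AS. This is exactly the design choice the draft leaves open.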
The most comprehensive analysis (although not an IETF draft) is given in [7]. This "Inter-provider Quality of Service" white paper examines the inter-domain QoS requirements. It derives a comprehensive approach for the introduction of at least one QoS class with guaranteed delay parameters. The white paper considers all implementation aspects: metering, monitoring, parameter feedback and impairment allocation. However, QoS guarantees and frequent parameter signalling have been identified above as critical for the global inter-domain routing system and for BGP protocol stability. The white paper is valuable work for a fine-grained QoS setup on a possibly large, but still selected, set of end-to-end routes. Applying the concept generally to nearly all Internet routes, however, is not feasible.
A more economically inclined approach was presented at the IEEE ICC 2002 conference under the title “Enabling dynamic market-managed QoS interconnection in the next generation internet by a modified BGP mechanism” [94]. It relates to this thesis in two respects: firstly, it aims at QoS support at inter-domain interconnections, and secondly, it uses BGP for signalling. It is again based on the QOS_NLRI described above, but includes price information as well. Economic considerations might become the clinching argument for or against any QoS-based interconnection. Nevertheless, providers are not expected to disclose such information in publicly visible routing protocol messages. Furthermore, the limitations of QoS guarantees and the associated parameter signalling have already been described above.
A further concept proposes BGP-based QoS service capability signalling for groups of NLRI. This Internet draft was launched in October 2006 under the name “draft-djernaes-simple-context-update-00” [68]. The draft does not specify the precise signalling encoding of QoS class markings and parameters. Instead, it falls back on a more general QoS Service signalling, which might optionally involve the signalling of interconnection-local markings. The fundamental idea of this signalling concept is the grouping of reachability information (prefixes) in QoS Service address families. In its UPDATE messages, BGP always signals which address family the contained network layer reachability information belongs to. The underlying concept of the “address family identifier (AFI)” and “subsequent address family identifier (SAFI)” has been defined in RFC 4760 [19]. The draft proposes to define a new AFI/SAFI for QoS Service signalling, so that all NLRI contained in such an UPDATE message belong to that QoS Service context. This approach is attractive for two reasons. Firstly, the signalling overhead scales well, since possibly numerous prefixes are grouped under a common AFI/SAFI. Secondly, QoS-related UPDATE information can be signalled selectively for the separate AFI/SAFI. The same is true for selective route refreshes and soft-reconfigurations.
However, this thesis aims for global scale class of service interconnection support for possibly all Internet routes. With the capability signalling concept, this would result in double signalling of all prefixes: once for traditional reachability and a second time for the QoS service context association.
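The grouping effect can be made concrete by encoding the value part of an MP_REACH_NLRI attribute (RFC 4760). It consists of the following fields, in order:

- AFI (2 octets),
- SAFI (1 octet),
- next hop length and next hop address,
- one reserved octet,
- any number of NLRI prefixes, each encoded as a length in bits plus the minimal number of prefix octets.

The SAFI value 241 is merely a hypothetical stand-in for a QoS Service context; no real assignment is implied. The attribute header (flags, type code, length) is omitted.

```python
import ipaddress
import struct

AFI_IPV4 = 1              # IANA address family number for IPv4
SAFI_QOS_SERVICE = 241    # hypothetical SAFI standing in for a QoS Service context


def encode_prefix(prefix):
    """NLRI element: prefix length in bits, then only the significant prefix octets."""
    net = ipaddress.ip_network(prefix)
    octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:octets]


def mp_reach_value(next_hop, prefixes, afi=AFI_IPV4, safi=SAFI_QOS_SERVICE):
    """Value part of an MP_REACH_NLRI attribute grouping many prefixes."""
    nh = ipaddress.ip_address(next_hop).packed
    header = struct.pack("!HBB", afi, safi, len(nh)) + nh + b"\x00"  # reserved octet
    return header + b"".join(encode_prefix(p) for p in prefixes)
```

All prefixes share one header. Each prefix, however, still has to be listed again in this second address family, which is the double signalling cost described in the paragraph above.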
Further observations on existing QoS signalling approaches are summarized in RFC 4094 [131], a review produced by the “Next Steps in Signaling (nsis)” working group. Half of the document is dedicated to the analysis of RSVP (see 4.1.2), the most important QoS reservation protocol in today’s networks. RSVP is an end-to-end QoS signalling protocol that has also been extended to establish MPLS traffic engineering tunnels. It therefore has high potential to set up (possibly tunnelled) inter-domain QoS paths. RFC 2814 [184] and RFC 2815 [163] also address the issue of mapping Integrated Services QoS onto Ethernet User Priorities. However, no fixed mapping is defined, nor can one be. Instead, they suggest a request and response negotiation between neighbouring nodes about locally available Ethernet resources. The dissemination of DSCP values has also been standardized for RSVP in RFC 2996 [27]. The use of RSVP, however, raises concerns about scalability, due to its flow-based end-to-end nature and soft-state signalling behaviour. More recently, it has also raised concerns about direct interaction between user and provider equipment (see Fig. 73).
RFC 4094 analyses several intra-domain signalling protocols, which, similarly to RSVP, allow resource reservations for traffic flows. The protocols are Tenet [18], ST-II [65],
YESSIR (YEt another Sender Session Internet Reservations) [148], Boomerang [76] and
INSIGNIA [129]. They differ in signalling complexity, in sender- or receiver-initiated reservations and in multicast reservation support.
RFC 4094 also analyses three inter-domain reservation protocols, which are closely related to this thesis. The first is the “Border Gateway Reservation Protocol (BGRP)” [147]. BGRP
creates a sink-tree reservation structure limiting the reservation states in border nodes.
DiffServ forwarding is expected and sender-initiated PROBE/GRAFT reservation messages aggregate resource requests along the way by reusing and re-allocating existing
reservations.
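The state reduction of a sink-tree structure can be sketched by counting reservation states at one border router. This simplified Python illustration makes two assumptions. All requests traverse the router in question, and one aggregated state per destination (sink) AS suffices. The PROBE/GRAFT message exchange of BGRP is not modelled, and the AS numbers are documentation values.

```python
from collections import defaultdict


def per_flow_states(requests):
    """Without aggregation, every request needs its own reservation state."""
    return len(requests)


def sink_tree_states(requests):
    """Sink-tree aggregation: one state per destination AS, bandwidths summed.

    requests: iterable of (source AS, sink AS, bandwidth) tuples.
    """
    trees = defaultdict(float)
    for _source_as, sink_as, bandwidth in requests:
        trees[sink_as] += bandwidth
    return dict(trees)


if __name__ == "__main__":
    requests = [(64496, 64510, 1.0), (64497, 64510, 2.0), (64498, 64511, 0.5)]
    print(per_flow_states(requests), len(sink_tree_states(requests)))
```

Requests towards different sink ASes still need separate states. This is the residual aggregation limit that SICAP addresses.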
The reservation tree structure cannot fully aggregate reservations, due to the possibly differing roots of multiple trees. Therefore, a second inter-domain protocol, called “Shared-segment Inter-domain Control Aggregation protocol (SICAP)” [168], has been defined,
which optimizes the aggregation using shared-segment aggregations instead of a tree
structure. Due to this change in reservation structure, the state information in border
routers can be significantly reduced with SICAP.
“Dynamic Aggregation of Reservations for Internet Services (DARIS)” [33], the last
analyzed protocol in RFC 4094, provides a threshold-based dynamic inter-domain aggregation scheme. Individual reservations are monitored. Once they cross a configured threshold, the setup of an aggregate reservation is triggered. This approach also establishes shared-segment reservations along AS path routes. Intermediate ASes can in turn remove individual reservation states and rely on the aggregate instead.
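The threshold-triggered aggregation can be sketched as follows. In this simplified, hypothetical model, the threshold counts individual reservations on one AS-path segment; a bandwidth threshold would work analogously. Once the threshold is crossed, intermediate ASes keep a single aggregate state instead of the individual ones. Reservation release and headroom handling are omitted.

```python
class SegmentAggregator:
    """Threshold-triggered aggregation on one AS-path segment (DARIS-like sketch)."""

    def __init__(self, segment, threshold):
        self.segment = tuple(segment)   # e.g. (ingress AS, transit AS, egress AS)
        self.threshold = threshold      # hypothetical: number of reservations
        self.individual = {}            # reservation id -> bandwidth
        self.aggregate_bw = None        # None until the aggregate is set up

    def add(self, res_id, bandwidth):
        """Register an individual reservation and set up the aggregate if due."""
        self.individual[res_id] = bandwidth
        if self.aggregate_bw is None and len(self.individual) >= self.threshold:
            # Threshold crossed: one aggregate covers all current reservations
            self.aggregate_bw = sum(self.individual.values())
        elif self.aggregate_bw is not None:
            self.aggregate_bw += bandwidth

    def states_in_intermediate_as(self):
        """Transit ASes hold either all individual states or a single aggregate."""
        return 1 if self.aggregate_bw is not None else len(self.individual)
```

Only the segment end points still track the individual reservations. The state reduction therefore benefits the transit ASes in the middle of the segment.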
All of the analysed protocols have in common that they set up flow reservations with fixed parameters. This is far too complex for an approach that targets general traffic separation for potentially all flows in the Internet without explicit resource reservations and QoS guarantees.
The same argument holds true for the Next Steps in Signalling (nsis) concept. It is being standardized by an official IETF working group that has already produced five RFCs. These focus on the signalling framework, protocol design and signalling security.
Fig. 71 depicts the layered structure of interconnected NSIS components in a node, which shows the general structure of the concept. The “NSIS Signaling Layer protocols (NSLP)” and the “NSIS Transport Layer protocols (NTLP)” form the two-layer framework structure. In the upper half, the NSLP for QoS signalling [132] is of most interest here. It is still a draft intended for proposed standard status; the last draft version (-16) expired in August last year. The lower half is dominated by the universal transport layer protocol “General Internet Signaling Transport (GIST)” [164]. Its intended status has recently been changed to “experimental”. Together, the two operate similarly to RSVP and achieve similar functionality. However, the QoS NSLP does not depend on a specific underlying QoS model and supports different reservation types (such as edge-to-edge, access-to-edge, edge-to-end).
NSIS is a universal signalling concept that appears to be well suited to a wide range of resource reservations for flows of different granularities. Three independent implementations exist, all based on Linux platforms. None of the commercial router vendors has NSIS implementations in their products.
NSIS is expected to be well suited to achieving traffic separation similar to that targeted in this thesis, at least within the network layer. However, for several reasons it has not been chosen as the basis for the new cross-domain and cross-layer coarse grained Quality of Service support concept:
• flow-based reservation signalling is considered counterproductive, as mentioned before,
• the concept is too complex for the aspired simple traffic separation,
• the lack of support in commercial routers would delay the large-scale adoption of the concept proposed in this thesis in provider networks and
• the recent shift towards experimental status is a major drawback on the road to commercial deployment.
The reasoning of the “Internet Engineering Steering Group (IESG)” for the downgrade from proposed standard to experimental status is particularly remarkable. Fig. 72 and Fig. 73 document the current situation of the GIST standardization process.
Fig. 71
Components of an NSIS node - [80]
Next Steps in Signaling                                   H. Schulzrinne
Internet-Draft                                               Columbia U.
Intended status: Experimental                                 R. Hancock
Expires: December 5, 2009                                            RMR
                                                            June 3, 2009

                 GIST: General Internet Signalling Transport
                           draft-ietf-nsis-ntlp-20
...
Fig. 72
GIST protocol change to “Experimental” status [164]
To: Gerald Ash <gash5107 at yahoo.com>, "iesg at ietf.org" <iesg at ietf.org>
Subject: Re: [NSIS] FW: I-D Action:draft-ietf-nsis-ntlp-20.txt
From: Ross Callon <rcallon at juniper.net>
Date: Thu, 11 Jun 2009 23:17:47 -0400
The fundamental problem with GIST is that is allows normal hosts (laptops, desktops, …) to send traffic
to the control plane of routers. This opens up a new vector for hosts to be the source of DDOS attacks
against the control plane of the routers. Note that such DDOS attacks are not just theory -- in fact multi-gigabit
DDOS attacks against routers have occurred and do occur, and thus protecting against these is critical. It is
therefore normal for service providers to prohibit "host to router" signaling packets (such as RSVP
packets) from entering their network from the customer networks, for example by discarding these at the
CE/PE boundary.
Unfortunately the fact that such DDOS attacks are facilitated is not dependent upon the method that the router
uses to recognize the packets as signaling packets. So long as a host has the ability to send traffic to the control
plane of routers, then attackers will be able to harness the power of thousands of compromised hosts to attack
routers.
Of course the same issue could come up with RSVP. It became a standards track protocol a very long time
ago, and would probably face the same scrutiny if it were a new protocol being proposed today. The
current widespread use of RSVP is generally in ISPs limited to support of MPLS within a service provider.
The same issue comes up in terms of DDOS attacks against application servers. Here one issue is that we don't
have an alternative: hosts have to be able to send traffic to servers. Also, in general at least the largest DDOS
attacks against servers need to be dealt with by putting appropriate packet filters / rate limits in place in routers
(assuming that the router network is operating, and wasn't taken down by a different DDOS attack).
In terms of the right want to deal with such DOS attacks: The reality is that it would be quite a major
undertaking to deploy sufficient protection to allow hosts in general to signal to the router's control
plane while still protecting against such attacks. For example control traffic would need to be rate limited at
pretty much every entry to every major service provider network, and the effect that any DDOS attack would
have on legitimate control traffic would need to be understood. If the attack came from a very large number of
sources, then the rate at each entry point might be quite low, implying that either the widely deployed rate limits
would need to also be very low, or they would need to be adjusted in response to an attack. All of this would
need to be documented. However, the amount of difficulty that would be encountered in deploying such a
system suggests that this is not an appropriate thing to put into the IETF standards track unless and until
there is clear and well documented motivation for whatever new signaling protocol is being proposed.
It is also possible that a signaling protocol could be used in a sort of "walled garden" scenario, where the
hosts that are permitted to initiate control traffic are known and are protected from compromise. The current
use of RSVP within some enterprise networks could be thought of as one example of such a "walled garden". If
deployment experience of NSIS is collected from the experiment and presented with a clear definition of the
walled garden within which the protocol can be safely operated, then this work might be more likely to be
progressed to standards track (with the description of how and why the deployment is limited to that garden).
Ross (speaking for myself, but having discussed the issue with other IESG members)
Fig. 73
GIST protocol objections explained by Ross Callon [43]
Following the reasoning of routing area director Ross Callon, any general signalling protocol that aims for end-to-end resource reservations will no longer pass the IESG, because of its potential for denial of service attacks. Either the protocol scope retreats from the end hosts (and applications), or a “walled garden” scenario is implemented, which strictly limits user-to-network interaction to harmless functionality.
Under these circumstances, RSVP appears to be a historical standardization flaw that will persist.
Further work has been completed in the field of guaranteed inter-domain QoS reservations [165]. It introduces a refined version of BGP, called “BGP+”, which is optimized for fast convergence. BGP+ is also able to assess a route’s QoS capabilities and to exchange this information with the QoS NSLP and the NTLP (GIST). This is required for adapting reservations in the case of route changes.
However, this work is again too complex for the simple, generally applicable class of service concept aspired to in this thesis. Three aspects rule out its use for the new concept: its strong ties to GIST, the guaranteed resource reservations, and the requirement of a homogeneous set of supported classes of service in all participating provider networks.
Related work on QoS provisioning in a wider sense includes the concept of “Pre-Congestion Notification (PCN)”. Resource reservations and traffic prioritization provide significant QoS enhancement in highly loaded (congested) networks. Avoiding congestion by reducing sending rates is therefore an effective means of QoS provisioning as well. PCN pursues this goal using token bucket metering and packet marking for early congestion warning. This marking can either guide intermediate nodes in selecting the “right” packet to drop in case of congestion, or trigger the egress edge of a PCN domain to inform the ingress edge about the congestion. The ingress in turn is responsible for admission control to the PCN domain. It may also terminate flows if the QoS degradation of already admitted flows does not allow extra flows to enter the domain. Fig. 74 depicts the major PCN components and their working principle.
Fig. 74
PCN working principle - [136]
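The token bucket metering of PCN can be sketched with two meters per link: one at the lower threshold rate and one at the higher excess rate. This is a simplification made for illustration. Both buckets meter every packet, and a packet is marked when a bucket lacks sufficient tokens, whereas the actual PCN metering specification defines threshold marking in more detail. Rates are in bytes per second and sizes in bytes; all values are hypothetical.

```python
class TokenBucket:
    """Token bucket meter: rate in bytes per second, depth in bytes."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = depth        # start with a full bucket
        self.last = 0.0

    def conforms(self, now, size):
        # Refill according to the elapsed time, capped at the bucket depth
        self.tokens = min(self.depth, self.tokens + self.rate * (now - self.last))
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


class PcnMeter:
    """Simplified PCN meter with a threshold and an excess-traffic token bucket."""

    def __init__(self, threshold_rate, excess_rate, depth):
        self.threshold = TokenBucket(threshold_rate, depth)
        self.excess = TokenBucket(excess_rate, depth)

    def mark(self, now, size):
        # Both buckets meter every packet; the stronger marking wins
        within_threshold = self.threshold.conforms(now, size)
        within_excess = self.excess.conforms(now, size)
        if not within_excess:
            return "excess-traffic-marked"
        if not within_threshold:
            return "threshold-marked"
        return "not-marked"
```

Threshold-marked packets would stop the admission of new flows at the ingress. Excess-traffic-marked packets would additionally indicate that already admitted flows have to be terminated.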
From an inter-domain perspective with a general traffic separation scheme in mind, the PCN concept reveals two major drawbacks. The first is its limitation to a single domain. The more subtle second one stems from the lack of a PCN marking field in packet headers. The IPv4 header (see Fig. 1) has no PCN marking bits, but rather 6 DSCP bits and two bits for explicit congestion notification (ECN) [156]. PCN, however, defines three levels of congestion marking (no congestion, admission stop marking and excess traffic marking), which need to be encoded in the packet header. The compromise found is to restrict PCN signalling to a single DSCP value. The so-called “DSCP for Capacity-Admitted Traffic” [17] is used for PCN, and the ECN bits are redefined accordingly. Traffic marked with any other DSCP therefore has three options: it is not admitted into a PCN domain, it is remarked to the PCN DSCP, or it cannot be used for PCN marking.
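The encoding constraint can be made concrete by splitting the former IPv4 TOS octet into its 6-bit DSCP and 2-bit ECN parts. Only four ECN codepoints are available, and the PCN marking states have to share them. Their PCN meaning can therefore only be valid under one dedicated DSCP. The value 44 used below is assumed here to be the capacity-admitted DSCP; the helper names are hypothetical.

```python
PCN_DSCP = 44   # assumed value of the capacity-admitted DSCP used for PCN


def split_ds_field(tos_octet):
    """Split the former IPv4 TOS octet into DSCP (upper 6 bits) and ECN (lower 2 bits)."""
    return tos_octet >> 2, tos_octet & 0b11


def join_ds_field(dscp, ecn):
    """Rebuild the octet from a 6-bit DSCP and a 2-bit ECN value."""
    return (dscp << 2) | ecn


def ecn_bits_carry_pcn(tos_octet, pcn_dscp=PCN_DSCP):
    """Only under the PCN DSCP may the ECN bits be read as PCN marking."""
    dscp, _ecn = split_ds_field(tos_octet)
    return dscp == pcn_dscp
```

Any other class of service, e.g. one marked with DSCP 46, keeps the standard ECN semantics and therefore cannot carry PCN information.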
These two strong limitations clearly show that PCN is not an equivalent replacement for the proposed general concept of cross-domain and cross-layer coarse grained Quality of Service support in IP-based networks as described in chapter 7.
7 New (coarse grained) CoS concept
7.1 Motivation and target
Quality of Service support within Autonomous Systems currently differs dramatically from the support between ASes at interconnection points.
The fast-growing number of Internet users and the rapid increase in access line transmission capacity lead to a steady growth of Internet traffic. As an example, Fig. 75 shows the overall traffic exchanged at the German Internet exchange in Frankfurt.
Fig. 75
DE-CIX yearly traffic graph - [62]
Furthermore, service providers are increasingly offering voice over IP or IP-TV services to their customers. To ensure adequate transfer quality for these delay-sensitive services, most providers still choose a high degree of over-provisioning (less than 20% network load) as the easiest and still cheapest solution. European providers, however, seem to be taking a leading role in consistently setting up Differentiated Services forwarding within their networks. Their aim is to ensure separated traffic handling and to cut down on over-provisioning costs. Four to six classes of service are common. However, this approach can only be applied consistently and homogeneously within a domain. Public inter-domain interconnections are still run without any QoS support (“Best-Effort (BE)” interconnection) and rely solely on over-provisioning.
Such interconnected “quality islands” exist independently, peer with BE traffic, perform
costly multi-parameter ingress classification to locally guess and match on the incoming
traffic class, run uncoordinated QoS concepts and might not even be known globally.
Due to the fast increase in access speeds and the high quality expectations of their customers, service providers are increasingly forced into frequent and costly interface speed upgrades.
This new coarse grained CoS concept therefore targets the inter-domain BE interconnection and aims for a traffic-separating interconnection style without QoS guarantees. Neighbouring providers are already able to set up such CoS-enabled interconnections by means of mutual SLA-based agreements about the supported classes of service, their encoding and mapping. The new concept extends this manual, interconnection-local CoS support by means of transitive signalling of the available classes of service and their respective class markings. This enables multi-domain CoS transit paths with automated class of service adaptation.
The locally available CoS support is disclosed to all providers, which adapt the forwarded traffic to the neighbour’s class set and encoding. CoS routing and tunnelling are expected to evolve on top of this basic functionality.
Furthermore, the new coarse grained CoS concept extends the traffic separation in locally available classes of service across networking layers. IP CoS support is the anchor point of the class set signalling, using the inter-domain PHB ID encoding (see Fig. 46). Traffic separation may also rely on MPLS tunnels, Ethernet QoS support or virtual channels, depending on their availability. The concept’s signalling associates the classes and markings of these different layer technologies. This is required because no definitive standards define this cross-layer mapping of available class sets. Service providers do, however, individually define mapping rules within their domains and now have the means to signal this cross-layer mapping to interconnection partners. This again enables class set approximation, now extended to consistent traffic separation in all CoS forwarding enabled layers.
Lastly, the new concept introduces an optional second mechanism that prevents possible
excessive misuse of higher priority traffic classes. Class-based ingress limitation
using token bucket metering with associated dropping or remarking rules for excess traffic
protects the CoS enabled AS from overload in high priority traffic classes. Those ingress
filter parameters are signalled to adjacent interconnection partners. This results in a
predictable forwarding behaviour and allows for informed traffic planning and possibly
shaping at the sending AS egress edge.
The new coarse grained CoS concept therefore:
• provides knowledge about the available traffic separations and markings by means
of transitive Cross-domain marking signalling with associated Cross-layer mapping,
• enables marking adaptation (and possibly route selection) without guarantees and
• performs fair signalling of class overload limitations and excess traffic handling.
This twofold, “free to join” concept of global class set marking signalling with cross-layer
mapping and rate limitation signalling is optimized for simplicity. Quality of Service
guarantees are waived in favour of signalling, metering, debugging and operating
simplicity. QoS in this approach therefore refers to primitive traffic separation into several
classes, which experience differently prioritized forwarding behaviour in relaying
nodes. The aim is to enqueue each class in a separate queue.
The concept targets inter-AS Class of Service, since simple traffic separation has been
identified as the key characteristic. If the concept is widely applied, the public Internet will evolve into a public
“Betternet” in the future.
The concept has been formulated in two Internet drafts [124] and [125] and was widely
discussed and published in the networking community. The resulting feedback acknowledged the reduced complexity and expressed a preference for the concept over the aforementioned
approaches with stronger QoS guarantees. The concept is expected to be applied on a global
scale, possibly combined with SLA-based QoS guaranteeing solutions at individual
interconnections.
Due to the targeted global deployment, scalability, router resource consumption and
operational stability have been analyzed.
7.2 Usage of BGP for QoS signalling
Signalling Class of Service sets and markings between interconnection partners can either
be piggybacked on already deployed protocols or be performed by means of a separate
signalling protocol. Static CoS sets and markings would be a third and, in theory, the best
solution, if all providers agreed on a single globally available inter-domain CoS
set. However, no such set exists, nor is one likely to be standardized any time soon.
The definition and usage of specialized signalling protocols for the possibly frequent
exchange of load statistics and flow-based quality requests and grants is an appropriate
solution and likely to happen for QoS guaranteeing approaches. The concept of this thesis,
however, does not require such separate handling due to its simplicity and coarse-grained
global signalling scope. Reusing existing protocols with simple extensions is therefore
envisaged.
An attractive candidate for reuse among the existing signalling protocols is the NSIS protocol
family. It has not been chosen for two reasons:
1. None of the existing AS border routers currently runs NSIS protocol entities and
2. The recently observed IETF objections against NSIS seem likely to delay its
appearance as a proposed Internet standard dramatically (see Fig. 73).
The only IETF standard protocol that is readily available at interconnection points for
reuse is the Border Gateway Protocol. The following two sections explain the pros
and cons of this choice.
Why use BGP for signalling
BGP is the de-facto interconnection protocol and is therefore globally accepted and globally
available. It is a well-designed, flexible protocol that allows for simple signalling extensions.
BGP exchanges reachability information and can tag this information with route related
attributes. Such attributes have IANA assigned type values, listed in the respective IANA
registries, and associated attribute structures. This attribute and IANA registry approach
allows for the flexible extension of BGP. All attributes (existing or newly defined ones)
are automatically associated with the network layer reachability information (NLRI) advertised in
the respective BGP UPDATE messages. Attributes of transitive type are even relayed
globally together with the NLRI.
Why not use BGP for signalling
BGP’s stability is achieved through dampened UPDATE message rates and the concept of
failure confinement within routing areas or confederations (see chapter 2.2.2). Rapidly
changing signalling information is therefore not suited to BGP.
BGP might also be avoided, if long lived signalling information can be placed in Internet
Routing Registries (see 5.3) instead of the UPDATE message transport.
Lastly, since all AS border routers of the Internet need to store and process the large and
ever growing reachability information communicated by BGP, any extension should add as little
extra burden as possible to the routers’ resource consumption.
The new coarse grained CoS concept does make use of BGP and its already defined
extended community attribute structure for the following reasons:
1. BGP is readily available for the concept’s deployment, in particular, if widely implemented attribute structures can be used for the CoS signalling,
2. The concept’s CoS signalling is small in size and not rapidly changing,
3. Service providers are familiar with BGP’s community philosophy and can easily
adapt to the proposed CoS extensions and
4. The Internet Routing Registry signalling approach is included in the concept’s
specification for backup and security purposes, but is not relied on exclusively. This is
because many service providers still do not make use of IRR information in their border router configurations.
Within BGP, Extended Community attributes have been chosen for the CoS signalling,
since their container size of 8 octets is sufficient and the automatic association of
attributes with all NLRI of an UPDATE message matches the concept’s target of signalling
simplicity and efficiency. Specification details are outlined in chapter 7.3.
The different approach taken in related work (see chapter 6), namely using a separate address family
for “QoS route” signalling, has been rejected. The new coarse grained CoS concept’s
signalling information is expected to be associated with the majority of Internet routes. The
use of a separate address family would require signalling every route twice, once for reachability and once for CoS
support, which is not an efficient signalling solution.
7.3 Definitions and information processing
The following two sections outline the design principle, attribute definitions and processing
as specified in the respective two IETF draft documents [124] and [125]. They are an
integral part of this thesis work.
7.3.1 BGP extended community attribute for CoS marking
Cross-domain CoS marking and cross-layer mapping signalling is specified in “draft-knoll-idr-qos-attribute-04” [124] as follows.
Reachability information of IP prefixes is augmented by possibly several instances of a
new BGP Extended Community. Each instance signals the availability of a certain class of
service together with its technology dependent marking encoding. Several such Extended
Communities are needed to signal further available classes as well as their
associated cross-layer representations in other networking technologies.
As a design principle, only the IP prefix originating AS is allowed to initially associate such
a set of Extended Communities of supported classes with the advertisement of its own
prefixes. Neighbouring and more distant ASes will then:
- learn about the available classes and marking encodings,
- possibly use the information for best path or multi-path decision making,
- relay the respective best path and associated transitive attribute information to their
neighbours, possibly updating the signalled marking to the locally applied one, and
- use the learned class marking for downstream packet forwarding (including possible
remarking at the outgoing edge interface).
Transit ASes approximate class markings to achieve a class set mapping and forwarding
adaptation that is as close as possible.
ASes are free to ignore single classes or cross-layer mappings of the classes, but need to
indicate this by setting the provided “Ignore” flag.
Fig. 76 depicts the resulting signalling and traffic forwarding procedure.
Fig. 76
Cross-Domain CoS marking concept
Several QoS Marking communities may be included in a single BGP UPDATE message.
They are virtually linked together by means of an identical "QoS Set Number" field. Each
QoS Marking community is encoded as an 8-octet tuple, as defined in [124]. Signalled QoS
Class Sets are assumed to be valid for traffic crossing this AS. If different QoS strategies
are used within an AS, its provider is responsible for the consistent transport of transit traffic
across this inhomogeneous domain. In all transit forwarding cases, QoS based tunnelling
mechanisms are the means of choice for transparent traffic transport.
The availability of the "Best Effort" forwarding class is implied and defaults to a zero
encoding on all signalled layers. It is therefore not necessary to include QoS Marking
communities for the Best Effort Class as long as the default encoding is in place.
7.3.1.1 Extended Community Type
The new QoS Marking community is encoded within a BGP Extended Community Attribute
[161], i.e. it is carried in an optional transitive BGP path attribute with Type Code 16. The actual
encoding within the BGP Extended Community Attribute is as follows.
The QoS Marking community is of regular type, which results in a 1-octet Type field
followed by 7 octets for the QoS marking structure. The Type is IANA-assignable and
marks the community as transitive across ASes. IANA has assigned the type number
0x04 (see Fig. 77). Optionally, a non-transitive Type value of 0x44 is
provided, which allows for the AS internal exchange of marking information. The community
format remains unchanged for this non-transitive version.
Fig. 78 depicts the BGP Extended Community Attribute structure.
http://www.iana.org/assignments/bgp-extended-communities
Border Gateway Protocol (BGP) Data Collection Standard Communities
(last updated 2009-06-02)
…
Registry Name: BGP Extended Communities Type - regular, transitive
Reference: [RFC4360]
Range       Registration Procedures
---------   --------------------------------------
0x90-0xbf   Standards Action/Early IANA Allocation
0x00-0x3f   First Come First Served
Registry:
Type Value  Name                               Reference                           Registration Date
----------  ---------------------------------  ----------------------------------  -----------------
0x04        QoS Marking                        [Knoll]                             2008-12-30
0x05        CoS Capability                     [Knoll]                             2009-05-18

Registry Name: BGP Extended Communities Type - regular, non-transitive
Reference: [RFC4360]
Range       Registration Procedures
---------   --------------------------------------
0xd0-0xff   Standards Action/Early IANA Allocation
0x40-0x7f   First Come First Served
Registry:
Type Value  Name                               Reference                           Registration Date
----------  ---------------------------------  ----------------------------------  -----------------
0x40        Link Bandwidth Extended Community  [draft-ietf-idr-link-bandwidth-00]  2009-05-18
0x44        QoS Marking                        [Knoll]                             2008-12-30
…
Fig. 77
IANA registry for BGP Extended Community type numbers
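The two registered QoS Marking type values differ in a single bit. RFC 4360 defines the second most significant bit of the Type octet (0x40) as the Transitive bit: '0' allows the community to cross AS boundaries, '1' confines it to the local AS. The following Python sketch shows how a receiving implementation might classify a type octet; the constant and function names are illustrative assumptions, not identifiers taken from [124].

```python
TRANSITIVE_BIT = 0x40          # RFC 4360 'T' bit: 0 = transitive, 1 = non-transitive
QOS_MARKING_TRANSITIVE = 0x04  # IANA value, Fig. 77
QOS_MARKING_LOCAL = 0x44       # AS-internal variant with the same format


def is_transitive(type_octet: int) -> bool:
    """True if a community of this Type may be advertised over eBGP."""
    return (type_octet & TRANSITIVE_BIT) == 0


def is_qos_marking(type_octet: int) -> bool:
    """Both QoS Marking types differ only in the Transitive bit."""
    return (type_octet & ~TRANSITIVE_BIT & 0xFF) == QOS_MARKING_TRANSITIVE
```

With this check, 0x04 and 0x44 are both recognized as QoS Marking communities, but only 0x04 passes the transitivity test.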
As already made clear in chapter 2.2.2, it is important to distinguish between a “transitive
attribute” and a “transitive community”. The depicted attribute structure is by default of
transitive type and will therefore always be relayed across ASes, regardless of how it is
actually processed. A special flag, the so-called “Partial” flag, is defined for BGP path
attributes; it is set by ASes that do not interpret such an Extended Community
Attribute. Using Extended Community Attributes as the “container structure”
for the CoS marking Extended Communities ensures that the signalling relay will
actually reach all ASes of the Internet. However, one limitation still exists in practice:
providers might decide to suppress the relay of Extended Community
Attributes in general, no matter which communities are enclosed. In such a case, all ASes up to the
blocking one will receive the class set information and might make use of it. ASes further
upstream, however, will not receive the CoS signalling via this relay path.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 x 0 0 0 1 0 0|                                               |
+-+-+-+-+-+-+-+-+   7 octet QoS Marking community structure     |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fig. 78
BGP Extended Community Attribute structure with type 0x04 or 0x44
BGP UPDATE messages can by definition only include a single BGP Extended Community Attribute. However, each attribute can enclose several Extended Communities. Such
Extended Communities are in turn classified as being of “transitive” or “non-transitive”
community type. Here, “transitive” indicates whether a community may be signalled
across an eBGP session or is confined to communication sessions with iBGP peers only. The CoS marking Extended Community has been
assigned a transitive as well as a non-transitive type number to give providers the choice
between AS external and AS internal only usage of the signalling structure. The remaining
explanation assumes usage of the transitive type, since cross-domain signalling is an
important target of the concept.
7.3.1.2 QoS Marking Extended Community Structure
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 P R I A 0 0| QoS Set Number|Technology Type| QoS Marking Oh|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| QoS Marking Ol| QoS Marking A |0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fig. 79
Structure of the QoS Marking Community
As shown in Fig. 79, each signalled Extended Community contains a “Flags” field, a “QoS
Set Number”, a “Technology Type” and two “QoS Marking” fields.
The first octet contains four flags, ‘P, R, I and A’, which are used to indicate processing
status and results. The 'P' flag indicates the preservation of incoming markings during the
transit forwarding process. The IP prefix originating AS should set the flag to '1', which is
otherwise implied by an AS_PATH length of 1 AS. Transit ASes must set the flag to '1' if
the advertised Marking A is accepted at the ingress and is sent out unchanged at the
egress, that is, if no remarking occurs, neither for marking adaptation towards the neighbouring
downstream AS nor by resetting the markings. This flag is set and cleared by each
relaying AS according to its handling of markings, irrespective of whether the
particular Marking A is ignored in the internal per hop forwarding behaviour.
The "R, I and A" flags are set to '0' in the advertisement by the IP prefix originating AS.
Transit ASes must change the flag value to '1' once the respective event has occurred. If the
QoS marking actively used in the transit AS internal forwarding is different from the
advertised original one, the 'Remarking (R)' flag is set to '1'. This must be signalled
separately for each technology type community within the set of Extended Communities.
The same applies to the 'Ignore (I)' flag, if the respective advertised QoS marking is
ignored in the transit AS internal forwarding. The 'Aggregation (A)' flag must be set to '1' by
the transit AS relaying the UPDATE message if the respective IP prefixes are advertised
inside an IP prefix aggregate built from differing Class Sets.
The handling of prefix aggregation is vital for routing table size reduction and routing
stability. However, aggregation can easily merge routes to more specific prefixes
that have differing class of service sets. In this case, the aggregator becomes
the IP prefix originating AS for the prefix aggregate and is responsible for the mapping
between the merged class set signalled upstream and the differing class sets available
downstream. It is the provider’s responsibility to ensure a close class set approximation in
terms of forwarding and marking behaviour.
If the "R, I and A" flags are cleared, and the cleared 'Partial' flag of the BGP attribute
shows that no "QoS Class ignorant" AS is involved in the forwarding path, consistent
class-based traffic separation is available along the whole
path.
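This path consistency rule can be expressed as a short predicate. The Python sketch below assumes that the flag bits of Fig. 79 are numbered most significant bit first, which gives the masks 0x10 (R), 0x08 (I) and 0x04 (A); this numeric reading, like the function name, is an assumption. The Partial bit (0x20) of the path attribute flags octet is taken from RFC 4271.

```python
from typing import Iterable

# Flag masks in the first octet of Fig. 79 ("0 0 P R I A 0 0"), read MSB-first (assumed).
FLAG_R, FLAG_I, FLAG_A = 0x10, 0x08, 0x04
# Partial bit of the BGP path attribute flags octet (RFC 4271).
ATTR_PARTIAL = 0x20


def consistent_separation(attr_flags: int, community_flags: Iterable[int]) -> bool:
    """True if the path offers consistent class-based traffic separation."""
    if attr_flags & ATTR_PARTIAL:
        return False  # some AS on the path did not interpret the attribute
    # No community of the received set may report remarking, ignoring or aggregation.
    return all((f & (FLAG_R | FLAG_I | FLAG_A)) == 0 for f in community_flags)
```

Only the R, I and A bits are tested; the P flag describes marking preservation and does not affect this check.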
Several single QoS Marking communities can be logically grouped into a QoS Marking
community Set characterized by an identical QoS Set Number. This grouping of the single
QoS Marking communities into a set provides cross-layer linking between the QoS class
encodings. The number of signalled QoS Marking communities as well as QoS Marking
community Sets is chosen by the operator of the originating AS. The enumerated QoS Set
Numbers are significant only within their BGP UPDATE message, starting with set number 0x00.
Since all signalled markings are specific to a networking technology, the Technology Type field
indicates which technology the marking refers to. While defining this signalling, an extensive search for existing technology type enumerations was performed.
The closest result was the “IANAifType-MIB” enumeration [96]. However, this enumeration
is far too detailed, and its registry maintainers have discouraged its usage because of
its consistency weaknesses. Therefore, a short and simple enumeration has been defined, as
shown in Table 12.
Table 12
Technology Type Enumeration
Value   Technology Type
0x00    DiffServ enabled IP (DSCP encoding)
0x01    Ethernet using 802.1q priority tag
0x02    MPLS using E-LSP
0x03    Virtual Channel (VC) encoding using separate channels for QoS forwarding /
        one channel per class (e.g. ATM VCs, FR VCs, MPLS L-LSPs)
0x04    GMPLS - time slot encoding
0x05    GMPLS - lambda encoding
0x06    GMPLS - fibre encoding
The two most important fields of the new QoS Marking Extended Community structure are
the QoS Marking O and A fields. The interpretation of these fields depends on the selected
layer and technology. ASes that process the community and support the given QoS
Class by means of a QoS mechanism using bit encodings for the targeted behaviour (e.g.
IP DSCP, Ethernet User Priority, MPLS TC etc.) must copy the encoding into the
"QoS Marking A" community field. Unused higher order bits default to '0'. Other technologies, which use separate forwarding channels for different classes (such as L-LSPs,
VPI/VCI inferred ATM classes, lambda inferred priority, etc.), shall use class enumerations
as encoding in this community field. The enumeration starts with zero for the best
effort traffic class and rises by one with each available higher priority class. There are two
QoS Marking fields within the QoS Marking community, one for the "original (O)" and one for the "active
(A)" QoS marking. Higher order bits of those fields, which are not used for the respective
behaviour encoding, default to zero.
The QoS Marking O (Original QoS Marking) field is a 16 bit QoS Marking field, which
consists of a high ("Oh") and a low ("Ol") octet. The IP prefix originating AS copies the
internally associated QoS encoding of the given Technology Type into this 16 bit field.
The field value is right-aligned depending on the number of encoded bits. For the IP
technology, the encoding of Per Hop Behaviour Codes has to follow the definitions stated
in [31]. The field must remain unchanged in BGP UPDATE messages of relaying nodes.
QoS Marking A (Active QoS Marking) and QoS Marking O must be identically encoded by
the IP prefix originating AS, except when IP technology Per Hop Behaviours
are addressed. "QoS Marking A" always contains the locally applied encoding for the
targeted PHB. All other ASes use this Active QoS Marking field to advertise their locally
applied internal QoS encoding of the given class and technology at the interconnection
point. The field value is right-aligned depending on the number of encoded bits. A cleared
Marking field (all zero) signals that this traffic class experiences default traffic treatment
within the transit AS forwarding technology.
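To make the octet layout concrete, the following Python sketch packs and parses one QoS Marking community as read from Fig. 78 and Fig. 79: Type, Flags, QoS Set Number and Technology Type as single octets, the 16-bit Original Marking, the Active Marking and a reserved zero octet. The class and method names are hypothetical, and the sketch performs no semantic validation of flag or marking values.

```python
import struct
from dataclasses import dataclass

QOS_MARKING_TYPE = 0x04  # transitive Type value assigned by IANA (Fig. 77)
TRANSITIVE_BIT = 0x40    # set in the non-transitive variant 0x44
LAYOUT = "!BBBBHBB"      # Type, Flags, Set No., Tech Type, O (16 bit), A, reserved


@dataclass(frozen=True)
class QosMarking:
    flags: int       # P, R, I, A bits (Fig. 79)
    set_number: int  # QoS Set Number, local to one UPDATE message
    tech_type: int   # Technology Type (Table 12)
    marking_o: int   # Original QoS Marking ("Oh" and "Ol" octets)
    marking_a: int   # Active QoS Marking

    def encode(self, type_octet: int = QOS_MARKING_TYPE) -> bytes:
        """Serialize into the 8-octet Extended Community (network byte order)."""
        return struct.pack(LAYOUT, type_octet, self.flags, self.set_number,
                           self.tech_type, self.marking_o, self.marking_a, 0)

    @classmethod
    def decode(cls, data: bytes) -> "QosMarking":
        """Parse an 8-octet QoS Marking community of type 0x04 or 0x44."""
        if len(data) != 8:
            raise ValueError("an Extended Community is exactly 8 octets long")
        type_octet, flags, set_no, tech, m_o, m_a, _reserved = struct.unpack(LAYOUT, data)
        if (type_octet & ~TRANSITIVE_BIT & 0xFF) != QOS_MARKING_TYPE:
            raise ValueError("not a QoS Marking community")
        return cls(flags, set_no, tech, m_o, m_a)
```

For example, QosMarking(0x20, 0, 0x00, 0xB800, 46).encode() yields the octets 04 20 00 00 B8 00 2E 00: the P flag set (assuming the 0x20 mask), QoS Set 0, DiffServ IP, the RFC 3140 PHB ID of EF as Original Marking and DSCP 46 as Active Marking.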
7.3.1.3 QoS Marking Extended Community Usage
Providers may choose to process the QoS Marking communities and adapt the behaviour
encoding and tunnel selection according to their local policy. This may also lead to
different IGP routing decisions or even affect BGP update filters.
Only the IP prefix originating AS is allowed to signal the QoS Marking communities and
Sets. All advertised prefixes that originate from that AS will be sent with the same QoS
Marking community Set in the respective UPDATE message. Transit ASes must not
modify or extend the QoS Marking community Set, except for updating each 'QoS
Marking A' field contained in the community Set and the respective "P, R, I, A" flags.
Relaying nodes are to advertise prefixes with identical QoS Marking community Sets
together in common UPDATE messages.
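The transit rules above can be sketched as a per-community update function. In this Python sketch, the boolean inputs stand for the outcome of the transit AS's local policy and are hypothetical parameters; the flag masks are again an assumed MSB-first reading of Fig. 79. Only Marking A and the flags change, while the QoS Set Number, Technology Type and Marking O are relayed unchanged.

```python
from dataclasses import dataclass, replace

# Flag masks from Fig. 79 ("0 0 P R I A 0 0"), read most significant bit first (assumed).
FLAG_P, FLAG_R, FLAG_I, FLAG_A = 0x20, 0x10, 0x08, 0x04


@dataclass(frozen=True)
class QosMarking:
    flags: int
    set_number: int
    tech_type: int
    marking_o: int
    marking_a: int


def relay(received: QosMarking, local_marking: int, preserves: bool,
          remarked: bool, ignored: bool, aggregated: bool) -> QosMarking:
    """Update one received QoS Marking community before re-advertising it."""
    # P only describes the relaying AS's own marking handling: recompute it.
    flags = received.flags & ~FLAG_P & 0xFF
    if preserves:
        flags |= FLAG_P
    # R, I and A remain set once any transit AS has set them.
    if remarked:
        flags |= FLAG_R
    if ignored:
        flags |= FLAG_I
    if aggregated:
        flags |= FLAG_A
    # Set Number, Technology Type and Marking O are relayed unchanged.
    return replace(received, flags=flags, marking_a=local_marking)
```

Because R, I and A are only ever set, the flags arriving at a distant AS summarize the handling along the whole path, which is what the consistency check of section 7.3.1.2 relies on.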
Fig. 80 shows an AS interconnection example with different Class Sets. It shows the case
in AS 5 where different Class Sets are used internally and externally. The proposed QoS
Class Set signalling will always use the external definitions within the UPDATE message
QoS Marking communities. The example also shows that IP prefixes originating in
AS 5 and AS 3 can be advertised together with the same QoS Marking community Set as
long as their Layer 2 encoding is identical.
Fig. 80
CoS enabled AS interconnection example topology
IP packet forwarding based on packet header QoS encoding might require remarking of
packets in order to match AS internal policies and the encodings of neighbouring ASes.
Identical QoS class sets and encodings between neighbouring ASes do not require any
remarking. Differing encodings are adapted on the outgoing traffic: outgoing traffic for
a given IP prefix uses the 'QoS Marking A' information of the respective BGP UPDATE
message QoS Marking community to remark the forwarded packet accordingly. If the 'I'
flag is set for a given encoding, the outgoing traffic should still be remarked,
despite the signalled lack of QoS Class forwarding support. This is particularly important if
the preserve flag 'P' is signalled together with the 'I' flag.
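A minimal Python sketch of this egress remarking step for DiffServ IP: the DSCP occupies the upper six bits of the IPv4 TOS or IPv6 Traffic Class octet, and the two ECN bits are left untouched. The table mapping local DSCPs to the neighbour's advertised 'QoS Marking A' values is a hypothetical, already resolved input; how it is derived from the QoS Set Numbers is not shown. The sketch remarks regardless of the 'I' flag, in line with the recommendation above.

```python
from typing import Dict


def remark_dscp(tos: int, dscp: int) -> int:
    """Write a 6-bit DSCP into the TOS / Traffic Class octet, keeping the ECN bits."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("a DSCP is a 6-bit value")
    return (dscp << 2) | (tos & 0x03)


def egress_tos(tos: int, active_marking: Dict[int, int]) -> int:
    """Adapt an outgoing packet to the neighbour's signalled 'QoS Marking A'.

    active_marking (hypothetical) maps the local DSCP of a class to the DSCP
    the neighbour advertised for the same class; unmapped classes pass unchanged.
    """
    local_dscp = tos >> 2
    neighbour_dscp = active_marking.get(local_dscp)
    if neighbour_dscp is None or neighbour_dscp == local_dscp:
        return tos  # identical encodings need no remarking
    return remark_dscp(tos, neighbour_dscp)
```

For instance, a packet carrying DSCP 46 with ECN bits 01 (TOS 0xB9) leaves with TOS 0xA1 if the neighbour advertised DSCP 40 for that class.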
Several IP prefixes of different IP prefix originating ASes may be aggregated to a shorter
IP prefix in transit ASes. If the original Class Sets of the aggregated prefixes are identical,
the aggregate will use the same Set. In all other cases, the resulting IP prefix aggregate is
handled the same as if the transit AS were the originating AS for this aggregated prefix.
The transit provider may provide AS internal mechanisms that map the signalled
aggregate QoS Class Set to the different original Class Sets in the internal forwarding
path. In case of IP prefix aggregation with different QoS Class Sets, the 'Aggregation (A)'
flag of each QoS Marking community within the Set must be set to '1'.
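The aggregation rule reduces to a simple decision, sketched below in Python. Modelling a Class Set as a frozenset of (QoS Set Number, Technology Type, Marking O) tuples is an illustrative assumption, and advertised_set stands for whatever Set the transit provider chooses to signal for the aggregate.

```python
from typing import FrozenSet, List, Tuple

# Illustrative model: a Class Set is a frozenset of
# (QoS Set Number, Technology Type, Marking O) tuples.
ClassSet = FrozenSet[Tuple[int, int, int]]


def aggregate_class_set(components: List[ClassSet],
                        advertised_set: ClassSet) -> Tuple[ClassSet, bool]:
    """Return the Class Set to attach to an aggregate and whether the
    'Aggregation (A)' flag must be set in each of its communities."""
    if not components:
        raise ValueError("an aggregate needs at least one component prefix")
    distinct = set(components)
    if len(distinct) == 1:
        # Identical original Class Sets: the aggregate reuses that Set.
        return next(iter(distinct)), False
    # Differing Class Sets: the transit AS acts as originator of the aggregate,
    # signals its own (mapped) Set and marks it with the A flag.
    return advertised_set, True
```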
7.3.1.4 Confidentiality Considerations
The disclosure of confidential AS intrinsic information is of no concern, since the signalled
markings for QoS class encodings can be adapted prior to the UPDATE advertisement by
the IP prefix originating AS. This way, a distinction between internal and external QoS
Class Sets can be achieved. AS internal cross-layer marking adaptation and policy based
update filtering allow for consistent QoS class support despite fictitious QoS Class Set
and encoding information within UPDATE advertisements. When such a policy hiding
strategy is used, the required AS internal ingress and egress adaptation shall be done transparently without explicit "Active Marking" and 'R' flag signalling.
7.3.1.5 QoS Marking Example
The example AS is advertising several IP prefixes, which experience equal QoS treatment
from AS internal networks. The IP packet forwarding policy within this originating AS
defines, for example, three traffic classes for IP traffic (DSCP1, DSCP2 and DSCP3). These three
classes are also consistently supported within MPLS tunnel forwarding that uses the TC bits.
The BGP UPDATE message for the announced IP prefixes will contain the
following QoS Marking community Set together with the IP prefix NLRI.
Fig. 81
QoS Marking Extended Community signalling example
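The community Set of this example can be reproduced programmatically. The Python sketch below assumes concrete values that the text leaves open: DSCP 46, 26 and 10 for DSCP1 to DSCP3, MPLS TC values 5, 3 and 1, and the flag mask 0x20 for 'P'; all of these are hypothetical. Each class gets its own QoS Set Number linking its IP and MPLS E-LSP representations. For IP, Marking O carries the RFC 3140 PHB ID and Marking A the DSCP, while for MPLS both fields carry the TC value.

```python
import struct
from typing import List, Tuple

QOS_MARKING_TYPE = 0x04           # transitive type value (Fig. 77)
IP_DSCP, MPLS_E_LSP = 0x00, 0x02  # Technology Types (Table 12)
FLAG_P = 0x20                     # assumed mask of the 'P' flag (Fig. 79)

# Hypothetical values for DSCP1..DSCP3 and their MPLS TC counterparts.
CLASSES: List[Tuple[int, int]] = [(46, 5), (26, 3), (10, 1)]


def phb_id(dscp: int) -> int:
    """RFC 3140 PHB ID Code of a single PHB: DSCP in the top six bits."""
    return dscp << 10


def community(set_no: int, tech: int, marking_o: int, marking_a: int) -> bytes:
    """Pack Type, Flags, Set No., Tech Type, Marking O (16 bit), Marking A, reserved."""
    return struct.pack("!BBBBHBB", QOS_MARKING_TYPE, FLAG_P, set_no, tech,
                       marking_o, marking_a, 0)


def build_set() -> List[bytes]:
    """One QoS Set per class, linking its IP and MPLS E-LSP encodings."""
    communities = []
    for set_no, (dscp, tc) in enumerate(CLASSES):
        communities.append(community(set_no, IP_DSCP, phb_id(dscp), dscp))
        communities.append(community(set_no, MPLS_E_LSP, tc, tc))
    return communities
```

build_set() returns six 8-octet communities, two for each QoS Set Number from 0x00 to 0x02, all of which would be attached to the UPDATE message carrying the prefixes.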
7.3.2 BGP class of service interconnection
Class-overload prevention is specified in “draft-knoll-idr-cos-interconnect-03” [125] as
follows.
The new coarse grained CoS concept is twofold and provides, in its second half,
an optional mechanism that prevents possible excessive misuse of higher priority
traffic classes. Class-based ingress limitation using token bucket metering with associated
dropping or remarking rules for excess traffic protects the CoS enabled AS from overload
in high priority traffic classes. Those ingress filter parameters are signalled to adjacent
interconnection partners. This results in a predictable forwarding behaviour and allows for
informed traffic planning and possibly shaping at the sending AS egress edge. This fair
and square interconnection limitation signalling is specified using two BGP attributes.
Two new transitive attributes are specified, which enable adjacent peers to signal Class of
Service Capabilities and token bucket Class of Service admission control Parameters. The
new "CoS Capability" is deliberately kept simple and denotes the general EF, AF Group,
BE and LE forwarding support across the advertising AS. The second "CoS Parameter
Attribute" is of variable length and contains a more detailed description of available
forwarding behaviours using the PHB ID Code encoding. Each PHB ID Code is associated
with rate and size based traffic parameters, which will be applied in the ingress AS Border
Router for admission control purposes to a given forwarding behaviour.
A Basic Set of supported Classes, called "Basic CoS" is defined here, which consists of
the primitive "Best Effort (BE)" PHB, the "Expedited Forwarding (EF)" PHB [60], the
"Assured Forwarding (AF)" PHB Group [88] and the "Lower Effort" Per-Domain Behaviour
(PDB) [34]. Providers that can support this Basic CoS signal this capability to their
interconnection partners by means of the new CoS Capability Extended Community
defined below.
Four AF PHB classes have been defined so far; they are grouped into the generally
signalled "AF Group". That is, as long as the AS provider can support at least one of
the four AF classes in its externally supported CoS Set, the AS is regarded as AF capable.
A second transitive attribute is defined for signalling the parameters of the access
control applied within the ingress AS border router. This traffic limitation is needed because
certain high quality forwarding behaviours can only be achieved if the percentage of
high priority traffic within the traffic mix lies below a certain threshold. This attribute informs
the interconnection partner about the applied limitation, which can in turn be used to
perform traffic shaping at the neighbouring AS egress. The attribute allows this limitation
to be signalled either in association with the NLRI within the same UPDATE message or with "global"
scope to describe the generally applied ingress limitation.
Both attributes are likely to be used together, if ingress class limitation is used for the
respective AS.
Fig. 82 depicts the resulting class overload limitation concept and outlines, how excess
traffic can either experience dropping or remarking punishment actions.
Fig. 82
Class overload limitation concept
7.3.2.1 CoS Capability Extended Community Structure
The CoS Capability Extended Community is encoded within the BGP Extended Community path
attribute as described in section 7.3.1.1. It is deliberately kept very simple and is defined
as outlined in Fig. 83. It is a transitive Extended Community of regular type with the IANA
assigned type value of 0x05 (see Fig. 77). The binary encoded support of per hop
behaviour classes is detailed in Table 13.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|B E A L|         Currently Unused - default to '0'             |
|E F F E|                                                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Currently Unused - default to '0'       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fig. 83
CoS Capability Extended Community Structure
Table 13
CoS Capability Attribute – binary class encoding
Bit     Flag    Encoding
0       BE      Default to ‘1’ to signal general “Best Effort” PHB support
1       EF      ‘1’ … “Expedited Forwarding” PHB support [60]
2       AF      ‘1’ … “Assured Forwarding” PHB group support [88]
3       LE      ‘1’ … “Lower Effort” PHB support [34]
4 .. 7  unused  Default to ‘0’
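A Python sketch of building and reading the CoS Capability community. It assumes, as for the QoS Marking community, that the IANA Type octet 0x05 precedes the 7-octet structure of Fig. 83, and that bits 0 to 3 in Table 13 are counted from the most significant bit; the function names are hypothetical.

```python
from typing import Set

COS_CAPABILITY_TYPE = 0x05  # transitive Type value assigned by IANA (Fig. 77)
# Bits 0..3 of the first value octet, counted MSB-first as in Table 13 (assumed).
CAPABILITY_BITS = {"BE": 0x80, "EF": 0x40, "AF": 0x20, "LE": 0x10}


def encode_capability(ef: bool, af: bool, le: bool) -> bytes:
    """Build the 8-octet CoS Capability Extended Community."""
    caps = CAPABILITY_BITS["BE"]  # Best Effort support defaults to '1'
    if ef:
        caps |= CAPABILITY_BITS["EF"]
    if af:
        caps |= CAPABILITY_BITS["AF"]
    if le:
        caps |= CAPABILITY_BITS["LE"]
    # Type octet, capability octet, then six unused octets defaulting to zero.
    return bytes([COS_CAPABILITY_TYPE, caps]) + bytes(6)


def decode_capability(data: bytes) -> Set[str]:
    """Return the names of the signalled forwarding behaviours."""
    if len(data) != 8 or data[0] != COS_CAPABILITY_TYPE:
        raise ValueError("not a CoS Capability Extended Community")
    return {name for name, bit in CAPABILITY_BITS.items() if data[1] & bit}
```

An AS supporting BE, EF and at least one AF class would thus signal the octets 05 E0 00 00 00 00 00 00.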
The implied Per-Hop-Behaviour Identification Codes follow the definition as standardized
in RFC 3140 [31]. The AF Group needs to consist of at least one of the currently available
AF1x, AF2x, AF3x and AF4x.
Fig. 84
Per-Hop-Behaviour Identification Codes implied by CoS Capability
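The implied codes can be derived from the recommended DSCPs with the RFC 3140 rules, as in the Python sketch below: a single PHB places its DSCP in the six most significant bits with all other bits zero, while a PHB group such as AF1x uses its numerically smallest DSCP and sets the least significant bit (bit 15 in RFC numbering). This is a reconstruction for illustration, not a copy of Fig. 84; LE is omitted from the sketch.

```python
from typing import Dict


def phb_id(dscp: int, is_set: bool = False) -> int:
    """16-bit PHB ID Code (RFC 3140) for a PHB or PHB group with a recommended DSCP."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("a DSCP is a 6-bit value")
    # DSCP left-justified in the top six bits; LSB marks a set of PHBs.
    return (dscp << 10) | (1 if is_set else 0)


# Recommended DSCPs: default PHB 0, EF 46 (RFC 3246), AF11/21/31/41 = 10/18/26/34 (RFC 2597).
IMPLIED_CODES: Dict[str, int] = {
    "BE": phb_id(0),
    "EF": phb_id(46),
    "AF1x": phb_id(10, is_set=True),
    "AF2x": phb_id(18, is_set=True),
    "AF3x": phb_id(26, is_set=True),
    "AF4x": phb_id(34, is_set=True),
}
```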
7.3.2.2 CoS Parameter Attribute structure
The second attribute is a new optional transitive BGP path attribute of variable length. It
currently uses the attribute type number 0xFF, which RFC 2042 [133] reserves for development. The
attribute contains one or more of the token bucket parameter sets shown in
Fig. 85.
Fig. 85
CoS Parameter Attribute structure
The PHB ID Code field associates the respective signalled PHB support with the token bucket parameter set that immediately follows it. This parameter set follows the specifications
given in RFC 2210 [182], which are also used within the IETF Integrated Services architecture.
Only two flags (‘G’ and ‘DR’) are defined within the one-octet Flags field. The 'G' flag
signals whether the limitations have global scope on all incoming traffic ('1') or apply
only to traffic destined for destinations within the NLRI of the UPDATE
message ('0'). NLRI specific limitations supersede globally signalled ones for traffic
destined for those NLRI destinations.
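The precedence between NLRI specific ('G' = 0) and global ('G' = 1) limitations for one PHB ID Code could be resolved as in the Python sketch below. Choosing the longest matching NLRI prefix when several match, and representing a limitation by its token rate alone, are simplifying assumptions of this sketch; the function name is hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def applicable_rate(dst: str, global_rate: Optional[float],
                    nlri_rates: Dict[str, float]) -> Optional[float]:
    """Return the token rate (bytes/s) limiting traffic towards dst.

    NLRI specific limitations (G = '0') supersede the global one (G = '1');
    None means no limitation has been signalled for this PHB ID Code.
    """
    addr = ipaddress.ip_address(dst)
    best_rate: Optional[float] = None
    best_len = -1
    for prefix, rate in nlri_rates.items():
        net = ipaddress.ip_network(prefix)
        # Longest matching prefix wins if several NLRI cover dst (assumption).
        if addr.version == net.version and addr in net and net.prefixlen > best_len:
            best_rate, best_len = rate, net.prefixlen
    return best_rate if best_rate is not None else global_rate
```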
The 'DR' flag signals the applied handling of non-conforming traffic. DR='0' signals strict
dropping of excess traffic. DR='1' signals that excess traffic packets are remarked
to the Best Effort traffic marking.
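A minimal Python sketch of the ingress limitation for one PHB ID Code, using a single-rate token bucket with the RFC 2210 parameters r (token rate) and b (bucket depth) together with the signalled 'DR' handling. The peak rate and packet size bounds of the RFC 2210 parameter set are not modelled, and the class name is hypothetical.

```python
from typing import Optional

BEST_EFFORT_DSCP = 0


class IngressPolicer:
    """Single-rate token bucket for one PHB ID Code at the ingress border router.

    rate is the token rate r in bytes/s and depth the bucket depth b in bytes,
    named as in RFC 2210; peak rate and packet size bounds are not modelled.
    """

    def __init__(self, rate: float, depth: float, remark: bool) -> None:
        self.rate = rate
        self.depth = depth
        self.remark = remark  # signalled 'DR' flag: True = remark, False = drop
        self.tokens = depth   # start with a full bucket
        self.last = 0.0       # arrival time of the previous packet in seconds

    def police(self, now: float, size: int, dscp: int) -> Optional[int]:
        """Return the DSCP to forward with, or None if the packet is dropped."""
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return dscp              # conforming: class marking kept
        if self.remark:
            return BEST_EFFORT_DSCP  # DR = '1': excess remarked to Best Effort
        return None                  # DR = '0': excess dropped
```

A router would keep one such bucket per signalled PHB ID Code (and per NLRI scope), calling police() with the arrival time and size of each packet in that class.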
In order to correctly identify the originator of the signalled limitations, the “ASN of sending
AS” field holds the corresponding AS number. Irrespective of whether 2-octet or 4-octet AS
numbers are used on the peering, the sending AS of the attribute must encode its AS number as a right-aligned 32 bit
number.
7.3.2.3 CoS Parameter Attribute Usage
The signalled parameters are used for PHB ID Code based ingress limitation. A PHB ID Code
that a BGP peer signals to its neighbour in this attribute is regarded as supported
and will experience the defined limitations.
Those limitations can be applied to all incoming traffic of a specific PHB ID Code (marked as 'G') or only to incoming traffic destined for the NLRI of the given UPDATE message. The resulting treatment of non-conforming traffic is signalled through the 'DR' flag.
All limitations have AS-local scope for the advertising AS, and the neighbouring AS may or may not adapt its sending behaviour to the advertised limitations.
Despite the transitive nature of the new attribute, its usage for ingress limitation is confined to neighbouring ASes. Processing of the conveyed parameters is only valid for peers that peer directly with the AS specified in the ASN field of the attribute. The attribute should not be relayed to non-adjacent interconnection partners.
Since non-transitive BGP path attributes can be sent into eBGP peering sessions, defining the CoS Parameter attribute as non-transitive would have been sufficient. However, current commercial routers do not recognize this new attribute and would silently discard it if it were non-transitive. Therefore, the attribute has been defined as transitive in order to allow for remote router configuration control as outlined in chapter 12.
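The ingress limitation described above can be sketched as a per-PHB-ID token bucket policer. In this hypothetical model, class and method names are invented for illustration:

* limits with ‘G’ = 1 apply to all traffic of a PHB ID;
* NLRI-bound limits take precedence for matching destinations;
* the ‘DR’ flag selects between dropping and remarking to Best Effort.

Choosing the most specific prefix when several NLRI limits match is an assumption, since the attribute definition does not cover that case.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class TokenBucket:
    rate: float                      # r: refill rate in bytes per second
    depth: float                     # b: bucket depth in bytes
    tokens: Optional[float] = None   # current fill level, starts full
    last: float = 0.0                # time of the last update in seconds

    def __post_init__(self):
        if self.tokens is None:
            self.tokens = self.depth

    def conforms(self, size: int, now: float) -> bool:
        # Refill for the elapsed time (capped at b), then try to spend 'size'.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

class IngressLimiter:
    """Hypothetical ingress limiter for limits learned from one neighbour AS."""

    def __init__(self):
        self.global_limits = {}   # PHB ID -> (bucket, remark); 'G' = 1
        self.nlri_limits = []     # (prefix, PHB ID, bucket, remark); 'G' = 0

    def add(self, phb_id, bucket, remark, prefix=None):
        if prefix is None:
            self.global_limits[phb_id] = (bucket, remark)
        else:
            self.nlri_limits.append((ipaddress.ip_network(prefix), phb_id, bucket, remark))

    def handle(self, phb_id, dst, size, now):
        """Return 'forward', 'drop' or 'remark-to-BE' for one packet."""
        addr = ipaddress.ip_address(dst)
        matches = [e for e in self.nlri_limits if e[1] == phb_id and addr in e[0]]
        if matches:
            # NLRI-specific limits supersede global ones; most specific prefix wins.
            _, _, bucket, remark = max(matches, key=lambda e: e[0].prefixlen)
        elif phb_id in self.global_limits:
            bucket, remark = self.global_limits[phb_id]
        else:
            return "forward"      # no limit signalled for this PHB ID
        if bucket.conforms(size, now):
            return "forward"
        return "remark-to-BE" if remark else "drop"   # 'DR' = 1 / 'DR' = 0

limiter = IngressLimiter()
limiter.add(0xB800, TokenBucket(rate=1000.0, depth=1500.0), remark=False)
limiter.add(0xB800, TokenBucket(rate=1000.0, depth=3000.0), remark=True, prefix="192.0.2.0/24")
print(limiter.handle(0xB800, "198.51.100.1", 1500, now=0.0))  # forward
print(limiter.handle(0xB800, "198.51.100.1", 1500, now=0.1))  # drop
print(limiter.handle(0xB800, "192.0.2.7", 3000, now=0.1))     # forward
print(limiter.handle(0xB800, "192.0.2.7", 3000, now=0.2))     # remark-to-BE
```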
8 Mapping strategies
8.1 Problem statement
The new coarse grained CoS concept of this thesis allows for the cross-domain and cross-layer signalling of supported classes of service, their marking encoding and mapping.
The number of supported classes, their technology-dependent marking and, in particular, their mapping between networking technologies are chosen independently by different network providers. The intra-domain mapping between technologies and the inter-domain mapping between the class sets of each layer constitute the heterogeneity of the internetworking situation.
The aim of all such mappings is that the QoS efforts made in one layer or domain are not destroyed in another.
Fig. 86
Classification of the Mapping scope
Three general criteria need to be distinguished:
1. QoS enabled networking technology (e.g. DiffServ, Ethernet QoS, MPLS E-LSPs),
2. Class encoding (marking) and
3. Associated parameter and treatment characteristics.
8.1.1 Mapping between different class sets of the same layer
This section outlines the difficulties in mapping operations within the same layer and
specifically within the same networking technology.
The simplest case is the 1:1 mapping between class sets with the same number of classes
and associated treatment characteristics, but differing markings. Simple mapping can be
performed as long as the resulting remarking is set up consistently. Such a mapping would
not change the experienced forwarding behaviour, but rather enable providers to freely
choose the marking value space.
In terms of this new CoS concept, the signalling would include the same technology
encoding, the same Marking O, but differing Marking A.
The more likely and more complex case is the mapping of n original classes into m resulting classes. This results in a mapping table with the following categorisation:
• n > m → Class aggregation,
• n = m → 1:1 mapping as described above and
• n < m → Class splitting or Class wastage.
The latter case appears to be the easier one, since no traffic merging is required and class marking can be preserved. The degree of traffic separation targeted in this coarse-grained concept is preserved, and no detrimental effects are expected.
This holds true if all traffic flows entering the finer-grained QoS-enabled AS arrive with the same limited class set n as described above. However, the respective AS merges those incoming traffic flows onto its internal forwarding paths. This merging might lead to some finer-grained traffic taking up higher priority classes than the coarse-grained traffic. For example, traffic arriving through a 5:7 mapping might end up with a higher link share on the combined forwarding paths than traffic arriving through a 3:7 mapping. This can be addressed by a priority-scaled mapping. The scaling factor is 7/5 = 1.4 (a 40% priority increase) in the 5:7 case and 7/3 ≈ 2.33 (a 133% priority increase) in the 3:7 case. The uniform distribution of class priorities in this simple case assumes a linear priority scheme in the original and resulting class sets.
Un-scaled and scaled mapping both leave the additional classes unused. The difference lies only in the decision which classes remain unused.
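The categorisation and the priority scaling discussed above can be illustrated with a few lines of Python. The sketch assumes classes numbered 1 (lowest priority) to n and the linear priority scheme mentioned in the text. Scaling by m/n with half-up rounding is one possible reading of a priority-scaled mapping, not a procedure defined in the thesis.

```python
def categorise(n: int, m: int) -> str:
    """Categorise an n:m class mapping as listed above."""
    if n > m:
        return "class aggregation"
    if n == m:
        return "1:1 mapping"
    return "class splitting or class wastage"

def unscaled_mapping(n: int, m: int) -> dict:
    """Keep class i as class i; classes n+1..m stay unused."""
    return {i: i for i in range(1, n + 1)}

def scaled_mapping(n: int, m: int) -> dict:
    """Scale class i (1 = lowest priority) by m/n, rounding half up."""
    return {i: (i * m + n // 2) // n for i in range(1, n + 1)}

print(categorise(3, 7))        # class splitting or class wastage
print(scaled_mapping(3, 7))    # {1: 2, 2: 5, 3: 7}
print(scaled_mapping(5, 7))    # {1: 1, 2: 3, 3: 4, 4: 6, 5: 7}
```

The two mappings leave different classes unused:

* in the 3:7 case, the scaled mapping leaves classes 1, 3, 4 and 6 unused;
* in the 5:7 case, it leaves classes 2 and 5 unused;
* the un-scaled mapping always leaves the top classes n+1 to m unused.

In both scaled cases the highest incoming class lands in class 7.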
Class splitting, however, would make use of all additional classes and differentiate the incoming traffic further. Such an approach cannot rely on the class markings alone, but has to perform further classification on additional header fields. Depending on the networking technology header, this could be the protocol or type field, the destination MAC or IP address etc. In general, this could also result in waiving the original classification and performing multi-layer packet inspection and re-classification.
Class aggregation is the opposite case, where existing traffic classes are merged onto a
common forwarding treatment. This can be realised either through a remarking function, where several incoming class markings are merged into one resulting marking, or through unchanged markings with a merged forwarding treatment. The latter will be referred to as “funnel treatment”. For the considered AS, the resulting forwarding treatment will be the same for each traffic class. However, in the remarking case the original traffic separation cannot be restored at the outgoing edge of the AS domain, and the smallest degree of traffic separation will prevail along the remaining forwarding path.
Marking preservation is the preferred solution. The required funnel treatment can be achieved either through merged classification at each internal node of the relaying AS or through tunnelling. With tunnelling, either the markings remain untouched within the tunnel and the tunnel marking determines the forwarding behaviour, or separate tunnels are used to de-aggregate the traffic flows and markings.
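A toy example makes the difference between the two aggregation variants concrete. The queue names, the chosen DSCPs and the common marking used for remarking are hypothetical. Unknown codepoints fall into the default queue.

```python
# Hypothetical aggregation of AF11, AF21 and AF31 onto one common treatment.
AGGREGATE_QUEUE = {10: "assured", 18: "assured", 26: "assured", 0: "default"}
AGGREGATE_MARKING = {"assured": 10, "default": 0}   # assumed marking after remarking

def remark_aggregation(dscp: int) -> tuple:
    """Merge by remarking: the original DSCP is overwritten and lost."""
    queue = AGGREGATE_QUEUE.get(dscp, "default")
    return queue, AGGREGATE_MARKING[queue]

def funnel_treatment(dscp: int) -> tuple:
    """Merge by treatment only: same queue, original DSCP preserved."""
    return AGGREGATE_QUEUE.get(dscp, "default"), dscp

for dscp in (10, 18, 26):
    print(dscp, remark_aggregation(dscp), funnel_treatment(dscp))
```

Both variants place the three classes in the same queue inside the aggregating AS. Only with funnel treatment can the next AS still tell AF2x from AF3x traffic apart.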
One tunnelling solution applicable to all IP networks is IP in IP encapsulation as specified in RFC 2003 [152]. Its interaction with Differentiated Services is discussed in RFC 2983 [30]. This IP in IP encapsulation is attractive for three reasons:
1. The encapsulated IP header remains entirely unchanged,
2. The mapping result and forwarding header processing is solely based on the outer
IP header and
3. All internetworked ASes are capable of providing IP tunnelling as long as their edge
devices can handle the encapsulation and de-capsulation procedures.
Fundamental drawbacks of this IP tunnelling are the increased processing load on edge devices and the reduced maximum transmission unit (MTU) size for transit payloads. The outer IP header reduces the MTU by at least 20 bytes.
The same behaviour can be achieved with the more generic IP-based encapsulation protocol called “Generic Routing Encapsulation (GRE)”, defined in RFC 2784 [72]. IP encapsulation within GRE adds at least another 4 bytes of MTU reduction for the minimal GRE header structure.
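The MTU arithmetic can be checked quickly. The sketch assumes an outer IPv4 header without options and a 1500-byte Ethernet link MTU; both values are illustrative defaults.

```python
IPV4_HEADER = 20       # outer IPv4 header without options, in bytes
GRE_MIN_HEADER = 4     # minimal GRE header without optional fields, in bytes

def inner_mtu(link_mtu: int, gre: bool = False) -> int:
    """MTU left for the encapsulated IP packet."""
    overhead = IPV4_HEADER + (GRE_MIN_HEADER if gre else 0)
    return link_mtu - overhead

print(inner_mtu(1500))            # 1480 (IP in IP)
print(inner_mtu(1500, gre=True))  # 1476 (IP in GRE in IP)
```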
Existing class aggregation recommendations for DiffServ classes and Ethernet priorities
are listed in chapter 8.2.
Encapsulation (tunnelling) of transit (customer) traffic is highly recommended by this new
coarse grained CoS concept. IP-in-IP tunnelling, MPLS label switched paths or some kind
of Ethernet encapsulation (refer to chapter 4.2) are strongly encouraged for traffic
separation preservation and ease of CoS deployment.
8.1.2 Mapping between different class sets of different layers
Service providers who offer Differentiated Services in the IP layer are likely to underpin this approach with QoS mechanisms in lower layers. The typical QoS support of such link layer technologies is described in chapter 4.
Ideally, the underlying QoS support exactly matches the granularity and forwarding treatment requirements of the DiffServ PHBs. In practice, however, this can hardly be achieved due to the differing number of available classes and the lack of standardized mappings between the CoS sets of each technology. Service providers will therefore independently decide on the cross-layer mappings and classification policies applied within their networks. Those configuration decisions are under the sole control of the respective providers and are therefore expected to be set up consistently and appropriately. The resulting mapping $\begin{pmatrix} n \\ x \end{pmatrix}$ is a vertical association of n classes of service and their markings to x underlying classes of service and their markings in the lower layer technology. Several such rows can be stacked if multiple encapsulating technologies are in place.
For the inter-domain case, the n:m mapping within the same layer explained above is therefore multiplied by the number of technology combinations in lower layers (e.g. IP in MPLS-LSP in Ethernet in FR-VC).
The matrix $\begin{pmatrix} n:m \\ \vdots \\ x:y \end{pmatrix}$ is the resulting layered mapping table in a fully meshed cross-layer mapping case. Here, each original class within each encapsulation is mapped into a resulting class in each new encapsulation. Such cases need to be addressed in inter-domain scenarios where the traffic exchange at an interconnection point is not just IP based, but encompasses inter-domain tunnels.
Here, the interconnection partners have three options:
• the encapsulated class set and marking prevail, $\begin{pmatrix} n:n \\ \vdots \\ x:y \end{pmatrix}$,
• the tunnel CoS is harmonized, $\begin{pmatrix} n:m \\ \vdots \\ x:x \end{pmatrix}$, or
• the encapsulated and the tunnelled CoS are forced to be identical, $\begin{pmatrix} n:n \\ \vdots \\ x:x \end{pmatrix}$.
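The following hypothetical model makes the three interconnection cases tangible. It keeps one marking map per layer, encapsulated layer first and outermost tunnel layer last. A layer counts as unchanged when every marking maps onto itself. Only the top and bottom rows of the layered mapping table are inspected. All names and example markings are invented for illustration.

```python
# Hypothetical model: one marking map per layer (sender marking -> receiver
# marking), encapsulated (top) layer first, outermost tunnel layer last.
def is_identity(mapping: dict) -> bool:
    return all(src == dst for src, dst in mapping.items())

def interconnection_case(layers: list) -> str:
    top, bottom = layers[0], layers[-1]
    if is_identity(top) and is_identity(bottom):
        return "identical (n:n ... x:x)"
    if is_identity(top):
        return "encapsulated CoS prevails (n:n ... x:y)"
    if is_identity(bottom):
        return "tunnel CoS harmonized (n:m ... x:x)"
    return "fully meshed (n:m ... x:y)"

ip_layer = {0: 0, 10: 10, 46: 46}     # DSCPs passed through unchanged
mpls_layer = {0: 0, 4: 3, 5: 5}       # TC values remapped at the border
print(interconnection_case([ip_layer, mpls_layer]))   # encapsulated CoS prevails (n:n ... x:y)
```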
As mentioned earlier, this intra-domain and possibly inter-domain tunnelled forwarding is particularly important where class aggregation needs to be performed. This
aggregation would be applied by means of a reduced number of tunnel-based forwarding
treatments, but will preserve the traffic separation granularity for the encapsulated traffic.
The respective signalling of marking preservation is available through the “P” flag in the
QoS Marking Extended Community.
8.2 Existing recommendations
In the course of the work on this thesis, a number of readings and talks revealed that service providers are occasionally involved in QoS-related research and discussions, but are reluctant to turn QoS on for inter-domain interconnections. This explains the current gap between the many QoS proposals and the still missing deployment. Deployment recommendations, however, act as incentives and guidelines that make configuration easier and concepts more acceptable. As outlined in chapters 4 and 6, numerous QoS specifications and approaches exist, which have not yet led to QoS-enabled interconnections on a large scale. Differentiated Services, however, is widely accepted and increasingly applied within ASes. The same applies to MPLS and its TC-bit-based DiffServ support. IETF protocols dominate both the AS interconnection and the capabilities of the devices within the ASes on each side. The following section therefore concentrates on the recommendations given in RFC 4594 [16] and RFC 5127 [45]. Ethernet specifications for user priority support and the respective priority setup specifications follow, based on the definitions of IEEE 802.1D [97].
It should be noted that the large variety of configurable class sets, their encoding and their intra-layer and cross-layer mapping contribute to the complexity and uncertainty of overall class-of-service-based packet forwarding. The lack of a single standardized and globally supported class set and encoding becomes obvious.
However, service providers are reluctant to apply such fixed class of service support and might not be able to define a common CoS base for such a standardization activity. This is the price paid for competitive differentiation and for the valuable administrative freedom of each AS. On the contrary, the introduction of PHB IDs revealed that service providers even requested a 16 bit encoding instead of the 6 bit DSCP encoding.
The currently applied fallback solution of mere over-provisioning is expected to exceed capital and operational expenditure budgets, despite declining interface costs. Coarse-grained CoS interconnection with class-based over-provisioning is therefore expected to arise.
Existing recommendations are expected to be taken for granted by larger Internet Service Providers and to be adopted by their smaller interconnection partners. This is expected to happen for simple DiffServ-based class setups, for which a recommendation is given in chapter 8.3. Cross-layer mappings will be applied AS-internally, but might evolve towards inter-domain use as well.
Especially the IXP-based CoS interconnection is expected to be used and the new
concept provides the means for a consistent automatic setup.
Configuration Guidelines for DiffServ Service Classes - RFC 4594
The fundamental guideline document for the deployment of DiffServ is RFC 4594 [16],
which gives configuration guidelines to network operators for sensible service class
selection, their usage and construction out of queueing, traffic management, PHB
selection and DSCP marking elements. All definitions therein are given as recommendations, and the authors point out several times that all of them can be changed and applied differently at the provider’s choice. However, they do stress the importance of consistency for interoperability.
The universal deployment of this DiffServ Service Class recommendation is, as the name
suggests, based on the “Service Class” approach. That is, all applications that generate
similar traffic characteristics and require similar traffic forwarding behaviour can be
grouped into classes, so called “Service Classes”. The required characteristics of the
traffic aggregate are represented by a PHB, which will in turn be encoded as DSCP value.
Network traffic has been classified into two groups: “network control traffic” and
“user/subscriber traffic”. The first group is divided into two service classes, being "Network
Control" and “OAM”.
As shown in Fig. 87, in the “user/subscriber traffic” group, ten Service Classes have been
identified. They are grouped into four application categories and reflected against the so
called “End-user multimedia QoS categories” as defined by ITU-T in G.1010 [118].
Fig. 87
User/Subscriber Service Classes Grouping - [16]
The service classes in turn are associated with descriptive characteristics as well as with coarse statements about the respective loss, delay and jitter tolerance. No absolute values are given, but guidance on the treatment can easily be derived from these definitions. Fig. 88 depicts the table of associated characteristics.
Fig. 88
Service Class Characteristics - [16]
All service classes are given DSCP encoding names and values together with application examples. This adds up to 20 DSCP encodings for all 12 service classes, as shown in Fig. 89.
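The DSCP assignment summarised in Fig. 89 can be captured as a lookup table. The sketch below reproduces the RFC 4594 service classes and codepoint names as we read them from the RFC; the dictionary and helper names are illustrative. The numeric values are derived from the class selector, AF (RFC 2597) and EF conventions. The result reproduces the count of 12 service classes and 20 DSCPs.

```python
# RFC 4594 service classes and their DSCP names (as read from Fig. 89).
RFC4594 = {
    "Network Control": ["CS6"],
    "Telephony": ["EF"],
    "Signaling": ["CS5"],
    "Multimedia Conferencing": ["AF41", "AF42", "AF43"],
    "Real-Time Interactive": ["CS4"],
    "Multimedia Streaming": ["AF31", "AF32", "AF33"],
    "Broadcast Video": ["CS3"],
    "Low-Latency Data": ["AF21", "AF22", "AF23"],
    "OAM": ["CS2"],
    "High-Throughput Data": ["AF11", "AF12", "AF13"],
    "Standard": ["DF"],
    "Low-Priority Data": ["CS1"],
}

def dscp_value(name: str) -> int:
    """Numeric DSCP for a DF, EF, class selector or AF codepoint name."""
    if name == "DF":
        return 0
    if name == "EF":
        return 46
    if name.startswith("CS"):
        return int(name[2:]) << 3              # CSx = x in the upper three bits
    if name.startswith("AF"):
        cls, drop = int(name[2]), int(name[3])
        return (cls << 3) | (drop << 1)        # AFxy = 8x + 2y
    raise ValueError(name)

DSCP_TO_CLASS = {dscp_value(n): c for c, names in RFC4594.items() for n in names}
print(len(RFC4594), len(DSCP_TO_CLASS))   # 12 20
```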
Lastly, each service class is guided by recommendations for applicable conditioning at the
ingress of the network, the queuing and the queue management. The resulting table is
depicted in Fig. 90.
It is important to note that a “Low-Priority Data” Service Class has been defined here with CS1 (‘001000’) encoding. This is of particular interest, since the support of the Lower Effort PDB (RFC 3662 [34]) is expected to become important for inter-domain deployment and matches the recommended set of service classes.
In practice, DiffServ support will not start off with all 12 service classes configured at once. Starting with fewer classes and gradually adding more as differing demands arise is likely, and this approach is supported by the recommendations.
Fig. 89
DSCP to Service Class Mapping - [16]
Fig. 90
QoS Mechanisms Used for Each Service Class - [16]
Given the flexible and coarse design of the service class recommendations, their readiness for application as well as the authors’ reputation and company support, such class sets are likely to be configured and deployed within provider networks. Since standard DSCP values are specified in the configuration guidelines, smooth interworking between providers’ networks is feasible, possibly with a reduced number of service classes.
Aggregation of DiffServ Service Classes – RFC 5127 [45]
As described above, service providers are likely to adopt the DiffServ recommendations
given in RFC 4594. However, the possible granularity of 12 service classes with up to 20
DSCP encoding values is far too detailed in core network areas. RFC 5127 [45] therefore
gives guidelines on how multiple service classes may be aggregated into a smaller set of so
called “forwarding treatment aggregates”. The usage and DSCP encoding of RFC 4594
based Service Classes is assumed and funnel mapping of several different DSCP
encoded traffic flows with about the same forwarding treatment requirements into one
treatment aggregate is specified. Marking preservation is expected.
Treatment aggregates are created by funnel mapping of Service Classes grouped by loss,
delay, and jitter requirements.
Fig. 91 shows, as an example, the funnel mapping of the 12 Service Classes into four treatment aggregates.
Fig. 91
Treatment Aggregate / Service Class Performance Requirements - [45]
Fig. 92 in turn lists, for each treatment aggregate, the funnel-mapped DSCPs together with the associated Treatment Aggregate Behaviour.
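A funnel mapping into four treatment aggregates can be sketched in the same way. The grouping below is our reading of the four-aggregate example in RFC 5127 (Fig. 91 and Fig. 92) and should be checked against the RFC before use. The essential point is that the aggregate selects the treatment while the DSCP itself is left unchanged. Unknown codepoints fall into the default aggregate.

```python
# Assumed four-aggregate grouping (verify against RFC 5127, Fig. 91/92).
AGGREGATES = {
    "Network Control": [48],                                      # CS6
    "Real-Time": [46, 40, 34, 36, 38, 32, 24],                    # EF, CS5, AF4x, CS4, CS3
    "Assured Elastic": [16, 26, 28, 30, 18, 20, 22, 10, 12, 14],  # CS2, AF3x, AF2x, AF1x
    "Elastic": [0, 8],                                            # Default, CS1
}
DSCP_TO_AGGREGATE = {d: agg for agg, dscps in AGGREGATES.items() for d in dscps}

def treatment(dscp: int) -> tuple:
    """Funnel mapping: choose the aggregate, keep the DSCP unchanged."""
    return DSCP_TO_AGGREGATE.get(dscp, "Elastic"), dscp

print(len(DSCP_TO_AGGREGATE), treatment(36))   # 20 ('Real-Time', 36)
```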
Fig. 92
Treatment Aggregate Behaviour - [45]
As mentioned above, such bundling of Service Classes into coarser forwarding treatments
is typically deployed in high capacity core networks, where statistical multiplexing of similar
traffic streams into a few forwarding classes combined with moderate over-provisioning is
manageable and sufficient. Fine-grained QoS schemes in the core do not scale, increase
processing load, hamper debugging etc. and are thus not acceptable to network providers.
The four treatment aggregate example, however, seems to be a realistic, feasible and
acceptable compromise.
Fig. 93
MPLS E-LSP mapping of Treatment Aggregates - [45]
The recommendation also briefly addresses the inter-domain case, stating that peering parties need to agree on the exact treatment aggregate content and representation. This results in mutual agreements and limits the extent of the QoS-enabled interconnection path. Furthermore, the RFC targets not just traffic separation, but rather Service Class based interconnection, possibly including the MIT QoS scheme (see chapter 6) combined with precise parameter bounds as given in Y.1541 [108].
Both drawbacks are understandable for this higher-class QoS interconnection, but lead away from the aim of this thesis.
RFC 5127 stresses the recommendation of marking preservation despite treatment aggregate based forwarding. This is completely in line with this thesis’ strong recommendation of tunnelled customer transport.
As a common example, the RFC documents a suggested mapping of treatment aggregates into the E-LSPs of an MPLS domain. This choice is realistic, since most providers nowadays use MPLS in their networks. Fig. 93 depicts the proposed mapping table of the 20 possible DSCP values of RFC 4594 into 6 TC bit (formerly EXP bit [9]) encodings of the E-LSP. Four treatment aggregates are supported. Two of them, the default and the assured elastic aggregate, are sub-differentiated into two dropping levels.
Again, the lower effort CS1 encoding is consistently used as forwarding treatment with
lowest priority.
All stated aggregate selections, DSCP encodings and TC bit associations are explicitly
defined as recommendations. However, no strong commitment is requested from service
providers.
IEEE 802.1D user priority mapping definitions
Ethernet QoS, in contrast to the loose mapping commitment mentioned above, precisely specifies the available user priorities, their mapping to strict priority forwarding queues and priority regeneration rules. Those specifications are part of IEEE 802.1D [97] and have already been addressed in chapter 4.2.
Ethernet priority tagged frames offer 3 bits for 8 available forwarding priorities (see Table 5). A strict mapping is given for switch devices which support 8 or fewer internal queues (see Table 6). Furthermore, the standard allows for priority reassignment by means of a configurable regeneration table. The default configuration simply maps each of the 8 priorities onto itself. However, IP and/or MPLS based treatment aggregates could well be reflected consistently in mapped user priority settings. Hence, the queue mapping table can in turn be applied for treatment aggregation in single-step granularities between one and eight resulting priorities (see Table 14).
Table 14
Queue mapping reuse for priority mapping
Systematic QoS Class Mapping Framework
Although not widely deployed, the “Systematic QoS Class Mapping Framework” [150] of the Information and Communications University in Korea is worth mentioning. It performs QoS class mapping in two steps: “Parameter-to-Class mapping” at the forwarding path ingress and “Class-to-Class mapping” at the network borders along the way.
Fig. 94
QoS Class Mapping framework - [150]
For this thesis’ mapping analysis, the framework’s class-to-class mapping is of importance. Since no signalling of supported classes and their encoding is included, the framework is based on the six IP QoS classes defined by ITU-T [108]. In order to extend this limited class set, the framework introduces a “Location Information (LI)”. It performs a parameter-to-LI mapping at the entry and an LI-to-class mapping at each relay boundary. LI is encoded as an integer in a redefined IP TOS field and addresses a 4x16 (loss x delay bounds) matrix. However, this interesting approach is unlikely to gain community acceptance. It redefines the TOS field and requires the LI interpretation and mapping function within all participating interconnected service provider networks.
ITU-T NGN Focus Group - Proceedings Part II
The NGN Focus Group within ITU-T has also developed a mapping for its six defined IP QoS classes, which is documented in [109]. However, the mapping concept is limited to the respective classes 0 to 5 and considers the mapping into four ATM, Frame Relay and UMTS QoS classes. It is not generally applicable to DiffServ-based networks and is therefore not considered in detail.
QoS class mapping in Cisco devices
The vendor Cisco gives detailed instructions on how to enable and configure QoS support in Ethernet switches (see [55]). This includes configurable mapping tables, with default settings, for Ethernet priority (called “CoS”) to DSCP mappings.
As Table 15 and Table 16 show, Cisco relies on the upper 3 bits of the DSCP value, which correspond to the class selector (former IP precedence) values. The eight priority values are exactly matched to the respective class selectors. In the opposite mapping direction, funnel mapping based on the upper 3 bit value is performed. Interestingly, one of the most important DSCP encodings, the one for EF traffic (= 46), is not directly covered. However, mapping tables are configurable. For example, Chemnitz University, whose network is fully equipped with Cisco devices, applied a slightly modified mapping as shown in Table 17.
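The default tables described above can be expressed as two bit shifts:

* CoS values map exactly onto class selectors;
* DSCPs are funnel mapped on their upper three bits.

This is a sketch of the behaviour as described here, not Cisco configuration syntax. It shows why EF (46) comes back as CS5 (40) after a round trip through the default tables.

```python
def cos_to_dscp(cos: int) -> int:
    """Default CoS-to-DSCP: each CoS value maps to its class selector."""
    return cos << 3

def dscp_to_cos(dscp: int) -> int:
    """Default DSCP-to-CoS: funnel mapping on the upper three DSCP bits."""
    return dscp >> 3

ef = 46
print(dscp_to_cos(ef), cos_to_dscp(dscp_to_cos(ef)))   # 5 40
```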
Table 15
Cisco’s default CoS-to-DSCP mapping [55]
Table 16
Cisco’s default DSCP-to-CoS mapping [55]
Table 17
Chemnitz University applied CoS-to-DSCP mapping
MPLS-capable devices likewise need mappings between DSCP encodings and the 3 bit E-LSP encoding. For consistency and simplicity reasons, service providers will configure the DSCP <-> CoS and DSCP <-> TC mappings identically. It should be noted that this is a best common practice approach and not a standard.
RFC 3270 [75] describes the Support of Differentiated Services within MPLS domains.
This includes E-LSP and L-LSP mappings.
E-LSP mapping, as mentioned above, is freely configurable, but requires a consistent mapping strategy throughout an MPLS domain. If this is not the case for interconnected domains, consistent remapping at ingress and/or egress needs to be ensured.
By default, if no mappings are configured, all encodings will map to Default PHB treatment.
Special care needs to be taken for L-LSP encodings, since 20 bit label information and 3
bit TC information is combined for QoS treatment selection. L-LSP paths are associated
with PHB types, which influences the embedded TC bit encoding. Mandatory mapping
exists for TC bit encodings, which must be followed in outgoing marking and incoming
classification. Fig. 95 depicts the bidirectional mapping requirement.
Fig. 95
Mandatory L-LSP encoding rules - [75]
8.3 Coarse grained CoS mapping recommendations
The new coarse grained CoS concept emphasizes traffic separation, simplicity and, implicitly, customer traffic tunnelling.
Service Class specifications combined with treatment aggregation create the basis for the selection of the simple class sets proposed within the concept.
The widespread deployment and precise mapping of Ethernet priorities (see Table 14) adds a possible mapping strategy for dynamic class set granularity.
A subset of eight class encodings is widely available in QoS enabled networking technologies and might be considered as realistic upper bound. However, much simpler setups are
envisioned for the upcoming inter-domain class of service support.
The default behaviour (“best effort (BE)” PHB) will and must always be available and
defaults to an encoding of zero in all technologies. Direct mappings or funnel mappings
with unspecified encoding ranges must always map the unknown codepoints into the
default behaviour.
Secondly, the lower effort (LE) behaviour is strongly encouraged for the inter-domain use case, since the existing BE interconnection can easily be augmented with an exchange of lower value traffic. Hence, the first and simplest recommended class set consists of the BE & LE classes, encoded as ‘000000’ / ‘000’ and ‘001000’ / ‘001’ in the six or three bit representations. When mapping between BE & LE and other class sets, LE is mapped into the LE class and all other classes into BE. This results in separate enqueuing for both supported classes, with lower scheduling and higher dropping priority for LE. In purely Ethernet-related setups, however, BE and LE might both be enqueued in the same queue, with possibly differing dropping probabilities. This single-queue mapping conforms to the Service Class concept, but is less recommended for the simple BE & LE class set.
A second class setup is envisioned, which does not make use of LE. Instead, it introduces EF as a high priority, delay sensitive class and AF as a medium priority forwarding class. The EF encoding of ‘101110’ would normally map to a priority of 5 (‘101’). However, for Ethernet-related enqueuing and mapping, a priority of 6 (‘110’) is recommended. AF support is not associated with a single precise DSCP encoding: the four AF classes with three dropping priorities each result in 12 possible DSCP values. For reasons of simplicity, all AF encodings may therefore be funnel mapped into a single 3 bit encoding. ‘100’ is recommended for Ethernet-related enqueuing and mapping.
Lastly, a combination of all four basic behaviours (LE, BE, AF and EF) is recommended as inter-domain class of service support. It combines the mapping of the EF & AF & BE case with separate low priority enqueuing for LE (‘001000’ / ‘001’).
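The recommendations above can be condensed into one mapping function from DSCP to the recommended 3-bit encoding. This is a sketch of the stated recommendations only. Passing the class set as a set of names is an illustrative convention. Every codepoint not covered by the configured class set falls back to BE (‘000’).

```python
EF = 0b101110                                                   # 46
LE = 0b001000                                                   # 8 (CS1)
AF = {8 * c + 2 * d for c in range(1, 5) for d in range(1, 4)}  # AF11 .. AF43

def coarse_pcp(dscp: int, class_set: set) -> int:
    """Map a DSCP onto the recommended 3-bit encoding for a coarse class set."""
    if dscp == EF and "EF" in class_set:
        return 0b110          # EF: priority 6 for Ethernet-related enqueuing
    if dscp in AF and "AF" in class_set:
        return 0b100          # all AF codepoints funnel mapped into '100'
    if dscp == LE and "LE" in class_set:
        return 0b001          # LE kept in its own low priority encoding
    return 0b000              # BE and every unknown codepoint fall back to default

FULL = {"LE", "BE", "AF", "EF"}
print([coarse_pcp(d, FULL) for d in (46, 34, 0, 8, 63)])   # [6, 4, 0, 1, 0]
print(coarse_pcp(46, {"BE", "LE"}))                        # 0
```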
The described class sets and mapping are recommendations only and can be freely
adapted to specific needs. However, in such cases, consistent signalling of supported
encodings and cross-layer mappings is required by means of the QoS Marking Extended
Community.
If virtual channel encodings are used in the interconnection case, providers at both ends need to abide by the mandatory marking requirements for ATM, FR and MPLS L-LSPs as defined in RFC 3270 [75].
9 Simulation results
The new cross-domain and cross-layer coarse grained Quality of Service support concept aims at a general and global deployment. Therefore, potentially all kinds of traffic will be carried along short as well as long multi-AS forwarding paths.
This universal usage cannot be covered by simulation. The following sections therefore contrast the improved class of service support of the new concept with the current best-effort-only forwarding. They use simple topology examples with extensively simulated configuration combinations.
The actual signalling of marking information does not need to be simulated, because
proven BGP UPDATE message based signalling is not fundamentally influenced by the
addition of a few Extended Communities. Practical feasibility and scalability are of more
concern here, which is addressed in real world tests in chapter 11.
Two simulation sections follow below, which address the resulting QoS marking and
forwarding behaviour as well as the functionality of token bucket ingress limitation filters.
9.1 Setup selection for QoS marking and forwarding
The packet transmission of different traffic type flows across differently configured single
nodes as well as ASes has been simulated extensively. Besides the network topology,
parameters have been varied, such as the number of traffic sources with appropriate traffic
markings, the number of supported classes of service along the forwarding path as well as
the scheduling and queuing configurations applied within the relaying nodes. The
recommended simple CoS setup consisting of at most LE & BE & AF and EF coarse
grained classes has been used. Three queuing disciplines have been varied, which are
“no priority = round robin queuing”, “strict priority queuing” and “class based weighted fair
queuing”.
Varying those parameters resulted in more than 3000 simulations, which will be documented using a few selected examples. The complete result set is available upon request.
The simulation of the QoS marking and forwarding influence follows in scenarios 1 to 7 at different topological granularities. Starting from a single node interconnection, the topology grows into interconnected ASes, then into multiple interconnected ASes, and finally into a chain of four interconnected transit ASes with multiple stub ASes at either side. Scenario 7 concludes the marking and forwarding simulation section with the comparative simulation of cross-layer interworking between IP and Ethernet QoS enabled forwarding.
The simulator OMNeT++ [12] has been used for the majority of the simulations. It is a
modular, C++ programmed, discrete event simulation framework, which is freely available
for academic research as open-source software. The so called “INET Framework for
OMNeT++” creates the base for Internet protocol modelling and simulation. Additionally,
two further modules have been used for realistic VoIP modelling and topology generation
for Autonomous System based simulations.
The Voice over IP traffic generation makes use of the “voiptool”, which is documented and
available at [36].
The AS based topology generation is done by the “Realistic Simulation Environment for IP-based Networks (ReaSE)” [83].
Several features were not readily available:
• the DSCP marking option for simulated sources,
• DSCP-based enqueuing in multi-queue setups,
• class based weighted fair queuing,
• class based throughput metering and
• class based enqueuing and scheduling within the Ethernet switch simulation module.
The required extensions are documented in [134].
As mentioned above, the simulation abstracts the numerous traffic sources in real world
scenarios into the recommended four basic traffic types being:
• EF - Expedited Forwarding,
• AF - Assured Forwarding,
• BE - Best Effort and
• LE - Lower Effort.
The EF class is normally used for applications that are critical with respect to delay and delay variation, such as Voice over IP and video conferencing. To get as close as possible to realistic VoIP packet streams, the voiptool [36] has been used to feed the EF sources and to record the received packets at the respective sink. The tool sends audio samples read from a wave file and records the received packets at the EF sink. Afterwards, the received wave file can be compared with the original one in order to calculate the perceived audio quality, expressed as “Perceptual evaluation of speech quality (PESQ)” values [119].
Unless stated otherwise, the simulation parameters were as given in Table 18.
Table 18
Traffic source configuration parameters
Traffic type | Parameter settings
VoIP (EF source with differing DSCP and PESQ meter) | Coding rate: 40000 bps; DSCP: 101 000 (0x28), mapped to ‘110’; sending interval: 8 ms; packet size: 79 byte; sending rate: 79 kbps
CBR0 (BE source) | DSCP: 0; sending interval: 5 ms; packet size: 827 byte; sending rate: 1.3232 Mbps
FTP (TCP based AF source with differing DSCP) | Port: 21; DSCP: 010 000 (0x10); packet size: 1044 byte; sending rate: variable; TCP type: Reno; maximum segment size: 1024 byte; advertised window: 14336 byte
CBR1 (LE source with drop rate meter) | DSCP: 001 000 (0x08); sending interval: 1 ms; packet size: 83 byte; sending rate: 664 kbps
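The nominal sending rates of the three constant bit rate sources in Table 18 follow directly from packet size and sending interval. The following Python sketch recomputes them. The dictionary and function names are illustrative and are not part of the OMNeT++ configuration.

```python
# Recompute the nominal sending rates of the fixed-rate sources in Table 18:
# rate [bit/s] = packet size [byte] * 8 / sending interval [s]

SOURCES = {
    # name: (packet size in byte, sending interval in s)
    "VoIP (EF)": (79, 0.008),
    "CBR0 (BE)": (827, 0.005),
    "CBR1 (LE)": (83, 0.001),
}


def sending_rate_bps(packet_size_bytes: int, interval_s: float) -> float:
    """Nominal bit rate of a source sending one packet per interval."""
    return packet_size_bytes * 8 / interval_s


if __name__ == "__main__":
    for name, (size, interval) in SOURCES.items():
        print(f"{name}: {sending_rate_bps(size, interval) / 1e3:.1f} kbps")
```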
The DSCP encodings of the four recommended forwarding behaviours differ from the standard codepoints because the simulator classifies packets for enqueuing based on the upper three DSCP bits only. Hence the truncated EF and AF encodings.
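The sketch below shows why the truncated encodings are sufficient. It compares the standard EF codepoint (RFC 3246) and the AF21 codepoint (RFC 2597) with the encodings of Table 18. Each pair collapses to the same class selector value once only the upper three bits are evaluated. The function name is illustrative, and this is not the simulator's own code.

```python
# Sketch of a classifier that only evaluates the upper three bits
# (the class selector bits) of the 6-bit DSCP.

def class_selector(dscp: int) -> int:
    """Return the upper three bits (0..7) of a 6-bit DSCP value."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must be a 6-bit value")
    return dscp >> 3


CODEPOINTS = {
    "EF (standard, 101110)": 0b101110,    # 46 = 0x2E
    "EF (Table 18, 101000)": 0b101000,    # 40 = 0x28
    "AF21 (standard, 010010)": 0b010010,  # 18 = 0x12
    "AF (Table 18, 010000)": 0b010000,    # 16 = 0x10
    "LE (Table 18, 001000)": 0b001000,    # 8 = 0x08
    "BE (000000)": 0b000000,
}

if __name__ == "__main__":
    for name, dscp in CODEPOINTS.items():
        print(f"{name}: class selector {class_selector(dscp):03b}")
```

The standard and truncated encodings yield the same class selector value, so the truncation does not change the queue selection.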
Queues are drop tail queues by default, and the metering resolution of the throughput meter was set to 250 ms.
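A throughput meter with a fixed metering resolution can be sketched as a per-interval bit counter. The sketch assumes that the meter sums the received bits per 250 ms interval and divides the sum by the interval length. The actual implementation documented in [134] may differ.

```python
from collections import defaultdict


def throughput_series(packets, resolution_s=0.25):
    """Per-interval throughput in bit/s from (arrival time [s], size [byte]) records.

    Intervals without any received packet are omitted from the result.
    """
    bits_per_bin = defaultdict(int)
    for arrival_s, size_bytes in packets:
        bits_per_bin[int(arrival_s // resolution_s)] += size_bytes * 8
    return {
        index * resolution_s: bits / resolution_s
        for index, bits in sorted(bits_per_bin.items())
    }


if __name__ == "__main__":
    received = [(0.0, 1000), (0.1, 1000), (0.3, 500)]
    print(throughput_series(received))  # {0.0: 64000.0, 0.25: 16000.0}
```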
Depending on the number of waiting queues configured for the different simulation runs, the queue mapping strategy follows the one given in Table 14. The DSCP marking for EF traffic is mapped to priority 6 (‘110’) to allow a smooth mapping into the IEEE VO priority class.
The parameter combinations used within each of the topologies described later are listed in Table 19. The sources are started gradually to create different traffic mixes. Columns (a) to (f) show the classes supported (including separate enqueuing) along the forwarding path. All combinations were simulated with three scheduling schemes: no priority (round robin), strict priority and class based weighted fair queuing.
The latter requires queue weights to be configured. These are given in the last row of the table for the supported class set of each column.
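The three scheduling schemes can be contrasted in a deliberately simplified slot model in which all queues are permanently backlogged and packets have equal size. CBWFQ is approximated here by the virtual finish times of weighted fair queuing. The weights are hypothetical and are not those of Table 19.

```python
from collections import Counter

CLASSES = ["EF", "AF", "BE", "LE"]  # ordered from highest to lowest priority


def simulate(scheme, backlogged, weights=None, slots=1000):
    """Count packets served per class over a number of transmission slots.

    Simplifying assumptions: equal packet sizes, one packet per slot, and every
    class in `backlogged` always has a packet waiting (saturated queues).
    """
    active = [c for c in CLASSES if c in backlogged]
    served = Counter()
    finish_tag = {c: 0.0 for c in active}  # virtual finish times for CBWFQ
    for slot in range(slots):
        if scheme == "round_robin":
            chosen = active[slot % len(active)]
        elif scheme == "strict_priority":
            chosen = active[0]  # the highest priority non-empty queue always wins
        elif scheme == "cbwfq":
            # serve the queue whose next packet has the smallest virtual finish time;
            # a packet of unit size advances a queue's tag by 1 / weight
            chosen = min(active, key=lambda c: finish_tag[c] + 1.0 / weights[c])
            finish_tag[chosen] += 1.0 / weights[chosen]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        served[chosen] += 1
    return served


if __name__ == "__main__":
    no_ef = {"AF", "BE", "LE"}  # EF queue assumed empty (low-rate EF already served)
    print("strict priority:", dict(simulate("strict_priority", no_ef)))
    print("round robin:    ", dict(simulate("round_robin", no_ef)))
    hypothetical_weights = {"EF": 4, "AF": 3, "BE": 2, "LE": 1}
    print("CBWFQ:          ", dict(simulate("cbwfq", set(CLASSES), hypothetical_weights)))
```

With the EF queue empty and AF permanently backlogged, strict priority hands every slot to AF and starves BE and LE. CBWFQ instead serves all classes in proportion to their weights: 400, 300, 200 and 100 of 1000 slots for weights 4:3:2:1.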
Table 19
Class and traffic type variations in simulations
9.2 Simulation results for QoS marking and forwarding
The following sections document selected simulation results for the setups described above. Scenarios 1 to 6 differ in topological complexity and hence in the resulting traffic load and simulation results. Congestion is caused artificially by differing link capacities in order to demonstrate the influence of even this simple class of service support on the resulting forwarding quality. Scenario 6 also addresses the mapping between differently configured class support along the AS forwarding chain, in the light of marking preservation as opposed to packet remarking.
Scenario 7 documents the influence of CoS setup harmonization between different networking technologies. IP QoS and Ethernet QoS are combined in this cross-layer simulation, which required modified link capacities and adapted source sending rates.
9.2.1 Scenario 1: single node interconnection
The topology of the first scenario is deliberately kept simple in order to demonstrate the sources and meters used. The simulation results allow a direct mapping between source characteristics, relaying behaviour and reception result.
Fig. 96 depicts the single node interconnection of four sources 1:1 mapped to their
respective sinks and a bottleneck link between two routers. Router 0 is the contention
point, where packet losses occur.
Fig. 96
Scenario 1: single node interconnection
Fig. 97 shows, as an example, the sending and receiving characteristics of the VoIP source in the four class CoS support case. Slight variations in the sending rate are due to the audio encoding characteristics of wave files, where a larger packet with some replay overhead is followed by three smaller audio sample packets. A PESQ value of 4.334 and an LE drop rate of 83.8 % were achieved.
Fig. 98 depicts the same situation without CoS support. All traffic types are roughly equally affected by the packet drops, so that the losses are distributed according to the offered traffic load. A PESQ value of 4.329 and an LE drop rate of 30 % were obtained.
Fig. 97
S1: 9-f-cbwfq
Fig. 98
S1: 9-a-no-priority
Fig. 99
S1: 9-f-cbwfq
Fig. 100
S1: 9-a-no-priority
Fig. 99 depicts the resulting traffic mix after the contention point. It can be seen that all traffic classes are transmitted. The low-rate EF traffic passes through without discrimination. The TCP based AF traffic gets a fair capacity share according to its queue weight. Best effort traffic is limited to below 500 kbps, and LE traffic gets a minimal share. As stated above, about 80 % of the LE traffic is discarded.
Fig. 100 gives a different picture. The EF traffic varies, as already shown in Fig. 98. However, BE traffic uses up most of the capacity, followed by LE traffic. TCP based AF traffic is completely starved out. This is because BE and LE combined have been configured to exceed the bottleneck link capacity.
Starvation effects can, however, also result from CoS deployment with strict priority queueing, as documented in Fig. 101. EF traffic gets excellent forwarding and the TCP based AF traffic uses all remaining capacity, while BE and LE traffic die off.
Fig. 101
S1: 9-f-strict-priority
9.2.2 Scenario 2: AS interconnection – Single AS
Scenario 2 uses a topology of two stub ASes (AS00 and AS20) and one transit AS (AS10).
Each AS is simulated with double sources and sinks. No crossing traffic is modelled in
AS10, so that a simple router model can be applied.
Fig. 102
Scenario 2: AS Interconnection – Single AS
Fig. 103
S2: 9-f-cbwfq
Fig. 104
S2: 9-a-no-priority
Fig. 103 and Fig. 104 show the resulting throughput graphs sorted by traffic class. The qualitative result is comparable to the one in Scenario 1. However, the increase of the usage rate (ur) in the CBWFQ case needs to be explained. The sending rates of all fixed rate sources sum up to about 4.1 Mbps. This rate is roughly achieved in the no priority case on the 10 Mbps link between AS00 and AS10. In the diagrams, rates before the bottleneck are marked with index ‘1’ and rates after AS10 with index ‘2’. As Fig. 104 shows, TCP traffic is almost completely starved out: packet drops trigger TCP’s congestion avoidance mechanism, which leads to an ever decreasing sending rate. In the CBWFQ case, the configured fair capacity share is given to the AF class, which leads to a sustainable TCP throughput rate. Before the bottleneck, the non-responsive CBR sources fill the 10 Mbps link up to their nominal sending rate, which explains the higher link usage.
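The quoted aggregate of about 4.1 Mbps can be cross-checked against Table 18. The sketch assumes that “double sources” means two instances of each fixed rate source type (VoIP, CBR0 and CBR1). The variable-rate TCP source is excluded.

```python
# Fixed-rate sources of Table 18 in bit/s; the TCP (FTP) source is not fixed rate.
FIXED_RATE_SOURCES_BPS = {"VoIP": 79_000, "CBR0": 1_323_200, "CBR1": 664_000}
INSTANCES_PER_TYPE = 2  # assumption: "double sources" = two instances of each type
LINK_CAPACITY_BPS = 10_000_000  # AS00-AS10 link

total_bps = INSTANCES_PER_TYPE * sum(FIXED_RATE_SOURCES_BPS.values())
utilisation = total_bps / LINK_CAPACITY_BPS

print(f"aggregate fixed-rate load: {total_bps / 1e6:.3f} Mbps "
      f"({utilisation:.1%} of the 10 Mbps link)")
```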
Fig. 105
S2: 9-b-cbwfq
Fig. 106
S2: 9-e-cbwfq
Fig. 105 depicts a result similar to the four class CBWFQ setup. This time, only two traffic classes (BE and EF) were supported, and the classifier maps AF into the EF queue. IEEE based classifiers would therefore require an AF group encoding of ‘100xxx’. The class set EF, AF and BE, which is one of the recommended class sets, results in very acceptable forwarding behaviour, as depicted in Fig. 106.
9.2.3 Scenario 3: AS interconnection – Multi-AS
Scenario 3 uses a topology of six stub ASes (AS00, AS01, AS02 and AS20, AS21, AS22) and two transit ASes (AS10 and AS11).
Fig. 107
Scenario 3: AS interconnection – Multi-AS
Fig. 108
S3: 9-f-cbwfq
Fig. 109
S3: 9-a-no-priority
Fig. 108 and Fig. 109 clearly demonstrate the superior behaviour of even simple CoS enabled forwarding. The increased number of sources multiplexed onto the transit links of AS10 smooths out the per-class throughput graphs.
9.2.4 Scenario 4: AS interconnection – Multi-AS 2
Scenario 4 is a slight modification of the topology of scenario 3. The transit path includes one more AS, and the capacity of the transit links decreases. This results in two contention points and multiplies the class separation effect. The results for BE only support in AS10 and AS11, compared with identical four class support in both ASes, are not shown, since each case yields the characteristic throughput graphs already known from the previous scenarios. It is of more interest to vary the supported class sets along the AS chain.
Fig. 110
Scenario 4: AS interconnection – Multi-AS 2
Fig. 111 and Fig. 112 reveal that the order in which the limited class sets are traversed does matter, even in a symmetrical topology. In Fig. 111, all four traffic classes are taken care of in AS10 before the non-prioritized BE only forwarding in AS11 occurs. Fig. 112 shows the opposite configuration, which performs worse, particularly for the TCP based traffic.
Fig. 111
S4: CBWFQ / no priority
Fig. 112
S4: no priority / CBWFQ
The reason for this behaviour is the resulting traffic mix after AS10. In the first case, a high percentage of AF traffic remains in the mix, which is then discriminated against to the same extent as all other traffic in AS11. In the latter case, the AF traffic is already largely starved out in AS10, so that hardly any AF traffic is contained in the mix that reaches AS11.
9.2.5 Scenario 5: AS interconnection – Multi-AS 3
Scenario 5 is again a slight modification of the topology of scenario 4. A further transit AS has been introduced, accompanied by a further link capacity reduction.
Fig. 113
Scenario 5: AS interconnection – Multi-AS 3
As Fig. 114 and Fig. 115 show, the advantage of late CoS bottlenecks along a forwarding path remains. However, each additional transit AS has detrimental effects on the variation and on the throughput level of rate adapting sources.
Fig. 116 and Fig. 117, in turn, document the case where the CoS bottlenecks separate at least two traffic classes. The resulting traffic characteristics and throughput numbers underline the thesis’ strong call for coarse-grained inter-domain CoS support.
Fig. 114
S5: 2x CBWFQ / no priority
Fig. 115
S5: no priority / 2x CBWFQ
Fig. 116
S5: 2x CBWFQ / EF&BE
Fig. 117
S5: EF&BE / 2x CBWFQ
9.2.6 Scenario 6: AS interconnection – Multi-AS 4
Scenario 6 is again a slight modification of the topology of scenario 5. A further transit AS has been introduced, accompanied by a further link capacity reduction.
This extended topology yields no further fundamental insight, except for the comparison of remarking and non-remarking simulation results.
All scenario simulations and results so far assumed that enqueued traffic keeps its class marking regardless of the respective class support at each relay node. Although the concept strongly recommends such funnel mapping with marking preservation, network operators are free to remark packets as they traverse their AS.
Fig. 118
Scenario 6: AS interconnection – Multi-AS 4
The remarking behaviour is therefore documented by means of two selected examples. It is assumed that the respective CoS limited AS applies identical class mappings for enqueuing and for remarking.
Fig. 119
S6: 2 classes w/o remark.
Fig. 120
S6: 2 classes with remark.
Fig. 119 and Fig. 120 depict the results when the CoS bottleneck performs marking preservation or remarking, respectively. In the latter case, traffic classes are either upgraded (as shown by the AF traffic being remarked into the EF type) or downgraded. Subsequent nodes can therefore no longer apply class specific forwarding behaviour to the original traffic types.
The situation gets worse in the best effort only support case, where all traffic passes through a single (BE) class. Fig. 121 and Fig. 122 show the forwarding within AS12 with single class funnel classification, without remarking (Fig. 121) and with remarking (Fig. 122). In the remarking case, subsequent transit ASes are no longer able to distinguish the packets, which are now all marked BE. This effectively results in traditional best effort forwarding along the remaining transit path segments.
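The difference between marking preservation and remarking can be sketched as a packet's walk along an AS chain. Each AS is modelled as a class-to-queue map. With remarking, the packet leaves an AS marked with the class of the queue it was funnelled into. The two-class and one-class maps follow the examples in the text (AF upgraded into EF, everything funnelled into BE), but are otherwise illustrative.

```python
# Funnel mapping in a CoS limited AS, with either marking preservation or remarking.
# The class-to-queue maps are illustrative examples, not the mapping of Table 14.

FOUR_CLASS = {"EF": "EF", "AF": "AF", "BE": "BE", "LE": "LE"}
TWO_CLASS = {"EF": "EF", "AF": "EF", "BE": "BE", "LE": "BE"}  # AF upgraded, LE into BE
ONE_CLASS = {"EF": "BE", "AF": "BE", "BE": "BE", "LE": "BE"}  # best effort only


def traverse(marking, as_chain, remark):
    """Return the queue used in each AS along the chain for one packet."""
    queues = []
    for class_to_queue in as_chain:
        queue = class_to_queue[marking]
        queues.append(queue)
        if remark:
            marking = queue  # the packet leaves this AS marked with its queue's class
    return queues


if __name__ == "__main__":
    chain = [FOUR_CLASS, ONE_CLASS, FOUR_CLASS]
    print("AF, preserved:", traverse("AF", chain, remark=False))  # ['AF', 'BE', 'AF']
    print("AF, remarked: ", traverse("AF", chain, remark=True))   # ['AF', 'BE', 'BE']
```

With marking preservation, the last four class AS can again treat the packet as AF. After remarking, it only sees BE.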
Fig. 121
S6: 1 class w/o remarking
Fig. 122
S6: 1 class with remarking
9.2.7 Scenario 7: AS interconnection – Cross-Layer
Scenario 7 addresses the cross-layer marking and mapping challenge that arises with any underlying transport network technology. Since AS interconnection is increasingly based on Ethernet links (either IXP platforms or point-to-point links), the example focuses on the interworking of IP QoS and Ethernet QoS. Fig. 123 depicts the selected topology, where AS00 and AS20 are interconnected across an Ethernet switch.
Fig. 123
Scenario 7: AS interconnection – Cross-Layer
The introduction of an Ethernet model requires the selection of 10 or 100 Mbps link capacities. This in turn requires an increase of the EF load on the network as well. An aggregated traffic load of 2.64 Mbps VoIP, 13.2 Mbps CBR0 and 8.43 Mbps CBR1 has been chosen. Furthermore, a ‘110’ mapping has been applied to the VoIP traffic in order to conform to the IEEE voice encoding.
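The ‘110’ mapping can be sketched as deriving an IEEE 802.1p priority (PCP) from the upper three DSCP bits, with ‘101’ remapped to ‘110’. The copy-the-upper-bits rule for the other classes is an illustrative assumption. The traffic type acronyms are those listed per user priority in IEEE 802.1D-2004, Annex G.

```python
# Sketch: derive an IEEE 802.1p priority (PCP) from the upper three DSCP bits,
# remapping '101' to '110' so that VoIP lands in the IEEE "VO" traffic type.
# Copying the upper bits for all other classes is an illustrative assumption.

# Traffic type acronyms per user priority as listed in IEEE 802.1D-2004, Annex G
TRAFFIC_TYPE = {0: "BE", 1: "BK", 2: "spare", 3: "EE", 4: "CL", 5: "VI", 6: "VO", 7: "NC"}


def dscp_to_pcp(dscp: int) -> int:
    """Map a 6-bit DSCP to a 3-bit PCP, moving class selector 5 to priority 6."""
    upper = (dscp >> 3) & 0b111
    return 0b110 if upper == 0b101 else upper


if __name__ == "__main__":
    for name, dscp in {"VoIP": 0x28, "FTP": 0x10, "CBR0": 0x00, "CBR1": 0x08}.items():
        pcp = dscp_to_pcp(dscp)
        print(f"{name}: DSCP {dscp:06b} -> PCP {pcp:03b} ({TRAFFIC_TYPE[pcp]})")
```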
Fig. 124
S7: with Ethernet QoS
Fig. 125
S7: without Ethernet QoS
Fig. 124 and Fig. 125 depict the resulting CoS based throughput in the traffic mix after the four queue CoS switch and after the BE only switch, respectively. The best effort only switch virtually destroys the prioritization of EF and AF traffic and favours the high volume CBR traffic. Underlying CoS support with consistent cross-layer mapping is therefore important for successful overall performance. It is a constituent part of the proposed CoS concept.
9.3 Setup selection for token bucket ingress filtering
Class of Service support requires class overload protection by means of token bucket based ingress rate limitation. Since OMNeT++ does not support token bucket filtering, the “Network Simulator 2 (ns2)” [145] has been used for these simulations.
The classification scheme of ns2 differs from the one used in OMNeT++ and generally refers to so-called “flow IDs (fid)”. These source-to-destination flow identifiers allow packets to be mapped to DiffServ DSCP encodings in DiffServ enabled nodes. Such DiffServ domains are modelled by means of three nodes: the ingress edge node performing classification, the core node performing CoS based enqueuing and dropping, and the egress edge node performing CoS based dequeuing and rate limited priority scheduling. When arranged in a single forwarding line, these nodes can be abstracted into a single ingress rate limited node. Fig. 126 depicts the resulting simulation topology.
Fig. 126
Single node structure with token bucket filtering
Rate limitation with the ns2 token bucket implementation combines token bucket based metering, which results in an increased dropping probability for excess traffic, with a strict rate limitation on CoS enabled enqueuing. Table 20 lists the applied parameters, where the queue rate is the limitation setting for each of the four traffic types.
Table 20
Simulation parameter settings
All simulated packets had a size of 1000 byte. Strict priority queuing was applied, with queue 1 being the highest priority queue.
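The metering part can be sketched as a classic single-rate token bucket. Tokens accumulate at the committed rate up to the bucket depth, and a packet is in profile only if enough tokens are available. This is a generic sketch, not the ns2 code. The 600 kbps EF source, the 500 kbps CIR and the 1000 byte packets are taken from the text. The bucket depth of 3000 byte is a hypothetical value, since the Table 20 settings are not reproduced here.

```python
class TokenBucket:
    """Single-rate token bucket meter (illustrative, not the ns2 implementation)."""

    def __init__(self, rate_bps: float, depth_bytes: float):
        self.fill_rate = rate_bps / 8.0  # tokens (bytes) added per second
        self.depth = depth_bytes
        self.tokens = depth_bytes  # the bucket starts full
        self.last_update = 0.0

    def conforms(self, now_s: float, size_bytes: int) -> bool:
        """Refill for the elapsed time, then consume tokens if the packet fits."""
        elapsed = now_s - self.last_update
        self.tokens = min(self.depth, self.tokens + elapsed * self.fill_rate)
        self.last_update = now_s
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False  # out of profile: dropped by a hard rate limiter


if __name__ == "__main__":
    # EF source: 1000 byte packets at 600 kbps (75 packets/s) for 10 s,
    # limited to a CIR of 500 kbps with a hypothetical depth of 3 packets
    meter = TokenBucket(rate_bps=500_000, depth_bytes=3000)
    arrivals = [i / 75 for i in range(750)]
    passed = sum(meter.conforms(t, 1000) for t in arrivals)
    print(f"passed {passed} of {len(arrivals)} packets, "
          f"{passed * 8000 / 10 / 1e3:.0f} kbps")
```

Over the 10 s run, the passed traffic settles close to the 500 kbps CIR. The bucket depth only allows a short initial burst above it.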
9.4 Simulation results for token bucket ingress filtering
The simulation of this single node example yielded the expected rate limitation results, which are documented in Fig. 127, Fig. 128, Fig. 129 and Fig. 130. The sending behaviour of the four sources remains the same, but the mapping groups LE into BE (4:3), then additionally AF together with LE into BE (4:2), and lastly all traffic classes into BE (4:1). In the graph legends, the curves “Before TB” show the throughput observed before the token bucket limitation, and the curves “At Dest” show the throughput received at the destination. All classes except Class 2 have constant sending rates. The AF source (Class 2) is a TCP source, which always sends as fast as possible. The sending and queueing rates are deliberately chosen above the physical bottleneck link capacity in order to demonstrate the hard rate limitations (e.g. of 0.5 Mbps for EF) combined with priority based queuing and dropping.
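The funnel mappings used for Fig. 127 to Fig. 130 can be written down as simple lookup tables. The sketch encodes exactly the groupings described above and adds only an illustrative helper that lists which classes share a queue.

```python
# Class-to-queue funnel mappings for the token bucket simulations:
# 4:3 groups LE into BE, 4:2 additionally AF, 4:1 maps all classes into BE.
FUNNEL = {
    4: {"EF": "EF", "AF": "AF", "BE": "BE", "LE": "LE"},
    3: {"EF": "EF", "AF": "AF", "BE": "BE", "LE": "BE"},
    2: {"EF": "EF", "AF": "BE", "BE": "BE", "LE": "BE"},
    1: {"EF": "BE", "AF": "BE", "BE": "BE", "LE": "BE"},
}


def queue_groups(supported_classes: int) -> dict:
    """Return which traffic classes share each queue for a given class set size."""
    groups = {}
    for traffic_class, queue in FUNNEL[supported_classes].items():
        groups.setdefault(queue, []).append(traffic_class)
    return groups


if __name__ == "__main__":
    for n in sorted(FUNNEL, reverse=True):
        print(f"4:{n}", queue_groups(n))
```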
Fig. 127
Single node TB 4->4
Fig. 128
Single node TB 4->3
The start time of each source is varied to demonstrate the influence of each source. EF starts first and is limited to a “committed information rate (CIR)” of 500 kbps, which slightly reduces its constant sending rate of 600 kbps. It has the highest priority and, as long as at least two queues are available, hardly notices any changes in the surrounding traffic. This is a typical rate limited EF forwarding service with the best forwarding quality.
Secondly, a constant bit rate source of type LE is turned on with a nominal sending rate of 2 Mbps. In Fig. 127, token bucket limitation is applied to the dedicated LE queue, which limits the LE traffic to just 400 kbps.
Thirdly, a constant bit rate source of type BE is turned on with a nominal sending rate of 2.5 Mbps. Its token bucket limitation is set to 2.5 Mbps in the four class case. With token bucket limitation applied to the dedicated BE queue in Fig. 127, the BE traffic is limited to 2.2 Mbps. The bottleneck link with 3 Mbps capacity is completely used up, and only 300 kbps remain for the Lower Effort class. Lastly, the TCP source starts sending with the highest achievable rate and a priority of 4. The rate limitation of 2.2 Mbps is not reached because of TCP’s congestion avoidance mechanism. LE type traffic gets starved out and BE traffic is reduced to about 800 kbps. The TCP stream uses about 1.6 Mbps on average.
A similar behaviour is observed in the three class setup, where BE and LE share the same unlimited low priority queue. Initially, LE can transmit its full 2 Mbps load in the uncongested phase. Once the BE source is active, both streams share the remaining 2.5 Mbps link capacity equally. When the TCP load is added, BE and LE share the 800 kbps previously available to BE, and the TCP traffic achieves the same throughput as in the four class support case.
Fig. 129
Single node TB 4->2
Fig. 130
Single node TB 4->1
Fig. 129 depicts the minimal CoS support setup, where a 500 kbps rate limited high priority EF class is contrasted with the common link share of the LE, BE and AF sources. It can be seen that the remaining 2.5 Mbps link capacity is taken up by the constant bit rate BE and LE sources in proportion to their sending rates. The congestion controlled TCP stream dies off due to the lost prioritization. This situation is aggravated further in the single class support case depicted in Fig. 130. The EF class achieves its full transmission rate in the uncongested phase, but is drastically reduced once all sources are active. EF, BE and LE divide the 3 Mbps capacity in proportion to their sending rates. No TCP type traffic can survive this phase of 100 % link congestion.
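The statement that BE and LE share the remaining 2.5 Mbps in proportion to their sending rates can be turned into a quick estimate. The sketch assumes that a shared drop tail queue fed by non-responsive CBR sources drops packets roughly in proportion to their arrival rates. It ignores the TCP source, which dies off in this case.

```python
def proportional_shares(capacity_bps: float, offered_bps: dict) -> dict:
    """Split a congested capacity among non-responsive sources in proportion
    to their offered load (a first-order model of a shared drop tail queue)."""
    total = sum(offered_bps.values())
    if total <= capacity_bps:
        return dict(offered_bps)  # no congestion: every source gets its offered load
    return {name: capacity_bps * rate / total for name, rate in offered_bps.items()}


if __name__ == "__main__":
    # 4:2 case: the 3 Mbps link minus 500 kbps for rate limited EF leaves 2.5 Mbps
    shares = proportional_shares(2.5e6, {"BE": 2.5e6, "LE": 2.0e6})
    for name, rate in shares.items():
        print(f"{name}: {rate / 1e6:.2f} Mbps")
```

Under this assumption, BE would receive roughly 1.39 Mbps and LE roughly 1.11 Mbps.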
9.5 Summary of simulation results
The simulation results for the QoS marking and forwarding behaviour, as well as those for the token bucket ingress limitation filters, clearly demonstrate the superior forwarding quality of class of service support compared with the currently deployed best effort only transmission.
Since arbitrarily complex Internet traffic models cannot be handled in this simulation effort, the concept’s coarse-grained class of service support has been applied in the modelling as well. Up to four of the most commonly found traffic classes have been distinguished in the setups, combined with extensive parameter variations and variations of the scheduling strategy. Single node as well as AS interconnection setups have been modelled, which allowed varying class support situations in interconnected ASes to be simulated.
Only a few examples of the results have been documented here. The complete result set of all combinations is available upon request.
The concept’s expectation of sensible usage shares with matching class set support has been confirmed. Both strict priority queuing and class based fair queuing are applicable and lead to satisfactory results. However, class based fair queuing is preferred, since its configurable weights prevent traffic starvation in all classes.
The multi-AS interconnection scenarios revealed the general advantage of even consistent two class support in transit networks. The CoS bottleneck simulations revealed that the order of class support granularities along a forwarding path does matter: the later traffic is merged, the better.
Furthermore, the advantage of marking preservation in CoS bottlenecks over remarking has been demonstrated. This backs up the concept’s strong recommendation of tunnelled customer traffic transport with matched tunnel CoS support.
The consequences of missing tunnel CoS support have been simulated and exposed.
Class overload prevention is performed by token bucket ingress filtering, as specified in the second IETF draft document. Therefore, the precise limitation characteristics and some typical application scenarios have been simulated. Token bucket metering combined with prioritised queuing is a simple but powerful means for network protection and for sensible, rate limited, class of service based forwarding of traffic.
The fundamental building blocks of the new cross-domain and cross-layer coarse grained
Quality of Service support concept have been successfully simulated. Given the high level
of traffic aggregation across interconnection links and the currently poor best effort forwarding situation, even a simple two class of service interconnection is shown to be highly beneficial for the separate transport of prioritized traffic.
10 Concept implementation
New concepts and ideas can only become Internet standards if they are contributed to the respective working group within the “Internet Engineering Task Force (IETF)” and gain community support there. RFC 2418 [40] defines the guidelines and procedures for working group operation. It formalizes what is cited as an early quote by David Clark: “We reject kings, presidents and voting. We believe in rough consensus and running code”.
In particular, the philosophy of running code, as a way to discover subtleties possibly missing from specifications while ensuring that a new specification can actually be used straight away, is a fundamental building block of IETF standardization work.
Consequently, the new cross-domain and cross-layer coarse grained Quality of Service support has not only been submitted to the “Inter-Domain Routing (idr)” working group of the IETF, but has also been implemented. Basic functionality has already been achieved and used for test runs in the university laboratories as well as with service providers.
10.1 Linux implementation
The routing suite software “Quagga” [154], which is available for the open-source operating system Linux, implements several routing protocols as well as local routing table management. It also supports the Border Gateway Protocol, whose implementation has been used and extended for the new signalling concept.
Fig. 131
Quagga Routing Suite structure
Fig. 131 depicts the structure of the routing suite software. It consists of the central software process (daemon) “Zebra” and several processes (daemons) for the depicted routing protocols. Each process has an associated command line interface called “Virtual TeletYpe shell (vtysh)”, so that router administrators can connect to each process and issue control commands to it.
All routing protocol processes exchange routing information with their external peers as well as with the central Zebra daemon for node local routing information and table updates.
The BGP daemon and its associated vtysh have been modified to implement the concept’s new routing information exchange as well as the new vtysh commands required for its configuration. Fig. 132, Fig. 133 and Fig. 134 give example setups for both new Extended Communities (QoS Marking and CoS Capability) as well as for the token bucket rate limitation signalling (CoS Parameter attribute).
All configuration commands create the required internal data structures, initiate the respective sending and show up in the output of the “show running-config” command, which displays the current router configuration. The configuration and activation of the new communities and attributes use the so-called “route-map” mechanism, which triggers actions on matched criteria in router configurations. This powerful mechanism has now been extended to selectively send the CoS signalling data to neighbours by attaching the respective route-map to the peering session.
Fig. 132
Example setup for 4 QoS Marking Ex. Communities for IP-DiffServ
Fig. 133
Example setup for a CoS Capability Ex. Community
Fig. 134
Example setup for a CoS Parameter Attribute
The modifications of the BGP daemon within the Quagga routing suite have not yet been submitted to the Quagga team for inclusion. They are still under development and testing, but have already proven operational stability and functionality.
Table 21 lists all newly added commands and their available parameters. A detailed
description of the commands and their parameter handling will be published after the
official code adoption within the Quagga software project.
Chapter 11 documents the test results, which were achieved by means of this modified
Linux routing software.
Table 21
Extended command line syntax for CoS configurations
10.2 Wireshark implementation
Measurement tools are required to analyze test results and to aid the debugging process. The most widely used network analyzer for data communication networks is the freely available software “Wireshark” [181]. This successor of the former “Ethereal” software contains a comprehensive set of protocol dissectors. These allow the user to analyze almost all types of captured data packets in a clearly structured way, down to every bit of the packet’s control information.
Since the complete source code of the Wireshark package is available, the new Extended Communities have been added to its dissector repository.
The modifications have been submitted to the Wireshark team and were accepted for inclusion. The latest official release therefore includes the decoding functionality for the new data structures.
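At its core, a dissector for these structures has to split the Extended Communities attribute value into 8-octet entries. The Python sketch below shows this generic step; it is not the Wireshark C code. It exposes the transitive bit defined in RFC 4360 and leaves the CoS specific interpretation of the value octets to the draft definitions.

```python
def split_extended_communities(value: bytes) -> list:
    """Split the value of an Extended Communities path attribute (RFC 4360,
    attribute type code 16) into its 8-octet communities.

    Every community is shown in the "extended type" layout (type, sub-type,
    6-octet value); draft specific CoS semantics are not decoded here.
    """
    if len(value) % 8 != 0:
        raise ValueError("attribute length must be a multiple of 8 octets")
    communities = []
    for offset in range(0, len(value), 8):
        entry = value[offset:offset + 8]
        communities.append({
            "type": entry[0],
            "subtype": entry[1],
            # RFC 4360: a set T bit (0x40) in the type field marks a non-transitive community
            "transitive": not (entry[0] & 0x40),
            "value": entry[2:].hex(),
        })
    return communities


if __name__ == "__main__":
    # A Route Target (type 0x00, sub-type 0x02, AS 65000, value 100)
    # followed by a community of type 0x43, whose T bit is set
    attribute_value = bytes.fromhex("0002fde800000064" "4300000000000001")
    for community in split_extended_communities(attribute_value):
        print(community)
```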
Fig. 135 depicts a screenshot of the software. The enlargement shows some examples of
transmitted Extended Communities within a BGP UPDATE message.
Fig. 135
Wireshark screenshot with captured Extended Communities
10.3 Online debug form
The new CoS signalling capabilities of the Linux based BGP speaker are not yet available in commercial routers. Many network operators may therefore be able to receive those Extended Communities and attributes, but will be confronted with decimal or hexadecimal encodings of the information, as shown in Fig. 136.
Fig. 136
Reception example of Extended Communities in commercial routers
In order to decode the received information, such operators would be forced to use the augmented Wireshark functionality. However, this is unlikely to happen in production setups.
Therefore, a second means of decoding is provided, which accepts the undecoded command line output and displays the decoding result. This service has been set up as an online debug form and can be accessed at the following URL:
http://www.bgp-qos.org/draft-knoll/decode_attributes.php .
Either single encodings (e.g. “0x420:11778:3422565120”) or complete command line log files can be submitted in the online form for decoding.
The result is returned as a structured table, as shown in Fig. 137.
Fig. 137
Decoding result of the online form
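A single encoding accepted by the form can be mapped back to the eight octets of an Extended Community. The sketch assumes, based on the shape of the example, that the router prints an unknown community as a hexadecimal 2-octet type field, followed by a decimal 2-octet field and a decimal 4-octet field. The field semantics defined in the draft are deliberately not decoded, and the function names are illustrative.

```python
import struct


def display_to_octets(text: str) -> bytes:
    """'0x420:11778:3422565120' -> the 8 octets of the extended community."""
    type_field, high, low = text.strip().split(":")
    return struct.pack("!HHI", int(type_field, 16), int(high), int(low))


def octets_to_display(octets: bytes) -> str:
    """Inverse conversion back into the router's display format."""
    type_field, high, low = struct.unpack("!HHI", octets)
    return f"{type_field:#x}:{high}:{low}"


if __name__ == "__main__":
    octets = display_to_octets("0x420:11778:3422565120")
    print(octets.hex())  # 04202e02cc003300
    print(octets_to_display(octets))
```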
11 Implementation test
Service providers run large network setups and almost exclusively use commercial router equipment. The interoperability of the concept’s Linux-based implementation with such routers is therefore vital and needs to be tested.
The selection of BGP as the signalling protocol and the reuse of Extended Communities for transporting most of the signalling information allow modified Linux and commercial router systems to be interconnected. Moreover, the Extended Community attribute is by definition an optional transitive attribute, and the CoS Parameter attribute has been designed as a transitive attribute as well. Thus, commercial routers of all vendors will receive and store the attributes and eventually relay them unprocessed. In practice, the latter only holds true if the network operator has not configured discard filters for Extended Communities and for unknown attributes.
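The generic relay behaviour relied on here is defined in RFC 4271. A BGP speaker that does not recognize an optional transitive attribute keeps it and passes it on with the Partial bit set. An unrecognized optional non-transitive attribute is quietly dropped. The sketch below encodes only these flag rules. It does not model operator filters, and the function name is illustrative.

```python
# Path attribute flag bits (RFC 4271, section 4.3)
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20
EXTENDED_LENGTH = 0x10


def relay_unrecognized(flags: int):
    """Decide what happens to an attribute a BGP speaker does not recognize.

    Returns the flags to use when passing the attribute on, or None if the
    attribute is quietly dropped (RFC 4271, sections 5 and 9).
    """
    if not flags & OPTIONAL:
        # an unrecognized well-known attribute is an UPDATE message error
        raise ValueError("unrecognized well-known attribute")
    if flags & TRANSITIVE:
        return flags | PARTIAL  # retained for propagation, Partial bit set
    return None  # optional non-transitive: quietly ignored, not propagated


if __name__ == "__main__":
    print(hex(relay_unrecognized(OPTIONAL | TRANSITIVE)))  # 0xe0
    print(relay_unrecognized(OPTIONAL))  # None
```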
The basic interconnection of a Linux PC with several types of Cisco routers has been tested. The establishment of a peering session, the exchange of routing information and the attachment of some of the new path attributes were realized successfully.
The more challenging tests have also been completed successfully and are documented below. They cover the attribute relay between commercial routers as well as the extensive signalling of the new path attributes under default-free zone (DFZ) routing conditions.
11.1 Test setup
The test setup shown in Fig. 138 has three purposes: to learn a so-called “full feed” global routing table view from a public peering, to augment this information with some CoS tagged self-originated prefixes, and to relay these advertisements to a commercial Router 1. Router 1 in turn relays the full routing information, including the injected CoS signalling information, to a second commercial Router 2.
Fig. 138
Implementation test setup
Three Linux PCs and two Cisco 2811 routers have been used in the test arrangement. The test has been performed in the test labs of independent service providers in order to obtain global Internet peering connectivity.
All information exchanges were to be captured for documentation and offline analysis. Wireshark was used for this task, and the modified Linux PC (“themis” in the figure) was able to capture the packets on the public interconnection link itself. The direct links between the Linux PC and Router 1 and between Router 1 and Router 2 had to be tapped by means of two simple Ethernet hubs and two Wireshark equipped Linux PCs (“leda” and “maia” in the figure).
11.2 Test result and observations
The BGP sessions between the speakers of the described test setup were established successfully, and routing advertisements within BGP UPDATE messages were observed.
The public Internet peering for global connectivity was configured to relay advertisements from the Internet to the test setup and to filter out all locally generated or repeated advertisements. This way, all unacknowledged announcements could be suppressed. This is particularly important, since service level agreements and mutual information exchange within BGP peering sessions are of legal relevance and must be kept clear of any unwanted or uncontrolled leakage of information.
Fig. 139 documents the successful reception of a full routing table feed from the public Internet via the modified Linux PC into the BGP routing process of Router 1. The routing table contained 273109 IP prefixes and consumed about 56 MB of RIB memory. The time between session establishment and table convergence at the Linux PC was about 6 minutes. The same time was needed for the unchanged relay towards Router 1 and again towards Router 2.
Fig. 139
Router1: show ip bgp sum – full feed
In a second test series, the Linux PC started to send out route-map matched reachability UPDATEs for its own networks. Some fictitious networks were configured and associated with four CoS Extended Communities. The signalling of the respective CoS Extended Communities was successful and could be observed in Wireshark as well as in the routers’ debug and statistics outputs. Fig. 140 and Fig. 141 show, as examples, the resulting memory consumption of 40 bytes for the four attached Extended Communities, which were received in one Extended Community attribute. This consumption is interesting in two ways. Firstly, each Extended Community is an 8 byte structure, so four such structures would be expected to use 32 bytes of memory. Secondly, no difference in consumption was found between 1, 10, 100 or several hundred announced prefixes with the same associated CoS Extended Communities.
Fig. 140
Router 1: show ip bgp sum – single prefix with 4 communities
Fig. 141
Router 1: show ip bgp sum – 10 prefixes with 4 communities
Fig. 142 documents the largest test run, in which a single CoS Extended Community was deliberately attached to every prefix of the incoming full feed. This forced behaviour does not conform to the specified concept, but it clearly demonstrates the sending capability of the Linux PC as well as the stable handling and efficient storage of this massive load of incoming CoS attributes by commercial routers.
Fig. 142
Router1: show ip bgp sum – full feed with single community attached to all
Further test runs were performed in which hundreds of prefixes were associated with thousands of differing CoS Extended Communities in the connectivity advertisements. All tests were passed successfully; the resource usage analysis of this testing is discussed in chapter 11.4.
One further observation revealed a still unresolved BGP signalling flaw.
Fig. 143
Completely processed Extended Community attribute example
Fig. 143 depicts the reception of four CoS Extended Communities contained in one Extended Community attribute. However, this UPDATE message had passed through a Cisco router, which marked the attribute as “completely” processed.
This marking flag (complete vs. partial processing) is a mandatory field of every BGP path attribute. The standard requires that a BGP speaker receiving an optional transitive attribute it does not recognise must relay the attribute with the “partially processed” flag set.
In the Extended Community attribute case, the Cisco router evidently did not set the partial flag because it recognises the Extended Community attribute as such. The unknown content, namely the new QoS Marking Extended Communities, is therefore silently relayed as completely processed.
This signalling inconsistency has been acknowledged by the two major router vendors.
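The behaviour can be illustrated with a minimal Python sketch of the relay decision. The flag values and the Extended Communities type code 16 are those of the BGP-4 specification (RFC 4271) and RFC 4360; the function names are hypothetical, and the model is reduced to the single rule discussed here. Because the decision is keyed on the attribute type code alone, a speaker that recognises type 16 keeps the attribute "complete" regardless of which Extended Community sub-types it carries.

```python
from typing import Set

# Attribute Flags octet of a BGP path attribute (BGP-4, RFC 4271, section 4.3)
OPTIONAL = 0x80         # bit 0: optional (1) or well-known (0)
TRANSITIVE = 0x40       # bit 1: transitive (1) or non-transitive (0)
PARTIAL = 0x20          # bit 2: partial (1) or complete (0)
EXTENDED_LENGTH = 0x10  # bit 3: two-octet (1) or one-octet (0) length field

EXTENDED_COMMUNITIES = 16  # path attribute type code (RFC 4360)


def relay_flags(flags: int, type_code: int, known_types: Set[int]) -> int:
    """Flags a speaker uses when passing an attribute on (simplified model).

    An unrecognised optional transitive attribute is relayed with the Partial
    bit set; a recognised attribute keeps its flags. Only the type code is
    checked, never the contents of the attribute.
    """
    optional_transitive = bool(flags & OPTIONAL) and bool(flags & TRANSITIVE)
    if optional_transitive and type_code not in known_types:
        return flags | PARTIAL
    return flags


def describe(flags: int) -> str:
    """Human-readable list of the flag bits that are set."""
    names = [(OPTIONAL, "optional"), (TRANSITIVE, "transitive"),
             (PARTIAL, "partial"), (EXTENDED_LENGTH, "extended-length")]
    return ", ".join(name for bit, name in names if flags & bit)


originated = OPTIONAL | TRANSITIVE  # Extended Communities attribute as sent
# A router that knows type 16 relays it as "complete", even if the
# communities inside carry sub-types it does not understand.
print(describe(relay_flags(originated, EXTENDED_COMMUNITIES, {EXTENDED_COMMUNITIES})))
# A speaker that does not know type 16 at all sets the Partial bit.
print(describe(relay_flags(originated, EXTENDED_COMMUNITIES, set())))
```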
11.3 Ethernet QoS support test at IXPs
The new cross-domain and cross-layer coarse grained Quality of Service support concept
places emphasis on the CoS interworking not only between networking domains, but also
between networking layers.
Since many potentially CoS-capable Service Providers interconnect across public Internet Exchange Points, the underlying Ethernet QoS support needed to be tested as well. Currently, none of the major Internet Exchange Points in the world is QoS enabled; they switch untagged Ethernet frames.
However, discussions with IXP operators revealed that they are willing to support their customers in high-class peering setups and want to be prepared for it. Allowing customers to configure VLAN-tagged peerings across an IXP platform is a prerequisite for QoS support. This support can be divided into two levels: QoS marking preservation only, or QoS marking preservation combined with QoS-aware forwarding. The latter is unlikely to be enabled soon.
Fig. 144
VLAN User Priority test at DE-CIX [144]
The administration of the German Internet Exchange Point DE-CIX in Frankfurt kindly performed the requested test on its cascaded platform. Two PCs, languard1 and languard4, exchanged IEEE 802.1Q tagged Ethernet frames with configured user priority markings. The test showed that all markings traversed the Force10- and Foundry-based switching platform unchanged.
This is a fundamental building block for VLAN-tunnelled and priority-marked AS interconnections. IXP customers are supplied with VLAN-based platform access upon request. The switch hardware of the platform is even capable of multi-class prioritised forwarding.
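At bit level, the preserved user priority is the 3-bit Priority Code Point (PCP) in the Tag Control Information of the IEEE 802.1Q tag. The following Python sketch encodes and checks such tags; the VLAN ID 100, the function names and the comparison of sent and received tags are illustrative assumptions, since the tooling used on languard1 and languard4 is not described here.

```python
import struct

TPID_8021Q = 0x8100  # Tag Protocol Identifier of an IEEE 802.1Q tag


def build_tag(pcp: int, vid: int, dei: int = 0) -> bytes:
    """Encode a 4-byte 802.1Q tag: TPID followed by the Tag Control Information.

    TCI layout: 3-bit Priority Code Point (user priority), 1-bit DEI, 12-bit VLAN ID.
    """
    if not 0 <= pcp <= 7 or not 0 <= vid <= 0xFFF or dei not in (0, 1):
        raise ValueError("field out of range")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_tag(tag: bytes) -> tuple:
    """Decode a 4-byte 802.1Q tag into (pcp, dei, vid)."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 1, tci & 0xFFF


def priorities_preserved(sent: list, received: list) -> bool:
    """True if every received tag carries the same user priority as the sent one."""
    return [parse_tag(t)[0] for t in sent] == [parse_tag(t)[0] for t in received]


sent = [build_tag(pcp, vid=100) for pcp in range(8)]  # VLAN 100 is hypothetical
received = list(sent)  # stands in for the tags captured behind the IXP switches
print(priorities_preserved(sent, received))
```

Comparing only the PCP field mirrors the test goal: the VLAN ID may legitimately differ per customer port, while the user priority must survive the platform unchanged.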
Other IXPs are expected to offer high-class peerings as well. However, there is currently no central database that could guide potential peering partners to QoS-enabled peering platforms. Therefore, a new registry of QoS-enabled IXPs has been initiated. It can be found at: http://www.bgp-qos.org/qos-ixp/
As Fig. 145 shows, a second European IXP has also confirmed support for VLAN-tagged peering with user priority preservation. The platforms differ in the number of priority queues supported by their Ethernet hardware.
Fig. 145
Major European IXPs with VLAN User Priority support
11.4 Resource usage estimates
The applicability of an inter-domain concept depends on three major criteria:
• simplicity to gain common understanding and usage,
• scalability to a large number of interconnections and routing table entries and
• modesty in resource usage.
The latter two criteria matter because the numbers of Internet routes and autonomous systems keep increasing. Fig. 146 and Fig. 147 document this continuing growth trend.
Fig. 146
Active BGP entries over time [year] - [93]
Fig. 147
Unique ASes over time [year] - [93]
Network operators are strongly concerned about this growth rate. The situation is further intensified by AS multi-homing, where stub ASes connect to the Internet via two or more Internet service providers. Due to the BGP best path selection, only one of the homing paths would be used at a time, and frequent swapping between them is not desired; on the contrary, it harms inter-domain routing stability. So that both homing paths can carry traffic, stub ASes therefore tend to split their aggregated IP address spaces into de-aggregated ranges. This leads to longer prefixes and two routing table entries instead of one.
This in turn increases the routing table size and has led to the commonly accepted prefix length limitation policy: prefixes longer than 24 bits are filtered out during BGP processing, which effectively disconnects any finer-grained IP address range.
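The de-aggregation step and the /24 length filter can be sketched with Python's standard ipaddress module. The aggregate 10.0.0.0/23 is a hypothetical placeholder for a stub AS address space, and the function names are illustrative.

```python
import ipaddress


def deaggregate(prefix: str, new_len: int) -> list:
    """Split an aggregate into more specific prefixes of length new_len."""
    return list(ipaddress.ip_network(prefix).subnets(new_prefix=new_len))


def passes_length_filter(prefix: str, max_len: int = 24) -> bool:
    """Common inter-domain policy: drop prefixes longer than /24."""
    return ipaddress.ip_network(prefix).prefixlen <= max_len


aggregate = "10.0.0.0/23"  # hypothetical address space of a multi-homed stub AS
specifics = deaggregate(aggregate, 24)
print([str(p) for p in specifics])  # two routing table entries instead of one
print(passes_length_filter("10.0.0.0/24"), passes_length_filter("10.0.0.128/25"))
```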
Fig. 148 shows the average rate of updated and withdrawn prefixes. On average, fewer than ten such prefix changes need to be processed by any BGP speaker in the Internet. This reveals that UPDATE processing and UPDATE message size are of minor importance during normal operation. However, they become critical in the initialisation phase of BGP sessions, when the full routing table is exchanged, which results in large numbers of UPDATE messages and high processing loads.
Fig. 148
Hourly Average of Updated and Withdrawn Prefix Rate - [93]
Therefore, BGP UPDATE messages were generated that contained just one prefix associated with 173 Extended Communities. This extreme message design caused no operational flaws and no measurable increase in processing load.
The new coarse-grained CoS concept has proven to scale well, both with respect to thousands of prefixes associated with CoS signalling in the UPDATE procedure and with respect to the number of differing CoS Extended Communities received, stored and relayed by BGP speakers.
The resource estimate depends on a number of factors, which can vary widely depending on the number of providers adopting this CoS concept. However, since only a limited number of different classes will sensibly be used and identical attributes are stored as a single instance, the overall resource usage is expected to be rather small, especially in terms of additional memory consumption.
11.4.1 Increase in routing update information size
The resource usage for the sole transmission of BGP UPDATE messages is best analysed under worst-case conditions. A BGP peer in a newly established BGP session requires the transmission of the full routing table with all prefixes and associated attributes. This situation is taken as the starting point for the calculations below. Note, however, that such a full table exchange also provides the highest information packing density: as far as possible, the sender groups all prefixes with identical attribute sets into one UPDATE message. A new CoS attribute therefore applies to all prefixes advertised within that message, which results in only 8 bytes of signalling overhead for several hundred CoS-enabled prefixes.
Under normal network operation, single prefixes with their associated attributes are exchanged. Such an UPDATE message with one IPv4 prefix and only the limited set of mandatory attributes (ORIGIN – 4 bytes, AS_PATH with one AS and NEXT_HOP) has a size of 32 bytes. Adding one CoS Extended Community requires the Extended Community attribute control information (3 bytes) and the actual Extended Community (8 bytes), giving 43 bytes. This is an overhead of 11 bytes, or 34.375 % of the original size.
Associating two CoS Extended Communities in the single prefix case requires 3 + 16 = 19 bytes of overhead, i.e. 59.375 % in total, which corresponds to 29.69 % per CoS Extended Community. Fig. 149 depicts the resulting overhead graph.
Fig. 149
CoS signalling UPDATE message overhead – single prefix case
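The single prefix overhead figures can be reproduced with the short Python sketch below. The 32-byte baseline is the value given above; the function names are hypothetical. As an assumption beyond the text, the sketch also applies the Extended Length rule of BGP-4 (a two-byte attribute length field once the value exceeds 255 bytes), which only matters from 32 communities onwards.

```python
BASE_UPDATE_BYTES = 32   # single IPv4 prefix with ORIGIN, AS_PATH and NEXT_HOP (value from the text)
EXT_COMMUNITY_BYTES = 8  # one Extended Community


def attr_header_bytes(value_len: int) -> int:
    """Flags + type code + length field; the length field grows to 2 bytes above 255."""
    return 3 if value_len <= 255 else 4


def cos_overhead(n: int) -> tuple:
    """Added bytes, total overhead in % and overhead per community in % for n communities."""
    value_len = n * EXT_COMMUNITY_BYTES
    added = attr_header_bytes(value_len) + value_len
    total_pct = 100 * added / BASE_UPDATE_BYTES
    return added, total_pct, total_pct / n


for n in (1, 2, 4, 8):
    added, total, per = cos_overhead(n)
    print(f"{n} CoS communities: +{added} bytes, {total:.3f} % total, {per:.2f} % per community")
```

The fixed attribute header is shared by all communities, which is why the per-community overhead falls as more communities are attached.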
As mentioned earlier, a limit of 173 Extended Communities within an UPDATE message exists. This limit is caused by the Ethernet MTU size of 1500 bytes. Furthermore, if a prefix is announced with 173 Extended Communities in a first UPDATE message and announced again with 173 Extended Communities in a second UPDATE message, those communities do not accumulate at the receiver; the second announcement is regarded as new information that replaces the old one.
As Fig. 150 shows, the UPDATE message with 173 Extended Communities contained 173 × 8 byte = 1384 bytes of Extended Communities, which led to a BGP UPDATE message size of 1444 bytes and, with the IP, TCP and Ethernet headers added, to an Ethernet frame size of 1498 bytes.
Fig. 150
Wireshark screenshot with 173 Extended Communities UPDATE
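The frame size arithmetic and the replacement semantics can be checked with a small Python sketch. It assumes IPv4 and TCP headers without options and a 14-byte Ethernet header without the frame check sequence, which is consistent with the 1498 bytes reported above. The dictionary standing in for the receiver's routing table, the prefix 192.0.2.0/24 and the community byte values are hypothetical placeholders, not real QoS Marking encodings.

```python
EXT_COMMUNITY_BYTES = 8
IPV4_HEADER = 20      # assumes no IP options
TCP_HEADER = 20       # assumes no TCP options
ETHERNET_HEADER = 14  # MAC addresses + EtherType; FCS not counted
ETHERNET_MTU = 1500


def frame_size(bgp_message_bytes: int) -> int:
    """Ethernet frame length carrying one BGP message in one TCP segment."""
    return bgp_message_bytes + TCP_HEADER + IPV4_HEADER + ETHERNET_HEADER


# Receiver side: a new announcement of a prefix replaces the stored attributes
# (hypothetical minimal Adj-RIB-In, keyed by prefix).
adj_rib_in = {}


def receive_update(prefix: str, communities: list) -> None:
    adj_rib_in[prefix] = list(communities)


payload = 173 * EXT_COMMUNITY_BYTES
frame = frame_size(1444)  # BGP UPDATE size read from Fig. 150
print(payload, frame, frame - ETHERNET_HEADER <= ETHERNET_MTU)

first = [i.to_bytes(8, "big") for i in range(173)]
second = [(1000 + i).to_bytes(8, "big") for i in range(173)]
receive_update("192.0.2.0/24", first)
receive_update("192.0.2.0/24", second)
print(len(adj_rib_in["192.0.2.0/24"]))  # still 173: the second set replaced the first
```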
The CoS marking information is signalled transitively and globally in QoS Marking Extended Communities.
The rate-limitation signalling in CoS Parameter Extended Communities, however, has only interconnection-local significance (see chapter 7.3.2.3) and is therefore of limited concern for the resource usage estimate.
Class markings are signalled in the newly defined Extended Communities by constructing one Extended Community for each class of service and each of its technology representations.
The additional signalling bytes transferred in a single update message are therefore calculated as follows, where the 3 bytes account for the Extended Community attribute control information:
$$S_{\text{update}} = \text{classes} \times \text{technologies} \times 8\ \text{byte} + 3\ \text{byte}$$
To limit the expected attribute’s usage to practical scenarios, one needs to consider the
following facts:
• most vendors support a maximum of 8 classes (e.g. Ethernet priority, MPLS E-LSP)
in their network devices,
• most router interfaces support only four hardware queues,
• CoS marking signalling for the IP technology does not distinguish between IP version 4 and 6, because of the common DSCP marking design and
• common transport technologies for IP packets are Ethernet with or without intermediate MPLS tunnelling support.
Under the above assumptions, i.e. 8 traffic classes encoded for 3 networking technologies (IP, Ethernet and MPLS), the maximum UPDATE size increase for the CoS support signalling is
$$S_{\text{update}} = 8 \times 3 \times 8\ \text{byte} + 3\ \text{byte} = 195\ \text{byte per update message.}$$
The exchange of the full feed routing table shown in Fig. 142 required sending 95160 update messages. If all globally reachable prefixes had to be associated with the 195 byte CoS signalling in their UPDATE advertisements, the UPDATE transfer volume would increase by 17.7 MB. At the commonly used interconnection speed of 1 Gbit/s, this results in an additional transfer delay of approximately 148 ms.
The coarse-grained CoS support aimed for in this concept targets only four classes. Because IXPs are used intensively for AS interconnection, CoS support signalling is expected to be limited to IP and Ethernet CoS markings. This four class / two technology setup sums up to $S_{\text{update}} = 4 \times 2 \times 8\ \text{byte} + 3\ \text{byte} = 67$ byte per update message. For the current 95160 update messages, this yields a maximum additional transfer volume of 6.1 MB, which at the current interconnection speed of 1 Gbit/s results in a transfer delay of 51 ms.
If only IP CoS support signalling is deployed ($S_{\text{update}} = 4 \times 1 \times 8\ \text{byte} + 3\ \text{byte} = 35$ byte per update message), the UPDATE size increase for a full BGP table feed shrinks further to 3.2 MB, with a corresponding transfer delay of 26.6 ms on a gigabit interconnection link.
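The size and delay figures for the three scenarios follow directly from the formula for the additional bytes per UPDATE message. The Python sketch below recomputes them; it assumes that the MB values are binary megabytes (2^20 bytes), which matches the published numbers, and counts only the serialisation time of the additional bytes on a 1 Gbit/s link, ignoring TCP/IP overhead and processing time. Function names and scenario labels are illustrative.

```python
EXT_COMMUNITY_BYTES = 8
ATTR_HEADER_BYTES = 3       # Extended Community attribute control information
FULL_FEED_UPDATES = 95160   # UPDATE messages of the full feed (Fig. 142)
LINK_BITS_PER_S = 1e9       # 1 Gbit/s interconnection


def s_update(classes: int, technologies: int) -> int:
    """Additional CoS signalling bytes per UPDATE message."""
    return classes * technologies * EXT_COMMUNITY_BYTES + ATTR_HEADER_BYTES


def full_feed_cost(classes: int, technologies: int) -> tuple:
    """Extra volume (binary MB) and extra serialisation delay (ms) for a full table feed."""
    extra_bytes = FULL_FEED_UPDATES * s_update(classes, technologies)
    return extra_bytes / 2**20, extra_bytes * 8 / LINK_BITS_PER_S * 1e3


scenarios = {
    "8 classes, IP/Ethernet/MPLS": (8, 3),
    "4 classes, IP/Ethernet": (4, 2),
    "4 classes, IP only": (4, 1),
}
for name, (c, t) in scenarios.items():
    mb, ms = full_feed_cost(c, t)
    print(f"{name}: {s_update(c, t)} byte/update, {mb:.1f} MB, {ms:.1f} ms")
```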
Because the UPDATE message size increase is this small, the memory usage estimate is the more important factor in terms of resource usage and possible router upgrade requirements.
11.4.2 Increase in memory consumption with routers
The storage of globally flooded QoS Marking Extended Communities is of highest concern. Every BGP speaker in the world will receive the CoS support announced by the AS originating an IP prefix and store this information in its local BGP memory space.
Estimating the actual memory usage is non-trivial because of the observed storage concept. Since the memory consumption for Extended Community attribute storage is independent of the number of advertised prefixes, the estimate needs to focus on the number of different attribute sets being advertised. This number, however, depends not only on the number of classes and technologies addressed but also on the markings and flags conveyed in those attributes. Only attributes that are identical in every single bit can be stored as a single attribute instance; all others need to be stored separately.
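The storage behaviour observed in the tests (no growth between 1 and several hundred prefixes carrying the same communities) can be modelled as attribute interning. The Python class below is a hypothetical model, not the router vendor's implementation: it stores each bit-identical attribute value once and lets routes reference it. It counts only the 8-byte community payloads, whereas the router reported 40 rather than 32 bytes for four communities, so real implementations add some per-instance overhead.

```python
class AttributeStore:
    """Hypothetical model of shared attribute storage in a BGP speaker.

    Each distinct Extended Community attribute (compared bit by bit) is stored
    once; routes only hold a reference to the shared instance.
    """

    def __init__(self):
        self._instances = {}  # attribute value bytes -> shared instance
        self.routes = {}      # prefix -> reference to a shared instance

    def add_route(self, prefix: str, ext_communities: list) -> None:
        value = b"".join(ext_communities)
        self.routes[prefix] = self._instances.setdefault(value, value)

    def attribute_bytes(self) -> int:
        """Payload bytes of all stored instances (keys equal the instances)."""
        return sum(len(v) for v in self._instances)


# Four placeholder 8-byte communities (not real QoS Marking encodings).
cos_set = [bytes(7) + bytes([i]) for i in range(4)]

store = AttributeStore()
for n in range(100):
    store.add_route(f"10.{n}.0.0/16", cos_set)
print(len(store.routes), store.attribute_bytes())  # 100 prefixes share 32 bytes

flipped = cos_set[:3] + [bytes(7) + bytes([0x83])]  # differs from cos_set in one bit
store.add_route("10.200.0.0/16", flipped)
print(store.attribute_bytes())  # a second instance is needed: 64 bytes
```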
This leads to two estimation approaches:
1. Estimate the number of class sets and markings (possibly belonging to different ASes) and calculate the resulting memory space, taking into account that flags might change as well.
2. Analyse the attribute structure and multiply the numbers of combinations of all fields that might vary their values independently.
Taking both approaches into account, the actual estimate is the minimum of the two values.
For convenience, the structure of the QoS Marking Community is again depicted in Fig.
151 below.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 P R I A 0 0| QoS Set Number|Technology Type| QoS Marking Oh|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| QoS Marking Ol| QoS Marking A |0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fig. 151
Structure of the QoS Marking Community
Based on the attribute fields that might vary, the maximum (and unrealistic) number of storable binary combinations sums up as follows:
• 4 varying flag bits contribute 16 combinations,
• the 8 bit QoS Set Number contributes 256 combinations,
• 6 defined technologies contribute 6 combinations,
• 6 DSCP bits and 1 group bit within Marking O contribute 128 combinations and
• 6 DSCP bits within Marking A contribute 64 combinations.
Assuming that no dependencies exist between those fields, a total of 201,326,592 combinations can theoretically exist. Since each Extended Community is 8 bytes in size, up to 1536 MB of additional memory space would be required.
This theoretical resource estimate would be prohibitive for the deployment of the concept. However, the worst-case assumptions made above are partially interdependent and highly unlikely. The respective limitations are therefore discussed below and taken into account for a more realistic estimate.
The four flags of the Flags field indicate processing states of the respective community and might well be observed in all 16 constellations. For a given CoS signalling, however, they will not appear in all 16 combinations at once: combined with the values of the remaining structure fields, only one of those 16 combinations can be found for a given QoS Marking Community at a given router. Hence, only one instance of this community must be stored on that device.
Secondly, the QoS Set Number is the field linking multiple QoS Marking Communities of different networking technologies. One such number is needed per class for the cross-layer technology linking. It has significance only within an UPDATE message and starts counting at zero in each message. Given the realistic assumption of at most 8 classes per AS interconnection, the QoS Set Number will only vary between 0 and 7.
Thirdly, following the current best practice for interconnections, the Technology Type field will commonly contain the IP, Ethernet and MPLS E-LSP enumeration values, which yields 3 combinations.
Fourthly, Table 3 lists 21 DSCP values, which might be combined with the grouping bit 14 for AF DSCPs in the QoS Marking O field. Furthermore, 3 bit priority-based technologies will potentially signal up to 8 combinations in this field. The resulting number of combinations is 41 (the 21 DSCP values, the 12 AF DSCPs with the grouping bit set and the 8 priority values).
Lastly, the mentioned 21 DSCP values as well as the 8 priority encodings are most likely
to be found in the Marking A field. 29 binary combinations are therefore estimated for this
field.
Taking the above assumptions into account, the following number of storable binary combinations needs to be considered:
• 4 varying flag bits contribute 1 combination,
• the 8 bit QoS Set Number contributes 8 combinations,
• the Technology Type field contributes 3 combinations,
• the Marking O field contributes 41 combinations and
• the Marking A field contributes 29 combinations.
This yields a total of 28,536 combinations. Since each combination uses 8 bytes of memory, this sums up to 223 KB of additional memory.
Taking one more technology (e.g. separate virtual channels per class) into account, this
result would change to 57,072 combinations or 446 KB, respectively.
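The two field-by-field estimates can be recomputed with the Python sketch below, which multiplies the number of values per field under the independence assumption of the second approach. The dictionary keys are illustrative names; KB and MB are taken as binary units (2^10 and 2^20 bytes), which reproduces the 1536 MB and 223 KB given above.

```python
from math import prod

BYTES_PER_COMMUNITY = 8

# Number of values each QoS Marking Community field may take.
theoretical = {
    "flags": 2**4,            # 4 varying flag bits
    "qos_set_number": 2**8,   # 8 bit field
    "technology_type": 6,     # defined technologies
    "marking_o": 2**7,        # 6 DSCP bits + 1 group bit
    "marking_a": 2**6,        # 6 DSCP bits
}
realistic = {
    "flags": 1,               # one flag constellation per community and router
    "qos_set_number": 8,      # at most 8 classes
    "technology_type": 3,     # IP, Ethernet, MPLS E-LSP
    "marking_o": 41,          # estimate from the text
    "marking_a": 21 + 8,      # 21 DSCP values + 8 priority values
}


def estimate(fields: dict) -> tuple:
    """Number of distinct communities and the bytes needed to store each once."""
    combinations = prod(fields.values())
    return combinations, combinations * BYTES_PER_COMMUNITY


for name, fields in (("theoretical", theoretical), ("realistic", realistic)):
    n, size = estimate(fields)
    print(f"{name}: {n:,} combinations, {size / 2**20:,.1f} MB ({size / 2**10:,.0f} KB)")
```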
Fig. 152 depicts the resulting memory usage estimate for up to 8 classes and four
technologies.
Fig. 152
Memory usage estimates for up to 8 classes and four technologies
To conclude the memory usage analysis, Fig. 153 and Fig. 154 document real-world measurements of BGP memory consumption.
Fig. 153 correlates the number of Extended Communities associated with a prefix with the displayed statistics on BGP memory consumption for those communities. The measurement, taken in steps of one community, documents the non-linear memory consumption caused by the implementation-specific memory allocation of the Cisco router. The most obvious non-linearity occurs with the storage of 7 Extended Communities, which raises the memory allocation to 250 bytes. The next increase happens with the 31st Extended Community.
Fig. 153
Memory usage for ext. communities sent within one UPDATE message
Fig. 154
Memory usage for large quantities of sent extended communities
Fig. 154 documents the linear increase in memory usage for thousands of stored Extended Communities. The storage of, e.g., 65567 communities consumes 533 KB of memory.
The tests clearly demonstrated the technical feasibility and scalability of the proposed signalling solution for global-scale deployment.
12 Summary and outlook
This thesis focuses on interconnections of autonomous systems with emphasis on the
introduction of a currently missing class of service support.
In general, this work makes three main contributions. In the first part, a comprehensive compilation of quality of service support concepts with detailed descriptions of network and node internal building blocks has been assembled, which proves the technical readiness of currently deployed devices for an inter-domain class of service based interconnection. Combined with an oral survey among major European, American and Middle East network operators, this contribution revealed a strong demand for a simple, understandable and manageable concept design. In the second part, the specification of the new inter-domain CoS concept has been drafted and submitted to the IETF for standardization. In the third part, simulations and implementations of vital building blocks of the concept have been made to underline its functionality and technical feasibility. Resource estimates and successful field trials provide evidence for its scalable and functioning design.
12.1 Contributions and results
In particular, the following contributions have been made:
• The interconnection of autonomous systems for global Internet connectivity is a critical point between network providers in technical and economic terms. Current deployments are based solely on basic public Internet Protocol interconnection without any quality of service support. Capacity over-provisioning and network internal QoS control have been identified as the state-of-the-art operational strategies. Due to the continuing fast growth of Internet traffic, the thesis forecasts rising capacity provisioning costs combined with an increasing level of congestion on the interconnection links. To address this foreseeable trend, a new class of service interconnection concept of global scale has been designed.
• Simplicity has been identified as the most important design factor for the concept’s acceptance in the Internet community. This simplicity design goal extends to the signalling structures and handling procedures as well as to the number of supported traffic classes.
• The importance of introducing at least two, and preferably four, classes of service at AS interconnections has been stated and underlined with simulations.
• Simple traffic separation, as opposed to existing complex quality of service support concepts with delay, loss and jitter guarantees, is explicitly aimed for in order to avoid complex, costly and prohibitive deployment restrictions.
• Simplicity in terms of waived quality guarantees is a prerequisite of the concept and contributes to its global deployment.
• The analysis of existing and possibly newly defined signalling protocols for disseminating the concept’s CoS support information led to the selection and reuse of BGP as the commonly available signalling protocol at interconnection points.
• New Extended Communities and a new BGP path attribute have been designed for the required signalling of cross-domain and cross-layer CoS support information.
• The design of the transitive relay functionality of CoS signalling via Extended Communities, as well as the provider-controllable mapping of CoS support information between the CoS support concepts of different networking technologies, is a novel principle and a fundamental contribution.
• Detailed simulation results on single node and AS level class of service support have been documented as examples within this thesis and are freely available upon request.
• Implementation test results have been contributed, which prove the concept’s applicability and interoperability with existing networking equipment.
• Resource estimates have been worked out, which revealed a negligible influence of the new CoS signalling on routing UPDATE message exchanges and moderate memory consumption within routing devices. The analysis of realistic CoS support scenarios documents the concept’s applicability on a large scale.
• The design of the simple CoS concept does not prohibit the selective application of more complex QoS guaranteeing concepts. In fact, the concurrent deployment of the generally available CoS support combined with QoS guaranteeing setups for a limited set of interconnections or transit paths is supported.
• A global class-based Internet with at least 2, and preferably 4, generally available classes of service is recommended by this new CoS concept.
12.2 Practical usage
Emphasis has been placed on the practical usage of the concept. The following achievements mark important milestones towards deployment.
• The submission of the concept’s design specification to the IETF standardization body, free of intellectual property rights, prevents possible patent applications. Free global deployment is aimed for, and provider internal cost savings add to the benefit of deploying the concept.
• The implementation results within the Linux routing suite Quagga and the network protocol analyzer Wireshark are freely available. The Wireshark extension has already been contributed to the official software release, and the Quagga implementation will be submitted for inclusion in the official source tree.
• An online service for decoding raw CoS signalling data has been set up and can be used at the following location: http://www.bgp-qos.org/draftknoll/decode_attributes.php
• Type number assignments have been granted by IANA, which already enables the public signalling of QoS Markings and CoS Capabilities in production-style network operation. The concept has thereby crossed the border from laboratory-confined setups into public applicability.
12.3 Outlook
In its current status, the new cross-domain and cross-layer coarse grained Quality of Service support concept can only be deployed on Linux-based internetworking devices. Ongoing discussions with network operators and router vendors aim at general support of the concept in commercial routers. The discussion partners have attested its technical feasibility, and European providers have expressed interest in deployment.
Future deployment experiences and adoption requests will lead to concept and implementation refinements.
To foster the concept’s deployment in production-style networks, a mechanism for augmenting legacy commercial router equipment by means of interactive Linux-based remote management is currently under development. Fig. 155 depicts the concealed CoS control of commercial border routers by an AS-internal Linux PC. The transitive design of all signalling elements ensures that the commercial border router, acting as a passive bidirectional signalling relay, actually forwards the signalling information to and from the Linux PC. This PC is in charge of processing and generating the CoS signalling and simply uses the router as a signalling relay.
A second connection of the Linux-PC to the command line interface of the router will be
used to issue the respective control commands for the configuration and activation of the
router’s existing class of service support functionality.
This intermediate solution will allow operators to enable inter-domain CoS support without
costly software or hardware upgrades.
Fig. 155
Linux remote control of existing commercial AS border router
An ongoing discussion on “network neutrality” is influencing the vendors’ support and the operators’ deployment of any inter-domain quality of service enhancements. A neutral Internet operation without any service blocking, content filtering or favouring of some Internet users over others is requested.
Discussions with service providers and federal network agencies revealed that the designed CoS concept, with its simple and generally applicable structure, is likely to be regarded as a non-discriminating and potentially ubiquitous Internet enhancement.
Further techno-economic studies on the cost reduction potential of the concept will need to
be carried out to guide the device upgrade and CoS deployment decision process.
The BGP Community based signup procedure for new services and concepts proposed by Google is briefly described in chapter 5.2. Depending on the outcome of that initiative, this CoS support concept could even serve as the contractual basis for inter-provider class of service support agreements.
Bibliography
[1] 3GPP, "Quality of Service (QoS) concept and architecture (Release 5)", 3GPP TS 23.107 V5.8.0, 2003.
[2] 3GPP, "Policy and charging control architecture (Release 8)", 3GPP TS 23.203 V8.6.0, 2009.
[3] 3GPP, "UTRA-UTRAN Long Term Evolution (LTE) and 3GPP System Architecture Evolution (SAE)", 3GPP, 2006, [Online]. Available: ftp://ftp.3gpp.org/Inbox/2008_web_files/LTA_Paper.pdf
[4] 3GPP, "Requirements for further advancements for Evolved Universal Terrestrial Radio Access (E-UTRA) (LTE-Advanced) (Release 8)", 3GPP TR 36.913, 2009, [Online]. Available: http://www.3gpp.org/ftp/Specs/archive/36_series/36.913/
[5] Abley, J.; Savola, P. and Neville-Neil, G., "Deprecation of Type 0 Routing Headers in IPv6", RFC 5095, IETF, 2007.
[6] Alaettinoglu, C.; Villamizar, C.; Gerich, E.; Kessens, D.; Meyer, D.; Bates, T.; Karrenberg, D. and Terpstra, M., "Routing Policy Specification Language (RPSL)", RFC 2622, IETF, 1999.
[7] Amante, S.; Bitar, N.; Bjorkman, N., et al., "Inter-provider Quality of Service - White paper draft 1.1", 2006, [Online]. Available: http://cfp.mit.edu/docs/interprovider-qosnov2006.pdf
[8] AMS-IX, "AMS-IX Monthly Reporting", 2009, [Online]. Available: http://www.amsix.net/technical/stats/CUMU/
[9] Andersson, L. and Asati, R., "Multiprotocol Label Switching (MPLS) Label Stack Entry: EXP Field Renamed to Traffic Class Field", RFC 5462, IETF, 2009.
[10] Andersson, L.; Minei, I. and Thomas, B., "LDP Specification", RFC 5036, IETF, 2007.
[11] Andersson, L. and Swallow, G., "The Multiprotocol Label Switching (MPLS) Working Group decision on MPLS signaling protocols", RFC 3468, IETF, 2003.
[12] Andras, V., "OMNeT++", OMNeT Development Team, 2009, [Online]. Available: http://www.omnetpp.org
[13] Awduche, D.; Berger, L.; Gan, D.; Li, T.; Srinivasan, V. and Swallow, G., "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, IETF, 2001.
[14] Awduche, D.; Malcolm, J.; Agogbua, J.; O'Dell, M. and McManus, J., "Requirements for Traffic Engineering Over MPLS", RFC 2702, IETF, 1999.
[15] Ayyangar, A.; Kompella, K.; Vasseur, J. and Farrel, A., "Label Switched Path Stitching with Generalized Multiprotocol Label Switching Traffic Engineering (GMPLS TE)", RFC 5150, IETF, 2008.
[16] Babiarz, J.; Chan, K. and Baker, F., "Configuration Guidelines for DiffServ Service Classes", RFC 4594, IETF, 2006.
[17] Baker, F.; Polk, J. and Dolly, M., "DSCP for Capacity-Admitted Traffic", Internet-Draft draft-ietf-tsvwg-admitted-realtime-dscp-05, IETF, Work in progress, 2008.
[18] Banerjea, A.; Ferrari, D., et al., "The Tenet Real-Time Protocol Suite: Design, Implementation, and Experiences", IEEE/ACM Transactions on Networking, Volume 4, Issue 1, pp. 1-10, 1996.
[19] Bates, T.; Chandra, R.; Katz, D. and Rekhter, Y., "Multiprotocol Extensions for BGP-4", RFC 4760, IETF, 2007.
[20] Bates, T.; Chen, E. and Chandra, R., "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, IETF, 2006.
[21] Bates, T.; Gerich, E.; Joncheray, L.; Jouanigot, J.-M.; Karrenberg, D.; Terpstra, M. and Yu, J., "Representation of IP Routing Policies in a Routing Registry (ripe81++)", RFC 1786, IETF, 1995.
[22] Bauschert, T., "Lecture: Data Communications", Chemnitz University of Technology, 2008.
[23] Bellman, R. E., "Dynamic Programming", Princeton University Press, Princeton, N.J., 1957.
[24] Benmohamed, L.; Liang, C.; Naber, E. and Terzis, A., "QoS Enhancements to BGP in Support of Multiple Classes of Service", draft-liang-bgp-qos-00 (work in progress), IETF, June 2006.
[25] Berger, L., "Generalized Multi-Protocol Label Switching (GMPLS) Signaling Functional Description", RFC 3471, IETF, 2003.
[26] Berger, L., "Generalized Multi-Protocol Label Switching (GMPLS) Signaling Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Extensions", RFC 3473, IETF, 2003.
[27] Bernet, Y., "Format of the RSVP DCLASS Object", RFC 2996, IETF, 2000.
[28] Bernet, Y.; Blake, S.; Grossman, D. and Smith, A., "An Informal Management Model for Diffserv Routers", RFC 3290, IETF, 2002.
[29] Bernet, Y.; Ford, P.; Yavatkar, R.; Baker, F.; Zhang, L.; Speer, M.; Braden, R.; Davie, B.; Wroclawski, J. and Felstaine, E., "A Framework for Integrated Services Operation over Diffserv Networks", RFC 2998, IETF, 2000.
[30] Black, D., "Differentiated Services and Tunnels", RFC 2983, IETF, 2000.
[31] Black, D.; Brim, S.; Carpenter, B. and Faucheur, F. L., "Per Hop Behavior Identification Codes", RFC 3140, IETF, 2001.
[32] Blake, S.; Black, D.; Carlson, M.; Davies, E.; Wang, Z. and Weiss, W., "An Architecture for Differentiated Service", RFC 2475, IETF, 1998.
[33] Bless, R., "Dynamic Aggregation of Reservations for Internet Services", Proceedings of the Tenth International Conference on Telecommunication Systems - Modeling and Analysis (ICTSM 10), Vol. 1, pp. 26-38, Monterey, 2002, [Online]. Available: http://www.tm.uka.de/doc/2003/ictsm-daris-journal-crc-web.pdf
[34] Bless, R.; Nichols, K. and Wehrle, K., "A Lower Effort Per-Domain Behavior (PDB) for Differentiated Services", RFC 3662, IETF, 2003.
[35] Blunk, L.; Damas, J.; Parent, F. and Robachevsky, A., "Routing Policy Specification Language next generation (RPSLng)", RFC 4012, IETF, 2005.
[36] Bohge, M. and Renwanz, M., "A realistic VoIP traffic generation and evaluation tool for OMNeT++", First International OMNeT++ Workshop, 2008, [Online]. Available: http://www.tkn.tu-berlin.de/research/omnetVoipTool/
[37] Boucadair, M., "QoS-Enhanced Border Gateway Protocol", draft-boucadair-qosbgp-spec-01 (work in progress), IETF, July 2005.
[38] Braden, R.; Clark, D. and Shenker, S., "Integrated Services in the Internet Architecture: an Overview", RFC 1633, IETF, 1994.
[39] Braden, R.; Zhang, L.; Berson, S.; Herzog, S. and Jamin, S., "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional Specification", RFC 2205, IETF, 1997.
[40]
[41]
[42]
[43]
[44]
Bradner, S., "IETF Working Group Guidelines and Procedures", RFC 2418, IETF,
1998.
Brown, M., Underwood, T., Zmijewski, E., “The Day the YouTube Died”, Renesys
Corp. at MENOG 3, [Online]. Available:
http://www.renesys.com/tech/presentations/pdf/menog3-youtube.pdf
Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual environments", RFC
1195, IETF, 1990.
Callon, R., “Email list discussion on: [NSIS] FW: I-D Action:draft-ietf-nsis-ntlp20.txt”, IETF NSIS working group email archive, 11 June 2009, [Online]. Available:
http://www.ietf.org/mail-archive/web/nsis/current/msg08563.html
Carpenter, B. E., “Re: [Diffserv] A question”, email discussion on DiffServ working
group list, [Online]. Available: http://www.ietf.org/mailarchive/web/diffserv/current/msg04257.html
[45] Chan, K.; Babiarz, J. and Baker, F., "Aggregation of DiffServ Service Classes", RFC
5127, IETF, 2008.
[46] Chandra, R.; Traina, P. and Li, T., "BGP Communities Attribute", RFC 1997, IETF,
1996.
[47] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, IETF, 2000.
[48] Cisco, “An Introduction to IGRP”, ID: 26825, 2005, [Online]. Available:
http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a00800c8ae1.shtml
[49] Cisco, “BGP Best Path Selection Algorithm”, ID: 13753, 2006, [Online]. Available:
http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094431.shtml
[50] Cisco, “Enhanced Interior Gateway Routing Protocol”, ID: 16406, 2005, [Online].
Available:
http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml
[51] Cisco, “Cisco 12000 Series Internet Router Architecture: Switch Fabric”, ID: 47240,
2005, [Online]. Available:
https://www.cisco.com/en/US/products/hw/routers/ps167/products_tech_note09186a00801e1da7.shtml
[52] Cisco, “Network Infrastructure for Ensuring Predictable Business Service Delivery”,
ID: C11-397769-00, 2007, [Online]. Available:
http://www.cisco.com/en/US/prod/collateral/routers/ps6342/prod_white_paper0900aecd805f62b1.html
[53] Cisco, “Network Infrastructure – Chapter 3”, ID: OL-13817-04, [Online]. Available: http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/srnd/6x/netstruc.html
[54] Cisco, “Congestion Avoidance Overview”, ID: QC-75, [Online]. Available:
http://www.cisco.com/en/US/docs/ios/12_0/qos/configuration/guide/qcconavd.html
[55] Cisco, “Configuring QoS”, ID: 78-13490-01, [Online]. Available:
http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.1/8aew/configuration/guide/qos.html
[56] Cisco, “Evolving Data Center Architectures: Meet the Challenge with Cisco Nexus
5000 Series Switches”, ID: C11-473501-01, [Online]. Available:
http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns783/white_paper_c11-473501.html
[57] Clos, C., “A study of non-blocking switching networks”, Bell System Technical
Journal, vol. 32, issue 2, pp. 406–424, 1953.
[58] Colitti, L., “A strategy for IPv6 adoption”, RIPE 57, October 2008, [Online]. Available: http://www.ripe.net/ripe/meetings/ripe-57/presentations/ColittiA_strategy_for_IPv6_adoption.Z8ri.pdf
[59] Cristallo, G.; Jacquenet, C., "The BGP QOS_NLRI Attribute", draft-jacquenet-bgp-qos-00 (work in progress), IETF, February 2004.
[60] Davie, B.; Charny, A.; Bennett, J.; Benson, K.; Le Boudec, J. Y.; Courtney, W.; Davari,
S.; Firoiu, V. and Stiliadis, D., "An Expedited Forwarding PHB (Per-Hop Behavior)",
RFC 3246, IETF, 2002.
[61] DE-CIX, “DE-CIX topology 2009”, 2009, [Online]. Available: http://www.de-cix.net/content/network/topology.html
[62] DE-CIX, “DE-CIX yearly traffic graph”, 2009, [Online]. Available: http://www.de-cix.de/content/network.html
[63] Deering, S., Hinden, R., "Internet Protocol, Version 6 (IPv6) Specification", RFC
1883, IETF, 1995.
[64] Deering, S., Hinden, R., "Internet Protocol, Version 6 (IPv6) Specification", RFC
2460, IETF, 1998.
[65] Delgrossi, L. and Berger, L., "Internet Stream Protocol Version 2 (ST2) Protocol
Specification - Version ST2+", RFC 1819, IETF, 1995.
[66] Demers, A.; Keshav, S.; Shenker, S., “Analysis and simulation of a fair queuing
algorithm”, Proceedings of SIGCOMM '89, pp. 1-12, 1989.
[67] Dijkstra, E. W., “A note on two problems in connexion with graphs”, Numerische
Mathematik, 1, pp. 269-271, 1959, [Online]. Available: http://www-m3.ma.tum.de/twiki/pub/MN0506/WebHome/dijkstra.pdf
[68] Djernaes, M., Appanna, C., Ward, D., “Context updates in BGP”, draft-djernaes-simple-context-update-00 (work in progress), IETF, 2006.
[69] DSL Forum, “Migration to Ethernet-Based DSL Aggregation”, DSL Forum Technical
Report TR-101, 2006, [Online]. Available: http://www.broadband-forum.org/technical/download/TR-101.pdf
[70] Eardley, P., "Metering and marking behaviour of PCN-nodes", Internet-Draft draft-ietf-pcn-marking-behaviour-03, IETF, Work in progress, 2009.
[71] Evans, J., Filsfils, C., "Deploying IP and MPLS QoS for multiservice networks:
Theory and practice", Morgan Kaufmann/Elsevier, Amsterdam, 2007.
[72] Farinacci, D.; Li, T.; Hanks, S.; Meyer, D. and Traina, P., "Generic Routing
Encapsulation (GRE)", RFC 2784, IETF, 2000.
[73] Farrel, A.; Ayyangar, A. and Vasseur, J., "Inter-Domain MPLS and GMPLS Traffic
Engineering -- Resource Reservation Protocol-Traffic Engineering (RSVP-TE) Extensions", RFC 5151, IETF, 2008.
[74] Farrel, A.; Vasseur, J.-P. and Ayyangar, A., "A Framework for Inter-Domain
Multiprotocol Label Switching Traffic Engineering", RFC 4726, IETF, 2006.
[75] Le Faucheur, F.; Wu, L.; Davie, B.; Davari, S.; Vaananen, P.; Krishnan, R.; Cheval,
P. and Heinanen, J., "Multi-Protocol Label Switching (MPLS) Support of Differentiated Services", RFC 3270, IETF, 2002.
[76] Feher, G., Nemeth, K., Maliosz, M., et al., "Boomerang - A Simple Protocol for
Resource Reservation in IP Networks", IEEE RTAS, 1999.
[77] Floyd, S., Jacobson, V., "Random early detection gateways for congestion
avoidance", IEEE/ACM Transactions on Networking, Vol. 1, No. 4, pp. 397-413, 1993.
[78] Ford, L. R. Jr., and Fulkerson, D. R., "Flows in Networks", Princeton University
Press, Princeton, N.J., 1962.
[79] Franke, K., “Lecture material: Digital Communication Networks”, Chemnitz
University, 2006.
[80] Fu, X., Schulzrinne, H., Bader, A., et al., "NSIS: A new extensible IP signaling
protocol suite", IEEE Communications Magazine, vol. 43, pp. 133-141, 2005.
[81] Fuller, V.; Li, T.; Yu, J. and Varadhan, K., "Classless Inter-Domain Routing (CIDR):
an Address Assignment and Aggregation Strategy", RFC 1519, IETF, 1993.
[82] Fuller, V., Li, T., "Classless Inter-domain Routing (CIDR): The Internet Address
Assignment and Aggregation Plan", RFC 4632, IETF, 2006.
[83] Gamer, T., Scharf, M., “Realistic Simulation Environments for IP-based Networks”,
First International OMNeT++ Workshop, 2008, [Online]. Available:
http://doc.tm.uka.de/2008/omnet2008.pdf
[84] Golestani, S., "A Self-Clocked Fair Queueing Scheme for Broadband Applications",
Proceedings of IEEE Infocom '94, pp. 636-646, 1994.
[85] Grossman, D., "New Terminology and Clarifications for Diffserv", RFC 3260, IETF,
2002.
[86] Hawkinson, J., Bates, T., "Guidelines for creation, selection, and registration of an
Autonomous System (AS)", RFC 1930, IETF, 1996.
[87] Hedrick, C., "Routing Information Protocol", RFC 1058, IETF, 1988.
[88] Heinanen, J.; Baker, F.; Weiss, W. and Wroclawski, J., "Assured Forwarding PHB
Group", RFC 2597, IETF, 1999.
[89] Heinanen, J., Guerin, R., "A Single Rate Three Color Marker", RFC 2697, IETF,
1999.
[90] Heinanen, J., Guerin, R., "A Two Rate Three Color Marker", RFC 2698, IETF,
1999.
[91] Herzog, S.; Boyle, J.; Cohen, R.; Durham, D.; Rajan, R. and Sastry, A., "COPS usage
for RSVP", RFC 2749, IETF, 2000.
[92] Hinden, R., Deering, S., "IP Version 6 Addressing Architecture", RFC 4291, IETF,
2006.
[93] Huston, G., “BGP reports”, [Online]. Available: http://bgp.potaroo.net/
[94] Hwang, J.; Altmann, J.; Oliver, H.; Suarez, A., “Enabling dynamic market-managed
QoS interconnection in the next generation internet by a modified BGP mechanism”, ICC 2002, IEEE International Conference on Communications, 2002,
[Online]. Available:
http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/7828/21517/00997325.pdf?arnumber=997325
[95] IANA, “BGP Extended Communities Types”, IANA Protocol Registries, [Online].
Available: http://www.iana.org/assignments/bgp-extended-communities
[96] IANA, “IANAifType-MIB”, [Online]. Available:
http://www.iana.org/assignments/ianaiftype-mib
[97] IEEE, "IEEE Standard for Local and metropolitan area networks Media Access
Control (MAC) Bridges", IEEE 802.1D, 2004.
[98] IEEE, "IEEE standard for local and metropolitan area networks virtual bridged local
area networks", IEEE 802.1Q, pp. 1-285, 2006.
[99] IEEE, "IEEE Standard for Information technology - Telecommunications and
information exchange between systems - Local and metropolitan area networks - Specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and
Physical Layer (PHY) Specifications", IEEE Std 802.11-2007 (Revision of IEEE Std
802.11-1999), C1-1184, 2007.
[100] IEEE, "IEEE Std 802.1ad - 2005 IEEE Standard for Local and metropolitan area
networks - virtual Bridged Local Area Networks, Amendment 4: Provider Bridges",
IEEE Std 802.1ad-2005 (Amendment to IEEE Std 802.1Q-2005), pp. 1-60, 2006.
[101] IEEE, "IEEE Standard for Local and metropolitan area networks - Virtual Bridged
Local Area Networks Amendment 7: Provider Backbone Bridges", IEEE Std
802.1ah-2008 (Amendment to IEEE Std 802.1Q-2005), C1-109, 2008.
[102] IEEE, "IEEE Standards for Local and Metropolitan Area Networks: Supplements to
Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method
and Physical Layer Specifications - Specification for 802.3 Full Duplex Operation
and Physical Layer Specification for 100 Mb/s Operation on Two Pairs of Category
3 or Better Balanced Twisted Pair Cable (100BASE-T2)", IEEE Std 802.3x-1997
and IEEE Std 802.3y-1997 (Supplement to ISO/IEC 8802-3: 1996; ANSI/IEEE Std
802.3, 1996 Edition), pp. 1-324, 1997.
[103] IEEE, “IEEE 802 Tutorial: Data Center Bridging”, IEEE 802, 2007, [Online].
Available: http://www.ieee802.org/802_tutorials/07-November/Data-Center-Bridging-Tutorial-Nov-2007-v2.pdf
[104] IEEE, “Virtual Bridged Local Area Networks - Amendment: Congestion Notification”, IEEE P802.1Qau/D2.1, 2009.
[105] IEEE, "IEEE Standard for Local and metropolitan area networks Part 16: Air
Interface for Broadband Wireless Access Systems", IEEE Std 802.16-2009 (Revision of IEEE Std 802.16-2004), C1-2004, 2009.
[106] ISO, “Information technology – Telecommunications and information exchange
between systems – Intermediate System to Intermediate System intra-domain
routeing information exchange protocol for use in conjunction with the protocol for
providing the connectionless-mode network service (ISO 8473)”, ISO/IEC
10589:2002, second edition, 2002.
[107] ITU-T, “Traffic control and congestion control in IP based networks”, ITU-T Y.1221,
2002.
[108] ITU-T, “Network performance objectives for IP-based services”, ITU-T Y.1541,
2006.
[109] ITU-T, “NGN FG Proceedings Part II”, ITU-T NGN Focus Group, 2005.
[110] ITU-T, “One-way transmission time”, ITU-T G.114, 2003.
[111] ITU-T, “Gigabit-capable passive optical networks (GPON): General characteristics”,
ITU-T G.984.1, 2008.
[112] ITU-T, “Asymmetric digital subscriber line (ADSL) transceivers”, ITU-T G.992.1,
1999.
[113] ITU-T, “Splitterless asymmetric digital subscriber line (ADSL) transceivers”, ITU-T
G.992.2, 1999.
[114] ITU-T, “Asymmetric digital subscriber line transceivers 2 (ADSL2)”, ITU-T G.992.3,
2005.
[115] ITU-T, “Splitterless asymmetric digital subscriber line transceivers 2 (splitterless
ADSL2)”, ITU-T G.992.4, 2002.
[116] ITU-T, “Very high speed digital subscriber line transceivers”, ITU-T G.993.1, 2004.
[117] ITU-T, “Very high speed digital subscriber line transceivers 2 (VDSL2)”, ITU-T
G.993.2, 2006.
[118] ITU-T, “End-user multimedia QoS categories”, ITU-T G.1010, 2001.
[119] ITU-T, “Perceptual evaluation of speech quality (PESQ): An objective method for
end-to-end speech quality assessment of narrow-band telephone networks and
speech codecs”, ITU-T P.862, 2001, [Online]. Available: http://www.itu.int/rec/T-REC-P862/en
[120] Jacquenet, C.; Bourdon, G.; Boucadair, M., "Service Automation and Dynamic
Provisioning Techniques in IP/MPLS Environments (Wiley Series on Communications Networking & Distributed Systems)", Wiley, 2008.
[121] Jacobson, V., “Differentiated Services for the Internet”, Internet2 Joint Applications/Engineering QoS Workshop, 1998, [Online]. Available:
ftp://ftp.ee.lbl.gov/talks/vj-i2qos-may98.pdf
[122] Jamoussi, B.; Andersson, L.; Callon, R.; Dantu, R.; Wu, L.; Doolan, P.; Worster, T.;
Feldman, N.; Fredette, A.; Girish, M.; Gray, E.; Heinanen, J.; Kilty, T. and Malis, A.,
"Constraint-Based LSP Setup using LDP", RFC 3212, IETF, 2002.
[123] Klein, P., Sprecher, N., “Provider Ethernet VLAN Cross Connect”, Seabridge-networks/NSN, 2006, [Online]. Available:
http://www.ieee802.org/1/files/public/docs2006/new-sprecher-vlan-xc-ieee-0106.pdf
[124] Knoll, T. M., "BGP Extended Community Attribute for QoS Marking", draft-knoll-idr-qos-attribute-04 (work in progress), IETF, 2009.
[125] Knoll, T. M., "BGP Class of Service Interconnection", draft-knoll-idr-cos-interconnect-03 (work in progress), IETF, 2009.
[126] Knoll, T. M., “Flow control + priority consideration -> PRIORITY_PAUSE”, NG
Ethernet Forum post, 2006.
[127] Knoll, T. M., “QoS capable Internet Exchange Points”, 2009, [Online]. Available:
http://www.bgp-qos.org/qos-ixp/list.php
[128] Kompella, K., Rekhter, Y., "Label Switched Paths (LSP) Hierarchy with Generalized
Multi-Protocol Label Switching (GMPLS) Traffic Engineering (TE)", RFC 4206,
IETF, 2005.
[129] Lee, S., Ahn, G.-S., Zhang, X. and Campbell, A., "INSIGNIA: An IP-Based
Quality of Service Framework for Mobile Ad Hoc Networks", Journal of Parallel and
Distributed Computing (Academic Press), Special issue on Wireless and Mobile
Computing and Communications, Vol. 60, Number 4, pp. 374-406, April 2000.
[130] Malkin, G., "RIP Version 2", RFC 2453, IETF, 1998.
[131] Manner, J.; Fu, X., “Analysis of Existing Quality-of-Service Signaling Protocols”,
RFC 4094, IETF, 2005.
[132] Manner, J.; Karagiannis, G. and McDonald, A., "NSLP for Quality-of-Service Signaling", Internet-Draft draft-ietf-nsis-qos-nslp-16, IETF, Work in progress, 2008.
[133] Manning, B., "Registering New BGP Attribute Types", RFC 2042, IETF, 1997.
[134] Manns, D., “Simulative Untersuchung von klassenbasiertem Inter-AS IP-Forwarding
mit Ethernet IXP”, Diploma thesis, TU Chemnitz, Chemnitz, 2009.
[135] Marques, P.; Sheth, N.; Raszuk, R.; Greene, B.; McPherson, D., "Dissemination of
flow specification rules", draft-ietf-idr-flow-spec-09 (work in progress), IETF, May
2009.
[136] Menth, M., Lehrieder, F., “Pre-Congestion Notification: Lightweight Admission
Control and Flow Termination for the Future Internet”, ICC 2009, Dresden, 2009.
[137] Merit, “Internet Routing Registry”, Merit Network Inc., 2009, [Online]. Available:
http://www.irr.net/
[138] Mills, D., "Exterior Gateway Protocol formal specification", RFC 904, IETF, 1984.
[139] Morand, P., Boucadair, M., Asgari, H., Egan, et al., “D1.4: Issues in MESCAL Inter-Domain QoS Delivery: Technologies, Bi-directionality, Inter-operability, and Financial Settlements”, MESCAL Consortium, 2004, [Online]. Available:
http://www.ist-mescal.org/deliverables/MESCAL-D14-final-v2.pdf
[140] Moy, J., "OSPF Version 2", RFC 2328, IETF, 1998.
[141] Nichols, K., “An Opinionated View of the Current State of IP Differentiated Services”, UC Berkeley MIG Seminar, 1999, [Online]. Available:
http://bmrc.berkeley.edu/courseware/cs298/fall99/nichols/kmn_ucbmm.pdf
[142] Nichols, K.; Blake, S.; Baker, F. and Black, D., "Definition of the Differentiated
Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, IETF, 1998.
[143] Nichols, K. and Carpenter, B., "Definition of Differentiated Services Per Domain
Behaviors and Rules for their Specification", RFC 3086, IETF, 2001.
[144] Nipper, A., “VLAN User Priority test on DE-CIX platform”, DE-CIX, 2009.
[145] NS2 team, “Network Simulator 2 – ns2”, NS2 webpage, 2009, [Online]. Available:
http://www.isi.edu/nsnam/ns/
[146] Ould-Brahim, H.; Fedyk, D. and Rekhter, Y., "BGP Traffic Engineering Attribute", RFC
5543, IETF, 2009.
[147] Pan, P., Hahne, E. and Schulzrinne, H., "BGRP: A Tree-Based Aggregation
Protocol for Inter-domain Reservations", Journal of Communications and Networks,
Vol. 2, No. 2, pp. 157-167, 2000.
[148] Pan, P., Schulzrinne, H.,"YESSIR: A Simple Reservation Mechanism for the
Internet". Proceedings of NOSSDAV, Cambridge, UK, 1998.
[149] Parekh, A. K.; Gallager, R. G., “A generalized processor sharing approach to flow
control in integrated services networks”, Proceedings of IEEE Infocom ’92, pp. 915-924, 1992.
[150] Park, H., "Systematic QoS Class Mapping Framework for Application Requirement
over Heterogeneous Networks", Telecommunications Network Strategy and Planning Symposium, Networks 2008, 2008.
[151] PeeringDB, “Peering Database - PeeringDB”, PeeringDB.com, 2009, [Online].
Available: http://www.peeringdb.com
[152] Perkins, C., "IP Encapsulation within IP", RFC 2003, IETF, 1996.
[153] Postel, J., "Internet Protocol", RFC 791, IETF, 1981.
[154] Quagga, “Quagga Routing Software Suite”, 2009, [Online]. Available:
http://www.quagga.net
[155] Rajahalme, J.; Conta, A.; Carpenter, B. and Deering, S., "IPv6 Flow Label Specification", RFC 3697, IETF, 2004.
[156] Ramakrishnan, K.; Floyd, S. and Black, D., "The Addition of Explicit Congestion
Notification (ECN) to IP", RFC 3168, IETF, 2001.
[157] Rekhter, Y.; Li, T. and Hares, S., "A Border Gateway Protocol 4 (BGP-4)", RFC
4271, IETF, 2006.
[158] Rice, L., “The inter-colonial telegraph station at Eucla.”, [Online]. Available:
http://members.iinet.net.au/~oseagram/eucla.html
[159] RIPE, “Representation of IP Routing Policies in a Routing Registry”, Réseaux IP
Européens, 1994, [Online]. Available: ftp://ftp.ripe.net/ripe/docs/ripe-181.txt
[160] Rosen, E.; Tappan, D.; Fedorkow, G.; Rekhter, Y.; Farinacci, D.; Li, T. and Conta,
A., "MPLS Label Stack Encoding", RFC 3032, IETF, 2001.
[161] Sangli, S.; Tappan, D. and Rekhter, Y., "BGP Extended Communities Attribute",
RFC 4360, IETF, 2006.
[162] Sara, “Peering policy for SARA (AS1126)”, SARA Computing and Networking
Services, 2009, [Online]. Available: http://www.as1126.net/
[163] Seaman, M.; Smith, A.; Crawley, E. and Wroclawski, J., "Integrated Service Mappings
on IEEE 802 Networks", RFC 2815, IETF, 2000.
[164] Schulzrinne, H. and Stiemerling, M., "GIST: General Internet Signalling Transport",
Internet-Draft draft-ietf-nsis-ntlp-20, IETF, Work in progress, 2009.
[165] Schwabe, T., “IP-Netze mit Interdomain-BGP-Routing: Konvergenzverhalten,
Dienstqualität und Dimensionierung”, Ph.D. dissertation, TU München, 2007.
[166] Shenker, S.; Partridge, C. and Guerin, R., "Specification of Guaranteed Quality of
Service", RFC 2212, IETF, 1997.
[167] Smith, P., “BGP Scaling Techniques”, AfNOG workshop, 2006, [Online]. Available:
http://ws.edu.isoc.org/data/2006/153397902444822943b7611/bgpscal.ppt
[168] Sofia, R., Guerin, R. and Veiga, P., “SICAP, a Shared-segment Inter-domain
Control Aggregation Protocol”, High Performance Switching and Routing, HPSR,
Turin, 2003.
[169] Spenneberg, R., "Linux-firewalls mit Iptables & Co.", Pearson Education, 2006.
[170] Suter, B.; Lakshman, T. V.; Stiliadis, D.; Choudhury, A. K., "Buffer Management
Schemes for Supporting TCP in Gigabit Routers with Per-Flow Queueing", IEEE
Journal on Selected Areas in Communications, 1999.
[171] Suzuki, M., “Per-priority Flow Control”, IEEE 802.1 meeting Portland, 2004,
[Online]. Available: http://www.ieee802.org/1/files/public/docs2004/Per-priority%20Flow%20Control1.pdf
[172] Traina, P.; McPherson, D. and Scudder, J., "Autonomous System Confederations for
BGP", RFC 5065, IETF, 2007.
[173] Trick, U., Weber, F., "SIP, TCP/IP und Telekommunikationsnetze: Anforderungen - Protokolle - Architekturen", Oldenbourg, München, 2004.
[174] Verizon, “Verizon Business Policy for Settlement-Free Interconnection with Internet
Networks”, Verizon Business, 2009, [Online]. Available:
http://www.verizonbusiness.com/terms/peering/
[175] Villamizar, C.; Chandra, R. and Govindan, R., "BGP Route Flap Damping", RFC
2439, IETF, 1998.
[176] Vohra, Q., Chen, E., "BGP Support for Four-octet AS Number Space", RFC 4893,
IETF, 2007.
[177] Westerlund, M., “Email list discussion on: [NSIS] GIST updated from todays IESG
call”, IETF NSIS working group email archive, 9 April 2009, [Online]. Available:
http://www.ietf.org/mail-archive/web/nsis/current/msg08534.html
[178] Westerlund, M., “Email list discussion on: [NSIS] GIST updated from todays IESG
call”, IETF NSIS working group email archive, 21 April 2009, [Online]. Available:
http://www.ietf.org/mail-archive/web/nsis/current/msg08543.html
[179] Wikipedia, “Eucla, Western Australia”, [Online]. Available:
http://en.wikipedia.org/wiki/Eucla,_Western_Australia
[180] Wikipedia, “Network neutrality”, [Online]. Available:
http://en.wikipedia.org/wiki/Network_neutrality
[181] Wireshark, “Wireshark network protocol analyzer”, Wireshark design team, 2009,
[Online]. Available: http://www.wireshark.org
[182] Wroclawski, J., "The Use of RSVP with IETF Integrated Services", RFC 2210,
IETF, 1997.
[183] Wroclawski, J., "Specification of the Controlled-Load Network Element Service",
RFC 2211, IETF, 1997.
[184] Yavatkar, R.; Hoffman, D.; Bernet, Y.; Baker, F. and Speer, M., "SBM (Subnet
Bandwidth Manager): A Protocol for RSVP-based Admission Control over IEEE
802-style networks", RFC 2814, IETF, 2000.
[185] Zhang, Z., "ExtCommunity map and carry TOS value of IP header", draft-zhang-idr-bgp-extcommunity-qos-00 (work in progress), IETF, November 2005.
List of Figures
Fig. 1 IP version 4 datagram structure
Fig. 2 IPv4 address class system - [22]
Fig. 3 CIDR example network mask
Fig. 4 IP version 6 datagram structure
Fig. 5 Differentiated Services (DS) field in IPv4 and IPv6 datagram headers
Fig. 6 IP routing and forwarding functionality
Fig. 7 Internet routing hierarchy
Fig. 8 Internet routing architecture
Fig. 9 IP routing protocols – classified by applicability
Fig. 10 IP routing protocols – classified by working principle
Fig. 11 Internet Exchange Point - IXP
Fig. 12 BGP Best Path Selection Algorithm - [49]
Fig. 13 BGP message structure
Fig. 14 BGP path attribute classification [46], [161]
Fig. 15 BGP UPDATE message structure – after [157]
Fig. 16 BGP UPDATE message structure with Extended Community attribute
Fig. 17 BGP Route Reflector topology
Fig. 18 Autonomous System Confederations for BGP
Fig. 19 IP router block diagram
Fig. 20 IP router internal structure -> route processing
Fig. 21 IP router with non-blocking fabric and virtual output queues
Fig. 22 Router internal forwarding path per hop behaviour
Fig. 23 Drop-Tail queue dropping strategy
Fig. 24 Random Early Detection (RED) for congestion avoidance
Fig. 25 Longest Queue Drop (LQD) of virtually separated flows
Fig. 26 Round Robin scheduling
Fig. 27 Strict Priority scheduling
Fig. 28 Weighted Round Robin scheduling
Fig. 29 Symbolized fair queuing in an idealized GPS = Fluid-Flow Queuing
Fig. 30 Fluid-flow approximated queuing in WFQ
Fig. 31 VoQ with 8 classes CoS support (scheduling and dropping)
Fig. 32 Per hop forwarding behaviour composition in relaying nodes
Fig. 33 Leaky bucket algorithm
Fig. 34 Token bucket algorithm
Fig. 35 QoS-based IP lookup variants
Fig. 36 Best Effort interconnection example
Fig. 37 QoS-based forwarding interconnection example
Fig. 38 QoS-based path selection in BGP
Fig. 39 QoS-based routing interconnection example
Fig. 40 Tunnelling scope
Fig. 41 QoS-based tunnelling interconnection example
Fig. 42 Differentiated Services regions, domains and nodes
Fig. 43 Behaviour aggregate classification and DSCP marking
Fig. 44 Logical View of a Packet Classifier and Traffic Conditioner
Fig. 45 PHB ↔ DSCP mapping
Fig. 46 PHB encoding [31]
Fig. 47 Encoding of Assured Forwarding PHBs
Fig. 48 RSVP flow descriptor structure
Fig. 49 RSVP message flow diagram
Fig. 50 RSVP support block diagram – after [39]
Fig. 51 Cisco’s two RSVP operation models: IntServ and IntServ/DiffServ [53]
Fig. 52 Ethernet frame format
Fig. 53 IEEE 802.1p User Priority marking in 802.1q (VLAN) tagged frames
Fig. 54 VLAN Cross Connect / VLAN XC [123]
Fig. 55 Q-in-Q / stacked VLAN / Provider Bridges - IEEE 802.1ad [100]
Fig. 56 MAC-in-MAC / Provider Backbone Bridges (PBB) – IEEE 802.1ah [101]
Fig. 57 Priority Flow Control [56]
Fig. 58 Congestion spreading [103]
Fig. 59 MPLS shim header structure and hierarchy usage
Fig. 60 MPLS Label stack structure
Fig. 61 MPLS LSP signalling: contiguous, nested, stitched
Fig. 62 GMPLS label representations
Fig. 63 GMPLS LSP hierarchy
Fig. 64 ATM cell structure
Fig. 65 Functional layering structure for the Ethernet data service [111]
Fig. 66 AS interconnection options
Fig. 67 DE-CIX topology 2009 [61]
Fig. 68 Internet hierarchy
Fig. 69 Route Flap Dampening [167]
Fig. 70
Fig. 71
Fig. 72
Fig. 73
Fig. 74
Fig. 75
Fig. 76
Fig. 77
Fig. 78
Fig. 79
Fig. 80
Fig. 81
Fig. 82
Fig. 83
Fig. 84
Fig. 85
Fig. 86
Fig. 87
Fig. 88
Fig. 89
Fig. 90
Fig. 91
Fig. 92
Fig. 93
Fig. 94
Fig. 95
Fig. 96
MESCAL - Cascaded Approach [139] .................................................................................................... 79
Components of a NSIS node - [80].......................................................................................................... 83
GIST protocol change to “Experimental“ status [164].......................................................................... 83
GIST protocol objections explained by Ross Callon [43] ..................................................................... 84
PCN working principle - [136]................................................................................................................... 85
DE-CIX yearly traffic graph - [62]............................................................................................................. 86
Cross-Domain CoS marking concept...................................................................................................... 90
IANA registry for BGP Extended Community type numbers ............................................................... 91
BGP Extended Community Attribute structure with type 0x40 or 0x44.............................................. 91
Structure of the QoS Marking Community.............................................................................................. 92
CoS enabled AS interconnection example topology............................................................................. 94
QoS Marking Extended Community signalling example ...................................................................... 96
Class overload limitation concept ............................................................................................................ 97
CoS Capability Extended Community Structure.................................................................................... 98
Per-Hop-Behaviour Identification Codes implied by CoS Capability .................................................. 98
CoS Parameter Attribute structure .......................................................................................................... 99
Classification of the Mapping scope...................................................................................................... 101
User/Subscriber Service Classes Grouping - [16]............................................................................... 105
Service Class Characteristics - [16] ...................................................................................................... 106
DSCP to Service Class Mapping - [16]................................................................................................. 107
QoS Mechanisms Used for Each Service Class - [16] ....................................................................... 107
Treatment Aggregate / Service Class Performance Requirements - [45] ....................................... 108
Treatment Aggregate Behaviour - [45].................................................................................................. 109
MPLS E-LSP mapping of Treatment Aggregates - [45] ..................................................................... 109
QoS Class Mapping framework - [150]................................................................................................. 111
Mandatory L-LSP encoding rules - [75] ................................................................................................ 113
Scenario 1: single node interconnection............................................................................................... 118
Fig. 97  S1: 9-f-cbwfq
Fig. 98  S1: 9-a-no-priority
Fig. 99  S1: 9-f-cbwfq
Fig. 100  S1: 9-a-no-priority
Fig. 101  S1: 9-f-strict-priority
Fig. 102  Scenario 2: AS Interconnection – Single AS
Fig. 103  S2: 9-f-cbwfq
Fig. 104  S2: 9-a-no-priority
Fig. 105  S2: 9-b-cbwfq
Fig. 106  S2: 9-e-cbwfq
Fig. 107  Scenario 3: AS interconnection – Multi-AS
Fig. 108  S3: 9-f-cbwfq
Fig. 109  S3: 9-a-no-priority
Fig. 110  Scenario 4: AS interconnection – Multi-AS 2
Fig. 111  S4: CBWFQ / no priority
Fig. 112  S4: no priority / CBWFQ
Fig. 113  Scenario 5: AS interconnection – Multi-AS 3
Fig. 114  S5: 2x CBWFQ / no priority
Fig. 115  S5: no priority / 2x CBWFQ
Fig. 116  S5: 2x CBWFQ / EF&BE
Fig. 117  S5: EF&BE / 2x CBWFQ
Fig. 118  Scenario 6: AS interconnection – Multi-AS 4
Fig. 119  S6: 2 classes w/o remark
Fig. 120  S6: 2 classes with remark
Fig. 121  S6: 1 class w/o remarking
Fig. 122  S6: 1 class with remarking
Fig. 123  Scenario 7: AS interconnection – Cross-Layer
Fig. 124  S7: with Ethernet QoS
Fig. 125  S7: without Ethernet QoS
Fig. 126  Single node structure with token bucket filtering
Fig. 127  Single node TB 4->4
Fig. 128  Single node TB 4->3
Fig. 129  Single node TB 4->2
Fig. 130  Single node TB 4->1
Fig. 131  Quagga Routing Suite structure
Fig. 132  Example setup for 4 QoS Marking Ex. Communities for IP-DiffServ
Fig. 133  Example setup for a CoS Capability Ex. Community
Fig. 134  Example setup for a CoS Parameter Attribute
Fig. 135  Wireshark screenshot with captured Extended Communities
Fig. 136  Reception example of Extended Communities in commercial routers
Fig. 137  Decoding result of the online form
Fig. 138  Implementation test setup
Fig. 139  Router 1: show ip bgp sum – full feed
Fig. 140  Router 1: show ip bgp sum – single prefix with 4 communities
Fig. 141  Router 1: show ip bgp sum – 10 prefixes with 4 communities
Fig. 142  Router 1: show ip bgp sum – full feed with single community attached to all
Fig. 143  Completely processed Extended Community attribute example
Fig. 144  VLAN User Priority test at DE-CIX [144]
Fig. 145  Major European IXPs with VLAN User Priority support
Fig. 146  Active BGP entries over time [year] - [93]
Fig. 147  Unique ASes over time [year] - [93]
Fig. 148  Hourly Average of Updated and Withdrawn Prefix Rate - [93]
Fig. 149  CoS signalling UPDATE message overhead – single prefix case
Fig. 150  Wireshark screenshot with 173 Extended Communities UPDATE
Fig. 151  Structure of the QoS Marking Community
Fig. 152  Memory usage estimates for up to 8 classes and four technologies
Fig. 153  Memory usage for ext. communities sent within one UPDATE message
Fig. 154  Memory usage for large quantities of sent extended communities
Fig. 155  Linux remote control of existing commercial AS border router
List of Tables
Table 1  Transfer demand matrix – after [79]
Table 2  Assured Forwarding DSCP encoding
Table 3  Currently specified PHBs
Table 4  Excerpt of IP QoS class definitions and performance objectives [108]
Table 5  Ethernet traffic types [97]
Table 6  Mapping of traffic types to available queues [97]
Table 7  Chemnitz University applied Ethernet-priority-to-DSCP mapping
Table 8  UMTS QoS classes [1]
Table 9  UMTS Bearer Service Attributes [1]
Table 10  LTE QoS class attributes [2]
Table 11  Overview of available layer 2 and 3 quality of service classes
Table 12  Technology Type Enumeration
Table 13  CoS Capability Attribute – binary class encoding
Table 14  Queue mapping reuse for priority mapping
Table 15  Cisco’s default CoS-to-DSCP mapping [55]
Table 16  Cisco’s default DSCP-to-CoS mapping [55]
Table 17  Chemnitz University applied CoS-to-DSCP mapping
Table 18  Traffic source configuration parameters
Table 19  Class and traffic type variations in simulations
Table 20  Simulation parameter settings
Table 21  Extended command line syntax for CoS configurations
Declaration
I hereby declare that I have prepared the present work without impermissible help from third parties and without using any aids other than those stated; ideas taken directly or indirectly from other sources are identified as such.
In selecting and evaluating the material and in preparing the manuscript, I received support from the following persons:
Prof. Dr.-Ing. Thomas Bauschert
Prof. Dr.-Ing. habil. Klaus Franke
Simon Ehnert
Daniel Manns
Uwe Steglich
Brian Schaefer
No other persons were involved in writing the present work. I have not used the services of a doctoral consultant. No other persons have received any monetary benefits from me for work connected with the content of the submitted dissertation.
The work has not previously been submitted in the same or a similar form to any other examination authority, either in Germany or abroad.
Chemnitz, 17.11.2009
Place, date
Signature
Theses
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
The Internet has become increasingly popular in recent years and has a
steadily growing user base. The resulting traffic load, especially due to rapidly increasing Internet access speeds, will lead to high traffic volumes in
the core of the network.
The use of time- and loss-critical services such as voice over IP (VoIP), video streaming (IPTV) and online gaming across the Internet is rising, together with high user expectations of service quality. This will inevitably require quality of service (QoS) handling procedures in provider networks.
The current Internet consists of about 30,000 interconnected service provider networks. These interconnections are based on the Internet Protocol (IP) and do not distinguish between the different traffic types mixed within the transported traffic load.
Link capacity over-provisioning is the easiest and most sustainable way to
provide high quality of transmission service and will always be used.
Over-provisioning of interconnection links results in low link capacity utilization and, given the traffic growth rate, in frequent speed upgrades. The resulting hardware upgrade cost for faster router interfaces will become a financial burden for service providers who rely on over-provisioning alone in their network operation.
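The link between traffic growth, the utilization threshold that triggers an upgrade and the resulting upgrade frequency can be made concrete with a little arithmetic. The Python sketch below assumes exponential traffic growth at a constant yearly rate g; the current load, the 40 % growth rate and the 50 % and 80 % upgrade thresholds are hypothetical values, not results of this thesis. It estimates how much later an interface upgrade becomes necessary if the upgrade trigger can be raised, for example because class separation protects high-priority traffic at higher load.

```python
import math


def years_until(utilization: float, threshold: float, annual_growth: float) -> float:
    """Years until a link now loaded at `utilization` reaches `threshold`,
    assuming traffic is multiplied by (1 + annual_growth) every year."""
    if utilization >= threshold:
        return 0.0
    return math.log(threshold / utilization) / math.log(1.0 + annual_growth)


def upgrade_deferral(utilization: float, be_threshold: float,
                     cos_threshold: float, annual_growth: float) -> float:
    """Extra years gained when the upgrade trigger moves from a BE-only
    threshold to a higher, class-based threshold (hypothetical model)."""
    return (years_until(utilization, cos_threshold, annual_growth)
            - years_until(utilization, be_threshold, annual_growth))


# Hypothetical values: 30 % load today, 40 % traffic growth per year,
# upgrade at 50 % (BE only) versus 80 % (class-based over-provisioning).
gain = upgrade_deferral(0.30, 0.50, 0.80, 0.40)
print(f"upgrade deferred by {gain:.2f} years")
```

Because both thresholds are reached from the same starting load, the gain reduces to ln(u_CoS / u_BE) / ln(1 + g) and does not depend on today's utilization, as long as that utilization is below both thresholds.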
Today, service providers already use the QoS concept of “Differentiated Services (DiffServ)” within their network domains, the Autonomous Systems (ASes). Its deployment is increasing, and it is expected to become universally available within ASes.
The Internet’s default packet forwarding behaviour, Best Effort (BE), will
not be sufficient in the future on interconnection links.
AS interconnections need to support at least simple traffic separation with separate traffic queues to provide enhanced interconnection quality.
The setup and operation of AS interconnections is a fundamental element of any provider’s network. Any inter-domain QoS solution will therefore need to be simple to gain community acceptance.
This thesis identified two fundamental design requirements for a simple QoS concept: simplicity in design and simplicity in QoS support.
QoS in this approach therefore refers to primitive traffic separation into several classes, which experience differently prioritized forwarding behaviour in relaying nodes. The aim is to enqueue each class in a separate queue.
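Primitive traffic separation with one queue per class can be sketched as follows. The Python sketch assumes the four-class set EF, one AF group (AF1x), BE and LE, using the DSCP values 46 (EF, RFC 3246), 10/12/14 (AF11 to AF13, RFC 2597), 0 (default forwarding) and 1 (LE, RFC 8622, which postdates this thesis). Mapping unknown DSCPs to BE and serving the queues with strict priority are assumptions of the sketch; the simulations also consider class-based weighted fair queueing (CBWFQ), which avoids starving the lower classes.

```python
from collections import deque

# Assumed class set, listed from highest to lowest priority.
PRIORITY_ORDER = ["EF", "AF", "BE", "LE"]

# Assumed DSCP-to-class mapping; any other DSCP falls back to Best Effort.
DSCP_TO_CLASS = {46: "EF", 10: "AF", 12: "AF", 14: "AF", 0: "BE", 1: "LE"}


class StrictPriorityScheduler:
    """One FIFO queue per class; dequeue always serves the highest non-empty class."""

    def __init__(self):
        self.queues = {cls: deque() for cls in PRIORITY_ORDER}

    def enqueue(self, packet_id, dscp):
        """Classify a packet by its DSCP and append it to its class queue."""
        cls = DSCP_TO_CLASS.get(dscp, "BE")
        self.queues[cls].append(packet_id)
        return cls

    def dequeue(self):
        """Return the next packet to send, or None if all queues are empty."""
        for cls in PRIORITY_ORDER:
            if self.queues[cls]:
                return self.queues[cls].popleft()
        return None


scheduler = StrictPriorityScheduler()
for pid, dscp in [("p1", 0), ("p2", 1), ("p3", 46), ("p4", 12), ("p5", 63)]:
    scheduler.enqueue(pid, dscp)
print([scheduler.dequeue() for _ in range(5)])
```

With these inputs the packets leave in the order p3, p4, p1, p5, p2: the EF packet is served first, the LE packet last, and the unknown DSCP 63 is handled as BE.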
Link capacity over-provisioning, which is always performed, combined with a simple inter-domain QoS concept that separates traffic classes, will enable class-based over-provisioned interconnections.
The signalling of available traffic classes is required; mutual solutions based on Service Level Agreements (SLAs) can be set up manually. However,
the newly designed class of service (CoS) signalling procedures will inform all globally interconnected service providers about the available traffic separation support and will automate the setup of CoS-enabled AS interconnections.
15. Because CoS support is currently missing, service providers often perform multi-layer ingress classification on incoming traffic to make a good guess about which traffic is entering the domain. This costly classification procedure can be dropped if the new CoS signalling announces the available class sets and provides inter-domain mapping information.
16. Simulations have shown that even the interconnection of differently configured CoS-enabled ASes considerably increases the amount of high-priority traffic successfully transferred across a chain of several transit ASes, compared with BE-only interconnections.
17. Talks with service providers have revealed a strong demand for “Lower Effort (LE)” class of service support. De-prioritization by means of a signalled LE CoS is included in the concept’s specification. Supporting BE and LE traffic is the simplest class set combination recommended in the specification.
18. The simple CoS support concept claims that supporting the Expedited Forwarding (EF), one commonly used Assured Forwarding (AF) group, BE and LE traffic classes will suffice at AS interconnections for most service providers. The goal is a global Internet with generally available 2-class or recommended 4-class CoS support.
19. The advertised availability of higher priority traffic classes may lead to misuse. Furthermore, high class forwarding quality can only be provided for a limited share of the link capacity. Therefore, class-overload protection is required; the new concept provides it as an option.
20. Class of service support allows higher link utilization without noticeable service degradation. The concept thus allows interface speed upgrades to be postponed until a higher utilization threshold is crossed. This deferral is expected to deliver an easily achievable economic benefit to service providers.
21. Quality of service support is not confined to the IP layer but is offered by several packet-based networking technologies. Multi-protocol Label Switching (MPLS) and Ethernet with virtual LAN (local area network) support are the two most common QoS-capable tunnelling technologies for IP transport. Harmonizing IP QoS with lower layer QoS is essential and is handled manually in the intra-domain case.
22. Inter-domain signalling of cross-layer QoS support is a novel feature provided by the new CoS concept. Even emerging tunnelled interconnections can thus be harmonized automatically.
23. Simulations have shown that preserving class of service markings is vital for successful traffic separation. CoS domains with very limited class support that remark packets passing through them effectively destroy the separation along the remaining AS forwarding chain, so finer-grained class sets further downstream can no longer be used for separation.
24. The concept strongly recommends transparent transport of customer traffic. Traffic encapsulation and tunnelled transport preserve the markings automatically.
25. Ethernet- and MPLS-based inter-domain tunnelling is currently emerging. The new CoS concept already provides the signalling means for harmonized CoS interconnection.
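The simulation finding on marking preservation, and the resulting recommendation of transparent transport, can be illustrated with a toy model of an AS chain. In the Python sketch below, each AS is represented only by the set of classes it supports. The remarking rule (map to the highest supported class that is not above the incoming one) and the helper names are hypothetical choices for illustration, not the policy used in the thesis' simulations; `preserve_marking=True` stands in for encapsulated or tunnelled transport.

```python
# Classes ordered from highest to lowest priority (assumed class set).
ORDER = ["EF", "AF", "BE", "LE"]
FULL = set(ORDER)


def remark(cls: str, supported: set) -> str:
    """Hypothetical remarking policy: map a class to the highest supported
    class that is not above it; LE falls back to BE if LE is unsupported.
    Every domain is assumed to support BE."""
    for candidate in ORDER[ORDER.index(cls):]:
        if candidate in supported:
            return candidate
    return "BE"


def traverse(cls: str, chain: list, preserve_marking: bool = False) -> list:
    """Class applied by each AS along `chain` (a list of supported-class sets).
    Without preservation, every AS overwrites the packet marking with its local
    class; with preservation the original marking is only mapped locally."""
    marking = cls
    applied = []
    for supported in chain:
        local = remark(marking, supported)
        applied.append(local)
        if not preserve_marking:
            marking = local
    return applied


chain = [FULL, {"BE", "LE"}, FULL]  # middle AS offers only BE and LE
for c in ("EF", "AF"):
    print(c, traverse(c, chain), traverse(c, chain, preserve_marking=True))
```

Without preservation, both EF and AF traffic leave the middle AS as BE, so the third AS applies BE to both and can no longer tell them apart; with preserved markings it applies EF and AF again.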
26. For the practical use of the new concept, a Linux implementation, an implementation in the official release of the network analysis tool Wireshark and an online form for decoding raw signalling data are available.
27. Linux remote control of commercial routers via command line sessions is planned as an intermediate solution for deploying the concept with legacy routing equipment. The transitive design of the required signalling elements allows passive, bidirectional relaying of the signalling through existing routers without any hardware or software updates.
28. The concept is expected to be integrated into commercial equipment owing to its simplicity and ease of implementation.
29. The current discussion about network neutrality reveals a fundamental objection to any traffic separation scheme. However, because the new concept is universally applicable and CoS support thus becomes generally available to all network users, the concept is likely to be regarded as non-discriminating.
30. Further techno-economic studies of the concept’s cost reduction potential are needed to guide decisions on device upgrades and CoS deployment.
31. The company Google has recently proposed a new BGP Community based sign-up procedure for new services and concepts. Depending on the outcome, this CoS support concept could even serve as a contractual basis for inter-provider class of service support agreements.
Curriculum Vitae