Core Router Testing for High Availability
Scott Poretsky
Avici Systems, Inc.
June 3, 2002
Avici Company Confidential
Reliable Routing for the Internet
Outline



IP Network Availability
Test Coverage for 99.999% Availability
Commercial Test Equipment Requirements
Architecture for the 21st Century Network
IP Network Availability
High Reliability = More Revenue
Reliability is the single biggest criterion in selecting an ISP, according to Interactive Week/Telechoice.
[Figure: ISP Customer Survey. Bar chart of Relative Importance (axis from 4.0 to 4.8) for Reliability, Value, Performance, Customer Service, and Provisioning Speed.]
New IP services demand higher levels of network reliability.
High Reliability = More Profit
Compensation for poor router reliability through redundancy and interconnects can increase network cost by up to 50%.
[Figure: IP Backbone with Service Provider Peers (Peering), Core Layer (Backbone Router), Aggregation Layer (Hub Router), Edge Layer, and Access Devices. Device labels: DSLAM, L3/4 Switch, CMTS, GGSN, VOIP, Direct Connects.]
Definitions

Reliable: Capable of being dependable (Webster)
Availability: Measure of Reliability using router/switch Uptime
Mission Reliability: Mean Time Between Critical Failures (MTBCF), or the average time between hardware or software failures that interrupt service (the mission)
Maintenance Reliability: Mean Time Between Failures (MTBF), or the average time between hardware failures that require corrective maintenance actions
Defects Per Million (DPM): Measure of downtime equal to (1 - Availability) x 10^6
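The DPM definition can be worked through numerically. The sketch below is illustrative only: it assumes availability is expressed as a fraction (0.99999 for 99.999%), and the helper names and the downtime-per-year conversion are additions that do not appear on the slide.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def defects_per_million(availability: float) -> float:
    """DPM = (1 - Availability) x 10^6, with availability as a fraction."""
    return (1.0 - availability) * 1_000_000


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [("99.9%", 0.999), ("99.999%", 0.99999), ("99.9999%", 0.999999)]:
        print(f"{label}: DPM = {defects_per_million(a):.0f}, "
              f"downtime = {downtime_minutes_per_year(a):.2f} min/year")
```

Under these assumptions, 99.9% availability corresponds to about 1,000 DPM and 99.999% to about 10 DPM, which is the gap the rest of the deck sets out to close.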
Contributing Factors for Availability
Total Time to Restore Router/Switch After a Software Crash (Mission Reliability)
Timeline (not to scale): Software Failure Occurs, Dump Time, Image Upgrade Time, Boot Time, Protocol Convergence Time, Full Operation Restored
Total Time to Restore a Module After a Hardware Failure (Maintenance Reliability)
Timeline (not to scale): Hardware Failure Occurs, Maintainer Response Time, Removal and Replacement Time, Boot Time, Protocol Convergence Time, Full Operation Restored
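To see how the restore-time stages feed into availability, the sketch below sums the stages and applies the standard steady-state relation availability = uptime / (uptime + restore time). The slides do not give this formula or any stage durations, so the function names and the numbers in the usage example are hypothetical.

```python
from typing import Mapping


def total_restore_minutes(stages: Mapping[str, float]) -> float:
    """Sum the duration (in minutes) of every stage between failure and full operation."""
    return sum(stages.values())


def availability(mtbcf_hours: float, restore_minutes: float) -> float:
    """Steady-state availability from mean time between critical failures and restore time."""
    uptime_minutes = mtbcf_hours * 60.0
    return uptime_minutes / (uptime_minutes + restore_minutes)


if __name__ == "__main__":
    # Hypothetical stage durations for a software failure, in minutes.
    software_restore = {
        "dump": 2.0,
        "boot": 5.0,
        "protocol_convergence": 3.0,
    }
    restore = total_restore_minutes(software_restore)
    # Hypothetical MTBCF of six months (4,380 hours).
    print(f"restore = {restore} min, availability = {availability(4380, restore):.6f}")
```

The relation makes the two levers explicit: availability improves either by lengthening the time between critical failures or by shortening any stage of the restore timeline.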
The Availability Goal



The Goal – 99.999% Router Availability
The Reality – 99.9% Router Availability
Features to achieve 99.999% availability.




Non-Stop Routing
Graceful Restart
What if testing could improve Mission Reliability to achieve 99.999% Availability in the absence of new features?
What if the addition of these new features would then achieve 99.9999% Availability?
Test Coverage
Traditional Test Coverage

Isolated testing of protocols






Functionality
Conformance
Interoperability
Scaling
Forwarding Performance in the absence of protocols.
Disadvantages



Operational environment is not tested
Operational conditions are not tested
The router under test is not completely stressed.
Deployed routers run multiple protocols
simultaneously.
Test Program for 99.999% Availability





Stress Testing
Longevity Testing
Convergence Testing
Network-Specific Topology Testing
Automated Regression Testing
Stress Testing

Simultaneous configuration and scaling of multiple protocols
  - BGP, IGP
  - MPLS-TE, LDP (optional)
  - MBGP, PIM-SM, MSDP (optional)
Traffic Forwarding
  - Line Rate Traffic Forwarding
  - Overutilize links
  - Enable QoS
Network Instability
  - Repeated Route Flaps
  - Link Loss
  - Tunnel Reroutes (optional)
Serviceability
  - Repeated SNMP Gets
  - Logging Enabled
  - Debug Enabled
  - Telnet with SHOW commands (stressful and invalid)
Stress Configuration
[Diagram: Router Under Test, two Neighbor Routers, an optional Neighbor Router for Tunnel Reroutes, and three Test Equipment units.]
Stress Execution Guidelines





Configure ECMP, Parallel Paths, and Composite
Links between routers
Use Live BGP Feed for Route Table
Mix traffic types across links (IP Unicast, IP Multicast,
MPLS)
One neighbor router should be a different vendor to
show interoperability under stress
Run Stress for many days (if the router lasts that
long)
The router should experience more in a couple of days than it likely would in its operational lifetime.
Typical Stress Metrics






Flap 1 million BGP routes per hour
Forward 10 Terabits of data per hour
Perform 100,000 SNMP Gets per hour
Simulate 100 fiber cuts per hour (use every remote
interface)
Along with
  - Full BGP Table
  - Full IGP Table
  - Full Multicast Cache
  - Required MPLS-TE Tunnels (protection optional)
  - Required LDP FECs
Enable Logging and Protocol Debug
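The hourly stress targets are easier to compare with test equipment capabilities once converted to sustained per-second rates. The sketch below does that conversion; it assumes 1 Terabit = 10^12 bits, and the dictionary keys and helper names are illustrative rather than taken from the slides.

```python
SECONDS_PER_HOUR = 3600

# Hourly targets from the stress metrics above (keys are illustrative names).
HOURLY_TARGETS = {
    "bgp_route_flaps": 1_000_000,
    "forwarded_bits": 10 * 10**12,  # 10 Terabits, assuming 1 Tb = 10^12 bits
    "snmp_gets": 100_000,
    "fiber_cuts": 100,
}


def per_second(hourly: float) -> float:
    """Average sustained rate per second for an hourly target."""
    return hourly / SECONDS_PER_HOUR


def seconds_between_events(hourly: float) -> float:
    """Average spacing between events for an hourly target."""
    return SECONDS_PER_HOUR / hourly


if __name__ == "__main__":
    for name, hourly in HOURLY_TARGETS.items():
        print(f"{name}: {per_second(hourly):,.1f} per second")
    print(f"one fiber cut every {seconds_between_events(HOURLY_TARGETS['fiber_cuts']):.0f} s")
```

Read this way, the targets amount to a few hundred route flaps per second, a few gigabits per second of forwarded traffic, tens of SNMP Gets per second, and a simulated fiber cut roughly every half minute, all running concurrently.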
Longevity Testing

Similar to Stress Testing, but more operational (less
stressful) conditions injected over many weeks.





Simultaneous configuration and scaling of multiple protocols
Traffic Forwarding
More realistic Network Instability
More typical Serviceability actions
Use Live Internet feed.
Convergence Terms

Network Convergence: The point in time at which all nodes in a network have updated their routing tables for a route entry change (new, withdrawal, or modification).

Protocol Convergence: The point in time at which a single node updates its routing table and advertises the route table change to its peer in a routing protocol advertisement (or update) message.

Route Convergence: The point in time at which a single node updates its routing table and reroutes traffic out the new interface.
Route Convergence is the common Router Benchmark.
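The slides do not say how Route Convergence is measured. One widely used approach, assumed here, is to send traffic at a constant rate through the router, trigger the route change, and derive the convergence time from the number of packets lost before traffic resumes on the new interface. The function name and parameters below are illustrative.

```python
def convergence_time_ms(packets_lost: int, offered_rate_pps: float) -> float:
    """Estimate route convergence time from packet loss at a constant offered rate.

    Assumes all loss is caused by the convergence event and that the offered
    rate (packets per second) is constant for the duration of the test.
    """
    if offered_rate_pps <= 0:
        raise ValueError("offered_rate_pps must be positive")
    if packets_lost < 0:
        raise ValueError("packets_lost must not be negative")
    return packets_lost / offered_rate_pps * 1000.0


if __name__ == "__main__":
    # Hypothetical run: 50,000 packets lost at 100,000 packets per second.
    print(f"{convergence_time_ms(50_000, 100_000):.1f} ms")
```

The estimate is only as good as its assumptions: loss from congestion or QoS drops during the same interval would inflate the result, so the measurement is usually taken with the links otherwise uncongested.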
Convergence Test Issues

Large number of Protocols in which Convergence is
important.

Number of conditions that can impact results.

Technical difficulty in testing convergence of one
protocol due to flap or instability of another protocol.
Convergence Test Conditions

Interface shutdown
  - on Local Interface
  - on Remote Interface
Fiber Pull
  - on Local Interface
  - on Remote Interface
Peer removal via CLI
  - on Local router
  - on Peer router
Peer node failure
Route Table changes
  - Route Withdrawal
  - Route Flap
  - Next-Hop Change
  - Metric Change
  - Dynamic Constraint Change
  - Policy Change
All conditions must be tested because different results can be produced.
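Because every condition must be tested, it helps to enumerate the full set explicitly so that none is skipped. The sketch below builds the list of convergence test cases from the conditions on this slide; the grouping into dictionaries and the case-name format are illustrative choices.

```python
# Conditions that are tested at two locations each.
LOCATION_CONDITIONS = {
    "Interface shutdown": ["Local Interface", "Remote Interface"],
    "Fiber Pull": ["Local Interface", "Remote Interface"],
    "Peer removal via CLI": ["Local router", "Peer router"],
}

# Conditions with no location variants.
STANDALONE_CONDITIONS = ["Peer node failure"]

ROUTE_TABLE_CHANGES = [
    "Route Withdrawal",
    "Route Flap",
    "Next-Hop Change",
    "Metric Change",
    "Dynamic Constraint Change",
    "Policy Change",
]


def convergence_test_cases() -> list:
    """Return one test-case name per condition and location combination."""
    cases = [
        f"{condition} on {location}"
        for condition, locations in LOCATION_CONDITIONS.items()
        for location in locations
    ]
    cases += STANDALONE_CONDITIONS
    cases += [f"Route Table change: {change}" for change in ROUTE_TABLE_CHANGES]
    return cases


if __name__ == "__main__":
    for case in convergence_test_cases():
        print(case)
```

Each of these cases would then be repeated per protocol of interest, which is one reason the number of convergence tests grows quickly.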
Network-Specific Topology Testing




Large network with many routers (e.g. 10)
Use multiple vendors for interoperability/functionality
testing.
Multiple protocols configured in deployment scenario
Run test cases to match deployment scenario
Automated Regression Testing






Addition of bug fixes/new features puts previously working features at risk.
Regression testing ensures that the previously working features still work.
As the number of releases with new features grows, it becomes more difficult to provide complete regression coverage through manual testing (increasingly labor intensive).
Automated regression testing enables more coverage in less time.
Automation is typically achieved using TCL scripts.
Configuration:
[Diagram: Test Equipment and the Router Under Test.]
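The core of an automated regression run is a loop that executes every test case and flags the ones that passed in an earlier release but fail now. The slides note that this is typically scripted in TCL; the sketch below uses Python purely for illustration, and the function name, the test-case names, and the placeholder checks are hypothetical. Real checks would drive the test equipment and query the router under test.

```python
from typing import Callable, Dict, List


def run_regression(test_cases: Dict[str, Callable[[], bool]],
                   previously_passing: List[str]) -> List[str]:
    """Run every test case and return the names that regressed.

    A regression is a test that passed in an earlier release but now
    returns a false result or raises an exception.
    """
    regressions = []
    for name, check in test_cases.items():
        try:
            passed = bool(check())
        except Exception:
            # A crashing test counts as a failure, not as a skipped test.
            passed = False
        if not passed and name in previously_passing:
            regressions.append(name)
    return regressions


if __name__ == "__main__":
    # Placeholder checks standing in for real protocol test scripts.
    cases = {
        "bgp_session_establish": lambda: True,
        "ospf_adjacency": lambda: False,
        "new_feature_under_development": lambda: False,
    }
    baseline = ["bgp_session_establish", "ospf_adjacency"]
    print(run_regression(cases, baseline))
```

Keeping the baseline of previously passing tests separate from the test list lets failures in brand-new features be reported without being counted as regressions.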
Commercial Test Equipment Requirements
The State of the Union



Test Equipment fails to meet today’s requirements for
testing 99.999% Availability.
Router vendors have been forced to develop their own
specialized test tools.
Carriers have been forced to use the router vendor test
tools.
Test Equipment vendors must respond to the challenge
today.
Stress Testing Requirements







Maintain BGP Sessions and IGP Adjacencies
Flap BGP Routes
Signal and maintain RSVP-TE tunnels
Distribute LDP FECs
Signal and maintain Multicast Groups
Perform SNMP GETs and check validity
Forward Traffic (IP Unicast, IP Multicast, and MPLS)
Make the network seem much bigger than it really is
without having to obtain hundreds of routers.
Required Protocol Emulation/Conformance Suites Coverage

Routing Protocols
  - BGP
  - OSPF, ISIS
  - OSPF-TE, ISIS-TE
RSVP-TE
  - Fast Reroute
  - Standby Tunnels
  - Ingress, Mid-Point, Egress
LDP
  - RFC 2547 Layer 3 VPNs
  - Martini Layer 2 VPNs
  - P and PE
  - LDP over RSVP
Multicast
  - MBGP
  - PIM-SM
  - MSDP
Protocol Emulation Requirements






Run any protocols in combination on the same interface
Forward traffic for emulated protocols
Protocol Emulation on any interface type – GigE,
10GigE, and POS (including 192c).
Scaling
  - BGP Sessions: >500/system, >100/interface
  - BGP Routes: >3M/system, >500K/session
  - MPLS-TE Tunnels: >10K (Ingress, Mid-Point, Egress)
  - FECs: >10K
Load external BGP table for advertisement
Controlled BGP Route Flapping
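The scaling thresholds lend themselves to a simple checklist when evaluating a candidate tester. The sketch below encodes them and reports which ones a given set of capabilities fails to exceed; the key names, the capability dictionary format, and the example figures are hypothetical.

```python
# Scaling thresholds from the requirements above; each must be exceeded.
SCALING_MINIMUMS = {
    "bgp_sessions_per_system": 500,
    "bgp_sessions_per_interface": 100,
    "bgp_routes_per_system": 3_000_000,
    "bgp_routes_per_session": 500_000,
    "mpls_te_tunnels": 10_000,
    "fecs": 10_000,
}


def scaling_shortfalls(capabilities: dict) -> list:
    """Return the requirement names that the capabilities do not strictly exceed.

    A capability missing from the input is treated as zero.
    """
    return [
        name
        for name, minimum in SCALING_MINIMUMS.items()
        if capabilities.get(name, 0) <= minimum
    ]


if __name__ == "__main__":
    # Hypothetical capability figures for a tester under evaluation.
    candidate = {
        "bgp_sessions_per_system": 1000,
        "bgp_sessions_per_interface": 128,
        "bgp_routes_per_system": 1_000_000,
        "bgp_routes_per_session": 600_000,
        "mpls_te_tunnels": 20_000,
        "fecs": 20_000,
    }
    print(scaling_shortfalls(candidate))
```

The comparison is strict because the slide states each requirement as "greater than" the listed figure.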
Automated Regression Requirements

Commercial test equipment vendors offer protocol conformance TCL suites.
  - Test Case coverage must be improved within each suite
  - Interaction between protocols must be tested
  - Need each script to test multiple interfaces (4 or more)

Full Protocol Coverage
  - Multicast protocols have been the “forgotten son”
System Requirements







Multiple ports per chassis (>32)
Automated Convergence measurement
Automated reroute/failover measurement
Support for ECMP and Composite Links
System/Protocol Stability For Many Days
Ability to store GUI configuration for repeatability.
Ability to TCL script any GUI test case.