Avici's Test Program for High Quality/Reliability

Core Router Testing for High Availability
Scott Poretsky
Avici Systems, Inc.
June 3, 2002
Avici Company Confidential
Reliable Routing for the Internet
Architecture for the 21st Century Network

Outline
IP Network Availability
Test Coverage for 99.999% Availability
Commercial Test Equipment Requirements

IP Network Availability

High Reliability = More Revenue
Reliability is the single biggest criterion in selecting an ISP, according to the Interactive Week/Telechoice ISP customer survey.
[Chart: Relative Importance, ISP Customer Survey (scale 4.0 to 4.8) for Reliability, Value, Performance, Customer Service, and Provisioning Speed]
New IP services demand higher levels of network reliability.

High Reliability = More Profit
Compensating for poor router reliability through redundancy and interconnects can increase network cost by up to 50%.
[Diagram: IP backbone hierarchy. Core Layer (Backbone Router) with service provider peers and peering peers; Aggregation Layer (Hub Router); Edge Layer (DSLAM, CMTS, GGSN, L3/4 switches, VoIP, direct connects); access devices]

Definitions
Reliable: capable of being dependable (Webster).
Availability: a measure of reliability based on router/switch uptime.
Mission Reliability: Mean Time Between Critical Failures (MTBCF), the average time between hardware or software failures that interrupt service (the mission).
Maintenance Reliability: Mean Time Between Failures (MTBF), the average time between hardware failures that require corrective maintenance actions.
Defects Per Million (DPM): a measure of downtime, equal to (1 − Availability) × 10^6.

Contributing Factors for Availability
[Figure, not to scale: total time to restore a router/switch after a software crash (Mission Reliability): software failure occurs → dump time → image upgrade time → boot time → protocol convergence time → full operation restored]
[Figure, not to scale: total time to restore a module after a hardware failure (Maintenance Reliability): hardware failure occurs → maintainer response time → failure removal and replacement time → boot time → protocol convergence time → full operation restored]

The Availability Goal
The goal: 99.999% router availability.
The reality: 99.9% router availability.
Features to achieve 99.999% availability: Non-Stop Routing and Graceful Restart.
What if testing could improve Mission Reliability enough to achieve 99.999% availability in the absence of new features?
What if the addition of these new features would then achieve 99.9999% availability?

Test Coverage

Traditional Test Coverage
Isolated testing of protocols: functionality, conformance, interoperability, and scaling.
Forwarding performance in the absence of protocols.
Disadvantages: the operational environment is not tested, operational conditions are not tested, and the router under test is not completely stressed. Deployed routers run multiple protocols simultaneously.

Test Program for 99.999% Availability
Stress Testing
Longevity Testing
Convergence Testing
Network-Specific Topology Testing
Automated Regression Testing

Stress Testing
Simultaneous configuration and scaling of multiple protocols.
Traffic forwarding: line-rate traffic forwarding, over-utilized links, QoS enabled.
Network instability: BGP, IGP; MPLS-TE, LDP (optional); MBGP, PIM-SM, MSDP (optional); repeated route flaps; link loss; tunnel reroutes (optional).
Serviceability: repeated SNMP GETs, logging enabled, debug enabled, Telnet sessions issuing SHOW commands (both stressful and invalid ones).

Stress Configuration
[Diagram: the router under test connected to two neighbor routers, plus an optional neighbor router for tunnel reroutes, with three test equipment connections]

Stress Execution Guidelines
Configure ECMP, parallel paths, and composite links between routers.
Use a live BGP feed for the route table.
Mix traffic types across links (IP unicast, IP multicast, MPLS).
One neighbor router should be from a different vendor, to show interoperability under stress.
Run stress for many days (if the router lasts that long). The router should experience more in a couple of days than it likely would in its operational lifetime.

Typical Stress Metrics
Flap 1 million BGP routes per hour.
Forward 10 terabits of data per hour.
Perform 100,000 SNMP GETs per hour.
Simulate 100 fiber cuts per hour (use every remote interface).
Along with: a full BGP table, a full IGP table, a full multicast cache, the required MPLS-TE tunnels (protection optional), the required LDP FECs, and logging and protocol debug enabled.

Longevity Testing
Similar to stress testing, but with more operational (less stressful) conditions injected over many weeks:
Simultaneous configuration and scaling of multiple protocols.
Traffic forwarding: more realistic.
Network instability: more typical.
Serviceability actions.
Use a live Internet feed.
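
The hourly targets on the Typical Stress Metrics slide are easier to size against test equipment when they are expressed per second. The minimal Python sketch below does that arithmetic. It assumes decimal units (1 terabit = 10^12 bits) and events spread evenly across the hour; the slides specify neither. The names are illustrative only.

```python
SECONDS_PER_HOUR = 3600

# Hourly targets from the "Typical Stress Metrics" slide.
HOURLY_TARGETS = {
    "BGP route flaps": 1_000_000,
    "bits forwarded": 10 * 10**12,  # 10 terabits; decimal units assumed
    "SNMP GETs": 100_000,
    "fiber cuts": 100,
}


def per_second(hourly: float) -> float:
    """Average rate per second for an hourly target."""
    return hourly / SECONDS_PER_HOUR


def interval_seconds(hourly: float) -> float:
    """Spacing between events if they are spread evenly over the hour."""
    return SECONDS_PER_HOUR / hourly


if __name__ == "__main__":
    for name, hourly in HOURLY_TARGETS.items():
        print(f"{name:16} {per_second(hourly):>18,.1f} per second")
    print(f"one fiber cut every {interval_seconds(HOURLY_TARGETS['fiber cuts']):.0f} s")
```

Under these assumptions, the targets work out as follows:
- 10 terabits per hour averages about 2.78 Gb/s.
- 1 million route flaps per hour is about 278 flaps per second.
- 100 fiber cuts per hour means one cut every 36 seconds.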
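
A little arithmetic makes the DPM definition and the availability targets from the Definitions and Availability Goal slides concrete. The sketch below does three things:
- It computes DPM as (1 − Availability) × 10^6.
- It converts availability into yearly downtime, assuming a 365-day year.
- It uses the standard steady-state relation Availability = MTBCF / (MTBCF + restore time) to estimate the MTBCF needed for a given restore time.

That relation and the 10-minute restore time in the example are assumptions, not figures from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def dpm(availability: float) -> float:
    """Defects Per Million: (1 - Availability) x 10^6."""
    return (1.0 - availability) * 1e6


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime implied by an availability level."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability(mtbcf_hours: float, restore_hours: float) -> float:
    """Steady-state availability: MTBCF / (MTBCF + restore time)."""
    return mtbcf_hours / (mtbcf_hours + restore_hours)


def required_mtbcf_hours(restore_hours: float, target: float) -> float:
    """MTBCF needed to reach `target`, solving the relation above for MTBCF."""
    return restore_hours * target / (1.0 - target)


if __name__ == "__main__":
    for target in (0.999, 0.99999, 0.999999):
        print(f"{target:.4%}: DPM={dpm(target):7.1f}, "
              f"downtime={downtime_minutes_per_year(target):7.2f} min/year")
    # Hypothetical restore time: 10 minutes of dump, boot, and protocol convergence per crash.
    print(f"MTBCF for 99.999% with 10-minute restores: "
          f"{required_mtbcf_hours(10 / 60, 0.99999):,.0f} hours")
```

Under these assumptions, yearly downtime is about:
- 526 minutes at 99.9% availability.
- 5.3 minutes at 99.999%.
- 32 seconds at 99.9999%.

A router that takes 10 minutes to recover from each crash would need roughly 16,700 hours (close to two years) between critical failures to reach 99.999%.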

Convergence Terms
Network Convergence: the point in time at which all nodes in a network have updated their routing tables for a route entry change (new, withdrawal, or modification).
Protocol Convergence: the point in time at which a single node updates its routing table and advertises the route table change to its peer in a routing protocol advertisement (or update) message.
Route Convergence: the point in time at which a single node updates its routing table and reroutes traffic out the new interface. Route Convergence is the common router benchmark.

Convergence Test Issues
The large number of protocols for which convergence is important.
The number of conditions that can affect results.
The technical difficulty of testing the convergence of one protocol when another protocol flaps or is unstable.

Convergence Test Conditions
Interface shutdown: fiber pull, or shutdown via CLI, on the local interface or on the remote interface.
Peer removal via CLI: on the local router or on the peer router.
Peer node failure.
Route table changes: route withdrawal, route flap, next-hop change, metric change, dynamic constraint change, policy change.
All conditions must be tested because they can produce different results.

Network-Specific Topology Testing
A large network with many routers (e.g., 10).
Use multiple vendors for interoperability/functionality testing.
Configure multiple protocols as in the deployment scenario.
Run test cases that match the deployment scenario.

Automated Regression Testing
Bug fixes and new features put previously working features at risk. Regression testing ensures that previously working features still work.
As the number of releases with new features grows, it becomes more difficult to provide complete regression coverage through manual testing, which is increasingly labor intensive.
Automated regression testing enables more coverage in less time. Automation is typically achieved using TCL scripts.
[Diagram: test configuration with the test equipment connected to the router under test]

Commercial Test Equipment Requirements

The State of the Union
Test equipment fails to meet today’s requirements for testing 99.999% availability.
Router vendors have been forced to develop their own specialized test tools.
Carriers have been forced to use the router vendors’ test tools.
Test equipment vendors must respond to the challenge today.

Stress Testing Requirements
Maintain BGP sessions and IGP adjacencies.
Flap BGP routes.
Signal and maintain RSVP-TE tunnels.
Distribute LDP FECs.
Signal and maintain multicast groups.
Perform SNMP GETs and check their validity.
Forward traffic (IP unicast, IP multicast, and MPLS).
Make the network seem much bigger than it really is, without having to obtain hundreds of routers.

Required Protocol Emulation/Conformance Suites Coverage
Routing protocols: BGP; OSPF, ISIS; OSPF-TE, ISIS-TE.
RSVP-TE: Fast Reroute; standby tunnels; ingress, mid-point, and egress.
LDP.
RFC 2547 Layer 3 VPNs.
Martini Layer 2 VPNs.
P and PE.
LDP over RSVP.
Multicast: MBGP, PIM-SM, MSDP.

Protocol Emulation Requirements
Run any combination of protocols on the same interface.
Forward traffic for emulated protocols.
Protocol emulation on any interface type: GigE, 10GigE, and POS (including OC-192c).
Scaling: BGP sessions >500 per system and >100 per interface; BGP routes >3M per system and >500K per session; MPLS-TE tunnels >10K (ingress, mid-point, egress); FECs >10K.
Load an external BGP table for advertisement.
Controlled BGP route flapping.

Automated Regression Requirements
Commercial test equipment vendors offer protocol conformance TCL suites.
Test case coverage must be improved within each suite.
Interaction between protocols must be tested.
Each script needs to test multiple interfaces (4 or more).
Full protocol coverage: multicast protocols have been the “forgotten son”.

System Requirements
Multiple ports per chassis (>32).
Automated convergence measurement.
Automated reroute/failover measurement.
Support for ECMP and composite links.
System/protocol stability for many days.
Ability to store GUI configurations for repeatability.
Ability to TCL-script any GUI test case.
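
The System Requirements slide calls for automated convergence measurement. One common way to automate a Route Convergence measurement works as follows:
1. Offer traffic at a constant packet rate through the router under test.
2. Trigger one of the convergence test conditions.
3. Derive the outage from the data plane, either from the total packet loss or from the largest gap between received packets.

The sketch below illustrates both calculations. It assumes the test equipment can report transmit/receive counts and per-packet receive timestamps. The function names are hypothetical, not any test-equipment API, and the slides do not prescribe this particular method.

```python
from typing import Sequence


def loss_derived_convergence_s(tx_packets: int, rx_packets: int, offered_pps: float) -> float:
    """Convergence time estimated from packet loss at a constant offered rate.

    Assumes all loss in the measurement window is caused by the convergence
    event and that the offered rate stays constant throughout.
    """
    if offered_pps <= 0:
        raise ValueError("offered_pps must be positive")
    lost = max(tx_packets - rx_packets, 0)
    return lost / offered_pps


def max_gap_convergence_s(rx_timestamps: Sequence[float], offered_pps: float) -> float:
    """Outage estimated from the largest gap between received packets.

    rx_timestamps are receive times in seconds, in arrival order. With no loss,
    consecutive packets arrive 1/offered_pps apart; any excess in the largest
    gap is treated as the outage.
    """
    if offered_pps <= 0:
        raise ValueError("offered_pps must be positive")
    interval = 1.0 / offered_pps
    gaps = (later - earlier for earlier, later in zip(rx_timestamps, rx_timestamps[1:]))
    largest = max(gaps, default=interval)
    return max(largest - interval, 0.0)


if __name__ == "__main__":
    # Hypothetical run: 100,000 pps offered, 45,000 packets lost during a fiber pull.
    print(f"loss-derived: {loss_derived_convergence_s(10_000_000, 9_955_000, 100_000):.3f} s")
    rx = [0.0, 0.001, 0.002, 0.302, 0.303]  # 1,000 pps with one ~300 ms gap
    print(f"largest gap:  {max_gap_convergence_s(rx, 1000):.3f} s")
```

The two figures trade off differently:
- The loss-derived figure folds every lost packet in the window into one number, so unrelated loss inflates it.
- The largest-gap figure isolates a single outage but needs per-packet timestamps from the test equipment.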