Network measurement, emulation, protocol benchmarking and scheduling
Tomohiro Kudoh
Grid Technology Research Center
National Institute of Advanced Industrial Science and Technology (AIST)

Slide 2: Outline
- Network measurement, emulation, protocol benchmarking
  - GtrcNET
  - PSPacer
- Network scheduling
  - G-lambda project

Slide 3: GtrcNET
- Measurement
  - Microscopic behavior observation
  - Burstiness (i.e. burst transfer within an RTT period) cannot be observed by software tools such as iperf
- Emulation
  - Reproducible environment: an emulated WAN environment is preferable to a "real network" for reproducible experiments
  - Software delay emulation is not stable; clock-accurate delay emulation can be achieved by hardware
- A hardware network test-bed

Slide 4: GtrcNET-1
[Photo: the GtrcNET-1 (GNET-1) unit, controlled through an SNMP agent]

Slide 5: Block Diagram of GtrcNET-1
- Clock-accurate behavior
- Network emulation, traffic measurement, ...
[Diagram: GtrcNET-1 block diagram]

Slide 6: GtrcNET usage
- Measurement: sub-ms bandwidth measurement; microsecond-accurate one-way delay measurement using GPS
- Emulation: WAN delay and error (Gbps, 300 ms RTT); a stable environment for software development
- Protocol benchmarking: emulation, measurement and bandwidth control for protocol benchmarking
[Diagrams: GNET-1 units placed between hosts, and between hosts and the Internet]

Slide 7: Pure Grid
Pure Grid is a network emulation environment: two PC clusters (GbE) connected through GtrcNET over 10GbE.
- High controllability: various precise parameters
- High performance: 10GbE wire-rate operation
- High resolution: less than 1 ms measurement interval
Emulation of a bottleneck network:
- One-way latency (0-800 ms)
- Bandwidth (1 Mbps - 10 Gbps)
- Buffer size (0 - 1 GByte)
- Buffer control (tail-drop, random, RED)
- Frame loss (5.0x10^-10 step)
- Smooth traffic shaping (1 Mbps - 10 Gbps)
Measurement of precise network behavior:
- Bandwidth (interval 100 usec - 32 sec)
- Stream-wise bandwidth
- Frame capture
- GPS-synchronized time
Also supports new protocol prototyping and multi-path transfer for dependable communication.

Slide 8: Example Usage of GtrcNET-1 (1)
- GridMPI (http://www.gridmpi.org/) has been evaluated on Pure Grid under various network parameters
- Two 8-node clusters connected by GbE; emulated latency: 0, 4 ms, 20 ms, 200 ms
- Performance of the NAS Parallel Benchmarks (NPB2.3)
[Chart: relative performance of BT, CG, FT, IS, LU, MG and SP for GridMPI at 0 ms, 4 ms, 20 ms and 200 ms latency]

Slide 9: Example Usage of GtrcNET-1 (2)
- Measuring fine-grain bandwidth: latency 100 ms; bandwidth measured every 1 ms, in real time
- 1 stream with a 16 MB socket buffer
- The 1 ms average bandwidth is bursty
- The 200 ms average bandwidth is about 500 Mbps because the socket buffer size is small; at that time scale it looks stable
[Chart "16M-WADIFQ": 1 ms average and 200 ms average bandwidth (Mbps) over 5 seconds]

Slide 10: Example Usage of GtrcNET-1 (3)
- Measuring per-stream bandwidth: latency 100 ms, bottleneck bandwidth 500 Mbps; PC-1 and PC-2 send to PC-3 and PC-4 over GbE
- Left: the bandwidth of each stream is controlled by socket buffer size (8 MB, 250 Mbps); each stream exceeds the bottleneck link capacity and frames are dropped
- Right: the bandwidth of each stream is controlled by PSPacer to 256 Mbps; PSPacer (http://www.gridmpi.org/pspacer-1.0/) realizes precise pacing by software; each stream stays within the specified rate and is very stable
[Charts: per-stream transmit bandwidth (Txall, TxCH1, TxCH3, 1 s average) over 10 seconds for both cases]
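The fine-grain plots above come from the hardware, which counts bytes per interval at wire speed. For intuition only, here is a minimal Python sketch of the same arithmetic done as offline post-processing of timestamped frame records; the (timestamp, length) record format is hypothetical and this is not GtrcNET's actual capture format or tooling.

```python
from collections import defaultdict

def interval_bandwidth(frames, interval=1e-3):
    """frames: iterable of (timestamp_sec, frame_len_bytes) pairs.
    Returns [(interval_start_sec, bandwidth_mbps), ...]."""
    buckets = defaultdict(int)
    for ts, length in frames:
        buckets[int(ts // interval)] += length        # bytes per interval
    return [(i * interval, total * 8 / interval / 1e6)  # bytes -> Mbps
            for i, total in sorted(buckets.items())]

# A burst of 1000 full-size frames at GbE line rate (one 1500-byte frame
# every 12 us) followed by silence: 1 ms averages show line rate then zero,
# which a coarser (e.g. 200 ms) average would smear into an apparently
# smooth, lower rate -- the burstiness the slides say software misses.
frames = [(t * 12e-6, 1500) for t in range(1000)]
for start, mbps in interval_bandwidth(frames)[:3]:
    print(f"{start*1e3:5.1f} ms  {mbps:7.1f} Mbps")
```

The point of doing this in hardware rather than in a script is timestamp quality: software timestamps jitter by far more than the 12 us frame spacing in this example.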
Slide 11: GtrcNET-10
Two types of GtrcNET-10 have been developed:
- GtrcNET-10p2: 10GbE (MSA300) x 2 ports, with 1 GByte per port
- GtrcNET-10p3: 10GbE (XENPAK) x 3 ports, with 1 GByte per port (shown on the desk)

Slide 12: Architecture of GtrcNET-10p3
[Block diagram: an FPGA (Xilinx XC2VP100) with three 1 GByte DDR333 SO-DIMMs (64 bit x 333 MHz each), three 10GbE MACs with XENPAK transceivers (4 bit x 3.125 GHz links), System ACE/CF, MICTOR, USB 2.0 and GPS interfaces]

Slide 13: Currently implemented functions of GtrcNET-10
- Delay emulation (0-800 ms)
- Precise bandwidth measurement (1 ms interval)
- Port replication
- Output rate control with pacing (64 Kbps - 10 Gbps)
- Random frame loss (minimum rate 4.7E-10)
- Buffer size control (1 KB - 1 GB)
All the functions currently implemented on GtrcNET-1 will be implemented soon.

Slide 14: Real Network Measurement using GtrcNET-10
- Tsukuba (8 PCs, GbE x 8) and Akihabara (8 PCs, GbE x 8) connected by a 10GbE (JGN II) link over 60 km (RTT 1.36 ms)
- GtrcNET-10 measures bandwidth every 100 ms
[Chart: NPB class B (JGN II) transmit bandwidth (Gbps) of BT, CG, EP, FT, IS, LU, MG and SP over 500 seconds]

Slide 15: Network Emulation using GtrcNET-10
- GtrcNET-10 emulates a network with 1.36 ms RTT between two 8-PC (GbE) clusters, and measures bandwidth every 100 ms
[Chart: NPB class B (emulation) transmit bandwidth (Gbps) of BT, CG, EP, FT, IS, LU, MG and SP over 500 seconds]

Slide 16: PSPacer
- A quite accurate software pacing mechanism
- Works on Linux; a classful queuing discipline for tc
- Effective for long-fat-pipe TCP transfer
- Can be used for per-flow traffic engineering

Slide 17: Pacing
- Pacing has been proposed to avoid the burstiness of TCP traffic over long fat networks
- The sender adjusts the Inter Packet Gap (IPG) properly to smooth the traffic bandwidth
- Without pacing: bursty traffic occurs during an RTT
- With pacing: packets are spread out over the RTT

Slide 18: Pacing (cont.)
Hardware approach:
- Use special hardware to realize precise pacing
- We evaluated the effects of pacing using hardware (GtrcNET) in the SC2003 Bandwidth Challenge

Slide 19: Bandwidth Challenge with pacing
- With pacing, the total bandwidth is quite stable and the bandwidth utilization is high: more than 95% efficiency
[Charts: total transmit bandwidth (Mbps), with and without pacing, over ~3-second windows]

Slide 20: Pacing (cont.)
Software approach:
- Software-based pacing mechanisms use a software timer to adjust the IPG
- This approach has some problems: coarse resolution (1-10 ms), fluctuation, and increased system load
- A precise software pacing mechanism is needed

Slide 21: Gap Packet: Virtual Inter Packet Gap
- A dummy packet (a gap packet) is inserted between real packets to control the IPG
- Without pacing: bursty traffic occurs during an RTT
- With pacing: we insert a gap packet between real packets

Slide 22: Gap Packet (cont.)
- A gap packet should be an actual packet which is transmitted from a network interface
- A gap packet should not propagate beyond switches or routers
- The sender transmits real packets and gap packets; gap packets are discarded at the switch input port, so the interval between real packets is preserved

Slide 23: Gap Packet Format
- We employ an IEEE 802.3x PAUSE packet as the gap packet, with pause time = 0
- The gap packet size is set to the required gap size in byte units (on GbE, one byte corresponds to 8 ns), so the gap packet size equals the IPG
- Frame layout: MAC header, MAC control type (88 08), MAC opcode (00 01), pause time = 0, variable padding
- IEEE 802.3x flow control: if a host receives a PAUSE packet, it suspends packet transmission until the pause time expires; with a pause time of 0, nothing is suspended
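To make the gap-packet scheme concrete, here is a small Python sketch (illustrative only; PSPacer builds its gap packets inside the Linux kernel) that computes the gap size for a target rate from the relation implied above, target = link_rate x payload / (payload + gap), and lays out the PAUSE frame described on the slide. The 64-byte minimum and the neglect of preamble and mandatory inter-frame gap are simplifications.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # IEEE 802.3x reserved multicast
MAC_CONTROL_TYPE = 0x8808                  # MAC control frame (88 08)
PAUSE_OPCODE = 0x0001                      # PAUSE (00 01)

def gap_size(target_bps, link_bps=1_000_000_000, payload=1500):
    """Bytes of gap needed between payload-byte packets so the real
    traffic averages target_bps on a link_bps wire. Preamble/IFG
    overhead is ignored here (a real pacer must account for it)."""
    return int(payload * (link_bps / target_bps - 1))

def build_gap_packet(gap_len, src_mac=bytes(6)):
    """A PAUSE frame with pause time 0, zero-padded to gap_len bytes."""
    hdr = PAUSE_DST + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_TYPE, PAUSE_OPCODE, 0)  # pause time = 0
    return hdr + bytes(max(gap_len, 64) - len(hdr))  # variable padding

# Pacing 1500-byte packets on GbE down to 500 Mbps needs a 1500-byte gap
# packet between real packets, i.e. 12 us of wire time at 8 ns per byte.
g = gap_size(500_000_000)
print(g, len(build_gap_packet(g)))   # -> 1500 1500
```

Because the gap is expressed in bytes rather than timer ticks, the resulting IPG is as precise as the NIC's own transmission clock, which is exactly the property the software-timer approach on the previous slide lacks.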
Slide 24: Effects of Gap Packets
- Bandwidth while varying the IPG (packet size = 1500 B)
- Real packets can be transmitted to meet the target rate accurately
[Chart: theoretical vs. actual bandwidth (MB/s) as the inter-packet gap varies from 0 to 12000 bytes]

Slide 25: Evaluation using an Emulated WAN
- Two Scalable TCP streams share a single GigE bottleneck link
- iperf (iperf -c / iperf -s) is executed with pacing and without pacing alternately
- Emulated WAN (GtrcNET-1 between two Catalyst 3750 switches): RTT 200 ms, bandwidth 125 MB/s
- Microscopic bandwidth measurement

Slide 26: [Chart: bandwidth of stream 1, stream 2 and their total while alternating between phases without and with pacing, roughly every 50 s; with pacing the total stays near the 125 MB/s bottleneck]

Slide 27: Network scheduling and the G-lambda project
- Co-scheduling of computing and network resources
- Advance reservation
- A network service interface (i.e. an interface to reserve network resources) is being defined in the G-lambda project

Slide 28: Grid
- Grid provides a single system image to users by virtualizing service infrastructure such as computing, data and network resources from multiple domains
- Users do not care about the actual resources they are using
- Grid middleware (such as planners, brokers and schedulers) coordinates resources and provides the virtual infrastructure
[Diagram: users and software catalogs on top of Grid middleware, which virtualizes computers, sensor nets and data archives]

Slide 29: Network service for Grid
- To realize such a virtual infrastructure for Grid, resource management is one of the key issues
- Grid middleware should allocate appropriate resources, including network resources, according to the user's request
- The network resource manager should provide a resource management service to Grid middleware
- A standard open interface between Grid middleware and the network resource manager is required, but not yet established

Slide 30: Requirements for the network service interface
- Web Service: Grid is being built on Web Services technology, so the network service should be provided as a "Web Service"
- SLA support: bandwidth, latency, etc.
- Advance reservation: reserve bandwidth

Slide 31: G-lambda project overview
- The goal of this project is to establish a standard web services interface (GNS-WSI) between the Grid resource manager and the network resource manager provided by network operators
- The G-lambda project started in December 2004 as a joint project of KDDI R&D Labs., NTT, and AIST
- We have defined a preliminary interface and, in cooperation with NICT, conducted an experiment using a JGN II GMPLS-based network test bed
- Live demonstrations at iGrid2005 and SC|05

Slide 32: System overview
[Diagram: a Grid application sends requests to a Grid portal (WSRF) and the Grid Resource Scheduler (GRS); the GRS talks to Computing Resource Managers for the cluster computers and, via GNS-WSI, to the Network Resource Management System (NRM), which drives a GMPLS network of 1 Gbps paths through a network control interface]

Slide 33: Grid Resource Scheduler (GRS)
- A Grid scheduler developed by AIST, implemented using GT4 (Globus Toolkit 4)
- According to the user's request, it reserves computing and network resources (lambda paths) in advance
- Accepts requests which specify the required number of clusters, the number of CPUs at each cluster, and the bandwidth between clusters
- The GRS selects appropriate clusters by interworking with the NRM and multiple CRMs (Computing Resource Managers)
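To make "advance reservation" with "polling-based operations" (detailed on the GNS-WSI slide below) concrete, here is a toy Python sketch of the request-then-poll pattern a client of such an interface follows. Everything here, class and method names, fields, and status strings, is invented for illustration; the real GNS-WSI is a Web Services interface whose actual operations and schema are not reproduced here.

```python
import itertools
import time

class FakeNRM:
    """In-memory stand-in for a network resource manager endpoint.
    Names and states are hypothetical, not the GNS-WSI WSDL."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._polls = {}

    def reserve_path(self, src, dst, bandwidth_mbps, start_epoch, duration_s):
        """Advance reservation of a path between end points."""
        rid = next(self._ids)
        self._polls[rid] = 0
        return rid

    def query_status(self, rid):
        """Reservation status query; pretend scheduling takes two polls.
        (A real interface would also offer modify/cancel operations.)"""
        self._polls[rid] += 1
        return "RESERVED" if self._polls[rid] > 2 else "PENDING"

    def cancel(self, rid):
        self._polls.pop(rid, None)

def reserve_and_wait(nrm, src, dst, bw, start, dur, poll_s=0.1):
    """The polling pattern: submit, then poll until a final state,
    since a polling-based interface delivers no callbacks."""
    rid = nrm.reserve_path(src, dst, bw, start, dur)
    while (state := nrm.query_status(rid)) == "PENDING":
        time.sleep(poll_s)
    if state != "RESERVED":
        raise RuntimeError(f"reservation {rid} failed: {state}")
    return rid

# Hypothetical use, borrowing two site codes from the demo slides.
rid = reserve_and_wait(FakeNRM(), "TKB", "AKB", 1000, time.time() + 600, 3600)
print("path reserved, id =", rid)
```

A scheduler like the GRS would run this kind of exchange against the NRM for network resources while negotiating with CRMs for CPUs, and commit only when both sides can honor the same time window.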
Slide 34: Network Resource Management System (NRM)
- The current implementation was developed by KDDI R&D Labs.
- Responds to requests from the GRS through GNS-WSI
- Hides the detailed path implementation and provides a path between end points (path virtualization)
- Schedules and manages lambda paths; when the reserved time arrives, it activates the paths using the GMPLS protocol

Slide 35: GNS-WSI (Grid Network Service / Web Services Interface)
- Web services interface between the GRS and the NRM
- KDDI R&D Labs, NTT and AIST are working together to define the specification of the interface, aiming at standardization
- A preliminary interface has been defined, with polling-based operations:
  - Advance reservation of a path between end points
  - Modification of a reservation (i.e. its reservation time or duration)
  - Query of reservation status
  - Cancellation of a reservation

Slide 36: Overview of Demonstration
1. The user requests service via a GUI, specifying the required number of computers and the network bandwidth needed
2. The computing resources and GMPLS network resources are reserved as the result of interworking between the GRS and the NRM using GNS-WSI (Grid Network Service / Web Services Interface)
3. A molecular dynamics simulation is executed using the reserved computers and lambda paths; Ninf-G2 and Globus Toolkit 2 (GT2) are used at each cluster
[Diagram: GUI, GRS (GT4, WSRF), CRMs and the NRM (GMPLS) controlling GMPLS routers and optical cross-connects over JGN II, connecting clusters at JGN II Kanazawa, JGN II Fukuoka, KDDI Labs. Kamifukuoka, AIST Tsukuba, the JGN II Osaka Research Center and AIST Akihabara with Gigabit Ethernet (x n streams) and 1 Gbps paths]

Slide 37: [Map: the six sites KAN, TKB, KMF, AKB, FUK and OSA, with inter-site distances ranging from roughly 40 to 410 miles]

Slide 38: Demo Environment
- Clusters distributed over six locations in Japan are connected over the GMPLS network test-bed deployed by JGN II
[Diagram: GMPLS routers and optical cross-connects linking clusters of 2 to 32 processors at JGN II Fukuoka (FUK), JGN II Kanazawa (KAN), KDDI Kamifukuoka (KMF), AIST Tsukuba (TKB), JGN II Osaka (OSA) and AIST Akihabara (AKB), with lambda paths (GbE)]

Slide 39: Overview of the Demo Application
- A molecular dynamics simulation implemented with Grid middleware called Ninf-G2, developed by AIST, Japan
- Ninf-G2 conforms to the GridRPC API, a Global Grid Forum standard programming API for Grids
- Uses Globus Toolkit 2 for job invocation and communication

Slide 40: Demonstration replay

Slide 41: Thank you
- GtrcNET: Yuetsu Kodama
- PSPacer: Ryousei Takano, Yuetsu Kodama, Motohiko Matsuda, Yutaka Ishikawa
- G-lambda:
  - AIST: Hidemoto Nakada, Atsuko Takefusa, Yoshio Tanaka, Fumihiro Okazaki, Satoshi Sekiguchi
  - KDDI R&D Labs.: Masatoshi Suzuki, Hideaki Tanaka, Tomohiro Otani, Munefumi Tsurusawa, Michiaki Hayashi, Takahiro Miyamoto
  - NTT: Akira Hirano, Yasunori Sameshima, Wataru Imajuku, Takuya Ohara, Yukio Tsukishima, Atsushi Taniguchi, Masahiko Jinno, Yoshihiro Takigawa
  - NICT: Shuichi Okamoto, Shinji Shimojo
For more information:
- GtrcNET: http://gtrc.aist.go.jp/gnet/
- PSPacer: http://www.gridmpi.org/
- G-lambda: http://www.g-lambda.net/