DYSWIS_20081209 - Columbia University
Transcript
DYSWIS
  Kyung-Hwa Kim, Henning Schulzrinne
  12/09/2008
  Internet Real-Time Lab, Columbia University

Do You See What I See?
  [Figure: end users connected through the Internet]

Outline
  Overview
  Fault Detection
  Peer Selection
  Probing
  Problem
  Implementation
  Demo

Overview
  DYSWIS – Do you see what I see
  Motivation
    Different causes for a particular network fault
    Need different ‘view’ from other sources for the fault
    End-to-end diagnosis
    Need user-friendly interface
  Current Problem
    Distributed network fault detection and analysis system
    Centralized management schemes
    Complexity in the user network and devices
    Failed to solve the service quality problem
  Approach
    Collaborate with other end users
    P2P based
    Remote probing

For Quick Understanding
  [Figure: nodes, each labeled Detect, Diagnosis, Probe]

Fault Detection
  Automatic fault detection
    Network raw packet capturing
    Analyze network packet and protocol
  Raw packet capturing
    Check error response
    Check timeout
    Check TCP congestion
      Monitoring TCP sequence numbers
  Define fault cases
    Automatic vs. manual
    FSM approach
      Pre-define
      Learning

FSM - Approach
  * Automatic Protocol Failure Detection Using Finite State Machines. Zhifeng Wang, Kai X. Miao, Tao Zuo, Henning Schulzrinne, Kyung Hwa Kim, Vishal Kumar Singh

Peer Selection
  DHT or database
  Register myself to the DHT network
    AS number, subnet, first hop, AP
  Search for probing nodes
    Inner nodes and outer nodes
  I need some nodes who can help me. Who is in the same subnet as me?
  You can contact B.
  His IP address is 218.59.21.16 and his port number is 9090.
  [Figure: nodes A and B and the DHT]

Peer Selection - DHT
  (key, value)
  <key>
    <type>node</type>
    <asn>14</asn>
    <subnet>128.59.0.0/16</subnet>
  </key>
  <key>
    <type>node</type>
    <asn>9880</asn>
    <subnet>45.45.45.0/24</subnet>
    <firewall>no</firewall>
    <nat>no</nat>
  </key>
  <value>
    <type>node</type>
    <ip>128.59.21.15</ip>
    <port>9090</port>
    <protocol>udp</protocol>
  </value>
  <value>
    <type>node</type>
    <ip>128.59.21.15</ip>
    <hostname>kkh.cs.columbia.edu</hostname>
    <port>9090</port>
    <protocol>tcp</protocol>
  </value>
  [Figure: nodes A and B and the DHT, with the request "I need some nodes who can help me. Who is in the same subnet as me?"]

Remote Probing
  Distributing modules
    Detecting and probing modules should be added and updated
    Dynamic class loading
    Dynamic module distributing
    Modules can be created and updated separately.
  XMLRPC

Probing Scenarios
  HTTP
    Causes: dead web server, page moved, low bandwidth, …
    Check DNS query
    TCP connection
    Ask another node to try the same query
    Check TCP congestion
    …
  DNS
    Causes: dead DNS server, resolution failed, UDP is not working, …
    Check another DNS server
    Ask another node to try to connect to my DNS server
    Ask another node to query the same host on another DNS server
  SIP/RTP
    Causes: NAT, DNS, proxy server, authentication
    Proxy connectivity test
    Ask another node to try the same action
    …

Probing Scenarios
  Connection problem
    Causes: dead server, firewall, wrong port number, …
    Traceroute – check routers
    Ask another node to try to connect to the server
    Ask another node to check my port
    …
  TCP congestion
    Causes: queuing delay, dead routers
    Traceroute, ping
    Try to find the bottleneck
    …

Probing Scenarios
  [Figure: nodes A and B]

Data Gathering
  Problem
    We have resources: other machines
    But how do we use them efficiently?
    We need real data
  Approach
    Collecting data
    Collecting scenarios
    Implementing prototype

Implementation
  Architecture
  For details, visit: http://wiki.cs.columbia.edu/display/res/DYSWIS

Demo

Future work
  Implementation
    http://www.cs.columbia.edu/~khkim/project/dyswis
    Coming soon: Mac & Linux
  Testbed - PlanetLab
  Mature research for analysis
  Support real-time protocols
  How to find solutions for end users

Backup
  Check the local network.
  Select two nodes, one from the same subnet and one from an outside subnet.
  Let the nodes try to connect to the server.
  If both nodes fail to connect to the server, log this fault as ‘server failure’.
  If only the internal node fails, run traceroute to check where the packet is blocked.
  If the internal node succeeds, the problem may be caused by a local firewall or something else.
  Check the incoming/outgoing port: let other nodes open the same port, and try to connect there.
  Check whether the remote node received the packet.
  Check whether the ACK from the remote node came back.
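
The connection-problem steps above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the DYSWIS implementation: the names `Diagnosis`, `classify`, `diagnose`, and the `Probe` callable type are hypothetical, and each probe stands in for "ask a peer node to try to connect to the server" (how the peer is actually contacted is left out).

```python
from enum import Enum
from typing import Callable

# Hypothetical probe type: (server_host, server_port) -> True if the peer connected.
Probe = Callable[[str, int], bool]


class Diagnosis(Enum):
    SERVER_FAILURE = "server failure"
    CHECK_PATH = "run traceroute to find where packets are blocked"
    LOCAL_CAUSE = "possible local firewall or other local cause"


def classify(inner_ok: bool, outer_ok: bool) -> Diagnosis:
    """Map the two peer results to the outcomes listed on the slide."""
    if not inner_ok and not outer_ok:
        # Both the same-subnet node and the outside node failed.
        return Diagnosis.SERVER_FAILURE
    if not inner_ok:
        # Only the internal (same-subnet) node failed.
        return Diagnosis.CHECK_PATH
    # The internal node succeeded, so the cause is likely local to this host.
    return Diagnosis.LOCAL_CAUSE


def diagnose(host: str, port: int, inner_probe: Probe, outer_probe: Probe) -> Diagnosis:
    """Ask one inner and one outer node to connect, then classify the fault."""
    return classify(inner_probe(host, port), outer_probe(host, port))


if __name__ == "__main__":
    # Stubbed probes (assumption): the inner node fails, the outer node succeeds.
    result = diagnose("example.org", 80, lambda h, p: False, lambda h, p: True)
    print(result.value)
```

Passing the probes in as callables keeps the decision logic separate from the transport, so the same function could be driven by real remote calls or by stubs when collecting test scenarios.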