Survey
Wireless Mesh Network Management - Fault and Performance Diagnosis: A Survey
Vijay P Gabale (CSE, IIT Bombay)
MTech Seminar under the guidance of Prof. Bhaskaran Raman

Agenda
• Overview
• Motivation
• Enterprise vs. Long-Distance Networks
• Techniques
• Fault Diagnosis - Examples
• Future scope
• Conclusion

Wireless Network Woes!
• My machine says: wireless connection unavailable.
• Why is the network performance so low?
• Is someone interfering with my transmissions?
• Do we have complete coverage in all the buildings?
• I wonder if someone has sneakily installed an unauthorized access point.

Wireless Network Anomalies
• RF holes
• Interference
• Hidden terminal
• Rogue Access Points
Which anomaly was the cause of undesired network performance?

Challenges
• Quantification of possible causes
• Attribution of a performance problem to a specific root cause, i.e. recognizing a fault
• Network management to proactively deal with likely faults
• Avoiding personal visits to nodes in long-distance links

Effects
• System downtime
• Loss of productivity (loss of faith)
• Recovery cost
Figure: Number of wireless-related complaints logged by the IT department of a major US corporation. Source: [4]

Enterprise Network
• Comprises a dense deployment of access points & clients in a university or corporate building
• Challenges
  • RF holes
  • Interference
  • Hidden terminals
  • Rogue Access Points
• Solution space: Characterizing & then analyzing entire wireless behaviour, Online and Offline diagnosis

Long-Distance Network
• Comprises point-to-point links of several kilometers to build a multi-hop mesh network
• Challenges
  • Physical visits are costly
  • Remote locations could sometimes become inaccessible
  • Lack of trained personnel
  • Poor power quality
• Solution space: In-node recovery or inference techniques, Independent control mechanisms

Diagnostic Questions!
• What is the per-packet signal strength at every node? – RF holes
• How many concurrent receptions are there? – Hidden Terminal
• How is the noise level varying over time? – Interference
• Is there any foreign node wandering in the network? – Rogue AP
• Is the remote node working? What is the software or hardware status of the node? – Primary link failure or Software-Hardware failures

Existing Techniques
• Offline data collection & analysis: [1], [6]
• Online anomaly detection: [2]
• Simulation: [3]
• Daemon running as a part of the node: [4]
• Software & hardware redundancies: [5], [7]

Offline data collection & analysis
• Steps:
  • Dense deployment of monitors
  • Synchronization & unification at a central server
  • Inference techniques
• Example: Jigsaw [1], MacWild [6]
• Fault Diagnosis:
  • Pr (Loss due to Interference | Concurrent Transmissions)
  • Over-protective 802.11g clients and access points

Offline data collection & analysis - Framework
Figure: framework diagram (Central Server). Source: MacWild [6]

Online anomaly detection
• Steps:
  • Deploy multiple monitors
  • Sample physical layer parameters
  • Dynamic interference engine
• Example: Mojo [2]
• Fault Diagnosis: Threshold for Hidden Terminal, Capture Effect, Non-802.11 interference

Simulation
• Steps:
  • Traces to drive simulation
  • Deviation of observed behavior from expected behavior
  • Decision trees to make distinction between possible faults
• Example: Troubleshooting Wireless Mesh Networks [3]
• Fault Diagnosis: External noise, Packet dropping, Misbehaving clients

Simulation (contd…)
Decision Tree
• If simSent - realSent > ThreshSentDiff: CW misuse
• Otherwise, if simNoise - realNoise > ThreshNoiseDiff: External Noise
• Otherwise, if simLoss - realLoss > ThreshLossDiff: Packet dropping
• Otherwise: Normal

Daemon running as a part of the node
• An application resides at client side
• Takes reactive or proactive actions in response to an event
• Example: Client Conduit technique [4]
• Fault Detection: Rogue APs, RF holes

Software & Hardware redundancies
• Experiences of software & hardware failures
• Techniques:
  • Software & hardware watchdogs
  • Independent control mechanisms
  • Tracking & predicting health of a node
• Example: Beyond Pilots [5], Fault Diagnosis [7]
• Fault Diagnosis: Erratic power conditions, Primary link failure, Non-802.11 interference, Antenna misalignment

Problem: Intermittent Connectivity
• Symptoms: Irregular changes in connectivity or total failure
• Causes: Weak RF signal, Lack of signal, unpredictable ambiance, obstructions
• Parameter: Received signal strength
How to tackle total failure? How to track a mobile node?

Remedy: Client Conduit
• It is a mechanism to allow disconnected users to convey messages to system and network administrators.

Problem: Rogue Access Point
• What is a Rogue Access Point?
• Security holes, unwanted RF interference and network load.
• Access Point Database
  • Location, MAC, Channel

Remedy: Client Conduit
Flowchart checks (a "No" on any check leads to "Rogue AP Detected"):
• Is MAC Registered?
• Is AP at Expected Location?
• Is AP Advertising Expected SSID?
• Is AP on Expected Channel?
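
The rogue access point checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Client Conduit implementation from [4]: the names `APRecord`, `Observation`, `rogue_reason`, and `EXAMPLE_DB`, the planar coordinates, and the 20 m location tolerance are all hypothetical, and the order of the checks is a guess because the flowchart does not survive in this copy.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class APRecord:
    """One entry of the access point database (hypothetical layout)."""
    mac: str
    ssid: str
    channel: int
    x: float  # expected position in metres, assumed planar coordinates
    y: float


@dataclass(frozen=True)
class Observation:
    """What a client reports about an access point it can hear."""
    mac: str
    ssid: str
    channel: int
    x: float  # estimated position of the observed AP
    y: float


def rogue_reason(obs, database, max_offset_m=20.0):
    """Return the first failed check as a string, or None if all checks pass.

    max_offset_m is an assumed tolerance for the location estimate.
    """
    record = database.get(obs.mac.lower())
    if record is None:
        return "MAC not registered"
    if hypot(obs.x - record.x, obs.y - record.y) > max_offset_m:
        return "not at expected location"
    if obs.ssid != record.ssid:
        return "unexpected SSID"
    if obs.channel != record.channel:
        return "unexpected channel"
    return None


# Hypothetical database keyed by lower-case MAC address.
EXAMPLE_DB = {
    "00:11:22:33:44:55": APRecord("00:11:22:33:44:55", "corp-net", 6, 10.0, 5.0),
}
```

Calling `rogue_reason(Observation("00:11:22:33:44:55", "corp-net", 11, 12.0, 5.0), EXAMPLE_DB)` returns `"unexpected channel"`, since the MAC, location, and SSID match the database entry but the channel does not. Returning the failed check rather than a bare boolean gives the administrator the root cause directly.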

Problem & Remedy: Hidden Terminal
• Symptoms: Degraded performance, lower throughput
• Causes: One transmitter not able to hear other transmissions to the same receiver, heterogeneous transmit powers
• Remedy: Quantify number of concurrent transmissions
  • Around 40%
  • Capture Effect: Around 5%

Problem & Remedy: Non-802.11 Interference
• Symptoms: Retransmissions at the MAC layer, No concurrent transmissions detected
• Quantify noise level
  • Moving window average
  • Threshold

Problem: Connectivity problems over Long-Distance Links
• Symptoms: Remote node NOT reachable
• Causes: IP address misconfiguration, routing misconfiguration, power shutdown at remote node, a board failure, malfunctioning wireless card
• Solution:
  • Link Local IP addressing
  • SMS backchannel

Solution: Troubleshooting a Link
Flowchart:
• Does Link Local IP Addressing work?
  • Yes: Log In & Fix Configuration Problem
  • No: Send SMS query and get the result
    • Power unavailable: Wait for Power
    • Power available, Router Down: Reboot or Visit & Replace
    • Power available, Router is Up: Get Status Report (Signal Strength, Noise)
• Outcome labels: Visit not required, Visit may be Required

Problem & Remedy: Software and Hardware Failures
• Symptoms: Node suddenly goes down, node does not respond on trying to connect over the primary link
• Causes: Damaged power supplies or router boards, damaged CF cards, low voltages leave router in wedged state, battery problems
• Techniques: Software and hardware watchdogs, power controllers, Low Voltage Disconnect, read-only boot loader
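
For the non-802.11 interference case in the examples above (quantify the noise level with a moving window average, then compare against a threshold), a minimal Python sketch could look as follows. The class name `NoiseMonitor`, the window size, and the -85 dBm threshold are assumptions chosen for illustration and are not taken from Mojo [2]. Averaging dBm readings directly is a simplification.

```python
from collections import deque


class NoiseMonitor:
    """Moving-window average of noise samples with a fixed threshold.

    window and threshold_dbm are illustrative defaults, not measured values.
    """

    def __init__(self, window=5, threshold_dbm=-85.0):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)
        self.threshold_dbm = threshold_dbm

    def add(self, noise_dbm):
        """Record one noise sample and return the current window average."""
        self.samples.append(noise_dbm)
        return sum(self.samples) / len(self.samples)

    def interference_suspected(self, concurrent_transmissions=0):
        """Flag non-802.11 interference once the window is full.

        Following the symptoms listed on the slide, the flag is raised only
        when the averaged noise exceeds the threshold and no concurrent
        802.11 transmissions were detected.
        """
        if len(self.samples) < self.samples.maxlen:
            return False
        average = sum(self.samples) / len(self.samples)
        return average > self.threshold_dbm and concurrent_transmissions == 0
```

A monitor would call `add()` for each sampled noise reading and poll `interference_suspected()`, passing the number of concurrent transmissions it observed in the same interval. Waiting for a full window before flagging keeps a single noisy sample from triggering an alarm.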

Future Scope
• Comprehensive Network Monitoring & Inference Tool
• Quantify Performance Improvement
• User Friendly GUIs
• Automatic Recovery

Conclusion
• Classification of techniques for fault diagnosis
• Enterprise as well as Long-Distance Mesh Networks
• Faults: Connectivity, Hidden Terminal, Interference, Hardware Failures
• Need for a 'Complete Monitoring & Inference' suite to detect root-level causes

Appendix – Comparison Table

References of the Survey
[1] Yu-Chung Cheng, John Bellardo, and Peter Benko. Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis. SIGCOMM'06.
[2] Anmol Sheth, Christian Doerr, Dirk Grunwald, Richard Han, and Douglas Sicker. Mojo: A Distributed Physical Layer Anomaly Detection System for 802.11 WLANs. MOBISYS'06.
[3] Lili Qiu, Paramvir Bahl, Ananth Rao, and Lidong Zhou. Troubleshooting Wireless Mesh Networks. SIGCOMM'06.
[4] Atul Adya, Paramvir Bahl, Ranveer Chandra, and Lili Qiu. Architecture and Techniques for Diagnosing Faults in IEEE 802.11 Infrastructure Networks. MOBICOM'04.
[5] Sonesh Surana, Rabin Patra, Sergiu Nedevschi, and Manuel Ramos. Beyond Pilots: Keeping Rural Wireless Networks Alive. To appear in USENIX NSDI'08.
[6] Ratul Mahajan, Maya Rodrig, David Wetherall, and John Zahorjan. Analyzing the MAC-Level Behavior of Wireless Networks in the Wild. SIGCOMM, 2006.
[7] Sonesh Surana, Rabin Patra, and Eric Brewer. Simplifying Fault Diagnosis in Locally Managed Rural WiFi Networks. SIGCOMM NSDR, 2007.
[8] Yu-Chung Cheng, Mikhail Afanasyev, Patrick Verkaik, and Peter Benko. Automating Cross-Layer Diagnosis of Enterprise Wireless Networks. SIGCOMM, 2007.
[9] Kameswari Chebrolu, Bhaskaran Raman, and Sayandeep Sen. Long-Distance 802.11b Links: Performance Measurements and Experience. MOBICOM, 2006.