COMP2221 Networks in Organisations
Richard Henson
May 2013

Week 11 – Troubleshooting & Optimisation
• Learning Objectives:
  – Explain the principles of troubleshooting as a means of guarding against failure
  – Use the various tools available on a named operating system to identify potential faults and problems
  – Take appropriate action to stop a fault becoming a failure
“A stitch in time saves nine”

Business - Worst Possible Scenario (1)
• There is an interruption in the power supply
  – UPS is invoked
  – the interruption continues…
  – servers all have to be shut down
• Power supply restored…
  – but main domain controller doesn’t reboot
  – other domain controllers therefore can’t connect to it
  – the domain tree fails

Business - Worst Possible Scenario (2)
• Organisation cannot do business with the network down…
  – server can’t be persuaded to boot
  – new main domain controller has to be commissioned
  – whole directory tree has to be rebuilt!!!
  – word spreads very rapidly…
• Business loses so much custom, trust, and credibility that even when it starts doing business again customers choose to go elsewhere
  – without a flourishing customer base… the business folds

Analysis: This scenario shouldn’t have occurred…
• Unlikely that the server would fail to boot without prior warning…
  – warnings would have been presented…
  – but were clearly not acted upon!
• Disaster recovery plan!?!
  – not formulated?
  – not tested?
  – not effective (in the event of a domain tree controller failure…)

But it does…
• Actual example (15th Feb 2010):
  – root domain controller [on the network] had not been backed up for 10 months when it crashed (well… at least it had been backed up at some time…)
  – http://searchwindowsserver.techtarget.com/generic/0,295582,sid68_gci1381567,00.html
• The consultant called in to fix it reported that:
  – “I had never seen a case where the forest root domain had to be recovered -- and I couldn't find anyone who had.”

Analysis: Who is to blame? (1)
• In this example, the organisation said they were following Microsoft guidelines
  – they set up an empty root domain
  – the root domain controller had a RAID-5 (best) disk configuration
• This was true, to some extent…
  – Microsoft did espouse this as best practice… (in the year 2000!)
  – guidelines had changed since then…

Analysis: Who is to blame? (2)
• The disaster that struck was:
  – two RAID drives failed on the same day!
  – unlucky? possible to prepare for this?
• The recovery process took about three weeks
  – most of the time was spent studying logs, doing the restore, etc.
• In this case, the tree was still able to function without a root domain
  – business was able to continue
  – customer base wasn’t compromised…

Fault Tolerance and Risk Assessment
• General “common sense” principle:
  – always have a backup
  – ESPECIALLY for the most important computer on the network…
• Q:
  – How can you tell what needs backing up?
• A:
  – Risk Assessment and Risk Management

Why not Risk Management? Time consuming!
• However, without proper risk management…
  – how does the organisation know which processes are most important to its functioning?
  – how can an organisation provide resources to protect aspects of its network?

Risk Management and Risk Assessment
• Risk Assessment is an essential first step
  – requires putting a “value” on assets
  – more valuable… greater protection
• Do information assets have value?
  – organisations are still failing to acknowledge that they do…
  – categorisation of information assets is therefore potentially problematic
  – need to look at the consequence to the organisation of losing that asset…

How do you back up a Domain Controller?
• The Windows “Backup” program works, and can easily be scheduled
  – but heavily criticised…
  – even the 2008 server version…
• Third Party products give more flexibility and protection, e.g.:
  – Recovery Manager
    » http://www.quest.com/recovery-manager-for-active-directory
  – Backup Exec
    » http://www.symantec.com/business/products/family.jsp?familyid=backupexec

Prevention is Better than Cure
• A server shouldn’t crash unexpectedly!
  – should be kept cool (environmental unit mustn’t break down!)
  – monitoring should show that unexpected things are happening
  – action can then (usually) be taken to take care of the unexpected
• Many tools available to:
  – Check/monitor the system on a regular basis
  – Provide stats to administrators
    » could also be used for security purposes
  – Generate alerts if something is starting to go wrong…

Troubleshooting Tools for a Windows Server: Task Manager
• Applications tab:
  – shows which applications are running
  – enables changing of process priority
    » use view/update speed
  – used to
    » open new applications
    » shut rogue applications down

Task Manager (continued)
• Processes tab:
  – all system processes
  – Memory usage of each
  – % CPU time for each
  – total CPU time since boot up
  – also used to close a process down
    » careful! (but you get a warning…)

Task Manager (continued)
• Performance tab:
  – total no. of threads, processes, handles running
  – Graph: % CPU usage
    » User mode
    » Kernel mode (optional: view menu)
    » graph per CPU (optional: view menu)
  – physical memory available/usage
  – virtual (page file) memory available/usage

Event Viewer
• Events recorded into “event log” files
  – System log
  – Auditing log (customisable)
  – Application log
  – customisable - additional files
• New files recorded daily; old ones archived
  – time before archiving also customisable

Event Viewer
• Three types of events recorded in log:
  – Information
  – Warning
  – Error
• More information on each event obtained by double-clicking
  – make note of event code
  – heed and take action if necessary

Using Event Viewer
• Wise to check all event logs regularly
  – take the time/trouble to find out what those messages really mean…
• Then take whatever action is needed
  – sort out potential problems now
  – make sure they don’t become real ones later…

Auditing Further Events
• Any “object” can be audited
• Objects to audit, and the processes audited, can be set through audit (group) policy
  – Using MMC & relevant snap-in
• Types of process audited:
  – access
  – attempt to access

Security auditing
• Same principles as general auditing
• Refers to “restricted” objects
• Events appear in separate security log

Event Management software (SIEM)
• Who’s going to look at all these log files?
  – in practice, often no-one…
• Solution
  – SIEM software to analyse and present information from:
    » network and security devices
    » identity & access management applications
    » vulnerability management/policy compliance tools
    » OS, database & application logs
    » external threat data
  – http://www.focus.com/briefs/how-select-security-information-and-event-management-siem

Performance Monitor
• Not available on disk
• To obtain and download Performance Monitor Wizard (PerfWiz), visit the following Web site:
  – http://www.microsoft.com/downloads/details.aspx?FamilyID=31fccd98-c3a1-4644-9622-faa046d69214&displaylang=en

What if the machine doesn’t boot…
• Tools available:
  – The boot error itself
    » blue screen? Suspect driver software
    » constant reboot? Suspect the motherboard
  – Last Known Good…
    » Gives the machine a chance to go back to the previous (usually last but one) configuration

What if the machine doesn’t boot… (continued)
• Safe Mode
  – includes VGA Mode or boot logging
  – Debugging mode also available
    » output difficult to decipher for non-experts
• Recovery Console
  – “DOS-type prompt” for performing minor repairs

What if the machine doesn’t boot… (continued)
• System Configuration Utility (Msconfig.exe)
  – automates the routine troubleshooting steps relating to Windows configuration issues
  – can be used to modify the system configuration and troubleshoot the problem using a process-of-elimination method

What if the machine doesn’t boot… (continued)
• Emergency Repair Disk (ERD)
  – reboot machine using different media
    » e.g. floppy disk (yes… still possible)
  – media should be generated BEFORE it needs to be used!
  – option to create the ERD during the set up process…

What if the machine doesn’t boot… (continued)
• Full restore
  – assumes a full backup has already been made
  – still have to:
    » reformat hard disk from scratch…
    » and then restore the backup files using backup/restore option…
  – but better than losing all your data!
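
A full restore is only possible if a reasonably recent backup exists, and the forest root example earlier in this session had gone 10 months without one. The following is a minimal Python sketch of a check that flags servers with stale or missing backups. The server names, the dates and the 7-day threshold are all hypothetical, chosen only for illustration; a real check would read the backup times from whatever backup product is in use.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: treat any backup older than this as stale.
MAX_BACKUP_AGE = timedelta(days=7)


def stale_backups(last_backup, now, max_age=MAX_BACKUP_AGE):
    """Return the names of servers whose last backup is older than max_age.

    last_backup maps a server name to the datetime of its last successful
    backup, or to None if the server has never been backed up.
    """
    stale = []
    for server, when in sorted(last_backup.items()):
        if when is None or now - when > max_age:
            stale.append(server)
    return stale


if __name__ == "__main__":
    # Illustrative records only; these are not real servers.
    now = datetime(2010, 2, 15)
    records = {
        "root-dc": datetime(2009, 4, 15),      # about 10 months old
        "file-server": datetime(2010, 2, 14),  # backed up yesterday
        "print-server": None,                  # never backed up
    }
    for name in stale_backups(records, now):
        print(f"WARNING: {name} has no recent backup")
```

Run on a schedule, a check like this turns "not backed up for 10 months" into a warning that appears within days, while there is still time to act on it.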

Optimisation…
• All about improving the performance of system resources…
• A network manager should never have “nothing to do…”
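
Both optimisation and the earlier advice to "generate alerts if something is starting to go wrong" depend on comparing measured resource usage against some limit. The sketch below shows one simple way to do that in Python. The resource names, threshold percentages and readings are assumptions made for illustration; they are not values taken from Task Manager, Performance Monitor or any other tool.

```python
from statistics import mean

# Hypothetical warning thresholds (percent); tune these per server.
THRESHOLDS = {"cpu": 85.0, "memory": 90.0, "disk": 80.0}


def check_samples(samples, thresholds=THRESHOLDS):
    """Return alert messages for resources whose average exceeds its threshold.

    samples maps a resource name to a list of recent percentage readings.
    Averaging the readings avoids alerting on a single short spike.
    """
    alerts = []
    for resource, readings in sorted(samples.items()):
        limit = thresholds.get(resource)
        if limit is None or not readings:
            continue  # no threshold configured, or nothing measured yet
        average = mean(readings)
        if average > limit:
            alerts.append(
                f"{resource}: average {average:.1f}% exceeds {limit:.1f}%"
            )
    return alerts


if __name__ == "__main__":
    # Illustrative readings only.
    recent = {"cpu": [90, 95, 88], "memory": [50, 60], "disk": []}
    for message in check_samples(recent):
        print("ALERT:", message)
```

Averaging over several readings is a deliberate choice here: a brief spike is normal on a busy server, whereas a sustained high average is the kind of early warning worth investigating.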
