COMP2221 Networks in Organisations
Richard Henson
May 2013

Week 11 – Troubleshooting & Optimisation

Learning Objectives:
– Explain the principles of troubleshooting as a means of mitigating against failure
– Use the various tools available on a named operating system to identify potential faults and problems
– Take appropriate action to stop a fault becoming a failure

“A stitch in time saves nine”

Business - Worst Possible Scenario (1)
There is an interruption in the power supply
– UPS is invoked
– the interruption continues…
– servers all have to be shut down
Power supply restored…
– but main domain controller doesn’t reboot
– no other domain controllers therefore connect to it
– the domain tree fails

Business - Worst Possible Scenario (2)
Organisation cannot do business with the network down…
– server can’t be persuaded to boot
– new main domain controller has to be commissioned
– whole directory tree has to be rebuilt!!!
– word spreads very rapidly…
Business loses so much custom, trust, and credibility that even when it starts doing business again customers choose to go elsewhere
– without a flourishing customer base… the business folds

Analysis: This scenario shouldn’t have occurred…
Unlikely that the server would fail to boot without prior warning…
– warnings would have been presented…
– but were clearly not acted upon!
Disaster recovery plan!?!
– not formulated?
– not tested?
– not effective (in the event of a domain tree controller failure…)

But it does…
Actual example (15th Feb 2010):
– root domain controller [on the network] had not been backed up for 10 months, when it crashed (well… at least it had been backed up at some time…)
– http://searchwindowsserver.techtarget.com/generic/0,295582,sid68_gci1381567,00.html
The consultant called in to fix it reported that:
– “I had never seen a case where the forest root domain had to be recovered -- and I couldn't find anyone who had.”

Analysis: Who is to blame?
(1) In this example, the organisation said they were following Microsoft guidelines
– they set up an empty root domain
– the root domain controller had a RAID-5 (best) disk configuration
Was true, to some extent…
– Microsoft did espouse this as best practice… (in the year 2000!)
– guidelines had changed since then…

Analysis: Who is to blame? (2)
The disaster that struck was:
– two RAID drives failed on the same day!
– unlucky? possible to prepare for this?
The recovery process took about three weeks
– most of the time was spent studying logs, doing the restore, etc.
In this case, the tree was still able to function without a root domain
– business was able to continue
– customer base wasn’t compromised…

Fault Tolerance and Risk Assessment
General “common sense” principle:
– always have a backup
– ESPECIALLY for the most important computer on the network…
Q: How can you tell what needs backing up?
A: Risk Assessment and Risk Management

Why not Risk Management?
Time consuming!
However, without proper risk management…
– how does the organisation know what processes are most important to its functioning?
– how can an organisation provide resources to protect aspects of its network?

Risk Management and Risk Assessment
Risk Assessment is an essential first step
– requires putting a “value” on assets
– more valuable… greater protection
Do information assets have value?
– organisations still failing to acknowledge that they do…
– categorisation of information assets therefore potentially problematic
– need to look at the consequence to the organisation of losing that asset…

How do you back up a Domain Controller?
The Windows “Backup” program works, and can easily be scheduled
– but heavily criticised…
– even the 2008 server version…
Third-party products give more flexibility and protection, e.g.:
– Recovery Manager
  » http://www.quest.com/recovery-manager-for-active-directory
– Backup Exec
  » http://www.symantec.com/business/products/family.jsp?familyid=backupexec

Prevention is Better than Cure
A server shouldn’t crash unexpectedly!
– should be kept cool (environmental unit mustn’t break down!)
– monitoring should show that unexpected things are happening
– action can then (usually) be taken to take care of the unexpected
Many tools available to:
– Check/monitor the system on a regular basis
– Provide stats to administrators
  » could also be used for security purposes
– Generate alerts if something is starting to go wrong…

Troubleshooting Tools for a Windows Server: Task Manager
Applications tab:
– shows which applications are running
– enables changing of process priority
  » use view/update speed
– used to
  » open new applications
  » shut rogue applications down

Task Manager (continued)
Processes tab:
– all system processes
– Memory usage of each
– % CPU time for each
– total CPU time since boot up
– also used to close a process down
  » careful! (but you get a warning…)

Task Manager (continued)
Performance tab:
– total no. of threads, processes, handles running
– Graph: % CPU usage
  » User mode
  » Kernel mode (optional: view menu)
  » graph per CPU (optional: view menu)
– physical (Page File) memory available/usage
– virtual memory available/usage

Event Viewer
Events recorded into “event log” files
– System log
– Auditing log (customisable)
– Application log
– customisable - additional files
New files recorded daily; old ones archived
– time before archiving also customisable

Event Viewer
Three types of events recorded in log:
– Information
– Warning
– Error
More information on each event obtained by double-clicking
– make note of event code
– heed and take action if necessary

Using Event Viewer
Wise to check all event logs regularly
– take time/trouble to find out what those messages really mean…
Then take the action that is needed
– sort out potential problems now
– make sure they don’t become real ones later…

Auditing Further Events
Any “object” can be audited
Objects to audit, and processes audited, can be set through audit (group) policy
– Using MMC & relevant snap-in
Types of process audited:
– access
– attempt to access

Security auditing
Same principles as general auditing
Refers to “restricted” objects
Events appear in separate security log

Event Management software (SIEM)
Who’s going to look at all these log files?
– in practice, often no-one…
Solution – SIEM software to analyse and present information from:
– network and security devices
– identity & access management applications
– vulnerability management/policy compliance tools
– os, database & application logs
– external threat data
http://www.focus.com/briefs/how-select-security-information-andevent-management-siem

Performance Monitor
Not available on disk
To obtain and download Performance Monitor Wizard (PerfWiz), visit the following Web site:
– http://www.microsoft.com/downloads/details.aspx?FamilyID=31fccd98-c3a1-4644-9622-faa046d69214&displaylang=en

What if the machine doesn’t boot…
Tools available:
– The boot error itself
  » blue screen? driver software
  » constant reboot? motherboard
– Last Known Good…
  » Gives machine a chance to go back to the previous (usually last but one) configuration

What if the machine doesn’t boot… (continued)
Safe Mode
– includes VGA Mode or boot logging
– Debugging mode also available
  » output difficult to decipher for non-experts
Recovery Console
– “DOS-type prompt” for performing minor repairs

What if the machine doesn’t boot… (continued)
System Configuration Utility (Msconfig.exe)
– automates the routine troubleshooting steps relating to Windows configuration issues
– can be used to modify the system configuration and troubleshoot the problem using a process-of-elimination method

What if the machine doesn’t boot… (continued)
Emergency Repair Disk (ERD)
– reboot machine using different media
  » e.g. floppy disk (yes… still possible)
– media should be generated BEFORE it needs to be used!
– option to create the ERD during the set up process…

What if the machine doesn’t boot… (continued)
Full restore
– assumes a full backup has already been made
– still have to:
  » reformat hard disk from scratch…
  » and then restore the backup files using backup/restore option…
– but better than losing all your data!
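
A full restore only works if a recent full backup exists, so backup age is worth checking routinely. The sketch below is one possible way to flag stale backups. The server names, the backup dates and the 7-day limit are all invented for illustration (the dates are chosen so that one server is roughly 10 months out of date, echoing the earlier example), and a real check would read the dates from the backup product's own records.

```python
from datetime import date, timedelta

# Hypothetical inventory: server name -> date of its last full backup.
# Names and dates are made up for this example.
LAST_FULL_BACKUP = {
    "root-dc": date(2009, 4, 15),
    "file-server": date(2010, 2, 14),
}


def stale_backups(last_backup, today, max_age=timedelta(days=7)):
    """Return (server, age in days) for each backup older than max_age, oldest first."""
    stale = [
        (name, (today - taken).days)
        for name, taken in last_backup.items()
        if today - taken > max_age
    ]
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, age_days in stale_backups(LAST_FULL_BACKUP, today=date(2010, 2, 15)):
        print(f"WARNING: {name} last backed up {age_days} days ago")
```

Passing `today` in as a parameter, rather than reading the clock inside the function, keeps the check easy to test against known dates.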
Optimisation…
All about improving the performance of system resources…
A network manager should never have “nothing to do…”
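
The idea of generating an alert when something is starting to go wrong can be pictured with a small sketch. It assumes % CPU readings have already been collected into a list, one per polling interval. The function name, the 85% threshold and the three-sample window are illustrative assumptions, not settings taken from Task Manager or Performance Monitor.

```python
from statistics import mean


def high_load_windows(samples, threshold=85.0, window=3):
    """Return the start index of every run of `window` consecutive
    samples whose average exceeds `threshold`."""
    return [
        start
        for start in range(len(samples) - window + 1)
        if mean(samples[start:start + window]) > threshold
    ]


if __name__ == "__main__":
    # Invented % CPU readings, one per polling interval.
    cpu_percent = [35.0, 40.0, 92.0, 38.0, 88.0, 91.0, 95.0, 60.0]
    for start in high_load_windows(cpu_percent):
        print(f"ALERT: sustained high CPU starting at sample {start}")
```

Averaging over a window means a single brief spike (the 92.0 reading above) does not raise an alert, while a sustained run of high readings does.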