Module 8
Implementing failover clustering

Module Overview
• Planning a failover cluster
• Creating and configuring a new failover cluster
• Maintaining a failover cluster
• Troubleshooting a failover cluster
• Implementing site high availability with stretch clustering

Lesson 1: Planning a failover cluster
• Preparing to implement failover clustering
• Failover-cluster storage
• Hardware requirements for a failover-cluster implementation
• Network requirements for a failover-cluster implementation
• Demonstration: Verify a network adapter's RSS and RDMA compatibility on an SMB Server
• Infrastructure and software requirements for a failover cluster
• Security considerations
• Quorum in Windows Server 2016
• Planning for migrating and upgrading failover clusters

Preparing to implement failover clustering
Features of failover clustering include:
• High availability
• Stateful application
• IP-based protocols
Consider the following guidelines when planning node capacity in a failover cluster:
• Distribute the highly available applications from a failed node
• Ensure that each node has sufficient capacity
• Use hardware with similar capacity for all nodes in a cluster

Failover-cluster storage
• Failover clusters require shared storage to provide consistent data to a virtual server after failover
• Shared storage options include:
  • SAS
  • iSCSI
  • Fibre Channel
  • Shared .vhdx
  • Scale-Out File Server
• You can also implement clustered storage spaces to achieve high availability at the storage level

Hardware requirements for a failover-cluster implementation
The hardware requirements for a failover-cluster implementation include:
• You must use server hardware that is certified for Windows Server
• Server nodes should all have the same configuration and contain the same or similar components
• All servers must pass the tests in the Validate a Configuration Wizard

Network requirements for a failover-cluster implementation
The network
requirements for a failover-cluster implementation include:
• Your server should connect to multiple networks to ensure communication redundancy, or it should connect to a single network with redundant hardware, to remove single points of failure
• You should ensure that network adapters are identical and that they have the same IP protocol versions, speed, duplex, and flow-control capabilities
• Your network adapters should be compatible with RSS and RDMA

Demonstration: Verify a network adapter's RSS and RDMA compatibility on an SMB Server
In this demonstration, you will learn how to verify a network adapter’s RSS and RDMA compatibility on an SMB Server

Infrastructure and software requirements for a failover cluster
• The infrastructure requirements for a failover-cluster implementation include:
  • Active Directory domain controllers should run Windows Server 2008 or newer
  • Domain-functional level and forest-functional level should be Windows Server 2008 or newer
  • The application must support Windows Server 2016 high availability
• The software best practices for a failover-cluster implementation require that:
  • All nodes have the same edition of Windows Server 2016 and the same service pack and updates

Security considerations
• Security considerations for failover clustering include that you must:
  • Provide a method for authentication and authorization
  • Ensure that unauthorized users do not have physical access to failover cluster nodes
  • Ensure that you use antimalware software
  • Ensure that your intra-cluster communication authenticates with Kerberos version 5
• If you use an Active Directory-detached cluster:
  • AD DS objects for network names are not created
  • The cluster network name is registered in DNS, so no new objects need to be created in AD DS
  • We do not recommend this for any scenario that requires Kerberos authentication
  • You must run Windows Server 2012 R2 or newer on all cluster nodes
Windows Server 2016 introduces several cluster types,
and which one you use depends on your domain-membership scenario:
• Single-domain clusters
• Workgroup clusters
• Multi-domain clusters
• Workgroup and domain clusters

Quorum in Windows Server 2016
Quorum mode | What has the vote? | When is quorum maintained?
Node majority | Only nodes in the cluster have a vote | When more than half of the nodes are online
Node and disk majority | The nodes in the cluster and a disk witness have a vote | When more than half of the votes are online
Node and file share majority | The nodes in the cluster and a file share witness have a vote | When more than half of the votes are online
No majority: disk only | Only the quorum-shared disk has a vote | When the shared disk is online
Dynamic quorum | Votes are dynamically assigned to always be odd | When half the votes are online
• Dynamic quorum:
  • Disk witness
  • File share witness
  • Azure Cloud Witness
• We recommend that you use dynamic quorum, which is the default configuration
• You should use all other forms of quorum in specific use cases only

Planning for migrating and upgrading failover clusters
The upgrade steps for each node in the cluster include:
• Pause the cluster node and drain all cluster resources
• Migrate cluster resources to another node in the cluster
• Replace the cluster node operating system with Windows Server 2016 and add the node back to the cluster
• Upgrade all nodes to Windows Server 2016
• Run the Update-ClusterFunctionalLevel cmdlet

Lesson 2: Creating and configuring a new failover cluster
• The Validation Wizard and the cluster support-policy requirements
• The process for creating a failover cluster
• Demonstration: Creating a failover cluster
• Demonstration: Reviewing the Validation Wizard
• Configuring roles
• Demonstration: Creating a general file-server failover cluster
• Managing failover clusters
• Configuring cluster properties
• Configuring failover and failback
• Configuring storage
• Configuring networking
• Configuring quorum options
• Demonstration: Configuring the quorum

The Validation Wizard and the cluster support-policy requirements
• The Validation Wizard performs multiple types of tests, such as:
  • Cluster
  • Inventory
  • Network
  • Storage
  • System
• You can perform validation from the Validate a Configuration Wizard or with the Test-Cluster Windows PowerShell cmdlet

The process for creating a failover cluster
1. Install the failover clustering feature
2. Verify the configuration, and create a cluster
3. Install the role on all cluster nodes by using Server Manager
4. Create a clustered application by using the Failover Clustering Management snap-in
5. Configure the application
6. Test failover

Demonstration: Creating a failover cluster
In this demonstration, you will learn how to install the Failover Clustering feature

Demonstration: Reviewing the Validation Wizard
In this demonstration, you will learn how to validate and configure a failover cluster

Configuring roles
• Configuring a cluster role includes:
  • Choosing a clustering role
  • Installing the role
  • Verifying the status (Running) on all cluster nodes
• You can configure a cluster role by using:
  • The Cluster Manager console
  • The New-Cluster Windows PowerShell cmdlet

Demonstration: Creating a general file-server failover cluster
In this demonstration, you will learn how to cluster a file server role

Managing failover clusters
The most common management tasks include:
• Managing nodes
• Managing networks
• Managing permissions
• Configuring cluster-quorum settings
• Migrating services and applications to a cluster
• Configuring new services and applications
• Removing the cluster

Configuring cluster properties
The three aspects of managing cluster nodes include:
• Adding nodes after you create a cluster
• Pausing nodes, which prevents resources from running on that node
• Evicting nodes from a cluster, which removes the node from the cluster configuration
Configuration tasks are available in:
• The Actions pane of the Failover Cluster
Management console
• Windows PowerShell

Configuring failover and failback
• During failover, the clustered instance and all associated resources move from one node to another
• Failover occurs when:
  • The node that hosts the instance becomes inactive for some reason
  • One of the resources within the instance fails
  • An administrator performs a failover
• The Cluster service can fail back after the offline node becomes active again
• Failover can be planned or unplanned

Configuring storage
Storage configuration tasks in Failover Clustering include:
• Adding storage spaces
• Adding a disk to available storage and to the CSV
• Taking a disk offline
• Bringing the disk back online

Configuring networking
Network | Description
Public network | Clients use this network to connect to the clustered service
Private network | Nodes use this network to communicate with each other
Public-and-private network | Required to communicate with external storage systems
• One network can support both client and node communications
• Multiple network adapters are recommended for enhanced performance and redundancy
• iSCSI storage should have a dedicated network

Configuring quorum options
Quorum configuration options (available in the Configure Cluster Quorum Wizard and Windows PowerShell) include:
• Use typical settings
• Add or change the quorum witness
• Advanced quorum configuration and witness selection

Dynamic quorum and quorum-configuration considerations
• Dynamic quorum management:
  • The failover cluster dynamically manages the vote assignment to nodes
  • Allows a cluster to run on the last surviving cluster node
  • Cannot survive a simultaneous failure of a majority of voting nodes
  • If you explicitly remove a vote from a node, the cluster cannot dynamically add or remove that vote
• Quorum configuration considerations include:
  • Validating the quorum configuration by using the Validate a Configuration Wizard or the Test-Cluster Windows PowerShell cmdlet
  • Changing the quorum configuration only in specific scenarios:
    • Adding or evicting nodes
    • A node or witness has failed and cannot be recovered quickly
    • Recovering a cluster in a multisite disaster recovery scenario

Demonstration: Configuring the quorum
In this demonstration, you will learn how to configure a quorum

Lab A: Implementing failover clustering
• Exercise 1: Creating a failover cluster
• Exercise 2: Verifying quorum settings and adding a node
Logon Information
Virtual machines: 20740A-LON-DC1, 20740A-LON-SVR1, 20740A-LON-SVR2, 20740A-LON-SVR3, 20740A-LON-SVR5, 20740A-LON-CL1
User name: Adatum\Administrator
Password: Pa$$w0rd
Estimated Time: 45 minutes

Lab Scenario
A. Datum Corporation is looking to ensure that its critical services, such as file services, have better uptime and availability. You decide to implement a failover cluster with file services to provide better uptime and availability.

Lab Review
• What information do you need for planning a failover-cluster implementation?
• After running the Validate a Configuration Wizard, how can you resolve a single point of failure in network communication?
• In which situations might it be important to enable failback of a clustered application during a specific time?

Lesson 3: Maintaining a failover cluster
• Monitoring failover clusters
• Backing up and restoring failover-cluster configuration
• Maintaining failover clusters
• Managing cluster-network heartbeat traffic
• What is cluster-aware updating?
• Demonstration: Configuring CAU

Monitoring failover clusters
Tools you can use to monitor clusters include:
• Event Viewer
• Tracerpt.exe
• MHTML-formatted cluster configuration reports
• Performance and Reliability Monitor snap-in

Backing up and restoring failover-cluster configuration
• When backing up failover clusters, remember that:
  • Windows Server Backup is a Windows Server 2016 feature
  • Non-Microsoft tools are available to perform backups and restores
  • You must perform system-state backups
• A nonauthoritative restore completely restores a single node in the cluster
• An authoritative restore restores the entire cluster configuration to a point in time

Maintaining failover clusters
Failover cluster troubleshooting techniques include:
• Using the Validate a Configuration Wizard
• Reviewing events in logs (cluster, hardware, storage)
• Defining a process for troubleshooting failover clusters
• Reviewing storage configuration
• Checking for group and resource failures

Managing cluster-network heartbeat traffic
• Types of network monitoring:
  • Aggressive
  • Relaxed
• Network-monitoring parameter settings:
  • Delay
  • Threshold
• Windows PowerShell cmdlet examples:
  Get-Cluster | fl *subnet*
  (Get-Cluster).SameSubnetThreshold = 10

What is cluster-aware updating?
• Automated feature in Windows Server 2016
• Updates nodes in a cluster with minimal or no downtime
• Benefits:
  • Updating is automatic
  • Can be scheduled
  • No downtime

How CAU works
CAU works in two modes:
• Remote updating mode:
  • Configure a separate computer as an orchestrator
  • Install the failover-clustering administrative tools
  • Ensure that the orchestrator computer is not a cluster member
• Self-updating mode:
  • Configure the CAU clustered role as a workload
  • Ensure that there is no dedicated orchestrator computer
  • Remember that the cluster updates itself

Demonstration: Configuring CAU
In this demonstration, you will learn how to configure CAU

Lesson 4: Troubleshooting a failover cluster
• Communication issues
• Repairing the cluster name object in AD DS
• Starting a cluster with no quorum
• Demonstration: Reviewing the Cluster.Log file
• Monitoring performance with failover clustering
• Using Event Viewer with failover clustering
• Windows PowerShell troubleshooting cmdlets

Communication issues
• The following might cause communications issues in failover clustering:
  • Network latency
  • Network failures
  • Network-adapter driver issues
  • Firewall rules
  • Security software
• You can use the Get-ClusterLog cmdlet to generate the Cluster.log file, located in C:\Windows\Cluster\Reports, for troubleshooting

Repairing the cluster name object in AD DS
• The CNO repair process:
  • Use the Repair Active Directory Object option in the Failover Cluster Manager
  • You must have Reset Password permissions on the CNO computer object
• The VCO repair process:
  • Use the AD Recycle Bin feature to recover deleted computer objects, and use the Repair function as the last recovery action
  • The CNO will reset the password and self-heal automatically
  • The CNO must have Create Computer Objects permissions on the VCO’s OU

Starting a cluster with no quorum
• Cluster nodes must retain quorum for the cluster to work
• If quorum is lost, try to reestablish the quorum
• If you cannot reestablish
quorum during an extended period, start the cluster in the ForceQuorum mode
• After you start the cluster in ForceQuorum mode, other nodes can rejoin the cluster
• Once quorum is reestablished, the cluster mode changes from ForceQuorum to normal automatically
• When joining nodes to the cluster in ForceQuorum mode, you should start the other nodes with a setting that prevents quorum

Demonstration: Reviewing the Cluster.Log file
In this demonstration, you will learn how to review the Cluster.log file

Monitoring performance with failover clustering
Some of the failover clustering performance counters include:
• Cluster Network Messages
• Cluster Network Reconnections
• Global Update Manager
• Database
• Resource Control
• API
• Cluster Shared Volumes

Using Event Viewer with failover clustering
Events that are displayed in Event Viewer and require you to troubleshoot clusters include:
• Cluster resource in clustered service or application failed
• Cluster network interface for cluster node on network failed
• File share witness resource failed to arbitrate for the file share
• Cluster node was removed from the active failover cluster membership
• The Cluster service failed to bring clustered service or application completely online or offline
• Cluster network name resource failed registration of one or more associated DNS name(s)
• Cluster network name resource cannot be brought online

Windows PowerShell troubleshooting cmdlets
Common cmdlets for troubleshooting failover clustering include:
• Get-Cluster
• Get-ClusterAccess
• Get-ClusterDiagnostics
• Get-ClusterGroup
• Get-ClusterLog
• Get-ClusterNetwork
• Get-ClusterResourceDependencyReport
• Get-ClusterVMMonitoredItem
• Test-Cluster
• Test-ClusterResourceFailure

Lesson 5: Implementing site high availability with stretch clustering
• What is a stretch cluster?
• Prerequisites for implementing a stretch cluster
• Synchronous and asynchronous replication
• Overview of the Storage Replica feature
• Demonstration: Implementing server-to-server storage replica
• Selecting a quorum mode for a stretch cluster
• Configuring a stretch cluster
• Challenges for deploying a stretch cluster
• Multisite failover and failback considerations

What is a stretch cluster?
A stretch cluster is a cluster that has been extended so that different nodes in the same cluster reside in separate physical locations
[Figure: Site A and Site B, each with its own SAN]

Prerequisites for implementing a stretch cluster
To implement a stretch-failover cluster, you must ensure the following:
• Plan for additional hardware to support enough nodes on each site
• Ensure that the same operating systems and service packs are installed on each node
• Include at least one low-latency and reliable network connection between sites
• Configure a storage replication mechanism
• Configure storage infrastructure services on each site

Synchronous and asynchronous replication
• In synchronous replication, the host receives a write complete response from the primary storage after the data is written successfully to both storage locations
• In asynchronous replication, the host receives a write complete response from the primary storage after the data is written successfully on the primary storage
[Figure: write request and write complete response at Site A; data replicated from primary storage at Site A to secondary storage at Site B]

Overview of the Storage Replica feature
• Use for disaster recovery or preparedness
• Configure via Failover Cluster Manager or Windows PowerShell
• The three replication scenarios are:
  • Stretch cluster
  • Server-to-server
  • Cluster-to-cluster
• Replicates synchronously or asynchronously
• Requires Windows Server 2016 Datacenter Edition
• Requires GPT-initialized disks

Storage Replica
• Synchronous replication
• Asynchronous replication

Storage Replica
Hyper-V stretch cluster supports synchronous replication only
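
The difference between the synchronous and asynchronous replication modes described earlier in this lesson comes down to when the host receives its write complete response. The following Python sketch is a toy model of that timing only. It is not how Storage Replica is implemented, and the names Volume, ReplicatedPair, and replicate_pending are hypothetical, introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Volume:
    """A toy storage volume that only remembers the blocks written to it."""
    name: str
    blocks: List[str] = field(default_factory=list)


@dataclass
class ReplicatedPair:
    """Hypothetical model of a primary volume replicated to a secondary volume."""
    primary: Volume
    secondary: Volume
    synchronous: bool
    pending: List[str] = field(default_factory=list)  # writes not yet copied

    def write(self, data: str) -> str:
        # The primary storage is always written before the acknowledgment.
        self.primary.blocks.append(data)
        if self.synchronous:
            # Synchronous: the secondary is also written before the host is told.
            self.secondary.blocks.append(data)
        else:
            # Asynchronous: the copy to the secondary is deferred.
            self.pending.append(data)
        return "write complete"

    def replicate_pending(self) -> None:
        # Copy deferred writes to the secondary in their original order.
        while self.pending:
            self.secondary.blocks.append(self.pending.pop(0))


sync_pair = ReplicatedPair(Volume("site-a"), Volume("site-b"), synchronous=True)
async_pair = ReplicatedPair(Volume("site-a"), Volume("site-b"), synchronous=False)

sync_ack = sync_pair.write("block-1")
async_ack = async_pair.write("block-1")
```

In this model both writes return the same response, but only the synchronous pair already holds the block at the secondary site when the response arrives. The asynchronous pair holds it only after replicate_pending runs, which is why a site failure in that window can lose recent writes.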
Storage Replica
Server-to-server supports both synchronous and asynchronous replication

Storage Replica
Cluster-to-cluster supports synchronous replication only

Demonstration: Implementing server-to-server storage replica
In this demonstration, you will learn how to configure storage replica

Selecting a quorum mode for a stretch cluster
• File-share witness:
  • Requires three or more datacenter locations
  • Is available in Windows Server 2012 R2 and Windows Server 2016
• Azure Cloud Witness:
  • Requires two datacenter locations
  • Requires Internet connection for all nodes
  • Is available in Windows Server 2016 only
• No witness:
  • Is not recommended
  • Manual failover (disaster-recovery site)

Configuring a stretch cluster
Site-aware failover-cluster services provide:
• Failover affinity
• Cross-site heartbeating
• Preferred site configuration

Challenges for deploying a stretch cluster
When deploying stretch clusters:
• Ensure that the business requirements are met
• Use storage replication between sites:
  • Hardware vendor (Windows Server 2012 R2 or earlier)
  • Storage Replica (Windows Server 2016)
• Choose the correct quorum witness to properly maintain functionality in the event of failures
• Choose the correct storage-replication solution to meet the needs for Storage Replica

Multisite failover and failback considerations
When implementing stretch clusters in disaster recovery scenarios, consider the following:
• Failover time
• Services for failover
• Quorum maintenance
• Storage connection
• Published services and name resolution
• Client connectivity
• Failback procedure

Lab B: Managing a failover cluster
• Exercise 1: Evicting a node and verifying quorum settings
• Exercise 2: Changing the quorum from disk witness to file-share witness, and defining node voting
• Exercise 3: Verifying high availability
Logon Information
Virtual machines: 20740A-LON-DC1, 20740A-LON-SVR1, 20740A-LON-SVR2, 20740A-LON-SVR3, 20740A-LON-SVR5, 20740A-LON-CL1
User name: Adatum\Administrator
Password: Pa$$w0rd
Estimated Time: 45 minutes

Lab Scenario
A. Datum Corporation recently implemented failover clustering for better uptime and availability. The implementation is new and your boss has asked you to go through some failover-cluster management tasks so that you are prepared to manage it moving forward.

Lab Review
• Why would you evict a cluster node from a failover cluster?
• Do you perform failure-scenario testing for your highly available applications based on Windows Server failover clustering?

Module Review and Takeaways
• Review Questions
• Real-world Issues and Scenarios
• Tools
• Best Practices
• Common Issues and Troubleshooting Tips
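
As a review aid for the quorum modes covered in Lesson 1, the "more than half of the votes are online" rule can be sketched as simple arithmetic. The Python below is a minimal illustration under that stated rule only. It does not model dynamic quorum or the Cluster service itself, and the function names are hypothetical.

```python
def has_majority(votes_online: int, votes_total: int) -> bool:
    """Return True when strictly more than half of the votes are online."""
    return 2 * votes_online > votes_total


def node_majority(nodes_online: int, nodes_total: int) -> bool:
    # Node majority: only the cluster nodes have a vote.
    return has_majority(nodes_online, nodes_total)


def node_and_witness_majority(nodes_online: int, nodes_total: int,
                              witness_online: bool) -> bool:
    # Node and disk (or file share) majority: the witness adds one vote.
    online = nodes_online + (1 if witness_online else 0)
    return has_majority(online, nodes_total + 1)


def disk_only(shared_disk_online: bool) -> bool:
    # No majority, disk only: quorum follows the shared disk alone.
    return shared_disk_online
```

For example, in a four-node cluster with two nodes online, node_majority(2, 4) is False because two of four votes is not more than half, whereas node_and_witness_majority(2, 4, True) is True because the witness brings the count to three of five votes.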