Module 8
Implementing failover clustering
Module Overview
• Planning a failover cluster
• Creating and configuring a new failover cluster
• Maintaining a failover cluster
• Troubleshooting a failover cluster
• Implementing site high availability with stretch
clustering
Lesson 1: Planning a failover cluster
• Preparing to implement failover clustering
• Failover-cluster storage
• Hardware requirements for a failover-cluster
implementation
• Network requirements for a failover-cluster
implementation
• Demonstration: Verify a network adapter's RSS and
RDMA compatibility on an SMB Server
• Infrastructure and software requirements for a failover
cluster
• Security considerations
• Quorum in Windows Server 2016
• Planning for migrating and upgrading failover clusters
Preparing to implement failover clustering
Features of failover clustering include:
• High availability
• Stateful application
• IP-based protocols
Preparing to implement failover clustering
Consider the following guidelines when planning
node capacity in a failover cluster:
• Distribute the highly-available applications from a
failed node
• Ensure that each node has sufficient capacity
• Use hardware with similar capacity for all nodes in
a cluster
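The capacity guideline above can be checked with simple arithmetic. The following is a minimal sketch, assuming each node's load and capacity can be expressed as a single number (for example, GB of memory) and that workloads from a failed node can be spread freely across the surviving nodes. The function name and the sample figures are hypothetical and are not taken from the course.

```python
def survives_single_failure(loads: dict[str, float], capacity: dict[str, float]) -> bool:
    """Return True if the remaining nodes can absorb the load of any one failed node."""
    for failed, load in loads.items():
        # Spare capacity on every node except the one that failed.
        spare = sum(capacity[name] - loads[name] for name in loads if name != failed)
        if load > spare:
            return False
    return True


# Hypothetical three-node cluster with identical hardware (capacity 100 each).
capacity = {"node1": 100, "node2": 100, "node3": 100}
print(survives_single_failure({"node1": 60, "node2": 60, "node3": 60}, capacity))
print(survives_single_failure({"node1": 70, "node2": 70, "node3": 70}, capacity))
```

With loads of 60 per node, the two survivors have 80 units of spare capacity between them, which covers the failed node. With loads of 70 they have only 60, so the second check fails.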
Failover-cluster storage
• Failover clusters require shared storage to provide
consistent data to a virtual server after failover
• Shared storage options include:
• SAS
• iSCSI
• Fibre Channel
• Shared .vhdx
• Scale-Out File Server
• You can also implement clustered
storage spaces to achieve high
availability at the storage level
Hardware requirements for a failover-cluster
implementation
The hardware requirements for a failover-cluster
implementation include:
• You must use server hardware that is certified for
Windows Server
• Server nodes should all have the same
configuration and contain the same or similar
components
• All servers must pass the tests in the Validate a
Configuration Wizard
Network requirements for a failover-cluster
implementation
The network requirements for a failover-cluster
implementation include:
• Your server should connect to multiple networks
to ensure communication redundancy, or it should
connect to a single network with redundant
hardware, to remove single points of failure
• You should ensure that network adapters are
identical and that they have the same IP protocol
versions, speed, duplex, and flow-control
capabilities
• Your network adapters should be compatible with
RSS and RDMA
Demonstration: Verify a network adapter's RSS
and RDMA compatibility on an SMB Server
In this demonstration, you will learn how to verify a
network adapter’s RSS and RDMA compatibility on
an SMB Server
Infrastructure and software requirements for a
failover cluster
• The infrastructure requirements for a failover-cluster
implementation include:
• Active Directory domain controllers should run
Windows Server 2008 or newer
• Domain-functional level and forest-functional level
should be Windows Server 2008 or newer
• The application must support Windows Server 2016
high availability
• The software best practices for a failover-cluster
implementation require that:
• All nodes have the same edition of Windows Server
2016, and the same service pack and updates
Security considerations
• Security considerations for failover clustering include that
you must:
• Provide a method for authentication and authorization
• Ensure that unauthorized users do not have physical access to failover
cluster nodes
• Ensure that you use antimalware software
• Ensure that your intra-cluster communication authenticates with
Kerberos version 5
• If you use an Active Directory-detached cluster:
• AD DS objects for network names are not created
• The cluster network name is registered in DNS, so you do not need to
create new objects in AD DS
• We do not recommend this for any scenario that requires Kerberos
authentication
• You must run Windows Server 2012 R2 or newer on all cluster nodes
Security considerations
Windows Server 2016 introduces several cluster
types, and which one you use depends on your
domain-membership scenario:
• Single-domain clusters
• Workgroup clusters
• Multi-domain clusters
• Workgroup and domain clusters
Quorum in Windows Server 2016
Quorum mode | What has the vote? | When is quorum maintained?
Node majority | Only nodes in the cluster have a vote | When more than half of the nodes are online
Node and disk majority | The nodes in the cluster and a disk witness have a vote | When more than half of the votes are online
Node and file share majority | The nodes in the cluster and a file share witness have a vote | When more than half of the votes are online
No majority: disk only | Only the quorum-shared disk has a vote | When the shared disk is online
Dynamic quorum | Votes are dynamically assigned to always be odd | When half the votes are online
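The majority-based modes above share the same arithmetic: quorum holds while strictly more than half of the votes are online. The following is a minimal sketch of that rule. The function names are hypothetical, and the sketch does not cover the disk-only mode, where the shared disk alone decides.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Return True when strictly more than half of all votes are online."""
    if total_votes <= 0 or not 0 <= votes_online <= total_votes:
        raise ValueError("vote counts out of range")
    return 2 * votes_online > total_votes


def tolerated_failures(total_votes: int) -> int:
    """Largest number of votes that can go offline while a majority remains."""
    return (total_votes - 1) // 2


# Four nodes plus one witness give five votes; two voters may fail.
print(has_quorum(3, 5), tolerated_failures(5))
# Four nodes and no witness: losing two of four votes loses the majority.
print(has_quorum(2, 4), tolerated_failures(4))
```

The second case shows why an odd number of votes is preferable: adding a witness to four nodes raises the number of tolerated failures from one to two.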
Quorum in Windows Server 2016
• Dynamic quorum:
• Disk witness
• File share witness
• Azure Cloud Witness
• We recommend that you use dynamic quorum,
which is the default configuration
• You should use all other forms of quorum in
specific use cases only
Planning for migrating and upgrading failover clusters
The upgrade steps for each node in the cluster
include:
• Pause the cluster node and drain all cluster resources
• Migrate cluster resources to another node in the cluster
• Replace the cluster node operating system with Windows
Server 2016 and add the node back to the cluster
• Upgrade all nodes to Windows Server 2016
• Run the Update-ClusterFunctionalLevel cmdlet
Lesson 2: Creating and configuring a new
failover cluster
• The Validation Wizard and the cluster support-policy
requirements
• The process for creating a failover cluster
• Demonstration: Creating a failover cluster
• Demonstration: Reviewing the Validation Wizard
• Configuring roles
• Demonstration: Creating a general file-server failover cluster
• Managing failover clusters
• Configuring cluster properties
• Configuring failover and failback
• Configuring storage
• Configuring networking
• Configuring quorum options
• Demonstration: Configuring the quorum
The Validation Wizard and the cluster support-policy requirements
• The Validation Wizard performs multiple types of tests,
such as:
• Cluster
• Inventory
• Network
• Storage
• System
• You can perform validation from the Validate a
Configuration Wizard or with the Test-Cluster
Windows PowerShell cmdlet
The process for creating a failover cluster
1. Install the failover clustering feature
2. Verify the configuration, and create a cluster
3. Install the role on all cluster nodes by using
Server Manager
4. Create a clustered application by using the
Failover Clustering Management snap-in
5. Configure the application
6. Test failover
Demonstration: Creating a failover cluster
In this demonstration, you will learn how to install a
Failover Clustering feature
Demonstration: Reviewing the Validation Wizard
In this demonstration, you will learn how to validate
and configure a failover cluster
Configuring roles
• Configuring a cluster role includes:
• Choosing a clustering role
• Installing the role
• Verifying the status (Running) on all cluster nodes
• You can configure a cluster role by using:
• The Failover Cluster Manager console
• Windows PowerShell role cmdlets such as Add-ClusterFileServerRole
Demonstration: Creating a general file-server
failover cluster
In this demonstration, you will learn how to cluster
a file server role
Managing failover clusters
The most common management tasks
include:
• Managing nodes
• Managing networks
• Managing permissions
• Configuring cluster-quorum settings
• Migrating services and applications to a cluster
• Configuring new services and applications
• Removing the cluster
Configuring cluster properties
The three aspects of managing cluster nodes
include:
• Adding nodes after you create a cluster
• Pausing nodes, which prevents resources from
running on that node
• Evicting nodes from a cluster, which removes the
node from the cluster configuration
Configuration tasks are available in:
• The Actions pane of the Failover Cluster
Management console
• Windows PowerShell
Configuring failover and failback
• During failover, the clustered instance and all
associated resources move from one node to
another
• Failover occurs when:
• The node that hosts the instance becomes inactive for
some reason
• One of the resources within the instance fails
• An administrator performs a failover
• The Cluster service can fail back after the offline
node becomes active again
• Failover can be planned or unplanned
Configuring storage
Storage configuration tasks in Failover Clustering
include:
• Adding storage spaces
• Adding a disk to available storage and to the CSV
• Taking a disk offline
• Bringing the disk back online
Configuring networking
Network | Description
Public network | Clients use this network to connect to the clustered service
Private network | Nodes use this network to communicate with each other
Public-and-private network | Required to communicate with external storage systems
• One network can support both client and node
communications
• Multiple network adapters are recommended for
enhanced performance and redundancy
• iSCSI storage should have a dedicated network
Configuring quorum options
Quorum configuration options (available in the
Configure Cluster Quorum Wizard and Windows
PowerShell) include:
• Use typical settings
• Add or change the quorum witness
• Advanced quorum configuration and witness selection
Dynamic quorum and quorum-configuration
considerations
• Dynamic quorum management:
• Failover cluster dynamically manages the vote assignment to nodes
• Allows for a cluster to run on the last surviving cluster node
• Cannot survive a simultaneous failure of a majority of voting nodes
• If you explicitly remove a vote from a node, the cluster cannot
dynamically add or remove that vote
• Quorum configuration considerations include:
• Validating the quorum configuration by using the Validate a
Configuration Wizard, or the Test-Cluster Windows PowerShell
cmdlet
• Changing the quorum configuration only in specific scenarios:
• Adding or evicting nodes
• A node or witness has failed and cannot be recovered quickly
• Recovering a cluster in a multisite disaster recovery scenario
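The difference between sequential and simultaneous failures under dynamic quorum can be illustrated with a small simulation. This is a simplified sketch, not the Cluster service's actual algorithm. It assumes a witness that stays online and votes only when the number of nodes is even (so the total is always odd), and it assumes votes are recalculated after every batch of failures that the cluster survives. The function names are hypothetical.

```python
from typing import Iterable


def total_votes(nodes: int) -> int:
    """One vote per node, plus the witness vote only when that keeps the total odd."""
    return nodes if nodes % 2 == 1 else nodes + 1


def surviving_nodes(nodes: int, failure_batches: Iterable[int]) -> int:
    """Apply batches of simultaneous node failures; return nodes left, or 0 if quorum is lost."""
    for failed in failure_batches:
        before = total_votes(nodes)
        witness_vote = before - nodes            # 1 when the witness currently votes, else 0
        online = (nodes - failed) + witness_vote
        if failed >= nodes or 2 * online <= before:
            return 0                             # majority lost at once: the cluster stops
        nodes -= failed                          # votes are reassigned among the survivors
    return nodes


# Five nodes failing one at a time: the cluster runs down to the last node.
print(surviving_nodes(5, [1, 1, 1, 1]))
# Three of five nodes failing simultaneously: quorum is lost.
print(surviving_nodes(5, [3]))
```

The first call ends with one node still running, while the second returns 0, which matches the two statements above about the last surviving node and simultaneous majority failures.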
Demonstration: Configuring the quorum
In this demonstration, you will learn how to
configure a quorum
Lab A: Implementing failover clustering
• Exercise 1: Creating a failover cluster
• Exercise 2: Verifying quorum settings and adding a
node
Logon Information
Virtual machines: 20740A-LON-DC1
20740A-LON-SVR1
20740A-LON-SVR2
20740A-LON-SVR3
20740A-LON-SVR5
20740A-LON-CL1
User name: Adatum\Administrator
Password: Pa$$w0rd
Estimated Time: 45 minutes
Lab Scenario
A. Datum Corporation is looking to ensure that its
critical services, such as file services, have better
uptime and availability. You decide to implement a
failover cluster with file services to provide better
uptime and availability.
Lab Review
• What information do you need for planning a
failover-cluster implementation?
• After running Validate a Configuration Wizard,
how can you resolve the network communication’s
single point of failure?
• In which situations might it be important to
enable failback of a clustered application during a
specific time?
Lesson 3: Maintaining a failover cluster
• Monitoring failover clusters
• Backing up and restoring failover-cluster
configuration
• Maintaining failover clusters
• Managing cluster-network heartbeat traffic
• What is cluster-aware updating?
• Demonstration: Configuring CAU
Monitoring failover clusters
Tools you can use to monitor clusters include:
• Event Viewer
• Tracerpt.exe
• MHTML-formatted cluster configuration reports
• Performance and Reliability Monitor snap-in
Backing up and restoring failover-cluster configuration
• When backing up failover clusters, remember that:
• Windows Server Backup is a Windows Server 2016 feature
• Non-Microsoft tools are available to perform backups and restores
• You must perform system-state backups
• A nonauthoritative restore completely restores a single
node in the cluster
• An authoritative restore restores the entire cluster
configuration to a point in time
Maintaining failover clusters
Failover cluster troubleshooting techniques include:
• Using the Validate a Configuration Wizard
• Reviewing events in logs (cluster, hardware, storage)
• Defining a process for troubleshooting failover clusters
• Reviewing storage configuration
• Checking for group and resource failures
Managing cluster-network heartbeat traffic
• Types of network monitoring:
• Aggressive
• Relaxed
• Network-monitoring parameter settings:
• Delay
• Threshold
• Windows PowerShell cmdlet examples:
Get-Cluster | fl *subnet*
(Get-Cluster).SameSubnetThreshold=10
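The delay and threshold settings combine into a single tolerance window. The following is a minimal sketch, assuming the delay is the interval between heartbeats in milliseconds and the threshold is the number of consecutive missed heartbeats that the cluster tolerates. The function name is hypothetical and the values are illustrative, not recommended settings.

```python
def failure_detection_ms(delay_ms: int, threshold: int) -> int:
    """Time without heartbeats, in milliseconds, before a node is treated as down."""
    if delay_ms <= 0 or threshold <= 0:
        raise ValueError("delay and threshold must be positive")
    return delay_ms * threshold


# A more aggressive setting detects failures sooner but tolerates less network jitter.
aggressive = failure_detection_ms(delay_ms=1000, threshold=5)
# A more relaxed setting rides out short network interruptions.
relaxed = failure_detection_ms(delay_ms=1000, threshold=10)
print(aggressive, relaxed)
```

With a one-second delay, a threshold of 5 gives a 5-second window and a threshold of 10 gives a 10-second window.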
What is cluster-aware updating?
• Automated feature in Windows Server 2016
• Updates nodes in a cluster with minimal or no
downtime
• Benefits:
• Updating is automatic
• Can be scheduled
• No downtime
How CAU works
CAU works in two modes:
• Remote updating mode:
• Configure a separate computer as an orchestrator
• Install the failover-clustering administrative tools
• Ensure that the orchestrator computer is not a cluster
member
• Self-updating mode:
• Configure the CAU clustered role as a workload
• Ensure that there is no dedicated orchestrator computer
• Remember that the cluster updates itself
Demonstration: Configuring CAU
In this demonstration, you will learn how to
configure CAU
Lesson 4: Troubleshooting a failover cluster
• Communication issues
• Repairing the cluster name object in AD DS
• Starting a cluster with no quorum
• Demonstration: Reviewing the Cluster.Log file
• Monitoring performance with failover clustering
• Using Event Viewer with failover clustering
• Windows PowerShell troubleshooting cmdlets
Communication issues
• The following might cause communications issues
in failover clustering:
• Network latency
• Network failures
• Network-adapter driver issues
• Firewall rules
• Security software
• You can use the Get-ClusterLog cmdlet to generate
the Cluster.log file for troubleshooting, which is written to
C:\Windows\Cluster\Reports by default
Repairing the cluster name object in AD DS
• The CNO repair process:
• Use Repair Active Directory Object option in the Failover
Cluster Manager
• You must have Reset Password permissions on the CNO
computer object
• The VCO repair process:
• Use the AD Recycle Bin feature to recover deleted
computer objects, and use the Repair function as the last
recovery action
• The CNO will reset the password and self-heal
automatically
• The CNO must have Create Computer Objects
permissions on the VCO’s OU
Starting a cluster with no quorum
• Cluster nodes must retain quorum for the cluster to
work
• If quorum is lost, try to reestablish the quorum
• If you cannot reestablish quorum during an extended
period, start the cluster in the ForceQuorum mode
• After you start the cluster in ForceQuorum mode,
other nodes can rejoin the cluster
• Once quorum is reestablished, cluster mode
changes from ForceQuorum to normal automatically
• When joining nodes to the cluster in ForceQuorum
mode, you should start the other nodes with the setting
that prevents quorum
Demonstration: Reviewing the Cluster.Log file
In this demonstration, you will learn how to review
the Cluster.log file
Monitoring performance with failover clustering
Some of the failover clustering performance
counters include:
• Cluster Network Messages
• Cluster Network Reconnections
• Global Update Manager
• Database
• Resource Control
• API
• Cluster Shared Volumes
Using Event Viewer with failover clustering
Events that are displayed in Event Viewer and require you to
troubleshoot clusters include:
• Cluster resource in clustered service or application failed
• Cluster network interface for cluster node on network
failed
• File share witness resource failed to arbitrate for the file
share
• Cluster node was removed from the active failover cluster
membership
• The Cluster service failed to bring clustered service or
application completely online or offline
• Cluster network name resource failed registration of one or
more associated DNS name(s)
• Cluster network name resource cannot be brought online
Windows PowerShell troubleshooting cmdlets
Common cmdlets for troubleshooting failover
clustering include:
• Get-Cluster
• Get-ClusterAccess
• Get-ClusterDiagnostics
• Get-ClusterGroup
• Get-ClusterLog
• Get-ClusterNetwork
• Get-ClusterResourceDependencyReport
• Get-ClusterVMMonitoredItem
• Test-Cluster
• Test-ClusterResourceFailure
Lesson 5: Implementing site high availability with
stretch clustering
• What is a stretch cluster?
• Prerequisites for implementing a stretch cluster
• Synchronous and asynchronous replication
• Overview of the Storage Replica feature
• Demonstration: Implementing server-to-server
storage replica
• Selecting a quorum mode for a stretch cluster
• Configuring a stretch cluster
• Challenges for deploying a stretch cluster
• Multisite failover and failback considerations
What is a stretch cluster?
A stretch cluster is a cluster that has been extended so that
different nodes in the same cluster reside in separate
physical locations
[Figure: Site A and Site B, each with its own SAN]
Prerequisites for implementing a stretch cluster
To implement a stretch-failover cluster, you must
ensure the following:
• Plan for additional hardware to support enough nodes
on each site
• Ensure that the same operating systems and service
packs are installed on each node
• Include at least one low-latency and reliable network
connection between sites
• Configure a storage replication mechanism
• Configure storage infrastructure services on each site
Synchronous and asynchronous replication
• In synchronous replication, the host receives a write complete response
from the primary storage after the data is written successfully to both
storage locations
• In asynchronous replication, the host receives a write complete
response from the primary storage after the data is written
successfully on the primary storage
[Figure: replication between primary storage (Site A) and secondary storage (Site B), showing the write request, the replicated data, and the write complete response]
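The difference between the two replication modes comes down to when the host receives its acknowledgement. The following toy model is a sketch only. It uses in-memory lists in place of real storage, and the class and method names are hypothetical rather than part of any Storage Replica API.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedVolume:
    synchronous: bool
    primary: list[str] = field(default_factory=list)
    secondary: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)  # written locally, not yet replicated

    def write(self, block: str) -> str:
        """Store a block and return the acknowledgement sent to the host."""
        self.primary.append(block)
        if self.synchronous:
            self.secondary.append(block)   # the remote copy exists before the host is answered
        else:
            self.pending.append(block)     # the host is answered now; replication happens later
        return "write complete"

    def replicate_pending(self) -> None:
        """Ship queued blocks to the secondary (asynchronous mode only)."""
        self.secondary.extend(self.pending)
        self.pending.clear()


sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write("block-1")
print(sync_vol.secondary)                      # the secondary already holds the block

async_vol = ReplicatedVolume(synchronous=False)
async_vol.write("block-1")
print(async_vol.secondary, async_vol.pending)  # acknowledged, but not yet on the secondary
```

In the asynchronous case, anything still in `pending` when the primary site fails is lost, which is the trade-off for lower write latency over long distances.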
Overview of the Storage Replica feature
• Use for disaster recovery or preparedness
• Configure via Failover Cluster Manager or
Windows PowerShell
• The three replication scenarios are:
• Stretch cluster
• Server-to-server
• Cluster-to-cluster
• Replicates synchronously or asynchronously
• Requires Windows Server 2016 Datacenter Edition
• Requires GPT-initialized disks
Storage Replica
• Synchronous replication
• Asynchronous replication
• Hyper-V stretch cluster supports synchronous replication only
• Server-to-server supports both synchronous and
asynchronous replication
• Cluster-to-cluster supports synchronous replication only
Demonstration: Implementing server-to-server
storage replica
In this demonstration, you will learn how to
configure storage replica
Selecting a quorum mode for a stretch cluster
• File-share witness:
• Requires three or more datacenter locations
• Is available in Windows Server 2012 R2 and
Windows Server 2016
• Azure Cloud Witness:
• Requires two datacenter locations
• Requires Internet connection for all nodes
• Is available in Windows Server 2016 only
• No witness:
• Is not recommended
• Manual failover (disaster-recovery site)
Configuring a stretch cluster
Site-aware failover-cluster services provide:
• Failover affinity
• Cross-site heartbeating
• Preferred site configuration
Challenges for deploying a stretch cluster
When deploying stretch clusters:
• Ensure that the business requirements are met
• Use storage replication between sites:
• Hardware vendor (Windows Server 2012 R2 or earlier)
• Storage Replica (Windows Server 2016)
• Choose the correct quorum witness to properly
maintain functionality in the event of failures
• Choose the correct storage-replication solution to meet
the needs for Storage Replica
Multisite failover and failback considerations
When implementing stretch clusters in disaster
recovery scenarios, consider the following:
• Failover time
• Services for failover
• Quorum maintenance
• Storage connection
• Published services and name resolution
• Client connectivity
• Failback procedure
Lab B: Managing a failover cluster
• Exercise 1: Evicting a node and verifying quorum settings
• Exercise 2: Changing the quorum from disk witness to
file-share witness, and defining node voting
• Exercise 3: Verifying high availability
Logon Information
Virtual machines: 20740A-LON-DC1
20740A-LON-SVR1
20740A-LON-SVR2
20740A-LON-SVR3
20740A-LON-SVR5
20740A-LON-CL1
User name: Adatum\Administrator
Password: Pa$$w0rd
Estimated Time: 45 min
Lab Scenario
A. Datum Corporation recently implemented
failover clustering for better uptime and
availability. The implementation is new and your
boss has asked you to go through some failover-cluster
management tasks so that you are
prepared to manage it moving forward.
Lab Review
• Why would you evict a cluster node from a failover
cluster?
• Do you perform failure-scenario testing for your
highly available applications based on Windows
Server failover clustering?
Module Review and Takeaways
• Review Questions
• Real-world Issues and Scenarios
• Tools
• Best Practices
• Common Issues and Troubleshooting Tips