Exchange Server 2013 High Availability and Site Resilience

• Capabilities are subject to change
• Packaging and licensing have not yet been determined
• Any screen captures or concepts shown are pre-release and for illustration purposes only

Disclaimer

This presentation contains preliminary information that may be changed substantially prior to final commercial release of the software described herein. The information contained in this presentation represents the current view of Microsoft Corporation on the issues discussed as of the date of the presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of the presentation. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESSED, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this presentation. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this information does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
• Database sizes must be manageable
• Reseeds must be fast and reliable
• Passive copy disk IOPS are inefficient
• Lagged copies have asymmetric storage design
• Limited agility in low disk space recovery
• Capacity is increasing, but IOPS aren’t

[Diagram: a DAG of four servers (Server1 to Server4) hosting active and passive copies of DB1 to DB4, with seeding streams labeled 12MB/Sec and 20MB/Sec]

Single database copy/disk:
• Reseed 2TB Database = ~23 hrs
• Reseed 8TB Database = ~93 hrs

4 database copies/disk:
• Reseed 2TB Disk = ~9.7 hrs
• Reseed 8TB Disk = ~39 hrs

[Diagram: per-disk IOPS utilization (30%, 40%, and 50%) across Server1 to Server4 in a DAG, for disk layouts A/A/P/L and A/P/P/L, where A = Active, P = Passive, L = Lag]

Exchange 2013 failover speed 2x better than Exchange 2010!
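The reseed times quoted above can be turned into implied average throughput with simple arithmetic. The sketch below is only a back-of-the-envelope check, not part of the original deck: it assumes decimal units (1 TB = 1,000,000 MB), and the function and variable names are made up for illustration.

```python
# Reseed figures quoted on the slide: (copies per disk, size in TB, hours).
QUOTED_RESEEDS = [
    (1, 2, 23.0),
    (1, 8, 93.0),
    (4, 2, 9.7),
    (4, 8, 39.0),
]

MB_PER_TB = 1_000_000  # assumption: decimal units (1 TB = 10^6 MB)


def implied_mb_per_sec(size_tb, hours):
    """Average throughput needed to move size_tb terabytes in the given hours."""
    return size_tb * MB_PER_TB / (hours * 3600)


def speedup(size_tb):
    """Ratio of the 1-copy/disk reseed time to the 4-copies/disk reseed time."""
    hours = {copies: h for copies, tb, h in QUOTED_RESEEDS if tb == size_tb}
    return hours[1] / hours[4]


if __name__ == "__main__":
    for copies, size_tb, hours in QUOTED_RESEEDS:
        rate = implied_mb_per_sec(size_tb, hours)
        print(f"{copies} copies/disk, {size_tb} TB in {hours} h -> {rate:.1f} MB/sec")
    print(f"2 TB speedup: {speedup(2):.2f}x, 8 TB speedup: {speedup(8):.2f}x")
```

Under these assumptions the quoted times correspond to roughly 24 MB/sec with one copy per disk and roughly 57 MB/sec with four copies per disk, a speedup of about 2.4x. One plausible reading is that a disk holding four copies can be reseeded from several source servers in parallel, but the slide text itself does not state this.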
Decreased IO latency = better user experience

Exchange Server 2010
Passive database IOPS = active database IOPS
• Active = 100MB Checkpoint Depth
• Passives = 5MB Checkpoint Depth

Exchange Server 2013
Passive database IOPS = 50% of active database IOPS
• IOPS Savings + ESE Fast failover = 100MB Checkpoint Depth on passives with no failover perf penalty

25% increase in aggregate disk utilization, e.g.,
• 4 database copies/disk
• Balanced = 1 Active, 2 Passives, 1 Lag

Failure Mode: Actives/disk doubles*
• 2 Active, 1 Passive, 1 Lag
• 50% IOPS Utilization

• Passive database IO = active database IO
• Passive copy performs aggressive prereading
• Background database maintenance runs at 5 MB/sec/copy

• Single logical disk/partition per physical disk
• Database copies per volume = copies per database
• Same neighbors on all servers
• Balance activation preferences

• Disk failure on active copy = database failover
• Failed disk & database corruption need to be addressed quickly
• Fast recovery to restore redundancy is needed
• Use spares to automatically restore database redundancy after a disk failure

Automatic Reseed
• Periodically scan for failed and suspended copies
• Check prerequisites: single copy, spare availability
• Allocate and remap a spare
• Start the seed
• Verify that the new copy is healthy
• Release the original disk

1. Configure storage subsystem with spare disks
2. Create DAG, add servers with configured storage
3. Create directory and mount points
4. Configure DAG, including 3 new properties: AutoDagDatabasesRootFolderPath, AutoDagVolumesRootFolderPath, AutoDagDatabaseCopiesPerVolume
5. Create mailbox databases and database copies

[Diagram: folder and mount point layout for MDB1 and MDB2 (MDB1 DB, MDB1 logs) with AutoDagDatabaseCopiesPerVolume = 1]

Name: System Bad State
Check: No threads, including non-managed threads, can be scheduled
Action: Hard restart (bugcheck)
Threshold: 302 seconds

Name: Long I/O Times
Check: I/O operation latency
Action: Hard restart (bugcheck)
Threshold: 41 seconds

Name: Replication service memory threshold (ok, not a storage failure)
Check: MSExchangeRepl.exe consumes excessive memory
Action:
1. Log event 4395 with termination message
2. Initiate termination of MSExchangeRepl.exe
3. If termination fails, hard restart (bugcheck)
Threshold: 4 GB

• Managed Availability
• Database Failover Changes
• DAG Network Auto-Config
• Best Copy Selection Changes
• Cmdlet Enhancements
• Maintenance Mode
• Transport HA Enhancements

If a protocol goes down on a mailbox server, every active database loses access to that protocol. For most protocols, quick correction is provided through a restart action. If the restart fails, often a failover is triggered.
• Protocols control recovery sequence
• Recovery sequence optimized through Office 365 experience; service experience accrues to enterprise!

[Diagram: Layer 4 LB in front of CAS15-1 and CAS15-2, and a DAG of MBX15-1, MBX15-2, and MBX15-3, each hosting DB1, DB2, and OWA, shown over time]

Managed Availability = Monitoring + HA
“Stuff breaks, but the Experience does not”

Managed Availability provides:
• Reliable and scalable monitoring framework for Exchange components
• Broader perspective across groups of Exchange servers
• Sequencing mechanism to control when recovery actions are done vs. alert issued (human engaged)
• Common set of recovery actions
• Set of enhancements to the best copy selection (BCS) process
• Mechanism to control in and out of service for Mailbox and CAS (maintenance mode++)

Restart
• Service - kill and start a service; optional dump
• AppPool - restart an app pool; optional dump
• Server - bugcheck the machine

Failover, Offline, Online
• Database - failover a single active database
• System - failover all active databases
• Protocol off - set health state for protocol to offline
• Protocol on - calculate when a health set is green

Escalate
• Notify a human of an issue

Same as Source: Checks for a server hosting a copy of the affected database that has health sets in a state that is the same as the current server hosting the affected copy
All Better than Source: Checks for a server hosting a copy of the affected database that has health sets in a state that is better than the current server hosting the affected copy
Up to Normal Healthy: Checks for a server hosting a copy of the affected database that has all health sets Medium and above in a healthy state
All Healthy: Checks for a server hosting a copy of the affected database that has all health sets in a healthy state

[Diagram: cas1 and cas2 in Redmond; cas3 and cas4 in Portland]

1. Mark the failed servers/site as down: Stop-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Redmond
2. Stop the Cluster Service on remaining DAG members: Stop-Service clussvc
3. Activate DAG members in 2nd datacenter: Restore-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Portland

[Diagram: dag1 with mbx1 and mbx2 in Redmond; mbx3 and mbx4 in Portland]

• Namespace simplification
• Consolidation of server roles
• Separation of CAS array and DAG recovery
• De-coupling of CAS and Mailbox by AD site
• Load balancing changes
• Three locations

Assuming MBX3 and MBX4 are operating and one of them can lock the witness.log file, automatic failover should occur. If not, you can perform fast recovery using the previous steps.

[Diagram: dag1 with mbx1 and mbx2 in Redmond, mbx3 and mbx4 in Portland, and the witness in a third location]

• Download the preview version of Exchange Server 2013
• Try the new Exchange Online in the Office 365 Enterprise Preview
• Follow the Exchange Team Blog
• Product Documentation
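The third-location witness scenario described above (MBX3 and MBX4 stay up and one of them locks witness.log) can be reasoned about with a simple majority-vote model. The sketch below is an illustrative simplification, not the actual cluster quorum implementation: it assumes one vote per DAG member plus one vote for the witness, a static strict-majority rule, and a hypothetical site name `ThirdSite` for the witness location.

```python
# DAG members and their sites, as named in the deck's diagrams.
MEMBERS = {
    "mbx1": "Redmond",
    "mbx2": "Redmond",
    "mbx3": "Portland",
    "mbx4": "Portland",
}


def surviving_votes(failed_site, witness_site):
    """Count votes still reachable after failed_site goes down.

    Assumption: each member holds one vote and the witness holds one vote,
    which survives only if the witness is outside the failed site.
    """
    votes = sum(1 for site in MEMBERS.values() if site != failed_site)
    if witness_site != failed_site:
        votes += 1
    return votes


def keeps_quorum(failed_site, witness_site):
    """True if the surviving votes form a strict majority of all votes."""
    total_votes = len(MEMBERS) + 1  # four members plus the witness
    return surviving_votes(failed_site, witness_site) > total_votes // 2


if __name__ == "__main__":
    # "ThirdSite" is a hypothetical name for the third location.
    for witness_site in ("ThirdSite", "Redmond"):
        ok = keeps_quorum("Redmond", witness_site)
        outcome = "automatic failover possible" if ok else "manual recovery steps needed"
        print(f"Redmond lost, witness in {witness_site}: {outcome}")
```

In this model, losing Redmond with the witness in a third location leaves three of five votes, so the Portland members keep quorum. With the witness in Redmond only two of five votes remain, which corresponds to the case where the manual Stop-DatabaseAvailabilityGroup and Restore-DatabaseAvailabilityGroup steps are used instead.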
