High Availability and Disaster Recovery SQL Server Solution

May 08 – 09 2012, Kongresshaus Berchtesgaden
SQL Server High Availability Concepts & Solution Guidance (2008 R2 & 2012)
Satya SK Jayanty
Director & Principal Architect, D BI A Solutions

About me
IT Experience
• Principal Architect & Consultant – D Bi A Solutions : Europe
• In the IT field for over 20 years (using SQL Server from version 4.2 onwards)
Publications
• Author: Microsoft SQL Server 2008 R2 Administration Cookbook – Packt Publishers (May 2011)
• Co-author: MVP Deep Dives Volume II – Manning Publications (October 2011)
Community Contributions
• SQL Server MVP since 2006
• Founder (SQLMaster) & blogs at www.sqlserver-qa.net (SQL Server Knowledge Sharing Network)
• Contributing Editor & Moderator - www.sql-server-performance.com [SSP]
• Quiz Master & Blogger: www.beyondrelational.com & www.sqlservergeeks.com
• Active participation in assorted forums such as SSP, SQL Server Central, MSDN, SQL Server magazine, dbforums etc.
www.packtpub.com www.sqlserver-qa.net @sqlmaster

Agenda
• Understanding High Availability
• Common terms
• Planned and Unplanned Downtime
• High Availability vs. Disaster Recovery
• Specific SQL Server 2008 + R2 features for High Availability
• SQL Server 2012 – What’s New
• AlwaysOn

Describing High Availability
Number of 9’s    Availability Percentage
2                99%
3                99.9%
4                99.99%
5                99.999%

IT Lifecycle
[Diagram: Database Service Management and IT Operations across Build, Operate and Maintenance]

High Availability Definition
[Diagram: Online/Offline timelines for Database Service Management and IT Operations, marking time between failure, time to repair, uptime and downtime]
MTBF – Mean Time Between Failures, MTTR – Mean Time To Repair

Database Server Availability
Layers, from top to bottom:
• Tables
• Database
• SQL Server Instance
• Operating System
• Hardware, Network and Storage
Total Availability = product of the availability of each component

Downtime
Planned
• Software releases
• OS patch releases
• SQL Server service packs and hotfixes
• Database maintenance and upgrades
Unplanned
• Hardware component failure
• Security breaches
• Human error
• Natural disasters
Consider:
• RTO – Duration of Outage (Recovery Time Objective)
• RPO – Measure of acceptable data loss (Recovery Point Objective)
Justify ROI
• Avoid downtime
• Automating recovery
• Resource utilization

Disaster recovery
[Diagram: Business Continuity Plan and Disaster recovery plan; Online/Offline timelines for Business Function, Database Service Management and IT Operations, marking Recovery Time Objective, data loss and Recovery Point Objective]

High Availability vs. Disaster Recovery
High Availability
• Increased uptime
• Typically focuses within a site
• Component level redundancy
• Windows Failover Clustering
• Downtime: limit investigation time
Disaster Recovery
• Designed to meet RTO and RPO
• Recovery from site-level disasters
• Data and Operations Recovery
• Data backup and Restore
• Recovered: find the root cause
Summary
• High Availability (HA) – prevent an outage
• Disaster Recovery (DR) – address the outage & re-establish HA afterwards
• HA is a feature and DR is an implementation (must be tested)
• Root cause analysis (RCA) is essential in both.
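The availability arithmetic from the slides above (number of 9's, MTBF and MTTR, and total availability as the product of component availabilities) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function names are made up for the example, a 365-day year is assumed, and MTBF is treated as the mean uptime between failures so that availability = MTBF / (MTBF + MTTR).

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year, leap years ignored


def availability_from_nines(nines: int) -> float:
    """Convert a 'number of 9s' (e.g. 3) into an availability fraction (0.999)."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(availability: float) -> float:
    """Downtime per year that a given availability fraction still allows."""
    return (1 - availability) * MINUTES_PER_YEAR


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def total_availability(components) -> float:
    """Total availability is the product of the availability of each layer."""
    return math.prod(components)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        a = availability_from_nines(n)
        print(f"{n} nines: {a:.5%} -> {downtime_minutes_per_year(a):.1f} min/year")

    # Hypothetical per-layer figures: hardware/network/storage, OS, instance, database
    layers = [0.999, 0.9995, 0.9999, 0.9999]
    print(f"stack availability: {total_availability(layers):.4%}")
```

Because the layers multiply, the stack as a whole is always less available than its weakest component, which is why every layer in the Database Server Availability slide needs its own redundancy story.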
SQL SERVER 2008 R2 HIGH AVAILABILITY FEATURES
• Failover clustering (Storage Area Network, Direct Attached Storage)
• Geographically dispersed failover clustering
• Database mirroring
• Log shipping
• Transactional replication
• Peer-to-peer transactional replication

WINDOWS HIGH AVAILABILITY FEATURES
Other features (edition specific)
• Online Indexing
• Hot Add CPU
• Hot Add Memory – Adjust memory online without restart of SQL Services

COMMON CONFIGURATION SCENARIOS
Failover clustering
[Diagram: SQL Instance A on Node A and Node B, MSCS, Storage Area Network]
[Diagram: SQL Instance A and SQL Instance B on Node A and Node B, MSCS, Storage Area Network]
Geographically dispersed failover clustering
[Diagram: SQL Instance A on Node A, Node B, Node C and Node D]
Database mirroring and log shipping
[Diagrams: SQL Instance A on Node A and Node B]
Peer-to-peer transactional replication
[Diagram: SQL Instance A on Node A1 and Node A2, SQL Instance B on Node B1 and Node B2, SQL Instance C on Node C1 and Node C2]

COMPARING HIGH AVAILABILITY SOLUTIONS
Failover
Failover clustering
• SQL Server Instance
• Automated
• Within minutes
• No application redirection required
Geographically dispersed failover cluster
• SQL Server Instance
• Automated
• Within minutes
• No application redirection required
Database mirroring
• Database
• Manual or automated
• Few seconds to minutes
• Application redirection required (can be automated)
Log shipping
• Database
• Manual
• Based on configuration
• Forced application redirection required
Peer-to-peer replication
• Publication
• Application redirection
• Few seconds
• Forced application redirection

COMPARING HIGH AVAILABILITY SOLUTIONS
Secondary server
Failover clustering
• Instance
• Multiple nodes
• Server in same LAN
• Hot standby server
• Available for use
• Shared storage
• Status check using heartbeat
Geographically dispersed failover cluster
• Instance
• Multiple nodes
• Span across WAN
• Hot standby
• Available for use
• Shared storage at site-level
• Status check using heartbeat
Database mirroring
• Database
• Principal and Mirror
• Span across WAN
• Hot standby
• Available for use
• Storage need not be shared
• No status check unless using witness
Log shipping
• Database
• Multiple secondary nodes
• Span across WAN
• Warm standby
• Available for use
• Storage need not be shared
• No status checking
Peer-to-peer replication
• Publication
• Multiple nodes
• Span across WAN
• Hot standby
• Available for use
• Storage need not be shared
• No status checking

COMPARING HIGH AVAILABILITY SOLUTIONS
Data considerations
Failover clustering
• All instance databases are shared
• Loss limited to last hardened record
• No data copy across network
Geographically dispersed failover cluster
• All instance databases are shared
• Loss limited to last copied storage unit
• Based on technology
Database mirroring
• Configured database is mirrored
• Loss limited to last mirrored transaction
• Synchronous or asynchronous mirroring
Log shipping
• Configured database logs are applied
• Loss limited to last applied transaction log backup
• Transaction log backup, copy and restore
Peer-to-peer replication
• Configured publication is replicated
• Loss limited to last replicated record
• Replication agents replicate changes

COMPARING HIGH AVAILABILITY SOLUTIONS
Network considerations
Failover clustering
• Network connectivity needs to meet heartbeat requirements
• Data is not transferred over network
Geographically dispersed failover cluster
• Nodes can be geographically dispersed
• Based on the storage replication configuration
Database mirroring
• Network latency to meet insert/update/delete (I/U/D) SLAs
• Data is transferred over network
Log shipping
• Network latency to meet RTO
• Transaction log file records transferred over network
Peer-to-peer replication
• Network should meet RTO
• Data changes transferred over network

High Availability Solution
[Diagram slides]
• Database mirroring / log shipping
• Interoperability: database mirroring and log shipping combination
• Peer-to-peer transactional replication

SQL Server 2012: HA What’s new
AlwaysOn: configuring availability at both database & instance level.
• AlwaysOn Availability Groups (AG)
  – Log based data movement without shared disks
  – Zero data loss (in synchronous-commit mode)
  – Automatic & manual failover of a logical group of databases
  – Support for up to 4 secondary replicas
  – Automatic page repair (continuing from SQL Server 2008 R2)
• AlwaysOn Failover Cluster Instances (FCI)
  – Multi-site clustering across subnets
  – Enables cross data-center failover of SQL instances
  – Faster failover for application availability

What’s new: Reduce Downtime
Helps reduce planned downtime! (Comes with a cost)
• Windows Server Core
• Online operations (SQL Server)
• Rolling upgrade & patches (AlwaysOn)
• SQL Server on Hyper-V (benefit of Live Migration) – migrate virtual machines between hosts with zero downtime.
• Easy deployment – Configuration Wizard, Windows PowerShell command-line interface, dashboards, dynamic management views (DMVs), policy-based management, and System Center integration help simplify deployment and management of availability groups.
AlwaysOn: RPO & RTO Capabilities
AlwaysOn Availability Group - synchronous-commit
• Potential Data Loss (RPO): Zero
• Potential Recovery Time (RTO): Seconds
• Automatic Failover: Yes(4)
• Readable Secondaries(1): 0-2
AlwaysOn Availability Group - asynchronous-commit
• Potential Data Loss (RPO): Seconds
• Potential Recovery Time (RTO): Minutes
• Automatic Failover: No
• Readable Secondaries(1): 0-4
AlwaysOn Failover Cluster Instance
• Potential Data Loss (RPO): NA(5)
• Potential Recovery Time (RTO): Seconds-to-minutes
• Automatic Failover: Yes
• Readable Secondaries(1): NA
Database Mirroring(2) - High-safety (sync + witness)
• Potential Data Loss (RPO): Zero
• Potential Recovery Time (RTO): Seconds
• Automatic Failover: Yes
• Readable Secondaries(1): NA
Database Mirroring(2) - High-performance (async)
• Potential Data Loss (RPO): Seconds(6)
• Potential Recovery Time (RTO): Minutes(6)
• Automatic Failover: No
• Readable Secondaries(1): NA
Log Shipping
• Potential Data Loss (RPO): Minutes(6)
• Potential Recovery Time (RTO): Minutes-to-hours(6)
• Automatic Failover: No
• Readable Secondaries(1): Not during a restore
Backup, Copy, Restore(3)
• Potential Data Loss (RPO): Hours(6)
• Potential Recovery Time (RTO): Hours-to-days(6)
• Automatic Failover: No
• Readable Secondaries(1): Not during a restore
Notes
1. An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
2. This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
3. Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
4. Automatic failover of an availability group is not supported to or from a failover cluster instance.
5. The FCI itself doesn’t provide data protection; data loss is dependent upon the storage system implementation.
6. Highly dependent upon the workload, data volume, and failover procedures.
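One way to use the RPO/RTO capabilities above is to shortlist the solutions that can meet a stated target. The sketch below is illustrative only and rests on several assumptions that are not in the deck: the qualitative classes are mapped to ordinal ranks, a range such as "seconds-to-minutes" is scored by its worst case, and a solution whose RPO is NA (the FCI, where data loss depends on the storage system) is excluded from any RPO comparison. The names `Solution`, `SOLUTIONS` and `candidates` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Ordinal ranks for the qualitative classes on the slide (illustrative assumption).
RANK = {"zero": 0, "seconds": 1, "minutes": 2, "hours": 3, "days": 4}


@dataclass(frozen=True)
class Solution:
    name: str
    rpo: Optional[str]        # None where the slide says NA (depends on storage)
    rto: str                  # worst case of a range, e.g. seconds-to-minutes -> minutes
    automatic_failover: bool


SOLUTIONS = [
    Solution("AlwaysOn AG (synchronous-commit)", "zero", "seconds", True),
    Solution("AlwaysOn AG (asynchronous-commit)", "seconds", "minutes", False),
    Solution("AlwaysOn FCI", None, "minutes", True),
    Solution("Database mirroring (high-safety)", "zero", "seconds", True),
    Solution("Database mirroring (high-performance)", "seconds", "minutes", False),
    Solution("Log shipping", "minutes", "hours", False),
    Solution("Backup, copy, restore", "hours", "days", False),
]


def candidates(max_rpo: str, max_rto: str,
               need_automatic_failover: bool = False) -> List[str]:
    """Return the names of solutions whose RPO and RTO classes fit the targets."""
    result = []
    for s in SOLUTIONS:
        # Skip solutions with no RPO figure of their own, or an RPO that is too loose.
        if s.rpo is None or RANK[s.rpo] > RANK[max_rpo]:
            continue
        if RANK[s.rto] > RANK[max_rto]:
            continue
        if need_automatic_failover and not s.automatic_failover:
            continue
        result.append(s.name)
    return result


if __name__ == "__main__":
    print(candidates("zero", "seconds"))
    print(candidates("minutes", "hours"))
```

Note 6 on the slide still applies: the classes depend heavily on workload, data volume and failover procedures, so a shortlist like this is a starting point for testing rather than a guarantee.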
AlwaysOn Layers of Protection
Provides fault tolerance and disaster recovery across several logical and physical layers of infrastructure and application components
• Infrastructure level
  – Windows Server Failover Clustering (WSFC): server-level fault tolerance & intra-node network
• SQL Server instance level
  – FCI: attached to symmetric shared storage
• Database level
  – AG – Availability Groups: one primary replica & up to 4 secondary replicas
  – Each replica is hosted by an instance (FCI or non-FCI) on a different node of the WSFC
• Client connectivity
  – Clients connect directly to a SQL Server instance network name, or
  – they may connect to a virtual network name (VNN) that is bound to an availability group listener
  – Logical redirection to the appropriate SQL Server instance and database replica

AlwaysOn: Storage Considerations
Direct-attached vs. remote
• HBA
• SAN – iSCSI or Fibre Channel
• SMB (Server Message Block)
Symmetric or asymmetric
• Storage devices are considered symmetric
• SSDs are good
Dedicated vs. shared
• Dedicated storage is reserved for use by, and assigned to, a single node in the cluster
• Shared storage is accessible to multiple nodes in the cluster
• WSFC supports cluster shared volumes – file sharing
• SQL Server does not support deployment to a cluster shared volume

Availability Improvements
• Flexible Failover Policy – sp_server_diagnostics uses FailureConditionLevel
  – Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx)
• Enhanced logging and instrumentation – specific system configuration views, DMVs, performance counters, and an extended event health session
  – AlwaysOn Availability Groups Dynamic Management Views and Functions (http://msdn.microsoft.com/en-us/library/ff877943(SQL.110).aspx)
  – sys.dm_os_cluster_nodes (http://msdn.microsoft.com/en-us/library/ms187341(SQL.110).aspx)
• SMB file share support
  – SQL Databases on File Shares - It's time to reconsider the scenario (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sql-databases-on-file-shares-it-s-time-to-reconsider-the-scenario.aspx)

Using storage replication technologies effectively
• Test the solution prior to deployment
  – Test database failover and ensure the databases can be brought online every single time
  – Test the entire solution to ensure that the required operational processes and documents are in place
• Understand the performance impact of the solution implemented
  – Synchronous replication can reduce RPO to zero but impacts performance based on the network latency
  – Asynchronous replication can reduce the performance impact, but increases RPO
  – Benchmark the solution performance prior to deployment
• Implement vendor specified best practices
• Data growth
  – Understand the impact of dynamically increasing the size of the LUNs, if available
• Follow SQL Server best practices
  – Keep each database's data and log files on their own devices
  – Avoid replicating TempDB
  – A simple physical layout for user databases helps simplify maintenance

Any Questions?
