Download High Availability with Microsoft SQL Server

MICROSOFT SQL SERVER HIGH AVAILABILITY AND DISASTER RECOVERY Michael Poremba // October 2008 Database HA & DR Experience… 2  Work with business to determine HA or DR requirements for applications and data?  Design HA or DR solutions?  Administer HA or DR process?  Still learning MS SQL Server HA & DR capabilities? Scope of this Presentation 3 Presentation Focus  Data Availability Data recovery  High availability  Disaster recovery   Technology Focus MS SQL Server  Physical servers  SANs  Beyond Scope of Presentation          In-depth how-to (available elsewhere) Partitioned views (federated) Advanced DBA techniques Custom application logic 3rd-party software solutions Alternate DBMS engines (e.g. Oracle; DB2) HA on virtual machines Complex scenarios & solutions Load balancing 4 Introduction to Data Availability So, you need to make your production database bulletproof… Data Availability Continuum 5 Degrees of protection for information systems: Business Risk Data loss Solution Redundant data High Availability Downtime of database service Redundant system components Disaster Recovery Downtime of business operations Redundant systems and facilities Data Recovery Business Case for Availability 6 High Availability   Keep business-critical applications available Secondary:  Server maintenance Disaster Recovery   Protect against loss of data center Secondary:  Application upgrades  Infrastructure upgrades Service Level Agreement (SLA) 7     Permitted downtime (planned vs. unplanned?) Uptime SLA Downtime per Year Downtime per Month 99.9% 8.76 hours 43.8 minutes 99.99% 52.6 minutes 4.38 minutes 99.999% 5.26 minutes 0.438 minutes Acceptable data/transaction loss Application response times Mean time to recovery Note: Database uptime is not equivalent to application availability   Failures of other application services Network outages Protect What? 8  Application data stores Databases  Files  Other data repositories   Database services   DBMS availability for applications Application services  Application availability for users and external systems Databases are the heart of most information systems; they deserve the highest affordable protection. Database Failure Scenarios 9 Physical Infrastructure Failures  Storage subsystem Logical Data Failures   Disk  DBMS interruption  Drops / deletes  Controller    Network Server Power Operator errors    Application defects DBMS defects Data corruption Service Recovery Strategies 10 Standby Mode Cold standby Warm standby Failover Behavior SQL Server Feature • Manual intervention required • Backup and restore to restore offline data copy • Data copy online and ready • Transaction log • Manual failover required shipping • Database mirroring Hot • Automatic failover • Database mirroring standby • Failover clustering Data Recovery—Terminology 11 Terminology varies for source vs. copy High Availability Strategy Data Source Data Copy Backup and Restore Database Backup Log Shipping Primary Secondary Standby Database Mirroring Principal Mirror Failover Clustering Primary Active Secondary Passive Standby Inactive 12 Data Recovery [Briefly…] Database Backups 13  Traditional backup types Full backup  Differential backup  Transaction log backup   Disk is better than tape First backup to disk (separate physical disk volume)  Detect exceptions encountered during backup  Verify backup files  Copy backup files to tape or remote disk   Data retention policy for backup files Database Backup Strategy 14 Backup of user databases not sufficient for recovery  System database  Master database  MSDB database  Model database  External data stores… Synch with External Data Stores 15 Synchronize recovered database with external data stores:  Identity column seeds  Full-text indexes (SQL Server 2000)    LDAP entries File system objects Other databases Backup Retention Policy 16    Location of backup files Duration of retention Protection of sensitive data  Sarbanes/Oxley (SOX)  HIPAA  Internal  policies for data management and protection Access to backups from offsite data storage Data Recovery Process 17  Backup file sets    Full baseline, differential, and transaction logs Recovery strategy depends on failure scenario  Retrieving backup files     Offsite storage Tape Network copy Dependency on multiple people to get access to backup files      Create comprehensive failure matrix Devise recovery strategy for each scenario Does worst-case recovery scenario fit within SLA parameters? Recovery time; SLA Include future data growth in recovery plan Fully test recovery strategies—practice is essential 18 High Availability High Availability 19  Minimize or avoid service downtime  Whether  planned or unplanned When components fail, service interruption is brief or non-existent  Automatic  failover Eliminate single points of failure (as affordable)  Redundant components  Fault-tolerant servers Redundant Components 20 Objective: Avoid single points of failure (where affordable) Approach: Use redundant components for database service  Database server nodes  Server components      DBMS instance User databases Storage devices Storage unit components    MPIO: Interfaces; paths; switches; controllers RAID: Disks Networking   ECC RAM; failure-tolerant HW & OS MPIO: Interfaces; paths; switches Data copies  E.g. Recovering torn page from mirror in SQL Server 2008 Transaction Log Shipping 21   Warm standby solution Duplicate user database  Copy  transaction logs to standby server & restore Database available for read-only access  Users must disconnect for logs to be applied  Two database licenses required if querying standby    Manual application failover Supported on standard hardware Possible data loss (unapplied transactions) Database Mirroring 22  Redundancy at user database level           Requires witness server Mirror-aware application client connection  node A node B High-availability: commit @ log on mirror; automatic failover High-protection: commit @ log on mirror; manual failover High-performance: commit when logged on principal Very fast automatic failover—seconds   Mirror always redoing transactions from principal Negligible impact on transaction throughput Multiple mirroring modes:   witness (optional) Mirrored over private network channel   Duplicate copy of user database Independent storage devices Multiple copies of instance databases Provided by client library Database connection string must specify both servers Mirror may be available for read-only access (snapshots) Works with standard hardware Local Storage · local sys DBs · source user DB Local Storage · local sys DBs · mirror user DB Mirror Witness 23   With mirroring, more than one server is required to decide on failover Witness automates failover from primary to mirror Watches database availability  Reports observations back to principal and mirror     Runs in separate SQL Server instance (Express is OK) Prevents “split brain” scenario Very low resource consumption   Can be witness for multiple databases Not a single point of failure SQL Server Failover Clustering 24  Two clustered nodes  Active/Passive  MS SQL services  Running  config on virtual server Shared storage device  User databases  System databases  Quorum drive  Redundant internal components node A node B Shared Storage · system DBs · user DBs · quorum Active/Passive Failover Clustering 25  Redundancy at database instance level    Single data copy on shared storage device      SQL Agent; Analysis Services; Full-Text engine, MS DTC Automatic failover (up to minutes) DBMS accessed over virtual IP Database not available from inactive node for DB client connections   No I/O overhead reducing throughput Storage unit is single point of failure for cluster node A node B All database services are clustered   All databases fail over together Shared copy of system databases Storage is controlled by one cluster node at a time Requires hardware certified by Microsoft for Microsoft Cluster Service Shared Storage · system DBs · user DBs · quorum HA Comparison 26 Database Mirroring          Scope: user DB Standard hardware One SQL license (unless querying snapshots on mirror) Very fast failover (seconds) OS flexible (e.g. 32/64) Independent storage Independent services Reporting on mirror Geographic separation OK Failover Clustering          Scope: DBMS instance Certified hardware One SQL license (only one node can access database) Automatic failover (up to minutes) Enterprise OS Shared storage Clustered services Standby not available Servers are usually co-located Considerations for HA 27  HA complements backup and recovery strategy   Does not replace data recovery plan Application service availability is often determined by a network of interdependent services Availability can be difficult to define (e.g. partial failures)  Failure probability difficult to measure or compute   Increased system complexity could lead to lower service availability! Operator error a leading cause of availability issues  Increased number/types of system components  More complex to configure and administer  Data Recovery Requirements Requirements Backup and Recovery Log Shipping DB Mirroring – High-Performance DB Mirroring – High-Protection DB Mirroring – High-Availability Failover Clustering 28 Cost Low Low/Med Medium Medium Medium High Relative complexity Low Low Medium Medium High High Data loss Possible Latest log Possible None None None Scope of duplication Database Database Database Database Database DBMS Failover Downtime Downtime Manual Manual Seconds Up to minutes Client redirect Manual Manual Automatic Automatic Automatic Automatic Rolling upgrades & maint. No No OS & DB OS & DB OS & DB OS Access data on secondary Restore Read-only Snapshot Snapshot Snapshot No Geographic separation OK OK OK Latency? Latency? Latency? 29 Disaster Recovery Disaster Recovery 30  Minimize downtime of business operations  Redundant  systems and facilities SQL Server features:  Transaction log shipping  Database mirroring  Failover clustering  Other technologies  Storage-based mirroring Disaster Recovery Planning 31        Data security requirements Clarify SLA, data loss allowance Evaluate system cost vs. data protection Failure analysis System redundancy Process validation Training for personnel Prevention practices  Executing disaster recovery and business continuity   Practice, practice, practice Business Continuity Facility 32  System redundancy  Systems: Web servers app servers; database, etc.  Data: Databases; data files on OS; security info, etc.  Networking: Domain, routing, subnet, VIPs, etc.  Alternate facilities  Network bandwidth  Physical or network access by operations staff  Failover  Often a deliberate decision, using manual failover Data Redundancy 33  Synchronous redundancy Network bandwidth cost  Network latency and application performance  Network reliability   Asynchronous redundancy Risk of data loss  More cost-effective  Resilient to network latency issues   Candidate Technologies SQL Server database mirroring  Failover clustering with SAN-based mirroring  DR Using Database Mirroring 34    Two sites: Primary and DR location Separate failover clusters at each site SQL Server database mirroring between sites witness (optional) failover cluster at site A node A1 failover cluster at site B node A2 Shared Storage A · local sys DBs · local quorum · source user DB database mirroring node B1 node B2 Shared Storage B · local sys DBs · local quorum · mirror user DB DR Using SAN-Based Mirroring 35     Two sites: Primary and DR location Four-node failover cluster; one virtual IP address SAN-based mirroring between sites Manual cluster failover failover cluster nodes at site A node A1 failover cluster nodes at site B node A2 Shared Storage A · system DBs · quorum · user DBs node B1 storagebased mirroring node B2 Shared Storage B · system DBs · quorum · user DBs 36 Complimentary Technologies [Skip if time is running short.] SAN-Based Data Mirroring 37  Data blocks duplicated at storage level   Copy performed in sequence and coordinated with database checkpoint     Ensures consistency of mirrored data files Synchronous or asynchronous mirroring Co-located or geographically dispersed—both are OK   Similar to transaction log shipping SAN link bandwidth must support database I/O rate May require extra feature support from SAN vendor Could rely on Failover Clustering for HA SQL Server Database Snapshots 38   Read-only point-in-time database snapshot No data is copied—instantaneous  Historical snapshot pages tracked separately from changing pages  Snapshots can be maintained indefinitely  Limited  only by available storage Snapshot copy can be used for reporting  Read-only, so no locking issues SQL Server Replication 39  Transactional replication     Merge replication    High transaction volume Low data latency required Mixed technologies: Integrates with other DBMS Bi-directional data changes Typically server-to-client Snapshot replication    Large, infrequent data changes Data change latency OK Best for smaller data sets     Subscriber databases available for reporting Replicate data subsets Some data loss is possible Periodically validate replicated data 40 App Development and Admin Considerations for App Developers 41             App services tolerant to database service interruptions Application transactions must be handled in code—data consistency Exception handling for transaction retry, connection recovery Requires coding standards, code reviews, and testing Bulk data operations Transaction volume impacts rollback time during failover Batch jobs must be run on alternate nodes Don’t bypass transaction logging Synchronization with external data sources? Be aware of database recovery model Mirroring uses FailoverPartner in connection string Use TCP/IP as client protocol Considerations for Admins 42   Use identical server hardware, when possible Design network redundancies, when feasible     Always manage through virtual cluster, not individual cluster nodes Retest failover/failback after HA maintenance Diagnose after failover      Consider network latency for geographic separation Repair alternate node Resynchronize data, as necessary Be aware of primary/secondary locations Ensure application services are connected and functioning properly Keep server node configurations synchronized:    Service pack and patch levels Duplicate non-redundant resources Jobs; logins and permissions; OS & sys objects HA Risks 43       System performance degradation HA system complexity leads to availability issues Some system failures not planned for Backup and recovery planning incomplete Administrators not fully trained or informed User databases not synchronized with other data sources Common Admin Use Cases 44  Maintain HA nodes  Hardware maintenance  Rolling upgrades and software patches  Resynchronize the redundant copy  Re-synch mirror  Restart log shipping  Diagnose and repair  Diagnose cause of failover  Repair failed node and restore failover capabilities  Test failover and failback Common Admin Actions 45 Train and practice administrators to:  Initiate a database mirror  Manually failover mirror database or cluster node  Add/remove passive node from mirror or cluster  Upgrade/patch servers nodes  Restart or redirect application services 46 More Information References—Books 47 High Availability    Microsoft SQL Server 2008 High Availability with Clustering & Database Mirroring by Michael Otey, 2009. Microsoft SQL Server High Availability by Paul Bertucci, 2004. Pro SQL Server 2005 High Availability by Allan Hirt, 2007. Related Topics    Pro SQL Server 2005 Replication by Sujoy Paul, 2006. Pro SQL Server 2005 Service Broker by Klaus Aschenbrenner, 2007. The Rational Guide to SQL Server 2005 Service Broker by Roger Wolter, 2006. References—Presentations 48        Microsoft Load Balancing and Clustering http://ce.sharif.edu/courses/84-85/2/ce317/resources/root/lecture%20slides/ 14.%20Microsoft%20Load%20Balancing%20and%20Clustering.ppt SQL Server 2005 High Availability http://www.atlantamdf.com/Presentations/AtlantaMDF_111207HA.ppt High Availability Technologies In SQL Server 2000 And SQL Server 2005 http://202.181.238.2/hk/teched2004/ppt/Day_2_Rm407/DAT431(1330-1445).ppt Meeting the Availability Challenge http://download.microsoft.com/download/E/D/C/EDCF54DB-19CD-4882-9FC44F7D46FCEAA6/HighAvailability.ppt Disaster Recovery Mistakes http://www.sqlsig.org/Oct%2011%20DASSUG%20-%20Jason%20Hall%2010-11-07%20MM.ppt SQL Server 2005 High Availability http://blogs.msdn.com/sql2005event/attachment/564303.ashx Effective Usage of SQL Server 2005 Database Mirroring http://www.sqlserver-qa.net/SSQAEffective%20Usage%20of%20SQL%20Server%202005%20Database%20Mirroring_show.ppt References—Articles 49  Achieve High Availability for SQL Server http://technet.microsoft.com/en-us/magazine/cc162477.aspx  Geographically Dispersed Clusters in Windows Server 2003 http://www.microsoft.com/windowsserver2003/techinfo/overview/clustergeo.mspx  Restoring file and filegroup backups http://support.microsoft.com/kb/281122/en-us  Restoring specific tables or rows from backups http://support.microsoft.com/kb/321836/en-us  Maintaining Availability During Upgrades http://msdn.microsoft.com/en-us/library/ms191449.aspx

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download High Availability with Microsoft SQL Server