* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Database Mirroring
Serializability wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Tandem Computers wikipedia , lookup
Microsoft Access wikipedia , lookup
Team Foundation Server wikipedia , lookup
Oracle Database wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Functional Database Model wikipedia , lookup
Ingres (database) wikipedia , lookup
Concurrency control wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Versant Object Database wikipedia , lookup
Relational model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Database model wikipedia , lookup
Meeting the Availability Challenge Don Vilen Program Manager SQL Server Microsoft Corporation Causes of Data Unavailability What part of your database system causes the most data unavailability? Hardware? SQL Server and Windows? Application software? Your people? People, Planning, and Procedures In several studies the major cause of data unavailability is people Operators and DBAs Users SQL Server 2005 has many enhancements to improve availability, but not all problems can be solved by DBMS technology Training your people, planning your operations, and documenting your procedures can have the greatest impact on availability Availability Enhancements Database Snapshots Database Mirroring Partial Database Availability Table and Index Partitioning Row-level Versioning Snapshot Isolation Read Committed Snapshot Isolation Failover Clustering enhancements Fast Database Recovery Peer-to-Peer Replication Scaleable Shared Database Checksum on Data Pages Instant File Initialization Backup/Restore Checksum on Backups Backup Media Mirroring Restore Read-Only Filegroups without Log Page-level Restore Piecemeal Restore Data Backups don’t block Log Backups Dedicated Admin Connection (DAC) Dynamic Configuration CPU Affinity, AWE memory Attach/Detach Areas of Availability Operator, DBA, and User Error High Availability and the SQL Server process Availability inside the SQL Server process Operator, DBA, User Error People are the cause of most unavailability Protect – Permissions: Restrict as necessary Prevent / Avoid – Triggers on DDL and DML can rollback an operation even if they do have permission Detect – Triggers and Events can be used to detect and record who, what, when Delay – Log Shipping can delay the error from reaching the standby server Recover – Database Snapshot can save a copy of a database at a point in time Can RESTORE from the Snapshot to recover Database Snapshot Read-only Snapshot of an entire database at a point in time Created instantly; no data is copied Snapshot must be created before the error Base database continues to change Database Snapshot does not restrict the base database Multiple Snapshots are allowed Database Snapshots can exist forever Constrained by resources Database Snapshot Syntax Examples To create a database snapshot CREATE DATABASE mydbSnap0600 ON (<filelist>) AS SNAPSHOT OF mydb To drop a database snapshot DROP DATABASE mydbSnap0600 To restore a database to a snapshot RESTORE DATABASE mydb FROM DATABASE_SNAPSHOT = 'mydbsnap0600' Database Snapshot How it really works CREATE DATABASE mydbSnap AS SNAPSHOT OF mydb USE mydb UPDATE (pages 4, 9, 10) mydb – Database Page 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 mydbSnap – Read-Only Database Snapshot USE mydbSnap SELECT (pages 4, 6, 9, 10, 14) Areas of Availability Operator, DBA, and User Error High Availability and the SQL Server process Availability inside the SQL Server process High Availability and the SQL Server Process Instance Maintenance Instance and Database Startup Ensuring High Availability for the SQL Server Process Instance Maintenance Install and Upgrade More reliable Install / Setup process Faster Upgrade Minimal downtime during upgrade System Stored Procedures are in the ‘resource database’ Read-only, invisible/transparent to users Allows for fast, side-by-side upgrades No scripts to run to upgrade system procs master is upgraded during the Upgrade process Other databases are upgraded as part of database startup – once master is complete Upgrade also occurs: When a down-level database is attached When a down-level database is restored Instance Maintenance Configuration More dynamic configuration No instance restart required to change: AWE Memory configuration Hot-add memory CPU Affinity configuration Surface Area Configuration Manager Instance and Database Startup Faster Instance Startup Memory, even AWE memory, is allocated as needed Faster Database Startup Fast Recovery User databases start up in parallel Fast Recovery on Database Restart SQL Server 2000 Database is available after Undo completes Redo Undo Time Available SQL Server 2005, Enterprise Edition Database is available when Undo begins Redo Undo Available Ensuring High Availability for the SQL Server Process Failover Clustering Database Mirroring Peer-to-Peer Transactional Replication Others Log Shipping Backup/Restore and Attach/Detach 3rd Party Solutions Geographically-Dipersed Clusters Hot Standby Failover Solutions Failover Clustering and Database Mirroring Both Provide Failover Cluster Automatic detection Automatic, fast failover Manual failover Transparent client redirect Zero work loss Database Mirroring Failover Cluster Failover Clustering * Inst1 Hot Standby – Automatic failover Built on Microsoft Server Clusters (MSCS) Multiple nodes provide availability, transparent to client Maximum of 2, 4, or 8 nodes depending on OS edition Automatic detection and failover Requires certified hardware; see Windows Catalog: Clustered Supports many scenarios: Multiple Active Instances, N+1, N+I Multiple Active Instances * Inst1 Inst3 * Inst2 * N+1: N Active, 1 Inactive Instances * Inst1 Inst2 * N+I: N Active, I Inactive Instances Failover Cluster Failover Clustering Hot Standby Instance * Inst1 Zero work loss, zero impact on throughput Instance Failover – entire instance works as a unit Single copy of instance databases Available since SQL Server 7.0 Setup is performed at install time Clients connect to ‘virtual’ server-name / IP Failover Cluster Failover Clustering SQL Server 2005 More nodes Match operating system limits Two-node Failover Clustering is available in SQL Server 2005 Standard Edition Unattended setup Support for mounted volumes (Mount Points) Full support for Majority Node Set quorum Full-Text Search and Analysis Services are now cluster aware Database Engine, Agent have been since 7.0 Analysis Services now has multiple instances Database Mirroring New for SQL Server 2005 Hot Standby Database Database Failover Very fast failover Just seconds in most cases Zero data loss Automatic or manual failover Automatic re-sync after failover Automatic, transparent client redirect Database Mirroring Database Mirroring Database Mirroring Hardware Works with standard computers, storage, and networks No shared storage components, virtually no distance limitations Impact to transaction throughput Zero to minimal, depending on environment / workload / network Database Mirroring Fault Tolerant Virtual Database Clients Witness Principal Mirror Witness Witness and Quorum Sole purpose of the Witness is to provide automatic failover To survive the loss of one server you must have at least three Prevents “split brain” Does a lost connection mean the partner is down or is the network down? To become the Principal, a server must talk to at least one other server The Witness does not direct the Mirror to become Principal It simply answers the question “Who do you see?” Witness Witness Witness is an instance of SQL Server 2005 Perhaps even SQL Server Express Can be witness for multiple sessions Consumes very little resources Not a single point of failure Partners can form quorum on their own Database Mirroring How it works Application Mirror is always redoing – it remains current Witness Commit Principal Mirror 1 5 2 SQL Server 2 Log >2 Data SQL Server 4 3 Log >3 Data Safety / Performance There is a trade-off between performance and safety Database Mirroring has two safety levels FULL – commit when logged on Mirror Allows automatic failover No data loss OFF – commit when logged on Principal System does its best to keep up Prevents failover; to make mirror available Must ‘force’ service Or terminate Database Mirroring session Database Mirroring Modes High-Availability Mode Safety Full; Synchronous operation Database is available whenever a quorum exists Automatic failover High-Protection Mode Safety Full; Synchronous operation No witness Principal continues servicing the database even if it loses connection to mirror Manual failover only; no automatic failover A transition mode; should not be in this mode for long High-Performance Mode Safety Off; Asynchronous operation Manual failover only Supports only one form of role switching: forced service (with possible data loss) Transparent Client Redirect No changes to application code Client automatically redirected if session is dropped Client library is aware of Principal and Mirror servers Upon initial connect to Principal, library caches Mirror name When client attempts to reconnect If Principal is available, connects If not, client library automatically redirects connection to Mirror Peer-to-Peer Transactional Replication Peer-to-Peer Transactional Replication New in SQL Server 2005 Transactional replication but .. All participants are peers Schema is identical on all sites Publish the updates made on ‘their’ data Subscribe to others to pick up their changes No hierarchy as in ‘normal’ transactional replication A given set of data can be updated at only one site at a time Data ‘ownership’ is purely logical; does not prevent conflicts SQL Server prevents a change from round-tripping Enables load-balancing and high availability Warm / hot standby Small possibility of data loss on failure Scalable Shared Database Scalable Shared Database New in SQL Server 2005 EE Multiple instances sharing the ReadOnly same database files Volume must be Read-only Database access is Read-only Enables load-balancing and scalability Some availability increase but it depends on shared disk system Data files can be mirrors periodically split off from an updateable database Third-party functionality Warm Standby Solutions Replication Replication and Log Shipping Both Provide Multiple copies and Manual failover Replication – since SQL Server 6.0 Primarily used where availability is required in conjunction with scale out of read activity Failover possible; a custom solution Not limited to entire database; Can define subset of source database or tables Copy of database is continuously accessible for read activity Latency between source and copy can be as low as seconds Log Shipping Basic idea: Backup, Copy, Restore Log will always be supported But no more investment in the scripts Database scope Database accessible but read-only Users must exit for next log to be applied Data backups no longer block log backups Log Shipping Cold Standby Solutions Backup / Restore Backup / Restore and Detach / Copy / Attach Both Provide Manual detection and failover Potential for some work loss Whole-database scope Standard servers Limited reporting on standby Duplicate copy of database Client must know where to re-connect Slowest failover – Most downtime Backup / Restore Smaller size – only used pages are copied Log backups allow restore to point in time Longer restore time Detach / Copy / Attach Copies entire files No possibility of rolling forward subsequent logs Detach / Copy / Attach Third-Party Availability Solutions Many solutions by many companies with many different names Most replicate data and log changes by one of these methods Looking into SQL Server’s log files and translating log records into logical operations Intercepting IOs and duplicating them, either to another disk set or simply recording them Replicating the writes at the storage level Local copies / mirrors for snapshot backups, etc. Remote copies / mirrors for disaster recovery and Geographically-Dispersed Clusters Geographically-Dispersed Clusters and Storage-level Replication Geographical Failover Cluster Solutions from many third party vendors replicate the local I/Os on a remote system Hardware and software methods Synchronous and asynchronous solutions Solutions must meet Core I/O Requirements Many solutions use MSCS to provide automatic failover See Windows Catalog, Geographically-dispersed Clusters Other solutions are primarily to duplicate the data at a remote site, often with an independent SQL Server Similar to Log Shipping or Detach / Attach Complementary Technologies Failover Solution + Replication Example Publisher Fault-Tolerant Publisher and Distributor Distributor Subscribers Combining HA Technologies Principal Server can be a Failover Cluster Failover to mirror will occur before failover within the cluster So Principal will come back up as the Mirror Mirror can be a Failover Cluster as well Failover Cluster Principal Failover Cluster Mirror Comparing HA Technologies How Do You Compare the Alternatives? There are many dimensions to consider when evaluating an HA / business continuance solution Detection: Automatic or Manual Failover: Automatic or Manual Time to Fail Over Number of Failures it can survive Data Currency / Loss Cost of redundant system(s) Additional hardware Additional management Granularity of Data Safety: Instance, Database, Table, Row Complexity Data Consistency Transparency to Clients Privileges Required to Setup Remote DR Site Impact on Performance There is no “one size fits all” Everyone’s requirements are different Areas of Availability Operator, DBA, and User Error High Availability and the SQL Server process Availability inside the SQL Server process Availability Inside the SQL Server Process Memory and CPUs I/O and Media/Disks Concurrency, Avoiding Deadlocks Diagnosing an Issue Tuning and Fragmentation DBCC Operations Memory and CPUs Memory Dynamic Configuration applies to AWE memory as well as regular memory AWE allocation is dynamic Checkpoint – more well behaved for large memory systems Sniffer – Checksums periodically evaluated in cache CPUs CPU Affinity is Dynamically Configurable NUMA-awareness Scheduling on a CPU is at the batch level rather than the session level I/O and Media/Disks I/O retries, diagnosis/transparency, and better messages SQL Server’s basic I/O requirements RAID and storage is usually reliable, but what happens if there are issues with: Data Files Log Files Full-text Catalogs Backup Files tempdb Microsoft SQL Server Core I/O Requirements I/O- and Storage-Level Replication See whitepaper SQL Server 2000 I/O Basics on TechNet http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIOb asics.mspx To ensure correct operation the underlying I/O and storage system must meet the Core I/O Requirements Stable Media – once ‘write’ is acknowledged, must be guaranteed Write Ordering –writes must preserve ordering across devices Torn I/O Prevention – writes must be done as a unit; not split Applies to both local storage and any I/O-level or storage-level replication scheme, synchronous or asynchronous Guarantees that what appears at secondary site is an image that actually appeared at some point in time at the primary Asynchronous replication allows potential loss of committed transactions Data Files Database Page Checksums Detect disk I/O errors not reported by the hardware or operating system DBCC Check* now use Database Snapshots for improved scalability Partial Database Availability Database is available if the Primary filegroup is available Fine-Grained Online Repair Online Restore – Database remains online; Only data being restored is offline Piecemeal Restore – Online restore of filegroups by priority Filegroup / File backups allowed on Simple recovery model databases Page-level Restore – Can restore individual pages to repair errors found by page checksum or torn pages Instant File Initialization Skips file zeroing, fast DB create / restore Restore read-only filegroups without applying logs Log Files and Full-text Catalogs Log Files Checksum Full-text Catalogs Backup / Restore includes Full-text data Detach / Attach includes Full-text data Backup Files RESTORE VERIFYONLY now checks everything it can short of writing the data Backup Media Mirroring Extra copies for archival or disaster recovery Backup and Database Page Checksums RESTORE can detect disk I/O errors not reported by the hardware or operating system Can continue past errors – repair later tempdb Scalability Scalability of tempdb has been enhanced Caching of initial pages for temporary tables and table variables Improved allocation page latching protocol Reduced logging overhead for tempdb so there is less I/O bandwidth for tempdb log file More efficient allocation algorithm for pages from ‘mixed’ extents in tempdb tempdb Best Practices We still recommend the following if you see latch contention on tempdb allocation or system catalog pages: Avoid auto grow Pre-allocate space for tempdb files Make as many tempdb files as you have CPUs Account for any affinity mask settings File sizes of equal amounts Increased Concurrency Avoiding Blocking and Deadlocks Database Snapshots Standby / Reporting server with Replication Log Shipping Database Mirroring Scalable Shared Databases – Read-only Row-level versioning and new Snapshot isolation levels Increased Concurrency Row-level Versioning Row-level versioning keeps a copy of each row prior to an update Rather than wait for an update to complete, you can see the data that was committed at the time: Your transaction began: Your statement began: Snapshot Isolation Read-Committed Snapshot Isolation You only see committed data, and it is time-consistent and transactionally consistent But it might not be the most recent, so not appropriate for some OLTP applications If you can work with this snapshot data Readers don’t wait for locks, reducing blocking and deadlocks You have a view of the data at a single point in time Triggers now use row-level versions rather than the log Diagnosing an Issue Diagnosis, Runaway query DMVs Events DDL Triggers Deadlock graphs in XML Dedicated Administration Connection – DAC Tuning and Fragmentation Database Tuning Advisor Understands indexes and partitions Online Index Operations Indexes remain online during reorganize and rebuild Table and Index Partitioning Rows can be placed in filegroups depending on a column value Partial availability, defragmentation can occur at the partition/filegroup level Partitions can be moved among tables for fast load, data movement DBCC Operations Shrink – includes BLOBs Defrag – new ALTER INDEX <xxx> REORGANIZE | REBUILD Check* uses Database Snapshots for improved scalability, performance In all editions! Progress Reporting for Shrink and Defrag Areas of Availability Summary Operator, DBA, and User Error High Availability and the SQL Server process Availability inside the SQL Server process For More Information SQL Server Books Online Whitepapers Failover Clustering Database Mirroring Peer-to-Peer Replication MSDN and TechNet webcasts © 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.