* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Partial Database Availability
Survey
Document related concepts
Tandem Computers wikipedia , lookup
Serializability wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Oracle Database wikipedia , lookup
Team Foundation Server wikipedia , lookup
Microsoft Access wikipedia , lookup
Functional Database Model wikipedia , lookup
Ingres (database) wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Concurrency control wikipedia , lookup
Relational model wikipedia , lookup
ContactPoint wikipedia , lookup
Database model wikipedia , lookup
Transcript
Building Highly Available Systems with SQL Server™ 2005 Keith Burns Architect Evangelist Microsoft Ltd Availability What does it mean to you? Why not? Can your customers get done, what they need to get done, when they need to do it? Site is unavailable System is unavailable Database is unavailable Database is partially unavailable Table is unavailable Data is unavailable 24x7x365 Barriers to Availability Isolated Failures Concurrency Issues Catastrophic Failures Barriers to Availability Isolated Failures Continuing to work with isolated failures Limiting the scope of failure Partial Database Availability Online Piecemeal Restore Supporting Technology Instant File Initialization How do they work? What happens when… Disks Fail In SQL Server™ 2000 Database is marked suspect Users are unable to access the database In SQL Server™ 2005 Filegroup is marked offline Users are able to access undamaged data What happens when… Recovery begins In SQL Server™ 2000 Database is in a restoring state Users are unable to access the database File needs to be recreated and zero initialized File Restore can proceed – offline In SQL Server™ 2005 Filegroup is in a restoring state Users are able to access undamaged data File can be recreated with instant file Initialization Piecemeal Restore can proceed – online How is This Possible? Fine grained operations are based on “functional partitioning” Partitioning – in this sense – does not require Partitioned Tables Partitioned Tables benefit significantly from fine grained operations Partitioning for fine grained operations requires secondary, non-primary data files where data is strategically placed Recovery of your damaged devices can be prioritized and then the database can be brought online in stages Functional Partitioning Strategies to separate Objects/Data Related Object-groupings Separate tables – strategically placed on different filegroups Time-based data placement/partitioning Structures designed for sliding window scenario List-based groupings/partitioning Range-based partitioning based on complete lists To fully leverage Partial Database Availability for partitioned objects – use Partitioned Tables Partitioned Tables – new feature in SQL Server™ 2005 to further simplify the process of building large data warehouses Benefits of Partitioning Speed in managing sliding window Partition manipulation outside of active table Piecemeal Backup Backup active components more frequently, inactive less frequently Partial Database Availability If a filegroup becomes unavailable the undamaged data remains available Online Piecemeal Restore During the restore, the undamaged data remains available Partial Database Availability Improving Availability for Isolated Disaster Undamaged data remains available while damaged data is inaccessible File Status shown in sys.database_files catalog view Page Errors written to suspect_pages table in msdb Agent alerts: Notification of the damaged file Can take the database offline, if desired Can automate the restore, for read-only data Database Components Readonly filegroups Readwrite filegroup TicketSalesD BPrimary File1 File2 2004 File3 2003 File4 2002 File5 2001 Database consists of… Filegroups consist of… Files consist of… Extents consist of… Pages consist of data File Header 0 4 8 12 16 20 24 28 1 5 9 13 17 21 25 29 2 6 10 14 18 22 26 30 3 7 11 15 19 23 27 31 File6 Log extent extent extent extent 0 1 2 3 … Online Piecemeal Restore Improving Availability during Recovery Almost any component (page, file, filegroup) can be restored – ONLINE If a page is damaged – restore only that page from a file, filegroup or database backup If a file is damaged – restore only that file from a file, filegroup or database backup If a filegroup is damaged – restore only that filegroup from a filegroup or database backup Readonly filegroups can be restored without rolling forward log changes Users can access the database during the restore Summary: Isolated Failures Technology Partial Database Availability Instant File Initialization Online Piecemeal Restore Improves When Data Availability Undamaged data/partitions remains Upgrade available Immediate Recovery Time Recover only that which is damaged Database Creation Time Files are not zero-initialized File, Filegroup, and Database Restore Upgrade Missing files are created quickly Immediate Autogrow and Manual Growth time Additional space is quickly added Recovery Time – Less time to create files Data Availability Undamaged data/partitions remains Upgrade available during recovery Immediate Recovery Time Recover only that which is damaged – online Barriers to Availability Isolated Failures Concurrency Issues Catastrophic Failures What happens when… Indexes need to be rebuilt In SQL Server™ 2000 Index rebuilds require an exclusive table-level lock, resulting in offline rebuilds Users are unable to access the table In SQL Server™ 2005 Rebuilds of an index can be performed online if a few simple criteria are met Users are able to access the table Online Index Operations Improving Concurrency during Index Maintenance SQL Server™ 2000 Offline Index Rebuilds; table data is unavailable during operation Rebuild options: DBCC DBREINDEX and CREATE with DROP_EXISTING SQL Server™ 2005 Includes all of the above offline operations, plus… New ALTER INDEX…REBUILD: ONLINE – allows concurrent user access (queries as well as modifications) to the index during rebuild OFFLINE – works using locks (same as SQL Server™ 2000) If online is not possible by default, consider design alternatives to fully leverage online index rebuilds What happens when… Readers and Writers desire the same data In SQL Server™ 2000 Locking is used to guarantee the intended level of isolation Users must wait to access locked data Concurrency and performance compromised Correctness is compromised when lower isolation levels are used to avoid locking In SQL Server™ 2005 Locking OR Versioning can be used to guarantee a variety of isolation levels With versioning, Readers won’t block writers and writers won’t block readers Performance improved if contention was primary bottleneck Correctness is not compromised due to use of lower isolation levels Snapshot Isolation Improving Concurrency in Mixed Workloads SQL Server™ 2000 Isolation implemented solely through locking Mixed workloads may experience: Concurrency problems due to blocking The Inconsistent Analysis problem SQL Server™ 2005 Isolation implemented using locking and versioning Mixed workloads can improve read consistency and performance using: Read committed with Statement-level snapshot to improve statement-level consistency Snapshot Isolation to improve transaction-level consistency Summary: Concurrency Requirements Technology Improves When Online Index Operations Table concurrency tables being rebuilt remain available Downtime due to Maintenance no longer required for majority of indexes Minimal Work to Leverage Design and Architect Snapshot Isolation Statement-level Snapshot Snapshot Isolation Transaction-level Snapshot for read-consistency Transaction-level Snapshot for Update Conflict Resolution Row concurrency locked rows prior and consistent version remain accessible Accuracy long running aggregates/statements use consistent version from statement start Analysis/Query Time Queries do not wait! Row concurrency locked rows prior and consistent version remain accessible Accuracy long running aggregates/statements use consistent version from transaction start Analysis/Query Time Queries do not wait! Minimal Work to Leverage Minimal Work to Leverage Design and Architect Barriers to Availability Isolated Failures Concurrency Issues Catastrophic Failures Failover Clustering Server-level Redundancy Established High Availability Technology Hot Standby: Automatic Detection and Automatic Failover No work loss exposure and no direct impact to workload Protects against node failures Geographically Dispersed Failover Clusters with approved hardware Recovery on failover improved by Fast Recovery Failover Cluster Failover Clustering New for SQL Server™ 2005 Faster Failover through Fast Recovery Supports up to an 8-node Failover Cluster with Enterprise Edition Supports up to a 2-node Failover Cluster with Standard Edition Supports mounted volumes for better explicit disk usage – helps in server consolidation Supports dynamic AWE for better memory utilization Unattended setup All SQL Server data services participate Database Engine, SQL Server Agent, Full-Text Search Analysis Services – Now has multiple instances Fast Recovery Improving Availability by Reducing Downtime Not only beneficial to Failover Clustering On every server startup, Restart Recovery runs to guarantee consistency Restart Recovery has two phases: REDO: rolls forward committed transactions UNDO: rolls back any incomplete transactions In SQL Server™ 2005, users are allowed access after REDO SQL Server™ 2005 Redo ONLINE SQL Server™ 2000 Undo ONLINE Database Mirroring Database-level Redundancy Upcoming High Availability Technology Released for testing and prototyping in SQL Server™ 2005 RTM Certified for Production Use in the first half of 2006 Supports three configurations: High Availability High Protection High Performance Database Mirroring Technology Overview Principal Database handles user activity Mirror Database receives changes via secure, dedicated TCP channel Server does NOT require a license if the server acts solely for redundancy Optional Witness Server Lightweight mechanism to help provide quorum Can run on any SQL Server Edition Supports three configurations: High Availability High Protection High Performance Database Mirroring Basic Principal of Synchronous Mirroring Acknowledge Acknowledge Commit Constantly Redoing on Mirror Transmit to Mirror Write to Local Log DB Committed in Log Log Write to Remote Log Log DB Database Mirroring Configuration Summary High Availability High Protection High Performance Automatic Detection No Automatic Detection No Automatic Detection Automatic Failover Manual Failover Manual Failover Uses synchronous form of mirroring Uses synchronous form of mirroring Uses asynchronous form of mirroring Requires Witness Does not require Witness Does not require Witness Principal performance is Principal Principal affected by performance is performance is speed affected within by affected by network Mirror database is available secondsNOT of failover distance network speed and network speed and and Mirror database is available for read-only analysis through distance distance the use of Database Snapshots Database Scale Out Peer to Peer Replication Identical databases continuously synchronize in near real time Scale query workloads beyond what’s possible with a single database Example: Distributed Trading System Chicago London Tokyo Availability through Scalability Peer to Peer Replication Enables load-balancing and improved availability through scalability Database failures shouldn’t bring down the application system Database upgrades should be done without outages Individual databases can be taken online/offline and maintained without application downtime Warm Standby Small possibility of some data loss on failure Peer-to-Peer Replication Based on Established Transaction Replication Technology Based on Bi-directional Transactional Replication All participants are peers Schema is identical on all sites Publish the updates made on “their” data Subscribe to others to pick up their changes No hierarchy as in “normal” transactional replication A given set of data can be updated at only one site at a time Data “ownership” is purely logical; does not prevent conflicts SQL Server prevents a change from round-tripping Peer to Peer Topology London Chicago Logreader Agent Logreader Agent Peer to Peer Transactional Replication Dist DB Dist DB Distribution Agent Tokyo Logreader Agent Dist DB Distribution Agent Distribution Agent Log Shipping Database-level Redundancy Established High Availability Technology Supports multiple secondary servers Secondary for Failover Secondary for Reporting Secondary with delay for Human Error Recovery Can be combined with other technologies such as Failover Clustering and Database Mirroring New for SQL Server™ 2005 Integration in SQL Server Management Studio Log Shipping is not delayed during Database or Differential Backups Barriers to Availability Many more barriers than discussed Only some are addressable by database technology Be sure to consider people, planning, procedures and training Microsoft SQL Server™ 2005 gives you greatly improved tools to overcome these barriers to availability: Database Server Failure or Disaster Isolated Disk Failure Data Access Concurrency Limitations Database Maintenance and Operations Availability at Scale User or Application Error Summary SQL Server™ 2005 offers greater availability – immediately Many technologies available just by upgrading! Some architected/implemented over time SQL Server™ 2005 is more Available Partially damaged databases remain available Databases being recovered remain available Instant File Initialization, Fast Recovery New and Improved Replication Alternatives SQL Server™ 2005 is more Robust Take Advantage When? How much work to leverage the technology? Partial Database Availability Online Piecemeal Restore Instant File Initialization Online Index Operations Snapshot Isolation Statement-level Snapshot Failover Clustering Database Mirroring Log Shipping Database Snapshots Snapshot Isolation With Update Conflict Detection Snapshot Isolation Transaction-level Snapshot (RO) Online Index Operations When Criteria NOT Met (minority) When Criteria Met Fast Recovery Design and Architect Minimal Work to Leverage Upgrade Immediate Replication Improving Availability from Installation to Design Availability in Layers to minimize downtime and data loss © 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.