Storage Network Designs for OLTP Business Continuity
Marc Farley
President, Building Storage Networks, Inc.

Agenda
The Vendor Neutral Approach
Overview of OLTP & High Availability
I/O Redundancy Methods
Storage Network Technologies
Storage Networking for HA OLTP

Vendor Neutral Approach
Generic terms, not vendor terms
Assumed basic knowledge of SAN, NAS, RAID

And now, for something completely different…..

OLTP Environments
Mission critical business applications
• Business in real-time
Expensive equipment and software
Aggressive performance objectives
Highly skilled IT staff
• Hands-on computing operations

OLTP Database Software
Oracle
• 8i Oracle Parallel Server (OPS)
• 9i Real Application Cluster (RAC)
IBM
• DB2 UDB
• Informix
MS SQL Server
Sybase, MySQL, others

OLTP OS Platforms
IBM S/390 MVS
Unix Systems
Windows 2000+
HA Linux

OLTP Requirements
99.999% uptime
Non-degrading response time
High transaction rates
Seamless scalability
Cost relief

Database Storage Approaches
Raw partitions
• Bypass OS I/O buffering
File system
• Facilitates data management
NFS mounted
• Offload DB server, NTAP + Oracle

ACID Properties of OLTP
Atomicity – No partial transactions
Consistency – All tables are in a consistent state before and after a completed transaction
Isolation – One transaction cannot contaminate other transactions
Durability – Transactions are complete only when the database updates are written to disk storage

Challenges of OLTP
Major systems integration effort
• Intricate tuning and monitoring
• Little tolerance for errors
Complex data structures & relationships
Time and sequence-sensitive processes
• Must be adhered to for data integrity
Shifting workloads and bottlenecks

OLTP Database Files
Data files
• Database data, tablespaces
Redo log files, archive log files
• Reconstruct or roll back transactions
Control files
• File layout information

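The Durability property above (a transaction is complete only when its updates are written to disk storage) and the role of redo log files can be illustrated with a toy append-only log. This is a minimal sketch, not how any particular database implements its redo log: the function names `commit` and `replay` and the one-record-per-line format are assumptions made for illustration.

```python
import os
from typing import List


def commit(log_path: str, record: bytes) -> None:
    """Append one record and return only after it has been pushed to storage.

    Hypothetical helper: a "transaction" here is a single line in the log.
    """
    with open(log_path, "ab") as log:
        log.write(record + b"\n")
        log.flush()             # move data from Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write its buffers to the device


def replay(log_path: str) -> List[bytes]:
    """Read the log back in order, as a recovery process would."""
    with open(log_path, "rb") as log:
        return [line.rstrip(b"\n") for line in log]
```

Because `commit` returns only after `os.fsync`, a caller that treats the return as the acknowledgement never reports a transaction as complete while it still sits in a volatile buffer. That wait is also why the deck treats log files as a common bottleneck.
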
OLTP Table Space Storage
Use many spindles to distribute hot spots
RAID 0+1 recommended
File system recommended over raw partitions
• Easier data management

Striping for Performance
[Diagram: a RAID controller striping across several disk drives]
RAID Controller (microsecond performance)
Disk Drives (millisecond performance)
• From rotational latency and seek time

My Personal Favorite, RAID 0+1
[Diagram: a RAID controller with five pairs of disk drives, numbered 1 to 5]
Mirrored Pairs of Striped Members

OLTP Redo Log Storage
Raw partitions recommended
• Sequential high speed writes
Separate mirror pairs per log file group
Capacity for 30 – 60 minutes of data
Goal is to limit disk contention for current and active log files

OLTP Archive Log Storage
File system or NFS mounting is required
• NFS mounting is recommended
Mirroring or RAID
Goal is to have easy access in case they are needed for reconstruction

High Availability
The ability of a system or application to immediately continue its mission after loss or damage to system components, systems, facilities and data

Availability Threats
Expected
• Scaling limitations: processor, storage capacity, network
• Consolidations
• Product life cycles
Unexpected
• Failures
• Bugs
• Virus
• Operator errors
• Disasters

HA Engages All Elements
Systems
• Application
Network connections
• Network services
Storage and I/O subsystems

Scoping the Risks
Columns: System / Network / Storage
Component: HBA / Cable / Disk drive
System: Server / Switch / Subsystem
Pathological: Virus attack on platform / Service provider outage / Environmental media loss
Site gutted: Server rooms / All external communications / Total data loss

Managing the Risks
Local copies of data
• Immediate availability
(Remote) Nearby
• Immediate availability to several hours
Remote Far away
• One to several days availability

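The striping and RAID 0+1 slides above describe spreading I/O over many spindles while keeping a mirror of every block. The sketch below shows one way such an address mapping could work, assuming data is striped across mirrored pairs and every write goes to both drives of a pair. The function names, the `stripe_blocks` parameter, and the drive numbering are hypothetical and do not come from any controller's firmware.

```python
from typing import List, Tuple


def locate(lba: int, pairs: int, stripe_blocks: int) -> Tuple[int, int]:
    """Map a logical block address to (mirrored pair index, block offset)."""
    stripe_no, within = divmod(lba, stripe_blocks)
    pair = stripe_no % pairs   # consecutive stripes rotate across the pairs
    row = stripe_no // pairs   # how many full rotations came before this one
    return pair, row * stripe_blocks + within


def write_targets(lba: int, pairs: int, stripe_blocks: int) -> List[Tuple[int, int]]:
    """Return (drive index, offset) for both drives that must receive a write."""
    pair, offset = locate(lba, pairs, stripe_blocks)
    return [(2 * pair, offset), (2 * pair + 1, offset)]
```

With five pairs and four-block stripes, logical blocks 0 to 3 land on pair 0, blocks 4 to 7 on pair 1, and so on, so a run of sequential or scattered requests is serviced by different spindles rather than queuing on one.
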
Disaster/Availability Radii
[Diagram: radii labeled Local, Remote Nearby, and Remote Far Away]

Nobody Expects…..
Weird things to happen to them
Disintegration of media
Underground flooding through tunnels
Fires in Telco switching centers

High Availability for OLTP
Duplication of functions
• Without degrading performance
• Without risking data integrity
Brute force techniques
Automation and efficiency
Cost is always an issue
• And high availability DOES cost

A Long Time Ago in a Job Not So Far Away…….
[Cartoon: "Jedi" Jim Gast drills Marc "Skyfaller" Farley on the one concept a storage geek must master: REDUNDANCY!]

Eventually, I Learned to Appreciate His Teachings……
Don’t get the giant spicy Polish for lunch – it’s too much for the digestion
NSPoF (No Single Point of Failure)
• REDUNDANCY

OLTP HA Requires Complete Redundancy Protection
Client network
Server systems and components
Application modules
I/O Channels and Networks
Storage subsystems and components
Data

A Quick Look At Clustered Storage
Shared Nothing
• Each server controls its own storage address space
Shared Everything
• Both servers share control of a common storage address space

Examples of OLTP Clusters
Microsoft SQL Server
• Data is exchanged between servers
• Failover paths only
Oracle 9.1 RAC
• Data is accessed directly from storage

One more time, with subsystems…
Microsoft SQL Server
• Same subsystem but different address spaces
Oracle 9.1 RAC
• All storage is shared by all cluster nodes

I/O Redundancy
Host to subsystem
• Mirroring: Host to independent targets
• Multi-pathing: Host to a single target
Subsystem to subsystem
• Store and forward: Local to remote

Disk Mirroring: Redundant storage targets
Independent, identically sized storage address spaces
[Diagram: mirroring through one controller and through two controllers]

Disk Mirroring: I/Os to 2 Targets
“Brute force” redundancy: fast and simple
Both read and write I/Os
• Overlapped reads for performance
Local connections
Limited capacity*
I/O bottlenecks* for random I/O activity
* if targets are disk drives

Disk Mirroring for Redo Log Files
Log files are a common bottleneck
Use raw partitions
Redundancy is required
• Mirroring is adequate
Use the highest-RPM drives with the lowest seek times
Put on a separate channel from database I/O
Use separate mirrored pairs per group

Mirroring to Storage Subsystems
Independent, identically sized storage address spaces
[Diagram: two controllers mirroring to two storage subsystems]

Mirroring to Subsystems
Targets are subsystems, not disks
• Separate address spaces
Capacity scales to subsystem max
Double level redundancy
• Mirroring plus RAID
Multiple disk spindles reduce I/O bottlenecks

Disk Mirroring Datafiles from Host to Storage Subsystems
Disk mirroring + subsystem RAID
Excellent capacity scaling
Adjacent and across campus/town
• One subsystem outside site radius
Requires longer distance cabling
Reads and writes both transmitted

Multi-Pathing: Redundant Paths Between a Host & Subsystem
Pathing software determines that a transmission error has occurred and switches to a redundant path
[Diagram: application data volume]

Multi-pathing vs Mirroring
Mirroring assumes independent, but similar storage targets
Multi-pathing assumes multiple paths to the exact same target
Mirroring can use a single HBA; multi-pathing needs two HBAs

Path Failures
1. HBA problem
2. Link, switch or network problem
3. Subsystem controller problem
[Diagram: application data volume]

Transmission failures recognized after SCSI timeouts are exceeded
I/O sent to storage
No ack received
The I/O is retried and eventually an error is passed back to the process that issued the I/O

Path Failover for OLTP I/O
Redundant path resources take over activities for a failed path to sustain operations without disrupting service or risking data integrity

Store and Forward
Independent, identically sized storage address spaces
[Diagram: host, subsystem A, subsystem B]

Store & Forward: One Host I/O and Two Copies of Data
Only real option for remote copies
Does not forward read I/Os
Proprietary protocols and methods
• Standards are emerging, e.g. FC/IP
First step to storage snapshots

Store and Forward: Acknowledgements
[Diagram: subsystems A and B under each mode. Asynchronous labels: I/O, ACK, Forward. Synchronous labels: I/O, ACK, Forward, ACK.]

Trade-offs with Acknowledgement Handling
Synchronous
• Always preferred
• Slowest performance
• State of copy is precise
Asynchronous
• Fastest performance
• Least precise knowledge of copy status

Store & Forward: Local and Remote Copies
Local & nearby copy techniques
• Synchronous
• Fiber optic cabling, optical/DWDM services
Remote-far away copy techniques
• Asynchronous
• ATM gateways, OC-12 or less, FC/IP

Mirroring vs Synchronous Store and Forward for Local & Nearby Copies
Mirroring
• Async I/O
• Reads and writes
• No snapshot tie-in
• Uses more host slots
• Least costly
Store and Forward
• Async or Sync I/O
• Writes only
• Snapshot ready
• May conserve host I/O slots
• Most costly

Combining Mirroring with Store and Forward
[Diagram: radii labeled Local, Nearby, and Remote Far Away, with a Mirroring Radius and a Store and Forward Radius]

Data Redundancy for OLTP
Backup
Snapshots
Delta (log files)

Backup for OLTP
A whole subject unto itself
Disaster recovery primarily
Cold? Who can afford to do that anymore?
Hot – put DB in backup mode
Backup snapshot image of data

Subsystem Snapshots for OLTP
1. Flush host buffers (sync, sync)
2. Create Snapshot
[Diagram: database server attached to Disk Storage Subsystems A, B, and C]

Logical Snapshots for OLTP
1. The address space is mapped
2. First updates
• Overwritten data locations are not returned to the free space pool. (Undelete)
3. Second updates

Delta Redundancy with Log Files
Recording of all transaction activities
Roll forward, bring up to date
Roll backward, go to known good state
Terrific tool for remote redundancy
Not HA
Process cannot have holes in it

Remote Redundancy w/ Log Files - 1
d(x) = f(x) - f(x-1): Latest Redo Log File
f(x-1): Previous Instance
f(x): Current to Log File Switch Checkpoint

Managing Redundancy is Hard Work
Redundancy is a way of life
[Cartoon captions: “He never does anything except eat and sleep” and “How come I always end up doing all the work?”]
And now, some thoughts from our sponsor…..

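The delta relationship on the Remote Redundancy w/ Log Files slide, d(x) = f(x) - f(x-1), and the earlier warning that the process "cannot have holes in it" can be sketched as a roll-forward routine. This is an illustrative model only: the database state is a plain dictionary, each shipped log file is a (sequence number, changes) tuple, and the name `roll_forward` is hypothetical.

```python
from typing import Dict, List, Tuple

# (sequence number, changes recorded in that log file): an assumed format
LogFile = Tuple[int, Dict[str, int]]


def roll_forward(
    state: Dict[str, int], applied_seq: int, logs: List[LogFile]
) -> Tuple[Dict[str, int], int]:
    """Apply shipped log files in order: f(x) = f(x-1) + d(x).

    Raises ValueError when a log file is missing, because later deltas
    cannot be applied safely on top of an unknown state.
    """
    new_state = dict(state)  # leave the previous instance untouched
    for seq, changes in sorted(logs, key=lambda log: log[0]):
        if seq != applied_seq + 1:
            raise ValueError(
                f"hole in log sequence: expected {applied_seq + 1}, got {seq}"
            )
        new_state.update(changes)
        applied_seq = seq
    return new_state, applied_seq
```

A remote site holding the previous instance and every log file since then can reach the current state. If one log file is lost in transit, the routine stops at the gap rather than producing a copy that looks current but is not.
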
SAN Considerations
Fabrics and SAN Islands
Zoning
Switches and directors
Multiplexing (oversubscribing)
Security

Fabrics ARE the SAN Environment
One size does not fit all applications
Larger fabrics carry more risks
VSANs are probably a good idea
Only use switches supporting hot, stateful firmware upgrades

SAN Islands May be Best for OLTP
Most risk averse approach
Dual fabrics, one fabric per I/O path
Switch problems do not cascade
But, higher management costs

Zoning & OLTP
All ports defined to zones
• No rogue ports and zombie zones
Restrict access to current servers
• Need-to-access only

Switches and Directors
Redundancy eats slots and ports
• Pathing, mirroring
• Separate channels for data and logs
Avoid traversing ISLs, if possible
• Added latency and blocking potential
• Trunking must have NSPoF

Security
Admin security for an OLTP SAN should be as strong as possible
• No monkey business
No default passwords left
WAN encryption of log files

Recommendations:
Determine OLTP availability needs
• Where copies should be, time to access
Match storage network implementation to DB file types
Develop availability-driven policies
• Equipment
• Processes
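
When determining availability needs, it can help to turn an uptime percentage such as the 99.999% figure from the OLTP Requirements slide into a yearly downtime budget. The sketch below is plain arithmetic and assumes a 365-day year; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the minutes of downtime per year that an uptime target permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% uptime allows {allowed_downtime_minutes(target):.2f} min/year")
```

Five nines leaves a little over five minutes per year, which is why the deck's local and nearby copies with immediate availability matter more for HA than far-away copies that take days to bring into service.
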