* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download DB-05: Database Storage Considerations
Survey
Document related concepts
Transcript
DB-05: Database Storage Considerations Adam Backman White Star Software [email protected] Outline Why is DB storage so important? Hardware options RAID vs. JBOD Network Storage (SAN, NAS, …) Database setup options Block size Cluster size (DB and BI) Splitting Data into areas DB-05 Database Storage Considerations © 2005 White Star Software Why is DB Storage So Important? Everything starts from the disk Disks need to be reliable and fast All physical disks are only capable of doing approximately 100 operations per second Salespeople are weasels DB-05 Database Storage Considerations © 2005 White Star Software Who cares about 100 I/O operations per second? Example: 10000 read operations 90% buffer hit rate 1 disk would take 10 seconds to complete 10 disks would take 1 second to complete Which of these would your users choose? DB-05 Database Storage Considerations © 2005 White Star Software Stop I/O before it gets to the disks Application code Use indexes Watch out for can-find Bulk updates vs. many small updates Use memory not disk Add system memory (no swapping or paging) Increase buffers (-B, -Bt, -Bp) Add cache to disk controllers DB-05 Database Storage Considerations © 2005 White Star Software A little more about buffer hit rate Using same example: 10000 Logical reads 90% buffer hit rate = 1000 Physical reads 95% buffer hit rate = 500 Physical reads 96% buffer hit rate = 400 Physical reads See the effect buffer efficiency has on disks! DB-05 Database Storage Considerations © 2005 White Star Software What causes disk I/O? Operating system (swapping and paging) Progress Database (DB and BI) Application (code and temp files) Other applications DB-05 Database Storage Considerations © 2005 White Star Software Hardware Options RAID vs. JBOD Buying disks What is variance Simplicity and cost vs. control and budget Network vs. Locally Attached Storage Definitions Configuration Pros and Cons DB-05 Database Storage Considerations © 2005 White Star Software Buying Disks Buy small disks Each disk regardless of it’s size is capable of doing the same number of I/Os per second Buy fast disks Slow disk = slow performance Buy reliable disks Buy many disks The inner portion of the disk is 20% faster than the outer portion of the disk DB-05 Database Storage Considerations © 2005 White Star Software Balancing Disk I/O Balancing disk I/O is the process of making sure you are using all of the available disk resources (file systems, disks and controllers) are working equally as hard at load. This is also called eliminating variance. A well tuned system will have less than a 15% variance across all disks. DB-05 Database Storage Considerations © 2005 White Star Software How can I eliminate variance? Warning: This is a religious argument. Everyone has an opinion and everyone is right. There are two schools of thought. The system should take care of this. I will take care of this. DB-05 Database Storage Considerations © 2005 White Star Software The system will take care of this. Disk I/O can be effectively distributed by the operating system with the use of smart I/O systems. Most notably RAID. Basic Definition of RAID: Redundant Array of Independent Disks DB-05 Database Storage Considerations © 2005 White Star Software What RAID really means RAID has many levels. I will only cover a few. RAID 0: This level is also called striping. RAID 1: This is referred to as mirroring. RAID 5: Most popular RAID level RAID 10: This is mirroring and striping. Also known as RAID 0 + 1 DB-05 Database Storage Considerations © 2005 White Star Software Network vs. Local Storage Network storage options Benefits of network storage Drawbacks of network storage DB-05 Database Storage Considerations © 2005 White Star Software Network Storage Options SAN (Storage Area Network) A dedicated storage network supporting multiple computers NAS (Network Attached Storage) A dedicated storage device on an existing network including a SAN iSCSI A storage networking protocol allows storage traffic to travel through general-purpose networks instead of the specialized Fibre Channel SAN DB-05 Database Storage Considerations © 2005 White Star Software Benefits of Network Storage Centralized data store - buy a SAN, hook up many servers Easier to maintain (backup one place) DB-05 Database Storage Considerations © 2005 White Star Software Drawbacks to Network Storage More complex performance monitoring Many servers all making requests at the same time make it difficult to resolve conflicts Overall more complex More things to break Single point of failure Many vendors support array to array mirroring to resolve this issue Generally more expensive Unless you have a large number of servers using the network storage DB-05 Database Storage Considerations © 2005 White Star Software Creative Hardware Solution: Solid State Disks PROs Reliable Fast CONs Expensive Really expensive Conclusion: Good for logs (AI and BI) and limited portions (hot spots) of the database DB-05 Database Storage Considerations © 2005 White Star Software I will take care of this Manual spread of data across disks Better control as you can see where the I/O is going More attention by system administrator is needed Administrator must think about reliability as well DB-05 Database Storage Considerations © 2005 White Star Software Database Setup Options Database Stuff Storage areas Blocksize and Records per block Type II areas BI cluster size Effect of BI cluster size Why are APWs affected by BI Cluster size DB-05 Database Storage Considerations © 2005 White Star Software Storage Areas Benefits Greater control of location of data Minimize downtime for utilities Stripe some, leave some on straight disks Drawbacks More things to break More complex to monitor DB-05 Database Storage Considerations © 2005 White Star Software Storage Areas - Control A Storage Area can hold 1 or more data objects (index, table, schema, …) Separate schema from data if possible Isolate the logs (AI and BI) Try to keep the number of areas manageable, only add more areas for valid business reasons DB-05 Database Storage Considerations © 2005 White Star Software Database Block Size 8k for most (4k for NT) More in synch with the operating system More stuff per read Especially good for index reads Watch the number of records per block so you don’t waste block space DB-05 Database Storage Considerations © 2005 White Star Software Records per Block Can be set to a binary number between 1 and 256 If it is set too low you will waste space at the end of each block If you set it too high you run the risk of fragmenting records (most true for records that grow over time) More important to get right for type II areas DB-05 Database Storage Considerations © 2005 White Star Software Setting Records per Block First do a database analysis. proutil <dbname> -C dbanalys > filename Then take the mean record size of the table(s) you are working with and add 20 bytes for record and block overhead. Divide the result above into your block size. Last take a binary number just above the result above. DB-05 Database Storage Considerations © 2005 White Star Software Example: Setting Records/Block Table 1 has a mean record size of 120 bytes Add 20 bytes of overhead (140) Divide block size (8192) by result (140) This (58.51) is how many mean-sized records will fit in a single database block Take the next higher binary number (64) Are there exceptions, you bet! DB-05 Database Storage Considerations © 2005 White Star Software Type II Areas The table and index blocks from individual objects are clustered together The cluster size is “tunable” to 8, 64 or 512 block clusters Benefits: Potential performance benefits Drawbacks: Yet another thing to try to get right, potential wasted space DB-05 Database Storage Considerations © 2005 White Star Software How Type II Areas Work The data blocks are clustered together and in turn these clusters are chained together Scan type operations need only find the first block and the rest of the blocks will follow Example: When you do an index rebuild, the first step is to delete all of the index blocks. In a type 1 area all of the blocks in the area are scanned and the index blocks are deleted. In a type II area the first cluster is put on the free chain and no other updates are needed. DB-05 Database Storage Considerations © 2005 White Star Software Why are APWs Affected by BI Cluster Size The BI cluster size determines the length of the checkpoint The APW efficiency determines the efficiency of the checkpoint Generally, I start with 1 apw and the “correct” BI cluster size and then add 1 apw if I see buffers flushed at checkpoint during my “important” processing times. DB-05 Database Storage Considerations © 2005 White Star Software BI Cluster Size The default value of 512 KB is wrong for most people Monitor the DB during your high update portion of the day Set the cluster size high enough to keep checkpoint lengths greater than 2 minutes Generally, 1 MB to 8 MB for most sites DB-05 Database Storage Considerations © 2005 White Star Software Conclusion Look at your storage No RAID 5 RAID or JBOD To network or not to network Get your DB setup right Block size (remember records per block) Rational split of your data into storage areas Type II areas (where needed) Get your BI cluster size right DB-05 Database Storage Considerations © 2005 White Star Software Use Both Methods Use mirroring to protect your data Use stripping to get your data onto more physical drives Use storage areas and extents eliminate variance Use storage areas and extents to further spread your data DB-05 Database Storage Considerations © 2005 White Star Software Questions DB-05 Database Storage Considerations © 2005 White Star Software