DBA421: Building Very Large Databases with SQL Server 2000 (32/64-bit)
Gert E.R. Drapers, Software Architect, Customer Advisory Team, SQL Server Development

SQL Server Development Customer Advisory Team
- We invest in large-scale SQL Server projects across the world
- Technical assistance with design, performance tuning, etc.
- Product group programs
- We use our customer experiences to make a better product
- We share the knowledge learned from these challenging implementations so that more can be built

Session Objective
- Show that large-scale projects on the Microsoft database platform are not unique
- Give you some thoughts on clever design and implementation tips for dealing with large-scale SQL Server applications
- Make you feel comfortable that it absolutely can be done
- Scaling up / out with SQL Server – it can be done!

Agenda
- General comments on building large-scale applications
- Scaling up large-scale OLTP: best practices, case study example
- Scaling up a large-scale data warehouse: best practices, case study example
- Scale out – clarification
- 64-bit – awesome, but understand the value
- Summary / Q & A

What is large scale?
- Any database application that requires special configuration work beyond the defaults
- Numbers do not mean anything, but I always get asked, so here are some guidelines:
  - TB+ size databases
  - 1000(s) of database transactions / sec
  - 1000(s) of connected users
  - IO subsystem throughput exceeding 80 MB/sec
  - More than 8 processors
  - More than 8 GB RAM
- Large scale is when you think it is challenging and your vendor says it is a piece of cake. This is when you start worrying.

Building Large Scale SQL Server Applications
- The general implementation rules change slightly (true for all RDBMSs)
- It's not quite auto-tuning
- You need to understand your entire platform: hardware, OS, and database server configuration
- You may have to change defaults that work great for 99% of other applications: degree of parallelism (DOP), recovery interval, query / index hints, etc.
- Experience and know-how are what make large-scale applications work
- It's not a piece of cake, but it ABSOLUTELY CAN BE DONE!
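As a minimal sketch of the point above about changing server defaults, the DOP and recovery-interval settings can be adjusted with sp_configure. The values shown (4 processors for parallel plans, a 1-minute recovery interval) are illustrative only, not recommendations; pick values appropriate to your own hardware and workload.

  -- Illustrative values only; both options are advanced options
  EXEC sp_configure 'show advanced options', 1
  RECONFIGURE
  GO
  -- Cap the degree of parallelism instead of letting every query use all CPUs
  EXEC sp_configure 'max degree of parallelism', 4
  -- Shorten or lengthen the checkpoint/recovery target (in minutes)
  EXEC sp_configure 'recovery interval', 1
  RECONFIGURE
  GO

Query and index hints, by contrast, are applied per statement rather than server-wide.
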
Some Interesting SQL Server Implementations in Production
- 25+ applications of a terabyte or greater in production, and we are currently active on at least as many as we speak
  - 7+ TB telco billing reporting system growing to 15 TB
  - 3 TB GIS mapping application growing to 10-20 TB
  - 13 TB debt-processing financial application
  - Currently working on a system that will grow to 50 TB
- Many high-volume, mission-critical applications in production on SQL Server
  - Trading application exceeding 60,000+ database transactions / sec
  - Credit card processing system handling 3,000 authentications / sec
  - Banking application processing more than 40 million money-transfer transactions / day while reporting is active

Agenda (recap) – next: scaling up large-scale OLTP, best practices and a case study example

OLTP Large Scale Tips and Best Practices
- Eliminate the log bottleneck
- Eliminate ineffective stored procedure recompiles
- Eliminate lock escalation that causes performance issues or deadlocks
- Understand your scheduler and what hardware resources are needed

Eliminate the Log Bottleneck
- Put the log on its own drive or RAID 10 stripe set
- Assign it its own controller, or its own LUN(s) if on a SAN
- Set the controller or IO subsystem cache for the log to 100% write through (unless you are reading the log with replication or have a lot of trigger activity)
- Monitor wait statistics to ensure the log is not waiting:

  DBCC SQLPERF(waitstats, clear)
  go
  DBCC SQLPERF(waitstats)

Create Waitstats Script

  dbcc sqlperf(waitstats, clear)   -- clear out wait statistics
  drop table waitstats
  create table waitstats (
      Wait_Type        varchar(80),
      Requests         numeric(18,1),
      Wait_Time        numeric(18,1),
      Signal_Wait_Time numeric(18,1),
      timenow          datetime default getdate())
  declare @start int, @finish int
  select @start = 1, @finish = 10
  while (@start < @finish)
  begin
      begin transaction
      insert into waitstats (Wait_Type, Requests, Wait_Time, Signal_Wait_Time)
          exec ('dbcc sqlperf(waitstats)')
      commit
      select @start = @start + 1
      waitfor delay '00:00:10'   -- sample every 10 seconds
  end
  select * from waitstats order by Wait_Time desc

Eliminate Ineffective Stored Procedure Recompiles
- In some cases, stored procedure recompilation behavior adversely affects performance:
  - Multiple recompilations
  - The entire stored procedure query plan is recompiled (especially costly for very large stored procedures)
  - Stored procedure recompiles cause serialization: procedure executions wait serially during the recompilation
- Recompiles are triggered by SET options, mixed DML/DDL, tempdb table cardinality changes, etc.
- Customers miss the recompilation aspect of performance
- Possible solutions / suggestions:
  - Whitepaper on MSDN and KB article Q243586
  - Execute the procedure with KEEPFIXED PLAN at the statement level
  - Statement-level recompilation (Yukon)
- Be on the lookout for recompilations using Perfmon

Locking / Lock Escalation
- SQL Server will escalate to a table lock when a specific percentage of memory is used up for locking or when 2,500 locks are allocated (5,000 in SQL Server 2000 SP2)
- Table lock escalation saves on memory allocation but can cause deadlocks and other locking problems
- Resolution:
  - Force page locks, which allocate fewer locks
  - Or get clever like one customer did:
    - Insert a dummy record (e.g. key 9999 99 99999 999 999)
    - Open a thread and grab an update lock on the dummy record
    - The next thread cannot escalate, because you cannot escalate to a table lock while another lock is held on the same table
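A minimal sketch of the dummy-record trick just described. The table, column, and key value are hypothetical; the point is simply that a long-lived row-level update lock held by one session keeps other sessions from escalating to a table lock.

  -- Hypothetical table and key value; adapt to your own schema.
  -- Run this in its own session and leave the transaction open while the
  -- heavy, escalation-prone work executes in other sessions.
  BEGIN TRANSACTION
      SELECT OrderID
      FROM dbo.OrderDetail WITH (UPDLOCK, ROWLOCK)
      WHERE OrderID = 999999999   -- the dummy record inserted up front
  -- ...do not commit yet; the row-level update lock (and its intent lock on
  -- the table) prevents other sessions from escalating to a full table lock
  -- COMMIT TRANSACTION once the heavy activity has finished
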
- Worst case – there is a trace flag to shut lock escalation off; call Microsoft support before using this option
- Monitor through the normal locking tools: Profiler, sp_who, sp_who2, sp_lock and the sp_blockinfo locking script
- Send email to [email protected] for: the sp_blockinfo script, the waitstats script, and other documentation

SQL Server Scheduler – Understand It!
[Diagram: five BCP.exe load jobs distributed across four UMS schedulers on a 4-way CPU]
1) Connections get assigned to a UMS scheduler
2) The UMS schedules across processors
3) A connection stays on its UMS scheduler for the life of the connection
4) Two heavy threads will fight on the same UMS scheduler

SQL Server Scheduler – Continued
[Diagram: of the five BCP.exe jobs on the 4-way CPU, tasks 1 and 5 land on the same UMS scheduler while tasks 2, 3 and 4 each get their own]
1) Each load job takes 2 minutes to run
2) Tasks 2, 3 and 4 each finished in 2 minutes
3) Tasks 1 and 5 took closer to 4 minutes
4) Using 4 threads instead of 5 would be a better solution for this example
5) Monitor using DBCC SQLPERF(UMSSTATS)

UMSSTATS Example

  DBCC SQLPERF(UMSSTATS, clear)
  go
  waitfor delay '00:00:15'
  go
  DBCC SQLPERF(UMSSTATS)

OLTP Case Study – 60,000+ Database Transactions/sec
- Financial trading messages: bids / asks coming off a Tandem to a message queue at a rate of 12,000-15,000/sec
- Each message goes to a SQL Server stored procedure and makes approximately 5 database transactions
- Simultaneously, real-time queries are reading from the active data

Topology
[Diagram: a Tandem NonStop SQL system feeds a message queue in the application tier; queue readers call SQL Server stored procedures on an 8-way Dell running Microsoft SQL Server with 6 different databases, while brokers run real-time queries]

OLTP Lessons Learned
- Challenges:
  - Scheduling a mixed workload evenly across schedulers
  - A database log that can handle 60,000+ database transactions / sec
  - Real-time reporting while loading data
- It can be done with SQL Server 2000:
  - Multiple database logs
  - Read-only queries
  - Priority scheduling of connections

Agenda (recap) – next: scaling up a large-scale data warehouse, best practices and a case study example

Data Warehouse Lessons Learned
- Loading large quantities of data quickly
- Maintaining very large databases
- Optimizing tempdb
- Optimizing your IO subsystem

Data Loading Strategy for VLDB
- Bulk Insert without indexes, or with minimal indexes, seems to be fastest (hundreds of thousands of rows/sec) but is not applicable to larger systems with incremental data loads
- Bulk Insert with a single nonclustered index is fairly linear
- Bulk Insert with a clustered index is less linear due to fragmentation
- Parallel Bulk Insert can be fastest, though not linear, with sorted data and partitioned load files
- There is no contention when using multiple Bulk Inserts (one per table) if the physical model works for data access (e.g. horizontal partitioning)

Maintenance Tips: Reorg & Create Index
- Run DBCC SHOWCONTIG and look at Scan Density (logical extent density) and Avg. Page Density (average free bytes per page); if these drop below 30-50%, a reorg is recommended
- Run DBCC INDEXDEFRAG if your data is not interleaved (that is, a single table/index lives in its own filegroup); a short sketch of both follows
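A minimal sketch of the two checks above. The database, table and index names are placeholders, not part of any case study in this session.

  -- Placeholder names: adapt MyDW, FactCalls and PK_FactCalls to your schema
  USE MyDW
  GO
  -- Report fragmentation for all indexes on the table; watch Scan Density
  -- and Avg. Page Density in the output
  DBCC SHOWCONTIG (FactCalls) WITH ALL_INDEXES
  GO
  -- Defragment a single index in place (the table stays available)
  DBCC INDEXDEFRAG (MyDW, FactCalls, PK_FactCalls)
  GO
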
- Run DBCC DBREINDEX if you want all clustered and nonclustered indexes rebuilt
- Run CREATE INDEX ... WITH DROP_EXISTING if you only want the clustered index rebuilt and Index Defrag is not helping
- CREATE INDEX runs fastest with more memory

Index Defrag with Interleaved Data
[Diagram, three slides: a single filegroup in which Table A pages 1-25 are interleaved with Table B pages, shown before and after running DBCC INDEXDEFRAG on Table A]
1. It starts with the first extent and swaps it with the next smallest extent
2. If pages are not at the fill factor, it fills them up to the fill factor
3. If pages are above the fill factor, it will NOT remove records to meet the fill factor
4. It will not reorganize other objects in the filegroup, so the data will not be contiguous after Index Defrag
- When completed, the data is in correct logical order
- The data may or may not be contiguous
- Free space is used up but not given back
- DBCC DBREINDEX will put the data back in contiguous order, but takes longer to run and is more intrusive

TempDB Slows Down When Busy!
- Challenge:
  - Performance degradation during peak load
  - Many concurrent users creating temporary tables
  - Performance starts to degrade, and sysprocesses shows locks on DBID 2 (tempdb)
- Resolution:
  - Add more files to the tempdb filegroup
  - Place tempdb across more spindles
  - You could possibly use a trace flag to disable single-page allocation, but please call Microsoft support before turning this trace flag on

Challenge: New to SAN Technology
- Never having worked with SAN technology can be challenging:
  - Do the data layout rules change?
  - What does the cache really buy me?
  - How do I monitor?
  - How do I determine whether the problem is the SAN or the database?
- Some thoughts:
  - Test your SAN throughput before any database activity: use IOMETER or just copy large files. You should be getting approximately 120 MB/sec throughput per channel
  - On a heavily updated system, weight the cache toward writes rather than reads. This significantly helps log writes (you should see about 1 ms per write on Avg. Disk sec/Write)
  - Still make sure data is spread across as many drives as possible; individual disks are still the slowest component for database activity

Case Study: Data Warehouse
- Large telco processing call detail records (CDRs)
- Bulk loads 350 million call detail records per day into SQL Server
- Must keep a history
- Reporting is done all day long on daily, weekly, monthly, quarterly and yearly intervals
- Very partitionable by period of time
- 5-7 TB database
- This was based on a prototype; the project is now being developed

Case Study – Data Warehouse Design
- 3 types of databases:
  - Type 1 contains 24 one-hour tables
  - Type 2 contains 7 one-day tables
  - Type 3 contains 7 one-week tables
- Online consolidation was much more difficult, so all 3 are built in parallel; this is a good design if the hardware has headroom
- Data is loaded without indexes and then indexed
- Data is made available to users once indexed
- Yesterday, Last Week and the Last 7 Weeks are available while Today and This Week are being loaded
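The design above partitions the data by time across many small tables. One hedged sketch of how such tables can be stitched together with a local partitioned view (the UNION ALL approach the summary below calls "Partitioned Views"); the table, column and date values are invented for illustration, and only two of the seven one-day tables are shown.

  -- Invented names and dates; only two member tables shown for brevity
  CREATE TABLE dbo.CDR_20030601 (
      CallDate   datetime NOT NULL CHECK (CallDate >= '20030601' AND CallDate < '20030602'),
      CallID     bigint   NOT NULL,
      DurationMs int      NOT NULL,
      CONSTRAINT PK_CDR_20030601 PRIMARY KEY (CallDate, CallID))
  CREATE TABLE dbo.CDR_20030602 (
      CallDate   datetime NOT NULL CHECK (CallDate >= '20030602' AND CallDate < '20030603'),
      CallID     bigint   NOT NULL,
      DurationMs int      NOT NULL,
      CONSTRAINT PK_CDR_20030602 PRIMARY KEY (CallDate, CallID))
  GO
  -- The CHECK constraints on the partitioning column let the optimizer touch
  -- only the member tables a query actually needs (partition elimination)
  CREATE VIEW dbo.CDR_ThisWeek
  AS
  SELECT CallDate, CallID, DurationMs FROM dbo.CDR_20030601
  UNION ALL
  SELECT CallDate, CallID, DurationMs FROM dbo.CDR_20030602
  -- UNION ALL ... one SELECT per remaining day of the week
  GO

For an updatable partitioned view, SQL Server 2000 additionally requires the partitioning column to be part of each member table's primary key, which the sketch honors.
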
Challenges for the Data Warehouse Case Study
- Real-time reporting
- Partition management – history
- Maintenance of very large databases
- Auto-updating statistics on the new high-volume data coming in

Design Using 1, 4 or 7 Servers
[Diagrams: a SQL Server partitioned view (PView) spanning the Week n-7 ... Last Week, This Week, Yesterday and Today databases on a single server; the same databases spread across 4 servers; and spread across 7 servers]

Case Study Data Warehouse – Summary
- Large database with a very large requirement for the number of rows loaded per day
- Loaded in parallel into different tables, with the hardware being the determining bottleneck
- Partitioned on a data value: time
- Could be partitioned using partitioned views or partitioned within the application
- Data partitioning allowed easy scalability to multiple databases, instances or servers with minimal changes to code
- Maintenance decisions weren't made because of the prototype nature of the project; they are being made now

Agenda (recap) – next: scale out, clarification

What is Scale Out?
- Multiple nodes
- Evenly distributed data, as in data partitioning
  - The data warehouse telco example is data partitioning
  - Distributed partitioned views are data partitioning
- Evenly distributed workload/data, as in function partitioning
  - Online shopping is function partitioning
  - Break up unrelated functions across resources, versus breaking up related data

Scale Out Case Study: 13 TB Data Analysis System
- Provides bad-loan/credit collection services and reporting (online and batch)
- Designed to create an environment where credit grantors, debt buyers, and collection agencies can meet their business objectives
- A large volume of data is inserted every night from several sources
- Heavy analytical reporting/data mining is performed all day long to predict and quantify the likelihood of collecting each delinquent account in a client's portfolio
- Less than 24-hour turnaround on customer requests

Scale Out Case Study: Challenges
- Manage 13 TB of data
- Manage large files from several external customers per day
- Load millions of rows from multiple files and run batch reporting at the same time
- How do I manage different SLAs from different customers on the same system?

Scale Out Case Study: Function Partitioning
[Diagram: daily customer source files feed an account/function tracking database and individual account databases; a common database holds the Function X, Y and Z tables plus common function procedures and data]

Function Partitioning
- Partitioning is a VERY good idea
- Partitioning by business function is very logical
- The application must be developed against the partitioned data, rather than using database technology to access the partitioned data. NOT DIFFICULT
- Putting all the data in one single database doesn't make you a hero; no matter how you look at it, 13 TB of data is 13 TB of data used to solve a business problem

Agenda (recap) – next: 64-bit, awesome, but understand the value

64-bit – Is It an Upgrade Lock?
- What is the hype of 64-bit?
- Produced world-record benchmarks
- New advancements in CPU – Itanium
- It begs the questions:
  - Should you automatically upgrade?
  - Is it a performance guarantee?
  - Is it an easy upgrade?

64-bit Clear Winners
- OLAP applications with large dimensions
  - Breaks through the 3 GB limit
  - Essential for dimensions with millions of members
  - Can also leverage a large file system cache
- Relational applications requiring extreme memory (> 32 GB)
  - VLDBs with random access patterns that can be 'cached' in DB buffers for performance
  - Any DB requiring > 64 GB of buffers
- Alternative to 32-bit apps using AWE
  - No pressure on 'real' memory
  - No overhead for mapping AWE pages

64-bit Winners (continued)
- If you need to keep the procedure/statement cache in the virtual address space
- If you need to keep the page headers in the virtual address space: the more memory we use via AWE, the more page headers we have to keep, and the less space remains for real data in the virtual address space
- If you need to keep memory associated with a server-side cursor in the virtual address space
- Memory for hash joins and sort operations can only be calculated against the maximum virtual address space; we cannot use AWE memory for such operations (example: spills to tempdb)
- Going beyond 16 GB, we have to reduce the virtual address space to 2 GB, which is counterproductive since there is then more data (page headers) to hold in the virtual address space
- Concurrency issues with AWE under Windows 2000 were improved in Windows Server 2003; the problem showed up on 32-way hardware

Some Realities!
- Documentation states that Itanium has twice the CPU power/speed of Xeon, although at a slower clock speed
- You pay a premium price for a slower-clocked Itanium compared to a Xeon
- CPU-bound SQL relational workloads will see a performance improvement on the same number of CPUs; BUT this does NOT mean a 4-way Itanium is equal to an 8-way Xeon
- With very large memory, shutdown could take a long time due to a system checkpoint with a lot of data to flush. Recommendation: keep the checkpoint interval small (for example, 1 minute)

Takeaways
- For relational DBMS workloads, 64-bit is not a slam dunk
  - Often better to have more CPUs
  - Often better to have faster CPUs
- Qualify before recommending; look for:
  - Memory-intensive workloads beyond what can be gracefully handled with AWE today, e.g. >> 32 GB
  - Large-scale OLAP workloads

Agenda (recap) – next: summary / Q & A

Summary
- We can build VLDB systems with SQL Server today, and we are doing it
- The Design Wins Program may be able to assist; send email to your local Microsoft representative or [email protected]
- It is not auto-tuning or auto-configuration; it will take effort from you. Your Microsoft MCS contact can help
- Having this knowledge will help you be more successful while building large, scalable systems

Questions: [email protected]

Community Resources
- http://www.microsoft.com/communities/default.mspx
- Most Valuable Professional (MVP): http://www.mvp.support.microsoft.com/
- Newsgroups – converse online with Microsoft newsgroups, including worldwide: http://www.microsoft.com/communities/newsgroups/default.mspx
- User Groups – meet and learn with your peers: http://www.microsoft.com/communities/usergroups/default.mspx

Suggested Reading and Resources
The tools you need to put technology to work!
- Microsoft® SQL Server™ 2000 Administrator's Companion: 0-7356-1051-7 (available today)
- Microsoft® SQL Server™ 2000 High Availability: 0-7356-1920-4 (available 7/9/03)
- Microsoft Press books are 20% off at the TechEd Bookstore
- Also, buy any TWO Microsoft Press books and get a FREE T-shirt

Evaluations

© 2003 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.