Designing and Tuning High Speed Data Loading
Thomas Kejser, Senior Program Manager

Agenda
* Tuning methodology
* Bulk load API basics
* Design patterns and techniques
  * Parallelism
  * Table layout
* Tuning the SQL Server engine
* Tuning the network stack
* Tuning Integration Services

Tuning ETL and ELT: Tuning Methodology

The Tuning Loop
* Agree on targets for optimization: actual runtime; CPU, memory, and I/O
* Get a baseline
* Generate a hypothesis
* Make a small change at a time
* Measure
* Save the result, then repeat from the hypothesis step
The greedy tuner: "Tune it till it breaks, then fix it, so you can break it again."

Tools of the Trade – Windows
* Perfmon: Logical Disk, Memory, Processor, Process (specifically the DTEXEC process), Network Interface
* Task Manager
* WinDbg
* KernRate

Tools of the Trade – SQL Server
* sys.dm_os_wait_stats: all my tuning starts here; get familiar with the common wait types
* sys.dm_os_latch_stats: allows a deep dive into LATCH_<X> waits
* sys.dm_os_spinlock_stats: for when too much CPU seems to be spent
* sys.dm_io_virtual_file_stats: because I/O systems are rarely perfect

Bulk Load API Basics

Four Ways to Load Data into SQL Server
* Integration Services: OLEDB Destination, SQL Server Destination
* BULK INSERT: CSV or fixed-width files
* BCP: like BULK INSERT, but can be run remotely
* INSERT ... SELECT

Minimally Logged and Bulk
* Bulk load: feeds a continuous stream of data into a table, as opposed to running singleton INSERT statements
* Minimally logged: only allocations are logged, not individual rows/pages
* Key takeaway: an operation can be a bulk load operation without being minimally logged

To TABLOCK or Not to TABLOCK
General rule (batch style):
* Heaps: use TABLOCK
* Clustered indexes: do NOT use TABLOCK
Minimally logged:
* INSERT Heap WITH (TABLOCK) SELECT ...
* If TF610 (trace flag 610) is on: INSERT ClusterIndex SELECT ...
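The TABLOCK rules and the tuning loop can be combined in a small test script. The sketch below is a minimal illustration, not the author's own harness: the object names dbo.SourceRows, dbo.Staging_Heap, and dbo.Sales_CI are hypothetical, the tables are assumed to have matching column lists, and the database is assumed to use the SIMPLE or BULK_LOGGED recovery model, which minimal logging requires. It snapshots sys.dm_os_wait_stats, runs one heap load with TABLOCK and one clustered index load under trace flag 610, and reports the waits that grew during the run.

```sql
-- Minimal sketch of one pass through the tuning loop for a bulk load.
-- Hypothetical objects: dbo.SourceRows (source), dbo.Staging_Heap (heap target),
-- dbo.Sales_CI (clustered index target), all with matching column lists.
-- Assumption: recovery model is SIMPLE or BULK_LOGGED (needed for minimal logging).
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();

-- 1. Baseline: snapshot the cumulative, server-wide wait statistics.
IF OBJECT_ID('tempdb..#waits_before') IS NOT NULL
    DROP TABLE #waits_before;

SELECT wait_type, waiting_tasks_count, wait_time_ms
INTO #waits_before
FROM sys.dm_os_wait_stats;

-- 2a. Heap target: use TABLOCK so INSERT ... SELECT can be minimally logged.
INSERT INTO dbo.Staging_Heap WITH (TABLOCK)
SELECT *
FROM dbo.SourceRows;

-- 2b. Clustered index target: no TABLOCK; trace flag 610 (SQL Server 2008)
--     allows minimal logging for rows written to newly allocated pages.
DBCC TRACEON (610, -1);

INSERT INTO dbo.Sales_CI
SELECT *
FROM dbo.SourceRows;

DBCC TRACEOFF (610, -1);

-- 3. Measure: the waits that grew the most while the loads ran.
SELECT TOP (10)
       w.wait_type,
       w.waiting_tasks_count - b.waiting_tasks_count AS waits_delta,
       w.wait_time_ms - b.wait_time_ms               AS wait_ms_delta
FROM sys.dm_os_wait_stats AS w
JOIN #waits_before AS b
  ON b.wait_type = w.wait_type
WHERE w.wait_time_ms > b.wait_time_ms
ORDER BY wait_ms_delta DESC;
```

Because sys.dm_os_wait_stats is server-wide, the deltas are only meaningful on an otherwise idle instance. DBCC TRACEON with -1 needs sysadmin rights and affects the whole instance while it is on, which is why the sketch turns the flag off again after the load.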
The same rules apply to the OLEDB and SQL Server Destinations in SSIS.

Design Patterns

Integration Services or T-SQL
* Sometimes a matter of preference
* Integration Services is graphical: some users like this, but it is hard to make modular
* SQL Server uses T-SQL, a "text language": modular programming
* The right tool for the right job: learn both

SQL Server – Which Load Method?
BULK INSERT / BCP
* Pro: can take a BU (bulk update) lock
* Pro: no need for linked servers
* Con: only CSV and fixed-width files as input
INSERT ... SELECT
* Pro: can perform transformations
* Pro: any OLEDB-enabled input, or OPENROWSET
* Con: takes X-locks on the table
* Con: linked servers or OPENROWSET needed

Integration Services – Which Destination?
OLEDB Destination
* Pro: can be used over TCP/IP
* Pro: ETL servers can be scaled out remotely
* Con: typically slower than the SQL Server Destination
SQL Server Destination
* Pro: fastest option
* Pro: easy to configure
* Con: must run on the same box as SQL Server (shared memory connections)

Design Pattern: Parallel Load
* Create a (priority) queue for your packages; a SQL table is good for this purpose
* Packages / T-SQL include a loop: each iteration takes one item from the queue, until the queue is empty
[Diagram: a priority queue feeding several DTEXEC processes: DTEXEC (1), DTEXEC (2), ...]

Design Pattern: Table Hash Partitioning
* Create filegroups to hold the partitions; balance them equally over the LUNs using an optimal layout
* Use the CREATE PARTITION FUNCTION command; partition the tables into #cores partitions
* Use the CREATE PARTITION SCHEME command; bind the partition function to the filegroups
* Add a hash column to the table (tinyint, just one byte per row)
* Calculate a good hash distribution, for example using HASHBYTES with modulo, or BINARY_CHECKSUM
[Diagram: hash values 0, 1, 2, ..., 253, 254, 255 distributed across the partitions]

Design Pattern: Large Updates
[Diagram: the Sales table, partitioned by year (2001 to 2004). BULK INSERT loads the changes into Sales_Delta; "Update Records" combines them with the affected partition into Sales_New; SWITCH moves the old partition out to Sales_Old and Sales_New in, giving the updated Sales table.]

Design Pattern: Large Deletes
[Diagram: the Sales table, partitioned by year (2001 to 2004). The 2001 partition is switched out to Sales_Temp (2001); the rows to keep are bulk-inserted into Sales_Temp (2001 Filtered), which is switched back in as Sales 2001 (Filtered).]
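The table hash partitioning steps described earlier can be sketched in T-SQL. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes an 8-core server (so 8 partitions), filegroups FG0 to FG7 that already exist and are spread over the LUNs, and hypothetical tables dbo.Sales_Staging (source) and dbo.Sales_Hashed (target) with hypothetical columns. It uses HASHBYTES with modulo, one of the two hash options the slide names; BINARY_CHECKSUM could be swapped in the same way.

```sql
-- Sketch of table hash partitioning. All names and counts are hypothetical.
-- Assumes 8 cores (so 8 partitions) and that filegroups FG0..FG7 already exist,
-- each with a data file on its own LUN, created along these lines:
--   ALTER DATABASE SalesDW ADD FILEGROUP FG0;
--   ALTER DATABASE SalesDW ADD FILE
--     (NAME = N'SalesDW_FG0', FILENAME = N'E:\Data\SalesDW_FG0.ndf')
--     TO FILEGROUP FG0;

-- Partition function over a tinyint hash: boundaries 0..6 with RANGE LEFT
-- give 8 partitions; partition 1 holds hash 0, ..., partition 8 holds hash 7.
CREATE PARTITION FUNCTION pf_hash8 (tinyint)
AS RANGE LEFT FOR VALUES (0, 1, 2, 3, 4, 5, 6);
GO

-- Partition scheme: bind the function to one filegroup per partition.
CREATE PARTITION SCHEME ps_hash8
AS PARTITION pf_hash8
TO (FG0, FG1, FG2, FG3, FG4, FG5, FG6, FG7);
GO

-- Target table with a one-byte hash column as the partitioning key.
CREATE TABLE dbo.Sales_Hashed
(
    order_id  bigint       NOT NULL,
    customer  nvarchar(50) NOT NULL,
    amount    money        NOT NULL,
    hash_key  tinyint      NOT NULL
)
ON ps_hash8 (hash_key);
GO

-- Load: hash the key with HASHBYTES, take the first byte (0..255),
-- and reduce it modulo the partition count (8) to get 0..7.
INSERT INTO dbo.Sales_Hashed WITH (TABLOCK)
       (order_id, customer, amount, hash_key)
SELECT order_id,
       customer,
       amount,
       CAST(SUBSTRING(HASHBYTES('MD5', CAST(order_id AS varbinary(8))), 1, 1)
            AS tinyint) % 8
FROM dbo.Sales_Staging;

-- Check the distribution: row counts should be roughly equal per partition.
SELECT $PARTITION.pf_hash8(hash_key) AS partition_number,
       COUNT(*)                      AS row_count
FROM dbo.Sales_Hashed
GROUP BY $PARTITION.pf_hash8(hash_key)
ORDER BY partition_number;
```

Taking a single byte of the hash keeps the column at one byte and matches the 0 to 255 hash range in the diagram. Because 256 is a multiple of 8, each partition receives exactly 32 of the 256 byte values; with a core count that does not divide 256, the modulo step introduces a small skew.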
Tuning the SQL Server Engine

ALLOC_FREESPACE_CACHE – Heap Limits
* Measure: sys.dm_os_latch_stats shows long waits for ALLOC_FREESPACE_CACHE
* SQL Server® Books Online: "Used to synchronize the access to a cache of pages with available space for heaps and binary large objects (BLOBs). Contention on latches of this class can occur when multiple connections try to insert rows into a heap or BLOB at the same time. You can reduce this contention by partitioning the object."
* Hypothesis: more heaps = more speed
[Chart: throughput (MB/sec) vs. number of concurrent bulk loads]

PAGELATCH_UP – PFS Contention
* Measure: sys.dm_os_wait_stats
* Hypothesis generation: an I/O problem? What can we predict?
* Fix: add more files to the filegroup!

RESOURCE_SEMAPHORE – Query Memory Usage
* DW load queries are often very memory intensive
* By default, a single query can use at most 25% of SQL Server's allocated memory
* Queries waiting for a memory grant wait on RESOURCE_SEMAPHORE
* Resource Governor (RG) can be used to work around it

SOS_SCHEDULER_YIELD
* Hypothesis: caused by two bulk commands on the same scheduler
* Predict: we should see multiple bulk commands on the same scheduler
* Observe: and we do (scheduler_id in sys.dm_exec_requests)

Fixing SOS_SCHEDULER_YIELD
How can we fix this?
Two ways:
* Terminate and reconnect
* Soft NUMA
[Diagram: each group of x CPU cores (Core 0 ... Core X) forms a soft-NUMA node (Node 0 ... Node X); node X listens on TCP port 1433 + X, and each node receives its own BULK INSERT stream.]

I/O Related Waits for BULK INSERT
* BULK INSERT uses a double-buffering scheme
* It is important to feed it fast enough
* The target SQL Server must also be able to absorb the writes
[Diagram: input (CSV file: IMPROVIO_WAIT; OLEDB source: ASYNC_NETWORK_IO) → parse into two 64KB buffers → table (PAGEIOLATCH_EX)]

CXPACKET – When It Matters
* Statements of type INSERT…SELECT
* Measure: throughput sometimes drops with higher DOP
* Hypothesis: backpressure in query execution
[Chart: throughput (MB/sec) vs. DOP, from 1 to 41]

Drinking From a Fire Hose
[Chart: CXPACKET waits vs. throughput (MB/sec)]
* Solution: OPTION (MAXDOP X)

SQL Server Waits – Summary
* PAGELATCH_UP. Cause: contention on PFS pages. Resolution: add more data files to the filegroup.
* ALLOC_FREESPACE_CACHE. Cause: heap allocation bottleneck. Resolution: partition the target table and use SWITCH.
* SOS_SCHEDULER_YIELD. Cause: network speed not keeping up. Resolution: optimize network settings in Windows (Jumbo Frames); increase packet size.
* RESOURCE_SEMAPHORE. Cause: too much memory used by the query. Resolution: optimize the query to use less memory, or use Resource Governor to limit the maximum allocation.
* LCK_X. Cause: locks prevent parallelism. Resolution: use the correct lock hints.
* WRITELOG. Cause: transaction log contention. Resolution: use TF610; seek minimally logged operations.
* PAGEIOLATCH_<X>. Cause: I/O system not keeping up. Resolution: tune I/O.
* IMPROVIO_WAIT. Cause: input file I/O too slow. Resolution: improve input file latency and/or throughput.
* CXPACKET. Cause: normally harmless.
  But it may indicate too much coordination. Resolution: use the MAXDOP hint, but carefully.
* OLEDB / ASYNC_NETWORK_IO. Cause: not feeding the bulk load fast enough. Resolution: optimize the source.

Tuning the Network Stack

How to Affinitize NICs
* Using the Interrupt-Affinity Policy Tool, you can affinitize individual NICs to CPU cores
* Affinitize each NIC to its own core: one NIC per hard NUMA node
* Your mileage may vary; it depends on the box
* Match soft-NUMA TCP/IP connections with NICs: the NIC on a hardware NUMA node maps to the SQL bulk stream target on the same node

Tune Network Parameters
* Jumbo Frames = 9014 bytes, enabled
* Adaptive Inter-Frame Spacing disabled
* Flow control = Tx & Rx enabled
* Interrupt Moderation = Medium (client & server)
* Coalesce buffers = 256
* Server Rx buffers = 512, server Tx buffers = 512
* Client Rx buffers = 512, client Tx buffers = 256
* Link speed 1000 Mbps, full duplex

Network Packet Size
* Measure: Perfmon shows a huge discrepancy between the number of reads and writes
* Hypothesis: this is caused by the small network packet size (default 4096 bytes), which forces the stream to be broken into smaller pieces
* Test and prove: adjusting the network packet size to 32K increases throughput by 15%

Tuning Integration Services

Integration Services vs. SQL Lab Test
Setup:
* Transform fact data with surrogate key lookups
* 5 dimension tables, 100K rows each
* Partitioned fact table, total of 320M rows
* Test the speed of hash joins
Test 2: Raw join (time in seconds, throughput in Krows/s)
* SSIS 2008: 144 s, 2222 Krows/s
* SQL MAXDOP = 0: 158 s, 2025 Krows/s
* SQL MAXDOP = 1 x 32: 162 s, 1975 Krows/s
Test 3: Join and write (time in seconds, throughput in Krows/s)
* SQL MAXDOP = 1 x 32: 246 s, 1301 Krows/s
* SSIS 2008: 278 s, 1151 Krows/s
* SQL MAXDOP = 0: 1927 s, 166 Krows/s
The Integration Services lookup join is comparable in speed with T-SQL!

Baseline of Package
Sanity check:
* How much memory does each package use?
* How much CPU does each package stream use?
* Need enough CPU and memory to run them all
Performance counters:
* Process – Private Bytes / Working Set (DTEXEC)
* Processor – % Processor Time
* Network Interface – Current Bandwidth
* Network Interface – Bytes Total/sec

Scaling the Package – Method
* Using the parallel load technique described earlier, you can run multiple copies of the package
* Using the baseline of the package, you can now calculate how many scale-out servers you will need

Data Loading – Fast Enough?
* Bulk load scales near-linearly with the number of bulk streams
* Measured so far up to 96 cores
* Possible to reach 100% CPU load on all cores
* "Just" get rid of all the bottlenecks

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.