Designing and Tuning High Speed Data Loading
Thomas Kejser
Senior Program Manager
Agenda
- Tuning Methodology
- Bulk Load API Basics
- Design Patterns and Techniques
  - Parallelism
  - Table Layout
- Tuning the SQL Server Engine
- Tuning the Network Stack
- Tuning Integration Services
Tuning ETL and ELT
Tuning Methodology
The Tuning Loop
- Get a baseline
- Make one small change at a time
- Agree on targets for optimization
  - Actual runtime
  - CPU, memory, I/O
- The greedy tuner:
  - “Tune it till it breaks, then fix it, so you can break it again”
[Diagram: the tuning loop, cycling through Generate Hypothesis, Change, Measure and Save Result]
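The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: it assumes each candidate change can be applied in isolation, and `fake_measure` is a hypothetical stand-in for actually running the load, with invented timings. The configuration keys are also hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]


def tuning_loop(
    baseline: Config,
    changes: List[Tuple[str, object]],
    measure: Callable[[Config], float],
) -> Tuple[Config, float, List[str]]:
    """Try one change at a time and keep it only if the measured runtime improves."""
    best = dict(baseline)
    best_time = measure(best)  # get a baseline first
    log = [f"baseline: {best_time:.1f}s"]
    for key, value in changes:
        trial = dict(best)
        trial[key] = value  # exactly one small change per iteration
        elapsed = measure(trial)
        if elapsed < best_time:  # save the result only if it helped
            best, best_time = trial, elapsed
            log.append(f"kept {key}={value}: {elapsed:.1f}s")
        else:
            log.append(f"reverted {key}={value}: {elapsed:.1f}s")
    return best, best_time, log


def fake_measure(config: Config) -> float:
    """Hypothetical stand-in for running the load; the numbers are invented."""
    seconds = 100.0
    if config.get("tablock"):
        seconds -= 20.0
    if config.get("packet_size") == 32767:
        seconds -= 10.0
    if config.get("maxdop") == 64:
        seconds += 5.0
    return seconds


best, best_time, log = tuning_loop(
    {}, [("tablock", True), ("packet_size", 32767), ("maxdop", 64)], fake_measure
)
print("\n".join(log))
```

Reverting a change that did not help keeps every later measurement comparable to the best known configuration, which is the point of changing only one thing at a time.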
Tools of the Trade - Windows
- Perfmon
  - Logical Disk
  - Memory
  - Processor
  - Process (specifically the DTEXEC process)
  - Network Interface
- Task Manager
- WinDbg
- KernRate
Tools of the Trade – SQL Server
- sys.dm_os_wait_stats
  - All my tuning starts here
  - Get familiar with common wait types
- sys.dm_os_latch_stats
  - Allows deep dive into LATCH_<X> waits
- sys.dm_os_spinlock_stats
  - When too much CPU seems to be spent
- sys.dm_io_virtual_file_stats
  - Because I/O systems are rarely perfect
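Because sys.dm_os_wait_stats accumulates totals since the instance started (unless the statistics are cleared), one common way to use it during a load test is to take a snapshot before and after the run and rank the difference. The sketch below assumes both snapshots have already been fetched into Python dictionaries mapping wait_type to wait_time_ms; fetching them is left out, and the snapshot numbers are invented.

```python
from typing import Dict, List, Tuple


def top_waits(
    before: Dict[str, int], after: Dict[str, int], n: int = 5
) -> List[Tuple[str, int, float]]:
    """Return (wait_type, delta_ms, percent_of_total_delta) for the n largest increases."""
    deltas = {wait: after[wait] - before.get(wait, 0) for wait in after}
    # Ignore wait types that did not grow during the test window.
    deltas = {wait: delta for wait, delta in deltas.items() if delta > 0}
    total = sum(deltas.values())
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:n]
    return [(wait, delta, 100.0 * delta / total) for wait, delta in ranked]


# Invented snapshots: wait_type -> cumulative wait_time_ms.
before = {"PAGELATCH_UP": 1_000, "WRITELOG": 500, "CXPACKET": 200}
after = {"PAGELATCH_UP": 61_000, "WRITELOG": 25_500, "CXPACKET": 200, "IMPROVIO_WAIT": 15_000}

for wait_type, delta_ms, pct in top_waits(before, after):
    print(f"{wait_type:<20} {delta_ms:>10} ms {pct:5.1f}%")
```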
Bulk Load API Basics
Four Ways to Load Data to SQL Server
- Integration Services
  - OLEDB Destination
  - SQL Server Destination
- BULK INSERT
  - CSV or fixed width files
- BCP
  - Like BULK INSERT, but can be run remotely
- INSERT ... SELECT
Minimally Logged and Bulk
- Bulk load
  - Feeds a continuous stream of data into a table
  - As opposed to running singleton INSERT statements
- Minimally logged
  - Only allocations are logged, not individual rows/pages
- Key takeaway: An operation can be a bulk load operation without being minimally logged
To TABLOCK or Not to TABLOCK
- General rule (batch style):
  - Heaps: Use TABLOCK on heaps
  - Clustered indexes: Do NOT use TABLOCK
- Minimally logged:
  - INSERT Heap WITH (TABLOCK) SELECT ...
  - If TF610 is on: INSERT ClusterIndex SELECT ...
- The same rules apply to the OLEDB and SQL Server Destinations in SSIS
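The rules on this slide can be encoded as a small check, for example to sanity-check a load plan before running it. This sketch mirrors only what the slide states and deliberately ignores cases it does not cover (such as loading into an empty clustered index). It additionally assumes the SIMPLE or BULK_LOGGED recovery model, since minimal logging does not happen under FULL recovery. `LoadPlan` and both functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LoadPlan:
    target: str          # hypothetical labels: "heap" or "clustered"
    tablock: bool        # WITH (TABLOCK) hint, or the table lock option in SSIS
    tf610: bool          # trace flag 610 enabled
    recovery_model: str  # "SIMPLE", "BULK_LOGGED" or "FULL"


def is_minimally_logged(plan: LoadPlan) -> bool:
    """Mirror the slide's rules for INSERT ... SELECT and the SSIS destinations."""
    if plan.recovery_model.upper() == "FULL":
        return False  # minimal logging requires SIMPLE or BULK_LOGGED recovery
    if plan.target == "heap":
        return plan.tablock  # INSERT Heap WITH (TABLOCK) SELECT ...
    if plan.target == "clustered":
        return plan.tf610  # INSERT ClusterIndex SELECT ... with TF610 on
    raise ValueError(f"unknown target type: {plan.target}")


def recommended_tablock(target: str) -> bool:
    """General (batch style) rule: TABLOCK on heaps, not on clustered indexes."""
    return target == "heap"


print(is_minimally_logged(LoadPlan("heap", True, False, "BULK_LOGGED")))
```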
Design Patterns
Integration Services or T-SQL
- Sometimes: a matter of preference
- Integration Services is graphical
  - Some users like this
  - Hard to make modular
- SQL Server uses T-SQL, a "text language"
  - Modular programming
- The right tool for the right job
  - Learn both…
SQL Server – Which Load Method?
BULK INSERT / BCP
- Pros
  - Can take BU-lock
  - No need for Linked Servers or OPENROWSET
- Cons
  - Only CSV and fixed width files for input
INSERT ... SELECT
- Pros
  - Can perform transformations
  - Any OLEDB enabled input
- Cons
  - Takes X-locks on table
  - Linked Servers or OPENROWSET needed
Integration Services – Which Destination?
OLEDB Destination
- Pros:
  - Can be used over TCP/IP
  - ETL servers can be scaled out remotely
- Con:
  - Typically slower than SQL Server Destination
SQL Server Destination
- Pros:
  - Fastest option
  - Easy to configure
- Con:
  - Must run on same box as SQL Server (shared memory connections)
Design Pattern: Parallel Load
- Create a (priority) queue for your packages
  - A SQL table is good for this purpose
- Packages / T-SQL include a loop:
  - Loop takes one item from the queue
  - Until the queue is empty…
[Diagram: a priority queue feeding several package instances, DTEXEC (1), DTEXEC (2), …]
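The parallel load pattern can be sketched with Python's standard library. Here a `queue.PriorityQueue` stands in for the SQL table of pending work, and each thread stands in for one DTEXEC instance running the loop. `load_partition` is a hypothetical placeholder for executing a package or T-SQL batch, and the work item names are invented.

```python
import queue
import threading
from typing import List, Tuple

# (priority, work item): lower numbers are dequeued first.
work = queue.PriorityQueue()
for priority, item in [(1, "Sales_2004"), (1, "Sales_2003"), (2, "Sales_2002"), (3, "Sales_2001")]:
    work.put((priority, item))

done: List[Tuple[str, str]] = []
done_lock = threading.Lock()


def load_partition(item: str) -> None:
    """Hypothetical placeholder for running one package or BULK INSERT."""


def worker(name: str) -> None:
    # The loop from the slide: take one item from the queue until it is empty.
    while True:
        try:
            _, item = work.get_nowait()
        except queue.Empty:
            return
        load_partition(item)
        with done_lock:
            done.append((name, item))
        work.task_done()


threads = [threading.Thread(target=worker, args=(f"DTEXEC ({i})",)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(done)
```

Because each worker pulls its next item only when it finishes the previous one, fast and slow items balance out across workers without any up-front assignment.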
Design Pattern: Table Hash Partitioning
- Create filegroups to hold the partitions
  - Equally balance over LUNs using optimal layout
- Use CREATE PARTITION FUNCTION command
  - Partition the tables into #cores partitions
- Use CREATE PARTITION SCHEME command
  - Bind partition function to filegroups
- Add hash column to table (tinyint, just one byte per row)
  - Calculate a good hash distribution
  - For example, use hashbytes with modulo or binary_checksum
[Diagram: hash values 0 to 255 mapped onto the partitions]
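Computing the tinyint hash column can be illustrated in Python. In this sketch `hashlib.md5` plays the role of T-SQL's HASHBYTES, and the digest is reduced modulo the partition count. The use of MD5, the first four digest bytes, 32 partitions and the synthetic keys are all assumptions for illustration; the bucket values will not match any particular T-SQL expression.

```python
import hashlib
from collections import Counter


def hash_bucket(key: str, partitions: int) -> int:
    """Map a business key to a bucket in [0, partitions), like a tinyint hash column."""
    if not 1 <= partitions <= 256:
        raise ValueError("a tinyint column holds 0-255, so use 1..256 partitions")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 4 digest bytes as an unsigned integer, then take the modulo.
    return int.from_bytes(digest[:4], "big") % partitions


# Check how evenly 100,000 synthetic keys spread over 32 partitions (e.g. one per core).
PARTITIONS = 32
counts = Counter(hash_bucket(f"order-{i}", PARTITIONS) for i in range(100_000))
print(min(counts.values()), max(counts.values()))
```

Checking the min and max bucket sizes on a sample of real keys is a cheap way to confirm that the chosen hash gives "a good hash distribution" before partitioning production tables.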
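Design Pattern: Large Updates
[Diagram: the Sales table, partitioned by year (2001, 2002, 2003, 2004). Update records are loaded with BULK INSERT into Sales_Delta; an updated copy of the affected partition is built as Sales_New and exchanged with SWITCH, moving the original partition out to Sales_Old, which yields the updated Sales table.]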
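The idea behind the update pattern (rewrite one partition's worth of data into a new table and switch it in, instead of updating rows in place) can be modeled conceptually in Python. Dictionaries stand in for partitions, `build_new_partition` is a hypothetical helper, and reassigning a dictionary entry stands in for ALTER TABLE ... SWITCH, which in SQL Server is a metadata-only operation. The data is invented.

```python
from typing import Dict

Row = Dict[str, object]
Partition = Dict[int, Row]  # order_id -> row


def build_new_partition(old: Partition, delta: Partition) -> Partition:
    """Hypothetical helper: Sales_New = old partition with the delta rows applied.

    In SQL Server this step could be an INSERT ... SELECT that joins the old
    partition to Sales_Delta, rather than an in-place UPDATE.
    """
    new = {order_id: dict(row) for order_id, row in old.items()}  # copy, never mutate old
    for order_id, changes in delta.items():
        new.setdefault(order_id, {}).update(changes)  # rows not yet present are added
    return new


sales: Dict[int, Partition] = {
    2003: {1: {"amount": 10}, 2: {"amount": 20}},
    2004: {3: {"amount": 30}, 4: {"amount": 40}},
}
sales_delta: Partition = {4: {"amount": 45}}  # update records loaded with BULK INSERT

sales_new = build_new_partition(sales[2004], sales_delta)
sales_old = sales[2004]  # SWITCH the current partition out to Sales_Old
sales[2004] = sales_new  # SWITCH Sales_New in
print(sales[2004], sales_old)
```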
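Design Pattern: Large Deletes
[Diagram: the Sales table partitioned by year (2001, 2002, 2003, 2004). The 2001 partition is switched out to Sales_Temp (2001); the rows to keep are loaded with BULK INSERT into Sales_Temp (2001 Filtered), which is switched back into Sales as the filtered 2001 partition.]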
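The delete pattern can be modeled the same way: instead of running a large DELETE, the surviving rows of the affected partition are written to a filtered copy, and that copy is switched in. This is again a conceptual Python model with hypothetical names and invented rows; the `keep` predicate stands for whatever filter defines the rows that survive.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def filtered_copy(partition: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Build Sales_Temp (2001 Filtered): only the rows that survive the delete."""
    return [row for row in partition if keep(row)]


sales: Dict[int, List[Row]] = {
    2001: [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}, {"id": 3, "region": "EU"}],
    2002: [{"id": 4, "region": "US"}],
}

# Goal (hypothetical): delete all 2001 rows for region "EU".
sales_temp_2001 = sales.pop(2001)  # SWITCH the 2001 partition out to Sales_Temp (2001)
sales_temp_2001_filtered = filtered_copy(sales_temp_2001, lambda row: row["region"] != "EU")
sales[2001] = sales_temp_2001_filtered  # SWITCH the filtered copy back in
print(sales[2001])
```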
Tuning the SQL Server Engine
ALLOC_FREESPACE_CACHE - Heap Limits
- Measure:
  - sys.dm_os_latch_stats
  - Long waits for ALLOC_FREESPACE_CACHE
- SQL Server® Books Online:
  - “Used to synchronize the access to a cache of pages with available space for heaps and binary large objects (BLOBs). Contention on latches of this class can occur when multiple connections try to insert rows into a heap or BLOB at the same time. You can reduce this contention by partitioning the object.”
- Hypothesis: More heaps = more speed
[Chart: throughput (MB/sec, 0 to 250) vs. number of concurrent bulk loads (0 to 30)]
PAGELATCH_UP – PFS Contention
- Measure:
  - sys.dm_os_wait_stats
- Hypothesis generation
  - I/O problem?
  - What can we predict?
- Fix: Add more files to the filegroup!
RESOURCE_SEMAPHORE - Query Memory Usage
- DW load queries will often be very memory intensive
- By default, a single query can use at most 25% of SQL Server's allocated memory
- Queries waiting to get a memory grant will wait for:
  - RESOURCE_SEMAPHORE
- Can use Resource Governor (RG) to work around it
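A quick worked example of what the 25% default means. The percentage corresponds to the Resource Governor workload group setting REQUEST_MAX_MEMORY_GRANT_PERCENT, which is presumably what the RG workaround adjusts. The 64 GB figure is hypothetical, and the concurrency number is an optimistic upper bound, because in practice not all of SQL Server's memory is available for query grants.

```python
def max_grant_mb(allocated_mb: float, request_max_memory_grant_percent: float = 25.0) -> float:
    """Upper bound on one query's memory grant (25% of allocated memory by default)."""
    return allocated_mb * request_max_memory_grant_percent / 100.0


def concurrent_max_grants(request_max_memory_grant_percent: float = 25.0) -> int:
    """Optimistic upper bound on how many maximum-size grants can be held at once."""
    return int(100.0 // request_max_memory_grant_percent)


allocated = 64 * 1024  # hypothetical: 64 GB allocated to SQL Server, in MB
print(max_grant_mb(allocated), concurrent_max_grants())  # 16384.0 4
```

So at most four load queries could each hold a maximum-size grant at the same time; beyond that, further memory-hungry queries queue on RESOURCE_SEMAPHORE. Lowering the percentage trades smaller grants for more concurrency.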
SOS_SCHEDULER_YIELD
- Hypothesis: Caused by two bulk commands on the same scheduler
- Predict:
  - We should see multiple bulk commands on the same scheduler
- Observe: And we do…
  - scheduler_id in sys.dm_exec_requests
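The prediction step can be checked mechanically. Assuming rows of sys.dm_exec_requests have already been fetched as `(session_id, scheduler_id, command)` tuples, the sketch below groups bulk commands by scheduler and reports schedulers hosting more than one. The sample rows are invented, and matching on the command text `BULK INSERT` is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Request = Tuple[int, int, str]  # (session_id, scheduler_id, command)


def shared_schedulers(
    requests: Iterable[Request], command: str = "BULK INSERT"
) -> Dict[int, List[int]]:
    """Return {scheduler_id: [session_ids]} for schedulers running >1 matching command."""
    by_scheduler: Dict[int, List[int]] = defaultdict(list)
    for session_id, scheduler_id, cmd in requests:
        if cmd.strip().upper() == command:
            by_scheduler[scheduler_id].append(session_id)
    return {sid: sessions for sid, sessions in by_scheduler.items() if len(sessions) > 1}


sample = [(51, 0, "BULK INSERT"), (52, 1, "BULK INSERT"), (53, 0, "BULK INSERT"), (54, 2, "SELECT")]
print(shared_schedulers(sample))  # {0: [51, 53]}
```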
Fixing SOS_SCHEDULER_YIELD
- How can we fix this?
- Two ways:
  - Terminate and reconnect
  - Soft NUMA
[Diagram: x CPU cores; Core 0 maps to Soft-NUMA Node 0 listening on TCP port 1433, and in general Core X maps to Soft-NUMA Node X on TCP port 1433 + X, with one BULK INSERT stream per node]
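With the Soft-NUMA layout shown, each bulk stream should connect to a different port so that it lands on its own node. The sketch below computes that assignment round-robin. The server name is hypothetical; the `server,port` form is the usual way to name a port in a SQL Server connection target. Configuring the Soft-NUMA nodes and the port-to-node mapping in SQL Server itself is not shown.

```python
from typing import List


def stream_endpoints(server: str, cores: int, streams: int, base_port: int = 1433) -> List[str]:
    """Assign stream i to port base_port + (i mod cores), i.e. round-robin over nodes."""
    if cores < 1:
        raise ValueError("need at least one core / Soft-NUMA node")
    return [f"{server},{base_port + (i % cores)}" for i in range(streams)]


# Hypothetical example: 4 Soft-NUMA nodes, 6 bulk streams.
for endpoint in stream_endpoints("sqlload01", cores=4, streams=6):
    print(endpoint)
```

With more streams than nodes, some nodes receive two streams, which brings back the shared-scheduler situation, so the stream count is best kept at or below the node count.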
I/O Related Waits for BULK INSERT
- BULK INSERT uses a double buffering scheme
  - Important to feed it fast enough
- Also, the target SQL Server must be able to absorb the writes
[Diagram: input from a CSV file (wait: IMPROVIO_WAIT) or an OLEDB source (waits: OLEDB, ASYNC_NETWORK_IO) is parsed into two 64KB buffers and written to the table (wait: PAGEIOLATCH_EX)]
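Double buffering in general means one buffer is filled while another is drained. The Python sketch below imitates this with a bounded queue holding two 64KB buffers between a reader thread and a writer thread. It illustrates the general technique, not BULK INSERT's internals, and the in-memory byte stream stands in for a CSV file. A slow writer makes the reader block on `put` (the target not absorbing writes); a slow reader makes the writer block on `get` (the source not feeding it fast enough).

```python
import io
import queue
import threading

BUFFER_SIZE = 64 * 1024  # 64KB buffers, as on the slide
buffers = queue.Queue(maxsize=2)  # at most two filled buffers waiting


def reader(source: io.BufferedIOBase) -> None:
    """Fill buffers from the input; blocks while both queue slots are full."""
    while True:
        chunk = source.read(BUFFER_SIZE)
        buffers.put(chunk)  # an empty chunk signals end of input
        if not chunk:
            return


def writer(sink: bytearray) -> None:
    """Drain buffers into the target; blocks while no filled buffer is ready."""
    while True:
        chunk = buffers.get()
        if not chunk:
            return
        sink.extend(chunk)


data = b"1,foo\n2,bar\n" * 50_000  # about 600KB of synthetic CSV
target = bytearray()
threads = [
    threading.Thread(target=reader, args=(io.BytesIO(data),)),
    threading.Thread(target=writer, args=(target,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(target) == len(data))
```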
CXPACKET – When It Matters
- Statements of type INSERT…SELECT
- Measure: Sometimes throughput drops with higher DOP
- Hypothesis: Backpressure in query execution
[Chart: Throughput / DOP, throughput (MB/sec, 0 to 50) vs. DOP (1 to 41)]
Drinking From a Fire Hose
[Chart: CXPACKET waits (0 to 200,000,000) and throughput (MB/sec, 0 to 40)]
- Solution: OPTION (MAXDOP X)
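Since the fix caps DOP rather than removing parallelism, the practical question is which value of X to use. One simple approach, sketched below under the assumption that throughput has been measured at several DOP values for your own load, is to pick the lowest DOP whose throughput is within a tolerance of the best observed. The tolerance and the measurements are invented for illustration.

```python
from typing import Dict


def pick_maxdop(throughput_by_dop: Dict[int, float], tolerance: float = 0.05) -> int:
    """Lowest DOP whose measured throughput is within `tolerance` of the best one."""
    if not throughput_by_dop:
        raise ValueError("no measurements")
    best = max(throughput_by_dop.values())
    return min(
        dop for dop, mb_per_sec in throughput_by_dop.items() if mb_per_sec >= best * (1 - tolerance)
    )


measured = {1: 12.0, 4: 31.0, 8: 44.0, 16: 45.0, 32: 30.0}  # invented MB/sec figures
print(f"OPTION (MAXDOP {pick_maxdop(measured)})")  # OPTION (MAXDOP 8)
```

Preferring the lowest near-best DOP leaves cores free for additional parallel bulk streams, which matters when several loads run side by side.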
SQL Server Waits - Summary
Wait Type | Typical Cause | Resolution
PAGELATCH_UP | Contention on PFS pages | Add more data files to filegroup
ALLOC_FREESPACE_CACHE | Heap allocation bottleneck | Partition target table and use SWITCH
SOS_SCHEDULER_YIELD | Network speed not keeping up | Optimize network settings in Windows (Jumbo Frames); increase packet size
RESOURCE_SEMAPHORE | Too much memory used by query | Optimize query for less memory or use Resource Governor to limit max allocation
LCK_X | Locks prevent parallelism | Use correct lock hints
WRITELOG | Transaction log contention | Use TF610, seek minimally logged operations
PAGEIOLATCH_<X> | I/O system not keeping up | Tune I/O
IMPROVIO_WAIT | Input file I/O too slow | Improve input file latency and/or throughput
CXPACKET | Normally harmless, but may indicate too much coordination | Use MAXDOP hint, but carefully
OLEDB/ASYNC_NETWORK_IO | Not feeding bulk load fast enough | Optimize source
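The summary table lends itself to a lookup from wait type to suggested resolution, for example to annotate the top waits from a before/after snapshot. The dictionaries below copy the table's rows. Prefix matching for the PAGEIOLATCH_<X> and LCK_X families is an assumption about how those rows would be applied; actual lock wait names look like LCK_M_X or LCK_M_S.

```python
from typing import Optional

EXACT = {
    "PAGELATCH_UP": "Add more data files to filegroup",
    "ALLOC_FREESPACE_CACHE": "Partition target table and use SWITCH",
    "SOS_SCHEDULER_YIELD": "Optimize network settings in Windows (Jumbo Frames); increase packet size",
    "RESOURCE_SEMAPHORE": "Optimize query for less memory or use Resource Governor to limit max allocation",
    "WRITELOG": "Use TF610, seek minimally logged operations",
    "IMPROVIO_WAIT": "Improve input file latency and/or throughput",
    "CXPACKET": "Use MAXDOP hint, but carefully",
    "OLEDB": "Optimize source",
    "ASYNC_NETWORK_IO": "Optimize source",
}
# Assumed prefix families for the table's PAGEIOLATCH_<X> and LCK_X rows.
PREFIXES = {
    "PAGEIOLATCH_": "Tune I/O",
    "LCK_M_": "Use correct lock hints",
}


def suggest(wait_type: str) -> Optional[str]:
    """Return the summary table's resolution for a wait type, or None if not listed."""
    wait_type = wait_type.upper()
    if wait_type in EXACT:
        return EXACT[wait_type]
    for prefix, advice in PREFIXES.items():
        if wait_type.startswith(prefix):
            return advice
    return None


print(suggest("PAGEIOLATCH_EX"))  # Tune I/O
```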
Tuning the Network Stack
How to Affinitize NICs
- Using the Interrupt-Affinity Policy Tool, you can affinitize individual NICs to CPU cores
- Affinitize each NIC to its own core
  - One NIC per hard NUMA node
  - Your mileage may vary – depends on the box
- Match Soft NUMA TCP/IP connections with NICs
  - The NIC on a hardware NUMA node maps to the SQL bulk stream target on the same node
Tune Network Parameters
- Jumbo Frames = 9014 bytes enabled
- Adaptive Inter-Frame Spacing disabled
- Flow control = Tx & Rx enabled
- Client & server Interrupt Moderation = Medium
- Coalesce buffers = 256
- Set server Rx buffers to 512 and server Tx buffers to 512
- Set client Rx buffers to 512 and client Tx buffers to 256
- Link speed 1000 Mbps Full Duplex
Network Packet Size
- Measure:
  - Perfmon shows a huge discrepancy between the number of reads and writes
- Hypothesis:
  - This is caused by the small network packet size (default 4096), which forces the stream to be broken into smaller pieces
- Test and prove:
  - Adjusting the network packet size to 32K
  - Increases throughput by 15%
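A rough back-of-the-envelope view of why the packet size matters: the number of packets needed for a given volume falls in proportion to the packet size. The sketch ignores TDS and TCP headers and TCP segmentation, so it shows the ratio only. It assumes "32K" means 32767 bytes, the largest value SQL Server accepts for the network packet size setting; the 1 GB volume is an arbitrary example.

```python
import math


def packets_needed(total_bytes: int, packet_size: int) -> int:
    """Number of packets for a payload, ignoring headers and TCP segmentation."""
    return math.ceil(total_bytes / packet_size)


one_gb = 1024 ** 3
default = packets_needed(one_gb, 4096)  # 262,144 packets
large = packets_needed(one_gb, 32767)   # 32,770 packets
print(default, large, round(default / large, 1))  # 262144 32770 8.0
```

Roughly eight times fewer packets means far fewer per-packet round trips and interrupts; the slide's measured gain (15%) is much smaller than that ratio, because packet handling is only one part of the total load cost.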
Tuning Integration Services
Integration Services vs. SQL
- Lab test setup
  - Transform fact data with surrogate key lookups
  - 5 dimension tables, 100K rows each
  - Partitioned fact table, total of 320M rows
  - Test speed of hash joins

Test 2: Raw Join | Time (s) | Krows/s
SSIS 2008 | 144 | 2222
SQL MAXDOP = 0 | 158 | 2025
SQL MAXDOP = 1 x 32 | 162 | 1975

Test 3: Join and write | Time (s) | Krows/s
SQL MAXDOP = 1 x 32 | 246 | 1301
SSIS 2008 | 278 | 1151
SQL MAXDOP = 0 | 1927 | 166

Integration Services lookup join is comparable in speed with T-SQL!
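The two result columns are consistent with each other if every run processes the full 320M-row fact table: Krows/s is then 320,000 divided by the elapsed seconds. The short check below recomputes the throughput column from the times reported on the slide.

```python
FACT_ROWS = 320_000_000  # partitioned fact table size from the lab setup

# Elapsed seconds as reported in the two result tables.
timings = {
    ("Test 2: Raw Join", "SSIS 2008"): 144,
    ("Test 2: Raw Join", "SQL MAXDOP = 0"): 158,
    ("Test 2: Raw Join", "SQL MAXDOP = 1 x 32"): 162,
    ("Test 3: Join and write", "SQL MAXDOP = 1 x 32"): 246,
    ("Test 3: Join and write", "SSIS 2008"): 278,
    ("Test 3: Join and write", "SQL MAXDOP = 0"): 1927,
}

# Krows/s = rows / seconds / 1000
krows_per_sec = {key: round(FACT_ROWS / seconds / 1000) for key, seconds in timings.items()}

for (test, config), rate in krows_per_sec.items():
    print(f"{test:<24} {config:<20} {timings[(test, config)]:>5} s {rate:>5} Krows/s")
```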
Baseline of Package
- Sanity check:
  - How much memory does each package use?
  - How much CPU does each package stream use?
  - Need enough CPU and memory to run them all
- Performance counters:
  - Process – Private Bytes / Working Set (DTEXEC)
  - Processor – % Processor Time
  - Network interface
    - Network / Current Bandwidth
    - Network / Bytes Total/sec
Scaling the Package - Method
- Using the parallel load technique described earlier, you can run multiple copies of the package
- Using the baseline of the package, you can then calculate how many scale-out servers you will need
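The calculation can be sketched as follows. It assumes the per-copy CPU and memory figures come from the package baseline (for example % Processor Time and DTEXEC Private Bytes), that copies scale linearly, and that some headroom is kept. The function name, the 80% headroom and all example figures are hypothetical.

```python
import math


def servers_needed(
    target_streams: int,
    cpu_pct_per_stream: float,  # from baseline: % of one server's CPU per package copy
    mem_mb_per_stream: float,   # from baseline: DTEXEC Private Bytes per copy, in MB
    server_mem_mb: float,
    headroom: float = 0.8,      # use at most 80% of CPU and memory
) -> int:
    """How many ETL servers are needed to run `target_streams` package copies."""
    per_server_by_cpu = math.floor(100.0 * headroom / cpu_pct_per_stream)
    per_server_by_mem = math.floor(server_mem_mb * headroom / mem_mb_per_stream)
    per_server = min(per_server_by_cpu, per_server_by_mem)  # the tighter resource wins
    if per_server < 1:
        raise ValueError("a single copy does not fit on one server")
    return math.ceil(target_streams / per_server)


# Hypothetical: 32 streams, each copy uses 6% CPU and 1.5 GB on a 16 GB server.
print(servers_needed(32, cpu_pct_per_stream=6.0, mem_mb_per_stream=1536, server_mem_mb=16384))  # 4
```

In this example memory, not CPU, limits each server to 8 copies, which is exactly the kind of sanity check the baseline slide asks for.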
Data Loading – Fast Enough?
- Bulk load scales near linearly with bulk streams
  - Measured so far up to 96 cores
- Possible to reach 100% CPU load on all cores
  - “Just” get rid of all bottlenecks
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.