* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Tim Chapman
Survey
Document related concepts
Transcript
Troubleshoot Customer Performance Problems Like a Microsoft Engineer Tim Chapman Senior Field Engineer, Microsoft About me… Tim Chapman Sr. Dedicated Premier Field Engineer at Microsoft Contributing author • SQL Server 2012 Bible • SQL Server MVP Deep Dives 2 @chapmandew Agenda A brief introduction to SQLDiag and Diag Manager A troubleshooting overview A quick look at Waits and Queues SQLDiag Command line utility that ships with SQL Server Located in the installation Binn directory Gathers perfmon logs, error logs, profiler traces, blocking information, etc Requires and XML configuration file This XML file specifies what to collect Can add custom collectors – allows you to grab the information you need You execute a PSSDIAG file, which in turn uses SQLDIAG under the covers PSSDIAG SQLDIAG Collectors Diag Manager What we use to create a pssdiag for you GUI tool used to create configuration file Free download from Codeplex The more you configure to trace, the more impact you may have on performance I rarely use Trace Capturing Custom Data Collections This is the REAL power of using Diag Manager & SQL Nexus Diag Manager can capture any scripts you specify and SQL Nexus can import them into a database Once imported, you can run your own diagnostic scripts to find problems SQL Nexus Tool used to import and report on SQLDiag output Allows you to develop custom collections and reports Available on Codeplex: http://sqlnexus.codeplex.com/ This means that the source code is available RML Utilities must be installed prior to installing SQL Nexus RML Utilities for SQL Server (x86) – http://www.microsoft.com/enus/download/details.aspx?id=8161 RML Utilities for SQL Server (x64) – http://www.microsoft.com/enus/download/details.aspx?id=4511 For more on capturing data… My 24 hours of PASS session on PSSDiag & SQL Nexus http://www.sqlpass.org/24hours/2014/summitpreview/Sessi ons/SessionDetails.aspx?sid=7293 Troubleshooting Techniques Define the problem There are many different techniques for troubleshooting Performance Troubleshooting is an iterative process Validate solution Implement Solution Gather data Analyze Data Waits and Queues A performance tuning method to help identify bottlenecks Waits Queues • Time spent waiting on resources inside the database engine • Use system DMVs to view this information • Measure system resource and utilization • Use perfmon to view this information Often there is a correlation between wait types and performance counters How can Waits & Queues help me? Avoid troubleshooting the “wrong” bottleneck Get the biggest payoff for your performance tuning efforts Find non-obvious performance bottlenecks (ie - Waits on memory grants) Excellent first step prior to deep tuning using other tools (SQL Profiler, logs, DMVs, Execution Plans) Examples An end-user or customer tells you about vague performance issue and you’re not sure where to start You are looking for ways to speed up a database-driven application and need to identify the primary bottleneck Benchmarking and trending (great for before-after testing during development) Performance Methodology Often, there is more than one resource bottlenecks on the system Approach Identify resource that is the main bottleneck Find the statements using the most of that resource Tune or optimize usage Repeat process How tasks are scheduled…High level Session Created Batch Submitted Task(s) Created Workers bound to Tasks Task Executed on Scheduler Session States sys.dm_exec_requests • Only one session can be running or executing per scheduler at a time Running sys.dm_exec_requests • Sessions waiting for CPU. Next SPID in the runnable queue is scheduled to start running Runnable • Sessions wait in the waiter list until resources become available Waiting sys.dm_exec_requests OR sys.dm_os_waiting_tasks wait_time_ms versus signal_wait_time Resource Wait Time Signal Wait Time sys.dm_os_wait_stats Total Wait Time Example: Mapping Waits to Queues Wait Type PAGEIOLATCH ASYNC_IO_COMPLETION WRITELOG Perfmon Counters Average Disk Queue Length (consistently high) Disk sec/read and disk sec/write is also high Log Bytes Flushed/sec high Conclusion? IO subsystem may be issue High I/O bandwidth issues Need to now identify and see if any un-tuned queries contribute to the IO How to read sys.dm_os_wait_stats wait_type – the resource unit that SQL Server is waiting on waiting_tasks_count – #of tasks that have spent time on this resource wait_time_ms – total time spent waiting overall max_wait_time_ms – max time spent waiting signal_wait_time_ms – time spent in the runnable queue A few notes on wait stats… Not all wait types are useful to report on Waits are accumulated after event occurs Granularity depends on event type Viewing as a whole may or may not be useful Time slicing can be more useful for trending and/or problem analysis Common High Waits If you’re seeing an extremely high % of a single wait on your system, it is very likely one of the following WRITELOG When we flush a database change to the transaction log file Logical Disk: Avg Disk Sec/Write Logical Disk: Avg Disk Bytes/Write Database: Log Flushes/Sec Database: Log Flush Write Time (ms) sys.dm_io_virtual_file_stats sys.dm_io_pending_io_requests CXPACKET Parallel queries are happening Not a good or bad thing Sometimes less desirable for OLTP workloads Need to know how the application is supposed to be processing before you can make a definite judgment sys.dm_os_waiting_tasks sys.dm_exec_query_stats SOS_SCHEDULER_YIELD When a thread voluntarily releases its hold on the scheduler to allow another thread to perform its work Not necessarily a problem unless it consumes a very high % of wait time on the system Processor: % User Time System: Context Switches/sec sys.dm_os_spinlock_stats sys.dm_exec_requests PAGEIOLATCH_* Latching a buf structure to move a page to memory from disk Long waits may indicate a disk or memory issue Logical Disk: Avg Disk Sec/Read Logical Disk: Disk Bytes/Sec Buffer Manager: Checkpoint Pages/sec Buffer Manager: Page Life Expectancy sys.dm_io_virtual_file_stats sys.dm_io_pending_io_requests PAGELATCH_* A task is waiting for a page latch not associated with an IO request Can be caused by inserts into the same page or contention on allocation pages sys.dm_exec_requests sys.dm_os_waiting_tasks sys.dm_os_buffers_descriptors ASYNC_NETWORK_IO Typically occurs because the client requesting data from SQL Server isn’t processing it fast enough Look at how the client is processing data Network Adapter: Current Bandwidth Network Adapter: Bytes Total/sec Network Adapter: Output Queue Length OLEDB Occurs when SQL calls the OLE DB Provider Often associated with 3rd party tools that heavily call DMVs Also can be associated with Linked Server calls, RPC calls, OpenQuery, OpenRowset or Profiler sys.dm_exec_requests LCK_* Waiting to acquire a lock We accumulate these AFTER the lock has been released Access Methods: Table Lock Escalations/sec Locks: Lock wait time (ms) Locks: Lock waits/sec Memory Manager: Lock Memory (KB) Missing Indexes sys.dm_db_index_operational_stats sys.dm_tran_locks sys.dm_exec_requests RESOURCE_SEMAPHORE Waiting for a memory grant due to a high number of concurrent queries or excessive memory grant requests Not uncommon for DW workloads Resource Governor can help Memory Manager: Memory Grants Pending Memory Manager: Memory Grants Outstanding sys.dm_exec_query_memory_grants sys.dm_exec_query_resource_semaphores tempdb A performance talk is not complete with mentioning tempdb Can become a bottleneck if not properly sized/allocated Faster drives here are better (hint: SSD) We use tempdb for a LOT (to name a few): Temporary tables(#) & table variables (@) Internal work tables (Spools) Spills (hash/sort/exchange) Version Store tempdb cont. Make sure all files are equally sized upon creation For # of files, we recommend: <8 Cores = use 8 tempdb files >=8 Cores = use 8 unless you still have latch contention Then add 4 at a time afterwards And now…Perfmon. Incredibly useful for SQL Server troubleshooting Not enough people use this tool The “hard” part is knowing what to monitor …but, we are here to help with that today. Introduces very little overhead on a SQL Server system Running Performance Monitor Start Run perfmon Causes perfmon to open with only a single counter Start Run perfmon /sys Causes perfmon to open with your saved counters Allows you to save .PerfmonCfg settings, which allows you to move to other machines Useful Memory Counters SQL Server:Buffer Manager Page Life Expectancy Checkpoint Pages/Sec Free Pages Lazy Writes/Sec Memory Manager: Memory Grants Pending Process: Working Set Memory: Available MBytes Useful CPU Counters Very high and sustained CPU usage is a clear sign of an underlying problem Processor: % Privileged Time Processor: % Processor Time Process: * SQL Statistics: Batch Requests/Sec Databases: Transactions/Sec SQL Statistics: Compiles/Sec Useful Process Counters Very useful to debug misbehaving processes on a machine Always look here to see if processes are competing with SQL Server – such as Anti-virus Useful counters: IO Data Bytes/sec % Processor Time Working Set Private Bytes Useful IO Counters Logical Disk Monitors logical partitions on the machine (drive letters) Physical Disk Monitors physical disks on the machine Avg Disk Sec/Read Avg Disk Sec/Write % Idle Time Disk Transfers/sec Power Settings The server power settings can be one of the easiest performance gains Setting from the default (Balanced) to High Performance can lead to gains of 20% or more Processor Information: % Processor Performance Application Settings Look at Process counters for specific processes Find out if the application is requesting too much data Application time-outs aren’t necessarily a SQL related problem Avoid applications installed on the same machine as SQL Filter Drivers Typically exists in the form of anti-virus (AV) software If you insist on using AV on the database instance, make sure to exclude the SQL Process and all SQL data types Look at Process: IO Data Bytes/sec for processes driving high IO levels other than SQL Server When to Use Which Tool? PAL is great for overall system performance ( PAL Download ) Benchmark Get acquainted with a workload PSSDIAG/Nexus More targeted performance analysis Need to view SQL internal resources (waits, blocking chains, query plans) Short timespan for collection