Monitor SQL Server Efficiently
Remus Rusanu
#sqlsaturday565
24 Sept 2016, Bucharest
24 Sept 2017 SQLSaturday 565 Bucharest

whoami
- Worked in the SQL Server team at Microsoft since 2001
- Now founder of DBHistory.com
- Spent many hours investigating performance
- On-call performance engineer for Azure SQL DB
  - Troubleshooting based on telemetry alone

Why do we monitor?
- Troubleshooting
  - We need to investigate incidents as they occur and we need to measure right now
  - On demand, high resolution, short duration, discardable
  - Requires access to the system being monitored
- Post Mortem analysis
  - After an incident we need to look back to understand why it occurred
  - Contiguous, medium resolution, discardable after a grace period
  - Missing data feedback loop
- Baselining
  - Periodically collect the same data we would collect in troubleshooting so we can compare an incident with normal activity
  - On schedule, high resolution, medium duration, long retention
- Trending
  - Understand long term direction, estimate capacity needs, justify spending requests
  - Contiguous, low resolution, long retention
- Alerting
  - Detect incidents, notify the on-call team and trigger investigation
  - Alerting coupled with automation: mitigation bots for known issues

What do we monitor?
- Activity
  - What is running
- Capacity
  - Free disk, space used
  - CPU utilization
  - IO utilization
  - Memory use, paging
  - Network use, bandwidth, latency
- Availability
  - Uptime, SLA
  - Errors
- Recoverability
  - Backups
  - Availability Groups
- Specific features
  - Replication

How do we monitor?
- Performance Counters
  - The gold standard when it comes to measurement
  - Easy to collect, low impact, rich toolset, cheap to store
- DMVs
  - Difficult to collect, many require snapshot-store-and-compare
  - Some have significant impact
- XEvents
  - Abundant information
  - Easy to filter at source
  - Difficult to collect
- ETW
- Logs
- Event Notifications
- Query Store

USE methodology
http://www.brendangregg.com/usemethod.html
- Utilization, Saturation and Errors
- Identify resources in the system
- For each resource, identify metrics that represent utilization (in use vs. idle) and saturation (queueing, blocking, waiting). Identify error indicators (events, logs etc.)
- When investigating, iterate through resources
  - Look at error indicators
  - Look for saturation indicators
  - Look for high utilization percentage
- Generic methodology; the trick is identifying resources and collecting/finding the associated metrics
- Can be applied at host level (CPU, IO, network) but also at SQL Server internals level

SQL Server Query Execution
http://rusanu.com/2013/08/01/understanding-how-sql-server-executes-a-query

Performance Counters
- Extremely cheap for a process to produce performance counters
  - Increment a memory location in a shared memory area
- Extremely cheap for monitoring to read a value
  - Read the value via shared memory
- Low impact
- Rich toolset
  - SDK: .Net, PDH
  - Native OS service for collecting them (Data Collector Sets)
  - perfmon.exe, logman.exe, typeperf.exe
  - Can write directly to SQL (and this is supported by Data Collector Sets)
- PowerShell supports direct counter querying with Get-Counter
  - Data Collector Sets must go through the COM object Pla.DataCollectorSet

Performance Counters tools
- perfmon.exe
  - Interactive GUI for Data Collector Sets management
  - Interactive GUI for on-demand counters collection
  - Graphic visualization for both real time and historic logs
- logman.exe
  - CLI for Data Collector Sets (counters, ETW, alerts)
- Countless 3rd party tools
- DMV sys.dm_os_performance_counters
  - I advise against its use: expensive to query, difficult to read correctly
  - Has the advantage of being available over TDS

Performance Counters SQL logging option
- Read and write counter values into an ODBC defined destination
  - Define an ODBC 64-bit System DSN, specify the database name explicitly
- A feature of PDH, available for all PDH consumers
  - SDK: PdhOpenLog (…, PDH_LOG_TYPE_SQL, …)
  - perfmon, logman, typeperf
- On first connect it will deploy the SQL tables
  - Schema is documented at https://msdn.microsoft.com/en-us/library/windows/desktop/aa373198(v=vs.85).aspx
- CounterData table is a perfect columnstore candidate

SQL Logging table schema

Interpreting the SQL logged data

Reading SQL logged counters using Perfmon

What to collect: CPU
- Processor(_Total)\% Processor Time
  - Processor Object(*) seldom justifies the overhead
- Processor(_Total)\% Privileged Time
- Process(*)\% Processor Time
  - Can help track down when other processes starve CPU
  - Expensive to collect, each process is a separate instance
  - Collecting Process(sqlservr)\% Processor Time and Process(_Total)\% Processor Time can at least shift the blame, but not pinpoint the culprit
- Buffer Manager\Page lookups/sec
  - Indicative of scans, it helps explain high CPU

What to collect: memory (OS)
- Memory\Page Reads/sec
  - These are hard page faults; Page Faults/sec also counts soft faults
- Memory\% Committed Bytes In Use
- Memory\Available Bytes
- Memory\Commit Limit
- Process(sqlservr)\Private Bytes

What to collect: memory (SQL)
- Memory Manager\*
  - Really: collect every counter if you can afford it.
  - No magic formula for ‘good’ vs. ‘bad’ values, but they can be compared with a baseline
  - Memory Grants Outstanding
  - Memory Grants Pending
- Buffer Manager\Buffer cache hit ratio
- Buffer Node(*)\Page life expectancy

What to collect: IO (OS)
- Process(sqlservr)\
  - IO Read Operations/sec
  - IO Read Bytes/sec
  - IO Write Operations/sec
  - IO Write Bytes/sec
  - The counters measure all IO (disk, network, devices)
- Physical Disk\
  - What to capture and how to interpret it is a black art
  - Windows Performance Monitor Disk Counters Explained https://blogs.technet.microsoft.com/askcore/2012/03/16/windows-performance-monitor-disk-counters-explained/
  - ‘Good’ vs. ‘bad’ values are highly dependent on hardware
- Memory\Pages/sec

What to collect: IO (SQL)
- Buffer Manager\
  - Page reads/sec
  - Readahead pages/sec
  - Page writes/sec
  - Background writer pages/sec
  - Checkpoint pages/sec
  - Extension page reads/sec
  - Extension page writes/sec
  - Lazy writes/sec
- Databases(*)\
  - Log Bytes Flushed/sec
  - Backup/Restore Throughput/sec

Understanding how SQL Server executes a query
http://rusanu.com/2013/08/01/understanding-how-sql-server-executes-a-query/
- TL;DR: CPU, Wait, CPU, Wait, CPU over time (timeline diagram of alternating CPU and Wait segments)

What to collect: blocking (the aspiration slide)
- sys.dm_os_wait_stats
- sys.dm_os_latch_stats
- sys.dm_os_spinlock_stats
- TODO: fill up when decent monitoring possible
- Ever increasing values
- Require snapshot-store-and-compare
- Can be reset, which is difficult to detect in processing logic
- No differentiation between idle wait and busy wait, resulting in tribal knowledge of ‘benign waits’
- However, still important to collect… somehow
- sys.dm_exec_session_wait_stats

What to collect: blocking (the pragmatic slide)
- Wait Statistics(*)\Average wait time (ms)
  - A performance counter, easy to collect and analyze
  - There is an instance per wait type
  - but only a few selected wait types are represented
- Latches\Average Latch Wait Time (ms)
- Latches\Total Latch Wait Time (ms)
- Locks\Average Wait Time (ms)
- Locks\Number of Deadlocks/sec
- General Statistics\Processes blocked

What to collect: workload
- SQL Statistics\Batch Requests/sec
- SQL Statistics\SQL Attention rate
  - ‘Attention’ is the TDS jargon for client command timeout
- Transactions\Transactions
- General Statistics\User Connections
- SQL Errors(_Total)\Errors/sec

What to collect: miscellaneous
- LogicalDisk(*)\% Free Space
- Databases(*)\Data File(s) Size (KB)
- Databases(*)\Log Growths
- Plan Cache(*)\Cache Objects Count
- General Statistics\Temp tables creation rate
- Access Methods
  - Full scans/sec, Probe Scans/sec, Range Scans/sec, Index Searches/sec
  - Page Splits/sec
  - Skipped Ghosted Records/sec, Forwarded Records/sec

Measure workload response distribution
- Batch Resp Statistics(*)\*
- “one latency distribution plot is worth a thousand throughput measurements”
- Typical SQL Server has a heterogeneous workload
  - X batches complete in .01s, Y in 1s and Z in 100s
  - Is it one query over a latency distribution? Or is it 3 very different queries?
- Latency for specific queries is better measured in the app
  - It is trivial to expose new counters from apps http://rusanu.com/2009/04/11/using-xslt-to-generate-performance-counters-code/

Collecting via querying
- Collecting data by periodically querying catalog views/DMVs
  - Expensive, some DMVs have serious performance overhead
  - Snapshot-store-and-compare
  - Many DMVs arbitrarily reset internally, which makes them unreliable for monitoring
  - Some information is, still, hard to discover any other way
- sys.dm_io_virtual_file_stats
- sys.dm_db_index_usage_stats

Collecting query execution stats
- Don’t. Use Query Store instead.
- But if you must:
  - sys.dm_exec_query_stats
  - sys.dm_exec_procedure_stats
- Avoid the join with sys.dm_exec_sql_text and sys.dm_exec_query_plan
  - Very expensive
- Honor query_hash and query_plan_hash

Using Event Notifications
    create queue notifications;
    go
    create service notifications on queue notifications ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);
    go
    create event notification [sqlsaturday] on server for ddl_events to service N'notifications', N'current database';
    go
    waitfor(receive cast(message_body as xml) from notifications);

DDL_EVENTS
- Captures any DDL, in any database, for any type
  - CREATE/ALTER/DROP
  - GRANT/DENY/REVOKE
  - sp_rename, sp_tableoptions etc.
- Captures any configuration change
  - sp_configure
  - Database scoped options
- It can be made more granular if desired
  - Capture only a specific database
  - Capture only specific events:
  - select * from sys.server_events

Event Notifications for Profiler events
- Bridges Profiler events as Event Notification messages
  - select * from sys.trace_events
- It can do everything administrative trace can do plus:
  + Deliver the message remotely
  + Trigger activated procedure

Collect all warnings? File Growth? Deadlocks? Logins?
    create event notification [trace_events] on server for Hash_Warning, Execution_Warnings, Sort_Warnings, Bitmap_Warning, Log_File_Auto_Grow, Deadlock_graph, Audit_Login_Failed, Audit_Login to service …

Collecting Event Notification
- Events can be delivered remotely
- All events look the same, an XML payload
  - Must shred the XML and discover object types, object names and event types
  - Some events apply to multiple objects, e.g. CREATE INDEX
- Coupled with Activation, can trigger warnings or mitigation
- Reliable delivery, can survive disconnects
  - Also means that events always refer to the past; automated mitigation must check current state before proceeding
- No toolset whatsoever

Query Data Store
- Absolutely the best option for monitoring query performance
  - Optimized, hooked deep into execution
  - Cannot be simulated with DMV snapshotting
- Excellent troubleshooting tool
- Runtime stats, compilation stats

Query Store missing features
- Collection of wait info
- Centralized aggregation of multiple sources
- Collect info from readable secondaries

XEvents
- Overwhelmingly rich information
- Cheap to produce
- Built-in analytical capabilities
  - Counter, Histogram, Pair Matching
- Difficult to collect
  - Require ETL through a SQL Server
- Poor toolset

ETW, Windows Performance Analyzer
- Very powerful for analyzing the entire stack
- A must when the issue is not SQL Server
- Quick Start Guide: WPA Basics https://msdn.microsoft.com/en-us/library/ff190975.aspx
- Bruce Dawson blog: https://randomascii.wordpress.com/

Sampling Profiling
- Requires Visual Studio toolset vsperf.exe
- Captures execution stacks on all threads, about 10k/sec
- Only as an on-demand troubleshooting option
  - Difficult to set up
  - Can have impact
- Incredibly rich insight into what the server is doing
  - Provided you manage to resolve the symbols…
- Requires code understanding
  - Educated guess of execution role from function names

Sqlservr sample profile

Thanks, Q&A and please review
http://speakerscore.com/ZFJ9
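
Appendix sketch: snapshot-store-and-compare

The slides mention that cumulative DMVs such as sys.dm_os_wait_stats need snapshot-store-and-compare and that resets are difficult to detect. A minimal Python sketch of that comparison step is below. It is not taken from the talk: the function name diff_snapshots, the dictionary layout and the sample numbers are all hypothetical, and it assumes each snapshot has already been fetched as a mapping of wait type to cumulative wait time in ms. A value that went backwards is treated as a reset, which is only a heuristic.

```python
from typing import Dict

# Hypothetical layout: wait type -> cumulative wait time in ms
Snapshot = Dict[str, int]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Dict[str, int]:
    """Return per-interval deltas between two cumulative snapshots.

    If a value went backwards we assume the counter was reset (restart or
    manual clear) and use the current value as the delta for the interval.
    """
    deltas: Dict[str, int] = {}
    for name, value in current.items():
        before = previous.get(name, 0)  # wait types seen for the first time start at 0
        deltas[name] = value - before if value >= before else value
    return deltas


# Hypothetical sample values, for illustration only
first = {"PAGEIOLATCH_SH": 1000, "WRITELOG": 500}
second = {"PAGEIOLATCH_SH": 1600, "WRITELOG": 520, "LCK_M_X": 40}
after_reset = {"PAGEIOLATCH_SH": 30, "WRITELOG": 5, "LCK_M_X": 0}

print(diff_snapshots(first, second))        # normal interval
print(diff_snapshots(second, after_reset))  # interval that spans a reset
```

The heuristic misses a reset when the counter has already climbed past its previous value by the next snapshot, which is one reason the slides prefer performance counters where they exist.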