Building a Performance Monitoring System using XEvents and DMVs
Ola Hallengren, Saxo Bank
PASS SQL Saturday Prague - 2016
About me
Ola Hallengren
https://ola.hallengren.com
E-mail: [email protected]
DBA in Saxo Bank, a Danish investment bank
Microsoft MVP – Data Platform
Agenda
• About Extended Events
• How we built a monitoring solution using Extended Events and DMVs
• Techniques that we used
• Demos
About Extended Events
• Lightweight replacement for SQL Trace / Profiler
  http://msdn.microsoft.com/en-us/library/bb630282.aspx
• Introduced in SQL Server 2008
• Greatly improved in SQL Server 2012 (more events)
• Event data in XML format
• CREATE EVENT SESSION (or SSMS GUI) to create a session
• sys.fn_xe_file_target_read_file (or SSMS GUI) to read events
XEvents – First steps
• Create an Extended Events session using the SSMS GUI or a script, with Event File as target
• Query the events using the SSMS GUI or through XQuery on the production server (or copy the files to another server and query them there)
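As a rough sketch of the XQuery approach, the query below reads an event file and pulls a few values out of the event XML. The file path and the chosen fields (session_id, message) are assumptions for illustration; which fields exist depends on the events and actions in the session.

```sql
-- Minimal sketch: read events from an Event File target and shred the XML.
-- The path is a hypothetical example, not a path from the talk.
SELECT
    x.event_xml.value('(/event/@name)[1]', 'nvarchar(128)') AS event_name,
    x.event_xml.value('(/event/@timestamp)[1]', 'datetime2(3)') AS event_time_utc,
    -- Actions and data fields are child elements identified by @name.
    x.event_xml.value('(/event/action[@name="session_id"]/value)[1]', 'int') AS session_id,
    x.event_xml.value('(/event/data[@name="message"]/value)[1]', 'nvarchar(max)') AS message
FROM sys.fn_xe_file_target_read_file(N'D:\XEvents\database_health*.xel', NULL, NULL, NULL) AS f
CROSS APPLY (SELECT CAST(f.event_data AS xml) AS event_xml) AS x;
```

Fields that an event does not carry simply come back as NULL, so the same query shape can be reused across event types.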
XEvents – Challenges
• XQuery is slow
• XQuery is not easy to write if you are not familiar with it
• Querying the events on the production server puts a load on the server (even more so if many DBAs are doing it at the same time)
• You need access to the production server to query the events (which makes it difficult to give access to developers)
• If you copy the files to another server, you do not get new events as they arrive
• It is difficult to correlate events with data from DMVs (like SQL Texts and Query Plans)
XEvents Monitoring - Requirements
• It should be running all the time on all servers
• Events should be stored in a central database
• Events should be available for querying very close to real time (so that they can be used in live incidents)
• If the monitoring solution is down, no events should be lost (it should just catch up when it starts again)
• Data should be easily available to DBAs and developers, without using XQuery
• No XQuery on the production servers (for performance reasons)
• Collection of SQL Texts and Query Plans (triggered by events)
• SQL Server 2012 and later
XEvents Monitoring – Design
• A company default Extended Events session (database_health) running on all SQL Servers with Event File as target
• PowerShell scripts (running on a central server) collecting events every 30 seconds
• Using sys.fn_xe_file_target_read_file to read new events
• Storing data in a central database
• Views to access data
• XQuery is performed either at load time in an instead-of trigger or in the views when data is accessed
• PostActions to collect SQL Texts and Query Plans
XEvents Monitoring – Overview
[Diagram: Database Servers, Job Server running PowerShell scripts, Events Database, DBAs]
Scenario I: Timeout
• An application is getting a command timeout in AdventureWorks. What is going on?
• Use ExtendedEvents.AbortedExecutions to see the aborted query
• Note that the columns statement_last, statement, and query_plan are available even though they are not in the events (this information comes from sys.dm_exec_sql_text and, for the plan, sys.dm_exec_text_query_plan)
• If wait_type is LCK_*, the query is waiting for locks (it is being blocked), and we can use ExtendedEvents.BlockedProcesses to see who the blocker is
• We can also see the root blocker
Scenario II: Deadlocks
• An application is getting a deadlock in AdventureWorks
• Use ExtendedEvents.Deadlocks to see the deadlock graph
• The deadlock graph is parsed in ExtendedEvents.DeadlockProcesses and ExtendedEvents.DeadlockResources
• We can also see the “Transaction was deadlocked on lock resources with another process and has been chosen as the deadlock victim.” errors in ExtendedEvents.Errors
Scenario III: Errors
• An application is inserting data in a batch and is getting “String or binary data would be truncated.” errors
• Use ExtendedEvents.Errors to see the errors
• We can see the statement, but we want to see the actual values that the application tried to insert
• We can add the action sql_text to get the input buffer
• This can generate a very large amount of event data in a short time if there is a batch with many errors
• There will be one event for each error, and the sql_text of each event will contain the complete batch
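Adding the sql_text action could look roughly like the sketch below. It assumes the session is named database_health (as in this talk) and already contains the sqlserver.error_reported event. The predicate on error 8152 is an illustrative assumption to limit the data volume described above, not the filter used in the actual session.

```sql
-- Sketch only. The actions of an event cannot be changed in place,
-- so the event is dropped from the session and added again.
ALTER EVENT SESSION database_health ON SERVER
    DROP EVENT sqlserver.error_reported;

ALTER EVENT SESSION database_health ON SERVER
    ADD EVENT sqlserver.error_reported
    (
        -- sql_text captures the input buffer, i.e. the complete batch.
        ACTION (sqlserver.session_id, sqlserver.sql_text)
        -- Error 8152 is "String or binary data would be truncated."
        WHERE ([error_number] = 8152)
    );
```

Because every error event carries the whole batch, it may be worth removing the action again once the investigation is finished.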
The session database_health
• SQL Server comes with a default system_health session that contains a lot of useful information
• We have created a company default Extended Events session that is running on all servers (database_health)
• Different thresholds on different servers (higher duration thresholds on OLTP servers than on data warehouse servers)
• Running with Event File as target
How an event is traveling - I
• An event passes the predicate evaluation (filters)
• Additional information (Actions) is collected (e.g. session_id)
• The event is buffered to the memory buffers
• The event is written to an event file (default 30 seconds latency)
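The four stages above map onto the parts of a session definition. The sketch below is in the spirit of database_health, but the chosen event, the duration threshold, the file path and the file sizes are all illustrative assumptions rather than the actual company session.

```sql
-- Sketch of a session definition; event, threshold, path and sizes are assumed.
CREATE EVENT SESSION database_health ON SERVER
ADD EVENT sqlserver.sql_statement_completed
(
    -- Stage 2: actions add extra information to events that pass the filter.
    ACTION (sqlserver.session_id, sqlserver.database_name)
    -- Stage 1: the predicate. Duration is in microseconds (here: over 1 second).
    WHERE ([duration] > 1000000)
)
ADD TARGET package0.event_file
(
    -- Stage 4: events are written to rollover files on disk.
    SET filename = N'D:\XEvents\database_health.xel',
        max_file_size = (10),       -- megabytes per file
        max_rollover_files = (20)
)
WITH
(
    -- Stage 3: events wait in the memory buffers for at most this long.
    MAX_DISPATCH_LATENCY = 30 SECONDS,
    STARTUP_STATE = ON
);

ALTER EVENT SESSION database_health ON SERVER STATE = START;
```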
How an event is traveling - II
• A job runs a PowerShell script on the job server (every 30 seconds)
• The script queries sys.fn_xe_file_target_read_file
• The first time, it gets all events from the files
• After that, it passes the last file name and file offset stored in the events database (so it gets only new events)
• The events are inserted into the events database
• An instead-of trigger is fired
• The trigger extracts the most important elements and attributes using XQuery, and also does some data type conversions
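The incremental read can be sketched as follows. The variables stand in for the values the collector saved from its previous run; the path, file name and offset are made-up examples.

```sql
-- Sketch of the incremental read. On the very first run both
-- @last_file_name and @last_file_offset would be NULL (read everything).
DECLARE @path nvarchar(260) = N'D:\XEvents\database_health*.xel';
DECLARE @last_file_name nvarchar(260) = N'D:\XEvents\database_health_0_130000000000000000.xel';
DECLARE @last_file_offset bigint = 4096;

SELECT f.object_name AS event_name,
       f.event_data,
       f.file_name,     -- store these two values with the events so that
       f.file_offset    -- the next run knows where to continue
FROM sys.fn_xe_file_target_read_file(@path, NULL, @last_file_name, @last_file_offset) AS f;
```

The function skips everything up to and including the given offset, so only events written after the previous read are returned.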
How an event is traveling - III
• The PowerShell script then collects SQL Texts and Query Plans (PostActions)
• Joins and additional logic (and sometimes more XQuery) in views
Latency
                                 Production   Demo
MAX_DISPATCH_LATENCY             30 s         1 s
PowerShell Job Schedule          30 s         10 s
Time to read and insert events   < 1 s        < 1 s
Total                            ≈ 60 s       ≈ 11 s
Where to do the XQuery?
• Extracting elements and attributes into their own columns at load time in an instead-of trigger is optimal for query performance, but has a cost in load performance and storage
• Doing the XQuery in the views is optimal for load performance and storage, but has a cost in query performance
• The timestamp attribute has to be extracted at load time (as you want to be able to look at the latest events fast)
• In general, try to avoid queries that need to do XQuery on a large number of events
• When the performance of a query is not acceptable, it is time to move some of the elements or attributes to their own columns
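A minimal sketch of both options, using a hypothetical schema: the table, trigger, view and column names below are assumptions and not the objects from the actual solution. The trigger extracts the event name and timestamp at load time, while the view does the remaining XQuery when it is queried.

```sql
-- Hypothetical central table. The extracted columns are nullable so that
-- the loader only has to supply server_name and event_data.
CREATE TABLE dbo.Events
(
    event_id    bigint IDENTITY PRIMARY KEY,
    server_name sysname       NOT NULL,
    event_name  nvarchar(128) NULL,   -- filled in by the trigger
    [timestamp] datetime2(3)  NULL,   -- filled in by the trigger
    event_data  xml           NOT NULL
);
GO
CREATE TRIGGER dbo.Events_InsteadOfInsert ON dbo.Events
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Load-time XQuery: only what is needed to find recent events fast.
    INSERT INTO dbo.Events (server_name, event_name, [timestamp], event_data)
    SELECT i.server_name,
           i.event_data.value('(/event/@name)[1]', 'nvarchar(128)'),
           i.event_data.value('(/event/@timestamp)[1]', 'datetime2(3)'),
           i.event_data
    FROM inserted AS i;
END;
GO
-- Query-time XQuery: cheap to load and store, slower to read.
CREATE VIEW dbo.EventErrors
AS
SELECT e.server_name,
       e.[timestamp],
       e.event_data.value('(/event/data[@name="error_number"]/value)[1]', 'int') AS error_number,
       e.event_data.value('(/event/data[@name="message"]/value)[1]', 'nvarchar(max)') AS message
FROM dbo.Events AS e
WHERE e.event_name = N'error_reported';
GO
```

If queries against the view become too slow, error_number would be the natural next candidate to move into its own column in the trigger. GO is the batch separator used by SSMS and sqlcmd.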
Blocking
• The blocked_process_report event is very useful when investigating blocking problems
• The event is only triggered if ‘blocked process threshold’ has been enabled on the server
• It should not be set lower than 5 seconds
• Handled by the same thread in SQL Server that is searching for deadlocks
• A blocked_process_report event always has one blocked and one blocking process
• Every time the thread wakes up and looks for blocking, it has a new monitor_loop_id (filter on monitor_loop_id to get a snapshot of the blocking)
<blocked-process-report monitorLoop="1369">
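As a short sketch, the threshold can be enabled with sp_configure, and the monitor loop can be read from the report with XQuery. The XML below is a trimmed, made-up sample (a real report contains many more attributes and the execution stack).

```sql
-- Enable the blocked process report with a 5-second threshold.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'blocked process threshold (s)', 5;
RECONFIGURE;

-- Made-up, heavily trimmed sample report for illustration.
DECLARE @report xml = N'
<blocked-process-report monitorLoop="1369">
  <blocked-process><process spid="57" waittime="5123" /></blocked-process>
  <blocking-process><process spid="54" /></blocking-process>
</blocked-process-report>';

-- One report always describes exactly one blocked and one blocking process.
SELECT @report.value('(/blocked-process-report/@monitorLoop)[1]', 'int') AS monitor_loop_id,
       @report.value('(/blocked-process-report/blocked-process/process/@spid)[1]', 'int') AS blocked_spid,
       @report.value('(/blocked-process-report/blocking-process/process/@spid)[1]', 'int') AS blocking_spid;
```

Grouping stored reports by monitor_loop_id then gives all blocked/blocking pairs that were seen in the same pass.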
Blocking – Using the Execution Stack
• The executionStack in the blocked_process_report can be used to see which stored procedures and statements are involved
• The first frame is always the innermost statement
Getting SQL Texts
• To get an SQL Text you need an sql_handle
• The handle can be passed to sys.dm_exec_sql_text to get the text
• If you also have a start_offset and an end_offset, you can extract the statement from the text
• The sql_handle is a hash of the text
• The sql_handle and offsets are available in the action tsql_frame (various events), the executionStack (blocked_process_report and xml_deadlock_report), and also in DMVs like sys.dm_exec_requests
• By storing the text with the handle, you don’t need to fetch it again the next time an event comes with the same handle (as you already have it)
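For illustration, here is the usual way to cut a single statement out of the batch text, shown against sys.dm_exec_requests. The same arithmetic would apply to a handle and offsets taken from an event; the offsets are byte positions in Unicode text, hence the division by 2.

```sql
-- Sketch: statement text for currently running requests.
SELECT r.session_id,
       r.sql_handle,
       r.statement_start_offset,
       r.statement_end_offset,
       SUBSTRING(t.text,
                 r.statement_start_offset / 2 + 1,
                 -- An end offset of -1 means "to the end of the batch".
                 (CASE r.statement_end_offset
                      WHEN -1 THEN DATALENGTH(t.text)
                      ELSE r.statement_end_offset
                  END - r.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID;   -- skip this query itself
```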
Getting Query Plans – The plan_handle
• The action plan_handle is “A token that refers to the compiled plan that the query is part of.”
• The plan_handle can be passed to sys.dm_exec_text_query_plan to get the query plan
• The problem is that a query plan can change while keeping the same plan_handle
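A small sketch of the lookup, again using sys.dm_exec_requests as the source of the handle. Passing the statement offsets returns the plan for one statement only; the two hash columns are included because they help to tell plans apart when the plan_handle stays the same.

```sql
-- Sketch: statement-level plans for currently running requests.
SELECT r.session_id,
       r.plan_handle,
       r.query_hash,
       r.query_plan_hash,
       p.query_plan          -- showplan XML, returned as nvarchar(max)
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_text_query_plan(r.plan_handle,
                                        r.statement_start_offset,
                                        r.statement_end_offset) AS p
WHERE r.session_id <> @@SPID;
```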
Getting Query Plans – Statement-level
• In events like wait_info and sp_statement_completed, and in DMVs like sys.dm_exec_requests, you have this information available:
    plan_handle
    start_offset
    end_offset
    query_hash
    query_plan_hash
• You can store this information with the plan, and the next time you come across the same combination, you don’t need to get the plan (as you already have it)
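One way to sketch this caching idea is a table keyed on the five values. The table name and layout are hypothetical, and for brevity the cache lives on the same server as the plan lookup, which is not necessarily how the actual solution is arranged.

```sql
-- Hypothetical cache table keyed on the combination listed above.
CREATE TABLE dbo.QueryPlans
(
    plan_handle     varbinary(64) NOT NULL,
    start_offset    int           NOT NULL,
    end_offset      int           NOT NULL,
    query_hash      binary(8)     NOT NULL,
    query_plan_hash binary(8)     NOT NULL,
    query_plan      nvarchar(max) NULL,
    CONSTRAINT PK_QueryPlans PRIMARY KEY
        (plan_handle, start_offset, end_offset, query_hash, query_plan_hash)
);
GO
-- These values would come from an event; they are left unset here.
DECLARE @plan_handle varbinary(64), @start_offset int, @end_offset int,
        @query_hash binary(8), @query_plan_hash binary(8);

-- Only go and get the plan if this combination has not been seen before.
IF NOT EXISTS (SELECT * FROM dbo.QueryPlans
               WHERE plan_handle = @plan_handle
                 AND start_offset = @start_offset
                 AND end_offset = @end_offset
                 AND query_hash = @query_hash
                 AND query_plan_hash = @query_plan_hash)
    INSERT INTO dbo.QueryPlans
        (plan_handle, start_offset, end_offset, query_hash, query_plan_hash, query_plan)
    SELECT @plan_handle, @start_offset, @end_offset, @query_hash, @query_plan_hash, p.query_plan
    FROM sys.dm_exec_text_query_plan(@plan_handle, @start_offset, @end_offset) AS p;
```

Including query_plan_hash in the key is what lets the cache notice a changed plan under an unchanged plan_handle.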
Getting Query Plans – Module-level
• When you only have a plan_handle (like in module_end), you need to go out and get the plan fast
• You should also verify that there hasn’t been a recompile after the event (as it is then not the right plan). You can do that like this:
    WHERE NOT EXISTS (SELECT * FROM sys.dm_exec_query_stats
                      WHERE plan_handle = @plan_handle
                      AND creation_time > @timestamp)
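Put together as a complete statement, the check above could look like this. The variables are placeholders for values taken from the event. The sketch assumes @timestamp has already been converted to the server's local time, since creation_time is reported in local time while event timestamps are in UTC.

```sql
-- Placeholders; in practice both values come from the collected event.
DECLARE @plan_handle varbinary(64);   -- from the plan_handle action
DECLARE @timestamp   datetime;        -- event time in server local time

-- Offsets 0 and -1 return the plan for the whole batch or module.
SELECT p.query_plan
FROM sys.dm_exec_text_query_plan(@plan_handle, 0, -1) AS p
WHERE NOT EXISTS (SELECT *
                  FROM sys.dm_exec_query_stats AS s
                  WHERE s.plan_handle = @plan_handle
                    AND s.creation_time > @timestamp);
```

If a statement in the module was recompiled after the event, the query returns no rows and the plan is treated as unavailable rather than stored as the wrong one.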
High frequency polling of events
Things to consider when you are polling for new events frequently using sys.fn_xe_file_target_read_file:
1. Use small files! It is faster to query a small file than a large file (even if you specify a file name and a file offset)
2. Only specify a wildcard in the [path] when there has been a file rollover (check the current file in sys.dm_xe_session_targets)! When you specify a wildcard, SQL Server will access all files (even if you specify a file name and a file offset)
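The rollover check in point 2 can be sketched as below, assuming the session is named database_health and uses the event_file target. The target's status is exposed as XML in target_data, which includes the file currently being written.

```sql
-- Sketch: find the file the session is currently writing to.
SELECT CAST(t.target_data AS xml).value(
           '(/EventFileTarget/File/@name)[1]', 'nvarchar(260)') AS current_file_name
FROM sys.dm_xe_sessions AS s
INNER JOIN sys.dm_xe_session_targets AS t
    ON t.event_session_address = s.address
WHERE s.name = N'database_health'
  AND t.target_name = N'event_file';
```

If current_file_name equals the last file name stored by the collector, there has been no rollover and that exact file can be passed as the path. Otherwise a wildcard path is needed for that one read.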
The offset is invalid for log file …
• When you query sys.fn_xe_file_target_read_file with a file name and a file offset, you can get an error like this:
  “The offset 2394624 is invalid for log file "...\database_health_0_130903806628660001.xel". Specify an offset that exists in the log file and retry your query.”
• You get this error if all the files have been rolled over since you last read events (so the file name you pass no longer exists)
• This happens if a very large number of events has been generated in a short time, or if the monitoring solution has been down
• Increasing the number of files reduces the risk of this happening
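One possible way for a collector to recover is to fall back to reading everything that is still on disk. This is only a sketch: it assumes the error can be trapped with TRY...CATCH and that matching on the message text is acceptable. The path and variables are hypothetical.

```sql
DECLARE @path nvarchar(260) = N'D:\XEvents\database_health*.xel';
DECLARE @last_file_name nvarchar(260) = NULL;   -- saved by the previous run
DECLARE @last_file_offset bigint = NULL;        -- saved by the previous run

BEGIN TRY
    -- Normal case: continue from the last position.
    SELECT f.event_data, f.file_name, f.file_offset
    FROM sys.fn_xe_file_target_read_file(@path, NULL, @last_file_name, @last_file_offset) AS f;
END TRY
BEGIN CATCH
    -- Anything other than the invalid-offset error is re-raised.
    IF ERROR_MESSAGE() NOT LIKE N'The offset % is invalid for log file %'
    BEGIN
        ;THROW;
    END;

    -- The saved file is gone: read all remaining files from the start.
    -- Events in the files that were rolled over in between are lost.
    SELECT f.event_data, f.file_name, f.file_offset
    FROM sys.fn_xe_file_target_read_file(@path, NULL, NULL, NULL) AS f;
END CATCH;
```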
Questions?
• The code is available at https://ola.hallengren.com/scripts/PerformanceStore.zip
• You can contact me at [email protected]
Please fill in the evaluation forms