Index Fragmentation
Interactive workload test: Using a 2TB masked database to determine the effect that heavy index fragmentation
has on performance.
Sean Long
Enterprise Performance Team
Summary
How to Use This Report
Investigation Details
Conclusion
Next Steps
Appendix A - Setup
Appendix B - Glossary
Summary
Index maintenance is a key part of SQL Server database management: SQL Server relies heavily on indexes to find data, and over time those indexes can degrade through a process known as index fragmentation. Recommendations are readily available on how best to defragment indexes to maintain performance, but until now there was no information on the effect index fragmentation has on a Blackbaud CRM database specifically.
To measure the effect, the Enterprise Performance Team ran a realistic workload against a BBCRM database known to be fragmented, then defragmented the database and ran the workload again. During both runs, information was gathered about end-user response times as well as system resource utilization.
We found that substantial performance gains can be made by rebuilding indexes and updating statistics. This is evident in the reduced disk I/O, lower end-user response times, and the increased amount of work SQL Server can handle.
How to Use This Report
This report is intended to offer both a simple overview and an in-depth explanation of the concepts mentioned above. As such, a number of terms that need clarification are defined in a Glossary section at the end of the document; when defined terms are introduced, they appear in bold italics. In addition, certain areas link to additional references produced by the Enterprise Performance Team as well as outside sources. For ease of use, links to resources are blue and underlined.
Investigation Details
We performed two test runs with a realistic interactive workload on the test rig described in Appendix A. Each test run mimicked 400 end users completing manual tasks through the BBCRM user interface and ran for a total of an hour. To control for as many outside influences as possible, we reset the caches commonly used by BBCRM and used an isolated test environment. The first run used a database with high fragmentation on over 7000 tables, the result of several months of development efforts. We then ran Ola Hallengren's index maintenance script to update statistics and rebuild all indexes with fragmentation greater than 30%. For indexes with fragmentation between 5% and 30%, we chose to reorganize instead of rebuild, as this is the preferred way to handle moderate amounts of index fragmentation.
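The decision rule described above can be sketched in a few lines. This is an illustrative sketch, not the actual maintenance script: the function name and structure are invented, while the 5%/30% cutoffs and the 1000-page floor come from the figures reported in this document.

```python
def maintenance_action(fragmentation_pct, page_count, min_pages=1000):
    """Illustrative decision rule for handling a fragmented index.

    Indexes on small tables (fewer than ~1000 pages) are skipped,
    since fragmentation there has little practical effect.
    """
    if page_count < min_pages:
        return "skip"
    if fragmentation_pct > 30:
        return "rebuild"      # heavy fragmentation: recreate the index
    if fragmentation_pct >= 5:
        return "reorganize"   # moderate fragmentation: compact pages in place
    return "skip"             # light fragmentation: not worth touching
```

The real maintenance solution exposes these thresholds as configurable parameters, which is part of the "high level of control" noted below.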
Ola's script was chosen as the basis for this maintenance for several reasons:
• Industry standard – The script is almost universally recommended in the SQL Server community, notably by many speakers at PASS Summits.
• Ease of use – The entire script is very well documented and easy to implement on a variety of SQL Server versions.
• Used internally – The maintenance solution is recommended by our Professional Services personnel as well as used by SDO.
• High level of control – We can easily define what a heavily fragmented index is and how to handle it.
The maintenance process took approximately 11 hours to rebuild the fragmented indexes on the 2TB masked database. This resulted in notably less fragmentation on all tables with more than 1000 pages, in line with the settings we used in the maintenance solution. In addition, because the database statistics were also out of date and the indexes had been rebuilt, we updated the statistics as part of the maintenance.
After index maintenance finished, we cleared the appropriate caches and ran the same test run again.
Overall, Page Response Time improved by an average of 16%. We expect a 4% variation between test runs from noise, so the improvement is significant.
Below is a chart showing the 5 most commonly called web pages accessed by the workload. All of these are HTTP requests to UIModelingService.ashx, the page used by the UI modeling service to construct the interface that the end user sees; it was selected because it is the most frequently accessed page and shows a large amount of improvement. Because the specific page referenced is the same in each selected item, and because these requests are so frequent, they are broken down by the test each request was a part of. In each case, we saw an improvement after maintenance.
• Constituent_Interaction_Edit – This request makes up 3.5% of all pages accessed during the test run and became around 27% more efficient, from 26 mSec to 19 mSec per request. The improvement was 7 mSec per request, with 36047 requests made over the entire hour, resulting in a total of 261 seconds saved.
• Household_Edit – This request makes up 6.3% of all pages accessed during the test run and we saw a 21% improvement, from 25 mSec to 20 mSec per request. The improvement was 5 mSec per request, with 63524 requests made over the entire hour, resulting in a total of 340 seconds saved.
• Constituent_Solicit_Code_Add – This request makes up 6.3% of all pages accessed during the test run and we saw a 25% improvement, from 25 mSec to 19 mSec. The improvement was 6 mSec per request, with 65617 requests made over the entire hour, resulting in a total of 408 seconds saved.
• Constituent_Edit_Address – This request makes up 7.8% of all pages accessed during the test run and we saw a 24% improvement, from 30 mSec to 23 mSec. The improvement was 7 mSec per request, with 78788 requests made over the entire hour, resulting in a total of 551 seconds saved.
• Constituent_NameFormat_Edit – This request makes up 15.3% of all pages accessed during the test run and we saw a 25% improvement, from 27 mSec to 20 mSec. The improvement was 7 mSec per request, with 156250 requests made over the entire hour, resulting in 1037 seconds saved.
Defragmenting the indexes improved average response times for the 5 most frequently accessed pages by around 23%, despite an increase in the number of times these requests were made. Adding up all of the time saved gives 2597 seconds, around 43 minutes that the servers can now spend processing other requests.
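The total above can be checked directly from the per-page savings already reported:

```python
# Seconds saved per page over the one-hour run, as reported above.
saved = {
    "Constituent_Interaction_Edit": 261,
    "Household_Edit": 340,
    "Constituent_Solicit_Code_Add": 408,
    "Constituent_Edit_Address": 551,
    "Constituent_NameFormat_Edit": 1037,
}
total = sum(saved.values())
print(total, "seconds saved, ~", round(total / 60), "minutes")
# prints: 2597 seconds saved, ~ 43 minutes
```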
In addition, we saw that the rate of work submitted to the system increased overall by 4%. Due to the way our workload functions and the level of work we have set as its goal, more tasks are attempted when tasks complete faster (this is not ideal behavior and will be addressed in future iterations of the workload). It is important to note that while we generally saw an increase in the number of tasks run, this was not the case for the group of tests called Init tests. These tests are used by the workload to log users in for testing purposes, which does not reflect how users behave when using the system, so a decrease in the number of these tests run is not indicative of meaningful change within BBCRM.
Why did the system run faster with less fragmented indexes? Indexes are used to quickly find data in tables, but when an index is fragmented, more pages must be accessed when using it, which indirectly results in more reads from the physical disk. We can measure the cumulative disk activity using the performance counter Disk Read Bytes/sec for the data drive, illustrated in the chart below. There is about a 13% reduction in disk reads, even though the system is supporting about 4% more activity.
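These two figures can be combined into a back-of-envelope estimate of the I/O reduction per unit of work; the resulting ~16% figure is derived here and is not stated in the test results themselves:

```python
# Reads fell 13% while the rate of work rose 4%, so the read volume
# per unit of work fell by roughly 1 - 0.87/1.04, i.e. about 16%.
reads_ratio = 1 - 0.13   # disk read bytes/sec, after vs. before maintenance
work_ratio = 1 + 0.04    # work submitted, after vs. before maintenance
per_work_reduction = 1 - reads_ratio / work_ratio
print(f"{per_work_reduction:.1%}")  # prints: 16.3%
```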
Looking at the page response time data, it is apparent that not all pages saw an improvement; a few saw an increase in the time each request took. This is due to noise within our workload created by simple randomness, and because these pages account for comparatively few requests (in total, less than 1% of all requests) they have no impact on this analysis. They are mentioned here only as context.
Conclusion
Rebuilding indexes and updating statistics can improve response time, disk usage, and total test time, as shown by the performance counter and response time data. A good SQL Server maintenance plan should therefore include measuring and maintaining indexes, as this is a proactive way to reduce fragmentation and improve performance in BBCRM.
Next Steps
It is our recommendation that index monitoring and maintenance be made easier to do, preferably by including them in the core product. As this is a task every DBA should be performing, and BBCRM uses a single database for all transactions, placing monitoring and maintenance in the product would make it easier for our clients to maintain their software. SDO will likely still want to fold index maintenance in with other database maintenance tasks such as backups, and thus may not take full advantage of such internal tools, but the benefits to our on-premise clients are great.
A potentially useful follow-up experiment would be to see how long it takes indexes in production systems to become fragmented enough to need defragmenting. This would give an idea of how frequently to run the process and allow us to give better guidance to all clients and SDO.
Appendix A - Setup
• Performance Rig 2
  o System Under Test
     PerfSQL02 – The server that runs SQL Server 2008 R2 and processes requests.
       2x Xeon E5-2650 @ 2.0 GHz (limited to 16 cores)
       96 GB RAM
       Windows Server 2008 R2
       SQL Server 2008 R2 SP2 Enterprise Edition
     PerfIIS06 & PerfIIS07 – The web servers that handle requests from the end users and transfer data from the SQL Server to the end user.
       Virtual servers
       2x Intel Xeon E5-2660 @ 2.20 GHz (4 cores)
       8 GB RAM
       Windows Server 2008 R2
       Load balanced using the Windows native load balancer, 50/50 split
     PTLReportSVR01 – The server that generates reports for the end users. This is referenced by Constituent_History_Report and Itinerary_Report.
       Virtual server
       Intel Xeon E5-2665 @ 2.4 GHz (4 cores)
       4 GB RAM
       Windows Server 2008 R2
       SQL Server Reporting Services
• Heifer Interactive 400 Workload
  o 400 simulated users
  o Standard mix of interactive tests
  o Constant load patterns
  o 5-minute warm-up time
• Large Masked DB
  o Heifer Masked Database
• Tools used for measurement
  o Standard performance counter sets for load testing
• Pre-test steps
  o Performed IIS reset on both IIS servers
  o Created a checkpoint, cleared the procedure cache, and dropped clean buffers on SQL Server
Appendix B - Glossary
Cache – A collection of items that are stored for easier future use. In Blackbaud CRM, many items (like permission to view data) are put into faster storage to speed retrieval.
Database Page – The most basic unit of storage in SQL Server, where data is stored. Pages are created when data is added to a table and more space is needed to contain it. SQL Server must access a complete page to get to the data contained on it, which can lead to additional reads from storage (either memory or physical disk) if the data isn't contained efficiently within a small number of pages (as can happen when indexes become fragmented).
Data Repository – A place where information on the tests is stored for future reference. The information may be gathered from many sources and is placed in a single place for easier retrieval and reference. In Visual Studio load testing, this is where the information for reporting is stored. Specifically, the information is located on PTLR84 in the database Loadtest_Perf.
Database Statistics – SQL Server maintains information that helps it understand where specific data is stored and the size and makeup of tables. It uses this information to determine the most efficient way to retrieve data. For example, a statistic might record that a particular table has only 15 rows; knowing this, SQL Server can decide that it is more efficient to scan the entire table for the needed information than to look in an index.
DBA – Database Administrator, which is the person responsible for the maintenance and operation of SQL
Server. In general, this person is responsible for the security, performance and integrity of the database and
often multiple databases.
Defragment – The process by which data is organized into patterns that are easier to look through. For example,
a disk defragmentation process puts data in linear order so that the hardware can find groups of data faster.
Disk I/O – Input/Output, or the amount of data being transferred to or from the hard drive (or hard drives). Disk I/O commonly refers to how often SQL Server goes to the hard drive (rather than to memory). It can refer to either a specific numerical piece of information (like the number of times SQL Server reads from the hard drive) or a more qualitative description ("high overall disk I/O" means the hard drive is seeing a high volume of reads and writes).
Disk Read Bytes/sec – A numerical performance counter that describes the total number of bytes retrieved
from the hard drive over the period of a second. In our investigation, this counter was sampled every 10
seconds, so the number is the average of 10 1-second periods. For example, take the following distribution:
o 00:00:01 – 500 bytes read
o 00:00:02 – 1000 bytes read
o 00:00:03 – 500 bytes read
o 00:00:04 – 1000 bytes read
o 00:00:05 – 500 bytes read
o 00:00:06 – 1000 bytes read
o 00:00:07 – 500 bytes read
o 00:00:08 – 1000 bytes read
o 00:00:09 – 500 bytes read
o 00:00:10 – 1000 bytes read
o For each 1-second interval the average is as recorded above, but in our report it would be an average of all of these values, so 750 bytes read per second on average.
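The averaging in this example is simply the mean of the ten 1-second samples:

```python
# Ten 1-second samples of Disk Read Bytes/sec, as in the example above:
# alternating 500 and 1000 bytes read per second.
samples = [500, 1000] * 5
avg = sum(samples) / len(samples)
print(avg)  # prints: 750.0
```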
End-user Response Time – A numerical measurement describing the amount of time that passes before
something is presented to the entity making the request.
Index – An object within a SQL Server database that contains references to data on a table, organized in a
specific way to allow for faster retrieval of the data. A table can have multiple indexes organized in different
ways to provide easier access to different data, and SQL Server will intelligently pick the best index for what it
needs to find.
HTTP Web Requests – All communication between a server and a client workstation is done through a series of signals and responses, called requests. HTTP is the foundation of these, and there are multiple types of requests. The two most frequently referenced in testing documentation are:
o GET – Retrieves specified data or an object.
o POST – Asks that the server accept a particular piece of data or an object.
Index Fragmentation – As data is added to a table, SQL Server places it wherever it can find room on the appropriate indexes, which means the data is not always added to those indexes in a continuous way. When this happens, we consider the indexes fragmented, which is usually reported as a percentage: how much of the data on the index is out of order and thus needs to be fixed.
Index Maintenance – The process of making periodic adjustments to indexes within a database. As database indexes can go out of order over time (see Index Fragmentation above), it is important to keep them adjusted. This term usually refers to the specific tasks of rebuilding or reorganizing indexes and tracking overall index fragmentation.
Init Tests – A group of specific scripts used to create pools of users for the workload to use when completing tasks. In general, these run at the beginning of a task during the workload and provide a user context for that task to be completed under. In the context of the Performance Team's workload, these tests log in enough users to complete the specified number of tasks. If the workload needs more users to accomplish that number of tasks (for example, if tasks are slow to complete), more Init Tests will be shown as running as more users are logged in. If the system is efficient, fewer Init Tests may be run.
Interactive – Actions taken in BBCRM that are completed by a user sitting at a workstation rather than by an automated process or mechanical means. For example, opening a Constituent and changing the Address after logging in is a manual task that would be included in a set of Interactive tests. Importing information into a batch and committing the batch after hours is an automated process that would not be.
Isolated – Removed from outside influences. The Enterprise Performance Team's testing servers are on a separated section of the Blackbaud network that carries no traffic other than the tests. They have also been secured so that only members of the Enterprise Performance Team can use them, minimizing the chance of randomness in the recorded numbers.
Noise – Improvements or declines in performance in a test run due to randomness caused by the testing process itself. This can be something like a page that is accessed only a few times, improperly skewing an average, or other uncontrolled and unobserved variables unrelated to what is being considered.
On-Premise Clients – Organizations that provide their own servers and personnel to run Blackbaud CRM, as opposed to clients that use our application hosting services.
Page Response Time – The amount of time it takes for a particular URL to load, measured as the full round trip to and from the web server for that specific request.
Performance Counter – An object that measures usage of an operating system, application, driver or piece of
hardware. Counters are numerical in nature (% CPU Utilization or Disk Read Bytes per Second) and are used in
analysis and observation of the various servers and SQL Server itself to form conclusions about what work is
occurring and how that work is affecting the server.
Realistic workload – A pattern of usage that models observed client use. The Enterprise Performance Team has been dedicated to determining the usage pattern that represents a large percentage of our usage scenarios; in other words, how our clients use the software. The result is a set of tests that simulate a specific way that clients use the system; run together, these tests form a realistic workload.
Rebuilding Indexes – A process for removing fragmentation from an index by recreating it completely, using all
of the available data from the table the index references. This is recommended when indexes are more than
30% fragmented.
Reorganizing Indexes – A process of taking data that already exists on an index and repositioning it so that it is easier for SQL Server to read.
SDO – Service Delivery Organization, the group that manages the environment that runs BBCRM for clients that
choose to allow us to host the software.
Test Run – The execution of a group of tasks in a specific order for a specific amount of time. This generally
refers to tasks involved in creating a desired usage pattern.
Test Rig – A group of servers involved in running and recording a load test.
Total Test Time – The time, in seconds, it takes for a virtual user to complete a single test, which is a logical
group of tasks (like editing a household).
UI Modeling Service – An object that accepts information, provides data, and constructs the interface that will be used to interact with that data.
Updating Statistics – The process of changing the values stored by SQL Server that help it make decisions on
how to find certain pieces of data. This is part of recommended SQL Server maintenance.
Web Test Framework – A set of code, tools and objects that allows us to create ways to mimic user actions
within BBCRM. We have used this as a foundation to create patterns of use.
Work – A generic term describing what SQL Server does. It refers to the amount of use that the hardware is
seeing and can be used to help describe how much activity is occurring on SQL Server.