Index Fragmentation
Interactive workload test: Using a 2TB masked database to determine the effect that heavy index fragmentation
has on performance.
Sean Long
Enterprise Performance Team
Summary
How to Use This Report
Investigation Details
Conclusion
Next Steps
Appendix A - Setup
Appendix B - Glossary
Summary
Index maintenance is a key part of SQL Server database management: SQL Server relies heavily on indexes to find data, and over time those indexes can degrade through a process known as index fragmentation. Recommendations are readily available on how best to defragment indexes to maintain performance, but until now there was no information on the effect index fragmentation has on a Blackbaud CRM database specifically.
To measure the effect, the Enterprise Performance Team ran a realistic workload against a BBCRM database known to be fragmented, then defragmented the database and ran the workload again. During both runs, information was gathered about end-user response times as well as system resource utilization.
We found that substantial performance gains can be made by rebuilding indexes and updating statistics. This is evident in the reduced disk I/O, lower end-user response times, and the increased amount of work SQL Server can handle.
How to Use This Report
This report is intended to offer both a simple overview and an in-depth explanation of the concepts mentioned above. As such, a number of terms that need clarification are defined in a Glossary section at the end of the document; when defined terms are introduced, they appear in bold italics. In addition, certain areas link to additional references produced by the Enterprise Performance Team as well as outside sources. For ease of use, links to resources are blue and underlined.
Investigation Details
We performed two test runs with a realistic interactive workload on the test rig described in Appendix A. Each test run mimicked 400 end users completing manual tasks through the BBCRM user interface and ran for a total of an hour. To control for as many outside influences as possible, we reset the caches commonly used by BBCRM and used an isolated test environment. The first run used a database with high fragmentation on over 7000 tables, the result of several months of development efforts. We then ran Ola Hallengren's index maintenance script to update statistics and rebuild all indexes with fragmentation greater than 30%. For indexes with fragmentation between 5% and 30%, we chose to reorganize instead of rebuild, as this is the preferred way to handle moderate amounts of index fragmentation.
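The decision rule described above can be sketched in a few lines. This is an illustrative sketch, not the actual maintenance script: the function name and structure are invented, while the 5%/30% cutoffs and the 1000-page floor come from the figures reported in this document.

```python
def maintenance_action(fragmentation_pct, page_count, min_pages=1000):
    """Illustrative decision rule for handling a fragmented index.

    Indexes on small tables (fewer than ~1000 pages) are skipped,
    since fragmentation there has little practical effect.
    """
    if page_count < min_pages:
        return "skip"
    if fragmentation_pct > 30:
        return "rebuild"      # heavy fragmentation: recreate the index
    if fragmentation_pct >= 5:
        return "reorganize"   # moderate fragmentation: compact pages in place
    return "skip"             # light fragmentation: not worth touching
```

The real maintenance solution exposes these thresholds as configurable parameters, which is part of the "high level of control" noted below.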
Ola's script was chosen as the basis for this maintenance for several reasons:
• Industry standard – The script is almost universally recommended in the SQL Server community, notably by many speakers at PASS Summits.
• Ease of use – The entire script is very well documented and easy to implement on a variety of SQL Server versions.
• Used internally – The maintenance solution is recommended by our Professional Services personnel as well as used by SDO.
• High level of control – We can easily define what a heavily fragmented index is and how to handle it.
The maintenance process took approximately 11 hours to rebuild the fragmented indexes on the 2TB masked database. This resulted in notably less fragmentation on all tables with more than 1000 pages, in line with the settings we used in the maintenance solution. In addition, because the database statistics were also out of date and the indexes had been rebuilt, we updated the statistics as part of the maintenance.
After index maintenance finished, we cleared the appropriate caches and ran the same test run again.
Overall, Page Response Time improved by an average of 16%. We expect a 4% variation between test runs from noise, so the improvement is significant.
Below is a chart showing the 5 most commonly called web pages accessed by the workload. All of these are HTTP requests to UIModelingService.ashx, the page used by the UI modeling service to construct the interface that the end user sees; it was selected because it is the most frequently accessed page and shows a large amount of improvement. Because the specific page referenced is the same in each selected item, and because these requests are so frequent, they are broken down by the test each request was a part of. In each case, we saw an improvement after maintenance.
• Constituent_Interaction_Edit – This request makes up 3.5% of all pages accessed during the test run and became around 27% more efficient, from 26 mSec to 19 mSec per request. The improvement was 7 mSec per request, with 36047 requests made over the entire hour, resulting in a total of 261 seconds saved.
• Household_Edit – This request makes up 6.3% of all pages accessed during the test run and we saw a 21% improvement, from 25 mSec to 20 mSec per request. The improvement was 5 mSec per request, with 63524 requests made over the entire hour, resulting in a total of 340 seconds saved.
• Constituent_Solicit_Code_Add – This request makes up 6.3% of all pages accessed during the test run and we saw a 25% improvement, from 25 mSec to 19 mSec. The improvement was 6 mSec per request, with 65617 requests made over the entire hour, resulting in a total of 408 seconds saved.
• Constituent_Edit_Address – This request makes up 7.8% of all pages accessed during the test run and we saw a 24% improvement, from 30 mSec to 23 mSec. The improvement was 7 mSec per request, with 78788 requests made over the entire hour, resulting in a total of 551 seconds saved.
• Constituent_NameFormat_Edit – This request makes up 15.3% of all pages accessed during the test run and we saw a 25% improvement, from 27 mSec to 20 mSec. The improvement was 7 mSec per request, with 156250 requests made over the entire hour, resulting in 1037 seconds saved.
Defragmenting the indexes improved average response times for the 5 most frequently accessed pages by around 23%, despite an increase in the number of times these requests were made. Adding up all of the time saved gives 2597 seconds, around 43 minutes that the servers can now spend processing other requests.
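The total above can be checked directly from the per-page savings already reported:

```python
# Seconds saved per page over the one-hour run, as reported above.
saved = {
    "Constituent_Interaction_Edit": 261,
    "Household_Edit": 340,
    "Constituent_Solicit_Code_Add": 408,
    "Constituent_Edit_Address": 551,
    "Constituent_NameFormat_Edit": 1037,
}
total = sum(saved.values())
print(total, "seconds saved, ~", round(total / 60), "minutes")
# prints: 2597 seconds saved, ~ 43 minutes
```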
In addition, we saw that the rate of work submitted to the system increased overall by 4%. Due to the way our workload functions and the level of work we have set as its goal, more tasks are attempted when tasks complete faster (this is not ideal behavior and will be addressed in future iterations of the workload). It is important to note that while we generally saw an increase in the number of tasks run, this was not the case for the group of tests called Init tests. These tests are used by the workload to log users in for testing purposes, which does not reflect how users behave when using the system, so a decrease in the number of these tests run is not indicative of meaningful change within BBCRM.
Why did the system run faster with less fragmented indexes? Indexes are used to quickly find data in tables, but when an index is fragmented, more pages must be accessed when using it, which indirectly results in more reads from the physical disk. We can measure the cumulative disk activity using the performance counter Disk Read Bytes/sec for the data drive, illustrated in the chart below. There is about a 13% reduction in disk reads, even though the system is supporting about 4% more activity.
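These two figures can be combined into a back-of-envelope estimate of the I/O reduction per unit of work; the resulting ~16% figure is derived here and is not stated in the test results themselves:

```python
# Reads fell 13% while the rate of work rose 4%, so the read volume
# per unit of work fell by roughly 1 - 0.87/1.04, i.e. about 16%.
reads_ratio = 1 - 0.13   # disk read bytes/sec, after vs. before maintenance
work_ratio = 1 + 0.04    # work submitted, after vs. before maintenance
per_work_reduction = 1 - reads_ratio / work_ratio
print(f"{per_work_reduction:.1%}")  # prints: 16.3%
```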
Looking at the page response time data, it is apparent that not all pages saw an improvement; a few saw an increase in the time each request took. This is due to noise within our workload created by simple randomness, and because these pages account for comparatively few requests (in total, less than 1% of all requests) they have no impact on this analysis. They are mentioned here only as context.
Conclusion
Rebuilding indexes and updating statistics can improve response time, disk usage, and total test time, as shown by the performance counter and response time data. A good SQL Server maintenance plan should therefore include measuring and maintaining indexes, as this is a proactive way to reduce fragmentation and improve performance in BBCRM.
Next Steps
It is our recommendation that index monitoring and maintenance be made easier to do, preferably by including them in the core product. As this is a task every DBA should be performing, and BBCRM uses a single database for all transactions, placing monitoring and maintenance in the product would make it easier for our clients to maintain their software. SDO will likely still want to fold index maintenance in with other database maintenance tasks such as backups, and thus may not take full advantage of such internal tools, but the benefits to our on-premise clients are great.
A potentially useful follow-up experiment would be to see how long it takes indexes in production systems to become fragmented enough to need defragmenting. This would give an idea of how frequently to run the process and allow us to give better guidance to all clients and SDO.
Appendix A - Setup
• Performance Rig 2
  o System Under Test
     PerfSQL02 – The server that runs SQL Server 2008 R2 and processes requests.
       2x Xeon E5-2650 @ 2.0 GHz (limited to 16 cores)
       96 GB RAM
       Windows Server 2008 R2
       SQL Server 2008 R2 SP2 Enterprise Edition
     PerfIIS06 & PerfIIS07 – The web servers that handle requests from the end users and transfer data from the SQL Server to the end user.
       Virtual servers
       2x Intel Xeon E5-2660 @ 2.20 GHz (4 cores)
       8 GB RAM
       Windows Server 2008 R2
       Load balanced using the Windows native load balancer, 50/50 split
     PTLReportSVR01 – The server that generates reports for the end users. This is referenced by Constituent_History_Report and Itinerary_Report.
       Virtual server
       Intel Xeon E5-2665 @ 2.4 GHz (4 cores)
       4 GB RAM
       Windows Server 2008 R2
       SQL Server Reporting Services
• Heifer Interactive 400 Workload
  o 400 simulated users
  o Standard mix of interactive tests
  o Constant load patterns
  o 5-minute warm-up time
• Large Masked DB
  o Heifer Masked Database
• Tools used for measurement
  o Standard performance counter sets for load testing
• Pre-test steps
  o Performed IIS reset on both IIS servers
  o Created a checkpoint, cleared the procedure cache, and dropped clean buffers on SQL Server
Appendix B - Glossary
Cache – A collection of items that are stored for easier future use. In Blackbaud CRM, many items (like permission to view data) are put into faster storage to speed retrieval.
Database Page – The most basic unit of storage in SQL Server, where data is stored. Pages are created when data is added to a table and more space is needed to contain it. SQL Server must access a complete page to get to the data contained on it, which can lead to additional reads from storage (either memory or physical disk) if the data isn't contained efficiently within a small number of pages (as can happen when indexes become fragmented).
Data Repository – A place where information on the tests is stored for future reference. The information may be gathered from many sources and is placed in a single place for easier retrieval and reference. In Visual Studio load testing, this is where the information for reporting is stored. Specifically, the information is located on PTLR84 in the database Loadtest_Perf.
Database Statistics – SQL Server maintains information that helps it understand where specific data is stored and the size and makeup of tables. It uses this information to determine the most efficient way to retrieve data. For example, a statistic might record that a particular table has only 15 rows; knowing this, SQL Server can decide that it is more efficient to scan the entire table for the needed information than to look in an index.
DBA – Database Administrator, which is the person responsible for the maintenance and operation of SQL
Server. In general, this person is responsible for the security, performance and integrity of the database and
often multiple databases.
Defragment – The process by which data is organized into patterns that are easier to look through. For example,
a disk defragmentation process puts data in linear order so that the hardware can find groups of data faster.
Disk I/O – Input/Output, or the amount of data being transferred to or from the hard drive (or hard drives). Disk I/O commonly refers to how often SQL Server goes to the hard drive (rather than to memory). It can refer to either a specific numerical piece of information (like the number of times SQL Server reads from the hard drive) or a more qualitative description ("high overall disk I/O" means the hard drive is seeing a high volume of reads and writes).
Disk Read Bytes/sec – A numerical performance counter that describes the total number of bytes retrieved
from the hard drive over the period of a second. In our investigation, this counter was sampled every 10
seconds, so the number is the average of 10 1-second periods. For example, take the following distribution:
o 00:00:01 – 500 bytes read
o 00:00:02 – 1000 bytes read
o 00:00:03 – 500 bytes read
o 00:00:04 – 1000 bytes read
o 00:00:05 – 500 bytes read
o 00:00:06 – 1000 bytes read
o 00:00:07 – 500 bytes read
o 00:00:08 – 1000 bytes read
o 00:00:09 – 500 bytes read
o 00:00:10 – 1000 bytes read
o For each 1-second interval the average is as recorded above, but in our report it would be an average of all of these values, so 750 bytes read per second on average.
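The averaging in this example is simply the mean of the ten 1-second samples:

```python
# Ten 1-second samples of Disk Read Bytes/sec, as in the example above:
# alternating 500 and 1000 bytes read per second.
samples = [500, 1000] * 5
avg = sum(samples) / len(samples)
print(avg)  # prints: 750.0
```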
End-user Response Time – A numerical measurement describing the amount of time that passes before
something is presented to the entity making the request.
Index – An object within a SQL Server database that contains references to data on a table, organized in a
specific way to allow for faster retrieval of the data. A table can have multiple indexes organized in different
ways to provide easier access to different data, and SQL Server will intelligently pick the best index for what it
needs to find.
HTTP Web Requests – All communication between a server and a client workstation is done through a series of signals and responses, called requests. HTTP is the foundation of these, and there are multiple types of requests. The two most frequently referenced in testing documentation are:
o GET – Retrieves specified data or an object.
o POST – Asks that the server accept a particular piece of data or an object.
Index Fragmentation – As data is added to a table, SQL Server places it wherever it can find room on the appropriate indexes, which means the data is not always added to those indexes in a continuous way. When this happens, we consider the indexes fragmented, which is usually reported as a percentage: how much of the data on the index is out of order and thus needs to be fixed.
Index Maintenance – The process of making periodic adjustments to indexes within a database. As database indexes can go out of order over time (see Index Fragmentation above), it is important to keep them adjusted. This term usually refers to the specific tasks of rebuilding or reorganizing indexes and tracking overall index fragmentation.
Init Tests – A group of specific scripts used to create pools of users for the workload to use when completing tasks. In general, these run at the beginning of a task during the workload and provide a user context for that task to be completed under. In the context of the Performance Team's workload, these tests log in enough users to complete the specified number of tasks. If the workload needs more users to accomplish that number of tasks (for example, if tasks are slow to complete), more Init Tests will be shown as running as more users are logged in. If the system is efficient, fewer Init Tests may be run.
Interactive – Actions taken in BBCRM that are completed by a user sitting at a workstation rather than by an automated process or mechanical means. For example, opening a Constituent and changing the Address after logging in is a manual task that would be included in a set of Interactive tests. Importing information into a batch and committing the batch after hours is an automated process that would not be.
Isolated – Removed from outside influences. The Enterprise Performance Team's testing servers are on a separated section of the Blackbaud network that carries no traffic other than the tests. They have also been secured so that only members of the Enterprise Performance Team can use them, minimizing the chance of randomness in the recorded numbers.
Noise – Improvements or declines in performance in a test run due to randomness caused by the testing process itself. This can be something like a page that is accessed only a few times, improperly skewing an average, or other uncontrolled and unobserved variables unrelated to what is being considered.
On-Premise Clients – Organizations that provide their own servers and personnel to run Blackbaud CRM, as opposed to clients that use our application hosting services.
Page Response Time – The amount of time it takes for a particular URL to load, measured as the full round trip to and from the web server for that specific request.
Performance Counter – An object that measures usage of an operating system, application, driver or piece of
hardware. Counters are numerical in nature (% CPU Utilization or Disk Read Bytes per Second) and are used in
analysis and observation of the various servers and SQL Server itself to form conclusions about what work is
occurring and how that work is affecting the server.
Realistic workload – A pattern of usage that models observed client use. The Enterprise Performance Team has been dedicated to determining the usage pattern that represents a large percentage of our usage scenarios; in other words, how our clients use the software. The result is a set of tests that simulate a specific way that clients use the system; run together, these tests form a realistic workload.
Rebuilding Indexes – A process for removing fragmentation from an index by recreating it completely, using all
of the available data from the table the index references. This is recommended when indexes are more than
30% fragmented.
Reorganizing Indexes – A process of taking data that already exists on an index and repositioning it so that it is easier for SQL Server to read.
SDO – Service Delivery Organization, the group that manages the environment that runs BBCRM for clients that
choose to allow us to host the software.
Test Run – The execution of a group of tasks in a specific order for a specific amount of time. This generally
refers to tasks involved in creating a desired usage pattern.
Test Rig – A group of servers involved in running and recording a load test.
Total Test Time – The time, in seconds, it takes for a virtual user to complete a single test, which is a logical
group of tasks (like editing a household).
UI Modeling Service – An object that accepts information, provides data, and constructs the interface that will be used to interact with that data.
Updating Statistics – The process of changing the values stored by SQL Server that help it make decisions on
how to find certain pieces of data. This is part of recommended SQL Server maintenance.
Web Test Framework – A set of code, tools and objects that allows us to create ways to mimic user actions
within BBCRM. We have used this as a foundation to create patterns of use.
Work – A generic term describing what SQL Server does. It refers to the amount of use that the hardware is
seeing and can be used to help describe how much activity is occurring on SQL Server.