Performance Matters
The best scalability testing and performance optimization tips for your TARGIT solution

Content
  Performance Matters
  What impacts performance?
    Queries
    Hardware
    Data structure (data model)
  Scalability testing
    Step 1: Select benchmark queries
    Step 2: Capture the queries
    Step 3: Create the scalability testing SSIS packages
    Step 4: Results of the scalability test
  Performance optimization steps
    Raw cube with no optimizations (baseline)
    Cube with aggregations
    Cube with partitions and aggregations
    Remodeled cube with partitions and aggregations
    Relational data model with SQL partitions, compression, and TARGIT aggregations
    Tabular
  Conclusions
  Appendix
    Appendix 1: Performance tuning development files (zip package)

Performance Matters
This whitepaper describes in detail how to perform TARGIT Decision Suite scalability testing and how to use the five-step performance optimization guide. It also benchmarks performance across the following BI platforms:

  Microsoft Analysis Services 2014
  TARGIT 2014 ROLAP on a SQL Server 2014 database
  Microsoft Tabular 2014

The goal of this whitepaper is to give insight into what impacts performance on large BI solutions and to provide the tools to identify bottlenecks and improve performance. In TARGIT Consulting Services, we strive for two optimization goals when we deliver solutions:

1 ETL goals: A full load of data should take no longer than three hours to run, and incremental (daily) loads should take no longer than one hour.
2 Front-end goals: The top 10 most used analyses/dashboards and reports should take no longer than 10 seconds to load.

We'll focus on the front-end goals by identifying slow-performing TARGIT analyses/reports and applying different performance tuning techniques to optimize the load time.

What impacts performance?
There are many factors that impact performance, but three contributors are most often the culprits: queries, hardware, and data structure (data model).

Queries
The primary thing to look into when a TARGIT analysis or report performs poorly is how the query is constructed. This is the easiest place to improve performance, as it doesn't require any back-end development. Several choices in the design of the queries can make a huge impact on performance. Ensuring that queries respect the design of the cube and hit the aggregations is key to getting good performance out of Analysis Services and ROLAP data models. Below are a few pointers for better performing queries.

Analysis Services
If you have multiple attributes from the same dimension, use member properties instead of dimension attributes.
Avoid nesting dimensions if possible. If data is required from multiple dimensions, consider building them into one dimension and use the member properties option to improve performance. (This will be shown in a later example.)
Be careful with large dimensions in general, as they tend to scale worse than other dimensions. If a dimension has more than 1 million members, you should consider redesigning it to cut it down in size if possible.
Use complex MDX with caution, as it calculates results at run time. If the logic can be moved back to the ETL and processed into a physical measure, that will provide significantly better performance.

TARGIT ROLAP
When building queries on TARGIT ROLAP data models, it is important to ensure filters are applied before running queries. Because ROLAP data models are usually not aggregated as much as an Analysis Services cube, it's important not to run queries that require scanning the entire fact table. Filtering by a period usually does the trick.
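As a minimal sketch of the period-filter advice above (table and column names are hypothetical, not taken from the demo database), a ROLAP-style query with a period filter only has to scan the rows for that period:

```sql
-- Hypothetical star schema: FactSales joined to DimDate.
-- The period filter lets the engine read only the matching rows
-- (or partitions) instead of scanning the entire fact table.
SELECT d.[Year],
       SUM(f.SalesAmount) AS SalesAmount
FROM   dbo.FactSales AS f
JOIN   dbo.DimDate   AS d ON d.DateKey = f.DateKey
WHERE  d.[Year] = 2014          -- filter applied before aggregation
GROUP BY d.[Year];
```

Without the WHERE clause, the same query would have to run through the whole fact table, which is exactly the situation to avoid on an unaggregated ROLAP model.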
Using TARGIT ROLAP allows you to nest dimensions and work with larger dimensions, as all queries from TARGIT go directly against the relational database.

Hardware
Hardware naturally plays a strong role when it comes to the performance of your BI solution, especially when dealing with large cubes. Analysis Services and SQL Server in general use a lot of resources, both when processing data and when querying, so the hardware setup has to be tuned properly. The server requires a fast CPU and plenty of memory, but the bottleneck is often the disk system. Queries against a large Analysis Services cube can produce a very large number of disk IOPS and create disk queues, which essentially means wait time for the users. There are basically two routes to handling that problem.

Solution 1: Buy a disk system that can cope with the large number of disk IOPS. This solution can be very expensive, because it typically requires a large direct-attached disk array with multiple drives striped to handle large result sets on a busy TARGIT solution. For the same reason, we typically do not recommend running large Analysis Services solutions on virtual servers, as disk operations usually suffer from being virtualized.

Solution 2: There is a shortcut to improving hardware performance without having to spend a fortune on a large disk system: place the Analysis Services and/or database files on local SSD (solid state) drives, especially multiple SSD drives in a RAID 10 setup. You can improve the throughput of the disk system the same way by attaching local SSD drives to a virtual server.

But rather than relying on large hardware setups, it is advisable to look into how you can redesign the cube so that it requires fewer disk IOPS to retrieve the data. Some techniques for doing that will be discussed later in this whitepaper.
Data structure (data model)
Proper data model design is something that has been lacking in recent years, as OLAP cubes have taken over most of our BI disciplines. However, ensuring data is stored in a simple star schema is still recommended and good practice for better performance on the data warehouse.

For enterprise data warehouses, designing partitions both on the data warehouse and in the Analysis Services cubes is strongly recommended to make cube processing and queries run faster. Slicing data into smaller partitions means less data has to be read and fewer disk IOPS are required, thereby avoiding bottlenecks on the hardware. Partitioning by period is typically a good approach, as most queries filter on periods. Some BI solutions might contain dimensions that are just as central as the time dimension, however, and these should also be considered for partitioning.

On later versions of SQL Server, some new features have been made available with the purpose of limiting disk IOPS. One of them is table compression, which is recommended for larger fact tables, as compressed tables can be scanned a lot faster when used in ROLAP data models. The ROLAP technology (building cubes in TARGIT Management) is increasingly used because it opens up the ability to run on real-time data. It's a strong alternative to running Analysis Services, as it can take advantage of many of the new SQL engine features such as columnstore indexes and memory-optimized tables (SQL 2014).

Finally, tabular models are starting to find their footing in the BI world and should be considered when building new solutions. They do require a completely different hardware configuration, but when properly designed, they deliver great overall performance.

Scalability testing
The goal of the following scalability test is to provide a method for testing how well a BI solution performs when you run multiple queries against it.
The test will run only a specific set of queries, but it simulates five, 10, 20, or 50 users running those queries at the same time. That way it's easy to see whether your BI solution will live up to your performance goals, even under extreme pressure. As a rule of thumb for a very active BI solution, the concurrent users on the system at any given time usually do not exceed 20 percent of the total user base, so 50 concurrent users would represent a solution with at least 250 users.

Step 1: Select benchmark queries
The first step in the scalability test is to select a set of analyses and reports that provide a good all-round representation of what the users are requesting in the system. You can use the TARGIT logging database to find out which analyses/reports are requested most often, or ask the users to point out five to 10 analyses/reports that represent the data they pull from the system on a daily basis.

For this test I will select five analyses that use different ways of querying data, to find some patterns in how I should design my analyses and reports in order for them to scale well.

Analysis 1: Classic analysis. Three objects with one dimension and one measure in each.
Analysis 2: Classic analysis with filter. Two objects with more details, with a filter on only one year's data.
Analysis 3: Comparison analysis. One object with four different comparison elements. (Note that each comparison element creates a separate query.)
Analysis 4: Nested dimension analysis. Two objects with two small dimensions and one measure. (Note that when using multiple dimensions in the same object, the queries tend not to scale as well.)
Analysis 5: Detailed analysis. One object listing multiple attributes from a very large dimension with more than 2.4 million members.

Step 2: Capture the queries
Once the reports and analyses have been selected for the test, each query generated against the Analysis Services cube or SQL database must be captured.
This is done with the SQL Profiler tool, which needs to be running while you run each of the reports/analyses from TARGIT. SQL Profiler comes with SQL Server and captures active queries on both SQL databases and Analysis Services.

1 Open SQL Profiler and log onto the Analysis Services server where your cube is located.
2 From the Event Selection, make sure Query Begin, Query End, and Query Subcube are all selected.
3 Once you run the trace and execute queries from TARGIT, you should see logs come through to the Profiler.
4 Copy the queries located under Query End and put them in a text file. Note that a report/analysis will most likely have several queries, as each object and each dimension in the criteria bar creates a query of its own.

A side note on the Event Subclass of the Query Subcube event: it will tell you whether data is fetched from cache, from an aggregation, or directly from the cube files. Use this to confirm whether or not your queries are hitting the aggregations.

Step 3: Create the scalability testing SSIS packages
1 From SQL Server Data Tools (or Business Intelligence Development Studio in versions prior to SQL 2012), create an SSIS package that includes all queries from each analysis/report in a sequence.
2 Create an OLE DB connection to the Analysis Services cube and use an Execute SQL task to run the queries. Even though it's called a SQL task, you can use it to run MDX as long as it is pointed at a cube.
3 Create a SQL task for each report/analysis and place them in a container called User 01.
4 Use BIxPress to apply an auditing framework to the package. This will allow each step of the SSIS packages to be logged to a SQL database. From there we can see how long each query took to run, and we can build a small ROLAP model on top so it can be analyzed from TARGIT. For more info on how to apply the Auditing Framework using BIxPress, please visit www.pragmaticworks.com. The ROLAP data model can be found in the zip file attached in Appendix 1.
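As a sketch of how the logged steps can then be inspected, a simple query per package run could look like the following. The table and column names here are hypothetical placeholders; your BIxPress audit schema will differ:

```sql
-- Hypothetical audit log table written by the SSIS auditing framework.
-- Each row represents one executed task, i.e. one captured MDX query.
SELECT PackageName,                          -- e.g. the 5-, 10-, or 50-user package
       TaskName,                             -- e.g. the container/analysis name
       DATEDIFF(SECOND, StartTime, EndTime) AS DurationSeconds
FROM   dbo.TaskAuditLog
ORDER BY PackageName, StartTime;
```

A query along these lines is the basis for comparing the baseline run against the multi-user runs.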
We now have a baseline for how long each report takes to run with one user on the system.

5 Go back to the SSIS project and create a new package with the same queries included. Copy the container so there are five users running the same queries.
6 Change the order of the queries so they run in a different sequence for each user. Be sure to rename the containers so they are called User 01, User 02, etc. Test number two can now be compared with the baseline test.
7 Repeat the same process in SSIS so there are packages with 10, 20, and 50 concurrent users.

Step 4: Results of the scalability test
After running all five tests, here are the results: as we see in the results above, running the same analysis from 50 concurrent users takes 34 seconds, compared to 12 seconds when there is only one user on the system. Especially in the last two analyses, where we are nesting dimensions and looking at a large dimension, the system seems to scale significantly worse than in the analyses with only one dimension per object.

There are a lot of possibilities with this type of testing: it's a great way to document performance, and if you make changes to your cubes or your hardware, you can run another test to see how the changes improved it. Note that this test used caching heavily, but you can clear the cache between each query to get a worst-case picture of the performance. Clear the cache with this query:

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Cube Name</DatabaseID>
  </Object>
</ClearCache>

Also note that this test only measures the time it takes to retrieve the data from Analysis Services, not the time it takes for TARGIT to render it. To see that, you need to use the TARGIT log tables in the auxiliary database.

Performance optimization steps
Now that we know how to test the performance, let's look into what actions we can take to improve the performance of our cubes.
As we saw in the results of the scalability testing, performance issues usually happen when we use large dimensions or when we nest dimensions. The following performance optimizations take those exact issues and apply different techniques to try to resolve them. Before we get started, let's take a look at the data model we are going to work with in these examples: this demo sales database consists of a fairly large fact table and some small dimension tables, as well as a large product UPC dimension table. The hardware we are running on for this demo is an Intel i7 4-core CPU, 16 GB RAM, and SSD drives for both system and data.

The test analysis we will be using is a textbook example of an analysis that nests two dimensions in the same object: Product Style and Store. A similar example would be All Items and All Customers in one object, etc. Even though this is a very relevant query for the business user, it is a heavy task for Analysis Services to handle, as it needs to scan through all possible combinations and then throw away all the empty combinations. The two objects on top show the same two dimensions on their own for comparison purposes.

The idea for this test is to apply optimization methods one by one and see how well each step improves the speed of the query and the resource demands on the system. The following six steps will be tested:

Analysis Services:
1 Testing a raw cube with no optimizations (baseline)
2 Testing a cube with aggregations
3 Testing a cube with partitions and aggregations
4 Testing a remodeled cube including partitions and aggregations

ROLAP:
5 The same database running from a relational data model built in TARGIT Management

Tabular:
6 The same database running as a tabular model

Logging has been enabled on the Antserver from TARGIT Management in order to track the query time and the resources used by the system. Make sure an auxiliary database has been set up to capture the logs.
A performance monitor log has been set up to capture the CPU, memory, and disk IOPS on the system and log them to the same database, using the following counters:

  Avg. Disk Bytes/Read
  Avg. Disk Bytes/Write
  Avg. Disk Read Queue Length
  Avg. Disk sec/Transfer
  Avg. Disk sec/Read
  Avg. Disk sec/Write
  Disk Reads/sec
  Disk Writes/sec
  Disk: % Idle Time
  Memory: Available MBytes
  CPU: % Processor Time

A real-time relational data model has been set up in TARGIT Management to combine data from the TARGIT log with data from the performance logging tables. A backup of the views and data model can be found in Appendix 1.

Raw cube with no optimizations (baseline)
Let's first run the analysis on a raw cube without any optimizations, to give us a baseline from which to work. The analysis took 1:26 to load and used 16% CPU and 10,160 IOPS to retrieve the data from Analysis Services.

Cube with aggregations
Now let's try to add aggregations, both standard and custom. The report now took 1:12 to run and used slightly fewer IOPS, but not much of a difference.

Cube with partitions and aggregations
Now let's try to add partitions. Since we have four years of data and a fairly large fact table, it makes sense to partition it by the date dimension. I have chosen to make eight partitions with half a year of data in each. The analysis now takes only 52 seconds to run and uses only 2,492 IOPS. That's about one fifth of the IOPS the original raw cube required. The partitions improved the performance because there is a filter on 2014 in the analysis, so only two partitions were scanned instead of the entire fact table.

Remodeled cube with partitions and aggregations
A classic optimization trick is to bring the data together in the back end so that all data can be found inside a single dimension. It requires quite a lot of ETL work, as you most likely have to join through the fact table to get all the combinations of your two dimensions.
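A minimal sketch of that ETL step follows. All table and column names are hypothetical placeholders, not taken from the demo database; the point is that the distinct combinations of the two dimensions are materialized into one combined dimension table by joining through the fact table:

```sql
-- Build a combined Store / Product Style dimension from the
-- combinations that actually occur in the fact table, so the cube
-- can expose both attributes from a single dimension (e.g. via
-- member properties) instead of nesting two dimensions at query time.
SELECT DISTINCT
       s.StoreKey,
       s.StoreName,
       p.ProductStyleKey,
       p.ProductStyleName
INTO   dbo.DimStoreProductStyle
FROM   dbo.FactSales  AS f
JOIN   dbo.DimStore   AS s ON s.StoreKey   = f.StoreKey
JOIN   dbo.DimProduct AS p ON p.ProductKey = f.ProductKey;
```

In a real ETL flow you would also add a surrogate key to the combined table and reference it from the fact table, so the cube dimension can be built directly on top of it.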
In this example, I built a combined table with all the store and product style information. The advantage of this technique is that all the combination work is done up front in the ETL process instead of at query time, resulting in rapid retrieval of results. The new combined table can be improved further by building hierarchies and aggregations on top. The result of the same query is now 44 seconds, and the disk IOPS is about the same as in the previous test, as it still has to scan through the same two partitions. This is as fast as we can get this analysis using Analysis Services, but let's have a look at some other technologies for producing the same analysis.

Relational data model with SQL partitions, compression, and TARGIT aggregations
In this example we are going to use the relational data model editor that can be found inside TARGIT Management. A connection is made to the same data warehouse that the cubes were built on, using the original data model (without the remodeled table). A set of aggregations has been designed to improve the performance of the queries: one for each of the dimensions, plus combined aggregations for when more dimensions are used together. The same half-year partitions were made in the SQL database to speed up the scan of the large fact table. Furthermore, compression was applied to each of the partitions to lower the overall disk IOPS; this feature was introduced with SQL Server 2008. The analysis now takes only 16 seconds to run and requires almost no disk IOPS. It uses a little extra CPU, as it needs to decompress the data from the partitions.

Tabular
As we are benchmarking the different technologies, the comparison wouldn't be complete without a tabular model running on the same data warehouse. The original data model was built in-memory using a SQL Server 2014 tabular model. The result of the same query was an impressive 12 seconds, and even fewer disk IOPS were required, naturally, as all data is in memory.
The full 113-million-row fact table was able to fit into memory on my 16 GB test machine without problems.

Conclusions
As a starting point, it is recommended to set up a scalability testing solution to ensure the back-end solution can handle the pressure of multiple users running queries at the same time. The examples shown in this whitepaper are a good way to establish that baseline testing for your TARGIT solution before rolling out new BI areas to users. The techniques can be used on cubes, ROLAP, and tabular models.

If a bottleneck is found during testing, there are several techniques for improving the performance, as shown in this article. There are many quick wins to be had simply by optimizing the way the queries are put together, such as using member properties and avoiding nesting large dimensions. If nesting is required to create a specific report, dashboard, or analysis, other technologies that handle nesting better can be used. As we saw in the examples, both ROLAP and tabular can speed up queries by three to four times. Finally, general performance tuning on Analysis Services can be achieved by using aggregations and partitions, and by remodeling the cube. The table above shows the requirements on both TARGIT and SQL Server to utilize the different performance tuning techniques and technologies.

Appendix
Appendix 1: Performance tuning development files (zip package)
The following files can be found in PerformanceMatters_DevelopmentFiles.zip and can be used to recreate these techniques on your own TARGIT solution:

Databases (structure scripts):
  StressTest.sql
  TARGIT_SystemDB.sql

ROLAP data models (create connections in TARGIT Management and copy the data model files into the TARGIT/Antserver/Settings/Connection folder):
  Stress test: Datamodel.xml
  TARGIT log: Datamodel.xml

Test dashboards:
  Stress test Log.xview
  Performance Log.xview