A Guide to Consolidating SQL Server Data Warehouses
By Allan Hirt
Published: May 2009
Sponsored by Unisys

Consolidation of SQL Server instances and databases has become a common practice over the past few years. In most environments, this consolidation focuses on application-based databases that generally support OLTP workloads. But you should consider a completely different type of SQL Server use for consolidation: a data warehouse. With SQL Server, a data warehouse typically includes Analysis Services, but it usually has significant relational components as well.

Contents
SQL Server 2008
Choosing a Direction for Consolidation
Example Data Warehouse Consolidation Scenario
Approaching the Problems
Addressing Disk Performance
Realizing Savings
Summary

Why would you consider consolidating databases involved in data warehousing? The reality is that at the heart of any data warehouse is the same database technology, so why wouldn’t you consider doing it? All of the potential benefits and cost savings that can be realized as part of a “normal” consolidation can also apply to a data warehouse. It also stands to reason that many of the problems you face with your relational deployments also appear in warehouses. The following list highlights some of the most common issues and how consolidation can benefit you:

• Do you have a handle on the data warehouse environment? Is there a list of every server, instance, and database along with what feeds what? If you answered no to either of these questions, then you’re experiencing a variation of SQL Server sprawl. Because the landscape is unknown, other problems might exist.
Consolidation allows you to contain the sprawl and centralize where servers, instances, and databases are located and deployed, thus reducing the overall footprint and easing administration.
• Was each deployment rolled out in the same way? A lack of standards for both Windows and SQL Server configuration leads to potential problems and confusion when it comes to administering and patching. It also means that one server could be less stable than another. Consolidation allows you to standardize the configuration by reducing variations to a manageable few instead of everything having its own unique configuration.
• Are the database servers powering the warehouse over- or under-utilized? Many warehouses see constant use, but there could be times when they are used more than others. For example, under normal usage, a reporting system may average 20 percent utilization, but at month close the system experiences nearly 100 percent utilization for a week. Some of the existing servers may be relatively idle and not be using much server capacity, while others may be close to running at 100 percent all of the time. Neither situation is desirable. The former means you have excess capacity; the latter means you do not have enough capacity for future growth. Consolidation should allow you to have servers that will meet your performance needs now and into the future.
• Is your data center at, near, or out of space for new deployments? SQL Server sprawl, with its older, less efficient servers, contributes to the consumption of valuable rack space, electricity, cooling, ports, and all other physical resources. Consolidating and replacing many older and inefficient servers with fewer, more efficient ones can reduce the overall cost to keep the servers running.
• Are you compliant with all your vendors when it comes to licensing?
If you do not know what you have, you cannot know whether you are paying what you should be. Consolidation allows you to contain licensing costs and keep them manageable and predictable.
• Are you having trouble managing the data warehouse environment? A major factor driving companies to consolidation is a desire to reduce administration costs. Consolidation should allow you to standardize and centralize administrative tasks so that the daily responsibilities become easier to handle, not harder.
• Are you currently getting the availability needed for each data warehouse deployment? Chances are that some deployments may not have been designed with availability in mind. Because the warehouse has become more central to most environments, increasing availability is never a bad idea.
• Are existing servers in your data warehouse using older versions of software, and are they still supported? Being out of support may not be a problem until you need support from a vendor. Consolidation is an opportune time to consider upgrading to a supported version of all data warehouse products, including making the big change to SQL Server 2008.

Consolidation begins with the idea; however, to move the idea forward, consolidation requires an executive sponsor who will be ultimately responsible for the project. The executive sponsor will also be instrumental in securing funding for everything from new hardware to any additional resources (such as consultants) that may be needed along the way.

Consolidation is a true transformation and is not a project that can be planned and executed in a weekend. Remember that not everything should be consolidated, and what can and cannot be consolidated is the result of a thorough investigation and analysis phase. Certain applications and systems, due to various factors discovered during investigation and analysis, may be better off left alone and not consolidated.
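As one illustration of what the investigation and analysis phase might record, the utilization question from the checklist above (a reporting system averaging about 20 percent but nearing 100 percent at month close) can be turned into a simple screening pass over a server inventory. This is only a minimal sketch: the server names, the two thresholds, and the category labels are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Hypothetical thresholds (assumptions, not from the paper): below LOW the
# server has excess capacity; at or above HIGH it has no headroom for growth.
LOW_AVG_PCT = 30.0
HIGH_PCT = 90.0


@dataclass
class ServerUsage:
    name: str        # hypothetical server name
    avg_pct: float   # average utilization over the sample period
    peak_pct: float  # highest sustained utilization (e.g. month close)


def classify(server: ServerUsage) -> str:
    """Label a server by how well its capacity matches its workload."""
    if server.avg_pct >= HIGH_PCT:
        return "saturated"      # close to 100 percent all of the time
    if server.avg_pct < LOW_AVG_PCT and server.peak_pct >= HIGH_PCT:
        return "bursty"         # mostly idle, but the peak still needs capacity
    if server.avg_pct < LOW_AVG_PCT:
        return "underutilized"  # excess capacity
    return "balanced"


inventory = [
    ServerUsage("reporting-01", avg_pct=20.0, peak_pct=100.0),
    ServerUsage("sales-02", avg_pct=95.0, peak_pct=100.0),
    ServerUsage("archive-03", avg_pct=5.0, peak_pct=15.0),
]
labels = {s.name: classify(s) for s in inventory}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Separating "bursty" from "underutilized" matters for sizing: a server that idles most of the month but peaks at month close cannot simply be treated as spare capacity when planning the consolidated hardware.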
A consolidation effort will take some time, and depending on what is discovered and what will be consolidated, the work may take months. The resources chosen to work on the project need to be reasonably dedicated (if not 100 percent). This will speed the project along.

SQL Server 2008

SQL Server 2008 is the natural choice for consolidating existing SQL Server- and Analysis Services-based implementations. Its memory capacity and scalability are limited only by the hardware and the version of Windows underneath it. From an availability perspective, both SQL Server’s database engine and Analysis Services can be made available through various methods. The only availability feature in SQL Server 2008 that is supported by both the database engine and Analysis Services is failover clustering. As noted above, the data warehouse is no longer the server that only a few people use; chances are it’s central to business operations, so you would want to make Analysis Services just as available as the database engine.

SQL Server 2008 supports data compression, which, in an age when some warehouses are terabytes in size, could save a lot of space and potentially increase the throughput of your applications because memory and I/O are used more efficiently. While introduced in SQL Server 2005, data compression is enhanced in SQL Server 2008: it fully supports row- or page-level compression and can be enabled at the partition or table level. Data compression is more of a factor for applications developed in-house where you can control the schema; not all third-party applications support this feature.

SQL Server 2008 also introduces the Resource Governor, which is a powerful performance-related feature. It can especially assist in complex SQL Server environments where performance problems such as runaway queries (a major concern in a consolidated warehouse environment) may be difficult to resolve.
SQL Server 2008 Analysis Services includes several new performance features to help improve data warehouse performance and manageability. Writeback partitions can now be stored as MOLAP, which eliminates the need to query the relational data source as in the previous version and hence removes the performance bottleneck. The backup storage subsystem has been rewritten to scale linearly and can now handle an Analysis Services database of more than a terabyte. In addition, the limitations on backup size and metadata files have been removed. A new feature, block computation, addresses the issue of null handling and greatly enhances the performance of queries regardless of the proportion of null values in the cube.

Choosing a Direction for Consolidation

There are two main options when it comes to SQL Server consolidation: physical servers that utilize multiple SQL Server or Analysis Services instances, or virtualization. Both approaches are valid, and ultimately your strategy may be to mix the two depending on factors such as workload, availability, performance, and business requirements. This subject was tackled in a recent Essential Guide published by Windows IT Pro, but a brief comparison is offered here.

Virtualization, endorsed by many IT organizations as the ultimate savior of costs, is most likely already in-house. Make no mistake about it: virtualization is great for certain types of workloads and environments. Currently, the most significant limitations as they pertain to SQL Server and Analysis Services would be any constraints inherited from the host virtualization platform, such as overall performance (including memory, processor, and disk I/O). Because data warehouses tend to be very I/O hungry, virtualization may not be the best choice for the entire warehouse.
The greatest downside to virtualization is that you can end up in the same scenario as before consolidation, if not a worse one, because you’re not really reducing the number of Windows installations being managed; you’re just reducing the number of physical servers. If you’re planning on using failover clustering, it’s important to note that as of the writing of this paper, Microsoft does not support clustering SQL Server or Analysis Services as a guest.

SQL Server has supported multiple installations on a single server or cluster since SQL Server 2000 with the concept of the instance. While virtualization requires more planning than simply performing a physical-to-virtual conversion, more often than not, deploying multiple instances on a single physical server or Windows failover cluster is much easier to manage and maintain. Consolidation should always be done on fewer, larger servers in order to obtain the maximum benefits.

Physical hardware also offers some of the best performance. In February 2009, Unisys announced that the ES7000 Model 7600R using six-core processors set a new record for price and performance using the TPC-H test. The TPC-H test is designed to show performance in a decision support system (DSS) or business intelligence (BI) scenario. Using that server platform, Unisys achieved performance of 80,172.7 QphH@10,000GB and price/performance of $18.95 per QphH@10,000GB. Performance in the TPC-H benchmark test is measured as composite queries per hour against a database of 10,000 gigabytes (QphH@10,000GB), while economic efficiency is shown as price/performance expressed in dollars per QphH@10,000GB.

Example Data Warehouse Consolidation Scenario

This section illustrates a sample data warehouse consolidation scenario. The scenario is as follows: a company has offices located around the world. Each branch office needs to share common sales data and other relevant information. The current data warehouse, approximately 1.8 TB in size, grows about 12 percent per year.
This means the data warehouse would grow to just under 3.2 TB within the next five years. Suppose that this example company currently has nine separate databases that provide the underlying transactional data for the main warehouse, and that each database is used by one or more of 20 different applications (seven are dedicated reporting applications in various departments; the others are traditional applications). Seven of the nine databases use SQL Server 2000, which is nearly 10 years old, as the database platform. Each application that has similar functionality was implemented locally in each region around the world. Some of the different applications came about as the result of acquisitions; others came about because of a lack of communication, when someone decided to implement a new application.

Having to support all of these databases and applications has become a huge drain on the staff that maintains the warehouse. All of the data sources and the processes for data extraction and loading are different because the applications’ schemas do not match; custom processes are needed for each application. Add to that the increasing need for more real-time data by end users and business decision makers; in this economic downturn, more than ever before, the ability to establish trends quickly may mean the difference between profit and loss, between staying in business and shutting down the company.

Unfortunately, due to the location of some of the servers around the globe, coupled with the network’s performance, it may take 24 to 48 hours to extract, transform, and load the data into the warehouse. By the time end users of the warehouse see the data (which is not matched up, because each subsidiary is loaded at a different time), it may be two or three days old. This lack of agility in obtaining data could have disastrous results for the company’s bottom line.
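The five-year size projection quoted at the start of this scenario follows from ordinary compound growth. The short sketch below reproduces it; the function name and the assumption that the 12 percent rate compounds once per year are illustrative choices, not details stated in the paper.

```python
def projected_size_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Compound growth: size after `years` at a fixed annual growth rate."""
    return current_tb * (1.0 + annual_growth) ** years


# Figures from the scenario: 1.8 TB today, about 12 percent growth per year.
size_after_five_years = projected_size_tb(1.8, 0.12, 5)
print(f"{size_after_five_years:.2f} TB")  # → 3.17 TB
```

The result, about 3.17 TB, matches the "just under 3.2 TB" figure. The same function can be rerun with other rates or horizons when sizing storage for the consolidated warehouse.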
The company’s ultimate goal is to have a backend infrastructure that makes it easier to deploy applications and databases, and that at the same time proves to be much more scalable, available, reliable, and agile than the one it has now.

Approaching the Problems

The challenges our example company faces are not unique, but that does not mean a textbook answer exists, either. First and foremost, consider how much the company is wasting in terms of labor and infrastructure costs, as well as opportunity costs associated with all of those different applications. While it may have made sense at one point to maintain separate applications that have similar functionality, doing so becomes a drain on both end users and administrative staff. Add to that the actual cost of multiple support contracts and software updates, and suddenly each of those implementations is not as attractive.

Moving to fewer applications, or in a best-case scenario one main application to serve all subsidiaries, may be the best alternative. But that will be a challenge to implement. You first need to decide on one application (or a select few), which may not even be one of the existing ones. Once a final decision is made, the application must be customized, have data migrated to it, and be deployed in production. And finally, you need to train all end users on how to use this new application. That is not an undertaking to be taken lightly, but the end return on investment should be huge and the cost savings should be realized fairly quickly.

This may be a good time to re-evaluate applications that may peripherally use the warehouse data or feed it with data. Consider eliminating those that aren’t being used and updating others so that they are more efficient. For example, assuming SQL Server 2008 is deployed, consider rearchitecting the schema so that it can take advantage of data compression.
Because the data will be centralized, there may also be the opportunity to build business intelligence into new applications (or existing ones) much more easily, providing greater access to data across an entire enterprise.

When consolidating a data warehouse, do not forget the need to change and update applications or the impact on the end user experience. Ensuring that all applications can handle the changes brought on by consolidation, such as a new location for a database, may not be a trivial amount of work (this should be known after the investigation phase of consolidation). Similarly, if the connection string for the application is on each desktop, is it feasible to touch every desktop? One of the applications the example company has is a “fat client,” so the company would need to investigate the effort it will take to reconfigure every end user’s application settings to point to the new database backend. The reconfiguration may be as simple as a configuration file or registry update pushed out by a management tool such as Microsoft’s System Center Configuration Manager, or as difficult as walking around and reinstalling the entire software suite.

Along the same lines, consider the cost of porting applications or upgrading them to support SQL Server 2008. Realistically, budget may need to be reserved to pay for contractors to help in the coding effort, since chances are the current staff already have a lot of work on their plates that cannot be put aside to fix legacy applications.

Addressing Disk Performance

Arguably, disk performance will present the biggest challenge. Having a disk subsystem that supports large growth, performs well, and is easy to maintain is challenging. The disk infrastructure supporting the warehouse cannot be an afterthought, and it may require purchasing additional storage capacity to support the workloads.
This determination can result from an analysis of the information gathered about the warehouse.

In the consolidation process, think about how you can better leverage the features of the storage hardware to be more agile. For example, assume that post-consolidation fewer applications exist and they all are located in the same data center attached to the same storage hardware. As long as the storage supports it and you get the proper software and licensing from the storage vendor, features such as disk-based snapshots can be used not only to perform backups, but also to attach the underlying snapshot to another server to present the data, or to easily refresh development and test environments. This may save time and money by immensely speeding up access to data. The cost of implementation may be insignificant in relation to the time administrators gain back and the increased productivity of end users.

It’s paramount to ensure that the performance requirements beyond disk I/O are achieved. A few methods are available to ensure that SQL Server or Analysis Services are allocated sufficient memory and processor resources for their workloads. In a virtualized environment you’re generally at the mercy of what resources are available at the host and then presented to the virtualized guest, but with multiple instances you can not only use the new Resource Governor feature within SQL Server 2008, but also take advantage of Windows System Resource Manager. Our example company would be wise to consider both of those features to see how they will work in a consolidated environment.
For our example company, it’s anticipated that by centralizing all of the remaining applications into one data center (since everyone is connected via the wide area network, it should not be a problem for normal application access) the company will realize considerable savings not only in maintenance and administration costs, but also in latency. The company can reduce data center costs because it will no longer have to pay for the physical aspects of the now consolidated servers, such as air conditioning and electricity. It can also reduce tasks such as large data extracts over the wide area network, thereby reducing overall network traffic. In some regions there may be some latency associated with the use of the applications, but those cases will be addressed as they arise.

Realizing Savings

Data extracts, transformations, and loads will be the same size post-consolidation for the example company, but it should no longer be a problem to move, for example, 100GB of data between servers because they will be on the same local area network. Squeezing that same amount of data over the wide area network from Sydney to New York may strain the entire network even if done off-hours. As mentioned earlier, centralized storage hardware could be used to assist in the data moves from one server to another.

Extract, transform, and load (ETL) processes could be simplified if they are designed for and executed on the same database platform and version. This would reduce the administrative overhead associated with ETL, since maintaining processes across multiple versions can get expensive as older versions reach maturity and end-of-life from their vendors. Fewer applications translate into less overhead in monitoring the processes because there won’t be as many. It may also be possible to standardize how the extracts and loads are done to further reduce administrative costs associated with the warehouse. Again, it should be stressed that changing these processes is not trivial.
It will require a lot of work, but the end result will benefit everyone.

With a consolidated data warehouse, overall storage capacity should no longer be an issue. For example, creating that 100GB extract that caused a major problem at the subsidiary with fewer resources (for example, it needed to be copied immediately or other applications couldn’t run) will no longer be a problem.

Finally, the company may be able to substantially reduce or eliminate the 24- to 48-hour delay in getting data updates. This is arguably the biggest benefit of consolidating: you have fewer applications and databases in one location.

Our example company is also realizing that backups, backup retention, and data archiving will need to be addressed. The company may not be subject to a regulation such as HIPAA, but nonetheless it needs to retain a certain amount of backups for a period of time for financial reporting and regulatory purposes. Multi-terabyte backups, even compressed, are not small. That will affect a disaster recovery scenario. The company realizes that periodically purging data from the data warehouse, in an attempt to maintain a relatively constant and predictable warehouse size, will help performance and capacity planning.

Most of the example company’s regional subsidiaries did not have high availability for their database infrastructure. If there was availability, it was for a select few installations, and even those did not have good track records. By centralizing everything, even applications and databases that once had poor availability may get high availability. The company is strongly considering implementing failover clustering as its primary form of availability because it is a natural fit for physical consolidation and works with both SQL Server 2008 and Analysis Services 2008.

Because SQL Server 2000 is basically at its end of life for mainstream support from Microsoft, and seven of the nine warehouse databases are on SQL Server 2000, the company is concerned that it may be in danger if something goes wrong. The company would rather be supported than not, and while it knows upgrading to SQL Server 2008 will take some work, the end result will be worth it.

Summary

Consolidation should allow you to deploy applications to a centralized environment in a standard way. Database consolidation takes this a step further by reducing the database footprint within an existing company. Applying the same approaches and techniques of database consolidation to a data warehouse environment should be straightforward. No longer should it take months to deploy a business intelligence application; it should be a matter of days once the application is ready to “go live.” If you care about having a warehouse with better availability, scalability, reliability, and agility that is cost-efficient, then it should be an easy decision to investigate consolidation.

Allan Hirt has been using SQL Server in various guises since 1992. For the past ten years, he has been consulting, training, developing content, speaking at events, and authoring books, whitepapers, and articles related to SQL Server architecture, high availability, administration, and more. His upcoming book Pro SQL Server 2008 Failover Clustering (Apress) is due to be published in June 2009. Before forming Megahirtz in 2007, he most recently worked for both Microsoft and Avanade, and still continues to work closely with Microsoft on various projects including contributing to the recently published SQL Server 2008 Upgrade Technical Reference Guide. He can be contacted through his website http://www.sqlha.com or at [email protected].