A Guide to Consolidating SQL Server Data Warehouses
By Allan Hirt
Published: May 2009
Sponsored by Unisys
Consolidation of SQL Server instances and databases has become a common practice over the past
few years. In most environments, this consolidation focuses on application-based databases that generally
support OLTP workloads. But you should consider a completely different type of SQL Server use for consolidation:
a data warehouse. With SQL Server, a data warehouse
typically includes Analysis Services, but it usually has
significant relational components as well.
Contents
SQL Server 2008
Choosing a Direction for Consolidation
Example Data Warehouse Consolidation Scenario
Approaching the Problems
Addressing Disk Performance
Realizing Savings
Summary
Why would you consider consolidating databases involved in data warehousing? The reality is that at the heart
of any data warehouse is the same database technology,
so why wouldn’t you consider doing it? All of the potential benefits and cost savings that can be realized as
part of a “normal” consolidation can also apply to a data
warehouse. It also stands to reason that many of the problems you face with your relational deployments are also
manifested in warehouses. The following list highlights
some of the most common issues and how consolidation
can benefit you:
• Do you have a handle on the data warehouse environment? Is there a list of every server, instance and
database along with what feeds what? If you answered
no to either of these questions, then you’re experiencing
a variation of SQL Server sprawl. Because the landscape is unknown, other problems might exist. Consolidation allows you to contain the sprawl and centralize
where servers, instances, and databases are located and
deployed, thus reducing the overall footprint and easing
administration.
• Was each deployment rolled out in the same way? A
lack of standards for both Windows and SQL Server
configuration leads to potential problems and confusion
when it comes to administering and patching. And it means that one server could be less
stable than another. Consolidation allows you to
standardize the configuration by reducing variations to a manageable few instead of everything
having its own unique configuration.
• Are the database servers powering the warehouse over- or under-utilized? Many warehouses
see constant use, but there could be times that
they are used more than others. For example,
under normal usage, a reporting system may
average 20 percent utilization, but at month
close the system experiences nearly 100 percent utilization for a week. Some of the existing
servers may be relatively idle and not be using much server capacity, while others may be
close to running at 100 percent all of the time.
Neither situation is desirable. The former means
you have excess capacity; the latter means you
do not have enough capacity for future growth.
Consolidation should allow you to have servers
that will meet your performance needs now and
into the future.
• Is your data center at, near or out of space for
new deployments? SQL Server sprawl contributes to the consumption of valuable rack space,
electricity, cooling, ports, and all other physical
aspects with older, less efficient servers. Consolidating and replacing many older and inefficient
servers with fewer, more efficient ones can reduce the overall cost to keep the servers running.
• Are you compliant with all your vendors when it
comes to licensing? If you do not know
what you have, you cannot be sure you are paying what
you should be. Consolidation allows you to contain licensing costs and keep them manageable
and predictable.
• Are you having trouble managing the data
warehouse environment? A major factor driving
companies to consolidation is a desire to reduce
administration costs. Consolidation should allow
you to standardize and centralize administrative
tasks so that the daily responsibilities become
easier to handle, not harder.
• Are you currently getting the availability needed
for each data warehouse deployment? Chances
are that some deployments may not have been
designed with availability in mind. Because the
warehouse has become more central to most
environments, increasing availability is never a
bad idea.
• Are existing servers in your data warehouse using older versions of software, and are they still
supported? Being out of support may not be a
problem until you need support from a vendor.
Consolidation is an opportune time to consider
upgrading to a supported version of all data
warehouse products, including making the big
change to SQL Server 2008.
Consolidation begins with the idea; however, to
move the idea forward, consolidation requires an
executive sponsor who will be ultimately responsible for the project. The executive sponsor will also
be instrumental in securing funding for everything
from new hardware to any additional resources
(e.g., consultants) that may be needed along the
way. Consolidation is a true transformation and is
not a project that can be planned and executed in
a weekend. Remember that not everything should
be consolidated, and what can and cannot be
consolidated is the result of a thorough investigation and analysis phase. Certain applications and
systems—due to various factors discovered during
investigation and analysis—may be better off left
alone and not consolidated.
A consolidation effort will take some time, and
depending on what is discovered and then what
will be consolidated, the work may take
months. The resources chosen to work on
the project need to be reasonably dedicated (if not
100 percent). This will speed the project along.
SQL Server 2008
SQL Server 2008 is the natural choice for consolidating existing SQL Server- and Analysis Services-based implementations. Its memory capacity and scalability
are limited only by the hardware and version of
Windows underneath it.
From an availability perspective, both SQL Server’s
database engine and Analysis Services can be
made available through various methods. The only
availability feature in SQL Server 2008 that is supported by both SQL Server and Analysis Services
is failover clustering. As noted above, the data
warehouse is no longer the server that only a few
people use; chances are it’s central to business
operations, so you would want to make Analysis
Services just as available as the database engine.
SQL Server 2008 supports data compression,
which, at a time when some warehouses are
terabytes in size, could save a lot of
space and potentially increase throughput
for your applications through better
memory and I/O utilization. SQL Server 2005 Service Pack 2
introduced a limited form of compression, the vardecimal storage format;
SQL Server 2008 adds full row- and
page-level compression, which can be enabled at the
table or partition level. Data compression is more
of a factor for applications developed in-house
where you can control the schema; not all third-party applications support this feature.
SQL Server 2008 also introduces the Resource
Governor, which is a powerful performance-related feature. It can especially assist in cases of complex SQL Server environments where performance
problems such as runaway queries—a major concern in a consolidated warehouse environment—
may be difficult to resolve.
SQL Server 2008 Analysis Services includes several new performance features to help improve the
data warehouse performance and manageability.
The write back partitions can now be stored as
MOLAP, which eliminates the need to query the
relational data source as in the previous version
and hence removes the performance bottleneck.
The backup storage subsystem has been rewritten
to scale linearly and can now handle an Analysis
Services database of more than a terabyte. In addition, the limitations on backup size and metadata
files have been removed. A new feature, block
computation, addresses the issue of Null handling
and greatly enhances the performance of queries
regardless of the proportion of null values in the
cube.
Choosing a Direction for Consolidation
There are two main options when it comes to SQL
Server consolidation: physical servers that utilize
multiple SQL Server or Analysis Services instances,
or virtualization. Both approaches are valid, and
ultimately your strategy may be to mix the two
depending on factors such as workload, availability, performance and business requirements. This
subject was tackled in a recent Essential Guide
published by Windows IT Pro, but a brief comparison is offered here.
Virtualization, endorsed by many IT organizations as the ultimate cost saver, is most likely
already in-house. Make no mistake about it: virtualization is great for certain types of workloads
and environments. Currently, the most significant
limitations as they pertain to SQL Server and
Analysis Services would be any constraints inherited from the host virtualization platform, such as
overall performance (including memory, processor
and disk I/O). Because data warehouses tend to
be very I/O hungry, virtualization may not be the
best choice for the entire warehouse. The greatest
downside to virtualization is that you can end up
in the same, if not worse, scenario as pre-consolidation because you’re not really reducing the
number of Windows installations being managed;
you’re just reducing the number of physical servers. If you’re planning on using failover clustering,
it’s important to note that as of the writing of this
paper, Microsoft does not support clustering SQL
Server or Analysis Services as a guest.
SQL Server has supported multiple installations on
a single server or cluster since SQL Server 2000
with the concept of the instance. While virtualization requires more planning than simply performing a physical-to-virtual conversion, more often
than not, deploying multiple instances on a single
physical server or Windows failover cluster is
much easier to manage and maintain. Consolidation should always be done on fewer, larger servers in order to obtain the maximum benefits.
Physical hardware also offers some of the best
performance. In February 2009, Unisys announced that the ES7000 Model 7600R using
six-core processors set a new record for price
and performance using the TPC-H test. The
TPC-H test is designed to show performance in a
DSS or BI-type scenario. Using that server platform, Unisys achieved performance of 80,172.7
QphH@10,000GB and price/performance of
$18.95 per QphH@10,000GB. Performance in the
TPC-H benchmark test is measured as composite
queries per hour against a database of 10,000
gigabytes (QphH@10,000GB), while economic efficiency is shown as price/performance expressed
in dollars per QphH@10,000GB.
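The two published figures are related by simple arithmetic: price/performance is the total price of the benchmarked configuration divided by its throughput. As a rough illustration only, the Python sketch below multiplies the two numbers quoted above to estimate that total. The function name is hypothetical, the resulting dollar figure is derived here and does not appear in the published result, and both inputs are rounded, so treat the output as approximate.

```python
def implied_system_price(qphh: float, dollars_per_qphh: float) -> float:
    """Estimate the total priced configuration from the two published figures.

    Price/performance = total price / QphH, so total price is approximately
    QphH * price/performance (approximate because both inputs are rounded).
    """
    return qphh * dollars_per_qphh


# Figures quoted in the text for the 10,000GB scale factor.
price = implied_system_price(80_172.7, 18.95)
print(f"Implied system price: roughly ${price:,.0f}")
```

Under these figures the implied total is a little over $1.5 million, which gives a sense of the class of hardware being discussed.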
Example Data Warehouse Consolidation Scenario
This section will illustrate a sample data warehouse consolidation scenario. The scenario is as
follows: a company has offices located around the
world. Each branch office needs to share common
sales data and other relevant information. The current data warehouse, approximately 1.8 TB in size,
grows about 12 percent per year. This means the
data warehouse would grow to just under 3.2 TB
within the next five years.
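The five-year figure can be reproduced with a short calculation. The Python sketch below assumes the 12 percent growth compounds annually, which is an assumption (the scenario does not say so explicitly, but compounding is what yields a result just under 3.2 TB). The function name is illustrative.

```python
def projected_size_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Project warehouse size assuming growth compounds once per year."""
    return current_tb * (1 + annual_growth) ** years


# Scenario figures: 1.8 TB today, growing about 12 percent per year.
for year in range(6):
    size = projected_size_tb(1.8, 0.12, year)
    print(f"Year {year}: {size:.2f} TB")
```

The same function can be rerun with a different growth rate or horizon when sizing storage for the consolidated environment.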
Suppose that this example company currently has
nine separate databases that provide the underlying transactional data for the main warehouse, and
that each database is used by one or more of 20
different applications (seven are dedicated reporting applications in various departments, others
are traditional applications). Seven of the nine
databases use SQL Server 2000, which is nearly 10
years old, as the database platform. Each application that has similar functionality was implemented
locally in each region around the world. Some of
the different applications came about as the result
of acquisitions; others came about because, for lack of
communication, someone decided to
implement a new application. Having to support
all of these databases and applications has become
a huge drain on the staff that maintains the warehouse. All of the data sources and the processes for
data extraction and loading are different because
each application’s schema does not match; custom
processes are needed for each application.
Add to that the increasing need for more real-time
data by end users and business decision makers;
in a down economy, more than ever
before, the ability to establish trends quickly may
mean the difference between profit and loss, or between staying in business and shutting down the company. Unfortunately, due to the location of some of the servers around the globe, coupled with the network’s
performance, it may take 24 to 48 hours to extract,
transform and load the data into the warehouse.
By the time end users see the
data (which is not matched up because each subsidiary is loaded at different times), it may be two
or three days old. This lack of agility in obtaining
data could have disastrous results for the company’s
bottom line.
The company’s ultimate goal is to have a backend infrastructure that makes it easier to deploy
applications and databases while also
being much more scalable, available, reliable and agile than the one it has now.
Approaching the Problems
The challenges our example company faces are
not unique, but that does not mean a textbook
answer exists, either. First and foremost consider
how much the company is wasting in terms of
labor and infrastructure costs, as well as opportunity costs associated with all of those different
applications. While it may have made sense at
one point to maintain separate applications that
have similar functionality, it becomes a drain on
both end users and administrative staff. Add to
that the actual cost of multiple support contracts
and software updates, and suddenly each of those
implementations is not as attractive. Moving to
fewer applications, or in a best-case scenario one
main application to serve all subsidiaries, may be
the best alternative. But that will be a challenge to
implement because you not only need to decide
on one application (or a select few), which may
not even be one of the already existing ones, but
once a final decision is made, the application must
be customized, have data migrated to it and be
deployed in production. And finally, you need to
train all end users on how to use this new application. That is not an undertaking to be taken
lightly, but the end return on investment should be
huge and the cost savings should be realized fairly
quickly.
This may be a good time to re-evaluate applications that may peripherally use the warehouse data
or feed it with data. Consider eliminating those
that aren’t being used and update others so that
they are more efficient. For example, assuming
SQL Server 2008 is deployed, consider rearchitecting the schema so that it can take advantage of
data compression. Because the data will be centralized, there may also be the opportunity to build
business intelligence into new applications (or
existing ones) much more easily, providing greater
access to data across an entire enterprise.
When consolidating a data warehouse, do not
forget the need to change and update applications or the impact on the end user experience.
Ensuring that all applications can handle the
changes brought on by consolidation, such as a
new location for a database, may not be a trivial
amount of work (this should be known after the
investigation phase of consolidation). Similarly,
if the connection string for the application is on
each desktop, is it feasible to touch every desktop?
One of the applications the example company
has is a “fat client,” so the company would need
to investigate the effort it will take to reconfigure
every end user’s application settings to point to the
new database backend. The configuration may be
as simple as a configuration file or registry update
pushed out by a management tool such as Microsoft’s System Center Configuration Manager, or
as difficult as walking around and reinstalling the
entire software suite.
Along the same lines, consider the cost of porting
applications or upgrading them to support SQL
Server 2008. Realistically, budget may need to be
reserved to pay for contractors to help in the coding effort since chances are the current staff may
already have a lot of work on their plates that cannot be put aside to fix legacy applications.
Addressing Disk Performance
Arguably, disk performance will present the biggest
challenge. Having a disk subsystem that supports
large growth, performs well, and is easy to maintain is challenging. It’s fairly obvious that the disk
infrastructure supporting the warehouse cannot be
an afterthought, and it may require purchasing additional storage capacity to support the workloads.
This determination can result from an analysis of
the information gathered about the warehouse.
In the consolidation process, think about how
you can better leverage the features of the storage
hardware to be more agile. For example, assume
that post-consolidation fewer applications exist
and they all are located in the same data center
attached to the same storage hardware. As long as
the storage supports it and you get the proper software and licensing from the storage vendor, you can
use features such as disk-based snapshots
not only to perform backups, but also to attach the
underlying snapshot to another server to present
the data, or to refresh development and test environments easily. This may save time and
money by greatly speeding up access to data.
The cost of implementation may be insignificant in
relation to the time administrators gain back and
the increased productivity of end users.
It’s paramount to ensure that the performance
requirements beyond disk I/O are achieved. A few
methods are available to ensure that SQL Server or
Analysis Services are allocated sufficient memory
and processor resources for their workloads. In
a virtualized environment you’re generally at the
mercy of what resources are available at the host
and then presented to the virtualized guest, but
with multiple instances you can not only use the
new Resource Governor feature within SQL Server
2008, but also take advantage of Windows System
Resource Manager. Our example company would
be wise to consider both of those features to see
how they will work in a consolidated environment.
For our example company, it’s anticipated that by
centralizing all of the remaining applications into
one data center (since everyone is connected via
the wide area network, it should not be a problem
for normal application access) the company will
realize considerable savings not only in maintenance and administration costs, but also in latency.
The company can reduce data center costs because it will no longer have to pay for the physical
aspects of the now consolidated servers, such as
air conditioning and electricity, and it can reduce
tasks such as large data extracts over the wide
area network, thereby reducing overall network
traffic. In some regions there may be some
latency associated with the use of the applications, but those will be addressed as they arise.
Realizing Savings
Data extracts, transformations and loads will be
the same size post-consolidation for the example
company, but it should no longer be a problem
to move, for example, 100GB of data between
servers because they will be on the same local
area network. Squeezing that same amount of data
over the wide area network from Sydney to New
York may strain the entire network even if done
off-hours. As mentioned earlier, centralized storage hardware could be used to assist in the data
moves from one server to another.
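To make the difference concrete, the Python sketch below estimates how long a 100GB extract would take over each kind of link. The link speeds (a 45 Mbps wide area link and a 1 Gbps local area link) and the 70 percent usable-bandwidth factor are hypothetical values chosen for illustration; they are not figures from the example company, and real results depend on latency, protocol overhead and competing traffic.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to move size_gb over a link.

    Uses decimal units (1 GB = 8,000 megabits). `efficiency` is an assumed
    fraction of the nominal link speed that is actually usable.
    """
    megabits = size_gb * 8_000
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical link speeds, for illustration only.
wan_hours = transfer_hours(100, 45)      # assumed 45 Mbps WAN link
lan_hours = transfer_hours(100, 1_000)   # assumed 1 Gbps LAN link

print(f"WAN: about {wan_hours:.1f} hours")
print(f"LAN: about {lan_hours * 60:.0f} minutes")
```

Under these assumptions the same extract drops from several hours to well under an hour once both servers sit on the same local area network.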
Data extracts and loads (ETL) could be simplified
if they are designed for and executed on the same
database platform and version. This would reduce
the administrative overhead associated with ETL,
since maintaining processes across multiple versions can get expensive as older versions reach
maturity and end-of-life from their vendors. Fewer
applications translate into less monitoring overhead because there won’t be as
many. It may also be possible to standardize how
the extracts and loads are done to further reduce
administrative costs associated with the warehouse. Again, it should be stressed that changing
these processes is not trivial. It will require a lot of
work, but the end result will benefit everyone.
With a consolidated data warehouse, overall storage capacity should no longer be an issue. For
example, creating that 100GB extract that caused
a major problem at the subsidiary with fewer
resources (for example, it needed to be copied immediately or other applications couldn’t run) will
no longer be a problem. Finally, the company may
be able to substantially reduce or eliminate the
24- to 48-hour delay in getting data updates. This
is arguably the biggest benefit of consolidating:
you have fewer applications and databases in one
location.
Our example company is also realizing that
backups, backup retention, and data archiving
will need to be addressed. The company may not
be subject to a regulation such
as HIPAA, but nonetheless it needs to retain a
certain amount of backups for a period of time
for financial reporting and regulatory purposes.
Multi-terabyte backups, even compressed, are not
small. That will affect a disaster recovery scenario.
The company realizes that purging data from the
data warehouse periodically to attempt to maintain a relatively constant and predictable size of
the warehouse will help performance and capacity
planning.
Most of the example company’s regional subsidiaries did not have high availability for their database
infrastructure. If there was availability, it was for
a select few installations and even those did not
have good track records. By centralizing everything, even applications and databases that once
had poor availability may get high availability. The
company is strongly considering implementing
failover clustering as its primary form of availability because it is a natural fit for physical consolidation and works with both SQL Server 2008 and
Analysis Services 2008.
Because SQL Server 2000 is basically at its end
of life for mainstream support from Microsoft,
the company is concerned that because seven of
the nine warehouse databases are on SQL Server
2000, they may be in danger if something goes
wrong. They would rather be supported than not,
and while they know upgrading to SQL Server
2008 will take some work, the end result will be
worth it.
Summary
Consolidation should allow you to deploy applications to a centralized environment in a standard
way. Database consolidation takes this a step further by reducing the database footprint within an
existing company. Applying the same approaches
and techniques of database consolidation to a data
warehouse environment should be straightforward.
No longer should it take months to deploy a business intelligence application; it should be a matter
of days once the application is ready to “go live.”
If you care about having a warehouse with better
availability, scalability, reliability and agility that is
cost-efficient, then it should be an easy decision to
investigate consolidation.
Allan Hirt has been using SQL Server in various
guises since 1992. For the past ten years, he has
been consulting, training, developing content,
speaking at events, and authoring books, whitepapers, and articles related to SQL Server architecture, high availability, administration, and more.
His upcoming book Pro SQL Server 2008 Failover
Clustering (Apress) is due to be published in June
2009. Before forming Megahirtz in 2007, he most
recently worked for both Microsoft and Avanade,
and still continues to work closely with Microsoft
on various projects including contributing to the
recently published SQL Server 2008 Upgrade
Technical Reference Guide. He can be contacted
through his website http://www.sqlha.com or at
[email protected].