Supporting Sandbox
Applications
On Production Platforms
By: Carrie Ballinger
Date: October 1, 2007
Doc: 541 – 0006954 – A02
Abstract: Supporting sandbox applications on the Enterprise Data Warehouse platform is a cost-effective and quick solution for getting important business answers from non-production data.
Teradata Confidential — Copyright © 2007 Teradata Corp. — All Rights Reserved
NCR CONFIDENTIAL
Copyright © 2007 by NCR Corporation.
All Rights Reserved.
This document, including the information contained herein: (i) is the exclusive property of NCR Corporation; (ii) constitutes NCR confidential information; (iii) may not be disclosed by you to third parties;
(iv) may only be used by you for the exclusive purpose of facilitating your internal NCR-authorized use of
the NCR product(s) described in this document to the extent that you have separately acquired a written
license from NCR for such product(s); and (v) is provided to you solely on an "as-is" basis. In no case will
you cause this document or its contents to be disseminated to any third party, reproduced or copied by
any means (in whole or in part) without NCR's prior written consent. Any copy of this document, or portion thereof, must include this notice, and all other restrictive legends appearing in this document. Note
that any product, process or technology described in this document may be the subject of other intellectual property rights reserved by NCR and is not licensed hereunder. No license rights will be implied. Use,
duplication or disclosure by the United States government is subject to the restrictions set forth in DFARS
252.227-7013 (c) (1) (ii) and FAR 52.227-19. Other brand and product names used herein are for identification purposes only and may be trademarks of their respective companies.
Revision History

Revision/Version   Authors            Date       Comments
A01                Carrie Ballinger   09-13-07   Initial review version
A02                Carrie Ballinger   10-01-07   Final version
Table of Contents
1. Introduction
   1.1. What is a Sandbox Application?
   1.2. Possible Approaches for Hosting Sandbox Applications
   1.3. The Audience
2. Workload Management Techniques
   2.1. Priority Scheduler
   2.2. CPU Limits
   2.3. Controlling Concurrency
   2.4. Managing Queries
   2.5. Improving the Performance of the Sandbox Work
   2.6. Limiting Disk Space
   2.7. Processing Window Considerations
3. Database Options
4. Operational Considerations
5. Administrative Considerations
   5.1. Administrative Steps
   5.2. Use of Roles
   5.3. Enforcing Temporary Status
6. Conclusions and Recommendations
Table of Figures
Figure 1: Give the sandbox work a dedicated Resource Partition and a low relative weight.
Figure 2: CPU limits may be placed on both the Standard and the Sandbox Resource Partitions.
Figure 3: A throttle controls concurrency levels among a group of queries.
Figure 4: When using Teradata Active System Management, each sandbox application has its own WD with its own throttle.
Figure 5: A Filter rule can prevent inappropriate queries from running.
1. Introduction
The Teradata Database has a long history of supporting load-and-go applications.
In the 1980s and early 1990s, initial users of the database frequently threw data into
Teradata with little concern over process or preparedness. Over the years, as enterprise data warehouses evolved, so did the recommended procedures and practices for keeping the data clean, well-integrated, and secure.
However, the ability to easily absorb unrefined data stores and extract quick insights
from them still remains. At the same time, there is an increasing demand from business users to undertake these more adventuresome applications. Techniques built
into the foundation of the database, along with workload management options that
have deepened over the years, come together to support coexistence of the rough
and the refined in a single Teradata data warehouse.
Loosening of standards across the board is not being suggested. An enterprise data
warehouse benefits from strong governance, well thought-out data models, and
clean data. System administrators, well-educated in these production practices, are
often apprehensive when it comes to lowering the bar for non-standard activities.
But there are times when it is acceptable to bend, or bypass, those rules. Creating
sandboxes on your production platform is one of those times.
1.1. What is a Sandbox Application?
The typical sandbox application involves a collection of data on which in-depth exploratory analysis needs to be done to answer one or more critical business questions. Such analysis might be a one-off event. The data is dropped in for a brief period of time, it's scrutinized, and then it's disposed of. For example, there may be a need to look at a set of web log rows that originate from another company, one that is being considered as a partner. A quick decision on this partnership is required, and timely analysis of these web logs may help you reach it.
In other cases, a sandbox application could be under consideration for formal incorporation into the data warehouse. But until a modest proof-of-concept is run, it’s not
possible to know if the value is there or not. In such cases, bringing the data into the
Teradata data warehouse to fill the immediate need is an acceptable action, and can
put that data on the road to production status faster.
In either case, the Teradata Database can be used as a robust analysis platform for
unprepared and un-modeled data. New insights and conclusions can be quickly and
efficiently unearthed. “Go” or “no-go” decisions can be promptly reached.
Characteristics of sandbox applications usually include the following:
• The scope is very focused, sometimes limited to getting a single answer.
• The analysis to be done requires persistent data sets.
• A single collection of data is put in place by one or more informal load events.
• The data is segregated within a separate database.
• Access is by a small set of known users making ad hoc requests.
• The sandbox data has a limited shelf life, for example 1 to 3 months.
• Low monitoring, data cleansing, and availability requirements.
• May be isolated, or may require joins to production data.
• Absence of service-level expectations.

1.2. Possible Approaches for Hosting Sandbox Applications
There are three different approaches to satisfying sandbox application requirements using a Teradata Database:
1. A separate hardware platform: Offers simplicity, reduced workload management
effort, and protected performance. However, a separate hardware platform will
be required, resources on the sandbox platform may not be fully utilized, and joining to production data will be difficult.
2. A Dual Active environment: Using the active standby platform of a dual active
system offers reliable performance during normal times. However, the sandbox
application will be unavailable during failover. Depending on what else is running
on the standby system, fewer workload management controls may be involved. This approach makes good use of spare resources.
3. The production data warehouse platform: Hardware resources are less likely to be under-utilized, joining to production data is easy, and this is the most cost-effective solution. It enables a clear pathway to becoming integrated with production data and eliminates inter-platform data movement. It will require more workload management, however, and sandbox applications may experience more response-time variability.
This Orange Book concentrates on the third alternative: running the sandbox applications alongside the production work already in the data warehouse. Suggestions on how to set up the environment to support this coexistence will be made in
subsequent chapters.
1.3. The Audience
This book is written for the data warehouse architect, database administrator, or
others who have a basic technical background in the Teradata Database. It is assumed that the reader has a basic understanding of Teradata workload management, including Priority Scheduler and the Filters and Throttles functionality.
2. Workload Management Techniques
Sandbox applications are a special case of processing that is usually treated as background work, just as test or development might be treated when it runs on the
production platform. It is important that the sandbox work get done, but that it not
interfere with the production work.
Workload management tools are the key to bringing sandbox applications into the
production platform. The same workload management techniques that have become invaluable in supporting active data warehousing and mixed workload platforms can be used both to protect the production work from the sandbox application,
and to ensure that the sandbox application has a predictable level of resources.
2.1. Priority Scheduler
The first workload management effort to make when supporting sandbox applications is to set priorities according to the importance of the work. This is the same activity you would undertake using Priority Scheduler to support a mixed workload or
an active data warehouse. This effort involves differentiating the workloads that will be active on the platform and quantifying their relative importance. See the Orange Book Using Priority Scheduler, Teradata Database
V2R6 for guidelines and recommendations on how to implement different priorities.
There are two guidelines for managing sandbox users with priorities when the data
is on the production platform:
1. Place all sandbox users under the control of a Performance Group or a Workload
Definition that maps to its own dedicated Resource Partition. (If a dedicated Resource Partition is not available, then map to a single Allocation Group within a low
priority Resource Partition.)
2. Give the sandbox users a priority that is below most, or all, of the production work.
A relative weight assignment between 1% and 5% will usually be adequate. (See
Figure 1 below.)
Figure 1: Give the sandbox work a dedicated Resource Partition and a comparatively low relative weight. (Diagram: Resource Partitions RP0 Default, RP1 Tactical, RP2 Standard, and RP3 Sandbox, each with its Performance Groups or Workload Definitions and Allocation Group relative weights; the sandbox Allocation Group carries a 2% relative weight.)
In Priority Scheduler, relative weight is an indication of comparative priority and controls frequency of access to CPU. It is not a predictor of, but can influence, CPU
consumption. Work that runs with a relative weight of 1% may, and often does, consume a significantly higher percentage of CPU. See Chapter 5 of the Priority
Scheduler Orange Book for a better understanding of relative weight and what it
means.
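As a hedged illustration of how such a percentage can arise (the exact arithmetic is covered in the Priority Scheduler Orange Book, and the weights here are assumed for the example): if the Sandbox Resource Partition is assigned a weight of 5 against a total of 100 across all active Resource Partitions, and the SB Allocation Group is assigned a weight of 40 against a total of 100 within that partition, its relative weight works out to

    (5 / 100) x (40 / 100) = 0.02, or 2%

which matches the value shown for the sandbox Allocation Group in Figure 1.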
2.2. CPU Limits
Because a low relative weight does not always translate into low CPU consumption, stronger techniques may be needed in order to contain the CPU used by the sandbox applications. The second workload management technique to consider is CPU limits.
CPU limits are a Priority Scheduler option that restricts the percentage of CPU that a group of users is able to consume at any point in time. CPU limits can be assigned
to individual Resource Partitions. For background on using CPU limits, see the Orange Book Guidelines for Using CPU Limits for Post-Upgrade Management and Capacity on Demand.
A CPU limit on the Sandbox Resource Partition will limit the resources that can be
used by the sandbox users. This is important in order to prevent them from taking
more CPU than intended away from production work. Where the CPU limit is set
will depend on how much CPU is intended to be used by the sandbox users. In an
example illustrated by Figure 2 below, the CPU limit on the Sandbox Resource Partition is set to 5%. That is a reasonable starting point for selecting a CPU limit for
sandbox work.
The steps involved in placing a CPU limit on sandbox work include:
1. Place the sandbox users within their own, dedicated Resource Partition.
2. Give the Sandbox Resource Partition a CPU limit set at 5%.
If the system is heavily utilized, it is possible that the higher priority production work
will use resources you would like to be available to the sandbox users. To make sure
that some CPU is always available for sandbox users, a CPU limit may be placed on
the Standard Resource Partition as well. Select that CPU limit based on how much
CPU you would like to leave available at all times for the active sandbox work. (See Figure 2.) Reading the CPU Limits Orange Book is essential to understanding this technique and determining the correct CPU limit for non-production work.
Figure 2: CPU limits may be placed on both the Standard and the Sandbox Resource Partitions. (Diagram: the same Resource Partitions as in Figure 1, with a CPU limit of 85% on the Standard Resource Partition and a CPU limit of 5% on the Sandbox Resource Partition.)
The steps involved in setting a CPU limit on non-sandbox, production work include:
1. Follow the recommendations to place all non-tactical user or DBA work into a single “Standard” Resource Partition.
2. Place a CPU limit on that Resource Partition that limits users to whatever level
you would like them to use, but that still leaves some CPU available for tactical
work (optional) and the sandbox work.
It is important to understand how to monitor and tune CPU limits. Chapter 3 of the
CPU Limits Orange Book covers this topic in great detail. This information should be
well-understood before attempting to apply CPU limits.
2.3. Controlling Concurrency
A very low Priority Scheduler relative weight combined with a restrictive CPU limit is likely to be effective in keeping most sandbox applications from interfering with production work. However, a third workload management option is available if the number of sandbox queries being issued reaches such a high level that they begin to slow one another, or the production work, as they compete for limited resources.
Query limits or Throttles are particularly beneficial because they control demand for
all resources on the platform: CPU, I/O, memory, and AMP worker tasks. With fewer sandbox queries active, more of these resources are available for work that is already running, allowing that work to complete sooner and reducing the impact on production activities.
Throttles control how many queries coming from a specific source (Users, WDs, Performance Groups, for example) are allowed to be active at any point in time. Queries that would exceed the limit are typically placed in a delay queue and are eventually run. For more information on throttles and how to implement them, see the
Orange Book Filters and Throttles in Teradata Workload Management.
When a throttle is active, there is a counter that keeps track of the number of queries
that are active under the control of that throttle at any point in time. When a query is
ready to begin running, and the query is under the control of that throttle, a check is
made of the counter. If the counter is below the limit, the query is allowed to run; if
the counter is at or above the limit, the query is placed in a delay queue.
Figure 3: A throttle controls concurrency levels among a group of queries. (Diagram: an optimized query arriving at the Parsing Engine is checked against the Object Throttle; if the counter of active queries, 10 in the example, is less than the limit of 12, the query runs, otherwise it is placed in the delay queue.)
If the production platform is being fully utilized (reaching 95% to 100% CPU utilization for one hour or more each day), it is recommended that throttles be used to control the number of sandbox queries that can be active at any point in time. For very
busy systems, a good place to start is with a limit of 2 queries at a time for each
sandbox application.
If standard workload management is being used, follow these steps:
1. Provide each sandbox application with a unique account name.
2. Create an Object Throttle rule with a low query limit.
3. Associate each sandbox account name to the Object Throttle rule.
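Of those steps, only the account setup is expressed in SQL; the Object Throttle rule itself is created through the workload management tooling (Teradata Dynamic Workload Manager) rather than through a SQL statement. A hypothetical sketch, with invented user and account names, of giving two sandbox applications distinct account names while keeping them under the same sandbox Performance Group:

    -- Hypothetical: one account name per sandbox application, so an Object
    -- Throttle rule can target each application separately.  '$SB$' is the
    -- assumed sandbox Performance Group from section 2.1.
    MODIFY USER sbx_weblog_user  AS ACCOUNT = ('$SB$WEBLOGPOC');
    MODIFY USER sbx_partner_user AS ACCOUNT = ('$SB$PARTNERPOC');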
If Teradata Active System Management is being used:
1. Assign each sandbox application to a different Workload Definition.
2. Add a WD Throttle with a low query limit on each sandbox WD.
3. Map all sandbox WDs to the same Priority Scheduler Allocation Group so that they
may share the same priority. (See Figure 4 below.)
Figure 4: When using Teradata Active System Management, each sandbox application has its own WD with its own throttle, but all share the same Allocation Group and its relative weight. (Diagram: a Sandbox Resource Partition containing Workload Definitions Sandbox1, Sandbox2, and Sandbox3, each with a throttle of 2, all mapped to a single Allocation Group with a 2% relative weight.)
2.4. Managing Queries
In general, sandbox queries are un-tuned and benefit from the any-query-any-time
philosophy. However, some level of query-by-query oversight may prove beneficial
when sandbox queries run alongside production work.
Filters, for example, can prevent queries with extremely high estimated row counts
from ever starting to run. Filter rules allow the DBA to specify conditions under
which a query will be allowed, or not be allowed, to begin execution. If the query
does not comply with the stated rules, the query will be rejected and the end-user
who submitted the query will receive an error message. Refer to the Orange Book
Filters and Throttles in Teradata Workload Management for more information on defining Filters.
Figure 5: A Filter rule can prevent inappropriate queries from running. (Diagram: the Dispatcher checks an optimized query against a list of query rules, for example User A: no scans, User B: less than 3 hours, User C: no Item table access; a query that complies runs, and one that does not is rejected.)
In addition to using Filters to prevent certain queries from running, if Teradata Active
System Management is active, selected queries can be automatically terminated.
For example, it might be better for the production environment if sandbox queries
that exhibit significant processing skew or that use unreasonable levels of CPU are
aborted. Refer to Teradata Active System Management Usage Considerations and
Best Practices for more information on how to set up automatic exception handling.
2.5. Improving the Performance of the Sandbox Work
If you believe that not enough sandbox work is getting through the system, and there
are available resources on the platform that could be used for this purpose, it is recommended that you follow these tuning steps in order to improve the performance
and throughput of the sandbox work:
1. Analyze how much CPU is being used by the sandbox applications with your current settings. This can be seen in Priority Scheduler monitor output, described in
Chapter 6 of the Priority Scheduler Orange Book. Determine if the CPU being utilized by the sandbox work is at about the same level as the CPU limit on the sandbox Resource Partition, an indication that there is more demand than can be met.
If that is the case, then increase that CPU limit before you make any other tuning
changes. Consider doubling it from 5% to 10%.
2. If resources provided to the sandbox work are still inadequate, then increase the
throttle limit to allow twice as many sandbox queries to run concurrently. If current
sandbox throttles are defined with a limit of 2, increase the limit to 4.
2.6. Limiting Disk Space
Controlling disk capacity is another method of apportioning resources in the data
warehouse. By limiting how much permanent space is allowed to support a given
sandbox application, the number of base table data blocks that will be in place to
support this work will be restricted. Fewer data blocks translate to fewer resources
required to read through all the data blocks when performing table scan operations.
Because many sandbox applications are similar to a proof-of-concept, enforcing a
defined level of disk space is sensible and will protect the disk space needed by
production applications. Exceptions requiring more space can be made as needed.
At one Teradata customer site, permanent space of 100 GB is allocated to sandbox
databases as a matter of course. Exceptional sandbox database needs can go up
to, but never exceed, 500 GB of permanent space.
Limiting the amount of spool space that is available to sandbox users is also a common practice, as a technique to control poorly-written queries. However, due to the
ad hoc nature of sandbox queries, this spool limit can be somewhat higher than the typical spool limit associated with production applications.
Where to set the space and spool limitations will depend on the amount of available
space on the production platform, and the nature of the sandbox applications being
housed on that platform.
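A hedged sketch of what such limits can look like in SQL, using the 100 GB permanent space figure from the customer example above; the database and user names are invented, and the actual values should come from your own capacity planning:

    -- Hypothetical: cap a sandbox database at 100 GB of permanent space.
    MODIFY DATABASE sbx_weblog_poc AS PERM = 100000000000;

    -- Hypothetical: give a sandbox user a relatively generous 200 GB spool limit.
    MODIFY USER sbx_weblog_user AS SPOOL = 200000000000;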
2.7. Processing Window Considerations
Workload management techniques are necessary to prevent the sandbox queries
from impacting the production work. Such rules and controls will be especially important during the busy times of the day, the week, and the month. However, processing demands on most Teradata data warehouses are uneven. There may be spare processing power over the weekend, during the evening hours, or on holidays. During these times workload management rules can be relaxed, allowing
more sandbox work to get through the system.
Some changes to consider for off-peak hours include:
• Increase the Priority Scheduler relative weight used by the sandbox application.
• Increase the CPU limit, if one is in place for sandbox work.
• Remove or raise any query limits or throttles.
Changing settings by time of day or day of week can be built into Throttle and Filter rules, because the rule definition includes a start and an end time. Multiple such
rules, with different settings, can be defined for different processing windows.
DBAs who work directly with Priority Scheduler often schedule batch jobs (such as
cron jobs on MP-RAS) to make changes in Priority Scheduler settings at specific
times of the day. Such batch jobs typically change priorities and modify CPU limits.
Teradata Active System Management includes the concept of an "operating period," which represents a processing window. If multiple such operating periods
have been defined, changes from one period to another can automatically trigger
changes to such things as priority, query limits or exception actions.
3. Database Options
Sandboxes support coarse data and no-frills applications. The importance of query
tuning diminishes in such an environment. Queries may only be executed one time,
and the emphasis is on speed of delivery via brute force.
This chapter examines database design conventions for data residing in the sandbox: first, the options you will want to continue to follow; second, the options that are still useful but can be applied less extensively; and third, the options that can be ignored entirely.
The important database conventions to consistently address in a sandbox database
include the following:
• Use of Roles and Views for security and isolation.
• Well-thought-out primary index selections for all tables to prevent skewing and enable well-performing joins.
• Maximum block sizes for improved I/O performance for applications that are heavy users of table scans.
Database options that are less important for a sandbox application include:

• Statistics: Fewer will be needed in an exploratory environment because queries are less likely to be repeated and the data volume will be low. In addition, sandbox analysis is more likely to be looking at all the data, based on full table scans and joins where all rows participate.
• Partitioned Primary Indexing: Offers less value when there are few and unpredictable load events and an absence of repetitive query access. The type of analysis performed is less likely to pick out selected data and more likely to make use of full table scans.
• Secondary or join indexing: The temporary nature of the data, combined with unpredictable access patterns, makes extensive tuning with indexes less fruitful. The space required for such secondary structures may be better utilized by accommodating more base data.
And finally, some database design options can be completely omitted in the sandbox databases. This both saves the analysis time involved in making these settings and eliminates any additional space needed to support these options. Options you can safely eliminate include the following:

• Fallback: Availability is less important on non-production tables.
• Value list compression: The temporary nature of the data reduces the benefit of compression, and the analysis time required to set it up is no longer worth the effort.
• Free space specification on table definitions: With no incremental loading expected in a sandbox, the likelihood of block or cylinder splits is minimal.
• Referential integrity: No value for non-production tables, particularly when the data is not likely to be updated.
• Check constraints: No value when data cleanliness is not a goal and the data is not likely to be updated.
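Pulling those conventions together, a hedged sketch of a sandbox table definition follows. The database, table, and column names are invented, and the MAXIMUM DATABLOCKSIZE clause is an assumption; check the table options available on your release.

    -- Hypothetical sandbox table: explicit primary index, large block size,
    -- and none of the production-only options (no fallback, no secondary
    -- indexes, no compression, no constraints).
    CREATE TABLE sbx_weblog_poc.web_log ,NO FALLBACK,
        MAXIMUM DATABLOCKSIZE          -- assumed clause requesting the largest block size
    (
        session_id   INTEGER,
        page_url     VARCHAR(500),
        visit_ts     TIMESTAMP(0)
    )
    PRIMARY INDEX ( session_id );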
4. Operational Considerations
The sandbox data can be treated as second-class: it is intended to be temporary, it is not integrated into the global data model, and it is not part of the production processes.
Many standard operational routines simply don’t apply to sandbox data.
For these reasons, there is no need to back up the sandbox data. The input file
should be saved so the data can be reloaded, should that become necessary. The
fallback option can be skipped. No ETL processes will be needed, and sophisticated transformation processes are unnecessary. The philosophy of sandbox operations is “take it as it is and get an answer fast”.
For isolated sandbox applications, monitoring and logging may be quite different
than in production. Database Query Log output may be unnecessary, unless there
is a need for after-the-fact resource usage analyses. All of the standard reasons
why DBQL is important to production applications (query tuning, capacity planning, resource consumption, performance analysis) are either less important or not important at all in the sandbox world. Access logging or other audit trail measures can
be bypassed as well.
However, if the sandbox data joins to any of the production tables, then there may
be a clearer role for both DBQL and access logging.
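For that case, query logging can be switched on for just the sandbox users involved. A hypothetical sketch with an invented user name; BEGIN QUERY LOGGING is standard Teradata syntax, but the logging options your site needs may differ:

    -- Hypothetical: log SQL text and referenced objects only for the sandbox
    -- user whose queries join to production tables.
    BEGIN QUERY LOGGING WITH SQL, OBJECTS ON sbx_weblog_user;

    -- And turn it off again when the sandbox application is retired.
    END QUERY LOGGING ON sbx_weblog_user;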
In general, there is no need for the DBA team to review the code that sandbox users
execute or be involved in helping to tune their queries. At one site, queries that
show evidence of being problematic (high CPU to I/O ratios combined with very high
row counts, for example) are automatically terminated, but without the usual follow-up analysis.
At another site, all new applications running on the production platform are given
some level of review. When a new sandbox application is being considered, DBAs
at that site give the expected workload a high-level check-over, so there will be no surprises when its queries begin to execute.
5. Administrative Considerations
While operational considerations are significantly simpler for sandbox applications
running on the production platform, system administration issues are, interestingly, much the same. One noted difference is that when supporting sandbox data on the production platform, it will be important to keep production reports away from the sandbox databases. At one site, batch user IDs are not allowed to access the sandbox
data.
5.1. Administrative Steps
Some of the administrative steps that may need to be taken in order to support
sandbox applications are similar to those activities already being done in production.
They include:
1. Establish a central super-sandbox database with the maximum level of permanent space you wish to dedicate to the sandbox activity, for example 1 TB.
2. Give control over this database to a sandbox DBA whose role is to manage and
administer the sandbox users and queries, their roles, profiles and views. This
sandbox DBA is also the point of contact when communication from a given
sandbox database to other production databases is required, for example if joins
between them are required.
3. It is recommended that independent databases be set up for each sandbox application, created out of the initial super-sandbox database. Each lower-level sandbox database will come with a prescribed level of permanent space, for example 100 GB, and an expiration date, for example 90 days.
4. Account strings, Users and Profiles will be required for each sandbox application.
This will be useful to control spool usage as well as for mapping to Priority
Scheduler Performance Groups and any Object Throttles that may be required.
The actual amount of space allocated for a sandbox application will depend in part
on whether the application intends to make use of production data. For sites that allow their sandbox applications to use production data, less initial permanent space may be required to satisfy a given application's needs.
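A hedged sketch of steps 1 and 3 in SQL, with invented names; the 1 TB and 100 GB figures simply mirror the examples above:

    -- Hypothetical: a 1 TB super-sandbox database, administered by the sandbox DBA.
    CREATE DATABASE sandbox_root FROM dbc AS PERM = 1000000000000;

    -- Hypothetical: a 100 GB child database carved out for one application.
    CREATE DATABASE sbx_weblog_poc FROM sandbox_root AS PERM = 100000000000;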
5.2. Use of Roles
A role is a collection of access rights, which are used to simplify access rights administration. Roles will be essential to keep the sandbox data private and completely separated from production data. Because sandbox data may be raw and unprocessed, it is critical to protect reports based on production data from reflecting any
information that comes from the sandbox tables where the same levels of integrity
are not being enforced. On the other hand, there might be cases where sandbox
data contains sensitive information, such as health or financial history of individuals.
Roles can be used to protect production reporting as well as safeguard the privacy
of the sandbox detail. Set up correctly, only the roles given to sandbox users will
contain the privileges to access sandbox data. And production reports and queries
will use roles that exclude privileges to access sandbox data. In the cases where a
specific sandbox application requires a join to production data, a second role can be
given to those particular users that supports read access to those production tables.
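A hypothetical sketch of that role separation, with invented role, database, table, and user names:

    -- Hypothetical: a role granting full access to one sandbox database,
    -- given only to the sandbox users.
    CREATE ROLE sbx_weblog_role;
    GRANT SELECT, INSERT, UPDATE, DELETE ON sbx_weblog_poc TO sbx_weblog_role;
    GRANT sbx_weblog_role TO sbx_weblog_user;

    -- Hypothetical: a second, read-only role for the case where a join to an
    -- approved production table is required.
    CREATE ROLE prod_customer_read;
    GRANT SELECT ON prod_db.customer TO prod_customer_read;
    GRANT prod_customer_read TO sbx_weblog_user;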
5.3. Enforcing Temporary Status
Once end-users get their sandbox data loaded and begin analyzing it, it is not uncommon to discover new uses for that data, beyond the scope of the original project.
Some sites have noticed a tendency for sandbox data that has been placed on the
data warehouse platform to take on permanent residency, bypassing normal production processes.
Without guidelines and procedures in place, it may be difficult to move sandbox applications off of the production platform after they have met their initial goals. This is
especially true if clear expectations and time boundaries were not defined and
agreed to by all parties before implementation.
At one Teradata data warehouse site, the DBA is struggling to retire sandbox applications that have been on the platform for over 2 years. Such after-the-fact efforts
can take time and careful negotiation, and are not always successful. In addition, a
proliferation of hanger-on applications can make it more difficult to host newer, more
urgent ones.
Temporary status can be enforced if the central DBA staff puts together clearly defined procedures to govern the sandbox applications. For example, at another
site that is supporting sandbox applications, an agreement must be reached between end users and the DBA staff before data for the sandbox application will be
loaded. This agreement includes how much space can be used, what the workload
management settings affecting this application will be, and how long the data can be
retained.
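Because there is no built-in expiration date on a database, a periodic check of the data dictionary can help the sandbox DBA spot applications that have outlived the agreed retention period. A hypothetical sketch, assuming the 90-day lifetime used earlier, an owning database named sandbox_root, and the standard DBC.Databases dictionary view:

    -- Hypothetical: list sandbox databases created more than 90 days ago so
    -- the sandbox DBA can follow up on retiring them.
    SELECT  DatabaseName,
            CreateTimeStamp,
            PermSpace
    FROM    DBC.Databases
    WHERE   OwnerName = 'sandbox_root'
      AND   CreateTimeStamp < CURRENT_TIMESTAMP - INTERVAL '90' DAY
    ORDER BY CreateTimeStamp;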
6. Conclusions and Recommendations
There are both benefits and trade-offs when sandbox applications are supported on
the enterprise data warehouse, as compared to supporting them on a stand-alone
system.
Benefits of hosting sandbox applications on the production platform include:
• The platform is already successfully operational.
• There may be no, or minimal, additional cost in terms of hardware or software.
• Minimal DBA involvement or other support activity will be required.
• The size of the configuration may offer powerful processing potential, particularly at times when production work is less demanding.
• Joining from the sandbox to production data is effortless and efficient.
• Presence on the production platform enables a smoother path to enterprise-level integration in the future.
Trade-offs include:
• More extensive workload management is required.
• Sandbox performance may be more variable due to peaks and valleys in production processing.
• Procedures will need to be put in place to prevent or control the intermixing of production and sandbox data.
Sandbox applications are a natural fit on Teradata data warehouse platforms, as validated by the Teradata Database’s long history of performing well with load-and-go
applications. With some workload management techniques, on top of the robust
parallel processing that comes with the platform, varied and un-tuned requests can
be absorbed readily onto the production platform.