Facilities and Techniques
for Event Processing
Teradata Database V2R6
By: Active Data Warehouse Center of Expertise
Date: February 14, 2005
Doc: 541-0004922-A01
Abstract: The emergence of event processing in Teradata Database V2R6 allows a new
class of active data warehouse applications to be supported, and deepens the ability of
Teradata to interact with an organization’s operational environment.
NCR CONFIDENTIAL
Copyright © 2005 by NCR Corporation.
All Rights Reserved.
This document, which includes the information contained herein: (i) is the exclusive property of NCR Corporation; (ii) constitutes NCR confidential information; (iii) may not be disclosed by you to third parties;
(iv) may only be used by you for the exclusive purpose of facilitating your internal NCR-authorized use of
the NCR product(s) described in this document to the extent that you have separately acquired a written
license from NCR for such product(s); and (v) is provided to you solely on an "as-is" basis. In no case will
you cause this document or its contents to be disseminated to any third party, reproduced or copied by
any means (in whole or in part) without NCR's prior written consent. Any copy of this document, or portion thereof, must include this notice, and all other restrictive legends appearing in this document. Note
that any product, process or technology described in this document may be the subject of other intellectual property rights reserved by NCR and are not licensed hereunder. No license rights will be implied.
Use, duplication or disclosure by the United States government is subject to the restrictions set forth in
DFARS 252.227-7013 (c) (1) (ii) and FAR 52.227-19. Other brand and product names used herein are
for identification purposes only and may be trademarks of their respective companies.
WebSphere® is a registered trademark of International Business Machines Corporation in the US and
other countries.
WebLogic Integration™ is a trademark of BEA Corporation.
BizTalk Server® is a registered trademark of Microsoft Corporation.
Tibco® is a registered trademark of Tibco Software, Inc.
Revision/Version: A01
Authors: Primary contributors: Rick Glick, Bob Hahn; Others: Carrie Ballinger
Date: 02-07-05
Comments: Initial version
Table of Contents
1. Introduction....................................................................................................1
1.1. Why Do Event Detection in Teradata? ............................................................................... 1
1.2. Tools for Event Generation and Management ................................................................... 2
1.3. Scope.................................................................................................................................. 2
2. Queue Tables .................................................................................................3
2.1. Queue Table Advantages for Event Processing ................................................................ 3
2.2. Considerations Using Queue Tables.................................................................................. 6
2.3. Example of Queue Table Use ............................................................................................ 9
3. Stored Procedures ......................................................................................10
3.1. Standard vs External Stored Procedures ......................................................................... 10
3.2. External Stored Procedure Examples .............................................................................. 11
3.3. A Simple Work Dispatcher Example ................................................................................ 12
4. User Defined Functions -- Scalar ...............................................................15
4.1. Protected vs Nonprotected ............................................................................................... 15
4.2. Opportunities for Scalar UDFs in Event Processing......................................................... 16
5. User Defined Functions -- Table.................................................................21
5.1. How Table Functions Work .............................................................................................. 21
5.2. Table Functions with Transformations and Text Manipulation......................................... 22
5.3. Table Functions with Analysis .......................................................................................... 23
5.4. Table Functions that Generate Data ................................................................................ 23
5.5. Table Functions with External I/O .................................................................................... 28
5.6. UDF Considerations ......................................................................................................... 35
6. Using Triggers in Event Strategies ............................................................36
6.1. The Firing Statement ........................................................................................................ 36
6.2. Trigger Complexity Tradeoffs ........................................................................................... 39
6.3. Other Examples of Event Triggers ................................................................................... 42
7. Enterprise Data Warehouse Considerations.............................................43
7.1. Monitoring ......................................................................................................................... 43
7.2. Security............................................................................................................................. 44
7.3. Workload Management .................................................................................................... 45
8. Interacting with Event Architectures Outside Teradata ...........................48
8.1. Service Oriented Architectures......................................................................................... 48
8.2. How to Expose an Event or Service in Teradata.............................................................. 49
8.3. Example of Teradata within an SOA ................................................................................ 49
9. Final Thoughts.............................................................................................54
Appendix……………………………………………………………………………...55
Table of Figures
Figure 1: Five stages in the evolution of the data warehouse........................................................ 1
Figure 2: Business Process Initiation/Continuation ........................................................................ 3
Figure 3: Periodic polling as a batch approach to capturing events............................................... 5
Figure 4: Queue tables support immediate notification of events .................................................. 5
Figure 5: Selecting a Queue Table primary index for processing performance............................. 7
Figure 6: Mini-batch event processing using queue tables for coordination .................................. 9
Figure 7: All components can be scaled out inside the database .................................................. 9
Figure 8: External stored procedure writes to a queue outside of Teradata ................................ 11
Figure 9: A Spawned Stored Procedure Architecture .................................................................. 13
Figure 10: A UDF is used to parse and process an XML document stored as a CLOB .............. 16
Figure 11: One UPI value is selected, therefore one AMP executes the UDF ........................... 21
Figure 12: Rows selected from the Allamp table control which AMPs do the work ..................... 25
Figure 13: The table function’s output is similar to a derived table ............................................... 27
Figure 14: If the query requests 1 day, only 1 partition is returned by the table function............. 33
Figure 15: Each date selected causes one query to be executed on the remote system............ 34
Figure 16: Processing an event may involve multiple physical transactions ............................... 37
Figure 17: The approach to using triggers can extend or reduce the recovery unit..................... 39
Figure 18: Triggers defined on the TPumpStatusTbl preserving status information ..................... 42
Figure 19: Providing a user and password for external platform.................................................. 45
Figure 20: Tibco Workflow example ............................................................................................. 51
Figure 21: Teradata Adaptor Configuration.................................................................................. 52
Figure 22: Teradata Adaptor Services Settings ........................................................................... 53
1. Introduction
Data warehousing is in a constant state of forward motion. Within that motion, patterns in the evolution of data warehousing can be defined within five typical stages:
1) Reporting; 2) Analyzing; 3) Predicting; 4) Operationalizing; and 5) Activating.
This Orange Book is about the mechanics for event processing and connecting
with the enterprise, addressing the 4th and 5th stages in the evolution of active data
warehousing. The focus will be on implementing event processing inside Teradata
using Teradata Database V2R6 features.
(Figure: Stage 1 Reporting, "What happened?"; Stage 2 Analyzing, "Why did it happen?"; Stage 3 Predicting, "What will happen?"; Stage 4 Operationalizing, "What is happening?"; Stage 5 Active Warehousing, "Making it happen!". The workload evolves from primarily batch, through growing ad hoc queries and analytical modeling, to continuous update, time-sensitive queries, and event-initiated actions.)
Figure 1: Five stages in the evolution of the data warehouse
Each iteration of data warehousing builds upon its predecessor to increase overall
business value. Previous evolutionary stages are stepping stones that create the
conditions that support an integrated enterprise-wide event architecture. Such an
event-inclusive architecture requires integrated decision-making data, needs tactical access and fresh data, and relies on crisp and deep analytic capabilities. Event
processing sits on top of a pyramid built upon and fed from these earlier, more established capabilities.
1.1. Why Do Event Detection in Teradata?
For years, Teradata has been a successful platform for performing analysis on
data. Teradata’s ability to make correlations and enable complex analytics such as
predictive modeling has helped people discover interesting things about their data.
The emergence of event capabilities within Teradata allows what used to be ad hoc
discovery endeavors to be standardized into regular practice.
Teradata is now capable of richer interaction with external systems, which allows
you to deploy analytics closer to the data.
Exploiting the event capabilities inside of the Teradata data warehouse offers advantages that may be difficult to realize with an external event architecture tool.
Embedded events support deep analysis with all of the information already collected in the data warehouse. Inside-Teradata events can interact with other decision-making already in play, with minimum overhead and no manual intervention.
The analysis that these events rely on will be richer and deeper if they unfold inside
the data warehouse.
Also, because it is able to bring together disparate sources of information, Teradata
can offer cross-subject-area conclusions to what may appear to be simple questions. For example, determining whether there is going to be a drug interaction may require
access to and knowledge of an array of different data already in Teradata.
1.2. Tools for Event Generation and Management
Teradata has the following components which can be useful in event-initiated processing and in operationalizing event generation and management:
• Queue Tables
• SQL-based Stored Procedures
• External Stored Procedures
• User Defined Functions
• Table User Defined Functions
• Triggers
With queuing functionality inside the database and the ability to reach outside the
database in real-time via external stored procedures and UDFs, Teradata’s role
has expanded into being a player in the overall event architecture.
1.3. Scope
This Orange Book is intended to explain the mechanics available in the database
to initiate events, to interact with the outside world, and to do in-the-database
analysis of events, both internal and external. It does not address the implementation of specific business functions. This Orange Book discusses
the tools, and leaves what can be done with those tools to a later discussion.
The majority of this Orange Book focuses on the internal tools that support event
processing, illustrated by examples and prototypes. But a second, equally important discussion is presented in Chapter 8: How Teradata can fit into modern service-oriented architectures, such as IBM’s WebSphere® Business Integration
Server, BEA’s WebLogic Integration™, or Tibco® BusinessWorks.
The targeted audience for this Orange Book is Teradata database administrators,
enterprise application architects, business analysts, and NCR/Teradata associates
who have a background in Teradata database management and implementation.
The content and terminology assumes the reader has knowledge equivalent to that
acquired from the Teradata Physical Database Design class.
2. Queue Tables
Queue tables are database objects similar to tables but with the properties of
queues. An Orange Book entitled “Queue Tables User’s Guide” (541-0004817),
published in October of 2004, is a good source of detail on how to use these structures.
Two of the most relevant properties of a queue table are exposed in the SELECT
and CONSUME syntax:
1. Blocking read, which means that if a queue table is empty at the time a query is
trying to access a row, the query will wait until a row is placed in the table.
2. Destructive read, which means when a queue table row is being accessed, it is
automatically deleted from the queue table at the time the transaction commits.
Just briefly, when an event is identified, a row can be written immediately into a
queue table, with appropriate data to identify the event. A queue table is generally
associated with at least one process that is waiting for the appearance of data in
the queue table, data which represents an event. This process will then read a
row, by means of a SELECT AND CONSUME statement, from the queue table and
continue on with processing the event. After the event is dealt with, the process listens for another message.
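As a minimal sketch of that pattern (the queue table name and its message columns here are hypothetical, not taken from one of the prototypes in this book):

-- Publisher: record the event the moment it is detected;
-- the QITS column takes its CURRENT_TIMESTAMP default.
INSERT INTO event_qt (event_type, event_key)
VALUES ('NEW_CLAIM', 1001);

-- Consumer: block until a row is available, then read it destructively.
SELECT AND CONSUME TOP 1 * FROM event_qt;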
(Figure: a program or trigger INSERTs a row into a queue table; a monitoring process performs a blocking SELECT AND CONSUME and then either calls a stored procedure or publishes the event externally.)
Figure 2: Business Process Initiation/Continuation
2.1. Queue Table Advantages for Event Processing
There are several key advantages of queue tables for event processing:
• They provide a highly efficient mechanism to pass information from one place to another at the moment that the information becomes available.
• Queue tables allow you to decouple the notification of an event from its consumption, such that they can be processed asynchronously and independently.
• Queue tables can act as a buffer when inserts are occurring at a different rate than consumption.
• They are a more efficient alternative to periodic polling, the primary method of reporting events inside Teradata pre-V2R6.
• The message itself can be structured by defining columns for different attributes.
• The SELECT AND CONSUME implementation offers both blocking reads and destructive reads.
2.1.1. Efficiency
Both writing to and consuming from a queue table are single-AMP operations, and
as such, avoids the blocking potential and coordination effort of all-AMP activities.
One transaction can be writing to the queue table at the same point in time that a
second transaction is reading and consuming a row with a different row hash, without conflict.
In addition, any part of your IT infrastructure, whether internal or external from
Teradata, can act as a publisher or subscriber, either writing to or selecting and
consuming from a queue table.
2.1.2. Asynchronous Processing
Processing an event asynchronously is similar to leaving a voice mail message
when someone’s line is busy. You can be confident that the message will be delivered in the near future, but you are not required to wait around keeping your own
phone line open and twiddling your thumbs until that time comes. You are free to
terminate your call after leaving your message and continue on with other activities.
Asynchronous processing as represented by queue tables has the same advantage. The poster of the message is not weighed down or held back by whatever
subsequent processing is done based on the posted message. It is free to commit its transaction and move ahead with other work.
2.1.3. Queued Notification vs. Periodic Polling
Because it enables asynchronous processing, data can be loaded into a queue table and buffered until it can be processed by procedures that perform potentially
complex analysis or insertion of data.
Periodic polling relies on a program querying a table at regular intervals, a table
that is acting as a collection point for events. Triggers may have been defined to
insert rows into this collector-table when each individual event is recognized.
The interval of time between program executions may or may not match the appearance of expected events. Some of this timed access may be fruitless, as the
polling program may be casting a net when there are no fish in the pond. Polling
exhibits a constant and unresolvable tension between asking too often and incurring more overhead vs. conserving resources by asking less often and allowing too
much time to go by before the event is properly processed. Queue tables solve
this quandary via blocking reads.
An example from a recently-performed active data warehouse benchmark illustrates the advantages of queue tables over the traditional polling approach to reporting events. In this example, a Business Activity Monitoring (BAM) query reports on the effectiveness of current promotions, based on sales that happened
just a few minutes ago. To support this BAM query’s needs, a subset of rows inserted into the Mkt_Basket_Dtl table are identified as part of a special promotion,
and thus as significant events.
This first graphic below illustrates the pre-V2R6 image of the event processing,
where triggers on Mkt_Basket_Dtl inserted a row into a staging table for each promotional sale recognized. Periodic polling was used in this earlier version of the
benchmark to pull those inserted rows out of the staging table every 15 minutes,
followed by an emptying of the staging table.
(Figure: a trigger on Mkt_Basket_Dtl inserts each promotional sale into a promotional staging table; every 15 minutes a periodic query reads that table to produce a report.)
Figure 3: Periodic polling as a batch approach to capturing events
The periodic polling approach was replaced in the V2R6 version of the benchmark
by queue tables. Using queue tables pushes the information out to a dashboard
where it can be seen and acted on when the event is first noticed.
(Figure: a trigger on Mkt_Basket_Dtl inserts each promotional sale into a queue table; a blocking SELECT AND CONSUME query picks up each row as soon as it arrives and displays it on a dashboard.)
Figure 4: Queue tables support immediate notification of events
The queue table definition from the benchmark follows:
create table event02_QT, QUEUE
(table_event02_QT_QITS TIMESTAMP(6) NOT NULL
DEFAULT CURRENT_TIMESTAMP(6),
orderkey DECIMAL(18,0) NOT NULL,
productkey DECIMAL(18,0) NOT NULL )
PRIMARY INDEX (orderkey);
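The trigger that feeds this queue table is not reproduced in the benchmark description. A sketch of what it could look like follows; the Mkt_Basket_Dtl column names and the test for a promotional sale are assumptions:

CREATE TRIGGER promo_event_trg
AFTER INSERT ON Mkt_Basket_Dtl
REFERENCING NEW AS NewSale
FOR EACH ROW
WHEN (NewSale.promo_id IS NOT NULL)   -- assumed test for a promotional sale
INSERT INTO event02_QT (orderkey, productkey)
VALUES (NewSale.orderkey, NewSale.productkey);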
2.1.4. A Structured Message
Because queue tables are based on Teradata base tables, they support the relational concept of sets of rows composed of different columns. Each row in a queue
table represents a different message; each column, a different attribute within the
message.
Thus, the message being passed in Teradata is formatted, with information describing each column held in the data dictionary. There is no extra effort required
to decompose the message into meaningful attributes, as there would be if the
message were read as a continuous string.
2.2. Considerations Using Queue Tables
Queue tables often act as conduits rather than repositories, as a means of transport, not a destination. For that reason, consider the lifespan of a row placed in a
queue table to be short, compared to a base table, whose rows usually reflect nontransient, often historical data.
As a result, you do not need to be concerned with collecting statistics or indexing.
In the first release, views are not supportable on top of queue tables, nor may you
build join indexes or hash indexes on them. The source or destination of the AS
clause (when copying table definitions) may not be a queue table. Currently,
queue tables may not be replicated. Recognize that there needs to be sufficient
space for the queue table to handle maximum bursts of events, and to account for
downtimes of event handlers.
However, just like having confidence that your voice mail messages will be delivered and acted on, it is extremely important that queues be 100% reliable. Because reliable messaging is key to any application using queues, it is important to
note that Teradata queue tables benefit from all the standard database reliability features, including transient journaling, fallback, and the ability to back up and recover.
Database Query Log (DBQL) treats queue tables the same as ordinary database
tables. Below is output from the DBC.DBQLObjTbl after a single insert into, and a
single select and consume from a queue table. When a queue table access is recorded, DBQL uses the object type ‘T,’ the same object type as a base table.
QueryID  ObjectTableName  ObjectColumnName  ObjectType
31590    ?                ?                 D
31590    QUEUETST1        ?                 T
31590    QUEUETST1        MessageBody       C
31590    QUEUETST1        MessageID         C
31590    QUEUETST1        messageT          C
31595    ?                ?                 D
31595    QUEUETST1        ?                 T
31595    QUEUETST1        messageT          C
31595    QUEUETST1        MessageID         C
31595    QUEUETST1        MessageBody       C
2.2.1. Primary Index Selection
Selection of a primary index may or may not be important when defining a queue
table. The default, if no primary index is specified, is the QITS (Queue Insertion
Time Stamp) column, the first column in the queue table, a column that reflects the
time of insertion of that row. If you need to do a primary index update or delete of
one row from the table, in most cases you can easily browse the table and get the
value of the QITS and other columns needed for the single-AMP update activity.
However, you may choose a key column based on the input data that is easily
known, to support frequent updating. The queue table definition presented in Section 2.1.3 above has orderkey as its primary index, for example. Using a business
entity for the primary index makes sense if you have a requirement to re-order the
queue on a regular basis (or otherwise manipulate the rows), and the number of
rows it holds is not trivial, making browsing the entire queue table for each update
less desirable.
Another situation calls for more attention to PI selection of the queue table. That is
the case where you are performing insert/select processing from a staging table
into a queue table. Such an insert/select might perform a similar function as a trigger when doing row-at-a-time inserts: Select out the few rows that compose
events and insert them immediately into a queue table for further processing.
(Figure: two insert/select scenarios across AMPs. With different primary indexing (staging table PI = ClaimID, queue table PI = QITS), the queue table row is redistributed to a different AMP than the one where the staging table row was inserted. With the same primary indexing (both PI = ClaimID), the staging table row and the queue table row are inserted on the same AMP, avoiding row redistribution.)
Figure 5: Selecting a Queue Table primary index for processing performance
If you are using a mini-batch approach to loading data, then using set processing to
identify events makes sense. The WHERE clause on the insert/select statement
would contain the event-identification criteria. In this case, illustrated above, having a queue table with the same primary index definition as the staging table will
improve the efficiency of the insert/select processing. The two inserts into the staging and the queue table would happen on the same AMP, eliminating row redistribution overhead.
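A sketch of such an insert/select follows; the staging table, queue table, and the event test in the WHERE clause are all hypothetical:

-- Both tables are assumed to share PRIMARY INDEX (ClaimID),
-- so each selected row is inserted on the AMP where it already resides.
INSERT INTO claims_event_qt (ClaimID, ClaimAmt)
SELECT ClaimID, ClaimAmt
FROM   claims_staging
WHERE  ClaimAmt > 100000;   -- event-identification criteria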
Designing a queue table to share the same primary index as a base table might
also be useful for tactical queries. A given tactical application may want to peek at
the queue, or seek out the presence of a specific row on the queue. If the base table primary index value is known, and is also the primary index value of a potential
queue table row, then single-AMP access will be enabled.
A similar situation exists when TPump inserts into a base table that has a trigger that writes to a queue table. If the queue table and the base table being inserted into share the same primary index, and there is a desire to serialize the input to avoid cross-session blocking, then serializing on the base table's primary index will also cause inserts into the queue table to be serialized effectively.
2.2.2. When Requests are Delayed
Programs attempting to SELECT AND CONSUME from queue tables will block until such time as there is a row present in the queue table. No locks on the queue
table are granted to the transaction that is in delay mode waiting for a row.
A limit on the number of sessions that may go into the delayed state
has been set at 20% of the total possible sessions (usually that will be 24 delayed
sessions per node). Once that threshold has been exceeded, an error will be returned to the user whose query would have been the next one delayed.
If Teradata Dynamic Workload Manager (formerly known as Teradata Dynamic
Query Manager) rules are enabled, it is possible that requests intended to select
and consume may themselves be delayed due to object throttle rules (formerly
known as workload limit rules), or rejected due to query filter rules (formerly known
as query management rules). In the former case (that is, when delayed), the
queue table queue depth may increase and events may not be processed as
quickly as anticipated.
2.2.3. Transactional Considerations
In order to avoid tying up database resources unnecessarily within a transaction, it
is recommended that the SELECT AND CONSUME statement happen first in an
explicit, or implicit, transaction. That way if the queue table access statement
blocks, no sister-statement row or table locks will be held, potentially blocking other
transactions.
In the following transaction the SELECT AND CONSUME statement will wait for a
queue table row to be inserted and committed by another transaction. Because it
is the last statement in the transaction, table level write locks placed by the all-AMP
update statement preceding it will be held until the queue table can be read.
BT;
UPDATE CLIENT SET CALLFLAG = 'Y' WHERE ABS_HON_DT < RECEIPT_DT;
SELECT AND CONSUME TOP 1 * FROM QTBL;
ET;
If queue table consume commands are placed within the same transaction as other
statements that hold locks, do the queue table access first:
BT;
SELECT AND CONSUME TOP 1 * FROM QTBL;
UPDATE CLIENT SET CALLFLAG = 'Y' WHERE ABS_HON_DT < RECEIPT_DT;
ET;
It is important to do the appropriate action for an event as part of the same transaction in which it is consumed, in order to prevent events from being lost.
2.3. Example of Queue Table Use
A queue table can be useful to pass information from one stored procedure to another, each of which has a different task in the chain of event processing. For example, if one stored procedure is performing mini-batch insert/selects into a base
table, it could use an insert into a queue table as a method to indicate that a mini-batch cycle is complete.
A second stored procedure could be standing by trying to read from the queue table so it can initiate further, more complex processing, perhaps reading and updating information in another base table, or performing more complex analyses.
Because the queue table supports structured messages, the first stored procedure
could pass detailed information as to what actions need to be taken next. The
second stored procedure, relying on its procedural logic, could branch in the code
depending on what information was passed in the message.
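A minimal sketch of such a consuming stored procedure follows; the table names, message columns, and the branch taken are all hypothetical:

REPLACE PROCEDURE process_load_events()
BEGIN
  DECLARE v_action   VARCHAR(32);
  DECLARE v_batch_id INTEGER;

  -- Block until the load procedure signals that a mini-batch cycle is complete
  SELECT AND CONSUME TOP 1 action_code, batch_id
  INTO :v_action, :v_batch_id
  FROM minibatch_qt;

  -- Branch on the structured message content
  IF v_action = 'SUMMARIZE' THEN
    INSERT INTO daily_summary (batch_id, total_amount)
    SELECT :v_batch_id, SUM(sale_amount)
    FROM   staged_sales
    WHERE  batch_id = :v_batch_id;
  END IF;
END;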
(Figure: a load stored procedure performs mini-batch insert/selects and then signifies completion by inserting a row into a queue table; an event-processing stored procedure selects/consumes that queue table row and processes events in another base table.)
Figure 6: Mini-batch event processing using queue tables for coordination
Queue tables being read by stored procedures lend themselves well to scaling out
as demand grows. When the number of rows in the queue table exceeds the capability of a single stored procedure to process, you may increase stored procedure instances as needed.
(Figure: two queue tables, each read via SELECT AND CONSUME by multiple stored procedure instances (SP1, SP2, SP3), showing how every component can be multiplied as demand grows.)
Figure 7: All components can be scaled out inside the database
3. Stored Procedures
Stored procedures can encapsulate both event detection and its processing, and
allow the mixing of SQL with procedural logic. Teradata has supported standard
stored procedures since the V2R4.0 release. Stored procedures are database objects that must be compiled prior to use. Their object code is held in the data dictionary. Parameters may be passed into and out of a stored procedure, which itself
is a program that is called from and executes within the database.
A stored procedure executes in the parsing engine (PE) under one parser task.
While the SQL portions of the stored procedure will be executed across all AMPs
and benefit from Teradata’s inherent parallelism, the procedural portions will not.
Algorithms will be most efficient when written to make use of Teradata's set processing advantage. Particular attention should be paid to row-at-a-time cursor processing within a stored procedure, as that activity will not be parallelized.
3.1. Standard vs External Stored Procedures
External stored procedures are new in Teradata Database V2R6. External stored
procedures are similar to the standard SQL-based variety, in that they are called
from and execute within the database, they support parameters, and while only one
may be called by any given session at a time, a stored procedure can call another
stored procedure. Either variety may be invoked by a trigger, as a result of an action on a table.
3.1.1. Differences
However, there are some clear differences between the SQL-based and external
varieties. The following list shows some of the ways external stored procedures
are different:
• Implemented in C or C++ code
• Can issue SQL by invoking an SQL-based stored procedure
• Cannot be invoked from another external stored procedure
• Can perform external I/O, such as reading or writing from a file, a message queue, or an EAI bus
• Unrestricted access to the operating system and library functions
3.1.2. Benefits of External Stored Procedures
The essential value of external stored procedures for event processing is apparent
in at least two contexts:
• External communications
• Leveraging C programs and library functions already in existence
In considering communications outside of the Teradata database, reading and writing to an external queue, such as WebSphere MQ (referred to here as MQ), can be
extremely powerful for processing events. Speaking to the advantage of the first
bullet in the differences list above, some analytic algorithms are more easily expressed in C than in SQL. External stored procedures allow those analyses to be
performed under the control of the database.
3.1.3. When to Consider an External Stored Procedure
There are several good reasons to consider an external stored procedure:
• Processing close to the data is a benefit because you have access to other associated data, if needed, and do not incur the overhead of pulling data out in a raw form that may be more cumbersome than its final form.
• Teradata offers sound reliability and availability advantages, and doing all the work in one place eliminates concerns about having more than one system up and running to get the work accomplished.
• Using the Teradata infrastructure, you can easily scale out the processing if you need to, increasing the number of instances of the same stored procedure that you invoke as demand increases. This scale-out can even be automated, depending on the time of day or day of week.
• The signature (input and output parameters) is maintained in the Teradata dictionary.
3.2. External Stored Procedure Examples
To illustrate a simple external stored procedure, a prototype was built with such a
stored procedure reading a queue table, then writing the contents to an MQ queue
outside of Teradata. Detailed code from the prototypes discussed in these chapters will be posted, when they are mature, on the Tech Center site on Teradata.com.
http://www.teradata.com/t/page/118769/index.html
The stored procedure is started up and calls an SQL-based stored procedure. This
SQL-based stored procedure will block using a SELECT AND CONSUME until a
row has been inserted into the queue table. As soon as a row is available, a destructive read of that queue table row occurs, and the SQL-based stored procedure
continues execution, returning the message to the external stored procedure. That
queue table row is no longer available for any other transaction to read.
As soon as the queue table row is read, the external stored procedure does a put
to the queue, using the Teradata WebSphere MQ access module. The stored
procedure then loops back to attempt another read from the queue table, and will
wait until a new row has been inserted, if needed.
(Figure: inside Teradata, a trigger on a Claims table inserts rows into a queue table; the GetMsg stored procedure performs a SELECT AND CONSUME and returns the message to the WriteMQ external stored procedure, which puts it on an MQ queue outside of Teradata.)
Figure 8: External stored procedure writes to a queue outside of Teradata
One clear advantage of this approach is that rows inserted into the queue table can
be immediately put to an external queue outside of Teradata. A second advantage
is that a single MQ Connect/Open can be amortized over many Get/PUTs.
The external stored procedure call used in this prototype looks like this:
call WriteMQ('queue.manager.1'
,'QUEUE1'
,'CHANNEL1/TCP/153.64.119.177'
,'rmh.getmsg'  -- name of SP to call--this one consumes a Queue table
,nummsgs);
[Prototype Example #1]
The parameters being passed with the WriteMQ stored procedure are:
• Queue manager name
• Queue name
• Client communication channel
• The name of a second stored procedure that is SQL-based
This SQL-based stored procedure ‘getmsg’ is called within the external stored procedure. It performs the SQL that reads the queue table, a row at a time. This
SQL-based stored procedure code looks like this:
replace procedure rmh.getmsg(Out msg varChar(32000))
Begin
sel and consume top 1 MessageBody into :msg
from rmh.mqmsg;
End;
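For reference, the external stored procedure itself is created with DDL of roughly the following shape. This is a sketch only; the parameter sizes and the EXTERNAL NAME string (which points at the C source and would also name the MQ include files and libraries) are assumptions rather than the prototype's actual definition:

REPLACE PROCEDURE WriteMQ (
   IN  qmgr    VARCHAR(256),
   IN  qnm     VARCHAR(256),
   IN  channel VARCHAR(256),
   IN  spname  VARCHAR(256),
   OUT nummsgs INTEGER)
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'F:writemq:SS:writemq:/home/rmh/projects/writemq/writemq.c';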
3.3. A Simple Work Dispatcher Example
To stimulate thinking, here is an example of the kinds of things that can be built
upon the basic event infrastructure inside of Teradata. In the following prototype,
the database is being deployed to schedule the event and fan it out. Several external stored procedures are initiated and controlled by an event infrastructure built
by the implementer.
Key to this prototype is a stored procedure that acts as a manager (SPMGR). This
stored procedure reads from a single queue table and starts off other stored procedures to accomplish specific tasks. Triggers from different tables in the database
can place rows in this queue table. Each row represents a command to be executed on behalf of an event, and carries three columns:
• A logon string
• A command indicator
• SQL syntax to call a specific stored procedure
This queue table looks like a task list of events to be processed. SPMGR, which
reads the queue table, is an external stored procedure. For each row it reads, it
dispatches new work by logging on a new session (via CLI) to Teradata. Each of
these sessions executes a stored procedure using the logon string and the SQL
syntax contained in the queue table row that was just read. Multiple such sessions
can be held open at the same time.
The queue table decouples the trigger and the processing of the event, preventing
the original transaction from being held up.
(Figure: triggers on base tables insert rows into a queue table; SPMGR selects/consumes each row and uses asynchronous CLI calls to spawn worker stored procedures such as SP1 (table function reading from MQ), SP2 (XML shredding), and SP3 (single-row scoring UDF); the workers register themselves in a shared control table.)
Figure 9: A Spawned Stored Procedure Architecture
All stored procedures that are spawned from SPMGR use a common method of
logging and accept control commands. A shared command table makes these
control commands available to all active sessions. Each running stored procedure
places an entry in the control table when it first begins processing, and removes the row when it completes. Each procedure periodically reads the control table for new directives; for example, there may be a directive asking it to shut down. Each stored procedure is also responsible for logging to a common set of log tables (not
shown here).
This command table offers reliability and recoverability. If one of the stored procedures fails, the command table will show that the job never completed.
This design is scalable to meet increasing demand. Multiple SPMGR instances
can be reading from the same queue table, each spawning its own set of worker
stored procedures. All spawned stored procedures, no matter what their point of
origin, will report in to the command table by inserting and deleting rows.
The syntax to create the queue table used in this prototype follows:
CREATE MULTISET TABLE RDG.spmgrq ,QUEUE ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
InTS TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
RwID INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE -2147483647
MAXVALUE 2147483647
CYCLE),
LogonStr VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
Command VARCHAR(32) CHARACTER SET LATIN NOT CASESPECIFIC,
spCall VARCHAR(1024) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( RwID );
[Prototype Example #2]
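To illustrate what a task-list row looks like, an insert such as the following is what a trigger or application might place on the queue. The logon string, the command value, and the stored procedure being called are all hypothetical; InTS and RwID take their default and generated values:

INSERT INTO RDG.spmgrq (LogonStr, Command, spCall)
VALUES ('tdpid/workeruser,workerpwd',                -- logon string for the spawned CLI session
        'RUN',                                       -- command indicator
        'CALL rdg.ShredClaimXML(''CLAIM'', 1001);'   -- SQL syntax to call a specific stored procedure
       );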
4. User Defined Functions -- Scalar
User Defined Functions (UDFs) are database objects that the implementer either
builds or acquires, that can extend the capability of normal SQL within the database. A UDF is similar to the standard SQL functions such as SQRT, ABS, or
TRIM and is invoked in exactly the same manner.
UDFs execute in parallel within the database. However, the developer of the function can direct which AMPs will participate and which AMPs won’t. UDFs may be
written in C or C++, and are then compiled into shared objects in UNIX, or into dynamic link libraries (DLLs) in Windows.
Once compiled, the UDF can then be referenced in SQL statements for activities
such as enforcing business rules or aiding in the transformation of data. Samples
of User Defined Functions can be found at the Tech Center site on Teradata.com:
http://www.teradata.com/t/page/118769/index.html
There are 3 types of UDFs:
• Scalar, used like a column and operates on the values of a single row
• Aggregate, returns a result (such as a MAX or a SUM) from a pass over a group
• Table Function, appears in the FROM clause and returns a table, a row at a time
This chapter will explore scalar UDFs, and table function UDFs will be addressed in
the next chapter. Aggregate UDFs will not be addressed in this Orange Book. Information on implementing UDFs can be found in the Orange Book titled “Teradata
Database User Defined Function User's Guide," authored by Mike Watzke, August,
2003.
In Teradata, UDFs execute under the control of the AMP and can be very efficient
doing row-by-row complex analyses. They are scalable and inherit all of Teradata's natural parallelism.
4.1. Protected vs Nonprotected
When you create a UDF, the mode for that UDF will be the default of “protected”.
Protected means the UDF runs in its own address space, and is isolated from other
AMP work. If you are running in protected mode and a hardware or software fault
occurs, the user is notified and the database does not restart, and any required
cleanup is done. If you are running in non-protected mode and are holding resources such as memory, and the UDF aborts or a fault occurs, the resources may
not be cleaned up.
When a UDF runs in protected mode, it runs as user “tdatuser” which is established
when the database is installed. This is a generic user with no special privileges beyond those of any ordinary user on the system. UDFs running in protected
mode use a separate process set up for that purpose, rather than using AMP
worker tasks. These processes are referred to as protected mode servers. See
Section 7.2.1 for security considerations.
Depending on where your default has been set, there will be a limit of from zero to
a maximum of 20 protected mode servers available at any one time. Each protected mode server requires 256 KB of file space on the system disk. If 20 per
vproc is the default setting you use, and you have 8 vprocs per node, then 8 x 20 x
256 KB = 40 megabytes of system disk space will be required. Note that the performance in protected mode will be somewhat slower.
Although protected mode has some limitations, in order to allow a UDF to do I/O
safely and not interfere with the database, it is recommended that such UDFs run
in protected mode.
When not running in protected mode, the UDF will run in the context of the AMP
worker task already in use by that query step. No additional AWT overhead is involved.
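Changing a UDF's protection mode is a dictionary operation; a sketch, using a hypothetical function name:

-- Develop and debug in the default protected mode, then switch a stable,
-- non-I/O UDF out of protected mode for better performance:
ALTER FUNCTION ScoreRisk EXECUTE NOT PROTECTED;

-- Switch it back if it will perform external I/O:
ALTER FUNCTION ScoreRisk EXECUTE PROTECTED;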
4.2. Opportunities for Scalar UDFs in Event Processing
Because they centralize control over specific actions and are highly flexible, UDFs
are ideal for managing events and standardizing operations inside of Teradata.
Some of the special things that can be done using UDFs are:
• Transformations and text manipulation, such as XML to non-XML text, or converting a picture into a thumbnail
• Analytics, such as scoring of a predictive model or performing risk assessment
• External I/O, such as talking to other EAI systems, getting external data, or talking to queries that run outside of Teradata
The following three sections provide illustrations of scalar UDFs supporting event
processing within Teradata.
4.2.1. Processing XML Documents
One example where scalar UDFs are useful is scanning an XML document and returning specified content, after that document has been stored inside the database.
The following example is of a UDF that uses XPath, which is a set of syntax rules
that allow you to navigate an XML document. XPath, which has a function similar
to substring, uses path expressions to identify nodes in an XML document.
(Figure: a Teradata client query invokes the XPathValue UDF against the OrderLog table, where each order key's XML document is stored as a CLOB.)
Figure 10: A UDF is used to parse and process an XML document stored as a CLOB
Depending on your requirements, the XML document could be stored as a CLOB
(Character Large Object) or as a varchar column. The former is illustrated in the
graphic above, while the following prototype uses the latter.
In this example below, the XML document is stored inside Teradata as one varchar
column, XMLOrder. The base table, OrderLog, only contains two columns,
PONum and the varchar column. Here are two sample XML documents, one per row:
<?xml version="1.0"?>
<ROOT>
<ORDER>
<DATE>8/22/2004</DATE>
<PO_NUMBER>101</PO_NUMBER>
<BILLTO>Mike</BILLTO>
<ITEMS>
<ITEM>
<PARTNUM>101</PARTNUM>
<DESC>Partners Conference Ticket</DESC>
<USPRICE>1200.00</USPRICE>
</ITEM>
<ITEM>
<PARTNUM>147</PARTNUM>
<DESC>V2R5.1 UDF Programming</DESC>
<USPRICE>28.95</USPRICE>
</ITEM>
</ITEMS>
</ORDER>
</ROOT>
<?xml version="1.0"?>
<ROOT>
<ORDER>
<DATE>08/12/2004</DATE>
<PO_NUMBER>108</PO_NUMBER>
<BILLTO>Rick</BILLTO>
<ITEMS>
<ITEM>
<PARTNUM>101</PARTNUM>
<DESC>Partners Conference Ticket</DESC>
<USPRICE>1200.00</USPRICE>
</ITEM>
<ITEM>
<PARTNUM>148</PARTNUM>
<DESC>V2R5.1 Stored Procedures and Embedded SQL</DESC>
<USPRICE>28.95</USPRICE>
</ITEM>
</ITEMS>
</ORDER>
</ROOT>
The Orderlog table was constructed to look like this:
CREATE SET TABLE orderlog ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
PONum INTEGER NOT NULL,
XMLOrder VARCHAR(63000) )
UNIQUE PRIMARY INDEX ( PONum );
The following SQL references the XPathValue UDF that uses XPath to pick out
element and attribute content from the XML document. The arguments passed
within the SQL (the BILLTO name, for example) are then used by XPath to search
each document in the table. When a document with that specific billing name is
identified, then the associated PO number and date are returned, as output arguments.
select XPathValue(O.xmlOrder, '//ORDER/PO_NUMBER/*') as PO_Number,
XPathValue(O.xmlOrder, '//ORDER/DATE/*') as theDate
from OrderLog O
where XPathValue(O.xmlOrder,'//ORDER/BILLTO/*') = 'Mike';
[Prototype Example #3]
And the output of the query that uses XPathValue UDF looks like this:
PO_Number  TheDate
---------  ---------
101        8/22/2004
4.2.2. Analytics
Scalar UDFs can support on-the-spot analysis or predictive modeling, at the time of
an event instead of batching up predictions-to-be to process during off-hours. Or
the same scalar UDF can be used in a batch mode. Several different input parameters are fed into a set of algorithms that perform analysis on them and output
a conclusion. This could be a score, if the algorithms are set up appropriately, representing the likelihood that a given customer or client will do something that is
good for the business, like book a trip, take out a loan, or make a particular purchase.
In the example below, a UDF named ‘Strategy’ comes up with a recommendation
for an appropriate financial strategy ('Aggressive', 'Moderate', 'Conservative', etc.).
The same UDF could be used for a single client, or for all clients. This scalar UDF
encapsulates a simple decision tree analytic, based on data contained in columns
from a table in the Teradata database, in this case SavingsPlanCustomers, and returns a single value, the recommended financial strategy.
When executed in the batch mode, the output from the UDF execution is inserted
into a base table. But the same UDF could be used by a call center query to return
a financial strategy recommendation for just one client. This would require the addition of a client ID equality condition in the WHERE clause. SQL for the batch approach might look like this:
Insert into StrategyRecommendation
Select ClientID,
Strategy(SPC.age, SPC.balance, SPC.contribution, SPC.income)
From SavingsPlanCustomers SPC;
[Prototype Example #4]
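For the single-client call-center case mentioned above, the same UDF would simply be qualified by the client's primary index value; a sketch, with the literal client ID being illustrative:

Select SPC.ClientID,
       Strategy(SPC.age, SPC.balance, SPC.contribution, SPC.income)
From SavingsPlanCustomers SPC
Where SPC.ClientID = 123456;   -- equality condition on the primary index: single-AMP access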
4.2.3. External I/O
In a third example, a scalar UDF has been created that writes a message to an external queue. This prototype is similar to the approach presented in Section 3.2.
But instead of writing a message externally after reading from a queue table and
calling an external stored procedure, in this example a scalar UDF is used.
The UDF calls a Teradata WebSphere MQ access module, identical to the access
module a TPump job or other utility might use when processing from a queue.
The SQL is a simple select that contains nothing in the select list but the scalar
UDF and the arguments it expects. Executing this SQL results in one message,
‘Hello World,’ being placed on an MQ queue on the client:
Select WriteMQ('queue.manager.1','QUEUE1','CHANNEL1/TCP/153.64.119.177',
'Hello World');
[Prototype Example #5]
The parameters being passed are the queue manager, queue name, client communication channel, and the content of the message.
The following is the DDL used to replace or create the function:
replace function WriteMQ(
qmgr varchar(256), qnm varchar(256), channel varchar(256), vcmsg varchar(32000))
returns integer
language C NO SQL parameter style sql EXTERNAL NAME
'F:emruwmq:SI:cmqc:/usr/include/cmqc.h:SL:mqic:SL:mqmcs:SS:emruwmq:/home/rmh/projects/emruwmq/emruwmq.c';
In a broader use of the same UDF, data dictionary information within the Teradata
database is being accessed and written to the external queue. The UDF is invoked for each row found in the DBC.Tables table that meets the requirements
specified in the SQL where clause. The database name and table name are concatenated as a varchar input argument to the UDF that will then write that as a
message to the MQ queue.
select count(*) as SentMsgs
from
(select WriteMQ
('queue.manager.1','QUEUE1','CHANNEL1/TCP/153.64.119.177',
Trim(databasename)||'.'||trim(TableName)) as c1
from dbc.Tables
Where TableKind = 'T')T;
[Prototype Example #6]
What the above example illustrates is the ease of sending an entire result set of an
arbitrary SQL statement to a queue outside of Teradata.
5. User Defined Functions -- Table
In contrast to scalar functions, discussed in the previous chapter, which return a
single value, table functions are used in the FROM clause and return a set of rows.
When present, a table function can be thought of as a derived table whose rows
are produced by the UDF itself.
5.1. How Table Functions Work
Table functions are sent to the AMPs at execution time. Each AMP calls the function repeatedly, one time for each row being produced, until the function signals
there is no more work to be done on that AMP.
A table function input argument may pass values that will determine what will be
processed, and optionally control which AMPs will be active doing it. As the table
function is called repetitively on the participating AMPs, each AMP builds up a
spool file that contains the rows produced by its instance of the table function.
The input arguments will determine if the table function is called in constant or
varying mode:
• Constant Mode: If the input arguments use a constant expression, and there are no correlated columns, then the table function will be sent to all AMPs. The table function can determine which AMPs actually produce rows.
• Varying Mode: If the input arguments refer to a correlated base table column (which will vary in value for each different base table row accessed), then the AMPs that have rows pertaining to the input data provided will participate.
For example, consider a query that invokes a table function and also accesses selected rows using a single UPI value for a base table. Although the table function is defined on all AMPs, because of the WHERE clause that references the base table's primary index column(s), activity will occur on only one AMP on behalf of the table function: the AMP where the base table row(s) are located.
Select B.Rate, B.Degree
From AltClaims A,
Table(FuncGetRate(A.Diagdata)) B
Where A.ClaimID = 6;
(Figure: the rows of AltClaims are spread across AMP1 through AMP4 by ClaimID; because the query selects the single UPI value ClaimID = 6, the table function executes only on the AMP holding that row.)
Figure 11: One UPI value is selected, therefore one AMP executes the UDF
When variable input arguments are passed, table functions are only active on
AMPs where correlated data exists. In the example illustrated in the graphic
above, the table function will only be called on AMP2. Only 1 row will be returned
by the table function because of the UPI access into the base table. It is up to the
table function to determine the number of rows it wants to generate on that AMP.
In the case where the table function returns multiple rows, a spool file will be created to hold these rows as they are created, just as would be the case during a full
table scan. Because of the presence of the spool file, you will need to include a
WHERE clause to control the join of the spool and the base table. Usually this join
constraint will be between the primary index of the base table and a related column
in the spool. Including this join constraint will avoid a Cartesian product between
the two.
When you define a table function, you will use the CREATE FUNCTION syntax, but
one of the additional parameters will be a RETURNS TABLE clause. This labels the UDF as a table function, specifying that a table consisting of a set of rows will be returned. As part of that clause, a list of column names and data types (with optional character sets) is included to describe the columns that will be returned and how they can be referenced.
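As a sketch of that DDL shape, using the FuncGetRate function from the example above (its column types and the EXTERNAL NAME string are assumptions, not the prototype's actual definition):

CREATE FUNCTION FuncGetRate (Diagdata VARCHAR(2000))
RETURNS TABLE (Rate   DECIMAL(9,2),
               Degree VARCHAR(20))
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'F:getrate:SS:getrate:/home/rmh/projects/getrate/getrate.c';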
5.2. Table Functions with Transformations and Text Manipulation
A table function could produce rows solely from the input arguments. For example,
an input argument could be a reference to a Character Large Object (CLOB) that
contains XML text. From that CLOB it could parse the XML text and output a set of
SQL rows.
In this example, the XPath expression selects the parent of the multiple-occurring element ITEM. For each ITEM, the UDF returns the text content of its three child elements (part number, price, and description).
SELECT L.var1 as Partnum, L.var2 as Price, L.var3 as Desc
FROM (SELECT xmlOrder, poNum FROM OrderLog) as O,
TABLE( XPathValues(O.poNum, O.xmlOrder,'/ORDER/ITEMS') ) AS L
(poNum,var1,var2,var3, ...)
where O.poNum = L.poNum;
The output from that SQL looks like this:
Partnum  Price    Desc
-------  -------  -----------------------------------------
101      1200.00  Partners Conference Ticket
147        28.95  V2R5.1 UDF Programming
101      1200.00  Partners Conference Ticket
148        28.95  V2R5.1 Stored Procedures and Embedded SQL
[Prototype Example #7]
5.3. Table Functions with Analysis
Building on the scalar UDF example in Section 4.2.2, it is possible to re-create the
Strategy UDF as a table function. This would make for a more complex UDF,
which would return a set of rows, rather than just one value.
The SQL that invokes the table function might look like the following. The table
function provides significantly more detail than the simple UDF, as can be seen by
the columns in the request’s select list.
Insert into StrategyRecommendation
Select ClientID
,ST.Strategy
,ST.Percent_CD
,ST.CD_Return
,ST.Percent_Bonds
,ST.BondAvgReturn
,ST.Percent_Mutual
,ST.MAvg5YrReturn
From SavingsPlanCustomers SPC
,Table(Strategy(SPC.clientID,SPC.age,SPC.balance,SPC.contribution,
SPC.income)) ST
Where SPC.ClientID = ST.ClientID
[Prototype Example #8]
This table function is operating in varying mode and would engage all AMPs in the system because all rows from the SavingsPlanCustomers table are being read without selection criteria. If only one client were selected, by means of an equality condition on the primary index ClientID, then only a single AMP would be executing the table function.
In addition, notice that there is a join constraint between the base table SavingsPlanCustomers and the output from the table function. This join back on ClientID prevents a Cartesian product join from being performed between the table
and the spool, and ensures that only one row per Client is inserted into StrategyRecommendation.
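The prototype does not include the DDL for the Strategy table function itself. A plausible sketch follows, with the output column names taken from the select list above; the data types, the parameter types, and the abbreviated EXTERNAL clause are assumptions made for illustration:

REPLACE FUNCTION Strategy
  (clientID INTEGER, age INTEGER, balance DECIMAL(15,2),
   contribution DECIMAL(15,2), income DECIMAL(15,2))
RETURNS TABLE
  (ClientID       INTEGER,
   Strategy       VARCHAR(30)  CHARACTER SET LATIN,
   Percent_CD     DECIMAL(5,2),
   CD_Return      DECIMAL(5,2),
   Percent_Bonds  DECIMAL(5,2),
   BondAvgReturn  DECIMAL(5,2),
   Percent_Mutual DECIMAL(5,2),
   MAvg5YrReturn  DECIMAL(5,2))
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
NOT DETERMINISTIC
CALLED ON NULL INPUT
EXTERNAL;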
5.4. Table Functions that Generate Data
This section will discuss a data generation table function that illustrates several interesting things:
1. Using standard libraries, in this case an established access module
2. Generating data within the database
3. Being able to control the degree of parallelism doing the work
4. Understanding database resource capacity
5.4.1. Using Standard Libraries
An existing access module, previously used by standard Teradata utilities, is being
called in this prototype. The same access module had been used to generate data
with TPump and could have been used with either MultiLoad or FastLoad.
Because this was a simple prototype, the access module was designed to generate
only a single string of data. The parameters for the access module are contained
in the access module’s initstring. This initstring looks like this:
'-roww 100 -f unformat'
This initstring specifies that the row width will be 100 bytes and that the format type
is unformatted.
A simple invocation of the table function in a select statement without a where clause will cause the table function to be executed repeatedly on each AMP in parallel. Each repetition of the table function causes the access module to be called.
The arguments associated with the table function, which is named ‘emrcamrg,’
control the number of times the access module is called on each AMP.
Here is the select statement used in the prototype. It references the table function,
passes arguments for the table function, points to the location of the access module, and passes parameters to the access module.
select *
from table (emrcamrg
(12000, 5000, 1, '/home/rmh/bin/libamrgenu.so', '-roww 100 -f unformat'));
[Prototype Example #9]
The first parameter in the parenthetical expression controls the maximum number of milliseconds (12,000, the equivalent of 12 seconds) the table function will be allowed to execute. The second parameter states the maximum number of rows each AMP will produce. Whichever limit is reached first (maximum seconds or maximum rows) will be the controlling factor in the execution of this particular table function.
5.4.2. Generating Data
What was shown in the previous section was a highly efficient method of producing
a simple unformatted string of data by means of invoking a table function that
called a simple access module. Other access modules could be set up to produce
data with specific demographics and of greater complexity.
Some results were recorded from executing the above SQL. Executing this table
function in protected mode on an older generation of hardware produced data at a
rate greater than 10,000 rows per second per node. In unprotected mode, near-FastLoad rates were achieved, approaching 100,000 rows per second per node. In
contrast, using TPump with the same access module to produce the same data
produced 800 rows per second per node. The table function provided orders of
magnitude better performance, and with no client resources involved.
5.4.3. Controlling the Degree of Parallelism
In this variation of the same prototype, the same table function, ‘emrcamrg’, generates data which is immediately written to a base table, ‘udftarget’, by means of an
insert/select statement.
This example’s somewhat more complex SQL contains a convention that lets the
user control how many and which AMPs will be executing the table function. Limiting the number of participating AMPs controls the level of resources applied to the
work that the UDF is performing.
To understand how the degree of parallelism is managed, first look at the request that contains the UDF, particularly the input arguments to the table function.
insert udftarget
select ampid,seq,passthruo,themessage
from (select pivalue, ampid from rdg.allamp where ampid <4) A
,table (emrcamrg
(12000, 5000, a.pivalue, '/home/rmh/bin/libamrgenu.so', '-roww 100
-f unformat')) T
where a.pivalue = T.passthruo ;
[Prototype Example #10]
The third position in the input argument list is "a.pivalue", which is a correlated reference to a column in a table named "allamp", which the query reads and joins to the table function. Because of the presence of this correlation, we know that the table function is in varying mode, and that only the AMPs that have rows pertaining to this variable will execute the table function.
Figure 12: Rows selected from the Allamp table control which AMPs do the work
The definition of the parameters that are passed to the table function follows:
• 12000: a time limit (12,000 ms, or 12 seconds)
• 5000: the maximum number of rows to return per AMP
• A.pivalue: a variable that correlates to the primary index of the allamp table
• '/home/rmh/bin/libamrgenu.so': the path to the data generation access module
The output generated by the table function, using the allamp table for guidance, is inserted into a target table. The column 'themessage' is where the single generated string of data resides. Here is the layout of that target table:
CREATE MULTISET TABLE RMH.udftarget ,NO FALLBACK ,
(AmpID INTEGER,
seq INTEGER,
passthruo INTEGER,
themessage VARCHAR(32000))
PRIMARY INDEX ( AmpID ,seq );
The other columns in the above target table serve this purpose:
• AmpID represents the AMP that was the source of this row; its value originates from the allamp table (described further below).
• Seq is a sequence number of each individual row produced on that AMP.
• Passthruo is an output argument returned from the table function that matches the primary index value of the associated allamp table row (described below).
Because a variable (a.pivalue) has been included in the input arguments of the table function, only a subset of the AMPs will invoke the UDF. AMPs that own rows reflecting the primary index values contained in the pivalue column will do work; the others will not. Because of the selection criteria coded in the query's access of the allamp table (select pivalue, ampid from rdg.allamp where ampid <4), we can assume that only the AMPs that hold rows with an ampid value of 0, 1, 2, and 3 will be invoking the table function in this query. This where clause could have selected ampid values less than 2 and caused 2 AMPs to execute the UDF, or values less than 7 and engaged 7 AMPs.
Consequently the base table called 'allamp' acts as a control mechanism over how many AMPs are active in this example. Here's how that works:
• The allamp table has been defined in such a way as to have only one row on each AMP.
• Values for the primary index column (pivalue) were intentionally chosen such that each row of the allamp table hashes to a different AMP.
• The associated AmpID column value carries an AMP identifier.
To make this more understandable, below are the first 8 rows of the allamp table, sorted by ascending ampID. The numbers in the PIvalue column, the table's primary index, were selected because they each hash to a different AMP. Values in the allamp table rows were carefully selected by the implementer to display these controlled characteristics.
CREATE MULTISET TABLE rdg.allamp ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(PIValue INTEGER,
AmpID INTEGER)
PRIMARY INDEX ( PIValue );
PIValue  AmpID
      2      0
     30      1
      3      2
     12      3
      7      4
      4      5
     24      6
     13      7
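One way to choose, or to verify, PIValue numbers with this property is to use Teradata's standard hashing functions. A brief verification sketch, assuming the allamp table above, follows:

/* Report which AMP each allamp row hashes to; HashedToAmp
   should match the AmpID recorded in the row. */
SELECT PIValue
      ,AmpID
      ,HASHAMP(HASHBUCKET(HASHROW(PIValue))) AS HashedToAmp
FROM rdg.allamp
ORDER BY AmpID;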
In Prototype Example 10, which executes the table function above, note that the insert/select has a WHERE clause that joins the table function with those 4 selected rows of the allamp table. The join constraint 'where a.pivalue = T.passthruo' is added to the query to prevent an unconstrained product join between the 4 rows in the allamp spool and the rows being generated by the table function, each of which carries the pivalue of the AMP where it originated.
Figure 13: The table function's output is similar to a derived table, and will be joined to any other tables in the query, with or without a join constraint between them
Without that WHERE clause, the result set would have contained 80,000 rows (4
allamp rows x 20,000 table function rows), rather than the specified 20,000 (4
AMPs producing 5000 rows each). If the unconstrained product join were to happen, each row of generated data would appear 4 times in the result set.
In order to create a UDF and make it available for use, both a compiled C or C++
code module and a data dictionary definition are required. The DDL to define this
table function within the Teradata data dictionary follows:
REPLACE FUNCTION RMH.EMRCAMRG
(maxtime INTEGER, maxrows INTEGER, passthrui INTEGER,
axsmodpath VARCHAR(255) CHARACTER SET LATIN,
initstr VARCHAR(256) CHARACTER SET LATIN)
RETURNS TABLE
(seq INTEGER, passthruo INTEGER,
themessage VARCHAR(32000) CHARACTER SET LATIN)
SPECIFIC emrcamrg
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
NOT DETERMINISTIC
CALLED ON NULL INPUT
EXTERNAL NAME
'F:emrcamrg:SI:pmddamti:/home/rmh/projects/inc/pmddamti.h:SS:emrcamrg:/home/
rmh/projects/emrcamrg/emrcamrg.c'
The RETURNS TABLE clause describes the output of the table function. The EXTERNAL NAME clause identifies the source and include files from which the function is built; the resulting module is brought into memory for execution.
5.5. Table Functions with External I/O
This section will offer prototypes illustrating external I/O being performed within a
table function, including 1) reading from an external queue, and 2) accessing data
from a different Teradata platform.
A third reason you might want to use table functions to perform external I/O is if
you need to pull in snippets of highly volatile real-time facts. While no prototype is
included to illustrate this, this approach is worth a brief comment.
Some phenomena change so fast that the benefit of capturing them and loading them into the data warehouse becomes questionable. Global Positioning System (GPS) data, for example, can reflect the precise location of every vehicle on the nation's highways at any point in time. Weather readings around the world may be interesting information, but are in constant flux. Stock market quotes rise and fall perpetually.
Does your data warehouse need all of this ever-changing data? Perhaps. Or perhaps it needs it eventually, but not all of it right now. If only particular details provide value, or if you need only a handful of them at the moment they come into being, table functions offer the interesting alternative of pulling just the pieces of very unsettled data you actually need from the external world, on an as-needed basis.
5.5.1. Reading from a Queue
In the last prototype example we illustrated generating data from an access module
invoked by a table function. Now we are going to use a table function that calls an
access module that reads from MQ.
In the earlier example labeled Prototype #4, a scalar UDF was making one call to
the access module in order to put one message on the queue. In this example a
table function is using the same access module as Prototype #4 to read multiple
messages from the queue.
This prototype example also uses the same allamp table that was presented previously in the discussion of generating data using UDFs found in Section 5.4.3. Just
as before, the allamp table is used to control how much parallelism will support this
read effort. In this case only one AMP will be reading from the queue. In a large
system, it may be desirable to limit the number of AMPs that participate in a table
function, in order to minimize the impact on the overall system. You also may want
to control the rate that data is being fed into the queue, and reducing AMP involvement gives you a lever for that purpose as well.
Rather than generating data and writing to a base table as Prototype #9 did, the query illustrated here selects messages ('TheMessage') that represent the data that was passed in the queue.
Select TheMessage
from (select pivalue, ampid from rdg.allamp where ampid <1) A
,Table (emrcamrq (2000,1,a.pivalue,
'/home/rmh/bin/libmqsc.so', '-qmgr queue.manager.1 -qnm QUEUE1',
'CHANNEL1/TCP/153.64.119.177')) mq
where a.pivalue = mq.passthruo
[Prototype Example #11]
The explain text that is associated with the request illustrates how the AllAMP table
drives the database activity.
1) First, we lock a distinct rdg."pseudo table" for read on a RowHash
to prevent global deadlock for rdg.allamp.
2) Next, we lock rdg.allamp for read.
3) We do an all-AMPs RETRIEVE step from rdg.allamp by way of an
all-rows scan with a condition of ("rdg.allamp.AmpID < 1") into
Spool 1 (all_amps), which is built locally on the AMPs. The size
of Spool 1 is estimated with no confidence to be 7 rows. The
estimated time for this step is 0.03 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 1 by way of an all-rows
scan executing table function RMH.emrcamrq into Spool 2 (all_amps),
which is built locally on the AMPs. The size of Spool 2 is
estimated with no confidence to be 7 rows. The estimated time for
this step is 0.04 seconds.
5) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
an all-rows scan into Spool 4 (all_amps), which is redistributed
by hash code to all AMPs. The size of Spool 4 is estimated with
no confidence to be 7 rows. The estimated time for this step is
0.02 seconds.
6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
all-rows scan, which is joined to Spool 1 (Last Use) by way of an
all-rows scan. Spool 4 and Spool 1 are joined using a single
partition hash join, with a join condition of ("PIVALUE =
PASSTHRUO"). The result goes into Spool 3 (group_amps), which is
built locally on the AMPs. The size of Spool 3 is estimated with
no confidence to be 19 rows. The estimated time for this step is
0.05 seconds.
In a more sophisticated example from the same prototype, a table was set up prior to running the request with the table function. The table was designed to hold parameters, such as how many rows you intend for the function to process, and the initstring. That table can then be read as a derived table in the query that invokes the table function. This eliminates the need for each request to hard-code the arguments. Here's how that looks:
Select ampid
,seq
,passthruo
,themessage
From (Sel MaxTime ,MaxRows ,ReaderPIVal
,AxsmodPath
,InitStr
,Channel
,AmpId
From MQJobParms
Where AmpID < 8) prm
,Table (emrcamrq(prm.MaxTime,prm.MaxRows,prm.ReaderPIVal,
prm.AxsmodPath,prm.InitStr,prm.Channel)) mq
Where prm.ReaderPIVal = mq.PassThruO;
[Prototype Example #12]
In the above example, the MQJobParms table also controls the level of parallelism within Teradata that is applied to the work. In this case ReaderPIVal is a variable passed into the table function from the MQJobParms table rows. The value contained in the ReaderPIVal column is also represented as the output variable of the table function, named 'passthruo.' There are 8 values for the AmpID column, based on the WHERE clause within the derived table that accesses MQJobParms, which delivers 8 different ReaderPIVal values as input to the table function.
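The layout of MQJobParms is not shown in the prototype. A plausible definition follows, with column names taken from the query above and data types assumed to match the parameters of the emrcamrq table function:

CREATE MULTISET TABLE RMH.MQJobParms ,NO FALLBACK
  (MaxTime      INTEGER,
   MaxRows      INTEGER,
   ReaderPIVal  INTEGER,
   AxsmodPath   VARCHAR(255) CHARACTER SET LATIN,
   InitStr      VARCHAR(256) CHARACTER SET LATIN,
   Channel      VARCHAR(256) CHARACTER SET LATIN,
   AmpID        INTEGER)
PRIMARY INDEX ( ReaderPIVal );

As with the allamp table, the ReaderPIVal values would be chosen so that each row hashes to a different AMP, which is what lets the table control the degree of parallelism.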
5.5.2. Reading from a Remote Teradata System – Example 1
If you are running a Teradata dual active system, or have a second Teradata system for any reason, such as development, there may be times you would like to
pass data back and forth between the two platforms. For example, as shown in
this next prototype, it may be useful to query the dictionary tables from one system,
so they can be correlated to the other.
This SQL statement uses a table function (‘tdat’) that executes a query on a remote
Teradata system, and returns the answer set. In this example, it returns all the database names in the other system’s dictionary tables.
sel *
from table(rdg.tdat(2,1,'adw1/rdg,rdg'
,'sel databasename from dbc.databases'));
[Prototype Example #13]
The actual SQL executed on the second system is passed as a fixed input argument of the table function, as is the other system’s logon string.
The DDL to create the function follows:
REPLACE FUNCTION RDG.TDAT
(rowc INTEGER,
InLineNum INTEGER,
logonstr VARCHAR(50) CHARACTER SET LATIN,
sqlRqst VARCHAR(512) CHARACTER SET LATIN)
RETURNS TABLE
(ampId INTEGER,
cnt INTEGER,
OutLineNum INTEGER,
str1 VARCHAR(256) CHARACTER SET LATIN,
.
.
.
str20 VARCHAR(256) CHARACTER SET LATIN)
SPECIFIC tdat
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
NOT DETERMINISTIC
CALLED ON NULL INPUT
EXTERNAL NAME 'SS:tdat:/home/rdg/tdat/Tdat.c:SL:cliv2'
By creating a view across two Teradata systems you can compare dictionary content across platforms, and compare details such as table space or access rights. The view below simply compares the rows that appear in each system's DBC.Tables view.
create view allTables as
sel 'Local System' as system
,databasename
,tablename
,version
,tablekind
,protectionType
,JournalFlag
,CreatorName
,requesttext(varchar(100))
from dbc.tables
UNION
sel 'Remote System'
,str1 (char(30))
,str2 (char(30))
,str3 (Integer)
,str4 (char(1))
,str5 (char(1))
,str6 (char(2))
,str7 (char(30))
,str8 (varchar(100))
from table(rdg.tdat(2,1,'adw1/rdg,rdg'
,'sel databasename,tablename ,version,
tablekind,protectionType,JournalFlag,CreatorName,
requesttext(varchar(100))
from dbc.tables')) T;
A sampling of data returned from the above SQL, when ordered by tablename (for easy
cross-comparison), looks like this:
System         DatabaseName  TableName     Version  TableKind
Remote System  test          a             1        T
Local System   DBC           AccessRights  1        T
Remote System  DBC           AccessRights  1        T
Remote System  DBC           AllSpace      1        V
Local System   DBC           AllSpace      1        V
Local System   rdg           allamp        1        T
Remote System  test          allamp        1        T
5.5.3. Reading from a Remote Teradata System – Example 2
This prototype illustrates the case where the data of interest resides on a different
Teradata platform from which the query is executing. Table functions can provide
a quick way of moving data under such conditions. Perhaps the data has been offloaded to an older configuration because it is outdated, and rarely used. Or perhaps you wish to restore selected data that has been archived to a different Teradata platform. Or you may consider this when you need to access real time information where the cost of the occasional access is less than the cost of integrating
all the changes in real time.
In this prototype, System A holds rows of lineitems that are partitioned by day. System B executes a query that requires one or more partitions for processing. Only
the desired partitions are read, by means of a table function, and brought over to
System B.
In order to support this activity, a view has been created on System B that joins a
look-up table to the table function that accesses the PPI table on System A. The
look-up table is used to provide a logon string and the appropriate SQL that is required to pull the desired data off of System A. It has one row per partition in the
PPI table.
Figure 14: If the query requests 1 day, only 1 partition is returned by the table function
The table that holds the SQL looks like this:
CREATE SET TABLE RDG.lisql
(
l_shipdate DATE FORMAT 'YY/MM/DD',
passthru INTEGER,
logonstr VARCHAR(12) CHARACTER SET UNICODE NOT CASESPECIFIC,
sqltxt VARCHAR(452) CHARACTER SET UNICODE NOT CASESPECIFIC)
PRIMARY INDEX ( l_shipdate );
Three random rows from the lisql table follow, with the SQL abbreviated:
l_shipdate  passthru  logonstr      sqltxt
1998-09-13      2447  adw1/cab,cab  Select L_ORDERKEY. . . from ADW.liday
                                    where l_shipdate = '1998-09-13'
1992-04-10       100  adw1/cab,cab  Select L_ORDERKEY. . . from ADW.liday
                                    where l_shipdate = '1992-04-10'
1997-07-17      2024  adw1/cab,cab  Select L_ORDERKEY. . . from ADW.liday
                                    where l_shipdate = '1997-07-17'
When a user submits a query that accesses this lookup table, each date selected in the query will cause one row, a different row, in the table to be selected. For example, if the query had a WHERE clause that said "where l_shipdate between '1995-01-01' and '1995-01-03'", that would cause 3 rows from the lisql table to be selected. Each row has a logon string and a different SQL statement.
Figure 15: Each date selected causes one query to be executed on the remote system
When multiple dates are in the query, and as a result multiple rows are selected from the lisql lookup table, two things happen:
1. There will be one AMP on the local system working on behalf of the table function for each row accessed from the lisql table. This is an example of a correlated join when the table function is in varying mode.
2. Each of the local AMPs that is executing the table function will be sending one of the multiple SQL statements to the remote system, and receiving output back.
The view that joins the data from System A and the look-up table follows.
replace View RemoteLineitem as
Select str1 (Integer) as L_ORDERKEY,
str2 (Integer) as L_PARTKEY ,
str3 (Integer) as L_SUPPKEY ,
str4 (Integer) as L_LINENUMBER ,
str5 (DECIMAL(15,2)) as L_QUANTITY ,
str6 (DECIMAL(15,2)) as L_EXTENDEDPRICE,
str7 (DECIMAL(15,2)) as L_DISCOUNT,
str8 (DECIMAL(15,2)) as L_TAX,
str9 as L_RETURNFLAG ,
str10 as L_LINESTATUS ,
l.l_shipdate (FORMAT 'yyyy-mm-dd') ,
str12 (date) (FORMAT 'yyyy-mm-dd')as L_COMMITDATE ,
str13 (date) (FORMAT 'yyyy-mm-dd')as L_RECEIPTDATE ,
str14 as L_SHIPINSTRUCT ,
str15 as L_SHIPMODE ,
str16 as L_COMMENT
from (select * from lisql) l
,table(rdg.tdat(2,l.passthru,l.logonstr,l.sqltxt)) T
where l.passthru = t.outlinenum;
Because when multiple dates are selected, multiple queries, one per date, are
generated and sent to the remote system, and because these queries execute in
parallel on the remote system, better than linear performance can be achieved using this technique.
For example, compare the time to return one partition, consisting of one date, with
the time to return 7 partitions, consisting of one week’s worth of data.
SQL Issued by the User                   Number of    Response    Number     Rows per
                                         Partitions   Time        of Rows    Second
select * from RemoteLineitem where
l_shipdate = '1995-07-14'                1            27 seconds  124,905    4,626
select * from RemoteLineitem where
l_shipdate between '1995-08-14'
and '1995-08-20'                         7            54 seconds  872,936    16,165
[Prototype Example #14]
5.6. UDF Considerations
Some of the considerations when using UDFs include:
• UDFs may impact parallel efficiency on the platform, particularly when the UDF is executing on a subset of the total nodes and is resource-intensive. Such a UDF execution may lengthen the amount of time a query's step holds on to an AMP worker task on the AMPs supporting the UDF execution.
• If a UDF is running unprotected, the UDF will run in the context of the AMP worker task used by that query step. No additional AMP worker task will be required.
• Running in protected mode requires that a protected mode server be available, a resource that is limited based on an internal setting with a maximum of 20. For a UDF, the protected mode server is held only as long as the UDF executes. For a table function, the protected mode server is held for the duration of the query step. Be aware that expanding the number of protected mode servers will draw from system disk resources.
• UDF parameters are strongly typed. Because parameters are defined at compile time, you either need to account for any changes in the format of the data coming back yourself, or you will require a different UDF for each differently formatted set of rows. As an illustration of how to account for this, the Tdat UDF in Prototype Example #13 was defined with 20 generic varchar columns, so that up to 20 columns of any reasonable length can be returned using the UDF.
• There may be security ramifications in using UDFs, as they run as root on the node when unprotected. However, you can use Teradata access rights to control who can create and who can execute these functions.
• UDFs that access external data will need to consider the performance impact of consuming resources that are outside the Teradata platform, for example a WebSphere MQ server. Teradata tools that track and record resource usage, such as Database Query Log, AmpUsage, and ResUsage, will not be aware of this additional resource demand. In addition, resources used outside of Teradata will be outside the scope of Priority Scheduler.
6. Using Triggers in Event Strategies
A trigger is a set of actions that are run automatically when a specified change operation is performed on a given table. Triggers are a key event technology because they initiate the automation of business events directly inside the database.
In Teradata, triggers are implemented as part of a multi-statement request with the
statement that caused the trigger to fire. Triggers are bundled in with the initiating
data changes into a single unit of work.
Because of that tight bundling, triggers are incorporated into the same recovery
unit with the original statement that caused them to fire; if one action fails, both will
be rolled back. This provides a level of integrity that is not always in place among
other event components.
In Teradata Database V2R6, the action of a trigger can include more than SQL, as
described in the following section.
6.1. The Firing Statement
The actual execution of a trigger pushes notification of an event out, as described
by its firing statement. The firing statement is important because it is an action
based on something happening or something being identified as requiring further
action. The firing statement can initiate a chain of steps related to the handling of
an event.
There is greater flexibility in the firing statement of the trigger in V2R6. The firing
statement can now do all of these things:
• Execute SQL against objects within the Teradata database
• Insert into a queue table
• Call a stored procedure
• Invoke a UDF
In this Orange Book we will focus on the second, third and fourth options.
6.1.1. Triggering into Queue Tables
Because queue tables have the potential for passing things on, they make a natural second step in an event chain. Something interesting has happened that requires additional processing, and queue tables can make a convenient hand-off
point for the trigger. The trigger simply inserts into the queue table.
Section 2.1.3 describes a technique using queue tables to monitor the effectiveness of current promotions, as it is used in a recent active data warehouse benchmark. In the V2R6 version of this same benchmark, a trigger causes a row to be
written to a queue table named event02_QT each time a promotional product is inserted into the Mkt_Basket_Dtl base table. TPump is the load utility being used.
The syntax for the trigger that writes to the queue table follows:
replace trigger event02_trig
after insert on Mkt_Basket_Dtl
referencing new as n
for each row when (n.mbd_productkey in (8,13,24,35,46,52,67,78,83,98))
(insert into event02_QT
 values (n.mbd_orderkey
        ,n.mbd_productkey
        );
);
When a trigger inserts into a queue table, that activity belongs to the same transaction as the insert into the base table that owns the trigger. For example, if TPump inserts a row that is part of a special promotion, and this causes a trigger to fire, both the insert into the base table and the trigger are part of the same transaction. The physical transaction will end with the queue table insert, even though the logical business transaction will continue. The subsequent SELECT AND CONSUME of the queue table that will continue the life of that event will take place under the control of a different transaction. Using queue tables in this manner is an asynchronous activity that breaks the chain of the event into smaller, independent links.
Figure 16: Processing an event may involve multiple physical transactions, inside and outside of Teradata
Note that while you may insert into queue tables by means of a trigger that has
been built on a base table, in the current version of Teradata a trigger may not itself
be defined on top of a queue table.
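For reference, a sketch of a queue table and its consumer follows. The payload columns are a hypothetical reconstruction to match the trigger above; the defining characteristics are the QUEUE option and the Queue Insertion TimeStamp (QITS) column, which must be the first column and defaults to the insertion time:

CREATE MULTISET TABLE event02_QT ,NO FALLBACK ,QUEUE
  (qits           TIMESTAMP(6) NOT NULL
                  DEFAULT CURRENT_TIMESTAMP(6),
   mbd_orderkey   INTEGER,
   mbd_productkey INTEGER)
PRIMARY INDEX ( mbd_orderkey );

/* With this layout, the trigger's insert would name the payload
   columns so that qits takes its default value. The consuming
   transaction waits for a row, removes it from the queue, and
   returns it in first-in first-out order. */
SELECT AND CONSUME TOP 1 * FROM event02_QT;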
6.1.2. Triggers Invoking Stored Procedures or UDFs
Instead of the action of a trigger being an insert, as illustrated in the previous example, the action may be a call to a stored procedure or an invocation of a UDF.
In either case, the trigger and the stored procedure or UDF are part of the same
recovery unit. Considering just the stored procedure, if it were to fail, then both the
insert to the base table that caused the trigger to fire, and the trigger itself, would
be rolled back. However, it is important to note that the effect of the trigger might
not be completely rolled back, for example, if the trigger uses a UDF, external procedure or table function that causes some “external” action to occur. A trigger calling a stored procedure makes the event processing synchronous.
For example, if you consider the example of the scalar UDF that writes one message to an external queue file, presented in Section 4.2.3, that same SQL statement could be incorporated into the firing statement of a row trigger. Using our
previous trigger example from an active data warehouse benchmark, the syntax
might look like this:
replace trigger event02_trig
after insert on Mkt_Basket_Dtl
referencing new as n
for each row when (n.mbd_productkey in (8,13,24,35,46,52,67,78,83,98))
(Select emruwmq('queue.manager.1','QUEUE1','CHANNEL1/TCP/153.64.119.177',
  trim(n.mbd_productkey)||','||trim(n.mbd_quantity)););
The same action of writing to a queue could have been compiled into an external
stored procedure.
Be aware that there are some restrictions on the type of SQL statement that can be
included in stored procedures that are called from triggers. DDL, for example, is
prohibited. See the formal documentation manual for a complete list.
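As a sketch of the stored procedure variant, the firing statement would simply contain a CALL. PromoNotify here is a hypothetical procedure name; the rest of the trigger follows the example above:

replace trigger event02_sp_trig
after insert on Mkt_Basket_Dtl
referencing new as n
for each row when (n.mbd_productkey in (8,13,24,35,46,52,67,78,83,98))
(call PromoNotify (n.mbd_orderkey, n.mbd_productkey););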
6.1.3. When to Use Which
The choice between a UDF or a stored procedure on the one hand, and a queue table on the other, should be based on the amount of work the triggered action will actually do. For example, extended analytics that rely heavily on database access and logic branching are better handled by being spun off as an independent activity, by means of a write to (followed by a read from) a queue table. This will keep the scope of the work for the initiating transaction small.
In such cases, where there will be analytic work within a stored procedure called from a trigger, keep in mind that the initial update action will not be committed until the stored procedure has successfully completed. AMP worker tasks supporting the initial update activity will be held, as will locks. If the update that caused the trigger to fire is part of a TPump insert job, it is likely that performance for the entire load job will be impacted.
External stored procedures, on the other hand, are easily replaced by UDFs within the firing statement of a trigger.
Figure 17: The approach to using triggers can extend or reduce the recovery unit
6.2. Trigger Complexity Tradeoffs
Performance of load processing can be greatly impacted by how triggers are designed. Primary index triggers, such as inserts, or updates and deletes based on having a primary index value available, can complement TPump jobs, for example. Such triggers rely on row-hash locks and impact only a single row on a single AMP. But be aware that, even though this is a minimal level of overhead, row-hash locks can, under some conditions, contribute to blocking.
What really needs watching is a trigger that generates insert/select statements or complex updates. These can increase the overhead involved and cause table-level locks to be set, reducing parallelism and degrading performance of the load process.
A good approach is to consider the overhead of triggers the same way you would
consider the overhead of join indexes. Running an explain of the base table update that causes the trigger to fire will provide a blueprint of the database effort involved in supporting the trigger, just as it would illustrate the overhead involved in
join index maintenance.
Note that tables that contain triggers may not be loaded using FastLoad or MultiLoad. In addition, you cannot create a trigger on a table participating in a join index.
In order to properly appreciate a simple vs. a complex trigger, and its impact on the
update, consider the contrast between the following examples.
Simple Trigger Example:
What makes this first trigger simple is that the body of the trigger (the action that
happens as a result of the trigger firing) is a simple, one-AMP insert with only one
row-hash lock. The explain of the update statement that causes the trigger to fire
will illustrate the impact of the trigger.
replace trigger trigaudit after insert on resultinfo
referencing new as n
for each row
(insert into cabaudit values (n.r_resultinfokey,n.r_comment););
explain
insert into resultinfo values (99,'newitem',5,'exception');
Explanation:
1) First, we execute the following steps in parallel.
1) We do an INSERT into ADW.resultinfo.
2) We do an INSERT into ADW.cabaudit.
2) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
If you are loading data with TPump, the overhead of a simple trigger will depend on
how frequently that trigger will fire. Consider these test results from a TPump job
that loads over 300,000 rows.
TPump                            Elapsed Time   Percent Longer
                                                Compared to No Trigger
TPump with no trigger            140 sec.
Trigger never fires              154 sec.       10%
Trigger fires 10% of the time    168 sec.       20%
Trigger fires 100% of the time   262 sec.       88%
Even if the trigger never fires, there is some overhead in checking the WHEN
clause conditions, in this case about 10%. More complex conditions will incur more
overhead for the condition checking. The overhead increases as the percentage of
rows that cause the trigger to fire increases.
Complex Trigger Example:
Contrast the single-AMP insert above with the activity caused by the trigger in the explain below. There are several aspects of this trigger's complexity. There are two table-level write locks and one table-level read lock. The plan also contains 6 all-AMP steps, one of which performs a full table scan/update.
It is interesting to note that while the table that the trigger is on (resultinfo) does not have a join index defined on it (triggers and join indexes are not supported on the same table), there is a join index involved in the plan produced by the trigger (reftblJI). This is because within the body of the trigger an update is performed against the
reftbl table, which does have a join index built upon it, and join index maintenance
must be included in the plan because that table is potentially being updated.
replace trigger trigaudit after insert on resultinfo
referencing new as n
for each row
when (n.r_resultinfokey not in (select o_altkey from orderalt))
(update reftbl set rt_acctbal = rt_acctbal + 1;);
explain
insert into resultinfo values (99,'newitem',5,'exception');
Explanation:
1) First, we lock a distinct ADW."pseudo table" for write on a
RowHash to prevent global deadlock for ADW.reftblJI.
2) Next, we lock a distinct ADW."pseudo table" for read on a
RowHash to prevent global deadlock for ADW.orderalt.
3) We lock a distinct ADW."pseudo table" for write on a RowHash
to prevent global deadlock for ADW.reftbl.
4) We lock ADW.reftblJI for write, we lock ADW.orderalt
for read, and we lock ADW.reftbl for write.
5) We execute the following steps in parallel.
1) We do an INSERT into ADW.resultinfo.
2) We do an INSERT into Spool 1.
6) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by
way of an all-rows scan into Spool 3 (all_amps), which is
built locally on the AMPs. Then we do a SORT to order Spool
3 by row hash. The size of Spool 3 is estimated with high
confidence to be 1 row. The estimated time for this step is
0.01 seconds.
2) We do an all-AMPs RETRIEVE step from ADW.orderalt by way
of index # 4 without accessing the base table
"ADW.orderalt.O_ALTKEY = 99" with no residual conditions
into Spool 5 (all_amps), which is redistributed by hash code
to all AMPs. Then we do a SORT to order Spool 5 by the sort
key in spool field1 eliminating duplicate rows. The input
table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 5 is estimated with
high confidence to be 16 rows. The estimated time for this
step is 0.04 seconds.
7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
an all-rows scan into Spool 4 (all_amps), which is duplicated on
all AMPs. Then we do a SORT to order Spool 4 by row hash. The
size of Spool 4 is estimated with no confidence to be 320 rows.
8) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
all-rows scan, which is joined to Spool 4 (Last Use) by way of an
all-rows scan. Spool 3 and Spool 4 are joined using an exclusion
merge join, with a join condition of ("Field_2 = O_ALTKEY"). The
result goes into Spool 2 (Last Use), which is built locally on the
AMPs. The size of Spool 2 (Last Use) is estimated with index join
confidence to be 1 row. The estimated time for this step is 0.06
seconds.
9) We execute the following steps in parallel.
1) If the number of rows returned in 8 is > 0, we do an all-AMPs
UPDATE from ADW.reftblJI by way of an all-rows scan
with no residual conditions.
2) If the number of rows returned in 8 is > 0, we do an all-AMPs
UPDATE from ADW.reftbl by way of an all-rows scan with
no residual conditions.
10) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
6.3. Other Examples of Event Triggers
Triggers can be useful in moving around and saving snapshot data produced by
monitoring tools. The following paragraphs offer an example.
TPump provides a method of monitoring its own progress, by means of an entity
known as the TPump status table, officially called TPumpStatusTbl. Only one row
in the database is used by TPump for this purpose. If this monitor table has been
initiated, TPump inserts a row, then updates that same row once every minute,
overlaying with each write the information it previously placed there.
Figure 18: Triggers defined on the TPumpStatusTbl insert into a HoldStatus table, preserving status information
[Prototype Example #15]
By relying on database triggers, all information TPump writes to its monitor table can be automatically moved into a history table also located in the Teradata database. This history table could specifically hold images taken from the monitor table row. The database triggers would be fired only when the single row in the TPumpStatusTbl changes, either because it is inserted (as it would be at job start), modified (once every minute during the job), or deleted (at end of job). Triggering to a history table would allow a delta for various load statistics to be computed between TPump writes to the status table, as the information made available by TPump is cumulative.
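A minimal sketch of the update-case trigger follows. The HoldStatus history table and the column names referenced here (JobName, RowsInserted) are placeholders rather than the actual TPumpStatusTbl layout; similar triggers would be defined for the insert and delete cases:

replace trigger TP_Stat_upd
after update on TPumpStatusTbl
referencing new as n
for each row
(insert into HoldStatus
 values (current_timestamp(0)
        ,n.JobName
        ,n.RowsInserted););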
7. Enterprise Data Warehouse Considerations
When events are integrated into the Teradata data warehouse, items such as
monitoring, security, and performance all need to be pro-actively considered. This
chapter will address some of these topics.
7.1. Monitoring
Monitoring for events is similar to, yet different from, the standard data warehouse monitoring that may already be in place. While setting up monitoring to track the volume and nature of the events passing through your system may be fairly standard, correlating the resources used to particular events may require more creative thinking.
7.1.1. Database Query Log and Stored Procedures
Database Query Log (DBQL) offers many benefits in the Teradata world in tracking query activity. However, because it is focused on Teradata database activity, DBQL only captures query characteristics and resource usage from activity that is running on the AMPs. While SQL statements issued from stored procedures execute in the AMPs, the stored procedure itself runs in the parsing engine.
Both the stored procedure call and each SQL statement within the stored procedure will be logged as separate entries in DBQL, each with a distinct Query ID. If the QueryText column in DBQLogTbl contains the stored procedure call, for example something like 'call pksscan (10, avgep)', then you will see zero in the TotalCPUTime column of that row. Only the Teradata SQL statements within the stored procedure, each of which will have its own row in DBQLogTbl, will register CPU and I/O usage.
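A simple way to see this effect is to pull the relevant rows out of the DBQL log table. A sketch follows, assuming default logging and a hypothetical session number:

SELECT QueryID
      ,TotalCPUTime
      ,QueryText (VARCHAR(100))
FROM DBC.DBQLogTbl
WHERE SessionID = 123456   /* hypothetical session of interest */
ORDER BY QueryID;

The row whose QueryText shows the CALL will report zero TotalCPUTime; the SQL statements issued inside the procedure appear as separate rows that carry the CPU and I/O usage.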
An external stored procedure that does not issue Teradata SQL will never accumulate resource usage that can be reported in DBQL. However, a call to an external stored procedure will get logged in the default logging table and will be given a query ID. In addition, all of the resource accumulations from stored procedure executions will be accounted for within the ResUsage tables.
When a stored procedure, whether external or SQL-based, calls another stored
procedure, both call statements will get logged in DBQLogTbl. However, the
nested stored procedure will always get a zero QueryID and nulls in the QueryText
column. If the second-level stored procedure executes SQL statements, they will
not appear in DBQL. This is because the current release of DBQL does not have
access to the request and session context when there is a call within a call.
7.1.2. Alternatives for Monitoring Events
Because there are no automatic approaches to getting usage information out of
stored procedures, some simple home-grown techniques are useful to consider.
Below are some thoughts to get you started:
• Build your own log table that the different processes you build can log into, capturing response times, counts, actions taken, and so on.
• As the first statement in a stored procedure, and as the last, select the current timestamp, and insert both values into a log table to capture actual response time (see the sketch after this list).
• Utilize a UDF that captures CPU usage levels based on operating-system process-level usage. For example, a UDF could be written to look at Windows PerfMon measurements.
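The timestamp technique mentioned in the list above might look like the following sketch. The procedure name, the sp_response_log table, and its layout are hypothetical:

REPLACE PROCEDURE ADW.ScanForEvents ()
BEGIN
  DECLARE start_ts TIMESTAMP(6);
  SET start_ts = CURRENT_TIMESTAMP(6);

  /* ... the procedure's real analysis SQL would go here ... */

  /* Record the procedure name, start time, and end time. */
  INSERT INTO ADW.sp_response_log
  VALUES ('ScanForEvents', :start_ts, CURRENT_TIMESTAMP(6));
END;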
If you are using either a UDF or an external stored procedure, you may choose to
put your own logging in place. You can take advantage of the debug table as a
temporary place to log things. Or you can log to a file which is later processed or
loaded into Teradata.
When monitoring is expanded to cover events, it is more important to set up account strings appropriately. You might want more granular control over users, with
different roles and different priorities for different event types. Even the same
stored procedure executing at different times of day may require different execution
profiles.
7.2. Security
During event processing if you invoke an external service, you will be doing this external work under some outside authority, beyond the scope of standard Teradata
security.
7.2.1. UDFs and Security
User Defined Functions run as ‘root’ (in unprotected mode) or ‘tdatuser’ (in protected mode) on UNIX systems and in system mode on Windows systems. Running as ‘root’ (or system mode on Windows) gives the UDF super user privileges.
Because the execution itself cannot be managed, the point of control needs to be
in the privilege to CREATE and EXECUTE UDFs.
When working with UDFs, it is critical to oversee who is given the two above privileges. In addition, thorough testing should be required before a UDF is moved into
production.
7.2.2. Approaches to Handling External Security
A basic issue with security today is how to pass authentication to an external platform. For example, a user name and password will need to be made available to a stored procedure that will be going to an external platform, so work can be done there.
An easy thing to do would be to code the external stored procedure (XSP) with the user and password information, and any other logon detail, contained within. Another approach is to set up your XSP with arguments that pass the user and password information at execution time. Or the XSP could call a second-level SQL-based stored procedure that determines the appropriate logon and security information from within the Teradata database.
The graphic below shows a similar approach that uses a high-level stored procedure that first calls an SQL-based stored procedure to get the security information, then calls an XSP to do the external access using the information provided by the first stored procedure.
Figure 19: Providing a user and password for an external platform
This same approach could be enhanced by using an ODBC call to the external database that is going to be accessed, replacing the GetPassword stored procedure illustrated above. An enterprise security server (for example, LDAP) could also be accessed as the first step, in order to establish a valid identification.
Whatever approach is used, the scenario will be similar: authentication for the external service is stored somewhere, and the user executing the XSP or the UDF has to be able to access this authentication data based on who he or she is inside Teradata.
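A sketch of the coordinating procedure in the figure above follows. All of the object names (GetPassword, GetExtData, the logon variables) and the parameter shapes are hypothetical:

REPLACE PROCEDURE ADW.CoordinateExtAccess (IN svc_name VARCHAR(30))
BEGIN
  DECLARE ext_user VARCHAR(30);
  DECLARE ext_pwd  VARCHAR(30);

  /* Steps 1-2: an SQL-based procedure reads the security table
     inside Teradata and returns the credentials. */
  CALL ADW.GetPassword (:svc_name, :ext_user, :ext_pwd);

  /* Steps 3-6: an external stored procedure logs on to the external
     platform with those credentials and performs the external work. */
  CALL ADW.GetExtData (:svc_name, :ext_user, :ext_pwd);
END;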
7.2.3. Auditing
An important part of any security scheme is a means to capture violations. In an
event architecture, this type of auditing is even more important because of the risk
of poorly-written or rogue UDFs. A built-in logging structure, such as discussed
earlier in the context of monitoring, can do double-duty as a method of enforcing an
audit trail for external activities. Good coding practices around traceability will also
rise in importance.
7.3. Workload Management
Components that are involved in event processing in Teradata may show different
utilization patterns than what you are used to seeing. For example, you may have
these types of things happening in combination:
• A continuously-running stored procedure issuing lots of short queries.
• A stored procedure that does a single in-depth data analysis query.
• Events being processed off an external queue.
• Stored procedures being triggered in response to external events.
• Queue tables being processed at irregular rates.
As background material for this section, it may be helpful if you read through the
Orange Books already published on Using Priority Scheduler and Using Teradata
Dynamic Query Manager.
7.3.1. Priorities
There may need to be procedures put in place to monitor and change priorities of
different processes that are running. By having more granular user IDs you can
track event frequency and resource usage, and you can prioritize event processing
with more flexibility.
All of the Teradata database activity for a given user session will run under the priority established by that user. This is true whether the resources are being used in
the AMP or in the PE, as would be the case with stored procedure logic. Keep in
mind that a UDF or XSP can consume resources outside of Teradata, which will
not be under the control of Teradata workload management facilities during that
period of time.
While AMP worker tasks may be held during the time that a UDF executes externally, based on the SQL step that caused the UDF to execute, no additional AWTs
are either acquired or held by UDF execution.
7.3.2. Workload Rules
TDWM (formerly known as Teradata Dynamic Query Manager) supports both filter rules (object access and query resource types) and throttle rules. The throttle rules, previously called workload limit rules, control concurrency levels within certain groupings of users.
These TDWM rules may need to be reconsidered when put into the context of queries that participate in events. Rules will apply only to the SQL statements that access data in Teradata, but have no usefulness in controlling external access.
Here are some guidelines on where TDWM will or will not be effective with event components:
• Only the SQL within stored procedures will adhere to rules.
• External stored procedures will not be impacted by rules.
• Filter rules can be associated with queue tables.
• Filter rules cannot recognize or restrict UDFs.
• Throttles can delay or reject queries that invoke UDFs or that access queue tables.
Some additional considerations specific to TDWM follow:
Stored Procedures: Within a stored procedure, each SQL statement is a separate request and will be considered by TDWM separately for rule-compliance. It is
possible for the first request in a stored procedure to comply with all rules and execute successfully, but then to have a subsequent SQL statement in the stored procedure be delayed or rejected.
For example, if a throttle rule with a “reject” option has been defined, and one of
the SQL requests has a qualifying all-AMP step that would violate the rule, then the
entire stored procedure would be terminated with a 3151 error. Checks for TDWM
rules are made after the stored procedure begins execution, not before. To prevent abnormal termination under such conditions as described earlier, the stored
procedure could be coded to include error handling for the 3151 error code.
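A sketch of such a handler follows; the procedure body, the sp_error_log table, and its layout are hypothetical, and the handler simply records the rejection rather than letting the procedure terminate abnormally:

REPLACE PROCEDURE ADW.EventAnalysis ()
BEGIN
  /* If any statement fails, check for the TDWM reject error (3151)
     and log it; other exceptions simply end the procedure. */
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    IF SQLCODE = 3151 THEN
      INSERT INTO ADW.sp_error_log
      VALUES (CURRENT_TIMESTAMP(0), 3151,
              'Request rejected by TDWM throttle rule');
    END IF;
  END;

  /* Analysis SQL that may contain an all-AMP step subject to a
     throttle rule. */
  INSERT INTO ADW.StrategySummary
  SELECT Strategy, COUNT(*)
  FROM ADW.StrategyRecommendation
  GROUP BY 1;
END;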
If the throttle rule specifies “delay” rather than “reject” the stored procedure would
stay alive in a suspended state and wait until its SQL request got off the delay
queue, executed and completed.
Delay Queues: Be careful when using throttle rules that you don’t inadvertently
place queries that participate in event processing in a delay queue. Depending on
your design, event processing may initiate a data analysis query. This query may
be viewed by a workload administrator as the type of work that is resource intensive and should be limited to a specified concurrency level. However, delaying this
query may not be in the interest of the broader event architecture. Workload management query rules may need to be revisited in the context of this broader perspective.
8. Interacting with Event Architectures Outside Teradata
Twenty years ago Teradata made it a priority to fit into the mainframe environment.
Teradata Director Program (TDP), the channel connection, MVS-based utilities,
and applications such as TS/API all played a part in opening up Teradata for close
association with mainframe data and applications.
Teradata has evolved to plug into the organization’s operational environment, beyond the mainframe. Teradata supports open standards and interfaces such as
ODBC and JDBC. Operational data stores, for example, previously running on external data marts, can now run inside of Teradata. Active data warehousing is a logical extension of this continuing drive towards interaction and cooperation beyond the database.
The next step forward, and outward, integrates Teradata more tightly with the emerging Service Oriented Architectures (SOAs) that organizations are putting in place to support their operational applications. Some organizations will already have made their architectural choices. This chapter discusses ideas and recommendations for fitting Teradata into such a backdrop, starting with a brief explanation of what constitutes an SOA.
8.1. Service Oriented Architectures
Service Oriented Architectures are development and runtime frameworks that are
best represented by products such as IBM® WebSphere® Business Integration
Server, BEA’s WebLogic Integration™, Tibco® BusinessWorks and Microsoft® BizTalk Server®. These products’ architectures provide a standard way to integrate
and open up services and processes throughout the enterprise, and they are rapidly growing in popularity across industries.
Most SOA products include a design-time graphical user interface to define integration scenarios and offer workflow management. In addition, they provide web-based administration and monitoring. Most allow you to drag and drop symbolic
representations of services being integrated. SOAs leverage standards such as
XML, J2EE, .NET, and Web Services, are quick and easy to set up, offer flexibility
in how you deploy them, and support real-time information exchange.
To better understand the SOA framework, a few acronyms and frequently used
terms are listed below:
• SOA – Service Oriented Architecture, the framework for interoperability. The processing of transactions is made available as business services with known interfaces using a standard representation. Services are published using a common interface descriptor language and protocol, such as WSDL.
• WSDL – Web Services Description Language, a specification for describing networked XML-based services. It provides a simple way for service providers to describe the basic format of requests to their systems regardless of the underlying protocol. A WSDL definition will be required to complete the process of exposing the DML as a service that can be used. After the WSDL definition has been recorded, the SOA design tool will be able to reference that service at the appropriate place.
• SOAP – Simple Object Access Protocol. SOAP is a lightweight, vendor-neutral, text-based protocol that uses XML for the exchange of information in a decentralized, distributed environment.
• UDDI – Universal Description, Discovery, and Integration. UDDI directories are like global white pages: a place you can go to look up technical details about working with other web services, and to advertise your own services. UDDI is perhaps the most well-known web services directory standard.
• Adapter – An implementation of a communication mechanism, such as a particular protocol, that allows different software to talk.
• Orchestration – Graphical creation of business processes. It brings understandability and provides a framework that allows business analysts to interact with IT architects.
• Expose – Make a component, such as a service, available for interaction within an SOA.
8.2. How to Expose an Event or Service in Teradata
“Exposing” a service or an event means making it available for interaction with a Service Oriented Architecture. What is of concern from the Teradata perspective is how to design components in such a way that they are open to, and can plug into, whatever event architecture is already in place.
Some products, such as Tibco, provide a specific Teradata adapter. This allows Teradata components to plug into the enterprise framework as it is defined by Tibco. Section 8.3 illustrates a real-life scenario involving Teradata and Tibco. This prototype relies on the Tibco adapter to allow components from different platforms to talk.
WebLogic Integration (WLI) enables integration within an Enterprise Information Systems (EIS) framework. WLI provides a set of adapters to integrate with back-end systems and enterprise applications and technologies, and supports custom-made adapters using an Adapter Development Kit.
On the other hand, any Teradata application can be written in either a Java or a .NET environment and then be exposed as a service within one of those frameworks. This does not require a special adapter. Once you have exposed a service, it can be placed in a larger business transaction using one of the design-time tools that are available in all SOAs.
8.3. Example of Teradata within an SOA
This section describes a prototype in which Teradata is integrated into a Tibco run
time framework. This particular application is from the transportation industry, and
illustrates how Teradata fits into an SOA.
8.3.1. The Event
A moving vehicle carries some risk of developing mechanical problems during a
long trip. Even the most thorough maintenance check back home may not prevent
an incident if a part goes bad somewhere along the route. In order to address this,
the Teradata customer has their vehicles instrumented to produce diagnostics
measurements periodically. These diagnostics can be used to understand the
health of the moving parts when they are in action.
For example, when a vehicle component becomes overheated, that could be a sign
that something is causing unusual friction inside the mechanism. Having this information right away while the vehicle is en route could prevent a serious accident.
But one reading from the instrumentation alone is inadequate in diagnosing a problem. To unnecessarily delay or sideline a vehicle that is performing a valuable
commercial service just on the suspicion of a problem could be costly and create
customer dissatisfaction. What is needed to make the right decision is to interpret
the on-the-spot diagnostic readings by looking at the history of that particular component and factoring in environmental variables.
As soon as diagnostics for a vehicle are emitted, these readings are batched up and immediately sent into the Teradata database as one transaction. Inside the Teradata database, a history of each component’s diagnostics has been collected, and during the event analysis the last 8 readings are compared against the current transaction’s readings. As a result of this analysis, several different actions can be ordered. For example, the mandated action could be to stop the vehicle immediately, stop at the next convenient location, or perform maintenance at the end of the trip. Most often the action is to do nothing.
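Although the customer’s actual analysis logic is not reproduced here, a hedged sketch of this kind of comparison is shown below; all table and column names are hypothetical, and the threshold is arbitrary.

-- Compare each component's current reading against the average of its
-- last 8 historical readings; flag readings well above the recent norm.
SELECT  c.vehicle_id,
        c.component_id,
        c.reading       AS current_reading,
        AVG(h.reading)  AS recent_avg
FROM    current_readings c,
        (SELECT vehicle_id, component_id, reading
         FROM diagnostics_history
         QUALIFY RANK() OVER (PARTITION BY vehicle_id, component_id
                              ORDER BY reading_ts DESC) <= 8) AS h
WHERE   c.vehicle_id = h.vehicle_id
AND     c.component_id = h.component_id
GROUP BY 1, 2, 3
HAVING  c.reading > 1.2 * AVG(h.reading);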
8.3.2. Using Tibco
Tibco provides a framework for a Service Oriented Architecture and allows you to
design a workflow, using icons, to represent a process and its services.
In this prototype, the BusinessWorks component was used to map out the processes. In BusinessWorks, a workflow represents a process, and a Project includes multiple workflows.
Below is a screen shot showing a simple workflow.
Figure 20: Tibco Workflow example
In this prototype, data comes in as flat files, so the first step in defining this process in Tibco’s Designer is to drag into the workspace an icon (referred to as a palette) representing the reading of a flat file. Once you’ve dragged this palette in, all you need to do is name it, give it properties, and associate it with a file name. The file name you enter will be the file that contains the recent wheel readings.
Sending a query along with the file into Teradata is accomplished by dragging in an
instance of another palette known as the “Teradata Adaptor Configuration” palette.
Figure 21: Teradata Adaptor Configuration
The Teradata adapter exposes a number of design-time properties, and you may configure each instance of the adapter differently. For example, one variable is the "subscriber bulk insert size", which is similar to the pack factor in TPump; it directs the adapter to load data into Teradata with a specified batch size.
As part of configuring this adapter instance, Tibco will read the Teradata dictionary to pull information, such as the columns in a table, to help you establish the layout of staging tables. There are also options that allow you to specify data conversions, if needed. The flexibility is there to map one input file to several different database tables.
Figure 22: Teradata Adaptor Services Settings
Publication and subscription services are available for Teradata, again using palettes specific to those activities. If you drag a subscription palette into your workflow, for example, the tool automatically creates a shadow table for you, along with a trigger that inserts the rows to be published into that shadow table. All of this is based on options you have selected. The tool then automatically sets up polling on the shadow table and pulls the data into its message bus. Of course, you will also need to define who subscribes to that data.
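The adapter generates these objects itself, so the DDL below is only a conceptual sketch with hypothetical names; the actual shadow table and trigger that Tibco produces will differ.

-- Shadow table that holds rows waiting to be published to the message bus.
CREATE TABLE wheel_readings_shadow
   (vehicle_id   INTEGER
   ,component_id INTEGER
   ,reading      DECIMAL(9,2)
   ,reading_ts   TIMESTAMP(0))
PRIMARY INDEX ( vehicle_id );

-- Trigger that copies each newly inserted base-table row into the shadow table.
CREATE TRIGGER wheel_readings_publish
AFTER INSERT ON wheel_readings
REFERENCING NEW AS newrow
FOR EACH ROW
   INSERT INTO wheel_readings_shadow
   VALUES (newrow.vehicle_id, newrow.component_id, newrow.reading, newrow.reading_ts);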
After defining and testing an instance of the Teradata adapter, the next step is to
deploy it into the infrastructure.
8.3.3. Tibco Example Conclusions
While this section shows a prototype using Tibco, it is only one of many possible
examples of how Teradata can work with standard Service Oriented Architectures.
This example is intended to illustrate how Teradata can plug into both the event
application and the supporting structure, and could easily be redefined using WebLogic Integration, Business Integration Server, BizTalk Server, or any standards-based EAI infrastructure.
9. Final Thoughts
In V2R6, the Teradata Database has a strong new look, offering foundational features for extensibility and integration into modern enterprise architectures. Innovative event-oriented applications are made possible by these new features, which are both forward-looking in their outreach to the operational world and well-grounded in traditional Teradata strengths. This Orange Book not only illustrates these new capabilities, but also describes various prototypes that establish the effectiveness and relevance of these features.
We've examined the usefulness of queue tables for message passing inside Teradata, with their unique ability to support blocking and/or destructive reads. We've
demonstrated how external stored procedures can read or write to message
queues on other platforms, and invoke or be invoked from SQL-based stored procedures, making up a virtual chain of event activity. We've proven the usefulness
of both scalar UDFs and table functions for on-the-spot analytics, complex transformations and text manipulations, and even more extensive external I/O activity.
Consider these as starting points, something to build on, modest examples of what
Teradata can now offer in the world of event processing.
Appendix: XSLT UDF and Access Module to Transform XML to Relational
This appendix illustrates using XSLT to convert XML to vartext. XSLT is a W3C standard language for transforming XML. There are numerous free and commercial XSLT editors, debuggers, and processors. A prototype under development embeds the XSLT processing in a Teradata utility access module and in a table function UDF. In this example, the XSLT processor is invoked via a table function UDF. Given an XML document and an XSL document, the access module (Axsmod) returns vartext and the UDF returns rows comprised of varchars. The XSLT is used to navigate and select the desired content from the XML document. The UDF takes the source XML and the transforming XSLT as input arguments (varchar/varbyte, clob/blob, or literal) and returns a sub-table that can be used to update tables via SQL.
The sample below shows how an XSLT UDF creates a table of rows of vartext from varchar columns containing XML.
CREATE SET TABLE RMH.xmlorderstage, NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ordernum INTEGER
,xsltref INTEGER
,xmlOrder VARCHAR(30000) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( ordernum );
CREATE SET TABLE RMH.xslts ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
xsltnum INTEGER
,xslt VARCHAR(30000) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( xsltnum );
<?xml version="1.0"?>
<ROOT>
<ORDER>
<DATE>08/12/2004</DATE>
<PO_NUMBER>108</PO_NUMBER>
<BILLTO>Rick</BILLTO>
<ITEMS>
<ITEM>
<PARTNUM>101</PARTNUM>
<DESC>Partners Conference Ticket</DESC>
<USPRICE>1200.00</USPRICE>
</ITEM>
<ITEM>
<PARTNUM>148</PARTNUM>
<DESC>V2R5.1 Stored Procedures and Embedded SQL</DESC>
<USPRICE>28.95</USPRICE>
</ITEM></ITEMS></ORDER></ROOT>
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method='text'/>
<xsl:strip-space elements="*"/>
<xsl:template match='/'>
<xsl:apply-templates select='ROOT' mode="order"/>
<xsl:apply-templates select='ROOT/ORDER/ITEMS' mode="item"/>
</xsl:template>
<xsl:template match='ITEM' mode="item">
<xsl:value-of select='"item"'/>
<xsl:text>,</xsl:text>
<xsl:value-of select='/ROOT/ORDER/PO_NUMBER/text()'/>
<xsl:text>,,,</xsl:text>
<xsl:value-of select="PARTNUM"/>
<xsl:text>,</xsl:text>
<xsl:value-of select='DESC/text()'/>
<xsl:text>,</xsl:text>
<xsl:value-of select="USPRICE"/>
<xsl:text>&#xa;</xsl:text>
</xsl:template>
<xsl:template match='ORDER' mode="order">
<xsl:value-of select='"order"'/>
<xsl:text>,</xsl:text>
<xsl:value-of select='PO_NUMBER/text()'/>
<xsl:text>,</xsl:text>
<xsl:value-of select="DATE"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="BILLTO"/>
<xsl:text>,,,</xsl:text>
<xsl:text>&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
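For illustration only, the staging tables might be populated as follows before running the table-function query at the end of this appendix; the key values are arbitrary and the document text is abbreviated.

-- Abbreviated documents shown for readability; in practice the full XML and XSLT
-- text (or CLOBs) would be supplied.
INSERT INTO RMH.xslts (xsltnum, xslt)
VALUES (1, '<?xml version="1.0"?><xsl:stylesheet ...>...</xsl:stylesheet>');

INSERT INTO RMH.xmlorderstage (ordernum, xsltref, xmlOrder)
VALUES (108, 1, '<?xml version="1.0"?><ROOT><ORDER>...</ORDER></ROOT>');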
replace function xslt(prmiPassthru INTEGER, prmiXMLDoc varchar(32000), prmiXSLT varchar(32000))
returns table (prmPassthru integer, vartext varchar(32000))
language C NO SQL parameter style sql EXTERNAL NAME
'F!xslt!SS!xslt!c:/files/projects/xslt/xslt.c';
SELECT L.vartext
FROM (SELECT xmlOrder, xslt
      FROM XMLOrderstage, XSLTS
      WHERE xsltref = xsltnum) AS T,
     TABLE( XSLT(1, T.xmlOrder, T.xslt) ) AS L;
This query returns the following vartext rows:
order,108,08/12/2004,Rick,,,
item,108,,,101,Partners Conference Ticket,1200.00
item,108,,,148,V2R5.1 Stored Procedures and Embedded SQL,28.95
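As a further hedged sketch, the same table function can feed an INSERT ... SELECT so that the parsed output lands directly in a relational staging table; the target table parsed_order_lines is hypothetical, and here the pass-through parameter carries the order number.

-- Load each parsed vartext line, keyed by its order number, into a staging table.
INSERT INTO parsed_order_lines (ordernum, line_text)
SELECT L.prmPassthru, L.vartext
FROM (SELECT ordernum, xmlOrder, xslt
      FROM XMLOrderstage, XSLTS
      WHERE xsltref = xsltnum) AS T,
     TABLE( XSLT(T.ordernum, T.xmlOrder, T.xslt) ) AS L;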