Facilities and Techniques
for Event Processing
Teradata Database V2R6
By: Active Data Warehouse Center of Expertise
Date: February 14, 2005
Doc: 541-0004922-A01
Abstract: The emergence of event processing in Teradata Database V2R6 allows a new
class of active data warehouse applications to be supported, and deepens the ability of
Teradata to interact with an organization’s operational environment.
NCR CONFIDENTIAL
Copyright © 2005 by NCR Corporation.
All Rights Reserved.
This document, which includes the information contained herein: (i) is the exclusive property of NCR Corporation; (ii) constitutes NCR confidential information; (iii) may not be disclosed by you to third parties;
(iv) may only be used by you for the exclusive purpose of facilitating your internal NCR-authorized use of
the NCR product(s) described in this document to the extent that you have separately acquired a written
license from NCR for such product(s); and (v) is provided to you solely on an "as-is" basis. In no case will
you cause this document or its contents to be disseminated to any third party, reproduced or copied by
any means (in whole or in part) without NCR's prior written consent. Any copy of this document, or portion thereof, must include this notice, and all other restrictive legends appearing in this document. Note
that any product, process or technology described in this document may be the subject of other intellectual property rights reserved by NCR and are not licensed hereunder. No license rights will be implied.
Use, duplication or disclosure by the United States government is subject to the restrictions set forth in
DFARS 252.227-7013 (c) (1) (ii) and FAR 52.227-19. Other brand and product names used herein are
for identification purposes only and may be trademarks of their respective companies.
WebSphere® is a registered trademark of International Business Machines Corporation in the US and
other countries.
WebLogic Integration™ is a trademark of BEA Corporation.
BizTalk Server® is a registered trademark of Microsoft Corporation.
Tibco® is a registered trademark of Tibco Software, Inc.
Revision/Version: A01
Authors: Primary contributors: Rick Glick, Bob Hahn; Others: Carrie Ballinger
Date: 02-07-05
Comments: Initial version
Table of Contents
1. Introduction....................................................................................................1
1.1. Why Do Event Detection in Teradata? ............................................................................... 1
1.2. Tools for Event Generation and Management ................................................................... 2
1.3. Scope.................................................................................................................................. 2
2. Queue Tables .................................................................................................3
2.1. Queue Table Advantages for Event Processing ................................................................ 3
2.2. Considerations Using Queue Tables.................................................................................. 6
2.3. Example of Queue Table Use ............................................................................................ 9
3. Stored Procedures ......................................................................................10
3.1. Standard vs External Stored Procedures ......................................................................... 10
3.2. External Stored Procedure Examples .............................................................................. 11
3.3. A Simple Work Dispatcher Example ................................................................................ 12
4. User Defined Functions -- Scalar ...............................................................15
4.1. Protected vs Nonprotected ............................................................................................... 15
4.2. Opportunities for Scalar UDFs in Event Processing......................................................... 16
5. User Defined Functions -- Table.................................................................21
5.1. How Table Functions Work .............................................................................................. 21
5.2. Table Functions with Transformations and Text Manipulation......................................... 22
5.3. Table Functions with Analysis .......................................................................................... 23
5.4. Table Functions that Generate Data ................................................................................ 23
5.5. Table Functions with External I/O .................................................................................... 28
5.6. UDF Considerations ......................................................................................................... 35
6. Using Triggers in Event Strategies ............................................................36
6.1. The Firing Statement ........................................................................................................ 36
6.2. Trigger Complexity Tradeoffs ........................................................................................... 39
6.3. Other Examples of Event Triggers ................................................................................... 42
7. Enterprise Data Warehouse Considerations.............................................43
7.1. Monitoring ......................................................................................................................... 43
7.2. Security............................................................................................................................. 44
7.3. Workload Management .................................................................................................... 45
8. Interacting with Event Architectures Outside Teradata ...........................48
8.1. Service Oriented Architectures......................................................................................... 48
8.2. How to Expose an Event or Service in Teradata.............................................................. 49
8.3. Example of Teradata within an SOA ................................................................................ 49
9. Final Thoughts.............................................................................................54
Appendix……………………………………………………………………………...55
Table of Figures
Figure 1: Five stages in the evolution of the data warehouse........................................................ 1
Figure 2: Business Process Initiation/Continuation ........................................................................ 3
Figure 3: Periodic polling as a batch approach to capturing events............................................... 5
Figure 4: Queue tables support immediate notification of events .................................................. 5
Figure 5: Selecting a Queue Table primary index for processing performance............................. 7
Figure 6: Mini-batch event processing using queue tables for coordination .................................. 9
Figure 7: All components can be scaled out inside the database .................................................. 9
Figure 8: External stored procedure writes to a queue outside of Teradata ................................ 11
Figure 9: A Spawned Stored Procedure Architecture .................................................................. 13
Figure 10: A UDF is used to parse and process an XML document stored as a CLOB .............. 16
Figure 11: One UPI value is selected, therefore one AMP executes the UDF ........................... 21
Figure 12: Rows selected from the Allamp table control which AMPs do the work ..................... 25
Figure 13: The table function’s output is similar to a derived table ............................................... 27
Figure 14: If the query requests 1 day, only 1 partition is returned by the table function............. 33
Figure 15: Each date selected causes one query to be executed on the remote system............ 34
Figure 16: Processing an event may involve multiple physical transactions ............................... 37
Figure 17: The approach to using triggers can extend or reduce the recovery unit..................... 39
Figure 18: Triggers defined on the TPumpStatusTbl preserving status information ..................... 42
Figure 19: Providing a user and password for external platform.................................................. 45
Figure 20: Tibco Workflow example ............................................................................................. 51
Figure 21: Teradata Adaptor Configuration.................................................................................. 52
Figure 22: Teradata Adaptor Services Settings ........................................................................... 53
1. Introduction
Data warehousing is in a constant state of forward motion. Within that motion, patterns in the evolution of data warehousing can be defined within five typical stages:
1) Reporting; 2) Analyzing; 3) Predicting; 4) Operationalizing; and 5) Activating.
This Orange Book is about the mechanics for event processing and connecting
with the enterprise, addressing the 4th and 5th stages in the evolution of active data
warehousing. The focus will be on implementing event processing inside Teradata
using Teradata Database V2R6 features.
(Figure: Stage 1 Reporting, "What happened?"; Stage 2 Analyzing, "Why did it happen?"; Stage 3 Predicting, "What will happen?"; Stage 4 Operationalizing, "What is happening?"; Stage 5 Active Warehousing, "Making it happen!". The workload evolves from primarily batch, through growing ad hoc queries and analytical modeling, to continuous update, time-sensitive queries, and event-initiated actions.)
Figure 1: Five stages in the evolution of the data warehouse
Each iteration of data warehousing builds upon its predecessor to increase overall
business value. Previous evolutionary stages are stepping stones that create the
conditions that support an integrated enterprise-wide event architecture. Such an
event-inclusive architecture requires integrated decision-making data, needs tactical access and fresh data, and relies on crisp and deep analytic capabilities. Event
processing sits on top of a pyramid built upon and fed from these earlier, more established capabilities.
1.1. Why Do Event Detection in Teradata?
For years, Teradata has been a successful platform for performing analysis on
data. Teradata’s ability to make correlations and enable complex analytics such as
predictive modeling has helped people discover interesting things about their data.
The emergence of event capabilities within Teradata allows what used to be ad hoc
discovery endeavors to be standardized into regular practice.
Teradata is now capable of richer interaction with external systems, which allows
you to deploy analytics closer to the data.
Exploiting the event capabilities inside of the Teradata data warehouse offers advantages that may be difficult to realize with an external event architecture tool.
Embedded events support deep analysis with all of the information already collected in the data warehouse. Inside-Teradata events can interact with other decision-making already in play, with minimum overhead and no manual intervention.
The analysis that these events rely on will be richer and deeper if they unfold inside
the data warehouse.
Also, because it is able to bring together disparate sources of information, Teradata
can offer cross-subject-area conclusions to what may appear to be simple questions. For example, determining whether there is going to be a drug interaction may require
access to and knowledge of an array of different data already in Teradata.
1.2. Tools for Event Generation and Management
Teradata has the following components which can be useful in event-initiated processing and in operationalizing event generation and management:
• Queue Tables
• SQL-based Stored Procedures
• External Stored Procedures
• User Defined Functions
• Table User Defined Functions
• Triggers
With queuing functionality inside the database and the ability to reach outside the
database in real-time via external stored procedures and UDFs, Teradata’s role
has expanded into being a player in the overall event architecture.
1.3. Scope
This Orange Book is intended to explain the mechanics available in the database
to initiate events, to interact with the outside world, and to do in-the-database
analysis of events, both internal and external. It does not address the implementation of specific business functions. This Orange Book discusses
the tools, and leaves what can be done with those tools to a later discussion.
The majority of this Orange Book focuses on the internal tools that support event
processing, illustrated by examples and prototypes. But a second, equally important discussion is presented in Chapter 8: How Teradata can fit into modern service-oriented architectures, such as IBM’s WebSphere® Business Integration
Server, BEA’s WebLogic Integration™, or Tibco® BusinessWorks.
The targeted audience for this Orange Book is Teradata database administrators,
enterprise application architects, business analysts, and NCR/Teradata associates
who have a background in Teradata database management and implementation.
The content and terminology assumes the reader has knowledge equivalent to that
acquired from the Teradata Physical Database Design class.
2. Queue Tables
Queue tables are database objects similar to tables but with the properties of
queues. An Orange Book entitled “Queue Tables User’s Guide” (541-0004817),
published in October of 2004, is a good source of detail on how to use these structures.
Two of the most relevant properties of a queue table are exposed in the SELECT
and CONSUME syntax:
1. Blocking read, which means that if a queue table is empty at the time a query is
trying to access a row, the query will wait until a row is placed in the table.
2. Destructive read, which means when a queue table row is being accessed, it is
automatically deleted from the queue table at the time the transaction commits.
Just briefly, when an event is identified, a row can be written immediately into a
queue table, with appropriate data to identify the event. A queue table is generally
associated with at least one process that is waiting for the appearance of data in
the queue table, data which represents an event. This process will then read a
row, by means of a SELECT AND CONSUME statement, from the queue table and
continue on with processing the event. After the event is dealt with, the process listens for another message.
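As a minimal sketch of that pattern (the queue table name and its message columns here are hypothetical, not taken from one of the prototypes in this book):

-- Publisher: record the event the moment it is detected;
-- the QITS column takes its CURRENT_TIMESTAMP default.
INSERT INTO event_qt (event_type, event_key)
VALUES ('NEW_CLAIM', 1001);

-- Consumer: block until a row is available, then read it destructively.
SELECT AND CONSUME TOP 1 * FROM event_qt;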
(Figure: a program or trigger INSERTs a row into a queue table; a monitoring process performs a blocking SELECT AND CONSUME and then either calls a stored procedure or publishes the event externally.)
Figure 2: Business Process Initiation/Continuation
2.1. Queue Table Advantages for Event Processing
There are several key advantages of queue tables for event processing:
• They provide a highly efficient mechanism to pass information from one place to another at the moment that the information becomes available.
• Queue tables allow you to decouple the notification of an event from its consumption, such that they can be processed asynchronously and independently.
• Queue tables can act as a buffer when inserts are occurring at a different rate than consumption.
• They are a more efficient alternative to periodic polling, the primary method of reporting events inside Teradata pre-V2R6.
• The message itself can be structured by defining columns for different attributes.
• The SELECT AND CONSUME implementation offers both blocking reads and destructive reads.
2.1.1. Efficiency
Both writing to and consuming from a queue table are single-AMP operations, and
as such, avoids the blocking potential and coordination effort of all-AMP activities.
One transaction can be writing to the queue table at the same point in time that a
second transaction is reading and consuming a row with a different row hash, without conflict.
In addition, any part of your IT infrastructure, whether internal or external from
Teradata, can act as a publisher or subscriber, either writing to or selecting and
consuming from a queue table.
2.1.2. Asynchronous Processing
Processing an event asynchronously is similar to leaving a voice mail message
when someone’s line is busy. You can be confident that the message will be delivered in the near future, but you are not required to wait around keeping your own
phone line open and twiddling your thumbs until that time comes. You are free to
terminate your call after leaving your message and continue on with other activities.
Asynchronous processing as represented by queue tables has the same advantage. The poster of the message is not weighed down or held back by whatever
subsequent processing is done based on the posted message. It is free to commit its transaction and move ahead with other work.
2.1.3. Queued Notification vs. Periodic Polling
Because it enables asynchronous processing, data can be loaded into a queue table and buffered until it can be processed by procedures that perform potentially
complex analysis or insertion of data.
Periodic polling relies on a program querying a table at regular intervals, a table
that is acting as a collection point for events. Triggers may have been defined to
insert rows into this collector-table when each individual event is recognized.
The interval of time between program executions may or may not match the appearance of expected events. Some of this timed access may be fruitless, as the
polling program may be casting a net when there are no fish in the pond. Polling
exhibits a constant and unresolvable tension between asking too often and incurring more overhead vs. conserving resources by asking less often and allowing too
much time to go by before the event is properly processed. Queue tables solve
this quandary via blocking reads.
An example from a recently-performed active data warehouse benchmark illustrates the advantages of queue tables over the traditional polling approach to reporting events. In this example, a Business Activity Monitoring (BAM) query reports on the effectiveness of current promotions, based on sales that happened
just a few minutes ago. To support this BAM query’s needs, a subset of rows inserted into the Mkt_Basket_Dtl table are identified as part of a special promotion,
and thus as significant events.
This first graphic below illustrates the pre-V2R6 image of the event processing,
where triggers on Mkt_Basket_Dtl inserted a row into a staging table for each promotional sale recognized. Periodic polling was used in this earlier version of the
benchmark to pull those inserted rows out of the staging table every 15 minutes,
followed by an emptying of the staging table.
(Figure: a trigger on Mkt_Basket_Dtl inserts each promotional sale into a promotional staging table; every 15 minutes a periodic query reads that table to produce a report.)
Figure 3: Periodic polling as a batch approach to capturing events
The periodic polling approach was replaced in the V2R6 version of the benchmark
by queue tables. Using queue tables pushes the information out to a dashboard
where it can be seen and acted on when the event is first noticed.
(Figure: a trigger on Mkt_Basket_Dtl inserts each promotional sale into a queue table; a blocking SELECT AND CONSUME query picks up each row as soon as it arrives and displays it on a dashboard.)
Figure 4: Queue tables support immediate notification of events
The queue table definition from the benchmark follows:
create table event02_QT, QUEUE
(table_event02_QT_QITS TIMESTAMP(6) NOT NULL
DEFAULT CURRENT_TIMESTAMP(6),
orderkey DECIMAL(18,0) NOT NULL,
productkey DECIMAL(18,0) NOT NULL )
PRIMARY INDEX (orderkey);
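The trigger that feeds this queue table is not reproduced in the benchmark description. A sketch of what it could look like follows; the Mkt_Basket_Dtl column names and the test for a promotional sale are assumptions:

CREATE TRIGGER promo_event_trg
AFTER INSERT ON Mkt_Basket_Dtl
REFERENCING NEW AS NewSale
FOR EACH ROW
WHEN (NewSale.promo_id IS NOT NULL)   -- assumed test for a promotional sale
INSERT INTO event02_QT (orderkey, productkey)
VALUES (NewSale.orderkey, NewSale.productkey);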
2.1.4. A Structured Message
Because queue tables are based on Teradata base tables, they support the relational concept of sets of rows composed of different columns. Each row in a queue
table represents a different message; each column, a different attribute within the
message.
Thus, the message being passed in Teradata is formatted, with information describing each column held in the data dictionary. There is no extra effort required
to decompose the message into meaningful attributes, as there would be if the
message were read as a continuous string.
2.2. Considerations Using Queue Tables
Queue tables often act as conduits rather than repositories, as a means of transport, not a destination. For that reason, consider the lifespan of a row placed in a
queue table to be short, compared to a base table, whose rows usually reflect nontransient, often historical data.
As a result, you do not need to be concerned with collecting statistics or indexing.
In the first release, views are not supportable on top of queue tables, nor may you
build join indexes or hash indexes on them. The source or destination of the AS
clause (when copying table definitions) may not be a queue table. Currently,
queue tables may not be replicated. Recognize that there needs to be sufficient
space for the queue table to handle maximum bursts of events, and to account for
downtimes of event handlers.
However, just like having confidence that your voice mail messages will be delivered and acted on, it is extremely important that queues be 100% reliable. Because reliable messaging is key to any application using queues, it is important to
note that Teradata queue tables benefit from all the standard database reliability features, including transient journaling, fallback, and the ability to back up and recover.
Database Query Log (DBQL) treats queue tables the same as ordinary database
tables. Below is output from the DBC.DBQLObjTbl after a single insert into, and a
single select and consume from a queue table. When a queue table access is recorded, DBQL uses the object type ‘T,’ the same object type as a base table.
QueryID  ObjectTableName  ObjectColumnName  ObjectType
31590    ?                ?                 D
31590    QUEUETST1        ?                 T
31590    QUEUETST1        MessageBody       C
31590    QUEUETST1        MessageID         C
31590    QUEUETST1        messageT          C
31595    ?                ?                 D
31595    QUEUETST1        ?                 T
31595    QUEUETST1        messageT          C
31595    QUEUETST1        MessageID         C
31595    QUEUETST1        MessageBody       C
2.2.1. Primary Index Selection
Selection of a primary index may or may not be important when defining a queue
table. The default, if no primary index is specified, is the QITS (Queue Insertion
Time Stamp) column, the first column in the queue table, a column that reflects the
time of insertion of that row. If you need to do a primary index update or delete of
one row from the table, in most cases you can easily browse the table and get the
value of the QITS and other columns needed for the single-AMP update activity.
However, you may choose a key column based on the input data that is easily
known, to support frequent updating. The queue table definition presented in Section 2.1.3 above has orderkey as its primary index, for example. Using a business
entity for the primary index makes sense if you have a requirement to re-order the
queue on a regular basis (or otherwise manipulate the rows), and the number of
rows it holds is not trivial, making browsing the entire queue table for each update
less desirable.
Another situation calls for more attention to PI selection of the queue table. That is
the case where you are performing insert/select processing from a staging table
into a queue table. Such an insert/select might perform a similar function as a trigger when doing row-at-a-time inserts: Select out the few rows that compose
events and insert them immediately into a queue table for further processing.
(Figure: two insert/select scenarios across AMPs. With different primary indexing (staging table PI = ClaimID, queue table PI = QITS), the queue table row is redistributed to a different AMP than the one where the staging table row was inserted. With the same primary indexing (both PI = ClaimID), the staging table row and the queue table row are inserted on the same AMP, avoiding row redistribution.)
Figure 5: Selecting a Queue Table primary index for processing performance
If you are using a mini-batch approach to loading data, then using set processing to
identify events makes sense. The WHERE clause on the insert/select statement
would contain the event-identification criteria. In this case, illustrated above, having a queue table with the same primary index definition as the staging table will
improve the efficiency of the insert/select processing. The two inserts into the staging and the queue table would happen on the same AMP, eliminating row redistribution overhead.
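A sketch of such an insert/select follows; the staging table, queue table, and the event test in the WHERE clause are all hypothetical:

-- Both tables are assumed to share PRIMARY INDEX (ClaimID),
-- so each selected row is inserted on the AMP where it already resides.
INSERT INTO claims_event_qt (ClaimID, ClaimAmt)
SELECT ClaimID, ClaimAmt
FROM   claims_staging
WHERE  ClaimAmt > 100000;   -- event-identification criteria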
Designing a queue table to share the same primary index as a base table might
also be useful for tactical queries. A given tactical application may want to peek at
the queue, or seek out the presence of a specific row on the queue. If the base table primary index value is known, and is also the primary index value of a potential
queue table row, then single-AMP access will be enabled.
A similar situation exists when TPump inserts into a base table that has a trigger that writes to a queue table. If the queue table and the base table being inserted into share the same primary index, and there is a desire to serialize the input to avoid cross-session blocking, then serializing on the base table's primary index will also cause inserts into the queue table to be serialized effectively.
2.2.2. When Requests are Delayed
Programs attempting to SELECT AND CONSUME from queue tables will block until such time as there is a row present in the queue table. No locks on the queue
table are granted to the transaction that is in delay mode waiting for a row.
A limit on the number of sessions that may go into the delayed state
has been set at 20% of the total possible sessions (usually that will be 24 delayed
sessions per node). Once that threshold has been exceeded, an error will be returned to the user whose query would have been the next one delayed.
If Teradata Dynamic Workload Manager (formerly known as Teradata Dynamic
Query Manager) rules are enabled, it is possible that requests intended to select
and consume may themselves be delayed due to object throttle rules (formerly
known as workload limit rules), or rejected due to query filter rules (formerly known
as query management rules). In the former case (that is, when delayed), the
queue table queue depth may increase and events may not be processed as
quickly as anticipated.
2.2.3. Transactional Considerations
In order to avoid tying up database resources unnecessarily within a transaction, it
is recommended that the SELECT AND CONSUME statement happen first in an
explicit, or implicit, transaction. That way if the queue table access statement
blocks, no sister-statement row or table locks will be held, potentially blocking other
transactions.
In the following transaction the SELECT AND CONSUME statement will wait for a
queue table row to be inserted and committed by another transaction. Because it
is the last statement in the transaction, table level write locks placed by the all-AMP
update statement preceding it will be held until the queue table can be read.
BT;
UPDATE CLIENT SET CALLFLAG = 'Y' WHERE ABS_HON_DT < RECEIPT_DT;
SELECT AND CONSUME TOP 1 * FROM QTBL;
ET;
If queue table consume commands are placed within the same transaction as other
statements that hold locks, do the queue table access first:
BT;
SELECT AND CONSUME TOP 1 * FROM QTBL;
UPDATE CLIENT SET CALLFLAG = 'Y' WHERE ABS_HON_DT < RECEIPT_DT;
ET;
It is important to do the appropriate action for an event as part of the same transaction in which it is consumed, in order to prevent events from being lost.
2.3. Example of Queue Table Use
A queue table can be useful to pass information from one stored procedure to another, each of which has a different task in the chain of event processing. For example, if one stored procedure is performing mini-batch insert/selects into a base
table, it could use an insert into a queue table as a method to indicate that a mini-batch cycle is complete.
A second stored procedure could be standing by trying to read from the queue table so it can initiate further, more complex processing, perhaps reading and updating information in another base table, or performing more complex analyses.
Because the queue table supports structured messages, the first stored procedure
could pass detailed information as to what actions need to be taken next. The
second stored procedure, relying on its procedural logic, could branch in the code
depending on what information was passed in the message.
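A minimal sketch of such a consuming stored procedure follows; the table names, message columns, and the branch taken are all hypothetical:

REPLACE PROCEDURE process_load_events()
BEGIN
  DECLARE v_action   VARCHAR(32);
  DECLARE v_batch_id INTEGER;

  -- Block until the load procedure signals that a mini-batch cycle is complete
  SELECT AND CONSUME TOP 1 action_code, batch_id
  INTO :v_action, :v_batch_id
  FROM minibatch_qt;

  -- Branch on the structured message content
  IF v_action = 'SUMMARIZE' THEN
    INSERT INTO daily_summary (batch_id, total_amount)
    SELECT :v_batch_id, SUM(sale_amount)
    FROM   staged_sales
    WHERE  batch_id = :v_batch_id;
  END IF;
END;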
(Figure: a load stored procedure performs mini-batch insert/selects and then signifies completion by inserting a row into a queue table; an event-processing stored procedure selects/consumes that queue table row and processes events in another base table.)
Figure 6: Mini-batch event processing using queue tables for coordination
Queue tables being read by stored procedures lend themselves well to scaling out
as demand grows. When the number of rows in the queue table exceeds the capability of a single stored procedure to process, you may increase stored procedure instances as needed.
(Figure: two queue tables, each read via SELECT AND CONSUME by multiple stored procedure instances (SP1, SP2, SP3), showing how every component can be multiplied as demand grows.)
Figure 7: All components can be scaled out inside the database
3. Stored Procedures
Stored procedures can encapsulate both event detection and its processing, and
allow the mixing of SQL with procedural logic. Teradata has supported standard
stored procedures since the V2R4.0 release. Stored procedures are database objects that must be compiled prior to use. Their object code is held in the data dictionary. Parameters may be passed into and out of a stored procedure, which itself
is a program that is called from and executes within the database.
A stored procedure executes in the parsing engine (PE) under one parser task.
While the SQL portions of the stored procedure will be executed across all AMPs
and benefit from Teradata’s inherent parallelism, the procedural portions will not.
Algorithms will be most efficient when written to make use of Teradata's set processing advantage. Particular attention should be paid to row-at-a-time cursor processing within a stored procedure, as that activity will not be parallelized.
3.1. Standard vs External Stored Procedures
External stored procedures are new in Teradata Database V2R6. External stored
procedures are similar to the standard SQL-based variety, in that they are called
from and execute within the database, they support parameters, and while only one
may be called by any given session at a time, a stored procedure can call another
stored procedure. Either variety may be invoked by a trigger, as a result of an action on a table.
3.1.1. Differences
However, there are some clear differences between the SQL-based and external
varieties. The following list shows some of the ways external stored procedures
are different:
• Implemented in C or C++ code
• Can issue SQL by invoking an SQL-based stored procedure
• Cannot be invoked from another external stored procedure
• Can perform external I/O, such as reading or writing from a file, a message queue, or an EAI bus
• Unrestricted access to the operating system and library functions
3.1.2. Benefits of External Stored Procedures
The essential value of external stored procedures for event processing is apparent
in at least two contexts:
• External communications
• Leveraging C programs and library functions already in existence
In considering communications outside of the Teradata database, reading and writing to an external queue, such as WebSphere MQ (referred to here as MQ), can be
extremely powerful for processing events. Speaking to the advantage of the first
bullet in the differences list above, some analytic algorithms are more easily expressed in C than in SQL. External stored procedures allow those analyses to be
performed under the control of the database.
3.1.3. When to Consider an External Stored Procedure
There are several good reasons to consider an external stored procedure:
• Processing close to the data is a benefit because you have access to other associated data, if needed, and do not incur the overhead of pulling data out in a raw form that may be more cumbersome than its final form.
• Teradata offers sound reliability and availability advantages, and doing all the work in one place eliminates concerns about having more than one system up and running to get the work accomplished.
• Using the Teradata infrastructure, you can easily scale out the processing if you need to, increasing the number of instances of the same stored procedure that you invoke as demand increases. This scale-out can even be automated, depending on the time of day or day of week.
• The signature (input and output parameters) is maintained in the Teradata dictionary.
3.2. External Stored Procedure Examples
To illustrate a simple external stored procedure, a prototype was built with such a
stored procedure reading a queue table, then writing the contents to an MQ queue
outside of Teradata. Detailed code from the prototypes discussed in these chapters will be posted, when they are mature, on the Tech Center site on Teradata.com.
http://www.teradata.com/t/page/118769/index.html
The stored procedure is started up and calls an SQL-based stored procedure. This
SQL-based stored procedure will block using a SELECT AND CONSUME until a
row has been inserted into the queue table. As soon as a row is available, a destructive read of that queue table row occurs, and the SQL-based stored procedure
continues execution, returning the message to the external stored procedure. That
queue table row is no longer available for any other transaction to read.
As soon as the queue table row is read, the external stored procedure does a put
to the queue, using the Teradata WebSphere MQ access module. The stored
procedure then loops back to attempt another read from the queue table, and will
wait until a new row has been inserted, if needed.
(Figure: inside Teradata, a trigger on a Claims table inserts rows into a queue table; the GetMsg stored procedure performs a SELECT AND CONSUME and returns the message to the WriteMQ external stored procedure, which puts it on an MQ queue outside of Teradata.)
Figure 8: External stored procedure writes to a queue outside of Teradata
One clear advantage of this approach is that rows inserted into the queue table can
be immediately put to an external queue outside of Teradata. A second advantage
is that a single MQ Connect/Open can be amortized over many Get/PUTs.
The external stored procedure call used in this prototype looks like this:
call WriteMQ('queue.manager.1'
,'QUEUE1'
,'CHANNEL1/TCP/153.64.119.177'
,'rmh.getmsg'  -- name of SP to call--this one consumes a Queue table
,nummsgs);
[Prototype Example #1]
The parameters being passed with the WriteMQ stored procedure are:
• Queue manager name
• Queue name
• Client communication channel
• The name of a second stored procedure that is SQL-based
This SQL-based stored procedure ‘getmsg’ is called within the external stored procedure. It performs the SQL that reads the queue table, a row at a time. This
SQL-based stored procedure code looks like this:
replace procedure rmh.getmsg(Out msg varChar(32000))
Begin
sel and consume top 1 MessageBody into :msg
from rmh.mqmsg;
End;
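For reference, the external stored procedure itself is created with DDL of roughly the following shape. This is a sketch only; the parameter sizes and the EXTERNAL NAME string (which points at the C source and would also name the MQ include files and libraries) are assumptions rather than the prototype's actual definition:

REPLACE PROCEDURE WriteMQ (
   IN  qmgr    VARCHAR(256),
   IN  qnm     VARCHAR(256),
   IN  channel VARCHAR(256),
   IN  spname  VARCHAR(256),
   OUT nummsgs INTEGER)
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'F:writemq:SS:writemq:/home/rmh/projects/writemq/writemq.c';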
3.3. A Simple Work Dispatcher Example
To stimulate thinking, here is an example of the kinds of things that can be built
upon the basic event infrastructure inside of Teradata. In the following prototype,
the database is being deployed to schedule the event and fan it out. Several external stored procedures are initiated and controlled by an event infrastructure built
by the implementer.
Key to this prototype is a stored procedure that acts as a manager (SPMGR). This
stored procedure reads from a single queue table and starts off other stored procedures to accomplish specific tasks. Triggers from different tables in the database
can place rows in this queue table. Each row represents a command to be executed on behalf of an event, and carries three columns:
• A logon string
• A command indicator
• SQL syntax to call a specific stored procedure
This queue table looks like a task list of events to be processed. SPMGR, which
reads the queue table, is an external stored procedure. For each row it reads, it
dispatches new work by logging on a new session (via CLI) to Teradata. Each of
these sessions executes a stored procedure using the logon string and the SQL
syntax contained in the queue table row that was just read. Multiple such sessions
can be held open at the same time.
The queue table decouples the trigger and the processing of the event, preventing
the original transaction from being held up.
(Figure: triggers on base tables insert rows into a queue table; SPMGR selects/consumes each row and uses asynchronous CLI calls to spawn worker stored procedures such as SP1 (table function reading from MQ), SP2 (XML shredding), and SP3 (single-row scoring UDF); the workers register themselves in a shared control table.)
Figure 9: A Spawned Stored Procedure Architecture
All stored procedures that are spawned from SPMGR use a common method of
logging and accept control commands. A shared command table makes these
control commands available to all active sessions. Each running stored procedure
places an entry in the control table when it first begins processing, and removes the row when it completes. Each procedure periodically reads the control table for new directives; for example, there may be a directive asking it to shut down. Each stored procedure is also responsible for logging to a common set of log tables (not
shown here).
This command table offers reliability and recoverability. If one of the stored procedures fails, the command table will show that the job never completed.
This design is scalable to meet increasing demand. Multiple SPMGR instances
can be reading from the same queue table, each spawning its own set of worker
stored procedures. All spawned stored procedures, no matter what their point of
origin, will report in to the command table by inserting and deleting rows.
The syntax to create the queue table used in this prototype follows:
CREATE MULTISET TABLE RDG.spmgrq ,QUEUE ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
InTS TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
RwID INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE -2147483647
MAXVALUE 2147483647
CYCLE),
LogonStr VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
Command VARCHAR(32) CHARACTER SET LATIN NOT CASESPECIFIC,
spCall VARCHAR(1024) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( RwID );
[Prototype Example #2]
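To illustrate what a task-list row looks like, an insert such as the following is what a trigger or application might place on the queue. The logon string, the command value, and the stored procedure being called are all hypothetical; InTS and RwID take their default and generated values:

INSERT INTO RDG.spmgrq (LogonStr, Command, spCall)
VALUES ('tdpid/workeruser,workerpwd',                -- logon string for the spawned CLI session
        'RUN',                                       -- command indicator
        'CALL rdg.ShredClaimXML(''CLAIM'', 1001);'   -- SQL syntax to call a specific stored procedure
       );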
4. User Defined Functions -- Scalar
User Defined Functions (UDFs) are database objects that the implementer either
builds or acquires, that can extend the capability of normal SQL within the database. A UDF is similar to the standard SQL functions such as SQRT, ABS, or
TRIM and is invoked in exactly the same manner.
UDFs execute in parallel within the database. However, the developer of the function can direct which AMPs will participate and which AMPs won’t. UDFs may be
written in C or C++, and are then compiled into shared objects in UNIX, or into dynamic link libraries (DLLs) in Windows.
Once compiled, the UDF can then be referenced in SQL statements for activities
such as enforcing business rules or aiding in the transformation of data. Samples
of User Defined Functions can be found at the Tech Center site on Teradata.com:
http://www.teradata.com/t/page/118769/index.html
There are 3 types of UDFs:
• Scalar, used like a column and operates on the values of a single row
• Aggregate, returns a result (such as a MAX or a SUM) from a pass over a group
• Table Function, appears in the FROM clause and returns a table, a row at a time
This chapter will explore scalar UDFs, and table function UDFs will be addressed in
the next chapter. Aggregate UDFs will not be addressed in this Orange Book. Information on implementing UDFs can be found in the Orange Book titled “Teradata
Database User Defined Function User's Guide," authored by Mike Watzke, August,
2003.
In Teradata, UDFs execute under the control of the AMP and can be very efficient
doing row-by-row complex analyses. They are scalable and inherit all of Teradata's natural parallelism.
4.1. Protected vs Nonprotected
When you create a UDF, the mode for that UDF will be the default of “protected”.
Protected means the UDF runs in its own address space, and is isolated from other
AMP work. If you are running in protected mode and a hardware or software fault
occurs, the user is notified and the database does not restart, and any required
cleanup is done. If you are running in non-protected mode and are holding resources such as memory, and the UDF aborts or a fault occurs, the resources may
not be cleaned up.
When a UDF runs in protected mode, it runs as user “tdatuser” which is established
when the database is installed. This is a generic user with no special privileges beyond those of any ordinary user on the system. UDFs running in protected
mode use a separate process set up for that purpose, rather than using AMP
worker tasks. These processes are referred to as protected mode servers. See
Section 7.2.1 for security considerations.
Depending on where your default has been set, there will be a limit of from zero to
a maximum of 20 protected mode servers available at any one time. Each protected mode server requires 256 KB of file space on the system disk. If 20 per
vproc is the default setting you use, and you have 8 vprocs per node, then 8 x 20 x
256 KB = 40 megabytes of system disk space will be required. Note that the performance in protected mode will be somewhat slower.
Although protected mode has some limitations, in order to allow a UDF to do I/O
safely and not interfere with the database, it is recommended that such UDFs run
in protected mode.
When not running in protected mode, the UDF will run in the context of the AMP
worker task already in use by that query step. No additional AWT overhead is involved.
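Changing a UDF's protection mode is a dictionary operation; a sketch, using a hypothetical function name:

-- Develop and debug in the default protected mode, then switch a stable,
-- non-I/O UDF out of protected mode for better performance:
ALTER FUNCTION ScoreRisk EXECUTE NOT PROTECTED;

-- Switch it back if it will perform external I/O:
ALTER FUNCTION ScoreRisk EXECUTE PROTECTED;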
4.2. Opportunities for Scalar UDFs in Event Processing
Because they centralize control over specific actions and are highly flexible, UDFs
are ideal for managing events and standardizing operations inside of Teradata.
Some of the special things that can be done using UDFs are:
• Transformations and text manipulation, such as XML to non-XML text, or converting a picture into a thumbnail
• Analytics, such as scoring of a predictive model or performing risk assessment
• External I/O, such as talking to other EAI systems, getting external data, or talking to queries that run outside of Teradata
The following three sections provide illustrations of scalar UDFs supporting event
processing within Teradata.
4.2.1. Processing XML Documents
One example where scalar UDFs are useful is scanning an XML document and returning specified content, after that document has been stored inside the database.
The following example is of a UDF that uses XPath, which is a set of syntax rules
that allow you to navigate an XML document. XPath, which has a function similar
to substring, uses path expressions to identify nodes in an XML document.
(Figure: a Teradata client query invokes the XPathValue UDF against the OrderLog table, where each order key's XML document is stored as a CLOB.)
Figure 10: A UDF is used to parse and process an XML document stored as a CLOB
Depending on your requirements, the XML document could be stored as a CLOB
(Character Large Object) or as a varchar column. The former is illustrated in the
graphic above, while the following prototype uses the latter.
In this example below, the XML document is stored inside Teradata as one varchar
column, XMLOrder. The base table, OrderLog, only contains two columns,
PONum and the varchar column. Here are two sample XML documents, one per row:
<?xml version="1.0"?>
<ROOT>
<ORDER>
<DATE>8/22/2004</DATE>
<PO_NUMBER>101</PO_NUMBER>
<BILLTO>Mike</BILLTO>
<ITEMS>
<ITEM>
<PARTNUM>101</PARTNUM>
<DESC>Partners Conference Ticket</DESC>
<USPRICE>1200.00</USPRICE>
</ITEM>
<ITEM>
<PARTNUM>147</PARTNUM>
<DESC>V2R5.1 UDF Programming</DESC>
<USPRICE>28.95</USPRICE>
</ITEM>
</ITEMS>
</ORDER>
</ROOT>
<?xml version="1.0"?>
<ROOT>
<ORDER>
<DATE>08/12/2004</DATE>
<PO_NUMBER>108</PO_NUMBER>
<BILLTO>Rick</BILLTO>
<ITEMS>
<ITEM>
<PARTNUM>101</PARTNUM>
<DESC>Partners Conference Ticket</DESC>
<USPRICE>1200.00</USPRICE>
</ITEM>
<ITEM>
<PARTNUM>148</PARTNUM>
<DESC>V2R5.1 Stored Procedures and Embedded SQL</DESC>
<USPRICE>28.95</USPRICE>
</ITEM>
</ITEMS>
</ORDER>
</ROOT>
The Orderlog table was constructed to look like this:
CREATE SET TABLE orderlog ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
PONum INTEGER NOT NULL,
XMLOrder VARCHAR(63000) )
UNIQUE PRIMARY INDEX ( PONum );
The following SQL references the XPathValue UDF that uses XPath to pick out
element and attribute content from the XML document. The arguments passed
within the SQL (the BILLTO name, for example) are then used by XPath to search
each document in the table. When a document with that specific billing name is
identified, then the associated PO number and date are returned, as output arguments.
select XPathValue(O.xmlOrder, '//ORDER/PO_NUMBER/*') as PO_Number,
XPathValue(O.xmlOrder, '//ORDER/DATE/*') as theDate
from OrderLog O
where XPathValue(O.xmlOrder,'//ORDER/BILLTO/*') = 'Mike';
[Prototype Example #3]
And the output of the query that uses XPathValue UDF looks like this:
PO_Number  TheDate
---------  ---------
101        8/22/2004
4.2.2. Analytics
Scalar UDFs can support on-the-spot analysis or predictive modeling, at the time of
an event instead of batching up predictions-to-be to process during off-hours. Or
the same scalar UDF can be used in a batch mode. Several different input parameters are fed into a set of algorithms that perform analysis on them and output
a conclusion. This could be a score, if the algorithms are set up appropriately, representing the likelihood that a given customer or client will do something that is
good for the business, like book a trip, take out a loan, or make a particular purchase.
In the example below, a UDF named ‘Strategy’ comes up with a recommendation
for an appropriate financial strategy ('Aggressive', 'Moderate', 'Conservative', etc.).
The same UDF could be used for a single client, or for all clients. This scalar UDF
encapsulates a simple decision tree analytic, based on data contained in columns
from a table in the Teradata database, in this case SavingsPlanCustomers, and returns a single value, the recommended financial strategy.
When executed in the batch mode, the output from the UDF execution is inserted
into a base table. But the same UDF could be used by a call center query to return
a financial strategy recommendation for just one client. This would require the addition of a client ID equality condition in the WHERE clause. SQL for the batch approach might look like this:
Insert into StrategyRecommendation
Select ClientID,
Strategy(SPC.age, SPC.balance, SPC.contribution, SPC.income)
From SavingsPlanCustomers SPC;
[Prototype Example #4]
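For the single-client call-center case mentioned above, the same UDF would simply be qualified by the client's primary index value; a sketch, with the literal client ID being illustrative:

Select SPC.ClientID,
       Strategy(SPC.age, SPC.balance, SPC.contribution, SPC.income)
From SavingsPlanCustomers SPC
Where SPC.ClientID = 123456;   -- equality condition on the primary index: single-AMP access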
4.2.3. External I/O
In a third example, a scalar UDF has been created that writes a message to an external queue. This prototype is similar to the approach presented in Section 3.2.
But instead of writing a message externally after reading from a queue table and
calling an external stored procedure, in this example a scalar UDF is used.
The UDF calls a Teradata WebSphere MQ access module, identical to the access
module a TPump job or other utility might use when processing from a queue.
The SQL is a simple select that contains nothing in the select list but the scalar
UDF and the arguments it expects. Executing this SQL results in one message,
‘Hello World,’ being placed on an MQ queue on the client:
Select WriteMQ('queue.manager.1','QUEUE1','CHANNEL1/TCP/153.64.119.177',
'Hello World');
[Prototype Example #5]
The parameters being passed are the queue manager, queue name, client communication channel, and the content of the message.
The following is the DDL used to replace or create the function:
replace function WriteMQ(
qmgr varchar(256), qnm varchar(256), channel varchar(256), vcmsg varchar(32000))
returns integer
language C NO SQL parameter style sql EXTERNAL NAME
'F:emruwmq:SI:cmqc:/usr/include/cmqc.h:SL:mqic:SL:mqmcs:SS:emruwmq:/home/rmh/projects/emruwmq/emruwmq.c';
In a broader use of the same UDF, data dictionary information within the Teradata
database is being accessed and written to the external queue. The UDF is invoked for each row found in the DBC.Tables table that meets the requirements
specified in the SQL where clause. The database name and table name are concatenated as a varchar input argument to the UDF that will then write that as a
message to the MQ queue.
select count(*) as SentMsgs
from
(select WriteMQ
('queue.manager.1','QUEUE1','CHANNEL1/TCP/153.64.119.177',
Trim(databasename)||'.'||trim(TableName)) as c1
from dbc.Tables
Where TableKind = 'T')T;
[Prototype Example #6]
What the above example illustrates is the ease of sending an entire result set of an
arbitrary SQL statement to a queue outside of Teradata.
5. User Defined Functions -- Table
In contrast to scalar functions, discussed in the previous chapter, which return a
single value, table functions are used in the FROM clause and return a set of rows.
When present, a table function can be thought of as a derived table whose rows
are produced by the UDF itself.
5.1. How Table Functions Work
Table functions are sent to the AMPs at execution time. Each AMP calls the function repeatedly, one time for each row being produced, until the function signals
there is no more work to be done on that AMP.
A table function input argument may pass values that will determine what will be
processed, and optionally control which AMPs will be active doing it. As the table
function is called repetitively on the participating AMPs, each AMP builds up a
spool file that contains the rows produced by its instance of the table function.
The input arguments will determine if the table function is called in constant or
varying mode:
• Constant Mode: If the input arguments use a constant expression, and there are no correlated columns, then the table function will be sent to all AMPs. The table function can determine which AMPs actually produce rows.
• Varying Mode: If the input arguments refer to a correlated base table column (which will vary in value for each different base table row accessed), then the AMPs that have rows pertaining to the input data provided will participate.
For example, consider a query that invokes a table function and also accesses selected rows using a single UPI value for a base table. Although the table function is defined on all AMPs, because of the WHERE clause that references the base table's primary index column(s), activity will occur on only one AMP on behalf of the table function: the AMP where the base table row(s) are located.
Select B.Rate, B.Degree
From AltClaims A,
Table(FuncGetRate(A.Diagdata)) B
Where A.ClaimID = 6;
(Figure: the rows of AltClaims are spread across AMP1 through AMP4 by ClaimID; because the query selects the single UPI value ClaimID = 6, the table function executes only on the AMP holding that row.)
Figure 11: One UPI value is selected, therefore one AMP executes the UDF
When variable input arguments are passed, table functions are only active on
AMPs where correlated data exists. In the example illustrated in the graphic
above, the table function will only be called on AMP2. Only 1 row will be returned
by the table function because of the UPI access into the base table. It is up to the
table function to determine the number of rows it wants to generate on that AMP.
In the case where the table function returns multiple rows, a spool file will be created to hold these rows as they are created, just as would be the case during a full
table scan. Because of the presence of the spool file, you will need to include a
WHERE clause to control the join of the spool and the base table. Usually this join
constraint will be between the primary index of the base table and a related column
in the spool. Including this join constraint will avoid a Cartesian product between
the two.
When you define a table function, you will use the CREATE FUNCTION syntax, but
one of the additional parameters will be a RETURNS TABLE clause. This labels the UDF as a table function, specifying that a table consisting of a set of rows will be returned. As part of that clause, a list of column names and data types (with optional character sets) is included to describe the columns that will be returned and how they can be referenced.
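As a sketch of that DDL shape, using the FuncGetRate function from the example above (its column types and the EXTERNAL NAME string are assumptions, not the prototype's actual definition):

CREATE FUNCTION FuncGetRate (Diagdata VARCHAR(2000))
RETURNS TABLE (Rate   DECIMAL(9,2),
               Degree VARCHAR(20))
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'F:getrate:SS:getrate:/home/rmh/projects/getrate/getrate.c';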
5.2. Table Functions with Transformations and Text Manipulation
A table function could produce rows solely from the input arguments. For example,
an input argument could be a reference to a Character Large Object (CLOB) that
contains XML text. From that CLOB it could parse the XML text and output a set of
SQL rows.
In this example, the XPath expression selects the parent of the multiple-occurring element ITEM. For each ITEM, the UDF returns the text content of its three child elements (part number, price, and description).
SELECT L.var1 as Partnum, L.var2 as Price, L.var3 as Desc
FROM (SELECT xmlOrder, poNum FROM OrderLog) as O,
TABLE( XPathValues(O.poNum, O.xmlOrder,'/ORDER/ITEMS') ) AS L
(poNum,var1,var2,var3, ...)
where O.poNum = L.poNum;
The output from that SQL looks like this:
Partnum  Price    Desc
-------  -------  -----------------------------------------
101      1200.00  Partners Conference Ticket
147        28.95  V2R5.1 UDF Programming
101      1200.00  Partners Conference Ticket
148        28.95  V2R5.1 Stored Procedures and Embedded SQL
[Prototype Example #7]
5.3. Table Functions with Analysis
Building on the scalar UDF example in Section 4.2.2, it is possible to re-create the
Strategy UDF as a table function. This would make for a more complex UDF,
which would return a set of rows, rather than just one value.
The SQL that invokes the table function might look like the following. The table
function provides significantly more detail than the simple UDF, as can be seen by
the columns in the request’s select list.
Insert into StrategyRecommendation
Select ClientID
,ST.Strategy
,ST.Percent_CD
,ST.CD_Return
,ST.Percent_Bonds
,ST.BondAvgReturn
,ST.Percent_Mutual
,ST.MAvg5YrReturn
From SavingsPlanCustomers SPC
,Table(Strategy(SPC.clientID,SPC.age,SPC.balance,SPC.contribution,
SPC.income)) ST
Where SPC.ClientID = ST.ClientID
[Prototype Example #8]
This table function is operating in varying mode and would engage all AMPs in the system because all rows from the SavingsPlanCustomers table are being read without selection criteria. If only one client were selected, by means of an equality condition on the primary index ClientID, then only a single AMP would be executing the table function.
In addition, notice that there is a join constraint between the base table SavingsPlanCustomers and the output from the table function. This join back on ClientID prevents a Cartesian product join from being performed between the table
and the spool, and ensures that only one row per Client is inserted into StrategyRecommendation.
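The prototype does not include the DDL for the Strategy table function itself. A plausible sketch follows, with the output column names taken from the select list above; the data types, the parameter types, and the abbreviated EXTERNAL clause are assumptions made for illustration:

REPLACE FUNCTION Strategy
  (clientID INTEGER, age INTEGER, balance DECIMAL(15,2),
   contribution DECIMAL(15,2), income DECIMAL(15,2))
RETURNS TABLE
  (ClientID       INTEGER,
   Strategy       VARCHAR(30)  CHARACTER SET LATIN,
   Percent_CD     DECIMAL(5,2),
   CD_Return      DECIMAL(5,2),
   Percent_Bonds  DECIMAL(5,2),
   BondAvgReturn  DECIMAL(5,2),
   Percent_Mutual DECIMAL(5,2),
   MAvg5YrReturn  DECIMAL(5,2))
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
NOT DETERMINISTIC
CALLED ON NULL INPUT
EXTERNAL;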
5.4. Table Functions that Generate Data
This section will discuss a data generation table function that illustrates several interesting things:
1. Using standard libraries, in this case an established access module
2. Generating data within the database
3. Being able to control the degree of parallelism doing the work
4. Understanding database resource capacity
5.4.1. Using Standard Libraries
An existing access module, previously used by standard Teradata utilities, is being
called in this prototype. The same access module had been used to generate data
with TPump and could have been used with either MultiLoad or FastLoad.
Because this was a simple prototype, the access module was designed to generate
only a single string of data. The parameters for the access module are contained
in the access module’s initstring. This initstring looks like this:
'-roww 100 -f unformat'
This initstring specifies that the row width will be 100 bytes and that the format type
is unformatted.
A simple invocation of the table function in a select statement without a where clause will cause the table function to be executed repeatedly on each AMP in parallel. Each repetition of the table function causes the access module to be called.
The arguments associated with the table function, which is named ‘emrcamrg,’
control the number of times the access module is called on each AMP.
Here is the select statement used in the prototype. It references the table function,
passes arguments for the table function, points to the location of the access module, and passes parameters to the access module.
select *
from table (emrcamrg
(12000, 5000, 1, '/home/rmh/bin/libamrgenu.so', '-roww 100 -f unformat'));
[Prototype Example #9]
The first parameter in the parenthetical expression controls the maximum number of milliseconds (12,000, the equivalent of 12 seconds) the table function will be allowed to execute. The second parameter states the maximum number of rows each AMP will produce. Whichever limit is reached first (maximum seconds or maximum rows) will be the controlling factor in the execution of this particular table function.
5.4.2. Generating Data
What was shown in the previous section was a highly efficient method of producing
a simple unformatted string of data by means of invoking a table function that
called a simple access module. Other access modules could be set up to produce
data with specific demographics and of greater complexity.
Some results were recorded from executing the above SQL. Executing this table
function in protected mode on an older generation of hardware produced data at a
rate greater than 10,000 rows per second per node. In unprotected mode, near-FastLoad rates were achieved, approaching 100,000 rows per second per node. In
contrast, using TPump with the same access module to produce the same data
produced 800 rows per second per node. The table function provided orders of
magnitude better performance, and with no client resources involved.
5.4.3. Controlling the Degree of Parallelism
In this variation of the same prototype, the same table function, ‘emrcamrg’, generates data which is immediately written to a base table, ‘udftarget’, by means of an
insert/select statement.
This example’s somewhat more complex SQL contains a convention that lets the
user control how many and which AMPs will be executing the table function. Limiting the number of participating AMPs controls the level of resources applied to the
work that the UDF is performing.
To understand how the degree of parallelism is managed, first look at the request that contains the UDF, particularly the input arguments to the table function.
insert udftarget
select ampid,seq,passthruo,themessage
from (select pivalue, ampid from rdg.allamp where ampid <4) A
,table (emrcamrg
(12000, 5000, a.pivalue, '/home/rmh/bin/libamrgenu.so', '-roww 100
-f unformat')) T
where a.pivalue = T.passthruo ;
[Prototype Example #10]
The third position in the input argument list is "a.pivalue", which is a correlated reference to a column in a table named "allamp", which the query reads and joins to the table function. Because of the presence of this correlation, we know that the table function is in varying mode, and that only the AMPs that have rows pertaining to this variable will execute the table function.
Figure 12: Rows selected from the Allamp table control which AMPs do the work
The definition of the parameters that are passed to the table function follows:
• 12000: a time limit (12,000 ms, or 12 seconds)
• 5000: the maximum number of rows to return per AMP
• A.pivalue: a variable that correlates to the primary index of the allamp table
• '/home/rmh/bin/libamrgenu.so': the path to the data generation access module
The output generated by the table function, using the allamp table for guidance, is inserted into a target table. The column 'themessage' is where the single generated string of data resides. Here is the layout of that target table:
CREATE MULTISET TABLE RMH.udftarget ,NO FALLBACK ,
(AmpID INTEGER,
seq INTEGER,
passthruo INTEGER,
themessage VARCHAR(32000))
PRIMARY INDEX ( AmpID ,seq );
The other columns in the above target table serve this purpose:
• AmpID represents the AMP that was the source of this row; its value originates from the allamp table (described further below).
• Seq is a sequence number of each individual row produced on that AMP.
• Passthruo is an output argument returned from the table function that matches the primary index value of the associated allamp table row (described below).
Because a variable (a.pivalue) has been included in the input arguments of the table function, only a subset of the AMPs will invoke the UDF. AMPs that own rows reflecting the primary index values contained in the pivalue column will do work; the others will not. Because of the selection criteria coded in the query's access of the allamp table (select pivalue, ampid from rdg.allamp where ampid <4), we can assume that only the AMPs that hold rows with an ampid value of 0, 1, 2, and 3 will be invoking the table function in this query. This where clause could have selected ampid values less than 2 and caused 2 AMPs to execute the UDF, or values less than 7 and engaged 7 AMPs.
Consequently the base table called 'allamp' acts as a control mechanism over how many AMPs are active in this example. Here's how that works:
• The allamp table has been defined in such a way as to have only one row on each AMP.
• Values for the primary index column (pivalue) were intentionally chosen such that each row of the allamp table hashes to a different AMP.
• The associated AmpID column value carries an AMP identifier.
To make this more understandable, below are the first 8 rows of the allamp table, sorted by ascending ampID. The numbers in the PIvalue column, the table's primary index, were selected because they each hash to a different AMP. Values in the allamp table rows were carefully selected by the implementer to display these controlled characteristics.
CREATE MULTISET TABLE rdg.allamp ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(PIValue INTEGER,
AmpID INTEGER)
PRIMARY INDEX ( PIValue );
PIValue  AmpID
      2      0
     30      1
      3      2
     12      3
      7      4
      4      5
     24      6
     13      7
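One way to choose, or to verify, PIValue numbers with this property is to use Teradata's standard hashing functions. A brief verification sketch, assuming the allamp table above, follows:

/* Report which AMP each allamp row hashes to; HashedToAmp
   should match the AmpID recorded in the row. */
SELECT PIValue
      ,AmpID
      ,HASHAMP(HASHBUCKET(HASHROW(PIValue))) AS HashedToAmp
FROM rdg.allamp
ORDER BY AmpID;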
In Prototype Example 10, which executes the table function above, note that the insert/select has a WHERE clause that joins the table function with those 4 selected rows of the allamp table. The join constraint 'where a.pivalue = T.passthruo' is added to the query to prevent an unconstrained product join between the 4 rows in the allamp spool and the rows being generated by the table function, each of which carries the pivalue of the AMP where it originated.
Figure 13: The table function's output is similar to a derived table, and will be joined to any other tables in the query, with or without a join constraint between them
Without that WHERE clause, the result set would have contained 80,000 rows (4
allamp rows x 20,000 table function rows), rather than the specified 20,000 (4
AMPs producing 5000 rows each). If the unconstrained product join were to happen, each row of generated data would appear 4 times in the result set.
In order to create a UDF and make it available for use, both a compiled C or C++
code module and a data dictionary definition are required. The DDL to define this
table function within the Teradata data dictionary follows:
REPLACE FUNCTION RMH.EMRCAMRG
(maxtime INTEGER, maxrows INTEGER, passthrui INTEGER,
axsmodpath VARCHAR(255) CHARACTER SET LATIN,
initstr VARCHAR(256) CHARACTER SET LATIN)
RETURNS TABLE
(seq INTEGER, passthruo INTEGER,
themessage VARCHAR(32000) CHARACTER SET LATIN)
SPECIFIC emrcamrg
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
NOT DETERMINISTIC
CALLED ON NULL INPUT
EXTERNAL NAME
'F:emrcamrg:SI:pmddamti:/home/rmh/projects/inc/pmddamti.h:SS:emrcamrg:/home/
rmh/projects/emrcamrg/emrcamrg.c'
The RETURNS TABLE clause describes the output of the table function. The EXTERNAL NAME clause identifies the source and include files from which the function is built; the resulting module is brought into memory for execution.
5.5. Table Functions with External I/O
This section will offer prototypes illustrating external I/O being performed within a
table function, including 1) reading from an external queue, and 2) accessing data
from a different Teradata platform.
A third reason you might want to use table functions to perform external I/O is if
you need to pull in snippets of highly volatile real-time facts. While no prototype is
included to illustrate this, this approach is worth a brief comment.
Some phenomena change so fast that the benefit of capturing them and loading them into the data warehouse becomes questionable. Global Positioning System (GPS) data, for example, can reflect the precise location of every vehicle on the nation's highways at any point in time. Weather readings around the world may be interesting information, but are in constant flux. Stock market quotes rise and fall perpetually.
Does your data warehouse need all of this ever-changing data? Perhaps. Or perhaps it needs it eventually, but not all of it right now. If only particular details provide value, or if you need only a handful of them at the moment they come into being, table functions offer the interesting alternative of pulling just the pieces of very unsettled data you actually need from the external world, on an as-needed basis.
5.5.1. Reading from a Queue
In the last prototype example we illustrated generating data from an access module
invoked by a table function. Now we are going to use a table function that calls an
access module that reads from MQ.
In the earlier example labeled Prototype #4, a scalar UDF was making one call to
the access module in order to put one message on the queue. In this example a
table function is using the same access module as Prototype #4 to read multiple
messages from the queue.
This prototype example also uses the same allamp table that was presented previously in the discussion of generating data using UDFs found in Section 5.4.3. Just
as before, the allamp table is used to control how much parallelism will support this
read effort. In this case only one AMP will be reading from the queue. In a large
system, it may be desirable to limit the number of AMPs that participate in a table
function, in order to minimize the impact on the overall system. You also may want
to control the rate that data is being fed into the queue, and reducing AMP involvement gives you a lever for that purpose as well.
Rather than generating data and writing to a base table as Prototype #9 did, the query illustrated here selects messages ('TheMessage') that represent the data that was passed in the queue.
Select TheMessage
from (select pivalue, ampid from rdg.allamp where ampid <1) A
,Table (emrcamrq (2000,1,a.pivalue,
'/home/rmh/bin/libmqsc.so', '-qmgr queue.manager.1 -qnm QUEUE1',
'CHANNEL1/TCP/153.64.119.177')) mq
where a.pivalue = mq.passthruo
[Prototype Example #11]
The explain text that is associated with the request illustrates how the AllAMP table
drives the database activity.
1) First, we lock a distinct rdg."pseudo table" for read on a RowHash
to prevent global deadlock for rdg.allamp.
2) Next, we lock rdg.allamp for read.
3) We do an all-AMPs RETRIEVE step from rdg.allamp by way of an
all-rows scan with a condition of ("rdg.allamp.AmpID < 1") into
Spool 1 (all_amps), which is built locally on the AMPs. The size
of Spool 1 is estimated with no confidence to be 7 rows. The
estimated time for this step is 0.03 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 1 by way of an all-rows
scan executing table function RMH.emrcamrq into Spool 2 (all_amps),
which is built locally on the AMPs. The size of Spool 2 is
estimated with no confidence to be 7 rows. The estimated time for
this step is 0.04 seconds.
5) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
an all-rows scan into Spool 4 (all_amps), which is redistributed
by hash code to all AMPs. The size of Spool 4 is estimated with
no confidence to be 7 rows. The estimated time for this step is
0.02 seconds.
6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
all-rows scan, which is joined to Spool 1 (Last Use) by way of an
all-rows scan. Spool 4 and Spool 1 are joined using a single
partition hash join, with a join condition of ("PIVALUE =
PASSTHRUO"). The result goes into Spool 3 (group_amps), which is
built locally on the AMPs. The size of Spool 3 is estimated with
no confidence to be 19 rows. The estimated time for this step is
0.05 seconds.
In a more sophisticated example from the same prototype, a table was set up prior to running the request with the table function. The table was designed to hold parameters, such as how many rows you intend for the function to process, and the initstring. That table can then be read as a derived table in the query that invokes the table function. This eliminates the need for each request to hard-code the arguments. Here's how that looks:
Select ampid
,seq
,passthruo
,themessage
From (Sel MaxTime ,MaxRows ,ReaderPIVal
,AxsmodPath
,InitStr
,Channel
,AmpId
From MQJobParms
Where AmpID < 8) prm
,Table (emrcamrq(prm.MaxTime,prm.MaxRows,prm.ReaderPIVal,
prm.AxsmodPath,prm.InitStr,prm.Channel)) mq
Where prm.ReaderPIVal = mq.PassThruO;
[Prototype Example #12]
In the above example, the MQJobParms table also controls the level of parallelism within Teradata that is applied to the work. In this case ReaderPIVal is a variable passed into the table function from the MQJobParms table rows. The value contained in the ReaderPIVal column is also represented as the output variable of the table function, named 'passthruo.' There are 8 values for the AmpID column, based on the WHERE clause within the derived table that accesses MQJobParms, which delivers 8 different ReaderPIVal values as input to the table function.
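The layout of MQJobParms is not shown in the prototype. A plausible definition follows, with column names taken from the query above and data types assumed to match the parameters of the emrcamrq table function:

CREATE MULTISET TABLE RMH.MQJobParms ,NO FALLBACK
  (MaxTime      INTEGER,
   MaxRows      INTEGER,
   ReaderPIVal  INTEGER,
   AxsmodPath   VARCHAR(255) CHARACTER SET LATIN,
   InitStr      VARCHAR(256) CHARACTER SET LATIN,
   Channel      VARCHAR(256) CHARACTER SET LATIN,
   AmpID        INTEGER)
PRIMARY INDEX ( ReaderPIVal );

As with the allamp table, the ReaderPIVal values would be chosen so that each row hashes to a different AMP, which is what lets the table control the degree of parallelism.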
5.5.2. Reading from a Remote Teradata System – Example 1
If you are running a Teradata dual active system, or have a second Teradata system for any reason, such as development, there may be times you would like to
pass data back and forth between the two platforms. For example, as shown in
this next prototype, it may be useful to query the dictionary tables from one system,
so they can be correlated to the other.
This SQL statement uses a table function (‘tdat’) that executes a query on a remote
Teradata system, and returns the answer set. In this example, it returns all the database names in the other system’s dictionary tables.
sel *
from table(rdg.tdat(2,1,'adw1/rdg,rdg'
,'sel databasename from dbc.databases'));
[Prototype Example #13]
The actual SQL executed on the second system is passed as a fixed input argument of the table function, as is the other system’s logon string.
The DDL to create the function follows:
REPLACE FUNCTION RDG.TDAT
(rowc INTEGER,
InLineNum INTEGER,
logonstr VARCHAR(50) CHARACTER SET LATIN,
sqlRqst VARCHAR(512) CHARACTER SET LATIN)
RETURNS TABLE
(ampId INTEGER,
cnt INTEGER,
OutLineNum INTEGER,
str1 VARCHAR(256) CHARACTER SET LATIN,
.
.
.
str20 VARCHAR(256) CHARACTER SET LATIN)
SPECIFIC tdat
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
NOT DETERMINISTIC
CALLED ON NULL INPUT
EXTERNAL NAME 'SS:tdat:/home/rdg/tdat/Tdat.c:SL:cliv2'
By creating a view across two Teradata systems you can compare dictionary content across platforms, and compare details such as table space or access rights. The view below simply compares the rows that appear in each system's DBC.Tables view.
create view allTables as
sel 'Local System' as system
,databasename
,tablename
,version
,tablekind
,protectionType
,JournalFlag
,CreatorName
,requesttext(varchar(100))
from dbc.tables
UNION
sel 'Remote System'
,str1 (char(30))
,str2 (char(30))
,str3 (Integer)
,str4 (char(1))
,str5 (char(1))
,str6 (char(2))
,str7 (char(30))
,str8 (varchar(100))
from table(rdg.tdat(2,1,'adw1/rdg,rdg'
,'sel databasename,tablename ,version,
tablekind,protectionType,JournalFlag,CreatorName,
requesttext(varchar(100))
from dbc.tables')) T;
A sampling of data returned from the above SQL, when ordered by tablename (for easy
cross-comparison), looks like this:
System         DatabaseName  TableName     Version  TableKind
Remote System  test          a             1        T
Local System   DBC           AccessRights  1        T
Remote System  DBC           AccessRights  1        T
Remote System  DBC           AllSpace      1        V
Local System   DBC           AllSpace      1        V
Local System   rdg           allamp        1        T
Remote System  test          allamp        1        T
5.5.3. Reading from a Remote Teradata System – Example 2
This prototype illustrates the case where the data of interest resides on a different
Teradata platform from which the query is executing. Table functions can provide
a quick way of moving data under such conditions. Perhaps the data has been offloaded to an older configuration because it is outdated, and rarely used. Or perhaps you wish to restore selected data that has been archived to a different Teradata platform. Or you may consider this when you need to access real time information where the cost of the occasional access is less than the cost of integrating
all the changes in real time.
In this prototype, System A holds rows of lineitems that are partitioned by day. System B executes a query that requires one or more partitions for processing. Only
the desired partitions are read, by means of a table function, and brought over to
System B.
In order to support this activity, a view has been created on System B that joins a
look-up table to the table function that accesses the PPI table on System A. The
look-up table is used to provide a logon string and the appropriate SQL that is required to pull the desired data off of System A. It has one row per partition in the
PPI table.
Figure 14: If the query requests 1 day, only 1 partition is returned by the table function
The table that holds the SQL looks like this:
CREATE SET TABLE RDG.lisql
(
l_shipdate DATE FORMAT 'YY/MM/DD',
passthru INTEGER,
logonstr VARCHAR(12) CHARACTER SET UNICODE NOT CASESPECIFIC,
sqltxt VARCHAR(452) CHARACTER SET UNICODE NOT CASESPECIFIC)
PRIMARY INDEX ( l_shipdate );
Three random rows from the lisql table follow, with the SQL abbreviated:
l_shipdate  passthru  logonstr      sqltxt
1998-09-13      2447  adw1/cab,cab  Select L_ORDERKEY. . . from ADW.liday
                                    where l_shipdate = '1998-09-13'
1992-04-10       100  adw1/cab,cab  Select L_ORDERKEY. . . from ADW.liday
                                    where l_shipdate = '1992-04-10'
1997-07-17      2024  adw1/cab,cab  Select L_ORDERKEY. . . from ADW.liday
                                    where l_shipdate = '1997-07-17'
When a user submits a query that accesses this lookup table, each date selected in the query will cause one row, a different row, in the table to be selected. For example, if the query had a WHERE clause that said "where l_shipdate between '1995-01-01' and '1995-01-03'", that would cause 3 rows from the lisql table to be selected. Each row has a logon string and a different SQL statement.
Figure 15: Each date selected causes one query to be executed on the remote system
When multiple dates are in the query, and as a result multiple rows are selected from the lisql lookup table, two things happen:
1. There will be one AMP on the local system working on behalf of the table function for each row accessed from the lisql table. This is an example of a correlated join when the table function is in varying mode.
2. Each of the local AMPs that is executing the table function will be sending one of the multiple SQL statements to the remote system, and receiving output back.
The view that joins the data from System A and the look-up table follows.
replace View RemoteLineitem as
Select str1 (Integer) as L_ORDERKEY,
str2 (Integer) as L_PARTKEY ,
str3 (Integer) as L_SUPPKEY ,
str4 (Integer) as L_LINENUMBER ,
str5 (DECIMAL(15,2)) as L_QUANTITY ,
str6 (DECIMAL(15,2)) as L_EXTENDEDPRICE,
str7 (DECIMAL(15,2)) as L_DISCOUNT,
str8 (DECIMAL(15,2)) as L_TAX,
str9 as L_RETURNFLAG ,
str10 as L_LINESTATUS ,
l.l_shipdate (FORMAT 'yyyy-mm-dd') ,
str12 (date) (FORMAT 'yyyy-mm-dd')as L_COMMITDATE ,
str13 (date) (FORMAT 'yyyy-mm-dd')as L_RECEIPTDATE ,
str14 as L_SHIPINSTRUCT ,
str15 as L_SHIPMODE ,
str16 as L_COMMENT
from (select * from lisql) l
,table(rdg.tdat(2,l.passthru,l.logonstr,l.sqltxt)) T
where l.passthru = t.outlinenum;
Because when multiple dates are selected, multiple queries, one per date, are
generated and sent to the remote system, and because these queries execute in
parallel on the remote system, better than linear performance can be achieved using this technique.
For example, compare the time to return one partition, consisting of one date, with
the time to return 7 partitions, consisting of one week’s worth of data.
SQL Issued by the User                   Number of    Response    Number     Rows per
                                         Partitions   Time        of Rows    Second
select * from RemoteLineitem where
l_shipdate = '1995-07-14'                1            27 seconds  124,905    4,626
select * from RemoteLineitem where
l_shipdate between '1995-08-14'
and '1995-08-20'                         7            54 seconds  872,936    16,165
[Prototype Example #14]
5.6. UDF Considerations
Some of the considerations when using UDFs include:
• UDFs may impact parallel efficiency on the platform, particularly when the UDF is executing on a subset of the total nodes and is resource-intensive. Such a UDF execution may lengthen the amount of time a query's step holds on to an AMP worker task on the AMPs supporting the UDF execution.
• If a UDF is running unprotected, the UDF will run in the context of the AMP worker task used by that query step. No additional AMP worker task will be required.
• Running in protected mode requires that a protected mode server be available, a resource that is limited based on an internal setting with a maximum of 20. For a UDF, the protected mode server is held only as long as the UDF executes. For a table function, the protected mode server is held for the duration of the query step. Be aware that expanding the number of protected mode servers will draw from system disk resources.
• UDF parameters are strongly typed. Because parameters are defined at compile time, you either need to account for any changes in the format of the data coming back yourself, or you will require a different UDF for each differently formatted set of rows. As an illustration of how to account for this, the Tdat UDF in Prototype Example #13 was defined with 20 generic varchar columns, so that up to 20 columns of any reasonable length can be returned using the UDF.
• There may be security ramifications in using UDFs, as they run as root on the node when unprotected. However, you can use Teradata access rights to control who can create and who can execute these functions.
• UDFs that access external data will need to consider the performance impact of consuming resources that are outside the Teradata platform, for example a WebSphere MQ server. Teradata tools that track and record resource usage, such as Database Query Log, AmpUsage, and ResUsage, will not be aware of this additional resource demand. In addition, resources used outside of Teradata will be outside the scope of Priority Scheduler.
6. Using Triggers in Event Strategies
A trigger is a set of actions that are run automatically when a specified change operation is performed on a given table. Triggers are a key event technology because they initiate the automation of business events directly inside the database.
In Teradata, triggers are implemented as part of a multi-statement request with the
statement that caused the trigger to fire. Triggers are bundled in with the initiating
data changes into a single unit of work.
Because of that tight bundling, triggers are incorporated into the same recovery
unit with the original statement that caused them to fire; if one action fails, both will
be rolled back. This provides a level of integrity that is not always in place among
other event components.
In Teradata Database V2R6, the action of a trigger can include more than SQL, as
described in the following section.
6.1. The Firing Statement
The actual execution of a trigger pushes notification of an event out, as described
by its firing statement. The firing statement is important because it is an action
based on something happening or something being identified as requiring further
action. The firing statement can initiate a chain of steps related to the handling of
an event.
There is greater flexibility in the firing statement of the trigger in V2R6. The firing
statement can now do all of these things:
• Execute SQL against objects within the Teradata database
• Insert into a queue table
• Call a stored procedure
• Invoke a UDF
In this Orange Book we will focus on the second, third and fourth options.
6.1.1. Triggering into Queue Tables
Because queue tables have the potential for passing things on, they make a natural second step in an event chain. Something interesting has happened that requires additional processing, and queue tables can make a convenient hand-off
point for the trigger. The trigger simply inserts into the queue table.
Section 2.1.3 describes a technique using queue tables to monitor the effectiveness of current promotions, as it is used in a recent active data warehouse benchmark. In the V2R6 version of this same benchmark, a trigger causes a row to be
written to a queue table named event02_QT each time a promotional product is inserted into the Mkt_Basket_Dtl base table. TPump is the load utility being used.
The syntax for the trigger that writes to the queue table follows:
replace trigger event02_trig
after insert on Mkt_Basket_Dtl
referencing new as n
for each row when (n.mbd_productkey in (8,13,24,35,46,52,67,78,83,98))
(insert into event02_QT
 values (n.mbd_orderkey
        ,n.mbd_productkey
        );
);
When a trigger inserts into a queue table, that activity belongs to the same transaction as the insert into the base table that owns the trigger. For example, if TPump inserts a row that is part of a special promotion, and this causes a trigger to fire, both the insert into the base table and the trigger are part of the same transaction. The physical transaction will end with the queue table insert, even though the logical business transaction will continue. The subsequent SELECT AND CONSUME of the queue table that will continue the life of that event will take place under the control of a different transaction. Using queue tables in this manner is an asynchronous activity that breaks the chain of the event into smaller, independent links.
Figure 16: Processing an event may involve multiple physical transactions, inside and outside of Teradata
Note that while you may insert into queue tables by means of a trigger that has
been built on a base table, in the current version of Teradata a trigger may not itself
be defined on top of a queue table.
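For reference, a sketch of a queue table and its consumer follows. The payload columns are a hypothetical reconstruction to match the trigger above; the defining characteristics are the QUEUE option and the Queue Insertion TimeStamp (QITS) column, which must be the first column and defaults to the insertion time:

CREATE MULTISET TABLE event02_QT ,NO FALLBACK ,QUEUE
  (qits           TIMESTAMP(6) NOT NULL
                  DEFAULT CURRENT_TIMESTAMP(6),
   mbd_orderkey   INTEGER,
   mbd_productkey INTEGER)
PRIMARY INDEX ( mbd_orderkey );

/* With this layout, the trigger's insert would name the payload
   columns so that qits takes its default value. The consuming
   transaction waits for a row, removes it from the queue, and
   returns it in first-in first-out order. */
SELECT AND CONSUME TOP 1 * FROM event02_QT;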
6.1.2. Triggers Invoking Stored Procedures or UDFs
Instead of the action of a trigger being an insert, as illustrated in the previous example, the action may be a call to a stored procedure or an invocation of a UDF.
In either case, the trigger and the stored procedure or UDF are part of the same
recovery unit. Considering just the stored procedure, if it were to fail, then both the
insert to the base table that caused the trigger to fire, and the trigger itself, would
be rolled back. However, it is important to note that the effect of the trigger might
not be completely rolled back, for example, if the trigger uses a UDF, external procedure or table function that causes some “external” action to occur. A trigger calling a stored procedure makes the event processing synchronous.
For example, if you consider the example of the scalar UDF that writes one message to an external queue file, presented in Section 4.2.3, that same SQL statement could be incorporated into the firing statement of a row trigger. Using our
previous trigger example from an active data warehouse benchmark, the syntax
might look like this:
replace trigger event02_trig
after insert on Mkt_Basket_Dtl
referencing new as n
for each row when (n.mbd_productkey in (8,13,24,35,46,52,67,78,83,98))
(Select emruwmq('queue.manager.1','QUEUE1','CHANNEL1/TCP/153.64.119.177',
  trim(n.mbd_productkey)||','||trim(n.mbd_quantity)););
The same action of writing to a queue could have been compiled into an external
stored procedure.
Be aware that there are some restrictions on the type of SQL statement that can be
included in stored procedures that are called from triggers. DDL, for example, is
prohibited. See the formal documentation manual for a complete list.
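As a sketch of the stored procedure variant, the firing statement would simply contain a CALL. PromoNotify here is a hypothetical procedure name; the rest of the trigger follows the example above:

replace trigger event02_sp_trig
after insert on Mkt_Basket_Dtl
referencing new as n
for each row when (n.mbd_productkey in (8,13,24,35,46,52,67,78,83,98))
(call PromoNotify (n.mbd_orderkey, n.mbd_productkey););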
6.1.3. When to Use Which
The choice between a UDF or a stored procedure on the one hand, and a queue table on the other, should be based on the amount of work the triggered action will actually do. For example, extended analytics that rely heavily on database access and logic branching are better handled by being spun off as an independent activity, by means of a write to (followed by a read from) a queue table. This will keep the scope of the work for the initiating transaction small.
In such cases, where there will be analytic work within a stored procedure called from a trigger, keep in mind that the initial update action will not be committed until the stored procedure has successfully completed. AMP worker tasks supporting the initial update activity will be held, as will locks. If the update that caused the trigger to fire is part of a TPump insert job, it is likely that performance for the entire load job will be impacted.
External stored procedures, on the other hand, are easily replaced by UDFs within the firing statement of a trigger.
Figure 17: The approach to using triggers can extend or reduce the recovery unit
6.2. Trigger Complexity Tradeoffs
Performance of load processing can be greatly impacted by how triggers are designed. Primary index triggers, such as inserts, or updates and deletes based on having a primary index value available, can complement TPump jobs, for example. Such triggers rely on row-hash locks and impact only a single row on a single AMP. But be aware that, even though this is a minimal level of overhead, row-hash locks can, under some conditions, contribute to blocking.
What really needs watching is a trigger that generates insert/select statements or complex updates. These can increase the overhead involved and cause table-level locks to be set, reducing parallelism and degrading performance of the load process.
A good approach is to consider the overhead of triggers the same way you would
consider the overhead of join indexes. Running an explain of the base table update that causes the trigger to fire will provide a blueprint of the database effort involved in supporting the trigger, just as it would illustrate the overhead involved in
join index maintenance.
Note that tables that contain triggers may not be loaded using FastLoad or MultiLoad. In addition, you cannot create a trigger on a table participating in a join index.
In order to properly appreciate a simple vs. a complex trigger, and its impact on the
update, consider the contrast between the following examples.
Simple Trigger Example:
What makes this first trigger simple is that the body of the trigger (the action that
happens as a result of the trigger firing) is a simple, one-AMP insert with only one
row-hash lock. The explain of the update statement that causes the trigger to fire
will illustrate the impact of the trigger.
replace trigger trigaudit after insert on resultinfo
referencing new as n
for each row
(insert into cabaudit values (n.r_resultinfokey,n.r_comment););
explain
insert into resultinfo values (99,'newitem',5,'exception');
Explanation:
1) First, we execute the following steps in parallel.
1) We do an INSERT into ADW.resultinfo.
2) We do an INSERT into ADW.cabaudit.
2) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
If you are loading data with TPump, the overhead of a simple trigger will depend on
how frequently that trigger will fire. Consider these test results from a TPump job
that loads over 300,000 rows.
TPump                            Elapsed Time   Percent Longer
                                                Compared to No Trigger
TPump with no trigger            140 sec.
Trigger never fires              154 sec.       10%
Trigger fires 10% of the time    168 sec.       20%
Trigger fires 100% of the time   262 sec.       88%
Even if the trigger never fires, there is some overhead in checking the WHEN
clause conditions, in this case about 10%. More complex conditions will incur more
overhead for the condition checking. The overhead increases as the percentage of
rows that cause the trigger to fire increases.
Complex Trigger Example:
Contrast the single-AMP insert above with the activity caused by the trigger in the explain below. There are several aspects of this trigger's complexity. There are two table-level write locks and one table-level read lock. The plan also contains 6 all-AMP steps, one of which performs a full table scan/update.
It is interesting to note that while the table that the trigger is on (resultinfo) does not have a join index defined on it (triggers and join indexes are not supported on the same table), there is a join index involved in the plan produced by the trigger (reftblJI). This is because within the body of the trigger an update is performed against the
reftbl table, which does have a join index built upon it, and join index maintenance
must be included in the plan because that table is potentially being updated.
replace trigger trigaudit after insert on resultinfo
referencing new as n
for each row
when (n.r_resultinfokey not in (select o_altkey from orderalt))
(update reftbl set rt_acctbal = rt_acctbal + 1;);
explain
insert into resultinfo values (99,'newitem',5,'exception');
Explanation:
1) First, we lock a distinct ADW."pseudo table" for write on a
RowHash to prevent global deadlock for ADW.reftblJI.
2) Next, we lock a distinct ADW."pseudo table" for read on a
RowHash to prevent global deadlock for ADW.orderalt.
3) We lock a distinct ADW."pseudo table" for write on a RowHash
to prevent global deadlock for ADW.reftbl.
4) We lock ADW.reftblJI for write, we lock ADW.orderalt
for read, and we lock ADW.reftbl for write.
5) We execute the following steps in parallel.
1) We do an INSERT into ADW.resultinfo.
2) We do an INSERT into Spool 1.
6) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by
way of an all-rows scan into Spool 3 (all_amps), which is
built locally on the AMPs. Then we do a SORT to order Spool
3 by row hash. The size of Spool 3 is estimated with high
confidence to be 1 row. The estimated time for this step is
0.01 seconds.
2) We do an all-AMPs RETRIEVE step from ADW.orderalt by way
of index # 4 without accessing the base table
"ADW.orderalt.O_ALTKEY = 99" with no residual conditions
into Spool 5 (all_amps), which is redistributed by hash code
to all AMPs. Then we do a SORT to order Spool 5 by the sort
key in spool field1 eliminating duplicate rows. The input
table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 5 is estimated with
high confidence to be 16 rows. The estimated time for this
step is 0.04 seconds.
7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
an all-rows scan into Spool 4 (all_amps), which is duplicated on
all AMPs. Then we do a SORT to order Spool 4 by row hash. The
size of Spool 4 is estimated with no confidence to be 320 rows.
8) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
all-rows scan, which is joined to Spool 4 (Last Use) by way of an
all-rows scan. Spool 3 and Spool 4 are joined using an exclusion
merge join, with a join condition of ("Field_2 = O_ALTKEY"). The
result goes into Spool 2 (Last Use), which is built locally on the
AMPs. The size of Spool 2 (Last Use) is estimated with index join
confidence to be 1 row. The estimated time for this step is 0.06
seconds.
9) We execute the following steps in parallel.
1) If the number of rows returned in 8 is > 0, we do an all-AMPs
UPDATE from ADW.reftblJI by way of an all-rows scan
with no residual conditions.
2) If the number of rows returned in 8 is > 0, we do an all-AMPs
UPDATE from ADW.reftbl by way of an all-rows scan with
no residual conditions.
10) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
6.3. Other Examples of Event Triggers
Triggers can be useful in moving around and saving snapshot data produced by
monitoring tools. The following paragraphs offer an example.
TPump provides a method of monitoring its own progress, by means of an entity
known as the TPump status table, officially called TPumpStatusTbl. Only one row
in the database is used by TPump for this purpose. If this monitor table has been
initiated, TPump inserts a row, then updates that same row once every minute,
overlaying with each write the information it previously placed there.
Figure 18: Triggers defined on the TPumpStatusTbl insert into a HoldStatus table, preserving status information
[Prototype Example #15]
By relying on database triggers, all information TPump writes to its monitor table can be automatically moved into a history table also located in the Teradata database. This history table could specifically hold images taken from the monitor table row. The database triggers would be fired only when the single row in the TPumpStatusTbl changes, either because it is inserted (as it would be at job start), modified (once every minute during the job), or deleted (at end of job). Triggering to a history table would allow a delta for various load statistics to be computed between TPump writes to the status table, as the information made available by TPump is cumulative.
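A minimal sketch of the update-case trigger follows. The HoldStatus history table and the column names referenced here (JobName, RowsInserted) are placeholders rather than the actual TPumpStatusTbl layout; similar triggers would be defined for the insert and delete cases:

replace trigger TP_Stat_upd
after update on TPumpStatusTbl
referencing new as n
for each row
(insert into HoldStatus
 values (current_timestamp(0)
        ,n.JobName
        ,n.RowsInserted););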
7. Enterprise Data Warehouse Considerations
When events are integrated into the Teradata data warehouse, items such as
monitoring, security, and performance all need to be pro-actively considered. This
chapter will address some of these topics.
7.1. Monitoring
Monitoring for events is similar to, yet different from, the standard data warehouse monitoring that may already be in place. While setting up monitoring to track the volume and nature of the events passing through your system may be fairly standard, correlating the resources used to particular events may require more creative thinking.
7.1.1. Database Query Log and Stored Procedures
Database Query Log (DBQL) offers many benefits in the Teradata world in tracking query activity. However, because it is focused on Teradata database activity, DBQL only captures query characteristics and resource usage from activity that is running on the AMPs. While SQL statements issued from stored procedures execute in the AMPs, the stored procedure itself runs in the parsing engine.
Both the stored procedure call and each SQL statement within the stored procedure will be logged as separate entries in DBQL, each with a distinct Query ID. If the QueryText column in DBQLogTbl contains the stored procedure call, for example something like 'call pksscan (10, avgep)', then you will see zero in the TotalCPUTime column of that row. Only the Teradata SQL statements within the stored procedure, each of which will have its own row in DBQLogTbl, will register CPU and I/O usage.
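A simple way to see this effect is to pull the relevant rows out of the DBQL log table. A sketch follows, assuming default logging and a hypothetical session number:

SELECT QueryID
      ,TotalCPUTime
      ,QueryText (VARCHAR(100))
FROM DBC.DBQLogTbl
WHERE SessionID = 123456   /* hypothetical session of interest */
ORDER BY QueryID;

The row whose QueryText shows the CALL will report zero TotalCPUTime; the SQL statements issued inside the procedure appear as separate rows that carry the CPU and I/O usage.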
An external stored procedure that does not issue Teradata SQL will never accumulate resource usage that can be reported in DBQL. However, a call to an external stored procedure will get logged in the default logging table and will be given a query ID. In addition, all of the resource accumulations from stored procedure executions will be accounted for within the ResUsage tables.
When a stored procedure, whether external or SQL-based, calls another stored
procedure, both call statements will get logged in DBQLogTbl. However, the
nested stored procedure will always get a zero QueryID and nulls in the QueryText
column. If the second-level stored procedure executes SQL statements, they will
not appear in DBQL. This is because the current release of DBQL does not have
access to the request and session context when there is a call within a call.
7.1.2. Alternatives for Monitoring Events
Because there are no automatic approaches to getting usage information out of
stored procedures, some simple home-grown techniques are useful to consider.
Below are some thoughts to get you started:
• Build your own log table that the different processes you build can log into, capturing response times, counts, actions taken, and so on.
• As the first statement in a stored procedure, and as the last, select the current timestamp, and insert both values into a log table to capture actual response time (see the sketch after this list).
• Utilize a UDF that captures CPU usage levels based on operating-system process-level usage. For example, a UDF could be written to look at Windows PerfMon measurements.
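The timestamp technique mentioned in the list above might look like the following sketch. The procedure name, the sp_response_log table, and its layout are hypothetical:

REPLACE PROCEDURE ADW.ScanForEvents ()
BEGIN
  DECLARE start_ts TIMESTAMP(6);
  SET start_ts = CURRENT_TIMESTAMP(6);

  /* ... the procedure's real analysis SQL would go here ... */

  /* Record the procedure name, start time, and end time. */
  INSERT INTO ADW.sp_response_log
  VALUES ('ScanForEvents', :start_ts, CURRENT_TIMESTAMP(6));
END;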
If you are using either a UDF or an external stored procedure, you may choose to
put your own logging in place. You can take advantage of the debug table as a
temporary place to log things. Or you can log to a file which is later processed or
loaded into Teradata.
When monitoring is expanded to cover events, it is more important to set up account strings appropriately. You might want more granular control over users, with
different roles and different priorities for different event types. Even the same
stored procedure executing at different times of day may require different execution
profiles.
7.2. Security
During event processing if you invoke an external service, you will be doing this external work under some outside authority, beyond the scope of standard Teradata
security.
7.2.1. UDFs and Security
User Defined Functions run as ‘root’ (in unprotected mode) or ‘tdatuser’ (in protected mode) on UNIX systems and in system mode on Windows systems. Running as ‘root’ (or system mode on Windows) gives the UDF super user privileges.
Because the execution itself cannot be managed, the point of control needs to be
in the privilege to CREATE and EXECUTE UDFs.
When working with UDFs, it is critical to oversee who is given the two above privileges. In addition, thorough testing should be required before a UDF is moved into
production.
7.2.2. Approaches to Handling External Security
A basic issue with security today is how to pass authentication to an external platform. For example, a user name and password will need to be made available to a stored procedure that will be going to an external platform, so work can be done there.
An easy thing to do would be to code the external stored procedure (XSP) with the user and password information, and any other logon detail, contained within. Another approach is to set up your XSP with arguments that pass the user and password information at execution time. Or the XSP could call a second-level SQL-based stored procedure that determines the appropriate logon and security information from within the Teradata database.
The graphic below shows a similar approach that uses a high-level stored procedure that first calls an SQL-based stored procedure to get the security information, then calls an XSP to do the external access using the information provided by the first stored procedure.
Figure 19: Providing a user and password for an external platform
This same approach could be enhanced by using an ODBC call to the external database that is going to be accessed, replacing the GetPassword stored procedure illustrated above. An enterprise security server (for example, LDAP) could also be accessed as the first step, in order to establish a valid identification.
Whatever approach is used, the scenario will be similar: authentication for the external service is stored somewhere, and the user executing the XSP or the UDF has to be able to access this authentication data based on who he or she is inside Teradata.
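A sketch of the coordinating procedure in the figure above follows. All of the object names (GetPassword, GetExtData, the logon variables) and the parameter shapes are hypothetical:

REPLACE PROCEDURE ADW.CoordinateExtAccess (IN svc_name VARCHAR(30))
BEGIN
  DECLARE ext_user VARCHAR(30);
  DECLARE ext_pwd  VARCHAR(30);

  /* Steps 1-2: an SQL-based procedure reads the security table
     inside Teradata and returns the credentials. */
  CALL ADW.GetPassword (:svc_name, :ext_user, :ext_pwd);

  /* Steps 3-6: an external stored procedure logs on to the external
     platform with those credentials and performs the external work. */
  CALL ADW.GetExtData (:svc_name, :ext_user, :ext_pwd);
END;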
7.2.3. Auditing
An important part of any security scheme is a means to capture violations. In an
event architecture, this type of auditing is even more important because of the risk
of poorly-written or rogue UDFs. A built-in logging structure, such as discussed
earlier in the context of monitoring, can do double-duty as a method of enforcing an
audit trail for external activities. Good coding practices around traceability will also
rise in importance.
7.3. Workload Management
Components that are involved in event processing in Teradata may show different
utilization patterns than what you are used to seeing. For example, you may have
these types of things happening in combination:
• A continuously-running stored procedure issuing lots of short queries.
• A stored procedure that does a single in-depth data analysis query.
• Events being processed off an external queue.
• Stored procedures being triggered in response to external events.
• Queue tables being processed at irregular rates.
As background material for this section, it may be helpful if you read through the
Orange Books already published on Using Priority Scheduler and Using Teradata
Dynamic Query Manager.
7.3.1. Priorities
There may need to be procedures put in place to monitor and change priorities of
different processes that are running. By having more granular user IDs you can
track event frequency and resource usage, and you can prioritize event processing
with more flexibility.
All of the Teradata database activity for a given user session will run under the priority established by that user. This is true whether the resources are being used in
the AMP or in the PE, as would be the case with stored procedure logic. Keep in
mind that a UDF or XSP can consume resources outside of Teradata, which will
not be under the control of Teradata workload management facilities during that
period of time.
While AMP worker tasks may be held during the time that a UDF executes externally, based on the SQL step that caused the UDF to execute, no additional AWTs
are either acquired or held by UDF execution.
7.3.2. Workload Rules
TDWM (formerly known as Teradata Dynamic Query Manager) supports both filter rules (object access and query resource types) and throttle rules. The throttle rules, previously called workload limit rules, control concurrency levels within certain groupings of users.
These TDWM rules may need to be reconsidered when put into the context of queries that participate in events. Rules will apply only to the SQL statements that access data in Teradata, but have no usefulness in controlling external access.
Here are some guidelines on where TDWM will or will not be effective with event components:
• Only the SQL within stored procedures will adhere to rules.
• External stored procedures will not be impacted by rules.
• Filter rules can be associated with queue tables.
• Filter rules cannot recognize or restrict UDFs.
• Throttles can delay or reject queries that invoke UDFs or that access queue tables.
Some additional considerations specific to TDWM follow:
Stored Procedures: Within a stored procedure, each SQL statement is a separate request and will be considered by TDWM separately for rule-compliance. It is
possible for the first request in a stored procedure to comply with all rules and execute successfully, but then to have a subsequent SQL statement in the stored procedure be delayed or rejected.
For example, if a throttle rule with a “reject” option has been defined, and one of
the SQL requests has a qualifying all-AMP step that would violate the rule, then the
entire stored procedure would be terminated with a 3151 error. Checks for TDWM
rules are made after the stored procedure begins execution, not before. To prevent abnormal termination under such conditions as described earlier, the stored
procedure could be coded to include error handling for the 3151 error code.
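A sketch of such a handler follows; the procedure body, the sp_error_log table, and its layout are hypothetical, and the handler simply records the rejection rather than letting the procedure terminate abnormally:

REPLACE PROCEDURE ADW.EventAnalysis ()
BEGIN
  /* If any statement fails, check for the TDWM reject error (3151)
     and log it; other exceptions simply end the procedure. */
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    IF SQLCODE = 3151 THEN
      INSERT INTO ADW.sp_error_log
      VALUES (CURRENT_TIMESTAMP(0), 3151,
              'Request rejected by TDWM throttle rule');
    END IF;
  END;

  /* Analysis SQL that may contain an all-AMP step subject to a
     throttle rule. */
  INSERT INTO ADW.StrategySummary
  SELECT Strategy, COUNT(*)
  FROM ADW.StrategyRecommendation
  GROUP BY 1;
END;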
If the throttle rule specifies “delay” rather than “reject” the stored procedure would
stay alive in a suspended state and wait until its SQL request got off the delay
queue, executed and completed.
Delay Queues: Be careful when using throttle rules that you don’t inadvertently
place queries that participate in event processing in a delay queue. Depending on
your design, event processing may initiate a data analysis query. This query may
be viewed by a workload administrator as the type of work that is resource intensive and should be limited to a specified concurrency level. However, delaying this
query may not be in the interest of the broader event architecture. Workload management query rules may need to be revisited in the context of this broader perspective.
8. Interacting with Event Architectures Outside Teradata
Twenty years ago Teradata made it a priority to fit into the mainframe environment.
Teradata Director Program (TDP), the channel connection, MVS-based utilities,
and applications such as TS/API all played a part in opening up Teradata for close
association with mainframe data and applications.
Teradata has evolved to plug into the organization’s operational environment, beyond the mainframe. Teradata supports open standards and interfaces such as
ODBC and JDBC. Operational data stores, for example, previously running on external data marts, can now run inside of Teradata. Active data warehousing is a logical extension of this continuing drive towards interaction and cooperation beyond the database.
The next step forward, and outward, integrates Teradata more tightly with the emerging Service Oriented Architectures (SOAs) that organizations are putting in place to support their operational applications. Some organizations will already have made their architectural choices. This chapter discusses ideas and recommendations for fitting Teradata into such a backdrop, starting with a brief explanation of what constitutes an SOA.
8.1. Service Oriented Architectures
Service Oriented Architectures are development and runtime frameworks that are
best represented by products such as IBM® WebSphere® Business Integration
Server, BEA’s WebLogic Integration™, Tibco® BusinessWorks and Microsoft® BizTalk Server®. These products’ architectures provide a standard way to integrate
and open up services and processes throughout the enterprise, and they are rapidly growing in popularity across industries.
Most SOA products include a design-time graphical user interface to define integration scenarios and offer workflow management. In addition, they provide web-based administration and monitoring. Most allow you to drag and drop symbolic
representations of services being integrated. SOAs leverage standards such as
XML, J2EE, .NET, and Web Services, are quick and easy to set up, offer flexibility
in how you deploy them, and support real-time information exchange.
To better understand the SOA framework, a few acronyms and frequently used
terms are listed below:
• SOA – Service Oriented Architecture, the framework for interoperability. The processing of transactions is made available as business services with known interfaces using a standard representation. Services are published using a common interface descriptor language and protocol, such as WSDL.
• WSDL – Web Services Description Language, a specification for describing networked XML-based services. It provides a simple way for service providers to describe the basic format of requests to their systems regardless of the underlying protocol. A WSDL definition will be required to complete the process of exposing the DML as a service that can be used. After the WSDL definition has been recorded, the SOA design tool will be able to reference that service at the appropriate place.
• SOAP – Simple Object Access Protocol. SOAP is a lightweight, vendor-neutral, text-based protocol that uses XML for the exchange of information in a decentralized, distributed environment.
• UDDI – Universal Description, Discovery, and Integration. UDDI directories are like global white pages: a place you can go to look up technical details about working with other web services, and to advertise your own services. UDDI is perhaps the most well-known web services directory standard.
• Adapter – An implementation of a communication mechanism, such as a particular protocol, that allows different software to talk.
• Orchestration – Graphical creation of business processes. It brings understandability and provides a framework that allows business analysts to interact with IT architects.
• Expose – Make a component, such as a service, available for interaction within an SOA.
8.2. How to Expose an Event or Service in Teradata
“Exposing” a service or an event means making it available for interaction with a Service Oriented Architecture. What is of concern from the Teradata perspective is how to design components in such a way that they are open to, and can plug into, whatever event architecture is already in place.
Some products, such as Tibco, provide a specific Teradata adapter. This allows Teradata components to plug into the enterprise framework as it is defined by Tibco. Section 8.3 illustrates a real-life scenario involving Teradata and Tibco. This prototype relies on the Tibco adapter to allow components from different platforms to talk.
WebLogic Integration (WLI) enables integration within an Enterprise Information Systems (EIS) framework. WLI provides a set of adapters to integrate with back-end systems and enterprise applications and technologies, and supports custom-made adapters using an Adapter Development Kit.
On the other hand, any Teradata application can be written in either a Java or a .NET environment and then be exposed as a service within one of those frameworks. This does not require a special adapter. Once you have exposed a service, it can be placed in a larger business transaction using one of the design-time tools that are available in all SOAs.
8.3. Example of Teradata within an SOA
This section describes a prototype in which Teradata is integrated into a Tibco run
time framework. This particular application is from the transportation industry, and
illustrates how Teradata fits into an SOA.
8.3.1. The Event
A moving vehicle carries some risk of developing mechanical problems during a
long trip. Even the most thorough maintenance check back home may not prevent
an incident if a part goes bad somewhere along the route. In order to address this,
the Teradata customer has their vehicles instrumented to produce diagnostics
measurements periodically. These diagnostics can be used to understand the
health of the moving parts when they are in action.
For example, when a vehicle component becomes overheated, that could be a sign
that something is causing unusual friction inside the mechanism. Having this information right away while the vehicle is en route could prevent a serious accident.
But one reading from the instrumentation alone is inadequate in diagnosing a problem. To unnecessarily delay or sideline a vehicle that is performing a valuable
commercial service just on the suspicion of a problem could be costly and create
customer dissatisfaction. What is needed to make the right decision is to interpret
the on-the-spot diagnostic readings by looking at the history of that particular component and factoring in environmental variables.
As soon as diagnostics for a vehicle are emitted, these readings are batched up and immediately sent into the Teradata database as one transaction. Inside the Teradata database, a history of each component’s diagnostics has been collected, and during the event analysis the last 8 readings are compared against the current transaction’s readings. As a result of this analysis, several different actions can be ordered. For example, the mandated action could be to stop the vehicle immediately, stop at the next convenient location, or perform maintenance at the end of the trip. Most often the action is to do nothing.
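Although the customer’s actual analysis logic is not reproduced here, a hedged sketch of this kind of comparison is shown below; all table and column names are hypothetical, and the threshold is arbitrary.

-- Compare each component's current reading against the average of its
-- last 8 historical readings; flag readings well above the recent norm.
SELECT  c.vehicle_id,
        c.component_id,
        c.reading       AS current_reading,
        AVG(h.reading)  AS recent_avg
FROM    current_readings c,
        (SELECT vehicle_id, component_id, reading
         FROM diagnostics_history
         QUALIFY RANK() OVER (PARTITION BY vehicle_id, component_id
                              ORDER BY reading_ts DESC) <= 8) AS h
WHERE   c.vehicle_id = h.vehicle_id
AND     c.component_id = h.component_id
GROUP BY 1, 2, 3
HAVING  c.reading > 1.2 * AVG(h.reading);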
8.3.2. Using Tibco
Tibco provides a framework for a Service Oriented Architecture and allows you to
design a workflow, using icons, to represent a process and its services.
In this prototype, the BusinessWorks component was used to map out the processes. In BusinessWorks, a workflow represents a process, and a Project includes multiple workflows.
Below is a screen shot showing a simple workflow.
Figure 20: Tibco Workflow example
In this prototype, data comes in as flat files, so the first step in defining this process in Tibco’s Designer is to drag into the workspace an icon (referred to as a palette) representing the reading of a flat file. Once you’ve dragged this palette in, all you need to do is name it, give it properties, and associate it with a file name. The file name you enter will be the file that contains the recent wheel readings.
Sending a query along with the file into Teradata is accomplished by dragging in an
instance of another palette known as the “Teradata Adaptor Configuration” palette.
Figure 21: Teradata Adaptor Configuration
The Teradata adapter exposes a number of design-time properties, and you may configure each instance of the adapter differently. For example, one variable is the "subscriber bulk insert size", which is similar to the pack factor in TPump; it directs the adapter to load data into Teradata with a specified batch size.
As part of configuring this adapter instance, Tibco will read the Teradata dictionary to pull information, such as the columns in a table, to help you establish the layout of staging tables. There are also options that allow you to specify data conversions, if needed. The flexibility is there to map one input file to several different database tables.
Figure 22: Teradata Adaptor Services Settings
Publication and subscription services are available for Teradata, again using palettes specific to those activities. If you drag a subscription palette into your workflow, for example, the tool automatically creates a shadow table for you, along with a trigger that inserts the rows to be published into that shadow table. All of this is based on options you have selected. The tool then automatically sets up polling on the shadow table and pulls the data into its message bus. Of course, you will also need to define who subscribes to that data.
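The adapter generates these objects itself, so the DDL below is only a conceptual sketch with hypothetical names; the actual shadow table and trigger that Tibco produces will differ.

-- Shadow table that holds rows waiting to be published to the message bus.
CREATE TABLE wheel_readings_shadow
   (vehicle_id   INTEGER
   ,component_id INTEGER
   ,reading      DECIMAL(9,2)
   ,reading_ts   TIMESTAMP(0))
PRIMARY INDEX ( vehicle_id );

-- Trigger that copies each newly inserted base-table row into the shadow table.
CREATE TRIGGER wheel_readings_publish
AFTER INSERT ON wheel_readings
REFERENCING NEW AS newrow
FOR EACH ROW
   INSERT INTO wheel_readings_shadow
   VALUES (newrow.vehicle_id, newrow.component_id, newrow.reading, newrow.reading_ts);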
After defining and testing an instance of the Teradata adapter, the next step is to
deploy it into the infrastructure.
8.3.3. Tibco Example Conclusions
While this section shows a prototype using Tibco, it is only one of many possible
examples of how Teradata can work with standard Service Oriented Architectures.
This example is intended to illustrate how Teradata can plug into both the event
application and the supporting structure, and could easily be redefined using WebLogic Integration, Business Integration Server, BizTalk Server, or any standards-based EAI infrastructure.
9. Final Thoughts
In V2R6, the Teradata Database has a strong new look, offering foundational features for extensibility and integration into modern enterprise architectures. Innovative event-oriented applications are made possible by these new features, which are both forward-looking in their outreach to the operational world and well-grounded in traditional Teradata strengths. This Orange Book not only illustrates these new capabilities, but also describes various prototypes that establish the effectiveness and relevance of these features.
We've examined the usefulness of queue tables for message passing inside Teradata, with their unique ability to support blocking and/or destructive reads. We've
demonstrated how external stored procedures can read or write to message
queues on other platforms, and invoke or be invoked from SQL-based stored procedures, making up a virtual chain of event activity. We've proven the usefulness
of both scalar UDFs and table functions for on-the-spot analytics, complex transformations and text manipulations, and even more extensive external I/O activity.
Consider these as starting points, something to build on, modest examples of what
Teradata can now offer in the world of event processing.
Appendix: XSLT UDF and Access Module to Transform XML to Relational
This appendix illustrates using XSLT to convert XML to vartext. XSLT is a W3C standard language for transforming XML. There are numerous free and commercial XSLT editors, debuggers, and processors. A prototype under development embeds the XSLT processing in a Teradata utility access module and in a table function UDF. In this example, the XSLT processor is invoked via a table function UDF. Given an XML document and an XSL document, the access module (Axsmod) returns vartext and the UDF returns rows comprised of varchars. The XSLT is used to navigate and select the desired content from the XML document. The UDF takes the source XML and the transforming XSLT as input arguments (varchar/varbyte, clob/blob, or literal) and returns a sub-table that can be used to update tables via SQL.
The sample below shows how an XSLT UDF creates a table of rows of vartext from varchar columns containing XML.
CREATE SET TABLE RMH.xmlorderstage, NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ordernum INTEGER
,xsltref INTEGER
,xmlOrder VARCHAR(30000) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( ordernum );
CREATE SET TABLE RMH.xslts ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
xsltnum INTEGER
,xslt VARCHAR(30000) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( xsltnum );
<?xml version="1.0"?>
<ROOT>
<ORDER>
<DATE>08/12/2004</DATE>
<PO_NUMBER>108</PO_NUMBER>
<BILLTO>Rick</BILLTO>
<ITEMS>
<ITEM>
<PARTNUM>101</PARTNUM>
<DESC>Partners Conference Ticket</DESC>
<USPRICE>1200.00</USPRICE>
</ITEM>
<ITEM>
<PARTNUM>148</PARTNUM>
<DESC>V2R5.1 Stored Procedures and Embedded SQL</DESC>
<USPRICE>28.95</USPRICE>
</ITEM></ITEMS></ORDER></ROOT>
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method='text'/>
<xsl:strip-space elements="*"/>
<xsl:template match='/'>
<xsl:apply-templates select='ROOT' mode="order"/>
<xsl:apply-templates select='ROOT/ORDER/ITEMS' mode="item"/>
</xsl:template>
<xsl:template match='ITEM' mode="item">
<xsl:value-of select='"item"'/>
<xsl:text>,</xsl:text>
<xsl:value-of select='/ROOT/ORDER/PO_NUMBER/text()'/>
<xsl:text>,,,</xsl:text>
<xsl:value-of select="PARTNUM"/>
<xsl:text>,</xsl:text>
<xsl:value-of select='DESC/text()'/>
<xsl:text>,</xsl:text>
<xsl:value-of select="USPRICE"/>
<xsl:text>&#xa;</xsl:text>
</xsl:template>
<xsl:template match='ORDER' mode="order">
<xsl:value-of select='"order"'/>
<xsl:text>,</xsl:text>
<xsl:value-of select='PO_NUMBER/text()'/>
<xsl:text>,</xsl:text>
<xsl:value-of select="DATE"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="BILLTO"/>
<xsl:text>,,,</xsl:text>
<xsl:text>&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
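For illustration only, the staging tables might be populated as follows before running the table-function query at the end of this appendix; the key values are arbitrary and the document text is abbreviated.

-- Abbreviated documents shown for readability; in practice the full XML and XSLT
-- text (or CLOBs) would be supplied.
INSERT INTO RMH.xslts (xsltnum, xslt)
VALUES (1, '<?xml version="1.0"?><xsl:stylesheet ...>...</xsl:stylesheet>');

INSERT INTO RMH.xmlorderstage (ordernum, xsltref, xmlOrder)
VALUES (108, 1, '<?xml version="1.0"?><ROOT><ORDER>...</ORDER></ROOT>');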
replace function xslt(prmiPassthru INTEGER, prmiXMLDoc varchar(32000), prmiXSLT varchar(32000))
returns table (prmPassthru integer, vartext varchar(32000))
language C NO SQL parameter style sql EXTERNAL NAME
'F!xslt!SS!xslt!c:/files/projects/xslt/xslt.c';
SELECT L.vartext
FROM (SELECT xmlOrder, xslt
      FROM XMLOrderstage, XSLTS
      WHERE xsltref = xsltnum) AS T,
     TABLE( XSLT(1, T.xmlOrder, T.xslt) ) AS L;
This query returns the following vartext rows:
order,108,08/12/2004,Rick,,,
item,108,,,101,Partners Conference Ticket,1200.00
item,108,,,148,V2R5.1 Stored Procedures and Embedded SQL,28.95
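As a further hedged sketch, the same table function can feed an INSERT ... SELECT so that the parsed output lands directly in a relational staging table; the target table parsed_order_lines is hypothetical, and here the pass-through parameter carries the order number.

-- Load each parsed vartext line, keyed by its order number, into a staging table.
INSERT INTO parsed_order_lines (ordernum, line_text)
SELECT L.prmPassthru, L.vartext
FROM (SELECT ordernum, xmlOrder, xslt
      FROM XMLOrderstage, XSLTS
      WHERE xsltref = xsltnum) AS T,
     TABLE( XSLT(T.ordernum, T.xmlOrder, T.xslt) ) AS L;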