Teradata Parallel Data Pump Reference
Release 12.00.00
B035-3021-067A
July 2007

The product or products described in this book are licensed products of Teradata Corporation or its affiliates.

Teradata, BYNET, DBC/1012, DecisionCast, DecisionFlow, DecisionPoint, Eye logo design, InfoWise, Meta Warehouse, MyCommerce, SeeChain, SeeCommerce, SeeRisk, Teradata Decision Experts, Teradata Source Experts, WebAnalyst, and You’ve Never Seen Your Business Like This Before are trademarks or registered trademarks of Teradata Corporation or its affiliates. Adaptec and SCSISelect are trademarks or registered trademarks of Adaptec, Inc. AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc. BakBone and NetVault are trademarks or registered trademarks of BakBone Software, Inc. EMC, PowerPath, SRDF, and Symmetrix are registered trademarks of EMC Corporation. GoldenGate is a trademark of GoldenGate Software, Inc. Hewlett-Packard and HP are registered trademarks of Hewlett-Packard Company. Intel, Pentium, and XEON are registered trademarks of Intel Corporation. IBM, CICS, DB2, MVS, RACF, Tivoli, and VM are registered trademarks of International Business Machines Corporation. Linux is a registered trademark of Linus Torvalds. LSI and Engenio are registered trademarks of LSI Corporation. Microsoft, Active Directory, Windows, Windows NT, and Windows Server are registered trademarks of Microsoft Corporation in the United States and other countries. Novell and SUSE are registered trademarks of Novell, Inc., in the United States and other countries. QLogic and SANbox are trademarks or registered trademarks of QLogic Corporation. SAS and SAS/C are trademarks or registered trademarks of SAS Institute Inc. SPARC is a registered trademark of SPARC International, Inc. Sun Microsystems, Solaris, Sun, and Sun Java are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and other countries. Symantec, NetBackup, and VERITAS are trademarks or registered trademarks of Symantec Corporation or its affiliates in the United States and other countries. Unicode is a collective membership mark and a service mark of Unicode, Inc. UNIX is a registered trademark of The Open Group in the United States and other countries. Other product and company names mentioned herein may be the trademarks of their respective owners.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS-IS” BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO THE ABOVE EXCLUSION MAY NOT APPLY TO YOU. IN NO EVENT WILL TERADATA CORPORATION BE LIABLE FOR ANY INDIRECT, DIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS OR LOST SAVINGS, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

The information contained in this document may contain references or cross-references to features, functions, products, or services that are not announced or available in your country. Such references do not imply that Teradata Corporation intends to announce such features, functions, products, or services in your country. Please consult your local Teradata Corporation representative for those features, functions, products, or services available in your country. Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated without notice. Teradata Corporation may also make improvements or changes in the products or services described in this information at any time without notice.

To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this document.
Please e-mail: [email protected]

Any comments or materials (collectively referred to as “Feedback”) sent to Teradata Corporation will be deemed non-confidential. Teradata Corporation will have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display, transform, create derivative works of, and distribute the Feedback and derivative works thereof without limitation on a royalty-free basis. Further, Teradata Corporation will be free to use any ideas, concepts, know-how, or techniques contained in such Feedback for any purpose whatsoever, including developing, manufacturing, or marketing products or services incorporating Feedback.

Copyright © 1996-2007 by Teradata Corporation. All Rights Reserved.

Preface

Purpose

This book provides information about Teradata TPump (TPump), which is a Teradata® Tools and Utilities product. Teradata Tools and Utilities is a group of products designed to work with Teradata Database. TPump is a data loading utility that helps you maintain (update, delete, insert, and atomic upsert) the data in your Teradata Database. TPump uses standard Teradata SQL to achieve moderate to high data loading rates to the Teradata Database. Multiple sessions and multi-statement requests are typically used to increase throughput.

Audience

This book is intended for use by:
• System and application programmers
• System administrators

Supported Releases

This book supports the following releases:
• Teradata Database 12.00.00
• Teradata Tools and Utilities 12.00.00
• Teradata TPump Version 12.00.00

Note: See “TPump Script Example” on page 72 to verify the Teradata TPump version number.

To locate detailed supported release information:
1 Go to www.info.teradata.com.
2 Navigate to General Search > Publication Product ID.
3 Enter 3119.
4 Open the version of the Teradata Tools and Utilities Supported Versions spreadsheet associated with this release.
The spreadsheet includes supported Teradata Database versions, platforms, and product release numbers.

Prerequisites

The following prerequisite knowledge is required for this product:
• Basic computer technology
• SQL and Teradata SQL
• Teradata Database, database management systems
• Teradata utilities that load and retrieve data

Changes to This Book

The following changes were made to this book in support of the current release. Changes are marked with change bars. For a complete list of changes to the product, see the Release Definition associated with this release.

July 2007, Release 12.00.00:
• Updated to Teradata Warehouse 12.00.00, TTU 12.00.00, TPump 12.00.00. See “Supported Releases” on page 3.
• Extended text delimiter size and added multi-character delimiters. See the descriptions of the syntax elements “FORMAT” on page 148 and “'c'” on page 149.
• Added query banding feature support. See the Teradata SQL statement “SET QUERY_BAND” on page 30.
• Added a new data-related retryable error code. See “Error Types” on page 193 and “5991” on page 196 of “Table 19: TPump Error Conditions” on page 194.
• Added a note regarding the use of the latency option when using AXSMOD and NPAXSMOD. See “LATENCY” on page 102 and “AXSMOD name” on page 145.
• Updated run-time parameters information. See Table 5 on page 45.
• Corrected an incorrect title for the TPump Log Field. See “Example” on page 39, “TPump Script Example” on page 72, and “Table 13: TPump Statistics” on page 75.
• Added an option to show the version and stop. See “RVERSION” on page 50.
• Added support for multi-byte characters in object names when the client session character set is UTF8 or UTF16. See “Rules for Using Chinese and Korean Character Sets” on page 24.
• Added a statement that the BOM is not supported on MVS in data files or AXSMODs using UTF8. See “UTF8 Character Sets” on page 25.
• Documented that TPump does not always place a bad row in the error table. See “ERRLIMIT” on page 98.
• Added Unicode data dictionary support. See “Multibyte Character Sets” on page 65.
• Removed or reworded limitations and characteristics of Teradata Database versions earlier than V2R6.0, which are no longer relevant.
• Updated the NOTIMERPROCESS syntax element in the BEGIN LOAD command. On MVS, there were intermittent abends with SEC6, reason code = 0000FFOE. See “NOTIMERPROCESS” on page 102.
• Added missing Data Conversion Capabilities, Checkpoints, and Multibyte Character Sets information. See “Data Conversion Capabilities” on page 24 and “Character Set Specifications for AXSMODs” on page 65.
• Corrected a case where the DBS was unable to handle the 128th DML when the APPLY condition for the 128th DML has value 1 in a TPump script. See changes in Chapter 3.
• Removed references to the ASF2TR product, discontinued effective with 12.00.00.
• Added information regarding the new logon string size limit of 30 bytes. See “Multibyte Character Sets” on page 65.

Additional Information

Additional information that supports this product and Teradata Tools and Utilities is available at the web sites listed below. In the entries that follow, mmyx represents the publication date of a manual, where mm is the month, y is the last digit of the year, and x is an internal publication code. Match the mmy of a related publication to the date on the cover of this book. This ensures that the publication selected supports the same release.

Release overview and late information
Use the Release Definition for the following information:
• Overview of all of the products in the release
• Information received too late to be included in the manuals
• Operating systems and Teradata Database versions that are certified to work with each product
• Version numbers of each product and the documentation for each product
• Information about available training and the support center
To access the Release Definition:
1 Go to www.info.teradata.com.
2 Select the General Search check box.
3 In the Publication Product ID box, type 2029.
4 Click Search.
5 Select the appropriate Release Definition from the search results.

Additional product information
Use the Teradata Information Products Publishing Library site to view or download specific manuals that supply related or additional information to this manual:
1 Go to www.info.teradata.com.
2 Select the Teradata Data Warehousing check box.
3 Do one of the following:
• For a list of Teradata Tools and Utilities documents, click Teradata Tools and Utilities and then select a release or a specific title.
• Select a link to any of the data warehousing publications categories listed.
Specific books related to Teradata TPump are as follows:
• Teradata Parallel Data Pump Reference, B035-3021-mmyx
• Teradata Tools and Utilities Command Summary, B035-2401-mmyx

CD-ROM images
Access a link to a downloadable CD-ROM image of all customer documentation for this release. Customers are authorized to create CD-ROMs for their use from this image.
1 Go to www.info.teradata.com.
2 Select the General Search check box.
3 In the Title or Keyword box, type CD-ROM.
4 Click Search.

Ordering information for manuals
Use the Teradata Information Products Publishing Library site to order printed versions of manuals:
1 Go to www.info.teradata.com.
2 Select the How to Order check box under Print & CD Publications.
3 Follow the ordering instructions.
General information about Teradata
The Teradata home page provides links to numerous sources of information about Teradata:
1 Go to Teradata.com.
2 Select a link.
Links include:
• Executive reports, case studies of customer experiences with Teradata, and thought leadership
• Technical information, solutions, and expert advice
• Press releases, mentions, and media resources

Table of Contents

Preface
  Purpose
  Audience
  Supported Releases
  Prerequisites
  Changes to This Book
  Additional Information

Chapter 1: Overview
  TPump Utility
    Description
    Complementing MultiLoad
    TPump Support Environment
    What it Does
    How it Works
  Operating Features and Capabilities
    Operating Modes
    Input Data Formats
    Client Character Sets
    Data Conversion Capabilities
    Checkpoints
    Unicode Character Sets
    Client Character Set/Client Type Compatibility
  TPump Commands
    TPump Command Input
    Teradata SQL Statements
  The TPump Task
    Task Limits
    DML Commands
    Upsert Feature
    TPump Macros
    Locks
    Access Rights
    Fallback vs. Nonfallback Tables

Chapter 2: Using TPump
  Invoking TPump
    TPump Support Environment
    File Requirements
    On IBM Mainframe Client-Based Systems
    On UNIX- and Windows-based Systems
    In Interactive Mode
    In Batch Mode
    Run-time Parameters
    Examples - Redirection of Inputs and Outputs
  Terminating TPump
    Normal Termination
    Abort Termination
    After Terminating a TPump Job
  Restarting and Recovery
    Basic TPump Recovery
    Protection and Location of TPump Database Objects
    Reinitializing a TPump Job
    Recovering an Aborted TPump Job
    Recovering from Script Errors
  Programming Considerations
    TPump Command Conventions
    Variables
    Using ANSI/SQL DateTime Data Types
    Using Comments
    Specifying a Character Set
    Using Graphic Data Types
    Using Graphic Constants
    Restrictions and Limitations
    Termination Return Codes
  Writing a TPump Job Script
    Definition
    Script Writing Guidelines
    Procedure for Writing a Script
    TPump Script Example
  Viewing TPump Output
    TPump Statistics
    TPump Options Messages
    Logoff/Disconnect Messages
  Monitoring TPump Jobs
    Monitor Interface Table
    TPump Monitor Views
    TPump Monitor Macros
  Estimating Space Requirements
    Space Calculation Example

Chapter 3: TPump Commands
  Syntax Notes
    TPump Commands
    TPump Teradata SQL Statements
  ACCEPT
  BEGIN LOAD
  DATABASE
  DATEFORM
  DELETE
  DISPLAY
  DML
    Serialization
    The Basic Upsert Feature
    Upsert
    The Atomic Upsert Feature
  END LOAD
  EXECUTE
  FIELD
  FILLER
  IF, ELSE, and ENDIF
  IMPORT
  INSERT
  LAYOUT
  LOGDATA
  LOGMECH
  LOGOFF
  LOGON
  LOGTABLE
  NAME
  PARTITION
  ROUTE
  RUN FILE
  SET
  SYSTEM
  TABLE
  UPDATE Statement and Atomic Upsert

Chapter 4: Troubleshooting in TPump
  Early Error Detection
  Error Types
  Error Messages
  Reading TPump Error Tables
  TPump Performance Checklist

Chapter 5: Using INMOD and Notify Exit Routines
  Overview
    INMOD Routines
    Notify Exit Routines
    Programming Languages
    Programming Structure
    Routine Entry Points
    The TPump/INMOD Routine Interface
    TPump/Notify Exit Routine Interface
    Rules and Restrictions for Using Routines
  Using INMOD and Notify Exit Routines
    TPump-specific Restrictions
    TPump/INMOD Interface
  Preparing the INMOD Program
  INMOD Input Values
  INMOD Output Values
  Programming INMODs for UNIX-based Clients
    Compiling and Linking a C INMOD on a UNIX-based Client
    Compiling and Linking a C INMOD on MP-RAS and Sun Solaris SPARC
    Compiling and Linking a C INMOD on a Sun Solaris Opteron
    Compiling and Linking a C INMOD on HP-UX PA RISC
    Compiling and Linking a C INMOD on HP-UX Itanium
    Compiling and Linking a C INMOD on an IBM AIX
    Compiling and Linking a C INMOD on a Linux Client
  Programming INMODs for a Windows Client
    Compiling and Linking a C INMOD on a Windows Client

Appendix A: How to Read Syntax Diagrams
  Syntax Diagram Conventions

Appendix B: TPump Examples
  Simple Script Example
  Restarted Upsert Example
  Example Using the TABLE Command

Appendix C: INMOD and Notify Exit Routine Examples
  COBOL Pass-Thru INMOD
  Assembler INMOD
  PL/I INMOD
  C INMOD - UNIX
  Sample Notify Exit Routine

Glossary

Index

List of Tables

Table 1: TPump Data Formats
Table 2: TPump Commands
Table 3: Supported Teradata SQL Statements in TPump
Table 4: Comparison of Fallback and Nonfallback Target Tables
Table 5: Run-time Parameters
Table 6: TPump Operators
Table 7: TPump Conditional Expressions
Table 8: Predefined System Variables
Table 9: Ways to Either Specify a Character Set or Accept a Default Specification
Table 10: GRAPHIC Data Types for datadesc option in FIELD or FILLER Statement
Table 11: Restrictions and Limitations on Operational Features and Functions
Table 12: Termination Return Codes
Table 13: TPump Statistics
Table 14: Monitor Interface Table
Table 15: TPump Commands
Table 16: TPump Teradata SQL Statements
Table 17: Events that Create Notifications . . . 105
Table 18: ANSI/SQL DateTime Specifications . . . 135
Table 19: TPump Error Conditions . . . 194
Table 20: Acquisition Error Table . . . 198
Table 21: Programming Routines by Language . . . 203
Table 22: TPump-to-INMOD Status Codes . . . 205
Table 23: INMOD-to-TPump Interface Status Codes . . . 206
Table 24: Events Passed to the Notify Exit Routine . . . 207
Table 25: INMOD Input Return Code Values . . . 215
Table 26: INMOD Output Return Code Values . . . 215

CHAPTER 1  Overview

This chapter provides an introduction to the Teradata TPump (TPump) utility. Topics include:

• TPump Utility
• Operating Features and Capabilities
• TPump Commands
• The TPump Task

TPump Utility

The following information provides a general overview of the TPump utility.

Description

TPump is a data loading utility that helps you maintain (update, delete, insert, and atomic upsert) the data in your Teradata Database, allowing you to achieve near-real-time data in your data warehouse. TPump uses standard Teradata SQL to achieve moderate to high data loading rates to the Teradata Database.
Multiple sessions and multistatement requests are typically used to increase throughput. TPump provides an alternative to Teradata MultiLoad for the low-volume batch maintenance of large databases under the control of a Teradata system. Instead of updating Teradata Databases overnight, or in batches throughout the day, TPump updates information in real time, acquiring data from the client system with low processor utilization. It does this through a continuous feed of data into the data warehouse, rather than through traditional batch updates. Continuous updates result in more accurate, timely data.

Unlike most load utilities, TPump uses row-hash locks rather than table-level locks. This allows you to run queries while TPump is running, and also means that TPump can be stopped instantaneously. TPump provides a dynamic throttling feature that enables it to run "all out" during batch windows, but within limits when it may impact other business uses of the Teradata Database. Operators can specify the number of statements run per minute, or alter the throttle minute by minute, if necessary.

TPump's main attributes are:

• Simple, hassle-free setup – does not require staging of data, intermediary files, or special hardware.
• Efficient, time-saving operation – jobs can continue running in spite of database restarts, dirty data, and network slowdowns. Jobs restart without intervention.
• Flexible data management – accepts a wide variety of data forms from virtually any data source, including direct feeds from other databases. TPump is also able to transform that data on the fly before sending it to Teradata. SQL statements and conditional logic are usable within the utility, making it unnecessary to write wrapper jobs around it.

Note: Full tape support is not available for any function in TPump for network-attached client systems.
If you want to import data from a tape, you will need to write a custom access module that interfaces with the tape device. Refer to the Teradata Tools and Utilities Access Module Programmer Guide for information about how to write a custom access module.

Complementing MultiLoad

TPump uses MultiLoad-like syntax, which leverages MultiLoad knowledge and power, provides an easy transition from MultiLoad to TPump, and supports the useful upsert feature. TPump shares much of its command syntax with MultiLoad, which facilitates conversion of scripts between the two utilities; however, there are substantial differences in how the two utilities operate. TPump complements MultiLoad in the following ways:

1 Economies of Scale: MultiLoad has an economy of scale and is not necessarily efficient when operating on really large tables when there are not many rows to insert or update. For MultiLoad to be efficient, it must touch more than one row per data block in the Teradata Database. For example, to achieve efficient MultiLoad performance on a two-billion-row table of 65-byte rows, composed of 16KB blocks, more than 0.4% of the table (8,125,000 rows) must be affected. While 0.4% of a table is a small update, eight million records is probably more data than you are going to want to run through a BTEQ script.

2 Concurrency: MultiLoad is limited to a Teradata Database variable limit for the maximum number of instances running concurrently. TPump does not impose this limit. In addition, while MultiLoad uses table-level locks, TPump uses row-hash locks, making concurrent updates on the same table a possibility. Finally, because of the phased nature of MultiLoad, there are potentially inconvenient windows of time when MultiLoad cannot be stopped without losing access to the target tables. In contrast, TPump can always be stopped and all of its locks dropped with no ill effect.
3 Resource Consumption: MultiLoad is designed for the highest possible throughput, and uses any database and host resources that help to achieve this capability. There is no way to reduce MultiLoad's resource consumption, even if you are willing to accept a longer run time for your job. TPump, however, has a built-in resource governing facility. This allows the operator to specify how many updates occur (the statement rate) minute by minute, and then change the statement rate while the job continues to run. Thus, this facility can be used to increase the statement rate during windows when TPump is running by itself, and then decrease the statement rate later on, if users log on for ad hoc query access.

TPump Support Environment

The data-handling functionality of TPump is enhanced by the TPump support environment. In addition to coordinating activities involved in TPump tasks, it provides facilities for managing file acquisition, conditional processing, and performing certain Data Manipulation Language (DML) and Data Definition Language (DDL) activities on the Teradata Database. The TPump support environment enables an additional level of user control over TPump. For more information, see "TPump Support Environment" on page 37.

What it Does

Within a single invocation of TPump, one or more distinct TPump tasks can be executed in series with any TPump support commands. The TPump task provides the acquisition of data from client files for application to target tables through INSERT, UPDATE, or DELETE statements that specify the full primary index. Data is retrieved from the client and sent as transaction rows to the Teradata Database, where it is immediately applied to the various target tables. Each TPump task can acquire data from one or many client files with similar or different layouts.
From each source record, one or more INSERT, UPDATE, or DELETE statements can be generated and directed to any target table. The following concepts may improve your understanding of TPump.

• The language of TPump commands and statements is used to describe the task you want to accomplish.
• TPump examines all commands and statements for a task, from the BEGIN LOAD command through the END LOAD command, before actually executing the task.
• After all commands and statements involved in a given task have been processed and validated by TPump, the TPump task is executed as described in this and subsequent chapters.
• Optionally, TPump supports data serialization for a given row, which guarantees that if a row insert is immediately followed by a row update, the insert is processed first. This is done by hashing records to a given session.
• TPump supports bulletproof restartability using time-based checkpoints. Frequent checkpoints make restarting easier, but at the expense of the checkpointing overhead.
• TPump supports upsert logic similar to MultiLoad.
• TPump supports insert/update/delete statements in multiple-record requests.
• TPump uses macros to minimize network overhead. Before TPump begins a load, it sends the statements to the Teradata Database to create equivalent macros for every insert/update/delete statement used in the job script. The execute-macro requests, rather than lengthy text requests, are then executed iteratively during a job run.
• TPump supports interpretive, record-manipulating, and restarting features similar to MultiLoad.
• TPump supports conditional apply logic, similar to MultiLoad.
• TPump supports error treatment options, similar to MultiLoad.
• TPump runs as a single process.
• TPump supports Teradata Database internationalization features such as kanji character sets.
• Up to 600 operations can be packed into a single request for network efficiency. The limit of 600 may vary, as the overall limit for a request is one megabyte. TPump assumes that every statement is a one- or two- (for fallback) step request.

How it Works

TPump is a Teradata utility with functions similar to the MultiLoad utility. MultiLoad edits Teradata tables by processing inserts, updates, and deletes, and so does TPump. This section provides insight into the important differences between MultiLoad and TPump. All of the information in this section is discussed in further detail later in this document, either explicitly or by implication.

Methods of Operation

MultiLoad performs an update on the Teradata Database in phases. During the first phase of operation, MultiLoad uses a special database and CLIv2 protocol for efficiently sending "large" (64 KB) data messages to the RDBMS. The data is stored in a temporary table. During the second phase of operation, the temporary table is sorted and then changes from it are "applied" to the various target tables. In this phase, processing is entirely in the RDBMS, and the MultiLoad application on the client waits to see whether the job completes successfully.

TPump performs updates on the Teradata Database in a synchronous manner. Changes are sent in conventional CLIv2 parcels and applied immediately to the target table(s). To improve its efficiency, TPump builds multiple statement requests and provides the serialize option to help reduce locking overhead.

Economy of Scale and Performance

MultiLoad performance improves as the volume of changes increases. This is because, in phase two of MultiLoad, the changes are applied to the target table(s) in a single pass and all changes for any physical data block are effected using one read and one write of the block. Furthermore, the temporary table and the sorting process used by MultiLoad are additional overheads that must be "amortized" through the volume of changes.
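The break-even arithmetic behind the "Economies of Scale" example earlier in this chapter (more than one affected row per data block) can be reproduced with a quick sketch. This is back-of-the-envelope Python, not TPump or MultiLoad code; the function name is invented, and 16 KB is taken as 16,000 bytes to match the figures in that example:

```python
def multiload_breakeven_rows(table_rows, row_bytes, block_bytes):
    """Rows that must be touched so that, on average, more than one
    affected row falls in each data block (one read + one write per block)."""
    rows_per_block = block_bytes // row_bytes   # whole rows packed into one block
    return table_rows // rows_per_block         # one touched row per block

# Figures from the example: 2 billion 65-byte rows in 16 KB (16,000-byte) blocks.
# Result is roughly 8.1 million rows, i.e. about 0.4% of the table.
print(multiload_breakeven_rows(2_000_000_000, 65, 16_000))
```

Below that volume of changes, the per-block read/write and sort overheads dominate, which is the region where TPump is the better fit.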
TPump, on the other hand, does better on relatively low volumes of changes because there is no temporary-table overhead. TPump becomes expensive for large volumes of data because multiple updates to a physical data block will most likely result in multiple reads and writes of the block.

Multiple Statement Requests

The most important technique used by TPump to improve performance over MultiLoad is the multiple statement request. Placing more statements in a single request is beneficial for two reasons. First, it reduces network overhead because large messages are more efficient than small ones. Second, in ROBUST mode, it reduces TPump recovery overhead, which amounts to one extra database row written for each request. TPump automatically packs multiple statements into a request based upon the PACK specification in the BEGIN LOAD command.

Macro Creation

TPump uses macros to efficiently modify tables, rather than using the actual DML commands. The technique of changing statements into equivalent macros before beginning the job greatly improves performance. Specifically, the benefits of using macros are:

1 The size of network (and channel) messages sent to the RDBMS by TPump is reduced.

2 RDBMS parsing engine overhead is reduced because the execution plans (or "steps") for macros are cached and reused. This eliminates "normal" parser handling, where each request sent by TPump is planned and optimized.

Because the space required by macros is negligible, the only issue regarding the macros is where they are placed in the RDBMS. The macros are put into the database that contains the restart log table or the database specified using the MACRODB keyword in the BEGIN LOAD command.

Locking and Transactional Logic

In contrast to MultiLoad, TPump uses conventional row-hash locking, which allows for some amount of concurrent read and write access to its target tables.
At any point TPump can be stopped and the target tables are fully accessible. Note, however, that if TPump is stopped, depending on the nature of the update process, the "relational" integrity of the data may be impaired. This differs from MultiLoad, which operates as a single logical update to one or more target tables. Once MultiLoad goes into phase two of its logic, the job is "essentially" irreversible and the (entire set of) table(s) is locked for write access until it completes. If TPump operates on rows that have associated "triggers", the triggers are invoked as necessary.

Recovery Logic and Overhead

TPump, in "ROBUST mode", writes one database row in the restart log table for every request that it issues. This collection of rows in the restart log table can be referred to as the request log. Because a request is guaranteed by the RDBMS to either completely finish or completely roll back, the request log will always accurately reflect the completion status of a TPump import. Thus, the request-log overhead for restart logic decreases as the number of statements packed per request increases.

TPump also allows you to specify a checkpoint interval. During the checkpoint process, TPump flushes all pending changes from the import file to the database and also cleans out the request log. The larger the checkpoint interval, the larger the request log (and its table) is going to grow. Upon an unexpected restart, TPump scans the import data source along with the request log in order to re-execute the statements not found in the request log.

TPump in "SIMPLE (non-ROBUST) mode" provides basic checkpoints. If a restart occurs between checkpoints, some requests will likely be reprocessed. This is adequate protection under some circumstances.

During phase one, MultiLoad uses checkpoints so that restarts do not force the job to always restart from the beginning.
During phase two, MultiLoad uses its temporary table as a repository of all changes to be applied, and the RDBMS process of applying the changes guarantees that no changes are missed or applied more than once.

Serialization of Changes

In certain uses of TPump or MultiLoad it is possible to have multiple changes to one row in the same job. For instance, the row may be inserted and then updated during the batch job, or it may be updated and then deleted. In any case, the correct ordering of these operations is obviously very important. MultiLoad automatically guarantees that this ordering of operations is maintained correctly. By using the serialization feature, TPump can also guarantee that this ordering is maintained correctly, but it requires a small amount of scripting work and a small amount of utility overhead.

The use of the serialize option on the BEGIN LOAD command guarantees that TPump will send each change for a data record of a given key in order. The KEY modifier to the FIELD command is how a script specifies that a given field is to be part of the serialization key. The intent of this feature is to allow you to specify the key corresponding to the primary index of the target table. In fact, the TABLE command automatically qualifies the generated fields with the KEY modifier when the fields are part of the primary index of the table. If the DML statements in the TPump script specify more than one target table, it is up to the script author to make sure that the primary indices of all the tables match when using the serialization feature.

The serialization feature works by hashing each data record based upon its key to determine which session transmits the record to the RDBMS. Thus the extra overhead in the application is derived from the mathematical operation of hashing and from the extra amount of buffering necessary to save data rows when a request is already pending on the session chosen for transmission.
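The hash-to-session scheme just described can be sketched in a few lines. This is an illustration of the idea only, not TPump's actual row-hash algorithm; the hash function, key values, and session count here are all invustrated stand-ins chosen for the example:

```python
import hashlib

def session_for_key(key: str, num_sessions: int) -> int:
    """Map a serialization key to a session index. Every record with the
    same key hashes to the same session, so its changes are transmitted
    on one session in submission order."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_sessions

# An INSERT followed by an UPDATE of the same primary-index value lands
# on the same session, so the two operations cannot be reordered.
ops = [("emp1001", "INSERT"), ("emp2002", "INSERT"), ("emp1001", "UPDATE")]
for key, op in ops:
    print(key, op, "-> session", session_for_key(key, 4))
```

Because a key always maps to the same session, per-key ordering is preserved without any coordination between sessions, which is exactly what makes the extra cost only the hash computation plus buffering when the chosen session is busy.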
The serialization feature greatly reduces the potential frequency of RDBMS deadlock. Deadlocks can occur when requests from the application happen to affect row(s) that use the same hash code within the RDBMS. Although deadlocks are handled correctly by the RDBMS and by TPump, the resolution process is time-consuming and adds additional overhead to the application, because it must re-execute requests that roll back due to deadlock.

In addition to using SERIALIZEON in the BEGIN LOAD command, the SERIALIZEON keyword can also be specified in the DML command. This lets you turn serialization on for the fields you specify. For more information on the DML-based serialization feature, refer to "DML" on page 113.

Dual Database Strategy

The serialization feature is intended to support a variety of other potential customer applications that go under the general heading dual database. These are applications that in some way take a "live feed" of inserts, updates, and deletes from another database and apply them without any preprocessing to a Teradata Database. Both TPump and MultiLoad are potential parts of the dual database strategy. A dual database application will generate a DML stream which will be routed to TPump or MultiLoad through a paramod/inmod specific to the application. The choice between TPump and MultiLoad will depend on such things as the volume of data (with higher volumes favoring MultiLoad) and the concurrent access requirements (with greater access requirements favoring TPump).

Resource Usage and Limitations

A feature unique to TPump is the ability to constrain run-time resource usage through the statement rate feature. TPump gives you control over the rate per minute at which statements are sent to the RDBMS, and the statement rate correlates directly to resource usage on both the client and in the RDBMS.
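The statement-rate idea can be approximated as simple pacing: given a rate in statements per minute, delay each request so the average never exceeds it. This is an illustrative sketch of the concept, not TPump's implementation; the class and method names are invented:

```python
import time

class RateGovernor:
    """Pace submissions to at most rate_per_minute statements per minute.
    The attribute can be changed while running, mirroring the way TPump's
    statement rate can be adjusted during a job."""
    def __init__(self, rate_per_minute: int):
        self.rate_per_minute = rate_per_minute

    def pause_for(self, statements_in_request: int) -> float:
        # Seconds this request "costs" at the current rate; sleep that long
        # before sending so the long-run average stays at or below the rate.
        delay = 60.0 * statements_in_request / self.rate_per_minute
        time.sleep(delay)
        return delay

gov = RateGovernor(rate_per_minute=1200)
print(gov.pause_for(20))  # a PACKed request of 20 statements waits 1.0 second
```

Note how packing interacts with the rate: at a fixed statement rate, a larger PACK factor means fewer, larger requests, so the network overhead per statement drops while the statement rate stays the same.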
The statement rate can be controlled in two ways: dynamically while the job is running, or scripted into the job with the RATE keyword on the BEGIN LOAD command. Dynamic control over the statement rate is provided by updates to a table on the RDBMS.

In contrast with TPump, MultiLoad always uses CPU and memory very efficiently. During phase one (assuming that the RDBMS is not a bottleneck), MultiLoad will probably bottleneck on the client, consuming significant network or channel resources. During phase two, MultiLoad uses very significant RDBMS disk, CPU, and memory resources. In fact, the RDBMS limits the number of concurrent MultiLoad, FastLoad, and FastExport jobs for the very reason that they are so resource-intensive. TPump has no such RDBMS-imposed limitation.

Warning: Although there is no RDBMS-imposed limitation on the number of concurrent TPump jobs, an excessive number of small jobs causes contention on the Teradata Database system catalogue. The limit will vary from one installation to another, and each installation should determine its own capacity for running a multiplicity of TPump jobs to avoid potential deadlocks.

Operating Features and Capabilities

The following section describes the operating modes; input data formats; and client, Unicode, and site-defined character sets for TPump. For specific information on supported operating systems, refer to Teradata Tools and Utilities 12.00.00 Supported and Certified Versions, B035-3119-067K. This spreadsheet shows version numbers and platform information for all Teradata Tools and Utilities release 12.00.00 products and is available at www.info.teradata.com.

Operating Modes

TPump runs in the following operating modes:

• Interactive – Interactive processing involves the more or less continuous participation of the user.
• Batch – Batch programs process data in discrete groups of previously scheduled operations, typically in a separate operation, rather than interactively or in real time.
Input Data Formats

TPump supports the input data formats on UNIX and Windows platforms listed in Table 1. Mainframes have standard records.

Table 1: TPump Data Formats

Data Format  Description

BINARY       Specifies that each input record is a 2-byte integer, n, followed by n bytes of data.

FASTLOAD     Specifies that each input record is a 2-byte integer, n, followed by n bytes of data, followed by an end-of-record marker, either X'0A' or X'0D'.

TEXT         Specifies that each input record is an arbitrary number of bytes followed by an end-of-record marker, which is a:
             • Linefeed (X'0A') on UNIX platforms.
             • Carriage-return/linefeed pair (X'0D0A') on Windows platforms.

UNFORMAT     Specifies that each input record is defined by FIELD commands of the specified layout.

VARTEXT      Specifies that each variable-length text record has each field separated by a delimiter character.

For a description of the supported input file formats, see the discussion of the FORMAT option for network-attached client systems in the IMPORT Command description in Chapter 3: "TPump Commands."

Client Character Sets

Standard Character Sets

The following standard character sets are supported by Teradata Database.

System Configuration  Name
Channel-Attached      EBCDIC
Network-Attached      ASCII

The terms ASCII and EBCDIC are often used in ambiguous ways, and this presents a difficulty for accented and non-Latin characters. Select a client character set that exactly matches the character set that the import data uses. If you use accented and non-Latin characters, do not use the ASCII or EBCDIC client character sets.
Instead, load and use one of the other Teradata-supplied character sets, or a site-defined character set that exactly matches the application character set, such as: EBCDIC037_0E for channel-attached clients (for the United States or Canada), LATIN1_0A or LATIN9_0A (for Western European languages), LATIN1252_0A for Western European Microsoft® Windows clients, or UTF8 for UNIX clients.

Japanese Character Sets

The following Japanese character sets are supported by Teradata Database.

System Configuration  Character Set Name
Channel-Attached      KATAKANAEBCDIC
                      KANJIEBCDIC5026_0I
                      KANJIEBCDIC5035_0I
Network-Attached      KANJIEUC_0U
                      KANJISJIS_0S

For more information on kanji character sets, refer to International Character Set Support.

Caution: TPump statements do not accept object names specified in internal RDBMS hexadecimal form and do not display object names in hexadecimal form.

Chinese and Korean Character Sets

Chinese and Korean character sets are available for channel- and network-attached client systems. The following table defines the Chinese character sets:

System Configuration  Name
Channel-Attached      SCHEBCDIC935_2IJ
                      TCHEBCDIC937_3IB
Network-Attached      SCHGB2312_1T0
                      TCHBIG5_1R0

The following table defines the Korean character sets:

System Configuration  Name
Channel-Attached      HANGULEBCDIC933_1II
Network-Attached      HANGULKSC5601_2R4

Rules for Using Chinese and Korean Character Sets

Certain rules apply when using Chinese and Korean character sets on channel- and network-attached platforms.

• Object Names

Since Teradata Database 12.0, multi-byte characters are supported in object names when the client session character set is UTF8 or UTF16.
For a list of valid and non-valid characters when multi-byte object names are used, see the appendix of International Character Set Support. If multi-byte characters are used in object names in a TPump script, they must be enclosed in double quotes.

• Maximum String Length

The Teradata Database requires two bytes to process each Chinese or Korean character. This limits both request size and record size. For example, if a record consists of one string, the length of that string is limited to a maximum of 32,000 characters or 64,000 bytes.

Data Conversion Capabilities

Teradata TPump can redefine the data type specification of numeric, character, and date input data so it matches the type specification of its destination column in the TPump table on the Teradata Database. For example, if an input field with numeric data is targeted for a column with a character data type specification, Teradata TPump can change the input data specification to character before inserting it into the table.

You use the datadesc specification of the Teradata TPump FIELD command to convert input data to a different type before inserting it into the TPump table on the Teradata Database. The types of data conversions you can specify are:

• Numeric-to-numeric (for example, integer-to-decimal)
• Character-to-numeric
• Character-to-date
• Date-to-character

Note: Redundant conversions, such as integer-to-integer, are legal and necessary to support the zoned decimal format. For more information about the zoned decimal format, data types, and data conversions, see SQL Reference: Data Types and Literals.

Checkpoints

Teradata TPump supports the use of checkpoints. Checkpoints are entries posted to a restart log table at regular intervals during the TPump data transfer operation. If processing stops while a TPump job is running, you can restart the job at the most recent checkpoint.
For example, assume you are loading 1,000,000 records into a table and have specified checkpoints every 50,000 records. Teradata TPump then pauses and posts an entry to the restart log table whenever a multiple of 50,000 records has been successfully sent to the Teradata Database. If the job stops after record 60,000 has been loaded, you can restart the job at the record immediately following the last checkpoint: record 50,001. You enable the checkpoint function by specifying a checkpoint value in the BEGIN LOAD command. For more information, see "BEGIN LOAD" on page 95.

Unicode Character Sets

UTF8 and UTF16 are two of the standard ways of encoding Unicode character data. Teradata Database supports the UTF8 and UTF16 client character sets. The UTF8 client character set supports UTF8 encoding; currently, Teradata Database supports UTF8 characters that consist of one to three bytes. The UTF16 client character set supports UTF16 encoding; currently, Teradata Database supports the Unicode 2.1 standard, where each defined character requires exactly 16 bits. There are restrictions imposed by Teradata Database on using the UTF8 or UTF16 character set. Refer to the Teradata Database International Character Set Support manual for restriction details.

UTF8 Character Sets

TPump supports the UTF8 character set on network-attached platforms and IBM MVS. When using the UTF8 character set on a network-attached platform, predefined macros should be used. This ensures that the import data is translated into the server character set defined in the predefined macro, with Teradata Database applying the import data to the macro during the load. When predefined macros are not supplied, TPump creates macros according to the user's default server character set (defined when the user account is created), which may lead to character translation errors if the user's default server character set is not Unicode.
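The one-to-three-byte property of the supported UTF8 range mentioned above can be seen directly in any UTF-8-capable language; this Python snippet is illustrative only:

```python
# UTF-8 byte lengths: ASCII characters encode in 1 byte, accented Latin
# characters in 2 bytes, and CJK characters (such as kanji) in 3 bytes --
# the one-to-three-byte range described above.
for ch in ("A", "é", "日"):
    print(repr(ch), "->", len(ch.encode("utf-8")), "byte(s)")
```

This is also why UTF8 column sizing differs from UTF16: the same character can need one to three bytes in UTF8 but always occupies 16 bits under the Unicode 2.1 UTF16 support described above.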
On IBM MVS, the job script must be in Teradata EBCDIC when using the UTF8 client character set. TPump translates commands in the job script from Teradata EBCDIC to UTF8 during the load. Be sure to examine the definitions in the International Character Set Support manual to determine the code points of any special characters you might require in the job script; different versions of EBCDIC do not always agree on the placement of these characters. Refer to the mappings between Teradata EBCDIC and Unicode in Appendix E of the International Character Set Support manual. Currently, the UTF8 Byte Order Mark (BOM) is not supported on the MVS platform when using access modules or data files.

See Chapter 3 for complete information on TPump commands. Refer to the nullexpr, fieldexpr, VARTEXT format delimiter, WHERE condition, and CONTINUEIF condition parameters for additional information on using the UTF8 client character set on the mainframe.

UTF16 Character Sets

TPump supports the UTF16 character set on network-attached platforms. In general, the command language and the job output should be in the same character set as the client character set used by the job. However, for the user's convenience, and because of the special properties of Unicode, the command language and the job output are not required to be in the client character set when the UTF16 character set is used. In that case, the job script and the job output can be in either the UTF8 or UTF16 character set. This is controlled by specifying the runtime parameters "-i" and "-u" when the job is invoked. For more information on the "-i" and "-u" runtime parameters, see "-i scriptencoding" and "-u outputencoding" on page 46.
Also refer to “fieldexpr” on page 132, “nullexpr” on page 131, “WHERE condition” on page 109, and “CONTINUEIF condition” on page 158 for additional information on using the UTF16 client character set.

Client Character Set/Client Type Compatibility

Use the following table as a general guideline for choosing client character sets that may work better for your client environment.

If the Client Type is Channel-attached, the client character sets that may work best are:
• EBCDIC
• EBCDIC037_0E
• KANJIEBCDIC5026_0I
• KANJIEBCDIC5035_0I
• KATAKANAEBCDIC
• SCHEBCDIC935_2IJ
• TCHEBCDIC937_3IB
• HANGULEBCDIC933_1II
• UTF8

If the Client Type is Network-attached running UNIX, the client character sets that may work best are:
• ASCII
• KANJIEUC_0U
• LATIN1_0A
• LATIN9_0A
• UTF8
• UTF16
• SCHGB2312_1T0
• TCHBIG5_1R0
• HANGULKSC5601_2R4

If the Client Type is Network-attached running Windows, the client character sets that may work best are:
• ASCII
• KANJISJIS_0S
• LATIN1252_0A
• UTF8
• UTF16
• SCHGB2312_1T0
• TCHBIG5_1R0
• HANGULKSC5601_2R4

Note: TPump supports the UTF8 client character set on IBM MVS but not on IBM VM on channel-attached platforms.

Site-Defined Character Sets

When the character sets defined by Teradata Database are not appropriate for your site, you can define your own character sets. Refer to Teradata Database International Character Set Support for information on defining your own character set.

TPump Commands

TPump accepts both TPump commands and a subset of Teradata SQL statements. These are described in the following sections.

TPump Command Input

TPump commands perform two types of activities. The following table describes those activities and functions.

Support: Support commands establish the TPump sessions with the Teradata Database and establish the operational support environment for TPump.
Support commands are not directly involved in specifying a TPump task.

Task: The TPump task commands specify the actual processing that takes place for each TPump task. The task commands, combined with Teradata SQL INSERT, UPDATE, and DELETE statements, are used to define TPump IMPORT and DELETE tasks.

The TPump commands that perform the support and task activities are listed in Table 2.

Table 2: TPump Commands

Support commands:
• ACCEPT: Allows the value of one or more utility variables to be accepted from a file
• DATEFORM: Lets you define the form of the DATE data type specifications for the TPump job
• DISPLAY: Writes messages to the specified destination
• ELSE: Followed by commands and statements that execute when the preceding IF command is false
• ENDIF: Delimits the group of TPump commands and statements that were subject to the previous IF or ELSE commands, or both
• IF: When followed by a conditional expression, initiates execution of subsequent commands and statements
• LOGDATA: Supplies parameters to the LOGMECH command beyond those needed by the logon mechanism, such as user ID and password, to successfully authenticate the user
• LOGMECH: Identifies the appropriate logon mechanism by name
• LOGOFF: Disconnects all active sessions and terminates TPump support on the client
• LOGON: Specifies the LOGON string to be used in connecting all sessions established by TPump
• LOGTABLE: Identifies the table to be used for journaling the checkpoint information required for safe, automatic restart of the TPump support environment in the event of a client or Teradata Database hardware platform failure
• NAME: Sets the variable SYSJOBNAME to the specified jobname string. The jobname string can be up to 16 bytes in length and can contain kanji characters
• ROUTE: Identifies the destination of output produced by the TPump support environment
• RUN FILE: Invokes the specified external source as the current source of commands and statements
• SET: Assigns a data type and a value to a utility variable
• SYSTEM: Suspends TPump to issue commands to the local operating system

Task commands:
• BEGIN LOAD: Specifies the kind of TPump task to be executed, the target tables to be used, and the parameters for executing the task
• FIELD: Defines a field of the data source record. Used with the LAYOUT command
• DML: Defines a label and error treatment option(s) for the Teradata SQL DML statement(s) following the DML command
• END LOAD: Indicates completion of the TPump command entries and initiates execution of the task
• FILLER: Defines a field in the data source that is not sent to the Teradata Database. Used with the LAYOUT command
• IMPORT: Identifies the data source, the layout, and the DML operation(s) to be performed, with optional conditions for performing these operations
• LAYOUT: Introduces the record format of the data source to be used in the TPump task. This command is followed by a sequence or combination of FIELD, FILLER, and TABLE commands
• PARTITION: Establishes session partitions to transfer SQL requests to the Teradata Database
• TABLE: Identifies a table whose column names and data descriptions are used as the field names and data descriptions of the data source records. Used with the LAYOUT command

Teradata SQL Statements

Teradata SQL statements define and manipulate the data stored in the Teradata Database. TPump supports a subset of Teradata SQL statements so you do not need to invoke other utilities to perform routine database maintenance functions before executing TPump utility tasks.
You can, for example, use the supported Teradata SQL statements to:
• Create the table that you want to load
• Establish a database as an explicit table name qualifier
• Add checkpoint specifications to a journal table

The Teradata SQL statements supported by TPump are summarized in Table 3. TPump supports only the Teradata SQL statements listed in this table. To use any other Teradata SQL statements, you must enter them from another application, such as BTEQ. The subset of Teradata SQL supported by the TPump support environment excludes user-generated transactions (BEGIN TRANSACTION; END TRANSACTION;).

Table 3: Supported Teradata SQL Statements in TPump

• ALTER TABLE: Changes the column configuration or options of an existing table
• CHECKPOINT: Adds a checkpoint entry to a journal table
• COLLECT STATISTICS: Collects statistical data for one or more columns of a table
• COMMENT: Stores or retrieves a comment string associated with a database object
• CREATE DATABASE, CREATE MACRO, CREATE TABLE, CREATE VIEW: Creates a new database, macro, table, or view
• DATABASE: Specifies a new default database for the current session
• DELETE: Removes rows from a table
• DELETE DATABASE: Removes all tables, views, and macros from a database
• DROP DATABASE: Removes an empty database from the Teradata Database
• EXECUTE: Specifies a user-created (predefined) macro for execution. The macro named in this statement resides in the Teradata Database and specifies the type of DML statement (INSERT, UPDATE, DELETE, or UPSERT) being handled by the macro
• GIVE: Transfers ownership of a database to another user
• GRANT: Grants access privileges to a database object
• INSERT: Inserts new rows into a table
• MODIFY DATABASE: Changes the options of an existing database
• RENAME: Changes the name of an existing table, view, or macro
• REPLACE MACRO, REPLACE VIEW: Redefines an existing macro or view
• REVOKE: Rescinds access privileges to a database object
• SET QUERY_BAND: Sets the query band for a session and transaction. Note: The statement can be used in two ways: SET QUERY_BAND = 'Document=XY1234; Universe=East;' FOR SESSION; or SET QUERY_BAND = NONE FOR SESSION;
• SET SESSION COLLATION: Overrides the collation specification for the current session
• SET SESSION OVERRIDE REPLICATION ON/OFF: Turns the replication service on or off
• UPDATE (statement and atomic upsert): Changes the column values of an existing row in a table

TPump supports statements starting with anything in the above list only in the sense that it submits them to the Teradata Database and deals with the success, failure, or error response. TPump rejects as unsupported any statement beginning with anything not in the above list and does not submit it to the Teradata Database. While restarting, only DATABASE and SET statements are reexecuted. The existence of a log table causes TPump on the client to execute its restart logic.

Note that, although SET is in the list, the only SET statements truly supported are the Teradata SQL SET statements SET SESSION COLLATION and SET SESSION DATABASE. Any other SET statement passed through to the Teradata Database is rejected. Teradata SQL statements from the input command file are sent to the Teradata Database for execution via CLIv2. Pertinent information returned in SUCCESS, FAILURE, or ERROR parcels is listed in the message destination.

Caution: Do not issue a DELETE DATABASE statement to delete the database containing the restart log table because this terminates the TPump job.
See “Reinitializing a TPump Job” on page 55 for restart instructions if the restart log table is accidentally dropped.

Support environment statements may be executed between invocations of TPump tasks. These include DATABASE, CHECKPOINT, and CREATE TABLE statements. The BEGIN LOAD command then starts a TPump task script. You can direct the action of TPump with commands and DML statements retrieved from an external source. The source of these commands and statements may be specified in the TPump RUN FILE command, if one is used. The TPump support environment parses lines that begin with a period as commands. The period distinguishes commands from Teradata SQL statements, which are passed to the Teradata Database without parsing. More than one statement per line is not allowed, but statements can span multiple lines. TPump follows the same rules as standard Teradata SQL for operations on NULL. Refer to Teradata Database SQL Reference: Fundamentals for more information about using Teradata SQL statements.

The TPump Task

The TPump task is designed for the batch application of client data to one or more tables on the Teradata Database via DML commands and statements (INSERT, UPDATE, or DELETE). TPump executes these DML statements in multiple-record requests. The following topics provide more information about the TPump task:
• Task Limits
• DML Commands
• Upsert Feature
• TPump Macros
• Locks
• Access Rights
• Fallback vs. Nonfallback Tables

Task Limits

TPump supports only single-row, primary index operations. Up to 600 of these operations can be packed into a single request for network efficiency. The 600-statement upper limit is arbitrary; the practical limit may be lower for statements associated with large data parcels that might exceed the overall 64 KB limit for a request, or where a statement itself is very long.
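The packing factor is set with the PACK option of BEGIN LOAD. The following fragment is a sketch only; the session counts, packing value, and table names are illustrative, not recommendations:

```
.BEGIN LOAD
   SESSIONS 4
   PACK 20              /* 20 statements per request; chosen well below  */
                        /* the 600-statement ceiling because wide rows   */
                        /* could push a packed request past 64 KB        */
   ERRORTABLE Wide_err;
```

A lower PACK value trades some network efficiency for smaller requests; a higher value packs more single-row operations per request, subject to the 64 KB limit described above.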
DML Commands

DML commands appear with their associated INSERT, UPDATE, or DELETE DML statements, together with the IMPORT commands that identify the data to be read from the client. TPump DML statements support a conditional apply logic, similar to MultiLoad, in which DML statements are applied based on record field contents. The DML statements following a DML command apply data from one or more separate data sources. The data sources contain a record for each table row to which one or more statements apply. Each IMPORT command identifies a separate data source and references LAYOUT and DML commands. The IMPORT command matches records of the data source to the applicable DML statement or statements by means of its APPLY clauses. The LAYOUT command defines the layout of the records of a data source, using its parameters and a sequence of FIELD, FILLER, and TABLE commands. The DML command identifies an immediately following set of one or more DML statements. Each DML statement is converted into a macro and used for the duration of the import. As TPump reaches the end of one data source, as identified by the IMPORT command, it continues with the next IMPORT command.

Upsert Feature

TPump’s upsert feature is a composite of UPDATE and INSERT functionality applied to a single row. TPump upsert logic is similar to that used in MultiLoad, the only other load utility with this feature. The DML statements required to execute each iteration of upsert are a single UPDATE statement, followed by a single INSERT statement. With upsert, if the UPDATE fails because the target row does not exist, TPump automatically executes the INSERT statement. This capability can save considerable loading time by completing the operation in a single pass instead of two.

TPump Macros

Before beginning a load, TPump creates equivalent macros on the RDBMS, based on the actual DML statements.
That is, for every INSERT, UPDATE, DELETE, and UPSERT statement in the DML, TPump creates an equivalent macro. These macros are then executed iteratively, in place of the actual DML statements, when an import task begins, and are removed when all import tasks are complete. The use of macros in place of lengthy requests helps to minimize network and parsing overhead.

For greater efficiency, TPump also supports the use of predefined macros, rather than creating macros from the actual DML statements. A predefined macro is created by the user and resides on the RDBMS before a TPump import task begins. When a predefined macro is used, TPump uses it directly instead of creating another macro. The use of predefined macros allows TPump to avoid the overhead of creating and dropping macros internally, and also to avoid modifying the data dictionary on the Teradata Database during the job run. TPump uses the EXECUTE command to support predefined macros. For more information on using predefined macros, refer to the EXECUTE command in this manual. For more information about creating a macro, see Teradata Database SQL Reference: Data Definition Statements. For more information about executing a macro, see Teradata Database SQL Reference: Data Manipulation Statements.

Locks

TPump uses conventional row hash locking, which allows some amount of concurrent read and write access to its target tables. At any point, TPump can be stopped, making the target tables fully accessible. If TPump is stopped, however, depending on the nature of the update process, the relational integrity of the data may be compromised. Although TPump always uses conventional row hash locking, a TPump job may introduce other levels of locking in a job run, depending on the nature of the SQL statements used in the job and the status of the target tables.
For example, if a target table of a TPump job has a trigger defined, and that trigger uses table-level locking when it fires, the TPump job may cause table-level locking during the run. The TPump script developer should be familiar with the properties of the database on which the TPump job will run and be aware of such possibilities.

Access Rights

TPump users must have access rights on the database containing the restart log table because TPump orchestrates the creation of macros to use during the task. Dropping the log table makes it impossible to restart a TPump job. Dropping the macros or the error table makes it very difficult to restart a TPump job. TPump does not place any special protections on the database objects it creates. Therefore, it is the responsibility of TPump administrators and users to ensure that access rights on databases used by TPump have been established. Most of the access rights for TPump are intuitive. For example:
• CREATE TABLE is required on the database where the log table is placed.
• CREATE TABLE is required on the database where the error table is placed.
• CREATE/DROP MACRO is required on the database where macros are placed.
• EXECUTE MACRO is required on the database where the macros are placed.

Macros

The use of macros slightly complicates the access rights for TPump. The remaining access rights necessary to run a TPump job involve two different scenarios:
1 Where a TPump macro is placed in the same database as the table it affects, the required rights are INSERT/UPDATE/DELETE on the affected table, corresponding to the DML executed.
2 Where a TPump macro is placed in a different database from the table it affects, the database where the macro is placed must have INSERT/UPDATE/DELETE, WITH GRANT, on the affected table, corresponding to the DML executed.
You must also have EXECUTE MACRO on the database where the macro is placed. Note that when the TPump job uses EXEC to directly specify a macro, the access rights scenarios are the same, except that you do not need the CREATE/DROP MACRO privilege, since the macro exists both before and after the job.

Tables

You must have the corresponding INSERT, UPDATE, or DELETE privilege for each table to be changed by the TPump task. Multiple tables can be targeted by a single TPump job. The BEGIN LOAD command invokes TPump to execute task processing. Any statement of the task applies each matching imported data record to each of its target table rows having the specified index value. TPump supports all table types. Unlike MultiLoad, there are no forbidden index types. Thus, the tables may be either empty or populated, and may be defined with or without secondary indexes. All required data is imported; none is obtained from tables already existing in the Teradata Database. No statement of an IMPORT task may reference a table or row other than the one affected by the statement. All INSERT statements, when considered in conjunction with each applicable imported record, must explicitly specify values for all columns except those for which a default value (including null) is defined. All UPDATE and DELETE statements, when considered in conjunction with each applicable imported record, must explicitly specify values for all columns of the primary index. To fulfill this requirement for UPDATE and DELETE statements, you must supply a series of ANDed terms of either form:

column_reference = colon_variable_reference

or

column_reference = constant

TPump does not process UPDATE and DELETE statements that contain ORed terms because TPump must hash the imported records with a value from the import file (or with a NULL). Any attempt to use an OR with these statements causes TPump to fail.
You can work around this by creating two separate DML statements and applying them conditionally. TPump imposes some restrictions on the updates of an IMPORT task. It rejects updates that try to change the value of the primary index of a row, but accepts even reflexive updates of other columns. A reflexive update of a column computes the new value as the result of an expression that involves the current value of one or more columns.

TPump processes and validates all statements from the BEGIN LOAD through the END LOAD statements. TPump control and processing sessions are established, and Teradata SQL requests are transmitted to the Teradata Database. TPump creates a single error table and a set of macros, one for each DML statement. Nothing protects the target tables from concurrent access. TPump imports data, evaluating each record according to the specified apply conditions. For each satisfied apply condition, a record is sent to the Teradata Database. If a record causes an error, its sequence number is available in the error table so that the record can be identified. When the task completes, all locks are released, all macros are dropped, and the error table, if empty, is dropped. Statistics concerning the outcome of the IMPORT task are reported.

Access logging can cause a severe performance penalty. If all successful table updates are logged, a log entry is made for each operation. The primary index of the access logging table may then create the possibility of row hash conflicts.

Fallback vs. Nonfallback Tables

Target tables can be either fallback or nonfallback. The differences between, and characteristics of, these tables are listed in Table 4.

Table 4: Comparison of Fallback and Nonfallback Target Tables

Fallback tables: The TPump task continues to execute even if AMPs are down, as long as no more than one AMP is down, either logically or physically, in a cluster.
Nonfallback tables: If one or more AMPs are down prior to entering the task, and one or more target tables are nonfallback, TPump terminates.

Fallback tables: If two or more AMPs in a cluster are logically or physically down, or both, the task does not run, or terminates if running. The TPump task may be restarted as soon as all AMPs are back up.
Nonfallback tables: During the task, if AMPs are down to the extent that data on the DSU is corrupted, the affected tables must be restored. If an AMP goes down once the task has started, the task cannot be restarted until all AMPs are back up.

Fallback tables: Not applicable.
Nonfallback tables: If more than one AMP in the same cluster is down, the Teradata Database cannot come up.

Fallback tables: Not applicable.
Nonfallback tables: Certain I/O errors during the task corrupt the target table so that it must be restored. In this case, TPump terminates.

CHAPTER 2 Using TPump

This chapter provides detailed information about using the TPump utility. Topics include:
• Invoking TPump
• Terminating TPump
• Restarting and Recovery
• Programming Considerations
• Writing a TPump Job Script
• Viewing TPump Output
• Monitoring TPump Jobs
• Estimating Space Requirements

Invoking TPump

TPump Support Environment

This section describes the TPump functions that are invoked from the TPump support environment on the client system. The TPump support environment is a platform from which TPump and a number of standard Teradata SQL, DDL, and DML operations can be performed. This client program includes a facility for executing Teradata SQL and is separate from the TPump tasks that run in the Teradata Database. TPump support environment functionality includes:
• Providing access to the data manipulation and data definition functions of the Teradata SQL language
• User-defined variables and variable substitution
• System-defined variables (for example, date and time)
• Conditional execution based on the value of return codes and variables
• Expression evaluation
• Redirection of command input
• Runtime parameters for TPump invocation, foreign language support, and error logging functions
• Character set selection options for IBM mainframe and UNIX client-based systems

The TPump support environment allows you to prepare for an initial invocation or resumption of a TPump task without having to invoke multiple distinct utilities. For example, you may need to create the table that is to be loaded, establish a database as an implicit table name qualifier, or checkpoint the relevant permanent journal. Any statement not preceded by a period (.) is assumed to be a Teradata SQL statement and is sent to the Teradata Database to be processed. An object name in a Teradata SQL statement may contain Katakana or multibyte characters when the appropriate character set is used. The TPump support environment interprets the commands and statements that define the job. It also controls the execution of those commands and manages recovery from Teradata Database or client failures during processing. Commands not directly involved in defining a TPump task, but providing supportive functions (routing output, for example), are considered TPump support commands. These are individually executed as they are encountered. The commands that define a single TPump task are processed by the client as a single unit. These are considered TPump task commands. The actual execution of the TPump task is deferred until all pertinent task commands have been considered and validated by the client program.
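The variable and conditional-execution features listed above can be combined in a script. The following fragment is an illustrative sketch only: the variable names and file name are hypothetical, and the exact ACCEPT, SET, IF, and DISPLAY syntax should be checked against the command descriptions in Chapter 3:

```
.ACCEPT loaddate FROM FILE params.txt;   /* utility variable from a file   */
.SET region TO 'EAST';                   /* utility variable set in-line   */
.IF '&region' = 'EAST' THEN;
   .DISPLAY 'Running eastern-region variant';
.ELSE;
   .DISPLAY 'Running default variant';
.ENDIF;
```

Each of these support commands is executed individually as it is encountered, before any TPump task begins.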
Support Environment Input/Output

Support environment I/O takes the following forms:
• Command and statement input (default = SYSIN/stdin)
• ACCEPT command input from a file
• Command and statement output (default = SYSPRINT/stdout)
• DISPLAY command output (default = SYSPRINT/stdout)
• Error output (default = SYSPRINT/stdout)

Note: For IBM statement input, the default is initially the user-provided invocation parameter (JCL PARM), if specified. After all commands and nested files in the parameter are processed, the default is SYSIN.

SYSPRINT/stdout Output

The characteristics of SYSPRINT output for VM/MVS and UNIX standard output (stdout) are:
• The first five positions of each output line are reserved. They contain a statement number if the line is the beginning of a TPump statement. This also applies to comments preceding TPump statements.
• If the output line is a TPump-generated message, the first five positions contain the string ****.
• In all other cases, the first five positions are blank.
• A message indicating the processing start date appears at the beginning of every job.
• TPump-generated messages are preceded by a header displaying the system time. This timestamp appears on the same line as the message and follows the **** string.

Example

This example depicts each type of SYSPRINT/stdout output line noted in the previous list.

**** 13:57:16 UTY6633 WARNING: No configuration file, using build defaults
========================================================================
=                                                                      =
=       Teradata Parallel Data Pump Utility  Release 12.00.00.00       =
=       Platform MP-RAS                                                =
=                                                                      =
========================================================================
=                                                                      =
=       Copyright 1997-2007, NCR Corporation.  ALL RIGHTS RESERVED.    =
=                                                                      =
========================================================================
**** 13:57:16 UTY2411 Processing start date: FRI MAY 04, 2007
========================================================================
=                                                                      =
=       Logon/Connection                                               =
=                                                                      =
========================================================================
0001 .LOGTABLE sfdlogtable;
0002 .LOGON 9/sfd,;
**** 13:57:19 UTY8400 Teradata Database Release: 12.00.00.00
**** 13:57:19 UTY8400 Teradata Database Version: 12.00.00.00
**** 13:57:19 UTY8400 Default character set: ASCII
**** 13:57:19 UTY8400 Maximum supported buffer size: 1M
**** 13:57:19 UTY8400 Upsert supported by RDBMS server
**** 13:57:24 UTY6211 A successful connect was made to the RDBMS.
**** 13:57:24 UTY6217 Logtable 'sfd.sfdlogtable' has been created.
========================================================================
=                                                                      =
=       Processing Control Statements                                  =
=                                                                      =
========================================================================
0003 /*****************************************************************/
     /* Test handling multiple TPump tasks.                           */
     /*****************************************************************/
     create table ImpX01A ( f1 char(1), f2 char(2), f3 char(3) );
**** 13:57:26 UTY1016 'CREATE' request successful.
0004 .begin LOAD sessions 4 1 pack 10 robust off serialize off
     checkpoint 30 nomonitor errortable ImpX01A_errtbl;
========================================================================
=                                                                      =
=       Processing TPump Statements                                    =
=                                                                      =
========================================================================
0005 .layout Lay1;
0006 .field f1 * char(1);
0007 .field f2 * char(2);
0008 .field f3 * char(3);
0009 .dml label dml1;
0010 insert ImpX01A (f1, f2, f3) values ( :f1, :f2, :f3);
0011 .import infile dat01 layout lay1 apply dml1;
0012 .end LOAD;
**** 13:57:27 UTY6609 Starting to log on sessions...
**** 13:57:27 UTY6610 Logged on 4 sessions.
========================================================================
=                                                                      =
=       TPump Import(s) Beginning                                      =
=                                                                      =
========================================================================
**** 13:57:27 UTY6630 Options in effect for following TPump Import(s):
     .  Tenacity:      4 hour limit to successfully connect load sessions.
     .  Max Sessions:  4 session(s).
     .  Min Sessions:  1 session(s).
     .  Checkpoint:    30 minute(s).
     .  Errlimit:      No limit in effect.
     .  Restart Mode:  SIMPLE.
     .  Packing:       10 Statements per Request.
     .  Serialization: OFF.
     .  StartUp Rate:  UNLIMITED Statements per Minute.
**** 13:57:31 UTY6608 Import 1 begins.
**** 13:57:36 UTY6641 Since last chkpt., 200 recs. in, 200 stmts., 20 reqs
**** 13:57:36 UTY6647 Since last chkpt., avg. DBS wait time: 0.25
**** 13:57:36 UTY6612 Beginning final checkpoint...
**** 13:57:36 UTY6641 Since last chkpt., 200 recs. in, 200 stmts., 20 reqs
**** 13:57:36 UTY6647 Since last chkpt., avg. DBS wait time: 0.25
**** 13:57:36 UTY6607 Checkpoint Completes with 200 rows sent.
**** 13:57:36 UTY6642 Import 1 statements: 200, requests: 20
**** 13:57:36 UTY6643 Import 1 average statements per request: 10.00
**** 13:57:36 UTY6644 Import 1 average statements per record: 1.00
**** 13:57:36 UTY6645 Import 1 statements/session: avg. 50.00, min. 50.00, max. 50.00
**** 13:57:36 UTY6646 Import 1 requests/session: average 5.00, minimum 5.00, maximum 5.00
**** 13:57:36 UTY6648 Import 1 DBS wait time/session: avg. 1.25, min. 0.00, max. 3.00
**** 13:57:36 UTY6649 Import 1 DBS wait time/request: avg. 0.25, min. 0.00, max. 0.60
**** 13:57:36 UTY1803 Import processing statistics
     .                                      IMPORT 1     Total thus far
     .                                      =========    ==============
     Candidate records considered:........    200.......     200
     Apply conditions satisfied:..........    200.......     200
     Errors loggable to error table:......      0.......       0
     Candidate records rejected:..........      0.......
       0

** Statistics for Apply Label : DML1
Type  Database  Table or Macro Name  Activity
I     sdf       ImpX01A              200
**** 13:57:37 UTY0821 Error table sdf.ImpX01A_errtbl is EMPTY, dropping table.
========================================================================
=       Logoff/Disconnect                                              =
========================================================================
**** 13:57:45 UTY6216 The restart log table has been dropped.
**** 13:57:45 UTY6212 A successful disconnect was made from the RDBMS.
**** 13:57:45 UTY2410 Total processor time used = '0.270389 Seconds'
     .  Start : 13:57:16 - MON JUNE 25, 2007
     .  End   : 13:57:45 - MON JUNE 25, 2007
     .  Highest return code encountered = '0'.

File Requirements

Certain files are required for invoking TPump. In addition to the input data source, TPump accesses four different data sets/files or input/output devices:
• standard input: Provides the TPump commands and Teradata SQL statements that make up your TPump job
• standard output: Destination for TPump output responses and messages
• standard error: Destination for TPump errors
• configuration: Optional specification of TPump utility default values

When running TPump in interactive mode, the keyboard functions as the standard input device and the display screen is the standard output/error device. When running TPump in batch mode, you must specify a data set or file name for each of these functions. The method of doing this varies, depending on the configuration of your client system:
• On network-attached client systems, use the standard redirection mechanism (< infilename and > outfilename) to specify the TPump files when you invoke the utility.
• On channel-attached client systems, use standard VM EXEC or MVS JCL control statements (FILEDEF and DD) to allocate and create the TPump data sets or files before you invoke the utility.
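On MVS, for example, the standard files might be allocated with DD statements along these lines. This is a sketch only: the step name, program name, load library, and script data set names are hypothetical and must be adapted to your installation's conventions:

```
//TPUMP    EXEC PGM=TPUMP                    <- program name per your install
//STEPLIB  DD DSN=TERADATA.APPLOAD,DISP=SHR  <- hypothetical load library
//SYSIN    DD DSN=MY.TPUMP.SCRIPT,DISP=SHR   <- job script (standard input)
//SYSPRINT DD SYSOUT=*                       <- output and error messages
```

The SYSIN and SYSPRINT DD names correspond to the standard input and standard output defaults described above.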
On IBM Mainframe Client-Based Systems

Start TPump with optional invocation parameters, such as the JCL PARM string. These are interpreted as a string of TPump support environment commands, separated by, and ending with, semicolons.

After invocation, the first two commands executed must be LOGON and LOGTABLE. These commands are required and are permitted only once. Either can be supplied in the command string invoking TPump, and the other (or both) can appear in the INPUT file, or in a file called with the RUN FILE command. No commands can precede the LOGON, LOGTABLE, or RUN FILE commands.

If you do not use a RUN FILE command to specify an initial source of commands and Teradata SQL statements, TPump defaults to the conventional source of control input, such as SYSIN. If a RUN FILE command is found in the parameter (PARM) input, the input source it identifies is used prior to SYSIN. Whether the input source is specified by RUN FILE or by SYSIN, processing continues until a LOGOFF command, the end of control input, or a terminating error is encountered, whichever occurs first. If all input is exhausted without encountering a LOGOFF command, or if the program terminates because of an error, TPump automatically performs the LOGOFF function.

The LOGON command establishes a Teradata SQL session that TPump uses for processing. The LOGTABLE command specifies a table to be used as the restart log in the event of failure. This table is placed in the default database unless otherwise specified. You must have CREATE TABLE, INSERT, UPDATE, and SELECT rights on the database containing the restart log table.

Preparatory statements, which are processed by the Teradata SQL-processing function of TPump, must be executed before beginning a TPump task. It is here that any desired DATABASE statement and any desired CREATE TABLE statements are specified.
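Put together, the opening of a typical TPump job script follows this shape. This is a minimal sketch; the table, database, and logon names below are placeholders, not values from this manual:

```
.LOGTABLE Restartlog;                        /* restart log table            */
.LOGON tdpid/userid,password;                /* establishes the SQL session  */
DATABASE Sales;                              /* optional preparatory SQL     */
CREATE TABLE Trans (Acct INTEGER, Amt DECIMAL(10,2));
```

LOGON and LOGTABLE may appear in either order, but both must come before any other command; the DATABASE and CREATE TABLE statements are the preparatory Teradata SQL statements described above.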
At this point, a BEGIN LOAD command initiates a TPump task.

On UNIX- and Windows-based Systems

Start the TPump utility on Teradata Database for UNIX and Windows with a UNIX-style command format. The rules for invoking TPump under UNIX are the same as for IBM mainframe client-based systems described in the preceding section; the difference lies in the UNIX syntax. The Windows syntax and the UNIX syntax are essentially the same, the main difference being that single quotes should be used on UNIX systems and double quotes should be used on Windows systems.

In Interactive Mode

To invoke TPump in interactive mode, enter tpump at your system command prompt:

tpump

TPump displays the following message to begin your interactive session:

================================================================
=                                                              =
=   Teradata Parallel Data Pump Utility Release mm.mm.mm.mmm   =
=   Platform xxxxx                                             =
=                                                              =
================================================================

where
• mm.mm.mm.mmm is the release level of your TPump utility software and
• xxxxx is the platform on which the TPump utility software is running.

In Batch Mode

This section covers invoking TPump in batch mode on network-attached and channel-attached systems.
For a description of how to read the syntax diagrams used in this book, see Appendix A: “How to Read Syntax Diagrams.”

In Batch Mode on Network-attached UNIX Systems

Refer to the runtime parameter descriptions in Table 5 on page 45 and use the following syntax to invoke TPump on network-attached UNIX client systems:

tpump -b -c charactersetname -C filename -d periodicityvalue
      -e errorfilename -f numberofbuffers -m -r 'tpump command'
      -v -y -i scriptencoding -u outputencoding -t nn -V
      < infilename > outfilename

In Batch Mode on Network-attached Windows Systems

Refer to the runtime parameter descriptions in Table 5 on page 45 and use the following syntax to invoke TPump on network-attached Windows client systems:

tpump -b -c charactersetname -C filename -d periodicityvalue
      -e errorfilename -f numberofbuffers -m -r "tpump command"
      -v -y -i scriptencoding -u outputencoding -t nn -V
      < infilename > outfilename

Note: The Windows syntax is essentially the same as the UNIX syntax, the main difference being that single quotes should be used on UNIX systems and double quotes should be used on Windows systems.

In Batch Mode on Channel-attached MVS Systems

Refer to the runtime parameter descriptions in Table 5 on page 45 and use the following syntax to invoke TPump on channel-attached MVS client systems, where the PARM string can include any of the following parameters, separated by commas:

// EXEC TDSTPUMP PARM=
      BRIEF
      BUFFERS=numberofbuffers
      CHARSET=charactersetname
      CONFIG=filename
      ERRLOG=filename
      MACROS
      PRDICITY=periodicityvalue
      VERBOSE
      'tpump command'
      RTYTIMES=nn
      RVERSION

In Batch Mode on Channel-attached VM Systems

Refer to the runtime parameter descriptions in Table 5 on page 45 and use the following syntax to invoke TPump on channel-attached VM client systems.
EXEC TPUMP
      BRIEF
      BUFFERS=numberofbuffers
      CHARSET=charactersetname
      CONFIG=filename
      ERRLOG=filename
      MACROS
      PRDICITY=periodicityvalue
      VERBOSE
      'tpump command'
      RTYTIMES=nn
      RVERSION

Note: On VM, you must use the following statement before the EXEC LOAD statement: "GLOBAL LOADLIB DYNAMC"

Run-time Parameters

Table 5 describes the run-time parameters used by TPump on channel-attached and network-attached systems.

Table 5: Run-time Parameters

BRIEF (channel-attached) / -b (network-attached)
Specifies reduced print output, which limits TPump printout to the minimal information required to determine the success of the job:
• Header information
• Logon/logoff information
• Candidate records
• Insert, update, and delete results
• Error table counts

BUFFERS = numberofbuffers (channel-attached) / -f numberofbuffers (network-attached)
Sets the number of request buffers. For Teradata Tools and Utilities 06.02.00 and earlier, you can set the buffers runtime parameter from 2 to a maximum of 10; the default value is 2. Beginning with Teradata Tools and Utilities 06.02.00.01, you can set the buffers runtime parameter with a lower limit of 2 and no upper limit; the default value is 3. The maximum number of request buffers that may be allocated is BUFFERS * session_count. Beginning with Teradata Tools and Utilities 06.02.00.01, request buffers are a global resource, so buffers are assigned to any session as needed, and then returned to a free pool. At any point in time, the number of request buffers assigned to a session can vary from zero to BUFFERS * session_count. Prior to Teradata Tools and Utilities 06.02.00.01, a request buffer was permanently owned by the session to which it was first assigned, and so could not be used by any other session. The maximum number of request buffers that a session could own was determined by the value of BUFFERS.
CHARSET = charactersetname (channel-attached) / -c charactersetname (network-attached)
Defines a character set for the TPump job. The character set specification remains in effect for the entire TPump job, even if the Teradata Database server resets, causing the TPump job to be restarted.

Note: The character set specification does not remain in effect if the client system fails, or if you cancel the TPump job. In these cases, when you resubmit the job, you must use the same character set specification that you used on the initial job. If you use a different character set specification when you resubmit such a job, the data loaded by the restarted job will not appear the same as the data loaded by the initial job.

If you do not enter a character set specification, the default is the character set specified for the Teradata Database when you invoke TPump.

Note: See Client Character Sets in Chapter 1 for more information on supported character sets. When using a UTF16 client character set on the network or a UTF8 client character set on the mainframe, specify the client character set name with the runtime parameter (that is, "-c" on the network and "CHARSET" on the mainframe).

Not Applicable (channel-attached) / -i scriptencoding (network-attached)
Specifies the encoding form of the job script. If this parameter is not specified and the client character set is UTF16, TPump interprets the job script as UTF16. If character-type data is also specified in the script, TPump converts the string literals and the corresponding fields in the import data to the same character set before comparing or concatenating them. (String literals are specified with the APPLY…WHERE…, LAYOUT…CONTINUEIF…, FIELD…NULLIF…, and FIELD…||… commands.)
Valid encoding options are:
• UTF8
• UTF16-BE
• UTF16-LE
• UTF16

The specified encoding character set applies to all script files included by the .RUN FILE commands. The UTF16 or UTF8 Byte Order Mark (BOM) can be present or absent in the script file. When a UTF16 BOM is present and 'UTF16' is specified, TPump interprets the script according to the endianness indicated by the UTF16 BOM. When the UTF16 BOM is not present, TPump interprets the script according to the endianness indicated by the encoding option.

Not Applicable (channel-attached) / -u outputencoding (network-attached)
Specifies the encoding form of the job output. This parameter is valid only when the UTF16 client character set is used. When this parameter is used, it should be placed in front of other runtime parameters to ensure the whole job output is printed in the desired encoding form. If it is not placed ahead of other runtime parameters when invoking the job, a warning message is printed. Available output encoding options are:
• UTF16-BE
• UTF16-LE
• UTF16

UTF16-BE instructs TPump to print the job output in the big endian UTF16 encoding scheme. UTF16-LE instructs TPump to print the job output in the little endian UTF16 encoding scheme. On big endian client systems, UTF16 instructs TPump to print the job output in the big endian UTF16 encoding scheme; on little endian client systems, UTF16 instructs TPump to print the job output in the little endian UTF16 encoding scheme. The UTF16 BOM is not printed as part of the job output. When this parameter is not specified, the client character set is UTF16, and TPump output needs to be redirected to a log file on network platforms, "-u outputencoding" must be specified.

CONFIG = filename (channel-attached) / -C filename (network-attached)
Specifies the configuration file for the TPump job.
The configuration file contains various configuration and tuning parameters for TPump. This file is particularly useful for values that:
• are site- or host-specific
• script developers may not necessarily be aware of
• will likely change independently of TPump scripts

The installation of TPump installs a default configuration file. On UNIX, it also installs a shell script that calls TPump using the default configuration file on the command line. The format of the entries in the file is:

<keyword> <value>

• Lines in the file that do not begin with a valid keyword are ignored.
• Keywords are case insensitive.
• On UNIX systems, this file is called tdatpump.cfg and is expected to be found in the directory /usr/lib.
• If the configuration file is not found, the program issues a warning message and uses the default values wherever necessary.

At this time, the only valid keyword is INMEMSORT, an integer value giving the maximum number of bytes that can be sorted in memory. TPump recovery logic uses this value. This keyword can be modified if you want to increase the amount of memory available for sorting. If this keyword is not provided in the configuration file, or the configuration file is not provided, the default value for INMEMSORT is 6,000,000 for UNIX, 12,000,000 for VM and MVS, and 3,000,000 for Windows.

PRDICITY = periodicityvalue (channel-attached) / -d periodicityvalue (network-attached)
Changes the periodicity value, which controls the rate at which statements are transferred to the RDBMS. This parameter may be adjusted to improve the TPump workflow. It is in effect whenever the BEGIN LOAD command uses the RATE parameter to control the rate at which statements are sent to the RDBMS. The default periodicity value is four 15-second periods per minute.
The periodicityvalue variable contains a value between 1 and 600, which is the value range for the number of periods specified. The default value is 4. Alternatively, periodicity can be changed by executing the PumpMacro.UserUpdateSelect macro (provided with the TPump Monitor SQL scripts) to update the monitor interface table while the job is running.

ERRLOG = errorfilename (channel-attached) / -e errorfilename (network-attached)
Specifies the error logging function. Using this parameter creates an alternate error log file to hold messages generated by TPump. Specifying an alternate file name produces a duplicate record of all TPump error messages, which allows you to examine any errors detected without having to go through the entire output stream. The errorfilename you define is the location to which you want error messages copied. You can also include directory identifiers in the file names you define. On UNIX, the maximum length of the file name depends on the UNIX version currently in use. On channel-attached client systems, the alternate file specification is limited to eight characters and:
• On MVS, it must be a DD name defined in the JCL
• On VM, it must be an existing file definition (FILEDEF)

Note: If the file names that you define already exist, they are overwritten. Otherwise, they are automatically created. There is no default error log errorfilename specification.

MACROS (channel-attached) / -m (network-attached)
Invocation option that tells TPump to keep macros that were created during the job run. These macros can be used as predefined macros for the same job. To use the same script after the -m parameter was used in a previous run, the EXECMACRO command must be added to the script. To avoid duplicate macro names, a random number from 1 to 99 is used in each macro name when the NAME command is not used.
The format in which the macro is created is:

MYYYYMMDD_HHMMSS_LLLLL_DDD_SSS

where
• LLLLL is the low-order 5 digits of the logon sequence number returned by the DBS from the .LOGON command.
• DDD is the .DML sequence (ordinal) number. This value is not reset to one for successive loads (.BEGIN LOAD) in a single job, but continues to be incremented.
• SSS is the SQL statement sequence (ordinal) number within the .DML group.

RTYTIMES = nn (channel-attached) / -t nn (network-attached)
Specification for retry times. The default for nn is 16. If nn = 0, retry times are set back to 16. The retry times options in the BEGIN LOAD command can override this option for the requests/data sent between a "BEGIN LOAD" and "END LOAD" pair.

'tpump command' (channel-attached) / -r 'tpump command' (network-attached)
Invocation option that can signify the start of a TPump job. Because only one tpump command may be specified, this is usually a RUN FILE command specifying the file containing your TPump job script. For example, on UNIX:

'.RUN FILE tpump.script;'

VERBOSE (channel-attached) / -v (network-attached)
Specifies to turn on verbose mode. Using this parameter provides additional statistical data in addition to the regular statistics. In verbose mode, the statistics normally displayed for the input file include additional figures, such as the number of RDBMS requests sent, in addition to the normal number of requests.

Note: In verbose mode, TPump displays each retryable error when it occurs.

Not Applicable (channel-attached) / -y (network-attached)
Specification for the data encryption option. When specified, data and requests are encrypted in all sessions used by the job. The encryption options in the BEGIN LOAD or PARTITION commands can override this option for the sessions associated with those commands.

Not Applicable (channel-attached) / < infilename (network-attached)
Name of the standard input file containing your TPump commands and Teradata SQL statements.
Your infilename specification redirects the standard input (stdin). If you do not enter an infilename specification, the default is stdin. If end-of-file is reached on the specified input file, the input does not revert to stdin and the job terminates.

Note: On channel-attached client systems, you must use the FILEDEF or DD control statement to specify the input file before you invoke TPump.

Not Applicable (channel-attached) / > outfilename (network-attached)
Name of the standard output file for TPump messages. Your outfilename specification redirects the standard output (stdout). If you do not enter an outfilename specification, the default is stdout.

Note: If you use an outfilename specification to redirect stdout, do not use the same outfilename as an output or echo destination in the DISPLAY or ROUTE commands. Doing so produces incomplete results because of the conflicting write operations to the same file.

Note: On channel-attached client systems, you must use the FILEDEF or DD control statement to specify the output file before you invoke the utility.

RVERSION (channel-attached) / -V (network-attached)
Display the version number and stop.

Note: See the invocation examples in Appendix B: “TPump Examples” for sample JCL listings, commands, and output samples for the invocation options.

Examples - Redirection of Inputs and Outputs

The following examples show various ways to redirect stdin and stdout under UNIX.

Example 1

tpump </home/tpuser/tests/test1 >/home/tpuser/tests/out1

This example specifies both an input file and an output file. The TPump script is in /home/tpuser/tests/test1 and the job output is written to /home/tpuser/tests/out1.

Example 2

tpump </home/tpuser/tests/test1

This example specifies only an input file. The TPump script is in /home/tpuser/tests/test1 and the job output is written to stdout, which ordinarily would be your terminal.

Example 3

tpump >/home/tpuser/tests/out1

This example specifies only an output file.
You enter the TPump script via stdin, normally at your terminal. When input is complete, type Control-D (simultaneously press the Control key and the letter D) to indicate end-of-file. The job output is written to /home/tpuser/tests/out1.

Example 4

tpump

This example specifies neither an input nor an output file. TPump commands are typed at your terminal via stdin and job output is written to your terminal via stdout.

Terminating TPump

This section covers methods of termination and other topics related to terminating TPump. There are two ways to terminate TPump:
• Normal termination
• Abort termination

Normal Termination

Use the TPump LOGOFF command in your TPump job script to terminate the utility normally on both network- and channel-attached client systems:

.LOGOFF retcode;

TPump logs off all sessions with the Teradata Database and returns a status message indicating:
• The total processor time that was used
• The job start and stop date/time
• The highest return code that was encountered:
  • 0 if the job completed normally
  • 4 if a warning condition occurred
  • 8 if a user error occurred
  • 12 if a fatal error occurred
  • 16 if no message destination is available

TPump also:
• Either maintains or drops the restart log table, depending on the success or failure of the job.
• If specified, returns the optional retcode value to your client operating system.

See the LOGON command description in Chapter 3 for more information about return codes and the conditions that maintain or drop the restart log table.

Abort Termination

The procedure for aborting a TPump job depends on whether the utility is running on a network-attached or a channel-attached client system.

To abort a TPump job running on a channel-attached client system:
✔ Cancel the job from the client system console.
To abort a TPump job running on a network-attached client system:
✔ Press the Control + C key combination three times on your workstation keyboard.

After Terminating a TPump Job

After terminating a TPump job, you can:
• Restart the job and allow it to run to completion, or
• Reinitialize the job and run it to completion, or
• Abandon the job and clean up the database objects.

For more information on how to perform these options, see the following section.

Restarting and Recovery

This section explains TPump’s handling of restart and recovery operations in the event of a system failure. The TPump facility includes a number of features that enable recovery from client or Teradata Database failure, with minimal requirements for job resubmission or continuation. Upon restart or resubmission, TPump interrogates the restart log table on the Teradata Database and resumes operations from where it had left off.

Caution: Do not tamper with the contents of the restart log table. A missing or altered restart log table causes the TPump job to be recovered incorrectly.

Basic TPump Recovery

Whenever an RDBMS restart is detected or a TPump job is restarted on the host system, the following activity occurs:

1 The restart log table is scanned with reference to the TPump script. Each statement within the script is either executed, because a row does not exist in the restart log, or ignored, because a row exists.
2 In the case of the END LOAD statement, a number of rows placed in the restart log table let TPump decide what to do. TPump ignores any complete IMPORT within a LOAD and begins at the incomplete IMPORT.
3 Within an unfinished IMPORT, TPump begins processing at the last complete checkpoint. If the TPump job was running in SIMPLE mode before the restart, recovery is complete and processing continues at the last complete checkpoint.
4 If TPump was running in ROBUST mode before it was restarted, TPump must next ascertain how much processing has been completed since the last checkpoint. This is accomplished by reading back a set of “Partial Checkpoints” from the restart log table in the Teradata Database, sorting them, and then reprocessing all transactions that were left incomplete when the job was interrupted.

Protection and Location of TPump Database Objects

The restart log table is critical to the recovery process. If the restart log table is dropped, there is no way to recover an interrupted TPump job. In addition to the restart log table, TPump also creates an error table and a number of macros (each macro corresponds to a DML SQL statement involved in the current IMPORT). If these database objects are dropped, they can, with some effort, be recreated; however, it is much more convenient for this NOT to be necessary. TPump does not place special locks on database objects, so it is important that administrators take security precautions to avoid the loss of these objects. If the objects are dropped accidentally, the following information should allow an administrator to recreate them.

TPump macros are placed in the same database that contains the restart log table. The macros are named according to the following convention:

Jobname_DDD_SSS

where
• Jobname is the job name, which, if not explicitly specified, defaults to MYYYYMMDD_HHMMSS_LLLLL.
• LLLLL is the low-order 5 digits of the logon sequence number returned by the DBS from the .LOGON command.
• DDD is the .DML sequence (ordinal) number. This value is not reset to one for successive loads (.BEGIN LOAD) in a single job, but continues to be incremented.
• SSS is the SQL statement sequence (ordinal) number within the .DML group.

Thus, given the following script fragment:

.LOGTABLE LT_SIGH;
.LOGON TDPID/CME,CME;
...
.LAYOUT LAY1A ...
.DML LABEL TAB1PART1;
INSERT into tab1 values (:F0,:F1,:F2,:F3);
.DML LABEL TAB2PART1;
INSERT into tab2 values (:F0,:F1,:F2,:F3);
...
.IMPORT INFILE TPDAT LAYOUT LAY1A
    APPLY TAB1PART1
    APPLY TAB2PART1;

and assuming the job name is defaulted, the macros are named M20020530_171209_06222_001_001 and M20020530_171209_06222_002_001.

The content of a TPump macro is taken directly from the script and consists of a parameter clause derived from the LAYOUT and the actual statement specified in the script. Continuing the example above, if the LAYOUT associated with the statement is as follows:

.LAYOUT LAY1A;
.FIELD F0 * integer key;
.FIELD F1 * integer;
.FIELD F2 * integer;
.FILLER FX * integer;
.FIELD F3 * char(38);

then the macros are created as follows:

CREATE MACRO CME.M20020530_171209_06222_001_001 (
    F0 (INTEGER),
    F1 (INTEGER),
    F2 (INTEGER),
    F3 (CHAR(38)) )
AS (INSERT INTO TAB1 VALUES(:F0, :F1, :F2, :F3); );

CREATE MACRO CME.M20020530_171209_06222_002_001 (
    F0 (INTEGER),
    F1 (INTEGER),
    F2 (INTEGER),
    F3 (CHAR(38)) )
AS (INSERT INTO TAB2 VALUES(:F0, :F1, :F2, :F3); );

Note that the actual names of the parameters in the parameter list are not important; what is important is that the types of the parameters are specified in the macro in exactly the same order as the types in the LAYOUT. Also important is the fact that FILLER fields are not included in the parameter list, since they are stripped out by TPump.

The error table, if it is not explicitly specified, is:

<JobName>_nnn_ET

where nnn is the load sequence number. If the database for the error table is not explicit in the script, the table is placed in the database associated with the TPump user logon, unless the DATABASE command has been issued.
Continuing the above example, assuming the user defaults the error table, the CREATE TABLE command for it is:

CREATE SET TABLE M20020530_171209_06222_001_ET,
    NO BEFORE JOURNAL, NO AFTER JOURNAL (
    ImportSeq BYTEINT,
    DMLSeq BYTEINT,
    SMTSeq BYTEINT,
    ApplySeq BYTEINT,
    sourceseq INTEGER,
    DataSeq BYTEINT,
    ErrorCode INTEGER,
    ErrorMsg VARCHAR(255) CHARACTER SET UNICODE NOT CASESPECIFIC,
    ErrorField SMALLINT,
    HostData VARBYTE(63677))
UNIQUE PRIMARY INDEX ( ImportSeq, DMLSeq, SMTSeq, ApplySeq, sourceseq, DataSeq );

Reinitializing a TPump Job

If the restart log table has been accidentally dropped or corrupted for a TPump job, follow this procedure to reinitialize the job:

1 Determine how much of the job has completed, in order to take data out of the TPump input data set. How this is done depends on the table and the procedures involved with table maintenance; it varies between jobs and with the procedures in effect at each customer site.
2 Delete any database objects associated with the TPump job that may still exist, since TPump did not get a chance to clean up. These objects include the error table and any DML-associated macros. Directions for finding these objects are provided in the previous section.

Recovering an Aborted TPump Job

An aborted TPump job is one that has been terminated early for any number of reasons (out of database space, accidental cancellation by mainframe operators, UNIX kernel panic, error limit in the TPump job exceeded, and so on) while all TPump database objects, the restart log table, the error table, and the DML macros are intact. An aborted TPump job may be restarted using the same job script that was used in the original job, and TPump performs the recovery of the job.

Recovering from Script Errors

When TPump encounters an error in the input script, a diagnostic message is generated and the operation is stopped with a non-zero return code.
You can then modify the script, correct the faulty code, and resubmit the job. Operations begin with the statement following the last one that was successfully completed.

Programming Considerations

This section provides information to help application programmers design and script TPump jobs. Additional information needed by programmers and/or system administrators includes space requirements, locks, and the use of fallback or nonfallback tables. The information in this section includes TPump command conventions, variables, and ANSI/SQL DateTime data types. You will also find information related to using comments, specifying a character set, using graphic data types, and using graphic constants. Restrictions and limitations, as well as termination return codes, are covered as well.

TPump Command Conventions

The following command conventions apply when using TPump.

TPump Reserved Words

Commands supported by TPump do not use reserved words (or keywords), except those that are operators, and only where an expression is allowed. Although there is no official restriction against the use of TPump reserved words as variable names, it is strongly recommended that you avoid their use, as well as the use of Teradata SQL reserved words. You should especially avoid words that are operators (see Table 6), as their use can result in ambiguous expressions.

Table 6: TPump Operators

AND   BETWEEN   EQ    GE     GT
IN    IS        LE    LIKE   LT
MOD   NE        NOT   NULL   OR

Teradata SQL Reserved Words

TPump supports a subset of Teradata SQL listed in Table 16 on page 91. The subset of Teradata SQL consists only of statements beginning with one of the reserved words (or keywords) in Table 1. Avoid the use of these Teradata SQL reserved words in TPump commands.

Conditional Expressions

Some of the commands described in this chapter use conditional expressions.
Conditional expressions return a result of 1 if they evaluate to true, and 0 if they evaluate to false.

Table 7: TPump Conditional Expression Operators

+   -   /   MOD   ||
IS NOT NULL   IS NULL
EQ  =
NE  <>  ^=  NOT=  ~=
GE  >=        GT  >
LE  <=        LT  <
BETWEEN       NOT BETWEEN
AND   OR   IN   NOT IN   NOT

These conditional expressions are similar to those described in Teradata Database SQL Reference: Functions and Operators, with the following exceptions:

1 A column name in a conditional expression in the reference manual is equivalent, in this document, to a field name in records from an external data source, or to a utility variable.
2 In logical expressions that make up a conditional expression, the LIKE operator is not supported. In these expressions, only the following operators are supported:
  a All comparison operators documented in Teradata Database SQL Reference: Functions and Operators
  b The NOT IN operator (only the first of the two forms)
3 In arithmetic expressions that make up a logical expression, the following elements are not supported:
  a The exponentiation operator
  b Aggregate operators
  c Arithmetic functions

Using Task Commands

A BEGIN LOAD command must begin each task to declare, at a minimum, the number of sessions involved in the load. The logged-on user must have the appropriate user privileges on the tables. At the time the BEGIN LOAD is initiated, you also must have SELECT privileges, as well as INSERT, UPDATE, and DELETE privileges, depending on the DML statements specified in the current task. Access privileges needed follow standard Teradata access privilege rules; the kind of privilege required depends on the kind of DML statements to be applied. TPump tasks require that you either own, or have select access to, the target table. Additional privilege for the target table is required, depending on the DML command: INSERT, UPDATE, or DELETE.
The additional privileges are described for each statement type in later sections. Regardless of the kind of statement, you must have the CREATE TABLE privilege on the databases where the error tables are to be placed. You must also have the CREATE TABLE privilege for TPump to create a restart log table. If the restart log table specified for the support environment already exists, INSERT and UPDATE privileges on that table are required.

In a TPump task, it is possible for more than one statement/data record combination to affect a single row. If applying any statement/data record combination to a row would produce an error, that combination is not applied, but all prior and subsequent error-free combinations affecting the same row or other rows are applied. TPump can guarantee the order of operations on a given row through correct use of the serialize option to specify the primary index of a given target table. When serialize is used, operations for a given set of rows occur in order on one session. Without serialize, statements are executed on the first session available; hence, operations may occur out of order.

When the serialize option is in effect, the order in which DML statement/host record pairings are applied to a given target row is completely deterministic, as is the order in which changes are applied to target rows. Operations occur in exactly the same order as they are read from the data source and, if there are multiple apply clauses, in order by apply clause from first to last.

In addition to the SERIALIZE option in the BEGIN LOAD command, the SERIALIZEON keyword can be specified in the DML command, which lets you turn serialization on for the fields you specify. You can use the SERIALIZEON keyword in the DML command together with the SERIALIZE keyword in the BEGIN LOAD command; when you do, the DML-level serialization overrides the BEGIN LOAD-level serialization.
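The two levels can be sketched as follows; this is an illustrative fragment only, and the session count, table, and field names are hypothetical:

```sql
/* Job-level: SERIALIZE ON serializes on the layout fields marked KEY. */
.BEGIN LOAD SESSIONS 8 SERIALIZE ON ERRORTABLE WorkDB.ET_Job1;

/* DML-level: SERIALIZEON names the serialization fields explicitly and
   overrides the BEGIN LOAD-level setting for this DML group. */
.DML LABEL UPD1 SERIALIZEON(Account_Number);
UPDATE Accounts SET Balance = Balance + :Amount
 WHERE Account_Number = :Account_Number;
```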
In this case, the DML command with the serialization option in effect is serialized on the fields specified. Operations generated from the first IMPORT take place before operations generated from the second IMPORT.

Variables

This section contains information about the variables used in TPump.

Predefined System Variables

Avoid using the prefix &SYS in user-defined symbols, because the names of predefined utility variables begin with that prefix. Predefined system variables are listed in Table 8.

Table 8: Predefined System Variables

• &SYSDATE: 8-character date, format yy/mm/dd
• &SYSDATE4: 10-character date, format yyyy/mm/dd
• &SYSDAY: 3-character day of week: MON TUE WED THU FRI SAT SUN
• &SYSDELCNT[n]: Number of rows deleted from all the target tables of import n. If n is not specified, the count of deletes done to all the target tables for all imports. The maximum value of n is 4.
• &SYSETCNT[n]: Number of records inserted into the error table for import n. If n is not specified, the total count of all the records inserted into the error table for all imports. The maximum value of n is 4.
• &SYSINSCNT[n]: Number of rows inserted into all the target tables for import n. If n is not specified, the total inserts done to all the target tables for all imports. The maximum value of n is 4.
• &SYSJOBNAME: Up to 16 characters (ASCII or EBCDIC) in length, in whichever character set is appropriate. If &SYSJOBNAME is not set using the NAME command, it defaults to MYYYYMMDD_hhmmss_lllll, where:
  M = macro
  YYYY = year
  MM = month
  DD = day
  hh = hour
  mm = minute
  ss = second
  lllll = the low-order 5 digits of the logon sequence number returned by the database from the .LOGON command
Table 8: Predefined System Variables (continued)

• &SYSOS: Client operating system:
  - UNIX, HP-UX, IBM-AIX, Win32, Linux
  - For VM: VM/SP, VM/XA SP, VM/HPO, VM/XA, VM/ESA
  - For MVS: VS1, MVS, MVS/SP, MVS/ESA
• &SYSRC: Completion code from the last response by the Teradata Database.
• &SYSRCDCNT[n]: Total number of records read for import n. If n is not specified, the total records read for all imports.
• &SYSTIME: 8-character time, format hh:mm:ss
• &SYSUPDCNT[n]: Total updates to all target tables for import n. If n is not specified, the total updates done to all the target tables for all imports. The maximum value of n is 4.
• &SYSUSER: Client system dependent: CMS user ID, MVS batch user ID. (MVS batch returns the user ID only when a security package such as RACF, ACF2, or Top Secret is installed.)
• &SYSAPLYCNT[n]: Number of records applied for import n. If n is not specified, the total number of records applied for all imports.
• &SYSNOAPLYCNT[n]: Number of records not applied for import n. If n is not specified, the total number of records not applied for all imports.
• &SYSRJCTCNT[n]: Number of records rejected for import n. If n is not specified, the total number of rejected records for all imports. The maximum value of n is 4.

Date and Time Variables

&SYSDATE, &SYSDATE4, &SYSTIME, and &SYSDAY reflect the time when TPump begins execution. The original values are restored at restart. These values are character data types and should not be used in numeric operations. System variables cannot be modified, only referenced. The values returned by &SYSDAY are all in upper case.
Monday, for example, is returned as 'MON':

0003 .IF '&SYSDAY' NOT = 'MON' THEN;
14:10:28 - FRI JUL 30, 1993
UTY2402 Previous statement modified to:
0004 .IF 'FRI' NOT = 'MON' THEN;
0005 .RUN FILE UTNTS39;
0006 .ENDIF;

This example causes the RUN FILE command to be executed every day but Monday. As the example shows, any of the system variables can be used as the subject condition within an IF/ELSE/ENDIF command construct. This allows you to create a script that forces certain events to occur, or tasks to operate in a predetermined sequence, based on the current setting of the variable.

As another example, suppose we create the following table:

.SET TABLE TO 'TAB&SYSDAY';
CREATE TABLE &TABLE
  (Account_Number INTEGER NOT NULL,
   Last_Name VARCHAR(25),
   First_Name VARCHAR(25),
   Street_Address VARCHAR(30),
   City VARCHAR(20),
   State CHAR(2),
   Zip_Code CHAR(5),
   Balance DECIMAL(9,2) FORMAT '-$,$$$,$$$.99')
UNIQUE PRIMARY INDEX (Account_Number);

and then check the system variable &SYSRC for a return code to verify whether the table already exists. If it does, a file containing options to continue or quit is run at the console; any other error return code terminates the job with a Teradata Database error, as follows:

.SET CREATERC TO &SYSRC;
.IF CREATERC = 3803 /* Table &TABLE already exists */
.RUN FILE RUN01;
.ELSE
.IF CREATERC <> 0 THEN
.LOGOFF CREATERC;
.ENDIF
.BEGIN LOAD ----------;
/* No errors returned. We can start the job. */
/* TPump statements go here.....
*/
.END LOAD;
.LOGOFF;

File RUN01, which runs when the 3803 error causes the RUN FILE command to execute, contains the following options:

.DISPLAY '&SYSUSER: Table FOO already exists....' to FILE console;
.DISPLAY '&SYSUSER: Reply <C> Continue anyway...' to FILE console;
.DISPLAY '&SYSUSER: Reply <A> Abort this JOB....' to FILE console;
.DISPLAY '&SYSUSER: Reply <C> or <A>. Default <A>' to FILE console;
.ACCEPT RESPONSE FROM FILE CONSOLE;
.IF RESPONSE <> 'C' THEN
.LOGOFF CREATERC; /* Quit before we cause trouble */
.ENDIF;

Row Count Variables

The row count variables, which are updated for each TPump task, allow you to query the insert, update, and delete row counts and the error table counts for each import:
• &SYSDELCNT[n]
• &SYSETCNT[n]
• &SYSINSCNT[n]
• &SYSUPDCNT[n]

The values are stored in the TPump utility restart log table and are restored after a client system or Teradata Database restart. When EXECUTE <macroname> INSERT|UPDATE|DELETE is used, TPump must rely on the user to have correctly identified the action (INSERT, UPDATE, or DELETE) that the macro performs. TPump cannot always determine the number of target tables, and therefore can only provide a single combined value for all target tables. Using the existing facility of variable substitution, each new system variable can be referenced as soon as it is defined. The new variables are defined during the import phase and should be referenced after the END LOAD command and before any subsequent BEGIN LOAD command in your TPump job script. Their values are stored in the TPump log table and are restored after a restart.

Utility Variables

TPump supports utility variables. These variables are set via either the SET command or the ACCEPT command. Chapter 3 describes them in greater detail.
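For example, a minimal sketch of setting and referencing utility variables; the variable names, values, and the parameter file name are illustrative, not from this manual:

```sql
/* SET assigns a value directly. */
.SET LOADDB TO 'Accounts';

/* ACCEPT reads a value, here from a file named PARMS. */
.ACCEPT RECLIMIT FROM FILE PARMS;

/* Substitution occurs wherever the name is prefixed with an ampersand. */
DATABASE &LOADDB;
.IMPORT INFILE DATA FROM 1 THRU &RECLIMIT LAYOUT LAY1 APPLY ONE;
```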
Additionally, TPump predefines some utility variables that provide information about the TPump environment at execution time. The names of these variables must begin with an ampersand (&) when variable substitution is desired. The rest of the name must obey the rules for standard Teradata SQL column names. Consequently, the name of the variable without the ampersand can be no longer than 29 characters, so that with the ampersand it does not exceed the 30-character limit.

TPump supports an environmental variable for each DML statement executed. At the end of an import, a variable is established for each statement executed. The variable is named using the number of the import (one through four), the label of the clause containing the DML statement, and the number of the statement within the IMPORT's apply clause.

Variable Substitution

Variable substitution, which allows dynamic statement modification, is allowed on any statement by preceding the variable name with an ampersand. Each occurrence of a variable name, preceded by an ampersand, is replaced by its current value. Numeric values are permitted, but they are converted to character for the replacement. This replacement occurs before the statement is analyzed. The replacement operation for a given statement occurs only once (one scan); replacements that generate ampersand variable names are not themselves replaced. Even when it appears in a quoted string, an ampersand is always interpreted as the first character of a utility variable unless it is immediately followed by another ampersand. Such a double ampersand is converted to a single textual ampersand. If a reference to a utility variable is followed by a nonblank character that could appear in a variable name, there must be a period between the variable and the nonblank character(s). TPump discards the period in this context.
For example, if a utility variable called &x has the value xy and is to be immediately followed by the characters .ab in some context, the sequence of variable and characters must appear as &x..ab to produce xy.ab as the result. The double period is converted to a single textual period and concatenated with the value of the utility variable.

Using ANSI/SQL DateTime Data Types

The following ANSI/SQL DateTime data types can be specified as column or field modifiers in some of the Teradata SQL statements you use with TPump:
• DATE
• TIME
• TIMESTAMP
• INTERVAL

For example, you can use them in CREATE TABLE statements and in INSERT statements. However, some restrictions apply. In the FIELD command, you must convert ANSI/SQL DateTime data types to fixed-length CHAR(10) data types. See "Using ANSI/SQL DateTime Data Types" on page 135 for a description of the fixed-length CHAR representations for each ANSI/SQL DateTime data type.

Using Comments

TPump supports C-language-style comments. A comment begins with a slash-asterisk '/*', and all subsequent input is treated as a comment until a terminating asterisk-slash '*/' is encountered. Comments may nest, and comment delimiters are not recognized within string or character literals; for example, a '/*' within a quoted string is not treated as the beginning of a comment. Comments are written to the message destination. Substitution of values for variable names continues within comments; if the literal variable name is required, code two '&'s. Note that recursive comments are permitted, which means that to end a comment, the number of terminating '*/' sequences must match the number of outstanding '/*' sequences.

You have the option of either sending or not sending comments to the Teradata Database.
If a comment is used together with a Teradata SQL statement, a semicolon may be placed as a terminating character to end the comment. The semicolon tells TPump to strip out the comment so that it is not sent to the Teradata Database. If a semicolon is not used, the comment is sent to the Teradata Database together with the Teradata SQL statement. Nested comments are supported when they occur before or after Teradata SQL statements; nested comments within Teradata SQL statements are not supported. Nested comments must terminate with a semicolon. If a semicolon is not appended, the comment is erroneously sent to the Teradata Database and a syntax error is returned.

Specifying a Character Set

Table 9 describes the ways to either specify the character set or accept a default specification.

Table 9: Ways to Either Specify a Character Set or Accept a Default Specification

• Runtime parameter specification: Use when you invoke TPump, as described earlier in this chapter:
  - charset=charactersetname for channel-attached VM and MVS client systems
  - -c charactersetname for network-attached UNIX and Windows client systems
• Client system specification: Specify the character set for your client system before invoking TPump by configuring:
  - the HSHSPB parameter for channel-attached VM and MVS client systems
  - the clispb.dat file for network-attached UNIX and Windows client systems
  Note: The charactersetname specification used when you invoke TPump always takes precedence over your current client system specification.
• Teradata Database default: If you do not use a charactersetname specification when you invoke TPump, and there is no character set specification for your client system, TPump uses the default specification in the Teradata Database system table DBC.Hosts.
Note: If you rely on the DBC.Hosts table specification for the default character set, make sure that the initial logon is in the default character set:
  - EBCDIC for channel-attached VM and MVS client systems
  - ASCII for network-attached UNIX and Windows client systems

Table 9: Ways to Either Specify a Character Set or Accept a Default Specification (continued)

• TPump utility default: If there is no character set specification in DBC.Hosts, TPump defaults to:
  - EBCDIC for channel-attached VM and MVS client systems
  - ASCII for network-attached UNIX and Windows client systems

Character Set Specifications for AXSMODs

When you use an AXSMOD with TPump, the session character set is passed as an attribute to the AXSMOD for possible use. The attribute value is a variable-length character string containing either the character set name or the character representation of the character set ID. The attribute name varies with how you specify the character set: if you specify the session character set by name, the attribute name is CHARSET_NAME; if you specify it by ID, the attribute name is CHARSET_NUMBER.

Multibyte Character Sets

The Teradata Database supports multibyte characters in object names when the client session character set is UTF8 or UTF16. Refer to the Teradata Database International Character Set Support manual for a list of valid characters used in object names. In TPump scripts, enclose multibyte characters used in object names in double quotes. To log on with the UTF8 character set or other supported multibyte character sets (Chinese, Japanese, or Korean), create object names shorter than 30 bytes. This limitation applies to the userid, password, and account; the logon might fail if any of these names exceeds 30 bytes. Multibyte character sets affect the operation of certain TPump commands, as well as object names in Teradata SQL statements, as shown in the following table.
• ACCEPT (utility variables): The utility variables may contain multibyte characters. If the client does not allow multibyte character set names, the filename must be in uppercase English.
• BEGIN LOAD (target table and error table names): Target table names and error table names may contain multibyte characters.
• DML (DML label name): The label name in a DML statement may contain multibyte characters. The label name may be referenced in the APPLY clause of an IMPORT statement.
• FIELD (field name): The field name specified may contain multibyte characters. The name can be referenced in other FIELD commands in NULLIF and field concatenation expressions, and in APPLY WHERE conditions in IMPORT commands. The FIELD command can also contain a NULLIF expression, which may use multibyte characters.
• FILLER (filler name): The name specified in a FILLER command may contain multibyte characters.
• IF (IF condition): The condition in an IF statement may compare multibyte character strings.
• LAYOUT (layout name, CONTINUEIF condition): The layout name may contain multibyte characters and may be used in the LAYOUT clause of an IMPORT command. The CONTINUEIF condition may specify multibyte character comparisons.
• LOGON (user name, password): The user name and password may contain multibyte characters.
• LOGTABLE (table name, database name): The logtable name and database name may contain multibyte characters.
• NAME (sets &SYSJOBNAME): This variable may contain kanji characters.
• SET (utility variable): The utility variable may contain multibyte characters. The variable can be substituted wherever substitution is allowed.
• TABLE (table and database name): The table name (and database name, if the table name is fully qualified) specified in a TABLE command may contain multibyte characters.
When using UTF8 or UTF16 character sets, avoid the TABLE command; explicitly specify the layout instead.

Using Graphic Data Types

To define double-byte character set data, the GRAPHIC, VARGRAPHIC, and LONG VARGRAPHIC data types are supported. TPump accepts GRAPHIC data in the input data set or file containing the TPump statements to be executed. The FIELD and FILLER statements that describe the input data are the statements affected by GRAPHIC data support. Table 10 lists the GRAPHIC data types that can be specified for the datadesc option in the FIELD or FILLER statement.

Table 10: GRAPHIC Data Types for the datadesc Option in FIELD or FILLER Statements

• GRAPHIC(n): input length is n*2 bytes if n is specified; otherwise 2 bytes (n=1 is assumed). Describes n double-byte characters.
• VARGRAPHIC(n): input length is m+2 bytes, where m/2 <= n. A 2-byte integer followed by m/2 double-byte characters.
• LONG VARGRAPHIC: input length is m+2 bytes, where m/2 <= 32000. A 2-byte integer followed by m/2 double-byte characters.

Using Graphic Constants

TPump supports two forms of graphic constants. The graphic literal, or string constant, is allowed only in the kanji EBCDIC character sets. It must have an even number of bytes within the quoted string to represent double-byte characters. The two forms of graphic constants are:
• the kanjiEBCDIC graphic constant form (used for the IBM mainframe)
• the hexadecimal representation of graphic data (used for both the IBM mainframe and network platforms)

For more information on graphic constants and hexadecimal constants, refer to Teradata Database SQL Reference: Fundamentals.

Restrictions and Limitations

Table 11 describes TPump restrictions and limitations on operational features and functions.
Table 11: Restrictions and Limitations on Operational Features and Functions

• Maximum file size: 2 gigabytes (MP-RAS UNIX systems).
• Maximum row size: The maximum row size for a TPump job, data plus indicators, is approximately 64,000 bytes. This limit is a function of the row size limit of the Teradata Database.
• Not allowed: aggregate operators; concatenation of data files; data retrieval from the Teradata Database via SELECT statements.
• Expressions: Evaluated from left to right, using the Teradata Database order of precedence, which can be overridden by parentheses.
• IMPORT commands: Limit of four IMPORT commands within a single TPump load task.
• Date specification: For dates before 1900 or after 1999, the year portion of the date must be represented by four numerals (yyyy). The default of two numerals (yy) is interpreted as the 20th century, and must be overridden to avoid spurious year information. If the table column defined as type DATE does not have the proper format, your dates may not process properly. The correct date format must be specified at the time of table creation, for example:

  CREATE TABLE tab (ADATE DATE);
  DEFINE a (CHAR(10))
  INSERT tab (ADATE = :a(DATE, FORMAT 'yyyy-mm-dd'));

• Access logging: Unlike the MultiLoad and FastLoad utilities, access logging can cause a severe performance penalty in TPump. This is because TPump uses normal SQL operations rather than a proprietary protocol; if all successful table updates are logged, a log entry is made for each operation. The primary index of the access logging table may then create the possibility of row hash conflicts.
• Primary indexes and partitioning column sets: Specify values for the partitioning column set when performing TPump deletes and updates, to avoid lock contention problems that can degrade performance. Avoid updating primary index and partitioning columns with TPump, to minimize performance degradation.

Termination Return Codes

When a TPump job terminates, it returns a completion code to the client system using the conventions listed in Table 12.

Table 12: Termination Return Codes

• 0: Normal completion
• 4: Warning
• 8: User error
• 12: Severe internal error
• 16: No message destination is available

Note: To avoid ambiguous or conflicting results, always use values greater than 20 when you specify a return code with your LOGOFF command.

Many CLI and Teradata Database errors generate return codes of 12. For Teradata Database errors that can be corrected and resubmitted, TPump tries up to 16 times to resubmit and, at the end of this process, returns a code of 12. Exceptions are:
• Errors on Teradata SQL statements outside of the TPump task (before the BEGIN LOAD command or after the END LOAD command). TPump ignores these errors, which display error messages but neither cause early termination nor generate TPump return codes.
• Retryable errors, or errors caused by Teradata Database restarts.

Writing a TPump Job Script

This section describes the contents of a TPump job script and explains how to write one.

Definition

The TPump job script, or program, is a set of TPump commands and Teradata SQL statements that alter the contents of the specified target tables in the Teradata Database.
These commands and statements:
• Insert new rows
• Update some or all of the contents of selected existing rows
• Delete selected existing rows

Each TPump job includes a number of support commands that establish and maintain the TPump support environment, and a number of task commands that perform the actual database insert, update, or delete operations. These commands and statements identify and describe the input data to be applied to the target table, and then place that data into the target table. These activities may commence anytime after configuring the program as described in "TPump Support Environment" on page 37.

Caution: In the event of a client failure, the identical script must be resubmitted in order to restart. If the script is edited, a restart is not permitted.

Script Writing Guidelines

The following guidelines will help you write a TPump job script:

• A script may contain up to four IMPORTs (tasks), delimited by a leading BEGIN LOAD command and a trailing END LOAD command.
• The BEGIN LOAD command specifies the number of sessions and establishes a number of controlling parameters. The BEGIN LOAD command also specifies the error table, which is the only table specified. An optional qualifying database name may also be specified. This database name may be different from the database being modified, thus allowing tables to be created and dropped with no impact on the production database. In addition, the BEGIN LOAD command establishes acceptable threshold levels for important task controls, such as the number and percentage of errors, session limits, duration of logon attempts in hours (tenacity), and checkpointing frequency.
This command also provides optional controls to:
  - determine where any macros are placed
  - guarantee serial operations on given rows
  - select the number of statements to pack into a multiple-statement request
  - select a restart logic mode
• The next item appearing in a script is usually a description of the records in the external file containing the change data for the target tables. The description of these input records appears in a sequence of commands headed by the LAYOUT command. The LAYOUT command tags the record layout being depicted with a unique name, which is then referenced by subsequent script commands in tasks throughout the rest of the job. The LAYOUT command is followed by the supporting information contained in a sequence of one or more FIELD, FILLER, and TABLE commands.
• Each FIELD command describes a single data item occupying a column in the input row. These items are described by data type, starting position, length, and several other characteristics. The FIELD command is used only for those items (columns) relevant to the current task, which are to be sent to the Teradata Database as changes to the target table. The FIELD command may include the KEY modifier if the column is to be considered part of the primary index for purposes of serialization.
• Each FILLER command describes a column in the input row in the same way as the FIELD command, but FILLER fields are never sent to the Teradata Database. The FILLER command identifies those columns that you do not want sent to the Teradata Database. Thus, if a sequence of 10 alternating FIELD and FILLER commands is used to describe 10 contiguous columns in the row, every other column, a total of five columns, would be sent to the Teradata Database.
• The TABLE command identifies any existing table with the same layout as the input. The TABLE command is used when the changes are being enacted on entire rows, rather than selected columns.
• The next entry in the script is the DML command, which is followed by the DML statements INSERT, UPDATE, and DELETE. The DML command creates an identifying label for the DML statement input, which immediately follows the command. The DML command also defines an error handling process for missing and duplicate rows, with respect to the error table. The three DML statements (INSERT, UPDATE, and DELETE) follow the DML command, and may occur in any order and in any quantity. The INSERT statement places a complete and entirely new row into the target table. The UPDATE statement takes the data contents from columns in the input record, as defined by the LAYOUT, FIELD, FILLER command sequence, and substitutes the data into the target table. The UPDATE rows are selected based on criteria specified in a conditional clause in the statement. The DML command also allows UPDATE and INSERT statements to be paired to provide TPump with an upsert capability. This allows TPump, in a single pass, to attempt an UPDATE and, if it fails, perform an INSERT on the same row. The DELETE statement removes entire rows from the target table whenever the deleting condition, as specified in a conditional clause in the statement, evaluates to true.
• The only information not yet provided in the task is the identity of the input data file, the starting and ending records in the file that are being used in this task, and other related information. This is done with the IMPORT command. This command tells the TPump utility to bring in file X, from record A through record N, to associate the layout name (and specifications) with the input records, and to apply the desired DML (INSERT, UPDATE, or DELETE) statements to each record.
• The last command in the script is the END LOAD command.
This command flags the end of the commands and statements for the task, and triggers the program to begin execution of the task.

For compatibility with the MultiLoad utility, multiple IMPORTs (up to four) are allowed within a single BEGIN/END LOAD pair. However, because TPump does not have an apply phase, there is no significant difference between a script containing four BEGIN/END LOAD pairs, each with one IMPORT, and a script with one BEGIN/END LOAD pair and four IMPORTs.

Procedure for Writing a Script

A complete TPump job includes:
• Invoking TPump
• Logging on to the Teradata Database and establishing the TPump support environment
• Specifying the TPump tasks
• Logging off from the Teradata Database and terminating TPump

Use the following procedure as a guide for writing TPump job scripts:

1 Invoke TPump, specifying your runtime options:
  • Normal or abbreviated (brief) printout
  • Number of buffers per session
  • Character set
  • Configuration file
  • Periodicity rate
  • Error logging function
  • Macro save option
  • Alternate run file
  • Verbose mode
  Refer to "Invoking TPump" for detailed information about how to specify these options.

2 Establish the TPump support environment using the support commands summarized in Table 2. As a minimum, this part of your TPump job must include:
  • A LOGTABLE command to specify the restart log table
  • A LOGON command to provide a logon string that is used to connect all Teradata SQL and TPump utility sessions with the Teradata Database

3 Specify the TPump task using the task commands summarized in Table 2.

4 If you want to specify another TPump task:
  • Use the support commands to modify the TPump support environment for the next task.
  • Use the task commands to specify the next task.
  Repeat these steps for each task in your TPump job.
Note: Though a single TPump job can include a number of different tasks, limiting your jobs to a single task for each invocation of TPump provides the highest assurance of a successful restart/recovery operation if a system failure interrupts your job.

5 Use the LOGOFF command to disconnect all active sessions with the Teradata Database and terminate TPump on your client system.

TPump Script Example

The following example shows what a simple TPump script and its corresponding output might look like. The lines that begin with 4-digit numbers (for example, 0001) are script lines; the rest are output.

**** 16:07:17 UTY6633 WARNING: No configuration file, using build defaults
========================================================================
=                                                                      =
=     Teradata Parallel Data Pump Utility    Release 12.00.00.00       =
=     Platform MVS                                                     =
=                                                                      =
========================================================================
=                                                                      =
=     Copyright 1997-200, NCR Corporation.  ALL RIGHTS RESERVED.       =
=                                                                      =
========================================================================
**** 16:07:17 UTY2411 Processing start date: TUE JUL 10, 2007
========================================================================
=                                                                      =
=                          Logon/Connection                            =
=                                                                      =
========================================================================
0001 .LOGTABLE tpperf1a_lt1a;
0002 .LOGON TDQY/tpperf1a,;
**** 16:07:18 UTY8400 Teradata Database Release: 12.00.00.00
**** 16:07:18 UTY8400 Teradata Database Version: 12.00.00.00
**** 16:07:18 UTY8400 Default character set: EBCDIC
**** 16:07:18 UTY8400 Maximum supported buffer size: 1M
**** 16:07:26 UTY6211 A successful connect was made to the RDBMS.
**** 16:07:26 UTY6217 Logtable 'TPPERF1A.TPPERF1A_LT1A' has been created.
========================================================================
=                                                                      =
=                    Processing Control Statements                     =
=                                                                      =
========================================================================
0003 .NAME TPUP1;
0004 .BEGIN LOAD SESSIONS 20 PACK 20 ROBUST ON SERIALIZE ON CHECKPOINT 0
     ERRORTABLE ET_TPPERF1A_TASK1A;
========================================================================
=                                                                      =
=                     Processing TPump Statements                      =
=                                                                      =
========================================================================
0005 .LAYOUT LAY1A;
0006 .FIELD F0 * integer key;
0007 .FIELD F1 * integer;
0008 .FIELD F2 * integer;
0009 .FIELD F3 * char(38);
0010 .DML LABEL ONE;
0011 UPDATE tpperf1a set F2 = F2 + 1 where F0 = :F0 and F1 = :F1;
0012 .IMPORT INFILE DATA FROM 1 THRU 96 LAYOUT LAY1A APPLY ONE;
0013 .END LOAD;
**** 16:07:27 UTY6609 Starting to log on sessions...
**** 16:07:28 UTY6610 Logged on 20 sessions.
========================================================================
=                                                                      =
=                      TPump Import(s) Beginning                       =
=                                                                      =
========================================================================
**** 16:07:28 UTY6630 Options in effect for following TPump Import(s):
     .       Tenacity: 4 hour limit to successfully connect load sessions.
     .   Max Sessions: 20 session(s).
     .   Min Sessions: 16 session(s).
     .     Checkpoint: 0 minute(s).
     .       Errlimit: No limit in effect.
     .   Restart Mode: ROBUST.
     .  Serialization: ON.
     .        Packing: 20 Statements per Request.
     .   StartUp Rate: UNLIMITED Statements per Minute.
**** 16:07:36 UTY6608 Import 1 begins.
**** 16:07:39 UTY6641 Since last chkpt., 96 recs. in, 96 stmts., 20 reqs
**** 16:07:39 UTY6647 Since last chkpt., avg. DBS wait time: 0.05
**** 16:07:39 UTY6612 Beginning final checkpoint...
**** 16:07:39 UTY6641 Since last chkpt., 96 recs. in, 96 stmts., 20 reqs
**** 16:07:39 UTY6647 Since last chkpt., avg. DBS wait time: 0.05
**** 16:07:40 UTY6607 Checkpoint Completes with 96 rows sent.
**** 16:07:40 UTY6642 Import 1 statements: 96, requests: 20
**** 16:07:40 UTY6643 Import 1 average statements per request: 4.80
**** 16:07:40 UTY6644 Import 1 average statements per record: 1.00
**** 16:07:40 UTY6645 Import 1 statements/session: avg. 4.80, min. 4.00, max. 5.00
**** 16:07:40 UTY6646 Import 1 requests/session: average 1.00, minimum 1.00, maximum 1.00
**** 16:07:40 UTY6648 Import 1 DBS wait time/session: avg. 0.05, min. 0.00, max. 1.00
**** 16:07:40 UTY6649 Import 1 DBS wait time/request: avg. 0.05, min. 0.00, max. 1.00
**** 16:07:40 UTY1803 Import processing statistics
.                                        IMPORT 1   Total thus far
.                                        =========  ==============
Candidate records considered:........          96.......        96
Apply conditions satisfied:..........          96.......        96
Errors loggable to error table:......           0.......         0
Candidate records rejected:..........           0.......         0
** Statistics for Apply Label : ONE
Type Database   Table or Macro Name   Activity
U    tpperf1a   tpperf1a              96
**** 16:07:42 UTY0821 Error table tpperf1a.ET_TPPERF1A_TASK1A is EMPTY, dropping table.
0014 .if &imp1_one_1 = 96 then;
**** 16:07:44 UTY2402 Previous statement modified to:
0015 .if 96 = 96 then;
0016 .display 'rowcount ok' to file systest;
0017 .else;
0018 .display 'rowcount not ok' to file systest;
0019 .endif;
0020 .if &sysetcnt = 0 then;
**** 16:07:44 UTY2402 Previous statement modified to:
0021 .if 0 = 0 then;
0022 .display 'no errors' to file systest;
0023 .else;
0024 .display 'errors!!!' to file systest;
0025 .endif;
========================================================================
=                                                                      =
=                          Logoff/Disconnect                           =
=                                                                      =
========================================================================
**** 16:07:55 UTY6216 The restart log table has been dropped.
**** 16:07:55 UTY6212 A successful disconnect was made from the RDBMS.
**** 16:07:55 UTY2410 Total processor time used = '0.23474 Seconds'
.    Start : 16:07:17 - MON JULY 16, 2007
.    End   : 16:07:55 - MON JULY 16, 2007
.    Highest return code encountered = '0'.

Viewing TPump Output

TPump reporting functions provide timely information about the status of tasks in progress and those just completed.

• TPump Statistics—The TPump Statistics facility provides information on the success or failure of TPump processing, with respect to data records transferred, target table row modifications, and error table statistics. These statistics are accumulated and presented at the end of the job.
• TPump Options Messages—The options messages list the settings of some important TPump task parameters.
• Logoff/Disconnect Messages—The logoff/disconnect messages report several key run statistics.

TPump Statistics

For each task, TPump accumulates statistical items and writes them to the customary output destination of the external system, SYSPRINT/stdout (or the redirected stdout), or the destination specified in the ROUTE command. The statistics listed in Table 13 are kept:

Table 13: TPump Statistics

1 — Candidate records considered: The number of records read.
2 — Apply conditions satisfied: The number of statements sent to the RDBMS. If there are no rejected or skipped records, this value is equal to the number of candidate records, multiplied by the number of APPLY statements referenced in the import.
3 — Errors loggable to error table: The number of records resulting in errors on the RDBMS. These records are found in the associated error table.
4 — Candidate records rejected: The number of records which are rejected by the TPump client code because they are formatted incorrectly.
5 — Statistics for Apply Label: This area breaks out the total activity count for each statement within each DML APPLY clause. The ‘Type’ column contains the values U for update, I for insert, and D for delete.
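The relationship described for statistic 2 can be sanity-checked with simple arithmetic. The following is a Python sketch, not TPump code; the function name is ours:

```python
# Sketch of the statistic-2 relationship from Table 13 (hypothetical helper).
# With no rejected or skipped records, "apply conditions satisfied" equals
# the candidate records multiplied by the number of APPLY statements
# referenced in the import.
def expected_apply_conditions(candidate_records, apply_count):
    # Valid only when no records are rejected or skipped.
    return candidate_records * apply_count

# The two-import example later in this chapter reads 100 records through
# two APPLY clauses and reports 200 apply conditions satisfied:
print(expected_apply_conditions(100, 2))  # prints 200
```

The same check works for the first script example: 96 records through a single APPLY yields 96 apply conditions satisfied.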
Note that unlike the other reported statistics, these values are NOT accumulated across multiple imports.

6 — Number of RDBMS requests sent: These statistics are displayed only in the verbose mode, which is selected as a runtime parameter, VERBOSE, in MVS, or -v in UNIX.

In addition, Teradata TPump receives a count of the number of rows deleted from the Teradata Database. Teradata TPump writes it either to SYSPRINT/stdout (or the redirected stdout), or the destination specified in the ROUTE command.

If a record is rejected due to an error, as in the case of a duplicate, missing, or extra insert, update, or delete row, the following statistical output shows that an error condition occurred.

.                                        IMPORT 1   Total thus far
.                                        =========  ==============
Candidate records considered:........           8.......         8  <-----(1)
Apply conditions satisfied:..........           8.......         8  <-----(2)
Errors loggable to error table:......           1.......         1  <-----(3)
Candidate records rejected:..........           1.......         1  <-----(4)
Number of RDBMS requests sent:.......           6.......         6  <-----(6)
** Statistics for Apply Label : LABELB
Type Database   Table or Macro Name   Activity
I    CME        TDBTB734_TAL          7         <-(5)

Restart Statistics

Teradata TPump stores statistics in the restart log table. After a restart, all statistics are properly restored.

Teradata TPump Statistical Output

The following is an example of Teradata TPump output. Lines that are marked on the right-hand side with <-----(n) are explained above.

**** 16:53:31 UTY6633 WARNING: No configuration file, using build defaults
========================================================================
=                                                                      =
=          Teradata Parallel Data Pump Utility Release 12.00.00.00     =
=                             Platform MVS                             =
=                                                                      =
========================================================================
=                                                                      =
=        Copyright 1997-2007, NCR Corporation. ALL RIGHTS RESERVED.    =
=                                                                      =
========================================================================
**** 16:53:31 UTY2411 Processing start date: MON JULY 16, 2007
========================================================================
=                                                                      =
=                          Logon/Connection                            =
=                                                                      =
========================================================================
0001 .LOGTABLE LT_SIGH;
0002 .LOGON pebble/cme,;
**** 16:53:32 UTY8400 Teradata Database Release: 12.00.00.00
**** 16:53:32 UTY8400 Teradata Database Version: 12.00.00.00
**** 16:53:32 UTY8400 Default character set: EBCDIC
**** 16:53:32 UTY8400 Maximum supported buffer size: 1M
**** 16:53:41 UTY6211 A successful connect was made to the RDBMS.
**** 16:53:41 UTY6217 Logtable 'CME.LT_SIGH' has been created.
========================================================================
=                                                                      =
=                    Processing Control Statements                     =
=                                                                      =
========================================================================
0003 CREATE TABLE TAB1, FALLBACK, NO JOURNAL
     (F0 integer
     ,F1 integer
     ,F2 integer
     ,F3 char(38))
     UNIQUE PRIMARY INDEX(F0) ;
**** 16:53:44 UTY1016 'CREATE' request successful.
0004 CREATE TABLE TAB2, FALLBACK, NO JOURNAL
     (F0 integer
     ,F1 integer
     ,F2 integer
     ,F3 char(38))
     UNIQUE PRIMARY INDEX(F0) ;
**** 16:53:48 UTY1016 'CREATE' request successful.
0005 .BEGIN LOAD SESSIONS 10 ROBUST ON
     SERIALIZE ON
     CHECKPOINT 10
     NOMONITOR
     ERRORTABLE ET_TEST1;
========================================================================
=                                                                      =
=                     Processing TPump Statements                      =
=                                                                      =
========================================================================
0006 .LAYOUT LAY1A;
0007 .FIELD F0 * integer key;
0008 .FIELD F1 * integer;
0009 .FIELD F2 * integer;
0010 .FIELD F3 * char(38);
0011 .DML LABEL TAB1PART1;
0012 INSERT into tab1 values (:F0,:F1,:F2,:F3);
0013 .DML LABEL TAB2PART1;
0014 INSERT into tab2 values (:F0,:F1,:F2,:F3);
0015 .DML LABEL TAB1UPSERT DO INSERT FOR MISSING UPDATE ROWS IGNORE DUPLICATE INSERT ROWS;
0016 UPDATE tab1 set F2=F2 + 1 where f0=:f0 + 50 and f1 > 4;
0017 INSERT into tab1 ( F0, F1, F2, F3) values (:F0 + 100,:F1,:F2,:F3);
0018 .DML LABEL TAB2UPSERT DO INSERT FOR MISSING UPDATE ROWS IGNORE DUPLICATE INSERT ROWS;
0019 UPDATE tab2 set F2=F2 + 1 where f0=:f0 + 50 and f1 > 4;
0020 INSERT into tab2 ( F0, F1, F2, F3) values (:F0 + 100,:F1,:F2,:F3);
0021 .IMPORT INFILE INDATA FROM 1 THRU 100 LAYOUT LAY1A APPLY TAB1PART1 APPLY TAB2PART1;
0022 .IMPORT INFILE INDATA FROM 1 THRU 100 LAYOUT LAY1A APPLY TAB1UPSERT APPLY TAB2UPSERT;
0023 .END LOAD;
**** 16:53:48 UTY6609 Starting to log on sessions...
**** 16:53:49 UTY6610 Logged on 10 sessions.
========================================================================
=                                                                      =
=                      TPump Import(s) Beginning                       =
=                                                                      =
========================================================================
**** 16:53:49 UTY6630 Options in effect for following TPump Import(s):
     .       Tenacity: 4 hour limit to successfully connect load sessions.
     .   Max Sessions: 10 session(s).
     .   Min Sessions: 8 session(s).
     .     Checkpoint: 10 minute(s).
     .       Errlimit: No limit in effect.
     .   Restart Mode: ROBUST.
     .  Serialization: ON.
     .        Packing: 20 Statements per Request.
     .   StartUp Rate: UNLIMITED Statements per Minute.
**** 16:54:00 UTY6608 Import 1 begins.
**** 16:54:05 UTY6641 Since last chkpt., 100 recs. in, 200 stmts., 10 reqs
**** 16:54:05 UTY6647 Since last chkpt., avg. DBS wait time: 0.30
**** 16:54:05 UTY6612 Beginning final checkpoint...
**** 16:54:05 UTY6641 Since last chkpt., 100 recs. in, 200 stmts., 10 reqs
**** 16:54:05 UTY6647 Since last chkpt., avg. DBS wait time: 0.30
**** 16:54:05 UTY6607 Checkpoint Completes with 200 rows sent.
**** 16:54:05 UTY6642 Import 1 statements: 200, requests: 10
**** 16:54:05 UTY6643 Import 1 average statements per request: 20.00
**** 16:54:05 UTY6644 Import 1 average statements per record: 1.00
**** 16:54:05 UTY6645 Import 1 statements/session: avg. 20.00, min. 20.00, max. 20.00
**** 16:54:05 UTY6646 Import 1 requests/session: average 1.00, minimum 1.00, maximum 1.00
**** 16:54:05 UTY6648 Import 1 DBS wait time/session: avg. 0.30, min. 0.00, max. 3.00
**** 16:54:05 UTY6649 Import 1 DBS wait time/request: avg. 0.30, min. 0.00, max. 3.00
**** 16:54:05 UTY1803 Import processing statistics
.                                        IMPORT 1   Total thus far
.                                        =========  ==============
Candidate records considered:........         100.......       100  <-----(1)
Apply conditions satisfied:..........         200.......       200  <-----(2)
Errors loggable to error table:......           0.......         0  <-----(3)
Candidate records rejected:..........           0.......         0  <-----(4)
Number of RDBMS requests sent:.......          10.......        10  <-----(6)
** Statistics for Apply Label : TAB1PART1
Type Database   Table or Macro Name   Activity
I    CME        tab1                  100       <-(5)
** Statistics for Apply Label : TAB2PART1
Type Database   Table or Macro Name   Activity
I    CME        tab2                  100
**** 16:54:19 UTY6608 Import 2 begins.
**** 16:54:29 UTY6641 Since last chkpt., 100 recs. in, 300 stmts., 171 reqs
**** 16:54:29 UTY6647 Since last chkpt., avg. DBS wait time: 0.00
**** 16:54:29 UTY6612 Beginning final checkpoint...
**** 16:54:29 UTY6641 Since last chkpt., 100 recs. in, 300 stmts., 171 reqs
**** 16:54:29 UTY6647 Since last chkpt., avg. DBS wait time: 0.00
**** 16:54:29 UTY6607 Checkpoint Completes with 200 rows sent.
**** 16:54:29 UTY6642 Import 2 statements: 300, requests: 171
**** 16:54:29 UTY6643 Import 2 average statements per request: 1.75
**** 16:54:29 UTY6644 Import 2 average statements per record: 1.50
**** 16:54:29 UTY6645 Import 2 statements/session: avg. 30.00, min. 30.00, max. 30.00
**** 16:54:29 UTY6646 Import 2 requests/session: average 17.10, minimum 17.00, maximum 18.00
**** 16:54:29 UTY6648 Import 2 DBS wait time/session: avg. 0.00, min. 0.00, max. 0.00
**** 16:54:29 UTY6649 Import 2 DBS wait time/request: avg. 0.00, min. 0.00, max. 0.00
**** 16:54:29 UTY1803 Import processing statistics
.                                        IMPORT 2   Total thus far
.                                        =========  ==============
Candidate records considered:........         100.......       200
Apply conditions satisfied:..........         200.......       400
Errors loggable to error table:......           0.......         0
Candidate records rejected:..........           0.......         0
** Statistics for Apply Label : TAB1UPSERT
Type Database   Table or Macro Name   Activity
U    CME        tab1                  50
I    CME        tab1                  50
** Statistics for Apply Label : TAB2UPSERT
Type Database   Table or Macro Name   Activity
U    CME        tab2                  50
I    CME        tab2                  50
**** 16:54:36 UTY0821 Error table CME.ET_TEST1 is EMPTY, dropping table.
========================================================================
=                                                                      =
=                          Logoff/Disconnect                           =
=                                                                      =
========================================================================
**** 16:54:49 UTY6216 The restart log table has been dropped.
**** 16:54:49 UTY6212 A successful disconnect was made from the RDBMS.
**** 16:54:49 UTY2410 Total processor time used = '1.0363 Seconds'
.    Start : 16:53:31 - MON JULY 16, 2007
.    End   : 16:54:49 - MON JULY 16, 2007
.    Highest return code encountered = '0'.

The above script has a realistic degree of complexity.
The script demonstrates a TPump job that contains two imports, each with at least two associated statements. For the first import there are two statements, each specified in a separate DML statement; the IMPORT statement references the two statements through two APPLY clauses. The second import adds complexity by having two statements in each DML statement. In this case, the two statements in each DML compose an upsert statement.

TPump Options Messages

The options message lists the settings of some important TPump task parameters. A few examples follow:

Example 1

The following example depicts a typical options message.

**** 17:09:34 UTY6630 Options in effect for following TPump Import(s):
     .       Tenacity: 4 hour limit to successfully connect load sessions.
     .   Max Sessions: 10 session(s).
     .   Min Sessions: 8 session(s).
     .     Checkpoint: 10 minute(s).
     .       Errlimit: 1 rejected record(s).
     .   Restart Mode: ROBUST.
     .  Serialization: ON.
     .        Packing: 20 Statements per Request.
     .   StartUp Rate: UNLIMITED Statements per Minute.

Example 2

In this example, the error limit is expressed as a percent of rows, not as a hard limit, the recovery mode is simple, and serialization is on.

**** 17:09:34 UTY6630 Options in effect for following TPump Import(s):
     .       Tenacity: 4 hour limit to successfully connect load sessions.
     .   Max Sessions: 4 session(s).
     .   Min Sessions: 4 session(s).
     .     Checkpoint: 5 minutes.
     .       Errlimit: 10% of records rejected.
     .   Restart Mode: SIMPLE.
     .  Serialization: ON.
     .        Packing: 20 Statements per Request.
     .   StartUp Rate: 500 Statements per Minute.

Example 3

In this example, there is no error limit in effect and tenacity has been set to zero.

**** 17:09:34 UTY6630 Options in effect for following TPump Import(s):
     .       Tenacity: Sessions must successfully connect on first try.
     .   Max Sessions: 1 session(s).
     .   Min Sessions: 1 session(s).
     .     Checkpoint: 5 minutes.
     .       Errlimit: No limit in effect.
     .   Restart Mode: ROBUST.
     .  Serialization: OFF.
     .        Packing: 40 Statements per Request.
     .   StartUp Rate: UNLIMITED Statements per Minute.

Logoff/Disconnect Messages

In response to the LOGOFF command, TPump completes the step by disconnecting active sessions and reporting on total run statistics. The logtable is either dropped or kept, depending on the success or failure of the previous activity. When you log off a TPump session, the following status messages are written to the SYSPRINT/stdout (or the redirected stdout) data destination, or to the destination specified in the ROUTE command.

**** 13:57:45 UTY6216 The restart log table has been dropped.
**** 13:57:45 UTY6212 A successful disconnect was made from the RDBMS.
**** 13:57:45 UTY2410 Total processor time used = '0.270389 Seconds'
.    Start : 13:57:16 - MON JULY 16, 2007
.    End   : 13:57:45 - MON JULY 16, 2007
.    Highest return code encountered = '0'.

Progress Monitoring

TPump differs from most other Teradata load utilities in that there is no support for it in QrySessn. Instead, the optional TPump Monitor (see Table 14) is the only method for remotely overseeing the progress of the utility. Note, however, that while TPump requests do appear in the QrySessn output, they are displayed as a collection of individual transactions instead of being summarized into one utility instance.

Monitoring TPump Jobs

TPump provides an optional monitoring tool to monitor and update TPump jobs. The TPump Monitor provides for run-time monitoring of the TPump job, allowing you, via a command-line interface, to track and alter the rate at which requests are issued to the RDBMS. The TPump Monitor provides the following functions:

• Provides a set of SQL scripts that create a Monitor Interface table. TPump updates this table approximately once every minute.
• Allows you to learn the status of an import by querying against the Monitor Interface table.
• Allows you to alter the statement rate of an import by updating the Monitor Interface table.

Monitor Interface Table

Use SQL scripts shipped with TPump to create a Monitor Interface Table (SysAdmin.TPumpStatusTbl) in the RDBMS where TPump maintains information about an import. TPump both reads commands from and updates status in the Monitor Interface Table. This table is required in order to use the TPump Monitor functionality, but is otherwise optional. If the table does not exist, the worst that will happen is that TPump issues a warning message indicating this fact. This table must be secure, so it is created by the DBA. An SQL script, tpumpar.csql, is provided in the TPump installation that performs the appropriate setup. The tpumpar.csql script includes an action request.

Table 14 contains the following columns (other columns exist in order to support future functionality):

Table 14: Monitor Interface Table

LogDB (VARCHAR(32)) — Part of the primary index. The name of the log table database.
LogTable (VARCHAR(32)) — Part of the primary index. The name of the log table.
Import (INTEGER) — Part of the primary index. The import number. (There may be multiple imports in a TPump job.)
UserName (VARCHAR(32)) — The name of the user running the job. Used for security.
InitStartDate (DATE) — The initial start date of the import.
InitStartTime (FLOAT) — The initial start time of the import.
CurrStartDate (DATE) — The last date this import was started (may be a restart).
CurrStartTime (FLOAT) — The last time this import was started (may be a restart).
LastUpdateDate (DATE) — The last date this import updated the table.
LastUpdateTime (FLOAT) — The last time this import updated the table.
RestartCount (INTEGER) — The number of times this import has been restarted.
Complete (CHAR(1)) — ‘Y’ if this import is complete.
RecordsOut (INTEGER) — The number of statements sent to the RDBMS.
RecordsSkipped (INTEGER) — The number of records skipped for apply conditions.
RecordsRejcted (INTEGER) — The number of records rejected for bad data (on the host).
RecordsRead (INTEGER) — The number of records read.
RecordsErrored (INTEGER) — The number of records resulting in errors on the RDBMS.
StmtsUnLimited (CHAR(1)) — ‘Y’ if this import is running without a statement rate limit. If ‘N’, refer to StmtsDesired for the statement rate.
StmtsDesired (INTEGER) — The statement rate (if StmtsUnLimited is ‘N’).
PeriodsDesired (INTEGER) — Allows you to specify the desired periodicity.
PleaseAbort (CHAR(1)) — Set to ‘Y’ if you desire to abort.
RequestAction (CHAR(1)) — Before processing any action request, a message is logged stating that the requested action is being taken. The following action requests are permitted:
• Blank – No action
• C – Take a checkpoint and continue the job
• P – Take a checkpoint and pause until a subsequent action request resumes or terminates the job
• R – Resume the job
• T – Take a checkpoint and terminate the job with rc = 8. The job may be restarted.
• A – Terminate the job immediately with rc = 12. The job may be restarted.
RequestChange (CHAR(1)) — Set to ‘Y’ by the user if you desire TPump to pick up the changes. Set to ‘N’ by TPump after changes are accepted.

Security concerns dictate that the SQL script that sets up the Monitor Interface table for TPump monitoring also establishes a set of views and macros in addition to the TPumpStatusTbl. Although database administrators can access the table directly, using the macros and views is recommended because they provide for security and ensure rational use of the table. Without action on the part of the database administrator, no normal user can update the status of jobs.
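The RequestChange handshake between a monitoring user and TPump can be sketched in miniature. The following is a Python sketch only, with a dict standing in for one row of SysAdmin.TPumpStatusTbl; the helper function names are ours, not part of TPump:

```python
# Hypothetical sketch of the Table 14 rate-change handshake.
# A plain dict stands in for one row of SysAdmin.TPumpStatusTbl.

def request_rate_change(row, stmts_per_minute):
    """What a monitoring user does: post a new rate and raise the flag."""
    row["StmtsUnLimited"] = "N"
    row["StmtsDesired"] = stmts_per_minute
    row["RequestChange"] = "Y"   # 'Y' asks TPump to pick up the changes

def poll_for_changes(row):
    """What TPump does roughly once a minute: accept any pending change."""
    if row["RequestChange"] == "Y":
        new_rate = row["StmtsDesired"] if row["StmtsUnLimited"] == "N" else None
        row["RequestChange"] = "N"   # TPump sets 'N' after accepting
        return new_rate
    return "unchanged"

row = {"StmtsUnLimited": "Y", "StmtsDesired": 0, "RequestChange": "N"}
request_rate_change(row, 500)
print(poll_for_changes(row))  # prints 500
```

In the real table the same handshake is performed through the supplied macros (shown below in this section), which bundle the SELECT of current counters and the UPDATE of the request columns into one request.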
To grant controlled update access to the TPumpStatusTbl, a single command will suffice:

GRANT EXEC ON TPumpMacro TO _____;

The macros for TPump monitoring reside in the databases TPumpMacro and SysAdmin.

TPump Monitor Views

The following views of the Monitor Interface table are available:

View SysAdmin.TPumpStatus

This view is for database administrators and lets them see all running TPump imports.

CREATE VIEW SysAdmin.TPumpStatus AS
  LOCKING SysAdmin.TPumpStatusTbl FOR ACCESS
  SELECT * FROM SysAdmin.TPumpStatusTbl;

View SysAdmin.TPumpStatusX

This view is for all users and provides a view of TPump jobs. However, this view will only show you jobs which you “own”.

CREATE VIEW SysAdmin.TPumpStatusX AS
  LOCKING SysAdmin.TPumpStatusTbl FOR ACCESS
  SELECT * FROM SysAdmin.TPumpStatusTbl
  WHERE UserName = USER;

TPump Monitor Macros

TPump Monitor provides a set of macros that you can use to update the Monitor Interface table and to monitor and control individual TPump import jobs. The following TPump Monitor macros are provided:

Macro TPumpMacro.
TPumpUpdateSelect

This macro is provided for database administrators to use to manipulate and monitor individual TPump jobs:

CREATE MACRO SysAdmin.TPumpUpdateSelect (
  LogDB          VARCHAR(32),
  LogTable       VARCHAR(32),
  UserName       VARCHAR(32),
  Import         INTEGER,
  RequestChange  CHAR(1),
  StmtsUnLimited CHAR(1),
  StmtsDesired   INTEGER,
  PeriodsDesired INTEGER )
AS (
  LOCK ROW WRITE /* OR LOCKING Sysadmin.TPumpStatus FOR WRITE */
  SELECT RecordsOut
       , RecordsSkipped
       , RecordsRejcted
       , RecordsRead
       , RecordsErrored
  FROM SysAdmin.TPumpStatusTbl
  WHERE UserName = :UserName
    AND LogDB    = :LogDB
    AND Import   = :Import
    AND LogTable = :LogTable ;
  UPDATE SysAdmin.TPumpStatusTbl
  SET RequestChange  = :RequestChange,
      StmtsUnLimited = :StmtsUnLimited,
      StmtsDesired   = :StmtsDesired,
      PeriodsDesired = :PeriodsDesired
  WHERE UserName = :UserName
    AND LogDB    = :LogDB
    AND LogTable = :LogTable
    AND Import   = :Import ;
);

Macro TPumpMacro.UserUpdateSelect

The macro UserUpdateSelect is provided to let you monitor and update your own TPump jobs.

CREATE MACRO TPumpMacro.UserUpdateSelect (
  LogDB          VARCHAR(32),
  LogTable       VARCHAR(32),
  Import         INTEGER,
  RequestChange  CHAR(1),
  StmtsUnLimited CHAR(1),
  StmtsDesired   INTEGER,
  PeriodsDesired INTEGER )
AS (
  LOCK ROW WRITE /* OR LOCKING Sysadmin.TPumpStatus FOR WRITE */
  SELECT RecordsOut
       , RecordsSkipped
       , RecordsRejcted
       , RecordsRead
       , RecordsErrored
  FROM SysAdmin.TPumpStatusTbl
  WHERE UserName = USER
    AND LogDB    = :LogDB
    AND Import   = :Import
    AND LogTable = :LogTable ;
  UPDATE SysAdmin.TPumpStatusTbl
  SET RequestChange  = :RequestChange,
      StmtsUnLimited = :StmtsUnLimited,
      StmtsDesired   = :StmtsDesired,
      PeriodsDesired = :PeriodsDesired
  WHERE UserName = USER
    AND LogDB    = :LogDB
    AND LogTable = :LogTable
    AND Import   = :Import ;
);

Estimating Space Requirements

This section discusses space requirements for the TPump log table.
A row of approximately 200 bytes is written to the log table on each of the following events:

1 One row is written at initialization.
2 One row is written for each SQL statement issued through the TPump support environment.
3 One row is written at the BEGIN LOAD command.
4 One row is written at the END LOAD command.
5 Two rows are written for each IMPORT command.
6 One row is written for each statement used in a load (between the BEGIN LOAD command and the END LOAD command).
7 One row is written for each checkpoint taken.
8 In the ROBUST mode, for each packed request, a number of partial checkpoint rows are written to the log between checkpoints. The rows are deleted each time a checkpoint is written. The partial checkpoint row contains 117 + (12 * packfactor) bytes per transaction. The number of partial checkpoints will vary, depending on the checkpoint frequency, the power of the loading host, and the power of the Teradata target RDBMS.

Thus, an equation for the space is:

  200
+ 200 * each statement in the support environment
+ 400 * each BEGIN/END LOAD
+ 200 * each statement issued as DML
+ 200 * the estimated number of checkpoints
+ (117 + (12 * packfactor)) * the number of partial checkpoints
A simplified version would be:

R = 200 + 200S + 400L + 200D + 200C + (117 + 12P)N

where:
R = Required space for the TPump log table
S = Each SQL statement issued through the support environment
L = Each BEGIN/END LOAD command pair
D = Each DML statement
C = Estimated number of checkpoints
P = Packfactor
N = Number of partial checkpoints

Space Calculation Example

The following example of how TPump log table space is derived takes a simple load that consists of the following script:

.LOGTABLE CME.TLddNT14H;
.LOGON OPNACC1/CME,CME;
DROP TABLE TBL14TA;
DROP TABLE TBL14TB;
DROP TABLE tlnt14err;
CREATE TABLE TBL14TA,FALLBACK
(ABYTEINT BYTEINT,
ASMALLINT SMALLINT,
AINTEGER INTEGER,
ADECIMAL DECIMAL (5,2),
ACHAR CHAR (5),
ABYTE BYTE(1),
AFLOAT FLOAT,
ADATE DATE)
UNIQUE PRIMARY INDEX (ASMALLINT);
CREATE TABLE TBL14TB,FALLBACK
(ABYTEINT BYTEINT,
ASMALLINT SMALLINT,
AINTEGER INTEGER,
ADECIMAL DECIMAL (5,2),
ACHAR CHAR (5),
ABYTE BYTE(1),
AFLOAT FLOAT,
ADATE DATE)
UNIQUE PRIMARY INDEX (ASMALLINT);
/*****************************************************************/
/* BEGIN TLOAD WITH ALL THE OPTIONS SPECIFIED SUCH AS ERRLIMIT, **/
/* CHECKPOINT, SESSIONS, TENACITY                               **/
/*****************************************************************/
.BEGIN LOAD ERRLIMIT 5 CHECKPOINT 15 SESSIONS 4 1 TENACITY 2
 ERRORTABLE tlnt14err ROBUST ON PACK 20;
.LAYOUT LAY1A;
.FILLER ATEST * BYTEINT;
.FIELD ABYTEINT * BYTEINT;
.FIELD ASMALLINT * SMALLINT;
.FIELD AINTEGER * INTEGER;
.FIELD ADECIMAL * DECIMAL (5,2);
.FIELD ACHAR * CHAR (5);
.FIELD ABYTE * BYTE(1);
.FIELD AFLOAT * FLOAT;
.FIELD ADATE * DATE;
.DML LABEL LABELA IGNORE DUPLICATE ROWS IGNORE MISSING ROWS IGNORE EXTRA ROWS;
INSERT INTO TBL14TA VALUES (:ABYTEINT,:ASMALLINT,:AINTEGER,:ADECIMAL,
:ACHAR,:ABYTE,:AFLOAT,:ADATE);
.DML LABEL LABELB IGNORE DUPLICATE ROWS IGNORE MISSING ROWS IGNORE EXTRA ROWS;
INSERT INTO TBL14TB VALUES
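The simplified formula translates directly into a small calculator. The following is a Python sketch, not part of TPump; note that the simplified formula carries no term for the two rows written per IMPORT command (event 5 above) and prices each checkpoint row at 200 bytes, so its result can differ slightly from a hand tally that accounts for those rows separately:

```python
# R = 200 + 200S + 400L + 200D + 200C + (117 + 12P)N  (simplified formula above)
def tpump_log_space(s, l, d, c, p, n):
    """Estimated TPump log table space in bytes.

    s: SQL statements issued through the support environment
    l: BEGIN/END LOAD command pairs
    d: DML statements
    c: estimated number of checkpoints
    p: pack factor
    n: partial checkpoint rows retained between checkpoints (ROBUST mode)
    """
    return 200 + 200 * s + 400 * l + 200 * d + 200 * c + (117 + 12 * p) * n

# Example: 6 support statements, 1 load pair, 2 DML statements,
# 5 checkpoints, PACK 20, 1,440 partial checkpoint rows.
print(tpump_log_space(s=6, l=1, d=2, c=5, p=20, n=1440))  # prints 517280
```

Because partial checkpoint rows dominate the total, the estimate is most sensitive to the pack factor, the statement throughput, and the checkpoint interval.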
(:ABYTEINT,:ASMALLINT,:AINTEGER,:ADECIMAL,
:ACHAR,:ABYTE,:AFLOAT,:ADATE);
.IMPORT INFILE ./tlnt014.dat LAYOUT LAY1A FROM 1 FOR 400
 APPLY LABELA WHERE ATEST = 1
 APPLY LABELB WHERE ATEST = 2;
.END LOAD;
.LOGOFF;

From this script the space requirements can be calculated to be:

• 200 bytes for initialization +
• 200 bytes * 6 for support environment statements +
• 200 bytes * 2 for DML SQL statements +
• 400 bytes for the BEGIN/END LOAD pair +
• 200 bytes for the IMPORT

which is a starting total of 2,400 bytes.

Further, assume that the Teradata Database can accept about 32 statements per second and that the load takes a little more than an hour to complete. The space for partial and complete checkpoints is calculated with the following steps:

1 32 statements per second translates to 1,920 statements per minute.
2 1,920 / 20 (the packing factor) = 96 partial checkpoints per minute.
3 Multiply by 15 (the 15-minute checkpoint frequency) = 1,440 partial checkpoint rows maximum.
4 Each partial checkpoint row is 117 + (12 * 20) = 357 bytes, so 514,080 bytes are in partial checkpoint rows.
5 Given that the load takes just more than an hour, assume 5 checkpoints are written at 300 bytes each, or 1,500 bytes.

Now we have the grand total of space in the log table: 2,400 + 514,080 + 1,500 = 517,980 bytes.

CHAPTER 3 TPump Commands

This chapter describes the TPump commands and Teradata SQL statements that you can execute from the TPump utility. Experienced TPump users can also refer to the simplified command descriptions in the TPump chapter of the Teradata Tools and Utilities Command Summary. That book provides the syntax diagrams and a brief description of the syntax variables for each Teradata client utility.
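The checkpoint arithmetic above can be re-derived step by step. The following is a Python sketch, not TPump code; the 32 statements/second throughput and the 300-byte complete-checkpoint rows are the assumptions stated in the example:

```python
# Re-derivation of the space calculation example (sketch only).
stmts_per_second = 32
stmts_per_minute = stmts_per_second * 60                 # 1,920
pack_factor = 20
partial_cp_per_minute = stmts_per_minute // pack_factor  # 96
checkpoint_minutes = 15
max_partial_rows = partial_cp_per_minute * checkpoint_minutes  # 1,440
partial_row_bytes = 117 + 12 * pack_factor               # 357
partial_bytes = max_partial_rows * partial_row_bytes     # 514,080

# Fixed rows: init + 6 support statements + 2 DML statements
# + BEGIN/END LOAD pair + IMPORT.
base_bytes = 200 + 200 * 6 + 200 * 2 + 400 + 200         # 2,400
checkpoint_bytes = 5 * 300                               # 5 checkpoints, 300 bytes each

total = base_bytes + partial_bytes + checkpoint_bytes
print(total)  # prints 517980, matching the grand total above
```

Partial checkpoint rows account for over 99% of the estimate here, which is why ROBUST mode with a large pack factor and long checkpoint intervals needs the most log table space.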
Syntax Notes

This section provides information you should know before using TPump commands and Teradata SQL statements.

Each TPump command:
• Must begin on a new line.
• Must start with a period (.) character. In this document, TPump command periods are shown only in syntax diagrams and examples, not in the narrative text.
• Must end with a semicolon (;) character.
• May continue for as many lines as necessary, as long as it satisfies the beginning and ending requirements.

Statements are standard Teradata SQL statements and are not preceded by periods.

See Appendix A: “How to Read Syntax Diagrams” for more information about how to read the syntax diagrams used in this book.

TPump Commands

Table 15 is an alphabetical list of the commands supported by TPump. The syntax and use of these commands is described in detail in this chapter.

Table 15: TPump Commands

ACCEPT — Accepts the data type and value of one or more utility variables from an external source.
BEGIN LOAD — Indicates the start of a TPump task and specifies the parameters for executing the task.
DATEFORM — Lets you define the form of the DATE data type specifications for the TPump job.
DISPLAY — Writes messages to the specified destination.
DML — Defines a label and error treatment option for the Teradata SQL DML statement(s) following the DML command. INSERT, UPDATE, DELETE, and EXECUTE are the DML statement options.
ELSE — Followed by commands and statements that execute when the preceding IF command is false.
ENDIF — Exits from the conditional IF or IF/ELSE command sequences. ENDIF is followed by commands and statements resuming the program.
END LOAD — Indicates completion of TPump command entries and initiates the task. This is the last command of a TPump task.
FIELD — Defines a field of the data source record. Fields specified by this command are sent to the Teradata Database.
This command is used with the LAYOUT command. FILLER Defines a field in the data source that is not sent to the Teradata Database. This command is used with the LAYOUT command. IF The IF command is followed by a conditional expression which, if true, executes commands and statements following the IF command. IMPORT Identifies the data source, layout, and optional selection criteria to the client program. LAYOUT Specifies layout of the externally stored data records to be used in the TPump task. This command is used in conjunction with an immediately following sequence of FIELD, FILLER, and TABLE commands. LOGOFF Disconnects all active sessions and terminates execution of TPump on the client. LOGON Establishes a Teradata SQL session on the Teradata Database, and specifies the LOGON string to be used in connecting all sessions required by subsequent functions. LOGTABLE Identifies the table to be used for journaling checkpoint information required for safe, automatic restart of TPump in the event of a client or Teradata Database failure. NAME Sets the utility variable &SYSJOBNAME with a job name of up to 16 characters. PARTITION Establishes session partitions to transfer SQL requests to the Teradata Database. Teradata Parallel Data Pump Reference Chapter 3: TPump Commands Syntax Notes Table 15: TPump Commands (continued) Command Definition ROUTE Identifies the destination of output produced by TPump. RUN FILE Invokes the specified external source as the current source of commands and statements. SET Assigns a data type and a value to a utility variable. SYSTEM Suspends TPump to issue commands to the local operating system. TABLE Identifies a table whose column names and data descriptions are used as the names and data descriptions of the input record fields. Used in place of, or in addition to, the FIELD command. This command is used with the LAYOUT command. 
Note: When the UTF16 session character set is used, the TABLE command generates a field of CHAR(2n) for a CHAR(n) column and a field of VARCHAR(2m) for a VARCHAR(m) column, because each character in the column takes two bytes of storage under the UTF16 session character set.
TPump Teradata SQL Statements
The following Teradata SQL statements supported by TPump are included in this chapter because they require special considerations for use with TPump. They are used for loading purposes and for creating TPump macros. The syntax and use of these Teradata SQL statements are described in detail in this chapter.
Table 16: TPump Teradata SQL Statements
DATABASE: Changes the default database qualification for all DML statements.
DELETE: Removes specified rows from a table.
EXECUTE: Specifies a user-created (predefined) macro for execution. The macro named in this statement resides in the Teradata Database and specifies the type of DML statement (INSERT, UPDATE, or DELETE) being handled by the macro.
INSERT: Adds new rows to a table by directly specifying the row data to be inserted.
UPDATE Statement and Atomic Upsert: Changes field values in existing rows of a table.
ACCEPT
Purpose
The ACCEPT command accepts data types and values from an external source and uses them to set one or more utility variables.
Syntax
.ACCEPT var [, var ...] FROM FILE fileid
    [IGNORE {charpos1 | charpos1 THRU | THRU charpos2 | charpos1 THRU charpos2}] ;
where
var: name of the utility variable that is to be set with the value accepted from the designated source. Character string values appear as quoted strings in the data file.
fileid: data source of the external system. The external system DD (or similar) statement specifies a file.
On UNIX and Windows systems, fileid is infilename (the path name for a file). If the path name has embedded white space characters, enclose the entire path name in single or double quotes.
On MVS, fileid is a true DDNAME. If DDNAME is specified, TPump reads data records from the specified source. A DDNAME must obey the same rules for its construction as Teradata SQL column names, except that:
• the “at” sign (@) is allowed as an alphabetic character
• the underscore (_) is not allowed
The DDNAME must also obey the applicable rules of the external system. If DDNAME represents a data source on magnetic tape, the tape may be either labelled or nonlabelled (if the operating system supports it).
On VM/CMS, fileid is a FILEDEF name.
charpos1 and charpos2: start and end character positions of a field in each input record which contains extraneous information. TPump ignores the specified field(s) as follows:
• If charpos1 is specified, TPump ignores only the single specified character position.
• If charpos1 THRU is specified, TPump ignores character positions from charpos1 through the end of the record.
• If THRU charpos2 is specified, TPump ignores character positions from the beginning of the record through charpos2.
• If charpos1 THRU charpos2 is specified, TPump ignores character positions from charpos1 through charpos2.
Usage Notes
A single record, row, or input line is accepted from the designated source. Ensure that there is only one record in the file from which the ACCEPT command is getting the variables. If multiple variables are coded, each is sequentially assigned input text up to the first white space character encountered that is not within a quoted string. Input text for numeric values must be delimited only by white space or record boundaries. Input text for character strings must be enclosed in apostrophes.
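The token-assignment rule just described (split on white space outside quoted strings, assign tokens to variables in order, leave leftover variables undefined) can be approximated with a short Python sketch. This is an illustration only, not TPump's implementation; the function name is hypothetical, and ACCEPT's comment handling is not modeled.

```python
import shlex

def assign_accept_vars(record: str, var_names):
    """Approximate how ACCEPT assigns one input record to utility variables:
    tokens are split on white space outside quoted strings; variables beyond
    the available tokens stay undefined (None here)."""
    tokens = shlex.split(record)          # honors 'quoted strings'
    if len(tokens) > len(var_names):      # more responses than variables
        print("warning: not enough variables to hold all responses")
    return {name: (tokens[i] if i < len(tokens) else None)
            for i, name in enumerate(var_names)}

print(assign_accept_vars("32 'Tom'", ["age", "name"]))
# {'age': '32', 'name': 'Tom'}
```

With the record `32` and the same two variables, `name` would remain undefined, matching the rule that unused variables stay NULL.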
For example:
.Accept age, name from file info;
The data record provided to satisfy the preceding ACCEPT should include two fields. The following example shows two sample data records, where the first is correct but the second is not:
32 'Tom'   /* This line contains valid data. */
32 Tom     /* Tom is invalid data. */
An additional method of placing comments in input text is as follows:
32 'Tom'; This line contains valid data.
When the number of variables listed is greater than the number of responses available, unused variables remain undefined (NULL). If there are not enough variables to hold all responses, a warning message is issued. If the input source is a file, the next record (starting with the first) of the file is always retrieved.
BEGIN LOAD
Purpose
The BEGIN LOAD command initiates or restarts a TPump task, specifying the number of sessions to use and any other parameters needed to execute the task.
Syntax
The options of the BEGIN LOAD command, shown here in a linearized form of the syntax diagram:
.BEGIN LOAD SESSIONS number [threshold]
    [ERRORTABLE [dbname.]tname [APPEND] [NODROP] [QUEUETABLE]]
    [ERRLIMIT errcount [errpercent]]
    [CHECKPOINT frequency]
    [SERIALIZE ON|OFF]
    [DATAENCRYPTION ON|OFF]
    [ARRAYSUPPORT ON|OFF]
    [TENACITY hours]
    [PACK statements | PACKMAXIMUM]
    [LATENCY seconds]
    [RATE statement_rate]
    [RETRYTIMES nn]
    [SLEEP minutes]
    [NOTIMERPROCESS]
    [NOATOMICUPSERT]
    [NOMONITOR]
    [ROBUST ON|OFF]
    [MACRODB dbname]
    [NOTIFY OFF|LOW|MEDIUM|HIGH|ULTRA [EXIT name [TEXT 'string'] [MSG 'string']]] ;
where
SESSIONS: keyword for the number of TPump sessions.
number: number of sessions to be logged on for update purposes for TPump. A TPump task logs on and uses the number of sessions specified. One additional session is used for performing various utility functions. There is no default value for number; it must be specified.
Neither is there a maximum value, except for system-wide session limitations, which vary among machines. Limiting the number of sessions conserves resources on both the external system and the Teradata Database. This conservation is at the expense of a potential decrease in throughput and increase in elapsed time.
threshold: minimum number of sessions to be logged on for update purposes for the utility. When logging on sessions, if the limits are reached above the threshold value, TPump stops trying to log on and uses whatever sessions are already logged on. If the sessions run out before the threshold is reached, TPump logs off all sessions, waits for the time determined by the SLEEP value, and tries to log on again.
ERRORTABLE: optional keyword for identifying a database and error table. You can use a database name as a qualifying prefix to the error tables. Specifying a database that is not your production database avoids cluttering your production system with error tables. Because that database should probably have a lot of PERM space, the space does not have to be increased for all databases with tables involved in the TPump task.
Caution: Do not share the restart log table between two or more TPump jobs. Each TPump job must have its own restart log table to ensure that it runs correctly. If you do not use a distinct restart log table for each TPump job, the results are unexpected. In addition, you may not be able to restart one or more of the affected jobs.
APPEND: tells TPump to use the existing error table. If the specified error table does not exist, TPump creates it. If the structure of the existing error table is not compatible with the error table TPump creates, the job runs into an error when TPump tries to insert into or update the error table.
NODROP: tells TPump not to drop the error table even if it is empty at the end of the job. NODROP can be used alone, or with APPEND to persist the error table.
QUEUETABLE: tells TPump to create the error table as a queue table.
dbname.: the qualified database for the error table. If the database is not specified, the database which contains the log table is used. The period following the dbname separates the database name from the tname parameter. If a different database is specified, it may help to avoid cluttering the production database with error tables.
tname: error table that receives information about errors detected during the load. tname may be preceded by a database name qualifier. This table is referred to as the error table or ET table. TPump does not explicitly specify the level of protection applied to this table, using instead the default protection level applied to the database wherein the error table is placed. If the database specifies fallback, tname becomes fallback. The default error table name is composed of the job name, followed by an underscore and the sequence number of the load, then an underscore and ET, as in jobname_nnn_ET. tname identifies a nonexisting table for a nonrestart task, or an existing table for a restart task. For all errors inserted in this error table, the identifiers for the offending combination of statement and data record are included in the appropriate row of tname. The columns in the error table allow you to identify a specific data record and statement combination which produced an error. The column names and definitions of the error table are:
ImportSeq: a byteint containing the IMPORT command sequence number.
DMLSeq: a byteint containing the sequence number of the DML command within the command file.
SMTSeq: a byteint containing the sequence number of the DML statement within the DML command.
ApplySeq: a byteint containing the sequence number of the APPLY clause within its IMPORT command.
SourceSeq: an integer containing the position of a data record within a data source.
DataSeq: a byteint identifying the data source. This value is always one.
ErrorCode: an integer containing an error return code.
ErrorMsg: the corresponding error message for the error code.
ErrorField: a smallint which, if valid, indicates the bad field. The names of record fields sent to the Teradata Database are specified via the LAYOUT command, in conjunction with FIELD and TABLE commands.
HostData: a variable-length byte string containing the data sent by the external system.
ERRLIMIT: optional keyword for setting a limit on records rejected for errors. When the ERRLIMIT is exceeded, TPump performs a checkpoint, then terminates the job. The data read before ERRLIMIT was exceeded is submitted and completed before the job is terminated. This means that when a job is terminated because ERRLIMIT was exceeded, there may be more error records in the error table than the number specified in ERRLIMIT. To facilitate diagnosis of data errors, the ERRLIMIT should be greater than the number of statements packed into one request.
errcount: error threshold for controlling the number of rejected records. Usage depends on whether it is used with the errpercent parameter.
1 When used without the errpercent parameter, specifies, as an unsigned integer, the number of records that can be rejected and recorded in tname during a load (all records sent between the BEGIN LOAD and END LOAD commands). The default is no limit.
2 When used with the errpercent parameter (which is approximate), specifies the maximum number of records that must be sent to the Teradata Database before the errpercent parameter is applied.
For example, if errcount = 100 and errpercent = 5, then 100 records must be sent to the Teradata Database before the approximate 5 percent rejection limit is applied. If only the first five records are rejected when the 100th record is sent, the limit is not exceeded. If there are six rejections, however, then the limit is exceeded. After the 100th record is sent, TPump stops processing if the 5 percent limit has been exceeded. When the limit has been exceeded, TPump writes an error message to the external system’s customary message destination and terminates the task. All tables in use are left in their state at the time of the termination. This allows you to correct errors in data records and restart the task from the last checkpoint. If a restart is not possible or not desired, any unwanted tables should be dropped.
CHECKPOINT: keyword indicating the number of minutes between the occurrences of checkpoints. This is followed by a frequency value.
Bypassing the initial checkpoint operation may cause data corruption when TPump restarts a job, or when a DBS restart occurs during a TPump job, whether or not the job is running in SIMPLE mode or ROBUST mode. If you do not specify a CHECKPOINT frequency, check pointing occurs every 15 minutes by default. Whether specified or not, checkpoints are written at the end of each data input source. Note: Checkpoints should not be set if you use an FDL-compatible INMOD routine with the FOR, FROM, or THRU options. When you use an FDL-compatible INMOD routine with the FOR, FROM, or THRU options, TPump terminates and an error message appears if the checkpoint frequency is other than zero. DATAENCRYPTION ON/OFF keyword to encrypt import data and the request text during the communication between TPump and Teradata Database If ON, the encryption will be performed. If DATAENCRYPTION is not specified, the default is OFF. The "-y" runtime parameter applies the encryption to all connected sessions, which include the control session and the load sessions. This option only applies the encryption to the load sessions, which are the sessions specified by the SESSIONS keyword in the BEGIN LOAD command, and overrides the "-y" runtime parameter when OFF is explicitly specified. For example, assuming the PARTITION command is not used in the job, when "-y" runtime parameter is specified with the job and DATAENCRYPTION OFF is specified in the script, the encryption will only apply to the control session. Similarly, assuming the PARTITION command is not used in the job when "-y" runtime parameter is not specified with the job, and DATAENCRYPTION ON is specified in the script, the encryption will apply to all load sessions but not the control session. When the PARTITION command is used, the encryption setting explicitly specified in the PARTITION command will override the setting of this option over the sessions defined by the PARTITION command. 
ArraySupport ON/OFF: "ArraySupport ON|OFF" is an option of both the .BEGIN LOAD command and the .DML command. When "ArraySupport ON" is specified in the .BEGIN LOAD command, the .DML commands enclosed in the .BEGIN LOAD and .END LOAD command pair use the ArraySupport feature for their DML statements, unless "ArraySupport OFF" is specified for the .DML command. The default value of ArraySupport for the .BEGIN LOAD command is OFF. When "ArraySupport ON|OFF" is not specified with the .DML command, the default value of ArraySupport for that .DML command is the effective setting of ArraySupport in the .BEGIN LOAD command where the .DML command resides. When "ArraySupport ON|OFF" is specified at the .DML command, the specified value overrides the default setting determined by the .BEGIN LOAD command. When a .DML command is using the ArraySupport feature, it must contain one and only one DML statement, and the session partition that the .DML command references must be used by this .DML command exclusively. If the DML statement is an UPSERT-type statement, it can be specified as a pair of INSERT/UPDATE statements with the DO INSERT FOR MISSING UPDATE clause. TPump creates its equivalent form of UPDATE … ELSE INSERT … (that is, an atomic upsert) and uses it as the actual DML statement. Alternatively, an UPDATE … ELSE INSERT … statement can be directly specified with the DO INSERT FOR MISSING UPDATE clause. The non-atomic form of UPSERT is not supported by TPump Array Support.
TENACITY: keyword (with hours parameter) defining how long the utility tries to log on the sessions needed to perform the TPump job. If a logon is denied, TPump delays for the time specified by the SLEEP parameter (the default is six minutes) and retries the logon. It retries until either the logon succeeds or the number of hours specified by TENACITY is exceeded.
If the TENACITY parameter is not specified, the utility retries the logons for four hours.
hours: TPump tenacity factor, as an integral number of hours. Specifies how long TPump keeps trying to log on the required sessions. The default value for hours is 4 if the parameter is not specified. If hours is specified as 0, TPump does not retry logons after a logon fails because of a capacity limit. When a “no more sessions” error appears (either a 301 return code from a workstation CLI or a 513 return code from a mainframe CLI), TPump drops the sessions already acquired and terminates the job without trying another logon.
LATENCY: keyword for flushing stale buffers.
Note: When using the TPump latency option with the Named Pipe Access Module, the need_full_block = no option should be added in the Named Pipe Access Module initialization string.
seconds: flushing threshold based on the number of seconds the oldest record has resided in the buffer. LATENCY cannot be less than one second. If the SERIALIZE parameter is set to OFF, only the current buffer can possibly be stale. If SERIALIZE is ON, the number of stale buffers can range from zero to the number of sessions.
NOTIMERPROCESS: keyword to tell TPump not to fork a child process as a timer process. When a child process is forked, the SIGUSR2 signal notifies the parent process when the latency period expires. When a child process is not forked, the SIGALRM signal notifies the TPump process when the latency period expires. A child process is necessary for the latency function to work properly on UNIX platforms when the MQSeries Access Module is used.
minutes: number of minutes to wait between unsuccessful logon attempts due to session limit errors on the Teradata Database or CLIv2. If SLEEP is not specified, the default between unsuccessful logon attempts is 6 minutes.
SERIALIZE ON/OFF: keyword to set the state (ON/OFF) of the serialization feature which, if ON, guarantees that operations on a given key combination (row) occur serially. If SERIALIZE is not specified, the default is OFF. This feature is meaningful only when a primary key for the loaded data is specified by using the KEY option of the FIELD command. To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in the TPump job.
PACKMAXIMUM: keyword requesting TPump to dynamically determine the maximum possible PACK factor for the current load. The maximum value is 600. Displayed in message UTY6652, the value thus determined should be specified explicitly on subsequent runs, because the use of PACKMAXIMUM requires iterative interactions with the RDBMS during initialization to heuristically determine the maximum possible PACK factor.
PACK: keyword for the number of statements to pack into a multiple-statement request. The maximum value is 600. Packing improves network/channel efficiency by reducing the number of sends and receives between the application and the Teradata Database.
statements: number of statements, as a positive integer of up to 600, to pack into a multiple-statement request. The default value is 20 statements per request. Under certain conditions, TPump may determine that the pack factor has been set too high. TPump then automatically lowers the pack setting to an appropriate value, issues warning message UTY6625, for instance: “UTY6625 WARNING: Packing has been changed to 12 statements per request”, and continues. Packing improves network/channel efficiency by reducing the number of sends/receives between the application and the RDBMS. The packing factor is validated by sending a fully packed request to the Teradata Database using a prepare.
This test checks for syntax problems and requests that are excessively large and may overwhelm the parser. To simplify the script development process, TPump ignores certain errors returned by an overloaded parser, shrinks the request, retries the prepare until it executes successfully and, finally, issues a warning noting the revised packing factor size. When this happens, the TPump script should be modified to eliminate the warning, thereby avoiding the time-consuming process of shrinking the request. A packing failure may occur if the source parcel length does not match the data defined. If this happens, TPump issues the message: “UTY2819 WARNING: Packing may fail because input data does not match with the data defined.” To resolve this problem, increase the packing factor and resubmit the job.
RATE: keyword for entering the rate at which statements are sent to the Teradata Database.
RETRYTIMES: keyword for retry times.
nn: number of retry times. The default is 16. If nn equals 0, the retry times are set to 16. If RETRYTIMES is set, it only takes effect for the requests/data between the BEGIN LOAD and END LOAD pair.
statement_rate: initial maximum rate at which statements are sent to the Teradata Database per minute. The statement rate must be a positive integer. If the statement rate is unspecified, the rate is unlimited. If the statement_rate is less than the statement packing factor, TPump sends requests smaller than the packing factor. If the TPump Monitor is in use, the statement_rate can be changed later on.
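The relationship between the statement rate, the pack factor, and the resulting number of round trips can be sketched as follows. This is an illustration of the stated behavior (each multi-statement request is one send/receive, and requests shrink when the rate is below the pack factor), not TPump's internal scheduling; the function name is hypothetical.

```python
import math

def requests_per_minute(statement_rate, pack_factor):
    """Rough round-trip count per minute: statements per minute divided by
    the effective pack factor. If the rate is below the pack factor, TPump
    sends smaller requests, so the effective pack shrinks to the rate."""
    effective_pack = min(pack_factor, statement_rate)
    return math.ceil(statement_rate / effective_pack)

print(requests_per_minute(1920, 20))   # 96 requests/minute
print(requests_per_minute(10, 20))     # 1: a single 10-statement request
```

At 1920 statements per minute and the default pack of 20, the estimate matches the 96 requests per minute used in the Chapter 2 space example.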
SLEEP: keyword for the number of minutes to sleep.
NOATOMICUPSERT: keyword to perform non-atomic upsert operations for UPSERT DMLs in the job script if these UPSERT DMLs are not provided in the Atomic UPSERT form.
NOMONITOR: keyword to prevent TPump from checking for statement rate changes from, or updating status information for, the TPump Monitor.
ROBUST ON/OFF: The OFF parameter signals TPump to use simple restart logic. In this case, restarts cause TPump to begin where the last checkpoint occurred in the job. Any processing that occurred after the checkpoint is redone. This method does not have the extra overhead of the additional database writes in the robust logic.
Caution: Certain errors may cause reprocessing, resulting in extra error table rows due to reexecuting statements (attempting to re-insert rows, for example). Or, if the target table allows duplicate rows, reexecuting statements may cause extra duplicate rows to be inserted into the target table instead of causing extra error table rows. Simple logic is adequate for DML statements that can be repeated without changing the results of the operation. Examples of statements which are NOT simple in this sense include the following:
• INSERTs into tables that allow duplicate rows (MULTISET tables).
• Self-referencing DML statements such as “UPDATE FOO SET A=A+1...” or “UPDATE FOO SET A = 3 WHERE A=4”.
MACRODB: keyword for the database to contain any macros used by TPump.
dbname: name of the database which is to contain any macros built/used by TPump. This database overrides the default placement of macros into the database which contains the log restart table.
NOTIFY: TPump implementation of the notify user exit option:
• NOTIFY OFF suppresses the notify user exit option.
• NOTIFY LOW enables the notify user exit option for those events signified by “Yes” in the Low Notification Level column of Table 17.
• NOTIFY MEDIUM enables the notify user exit option for the most significant events, as specified by “Yes” in the Medium Notification Level column of Table 17.
• NOTIFY HIGH enables the notify user exit option for every TPump event that involves an operational decision point, as specified by “Yes” in the High Notification Level column of Table 17.
• NOTIFY ULTRA enables the notify user exit option for every TPump event that involves an operational decision point, as specified by “Yes” in the Ultra Notification Level column of Table 17.
EXIT name: keyword phrase that calls a user-defined exit, where name is the name of a user-supplied library with a member name of _dynamn. The default library names are:
• NOTFYEXT for channel-attached VM and MVS client systems
• libnotfyext.so for network-attached UNIX and Windows client systems
The exit must be written in C, or in a language with a runtime environment that is compatible with C. On some versions of UNIX, you may have to add ./ prefix characters to the EXIT name specification if the module is in the current directory.
TEXT 'string': user-supplied string of up to 80 characters that TPump passes to the named exit routine. The string specification must be enclosed in single quote characters (').
MSG 'string': user-supplied string of up to 16 characters that TPump logs to:
• The operator’s console for channel-attached VM and MVS client systems
• The system log for network-attached UNIX and Windows client systems
The string specification must be enclosed in single quote characters (').
Table 17: Events that Create Notifications
Event                      Low  Medium  High  Ultra  Signifies
Initialize                 Yes  Yes     Yes   Yes    Successful processing of the Notify option (BEGIN LOAD command).
File or INMOD Open         No   No      Yes   Yes    Successful processing of the IMPORT command.
Checkpoint Begin           No   No      Yes   Yes    TPump started a checkpoint.
Checkpoint End             No   No      Yes   Yes    TPump successfully completed a checkpoint.
Error Table                No   No      Yes   Yes    Successful processing of the SEL COUNT(*) request for the error table.
Table Statistics           No   Yes     Yes   Yes    TPump has successfully written the table statistics.
Teradata Database Restart  No   Yes     Yes   Yes    TPump received a crash error from Teradata or CLI.
CLIv2 Error                Yes  Yes     Yes   Yes    TPump received a CLIv2 error.
RDBMS Error                Yes  Yes     Yes   Yes    A Teradata Database error that terminates TPump.
Exit                       Yes  Yes     Yes   Yes    TPump completed a load task.
Import Begin               No   No      Yes   Yes    TPump is about to start reading records.
Import End                 No   No      Yes   Yes    The last record has been read.
Interim Run Statistics     No   No      No    Yes    TPump is about to update the Monitor Interface table, or TPump successfully completed a checkpoint, or an import has just completed successfully.
DML Error                  No   No      Yes   Yes    TPump is about to log a DML error to the error table.
Usage Notes
Multiple tables can be targeted by a single TPump job. If the script author is uncertain whether or not to use ROBUST restart logic, it is always safe to use the ROBUST ON parameter. To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in the TPump job.
The statement rate per minute you set using the RATE keyword is also affected by the periodicity value. By default, TPump uses a periodicity value of four when enforcing the statement rate limit. You can adjust the periodicity from 1 to 600 using a run-time parameter. For example, if you set the statement rate at 1600 and the periodicity at 10, then the maximum number of statements processed is 160 (1600/10) statements every 6 (60/10) seconds.
Caution: A LOGOFF command entered after a BEGIN LOAD and before the matching END LOAD logs you off the TPump utility.
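The rate/periodicity arithmetic described in the usage notes above can be expressed as a small helper. The function name is illustrative only; the default periodicity of four is the documented one.

```python
def rate_schedule(statement_rate, periodicity=4):
    """Statements per interval and interval length (in seconds) implied by
    the RATE keyword and the periodicity run-time parameter (default 4)."""
    per_interval = statement_rate // periodicity   # statements each interval
    interval_seconds = 60 // periodicity           # length of each interval
    return per_interval, interval_seconds

print(rate_schedule(1600, 10))   # (160, 6): 160 statements every 6 seconds
```

This reproduces the documented example (RATE 1600, periodicity 10); with the default periodicity of four, the same rate yields 400 statements every 15 seconds.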
DATABASE
Purpose
TPump supports the Teradata SQL DATABASE statement, which changes the default database qualification for all unqualified DML and DDL statements.
Syntax
DATABASE database ;
where
database: new default database qualification. Changes the database from the one originally specified by the BEGIN LOAD command.
Usage Notes
The DATABASE command only affects native SQL commands. In particular, it has no effect on the BEGIN LOAD command. The DATABASE command does affect INSERT, UPDATE, DELETE, and EXEC statements issued as part of a load. (When TPump logs on sessions, it immediately issues a DATABASE statement on each session.) The DATABASE command does not affect the placement of TPump macros.
DATEFORM
Purpose
The DATEFORM command lets you define the form of the DATE data type specifications for the TPump job.
Syntax
.DATEFORM { INTEGERDATE | ANSIDATE } ;
where
INTEGERDATE: keyword that specifies integer DATE data types for the TPump job. This is the default Teradata DATE data type specification for TPump jobs if you do not enter a DATEFORM command.
ANSIDATE: keyword that specifies ANSI fixed-length CHAR(10) DATE data types for the TPump job.
Usage Notes
The following topics describe the things you should consider when using the DATEFORM command.
Command Frequency and Placement:
• You can only use one DATEFORM command.
• You must enter the command before the LOGON command.
Data Type Conversions: When you use the ANSIDATE specification, you must convert ANSI/SQL DateTime data types to fixed-length CHAR data types when specifying the column/field names in the TPump FIELD command.
See the Usage Notes subsection of the FIELD command for a description of the fixed-length CHAR representations for each DATE, TIME, TIMESTAMP, and INTERVAL data type specification.

- Release Applicability: The ANSIDATE specification is valid for TPump jobs on the Teradata Database for UNIX.

DELETE

Purpose

TPump supports the DELETE Teradata SQL statement, which removes rows from a table.

Syntax

DELETE [FROM] tname WHERE conditional ;

where
- tname: the table from which rows are to be deleted. tname is qualified either explicitly by database name, or by the current default database.
- WHERE conditional: conditional clause identifying the row(s) to delete. The conditional clause uses values from input data record fields as defined by a FIELD command or TABLE command of the layout referenced by an IMPORT using this statement.

Usage Notes

The following notes describe how to use DELETE statements following a DML command. A DELETE statement may also be used in the support environment; normal rules for DELETE are followed in that case.

TPump operates only on single-table statements, so DELETE statements must not contain any joins.

To delete records from a table, the username specified on the LOGON command must have the DELETE privilege on the specified table.

When the condition(s) of the DELETE statement's WHERE clause are evaluated, the result can be definitely true, definitely false, or indeterminate. If the result is true for a specific row, TPump deletes the row. An indeterminate result, due to an abnormal arithmetic condition such as underflow, overflow, or division by zero, is treated as an error, and TPump records both the row and the error code in the error table.

The DELETE statement must identify only one object.
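The true/false/indeterminate evaluation described above can be illustrated with a short Python sketch. The predicate and field names are hypothetical; in a real job, the database evaluates the WHERE clause, and the point here is only the three-way outcome:

```python
def apply_delete(rows, predicate):
    """Classify each input row the way TPump treats a DELETE's WHERE clause:
    true -> delete, false -> skip, arithmetic error (indeterminate) -> error table."""
    deleted, error_table = [], []
    for row in rows:
        try:
            matched = predicate(row)
        except (ZeroDivisionError, OverflowError) as exc:   # indeterminate result
            error_table.append((row, type(exc).__name__))   # record row and error code
            continue
        if matched:
            deleted.append(row)
    return deleted, error_table

rows = [{"EmpNo": 1, "Hours": 0}, {"EmpNo": 2, "Hours": 4}]
# Hypothetical predicate that divides by a field value; EmpNo 1 divides by zero.
deleted, errs = apply_delete(rows, lambda r: 8 / r["Hours"] > 1)
```

Here the row for EmpNo 2 is deleted, while the EmpNo 1 row lands in the error table with its error code rather than being silently skipped.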
Remember the following when constructing scripts:

- A DELETE statement can be applied to either a table or a view, provided that the view does not specify a join.
- Equality values for all the primary index columns should normally be specified in the WHERE clause. The OR construct can be used in the WHERE clause of a DELETE statement; alternatively, two or more separate DML statements (one per OR term) can be used, with the DML statements applied conditionally with the APPLY clause of the IMPORT command. The nature of the alternatives usually makes one of the methods more appropriate.
- The column(s) specified in this clause need not be part of any index, but can be one or more nonindexed columns. This clause may specify nonequality values for any combination of columns of unique indexes, or any values for other columns.
- The maximum number of INSERT, UPDATE, DELETE, and EXECUTE statements that can be referenced in an IMPORT is 127.
- The maximum number of DML statements that can be packed into a request is 600. The default number of statements packed is 20.

Example

The following example uses an input data source containing a series of one-field, four-byte records. Each record contains the value (EmpNum) of the primary index column (EmpNo) of a row to be deleted from the Employee table.

.BEGIN LOAD SESSION number;
.LAYOUT Layoutname;
.FIELD EmpNum 1 INTEGER;
.DML LABEL DMLlabelname;
DELETE Employee WHERE EmpNo = :EmpNum;
.IMPORT INFILE Infilename LAYOUT Layoutname APPLY DMLlabelname;
.END LOAD;

DISPLAY

Purpose

The DISPLAY command can be used to write messages to the specified destination.

Syntax

DISPLAY 'text' [TO] FILE fileid ;

where
- 'text': the text to be written to the specified output destination.
- fileid: the data source of the external system. The external system DD (or similar) statement specifies a file.
  - UNIX and Windows: infilename (the path name for a file). If the path name has embedded white space characters, enclose the entire path name in single or double quotes.
  - MVS: a true DDNAME. If a DDNAME is specified, TPump reads data records from the specified source. A DDNAME must obey the same rules for its construction as Teradata SQL column names, except that:
    - the "at" sign (@) is allowed as an alphabetic character
    - the underscore (_) is not allowed
  A DDNAME must obey the applicable rules of the external system. If the DDNAME represents a data source on magnetic tape, the tape may be either labelled or nonlabelled (if the operating system supports it).
  - VM/CMS: a FILEDEF name.

Usage Notes

Utility variables are replaced by their values before text is displayed. This is done by preceding the variable name with an ampersand (&). To display the name of a utility variable, code two ampersands (&&) instead of one.

To display an apostrophe within the text string, use two consecutive apostrophes (single quotes) to distinguish it from both the single quotes enclosing the string and a regular double quote.

In UNIX-based systems, if outfilename is used to redirect stdout as well as being the file in a DISPLAY command, the results written to outfilename may be incomplete due to conflicting writes to the same file.

On UNIX systems, you can use an asterisk (*) as the fileid specification to direct the display messages to the system console/standard output (stdout) device. The system console is the:
- display screen in interactive mode
- standard output device in batch mode

DML

Purpose

The DML command defines a label and error treatment options for one or more immediately following DML statements.
DML statements relevant to a TPump job are INSERT, UPDATE, DELETE, and EXECUTE, with UPDATE and INSERT statements sometimes paired to form either a basic upsert or an Atomic upsert operation.

Syntax

.DML LABEL label
  [ {MARK | IGNORE} DUPLICATE [INSERT | UPDATE] ROWS ]
  [ {MARK | IGNORE} MISSING [UPDATE | DELETE] ROWS ]
  [ {MARK | IGNORE} EXTRA [UPDATE | DELETE] ROWS ]
  [ DO INSERT FOR MISSING UPDATE ROWS ]
  [ SERIALIZEON (serialize_on_field [, serialize_on_field ...]) ]
  [ USE (use_field [, use_field ...]) ]
  [ PARTITION partition_name ]
  [ ARRAYSUPPORT ON|OFF ] ;

where
- LABEL: keyword indicating that the following parameter is a label for the DML statements that follow.
- label: unique label to be used for the immediately following set of one or more DML statements. A label must obey the same rules for its construction as Teradata SQL column names. The label name may be referenced in the APPLY clause of an IMPORT command.
- MARK: keyword indicating that the system should make a duplicate, missing, or extra INSERT, UPDATE, or DELETE row entry in the error table and continue processing. If MARK is set and a uniqueness violation occurs, the duplicate rows go to the uniqueness violation table. In the case of an upsert, both the INSERT and UPDATE portions must fail for an error to be recorded. A row is a duplicate as follows: the table must be a nonunique primary index (NUPI) set table. (A set table does not allow duplicates; a MULTISET table does. MULTISET tables are only supported with Teradata for Windows NT.) Rows with NUPIs are duplicates if all the columns of a row are the exact duplicate of another row. If neither MARK nor IGNORE is specified for duplicate rows, MARK applies to both INSERTs and UPDATEs. Similarly, if neither MARK nor IGNORE is specified for missing or extra rows, MARK applies to both UPDATEs and DELETEs. MARK is the default for:
  - both UPDATEs and DELETEs that refer to missing or extra rows
  - duplicate rows arising from both INSERTs and UPDATEs, except when those statements are combined to form an upsert, in which case the default is IGNORE.
- IGNORE: keyword indicating that the system should not make an error table entry for the duplicate, missing, or extra INSERT, UPDATE, or DELETE row; the system should continue processing instead. TPump determines whether a row is a duplicate as follows: the table must be a NUPI set table. TPump treats rows with NUPIs as duplicates if all the columns of a row are the exact duplicate of another row. IGNORE DUPLICATE ROWS does not apply if there are any unique indexes in the row. If neither INSERT nor UPDATE is specified for duplicate rows, IGNORE applies to both INSERTs and UPDATEs. Similarly, if neither UPDATE nor DELETE is specified for missing or extra rows, IGNORE applies to both UPDATEs and DELETEs. IGNORE is the default condition for an upsert operation.
- INSERT: enables the upsert feature (when used as DO INSERT FOR MISSING UPDATE ROWS or DO INSERT ROWS). An upsert saves time while loading a database: it completes, in one pass, an operation which requires two passes for other utilities. The DML statements that follow this option must be in the order of a single UPDATE statement followed by a single INSERT statement. This option first executes the UPDATE statement. If the UPDATE fails because the target row does not exist, TPump automatically executes the INSERT statement. This capability lets you update the database without first presorting the data. Otherwise, the data would have to be sorted into:
  - rows that need to be updated
  - rows that need to be inserted
  Further information on the usage and restrictions of the upsert feature appears in the following usage notes.
- PARTITION: optional keyword used to name a session partition to be used for all SQL requests associated with this DML command. If this keyword is not present, a session created from the SESSIONS specification is used. Note: If serialization of two or more DML statements is required, the statements cannot be put in different partitions. Serialization requires that all DML statements with identical row hash values be submitted from the same session.
- partition_name: parameter identifying the partition name. The partition name must obey the same rules for its construction as Teradata SQL column names.
- SERIALIZEON: keyword used to turn serialization on for the fields you specify. The SERIALIZEON keyword may be used before, after, or between any IGNORE or MARK clauses.
- serialize_on_field: parameter identifying the field names for which you want to turn serialization on. This is the same field name used in the LAYOUT command used by the INSERT statement and referenced by the APPLY clause you have written. Separate the field names with commas and enclose them in parentheses.
- USE: keyword used to specify the fields that are to be used with a DML's SQL statements. This keyword lets you specify which FIELDs from the LAYOUT command are actually needed for each DML, so that data from all fields is not sent. The USE keyword may be placed before, after, or between any IGNORE/MARK clauses.
- use_field: parameter identifying the field names to use. Every LAYOUT FIELD used by any of the DML's SQL statements must be enumerated in the USE list; otherwise, an error occurs. Separate the field names with commas and enclose them in parentheses.
- ARRAYSUPPORT ON|OFF: "ArraySupport ON|OFF" is an option to both the .BEGIN LOAD command and the .DML command. When "ArraySupport ON" is specified in the .BEGIN LOAD command, the .DML commands enclosed in the .BEGIN LOAD and .END LOAD command pair will use the ArraySupport feature for their DML statements, unless "ArraySupport OFF" is specified for the .DML command. The default value of ArraySupport for the .BEGIN LOAD command is OFF. When "ArraySupport ON|OFF" is not specified with the .DML command, the default value of ArraySupport for that .DML command is the effective setting of ArraySupport in the .BEGIN LOAD command where the .DML command resides. When "ArraySupport ON|OFF" is specified at the .DML command, the specified value overrides the default setting determined by the .BEGIN LOAD command. When a .DML command is using the ArraySupport feature, it must contain one and only one DML statement, and the session partition that the .DML command references must be used exclusively by this .DML command. If the DML statement is an UPSERT-type statement, it can be specified as a pair of INSERT/UPDATE statements with the DO INSERT FOR MISSING UPDATE clause; TPump will create its equivalent form of UPDATE ... ELSE INSERT ... (that is, an Atomic upsert) and use it as the actual DML statement. Alternatively, an UPDATE ... ELSE INSERT ... statement can be directly specified with the DO INSERT FOR MISSING UPDATE clause. The non-atomic form of UPSERT is not supported by TPump Array Support.

Usage Notes

The SQL EXECUTE command must be used between the BEGIN LOAD command and the END LOAD command.

All INSERT, UPDATE, DELETE, and EXECUTE statements specified in the TPump script should fully specify the primary index of the referenced table to prevent the generation of table-level locks.

A maximum of 600 DML statements may be packed into a request; the default is 20 statements.
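The packing rule above (default 20 statements per request, never more than 600) amounts to simple chunking; a hedged Python sketch, with a hypothetical function name:

```python
def pack_requests(statements, pack=20, max_pack=600):
    """Group DML statements into multi-statement requests, `pack` per request.

    Mirrors the packing rule: the default PACK is 20 statements per request,
    and no request may carry more than 600."""
    pack = min(pack, max_pack)
    return [statements[i:i + pack] for i in range(0, len(statements), pack)]

requests = pack_requests([f"stmt{i}" for i in range(45)])
print([len(r) for r in requests])  # -> [20, 20, 5]
```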
TPump assumes that row hash locking is used by INSERT, UPDATE, DELETE, and EXECUTE statements. If row hash locking is not used, TPump will run anyway, but may encounter trouble because table-level locking will cause each statement to block. In addition, TPump does not support UPDATE or EXECUTE statements that modify the primary index of the target table. TPump performs no checking to prevent the script author from creating DML that requests unreasonable updates, except that TPump will not use Atomic UPSERT if the UPDATE portion of the UPSERT specifies an unreasonable update. This restriction is imposed by the Teradata Database.

IGNORE DUPLICATE ROWS does not apply if there are any unique indexes in the row.

TPump converts INSERT, UPDATE, and DELETE statements into macro equivalents and, depending on the packing specified, submits multiple statements in one request. TPump also supports the EXECUTE statement, which can be used to bypass the macro creation step for frequently executed macros. For more information on the EXECUTE statement, refer to EXECUTE in this chapter.

The maximum number of INSERT, UPDATE, DELETE, and EXECUTE statements that can be referenced in an IMPORT is 127.

At the end of an IMPORT, an environmental variable is established for each DML command executed. TPump variables are not limited to 30 characters. These variables contain the activity counts associated with each statement. The variables created are of the form:

&IMP<n>_<Apply label>_<x>

where
- n = the number of the IMPORT, from one through four
- Apply label = the label of the clause containing the DML command in question
- x = the number of the statement within the containing APPLY clause

Serialization

The SERIALIZEON keyword lets you turn serialization on for the fields you specify. You can use the SERIALIZEON keyword before, after, or between any IGNORE or MARK clauses.
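The serialization rule (all statements with identical hash values of the serialization key must be submitted from the same session) can be sketched in Python. This is illustrative only: TPump uses the Teradata row hash, not MD5, and the function name is hypothetical.

```python
import hashlib

def session_for(key_fields, sessions: int) -> int:
    """Pick a session index for a record so that all records with the same
    SERIALIZEON key values always land on the same session."""
    digest = hashlib.md5("|".join(key_fields).encode()).digest()
    return int.from_bytes(digest[:4], "big") % sessions

# Two records with the same key value always map to the same session:
a = session_for(["00001234"], sessions=20)
b = session_for(["00001234"], sessions=20)
assert a == b
```

Because the assignment is a pure function of the key, operations on the same row are applied in input order on one session rather than racing across sessions.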
You can also use the SERIALIZEON keyword with the SERIALIZE keyword in the BEGIN LOAD command and with the KEY keyword in the FIELD command. When you do, the DML-level serialization ignores and overrides the BEGIN LOAD-level serialization. In addition, you can mix DML serialized APPLYs with nonserialized DML APPLYs in the same IMPORT command. See "BEGIN LOAD" and "FIELD" for details about these commands.

Sample Scripts

The following is an example using the SERIALIZEON keyword:

.LOGTABLE TPLOG01;
.LOGON slugger/dbc,dbc;
.BEGIN LOAD ERRLIMIT 100 50
  CHECKPOINT 15
  SESSIONS 20
  TENACITY 2
  ERRORTABLE TPERR01
  PACK 30
  ROBUST ON
  NOMONITOR;
.LAYOUT LAY01;
.FIELD cc1 * CHAR(8);
.FIELD cc2 * CHAR(4);
.FIELD cc3 * CHAR(6);
.FIELD cc4 * CHAR(62);
.DML LABEL LABEL01
  DO INSERT FOR MISSING UPDATE ROWS
  SERIALIZEON (CC1);
UPDATE TPTBL01 SET C4 = :CC4 WHERE C1 = :CC1;
INSERT TPTBL01 (C1, C2, C4) VALUES (:CC1, :CC2, :CC4);
UPDATE TPTBL02 SET C4 = :CC4 WHERE C1 = :CC1 AND C2 = :CC2;
INSERT TPTBL02 (C1, C2, C3, C4) VALUES (:CC1, :CC2, :CC3, :CC4);
.IMPORT INFILE .\TPDAT.txt FORMAT TEXT
  LAYOUT LAY01
  APPLY LABEL01
  APPLY LABEL02;
.END LOAD;
.LOGOFF;

The following is an example using the USE keyword:

.LOGTABLE TPLOG01;
.LOGON slugger/cfl,cfl;
DROP TABLE TPERR01;
DROP TABLE TPTBL01;
DROP TABLE TPTBL02;
DROP TABLE TPTBL03;
CREATE TABLE TPTBL01(
  C1 INTEGER,
  C2 VARCHAR(30),
  C3 VARCHAR(30),
  C4 VARCHAR(30),
  C5 VARCHAR(30),
  C6 VARCHAR(30))
UNIQUE PRIMARY INDEX (C1);
CREATE TABLE TPTBL02(
  C1 INTEGER,
  C2 VARCHAR(30),
  C3 VARCHAR(30),
  C4 VARCHAR(30),
  C5 VARCHAR(30))
UNIQUE PRIMARY INDEX (C1);
CREATE TABLE TPTBL03(
  C1 INTEGER,
  C2 VARCHAR(30),
  C3 VARCHAR(30),
  C4 VARCHAR(30),
  C5 VARCHAR(30),
  C6 VARCHAR(30),
  C7 VARCHAR(30),
  C8 VARCHAR(30),
  C10 VARCHAR(30),
  C11 VARCHAR(30))
UNIQUE PRIMARY INDEX (C1);
.BEGIN LOAD CHECKPOINT 15
  SESSIONS 5
  ERRORTABLE TPERR01
  NOMONITOR;
.LAYOUT LAY01;
.FIELD FLD1 * VARCHAR(10);
.FIELD FLD2 * VARCHAR(10);
.FIELD FLD3 * VARCHAR(10);
.FIELD FLD4 * VARCHAR(15);
.FIELD FLD5 * VARCHAR(20);
.FIELD FLD6 * VARCHAR(25);
.FIELD FLD7 * VARCHAR(30);
.FIELD FLD8 * VARCHAR(30);
.FIELD FLD9 * VARCHAR(1);
.FIELD FLD10 * VARCHAR(30);
.FIELD FLD11 * VARCHAR(30);
.DML LABEL LABEL01 USE(FLD1, FLD2, FLD4, FLD6, FLD8, FLD10);
INSERT TPTBL01 (C1, C2, C3, C4, C5, C6)
  VALUES (:FLD1, :FLD2, :FLD4, :FLD6, :FLD8, :FLD10);
.DML LABEL LABEL02 USE(FLD1, FLD3, FLD5, FLD7, FLD11);
INSERT TPTBL02 (C1, C2, C3, C4, C5)
  VALUES (:FLD1, :FLD3, :FLD5, :FLD7, :FLD11);
.DML LABEL LABEL03;
INSERT TPTBL03 (C1, C2, C3, C4, C5, C6, C7, C8, C10, C11)
  VALUES (:FLD1, :FLD2, :FLD3, :FLD4, :FLD5, :FLD6, :FLD7, :FLD8, :FLD10, :FLD11);
.IMPORT INFILE INDATA FORMAT VARTEXT ','
  LAYOUT LAY01
  APPLY LABEL01 WHERE FLD9 = 'A'
  APPLY LABEL02 WHERE FLD9 = 'B'
  APPLY LABEL03;
.VERSION;
.END LOAD;
.LOGOFF;

Note that, as in the above example, DML USE APPLYs can be mixed with DML APPLYs not using the USE keyword within the same IMPORT.
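The effect of the USE clause is a projection of the LAYOUT record down to the fields a given DML actually needs. A minimal Python sketch (the field names are hypothetical, and the error behavior models the rule that every field a DML uses must appear in the USE list):

```python
def project_fields(record: dict, use_list=None) -> dict:
    """Send only the fields a DML needs, as the USE clause does.

    With no USE list, every LAYOUT field is sent; with one, only the
    listed fields are sent, in USE-list order."""
    if use_list is None:
        return dict(record)
    missing = [f for f in use_list if f not in record]
    if missing:
        raise KeyError(f"USE names fields not in the LAYOUT: {missing}")
    return {f: record[f] for f in use_list}

rec = {"FLD1": "1", "FLD2": "a", "FLD3": "b"}
print(project_fields(rec, ["FLD1", "FLD3"]))  # -> {'FLD1': '1', 'FLD3': 'b'}
```

Trimming unused fields this way reduces the data shipped per statement, which is the stated point of the USE clause.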
The following is an example using partitioning:

.LOGTABLE TPLOG01;
.LOGON cs4400s3/cfl,cfl;
DROP TABLE TPTBL01;
DROP TABLE TPTBL02;
DROP TABLE TPERR01;
CREATE TABLE TPTBL01, FALLBACK(
  C1 CHAR(12) not null,
  C2 CHAR(8) not null)
PRIMARY INDEX (C1);
CREATE TABLE TPTBL02, FALLBACK(
  C1 CHAR(12),
  C2 CHAR(8),
  C3 CHAR(6))
UNIQUE PRIMARY INDEX (C1);
.BEGIN LOAD ERRLIMIT 100 50
  CHECKPOINT 15
  TENACITY 2
  ERRORTABLE TPERR01
  ROBUST off
  serialize on;
.LAYOUT LAY02;
.FIELD cc1 * CHAR(12) key;
.FIELD cc2 * CHAR(8);
.FIELD cc3 * CHAR(6);
.filler space1 * char(1);
.partition part1 pack 10 sessions 10;
.partition part2 sessions 5 1 packmaximum;
.DML LABEL LABEL01 partition part1
  DO INSERT FOR MISSING UPDATE ROWS
  ignore extra update rows
  use(cc1, cc2);
UPDATE TPTBL01 SET C2 = :CC2 WHERE C1 = :CC1;
INSERT TPTBL01 (C1, C2) VALUES (:CC1, :CC2);
.DML LABEL LABEL02 partition part2
  serializeon(cc1)
  ignore extra update rows
  DO INSERT FOR MISSING UPDATE ROWS;
UPDATE TPTBL02 SET C2 = :CC2 WHERE C1 = :CC1;
INSERT TPTBL02 (C1, C2, C3) VALUES (:CC1, :CC2, :CC3);
.IMPORT INFILE c:\NCR\Test\TpumpData001.txt FORMAT TEXT
  LAYOUT LAY02
  APPLY LABEL01
  APPLY LABEL02 where CC2 = '00000001';
.END LOAD;
.LOGOFF;

The Basic Upsert Feature

When using the basic upsert feature:

- There must be exactly two DML statements in this DML group.
- The first DML statement must be an UPDATE statement that follows all of the TPump task rules.
- The second DML statement must be an INSERT statement.
- Both DML statements must refer to the same table.
- The INSERT statement, when built, must reflect the same primary index specified in the WHERE clause of the UPDATE statement. This is true for both a single-column primary index and a compound primary index.

By following these rules, you will find a number of uses for the DO INSERT ROWS option.
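The rules above amount to a try-UPDATE-then-INSERT control flow. A hedged Python sketch against a toy in-memory table (the names are hypothetical; TPump performs this logic against the database, and `mark_missing` mimics the MARK MISSING UPDATE ROWS option described below):

```python
def basic_upsert(table: dict, key, update_fn, insert_row, mark_missing=False):
    """Try the UPDATE first; if the target row is missing, run the INSERT.

    `table` is a toy dict keyed by primary index value. When `mark_missing`
    is set, the failed UPDATE is recorded before the INSERT is executed."""
    error_table = []
    if key in table:
        table[key] = update_fn(table[key])        # UPDATE found the row
    else:
        if mark_missing:
            error_table.append((key, "missing UPDATE row; INSERT executed"))
        table[key] = insert_row                   # UPDATE missed: do the INSERT
    return error_table

emp = {100: {"PhoneNo": "555-0100"}}
basic_upsert(emp, 200, lambda r: r, {"PhoneNo": "555-0200"})
print(sorted(emp))  # -> [100, 200]: key 200 was inserted in the same pass
```

One pass over the input both updates existing rows and inserts missing ones, which is the two-pass operation the text says other utilities require.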
In the past, you could either presort data into INSERTs and UPDATEs, or attempt UPDATEs with all the data and then do an INSERT for any UPDATE that failed. With upsert, TPump needs only one pass of the data to UPDATE rows that need to be updated and INSERT rows that need to be inserted.

Note: To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in the TPump job.

If you specify MARK MISSING UPDATE ROWS while using DO INSERT ROWS, TPump records any UPDATE that fails. This record appears in the Application Error Table, together with an error code showing that the INSERT of the DO INSERT ROWS was then executed. If the INSERT fails, the INSERT row is also recorded in the Application Error Table. The default for an upsert function, however, is not to mark missing update rows, because when you perform the upsert function, you expect the INSERT to occur when the UPDATE fails. The failure of the UPDATE portion of an upsert does not, in itself, constitute an error and should not be treated as one.

The MARK MISSING DELETE ROWS option has no meaning when used with the DO INSERT ROWS option.

The MARK (IGNORE) EXTRA DELETE (UPDATE) ROWS option provides TPump with a way to protect against an update or delete affecting multiple rows, which can happen in TPump because the primary index can be nonunique.

MARK is the default for all DML options, except for an upsert.

Example Upsert

Each record in the following example contains the value of the primary index column (EmpNo) of a row of the Employee table whose PhoneNo column is to be assigned a new phone number from field Fone. When the UPDATE fails, the INSERT statement is activated and TPump enters the upsert mode. In this case, each record contains the primary index value (EmpNum) of a row that is to be inserted into the Employee table, whose columns are EmpNo and PhoneNo.
.BEGIN LOAD SESSION number;
.LAYOUT Layoutname;
.FIELD EmpNum 1 INTEGER;
.FIELD Fone * (CHAR (10));
.DML LABEL DMLlabelname
  DO INSERT FOR MISSING UPDATE ROWS;
UPDATE Employee SET PhoneNo = :Fone WHERE EmpNo = :EmpNum;
INSERT Employee (EmpNo, PhoneNo) VALUES (:EmpNum, :Fone);
.IMPORT INFILE Infilename LAYOUT Layoutname APPLY DMLlabelname;
.END LOAD;

The scope of a DML command (and its label) terminates at the first following command of any kind, or at the end of the file containing the DML statements, whichever occurs first.

The SQL EXECUTE command must be between the BEGIN LOAD command and the END LOAD command.

For IMPORT tasks, you may specify up to five distinct error treatment options for one DML command. For example:

.DML LABEL COMPLEX
  IGNORE DUPLICATE INSERT ROWS
  MARK DUPLICATE UPDATE ROWS
  IGNORE MISSING UPDATE ROWS
  MARK MISSING DELETE ROWS
  DO INSERT FOR MISSING UPDATE ROWS;

It is valid to specify that missing update rows be both marked and treated as INSERTs or, as in the example, both ignored and treated as INSERTs.

If TPump encounters any of the following:
- no DML command in an IMPORT task,
- DML statements outside the scope of a DML command in an IMPORT task, or
- a DML command with no DML statements in an IMPORT task,

it writes a diagnostic message to the primary output destination for the system, terminates the TPump task, and returns to the main TPump control module with a conventional nonzero return code. You can then correct the error and resubmit the TPump task.

The DML commands (with their following DML statements) must appear between the appropriate BEGIN LOAD command and the IMPORT commands that refer to them. When the END LOAD command is encountered, the sequence of DML commands and DML statements is forgotten. The DML command cannot be shared by multiple BEGIN LOAD statements.

The DML statements are described in the following sections.
The maximum number of DML commands that can be used in a single TPump task is 128. If an excessive number of DML commands and statements are sent to the Teradata Database, an error message is logged, stating that there are too many DML steps for one TPump job.

The Atomic Upsert Feature

The basic upsert function has been enhanced to support an Atomic upsert capability. This enhancement permits TPump to perform single-row upserts in a single pass. This one-pass logic adopts the upsert-handling technique used by MultiLoad. The one-pass logic is designated Atomic to distinguish the grouping of paired UPDATE and INSERT statements which are executed as a single SQL statement. The syntax for Atomic upsert consists of an UPDATE statement and an INSERT statement, separated by an ELSE keyword.

Existing TPump scripts using the basic upsert form do not have to be changed. TPump automatically converts the old UPDATE/INSERT pairs to the Atomic upsert form whenever appropriate. Any attempt to change this results in a syntax error. The new syntax, which can also be used by CLIv2 and BTEQ applications, is dependent on whether the RDBMS version against which the TPump job is run supports this feature. If the RDBMS does not support Atomic upsert, TPump reverts to the earlier logic of sending the INSERT request if an UPDATE request fails.

The three basic constraints on the upsert feature are:

1 SAME TABLE: The UPDATE and INSERT statements must specify the same table.

2 SAME ROW: The UPDATE and INSERT statements must specify the same row; that is, the primary index value in the INSERT row must be the same as the primary index value in the targeted UPDATE row.

3 HASHED ROW ACCESS: The UPDATE fully specifies the primary index, allowing the target row to be accessed with a one-AMP hashed operation.
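The SAME TABLE and SAME ROW constraints can be sketched as a simple check over an UPDATE/INSERT pair. The dict representation is hypothetical; as the text notes, the real validation is performed by the Teradata Database on the actual SQL, not by TPump:

```python
def check_upsert_constraints(update_stmt: dict, insert_stmt: dict):
    """Check the SAME TABLE and SAME ROW constraints for an UPDATE/INSERT pair.

    Each statement is modeled as {"table": ..., "pi_value": ...}; a real
    implementation would parse these out of the SQL text."""
    errors = []
    if update_stmt["table"] != insert_stmt["table"]:
        errors.append("UPDATE and INSERT must specify the same table")
    if update_stmt["pi_value"] != insert_stmt["pi_value"]:
        errors.append("UPDATE and INSERT must specify the same primary index value")
    return errors

# Example 1 below: Sales vs. NewSales fails the SAME TABLE constraint.
errs = check_upsert_constraints(
    {"table": "Sales", "pi_value": 10},
    {"table": "NewSales", "pi_value": 10},
)
print(errs)  # one error: different target tables
```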
Although TPump does not verify basic upsert constraints, the Teradata Database will reject Atomic upsert constructs that fail the constraint test, and notify TPump by returning an appropriate error message to the client.

Other Restrictions on the Atomic Upsert Feature

Some of these restrictions concern syntax that is supported in UPDATE and INSERT statements separately, but not when combined in an Atomic upsert statement. Other restrictions concern the upsert feature's lack of support for certain Teradata Database features, such as triggers and join/hash indexes, meaning that the upsert statement cannot be used on any table utilizing those features. The following are not supported by the Atomic upsert feature, and return an error if submitted to the Teradata Database:

1 INSERT-SELECT: Syntax not supported. The INSERT may not use a subquery to specify any of the inserted values. Note that support of this syntax is likely to be linked to support of subqueries in the UPDATE's WHERE clause constraints, and may involve new syntax features to allow the UPDATE and INSERT to effectively reference the same subquery.

2 UPDATE-WHERE-CURRENT: Syntax not supported. The WHERE clause cannot use an updatable cursor to do what is called a positioned UPDATE. (It is unlikely that this syntax will ever be supported.) Note that this restriction does not prevent cursors from being used in other ways with Atomic upsert statements. For example, a DECLARE CURSOR statement may include upsert statements among those to be executed when the cursor is opened, as long as the upserts are otherwise valid.

3 UPDATE-FROM: Not supported. The SET clause cannot use a FROM clause table reference in the expression for the updated value of a column.

4 UPDATE-WHERE SUBQUERIES: Not supported. The WHERE clause cannot use a subquery either to specify the primary index or to constrain a nonindex column.
Note that supporting this UPDATE syntax would also require support for either INSERT-SELECT or some other INSERT syntax feature that lets it specify the same primary index value as the UPDATE.

5 UPDATE-PRIMARY INDEX: Not supported. The UPDATE cannot change the primary index. This is sometimes called an unreasonable update.

6 TRIGGERS: Feature not supported if either the UPDATE or INSERT could cause a trigger to be fired. The restriction applies as if the UPDATE and INSERT were both executed, because the parser trigger logic will not attempt to account for their conditional execution. UPDATE triggers on columns not referenced by the UPDATE clause, however, will never be fired by the upsert and are therefore permitted. DELETE triggers cannot be fired at all by an upsert and are likewise permitted. Note that an upsert could be used as a trigger action, but it would be subject to the same constraints as any other upsert. Because an upsert is not allowed to fire any triggers itself, an upsert trigger action must not generate any further cascaded trigger actions.

7 JOIN/HASH INDEXES: Feature not supported if either the UPDATE or INSERT could cause the join/hash index to be updated. As with triggers, the restriction applies to each upsert as if the UPDATE and INSERT were both executed. While the UPDATE could escape this restriction if the join/hash index does not reference any of the updated columns, it is much less likely (and maybe impossible) for the INSERT to escape. If the benefit of lifting the restriction for a few unlikely join/hash index cases turns out to be not worth the implementation cost, the restriction may have to be applied more broadly to any table with an associated join/hash index.

When one of these constraints fails, TPump treats the failure as a nonfatal error, reports the error in the job log for diagnostic purposes, and continues with the job by reverting to the old non-Atomic upsert protocol.
Existing TPump Scripts

Existing TPump scripts for upserts do not need to be changed. The syntax as described below for an upsert will continue to be supported:

DO INSERT FOR MISSING UPDATE ROWS;
UPDATE <update-operands>;
INSERT <insert-operands>;

Atomic Upsert Examples

This section describes several examples that demonstrate how the Atomic upsert feature works, including cases where errors are detected and returned to the user. All of the examples use the same table, called Sales, as shown below:

CREATE TABLE Sales, FALLBACK,
  (ItemNbr INTEGER NOT NULL,
   SaleDate DATE FORMAT 'MM/DD/YYYY' NOT NULL,
   ItemCount INTEGER)
PRIMARY INDEX (ItemNbr);

It is assumed that the table has been populated with the following data:

INSERT INTO Sales (10, '05/30/2005', 1);

Assume that there exists a table called NewSales that has the same column definitions as those of table Sales.

Example 1 (Error: different target tables)

This example demonstrates an upsert statement that does not specify the same table name for the UPDATE part and the INSERT part of the statement.

.Dml label upsertdml do insert for missing update rows;
UPDATE Sales SET ItemCount = ItemCount + 1
  WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005');
INSERT INTO NewSales (10, '05/30/2005', 1);

A rule of an upsert statement is that only one single table is processed for the statement. Because the tables, Sales and NewSales, are not the same for the upsert statement, an error is returned, indicating that the name of the table must be the same for both the UPDATE and the INSERT.

Example 2 (Error: different target rows)

This example demonstrates an upsert statement that does not specify the same primary index value for the UPDATE part and the INSERT part of the statement.
.Dml label upsertdml do insert for missing update rows;
UPDATE Sales SET ItemCount = ItemCount + 1
  WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005');
INSERT INTO Sales (20, '05/30/2005', 1);

The primary index values for the UPDATE and the INSERT must be the same. Otherwise, we would be looking at two different rows: one for the UPDATE and the other for the INSERT, which is not the purpose of an upsert. An error is returned for the upsert statement because the specified primary index values of 10 and 20 are not the same (the primary index value must be the same for both the UPDATE and the INSERT).

Example 3 (Valid Upsert UPDATE)

This example demonstrates a successful upsert statement where a row gets updated.

.Dml label upsertdml do insert for missing update rows;
UPDATE Sales SET ItemCount = ItemCount + 1
  WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005');
INSERT INTO Sales (10, '05/30/2005', 1);

After all of the rules have been validated, the row with ItemNbr = 10 and SaleDate = '05/30/2005' gets updated. A successful update of one row is returned.

Example 4 (Valid Upsert INSERT)

This example demonstrates a successful upsert statement where a row gets inserted.

.Dml label upsertdml do insert for missing update rows;
UPDATE Sales SET ItemCount = ItemCount + 1
  WHERE (ItemNbr = 20 AND SaleDate = '05/30/2005');
INSERT INTO Sales (20, '05/30/2005', 1);

After all of the rules have been validated and no row was found with ItemNbr = 20 and SaleDate = '05/30/2005' for the UPDATE, a new row is inserted with ItemNbr = 20. A successful insert of one row is returned.

END LOAD

Purpose

The END LOAD command must be present as the last command of a TPump task; it initiates the execution of the task.
Syntax

.END LOAD ;

EXECUTE

Purpose

TPump supports the Teradata SQL EXECUTE statement, which specifies a user-created (predefined) macro for execution. The EXECUTE statement specifies the type of DML statement (INSERT, UPDATE, DELETE, or UPSERT) to be handled by the macro. The macro named in this EXECUTE statement must reside in the Teradata Database before the import task starts. Only one DML statement (INSERT, UPDATE, DELETE, or UPSERT) can be specified in a TPump predefined macro.

Caution: The SQL EXECUTE command must be used between the BEGIN LOAD command and the END LOAD command.

Syntax

EXECUTE/EXEC [database.]name UPDATE/UPD | INSERT/INS | DELETE/DEL | UPSERT/UPS ;

where

Syntax Element    Description

database          name of the database in the Teradata Database where the macro to be executed resides

name              name of the macro resident in the Teradata Database to be executed

DELETE/DEL        keyword indicating a DELETE statement is being executed by the macro

INSERT/INS        keyword indicating an INSERT statement is being executed by the macro

UPDATE/UPD        keyword indicating an UPDATE statement is being executed by the macro

UPSERT/UPS        keyword indicating an Atomic upsert is being executed by the macro

Usage Notes

Using predefined macros saves time because TPump does not need to create and drop new macros each time a TPump job script is run. The rules for user-created macros are:

• TPump expects the parameter list for any macro to match the FIELD list specified by the LAYOUT in the script. FILLER fields are ignored. If the USE clause is used in the DML statement, TPump expects the parameter list for every macro in the DML statement to match the field list specified by the USE clause. The order should be the same as the fields in the LAYOUT.
• The macro should specify a single prime index operation: INSERT, UPDATE, DELETE, or UPSERT. TPump reports an error if the macro contains more than one supported statement.
• The restrictions on INSERT, UPDATE, DELETE, and UPSERT statements supported by TPump are described in the corresponding sections of this manual.

If the EXECUTE statement is replacing an INSERT, UPDATE, DELETE, or UPSERT statement in a job script, the EXECUTE statement must be placed at the same location as the statement it replaces. The following example shows how an INSERT statement is replaced by an equivalent predefined macro:

.DML LABEL LABELA;
DELETE <delete-operands> ;
INSERT <insert-operands> ;
UPDATE <update-operands> ;

.DML LABEL LABELA ;
DELETE <delete-operands> ;
EXECUTE <insert-macro-name> INSERT ;
UPDATE <update-operands> ;

The correct syntax for a TPump predefined macro is one of the following:

• CREATE MACRO <name> (<parameter list>) AS (UPDATE ... ; ) ;
• CREATE MACRO <name> (<parameter list>) AS (INSERT ... ; ) ;
• CREATE MACRO <name> (<parameter list>) AS (DELETE ... ; ) ;
• CREATE MACRO <name> (<parameter list>) AS (UPSERT ... ; ) ;

If the Teradata Database server supports Atomic upsert, then automatic use of Atomic upsert is allowed, when possible, without changing existing TPump scripts. This is accomplished in the following manner:

• TPump attempts to use the Atomic upsert syntax in defining a single UPSERT macro (instead of an UPDATE/INSERT macro pair).
• If the UPSERT macro is successfully defined, TPump uses the Atomic upsert function for the UPSERT.
• If an error occurs during UPSERT macro definition, presumably due to a violation of Teradata Database Atomic upsert restrictions, TPump issues a warning and reverts to the current TPump upsert method of paired UPDATE/INSERT statements.
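As a sketch of the predefined-macro rules above (the macro name InsSales and the label inslabel are invented for this illustration; the Sales table is the one used in the Atomic upsert examples earlier in this chapter), a predefined INSERT macro and the EXECUTE that references it might look like this:

```
CREATE MACRO InsSales (ItemNbr INTEGER, SaleDate DATE, ItemCount INTEGER) AS
  (INSERT INTO Sales VALUES (:ItemNbr, :SaleDate, :ItemCount); );

.DML LABEL inslabel;
EXECUTE InsSales INSERT;
```

Note that the macro's parameter list matches the FIELD list of the governing LAYOUT, in the same order, as the first rule above requires, and that the macro body contains exactly one supported DML statement.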
TPump continues to operate as it does now when the existing TPump syntax for upsert is encountered and references to predefined macros are used for the UPDATE, the INSERT, or both. For example:

.DML LABEL <dml-label-name> DO INSERT FOR MISSING UPDATE ROWS ... ;
EXECUTE <update-macro-name> UPDATE ;
INSERT <insert-operands> ;

.DML LABEL <dml-label-name> DO INSERT FOR MISSING UPDATE ROWS ... ;
UPDATE <update-operands> ;
EXECUTE <insert-macro-name> INSERT ;

.DML LABEL <dml-label-name> DO INSERT FOR MISSING UPDATE ROWS ... ;
EXECUTE <update-macro-name> UPDATE ;
EXECUTE <insert-macro-name> INSERT ;

To allow for the use of predefined macros that take advantage of Atomic upsert, TPump command syntax supports an UPSERT macro:

.DML LABEL <dml-label>;
EXECUTE <upsert-macro-name> UPSERT ;

When using predefined macros for Atomic upserts, the DML statement has "Ignore Missing Update Rows" as a default option. Atomic upsert syntax is not backward compatible; you cannot use it until you update the Teradata Database server to a compatible release. If the Teradata Database supports Atomic upsert, a TPump run can handle a mix of both standard and Atomic upserts.

Upserts are reported as UPDATEs and INSERTs in the statistics displayed by TPump (and passed to the NOTIFY EXIT routine): an Atomic upsert that results in an UPDATE is reported by the Teradata Database as an UPDATE activity type, and an Atomic upsert that results in an INSERT is reported as an INSERT activity type.

FIELD

Purpose

The FIELD command specifies a field of the input record; it can also contain a NULLIF expression. All fields specified by FIELD commands are sent to the Teradata Database. Only fields relevant to the tasks using this layout need be specified.
Syntax

.FIELD fieldname { startpos datadesc | fieldexpr }
    [NULLIF nullexpr]
    [DROP {LEADING | TRAILING} {BLANKS | NULLS} [AND {TRAILING | LEADING} {NULLS | BLANKS}]]
    [KEY] ;

where

Syntax Element    Description

fieldname         name of an input record field to which: 1) a DML statement refers, 2) a nullexpr of a FIELD command or condition expression of a LAYOUT command refers, or 3) a condition expression of the IMPORT command APPLY clause refers
                  A fieldname must obey the same rules for its construction as Teradata SQL column names. fieldname can be referenced in other FIELD commands via NULLIF and field concatenation expressions, and in APPLY WHERE conditions in IMPORT commands.

startpos          starting position of a field of the data records in an external data source
                  It may be specified as an unsigned integer, which is a character position starting with 1, or as an asterisk, which means the next available character position beyond the preceding field. Nothing prevents redefinition of positions of input records by specifying the same positions in multiple FIELD commands. See the example below.
                  Note that where input records may be continued by use of the CONTINUEIF condition, a startpos specified as an unsigned integer refers to a character position in the final concatenated result from which the continuation indicator fields have been removed. Refer to the description of the condition parameter of the LAYOUT command.

datadesc          type and length of data in the field. TPump generates the USING phrase accordingly with the user-assigned field name to which the body of the DML statement refers.

nullexpr          condition used for selectively inserting a null value into the affected column
                  The condition is specified as a conditional expression involving any number of fields, each represented by its fieldname, and constants.
All fieldnames appearing in the conditional expression must be defined by any of the following specifications:

1 The startpos and datadesc parameters of the FIELD command.
2 A FILLER command.
3 A TABLE command.

If the specified condition is true, TPump sends the data to the Teradata Database with indicators, whether or not the INDICATORS option is specified on the LAYOUT command.

When the character set of the job script is different from the client character set used for the job (for example, on MVS the job script must be in Teradata EBCDIC when using the UTF8 client character set, or the UTF16 client character set can be used with the job script in UTF8), TPump translates the string constants specified in the expression and the import data referenced in the expression to the same character set before evaluating the expression. For example, when the client character set is UTF16 and the script character set is UTF8, if the following commands are given:

.field C1 * varchar(20);
.field C2 * varchar(40) nullif c1 = 'DELETED';

TPump translates the data in the C1 field to the UTF8 form and compares it with the UTF8 form of 'DELETED' to obtain the evaluation result. Similarly, on the mainframe, when the client character set is UTF8 and the script character set is Teradata EBCDIC, if the following commands are given:

.field C1 * char(20);
.field C2 * varchar(40) nullif c1 = 'removed';

TPump translates the data in the C1 field from the UTF8 form to the Teradata EBCDIC form and compares it to the Teradata EBCDIC form of 'removed' to obtain the evaluation result.

Caution: When using the UTF8 client character set on the mainframe, be sure to examine the definitions in the International Character Set Support manual to determine the code points of the special characters you might require. Different versions of EBCDIC do not always agree as to the placement of these characters.
The mappings between Teradata EBCDIC and Unicode are given in Appendix E of the International Character Set Support manual.

The fieldname1 parameter in other FIELD commands can be referenced in nullexpr.

fieldexpr         concatenation of two or more items, either:
                  • fields
                  • character constants
                  • string constants
                  or a combination of these, as in:
                  fieldname1||'C'||fieldname2||'STRING'||fieldname3...
                  The field names within a layout must be unique, and the data type of each field must be either CHAR or VARCHAR. Nested concatenations are not supported.
                  If all items of the fieldexpr are fixed character (for example, no VARCHARs), the data type of the resulting field is CHAR(m), where "m" is the sum of the length of each component item. If at least one component of the fieldexpr is a VARCHAR, the data type of the resulting field is VARCHAR(m), where "m" is the sum of the maximum length of each component item.
                  When the character set of the job script is different from the client character set used for the job (for example, on MVS the job script must be in Teradata EBCDIC when using the UTF8 client character set, or the UTF16 client character set can be used with the job script in UTF8), TPump translates the character constants and the string constants specified in the expression from the script character set form to the client character set form before concatenating the constants with the specified fields.
                  Caution: When using the UTF8 client character set on the mainframe, be sure to examine the definitions in the International Character Set Support manual to determine the code points of the special characters you might require. Different versions of EBCDIC do not always agree as to the placement of these characters. The mappings between Teradata EBCDIC and Unicode are given in Appendix E of the International Character Set Support manual.

DROP
                  characters present in the specified position(s) to be dropped from the specified fieldname, which must be of a character data type
                  TPump drops the specified characters and presents the field to the Teradata Database as a VARCHAR data type.
                  Usage Rules: If you specify two dropping actions, they must not be identical. If a FIELD command defines a fieldname in terms of two or more concatenated fieldname fields, any specified DROP clause applies to the concatenated result, not to the individual fieldname fields. But, because each fieldname must be defined by its own previous FIELD command, a DROP clause can be specified on these commands to apply to the individual fields.

KEY               keyword which, when added to the end of the FIELD command, specifies that the field is part of the hash key for purposes of serialization, if the SERIALIZE parameter on the BEGIN LOAD command is active
                  The serialization feature is meaningful only when a primary key for the loaded data is specified via this KEY option.

Usage Notes

One or more FIELD commands may be intermixed with the TABLE command and the FILLER command. These commands must follow a LAYOUT command.

If you redefine an input record field in fieldname, you cannot change the data type from 'character' to 'decimal' with the datadesc parameter. This is illegal in TPump and will abort the job and return an error message.

If you specify both NULLIF and DROP LEADING/TRAILING BLANKS/NULLS on the same FIELD command, the DROP clause is evaluated after the NULLIF clause. As an example, in the FIELD command:

.FIELD FIELDNAME * CHAR (5) NULLIF FIELDNAME = 'x' DROP LEADING BLANKS;

if the input for FIELDNAME contains leading blanks before the value 'x', the NULLIF expression evaluates to false because the leading blanks are not dropped before the NULLIF evaluation.

Specifying Data Types

Use the datadesc parameter to specify the type and length of data in the field.
TPump generates the USING phrase accordingly with the user-assigned field name to which the body of the DML statement refers. For complete details on data types and data conversions, see SQL Reference: Data Types and Literals. The following is a short list of the input length and field description for the data type specifications you can make in the datadesc parameter:

Graphic Data Type Specifications

GRAPHIC(n)
Where n is the length of the input stream in terms of double-byte characters.
Length: n*2 bytes, if n is specified; otherwise 2 bytes, as n=1 is assumed.
Description: n double-byte characters.

The following example illustrates the use of the GRAPHIC data types in support of kanji or multibyte character data. The FIELD statement can contain GRAPHIC data types.

.LAYOUT KANJIDATA;
.FIELD EMPNO * SMALLINT;
.FIELD LASTNAME * GRAPHIC(30);
.FILLER FIRSTNAME * GRAPHIC(30);
.FIELD JOBTITLE * VARGRAPHIC(30);

VARGRAPHIC(n)
Where n is the length of the input stream in terms of double-byte characters.
Length: m + 2 bytes, where m/2 <= 32000.
Description: 2-byte integer followed by m/2 double-byte characters.

LONG VARGRAPHIC
Length: m + 2 bytes, where m/2 <= 32000.
Description: 2-byte integer followed by m/2 double-byte characters.

Note: For both VARGRAPHIC and LONG VARGRAPHIC, m, a value occupying the first 2 bytes of the input data, is the length of the input in bytes, not characters. Each multibyte character set character is 2 bytes.

Note: LONG VARGRAPHIC also implies VARGRAPHIC(32000). Range is 0 to 32000 in a 64,000-byte field.

Decimal Data Type Specifications

DECIMAL(x) and DECIMAL(x,y)
Length: 1, 2, 4, or 8 bytes for network; packed decimal for mainframe.
Description: decimal value with x total digits, y of them after the decimal point.

NULLIF Performance

Using a large number of NULLIF clauses can cause a significant increase in the CPU usage on the system where you are running TPump.
This rise in CPU usage may increase the time the job takes to run. An increase in CPU usage is most noticeable when you do not have:

• FILLER commands in the LAYOUT
• Input position gaps or overlaps
• Concatenated fields
• DROP clauses

To avoid an increase in CPU usage on the system running TPump, transfer the processing of NULLIF expressions to the Teradata Database.

Example 1

Instead of specifying the following:

...
.FIELD fc * CHAR(5) NULLIF fc = 'empty';
.FIELD fi * INTEGER NULLIF fi = 0;
...
.DML LABEL ins;
INSERT INTO tbl1 VALUES(...,:fc,:fi,...);

You would use this instead:

...
.FIELD fc * CHAR(5);
.FIELD fi * INTEGER;
...
.DML LABEL ins;
INSERT INTO tbl1 VALUES(...,NULLIF(:fc,'empty'),NULLIF(:fi,0),...);

Example 2

In more complex situations, as in the following example:

...
.FIELD fs * CHAR(1);
.FIELD fc * CHAR(5) NULLIF (fs <> 'M') AND (fs <> 'F');
.FIELD fi * INTEGER NULLIF fi < 0;
...
.DML LABEL ins;
INSERT INTO tbl2 VALUES(...,:fs,:fc,:fi,...);

You would use this instead:

...
.FIELD fs * CHAR(1);
.FIELD fc * CHAR(5);
.FIELD fi * INTEGER;
...
.DML LABEL ins;
INSERT INTO tbl2 VALUES(...,:fs,
  CASE WHEN (:fs = 'M') OR (:fs = 'F') THEN :fc ELSE NULL END,
  CASE WHEN (:fi >= 0) THEN :fi ELSE NULL END,...);

Using ANSI/SQL DateTime Data Types

When the DATEFORM command is used to specify ANSIDATE as the DATE data type, each DATE field is internally converted to a CHAR(10) field. You must convert all ANSI/SQL DateTime TIME, TIMESTAMP, and INTERVAL data types to fixed-length CHAR data types to specify column and field names in a TPump FIELD command. Table 18 shows how to use ANSI/SQL DateTime specifications.
Table 18: ANSI/SQL DateTime Specifications

TIME
TIME (n)
  n = number of digits after decimal point; valid values: 0-6; default = 6
  Conversion: CHAR(8 + n + (1 if n > 0, otherwise 0))
  Format (n = 0): hh:mm:ss                Example: 11:37:58
  Format (n = 4): hh:mm:ss.ssss           Example: 11:37:58.1234

TIMESTAMP
TIMESTAMP (n)
  n = number of digits after decimal point; valid values: 0-6; default = 6
  Conversion: CHAR(19 + n + (1 if n > 0, otherwise 0))
  Format (n = 0): yyyy-mm-dd hh:mm:ss          Example: 1998-09-04 11:37:58
  Format (n = 4): yyyy-mm-dd hh:mm:ss.ssss     Example: 1998-09-04 11:37:58.1234

TIME WITH TIME ZONE
TIME (n) WITH TIME ZONE
  n = number of digits after decimal point; valid values: 0-6; default = 6
  Conversion: CHAR(14 + n + (1 if n > 0, otherwise 0))
  Format (n = 0): hh:mm:ss{±}hh:mm             Example: 11:37:58-08:00
  Format (n = 4): hh:mm:ss.ssss{±}hh:mm        Example: 11:37:58.1234-08:00

TIMESTAMP WITH TIME ZONE
TIMESTAMP (n) WITH TIME ZONE
  n = number of digits after decimal point; valid values: 0-6; default = 6
  Conversion: CHAR(25 + n + (1 if n > 0, otherwise 0))
  Format (n = 0): yyyy-mm-dd hh:mm:ss{±}hh:mm         Example: 1998-09-24 11:37:58+07:00
  Format (n = 4): yyyy-mm-dd hh:mm:ss.ssss{±}hh:mm    Example: 1998-09-24 11:37:58.1234+07:00

INTERVAL YEAR
INTERVAL YEAR (n)
  n = number of digits; valid values: 1-4; default = 2
  Conversion: CHAR(n)
  Format (n = 2): yy         Example: 98
  Format (n = 4): yyyy       Example: 1998

INTERVAL YEAR TO MONTH
INTERVAL YEAR (n) TO MONTH
  n = number of digits; valid values: 1-4; default = 2
  Conversion: CHAR(n + 3)
  Format (n = 2): yy-mm      Example: 98-12
  Format (n = 4): yyyy-mm    Example: 1998-12

INTERVAL MONTH
INTERVAL MONTH (n)
  n = number of digits; valid values: 1-4; default = 2
  Conversion: CHAR(n)
  Format (n = 2): mm         Example: 12
  Format (n = 4): mmmm       Example: 0012

INTERVAL DAY
INTERVAL DAY (n)
  n = number of digits; valid values: 1-4; default = 2
  Conversion: CHAR(n)
  Format (n = 2): dd         Example: 31
  Format (n = 4): dddd       Example: 0031

INTERVAL DAY TO HOUR
INTERVAL DAY (n) TO HOUR
  n = number of digits; valid values: 1-4; default = 2
  Conversion: CHAR(n + 3)
  Format (n = 2): dd hh      Example: 31 12
  Format (n = 4): dddd hh    Example: 0031 12

INTERVAL DAY TO MINUTE
INTERVAL DAY (n) TO MINUTE
  n = number of digits; valid values: 1-4; default = 2
  Conversion: CHAR(n + 6)
  Format (n = 2): dd hh:mm       Example: 31 12:59
  Format (n = 4): dddd hh:mm     Example: 0031 12:59

INTERVAL DAY TO SECOND
INTERVAL DAY (n) TO SECOND
INTERVAL DAY TO SECOND (m)
INTERVAL DAY (n) TO SECOND (m)
  n = number of digits; valid values: 1-4; default = 2
  m = number of digits after decimal point; valid values: 0-6; default = 6
  Conversion: CHAR(n + 9 + m + (1 if m > 0, otherwise 0))
  Format (n = 2, m = 0): dd hh:mm:ss           Example: 31 12:59:59
  Format (n = 4, m = 4): dddd hh:mm:ss.ssss    Example: 0031 12:59:59.1234

INTERVAL HOUR
INTERVAL HOUR (n)
  n = number of digits; valid values: 1-4; default = 2
  Conversion: CHAR(n)
  Format (n = 2): hh         Example: 12
  Format (n = 4): hhhh       Example: 0012

INTERVAL HOUR TO MINUTE
INTERVAL HOUR (n) TO MINUTE
  n = number of digits; valid values: 1-4; default = 2
  Conversion: CHAR(n + 3)
  Format (n = 2): hh:mm      Example: 12:59
  Format (n = 4): hhhh:mm    Example: 0012:59

INTERVAL HOUR TO SECOND
INTERVAL HOUR (n) TO SECOND
INTERVAL HOUR TO SECOND (m)
INTERVAL HOUR (n) TO SECOND (m)
  n = number of digits; valid values: 1-4; default = 2
  m = number of digits after the decimal point; valid values: 0-6; default = 6
  Conversion: CHAR(n + 6 + m + (1 if m > 0, otherwise 0))
  Format (n = 2, m = 0): hh:mm:ss              Example: 12:59:59
  Format (n = 4, m = 4): hhhh:mm:ss.ssss       Example: 0012:59:59.1234

INTERVAL MINUTE
INTERVAL MINUTE (n)
  n = number of digits; valid values: 1-4; default = 2
  Conversion: CHAR(n)
  Format (n = 2): mm         Example: 59
  Format (n = 4): mmmm       Example: 0059

INTERVAL MINUTE TO SECOND
INTERVAL MINUTE (n) TO SECOND
INTERVAL MINUTE TO SECOND (m)
INTERVAL MINUTE (n) TO SECOND (m)
  n = number of digits; valid values: 1-4; default = 2
  m = number of digits after decimal point; valid values: 0-6; default = 6
  Conversion: CHAR(n + 3 + m + (1 if m > 0, otherwise 0))
  Format (n = 2, m = 0): mm:ss             Example: 59:59
  Format (n = 4, m = 4): mmmm:ss.ssss      Example: 0059:59.1234

INTERVAL SECOND
INTERVAL SECOND (n)
INTERVAL SECOND (n,m)
  n = number of digits; valid values: 1-4; default = 2
  m = number of digits after decimal point; valid values: 0-6; default = 6
  Conversion: CHAR(n + m + (1 if m > 0, otherwise 0))
  Format (n = 2, m = 0): ss           Example: 59
  Format (n = 4, m = 4): ssss.ssss    Example: 0059.1234

FILLER

Purpose

The FILLER command describes a named or unnamed field as filler, which is not sent to the Teradata Database. Only fields relevant to this TPump task need be specified.

Syntax

.FILLER [fieldname] startpos datadesc ;

where

Syntax Element    Description

fieldname         name of an input record field to which a nullexpr of a FIELD command refers, or to which a "condition" expression of the IMPORT command's APPLY clause refers
                  The only reason for naming a filler field is to enable one of these expressions to refer to it. A fieldname must obey the same rules for its construction as Teradata SQL column names.
                  The reason for describing a field that is not to be sent to the Teradata Database and is not used in any of the expressions mentioned in the previous paragraph is to make it possible for you to specify startpos as an asterisk for subsequent fields of the input records. If the use of the asterisk is not important to you, you do not need to define fields that do not participate in the TPump task.
startpos          starting position of a field of the data records in an external data source
                  It may be specified as an unsigned decimal integer, which is a character position starting with 1, or as an asterisk, which is the next available character position beyond the preceding field.
                  Note that where input records may be continued by use of the CONTINUEIF condition, a startpos specified as an unsigned integer refers to a character position in the final concatenated result from which the continuation indicators have been removed. Refer to the description of the condition parameter of the LAYOUT command.

datadesc          type and length of data in the field

Usage Notes

One or more FILLER commands may be intermixed with the FIELD command or the TABLE command. These commands must follow a LAYOUT command.

Example

This example illustrates the use of the GRAPHIC data types in support of kanji or multibyte character data. The FILLER statement describing the input data set or file can contain GRAPHIC data types.

.LAYOUT KANJIDATA;
.FIELD EMPNO * SMALLINT;
.FIELD LASTNAME * GRAPHIC(30);
.FILLER FIRSTNAME * GRAPHIC(30);
.FIELD JOBTITLE * VARGRAPHIC(30);

IF, ELSE, and ENDIF

Purpose

TPump provides a structure of IF, ELSE, and ENDIF commands for the conditional control of execution processes.
Conditional execution works as follows:

Syntax

.IF conditional expression THEN;
    statements to execute if TRUE
.ELSE;
    statements to execute if FALSE
.ENDIF;
statements to resume with

where

Syntax Element    Description

conditional expression           user-defined variables or predefined system variables following the IF command, whose condition (TRUE or FALSE) triggers the execution of alternative groups of statements

statements to execute if TRUE    statements to be executed whenever the conditional expression following the IF command evaluates as TRUE

statements to execute if FALSE   statements following the optional ELSE command, which execute only when the conditional expression following the IF command evaluates as FALSE

statements to resume with        statements following the ENDIF command, which terminates the conditional statement execution process and resumes the normal command sequence

Usage Notes

The conditional expression in the IF command may consist of either user-defined variables or predefined system variables. The ELSE command clause is optional; ELSE is used only when there are statements to be executed when the condition is evaluated as false.

A conditional expression is an expression which can be evaluated as either true or false. When evaluation of the expression returns a numeric result, 0 is interpreted as false; nonzero results are interpreted as true. See "Utility Variables" on page 62.

TPump supports the nesting of IF commands to a level of 100. Any ELSE or ENDIF commands must be present in their entirety and cannot be composed simply of variables in need of substitution. Commands and statements following an IF, ELSE, or ENDIF structure that are not executed are not parsed and do not have their variables substituted.

Example 1

TPump is case sensitive when doing a compare on an '&SYS' system variable.
The RUN FILE command does not execute because the substituted values returned in this example are all in uppercase. This factor must be considered when creating a script to force the execution of a predetermined sequence of events. If, in line 0003, 'FRI' had been used instead of 'Fri', the compare would succeed and the RUN FILE command would execute.

0003 .IF '&SYSDAY' = 'Fri' THEN;
14:10:28 - FRI MAY 09, 1997
UTY2402 Previous statement modified to:
0004 .IF 'FRI' = 'Fri' THEN;
0005 .RUN FILE UTNTS38;
0006 .ENDIF;

Example 2

In Example 2, the user has created the table named &TABLE and a variable named CREATERC, into which is set the system return code resulting from the execution of the CREATE TABLE statement. If the table name has not already been used and the return code is nonzero, the return code evaluates to an error condition and the job logs off with the error code displayed.

0010 .SET CREATERC TO &SYSRC;
0011 .IF &CREATERC = 3803 /* Table &TABLE already exists */ THEN;
UTY2402 Previous statement modified to:
0012 .LOGOFF 08;
0013 .RUN FILE RUN01;
0014 .ELSE
0015 .IF &CREATERC <> 0 THEN
0016 .LOGOFF &CREATERC;
0017 .ENDIF

IMPORT

Purpose

The IMPORT command identifies a source for data input. By referencing the LAYOUT command and DML command, IMPORT ties the previous commands together. The input data source used for IMPORT depends on whether the TPump utility is running on an IBM VM or MVS client, or on a network-attached client platform, as shown in the following syntax diagrams.
Syntax

For Channel-Attached Client Systems:

.IMPORT INFILE ddname [AXSMOD name ['init-string']]
    [HOLD | FREE]
    [INMOD modulename [USING (parms)]]
    [FROM m] [FOR n | THRU k]
    [FORMAT VARTEXT ['c'] [DISPLAY ERRORS] [NOSTOP]]
    [LAYOUT layoutname [APPLY label [WHERE condition]]] ;

For Network-Attached Client Systems:

.IMPORT INFILE filename [AXSMOD name ['init-string']]
    [INMOD modulename [USING (parms)]]
    [FROM m] [FOR n | THRU k]
    [FORMAT {FASTLOAD | BINARY | TEXT | UNFORMAT | VARTEXT ['c'] [DISPLAY ERRORS] [NOSTOP]}]
    [LAYOUT layoutname [APPLY label [WHERE condition]]] ;

where

Syntax Element    Description

INFILE ddname     external data source that contains the input records on channel-attached client systems
                  In MVS, this is a DDNAME. In VM, it is a FILEDEF name. If DDNAME is specified, TPump reads data records from the specified source. If modulename is also specified, TPump passes the records it reads to the specified module. The DDNAME must obey the applicable rules of the external system. A DDNAME must obey the same construction rules as Teradata SQL column names except that:
                  • The "at" character (@) is allowed as an alphabetic character
                  • The underscore character (_) is not allowed
                  If the DDNAME represents a data source on magnetic tape, the tape may be either labeled or nonlabeled, as supported by the operating system.
AXSMOD name       name of the access module file to be used to import data
                  The names of the access module files are:
                  OLE DB Access Module
                  • oledb_axsmod.dll on Microsoft® Windows platforms
                  Named Pipes Access Module
                  • np_axsmod.sl on Hewlett-Packard® HP-UX platforms
                  • np_axsmod.so on NCR® MP-RAS, IBM® AIX®, Sun® Solaris® SPARC®, Sun® Solaris® Opteron®, and Novell® SUSE® Linux Enterprise and Red Hat® Enterprise Linux® Advanced Server platforms
                  • np_axsmod.dll on Windows platforms
                  Note: When using the TPump latency option with the Named Pipes Access Module, the Named Pipes Access Module parameter file should use the need_full_block = no option.
                  WebSphere® Access Module for Teradata (client version)
                  • libmqsc.sl on HP-UX platforms
                  • libmqsc.so on MP-RAS, AIX, Solaris SPARC, Solaris Opteron, and Linux platforms
                  • libmqsc.dll on Windows platforms
                  WebSphere® Access Module for Teradata (server version)
                  • libmqs.sl on HP-UX platforms
                  • libmqs on IBM MVS/ESA platforms
                  • libmqs.so on AIX, Solaris SPARC, Solaris Opteron, and Linux platforms
                  • libmqs.dll on Windows platforms
                  You may use your own shared library file name if you have a custom access module.
                  The Large File Access Module is no longer available because the Data Connector API supports file sizes greater than 2 gigabytes on Windows, HP-UX, AIX, and Solaris SPARC platforms.
                  The AXSMOD option is not required for importing:
                  • Disk files on either network- or channel-attached client systems
                  • Magnetic tape files on channel-attached client systems
                  It is required for importing magnetic tape and other types of files on network-attached client systems. Refer to Teradata Tools and Utilities Access Module Reference for more information about the specific access modules.
'init-string'     optional initialization string for the access module

INFILE filename   fully qualified UNIX or Windows path name for an input file on network-attached client systems
                  If the path name has embedded white space characters, you must enclose the entire path name in single or double quotes. If you specify the INFILE filename, the data is read from the specified source. If you also specify the INMOD modulename, the data is passed to the specified module.

HOLD              default condition to not deallocate the input tape device specified by ddname when the import operation completes on channel-attached client systems
                  Instead, the HOLD specification deallocates the device when the entire TPump operation completes.

FREE              deallocation of the tape input device specified by ddname when the import operation completes on channel-attached client systems
                  When deallocated, any attempt to open the input device, either in the same TPump utility task or in another task within the same script, produces an undefined ddname error. The default is to not deallocate the device.

INMOD modulename  optional user-written routine for preprocessing the input data
                  In MVS, the modulename is the name of a load module. In UNIX and Windows, it is the pathname for the INMOD executable code file. The modulename must obey the applicable rules of the external system. A modulename must obey the same construction rules as Teradata SQL column names except that on channel-attached client systems:
                  • The "at" character (@) is allowed as an alphabetic character
                  • The underscore character (_) is not allowed
                  When you specify both the INFILE filename and the INMOD modulename parameters, the input file is read and the data is passed to the INMOD routine for preprocessing. If you do not specify the INFILE filename parameter, your INMOD routine must provide the input data record.
                  Note: When you use an INMOD routine with the INFILE specification, TPump performs the file read operation, and the INMOD routine acts as a pass-through filter. Because an FDL-compatible INMOD routine must always perform the file read operation, you cannot use an FDL-compatible INMOD routine with the INFILE specification of a TPump IMPORT command.
                  Note: On some versions of UNIX, you may have to add the ./ prefix characters to the modulename specification if the module is in the current directory.
Note: When you use an INMOD routine with the INFILE specification, TPump performs the file read operation, and the INMOD routine acts as a pass-through filter. Because an FDL-compatible INMOD routine must always perform the file read operation itself, you cannot use an FDL-compatible INMOD routine with the INFILE specification of a TPump IMPORT command.
Note: On some versions of UNIX, you may have to add a ./ prefix to the modulename specification if the module is in the current directory.

USING (parms)
Character string with the parameters you want to pass to the user exit routine. The parms string can include one or more character strings, each delimited on either end by an apostrophe or quotation mark. The maximum size of the parms string is 1K bytes. Parentheses within delimited character strings or comments have the same syntactical significance as alphabetic characters. Before passing the parms string to the user exit routine, the following items are replaced with a single blank character:
• Each comment
• Each consecutive sequence of white space characters (blank, tab, and so on) that appears outside of delimited strings
The entire parms string must be enclosed in parentheses. On channel-attached client systems, the parentheses are included in the string passed to the user exit routine.
Note: The parms string must be FDLINMOD for user exit routines written for the prior Pascal version of the FastLoad utility (program FASTMAIN).

FROM m
Logical record number, as an integer, of the record in the identified data source where processing is to begin. If you do not use a FROM m specification, TPump begins processing with the first record received from the data source.
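The parms-string preparation described above for USING (comments and white-space runs outside delimited strings collapse to a single blank) can be sketched in Python. This is an illustration only, not TPump code; the function name is my own:

```python
def normalize_parms(parms: str) -> str:
    """Sketch of the parms-string preparation: each /* ... */ comment
    and each run of white space OUTSIDE delimited strings is replaced
    with a single blank; text inside '...' or "..." is left untouched."""
    out, i, n = [], 0, len(parms)
    while i < n:
        c = parms[i]
        if c in "'\"":                       # delimited string: copy verbatim
            j = parms.find(c, i + 1)
            j = n - 1 if j == -1 else j
            out.append(parms[i:j + 1])
            i = j + 1
        elif parms.startswith("/*", i):      # comment becomes one blank
            j = parms.find("*/", i + 2)
            i = n if j == -1 else j + 2
            out.append(" ")
        elif c.isspace():                    # white-space run becomes one blank
            while i < n and parms[i].isspace():
                i += 1
            out.append(" ")
        else:
            out.append(c)
            i += 1
    return "".join(out)

assert normalize_parms("a   b") == "a b"          # run of blanks collapsed
assert normalize_parms("a/* c */b") == "a b"      # comment replaced by a blank
assert normalize_parms("'a  b'") == "'a  b'"      # delimited string untouched
```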
FOR n
Number of records, as an integer, starting at record m, to be processed. If you do not use a FOR n or a THRU k specification, TPump continues processing through the last record obtained from the data source.

THRU k
Logical record number, as an integer, of the record in the identified data source where processing is to end. If you do not use a THRU k or a FOR n specification, TPump continues processing through the last record obtained from the data source.

FORMAT
Record format of the input file. The format can be:
FASTLOAD—A 2-byte integer, n, followed by n bytes of data and an end-of-record marker (either X'0A' or X'0D').
BINARY—A 2-byte integer, n, followed by n bytes of data.
TEXT—An arbitrary number of bytes, followed by an end-of-record marker, which is a:
• Line feed (X'0A') on UNIX platforms
• Carriage-return and line-feed pair (X'0D0A') on Windows platforms
UNFORMAT—Defined by the FIELD, FILLER, and TABLE commands of the specified layout.
Note: When using UNFORMAT formatting in MVS, ensure that the data stream and data source are consistent with the layout defined in the utility script. Discrepancies in the length of the data stream could result in data corruption.
VARTEXT—Variable-length text record format, with each field separated by a delimiter. The rules for a delimiter are:
• No control character other than a TAB character can be used as a delimiter.
• Any character that appears in the data cannot be used as a delimiter.
• A delimiter can be up to 10 characters long.
If you do not specify a FORMAT option, the default is FASTLOAD.
Note: On mainframe platforms, when an access module is not used, the input data is read record by record by default, and the LAYOUT is applied to each record read.
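For illustration, the FASTLOAD record format described above can be sketched in Python. This is not TPump code; the helper name is my own, and a little-endian length field is assumed, matching common client platforms:

```python
import struct

def fastload_record(data: bytes, eor: bytes = b"\x0a") -> bytes:
    """Build one FASTLOAD-format record: a 2-byte integer n (the data
    length), n bytes of data, then an end-of-record marker (either
    X'0A' or X'0D').  Little-endian length assumed here."""
    return struct.pack("<H", len(data)) + data + eor

# A 4-byte data portion becomes a 7-byte record.
rec = fastload_record(b"\x01\x00\x02\x00")
assert rec == b"\x04\x00\x01\x00\x02\x00\x0a"
```

Dropping the end-of-record marker from this sketch yields the BINARY format described above.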
'c'
Optional specification of the delimiter characters that separate fields in the variable-length text records of the input data source. The default, if you do not use a 'c' specification, is the vertical bar character ( | ).
When the character set of the job script is different from the client character set used for the job (for example, on MVS the job script must be in Teradata EBCDIC when using the UTF8 client character set, or the UTF16 client character set can be used with the job script in UTF8), TPump translates the effective delimiter from the script character set form to the client character set form before separating the fields with it.
For example, when the client character set is UTF16 and the script character set is UTF8, if the following command is given:
… FORMAT VARTEXT '-' ...
TPump translates '-' from the UTF8 form to the UTF16 form and then separates the fields in the record according to the UTF16 form of '-'.
Similarly, on the mainframe, when the client character set is UTF8 and the script character set is Teradata EBCDIC, if the following command is given:
… FORMAT VARTEXT '6A'xc ...
TPump interprets X'6A' according to Teradata EBCDIC, translates it to the corresponding Unicode code point, U+007C "VERTICAL LINE", and uses the UTF8 encoding of U+007C, 0x7C (which is '|' in 7-bit ASCII), as the delimiter character for the record.
Caution: When using the UTF8 client character set on the mainframe, examine the definitions in the International Character Set Support manual to determine the code points of the special characters you might require. Different versions of EBCDIC do not always agree on the placement of these characters. For example, the code point of '|' in most IBM EBCDIC code pages is X'4F'.
If you specify '|' as the delimiter in the script, or leave the delimiter at its default (which is essentially the same as specifying '|'), in a system environment using such an IBM EBCDIC code page, but your UTF8 data uses X'7C' ('|' in Unicode) as the delimiter, the job will run into errors because:
1 The code point X'4F' in Teradata EBCDIC maps to U+008D, not U+007C.
2 The delimiter must use single-byte characters when it is in the client character set form.

DISPLAY ERRORS
Optional keyword specification that writes input data records that produce errors to the standard error file.

NOSTOP
Optional keyword specification that inhibits TPump termination in response to an error condition associated with a variable-length text record.

LAYOUT layoutname
Layout of the input record, as specified by a previous LAYOUT command.

APPLY label
Error treatment options specified by a previous DML LABEL command for subsequent INSERT, UPDATE, or DELETE statements.

WHERE condition
Condition that determines whether the indicated label options are applied to the records and sent to the Teradata Database, where:
• condition true = yes
• condition false = no
The condition specification can reference:
• Any combination of fields defined in the currently active layout
• System and user-defined constants and variables
• The fieldname1 specified in commands
When you specify VARTEXT, the TPump utility assumes that the input data is variable-length text fields separated by a field delimiter character. The utility parses each input data record on a field-by-field basis and creates a VARCHAR field for each input text field.
When the character set of the job script is different from the client character set used for the job (for example, on MVS the job script must be in Teradata EBCDIC when using the UTF8 client character set, or the UTF16 client character set can be used with the job script in UTF8), TPump translates the string constants specified in the condition and the import data referenced in the condition to the same character set before evaluating the condition.
For example, when the client character set is UTF16 and the script character set is UTF8, if the following command is given:
… APPLY lable1 WHERE C1 = 'INSERT';
TPump translates the data in the C1 field to the UTF8 form and compares it with the UTF8 form of 'INSERT' to obtain the evaluation result.
Similarly, on the mainframe, when the client character set is UTF8 and the script character set is Teradata EBCDIC, if the following command is given:
… APPLY lable2 WHERE C2 = 'DELETE';
TPump translates the data in the C2 field from the UTF8 form to the Teradata EBCDIC form and performs the comparison with the Teradata EBCDIC form of 'DELETE'.
Caution: When using the UTF8 client character set on the mainframe, be sure to examine the definitions in the International Character Set Support manual to determine the code points of the special characters you might require. Different versions of EBCDIC do not always agree on the placement of these characters. The mappings between Teradata EBCDIC and Unicode appear in Appendix E of the International Character Set Support manual.

Usage Notes
A maximum of four IMPORT commands can be used in a single TPump load task. A single load comprises the set of commands and statements bounded by a BEGIN LOAD-END LOAD command pair. If the number of IMPORTs sent to the Teradata Database for the load exceeds four, an error message is logged. TPump is limited to four IMPORTs per load in order to limit the amount of memory needed to keep track of job-related statistics.
The maximum number of INSERT, UPDATE, DELETE, and EXECUTE statements that can be referenced in an IMPORT is 127.
The only DML statements that are candidates for application by an IMPORT command are those within the scope of DML commands whose labels appear in one or more of the IMPORT command's APPLY clauses. The referenced DML commands and their following DML statement(s) must appear between the BEGIN LOAD command that defines the task and the referencing IMPORT commands. A statement or group of statements is applied if no condition is specified, or if the specified condition is true.
TPump permits multiple statements to be applied to the same data record in either of two ways. First, if an APPLY clause refers to a label whose scope includes multiple DML statements, each of these statements is applied to the same data record under the condition specified in the clause. Second, if multiple APPLY clauses are used, each can refer to the label of a different DML statement or group of statements; each label's statements are applied to the same data record under the condition specified in the respective clause. These features allow the same data record to be applied to different tables under the same or differing conditions.

VARTEXT Record Usage
When you specify VARTEXT, TPump assumes that the input data is variable-length text fields separated by a field delimiter character. It parses each input data record on a field-by-field basis and creates a VARCHAR field for each input text field. When using the VARTEXT specification, VARCHAR, VARBYTE, and LONG VARCHAR are the only valid data type specifications to use in TPump layout FIELD and FILLER commands.
Two consecutive delimiter characters direct TPump to null the field corresponding to the position immediately following the first delimiter character.
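As a hypothetical illustration (not TPump code, and limited to a single-character delimiter), the VARTEXT parsing rules in this section can be sketched as follows:

```python
def parse_vartext(record: str, nfields: int, delim: str = "|"):
    """Split one VARTEXT record into nfields VARCHAR values, mirroring
    the rules described in this section: an empty field (two consecutive
    delimiters, or a trailing delimiter) becomes NULL (None); a record
    with fewer fields than the layout defines is an error; extra fields
    beyond the layout are ignored."""
    fields = record.split(delim)
    if len(fields) < nfields:
        raise ValueError("record has fewer fields than the layout defines")
    return [f if f != "" else None for f in fields[:nfields]]

assert parse_vartext("100||Smith", 3) == ["100", None, "Smith"]   # consecutive delimiters
assert parse_vartext("100|Jones|", 3) == ["100", "Jones", None]   # trailing delimiter
assert parse_vartext("100|Jones|50|x", 3) == ["100", "Jones", "50"]  # extras ignored
```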
If the last character in a record is a delimiter character, and there is at least one more field to be processed, TPump nulls the field corresponding to the next one to be processed, as defined in the layout FIELD and FILLER commands.
The total number of fields in each input record must be equal to or greater than the number of fields described in the TPump layout FIELD and FILLER commands. If it is less, TPump generates an error message. If it is more, the Teradata Database ignores the extra fields. The last field of a record can end with a delimiter character, but it is not required to.
When TPump encounters an error condition in an input record, it normally discards the record and terminates. When loading variable-length text records, you can inhibit either or both of these functions by specifying the error-handling options:
• DISPLAY ERRORS
• NOSTOP
If NOSTOP is specified, TPump does not terminate even if an error is encountered. By specifying both options and redirecting STDERR to a file location instead of your terminal screen, the TPump job runs to completion and saves all the error records. You can then manually correct them and load them into the table.
All IMPORT commands for a TPump task must appear between the BEGIN LOAD and END LOAD commands for the task.
TPump imposes several syntax rules on the parms string for an INMOD user exit routine. On entry to any INMOD user exit routine for TPump, the conventional parameter register points to a parameter list of two 32-bit addresses used to communicate with the INMOD.
At the end of an IMPORT, an environmental variable is established for each DML command executed. TPump variables are not constrained to 30 characters. These variables contain the activity counts associated with each statement.
The variables created are of the form:
&IMP<n>_<ApplyLabel>_<x>
where
n = the number of the IMPORT, from one through four.
ApplyLabel = the label of the clause containing the DML command in question.
x = the number of the statement within the containing APPLY clause.
The following script is an example of a TPump job using the APPLY keyword to create conditional clauses that apply DML INSERTs, UPDATEs, and UPSERTs to the IMPORT.

APPLY Example
.BEGIN LOAD SESSIONS 34;
.LAYOUT EQTTB535;
.FIELD Pool_Upd_Code * CHAR(01);
.FIELD Eqmt_Init * CHAR(04);
....
.DML LABEL UPSERTAC DO INSERT FOR MISSING UPDATE ROWS;
UPDATE EQTDBT50.EQTTB535_TAL
SET TCS_POOL_IDFR_NUM = :TCS_POOL_IDFR_NUM .....
WHERE ..... ;
INSERT INTO EQTDBT50.EQTTB535_TAL VALUES(
POOL_EXPN_DATE = :POOL_EXPN_DATE (DATE, FORMAT 'YYYYMMDD')
..... );
.DML LABEL UPSERTDL;
UPDATE EQTDBT50.EQTTB535_TAL
SET .....
WHERE ..... ;
.IMPORT INFILE INFILE
LAYOUT EQTTB535
APPLY UPSERTAC WHERE (POOL_UPD_CODE = 'C' OR POOL_UPD_CODE = 'A')
APPLY UPSERTDL WHERE POOL_UPD_CODE = 'D' ;
.END LOAD;
/* For the upsert: */
/* (first statement in .DML UPSERTAC) */
/* make sure we have the 50 updates */
.IF &IMP1_UPSERTAC_1 <> 50 THEN .LOGOFF 100;
/* ... and 50 inserts */
/* (second statement in .DML UPSERTAC) */
.IF &IMP1_UPSERTAC_2 <> 50 THEN .LOGOFF 101;
/* And for the plain update: */
/* (first statement in .DML UPSERTDL) */
/* we should have 10 of these. */
.IF &IMP1_UPSERTDL_1 <> 10 THEN .LOGOFF 102;
.LOGOFF;

INSERT

Purpose
TPump supports the Teradata SQL INSERT statement, which adds new rows to a table by directly specifying the row data to be inserted.
Syntax
{ INSERT | INS } [ INTO ] tname .* ;
or
{ INSERT | INS } [ INTO ] tname [ ( cname [, cname ] ... ) ] VALUES ( { :fieldname | expression } [, ... ] ) ;

where

tname
Table that is to receive rows created from input data records. If the table is not explicitly qualified by a database name, the default database qualifies it.

cname
Column of the specified table that is to receive the value from a field of matching input records, where the value is identified by the corresponding entry in the fieldname list.

fieldname
Field of an input record whose value is given to a column of the specified table identified by the corresponding entry in the cname clause of this statement. If this statement does not specify a cname, the table's CREATE statement provides the corresponding column identifier. This assumes that all columns of the table correspond to those specified in the original CREATE statement.

expression
As an alternative to the fieldname clause, an expression that includes one or more actual fieldnames as terms may be used instead.

Usage Notes
The following notes describe how to use an INSERT statement following a DML command. An INSERT statement may also be used in the support environment; normal rules for INSERT are followed in that case.
One way of specifying the applicable DML statements is to relate each field name to the name of the column to which the field's data is applied. Another way tells TPump to apply the first nonfiller field of a record that is sent to the Teradata Database to the first defined column of the affected table, the second nonfiller field to the second column, and so on.
TPump converts INSERT statements into macro equivalents and, depending on the packing specified, submits multiple statements in one request. To insert records into the table identified by tname, the username specified in the LOGON command must have the INSERT privilege for the table.
A value must be specified for every column, either explicitly or by default. For TPump use, if the object of the INSERT statement is a view, it must not specify a join. TPump operates only on single-table statements, so INSERT statements must not contain any joins.
The correspondence between the fields of data records to be inserted into a table and the columns of the table can be specified in any of four ways. These appear in the following examples, using targetable as the table or view name.
The maximum number of INSERT, UPDATE, and DELETE statements that can be referenced in an IMPORT is 127. The maximum number of DML statements that can be packed into a request is 600. The default number of statements packed is 20.

ANSI/SQL DateTime Specifications
You can use the ANSI/SQL DATE, TIME, TIMESTAMP, and INTERVAL DateTime data types in Teradata SQL CREATE TABLE statements. Specify them as column/field modifiers in INSERT statements.

Example 1
.BEGIN LOAD SESSION number;
.LAYOUT Layoutname;
.TABLE Targetablename;
.DML LABEL DMLlabelname;
INSERT INTO Targetablename.*;
.IMPORT INFILE Infilename LAYOUT Layoutname APPLY DMLlabelname;
.END LOAD;

Example 2
.LAYOUT lname;
.FIELD first 1 somedatatype;
.FIELD f2nd * anydatatype;
. . .
.FIELD flast * datatype;
.DML LABEL label;
INSERT INTO targetable VALUES (:first, :f2nd, ... :flast);

Example 3
.LAYOUT lname;
.FIELD first 1 somedatatype;
.FIELD f2nd * anydatatype;
. . .
.FIELD flast * datatype;
.DML LABEL label;
INSERT INTO targetable (col1, col2, ... colast) VALUES (:f2nd, :first, ... :flast);

Example 4
An input data source contains a series of 10- to 40-byte records. Each record contains the primary index value (EmpNum) of a row that is to be inserted successively into the Employee table, whose columns are EmpNo, Name, and Salary.
.BEGIN LOAD SESSION number;
.LAYOUT Layoutname;
.FIELD EmpNum 1 INTEGER;
.FIELD Name * (VARCHAR (30));
.FIELD Sal * (DECIMAL (7,2));
.DML LABEL DMLlabelname;
INSERT Employee (EmpNo, Name, Salary) VALUES (:EmpNum, :Name, :Sal);
.IMPORT INFILE Infilename LAYOUT Layoutname APPLY DMLlabelname;
.END LOAD;

LAYOUT

Purpose
The LAYOUT command, in conjunction with the immediately following sequence of FIELD, FILLER, and TABLE commands, specifies the layout of the externally stored data records.

Syntax
.LAYOUT layoutname [ CONTINUEIF condition ] [ INDICATORS ] ;

where

layoutname
Name assigned to the layout for reference by one or more subsequent IMPORT commands. A layoutname must obey the same construction rules as Teradata SQL column names. The name specified may also be used in the LAYOUT clause of an IMPORT command.

CONTINUEIF condition
Condition of the following form:
position = value
where position is an unsigned integer (never an asterisk) that specifies the starting character position of the field of every input record that contains the continuation indicator, and value is the continuation indicator specified as a character constant or a string constant. TPump uses the length of the constant as the length of the continuation indicator field.
In the CONTINUEIF option, the condition specified as position = value is case-sensitive; verify that the correct case has been specified for this parameter.
If the condition is true, TPump forms a single record to be sent to the Teradata Database by concatenating the next input record at the end of the current record. (The current record is the one most recently obtained from the external data source.)
If the condition is false, TPump sends the current input record to the Teradata Database, either by itself or as the last of a sequence of concatenated records.
Note that the starting position of the continuation indicator field is specified as a character position of the input record. Character positions start with character position one.
TPump removes the continuation indicator field from the input records so that it is not part of the final concatenated result. For other fields whose startpos is specified as an unsigned integer, the startpos refers to the field position within the final concatenated result. Consequently, you cannot define the continuation indicator field as all or part of a field defined with the FIELD, FILLER, or TABLE commands. Refer to the startpos parameter of the FIELD command.
When the character set of the job script is different from the client character set used for the job (for example, on MVS the job script must be in Teradata EBCDIC when using the UTF8 client character set, or the UTF16 client character set can be used with the job script in UTF8), TPump translates the specified value, which is either a character constant or a string constant, from the script character set form to the client character set form before evaluating the condition. TPump uses the length of the constant in the client character set form as the length of the continuation indicator field.
Caution: When using the UTF8 client character set on the mainframe, be sure to examine the definitions in the International Character Set Support manual to determine the code points of the special characters you might require. Different versions of EBCDIC do not always agree on the placement of these characters. The mappings between Teradata EBCDIC and Unicode appear in Appendix E of International Character Set Support.
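The CONTINUEIF behavior described above can be sketched in Python. This is an illustration under the stated rules, not TPump code; the function name is my own:

```python
def apply_continueif(records, position, value):
    """Concatenate input records per CONTINUEIF position = value.
    The indicator field starts at the given 1-based character position
    and is as long as the constant; it is stripped from every record,
    so it never appears in the concatenated result.  A true condition
    means the NEXT record continues the current one; a false condition
    ends the current (possibly concatenated) record."""
    start, width = position - 1, len(value)
    out, current = [], ""
    for rec in records:
        flag = rec[start:start + width]
        current += rec[:start] + rec[start + width:]  # strip indicator field
        if flag != value:                             # condition false: record ends
            out.append(current)
            current = ""
    if current:                                       # dangling continuation, if any
        out.append(current)
    return out

# 'C' in position 1 marks a record as continued by the next one.
assert apply_continueif(["CABC", "CDEF", " GHI"], 1, "C") == ["ABCDEFGHI"]
```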
INDICATORS
Condition that the data is in indicator mode. This means that the first n bytes of each record are indicator bytes, where n is the rounded-up integer quotient of the number of fields defined by this LAYOUT command for transmission to the Teradata Database, divided by 8.
TPump sends all the FIELD commands, including redefines, to the Teradata Database. If a field has been defined and then redefined, indicator bits must be set for both. FILLER commands also need indicator bits set. TPump sends both the defined and the redefined fields to the Teradata Database; this demonstrates the inefficiency of redefines, which cause the transfer of an extraneous field.
If INDICATORS is specified on the LAYOUT command and the data file does not contain indicator bytes in each record, the target table will be loaded with spurious data. Conversely, if INDICATORS is not specified and the data file contains indicator bytes in each record, the target table will likewise be corrupted. Exercise caution to guard against either occurrence.
A LAYOUT command that includes the INDICATORS option must accurately describe all fields of the record to agree with the column descriptions and ordering of the table from which this indicator-mode data was previously selected. If the INDICATORS option is specified, TPump sends the data to the Teradata Database with indicators.
The NULLIF parameter of the FIELD command can be specified with or without the INDICATORS option. If NULLIF is specified, TPump sends the data to the Teradata Database with indicators, whether or not the INDICATORS option is specified.

Usage Notes
Although there is no explicit limit to the number of LAYOUT commands allowed, there is a practical limit.
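The indicator-byte arithmetic for the INDICATORS option above can be sketched in Python. This is an illustration only; the most-significant-bit-first bit order shown is an assumption of this sketch, not a statement from this manual:

```python
def indicator_bytes(nfields: int) -> int:
    """Number of leading indicator bytes per record: the rounded-up
    integer quotient of the field count divided by 8."""
    return (nfields + 7) // 8

def null_fields(record: bytes, nfields: int):
    """Return 0-based positions of fields flagged NULL by the leading
    indicator bytes (first bit of the first byte = first field,
    most-significant-bit-first order assumed)."""
    return [i for i in range(nfields)
            if record[i // 8] & (0x80 >> (i % 8))]

assert indicator_bytes(9) == 2                 # 9 fields need 2 indicator bytes
assert null_fields(b"\xa0\x00rest", 9) == [0, 2]  # fields 1 and 3 are NULL
```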
The implied limit on usable LAYOUT commands per TPump load is four, because TPump allows up to four IMPORT commands within a load, and each IMPORT can reference only one LAYOUT.
A LAYOUT command must be immediately followed by a combination of FIELD, FILLER, and/or TABLE commands. This sequence of commands, referenced by the layoutname, may describe one or more record formats contained in one or more client data sources (see the redefinition options for FIELD, FILLER, and TABLE). The LAYOUT command sequence is terminated by the first subsequent command that is not a FIELD, FILLER, or TABLE command.
A layoutname may be used by one or more TPump tasks (delimited by BEGIN LOAD and END LOAD) in a single job step and must be defined prior to any IMPORT commands that reference it. All IMPORT commands in a single TPump task must reference the same layoutname in the LAYOUT clause.

LOGDATA

Purpose
Supplies parameters to the LOGMECH command beyond those needed by the logon mechanism, such as user ID and password, to successfully authenticate the user. The LOGDATA command is optional. Whether parameters are supplied, and their values and types, depend on the selected logon method. LOGDATA is available only on network-based platforms.

Syntax
.LOGDATA logdata_string ;
.LOGDATA 'logdata_string' ;

where

logdata_string | 'logdata_string'
Parameters required for the logon mechanism specified using "LOGMECH" on page 161. For information about the logon parameters for supported mechanisms, see the Security Administration guide. The string is limited to 64 KB and must be in the session character set. To specify a string containing white space or other special characters, enclose the data string in single quotes.
Note: The security feature this command supports is not supported with the UTF16 session character set.
Usage Notes
For more information about logon security, see the Security Administration guide.

Examples
If used, the LOGDATA command and the LOGMECH command must precede the LOGON command. The commands themselves may occur in any order. The following example demonstrates using the LOGDATA, LOGMECH, and LOGON commands in combination to specify the Kerberos logon authentication method and associated parameters:
.logmech KRB5;
.logdata joe@domain1@@mypassword;
.logon cs4400s3;

LOGMECH

Purpose
Identifies the appropriate logon mechanism by name. If the specified mechanism requires parameters other than user ID and password for authentication, the LOGDATA command provides these parameters. The LOGMECH command is optional and available only on network-attached systems.

Syntax
.LOGMECH logmech_name ;

where

logmech_name
Definition of the logon mechanism. For a discussion of supported logon mechanisms, see Security Administration. The name is limited to 8 bytes; it is not case-sensitive.

Usage Notes
Every session to be connected requires a mechanism name. If none is supplied, a default mechanism can be used instead, as defined on either the server or client system in an XML-based configuration file. For more information about logon security, see Security Administration.

Examples
If used, the LOGDATA and LOGMECH commands must precede the LOGON command. The commands themselves may occur in any order. The following example demonstrates using the LOGDATA, LOGMECH, and LOGON commands in combination to specify the Windows logon authentication method and associated parameters:
.logmech NTLM;
.logdata joe@domain1@@mypassword;
.logon cs4400s3;

LOGOFF

Purpose
The LOGOFF command disconnects all active sessions and terminates execution of TPump on the client.
An optional return code value may be specified as a conditional or arithmetic expression, evaluated to a signed integer.

Syntax
.LOGOFF [ retcode ] ;

where

retcode
Completion code returned to the client operating system. If retcode is not specified, TPump returns the value generated by the error condition.

Usage Notes
TPump tracks the internal error condition code throughout the job and returns 0 for complete success, 4 for warnings, 12 for fatal errors, or 16 for no sysprint. These values are the "error conditions." To avoid ambiguity or conflict with standard TPump completion codes, values greater than 20 should be used. TPump returns the higher of the value generated by the error condition and the return code specified in LOGOFF.
If the LOGOFF command processes, the highest return code reached was no more than 4 (warning); any return code other than 0 or 4 would have terminated the job. LOGOFF is permitted at any point in the input script and logs you off immediately.

Example
Suppose successful execution of a Teradata SQL statement (such as CREATE TABLE) is necessary to prepare for TPump. If you determine that the statement has failed with an unacceptable completion code, and if BADRC is set to &SYSRC after the failed SQL statement, you can terminate execution of TPump and return the unacceptable code to the client by executing this command:
.LOGOFF &BADRC;
The restart table is dropped when this command is executed. If execution is terminated before the LOGOFF command is encountered, the restart table is not dropped, in order to support a restart at a later time. If a serious error terminates the program before the LOGOFF command is processed, the return code output is the value generated by the error condition rather than the optional retcode specified as a LOGOFF command option.
LOGON

Purpose
The LOGON command establishes a Teradata SQL session between TPump and the Teradata Database. You use it to specify the LOGON string for connecting the sessions required by subsequent functions.

Syntax
Standard LOGON:
.LOGON tdpid/username,password[,'acctid'] ;
Note: On VM/MVS, with the use of the User Logon Exit routine in TDP, the user name is not required. See Teradata Director Program Reference for more information.
Single Sign On LOGON:
.LOGON tdpid/ ;
.LOGON tdpid/,,'acctid' ;
Note: When logon encryption is enabled on the gateway, single sign on is disabled on the client and the standard logon syntax should be used instead.

where

tdpid
Optional identifier associated with a particular copy of the Teradata Database. If this field is not specified, the default tdpid, established by the system administrator, is used. For channel-attached systems, the tdpid string must be in the form TDPn, where n is the TDP identifier.

username
User identifier, up to a 30-character maximum.

password
Optional password associated with the username, up to a 30-character maximum. The Teradata Database must be configured to recognize the password specified.

'acctid'
Optional account identifier associated with the username, up to a 30-character maximum. You must enclose the string specification in single quotes. If this field is not specified, a default 'acctid' is used.

Usage Notes
Both the LOGON command and the LOGTABLE command are required to set up the TPump support environment. You can use them in any order, but they must precede any other commands. However, you can use a RUN FILE command to identify a file containing the LOGON command before the LOGON and LOGTABLE commands.
LOGON and LOGTABLE commands typically occur as:
.logtable logtable001;
.logon tdpx/me,paswd;
When the LOGON command is executed, the initial TPump utility session is logged on. The logon information is saved and reused when processing the BEGIN LOAD command to connect the appropriate number of sessions. The parameters (tdpid, username, password, and 'acctid') are optional and are used in all sessions established with the Teradata Database. The LOGON command may occur only once. The period (.) preceding LOGON is also optional.
The tdpid identifier specifies a particular Teradata Database. See your system or site administrator for the identifier that you plan to use. If you do not specify a tdpid and the site administrator has not updated the System Parameter Block, the default identifier is Teradata Database. The long form of this parameter, tdpx, should be used to avoid CLI errors that can occur when the short form is used. The tdpid parameter is optional if your site has only one TDP, if you have previously executed a TDP command, or if you select the default TDP. This parameter is not case-sensitive.
TPump does not prompt for a username or password. If either or both of these are required and missing, TPump fails and reports the error. Both of these parameters may be optional if a logon exit is used.
Where possible, do not use special characters in the 'acctid' parameter string. Although 'acctid' may contain special characters, they might be interpreted differently by different output devices, so you might have to modify a script containing special characters if your output is routed to another device.
If the 'acctid' contains an apostrophe (single quote) character, use either the second form of the LOGON command, which is delimited by quotation marks, or double the apostrophe character, as follows:
.LOGON 0/fml,fml,"engineering's account"
If the 'acctid' contains an apostrophe (single quote) character, either use the form of the LOGON command that is delimited by double quotes, or double the apostrophe character, as follows:

.LOGON 0/fml,fml,"engineering's account"

or

.LOGON 0/fml,fml,'engineering''s account'

If the 'acctid' does not contain an apostrophe, the two LOGON command forms are the same.

If you enter any parameter incorrectly, the logon fails and TPump returns an error message. For security reasons, the error message does not state in which parameter the error occurred.

If password security on a channel-attached client is a concern, use the RUN FILE command to alter the script to accept the LOGON command from another dataset/file under the control of ACF2 or another client-resident security system. For example:

//stepname EXEC PGM=TPUMP,...
//SECURE   DD DSN=FOO
//SYSIN    DD *
.LOGTABLE MYTABLE;
.RUN SECURE;

You can then log on by simply entering the LOGON command with a valid user name and no password, if your system administrator has granted this option. For example, to log onto TPump as user ABC with ABC as the password (which is masked from view on the output listing), specify the LOGON command on one line:

.logon ABC,ABC

When the command is entered, TPump displays something like:

**** 22:13:18 UTY8400 Teradata Database Release: 12.00.00.00
**** 22:13:18 UTY8400 Teradata Database Version: 12.00.00.00
**** 22:13:18 UTY8400 Default character set: EBCDIC
**** 22:13:18 UTY8400 Maximum supported buffer size: 1M
**** 22:13:26 UTY6211 A successful connect was made to the RDBMS

Logon exits are supported on both mainframe and UNIX clients. The CLIv2 User Logon Exit routine can be used to make some or all logon string elements optional.

LOGON is used with the LOGTABLE command, both of which are required.
LOGON and LOGTABLE may appear in either order, but must precede other commands except RUN FILE commands used to identify the file containing the LOGON command. If you enter LOGON first, you are warned that LOGTABLE is required. The parameters (tdpid, username, password, and 'acctid') are used in all sessions established with the Teradata Database. The LOGON command may occur only once.

Note: If the RDBMS is configured to use single sign on (SSO) and you are logged on to the Teradata client machine, the machine name, user name, and password are not required in the LOGON command. The user name and password combination specified when you logged on to your Teradata client machine is authenticated via network security, such that valid Teradata users are permitted to log on to the Teradata Database. The use of SSO is strictly optional, unless the Gateway has been configured to accept only SSO-style logons.

If you want to connect to a Teradata Database other than the default, the TDPid must be included in the LOGON command. If the TDPid is not specified, the default contained in clispb.dat is used. To be interpreted correctly, the TDPid must be followed by the slash separator (/) to distinguish it from a Teradata Database user name. For example, to connect to slugger, enter one of the following:

.LOGON slugger/;
.LOGON slugger/,,'acctinfo';

If you enter the LOGON command first, TPump warns you that the LOGTABLE command is also required. If an account ID is to be used, the optional account ID must be specified in the LOGON command.

LOGTABLE

Purpose

The LOGTABLE command identifies the table to be used for journaling the checkpoint information required for safe, automatic restart of the TPump support environment in the event of a client or Teradata Database hardware platform failure.
The LOGTABLE command is used in conjunction with the LOGON command, both of which are required. LOGON and LOGTABLE may appear in either order, but must precede any other commands except any RUN FILE commands used to identify the file containing the LOGON command. If you enter LOGON first, you are warned that the LOGTABLE is required.

Caution: Do not share the restart log table between two or more TPump jobs. Each TPump job must have its own restart log table to ensure that it runs correctly. If you do not use a distinct restart log table for each TPump job, the results are unpredictable, and you may not be able to restart one or more of the affected jobs.

Syntax

.LOGTABLE dbname.tname;

where:

dbname — (optional) database name under which the log table exists. The default is the database name associated with the username specified in the LOGON command. TPump searches for the table (tname) in that database unless another database name is specified in this option.

tname — identifier for the restart log table.

Usage Notes

A LOGTABLE command is required for each invocation of TPump, and only a single LOGTABLE command is allowed for each execution. It must precede all environmental and application commands (other than RUN FILE and LOGON) in the input stream.

The specified table is used as the TPump restart log. It does not have to be fully qualified. If the table exists, it is examined to determine whether this is a restart; when it is, a restart is done automatically. If the table does not exist, it is created and used as a restart log during this invocation of TPump.

The log table is maintained automatically by TPump. If you manipulate this table in any way, the restart capability is lost. The only action that you should take is to DROP the log table; never attempt to delete rows from it.
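For example, once a job has completed and no restart is needed, the whole restart log table would be dropped with an ordinary DROP TABLE statement (the table name here is a hypothetical illustration):

```
DROP TABLE Mine.Logtable001;
```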
The log table should not be dropped while the TPump job using it is running. If the log table is dropped during a job run, TPump encounters errors.

You cannot override the default for the database name with a DATABASE statement, because it would have to come after LOGTABLE/LOGON. Instead, use the LOGTABLE dbname option. TPump allows a DELETE DATABASE statement because DELETE is a standard Teradata SQL function; this statement can delete the current restart log after it has been created, which terminates the job.

Example

The following example presents both the LOGTABLE command and the LOGON command as they typically occur:

.logtable Mine.Logtable001;
.logon tdpx/me,paswd;

Log Table Space Requirements

The calculation of space requirements for a TPump log table is highly dependent on the specifics of the job. Although there are mandatory inserts for every TPump job, others occur on a job-dependent basis. See "Estimating Space Requirements" for details on how to calculate log table space.

NAME

Purpose

The NAME command assigns a unique job name identifier to the environmental variable &SYSJOBNAME.

Syntax

.NAME jobname;

where:

jobname — character string that identifies the name of a job, in a maximum of 16 characters. If this command is not specified, the default job name ltdbase_logtable is used, where:
1 ltdbase is a character string of up to the first seven characters of the name of the database containing the log table.
2 logtable is a character string with the first eight characters of the log table name.

Usage Notes

The NAME environmental command must be used only once, to set the job name and the variable &SYSJOBNAME; further attempts to execute the command fail. The NAME command sets the variable &SYSJOBNAME to the specified string, truncated to 16 characters.
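As an illustration (the job name shown is hypothetical), a single NAME command near the top of the script sets &SYSJOBNAME directly:

```
.NAME nightly_phones;
```

TPump then uses this value when composing default names for database artifacts such as the error table and TPump-created macros.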
It is an error to use this command more than once in a TPump script, or after the first BEGIN LOAD command in the script. If &SYSJOBNAME is not set using the NAME command, it defaults to MYYYYMMDD_HHMMSS_LLLLL, where:

M = macro
YYYY = year
MM = month
DD = day
HH = hour
MM = minute
SS = second
LLLLL = the low-order 5 digits of the logon sequence number returned by the DBS from the .LOGON command

This variable is not set until created with the NAME command, or with the first BEGIN LOAD by default. Any attempt to use it before a NAME command is issued (or before the first BEGIN LOAD if there is no NAME command) results in a syntax error. This variable is significant because TPump uses it when composing default names for various database artifacts, namely the error table and TPump-created macros.

Note: If serialization for two or more DML statements is required, they cannot be put in different partitions. Serialization requires that all DML statements with identical row hash values be submitted from the same session.

PARTITION

Purpose

The PARTITION command defines a collection of sessions used to transfer SQL requests to the Teradata RDBMS. A DML command may name the partition to be used for its requests to the RDBMS. A default session partition may still be created using the SESSIONS and PACK parameters of the BEGIN LOAD command.

This command works in conjunction with a DML parameter, PARTITION, which names the session partition that a DML's SQL will use. If the DML command does not have a PARTITION parameter, the default partition created using the SESSIONS and PACK parameters of the BEGIN LOAD command is used.
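As a minimal sketch of how the two pieces connect (all object and field names here are hypothetical), a PARTITION command declares the session group and a later DML command selects it by name:

```
.PARTITION fastpart SESSIONS 4 PACK 10;
.DML LABEL ins01 PARTITION fastpart;
INSERT MyTable (C1, C2) VALUES (:F1, :F2);
```

A DML label written without a PARTITION parameter would instead fall back to the default partition built from the SESSIONS and PACK parameters of the BEGIN LOAD command.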
Syntax

.PARTITION partition_name SESSIONS number [threshold]
    [DATAENCRYPTION {ON | OFF}]
    [PACK statements | PACKMAXIMUM];

where:

number — number of sessions to be logged on for the partition. TPump logs on and uses the number of sessions specified to communicate requests to the Teradata Database. There is no default value for number; it must be specified. Neither is there a maximum value, except for system-wide session limitations, which vary among machines. Limiting the number of sessions conserves resources on both the external system and the Teradata Database, at the expense of a potential decrease in throughput and increase in elapsed time.

DATAENCRYPTION ON/OFF — keyword to encrypt the import data and request text during communication between TPump and the Teradata Database for the sessions defined in the PARTITION command. If ON, encryption is performed; if OFF, it is not. If DATAENCRYPTION is not specified, the default is OFF when the "-y" runtime parameter is not specified and DATAENCRYPTION is OFF in the BEGIN LOAD command. If the "-y" runtime parameter is specified, or DATAENCRYPTION is ON in the BEGIN LOAD command, the default is ON. This option applies to the sessions defined by the PARTITION command. When specified explicitly, the setting overrides the encryption setting from the "-y" runtime parameter and from the DATAENCRYPTION option of the BEGIN LOAD command for the sessions defined in the PARTITION command.

PACK — keyword for the number of statements to pack into a multiple-statement request. The maximum value is 600. Packing improves network/channel efficiency by reducing the number of sends and receives between the application and the Teradata Database.
PACKMAXIMUM — keyword requesting TPump to dynamically determine the maximum possible PACK factor for the current partition. The maximum value is 600. The value thus determined is displayed in message UTY6652 and should be specified explicitly on subsequent runs, because PACKMAXIMUM requires iterative interactions with the RDBMS during initialization to heuristically determine the maximum possible PACK factor.

partition_name — name assigned to the partition for reference by one or more subsequent DML commands. A partition name must obey the same construction rules as Teradata SQL column names. The name specified may be used in the PARTITION clause of a DML command.

SESSIONS — keyword for designating the number of sessions for the partition.

statements — number of statements, as a positive integer of up to 600, to pack into a multiple-statement request. The default value is 20 statements per request.

Note: Under certain conditions, TPump may determine that the pack factor has been set too high. TPump then automatically lowers the pack setting to an appropriate value, issues warning message UTY6625 (for example: "UTY6625 WARNING: Packing has been changed to 12 statements per request"), and continues.

Packing improves network/channel efficiency by reducing the number of sends/receives between the application and the RDBMS. The packing factor is validated by sending a fully packed request to the Teradata Database using a prepare. This test checks for syntax problems and for requests that are excessively large and would overwhelm the parser. To simplify script development, TPump ignores certain errors returned by an overloaded parser, shrinks the request, retries the prepare until it executes successfully and, finally, issues a warning noting the revised packing factor size.
When this happens, the TPump script should be modified to eliminate the warning, which avoids the time-consuming process of shrinking the request.

Note: A packing failure may occur if the source parcel length does not match the data defined. If this happens, TPump issues the message: "UTY2819 WARNING: Packing may fail because input data does not match with the data defined." To resolve this problem, increase the packing factor and resubmit the job.

threshold — minimum number of sessions to be logged on for the partition. When logging on sessions, if system limits are reached above the threshold value, TPump stops trying to log on and uses whatever sessions are already logged on. If the sessions run out before the threshold is reached, TPump logs off all sessions, waits for the time determined by the SLEEP value (specified in the BEGIN LOAD command), and tries to log on again.

Example

A sample script that uses partitioning follows:

.LOGTABLE TPLOG01;
.LOGON cs4400s3/cfl,cfl;
DROP TABLE TPTBL01;
DROP TABLE TPTBL02;
DROP TABLE TPERR01;
CREATE TABLE TPTBL01, FALLBACK(
    C1 CHAR(12) not null,
    C2 CHAR(8) not null)
    PRIMARY INDEX (C1);
CREATE TABLE TPTBL02, FALLBACK(
    C1 CHAR(12),
    C2 CHAR(8),
    C3 CHAR(6))
    UNIQUE PRIMARY INDEX (C1);
.BEGIN LOAD ERRLIMIT 100 50
    CHECKPOINT 15 TENACITY 2
    ERRORTABLE TPERR01 ROBUST off
    serialize on;
.LAYOUT LAY02;
.FIELD cc1 * CHAR(12) key;
.FIELD cc2 * CHAR(8);
.FIELD cc3 * CHAR(6);
.filler space1 * char(1);
.partition part1 pack 10 sessions 10;
.partition part2 sessions 5 1 packmaximum;
.DML LABEL LABEL01 partition part1
    DO INSERT FOR MISSING ROWS
    ignore extra update rows
    use(cc1, cc2);
UPDATE TPTBL01 SET C2 = :CC2 WHERE C1 = :CC1;
INSERT TPTBL01 (C1, C2) VALUES (:CC1,:CC2);
.DML LABEL LABEL02 partition part2
    serializeon(cc1)
    ignore extra update rows
    DO INSERT FOR MISSING UPDATE ROWS;
UPDATE TPTBL02 SET C2 = :CC2 WHERE C1 = :CC1;
INSERT TPTBL02 (C1, C2, C3) VALUES
(:CC1,:CC2,:CC3);
.IMPORT INFILE c:\NCR\Test\TpumpData001.txt
    FORMAT TEXT LAYOUT LAY02
    APPLY LABEL01
    APPLY LABEL02 where CC2 = '00000001';
.END LOAD;
.LOGOFF;

ROUTE

Purpose

The ROUTE command identifies the destination of various outputs produced by TPump.

Syntax

.ROUTE MESSAGES [TO FILE fileid1]
    [WITH ECHO [TO FILE fileid2] | WITH ECHO OFF];

where:

MESSAGES — keyword indicating that the messages (normally written to DDNAME SYSPRINT in VM/MVS, or stdout in UNIX) are to be redirected, sent to an additional destination, or both.

fileid1 and fileid2 — alternate message destination in the external system. UNIX and Windows: the path name for a file; if the path name has embedded white space characters, enclose the entire path name in single or double quotes. VM: a FILEDEF name. MVS: a DDNAME (see the MVS fileid topic in the "Usage Notes" section).

ECHO — additional destination, with a fileid specification. Use the ECHO keyword to specify, for example, that messages be captured in a file (fileid2) while still being written to your terminal. Note: The ECHO OFF specification cancels the additional file specification of a previously established ECHO destination.

Usage Notes

In MVS, fileid is a true DDNAME; in VM/CMS, it is a FILEDEF name; and in UNIX, it is a file pathname. If a DDNAME is specified, TPump writes data records to the specified destination. A DDNAME must obey the same construction rules as Teradata SQL column names, except that the "at" sign (@) is allowed as an alphabetic character and the underscore (_) is not allowed. The DDNAME must also obey the applicable rules of the external system.
If the DDNAME represents a data source on magnetic tape, the tape may be either labelled or nonlabelled (if the operating system supports it). On UNIX systems, you can use an asterisk (*) as the fileid1 or fileid2 specification to route messages to the system console/standard output (stdout) device. The system console is the:
• display screen, in interactive mode
• standard output device, in batch mode

Example 1

.ROUTE MESSAGES TO FILE OUTPUT WITH ECHO TO FILE SYSOUT;

ECHO, when specified with OFF, stops routing output to the previously established echo destination.

Example 2

.ROUTE MESSAGES FILE OUTPUT;

The messages are written to the file designated by OUTPUT from this point on, unless redirected by another ROUTE command. In UNIX-based systems, if "outfilename" is used to redirect "stdout" and is also used as the file in a ROUTE WITH ECHO command, the results written to "outfilename" may be incomplete because of conflicting writes to the same file.

RUN FILE

Purpose

The RUN FILE command invokes the specified external source as the current source of commands and statements.

Syntax

.RUN FILE fileid
    [IGNORE {charpos1 | charpos1 THRU | THRU charpos2 | charpos1 THRU charpos2}];

where:

fileid — data source of the external system. The client system DD (or equivalent) statement specifies a file. UNIX and Windows: infilename (the path name for a file); if the path name has embedded white space characters, enclose the entire path name in single or double quotes. MVS: a true DDNAME. If a DDNAME is specified, TPump reads data records from the specified source. A DDNAME must obey the same construction rules as Teradata SQL column names, except that:
• the "at" sign (@) is allowed as an alphabetic character
• the underscore (_) is not allowed
The DDNAME must also obey the applicable rules of the external system.
If the DDNAME represents a data source on magnetic tape, the tape may be either labelled or nonlabelled (if the operating system supports it). VM/CMS: a FILEDEF name.

charpos1 and charpos2 — start and end character positions of a field in each input record that contains extraneous information. TPump ignores the specified field(s) as follows:
1 charpos1: only the single specified character position is ignored.
2 charpos1 THRU: character positions from charpos1 through the end of the record are ignored.
3 THRU charpos2: character positions from the beginning of the record through charpos2 are ignored.
4 charpos1 THRU charpos2: character positions from charpos1 through charpos2 are ignored.

Usage Notes

Once TPump executes the RUN FILE command, further commands and DML statements are read from the specified source until a LOGOFF command or an end-of-file condition is encountered, whichever occurs first. An end-of-file condition automatically causes TPump to resume reading its commands and DML statements from the previously active source (or the previous RUN source when RUNs are nested): either SYSIN for VM/MVS, or stdin (normal or redirected) in UNIX. After SYSIN/stdin processes any user-provided invocation parameter, it remains the active input source. If an end-of-file condition occurs on fileid, SYSIN/stdin is read because there are no more commands or statements in the PARM string.

The command source specified by a RUN FILE command may contain nested RUN FILE commands, to 16 levels. On UNIX systems, you can use an asterisk (*) as the fileid specification for the system console/standard input (stdin) device.
The system console is the:
• keyboard, in interactive mode
• standard input device, in batch mode

Example

.RUN FILE LOGON;

SET

Purpose

The SET command assigns a data type and a value to a utility variable. Variables need not be declared in advance to be the object of the SET command: if a variable does not already exist, the SET command creates it. If the variable has already been defined, the SET command dynamically changes its data type to that of the assigned value. Variables used to the right of TO in the expression must already have been defined.

Syntax

.SET var TO expression;

where:

var — name of the utility variable that is set to the evaluated expression following it.

expression — value and data type to which the utility variable is to be set.

Usage Notes

The utility variable can be substituted wherever substitution is allowed. If the expression evaluates to a numeric value, the symbol is assigned an integer value, as in:

.SET FOONUM TO -151;

If the expression is a quoted string, the symbol is assigned a string value, as in:

.SET FOOCHAR TO '-151';

The minimum and maximum limits for Floating Point data types are:

4.0E-75 <= abs(float variable) < 7.0E75

Example 1

TPump supports concatenation of variables using the SET command, such as:

.SET C TO 1;
.SET D TO 2;
.SET X TO &C.&D;

In this example, X evaluates to 12.

Example 2

If a decimal point is added to the concatenated variables, as in:

.SET C TO 1;
.SET D TO 2;
.SET X TO &C..&D;

X then evaluates to 1.2.

SYSTEM

Purpose

The SYSTEM command allows you to access the local operating system during TPump operations.
Syntax

.SYSTEM 'oscommand';

where:

'oscommand' — command string (enclosed in single quotes) that is appropriate to the local operating system.

The SYSTEM command suspends the current TPump application in order to execute the command. When the command completes, the return code from the invoked command is displayed, and the &SYSRC variable is updated with the return code.

Usage Notes

On MVS clients, the command is passed to the PGM executor. The first token of the command string is interpreted as a load module and the remainder as a PARM string. For example, the following statement calls the load module IEBUPDTE, passing the PARM string "NEW":

.SYSTEM 'IEBUPDTE NEW';

This command calls IEBUPDTE in the same way it is called via the JCL statement:

//EXEC PGM=IEBUPDTE,PARM='NEW'

On MVS, the program must be present in the STEPLIB or JOBLIB concatenation, be resident in one of the LPAs, or be located in the linklist concatenation. Otherwise, the invocation fails with code SYS_ABTM (-14), resulting from an underlying abend S806-04. Other types of failures are also possible. Similarly, on VM clients, if the command to be executed is not found, an abend is likely to occur.

On VM clients, the SYSTEM command is passed to the CMS SUBSET executor and the result is returned. On UNIX clients, the SYSTEM command invokes the standard UNIX interface to issue the command to the shell (sh), and waits for its completion.

TABLE

Purpose

The TABLE command identifies a table whose column names and data descriptions are used as the names and data descriptions of the input record fields. These are assigned in the order defined. The TABLE command must be used with the LAYOUT command.
Syntax

.TABLE tableref;

where:

tableref — existing table whose column names and data descriptions are assigned, in the order defined, to the fields of the input data records.

The column names of the table specified by the TABLE command must be Teradata SQL column names that do not require enclosure in quotation marks. Tables cannot be created with invalid column names. Any nonstandard column name causes one of three kinds of errors, depending on the nature of the divergence from the standard:
1 Embedded blanks cause a syntax error, depending on the nonblank contents of the name.
2 Invalid characters cause an invalid name error.
3 Reserved words cause a syntax error that mentions invalid use of the reserved word.

Usage Notes

One or more TABLE commands may be intermixed with FIELD or FILLER commands following a LAYOUT command. This method of specifying record layout fields assumes that each field, as defined by the data description of the corresponding column of tableref, is contiguous with the previous one, beginning at the next available character position beyond any previous field specifications for the input records. The fields must appear in the order defined for the columns of the table.

The object identified by the tableref parameter must be a table. It need not appear as a parameter of the BEGIN LOAD command, but you must either be an owner of the object or have at least one privilege on it. If it is specified as an unqualified table name, the current default database qualifies it.

When serialization has been set to ON by the BEGIN LOAD command, the primary index columns of the specified table are considered key fields for serialization purposes. When the TABLE command is used and the table contains a structured UDT type, TPump returns an external representation of the UDT, which the user is required to transform.
The term "external type" means the data type of the external opaque container for a structured UDT; it is the type returned by the from-sql transform method.

UPDATE Statement and Atomic Upsert

Purpose

TPump supports the UPDATE Teradata SQL statement, which changes field values in existing rows of a table.

Syntax

{UPDATE | UPD} tname SET cname = expr [, cname = expr ...] WHERE condition;

where:

tname — table or view to be updated. This table was previously identified as tname in the BEGIN LOAD command. If tname is not explicitly qualified by a database name, the current default database qualifies it.

cname — column whose value is to be replaced by the value of expr. The column named must not be a column of the primary index.

expr — expression whose resulting value is to replace the current value of the identified column. The expression can contain any combination of constants, current values of columns of the referenced row, or values from fields of input data records. References to fields of input data records take the form :fieldname, where fieldname is defined by a FIELD command or TABLE command of the layout referenced by an IMPORT using this UPDATE.

WHERE condition — conditional clause specifying the row or rows to be updated. The conditional clause can use values from fields of input data records by referring to their field names as :fieldname, where fieldname is defined by a FIELD command or TABLE command. Equality values for all the columns of the primary index must be explicitly specified in this clause.

Usage Notes - Update

The following notes describe how to use an UPDATE statement following a DML command. An UPDATE statement may also be used in the support environment; normal rules for UPDATE are followed in that case.
1 To update records in a table, the userid that is logged on must have UPDATE privilege for the table.
2 In an IMPORT task, if you specify multiple UPI columns, they should all be specified in the WHERE clause of the UPDATE statement. In this case, the WHERE clause is fully qualified, allowing TPump to avoid table locks and optimize the processing.
3 For TPump use, if the object of the UPDATE statement is a view, it must not specify a join. TPump operates only on single-table statements, so UPDATE statements must not contain any joins.
4 Only one object may be identified.
5 The OR construct can be used in the WHERE clause of a DELETE statement; alternatively, two or more separate DML statements (one per OR term) can be used, with the DML statements applied conditionally with the APPLY clause of the IMPORT command. The nature of the alternatives usually makes one of the methods more appropriate.
6 The maximum number of INSERT, UPDATE, DELETE, and EXECUTE statements that can be referenced in an IMPORT is 127.
7 The maximum number of DML statements that can be packed into a request is 600. The default number of statements packed is 20.

Note: To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in the TPump job.

Example

The following example describes an input data source containing a series of 14-byte records. Each record contains the value of the primary index column (EmpNo) of a row of the Employee table whose PhoneNo column is to be assigned a new phone number from the field Fone.
.BEGIN LOAD SESSION number;
.LAYOUT Layoutname;
.FIELD EmpNum 1 INTEGER;
.FIELD Fone * (CHAR (10));
.DML LABEL DMLlabelname;
UPDATE Employee SET PhoneNo = :Fone WHERE EmpNo = :EmpNum;
.IMPORT INFILE Infilename LAYOUT Layoutname APPLY DMLlabelname;
.END LOAD;

Usage Notes - Atomic Upsert

The syntax for an Atomic upsert consists of an UPDATE statement and an INSERT statement separated by an ELSE keyword:

UPDATE <update-operands> ELSE INSERT <insert-operands>;

TPump inserts the ELSE keyword between the UPDATE and INSERT statements by itself, so the user should not enter it in the script. If the ELSE keyword is used in this context, TPump terminates with a syntax error.

The <update-operands> and <insert-operands> are operands for regular UPDATE and INSERT SQL statements, respectively. Only certain types of UPDATE and INSERT operands are valid in an Atomic upsert statement, and the operand parameters within a given upsert statement are subject to further constraints linking the update and insert parameters.

When using the standard upsert feature, the primary index should always be fully specified for the UPDATE statement, just as for other DML in a TPump script, so that the update can be processed as a one-AMP rather than an all-AMP operation. In addition, both the UPDATE and the INSERT of an upsert statement pair should specify the same target table, and the primary index value specified in the UPDATE's WHERE clause should match the primary index value implied by the column values in the INSERT. When processing an Atomic upsert statement, the Teradata Database usually rejects statements that fail to meet these basic upsert constraints and returns an error, enabling TPump to detect and handle constraint violations.

Constraints considered basic to the upsert operation are:
1 SAME TABLE: The UPDATE and INSERT statements must specify the same table.
2  SAME ROW: The UPDATE and INSERT statements must specify the same row; that is, the primary index value in the inserted row must be the same as the primary index value in the targeted UPDATE row.
3  HASHED ROW ACCESS: The UPDATE must fully specify the primary index, allowing the target row to be accessed with a one-AMP hashed operation.

Some of these restrictions concern syntax that is supported in UPDATE and INSERT statements separately, but not when they are combined in an Atomic upsert statement. Restrictions not supported by the Atomic upsert feature that return an error if submitted to the Teradata Database are:

1  INSERT-SELECT: Syntax not supported. The INSERT may not use a subquery to specify any of the inserted values. Note that support of this syntax is likely to be linked to support of subqueries in the UPDATE's WHERE clause constraints as described above, and may involve new syntax features to allow the UPDATE and INSERT to effectively reference the same subquery.
2  UPDATE-WHERE-CURRENT: Syntax not supported. The WHERE clause cannot use an updatable cursor to do what is called a positioned UPDATE. (It is unlikely that this syntax will ever be supported.) Note that this restriction does not prevent cursors from being used in other ways with Atomic upsert statements. For example, a DECLARE CURSOR statement may include upsert statements among those to be executed when the cursor is opened, as long as the upserts are otherwise valid.
3  UPDATE-FROM: Not supported. The SET clause cannot use a FROM clause table reference in the expression for the updated value of a column.
4  UPDATE-WHERE SUBQUERIES: Not supported. The WHERE clause cannot use a subquery either to specify the primary index or to constrain a nonindex column.
Note that supporting this UPDATE syntax would also require support for either INSERT-SELECT or some other INSERT syntax feature that lets it specify the same primary index value as the UPDATE.
5  UPDATE-PRIMARY INDEX: Not supported. The UPDATE cannot change the primary index. This is sometimes called an unreasonable update.
6  TRIGGERS: Feature not supported if either the UPDATE or the INSERT could cause a trigger to be fired. The restriction applies as if the UPDATE and INSERT were both executed, because the parser trigger logic will not attempt to account for their conditional execution. UPDATE triggers on columns not referenced by the UPDATE clause, however, will never be fired by the upsert and are therefore permitted. DELETE triggers cannot be fired at all by an upsert and are likewise permitted. Note that an upsert could be used as a trigger action, but it would be subject to the same constraints as any other upsert. Because an upsert is not allowed to fire any triggers itself, an upsert trigger action must not generate any further cascaded trigger actions.
7  JOIN/HASH INDEXES: Feature not supported if either the UPDATE or the INSERT could cause the join/hash index to be updated. As with triggers, the restriction applies to each upsert as if the UPDATE and INSERT were both executed. While the UPDATE could escape this restriction if the join/hash index does not reference any of the updated columns, it is much less likely (and maybe impossible) for the INSERT to escape this restriction. If the benefit of lifting the restriction for a few unlikely join/hash index cases turns out to be not worth the implementation cost, the restriction may have to be applied more broadly to any table with an associated join/hash index.
TPump treats a failed constraint as a nonfatal error, reports the error in the job log for diagnostic purposes, and continues with the job by reverting to the nonbasic upsert protocol.

To resolve order-dependency issues, TPump always processes the UPDATE before the INSERT because:
• It matches the ordering implied by the upsert name: UP[date] + [in]SERT.
• It matches the ordering implied by the UPDATE-ELSE-INSERT syntax.
• It matches the common definition of upsert semantics.
• It allows for an upsert operation on MULTISET tables, where an insert-first policy would always succeed on the INSERT and never perform the UPDATE.

Existing TPump scripts for upsert do not need to be changed. The following upsert syntax continues to be supported:

    DO INSERT FOR MISSING UPDATE ROWS;
    UPDATE <update-operands>;
    INSERT <insert-operands>;

TPump changes this syntax into Atomic upsert syntax by replacing the semicolon between the UPDATE and INSERT statements with an ELSE keyword, converting the statement pair into a single Atomic upsert statement. If user-created macros are used in place of the UPDATE and INSERT statements, TPump generates:

    EXEC <update-macro> ELSE EXEC <insert-macro>;

This statement, however, does not conform to the Atomic upsert syntax used by TPump.

Atomic Upsert Examples

This section describes several examples that demonstrate how the Atomic upsert feature works, including cases where errors are detected and returned to the user. All of the examples use the same table, called Sales, as shown below:

    CREATE TABLE Sales, FALLBACK,
      (ItemNbr   INTEGER NOT NULL,
       SaleDate  DATE FORMAT 'MM/DD/YYYY' NOT NULL,
       ItemCount INTEGER)
    PRIMARY INDEX (ItemNbr);

Assume that the table has been populated with the following data:

    INSERT INTO Sales (10, '05/30/2005', 1);

A table called NewSales has the same column definitions as those of table Sales.
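For reference, the legacy upsert syntax described above can be placed in a complete, much-abbreviated TPump job. The following is a sketch only: the session count, layout name, field definitions, input file name, and DML label are illustrative, not prescribed.

```sql
.BEGIN LOAD SESSION 4;               /* illustrative session count          */
.LAYOUT SalesLay;                    /* illustrative layout name            */
.FIELD ItemNbr   1 INTEGER;
.FIELD SaleDate  * (CHAR (10));      /* matches DATE FORMAT 'MM/DD/YYYY'    */
.FIELD ItemCount * INTEGER;
.DML LABEL UpsSales
  DO INSERT FOR MISSING UPDATE ROWS; /* TPump converts the pair that follows */
UPDATE Sales SET ItemCount = ItemCount + :ItemCount
  WHERE ItemNbr = :ItemNbr AND SaleDate = :SaleDate;
INSERT INTO Sales (:ItemNbr, :SaleDate, :ItemCount);
.IMPORT INFILE salesdata             /* illustrative input file name        */
  LAYOUT SalesLay
  APPLY UpsSales;
.END LOAD;
```

At run time, TPump would submit the UPDATE and INSERT above as a single UPDATE ... ELSE INSERT ...; request, subject to the basic upsert constraints listed earlier.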
Example 1 (Error: different target tables)

This example demonstrates an upsert statement that does not specify the same table name for the UPDATE part and the INSERT part of the statement.

    UPDATE Sales SET ItemCount = ItemCount + 1
      WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005')
    ELSE INSERT INTO NewSales (10, '05/30/2005', 1);

A rule of an upsert is that only a single table is processed for the statement. Because the tables, Sales and NewSales, are not the same for the upsert statement, an error is returned to the user indicating that the name of the table must be the same for both the UPDATE and the INSERT.

Example 2 (Error: different target rows)

This example demonstrates an upsert statement that does not specify the same primary index value for both the UPDATE and INSERT parts of the statement.

    UPDATE Sales SET ItemCount = ItemCount + 1
      WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005')
    ELSE INSERT INTO Sales (20, '05/30/2005', 1);

The primary index values for the UPDATE and the INSERT must be the same. In this case, an error is returned to the user indicating that the primary index value must be the same for both the UPDATE and the INSERT.

Example 3 (Error: unqualified primary index)

This example demonstrates an upsert statement that does not specify the primary index in the WHERE clause.

    UPDATE Sales SET ItemCount = ItemCount + 1
      WHERE SaleDate = '05/30/2005'
    ELSE INSERT INTO Sales (10, '05/30/2005', 1);

When the primary index is not fully specified in the UPDATE of an upsert statement, an all-row scan to find rows to update might result. This defeats the purpose of upsert, and an error is returned to the user.

Example 4 (Error: missing ELSE)

This example demonstrates an upsert statement with a missing ELSE keyword.
    UPDATE Sales SET ItemCount = ItemCount + 1
      WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005')
    INSERT INTO Sales (10, '05/30/2005', 1);

Without the ELSE keyword, the UPDATE and INSERT are not recognized as a single upsert statement, and a syntax error is returned.

Example 5 (Error: INSERT-SELECT)

This example demonstrates an upsert statement that specifies INSERT-SELECT.

    UPDATE Sales SET ItemCount = ItemCount + 1
      WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005')
    ELSE INSERT INTO Sales
      SELECT * FROM NewSales
      WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005');

The INSERT part of an upsert may not use a subquery to specify any of the inserted values. In this case, a syntax error is returned.

Example 6 (Error: UPDATE-FROM)

This example demonstrates an upsert statement that specifies UPDATE-FROM.

    UPDATE Sales FROM NewSales
      SET Sales.ItemCount = NewSales.ItemCount
      WHERE Sales.ItemNbr = NewSales.ItemNbr
    ELSE INSERT INTO Sales (10, '05/30/2005', 1);

The SET clause may not use a FROM clause table reference in the expression for the updated value of a column, and an error is returned.

Example 7 (Error: UPDATE-WHERE SUBQUERIES)

This example demonstrates an upsert statement that specifies UPDATE-WHERE SUBQUERIES.

    UPDATE Sales SET ItemCount = ItemCount + 1
      WHERE ItemNbr IN (SELECT ItemNbr FROM NewSales)
    ELSE INSERT INTO Sales (10, '05/30/2005', 1);

The WHERE clause of the UPDATE may not use a subquery for any purpose. In this case, error ERRTEQUPSCOM is returned.

Example 8 (Error: UPDATE-PRIMARY INDEX)

This example demonstrates an upsert statement that tries to update a primary index value.

    UPDATE Sales SET ItemNbr = 20
      WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005')
    ELSE INSERT INTO Sales (20, '05/30/2005', 1);

Unreasonable updates, that is, updates that change the primary index value, are not allowed in an upsert statement, and an error is returned.

Example 9 (Valid Upsert UPDATE)

This example demonstrates a successful upsert statement that updates a row.
    UPDATE Sales SET ItemCount = ItemCount + 1
      WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005')
    ELSE INSERT INTO Sales (10, '05/30/2005', 1);

After all of the rules have been validated, the row with ItemNbr = 10 and SaleDate = '05/30/2005' is updated. A successful update of one row results.

Example 10 (Valid Upsert INSERT)

This example demonstrates a successful upsert statement that inserts a row.

    UPDATE Sales SET ItemCount = ItemCount + 1
      WHERE (ItemNbr = 20 AND SaleDate = '05/30/2005')
    ELSE INSERT INTO Sales (20, '05/30/2005', 1);

After all of the rules have been validated, and no row is found with ItemNbr = 20 and SaleDate = '05/30/2005' for the UPDATE, a new row is inserted with ItemNbr = 20. A successful insert of one row results.

CHAPTER 4  Troubleshooting in TPump

This chapter describes the user aids for identifying and correcting errors that may occur during a TPump task. Foremost among these tools are a large number of error messages. For more information on error messages, refer to the Messages manual.

Troubleshooting information in this chapter includes:
• Early Error Detection
• Error Types
• Error Messages
• Reading TPump Error Tables
• TPump Performance Checklist

Early Error Detection

The TPump utility avoids wasting time and resources on a task that contains "terminating" errors in its input statements, commands, or both. To accomplish this, the statements and commands for a task are acquired and analyzed for detectable syntax and other errors before the TPump task is initiated on the Teradata Database. When a BEGIN LOAD command invokes TPump and the utility can complete an error-free pass, it proceeds. If not, TPump cleans up and terminates after the error pass. TPump uses the Teradata Database to detect errors in the set of DML statements for the task.
The first statement in error terminates TPump.

Error Types

Most errors are fatal, resulting in termination of TPump. The exceptions to this general rule are as follows:
• User-specified SQL commands fail with no adverse effect. The variable &SYSRC is set, and if the script tests this variable it can stop the job if necessary.
• Data-related errors in the RDBMS can reach the user-specified error limit before terminating the job. A list of data-related errors is provided in Table 19.
• Errors that can be retried. The error numbers for these types of errors are: 2595, 2631, 2639, 2641, 2826, 2834, 2835, 3110, 3111, 3231, 3120, 3319, 3598, 3603, 5991, 6699, 8018, and 8024.

Error Messages

Teradata Database error message numbers that identify errors that can be fixed and resubmitted are 2631, 2639, 2641, 2834, 2835, 3110, 3111, 3120, 3127, 3178, 3598, 3603, and 8024.

TPump ignores errors on Teradata SQL statements outside of the TPump task; that is, before the BEGIN LOAD command or after the END LOAD command. The TPump job continues and no return code is returned, although Teradata Database error messages are displayed.

When TPump encounters errors caused by a Teradata Database failure, it neither terminates the job nor produces a return code. When the Teradata Database recovers, TPump restarts the job and continues without user intervention. Teradata Database error message numbers identifying a Teradata Database failure are 2825, 2826, 2827, 2828, 3897, and 8018. When one of these errors occurs, a row is inserted into TPump's error table for the statement or data record in question. If the error occurs for one of the statements in a multiple-statement request, the remaining statements are re-driven. The retryable errors are automatically retried up to 16 times if retry times are not specified.

For the complete text and explanation of error messages, refer to the Messages manual.
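As an illustration of the first exception above, a script can test &SYSRC after a user-specified SQL statement and stop the job itself. The following is a sketch only: it assumes conditional-logic commands (IF/ENDIF) and a LOGOFF return code behave as in comparable Teradata load utilities, and the table name is illustrative.

```sql
DROP TABLE OldStage;     /* user SQL; a failure here does not stop TPump */
.IF &SYSRC <> 0 THEN;    /* &SYSRC now holds that statement's return code */
  .LOGOFF &SYSRC;        /* end the job, passing the code back            */
.ENDIF;
```

Check the IF and LOGOFF command descriptions in this manual for the exact conditional syntax before relying on this pattern.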
A row is inserted into TPump's error table for the statement in error. If the error occurs for one of the statements in a multiple-statement request, then the remaining statements are re-driven. These errors include the conditions listed in Table 19.

Table 19: TPump Error Conditions

Error  Description
2603   Bad argument for SQRT function.
2604   Bad argument involving %TVMID.%FLDID for SQRT function.
2605   Bad argument for LOG function.
2606   Bad argument involving %TVMID.%FLDID for LOG function.
2607   Bad argument for LN function.
2608   Bad argument involving %TVMID.%FLDID for LN function.
2614   Precision loss during expression evaluation.
2615   Precision loss calculating expression involving %TVMID.%FLDID.
2616   Numeric overflow occurred during computation.
2617   Overflow occurred computing an expression involving %TVMID.%FLDID.
2618   Invalid calculation: division by zero.
2619   Division by zero in an expression involving %TVMID.%FLDID.
2620   The format or data contains a bad character.
2621   Bad character in format or data of %TVMID.%FLDID.
2622   Bad argument for ** operator.
2623   Bad argument involving %TVMID.%FLDID for ** operator.
2650   Numeric Processor Operand Error.
2651   Operation Error computing expression involving %TVMID.%FLDID.
2665   Invalid date.
2666   Invalid date supplied for %TVMID.%FLDID.
2674   Precision loss during data conversion.
2675   Numerical overflow occurred during computation.
2676   Invalid calculation: division by zero.
2679   The format or data contains a bad character.
2682   Precision loss during data conversion.
2683   Numerical overflow occurred during computation.
2684   Invalid calculation: division by zero.
2687   The format or data contains a bad character.
2689   Non-nullable field was null.
2700   Referential constraint violation: invalid Foreign Key value.
2726   Referential constraint violation: cannot delete/update the Parent Key value.
2801   Duplicate unique prime key error in %DBID.%TVMID.
2802   Duplicate row error in %DBID.%TVMID.
2803   Secondary index uniqueness violation in %DBID.%TVMID.
2805   Maximum row length exceeded in %TVMID.
2814   Data size exceeds the maximum specified.
2816   Failed to insert duplicate row into TPump target table. This error occurs if MARK DUPLICATE INSERT/UPDATE ROWS is specified and a duplicate row is detected.
2817   Activity count greater than one for TPump UPDATE/DELETE. This error occurs if MARK EXTRA UPDATE/DELETE ROWS is specified and an activity count greater than one was seen. In this case, the error table row is inserted, but the corresponding UPDATE/DELETE also completes.
2818   Activity count zero for TPump UPDATE or DELETE. This error occurs if MARK MISSING UPDATE/DELETE ROWS is specified and an activity count of zero was seen.
2844   Journal image is longer than maximum.
2893   Right truncation of string data.
3535   A character string failed conversion to a numeric value.
3564   Range constraint: Check error in field %TVMID.%FLDID.
3577   Row size overflow.
3578   Scratch space overflow.
3604   Cannot place a null value in a NOT NULL field.
3751   Expected a digit for the exponent.
3752   Too many digits in exponent.
3753   Too many digits in integer or decimal.
3754   Numeric precision error.
3755   Numeric overflow error.
3756   Numeric divided-by-zero error.
3757   Numeric stack overflow error.
3758   Numeric stack underflow error.
3759   Numeric illegal character error.
3996   Right truncation of string data.
5317   Check constraint violation.
5326   Operand of EXTRACT function is not a valid data type or value.
5410   Invalid TIME literal.
5411   Invalid TIMESTAMP literal.
5991   Error during plan generation.
6705   Illegally formed character string was encountered during translation.
6706   The string contains an untranslatable character.
7433   Invalid time.
7441   Date not corresponding to an existing era.
7442   Invalid era.
7451   Invalid timestamp.
7452   Invalid interval.
7453   Invalid field overflow.
7454   DateTime field overflow.
7455   Invalid time specified.

Reading TPump Error Tables

This section describes the reading and usage of TPump error tables as a diagnostic device to locate and fix problems. For more information, refer to the description of these tables in "BEGIN LOAD".

Occasionally, TPump encounters rows that cannot be correctly processed. When this happens, TPump creates a row in the error table that is produced for each target table. Error tables are structured to provide enough information to reveal the cause of a problem and allow correction.

In the case of missing, duplicate, or extra rows, these are noted in the error table only if the DML command specifies that requirement with the MARK parameter, which is the default for DML statements, except for those participating in an upsert. There are three error codes that relate specifically to the incidence of missing, duplicate, and extra rows:

1  2816: Failed to insert duplicate row into TPump target table. This error occurs if MARK DUPLICATE INSERT/UPDATE ROWS is specified and a duplicate row is detected.
2  2817: Activity count greater than one for TPump UPDATE/DELETE. This error occurs if MARK EXTRA UPDATE/DELETE ROWS is specified and an activity count greater than one resulted. In this case, the error table row is inserted, but the corresponding UPDATE/DELETE also completes.
3  2818: Activity count zero for TPump UPDATE or DELETE. This error occurs if MARK MISSING UPDATE/DELETE ROWS is specified and an activity count of zero resulted.
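Because error tables are ordinary database tables, the rows recording these conditions can be examined with ordinary SQL. The following is a sketch, assuming an acquisition error table named ET_Sales (the actual name is whatever was assigned on the BEGIN LOAD command) with the columns described in Table 20 below:

```sql
SELECT ImportSeq, DMLSeq, SMTSeq, ApplySeq,
       SourceSeq, ErrorCode, ErrorField
FROM   ET_Sales            /* hypothetical error table name */
ORDER  BY SourceSeq;
```

Sorting by SourceSeq lists the failing client records in the order they appear in the input file, which simplifies locating and correcting the source data.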
The error table is used primarily to hold information about errors that occur while the Teradata Database is trying to redistribute the data during the acquisition phase. If the Teradata Database is unable to build a valid primary index, some application phase errors may also be put into this table.

Table 20 defines the Acquisition Error Table, whose column entries comprise the unique primary index.

Table 20: Acquisition Error Table

Column      Data Type        Definition
ImportSeq   byteint          Sequence number assigned to the IMPORT command in which the error occurred.
DMLSeq      byteint          Sequence number assigned to the DML command in which the error occurred.
SMTSeq      byteint          Sequence number of the DML statement in the DML command that was being executed when this error occurred.
ApplySeq    byteint          Sequence number of the APPLY clause in the IMPORT command executing when the error occurred.
SourceSeq   integer          The data row number in the client file that the DBC was building when the error occurred.
DataSeq     byteint          The data source where the record resides.
ErrorCode   char(255)        The RDBMS code for the error.
ErrorMsg    char             The corresponding error message for the error code.
ErrorField  smallint         The number of the field in error, if it can be determined.
HostData    varbyte (63677)  The first 63,677 bytes of client data associated with the error.

The following TPump task describes how to interpret the error table information to isolate and fix the problem. This task is greatly abbreviated, containing only the DML command and the IMPORT command. A probable sequence of actions for locating and fixing the problem follows the task.
    SEQ TYPE  SEQ #  Statement
    --------  -----  ------------------------------------------------------
    DML       001    .DML LABEL FIRSTDML;
    STMT      001    INSERT INTO table1 VALUES( :FIELD1, :FIELD2 );
    STMT      002    UPDATE table2 SET field3 = :FIELD3 WHERE field4 = :FIELD4;
    DML       002    .DML LABEL SECNDDML;
    STMT      001    DELETE FROM table3 WHERE field3 = :FIELD3;
    IMPORT    001    .IMPORT INFILE file1 LAYOUT layout1
    APPLY     001      APPLY FIRSTDML;
    IMPORT    002    .IMPORT INFILE file2 LAYOUT layout2
    APPLY     001      APPLY FIRSTDML
    APPLY     002      APPLY SECNDDML;

In this example, the Statement column represents the user entry. The SEQ # and SEQ TYPE columns are the Sequence Number and Sequence Type assigned to each statement. If an error occurs while using this task and the information in the following error table row is displayed, you can determine where the error occurred and what was being executed at the time of the error.

    ImportSeq  DMLSeq  SMTSeq  ApplySeq  SourceSeq  DBCErrorCode  DBCErrorField
    002        001     002     001       20456      2679          field3

The following sequence provides a series of analytical steps for extracting and interpreting the information in this row of the error table.

1  Check the DMLSeq field to find the statement being executed. It contains the sequence number 001.
2  Check the SMTSeq field. The sequence number 002 in this field indicates that the error occurred while executing the second statement of the first DML command, which is the UPDATE statement in the above task.
3  Verify that the script shows that the DML command is used twice, once in each IMPORT.
4  The value of 002 in the ImportSeq field shows that the error occurred in the second IMPORT clause.
5  The value of 001 in the ApplySeq field indicates that the error occurred in the first APPLY of that clause, which was being executed when the error occurred.
6  The value of 2679 in the DBCErrorCode field shows: "The format or data contains a bad character," which indicates that bad data is coming from the client.
7  The ErrorField field of the error row shows that the error occurred while building field3 of the table.
8  The script then shows that the error occurred when field3 was being built from :FIELD3 in the client data.
9  The LAYOUT clause in the script shows where the problem data is positioned within the row coming from the client.
10 The script shows that the IMPORT clause with the error was loading file2, and indicates what error occurred, which statement detected the error, and which file has the error.
11 The SourceSeq field of the error table pinpoints the problem location in the 20456th record of this file. The problem is isolated and can now be fixed.

Most problems in the error tables do not require as much research as this example required. This error was selected in order to use all of the information in the error table. As a rule, you only need to look at one or two items in the error tables to locate and correct the problem.

TPump Performance Checklist

The following checklist helps to isolate and analyze TPump performance problems and their causes.

1  Monitor the TPump job using the Monitor macros. Determine whether the job is making progress.
2  Check for locks. The existence of locks can be detected by using the Teradata Database Showlocks utility. The existence of transaction locks can be detected by checking for 'blocked' status using Teradata Database utilities that use the performance monitor feature of the Teradata Database (Teradata Manager).
3  Check table DBC.Resusage for problem areas (for example, data bus capacity or CPU capacity at 100% for one or more processors).
4  Avoid large error tables, if possible, because error processing is generally expensive.
5  Verify that the primary index is unique. Nonunique primary indexes can cause severe TPump performance problems.

CHAPTER 5  Using INMOD and Notify Exit Routines

This chapter provides a detailed description of the INMOD feature used in TPump and the notify exit routines that are associated with INMODs. An INMOD is a user-generated module, called by the IMPORT command, which reads data from a data source. TPump honors INMODs developed for other load utilities. Owing to the complexity of this feature, it is described separately in this chapter rather than in the command syntax descriptions.

The following information is included in this chapter:
• Overview
• Using INMOD and Notify Exit Routines

Overview

This section provides an overview of the INMOD and notify exit routines. Information includes INMOD routines, notify exit routines, programming languages, programming structure, routine entry points, the TPump/INMOD routine interface, the TPump/notify exit routine interface, and rules and restrictions for using routines.

INMOD Routines

The term INMOD is an acronym for input modification routine. An INMOD is a user-written routine used by TPump to supply or preprocess input records before they are sent to the Teradata Database. You can use an INMOD routine to supply input records or to perform preprocessing tasks on the input records before passing them to TPump. Such tasks, for example, could:
• Generate records to be passed to TPump.
• Validate data records before passing them to TPump.
• Read data directly from one or more database systems, such as IMS or Total.
• Convert fields in a data record before passing it to TPump.

The INMOD is specified as part of the IMPORT command. See "IMPORT" for INMOD syntax information.
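In outline, naming an INMOD on the IMPORT command takes the place of the INFILE specification. The following is a sketch only: the module name and parameter string are hypothetical, and the exact clause spellings should be verified against the IMPORT command description.

```sql
.IMPORT INMOD myinmod                /* hypothetical user-written module */
  USING('startdate=05/30/2005')      /* optional parameters passed to it */
  LAYOUT Layoutname
  APPLY DMLlabelname;
```

The parameter string given with USING is delivered to the INMOD through the second parameter structure (sequence number, length, and parameter body) described later in this chapter.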
Notify Exit Routines

A notify exit routine specifies a predefined action to be performed whenever certain significant events occur during a TPump job. Notify exit routines are especially useful in an operator-free environment where job scheduling relies heavily on automation to optimize system performance.

For example, by writing an exit routine in C (without using CLIv2) and using the NOTIFY . . . EXIT option of the BEGIN LOAD command, you can provide a routine to detect whether a TPump job succeeds or fails, how many records were loaded, what the return code was for a failed job, and so on.

Programming Languages

The TPump utility is written in:
• SAS/C for channel-attached VM and MVS client systems
• C for network-attached UNIX and Windows client systems

You can write INMOD and notify exit routines in the following programming languages, depending on the platform that runs TPump:

Platform         Routines
VM, MVS          • INMOD routines in Assembler, COBOL, PL/I, or SAS/C
                 • Notify exit routines in SAS/C
UNIX, Windows    • INMOD and notify exit routines in C

Note: Although it is neither certified nor supported, you can write INMOD routines in COBOL on network-attached client systems if you use the Micro Focus COBOL for UNIX compiler.

Programming Structure

Table 21 defines the structure, by programming language, for communicating between TPump and INMOD or notify exit routines.
Table 21: Programming Routines by Language

Routine Language: Assembler

First parameter:

    RRECORD  DSECT
    RTNCODE  DS  F
    RLENGTH  DS  F
    RBODY    DS  CLxxxxx

Note: In the RBODY specification, the body length xxxxx is:
• 32004 for Teradata for Windows
• 64004 for Teradata Database for UNIX

Second parameter:

    IPARM    DSECT
    RSEQ     DS  F
    PLEN     DS  H
    PBODY    DS  CL100

Routine Language: C

First parameter:

    struct {
        long Status;
        long RecordLength;
        char buffer[xxxxx];
    };

Note: In the char buffer specification, the buffer length xxxxx is:
• 32004 for Teradata for Windows
• 64004 for Teradata Database for UNIX

Second parameter:

    struct {
        long seqnum;
        short parmlen;
        char parm[80];
    };

Routine Language: COBOL

First parameter:

    01 INMOD-RECORD.
       03 RETURN-CODE    PIC S9(9) COMP.
       03 RECORD-LENGTH  PIC 9(9) COMP.
       03 RECORD-BODY    PIC X(xxxxx).

Note: In the RECORD-BODY specification, the body length xxxxx is:
• 32004 for Teradata for Windows
• 64004 for Teradata Database for UNIX

Second parameter:

    01 PARM-STRUCT.
       03 SEQ-NUM   PIC 9(9) COMP.
       03 PARM-LEN  PIC 9(4) COMP.
       03 PARM-BODY PIC X(80).

Routine Language: PL/I

First parameter:

    DCL 1 PARMLIST,
        10 STATUS  FIXED BINARY(31,0),
        10 RLENGTH FIXED BINARY(31,0),
        10 REC     CHAR(xxxxx);

Note: In the REC CHAR specification, the length xxxxx is:
• 32004 for Teradata for Windows
• 64004 for Teradata Database for UNIX

Second parameter:

    DCL 1 PARMLIST,
        10 SEQNUM  FIXED BINARY(31,0),
        10 PLENGTH FIXED BINARY(15,0),
        10 PBODY   CHAR(80);

In each structure, the records must be constructed so that the left-to-right order of the data fields corresponds to the order of the field names specified in the TPump LAYOUT command and subsequent FIELD, FILLER, and TABLE commands.

Routine Entry Points

The following table shows the entry points for INMOD routines.
INMOD Routine Language                   Entry Point
SAS/C on VM and MVS platforms            _dynamn
COBOL and PL/I on VM and MVS platforms   DYNAMN
C on UNIX and Windows platforms          _dynamn (or BLKEXIT*)

*Only for FDL-compatible INMODs compiled and linked with BLKEXIT as the entry point. When the FDL-compatible INMOD is used, 'USING("FDLINMOD")' must be specified in the IMPORT statement.

The following table shows the entry points for notify exit routines.

Notify Exit Routine Language             Entry Point
SAS/C on VM and MVS platforms            _dynamn
COBOL and PL/I on VM and MVS platforms   DYNAMN
C on UNIX and Windows platforms          _dynamn

The TPump/INMOD Routine Interface

TPump exchanges information with an INMOD routine by using the conventional parameter register to point to a parameter list of two 32-bit addresses. The first 32-bit address points to a three-value structure consisting of a status code, a length, and a body. The second 32-bit address points to a data structure containing a sequence number and a parameter list.

Status Code

Status Code is a 32-bit signed binary value that carries information in both directions. The TPump-to-INMOD interface uses eight status codes, as defined in Table 22.

Table 22: TPump-to-INMOD Status Codes

Value  Description
0      TPump is calling for the first time and expects the INMOD routine to return a record. At this point, the INMOD routine should perform its initialization tasks before sending a data record to TPump.
1      TPump is calling, not for the first time, and expects the INMOD routine to return a record.
2      The client system has been restarted, the INMOD routine should reposition to the last checkpoint, and TPump is not expecting the INMOD routine to return a data record.
       Note: If the client system restarts before the first checkpoint, TPump sends entry code 0 to re-initialize.
       Repositioning information, provided by the INMOD after a code 3, is read from the restart log table and returned in the buffer normally used for the data record.
3      A checkpoint has been written, the INMOD routine should remember the checkpoint position, and TPump does not expect the INMOD routine to return a data record. In the buffer normally used to return data, the INMOD should return any information (up to 100 bytes) needed to reposition to this checkpoint. The utility saves this information in the restart log table.
4      The Teradata Database has failed, the INMOD routine should reposition to the last checkpoint, and TPump is not expecting the INMOD routine to return a data record.
       Note: If the RDBMS restarts before the first checkpoint, TPump sends entry code 5 for cleanup, and then it sends entry code 0 to re-initialize. TPump reads the repositioning information, provided by the INMOD after a code 3, from the restart log table; it is returned to the INMOD in the buffer normally used for the data record.
5      The TPump job has ended and the INMOD routine should perform any required cleanup tasks.
6      The INMOD should initialize and prepare to receive records.
7      The next record is available for the INMOD.

Table 23 explains the two status codes used by the INMOD-to-TPump interface.

Table 23: INMOD-to-TPump Interface Status Codes

Value              Description
0                  A record is being returned as the body value for a read call (code 1). For calls other than read, a value of 0 indicates successful completion.
Any nonzero value  The INMOD routine is at an end-of-file condition for a read call (code 1). For calls other than read, a nonzero value indicates a processing error that terminates TPump.

Length

Length is the 32-bit binary value that the INMOD routine uses to specify the length, in bytes, of the data record.
The INMOD routine can use a length value of zero to indicate an end-of-file condition.

Body

Body is the area where the INMOD routine places the data record. The maximum record length is 31K (31,744) bytes for Teradata for Windows and 62K (63,488) bytes for Teradata Database for UNIX.

Sequence Number

Sequence number is a 4-byte integer record counter portion of the source sequence number.

Parameter List

The parameter list in the second 32-bit address consists of the following:
• VARCHAR specification
• Two-byte length specification, m
• The m-byte parms string, as parsed and presented by TPump

Caution: To support proper TPump restart operations, INMOD routines must save and restore checkpoint information as described here. INMOD routines that cannot comply with these protocols should terminate if they encounter a restart code 2, 3, or 4; otherwise, data could be corrupted. If the INMOD saves checkpoint information in some other manner, a subsequent restart/recovery operation could result in data loss or corruption.

TPump/Notify Exit Routine Interface

TPump accumulates operational information about specific events that occur during a TPump job. If the BEGIN LOAD command includes a NOTIFY option with an EXIT specification, then, when the specified events occur, TPump calls the named notify exit routine and passes to it:
• An event code to identify the event
• Specific information about the event

Table 24 lists the event codes and describes the data that TPump passes to the notify exit routine for each event. (See the description of the NOTIFY option in the “BEGIN LOAD” command description in Chapter 3: “TPump Commands,” for a description of the events associated with each level of notification: low, medium, high, and ultra.)
Note: To support future enhancements, always make sure that your notify exit routines ignore invalid or undefined event codes, and that they do not cause TPump to terminate abnormally.

Table 24: Events Passed to the Notify Exit Routine

Initialize (event code 0): Successful processing of the NOTIFY option of the BEGIN LOAD command. Data passed:
• Version ID length: 4-byte unsigned integer
• Version ID string: 32-character (maximum) array
• Utility ID: 4-byte unsigned integer
• Utility name length: 4-byte unsigned integer
• Utility name string: 36-character (maximum) array
• User name length: 4-byte unsigned integer
• User name string: 64-character (maximum) array
• Optional string length: 4-byte unsigned integer
• Optional string: 80-character (maximum) array

File or INMOD open (event code 1): Successful processing of the IMPORT command that specifies the file or INMOD routine name. Data passed:
• File name length: 4-byte unsigned integer
• File name: 256-character (maximum) array
• Import number: 4-byte unsigned integer

Checkpoint begin (event code 2): TPump is about to perform a checkpoint operation. Data passed: Record number (4-byte unsigned integer).

Import begin (event code 3): The first record is about to be read for each import task. Data passed: Import number (4-byte unsigned integer).

Import end (event code 4): The last record has been read for each import task.
The returned data is the record statistics for the import task:
• Import number: 4-byte unsigned integer
• Records read: 4-byte unsigned integer
• Records skipped: 4-byte unsigned integer
• Records rejected: 4-byte unsigned integer
• Records sent to the Teradata Database: 4-byte unsigned integer
• Data errors: 4-byte unsigned integer

Error table (event code 5): Processing of the SEL COUNT(*) request completed successfully for the error table. Data passed:
• Table name: 128-byte character (maximum) array
• Number of rows: 4-byte unsigned integer

Teradata Database restart (event code 6): TPump received a crash message from the Teradata Database or from the CLIv2. No data accompanies the Teradata Database restart event code.

CLIv2 error (event code 7): TPump received a CLIv2 error. Data passed: Error code (4-byte unsigned integer).

Teradata Database error (event code 8): TPump received a Teradata Database error that will produce an exit code of 12. Data passed: Error code (4-byte unsigned integer).

Exit (event code 9): TPump completed a load task. Data passed: Exit code (4-byte unsigned integer).

Table statistics (event code 10): TPump has successfully written the table statistics. Data passed:
• Type (I = Insert, U = Update, or D = Delete): 1-byte character variable
• Database name: 64-character (maximum) array
• Table/macro name: 64-character (maximum) array
• Activity count: 4-byte unsigned integer

Checkpoint end (event code 11): TPump successfully completed the checkpoint operation. Data passed: Record number (4-byte unsigned integer).

Interim run statistics (event code 12): TPump is updating the Monitor Interface table, has just completed a checkpoint, or has read the last record for an import task.
The returned data is the statistics for the current load:
• Import number: 4-byte unsigned integer
• Statements sent to the Teradata Database: 4-byte unsigned integer
• Requests sent to the Teradata Database: 4-byte unsigned integer
• Records read: 4-byte unsigned integer
• Records skipped: 4-byte unsigned integer
• Records rejected: 4-byte unsigned integer
• Records sent to the Teradata Database: 4-byte unsigned integer
• Data errors: 4-byte unsigned integer

DML error (event code 13): TPump received a Teradata Database error that was caused by DML and will introduce an error-row insert to the error table. Data passed:
• Import number: 4-byte unsigned integer
• Error code: 4-byte unsigned integer
• Error message: 256-character (maximum) array
• Record number: 4-byte unsigned integer
• Apply number: 1-byte unsigned char
• DML number: 1-byte unsigned char
• Statement number: 1-byte unsigned char
• Record data: 64,004-character (maximum) array
• Record data length: 4-byte unsigned integer
• Feedback: a pointer to a 4-byte unsigned integer

“Feedback” always points to integer 0 when it is passed to the user exit routine. The user may change the value of this integer to 1 to instruct TPump not to log the error to the error table. In this case, TPump does not log the error but otherwise continues its regular processing of the error.

Note: Not all Teradata Database errors cause the Teradata Database error event (code 8). A 3807 error, for example, while trying to drop or create a table, does not terminate TPump.

Rules and Restrictions for Using Routines

The following sections describe the operational rules and restrictions for using INMOD and notify exit routines in TPump jobs.

Specifying Routines

INMOD and notify exit routine names must be unique within the system. A TPump job can specify one INMOD routine with each IMPORT command. These specifications can be to the same or different INMOD routines.
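The event dispatch that Table 24 describes can be sketched in C. The sketch below is illustrative only: the event codes come from Table 24, but the data layout, the union, and the helper name handle_event are assumptions, not the Teradata-supplied definitions.

```c
#include <stdio.h>

/* A few event codes from Table 24 (the full set runs 0 through 13). */
enum {
    NFY_INITIALIZE  = 0,
    NFY_CKPT_BEGIN  = 2,
    NFY_CLIV2_ERROR = 7,
    NFY_EXIT        = 9
};

/* Hypothetical view of the event-specific data; the real interface passes
   a per-event structure, as listed in Table 24. */
typedef union {
    unsigned int record_number;   /* checkpoint begin (code 2) */
    unsigned int error_code;      /* CLIv2 error (code 7) */
    unsigned int exit_code;       /* exit (code 9) */
} nfy_data;

/* Returns a short tag for the event, or "" for codes it does not know.
   Splitting the dispatch out keeps the routine easy to test in isolation. */
const char *handle_event(int event, const nfy_data *d)
{
    (void)d;   /* a real exit would format the event data into its log line */
    switch (event) {
    case NFY_INITIALIZE:  return "load started";
    case NFY_CKPT_BEGIN:  return "checkpoint begin";
    case NFY_CLIV2_ERROR: return "CLIv2 error";
    case NFY_EXIT:        return "load complete";
    default:              return "";   /* unknown or future event: ignore */
    }
}

/* Notify exit entry point: logs the event and always returns quietly, so
   an unrecognized code can never cause TPump to terminate abnormally. */
void _dynamn(int event, const nfy_data *d)
{
    const char *tag = handle_event(event, d);
    if (*tag)
        fprintf(stderr, "notify: %s\n", tag);
}
```

The default branch is the important part: per the note above Table 24, an exit routine must silently ignore event codes it does not recognize.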
In addition to the multiple INMOD routines, each TPump job can specify an exit routine with the NOTIFY... EXIT option of the BEGIN LOAD command.

Compiling and Linking Routines

The methods for compiling and linking routines vary with the operating system. The following sections describe the methods for VM, MVS, UNIX, and Windows.

Using VM

On channel-attached VM client systems, INMOD and notify exit routines must be compiled under SAS/C and passed to CLINK with the following options:
• CLINK <filename>
• LKED
• LIBE
• DYNAMC
• NAME <modulename>

The resulting module, which can be loaded by SAS/C at run time, is placed in a load library called DYNAMC LOADLIB. (The first name must be DYNAMC because this is the only place that SAS/C looks for user load modules.) Multiple load modules can exist in the local library as long as each module has a unique name.

Using MVS

The procedure on MVS platforms is similar to the procedure on VM platforms, with one exception:
• User load modules can be located anywhere, as long as the location is identified by one of the DD name STEPLIB specifications in the JCL.

Using UNIX

On network-attached UNIX client systems, INMOD and notify exit routines must:
• Be compiled with the MetaWare High C compiler
• Be linked into a shared object module
• Use an entry point named _dynamn

Using Windows

On network-attached Windows client systems, INMOD and notify exit routines must:
• Be written in C
• Have an entry point named _dynamn, declared with the _declspec keyword
• Be saved as a Dynamic Link Library (DLL) file

See the examples in Appendix C: “INMOD and Notify Exit Routine Examples” for sample programs and procedures that compile and link INMOD and notify exit routines for your operating system environment.

Addressing Mode on VM and MVS Systems

You can use either 31-bit or 24-bit addressing for INMOD routines on channel-attached systems.
The 31-bit mode provides access to more memory, which enhances performance for TPump jobs with a large number of sessions. Use the following linkage parameters to specify the addressing mode when building INMOD routines for VM and MVS systems:
• For 31-bit addressing: AMODE(31) RMODE(24)
• For 24-bit addressing: AMODE(24) RMODE(24)

INMOD Routine Compatibility with Other Load Utilities

You can use FDL-compatible INMOD routines that were created for FastLoad by including the FDLINMOD parameter as the USING (parms) option of your IMPORT command. Using this parameter provides compatible support operations except for the way checkpointing is performed:
• If your TPump job uses the FROM, FOR, or THRU options to request a range of records from an FDL-compatible INMOD routine, then TPump bypasses any default record checkpoint function. (By default, TPump takes a checkpoint every 15 minutes. You can bypass the TPump checkpoint function by specifying a CHECKPOINT rate of zero in your BEGIN LOAD commands.) If the Teradata Database experiences a restart/recovery operation, TPump starts over and gets the records again from the beginning of the range. Under these same circumstances, if your BEGIN LOAD command included a CHECKPOINT rate other than zero, TPump terminates with an error condition.
• If your TPump job does not request a range of records, then TPump performs checkpointing either by default (every 15 minutes) or per your job specifications. If the Teradata Database experiences a restart/recovery operation and the INMOD routine supports recovery, TPump continues the data acquisition activity from the last recorded checkpoint. Note, however, that the source sequence numbers generated by TPump may not correctly identify the sequence in which the INMOD routine supplied the records. Despite this discrepancy, the data is still applied correctly.

You cannot specify an FDL-compatible INMOD routine with the INFILE specification of a TPump IMPORT command.
When you specify an INMOD routine with the INFILE specification:
• TPump performs the file-read operation
• The INMOD routine acts as a pass-through filter

The combination of an FDL-compatible INMOD routine with a TPump INFILE specification is not valid because an FDL-compatible INMOD routine must always perform the file read operation.

Checkpoints

To support TPump restart operations, your INMOD routine must support checkpoint operations, as described in “The TPump/INMOD Routine Interface” on page 205. If you use an INMOD routine that does not support the checkpoint function, your job may encounter problems when TPump takes a checkpoint. By default, TPump takes a checkpoint every 15 minutes. You can bypass the TPump checkpoint function by specifying a CHECKPOINT rate of zero in your BEGIN LOAD command; that way, the job completes without taking a checkpoint. Though this nullifies the TPump restart/reload capability, it allows you to use an INMOD routine that does not support the checkpoint function.

Using INMOD and Notify Exit Routines

This section provides specific information you need for using INMOD and notify exit routines in TPump. Topics include TPump-specific restrictions, the TPump/INMOD interface for different client operating systems, preparation of the INMOD program, INMOD input values, INMOD output values, and programming specifications for UNIX-based and Windows clients.

TPump-specific Restrictions

INMOD names should be unique within the system. INMODs are not re-entrant and cannot be shared by two TPump (or FastLoad, MultiLoad, or FastExport) sessions at the same time. Some changes have been made to the INMOD utility interface for TPump because of operational differences between TPump and the older utilities. For compatibility with existing INMODs, the FDLINMOD parameter should be used.
The use of this parameter provides support for existing INMODs, with the following restrictions:
• When the FDLINMOD parameter is used, INMODs that are compatible with other utilities may be used. However, if a range of records is requested from an FDL-compatible INMOD (using FROM, FOR, or THRU on the IMPORT command), TPump bypasses any default record checkpointing. If there is a recovery under these circumstances, TPump starts over and acquires the records again from the beginning of the range. Under these same circumstances, if checkpointing is requested by specifying the CHECKPOINT parameter on the BEGIN LOAD command, TPump terminates with an error.
• If a range of records is not requested when using an FDL-compatible INMOD, TPump performs checkpointing, either by default or by the user’s request. If there is a recovery and the INMOD supports recovery, TPump continues its data acquisition from the last recorded checkpoint. However, the source sequence numbers generated by TPump may not correctly identify the sequence in which the INMOD supplied the records. Despite this discrepancy, the data is still applied correctly.
• Warning: You cannot specify an FDL-compatible INMOD routine in conjunction with the INFILE specification of a TPump IMPORT command. If an INMOD is specified together with the INFILE specification, TPump performs the file read operation and the INMOD acts as a pass-through filter. Since an FDL-compatible INMOD always performs the file read operation, it is not valid with a TPump INFILE specification.

The TPump default is to take a checkpoint every 15 minutes. With other loading utilities, checkpointing must be explicitly requested. If you attempt to run with an INMOD that does not use checkpointing, problems may arise when TPump defaults to a checkpoint mode.
To avoid this condition, you can disable TPump checkpointing by specifying zero as the checkpoint rate parameter on the BEGIN LOAD command, so that the checkpoint is never reached. This may be imperative for users who do not have INMODs capable of checkpointing.

TPump/INMOD Interface

This section discusses the TPump/INMOD interface for different client operating systems.

TPump/INMOD Interface on IBM Client-based Systems

The use of an INMOD is specified on the IMPORT command. On IBM client-based systems, the Teradata Database interfaces with INMODs written in C, COBOL, PL/I, and Assembler. Examples of these INMODs are presented in Appendix C: “INMOD and Notify Exit Routine Examples”. An optional parms string to be passed to the INMOD may also be specified on the IMPORT command. TPump imposes the following syntax rules for this string:
• The parms string may include one or more character strings, each delimited on either end by an apostrophe, or delimited on either end by a quotation mark. The maximum size of the parms string is 1 KB.
• If a FastLoad INMOD is used, the parms string of the IMPORT command must be FDLINMOD.
• The parms string passed to an INMOD includes the parentheses used to specify the parm. Thus, if the IMPORT specifies USING (’5’), the entire expression (’5’) is passed to the INMOD.
• Parentheses within delimited character strings or comments have the same syntactical significance as alphabetic characters.
• In the parms string that TPump passes to the INMOD routine, each comment is replaced by a single blank character.
• In the parms string that TPump passes to the INMOD routine, each consecutive sequence of whitespace characters, such as blank, tab, and so on, that appears outside of delimited strings, is replaced by a single blank character.
• FDLINMOD is used for compatibility by pointing to a data structure that is the same for BDL and FDL INMODs.
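The parms string reaches the INMOD as a counted (VARCHAR-style) string: a 2-byte length m followed by the m-byte string, parentheses included. A minimal C sketch of unpacking it follows; the structure name and field names are illustrative assumptions, not the Teradata header definitions.

```c
#include <string.h>

/* Hypothetical layout of the second parameter block TPump passes to an
   INMOD: the sequence number, then the VARCHAR parms specification. */
#pragma pack(push, 2)
typedef struct {
    int            seq_num;         /* record counter portion of the sequence number */
    unsigned short parm_len;        /* m: length of the parms string, in bytes */
    char           parm_str[1024];  /* the m-byte parms string (1 KB maximum) */
} inmod_parms;
#pragma pack(pop)

/* Copy the counted parms string into a NUL-terminated C string. Note that
   the string still carries the enclosing parentheses: USING ('5') arrives
   as ('5'). */
void get_parms(const inmod_parms *p, char *out, size_t outsz)
{
    size_t n = p->parm_len;
    if (n > outsz - 1)
        n = outsz - 1;                /* truncate rather than overrun */
    memcpy(out, p->parm_str, n);
    out[n] = '\0';
}
```

An INMOD that honors the FDLINMOD convention would, for example, look for FDLINMOD in the unpacked string before switching to FastLoad-compatible behavior.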
TPump/INMOD Interface on UNIX-based Systems

On UNIX-based client platforms, TPump is written in C and, therefore, the INMOD procedure is dynamically loaded at runtime, rather than link-edited into the TPump module or operated as a separate executable program. The runtime loader requires that the INMOD module be compiled and linked as a shared object, and that the entry point for the procedure be named _dynamn. The use of an INMOD is specified in the IMPORT command. On UNIX-based systems, the Teradata Database interfaces only with INMODs written in C. An example of a C INMOD is presented in Appendix C: “INMOD and Notify Exit Routine Examples”. An optional parms string to be passed to the INMOD may also be specified on the IMPORT command. TPump imposes these syntax rules:
• One INMOD is allowed for each IMPORT command. Multiple IMPORTs are allowed; these may use the same or different INMODs.
• The input filename parameter specified on the IMPORT command must be the fully qualified UNIX pathname for the input file.
• The INMOD filename parameter specified on the IMPORT command must be the fully qualified UNIX pathname of the INMOD shared object file.
• The parms string may include one or more character strings, each delimited on either end by an apostrophe, or delimited on either end by a quotation mark. The maximum size of the parms string is 1 KB.
• If a FastLoad INMOD is used, the parms string of the IMPORT command must be “FDLINMOD”.
• The parms string as a whole must be enclosed in parentheses.
• Parentheses within delimited character strings or comments have the same syntactical significance as alphabetic characters.
• In the parms string that TPump passes to the INMOD routine, each comment is replaced by a single blank character.
• In the parms string that TPump passes to the INMOD routine, each consecutive sequence of whitespace characters, such as blank, tab, and so on, that appears outside of delimited strings, is replaced by a single blank character.
• FDLINMOD is used for compatibility by pointing to a data structure that is the same for FDL INMODs.

TPump/INMOD Interface on Windows Systems

On Windows client platforms, TPump is written in C and, therefore, the INMOD procedure is dynamically loaded at runtime, rather than link-edited into the TPump module or run as a separate executable program. The runtime loader requires that the INMOD module be compiled and linked as a Dynamic Link Library (DLL) file, and that the entry point for the procedure be named _dynamn. The use of an INMOD is specified in the IMPORT command. On Windows systems, the Teradata Database interfaces only with INMODs written in C. An optional parms string to be passed to the INMOD may also be specified on the IMPORT command. TPump imposes the following syntax rules:
• One INMOD is allowed for each IMPORT command. Multiple IMPORTs are allowed; these may use the same or different INMODs.
• The input filename parameter specified on the IMPORT command must be the fully qualified Windows pathname for the input file.
• The INMOD filename parameter specified on the IMPORT command must be the fully qualified Windows pathname of the INMOD DLL file.
• The parms string may include one or more character strings, each delimited on either end by an apostrophe, or delimited on either end by a quotation mark. The maximum size of the parms string is 1 KB.
• If a FastLoad INMOD is used, the parms string of the IMPORT command must be “FDLINMOD”.
• The parms string as a whole must be enclosed in parentheses.
• Parentheses within delimited character strings or comments have the same syntactical significance as alphabetic characters.
• In the parms string that TPump passes to the INMOD routine, each comment is replaced by a single blank character.
• In the parms string that TPump passes to the INMOD routine, each consecutive sequence of whitespace characters, such as blank, tab, and so on, that appears outside of delimited strings, is replaced by a single blank character.
• FDLINMOD is used for compatibility by pointing to a data structure that is the same for FDL INMODs.

Preparing the INMOD Program

This section describes the protocol used between TPump and an INMOD written for TPump. The protocols are applicable to all client platforms running TPump. Considerations applicable exclusively to UNIX-based clients are contained in “Programming INMODs for UNIX-based Clients” on page 216.

On entry to an INMOD user exit routine for TPump, the conventional parameter register points to a parameter list of two 32-bit addresses. The first 32-bit address points to a data structure containing the following fields:
• Return Code/Function Code: 4-byte integer
• Length: 4-byte integer, the length of the data record
• Data Record: input data record buffer. The maximum length is:
  • 31K or 31,744 bytes for Teradata for Windows
  • 62K or 63,488 bytes for Teradata Database for UNIX

INMOD Input Values

As input to the INMOD routine, Table 25 lists valid values of the Return Code/Function Code field and their meanings:

Table 25: INMOD Input Return Code Values

Code  Description
0     Request for INMOD to initialize and return first record.
1     Request for INMOD to return a record.
2     Request for INMOD to reposition to last checkpoint because of client restart. Repositioning information, provided by the INMOD after a code 3, is read from the restart log table and returned to the INMOD in the buffer normally used for the data record.
3     Request for INMOD to take a checkpoint.
      In the buffer normally used to return data, the INMOD should return any information (up to 100 bytes) that it may need to reposition to this checkpoint. TPump then saves this information in its restart log table.
4     Request for INMOD to reposition to last checkpoint because of Teradata Database failure. Repositioning information, provided by the INMOD after a code 3, is read from the restart log table and returned to the INMOD in the buffer normally used for the data record.
5     Request for INMOD to wrap up at termination.
6     Request for INMOD to initialize.
7     Request for INMOD to receive first (next) record.

INMOD Output Values

As output from the INMOD routine, Table 26 lists valid values of the Return Code field and their meanings:

Table 26: INMOD Output Return Code Values

Code                                 Description
0 on read call (code 1)              Indicates End Of File not reached. The length field should be set to the length of the output record. If an input record was supplied to the INMOD and it is to be skipped, set the length field to zero. If no input record was supplied, setting the length to zero acts as an End Of File.
Non-0 on read call (code 1)          Indicates End Of File.
0 on non-read call (not code 1)      Indicates successful operation.
Non-0 on non-read call (not code 1)  Indicates a processing error. TPump terminates.

The second 32-bit address points to a data structure containing the following fields:
• Sequence Number: 4-byte integer, the integer record counter portion of the source sequence number.
• Parameter List: VARCHAR, a 2-byte length, m, followed by the m-byte parms string as parsed and presented by TPump.

INMODs that cannot comply with these protocols should terminate if a Restart Code 2, Code 3, or Code 4 is encountered. Otherwise, data might become corrupted.
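Put together, Tables 25 and 26 describe a complete calling convention. The following C sketch shows a _dynamn that honors it; the structure layout, the fixed input file name inmod_input.txt, and the newline-delimited record format are illustrative assumptions, not part of the TPump interface.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative layout of the first parameter block (not the official
   header): function/return code, record length, and the record buffer. */
typedef struct {
    int  code;            /* in: function code 0-7; out: return code */
    int  len;             /* out: length of the record placed in body */
    char body[31744];     /* data record buffer (31K on Windows) */
} inmod_data;

static FILE *src;         /* input source opened by the INMOD itself */
static long  chkpt_pos;   /* file offset remembered at the last code 3 */

void _dynamn(inmod_data *d)
{
    int fn = d->code;     /* function code from TPump (Table 25) */
    switch (fn) {
    case 0:                               /* initialize and return first record */
    case 6:                               /* initialize only */
        src = fopen("inmod_input.txt", "r");
        chkpt_pos = 0;
        if (!src) { d->code = 1; return; }    /* nonzero: error (Table 26) */
        if (fn == 6) { d->code = 0; return; }
        /* fn == 0: fall through and return the first record */
    case 1:                               /* return next record */
    case 7:
        if (fgets(d->body, sizeof d->body, src)) {
            d->len  = (int)strcspn(d->body, "\n");
            d->code = 0;                  /* 0 on a read call: record returned */
        } else {
            d->len  = 0;
            d->code = 1;                  /* nonzero on a read call: end of file */
        }
        return;
    case 2:                               /* client restart: reposition */
    case 4:                               /* Teradata Database restart: reposition */
        memcpy(&chkpt_pos, d->body, sizeof chkpt_pos);  /* info TPump restored */
        fseek(src, chkpt_pos, SEEK_SET);
        d->code = 0;
        return;
    case 3:                               /* checkpoint: save position (<= 100 bytes) */
        chkpt_pos = ftell(src);
        memcpy(d->body, &chkpt_pos, sizeof chkpt_pos);
        d->len  = (int)sizeof chkpt_pos;
        d->code = 0;
        return;
    case 5:                               /* end of job: clean up */
        if (src) fclose(src);
        d->code = 0;
        return;
    default:
        d->code = 0;                      /* unknown code: ignore */
        return;
    }
}
```

Codes 3 and 2/4 illustrate the checkpoint contract: at a code 3 the INMOD writes its repositioning data into the record buffer for TPump to save, and at a code 2 or 4 the same bytes come back in that buffer for the INMOD to act on.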
In order to be restartable, INMODs must make use of TPump to save and restore checkpoint information as described above. If the INMOD saves its checkpointing information privately, recovery might result in data corruption.

Note: For VM users, INMODs must be link-edited into a CMS LOADLIB with the name DYNAMC LOADLIB to be available for use with TPump.

Note: On MVS, the module must reside in the steplib/joblib (for JCL), task library (for clist/exec), or the system linklist (for any).

Programming INMODs for UNIX-based Clients

In addition to the techniques for preparing INMODs listed in “Preparing the INMOD Program” on page 214, which apply to all platforms, there are several rules that must be followed only for developing C INMODs for UNIX-based clients. These are:
• The INMOD subroutine must be named _dynamn.
• The INMOD must be compiled with the MetaWare High C compiler.
• The compiled INMOD module must be linked into a shared object module.

Compiling and Linking a C INMOD on a UNIX-based Client

Note: For a description of the syntax diagrams used in this book, see Appendix A: “How to Read Syntax Diagrams.”

The following syntax example can be used to compile a C INMOD on a UNIX-based client.

Compile Syntax

cc -c inmod.c

where

Syntax Element  Description
cc              Program that invokes the MetaWare High C Compiler
-c              Compiler option specifying to compile without linking, producing an object file
inmod.c         A C source module for the INMOD

Use the following syntax example to link the object modules into a shared object module.
Link Syntax

ld -dy -G inmod.o -o inmod.so

where

Syntax Element  Description
ld              Invokes the UNIX linker editor
-dy             Specifies to use dynamic linking
-G              Specifies to create a shared object
inmod.o         An object module derived from the compile step (see above)
-o              Specifies the output filename; default is a.out
inmod.so        Specifies the resulting shared object module. This is the user-specified name in the IMPORT command.

Compiling and Linking a C INMOD on MP-RAS and Sun Solaris SPARC

Use the following syntax example to compile a C INMOD on MP-RAS or Sun Solaris SPARC client systems.

cc -G -KPIC sourcefile.c -o shared-object-name

where

Syntax Element      Description
cc                  Invokes the MetaWare High C Compiler
-G                  Specifies to create a shared object
-KPIC               A compiler option that generates Position Independent Code (PIC) for all user exit routines
sourcefile.c        A C source module for the INMOD
-o                  Specifies the output file name
shared-object-name  Specifies the resulting shared object module. This is the name you specify as:
                    • The INMOD modulename parameter of the IMPORT command of your TPump job script
                    • The EXIT name parameter for the NOTIFY option of the BEGIN LOAD command of your TPump job script
                    The shared-object-name can be any valid UNIX file name.

Compiling and Linking a C INMOD on a Sun Solaris Opteron

Use the following syntax example to compile a C INMOD on a Sun Solaris Opteron client system.
cc -dy -G sourcefile.c -o shared-object-name

where

Syntax Element      Description
cc                  Invokes the MetaWare High C Compiler
-dy                 Specifies to use dynamic linking
-G                  Specifies to create a shared object
sourcefile.c        A C source module for the INMOD
-o                  Specifies the output file name
shared-object-name  Specifies the resulting shared object module. This is the name you specify as:
                    • The INMOD modulename parameter of the IMPORT command of your TPump job script
                    • The EXIT name parameter for the NOTIFY option of the BEGIN LOAD command of your TPump job script
                    The shared-object-name can be any valid UNIX file name.

Compiling and Linking a C INMOD on HP-UX PA-RISC

Use the following syntax example to compile a C INMOD on an HP-UX PA-RISC client.

Compile Syntax

cc +z inmod.c +ul

where

Syntax Element  Description
cc              Invokes the MetaWare High C Compiler
+z              A compiler option specified to generate Position Independent Code (PIC) for all user exit routines
+ul             A compiler option that allows pointers to access non-natively aligned data
inmod.c         A C source module for the INMOD

Use the following syntax example to link the object modules on HP-UX PA-RISC into the shared object.

Link Syntax

ld -b inmod.o -o inmod.so

where

Syntax Element  Description
ld              Invokes the UNIX linker editor
-b              A linker option specified to generate a shared object file
inmod.o         An object module derived from the compile step (see above)
-o              Specifies the output filename; default is a.out
inmod.so        Specifies the resulting shared object module. This is the user-specified name in the IMPORT command.
Compiling and Linking a C INMOD on HP-UX Itanium

Use the following syntax example to compile a C INMOD on an HP-UX Itanium-based client.

Compile Syntax

cc +u1 -D_REENTRANT +DD64 -c inmod.c

where

Syntax Element  Description
cc              Invokes the MetaWare High C compiler
+u1             A compiler option that allows pointers to access non-natively aligned data
-D_REENTRANT    Ensures that all the Pthread definitions are visible at compile time
+DD64           Generates 64-bit object code for PA2.0 architecture
-c              Compiles one or more source files but does not enter the linking phase
inmod.c         A C source module for the INMOD

Use the following syntax example to link the object modules on HP-UX Itanium into the shared object.

Link Syntax

ld -n -b inmod.o -lc -o inmod.so

where

Syntax Element  Description
ld              Invokes the UNIX linker editor
-n              Generates an executable with file type SHARE_MAGIC. This option is ignored in 64-bit mode.
-b              A linker option specified to generate a shared object file
inmod.o         An object module derived from the compile step (see above)
-lc             Searches a library: libc.a, libc.so, or libc.sh
-o              Specifies the output filename; default is a.out
inmod.so        Specifies the resulting shared object module. This is the user-specified name in the IMPORT command.

Compiling and Linking a C INMOD on IBM AIX

Use the following syntax example to compile a C INMOD on an IBM AIX-based client.
Compile Syntax

cc -c -brtl -fPIC sourcefile.c

where

Syntax Element  Description
cc              A call to the program that invokes the native UNIX C compiler
-c              A compiler option that specifies to not send object files to the linkage editor
-brtl           Tells the linkage editor to accept both .sl and .a library file types
-fPIC           A compiler option that generates Position Independent Code (PIC) for all user exit routines
sourcefile.c    A C source module for the INMOD

Use the following syntax example to link the object modules into a shared object module.

Link Syntax

ld -G objectfile.o -bE:export_dynamn.txt -e_dynamn -o shared-object-name -lm -lc

where

Syntax Element         Description
ld                     Invokes the UNIX linker editor
-G                     Produces a shared object enabled for use with the run-time linker
-e_dynamn              Sets the entry point of the exit routine to _dynamn
-bE:export_dynamn.txt  A linker option that exports the symbol "_dynamn" explicitly; the file export_dynamn.txt contains the symbol
objectfile.o           An object module created during the compile step
-o                     Specifies the output file name
shared-object-name     Specifies the resulting shared object module. This is the name you specify as:
                       • The INMOD modulename parameter of the IMPORT command of your TPump job script
                       • The EXIT name parameter for the NOTIFY option of the BEGIN LOAD command of your TPump job script
                       The shared-object-name can be any valid UNIX file name.
-lm                    A linker option specifying to link with the /lib/libm.a library
-lc                    A linker option specifying to link with the /lib/libc.a library

Compiling and Linking a C INMOD on a Linux Client

Use the following syntax example to compile a C INMOD on a Linux client.

Note: Be sure to compile your INMOD and notify exit routines in 32-bit mode so they are compatible with Teradata TPump.
gcc -shared -fPIC inmod.c -o inmod.so

where

Syntax Element   Description
gcc              Invokes the C compiler on Linux
-shared          Produces a shared object, which can then be linked with other objects to form an executable
-fPIC            Produces Position Independent Code
-o               Specifies the output file name

Programming INMODs for a Windows Client

The previous section lists INMOD preparation techniques that apply to all platforms. There are several additional rules to follow when developing C INMODs for Windows clients. These are:
• The INMOD routine must be written in C
• The INMOD routine must have an entry point named _dynamn and declared with the __declspec keyword
• The file must be saved as a DLL file

Compiling and Linking a C INMOD on a Windows Client

Use the following syntax example to create a DLL on a Windows client.

cl /DWIN32 /LD inmod.c

where

Syntax Element   Description
cl               Invokes the Microsoft C compiler
/D               Defines a macro; /DWIN32 defines WIN32
/LD              Creates a .dll
inmod.c          Denotes a C source module for the INMOD

APPENDIX A  How to Read Syntax Diagrams

This appendix describes the conventions that apply to reading the syntax diagrams used in this book.

Syntax Diagram Conventions

Notation Conventions

The following table defines the notation used in this section:

Item     Definition/Comments
Letter   An uppercase or lowercase alphabetic character ranging from A through Z.
Number   A digit ranging from 0 through 9. Do not use commas when entering a number with more than three digits.
Word     Variables and reserved words:
         UPPERCASE LETTERS represent a keyword. Syntax diagrams show all keywords in uppercase, unless operating system restrictions require them to be in lowercase.
         If a keyword is shown in uppercase, you may enter it in uppercase or mixed case.
         lowercase letters represent a keyword that you must enter in lowercase, such as a UNIX command.
         lowercase italic letters represent a variable such as a column or table name. You must substitute a proper value.
         lowercase bold letters represent a variable that is defined immediately following the diagram that contains it.
         UNDERLINED LETTERS represent the default value. This applies both to uppercase and to lowercase words.
Spaces   Use one space between items, such as keywords or variables.
Punctuation   Enter all punctuation exactly as it appears in the diagram.

Paths

The main path along the syntax diagram begins at the left, and proceeds, left to right, to the vertical bar, which marks the end of the diagram. Paths that do not have an arrow or a vertical bar only show portions of the syntax. Note that the only part of a path that reads from right to left is a loop.

Paths that are too long for one line use continuation links. Continuation links are small circles with letters indicating the beginning and ending of a link:

(Diagram: a circled letter A at the end of one line and at the beginning of the next marks the continuation link.)

When you see a circled letter in a syntax diagram, go to the corresponding circled letter and continue.

Required Items

Required items appear on the main path:

  SHOW

If you can choose from more than one item, the choices appear vertically, in a stack. The first item appears on the main path:

  SHOW  CONTROLS
        VERSIONS

Optional Items

Optional items appear below the main path:

  SHOW
        CONTROLS

If choosing one of the items is optional, all the choices appear below the main path:

  SHOW
        CONTROLS
        VERSIONS

You can choose one of the options, or you can disregard all of the options.
Abbreviations

If a keyword or a reserved word has a valid abbreviation, the unabbreviated form always appears on the main path. The shortest valid abbreviation appears beneath.

  SHOW  CONTROLS
        CONTROL

In the above syntax, the following formats are valid:
• SHOW CONTROLS
• SHOW CONTROL

Loops

A loop is an entry or a group of entries that you can repeat one or more times. Syntax diagrams show loops as a return path above the main path, over the item or items that you can repeat.

(Diagram: a comma-separated loop of cname entries enclosed in parentheses, with the number 4 in a circle and the number 3 in a square on the return paths.)

The following rules apply to loops:
• If there is a maximum number of entries allowed, the number appears in a circle on the return path. In the example, you may enter cname a maximum of 4 times.
• If there is a minimum number of entries required, the number appears in a square on the return path. In the example, you must enter at least 3 groups of column names.
• If a separator character is required between entries, the character appears on the return path. If the diagram does not show a separator character, use one blank space. In the example, the separator character is a comma.
• If a delimiter character is required around entries, the beginning and ending characters appear outside the return path. Generally, a space is not needed between delimiter characters and entries. In the example, the delimiter characters are the left and right parentheses.

Excerpts

Sometimes a piece of a syntax phrase is too large to fit into the diagram. Such a phrase is indicated by a break in the path, marked by | terminators on either side of the break. A name for the excerpted piece appears between the break marks in boldface type. The named phrase appears immediately after the complete diagram, as illustrated by the following example.
(Diagram: a LOCKING phrase containing an excerpt named "HAVING con," continued via circled A links; the excerpt's definition, with where_cond and comma-separated loops of cname and col_pos, appears after the complete diagram.)

APPENDIX B  TPump Examples

This appendix provides some examples of TPump scripts and their corresponding output. Included are:
• Simple Script Example
• Restarted Upsert Example
• Example Using the TABLE Command

In the output examples, the lines that begin with 4-digit numbers (for example, 0001) are scripts; the rest are output.

Simple Script Example

The following is an example of a simple script. This script:

/**************************************************************/
/*                                                            */
/* MLNT002H MVSJCL                                            */
/*                                                            */
/**************************************************************/
/***********************************************/
/* STEP01 CREATES THE TABLES FOR THE TPump JOB */
/***********************************************/
.LOGTABLE CME.TLddNT2H;
.LOGON OPNACC1/CME,CME;
DROP TABLE TBL1T;
DROP TABLE TBL2T;
DROP TABLE tlnt2err;
CREATE TABLE TBL2T,FALLBACK
 (ABYTEINT BYTEINT,
  ASMALLINT SMALLINT,
  AINTEGER INTEGER,
  ADECIMAL DECIMAL (5,2),
  ACHAR CHAR (5),
  ABYTE BYTE(1),
  AFLOAT FLOAT,
  ADATE DATE)
 UNIQUE PRIMARY INDEX (ASMALLINT);
/*****************************************************************/
/* BEGIN LOAD WITH ALL THE OPTIONS SPECIFIED SUCH AS ERRLIMIT,   */
/* CHECKPOINT, SESSIONS,TENACITY                                 */
/*****************************************************************/
.BEGIN LOAD SESSIONS 6 4
 PACK 10
 CHECKPOINT 1
 TENACITY 2
 ERRLIMIT 5
 ERRORTABLE tlnt2err;
.LAYOUT LAY1A;
.FIELD ABYTEINT * BYTEINT;
.FIELD ASMALLINT * SMALLINT;
.FIELD AINTEGER * INTEGER;
.FIELD ADECIMAL * DECIMAL (5,2);
.FIELD ACHAR * CHAR (5);
.FIELD ABYTE * BYTE(1);
.FIELD AFLOAT * FLOAT;
.FIELD ADATE * DATE;
.DML LABEL LABELA
 IGNORE DUPLICATE ROWS
 IGNORE MISSING ROWS
 IGNORE EXTRA ROWS;
INSERT INTO TBL2T VALUES
 (:ABYTEINT,:ASMALLINT,:AINTEGER,:ADECIMAL,:ACHAR,:ABYTE,:AFLOAT,:ADATE);
.IMPORT INFILE ./tlnt002.dat
LAYOUT LAY1A APPLY LABELA FROM 1 FOR 1000; .END LOAD; .LOGOFF; produces the following output: 0002 /**************************************************************/ /* */ /* MLNT002H MVSJCL */ /* */ /**************************************************************/ /***********************************************/ /* STEP01 CREATES THE TABLES FOR THE TPump JOB */ /***********************************************/ .LOGTABLE CME.TLddNT2H; 0003 .LOGON OPNACC1/CME,; **** 09:47:17 UTY8400 Teradata Database Release: 12.00.00.00 **** 09:47:17 UTY8400 Teradata Database Version: 12.00.00.00 **** 09:47:17 UTY8400 Default character set: EBCDIC **** 09:47:17 UTY8400 Maximum supported buffer size: 1M **** 09:47:17 UTY8400 Upsert supported by RDBMS server **** 09:47:17 UTY6211 A successful connect was made to the DBS. **** 09:47:17 UTY6217 Logtable 'CME.TLddNT2H' has been created. ======================================================================== = = = Processing Control Statements = = = ======================================================================== 0004 DROP TABLE TBL1T; **** 09:47:23 UTY1016 'DROP' request successful. 0005 DROP TABLE TBL2T; **** 09:47:29 UTY1016 'DROP' request successful. 0006 DROP TABLE tlnt2err; **** 09:47:30 UTY1008 DBS failure: 3807, Table/view 'tlnt2err' does not exist. 0007 CREATE TABLE TBL2T,FALLBACK (ABYTEINT BYTEINT, ASMALLINT SMALLINT, AINTEGER INTEGER, 230 Teradata Parallel Data Pump Reference Appendix B: TPump Examples Simple Script Example **** 0008 0009 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 0020 0021 **** **** **** ADECIMAL DECIMAL (5,2), ACHAR CHAR (5), ABYTE BYTE(1), AFLOAT FLOAT, ADATE DATE) UNIQUE PRIMARY INDEX (ASMALLINT); 09:47:42 UTY1016 'CREATE' request successful. 
/*****************************************************************/ /* BEGIN LOAD WITH ALL THE OPTIONS SPECIFIED SUCH AS ERRLIMIT, */ /* CHECKPOINT, SESSIONS,TENACITY */ /*****************************************************************/ .BEGIN LOAD SESSIONS 6 4 PACK 10 CHECKPOINT 1 TENACITY 2 ERRLIMIT 5 ERRORTABLE tlnt2err; ======================================================================== = = = Processing TPump Statements = = = ======================================================================== .LAYOUT LAY1A; .FIELD ABYTEINT * BYTEINT; .FIELD ASMALLINT * SMALLINT; .FIELD AINTEGER * INTEGER; .FIELD ADECIMAL * DECIMAL (5,2); .FIELD ACHAR * CHAR (5); .FIELD ABYTE * BYTE(1); .FIELD AFLOAT * FLOAT; .FIELD ADATE * DATE; .DML LABEL LABELA IGNORE DUPLICATE ROWS IGNORE MISSING ROWS IGNORE EXTRA ROWS; INSERT INTO TBL2T VALUES (:ABYTEINT,:ASMALLINT,:AINTEGER,:ADECIMAL,:ACHAR,:ABYTE,:AFLOAT,:ADATE); .IMPORT INFILE ./tlnt002.dat LAYOUT LAY1A APPLY LABELA FROM 1 FOR 1000; .END LOAD; 09:47:43 UTY6609 Starting to log on sessions... 09:47:57 UTY6610 Logged on 6 sessions. ======================================================================== = = = TPump Import(s) Beginning = = = ======================================================================== 09:47:57 UTY6630 Options in effect for following TPump Import(s): . Tenacity: 2 hour limit to successfully connect load sessions. . Max Sessions: 6 session(s). . Min Sessions: 4 session(s). . Checkpoint: 1 minute(s). . Errlimit: 5 rejected record(s). . Restart Mode: ROBUST. . Serialization: OFF. . Packing: 10 Statements per Request. . StartUp Rate: UNLIMITED Statements per Minute. Teradata Parallel Data Pump Reference 231 Appendix B: TPump Examples Restarted Upsert Example **** 09:48:13 UTY6608 Import 1 begins. **** 09:48:51 UTY6641 Since last chkpt., 1000 recs. in, 1000 stmts., 104 reqs **** 09:48:51 UTY6647 Since last chkpt., avg. DBS wait time: 0.26 **** 09:48:51 UTY6612 Beginning final checkpoint... 
**** 09:48:51 UTY6641 Since last chkpt., 1000 recs. in, 1000 stmts., 104 reqs **** 09:48:51 UTY6647 Since last chkpt., avg. DBS wait time: 0.26 **** 09:48:51 UTY6607 Checkpoint Completes with 1000 rows sent. **** 09:48:51 UTY6642 Import 1 statements: 1000, requests: 104 **** 09:48:51 UTY6643 Import 1 average statements per request: 9.62 **** 09:48:51 UTY6644 Import 1 average statements per record: 1.00 **** 09:48:51 UTY6645 Import 1 statements/session: avg. 166.67, min. 154.00, max . 182.00 **** 09:48:51 UTY6646 Import 1 requests/session: average 17.33, minimum 16.00, m aximum 19.00 **** 09:48:51 UTY6648 Import 1 DBS wait time/session: avg. 4.50, min. 2.00, max. 11.00 **** 09:48:51 UTY6649 Import 1 DBS wait time/request: avg. 0.25, min. 0.11, max. 0.58 **** 09:48:51 UTY1803 Import processing statistics . IMPORT 1 Total thus far . ========= ============== Candidate records considered:........ 1000....... 1000 Apply conditions satisfied:.......... 1000....... 1000 Records logged to error table:....... 0....... 0 Candidate records rejected:.......... 0....... 0 ** Statistics for Apply Label : LABELA Type Database Table or Macro Name Activity I CME TBL2T 1000 **** 09:48:52 UTY0821 Error table CME.tlnt2err is EMPTY, dropping table. 0022 .LOGOFF ; ======================================================================== = = = Logoff/Disconnect = = = ======================================================================== **** 09:49:00 UTY6216 The restart log table has been dropped. **** 09:49:00 UTY6212 A successful disconnect was made from the RDBMS. **** 09:49:00 UTY2410 Total processor time used = '0.791138 Seconds' . Start : 09:47:17 - MON JULY 16, 2007 . End : 09:49:00 - MON JULY 16, 2007 . Highest return code encountered = '0'. Restarted Upsert Example This restarted upsert example uses two IMPORT clauses. The first one loads half of the records from the data source into an empty table. The second one does an upsert using all the records in the same data file. 
The result is that it updates the rows inserted during the first import and inserts all of the rows that the first import skipped. This script:

/***********************************************/
/* STEP01 CREATES THE TABLES FOR THE TPump JOB */
/***********************************************/
.LOGTABLE TLddNT13H;
.LOGON cs4400s3/wth,wth;
DROP TABLE TBL13TA;
DROP TABLE tlnt13err;
CREATE TABLE TBL13TA,FALLBACK
 (ABYTEINT BYTEINT,
  ASMALLINT SMALLINT,
  AINTEGER INTEGER,
  ADECIMAL DECIMAL (5,2),
  ACHAR CHAR (5),
  ABYTE BYTE(1),
  AFLOAT FLOAT,
  ADATE DATE)
 UNIQUE PRIMARY INDEX (ASMALLINT);
/*****************************************************************/
/* BEGIN LOAD WITH ALL THE OPTIONS SPECIFIED SUCH AS ERRLIMIT,   */
/* CHECKPOINT, SESSIONS,TENACITY                                 */
/*****************************************************************/
.BEGIN LOAD SESSIONS 1 1
 PACK 10
 CHECKPOINT 1
 TENACITY 2
 ERRLIMIT 50
 ERRORTABLE tlnt13err;
.LAYOUT LAY1A;
/*.FILLER ATEST * BYTEINT;*/
.FIELD ABYTEINT * BYTEINT;
.FIELD ASMALLINT * SMALLINT KEY;
.FIELD AINTEGER * INTEGER;
.FIELD ADECIMAL * DECIMAL (5,2);
.FIELD ACHAR * CHAR (5);
.FIELD ABYTE * BYTE(1);
.FIELD AFLOAT * FLOAT;
.FIELD ADATE * DATE;
/* insert half of the rows ......................*/
.DML LABEL LABELA
 IGNORE DUPLICATE ROWS
 IGNORE MISSING ROWS
 IGNORE EXTRA ROWS;
INSERT INTO TBL13TA VALUES
 (:ABYTEINT,:ASMALLINT,:AINTEGER,:ADECIMAL,:ACHAR,:ABYTE,:AFLOAT,:ADATE);
/* ...
and then upsert all of the rows ..........*/ .DML LABEL LABELB IGNORE DUPLICATE ROWS IGNORE MISSING ROWS IGNORE EXTRA ROWS DO INSERT FOR MISSING UPDATE ROWS; UPDATE TBL13TA SET ADECIMAL = ADECIMAL + 1 WHERE ASMALLINT = :ASMALLINT; INSERT INTO TBL13TA VALUES (:ABYTEINT,:ASMALLINT,:AINTEGER,:ADECIMAL,:ACHAR,:ABYTE,:AFLOAT,:ADATE); /* should result in an upsert with half inserts and half updates */ .IMPORT INFILE ./tlnt013.dat LAYOUT LAY1A FROM 1 FOR 400 APPLY LABELA WHERE ABYTEINT = 1; .IMPORT INFILE ./tlnt013.dat LAYOUT LAY1A FROM 1 FOR 400 Teradata Parallel Data Pump Reference 233 Appendix B: TPump Examples Restarted Upsert Example APPLY LABELB; .END LOAD; .LOGOFF; produces the following output (assuming it was restarted during the second import): 0001 /***********************************************/ /* STEP01 CREATES THE TABLES FOR THE TPump JOB */ /***********************************************/ .LOGTABLE TLddNT13H; 0002 .LOGON cs4400s3/wth,; **** 16:57:43 UTY8400 Teradata Database Release: 12.00.00.00 **** 16:57:43 UTY8400 Teradata Database Version: 12.00.00.00 **** 16:57:43 UTY8400 Default character set: ASCII **** 16:57:43 UTY8400 Maximum supported buffer size: 1M **** 16:57:43 UTY8400 Upsert supported by RDBMS server **** 16:57:43 UTY6211 A successful connect was made to the RDBMS. **** 16:57:43 UTY6210 Logtable 'WTH.TLddNT13H' indicates that a restart is in progress. ======================================================================== = = = Processing Control Statements = = = ======================================================================== 0003 DROP TABLE TBL13TA; **** 16:57:43 UTY1012 A restart is in progress. This request has already been executed. The return code was: 0. 0004 DROP TABLE tlnt13err; **** 16:57:43 UTY1011 A restart is in progress. This request has already been executed. The return code was: 3807, accompanied by the following message text: Table/view/trigger/procedure 'tlnt13err' does not exist. 
0005 CREATE TABLE TBL13TA,FALLBACK (ABYTEINT BYTEINT, ASMALLINT SMALLINT, AINTEGER INTEGER, ADECIMAL DECIMAL (5,2), ACHAR CHAR (5), ABYTE BYTE(1), AFLOAT FLOAT, ADATE DATE) UNIQUE PRIMARY INDEX (ASMALLINT); **** 16:57:43 UTY1012 A restart is in progress. This request has already been executed. The return code was: 0. 0006 /*****************************************************************/ /* BEGIN LOAD WITH ALL THE OPTIONS SPECIFIED SUCH AS ERRLIMIT, */ /* CHECKPOINT, SESSIONS,TENACITY */ /*****************************************************************/ .BEGIN LOAD SESSIONS 1 1 PACK 10 CHECKPOINT 1 TENACITY 2 ERRLIMIT 50 ERRORTABLE tlnt13err; ======================================================================== = = = Processing TPump Statements = = = ======================================================================== 0007 .LAYOUT LAY1A; 234 Teradata Parallel Data Pump Reference Appendix B: TPump Examples Restarted Upsert Example 0008 /*.FILLER ATEST * BYTEINT;*/ .FIELD ABYTEINT * BYTEINT; 0009 .FIELD ASMALLINT * SMALLINT KEY; 0010 .FIELD AINTEGER * INTEGER; 0011 .FIELD ADECIMAL * DECIMAL (5,2); 0012 .FIELD ACHAR * CHAR (5); 0013 .FIELD ABYTE * BYTE(1); 0014 .FIELD AFLOAT * FLOAT; 0015 .FIELD ADATE * DATE; 0016 /* insert half of the rows ......................*/ .DML LABEL LABELA IGNORE DUPLICATE ROWS IGNORE MISSING ROWS IGNORE EXTRA ROWS; 0017 INSERT INTO TBL13TA VALUES (:ABYTEINT,:ASMALLINT,:AINTEGER,:ADECIMAL,:ACHAR,:ABYTE,:AFLOAT,:ADATE); 0018 /* ... 
and then upsert all of the rows ..........*/ .DML LABEL LABELB IGNORE DUPLICATE ROWS IGNORE MISSING ROWS IGNORE EXTRA ROWS DO INSERT FOR MISSING UPDATE ROWS; 0019 UPDATE TBL13TA SET ADECIMAL = ADECIMAL + 1 WHERE ASMALLINT = :ASMALLINT; 0020 INSERT INTO TBL13TA VALUES (:ABYTEINT,:ASMALLINT,:AINTEGER,:ADECIMAL,:ACHAR,:ABYTE,:AFLOAT,:ADATE); 0021 /* should result in an upsert with half inserts and half updates */ .IMPORT INFILE ./tlnt013.dat LAYOUT LAY1A FROM 1 FOR 400 APPLY LABELA WHERE ABYTEINT = 1; 0022 .IMPORT INFILE ./tlnt013.dat LAYOUT LAY1A FROM 1 FOR 400 APPLY LABELB; 0023 .END LOAD; **** 16:57:43 UTY6609 Starting to log on sessions... **** 16:57:43 UTY6610 Logged on 1 sessions. ======================================================================== = = = TPump Import(s) Beginning = = = ======================================================================== **** 16:57:43 UTY6630 Options in effect for following TPump Import(s): . Tenacity: 2 hour limit to successfully connect load sessions. . Max Sessions: 1 session(s). . Min Sessions: 1 session(s). . Checkpoint: 1 minute(s). . Errlimit: 50 rejected record(s). . Restart Mode: ROBUST. . Serialization: ON. . Packing: 10 Statements per Request. . StartUp Rate: UNLIMITED Statements per Minute. **** 16:57:43 UTY6615 Processing complete for load 1, import 1. **** 16:57:44 UTY6622 Restart recovery processing begins. **** 16:57:44 UTY6623 Restart recovery processing complete. **** 16:57:44 UTY8800 WARNING: Rate Monitoring turned off - no permission on macro: TPumpMacro.ImportCreate. Teradata Parallel Data Pump Reference 235 Appendix B: TPump Examples Example Using the TABLE Command **** **** **** **** **** **** **** **** **** **** **** **** **** **** **** 16:57:44 UTY6608 Import 2 begins. 16:57:58 UTY6641 Since last chkpt., 370 recs. in, 370 stmts., 37 reqs 16:57:58 UTY6647 Since last chkpt., avg. DBS wait time: 0.38 16:57:58 UTY6612 Beginning final checkpoint... 16:57:58 UTY6641 Since last chkpt., 370 recs. 
in, 370 stmts., 37 reqs 16:57:58 UTY6647 Since last chkpt., avg. DBS wait time: 0.38 16:57:59 UTY6607 Checkpoint Completes with 400 rows sent. 16:57:59 UTY6642 Import 2 statements: 400, requests: 40 16:57:59 UTY6643 Import 2 average statements per request: 10.00 16:57:59 UTY6644 Import 2 average statements per record: 1.00 16:57:59 UTY6645 Import 2 statements/session: avg. 400.00, min. 400.00, max. 400.00 16:57:59 UTY6646 Import 2 requests/session: avg. 40.00, min. 40.00, max. 40.00 16:57:59 UTY6648 Import 2 DBS wait time/session: avg. 15.00, min. 15.00, max. 15.00 16:57:59 UTY6649 Import 2 DBS wait time/request: avg. 0.38, min. 0.38, max. 0.38 16:57:59 UTY1803 Import processing statistics . IMPORT 2 Total thus far . ========= ============== Candidate records considered:........ 400....... 800 Apply conditions satisfied:.......... 400....... 600 Records logged to error table:....... 0....... 0 Candidate records rejected:.......... 0....... 0 ** Statistics for Apply Label : LABELB Type Database Table or Macro Name Activity U WTH TBL13TA 200 I WTH TBL13TA 200 **** 16:58:01 UTY0821 Error table WTH.tlnt13err is EMPTY, dropping table. 0024 .LOGOFF; ======================================================================== = = = Logoff/Disconnect = = = ======================================================================== **** 16:58:08 UTY6216 The restart log table has been dropped. **** 16:58:08 UTY6212 A successful disconnect was made from the RDBMS. **** 16:58:08 UTY2410 Total processor time used = '0.450648 Seconds' . Start : 16:57:39 - MON JULY 16, 2007 . End : 16:58:08 - MON JULY 16, 2007 . Highest return code encountered = '0'. Example Using the TABLE Command This example script uses the TABLE command and “INSERT <TABLENAME>.*” feature. 
/***********************************************/
/* STEP01 CREATES THE TABLES FOR THE TPump JOB */
/***********************************************/
.LOGTABLE TLddNT10H;
.LOGON cs4400s3/wth,wth;
DROP TABLE TBL10T;
DROP TABLE TLNT10ERR;
CREATE TABLE TBL10T, FALLBACK
 (RAND INTEGER,
  ATIME INTEGER,
  ASESS INTEGER)
 UNIQUE PRIMARY INDEX (RAND);
/*****************************************************************/
/* BEGIN LOAD WITH ALL THE OPTIONS SPECIFIED SUCH AS ERRLIMIT,   */
/* CHECKPOINT, SESSIONS,TENACITY                                 */
/*****************************************************************/
.BEGIN LOAD SESSIONS 8 1
 PACK 20
 SERIALIZE ON
 CHECKPOINT 1
 TENACITY 2
 ERRORTABLE TLNT10ERR;
.LAYOUT LAY1A;
.TABLE TBL10T;
.DML LABEL LABELA
 MARK DUPLICATE ROWS
 MARK MISSING ROWS
 MARK EXTRA ROWS;
INSERT INTO TBL10T.*;
.IMPORT INFILE ./tlnt010.dat
 LAYOUT LAY1A APPLY LABELA FROM 1 FOR 111;
.END LOAD;
.LOGOFF;

produces the following results. When looking at the results, notice that the output fields generated by the TABLE command include the “KEY” modifier for the field coming from the primary index of the table. This is what enables the use of the “SERIALIZE” option:

0001 /***********************************************/
     /* STEP01 CREATES THE TABLES FOR THE TPump JOB */
     /***********************************************/
     .LOGTABLE TLddNT10H;
0002 .LOGON cs4400s3/wth,;
**** 17:14:07 UTY8400 Teradata Database Release: 12.00.00.00
**** 17:14:07 UTY8400 Teradata Database Version: 12.00.00.00
**** 17:14:07 UTY8400 Default character set: ASCII
**** 17:14:07 UTY8400 Maximum supported buffer size: 1M
**** 17:14:07 UTY8400 Upsert supported by RDBMS server
**** 17:14:12 UTY6211 A successful connect was made to the RDBMS.
**** 17:14:12 UTY6217 Logtable 'WTH.TLddNT10H' has been created.
======================================================================== = = = Processing Control Statements = = = ======================================================================== 0003 DROP TABLE TBL10T; **** 17:14:13 UTY1016 'DROP' request successful. 0004 DROP TABLE TLNT10ERR; **** 17:14:14 UTY1008 RDBMS failure: 3807, Table/view/trigger/procedure 'TLNT10ERR' does not exist. 0005 CREATE TABLE TBL10T, FALLBACK (RAND INTEGER, ATIME INTEGER, ASESS INTEGER) UNIQUE PRIMARY INDEX (RAND); **** 17:14:15 UTY1016 'CREATE' request successful. 0006 /*****************************************************************/ /* BEGIN LOAD WITH ALL THE OPTIONS SPECIFIED SUCH AS ERRLIMIT, */ /* CHECKPOINT, SESSIONS,TENACITY */ /*****************************************************************/ Teradata Parallel Data Pump Reference 237 Appendix B: TPump Examples Example Using the TABLE Command .BEGIN LOAD SESSIONS 8 1 PACK 20 SERIALIZE ON CHECKPOINT 1 TENACITY 2 ERRORTABLE TLNT10ERR; 238 Teradata Parallel Data Pump Reference Appendix B: TPump Examples Example Using the TABLE Command 0007 0008 **** **** **** **** **** 0009 0010 0011 0012 **** **** **** **** **** **** **** **** **** **** **** **** **** **** **** **** **** **** **** ======================================================================== = = = Processing TPump Statements = = = ======================================================================== .LAYOUT LAY1A; .TABLE TBL10T; 17:14:15 UTY6009 Fields generated by .TABLE command begin. 17:14:15 UTY6010 *** .FIELD RAND * INTEGER KEY; 17:14:15 UTY6010 *** .FIELD ATIME * INTEGER; 17:14:15 UTY6010 *** .FIELD ASESS * INTEGER; 17:14:15 UTY6011 Fields generated by .TABLE command end. .DML LABEL LABELA MARK DUPLICATE ROWS MARK MISSING ROWS MARK EXTRA ROWS; INSERT INTO TBL10T.*; .IMPORT INFILE ./tlnt010.dat LAYOUT LAY1A APPLY LABELA FROM 1 FOR 111; .END LOAD; 17:14:15 UTY6609 Starting to log on sessions... 17:14:16 UTY6610 Logged on 7 sessions. 
======================================================================== = = = TPump Import(s) Beginning = = = ======================================================================== 17:14:16 UTY6630 Options in effect for following TPump Import(s): . Tenacity: 2 hour limit to successfully connect load sessions. . Max Sessions: 8 session(s). . Min Sessions: 1 session(s). . Checkpoint: 1 minute(s). . Errlimit: No limit in effect. . Restart Mode: ROBUST. . Serialization: ON. . Packing: 20 Statements per Request. . StartUp Rate: UNLIMITED Statements per Minute. 17:14:21 UTY8800 WARNING: Rate Monitoring turned off - no permission on macro: TPumpMacro.ImportCreate. 17:14:21 UTY6608 Import 1 begins. 17:14:24 UTY6641 Since last chkpt., 111 recs. in, 111 stmts., 7 reqs 17:14:24 UTY6647 Since last chkpt., avg. DBS wait time: 0.43 17:14:24 UTY6612 Beginning final checkpoint... 17:14:24 UTY6641 Since last chkpt., 111 recs. in, 111 stmts., 7 reqs 17:14:24 UTY6647 Since last chkpt., avg. DBS wait time: 0.43 17:14:24 UTY6607 Checkpoint Completes with 111 rows sent. 17:14:24 UTY6642 Import 1 statements: 111, requests: 7 17:14:24 UTY6643 Import 1 average statements per request: 15.86 17:14:24 UTY6644 Import 1 average statements per record: 1.00 17:14:24 UTY6645 Import 1 statements/session: avg. 15.86, min. 14.00, max. 18.00 17:14:24 UTY6646 Import 1 requests/session: avg. 1.00, min. 1.00, max. 1.00 17:14:24 UTY6648 Import 1 DBS wait time/session: avg. 0.43, min. 0.00, max. 2.00 17:14:24 UTY6649 Import 1 DBS wait time/request: avg. 0.43, min. 0.00, max. 2.00 17:14:24 UTY1803 Import processing statistics Teradata Parallel Data Pump Reference 239 Appendix B: TPump Examples Example Using the TABLE Command . IMPORT 1 Total thus far . ========= ============== Candidate records considered:........ 111....... 111 Apply conditions satisfied:.......... 111....... 111 Records logged to error table:....... 0....... 0 Candidate records rejected:.......... 0....... 
0 ** Statistics for Apply Label : LABELA Type Database Table or Macro Name Activity I WTH TBL10T 111 **** 17:14:25 UTY0821 Error table WTH.TLNT10ERR is EMPTY, dropping table. 0013 .LOGOFF; ======================================================================== = = = Logoff/Disconnect = = = ======================================================================== **** 17:14:33 UTY6216 The restart log table has been dropped. **** 17:14:33 UTY6212 A successful disconnect was made from the RDBMS. **** 17:14:33 UTY2410 Total processor time used = '0.330475 Seconds' . Start : 17:14:05 - MON JULY 16, 2007 . End : 17:14:33 - MON JULY 16, 2007 . Highest return code encountered = '0'. 240 Teradata Parallel Data Pump Reference APPENDIX C INMOD and Notify Exit Routine Examples This appendix provides INMOD examples using: • COBOL Pass-Thru INMOD • Assembler INMOD • PL/I INMOD • C INMOD - UNIX These examples contain MVS control statements. Each of these INMODs works for VM when appropriate changes are made in order to convert from JCL to REXX. Workstation-based clients support only INMODs written in C; an example of this is also provided in this appendix. COBOL INMOD //DBCCB1 JOB 1,’DBC’,MSGCLASS=A,NOTIFY=DBC,CLASS=B,REGION=4096K //COBCOMPL EXEC COBUCL //COB.SYSIN DD * IDENTIFICATION DIVISION. PROGRAM-ID. DYNAMN. AUTHOR. JCK. INSTALLATION. TERADATA. DATE-WRITTEN. DATE-COMPILED. SECURITY. OPEN. REMARKS. THIS PROGRAM IS A COBOL INMOD ROUTINE FOR TPUMP. FUNCTION: THIS PROGRAM READS AND RETURNS A RECORD OF 80 BYTES LONG VIA STRUCT-1 AND STRUCT-2. ENVIRONMENT DIVISION. CONFIGURATION SECTION. SOURCE-COMPUTER. IBM-370. OBJECT-COMPUTER. IBM-370. INPUT-OUTPUT SECTION. FILE-CONTROL. SELECT INMOD-DATA-FILE ASSIGN TO SYSIN-INDATA. DATA DIVISION. FILE SECTION. FD INMOD-DATA-FILE BLOCK CONTAINS 0 RECORDS LABEL RECORDS STANDARD. 01 INPUT-PARM-AREA PICTURE IS X(80). WORKING-STORAGE SECTION. 01 NUMIN PICTURE S9(4) COMP VALUE +0. 01 NUMOUT PICTURE S9(4) COMP VALUE +0. 
Teradata Parallel Data Pump Reference 241 Appendix C: INMOD and Notify Exit Routine Examples LINKAGE SECTION. *TPump COMMUNICATES WITH INMOD VIA STRUCT-1 AND STRUCT-2. 01 STRUCT-1. 02 RETURN-INDICATE PIC S9(9) COMP. 02 RECORD-LEN PIC S9(9) COMP. 02 RECORD-BODY. 03 DATA-AREA1 PIC X(80). 01 STRUCT-2. 02 SEQ-NUMBER PIC S9(9) COMP. 02 PARM-LIST. 05 PARM-LENTH PIC X(2). 05 PARM-STRING PIC X(80). PROCEDURE DIVISION USING STRUCT-1, STRUCT-2. BEGIN. MAIN. DISPLAY “==============================================” DISPLAY STRUCT-1. DISPLAY STRUCT-2. IF RETURN-INDICATE = 0 THEN * INMOD INITIALIZATION - OPEN FILE AND READ THE 1ST REC. DISPLAY “INMOD CALLED - RETURN CODE 0 ” PERFORM OPEN-FILES PERFORM READ-RECORDS GOBACK ELSE IF RETURN-INDICATE = 1 THEN * READ A RECORD. DISPLAY “INMOD CALLED - RETURN CODE 1 ” PERFORM READ-RECORDS GOBACK ELSE IF RETURN-INDICATE = 5 THEN * CLOSE INMOD - JUST SEND RETURN CODE = 0 DISPLAY “INMOD CALLED - RETURN CODE 5 ” MOVE 0 TO RECORD-LEN MOVE 0 TO RETURN-INDICATE GOBACK ELSE * UNKNOWN CODE. DISPLAY “INMOD CALLED - RETURN CODE X ” MOVE 0 TO RECORD-LEN MOVE 16 TO RETURN-INDICATE GOBACK. OPEN-FILES. OPEN INPUT INMOD-DATA-FILE. MOVE 0 TO RETURN-INDICATE. READ-RECORDS. READ INMOD-DATA-FILE INTO DATA-AREA1 AT END GO TO END-DATA. ADD 1 TO NUMIN. MOVE 80 TO RECORD-LEN. MOVE 0 TO RETURN-INDICATE. ADD 1 TO NUMOUT. END-DATA. CLOSE INMOD-DATA-FILE. DISPLAY “NUMBER OF INPUT RECORDS = ” NUMIN. DISPLAY “NUMBER OF OUTPUT RECORDS = ” NUMOUT. MOVE 0 TO RECORD-LEN. MOVE 0 TO RETURN-INDICATE. 242 Teradata Parallel Data Pump Reference Appendix C: INMOD and Notify Exit Routine Examples GOBACK. 
/* //LKED.SYSLMOD DD DSN=JCK.INMOD.LOAD(INMODG1),DISP=MOD //LKED.SYSIN DD * ENTRY DYNAMN NAME INMODG1(R) /* //****************************************************************** //* NEXT 3 STEPS PREPARE TERADATA rdbms FOR THE TPump’S INMOD TEST * /******************************************************************* //TPUMPDEL EXEC PGM=IEFBR14 //TPUMPLOG DD DSN=JCK.INMOD.TDQ8.TPumpLOG, // DISP=(MOD,DELETE),UNIT=SYSDA,SPACE=(TRK,0) //TPUMPCAT EXEC PGM=TPUMP //SYSPRINT DD SYSOUT=* //TPumpLOG DD DSN=JCK.INMOD.TDQ8.TPumpLOG,DISP=(NEW,CATLG), // UNIT=SYSDA,DCB=(RECFM=F,DSORG=PS,LRECL=8244), // SPACE=(8244,(12,5)) //SYSIN DD * //******************************************************************* //* THIS STEP WILL ONLY DROP THE TABLES IF TPump NOT IN APPLY PHASE * //******************************************************************* //CREATE EXEC BTEQ //STEPLIB DD DSN=STV.GG00.APP.L,DISP=SHR // DD DSN=STV.TG00.APP.L,DISP=SHR // DD DSN=STV.RG00.APP.L,DISP=SHR //SYSPRINT DD SYSOUT=* //SYSABEND DD SYSOUT=* //SYSIN DD DATA,DLM=## .LOGON TDQ8/DBC,DBC; RELEASE TPump XXXX.INMODCB1; .IF ERRORCODE = 2572 THEN .GOTO NODROP; DROP TABLE XXXX.LOGTABLE; DROP TABLE XXXX.ET_INMODCB1; DROP TABLE XXXX.UV_INMODCB1; DROP TABLE XXXX.WT_INMODCB1 .QUIT; .LABEL NODROP; .EXIT 4;; DROP USER XXXX; ## //***************************************************************** //* * //* RUN TPump * //* * //***************************************************************** //LOADIT EXEC PGM=TPump //STEPLIB DD DISP=SHR,DSN=JCK.INMOD.LOAD //SYSPRINT DD SYSOUT=* //SYSTERM DD SYSOUT=* //SYSOUT DD SYSOUT=* //SYSIN DD DATA,DLM=## .LOGTABLE XXXX.LOGTABLE; .LOGON TDQ8/XXXX,XXXX; /* TEST DATAIN, DATALOC */ DROP TABLE XXXX.INMODCB1; CREATE TABLE INMODCB1 (F1 CHAR(10), F2 CHAR(70)); .BEGIN IMPORT TPump TABLES INMODCB1; .Layout layname1; Teradata Parallel Data Pump Reference 243 Appendix C: INMOD and Notify Exit Routine Examples COBOL Pass-Thru INMOD .Field L1Fld1 1 Char(10); .Field L1Fld2 * Char(70); .DML Label DML1; 
INSERT INMODCB1(F1,F2) VALUES (:L1FLD1, :L1FLD2);
.IMPORT INMOD INMODG1 USING ("AAA" "BBB")
    LAYOUT LAYNAME1
    APPLY DML1;
.End LOAD;
.LOGOFF;
##
//INDATA DD DATA,DLM=##
01COBOL1  AAAAAAAAAAAAAAAA
02COBOL1  BBBBBBBBBBBBBBBB
03COBOL1  CCCCCCCCCCCCCCCC
04COBOL1  DDDDDDDDDDDDDDDD
##
//SELECT EXEC BTEQ
//STEPLIB DD DSN=STV.GG00.APP.L,DISP=SHR
// DD DSN=STV.TG00.APP.L,DISP=SHR
// DD DSN=STV.RG00.APP.L,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSABEND DD SYSOUT=*
//SYSIN DD DATA,DLM=##
.LOGON TDQ8/XXXX,XXXX;
SELECT * FROM INMODCB1;
.LOGOFF;
##
//

COBOL Pass-Thru INMOD

     IDENTIFICATION DIVISION.
     PROGRAM-ID. INMOD2.
     AUTHOR. STV.
     INSTALLATION. TERADATA.
     DATE-WRITTEN.
     DATE-COMPILED.
     SECURITY. OPEN.
     REMARKS. THIS PROGRAM IS AN EXAMPLE OF A COBOL INMOD
         ROUTINE WHICH RECEIVES A RECORD FROM TPump
         THEN MODIFIES OR REJECTS IT.
     ENVIRONMENT DIVISION.
     CONFIGURATION SECTION.
     SOURCE-COMPUTER. IBM-370.
     OBJECT-COMPUTER. IBM-370.
     DATA DIVISION.
     WORKING-STORAGE SECTION.
     01  COUNTROWS        PICTURE S9(4) COMP VALUE +0.
     01  REJROWS          PICTURE S9(4) COMP VALUE +0.
     01  INSROWS          PICTURE S9(4) COMP VALUE +0.
     01  I                PICTURE S9(4) COMP.
     01  MATCHFLAG        PIC 9.
         88  NOTMATCH     VALUE 0.
         88  MATCH        VALUE 1.
     LINKAGE SECTION.
     01  STRUCT-1.
         02  RETURN-INDICATE    PIC S9(9) COMP.
         02  RECORD-LEN         PIC S9(9) COMP.
         02  RECORD-BODY OCCURS 80 TIMES.
             03  DATA-AREA1     PIC X.
     01  STRUCT-2.
         02  SEQ-NUMBER         PIC S9(9) COMP.
         02  PARM-LIST.
             05  PARM-LENGTH    PIC S9(4) COMP.
             05  PARM-STRING OCCURS 80 TIMES.
                 07  PARM-DATA  PIC X.
     PROCEDURE DIVISION USING STRUCT-1, STRUCT-2.
     BEGIN.
     MAIN.
DISPLAY “================================================” IF RETURN-INDICATE = 6 THEN DISPLAY “INMOD2 CALLED - RETURN CODE 6 ” PERFORM INITIALIZE GOBACK ELSE IF RETURN-INDICATE = 7 THEN DISPLAY “INMOD2 CALLED - RETURN CODE 7 ” PERFORM PROCESS-RECORD GOBACK ELSE IF RETURN-INDICATE = 5 THEN DISPLAY “INMOD2 CALLED - RETURN CODE 5 ” PERFORM FINALIZE GOBACK ELSE DISPLAY “BLKEXIT CALLED - RETURN CODE X ” MOVE 0 TO RETURN-INDICATE. GOBACK. INITIALIZE. MOVE 0 TO COUNTROWS INSROWS REJROWS. MOVE 0 TO RETURN-INDICATE. PROCESS-RECORD. ADD 1 TO COUNTROWS. MOVE 0 TO RETURN-INDICATE. MOVE 1 TO I. MOVE 1 TO MATCHFLAG. PERFORM COMPARE UNTIL (I > PARM-LENGTH) OR (NOTMATCH). IF NOTMATCH THEN DISPLAY “REJECTED” ADD 1 TO REJROWS MOVE 0 TO RECORD-LEN ELSE DISPLAY “ACCEPTED” ADD 1 TO INSROWS. COMPARE. IF (RECORD-BODY(I) = PARM-STRING(I)) THEN NEXT SENTENCE ELSE MOVE 0 TO MATCHFLAG. ADD 1 TO I. FINALIZE. MOVE 0 TO RETURN-INDICATE. DISPLAY “NUMBER OF TOTAL RECORDS = ” COUNTROWS. DISPLAY “NUMBER OF REJECTED RECORDS = ” REJROWS. DISPLAY “NUMBER OF ACCEPTED RECORDS = ” INSROWS. GOBACK. Teradata Parallel Data Pump Reference 245 Appendix C: INMOD and Notify Exit Routine Examples Assembler INMOD Assembler INMOD //JCKAS1 JOB 1,’JCK’,MSGCLASS=A,NOTIFY=JCK,CLASS=B, REGION=4096K //***************************************************************** //* * //* IDENTIFY NECESSARY LOAD LIBRARIES FOR RELEASE * //* * //***************************************************************** //JOBLIB DD DISP=SHR,DSN=STV.GG10.APP.L // DD DISP=SHR,DSN=STV.GG00.APP.L // DD DISP=SHR,DSN=STV.TG00.APP.L // DD DISP=SHR,DSN=STV.RG00.APP.L // DD DISP=SHR,DSN=TER2.SASC301H.LINKLIB //ASMFCL EXEC ASMFCL //ASM.SYSIN DD * DYNAMN TITLE ’-- CONCATENATE INPUT RECORDS FOR INPUT TO TPump’ DYNAMN CSECT USING DYNAMN,15 ******************************************************************* * THIS PROGRAM IS CALLED BY THE TERADATA TPump PROGRAM * * TO OBTAIN A RECORD TO BE USED TO INSERT,UPDATE, OR * * DELETE ROWS OF A TARGET TABLE. 
* * * * THIS PROGRAM IS NOT REENTRANT. * * FUNCTION: * * READ AN INPUT RECORD AND ADD A FOUR-BYTE INTEGER FIELD * * THE FRONT OF THE RECORD. THE NEW FIELD WILL CONTAIN * * A SEQUENCE NUMBER WHICH RANGES FROM 1 TO ... * * NUMBER-OF-INPUT-RECORDS. * * * * RETURN TO THE CALLER (TPump) INDICATING * * EITHER MORE RECORDS ARE AVAILABLE OR NO MORE RECORDS * * ARE TO BE PROCESSED. * * * * THIS INMOD PROGRAM CAN BE USED TO ENSURE UNIQUE RECORDS * * IN CERTAIN APPLICATIONS, THE SEQUENCE FIELD * * CAN BE USED FOR “DATA SAMPLING”. * * * * DDNAME OF THE INPUT DATA SET: “INDATA” * * * ******************************************************************* B STOREGS BRANCH AROUND EP DC AL1(31) DEFINE EP LENGTH DC CL9’DYNAMN ’ DEFINE DC CL9’&SYSDATE’ ENTRY DC CL8’ VM ’ POINT DC CL5’&SYSTIME’ IDENTIFIER ******************************************************************* * SAVE REGISTERS * ******************************************************************* STOREGS DS 0H DEFINE AND ALIGN SYMBOL STM R14,R12,12(R13) STORE OFF CALLER’S REGISTERS LR R12,R15 COPY BASE ADDRESS DROP R15 DROP VOLATILE BASE REGISTER USING DYNAMN,R12 ESTAB PERM CSECT ADDRBLTY LA R14,SAVEAREA POINT AT LOCAL SAVE WORK 246 Teradata Parallel Data Pump Reference Appendix C: INMOD and Notify Exit Routine Examples Assembler INMOD ST ST LR R14,8(,R13) R13,4(,R14) R13,R14 STORE FWD LINK IN SA CHAIN STORE BWD LINK IN SA CHAIN COPY LOCAL SAVE/WORK AREA ADDR POINT TO PARM L R11,0(,R1) SPACE 1 ******************************************************************* * OPEN “DATA” DATA SET * * (ONLY THE FIRST TIME) * ******************************************************************* USING PREBUF,R11 COVER PRE-PROC AREA LA R9,PREREC POINT TO START OF PREPROC. DATA OC PRECODE,PRECODE FIRST ENTRY ? (0=FIRST ENTRY) BNZ NOOPEN NO, SKIP OPEN USING IHADCB,R10 YES,COVER DCB FOR OPEN LA R10,INDATA POINT TO DATA DCB OPEN INDATA OPEN INPUT DATA SET TM DCBOFLGS,X’10’ DID IT OPEN ? 
BO OPENOK YES, WTO ’UNABLE TO OPEN INDATA DATA SET’,ROUTCDE=11 B BADRET RETURN WITH ERROR CODE ******************************************************************* * CHECK TPump STATUS CODES * * 0 = FIRST ENTRY (TPump EXPECTS TO RECEIVE A RECORD) * * 1 = GET NEXT RECORD (TPump EXPECTS TO RECEIVE A RECORD) * * 2 = CLIENT RESTART CALL (TPump DOES NOT EXPECT A RECORD) * * 3 = CHECKPOINT CALL (TPump DOES NOT EXPECT A RECORD) * * 4 = RESTART CALL (TPump DOES NOT EXPECT A RECORD) * * 5 = CLOSE INMOD (TPump DOES NOT EXPECT A RECORD) * * * * * * NOTE: CODES 2,3 AND 4 ARE NOT HANDLED BY THIS PROGRAM * * * ******************************************************************* OPENOK DS 0H NOOPEN L R15,PRECODE CHECK ON CODE FROM TPump C R15,=F’1’ NEED RECORD ? BH NOREC NO , DO NOT “GET” A RECORD L R15,SAMPNUM GET CURRENT SAMPLE NUM. LA R15,1(R15) INCR BY 1 ST R15,0(R9) STORE AT FRONT OF RECORD ST R15,SAMPNUM RESET COUNTER LA R9,4(R9) ADVANCE FOR READ ADDR. LA R10,INDATA COVER INDATA DCB GETNEXT GET INDATA,(R9) READ A RECORD INCREC LH R9,DCBLRECL GET RECORD LENGTH AH R9,=H’4’ ADD 4 FOR NEW FIELD SR R15,R15 SET RETURN CODE VALUE RETURN ST R9,PRELEN SET LENGTH (ZERO AFTER EOF) ST R15,PRECODE L R13,4(R13) RETURN (14,12),RC=0 RETURN SPACE 5 ******************************************************************* * EOF ENTERED AT END-OF-FILE * ******************************************************************* Teradata Parallel Data Pump Reference 247 Appendix C: INMOD and Notify Exit Routine Examples Assembler INMOD * EOF CLOSE INDATA CLOSE INPUT DATA SET * ******************************************************************* NOREC SR R15,R15 SET ZERO RETURN CODE SR R9,R9 SET ZERO LENGTH B RETURN RETURN * BADRET LA R15,16 SET RETURN CODE FOR ERROR SR R9,R9 SET LENGTH = 0 B RETURN ERROR RETURN EJECT * * CONSTANTS * * REGEQU R0 EQU 0 R1 EQU 1 R2 EQU 2 R3 EQU 3 R4 EQU 4 R5 EQU 5 R6 EQU 6 R7 EQU 7 R8 EQU 8 R9 EQU 9 R10 EQU 10 R11 EQU 11 R12 EQU 12 R13 EQU 13 R14 EQU 14 R15 EQU 15 EJECT * * DATA 
STRUCTURES AND VARIABLES * SPACE 1 SAVEAREA DC 9D’0’ SAVE AREA SAMPNUM DC F’0’ SPACE 10 INDATA DCB DDNAME=INDATA,MACRF=(GM),DSORG=PS,EODAD=EOF PREBUF DSECT PRECODE DS F PRELEN DS F PREREC DS 0XL31000 DCBD DEVD=DA,DSORG=PS PREPRM DSECT PRESEQ DS F PREPRML DS H PREPRMS DS CL80 END //LKED.SYSLMOD DD DSN=JCK.INMOD.LOAD(INMODG1),DISP=MOD,UNIT=3380, // VOLUME=SER=TSO805 //LKED.SYSIN DD * ENTRY DYNAMN NAME INMODG1(R) /* //TPUMPDEL EXEC PGM=IEFBR14 //TPUMPLOG DD DSN=JCK.INMOD.TDQ8.TPumpLOG, // DISP=(MOD,DELETE),UNIT=SYSDA,SPACE=(TRK,0) //TPUMPCAT EXEC PGM=TPUMP 248 Teradata Parallel Data Pump Reference Appendix C: INMOD and Notify Exit Routine Examples Assembler INMOD //STEPLIB DD DSN=STV.GG00.APP.L,DISP=SHR // DD DSN=STV.TG00.APP.L,DISP=SHR // DD DSN=STV.RG00.APP.L,DISP=SHR //SYSPRINT DD SYSOUT=* //TPUMPLOG DD DSN=JCK.INMOD.TDQ8.TPumpLOG,DISP=(NEW,CATLG), // UNIT=SYSDA,DCB=(RECFM=F,DSORG=PS,LRECL=8244), // SPACE=(8244,(12,5)) //SYSIN DD * //********************************************************************** //* THIS STEP WILL ONLY DROP THE TABLES IF TPump IS NOT IN APPLY PHASE * //********************************************************************** //CREATE EXEC BTEQ //STEPLIB DD DSN=STV.GG00.APP.L,DISP=SHR // DD DSN=STV.TG00.APP.L,DISP=SHR // DD DSN=STV.RG00.APP.L,DISP=SHR //SYSPRINT DD SYSOUT=A //SYSABEND DD SYSOUT=* //SYSIN DD DATA,DLM=## .LOGON TDQ8/DBC,DBC; RELEASE TPump XXXX.INMODAS1; .IF ERRORCODE = 2572 THEN .GOTO NODROP; DROP TABLE XXXX.LOGTABLE; DROP TABLE XXXX.ET_INMODAS1; DROP TABLE XXXX.UV_INMODAS1; DROP TABLE XXXX.WT_INMODAS1; .QUIT; .LABEL NODROP; .EXIT 4; ## //***************************************************************** //* * //* RUN TPump * //* * //***************************************************************** //LOADIT EXEC PGM=TPump //STEPLIB DD DISP=SHR,DSN=STV.GG10.APP.L // DD DISP=SHR,DSN=STV.GG00.APP.L // DD DISP=SHR,DSN=STV.TG00.APP.L // DD DISP=SHR,DSN=STV.RG00.APP.L // DD DISP=SHR,DSN=TER2.SASC301H.LINKLIB // DD 
DISP=SHR,DSN=JCK.INMOD.LOAD,VOLUME=SER=TSO805,UNIT=3380 //SYSPRINT DD SYSOUT=* //SYSTERM DD SYSOUT=* //SYSOUT DD SYSOUT=* //SYSIN DD DATA,DLM=## .LOGTABLE XXXX.LOGTABLE; .LOGON TDQ8/XXXX,XXXX; /* TEST DATAIN, DATALOC */ DROP TABLE XXXX.INMODAS1; CREATE TABLE INMODAS1 (F1 CHAR(10), F2 CHAR(70)); .BEGIN IMPORT TPump TABLES INMODAS1; .Layout layname1; .FIELD L1FLD0 1 CHAR(4); .FIELD L1FLD1 * CHAR(10); .Field L1Fld2 * Char(70); .DML Label DML1; INSERT INMODAS1(F1,F2) VALUES (:L1FLD1, :L1FLD2); .IMPORT INMOD INMODG1 USING (“AAA” “BBB”) LAYOUT LAYNAME1 APPLY DML1; .End LOAD; Teradata Parallel Data Pump Reference 249 Appendix C: INMOD and Notify Exit Routine Examples PL/I INMOD .LOGOFF; ## //INDATA DD DATA,DLM=## 01ASSEMBLEAAAAAAAAAAAAAAAA 02ASSEMBLEBBBBBBBBBBBBBBBB 03ASSEMBLECCCCCCCCCCCCCCCC 04ASSEMBLEDDDDDDDDDDDDDDDD ## //SELECT EXEC BTEQ //STEPLIB DD DSN=STV.GG00.APP.L,DISP=SHR // DD DSN=STV.TG00.APP.L,DISP=SHR // DD DSN=STV.RG00.APP.L,DISP=SHR //SYSPRINT DD SYSOUT=A //SYSABEND DD SYSOUT=* //SYSIN DD DATA,DLM=## .LOGON TDQ8/XXXX,XXXX; SELECT * FROM INMODAS1; .LOGOFF; ## // PL/I INMOD //SFDPL2 JOB (22150000),’SFD’,MSGCLASS=A,CLASS=B, // REGION=4096K //***************************************************************** //* * //* IDENTIFY NECESSARY LOAD LIBRARIES FOR RELEASE * //* * //***************************************************************** //JOBLIB DD DSN=STV.RG20.APPLOAD,DISP=SHR // DD DSN=STV.EG14MLL1.APP.L,DISP=SHR // DD DSN=STV.TG13BLD.APP.L,DISP=SHR // DD DSN=TER2.SASC450F.LINKLIB,DISP=SHR //STEP1 EXEC ASMFC //ASM.SYSGO DD DSN=&&LOADSET1,DISP=(MOD,PASS),UNIT=VIO, // SPACE=(880,(500,100),,,ROUND) //ASM.SYSIN DD * PLIA TITLE ’TPump INTERFACE TO PL/I EXIT ROUTINE’ DYNAMN CSECT CNOP 0,4 B START-*(,R15) BRANCH AROUND CONSTANTS DC AL1(L’PLIAFLAG) LENGTH OF CONSTANTS PLIAFLAG DC C’ASSEMBLED AT &SYSTIME ON &SYSDATE.. 
BLKPLIA’ *-----------------------------------------------------------------* * G1_01 * * * * ON ENTRY: R1 -> PARAMETER LIST * * PARM 1 -> MULTI-FIELD RECORD * * FIELD 1: COMMAND CODE/RETURN CODE * * (32 BIT INTEGER) * * 0 = INITIAL CALL * * 1 = RECORD CALL * * 2 = HOST RESTART - ALSO INITIAL CALL * * 3 = CHECKPOINT * * 4 = DBC RESTART * 250 Teradata Parallel Data Pump Reference Appendix C: INMOD and Notify Exit Routine Examples PL/I INMOD * 5 = FINAL CALL * * 6 = W/ INFILE - ALSO INITIAL CALL * * 7 = RECEIVE RECORD CALL * * FIELD 2: DATA RECORD LENGTH * * (32 BIT INTEGER) * * FIELD 3: DATA RECORD * * (UP TO 31K BYTES) * * PARM 2 -> EXIT ROUTINE WORK WORD * * (32 BIT INTEGER) * * * * * * OPERATION: * * INITIAL CALL: * * 1) BULK LOADER LOADS THIS MODULE AND CALLS * * 2) BLKPLIA (THIS PROGRAM) WHICH CALLS * * 3) BLKPLI (PL/I PROGRAM TO ESTABLISH PL/I ENVIRONMENT) * * WHICH CALLS * * 4) BLKASM (ENTRY POINT IN THE PROGRAM) WHICH CALLS * * 5) BLKEXIT (USER EXIT PROGRAM IN PL/I). * * UPON RETURN: * * 1) BLKEXIT RETURNS TO * * 2) BLKASM WHICH PERFORMS MAGIC AND RETURNS DIRECTLY TO * * 3) BULK LOADER, THEREBY PRESERVING THE PL/I ENVIRONMENT * * FOR SUBSEQUENT CALLS. * * RECORD CALL: * * 1) BULK LOADER CALLS * * 2) BLKPLIA WHICH REVERSES THE MAGIC AND BRANCHES INTO * * 3) BLKASM WHICH CALLS * * 4) BLKEXIT WITH THE PL/I ENVIRONMENT SAVED BEFORE. * * UPON RETURN: * * 1) BLKEXIT RETURNS TO * * 2) BLKASM WHICH PERFORMS MAGIC AND RETURNS DIRECTLY TO * * 3) BULK LOADER, THEREBY PRESERVING THE PL/I ENVIRONMENT * * FOR SUBSEQUENT CALLS. * * * *-----------------------------------------------------------------* START SAVE (14,12) LR R11,R15 -> PROGRAM ENTRY POINT USING DYNAMN,R11 L R2,4(,R1) -> EXIT ROUTINE WORD L R3,0(,R1) -> COMMAND WORD L R3,0(,R3) COMMAND WORD CH R3,=H’0’ INITIAL CALL? BE INITCALL YES , DO INITIAL CODE CH R3,=H’6’ INITIAL CALL? BE INITCALL YES , DO INITIAL CODE CH R3,=H’2’ INITIAL CALL? 
BNE CALLPGM NO, JUST GO CALL PROGRAM *-----------------------------------------------------------------* * SETUP WORK AREA AND PL/I ENVIRONMENT * *-----------------------------------------------------------------* INITCALL LA R0,WORKALEN SR R1,R1 L R15,=V(DBCMEM) BALR R14,R15 ST R1,0(,R2) SAVE WORKAREA ADDRESS ST R1,WORKADDR SAVE WORKAREA ADDRESS LR R10,R1 -> CURRENT WORK AREA USING WORKAREA,R10 Teradata Parallel Data Pump Reference 251 Appendix C: INMOD and Notify Exit Routine Examples PL/I INMOD SPIE MF=(E,NOSPIE) CLEAR PASCAL INTERRUPT EXIT ST R1,SPIEPAS SAVE PASCAL SPIE MVC WRKFLAG,WRKFLAGP IDENTIFY WORK AREA XC SAVE1(12),SAVE1 CLEAR START OF SAVEAREA LA R1,SAVE1 INITIAL PROGRAM SAVE AREA ST R13,4(,R1) BACK CHAIN SAVE AREAS ST R1,8(,R13) FORW CHAIN SAVE AREAS LR R13,R1 -> NEW SAVE AREA ST R3,COMMAND KEEP COMMAND FOR LATER LA R1,PLIPARM -> STARTUP PARAMETERS L R15,=V(PLISTART) PL/I SETUP ENTRY POINT BALR R14,R15 CALL PL/I SETUP PROGRAM *-----------------------------------------------------------------* * FINAL RETURN FROM PL/I: FREE WORKAREA AND RETURN * *-----------------------------------------------------------------* L R1,SPIEPAS -> PASCALVS SPIE SPIE MF=(E,(1)) RESTORE PASCALUS SPIE L R13,4(,R13) BACK UP SAVE AREA CHAIN LR R1,R10 LA R0,WORKALEN L R15,=V(DBCMEM) BALR R14,R15 DROP R10 WORKAREA RETURN XR R15,R15 INDICATE ALL IS WELL ST R15,16(,R13) SET CALLER’S RETURN CODE RETURN (14,12) RETURN TO CALLER *-----------------------------------------------------------------* * REESTABLISH PL/I ENVIRONMENT AND CALL USER * *-----------------------------------------------------------------* ALLPGM L R10,0(R2) -> WORK AREA CALLPGM L R10,WORKADDR -> WORK AREA USING WORKAREA,R10 ST R3,COMMAND KEEP COMMAND FOR LATER LR R3,R1 SAVE -> PARMS FOR LATER LA R1,SAVE1 -> BLKPLIA SAVE AREA ST R13,4(,R1) REBUILD BACK CHAIN ST R1,8(,R13) REBUILD FORW CHAIN LM R12,R13,SAVE2 REESTABLISH PL/I ENVIRONMENT B AGAIN CALL EXIT ROUTINE DROP R10 WORKAREA DROP R11 BLKPLIA 
*-----------------------------------------------------------------* * PL/I CALLS HERE WITH CORRECT ENVIRONMENT * *-----------------------------------------------------------------* ENTRY BLKASM BLKASM B ASMSTART-*(,R15) BRANCH AROUND CONSTANTS DC AL1(L’ASMFLAG) LENGTH OF CONSTANTS ASMFLAG DC C’BLKASM’ ASMSTART SAVE (14,12) SAVE BLKPLI REGISTERS LR R11,R15 ADDRESS PROGRAM USING BLKASM,R11 SPIE MF=(E,NOSPIE)) CLEAR PASCAL INTERRUPT EXIT LR R4,R1 HOLD PL/I SPIE FOR LATER *-----------------------------------------------------------------* * PREPARE PROPER PL/I DSA FOR FURTHER MURGLING * *-----------------------------------------------------------------* LA R0,88 LENGTH OF NEW DSA L R1,76(,R13) -> FIRST AVAILABLE STORAGE ALR R0,R1 -> POSSIBLE END + 1 252 Teradata Parallel Data Pump Reference Appendix C: INMOD and Notify Exit Routine Examples PL/I INMOD CL R0,12(,R12) ENOUGH ROOM FOR NEW DSA? BNH ENOUGH YES, GO USE IT L R15,116(,R12) NO, POINT TO OVERFLOW ROUTINE BALR R14,R15 AND CALL IT ENOUGH ST R0,76(,R1) NEW FIRST AVAILABLE STORAGE ST R13,4(,R1) BACK CHAIN SAVE AREAS MVC 72(4,R1),72(R13) COPY LIB WORKSPACE ADDRESS LR R13,R1 ADDRESS NEW DSA MVI 0(R13),X’80’ SET FLAGS IN DSA TO MVI 1(R13),X’00’ PRESERVE PL/I MVI 86(R13),X’91’ ERROR HANDLING MVI 87(R13),X’C0’ IN THE ASSEMBLER ROUTINE *-----------------------------------------------------------------* * CALL USER PL/I ROUTINE WITH ORIGINAL BULK PARMS * *-----------------------------------------------------------------* L R2,4(,R13) -> REGISTERS TO BLKASM L R2,4(,R2) -> PREVIOUS REGISTERS L R2,4(,R2) -> REGISTERS TO BLKPLI L R2,4(,R2) -> REGISTERS TO BLKPLIA L R3,24(,R2) -> PARMS TO BLKPLIA L R1,4(,R3) -> EXIT ROUTINE WORD L R10,0(,R1) -> WORK AREA L R10,WORKADDR -> WORK AREA.G1_01. USING WORKAREA,R10 CLC WRKFLAG,WRKFLAGP DID IT WORK? 
BE GOODWRK YES, USE IT ABEND 1,DUMP NO, ABEND RIGHT HERE GOODWRK STM R12,R13,SAVE2 SAVE PL/I ENVIRONMENT ST R4,SPIEPLI SAVE PL/I SPIE L R11,16(,R2) -> BLKPLIA ENTRY POINT DROP R11 BLKASM USING DYNAMN,R11 AIN L R1,SPIEPLI -> PLI SPIE SPIE MF=(E,(1)) RESTORE PL/I SPIE AGAIN XR R5,R5 MUST BE ZERO CALLING PL/I OI 4(R3),X’80’ LAST PARAMETER .G1_01. LR R1,R3 RESTORE ORIGINAL R1 .G1_01. L R15,=V(BLKEXIT) -> USER ROUTINE BALR R14,R15 CALL USER *-----------------------------------------------------------------* * CHECK WHETHER OR NOT TO HOLD PL/I ENVIRONMENT * *-----------------------------------------------------------------* L R1,SPIEPAS -> PASCALVS SPIE SPIE MF=(E,(1)) RESTORE PASCALVS SPIE L R13,SAVE1+4 RETURN AROUND PL/I B RETURN GO PERFORM RETURN DROP R10 WORKAREA DROP R11 BLKPLIA LTORG SPACE 2 NOSPIE SPIE MF=L SPACE 2 STRUC DC F’0’ OFFSET OF FIRST ELEMENT DC F’4’ OFFSET OF SECOND ELEMENT DC F’8’ OFFSET OF THIRD ELEMENT DC Y(31*1024,0) 31K FIXED LENGTH STRING SPACE 2 PLIPARM DC A(*+4+X’80000000’) -> PL/I INITIAL ARGUMENT DC Y(L’PLIARG) LENGTH OF PL/I INITIAL ARGUMENT Teradata Parallel Data Pump Reference 253 Appendix C: INMOD and Notify Exit Routine Examples PL/I INMOD PLIARG WORKADDR WRKFLAGP WRKFLAGL WORKAREA WRKFLAG SAVE1 COMMAND SAVE2 SPIEPAS SPIEPLI EXITPRM AGGLOC WORKALEN R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 DC SPACE DS DC DC EQU SPACE DSECT DS DS DS DS DS DS DS DS DS EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU END C’NOSTAE/’ DISABLE ERROR RECOVERY 2 F ADDRESS FOR WORKAREA .G1_01. 
C’BLKPLIA WORK AREA’ CL(((*-WRKFLAGP+7)/8*8)(*WRKFLAGP))’ ’ FILL TO DWORD *-WRKFLAGP 2 CL(WRKFLAGL) 18F F 2F F F A 2A 0D *-WORKAREA 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 DYNAMN SAVE AREA FOR BULKPLI ALIGN END OF WORK AREA /* //STEP2 EXEC PLIXC //PLI.SYSLIN DD DSN=&&LOADSET2,DISP=(MOD,PASS),UNIT=VIO, // SPACE=(80,(250,100)) //PLI.SYSIN DD DATA,DLM=## BLKPLI: /* BULK LOADER INTERFACE TO PL/I USER EXIT ROUTINE */ PROC OPTIONS (MAIN); /* THIS PROGRAM IS CALLED BLKPLIA (THE SPECIAL EXIT ROUTINE ENTRY */ /* POINT PROGRAM, WRITTEN IN ASSEMBLER). */ /* IT THEN CALLS BLKASM (ANOTHER ENTRY POINT IN BLKPLIA). */ DCL BLKASM ENTRY; CALL BLKASM; END; ## //STEP3 EXEC PLIXCL //PLI.SYSIN DD DATA,DLM=## BLKEXIT: PROCEDURE (X,Y); /* ONLY BLKEXIT ACCEPTED HERE. */ DCL X FIXED, Y FIXED; DCL 1 PARM_LIST ALIGNED BASED(P), 10 STATUS FIXED BINARY (31,0), 10 RLENGTH FIXED BINARY (31,0), 10 BUFFER CHAR(80); 254 Teradata Parallel Data Pump Reference Appendix C: INMOD and Notify Exit Routine Examples PL/I INMOD DCL 1 PARM_PARM2 ALIGNED BASED(Q), 10 SEQ FIXED BINARY (31,0), 10 LEN FIXED BINARY (15,0), 10 PARAMETER CHAR(80); DCL COUNT STATIC FIXED BINARY (31,0), INSROWS STATIC FIXED BINARY (31,0), REJROWS STATIC FIXED BINARY (31,0; DCL I,NOTMATCH FIXED BINARY (31,0); DCL ADDR BUILTIN; DCL SUBSTR BUILTIN; P=ADDR(X); Q=ADDR(Y); DISPLAY(’### INSIDE PL/I INMOD ROUTINE...’); DISPLAY(P->STATUS); DISPLAY(P->RLENGTH); DISPLAY(P->BUFFER); DISPLAY(Q->SEQ); DISPLAY(Q->LEN); DISPLAY(Q->PARAMETER); SELECT (P->STATUS); WHEN (6) DO; /* Initialize */ COUNT=0; REJROWS=0; INSROWS=0; P->STATUS=0; END; WHEN (7) DO; /* Process */ DISPLAY('Processing...'); COUNT=COUNT+1; NOTMATCH=0; P->STATUS =0; DO I = 1 TO Q->LEN; IF SUBSTR(P->BUFFER,I,1) ^= SUBSTR(Q->PARAMETER,I,1) THEN DO; NOTMATCH = 1; LEAVE; END; END; IF NOTMATCH = 1 THEN DO; DISPLAY('------> REJECTED <--------'); REJROWS = REJROWS +1; P->RLENGTH = 0; END; ELSE DO; DISPLAY('------> accepted <--------'); INSROWS = INSROWS +1; END; END; WHEN (5) DO; /* Finalizing 
*/ DISPLAY('Finalizing...'); P->STATUS=0; END; OTHERWISE DO; DISPLAY('UNKNOWN CODE...'); P->STATUS=99; END; END; DISPLAY('P->STATUS=');DISPLAY(STATUS); DISPLAY('P->RLENGTH=');DISPLAY(RLENGTH); DISPLAY('TOTAL =');DISPLAY(COUNT); Teradata Parallel Data Pump Reference 255 Appendix C: INMOD and Notify Exit Routine Examples PL/I INMOD DISPLAY('INSERTS=');DISPLAY(INSROWS); DISPLAY('REJROWS=');DISPLAY(REJROWS); DISPLAY('--------------------------------------------------------'); END BLKEXIT; ## //LKED.SYSIN DD * INCLUDE BLKPLI INCLUDE BLKPLIA INCLUDE CLILIB(DBCMEM) ENTRY DYNAMN NAME INMDPL2(R) /* //LKED.BLKPLIA DD DSN=*.STEP1.ASM.SYSGO,DISP=(OLD,PASS), // VOL=REF=*.STEP1.ASM.SYSGO //LKED.BLKPLI DD DSN=*.STEP2.PLI.SYSLIN,DISP=(OLD,PASS), // VOL=REF=*.STEP2.PLI.SYSLIN //LKED.CLILIB DD DISP=SHR,DSN=STV.RG20APP.APP.L,UNIT=3380, // VOLUME=SER=CNFG03 //COPY EXEC PGM=IEBGENER //SYSIN DD DUMMY //SYSPRINT DD SYSOUT=* //SYSUT2 DD DISP=(NEW,PASS),DSN=&&TEMP,UNIT=SYSDA, // DCB=(LRECL=80,BLKSIZE=1760,RECFM=FB), // SPACE=(CYL,(1,1),RLSE) //SYSUT1 DD DATA,DLM=@@ ("SASC") A0000000000000000000000000000A A0000000000000000000000000000A ("COBOL") A0000000000000000000000000000A ("ASSEM") A0000000000000000000000000000A ("SASC") B1111111111111111111111111111B ("PASC") B1111111111111111111111111111B ("COBOL") B1111111111111111111111111111B ("ASSEM") B1111111111111111111111111111B ("SASC") C2222222222222222222222222222C ("PASC") C2222222222222222222222222222C ("COBOL") C2222222222222222222222222222C ("ASSEM") C2222222222222222222222222222C ("PL/I") C2222222222222222222222222222C ("SASC") D3333333333333333333333333333D ("PASC") D3333333333333333333333333333D ("PL/I") D3333333333333333333333333333D ("SASC") E4444444444444444444444444444E ("PASC") E4444444444444444444444444444E ("PL/I") E4444444444444444444444444444E ("SASC") F5555555555555555555555555555F ("PASC") F5555555555555555555555555555F ("PL/I") F5555555555555555555555555555F @@ 
//********************************************************************** //* THIS STEP WILL ONLY DROP THE TABLES IF TPump IS NOT IN APPLY PHASE * //********************************************************************** //CREATE EXEC BTEQ .LOGON TDP5/DMD,DMD; /* INMOD TEST CASE II - PL/I */ RELEASE TPump DMD.INMODPL2; .IF ERRORCODE = 2572 THEN .GOTO NODROP; DROP TABLE DMD.LOGTABLE; DROP TABLE DMD.ET_INMODPL2; DROP TABLE DMD.UV_INMODPL2; 256 ("PASC") Teradata Parallel Data Pump Reference Appendix C: INMOD and Notify Exit Routine Examples PL/I INMOD DROP TABLE DMD.WT_INMODPL2; DROP TABLE DMD.INMODPL2; .QUIT; .LABEL NODROP; .EXIT 4; CREATE TABLE INMODPL2 (F1 CHAR(10), F2 CHAR(70)); ## //***************************************************************** //* * //* RUN TPump * //* * //***************************************************************** //LOADIT EXEC PGM=TPump,TIME=(,3) //STEPLIB DD DSN=STV.RG20.APPLOAD,DISP=SHR // DD DSN=STV.EG14MLL1.APP.L,DISP=SHR // DD DSN=STV.TG13BLD.APP.L,DISP=SHR // DD DSN=TER2.SASC450F.LINKLIB,DISP=SHR // DD DSN=*.STEP3.LKED.SYSLMOD,DISP=(OLD,PASS), // VOL=REF=*.STEP3.LKED.SYSLMOD //SYSPRINT DD SYSOUT=* //SYSTERM DD SYSOUT=* //SYSOUT DD SYSOUT=* //INDATA DD DISP=OLD,DSN=*.COPY.SYSUT2,DCB=(LRECL=80,RECFM=F), // VOL=REF=*.COPY.SYSUT2 //SYSIN DD DATA,DLM=## .LOGON TDP5/DMD,DMD; .LOGTABLE DMD.LOGTABLE_SFD; .BEGIN LOAD TABLES INMODPL2; .Layout layname1; .Field L1Fld1 1 Char(10); .Field L1Fld2 * Char(30); .Field L1Fld3 * Char(40); .DML Label DML1; INSERT INMODPL2(F1,F2) VALUES (:L1FLD1, :L1FLD2); .IMPORT INFILE INDATA INMOD INMDPL2 USING (“PL/I”) LAYOUT LAYNAME1 APPLY DML1; .End LOAD; .LOGOFF; ## C INMOD - MVS //JCKLC1 JOB 1,’JCK’,MSGCLASS=A,NOTIFY=JCK,CLASS=B, REGION=4096K //****************************************************************** //* * //* IDENTIFY NECESSARY LOAD LIBRARIES FOR RELEASE * //* * //****************************************************************** //JOBLIB DD DISP=SHR,DSN=STV.GG10.APP.L // DD 
DISP=SHR,DSN=STV.GG00.APP.L // DD DISP=SHR,DSN=STV.TG00.APP.L // DD DISP=SHR,DSN=STV.RG00.APP.L // DD DISP=SHR,DSN=TER2.SASC301H.LINKLIB //C EXEC PGM=LC370B //STEPLIB // //SYSTERM //SYSPRINT //SYSUT1 DD DD DD DD DD DSN=TER2.SASC301H.LOAD,DISP=SHR DSN=TER2.SASC301H.LINKLIB,DISP=SHR SYSOUT=* SYSOUT=* UNIT=SYSDA,SPACE=(TRK,(10,10)) Teradata Parallel Data Pump Reference 257 Appendix C: INMOD and Notify Exit Routine Examples PL/I INMOD //SYSUT2 DD UNIT=SYSDA,SPACE=(TRK,(10,10)) //SYSLIN DD DSN=&&OBJECT,SPACE=(3200,(10,10)),DISP=(MOD,PASS), // UNIT=SYSDA //SYSLIB DD DSN=TER2.SASC301H.MACLIBC,DISP=SHR //SYSDBLIB DD DSN=&&DBGLIB,SPACE=(4080,(20,20,1)),DISP=(,PASS), // UNIT=SYSDA,DCB=(RECFM=U,BLKSIZE=4080) //SYSTMP01 DD UNIT=SYSDA,SPACE=(TRK,25) VS1 ONLY //SYSTMP02 DD UNIT=SYSDA,SPACE=(TRK,25) VS1 ONLY //SYSIN DD DATA,DLM=## /* This program is for TPump INMOD testing using C user exit routine. When this routine is activated it looks at the content of the function code passed (a->code) and depending on its value, it 0) initializes, i.e., opens a file, etc... 1) reads a record 5) acknowledges “close inmod” request. The user exit routine must return “return code”(a->code) and “length” (a->len). You should send return code = zero when no errors occur and non-zero for an error. TPump expects length = zero at the end of file. Then it sends “CLOSE INMOD” request. THE USER EXIT routine must explicitly return “return code” = ZERO to terminate the conversation. 
*/
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

typedef unsigned short Int16;
typedef unsigned char Int8;
typedef unsigned long int Int32;

/* PASSING parameter structures */
typedef struct {
    Int32 code;
    Int32 len;
    Int8 buf[80];
} inmodbuf;

typedef struct {
    Int32 seq;
    Int16 len;
    char param[80];
} inmodpty;

static FILE *IN;
static int count=0;
char *memcpy();

void _dynamn(a,b)
inmodbuf *a;
inmodpty *b;
{
    int code=0;
    char tempbuf[80];

    memcpy(tempbuf,a->buf,sizeof(a->buf));
    tempbuf[79]='\0';
    printf("BEGIN--> %d %d %s\n",a->code,a->len,tempbuf);
    printf("    +++ %d %d %s\n",b->seq,b->len,b->param);
    code = (int) a->code;
    switch (code) {
    case 0:  /* Here you open the file and read the first record */
        printf("## CODE=0, opening...\n");
        IN=fopen("ddn:INDATA","rb");
        if (! ferror(IN)) {
            if (! readrecord(a)) fclose(IN);
        };
        break;
    case 1:  /* TPump requested next record, read it */
        printf("## CODE=1, reading...\n");
        if (! readrecord(a)) fclose(IN);
        break;
    case 5:  /* TPump is closing INMOD routine */
        a->code=0;
        a->len=0;
        printf("## CODE=5, terminating...\n");
        break;
    default:
        a->code=12;  /* any number not = to zero */
        a->len=0;
        printf("##### UNKNOWN code ######\n");
    };
    memcpy(tempbuf,a->buf,sizeof(a->buf));
    tempbuf[79]='\0';
    printf("END  --> %d %d %s\n",a->code,a->len,tempbuf);
    printf("    +++ %d %d %s\n",b->seq,b->len,b->param);
}

int readrecord(a)
inmodbuf *a;
{
    int rtn=0;
    char tempbuf[80];

    if (fread((char *)&(a->buf),sizeof(a->buf),1,IN)) {
        count++;
        memcpy(tempbuf,a->buf,sizeof(a->buf));
        tempbuf[79]='\0';
        printf(" %d %s \n",count,tempbuf);
        a->len=80;
        a->code=0;
        rtn=1;
    };
    if (ferror(IN)) {
        printf("==== error ====\n");
        a->code=16;  /* any non zero number */
        a->len=0;
    };
    if (feof(IN)) {  /* EOF, set length = zero */
        printf("=== EOF ===\n");
        a->code=0;
        a->len=0;
    };
    return(rtn);
}
##
//LKED EXEC PGM=LINKEDIT,PARM='LIST,MAP',COND=(8,LT,C)
//SYSPRINT DD SYSOUT=*,DCB=(RECFM=FBA,LRECL=121,BLKSIZE=1210)
//SYSTERM DD SYSOUT=*
//SYSLIN DD DSN=*.C.SYSLIN,DISP=(OLD,PASS),VOL=REF=*.C.SYSLIN
// DD DDNAME=SYSIN
//SYSLIB DD DSN=TER2.SASC301H.SUBLIB,DISP=SHR
//SYSUT1 DD DSN=&&SYSUT1,UNIT=SYSDA,DCB=BLKSIZE=1024,
// SPACE=(1024,(200,50))
//SYSLMOD DD DSN=JCK.INMOD.LOAD(INMODG1),DISP=MOD,UNIT=3380,
// VOLUME=SER=TSO805
//SYSIN DD DATA,DLM=##
 NAME INMODG1(R)
##
//BDLDEL EXEC PGM=IEFBR14
//BDLCAT EXEC PGM=TPUMP
//STEPLIB DD DSN=STV.GG00.APP.L,DISP=SHR
// DD DSN=STV.TG00.APP.L,DISP=SHR
// DD DSN=STV.RG00.APP.L,DISP=SHR
//SYSPRINT DD SYSOUT=*
// UNIT=SYSDA,DCB=(RECFM=F,DSORG=PS,LRECL=8244),
// SPACE=(8244,(12,5))
//SYSIN DD *
//*******************************************************************
//* THIS STEP WILL ONLY DROP THE TABLES IF TPump NOT IN APPLY PHASE *
//*******************************************************************
//CREATE EXEC BTEQ
//STEPLIB DD
DSN=STV.GG00.APP.L,DISP=SHR // DD DSN=STV.TG00.APP.L,DISP=SHR // DD DSN=STV.RG00.APP.L,DISP=SHR //SYSPRINT DD SYSOUT=* //SYSABEND DD SYSOUT=* //SYSIN DD DATA,DLM=## .LOGON TDQ8/DBC,DBC; DROP TABLE XXXX.LOGTABLE; DROP TABLE XXXX.ET_INMODLC1; DROP TABLE XXXX.UV_INMODLC1; DROP TABLE XXXX.WT_INMODLC1; .QUIT; .LABEL NODROP; .EXIT 4; ## //****************************************************************** //* * //* RUN TPump * //* * //****************************************************************** //LOADIT EXEC PGM=TPump //STEPLIB DD DISP=SHR,DSN=STV.GG10.APP.L // DD DISP=SHR,DSN=STV.GG00.APP.L // DD DISP=SHR,DSN=STV.TG00.APP.L // DD DISP=SHR,DSN=STV.RG00.APP.L // DD DISP=SHR,DSN=TER2.SASC301H.LINKLIB // DD DISP=SHR,DSN=JCK.INMOD.LOAD,VOLUME=SER=TSO805, // UNIT=338 //SYSPRINT DD SYSOUT=* 260 Teradata Parallel Data Pump Reference Appendix C: INMOD and Notify Exit Routine Examples C INMOD - UNIX //SYSTERM DD SYSOUT=* //SYSOUT DD SYSOUT=* //SYSIN DD DATA,DLM=## .LOGTABLE XXXX.LOGTABLE; .LOGON TDQ8/XXXX,XXXX; /* TEST DATAIN, DATALOC */ DROP TABLE XXXX.INMODLC1; CREATE TABLE INMODLC1 (F1 CHAR(10), F2 CHAR(70)); .BEGIN LOAD TABLES INMODLC1; .Layout layname1; .Field L1Fld1 1 Char(10); .Field L1Fld2 * Char(70); .DML Label DML1; INSERT INMODLC1(F1,F2) VALUES (:L1FLD1, :L1FLD2); .IMPORT INMOD INMODG1 USING (“AAA” “BBB”) LAYOUT LAYNAME1 APPLY DML1; .End LOAD; .LOGOFF; ## //INDATA DD DATA,DLM=## 01C AAAAAAAAAAAAAAAA 02C BBBBBBBBBBBBBBBB 03C CCCCCCCCCCCCCCCC 04C DDDDDDDDDDDDDDDD00229 ## //SELECT EXEC BTEQ //STEPLIB DD DSN=STV.GG00.APP.L,DISP=SHR // DD DSN=STV.TG00.APP.L,DISP=SHR // DD DSN=STV.RG00.APP.L,DISP=SHR //SYSPRINT DD SYSOUT=* //SYSABEND DD SYSOUT=* //SYSIN DD DATA,DLM=## .LOGON TDQ8/XXXX,XXXX; SELECT * FROM INMODLC1; .LOGOFF; ## C INMOD - UNIX /* This program is for TPump INMOD testing using C user exit routine. 
When this routine is activated it looks at the content of the function code passed (a->code) and depending on its value, it 0) initializes, i.e., opens a file, etc... 1) reads a record 5) acknowledges “close inmod” request. The user exit routine must return “return code”(a->code) and “length” (a->len). You should send return code = zero when no errors occur and non-zero for an error. TPump expects length = zero at the end of file. Then it sends “CLOSE INMOD” request. THE USER EXIT routine must explicitly return “return code” = ZERO to terminate the conversation. */ #include <stddef.h> #include <stdlib.h> #include <stdio.h> typedef unsigned short Int16; typedef unsigned char Int8; typedef unsigned long int Int32; Teradata Parallel Data Pump Reference 261 Appendix C: INMOD and Notify Exit Routine Examples C INMOD - UNIX /* PASSING parameter structures */ typedef struct { Int32 code; Int32 len; Int8 buf[80]; } inmodbuf; typedef struct { Int32 seq; Int16 len; char param[80]; } inmodpty; static FILE *IN; static int count=0; char *memcpy(); void _dynamn(a,b) inmodbuf *a; inmodpty *b; {int code=0; char tempbuf[80]; memcpy(tempbuf,a->buf,sizeof(a->buf)); tempbuf[79]=’\0’; printf(“BEGIN--> %d %d %s\n”,a->code,a->len,tempbuf); printf(“ +++ %d %d %s\n”,b->seq ,b->len,b->param); code= (int) a->code; switch (code) { case 0: /* Here you open the file and read the first record */ printf(“## CODE=0, openinig...\n”); IN=fopen(“ddn:INDATA”,“rb”); if (! ferror(IN)) { if (! readrecord(a)) fclose(IN); }; break; case 1: /* TPump requested next record, read it */ printf(“## CODE=1, reading...\n”); if (! 
readrecord(a))
            fclose(IN);
        break;
    case 5:  /* TPump is closing the INMOD routine */
        a->code = 0;
        a->len  = 0;
        printf("## CODE=5, terminating...\n");
        break;
    default:
        a->code = 12;  /* any number not equal to zero */
        a->len  = 0;
        printf("##### UNKNOWN code ######\n");
    };
    memcpy(tempbuf, a->buf, sizeof(a->buf));
    tempbuf[79] = '\0';
    printf("END  --> %d %d %s\n", a->code, a->len, tempbuf);
    printf("     +++ %d %d %s\n", b->seq, b->len, b->param);
}

int readrecord(a)
inmodbuf *a;
{
    int rtn = 0;
    char tempbuf[80];

    if (fread((char *)&(a->buf), sizeof(a->buf), 1, IN)) {
        count++;
        memcpy(tempbuf, a->buf, sizeof(a->buf));
        tempbuf[79] = '\0';
        printf(" %d %s \n", count, tempbuf);
        a->len  = 80;
        a->code = 0;
        rtn = 1;
    };
    if (ferror(IN)) {
        printf("==== error ====\n");
        a->code = 16;  /* any nonzero number */
        a->len  = 0;
    };
    if (feof(IN)) {  /* EOF, set length = zero */
        printf("=== EOF ===\n");
        a->code = 0;
        a->len  = 0;
    };
    return(rtn);
}

Sample Notify Exit Routine

The following is the listing of tldnfyxt.c, the sample notify exit routine that is provided with the TPump software.

/*********************************************************************
 *                                                                   *
 * tldnfyxt.c - Sample Notify Exit for TPump.                        *
 *                                                                   *
 * Copyright 1997-2007, NCR Corporation.  ALL RIGHTS RESERVED.       *
 *                                                                   *
 * Purpose - This is a sample notify exit for TPump.                 *
 *                                                                   *
 * Execute - Build Notify on a UNIX system:                          *
 *           compile and link into a shared object                   *
 *             cc -G tldnfyxt.c -o libtldnfyxt.so                    *
 *         - Build Notify on a Win32 system:                         *
 *           compile and link into a dynamic link library            *
 *             cl /DWIN32 /LD tldnfyxt.c                             *
 *         - Build Notify on an AIX system:                          *
 *             cc -c -brtl -qalign=packed tldnfyxt.c                 *
 *             ld -G -e_dynamn -bE:export_dynamn.txt tldnfyxt.o      *
 *                -o libtldnfyxt.so -lm -lc                          *
 *           where export_dynamn.txt contains the symbol "_dynamn"   *
 *         - Build Notify on a Linux system:                         *
 *             gcc -shared -fPIC tldnfyxt.c -o libtldnfyxt.so        *
 *                                                                   *
 *********************************************************************/

#include <stdio.h>

typedef unsigned long UInt32;

#define NOTIFYID_FASTLOAD   1
#define NOTIFYID_MULTILOAD  2
#define NOTIFYID_FASTEXPORT 3
#define NOTIFYID_BTEQ       4
#define NOTIFYID_TPUMP      5

#define MAXVERSIONIDLEN    32
#define MAXUTILITYNAMELEN  36
#define MAXUSERNAMELEN     64
#define MAXUSERSTRLEN      80
#define MAXTABLENAMELEN   128
#define MAXFILENAMELEN    256

typedef enum {
    NMEventInitialize    = 0,
    NMEventFileInmodOpen = 1,
    NMEventCkptBeg       = 2,
    NMEventImportBegin   = 3,
    NMEventImportEnd     = 4,
    NMEventErrorTable    = 5,
    NMEventDBSRestart    = 6,
    NMEventCLIError      = 7,
    NMEventDBSError      = 8,
    NMEventExit          = 9,
    NMEventTableStats    = 10,
    NMEventCkptEnd       = 11,
    NMEventRunStats      = 12,
    NMEventDMLError      = 13
} NfyTLDEvent;

#define TIDUPROW 2816

typedef enum {
    DEFeedbackDefault   = 0,
    DEFeedbackNoLogging = 1
} DMLErrorFeedbackType;

typedef struct _TLNotifyExitParm {
    long Event;  /* should be NfyTLDEvent values */
    union {
        struct {
            int  VersionLen;
            char VersionId[MAXVERSIONIDLEN];
            int  UtilityId;
            int  UtilityNameLen;
            char UtilityName[MAXUTILITYNAMELEN];
            int  UserNameLen;
            char UserName[MAXUSERNAMELEN];
            int  UserStringLen;
            char UserString[MAXUSERSTRLEN];
        } Initialize;
        struct {
            int
nImport;
        } ImpStart;
        struct {
            UInt32 FileNameLen;
            char   FileOrInmodName[MAXFILENAMELEN];
            UInt32 nImport;
        } FileOpen;
        struct {
            unsigned long Records;
        } CheckPt;
        struct {
            char          *TableName;
            unsigned long Rows;
        } ETDrop;
        struct {
            long ReturnCode;
        } Exit;
        struct {
            int           nImport;
            unsigned long RecsIn;
            unsigned long RecsSkipped;
            unsigned long RecsRejd;
            unsigned long RecsOut;
            unsigned long RecsError;
        } Complete;
        struct {
            char          type;
            char          *dbasename;
            char          *szName;
            unsigned long Activity;
        } TableStats;
        struct {
            UInt32 ErrorCode;
        } DBSError;
        struct {
            UInt32 ErrorCode;
        } CLIError;
        struct {
            int           nImport;
            unsigned long nSQLstmt;
            unsigned long nReqSent;
            unsigned long RecsIn;
            unsigned long RecsSkipped;
            unsigned long RecsRejd;
            unsigned long RecsOut;
            unsigned long RecsError;
        } Stats;
        struct {
            UInt32        nImport;
            UInt32        ErrorCode;
            char          *ErrorMsg;
            UInt32        nRecord;
            unsigned char nApplySeq;
            unsigned char nDMLSeq;
            unsigned char nSMTSeq;
            char          *ErrorData;
            UInt32        ErrorDataLen;
            UInt32        *feedback;
        } DMLError;
    } Vals;
} TLNotifyExitParm;

#ifdef I370
#define TLNfyExit MLNfEx
#endif

extern long TLNfyExit(
#ifdef __STDC__
    TLNotifyExitParm *Parms
#endif
);

#ifdef WIN32  /* Change for WIN32 */
__declspec(dllexport) long _dynamn(TLNotifyExitParm *P)
#else
long _dynamn(P)
TLNotifyExitParm *P;
#endif
{
    FILE *fp;
    int  i;

    if (!(fp = fopen("NFYEXIT.OUT", "a")))
        return(1);

    switch (P->Event) {
    case NMEventInitialize:
        fprintf(fp, "exit called @ Tpump init.\n");
        fprintf(fp, "Version: %s\n", P->Vals.Initialize.VersionId);
        P->Vals.Initialize.UtilityName[MAXUTILITYNAMELEN - 1] = '\0';
        fprintf(fp, "Utility: %s\n", P->Vals.Initialize.UtilityName);
        fprintf(fp, "User: %s\n", P->Vals.Initialize.UserName);
        if (P->Vals.Initialize.UserStringLen)
            fprintf(fp, "UserString: %s\n",
                    P->Vals.Initialize.UserString);
        break;
    case NMEventFileInmodOpen:
        fprintf(fp, "Exit called @ File/Inmod Open\n"
                "File/Inmod Name : %s Import "
": %d\n",
                P->Vals.FileOpen.FileOrInmodName,
                P->Vals.FileOpen.nImport);
        break;
    case NMEventCkptBeg:
        fprintf(fp, "exit called @ checkpoint begin : %u Records.\n",
                P->Vals.CheckPt.Records);
        break;
    case NMEventCkptEnd:
        fprintf(fp, "exit called @ checkpoint End : %u Records Sent.\n",
                P->Vals.CheckPt.Records);
        break;
    case NMEventCLIError:
        fprintf(fp, "exit called @ CLI error %d\n",
                P->Vals.CLIError.ErrorCode);
        break;
    case NMEventErrorTable:
        fprintf(fp, "exit called @ Error Table : %s "
                "%u logable records.\n",
                P->Vals.ETDrop.TableName,
                P->Vals.ETDrop.Rows);
        break;
    case NMEventDBSError:
        fprintf(fp, "exit called @ DBS error %d\n",
                P->Vals.DBSError.ErrorCode);
        break;
    case NMEventImportBegin:  /* DR51679 event name should be consistent */
        fprintf(fp, "exit called @ import %d starting.\n",
                P->Vals.ImpStart.nImport);
        break;
    case NMEventImportEnd:  /* DR51679 event name should be consistent */
        fprintf(fp, "exit called @ import %d ending\n",
                P->Vals.Complete.nImport);
        fprintf(fp, "Total Records Read: %u \nRecords Skipped: "
                "%u \nUnreadable Records: %u \nRecords Sent: "
                "%u \nData Errors : %u \n",
                P->Vals.Complete.RecsIn,
                P->Vals.Complete.RecsSkipped,
                P->Vals.Complete.RecsRejd,
                P->Vals.Complete.RecsOut,
                P->Vals.Complete.RecsError);
        break;
    case NMEventDBSRestart:
        fprintf(fp, "exit called @ RDBMS restarted\n");
        break;
    case NMEventExit:
        fprintf(fp, "exit called @ tpump notify out of scope:"
                " return code %d.\n",
                P->Vals.Exit.ReturnCode);
        break;
    case NMEventTableStats:
        fprintf(fp, "exit called @ Table Stats: \n");
        if (P->Vals.TableStats.type == 'I')
            fprintf(fp, "Rows Inserted : "
                    "%u \nTable/Macro Name : %s \nDatabase Name"
                    " : %s \n",
                    P->Vals.TableStats.Activity,
                    P->Vals.TableStats.szName,
                    P->Vals.TableStats.dbasename);
        if (P->Vals.TableStats.type == 'U')
            fprintf(fp, "Rows Updated : "
                    "%u \nTable/Macro Name : %s \nDatabase Name"
                    " : %s \n",
P->Vals.TableStats.Activity,
                    P->Vals.TableStats.szName,
                    P->Vals.TableStats.dbasename);
        if (P->Vals.TableStats.type == 'D')
            fprintf(fp, "Rows Deleted : "
                    "%u \nTable/Macro Name : %s \nDatabase Name"
                    " : %s \n",
                    P->Vals.TableStats.Activity,
                    P->Vals.TableStats.szName,
                    P->Vals.TableStats.dbasename);
        break;
    case NMEventRunStats:
        fprintf(fp, "exit called @ stats\n");
        fprintf(fp, "import %d \n", P->Vals.Stats.nImport);
        fprintf(fp, "Total SQL Statements: %u \nRequests Sent: %u \n"
                "Records Read: %u \nRecords Skipped: %u \n"
                "Unreadable Records: %u \nRecords Sent: %u \n"
                "Data Errors : %u \n",
                P->Vals.Stats.nSQLstmt,
                P->Vals.Stats.nReqSent,
                P->Vals.Stats.RecsIn,
                P->Vals.Stats.RecsSkipped,
                P->Vals.Stats.RecsRejd,
                P->Vals.Stats.RecsOut,
                P->Vals.Stats.RecsError);
        break;
    case NMEventDMLError:
        fprintf(fp, "exit called @ DML error \n");
        fprintf(fp, "import %d \n", P->Vals.DMLError.nImport);
        fprintf(fp, "Error code: %u \nError text: %s \n"
                "Record number: %u \nApply number: %d \n"
                "DML number: %d \nStatement number: %d \n"
                "Error data length : %u \n"
                "feedback : %u \n",
                P->Vals.DMLError.ErrorCode,
                P->Vals.DMLError.ErrorMsg,
                P->Vals.DMLError.nRecord,
                P->Vals.DMLError.nApplySeq,
                P->Vals.DMLError.nDMLSeq,
                P->Vals.DMLError.nSMTSeq,
                P->Vals.DMLError.ErrorDataLen,
                *(P->Vals.DMLError.feedback));
        fprintf(fp, "Error data: ");
        for (i = 0; i < P->Vals.DMLError.ErrorDataLen; i++) {
            fprintf(fp, "%c", P->Vals.DMLError.ErrorData[i]);
        }
        fprintf(fp, "\n");
        if (P->Vals.DMLError.ErrorCode == TIDUPROW) {
            *(P->Vals.DMLError.feedback) = DEFeedbackNoLogging;
            fprintf(fp, "Returning feedback = %u \n",
                    DEFeedbackNoLogging);
        }
        break;
    default:
        fprintf(fp, "\nAn Invalid Event Passed to the Exit Routine\n");
        break;
    }
    fclose(fp);
    return(0);
}
Glossary

Numeric

24x7 Lights Out Operations: The use of Systems Management tools to ensure the reliable movement and update of data from operational systems to analytical systems.

2PC: Two-Phase Commit

A

abend: Abnormal END of task. Termination of a task prior to its completion because of an error condition that cannot be resolved by the recovery facilities that operate during execution.

ABORT: In Teradata SQL, a statement that stops a transaction in progress and backs out changes to the database only if the conditional expression associated with the abort statement is true.

Access Lock: A lock that allows selection of data from a table that may be locked for write access. The Teradata MultiLoad utility maintains access locks against the target tables during the Acquisition Phase.

Access Module: A software component that provides a standard set of I/O functions to access data on a specific device.

Access Module Processor (AMP): A virtual processor that receives steps from a parsing engine (PE) and performs database functions to retrieve or update data. Each AMP is associated with one virtual disk, where the data is stored. An AMP manages only its own virtual disk and not the virtual disk of any other AMP.

access right: A user's right to perform the Teradata SQL statements granted to him against a table, database, user, macro, or view. Also known as privilege.

account: The distinct account name portion of the system account strings, excluding the performance group designation. Accounts can be employed wherever a user object can be specified.

Acquisition Lock: A lock that is a flag in the table header that effectively rejects certain types of Teradata SQL access statements. An acquisition lock allows all concurrent DML access and the DROP DDL statement, and rejects DDL statements other than DROP.
Acquisition Phase: Responsible for populating the primary data subtables of the work tables. Data are received from the host, converted into internal format, and inserted into the work tables. The work tables are sorted at the end of the Acquisition Phase, prior to the Application Phase.

action definition: A logical action consisting of a single physical action and related attributes.

active data warehouse (ADW): An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships quickly, efficiently, and proactively. Active data warehousing is about integrating advanced decision support with day-to-day, even minute-to-minute, decision making that increases quality, encourages customer loyalty, and thus secures an organization's bottom line. The market is maturing as it progresses from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations.

Active Database: Active database systems integrate event-based rule processing with traditional database functionality. The behavior of the database is achieved through a set of Event-Condition-Action rules associated with the database. When an event is detected, the relevant rules fire. Firing of a rule implies evaluating a condition on the database and carrying out the corresponding action. An active database system derives its power from the variety of events it can respond to and the kinds of actions it can perform in response.

Ad Hoc Query: Any query that cannot be determined prior to the moment the query is issued.

administrator: A special user responsible for allocating resources to a community of users.

Aggregation: Used in the broad sense to mean aggregating data horizontally, vertically, and chronologically.
all joins: In Teradata SQL, a join is a SELECT operation that allows you to combine columns and rows from two or more tables to produce a result. Join types restricted by DWM are: inner join, outer join, merge join, product join, and all joins. All joins are a combination of the above types, depending on how the user selects the information to be returned. In addition to the four types listed above, selecting all joins may include an exclusion join, nested join, and RowID join.

allocation group (AG): A set of parameters that determines the amount of resources available to the sessions assigned to a PG referencing a specific AG. Has an assigned weight that is compared to other AG weights. An AG can limit the total amount of CPU used by sessions under its control.

AMP: Access Module Processor (UNIX-based systems), a type of virtual processor (vproc) that controls the management of the Teradata Database and the disk subsystem, with each AMP being assigned to a virtual disk (vdisk). For more information, see the Introduction to Teradata Warehouse.

AMP worker task (AWT): Processes (threads on some platforms) dedicated to servicing Teradata Database work requests. For each AMP vproc, a fixed number of AWTs are preallocated during Teradata Database initialization. Each AWT looks for a work request to arrive in the Teradata Database, services the request, and then looks for another. An AWT can process requests of any work type. Each Teradata Database query is composed of a series of work requests that are performed by AWTs. Each work request is assigned a work type indicating when the request should be executed relative to other work requests waiting to execute.

Analytical Data Store: Useful in making strategic decisions, this data storage area maintains summarized or historical data. This stored data is time variant, unlike operational systems, which contain real-time data.
Information contained in this data store is determined and collected based on the corporate business rules.

ANSI: American National Standards Institute. ANSI maintains a standard for SQL. For information about Teradata compliance with ANSI SQL, see the SQL Reference: Fundamentals.

AP: Application Processor

APE: Alert Policy Editor. Use this Teradata Manager component to define alert policies: create actions, set event thresholds, assign actions to events, and apply the policy to the Teradata Database.

APH: Alternate Parcel Header.

Application Lock: A flag set in the table header of a target table indicating that the Application Phase is in progress. An application lock allows all concurrent access lock select access and the DROP DDL statement, and rejects all other DML and DDL statements.

Application Lifecycle: Includes the following three stages:
• process and change management
• analysis and design
• construction and testing

Application Phase: Responsible for turning rows from a work table into updates, deletes, and inserts and applying them to a single target table.

APRC: Application Processor Reset Containment

API: Application Program Interface. An interface (calling conventions) by which an application program accesses an operating system and other services. An API is defined at source code level and provides a level of abstraction between the application and the kernel (or other privileged utilities) to ensure the portability of the code. An API can also provide an interface between a high-level language and lower-level utilities and services written without consideration for the calling conventions supported by compiled languages. In this case, the API may translate the parameter lists from one format to another and interpret call-by-value and call-by-reference arguments in one or both directions.

Architecture: A definition and preliminary design which describes the components of a solution and their interactions.
An architecture is the blueprint by which implementers construct a solution which meets the users' needs.

ARCMAIN: ARC executable that extracts (or inserts) database headers and data rows from the HUT (Host UTility) archive interface.

ASCII: American Standard Code for Information Interchange, a character set used primarily on personal computers.

Availability: A measure of the percentage of time that a computer system is capable of supporting a user request. A system may be considered unavailable as a result of events such as system failures or unplanned application outages.

B

B Tree: An indexing technique in which pointers to data are kept in a structure such that all referenced data is equally accessible in an equal time frame.

BAR: Backup and restore; also referred to as Backup/Archive/Restore; a software and hardware product set.

BLOB: An acronym for binary large object. A BLOB is a large database object that can be anything that doesn't require character set conversion. This includes MIDI, MP3, PDF, graphics, and much more. BLOBs can be up to 2 GB in size.

BTEQ: Basic Teradata Query facility. A utility that allows users on a workstation to access data on a Teradata Database, and format reports for both print and screen output.

Business-Driven: An approach to identifying the data needed to support business activities, acquiring or capturing those data, and maintaining them in a data resource that is readily available.

bypass objects: Specific users, groups, and accounts can be set up to circumvent DWM query management by declaring them to be bypassed. Basically, this turns off the DWM query-checking mechanism for all of the requests issued by those users and/or using those accounts.

C

Call-Level Interface Version 2 (CLIv2): A collection of callable service routines that provide an interface to the Teradata Database.
Specifically, CLI is the interface between the application program and the Micro Teradata Directory Program (for network-attached clients). CLI builds parcels that MTDP packages for sending to the Teradata Database using the Micro Operating System Interface (for network-attached clients), and provides the application with a pointer to each of the parcels returned from the Teradata Database.

Capture: The process of capturing a production data source.

cardinality: In set theory, cardinality refers to the number of members in the set. When specifically applied to database theory, the cardinality of a table refers to the number of rows contained in a table.

CASE: Computer Aided Software Engineering.

Change Data Capture: The process of capturing changes made to a production data source. Change data capture is typically performed by reading the source DBMS log. It consolidates units of work, ensures data is synchronized with the original source, and reduces data volume in a data warehousing environment.

channel-attached: A mainframe computer that communicates with a server (for example, a Teradata RDBMS) through a channel driver.

Character Set: A grouping of alphanumeric and special characters used by computer systems to support different user languages and applications. Various character sets have been codified by the American National Standards Institute (ANSI).

Checkpoint Rate: The interval between checkpoint operations during the Acquisition Phase of a MultiLoad import task, expressed as either the number of rows read from your client system or sent to the Teradata Database, or an amount of time, in minutes.

CICS: Customer Information Control System

CLI: Call-Level Interface. The interface between the application program and the MTDP (for network-attached clients) or TDP (for channel-attached clients). CLIv2 refers to version two of the interface.

Client: A computer that can access the Teradata Database.
CLIv2: Call-Level Interface Version 2. The interface between the application program and the MTDP (for network-attached clients) or TDP (for channel-attached clients).

CLIv2so: Call-Level Interface Version 2 Shared Object (CLIv2so); this program installs the CLI libraries required by other utilities. When the CLIv2so program submits a request to a Teradata Database, CLI Library components transform the request into Teradata Database formats. The CLI Library sends requests to, and receives responses from, the Teradata Database over a network.

client-server environment: The distribution of work on a LAN in which the processing of an application is divided between a front-end client and a back-end server, resulting in faster, more efficient processing. The server performs shared functions such as managing communication and providing database services. The client performs individual user functions such as providing customized interfaces, performing screen-to-screen navigation, and offering help functions.

CMS: Conventional Monitor System

CLOB: An acronym for character large object. A CLOB is a pure character-based large object in a database. It can be a large text, HTML, RTF, or other character-based file. CLOBs can be up to 2 GB in size. Also see BLOB and LOB.

Cluster: Logical, table-level archive whereby only those rows residing on specific AMPs, and which are members of the specified cluster, are archived onto a single tape data set. This allows multiple jobs to be applied for backup of large tables, to reduce the backup window. This method is used to effect a parallel archive/restore operation via a "divide and conquer" backup strategy.

COBOL: Common Business-Oriented Language

Coexistence System: A Teradata system running on mixed platforms

column: In the relational model of Teradata SQL, databases consist of one or more tables.
In turn, each table consists of fields, organized into one or more columns by zero or more rows. All of the fields of a given column share the same attributes.

COP: Communications Processor. One kind of interface processor (IFP) on the Teradata Database. A COP contains a gateway process for communicating with workstations via a network.

COP Interface: Workstation-resident software and hardware, and Teradata Database-resident software and hardware, that allows workstations and the Teradata Database to communicate over networks.

CPU: Central processing unit.

D

DASD: Direct access storage device (pronounced DAZ-dee). A general term for magnetic disk storage devices that has historically been used in the mainframe and minicomputer (midrange computer) environments. When used, it may also include hard disk drives for personal computers. A recent form of DASD is the redundant array of independent disks (RAID). The "direct access" means that all data can be accessed directly in about the same amount of time, rather than having to progress sequentially through the data.

database: A related set of tables that share a common space allocation and owner. A collection of objects that provides a logical grouping for information. The objects include tables, views, macros, triggers, and stored procedures.

Data Cardinality: Cardinality is a property of data elements which indicates the number of allowable entries in the element. A data element such as gender allows only two entries (male or female) and is said to possess low cardinality. Data elements for which many allowable entries are possible, such as age or income, are said to have high cardinality.

Data Definition Language (DDL): In Teradata SQL, the statements and facilities that manipulate database structures (such as CREATE, MODIFY, DROP, GRANT, REVOKE, and GIVE) and the Data Dictionary information kept about those structures.
In the typical, prerelational data management system, data definition and data manipulation facilities are separated, and the data definition facilities are less flexible and more difficult to use than in a relational system.

Data Dictionary: In the Teradata Database, the information automatically maintained about all tables, views, macros, databases, and users known to the Teradata Database system, including information about ownership, space allocation, accounting, and access right relationships between those objects. Data Dictionary information is updated automatically during the processing of Teradata SQL data definition statements, and is used by the parser to obtain information needed to process all Teradata SQL statements.

data loading: The process of loading data from a client platform to a Teradata RDBMS server. For TPump, data loading includes any combination of INSERT, UPDATE, DELETE, and/or UPSERT operations.

data manipulation: In Teradata SQL, the statements and facilities that change the information content of the database. These statements include INSERT, UPDATE, and DELETE.

Data Mart: A type of data warehouse designed to meet the needs of a specific group of users, such as a single department or part of an organization. Typically a data mart focuses on a single subject area such as sales data. Data marts may or may not be designed to fit into a broader enterprise data warehouse design.

Data Mining: A process of analyzing large amounts of data to identify hidden relationships, patterns, and associations.

Data Model: A logical map that represents the inherent properties of the data independent of software, hardware, or machine performance considerations. The model shows data elements grouped into records, as well as the associations around those records.

Data Synchronization: The process of identifying active data replicates and ensuring that data concurrency is maintained.
Also known as data version synchronization or data version concurrency, because all replicated data values are consistent with the same version as the official data.

Data Scrubbing: The process of filtering, merging, decoding, and translating source data to create validated data for the data warehouse.

data streams: Buffers in memory for temporarily holding data. A data stream is not a physical file; instead, it is more like a pipe (in UNIX or Windows), or a batch pipe in MVS.

Data Warehouse: A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process. A repository of consistent historical data that can be easily accessed and manipulated for decision support.

DB2: IBM DATABASE 2

DBA: Database Administrator

DBQL: Database Query Log. DBQL is a series of system tables created in the DBC database during the Teradata Database installation process. They are used to track query processing. See Database Administration to learn more about the DBQL.

DD: Data dictionary or data definition.

DDL: Data definition language, which supports manipulating database structures and the Data Dictionary information kept about these structures.

DDL operator: The DDL operator is a stand-alone operator that allows you to perform any necessary database routines prior to a load/apply job without having to use another utility such as BTEQ. For example, you can create tables or indexes, or drop tables, as needed, before starting a load/apply job. As a stand-alone operator, supporting only one instance, the DDL operator does not send or retrieve data to or from a Teradata TPump operator interface.

DEFINE Statement: A statement preceding the INSERT statement that describes the fields in a record before the record is inserted in the table. This statement is similar to the SQL USING clause.
Delete Task: A task that uses a full file scan to remove a large number of rows from a single Teradata Database table. A delete task is composed of three major phases: Preliminary, Application, and End. The phases are a collection of one or more transactions that are processed in a predefined order according to the Teradata MultiLoad protocol.

delimiter: In Teradata SQL, a punctuation mark or other special symbol that separates one clause in a Teradata SQL statement from another, or that separates one Teradata SQL statement from another.

DIT: Directory Information Tree. A graphical display of an organization's directory structure, sites, and servers, shown as a branching structure. The top-level (root) directory usually represents the organization level.

DLL: Dynamic-link library. A feature of the Windows family of operating systems that allows executable routines to be stored separately as files with .dll extensions and to be loaded only when needed by a program.

DML: Data manipulation language. In Teradata SQL, the statements and facilities that manipulate or change the information content of the database. These statements include SELECT, INSERT, UPDATE, and DELETE.

domain name: A group of computers whose host names (the unique name by which a computer is known on a network) share a common suffix, that is, the domain name.

Drill down: A method of exploring detailed data that was used in creating a summary level of data.

DSN: Digital Switched Network. The completely digital version of the PSTN.

Dual Active System: A dual active system is comprised of two active database systems that operate in tandem and serve the needs of both the production and development environments. Dual active systems virtually eliminate all down time and provide seamless disaster recovery protection for critical users and applications.

Duplicate Row Check: Logic within the Teradata Database used to check for duplicate rows while processing each primary data row for INSERTs and UPDATEs.
DWM: Dynamic Workload Manager. The product described in this document, which manages access to the Teradata Database.

E

EBCDIC: Extended binary coded decimal interchange code. An IBM code that uses 8 bits to represent 256 possible characters. It is used primarily in IBM mainframes, whereas personal computers use ASCII.

E-CLI: Extended Call-Level Interface

Error Tables: Tables created during the Preliminary Phase, used to store errors detected while processing a Teradata MultiLoad job. There are two error tables, ET and UV, that contain errors found during the Acquisition Phase and Application Phase, respectively.

EOF: End of File

ETL: Extract, transform, and load

EUC: Extended UNIX Code. Extended UNIX Code (EUC) for Japanese and Traditional Chinese defines a set of encoding rules that can support from 1 to 4 character sets.

exclusion join: In Teradata SQL, a product join or merge join where only the rows that do not satisfy (are NOT in) the conditional specified in the SELECT are joined.

Exclusive Lock: Supports the manual recovery procedure when a RELEASE MLOAD statement is executed after a Teradata MultiLoad task has been suspended or aborted.

execution time frame: A period of time when DWM can execute scheduled requests that are waiting to run.

Extract: The process of copying a subset of data from a source to a target environment.

Exit Routines: Specify a predefined action to be performed whenever certain significant events occur during a Teradata MultiLoad job.

F

Failover: Failover is when Teradata QD switches from one connected system to another when an error occurs. Many factors affect how failover occurs.

failure: Any condition that precludes complete processing of a Teradata SQL statement. Any failure will abort the current transaction.

FastExport: Teradata FastExport utility. A program that quickly transfers large amounts of data from tables and views of the Teradata Database to a client-based application.
FastLoad: Teradata FastLoad utility. A program that loads empty tables on the Teradata Database with data from a network-attached or channel-attached client.
field: The basic unit of information stored in the Teradata Database. A field is either null, or has a single numeric or string value. See also column, database, row, table.
FIFO: First-in, first-out queue.
FIPS: Federal Information Processing Standards.
filter operator: A type of operator that performs filtering on data en route from other operators.
Flat File: As a noun, an ASCII text file consisting of records of a single type, in which there is no embedded structure information governing relationships between records. As an adjective, describes a flattened representation of a database as a single file from which the structure could implicitly be rebuilt. Also, a particular type of database structure, as opposed to relational.
Foreign Key: The primary key of a parent data subject that is placed in a subordinate data subject. Its value identifies the data occurrence in the parent data subject that is the parent of the data occurrence in the subordinate data subject.
Formatted Records: See Records.
Function: User-Defined Functions (UDFs) are extensions to Teradata SQL. Users can write UDFs to analyze and transform data already stored in their data warehouse in ways that are beyond the functionality of Teradata's native functions.
G
Gateway: A device that connects networks having different protocols.
global rule: Object Access and Query Resource rules can be specified as being global, that is, they apply to all objects, and therefore to all requests. When a rule is specified as being global, no query objects need be (or can be) associated with the rule because all objects are implicitly included. Take care when defining a global access rule, as it causes all requests to be rejected except those from the DBC user and any bypassed objects.
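The FIFO entry above can be illustrated in a few lines of generic Python (unrelated to any Teradata component): items leave the queue in exactly the order they arrived.

```python
from collections import deque

queue = deque()
for item in ("first", "second", "third"):
    queue.append(item)  # enqueue at the tail

# Dequeue from the head: first in, first out.
served = [queue.popleft() for _ in range(len(queue))]
assert served == ["first", "second", "third"]
```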
Globally Distributed Objects (GDO): A data structure that is shared by all of the virtual processors in the Teradata Database system configuration.
graphical user interface (GUI): The use of pictures rather than just words to represent the input and output of a program. A program with a GUI runs under a windowing operating system. The GUI displays icons, buttons, and dialog boxes in its windows on the screen, and the user controls it by moving a pointer on the screen (typically controlled by a mouse) and selecting objects by pressing buttons on the mouse. This contrasts with a command line interface, where communication is by exchange of strings of text.
GSS: Generic Security Services. An application-level interface (API) to system security services. It provides a generic interface to services that may be provided by a variety of different security mechanisms. Vanilla GSS-API supports security contexts between two entities (known as "principals").
H
heuristics: Statistics recommendations, based on general rules of thumb.
HOSI: Acronym for hash-ordered secondary index.
I
IPT: I/Os Per Transaction.
import: The process of pulling system information into a program; to add system information from an external source to another system. The system receiving the data must support the internal format or structure of the data.
Import Task: A task that quickly applies large amounts of client data to one or more tables or views on the Teradata Database. Composed of four major phases: Preliminary, Acquisition, Application, and End. The phases are a collection of one or more transactions that are processed in a predefined order according to the Teradata MultiLoad protocol. An import task references up to five target tables.
In-Doubt: A transaction that was in process on two or more independent computer processing systems when an interruption of service occurred on one or more of the systems.
The transaction is said to be in doubt because it is not known whether the transaction was successfully processed on all of the systems.
Information engineering: The discipline for identifying information needs and developing information systems that produce messages that provide information to a recipient. Information engineering is a filtering process that reduces masses of data to a message that provides information.
INMOD: INput MODule, a program that administrators can develop to select, validate, and preprocess input data.
INMOD Routine: User-written routines that Teradata MultiLoad and other load/export utilities use to provide enhanced processing functions on input records before they are sent to the Teradata Database. Routines can be written in C (for network-attached platforms), or SAS/C, COBOL, PL/I, or Assembler (for channel-attached platforms). A routine can read and preprocess records from a file, generate data records, read data from other database systems, validate data records, and convert data record fields.
inner join: In Teradata SQL, a join operation on two or more tables, according to a join condition, that returns the qualifying rows from each table.
instance: In object-oriented programming, refers to the relationship between an object and its class; the object is an instance of the class. In Teradata Parallel Transporter (Teradata PT), an instance is an occurrence of a fully defined Teradata PT operator, with its source and target data flows, number of sessions, and so on. Teradata PT can process multiple instances of operators.
interface processor (IFP): Used to manage the dialog between the Teradata Database and the host. Its components consist of session control, client interface, the parser, the dispatcher, and the BYNET. One type of IFP is a communications processor (COP). A COP contains a gateway process for communicating with workstations via a network.
Intermediary: A computer software process, written by a third party, that interfaces to one or more Teradata servers and initiates a change data capture or change data apply operation with replication services.
internet protocol (IP): Data transmission standard; the standard that controls the routing and structure of data transmitted over the Internet.
interval histogram: Interval histograms are a form of synopsis data structure. A synopsis data structure is a data structure that is substantially smaller than the base data it represents. Interval histograms provide a useful statistical profile of attribute values that characterizes the properties of that raw data. The Teradata Database uses interval histograms to represent the cardinalities and certain other statistical values and demographics of columns and indexes for all-AMPs sampled statistics and for full-table statistics. Each histogram is composed of a maximum of 100 intervals.
I/O: Input/output.
ISO: International Standards Organization.
J
JES: Job Entry Subsystem (JES) is an MVS subsystem of the OS/390 and z/OS mainframe operating systems that manages jobs (units of work) that execute on the system. Each job is described to the operating system by system administrators or other users in job control language (JCL). There are two versions, JES2 and JES3. JES3 allows central control of the processing of jobs using a common work queue. Both OS/390 and z/OS provide an interactive menu for initiating and managing jobs.
JCL: Job Control Language is a language for describing jobs (units of work) to the OS/390, z/OS, and VSE operating systems, which run on IBM's OS/390 and z800/900 large server (mainframe) computers. These operating systems allocate their time and space resources among the total number of jobs that have been started in the computer. Jobs in turn break down into job steps. All the statements required to run a particular program constitute a job step.
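For intuition only, the 100-interval idea from the interval histogram entry above can be sketched as a toy equal-width histogram. The Teradata Database's actual histograms use more sophisticated interval boundaries and per-interval statistics; this sketch only shows why a histogram is a synopsis, i.e. far smaller than the base data.

```python
def interval_histogram(values, max_intervals=100):
    """Toy synopsis structure: bucket counts over equal-width intervals.
    Much smaller than the base data, yet it profiles the value distribution."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / max_intervals or 1  # avoid zero width for constant data
    counts = [0] * max_intervals
    for v in values:
        idx = min(int((v - lo) / width), max_intervals - 1)
        counts[idx] += 1
    return counts

# 100 rows summarized as just 10 interval cardinalities:
assert interval_histogram(list(range(100)), max_intervals=10) == [10] * 10
```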
Jobs are background (sometimes called batch) units of work that run without requiring user interaction (for example, print jobs). In addition, the operating system manages interactive (foreground) user requests that initiate units of work. In general, foreground work is given priority over background work.
JIS: Japanese Industrial Standards specify the standards used for industrial activities in Japan. The standardization process is coordinated by the Japanese Industrial Standards Committee and published through the Japanese Standards Association.
Job Script: A job script, or program, is a set of MultiLoad commands and Teradata SQL statements that make changes to specified target tables and views in the Teradata Database. These changes can include inserting new rows, updating the contents of existing rows, and deleting existing rows.
join: A select operation that combines information from two or more tables to produce a result.
L
LAN: Local Area Network. LANs supported by Teradata products must conform to the IEEE 802.3 standard (Ethernet LAN).
Least Used: Least used (-lu) is a command line parameter that tells Teradata QD to route queries to the least used database.
Load operator: A consumer-type operator that emulates some of the functions of the FastLoad utility in the Teradata PT infrastructure.
LOB: An acronym for large object. A large object is a database object that is large in size. LOBs can be up to 2 gigabytes. There are two types of LOBs: CLOBs, which are character-based objects, and BLOBs, which are binary-based objects.
Locks: Teradata FastLoad automatically locks any table being loaded and frees the lock only after an END LOADING statement is entered. Therefore, access to a table is available when Teradata FastLoad completes.
log: A record of events; a file that records events. Many programs produce log files. Often you will look at a log file to determine what is happening when problems occur.
Log files have the extension ".log".
log stream: A series of log messages defined in one message catalog and initiated from one originator. One originator may initiate several log streams (for example, if there are multiple operators in one originator).
logical action: A named action that is defined on the Alert Policy Editor's Actions tab. Logical actions can be assigned to events in the alert policy.
Logical Data Model: A data model that represents the normalized design of data needed to support an information system. Data are drawn from the common data model and normalized to support the design of a specific information system. The logical data model is the actual implementation of a conceptual data model in a database; it may take multiple logical data models to implement one conceptual data model.
loner value: A value that has a frequency greater than the total number of table rows divided by the maximum interval times 2.
M
MAPI: Messaging Application Programming Interface. A set of Microsoft-defined functions and interfaces that support e-mail capabilities.
macro: A file that is created and stored on the Teradata RDBMS, and is executed in response to a Teradata SQL EXECUTE statement.
merge join: In Teradata SQL, the type of join that occurs when the WHERE conditional of a SELECT statement causes the system first to sort the rows of two tables based on a join field (specified in the statement), then traverse the result while performing a merge/match process.
Metadata: Data about data. For example, information about where the data is stored, who is responsible for maintaining the data, and how often the data is refreshed.
methods: In object-oriented programming, methods are the programming routines by which objects are manipulated.
NFS: Network file system.
MIB: Management Information Base.
MOSI: Micro Operating System Interface.
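The merge join entry above describes a sort-then-merge/match process. A minimal Python sketch of that algorithm follows; it is illustrative only, not Teradata's implementation.

```python
def merge_join(left, right, key):
    """Sketch of a sort-merge join: sort both inputs on the join key,
    then advance through them in step, emitting pairs with equal keys."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Match every right row sharing this key with the current left row.
            j2 = j
            while j2 < len(right) and key(right[j2]) == kl:
                out.append((left[i], right[j2]))
                j2 += 1
            i += 1
    return out

emp = [(1, "a"), (2, "b"), (2, "c")]
dept = [(2, "X"), (3, "Y"), (1, "Z")]
assert merge_join(emp, dept, key=lambda r: r[0]) == [
    ((1, "a"), (1, "Z")), ((2, "b"), (2, "X")), ((2, "c"), (2, "X"))]
```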
A library of routines that implement operating system-dependent and protocol-dependent operations on the workstation.
MTDP: Micro Teradata Director Program. A library of routines that implement the session layer on the workstation. MTDP is the interface between CLI and the Teradata Database.
MPP: Massively Parallel Processing.
multi-threading: An option that enables you to speed up your export and import operations with multiple connections.
MultiLoad: Teradata MultiLoad utility. A command-driven utility that performs fast, high-volume maintenance functions on multiple tables and views of the Teradata Database.
Multiset Tables: Tables that allow duplicate rows.
MVS (Multiple Virtual Storage): One of the primary operating systems for large IBM computers.
N
name: A word supplied by the user that refers to an object, such as a column, database, macro, table, user, or view.
nested join: In Teradata SQL, this join occurs when the user specifies a field that is a unique primary index on one table and which is in itself an index (unique/non-unique primary or secondary) to the second table.
Network: In the context of the Teradata Database, a LAN (see LAN).
network attached: A computer that communicates over the LAN with a server (for example, a Teradata RDBMS).
NIC: Network Interface Card.
NO REWIND: A tape device definition that prevents a rewind operation at either file open or file close. NO REWIND allows a program to access multiple files on a tape by leaving the tape positioned at the end of the current file at close, thus allowing the subsequent file to be easily accessed by the next open.
notify exit: A user-defined exit routine that specifies a predefined action to be performed whenever certain significant events occur during a TPump job.
For example, by writing an exit in C (without using CLIv2) and using the NotifyExit attribute in an operator definition, you can provide a routine to detect whether a TPump job succeeds or fails, how many records were loaded, what the return code is for a failed job, and so on.
null: The absence of a value for a field.
Nullif Option: This option allows the user to null a column in a table under certain conditions; it is only used in conjunction with DEFINE statements.
NUPI: Non-unique primary index; an NUPI is typically assigned to minor entities in the database.
NUSI: Non-unique secondary index; an NUSI is efficient for range query access, while a unique secondary index (USI) is efficient for accessing a single value.
O
object: In object-oriented programming, a unique instance of a data structure defined according to the template provided by its class. Each object has its own values for the variables belonging to its class and can respond to the messages, or methods, defined by its class.
object access rule: An Object Access filter allows you to define the criteria for limiting access to issuing objects and/or query objects. Queries that reference objects associated with the rule (either individually or in combination) during the specified dates and times are rejected. Global rules are not applicable for this type.
object definition: The details of the structure and instances of the objects used by a given query. Object definitions are used to create the tables, views, macros, triggers, join indexes, and stored procedures in a database.
ODBC: (Open Database Connectivity) Under ODBC, drivers are used to connect applications with databases. The ODBC driver processes ODBC calls from an application, but passes SQL requests to the Teradata Database for processing.
ODBC operator: A producer-type operator that enables universal open data access to many ODBC-compliant data sources, including Oracle, SQL Server, DB2, and so on. The ODBC operator runs on all Teradata PT supported platforms. It reads data close to the sources, and then feeds the data directly to the Teradata Database without the need of an intermediate staging platform.
OLTP: (On-Line Transaction Processing) Processing that supports the daily business operations. Also known as operational processing.
operator routine: In object-oriented programming, refers to a function that implements a method. The terms operator routine and operator function may be used interchangeably.
OS/VS: Operating System/Virtual Storage.
OTB: Open Teradata Backup; a product set consisting of OTB-Veritas, OTB-BakBone, and others; Teradata backup products for MP-RAS/UNIX, NT, and Windows 2000 platforms.
outer join: In Teradata SQL, an extension of an inner join operation. In addition to returning qualifying rows from tables joined according to a join condition (the inner join), an outer join returns non-matching rows from one or both of its tables. Multiple tables are joined two at a time.
owner: In Teradata SQL, the user who has the ability to grant or revoke all access rights on a database to and from other users. By default, the creator of the database is the owner, but ownership can be transferred from one user to another by the GIVE statement.
P
parameter: A variable name in a macro for which an argument value is substituted when the macro is executed.
parser: A program executing in a PE that translates Teradata SQL statements entered by a user into the steps that accomplish the user's intentions.
parsing engine (PE): An instance (virtual processor) of the database management session control, parsing, and dispatching processes and their data context (caches).
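To make the outer join entry above concrete, here is a small Python sketch of a left outer join: every left row appears in the result, and non-matching left rows are padded with None. This illustrates the semantics only, not Teradata's join algorithms.

```python
def left_outer_join(left, right, key):
    """Left outer join sketch: keep all left rows; pad unmatched ones with None."""
    index = {}
    for r in right:
        index.setdefault(key(r), []).append(r)
    out = []
    for l in left:
        matches = index.get(key(l))
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))  # the non-matching row is still returned
    return out

rows = left_outer_join([(1, "a"), (4, "d")], [(1, "x")], key=lambda r: r[0])
assert rows == [((1, "a"), (1, "x")), ((4, "d"), None)]
```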
Paused MultiLoad Job: A job that was halted, before completing, during the Acquisition Phase of the Teradata MultiLoad operation. The paused condition can be intentional, or the result of a system failure or error condition.
PDE: Parallel Database Extensions.
peak perm: Highest amount of permanent disk space, in bytes, used by a table.
performance groups: A performance group is a collection of parameters used to control and prioritize resource allocation for a particular set of Teradata Database sessions within the Priority Scheduler. Every Teradata Database session is assigned to a performance group during the logon process. Performance groups are the primary consideration in partitioning the working capacity of the Teradata Database. To learn more about performance groups, see the Priority Scheduler section of Utilities.
performance periods: A threshold or limit value that determines when a session is under the control of that performance period. A performance period links PGs/Teradata Database sessions under its control to an AG that defines a scheduling strategy. A performance period allows you to change AG assignments based on time-of-day or resource usage.
Physical Data Model: A data model that represents the denormalized physical implementation of data that support an information system. The logical data model is denormalized to a physical data model according to specific criteria that do not compromise the logical data model but allow the database to operate efficiently in a specific operating environment.
Pipeclient: A command line program used to send commands to Teradata Query Director. The program uses named pipes formatting.
Primary server: A Teradata server in which client applications execute transactions through use of Teradata SQL or utilities such as Teradata MultiLoad and update the tables of one or more replication groups. The changes are captured by replication services and given to an intermediary connected to the server.
priority definition set: A collection of data that includes the resource partition, performance group, allocation group, performance period type, and other definitions that control how the Priority Scheduler manages and schedules session execution.
product join: In Teradata SQL, the type of join that occurs when the WHERE conditional of a SELECT statement causes the Teradata Database system to compare all qualifying rows from one table to all qualifying rows from the other table. Because each row of one table is compared to each row of another table, this join can be costly in terms of system performance. Note that product joins without an overall WHERE constraint are considered unconstrained (Cartesian). If the tables to be joined are small, the effect of an unconstrained join on performance may be negligible, but if they are large, there may be a severe negative effect on system performance.
profiles: A profile is a set of parameters you assign to a user, group of users, or an account that determines what scheduling capabilities are available and how your Teradata Query Scheduler scheduled requests server handles their scheduled requests.
physical action: A basic action type, such as <Send a Page>, <Send an E-Mail>, and so on. Physical actions must be encapsulated by logical actions in order to be used in the alert policy.
PIC: Position independent code.
PL/I: Programming Language/1, a programming language supported for MultiLoad development.
PMPC: Performance Monitor and Production Controls.
PP2: Preprocessor2.
PPP: Point-to-Point Protocol.
Primary Key: A set of one or more data characteristics whose value uniquely identifies each data occurrence in a data subject. A primary key is also known as a unique identifier.
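The cost remark in the product join entry above follows directly from counting comparisons: an unconstrained (Cartesian) join of n and m rows performs n * m comparisons. A quick generic Python illustration:

```python
from itertools import product

# Every qualifying row of one table is compared with every qualifying
# row of the other, so comparison count grows multiplicatively.
t1 = [f"r{i}" for i in range(50)]
t2 = [f"s{j}" for j in range(40)]
comparisons = sum(1 for _ in product(t1, t2))
assert comparisons == len(t1) * len(t2) == 2000
```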
privilege: A user's right to perform the Teradata SQL statements granted to that user against a table, database, user, macro, or view. Also known as access right.
procedure: Short name for Teradata stored procedure. Teradata provides Stored Procedural Language (SPL) to create stored procedures. A stored procedure contains SQL to access data from within Teradata and SPL to control the execution of the SQL.
producer: A type of operator that retrieves data from an external data store, such as a file, a Teradata Database table, and so on, and provides it to other operators. A producer operator produces the data into the data stream's buffer.
production system: A database used in a live environment; a system that is actively used for day-to-day business operations. This differs from a test or development system that is used to create new queries or test new features before using them on the production system.
Protocol: The rules for the format, sequence, and relative timing of messages exchanged on a network.
Q
query analysis: A feature that estimates the answer set size (number of rows) and processing time of a SELECT type query.
Query Capture Database (QCD): A database of relational tables that store the steps of any query plan captured by the Query Capture Facility (QCF).
Query Capture Facility (QCF): Provides a method to capture and store the steps from any query plan in a set of predefined relational tables called the Query Capture Database (QCD).
query: A Teradata SQL statement, particularly a SELECT statement.
Query Director: Teradata Query Director. A Teradata client application used to balance sessions between systems according to user-provided algorithms.
query management: The primary function of DWM is to manage logons and queries. This feature examines logon and query requests before they are dispatched for execution within the Teradata Database, and may reject logons, and may reject or delay queries.
It does this by comparing the objects referenced in the requests to the types of DBA-defined rules.
Query Resource filter: A Query Resource filter allows you to define the criteria for limiting resource usage associated with queries. You can define resource criteria such as:
• Row count
• Processing time
• No joins permitted
• No full table scans permitted
Queries that are estimated to meet or exceed the limits for the rule during the specified dates and times are rejected. You may define global rules for this type.
Query Session Utility: A separate utility program used to monitor the progress of your Teradata MultiLoad job. It reports different sets of status information for each phase of your job.
R
random AMP sample (RAS): An arbitrary sample from an Access Module Processor (AMP). These are samples of the tables in a query or all of the tables in a given database. Also known as RAS.
RDBMS (Relational Database Management System): A database management system in which complex data structures are represented as simple two-dimensional tables consisting of columns and rows.
Records: When using the Teradata MultiLoad utility, both formatted and unformatted records are accepted for loading. A formatted record, in the Teradata Database world, consists of a record created by a Teradata Database utility, such as BTEQ, where the record is packaged with begin- and end-record bytes specific to the Teradata Database. Unformatted records are any records not originating on a Teradata Database, such as Lotus 1-2-3 files. These files contain records that must be defined before loading onto the Teradata Database.
recursive query: A named query expression that is allowed to reference itself in its own definition, giving the user a simple way to specify a search of a table using iterative self-join and set operations. Use a recursive query to query hierarchies of data.
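The limit-checking behavior described in the Query Resource filter entry above can be sketched in a few lines. The field names here are purely illustrative and do not reflect DWM's actual rule schema; the point is only that a query whose estimates meet or exceed any configured limit is rejected.

```python
INF = float("inf")

def passes_resource_filter(estimate, limits):
    """Toy rule check: reject when any estimate meets or exceeds its limit.
    Keys like 'rows' and 'max_rows' are hypothetical, not DWM's schema."""
    if estimate.get("rows", 0) >= limits.get("max_rows", INF):
        return False
    if estimate.get("seconds", 0) >= limits.get("max_seconds", INF):
        return False
    if limits.get("no_full_table_scans") and estimate.get("full_table_scan"):
        return False
    return True

assert passes_resource_filter({"rows": 10, "seconds": 1}, {"max_rows": 100})
assert not passes_resource_filter({"rows": 100, "seconds": 1}, {"max_rows": 100})
```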
Hierarchical data could be organizational structures such as department and sub-department, forums of discussions such as posting, response, and response to response, bills of materials, and document hierarchies.
Replication Group: A set of tables for which either data changes are being captured on a primary server or applied on a subscriber server.
Replication Services: A set of software functions implemented in the Teradata server that interact with an intermediary to capture or apply change data to the tables of a replication group.
request: In host software, a message sent from an application program to the Teradata Database.
resource partition: A collection of prioritized PGs related by their users' associations. Has an assigned weight that determines the proportion of resources available to that partition relative to the other partitions defined for that Teradata Database.
Restart Log Table: One of four restart tables the Teradata MultiLoad utility creates that are required for restarting a paused Teradata MultiLoad job.
Restoration Lock: A flag set in the table header of a target table indicating that the table was aborted during the Application Phase and is now ready to be restored. A limited set of operations can be done on the table: Delete All, Drop Fallback, Drop Index, Drop Table, and Select with access lock. No Teradata MultiLoad restart will be allowed on a table with a Restoration Lock.
result: The information returned to the user to satisfy a request made of the Teradata Database.
results table/file: In the Schedule Request environment, a results table or file is a database table or a Windows file in which result data for a scheduled request that is not self-contained are stored.
results file storage: A symbolic name for a root directory where scheduled request results are stored. You map a file storage location to a Windows root directory where results are stored.
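The iterative self-join idea behind the recursive query entry above can be sketched generically in Python: repeatedly join the newly found rows against a (parent, child) edge set until no new rows appear. This sketches the semantics, not Teradata's execution plan; the edge data is hypothetical.

```python
def expand_hierarchy(edges, root):
    """Return every descendant of `root` by iterated self-join over
    (parent, child) pairs; the set subtraction makes cycles safe."""
    result, frontier = set(), {root}
    while frontier:
        nxt = {c for (p, c) in edges if p in frontier} - result
        result |= nxt
        frontier = nxt
    return result

# Department / sub-department example, as in the entry above:
edges = {("eng", "db"), ("eng", "ui"), ("db", "opt")}
assert expand_hierarchy(edges, "eng") == {"db", "ui", "opt"}
```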
RowID join: In Teradata SQL, this join occurs when one of the join tables has a non-unique primary index constant, and another column of that table matches weakly with a non-unique secondary index column of the second table.
rule: Rules are the name given to the method used by DWM to define what requests are prohibited from being immediately executed on the Teradata Database. That is, the rules enforced by DWM provide the Query Management capabilities.
Round Robin: Round robin (-rr) is a command line parameter that tells Teradata Query Director to route sessions in a specific order.
Routing: A general term that describes how Teradata Query Director receives sessions and sends them to one system or another.
Routing Configuration File: The routing configuration file in Teradata Query Director allows administrators to associate specific userids and account strings with specific systems.
row: The fields, whether null or not, that represent one entry under each column in a table. The row is the smallest unit of information operated on by data manipulation statements.
RSG: Relay Services Gateway. A virtual processor residing on a node in which the replication services software will execute.
RT: Response Time.
RTF: Rich Text File.
run file: A script that is not contained within the SYSIN file, but rather executed through use of the .RUN BTEQ command.
S
scheduled requests: The capability to store scripts of SQL requests and execute them at a scheduled later time.
schema: Schemas are used to identify the structure of the data. Producers have an output schema, to define what the source data will look like in the data stream. Consumers have an input schema, to define what will be read from the data stream.
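The fixed-order routing described in the Round Robin entry above can be illustrated generically (the system names are hypothetical): each new session goes to the next system in rotation, wrapping around at the end.

```python
from itertools import cycle

systems = cycle(["systemA", "systemB", "systemC"])  # hypothetical names

# Each new session is handed to the next system in the rotation.
routed = [next(systems) for _ in range(5)]
assert routed == ["systemA", "systemB", "systemC", "systemA", "systemB"]
```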
If the input and output schemas are the same, you only define the schema once.
script: A file that contains a set of BTEQ commands and/or SQL statements.
Security token: A binary string generated by a server when a replication group is created or altered that must be input to secure a change data capture or apply operation.
self-contained statement: A query request that stores the result data that it generates, if any. For example, an INSERT/SELECT statement would be self-contained, whereas a SELECT statement would not.
separator: A character or group of characters that separates words and special symbols in Teradata SQL. Blanks and comments are the most common separators.
server: A computer system running the Teradata Database. Typically, a Teradata Database server has multiple nodes, which may include both TPA and non-TPA nodes. All nodes of the server are connected via the Teradata BYNET or other similar interconnect.
Session: A session begins when the user logs on to the Teradata Database and ends when the user logs off the Teradata Database. Also called a Teradata Database session.
session: In client software, a logical connection between an application program on a host and the Teradata Database that permits the application program to send one request to and receive one response from the Teradata Database at a time.
skew: This value (which is only available in V2R4 and above) is calculated based on a single Database collection interval. If the Session Collection rate is 60, then the skew is calculated for a 60-second period. The value is calculated using 'current' data values, for example, the Max CPU used during the past 60 seconds relative to the Average used over that same 60 seconds: skew = 100 * (1 - avg / max).
SMP: Symmetric Multi-Processing.
SNMP: Simple Network Management Protocol.
Sockclient: A command line program used to send commands to Teradata Query Director.
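The skew formula quoted in the skew entry above is simple to compute; for example, over a set of per-unit CPU samples from one collection interval:

```python
def skew(cpu_samples):
    """skew = 100 * (1 - avg / max): 0 means perfectly even usage;
    values approaching 100 mean one unit did nearly all the work."""
    avg = sum(cpu_samples) / len(cpu_samples)
    return 100 * (1 - avg / max(cpu_samples))

assert skew([30, 30, 30]) == 0           # perfectly balanced
assert round(skew([10, 20, 30]), 2) == 33.33
```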
Source Database: The database from which data will be extracted or copied into the Data Warehouse.
SQL: Structured Query Language. An industry-standard language for creating, updating, and querying relational database management systems. SQL was developed by IBM in the 1970s for use in System R. It is the de facto standard as well as being an ISO and ANSI standard. It is often embedded in general-purpose programming languages. SQL is the programming language used to communicate with the Teradata Database.
SSO: Single sign-on, an authentication option that allows users of the Teradata Database on Windows 2000 systems to access the Teradata Database based on their authorized network usernames and passwords. This feature simplifies logon by not requiring users to enter an additional username and password when logging on to the Teradata Database via client applications.
stand-alone operator: In TPump, a type of operator that does not exchange data with other operators.
Star Schema: A modeling scheme that has a single object in the middle connected to a number of objects around it radially.
statement: A request for processing by the Teradata Database that consists of a keyword verb, optional phrases, and operands, and is processed as a single entity.
statistics: These are the details of the processes used to collect, analyze, and transform the database objects used by a given query.
stored procedure: Teradata supports stored procedures. A stored procedure is a combination of SQL statements and control and conditional handling statements that run using a single call statement.
Stream operator: A consumer-type operator that allows parallel inserts, updates, and deletes to new or preexisting Teradata tables.
Subscriber server: A Teradata server in which changes captured from a primary server by an intermediary are applied to tables that duplicate those of the primary.
Replication services executing on the servers provide the capture and apply functions.

supervisory user: In Data Dictionary, a user who has been delegated authority by the administrator to further allocate Teradata Database resources, such as space, and the ability to create, drop, and modify users within the overall user community.

T

table: A set of one or more columns with zero or more rows that consist of fields of related information.

Target Database: The database in which data will be loaded or inserted.

Target table: A user table where changes are to be made by a Teradata MultiLoad task.

TCP/IP: Transmission Control Protocol/Internet Protocol.

TDPID: Teradata Director Program Identifier. The name of the Teradata Database being accessed.

tdwm: The database shared by Teradata Dynamic Workload Manager and Teradata Query Scheduler. Previously called the dbqrymgr database.

Teradata SQL: The Teradata Database dialect of the relational language SQL, having data definition and data manipulation statements. A data definition statement would be a CREATE TABLE statement, and a data manipulation statement would be a data retrieval statement (a SELECT statement).

TDP: Teradata Director Program; TDP provides a high-performance interface for messages communicated between the client and the Teradata system.

Target Level Emulation (TLE): Permits you to emulate a target environment (target system) by capturing system-level information from that environment. The captured information is stored in the relational tables SystemFE.Opt_Cost_Table and SystemFE.Opt_RAS_Table. The information in these tables can be used on a test system with the appropriate columns and indexes to make the Optimizer generate query plans as if it were operating in the target system rather than the test system.

test system: A Teradata Database where you want to import Optimizer-specific information to emulate a target system and create new queries or test new features.
title: In Teradata SQL, a string used as a column heading in a report. By default, it is the column name, but a title can also be explicitly declared by a TITLE phrase.

TOS: Teradata Operating System.

TPA: Trusted Parallel Application.

TPM: Transactions Per Minute.

Transport: The process of extracting data from a source, interfacing with a destination environment, and then loading data to the destination.

transaction: A set of Teradata SQL statements that is performed as a unit. Either all of the statements are executed normally, or else any changes made during the transaction are backed out and the remainder of the statements in the transaction are not executed. The Teradata Database supports both ANSI and Teradata transaction semantics.

trigger: One or more Teradata SQL statements associated with a table and executed when specified conditions are met.

TSM: Tivoli Storage Management; IBM’s storage management solution.

TTU: Teradata Tools and Utilities is a robust suite of tools and utilities that enables users and system administrators to enjoy optimal response time and system manageability with their Teradata system. TPump is included in Teradata Tools and Utilities.

tuple: In a database table (relation), a set of related values, one for each attribute (column). A tuple is stored as a row in a relational database management system. It is analogous to a record in a nonrelational file.

Two Phase Commit: Two Phase Commit is the process by which a relational database ensures that distributed transactions are performed in an orderly manner. In this system, transactions may be terminated by either committing them or rolling them back.

type: An attribute of a column that specifies the representation of data values for fields in that column. Teradata SQL data types include numerics and strings.

U

UDF: User-Defined Function.

UDM: User-Defined Method.
The database developer can create custom functions that are explicitly connected to UDTs; these are known as UDMs. Functionality directly applicable to a UDT can be located within the UDMs associated with that UDT, rather than being replicated in all of the applications that use that UDT, resulting in increased maintainability.

UDT: A custom data type, known as a user-defined type. By creating UDTs, a database developer can augment the Teradata Database with data types having capabilities not offered by Teradata predefined (built-in) data types. Use TPump to import values into tables containing UDT columns in the same manner as is done for other tables. The input records to TPump must have the column data for UDT columns in its external type format.

Unformatted Records: See Records.

Unicode: A fixed-width (16 bits) encoding of virtually all characters present in all languages in the world.

unique secondary index (USI): One of two types of secondary indexes. A secondary index may be specified at table creation or at any time during the life of the table. It may consist of up to 16 columns. To get the benefit of the index, the query has to specify a value for all columns in the secondary index. A USI has two purposes: it can speed up access to a row that otherwise might require a full table scan, without having to rely on the primary index, and it can be used to enforce uniqueness of a column or set of columns.

user: In Teradata SQL, a database associated with a person who uses the Teradata Database. The database stores the person’s private information and accesses other Teradata Databases.

Update operator: A consumer-type operator that emulates some of the functions of the Teradata MultiLoad utility in the Teradata PT infrastructure.

UPI: Unique primary index; a UPI is required and is typically assigned to major entities in the database.

user: A database associated with a person who uses the Teradata Database.
The database stores the person’s private information and accesses other Teradata Databases.

user groups: A group of users can be specified within DWM either as a collection of individual users, or as all user names that satisfy a character string pattern (such as SALE*). The Teradata concept of roles is not used to define user groups, as it applies to privileges. User groups can generally be employed wherever an issuing object can be specified, and any condition applied to a group implicitly applies to all users within that group.

UTF-8: In simple terms, UTF-8 is an 8-bit encoding of 16-bit Unicode to achieve an international character representation. In more technical terms, in UTF-8, characters are encoded using sequences of 1 to 6 octets. In a sequence of one, the only octet has the high-order bit set to 0; the remaining 7 bits are used to encode the character value. UTF-8 uses all bits of an octet, but has the quality of preserving the full US-ASCII range. The UTF-8 encoding of Unicode and UCS avoids the problems of fixed-length Unicode encodings because an ASCII file encoded in UTF-8 is exactly the same as the original ASCII file, and all non-ASCII characters are guaranteed to have the most significant bit set (bit 0x80). This means that normal tools for text searching work as expected.

UTF16: A 16-bit Unicode Transformation Format.

V

value-ordered secondary index (VOSI): A non-unique secondary index (NUSI) can be value ordered, which means the NUSI can be sorted on the key values themselves rather than on the corresponding hash codes. This is useful for range queries where only a portion of the index subtable will be accessed. With a value-ordered NUSI, only those blocks in the NUSI subtable that are within the range are scanned. The sort key must be a numeric value of up to 4 bytes, rather than a longer character column; DATE is the most commonly used data type. The actual data value is stored as part of the NUSI structure.
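The two UTF-8 properties claimed in the entry above (US-ASCII text passes through byte-for-byte, and every byte of a non-ASCII character's encoding has bit 0x80 set) can be verified with a short script. This is an illustrative sketch using Python's built-in string encoding, not part of TPump; the sample strings are mine.

```python
# ASCII text survives UTF-8 encoding byte-for-byte, so an ASCII file
# re-encoded as UTF-8 is identical to the original.
ascii_text = ".LOGON tdpid/user,password;"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Every byte of a non-ASCII character's UTF-8 encoding has the most
# significant bit (0x80) set, so ASCII-oriented text tools can never
# mistake part of a multi-byte character for an ASCII byte.
for ch in ("é", "日", "€"):
    encoded = ch.encode("utf-8")
    assert all(byte & 0x80 for byte in encoded)
    print(ch, [hex(b) for b in encoded])
```

For example, "é" encodes as the two octets 0xC3 0xA9, both with the high bit set, which is why searching the stream for an ASCII pattern behaves as expected.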
Varbyte: A data type that represents a variable-length binary string.

Varchar: A data type that represents a variable-length string of non-numeric characters.

Vargraphic: A data type that represents a variable-length string of characters.

view: An alternate way of organizing and presenting information in a Teradata Database. A view, like a table, has rows and columns. However, the rows and columns of a view are not directly stored by the Teradata Database. They are derived from the rows and columns of tables (or other views) whenever the view is referenced.

VM (Virtual Machine): One of the primary operating systems for large IBM computers.

VM/CMS: Virtual Machine/Conversational Monitor System.

W

Weighted Round Robin: Weighted round robin (-wrr) is a startup command line parameter that allows the administrator to weight specific databases for Teradata QD.

workgroups: Workgroups represent collections of related scheduled request work for users, user groups, or accounts. Each workgroup is assigned a maximum number of requests that can be executing from that workgroup simultaneously, thereby ensuring that requests for all workgroups get a fair share of their scheduled work done within the execution time frames.

workload limits rule: A Workload Limits rule allows you to limit the number of logon sessions and all-AMP queries, as well as reject or delay queries when workload limits are encountered. You can define which users, accounts, performance groups, or users within performance groups are associated with this type of rule.

Workstation: A network-attached client.

Work Table: A table created during the Preliminary Phase, used to store intermediate data acquired from the host during a Teradata MultiLoad task. These data will eventually be applied to a target table.

Write Lock: A write lock enables a single user to modify a table.
The Teradata MultiLoad utility maintains write locks against each target table during the Application Phase, and against the work tables and error tables for each task transaction.

X

XML: XML is the eXtensible Markup Language, a system created to define other markup languages. For this reason, it can also be referred to as a metalanguage. XML is commonly used on the Internet to create simple methods for the exchange of data among diverse clients.

Index

Symbols
-, 46
&SYSAPLYCNT system variable, 60
&SYSDATE system variable, 59
&SYSDATE4 system variable, 59
&SYSDAY system variable, 59
&SYSDELCNT system variable, 59
&SYSETCNT system variable, 59
&SYSINSCNT system variable, 59
&SYSJOBNAME system variable, 59
&SYSNOAPLYCNT system variable, 60
&SYSOS system variable, 60
&SYSRC system variable, 60
&SYSRCDCNT system variable, 60
&SYSRJCTCNT system variable, 60
&SYSUPDCNT system variable, 60
&SYSUSER system variable, 60
(serialize_on_field specification, DML command, 115
./ prefix, EXIT name specification, BEGIN EXPORT command, 146

A
abort termination, 52
abort, defined, 271
aborted TPump job, recovery, 55
ACCEPT command: definition, 90; function, 28; syntax, 92
access rights, 33
accounts, defined, 271
acctid specification, LOGON command, 165
Acquisition Error Table, 197
administrator, defined, 272
aggregate operators, programming considerations, 67
all joins, defined, 272
ALTER TABLE SQL statement, 29
alternate error file runtime parameter, 48
ANSI/SQL DateTime specifications: programming considerations, 63; restrictions, 63
ANSIDATE keyword, DATEFORM command, 108
API, defined, 273
APPEND keyword, BEGIN LOAD command, 96
application commands, syntax, 89
APPLY label specification, IMPORT command, 149
ArraySupport keyword, BEGIN LOAD, 101, 116
Assembler INMOD, programming structure, 203
Atomic upsert feature, 122
Atomic UPSERT keyword, EXECUTE command, 127
AXSMOD keyword, IMPORT command, 145
AXSMOD name,
IMPORT command, 145

B
-b runtime parameter, 45
batch mode, syntax for invoking: on MVS, 44; on UNIX, 43; on VM, 44; on Windows, 43
BEGIN LOAD command: definition, 90; function, 28; in script, 69; syntax, 95
BRIEF runtime parameter, 45
buffers per session runtime parameter, 45
BUFFERS runtime parameter, 45
bypass objects, defined, 274

C
-c charactersetname runtime parameter, 45
C language INMODs, programming structure, 203
C language, comment support, 63
-C runtime parameter, 47
Character Sets: Unicode, 25; UTF16, 26; UTF8, 25
Character sets, Japanese, 23
character sets: Chinese and Korean, 23; client system specifications, 64; default, 65; effects on TPump commands, 65; for AXSMOD, 65; runtime parameters, 45; site defined, 27; Teradata RDBMS default, 64
character-to-date data conversions, 24
character-to-numeric data conversions, 24
CHARSET=charactersetname runtime parameter, 45
charpos1, 93
charpos2, 93
CHECKPOINT keyword, BEGIN LOAD command, 99
CHECKPOINT SQL statement, 29
checkpoints, description, 25
Chinese and Korean character sets, 23
cname specification: INSERT command, 154; UPDATE statement, 186
COBOL INMOD, programming structure, 203
COLLECT STATISTICS SQL statement, 29
command functions, 27
commands:
  ACCEPT: definition, 90; function, 28; syntax, 92
  BEGIN LOAD: definition, 90; function, 28; syntax, 95
  DATEFORM: definition, 90; function, 28; syntax, 108
  DISPLAY: definition, 90; function, 28; syntax, 111
  DML: definition, 90; function, 28; syntax, 113
  ELSE: definition, 90; function, 28; syntax, 141
  END LOAD: definition, 90; function, 28; syntax, 126
  ENDIF: definition, 90; function, 28; syntax, 141
  FIELD: definition, 90; function, 28; syntax, 130
  FILLER: definition, 90; function, 29; syntax, 139
  IF: definition, 90; function, 28; syntax, 141
  IMPORT: definition, 90; function, 29; syntax, 143
  LAYOUT: definition, 90; function, 29; syntax, 157
  LOGOFF: definition, 90; function, 28; syntax, 162
  LOGON: definition, 90; function, 28; syntax, 164
  LOGTABLE: definition, 90; function, 28; syntax, 168
  NAME: definition, 90; function, 28; syntax, 170
  PARTITION: definition, 90;
function, 29; syntax, 172
  ROUTE: definition, 91; function, 28; syntax, 176
  RUN FILE: definition, 91; function, 28; syntax, 178
  SET: definition, 91; function, 28; syntax, 180
  SYSTEM: definition, 91; function, 28; syntax, 182
  TABLE: definition, 91; function, 29; syntax, 184
  usage, 57
COMMENT, 29
comment support, 63
condition specification, LAYOUT command, 158
conditional expression specification, IF, ELSE, and ENDIF command, 141
conditional expressions, 57
CONFIG runtime parameter, 47
configuration file: optional specification, 41; parameters overridden by runtime parameters (BRIEF, 45; CHARSET, 45; ERRLOG, 48); runtime parameter, 47
conversions: character-to-numeric, 24; integer-to-decimal, 24; numeric-to-numeric, 24
CREATE DATABASE SQL statement, 30
CREATE MACRO SQL statement, 30
CREATE TABLE SQL statement, 30
CREATE VIEW SQL statement, 30

D
-d runtime parameter, 48
data file concatenation, programming considerations, 67
data conversions: capabilities, 24; character-to-date, 24; character-to-numeric, 24; date-to-character, 24; integer-to-decimal, 24; numeric-to-numeric, 24
data definition language, 17, 37, 107
data dictionary, defined, 276
data formats, 22
data manipulation language, 17, 37, 107
data manipulation, defined, 277
data serialization, 17
data types: ANSI/SQL DateTime, 63; ANSI/SQL DateTime restrictions, 63
database objects, protection and location, 53
database specification: DATABASE command, 107; EXECUTE command, 127
DATABASE SQL statement, 30: definition, 91; syntax, 107
datadesc specification: FIELD command, 131; FILLER command, 139
DATAENCRYPTION keyword, BEGIN LOAD command, 100, 173
DATEFORM command: definition, 90; function, 28; syntax, 108
DateTime data types, specifying, 63
date-to-character data conversions, 24
dbname specification: BEGIN LOAD command, 104; LOGTABLE command, 168
dbname.
specification, BEGIN LOAD command, 97
DBQL, defined, 277
DDL, 17, 37, 107
ddname specification, IMPORT command, 144
decimal, zoned, 24
DELETE DATABASE SQL statement, 30
DELETE DML statement, in script, 70, 71
DELETE keyword, EXECUTE command, 127
DELETE macro, 128
DELETE SQL statement, 30, 109: definition, 91; syntax, 109
delimiters, defined, 278
DISPLAY command: definition, 90; function, 28; syntax, 111
DISPLAY ERRORS keyword, IMPORT command, 149
DIT, defined, 278
DML, 17, 37, 107
DML command: definition, 90; function, 28; in script, 70; overview, 32; syntax, 113
DML statements, 32
DROP DATABASE SQL statement, 30
DROP keyword, FIELD command, 132
DWM, defined, 278
dynamn entry point: for C INMOD routines, 204; for SAS/C INMOD routines, 204

E
-e filename runtime parameter, 48
echo, 176
ELSE command: definition, 90; function, 28; syntax, 141
END LOAD command: definition, 90; function, 28; in script, 71; syntax, 126
ENDIF command: definition, 90; function, 28; syntax, 141
errcount specification, BEGIN LOAD command, 99
ERRLIMIT keyword, BEGIN LOAD command, 98
ERRLOG=filename runtime parameter, 48
error detection, 193
error table, acquisition, 197
error tables: reading, 197; troubleshooting, 197
errors, 193
ERRORTABLE keyword, BEGIN LOAD command, 96
EUC, defined, 279
exclusion join, defined, 279
EXECUTE SQL statement, definition, 91
EXECUTE statement, syntax, 127
execution time frame, defined, 279
EXIT keyword, BEGIN LOAD command, 105
EXIT name specification, BEGIN LOAD command, 105
exit routines, definition, 202
expr specification, UPDATE statement, 186
expression specification: INSERT command, 154; SET command, 180
expressions, programming considerations, 67

F
-f runtime parameter, 45
failure, defined, 279
features, advanced, INMODs, 201
FIELD command: definition, 90; function, 28; in script, 70; syntax, 130
field, defined, 279
fieldexpr specification, FIELD command, 132
fieldname specification: FIELD command, 130; FILLER command, 139; INSERT command, 154
file size, maximum, 67
file requirements for invoking TPump, 41
fileid, 93, 111
filename specification, IMPORT command, 146
FILLER
command: definition, 90; function, 29; in script, 70; syntax, 139
FOR n specification, IMPORT command, 147
FREE option, IMPORT command, 146
frequency specification, BEGIN LOAD command, 100
FROM keyword, IMPORT command, 147

G
GIVE SQL statement, 30
global rule, defined, 280
GRANT SQL statement, 30
graphic constants: hexadecimal, 67; KanjiEBCDIC, 67; support for, 67
graphic data types, support for, 66
GSS, defined, 280

H
HOLD option, IMPORT command, 146
hours specification, BEGIN LOAD command, 101

I
IF command: definition, 90; function, 28; syntax, 141
IGNORE keyword, DML command, 114
IMPORT command: definition, 90; function, 29; in script, 71; syntax, 143
INDICATORS keyword, LAYOUT command, 159
INFILE filename specification, IMPORT command, 146
INFILE keyword, IMPORT command, 144, 146
infilename, standard input file specification, 50
init-string specification, IMPORT command, 145
INMODs, 201: assembler example, 246; C example, 250; COBOL example, 241; COBOL pass-thru example, 244; compiling and linking, 216; FastLoad, 213; IBM interface, 212; input return code values, 215; input values, 215; interface, 213; major functions, 201; output return code values, 215; PL/I example, 250; preparing program, 214; programming, 216; routines (entry points, 204; platforms supported, 202; programming languages supported, 202; programming structure, 203; rules and restrictions, 209; using, 211); TPump interface, 212; UNIX interface, 213; UNIX programming, 216; Windows interface, 213
inner join, defined, 281
input/output, controlling, 38
INSERT DML statement, in script, 70
INSERT keyword: DML command, 114; EXECUTE command, 127
INSERT macro, 128
INSERT SQL statement, 30: definition, 91; syntax, 154
INTEGERDATE keyword, DATEFORM command, 108
integer-to-decimal data conversions, 24
internationalization, 18
invoking: on UNIX platform, 42; on Windows platform, 42
invoking TPump, 41

J
Japanese character sets, 23
JIS, defined, 282
job recovery, if aborted, 55
jobname specification, NAME command, 170
join, defined, 282

K
kanjiEBCDIC graphic constants,
67
KEY keyword, FIELD command, 132
Korean and Chinese character sets, 23

L
LABEL keyword, DML command, 113
label specification: DML command, 114; IMPORT command, 149
LATENCY keyword, BEGIN LOAD command, 102
LAYOUT command: definition, 90; function, 29; in script, 70; syntax, 157
LAYOUT name specification, IMPORT command, 149
layoutname specification: IMPORT command, 149; LAYOUT command, 157
lock: access, 33; acquisition, 33; application, 33; row hash locking, 33; write, 33
log table: space requirement calculation example, 85; space requirements, 84
LOGDATA command, syntax, 160
LOGMECH command, syntax, 161
LOGOFF command: definition, 90; function, 28; messages, 80; syntax, 162
logoff/disconnect message, 74
LOGON command: definition, 90; function, 28; syntax, 164
LOGTABLE command: definition, 90; function, 28; syntax, 168
logtables: non-shareability, 168; space requirements, 169

M
-m runtime parameter, 49
macro runtime parameters, 49
MACRODB keyword, BEGIN LOAD command, 104
macros, 32: predefined, 33; TPump usage, 17
MACROS runtime parameter, 49
MARK keyword, DML command, 114
maximum: file size, programming considerations, 67; row size, programming considerations, 67
merge join, defined, 283
messages, 176
minutes specification, BEGIN LOAD command, 102
MODIFY DATABASE SQL statement, 30
monitor facility, 80
MSG string specification, BEGIN LOAD command, 105
MultiLoad utility, data conversion capabilities, 24
MULTISET table, 104, 114
MVS, syntax for invoking in batch mode, 44

N
name, 127
NAME command: definition, 90; function, 28; syntax, 170
name specification: BEGIN LOAD command, 105; IMPORT command, 145
name, defined, 284
Named Pipes Access Module, 145
nested join, defined, 284
NOATOMICUPSERT, 104
NODROP keyword, BEGIN LOAD command, 96
NOMONITOR keyword, BEGIN LOAD command, 104
non-shareability, logtables, 168
normal termination, 51
NOSTOP keyword, IMPORT command, 149
NOTIFY option specification, BEGIN LOAD command, 104
NOTIMERPROCESS keyword, BEGIN LOAD command, 102
null, defined, 284
nullexpr specification, FIELD command, 131
number specification: BEGIN LOAD command, 96; PARTITION command, 172
numeric-to-numeric data conversions, 24

O
Object Access filter, defined, 285
OLE DB Access Module, 145
operators, reserved words, 56
options messages, 74, 79
oscommand string specification, SYSTEM command, 182
outer join, defined, 285
outfilename, standard output file specification, 50
owner, defined, 285

P
pack factor, 85
PACK keyword: BEGIN LOAD command, 102; PARTITION command, 173
packing, 116, 155
packing factor, 85, 103, 174
PACKMAXIMUM keyword: BEGIN LOAD command, 102; PARTITION command, 173
parms specification, IMPORT command, 147
parser, defined, 286
parsing engine (PE), defined, 286
PARTITION command: definition, 90; function, 29; syntax, 172
PARTITION keyword, DML command, 115
partition_name specification: DML command, 115; PARTITION command, 173
password specification, LOGON command, 165
performance checklist, troubleshooting, 200
performance group, defined, 286
periodicity runtime parameter, 48
PL/I language INMODs, programming structure, 204
PRDICITY runtime parameter, 48
predefined macros, 33
preparing a TPump script, 69
procedures, defined, 287
product join, defined, 286
product version numbers, 3
profiles, defined, 287
programming: INMODs for UNIX-based clients, 216; UNIX-based clients, 216

Q
queries, defined, 288
query analysis, defined, 288
query management, defined, 288
Query Resource filter, defined, 288

R
-r tpump command runtime parameter, 49
RATE keyword, BEGIN LOAD command, 103
recovery: aborted TPump job, 55; procedures, 53
reduced print output runtime parameter, 45
redundant conversions supported, 24
RENAME SQL statement, 30
REPLACE MACRO SQL statement, 30
REPLACE VIEW SQL statement, 30
reporting: options messages, 79; statistics, 75; restart, 76
request, defined, 289
reserved words, use in TPump, 56
restart statistics, 76
restart log table, 52, 53
restart procedures, 53
restrictions and limitations, programming considerations: aggregate operators, 67; data file concatenation, 67; expressions, 67; maximum
file size, 67; maximum row size, 67; Teradata RDBMS data retrieval, 67
results file storage, defined, 289
results files, defined, 289
results tables, defined, 289
retcode specification, LOGOFF command, 162
return codes, 52: termination, 68
REVOKE SQL statement, 30
ROBUST keyword, BEGIN LOAD command, 104
ROUTE command: definition, 91; function, 28; syntax, 176
row size, maximum, 67
row count variables, 62
row hash locking, 33
row, defined, 290
RowID join, defined, 289, 290
rule, defined, 289
RUN FILE command: definition, 91; function, 28; syntax, 178

S
scheduled requests, defined, 290
script: example, 72; preparation, 69; writing guidelines, 69; writing procedure, 71
scriptencoding parameter, 46
seconds specification, BEGIN LOAD command, 102
self-contained statement, defined, 290
separator, defined, 290
SERIALIZE keyword, BEGIN LOAD command, 102
SERIALIZEON keyword, DML command, 115
session, defined, 291
sessions, 90, 95, 101, 162, 164
SESSIONS keyword: BEGIN LOAD command, 96; PARTITION command, 173
SET command: definition, 91; function, 28; syntax, 180
SET QUERY_BAND SQL statement, 30
SET SESSION COLLATION SQL statement, 30
single sign on, 164
SLEEP keyword, 96, 101, 174: BEGIN LOAD command, 103
software releases supported, 3
space requirements: for TPump log tables, 84; log table, 84
SQL, defined, 291
SQL statements: DATABASE (definition, 91; syntax, 107); DELETE (definition, 91; syntax, 109); EXECUTE (definition, 91; syntax, 127); INSERT (definition, 91; syntax, 154); supported by TPump, 29; UPDATE (definition, 91; syntax, 186)
SQL, Teradata, 37
SSO, LOGON command, 164
starting TPump: on UNIX platform, 42; on Windows platform, 42
startpos specification: FIELD command, 130; FILLER command, 139
statement, defined, 291
statement_rate specification, BEGIN LOAD command, 103
statements specification: BEGIN LOAD command, 103; PARTITION command, 174
statements to execute if FALSE specification, IF, ELSE, and ENDIF command, 141
statements to execute if TRUE specification, IF, ELSE, and ENDIF command, 141
statements to resume with specification, IF, ELSE, and
ENDIF command, 141
statistics, 75: facility, 74; restart, 76
stored procedures, defined, 291
string variable: MSG specification, BEGIN LOAD command, 105; TEXT specification, BEGIN LOAD command, 105
supervisory user, defined, 292
support commands, defined, 27
support environment, 37
SYSAPLYCNT system variable, 60
SYSDATE system variable, 59
SYSDATE4 system variable, 59
SYSDAY system variable, 59
SYSDELCNT system variable, 59
SYSETCNT system variable, 59
SYSINSCNT system variable, 59
SYSJOBNAME, 28
SYSJOBNAME system variable, 59
SYSNOAPLYCNT system variable, 60
SYSOS system variable, 60
SYSRC system variable, 60
SYSRCDCNT system variable, 60
SYSRJCTCNT system variable, 60
SYSTEM command: definition, 91; function, 28; syntax, 182
system failure, restart and recovery, 53
system variables, 59: &SYSAPLYCNT, 60; &SYSDATE, 59; &SYSDATE4, 59; &SYSDAY, 59; &SYSDELCNT, 59; &SYSETCNT, 59; &SYSJOBNAME, 59; &SYSNOAPLYCNT, 60; &SYSOS, 60; &SYSRC, 60; &SYSRCDCNT, 60; &SYSRJCTCNT, 60; &SYSTIME, 60; &SYSUPDCNT, 60; &SYSUSER, 60
SYSTIME system variable, 60
SYSUPDCNT system variable, 60
SYSUSER system variable, 60

T
TABLE command: definition, 91; function, 29; in script, 70; syntax, 184
table, defined, 292
tableref specification, TABLE command, 184
tables: fallback, 35; nonfallback, 35
task commands, 27
task commands, 38: syntax and usage, 89; usage, 57
tdpid specification, LOGON command, 164
TENACITY keyword, 101: BEGIN LOAD command, 101
Teradata RDBMS data retrieval, programming considerations, 67
Teradata SQL, 37
Teradata SQL statements, supported by TPump, 29
terminating a TPump job, 52
termination return codes, 52, 68
text, 111
TEXT string specification, BEGIN LOAD command, 105
threshold specification, PARTITION command, 174
THRU keyword, IMPORT command, 147
time and date variables, 61
title, defined, 292
tname specification: BEGIN LOAD command, 97; DELETE statement, 109; INSERT command, 154; LOGTABLE command, 168; UPDATE statement, 186
TPump: advanced features, 201; invoking (batch mode on MVS, 44; batch mode on UNIX, 43; batch mode on VM, 44;
batch mode on Windows, 43); macros, 32; Monitor facility, 80; Monitor facility interface, 80; script, example of, 72; support command, defined, 27; support environment, 38; using INMOD routines, 211
tpump command runtime parameter, 49
TPump Conditional Expressions, 57
TPump/INMOD Routine Interface, 205
transaction, defined, 293
troubleshooting, 193: early error detection, 193; error detection, 193; reading error tables, 197
type, defined, 293

U
Unicode Character Sets, 25
UNIX: starting TPump, 42; syntax for invoking in batch mode, 43
UPDATE DML statement, 70: in script, 70
UPDATE keyword, EXECUTE command, 127
UPDATE macro, 128
UPDATE SQL statement, 30: definition, 91; syntax, 186
upsert: Atomic, 122; example, 121; feature, 32, 120
UPSERT keyword, 32: DML command, 121; EXECUTE command, 127
UPSERT macro, 128
USE keyword, DML command, 115
use_field specification, DML command, 115
user, 294
user groups, defined, 294
username specification, LOGON command, 164
USING (parms) specification, IMPORT command, 147
USING keyword, IMPORT command, 147
UTF16 Character Sets, 26
UTF-8, defined, 294
UTF8 Character Sets, 25
utility variables, supported by TPump, 62

V
-v, 75
-V runtime parameter, 50
-v runtime parameter, 50
var, 92
var specification, SET command, 180
variables: date and time, 61; row count, 62; substitution, 62; utility, supported by TPump, 62
VARTEXT variable-length text record format, 148
verbose mode runtime parameter, 50
VERBOSE runtime parameter, 50
version numbers, 3
VM, syntax for invoking in batch mode, 44

W
WebSphere® Access Module for Teradata (client version), 145
WebSphere® Access Module for Teradata (server version), 145
WHERE condition specification: DELETE statement, 109; IMPORT command, 150; UPDATE statement, 186
Windows: starting TPump, 42; syntax for invoking in batch mode, 43
workgroups, defined, 295
workload limits rule, defined, 295

Z
zoned decimal format, 24