What would you do if you knew?™

Teradata QueryGrid
Teradata QueryGrid: Teradata Database-to-Hadoop User Guide
Release 15.0.4
B035-1185-015K
October 2015

The product or products described in this book are licensed products of Teradata Corporation or its affiliates.

Teradata, Active Data Warehousing, Active Enterprise Intelligence, Applications-Within, Aprimo Marketing Studio, Aster, BYNET, Claraview, DecisionCast, Gridscale, MyCommerce, QueryGrid, SQL-MapReduce, Teradata Decision Experts, "Teradata Labs" logo, Teradata ServiceConnect, Teradata Source Experts, WebAnalyst, and Xkoto are trademarks or registered trademarks of Teradata Corporation or its affiliates in the United States and other countries. Adaptec and SCSISelect are trademarks or registered trademarks of Adaptec, Inc. AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc. Apache, Apache Avro, Apache Hadoop, Apache Hive, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. Apple, Mac, and OS X all are registered trademarks of Apple Inc. Axeda is a registered trademark of Axeda Corporation. Axeda Agents, Axeda Applications, Axeda Policy Manager, Axeda Enterprise, Axeda Access, Axeda Software Management, Axeda Service, Axeda ServiceLink, and Firewall-Friendly are trademarks and Maximum Results and Maximum Support are servicemarks of Axeda Corporation. Data Domain, EMC, PowerPath, SRDF, and Symmetrix are registered trademarks of EMC Corporation. GoldenGate is a trademark of Oracle. Hewlett-Packard and HP are registered trademarks of Hewlett-Packard Company. Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries. Intel, Pentium, and XEON are registered trademarks of Intel Corporation. IBM, CICS, RACF, Tivoli, and z/OS are registered trademarks of International Business Machines Corporation. Linux is a registered trademark of Linus Torvalds. LSI is a registered trademark of LSI Corporation. Microsoft, Active Directory, Windows, Windows NT, and Windows Server are registered trademarks of Microsoft Corporation in the United States and other countries. NetVault is a trademark or registered trademark of Dell Inc. in the United States and/or other countries. Novell and SUSE are registered trademarks of Novell, Inc., in the United States and other countries. Oracle, Java, and Solaris are registered trademarks of Oracle and/or its affiliates. QLogic and SANbox are trademarks or registered trademarks of QLogic Corporation. Quantum and the Quantum logo are trademarks of Quantum Corporation, registered in the U.S.A. and other countries. Red Hat is a trademark of Red Hat, Inc., registered in the U.S. and other countries. Used under license. SAP is the trademark or registered trademark of SAP AG in Germany and in several other countries. SAS and SAS/C are trademarks or registered trademarks of SAS Institute Inc. SPARC is a registered trademark of SPARC International, Inc. Symantec, NetBackup, and VERITAS are trademarks or registered trademarks of Symantec Corporation or its affiliates in the United States and other countries. Unicode is a registered trademark of Unicode, Inc. in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Other product and company names mentioned herein may be the trademarks of their respective owners.
The information contained in this document is provided on an "as-is" basis, without warranty of any kind, either express or implied, including the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. Some jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply to you. In no event will Teradata Corporation be liable for any indirect, direct, special, incidental, or consequential damages, including lost profits or lost savings, even if expressly advised of the possibility of such damages.

The information contained in this document may contain references or cross-references to features, functions, products, or services that are not announced or available in your country. Such references do not imply that Teradata Corporation intends to announce such features, functions, products, or services in your country. Please consult your local Teradata Corporation representative for those features, functions, products, or services available in your country. Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated without notice. Teradata Corporation may also make improvements or changes in the products or services described in this information at any time without notice.

To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this document. Please e-mail: [email protected]

Any comments or materials (collectively referred to as "Feedback") sent to Teradata Corporation will be deemed non-confidential. Teradata Corporation will have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display, transform, create derivative works of, and distribute the Feedback and derivative works thereof without limitation on a royalty-free basis. Further, Teradata Corporation will be free to use any ideas, concepts, know-how, or techniques contained in such Feedback for any purpose whatsoever, including developing, manufacturing, or marketing products or services incorporating Feedback.

Copyright © 2014 - 2015 by Teradata. All Rights Reserved.

Preface

Purpose
This book describes the Teradata® QueryGrid™: Teradata Database-to-Hadoop SQL interface for transferring data between Teradata Database and remote Hadoop hosts. Use this book with the other books in the SQL book set.

Audience
This book is intended for database administrators and other technical personnel who use Teradata Database.

Supported Releases
This book supports Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.4. Teradata QueryGrid: Teradata Database-to-Hadoop (also referred to as the Teradata-to-Hadoop connector) supports the following distributions:
• Teradata Database Release 15.0, 15.0.1, 15.0.2, or 15.0.3 to Hortonworks HDP 1.3.2
• Teradata Database Release 15.0.1, 15.0.2, 15.0.3, or 15.0.4 to Hortonworks HDP 2.1.2
• Teradata Database Release 15.0.4 or later to Hortonworks HDP 2.3.0
• Teradata Database Release 15.0.4 or later to Cloudera CDH 5.4.3

Related Documents
Information about using Teradata QueryGrid 14.10, which can be used to connect to MapR distributions, can be found in the following documents:
• Release Summary, B035-1098 (publication ID B035-1098-112A). Refer to the topic titled "SQL-H for Teradata: LOAD_FROM_HCATALOG."
• SQL Functions, Operators, Expressions, and Predicates, B035-1145 (publication ID B035-1145-112A). Refer to the topic titled "LOAD_FROM_HCATALOG."
Note: These documents refer to Teradata QueryGrid 14.10 by its former name, Teradata SQL-H.

Prerequisites
You should be familiar with basic relational database management theory and technology. To become familiar with concepts specific to Teradata Database, read Introduction to Teradata, B035-1091 and SQL Fundamentals, B035-1141.

Changes to This Book

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.4 (October 2015)
• Added information about the support of Cloudera CDH 5.4.
• Added information about the support of Hortonworks HDP 2.3.
• Added information that you must have Teradata Database release 15.0.4 installed to use the Teradata-to-Hadoop connector with HDP 2.3 and CDH 5.4.
• Added that as of Teradata Database release 15.0.4, Hortonworks HDP 1.3.2 is no longer supported.

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.3 (August 2015)
• Book applies now to Hadoop using LDAP and Kerberos for external security.
• Added a known Hortonworks HDP 2.1 issue with Hive 13 to "Limitations" in Chapter 1.
• Added a known issue with importing a Hive table with a large number of partitions to "Limitations" in Chapter 1.
• Added information about using DEFINER with authorization and foreign server to Chapter 2.
• Added information to "Authentication Security" in Chapter 4.

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.2 (May 2015)
• Added information about external security (LDAP and Kerberos) for Hadoop to Chapters 1, 2, and 4. Note: The Teradata-to-Hadoop connector does not currently support the use of Kerberos for external security.
• Redistributed information from Chapter 3 about table operators as users will interact with them through the foreign server objects.
• Updated information in ServerV[X] and ServerInfoV[X].
• Moved FNC Interfaces to an appendix. Added additional attributes to FNC_TblOpSetFormat.

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.1 (January 2015)
• Added Hortonworks HDP 2.1 to the list of compatible releases.
• Added information about hadoop_properties, isNested, updateStatistics, and UseNativeQualification name value pairs.
• Added information about new supported data types.
• Updated information about the proxy user.
• Updated heap size and memory information.

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0 (June 2014)
• Added information about dealing with line terminators.
• Added information about using the table operators with timestamp data.
Initial book created June 2014.

Additional Information
• www.info.teradata.com — Use the Teradata Information Products Publishing Library site to:
  • View or download a manual: under Online Publications, select General Search; enter your search criteria and click Search.
  • Download a documentation CD-ROM: under Online Publications, select General Search; in the Title or Keyword field, enter CD-ROM, and click Search.
• www.teradata.com — The Teradata home page provides links to numerous sources of information about Teradata.
  Links include:
  • Executive reports, white papers, case studies of customer experiences with Teradata, and thought leadership
  • Technical information, solutions, and expert advice
  • Press releases, mentions, and media resources
• www.teradata.com/TEN/ — Teradata Customer Education delivers training that builds skills and capabilities for our customers, enabling them to maximize their Teradata investment.
• https://tays.teradata.com — Use Teradata @ Your Service to access Orange Books, technical alerts, and knowledge repositories; view and join forums; and download software patches.
• Teradata Developer Exchange — Teradata Developer Exchange provides articles on using Teradata products, technical discussion forums, and code downloads.

To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this document. Please email [email protected].

Product Safety Information
This document may contain information addressing product safety practices related to data or property damage, identified by the word Notice. A notice indicates a situation which, if not avoided, could result in damage to property, such as equipment or data, but is not related to personal injury.

Example
Notice: Improper use of the Reconfiguration utility can result in data loss.

CHAPTER 1 Introduction to Teradata QueryGrid: Teradata Database-to-Hadoop

Overview of Teradata QueryGrid: Teradata Database-to-Hadoop
Teradata QueryGrid: Teradata Database-to-Hadoop (also referred to as the Teradata-to-Hadoop connector) provides an SQL interface for transferring data between Teradata Database and remote Hadoop hosts. From Teradata Database you can do the following:
• Import Hadoop data into a temporary or permanent Teradata Database table.
• Export data from temporary or permanent Teradata Database tables into existing Hadoop tables.
• Create or drop tables in Hadoop.
• Reference tables on the remote hosts in SELECT and INSERT statements.
• Select Hadoop data for use with a business tool.
• Select and join Hadoop data with data from independent data warehouses for analytical use.

Benefits
• Provides the ability to export data to Hadoop servers, adding to the Hadoop data import ability that was available in Release 14.10 as Teradata SQL-H.
• Enables the automatic push down of qualification columns and grammar to execute on a remote host.
• Provides the ability to qualify both columns and partitions involved in the query to reduce the amount of data that needs to be returned.
• Provides privileges to control who can read and write to the servers and tables on remote hosts.
• Provides simplified grammar that makes the Teradata-to-Hadoop connector easier to use. Create a foreign server definition once and thereafter use the server name instead of detailed connection information in each SQL query.
• Provides the ability to create an authorization object to securely store credentials. Foreign servers can be defined to use an authorization object to authenticate with a security system, such as LDAP or Kerberos, that is protecting Hadoop clusters.
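As a sketch of the basic workflow, a DBA defines the server once and users then reference remote tables by name. The server, host, and table names below are illustrative, and the example assumes an HDP 2.3 target (use the operator names that match your distribution, as described in CREATE FOREIGN SERVER in Chapter 2):

CREATE FOREIGN SERVER hadoop1
USING hosttype('hadoop') server('hive-metastore.example.com') port('9083')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0 ;

SELECT * FROM web_clicks@hadoop1 ;

After the foreign server exists, any query can reference a remote table simply as table_name@server_name.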
Considerations
You may not use this feature without the appropriate license. The fact that this feature may be included in product media or downloads, or described in documentation that you receive, does not authorize you to use it without the appropriate license. Contact your Teradata sales representative to purchase and enable this feature.

Teradata QueryGrid: Teradata Database-to-Hadoop installs on a single node on the Teradata system. Teradata Database then automatically distributes the table operators and the files that are needed to the other nodes on the system.

The current version of the Teradata-to-Hadoop connector has the following prerequisites:
• Teradata Database 15.0.4 or later
  Note: You must upgrade to 15.0.4, which uses Java 8, to use Teradata QueryGrid: Teradata Database-to-Hadoop with HDP 2.3 and CDH 5.4.
• At least one of the following:
  • Hortonworks HDP 2.1
  • Hortonworks HDP 2.3
  • Cloudera CDH 5.4
  Note: The current version of the Teradata-to-Hadoop connector does not support the use of Hortonworks HDP 1.3.2. Hortonworks HDP 1.3.2 is supported for use in releases 15.0, 15.0.1, 15.0.2, and 15.0.3 of the Teradata-to-Hadoop connector.
• A minimum of 96GB of node memory
• A network that connects all Teradata Database nodes to all Hadoop data nodes
• If your Hadoop cluster is protected by an external security system, such as LDAP, each Teradata Database user accessing Hadoop must have a corresponding security system credential.

The Teradata-to-Hadoop connector also requires some post-installation configuration of the FSGCache, the number of concurrent queries, and the Java Virtual Machine (JVM) settings. For information about the configuration required, see Post-Installation Configuration.

Note: Teradata QueryGrid: Teradata Database-to-Hadoop does not work with a Kerberized cluster where Hive requires LDAP authentication.

Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH 5.4 is not supported.

Limitations
• Teradata QueryGrid: Teradata Database-to-Hadoop supports ORC file import and export for Hortonworks HDP 2.1. However, there is a known issue with Hive 13 (which is fixed in Hive 14) that generates an error when importing CHAR/VARCHAR data from ORC using FOREIGN TABLE or usenativequalification.
• Import from a Hive table fails when the Hadoop job configuration size exceeds the Teradata-to-Hadoop connector limit of 16 MB. This can happen if a large number of partitions is defined for the Hive table. For example, a simple Hive table of three INT columns and nine VARCHAR(2000) columns with 10,000 partitions may return a job configuration size of 1.6 MB, but that same table defined to have 100,000 partitions would have a job configuration size of 15 MB. If your Hadoop job fails, try reducing the number of partitions in the Hive table that you want to import.
• The SequenceFile format is not supported.
• Apache Avro is not supported.

ANSI Compliance
The syntax used for the connector is a Teradata extension to the ANSI SQL:2011 standard.
CHAPTER 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop

Introduction
This chapter describes the syntax and options for the Teradata-to-Hadoop connector, and provides examples of their use. It includes DDL statements, and information about DML and DCL statements. If you are a DBA, or a DBA has already granted you the privileges needed to create foreign servers, we recommend that you start by reading CREATE FOREIGN SERVER.

ALTER FOREIGN SERVER

Purpose
Modifies the parameters of an existing server object.

Syntax
In outline, the statement names the server, optionally changes its external security association, and then applies one or more ADD or DROP modifications:

ALTER FOREIGN SERVER server_name
  [ EXTERNAL SECURITY [ INVOKER | DEFINER ] TRUSTED authorization_name ]
  { ADD ... | DROP ... } ;

Syntax Elements

server_name
The name given to the foreign server object.

EXTERNAL SECURITY
Associates an authorization object with the foreign server. The authorization stores the encrypted credentials for a user as a database object. The Teradata QueryGrid connector passes the credentials in the authorization to the remote platform identified by the foreign server when the foreign server is accessed. You must use EXTERNAL SECURITY TRUSTED for Teradata QueryGrid: Teradata Database-to-Hadoop when the Hadoop platform is protected by an external security system, such as Kerberos.

INVOKER
DEFINER
INVOKER is a keyword that indicates that the associated authorization must be present in the user database at the time that the foreign server is accessed.
Note: The user database is the database that was created for the user in the Teradata system when the user account was created.
DEFINER is a keyword that indicates that the associated authorization must be present in the database that contains the foreign server when the foreign server is accessed.
Note: The DEFAULT keyword that can be used with DEFINER in CREATE AUTHORIZATION and REPLACE AUTHORIZATION statements is not needed in association with a foreign server.
You must use either INVOKER TRUSTED or DEFINER TRUSTED if the remote platform uses an external security system (such as Kerberos) for authentication.

TRUSTED
A keyword that indicates the associated authorization object was created as TRUSTED.

authorization_name
Specifies the name of the authorization object to be used when the foreign server is accessed.
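As a sketch, associating an existing server object with a Kerberos authorization might look like the following; the server name, authorization name, and the pairing of the clause with an ADD are illustrative assumptions rather than prescribed usage:

ALTER FOREIGN SERVER hadoop_secure
EXTERNAL SECURITY INVOKER TRUSTED kerberos_auth
ADD security('kerberos') ;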
Modify options

ADD
Use to:
• add or replace a global name value pair that is used to define the server object
• add an IMPORT or EXPORT table operator. If you want to replace a table operator that is already associated with the foreign server, you must first drop the table operator before adding the new one.
• add or replace a local name value pair that is used with an IMPORT or EXPORT table operator

name('value')
The name value pair or pairs that you want to add or modify. Note that in the descriptions of the name value pairs:
• The label "Server only" indicates that the name value pair must follow the syntax ADD name('value').
• The label "Import only" indicates that the name value pair must be specified after the IMPORT keyword.
• The label "Export only" indicates that the name value pair must be specified after the EXPORT keyword.
• Unlabeled name value pairs may be specified in any part of the ADD syntax. If specified as ADD name('value'), the name value pair is applied to the server as a whole.
For descriptions of the name value pairs used with the server object, see Required Name Value Pairs and Optional Name Value Pairs.

IMPORT
Indicates that you are going to act on the operator that is used to import data into Teradata Database.

EXPORT
Indicates that you are going to act on the operator that is used to export data out of Teradata Database.

operator_name
The name of the table operator that you want to use. For more information about the table operators used with the server object, see CREATE FOREIGN SERVER.

Drop options

DROP
Use to:
• drop a global name value pair that was used to define a server object. You need only specify the name to drop the pair.
• drop an IMPORT or EXPORT table operator that was associated with a server definition. When you drop a table operator, all related name value pairs are also dropped.
• drop a local name value pair that was used with an IMPORT or EXPORT table operator. You need only specify the name to drop the pair.

name
When used alone, name is the name of the name value pair that you want to drop. For more information about the name value pairs used with the server object, see Required Name Value Pairs and Optional Name Value Pairs.

IMPORT
Indicates that you are going to act on the operator that is used to import data into Teradata Database.

EXPORT
Indicates that you are going to act on the operator that is used to export data out of Teradata Database.
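For example, the following sketches drop a previously added global name value pair and a previously added export operator; the server names are carried over from the examples later in this section, and the exact forms follow the DROP rules above:

ALTER FOREIGN SERVER hadoop2 DROP tablename ;

ALTER FOREIGN SERVER hive_metastore_server DROP EXPORT ;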
Required Name Value Pairs
These name value pairs are required to create a functioning foreign server object. Additional optional name value pairs may be required to create a foreign server for a specific implementation.

hosttype
Server only. For Teradata QueryGrid: Teradata Database-to-Hadoop, this is ('hadoop').

port
Server only. The server port number for the Hive Metastore; typically this is 9083.

server
Server only. The DNS host name or IP address for the Apache Hive Metastore (hive.metastore.uris). You can use an application, such as Ambari, to obtain this value.
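Because these are global pairs, ADD name('value') can also replace them on an existing server object. For example, to point an existing definition at a relocated Hive Metastore (the host name is illustrative):

ALTER FOREIGN SERVER hadoop2 ADD server('hive-ms2.example.com') ;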
Optional Name Value Pairs
These name value pairs are optional. However, a particular server implementation may require you to define some of these name value pairs. For example, a foreign server must be defined with a Hive port to support queries that access Hive Server2.

clustername
Required when using security('kerberos'). Specifies the directory name that stores the JAR file that contains the configuration files (core-site.xml, hdfs-site.xml, hive-site.xml, and yarn-site.xml) for the Hadoop cluster to be accessed. This directory was set up during the Hadoop client installation on the Teradata nodes. For example, you would use the name value pair clustername('yourcluster') if the files are stored as follows:
• yourcluster/
• yourcluster/core-site.xml
• yourcluster/hdfs-site.xml
• yourcluster/hive-site.xml
• yourcluster/yarn-site.xml

compression_codec
Export only. Specifies the type of compression to use for the exported data. The default is no compression. The supported compression types are based on the compression codecs configured on the Hadoop system.
Note: Snappy is supported, but you must specify it in hadoop_properties using <orc.compression=SNAPPY> as the argument. Snappy is supported only for the ORCFile file type.
You must specify the full name for the compression codec as follows:
• org.apache.hadoop.io.compress.DefaultCodec
• org.apache.hadoop.io.compress.GzipCodec
• org.apache.hadoop.io.compress.BZip2Codec

dbname
The name of the user’s database. This parameter is optional. You can specify a dbname value in the foreign server to limit its scope to a specific database. The value specified in the USING clause in the CREATE FOREIGN SERVER syntax overrides any corresponding value specified directly in the user query.

default_string_size
Size at which data imported from or exported to Hadoop String columns is truncated. When applied to the import operator, the value represents the maximum number of Unicode characters to import, and defaults to 2048 characters. When applied to the export operator, the value represents the maximum number of bytes to export, and defaults to 4096 bytes. Teradata QueryGrid silently truncates the String columns at the value set in default_string_size.

hadoop_properties ('<property1=value>, <property3=value1,value2>')
Sets specific properties used to interact with Hadoop. If there are multiple arguments, you must delimit them with angle brackets. If there is only one argument, you can omit the angle brackets. For example, the syntax for the hadoop_properties clause for a High Availability (HA) target supports an updated syntax where multiple values can be included. In this case the properties must be enclosed by left and right angle brackets. No spaces are allowed within or between arguments. The High Availability hadoop properties are defined based on the name service that is defined on your Hadoop server. For example, if you have the following Hadoop properties:

hadoop_properties('
<dfs.client.use.datanode.hostname=true>
,<dfs.datanode.use.datanode.hostname=true>
,<dfs.nameservices=MYCOMPANY_HADOOP02>
,<dfs.ha.namenodes.MYCOMPANY_HADOOP02=nn1,nn2>
,<dfs.namenode.rpc-address.MYCOMPANY_HADOOP02.nn1=hdp230-2:8020>
,<dfs.namenode.rpc-address.MYCOMPANY_HADOOP02.nn2=hdp230-3:8020>
,<dfs.client.failover.proxy.provider.MYCOMPANY_HADOOP02=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider>')

In this example, you would make the following replacements:
• Replace MYCOMPANY_HADOOP02 with your own name service ID.
• Replace hdp230-2 and hdp230-3 with your own namenode hostnames.
It may also be necessary for you to replace nn1 and nn2 in the example above with your own namenode aliases. To verify, check the following property in your hdfs-site.xml file:

<property>
<name>dfs.ha.namenodes.MYCOMPANY_HADOOP02</name>
<value>namenode10,namenode66</value>
</property>

In this case, you would make the following replacements:
• Replace nn1 with namenode10
• Replace nn2 with namenode66
In most cases, you should set the dfs.client.use.datanode.hostname property to true. If you have a setup where your TPA nodes are on one BYNET, the Hadoop cluster is on another BYNET, and they are communicating with one another via Ethernet, then you should also set the dfs.datanode.use.datanode.hostname property to true.

hiveserver
The DNS host name or IP address of Hive Server2. This is used when a query results in the use of the HCTAS and HDROP procedures or FOREIGN TABLE SELECT in Hive. (You can use an application, such as Ambari, to obtain this value.) If no value is specified for hiveserver, then the value for server is used.

hiveport
The port for access to Hive Server2; typically this is 10000. You can use an application, such as Ambari, to obtain this value.

merge_hdfs_files
Export only. Indicates that files under the same partition should be merged whenever possible. The default is to not merge. A value of TRUE means that files will be merged.

row_count_report_freq
The frequency with which the byte count is updated in DBQL. The default is every 100 rows. You can adjust this to a larger value if the update frequency is too resource intensive.

security
Specifies the name of the external security system used for authentication on the Hadoop cluster. This parameter is required when an external security system is in use. The default is no security. Valid values are:
• kerberos
• ldap
Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH 5.4 is not supported.

tablename
The name of the table to be imported or exported. This parameter is optional. You can specify a tablename value in the foreign server to limit its scope to a specific table. The value specified in the USING clause in the CREATE FOREIGN SERVER syntax overrides any corresponding value specified directly in the user query.

temp_dbname
Import only. The value is the name of the Hadoop database to use to store temporary Hive staging tables. This parameter is optional. You should consider using temp_dbname when planning to use FOREIGN TABLE syntax on a foreign server or when a foreign server is set up to use usenativequalification. If no database is specified with temp_dbname, then the default Hadoop database is used. To use the specified database, it must exist when the foreign server is created or altered to use the database. The session user must have create and write permission to the database. If multiple users use the same foreign server, then the Hadoop administrator may want to consider setting up Hive authorization in such a way that temporary tables cannot be read by another user.

transformformatting
Import only. When set to 'true', it indicates that array list data is formatted appropriately, so that it can be cast directly into a Teradata array column type based on the appropriate data type. This parameter is optional. The value specified in the USING clause in the CREATE FOREIGN SERVER syntax overrides any corresponding value specified directly in the user query.

updatestatistics
Export only. Indicates that the LOAD_TO_HCATALOG_abcn_n_n operator updates the table statistics after all the data has been loaded into the target Hive table. Valid values are 'true' and 'false'. A value of true means that the table statistics are updated.
Note: You must also have set hive.stats.autogather to true in your hive-site.xml file for updatestatistics to work properly.
usenativequalification
Import only. A value of 'true' indicates that SELECT queries should be pushed down to Hive as much as possible. When a foreign server uses usenativequalification, Teradata Database examines the following conditions:
• Hive table data size is large and there are qualifying predicates on non-partitioned Hive columns. Large is defined as having a number of splits that is larger than the number of Teradata Database nodes.
• The queried Hive object is a view.
When either of the two conditions is met, Teradata Database constructs a Hive query from the Hive object name, the referenced columns, and the qualifying predicates. It then creates a Hive staging table (in the database specified by temp_dbname) from the constructed query and retrieves data in the staging table. The staging table is dropped after all data has been retrieved. Valid values are 'true' and 'false'.
For queries that involve joins between two HCatalog tables, Teradata Database brings the data into Teradata spool and joins them in the database. The join is not pushed into Hive. For example, the join syntax in the following query requires the manual FOREIGN TABLE SELECT to be accomplished in Hive:

SELECT h1.c1, h2.c2
FROM h1@hadoop1, h2@hadoop1
WHERE h1.id = h2.id ;

username
The name of the Hadoop user's credential. This option is ignored when the security name value pair is defined for the foreign server. If no username value and no security value are defined, the foreign server uses the name of the user making the request. (This is the Teradata Database user name in capital letters.) No password is associated with a Hadoop user. HDFS and Hive check for a user name for access. If no user name is specified, then the foreign server supplies the name of the session user. If HDFS and Hive are not configured for file permissions, then the user name is optional.

Required Privileges
You must have the DROP SERVER privilege on the TD_SERVER_DB database or on the specified foreign server to modify the foreign server object. If you are modifying the table operators that are associated with the server, or adding a table operator, you must also have the EXECUTE FUNCTION and SELECT privileges on the specified table operators.

Examples of Using ALTER FOREIGN SERVER

Example: Adding a New Attribute
The following example adds a new attribute to an existing server object. In this example, INSERT and SELECT operations are limited to the table named cardata.

ALTER FOREIGN SERVER hadoop2 ADD tablename('cardata') ;

Example: Defining an EXPORT Option for an Existing Server
The following example adds an EXPORT table operator to an existing foreign server object:

ALTER FOREIGN SERVER hive_metastore_server
ADD EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0
USING merge_hdfs_files('true') compression_codec('io.seqfile.compression.type=BLOCK') ;

Usage Notes
You cannot use the following names in the name value pairs in ALTER SERVER statements:
• Columns
• hExplain
• IsNested
• Servermode
Note: External security options and ADD or DROP clauses must be specified in the syntax.
BEGIN LOGGING

Purpose
Starts the auditing of SQL requests that attempt to access data. This topic describes only the portions of the BEGIN LOGGING syntax diagram that are specific to this Teradata QueryGrid connector. For information about the other syntax that you can use with BEGIN LOGGING, see SQL Data Definition Language - Syntax and Examples, B035-1144.

Syntax
Condensed from the full syntax diagram:

BEGIN LOGGING [ DENIALS ] [ WITH TEXT ]
  ON { FIRST | LAST | FIRST AND LAST | EACH }
  [ FOR CONSTRAINT constraint_name ]
  { ALL | operation [, ...] | GRANT }
  [ BY { database_name | user_name } [, ...] ]
  [ ON { AUTHORIZATION authorization_name |
         DATABASE database_name |
         USER user_name |
         { TABLE | VIEW | MACRO | PROCEDURE | FUNCTION | TYPE | FOREIGN SERVER }
           [ database_name. | user_name. ] object_name } [, ...] ] ;

Syntax Element

ON FOREIGN SERVER object_name
Indicates that the database object for which access is to be logged is a foreign server. You must specify an object name, which is the name of the foreign server. You can optionally specify the name of the containing database, which must be TD_SERVER_DB. You cannot use a user_name with FOREIGN SERVER.

For more information about using BEGIN LOGGING, see SQL Data Definition Language - Syntax and Examples, B035-1144.
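As a sketch, the following statement would begin logging every request, with SQL text, made against a foreign server; the server name is illustrative:

BEGIN LOGGING WITH TEXT ON EACH ALL
ON FOREIGN SERVER TD_SERVER_DB.hadoop1 ;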
For example, if the foreign server connects to a Hadoop server protected by LDAP, then the associated authorization object must contain credentials for the user account in LDAP. If the foreign server connects to a Hadoop server protected by Kerberos, then the associated authorization object must contain credentials for the user account in Kerberos. The syntax table describes only the portions of the CREATE AUTHORIZATION and REPLACE AUTHORIZATION syntax diagram that are specific to Teradata QueryGrid. For information about the other syntax that you can use with CREATE AUTHORIZATION and REPLACE AUTHORIZATION, see SQL Data Definition Language - Syntax and Examples, B035-1144. Syntax Syntax Elements database_name. user_dbname. Optional name of the location where the authorization is to be stored. The default location that is used changes based on whether DEFINER or INVOKER is specified. The following rules apply to specifying DEFINER or INVOKER: • If you specify DEFINER, the database or user you specify must be the containing database or user for the foreign server, UDF, table UDF, method, or external SQL procedure. If no location is specified, the authorization is created in the database that contains the foreign server objects (TD_SERVER_DB). • If you specify INVOKER, the database_name or user_dbname you specify must be associated with the session user who will be sending requests to the foreign server. If no location is specified, the authorization is placed in the user database of the creator of the authorization. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 21 Chapter 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop CREATE AUTHORIZATION and REPLACE AUTHORIZATION authorization_name Name for the authorization object. This name must be unique within the database in which it is stored. INVOKER DEFINER • If you specify INVOKER TRUSTED, or if you specify TRUSTED alone, Teradata creates the authorization object in the database of the user who creates the object. This syntax makes the authorization available only to those with privilege to the user database. • If you specify DEFINER TRUSTED or DEFINER DEFAULT TRUSTED, then Teradata creates the authorization object in the database that contains the object that is using the authorization; for a foreign server this is the TD_SERVER_DB database. This syntax makes the authorization globally available. TRUSTED A keyword used to specify that the credentials are to be encrypted and stored as database objects. When using an authorization object, you must use the TRUSTED security type for Teradata QueryGrid: Teradata Database-to-Hadoop. You cannot use TRUSTED authorizations in CREATE or REPLACE UDF or XSP statements. 'fs_user_name' The name of the credential on the remote platform to be used by the foreign server. 'fs_password' The password for the credential on the remote platform to be used by the foreign server. All existing rules for CREATE AUTHORIZATION and REPLACE AUTHORIZATION apply. For more information about using CREATE AUTHORIZATION and REPLACE AUTHORIZATION, see SQL Data Definition Language - Syntax and Examples, B035-1144. Usage Notes • An authorization is required only if you are using an external security system (such as LDAP or Kerberos) for authentication on the foreign server's target platform. For more information, see LDAP and Kerberos Authentication Security. 
Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH 5.4 is not supported. • You must use either INVOKER TRUSTED or DEFINER TRUSTED when authentication on Hadoop is performed by an external security system such as LDAP or Kerberos. • Use INVOKER TRUSTED when you want to create a one-to-one mapping between the Teradata user and the user on the foreign server's target platform. For example, using the same user name for Teradata and LDAP. • Use DEFINER TRUSTED when you want to create a many-to-one mapping between Teradata users and a user on the foreign server's target platform. For example, when you want multiple Teradata users who are making requests to the foreign server to use one LDAP account on the target platform. 22 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop CREATE FOREIGN SERVER • When you create an authorization for another user using INVOKER TRUSTED, user_dbname must be specified. Specify the username associated with the session user who will be sending requests to the foreign server. If you fail to specify user_dbname, the authorization will be stored in your user database. • The authorization takes up no space in the database used to store it. • If your credentials change on the foreign server's target platform, you must remember to replace the credentials in your authorization object. If you fail to update the invalid information, the next time that you try to reference the foreign server object, you get an error message. • If you drop an authorization object, keep in mind that it may be used by multiple foreign server objects. You should either drop the foreign server objects or alter them so that they specify a valid authorization object. If you fail to update the invalid information, the next time that you try to reference the foreign server object, you get an error message. Examples of Creating and Replacing the Authorization If you plan to use the authorization to authenticate to LDAP or Kerberos on a foreign server then you must use either INVOKER TRUSTED or DEFINER TRUSTED. The following two examples establish authorization for the user who invokes the object. The credentials are encrypted and stored as a database object in the user database. CREATE AUTHORIZATION sales AS INVOKER TRUSTED USER 'johnson' PASSWORD 'Secret' ; REPLACE AUTHORIZATION sales AS TRUSTED USER 'williams' PASSWORD 'topsecret' ; If you want to make the authorization available globally, create the authorization on TD_SERVER_DB using the DEFINER TRUSTED type. CREATE AUTHORIZATION TD_SERVER_DB.remote_system1 AS DEFINER TRUSTED USER 'proxy_1' PASSWORD 'Global' ; If you use DEFINER TRUSTED, as in this example, then the credentials for johnson are stored in the sales authorization created in the TD_SERVER_DB database. CREATE AUTHORIZATION TD_SERVER_DB.sales AS DEFINER TRUSTED USER 'johnson' PASSWORD 'Secret'; CREATE FOREIGN SERVER Purpose Creates a foreign server object and associates table operators with it. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 23 Chapter 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop CREATE FOREIGN SERVER When you create a server object, you can customize it based on its purpose. You can define multiple server objects for the same remote database, each with different characteristics needed by different users. 
You can use name value pairs to define the characteristics of the foreign server. You can use global parameters to define the foreign server as a whole. Some table operators have local parameters that are specific to the operator. Some parameters can be used so that the server object overrides user selection; for example, you limit access to data by setting the table name. Syntax operator option table_operator using option database_name. Syntax Elements server_name The name given to the foreign server object. EXTERNAL SECURITY Associates an authorization object with the foreign server. The authorization stores the encrypted credentials for a user as a database object. The Teradata QueryGrid connector passes the credentials in the authorization to the remote platform identified by the foreign server when the foreign server is accessed. You must use EXTERNAL SECURITY TRUSTED for Teradata QueryGrid: Teradata Database-to-Hadoop when the Hadoop platform is protected by an external security system, such as Kerberos, for example. INVOKER 24 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop CREATE FOREIGN SERVER DEFINER INVOKER is a keyword that indicates that the associated authorization must be present in the user database at the time that the foreign server is accessed. Note: The user database is the database that was created for the user in the Teradata system when the user account was created. DEFINER is a keyword that indicates that the associated authorization must be present in the database that contains the foreign server when the foreign server is accessed. Note: The DEFAULT keyword that can be used with DEFINER in CREATE AUTHORIZATION and REPLACE AUTHORIZATION statements is not needed in association with a foreign server. You must use either INVOKER TRUSTED or DEFINER TRUSTED if the remote platform uses an external security system (such as Kerberos, for example) for authentication. TRUSTED A keyword that indicates the associated authorization object was created as TRUSTED. authorization_name Specifies the name of the authorization object to be used when the foreign server is accessed. Using Option USING USING introduces the global name value pairs (NVPs) that provide the server definition information. USING must be followed by at least one name value pair of the form name('value' ), but an empty value of ' ' is supported. You can create a foreign server without a USING clause, but users cannot query a foreign server until you complete the server definition with an import operator and an export operator. The USING clause that appears in the server area (in front of the table operators ) contains global parameters that define the connection to the remote platform and can be applied to both the import and export table operators. The USING clause that appears in the operator option part of the syntax diagram contains local parameters that are used just for that table operator. A name value pair can be used in any USING clause location unless otherwise indicated as "Server only," "Import only," and "Export only." name('value') The name value pair or pairs that you specify to define the foreign server. For descriptions of the name value pairs used with the server object, see "Required Name Value Pairs" and "Optional Name Value Pairs." Required Name Value Pairs These name value pairs are required to create a functioning foreign server object. 
Additional optional name value pairs may be required to create a foreign server for a specific implementation. hosttype Server only. For Teradata QueryGrid: Teradata Database-to-Hadoop, this is ('hadoop'). Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 25 Chapter 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop CREATE FOREIGN SERVER port Server only. The server port number for the Hive Metastore; typically this is 9083. server Server only. The DNS host name or IP address for the Apache Hive Metastore (hive.metastore.uris). You can use an application, such as Ambari, to obtain this value. Optional Name Value Pairs These name value pairs are optional. However, a particular server implementation may require you to define some of these name value pairs. For example, a foreign server must be defined with a Hive port to support queries that access Hive Server2. clustername Required when using security('kerberos'). Specifies the directory name that stores the JAR file that contains the configuration files (core-site.xml, hdfs-site.xml, hive-site.xml, hive-site.xml, and yarn-site.xml) for the Hadoop cluster to be accessed. This directory was set up during the Hadoop client installation on the Teradata nodes. For example, you would use the name value pair clustername('yourcluster') if the files are stored as follows: • yourcluster/ • yourcluster/core-site.xml • yourcluster/hdfs-site.xml • yourcluster/hive-site.xml • yourcluster/hive-site.xml • yourcluster/yarn-site.xml compression_codec Export only. Specifies the type of compression to use for the exported data. The default is no compression. The supported compression types are based on the compression codecs configured on the Hadoop system. Note: Snappy is supported but you must specify it in hadoop_properties using <orc.compression=SNAPPY> as the argument. Snappy is supported only for the ORCFile file type. You must specify the full name for the compression codec as follows: • org.apache.hadoop.io.compress.DefaultCodec • org.apache.hadoop.io.compress.GzipCodec • org.apache.hadoop.io.compress.BZip2Codec dbname The name of the user’s database. This parameter is optional. You can specify a dbname value in the foreign server to limit its scope to a specific database. The value specified in the USING clause in the CREATE FOREIGN SERVER syntax overrides any corresponding value specified directly in the user query. default_string_size Size at which data imported from or exported to Hadoop String columns is truncated. When applied to the import operator, the value represents the maximum number of Unicode characters to import, and defaults to 2048 characters. When applied to the 26 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop CREATE FOREIGN SERVER export operator, the value represents the maximum number of bytes to export, and defaults to 4096 bytes. Teradata QueryGrid silently truncates the String columns at the default value set in default_string_size. hadoop_properties ('<property1=value>, <property3=value1,value2>') Sets specific properties used to interact with Hadoop. If there are multiple arguments, you must delimit them with angle brackets. If there is only one argument, you can omit the angle brackets. For example, the syntax for the hadoop_properties clause for a High Availability (HA) target supports an updated syntax where multiple values can be included. In this case the properties must be enclosed by left and right angle brackets. 
No spaces are allowed within or between arguments. The High Availability hadoop properties are defined based on the name service that is defined on your hadoop server. For example, if you have the following Hadoop properties: hadoop_properties(' <dfs.client.use.datanode.hostname=true> ,<dfs.datanode.usedatanode.hostname=true> ,<dfs.nameservices=MYCOMPANY_HADOOP02> ,<dfs.ha.namenodes.MYCOMPANY_HADOOP02=nn1,nn2> ,<dfs.namenode.rpc-address.MYCOMPANY_HADOOP02.nn1=hdp230-2:8020> ,<dfs.namenode.rpc-address.MYCOMPANY_HADOOP02.nn2=hdp230-3:8020> ,<dfs.client.failover.proxy.provider.MYCOMPANY_HADOOP02=org.apache. hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider>') In this example, you would make the following replacements: • Replace MYCOMPANY_HADOOP02 with your own name service ID. • Replace hdp230-2 and hdp230-3 with your own namenode hostnames. It may also be necessary for you to replace nn1 and nn2 in the example above with your own namenode aliases. To verify, check the following property in your hdfssite.xml file: <property> <name>dfs.ha.namenodes.MYCOMPANY_HADOOP02</name> <value>namenode10,namenode66</value> </property> In this case, you would make the following replacements: • Replace nn1 with namenode10 • Replace nn2 with namenode66 In most cases, you should set the dfs.client.use.datanode.hostname property to true. If you have a setup where your TPA nodes are on one BYNET, the Hadoop cluster is on another BYNET, and they are communicating with one another via Ethernet, then you should also set the dfs.datanode.usedatanode.hostname property to true. hiveserver The DNS host name or IP address of Hive Server2. This is used when a query results in the use of the HCTAS and HDROP procedures or FOREIGN TABLE SELECT in Hive. (You can use an application, such as Ambari, to obtain this value.) If no value is specified for hiveserver then the value for server is used. hiveport The port for access to the Hive Server2; typically this is 10000. You can use an application, such as Ambari, to obtain this value. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 27 Chapter 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop CREATE FOREIGN SERVER merge_hdfs_files Export only. Indicates that files under the same partition should be merged whenever possible. The default is to not merge. A value of TRUE means that files will be merged. row_count_report_freq The frequency with which byte count is updated in DBQL. The default is every 100 rows. You can adjust this to a larger value if the update frequency is too resource intensive. security Specifies the name of the external security system used for authentication on the Hadoop cluster. This parameter is required when an external security system is in use. The default is no security. Valid values are: • kerberos • ldap Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH 5.4 is not supported. tablename The name of the table to be imported or exported. This parameter is optional. You can specify a tablename value in the foreign server to limit its scope to a specific table. The value specified in the USING clause in the CREATE FOREIGN SERVER syntax overrides any corresponding value specified directly in the user query. temp_dbname Import only. The value is the name of the Hadoop database to use to store temporary Hive staging tables. This parameter is optional. 
You should consider using temp_dbname when planning to use FOREIGN TABLE syntax on a foreign server or when a foreign server is set up to use usenativequalification. If no database is specified with temp_dbname, then the default Hadoop database is used. To use the specified database, it must exist when the foreign server is created or altered to use the database. The session user must have create and write permission to the database. If multiple users use the same foreign server then the Hadoop administrator may want to consider setting up Hive authorization in such a way that temporary tables cannot be read by another user. transformformatting Import only. When set to 'true' it indicates that an array list data is formatted appropriately, so that it can be cast directly into a Teradata array column type based on the appropriate data type. This parameter is optional. The value specified in the USING clause in the CREATE FOREIGN SERVER syntax overrides any corresponding value specified directly in the user query. updatestatistics Export only. Indicates that the LOAD_TO_HCATALOG_abcn_n_n operator updates the table statistics after all the data has been loaded into the target Hive table. Valid values are 'true' and 'false'. A value of true means that the table statistics are updated. Note: You must also have set hive.stats.autogather to true in your hive-site.xml file for updatestatistics to work properly. 28 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop CREATE FOREIGN SERVER usenativequalification Import only. A value of 'true' indicates that SELECT queries should be pushed down to Hive as much as possible. When a foreign server uses usenativequalification, Teradata Database examines the following conditions: • Hive table data size is large and there are qualifying predicates on non-partitioned Hive columns. Large is defined as having a number of splits that is larger than the number of Teradata Database nodes. • The queried Hive object is a view. When either of the two conditions are met, Teradata Database constructs a Hive query from the Hive object name, the referenced columns, and the qualifying predicates. It then creates a Hive staging table (in the database specified by temp_dbname) from the constructed query and retrieves data in the staging table. The staging table is dropped after all data has been retrieved. Valid values are 'true' and 'false.' For queries that involve joins between two HCatalog tables, Teradata Database brings the data into Teradata spool and joins them in the database. The join is not pushed into Hive. For example, the join syntax in the following query requires the manual FOREIGN TABLE SELECT to be accomplished in Hive: SELECT h1.c1, h2.c2 FROM h1@hadoop1, h2@hadoop1 WHERE h1.id = h2.id ; username The name of the Hadoop user's credential. This option is ignored when the security name value pair is defined for the foreign server. If no username value and no security value are defined, the foreign server uses the name of user making the request. (This is the Teradata Database user name in capital letters.) No password is associated with a Hadoop user. HDFS and Hive check for a user name for access. If no user name is specified, then foreign server supplies the name of the session user. If HDFS and Hive are not configured for file permissions, then the user name is optional. DO IMPORT WITH Associates an IMPORT table operator with a foreign server. 
DO IMPORT WITH
Associates an IMPORT table operator with a foreign server.
Note: You can specify table operators in any order.

DO EXPORT WITH
Associates an EXPORT table operator with a foreign server.
Note: You can specify table operators in any order.

Operator Option

database_name.
The name of the database that contains the operator that you want to call. For example, SYSLIB.

table_operator
The name of the table operator to use for import or export. The Teradata-to-Hadoop connector provides the following table operators for use:

Table Operators                                           Connector Releases That Provide These Operators
LOAD_FROM_HCATALOG_HDP1_3_2, LOAD_TO_HCATALOG_HDP1_3_2    15.0, 15.0.1, 15.0.2, and 15.0.3
LOAD_FROM_HCATALOG_HDP2_1_2, LOAD_TO_HCATALOG_HDP2_1_2    15.0.1, 15.0.2, 15.0.3, and 15.0.4
LOAD_FROM_HCATALOG_HDP2_3_0, LOAD_TO_HCATALOG_HDP2_3_0    15.0.4
LOAD_FROM_HCATALOG_CDH5_4_3, LOAD_TO_HCATALOG_CDH5_4_3    15.0.4

DO IMPORT WITH uses the LOAD_FROM_HCATALOG_abcn_n_n table operators. These table operators retrieve data from a Hadoop distributed database into Teradata Database, where the data can be placed in tables or joined with existing tables. These table operators produce a spooled table that contains rows and columns of data from a user-specified Hadoop table that is defined in the HCatalog of the remote system.

DO EXPORT WITH uses the LOAD_TO_HCATALOG_abcn_n_n table operators. These table operators export data from Teradata Database into a Hadoop distributed database, where the data can be placed in tables or joined with existing tables.

Note: When you create a foreign server, specify the table operator name with the distribution acronym and version number that corresponds with the version of the Hadoop distribution that you are using. For example, LOAD_FROM_HCATALOG_CDH5_4_3 is compatible with Cloudera CDH version 5.4.3.

Supported Data Types, HCatalog File Types, and Compression

The following table shows the data types supported by the Teradata-to-Hadoop connector and how they are mapped during import and export.

Hadoop Data Type    Teradata Database Data Type
String              VARCHAR UNICODE CHARSET
Boolean             BYTEINT
Integer             INT
BigInt              BIGINT
Float               FLOAT
Double              FLOAT
BINARY              VARBYTE
MAP                 VARCHAR UNICODE CHARSET
Struct              VARCHAR UNICODE CHARSET
ARRAY               VARCHAR UNICODE CHARSET
TINYINT             BYTEINT
SMALLINT            SMALLINT
Date                DATE
Timestamp           TIMESTAMP
Decimal             DECIMAL
VARCHAR             VARCHAR
CHAR                CHAR
Data Import Notes

The Hadoop String data type does not have a maximum length. Teradata Database has a row size limit of approximately 64K, and exceeding this row size limit results in an error. The default VARCHAR string size of 4096 bytes (2048 Unicode characters) permits approximately 14 Hadoop String columns to be imported successfully and held in a single Teradata Database row. (Note that the row header and columns of other types are part of the row size and may reduce the number of String columns that can be imported.) You may want to change the default_string_size in the foreign server to a different value based on the typical string size and the number of Hadoop String columns to be imported. For example, if your row size typically exceeds the size limit, you may want to take a best-fit approach and set default_string_size to a value smaller than 2048 so that imports can be performed without error. Strings are truncated at the value set in default_string_size. Alternatively, you may want to import large Hadoop STRING columns using a more customized approach. For more information, see RETURNS Clause.

The Teradata Database data type JSON must first be cast to VARCHAR or CLOB for export. It is imported as a VARCHAR or CLOB and cast to JSON.

BOOLEAN data types are stored as true/false literal values in Hadoop HiveQL. In Teradata, Hadoop BOOLEAN data types are mapped to BYTEINT data types. When Teradata imports Boolean data from Hadoop, the true/false literal values are stored as 1/0 values, respectively, in the corresponding Teradata BYTEINT column.

Data Export Notes

Date and Timestamp values are assumed to be UTC. For more information, see Timestamp Data.

You should be aware of the display differences that occur when you export BOOLEAN data from Teradata into Hadoop:
• If you export data into a Hive table that you created by using the Hive command line interface, the BYTEINT data type in Teradata is mapped to a BOOLEAN in HCatalog, and the 1/0 values are displayed as true/false literals in Hive.
• If you export data into a new Hive table that you created by using the Teradata HCTAS stored procedure, then the BYTEINT data type in Teradata is mapped to TINYINT in HCatalog, which has the same range as BYTEINT. So, the BYTEINT 1/0 values (Boolean) are exported as-is into the TINYINT column in HCatalog.

In Hive, if you want to display Boolean true/false literal values in an HCatalog table, instead of the 1/0 values, you can use the built-in CAST conversion function in Hive to convert the display of a TINYINT value to the BOOLEAN primitive type. For example:

SELECT CAST(column_bool_1 AS BOOLEAN) from HCTAS_TBL;

where column_bool_1 is defined in the HCatalog table as follows:

column_bool_1 TINYINT,

HCatalog File Types and Compression

The following shows the file types and compression types supported by the Teradata-to-Hadoop connector.

Hortonworks HDP 2.1.2 and 2.3.0 support the TextFile, RCFile, and ORCFile import file types:
• TextFile supports the DefaultCodec, BZip2Codec, and GzipCodec compression types.
• RCFile supports block compression.
• ORCFile supports block compression and SnappyCodec.

Cloudera 5.4.3 supports the TextFile and RCFile import file types:
• TextFile supports the DefaultCodec, BZip2Codec, and GzipCodec compression types.
• RCFile supports block compression.

Required Privileges

You must have the CREATE SERVER privilege on the TD_SERVER_DB database to define a foreign server object. If you are associating the server with table operators, you must also have the EXECUTE FUNCTION and SELECT privileges on the specified table operators.
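As a minimal sketch of granting these privileges (the user name is hypothetical; this grants at the SYSLIB database level, which the privileges discussion later in this guide notes as an alternative to granting on the individual table operators):

GRANT CREATE SERVER ON TD_SERVER_DB TO dbc_admin;
GRANT EXECUTE FUNCTION ON SYSLIB TO dbc_admin;
GRANT SELECT ON SYSLIB TO dbc_admin;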
Usage Notes

• The target platform of the foreign server object must be running and reachable when you create the foreign server object for it in Teradata Database.
• You can create multiple named foreign server objects that reference the same server using the same IP and port numbers.
• Foreign server object names that are stored in TD_SERVER_DB must be unique.
• Teradata treats the hosttype name value pair as special. If you specify this name value pair, you must use it in the server-area name value list.
• Name value pairs in the server area of the syntax apply to the connection to the remote platform and to both of the table operators specified in the IMPORT WITH and EXPORT WITH clauses.
• Name value pairs in the IMPORT WITH or EXPORT WITH clause apply only to the table operator specified in the clause.
• Server options, names, and name value pairs can appear only once in the CREATE FOREIGN SERVER syntax. Name value pairs used within the IMPORT WITH and EXPORT WITH clauses cannot duplicate those used in the server-area name value list.
• The order of the DO IMPORT WITH and DO EXPORT WITH clauses in the CREATE SERVER syntax does not matter.
• You must grant the SELECT, INSERT, and SHOW privileges on foreign server objects to users who need to query them.
• The EXTERNAL SECURITY clause is required when the foreign server's target platform uses LDAP or Kerberos for authentication. For more information, see LDAP and Kerberos Authentication Security.

You cannot use the following names in the name value pairs in CREATE SERVER statements:
• Columns
• hExplain
• IsNested
• Servermode

Examples of Using CREATE FOREIGN SERVER

A standard foreign server definition must contain the following NVPs:
• server
• port
• hosttype

Most foreign server definitions also use the following NVPs:
• hiveserver
• hiveport
• username

hadoop_properties may also need to be defined in the following situations:
• Data Nodes have a private network (multi-homed)
• High Availability is enabled

For a description of the name value pairs used to define the foreign server, see Using Option.

Example: Typical Server Definition With IMPORT and EXPORT Table Operators

The following example creates a server object and associates an IMPORT table operator and an EXPORT table operator with it:

CREATE FOREIGN SERVER hadoop1
USING
hosttype('hadoop')
server('192.0.2.3')
port('9083')
hiveport('10000')
username('hive')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0;

Example: Creating a Server Object for LDAP

The following example creates a server object for an LDAP-protected Hadoop cluster. It uses an authentication object named auth_hdp that is located in the user database for the session user:

CREATE FOREIGN SERVER TD_SERVER_DB.hadoop2
EXTERNAL SECURITY INVOKER TRUSTED auth_hdp
USING
hosttype('hadoop')
server('hserver_name.example')
port('9083')
hiveport('10000')
security('ldap')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_1_2,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_1_2;

Example: Creating a Server Object for Kerberos

The following example creates a server object for a Kerberos-protected Hadoop cluster.
It uses an authentication object named auth_cdh that is located in the user database for the session user:

CREATE FOREIGN SERVER TD_SERVER_DB.hadoop2
EXTERNAL SECURITY INVOKER TRUSTED auth_cdh
USING
hosttype('hadoop')
server('hserver_name.example')
port('9083')
hiveport('10000')
security('kerberos')
clustername('foo')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_CDH5_4_3,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_CDH5_4_3;

Example: Using the Unicode Delimited Identifier to Create a Server Object

The following example creates a server object using the Unicode Delimited Identifier for the server name:

CREATE FOREIGN SERVER U&"hadoop#005fsrv" UESCAPE '#'
USING
server('hive_metastore_server')
port('9083')
hosttype('hadoop')
hiveport('10000')
username('hive')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0 USING transformformatting('true') ;

Example: Using a Double-quoted Object Name to Create a Server Object

The following example creates a server object using a double-quoted object name for the server name and associates an IMPORT table operator with it:

CREATE FOREIGN SERVER TD_SERVER_DB."hadoop srv1"
USING
server('hive_metastore_server')
port('9083')
hosttype('hadoop')
hiveport('10000')
username('hive')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_1_2 USING transformformatting('true') ;

CREATE FUNCTION (Table Form)

Purpose

Creates a table function definition. This syntax diagram excerpt shows the addition of the EXECUTE THREADSAFE parameter to CREATE FUNCTION (Table Form). For information about the other syntax that you can use with CREATE FUNCTION (Table Form), see SQL Data Definition Language - Syntax and Examples, B035-1144.

Syntax

(Syntax diagram excerpt: among the optional function characteristics of the table form, such as the language clause, the SQL data access clause, SPECIFIC [db_name. | user_name.] specific_function_name, PARAMETER STYLE { SQL | JAVA }, [NOT] DETERMINISTIC, and CALLED ON NULL INPUT, the diagram adds EXECUTE [NOT] THREADSAFE.)

Syntax Elements

EXECUTE THREADSAFE
Indicates that the function is to be loaded with a special thread-safe loader. This attribute applies only to Java UDFs. It uses additional memory, because each AMP instance is loaded separately, and it is used only when classes are not thread safe. The use of EXECUTE THREADSAFE is not recommended.
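As a rough sketch only (the function, JAR, and class names are hypothetical, and the full clause set and ordering are documented in B035-1144), the clause appears among the other function characteristics like this:

CREATE FUNCTION my_table_udf ()
RETURNS TABLE (c1 INTEGER)
LANGUAGE JAVA
NO SQL
PARAMETER STYLE JAVA
EXECUTE NOT THREADSAFE
EXTERNAL NAME 'my_jar:com.example.MyTableUdf.processRow';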
For information about using CREATE FUNCTION (Table Form), see SQL Data Definition Language - Syntax and Examples, B035-1144.

DROP FOREIGN SERVER

Purpose

Drops a foreign server object from the TD_SERVER_DB database. In addition to deleting the server object and its associated information from the dictionary tables, all dependent entries on the associated table operators are deleted. You must have the DROP SERVER privilege on the TD_SERVER_DB database or on the specified foreign server to drop the foreign server.

Syntax

DROP FOREIGN SERVER [TD_SERVER_DB.]server_name ;

Syntax Elements

server_name
The name of the foreign server object. You can also use the following formats for the server name:
• the Unicode Delimited Identifier, such as U&"foreign#005fsv" UESCAPE '#'
• the double-quoted object name, such as "foreign srv1"

TD_SERVER_DB.
The name of the database that stores server objects and their attributes.

Examples of Dropping a Foreign Server

These examples show dropping a server object.

DROP FOREIGN SERVER hive_metastore_server ;

DROP FOREIGN SERVER U&"hadoop#005fsrv" UESCAPE '#' ;

DROP FOREIGN SERVER "hcatalog server" ;

END LOGGING

Purpose

Ends the auditing of SQL requests that started with a BEGIN LOGGING request. This topic describes only the portions of the END LOGGING syntax diagram that are specific to Teradata QueryGrid. For information about the other syntax that you can use with END LOGGING, see SQL Data Definition Language - Syntax and Examples, B035-1144.

Syntax

(Syntax diagram excerpt: the END LOGGING statement's ON clause now accepts FOREIGN SERVER object_name in addition to AUTHORIZATION, DATABASE, USER, TABLE, VIEW, MACRO, PROCEDURE, FUNCTION, and TYPE objects.)

Syntax Elements

ON operation
Indicates the operation for which log entries should no longer be made.

ON FOREIGN SERVER object_name
Indicates that the operation for which log entries should no longer be made is access to a foreign server. You must specify an object name, which is the name of the foreign server. You can optionally specify the name of the containing database, which must be TD_SERVER_DB. You cannot use a user_name with FOREIGN SERVER.

For information about using END LOGGING, see SQL Data Definition Language - Syntax and Examples, B035-1144.

GRANT and REVOKE

GRANT grants one or more explicit privileges on a database, foreign server, user, proxy logon user, table, hash index, join index, view, stored procedure, User-Defined Function (UDF), User-Defined Method (UDM), User-Defined Type (UDT), or macro to a role, group of roles, user, or group of users or databases. REVOKE revokes privileges on the same objects. There are no changes to the existing syntax for Teradata QueryGrid, except that the CREATE SERVER and DROP SERVER privileges have been added. These privileges should be granted only to a user, not to a database. For a syntax diagram and description of the syntax elements that you can use in GRANT and REVOKE, see SQL Data Control Language, B035-1149.
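For example, granting query access on a foreign server object to an end user might look like the following sketch (the server and user names are hypothetical):

GRANT SELECT, INSERT, SHOW ON TD_SERVER_DB.hadoop1 TO analyst_1;
REVOKE INSERT ON TD_SERVER_DB.hadoop1 FROM analyst_1;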
HELP FOREIGN

Purpose

Returns the details of the foreign object that you specify.
• A foreign server object name returns the list of databases accessible on the server.
• The name of a database on a foreign server returns the list of tables in the remote database on the server.
• The name of a table in a remote database on a foreign server returns the list of columns in the remote table on the server.

Syntax

HELP FOREIGN SERVER server_name ;
HELP FOREIGN DATABASE db_name@server_name ;
HELP FOREIGN TABLE db_name.table_name@server_name ;

Syntax Elements

SERVER server_name
The name of the foreign server. Displays the databases on the foreign server.

DATABASE db_name@server_name
The name of the remote database, qualified with the name of the foreign server. Displays the tables in the database.

TABLE db_name.table_name@server_name
The name of the remote table, qualified with the name of the foreign server. Displays the column names, types, and partitioning.

Required Privileges

You must have ANY privilege on the server object to display the output from HELP FOREIGN.

Examples of Using HELP FOREIGN

Example: Using HELP FOREIGN to List Databases

The import table operator behavior determines the information that this statement returns. The table and database names, if specified, are passed as name value pairs to the import table operator to retrieve the appropriate information. The response to the HELP statement is a SELECT response. The number of columns and rows returned for this statement are the same as in a regular SELECT response from Teradata.

Assume you have created the following server:

CREATE FOREIGN SERVER HADOOPSRV
USING
SERVER('HIVE_METASTORE_SERVER')
PORT('9083')
HOSTTYPE('HADOOP')
HIVEPORT('10000')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_1_2,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_1_2;

And a user types the following query:

HELP FOREIGN SERVER HADOOPSRV ;

The output looks similar to the following:

*** Query completed. 4 rows found. One column returned.
*** Total elapsed time was 6 seconds.

databases
------------------------------------------------
books
default
product
test

Example: Using HELP FOREIGN to List Tables

If you use HELP with a database name, it returns a list of the tables in the database, as the following example shows:

HELP FOREIGN DATABASE product@hadoopSrv;

*** Query completed. One row found. One column returned.
*** Total elapsed time was 3 seconds.

tables
-------------------------------------------
cellphonedata_t

Example: Using HELP FOREIGN to List the Columns in a Table

This example shows the syntax used to list the columns in the cellphonedata_t table in the product database on the foreign server.

.sidetitles on
.foldline on
HELP FOREIGN TABLE product.cellphonedata_t@hive_metastore_server;

The output looks similar to the following:

*** Query completed. 13 rows found. 3 columns returned.
*** Total elapsed time was 3 seconds.

name internal_memory
column_type int
partitioned_column f

name model
column_type string
partitioned_column f

name weight
column_type float
partitioned_column f

name colors
column_type string
partitioned_column f

name camera
column_type float
partitioned_column f

name chipset
column_type string
partitioned_column f

name sim
column_type string
partitioned_column f

name operating_system
column_type string
partitioned_column f

name touchscreen
column_type string
partitioned_column f

name memory_slot
column_type string
partitioned_column f

name stand_by_time
column_type int
partitioned_column f

name dt
column_type string
partitioned_column t

name country
column_type string
partitioned_column t

SHOW FOREIGN SERVER

Purpose

Displays the SQL text most recently used to create, drop, or modify the server object.
A SHOW FOREIGN SERVER statement allows you to see a server object definition that contains the name value pairs that the associated table operators use to connect to the foreign server.

Syntax

SHOW [IN XML] FOREIGN SERVER [TD_SERVER_DB.]server_name ;

Syntax Elements

IN XML
Returns the report in XML format. The XML schema for the output produced by this option is maintained in: http://schemas.teradata.com/dbobject/DBobject.xsd

TD_SERVER_DB.
The name of the database that stores foreign server objects and their parameters.

server_name
The name of the foreign server object.

For the full syntax diagram and information about the other objects that can be used with SHOW, see SQL Data Definition Language - Syntax and Examples, B035-1144.

Required Privileges

SHOW FOREIGN SERVER requires the SHOW privilege or ANY privilege on the server object to display the output.

Examples of Using SHOW FOREIGN SERVER

This example demonstrates a CREATE SERVER followed by a SHOW SERVER statement, and then its output.

CREATE FOREIGN SERVER hadoopSrv
USING
server('hive_metastore_server')
port('9083')
hosttype('hadoop')
hiveport('10000')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0;

The SHOW FOREIGN SERVER statement for this server results in output that looks similar to the following:

CREATE FOREIGN SERVER TD_SERVER_DB.hadoopSrv
USING
server ('hive_metastore_server')
port ('9083')
hosttype ('hadoop')
hiveport ('10000')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0;

If you use the SHOW IN XML FOREIGN SERVER syntax, the output appears similar to the following:

<?xml version="1.0" encoding="UTF-16" standalone="no" ?>
<TeradataDBObjectSet version="1.0" xmlns="http://schemas.teradata.com/dbobject"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schemas.teradata.com/dbobject http://schemas.teradata.com/dbobject/DBObject.xsd">
<ForeignServer dbName="TD_SERVER_DB" name="hadoopSrv" objId="0:2996" objVer="1">
<ServerClauseList>
<Clause name="server" value="hive_metastore_server"/>
<Clause name="port" value="9083"/>
<Clause name="hosttype" value="hadoop"/>
<Clause name="hiveport" value="10000"/>
</ServerClauseList>
<ImportClause tblopdb="SYSLIB" tblopname="LOAD_FROM_HCATALOG_HDP2_3_0"/>
<ExportClause tblopdb="SYSLIB" tblopname="LOAD_TO_HCATALOG_HDP2_3_0"/>
</ForeignServer>
<Environment>
<Server dbRelease="15g.00.00.434" dbVersion="15.00.00.00sqlh_16" hostName="td1410"/>
<User userId="0000FF03" userName="UT1"/>
<Session charset="ASCII" dateTime="2014-01-09T15:50:38"/>
</Environment>
</TeradataDBObjectSet>

Using the Teradata-to-Hadoop Connector in SELECT Statements

Purpose

SELECT returns specific row data in the form of a result table.

Usage Notes

For the Teradata-to-Hadoop connector, you can use the table_name@server_name syntax to reference a table on a foreign server or to specify a pass-through query to be executed on a specified foreign server. The reference to the external table calls the IMPORT table operator that is associated with the server definition.

You can use FOREIGN TABLE syntax in the FROM clause to perform a pass-through query or to retrieve results from the specified foreign server. For example, you can specify a Hive query as the remote pass-through information, and the import table operator returns the results of the query for processing. You can specify the remote pass-through information as a quoted string to exclude it from Teradata analysis.

The table name may optionally specify a database_name, for example database_name.table_name.
The database_name references an equivalent name space on the server. For Hadoop, the reference is to the database schema in Hive/HCatalog. The table_name that you use must refer to a base table on the foreign server. References to a view or view-like object are not supported.

Note that queries that use the table operators can be CPU intensive. Teradata recommends that you use workload management rules to minimize CPU usage by queries that use the table operators. For more information, see Post-Installation Configuration.

For additional information about using SELECT, see SQL Data Manipulation Language, B035-1146.

Example of Using SELECT with Remote Pass-through Information

Assume the following query:

SELECT * FROM FOREIGN TABLE (SELECT count(*) FROM vim.cardata)@hadoop3 myt1 (x);

A reference to an external query to pass through (identified by the FOREIGN TABLE (…)@server_name syntax) calls the IMPORT table operator that is associated with the server definition. The grammar in the parentheses is not checked, but is tokenized and then passed to the remote server for execution. The query returns the following:

*** Query completed. One row found. One column returned.
*** Total elapsed time was 41 seconds.

x
-----------
4

Example of Using SELECT FROM a Table

The following example shows the use of SELECT FROM with a Hadoop table, using the table_name@server_name syntax.

SELECT * FROM vim.cardata@hadoop2 WHERE make = 'buick';

Example of Limiting the Data Being Spooled

The following example demonstrates use of the WHERE clause, which does not limit the data being imported over the network, but does limit the data being spooled.

SELECT * FROM vim.cardata@hadoop2 as D1 WHERE liter<4 ;
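If you want the predicate applied on the Hadoop side instead, so that only qualifying rows cross the network, you can push the qualification into a pass-through query. The following sketch reuses the hypothetical table from the previous example and follows the FOREIGN TABLE pattern shown earlier:

SELECT * FROM FOREIGN TABLE (SELECT make, model FROM vim.cardata WHERE liter < 4)@hadoop2 t1 (make, model);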
RETURNS Clause

The table operator portion of SELECT supports the use of a RETURNS clause to define the expected output columns. The RETURNS clause supports either a column list or a table definition. This clause is typically used when the output column definitions are known and there is no need to access the remote meta-store to dynamically determine the output columns.

Note: A view definition is not supported.

Examples: Using the RETURNS Clause

For the Teradata-to-Hadoop connector LOAD_FROM_HCATALOG_abcn_n_n table operator function, the RETURNS clause maps STRING columns in an HCatalog table to VARCHAR(2048) CHARACTER SET UNICODE on Teradata during import. At times, the values of the STRING columns on Hadoop may be larger or smaller than VARCHAR(2048), so in those cases you can choose to specify the actual size of the Hadoop STRING columns by listing those columns in the RETURNS clause with an appropriate VARCHAR display size. For example, you can use the following query:

SELECT make, model, price FROM vim.cardata@hadoop2 RETURNS (make VARCHAR(2), model VARCHAR(50)) as D ;

The function also supports mapping large Hadoop STRING columns to BLOB or CLOB columns on Teradata. Define the corresponding Hadoop string columns as BLOB or CLOB in the RETURNS clause, for example, as follows:

select i, s from vim.tdjson@hadoop2 RETURNS(s clob(2000)) ;

By default, the function converts an ARRAY column retrieved from an HCatalog table to a JSON string and then maps that to a VARCHAR(2048) CHARACTER SET UNICODE column on Teradata. If you want to map the Hadoop ARRAY to a matching Teradata ARRAY type, you can indicate that in the RETURNS clause, so that the Hadoop ARRAY type is converted to a special VARCHAR string that can be cast to a matching Teradata ARRAY type. For example:

create Type strarray as varchar(25) Array[2] ;

SELECT TOP 3 CAST(sarray AS strarray) as stringType FROM vim.arrayall@hadoop2 RETURNS (sarray strarray) ;

Timestamp Data

When you perform a query that imports data from Hadoop (that is, it uses LOAD_FROM_HCATALOG_abcn_n_n), timestamp data is assumed to be UTC. When you perform a query that exports data to Hadoop (that is, it uses LOAD_TO_HCATALOG_abcn_n_n), timestamp data is converted to UTC.

For the following examples, assume that the following data exists on a Hadoop cluster:

hive -e "SELECT * FROM tab_csv"
OK
1 2010-01-01 10:00:00

Assume that the Teradata server has the following settings:

16. System TimeZone Hour   = 7
17. System TimeZone Minute = 0
18. System TimeZone String = Not Set
57. TimeDateWZControl      = 3 (Enabled with LOCAL)

Example: Import Timestamp Data

This example imports timestamp data:

IMPORT (SELECT * FROM tab_csv@foreign_server;)

The data goes through the following conversions:
1. The table operator imports the data (2010-01-01 10:00:00).
2. It assumes that the data is UTC, so Teradata converts it to system time (2010-01-01 17:00:00).
3. Teradata then converts it back to UTC to display the value in the user session (2010-01-01 10:00:00).
4. The session default time zone is +7 hours: (2010-01-01 17:00:00).

The value in step 4 varies, depending on the session time zone selected. For example:

set time zone 'gmt';

With this session time zone setting, the displayed timestamp would be (2010-01-01 10:00:00).

Example: Export Timestamp Data

This example exports timestamp data:

(INSERT tab1@foreign_server (2010-01-01 10:00:00))

The data goes through the following conversions:
1. Teradata converts the data to UTC, based on the session time zone (2010-01-01 03:00:00).
2. The timestamp data is written to the Hadoop disk (2010-01-01 03:00:00).

The value in step 1 varies, depending on the session time zone selected. For example:

set time zone 'gmt';

With the above session time zone setting, the data is converted to (2010-01-01 10:00:00).

set time zone 'America pacific';

With the above session time zone setting, the data is converted to (2010-01-01 18:00:00).

Using the Teradata-to-Hadoop Connector in INSERT Statements

Purpose

Adds new rows to a named table by directly specifying the row data to be inserted (valued form) or by retrieving the new row data from another table (selected, or INSERT … SELECT form).

Usage Notes

If you refer to an external table as the target of an INSERT/SELECT statement, the INSERT is automatically resolved and calls the EXPORT table operator (LOAD_TO_HCATALOG_abcn_n_n) that is associated with the foreign server. You identify the table as an external table by using the syntax table_name@server_name. After the server name is resolved, Teradata Database automatically fills in the database name and the table name and executes the associated EXPORT operator.

To INSERT into a Hadoop table, the table must already exist, but it can be empty or populated. You can optionally specify a database name in addition to the table name, as in database_name.table_name. The database_name references an equivalent name space on the foreign server. For Hadoop, the reference is to the database schema in Hive/HCatalog. The table_name that you use must refer to a base table on the foreign server. References to a view or view-like object are not supported.

The @server syntax in an INSERT statement is not supported as an action statement in a database trigger definition.

Hive treats line terminators (by default \r (0x0d) and \n (0x0a)) as end-of-row markers and assumes that the data that follows a line terminator is part of the next row. Teradata does not remove the line terminators before it exports data to a Hadoop system. You must deal with the line terminators appropriately in your query before the system exports the data to Hadoop. For example, the following command replaces all line breaks (\n) in varchar_col1 with spaces:

SELECT oreplace(varchar_col1, '0a'xc, ' ') from tab1
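Because Hive also treats \r as a line terminator by default, you may need to clean both characters before exporting. A minimal sketch that nests two oreplace calls, using the same hypothetical table and column as above:

SELECT oreplace(oreplace(varchar_col1, '0d'xc, ' '), '0a'xc, ' ') from tab1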
Any timestamp data is converted to UTC. For more information, see Timestamp Data.

Note that queries that use the table operators can be CPU intensive. Teradata recommends that you use workload management rules to minimize CPU usage by queries that use the table operators. For more information, see Post-Installation Configuration.

For information about using INSERT, see SQL Data Manipulation Language, B035-1146.

Example of Using INSERT

In this example of INSERT, the inner operator writes data to the Hadoop file system and produces entries to be placed in HCatalog. The output of the inner operator is directed to one AMP, which registers the entries with HCatalog.

INSERT INTO vim.customer@hadoop3 SELECT * FROM customer ;

Restricted Words

The following words are restricted:
• SERVER
• IMPORT
• EXPORT

CHAPTER 3 Stored Procedures for Teradata QueryGrid: Teradata Database-to-Hadoop

Introduction

Teradata supplies stored procedures that you can use to create and drop Hadoop tables. You can use these procedures so that SQL scripts can export data in a standalone manner.

HCTAS Stored Procedure

Purpose

Creates the schema of a local Teradata table in Hadoop. You can use this procedure so that SQL scripts can export data in a standalone manner. You must have the SELECT privilege on the foreign server to use this stored procedure.

Syntax

[SYSLIB.]HCTAS('table_name', 'partition_columns_list', 'table_definition', 'foreign_servername', 'hive_db_name')

Syntax Elements

table_name
The name of the Teradata table to use for referencing column type definitions when creating an Apache Hive table. Column types are translated to the corresponding column types supported by the INSERT.

partition_columns_list
A comma-separated, ordered list of Teradata columns to use as partitioned columns in Hive.

table_definition
The Hive table definition information, such as location or format type.

foreign_servername
The name of the foreign server on which you want to create the table. Used for permissions and connection information.
hive_db_name
The name of the Hive database in which you want to create the table.

Supported Data Types

HCTAS currently supports the following mapping of data types.

Teradata Data Type    2.1 Hive Data Type    1.3.2 Hive Data Type
INT                   INT                   INT
BIGINT                BIGINT                BIGINT
BYTEINT               TINYINT               TINYINT
BYTE                  BINARY                BINARY
NUMBER                DOUBLE                DOUBLE
REAL                  DOUBLE                DOUBLE
SMALLINT              SMALLINT              SMALLINT
VARBYTE               BINARY                BINARY
VARCHAR               STRING                STRING
CHAR(n)               CHAR(n)               STRING
DECIMAL(n)            DECIMAL(n)            DECIMAL
DATE                  DATE                  STRING
TIMESTAMP             TIMESTAMP             STRING
ALL ELSE              STRING                STRING

Usage Notes

To use HCTAS, you must have the following name value pairs defined for the foreign server:
• Hive port: Use port 10000 for the Hive server, for example, hiveport('10000').
• Server name: The hostname or IP address of the Hive server, for example, server('hive_metastore_server').

Examples of Using HCTAS to Create a Table Schema

This basic query creates a table without a partition on the Hadoop server.

CALL SYSLIB.HCTAS('test',null,null,'hive_metastore_server','default') ;

This syntax results in the following output:

hive> describe test;
OK
c1    string    None
c2    string    None
c3    int       None

The following query returns more information:

CALL SYSLIB.HCTAS('test','c1,c2','LOCATION "/user/hive/test_table"','hive_metastore_server','default') ;

It results in the following output:

OK
c3    int       None
c1    string    None
c2    string    None

# Partition Information
# col_name    data_type    comment
c1            string       None
c2            string       None

HDROP Stored Procedure

Purpose

Drops a Hadoop table on a foreign server. You can use this procedure in SQL scripts to drop tables in a standalone manner. You must have the SELECT privilege on the foreign server to use this stored procedure.

Syntax

[SYSLIB.]HDROP('hive_db_name', 'hive_table_name', 'foreign_servername')

Syntax Elements

hive_db_name
The Hive database where the table is located.

hive_table_name
The name of the Hive table that you want to drop.

foreign_servername
The name of the foreign server on which you want to drop the table.

HDROP Usage Notes

To use HDROP, you must have the following name value pairs defined for the foreign server:
• Hive port: Use port 10000 for the Hive server, for example, hiveport('10000').
• Server name: The hostname or IP address of the Hive server, for example, server('hcatalog_server').

Example of Using HDROP to Drop a Hadoop Table

The following example demonstrates the use of HDROP to drop a Hadoop table:

CALL SYSLIB.HDROP('defaultDB','testTable','hive_metastore_server') ;

CHAPTER 4 Privileges and Security for Teradata QueryGrid: Teradata Database-to-Hadoop

Privileges Needed to Use Teradata QueryGrid

Privileges for Administrators

CREATE SERVER and DROP SERVER are object-level privileges that restrict who can use the CREATE FOREIGN SERVER and DROP FOREIGN SERVER SQL statements.
• CREATE SERVER can be granted only on the TD_SERVER_DB database as a whole.
• DROP SERVER can be granted on the TD_SERVER_DB database or on individual foreign server objects.
• The CREATE SERVER and DROP SERVER privileges are included if you grant ALL privileges on the TD_SERVER_DB database.

In addition to the CREATE SERVER and DROP SERVER privileges, administrators need the EXECUTE FUNCTION and SELECT privileges on the import and export table operators, or on the SYSLIB database that contains the table operators, in order to create, drop, and modify foreign server objects.

The creator of a foreign server object implicitly receives the following privileges on the object:
• SHOW privilege WITH GRANT OPTION
• DROP SERVER privilege WITH GRANT OPTION
• SELECT privilege WITH GRANT OPTION
• If the foreign server object is capable of exporting data (that is, the CREATE FOREIGN SERVER statement includes the DO EXPORT WITH clause), the creator automatically receives the INSERT privilege WITH GRANT OPTION

CREATE AUTHORIZATION and DROP AUTHORIZATION privileges are required to work with authorization objects referenced by foreign server objects. DROP AUTHORIZATION is automatically granted to the creator of an authorization object.

Privileges for Users of the Foreign Server Object

• Users who will be querying the remote database must be granted the SELECT, INSERT, and SHOW privileges on the foreign server object used to access the remote server.
• Granting the ALL privilege on a foreign server object implicitly grants other privileges that depend on the nature of the foreign server:
• If the foreign server object can import data from the remote database (that is, the CREATE FOREIGN SERVER statement included a DO IMPORT WITH clause), granting the ALL privilege on the foreign server implicitly includes the SELECT, SHOW, and DROP privileges.
• If the foreign server object can export data to the remote database (that is, the CREATE FOREIGN SERVER statement included a DO EXPORT WITH clause), granting the ALL privilege on the foreign server implicitly includes the INSERT, SHOW, and DROP privileges.

Maintaining Security

You can maintain security for Teradata QueryGrid: Teradata-to-Hadoop by:
• Setting appropriate privileges for those who create and manage foreign server objects and, if used, authorization objects.
• Setting up foreign server objects that match the appropriate access to the databases and tables needed by Teradata Database users.
• Granting appropriate privileges on foreign server objects to Teradata Database users.

The physical security of data as it resides on disk or is transferred across the network is not addressed by Teradata QueryGrid. Teradata QueryGrid does not support encryption across networks or any authentication security.

You may want to consider the following security guidelines:
• Do not grant EXECUTE FUNCTION privileges on the functions in SYSLIB to users performing queries on the foreign server.
• Grant the CREATE SERVER and DROP SERVER privileges only to a trusted database administrator who administers the server setup.
• The trusted database administrator can then grant the SELECT or INSERT privilege on the server objects to a subset of users.
• The trusted database administrator can set up authentication using one of the methods described in LDAP and Kerberos Authentication Security.
• If an external security system (LDAP or Kerberos) is in use on the Hadoop cluster, the user specified in an authorization object must exist in the external security system.
• When an authorization object is used, the user name is used for both HDFS and Hive access.
• For a Hadoop cluster protected by LDAP, Hive permissions are required even for HDFS-only access.
• The user may or may not belong to any group on the Hadoop cluster.
• On the Hadoop platform, HDFS and Hive permissions must be set up appropriately or permission will be denied.

Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH 5.4 is not supported.

LDAP and Kerberos Authentication Security

You can set up Teradata QueryGrid: Teradata Database-to-Hadoop to authenticate to a Hadoop cluster that is protected by an external security system, such as LDAP or Kerberos. The Teradata-to-Hadoop connector uses an authorization object to pass on the credentials needed to authenticate to LDAP or Kerberos. Teradata QueryGrid: Teradata Database-to-Hadoop does not work with a Kerberized cluster where Hive requires LDAP authentication.

Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH 5.4 is not supported.

If the foreign server does not use LDAP or Kerberos, you can define a fixed user name as the value for username in the USING clause of the foreign server. All users of that foreign server then access the Hadoop data under that fixed name. If no username is defined in the foreign server, then the name of the user making the request is used.

Authorization Objects and Mapping

You can create an authorization object, which stores the credentials for a user in the Hadoop security system (LDAP or Kerberos) in encrypted form. You can set up the Teradata-to-Hadoop connector to use authorization objects based on your security needs and administrative convenience.

If you need one-to-one mapping between a Teradata Database user and a Hadoop user, then you must have corresponding accounts in Teradata Database and the security system. When that user creates the authorization using AS INVOKER TRUSTED, the authorization is stored by default in the user database. The credentials for the security system do not need to be revealed to another person, and the authorization object is accessible only to users with privileges on that database.

You can use many-to-one mapping between multiple Teradata Database users and one user in the Hadoop security system to simplify administration. Only the creator of the authorization needs to know the credentials for the user on the Hadoop security system. When the authorization is created using AS DEFINER TRUSTED, the authorization is stored by default in the TD_SERVER_DB database, which makes the authorization available globally.

Where the Foreign Server Looks for the Authorization Object

When a foreign server is configured with the INVOKER keyword and no value is specified for the database name (dbname), the Teradata-to-Hadoop connector automatically looks for the authorization in the user database of the session user. When a foreign server is configured with the DEFINER keyword and no value is specified for the database name (dbname), the Teradata-to-Hadoop connector automatically looks for the authorization in the TD_SERVER_DB database.
Setup Process for LDAP and Kerberos Authentication

If you are using Hortonworks HDP 2.1 or 2.3, you can follow this process to give Teradata Database users access to a Hadoop cluster that uses LDAP or Kerberos authentication. If you are using Cloudera CDH 5.4, you can follow this process to give Teradata Database users access to a Hadoop cluster that uses Kerberos authentication.

1. Create the required authorization objects based on your mapping scheme.
2. For Kerberos only, the Hadoop core-site.xml must contain a proxy user entry for each Kerberos user principal used for authentication. An initial set of proxy users was added during the Teradata-to-Hadoop connector installation. If you want to use additional proxy users, you must add them to core-site.xml. For more information, see Configuring a Kerberos User for Use as a Proxy.
3. Create the foreign server object:
• Use the required syntax for your authorization. For example, if the authorization is created using DEFINER, the foreign server must be created using DEFINER.
• In the USING clause, include security and specify the system being used (ldap or kerberos).
• If you are using Kerberos, then you must include clustername to specify the directory name in the auxiliary JAR file under which the Hadoop XML configuration files reside.
4. Grant the SELECT privilege and the INSERT privilege on the foreign server object to the desired set of users.

For information on authorizations, see CREATE AUTHORIZATION and REPLACE AUTHORIZATION. For more information on foreign servers, see CREATE FOREIGN SERVER.

Example: Kerberos Using INVOKER

This example creates the remote_hdp authorization object in the creator's user database. If the creator is td_user, then td_user.remote_hdp is the fully qualified object name.

create authorization remote_hdp as invoker trusted
user 'kerberos_user'
password 'kerberos_pass';

This example creates a foreign server object that uses the remote_hdp authorization object.

create foreign server hdp21
external security invoker trusted remote_hdp
using
hosttype('hadoop')
port('9083')
hiveport('10000')
server('hdp21.example.com')
security('kerberos')
clustername('spiral')
do import with syslib.load_from_hcatalog_hdp2_1_2,
do export with syslib.load_to_hcatalog_hdp2_1_2;

The clustername value of spiral matches the directory name in the auxiliary JAR file that was created during installation of QueryGrid. clustername is required for a Kerberos-protected Hadoop cluster.

Example: Kerberos Using DEFINER

This example creates the remote_cdh authorization object in the td_server_db database.

create authorization td_server_db.remote_cdh as definer trusted
user 'kerberos_proxy_user'
password 'kerberos_proxy_pass';

This example creates a foreign server object that uses the remote_cdh authorization object.
create foreign server cdh54
external security definer trusted remote_cdh
using
hosttype('hadoop')
port('9083')
hiveport('10000')
server('cdh54.example.com')
security('kerberos')
clustername('spiral')
do import with syslib.load_from_hcatalog_cdh5_4_3,
do export with syslib.load_to_hcatalog_cdh5_4_3;

The clustername value of spiral matches the directory name in the auxiliary JAR file that was created during installation of QueryGrid. clustername is required for a Kerberos-protected Hadoop cluster.

Example: LDAP Using INVOKER

This example creates the remote_hdp authorization object in the creator's user database. If the creator is td_user, then td_user.remote_hdp is the fully qualified object name.

create authorization remote_hdp as invoker trusted
user 'ldap_user'
password 'ldap_pass';

This example creates a foreign server object named hdp21 that uses the remote_hdp authorization object.

create foreign server hdp21
external security invoker trusted remote_hdp
using
hosttype('hadoop')
port('9083')
hiveport('10000')
server('hdp21.example.com')
security('ldap')
do import with syslib.load_from_hcatalog_hdp2_1_2,
do export with syslib.load_to_hcatalog_hdp2_1_2;

Example: LDAP Using DEFINER

This example creates the remote_hdp authorization object in the td_server_db database.

create authorization td_server_db.remote_hdp as definer trusted
user 'ldap_proxy_user'
password 'ldap_proxy_pass';

This example creates a foreign server object named hdp21 that uses the remote_hdp authorization object.

create foreign server hdp21
external security definer trusted remote_hdp
using
hosttype('hadoop')
port('9083')
hiveport('10000')
server('hdp21.example.com')
security('ldap')
do import with syslib.load_from_hcatalog_hdp2_3_0,
do export with syslib.load_to_hcatalog_hdp2_3_0;

Kerberos Maintenance

You must update the configuration of Teradata QueryGrid: Teradata Database-to-Hadoop under these circumstances:
• You want a foreign server to be able to access Hadoop using a new Kerberos user principal (that is, a Kerberos user not previously used for authentication by any foreign server). For more information, see Configuring a Kerberos User for Use as a Proxy.
• The name or location of the default Kerberos realm or the location of the host for your KDC (Key Distribution Center) or administration server changes. For more information, see Updating Kerberos Configuration Information.

Configuring a Kerberos User for Use as a Proxy

The core-site.xml file for the Hadoop NameNode must include information for each Kerberos user who will access Hadoop from Teradata Database. During the installation of Teradata QueryGrid: Teradata Database-to-Hadoop, an initial set of users was added to the core-site.xml file. If you want to use a new user as a proxy, then properties for that user must be added to core-site.xml. This task must be performed before you use an authorization object created for that user.

If you are using Hortonworks HDP, you can use an application, such as Ambari, that you would normally use to edit the service property values in core-site.xml. For information about how to edit core-site.xml, refer to your tool's documentation. If you are using Cloudera CDH, you can use Cloudera Manager to edit the core-site.xml file.
1. In core-site.xml, add a property for groups where you replace user_name with the name of the user: hadoop.proxyuser.user_name.groups
2. Add a key value of * to indicate a member of any group, or specify groups by name in a comma-separated list.
3. Add a property for hosts where you replace user_name with the name of the user: hadoop.proxyuser.user_name.hosts
4. Add a key value of * to indicate that the proxy can connect from any host (that is, a Teradata node), or specify hosts by name in a comma-separated list.
5. If you are using Cloudera CDH, use Cloudera Manager to redeploy the configuration files. After redeployment, the following files can be found in the /etc/hive/conf directory:
• core-site.xml
• hdfs-site.xml
• hive-site.xml
• mapred-site.xml
• yarn-site.xml

Property Example

This example shows the properties added for a proxy user named myproxy_user.

<property>
<name>hadoop.proxyuser.myproxy_user.groups</name>
<value>group1,group2</value>
<description>
Allow the proxy user myproxy_user to impersonate any members of the groups group1 or group2.
</description>
</property>
<property>
<name>hadoop.proxyuser.myproxy_user.hosts</name>
<value>host1,host2</value>
<description>
Allow the proxy user myproxy_user to connect only from host1 and host2 to impersonate a user. It is recommended to use the IP addresses of the Teradata nodes.
</description>
</property>

Updating Kerberos Configuration Information

During the installation of Teradata QueryGrid: Teradata Database-to-Hadoop, communication was set up between the Teradata Database and the Kerberos authentication server or realm. If you make changes to your default Kerberos realm or to the location of the host for your KDC or administration server, then you must update that information in the krb5.conf file. The file is located on the Teradata Database nodes on which the Kerberos client is installed and on the Hadoop cluster nodes. You may use any tools that you would normally use to edit the krb5.conf file and install the JAR file. For information, refer to your tool's documentation.

1. Navigate to the krb5.conf files on all nodes in both systems and set up communication between the Teradata Database and the Kerberos authentication server or realm. In the following example, the entries for the C1.HADOOP.MYCOMPANY.COM realm are the content to update:

[libdefaults]
default_realm = C1.HADOOP.MYCOMPANY.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
forwardable = yes
udp_preference_limit = 1

[realms]
EXAMPLE.COM = {
kdc = kerberos.example.com
admin_server = kerberos.example.com
}
C1.HADOOP.MYCOMPANY.COM = {
kdc = spiral1.mydivision.mycompany.com:88
admin_server = spiral1.mydivision.mycompany.com:749
default_domain = hadoop.com
}

[domain_realm]
.hadoop.com = C1.HADOOP.MYCOMPANY.COM
hadoop.com = C1.HADOOP.MYCOMPANY.COM

[logging]
kdc = FILE:/var/log/krb5/krb5kdc.log
admin_server = FILE:/var/log/krb5/kadmind.log
default = SYSLOG:NOTICE:DAEMON
2. Depending on the distribution you are using, do one of the following tasks:
• Hortonworks HDP: Create a JAR file directory, and in it create a JAR file that contains the required configuration files.
• Cloudera CDH: Create a JAR file directory that reflects the nameservices name, and in it create a JAR file that contains the required configuration files.

For example:

jar cvf spiral.jar spiral/*.xml

Enabling security for Kerberos requires a clustername clause in the CREATE SERVER statement. The value of this clause must match the directory name under which the XML files reside in the auxiliary JAR files created by the user:
• core-site.xml
• hdfs-site.xml
• hive-site.xml
• mapred-site.xml
• yarn-site.xml

In this example, the clustername clause value is spiral and must be specified and match the directory name.

spiral/
spiral/core-site.xml
spiral/hdfs-site.xml
spiral/hive-site.xml
spiral/mapred-site.xml
spiral/yarn-site.xml

3. Complete the procedure in Configuring Kerberos Settings for Teradata QueryGrid.

Configuring Kerberos Settings for Teradata QueryGrid

If you are configuring Kerberos, complete this procedure after using PUT to install the Teradata QueryGrid connector.

Note: tdsqlh_td 15.00.03.xx is the minimum version of the Teradata QueryGrid connector package required for use with Kerberos.

Configuring Kerberos Settings When Using Hortonworks HDP

1. Edit the tdsqlh_hdp.bteq file to install the JAR file created earlier and add it to the CLASSPATH:
• mycluster designates the directory name created earlier in this procedure.
• myjar.jar designates the JAR file created earlier in this procedure.

a. Add the following lines into tdsqlh_hdp.bteq near similar lines of code:

CALL sqlj.install_jar('cj!myjar.jar','mycluster',0);
CALL sqlj.replace_jar('cj!myjar.jar','mycluster');

b. Modify the following statements in tdsqlh_hdp.bteq by adding (*,mycluster) to the end of the statements:

CALL sqlj.alter_java_path('SQLH_HDP2_1_2','(*,tdsqlh_hdp_HDP2_1_2)(*,avro_HDP2_1_2)(*,commons-cli_HDP2_1_2)(*,commons-codec_HDP2_1_2)(*,commons-configuration_HDP2_1_2)(*,commons-lang_HDP2_1_2)(*,commons-logging_HDP2_1_2)(*,datanucleus-core_HDP2_1_2)(*,guava_HDP2_1_2)(*,hadoop-auth_HDP2_1_2)(*,hadoop-common_HDP2_1_2)(*,hadoop-hdfs_HDP2_1_2)(*,hadoop-mr-common_HDP2_1_2)(*,hadoop-mr-core_HDP2_1_2)(*,hive-common_HDP2_1_2)(*,hive-exec_HDP2_1_2)(*,hive-hcat-core_HDP2_1_2)(*,hive-jdbc_HDP2_1_2)(*,hive-metastore_HDP2_1_2)(*,hive-serde_HDP2_1_2)(*,hive-service_HDP2_1_2)(*,httpclient_HDP2_1_2)(*,httpcore_HDP2_1_2)(*,jackson-core-asl_HDP2_1_2)(*,jetty_HDP2_1_2)(*,jetty-util_HDP2_1_2)(*,libfb303_HDP2_1_2)(*,log4j_HDP2_1_2)(*,pig_HDP2_1_2)(*,slf4j-api_HDP2_1_2)(*,slf4j-log4j12_HDP2_1_2)(*,snappy-java_HDP2_1_2)(*,mycluster)');

CALL sqlj.alter_java_path('SQLH_NO_VER','(*,tdsqlh_hdp_HDP2_1_2)(*,avro_HDP2_1_2)(*,commons-cli_HDP2_1_2)(*,commons-codec_HDP2_1_2)(*,commons-configuration_HDP2_1_2)(*,commons-lang_HDP2_1_2)(*,commons-logging_HDP2_1_2)(*,datanucleus-core_HDP2_1_2)(*,guava_HDP2_1_2)(*,hadoop-auth_HDP2_1_2)(*,hadoop-common_HDP2_1_2)(*,hadoop-hdfs_HDP2_1_2)(*,hadoop-mr-common_HDP2_1_2)(*,hadoop-mr-core_HDP2_1_2)(*,hive-common_HDP2_1_2)(*,hive-exec_HDP2_1_2)(*,hive-hcat-core_HDP2_1_2)(*,hive-jdbc_HDP2_1_2)(*,hive-metastore_HDP2_1_2)(*,hive-serde_HDP2_1_2)(*,hive-service_HDP2_1_2)(*,httpclient_HDP2_1_2)(*,httpcore_HDP2_1_2)(*,jackson-core-asl_HDP2_1_2)(*,jetty_HDP2_1_2)(*,jetty-util_HDP2_1_2)(*,libfb303_HDP2_1_2)(*,log4j_HDP2_1_2)(*,pig_HDP2_1_2)(*,slf4j-api_HDP2_1_2)(*,slf4j-log4j12_HDP2_1_2)(*,snappy-java_HDP2_1_2)(*,mycluster)');
Teradata Database-to-Hadoop User Guide 61 Chapter 4 Privileges and Security for Teradata QueryGrid: Teradata Database-to-Hadoop LDAP and Kerberos Authentication Security Configuring Kerberos Settings When Using Cloudera CDH 1 Edit the tdsqlh_cdh.bteq file to install the JAR file created earlier and add it to CLASSPATH: • mycluster designates the directory name created earlier in this procedure. • myjar.jar designates the JAR file created earlier in this procedure. a Add the following lines into tdsqlh_cdh.bteq near similar lines of code: CALL sqlj.install_jar('cj!myjar.jar','mycluster',0); CALL sqlj.replace_jar('cj!myjar.jar','mycluster'); b Modify the following statements in tdsqlh_cdh.bteq by adding (*,mycluster) to the end of the statements: CALL sqlj.alter_java_path('SQLH_cdh5_4_3','(*,tdsqlh_cdh_cdh5_4_3) (*,commons_collections_cdh5_4_3)(*,hive_serde_cdh5_4_3) (*,guava_cdh5_4_3)(*,commons_io_cdh5_4_3)(*,log4j_cdh5_4_3) (*,hadoop_hdfs_cdh5_4_3)(*,avro_cdh5_4_3)(*,hive_service_cdh5_4_3) (*,commons_cli_cdh5_4_3)(*,hive_exec_cdh5_4_3) (*,commons_logging_cdh5_4_3)(*,libfb303_cdh5_4_3) (*,datanucleus_core_cdh5_4_3)(*,hadoop_auth_cdh5_4_3) (*,commons_configuration_cdh5_4_3)(*,hadoop_common_cdh5_4_3) (*,hadoop_core_cdh5_4_3)(*,hadoop_mrcapp_cdh5_4_3) (*,hadoop_mrccommon_cdh5_4_3)(*,hadoop_mrccore_cdh5_4_3) (*,htrace_com_cdh5_4_3)(*,servlet_api_cdh5_4_3) (*,hive_jdbc_cdh5_4_3)(*,commons_codec_cdh5_4_3) (*,commons_lang_cdh5_4_3)(*,hive_hcat_core_cdh5_4_3) (*,jdo_api_cdh5_4_3)(*,hive_common_cdh5_4_3) (*,hive_metastore_cdh5_4_3)(*,protobuf_java_cdh5_4_3) (*,httpclient_cdh5_4_3)(*,httpcore_cdh5_4_3)(*,pig_cdh5_4_3) (*,mycluster)'); CALL sqlj.alter_java_path('SQLH_NO_VER','(*,tdsqlh_cdh_cdh5_4_3) (*,commons_collections_cdh5_4_3)(*,hive_serde_cdh5_4_3) (*,guava_cdh5_4_3)(*,commons_io_cdh5_4_3)(*,log4j_cdh5_4_3) (*,hadoop_hdfs_cdh5_4_3)(*,avro_cdh5_4_3)(*,hive_service_cdh5_4_3) (*,commons_cli_cdh5_4_3)(*,hive_exec_cdh5_4_3) (*,commons_logging_cdh5_4_3)(*,libfb303_cdh5_4_3) (*,datanucleus_core_cdh5_4_3)(*,hadoop_auth_cdh5_4_3) (*,commons_configuration_cdh5_4_3)(*,hadoop_common_cdh5_4_3) (*,hadoop_core_cdh5_4_3)(*,hadoop_mrcapp_cdh5_4_3) (*,hadoop_mrccommon_cdh5_4_3)(*,hadoop_mrccore_cdh5_4_3) (*,htrace_com_cdh5_4_3)(*,servlet_api_cdh5_4_3) (*,hive_jdbc_cdh5_4_3)(*,commons_codec_cdh5_4_3) (*,commons_lang_cdh5_4_3)(*,hive_hcat_core_cdh5_4_3) (*,jdo_api_cdh5_4_3)(*,hive_common_cdh5_4_3) (*,hive_metastore_cdh5_4_3)(*,protobuf_java_cdh5_4_3) (*,httpclient_cdh5_4_3)(*,httpcore_cdh5_4_3)(*,pig_cdh5_4_3) (*,mycluster)'); 62 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide CHAPTER 5 Administration and Utilities for Teradata QueryGrid: Teradata Database-to-Hadoop Creating the Server Database To use the Teradata-to-Hadoop connector, the TD_SERVER_DB database must exist to hold server objects and their associated information. This database is created by running the Database Initialization Program (DIP). DIP is a series of executable SQL script files packaged with Teradata Database. Each DIP script creates one or more system users, databases, macros, tables, and views, for use by the Teradata Database and/or by users. All of the DIP scripts that you need should have been executed during Teradata Database installation. For information about using the DIP scripts, see Utilities, B035-1102. Post-Installation Configuration Some configuration is required before you use the Teradata-to-Hadoop connector. For example, the following kinds of things may need to be changed: • FSGcache concurrency settings. 
• Workload management rules that control the number of concurrent queries.
• Java Virtual Machine (JVM) settings. A change to the JVM settings requires a database restart.

You should also make sure that a proxy user has been set up on the Hadoop cluster. For more information, see the version of the Orange Book, Teradata® QueryGrid™: Teradata Database-to-Hadoop, publication number 541-0009812, that supports release 15.0.4 of the Teradata-to-Hadoop connector.

Tuning Concurrency Between FSGCache and the JVM

To process Teradata-to-Hadoop connector requests, the node-level JVM requires a significant amount of Java heap and permanent memory space to handle thread safety and HDFS buffering. By default, TASM limits concurrency to two queries, so either the FSGCache and JVM settings should be changed to support two queries, or the concurrency rules should be reset to match the level of concurrency for which the system has been tuned.

Viewpoint Workload Designer contains a default TDWM rule in the TDWM ruleset that includes the Teradata-to-Hadoop connector table operators. You can change the concurrency value in the default rule, or you can delete the default rule and define custom rules for the SYSLIB.LOAD_FROM_HCATALOG_abcn_n_n and SYSLIB.LOAD_TO_HCATALOG_abcn_n_n objects. If you define custom rules, be sure to delete the default rule; Teradata Database continues to use the values in the default rule until it is deleted. Custom rules can be included with other functions but cannot include objects other than functions. For information about using Viewpoint Workload Designer, see the Teradata Viewpoint User Guide, B035-2206.

The level of concurrency that you want influences the FSGCache setting and the JVM heap and perm space that are needed. The following example illustrates the relationship between these settings; it is based on an example system that has 36 AMPs per node. In each entry, the GB figure is the memory left over after FSGCache takes its percentage.

128GB Teradata system:
• FSGCache 95%: 6.5GB; 2 concurrent queries; requires 512MB Java perm space.
• FSGCache 92%: 10.24GB; 3 concurrent queries; requires 750MB Java perm space.
• FSGCache 90%: 12.8GB; 4 concurrent queries; requires 1GB Java perm space.
• FSGCache 88%: 15.36GB; 5 concurrent queries; requires 1.25GB Java perm space.
• FSGCache 85%: 19.2GB; ~7 concurrent queries; requires 1.75GB Java perm space.

96GB Teradata system:
• FSGCache 95%: 4.8GB; 1 concurrent query; requires 512MB Java perm space.
• FSGCache 92%: 7.68GB; 2 concurrent queries; requires 512MB Java perm space.
• FSGCache 90%: 9.6GB; 3 concurrent queries; requires 750MB Java perm space.
• FSGCache 88%: 11.52GB; ~4 concurrent queries; requires 1GB Java perm space.
• FSGCache 85%: 14.4GB; ~5 concurrent queries; requires 1.25GB Java perm space.

For more information about tuning, see the version of the Teradata® QueryGrid™: Teradata Database-to-Hadoop Orange Book, publication number 541-0009812, that supports release 15.0.4 of the Teradata-to-Hadoop connector. It contains embedded Microsoft® Excel® spreadsheets that calculate suggested memory settings. You can enter numbers that represent your desired configuration into the appropriate spreadsheet, and the spreadsheet produces an estimate of the suggested FSGCache and JVM memory settings needed to run Teradata QueryGrid: Teradata Database-to-Hadoop on your system.

For information about the FSGCache, see Utilities, B035-1102.
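To sanity-check the concurrency level you have tuned for, DBQL can show how many connector requests actually ran in a given window. The following query is a minimal sketch, assuming query logging is enabled for the relevant users and that connector requests can be recognized by the table operator names in the logged query text; the one-hour window and the LIKE patterns are illustrative assumptions, not part of the product.

-- Rough count of recent Teradata-to-Hadoop connector requests in DBQL.
SELECT COUNT(*) AS hadoop_requests
FROM DBC.QryLogV
WHERE (QueryText LIKE '%LOAD_FROM_HCATALOG%'
    OR QueryText LIKE '%LOAD_TO_HCATALOG%')
  AND StartTime >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR;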
JVM Configuration for the Teradata-to-Hadoop Connector Table Operators

We recommend that you use the following garbage collection options to efficiently clean up unused Java objects and keep memory usage under control:
• -XX:+UseParallelGC
• -XX:+UseParallelOldGC

Java permanent space is used for storing the Java classes that are loaded through the Java class loader. By default, the Java permanent space is set to 64MB. The Teradata-to-Hadoop connector table operator needs to load significantly more classes with the thread-safe loader. For Java permanent space, we suggest the following settings for Hortonworks 2.1.2:
• 1 concurrent query: -XX:MaxPermSize=512m
• 2 concurrent queries: -XX:MaxPermSize=512m
• 3 concurrent queries: -XX:MaxPermSize=750m
• 4 concurrent queries: -XX:MaxPermSize=1g
• 5 concurrent queries: -XX:MaxPermSize=1280m

Hadoop libraries require a relatively large amount of heap memory for object allocation, I/O buffers, and temporary caches during the life of a request, so the Java heap size must be configured appropriately for memory to be used efficiently. The following are the recommended minimum and maximum heap sizes for one to five concurrent queries for Hortonworks 2.1.2:
• 1 concurrent query: minimum -Xms4g, maximum -Xmx4g
• 2 concurrent queries: minimum -Xms6g, maximum -Xmx6g
• 3 concurrent queries: minimum -Xms9g, maximum -Xmx9g
• 4 concurrent queries: minimum -Xms12g, maximum -Xmx12g
• 5 concurrent queries: minimum -Xms15g, maximum -Xmx15g

JVM Configuration and ORC Files

ORC files have a stripe size property. The JVM memory tuning parameters for Teradata Database are optimized for a 64 MB stripe size; a larger stripe size requires significantly more heap memory to run with reasonable response time. A stripe size of 64 MB is recommended. For larger stripe sizes, an additional 8 GB of memory per concurrent query is recommended. For information on modifying the ORC file stripe size, see the Hadoop documentation. For information on modifying the JVM heap, see Configuring Teradata JVM Options.
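Stripe size is a per-table ORC property set on the Hadoop side when the table is created. As a minimal HiveQL sketch (the table name and columns are placeholders; orc.stripe.size takes a byte count, and 67108864 bytes is 64 MB):

-- Create an ORC table whose stripes are written at 64 MB.
CREATE TABLE sales_orc (item_id INT, amount DOUBLE)
STORED AS ORC
TBLPROPERTIES ('orc.stripe.size'='67108864');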
Configuring Teradata JVM Options

If you have determined that you need to change the JVM settings for your Teradata system, you can apply the JVM options to the system using a cufconfig utility property called JVMOptions. For information about the cufconfig utility, see Utilities, B035-1102.

To set the appropriate JVM heap and perm space, perform the following steps on the primary node of the Teradata system:

1 Create a text file that contains the appropriate JVM heap and perm sizes. List all options on a single line, delimited with spaces. For example, the settings below are for a Teradata system with 36 AMPs on each node and 96GB of memory:

JVMOptions: -server -XX:+UseParallelGC -XX:+UseParallelOldGC -Xms7100m -Xmx7100m -XX:NewSize=2370m -XX:MaxNewSize=2370m -XX:MaxPermSize=864m

Name the file itjvmopt.txt and put it in the following location: /etc/opt/teradata/tdconfig/jvmconfig/

2 Run the following command:

cufconfig -f /etc/opt/teradata/tdconfig/jvmconfig/itjvmopt.txt

3 Run the following command and make sure that the JVMOptions property appears at the bottom of the output and that its value has been updated with the values specified in your itjvmopt.txt file:

cufconfig -o

The bottom of the output should look similar to this:

USRLibraryPath: /usr/tdbms/lib
JVMOptions: -server -XX:+UseParallelGC -XX:+UseParallelOldGC -Xmx7100m -Xms7100m -XX:NewSize=2370m -XX:MaxNewSize=2370m -XX:MaxPermSize=864m

4 Restart the database so that the new JVM options take effect.

Monitoring User Queries Between Teradata and a Foreign Server

To monitor the data transferred between Teradata and a foreign server for a user's request, you can use the following APIs:
• PM/API MONITOR SESSION
  Note: If you are using the MONITOR SESSION request, set the mon_ver_id to 11, where mon_ver_id is the monitor software version ID field for the current release.
• Open API MonitorMySessions
• Open API MonitorSession

These APIs return the following field/column values:

ReqTblOpBytesIn: The total number of bytes transferred into Teradata Database from a foreign server for the current request through one or more table operators.
Note: The request may involve one or multiple table operator executions. The ReqTblOpBytesIn output parameter shows bytes transferred across all invocations within the request.

ReqTblOpBytesOut: The total number of bytes transferred out of Teradata Database and into a foreign server for the current request through one or more table operators.
Note: The request may involve one or multiple table operator executions. The ReqTblOpBytesOut output parameter shows bytes transferred across all invocations within the request.

For more information about these APIs, see Application Programming Reference, B035-1090.

Note: You can also monitor the transfer of the data in Viewpoint. Check the Remote Data Imported and Data Exported Remotely fields on the Overview tab of the Details View of Query Monitor, Query Groups, My Queries, Query Spotlight, and Workload Monitor. For more information about Viewpoint, see the Teradata Viewpoint User Guide, B035-2206.
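For an interactive check of these counters, the Open APIs are exposed as table functions that can be queried directly from SQL. The following is a minimal sketch using MonitorMySessions; the exact set of columns returned depends on the monitor software version, so treat the column names other than ReqTblOpBytesIn and ReqTblOpBytesOut as assumptions to verify against Application Programming Reference, B035-1090.

-- Show foreign-server byte counts for the current user's sessions.
SELECT SessionNo, ReqTblOpBytesIn, ReqTblOpBytesOut
FROM TABLE (MonitorMySessions()) AS dt;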
Archive and Restore

Database DBC.TD_SERVER_DB stores all the server objects created using the Teradata-to-Hadoop connector. Note the following about archiving or restoring this database:
• Archive and restore TD_SERVER_DB as a user database. It is not archived and restored as part of DBC.
• You can archive and restore the entire database or individual server objects in the database.
• Teradata archives the associated rows from DBC.ServerInfo and DBC.ServerTblOpInfo at the same time as it archives TD_SERVER_DB.
• The post-restore script validates server connectivity.

Note the following about copying foreign server objects:
• Users lose their privileges on a foreign server object after it is copied, so administrators must grant these privileges again.
• You can copy the entire TD_SERVER_DB database or individual server objects in the database. Renaming is not allowed.

CHAPTER 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop

Data Dictionary Views and Tables

This chapter describes the Teradata-to-Hadoop connector Data Dictionary views and tables in the DBC database. The Teradata-to-Hadoop connector Data Dictionary tables are reserved for system use and contain metadata about the foreign servers defined on the Teradata Database system. The Teradata-to-Hadoop connector Data Dictionary data can also be populated in other Data Dictionary views and tables.

You can retrieve frequently used data from any of the Data Dictionary tables through predefined views. The Teradata database administrator determines the set of views available to a user. You can use Teradata Administrator, Teradata SQL Assistant, or Teradata Studio Express to list the Teradata-to-Hadoop connector Data Dictionary views and tables and details about each view or table column.

The views and tables in this book are presented in alphabetical order for quick reference to the meaning of individual fields. The actual Data Dictionary table and view fields do not appear in alphabetical order.

Installing Data Dictionary

The system databases, tables, and associated views and macros are created at system initialization (sysinit) time and by executing a set of Database Initialization Program (DIP) scripts. The DIPALL option executes all of the DIP scripts that are installed on every system. Optional DIP scripts include:
• DIPACC (supports database access logging)
• DIPPDCR (supports infrastructure used by Teradata Professional Services when analyzing system performance issues)

For information about the DIP utility and its executable SQL scripts (such as DIPPDCR, DIPACC, DIPSYSUIF, DIPVIEWS, and DIPALL), see Utilities. For information about the macros that are created by the DIPVIEWS script, see Database Administration. For information about using the DIPACC script to create the DBC.AccLogRule macro, which is required for setting up database access logging, see Security Administration.

Displaying the View and Table Definitions

To display the view or table definitions, execute SHOW VIEW or SHOW TABLE objectname, where objectname is the name of the view or table whose most recent SQL create text is to be reported. For details on using the SHOW VIEW or SHOW TABLE statement, see SQL Data Definition Language - Syntax and Examples, B035-1144. For more information about the views and tables described in this chapter, see Data Dictionary, B035-1092, or Database Administration, B035-1093.

Data Dictionary Views

This topic describes the following:
• Teradata-to-Hadoop connector Data Dictionary views
• the values related to the Teradata-to-Hadoop connector that are populated in the DBQL QryLogV and QryLogStepsV views

The Data Dictionary views described here are categorized as operational internal database views.

Note: The Teradata-to-Hadoop connector Data Dictionary views each have an equivalent X version (for example, for the ServerV view, there is also an X version of that view).
The X version of the view limits the view to only those server objects to which the user selecting from the view has access. For more information about view categories and X and VX views (also referred to as modern views), see Data Dictionary, B035-1092. QryLogV Category Operations Database DBC View Column and Referenced Table.Column View Column Data Type Format Referenced Table.Column TotalServerByteCount FLOAT ----,---,---,---,--9 DBQLogTbl.TotalServerByteCount 70 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Views Usage Notes This DBQL view of the DBQLogTbl table reports things such as the AMP using the most CPU, the AMP with the most I/O, or maximum amount of spool used when processing a query. It can also report the size of the data transferred between Teradata and a foreign server. For more information about the QryLogV view, see Data Dictionary, B035-1092. For more information about the DBQL feature and detailed descriptions about the QryLogV view, see Database Administration, B035-1093. TotalServerByteCount Column The TotalServerByteCount column is the total number of bytes read from or sent to a foreign server. Example of QryLogV The following SELECT statement retrieves the main view for DBQL: SELECT * from dbc.qrylogv; Result: BTEQ ProcID CollectTimeStamp QueryID UserID UserName DefaultDatabase AcctString ExpandAcctString SessionID LogicalHostID RequestNum InternalRequestNum TxnUniq LockLevel LogonDateTime AcctStringTime AcctStringHour AcctStringDate LogonSource 01 LSS AppID ClientID ClientAddr QueryBand ProfileID StartTime FirstStepTime FirstRespTime ElapsedTime NumSteps 30719 2014-02-05 01:30:49 307191222605399239 00000004 TEST1 TEST1 SALES SALES 1,012 1 6 6 ? ? 2014-02-05 01:57:18 ? ? ? (TCP/IP) d1e5 198.51.100.15 192.0.2.24 9208 AA186017 BTEQ AA186017 198.51.100.24 ? ? 2014-02-05 01:57:27.260000 2014-02-05 01:57:28.690000 2014-02-05 01:57:30.480000 0:00:03.220000 4 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 71 Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Views NumStepswPar MaxStepsInPar NumResultRows TotalIOCount AMPCPUTime ParserCPUTime UtilityByteCount UtilityRowCount ErrorCode ErrorText WarningOnly AbortFlag CacheFlag StatementType StatementGroup QueryText NumOfActiveAMPs MaxAMPCPUTime MaxCPUAmpNumber MinAmpCPUTime MaxAmpIO MaxIOAmpNumber MinAmpIO SpoolUsage LSN EstResultRows EstProcTime EstMaxRowCount TDWMEstMemUsage AMPCPUTimeNorm ParserCPUTimeNorm MaxAMPCPUTimeNorm MaxCPUAmpNumberNorm MinAmpCPUTimeNorm ParserExpReq ProxyUser ProxyRole SessionTemporalQualifier CalendarName CPUDecayLevel IODecayLevel TacticalCPUException TacticalIOException SeqRespTime ReqIOKB ReqPhysIO ReqPhysIOKB DataCollectAlg CallNestingLevel NumRequestCtx KeepFlag QueryRedriven ReDriveKind 72 0 0 2 44 0.150 0.216 ? ? 0 ? Select Select Select * from Product@remote_server; 4 0.108 0 0.000 35 0 3 1,024 ? 4 0.145 4 0.000 10.087 14.526 7.263 0 0.000 0.011 ? ? ? TERADATA ? ? ? ? ? 1,652.000 0.000 0.000 1 0 1 N N ? 
Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Views LastRespTime DisCPUTime Statements DisCPUTimeNorm TxnMode RequestMode DBQLStatus NumFragments VHLogicalIO VHPhysIO VHLogicalIOKB VHPhysIOKB LockDelay CheckpointNum UnityTime UtilityInfoAvailable UnitySQL ThrottleBypassed IterationCount TTGranularity MaxStepMemory TotalServerByteCount ? 0.000 1 0.000 BTET Exec ? ? 0.000 0.000 0.000 0.000 ? ? ? N ? ? ? LogicalRow 1.250 2,012 QryLogStepsV Category Operations Database DBC View Column and Referenced Table.Column View Column Data Type Format Referenced Table.Column ServerByteCount FLOAT ----,---,---,---,--9 DBQLStepTbl.ServerByteCount Usage Notes This view of the DBQLStepTbl table is populated if you specify the WITH STEPINFO option. When the query completes, the system logs one row for each query step, including parallel steps. This view can also show the size of the data transferred between Teradata and a foreign server for each step. For more information about the QryLogStepsV view, see Data Dictionary, B035-1092. For more information about the DBQL feature and a description of the QryLogStepsV view, see Database Administration, B035-1093. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 73 Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Views ServerByteCount Column The ServerByteCount column is the total number of bytes sent to or received from a foreign server for each step. Example of QryLogStepsV The following SELECT statement gives the user name and the elapsed time of the steps whose queries have transferred more than 10 MB of data. SELECT lv.username, sv.elapsedtime FROM DBC.QryLogStepsV AS sv, DBC.QryLogV AS lv WHERE ServerByteCount / (1024*1024) GT 10 AND sv.queryid = lv.queryid; Result: username TOM ElapsedTime username JOHN ElapsedTime 0:10:22.220000 0:21:32.510000 ServerV[X] Category Operations Database DBC View Column and Referenced Table.Column View Column Data Type Format Referenced Table.Column AuthorizationName VARCHAR(128) X(128) TVM.AuthName X(15) TVM.AuthorizationType UNICODE NOT CASESPECIFIC AuthorizationType VARCHAR(15) UNICODE NOT CASESPECIFIC CreateTimeStamp TIMESTAMP(0) YYYY-MMDDBHH:MI:SS TVM.CreateTimeStamp CreatorName VARCHAR(128) X(128) Dbase.DatabaseName X(128) Dbase.DatabaseName UNICODE NOT CASESPECIFIC NOT NULL DataBaseName 74 VARCHAR(128) Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Views View Column Data Type Format Referenced Table.Column X(128) Dbase.DatabaseName UNICODE NOT CASESPECIFIC NOT NULL LastAlterName VARCHAR(128) UNICODE NOT CASESPECIFIC NOT NULL LastAlterTimeStamp TIMESTAMP(0) YYYY-MMDDBHH:MI:SS TVM.LastAlterTimeStamp ServerID BYTE(6) X(12) TVM.TVMId X(128) TVM.TVMName NOT NULL ServerName VARCHAR(128) UNICODE NOT CASESPECIFIC NOT NULL Usage Notes This Teradata QueryGrid connector Data Dictionary view provides details about the foreign servers defined in the Teradata Database system. Possible Values of the AuthorizationType Column Value Description T INVOKER TRUSTED S DEFINER TRUSTED '' UNKNOWN Example of ServerV[X] The following SELECT statement returns information about the foreign server objects created by user 'dba'. 
select * from DBC.ServerV where CreatorName = 'dba'; Result: ServerID DataBaseName ServerName CreatorName CreateTimeStamp LastAlterName LastAlterTimeStamp 000011960000 TD_SERVER_DB SERVER_1 dba 2014-12-02 19:51:46 dba 2014-12-02 19:51:46 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 75 Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Views ServerID DataBaseName ServerName CreatorName CreateTimeStamp LastAlterName LastAlterTimeStamp AuthorizationName AuthorizationType 000012960000 TD_SERVER_DB SERVER_2 dba 2014-12-02 19:51:50 dba 2014-12-02 19:51:50 user1 INVOKER TRUSTED ServerInfoV[X] Category Operations Database DBC View Column and Referenced Table.Column View Column Data Type Format Referenced Table.Column NameInfo VARCHAR(128) X(128) ServerInfo.NameInfo X(7) ServerInfo.NameInfoType X(128) TVM.TVMName X(256) ServerInfo.ValueInfo UNICODE NOT NULL UPPERCASE NOT CASESPECIFIC NVPType VARCHAR(7) UNICODE ServerName VARCHAR(128) UNICODE NOT CASESPECIFIC NOT NULL ValueInfo VARCHAR(32000) Usage Notes This Teradata QueryGrid connector Data Dictionary view provides details about the name value pairs used by foreign servers defined in the Teradata Database system. Possible Values of the NVPType Column 76 Value Description I IMPORT E EXPORT Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Views Value Description G GLOBAL '' UNKNOWN TblSrvV[X] Category Operations Database DBC View Column and Referenced Table.Columns View Column Data Type Format Referenced Table.Column ServerName VARCHAR(128) X(128) TVM.TVMName X(128) Dbase.DatabaseName X(7) ServerTblOpInfo.TblOpType X(128) ServerTblOpInfo.TblopName X(128) ServerTblOpInfo.TblopDBName UNICODE NOT NULL SrvDataBaseName VARCHAR(128) UNICODE NOT NULL TableOperatorType VARCHAR(7) UNICODE TblOpName VARCHAR(128) UNICODE NOT NULL TbpOpDataBaseName VARCHAR(128) UNICODE NOT NULL Usage Notes This Teradata-to-Hadoop connector view returns information about the foreign servers and their associated table operators. Possible Values of the TableOperatorType Column Value Description I IMPORT E EXPORT '' UNKNOWN Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 77 Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Views Example of TblSrvV[X] The following SELECT statement returns information about the foreign server, 'SQLHSRV_1,' and its associated table operators, LOAD_TO_HCATALOG_HDP2_3_0 and LOAD_FROM_HCATALOG_HDP2_3_0. BTEQ -- Enter your SQL request or BTEQ command: select * from DBC.TblSrvV where ServerName = 'SQLHSRV_1'; *** Query completed. 2 rows found. 5 columns returned. *** Total elapsed time was 1 second. 
ServerName SrvDataBaseName TblOpName TblOpDBName TableOperatorType ServerName SrvDataBaseName TblOpName TblOpDBName TableOperatorType SQLHSRV_1 TD_SERVER_DB LOAD_TO_HCATALOG_HDP2_3_0 SYSLIB EXPORT SQLHSRV_1 TD_SERVER_DB LOAD_FROM_HCATALOG_HDP2_3_0 SYSLIB IMPORT TblSrvInfoV[X] Category Operations Database DBC View Column and Referenced Table.Column View Column Data Type Format Referenced Table.Column NameInfo VARCHAR(128) X(128) ServerInfo.NameInfo X(128) TVM.TVMName X(128) Dbase.DatabaseName UNICODE NOT NULL UPPERCASE NOT CASESPECIFIC ServerName VARCHAR(128) UNICODE NOT NULL SrvDataBaseName VARCHAR(128) UNICODE NOT NULL 78 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Views View Column Data Type Format Referenced Table.Column TableOperatorType VARCHAR(7) X(7) ServerTblOpInfo.TblOpType X(128) ServerTblOpInfo.TblopDBName X(128) ServerTblOpInfo.TblOpName X(256) ServerInfo.ValueInfo UNICODE TbpOpDataBaseName VARCHAR(128) UNICODE NOT NULL TblOpName VARCHAR(128) UNICODE NOT NULL ValueInfo VARCHAR(32000) UNICODE Usage Notes This Teradata QueryGrid connector view returns the name value pairs defined for a foreign server. For more information about name value pairs, see CREATE FOREIGN SERVER. Possible Values of the TableOperatorType Column Value Description I IMPORT E EXPORT '' UNKNOWN Example of TblSrvInfoV [X] The following SELECT statement returns the name value pairs defined for the IMPORT table operator associated with the foreign server object, 'SQLHSRV_1'. select * from DBC.TblSrvInfoV where ServerName='SQLHSRV_1' and TableOperatorType = 'IMPORT'; Result: ServerName SrvDataBaseName TblOpName TbpOpDataBaseName NameInfo ValueInfo TableOperatorType ServerName SrvDataBaseName TblOpName TbpOpDataBaseName NameInfo ValueInfo SQLHSRV_1 TD_SERVER_DB LOAD_FROM_HCATALOG_HDP2_3_0 SYSLIB hosttype 'hadoop' IMPORT SQLHSRV_1 TD_SERVER_DB LOAD_FROM_HCATALOG_HDP2_3_0 SYSLIB username 'hive' Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 79 Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Tables TableOperatorType ServerName SrvDataBaseName TblOpName TbpOpDataBaseName NameInfo ValueInfo TableOperatorType ServerName SrvDataBaseName TblOpName TbpOpDataBaseName NameInfo ValueInfo TableOperatorType IMPORT SQLHSRV_1 TD_SERVER_DB LOAD_FROM_HCATALOG_HDP2_3_0 SYSLIB server '10.25.32.106' IMPORT SQLHSRV_1 TD_SERVER_DB LOAD_FROM_HCATALOG_HDP2_3_0 SYSLIB port '9083' IMPORT Data Dictionary Tables This topic describes the following: • The Teradata-to-Hadoop connector Data Dictionary tables • The row values related to the Teradata-to-Hadoop connector that are populated in the DBQL DBQLogTbl and DBQLStepTbl tables and the following Data Dictionary tables: • DBC.AccessRights • DBC.AccLogRuleTbl • DBC.Dependency • DBC.TVM Like other system tables, the Teradata-to-Hadoop connector pre-defined tables are created as relational tables in the DBC database during system initialization (SysInit) or by the Table Initialization Program and can be accessed only by users who have the required privileges to the tables. Access to the tables is strictly controlled to ensure that users (including system administrators) cannot modify them. Notice: To ensure that the system functions properly, do not modify or delete any Data Dictionary tables. 
Use the Data Dictionary views to access data in the tables to ensure that the tables are not accidentally modified or deleted. For information about the data dictionary views, see Data Dictionary Views. DBC.AccessRights This Data Dictionary table stores information about discretionary access privileges and rowlevel security privileges that have been granted. This information includes: • The ID of the user that was granted the privilege 80 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Tables • The specific privilege that was granted • Who granted it, and whether it was granted using the GRANT statement Row Values Row Description AccessRight The type of privilege granted on a user object only. Possible values include: • CS (CREATE SERVER) • DS (DROP SERVER) Note: These values must be explicitly granted. Related Topics For more information about the DBC.AccessRights table, see Data Dictionary, B035-1092. DBC.AccLogRuleTbl This Data Dictionary table stores information about the logging of access privilege checks. This information includes: • Typical access control privilege checks • Row-level security privilege checks • The user, database, and object involved in the privilege check Note: For Teradata QueryGrid connector queries, this is the remote object or user on the foreign server involved in the privilege check. Row Values Row Description AcrCreateServer This row stores the logging in effect for the CREATE SERVER privilege on TD_SERVER_DB to which the rule applies. This row is populated if you specify the ON FOREIGN SERVER option in the BEGIN or END LOGGING statement. AcrDropServer This row stores the logging in effect for the DROP SERVER privilege on TD_SERVER_DB to which the rule applies. This row is populated if you specify the ON FOREIGN SERVER option in the BEGIN or END LOGGING statement. DBC.DBQLogTbl This Data Dictionary table is the main DBQL table containing information about the SQL and Teradata QueryGrid connector queries being logged. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 81 Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Tables The DBQLogTbl default row consists of all the available DBQLogTbl table fields. The default row provides general query information that is usually adequate for investigating a query that is interfering with performance. When no options are specified, a default row includes: • User ID and user name under which the session being logged was initiated • Unique ID for the process, session, and host (client) connection • Account string, expanded as appropriate, that was current when the query completed • First 200 characters of the query statement • CPU and I/O statistics • Default database name that was current when the query completed • The total size of the data transferred between Teradata and a foreign server The default is one default row per query. Row Values Row Description StatementGroup If there is a DDL statement in a request, the StatementGroup column reports which type: • DDL CREATE if this is a CREATE FOREIGN SERVER statement • DDL ALTER if this is an ALTER or DROP FOREIGN SERVER statement • OTHER SYS OTHER if this is a SHOW FOREIGN SERVER or HELP FOREIGN statement • DDL GRANT If the statement has only one DML statement or multiple DML statements that are all of the same type, StatementGroup indicates the type. 
For example, if there are three DELETE statements in a request, StatementGroup reports: DML DELETE

Similarly, for requests with individual or multiple INSERT, INSERT... SELECT, UPDATE, or SELECT statements, StatementGroup reports:
• DML INSERT
• DML INSERT... SELECT
• DML UPDATE
• SELECT

In a multistatement request with different types of DML statements, you see a list showing the number of statements of each type in the request. For example, a request with one insert and two update statements appears as: DML Del=0 Ins=1 InsSel=0 Upd=2 Sel=0

StatementType: The type of statement of the query. In a multistatement request, this is the last statement of the request; however, this may not accurately describe the request. For more statement information, see StatementGroup. The possible values recorded include:
• CREATE SERVER for the CREATE FOREIGN SERVER statement.
• ALTER SERVER for the ALTER FOREIGN SERVER statement.
• DROP SERVER for the DROP FOREIGN SERVER statement.
• SHOW for the SHOW FOREIGN SERVER statement.
• HELP for the HELP FOREIGN statement.

TotalServerByteCount: The total number of bytes read from or sent to a foreign server object. The column is NULL if the request does not load or send data from or to a foreign server object.

Related Topics

For more information about the DBQL feature, how to enable DBQL logging, and the DBC.DBQLStepTbl table and fields, see Database Administration, B035-1093. For more information about the BEGIN/REPLACE QUERY LOGGING statement, see SQL Data Definition Language - Syntax and Examples, B035-1144.

DBC.Dependency

This Data Dictionary table stores information about the relationships and dependencies between various types of objects. The types of relationships and dependencies include:
• Relationships between tables and row-level security constraints
• Dependencies between JAR objects
• Relationships between foreign server objects and table operators

Row Value

RelationshipCode: The value KO indicates the relationship between the foreign server and the table operator.

DBC.ServerInfo

This Teradata QueryGrid connector Data Dictionary table stores the name value pairs of the server object that the table operators use to connect to the foreign server, if name value pairs are specified in the USING clause of the CREATE or ALTER FOREIGN SERVER statement.

Row Values

DatabaseID: The database ID that contains the server object.
NameInfo: The name attribute specified in the name value pair of the USING clause in the CREATE or ALTER FOREIGN SERVER statement.
NameInfoType: Possible values include:
• G indicates a server attribute name value pair defined in the USING clause of the CREATE or ALTER FOREIGN SERVER statement.
• I indicates an IMPORT table operator name value pair defined in the USING clause of the CREATE or ALTER FOREIGN SERVER statement.
• E indicates an EXPORT table operator name value pair defined in the USING clause of the CREATE or ALTER FOREIGN SERVER statement.
ServerID: The ID of the server object.
ValueInfo: The value attribute specified in the name value pair of the USING clause in the CREATE or ALTER FOREIGN SERVER statement.
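Because DBC.ServerInfo is reserved for system use, read the name value pairs through the ServerInfoV[X] view described earlier rather than from the table itself. The following is a small sketch; SQLHSRV_1 is a placeholder server name, and it assumes the view reports the NVP type in expanded word form, as the TblSrvInfoV example output suggests.

-- List the global name value pairs defined for one foreign server.
SELECT ServerName, NameInfo, ValueInfo
FROM DBC.ServerInfoV
WHERE ServerName = 'SQLHSRV_1'
  AND NVPType = 'GLOBAL';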
DBC.ServerTblOpInfo This Teradata QueryGrid connector Data Dictionary table stores information about the table operator associated with the foreign server. This table also includes the database and table operator names to avoid issues with the object ID changing during an archive or restore operation. Row Values 84 DBC.ServerTblOpInfo Field Description DatabaseId The database ID that contains the foreign server object. ServerID The ID of the foreign server. TblOpDatabaseName The database name in which the table operator is defined. TblopName The name of the table operator associated with the foreign server. TblOpType Possible values include: • I indicates the IMPORT table operator name value pair. • E indicates the EXPORT table operator name value pair. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Tables DBC.ServerTblOpInfo Field Description • 'UNKNOWN' indicates an unknown name value pair. Note: This column may return more than one operator type. DBC.DBQLStepTbl This DBQL table stores information about each processing step used to satisfy the query. For a Teradata QueryGrid connector query, this includes the size of the data transferred between Teradata and a foreign server. One row is logged for each step. This Data Dictionary table is only populated if you specify the WITH STEPINFO option in the BEGIN or REPLACE QUERY LOGGING statement. When the query completes, the system logs one row for each query step, including parallel steps. Row Value Row Description ServerByteCount The number of row bytes read from or sent to a foreign server object. This column is NULL if the step does not load or send data to or from a foreign server. Related Topics For more information ... See ... For more information about the DBQL feature, how to enable DBQL logging, the DBC.DBQStepTbl table and fields Database Administration, B035-1093. For more information about the BEGIN/ REPLACE QUERY LOGGING statement SQL Data Definition Language - Syntax and Examples, B035-1144. DBC.TVM Table This Data Dictionary table stores one row for each of the following objects on the system: • Column • Database • External stored procedure • Hash index • JAR • Join index • Macro • Stored procedure Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 85 Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop Data Dictionary Tables • • • • • • Table Trigger User-defined function User-defined method User-defined type View For Teradata QueryGrid connector queries, the DBC.TVM table stores one row for each foreign server object on the Teradata Database system if one of the following options is specified: • The TRUSTED security type of the CREATE or REPLACE AUTHORIZATION statement. • The optional comment string of the SQL COMMENT statement. The DBC.TVM table is not archived during archive and restore operations. Row Values Row Description AuthIdUsed The authorization ID of the foreign server object. This row returns NULL if the foreign server object is not authorized. AuthName The name of the authorization defined for the foreign server object. This row returns NULL if the foreign server object is not authorized. AuthorizationSubType Whether the specified authorization is the default authorization. Possible values include: • I indicates the specified INVOKER TRUSTED authorization. • D indicates DEFINER TRUSTED authorization. 
• F indicates DEFINER DEFAULT TRUSTED authorization. • NULL indicates the foreign server object is not authorized. AuthorizationType The type of authorization of the foreign server object. Possible values include: • T indicates that the TRUSTED security type of the CREATE or REPLACE AUTHORIZATION statement is specified. • NULL indicates the foreign server object is not authorized CommentString Text or comment supplied by the user on the column, database, table, view, macro, user-defined function, user-defined types, user-defined methods, stored procedure, role, profile, user, or foreign server. TableKind If you are using a foreign server object, this row returns K. Note: K is supported on the Teradata QueryGrid connectors only. For more information on TableKind values, see Data Dictionary, B035-1092. 86 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide APPENDIX A Notation Conventions About Notation Conventions This appendix describes the notation conventions used in this book. Convention Description Syntax Diagrams Describes SQL syntax form, including options. Square braces in the text Represent options. The indicated parentheses are required when you specify options. For example: DECIMAL [(n[,m])] means the decimal data type can be defined optionally: • without specifying the precision value n or scale value m • specifying precision (n) only • specifying both values (n,m) You cannot specify scale without first defining precision. • CHARACTER [(n)] means that use of (n) is optional. The values for n and m are integers in all cases. Syntax Diagram Conventions Notation Conventions Item Definition and Comments Letter An uppercase or lowercase alphabetic character ranging from A through Z. Number A digit ranging from 0 through 9. Do not use commas when typing a number with more than 3 digits. Word Keywords and variables. • UPPERCASE LETTERS represent a keyword. Syntax diagrams show all keywords in uppercase, unless operating system restrictions require them to be in lowercase. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 87 Appendix A Notation Conventions Syntax Diagram Conventions Item Definition and Comments • • lowercase letters represent a keyword that you must type in lowercase, such as a Linux command. Mixed Case letters represent exceptions to uppercase and lowercase rules. The exceptions are noted in the syntax explanation. lowercase italic letters represent a variable such as a column or table name. • Substitute the variable with a proper value. lowercase bold letters represent an excerpt from the diagram. • The excerpt is defined immediately following the diagram that contains it. UNDERLINED LETTERS represent the default value. • This applies to both uppercase and lowercase words. Spaces Use one space between items such as keywords or variables. Punctuation Type all punctuation exactly as it appears in the diagram. Paths The main path along the syntax diagram begins at the left with a keyword, and proceeds, left to right, to the vertical bar, which marks the end of the diagram. Paths that do not have an arrow or a vertical bar only show portions of the syntax. The only part of a path that reads from right to left is a loop. Continuation Links Paths that are too long for one line use continuation links. Continuation links are circled letters indicating the beginning and end of a link: A A When you see a circled letter in a syntax diagram, go to the corresponding circled letter and continue reading. 
Required Entries Required entries appear on the main path: SHOW If you can choose from more than one entry, the choices appear vertically, in a stack. The first entry appears on the main path: SHOW CONTROLS VERSIONS 88 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Appendix A Notation Conventions Syntax Diagram Conventions Optional Entries You may choose to include or disregard optional entries. Optional entries appear below the main path: SHOW CONTROLS If you can optionally choose from more than one entry, all the choices appear below the main path: READ SHARE ACCESS Some commands and statements treat one of the optional choices as a default value. This value is UNDERLINED. It is presumed to be selected if you type the command or statement without specifying one of the options. Strings String literals appear in apostrophes: 'msgtext ' Abbreviations If a keyword or a reserved word has a valid abbreviation, the unabbreviated form always appears on the main path. The shortest valid abbreviation appears beneath. SHOW CONTROLS CONTROL In the above syntax, the following formats are valid: SHOW CONTROLS SHOW CONTROL Loops A loop is an entry or a group of entries that you can repeat one or more times. Syntax diagrams show loops as a return path above the main path, over the item or items that you can repeat: , , ( 3 4 cname ) Read loops from right to left. The following conventions apply to loops: Item Description Example maximum number of entries allowed The number appears in a circle on the return path. In the example, you may type cname a maximum of four times. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 89 Appendix A Notation Conventions Syntax Diagram Conventions Item Description Example minimum number of entries allowed The number appears in a square on the return path. In the example, you must type at least three groups of column names. separator character required between entries The character appears on the return path. In the example, the separator character is a comma. If the diagram does not show a separator character, use one blank space. delimiter character required around entries The beginning and end In the example, the delimiter characters appear outside the characters are the left and right return path. parentheses. Generally, a space is not needed between delimiter characters and entries. Excerpts Sometimes a piece of a syntax phrase is too large to fit into the diagram. Such a phrase is indicated by a break in the path, marked by (|) terminators on each side of the break. The name for the excerpted piece appears between the terminators in boldface type. The boldface excerpt name and the excerpted phrase appears immediately after the main diagram. 
The excerpted phrase starts and ends with a plain horizontal line: LOCKING excerpt HAVING con excerpt where_cond , cname , col_pos Multiple Legitimate Phrases In a syntax diagram, it is possible for any number of phrases to be legitimate: dbname DATABASE tname TABLE vname VIEW In this example, any of the following phrases are legitimate: dbname 90 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Appendix A Notation Conventions Character Shorthand Notation Used in This Book DATABASE dbname tname TABLE tname vname VIEW vname Sample Syntax Diagram , CREATE VIEW viewname AS A LOCKING cname CV LOCK ACCESS dbname A DATABASE FOR SHARE IN tname READ TABLE WRITE EXCLUSIVE vname VIEW EXCL , B SEL B MODE expr , FROM qual_cond tname C .aname C HAVING cond ; qual_cond , WHERE cond GROUP BY cname , col_pos Character Shorthand Notation Used in This Book This book uses the Unicode naming convention for characters. For example, the lowercase character ‘a’ is more formally specified as either LATIN CAPITAL LETTER A or U+0041. The U+xxxx notation refers to a particular code point in the Unicode standard, where xxxx stands for the hexadecimal representation of the 16-bit value defined in the standard. In parts of the book, it is convenient to use a symbol to represent a special character, or a particular class of characters. This is particularly true in discussion of the following Japanese character encodings: • KanjiEBCDIC • KanjiEUC • KanjiShift-JIS These encodings are further defined in International Character Set Support, B035-1125. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 91 Appendix A Notation Conventions Character Shorthand Notation Used in This Book Character Symbols The symbols, along with character sets with which they are used, are defined in the following table. Symbol Encoding Meaning a-z A-Z 0-9 Any Any single byte Latin letter or digit. a-z A-Z 0-9 Any Any fullwidth Latin letter or digit. < KanjiEBCDIC Shift Out [SO] (0x0E). Indicates transition from single to multibyte character in KanjiEBCDIC. > KanjiEBCDIC Shift In [SI] (0x0F). Indicates transition from multibyte to single byte KanjiEBCDIC. T Any Any multibyte character. The encoding depends on the current character set. For KanjiEUC, code set 3 characters are always preceded by ss3. I Any Any single byte Hankaku Katakana character. In KanjiEUC, it must be preceded by ss2, forming an individual multibyte character. Δ Any Represents the graphic pad character. Δ Any Represents a single or multibyte pad character, depending on context. ss2 KanjiEUC Represents the EUC code set 2 introducer (0x8E). ss3 KanjiEUC Represents the EUC code set 3 introducer (0x8F). For example, string “TEST”, where each letter is intended to be a fullwidth character, is written as TEST. Occasionally, when encoding is important, hexadecimal representation is used. For example, the following mixed single byte/multibyte character data in KanjiEBCDIC character set LMN<TEST>QRS is represented as: D3 D4 D5 0E 42E3 42C5 42E2 42E3 0F D8 D9 E2 Pad Characters The following table lists the pad characters for the various character data types. 
92 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Appendix A Notation Conventions Character Shorthand Notation Used in This Book Server Character Set Pad Character Name Pad Character Value LATIN SPACE 0x20 UNICODE SPACE U+0020 GRAPHIC IDEOGRAPHIC SPACE U+3000 KANJISJIS ASCII SPACE 0x20 KANJI1 ASCII SPACE 0x20 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 93 Appendix A Notation Conventions Character Shorthand Notation Used in This Book 94 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide APPENDIX B FNC Interfaces for Teradata QueryGrid: Teradata Database-to-Hadoop Introduction to FNC Interfaces The following sections describe C library functions and Java application classes that Teradata provides for use by table operators to import and export data from and to foreign servers. The Java application classes are provided in the javFnc.jar archive; therefore, your search path for Java classes must include the directory containing the javFnc.jar archive. The default location for the archive is in the bin directory of the Teradata software distribution: /usr/tdbms/bin For more information about the C library functions and Java application classes, see SQL External Routine Programming, B035-1147. FNC_GetAmpHash / getAmpHash Purpose Returns values that hash to the specified AMPs. C Signature void FNC_GetAmpHash(int int **amphash, size) Parameter Type Description int ** amphash IN/OUT amphash[n][0] is the AMP number. int size IN amphash[n][1] will be returned with the value that hashes to the AMP. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide The size (n) of the amphash array. 95 Appendix B FNC Interfaces for Teradata QueryGrid: Teradata Database-to-Hadoop FNC_GetHashAmp / getHashAmp Java Signature Defined in RuntimeContract class: public void getAmpHash(int[][] amphash) Usage Notes This routine is callable on an AMP or PE vproc. FNC_GetHashAmp / getHashAmp Purpose Accepts data and determines the AMP which would be responsible for that key. C Signature int FNC_GetHashAmp(FNC_HashRow_t int int *data, size, *retCode) Parameter Type Description FNC_HashRow_t * data IN A pointer to an array of structures representing table columns. FNC_HashRow_t is defined as follows: typedef struct { void *data; parm_tx type; } FNC_HashRow_t; int size IN The size of the data and return arrays. int * retCode OUT A pointer to an integer value to indicate success or an error number. 0 indicates success. Java Signature Defined in RuntimeContract class: public int getHashAmp(Object[] data) Return Value An integer representing the number of the AMP that would be responsible for the key. 96 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Appendix B FNC Interfaces for Teradata QueryGrid: Teradata Database-to-Hadoop FNC_SetActivityCount / setActivityCount Usage Notes This routine is callable on a PE vproc only by a table operator. FNC_SetActivityCount / setActivityCount Purpose Sets the number of rows exported. C Signature void FNC_SetActivityCount(int long stream, rowsexported) Java Signature Defined in RuntimeContract class: public void setActivityCount(int stream, long rowsexported) throws SQLException Parameters Parameter Type Description stream IN Specifies which stream to write to. rowsexported IN The value to be written to ActivityCount. Usage Notes This routine is callable on an AMP vproc only by a table operator. 
FNC_TblGetNodeData Purpose Returns node IDs and AMP IDs for all online AMP vprocs, allowing table functions and table operators to configure themselves to run on specific AMPs. This routine is callable on an AMP or PE vproc. For details about this routine, see SQL External Routine Programming, B035-1147. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 97 Appendix B FNC Interfaces for Teradata QueryGrid: Teradata Database-to-Hadoop FNC_TblOpBytesTransferred / bytesTransferred FNC_TblOpBytesTransferred / bytesTransferred Purpose Records the number of bytes transferred between Teradata Database and the foreign server by the table operator. C Signature void FNC_TblOpBytesTransferred(unsigned long unsigned long in, out) Java Signature Defined in RuntimeContract class: public void bytesTransferred(long in, long out) throws SQLException Parameters Parameter Type Description in IN The number of bytes transferred into Teradata Database from the foreign server. out IN The number of bytes transferred from Teradata Database to the foreign server. Usage Notes This routine is callable on an AMP vproc only by a table operator. FNC_TblOpGetBaseInfo / getBaseInfo Purpose Examines each column in the parser tree and gets the information of the base element if the type of the column is a user-defined type (UDT). C Signature void FNC_TblOpGetBaseInfo(FNC_TblOpColumnDef_t UDT_BaseInfo_t 98 *colDefs, *baseInfo) Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Appendix B FNC Interfaces for Teradata QueryGrid: Teradata Database-to-Hadoop FNC_TblOpGetBaseInfo / getBaseInfo Parameter Type Description FNC_TblOpColumnDef_t * colDefs IN A list of column definitions. UDT_BaseInfo_t * baseInfo OUT For more information about the FNC_TblOpColumnDef_t structure, see SQL External Routine Programming . A list of UDT_BaseInfo_t structures, one for each column in colDefs . UDT_BaseInfo_t is defined as follows: typedef struct { SMALLINT udt_indicator; /* type of the UDT */ /* 0=NONUDT; 1=ARRAY; 2=STRUCT; 3=JSON */ int array_numDimension; /* the number of dimensions for ARRAY UDT */ dtype_et base_datatype; /* for array UDT, this is the data type of each element */ int base_max_length; SMALLINT base_total_interval_digits; SMALLINT base_num_fractional_digits; } UDT_BaseInfo_t; dtype_et is defined as follows: typedef int dtype_et; Valid values are defined by the dtype_en enumeration in the sqltypes_td.h header file. Java Signature Defined in RuntimeContract class: public UDTBaseInfo[] getBaseInfo(ColumnDefinition[] colDefs) throws SQLException The method returns a list of UDTBaseInfo, one for each column passed in. Usage Notes This routine detects whether or not the type of the column is a UDT, and if it is a UDT, whether it is ARRAY, STRUCT, or JSON. This information is returned in baseInfo.udt_indicator. If the column is an ARRAY UDT, then baseInfo is filled with detailed information about the base element of the array. Note: This routine currently does not support returning more information for other types of UDTs except ARRAY UDT. The routine is callable on a PE vproc only by a table operator. Teradata QueryGrid: Teradata Database-to-Hadoop User Guide 99 Appendix B FNC Interfaces for Teradata QueryGrid: Teradata Database-to-Hadoop FNC_TblOpGetColDef FNC_TblOpGetColDef Purpose Retrieves column definitions of the stream specified by the input parameters. The routine also returns the output column definition for the contract function. 
For details about this routine, see SQL External Routine Programming, B035-1147. FNC_TblOpGetContractDef Purpose Retrieves the contract function context. This routine can be used to get the contract definition at different phases for the table operator. For details about this routine, see SQL External Routine Programming, B035-1147. FNC_TblOpGetContractPhase / getContractPhase Purpose Indicates the phase in the parser from which the contract function is being called. C Signature int FNC_TblOpGetContractPhase() Java Signature Defined in RuntimeContract class: public ContractPhase getContractPhase(); Return Value The parser phases are as follows: • FNC_CTRCT_GET_ALLCOLS_PHASE = 0 Indicates that all columns for the remote table should be returned. • FNC_CTRCT_VALIDATE_PHASE = 1 Validates that the given inputs are correct. The contract function can be called multiple times from this phase. 100 Teradata QueryGrid: Teradata Database-to-Hadoop User Guide Appendix B FNC Interfaces for Teradata QueryGrid: Teradata Database-to-Hadoop FNC_TblOpGetExternalQuery / getExternalQuery Note: This phase is currently not used. • FNC_CTRCT_COMPLETE_PHASE = 2 Indicates that this is the last call of the contract function and any foreign server actions that need to be done should be completed. • FNC_CTRCT_DDL_PHASE = 3 Indicates that execution of the CREATE SERVER statement is being completed and the connectivity should be verified. • FNC_CTRCT_DEFINE_SERVER_PHASE = 4 Indicates that a CREATE VIEW or CREATE MACRO statement is being executed and that the custom clause data may not be valid. Usage Notes This routine is callable on a PE vproc only by a table operator. FNC_TblOpGetExternalQuery / getExternalQuery Purpose Generates the text query string for the foreign server and returns the interface version that is currently supported. C Signature void FNC_TblOpGetExternalQuery(FNC_TblOpColumnDef_t ServerType ExtOpSetType int unsigned char unsigned int *colDefs, serverType, opSet, *interfaceVersion, **extQryPtr, *extQryLenPtr) Parameter Type Description FNC_TblOpColumnDef_t * colDefs IN A list of column definitions that may occur in a WHERE clause by the foreign server. For more information about the FNC_TblOpColumnDef_t structure, see SQL External Routine Programming. ServerType serverType IN Teradata QueryGrid: Teradata Database-to-Hadoop User Guide ServerType is defined as follows: typedef enum { ANSISQL = 1, HADOOP = 2 } serverType_et; 101 Appendix B FNC Interfaces for Teradata QueryGrid: Teradata Database-to-Hadoop FNC_TblOpGetExternalQuery / getExternalQuery Parameter Type Description typedef int ServerType; • • ExtOpSetType opSet IN If ANSI SQL, the entire subquery for the table of the foreign server is returned. If HADOOP, only the WHERE clause portion of the query is returned. A set of valid operators supported on the foreign server. ExOpSetType is defined as follows: typedef enum { Eq_ET, Ne_ET, Gt_ET, Le_ET, Lt_ET, And_ET, Or_ET, Not_ET, Between_ET, In_ET, NotIn_ET, Ge_ET, Like_ET } extoptype_et; typedef BYTE ExtOpType; typedef unsigned int ExtOpSet; typedef struct ExtOpSetType { ExtOpSet ExtOpSetList; } ExtOpSetType; int * interfaceVersion IN/OUT A pointer to the interface version: • The caller passes in the desired interface version as the argument. • The routine returns the actual interface version that is currently supported. unsigned char ** extQryPtr OUT A pointer to the generated text query string for the foreign server. The query string is null-terminated. 
Java Signature

Defined in RuntimeContract class:

public String getExternalQuery(
    ColumnDefinition[] colDefs,
    ServerType serverType,
    ExtOpSetType[] extOpSetTypes,
    int[] interfaceVersions) throws SQLException

The parameters are similar to those for the C routine. The Java enum classes are defined as follows:

public enum ServerType {
    ANSISQL(0),
    HADOOP(1);
}

public enum ExtOpSetType {
    Eq_ET(0), Ne_ET(1), Gt_ET(2), Le_ET(3), Lt_ET(4),
    And_ET(5), Or_ET(6), Not_ET(7), Between_ET(8),
    In_ET(9), NotIn_ET(10), Ge_ET(11), Like_ET(12),
    LastOp_ET(13);
}

The method returns a string which contains the external query.

Example: Calling getExternalQuery

ServerType sType = ServerType.ANSISQL;

ExtOpSetType extOpTypes[] = new ExtOpSetType[3];
extOpTypes[0] = ExtOpSetType.Eq_ET;
extOpTypes[1] = ExtOpSetType.And_ET;
extOpTypes[2] = ExtOpSetType.Or_ET;

int[] versions = new int[2];
versions[0] = 1; // The caller passes in the desired interface version.

String extQuery = contract.getExternalQuery(colDefs, sType, extOpTypes, versions);

After calling getExternalQuery, versions[1] will contain the actual interface version that is currently supported on the system.

Usage Notes

This routine is callable on a PE vproc only by a table operator.

Note: The C routine, FNC_TblOpGetExternalQuery, calls FNC_malloc to allocate memory for the buffer specified by *extQryPtr. Unless the routine returns *extQryPtr as NULL, you must use FNC_free to free the allocated memory after processing the data.

FNC_TblOpGetInnerContract / getInnerContractCtx

Purpose

Gets the contract definition of a nested inner table operator for the outer table operator to use.

C Signature

void FNC_TblOpGetInnerContract(void **innerContract,
                               int   *contractLen)

Parameters

void ** innerContract (IN/OUT)
Input argument: identifies the buffer which will hold the contract definition information.
Return value:
• The contract definition of the inner table operator.
• NULL, if the inner contract function does not exist.

int * contractLen (OUT)
The length of the contract definition.

Java Signature

Defined in RuntimeContract class:

public byte[] getInnerContractCtx() throws SQLException

Usage Notes

This routine is callable on a PE vproc only by a table operator.

Note: The C routine, FNC_TblOpGetInnerContract, calls FNC_malloc to allocate memory for the buffer specified by *innerContract. Unless the routine returns *innerContract as NULL, you must use FNC_free to free the allocated memory after processing the data.
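Example: Calling FNC_TblOpGetInnerContract

The following sketch, for illustration only, shows an outer table operator's contract function retrieving the contract definition of a nested inner table operator and releasing the buffer afterward, as the note above requires.

void *innerContract = NULL;
int   contractLen   = 0;

FNC_TblOpGetInnerContract(&innerContract, &contractLen);

if (innerContract != NULL)
{
    /* ... interpret the contractLen bytes of the inner contract context ... */
    FNC_free(innerContract);   /* the routine allocates the buffer with FNC_malloc */
}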
FNC_TblOpSetContractDef

Purpose

Sets an opaque binary string value that the contract function passes to the associated table operator at execution time. This string is referred to as the contract function context. This routine can be used to set the contract definition at different phases for the table operator.

For details about this routine, see SQL External Routine Programming, B035-1147.

FNC_TblOpSetDisplayLength / setDisplayLength

Purpose

Resets the lengths in column definitions for VARCHAR data types.

C Signature

void FNC_TblOpSetDisplayLength(Stream_Direction_en   direction,
                               FNC_TblOpColumnDef_t *colDefs)

Parameters

Stream_Direction_en direction (IN)
Stream_Direction_en is defined as follows:

typedef enum {
    ISOUTPUT = 'W',
    ISINPUT  = 'R'
} Stream_Direction_en;

Specify the input value ISINPUT for export and the value ISOUTPUT for import.

FNC_TblOpColumnDef_t * colDefs (IN/OUT)
A pointer to the column definitions, which are returned with the modified display lengths. For more information about the FNC_TblOpColumnDef_t structure, see SQL External Routine Programming.

Java Signature

Defined in RuntimeContract class:

public void setDisplayLength(char direction, ColumnDefinition[] colDefs) throws SQLException

Parameters

char direction (IN)
Specify an input value of 'R' for export and 'W' for import.

ColumnDefinition[] colDefs (IN/OUT)
The column definitions for which the display lengths will be reset.

Usage Notes

This routine can be invoked for both import and export operations. The routine is callable on a PE vproc only by a table operator.

FNC_TblOpSetExplainText / setExplainText

Purpose

Sets the EXPLAIN text when the table operator has the hexplain custom clause set.

C Signature

void FNC_TblOpSetExplainText(int    numOfTexts,
                             char **arrayOfTexts,
                             int   *arrayOfLens);

Parameters

int numOfTexts (IN)
The number of EXPLAIN text strings.

char ** arrayOfTexts (IN)
An array containing the EXPLAIN text strings.

int * arrayOfLens (IN)
An array containing the lengths of each EXPLAIN text string.

Java Signature

Defined in RuntimeContract class:

public void setExplainText(String[] texts);

Usage Notes

The hexplain custom clause has the following values for the type of EXPLAIN to be completed:
• 1 = simple
• 2 = verbose
• 3 = DBQL

This routine accepts multiple self-contained EXPLAIN text strings as input in order to handle a multi-row EXPLAIN plan from a foreign server. The routine provides the EXPLAIN plan to the parser, which displays the multiple lines of the plan.

This routine is callable on a PE vproc only by a table operator.
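Example: Calling FNC_TblOpSetExplainText

A minimal sketch of returning a two-line EXPLAIN plan from the contract function when the hexplain custom clause is set. The plan text is illustrative only, and strlen (from <string.h>) is used here simply to fill the length array.

char *texts[2];
int   lens[2];

texts[0] = "Step 1: push the WHERE clause to the foreign server.";
texts[1] = "Step 2: import the qualifying rows into Teradata Database.";
lens[0]  = (int)strlen(texts[0]);
lens[1]  = (int)strlen(texts[1]);

FNC_TblOpSetExplainText(2, texts, lens);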
FNC_TblOpSetFormat / setFormat

Purpose

Sets attributes of the format of the input and output streams. This allows the contract function to specify the format of the data types to the parser.

C Signature

void FNC_TblOpSetFormat(char                *attribute,
                        int                  streamno,
                        Stream_Direction_en  direction,
                        void                *inValue,
                        int                  inSize);

Parameters

char * attribute (IN)
The format attribute to be set. Valid attributes are:
• "RECFMT"
• "TZTYPE"
• "CHARSETFMT"
• "REPUNSPTCHR"
"CHARSETFMT" and "REPUNSPTCHR" apply only to import table operators.

int streamno (IN)
The stream number.

Stream_Direction_en direction (IN)
The stream direction: 'R' or 'W'. Stream_Direction_en is defined as follows:

typedef enum {
    ISOUTPUT = 'W',
    ISINPUT  = 'R'
} Stream_Direction_en;

void * inValue (IN)
The location of the new value of the format attribute.

int inSize (IN)
The size in bytes of the new value pointed to by inValue.

Java Signature

Defined in RuntimeContract class:

public void setFormat(
    int stream,
    InputInfo.StreamDir dir,
    java.util.Map<StreamFormat.FormatAttribute,java.lang.Object> formatattributes)

Parameters

stream (IN)
Indicates the stream on which the format will be applied. Currently the only valid value is 0.

dir (IN)
The direction of the stream (input or output).

formatattributes (IN)
Map of attribute values to apply.

This method defines the attributes for formatting the stream. It is applicable to both input and output streams. For information about the InputInfo and StreamFormat classes, see SQL External Routine Programming, B035-1147.

Usage Notes

• This routine is valid only when called within the contract function of a table operator.
• For "RECFMT" the default value is INDICFMT1, where the format is IndicData with row separator sentinels. When the format attribute is "RECFMT", the inValue buffer should have a value of type Stream_Fmt_en. All field-level formats impact the entire record.
• If data being imported from a foreign server contains characters unsupported by Teradata Database, you must use FNC_TblOpSetFormat / setFormat and explicitly set the "CHARSETFMT" and "REPUNSPTCHR" attributes.

Format Attribute Values

"RECFMT"
Defines the record format. When the format attribute is "RECFMT", the inValue buffer should have a value of type Stream_Fmt_en. The Stream_Fmt_en enumeration is defined in int/sql/sqltypes_td.h with the following values:
• INDICFMT1 = 1
  IndicData with row separator sentinels.
• INDICBUFFMT1 = 2
  IndicData with NO row or partition separator sentinels.

"TZTYPE"
Used as an indicator to Teradata Database to receive TIME/TIMESTAMP data from, or send it to, the table operator in a different format:
• RAW = 0
  As stored on the Teradata Database file system.
• UTC = 1
  As UTC.

"CHARSETFMT"
The character set format of the external data to be imported into Teradata Database:
• EVLDBC
  Signals that neither data conversion nor detection is needed.
• EVLUTF16CHARSET
  Signals that the external data to be imported into Teradata Database are in UTF16 encoding.
• EVLUTF8CHARSET
  Signals that the external data to be imported into Teradata Database are in UTF8 encoding.

"REPUNSPTCHR"
A boolean value that specifies what to do when an unsupported Unicode character is detected in the external data to be imported into Teradata Database:
• true
  Replaces the unsupported character with U+FFFD.
• false
  Returns an error when an unsupported character is detected. This is the default behavior.
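Example: Setting "CHARSETFMT" and "REPUNSPTCHR"

A minimal sketch following the usage note above: an import table operator declares that the external data is UTF8 and asks that unsupported characters be replaced with U+FFFD. Only the attribute names and semantics come from this guide; the C types passed through inValue, the EVLUTF8CHARSET identifier spelling, and the direction value are assumptions here.

int  charsetfmt = EVLUTF8CHARSET;  /* assumed identifier: external data is UTF8 */
bool replace    = true;            /* replace unsupported characters with U+FFFD */

FNC_TblOpSetFormat("CHARSETFMT",  0, ISOUTPUT, &charsetfmt, sizeof(charsetfmt));
FNC_TblOpSetFormat("REPUNSPTCHR", 0, ISOUTPUT, &replace,    sizeof(replace));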
Importing and Exporting TIME/TIMESTAMP Data

You can map the Teradata Database TIME and TIMESTAMP data types to the Hadoop STRING or the Oracle TIMESTAMP data type when importing or exporting data to these foreign servers. The table operator can use FNC_TblOpSetFormat to set the tztype attribute as an indicator to Teradata Database to receive TIMESTAMP data from, or send it to, the table operator in a native but adjusted format. The tztype attribute is set as follows for the import and export operators:
• For Hadoop, the attribute is set to UTC.
• For Oracle, the attribute is set to UTC.

If the transform is off, the data is transferred in Raw form, which is the default for table operators and is consistent with standard UDFs.

tztype is a member of the structure FNC_FmtConfig_t, defined in fnctypes.h as follows:

typedef struct {
    Stream_Fmt_en recordfmt;   // enum - indicdata, fastload binary, delimited
    bool inlinelob;            // inline or deferred
    bool UDTTransformsOff;     // true or false
    bool PDTTransformsOff;     // true or false
    bool ArrayTransformsOff;   // true or false
    char auxinfo[128];         // for delimited text, can contain the record separator,
                               // delimiter specification, and the field enclosure characters
    double inperc;             // recommended percentage of buffer devoted to input rows
    bool inputnames;           // send input column names to step
    bool outputnames;          // send output column names to step
    TZType_en tztype;          // enum - Raw or UTC
    int charsetfmt;            // charset format of data to be imported into TD through QG
    bool replUnsprtedUniChar;  /* true - replace unsupported unicode character encountered
                                  with U+FFFD when data is imported into TD through QG
                                  false - error out when unsupported unicode char encountered */
} FNC_FmtConfig_t;

TZType_en is defined as follows:

typedef enum {
    Raw = 0,   /* as stored on TD File system */
    UTC = 1    /* as UTC */
} TZType_en;

For export, FNC_TblOpSetInputColTypes or setInputInfo is called during the contract phase in the resolver and uses the tztype attribute to add the desired cast to the input TIME or TIMESTAMP column types. Teradata Database converts TIME and TIMESTAMP data to the session local time before casting to the character type, so when a TIME or TIMESTAMP column is being mapped to charfix/charvar, as when mapping to the Hadoop STRING type, the data is transmitted in the session local time zone and no explicit casts are needed.

For import, when getting the input buffer from the table operator, TIME or TIMESTAMP data must be converted to Raw form. No conversion is needed for the import of Hadoop STRING data to the Teradata Database TIME or TIMESTAMP data types, since it follows the normal conversion path from character to TIME/TIMESTAMP in Teradata Database.

Note: Teradata does not recommend importing or exporting TIME/TIMESTAMP data for a Teradata Database system with timedatewzcontrol flag 57 = 0. For such systems, the TIME/TIMESTAMP data is stored in OS local time. The System/Session time zone is not set, and Teradata Database does not apply any conversions on TIME/TIMESTAMP data when reading or writing from disk. Therefore, such data cannot be exported reliably in the format desired by the foreign server, and Teradata recommends that the Teradata-to-Hadoop connector feature not be used on such systems.
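Example: Setting the tztype Attribute

A minimal sketch of an export operator targeting Hadoop requesting TIME/TIMESTAMP data as UTC, as described above. Passing a TZType_en value through inValue, and using the ISINPUT direction for export, are assumptions here.

TZType_en tz = UTC;   /* send TIME/TIMESTAMP data to the operator as UTC */

FNC_TblOpSetFormat("TZTYPE", 0, ISINPUT, &tz, sizeof(tz));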
FNC_TblOpSetHashByDef / setHashBy

Purpose

Allows the contract function writer to set the HASH BY specification when developing table operators.

C Signature

void FNC_TblOpSetHashByDef(int          streamno,
                           FNC_Names_t *colNames);

Parameters

int streamno (IN)
The input stream number.

FNC_Names_t * colNames (IN)
A pointer to the HASH BY metadata. FNC_Names_t is defined as follows:

typedef struct {
    int     number;    // number of column names
    names_t names[1];  // array of column names
} FNC_Names_t;

names_t is defined as follows:

typedef CHARACTER names_t[FNC_MAXNAMELEN_EON];

Java Signature

Defined in RuntimeContract class:

public void setHashBy(int streamno, String[] colNames) throws SQLException

Usage Notes

This routine can only run if called from the contract function. It is callable on a PE vproc. The routine produces an error if the stream number is invalid or the HASH BY metadata was already set.

FNC_TblOpSetInputColTypes / setInputInfo

Purpose

Sets casting statements on the input columns so that the data types are cast as indicated by the caller.

C Signature

void FNC_TblOpSetInputColTypes(int                   streamNo,
                               FNC_TblOpColumnDef_t *colDefs)

Parameters

int streamNo (IN)
The input stream number.

FNC_TblOpColumnDef_t * colDefs (IN)
A list of column definitions. For more information about the FNC_TblOpColumnDef_t structure, see SQL External Routine Programming, B035-1147.

Java Signature

Defined in RuntimeContract class:

public void setInputInfo(int streamNo, ColumnDefinition[] colDefs) throws SQLException

Usage Notes

This routine is callable on a PE vproc only by a table operator.

FNC_TblOpSetLocalOrderByDef / setOrderBy

Purpose

Allows the contract function writer to set the ordering specification when developing table operators.

C Signature

void FNC_TblOpSetLocalOrderByDef(int              streamno,
                                 FNC_Names_Ord_t *colNames);

Parameters

int streamno (IN)
The input stream number.

FNC_Names_Ord_t * colNames (IN)
A pointer to the LOCAL ORDER BY metadata. FNC_Names_Ord_t is defined as follows:

typedef struct {
    int         number;  // number of column names
    names_ord_t col[1];  // array of name-order-nulltype triplets
} FNC_Names_Ord_t;

names_ord_t is defined as follows:

typedef struct {
    byte direction;                        // 'A'=ascending or 'D'=descending
    byte nullspec;                         // 'F'=nulls First or 'L'=nulls Last
    CHARACTER name[FNC_MAXNAMELEN_EON];    // column name
} names_ord_t;

Java Signature

Defined in RuntimeContract class:

public void setOrderBy(int streamno, String[] colNames) throws SQLException

Usage Notes

This routine can only run if called from the contract function. It is callable on a PE vproc. The routine produces an error if the stream number is invalid or the LOCAL ORDER BY metadata was already set.
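Example: Calling FNC_TblOpSetHashByDef

A minimal sketch of setting a single-column HASH BY specification on input stream 0 from within the contract function. The allocation pattern, the assumption that the caller frees the structure after the call, and the column name order_id are all illustrative; only the FNC_Names_t layout comes from this guide.

FNC_Names_t *hashCols;

/* sizeof(FNC_Names_t) already accounts for one names_t entry. */
hashCols = FNC_malloc(sizeof(FNC_Names_t));
hashCols->number = 1;
strcpy((char *)hashCols->names[0], "order_id");   /* hypothetical column name */

FNC_TblOpSetHashByDef(0, hashCols);

FNC_free(hashCols);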