What would you do if you knew?™
Teradata QueryGrid
Teradata QueryGrid: Teradata Database-to-Hadoop
User Guide
Release 15.0.4
B035-1185-015K
October 2015
The product or products described in this book are licensed products of Teradata Corporation or its affiliates.
Teradata, Active Data Warehousing, Active Enterprise Intelligence, Applications-Within, Aprimo Marketing Studio, Aster, BYNET,
Claraview, DecisionCast, Gridscale, MyCommerce, QueryGrid, SQL-MapReduce, Teradata Decision Experts, "Teradata Labs" logo, Teradata
ServiceConnect, Teradata Source Experts, WebAnalyst, and Xkoto are trademarks or registered trademarks of Teradata Corporation or its
affiliates in the United States and other countries.
Adaptec and SCSISelect are trademarks or registered trademarks of Adaptec, Inc.
AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc.
Apache, Apache Avro, Apache Hadoop, Apache Hive, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks
of the Apache Software Foundation in the United States and/or other countries.
Apple, Mac, and OS X all are registered trademarks of Apple Inc.
Axeda is a registered trademark of Axeda Corporation. Axeda Agents, Axeda Applications, Axeda Policy Manager, Axeda Enterprise, Axeda
Access, Axeda Software Management, Axeda Service, Axeda ServiceLink, and Firewall-Friendly are trademarks and Maximum Results and
Maximum Support are servicemarks of Axeda Corporation.
Data Domain, EMC, PowerPath, SRDF, and Symmetrix are registered trademarks of EMC Corporation.
GoldenGate is a trademark of Oracle.
Hewlett-Packard and HP are registered trademarks of Hewlett-Packard Company.
Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other
countries.
Intel, Pentium, and XEON are registered trademarks of Intel Corporation.
IBM, CICS, RACF, Tivoli, and z/OS are registered trademarks of International Business Machines Corporation.
Linux is a registered trademark of Linus Torvalds.
LSI is a registered trademark of LSI Corporation.
Microsoft, Active Directory, Windows, Windows NT, and Windows Server are registered trademarks of Microsoft Corporation in the United
States and other countries.
NetVault is a trademark or registered trademark of Dell Inc. in the United States and/or other countries.
Novell and SUSE are registered trademarks of Novell, Inc., in the United States and other countries.
Oracle, Java, and Solaris are registered trademarks of Oracle and/or its affiliates.
QLogic and SANbox are trademarks or registered trademarks of QLogic Corporation.
Quantum and the Quantum logo are trademarks of Quantum Corporation, registered in the U.S.A. and other countries.
Red Hat is a trademark of Red Hat, Inc., registered in the U.S. and other countries. Used under license.
SAP is the trademark or registered trademark of SAP AG in Germany and in several other countries.
SAS and SAS/C are trademarks or registered trademarks of SAS Institute Inc.
SPARC is a registered trademark of SPARC International, Inc.
Symantec, NetBackup, and VERITAS are trademarks or registered trademarks of Symantec Corporation or its affiliates in the United States
and other countries.
Unicode is a registered trademark of Unicode, Inc. in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.
The information contained in this document is provided on an "as-is" basis, without warranty of any kind, either express
or implied, including the implied warranties of merchantability, fitness for a particular purpose, or non-infringement.
Some jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply to you. In no
event will Teradata Corporation be liable for any indirect, direct, special, incidental, or consequential damages, including
lost profits or lost savings, even if expressly advised of the possibility of such damages.
The information contained in this document may contain references or cross-references to features, functions, products, or services that are
not announced or available in your country. Such references do not imply that Teradata Corporation intends to announce such features,
functions, products, or services in your country. Please consult your local Teradata Corporation representative for those features, functions,
products, or services available in your country.
Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or
updated without notice. Teradata Corporation may also make improvements or changes in the products or services described in this
information at any time without notice.
To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this
document. Please e-mail: [email protected]
Any comments or materials (collectively referred to as "Feedback") sent to Teradata Corporation will be deemed non-confidential. Teradata
Corporation will have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display,
transform, create derivative works of, and distribute the Feedback and derivative works thereof without limitation on a royalty-free basis.
Further, Teradata Corporation will be free to use any ideas, concepts, know-how, or techniques contained in such Feedback for any purpose
whatsoever, including developing, manufacturing, or marketing products or services incorporating Feedback.
Copyright © 2014 - 2015 by Teradata. All Rights Reserved.
Preface
Purpose
This book describes the Teradata® QueryGrid™: Teradata Database-to-Hadoop SQL
interface for transferring data between Teradata Database and remote Hadoop hosts.
Use this book with the other books in the SQL book set.
Audience
This book is intended for database administrators and other technical personnel who use
Teradata Database.
Supported Releases
This book supports Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.4.
Teradata QueryGrid: Teradata Database-to-Hadoop (also referred to as the Teradata-to-Hadoop connector) supports the following distributions:
• Teradata Database Release 15.0, 15.0.1, 15.0.2, or 15.0.3 to Hortonworks HDP 1.3.2
• Teradata Database Release 15.0.1, 15.0.2, 15.0.3, or 15.0.4 to Hortonworks HDP 2.1.2
• Teradata Database Release 15.0.4 or later to Hortonworks HDP 2.3.0
• Teradata Database Release 15.0.4 or later to Cloudera CDH 5.4.3
Related Documents
Information about using Teradata QueryGrid 14.10, which can be used to connect to MapR
distributions, can be found in the following documents:
Title: Release Summary, B035-1098
Publication ID: B035-1098-112A
Refer to the topic titled "SQL-H for Teradata: LOAD_FROM_HCATALOG."

Title: SQL Functions, Operators, Expressions, and Predicates, B035-1145
Publication ID: B035-1145-112A
Refer to the topic titled "LOAD_FROM_HCATALOG."
Teradata QueryGrid: Teradata Database-to-Hadoop User Guide
Note: These documents refer to Teradata QueryGrid 14.10 by its former name, Teradata
SQL-H.
Prerequisites
You should be familiar with basic relational database management theory and technology.
To become familiar with concepts specific to Teradata Database, read Introduction to
Teradata, B035-1091 and SQL Fundamentals, B035-1141.
Changes to This Book

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.4, October 2015
• Added information about the support of Cloudera CDH 5.4.
• Added information about the support of Hortonworks HDP 2.3.
• Added information that you must have Teradata Database release 15.0.4 installed to use the Teradata-to-Hadoop connector with HDP 2.3 and CDH 5.4.
• Added that as of Teradata Database release 15.0.4, Hortonworks HDP 1.3.2 is no longer supported.

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.3, August 2015
• Book now applies to Hadoop using LDAP and Kerberos for external security.
• Added a known Hortonworks HDP 2.1 issue with Hive 13 to "Limitations" in Chapter 1.
• Added a known issue with importing a Hive table with a large number of partitions to "Limitations" in Chapter 1.
• Added information about using DEFINER with authorization and foreign server to Chapter 2.
• Added information to "Authentication Security" in Chapter 4.

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.2, May 2015
• Added information about external security (LDAP and Kerberos) for Hadoop to Chapters 1, 2, and 4.
  Note: The Teradata-to-Hadoop connector does not currently support the use of Kerberos for external security.
• Redistributed information from Chapter 3 about table operators, as users will interact with them through the foreign server objects.
• Updated information in ServerV[X] and ServerInfoV[X].
• Moved FNC Interfaces to an appendix. Added additional attributes to FNC_TblOpSetFormat.

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0.1, January 2015
• Added Hortonworks HDP 2.1 to the list of compatible releases.
• Added information about hadoop_properties, isNested, updateStatistics, and UseNativeQualification name value pairs.
• Added information about new supported data types.
• Updated information about the proxy user.
• Updated heap size and memory information.
• Added information about dealing with line terminators.
• Added information about using the table operators with timestamp data.

Teradata QueryGrid: Teradata Database-to-Hadoop 15.0, June 2014
• Initial book created June 2014.
Additional Information
www.info.teradata.com
Use the Teradata Information Products Publishing Library site to:
• View or download a manual:
  • Under Online Publications, select General Search.
  • Enter your search criteria and click Search.
• Download a documentation CD-ROM:
  • Under Online Publications, select General Search.
  • In the Title or Keyword field, enter CD-ROM, and click Search.

www.teradata.com
The Teradata home page provides links to numerous sources of information about Teradata. Links include:
• Executive reports, white papers, case studies of customer experiences with Teradata, and thought leadership
• Technical information, solutions, and expert advice
• Press releases, mentions, and media resources

www.teradata.com/TEN/
Teradata Customer Education delivers training that builds skills and capabilities for our customers, enabling them to maximize their Teradata investment.

https://tays.teradata.com
Use Teradata @ Your Service to access Orange Books, technical alerts, and knowledge repositories; view and join forums; and download software patches.

Teradata Developer Exchange
Teradata Developer Exchange provides articles on using Teradata products, technical discussion forums, and code downloads.
Product Safety Information
This document may contain information addressing product safety practices related to data
or property damage, identified by the word Notice. A notice indicates a situation which, if not
avoided, could result in damage to property, such as equipment or data, but not related to
personal injury.
Example
Notice: Improper use of the Reconfiguration utility can result in data loss.
CHAPTER 1
Introduction to Teradata QueryGrid: Teradata
Database-to-Hadoop
Overview of Teradata QueryGrid: Teradata
Database-to-Hadoop
Teradata QueryGrid: Teradata Database-to-Hadoop (also referred to as the Teradata-to-Hadoop connector) provides an SQL interface for transferring data between Teradata
Database and remote Hadoop hosts. From Teradata Database you can do the following:
• Import Hadoop data into a temporary or permanent Teradata Database table.
• Export data from temporary or permanent Teradata Database tables into existing
Hadoop tables.
• Create or drop tables in Hadoop.
• Reference tables on the remote hosts in SELECT and INSERT statements.
• Select Hadoop data for use with a business tool.
• Select and join Hadoop data with data from independent data warehouses for analytical
use.
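As a sketch of the import and export paths, the statements below use the table@server notation with a hypothetical foreign server named hadoop1 and hypothetical table names; the exact syntax and options are described in Chapter 2.

```sql
-- Import: materialize a remote Hive table in Teradata Database
-- (foreign server "hadoop1" and all table names are hypothetical).
CREATE TABLE sales_local AS
    ( SELECT * FROM sales_hive@hadoop1 )
WITH DATA;

-- Export: insert rows from a Teradata table into an existing Hive table.
INSERT INTO sales_hive@hadoop1
    SELECT * FROM sales_local;
```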
Benefits
• Provides the ability to export data to Hadoop servers, adding to the Hadoop data import
ability that was available in Release 14.10 as Teradata SQL-H.
• Enables the automatic push down of qualification columns and grammar to execute on a
remote host.
• Provides the ability to qualify both columns and partitions involved in the query to
reduce the amount of data that needs to be returned.
• Provides privileges to control who can read and write to the servers and tables on remote
hosts.
• Provides simplified grammar that makes the Teradata-to-Hadoop connector easier to
use. Create a foreign server definition once and thereafter use the server name instead of
detailed connection information in each SQL query.
• Provides the ability to create an authorization object to securely store credentials. Foreign
servers can be defined to use an authorization object to authenticate with a security
system, such as LDAP or Kerberos, that is protecting Hadoop clusters.
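As an illustration of the last point, the following sketch stores credentials in an authorization object and then associates it with a foreign server. The object, server, user, and password names are hypothetical; the full syntax is covered in Chapter 2.

```sql
-- Store remote credentials as an encrypted database object
-- (object name, user, and password are hypothetical).
CREATE AUTHORIZATION hadoop_auth AS DEFINER TRUSTED
    USER 'hduser'
    PASSWORD 'hdpassword';

-- Associate the authorization with an existing foreign server.
ALTER FOREIGN SERVER hadoop2
    EXTERNAL SECURITY DEFINER TRUSTED hadoop_auth;
```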
Considerations
You may not use this feature without the appropriate license. The fact that this feature may be
included in product media or downloads, or described in documentation that you receive,
does not authorize you to use it without the appropriate license. Contact your Teradata sales
representative to purchase and enable this feature.
Teradata QueryGrid: Teradata Database-to-Hadoop installs on a single node on the Teradata
system. The Teradata Database then automatically distributes the table operators and the files
that are needed to the other nodes on the system.
The current version of the Teradata-to-Hadoop connector has the following prerequisites:
• Teradata Database 15.0.4 or later
Note: You must upgrade to 15.0.4, which uses Java 8, to use Teradata QueryGrid: Teradata
Database-to-Hadoop with HDP 2.3 and CDH 5.4.
• At least one of the following:
• Hortonworks HDP 2.1
• Hortonworks HDP 2.3
• Cloudera CDH 5.4
Note: The current version of the Teradata-to-Hadoop connector does not support the use
of Hortonworks HDP 1.3.2. Hortonworks HDP 1.3.2 is supported for use in releases 15.0,
15.0.1, 15.0.2, and 15.0.3 of the Teradata-to-Hadoop connector.
• A minimum of 96GB of node memory
• A network that connects all Teradata Database nodes to all Hadoop data nodes
• If your Hadoop cluster is protected by an external security system, such as LDAP, each
Teradata Database user accessing Hadoop must have a corresponding security system
credential.
The Teradata-to-Hadoop connector also requires some post-installation configuration to the
FSGCache, number of concurrent queries, and Java Virtual Machine (JVM) settings.
For information about the configuration required, see Post-Installation Configuration.
Note: Teradata QueryGrid: Teradata Database-to-Hadoop does not work with a Kerberized
cluster where Hive requires LDAP authentication.
Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos
authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH 5.4 is
not supported.
Limitations
• Teradata QueryGrid: Teradata Database-to-Hadoop supports ORC file import and export
for Hortonworks HDP 2.1. However, there is a known issue with Hive 13 (which is fixed
in Hive 14) that generates an error when importing CHAR/VARCHAR data from ORC
using FOREIGN TABLE or usenativequalification.
• Import from a Hive table fails when the Hadoop job configuration size exceeds the
Teradata-to-Hadoop connector limit of 16 MB. This can happen if there are a large
number of partitions defined for the Hive table. For example, a simple Hive table of three
INT columns and nine VARCHAR(2000) columns with 10,000 partitions may return a
job configuration size of 1.6 MB, but that same table defined to have 100,000 partitions
would have a job configuration size of 15 MB. If your Hadoop job fails, try reducing the
number of partitions in the Hive table that you want to import.
• The SequenceFile format is not supported.
• Apache Avro is not supported.
ANSI Compliance
The syntax used for the connector is a Teradata extension to the ANSI SQL:2011 standard.
CHAPTER 2
Syntax for Teradata QueryGrid: Teradata
Database-to-Hadoop
Introduction
This chapter describes the syntax and options for the Teradata-to-Hadoop connector, and
provides examples of their use. It includes DDL statements, and information about DML
and DCL statements.
If you are a DBA, or a DBA has already granted you the privileges needed to create foreign
servers, we recommend that you start by reading CREATE FOREIGN SERVER.
ALTER FOREIGN SERVER
Purpose
Modifies the parameters of an existing server object.
Syntax
Syntax Elements
server_name
The name given to the foreign server object.
EXTERNAL SECURITY
Associates an authorization object with the foreign server. The authorization stores the
encrypted credentials for a user as a database object. The Teradata QueryGrid connector
passes the credentials in the authorization to the remote platform identified by the
foreign server when the foreign server is accessed.
You must use EXTERNAL SECURITY TRUSTED for Teradata QueryGrid: Teradata Database-to-Hadoop when the Hadoop platform is protected by an external security system, such as Kerberos.
INVOKER
DEFINER
INVOKER is a keyword that indicates that the associated authorization must be present
in the user database at the time that the foreign server is accessed.
Note: The user database is the database that was created for the user in the Teradata
system when the user account was created.
DEFINER is a keyword that indicates that the associated authorization must be present
in the database that contains the foreign server when the foreign server is accessed.
Note: The DEFAULT keyword that can be used with DEFINER in CREATE
AUTHORIZATION and REPLACE AUTHORIZATION statements is not needed in
association with a foreign server.
You must use either INVOKER TRUSTED or DEFINER TRUSTED if the remote platform uses an external security system, such as Kerberos, for authentication.
TRUSTED
A keyword that indicates the associated authorization object was created as TRUSTED.
authorization_name
Specifies the name of the authorization object to be used when the foreign server is
accessed.
Modify options
ADD
Use it to:
• Add or replace a global name value pair that is used to define the server object.
• Add an IMPORT or EXPORT table operator. If you want to replace a table operator that is already associated with the foreign server, you must first drop the existing table operator before adding the new one.
• Add or replace a local name value pair that is used with an IMPORT or EXPORT table operator.
name('value')
The name value pair or pairs that you want to add or modify.
Note that in the description of the name value pairs:
• The label "Server only" indicates that the name value pair must follow the syntax
ADD name('value').
• The label "Import only" indicates that the name value pair must be specified after the
IMPORT keyword.
• The label "Export only" indicates that the name value pair must be specified after the
EXPORT keyword.
• Unlabeled name value pairs may be specified in any part of the ADD syntax. If
specified as ADD name('value'), the name value pair will be applied to the server
as a whole.
For descriptions of the name value pairs used with the server object, see Required
Name Value Pairs and Optional Name Value Pairs.
IMPORT
Indicates that you are going to act on the operator that is used to import data into
Teradata Database.
EXPORT
Indicates that you are going to act on the operator that is used to export data out of
Teradata Database.
operator_name
The name of the table operator that you want to use.
For more information about the table operators used with the server object, see
CREATE FOREIGN SERVER.
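For example, a server-level ADD of a single name value pair might look like the following; the server name hadoop2 and the pair shown are illustrative.

```sql
-- Add (or replace) a global name value pair on an existing foreign server.
ALTER FOREIGN SERVER hadoop2
    ADD hiveport('10000');
```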
Drop options
DROP
Use it to:
• Drop a global name value pair that was used to define a server object. You need only specify the name to drop the pair.
• Drop an IMPORT or EXPORT table operator that was associated with a server definition. When you drop a table operator, all related name value pairs are also dropped.
• Drop a local name value pair that was used with an IMPORT or EXPORT table operator. You need only specify the name to drop the pair.
name
When used alone, name is the name of the name value pair that you want to drop.
For more information about the name value pairs used with the server object, see
Required Name Value Pairs and Optional Name Value Pairs.
IMPORT
Indicates that you are going to act on the operator that is used to import data into
Teradata Database.
EXPORT
Indicates that you are going to act on the operator that is used to export data out of
Teradata Database.
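A server-level DROP needs only the name of the pair, as in this illustrative sketch (the server name hadoop2 is hypothetical):

```sql
-- Remove a previously added global name value pair; only the name is given.
ALTER FOREIGN SERVER hadoop2
    DROP tablename;
```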
Required Name Value Pairs
These name value pairs are required to create a functioning foreign server object. Additional
optional name value pairs may be required to create a foreign server for a specific
implementation.
hosttype
Server only.
For Teradata QueryGrid: Teradata Database-to-Hadoop, this is ('hadoop').
port
Server only. The server port number for the Hive Metastore; typically this is 9083.
server
Server only. The DNS host name or IP address for the Apache Hive Metastore
(hive.metastore.uris). You can use an application, such as Ambari, to obtain this value.
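Taken together, the three required pairs are enough to sketch a minimal server definition. The server name, metastore host, and table operator names below are illustrative assumptions; see CREATE FOREIGN SERVER for the operator names that apply to your release.

```sql
-- Minimal foreign server definition using only the required pairs
-- (server name, host name, and operator names are illustrative).
CREATE FOREIGN SERVER hadoop1
USING
    hosttype('hadoop')
    server('hdp-master.example.com')
    port('9083')
DO IMPORT WITH SYSLIB.load_from_hcatalog,
   EXPORT WITH SYSLIB.load_to_hcatalog;
```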
Optional Name Value Pairs
These name value pairs are optional. However, a particular server implementation may
require you to define some of these name value pairs. For example, a foreign server must be
defined with a Hive port to support queries that access Hive Server2.
clustername
Required when using security('kerberos'). Specifies the directory name that stores the JAR file that contains the configuration files (core-site.xml, hdfs-site.xml, hive-site.xml, and yarn-site.xml) for the Hadoop cluster to be accessed. This directory was set up during the Hadoop client installation on the Teradata nodes. For example, you would use the name value pair clustername('yourcluster') if the files are stored as follows:
• yourcluster/
• yourcluster/core-site.xml
• yourcluster/hdfs-site.xml
• yourcluster/hive-site.xml
• yourcluster/yarn-site.xml
compression_codec
Export only. Specifies the type of compression to use for the exported data. The default
is no compression.
The supported compression types are based on the compression codecs configured on
the Hadoop system.
Note: Snappy is supported but you must specify it in hadoop_properties using
<orc.compression=SNAPPY> as the argument. Snappy is supported only for the
ORCFile file type.
You must specify the full name for the compression codec as follows:
• org.apache.hadoop.io.compress.DefaultCodec
• org.apache.hadoop.io.compress.GzipCodec
• org.apache.hadoop.io.compress.BZip2Codec
dbname
The name of the user’s database.
This parameter is optional. You can specify a dbname value in the foreign server to limit
its scope to a specific database. The value specified in the USING clause in the CREATE
FOREIGN SERVER syntax overrides any corresponding value specified directly in the
user query.
default_string_size
Size at which data imported from or exported to Hadoop String columns is truncated.
When applied to the import operator, the value represents the maximum number of
Unicode characters to import, and defaults to 2048 characters. When applied to the
export operator, the value represents the maximum number of bytes to export, and
defaults to 4096 bytes. Teradata QueryGrid silently truncates the String columns at the
default value set in default_string_size.
hadoop_properties ('<property1=value>, <property3=value1,value2>')
Sets specific properties used to interact with Hadoop. If there are multiple arguments, you must delimit each argument with angle brackets. If there is only one argument, you can omit the angle brackets. For example, the hadoop_properties clause for a High Availability (HA) target can include multiple values; in that case, each property must be enclosed in left and right angle brackets. No spaces are allowed within or between arguments.
The High Availability Hadoop properties are defined based on the name service that is defined on your Hadoop server. For example, if you have the following Hadoop properties:
hadoop_properties('
<dfs.client.use.datanode.hostname=true>
,<dfs.datanode.use.datanode.hostname=true>
,<dfs.nameservices=MYCOMPANY_HADOOP02>
,<dfs.ha.namenodes.MYCOMPANY_HADOOP02=nn1,nn2>
,<dfs.namenode.rpc-address.MYCOMPANY_HADOOP02.nn1=hdp230-2:8020>
,<dfs.namenode.rpc-address.MYCOMPANY_HADOOP02.nn2=hdp230-3:8020>
,<dfs.client.failover.proxy.provider.MYCOMPANY_HADOOP02=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider>')
In this example, you would make the following replacements:
• Replace MYCOMPANY_HADOOP02 with your own name service ID.
• Replace hdp230-2 and hdp230-3 with your own namenode hostnames.
It may also be necessary for you to replace nn1 and nn2 in the example above with your
own namenode aliases. To verify, check the following property in your hdfssite.xml file:
<property>
<name>dfs.ha.namenodes.MYCOMPANY_HADOOP02</name>
<value>namenode10,namenode66</value>
</property>
In this case, you would make the following replacements:
• Replace nn1 with namenode10
• Replace nn2 with namenode66
In most cases, you should set the dfs.client.use.datanode.hostname property to true. If
you have a setup where your TPA nodes are on one BYNET, the Hadoop cluster is on
another BYNET, and they are communicating with one another via Ethernet, then you
should also set the dfs.datanode.use.datanode.hostname property to true.
hiveserver
The DNS host name or IP address of Hive Server2. This is used when a query results in
the use of the HCTAS and HDROP procedures or FOREIGN TABLE SELECT in Hive.
(You can use an application, such as Ambari, to obtain this value.) If no value is
specified for hiveserver then the value for server is used.
hiveport
The port for access to the Hive Server2; typically this is 10000. You can use an
application, such as Ambari, to obtain this value.
merge_hdfs_files
Export only. Indicates that files under the same partition should be merged whenever
possible. The default is to not merge. A value of TRUE means that files will be merged.
row_count_report_freq
The frequency with which byte count is updated in DBQL. The default is every 100
rows. You can adjust this to a larger value if the update frequency is too resource
intensive.
security
Specifies the name of the external security system used for authentication on the
Hadoop cluster. This parameter is required when an external security system is in use.
The default is no security. Valid values are:
• kerberos
• ldap
Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos
authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH
5.4 is not supported.
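For example, to declare that an existing foreign server must authenticate through LDAP (the server name hadoop2 is hypothetical):

```sql
-- Enable LDAP authentication for the Hadoop cluster behind this server.
ALTER FOREIGN SERVER hadoop2
    ADD security('ldap');
```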
tablename
The name of the table to be imported or exported.
This parameter is optional. You can specify a tablename value in the foreign server to
limit its scope to a specific table. The value specified in the USING clause in the
CREATE FOREIGN SERVER syntax overrides any corresponding value specified
directly in the user query.
temp_dbname
Import only. The value is the name of the Hadoop database to use to store temporary
Hive staging tables.
This parameter is optional. You should consider using temp_dbname when planning to
use FOREIGN TABLE syntax on a foreign server or when a foreign server is set up to
use usenativequalification. If no database is specified with temp_dbname, then the
default Hadoop database is used.
To use the specified database, it must exist when the foreign server is created or altered
to use the database. The session user must have create and write permission to the
database. If multiple users use the same foreign server then the Hadoop administrator
may want to consider setting up Hive authorization in such a way that temporary tables
cannot be read by another user.
transformformatting
Import only. When set to 'true', indicates that array data is formatted so that it can be cast directly into a Teradata array column type, based on the appropriate data type.
This parameter is optional. The value specified in the USING clause in the CREATE
FOREIGN SERVER syntax overrides any corresponding value specified directly in the
user query.
updatestatistics
Export only. Indicates that the LOAD_TO_HCATALOG_abcn_n_n operator updates
the table statistics after all the data has been loaded into the target Hive table. Valid
values are 'true' and 'false'. A value of true means that the table statistics are updated.
Note: You must also have set hive.stats.autogather to true in your hive-site.xml file for
updatestatistics to work properly.
usenativequalification
Import only. A value of 'true' indicates that SELECT queries should be pushed down to
Hive as much as possible. When a foreign server uses usenativequalification, Teradata
Database examines the following conditions:
• Hive table data size is large and there are qualifying predicates on non-partitioned
Hive columns. Large is defined as having a number of splits that is larger than the
number of Teradata Database nodes.
• The queried Hive object is a view.
When either of the two conditions is met, Teradata Database constructs a Hive query
from the Hive object name, the referenced columns, and the qualifying predicates. It
then creates a Hive staging table (in the database specified by temp_dbname) from the
constructed query and retrieves data in the staging table. The staging table is dropped
after all data has been retrieved.
Valid values are 'true' and 'false.'
For queries that involve joins between two HCatalog tables, Teradata Database brings
the data into Teradata spool and joins them in the database. The join is not pushed into
Hive. For example, the join syntax in the following query requires the manual
FOREIGN TABLE SELECT to be accomplished in Hive:
SELECT h1.c1, h2.c2 FROM h1@hadoop1, h2@hadoop1 WHERE h1.id =
h2.id ;
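A sketch of such a manual FOREIGN TABLE SELECT, which sends the join text to Hive for execution; the table and server names match the query above, and the column list is illustrative:

```sql
SELECT t.c1, t.c2
FROM FOREIGN TABLE (
  SELECT h1.c1, h2.c2
  FROM h1 JOIN h2 ON h1.id = h2.id
)@hadoop1 AS t ;
```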
username
The name of the Hadoop user's credential. This option is ignored when the security
name value pair is defined for the foreign server. If no username value and no security
value are defined, the foreign server uses the name of the user making the request.
(This is the Teradata Database user name in capital letters.)
No password is associated with a Hadoop user. HDFS and Hive check for a user name
for access. If no user name is specified, the foreign server supplies the name of the
session user. If HDFS and Hive are not configured for file permissions, the user name
is optional.
Required Privileges
You must have DROP SERVER privilege on the TD_SERVER_DB database or on the
specified foreign server to modify the foreign server object. If you are modifying the table
operators that are associated with the server, or adding a table operator, you must also have
EXECUTE FUNCTION and SELECT privileges on the specified table operators.
Examples of Using ALTER FOREIGN SERVER
Example: Adding a New Attribute
The following example adds a new attribute to an existing server object. In this example,
INSERT and SELECT operations are limited to the table named cardata.
ALTER FOREIGN SERVER hadoop2
ADD tablename('cardata') ;
Example: Defining an EXPORT Option for an Existing Server
The following example adds an EXPORT table operator to an existing foreign server
object:
ALTER FOREIGN SERVER hive_metastore_server
ADD EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0 USING
merge_hdfs_files('true')
compression_codec('io.seqfile.compression.type=BLOCK') ;
Usage Notes
You cannot use the following names in the name value pairs in ALTER FOREIGN SERVER statements:
• Columns
• hExplain
• IsNested
• Servermode
Note: External security options and ADD or DROP clauses must be specified in the syntax.
BEGIN LOGGING
Purpose
Starts the auditing of SQL requests that attempt to access data.
This topic describes only the portions of the BEGIN LOGGING syntax diagram that are
specific to this Teradata QueryGrid connector. For information about the other syntax that
you can use with BEGIN LOGGING, see SQL Data Definition Language - Syntax and
Examples, B035-1144.
Syntax
BEGIN LOGGING
   [ DENIALS ] [ WITH TEXT ]
   ON [ FIRST | LAST | FIRST AND LAST | EACH ]
      { ALL | operation [, operation ...] | GRANT }
      [ FOR CONSTRAINT constraint_name ]
   [ BY { database_name | user_name } [, ...] ]
   [ ON { AUTHORIZATION authorization_name |
          DATABASE database_name |
          USER database_name |
          [ database_name. | user_name. ]
          { TABLE | VIEW | MACRO | PROCEDURE | FUNCTION | TYPE |
            FOREIGN SERVER } object_name } [, ...] ]
   ;
Syntax Element
ON FOREIGN SERVER object_name
Indicates that the database object for which access is to be logged is a foreign server.
You must specify an object name, which is the name of the foreign server. You can
optionally specify the name of the containing database, which must be
TD_SERVER_DB. You cannot use a user_name with FOREIGN SERVER.
For more information about using BEGIN LOGGING, see SQL Data Definition Language - Syntax and Examples, B035-1144.
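For example, a sketch that logs the text of every request against a foreign server; the server name hadoop1 is an assumption:

```sql
BEGIN LOGGING WITH TEXT ON EACH ALL
ON FOREIGN SERVER TD_SERVER_DB.hadoop1 ;
```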
COMMENT (Comment Placing Form)
Purpose
Creates a user-defined description of a user-defined database object or definition in the data
dictionary.
This topic describes only the portions of the COMMENT syntax diagram that are specific to
Teradata QueryGrid. For information about the other syntax elements in COMMENT
(Comment Placing Form), see SQL Data Definition Language - Syntax and Examples,
B035-1144.
Syntax
COMMENT [ ON ] { object_kind_1 | object_kind_2 }
   [ database_name. | user_name. ] object_name
   [ [ AS | IS ] 'comment' ] ;
Syntax Element
object_kind_2
An optional database object kind specification.
You can optionally specify one of the following database object kinds to identify the
kind of object on which you are placing the comment:
• DATABASE
• FOREIGN SERVER
• TABLE
• USER
If you specify an optional database_name with FOREIGN SERVER, the name must be
TD_SERVER_DB. You cannot use a user_name with FOREIGN SERVER.
All existing rules for COMMENT apply for use with a FOREIGN SERVER object.
The optional comment string is recorded in DBC.TVM.
For more information about using COMMENT (Comment Placing Form), see SQL Data
Definition Language - Syntax and Examples, B035-1144.
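For example, a sketch that places a comment on a foreign server object; the server name and comment text are assumptions:

```sql
COMMENT ON FOREIGN SERVER TD_SERVER_DB.hadoop1 AS
  'Foreign server for the sales Hadoop cluster' ;
```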
CREATE AUTHORIZATION and REPLACE
AUTHORIZATION
Purpose
Creates or replaces an authorization object in Teradata Database. The authorization stores
credentials for a user account that exists on a remote platform. The credentials need only be
valid on the platform specified in the foreign server object; they do not need to be valid on
the Teradata Database or on its underlying operating system. When you specify TRUSTED
in the CREATE or REPLACE AUTHORIZATION statement, Teradata Database does not
validate the credentials.
For Teradata QueryGrid, an authorization object is used by a foreign server object to log into
a remote platform using credentials that are valid on the remote platform. When a Teradata
user makes a request that uses the foreign server, the foreign server object provides the
credentials from the authorization object to the target platform for authentication. This
allows any part of the request that runs on the remote platform to use the context, privileges,
and access control granted to the remote platform user account.
For example, if the foreign server connects to a Hadoop server protected by LDAP, then the
associated authorization object must contain credentials for the user account in LDAP. If the
foreign server connects to a Hadoop server protected by Kerberos, then the associated
authorization object must contain credentials for the user account in Kerberos.
The syntax table describes only the portions of the CREATE AUTHORIZATION and
REPLACE AUTHORIZATION syntax diagram that are specific to Teradata QueryGrid. For
information about the other syntax that you can use with CREATE AUTHORIZATION and
REPLACE AUTHORIZATION, see SQL Data Definition Language - Syntax and Examples,
B035-1144.
Syntax
CREATE AUTHORIZATION [ database_name. | user_dbname. ] authorization_name
   AS [ INVOKER | DEFINER [ DEFAULT ] ] TRUSTED
   USER 'fs_user_name'
   PASSWORD 'fs_password' ;
REPLACE AUTHORIZATION follows the same syntax.
Syntax Elements
database_name.
user_dbname.
Optional name of the location where the authorization is to be stored.
The default location that is used changes based on whether DEFINER or INVOKER is
specified. The following rules apply to specifying DEFINER or INVOKER:
• If you specify DEFINER, the database or user you specify must be the containing
database or user for the foreign server, UDF, table UDF, method, or external SQL
procedure. If no location is specified, the authorization is created in the database
that contains the foreign server objects (TD_SERVER_DB).
• If you specify INVOKER, the database_name or user_dbname you specify must be
associated with the session user who will be sending requests to the foreign server. If
no location is specified, the authorization is placed in the user database of the
creator of the authorization.
authorization_name
Name for the authorization object. This name must be unique within the database in
which it is stored.
INVOKER
DEFINER
• If you specify INVOKER TRUSTED, or if you specify TRUSTED alone, Teradata
creates the authorization object in the database of the user who creates the object.
This syntax makes the authorization available only to those with privilege to the user
database.
• If you specify DEFINER TRUSTED or DEFINER DEFAULT TRUSTED, then
Teradata creates the authorization object in the database that contains the object
that is using the authorization; for a foreign server this is the TD_SERVER_DB
database. This syntax makes the authorization globally available.
TRUSTED
A keyword used to specify that the credentials are to be encrypted and stored as
database objects.
When using an authorization object, you must use the TRUSTED security type for
Teradata QueryGrid: Teradata Database-to-Hadoop.
You cannot use TRUSTED authorizations in CREATE or REPLACE UDF or XSP
statements.
'fs_user_name'
The name of the credential on the remote platform to be used by the foreign server.
'fs_password'
The password for the credential on the remote platform to be used by the foreign server.
All existing rules for CREATE AUTHORIZATION and REPLACE AUTHORIZATION
apply.
For more information about using CREATE AUTHORIZATION and REPLACE
AUTHORIZATION, see SQL Data Definition Language - Syntax and Examples,
B035-1144.
Usage Notes
• An authorization is required only if you are using an external security system (such as
LDAP or Kerberos) for authentication on the foreign server's target platform. For more
information, see LDAP and Kerberos Authentication Security.
Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos
authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH
5.4 is not supported.
• You must use either INVOKER TRUSTED or DEFINER TRUSTED when authentication
on Hadoop is performed by an external security system such as LDAP or Kerberos.
• Use INVOKER TRUSTED when you want to create a one-to-one mapping between the
Teradata user and the user on the foreign server's target platform. For example, using the
same user name for Teradata and LDAP.
• Use DEFINER TRUSTED when you want to create a many-to-one mapping between
Teradata users and a user on the foreign server's target platform. For example, when you
want multiple Teradata users who are making requests to the foreign server to use one
LDAP account on the target platform.
• When you create an authorization for another user using INVOKER TRUSTED,
user_dbname must be specified. Specify the user database associated with the session
user who will be sending requests to the foreign server. If you fail to specify
user_dbname, the authorization is stored in your own user database.
• The authorization takes up no space in the database used to store it.
• If your credentials change on the foreign server's target platform, you must remember to
replace the credentials in your authorization object. If you fail to update the invalid
information, the next time that you try to reference the foreign server object, you get an
error message.
• If you drop an authorization object, keep in mind that it may be used by multiple foreign
server objects. You should either drop the foreign server objects or alter them so that they
specify a valid authorization object. If you fail to update the invalid information, the next
time that you try to reference the foreign server object, you get an error message.
Examples of Creating and Replacing the Authorization
If you plan to use the authorization to authenticate to LDAP or Kerberos on a foreign server
then you must use either INVOKER TRUSTED or DEFINER TRUSTED.
The following two examples establish authorization for the user who invokes the object. The
credentials are encrypted and stored as a database object in the user database.
CREATE AUTHORIZATION sales AS INVOKER TRUSTED
USER 'johnson'
PASSWORD 'Secret' ;
REPLACE AUTHORIZATION sales AS TRUSTED
USER 'williams'
PASSWORD 'topsecret' ;
If you want to make the authorization available globally, create the authorization on
TD_SERVER_DB using the DEFINER TRUSTED type.
CREATE AUTHORIZATION TD_SERVER_DB.remote_system1
AS DEFINER TRUSTED USER 'proxy_1'
PASSWORD 'Global' ;
If you use DEFINER TRUSTED, as in this example, then the credentials for johnson are
stored in the sales authorization created in the TD_SERVER_DB database.
CREATE AUTHORIZATION TD_SERVER_DB.sales AS DEFINER TRUSTED
USER 'johnson'
PASSWORD 'Secret';
CREATE FOREIGN SERVER
Purpose
Creates a foreign server object and associates table operators with it.
When you create a server object, you can customize it based on its purpose. You can define
multiple server objects for the same remote database, each with different characteristics
needed by different users.
You can use name value pairs to define the characteristics of the foreign server. You can use
global parameters to define the foreign server as a whole. Some table operators have local
parameters that are specific to the operator. Some parameters can be used so that the server
object overrides user selection; for example, you limit access to data by setting the table
name.
Syntax
CREATE FOREIGN SERVER server_name
   [ EXTERNAL SECURITY [ INVOKER | DEFINER ] TRUSTED authorization_name ]
   [ USING name('value') [...] ]
   [ DO IMPORT WITH [ database_name. ] table_operator
        [ USING name('value') [...] ]
   [, DO EXPORT WITH [ database_name. ] table_operator
        [ USING name('value') [...] ] ] ] ;
Syntax Elements
server_name
The name given to the foreign server object.
EXTERNAL SECURITY
Associates an authorization object with the foreign server. The authorization stores the
encrypted credentials for a user as a database object. The Teradata QueryGrid connector
passes the credentials in the authorization to the remote platform identified by the
foreign server when the foreign server is accessed.
You must use EXTERNAL SECURITY TRUSTED for Teradata QueryGrid: Teradata
Database-to-Hadoop when the Hadoop platform is protected by an external security
system, such as Kerberos, for example.
INVOKER
DEFINER
INVOKER is a keyword that indicates that the associated authorization must be present
in the user database at the time that the foreign server is accessed.
Note: The user database is the database that was created for the user in the Teradata
system when the user account was created.
DEFINER is a keyword that indicates that the associated authorization must be present
in the database that contains the foreign server when the foreign server is accessed.
Note: The DEFAULT keyword that can be used with DEFINER in CREATE
AUTHORIZATION and REPLACE AUTHORIZATION statements is not needed in
association with a foreign server.
You must use either INVOKER TRUSTED or DEFINER TRUSTED if the remote
platform uses an external security system (such as Kerberos, for example) for
authentication.
TRUSTED
A keyword that indicates the associated authorization object was created as TRUSTED.
authorization_name
Specifies the name of the authorization object to be used when the foreign server is
accessed.
Using Option
USING
USING introduces the global name value pairs (NVPs) that provide the server definition
information. USING must be followed by at least one name value pair of the form
name('value'), although an empty value of '' is supported. You can create a foreign
server without a USING clause, but users cannot query a foreign server until you
complete the server definition with an import operator and an export operator.
The USING clause that appears in the server area (in front of the table operators)
contains global parameters that define the connection to the remote platform and can
be applied to both the import and export table operators. The USING clause that
appears in the operator option part of the syntax diagram contains local parameters
that are used just for that table operator. A name value pair can be used in any USING
clause location unless otherwise indicated as "Server only," "Import only," or "Export
only."
name('value')
The name value pair or pairs that you specify to define the foreign server.
For descriptions of the name value pairs used with the server object, see "Required Name
Value Pairs" and "Optional Name Value Pairs."
Required Name Value Pairs
These name value pairs are required to create a functioning foreign server object. Additional
optional name value pairs may be required to create a foreign server for a specific
implementation.
hosttype
Server only.
For Teradata QueryGrid: Teradata Database-to-Hadoop, this is ('hadoop').
port
Server only. The server port number for the Hive Metastore; typically this is 9083.
server
Server only. The DNS host name or IP address for the Apache Hive Metastore
(hive.metastore.uris). You can use an application, such as Ambari, to obtain this value.
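Putting the required name value pairs together, a minimal sketch of a working server definition; the server host name and the operator versions are assumptions, so substitute the operator names that match your Hadoop distribution:

```sql
CREATE FOREIGN SERVER TD_SERVER_DB.hadoop1
USING
  hosttype('hadoop')
  server('hivemeta.example.com')
  port('9083')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0 ;
```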
Optional Name Value Pairs
These name value pairs are optional. However, a particular server implementation may
require you to define some of these name value pairs. For example, a foreign server must be
defined with a Hive port to support queries that access Hive Server2.
clustername
Required when using security('kerberos'). Specifies the directory name that
stores the JAR file that contains the configuration files (core-site.xml, hdfs-site.xml,
hive-site.xml, and yarn-site.xml) for the Hadoop cluster to be accessed.
This directory was set up during the Hadoop client installation on the Teradata nodes.
For example, you would use the name value pair clustername('yourcluster') if
the files are stored as follows:
• yourcluster/
• yourcluster/core-site.xml
• yourcluster/hdfs-site.xml
• yourcluster/hive-site.xml
• yourcluster/yarn-site.xml
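For example, a fragment of a CREATE FOREIGN SERVER statement for a Kerberos-protected cluster whose configuration files live under yourcluster/; the authorization name kerberos_auth and the host name are assumptions:

```sql
CREATE FOREIGN SERVER TD_SERVER_DB.hadoop_krb
EXTERNAL SECURITY INVOKER TRUSTED kerberos_auth
USING
  hosttype('hadoop')
  server('hivemeta.example.com')
  port('9083')
  security('kerberos')
  clustername('yourcluster')
```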
compression_codec
Export only. Specifies the type of compression to use for the exported data. The default
is no compression.
The supported compression types are based on the compression codecs configured on
the Hadoop system.
Note: Snappy is supported but you must specify it in hadoop_properties using
<orc.compression=SNAPPY> as the argument. Snappy is supported only for the
ORCFile file type.
You must specify the full name for the compression codec as follows:
• org.apache.hadoop.io.compress.DefaultCodec
• org.apache.hadoop.io.compress.GzipCodec
• org.apache.hadoop.io.compress.BZip2Codec
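For example, a sketch of the export operator option that selects Gzip compression; the operator version is an assumption:

```sql
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0 USING
compression_codec('org.apache.hadoop.io.compress.GzipCodec')
```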
dbname
The name of the user’s database.
This parameter is optional. You can specify a dbname value in the foreign server to limit
its scope to a specific database. The value specified in the USING clause in the CREATE
FOREIGN SERVER syntax overrides any corresponding value specified directly in the
user query.
default_string_size
Size at which data imported from or exported to Hadoop String columns is truncated.
When applied to the import operator, the value represents the maximum number of
Unicode characters to import, and defaults to 2048 characters. When applied to the
export operator, the value represents the maximum number of bytes to export, and
defaults to 4096 bytes. Teradata QueryGrid silently truncates the String columns at the
default value set in default_string_size.
hadoop_properties ('<property1=value>, <property3=value1,value2>')
Sets specific properties used to interact with Hadoop. If there are multiple arguments,
you must delimit them with angle brackets. If there is only one argument, you can omit
the angle brackets. For example, the syntax for the hadoop_properties clause for a High
Availability (HA) target supports an updated syntax where multiple values can be
included. In this case the properties must be enclosed by left and right angle brackets.
No spaces are allowed within or between arguments.
The High Availability hadoop properties are defined based on the name service that is
defined on your hadoop server. For example, if you have the following Hadoop
properties:
hadoop_properties('
<dfs.client.use.datanode.hostname=true>
,<dfs.datanode.use.datanode.hostname=true>
,<dfs.nameservices=MYCOMPANY_HADOOP02>
,<dfs.ha.namenodes.MYCOMPANY_HADOOP02=nn1,nn2>
,<dfs.namenode.rpc-address.MYCOMPANY_HADOOP02.nn1=hdp230-2:8020>
,<dfs.namenode.rpc-address.MYCOMPANY_HADOOP02.nn2=hdp230-3:8020>
,<dfs.client.failover.proxy.provider.MYCOMPANY_HADOOP02=org.apache.
hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider>')
In this example, you would make the following replacements:
• Replace MYCOMPANY_HADOOP02 with your own name service ID.
• Replace hdp230-2 and hdp230-3 with your own namenode hostnames.
It may also be necessary for you to replace nn1 and nn2 in the example above with your
own namenode aliases. To verify, check the following property in your hdfs-site.xml file:
<property>
<name>dfs.ha.namenodes.MYCOMPANY_HADOOP02</name>
<value>namenode10,namenode66</value>
</property>
In this case, you would make the following replacements:
• Replace nn1 with namenode10
• Replace nn2 with namenode66
In most cases, you should set the dfs.client.use.datanode.hostname property to true. If
you have a setup where your TPA nodes are on one BYNET, the Hadoop cluster is on
another BYNET, and they are communicating with one another via Ethernet, then you
should also set the dfs.datanode.use.datanode.hostname property to true.
hiveserver
The DNS host name or IP address of Hive Server2. This is used when a query results in
the use of the HCTAS and HDROP procedures or FOREIGN TABLE SELECT in Hive.
(You can use an application, such as Ambari, to obtain this value.) If no value is
specified for hiveserver then the value for server is used.
hiveport
The port for access to the Hive Server2; typically this is 10000. You can use an
application, such as Ambari, to obtain this value.
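For example, a sketch of a USING clause that names both the Hive Metastore and Hive Server2, so that HCTAS, HDROP, and FOREIGN TABLE requests can reach Hive Server2; the host names are assumptions:

```sql
USING
  hosttype('hadoop')
  server('hivemeta.example.com')
  port('9083')
  hiveserver('hiveserver2.example.com')
  hiveport('10000')
```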
merge_hdfs_files
Export only. Indicates whether files under the same partition should be merged
whenever possible. Valid values are 'true' and 'false'. The default is 'false', meaning
that files are not merged.
row_count_report_freq
The frequency with which byte count is updated in DBQL. The default is every 100
rows. You can adjust this to a larger value if the update frequency is too resource
intensive.
security
Specifies the name of the external security system used for authentication on the
Hadoop cluster. This parameter is required when an external security system is in use.
The default is no security. Valid values are:
• kerberos
• ldap
Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos
authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH
5.4 is not supported.
tablename
The name of the table to be imported or exported.
This parameter is optional. You can specify a tablename value in the foreign server to
limit its scope to a specific table. The value specified in the USING clause in the
CREATE FOREIGN SERVER syntax overrides any corresponding value specified
directly in the user query.
temp_dbname
Import only. The value is the name of the Hadoop database to use to store temporary
Hive staging tables.
This parameter is optional. You should consider using temp_dbname when planning to
use FOREIGN TABLE syntax on a foreign server or when a foreign server is set up to
use usenativequalification. If no database is specified with temp_dbname, then the
default Hadoop database is used.
To use the specified database, it must exist when the foreign server is created or altered
to use the database. The session user must have create and write permission to the
database. If multiple users use the same foreign server then the Hadoop administrator
may want to consider setting up Hive authorization in such a way that temporary tables
cannot be read by another user.
transformformatting
Import only. When set to 'true', indicates that array data is formatted so that it can be
cast directly into a Teradata ARRAY column of the appropriate data type.
This parameter is optional. The value specified in the USING clause in the CREATE
FOREIGN SERVER syntax overrides any corresponding value specified directly in the
user query.
updatestatistics
Export only. Indicates that the LOAD_TO_HCATALOG_abcn_n_n operator updates
the table statistics after all the data has been loaded into the target Hive table. Valid
values are 'true' and 'false'. A value of true means that the table statistics are updated.
Note: You must also have set hive.stats.autogather to true in your hive-site.xml file for
updatestatistics to work properly.
usenativequalification
Import only. A value of 'true' indicates that SELECT queries should be pushed down to
Hive as much as possible. When a foreign server uses usenativequalification, Teradata
Database examines the following conditions:
• Hive table data size is large and there are qualifying predicates on non-partitioned
Hive columns. Large is defined as having a number of splits that is larger than the
number of Teradata Database nodes.
• The queried Hive object is a view.
When either of the two conditions is met, Teradata Database constructs a Hive query
from the Hive object name, the referenced columns, and the qualifying predicates. It
then creates a Hive staging table (in the database specified by temp_dbname) from the
constructed query and retrieves data into the staging table. The staging table is dropped
after all data has been retrieved.
Valid values are 'true' and 'false'.
For queries that involve joins between two HCatalog tables, Teradata Database brings
the data into Teradata spool and performs the join in the database; the join is not
pushed into Hive. For example, to perform the join in the following query in Hive, you
must issue a manual FOREIGN TABLE SELECT:
SELECT h1.c1, h2.c2 FROM h1@hadoop1, h2@hadoop1 WHERE h1.id =
h2.id ;
username
The name of the Hadoop user's credential. This option is ignored when the security
name value pair is defined for the foreign server. If no username value and no security
value are defined, the foreign server uses the name of the user making the request.
(This is the Teradata Database user name in capital letters.)
No password is associated with a Hadoop user. HDFS and Hive check for a user name
for access. If no user name is specified, the foreign server supplies the name of the
session user. If HDFS and Hive are not configured for file permissions, the user name
is optional.
DO IMPORT WITH
Associates an IMPORT table operator with a foreign server.
Note: You can specify table operators in any order.
DO EXPORT WITH
Associates an EXPORT table operator with a foreign server.
Note: You can specify table operators in any order.
Operator Option
database_name.
The name of the database that contains the operator that you want to call. For example,
SYSLIB.
table_operator
The name of the table operator to use for import or export. The Teradata-to-Hadoop
connector provides the following table operators for use:
Table Operators                        Connector Releases That Provide These Operators
LOAD_FROM_HCATALOG_HDP1_3_2            15.0, 15.0.1, 15.0.2, and 15.0.3
LOAD_TO_HCATALOG_HDP1_3_2
LOAD_FROM_HCATALOG_HDP2_1_2            15.0.1, 15.0.2, 15.0.3, and 15.0.4
LOAD_TO_HCATALOG_HDP2_1_2
LOAD_FROM_HCATALOG_HDP2_3_0            15.0.4
LOAD_TO_HCATALOG_HDP2_3_0
LOAD_FROM_HCATALOG_CDH5_4_3            15.0.4
LOAD_TO_HCATALOG_CDH5_4_3
DO IMPORT WITH uses the LOAD_FROM_HCATALOG_abcn_n_n table operators.
These table operators retrieve data from a Hadoop distributed database into Teradata
Database, where the data can be placed in tables or joined with existing tables. These
table operators produce a spooled table that contains rows and columns of data from a
user-specified Hadoop table that is defined in the HCatalog of the remote system.
DO EXPORT WITH uses LOAD_TO_HCATALOG_abcn_n_n table operators.
These table operators export data from Teradata Database into a Hadoop distributed
database, where the data can be placed in tables or joined with existing tables.
Note: When you create a foreign server, specify the table operator name with the
distribution acronym and version number that corresponds with the version of the
Hadoop distribution that you are using. For example,
LOAD_FROM_HCATALOG_CDH5_4_3 is compatible with Cloudera CDH version
5.4.3.
Supported Data Types, HCatalog File Types, and Compression
The following table shows the data types supported by the Teradata-to-Hadoop connector
and how they are mapped during import and export.
Hadoop Data Type       Teradata Database Data Type
String                 VARCHAR UNICODE CHARSET
Boolean                BYTEINT
Integer                INT
BigInt                 BIGINT
Float                  FLOAT
Double                 FLOAT
BINARY                 VARBYTE
MAP                    VARCHAR UNICODE CHARSET
Struct                 VARCHAR UNICODE CHARSET
ARRAY                  VARCHAR UNICODE CHARSET
TINYINT                BYTEINT
SMALLINT               SMALLINT
Date                   DATE
Timestamp              TIMESTAMP
Decimal                DECIMAL
VARCHAR                VARCHAR
CHAR                   CHAR
Data Import Notes
The Hadoop String data type does not have a maximum length. Teradata Database has a row
size limit of approximately 64K. Exceeding this row size limit results in an error. The default
VARCHAR string size of 4096 bytes (2048 Unicode characters) permits approximately 14
Hadoop String columns to be imported successfully and held in a single Teradata Database
row. (Note that the row header and columns of other types are part of the row size and may
reduce the number of String columns that can be imported.)
You may want to change the default_string_size in the Foreign Server to a different value
based on typical string size and number of Hadoop String columns to be imported. For
example, if your row size typically exceeds the size limit you may want to take a best-fit
approach and set the default_string_size to a value smaller than 2048 so that imports can be
performed without error. The strings are truncated at the value set in default_string_size.
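For example, a best-fit sketch that lowers the truncation size on an existing server so that wider rows import without error; the server name and the chosen size are assumptions:

```sql
ALTER FOREIGN SERVER hadoop1
ADD default_string_size('1024') ;
```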
You may alternately want to import large Hadoop STRING columns using a more
customized approach. For more information, see RETURNS Clause.
The Teradata Database data type JSON must first be cast to VARCHAR or CLOB for export.
It is imported as a VARCHAR or CLOB and cast to JSON.
BOOLEAN data types are stored as true/false literal values in Hadoop HiveQL. In Teradata,
Hadoop BOOLEAN data types are mapped to BYTEINT data types. When Teradata imports
Boolean data from Hadoop, the true/false literal values are stored as 1/0 values, respectively,
in the corresponding Teradata BYTEINT column.
Data Export Notes
The Date and Timestamp are assumed to be UTC. For more information, see Timestamp
Data.
You should be aware of the display differences that occur when you export BOOLEAN data
from Teradata into Hadoop:
• If you export data into a Hive Table that you created by using the Hive command line
interface, the BYTEINT data type in Teradata is mapped to a BOOLEAN in HCatalog,
and the 1/0 values are displayed as true/false literals in Hive.
• If you export data into a new Hive table that you created by using the Teradata HCTAS
stored procedure, then the BYTEINT data type in Teradata is mapped to TINYINT in
HCatalog, which has the same range as BYTEINT. So, the BYTEINT 1/0 values
(Boolean) are exported as-is into the TINYINT column in HCatalog.
Teradata QueryGrid: Teradata Database-to-Hadoop User Guide
31
Chapter 2 Syntax for Teradata QueryGrid: Teradata Database-to-Hadoop
CREATE FOREIGN SERVER
In Hive, if you want to display Boolean true/false literal values in an HCatalog table,
instead of the 1/0 values, you can use the built-in CAST conversion function in Hive to
convert the display of a TINYINT value to the BOOLEAN primitive type. For example:
SELECT CAST(column_bool_1 AS BOOLEAN) from HCTAS_TBL;
where column_bool_1 is defined in the HCatalog table as follows:
column_bool_1
TINYINT,
HCatalog File Types and Compression
The following table shows the file types and compression types supported by the Teradata-to-Hadoop connector.

Hortonworks HDP 2.1.2 and 2.3.0
• Supported import file types: TextFile, RCFile, ORCFile
• Compression supported with TextFile: DefaultCodec, BZip2Codec, GzipCodec
• Compression supported with RCFile: Block compression
• Compression supported with ORCFile: Block compression, SnappyCodec

Cloudera 5.4.3
• Supported import file types: TextFile, RCFile
• Compression supported with TextFile: DefaultCodec, BZip2Codec, GzipCodec
• Compression supported with RCFile: Block compression
Required Privileges
You must have CREATE SERVER privilege on the TD_SERVER_DB database to define a
foreign server object. If you are associating the server with table operators, you must also
have EXECUTE FUNCTION and SELECT privileges on the specified table operators.
Usage Notes
• The target platform of the foreign server object must be running and reachable when you
create the foreign server object for it in Teradata Database.
• You can create multiple named foreign server objects that reference the same server using
the same IP and port numbers.
• Foreign server object names that are stored in TD_SERVER_DB must be unique.
• Teradata treats the hosttype name value pair as special. If you specify this name value
pair, you must use it in the server-area name value list.
• Name value pairs in the server area of the syntax apply to the connection to the remote
platform and to both of the table operators specified in the IMPORT WITH and
EXPORT WITH clauses.
• Name value pairs in the IMPORT WITH or EXPORT WITH clause apply only to the
table operator specified in the clause.
• Server options, names, and name value pairs can appear only once in the CREATE
FOREIGN SERVER syntax.
Name value pairs used within the IMPORT WITH and EXPORT WITH clauses cannot
duplicate those used in the server-area name value list.
• The order of the DO IMPORT WITH and DO EXPORT WITH clauses in the CREATE
SERVER syntax does not matter.
• You must grant SELECT, INSERT, and SHOW privileges on foreign server objects to
users who need to query foreign server objects.
• The use of the EXTERNAL SECURITY clause is required when the foreign server's target
platform uses LDAP or Kerberos for authentication. For more information, see LDAP
and Kerberos Authentication Security.
You cannot use the following names in the name value pairs in CREATE SERVER
statements:
• Columns
• hExplain
• IsNested
• Servermode
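For example, the privileges described in the usage notes above could be granted to a querying user as follows. The server and user names here are hypothetical, and this is a sketch rather than the complete GRANT syntax:
GRANT SELECT, INSERT, SHOW ON TD_SERVER_DB.hadoop1 TO query_user ;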
Examples of Using CREATE FOREIGN SERVER
A standard foreign server definition must contain the following NVPs:
• server
• port
• hosttype
Most foreign server definitions also use the following NVPs:
• hiveserver
• hiveport
• username
hadoop_properties may also need to be defined in the following situations:
• Data Nodes have a private network (multi-homed)
• High Availability is enabled
For a description of the name value pairs used to define the foreign server, see Using Option.
Example: Typical Server Definition With IMPORT and EXPORT Table
Operators
The following example creates a server object and associates an IMPORT table operator and
an EXPORT table operator with it:
CREATE FOREIGN SERVER hadoop1
USING
hosttype('hadoop')
server('192.0.2.3')
port('9083')
hiveport ('10000')
username('hive')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0;
Example: Creating a Server Object for LDAP
The following example creates a server object for an LDAP-protected Hadoop cluster. It uses
an authentication object named auth_hdp that is located in the user database for the session
user:
CREATE FOREIGN SERVER TD_SERVER_DB.hadoop2
EXTERNAL SECURITY INVOKER TRUSTED auth_hdp
USING
hosttype('hadoop')
server('hserver_name.example')
port('9083')
hiveport ('10000')
security('ldap')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_1_2,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_1_2;
Example: Creating a Server Object for Kerberos
The following example creates a server object for a Kerberos-protected Hadoop cluster. It
uses an authentication object named auth_cdh that is located in the user database for the
session user:
CREATE FOREIGN SERVER TD_SERVER_DB.hadoop2
EXTERNAL SECURITY INVOKER TRUSTED auth_cdh
USING
hosttype('hadoop')
server('hserver_name.example')
port('9083')
hiveport ('10000')
security('kerberos')
clustername('foo')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_CDH5_4_3,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_CDH5_4_3;
Example: Using the Unicode Delimited Identifier to Create a Server
Object
The following example creates a server object using the Unicode Delimited Identifier for the
server name:
CREATE FOREIGN SERVER U&"hadoop#005fsrv" UESCAPE'#'
USING
server('hive_metastore_server')
port('9083')
hosttype('hadoop')
hiveport ('10000')
username('hive')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0 USING
transformformatting('true') ;
Example: Using a Double-quoted Object Name to Create a Server Object
The following example creates a server object using the double-quoted object name server
name and associates an IMPORT table operator with it:
CREATE FOREIGN SERVER TD_SERVER_DB."hadoop srv1"
USING
server('hive_metastore_server')
port('9083')
hosttype('hadoop')
hiveport ('10000')
username('hive')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_1_2 USING
transformformatting('true') ;
CREATE FUNCTION (Table Form)
Purpose
Creates a table function definition.
This syntax diagram excerpt shows the addition of the EXECUTE THREADSAFE
parameter to CREATE FUNCTION (Table Form). For information about the other syntax
that you can use with CREATE FUNCTION (Table Form), see SQL Data Definition
Language - Syntax and Examples, B035-1144.
Syntax

The function attribute clauses for the table form, including the new EXECUTE THREADSAFE option, are:

  language_clause
  SQL_data_access | external_data_access
  SPECIFIC [ db_name. | user_name. ] specific_function_name
  PARAMETER STYLE { SQL | JAVA }
  [NOT] DETERMINISTIC
  CALLED ON NULL INPUT
  EXECUTE [NOT] THREADSAFE
Syntax Elements
EXECUTE THREADSAFE
Indicates that the function is to be loaded with a special thread safe loader.
This attribute applies only to Java UDFs. It uses additional memory, because each AMP instance
is loaded separately, and it is intended only for classes that are not thread safe.
Teradata does not recommend the use of EXECUTE THREADSAFE.
For information about using CREATE FUNCTION (Table Form), see SQL Data Definition
Language - Syntax and Examples, B035-1144.
DROP FOREIGN SERVER
Purpose
Drops a foreign server object from the TD_SERVER_DB database.
In addition to deleting the server object and its associated information from the dictionary
tables, all dependent entries on the associated table operators are deleted.
You must have the DROP SERVER privilege on the TD_SERVER_DB database or on the
specified foreign server to DROP the foreign server.
Syntax

DROP FOREIGN SERVER [ TD_SERVER_DB. ] server_name ;
Syntax Elements
server_name
The name of the foreign server object.
You can also use the following formats for the server name:
• the Unicode Delimited Identifier, such as U&"foreign#005fsv" UESCAPE'#'
• the double-quoted object name, such as "foreign srv1"
TD_SERVER_DB.
The name of the database that stores server objects and their attributes.
Examples of Dropping a Foreign Server
These examples show dropping a server object.
DROP FOREIGN SERVER hive_metastore_server ;
DROP FOREIGN SERVER U&"hadoop#005fsrv" UESCAPE'#' ;
DROP FOREIGN SERVER "hcatalog server" ;
END LOGGING
Purpose
Ends the auditing of SQL requests that started with a BEGIN LOGGING request.
This topic describes only the portions of the END LOGGING syntax diagram that are
specific to Teradata QueryGrid. For information about the other syntax that you can use
with END LOGGING, see SQL Data Definition Language - Syntax and Examples,
B035-1144.
Syntax

END LOGGING [ DENIALS ] [ WITH TEXT ]
  ON { ALL | operation [, operation ...] | GRANT }
  [ FOR CONSTRAINT constraint_name [, ...] ]
  [ BY { database_name | user_name } [, ...] ]
  [ ON { AUTHORIZATION authorization_name |
         DATABASE database_name |
         USER database_name |
         { TABLE | VIEW | MACRO | PROCEDURE | FUNCTION | TYPE | FOREIGN SERVER }
           [ database_name. | user_name. ] object_name } ] ;
Syntax Elements
ON operation
Indicates the operation for which log entries should no longer be made.
ON FOREIGN SERVER object_name
Indicates that the operation for which log entries should no longer be made is access to
a foreign server.
You must specify an object name, which is the name of the foreign server. You can
optionally specify the name of the containing database, which must be
TD_SERVER_DB. You cannot use a user_name with FOREIGN SERVER.
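For example, a statement of the following form ends logging of access to a foreign server. The server name here is hypothetical, and the statement is a sketch of the syntax described above:
END LOGGING ON ALL ON FOREIGN SERVER TD_SERVER_DB.hadoop1 ;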
For information about using END LOGGING, see SQL Data Definition Language - Syntax
and Examples, B035-1144.
GRANT and REVOKE
GRANT grants one or more explicit privileges on a database, foreign server, user, proxy
logon user, table, hash index, join index, view, stored procedure, User-Defined Function
(UDF), User-Defined Method (UDM), User-Defined Type (UDT), or macro to a role, group
of roles, user, or group of users or databases. REVOKE revokes privileges on the same
objects.
There are no changes to existing syntax for Teradata QueryGrid, except that CREATE
SERVER and DROP SERVER privileges have been added. These privileges should be granted
only to a user, not to a database.
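For example, an administrator could grant the new privileges as follows, where the user name is hypothetical:
GRANT CREATE SERVER ON TD_SERVER_DB TO admin_user ;
GRANT DROP SERVER ON TD_SERVER_DB TO admin_user ;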
For a syntax diagram and description of the syntax elements that you can use in GRANT
and REVOKE, see SQL Data Control Language, B035-1149.
HELP FOREIGN
Purpose
Returns the details of the foreign object that you specify.
• A foreign server object name returns the list of databases accessible on the server.
• The name of a database on a foreign server returns the list of tables in the remote
database on the server.
• The name of a table in a remote database on a foreign server returns the list of columns in
the remote table on the server.
Syntax

HELP FOREIGN { SERVER server_name |
               DATABASE db_name@server_name |
               TABLE db_name.table_name@server_name } ;
Syntax Elements
SERVER server_name
The name of the foreign server. Displays the databases in the foreign server.
DATABASE db_name@server_name
The name of the remote database, qualified with the name of the foreign server. Displays the
tables in the database.
TABLE db_name.table_name@server_name
The name of the remote table, qualified with the name of the foreign server. Displays the
column names, types, and partitioning.
Required Privileges
You must have ANY privilege on the server object to display the output from HELP
FOREIGN.
Examples of Using HELP FOREIGN
Example: Using HELP FOREIGN to List Databases
The import table operator behavior determines the information that this statement returns.
The table and database names, if specified, are passed as name value pairs to the import table
operator to retrieve the appropriate information. The response to the HELP statement is a
SELECT response. The number of columns and rows returned for this statement are the same as in a
regular SELECT response from Teradata.
Assume you have created the following server:
CREATE FOREIGN SERVER HADOOPSRV USING
SERVER('HIVE_METASTORE_SERVER')
PORT('9083')
HOSTTYPE('HADOOP')
HIVEPORT ('10000')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_1_2,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_1_2;
And a user types the following query:
HELP FOREIGN SERVER HADOOPSRV ;
The output looks similar to the following:
*** Query completed. 4 rows found. One column returned.
*** Total elapsed time was 6 seconds.
databases
------------------------------------------------
books
default
product
test
Example: Using HELP FOREIGN to List Tables
If you use HELP with a database name, it returns a list of the tables in the database, as the
following example shows:
HELP FOREIGN DATABASE product@hadoopSrv;
*** Query completed. One row found. One column returned.
*** Total elapsed time was 3 seconds.
tables
-------------------------------------------
cellphonedata_t
Example: Using HELP FOREIGN to List the Columns in a Table
This example shows the syntax used to list the columns in the cellphonedata_t table in the
product database on the hadoopSrv foreign server.
.sidetitles on
.foldline on
HELP FOREIGN TABLE product.cellphonedata_t@hadoopSrv;
The output looks similar to the following:
*** Query completed. 13 rows found. 3 columns returned.
*** Total elapsed time was 3 seconds.
name internal_memory
column_type int
partitioned_column f
name model
column_type string
partitioned_column f
name weight
column_type float
partitioned_column f
name colors
column_type string
partitioned_column f
name camera
column_type float
partitioned_column f
name chipset
column_type string
partitioned_column f
name sim
column_type string
partitioned_column f
name operating_system
column_type string
partitioned_column f
name touchscreen
column_type string
partitioned_column f
name memory_slot
column_type string
partitioned_column f
name stand_by_time
column_type int
partitioned_column f
name dt
column_type string
partitioned_column t
name country
column_type string
partitioned_column t
SHOW FOREIGN SERVER
Purpose
Displays the SQL text most recently used to create, drop, or modify the server object.
A SHOW FOREIGN SERVER statement allows you to see a server object definition that
contains the name value pairs that the associated table operators use to connect to the foreign
server.
Syntax
SHOW
FOREIGN SERVER
IN XML
server_name
TB_SERVER_DB.
;
Syntax Elements
IN XML
To return the report in XML format.
The XML schema for the output produced by this option is maintained in:
http://schemas.teradata.com/dbobject/DBobject.xsd
TD_SERVER_DB.
The name of the database that stores foreign server objects and their parameters.
server_name
The name of the foreign server object.
For the full syntax diagram and information about the other objects that can be used with
SHOW, see SQL Data Definition Language - Syntax and Examples, B035-1144.
Required Privileges
SHOW FOREIGN SERVER requires SHOW privilege or ANY privilege on the server object
to display the output.
Examples of Using SHOW FOREIGN SERVER
This example demonstrates a CREATE SERVER followed by a SHOW SERVER statement,
and then its output.
CREATE FOREIGN SERVER hadoopSrv USING
server('hive_metastore_server')
port('9083')
hosttype('hadoop')
hiveport ('10000')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0;
The SHOW FOREIGN SERVER statement for this server results in output that looks similar
to the following:
CREATE FOREIGN SERVER TD_SERVER_DB.hadoopSrv USING
server ('hive_metastore_server')
port ('9083')
hosttype ('hadoop')
hiveport ('10000')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0;
If you use SHOW IN XML FOREIGN SERVER syntax, the output appears similar to the
following:
<?xml version="1.0" encoding="UTF-16" standalone="no" ?>
<TeradataDBObjectSet version="1.0" xmlns="http://schemas.teradata.com/
dbobject" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schemas.teradata.com/dbobject http://
schemas.teradata.com/dbobject/DBObject.xsd">
<ForeignServer dbName="TD_SERVER_DB" name="hadoopSrv" objId="0:2996"
objVer="1">
<ServerClauseList><Clause name="server"
value="hive_metastore_server"/>
<Clause name="port" value="9083"/>
<Clause name="hosttype" value="hadoop"/>
<Clause name="hiveport" value="10000"/>
</ServerClauseList>
<ImportClause tblopdb="SYSLIB"
tblopname="LOAD_FROM_HCATALOG_HDP2_3_0"/>
<ExportClause tblopdb="SYSLIB"
tblopname="LOAD_TO_HCATALOG_HDP2_3_0"/>
</ForeignServer>
<Environment>
<Server dbRelease="15g.00.00.434" dbVersion="15.00.00.00sqlh_16"
hostName="td1410"/>
<User userId="0000FF03" userName="UT1"/>
<Session charset="ASCII" dateTime="2014-01-09T15:50:38"/>
</Environment>
</TeradataDBObjectSet>
Using the Teradata-to-Hadoop Connector in
SELECT Statements
Purpose
SELECT returns specific row data in the form of a result table.
Usage Notes
For the Teradata-to-Hadoop connector, you can use the table_name@server_name syntax to
reference a table on a foreign server or to specify a pass-through query to be executed on a
specified foreign server. The reference to the external table calls the IMPORT table operator
that is associated with the server definition.
You can use FOREIGN TABLE syntax in the FROM clause to perform a pass-through query
or to retrieve results from the specified foreign server. For example, you can specify a Hive
query as the remote pass-through information and the import table operator returns the
results of the query for processing. You can specify remote_ pass-through_information as a
quoted string to exclude it from Teradata analysis.
The table name may optionally specify a database_name, for example
database_name.table_name. The database_name references an equivalent name space on the
server. For Hadoop, the reference is to the database schema in Hive/HCatalog. The
table_name that you use must refer to a base table on the foreign server. References to a view
or view-like object are not supported.
Note that queries that use the table operators can be CPU intensive. Teradata recommends
that you use workload management rules to minimize CPU usage by queries that use the
table operators. For more information, see Post-Installation Configuration.
For additional information about using SELECT, see SQL Data Manipulation Language,
B035-1146.
Example of Using SELECT with Remote Pass-through Information
Assume the following query:
SELECT * FROM FOREIGN TABLE (SELECT count(*) FROM vim.cardata)@hadoop3
myt1 (x);
A reference to an external query to pass through (identified by the FOREIGN TABLE
(…)@server_name syntax) calls the IMPORT table operator that is associated with the server
definition. The grammar in the parentheses is unchecked, but tokenized and then passed to
the remote server for execution. The query returns the following:
*** Query completed. One row found. One column returned.
*** Total elapsed time was 41 seconds.
x
-----------
4
Example of Using SELECT FROM a Table
The following example shows the use of SELECT FROM with a Hadoop table, using the
table_name@server_name.
SELECT * FROM vim.cardata@hadoop2 WHERE make = 'buick';
Example of Limiting the Data Being Spooled
The following example demonstrates use of the WHERE clause, which does not limit the data
being imported over the network, but does limit data being spooled.
SELECT * FROM vim.cardata@hadoop2 as D1 WHERE liter<4 ;
RETURNS Clause
The table operator portion of SELECT supports the use of a RETURNS clause to define the
expected output columns. The RETURNS clause supports either a column list or a table
definition. This clause is typically used if the output column definitions are known and there
is no need to access the remote meta-store to dynamically determine the output columns.
Note: A view definition is not supported.
Examples: Using the RETURNS Clause
For the Teradata-to-Hadoop connector LOAD_FROM_HCATALOG_abcn_n_n table
operator function, the RETURNS clause maps STRING columns in an HCatalog table to
VARCHAR(2048) CHARACTER SET UNICODE on Teradata during import. At times, the
values of the STRING columns on Hadoop may be greater or lesser than VARCHAR(2048),
so in those cases you can choose to specify the actual size of the Hadoop STRING columns
by listing those columns in the RETURNS clause with appropriate VARCHAR display size.
For example, you can use the following query:
SELECT make, model, price FROM vim.cardata@hadoop2 RETURNS (make
VARCHAR(2), model VARCHAR(50)) as D ;
The function also supports mapping large Hadoop STRING columns to BLOB or CLOB
columns on Teradata. Define the corresponding Hadoop string columns as BLOB or CLOB
in the RETURNS clause, for example, as follows:
select i, s from vim.tdjson@hadoop2 RETURNS(s clob(2000)) ;
By default, the function converts an ARRAY column retrieved from an HCatalog table to a
JSON string and then maps that to a VARCHAR(2048) CHARACTER SET UNICODE
column on Teradata. If you want to map the Hadoop ARRAY to a matching Teradata
ARRAY type, you can indicate that in the RETURNS clause, so that the Hadoop ARRAY
type can be converted to a special VARCHAR string that can be CASTed to a matching
Teradata ARRAY type. For example:
create Type strarray as varchar(25) Array[2] ;
SELECT TOP 3 CAST(sarray AS strarray) as stringType FROM
vim.arrayall@hadoop2 RETURNS (sarray strarray) ;
Timestamp Data
When you perform a query that imports data from Hadoop (that is, it uses
LOAD_FROM_HCATALOG_abcn_n_n), timestamp data is assumed to be UTC.
When you perform a query that exports data to Hadoop (that is, it uses
LOAD_TO_HCATALOG_abcn_n_n), timestamp data is converted to UTC.
For the following examples, assume that the following data exists on a hadoop cluster:
hive -e "SELECT * FROM tab_csv"
OK
1    2010-01-01 10:00:00
Assume that the Teradata server has the following settings:
16. System TimeZone Hour   = 7
17. System TimeZone Minute = 0
18. System TimeZone String = Not Set
57. TimeDateWZControl      = 3 (Enabled with LOCAL)
Example: Import Timestamp Data
This example imports timestamp data:
IMPORT (SELECT * FROM tab_csv@foreign_server;)
The data goes through the following conversions:
1. The table operator imports the data (2010-01-01 10:00:00).
2. It assumes that the data is UTC, so Teradata converts it to system time (2010-01-01
17:00:00).
3. Teradata then converts it back to UTC to display the value in the user session
(2010-01-01 10:00:00).
4. The session default time zone is +7 hours: (2010-01-01 17:00:00).
The value in step 4 will vary, depending on the session time zone selected. For example:
set time zone 'gmt';
With this session time zone setting, the displayed timestamp would be (2010-01-01 10:00:00).
Example: Export Timestamp Data
This example exports timestamp data:
(INSERT tab1@foreign_server (2010-01-01 10:00:00))
The data goes through the following conversion:
1. Teradata converts the data to UTC, based on the session time zone. (2010-01-01
03:00:00).
2. The timestamp data is written to the hadoop disk (2010-01-01 03:00:00).
The value in step 1 varies, depending on the session time zone selected. For example:
set time zone 'gmt';
With the above session time zone setting, the data is converted to (2010-01-01 10:00:00).
set time zone 'America Pacific';
With the above session time zone setting, the data is converted to (2010-01-01 18:00:00).
Using the Teradata-to-Hadoop Connector in
INSERT Statements
Purpose
Adds new rows to a named table by directly specifying the row data to be inserted (valued
form) or by retrieving the new row data from another table (selected, or INSERT … SELECT
form).
Usage Notes
If you refer to an external table as a target of an INSERT/SELECT statement, the INSERT is
automatically resolved, and calls the EXPORT table operator
(LOAD_TO_HCATALOG_abcn_n_n) that is associated with the foreign server. You identify
the table as an external table by using the syntax table_name@server_name. After the server
name is resolved, Teradata Database automatically fills in the database name and the table
name and executes the associated EXPORT operator.
To INSERT into a Hadoop table, the table must already exist, but it can be empty or
populated.
You can optionally specify a database name in addition to the table name, as in
database_name.table_name. The database_name references an equivalent name space on the
foreign server. For Hadoop, the reference is to the database schema in Hive/HCatalog. The
table_name that you use must refer to a base table on the foreign server. References to a view
or view-like object are not supported.
The @server syntax in an INSERT statement is not supported as an action statement in a
database trigger definition.
Hive treats line terminators (by default \r (0x0d) and \n (0x0a)) as end of row markers and
assumes that the data that follows a line terminator is part of the next row. Teradata does not
remove the line terminators before it exports data to a Hadoop system. You must deal with
the line terminators appropriately in your query before the system exports the data to
Hadoop.
For example, the following command replaces all line breaks (\n) in varchar_col1 with
spaces:
SELECT oreplace(varchar_col1, '0a'xc, ' ') from tab1
Any timestamp data is converted to UTC. For more information, see Timestamp Data.
Note that queries that use the table operators can be CPU intensive. Teradata recommends
that you use workload management rules to minimize CPU usage by queries that use the
table operators. For more information, see Post-Installation Configuration.
For information about using INSERT, see SQL Data Manipulation Language, B035-1146.
Example of Using INSERT
In this example of INSERT, the inner operator writes data to the Hadoop file system and
produces entries to be placed in HCatalog. The output of the inner operator is directed to
one AMP, which registers them with HCatalog.
INSERT INTO vim.customer@hadoop3 SELECT * FROM customer ;
Restricted Words
The following words are restricted:
• SERVER
• IMPORT
• EXPORT
CHAPTER 3
Stored Procedures for Teradata QueryGrid:
Teradata Database-to-Hadoop
Introduction
Teradata supplies stored procedures that you can use to create and drop Hadoop tables. You
can use these procedures so that SQL scripts can export data in a standalone manner.
HCTAS Stored Procedure
Purpose
To create the schema of a local Teradata table in Hadoop.
You can use this procedure so that SQL scripts can export data in a standalone manner.
You must have SELECT privilege on the foreign server to use this stored procedure.
Syntax
SYSLIB.HCTAS ( 'table_name', 'partition_columns_list', 'table_definition', 'foreign_servername', 'hive_db_name' )
Syntax Elements
table_name
The name of the Teradata table to use for referencing column type definitions when
creating an Apache Hive table. Column types are translated to the corresponding
Hive column types, as shown in Supported Data Types.
partition_columns_list
A comma-separated, ordered list of Teradata columns to use as partitioned columns in
Hive.
table_definition
The Hive table definition information, such as location or format type.
foreign_servername
The name of the foreign server that you want to create the table on. Used for
permissions and connection information.
hive_db_name
The name of the Hive database in which you want to create the table.
Supported Data Types
HCTAS currently supports the following mapping of data types.
Teradata Data Type    2.1 Hive Data Type    1.3.2 Hive Data Type
INT                   INT                   INT
BIGINT                BIGINT                BIGINT
BYTEINT               TINYINT               TINYINT
BYTE                  BINARY                BINARY
NUMBER                DOUBLE                DOUBLE
REAL                  DOUBLE                DOUBLE
SMALLINT              SMALLINT              SMALLINT
VARBYTE               BINARY                BINARY
VARCHAR               STRING                STRING
CHAR(n)               CHAR(n)               STRING
DECIMAL(n)            DECIMAL(n)            DECIMAL
DATE                  DATE                  STRING
TIMESTAMP             TIMESTAMP             STRING
ALL ELSE              STRING                STRING
Usage Notes
To use HCTAS, you must have the following name value pairs defined for the foreign server:
• Hive port: Use port 10000 for the Hive server, for example, hiveport('10000').
• Server name: The hostname or IP address of the Hive server, for example,
server('hive_metastore_server').
Examples of Using HCTAS to Create a Table Schema
This basic query creates a table without a partition on the Hadoop server.
CALL SYSLIB.HCTAS('test',null,null,'hive_metastore_server','default') ;
This syntax results in the following output:
hive> describe test;
OK
c1    string    None
c2    string    None
c3    int       None
The following query returns more information:
CALL SYSLIB.HCTAS('test', 'c1,c2',
'LOCATION "/user/hive/test_table"', 'hive_metastore_server', 'default') ;
It results in the following output:
OK
c3            int        None
c1            string     None
c2            string     None

# Partition Information
# col_name    data_type  comment

c1            string     None
c2            string     None
HDROP Stored Procedure
Purpose
To drop a Hadoop table on a foreign server.
You can use this procedure in SQL scripts to drop tables in a standalone manner.
You must have SELECT privilege on the foreign server to use this stored procedure.
Syntax
SYSLIB.HDROP ( 'hive_db_name', 'hive_table_name', 'foreign_servername' )
Syntax Elements
hive_db_name
The Hive database where the table is located.
hive_table_name
The name of the Hive table that you want to drop.
foreign_servername
The name of the foreign server on which you want to drop the table.
HDROP Usage Notes
To use HDROP, you must have the following name value pairs defined for the foreign server:
• Hive port: Use port 10000 for the Hive server, for example, hiveport('10000').
• Server name: The hostname or IP address of the Hive server, for example,
server('hcatalog_server').
Example of Using HDROP to Drop a Hadoop Table
The following example demonstrates the use of HDROP to drop a Hadoop table:
CALL SYSLIB.HDROP('defaultDB','testTable','hive_metastore_server') ;
CHAPTER 4
Privileges and Security for Teradata QueryGrid:
Teradata Database-to-Hadoop
Privileges Needed to Use Teradata QueryGrid
Privileges for Administrators
CREATE SERVER and DROP SERVER are object-level privileges that restrict who can use
the CREATE FOREIGN SERVER and DROP FOREIGN SERVER SQL statements.
• CREATE SERVER can only be granted on the TD_SERVER_DB database as a whole.
• DROP SERVER can be granted on the TD_SERVER_DB database or on individual
foreign server objects.
• The CREATE SERVER and DROP SERVER privileges are included if you grant ALL
privileges on the TD_SERVER_DB database.
In addition to the CREATE SERVER and DROP SERVER privileges, administrators need the
EXECUTE FUNCTION and SELECT privileges on the import and export table operators or
on the SYSLIB database that contains the table operators in order to create, drop, and
modify foreign server objects.
The creator of a foreign server object implicitly receives the following privileges on the
object:
• SHOW privilege WITH GRANT OPTION
• DROP SERVER privilege WITH GRANT OPTION
• SELECT privilege WITH GRANT OPTION
• If the foreign server object is capable of exporting data (that is, the CREATE FOREIGN
SERVER statement includes the DO EXPORT WITH clause), the creator automatically
receives the INSERT privilege WITH GRANT OPTION
CREATE AUTHORIZATION and DROP AUTHORIZATION privileges are required to
work with authorization objects referenced by foreign server objects. DROP
AUTHORIZATION is automatically granted to the creator of an authorization object.
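As a sketch, the administrator privileges described above might be granted as follows. The user name dbadmin is illustrative:

```sql
GRANT CREATE SERVER ON TD_SERVER_DB TO dbadmin;
GRANT DROP SERVER ON TD_SERVER_DB TO dbadmin;
GRANT EXECUTE FUNCTION ON SYSLIB TO dbadmin;
GRANT SELECT ON SYSLIB TO dbadmin;
```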
Privileges for Users of the Foreign Server Object
• Users who will be querying the remote database must be granted SELECT, INSERT, and
SHOW privileges on the foreign server object used to access the remote server.
• Granting the ALL privilege on a foreign server object implicitly grants other privileges
that depend on the nature of the foreign server:
• If the foreign server object can import data from the remote database (that is, the
CREATE FOREIGN SERVER statement included a DO IMPORT WITH clause),
granting the ALL privilege on the foreign server implicitly includes the SELECT,
SHOW, and DROP privileges.
• If the foreign server object can export data to the remote database (that is, the
CREATE FOREIGN SERVER statement included a DO EXPORT WITH clause),
granting the ALL privilege on the foreign server implicitly includes the INSERT,
SHOW, and DROP privileges.
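A sketch of the grants for a querying user, assuming a foreign server object named hdp21 and a user named query_user (both illustrative; the exact qualification of the server object may vary by release):

```sql
GRANT SELECT ON TD_SERVER_DB.hdp21 TO query_user;
GRANT INSERT ON TD_SERVER_DB.hdp21 TO query_user;
GRANT SHOW   ON TD_SERVER_DB.hdp21 TO query_user;
```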
Maintaining Security
You can maintain security for Teradata QueryGrid: Teradata Database-to-Hadoop by:
• Granting appropriate privileges to those who create and manage foreign server objects
and, if used, authorization objects.
• Setting up foreign server objects that match the access to databases and tables needed
by Teradata Database users.
• Granting appropriate privileges on foreign server objects to Teradata Database users.
The physical security of data as it resides on disk or is transferred across the network is not
addressed by Teradata QueryGrid. Teradata QueryGrid does not support encryption across
networks or any authentication security.
You may want to consider the following security guidelines:
• You should not grant EXECUTE FUNCTION privileges on the functions in SYSLIB to
users performing queries on the foreign server.
• Grant the CREATE SERVER and DROP SERVER privileges only to a trusted database
administrator who administers the server setup.
• The trusted database administrator can then grant SELECT or INSERT privilege on the
server objects to a subset of users.
• The trusted database administrator can set up authentication using one of the methods
described in LDAP and Kerberos Authentication Security.
• If an external security system is in use (LDAP or Kerberos) on the Hadoop cluster, the
user specified in an authorization object must exist in the external security system.
• When an authorization object is used, the user name will be used for both HDFS and
Hive access.
• For a Hadoop cluster protected by LDAP, Hive permissions are required even for HDFS-only access.
• The user may or may not belong to any group on the Hadoop cluster.
• On the Hadoop platform, HDFS and Hive permissions must be set up appropriately or
permission will be denied.
Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos
authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH 5.4 is
not supported.
LDAP and Kerberos Authentication Security
You can set up Teradata QueryGrid: Teradata Database-to-Hadoop to authenticate to a
Hadoop cluster that is protected by an external security system, such as LDAP or Kerberos.
The Teradata-to-Hadoop connector uses an authorization object to pass on the credentials
needed to authenticate to LDAP or Kerberos.
Teradata QueryGrid: Teradata Database-to-Hadoop does not work with a Kerberized cluster
where Hive requires LDAP authentication.
Note: Teradata QueryGrid: Teradata Database-to-Hadoop supports only Kerberos
authentication when used with Cloudera CDH 5.4; the use of LDAP on Cloudera CDH 5.4 is
not supported.
If the foreign server does not use LDAP or Kerberos, you can define a fixed user name as the
value for username in the USING clause of the foreign server. All users using that foreign
server will access the Hadoop data under that fixed name. If no username is defined in the
foreign server, then the name of the user making the request is used.
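A sketch of a foreign server that defines a fixed user name. The name hadoop_svc and the other values are illustrative:

```sql
create foreign server hdp21_fixed
using
hosttype('hadoop')
port('9083')
hiveport('10000')
server('hdp21.example.com')
username('hadoop_svc')
do import with syslib.load_from_hcatalog_hdp2_1_2,
do export with syslib.load_to_hcatalog_hdp2_1_2;
```

All Teradata Database users of hdp21_fixed would then access the Hadoop data as hadoop_svc.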
Authorization Objects and Mapping
You can create an authorization object, which stores the credentials for a user in the Hadoop
security system (LDAP or Kerberos) in encrypted form. You can set up the Teradata-to-Hadoop
connector to use authorization objects based on your security needs and administrative
convenience.
If you need one-to-one mapping between a Teradata Database user and a Hadoop user, then
you must have corresponding accounts in Teradata Database and the security system. When
that user creates the authorization using AS INVOKER TRUSTED, the authorization is
stored by default on the user database. The credentials for the security system do not need to
be revealed to another person and the authorization object is accessible only to users with
privilege to that database.
You can use many-to-one mapping between multiple Teradata Database users and one user
in the Hadoop security system to simplify administration. Only the creator of the
authorization need know the credentials for the user on the Hadoop security system. When
the authorization is created using AS DEFINER TRUSTED, the authorization is stored by
default in the TD_SERVER_DB database, which makes the authorization available globally.
Where the Foreign Server Looks for the Authorization Object
When a foreign server is configured with the INVOKER keyword and no value is specified
for the database name (dbname), the Teradata-to-Hadoop connector automatically looks for
the authorization in the user database of the session user.
When a foreign server is configured with the DEFINER keyword and no value is specified
for the database name (dbname), the Teradata-to-Hadoop connector automatically looks for
the authorization in the TD_SERVER_DB database.
Setup Process for LDAP and Kerberos Authentication
If you are using Hortonworks HDP 2.1 or 2.3, you can follow this process to give Teradata
Database users access to a Hadoop cluster that uses LDAP or Kerberos authentication. If you
are using Cloudera CDH 5.4, you can follow this process to give Teradata Database users
access to a Hadoop cluster that uses Kerberos authentication.
1. Create the required authorization objects based on your mapping scheme.
2. For Kerberos only, the Hadoop core-site.xml must contain a proxy user entry for each
Kerberos user principal used for authentication. An initial set of proxy users was added
during the Teradata-to-Hadoop connector installation. If you want to use additional
proxy users you must add them to core-site.xml.
For more information, see Configuring a Kerberos User for Use as a Proxy.
3. Create the foreign server object:
• Use the required syntax for your authorization. For example, if the authorization is
created using DEFINER, the foreign server must be created using DEFINER.
• In the USING clause, include security and specify the system being used (ldap or
kerberos).
• If you are using Kerberos, then you must include clustername to specify the
directory name in the auxiliary JAR file under which the Hadoop XML configuration
files reside.
4. Grant the SELECT privilege and the INSERT privilege on the foreign server object to the
desired set of users.
For information on authorizations, see CREATE AUTHORIZATION and REPLACE
AUTHORIZATION.
For more information on foreign servers, see CREATE FOREIGN SERVER.
Example: Kerberos Using INVOKER
This example creates the remote_hdp authorization object in the creator's user database. If
the creator is td_user then td_user.remote_hdp is the fully qualified object name.
create authorization remote_hdp as invoker trusted
user 'kerberos_user' password 'kerberos_pass';
This example creates a foreign server object that uses the remote_hdp authorization object.
create foreign server hdp21
external security invoker trusted remote_hdp
using
hosttype('hadoop')
port('9083')
hiveport('10000')
server('hdp21.example.com')
security('kerberos')
clustername('spiral')
do import with syslib.load_from_hcatalog_hdp2_1_2,
do export with syslib.load_to_hcatalog_hdp2_1_2;
The clustername value of spiral matches the directory name in the auxiliary JAR file that
was created during installation of QueryGrid. clustername is required for a Kerberos-protected
Hadoop cluster.
Example: Kerberos Using DEFINER
This example creates the remote_cdh authorization object in the td_server_db database.
create authorization td_server_db.remote_cdh as definer trusted
user 'kerberos_proxy_user' password 'kerberos_proxy_pass';
This example creates a foreign server object that uses the remote_cdh authorization object.
create foreign server cdh54
external security definer trusted remote_cdh
using
hosttype('hadoop')
port('9083')
hiveport('10000')
server('cdh54.example.com')
security('kerberos')
clustername('spiral')
do import with syslib.load_from_hcatalog_cdh5_4_3,
do export with syslib.load_to_hcatalog_cdh5_4_3;
The clustername value of spiral matches the directory name in the auxiliary JAR file that
was created during installation of QueryGrid. clustername is required for a Kerberos-protected
Hadoop cluster.
Example: LDAP Using INVOKER
This example creates the remote_hdp authorization object in the creator's user database. If
the creator is td_user then td_user.remote_hdp is the fully qualified object name.
create authorization remote_hdp as invoker trusted
user 'ldap_user' password 'ldap_pass';
This example creates a foreign server object named hdp21 that uses the remote_hdp
authorization object.
create foreign server hdp21
external security invoker trusted remote_hdp
using
hosttype('hadoop')
port('9083')
hiveport('10000')
server('hdp21.example.com')
security('ldap')
do import with syslib.load_from_hcatalog_hdp2_1_2,
do export with syslib.load_to_hcatalog_hdp2_1_2;
Example: LDAP Using DEFINER
This example creates the remote_hdp authorization object in the td_server_db database.
create authorization td_server_db.remote_hdp as definer trusted
user 'ldap_proxy_user' password 'ldap_proxy_pass';
This example creates a foreign server object named hdp21 that uses the remote_hdp
authorization object.
create foreign server hdp21
external security definer trusted remote_hdp
using
hosttype('hadoop')
port('9083')
hiveport('10000')
server('hdp21.example.com')
security('ldap')
do import with syslib.load_from_hcatalog_hdp2_3_0,
do export with syslib.load_to_hcatalog_hdp2_3_0;
Kerberos Maintenance
You must update the configuration of Teradata QueryGrid: Teradata Database-to-Hadoop
under these circumstances:
• You want a foreign server to be able to access Hadoop using a new Kerberos user principal
(that is, a Kerberos user not previously used for authentication by any foreign server).
For more information, see Configuring a Kerberos User for Use as a Proxy.
• The name or location of the default Kerberos realm or the location of the host for your
KDC (Key Distribution Center) or administration server changes.
For more information, see Updating Kerberos Configuration Information.
Configuring a Kerberos User for Use as a Proxy
The core-site.xml file for the Hadoop NameNode must include information for each Kerberos
user who will access Hadoop from Teradata Database. During the installation of Teradata
QueryGrid: Teradata Database-to-Hadoop, an initial set of users was added to the
core-site.xml file. If you want to use a new user as a proxy, then properties for that user must
be added to core-site.xml. This task must be performed before you use an authorization object
created for that user.
If you are using Hortonworks HDP, you can use an application, such as Ambari, that you
would normally use to edit the service property values in core-site.xml. For information
about how to edit core-site.xml, refer to your tool's documentation.
If you are using Cloudera CDH, you can use Cloudera Manager to edit the core-site.xml file.
1 In core-site.xml, add a property for groups where you replace user_name with the
name of the user:
hadoop.proxyuser.user_name.groups
2 Add a key value of * to indicate a member of any group or specify groups by name in a
comma-separated list.
3 Add a property for hosts where you replace user_name with the name of the user:
hadoop.proxyuser.user_name.hosts
4 Add a key value of * to indicate that the proxy can connect from any host (that is, a
Teradata node) or specify hosts by name in a comma-separated list.
5 If you are using Cloudera CDH, use Cloudera Manager to redeploy the configuration
files.
After redeployment, the following files can be found in the /etc/hive/conf directory:
• core-site.xml
• hdfs-site.xml
• hive-site.xml
• mapred-site.xml
• yarn-site.xml
Property Example
This example shows the properties added for a proxy user named myproxy_user.
<property>
<name>hadoop.proxyuser.myproxy_user.groups</name>
<value>group1,group2</value>
<description>
Allow the proxy user myproxy_user to impersonate any members
of the groups: group1 or group2.
</description>
</property>
<property>
<name>hadoop.proxyuser.myproxy_user.hosts</name>
<value>host1,host2</value>
<description>
Allow the proxy user myproxy_user to connect only from host1
and host2 to impersonate a user. It is recommended to use the IP
addresses of the Teradata nodes.
</description>
</property>
Updating Kerberos Configuration Information
During the installation of Teradata QueryGrid: Teradata Database-to-Hadoop,
communication was set up between the Teradata Database and the Kerberos authentication
server or realm. If you make changes to your default Kerberos realm or to the location of the
host for your KDC or administration server, then you must update that information in the
krb5.conf file. The file is located on the Teradata Database nodes on which the Kerberos
client is installed and the Hadoop cluster nodes.
You may use any tools that you would normally use to edit the krb5.conf file and install the
jar file. For information, refer to your tool's documentation.
1 Navigate to the krb5.conf files on all nodes in both systems and set up communication
between the Teradata Database and the Kerberos authentication server or realm.
In the following example, bolded content is updated:
[libdefaults]
default_realm = C1.HADOOP.MYCOMPANY.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
forwardable = yes
udp_preference_limit = 1
[realms]
EXAMPLE.COM = {
kdc = kerberos.example.com
admin_server = kerberos.example.com
}
C1.HADOOP.MYCOMPANY.COM = {
kdc = spiral1.mydivision.mycompany.com:88
admin_server = spiral1.mydivision.mycompany.com:749
default_domain = hadoop.com
}
[domain_realm]
.hadoop.com = C1.HADOOP.MYCOMPANY.COM
hadoop.com = C1.HADOOP.MYCOMPANY.COM
[logging]
kdc = FILE:/var/log/krb5/krb5kdc.log
admin_server = FILE:/var/log/krb5/kadmind.log
default = SYSLOG:NOTICE:DAEMON
2 Depending on the distribution you are using, do one of the following tasks:

Option            Description
Hortonworks HDP   Create a JAR file directory and in it create a JAR file that
                  contains the required configuration files.
Cloudera CDH      Create a JAR file directory that reflects the nameservices
                  name, and in it create a JAR file that contains the required
                  configuration files.
                  For example: jar cvf spiral.jar spiral/*.xml
Enabling security for Kerberos requires a clustername clause in the CREATE FOREIGN
SERVER statement. The value of this clause must match the directory name under which the
XML files reside in the auxiliary JAR file created by the user.
• core-site.xml
• hdfs-site.xml
• hive-site.xml
• mapred-site.xml
• yarn-site.xml
In this example, the clustername value is spiral; it must be specified and must match the
directory name.
spiral/
spiral/core-site.xml
spiral/hdfs-site.xml
spiral/hive-site.xml
spiral/mapred-site.xml
spiral/yarn-site.xml
3 Complete the procedure in Configuring Kerberos Settings for Teradata QueryGrid.
Configuring Kerberos Settings for Teradata QueryGrid
If you are configuring Kerberos, complete this procedure after using PUT to install the
Teradata QueryGrid connector.
Note: tdsqlh_td 15.00.03.xx is the minimum version of the Teradata QueryGrid
connector package required to use with Kerberos.
Configuring Kerberos Settings When Using Hortonworks HDP
1 Edit the tdsqlh_hdp.bteq file to install the JAR file created earlier and add it to
CLASSPATH:
• mycluster designates the directory name created earlier in this procedure.
• myjar.jar designates the JAR file created earlier in this procedure.
a Add the following lines into tdsqlh_hdp.bteq near similar lines of code:
CALL sqlj.install_jar('cj!myjar.jar','mycluster',0);
CALL sqlj.replace_jar('cj!myjar.jar','mycluster');
b Modify the following statements in tdsqlh_hdp.bteq by adding (*,mycluster) to
the end of the statements:
CALL sqlj.alter_java_path('SQLH_HDP2_1_2','(*,tdsqlh_hdp_HDP2_1_2)
(*,avro_HDP2_1_2)(*,commons-cli_HDP2_1_2)(*,commons-codec_HDP2_1_2)
(*,commons-configuration_HDP2_1_2)(*,commons-lang_HDP2_1_2)
(*,commons-logging_HDP2_1_2)(*,datanucleus-core_HDP2_1_2)
(*,guava_HDP2_1_2)(*,hadoop-auth_HDP2_1_2)(*,hadoop-common_HDP2_1_2)
(*,hadoop-hdfs_HDP2_1_2)(*,hadoop-mr-common_HDP2_1_2)
(*,hadoop-mr-core_HDP2_1_2)(*,hive-common_HDP2_1_2)
(*,hive-exec_HDP2_1_2)(*,hive-hcat-core_HDP2_1_2)
(*,hive-jdbc_HDP2_1_2)(*,hive-metastore_HDP2_1_2)
(*,hive-serde_HDP2_1_2)(*,hive-service_HDP2_1_2)
(*,httpclient_HDP2_1_2)(*,httpcore_HDP2_1_2)
(*,jackson-core-asl_HDP2_1_2)(*,jetty_HDP2_1_2)
(*,jetty-util_HDP2_1_2)(*,libfb303_HDP2_1_2)(*,log4j_HDP2_1_2)
(*,pig_HDP2_1_2)(*,slf4j-api_HDP2_1_2)(*,slf4j-log4j12_HDP2_1_2)
(*,snappy-java_HDP2_1_2)(*,mycluster)');
CALL sqlj.alter_java_path('SQLH_NO_VER','(*,tdsqlh_hdp_HDP2_1_2)
(*,avro_HDP2_1_2)(*,commons-cli_HDP2_1_2)(*,commons-codec_HDP2_1_2)
(*,commons-configuration_HDP2_1_2)(*,commons-lang_HDP2_1_2)
(*,commons-logging_HDP2_1_2)(*,datanucleus-core_HDP2_1_2)
(*,guava_HDP2_1_2)(*,hadoop-auth_HDP2_1_2)(*,hadoop-common_HDP2_1_2)
(*,hadoop-hdfs_HDP2_1_2)(*,hadoop-mr-common_HDP2_1_2)
(*,hadoop-mr-core_HDP2_1_2)(*,hive-common_HDP2_1_2)
(*,hive-exec_HDP2_1_2)(*,hive-hcat-core_HDP2_1_2)
(*,hive-jdbc_HDP2_1_2)(*,hive-metastore_HDP2_1_2)
(*,hive-serde_HDP2_1_2)(*,hive-service_HDP2_1_2)
(*,httpclient_HDP2_1_2)(*,httpcore_HDP2_1_2)
(*,jackson-core-asl_HDP2_1_2)(*,jetty_HDP2_1_2)
(*,jetty-util_HDP2_1_2)(*,libfb303_HDP2_1_2)(*,log4j_HDP2_1_2)
(*,pig_HDP2_1_2)(*,slf4j-api_HDP2_1_2)(*,slf4j-log4j12_HDP2_1_2)
(*,snappy-java_HDP2_1_2)(*,mycluster)');
Configuring Kerberos Settings When Using Cloudera CDH
1 Edit the tdsqlh_cdh.bteq file to install the JAR file created earlier and add it to
CLASSPATH:
• mycluster designates the directory name created earlier in this procedure.
• myjar.jar designates the JAR file created earlier in this procedure.
a Add the following lines into tdsqlh_cdh.bteq near similar lines of code:
CALL sqlj.install_jar('cj!myjar.jar','mycluster',0);
CALL sqlj.replace_jar('cj!myjar.jar','mycluster');
b Modify the following statements in tdsqlh_cdh.bteq by adding (*,mycluster) to
the end of the statements:
CALL
sqlj.alter_java_path('SQLH_cdh5_4_3','(*,tdsqlh_cdh_cdh5_4_3)
(*,commons_collections_cdh5_4_3)(*,hive_serde_cdh5_4_3)
(*,guava_cdh5_4_3)(*,commons_io_cdh5_4_3)(*,log4j_cdh5_4_3)
(*,hadoop_hdfs_cdh5_4_3)(*,avro_cdh5_4_3)(*,hive_service_cdh5_4_3)
(*,commons_cli_cdh5_4_3)(*,hive_exec_cdh5_4_3)
(*,commons_logging_cdh5_4_3)(*,libfb303_cdh5_4_3)
(*,datanucleus_core_cdh5_4_3)(*,hadoop_auth_cdh5_4_3)
(*,commons_configuration_cdh5_4_3)(*,hadoop_common_cdh5_4_3)
(*,hadoop_core_cdh5_4_3)(*,hadoop_mrcapp_cdh5_4_3)
(*,hadoop_mrccommon_cdh5_4_3)(*,hadoop_mrccore_cdh5_4_3)
(*,htrace_com_cdh5_4_3)(*,servlet_api_cdh5_4_3)
(*,hive_jdbc_cdh5_4_3)(*,commons_codec_cdh5_4_3)
(*,commons_lang_cdh5_4_3)(*,hive_hcat_core_cdh5_4_3)
(*,jdo_api_cdh5_4_3)(*,hive_common_cdh5_4_3)
(*,hive_metastore_cdh5_4_3)(*,protobuf_java_cdh5_4_3)
(*,httpclient_cdh5_4_3)(*,httpcore_cdh5_4_3)(*,pig_cdh5_4_3)
(*,mycluster)');
CALL sqlj.alter_java_path('SQLH_NO_VER','(*,tdsqlh_cdh_cdh5_4_3)
(*,commons_collections_cdh5_4_3)(*,hive_serde_cdh5_4_3)
(*,guava_cdh5_4_3)(*,commons_io_cdh5_4_3)(*,log4j_cdh5_4_3)
(*,hadoop_hdfs_cdh5_4_3)(*,avro_cdh5_4_3)(*,hive_service_cdh5_4_3)
(*,commons_cli_cdh5_4_3)(*,hive_exec_cdh5_4_3)
(*,commons_logging_cdh5_4_3)(*,libfb303_cdh5_4_3)
(*,datanucleus_core_cdh5_4_3)(*,hadoop_auth_cdh5_4_3)
(*,commons_configuration_cdh5_4_3)(*,hadoop_common_cdh5_4_3)
(*,hadoop_core_cdh5_4_3)(*,hadoop_mrcapp_cdh5_4_3)
(*,hadoop_mrccommon_cdh5_4_3)(*,hadoop_mrccore_cdh5_4_3)
(*,htrace_com_cdh5_4_3)(*,servlet_api_cdh5_4_3)
(*,hive_jdbc_cdh5_4_3)(*,commons_codec_cdh5_4_3)
(*,commons_lang_cdh5_4_3)(*,hive_hcat_core_cdh5_4_3)
(*,jdo_api_cdh5_4_3)(*,hive_common_cdh5_4_3)
(*,hive_metastore_cdh5_4_3)(*,protobuf_java_cdh5_4_3)
(*,httpclient_cdh5_4_3)(*,httpcore_cdh5_4_3)(*,pig_cdh5_4_3)
(*,mycluster)');
CHAPTER 5
Administration and Utilities for Teradata
QueryGrid: Teradata Database-to-Hadoop
Creating the Server Database
To use the Teradata-to-Hadoop connector, the TD_SERVER_DB database must exist to hold
server objects and their associated information. This database is created by running the
Database Initialization Program (DIP). DIP is a series of executable SQL script files
packaged with Teradata Database. Each DIP script creates one or more system users,
databases, macros, tables, and views for use by Teradata Database and/or by users.
All of the DIP scripts that you need should have been executed during Teradata Database
installation.
For information about using the DIP scripts, see Utilities, B035-1102.
Post-Installation Configuration
Some configuration is required before you use the Teradata-to-Hadoop connector. For
example, the following kinds of things may need to be changed:
• FSGcache concurrency settings.
• Workload management rules that control the number of concurrent queries.
• Java Virtual Machine (JVM) settings. A change in JVM settings requires a restart.
You should also make sure that a proxy user has been set up on the Hadoop cluster.
For more information, see the version of the Orange Book, Teradata® QueryGrid™:
Teradata Database-to-Hadoop, publication number 541-0009812, that supports release
15.0.4 of the Teradata-to-Hadoop connector.
Tuning Concurrency Between FSGCache and the JVM
To process Teradata-to-Hadoop connector requests, the node-level JVM requires a
significant amount of Java heap and permanent memory space to handle thread safety and
HDFS buffering.
By default, TASM limits concurrency to two queries, so either the FSGCache and JVM
settings should be changed to support two queries, or the concurrency rules should be reset
to match the level of concurrency that the system has been tuned for.
Viewpoint Workload Designer contains a default TDWM rule in the TDWM ruleset that
includes the Teradata-to-Hadoop connector table operators. You can change the concurrency
value in the default rule or you can delete the default rule and define custom rules for the
SYSLIB.LOAD_FROM_HCATALOG_abcn_n_n object and the
SYSLIB.LOAD_TO_HCATALOG_abcn_n_n object. If you define custom rules, be sure to
delete the default rule. Teradata Database continues to use the values in the default rule until
it is deleted. Custom rules can be included with other functions but cannot include objects
other than functions.
For information about using Viewpoint Workload Designer, see the Teradata Viewpoint User
Guide, B035-2206.
The level of concurrency that you want influences the FSGcache settings and the JVM heap
and perm space that is needed. The following table provides an example that illustrates the
relationship between these settings. It is based on an example system that has 36 AMPs per
node.
128GB Teradata System
FSGCache   Memory    Concurrency             Java Perm Space
95%        6.5GB     2 concurrent queries    512MB
92%        10.24GB   3 concurrent queries    750MB
90%        12.8GB    4 concurrent queries    1GB
88%        15.36GB   5 concurrent queries    1.25GB
85%        19.2GB    ~7 concurrent queries   1.75GB

96GB Teradata System
FSGCache   Memory    Concurrency             Java Perm Space
95%        4.8GB     1 concurrent query      512MB
92%        7.68GB    2 concurrent queries    512MB
90%        9.6GB     3 concurrent queries    750MB
88%        11.52GB   ~4 concurrent queries   1GB
85%        14.4GB    ~5 concurrent queries   1.25GB
For more information about tuning, see the version of the Teradata® QueryGrid™: Teradata
Database-to-Hadoop Orange book, publication number 541–0009812, that supports release
15.0.4 of the Teradata-to-Hadoop connector. It contains embedded Microsoft® Excel®
spreadsheets to use to calculate suggested memory settings. You can enter numbers that
represent your desired configuration into the appropriate spreadsheet, and the spreadsheet
produces an estimate of the suggested FSGCache and JVM memory settings needed to run
Teradata QueryGrid: Teradata Database-to-Hadoop on your system.
For information about the FSGCache, see Utilities, B035-1102.
JVM Configuration for the Teradata-to-Hadoop Connector Table Operators
We recommend that you use the following garbage collection options to efficiently clean up
unused Java objects and keep the memory usage under control:
• -XX:+UseParallelGC
• -XX:+UseParallelOldGC
Java permanent space is used for storing the Java classes that are loaded through the Java
class loader. By default, the Java permanent space is set to 64MB. The Teradata-to-Hadoop
connector table operator needs to load significantly more classes with the thread-safe loader.
For Java permanent space, we suggest the following settings for Hortonworks 2.1.2:
• 1 concurrent query: -XX:MaxPermSize=512m
• 2 concurrent queries: -XX:MaxPermSize=512m
• 3 concurrent queries: -XX:MaxPermSize=750m
• 4 concurrent queries: -XX:MaxPermSize=1g
• 5 concurrent queries: -XX:MaxPermSize=1280m (1.25GB)
Hadoop libraries require a relatively large amount of heap memory for object allocation, I/O
buffers, and temporary caches during the life of a request. The Java heap size must be
configured appropriately so that memory is efficiently used. The following table lists the
recommended minimum and maximum heap sizes for one to five concurrent queries for
Hortonworks 2.1.2:
Concurrency             Heap Sizes
1 concurrent query      Minimum: -Xms4g    Maximum: -Xmx4g
2 concurrent queries    Minimum: -Xms6g    Maximum: -Xmx6g
3 concurrent queries    Minimum: -Xms9g    Maximum: -Xmx9g
4 concurrent queries    Minimum: -Xms12g   Maximum: -Xmx12g
5 concurrent queries    Minimum: -Xms15g   Maximum: -Xmx15g
JVM Configuration and ORC Files
ORC files have a stripe size property. The JVM memory tuning parameters for Teradata
Database are optimized for a 64 MB stripe size. A larger stripe size requires significantly
more heap memory to run with reasonable response time. A stripe size of 64 MB is
recommended. For larger stripe sizes, an additional 8 GB of memory per concurrent query
is recommended.
For information on modifying the ORC file stripe size, see the Hadoop documentation. For
information on modifying the JVM heap, see Configuring Teradata JVM Options.
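As a hedged sketch of setting the stripe size on the Hadoop side, the standard ORC table property orc.stripe.size (specified in bytes) can be supplied when creating a Hive table; the table and column names here are illustrative:

```sql
-- Create an ORC table with a 64 MB stripe size (67108864 bytes),
-- matching the stripe size the JVM tuning parameters are optimized for.
CREATE TABLE sales_orc (
  sale_id INT,
  amount  DOUBLE
)
STORED AS ORC
TBLPROPERTIES ('orc.stripe.size'='67108864');
```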
Configuring Teradata JVM Options
If you’ve determined that you need to change the JVM settings for your Teradata system, you
can apply the JVM options to the system using a cufconfig utility property called
JVMOptions.
For information about the cufconfig utility, see Utilities, B035-1102.
To set the appropriate JVM heap and perm space, you can perform the following steps on the
primary node of the Teradata system:
1 Create a text file that contains the appropriate JVM heap and perm sizes. List all options
in a single line, delimited with a space. For example, the settings below are for a Teradata
system with 36 AMPs on each node and 96GB of memory.
JVMOptions: -server -XX:+UseParallelGC -XX:+UseParallelOldGC -Xms7100m -Xmx7100m -XX:NewSize=2370m -XX:MaxNewSize=2370m -XX:MaxPermSize=864m
Name the file jvmopt.txt and put it in the following location:
/etc/opt/teradata/tdconfig/jvmconfig/
2 Run the following command:
cufconfig -f /etc/opt/teradata/tdconfig/jvmconfig/itjvmopt.txt
3 Run the following command to verify that the JVMOptions property appears at the
bottom of the output and that its value has been updated with the values specified in
your itjvmopt.txt file:
cufconfig -o
The bottom of the output should look similar to this:
USRLibraryPath: /usr/tdbms/lib
JVMOptions: -server -XX:+UseParallelGC -XX:+UseParallelOldGC -Xmx7100m -Xms7100m -XX:NewSize=2370m -XX:MaxNewSize=2370m -XX:MaxPermSize=864m
4 Restart the database so that the new JVM options take effect.
Monitoring User Queries Between Teradata and a Foreign Server
To monitor the data transferred between Teradata and a foreign server for a user request,
you can use the following APIs:
• PM/API MONITOR SESSION
Note: If you are using the MONITOR SESSION request, set the mon_ver_id to 11, where
mon_ver_id is the monitor software version ID field for the current release.
• Open API MonitorMySessions
• Open API MonitorSession
These APIs return the following field/column values:
Field Name/Column Name: ReqTblOpBytesIn
Description: The total number of bytes transferred into Teradata Database from a foreign
server for the current request through one or more table operators.
Note: The request may involve one or multiple table operator executions. The
ReqTblOpBytesIn output parameter shows bytes transferred across all invocations within
the request.

Field Name/Column Name: ReqTblOpBytesOut
Description: The total number of bytes transferred out of Teradata Database and into a
foreign server for the current request through one or more table operators.
Note: The request may involve one or multiple table operator executions. The
ReqTblOpBytesOut output parameter shows bytes transferred across all invocations within
the request.
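As a usage sketch, the Open APIs can be invoked as SQL table functions. Assuming the
MonitorMySessions table function is installed and visible to your session (column names
other than the two counters documented above are assumptions), a query like the following
returns the transfer counters for your own sessions:

SELECT ReqTblOpBytesIn, ReqTblOpBytesOut
FROM TABLE (MonitorMySessions()) AS t1;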
For more information about these APIs, see Application Programming Reference, B035-1090.
Note: You can also monitor the transfer of the data in Viewpoint. Check the Remote Data
Imported and Data Exported Remotely fields on the Overview tab of the Details View of
Query Monitor, Query Groups, My Queries, Query Spotlight, and Workload Monitor. For
more information about Viewpoint, see the Teradata Viewpoint User Guide, B035-2206.
Archive and Restore
Database DBC.TD_SERVER_DB stores all the server objects created using the Teradata-to-Hadoop connector. Note the following about archiving or restoring this database:
• Archive and restore TD_SERVER_DB as a user database. It is not archived and restored
as part of DBC.
• You can archive and restore the entire database or individual server objects in the
database.
• Teradata archives the associated rows from DBC.ServerInfo and DBC.ServerTblOpInfo
at the same time as it archives TD_SERVER_DB.
• The post-restore script validates server connectivity.
Note the following about copying foreign server objects:
• Users lose their privileges on a foreign server object after it is copied, so administrators
must grant these privileges again.
• You can copy the entire TD_SERVER_DB database or individual server objects in the
database. Renaming is not allowed.
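As a hedged sketch, archiving the database as a user database follows the usual ARC
(arcmain) script pattern. The logon string and archive file name below are illustrative; the
authoritative statement syntax is in the archive/restore utility documentation:

LOGON tdpid/dbc,dbcpassword;
ARCHIVE DATA TABLES (TD_SERVER_DB) ALL, RELEASE LOCK, FILE=ARCHIVE;
LOGOFF;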
CHAPTER 6
Data Dictionary Tables and Views for Teradata
QueryGrid: Teradata Database-to-Hadoop
Data Dictionary Views and Tables
This chapter describes the Teradata-to-Hadoop connector Data Dictionary views and tables
in the DBC database. The Teradata-to-Hadoop connector Data Dictionary tables are
reserved for system use and contain metadata about the foreign servers defined on the
Teradata Database system. Data related to the Teradata-to-Hadoop connector can also be
populated in other Data Dictionary views and tables. You can retrieve frequently used data
from any of the Data Dictionary tables via pre-defined views. The Teradata database
administrator determines the set of views available to a user.
You can use Teradata Administrator, Teradata SQL Assistant, or Teradata Studio Express to
list the Teradata-to-Hadoop connector Data Dictionary views and tables and details about
each view or table column.
The views and tables in this book are presented in alphabetical order for quick reference to
the meaning of individual fields. The actual Data Dictionary tables and view fields do not
appear in alphabetical order.
Installing Data Dictionary
The system databases, tables and associated views and macros are created at system
initialization (sysinit) time and by executing a set of Dictionary Initialization Program (DIP)
scripts. The DIPALL option executes all of the DIP scripts that are installed on every system.
Optional DIP scripts include:
• DIPACC (supports database access logging)
• DIPPDCR (supports infrastructure used by Teradata Professional Services when
analyzing system performance issues)
For information about the DIP utility and its executable SQL scripts (such as DIPPDCR,
DIPACC, DIPSYSUIF, DIPVIEWS, and DIPALL), see Utilities.
For information about the macros that are created by the DIPVIEWS script, see Database
Administration.
For information about using the DIPACC script to create the DBC.ACCLogRule macro,
which is required for setting up database access logging, see Security Administration.
Displaying the View and Table Definitions
To display the view or table definitions, execute SHOW VIEW or SHOW TABLE objectname,
where objectname is the name of the view or table whose most recent SQL create text is to be
reported. For details on using the SHOW VIEW or SHOW TABLE statement, see SQL Data
Definition Language - Syntax and Examples, B035-1144.
For more information about the views and tables described in this chapter, see Data
Dictionary, B035-1092 or Database Administration, B035-1093.
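For example, to display the most recent create text for the ServerV view described later in
this chapter:

SHOW VIEW DBC.ServerV;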
Data Dictionary Views
This topic describes the following:
• Teradata-to-Hadoop connector Data Dictionary views
• the values related to the Teradata-to-Hadoop connector that are populated in the DBQL
QryLogV and QryLogStepsV views
The Data Dictionary views described here are categorized as operational internal database
views.
Note: The Teradata-to-Hadoop connector Data Dictionary views each have an equivalent X
version (for example, for the ServerV view, there is also an X version of that view). The X
version of the view limits the view to only those server objects to which the user selecting
from the view has access.
For more information about view categories and X and VX views (also referred to as modern
views), see Data Dictionary, B035-1092.
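As a sketch of the difference, both of the following queries are valid against the views
described in this chapter; the X version returns only the server objects the session user can
access:

SELECT ServerName FROM DBC.ServerV;   -- all foreign server objects
SELECT ServerName FROM DBC.ServerVX;  -- only servers the current user may access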
QryLogV
Category: Operations
Database: DBC

View Column and Referenced Table.Column

View Column           Data Type  Format                Referenced Table.Column
TotalServerByteCount  FLOAT      ----,---,---,---,--9  DBQLogTbl.TotalServerByteCount
Usage Notes
This DBQL view of the DBQLogTbl table reports things such as the AMP using the most
CPU, the AMP with the most I/O, or maximum amount of spool used when processing a
query. It can also report the size of the data transferred between Teradata and a foreign
server.
For more information about the QryLogV view, see Data Dictionary, B035-1092.
For more information about the DBQL feature and detailed descriptions about the QryLogV
view, see Database Administration, B035-1093.
TotalServerByteCount Column
The TotalServerByteCount column is the total number of bytes read from or sent to a foreign
server.
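A minimal sketch of using the column; the WHERE clause simply filters out requests that
transferred no foreign server data:

SELECT QueryID, UserName, TotalServerByteCount
FROM DBC.QryLogV
WHERE TotalServerByteCount IS NOT NULL;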
Example of QryLogV
The following SELECT statement retrieves the main view for DBQL:
SELECT * from dbc.qrylogv;
Result:

ProcID              30719
CollectTimeStamp    2014-02-05 01:30:49
QueryID             307191222605399239
UserID              00000004
UserName            TEST1
DefaultDatabase     TEST1
AcctString          SALES
ExpandAcctString    SALES
SessionID           1,012
LogicalHostID       1
RequestNum          6
InternalRequestNum  6
TxnUniq             ?
LockLevel           ?
LogonDateTime       2014-02-05 01:57:18
AcctStringTime      ?
AcctStringHour      ?
AcctStringDate      ?
LogonSource         (TCP/IP) d1e5 198.51.100.15 192.0.2.24 9208 AA186017 01 LSS
AppID               BTEQ
ClientID            AA186017
ClientAddr          198.51.100.24
QueryBand           ?
ProfileID           ?
StartTime           2014-02-05 01:57:27.260000
FirstStepTime       2014-02-05 01:57:28.690000
FirstRespTime       2014-02-05 01:57:30.480000
ElapsedTime         0:00:03.220000
NumSteps            4
NumStepswPar              0
MaxStepsInPar             0
NumResultRows             2
TotalIOCount              44
AMPCPUTime                0.150
ParserCPUTime             0.216
UtilityByteCount          ?
UtilityRowCount           ?
ErrorCode                 0
ErrorText                 ?
WarningOnly
AbortFlag
CacheFlag
StatementType             Select
StatementGroup            Select
QueryText                 Select * from Product@remote_server;
NumOfActiveAMPs           4
MaxAMPCPUTime             0.108
MaxCPUAmpNumber           0
MinAmpCPUTime             0.000
MaxAmpIO                  35
MaxIOAmpNumber            0
MinAmpIO                  3
SpoolUsage                1,024
LSN                       ?
EstResultRows             4
EstProcTime               0.145
EstMaxRowCount            4
TDWMEstMemUsage           0.000
AMPCPUTimeNorm            10.087
ParserCPUTimeNorm         14.526
MaxAMPCPUTimeNorm         7.263
MaxCPUAmpNumberNorm       0
MinAmpCPUTimeNorm         0.000
ParserExpReq              0.011
ProxyUser                 ?
ProxyRole                 ?
SessionTemporalQualifier  ?
CalendarName              TERADATA
CPUDecayLevel             ?
IODecayLevel              ?
TacticalCPUException      ?
TacticalIOException       ?
SeqRespTime               ?
ReqIOKB                   1,652.000
ReqPhysIO                 0.000
ReqPhysIOKB               0.000
DataCollectAlg            1
CallNestingLevel          0
NumRequestCtx             1
KeepFlag                  N
QueryRedriven             N
ReDriveKind               ?
LastRespTime              ?
DisCPUTime                0.000
Statements                1
DisCPUTimeNorm            0.000
TxnMode                   BTET
RequestMode               Exec
DBQLStatus                ?
NumFragments              ?
VHLogicalIO               0.000
VHPhysIO                  0.000
VHLogicalIOKB             0.000
VHPhysIOKB                0.000
LockDelay                 ?
CheckpointNum             ?
UnityTime                 ?
UtilityInfoAvailable      N
UnitySQL                  ?
ThrottleBypassed          ?
IterationCount            ?
TTGranularity             LogicalRow
MaxStepMemory             1.250
TotalServerByteCount      2,012
QryLogStepsV
Category: Operations
Database: DBC

View Column and Referenced Table.Column

View Column      Data Type  Format                Referenced Table.Column
ServerByteCount  FLOAT      ----,---,---,---,--9  DBQLStepTbl.ServerByteCount
Usage Notes
This view of the DBQLStepTbl table is populated if you specify the WITH STEPINFO
option. When the query completes, the system logs one row for each query step, including
parallel steps.
This view can also show the size of the data transferred between Teradata and a foreign
server for each step.
For more information about the QryLogStepsV view, see Data Dictionary, B035-1092.
For more information about the DBQL feature and a description of the QryLogStepsV view,
see Database Administration, B035-1093.
ServerByteCount Column
The ServerByteCount column is the total number of bytes sent to or received from a foreign
server for each step.
Example of QryLogStepsV
The following SELECT statement returns the user name and elapsed time for the steps
whose queries have transferred more than 10 MB of data.
SELECT lv.username, sv.elapsedtime FROM DBC.QryLogStepsV AS sv,
DBC.QryLogV AS lv WHERE ServerByteCount / (1024*1024) GT 10 AND
sv.queryid = lv.queryid;
Result:
username     TOM
ElapsedTime  0:10:22.220000

username     JOHN
ElapsedTime  0:21:32.510000
ServerV[X]
Category: Operations
Database: DBC

View Column and Referenced Table.Column

View Column         Data Type                                       Format               Referenced Table.Column
AuthorizationName   VARCHAR(128) UNICODE NOT CASESPECIFIC           X(128)               TVM.AuthName
AuthorizationType   VARCHAR(15) UNICODE NOT CASESPECIFIC            X(15)                TVM.AuthorizationType
CreateTimeStamp     TIMESTAMP(0)                                    YYYY-MM-DDBHH:MI:SS  TVM.CreateTimeStamp
CreatorName         VARCHAR(128) UNICODE NOT CASESPECIFIC NOT NULL  X(128)               Dbase.DatabaseName
DataBaseName        VARCHAR(128) UNICODE NOT CASESPECIFIC NOT NULL  X(128)               Dbase.DatabaseName
LastAlterName       VARCHAR(128) UNICODE NOT CASESPECIFIC NOT NULL  X(128)               Dbase.DatabaseName
LastAlterTimeStamp  TIMESTAMP(0)                                    YYYY-MM-DDBHH:MI:SS  TVM.LastAlterTimeStamp
ServerID            BYTE(6) NOT NULL                                X(12)                TVM.TVMId
ServerName          VARCHAR(128) UNICODE NOT CASESPECIFIC NOT NULL  X(128)               TVM.TVMName
Usage Notes
This Teradata QueryGrid connector Data Dictionary view provides details about the foreign
servers defined in the Teradata Database system.
Possible Values of the AuthorizationType Column
Value  Description
T      INVOKER TRUSTED
S      DEFINER TRUSTED
''     UNKNOWN
Example of ServerV[X]
The following SELECT statement returns information about the foreign server objects
created by user 'dba'.
select * from DBC.ServerV where CreatorName = 'dba';
Result:

ServerID            000011960000
DataBaseName        TD_SERVER_DB
ServerName          SERVER_1
CreatorName         dba
CreateTimeStamp     2014-12-02 19:51:46
LastAlterName       dba
LastAlterTimeStamp  2014-12-02 19:51:46

ServerID            000012960000
DataBaseName        TD_SERVER_DB
ServerName          SERVER_2
CreatorName         dba
CreateTimeStamp     2014-12-02 19:51:50
LastAlterName       dba
LastAlterTimeStamp  2014-12-02 19:51:50
AuthorizationName   user1
AuthorizationType   INVOKER TRUSTED
ServerInfoV[X]
Category: Operations
Database: DBC

View Column and Referenced Table.Column

View Column  Data Type                                                 Format  Referenced Table.Column
NameInfo     VARCHAR(128) UNICODE NOT NULL UPPERCASE NOT CASESPECIFIC  X(128)  ServerInfo.NameInfo
NVPType      VARCHAR(7) UNICODE                                        X(7)    ServerInfo.NameInfoType
ServerName   VARCHAR(128) UNICODE NOT CASESPECIFIC NOT NULL            X(128)  TVM.TVMName
ValueInfo    VARCHAR(32000)                                            X(256)  ServerInfo.ValueInfo
Usage Notes
This Teradata QueryGrid connector Data Dictionary view provides details about the name
value pairs used by foreign servers defined in the Teradata Database system.
Possible Values of the NVPType Column
Value  Description
I      IMPORT
E      EXPORT
G      GLOBAL
''     UNKNOWN
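As a sketch, the global name value pairs for one of the servers from the examples in this
chapter can be listed by filtering on NVPType (the server name is taken from the later
examples):

SELECT NameInfo, ValueInfo
FROM DBC.ServerInfoV
WHERE ServerName = 'SQLHSRV_1'
AND NVPType = 'G';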
TblSrvV[X]
Category: Operations
Database: DBC

View Column and Referenced Table.Columns

View Column        Data Type                      Format  Referenced Table.Column
ServerName         VARCHAR(128) UNICODE NOT NULL  X(128)  TVM.TVMName
SrvDataBaseName    VARCHAR(128) UNICODE NOT NULL  X(128)  Dbase.DatabaseName
TableOperatorType  VARCHAR(7) UNICODE             X(7)    ServerTblOpInfo.TblOpType
TblOpName          VARCHAR(128) UNICODE NOT NULL  X(128)  ServerTblOpInfo.TblopName
TbpOpDataBaseName  VARCHAR(128) UNICODE NOT NULL  X(128)  ServerTblOpInfo.TblopDBName
Usage Notes
This Teradata-to-Hadoop connector view returns information about the foreign servers and
their associated table operators.
Possible Values of the TableOperatorType Column
Value  Description
I      IMPORT
E      EXPORT
''     UNKNOWN
Example of TblSrvV[X]
The following SELECT statement returns information about the foreign server,
'SQLHSRV_1,' and its associated table operators, LOAD_TO_HCATALOG_HDP2_3_0 and
LOAD_FROM_HCATALOG_HDP2_3_0.
BTEQ -- Enter your SQL request or BTEQ command:
select * from DBC.TblSrvV where ServerName = 'SQLHSRV_1';
*** Query completed. 2 rows found. 5 columns returned.
*** Total elapsed time was 1 second.
ServerName         SQLHSRV_1
SrvDataBaseName    TD_SERVER_DB
TblOpName          LOAD_TO_HCATALOG_HDP2_3_0
TblOpDBName        SYSLIB
TableOperatorType  EXPORT

ServerName         SQLHSRV_1
SrvDataBaseName    TD_SERVER_DB
TblOpName          LOAD_FROM_HCATALOG_HDP2_3_0
TblOpDBName        SYSLIB
TableOperatorType  IMPORT
TblSrvInfoV[X]
Category: Operations
Database: DBC

View Column and Referenced Table.Column

View Column        Data Type                                                 Format  Referenced Table.Column
NameInfo           VARCHAR(128) UNICODE NOT NULL UPPERCASE NOT CASESPECIFIC  X(128)  ServerInfo.NameInfo
ServerName         VARCHAR(128) UNICODE NOT NULL                             X(128)  TVM.TVMName
SrvDataBaseName    VARCHAR(128) UNICODE NOT NULL                             X(128)  Dbase.DatabaseName
TableOperatorType  VARCHAR(7) UNICODE                                        X(7)    ServerTblOpInfo.TblOpType
TbpOpDataBaseName  VARCHAR(128) UNICODE NOT NULL                             X(128)  ServerTblOpInfo.TblopDBName
TblOpName          VARCHAR(128) UNICODE NOT NULL                             X(128)  ServerTblOpInfo.TblOpName
ValueInfo          VARCHAR(32000) UNICODE                                    X(256)  ServerInfo.ValueInfo
Usage Notes
This Teradata QueryGrid connector view returns the name value pairs defined for a foreign
server. For more information about name value pairs, see CREATE FOREIGN SERVER.
Possible Values of the TableOperatorType Column
Value  Description
I      IMPORT
E      EXPORT
''     UNKNOWN
Example of TblSrvInfoV[X]
The following SELECT statement returns the name value pairs defined for the IMPORT
table operator associated with the foreign server object, 'SQLHSRV_1'.
select * from DBC.TblSrvInfoV where ServerName='SQLHSRV_1' and
TableOperatorType = 'IMPORT';
Result:

ServerName         SQLHSRV_1
SrvDataBaseName    TD_SERVER_DB
TblOpName          LOAD_FROM_HCATALOG_HDP2_3_0
TbpOpDataBaseName  SYSLIB
NameInfo           hosttype
ValueInfo          'hadoop'
TableOperatorType  IMPORT

ServerName         SQLHSRV_1
SrvDataBaseName    TD_SERVER_DB
TblOpName          LOAD_FROM_HCATALOG_HDP2_3_0
TbpOpDataBaseName  SYSLIB
NameInfo           username
ValueInfo          'hive'
TableOperatorType  IMPORT

ServerName         SQLHSRV_1
SrvDataBaseName    TD_SERVER_DB
TblOpName          LOAD_FROM_HCATALOG_HDP2_3_0
TbpOpDataBaseName  SYSLIB
NameInfo           server
ValueInfo          '10.25.32.106'
TableOperatorType  IMPORT

ServerName         SQLHSRV_1
SrvDataBaseName    TD_SERVER_DB
TblOpName          LOAD_FROM_HCATALOG_HDP2_3_0
TbpOpDataBaseName  SYSLIB
NameInfo           port
ValueInfo          '9083'
TableOperatorType  IMPORT
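The name value pairs shown in this example originate in the foreign server definition. The
following is a hedged sketch of a definition that could produce them, reusing the operator
and attribute names from this example; see CREATE FOREIGN SERVER for the
authoritative syntax:

CREATE FOREIGN SERVER SQLHSRV_1
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0
   USING hosttype('hadoop') username('hive') server('10.25.32.106') port('9083'),
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0;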
Data Dictionary Tables
This topic describes the following:
• The Teradata-to-Hadoop connector Data Dictionary tables
• The row values related to the Teradata-to-Hadoop connector that are populated in the
DBQL DBQLogTbl and DBQLStepTbl tables and the following Data Dictionary tables:
• DBC.AccessRights
• DBC.AccLogRuleTbl
• DBC.Dependency
• DBC.TVM
Like other system tables, the Teradata-to-Hadoop connector pre-defined tables are created as
relational tables in the DBC database during system initialization (SysInit) or by the
Dictionary Initialization Program (DIP) and can be accessed only by users who have the
required privileges to the tables.
Access to the tables is strictly controlled to ensure that users (including system
administrators) cannot modify them.
Notice: To ensure that the system functions properly, do not modify or delete any Data Dictionary
tables. Use the Data Dictionary views to access data in the tables to ensure that the tables are
not accidentally modified or deleted. For information about the data dictionary views, see
Data Dictionary Views.
DBC.AccessRights
This Data Dictionary table stores information about discretionary access privileges and row-level security privileges that have been granted.
This information includes:
• The ID of the user that was granted the privilege
• The specific privilege that was granted
• Who granted it, and whether it was granted using the GRANT statement
Row Values
Row
Description
AccessRight
The type of privilege granted on a user object only. Possible values
include:
• CS (CREATE SERVER)
• DS (DROP SERVER)
Note: These values must be explicitly granted.
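A sketch of granting these privileges explicitly; the grantee name is illustrative, and the
authoritative GRANT syntax is in SQL Data Control Language:

GRANT CREATE SERVER, DROP SERVER ON TD_SERVER_DB TO dba_user;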
Related Topics
For more information about the DBC.AccessRights table, see Data Dictionary, B035-1092.
DBC.AccLogRuleTbl
This Data Dictionary table stores information about the logging of access privilege checks.
This information includes:
• Typical access control privilege checks
• Row-level security privilege checks
• The user, database, and object involved in the privilege check
Note: For Teradata QueryGrid connector queries, this is the remote object or user on the
foreign server involved in the privilege check.
Row Values
Row
Description
AcrCreateServer
This row stores the logging in effect for the CREATE SERVER
privilege on TD_SERVER_DB to which the rule applies.
This row is populated if you specify the ON FOREIGN SERVER
option in the BEGIN or END LOGGING statement.
AcrDropServer
This row stores the logging in effect for the DROP SERVER privilege
on TD_SERVER_DB to which the rule applies.
This row is populated if you specify the ON FOREIGN SERVER
option in the BEGIN or END LOGGING statement.
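A hedged sketch of a BEGIN LOGGING request that would populate these rows; the exact
form of the ON FOREIGN SERVER option is an assumption here and is documented in
SQL Data Definition Language:

BEGIN LOGGING ON EACH CREATE SERVER, DROP SERVER ON FOREIGN SERVER;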
DBC.DBQLogTbl
This Data Dictionary table is the main DBQL table containing information about the SQL
and Teradata QueryGrid connector queries being logged.
The DBQLogTbl default row consists of all the available DBQLogTbl table fields. The default
row provides general query information that is usually adequate for investigating a query that
is interfering with performance.
When no options are specified, a default row includes:
• User ID and user name under which the session being logged was initiated
• Unique ID for the process, session, and host (client) connection
• Account string, expanded as appropriate, that was current when the query completed
• First 200 characters of the query statement
• CPU and I/O statistics
• Default database name that was current when the query completed
• The total size of the data transferred between Teradata and a foreign server
The default is one default row per query.
Row Values
Row
Description
StatementGroup
If there is a DDL statement in a request, the StatementGroup
column reports which type:
• DDL CREATE if this is a CREATE FOREIGN SERVER
statement
• DDL ALTER if this is an ALTER or DROP FOREIGN SERVER
statement
• OTHER SYS OTHER if this is a SHOW FOREIGN SERVER or
HELP FOREIGN statement
• DDL GRANT
If the statement has only one DML statement or multiple DML
statements that are all of the same type, StatementGroup indicates
the type. For example if there are three DELETE statements in a
request, StatementGroup reports:
DML DELETE
Similarly, for requests with individual or multiple INSERT,
INSERT... SELECT, UPDATE or SELECT statements,
StatementGroup reports:
• DML INSERT
• DML INSERT... SELECT
• DML UPDATE
• SELECT
In a multistatement request with different types of DML statements,
you see a list showing the number of statements of each type in the
request. For example, a request with one insert and two update
statements appears as:
DML Del=0 Ins=1 InsSel=0 Upd=2 Sel=0
StatementType
The type of statement of the query.
In a multistatement request, this is the last statement of the request.
However, this may not accurately describe the request. For more
statement information, see StatementGroup.
The possible values recorded include:
• CREATE SERVER for the CREATE FOREIGN SERVER
statement.
• ALTER SERVER for ALTER FOREIGN SERVER statement.
• DROP SERVER for the DROP FOREIGN SERVER statement.
• SHOW for the SHOW FOREIGN SERVER statement.
• HELP for the HELP FOREIGN statement.
TotalServerByteCount
The total number of bytes read from or sent to a foreign server
object. The column is NULL if the request does not load or send
data from or to a foreign server object.
Related Topics
For more information about the DBQL feature, how to enable DBQL logging, and the
DBC.DBQLStepTbl table and fields, see Database Administration, B035-1093.
For more information about the BEGIN/REPLACE QUERY LOGGING statement, see SQL
Data Definition Language - Syntax and Examples, B035-1144.
DBC.Dependency
This Data Dictionary table stores information about the relationships and dependencies
between various types of objects. The types of relationships and dependencies include:
• Relationships between tables and row-level security constraints
• Dependencies between JAR objects
• Relationships between foreign server objects and table operators
Row Value
Row
Description
RelationshipCode
The value KO indicates the relationship between the foreign
server and the table operator.
DBC.ServerInfo
This Teradata QueryGrid connector Data Dictionary table stores the name value pairs of the
server object that are used by the table operators to connect to the foreign server if the name
value pair of the USING clause is specified in the CREATE/ALTER FOREIGN SERVER
statement.
Row Values
DBC.ServerInfo Field
Description
DatabaseID
The database ID that contains the server object.
NameInfo
The name attribute specified in the name value pair of
the USING clause in the CREATE or ALTER
FOREIGN SERVER statement.
NameInfoType
Possible values include:
• G indicates the server attribute name value pair
defined in the USING clause of the CREATE or
ALTER FOREIGN SERVER statement.
• I indicates the IMPORT table operator name value
pair defined in the USING clause of the CREATE
or ALTER FOREIGN SERVER statement.
• E indicates the EXPORT table operator name value
pair defined in the USING clause of the CREATE
or ALTER FOREIGN SERVER statement.
ServerID
The ID of the server object.
ValueInfo
The value attribute specified in the name value pair of
the USING clause in the CREATE or ALTER
FOREIGN SERVER statement.
DBC.ServerTblOpInfo
This Teradata QueryGrid connector Data Dictionary table stores information about the table
operator associated with the foreign server.
This table also includes the database and table operator names to avoid issues with the object
ID changing during an archive or restore operation.
Row Values
DBC.ServerTblOpInfo Field
Description
DatabaseId
The database ID that contains the foreign server
object.
ServerID
The ID of the foreign server.
TblOpDatabaseName
The database name in which the table operator is
defined.
TblopName
The name of the table operator associated with the
foreign server.
TblOpType
Possible values include:
• I indicates the IMPORT table operator name value
pair.
• E indicates the EXPORT table operator name value
pair.
Teradata QueryGrid: Teradata Database-to-Hadoop User Guide
Chapter 6 Data Dictionary Tables and Views for Teradata QueryGrid: Teradata Database-to-Hadoop
Data Dictionary Tables
DBC.ServerTblOpInfo Field
Description
•
'UNKNOWN' indicates an unknown name value
pair.
Note: This column may return more than one operator
type.
DBC.DBQLStepTbl
This DBQL table stores information about each processing step used to satisfy the query. For
a Teradata QueryGrid connector query, this includes the size of the data transferred between
Teradata and a foreign server. One row is logged for each step.
This Data Dictionary table is only populated if you specify the WITH STEPINFO option in
the BEGIN or REPLACE QUERY LOGGING statement. When the query completes, the
system logs one row for each query step, including parallel steps.
Row Value
Row
Description
ServerByteCount
The number of row bytes read from or sent to a foreign server
object.
This column is NULL if the step does not load or send data to or
from a foreign server.
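Step-level rows, and therefore ServerByteCount, are captured only when step logging has
been enabled; for example (the user name is illustrative):

BEGIN QUERY LOGGING WITH STEPINFO ON user1;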
Related Topics
For more information about the DBQL feature, how to enable DBQL logging, and the
DBC.DBQLStepTbl table and fields, see Database Administration, B035-1093.
For more information about the BEGIN/REPLACE QUERY LOGGING statement, see SQL
Data Definition Language - Syntax and Examples, B035-1144.
DBC.TVM Table
This Data Dictionary table stores one row for each of the following objects on the system:
• Column
• Database
• External stored procedure
• Hash index
• JAR
• Join index
• Macro
• Stored procedure
• Table
• Trigger
• User-defined function
• User-defined method
• User-defined type
• View
For Teradata QueryGrid connector queries, the DBC.TVM table stores one row for each
foreign server object on the Teradata Database system if one of the following options is
specified:
• The TRUSTED security type of the CREATE or REPLACE AUTHORIZATION
statement.
• The optional comment string of the SQL COMMENT statement.
The DBC.TVM table is not archived during archive and restore operations.
Row Values
Row
Description
AuthIdUsed
The authorization ID of the foreign server object.
This row returns NULL if the foreign server object is not authorized.
AuthName
The name of the authorization defined for the foreign server object.
This row returns NULL if the foreign server object is not authorized.
AuthorizationSubType
Whether the specified authorization is the default authorization.
Possible values include:
• I indicates the specified INVOKER TRUSTED authorization.
• D indicates DEFINER TRUSTED authorization.
• F indicates DEFINER DEFAULT TRUSTED authorization.
• NULL indicates the foreign server object is not authorized.
AuthorizationType
The type of authorization of the foreign server object. Possible values
include:
• T indicates that the TRUSTED security type of the CREATE or
REPLACE AUTHORIZATION statement is specified.
• NULL indicates the foreign server object is not authorized
CommentString
Text or comment supplied by the user on the column, database, table,
view, macro, user-defined function, user-defined types, user-defined
methods, stored procedure, role, profile, user, or foreign server.
TableKind
If you are using a foreign server object, this row returns K.
Note: K is supported on the Teradata QueryGrid connectors only.
For more information on TableKind values, see Data Dictionary,
B035-1092.
APPENDIX A
Notation Conventions
About Notation Conventions
This appendix describes the notation conventions used in this book.
Convention: Syntax Diagrams
Description: Describes SQL syntax form, including options.

Convention: Square braces in the text
Description: Represent options. The indicated parentheses are required when you specify
options.
For example:
• DECIMAL [(n[,m])] means the decimal data type can be defined optionally:
  • without specifying the precision value n or scale value m
  • specifying precision (n) only
  • specifying both values (n,m)
  You cannot specify scale without first defining precision.
• CHARACTER [(n)] means that use of (n) is optional.
The values for n and m are integers in all cases.
Syntax Diagram Conventions
Notation Conventions
Item
Definition and Comments
Letter
An uppercase or lowercase alphabetic character ranging from A through Z.
Number
A digit ranging from 0 through 9.
Do not use commas when typing a number with more than 3 digits.
Word
Keywords and variables.
• UPPERCASE LETTERS represent a keyword.
Syntax diagrams show all keywords in uppercase, unless operating system
restrictions require them to be in lowercase.
Item
Definition and Comments
• lowercase letters represent a keyword that you must type in lowercase, such
as a Linux command.
• Mixed Case letters represent exceptions to uppercase and lowercase rules.
The exceptions are noted in the syntax explanation.
• lowercase italic letters represent a variable such as a column or table name.
Substitute the variable with a proper value.
• lowercase bold letters represent an excerpt from the diagram.
The excerpt is defined immediately following the diagram that contains it.
• UNDERLINED LETTERS represent the default value.
This applies to both uppercase and lowercase words.
Spaces
Use one space between items such as keywords or variables.
Punctuation
Type all punctuation exactly as it appears in the diagram.
Paths
The main path along the syntax diagram begins at the left with a keyword, and proceeds, left
to right, to the vertical bar, which marks the end of the diagram. Paths that do not have an
arrow or a vertical bar only show portions of the syntax.
The only part of a path that reads from right to left is a loop.
Continuation Links
Paths that are too long for one line use continuation links. Continuation links are circled
letters indicating the beginning and end of a link:
A
A
When you see a circled letter in a syntax diagram, go to the corresponding circled letter and
continue reading.
Required Entries
Required entries appear on the main path:

    SHOW

If you can choose from more than one entry, the choices appear vertically, in a stack. The first
entry appears on the main path:

    SHOW  CONTROLS
          VERSIONS
Optional Entries
You may choose to include or disregard optional entries. Optional entries appear below the
main path:

    SHOW
          CONTROLS

If you can optionally choose from more than one entry, all the choices appear below the
main path:

          READ
          SHARE
          ACCESS
Some commands and statements treat one of the optional choices as a default value. This
value is UNDERLINED. It is presumed to be selected if you type the command or statement
without specifying one of the options.
Strings
String literals appear in apostrophes:
'msgtext '
Abbreviations
If a keyword or a reserved word has a valid abbreviation, the unabbreviated form always
appears on the main path. The shortest valid abbreviation appears beneath.
    SHOW  CONTROLS
          CONTROL
In the above syntax, the following formats are valid:
SHOW CONTROLS
SHOW CONTROL
Loops
A loop is an entry or a group of entries that you can repeat one or more times. Syntax
diagrams show loops as a return path above the main path, over the item or items that you
can repeat:
[Syntax diagram: a comma-separated loop over cname, enclosed in parentheses, with a
maximum of 4 entries shown in a circle and a minimum of 3 entries shown in a square on the
return path.]
Read loops from right to left.
The following conventions apply to loops:
• Maximum number of entries allowed: The number appears in a circle on the return path.
In the example, you may type cname a maximum of four times.
• Minimum number of entries allowed: The number appears in a square on the return path.
In the example, you must type at least three groups of column names.
• Separator character required between entries: The character appears on the return path.
In the example, the separator character is a comma. If the diagram does not show a
separator character, use one blank space.
• Delimiter character required around entries: The beginning and end characters appear
outside the return path. In the example, the delimiter characters are the left and right
parentheses. Generally, a space is not needed between delimiter characters and entries.
Excerpts
Sometimes a piece of a syntax phrase is too large to fit into the diagram. Such a phrase is
indicated by a break in the path, marked by (|) terminators on each side of the break. The
name for the excerpted piece appears between the terminators in boldface type.
The boldface excerpt name and the excerpted phrase appear immediately after the main
diagram. The excerpted phrase starts and ends with a plain horizontal line:
[Syntax diagram: the main path shows LOCKING followed by an excerpt break named
excerpt and then HAVING con. The excerpt definition follows, showing where_cond
together with comma-separated lists of cname and col_pos.]
Multiple Legitimate Phrases
In a syntax diagram, it is possible for any number of phrases to be legitimate:

[Syntax diagram: a stack of three alternatives: dbname optionally preceded by DATABASE,
tname optionally preceded by TABLE, and vname optionally preceded by VIEW.]

In this example, any of the following phrases are legitimate:
dbname
DATABASE dbname
tname
TABLE tname
vname
VIEW vname
Sample Syntax Diagram
[Syntax diagram for a sample CREATE VIEW statement, omitted here because it cannot be
reproduced in text. It shows CREATE VIEW (abbreviation CV) viewname AS, an optional
LOCKING clause naming a DATABASE dbname, TABLE tname, or VIEW vname (with
optional FOR/IN and MODE keywords) and a lock type of ACCESS, READ, SHARE, WRITE,
or EXCLUSIVE (abbreviation EXCL), followed by SELECT (abbreviation SEL) with a
comma-separated list of expr, a FROM clause listing tname with an optional .aname
qualifier, and optional WHERE cond, GROUP BY cname or col_pos, and HAVING cond
clauses, ending with a semicolon.]
Character Shorthand Notation Used in This Book
This book uses the Unicode naming convention for characters. For example, the lowercase
character ‘a’ is more formally specified as either LATIN SMALL LETTER A or U+0061.
The U+xxxx notation refers to a particular code point in the Unicode standard, where xxxx
stands for the hexadecimal representation of the 16-bit value defined in the standard.
In parts of the book, it is convenient to use a symbol to represent a special character, or a
particular class of characters. This is particularly true in discussion of the following Japanese
character encodings:
• KanjiEBCDIC
• KanjiEUC
• KanjiShift-JIS
These encodings are further defined in International Character Set Support, B035-1125.
Character Symbols
The symbols, along with character sets with which they are used, are defined in the following
table.
• a-z, A-Z, 0-9 (Any): Any single byte Latin letter or digit.
• a-z, A-Z, 0-9, fullwidth forms (Any): Any fullwidth Latin letter or digit.
• < (KanjiEBCDIC): Shift Out [SO] (0x0E). Indicates transition from single to multibyte
character in KanjiEBCDIC.
• > (KanjiEBCDIC): Shift In [SI] (0x0F). Indicates transition from multibyte to single byte
KanjiEBCDIC.
• T (Any): Any multibyte character. The encoding depends on the current character set.
For KanjiEUC, code set 3 characters are always preceded by ss3.
• I (Any): Any single byte Hankaku Katakana character. In KanjiEUC, it must be preceded
by ss2, forming an individual multibyte character.
• Δ (Any): Represents the graphic pad character.
• Δ (Any): Represents a single or multibyte pad character, depending on context.
• ss2 (KanjiEUC): Represents the EUC code set 2 introducer (0x8E).
• ss3 (KanjiEUC): Represents the EUC code set 3 introducer (0x8F).
For example, string “TEST”, where each letter is intended to be a fullwidth character, is
written as TEST. Occasionally, when encoding is important, hexadecimal representation is
used.
For example, the following mixed single byte/multibyte character data in KanjiEBCDIC
character set
LMN<TEST>QRS
is represented as:
D3 D4 D5 0E 42E3 42C5 42E2 42E3 0F D8 D9 E2
Pad Characters
The following table lists the pad characters for the various character data types.
Server Character Set    Pad Character Name      Pad Character Value
LATIN                   SPACE                   0x20
UNICODE                 SPACE                   U+0020
GRAPHIC                 IDEOGRAPHIC SPACE       U+3000
KANJISJIS               ASCII SPACE             0x20
KANJI1                  ASCII SPACE             0x20
APPENDIX B
FNC Interfaces for Teradata QueryGrid:
Teradata Database-to-Hadoop
Introduction to FNC Interfaces
The following sections describe C library functions and Java application classes that Teradata
provides for use by table operators to import and export data from and to foreign servers.
The Java application classes are provided in the javFnc.jar archive; therefore, your search
path for Java classes must include the directory containing the javFnc.jar archive. The default
location for the archive is in the bin directory of the Teradata software distribution:
/usr/tdbms/bin
For more information about the C library functions and Java application classes, see SQL
External Routine Programming, B035-1147.
FNC_GetAmpHash / getAmpHash
Purpose
Returns values that hash to the specified AMPs.
C Signature
void
FNC_GetAmpHash(int **amphash,
               int   size)

Parameters:
• amphash (int **, IN/OUT): amphash[n][0] is the AMP number. amphash[n][1] will be
returned with the value that hashes to the AMP.
• size (int, IN): The size (n) of the amphash array.
Java Signature
Defined in RuntimeContract class:
public void getAmpHash(int[][] amphash)
Usage Notes
This routine is callable on an AMP or PE vproc.
FNC_GetHashAmp / getHashAmp
Purpose
Accepts data and determines the AMP which would be responsible for that key.
C Signature
int
FNC_GetHashAmp(FNC_HashRow_t *data,
               int            size,
               int           *retCode)

Parameters:
• data (FNC_HashRow_t *, IN): A pointer to an array of structures representing table
columns. FNC_HashRow_t is defined as follows:
typedef struct {
    void *data;
    parm_tx type;
} FNC_HashRow_t;
• size (int, IN): The size of the data and return arrays.
• retCode (int *, OUT): A pointer to an integer value to indicate success or an error
number. 0 indicates success.
Java Signature
Defined in RuntimeContract class:
public int getHashAmp(Object[] data)
Return Value
An integer representing the number of the AMP that would be responsible for the key.
96
Teradata QueryGrid: Teradata Database-to-Hadoop User Guide
Appendix B FNC Interfaces for Teradata QueryGrid: Teradata Database-to-Hadoop
FNC_SetActivityCount / setActivityCount
Usage Notes
This routine is callable on a PE vproc only by a table operator.
FNC_SetActivityCount / setActivityCount
Purpose
Sets the number of rows exported.
C Signature
void
FNC_SetActivityCount(int  stream,
                     long rowsexported)
Java Signature
Defined in RuntimeContract class:
public void setActivityCount(int stream, long rowsexported)
throws SQLException
Parameters
• stream (IN): Specifies which stream to write to.
• rowsexported (IN): The value to be written to ActivityCount.
Usage Notes
This routine is callable on an AMP vproc only by a table operator.
FNC_TblGetNodeData
Purpose
Returns node IDs and AMP IDs for all online AMP vprocs, allowing table functions and
table operators to configure themselves to run on specific AMPs.
This routine is callable on an AMP or PE vproc.
For details about this routine, see SQL External Routine Programming, B035-1147.
FNC_TblOpBytesTransferred / bytesTransferred
Purpose
Records the number of bytes transferred between Teradata Database and the foreign server
by the table operator.
C Signature
void
FNC_TblOpBytesTransferred(unsigned long in,
                          unsigned long out)
Java Signature
Defined in RuntimeContract class:
public void bytesTransferred(long in, long out)
throws SQLException
Parameters
• in (IN): The number of bytes transferred into Teradata Database from the foreign server.
• out (IN): The number of bytes transferred from Teradata Database to the foreign server.
Usage Notes
This routine is callable on an AMP vproc only by a table operator.
FNC_TblOpGetBaseInfo / getBaseInfo
Purpose
Examines each column in the parser tree and gets the information of the base element if the
type of the column is a user-defined type (UDT).
C Signature
void
FNC_TblOpGetBaseInfo(FNC_TblOpColumnDef_t *colDefs,
                     UDT_BaseInfo_t       *baseInfo)

Parameters:
• colDefs (FNC_TblOpColumnDef_t *, IN): A list of column definitions. For more
information about the FNC_TblOpColumnDef_t structure, see SQL External Routine
Programming.
• baseInfo (UDT_BaseInfo_t *, OUT): A list of UDT_BaseInfo_t structures, one for each
column in colDefs. UDT_BaseInfo_t is defined as follows:
typedef struct {
    SMALLINT udt_indicator;        /* type of the UDT */
                                   /* 0=NONUDT; 1=ARRAY; 2=STRUCT; 3=JSON */
    int      array_numDimension;   /* the number of dimensions for ARRAY UDT */
    dtype_et base_datatype;        /* for array UDT, this is the data type of each element */
    int      base_max_length;
    SMALLINT base_total_interval_digits;
    SMALLINT base_num_fractional_digits;
} UDT_BaseInfo_t;
dtype_et is defined as follows:
typedef int dtype_et;
Valid values are defined by the dtype_en enumeration in the sqltypes_td.h header
file.
Java Signature
Defined in RuntimeContract class:
public UDTBaseInfo[] getBaseInfo(ColumnDefinition[] colDefs)
throws SQLException
The method returns a list of UDTBaseInfo, one for each column passed in.
Usage Notes
This routine detects whether or not the type of the column is a UDT, and if it is a UDT,
whether it is ARRAY, STRUCT, or JSON. This information is returned in
baseInfo.udt_indicator.
If the column is an ARRAY UDT, then baseInfo is filled with detailed information about the
base element of the array.
Note: This routine currently returns detailed base element information only for ARRAY
UDTs; for other UDT types it returns no additional information.
The routine is callable on a PE vproc only by a table operator.
FNC_TblOpGetColDef
Purpose
Retrieves column definitions of the stream specified by the input parameters. The routine
also returns the output column definition for the contract function.
For details about this routine, see SQL External Routine Programming, B035-1147.
FNC_TblOpGetContractDef
Purpose
Retrieves the contract function context.
This routine can be used to get the contract definition at different phases for the table
operator.
For details about this routine, see SQL External Routine Programming, B035-1147.
FNC_TblOpGetContractPhase / getContractPhase
Purpose
Indicates the phase in the parser from which the contract function is being called.
C Signature
int
FNC_TblOpGetContractPhase()
Java Signature
Defined in RuntimeContract class:
public ContractPhase getContractPhase();
Return Value
The parser phases are as follows:
• FNC_CTRCT_GET_ALLCOLS_PHASE = 0: Indicates that all columns for the remote table
should be returned.
• FNC_CTRCT_VALIDATE_PHASE = 1: Validates that the given inputs are correct. The
contract function can be called multiple times from this phase.
Note: This phase is currently not used.
• FNC_CTRCT_COMPLETE_PHASE = 2: Indicates that this is the last call of the contract
function and any foreign server actions that need to be done should be completed.
• FNC_CTRCT_DDL_PHASE = 3: Indicates that execution of the CREATE SERVER
statement is being completed and the connectivity should be verified.
• FNC_CTRCT_DEFINE_SERVER_PHASE = 4: Indicates that a CREATE VIEW or CREATE
MACRO statement is being executed and that the custom clause data may not be valid.
Usage Notes
This routine is callable on a PE vproc only by a table operator.
FNC_TblOpGetExternalQuery / getExternalQuery
Purpose
Generates the text query string for the foreign server and returns the interface version that is
currently supported.
C Signature
void
FNC_TblOpGetExternalQuery(FNC_TblOpColumnDef_t  *colDefs,
                          ServerType             serverType,
                          ExtOpSetType           opSet,
                          int                   *interfaceVersion,
                          unsigned char        **extQryPtr,
                          unsigned int          *extQryLenPtr)

Parameters:
• colDefs (FNC_TblOpColumnDef_t *, IN): A list of column definitions that may occur in a
WHERE clause by the foreign server. For more information about the
FNC_TblOpColumnDef_t structure, see SQL External Routine Programming.
• serverType (ServerType, IN): ServerType is defined as follows:
typedef enum
{
    ANSISQL = 1,
    HADOOP = 2
} serverType_et;
typedef int ServerType;
  - If ANSI SQL, the entire subquery for the table of the foreign server is returned.
  - If HADOOP, only the WHERE clause portion of the query is returned.
• opSet (ExtOpSetType, IN): A set of valid operators supported on the foreign server.
ExtOpSetType is defined as follows:
typedef enum
{
    Eq_ET,
    Ne_ET,
    Gt_ET,
    Le_ET,
    Lt_ET,
    And_ET,
    Or_ET,
    Not_ET,
    Between_ET,
    In_ET,
    NotIn_ET,
    Ge_ET,
    Like_ET
} extoptype_et;
typedef BYTE ExtOpType;
typedef unsigned int ExtOpSet;
typedef struct ExtOpSetType {
    ExtOpSet ExtOpSetList;
} ExtOpSetType;
• interfaceVersion (int *, IN/OUT): A pointer to the interface version. The caller passes in
the desired interface version as the argument; the routine returns the actual interface
version that is currently supported.
• extQryPtr (unsigned char **, OUT): A pointer to the generated text query string for the
foreign server. The query string is null-terminated.
• extQryLenPtr (unsigned int *, OUT): A pointer to the length of the external query (in
bytes).
Java Signature
Defined in RuntimeContract class:
public String getExternalQuery(
        ColumnDefinition[] colDefs,
        ServerType         serverType,
        ExtOpSetType[]     extOpSetTypes,
        int[]              interfaceVersions)
    throws SQLException
The parameters are similar to those for the C routine.
The Java enum classes are defined as follows:
public enum ServerType {
ANSISQL(0),
HADOOP(1);
}
public enum ExtOpSetType {
Eq_ET(0),
Ne_ET(1),
Gt_ET(2),
Le_ET(3),
Lt_ET(4),
And_ET(5),
Or_ET(6),
Not_ET(7),
Between_ET(8),
In_ET(9),
NotIn_ET(10),
Ge_ET(11),
Like_ET(12),
LastOp_ET(13);
}
The method returns a string which contains the external query.
Example: Calling getExternalQuery
ServerType sType = ServerType.ANSISQL;
ExtOpSetType extOpTypes[] = new ExtOpSetType[3];
extOpTypes[0] = ExtOpSetType.Eq_ET;
extOpTypes[1] = ExtOpSetType.And_ET;
extOpTypes[2] = ExtOpSetType.Or_ET;
int[] versions = new int[2];
versions[0] = 1; // The caller passes in the desired interface version.
String extQuery = contract.getExternalQuery(colDefs, sType,
extOpTypes, versions);
After calling getExternalQuery, versions[1] will contain the actual interface version that is
currently supported on the system.
Usage Notes
This routine is callable on a PE vproc only by a table operator.
Note: The C routine, FNC_TblOpGetExternalQuery, calls FNC_malloc to allocate memory
for the buffer specified by *extQryPtr. Unless the routine returns *extQryPtr as NULL, you
must use FNC_free to free the allocated memory after processing the data.
FNC_TblOpGetInnerContract /
getInnerContractCtx
Purpose
Gets the contract definition of a nested inner table operator for the outer table operator to
use.
C Signature
void
FNC_TblOpGetInnerContract(void **innerContract,
                          int   *contractLen)

Parameters:
• innerContract (void **, IN/OUT): As input, identifies the buffer which will hold the
contract definition information. The returned value is the contract definition of the inner
table operator, or NULL if the inner contract function does not exist.
• contractLen (int *, OUT): The length of the contract definition.
Java Signature
Defined in RuntimeContract class:
public byte[] getInnerContractCtx()
throws SQLException
Usage Notes
This routine is callable on a PE vproc only by a table operator.
Note: The C routine, FNC_TblOpGetInnerContract, calls FNC_malloc to allocate memory
for the buffer specified by *innerContract. Unless the routine returns *innerContract as
NULL, you must use FNC_free to free the allocated memory after processing the data.
FNC_TblOpSetContractDef
Purpose
Sets an opaque binary string value that the contract function passes to the associated table
operator at execution time. This string is referred to as the contract function context.
This routine can be used to set the contract definition at different phases for the table
operator.
For details about this routine, see SQL External Routine Programming, B035-1147.
FNC_TblOpSetDisplayLength / setDisplayLength
Purpose
Resets the lengths in column definitions for VARCHAR data types.
C Signature
void
FNC_TblOpSetDisplayLength(Stream_Direction_en   direction,
                          FNC_TblOpColumnDef_t *colDefs)

Parameters:
• direction (Stream_Direction_en, IN): Specify the input value ISINPUT for export and the
value ISOUTPUT for import. Stream_Direction_en is defined as follows:
typedef enum
{
    ISOUTPUT = 'W',
    ISINPUT = 'R'
} Stream_Direction_en;
• colDefs (FNC_TblOpColumnDef_t *, IN/OUT): A pointer to the column definitions
which will be returned with the modified display lengths. For more information about
the FNC_TblOpColumnDef_t structure, see SQL External Routine Programming.
Java Signature
Defined in RuntimeContract class:
public void setDisplayLength(char direction,
ColumnDefinition[] colDefs)
throws SQLException
• direction (char, IN): Specify an input value of 'R' for export and 'W' for import.
• colDefs (ColumnDefinition[], IN/OUT): The column definitions for which the display
lengths will be reset.
Usage Notes
This routine can be invoked for both import and export operations.
The routine is callable on a PE vproc only by a table operator.
FNC_TblOpSetExplainText / setExplainText
Purpose
Sets the EXPLAIN text when the table operator has the hexplain custom clause set.
C Signature
void
FNC_TblOpSetExplainText(int    numOfTexts,
                        char **arrayOfTexts,
                        int   *arrayOfLens);

Parameters:
• numOfTexts (int, IN): The number of EXPLAIN text strings.
• arrayOfTexts (char **, IN): An array containing the EXPLAIN text strings.
• arrayOfLens (int *, IN): An array containing the lengths of each EXPLAIN text string.
Java Signature
Defined in RuntimeContract class:
public void setExplainText(String[] texts);
Usage Notes
The hexplain custom clause has the following values for the type of EXPLAIN to be completed:
• 1 = simple
• 2 = verbose
• 3 = DBQL
This routine accepts multiple self-contained EXPLAIN text strings as input in order to
handle a multi-row EXPLAIN plan from a foreign server. The routine provides the
EXPLAIN plan to the parser which will display the multiple lines of the EXPLAIN plan.
This routine is callable on a PE vproc only by a table operator.
FNC_TblOpSetFormat / setFormat
Purpose
Sets attributes of the format of the input and output streams. This allows the contract
function to specify the format of the data types to the parser.
C Signature
void
FNC_TblOpSetFormat(char                *attribute,
                   int                  streamno,
                   Stream_Direction_en  direction,
                   void                *inValue,
                   int                  inSize);

Parameters:
• attribute (char *, IN): The format attribute to be set. Valid attributes are:
  - "RECFMT"
  - "TZTYPE"
  - "CHARSETFMT"
  - "REPUNSPTCHR"
"CHARSETFMT" and "REPUNSPTCHR" apply only to import table operators.
• streamno (int, IN): The stream number.
• direction (Stream_Direction_en, IN): The stream direction: 'R' or 'W'.
Stream_Direction_en is defined as follows:
typedef enum
{
    ISOUTPUT = 'W',
    ISINPUT = 'R'
} Stream_Direction_en;
• inValue (void *, IN): The location of the new value of the format attribute.
• inSize (int, IN): The size in bytes of the new value pointed to by inValue.
Java Signature
Defined in RuntimeContract class:
public void setFormat(
int stream,
InputInfo.StreamDir dir,
java.util.Map<StreamFormat.FormatAttribute,java.lang.Object> formatattributes)
• stream (IN): Indicates the stream on which the format will be applied. Currently the only
valid value is 0.
• dir (IN): The direction of the stream (input or output).
• formatattributes (IN): Map of attribute values to apply.
This method defines the attributes for formatting the stream. This is applicable for input and
output streams.
For information about the InputInfo and StreamFormat classes, see SQL External Routine
Programming, B035-1147.
Usage Notes
• This routine is valid only when called within the contract function of a table operator.
• For "RECFMT" the default value is INDICFMT1, where the format is IndicData with row
separator sentinels. When the format attribute is "RECFMT", the inValue buffer should
have a value of type Stream_Fmt_en. All field-level formats impact the entire record.
• If data being imported from a foreign server contains characters unsupported by Teradata
Database, you must use FNC_TblOpSetFormat / setFormat and explicitly set the
"CHARSETFMT" and "REPUNSPTCHR" attributes.
Format Attribute Values
• "RECFMT": Defines the record format. When the format attribute is "RECFMT", the
inValue buffer should have a value of type Stream_Fmt_en. The Stream_Fmt_en
enumeration is defined in int/sql/sqltypes_td.h with the following values:
  - INDICFMT1 = 1: IndicData with row separator sentinels.
  - INDICBUFFMT1 = 2: IndicData with NO row or partition separator sentinels.
• "TZTYPE": Used as an indicator to Teradata Database to receive TIME/TIMESTAMP data
from, or send it to, the table operator in a different format:
  - RAW = 0: as stored on the Teradata Database file system.
  - UTC = 1: as UTC.
• "CHARSETFMT": Valid values:
  - EVLDBC: Signals that neither data conversion nor detection is needed.
  - EVLUTF16CHARSET: Signals that the external data to be imported into Teradata
    Database are in UTF16 encoding.
  - EVLUTF8CHARSET: Signals that the external data to be imported into Teradata
    Database are in UTF8 encoding.
• "REPUNSPTCHR": A boolean value that specifies what to do when an unsupported
unicode character is detected in the external data to be imported into Teradata Database:
  - true: Replaces the unsupported character with U+FFFD.
  - false: Returns an error when an unsupported character is detected. This is the default
    behavior.
Importing and Exporting TIME/TIMESTAMP Data
You can map the Teradata Database TIME and TIMESTAMP data types to the Hadoop
STRING or the Oracle TIMESTAMP data type when importing or exporting data to these
foreign servers.
The table operator can use FNC_TblOpSetFormat to set the tztype attribute as an indicator
to Teradata Database to receive TIME/TIMESTAMP data from, or send it to, the table
operator in a native but adjusted format.
The tztype attribute is set as follows for the import and export operators:
• For Hadoop, the attribute is set to UTC.
• For Oracle, the attribute is set to UTC.
If the transform is off, the data will be transferred in Raw form which is the default for table
operators and is consistent with standard UDFs.
tztype is a member of the structure FNC_FmtConfig_t defined in fnctypes.h as follows:
typedef struct
{
    Stream_Fmt_en recordfmt;  // enum - indicdata, fastload binary, delimited
    bool inlinelob;           // inline or deferred
    bool UDTTransformsOff;    // true or false
    bool PDTTransformsOff;    // true or false
    bool ArrayTransformsOff;  // true or false
    char auxinfo[128];        // for delimited text, can contain the record separator,
                              // delimiter specification, and field enclosure characters
    double inperc;            // recommended percentage of buffer devoted to input rows
    bool inputnames;          // send input column names to step
    bool outputnames;         // send output column names to step
    TZType_en tztype;         // enum - Raw or UTC
    int charsetfmt;           // charset format of data to be imported into TD through QG
    bool replUnsprtedUniChar; /* true - replace unsupported unicode character
                                 encountered with U+FFFD when data is imported
                                 into TD through QG
                                 false - error out when unsupported unicode
                                 char encountered */
} FNC_FmtConfig_t;
TZType_en is defined as follows:
typedef enum
{
    Raw = 0,   /* as stored on TD File system */
    UTC = 1    /* as UTC */
} TZType_en;
For export, FNC_TblOpSetInputColTypes or setInputInfo is called during the contract phase
in the resolver and will use the tztype attribute to add the desired cast to the input TIME or
TIMESTAMP column types.
Teradata Database converts the TIME and TIMESTAMP data to the session local time before
casting to the character type, so when a TIME or TIMESTAMP column is being mapped to
charfix/charvar as when mapping to the Hadoop STRING type, the data will transmit in
session local time zone and no explicit casts are needed.
For import, when getting the input buffer from the table operator, TIME or TIMESTAMP
data have to be converted to Raw form. There is no conversion needed for the import of
Hadoop Strings to Teradata Database TIME or TIMESTAMP data types since it follows the
normal conversion path from character to TIME/TIMESTAMP in Teradata Database.
Note: Teradata does not recommend importing or exporting TIME/TIMESTAMP data for a
Teradata Database system with timedatewzcontrol flag 57 = 0. For such systems, the TIME/
TIMESTAMP data is stored in OS local time. The System/Session time zone is not set and
Teradata Database does not apply any conversions on TIME/TIMESTAMP data when
reading or writing from disk. Therefore, exporting such data reliably in the format desired by
the foreign server is a problem and Teradata recommends that the Teradata-to-Hadoop
connector feature not be used for such systems.
FNC_TblOpSetHashByDef / setHashBy
Purpose
Allows the contract function writer to set the HASH BY specification when developing table
operators.
C Signature
void
FNC_TblOpSetHashByDef(int          streamno,
                      FNC_Names_t *colNames);

Parameters:
• streamno (int, IN): The input stream number.
• colNames (FNC_Names_t *, IN): A pointer to the HASH BY metadata. FNC_Names_t is
defined as follows:
typedef struct
{
    int number;        // number of column names
    names_t names[1];  // array of column names
} FNC_Names_t;
names_t is defined as follows:
typedef CHARACTER names_t[FNC_MAXNAMELEN_EON];
Java Signature
Defined in RuntimeContract class:
public void setHashBy(int streamno, String[] colNames)
throws SQLException
Usage Notes
This routine can be called only from the contract function and runs on a PE vproc. It
produces an error if the stream number is invalid or if the HASH BY metadata has already
been set.
FNC_TblOpSetInputColTypes / setInputInfo
Purpose
Sets casting statements on the input columns so that the data types are cast as indicated by
the caller.
C Signature
void FNC_TblOpSetInputColTypes(int                   streamNo,
                               FNC_TblOpColumnDef_t *colDefs);
Parameter                      Type  Description
int streamNo                   IN    The input stream number.
FNC_TblOpColumnDef_t *colDefs  IN    A list of column definitions. For more
                                     information about the
                                     FNC_TblOpColumnDef_t structure, see SQL
                                     External Routine Programming, B035-1147.
Java Signature
Defined in RuntimeContract class:
public void setInputInfo(int streamNo, ColumnDefinition[] colDefs)
throws SQLException
Usage Notes
This routine is callable on a PE vproc only by a table operator.
FNC_TblOpSetLocalOrderByDef / setOrderBy
Purpose
Allows the contract function writer to set the ordering specification when developing table
operators.
C Signature
void FNC_TblOpSetLocalOrderByDef(int              streamno,
                                 FNC_Names_Ord_t *colNames);

Parameter                  Type  Description
int streamno               IN    The input stream number.
FNC_Names_Ord_t *colNames  IN    A pointer to the LOCAL ORDER BY metadata.
FNC_Names_Ord_t is defined as follows:
typedef struct
{
    int         number;   // number of column names
    names_ord_t col[1];   // array of name-order-nulltype triplets
} FNC_Names_Ord_t;
names_ord_t is defined as follows:
typedef struct
{
    byte      direction;                 // 'A'=ascending or 'D'=descending
    byte      nullspec;                  // 'F'=nulls First or 'L'=nulls Last
    CHARACTER name[FNC_MAXNAMELEN_EON];  // column name
} names_ord_t;
Java Signature
Defined in RuntimeContract class:
public void setOrderBy(int streamno, String[] colNames)
throws SQLException
Usage Notes
This routine can be called only from the contract function and runs on a PE vproc. It
produces an error if the stream number is invalid or if the LOCAL ORDER BY metadata
has already been set.