Teradata Architecture and Components
Last updated: 27 April 2004
Center of Excellence, Data Warehousing Group

Teradata Components & Architecture
[Figure: Teradata components and architecture. Client requests from channel-attached systems (CLI, TDP) and network-attached systems (CLI, MTDP, MOSI) enter through the channel, bus, and Ethernet adapters and the gateway software, reach the Parsing Engines, and are passed over the Message Passing Layer to the AMPs, each of which owns a Vdisk. The server-side components run on PDE.]

Teradata Components & Architecture (cont.)
The Call Level Interface (CLI) is the lowest-level interface to the Teradata RDBMS. CLI routines:
- create sessions
- allocate request and response buffers
- create "parcels"
- fetch responses
CLI routines are similar in channel-attached and network-attached environments.

Host Channel Adapter: the hardware component used to connect a mainframe to the Teradata system.

Teradata Components & Architecture (cont.)
TDP: the component on channel-attached systems that manages session traffic between the CLI and the Teradata DBS. Its functions include session initiation and termination, logging, verification, passing physical input to and receiving output from the PEs, and maintaining request queues. TDP also performs session balancing.

Micro Teradata Director Program (MTDP): the component on network-attached clients that handles session-related tasks.

Micro Operating System Interface (MOSI): provides an operating-system-independent interface.

Parsing Engine
- Parser and Resolver: check for
  - proper SQL syntax
  - access permission on the requested database objects
  - existence of the requested database objects
  If the requestor does not have the appropriate access permission, or a database object does not exist, an error message is returned to the requestor.
- Optimizer: takes error-free input from the parser and optimizes the query:
  - restructures the SQL statement to make it more efficient
  - creates the join plan and access plan for the query
  - compiles these plans into machine code
- Generator: takes the access plan produced by the Optimizer and converts it into a set of discrete tasks for the database manager software to perform.
  These tasks are referred to as AMP steps. The hashing algorithm is also a component of the Generator. Each AMP step is passed to the appropriate AMP over the BYNET (Message Passing Layer).

Message Passing Layer
- A combination of PDE, BYNET software, and BYNET hardware on MPP systems.
- Handles the internal communication of the Teradata DBMS: PEs and AMPs communicate via the Message Passing Layer.
- The MPL is responsible for merging the answer set back to the PE.

AMP
- AMPs are virtual processors (vprocs) that run in a multitasking environment.
- Each AMP is responsible for controlling a portion of every table on the system.
- AMPs find the requested rows and perform sorting, column aggregation, join processing, data conversion, disk space management, and similar work.

PDE
PDE is a software interface layer on top of the operating system that enables the RDBMS to operate in a parallel environment. PDE provides the ability to:
- execute vprocs
- run the Teradata RDBMS in a parallel environment

Teradata File System
- The Teradata File System is a layer between the Teradata RDBMS and PDE.
- It provides a set of service calls that allow the Teradata RDBMS to store and retrieve data efficiently.
- The block is the physical I/O unit of the Teradata File System. A data block contains one or more rows of the same table.
- Data blocks are stored in physical disk space units called sectors, which are logically grouped into cylinders.

Data Distribution
- There is no concept of a table space.
- Rows are distributed across all AMPs in an effectively random, hash-based way rather than by user-controlled placement.
- The primary index value is fed to the hashing algorithm, which produces the row hash.
- The row hash is applied to the hash map, which determines the target AMP.

Data Distribution (cont.)
[Figure: Primary index value → hashing algorithm → 32-bit row hash → hash maps → BYNET → AMPs.]

Data Distribution (cont.)
- Hash maps determine which AMP gets a row. They are part of the Communication Layer Interface.
- The hash map uses the first 16 bits of the row hash, called the Destination Selection Word (DSW), to determine the hash bucket.
- Each hash bucket is assigned to one AMP.
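The distribution path described above (primary index value → hashing algorithm → 32-bit row hash → DSW → hash bucket → AMP) can be sketched in Python. This is a minimal model under stated assumptions: `zlib.crc32` stands in for Teradata's hashing algorithm (it is not that algorithm), the hash map is a hypothetical round-robin assignment of the 65,536 buckets to AMPs, and all function names are illustrative only.

```python
import zlib

DSW_BITS = 16                    # width of the Destination Selection Word
NUM_BUCKETS = 1 << DSW_BITS      # 65,536 hash buckets in this model


def row_hash(pi_value: str) -> int:
    """Return a 32-bit row hash for a primary index value.

    ASSUMPTION: zlib.crc32 is only a stand-in; it is NOT Teradata's
    hashing algorithm, which is not reproduced here.
    """
    return zlib.crc32(pi_value.encode("utf-8")) & 0xFFFFFFFF


def dsw(h: int) -> int:
    """Return the DSW: the first (high-order) 16 bits of the row hash."""
    return h >> (32 - DSW_BITS)


def build_hash_map(num_amps: int) -> list[int]:
    """Build a HYPOTHETICAL hash map: bucket b is owned by AMP b % num_amps."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def target_amp(pi_value: str, hash_map: list[int]) -> int:
    """Primary index value -> row hash -> DSW (hash bucket) -> AMP number."""
    return hash_map[dsw(row_hash(pi_value))]


if __name__ == "__main__":
    hash_map = build_hash_map(num_amps=8)
    for empno in ["32980", "32987", "33654"]:
        h = row_hash(empno)
        print(f"empno={empno} row_hash={h:08X} "
              f"bucket={dsw(h):04X} amp={target_amp(empno, hash_map)}")
```

Because the AMP is a pure function of the primary index value, the same value always lands on the same AMP, and a lookup on that value needs only one AMP. Rows sharing a value (a NUPI) or hash synonyms also share a bucket, which is why the row hash alone cannot identify a row.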
Row ID
- The row hash alone is not sufficient to identify a specific row in a table: multiple rows can have the same row hash, either because of hash synonyms or because of a non-unique primary index (NUPI).
- The Row ID makes every row in a table uniquely identifiable. It consists of the 32-bit row hash followed by a 32-bit uniqueness value.

Locating a Row
[Figure: Locating a row. On the PE, the index value is hashed into a 32-bit row hash value and paired with the 48-bit table ID; the DSW gives the AMP number. On the AMP, the file system maps a logical block identifier to the data block and a logical row identifier to the data row.]

Locating a Row (cont.)
- The DSW part of the row hash is fed to the hash map, which identifies the target AMP number.
- The AMP accesses its Master Index.
- The Master Index identifies the Cylinder Index.
- The Cylinder Index identifies the data blocks.
- A search of the data blocks locates the row.

Locating a Row (cont.)
Example query:

  SELECT * FROM employee WHERE empno = 32987;

The value 32987 is passed through the hashing algorithm to produce a 32-bit row hash value, split into the 16-bit DSW and the remaining 16 bits (illustrative values):

  16-bit DSW:         0000 0000 0010 1001   (hex 0029)
  Remaining 16 bits:  1010 0000 0100 0100

Hash bucket 0029 is found in the hash map at row 002, column 9, which holds AMP 05.

Hash map (partial view; row = first three hex digits of the bucket number, column = last hex digit, entry = AMP number):

        0  1  2  3  4  5  6  7  8  9  A  B  C  D
  000  00 01 02 03 04 05 06 07 00 01 02 03 04 05
  001  06 07 00 01 02 03 04 05 06 07 00 01 02 03
  002  04 05 06 07 00 01 02 03 04 05 06 07 00 01
  003  02 03 04 05 06 07 00 01 02 03 04 05 06 07
  004  00 01 02 03 04 05 06 07 00 01 02 03 04 05

Sample rows of the employee table (the row with empno 32987 is the one returned):

  empno  Dno  Loc  Supv
  32980    3  EC   21098
  32876    5  K1   31432
  31654    3  M1   56782
  31624    4  K2   54567
  31354    8  MGR  21348
  32987    9  M1   43567
  31653    5  K2   56711
  33654    1  M3   34579
  34644   10  MGR  56712

Locating a Row (cont.)
[Figure: Master Index (searched by table ID + row hash) → Cylinder Index (searched by table ID + row hash + cylinder number) → data block (searched by row hash + PI value) → target row.]

Data Blocks
- A data block contains one or more rows of the same table.
- Block sizes range from 512 to 130,560 bytes.
- Block sizes can vary within an individual table; the file system adjusts them dynamically as required.
- The system maintains rows within a block in logical Row ID sequence.
- Tables used for data warehousing and decision support usually have larger block sizes, to accommodate more rows per block.

Teradata Parallelism
- Each PE supports up to 120 sessions in parallel.
- Each session can handle up to 16 requests concurrently.
- The MPL is designed to avoid bottlenecks in the system.
- Each AMP can handle up to 80 tasks in parallel.

Teradata Parallelism (cont.)
- A client utility can establish multiple sessions to perform multiple tasks in parallel.
- The Optimizer may perform more than one step concurrently on behalf of the same request.
- The Teradata DBMS is supported by a set of parallel client tools to achieve optimum throughput.

Teradata Parallelism (cont.)
- Query parallelism
- Within-a-step parallelism
- Multi-step parallelism
- Common sub-expression elimination

Query Parallelism (cont.)
- Query parallelism is enabled by hash-partitioning the data across all the AMPs defined in the system.
- An AMP provides all database services on its allocation of data blocks.
- Table scans, index scans, projections, selections, joins, aggregations, and sorts execute in parallel across all AMPs.

Query Parallelism (Within-a-Step)
- The Optimizer generates steps to execute an SQL request. A step is often a large chunk of multiple database operations.
- Multiple relational operations are processed in parallel by pipelining. For example, while a table scan is taking place, selected rows can be pipelined into the join process.

Query Parallelism (Multi-Step)
- Multiple steps of a query execute simultaneously across all units of parallelism in the system.
- One or more processes are invoked for each step on each AMP to perform a database operation.

Example
[Figure: Example plan in which all AMPs perform the steps in parallel to produce each AMP's result set. Multi-step parallelism: steps 1.1, 1.2, 2.1, and 2.2 (scans of Products, OrderDetails, Orders, and Customers, and the join of Orders and Customers) run concurrently, followed by match-rows steps 3 and 4. Within-a-step parallelism: the scan of Orders, the scan of Customers, and the join of Orders and Customers are pipelined.]

Intra-node Parallelism
[Figure: A single node in which a LAN gateway and a channel adapter feed PE1 and PE2, which dispatch work to AMP1 through AMP8, all running on the Parallel Database Extension (PDE).]

Massively Parallel Processing
[Figure: SMP nodes, each running PEs and AMPs, grouped into cliques to form an MPP system.]

MPP (Massively Parallel Processing)
- Multiple SMP nodes are connected together to form an MPP system.
- The inter-node connecting layer is called the BYNET. The BYNET is a combination of hardware and software that allows multiple nodes to communicate with each other.
- The BYNET supports up to 512 SMP nodes.

Availability
- AMP clustering and fallback
- Cliques
- RAID 1 / RAID 5

Availability - AMP Clustering and Fallback
- A cluster is a group of AMPs.
- The fallback copy of a row resides on a different AMP in the same cluster.
- Data rows remain accessible even if one AMP is down.
- Two AMPs down in the same cluster halt the database.

Availability - Cliques
- A clique is a group of nodes that share the same disk arrays.
- Cliques allow the system to continue operating even if a node fails.
- When a node fails, its vprocs migrate to other nodes in the same clique.

Availability - BYNET for MPP
- A redundant, bidirectional interconnect network allows multiple SMP nodes to communicate.
- Dual BYNETs provide fault tolerance.
- Bandwidth scales as nodes are added.
- The BYNET supports point-to-point and broadcast communication.

Availability (cont.)
- Vproc migration: failure of a node does not halt the database system; all of its processing units migrate to the other nodes in the clique.
- Redundant BYNET interconnect.
- Fallback: replicates a copy of the data on physically isolated disks spread among several cliques, so the data remains available if a processing unit, or even an entire node, is down.
- RAID 1 / RAID 5.

Linear Scalability
- Shared-nothing MPP platform: the software can scale linearly with the hardware.
- Parallel units: vprocs act as self-contained mini DBMSs. AMPs perform sorting, locking, journaling, loading, backup, and recovery functions independently.
- Adding AMPs, PEs, or nodes to the system improves performance linearly. For example, in the ideal case, doubling the number of nodes doubles query execution speed.

Linear Scalability (cont.)
[Figure: Total work accomplished versus increasing CPU power. Teradata shows linear scalability; traditional transaction processing systems show non-linear scalability.]
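The fallback rule from the AMP clustering slide (one AMP down per cluster is tolerated, two AMPs down in the same cluster halt the database) can be modelled with a short Python sketch. It is illustrative, not Teradata's implementation: the cluster layout (consecutive AMPs grouped together), the placement rule (each row's fallback copy goes to another AMP of the same cluster, chosen by row hash), the use of `h % num_amps` in place of a hash-map lookup, and every function name are hypothetical assumptions.

```python
from collections import Counter
from typing import Iterable


def make_clusters(num_amps: int, cluster_size: int) -> dict[int, int]:
    """HYPOTHETICAL layout: map each AMP number to a cluster number by
    grouping consecutive AMPs (0-3 -> cluster 0, 4-7 -> cluster 1, ...)."""
    return {amp: amp // cluster_size for amp in range(num_amps)}


def fallback_amp(primary: int, row_hash: int, cluster_of: dict[int, int]) -> int:
    """HYPOTHETICAL placement: the fallback copy goes to one of the OTHER AMPs
    in the primary AMP's cluster, chosen by row hash, so one AMP's fallback
    rows are spread over the rest of its cluster (needs cluster size >= 2)."""
    others = sorted(a for a, c in cluster_of.items()
                    if c == cluster_of[primary] and a != primary)
    return others[row_hash % len(others)]


def lost_rows(row_hashes: Iterable[int], cluster_of: dict[int, int],
              down: set[int]) -> int:
    """Count rows whose primary AND fallback AMPs are both down."""
    num_amps = len(cluster_of)
    lost = 0
    for h in row_hashes:
        primary = h % num_amps  # stand-in for the hash-map lookup
        if primary in down and fallback_amp(primary, h, cluster_of) in down:
            lost += 1
    return lost


def system_available(cluster_of: dict[int, int], down: set[int]) -> bool:
    """The rule from the slide: at most one AMP may be down per cluster."""
    down_per_cluster = Counter(cluster_of[amp] for amp in down)
    return all(n <= 1 for n in down_per_cluster.values())


if __name__ == "__main__":
    clusters = make_clusters(num_amps=8, cluster_size=4)
    rows = range(1000)  # stand-in row hashes
    for down in ({1}, {1, 5}, {1, 2}):
        print(sorted(down), system_available(clusters, down),
              lost_rows(rows, clusters, down))
```

With this hypothetical 8-AMP, two-cluster layout, losing AMP 1 alone, or AMPs 1 and 5 (different clusters), leaves every row readable, while losing AMPs 1 and 2 makes the rows whose primary and fallback copies sit on those two AMPs unreachable. Spreading fallback rows across the whole cluster is what makes any second failure in that cluster fatal, matching the rule above.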