Postgres Conference HangZhou, China
Postgres-XC/XL Scale-out Approach in PostgreSQL
July 25th, 2015
NTT DATA INTELLILINK Corporation
Koichi Suzuki
Copyright © 2015 NTT DATA INTELLILINK Corporation

Introduction

About the Speaker
● Fellow at NTT DATA Intellilink Corporation
● Principal, Technology Professionals at NTT DATA Group
In Charge Of
● General Database Technology
● Database in huge data warehouse and its design
● PostgreSQL and its cluster technology
In The Past
● Character Set Standard (Extended Unix Code, Unicode, etc.)
● Heisei-font development (Technical Committee)
● Oracle Porting
● Object-Relational Database

Motivation
● Growing database workload in both OLTP (OnLine Transaction Processing) and OLAP (OnLine Analytical Processing) applications
● Shared-Nothing Approach
  – Performance with commodity hardware/software
● Extension to existing PostgreSQL
● Transparent API
  – Internal API could be different
  – Transparent libpq interface
● No significant restriction to transaction ACID properties and SQL language

Scale-out approach
● Distribution/Replication of table rows among different database “nodes”
● Parallelism
● Local join operation
● SQL planning for row distribution/replication
● Consistent and synchronous transaction management among “nodes”
● Performance with commodity hardware/software

Read Scale-out in PostgreSQL Master/Slave
[Figure: the Master handles read/write transactions and ships WAL (or Redo Log) to the Slave; the Slave serves read-only transactions with a possible time delay.]

Scaling Out in Postgres XL/XC
[Figure: every node, each with its own local disk, handles read/write transactions; backend transaction synchronization gives no delay in update visibility.]

OLTP Workload Scalability and Table Design

DBT-1 Workload Scalability
[Figure: DBT-1 (Rev) scalability chart]

Table Design in DBT-1 Benchmark
CUSTOMER: C_ID, C_UNAME, C_PASSWD, C_FNAME, C_LNAME, C_ADDR_ID, C_PHONE, C_EMAIL, C_SINCE, C_LAST_VISIT, C_LOGIN, C_EXPIRATION, C_DISCOUNT, C_BALANCE, C_YTD_PMT, C_BIRTHDATE, C_DATA
ORDERS: O_ID, O_C_ID, O_DATE, O_SUB_TOTAL, O_TAX, O_TOTAL, O_SHIP_TYPE, O_BILL_ADDR_ID, O_SHIP_ADDR_ID, O_STATUS
ORDER_LINE: OL_ID, OL_O_ID, OL_I_ID, OL_QTY, OL_DISCOUNT, OL_COMMENTS, OL_C_ID
ITEM: I_ID, I_TITLE, I_A_ID, I_PUB_DATE, I_PUBLISHER, I_SUBJECT, I_DESC, I_RELATED1, I_RELATED2, I_RELATED3, I_RELATED4, I_RELATED5, I_THUMBNAIL, I_IMAGE, I_SRP, I_COST, I_AVAIL, I_ISBN, I_PAGE, I_BACKING, I_DIMENSIONS
SHOPPING_CART: SC_ID, SC_C_ID, SC_DATE, SC_SUB_TOTAL, SC_TAX, SC_SHIPPING_COST, SC_TOTAL, SC_C_FNAME, SC_C_LNAME, SC_C_DISCOUNT
CC_XACTS: CX_I_ID, CX_TYPE, CX_NUM, CX_NAME, CX_EXPIRY, CX_AUTH_ID, CX_XACT_AMT, CX_XACT_DATE, CX_CO_ID, CX_C_ID
SHOPPING_CART_LINE: SCL_SC_ID, SCL_I_ID, SCL_QTY, SCL_COST, SCL_SRP, SCL_TITLE, SCL_BACKING, SCL_C_ID
ADDRESS: ADDR_ID, ADDR_STREET1, ADDR_STREET2, ADDR_CITY, ADDR_STATE, ADDR_ZIP, ADDR_CO_ID, ADDR_C_ID
STOCK: ST_I_ID, ST_STOCK
COUNTRY: CO_ID, CO_NAME, CO_EXCHANGE, CO_CURRENCY
AUTHOR: (no columns recoverable)
Legend: Distributed with Customer ID / Distributed with Shopping Cart ID / Distributed with Item ID / Replicated

MPP Performance – DBT-3 (TPC-H)
[Figure: DBT-3 (TPC-H) performance chart]
By courtesy of Mason Sharp, Postgres-XL leader

Scale Out Approach (1): Table Distribution/Replication
Categorize tables into two groups:
● Large and frequently-updated tables
  → Distribute rows among nodes (Distributed Tables)
  → Based on a column value (distribution key)
  → Hash, modulo or round-robin
  → Parallelism among transactions (OLTP) or in SQL processing (OLAP)
● Smaller and stable tables
  → Replicate among nodes (Replicated Tables)
  → Join Pushdown
Avoid, as far as possible, joins between Distributed Tables whose join keys differ from the distribution key.

Scale Out Table Design in DBT-1
Three distribution keys:
● Customer ID
● Shopping Cart ID
● Item ID
Some transactions involve joins across distributed tables with non-distribution join keys.
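
The row placement rules above (hash, modulo, round-robin, and replication) can be illustrated with a minimal Python sketch. This is not Postgres-XC/XL code: the node names are hypothetical, and CRC32 is only a stand-in for whatever hash function the real system uses. It is meant to show why rows sharing a distribution key value land on the same node, which is what makes a join on that key local.

```python
import itertools
import zlib

# Hypothetical datanode names, chosen only for this illustration.
NODES = ["datanode1", "datanode2", "datanode3", "datanode4"]


def node_by_modulo(key: int, nodes=NODES) -> str:
    """Modulo distribution: integer key value modulo the number of nodes."""
    return nodes[key % len(nodes)]


def node_by_hash(key, nodes=NODES) -> str:
    """Hash distribution. CRC32 is an assumption standing in for the real hash."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return nodes[digest % len(nodes)]


def round_robin(nodes=NODES):
    """Round-robin distribution: no key, rows go to the nodes in turn."""
    return itertools.cycle(nodes)


def nodes_for_replicated_write(nodes=NODES):
    """A replicated table keeps a full copy on every node."""
    return list(nodes)


def same_node(customer_id: int) -> bool:
    """An ORDERS row and an ORDER_LINE row with the same customer ID
    map to the same node, so their join needs no cross-node traffic."""
    return node_by_hash(customer_id) == node_by_hash(customer_id)
```

With tables distributed this way, a join between two tables on a column other than the distribution key cannot be resolved on one node, which is the case the slide advises avoiding.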

Some More in XL/XC Node Configuration

Node Configuration: Two-Tier Approach
Coordinator:
● Maintains global catalog information
● Builds the global SQL plan and the SQL statements for datanodes
● Interacts with datanodes to execute local SQL statements and accumulates the results
Datanode:
● Maintains actual data (local data)
● Runs local SQL statements from the Coordinator
(In XL, a datanode may ask other datanodes for their local data)

Coordinator and Datanode
[Figure: read/write transactions, Coordinator, Datanode]

Node Configuration: Yet Another Node: GTM
GTM: Global Transaction Manager
Synchronizes each node's transaction status

Why GTM? Two-Phase Commit Protocol doesn't work?
Two-Phase Commit Protocol does:
● Maintain database consistency in transactions updating more than one node.
Two-Phase Commit Protocol doesn't:
● Maintain Atomic Visibility of updates to other transactions (next slide)

Atomic Visibility and GTM
[Figure: timeline across Node A and Node B]
● TXN 1 updates A and B
● TXN 1 prepares A and B
● TXN 2 reads B and gets the old value
● TXN 1 commits A and B
● TXN 2 reads A and gets the new value: inconsistent read!
GTM monitors transaction activity and controls the timing at which the new value becomes visible.

Final Configuration: GTM, Coordinator and Datanode
[Figure: read/write transactions, Coordinator, GTM, Datanode]

Configuration in Practice
● Just like configuring many database servers to talk to each other
  – Many pitfalls
● Pgxc_ctl provides a simpler way to configure the whole cluster
  – Provide only the needed parameters
  – Pgxc_ctl does the rest, issuing the needed commands and SQL statements
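
The inconsistent read from the "Atomic Visibility and GTM" slide can be reproduced with a toy model. The sketch below is an assumption-laden simplification, not the actual GTM implementation: `Node`, `GTM`, and `scenario` are hypothetical names, each node holds a single value, and a "snapshot" is just the set of transaction IDs still running when the reader starts. It shows that per-node commit status alone lets TXN 2 see a mix of old and new values, while a cluster-wide snapshot hides TXN 1 on both nodes.

```python
class Node:
    """One datanode holding a single value with a tiny version history."""

    def __init__(self, initial):
        self.versions = [(0, initial)]  # (writer transaction id, value)
        self.committed = {0}

    def write(self, txid, value):
        self.versions.append((txid, value))

    def commit(self, txid):
        self.committed.add(txid)

    def read_local(self):
        """Local visibility only: newest version committed on this node."""
        for txid, value in reversed(self.versions):
            if txid in self.committed:
                return value

    def read_with_snapshot(self, snapshot):
        """Global visibility: skip writers that were running at snapshot time."""
        for txid, value in reversed(self.versions):
            if txid in self.committed and txid not in snapshot:
                return value


class GTM:
    """Toy transaction manager: hands out IDs and tracks running transactions."""

    def __init__(self):
        self.next_txid = 1
        self.running = set()

    def begin(self):
        txid = self.next_txid
        self.next_txid += 1
        self.running.add(txid)
        return txid

    def snapshot(self):
        return frozenset(self.running)

    def end(self, txid):
        self.running.discard(txid)


def scenario(use_global_snapshot):
    """Replay the slide's timeline; return what TXN 2 saw on (A, B)."""
    gtm = GTM()
    node_a, node_b = Node("old"), Node("old")
    txn1 = gtm.begin()
    node_a.write(txn1, "new")  # TXN 1 updates A and B, then prepares
    node_b.write(txn1, "new")
    snap = gtm.snapshot()      # TXN 2 starts while TXN 1 is still running

    def read(node):
        if use_global_snapshot:
            return node.read_with_snapshot(snap)
        return node.read_local()

    seen_b = read(node_b)      # TXN 2 reads B before the commit
    node_a.commit(txn1)        # TXN 1 commits A and B
    node_b.commit(txn1)
    gtm.end(txn1)
    seen_a = read(node_a)      # TXN 2 reads A after the commit
    return seen_a, seen_b
```

Calling `scenario(False)` yields a new value for A and an old value for B, while `scenario(True)` yields the old value for both, because TXN 1 was still running when the snapshot was taken.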
Visit http://sourceforge.net/p/postgres-xc/xc-wiki/PGOpen2013_Postgres_Open_2013/

Scalability in OLTP Workloads

OLTP Workload Characteristics
● Number of Transactions: Many
● Number of Involved Table Rows: Small
● Locality of Row Allocation: High
● Update Frequency: High

Scaling Out OLTP Workload
[Figure: read/write transactions run in parallel across Coordinators and Datanodes; GTM carries a high workload.]

Scalability in OLAP (Analytic) Workloads

OLAP Workload Characteristics
● Number of Transactions: Small
● Number of Involved Table Rows: Huge
● Locality of Row Allocation: Low
● Update Frequency: Low

Scaling Out OLAP Workload
[Figure: an SQL statement arrives at the Coordinator, which does the top-level aggregation (fewer coordinators may be needed); small local SQL statements run on each Datanode in parallel; GTM carries a low workload.]

Join Offloading

Join Offloading: When row allocation is available
● Replicated Table and Partitioned Table
  – The coordinator can determine which datanode to go to from the WHERE clause

Join Offloading: When row allocation is available
● Replicated Table and Partitioned Table
  – When the coordinator cannot determine which datanode to go to from the WHERE clause

Parallel Aggregation

Aggregate Functions in PostgreSQL
[Figure: State Transition Function → Finalize Function]

Aggregate Functions in Postgres-XC/XL
[Figure: State Transition Function (three instances) → Collector Function, combining the partial (Sum, Count) states → Finalize Function, AVG ← (Sum, Count)]
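
As a rough illustration of the three-stage aggregation in the figure, the following Python sketch computes AVG from per-datanode partial states. The function names `transition`, `collect`, `finalize`, and `distributed_avg` are hypothetical and chosen for this example; the sketch only mirrors the (Sum, Count) flow shown on the slide and is not the Postgres-XC/XL implementation.

```python
from functools import reduce


def transition(state, value):
    """State transition function: fold one row into a (sum, count) state."""
    total, count = state
    return (total + value, count + 1)


def collect(state_a, state_b):
    """Collector function: merge two partial (sum, count) states."""
    return (state_a[0] + state_b[0], state_a[1] + state_b[1])


def finalize(state):
    """Finalize function: AVG <- (sum, count); None for an empty input."""
    total, count = state
    return total / count if count else None


def distributed_avg(rows_per_datanode):
    """Run the transition step per datanode, then collect and finalize once."""
    partials = [reduce(transition, rows, (0, 0)) for rows in rows_per_datanode]
    return finalize(reduce(collect, partials, (0, 0)))
```

For example, `distributed_avg([[1, 2, 3], [4, 5], [6]])` averages six values held on three simulated datanodes. Only one small (sum, count) pair per datanode crosses the network, rather than every row.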
Datanodes run the state transition functions; the Coordinator runs the collector and finalize functions.
Similar to MapReduce!

Specific statements
● CREATE BARRIER
  – Synchronizes the WAL of all nodes for restoration.
● CREATE|ALTER|DROP NODE
  – Maintenance of cluster nodes
  – Caution! Not automatically propagated. Issue to each coordinator.
● CREATE/DROP NODE GROUP
  – Alias for a group of nodes
● EXECUTE DIRECT
  – Runs SQL locally
  – Read operations only
  – If you are a superuser, set xc_maintenance_mode to on with a SET statement to allow write operations.
  – You are responsible for any inconsistencies and side effects!

Specific catalogs
● pgxc_class
  – Definition of table distribution
● pgxc_node
  – Postgres-XC node information
● pgxc_group
  – Node group

Specific functions
● pgxc_version()
  – Shows the version
● pgxc_pool_check()
  – Checks whether the connection pooler is consistent with the pgxc_node catalog.
● pgxc_pool_reload()
  – Reloads cached connection data and synchronizes pooler connection information with pgxc_node.
● pgxc_lock_for_backup()
  – Only for adding new nodes.
  – Locks DDL execution to keep the catalog stable for backup and copy to the new node.

Specific statements, catalogues, functions and parameters
See http://postgres-x2.github.io/reference/1.2/html/sql-commands.html for details

Specific parameters (planner parameters not included)
● gtm_backup_barrier (bool)
  – Enables the CREATE BARRIER statement.
● persistent_datanode_connections (bool)
  – If “true”, the session never releases connections.
● xc_maintenance_mode
  – Enables write operations in the “EXECUTE DIRECT” statement.
  – Only allowed for superusers.
● min_pool_size
  – Threshold for the pooler to create a new connection.
● max_pool_size
  – Maximum number of pooled connections.
● pooler_port
  – Port number for the pooler (pgxc_ctl takes care of it)
● gtm_port
  – GTM port number (pgxc_ctl takes care of it)

Specific parameters (cont.)
● max_datanodes
● max_coordinators
● pgxcnode_cancel_delay
  – Timeout to wait for a cancel operation, in milliseconds.
  – Mainly for automatic tests.
● gtm_host
  – GTM host name/IP address. Pgxc_ctl takes care of this.
● pgxc_node_name
  – Name of the node itself. Pgxc_ctl takes care of this.

Community status and future

XC and XL community
● Postgres-XC is the original community
  – Based upon PostgreSQL 9.3
  – Tested more for OLTP workload
  – Community activity now continues as Postgres-X2
  – Stabilization
    ● Many Chinese engineers participate
    ● Next minor release is planned for this August
● Postgres-XL became a separate community, aiming to be more product-oriented and more stable
  – Based upon PostgreSQL 9.2
  – Shares most of the XC code base
  – Tested more for OLAP workload
    ● Direct data capture between datanodes
  – Provides many fixes. Most of them apply to XL as well
  – Just finished merge with Postgres 9.5 alpha
● Unified again?

Product status
● Source code inherits all the PostgreSQL repository (at some point)
● Fundamental features are all available
  – Global transaction management
  – SQL statements
  – Utilities
● Further challenges
  – Subtransaction (needed for full function support)
  – Catching up with PostgreSQL (needed?)

XC and XL community
● Both communities need many more resources to move forward
  – Developers
  – Testers
  – Real workloads
● Several Chinese firms are now working together.
  – Many more active members are welcome!

XC and XL community sites
Postgres-XC
  https://github.com/postgres-x2
  https://postgres-x2.github.io
  https://groups.google.com/forum/#!forum/postgres-x2-dev
  https://groups.google.com/forum/#!forum/postgres-x2-general
Postgres-XL
  http://www.postgres-xl.org/

Configuring Postgres-XC

Pgxc_ctl
● Postgres-XC contrib module
● Postgres-XC configuration and operation tool
  – A kind of Postgres-XC shell
  – Built-in commands
  – Can invoke any bash commands
    ● Does not expand $(variable).
● Simple configuration
● Avoids many pitfalls in manual configuration and operation
● Bash-based configuration file
● You can write your favorite bash script for your configuration

Pgxc_ctl builtin commands (major ones)
● prepare
  – Creates a configuration file template
● deploy
  – Deploys postgres-xc binaries to the necessary nodes
● init [all]
  – Initializes the postgres-xc cluster
    ● Runs initdb and initgtm at the necessary nodes
    ● Does additional configuration
    ● Initializes node configuration
● start/stop
  – Cluster and node start/stop
● clean
  – Cleans up existing resources
● monitor
  – Shows which nodes are running

Pgxc_ctl builtin commands (major ones)
● Createdb
  – Similar to createdb, but selects one coordinator to do it.
● Psql
  – Similar to psql, but selects one coordinator, or takes a coordinator name to connect to.
● add
  – Adds gtm_proxy, coordinator and datanode (master and slave)
● remove
  – Removes gtm_proxy, coordinator and datanode (master and slave)

Demonstration