US006026394A
United States Patent [19], Tsuchida et al.
[11] Patent Number: 6,026,394
[45] Date of Patent: Feb. 15, 2000
[54] SYSTEM AND METHOD FOR IMPLEMENTING PARALLEL OPERATIONS IN A DATABASE MANAGEMENT SYSTEM
[75] Inventors: Masashi Tsuchida, Sagamihara; Yukio Nakano, Yokohama; Nobuo Kawamura, Sagamihara; Kazuyoshi Negishi, Yokohama; Shunichi Torii, Musashino, all of Japan
[73] Assignee: Hitachi, Ltd., Tokyo, Japan
[21] Appl. No.: 09/148,648
[22] Filed: Sep. 4, 1998

Related U.S. Application Data
[63] Continuation of application No. 08/810,527, Mar. 4, 1997, Pat. No. 5,806,059, which is a continuation of application No. 08/180,674, Jan. 13, 1994, abandoned.

[30] Foreign Application Priority Data
Jan. 20, 1993 [JP] Japan 5-007804

[51] Int. Cl.7: G06F 17/30
[58] Field of Search (partly illegible): 3, 7, 10; 395/200.31; 709/200, 201, 205

[56] References Cited
U.S. PATENT DOCUMENTS
5,091,852   2/1992    Tsuchida et al.   707/2
(number illegible)    Du et al.
5,471,622   11/1995   Eadline           707/3
5,806,059   9/1998    Tsuchida et al.   707/2
5,845,113   12/1998   Swami et al.      707/7

OTHER PUBLICATIONS
DeWitt et al., "Parallel Database Systems: The Future of High Performance Database Systems," Communications of the ACM, v. 35, No. 6, 1992, pp. 85-98.
DeWitt et al., "GAMMA - A High Performance Dataflow Database Machine," Proceedings of the Twelfth International Conference on Very Large Databases, Kyoto, Aug. 1986, pp. 228-237.
Lu et al., "Dynamic and Load-Balanced Task-Oriented Database Query Processing in Parallel Systems," Dept. of Info. Sys., National Univ. of Singapore, Advances in Database Technology, EDBT 1992, pp. 357-372.
Parallel Computer Architecture, vol. 21, No. 4, Mar. 1989.

Primary Examiner: Hosain T. Alam
Attorney, Agent, or Firm: Beall Law Offices

[57] ABSTRACT
A database management system for executing database operations in parallel on a plurality of nodes, and a query processing method, are described. The database management system contains a decision management node for receiving a query from an application program, analyzing it, generating a processing procedure for processing the query, and deciding the nodes that execute the process; a distribution node for retrieving information; and a join node for sorting, merging, and joining the information retrieved by the distribution node. When the query process is executed, the distribution node decided by the decision management node retrieves the information to be processed, and the join node decided by the decision management node obtains the result for the query from the retrieved information. The query result is output from an output node and transferred to the application program.

31 Claims, 20 Drawing Sheets

Front-page drawing: column value frequency information related to joining column F, over the values v1 to v10, is used to divide the data. Nodes 1 to 10 (200-1 to 200-10) retrieve and distribute the data; nodes 11 to 15 (200-11 to 200-15) perform the joining process. Division of data: if the joining key has column value frequency information, value ranges are made so that the data are equally distributed to the nodes. The resulting data distribution information (nodes for the joining process):

Value range    v1-v2   v3-v4   v5-v6   v7-v8   v9-v10
Node number    11      12      13      14      15
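The front-page drawing states the division rule only in words: when the joining key has column value frequency information, value ranges are chosen so that each join node receives about the same amount of data. A minimal Python sketch of that idea follows, assuming the frequency information is available as a mapping from column value to row count and that a greedy cumulative split is acceptable; `make_value_ranges` and `node_for` are hypothetical names, not taken from the patent.

```python
import bisect
from typing import Dict, Hashable, List, Sequence, Tuple

Range = Tuple[int, int, Hashable]  # (low value, high value, join node)


def make_value_ranges(freq: Dict[int, int], join_nodes: Sequence[Hashable]) -> List[Range]:
    """Greedily cut the sorted column values into ranges of roughly equal row count."""
    values = sorted(freq)
    total = sum(freq.values())
    n = len(join_nodes)
    ranges: List[Range] = []
    start = None
    acc = 0  # cumulative row count seen so far
    k = 0    # index of the join node whose range is being built
    for v in values:
        if start is None:
            start = v
        acc += freq[v]
        # Close node k's range once the cumulative count reaches (k + 1) / n of all rows.
        if k < n - 1 and acc >= total * (k + 1) / n:
            ranges.append((start, v, join_nodes[k]))
            start = None
            k += 1
    if start is not None:  # the last open range goes to the current node
        ranges.append((start, values[-1], join_nodes[k]))
    return ranges


def node_for(value: int, ranges: List[Range]) -> Hashable:
    """Route a joining-column value to the node whose range covers it."""
    ends = [hi for _, hi, _ in ranges]
    i = bisect.bisect_left(ends, value)
    return ranges[min(i, len(ranges) - 1)][2]


# v1..v10 encoded as 1..10, each with the same frequency; join nodes 11 to 15.
ranges = make_value_ranges({v: 100 for v in range(1, 11)}, [11, 12, 13, 14, 15])
print(ranges)               # [(1, 2, 11), (3, 4, 12), (5, 6, 13), (7, 8, 14), (9, 10, 15)]
print(node_for(5, ranges))  # 13
```

With ten equally frequent values this reproduces the assignment in the front-page drawing (v1-v2 to node 11, through v9-v10 to node 15). In this sketch a single value whose frequency alone exceeds one node's share still lands on one node.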
U.S. Patent   Feb. 15, 2000   Sheet 1 of 20   6,026,394

FIG. 1 (system configuration): application programs 1 to N and the database management system, which comprises
  system controller 21;
  logical processor 22, containing query analysis 220, static optimization 221, code generation 222, dynamic optimization 223, and code interpreter 224;
  physical processor 23, containing the data accessing process, the exclusive control process, mapping control, and database buffer control.
Also shown: database buffer, operating system 30, database 40, and dictionary 50.

Sheet 2 of 20
FIG. 2: drawing illegible in the scan; the only legible label is "TO ANOTHER SYSTEM".

Sheet 4 of 20 (figure label illegible): processing phases of a join query.
  Retrieval phase: data retrieval, distribution of data, and slot sorting process.
  Merge phase: N-way merge process (180).
  Joining phase: join process (181), followed by requested data output.

Sheet 7 of 20
FIG. 7: processing time for each phase (shaded boxes) on data retrieval nodes 1 to 8 and on the nodes for the join process. Phases shown: data retrieval/distribution process (Er), slot sorting process, N-way merge process, join process, and requested data output; total processing time E. Other labels: number of nodes for join process N, number of receiving nodes, number of join nodes; reference numerals 300 to 305, 310, 330, 340.

Sheet 8 of 20 (figure label illegible): chart in the same format as FIG. 7, labelled with Er (data retrieval/distribution), Es (slot sorting), E_Max, Em (N-way merge), Ej (join), requested data output, and the number of assigned join nodes; reference numerals 300 to 312, 330, 340, 350.
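FIGS. 7 and 8 chart per-phase processing time against the number of nodes for the join process, and the dynamic optimization of FIG. 12(b) decides the number P of assigned join nodes from the processing time. One simple rule for that decision is sketched below under two loud assumptions: the retrieval/distribution phase lasts as long as its slowest node, and join-phase work (merge plus join) divides evenly across P join nodes. The cost model and the name `assign_join_nodes` are hypothetical, not the patent's formula.

```python
import math
from typing import Sequence


def assign_join_nodes(retrieval_times: Sequence[float], join_work: float, max_nodes: int) -> int:
    """Hypothetical rule: smallest P such that join_work / P <= retrieval-phase time."""
    # Assumption: the retrieval/distribution phase ends when its slowest node finishes.
    e_r = max(retrieval_times)
    if e_r <= 0:
        return 1
    # Assumption: join-phase work splits evenly over P join nodes.
    p = math.ceil(join_work / e_r)
    # Never assign fewer than one node or more than are available.
    return max(1, min(p, max_nodes))


# Three retrieval nodes, 12 time units of join work, up to 8 join nodes.
print(assign_join_nodes([4.0, 5.0, 3.0], 12.0, 8))  # 3
```

Choosing the smallest P that keeps the join nodes from becoming the bottleneck leaves the remaining nodes free; a real optimizer would also weigh transfer and output costs.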
Sheet 9 of 20 (figure label illegible): chart in the same format, labelled with Er, Es, E_Max, Ej (join), requested data output, and the number of assigned join nodes; reference numerals 300 to 312, 330, 340, 350, 351.

Sheet 10 of 20 (figure label illegible): chart in the same format, labelled with Er, Es, E_Max, Em (N-way merge, 320), join process, requested data output, and the number of assigned join nodes; reference numerals 300 to 310, 340, 350.

Sheet 11 of 20
FIG. 11(a) Process for query analysis: query analysis (220), static optimization process (221), code generation (222), end.
FIG. 11(b) Static optimization process: estimation of predicate selectivity, access path pruning, generation of processing procedure candidates, end.

Sheet 12 of 20: drawing illegible in the scan.

Sheet 13 of 20
FIG. 11(d) Process for access path pruning:
  22120 Decide the candidate column indices that appear in the condition expression.
  22121 Is the table to be accessed stored separately in a plurality of nodes? Depending on the answer, decide candidates for sequential scan (22122) or for parallel scan (22123).
  22124 Has the selectivity of each condition expression already been decided?
    YES: give the highest priority to the index of the condition expression that minimizes the selectivity.
    NO: obtain the maximum/minimum value of the selectivity of each condition expression.
    (steps 22125 and 22126)
  22127 Calculate and decide threshold values for selecting each access path based upon system characteristics such as CPU performance and I/O performance.
  22128 Register, as access path candidates, the single indices or combinations of plural indices that give a selectivity less than the above threshold value.

Sheet 14 of 20
FIG. 11(e) Generation of processing procedure candidates (2213):
  22130 Is the table to be accessed stored separately in a plurality of nodes?
  22131 Is a sorting process contained in the processing procedure candidate?
  22132 Is there only one access path for the table to be accessed?
  22133, 22136 Generate a single processing procedure, or a plurality of processing procedures, for data read/distribution and slot sorting.
  22134 Register the processing procedure candidates in correspondence with each table-storing node.
  22135 Decompose the query into joinable two-way joins.
  22137 Register the slot sorting process procedure, the N-way merge processing procedure, and the join processing procedure as candidates in correspondence with each join processing node, and parameterize the number of times of slot sorting run-length merging.
  22138 Register the requested data output processing procedure to the requested data output node.
  End.

Sheet 15 of 20
FIG. 11(f) Code generation (222):
  2220 Is there only one processing procedure?
  2221 Embed the column value frequency information into the processing procedure.
  2222 Generate the data structure for selecting processing procedures based upon constants substituted at execution time.
  2223 Expand the processing procedures into executable code.
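Step 22128 of FIG. 11(d) registers single indices or combinations of indices whose selectivity falls below a threshold. The sketch below assumes the predicates are independent, so that a combination's selectivity is the product of its members' selectivities, and it takes the threshold as an input instead of deriving it from CPU and I/O characteristics as step 22127 does. `access_path_candidates` and the index names are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import Dict, List, Tuple


def access_path_candidates(
    selectivity: Dict[str, float], threshold: float, max_combo: int = 2
) -> List[Tuple[Tuple[str, ...], float]]:
    """Register single indices or index combinations whose selectivity is below threshold."""
    candidates = []
    names = sorted(selectivity)
    for size in range(1, max_combo + 1):
        for combo in combinations(names, size):
            # Assumption: predicates are independent, so selectivities multiply.
            s = prod(selectivity[name] for name in combo)
            if s < threshold:
                candidates.append((combo, s))
    # Most selective candidate first, loosely following "highest priority to the
    # index ... which minimizes the selectivity".
    candidates.sort(key=lambda c: c[1])
    return candidates


cands = access_path_candidates({"idx_a": 0.01, "idx_b": 0.5, "idx_c": 0.2}, threshold=0.05)
print([combo for combo, _ in cands])
# [('idx_a', 'idx_c'), ('idx_a', 'idx_b'), ('idx_a',)]
```

Capping the combination size with `max_combo` keeps the enumeration small; the number of index combinations otherwise grows exponentially with the number of indexed predicates.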
Sheet 16 of 20
FIG. 12(a) Process for query execution: dynamic optimization process (223), then execution of code (224).

Sheet 17 of 20
FIG. 12(b) Dynamic optimization process (223):
  Is there only one processing procedure?
  Calculate the selectivity based upon the substituted constant.
  Are parallel processing procedure candidates contained?
  22303 Input the column value frequency information from the dictionary.
  22304 Calculate the processing time for the data retrieval/distribution process.
  22305 Decide the number P of assigned join nodes from the processing time, and decide its processing procedure A1.
  22306 Is there scattering in the data retrieval/distribution processing time?
  22307 If YES, decide the processing procedure A2, which executes the slot sorting process on the data retrieval/distribution nodes.
  22308 Decide the processing procedure A3, for which the number P of assigned join nodes is increased by α.
  22309 Is the requested data output processing time greater than the joining processing time plus the processing time of the last round of the N-way merge process?
  22310 If YES, transfer the last round of the N-way merge process to the joining process, and decide the processing procedure A4.
  22311 Select the best-suited processing procedure among A1 to A4.
  22312 Generate the data distribution information.
  22313 Select the processing procedure by use of a threshold for access path selection.

Sheet 18 of 20
FIG. 12(c) Process for data retrieval/distribution:
  22401 Access the database and evaluate the condition expression.
  22402 Distribute the data to the buffer of each node based upon the data distribution information.
  22403 Is there a fully occupied buffer?
  Is a slot sorting process necessary? If YES, 22405 execute the slot sorting process.
  22406 Transfer the data to the corresponding node.
  22407 Have all data been accessed?
  22408 Transfer the remaining data.
  End.
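The retrieval/distribution loop of FIG. 12(c) fills one buffer per destination node and ships a buffer whenever it becomes full, optionally slot sorting it first. A minimal sketch, assuming rows arrive as an iterable, that slot sorting amounts to sorting a buffer on the joining key before it is sent, and that `send` stands in for the network transfer; every name here is hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def retrieve_and_distribute(
    rows: Iterable,
    predicate: Callable[[object], bool],
    key: Callable,
    node_for: Callable[[object], Hashable],
    buffer_size: int,
    send: Callable[[Hashable, List], None],
    slot_sort: bool = True,
) -> None:
    buffers: Dict[Hashable, List] = {}
    for row in rows:
        # 22401: access the database and evaluate the condition expression.
        if not predicate(row):
            continue
        # 22402: put the row into the buffer of its target node.
        node = node_for(key(row))
        buf = buffers.setdefault(node, [])
        buf.append(row)
        # 22403: when a buffer is full, optionally slot-sort it (22405), then ship it (22406).
        if len(buf) >= buffer_size:
            if slot_sort:
                buf.sort(key=key)
            send(node, buf)
            buffers[node] = []
    # 22408: once all data have been accessed, transfer whatever remains buffered.
    for node, buf in buffers.items():
        if buf:
            if slot_sort:
                buf.sort(key=key)
            send(node, buf)


sent = []
retrieve_and_distribute(
    rows=[9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
    predicate=lambda r: r % 2 == 0,
    key=lambda r: r,
    node_for=lambda k: 0 if k < 5 else 1,
    buffer_size=2,
    send=lambda node, batch: sent.append((node, batch)),
)
print(sent)  # [(1, [6, 8]), (0, [2, 4]), (0, [0])]
```

Sorting on the sending side spreads the sort work over the retrieval nodes, which is the trade-off the A2 procedure of FIG. 12(b) appears to weigh.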
Sheet 19 of 20
FIG. 12(d) Processing on the receiving (join) nodes (steps 22411 to 22424):
  Receive data from other nodes.
  Is the data already slot sorted? If NO, execute the slot sorting process.
  Buffer the results of the slot sorting process.
  Have data been received from all other nodes?
  Is it an N-way merge process? If YES, execute the N-way merge process and buffer its results.
  Is it a joining process? If YES, join the sort lists, transfer the data to a buffer, and transfer the buffered data to the output node.
  Is every joining process finished?
  Transfer the remaining data to the output node.
  End.
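In FIG. 12(d) a receiving node slot-sorts any input that arrived unsorted, N-way merges the sorted runs once data from all other nodes is in, and then joins the resulting sort lists. The sketch below assumes rows are tuples keyed by the joining column, treats slot sorting as sorting each received run, and uses a sort-merge equijoin for "join the sort lists"; `merge_runs` and `merge_join` are hypothetical names.

```python
import heapq
from operator import itemgetter
from typing import Any, Callable, List, Sequence, Tuple


def merge_runs(runs: Sequence[List[Any]], key: Callable[[Any], Any]) -> List[Any]:
    """Slot-sort any run that arrived unsorted, then N-way merge all runs."""
    sorted_runs = []
    for run in runs:
        already_sorted = all(key(run[i]) <= key(run[i + 1]) for i in range(len(run) - 1))
        sorted_runs.append(run if already_sorted else sorted(run, key=key))
    # heapq.merge performs a k-way merge of already-sorted inputs.
    return list(heapq.merge(*sorted_runs, key=key))


def merge_join(left: List[Any], right: List[Any], key: Callable[[Any], Any]) -> List[Tuple[Any, Any]]:
    """Equijoin two lists that are already sorted on key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(left) and key(left[i_end]) == kl:
                i_end += 1
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            out.extend((a, b) for a in left[i:i_end] for b in right[j:j_end])
            i, j = i_end, j_end
    return out


key = itemgetter(0)
left = merge_runs([[(1, "a"), (3, "c")], [(2, "b"), (3, "c2")]], key)
right = merge_runs([[(3, "x"), (1, "y")]], key)  # this run arrives unsorted
print(merge_join(left, right, key))
# [((1, 'a'), (1, 'y')), ((3, 'c'), (3, 'x')), ((3, 'c2'), (3, 'x'))]
```

Because both inputs are already ordered on the joining key, the join needs one linear pass plus the cross product of duplicate keys, which is why sorting and merging precede it.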