US006026394A

United States Patent [19]
Tsuchida et al.
[11] Patent Number: 6,026,394
[45] Date of Patent: Feb. 15, 2000

[54] SYSTEM AND METHOD FOR IMPLEMENTING PARALLEL OPERATIONS IN A DATABASE MANAGEMENT SYSTEM

[75] Inventors: Masashi Tsuchida, Sagamihara; Yukio Nakano, Yokohama; Nobuo Kawamura, Sagamihara; Kazuyoshi Negishi, Yokohama; Shunichi Torii, Musashino, all of Japan

[73] Assignee: Hitachi, Ltd., Tokyo, Japan

[21] Appl. No.: 09/148,648

[22] Filed: Sep. 4, 1998

Related U.S. Application Data

[63] Continuation of application No. 08/810,527, Mar. 4, 1997, Pat. No. 5,806,059, which is a continuation of application No. 08/180,674, Jan. 13, 1994, abandoned.

[30] Foreign Application Priority Data

Jan. 20, 1993 [JP] Japan .................................. 5-007804

[51] Int. Cl.7 .......................... G06F 17/30
[52] U.S. Cl. ........................... 707/…
[58] Field of Search .................... …, 3, 7, 10; 395/200.31; 709/200, 201, 205

[56] References Cited

U.S. PATENT DOCUMENTS

2,091,252   2/1992  Tsuchida et al. .................. 707/2
                    Du et al.
5,471,622  11/1995  Eadline .......................... 707/3
5,806,059   9/1998  Tsuchida et al. .................. 707/2
5,845,113  12/1998  Swami et al. ..................... 707/7

OTHER PUBLICATIONS

DeWitt et al., "Parallel Database Systems: The Future of High Performance Database Systems," Communications of the ACM, v. 35, No. 6, 1992, pp. 85-98.
DeWitt et al., "GAMMA - A High Performance Dataflow Database Machine," Proceedings of the Twelfth International Conference on Very Large Databases, Kyoto, Aug. 1986, pp. 228-237.
Lu et al., "Dynamic and Load-Balanced Task-Oriented Database Query Processing in Parallel Systems," Dept. of Info. Sys., National Univ. of Singapore, Advances in Database Technology, EDBT 1992, pp. 357-372.
Parallel Computer Architecture, vol. 21, No. 4, Mar. 1989.

Primary Examiner: Hosain T. Alam
Attorney, Agent, or Firm: Beall Law Offices

[57] ABSTRACT

A database management system for executing database operations in parallel by a plurality of nodes, and a query processing method, are described. To analyze a query received from an application program, generate a processing procedure for the query, and execute that procedure, the database management system contains a decision management node that decides a distribution node for retrieving information and a join node for sorting, merging, and joining the information retrieved by the distribution node. When the query process is executed, the distribution node decided by the decision management node retrieves the information to be processed, and the join node decided by the decision management node obtains the result for the query from the retrieved information. The query result is output from an output node and transferred to the application program.

31 Claims, 20 Drawing Sheets
[Representative drawing] Column value frequency information related to the joining column, for values v1 to v10; nodes NODE1 to NODE10 (200-1 to 200-10) that retrieve and distribute the data; nodes for the joining process NODE11 to NODE15 (200-11 to 200-15). Division of data: if the joining key has column value frequency information, value ranges are made so that the data are equally distributed to the nodes.

Data distribution information (nodes for joining process):

Value range    Node number
v1-v2          11
v3-v4          12
v5-v6          13
v7-v8          14
v9-v10         15
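The division rule in the representative drawing (value ranges chosen from the column value frequency information so that each join node receives about the same amount of data) can be sketched in Python. This is a minimal sketch, assuming the frequency information is a list of (value, row count) pairs sorted by value; the greedy cut and the names `make_value_ranges` and `route` are hypothetical, not taken from the patent.

```python
from bisect import bisect_left
from typing import List, Tuple

def make_value_ranges(freqs: List[Tuple[int, int]], num_nodes: int) -> List[Tuple[int, int]]:
    """Cut (value, count) pairs, sorted by value, into at most num_nodes
    contiguous value ranges whose row counts are as even as a greedy cut allows."""
    total = sum(count for _, count in freqs)
    target = total / num_nodes  # ideal number of rows per join node
    ranges: List[Tuple[int, int]] = []
    start = None
    acc = 0  # cumulative row count so far
    for value, count in freqs:
        if start is None:
            start = value
        acc += count
        # Close the range once the cumulative count reaches the next multiple
        # of the target; the last node stays open so every value is covered.
        if acc >= target * (len(ranges) + 1) and len(ranges) < num_nodes - 1:
            ranges.append((start, value))
            start = None
    if start is not None:
        ranges.append((start, freqs[-1][0]))
    return ranges

def route(ranges: List[Tuple[int, int]], node_ids: List[int], key: int) -> int:
    """Return the join node for a joining-key value. Keys below the first range
    go to the first node, keys in a gap go to the next range, keys beyond the
    last range go to the last node."""
    uppers = [hi for _, hi in ranges]
    i = bisect_left(uppers, key)  # first range whose upper bound >= key
    return node_ids[min(i, len(node_ids) - 1)]

# Ten values v1..v10 (here 1..10) with equal counts, five join nodes 11..15.
freqs = [(v, 10) for v in range(1, 11)]
ranges = make_value_ranges(freqs, 5)
print(ranges)                                  # [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
print(route(ranges, [11, 12, 13, 14, 15], 7))  # 14
```

A range cut cannot split a single very frequent value, so if one joining-key value alone exceeds a node's share, that node still receives more rows than the others.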
U.S. Patent    Feb. 15, 2000    Sheet 1 of 20    6,026,394

FIG. 1: system configuration. Application programs 1 to N issue queries to the database management system. The database management system contains a system controller (21); a logical processor (22) comprising query analysis (220), static optimization (221), code generation (222), dynamic optimization (223), and a code interpreter (224); and a physical processor (23) comprising exclusive control, a data accessing process, a mapping process, and database buffer control over the database buffer. The system runs on the operating system (30) and accesses the database (40) and the dictionary (50).
FIG. 2 (Sheet 2 of 20): drawing whose only legible label is "TO ANOTHER SYSTEM".
Sheet 4 of 20 (figure number illegible): processing phases. Retrieval phase: data retrieval and slot sorting process, then distribution of data; merge phase: N-way merge process; joining phase: join process; finally, requested data output.
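The retrieval and merge phases above can be illustrated with a short sketch. It assumes that a "slot sort" sorts each buffer-sized batch of retrieved rows into a sorted run and that the N-way merge combines those runs in a single heap-based pass; the row layout and function names are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # (joining key, rest of the tuple); hypothetical layout

def slot_sort(rows: Iterable[Row], slot_size: int) -> List[List[Row]]:
    """Cut a row stream into buffer-sized slots and sort each slot by key.
    Each sorted slot becomes one run for the later N-way merge."""
    runs: List[List[Row]] = []
    slot: List[Row] = []
    for row in rows:
        slot.append(row)
        if len(slot) == slot_size:
            runs.append(sorted(slot, key=lambda r: r[0]))
            slot = []
    if slot:  # last, partially filled slot
        runs.append(sorted(slot, key=lambda r: r[0]))
    return runs

def n_way_merge(runs: List[List[Row]]) -> List[Row]:
    """Merge any number of sorted runs in one pass using a heap."""
    return list(heapq.merge(*runs, key=lambda r: r[0]))

rows = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]
runs = slot_sort(rows, slot_size=2)
print(runs)               # [[(1, 'a'), (5, 'e')], [(2, 'b'), (4, 'd')], [(3, 'c')]]
print(n_way_merge(runs))  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
```

Merging N runs in one heap-based pass costs O(total rows × log N), which is why producing runs early, on the nodes that retrieve the data, shortens the later merge phase.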
FIG. 7 (Sheet 7 of 20): timing chart of the processing time for each phase: data retrieval/distribution process (Er) and slot sorting process on data retrieval nodes 1 to 8; N-way merge process and join process on the N nodes for the join process; requested data output. The chart marks the number of join nodes and the number of receiving nodes against processing time E.
Sheet 8 of 20: timing chart of the same phases, labelled with the data retrieval/distribution time Er, slot sorting time Es, maximum time E_Max, N-way merge time Em, and join time EJ, and marking the number of assigned join nodes, the number N of nodes for the join process, and the number of receiving nodes.
Sheet 9 of 20: timing chart variant labelled with Er, Es, and E_Max for the data retrieval/distribution and slot sorting processes on the data retrieval nodes, followed by the join process and requested data output, and marking the number of assigned join nodes and the number of receiving nodes.
Sheet 10 of 20: timing chart variant labelled with Er, Es, E_Max, and the N-way merge time Em, followed by the join process and requested data output, and marking the number of assigned join nodes and the number of receiving nodes.
FIG. 11(a) (Sheet 11 of 20): process for query analysis: query analysis (220), static optimization process (221), code generation (222), end.
FIG. 11(b): static optimization process: a selectivity estimation step (label partly illegible), access path pruning, and generation of processing procedure candidates, end.
FIG. 11(d) (Sheet 13 of 20): process for access path pruning.
22120: Decide candidates among the indices of the columns appearing in the condition expression.
22121: Is the table to be accessed separately stored in a plurality of nodes? If so, decide candidates for parallel scan (22123); otherwise, decide candidates for sequential scan (22122).
22124: Is the selectivity of each condition expression already decided? If yes, give the highest priority to the index of the condition expression that minimizes the selectivity (22125). If no, obtain the maximum/minimum value of the selectivity of each condition expression (22126); calculate and decide threshold values for selecting each access path based upon system characteristics such as CPU performance and I/O performance (22127); and register, as access path candidates, single indices or combinations of plural indices giving a selectivity less than the above threshold value (22128).
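The pruning rule of FIG. 11(d) can be sketched as follows. The patent only says that the threshold is derived from system characteristics such as CPU and I/O performance; the break-even formula below, which uses just two hypothetical I/O cost parameters, is an assumed stand-in, as are the independence assumption for index combinations and all function names.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def selectivity_threshold(rows: int, pages: int,
                          seq_page_cost: float, random_fetch_cost: float) -> float:
    """Break-even selectivity under a toy cost model (hypothetical): a scan
    reads every page sequentially, an index access pays one random fetch per
    qualifying row. Below this selectivity the index access is cheaper."""
    return (pages * seq_page_cost) / (rows * random_fetch_cost)

def best_index(known: Dict[str, float]) -> str:
    """Selectivities already decided: the index of the condition expression
    with the smallest selectivity gets the highest priority."""
    return min(known, key=known.__getitem__)

def prune_access_paths(sel_ranges: Dict[str, Tuple[float, float]],
                       threshold: float) -> List[Tuple[str, ...]]:
    """Selectivities not yet decided (e.g. a constant arrives at run time):
    keep single indices, and pairs of indices, whose minimum selectivity is
    below the threshold. Pair selectivity assumes independent predicates."""
    candidates: List[Tuple[str, ...]] = []
    for name, (lo, _hi) in sel_ranges.items():
        if lo < threshold:
            candidates.append((name,))
    for (a, (lo_a, _)), (b, (lo_b, _)) in combinations(sel_ranges.items(), 2):
        if lo_a * lo_b < threshold:
            candidates.append((a, b))
    return candidates

threshold = selectivity_threshold(rows=1_000_000, pages=10_000,
                                  seq_page_cost=1.0, random_fetch_cost=4.0)
print(threshold)                                    # 0.0025
print(best_index({"idx_a": 0.01, "idx_b": 0.002}))  # idx_b
ranges = {"idx_a": (0.001, 0.2), "idx_b": (0.05, 0.5), "idx_c": (0.04, 0.3)}
print(prune_access_paths(ranges, threshold))
# [('idx_a',), ('idx_a', 'idx_b'), ('idx_a', 'idx_c'), ('idx_b', 'idx_c')]
```

Using the minimum selectivity keeps any index that could win for some run-time constant; the final choice is deferred until the constant is known.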
FIG. 11(e) (Sheet 14 of 20): generation of processing procedure candidates (2213).
22130: Is the table to be accessed separately stored in a plurality of nodes?
22131: Is a sorting process contained in the processing procedure candidate?
22132: Is there only one access path for the table to be accessed? If so, generate a single processing procedure; otherwise, generate a plurality of processing procedures.
22135: Decompose the query into joinable two-way joins.
Register processing procedure candidates for data read/distribution and slot sorting, in correspondence with each table storing node.
22137: Register the slot sorting processing procedure, the N-way merge processing procedure, and the join processing procedure as candidates, in correspondence with each join processing node, and parameterize the number of times of slot sorting run-length merging.
22138: Register the requested data output processing procedure to the requested data output node.
End of request? If so, end.
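The decomposition into joinable two-way joins in FIG. 11(e) can be illustrated with a greedy sketch. It assumes join predicates are given as unordered table pairs and makes no attempt at cost-based join ordering; the function name and table names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

def decompose_two_way(tables: List[str],
                      predicates: Set[FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Turn a multi-way join into a chain of joinable two-way joins.
    Starting from the first table, repeatedly pick a remaining table that has
    a join predicate with some already-joined table. Each step is returned as
    (already-joined table, newly joined table)."""
    joined = [tables[0]]
    remaining = list(tables[1:])
    steps: List[Tuple[str, str]] = []
    while remaining:
        for t in remaining:
            partner = next((j for j in joined if frozenset((j, t)) in predicates), None)
            if partner is not None:
                steps.append((partner, t))
                joined.append(t)
                remaining.remove(t)
                break  # restart the scan over the shortened list
        else:
            # No remaining table is joinable: this would need a cross product.
            raise ValueError(f"no join predicate reaches {remaining}")
    return steps

preds = {frozenset(("A", "C")), frozenset(("C", "B"))}
print(decompose_two_way(["A", "B", "C"], preds))  # [('A', 'C'), ('C', 'B')]
```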
FIG. 11(f) (Sheet 15 of 20): code generation (222).
2220: Is there only one processing procedure?
2221: Embed the column value frequency information into the processing procedure.
2222: Generate the data structure for selecting processing procedures based upon constants substituted at execution time.
2223: Expand the processing procedures into executable code.
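Steps 2221 and 2222 of FIG. 11(f) (embedding the column value frequency information and generating a structure that picks a processing procedure once constants are substituted at execution time) might look like the following. This is a sketch under stated assumptions: the class, the two supported operators, the threshold value, and the procedure names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class ProcedureSelector:
    """Hypothetical structure emitted at code generation time. The column value
    frequency information is embedded in it, so the selectivity of a constant
    substituted at execution time can be estimated without a dictionary read."""
    freq: Dict[int, int]  # column value -> number of rows with that value
    threshold: float      # below this selectivity, the index procedure is used

    def selectivity(self, op: str, const: int) -> float:
        total = sum(self.freq.values())
        if op == "=":
            hits = self.freq.get(const, 0)
        elif op == "<":
            hits = sum(n for v, n in self.freq.items() if v < const)
        else:
            raise ValueError(f"unsupported operator {op!r}")
        return hits / total

    def choose(self, op: str, const: int) -> str:
        # Procedure names are placeholders for the pre-generated alternatives.
        if self.selectivity(op, const) < self.threshold:
            return "index_scan"
        return "parallel_scan"

sel = ProcedureSelector(freq={1: 900, 2: 50, 3: 50}, threshold=0.1)
print(sel.choose("=", 2))  # index_scan    (50 / 1000 = 0.05)
print(sel.choose("=", 1))  # parallel_scan (900 / 1000 = 0.9)
```

The same query text can therefore run with different procedures for different bound constants, which matters when the column values are skewed.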
FIG. 12(a) (Sheet 16 of 20): process for query execution: dynamic optimization process (223), then execution of the code (224).
FIG. 12(b) (Sheet 17 of 20): dynamic optimization process.
Is there only one processing procedure? If not, calculate the selectivity based upon the substituted constant, then test the parallel processing procedure candidates (wording partly illegible).
22303: Input the column value frequency information from the dictionary.
22304: Calculate the processing time for the data retrieval/distribution process.
22305: Decide the number P of assigned join nodes from the processing time, and decide its processing procedure A1.
22306: Is there scattering of the data retrieval/distribution processing time? If yes, decide the processing procedure A2, which executes the slot sorting process on the data retrieval/distribution nodes (22307).
22308: Decide the processing procedure A3 for the number P of assigned join nodes, P being increased by α.
22309: Is the requested data output processing time greater than the joining processing time plus the processing time of the last round of the N-way merge process? If yes, transfer the last round of the N-way merge process to the joining process, and decide the processing procedure A4 (22310).
Select the best suited processing procedure among A1 to A4.
Generate the data distribution information.
22313: Select the processing procedure by use of a threshold for access path selection.
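The branching in FIG. 12(b) that produces the candidate procedures A1 to A4 and keeps the best one can be sketched as below. The patent's own time formulas are not in this transcript, so the cost function `toy_estimate`, the scatter tolerance, and all numbers are hypothetical; the rationale given for A2 (faster retrieval nodes have idle time for slot sorting) is an interpretation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Candidate:
    name: str                             # "A1" .. "A4"
    join_nodes: int                       # number P of assigned join nodes
    slot_sort_at_retrieval: bool = False  # A2
    last_merge_in_join: bool = False      # A4

def build_candidates(p: int, alpha: int, retrieval_times: List[float],
                     scatter_tol: float, output_time: float,
                     join_time: float, last_merge_time: float) -> List[Candidate]:
    """Follow the branches of the dynamic optimization flowchart."""
    cands = [Candidate("A1", p)]
    # Scattered retrieval/distribution times: the faster nodes (presumably)
    # have idle time, so the retrieval/distribution nodes run the slot sort (A2).
    if max(retrieval_times) - min(retrieval_times) > scatter_tol:
        cands.append(Candidate("A2", p, slot_sort_at_retrieval=True))
    # Same procedure with P increased by alpha (A3).
    cands.append(Candidate("A3", p + alpha))
    # Output slower than join + last merge round: fold that round into the join (A4).
    if output_time > join_time + last_merge_time:
        cands.append(Candidate("A4", p, last_merge_in_join=True))
    return cands

def select_best(cands: List[Candidate], estimate: Callable[[Candidate], float]) -> Candidate:
    """Pick the best suited procedure: the smallest estimated elapsed time."""
    return min(cands, key=estimate)

def toy_estimate(c: Candidate) -> float:
    # Hypothetical cost: 120 s of join-side work split over P nodes; slot sorting
    # at the retrieval nodes saves 20 %; folding the last merge round saves 5 s.
    t = 120.0 / c.join_nodes
    if c.slot_sort_at_retrieval:
        t *= 0.8
    if c.last_merge_in_join:
        t -= 5.0
    return t

cands = build_candidates(p=4, alpha=2, retrieval_times=[10.0, 14.0, 30.0],
                         scatter_tol=5.0, output_time=40.0,
                         join_time=20.0, last_merge_time=10.0)
print([c.name for c in cands])                # ['A1', 'A2', 'A3', 'A4']
print(select_best(cands, toy_estimate).name)  # A3
```

Keeping the cost function separate from the branching makes it easy to swap in a more realistic model without touching the candidate generation.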
FIG. 12(c) (Sheet 18 of 20): process for data retrieval/distribution.
22401: Access the database and evaluate the condition expression.
22402: Distribute the data to the buffer of each node based upon the data distribution information.
22403: Is there a fully occupied buffer? If yes, check whether a slot sorting process is necessary; if so, execute the slot sorting process (22405). Then transfer the data to the corresponding node (22406).
22407: Are all data accessed? If not, continue retrieving; if so, transfer the remaining data (22408).
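The loop of FIG. 12(c) maps naturally onto a buffer-per-destination partitioner. A minimal sketch, assuming rows are (key, payload) tuples and that `node_for_key` stands in for the data distribution information lookup; all names and the callback-based `send` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (joining key, rest of the tuple); hypothetical layout

def retrieve_and_distribute(rows: Iterable[Row],
                            condition: Callable[[Row], bool],
                            node_for_key: Callable[[int], int],
                            buffer_size: int,
                            slot_sort_needed: bool,
                            send: Callable[[int, List[Row]], None]) -> None:
    """One data retrieval/distribution node: filter rows, buffer them per
    destination node, and ship each buffer as soon as it is full."""
    buffers: Dict[int, List[Row]] = {}

    def ship(node: int, buf: List[Row]) -> None:
        if slot_sort_needed:
            buf.sort(key=lambda r: r[0])  # slot sort before transfer
        send(node, buf)

    for row in rows:
        if not condition(row):  # evaluate the condition expression
            continue
        node = node_for_key(row[0])  # data distribution information lookup
        buf = buffers.setdefault(node, [])
        buf.append(row)
        if len(buf) == buffer_size:  # fully occupied buffer
            ship(node, buf)
            buffers[node] = []
    for node, buf in buffers.items():  # all data accessed: ship the rest
        if buf:
            ship(node, buf)

sent = []
rows = [(k, f"r{k}") for k in [7, 4, 9, 2, 1, 8, 3, 6]]
retrieve_and_distribute(rows,
                        condition=lambda r: r[0] != 9,
                        node_for_key=lambda k: 11 if k <= 5 else 12,
                        buffer_size=2,
                        slot_sort_needed=True,
                        send=lambda node, buf: sent.append((node, [r[0] for r in buf])))
print(sent)  # [(11, [2, 4]), (12, [7, 8]), (11, [1, 3]), (12, [6])]
```

Shipping a buffer as soon as it fills keeps memory per destination bounded at `buffer_size` rows and lets join nodes start receiving data before retrieval finishes.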
FIG. 12(d) (Sheet 19 of 20): processing of data received from other nodes.
22411: Receive data from other nodes.
22412: Is it already slot sorted? If not, execute the slot sorting process (22413).
22414: Buffer the results of the slot sorting process.
Once data has been received from all other nodes, check whether an N-way merge process is required; if so, execute the N-way merge process and buffer its results. Check whether a joining process is required; if so, join the sort lists and transfer the data to a buffer, then transfer the buffered data to the output node. When every joining process is finished, transfer the remaining data to the output node; end.
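On the receiving node (FIG. 12(d)), batches are slot-sorted if the sender has not already done so, merged, and joined. The sketch below assumes a two-table equi-join and a classic sort-merge join; the table names "R" and "S" and all function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (joining key, rest of the tuple); hypothetical layout

def as_run(batch: List[Row]) -> List[Row]:
    """Slot sort a received batch unless the sending node already did."""
    keys = [r[0] for r in batch]
    return batch if keys == sorted(keys) else sorted(batch, key=lambda r: r[0])

def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Equi-join two key-sorted lists, emitting every pair for duplicate keys."""
    out: List[Tuple[int, str, str]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the group of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out

def join_node(received: Dict[str, List[List[Row]]]) -> List[Tuple[int, str, str]]:
    """received["R"] / received["S"] (hypothetical table names) hold the batches
    that arrived from all retrieval/distribution nodes for each join input."""
    merged = {t: list(heapq.merge(*(as_run(b) for b in batches), key=lambda r: r[0]))
              for t, batches in received.items()}
    return merge_join(merged["R"], merged["S"])

received = {
    "R": [[(1, "r1"), (3, "r3")], [(3, "r3b"), (2, "r2")]],  # second batch unsorted
    "S": [[(3, "s3")], [(1, "s1"), (4, "s4")]],
}
print(join_node(received))  # [(1, 'r1', 's1'), (3, 'r3', 's3'), (3, 'r3b', 's3')]
```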