Distributed Database System
Part 2: Integrating Heterogeneous Data Sources
May 10, 2001
임효상
Department of Computer Science, KAIST (Korea Advanced Institute of Science and Technology)
Database and Multimedia Laboratory
Contents

Multi-DBMS
TP Monitor solution
Wrapper/Mediator solution

Data Replication


2017-05-22
Distributed Database System
2
Multi-DBMS
Introduction
Distributed DBMS
  Top-down approach
  Distributes the target data sources across sites to reduce overall operating cost and manage data efficiently
Multi-DBMS (Federated DBMS)
  Bottom-up approach
  Binds multiple existing heterogeneous data sources under a single schema, so that information can be gathered easily by issuing queries through a single interface




A database system that resides on top of existing local database systems (LDBSs)
Provides a uniform environment in which the user can access data from heterogeneous LDBSs
Maintains a single global database schema that integrates the schemas of the LDBSs
When a global query is issued,
  Decomposes and translates the global query into queries for processing by the LDBSs
  Merges their results and generates the final result
Supports distributed transaction management over the LDBSs
Motivation
Distributed computing environment
Many different DBMSs have been independently developed and used
Different data schemas
Relevant data may be stored in distributed and heterogeneous databases
Access to heterogeneous data sources managed by different DBMSs can be required

A few different ways
Converting and migrating all data from one DBMS to another
  The applications in one DBMS may need to be converted to run in another
Using gateways for specific pairs of DBMSs
  The gateway approach does not support transaction management
History
Mid-1980s
  Research prototypes began to appear
Early 1990s
  General-purpose Multi-DBMSs began to appear (UniSQL/M)
  Several special-purpose Multi-DBMSs appeared
Present
  No notable general-purpose Multi-DBMS exists other than UniSQL/M
  Vendors still do not follow standards and develop products according to their own models
  The need has been reduced by the emergence of other concepts such as middleware (TP monitor, ODBC, etc.) and server-to-server communication
Global Schema
Schema used in Multi-DBMS
  Integration of the schemas exported from the LDBSs
Schema conflicts
  Arise within the process of integrating different local database schemas
Resolution techniques for schema conflicts
  Renaming entities and attributes
  Homogenizing representations for different expressions, units, or levels of precision
  Homogenizing attributes
    Type coercion, defining default values, concatenating attributes, and projecting parts of a composition hierarchy
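The resolution techniques listed on this slide can be illustrated with a small sketch. It is not how any particular Multi-DBMS implements schema integration; the two local row layouts, the attribute names, and the unit conversion are all made up for the example.

```python
# Hypothetical local rows: two LDBSs describe the same kind of entity differently.
LOCAL_A = [{"cname": "Acme", "sales_kusd": 1200}]            # sales in thousands of USD
LOCAL_B = [{"comp_name": "Globex", "sales_usd": "3400000"}]  # sales in USD, stored as text

# Renaming: map each local attribute name to the global attribute name.
RENAME_A = {"cname": "name", "sales_kusd": "total_sale"}
RENAME_B = {"comp_name": "name", "sales_usd": "total_sale"}


def to_global(row, rename, convert, defaults):
    """Apply renaming, unit/type homogenization, and default values to one row."""
    out = dict(defaults)                          # defining default values
    for local_name, value in row.items():
        global_name = rename[local_name]          # renaming attributes
        out[global_name] = convert.get(global_name, lambda v: v)(value)
    return out


def integrate():
    """Return all local rows expressed in the (assumed) global schema."""
    defaults = {"country": None}
    rows = [to_global(r, RENAME_A, {"total_sale": lambda v: v * 1000}, defaults)
            for r in LOCAL_A]                     # homogenize units: kUSD -> USD
    rows += [to_global(r, RENAME_B, {"total_sale": int}, defaults)
             for r in LOCAL_B]                    # type coercion: text -> integer
    return rows
```

Calling `integrate()` yields rows that all carry the same attribute names, the same unit, and the same type, which is the precondition for defining a single global schema over them.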
Global Query Processing
Definition
  Accessing data that resides on multivendor database servers
Query Processing
  A Multi-DBMS supports a single common data model and a single global query language on top of different types of existing systems

Global Query Processing
Global query decomposition
  Normalization, analysis, simplification
  Relational calculus to relational algebra transformation
  Data localization
  Query modification
Local query translation
  Each subquery is translated into a query or queries of the corresponding local database system
  The localized queries are sent to the local database systems
Results assembly
  The local results returned by the subqueries are combined into the global result
[Figure: Global query processing. In the Multi-DBMS layer, a global query passes through query decomposition and global optimization, producing sub-queries SQ1, SQ2, SQ3, …, SQn; query translators 1 to n turn them into local queries LQ1, LQ2, LQ3, …, LQn, which run against DB1, DB2, DB3, …, DBn in the local DBMS layer. SQ: Sub-Query, LQ: Local Query.]
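The decomposition, translation, and assembly steps can be sketched end to end. This is a toy version under stated assumptions: two in-memory SQLite databases stand in for the heterogeneous local DBMSs, the table contents are invented, and the "translator" is only a column-name mapping. Optimization is left out entirely.

```python
import sqlite3


def make_db(ddl, insert, rows):
    """Create an in-memory SQLite database standing in for one local DBMS."""
    db = sqlite3.connect(":memory:")
    db.execute(ddl)
    db.executemany(insert, rows)
    return db


# Hypothetical local databases with different schemas.
DB1 = make_db("CREATE TABLE DomesticCompany (cname TEXT, grade INTEGER)",
              "INSERT INTO DomesticCompany VALUES (?, ?)",
              [("Hanbit", 1), ("Daehan", 3)])
DB2 = make_db("CREATE TABLE ForeignCompany (comp_name TEXT, grade INTEGER)",
              "INSERT INTO ForeignCompany VALUES (?, ?)",
              [("Acme", 1), ("Globex", 2)])

# Per-source mapping from global names to local names (the translator's knowledge).
SOURCES = [
    (DB1, "DomesticCompany", {"name": "cname", "grade": "grade"}),
    (DB2, "ForeignCompany", {"name": "comp_name", "grade": "grade"}),
]


def global_query(columns, where_column, where_value):
    """Answer: SELECT columns FROM <global view> WHERE where_column = where_value."""
    result = []
    for db, table, mapping in SOURCES:            # decomposition: one sub-query per source
        local_cols = ", ".join(mapping[c] for c in columns)
        local_sql = (f"SELECT {local_cols} FROM {table} "
                     f"WHERE {mapping[where_column]} = ?")   # local query translation
        result.extend(db.execute(local_sql, (where_value,)).fetchall())
    return sorted(result)                         # results assembly
```

For example, `global_query(["name", "grade"], "grade", 1)` sends one translated query to each source and merges the two partial answers into one list.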
Global Transaction Management
Definition
  Dealing with the problem of always keeping distributed databases in a consistent state, even when concurrent accesses and failures occur
Multidatabase Transaction Processing
  Global transaction manager (GTM)
    Submits global transaction operations to the local DBMSs through the local transaction agents
    Coordinates two-phase commit
  A set of local transaction managers (LTMs)
[Figure: Global transactions Ti and Tj enter the GTM, which hands subtransactions ti1, tj1 to one LTM and ti2, tj2 to another; each LTM also receives local transactions for its own DBMS. GTM: Global Transaction Manager, LTM: Local Transaction Manager.]
Multi-DBMS Example (UniSQL/M)
Introduction
  Multi-DBMS for managing a heterogeneous collection of relational and object-oriented database systems
  Uses SQL/M as its data definition and manipulation language
Two components of the global schema
  Proxy: transformation schema (= export schema)
  Virtual class: conceptual schema (= global schema)
Supports UniSQL/X, Oracle, Sybase, Informix…

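History
1990: UniSQL, Inc. founded in the United States by Dr. Won Kim
  A key member of the team that was developing ORION, an object-relational DBMS, at MCC
  Combined object-oriented concepts with SQL
  Multidatabase architecture
December 1997: 한국컴퓨터통신 (Korea Computer Communications, Ltd.) acquired the source code and worldwide rights from UniSQL, Inc.
June 1999: UniSQL 4.0K announced
As of 2001: releases up to UniSQL 5.0 version 3 are available, used mainly by government agencies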

Architecture
[Figure: UniSQL/M architecture. The global DBMS (UniSQL/M) holds vclasses defined over proxies, together with a Global Transaction Manager. Below it, local DBMSs are attached through masters, and Informix, Oracle, and Sybase are attached through drivers.]
virtual class

  create vclass COMPANY
  ( name string, total_sale integer, net_profit integer, address string,
    country string, grade integer)
  as select name, total_sale, net_profit, address, NULL, grade
     from DOMESTIC
     union all
     select name, total_sale, net_profit, NULL, country, grade
     from FOREIGN;

proxy

  create proxy DOMESTIC on Oracle
  ( name string, total_sale integer, net_profit integer,
    address string, grade integer)
  as select cname, gross_selling, net_gain, address, grade
     from DomesticCompany;

  create proxy FOREIGN on Sybase
  ( name string, total_sale integer, net_profit integer,
    country string, grade integer)
  as select comp_name, gross_sale, pure_gain, country, grade
     from ForeignCompany;

Local tables
  Oracle: DomesticCompany (cname, gross_selling, net_gain, address, grade)
  Sybase: ForeignCompany (comp_name, gross_sale, pure_gain, country, grade)
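The SQL/M statements on this slide need UniSQL/M to run, but the same layering can be approximated with ordinary views. The sketch below is only an analogy: both local tables live in one in-memory SQLite database instead of separate Oracle and Sybase servers, the sample rows are invented, and the second proxy is named `FOREIGN_` because `FOREIGN` is a reserved word in SQLite.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE DomesticCompany (cname TEXT, gross_selling INTEGER, net_gain INTEGER,
                              address TEXT, grade INTEGER);
CREATE TABLE ForeignCompany (comp_name TEXT, gross_sale INTEGER, pure_gain INTEGER,
                             country TEXT, grade INTEGER);
INSERT INTO DomesticCompany VALUES ('Hanbit', 500, 50, 'Daejeon', 1);
INSERT INTO ForeignCompany VALUES ('Acme', 900, 80, 'USA', 2);

-- Stand-ins for the two proxies: rename local columns to the global names.
CREATE VIEW DOMESTIC AS
  SELECT cname AS name, gross_selling AS total_sale, net_gain AS net_profit,
         address, grade FROM DomesticCompany;
CREATE VIEW FOREIGN_ AS
  SELECT comp_name AS name, gross_sale AS total_sale, pure_gain AS net_profit,
         country, grade FROM ForeignCompany;

-- Stand-in for the vclass: union of the proxies, padding missing attributes with NULL.
CREATE VIEW COMPANY AS
  SELECT name, total_sale, net_profit, address, NULL AS country, grade FROM DOMESTIC
  UNION ALL
  SELECT name, total_sale, net_profit, NULL, country, grade FROM FOREIGN_;
""")

# An application queries only the global view and never sees the local names.
rows = db.execute("SELECT name, address, country FROM COMPANY ORDER BY name").fetchall()
```

Each row in `rows` comes from exactly one local table, so either `address` or `country` is `None`, mirroring the `NULL` padding in the vclass definition.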

Development procedure
  Create the local databases
  Create the global database in UniSQL/M
  Register the local databases in UniSQL/M and define proxies
  Define vclasses using the defined proxies
  Develop application services using the vclasses
TP Monitor Solution
Transaction
An ACID unit of work
ACID properties
  Atomicity
  Consistency
  Isolation
  Durability
By industrial requirement
  Example applications: finance (banking, brokerage, …), insurance, healthcare, telecom, reservations (hotel, car, rail, air), inventory control, retail/distribution ...
In DBMS
  A collection of operations that performs a single logical function in a database application
Global Transaction
Environment shift
  Distributed information in the network
  Downsizing
  Global (distributed) transaction: a transaction that runs on multiple sites
2 Phase Commit Protocol
  Coordinates the actions of a distributed transaction
  Synchronizes updates on multiple sites so that they either all fail or all succeed (Atomicity)
2 Phase Locking Protocol
  Isolation
Methods
  Transaction Processing Monitor
  Multi-DBMS
    ex) UniSQL ...
  DBMS vendor's approach
Two Phase Commit Protocol
[Figure: Phase 1: the coordinator (C) sends Prepare-to-commit to the participants (P), and the participants answer Ready-to-commit. Phase 2: the coordinator sends Commit or Abort, and the participants answer Complete or Abort. C: coordinator, P: participant.]
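The message flow in the figure can be written down as a short sketch. It is deliberately minimal and makes several simplifying assumptions: everything runs in one process, and logging, timeouts, and recovery after a coordinator or participant crash are omitted. The class and method names are illustrative.

```python
class Participant:
    """A resource manager taking part in two-phase commit (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: answer ready-to-commit only if the local work can be committed.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return 'commit' if every participant voted ready, otherwise 'abort'."""
    # Phase 1: send prepare-to-commit and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: send the same decision to every participant.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

A single negative vote in phase 1 forces every participant to abort in phase 2, which is how the protocol provides atomicity across sites.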
Two Types of Transaction Processing
TP Heavy
  Uses a Transaction Processing Monitor
  Supports transactions on a single server or on multiple heterogeneous servers
  Covers all resources, not just data-centric ones
TP Lite
  Database-centric approach: uses stored procedures, triggers...
  Sybase integrated some TP monitor functions inside the DBMS engine in 1986
  Cannot support global transaction control
Transaction Processing Monitor
Definition
  “An operating system for transaction processing”
  A program that helps transaction application programmers implement the ACID properties of transactions easily; it consists of interfaces and procedures that manage the allocation of system resources
  Middleware that supports distributed transaction processing in a multi-resource-manager environment
[Figure: several clients connect to the TP Monitor, which connects to several Resource Managers.]

Roles
Process Management
  Funneling work
  Monitoring transaction execution
  Load balancing
(Distributed) Transaction Management
  Performs distributed transaction processing
  ACID
TP Monitor Standard
X/Open’s DTP (Distributed Transaction Processing) Reference Model
  Defines the components of a transaction-based system and locates the interfaces between them
  4 components
    Application Program
    Resource Manager
    Communication Resource Manager
    Transaction Manager
  1993, Version 1
  1994, Version 2
ISO-TP (ISO 10026)
  Defines the transaction identifiers and the two-phase commit protocol in a commit tree

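1 TM Environment
[Figure: X/Open DTP model with one TM. The Application (AP) uses XATMI, TxRPC, or CPI-C toward the Communication Resource Manager (CRM) and TX toward the Transaction Manager (TM). The TM uses XA toward the Resource Manager (RM) and XA+ toward the CRM. The RM manages a resource and the CRM a communication resource.]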

n TMs Environment
[Figure: environment with n TMs. Each site has its own AP, TM, RM (with its resource), and CRM; the CRMs of the different sites are connected through OSI-TP.]
Product
Closed TP monitors
  IBM CICS
    The largest market share by far of any TP monitor product
    Since 1968, to improve the efficiency of the mainframe operating system environment
    CICS/VSE, CICS/MVS, CICS/400, CICS for AIX, CICS for OS/2, CICS for Windows/NT
    Ported to other UNIX platforms
  IBM IMS (Information Management System)
    Used by 40% of the world’s largest companies
    Since the late 1960s
    Online database and transaction processing at a time when nearly all data processing was done in batch
    Today’s IMS can interoperate with CICS and DB2

Open TP monitors
  Novell (BEA) Tuxedo
    Developed by AT&T Bell Laboratories in 1984, primarily to serve telecommunication applications
    Purchased by Novell in 1996
    The most popular open TP monitor
    Portable TP monitor
  IBM (Transarc) Encina
    Transarc was founded in 1984 by several Carnegie Mellon Univ. researchers
    Purchased by IBM in 1994
  Microsoft Transaction Server
    Component-based TP monitor product
    Object-oriented
    Easy to use, develop, deploy, and manage
    In 1996, MS DTC (Microsoft Distributed Transaction Coordinator) was embedded in SQL Server 6.5 and Windows/NT

Transarc Encina
Provides a distributed transaction processing toolkit to application developers
[Figure: Encina toolkit. The Application Programming Interface sits on top of the TRPC, TRAN, TM/XA, REC, LOG, VOL, and LOCK services, which sit on the Base Development Environment.]
  TRPC: Transaction Remote Procedure Call Service
  TRAN: Distributed Transaction Service
  TM/XA: Transaction Manager/XA Service
  REC: Recovery Service
  LOG: Log Service
  VOL: Volume Service
  LOCK: Lock Service
Wrapper/Mediator Solution
Introduction
Addressed by Gio Wiederhold (Stanford Univ.) in IEEE Computer, March 1992: “Mediators in the Architecture of Future Information Systems”
Wrapper
  A program that encapsulates heterogeneous data sources and presents them to the upper layer in a uniform format
Mediator
  A program that uses encoded knowledge about particular sets of data to create the information offered to the upper layer
  (A software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications)
General architecture using wrappers and a mediator
[Figure: the Application sits on top of the Mediator; the Mediator sits on top of several wrappers, each of which wraps one Data Source.]
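The layering in this architecture can be sketched with two toy wrappers and one mediator. Everything here is a made-up illustration: the CSV and JSON sources, their field names, and the `find` query are assumptions, not part of Wiederhold's paper or of any product.

```python
import csv
import io
import json


class CsvWrapper:
    """Wraps a CSV source and exposes its rows as uniform dictionaries."""

    def __init__(self, text):
        self.text = text

    def rows(self):
        return [dict(r) for r in csv.DictReader(io.StringIO(self.text))]


class JsonWrapper:
    """Wraps a JSON source whose field names differ from the uniform format."""

    def __init__(self, text):
        self.text = text

    def rows(self):
        return [{"name": item["title"], "year": str(item["published"])}
                for item in json.loads(self.text)]


class Mediator:
    """Combines the uniform rows offered by the wrappers into one answer."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def find(self, year):
        hits = [row["name"] for w in self.wrappers for row in w.rows()
                if row["year"] == year]
        return sorted(hits)


mediator = Mediator([
    CsvWrapper("name,year\nalpha,2000\nbeta,2001\n"),
    JsonWrapper('[{"title": "gamma", "published": 2000}]'),
])
```

The application only calls `mediator.find(...)`; the knowledge of each source's format stays inside its wrapper, so adding a new source means writing one more wrapper class.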

Example: UniSQL/M
[Figure: the UniSQL/M architecture annotated with wrapper/mediator roles. The vclass and proxy layers of the global DBMS (UniSQL/M, with its Global Transaction Manager) are labeled as the mediator; the masters of the local DBMSs and the drivers for Informix, Oracle, and Sybase are labeled as wrappers.]
Motivation
Data in various formats is stored scattered across the network
  ex. Heterogeneous DBMSs, web pages on the WWW, etc.
Scattered data must be integrated to obtain the desired information
  ex. Multi-DBMS, intelligent agents, etc.
Data stored in different formats must be accessible in a uniform way
  ex. ODBC, JDBC, etc.
Example: Garlic Project
Started in 1995 at the IBM Almaden Research Center
Goal: building multimedia information systems
  Integrates various forms of data stored in heterogeneous database systems and non-database systems
Commercialized as the product DB2 DataJoiner
  Heterogeneous database join
  Transparent access to heterogeneous data sources
  Global optimization
  Special data management/access

Architecture
Garlic
  A middleware system that provides an integrated view of heterogeneous legacy data
  Does not change how or where the data is stored
Repository Wrapper
  Presents the types and contents of the data stored in heterogeneous repositories to Garlic in a uniform format
  Translates between Garlic's internal protocol and the repository's native protocol
[Figure: clients connect to Garlic (the mediator), which contains a query processor and metadata. Below Garlic, one wrapper each fronts a relational DB, an object-oriented DB, an image archive, a sound archive, ...]

IBM DB2 Joiner

Issues
  How to model data inside Garlic (the mediator)
  How the Garlic query processor should use the information provided by wrappers when planning a query
  How the Garlic query processor should use the wrappers when executing a query
Data Replication
Introduction
A technique that duplicates the same data item on multiple nodes in a distributed environment
Purpose
  Replication is often used in distributed systems to provide a higher level of performance, reliability, and availability
    Highly available system
    Fault tolerance capabilities
    Improved response time
    Load sharing
An access algorithm is needed that guarantees the replicated data items stay in a correct state
Replica Control Protocol
“Algorithm that controls access to replicated data”
Replica: one copy among the redundantly stored copies of a data item
Requirement
  Data correctness
Performance evaluation criteria
  Performance
  Availability: the probability that a requested operation can be performed at a given moment
  Response time: the time from when an operation is requested until it completes
  Cost

Correctness
Single-copy serializability must be guaranteed
  Single-copy: the system behaves as if only one copy of each data item existed in the whole system
  Serializability: the result of several operations executed concurrently is the same as the result of executing the operations one after another

Eager replica update protocol
  Updates all the replicas of an item as part of a single transaction
  Ensures that executions are serializable
  Non-scalable
Lazy replica update protocol
  Propagates updates to replicas through independent transactions after the original transaction commits
  Has become popular with database vendors due to its superior performance characteristics
  If used indiscriminately, results in non-serializable executions
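The difference between the two protocols shows up as a window in which replicas disagree. The sketch below makes that window visible; it is a single-process toy with no real transactions, and the class names and the explicit `propagate()` call are assumptions made for illustration.

```python
class Replica:
    """One copy of the data, held as a plain dictionary."""

    def __init__(self):
        self.data = {}


class EagerGroup:
    """Eager update: every replica is written before the write returns."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for r in self.replicas:          # all replicas updated as one unit of work
            r.data[key] = value


class LazyGroup:
    """Lazy update: one replica is written now; the others catch up later."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = []                # committed updates not yet propagated

    def write(self, key, value):
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Runs independently, after the original write has finished.
        for key, value in self.pending:
            for r in self.replicas[1:]:
                r.data[key] = value
        self.pending.clear()
```

Between `write` and `propagate`, a reader of the second replica in a `LazyGroup` still sees the old state. That gap is where non-serializable executions can arise if lazy propagation is used without further control.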
Primary Copy
Concept
  Primary copy: one specially designated copy among the replicas of a data item
  When data is updated, only the primary copy is updated
  The node that stores the primary copy is responsible for updating the other replicas
Method
  Read: if the desired data item is on the local node, read it there
    If the local node does not hold the data item, send a read request to another node
  Write: send a write request to the primary copy
    Changes to the primary copy are then propagated to the data items on the other nodes
[Figure: the primary copy on S0 serves reads and writes (R/W) and propagates updates to S1, S2, and S3, which serve reads (R).]
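The read and write rules above amount to a routing decision, which the following sketch shows. It simplifies in two labeled ways: a node without a replica always asks the primary (the slide only says "another node"), and propagation happens immediately inside `write`. All names are illustrative.

```python
class Node:
    """A site that may or may not hold a replica of the data."""

    def __init__(self, name, has_replica):
        self.name = name
        self.store = {} if has_replica else None   # None: no replica on this node


class PrimaryCopySystem:
    def __init__(self, nodes, primary):
        self.nodes = nodes
        self.primary = primary                      # node holding the primary copy

    def read(self, at, key):
        """Read locally when the node has a replica, otherwise ask the primary."""
        source = at if at.store is not None else self.primary
        return source.store.get(key), source.name

    def write(self, key, value):
        """All writes go to the primary, which propagates to the other replicas."""
        self.primary.store[key] = value
        for node in self.nodes:
            if node is not self.primary and node.store is not None:
                node.store[key] = value


s0, s1, s2 = Node("S0", True), Node("S1", True), Node("S2", False)
system = PrimaryCopySystem([s0, s1, s2], primary=s0)
system.write("x", 3)
```

`system.read(s1, "x")` is answered by S1 itself, while `system.read(s2, "x")` has to go to S0, which hints at why the primary can become a bottleneck in a large system.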

Design Idea
  Place the primary copy on the node that receives the most frequent access requests for the data item
    Access requests can then be served with local reads and writes alone
Disadvantages
  In a large system the primary copy becomes a bottleneck
    Read requests from nodes that hold no replica
    All write requests
  More than one primary copy can appear during a network partition
    If the node storing the primary copy fails, a data item on another node is elected as the primary copy
    During a network partition, two or more primary copies can appear and data consistency is broken
Quorum Consensus
Concept
  Quorum group: a group of nodes from which permission must be obtained in order to perform an operation
  Quorum set: a collection of quorum groups
  A read or write can be performed only after obtaining permission from every node in one quorum group of that operation
  Version numbers are used to find the most recently written replica
Requirements
  Every read quorum group and every write quorum group share at least one node
  Every pair of write quorum groups shares at least one node
Quorum sets
  R = { {A,B}, {A,C}, {B,C}, {D} }
  W = { {A,B,D}, {A,C,D}, {B,C,D} }
[Figure: replicas A, B, C, and D of data item x, annotated with version numbers. The replicas hold v# = 1, x = 4 before Write(x=3); the replicas reached by the write then hold v# = 2, x = 3, and a Read uses the version numbers to pick the latest value.]
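The two intersection requirements and the role of the version number can be checked directly on the quorum sets from the slide. The sketch below is a single-process illustration; the `replicas` dictionary and the helper names are assumptions, and permission handling and failures are left out.

```python
from itertools import combinations

# Quorum sets taken from the slide.
R = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"D"}]
W = [{"A", "B", "D"}, {"A", "C", "D"}, {"B", "C", "D"}]


def quorums_valid(read_groups, write_groups):
    """Check the read/write and write/write intersection requirements."""
    read_write = all(r & w for r in read_groups for w in write_groups)
    write_write = all(a & b for a, b in combinations(write_groups, 2))
    return read_write and write_write


# Each replica stores (version number, value); all start at v# = 1, x = 4.
replicas = {n: (1, 4) for n in "ABCD"}


def write(group, value):
    """Write through one write quorum group, bumping the version number."""
    version = max(replicas[n][0] for n in group) + 1
    for n in group:
        replicas[n] = (version, value)


def read(group):
    """Read one read quorum group and return the value with the highest version."""
    return max(replicas[n] for n in group)[1]


write(W[0], 3)          # Write(x=3) through the group {A, B, D}
```

After the write, replica C is stale, yet `read({"B", "C"})` still returns the new value because the group overlaps the write quorum in B and the version number identifies B's copy as the latest.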

Weighted Voting
A simple way to express quorum sets
  Each node is assigned a value called its vote
  An operation can be performed only after collecting, from the nodes, votes amounting to the predefined quorum value
System behavior depends on the quorum values chosen
  Example 1) Read One Write All (ROWA): Read Quorum = 1, Write Quorum = 5
  Example 2) Read majority / Write majority: Read Quorum = 3, Write Quorum = 3
[Figure: nodes A, B, and C with Vote = 1 each and node D with Vote = 2.]
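With the vote assignment from the figure (5 votes in total), both example settings can be checked numerically. The validity condition used below, read quorum + write quorum > total and 2 × write quorum > total, is the usual way to state the intersection requirements in terms of votes; it is not written on the slide, and the function names are illustrative.

```python
VOTES = {"A": 1, "B": 1, "C": 1, "D": 2}     # vote assignment from the slide
TOTAL = sum(VOTES.values())                  # 5 votes in total


def valid_quorums(read_quorum, write_quorum):
    """Read/write and write/write quorums must always overlap."""
    return read_quorum + write_quorum > TOTAL and 2 * write_quorum > TOTAL


def has_quorum(available_nodes, quorum):
    """True when the reachable nodes together hold at least `quorum` votes."""
    return sum(VOTES[n] for n in available_nodes) >= quorum
```

Both slide examples pass `valid_quorums`: ROWA as (1, 5) and read majority / write majority as (3, 3). Under the majority setting, nodes A and D alone hold 3 votes and can serve a read or a write, whereas under ROWA a write needs all four nodes.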

Advantages
  Data consistency can be guaranteed even when a write is performed on only some of the replicated data items
  No special provision is needed for cases such as network partitions
Disadvantages
  Read operations carry a relatively high overhead
  If several nodes fail, the whole system can become unavailable
Available Copies
Concept
  A kind of Read One Write All method
  Read operation: read from any available replica
  Write operation: write to all available replicas
Directory Oriented Available Copies Method
  Directory: the list of nodes that hold a replica of a data item
    Include(xa): registers in the directory that node A holds a replica of data item x
    Exclude(xa): removes from the directory the information that node A holds a replica of data item x
  Read Operation
    Look up in the directory a node that holds data item x and send it a read request
    On failure, look up another node in the directory and retry
  Write Operation
    Look up in the directory all nodes that hold data item x and send them write requests
    If the write fails on even one node, remove the failed node from the directory and retry the write
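The directory operations and the read/write rules can be put together in a short sketch. It is a single-process toy under stated assumptions: failures are simulated with a `down` set, and instead of restarting the whole write after a failure, the failed node is excluded and the loop simply continues, which leaves the same final state. The class and attribute names are illustrative.

```python
class AvailableCopies:
    def __init__(self, nodes):
        self.stores = {n: {} for n in nodes}     # node name -> local key/value store
        self.down = set()                        # nodes that are currently failed
        self.directory = {}                      # data item -> nodes holding a replica

    def include(self, item, node):
        """Register that `node` holds a replica of `item`."""
        self.directory.setdefault(item, set()).add(node)

    def exclude(self, item, node):
        """Remove the record that `node` holds a replica of `item`."""
        self.directory[item].discard(node)

    def read(self, item):
        """Try the directory entries one by one until an available replica answers."""
        for node in sorted(self.directory[item]):
            if node not in self.down:
                return self.stores[node][item]
        raise RuntimeError("no available replica")

    def write(self, item, value):
        """Write to every listed replica, dropping failed nodes from the directory."""
        for node in sorted(self.directory[item]):
            if node in self.down:
                self.exclude(item, node)         # failed node leaves the directory
            else:
                self.stores[node][item] = value
```

A read needs only one reachable replica, and a write succeeds as long as at least one listed replica is up, which is where the method's high availability comes from.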

Advantages
  A read operation can be performed with a single access
  Guarantees high data availability
Disadvantage
  Cannot guarantee data consistency when a network partition occurs
[Figure: data item x is held on nodes A and C and data item y on nodes B and D, shown before and after a network partition separates the nodes.]
Product
Commercial DBMSs provide replication features
  Oracle, Sybase, Informix…
Tandem machines
Distributed file systems
Distributed operating systems
Conclusion

Integrating heterogeneous data source




Multi-DBMS
TP Monitor solution
Wrapper/Mediator solution
Data Replication
Reference
Multi-DBMS
  Principles of Distributed Database Systems by M. Tamer Ozsu and Patrick Valduriez, Prentice Hall, 1999
  Distributed Database Lecture Note (KAIST CS Dept., Stanford CS Dept.)
  UniSQL Homepage (http://www.unisql.com)
TP Monitor
  The Essential Client/Server Survival Guide by Robert Orfali, Dan Harkey, and Jeri Edwards, John Wiley & Sons, Inc., 1996
  Philip A. Bernstein, “Transaction Processing Monitors”, Communications of the ACM, November 1990, p75 - 86
  C. Mohan, “Transaction Processing and Distributed Computing in the Internet Age”, TP&DC Talk, Presentation TP, 1998
  “Distributed Transaction Processing: Reference Model, Version 3”, The Open Group, 1996
  “Distributed Transaction Processing: The TX Specification”, The Open Group, 1991
  “Distributed Transaction Processing: The XA Specification”, The Open Group, 1995
  Principles of Transaction Processing for the System Professional by Philip A. Bernstein and Eric Newcomer, Morgan Kaufmann, 1996

Wrapper/Mediator
  Gio Wiederhold, “Mediators in the Architecture of Future Information Systems”, IEEE Computer, March 1992, p38 - 49
  Mary Tork Roth and Peter Schwarz, “Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources”, Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997
  Mary Tork Roth, Manish Arya, Laura M. Haas, Michael J. Carey, William F. Cody, Ronald Fagin, Peter M. Schwarz, Joachim Thomas II and Edward L. Wimmers, “The Garlic Project”, SIGMOD Conf. 1996, p557
Data Replication
  Distributed Operating Systems by Andrew S. Tanenbaum, Prentice Hall, 1995
  Replicated Data Management in Distributed Systems, Readings in Distributed Computing Systems (T. L. Casavant and M. Singhal, Editors), IEEE Computer Society Press, Los Alamitos, CA, 1994, pages 572-591
Supplement 1: Middleware
Definition of middleware [CTR]
  A software layer, defined by an API, that mediates high-level communication between client and server or between server and server
[Figure: clients and servers communicate through the middleware layer, which sits on top of the network communication services.]
Characteristics
  Hides the heterogeneity of the computing environment
    Middleware is ported to multiple operating systems and offers its services through the same API
    Supports multiple network protocols
    Provides one interface to different databases
Types
  Message Oriented Middleware (MOM), Remote Procedure Call (RPC), Object Request Broker (ORB), Distributed Computing Environment (DCE), Online Transaction Processing Monitor (OLTP Monitor), Database Connectivity Middleware
Supplement 2: DBMS Gateway
Some database vendors extend two-phase commit to multiple databases
Oracle’s Open Gateway
  Manages two-phase commit across heterogeneous XA-compliant databases
  Gateway between the Oracle server and other database servers
[Figure: the clients see a single Oracle server (client view of the Oracle server); behind it, Oracle Open Gateways connect the Oracle server to DBMS A and DBMS B.]
Supplement 3: Application Server
Oracle’s Application Server
  The middle tier of Oracle's Network Computing Architecture, providing middleware functions such as web server, object request broker, and TP monitor
[Figure: web clients and CORBA clients connect to the Oracle Application Server, which hosts the application logic and reaches the Oracle server, DBMS A, and DBMS B through X/Open XA.]
Informix supports gateways to access distributed databases
IBM’s DB2 DataJoiner can access and join tables located across multiple data sources