Download Chapter 17 of Database Design, Application Development, and

Chapter 17
Client-Server Processing, Parallel Database Processing, and Distributed Databases
McGraw-Hill/Irwin
Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved.
Outline
• Overview
• Client-Server Database Architectures
• Parallel Database Architectures
• Architectures for Distributed Database Management Systems
• Transparency for Distributed Database Processing
• Distributed Database Processing

Evolution of Distributed Processing and Distributed Data
• Need to share resources across a network
• Timesharing (1970s)
• Remote procedure calls (1980s)
• Client-server computing (1990s)

Timesharing Network
[Figure: three terminals connected to a mainframe computer with a database]

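Simple Resource Sharing
[Figure: (a) Remote procedure call: a procedure call is sent and results are returned from the computer holding the database. (b) File sharing: a file request is sent and the file is returned from the computer holding the database.]
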
Client-Server Processing
[Figure: three clients, one server, and one database]

Distributed Processing and Data
[Figure: four clients, two servers, and two databases]

Motivation for Client-Server Processing
• Flexibility: the ease of maintaining and adapting a system
• Scalability: the ability to support scalable growth of hardware and software capacity
• Interoperability: open standards that allow two or more systems to exchange and use software and data

Motivation for Parallel Database Processing
• Scaleup: increased work that can be accomplished
• Speedup: decrease in time to complete a task
• Availability: increased accessibility of system
    • Highly available: little downtime
    • Fault-tolerant: no downtime

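Scaleup and speedup are often quantified as ratios between a small and a large hardware configuration. The sketch below assumes those ratio definitions. The function names and the sample measurements are hypothetical and do not come from the slides.

```python
def speedup(time_small: float, time_large: float) -> float:
    """Elapsed time on the small configuration divided by elapsed time
    on the large configuration, for the same task."""
    if time_small <= 0 or time_large <= 0:
        raise ValueError("times must be positive")
    return time_small / time_large


def scaleup(work_small: float, work_large: float) -> float:
    """Work completed on the large configuration divided by work completed
    on the small configuration, in the same elapsed time."""
    if work_small <= 0 or work_large <= 0:
        raise ValueError("work amounts must be positive")
    return work_large / work_small


if __name__ == "__main__":
    # Hypothetical measurements: added capacity cuts a task from 60 s to 40 s
    # and raises throughput from 100 to 180 transactions per minute.
    print("speedup:", speedup(60.0, 40.0))
    print("scaleup:", scaleup(100, 180))
```

A ratio equal to the factor by which capacity grew would indicate linear speedup or scaleup; smaller ratios indicate overhead from coordination.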
Motivation for Distributed Data
• Data control: locate data to match an organization’s structure
• Communication costs: locate data close to data usage to lower communication cost and improve performance
• Reliability: increase data availability by replicating data at more than one site

Summary of Distributed Processing and Data
Technology | Advantages | Disadvantages
Client-server processing | Flexibility, interoperability, scalability | High complexity, high development cost, possible interoperability problems
Parallel database processing | Speedup, scaleup, availability, scalability for predictive performance improvements | Possible interoperability problems, high cost
Distributed databases | Local control of data, improved performance, reduced communication costs, increased reliability | High complexity, additional security concerns

Client-Server Database Architectures
• A client-server architecture is an arrangement of components (clients and servers) among computers connected by a network.
• A client-server architecture supports efficient processing of messages (requests for service) between clients and servers.

Design Issues
• Division of processing: the allocation of tasks to clients and servers.
• Process management: interoperability among clients and servers and efficient processing of messages between clients and servers.
    • Middleware: software for process management

Tasks to Distribute
• Presentation: code to maintain the graphical user interface
• Validation: code to ensure the consistency of the database and user inputs
• Business logic: code to perform business functions
• Workflow: code to ensure completion of business processes
• Data access: code to extract data to answer queries and modify a database

Middleware
• A software component that performs process management.
• Allows clients and servers to exist on different platforms.
• Allows servers to efficiently process messages from a large number of clients.
• Often located on a dedicated computer.

Client-Server Computing with Middleware
[Figure: client-server computing with a middleware component]

Types of Middleware
• Transaction-processing monitors: relieve the operating system of managing database processes
• Message-oriented middleware: maintain a queue of messages
• Object-request brokers: provide a high level of interoperability and message intelligence
• Data access middleware: provide a uniform interface to relational and nonrelational data using SQL

Two-Tier Architecture
[Figure: SQL statements flow to the database server, which manages the database; query results flow back]

Two-Tier Architecture
• A PC client and a database server interact directly to request and transfer data.
• The PC client contains the user interface code.
• The server contains the data access logic.
• The PC client and the server share the validation and business logic.

Three-Tier Architecture (Middleware Server)
[Figure: SQL statements pass through the middleware server to the database server, which manages the database; query results flow back]

Three-Tier Architecture (Application Server)
[Figure: SQL statements pass from the application server to the database server, which manages the database; query results flow back]

Three-Tier Architecture
• To improve performance, the three-tier architecture adds another server layer, either a middleware server or an application server.
    • The additional server software can reside on a separate computer.
    • Alternatively, the additional server software can be distributed between the database server and PC clients.

Multiple-Tier Architecture
• A client-server architecture with more than three layers: a PC client, a backend database server, an intervening middleware server, and application servers.
• Provides more flexibility on division of processing
• The application servers perform business logic and manage specialized kinds of data such as images.

Multiple-Tier Architecture
[Figure: two application servers, a middleware server, and a database server with its database]

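Multiple-Tier Architecture with Web Server
[Figure: a page request reaches the Web server, which passes a database request to the middleware server with listener; the middleware server sends SQL statements and formatting requirements to the database server, which manages the database; results return to the middleware server, and HTML is returned through the Web server]
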
Web Service Architecture
• Generalizes multiple-tier architectures for electronic commerce
• Supports services provided and used by automated agents
• Advantages
    • Deploy services faster
    • Communicate services in standard formats
    • Find services more easily

Web Service Components
[Figure: the service provider, which holds the service implementation, publishes a service description (WSDL) to the service registry and its registry database; the service requestor finds the service description in the registry and binds to the service provider]

Web Service Standards
• HTTP, FTP, TCP/IP
• Simple Object Access Protocol (SOAP): XML message sending
• Web Services Description Language (WSDL)
• Universal Description, Discovery, and Integration (UDDI)
• Web Services Flow Language (WSFL)

Parallel DBMS
• Uses a collection of resources (processors, disks, and memory) to perform work in parallel
• Divides work among resources to achieve desired performance (scaleup and speedup) and availability
• Uses a high-speed network, operating system, and storage system
• Purchase decision involves more than the parallel DBMS

Basic Architectures
[Figure: (a) SE, (b) SD, (c) SN, each drawn with processors (P), memory (M), and a high-speed network (N)]
Legend
P: processor
M: memory
N: high-speed network
SE: shared everything
SD: shared disk
SN: shared nothing

Clustering Architectures
[Figure: (a) Clustered disk (CD) and (b) Clustered nothing (CN), each drawn as clusters of processors (P) with memory (M) connected by a high-speed network (N)]

Design Issues
• Load balancing: CN architecture most sensitive
• Cache coherence: CD architecture problem
• Interprocessor communication: CN architecture most sensitive
• Application transparency: no knowledge about parallelism

Oracle Real Application Clusters
[Figure: two instances, each with an SGA, LGWR, DBWR, and GCS, connected by cache fusion; both use a shared storage system holding the redo logs and DB files]
Legend
LGWR: Log writer process
DBWR: DB writer process
GCS: Global cache service
SGA: Shared global area

Oracle RAC Features
• Cache fusion to synchronize cache access
• Query optimizer intelligence
• Connection load balancing
• Automatic failover
• Comprehensive administration interface

IBM DB2 SPF
[Figure: a coordinator and Partitions 1 through n, each partition drawn with processors (P) and memory (M)]

IBM SPF Features
• Automatic or DBA-determined partitioning
• Query optimizer intelligence
• High scalability
• Partitioned log parallelism

Distributed Database Architectures
• DBMSs need fundamental extensions.
• Underlying the extensions are a different component architecture and a different schema architecture.
• The component architecture manages distributed database requests.
• The schema architecture provides additional layers of data description.

Global Requests
[Figure: product data and customer-order data shown at two locations]

Component Architecture
[Figure: three sites (Site 1, Site 2, Site 3) with components labeled GD, DDM, LDM, and DB]

Schema Architecture I
[Figure: schema layers, from top to bottom]
External schema 1, External schema 2, ..., External schema n
Conceptual schema
Fragmentation schema
Allocation schema
Internal schema 1, Internal schema 2, ..., Internal schema m (m sites)

Schema Architecture II
[Figure: schema layers, from top to bottom, across m sites]
Global external schema 1, Global external schema 2, ..., Global external schema n
Global conceptual schema
Site 1 local mapping schema, Site 2 local mapping schema, ..., Site m local mapping schema
Site 1 local schemas, Site 2 local schemas, ..., Site m local schemas (each conceptual, internal, external)

Distributed Database Transparency
• Transparency is related to data independence.
• With transparency, users can write queries with no knowledge of the distribution, and distribution changes will not cause changes to existing queries and transactions.
• Without transparency, users must reference some distribution details in queries, and distribution changes can lead to changes in existing queries.

Motivating Example
[Figure: relationship diagram of five tables connected by four one-to-many (1 to ∞) relationships]
Customer(CustNo, CustName, CustCity, CustState, CustZip, CustRegion)
Order(OrdNo, OrdDate, OrdAmt, CustNo)
OrderLine(OrdNo, ProdNo, OrdCity)
Product(ProdNo, ProdName, ProdColor, ProdPrice)
Inventory(StockNo, QOH, WarehouseNo, ProdNo)

Fragments Based on the CustRegion Column
CREATE FRAGMENT Western-Customers AS
  SELECT * FROM Customer WHERE CustRegion = 'West'

CREATE FRAGMENT Western-Orders AS
  SELECT Order.* FROM Order, Customer
  WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'West'

CREATE FRAGMENT Western-OrderLines AS
  SELECT OrderLine.* FROM Customer, OrderLine, Order
  WHERE OrderLine.OrdNo = Order.OrdNo
    AND Order.CustNo = Customer.CustNo AND CustRegion = 'West'

CREATE FRAGMENT Eastern-Customers AS
  SELECT * FROM Customer WHERE CustRegion = 'East'

CREATE FRAGMENT Eastern-Orders AS
  SELECT Order.* FROM Order, Customer
  WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'East'

CREATE FRAGMENT Eastern-OrderLines AS
  SELECT OrderLine.* FROM Customer, OrderLine, Order
  WHERE OrderLine.OrdNo = Order.OrdNo
    AND Order.CustNo = Customer.CustNo AND CustRegion = 'East'

Fragments Based on the WareHouseNo Column
CREATE FRAGMENT Denver-Inventory AS
  SELECT * FROM Inventory WHERE WareHouseNo = 1

CREATE FRAGMENT Seattle-Inventory AS
  SELECT * FROM Inventory WHERE WareHouseNo = 2

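The fragment definitions can be imitated in memory to see what they select. The following is a minimal sketch, assuming tiny hypothetical sample rows and helper names that are not part of the slides; it builds the customer fragments by region and derives the order fragments from them, then checks that the fragments are disjoint and together cover the original table.

```python
# Hypothetical sample rows; only the column names follow the slides.
customers = [
    {"CustNo": 1, "CustName": "Ann", "CustRegion": "West"},
    {"CustNo": 2, "CustName": "Bob", "CustRegion": "East"},
    {"CustNo": 3, "CustName": "Cy", "CustRegion": "West"},
]
orders = [
    {"OrdNo": 10, "CustNo": 1},
    {"OrdNo": 11, "CustNo": 2},
    {"OrdNo": 12, "CustNo": 3},
]


def customer_fragment(rows, region):
    """Fragment of Customer selected by a condition on CustRegion."""
    return [r for r in rows if r["CustRegion"] == region]


def order_fragment(order_rows, customer_frag):
    """Fragment of Order derived by matching CustNo against a customer fragment."""
    cust_nos = {c["CustNo"] for c in customer_frag}
    return [o for o in order_rows if o["CustNo"] in cust_nos]


western_customers = customer_fragment(customers, "West")
eastern_customers = customer_fragment(customers, "East")
western_orders = order_fragment(orders, western_customers)
eastern_orders = order_fragment(orders, eastern_customers)

if __name__ == "__main__":
    west = {o["OrdNo"] for o in western_orders}
    east = {o["OrdNo"] for o in eastern_orders}
    print("disjoint:", west.isdisjoint(east))
    print("complete:", west | east == {o["OrdNo"] for o in orders})
```

The completeness check only holds here because every sample customer has a region of 'West' or 'East'; a row with any other region would belong to no fragment.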
Fragmentation Transparency
• Fragmentation transparency provides the highest level of data independence.
• Users formulate queries and transactions without knowledge of fragments, locations, or local formats.
• If fragments change, queries and transactions are not affected.

Location Transparency
• Location transparency provides a lesser level of data independence than fragmentation transparency.
• Users need to reference fragments in formulating queries and transactions.
• However, knowledge of locations and local formats is not necessary.

Local Mapping Transparency
• Local mapping transparency provides a lesser level of data independence than location transparency.
• Users need to reference fragments at sites in formulating queries and transactions.
• However, knowledge of local formats is not necessary.

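The three transparency levels differ in what a user must name in a query. The sketch below is one hedged way to picture that difference. The catalogs, the site names Denver and Seattle, and the fragment@site notation are assumptions made for illustration and are not taken from the slides or from any DBMS.

```python
# Hypothetical catalogs. Fragment names follow the Inventory fragments;
# the site names and the "fragment@site" notation are assumed.
FRAGMENTS = {"Inventory": ["Denver-Inventory", "Seattle-Inventory"]}
ALLOCATION = {"Denver-Inventory": "Denver", "Seattle-Inventory": "Seattle"}


def names_to_reference(table, level):
    """Return the names a user must write in a query on `table`
    under the given transparency level."""
    if level == "fragmentation":
        # The DBMS resolves fragments and sites; the user names the table.
        return [table]
    fragments = FRAGMENTS[table]
    if level == "location":
        # The user names fragments; the DBMS resolves their sites.
        return list(fragments)
    if level == "local mapping":
        # The user names each fragment together with its site.
        return [f"{frag}@{ALLOCATION[frag]}" for frag in fragments]
    raise ValueError(f"unknown transparency level: {level}")


if __name__ == "__main__":
    for level in ("fragmentation", "location", "local mapping"):
        print(level, "->", names_to_reference("Inventory", level))
```

Moving a fragment to another site changes only the ALLOCATION catalog, so in this sketch only the names listed under local mapping transparency would change.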
Oracle Distributed Databases
• Homogeneous and heterogeneous distributed databases
• Emphasis on site autonomy
• Provides local mapping transparency
• Each site is a separately managed database.

Oracle Links
• One-way link from local to remote
• Support remote access to other users’ objects
• Necessary to have knowledge of remote database objects
• Use synonyms and views with links to reduce remote database knowledge

Distributed Database Processing
• Distributed data adds considerable complexity to query processing and transaction processing.
• Distributed database processing involves movement of data, remote processing, and site coordination.
• Performance implications sometimes cannot be hidden.

Distributed Query Processing
• Involves both local (intrasite) and global (intersite) optimization.
• Multiple optimization objectives
• The weighting of communication costs versus local processing costs depends on network characteristics.
• There are many more possible access plans for a distributed query.

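The effect of weighting communication cost against local processing cost can be shown with a toy cost model. This is a sketch under stated assumptions, not an optimizer: the linear cost formula, the two access plans, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPlan:
    name: str
    local_cost: float     # local processing, in arbitrary work units
    bytes_shipped: float  # data moved between sites


def plan_cost(plan, comm_weight, local_weight=1.0):
    """Weighted sum of local processing and communication (assumed linear)."""
    return local_weight * plan.local_cost + comm_weight * plan.bytes_shipped


def cheapest(plans, comm_weight, local_weight=1.0):
    """Pick the plan with the lowest weighted cost."""
    return min(plans, key=lambda p: plan_cost(p, comm_weight, local_weight))


# Two hypothetical plans for the same distributed query.
plans = [
    AccessPlan("ship whole table, join locally", local_cost=50, bytes_shipped=1000),
    AccessPlan("filter remotely, ship matches", local_cost=200, bytes_shipped=100),
]

if __name__ == "__main__":
    # A high communication weight models a slow network, a low one a fast network.
    print("slow network:", cheapest(plans, comm_weight=1.0).name)
    print("fast network:", cheapest(plans, comm_weight=0.01).name)
```

With these numbers the preferred plan flips as the communication weight changes, which is why the same query can need different plans on different networks.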
Distributed Transaction Processing
• Distributed DBMS provides concurrency and recovery transparency.
• Independently operating sites must be coordinated.
• New kinds of failures exist because of the communication network.
• New protocols are necessary.

Distributed Concurrency Control
• The simplest scheme involves centralized coordination.
• Centralized coordination involves the fewest messages and the simplest deadlock detection.
• The number of messages can be twice as many in distributed coordination.
• The primary copy protocol is used to reduce the overhead of locking multiple copies.

Centralized Coordination
[Figure: Subtransaction 1 at Site x, Subtransaction 2 at Site y, ..., Subtransaction n at Site z each send lock requests to the central coordinator and receive lock status in return]

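The lock request and lock status exchange with a central coordinator can be sketched as a single lock table that every site consults. This is a simplified illustration with assumed names: it supports only shared (S) and exclusive (X) locks, and it leaves out queuing of waiting requests, deadlock detection, and network messages.

```python
class CentralCoordinator:
    """One lock table for all sites (hypothetical, in-memory sketch)."""

    def __init__(self):
        # item -> (mode, set of transaction ids holding the lock)
        self._locks = {}

    def request(self, txn, item, mode):
        """Handle a lock request and return the lock status:
        'granted' or 'wait'."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return "granted"
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may repeat or upgrade its own lock.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self._locks[item] = (new_mode, holders)
            return "granted"
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # shared locks are compatible
            return "granted"
        return "wait"

    def release_all(self, txn):
        """Release every lock held by a transaction, e.g. at commit."""
        for item in list(self._locks):
            _, holders = self._locks[item]
            holders.discard(txn)
            if not holders:
                del self._locks[item]


if __name__ == "__main__":
    coordinator = CentralCoordinator()
    print(coordinator.request("T1@SiteX", "Customer:1", "X"))
    print(coordinator.request("T2@SiteY", "Customer:1", "S"))
    coordinator.release_all("T1@SiteX")
    print(coordinator.request("T2@SiteY", "Customer:1", "S"))
```

Because every request goes to one place, each lock costs one request message and one status message, at the price of making the coordinator a single point of contention and failure.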
Distributed Recovery Management
• Distributed DBMSs must contend with failures of communication links and sites.
• Detecting failures involves coordination among sites.
• The recovery manager must ensure that different parts of a partitioned network act in unison.
• The protocol for distributed recovery is the two-phase commit protocol (2PC).

Voting and Decision Phases
Voting phase
1. Coordinator: Write Begin-Commit to log. Send Ready messages. Wait for responses.
2. Participant: Force updates to disk. If no failure, write Ready-Commit to log and send Ready vote. Else send Abort vote.
Decision phase
3. Coordinator: If all sites vote ready before timeout, write Global Commit record, send Commit messages, and wait for acknowledgments. Else send Abort messages.
4. Participant: Write Commit to log. Release locks. Send acknowledgment.
5. Coordinator: Wait for acknowledgments. Resend Commit messages if necessary. Write global end of transaction.

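The voting and decision phases can be traced with a small in-memory simulation. This is a minimal sketch, assuming direct method calls in place of network messages, so timeouts and resending of Commit messages are not modeled. The class and function names, and the Abort and Global-Abort log records, are assumptions for illustration.

```python
class Participant:
    """A site taking part in the transaction (hypothetical sketch)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # False simulates a local failure
        self.log = []

    def on_ready(self):
        """Step 2: after forcing updates to disk, vote ready or abort."""
        if self.can_commit:
            self.log.append("Ready-Commit")
            return "ready"
        self.log.append("Abort")  # assumed log record
        return "abort"

    def on_commit(self):
        """Step 4: write Commit, release locks, acknowledge."""
        self.log.append("Commit")
        return "ack"

    def on_abort(self):
        """Undo local work when the coordinator decides to abort."""
        if self.log[-1:] != ["Abort"]:
            self.log.append("Abort")


def two_phase_commit(participants):
    """Run the coordinator side and return (outcome, coordinator log)."""
    log = ["Begin-Commit"]                           # step 1
    votes = [p.on_ready() for p in participants]     # voting phase
    if all(vote == "ready" for vote in votes):       # step 3
        log.append("Global-Commit")
        acks = [p.on_commit() for p in participants]
        # Step 5: direct calls always return, so no resend is simulated.
        if all(ack == "ack" for ack in acks):
            log.append("End-Transaction")
        return "commit", log
    for p in participants:
        p.on_abort()
    log.append("Global-Abort")                       # assumed log record
    return "abort", log


if __name__ == "__main__":
    print(two_phase_commit([Participant("A"), Participant("B")]))
    print(two_phase_commit([Participant("A"), Participant("B", can_commit=False)]))
```

A single abort vote is enough to abort the whole transaction, which is how the protocol keeps all sites in agreement on the outcome.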
Summary
• Utilizing distributed processing and data can significantly improve DBMS services but at the cost of new design challenges.
• Client-server architectures provide alternatives among cost, complexity, and benefit levels.
• Parallel database processing provides improved performance (speedup and scaleup) and availability.
• Architectures for distributed DBMSs differ in the integration of the local databases and level of data independence.