Download Chapter 17 - Spatial Database Group

Chapter 17
Client-Server Processing and Distributed Databases

Outline
• Overview of Distributed Processing and Distributed Data
• Client-Server Database Architectures
• Web Database Connectivity
• Architectures for Distributed Database Management Systems
• Transparency for Distributed Database Processing
• Distributed Database Processing

Evolution of Distributed Processing and Distributed Data
• Need to share resources across a network
• Timesharing (1970s)
• Remote procedure calls (1980s)
• Client-server computing (1990s)

Timesharing Network
[Figure: terminals connected to a mainframe computer that holds the database]

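Resource Sharing with a Network of Personal Computers
[Figure: (a) remote procedure call: a procedure call is sent and results are returned; (b) file sharing: a file request is sent and the file is returned. In both panels the database sits on the computer that receives the request]
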
Client-Server Processing with
[Figure: two panels. "Distributed processing only": three clients connected to one server with one database. "Distributed processing and data": four clients connected to two servers, each with its own database]

Motivation for Distributed Processing
• Flexibility: the ease of maintaining and adapting a system
• Scalability: the ability to support scalable growth of hardware and software capacity
• Interoperability: open standards that allow two or more systems to exchange and use software and data

Motivation for Distributed Data
• Data control: locate data to match an organization’s structure
• Communication costs: locate data close to data usage to lower communication cost and improve performance
• Reliability: increase data availability by replicating data at more than one site

Summary of Distributed Processing and Data

              | Distributed Processing | Distributed Data
Advantages    | Flexibility, interoperability, scalability | Local control of data, improved performance, reduced communication costs, improved reliability
Disadvantages | High complexity, high development cost, possible interoperability problems | High complexity, additional security concerns

Client-Server Database Architectures
• A client-server architecture is an arrangement of components (clients and servers) among computers connected by a network.
• A client-server architecture supports efficient processing of messages (requests for service) between clients and servers.

Design Issues
• Division of processing: the allocation of tasks to clients and servers.
• Process management: interoperability among clients and servers and efficient processing of messages between clients and servers.
  – Middleware: software for process management

Tasks to Distribute
• Presentation: code to maintain the graphical user interface
• Validation: code to ensure the consistency of the database and user inputs
• Business logic: code to perform business functions
• Workflow: code to ensure completion of business processes
• Data access: code to extract data to answer queries and modify a database

Middleware
• A software component that performs process management.
• Allows clients and servers to exist on different platforms.
• Allows servers to efficiently process messages from a large number of clients.
• Often located on a dedicated computer.

Client-Server Computing with Middleware
[Figure: client-server computing with a middleware component]

Types of Middleware
• Transaction-processing monitors: relieve the operating system of managing database processes
• Message-oriented middleware: maintain a queue of messages
• Object-request brokers: provide a high level of interoperability and message intelligence
• Data access middleware: provide a uniform interface to relational and nonrelational data using SQL

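
The queueing idea behind message-oriented middleware can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class name MessageQueueMiddleware, the function echo_server, and the request strings are hypothetical, and a real product would add persistence, networking, and concurrency.

```python
from collections import deque


class MessageQueueMiddleware:
    """Hypothetical in-memory stand-in for message-oriented middleware."""

    def __init__(self):
        # Requests wait here until a server is ready to process them.
        self._queue = deque()

    def send(self, client_id, request):
        """Called by a client: enqueue a request and return immediately."""
        self._queue.append((client_id, request))

    def deliver_all(self, server):
        """Hand queued requests to the server in arrival (FIFO) order."""
        replies = []
        while self._queue:
            client_id, request = self._queue.popleft()
            replies.append((client_id, server(request)))
        return replies


def echo_server(request):
    """Placeholder server that only acknowledges the request."""
    return f"processed: {request}"


middleware = MessageQueueMiddleware()
middleware.send("client-1", "list customers")
middleware.send("client-2", "list orders")
replies = middleware.deliver_all(echo_server)
```

Because clients only enqueue messages, they do not have to wait for the server, which is the decoupling that a queue provides.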
Two-Tier Architecture
[Figure: a client sends SQL statements to a database server, which accesses the database and returns query results]

Two-Tier Client-Server Architecture
• A PC client and a database server interact directly to request and transfer data.
• The PC client contains the user interface code.
• The server contains the data access logic.
• The PC client and the server share the validation and business logic.

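Three-Tier Architecture (Middleware Server)
[Figure: a middleware server sits between the clients and the database server; SQL statements flow to the database server and query results flow back]

Three-Tier Architecture (Application Server)
[Figure: an application server sits between the clients and the database server; SQL statements flow to the database server and query results flow back]
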
Three-Tier Architecture
• To improve performance, the three-tier architecture adds another server layer, either a middleware server or an application server.
• The additional server software can reside on a separate computer.
• Alternatively, the additional server software can be distributed between the database server and PC clients.

Multiple-Tier Architecture
• A client-server architecture with more than three layers: a PC client, a backend database server, an intervening middleware server, and application servers.
• Provides more flexibility on division of processing
• The application servers perform business logic and manage specialized kinds of data such as images.

Multiple-Tier Architecture
[Figure: two application servers, a middleware server, and a database server with its database]

Multiple-Tier Architecture with Software Bus
[Figure: two application servers and a database server with its database, connected through a software bus]

Web Database Connectivity
• Internet commerce depends heavily on database access for websites.
• Web database connectivity allows a database to be manipulated through a Web page.
• A user may use a Web form to change a database or view a report generated from a database.

Internet Basics
• Network of networks
• Uses standard protocols: TCP/IP
  – TCP: splits messages into packets
  – IP: routes messages
• Each computer on the Internet has a unique numeric address known as an IP address.

Internet and Intranet Relationship
[Figure: computers on an intranet and computers on the Internet all communicate with TCP/IP; a firewall separates the intranet from the Internet]

World Wide Web
• Most popular application on the Internet
• Supports browsing pages located on any computer on the Internet
• Hypertext Transfer Protocol (HTTP) establishes a session between a browser and a Web server.
• Each page has a unique address known as a URL.

Web Page Request Cycle
1. User clicks hyperlink.
2. Browser sends request to web server.
3. Web server locates page.
4. Web server sends file.
5. Browser displays file.

XML/XSL
• Solutions to HTML limitations
• eXtensible Markup Language (XML)
  – Separates content and structure of a document
  – Uses a document type declaration or schema to specify document structure
• eXtensible Stylesheet Language (XSL): supports transformation into display languages
• Both are extensible languages

The Common Gateway Interface (CGI)
• CGI is an interface that allows a Web server to invoke an external program on the same computer.
• The external program uses the parameters passed by the Web server to produce output that is sent back to the browser.
• Usually, the output contains HTML/XML so that the browser can display it properly.
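
A minimal sketch of such an external program, written in Python, may make the parameter passing concrete. It assumes a GET request, for which CGI places the parameters in the QUERY_STRING environment variable. The function name build_response and the parameter name custno are hypothetical, and no database access is shown.

```python
import html
import os
from urllib.parse import parse_qs


def build_response(query_string: str) -> str:
    """Turn the parameters passed by the Web server into an HTML response."""
    params = parse_qs(query_string)
    # "custno" is a made-up parameter name used only for illustration.
    cust_no = params.get("custno", ["(missing)"])[0]
    body = (
        "<html><body>"
        f"<p>Customer number requested: {html.escape(cust_no)}</p>"
        "</body></html>"
    )
    # A CGI response starts with header lines, then a blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server sets QUERY_STRING before it invokes the program.
    print(build_response(os.environ.get("QUERY_STRING", "")), end="")
```

Escaping the parameter before echoing it keeps user input from being interpreted as markup by the browser.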

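Straight CGI
[Figure: the Web server passes parameters to an external program; the external program sends SQL to the database server, receives results, and returns HTML/XML to the Web server]

Hybrid CGI
[Figure: the Web server passes parameters to an external program, which passes parameters to a partner program; the partner program sends SQL to the database server, receives results, and returns HTML/XML through the external program to the Web server]
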
Server-Side Connectivity
• Server-side connectivity bypasses the external program needed with the CGI approaches.
• A specialized Web server or middleware server is needed.
• SQL statements and database logic are kept in a web page or external file.
• The database code can execute stored procedures on the database server.

Server-Side Connectivity Approach
[Figure: a Web server with a middleware extension reads SQL statements and formatting requirements, sends SQL to the database server, and produces HTML/XML]

Server-Side Connectivity with a Middleware Server
[Figure: the Web server sends a database request to a middleware server with a listener; the middleware server reads SQL statements and formatting requirements, sends SQL to the database server, receives results, and returns HTML/XML to the Web server]

Client-Side Connectivity
• Client computing capacity can be more fully utilized without storing code on the client.
• Provides a more customized interface than permitted by HTML
• Supports data buffering by the client to improve performance

Web Page Request Cycle with Client-Side Connectivity
• Browser sends request to web server.
• Web server sends file containing HTML/XML and embedded applet (Java) or binary object (ActiveX).

Summary of Web Connectivity

Approach | Architecture | Example Product | Comments
Straight CGI | Two-tier | Apache Web server with external PERL program | Inexpensive; portability only limited by external program; limited scalability
Hybrid CGI | Two-tier | Apache Web server with Cold Fusion server extensions | More expensive than straight CGI; portability depends on partner program; scalable to modest loads
Server-side connectivity | Three-tier and multiple-tier | Internet Information Server with Microsoft Transaction Server | Expensive; Web server dependent; highly scalable
Server-side connectivity (Middleware) | Three-tier and multiple-tier | Oracle Application Server | Expensive; Web server independent; highly scalable
Client-side connectivity | Two- and multiple-tier | Microsoft Remote Data Service | Customized client interface; efficient data buffering; usually works with server-side connectivity approaches

Architectures for Distributed Database Management Systems
• DBMSs need fundamental extensions to support distributed databases.
• Underlying the extensions are a different component architecture and a different schema architecture.
• The component architecture manages distributed database requests.
• The schema architecture provides additional layers of data description.

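Global Requests
[Figure: global requests involving product data and customer-order data; each label appears twice in the diagram]

Component Architecture
[Figure: three sites (Site 1, Site 2, Site 3). Components shown are GD, DDM, LDM, and DB; the diagram contains three GD boxes, three DDM boxes, two LDM boxes, and two DB boxes]
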
Schema Architecture I
[Figure: external schemas 1 through n map to a conceptual schema, followed by a fragmentation schema and an allocation schema, which map to internal schemas 1 through m, one for each of the m sites]

Schema Architecture II
[Figure: global external schemas 1 through n map to a global conceptual schema. For each of the m sites, a local mapping schema connects the global conceptual schema to that site's local schemas (conceptual, internal, external)]

Transparency for Distributed Database Processing
• Transparency is related to data independence.
• With transparency, users can write queries with no knowledge of the distribution, and distribution changes will not cause changes to existing queries and transactions.
• Without transparency, users must reference some distribution details in queries, and distribution changes can lead to changes in existing queries.

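Motivating Example
[Figure: relationship diagram of the Customer, Order, OrderLine, Product, and Inventory tables connected by 1-to-many relationships. Fields shown: Customer (CustNo, CustName, CustCity, CustState, CustZip, CustRegion); Product (ProdNo, ProdName, ProdColor, ProdPrice); Inventory (StockNo, QOH, WarehouseNo, ProdNo); Order and OrderLine (OrdNo, OrdCity, OrdDate, OrdAmt, CustNo, ProdNo)]
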
Fragments Based on the CustRegion Field

CREATE FRAGMENT Western-Customers AS
  SELECT * FROM Customer WHERE CustRegion = 'West'

CREATE FRAGMENT Western-Orders AS
  SELECT Order.* FROM Order, Customer
  WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'West'

CREATE FRAGMENT Western-OrderLines AS
  SELECT OrderLine.* FROM Customer, OrderLine, Order
  WHERE OrderLine.OrdNo = Order.OrdNo
    AND Order.CustNo = Customer.CustNo AND CustRegion = 'West'

CREATE FRAGMENT Eastern-Customers AS
  SELECT * FROM Customer WHERE CustRegion = 'East'

CREATE FRAGMENT Eastern-Orders AS
  SELECT Order.* FROM Order, Customer
  WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'East'

CREATE FRAGMENT Eastern-OrderLines AS
  SELECT OrderLine.* FROM Customer, OrderLine, Order
  WHERE OrderLine.OrdNo = Order.OrdNo
    AND Order.CustNo = Customer.CustNo AND CustRegion = 'East'

Fragments Based on the WareHouseNo Field

CREATE FRAGMENT Denver-Inventory AS
  SELECT * FROM Inventory WHERE WareHouseNo = 1

CREATE FRAGMENT Seattle-Inventory AS
  SELECT * FROM Inventory WHERE WareHouseNo = 2

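
CREATE FRAGMENT is the notation used on these slides rather than a statement that common SQL products accept. As a rough way to experiment with the same definitions, the sketch below uses ordinary views in an in-memory SQLite database through Python's sqlite3 module. The views only mimic the row selection of the fragments; nothing is actually distributed. The sample rows, the reduced column lists, and the underscore view names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of the Customer and Order tables (Order is quoted
-- because it is a reserved word in SQL).
CREATE TABLE Customer (CustNo TEXT PRIMARY KEY, CustName TEXT, CustRegion TEXT);
CREATE TABLE "Order" (OrdNo TEXT PRIMARY KEY, OrdAmt REAL, CustNo TEXT);

-- Made-up sample rows.
INSERT INTO Customer VALUES ('C1', 'Ann', 'West'), ('C2', 'Bob', 'East');
INSERT INTO "Order" VALUES ('O1', 100.0, 'C1'), ('O2', 250.0, 'C2'), ('O3', 75.0, 'C1');

-- Fragment of Customer defined by a condition on CustRegion.
CREATE VIEW Western_Customers AS
  SELECT * FROM Customer WHERE CustRegion = 'West';

-- Fragments of Order derived by joining to Customer.
CREATE VIEW Western_Orders AS
  SELECT "Order".* FROM "Order", Customer
  WHERE "Order".CustNo = Customer.CustNo AND CustRegion = 'West';
CREATE VIEW Eastern_Orders AS
  SELECT "Order".* FROM "Order", Customer
  WHERE "Order".CustNo = Customer.CustNo AND CustRegion = 'East';
""")

western_orders = [row[0] for row in conn.execute(
    "SELECT OrdNo FROM Western_Orders ORDER BY OrdNo")]
eastern_orders = [row[0] for row in conn.execute(
    "SELECT OrdNo FROM Eastern_Orders ORDER BY OrdNo")]
total_orders = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
```

With this sample data every order lands in exactly one of the two order fragments, which is the property a region-based fragmentation is meant to have.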
Fragmentation Transparency
• Fragmentation transparency provides the highest level of data independence.
• Users formulate queries and transactions without knowledge of fragments, locations, or local formats.
• If fragments change, queries and transactions are not affected.

Location Transparency
• Location transparency provides a lesser level of data independence than fragmentation transparency.
• Users need to reference fragments in formulating queries and transactions.
• However, knowledge of locations and local formats is not necessary.

Local Mapping Transparency
• Local mapping transparency provides a lesser level of data independence than location transparency.
• Users need to reference fragments at sites in formulating queries and transactions.
• However, knowledge of local formats is not necessary.
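
The difference among the three levels is what a user has to name when reading data. The Python sketch below illustrates this with plain dictionaries standing in for a fragmentation schema and an allocation schema. The site names, the sample rows, and all function names are hypothetical, and local formats are ignored.

```python
# Hypothetical site contents: site name -> fragment name -> rows.
SITES = {
    "Site1": {"Western-Customers": [{"CustNo": "C1", "CustRegion": "West"}]},
    "Site2": {"Eastern-Customers": [{"CustNo": "C2", "CustRegion": "East"}]},
}

# Fragmentation schema: which fragments make up each table.
FRAGMENTATION_SCHEMA = {"Customer": ["Western-Customers", "Eastern-Customers"]}

# Allocation schema: the site where each fragment is stored.
ALLOCATION_SCHEMA = {"Western-Customers": "Site1", "Eastern-Customers": "Site2"}


def read_fragment_at_site(fragment, site):
    """Local mapping transparency: the user names the fragment and the site."""
    return list(SITES[site][fragment])


def read_fragment(fragment):
    """Location transparency: the user names only the fragment."""
    return read_fragment_at_site(fragment, ALLOCATION_SCHEMA[fragment])


def read_table(table):
    """Fragmentation transparency: the user names only the table."""
    rows = []
    for fragment in FRAGMENTATION_SCHEMA[table]:
        rows.extend(read_fragment(fragment))
    return rows


all_customers = read_table("Customer")
western_only = read_fragment("Western-Customers")
```

If a fragment moves to another site, only ALLOCATION_SCHEMA changes: callers of read_table and read_fragment are unaffected, while callers of read_fragment_at_site must be rewritten.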

Distributed Database Processing
• Distributed data adds considerable complexity to query processing and transaction processing.
• Distributed database processing involves movement of data, remote processing, and site coordination.
• Performance implications sometimes cannot be hidden.

Distributed Query Processing
• Involves both local (intra-site) and global (inter-site) optimization.
• Multiple optimization objectives
• The weighting of communication costs versus local processing costs depends on network characteristics.
• There are many more possible access plans for a distributed query.
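
A small worked example shows how the weighting can change which access plan looks best. The linear cost formula, the plan names, and every number below are assumptions chosen for illustration. They do not come from the slides or from any particular optimizer.

```python
def plan_cost(local_cost, bytes_moved, comm_weight):
    """Assumed cost model: local processing plus weighted data movement."""
    return local_cost + comm_weight * bytes_moved


# Hypothetical plans: (local processing cost, bytes sent between sites).
PLANS = {
    "ship whole table, filter locally": (10.0, 1_000_000),
    "filter at remote site, ship result": (40.0, 50_000),
}


def cheapest_plan(plans, comm_weight):
    """Return the name of the plan with the lowest weighted cost."""
    return min(plans, key=lambda name: plan_cost(*plans[name], comm_weight))


# A slow network gets a high weight per byte; a fast one gets a low weight.
slow_network_choice = cheapest_plan(PLANS, comm_weight=0.001)
fast_network_choice = cheapest_plan(PLANS, comm_weight=0.00001)
```

With the slow-network weight the costs are about 1010 versus 90, so filtering remotely wins. With the fast-network weight they are about 20 versus 40.5, so shipping the whole table wins.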

Distributed Transaction Processing
• Distributed DBMS provides concurrency and recovery transparency.
• Independently operating sites must be coordinated.
• New kinds of failures exist because of the communication network.
• New protocols are necessary.

Distributed Concurrency Control
• The simplest scheme involves centralized coordination.
• Centralized coordination involves the fewest messages and the simplest deadlock detection.
• Distributed coordination can require twice as many messages.
• The primary copy protocol is used to reduce the overhead of locking multiple copies.

Centralized Coordination
[Figure: subtransaction 1 at site x, subtransaction 2 at site y, through subtransaction n at site z each send lock requests to a central coordinator, which returns the lock status]

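
The exchange of lock requests and lock status with a central coordinator can be sketched as a single shared lock table. This is a deliberately simplified illustration: it supports exclusive locks only, has no wait queues or deadlock detection, and replaces network messages with method calls. The class and method names are hypothetical.

```python
class CentralCoordinator:
    """Hypothetical central lock manager that holds the only lock table."""

    def __init__(self):
        # Maps a data item to the subtransaction that currently holds it.
        self._locks = {}

    def request_lock(self, item, subtransaction):
        """Return the lock status for a request from any site."""
        holder = self._locks.get(item)
        if holder is None or holder == subtransaction:
            self._locks[item] = subtransaction
            return "granted"
        return "wait"

    def release(self, item, subtransaction):
        """Release the lock only if the caller actually holds it."""
        if self._locks.get(item) == subtransaction:
            del self._locks[item]


coordinator = CentralCoordinator()
first = coordinator.request_lock("Customer:C1", "subtransaction 1 at site x")
second = coordinator.request_lock("Customer:C1", "subtransaction 2 at site y")
coordinator.release("Customer:C1", "subtransaction 1 at site x")
third = coordinator.request_lock("Customer:C1", "subtransaction 2 at site y")
```

Because every request passes through one lock table, conflicts between subtransactions at different sites are visible in one place.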
Distributed Recovery Management
• Distributed DBMSs must contend with failures of communication links and sites.
• Detecting failures involves coordination among sites.
• The recovery manager must ensure that different parts of a partitioned network act in unison.
• The protocol for distributed recovery is the two-phase commit protocol (2PC).

Voting and Decision Phases

Voting phase
1. Coordinator: Write Begin-Commit to log. Send Ready messages. Wait for responses.
2. Participant: Force updates to disk. If no failure, write Ready-Commit to log and send Ready vote. Else send Abort vote.

Decision phase
3. Coordinator: If all sites vote ready before timeout, write Global Commit record, send Commit messages, and wait for acknowledgments. Else send Abort messages.
4. Participant: Write Commit to log. Release locks. Send acknowledgment.
5. Coordinator: Wait for acknowledgments. Resend Commit messages if necessary. Write global end of transaction.

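
The five steps can be traced with a small in-process simulation in Python. It is a sketch under strong assumptions: messages are method calls, no timeouts or resends are modeled, and a participant failure is represented by a can_commit flag. The class name, function name, and exact log strings are illustrative rather than prescribed by the protocol.

```python
class Participant:
    """Hypothetical participant site in a two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []

    def vote(self):
        # Step 2: updates are assumed to be forced to disk before voting.
        if self.can_commit:
            self.log.append("Ready-Commit")
            return "ready"
        return "abort"

    def decide(self, decision):
        # Step 4: record the global decision, release locks, acknowledge.
        self.log.append(decision)
        return "ack"


def two_phase_commit(participants):
    """Run the voting and decision phases from the coordinator's side."""
    log = ["Begin-Commit"]                          # step 1
    votes = [p.vote() for p in participants]        # voting phase
    decision = "Commit" if all(v == "ready" for v in votes) else "Abort"
    log.append(f"Global {decision}")                # step 3
    acks = [p.decide(decision) for p in participants]
    if all(a == "ack" for a in acks):               # step 5
        log.append("End-of-Transaction")
    return decision, log


ok_sites = [Participant("site x"), Participant("site y")]
commit_result = two_phase_commit(ok_sites)

mixed_sites = [Participant("site x"), Participant("site z", can_commit=False)]
abort_result = two_phase_commit(mixed_sites)
```

A single abort vote is enough to abort the transaction at every site, including sites that voted ready.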
Summary
• Utilizing distributed processing and data can significantly improve DBMS services but at the cost of new design challenges.
• Several client-server architectures provide alternatives among cost, complexity, and benefit levels.
• Architectures for distributed DBMSs differ in the integration of the local databases and level of data independence.
