Replication Extracts from Books Online
Planning for Replication
Careful planning before replication deployment can maximize data consistency, minimize
demands on network resources, and reduce the need for troubleshooting later.
Consider these areas when planning for replication:
• Whether replicated data needs to be updated, and by whom.
• Your data distribution needs regarding consistency, autonomy, and latency.
• The replication environment, including business users, technical infrastructure,
network and security, and data characteristics.
• Types of replication and replication options.
• Replication topologies and how they align with the types of replication.
Types of Replication
Microsoft® SQL Server™ 2000 provides the following types of replication that you can
use in your distributed applications:
• Snapshot replication
• Transactional replication
• Merge replication
Each type provides different capabilities depending on your application, and different
levels of ACID properties (atomicity, consistency, isolation, durability) of transactions
and site autonomy. For example, merge replication allows users to work and update data
autonomously, although ACID properties are not assured. Instead, when servers are
reconnected, all sites in the replication topology converge to the same data values.
Transactional replication maintains transactional consistency, but Subscriber sites are not
as autonomous as they are in merge replication because Publishers and Subscribers
generally should be connected continuously for updates to be propagated to Subscribers.
It is possible for the same application to use multiple replication types and options. Some
of the data in the application may not require any updates at Subscribers, some sets of
data may require updates infrequently, with updates made at only one or a few servers,
while other sets of data may need to be updated daily at multiple servers.
Which type of replication you choose for your application depends on your requirements
based on distributed data factors, whether or not data will need to be updated at the
Subscriber, your replication environment, and the needs and requirements of the data that
will be replicated. For more information, see Planning for Replication.
Each type of replication begins with generating and applying the snapshot at the
Subscriber, so it is important to understand snapshot replication in addition to any other
type of replication and options you choose.
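The trade-offs among the three types described in this section can be sketched as a small decision helper. This is a deliberate simplification, not a planning tool: the function name and its two yes/no parameters are hypothetical, and real planning also weighs latency, environment, and data characteristics.

```python
def suggest_replication_type(subscribers_update_offline: bool,
                             needs_transactional_consistency: bool) -> str:
    """Return a rough starting point for choosing a replication type.

    Hypothetical helper: it encodes only the trade-offs named in the text.
    """
    if subscribers_update_offline:
        # Sites work autonomously and converge later; ACID is not assured.
        return "merge"
    if needs_transactional_consistency:
        # Keeps transactional consistency; sites should generally stay connected.
        return "transactional"
    # Every type begins with a snapshot, so it is the simplest baseline.
    return "snapshot"


print(suggest_replication_type(True, False))
print(suggest_replication_type(False, True))
print(suggest_replication_type(False, False))
```

Because one application can mix replication types, a helper like this would be applied per set of data rather than once per application.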
Publishers, Distributors, and Subscribers
Before you configure publishing and distribution, consider the roles and requirements of
the servers in your replication topology.
Publisher
The Publisher is a server that makes data available for replication to other servers. In
addition to being the server where you specify which data is to be replicated, the
Publisher also detects which data has changed and maintains information about all
publications at that site. Usually, any data element that is replicated has a single
Publisher, even if it may be updated by several Subscribers or republished by a
Subscriber.
The publication database is the database on the Publisher that is the source of data and
database objects to be replicated. Each database used in replication must be enabled as a
publication database either through the Configure Publishing and Distribution Wizard,
the Publisher and Distributor properties, by using the sp_replicationdboption system
stored procedure, or by creating a publication on that database using the Create
Publication Wizard.
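As an illustration of the stored procedure route, the sketch below builds the Transact-SQL call as a string. The Python function name is hypothetical, and the parameter names (@dbname, @optname, @value) and option values ('publish', 'merge publish') are given as recalled from the procedure's documented signature; verify them against Books Online before use.

```python
def enable_publishing_sql(db_name: str, merge: bool = False) -> str:
    """Build a T-SQL statement that enables a database for publishing.

    Hypothetical helper; it only formats text and does not contact a server.
    """
    # 'merge publish' enables merge publications; 'publish' covers
    # snapshot and transactional publications.
    optname = "merge publish" if merge else "publish"
    # Double any single quotes so the name stays inside the string literal.
    safe_name = db_name.replace("'", "''")
    return (
        "EXEC sp_replicationdboption "
        f"@dbname = N'{safe_name}', "
        f"@optname = N'{optname}', "
        "@value = N'true'"
    )


print(enable_publishing_sql("Northwind"))
print(enable_publishing_sql("Northwind", merge=True))
```

The generated statement would be run at the Publisher by a login with sufficient permissions; the wizards mentioned above perform the equivalent step for you.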
Distributor
The Distributor is a server that contains the distribution database and stores meta data,
history data, and/or transactions. The Distributor can be a separate server from the
Publisher (remote Distributor), or it can be the same server as the Publisher (local
Distributor). The role of the Distributor varies depending on which type of replication
you implement, and in general, its role is much greater for snapshot replication and
transactional replication than it is for merge replication.
Type of Replication: Snapshot Replication or Transactional Replication
Distributor role:
• Stores replicated transactions temporarily for transactional replication.
• Hosts most of the replication agents unless remote agent activation or pull
subscriptions are used.
• Stores meta data and history data.

Type of Replication: Merge Replication
Distributor role:
• Stores meta data and synchronization history.
• Hosts the snapshot agent and merge agent for push subscriptions.
A Distributor may require additional resources to:
• Store the snapshot files for a publication.
• Host one or more distribution databases.
• Host processing for most replication agents (for pull subscriptions, the Merge
Agent or Distribution Agent runs at the Subscriber).
Remote Distributors
A remote Distributor is a computer that is physically separate from the Publisher and is
configured as a Distributor of replication. A local Distributor is a computer that is
configured to be both a Publisher and a Distributor of replication.
When you create a publication, the default snapshot folder location is on the Distributor.
Typically, you would choose to use a remote Distributor when you want to offload
processing to another computer, when you want minimal impact from replication on the
Publisher (for example, if the Publisher is an OLTP server), or if you want a centralized
Distributor for multiple Publishers.
Subscribers
Subscribers are servers that receive replicated data. Subscribers subscribe to publications,
not to individual articles within a publication, and they subscribe only to the publications
that they need, not necessarily all of the publications available on a Publisher.
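The rule that Subscribers subscribe to whole publications, not to individual articles, can be pictured with a small data model. This is a minimal sketch: the class and attribute names are hypothetical and do not correspond to SQL Server objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Publication:
    name: str
    articles: tuple  # names of the tables or objects in the publication


@dataclass
class Subscriber:
    name: str
    subscriptions: list = field(default_factory=list)

    def subscribe(self, publication: Publication) -> None:
        # The unit of subscription is the publication as a whole.
        self.subscriptions.append(publication)

    def received_articles(self) -> list:
        # Every article of every subscribed publication arrives.
        return sorted({a for p in self.subscriptions for a in p.articles})


sales = Publication("Sales", ("Orders", "Customers"))
hr = Publication("HR", ("Employees",))

branch = Subscriber("Branch1")
branch.subscribe(sales)  # Branch1 needs Sales only, not HR
print(branch.received_articles())
```

Here the Subscriber receives both articles of the Sales publication and nothing from HR, which it did not subscribe to.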
Designing a Replication Topology
A replication topology defines the relationship between servers and the copies of data,
along with the logic that determines how synchronization occurs between copies.
Designing a replication topology helps you determine how long it takes for changes to get
from a Publisher to a Subscriber, whether the failure of one update prevents other
Subscribers from being updated, and the order in which updated information arrives at a
Subscriber, which can affect analysis and reporting.
To determine your replication topology:
• Select the physical replication model (central Publisher, central Publisher with
remote Distributor, publishing Subscriber, or central Subscriber).
• Determine where snapshot files will be located and how Publishers and
Subscribers will synchronize initially.
• Determine whether the Distributor will be local or remote, and determine whether
the distribution database will be shared.
• Determine if multiple Publishers will share a Distributor, each use its own
distribution database on the Publisher, or share a distribution database.
• Determine the type of replication and options to use.
• Determine whether replication is initiated at the Publisher (using push
subscriptions) or at the Subscriber (using pull subscriptions).
The replication topology is not limited to the physical connections between servers
because it also includes data paths between copies of the data. A Subscriber can receive
multiple copies of data from different Publishers, and all of those data copies can exist on
one server, resulting in a complicated topology.
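Several of the topology decisions listed in this section can be recorded in one place and checked for obvious contradictions. The sketch below assumes a hypothetical TopologyPlan record; the allowed values are taken from the text, but the class, its fields, and its single consistency rule are illustrative only.

```python
from dataclasses import dataclass

PHYSICAL_MODELS = (
    "central Publisher",
    "central Publisher with remote Distributor",
    "publishing Subscriber",
    "central Subscriber",
)


@dataclass(frozen=True)
class TopologyPlan:
    physical_model: str
    distributor: str        # "local" or "remote"
    replication_type: str   # "snapshot", "transactional", or "merge"
    subscription_mode: str  # "push" or "pull"

    def problems(self) -> list:
        """Return a list of human-readable problems; empty means none found."""
        found = []
        if self.physical_model not in PHYSICAL_MODELS:
            found.append("unknown physical model")
        if self.distributor not in ("local", "remote"):
            found.append("distributor must be local or remote")
        if self.replication_type not in ("snapshot", "transactional", "merge"):
            found.append("unknown replication type")
        if self.subscription_mode not in ("push", "pull"):
            found.append("subscription mode must be push or pull")
        if (self.physical_model == "central Publisher with remote Distributor"
                and self.distributor != "remote"):
            found.append("this physical model implies a remote Distributor")
        return found


plan = TopologyPlan("central Publisher", "local", "transactional", "push")
print(plan.problems())
```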
Replication Options
Replication options allow you to configure replication in a manner best suited to your
application and environment.
Option: Filtering Published Data
Type of Replication: Snapshot Replication, Transactional Replication, Merge Replication
Benefits: Filters allow you to create vertical and/or horizontal partitions of data that
can be published as part of replication. By distributing partitions of data to different
Subscribers, you can:
• Minimize the amount of data sent over the network.
• Reduce the amount of storage space required at the Subscriber.
• Customize publications and applications based on individual Subscriber
requirements.
• Reduce conflicts because the different data partitions can be sent to different
Subscribers.
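To make the two kinds of partition concrete, the sketch below filters an in-memory table: a horizontal filter keeps a subset of rows, and a vertical filter keeps a subset of columns. The sample rows, column names, and function names are invented for illustration; in SQL Server the filters are defined on the articles of a publication, not in application code.

```python
customers = [
    {"id": 1, "region": "East", "name": "Avery", "credit_limit": 500},
    {"id": 2, "region": "West", "name": "Blake", "credit_limit": 900},
    {"id": 3, "region": "East", "name": "Casey", "credit_limit": 250},
]


def horizontal_filter(rows, predicate):
    """Keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


def vertical_filter(rows, columns):
    """Keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in rows]


# Rows for the East office only, without the credit_limit column.
east_rows = horizontal_filter(customers, lambda r: r["region"] == "East")
east_partition = vertical_filter(east_rows, ["id", "region", "name"])
print(east_partition)
```

The East Subscriber would receive two rows of three columns instead of three rows of four, which is the reduction in network traffic and storage that the list above describes.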
Synchronizing Data
Synchronizing data refers to the process of data being propagated between Publisher and
Subscribers after the initial snapshot has been applied at the Subscriber. When a
subscription is synchronized, different processes occur depending on the type of
replication you are using and whether the subscription has been marked for
reinitialization.
For snapshot replication, synchronizing means reapplying the snapshot at the Subscriber so
that schema and data at the subscription database are consistent with the publication
database. For transactional replication, synchronizing data means that data updates,
inserts, deletes, and other modifications are distributed between Publisher and
Subscribers. For merge replication, synchronization means that data updates made at
multiple sites are merged, conflicts (if any) are detected and resolved, and data eventually
converges to the same values.
The Distribution Agent and the Merge Agent move changes to data that occur at the
Publisher or at Subscribers. For consistency, Microsoft® SQL Server™ 2000 replication
uses the term synchronize to refer to a run of either of these replication agents.
Snapshot Replication Synchronization
When a subscription to a snapshot publication is synchronized, the Distribution Agent
(using distrib.exe or the Distribution ActiveX® Control) runs and the most recent
snapshot will be applied at the Subscriber. If modifications to data have been made, a
new snapshot will need to be generated before the new data can be applied to the
Subscriber.
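The behavior described above, where changes made after a snapshot is generated do not reach the Subscriber until a new snapshot exists, can be simulated in a few lines. This is a toy model with hypothetical names; plain dictionaries stand in for databases, and nothing here calls distrib.exe or any SQL Server component.

```python
import copy


class SnapshotPublication:
    def __init__(self, data):
        self.data = data             # live data at the Publisher
        self.latest_snapshot = None  # most recent generated snapshot

    def generate_snapshot(self):
        # Stands in for the Snapshot Agent: capture the data as of now.
        self.latest_snapshot = copy.deepcopy(self.data)


def synchronize(publication, subscriber_db):
    """Stand-in for a Distribution Agent run: reapply the latest snapshot."""
    if publication.latest_snapshot is None:
        raise RuntimeError("no snapshot has been generated yet")
    subscriber_db.clear()
    subscriber_db.update(copy.deepcopy(publication.latest_snapshot))


pub = SnapshotPublication({"price_widget": 10})
sub = {}

pub.generate_snapshot()
synchronize(pub, sub)            # Subscriber now has price_widget = 10

pub.data["price_widget"] = 20    # change made after the snapshot
synchronize(pub, sub)            # still 10: the old snapshot is reapplied
stale_value = sub["price_widget"]

pub.generate_snapshot()
synchronize(pub, sub)            # now 20
print(stale_value, sub["price_widget"])
```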
Transactional Replication Synchronization
When a subscription to a transactional publication is synchronized, the Distribution
Agent (using distrib.exe or the Distribution ActiveX Control) runs and UPDATE,
INSERT and DELETE statements that have been logged at the Distributor are propagated
to the Subscriber.
If the subscription has been marked for reinitialization, the Snapshot Agent and
Distribution Agent must run so that a new snapshot is generated and propagated to
Subscribers.
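The flow described above, in which logged statements are held at the Distributor and propagated to the Subscriber when the subscription is synchronized, can be sketched as a toy model. All class and function names are hypothetical, a list stands in for the distribution database, and a dictionary stands in for the subscription database.

```python
def apply_command(db, command):
    """Apply one logged (operation, key, value) command to a dict database."""
    op, key, value = command
    if op in ("INSERT", "UPDATE"):
        db[key] = value
    elif op == "DELETE":
        db.pop(key, None)
    else:
        raise ValueError(f"unsupported operation: {op}")


class Distributor:
    def __init__(self):
        self.commands = []  # logged commands, in commit order

    def log(self, op, key, value=None):
        self.commands.append((op, key, value))


class Subscription:
    def __init__(self):
        self.db = {}
        self.next_seq = 0  # index of the first command not yet applied

    def synchronize(self, distributor):
        """Stand-in for a Distribution Agent run; returns commands applied."""
        pending = distributor.commands[self.next_seq:]
        for command in pending:
            apply_command(self.db, command)  # preserve the logged order
        self.next_seq += len(pending)
        return len(pending)


dist = Distributor()
dist.log("INSERT", "order-1", "pending")
dist.log("UPDATE", "order-1", "shipped")
dist.log("INSERT", "order-2", "pending")
dist.log("DELETE", "order-2")

subscription = Subscription()
applied = subscription.synchronize(dist)
print(applied, subscription.db)
```

Tracking the position of the last applied command is what lets a later synchronization deliver only new changes instead of the whole data set, in contrast to snapshot replication.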
Merge Replication Synchronization
Synchronization occurs when Publishers and Subscribers in a merge replication topology
reconnect using the Merge Agent (replmerg.exe or the Merge ActiveX Control): updates are
propagated between sites and, if necessary, conflicts are detected and resolved.
At the time of synchronization, the Merge Agent sends all changed data to the other sites.
Data flows from the originator of the change to the sites that need to be updated or
synchronized.
At the destination database, updates propagated from other sites are merged with existing
values according to extensible and flexible conflict detection and resolution. A Merge
Agent evaluates the arriving and current data values, and any conflicts between new and
old values are resolved automatically based on the default resolver, a resolver you
specified when creating the publication, or a custom resolver.
Changed data values are replicated to other sites and converged with changes made at
those sites only when synchronization occurs. Synchronizations can occur minutes, days,
or even weeks apart. Data is converged and all sites eventually end up with the same data
values. However, if conflicts were detected and resolved, it means that work that was
committed by some users was altered or undone to resolve the conflict according to your
defined policies.
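The convergence and conflict behavior described in this section can be illustrated with a two-site toy model. Everything here is an assumption made for illustration: dictionaries stand in for databases, changes since the last synchronization are tracked in separate dictionaries, and a simple "Publisher wins" function stands in for a resolver. It is not a description of how the Merge Agent is implemented.

```python
def publisher_wins(key, publisher_value, subscriber_value):
    """Hypothetical resolver: the Publisher's change always wins."""
    return publisher_value


def merge_sync(publisher, subscriber, pub_changes, sub_changes, resolver):
    """Exchange changes made since the last sync and resolve conflicts.

    Returns (key, publisher_value, subscriber_value, winner) per conflict.
    """
    conflicts = []
    for key in sorted(set(pub_changes) | set(sub_changes)):
        changed_at_pub = key in pub_changes
        changed_at_sub = key in sub_changes
        if (changed_at_pub and changed_at_sub
                and pub_changes[key] != sub_changes[key]):
            # Both sites changed the same item differently: a conflict.
            winner = resolver(key, pub_changes[key], sub_changes[key])
            conflicts.append((key, pub_changes[key], sub_changes[key], winner))
        elif changed_at_pub:
            winner = pub_changes[key]
        else:
            winner = sub_changes[key]
        publisher[key] = winner   # both sites converge on the same value
        subscriber[key] = winner
    pub_changes.clear()
    sub_changes.clear()
    return conflicts


publisher = {"a": 1, "b": 2}
subscriber = dict(publisher)

# Offline work at each site since the last synchronization.
publisher["a"] = 10
pub_changes = {"a": 10}
subscriber["a"] = 11
subscriber["c"] = 3
sub_changes = {"a": 11, "c": 3}

conflicts = merge_sync(publisher, subscriber, pub_changes, sub_changes,
                       publisher_wins)
print(publisher, conflicts)
```

After the run both sites hold the same values, and the returned conflict record shows that the Subscriber's committed change to "a" was overridden, which is the cost of convergence noted above.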