Replication Extracts from Books Online

Planning for Replication

Careful planning before replication deployment can maximize data consistency, minimize demands on network resources, and prevent troubleshooting later. Consider these areas when planning for replication:

- Whether replicated data needs to be updated, and by whom.
- Your data distribution needs regarding consistency, autonomy, and latency.
- The replication environment, including business users, technical infrastructure, network and security, and data characteristics.
- Types of replication and replication options.
- Replication topologies and how they align with the types of replication.

Types of Replication

Microsoft® SQL Server™ 2000 provides the following types of replication that you can use in your distributed applications:

- Snapshot replication
- Transactional replication
- Merge replication

Each type provides different capabilities depending on your application, and different levels of ACID properties (atomicity, consistency, isolation, durability) of transactions and site autonomy. For example, merge replication allows users to work and update data autonomously, although ACID properties are not assured. Instead, when servers are reconnected, all sites in the replication topology converge to the same data values. Transactional replication maintains transactional consistency, but Subscriber sites are not as autonomous as they are in merge replication because Publishers and Subscribers generally should be connected continuously for updates to be propagated to Subscribers.

It is possible for the same application to use multiple replication types and options. Some of the data in the application may not require any updates at Subscribers, some sets of data may require updates infrequently, with updates made at only one or a few servers, while other sets of data may need to be updated daily at multiple servers.
Which type of replication you choose for your application depends on your distributed data requirements, whether data will need to be updated at the Subscriber, your replication environment, and the needs and requirements of the data that will be replicated. For more information, see Planning for Replication.

Each type of replication begins with generating and applying the snapshot at the Subscriber, so it is important to understand snapshot replication in addition to any other type of replication and options you choose.

Publishers, Distributors, and Subscribers

Before you configure publishing and distribution, consider the roles and requirements of the servers in your replication topology.

Publisher

The Publisher is a server that makes data available for replication to other servers. In addition to being the server where you specify which data is to be replicated, the Publisher also detects which data has changed and maintains information about all publications at that site. Usually, any data element that is replicated has a single Publisher, even if it may be updated by several Subscribers or republished by a Subscriber.

The publication database is the database on the Publisher that is the source of data and database objects to be replicated. Each database used in replication must be enabled as a publication database in one of these ways: through the Configure Publishing and Distribution Wizard, through the Publisher and Distributor properties, by using the sp_replicationdboption system stored procedure, or by creating a publication on that database using the Create Publication Wizard.

Distributor

The Distributor is a server that contains the distribution database and stores meta data, history data, and/or transactions. The Distributor can be a separate server from the Publisher (remote Distributor), or it can be the same server as the Publisher (local Distributor).
The role of the Distributor varies depending on which type of replication you implement, and in general, its role is much greater for snapshot replication and transactional replication than it is for merge replication.

Type of Replication: Snapshot Replication or Transactional Replication
Distributor role:
- Stores replicated transactions temporarily for transactional replication.
- Hosts most of the replication agents unless remote agent activation or pull subscriptions are used.
- Stores meta data and history data.

Type of Replication: Merge Replication
Distributor role:
- Stores meta data and synchronization history.
- Hosts the Snapshot Agent and Merge Agent for push subscriptions.

A Distributor may require additional resources to:

- Store the snapshot files for a publication.
- Host one or more distribution databases.
- Host processing for most replication agents (for pull subscriptions, the Merge Agent or Distribution Agent runs at the Subscriber).

Remote Distributors

A remote Distributor is a computer that is physically separate from the Publisher and is configured as a Distributor of replication. A local Distributor is a computer that is configured to be both a Publisher and a Distributor of replication. When you create a publication, the default snapshot folder location is on the Distributor.

Typically, you would choose to use a remote Distributor when you want to offload processing to another computer, when you want minimal impact from replication on the Publisher (for example, if the Publisher is an OLTP server), or if you want a centralized Distributor for multiple Publishers.

Subscribers

Subscribers are servers that receive replicated data. Subscribers subscribe to publications, not to individual articles within a publication, and they subscribe only to the publications that they need, not necessarily all of the publications available on a Publisher.
Designing a Replication Topology

A replication topology defines the relationship between servers and the copies of data, along with the logic that determines how synchronization occurs between copies. Designing a replication topology helps you determine how long it takes for changes to get from a Publisher to a Subscriber, whether the failure of one update prevents other Subscribers from being updated, and the order in which updated information arrives at a Subscriber, which can affect analysis and reporting.

To determine your replication topology:

- Select the physical replication model (central Publisher, central Publisher with remote Distributor, publishing Subscriber, or central Subscriber).
- Determine where snapshot files will be located and how Publishers and Subscribers will synchronize initially.
- Determine whether the Distributor will be local or remote, and determine whether the distribution database will be shared. Determine if multiple Publishers will share a Distributor, each use its own distribution database on the Publisher, or share a distribution database.
- Determine the type of replication and options to use.
- Determine whether replication is initiated at the Publisher (using push subscriptions) or at the Subscriber (using pull subscriptions).

The replication topology is not limited to the physical connections between servers because it also includes data paths between copies of the data. A Subscriber can receive multiple copies of data from different Publishers, and all of those data copies can exist on one server, incorporating a complicated topology.

Replication Options

Replication options allow you to configure replication in a manner best suited to your application and environment.

Option: Filtering Published Data
Type of Replication: Snapshot Replication, Transactional Replication, Merge Replication
Benefits: Filters allow you to create vertical and/or horizontal partitions of data that can be published as part of replication.
By distributing partitions of data to different Subscribers, you can:

- Minimize the amount of data sent over the network.
- Reduce the amount of storage space required at the Subscriber.
- Customize publications and applications based on individual Subscriber requirements.
- Reduce conflicts because the different data partitions can be sent to different Subscribers.

Synchronizing Data

Synchronizing data refers to the process of data being propagated between Publisher and Subscribers after the initial snapshot has been applied at the Subscriber. When a subscription is synchronized, different processes occur depending on the type of replication you are using and whether the subscription has been marked for reinitialization.

- For snapshot replication, synchronizing means reapplying the snapshot at the Subscriber so that the schema and data in the subscription database are consistent with the publication database.
- For transactional replication, synchronizing means that data updates, inserts, deletes, and other modifications are distributed between Publisher and Subscribers.
- For merge replication, synchronizing means that data updates made at multiple sites are merged, conflicts (if any) are detected and resolved, and data eventually converges to the same values.

The Distribution Agent and the Merge Agent move changes to data that occur at the Publisher or at Subscribers. For consistency, Microsoft® SQL Server™ 2000 replication uses the term synchronize to refer to a run of one of these replication agents.

Snapshot Replication Synchronization

When a subscription to a snapshot publication is synchronized, the Distribution Agent (using distrib.exe or the Distribution ActiveX® Control) runs and the most recent snapshot is applied at the Subscriber. If modifications to data have been made, a new snapshot must be generated before the new data can be applied to the Subscriber.
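The snapshot behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `SnapshotPublication` and the function `synchronize` are hypothetical names invented for illustration, in-memory dictionaries stand in for databases, and nothing here calls SQL Server. It shows that synchronizing only reapplies the most recent snapshot, so changes made at the Publisher stay invisible to the Subscriber until a new snapshot is generated.

```python
import copy


class SnapshotPublication:
    """Toy publication: Subscribers only ever receive the latest generated snapshot."""

    def __init__(self, rows):
        self.rows = dict(rows)   # live data in the (simulated) publication database
        self.snapshot = None     # most recently generated snapshot, if any

    def generate_snapshot(self):
        # Stands in for the Snapshot Agent: capture the data as it is right now.
        self.snapshot = copy.deepcopy(self.rows)


def synchronize(publication, subscriber_rows):
    """Stands in for a Distribution Agent run: reapply the most recent snapshot."""
    if publication.snapshot is None:
        raise RuntimeError("no snapshot has been generated yet")
    subscriber_rows.clear()
    subscriber_rows.update(copy.deepcopy(publication.snapshot))
    return subscriber_rows


pub = SnapshotPublication({1: "widget", 2: "gadget"})
pub.generate_snapshot()

sub = {}
synchronize(pub, sub)          # initial snapshot applied at the Subscriber

pub.rows[2] = "gizmo"          # data modified at the Publisher
synchronize(pub, sub)          # same snapshot reapplied: change not visible yet
stale = dict(sub)

pub.generate_snapshot()        # a new snapshot must be generated first
synchronize(pub, sub)          # now the Subscriber receives the modified row
```

In this model `stale` still holds the old value for row 2, while `sub` holds the new value only after the second snapshot has been generated and applied.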
Transactional Replication Synchronization

When a subscription to a transactional publication is synchronized, the Distribution Agent (using distrib.exe or the Distribution ActiveX Control) runs and UPDATE, INSERT and DELETE statements that have been logged at the Distributor are propagated to the Subscriber. If the subscription has been marked for reinitialization, the Snapshot Agent and Distribution Agent must run so that a new snapshot is generated and propagated to Subscribers.

Merge Replication Synchronization

Synchronization occurs when Publishers and Subscribers in a merge replication topology reconnect using the Merge Agent (replmerg.exe or the Merge ActiveX Control). Updates are propagated between sites and, if necessary, conflicts are detected and resolved.

At the time of synchronization, the Merge Agent sends all changed data to the other sites. Data flows from the originator of the change to the sites that need to be updated or synchronized.

At the destination database, updates propagated from other sites are merged with existing values according to extensible and flexible conflict detection and resolution. A Merge Agent evaluates the arriving and current data values, and any conflicts between new and old values are resolved automatically based on the default resolver, a resolver you specified when creating the publication, or a custom resolver.

Changed data values are replicated to other sites and converged with changes made at those sites only when synchronization occurs. Synchronizations can occur minutes, days, or even weeks apart. Data is converged and all sites eventually end up with the same data values. However, if conflicts were detected and resolved, work that was committed by some users was altered or undone to resolve the conflict according to your defined policies.
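The merge behavior described above (exchange changes, detect conflicts, resolve them by policy, converge) can be sketched in a few lines. This is a simplified illustration, not how the Merge Agent is implemented: `merge_synchronize` and `publisher_wins` are hypothetical names, the "Publisher wins" policy is just one assumed resolver, change tracking is reduced to sets of row ids, and deletes are ignored.

```python
def publisher_wins(row_id, publisher_value, subscriber_value):
    """Hypothetical resolver policy: on conflict, keep the Publisher's value."""
    return publisher_value


def merge_synchronize(publisher, subscriber, pub_changes, sub_changes,
                      resolver=publisher_wins):
    """Toy merge pass between two sites.

    publisher / subscriber: dicts mapping row id -> value.
    pub_changes / sub_changes: sets of row ids changed at each site since
    the last synchronization (deletes are not modeled).
    Returns a list of (row_id, winning_value, losing_value) for conflicts.
    """
    conflicts = []
    for row_id in sorted(pub_changes | sub_changes):
        if row_id in pub_changes and row_id in sub_changes:
            # Both sites changed the same row: a conflict if the values differ.
            pub_val, sub_val = publisher[row_id], subscriber[row_id]
            winner = pub_val
            if pub_val != sub_val:
                winner = resolver(row_id, pub_val, sub_val)
                loser = sub_val if winner == pub_val else pub_val
                conflicts.append((row_id, winner, loser))
        elif row_id in pub_changes:
            winner = publisher[row_id]    # change flows Publisher -> Subscriber
        else:
            winner = subscriber[row_id]   # change flows Subscriber -> Publisher
        publisher[row_id] = winner
        subscriber[row_id] = winner
    pub_changes.clear()
    sub_changes.clear()
    return conflicts


publisher = {1: "red", 2: "blue"}
subscriber = dict(publisher)

# Both sites work autonomously while disconnected.
publisher[1] = "green"
pub_changes = {1}
subscriber[1] = "yellow"
subscriber[3] = "black"
sub_changes = {1, 3}

conflicts = merge_synchronize(publisher, subscriber, pub_changes, sub_changes)
```

After the pass both sites hold the same rows. The Subscriber's committed change to row 1 appears in `conflicts` as the losing value, which is the "altered or undone" work the text warns about.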