Published on ONLamp.com (http://www.onlamp.com/)
http://www.onlamp.com/pub/a/onlamp/2004/11/18/slony.html

Introducing Slony
by A. Elein Mustain
11/18/2004

Slony is the Russian plural for elephant. It is also the name of the new replication project being developed by Jan Wieck. The mascot for Slony, Slon, is a good variation of the usual Postgres elephant mascot, created by Jan.

Figure 1. Slon, the Slony mascot.

Slony-I, the first iteration of the project, is an asynchronous replicator of a single master database to multiple replicas, which in turn may have cascaded replicas. It will include all features required to replicate large databases with a reasonable number of replicas. Jan has targeted Slony-I toward data centers and backup sites, implying that all nodes in the network are always available.

The master is the primary database with which the applications interact. Replicas are copies of the primary database. Since the master database is always changing, data replication is the system that updates the secondary, or replica, databases as the master database updates.

In synchronous replication systems, the master and the replica are consistent exact copies: the client does not receive a commit until all replicas have the transaction in question. Asynchronous replication loosens that binding and allows the replica to copy transactions from the master, rolling forward at its own pace. The server issues a commit to the master's client based only on the state of the master database transaction.

Cascading replicas over a WAN minimizes bandwidth, enabling better scalability, and also enables read-only applications (for example, reporting) to take advantage of replicas.

10/05/2017 1/28 769875629

Figure 2. Cascading replicas

Assume you have a primary site, with a database server and a replica as backup server.
Then you create a remote backup center with its own main server and its backup replica. The remote primary server is a direct replica, replicating from the master over the WAN, while the remote secondary server is a cascaded replica, replicating from the remote primary server via the LAN. This avoids transferring all of the transactions twice over the WAN. More importantly, this configuration enables you to have a remote backup with its own local failover already in place for cases such as a data center failure.

Slony's design goals differentiate it from other replication systems. The initial plan was to enable a few very important key features as a basis for implementing these design goals. An underlying theme of the design is to update only that which changes, enabling scalable replication for a reliable failover strategy. The design goals for Slony are:

1. The ability to install, configure, and create a replica and let it join and catch up with a running database. This allows the replacement of both masters and replicas. This idea also enables cascading replicas, which in turn adds scalability, limits bandwidth usage, and supports proper handling of failover situations.

2. Allowing any node to take over for any other node that fails. In the case of a failure of a replica that provides data to other replicas, the other replicas can continue to replicate from another replica or directly from the master.

Figure 3. Replication continues after a failure

In the case where a master node fails, a replica can receive a promotion to become a master. Any other replicas can then replicate from the new master. Because Slony-I is asynchronous, different replicas may be ahead of or behind each other. When a replica becomes a master, it synchronizes itself with the state of the most recent other replicas. In other replication solutions, this roll forward of the new master is not possible.
In those solutions, when promoting a replica to master, any other replicas that exist must rebuild from scratch in order to synchronize with the new master correctly. A failover of a 1TB database leaves the new master with no failover of its own for quite a while. The Slony design handles the case where multiple replicas may be at different synchronization times with the master and are able to resynchronize when a new master arises. For example, different replicas could logically be in the future compared to the new master. There is a way to detect and correct this; if there weren't, you would have to dump and restore the other replicas from the new master to synchronize again.

It's possible to roll the new master forward, if necessary, from other replicas because of the packaging and saving of the replication transactions. Replication data is packaged into blocks of transactions and sent to each replica. Each replica knows what blocks it has consumed. Each replica can also pass those blocks along to other servers--this is the mechanism of cascading replicas. A new master may be on transaction block 17 relative to the old master when another replica is on transaction block 20 relative to the old master. Switching to the new master causes the other replicas to send blocks 18, 19, and 20 to the new master. Jan said, "This feature took me a while to develop, even in theory."

3. Backup and point-in-time capability with a twist. It is possible, with some scripting, to maintain a delayed replica as a backup that might, for example, be two hours behind the master. This is done by storing and delaying the application of the transaction blocks. With this technique, it is possible to do a point-in-time recovery anytime within the last two hours on this replica. The time it takes to recover depends only on the time to which you choose to recover. Choosing "45 minutes ago" would take about one hour and 15 minutes, for example, independent of database size.
4. Hot PostgreSQL installation and configuration. For failover, it must be possible to put a new master into place and reconfigure the system to allow the reassignment of any replica to the master or to cascade from another replica. All of this must be possible without taking down the system. This means that it must be possible to add and synchronize a new replica without disrupting the master. When the new replica is in place, the master switch can happen. This is particularly useful when the new replica is a different PostgreSQL version than the previous one. If you create an 8.0 replica from your 7.4 master, it is now possible to promote the 8.0 replica to master as a hot upgrade to the new version.

5. Schema changes. Schema changes require special consideration. The bundling of the replication transactions must be able to join all of the pertinent schema changes together, whether or not they took place in the same transaction. Identifying these change sets is very difficult. In order to address this issue, Slony-I has a way to execute SQL scripts in a controlled fashion. This means that it is even more important to bundle and save your schema changes in scripts. Tracking your schema changes in scripts is a key DBA procedure for keeping your system in order and your database recreatable.

The first part of Slony-I also does not address any of the user interface features required to set up and configure the system. After the core engine of Slony-I becomes available, development of the configuration and maintenance interface can begin. There may be multiple interfaces available, depending on who develops the user interface and how.

Jan points out that "replication will never be something where you type SETUP and all of a sudden your existing enterprise system will nicely replicate in a disaster recovery scenario." Designing how to set up your replication is a complex problem.
The user interface(s) will be important to clarify and simplify the configuration and maintenance of your replication system. Some of the issues to address include the configuration of which tables to replicate, the requirement of primary keys, and the handling of sequence and trigger coordination.

The Slony-I release does not address the issues of multi-master, synchronous replication or sporadically synchronizable nodes (the "sales person on the road" scenario). However, Jan is considering these issues in the architecture of the system so that future Slony releases may implement some of them. It is critical to design future features into the system; analysis of existing replication systems has shown that it is next to impossible to add fundamental features to an existing replication system.

The primary question to ask regarding the requirements for a failover system is how much downtime you can afford. Is five minutes acceptable? Is one hour? Must the failover be read/write, or is it acceptable to have a read-only temporary failover? The second question you must ask is whether you are willing to invest in the hardware required to support multiple copies of your database. A clear cost/benefit analysis is necessary, especially for large databases.

References

General Bits Slony Articles on Tidbits
The Slony-I Project documentation on GBorg
Slonik Commands
Jan Wieck's Original Slony-I Talk and Scripts, July 2004 in Portland, OR, sponsored by Affilias Global Registry Services
Information from IRC's #slony on freenode.net
The [email protected] mailing list

A. Elein Mustain has more than 15 years of experience working with databases, 10 of those working exclusively with object relational database systems.

Building and Configuring Slony
by A. Elein Mustain
12/16/2004

Figure 1. Slony mascot

Editor's note: in Introducing Slony, A.
Elein Mustain explained the goals of Slony, the replication project for PostgreSQL. This follow-up explains how to install, configure, and start using it.

Building

I am pleased to report that the basic instructions for the download, build, and install of Slony-I release 1.0.5 were perfect. Slony-I is fairly version independent, but you still need to build it for each PostgreSQL version (7.3 or later) and installation on each machine participating in the replication. The same technique applies when the installations live on different machines.

On one machine, I run several versions of PostgreSQL, each built from source. My plan is to replicate between my 7.4 installation and my 8.0 installation, so I configured and built Slony-I against each of those source trees. That took less than a minute for both. Repeat these steps for each source tree and installation:

./configure --with-pgsourcetree=/local/src/postgresql-version
make all
sudo make install

Setting Up Slony-I

This step-by-step reading of instructions will be applied to replicate a small database named gb. The plan is to replicate from a PostgreSQL 7.4 installation to a PostgreSQL 8.0 installation, making it possible to upgrade the database.

Slonik is the command-line interface that defines the replication system. There will be Slonik scripts to create, update, and change the replication cluster for gb. There are also tools under development to simplify the creation of replication systems with Slony-I; however, this description will explore the underlying Slonik requirements. It is important to learn the basic Slonik commands.

About the database

gb is a simple eight-table database containing issues and articles for my General Bits web site. The database is normalized, and all tables have natural primary keys.

There are several prerequisites:

Each installation that will participate in replication must have Slony-I built and installed.
The Slony-I Project on GBorg gives instructions for building and installing Slony-I. My experience with building Slony-I from source against PostgreSQL 7.4 and 8.0Beta3 was very good. Following the instructions provided a clean and fast build.

You need a set of master database tables to replicate and at least one other installation containing the same schema objects. The other installation will be the replica. To achieve this initially, I dumped and restored the schema for the master database on 7.4 into the 8.0 installation:

pg_dump -p 5434 -C -s gb | psql -p 5430

As you can see, these installations are on the same host and have different port numbers.

The real-time clocks of the servers hosting the nodes must be in sync. I recommend using NTP. The pg_hba.conf files on each installation must allow each machine to contact the other.

Slonik

Slonik is a command-line interface for Slony-I. It can connect to the various databases involved in the replication scheme to perform specific actions. It is an independent helper of Slony-I and of PostgreSQL. The first commands of most Slonik scripts establish the identity of a group of databases and servers and the connection parameters for accessing each database in the group. Each database and Slony-I connection is a numbered node. The numbers are simply identifiers. The next parameter is the action you wish to process.

Slonik commands work well when they are embedded in shell scripts, as in this example. (The next section covers the commands to identify the cluster and node connection information.)

#!/bin/bash
slonik << _END_
cluster name = gb_cluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5432 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
...additional nodes...
...slonik commands...
_END_

Both Slonik and Slony-I have full sets of commands.
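Every Slonik script in this article repeats the same cluster-name and node-connection preamble. That boilerplate can be generated from a single table of nodes; the sketch below is a hypothetical helper, not part of Slonik or Slony-I, using the node numbers and conninfo strings from this article.

```python
# Hypothetical helper (not part of Slonik/Slony-I): generate the preamble
# that every slonik script in this article repeats, so the node list is
# maintained in a single place.

NODES = {
    1: "dbname=gb host=localhost port=5434 user=postgres",
    2: "dbname=gb host=localhost port=5430 user=postgres",
}

def slonik_preamble(cluster, nodes):
    """Emit the 'cluster name' line plus one 'node ... admin conninfo' line per node."""
    lines = ["cluster name = %s;" % cluster]
    for node_id in sorted(nodes):
        lines.append("node %d admin conninfo = '%s';" % (node_id, nodes[node_id]))
    return "\n".join(lines)

print(slonik_preamble("gbcluster", NODES))
```

A wrapper script could then interpolate this output ahead of whatever commands it feeds to slonik, instead of copying the node list into every script by hand.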
Node networks

A node is the combination of a database in an installation and one slon process "belonging to" that database. A cluster is a set of nodes cooperating in a replication scheme. The documentation suggests that all nodes have a path to all other nodes. With only two nodes, this is simple to describe. With more nodes, be sure to include a path to all other nodes, regardless of whether you expect replication to take those paths.

Figure 2. A Slony-I cluster

Our first Slonik script initializes the cluster, defines each node, and defines the paths from each node to every other node. Notice that each node has an identifying number. init cluster defines the cluster on the first node. store node adds each subsequent node. The user is the Slony superuser--in this case, postgres. You can choose any privileged user established as the Postgres superuser on each installation.

The path is defined by designating one node as a server and the other as a client for messaging. The terminology does not relate to the replicator/replica relationship; instead it references the possible network path. The connection information in each command belongs to the server node. The client's slon daemon will connect to the server node using that connection information.
#!/bin/bash
#
# 01: Initialize Cluster
#
slonik << _END_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
#
# Initialize the cluster and create the second node
#
init cluster (id=1, comment='gb 7.4 5434');
echo 'Initializing gb cluster';
echo 'Node 1 on pgsql74 port 5434 defined';
store node (id=2, comment='gb 8.0 5430');
echo 'Node 2 on pgsql80b port 5430 defined';
#
# create paths in both directions
#
store path (server=1, client=2, conninfo='dbname=gb host=localhost port=5434 user=postgres');
store path (server=2, client=1, conninfo='dbname=gb host=localhost port=5430 user=postgres');
echo 'path from server node 1 to client node 2 created.';
echo 'path from server node 2 to client node 1 created.';
_END_

Using Slonik's echo command can help log and track the commands in any Slonik script.

Listening for events

Events will occur throughout the cluster, and you must tell Slony-I which nodes listen to which nodes to receive these events. The events may be replication information or administrative information that requires propagation throughout the cluster. In the simple case of two nodes, they listen to each other. In any case, all nodes should be able to listen to all other nodes. The paths' definitions intentionally make this possible.

Specifying the origin identifies which node the receiver is listening for. The origin of an event may or may not provide the event to the receiver; however, the default is to do so. It is possible for node 3 to listen for events initiated on node 1 and have those events provided by node 2 (which, one assumes, is also listening for events from node 1). In our case, we are having both nodes listen for events on the other, with the events provided by the origin node.
#!/bin/bash
#
# 02: Listen
#
slonik << _END_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
#
# make the nodes listen on the paths
# in both directions
#
store listen (origin=1, receiver=2, provider=1);
store listen (origin=2, receiver=1, provider=2);
_END_

Starting the slon processes

Once the nodes can listen to each other for events, start slon. Each database participating in the replication needs a slon process. Give slon a chance to start itself and its threads. The output in our example goes to two logs, which you can tail to watch the activity and look for errors. slon is essentially an event and messaging system. The events involve the replication of data and administrative information to facilitate the replication of data.

#!/bin/sh
#
# 02: Start up slon processes, one for each node
#
slon gbcluster "dbname=gb user=postgres port=5434 host=localhost" > slon_gb_74.out 2>&1 &
slon gbcluster "dbname=gb user=postgres port=5430 host=localhost" > slon_gb_80.out 2>&1 &

Creating sets

Replication in Slony-I works by subscribing to sets of tables. The set usually should comprise the group of related tables for an application or an entire schema. To make this work, first define a set and designate the origin for the set. Then add the tables by naming the set ID, the origin of the set, a table ID, the fully qualified table name, and an optional alternate key. Make sure to enter the origin of the set as it was in the set creation (redundantly).

All of the tables participating in the replication must have a primary key. If the table does not have one, you can have Slony-I add one for replication purposes only. Be careful when setting the ID number of a table; it also designates the order in which Slony will lock the tables. This means that master tables should have IDs lower than those of detail tables.
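That ordering rule can be sketched mechanically. The code below is illustrative only, not Slony-I code; the dependency map and function name are hypothetical, and it assumes the parent/detail relationships form no cycles.

```python
# Illustrative sketch (not Slony-I code): assign table IDs so that every
# parent (master) table receives a lower ID than the detail tables that
# depend on it. Assumes the dependency graph is acyclic.

def assign_table_ids(depends_on):
    """depends_on maps table name -> set of parent tables; returns {table: id}."""
    ids = {}
    next_id = 1
    while len(ids) < len(depends_on):
        for table in sorted(depends_on):
            # A table is ready once all of its parents already have IDs.
            if table not in ids and depends_on[table] <= set(ids):
                ids[table] = next_id
                next_id += 1
    return ids

# A fragment of the gb schema: issues is the topmost master table.
ids = assign_table_ids({
    "issues": set(),
    "articles": {"issues"},
    "arttext": {"articles"},
    "contrib": {"articles"},
})
```

With this map, issues gets ID 1 and articles ID 2, and every lookup table sorts after the table it references, matching the lock-order requirement described above.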
The relationship hierarchy of your schema should help you determine the order of the numbers. If the ordering of the table IDs is backward or incorrect, there may be problems with deadlocking the slon process or PostgreSQL. In our example, the issues table is the topmost master, followed by articles. Each of the other tables is a lookup table for those, so their numbers are higher, accordingly.

Figure 3. General Bits schema

You can create a set only once, without any active subscribers. To add tables to a replication set, create a new set. You can later combine two sets by using Slonik's MERGE SET command.

#!/bin/sh
#
# 03: Create Set
#
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';

create set (id=1, origin=1, comment='gb tables');
echo 'Set created';

set add table (set id=1, origin=1, id=1, full qualified name = 'public.issues', comment='Issues table');
set add table (set id=1, origin=1, id=2, full qualified name = 'public.articles', comment='Articles table');
set add table (set id=1, origin=1, id=3, full qualified name = 'public.arttext', comment='Article Text table');
set add table (set id=1, origin=1, id=4, full qualified name = 'public.sqlbase', comment='Full SQL keywords');
set add table (set id=1, origin=1, id=5, full qualified name = 'public.whoall', comment='All contributors');
set add table (set id=1, origin=1, id=6, full qualified name = 'public.contrib', comment='Contributors by Article');
set add table (set id=1, origin=1, id=7, full qualified name = 'public.keywords', comment='Keywords by Article');
set add table (set id=1, origin=1, id=8, full qualified name = 'public.sqlkw', comment='Subset of SQL keywords');
echo 'set 1 of gb tables created';
_END_

Subscribing to sets

The nodes can now subscribe to the newly created
sets. To subscribe to a set, identify the set, the node that can provide the set, the receiver of the set, and whether the receiver of this set should be able to forward the set to another node. In our case, the origin node of the set is the same as the provider of the set, but for cascading subscriptions that is not necessarily the case. Even though this replication system has only two nodes, we are saying that the receiving node may forward the set. This is for the case in which we may want to switch masters or add other nodes to the cluster. Here, node 2 is subscribing to set 1, originating on node 1 and provided by node 1.

#!/bin/sh
#
# gb_subscribeset.sh
#
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';

subscribe set (id=1, provider=1, receiver=2, forward=yes);
echo 'set 1 of gb tables subscribed by node 2';
_END_

Of course, you should assume that these scripts have no typos and that you've run them exactly as intended. Yeah, right. Fortunately, you can recover from mistakes.

Undoing

By this time, you probably have made a typo or two and need to know how to start over. The simplest way of undoing is to start fresh. There are subtler ways of correcting mistakes by updating the underlying tables. However, I don't recommend those unless you have intimate knowledge of the underlying tables. To terminate the slon processes, list their process IDs and use kill -TERM to terminate the oldest of the processes for each node.
To completely remove all Slony-I definitions from your database, uninstall each node:

#!/bin/sh
#
# gb_uninstallnode.sh
#
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
echo 'Cluster defined, nodes identified';
#
# Uninstall both nodes
#
uninstall node (id=1);
uninstall node (id=2);
echo 'Nodes 1 and 2 Removed';
_END_

NOTE: UNINSTALL NODE removes all definitions, and you must start cleanly after that.

Slony-I schema

The underlying tables for Slony-I are fairly straightforward. The cluster name is the name of the schema in the database in which the Slony tables reside. (Use set search_path in psql.) You can verify your commands to add nodes, listens, paths, and so on by examining these tables. It also looks tempting to "fix" things by just changing the underlying tables. Resist doing so, however. Use Slonik so that it can trigger the appropriate events to perform the updates in an orderly fashion across all nodes.

Figure 4. Slony schema

References

General Bits Slony Articles on Tidbits
The Slony-I Project documentation on GBorg
Slonik Commands
Jan Wieck's Original Slony-I Talk and Scripts, July 2004 in Portland, Oregon, sponsored by Affilias Global Registry Services
Information from IRC #slony on freenode.net
Mailing List: [email protected]

Modifying Slony Clusters
by A. Elein Mustain
03/17/2005

Replication clusters have much forethought applied to their creation. However, in the course of systems development, some changes are always necessary sooner or later. This article walks through the steps required to:

Add a node to the cluster.
Switch data providers of a table set.
Promote a replica to master.
Apply schema changes to the replication cluster.

These examples will use the replication scheme originally set up in "Introducing Slony."

Figure 1.

Adding a Node

Suppose that in addition to the replica you created for the gb database, you want another replica of the same database for reporting. Here's how to add a replica of the gb table set on a second database in the 8.0 installation. The 7.4 Node 1, database gb, will originate the data set and replicate it directly to Node 2, also database gb, and Node 3, database gb2.

Figure 2.

Before starting, be sure to create gb2 in the 8.0 installation, seeding it with the same empty schema as the other two databases in this cluster. You do not want a dump of the schema of gb as it is now, but rather as it was before you defined the Slony-I cluster.

Next, define Node 3 and ensure there are paths from Node 3 to and from Nodes 1 and 2. From there, enable listening along each path, mirroring the expected table set replication. The listening of 2 and 3 via Node 1 reflects this mirroring, rather than having a direct listen path between 2 and 3. This is a really good time to remember that the connection information in the store path command pertains to the server node. This is also a pretty good time to look up drop path and drop listen, two more Slonik commands.
#!/bin/sh
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';
echo 'Cluster defined, nodes identified';
#
# Create the third node
#
store node (id=3, comment='gb2 8.0 5430');
#
# create paths
#
store path (server=1, client=3, conninfo='dbname=gb host=localhost port=5434 user=postgres');
store path (server=2, client=3, conninfo='dbname=gb host=localhost port=5430 user=postgres');
store path (server=3, client=1, conninfo='dbname=gb2 host=localhost port=5430 user=postgres');
store path (server=3, client=2, conninfo='dbname=gb2 host=localhost port=5430 user=postgres');
#
# Enable listening along each path
#
store listen (origin=1, receiver=3, provider=1);
store listen (origin=3, receiver=1, provider=3);
store listen (origin=2, receiver=3, provider=1);
store listen (origin=3, receiver=2, provider=1);
_END_

Now you are ready to start a new slon process for Node 3, the 8.0 installation with the gb2 database. Because there are two replica databases on the 8.0 installation, this slon process gets its own log file:

#!/bin/bash
#
# varlena Slony Start Up
#
slon gbcluster "dbname=gb2 host=localhost port=5430 user=postgres" > slon_gb_3.out 2>&1 &

Once the new slon process is up and running, you can subscribe Table Set 1, originating at Node 1, to Node 3. At this point the log files are invaluable; tail -f the log files to watch for progress and errors. The log files in this case are slon_gb_1.out, slon_gb_2.out, and slon_gb_3.out. If you see any problems, you may have to remove the paths and/or listens, replacing them with corrected ones.
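When checking the slon logs, a small parser can make the CONFIG entries easier to verify against the paths and listens you stored. This is a hypothetical helper, not part of Slony-I; note that quoted conninfo values containing spaces are only partially captured by the simple key=value pattern used here.

```python
# Hypothetical helper: pull the storeNode/storePath/storeListen configuration
# entries out of a slon log so the stored paths and listens can be verified.
import re

CONFIG_LINE = re.compile(r"^CONFIG (store\w+): (.*)$")

def config_events(log_text):
    """Return (event, {field: value}) pairs for each CONFIG store* line."""
    events = []
    for line in log_text.splitlines():
        m = CONFIG_LINE.match(line)
        if not m:
            continue
        event, rest = m.groups()
        # Simple key=value split; quoted values with spaces are truncated.
        fields = dict(re.findall(r"(\w+)=(\S+)", rest))
        events.append((event, fields))
    return events

sample = "CONFIG main: local node id = 1\nCONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3"
for event, fields in config_events(sample):
    print(event, fields)
```

Running this over slon_gb_1.out, for example, lists each storePath and storeListen the node recorded, which can be compared against the slonik script that created them.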
The log file slon_gb_1.out looks like:

CONFIG main: slon version 1.0.5 starting up
CONFIG main: local node id = 1
CONFIG main: loading current cluster configuration
CONFIG storeNode: no_id=2 no_comment='Node 2 dbname=gb host=localhost port=5430 user=postgres'
CONFIG storePath: pa_server=2 pa_client=1 pa_conninfo="dbname=gb host=localhost port=5430 user=postgres" pa_connretry=10
CONFIG storeListen: li_origin=2 li_receiver=1 li_provider=2
CONFIG main: configuration complete - starting threads
CONFIG enableNode: no_id=2
CONFIG storeSet: set_id=1 set_origin=1 set_comment=''
CONFIG storeNode: no_id=3 no_comment='gb2 8.0 5430'
CONFIG enableNode: no_id=3
CONFIG storePath: pa_server=3 pa_client=1 pa_conninfo="dbname=gb2 host=localhost port=5430 user=postgres" pa_connretry=10
CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3

The one for Node 2, slon_gb_2.out, looks very similar except the paths and listens are appropriate for Node 2. Now subscribe Node 3 to the table set.

#!/bin/sh
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';

subscribe set (id=1, provider=1, receiver=3, forward=yes);
echo 'set 1 of gb tables subscribed by node 3';
_END_

Verification of data is reassuring at this point.

Swapping Providers

If your replication scheme has three or more nodes, you may want to switch providers. This is not the same as failover or promotion of a new master. It simply changes the source of the replication data for a node. In the example case, Node 1, the origin of set 1, provided the information on set 1 to Node 2. When you added Node 3, you subscribed it to Table Set 1. The Table Set both originated on and was provided by Node 1.
Now the goal is to switch providers so that Node 3 retrieves Table Set 1 information from Node 2, instead of directly from Node 1. Node 1 remains the origin of the Table Set. Node 3 could have either Node 1 or Node 2 as a provider of that set information. Node 2 is available as a provider because, when you subscribed Node 2 to Table Set 1, you also enabled it as a forwarder of Table Set 1.

The listen paths, however, have Node 2 and Node 3 listening to each other via Node 1. One necessary change is to ensure that 2 and 3 listen to each other directly, because Node 2 will now provide the data for Node 3. Once the paths and listens are set up, simply resubscribe the set, setting the provider to Node 2 instead of Node 1.

Figure 3.

#!/bin/bash
#
# varlena Slony provider switch
#
slonik << _END_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';

store listen (origin=3, receiver=2, provider=3);
store listen (origin=2, receiver=3, provider=2);
subscribe set (id=1, provider=2, receiver=3, forward=yes);
_END_

After running this script, examine the log files to see that Slony stored the listens and updated the subscription.

Switching Masters

In the example, the database called gb (mastered on a PostgreSQL 7.4 installation) replicates to two nodes on the 8.0 installation. The decision has been made to move the database forward so that the 8.0 installation is the master. (If you are using Slony-I, be prepared to take advantage of a fast upgrade using master promotion.) Before you even consider swapping masters, you must have in hand a complete set of steps to follow to switch your applications accessing the 7.4 installation to the 8.0 installation.
These steps are application-dependent, so you are on your own. However, they probably consist of stopping each application, changing the connection information (ideally in one place), and bringing the application back up after the switchover. It is imperative for a smooth and fast switchover that you have the application switch information at hand. Write the steps down and save the instructions in your source code control system; you will never be sorry you did.

One more cautionary note: I highly recommend that you test your application on a copy of the 8.0 database. If your application writes to the database, this copy should not be the replica; it should be another copy. Remember that replicas are read-only databases.

Oh, yes, one more thing. Back up your databases before performing any major operations on them. Switching replication masters is a major operation. No excuses. Back up!

Everything that happens in a Slony-I replication happens because of an Event. One of the important Events is a SYNC event. Every subscribing node replicates data up to a SYNC event and then commits the changes to the database. All nodes capable of forwarding subscriptions also log those changes until every node subscribed to that set has confirmed the SYNC event. This ensures that replication data remains available in the system until Slony is sure that no node needs it.

To change the master of a set, you must first ensure that there are listeners for any of the new paths. The example already provided listen paths to and from both of the nodes, so no new listen paths are required. Before swapping a master on a subscribed set, you must lock the set to ensure that no updates occur during the swap-over. Then you may move the set. Finally, the newly designated replica node, formerly the master node, must subscribe to the set. Before you run the script, ensure that write access to your master database is OFF.
#!/bin/sh
slonik << _EOF_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';

# add listener paths if required

#
# lock and move set
#
lock set (id=1, origin=1);
move set (id=1, old origin=1, new origin=2);

# subscribe set if required
subscribe set (id=1, provider=2, receiver=1, forward=yes);
_EOF_

After this script runs, Slony-I is ready for the change. Restart your application and point it at the new master for writes. This process should take only seconds.

Failover

The replication example here began as a 7.4 database gb (Node 1), replicating to an 8.0 installation (Node 2). Then you added a third node on the 8.0 database called gb2. Initially the third node replicated directly from Node 1. You switched Node 3's provider to Node 2, then promoted Node 2 to be the master. Now you have an 8.0 master at Node 2 and two replicas: one on 7.4, and one, gb2, in the same 8.0 installation as the master. Consider the second replica, gb2, as if it were on another machine or installation.

Suppose the master database failed or lost power and had to be taken offline. (Remember, Node 3, for the sake of this discussion, is not on this machine, so this hypothetical situation treats it as if it were live when the machine died.) What to do next is a human decision. You must prepare for this failover scenario. In this case, you have decided to fail over to Node 1 in case of a failure on Node 2. The applications communicating with the database will also start to fail, so you must take them offline and restart them, pointing at the new master, quickly after the master switch takes effect. The failover procedure for Slony is a combination of a provider change and a master change, both of which "Building and Configuring Slony" covered.
Previously, you added Node 3 and had Node 1 provide for it. Then you changed Node 3's provider to Node 2 and, finally, promoted Node 2 to master. In the failover case, Slony must do the reverse using the failover command: promote Node 1 to master and switch Node 3 to pull from Node 1. Then you can safely remove Node 2 for repair.

#!/bin/sh
slonik <<_EOF_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost user=postgres port=5434';
node 2 admin conninfo = 'dbname=gb host=localhost user=postgres port=5430';
node 3 admin conninfo = 'dbname=gb2 host=localhost user=postgres port=5430';

failover (id=2, backup node = 1);
_EOF_

At this point, the slon process for Node 2 should be dead. When you are ready to put Node 2 back into place, add it as a fresh, empty database into the replication scheme. (See "Add Node," above.) When it catches up, you can switch masters so that Node 2 is again the master of the cluster. (See "Switching Masters.")

While resolving some of the problems that I ran into, I found that it was easiest and clearest for me to drop Node 2. drop node "erases" the node from the entire replication cluster. This is different from uninstall node, which removes the Slony-I schema from a specific database instance. Both tools are useful, but don't confuse them.

The other issue I ran into was that while quickly cutting and pasting, I had mismatched paths, listens, and connection information. It is very worthwhile to check each of these commands by hand to verify that the commands are exactly what you meant. Also, don't forget that the connection information for store path pertains to the server node. Undoing bad paths and listen connections is a delicate operation, and it is very tempting to throw the whole thing away and start from scratch.

Schema Changes

Changing the schema of a database being replicated is not simple. The schema changes must occur at the exact transactional point in time.
Direct schema changes could lead to serious corruption in Slony-I due to the handling and disabling of triggers, rules, and constraints on the replica. Slony-I provides a way to execute SQL statements via the Event mechanism. This provides the transactional integrity necessary for schema changes, as well as the trigger, rule, and constraint changes required for the replicas. You must initiate Events on the master node.

To add a "dummy" column to the artfile table, issue an Event to the master Node 1 pointing to the ALTER TABLE script file so as to synchronize it between databases. The EXECUTE SCRIPT command in slonik will do this. Remember, Slony must be able to find your change script.

#!/bin/sh
#
# Create a new column in a table in the replicated set
#
echo "Creating new column in the artfile table"
slonik <<_EOF_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost user=postgres port=5434';
node 2 admin conninfo = 'dbname=gb host=localhost user=postgres port=5430';
node 3 admin conninfo = 'dbname=gb2 host=localhost user=postgres port=5430';

execute script (
    SET ID = 1,
    FILENAME = 'changes20050219.sql',
    EVENT NODE = 1);
_EOF_

Once this change propagates, you can do an update to populate the new column and verify that it is being updated on all replicas.

References

General Bits Slony Articles on Tidbits: http://www.varlena.com/varlena/GeneralBits/Tidbits/
The Slony-I Project documentation on GBorg: http://gborg.postgresql.org/project/slony1/projdisplay.php
Slonik Commands: http://gborg.postgresql.org/project/slony1/genpage.php?slonik_commands
Jan Wieck's Original Slony-I Talk and Scripts (PDF), July 2004 in Portland, OR, sponsored by Global Registry Services
Information from IRC #slony on freenode.net
Mailing list: [email protected]

A. Elein Mustain has more than 15 years of experience working with databases, 10 of those working exclusively with object-relational database systems.

Why yet another replication system?
Slony-I was born from an idea to create a replication system that was not tied to a specific version of PostgreSQL and that could be started and stopped on an existing database without the need for a dump/reload cycle.

What Slony-I is:

Slony-I is a "master to multiple slaves" replication system with cascading and slave promotion. The big picture for the development of Slony-I is a master-slave system that includes all the features and capabilities needed to replicate large databases to a reasonably limited number of slave systems. Slony-I is a system for data centers and backup sites, where the normal mode of operation is that all nodes are available.

What Slony-I is not:

Slony-I is not a network management system. It has no functionality to detect a node failure or to automatically promote a node to master or another data origin. Slony-I is not multi-master, it is not a connection broker, and it doesn't make you coffee and toast in the morning.

Why doesn't Slony-I do automatic failover/promotion?

This is the job of network monitoring software, not Slony. Every site's configuration and failover path is different. For example, keep-alive monitoring with redundant NICs and intelligent HA switches that guarantee race-condition-free takeover of a network address and disconnection of the "failed" node varies with every network setup, vendor choice, and hardware/software combination. This is clearly the realm of network management software, not Slony-I. Let Slony-I do what it does best: provide database replication.

Current Limitations:

Slony-I does not automatically propagate schema changes, nor does it have any ability to replicate large objects.

Getting started with Slony-I

Installation

Before you can begin replicating your databases with Slony-I, you need to install it.

Requirements:

Any platform that can run PostgreSQL should be able to run Slony-I.
The platforms that have received specific testing at the time of this release are FreeBSD-4X-i386, FreeBSD-5X-i386, FreeBSD-5X-alpha, OS X 10.3, Linux-2.4X-i386, Linux-2.6X-i386, Linux-2.6X-amd64, Solaris-2.8-SPARC, Solaris-2.9-SPARC, and OpenBSD-3.5-sparc64.

All the servers used within the replication cluster need to have their real-time clocks in sync. This ensures that slon doesn't error out with messages indicating that the slave is already ahead of the master during replication. We recommend running ntpd on the master, with the slaves using it as their time peer.

The following software packages are required to build Slony-I:

GNU make. Other make programs will not work. GNU make is often installed under the name gmake; this document will always refer to it by that name. (On some systems GNU make is the default tool with the name make.) To test for GNU make, enter gmake --version. Version 3.76 or later is good; previous versions may not be.

An ISO/ANSI C compiler. Recent versions of GCC work.

A recent version of the PostgreSQL *source*. Slony-I depends on namespace support, so you must have version 7.3 or newer to be able to build and use Slony-I.

If you need to get a GNU package, it comes in the standard packaging for your operating system, or you can find it at your local GNU mirror (see http://www.gnu.org/order/ftp.html for a list) or at ftp://ftp.gnu.org/gnu. If you need to obtain the PostgreSQL source, you can download it from your favorite PostgreSQL mirror (see for a list), or via BitTorrent at .

Also check to make sure you have sufficient disk space. You will need approximately 5MB for the source tree during build and installation.

Getting Slony-I Source

You can get the Slony-I source from . After you have obtained the file, unpack it:

gunzip slony.tar.gz
tar xf slony.tar

This will create a directory Slony-I under the current directory with the Slony-I sources. Change into that directory for the rest of the installation procedure.
Short Version

./configure --with-pgsourcetree=
gmake all
gmake install

1. Configuration

The first step of the installation procedure is to configure the source tree for your system, which is done by running the configure script. configure needs to know where your PostgreSQL source tree is; specify it with the --with-pgsourcetree= option. Example:

./configure --with-pgsourcetree=/usr/local/src/postgresql-7.4.3

This script will run a number of tests to guess values for various dependent variables and try to detect some quirks of your system. Slony-I is known to need a modified version of libpq on specific platforms, such as Solaris 2.X on SPARC; this patch can be found at .

2. Build

To start the build process, type

gmake all

(Remember to use GNU make.) The build may take anywhere from 30 seconds to 2 minutes depending on your hardware. The last line displayed should be

All of Slony-I is successfully made.

3. Ready to install.

Installing Slony-I

To install Slony-I, enter

gmake install

This will install files into the PostgreSQL install directory as specified by the --prefix option used in the PostgreSQL configuration. Make sure you have appropriate permissions to write into that area. Normally you need to do this as root.

Replicating Your First Database

In this example, we will replicate a brand-new pgbench database. The mechanics of replicating an existing database are covered here; however, we recommend that you learn how Slony-I functions by using a fresh, non-production database.

The Slony-I replication engine is trigger-based, allowing us to replicate databases (or portions thereof) running under the same postmaster. This example will show how to replicate the pgbench database running on localhost (master) to the pgbenchslave database, also running on localhost (slave).

We make a couple of assumptions about your PostgreSQL configuration:

1. You have tcpip_socket=true in your postgresql.conf, and
2.
You have localhost set to trust in pg_hba.conf.

The REPLICATIONUSER needs to be a PostgreSQL superuser. This is typically postgres or pgsql.

You should also set the following shell variables:

CLUSTERNAME=slony_example
MASTERDBNAME=pgbench
SLAVEDBNAME=pgbenchslave
MASTERHOST=localhost
SLAVEHOST=localhost
REPLICATIONUSER=pgsql
PGBENCHUSER=pgbench

Here are a couple of examples for setting variables in common shells:

bash/sh: export CLUSTERNAME=slony_example
(t)csh:  setenv CLUSTERNAME slony_example

Creating the pgbench user

createuser -A -D $PGBENCHUSER

Preparing the databases

createdb -O $PGBENCHUSER -h $MASTERHOST $MASTERDBNAME
createdb -O $PGBENCHUSER -h $SLAVEHOST $SLAVEDBNAME

pgbench -i -s 1 -U $PGBENCHUSER -h $MASTERHOST $MASTERDBNAME

Because Slony-I depends on the databases having the PL/pgSQL procedural language installed, we had better install it now. It is possible that you have installed PL/pgSQL into the template1 database, in which case you can skip this step because it is already installed in $MASTERDBNAME.

createlang plpgsql -h $MASTERHOST $MASTERDBNAME

Slony-I does not yet automatically copy table definitions from a master when a slave subscribes to it, so we need to import this data. We do this with pg_dump.

pg_dump -s -U $REPLICATIONUSER -h $MASTERHOST $MASTERDBNAME | psql -U $REPLICATIONUSER -h $SLAVEHOST $SLAVEDBNAME

To illustrate how Slony-I allows for on-the-fly replication subscription, let's start up pgbench. If you run the pgbench application in the foreground of a separate terminal window, you can stop and restart it with different parameters at any time. You'll need to re-export the variables so they are available in this session as well. The typical command to run pgbench would look like:

pgbench -s 1 -c 5 -t 1000 -U $PGBENCHUSER -h $MASTERHOST $MASTERDBNAME

This will run pgbench with 5 concurrent clients, each processing 1000 transactions against the pgbench database running on localhost as the pgbench user.
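Every slonik script in this walkthrough starts with the same preamble: a cluster name line plus one admin conninfo line per node, where each conninfo is a PQconnectdb-style string built from the shell variables above. As a small illustration, the preamble can be generated programmatically; the helper below is my own sketch (not a Slony-I tool), with values matching the example setup.

```python
def slonik_preamble(cluster, nodes):
    """Render the 'cluster name' and 'node N admin conninfo' lines that
    open each slonik script. `nodes` maps node id -> (dbname, host, user)."""
    lines = [f"cluster name = {cluster};"]
    for node_id, (db, host, user) in sorted(nodes.items()):
        # PQconnectdb conninfo strings are space-separated key=value pairs
        lines.append(
            f"node {node_id} admin conninfo = "
            f"'dbname={db} host={host} user={user}';")
    return "\n".join(lines)

preamble = slonik_preamble("slony_example", {
    1: ("pgbench", "localhost", "pgsql"),
    2: ("pgbenchslave", "localhost", "pgsql"),
})
print(preamble)
```

Generating the preamble from one table of node definitions avoids the mismatched-conninfo cut-and-paste errors mentioned earlier in this article.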
Configuring the Database for Replication

Creating the configuration tables, stored procedures, triggers, and configuration is all done through the slonik tool. It is a specialized scripting aid that mostly calls stored procedures in the master/slave (node) databases. The script to create the initial configuration for the simple master-slave setup of our pgbench database looks like this:

#!/bin/sh
slonik <<_EOF_
#--
# Define the namespace the replication system uses. In our example it is
# slony_example.
#--
cluster name = $CLUSTERNAME;

#--
# Admin conninfo's are used by slonik to connect to the nodes, one for
# each node on each side of the cluster. The syntax is that of
# PQconnectdb in the C API.
#--
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER';

#--
# Init the first node. Its id MUST be 1. This creates the schema
# _$CLUSTERNAME containing all replication system specific database
# objects.
#--
init cluster ( id=1, comment = 'Master Node');

#--
# Because the history table does not have a primary key or other unique
# constraint that could be used to identify a row, we need to add one.
# The following command adds a bigint column named
# _Slony-I_$CLUSTERNAME_rowID to the table. It will have a default value
# of nextval('_$CLUSTERNAME.s1_rowid_seq'), and have UNIQUE and NOT NULL
# constraints applied. All existing rows will be initialized with a
# number.
#--
table add key (node id = 1, fully qualified name = 'public.history');

#--
# Slony-I organizes tables into sets. The smallest unit a node can
# subscribe to is a set. The following commands create one set containing
# all 4 pgbench tables. The master or origin of the set is node 1.
#--
create set (id=1, origin=1, comment='All pgbench tables');

set add table (set id=1, origin=1, id=1, fully qualified name = 'public.accounts', comment='accounts table');
set add table (set id=1, origin=1, id=2, fully qualified name = 'public.branches', comment='branches table');
set add table (set id=1, origin=1, id=3, fully qualified name = 'public.tellers', comment='tellers table');
set add table (set id=1, origin=1, id=4, fully qualified name = 'public.history', comment='history table', key = serial);

#--
# Create the second node (the slave), tell the 2 nodes how to connect to
# each other, and tell them how they should listen for events.
#--
store node (id=2, comment = 'Slave node');
store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER');
store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER');
store listen (origin=1, provider = 1, receiver = 2);
store listen (origin=2, provider = 2, receiver = 1);
_EOF_

Is pgbench still running? If not, start it again.

At this point we have 2 databases that are fully prepared. One is the master database, in which pgbench is busily accessing and changing rows. It's now time to start the replication daemons.

On $MASTERHOST the command to start the replication engine is

slon $CLUSTERNAME "dbname=$MASTERDBNAME user=$REPLICATIONUSER host=$MASTERHOST"

Likewise, we start the replication system on node 2 (the slave):

slon $CLUSTERNAME "dbname=$SLAVEDBNAME user=$REPLICATIONUSER host=$SLAVEHOST"

Even though we have slon running on both the master and slave, and they are both spitting out diagnostics and other messages, we aren't replicating any data yet. The notices you are seeing are the synchronization of cluster configurations between the 2 slon processes. To start replicating the 4 pgbench tables (set 1) from the master (node id 1) to the slave (node id 2), execute the following script.
#!/bin/sh
slonik <<_EOF_
# ----
# This defines which namespace the replication system uses
# ----
cluster name = $CLUSTERNAME;

# ----
# Admin conninfo's are used by the slonik program to connect
# to the node databases. So these are the PQconnectdb arguments
# that connect from the administrator's workstation (where
# slonik is executed).
# ----
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER';

# ----
# Node 2 subscribes set 1
# ----
subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);
_EOF_

Any second now, the replication daemon on $SLAVEHOST will start to copy the current content of all 4 replicated tables. While doing so, of course, the pgbench application will continue to modify the database. When the copy process is finished, the replication daemon on $SLAVEHOST will start to catch up by applying the accumulated replication log. It will do this in little steps, 10 seconds' worth of application work at a time. Depending on the performance of the two systems involved, the sizing of the two databases, the actual transaction load, and how well the two databases are tuned and maintained, this catchup process can be a matter of minutes, hours, or eons.

You have now successfully set up your first basic master/slave replication system, and once the slave has caught up, the 2 databases contain identical data. That's the theory. In practice, it's good to check that the datasets are in fact the same. The following script will create ordered dumps of the 2 databases and compare them. Make sure that pgbench has completed its testing and that your slon sessions have caught up.

#!/bin/sh
echo -n "**** comparing sample1 ...
" psql -U $REPLICATIONUSER -h $MASTERHOST $MASTERDBNAME >dump.tmp.1.$$ <<_EOF_ select 'accounts:'::text, aid, bid, abalance, filler from accounts order by aid; select 'branches:'::text, bid, bbalance, filler from branches order by bid; select 'tellers:'::text, tid, bid, tbalance, filler from tellers order by tid; select 'history:'::text, tid, bid, aid, delta, mtime, filler, "_Slony-I_${CLUSTERNAME}_rowID" from history order by "_Slony-I_${CLUSTERNAME}_rowID"; _EOF_ psql -U $REPLICATIONUSER -h $SLAVEHOST $SLAVEDBNAME >dump.tmp.2.$$ <<_EOF_ select 'accounts:'::text, aid, bid, abalance, filler from accounts order by aid; select 'branches:'::text, bid, bbalance, filler from branches order by bid; select 'tellers:'::text, tid, bid, tbalance, filler from tellers order by tid; select 'history:'::text, tid, bid, aid, delta, mtime, filler, "_Slony-I_${CLUSTERNAME}_rowID" from history order by "_Slony-I_${CLUSTERNAME}_rowID"; _EOF_ if diff dump.tmp.1.$$ dump.tmp.2.$$ >$CLUSTERNAME.diff ; then echo "success - databases are equal." rm dump.tmp.?.$$ rm $CLUSTERNAME.diff else echo "FAILED - see $CLUSTERNAME.diff for database differences" fi If this script returns "FAILED" please contact the developers at http://slony.org/ 10/05/2017 26/28 769875629 Doing switchover and failover with Slony-I Foreword Slony-I is an asynchronous replication system. Because of that, it is almost certain that at the moment the current origin of a set fails, the last transactions committed have not propagated to the subscribers yet. They always fail under heavy load, and you know it. Thus the goal is to prevent the main server from failing. The best way to do that is frequent maintenance. Opening the case of a running server is not exactly what we all consider professional system maintenance. And interestingly, those users who use replication for backup and failover purposes are usually the ones that have a very low tolerance for words like "downtime". 
To meet these requirements, Slony-I has not only failover capabilities, but controlled master-role transfer features too.

It is assumed in this document that the reader is familiar with the slonik utility and knows at least how to set up a simple 2-node replication system with Slony-I.

Switchover

We assume a current "origin" as node1 (AKA master) with one "subscriber" as node2 (AKA slave). A web application on a third server is accessing the database on node1. Both databases are up and running, and replication is more or less in sync.

Step 1) At the time of this writing, switchover to another server requires the application to reconnect to the database. So, in order to avoid any complications, we simply shut down the web server. Users who use pgpool for the application's database connections can shut down just the pool.

Step 2) A small slonik script executes the following commands:

lock set (id = 1, origin = 1);
wait for event (origin = 1, confirmed = 2);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2);

After these commands, the origin (master role) of data set 1 is now on node2. It is not simply transferred; it is done in a fashion so that node1 is now a fully synchronized subscriber, actively replicating the set. So the two nodes have completely switched roles.

Step 3) After reconfiguring the web application (or pgpool) to connect to the database on node2 instead, the web server is restarted and resumes normal operation.

Done in one shell script that performs the shutdown, slonik, config-file moves, and startup all together, this entire procedure takes less than 10 seconds.

It is now possible to simply shut down node1 and do whatever is required. When node1 is restarted later, it will start replicating again and eventually catch up after a while. At that point, the whole procedure can be executed with exchanged node IDs to restore the original configuration.
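The all-in-one shell script described in Step 2 is not shown in the document. As an illustration of its slonik portion, the script can be rendered programmatically from the node configuration, so the same code serves both directions of the switchover (and the reverse run with exchanged node IDs). This is a minimal Python sketch under the two-node example's assumptions; the function name and structure are mine, and only the four slonik commands come from Step 2 above.

```python
def switchover_script(cluster, conninfos, set_id, old_origin, new_origin):
    """Render the slonik script for a controlled master-role transfer:
    lock the set on the old origin, wait for the new origin to confirm,
    move the set, then wait for the final confirmation."""
    lines = [f"cluster name = {cluster};"]
    for node_id, info in sorted(conninfos.items()):
        lines.append(f"node {node_id} admin conninfo = '{info}';")
    lines += [
        f"lock set (id = {set_id}, origin = {old_origin});",
        f"wait for event (origin = {old_origin}, confirmed = {new_origin});",
        f"move set (id = {set_id}, old origin = {old_origin}, "
        f"new origin = {new_origin});",
        f"wait for event (origin = {old_origin}, confirmed = {new_origin});",
    ]
    return "\n".join(lines)

# Two-node example: transfer the master role for set 1 from node1 to node2.
script = switchover_script(
    "slony_example",
    {1: "dbname=pgbench host=localhost user=pgsql",
     2: "dbname=pgbenchslave host=localhost user=pgsql"},
    set_id=1, old_origin=1, new_origin=2)
print(script)  # feed this to slonik via stdin in the wrapper script
```

Running the same function with `old_origin=2, new_origin=1` produces the script for switching back once node1 has caught up again.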
Failover

Because of the possibility of missing committed but not-yet-replicated transactions, failover is the worst thing that can happen in a master-slave replication scenario. If there is any possibility of bringing back the failed server, even if only for a few minutes, we strongly recommend that you follow the switchover procedure above. Slony does not provide any automatic detection for failed systems. Abandoning committed transactions is a business decision that cannot be made by a database. If someone wants to put the commands below into a script executed automatically from the network monitoring system, well ... it's your data.

Step 1) The slonik command

failover (id = 1, backup node = 2);

causes node2 to assume the ownership (origin) of all sets that have node1 as their current origin. If there were more nodes, all direct subscribers of node1 would be instructed that this is happening. Slonik would also query all direct subscribers to figure out which node has the highest replication status (latest committed transaction) for each set, and the configuration would be changed so that node2 first applies those last-minute changes before actually allowing write access to the tables.

In addition, all nodes that subscribed directly from node1 will now use node2 as their data provider for the set. This means that after the failover command has succeeded, no node in the entire replication setup will receive anything from node1 any more.

Step 2) Reconfigure and restart the application (or pgpool) to cause it to reconnect to node2.

Step 3) After the failover is complete and node2 accepts write operations against the tables, remove all remnants of node1's configuration information with the slonik command

drop node (id = 1, event node = 2);

After failover, getting back node1

After the above failover, the data stored on node1 must be considered out of sync with the rest of the nodes.
Therefore, the only way to get node1 back and transfer the master role to it is to rebuild it from scratch as a slave, let it catch up, and then follow the switchover procedure.