Replication is a technique for maintaining data redundancy in database systems. Implementations differ slightly between systems, but they all share the same goal. In MongoDB, this setup is known as a replica set. There used to be a master-slave scheme, but deploying it is no longer recommended.
There are several arrangements for replica sets; the right one generally depends on the number of instances you want to use and the number of data centers involved. Here, we'll follow a scheme with 1 primary, 3 secondaries, and 1 arbiter (the replica set should have an odd number of nodes).
The primary node receives all write operations, and the secondaries are the nodes the data is replicated to. Secondaries have several uses and configurations; I'll briefly explain some of them here.
The MongoDB configuration file
Since I'll be using Ubuntu 18.04 and MongoDB 3.6.X, we don't need to add extra repositories, because that's the version that ships in the Ubuntu repositories. So, a simple apt-get install mongodb will do the job. The configuration file for each instance will be placed in
The default configuration file will have the old format, but we're going to use the YAML-based one. Here is a very basic configuration file:
All the nodes in your replica set should be able to reach each other, because they communicate through the MongoDB service process. On line 13, be sure to put the address of the interface that will be used to communicate with the other replica set nodes.
Also, in the replication object, we have the replSetName key, the option used to set the replica set name. All nodes should be configured this way.
On any node, replication can be started using the rs.initiate() function inside the MongoDB shell. A basic form of the call is:
The argument this function is called with is an object with 2 basic attributes. First, the _id of the replica set, which should be the same name that is set in the mongodb.conf files. Then, the members attribute is a list of objects, each with the following 2 properties: _id, a unique identifier we set on each replica set member, and host, the address of that replica set node.
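The structure described above can be sketched as follows. This is only a minimal illustration: the replica set name "rs0" and the host addresses are hypothetical placeholders, not values from the original setup, and the sketch lists three members just to keep it short.

```javascript
// Hypothetical replica set name and host addresses; replace them with your own.
var config = {
  _id: "rs0", // must match the replSetName set in every node's configuration file
  members: [
    { _id: 0, host: "10.0.0.1:27017" },
    { _id: 1, host: "10.0.0.2:27017" },
    { _id: 2, host: "10.0.0.3:27017" }
  ]
};

// rs only exists inside the MongoDB shell, so this guard lets the file be
// loaded elsewhere (for example, to inspect the object) without an error.
if (typeof rs !== "undefined") {
  printjson(rs.initiate(config));
}
```

Each member needs its own _id, and the host values should be the addresses the nodes use to reach each other.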
This script should be called directly in the MongoDB shell. If that's too much to type in the console, you can create a script with the contents and execute it with mongo admin /path/to/script.js. The script can be run on any node, but keep in mind that it should be run on only one of them.
To check the status of the replica set, call rs.status() in the MongoDB shell on any of the replica set nodes. The output is helpful for checking which node is the primary. For a full configuration summary, call
If the replica set is already initiated, rs.reconfig() needs to be called instead of rs.initiate(). To provoke a new election for the primary node, kill the node that is the current primary (you can also use the rs.stepDown() function on the primary), and run rs.status() again to check which node is the new primary.
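Reading the rs.status() output by eye works, but the check can also be scripted. The sketch below assumes the status document has a members array whose entries carry name and stateStr fields, which is the shape rs.status() reports; findPrimary is a hypothetical helper name introduced here for illustration.

```javascript
// Returns the name ("host:port") of the member reporting PRIMARY,
// or null when no member is currently primary (e.g. mid-election).
function findPrimary(status) {
  var members = status.members || [];
  for (var i = 0; i < members.length; i++) {
    if (members[i].stateStr === "PRIMARY") {
      return members[i].name;
    }
  }
  return null;
}

// Only runs inside the MongoDB shell, where rs is defined.
if (typeof rs !== "undefined") {
  print("Current primary: " + findPrimary(rs.status()));
}
```

Running it before and after stepping down the primary should show a different host once the election has finished.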
Primary and secondary nodes
When the replica set is initiated, the primary node is chosen through an election, so any of the nodes can end up as primary. The recommended way to make a particular node the primary is to set the priority of each member: give the server we want as primary a higher priority than the others. We can set a priority of 1 on the node we want to become primary, and 0.5 on the secondary nodes.
The priority is configured in the rs.initiate() call, by adding the priority key to each object in the members list.
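As a rough sketch of that priority scheme, the helper below assigns 1 to the preferred node and 0.5 to every other member. The function name setPriorities, the set name "rs0" and the host addresses are all hypothetical, and the sketch assumes every listed member is an ordinary data-bearing node.

```javascript
// Sets priority 1 on the preferred host and 0.5 on every other member.
function setPriorities(config, preferredHost) {
  config.members.forEach(function (member) {
    member.priority = member.host === preferredHost ? 1 : 0.5;
  });
  return config;
}

// Hypothetical replica set name and addresses.
var prioritized = setPriorities({
  _id: "rs0",
  members: [
    { _id: 0, host: "10.0.0.1:27017" },
    { _id: 1, host: "10.0.0.2:27017" },
    { _id: 2, host: "10.0.0.3:27017" }
  ]
}, "10.0.0.1:27017");

// Inside the MongoDB shell, on one node only, for a set not yet initiated.
if (typeof rs !== "undefined") {
  printjson(rs.initiate(prioritized));
}
```

A higher priority makes a member more likely to win an election; a secondary can still become primary while the preferred node is down.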
Secondary nodes can also be used for backup purposes by configuring them with a replication delay of some hours. That way, destructive queries are not replicated instantly to those nodes, and they can be used to restore data. In the example script, there is a "backup" node with slaveDelay set to 86400 seconds (that's one whole day). Its priority should be set to 0, and hidden to true. Hidden secondaries are invisible to clients, and can serve other purposes such as reporting.
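A member document for such a backup node might look like the sketch below. The _id and host values are hypothetical, and addMember is an illustrative helper that is defined but not called; it assumes it runs in the MongoDB shell on the primary of an already initiated set. The slaveDelay name matches the MongoDB 3.6 series used in this post.

```javascript
// Hypothetical delayed, hidden member used for backups.
var backupMember = {
  _id: 4,
  host: "10.0.0.5:27017",
  priority: 0,              // can never be elected primary
  hidden: true,             // invisible to client applications
  slaveDelay: 24 * 60 * 60  // 86400 seconds, i.e. one day behind the primary
};

// Appends a member to the current configuration and applies it.
// Call it from the MongoDB shell on the primary: addMember(backupMember)
function addMember(member) {
  var cfg = rs.conf();
  cfg.members.push(member);
  return rs.reconfig(cfg);
}
```

The delay is a trade-off: a longer window gives more time to notice a destructive query, but the node's data is that much older when you restore from it.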
It is recommended to always deploy an odd number of nodes in a MongoDB replica set. If you have an even number of nodes, adding an arbiter node is the way to go. Arbiter nodes don't hold data; they only participate in elections when a new primary needs to be elected.
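The odd-number rule above can be checked against a configuration object before deciding whether an arbiter is needed. This is a sketch under assumptions: needsArbiter and addArbiterIfNeeded are hypothetical helper names, the arbiter address is a placeholder, and the second function is meant to be called from the MongoDB shell on the primary, where rs.conf() and rs.addArb() are available.

```javascript
// True when the member count is even, which is when the rule calls for an arbiter.
function needsArbiter(config) {
  return config.members.length % 2 === 0;
}

// Adds an arbiter only if the current member count is even.
// Call from the MongoDB shell on the primary, for example:
//   addArbiterIfNeeded("10.0.0.6:27017")   // hypothetical address
function addArbiterIfNeeded(arbiterHost) {
  if (needsArbiter(rs.conf())) {
    return rs.addArb(arbiterHost);
  }
  return null;
}
```

With the 1 primary and 3 secondaries used in this post, the count is even, so the check would call for the single arbiter.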
An arbiter node can be added with the rs.addArb() method from the primary node, but it can also be specified in the rs.reconfig() method as previously shown, using the
A complete example of the replica set configuration can be found here. Once you have the replica set configured, try testing a few things, such as the replication delay on the selected node and provoking new elections by killing the primary node.