The master-slave database replication is a process of copying (syncing) data from a database on one server (the master) to a database on another server (the slaves) and distributing the read/write request load as per the system requirements. Where the Master is the read/write server and slaves are read-only servers.
In this article, we will use PostgreSQL Streaming Replication to develop a master-slave configuration. But before moving forward few questions triggered in my mind.
- What is the Streaming Replication?
- Do we have another way of replication?
Streaming Replication:-
Streaming replication allows a standby(slave) server to stay more up-to-date than is possible with file-based log shipping. The slave connects to the master, which streams WAL records to the salve as they're generated, without waiting for the WAL file to be filled.
Streaming Replication provides the capability to continuously ship and applies the Write-Ahead Logging(WAL) records to some number of slave servers in order to keep them current.
We have two types of streaming replication:-
- Asynchronous(default) - As the name suggests a create/update/delete request on the master server doesn't wait for the completion of the same in the slave server. Which decreases the latency of the requested operation.
- Synchronous - Obviously, this type of streaming replication will wait for the operation to be completed in the slave server too. This leads to an increase in the latency of the requested operation.
We have another type of Replication method in PostgreSQL: Logical Replication.
Logical replication uses a publish and subscribe model with one or more subscribers subscribing to one or more publications on a publisher node. Subscribers pull data from the publications they subscribe to and may subsequently re-publish data to allow cascading replication or more complex configurations.
Read More about Streaming Replication and Logical Replication.
Following are the steps to set up a Master-Slave Replication in PostgreSQL.
Setup EC2 Instance:-
Create two instances named as PostgresqlMasterServer and PostgresqlSlaveServer. Keep the Private IPv4, Public IPv4 DNS of each EC2 instance, and the Private key .pem file of the EC2 instance. Following is the example of two ec2 instances with above mention attributes.
Connect To EC2 Instance Using SSH:-
Note:- Before moving forward in this step make sure your OS has SSH installed and EC2 Server has Inbound TCP/IP Security Rule enabled for Port 22. SSH uses Port 22 to connect to EC2 Server from a remote machine.
Now connect to the EC2 instance from your local terminal using the below command.
# change the access of .pem file if you are not the root user
Kodeweich@cloud-dev:$ chmod 600 /path_to_pem_file/
# connect to ec2 instance server
# only for Debian based ec2 instance for other OS replace admin with root user.
Kodeweich@cloud-dev:$ ssh -i /path_to_pem_file/ admin@Public_IPv4_DNS
Maintain two separate terminal windows for PostgresqlMasterServer and PostgresqlSlaveServer. The above operation will be performed for both servers with their .pem file.
Installing Postgres SQL:-
Now install Postgres SQL on both instances using the below command.
Kodeweich@cloud-dev:$ sudo apt-get update
Kodeweich@cloud-dev:$ sudo apt-get install postgresql postgresql-contrib postgresql-client
By default, both servers will be running on the main cluster. You can check the server status by the below command.
Kodeweich@cloud-dev:$ sudo systemctl status postgresql
If the PostgreSQL server is not running you can run the below command to enable and start your PostgreSQL server.
Kodeweich@cloud-dev:$ sudo systemctl enable postgresql # enable serve
Kodeweich@cloud-dev:$ sudo systemctl start postgresql # start server
Kodeweich@cloud-dev:$ pg_lsclusters # list all cluster in the system
Note:- If you face any error during the setup of the configuration you can check the logs of both servers using the below command:-
Kodeweich@cloud-dev:$ less /var/log/postgresql/postgresql-13-main.log
# if you are ruuning another version of postgresql replace 13 with version.
Configure The Master Slave:-
There are four things we have to do on the Master server:-
1- Enable Networking:-
The first thing is to change postgresql.conf The file can be found in /etc/postgresql/13/main/ of the main cluster. However, if you have no clue where to find postgresql.conf you can ask PostgreSQL itself to point you to the configuration file.
Here is how it works:
Kodeweich@cloud-dev:$ sudo su - postgres # logged in to the cluster with postgres user
postgres@ip-172-31-3-162~$ psql postgres # switch to the default database
postgres=# SHOW config_file; # printout path of postgresql.conf file
postgres=# \q # exit from postgres database
Now open the postgresql.conf file and update the value of listen_addresses.
liten_addresses="172.31.3.162" # you can add multiple address comma seperated
2. Create a Replication User:-
Now create a replication user which will be used for data replication over the master to slaves. (optional for best practices)
postgres@ip-172-31-3-162~$ psql postgre
postgres=# CREATE USER replication_user WITH password "password_for_user" REPLICATION;
3. Allow Remote Access:-
Once you have successfully created a user with REPLICATION permission. Configure the pg_hba.conf file located in the same location as postgresql.conf file and add the below line in the IPv4 section of the file.
# TYPE DATABASE USER ADDRESS METHOD
host all replication_user 172.31.6.22/32 md5
# md5 define the password based authentciation for connection from Slaves to Master on initial db Sink.
4. Restart the Master Server:-
Kodeweich@cloud-dev:$ sudo systemctl restart postgresq
# run below command to check weather server running on TCP/IP 172.31.3.162:5432
Kodeweich@cloud-dev:$ netstat -plntu
That's it your master server has set up property. Now let's head off toward the slave server.
Configure The Slave Server:-
The next step is to create the Slave. There are various things we need to do to make this work. The first thing is to make sure that the replica is stopped and that the data directory is empty. Let’s first make sure the service is stopped:
Kodeweich@cloud-dev:$ sudo systemctl stop postgresql
Then, we need to make sure the data directory of the PostgreSQL main cluster is empty because we are going to sync the Slave Server with the Master Server and the Slave Server will contain the same data in the data directory as Master Server:-
Kodeweich@cloud-dev:$ sudo su - postgres
# change the value 13 as per installed postgresql version
postgres@ip-172-31-6-22~$ cd /etc/postgresql/13/main/
# if above directory is not empty run below commands.
postgres@ip-172-31-6-22~$ cd /etc/postgresql/13/
postgres@ip-172-31-6-22~$ mv main main_backup
postgres@ip-172-31-6-22~$ mkdir main/
postgres@ip-172-31-6-22~$ chmod 700 main/
Initial Data Replication:-
Note:- Before starting this step make sure that your Master Server has the Inbound rule added in the Security group for the TCP/IP connection of the Slave Server at PORT 5432.
pg_basebackup will connect to the Master Server and simply copy all the data files over. The connection has to be made as replication_user.
- The -D flag defines the destination directory where we want to store the data on the Slave.
- The -R flag automatically configures our Slave for replication. No extra configuration is needed on the secondary server.
- Finally, we created a replication slot. What is the purpose of a replication slot in PostgreSQL? Basically, the primary server is able to recycle the WAL – if it is not needed anymore on the primary. But what if the replica has not consumed it yet? In that case, the replica will fail unless there is a replication slot ensuring that the primary can only recycle the WAL if the replica has fully consumed it.
postgres@ip-172-31-6-22~$ pg_basebackup -h 172.31.3.162
-U replication_user \
--checkpoint=fast \
-D /etc/postgresql/13/main/
-R
--slot=some_name -C \
# it will ask for password authentication
postgres@ip-172-31-6-22~$ password: <password_of_replication_user>
Now for the confirmation to check whether Initial Data Replication has been set up properly or not check the file created in the /etc/postgresql/13/main/ and make sure the following changes have been configured in the main folder.
- standby.signal has been created which indicated that this is the Replica (Slave) Server.
- Check the content of postgresql.auto.conf file, It should be configured as below:-
- primary_conninfo = 'user=replication_user password=secret host=172.31.3.162 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
Change the value of listening_addresses in postgresql.conf file. Which will be located in /etc/postgresql/13/main/.
listen_addresses="172.31.6.22" # you can add multiple address comma seperated
Starting Slave Server:-
Now we are ready to start the Slave server, run the below command:-
Kodeweich@cloud-dev:$ sudo systemctl start postgresql
# run below command to check weather server running on TCP/IP 172.31.6.22:5432
Kodeweich@cloud-dev:$ netstat -plntu
Test the Server Setup:-
In your master server run the below command to check the connection of the Slave server.
Kodeweich@cloud-dev: sudo su - postgre
postgres@ip-172-31-3-162:~$ psql postgres
postgres=# \x # expended display on
postgres=# SELECT * FROM pg_stat_replication;
In Your slave server run the below command to check the connection of the Master Server.
Kodeweich@cloud-dev: sudo su - postgre
postgres@ip-172-31-6-22:~$ psql postgres
postgres=# \x # expended display on
postgres=# SELECT * FROM pg_stat_wal_receiver;
If you do not get any row in the SELECT statement make sure you have performed the above step carefully and for an insights view, you can check the log file.
Now If you will perform any CREATE/UPDATE/DELETE query in the Master Server, you will find instant replication of the same query output in the Slave Server too.
E.g.
# master server
postgres=# CREATE TABLE userData (id int, name varchar(100));
CREATE TABLE
Now if you will see it in your slave server. You will have the same table in the slave server too.
So I hope you have successfully set up your Postgresql Master-Slave replication on EC2 Server. If you face any issues during setup let me know in DM. I will try to help you out.
If you want to learn more about PostgreSQL and the theoretical concept of streaming replication with visual effects. You can check it out here.
Thank you for reading my article, I you want to learn about the serverless world you can check out this article and also feel free to connect with me.