diff --git a/docs/_images/diagrams/ha-architecture-patroni.png b/docs/_images/diagrams/ha-architecture-patroni.png
index 258aa1443..0f18b0d61 100644
Binary files a/docs/_images/diagrams/ha-architecture-patroni.png and b/docs/_images/diagrams/ha-architecture-patroni.png differ
diff --git a/docs/solutions/ha-setup-apt.md b/docs/solutions/ha-setup-apt.md
index c0887892b..7d381285a 100644
--- a/docs/solutions/ha-setup-apt.md
+++ b/docs/solutions/ha-setup-apt.md
@@ -5,7 +5,9 @@ This guide provides instructions on how to set up a highly available PostgreSQL
 
 ## Preconditions
 
-For this setup, we will use the nodes running on Ubuntu 20.04 as the base operating system and having the following IP addresses:
+1. This is an example deployment suitable for testing purposes in non-production environments.
+2. In this setup, ETCD resides on the same hosts as Patroni. In production, consider deploying the ETCD cluster on dedicated hosts, or at least keep the ETCD data on disks separate from those used by PostgreSQL. ETCD writes every request from the cluster to disk, so it is sensitive to disk performance and competes with PostgreSQL for I/O. See [hardware recommendations](https://etcd.io/docs/v3.6/op-guide/hardware/) for details.
+3. For this setup, we use three nodes running Ubuntu 22.04 as the base operating system:
 
 | Node name | Public IP address | Internal IP address
 |---------------|-------------------|--------------------
@@ -19,31 +21,98 @@ For this setup, we will use the nodes running on Ubuntu 20.04 as the base operat
 
 In a production (or even non-production) setup, the PostgreSQL nodes will be within a private subnet without any public connectivity to the Internet, and the HAProxy will be in a different subnet that allows client traffic coming only from a selected IP range. To keep things simple, we have implemented this architecture in a DigitalOcean VPS environment, and each node can access the other by its internal, private IP.
 
-### Setting up hostnames in the `/etc/hosts` file
-To make the nodes aware of each other and allow their seamless communication, resolve their hostnames to their public IP addresses. Modify the `/etc/hosts` file of each node as follows:
+## Initial setup
 
-| node 1 | node 2 | node 3
-|---------------------------| --------------------------|-----------------------
-| 127.0.0.1 localhost node1<br>
10.104.0.7 node1
**10.104.0.2 node2**
**10.104.0.8 node3**
| 127.0.0.1 localhost node2
**10.104.0.7 node1**
10.104.0.2 node2
**10.104.0.8 node3**
| 127.0.0.1 localhost node3
**10.104.0.7 node1**
**10.104.0.2 node2**
10.104.0.8 node3
+It’s not necessary to have name resolution, but it makes the whole setup more readable and less error-prone. Here, instead of configuring a DNS, we use local name resolution by updating the file /etc/hosts. By resolving their hostnames to their IP addresses, we make the nodes aware of each other’s names and allow their seamless communication.
+1. Run the following command on each node. Change the node name to `node1`, `node2` and `node3` respectively:
 
-The `/etc/hosts` file of the HAProxy-demo node looks like the following:
+    ```{.bash data-prompt="$"}
+    $ sudo hostnamectl set-hostname node1
+    ```
 
-```
-127.0.1.1 HAProxy-demo HAProxy-demo
-127.0.0.1 localhost
-10.104.0.6 HAProxy-demo
-10.104.0.7 node1
-10.104.0.2 node2
-10.104.0.8 node3
-```
+2. Modify the `/etc/hosts` file of each PostgreSQL node to include the hostnames and IP addresses of the remaining nodes. Add the following at the end of the `/etc/hosts` file on all nodes:
+
+    === "node1"
+
+        ```text hl_lines="3 4"
+        # Cluster IP and names
+        10.104.0.1 node1
+        10.104.0.2 node2
+        10.104.0.3 node3
+        ```
+
+    === "node2"
+
+        ```text hl_lines="2 4"
+        # Cluster IP and names
+        10.104.0.1 node1
+        10.104.0.2 node2
+        10.104.0.3 node3
+        ```
+
+    === "node3"
+
+        ```text hl_lines="2 3"
+        # Cluster IP and names
+        10.104.0.1 node1
+        10.104.0.2 node2
+        10.104.0.3 node3
+        ```
 
-### Install Percona Distribution for PostgreSQL
+    === "HAproxy-demo"
 
-1. Follow the [installation instructions](../installing.md#on-debian-and-ubuntu-using-apt) to install Percona Distribution for PostgreSQL on `node1`, `node2` and `node3`.
+        The HAProxy instance should have the name resolution for all the three nodes in its `/etc/hosts` file. Add the following lines at the end of the file:
+
+        ```text hl_lines="4 5 6"
+        # Cluster IP and names
+        10.104.0.6 HAProxy-demo
+        10.104.0.1 node1
+        10.104.0.2 node2
+        10.104.0.3 node3
+        ```
+
+### Install the software
+
+Run the following commands on `node1`, `node2` and `node3`:
+
+1. Install Percona Distribution for PostgreSQL:
+
+    * [Install `percona-release`](https://www.percona.com/doc/percona-repo-config/installing.html).
+
+    * Enable the repository:
+
+        ```{.bash data-prompt="$"}
+        $ sudo percona-release setup ppg14
+        ```
 
-2. Remove the data directory. Patroni requires a clean environment to initialize a new cluster. Use the following commands to stop the PostgreSQL service and then remove the data directory:
+    * [Install Percona Distribution for PostgreSQL packages](../apt.md).
+
+2. Install the Python and auxiliary packages that support Patroni and ETCD:
+
+    ```{.bash data-prompt="$"}
+    $ sudo apt install python3-pip python3-dev binutils
+    ```
+
+3. Install the ETCD, Patroni, and pgBackRest packages:
+
+    ```{.bash data-prompt="$"}
+    $ sudo apt install percona-patroni \
+    etcd etcd-server etcd-client \
+    percona-pgbackrest
+    ```
+
+4. Stop and disable all installed services:
+
+    ```{.bash data-prompt="$"}
+    $ sudo systemctl stop {etcd,patroni,postgresql}
+    $ sudo systemctl disable {etcd,patroni,postgresql}
+    ```
+
+5. Even though Patroni can use an existing Postgres installation, remove the data directory to force it to initialize a new Postgres cluster instance.
 
    ```{.bash data-prompt="$"}
    $ sudo systemctl stop postgresql
@@ -56,43 +125,64 @@ The distributed configuration store helps establish a consensus among nodes duri
 
The `etcd` cluster is first started in one node and then the subsequent nodes are added to the first node using the `add` command. The configuration is stored in the `/etc/default/etcd` file.
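+
+Since consensus requires a majority of members (n/2+1), a three-member cluster stays available as long as any two members are running. Once all three members have been added, as the following sections show, you can do a quick sanity check of the whole cluster with the `etcdctl` client. This is only a sketch, assuming the v2 `etcdctl` client used elsewhere in this guide and the default client port 2379:
+
+```{.bash data-prompt="$"}
+$ sudo etcdctl cluster-health
+```
+
+A healthy deployment reports each member as `healthy` and finishes with the line `cluster is healthy`.
 
-1. 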
Install `etcd` on every PostgreSQL node using the following command: +### Configure `node1` + +1. Back up the configuration file ```{.bash data-promp="$"} - $ sudo apt install etcd + $ sudo mv /etc/default/etcd /etc/default/etcd.orig ``` -2. Modify the `/etc/default/etcd` configuration file on each node. +2. Export environment variables to simplify the config file creation - * On `node1`, add the IP address of `node1` to the `ETCD_INITIAL_CLUSTER` parameter. The configuration file looks as follows: + * Node name: - ```text - ETCD_NAME=node1 - ETCD_INITIAL_CLUSTER="node1=http://10.104.0.7:2380" - ETCD_INITIAL_CLUSTER_TOKEN="devops_token" - ETCD_INITIAL_CLUSTER_STATE="new" - ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.104.0.7:2380" - ETCD_DATA_DIR="/var/lib/etcd/postgresql" - ETCD_LISTEN_PEER_URLS="http://10.104.0.7:2380" - ETCD_LISTEN_CLIENT_URLS="http://10.104.0.7:2379,http://localhost:2379" - ETCD_ADVERTISE_CLIENT_URLS="http://10.104.0.7:2379" - … - ``` + ```{.bash data-prompt="$"} + $ export NODE_NAME=`hostname -f` + ``` - * On `node2`, add the IP addresses of both `node1` and `node2` to the `ETCD_INITIAL_CLUSTER` parameter: + * Node IP: - ```text - ETCD_NAME=node2 - ETCD_INITIAL_CLUSTER="node1=http://10.104.0.7:2380,node2=http://10.104.0.2:2380" - ETCD_INITIAL_CLUSTER_TOKEN="devops_token" - ETCD_INITIAL_CLUSTER_STATE="existing" - ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.104.0.2:2380" - ETCD_DATA_DIR="/var/lib/etcd/postgresql" - ETCD_LISTEN_PEER_URLS="http://10.104.0.2:2380" - ETCD_LISTEN_CLIENT_URLS="http://10.104.0.2:2379,http://localhost:2379" - ETCD_ADVERTISE_CLIENT_URLS="http://10.104.0.2:2379" - … - ``` + ```{.bash data-prompt="$"} + $ export NODE_IP=`hostname -i | awk '{print $1}'` + ``` + + * Initial cluster token for the ETCD cluster during bootstrap: + + ```{.bash data-prompt="$"} + $ export ETCD_TOKEN='PostgreSQL_HA_Cluster_1' + ``` + + * ETCD data directory: + + ```{.bash data-prompt="$"} + $ export ETCD_DATA_DIR='/var/lib/etcd/postgresql' + ``` + +3. Modify the `/etc/default/etcd` configuration file as follows:. + + ```text + ETCD_NAME=${NODE_NAME} + ETCD_INITIAL_CLUSTER="${NODE_NAME}=http://${NODE_IP}:2380" + ETCD_INITIAL_CLUSTER_STATE="new" + ETCD_INITIAL_CLUSTER_TOKEN="${ETCD_TOKEN}" + ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${NODE_IP}:2380" + ETCD_DATA_DIR="${ETCD_DATA_DIR}" + ETCD_LISTEN_PEER_URLS="http://${NODE_IP}:2380" + ETCD_LISTEN_CLIENT_URLS="http://${NODE_IP}:2379,http://localhost:2379" + ETCD_ADVERTISE_CLIENT_URLS="http://${NODE_IP}:2379" + … + ``` + +4. Start the `etcd` service to apply the changes on `node1`. + + ```{.bash data-prompt="$"} + $ sudo systemctl enable --now etcd + $ sudo systemctl start etcd + $ sudo systemctl status etcd + ``` + +5. Check the etcd cluster members on `node1`: * On `node3`, the `ETCD_INITIAL_CLUSTER` parameter includes the IP addresses of all three nodes: @@ -109,128 +199,145 @@ The `etcd` cluster is first started in one node and then the subsequent nodes ar … ``` -3. On `node1`, add `node2` and `node3` to the cluster using the `add` command: +6. Add the `node2` to the cluster. Run the following command on `node1`: ```{.bash data-promp="$"} $ sudo etcdctl member add node2 http://10.104.0.2:2380 $ sudo etcdctl member add node3 http://10.104.0.8:2380 ``` -4. Restart the `etcd` service on `node2` and `node3`: - - ```{.bash data-promp="$"} - $ sudo systemctl restart etcd - ``` - -5. Check the etcd cluster members. 
+ The output resembles the following one: - ```{.bash data-promp="$"} - $ sudo etcdctl member list - ``` - - The output resembles the following: + ```{.text .no-copy} + Added member named node2 with ID 10042578c504d052 to cluster - ``` - 21d50d7f768f153a: name=node1 peerURLs=http://10.104.0.7:2380 clientURLs=http://10.104.0.7:2379 isLeader=true - af4661d829a39112: name=node2 peerURLs=http://10.104.0.2:2380 clientURLs=http://10.104.0.2:2379 isLeader=false - e3f3c0c1d12e9097: name=node3 peerURLs=http://10.104.0.8:2380 clientURLs=http://10.104.0.8:2379 isLeader=false + ETCD_NAME="node2" + ETCD_INITIAL_CLUSTER="node2=http://10.104.0.2:2380,node1=http://10.104.0.1:2380" + ETCD_INITIAL_CLUSTER_STATE="existing" ``` -## Set up the watchdog service +### Configure `node2` -The Linux kernel uses the utility called a _watchdog_ to protect against an unresponsive system. The watchdog monitors a system for unrecoverable application errors, depleted system resources, etc., and initiates a reboot to safely return the system to a working state. The watchdog functionality is useful for servers that are intended to run without human intervention for a long time. Instead of users finding a hung server, the watchdog functionality can help maintain the service. +1. Back up the configuration file and export environment variables as described in steps 1-2 of the [`node1` configuration](#configure-node1) +2. Edit the `/etc/default/etcd` configuration file on `node2`. Use the result of the `add` command on `node1` to change the configuration file as follows: -In this example, we will configure _Softdog_ - a standard software implementation for watchdog that is shipped with Ubuntu 20.04. + ```text + ETCD_NAME=${NODE_NAME} + ETCD_INITIAL_CLUSTER="node-1=http://10.0.100.1:2380,node-2=http://10.0.100.2:2380" + ETCD_INITIAL_CLUSTER_STATE="existing" -Complete the following steps on all three PostgreSQL nodes to load and configure Softdog. + ETCD_INITIAL_CLUSTER_TOKEN="${ETCD_TOKEN}" + ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${NODE_IP}:2380" + ETCD_DATA_DIR="${ETCD_DATA_DIR}" + ETCD_LISTEN_PEER_URLS="http://${NODE_IP}:2380" + ETCD_LISTEN_CLIENT_URLS="http://${NODE_IP}:2379,http://localhost:2379" + ETCD_ADVERTISE_CLIENT_URLS="http://${NODE_IP}:2379" + ``` -1. Load Softdog: +3. Start the `etcd` service to apply the changes on `node2`: - ```{.bash data-promp="$"} - $ sudo sh -c 'echo "softdog" >> /etc/modules' + ```{.bash data-prompt="$"} + $ sudo systemctl enable --now etcd + $ sudo systemctl start etcd + $ sudo systemctl status etcd ``` -2. Patroni will be interacting with the watchdog service. Since Patroni is run by the `postgres` user, this user must have access to Softdog. To make this happen, change the ownership of the `watchdog.rules` file to the `postgres` user: +### Configure `node3` - ``` {.bash data-promp="$"} - $ sudo sh -c 'echo "KERNEL==\"watchdog\", OWNER=\"postgres\", GROUP=\"postgres\"" >> /etc/udev/rules.d/61-watchdog.rules' - ``` - -3. Remove Softdog from the blacklist. +1. Add `node3` to the cluster. **Run the following command on `node1`** - * Find out the files where Softdog is blacklisted: + ```{.bash data-prompt="$"} + $ sudo etcdctl member add node3 http://10.104.0.3:2380 + ``` - ```{.bash data-promp="$"} - $ grep blacklist /lib/modprobe.d/* /etc/modprobe.d/* |grep softdog - ``` - - In our case, `modprobe `is blacklisting the Softdog: +2. On `node3`, back up the configuration file and export environment variables as described in steps 1-2 of the [`node1` configuration](#configure-node1) +3. 
Modify the `/etc/default/etcd` configuration file and add the output of the `add` command:
+
+    ```text
+    ETCD_NAME=${NODE_NAME}
+    ETCD_INITIAL_CLUSTER="node1=http://10.104.0.1:2380,node2=http://10.104.0.2:2380,node3=http://10.104.0.3:2380"
+    ETCD_INITIAL_CLUSTER_STATE="existing"
+
+    ETCD_INITIAL_CLUSTER_TOKEN="${ETCD_TOKEN}"
+    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${NODE_IP}:2380"
+    ETCD_DATA_DIR="${ETCD_DATA_DIR}"
+    ETCD_LISTEN_PEER_URLS="http://${NODE_IP}:2380"
+    ETCD_LISTEN_CLIENT_URLS="http://${NODE_IP}:2379,http://localhost:2379"
+    ETCD_ADVERTISE_CLIENT_URLS="http://${NODE_IP}:2379"
+    …
+    ```
+
+4. Start the `etcd` service on `node3`:
+
+    ```{.bash data-prompt="$"}
+    $ sudo systemctl enable --now etcd
+    $ sudo systemctl start etcd
+    $ sudo systemctl status etcd
+    ```
 
+5. Check the etcd cluster members:
 
    ```{.bash data-prompt="$"}
    $ sudo etcdctl member list
    ```
 
    The output resembles the following:
 
    ```
    21d50d7f768f153a: name=node1 peerURLs=http://10.104.0.7:2380 clientURLs=http://10.104.0.7:2379 isLeader=true
    af4661d829a39112: name=node2 peerURLs=http://10.104.0.2:2380 clientURLs=http://10.104.0.2:2379 isLeader=false
    e3f3c0c1d12e9097: name=node3 peerURLs=http://10.104.0.8:2380 clientURLs=http://10.104.0.8:2379 isLeader=false
    ```
 
 ## Configure Patroni
 
+Run the following commands on all nodes. You can do this in parallel:
+
+1. Export and create environment variables to simplify the config file creation:
+
+    * Node name:
+
+    ```{.bash data-prompt="$"}
+    $ export NODE_NAME=`hostname -f`
+    ```
+
+    * Node IP:
+
+    ```{.bash data-prompt="$"}
+    $ export NODE_IP=`hostname -i | awk '{print $1}'`
+    ```
+
+    * Create variables to store the paths to the PostgreSQL data and binaries:
+
+    ```bash
+    DATA_DIR="/var/lib/postgresql/14/main"
+    PG_BIN_DIR="/usr/lib/postgresql/14/bin"
+    ```
+
+    **NOTE**: Check the path to the data and bin folders on your operating system and change the variables accordingly.
+
+    * Patroni information:
+
+    ```bash
+    NAMESPACE="percona_lab"
+    SCOPE="cluster_1"
+    ```
 
 2. Create the `patroni.yml` configuration file under the `/etc/patroni` directory. The file holds the default configuration values for a PostgreSQL cluster and will reflect the current cluster setup.
 
3. 
Add the following configuration for `node1`:
 
-    ```yaml
-    scope: cluster_1
-    namespace: percona_lab
-    name: node1
+    ```yaml title="/etc/patroni/patroni.yml"
+    namespace: ${NAMESPACE}
+    scope: ${SCOPE}
+    name: ${NODE_NAME}
 
     restapi:
-        listen: 0.0.0.0:8008
-        connect_address: 10.104.0.1:8008
+      listen: 0.0.0.0:8008
+      connect_address: ${NODE_IP}:8008
 
     etcd:
-        host: 10.104.0.1:2379
+      host: ${NODE_IP}:2379
 
     bootstrap:
       # this section will be written into Etcd:///config after initializing new cluster
@@ -284,9 +391,9 @@ crw------- 1 root root 245, 0 Sep 11 12:53 /dev/watchdog0
     postgresql:
       cluster_name: cluster_1
       listen: 0.0.0.0:5432
-      connect_address: 10.104.0.1:5432
-      data_dir: /data/pgsql
-      bin_dir: /usr/pgsql-14/bin
+      connect_address: ${NODE_IP}:5432
+      data_dir: ${DATA_DIR}
+      bin_dir: ${PG_BIN_DIR}
      pgpass: /tmp/pgpass
      authentication:
        replication:
@@ -301,11 +408,6 @@ crw------- 1 root root 245, 0 Sep 11 12:53 /dev/watchdog0
        - basebackup
        basebackup:
          checkpoint: 'fast'
-
-    watchdog:
-      mode: required # Allowed values: off, automatic, required
-      device: /dev/watchdog
-      safety_margin: 5
 
     tags:
       nofailover: false
@@ -314,81 +416,99 @@ crw------- 1 root root 245, 0 Sep 11 12:53 /dev/watchdog0
       nosync: false
    ```
 
-    !!! admonition "Patroni configuration file"
+    ??? admonition "Patroni configuration file"
 
        Let’s take a moment to understand the contents of the `patroni.yml` file.
 
-        The first section provides the details of the first node (`node1`) and its connection ports. After that, we have the `etcd` service and its port details.
+        The first section provides the details of the node and its connection ports. After that, we have the `etcd` service and its port details.
 
        Following these, there is a `bootstrap` section that contains the PostgreSQL configurations and the steps to run once the database is initialized. The `pg_hba.conf` entries specify all the other nodes that can connect to this node and their authentication mechanism.
 
+3. Check that the `systemd` unit file `patroni.service` is created in `/etc/systemd/system`. If it is created, skip this step.
 
-4. Create the configuration files for `node2` and `node3`. Replace the reference to `node1` with `node2` and `node3`, respectively.
-5. Enable and restart the patroni service on every node. Use the following commands:
+    If it's **not** created, create it manually and specify the following contents within:
 
-    ```{.bash data-prompt="$"}
-    $ sudo systemctl enable patroni
-    $ sudo systemctl restart patroni
-    ```
-
-When Patroni starts, it initializes PostgreSQL (because the service is not currently running and the data directory is empty) following the directives in the bootstrap section of the configuration file.
+    ```ini title="/etc/systemd/system/patroni.service"
+    [Unit]
+    Description=Runners to orchestrate a high-availability PostgreSQL
+    After=syslog.target network.target
 
-!!! 
admonition "Troubleshooting Patroni"
+    [Service]
+    Type=simple
 
-    To ensure that Patroni has started properly, check the logs using the following command:
+    User=postgres
+    Group=postgres
 
-    ```{.bash data-prompt="$"}
-    $ sudo journalctl -u patroni.service -n 100 -f
+    # Start the patroni process
+    ExecStart=/bin/patroni /etc/patroni/patroni.yml
+
+    # Send HUP to reload from patroni.yml
+    ExecReload=/bin/kill -s HUP $MAINPID
+
+    # only kill the patroni process, not its children, so it will gracefully stop postgres
+    KillMode=process
+
+    # Give a reasonable amount of time for the server to start up/shut down
+    TimeoutSec=30
+
+    # Do not restart the service if it crashes, we want to manually inspect database on failure
+    Restart=no
+
+    [Install]
+    WantedBy=multi-user.target
+    ```
+
+4. Make `systemd` aware of the new service:
+
+    ```{.bash data-prompt="$"}
+    $ sudo systemctl daemon-reload
    ```
 
-    The output shouldn't show any errors:
+5. Now it's time to start Patroni. Run the following commands on all nodes, but not in parallel. Start with `node1` first, wait for the service to come to life, and then proceed with the other nodes one by one, always waiting for them to sync with the primary node:
 
+    ```{.bash data-prompt="$"}
+    $ sudo systemctl enable --now patroni
+    $ sudo systemctl restart patroni
    ```
-    …
+
+    When Patroni starts, it initializes PostgreSQL (because the service is not currently running and the data directory is empty) following the directives in the bootstrap section of the configuration file.
+
+6. Check the service to see if there are errors:
 
-    Sep 23 12:50:21 node01 systemd[1]: Started PostgreSQL high-availability manager.
-    Sep 23 12:50:22 node01 patroni[10119]: 2021-09-23 12:50:22,022 INFO: Selected new etcd server http://10.104.0.2:2379
-    Sep 23 12:50:22 node01 patroni[10119]: 2021-09-23 12:50:22,029 INFO: No PostgreSQL configuration items changed, nothing to reload.
-    Sep 23 12:50:22 node01 patroni[10119]: 2021-09-23 12:50:22,168 INFO: Lock owner: None; I am node1
-    Sep 23 12:50:22 node01 patroni[10119]: 2021-09-23 12:50:22,177 INFO: trying to bootstrap a new cluster
-    Sep 23 12:50:22 node01 patroni[10140]: The files belonging to this database system will be owned by user "postgres".
-    Sep 23 12:50:22 node01 patroni[10140]: This user must also own the server process.
-    Sep 23 12:50:22 node01 patroni[10140]: The database cluster will be initialized with locale "C.UTF-8".
-    Sep 23 12:50:22 node01 patroni[10140]: The default text search configuration will be set to "english".
-    Sep 23 12:50:22 node01 patroni[10140]: Data page checksums are enabled.
-    Sep 23 12:50:22 node01 patroni[10140]: creating directory /var/lib/postgresql/12/main ... ok
-    Sep 23 12:50:22 node01 patroni[10140]: creating subdirectories ... ok
-    Sep 23 12:50:22 node01 patroni[10140]: selecting dynamic shared memory implementation ... posix
-    Sep 23 12:50:22 node01 patroni[10140]: selecting default max_connections ... 100
-    Sep 23 12:50:22 node01 patroni[10140]: selecting default shared_buffers ... 128MB
-    Sep 23 12:50:22 node01 patroni[10140]: selecting default time zone ... Etc/UTC
-    Sep 23 12:50:22 node01 patroni[10140]: creating configuration files ... ok
-    Sep 23 12:50:22 node01 patroni[10140]: running bootstrap script ... ok
-    Sep 23 12:50:23 node01 patroni[10140]: performing post-bootstrap initialization ... ok
-    Sep 23 12:50:23 node01 patroni[10140]: syncing data to disk ... 
ok
-    Sep 23 12:50:23 node01 patroni[10140]: initdb: warning: enabling "trust" authentication for local connections
-    Sep 23 12:50:23 node01 patroni[10140]: You can change this by editing pg_hba.conf or using the option -A, or
-    Sep 23 12:50:23 node01 patroni[10140]: --auth-local and --auth-host, the next time you run initdb.
-    Sep 23 12:50:23 node01 patroni[10140]: Success. You can now start the database server using:
-    Sep 23 12:50:23 node01 patroni[10140]: /usr/lib/postgresql/14/bin/pg_ctl -D /var/lib/postgresql/14/main -l logfile start
-    Sep 23 12:50:23 node01 patroni[10156]: 2021-09-23 12:50:23.672 UTC [10156] LOG: redirecting log output to logging collector process
-    Sep 23 12:50:23 node01 patroni[10156]: 2021-09-23 12:50:23.672 UTC [10156] HINT: Future log output will appear in directory "log".
-    Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,694 INFO: postprimary pid=10156
-    Sep 23 12:50:23 node01 patroni[10165]: localhost:5432 - accepting connections
-    Sep 23 12:50:23 node01 patroni[10167]: localhost:5432 - accepting connections
-    Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,743 INFO: establishing a new patroni connection to the postgres cluster
-    Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,757 INFO: running post_bootstrap
-    Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,767 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
-    Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,793 INFO: initialized a new cluster
-    Sep 23 12:50:33 node01 patroni[10119]: 2021-09-23 12:50:33,810 INFO: no action. I am (node1) the leader with the lock
-    Sep 23 12:50:33 node01 patroni[10119]: 2021-09-23 12:50:33,899 INFO: no action. I am (node1) the leader with the lock
-    Sep 23 12:50:43 node01 patroni[10119]: 2021-09-23 12:50:43,898 INFO: no action. I am (node1) the leader with the lock
-    Sep 23 12:50:53 node01 patroni[10119]: 2021-09-23 12:50:53,894 INFO: no action. I am (node1) the leader with the
    ```
 
+    ```{.bash data-prompt="$"}
+    $ sudo journalctl -fu patroni
+    ```
 
    A common error is Patroni complaining about the lack of proper entries in the pg_hba.conf file. If you see such errors, you must manually add or fix the entries in that file and then restart the service.
 
-    Changing the patroni.yml file and restarting the service will not have any effect here because the bootstrap section specifies the configuration to apply when PostgreSQL is first started in the node. It will not repeat the process even if the Patroni configuration file is modified and the service is restarted.
+    Changing the patroni.yml file and restarting the service will not have any effect here because the bootstrap section specifies the configuration to apply when PostgreSQL is first started in the node. It will not repeat the process even if the Patroni configuration file is modified and the service is restarted.
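+
+    You can also ask Patroni itself for the node state through its REST API. This is an optional check, assuming the REST API listens on the default port 8008 on the local node, as configured in `patroni.yml`:
+
+    ```{.bash data-prompt="$"}
+    $ curl -s http://localhost:8008/patroni
+    ```
+
+    The `/patroni` endpoint returns a JSON document with the node state, role, and timeline for any running member, which makes it handy for scripted health checks.
+
+7. 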
Check the cluster: + + ```{.bash data-prompt="$"} + $ patronictl -c /etc/patroni/patroni.yml list $SCOPE + ``` + + The output on `node1` resembles the following: + + ```{.text .no-copy} + + Cluster: cluster_1 --+---------+---------+----+-----------+ + | Member | Host | Role | State | TL | Lag in MB | + +--------+-------------+---------+---------+----+-----------+ + | node-1 | 10.0.100.1 | Leader | running | 1 | | + +--------+-------------+---------+---------+----+-----------+ + ``` + + On the remaining nodes: + + ```{.text .no-copy} + + Cluster: cluster_1 --+---------+---------+----+-----------+ + | Member | Host | Role | State | TL | Lag in MB | + +--------+-------------+---------+---------+----+-----------+ + | node-1 | 10.0.100.1 | Leader | running | 1 | | + | node-2 | 10.0.100.2 | Replica | running | 1 | 0 | + +--------+-------------+---------+---------+----+-----------+ + ``` If Patroni has started properly, you should be able to locally connect to a PostgreSQL node using the following command: @@ -400,6 +520,7 @@ The command output looks like the following: ``` psql (14.1) + Type "help" for help. postgres=# @@ -467,13 +588,12 @@ HAProxy is capable of routing write requests to the primary node and read reques $ sudo systemctl restart haproxy ``` - 4. Check the HAProxy logs to see if there are any errors: ```{.bash data-promp="$"} $ sudo journalctl -u haproxy.service -n 100 -f ``` -## Testing +## Next steps -See the [Testing PostgreSQL cluster](ha-test.md) for the guidelines on how to test your PostgreSQL cluster for replication, failure, switchover. \ No newline at end of file +[Configure pgBackRest](pgbackrest.md){.md-button} \ No newline at end of file diff --git a/docs/solutions/ha-setup-yum.md b/docs/solutions/ha-setup-yum.md index 37523f247..4322cadaf 100644 --- a/docs/solutions/ha-setup-yum.md +++ b/docs/solutions/ha-setup-yum.md @@ -1,6 +1,6 @@ -# Deploying PostgreSQL for high availability with Patroni on RHEL or CentOS +# Deploying PostgreSQL for high availability with Patroni on RHEL and derivatives -This guide provides instructions on how to set up a highly available PostgreSQL cluster with Patroni on Red Hat Enterprise Linux or CentOS. +This guide provides instructions on how to set up a highly available PostgreSQL cluster with Patroni on Red Hat Enterprise Linux or compatible derivatives. ## Preconditions @@ -20,124 +20,162 @@ This guide provides instructions on how to set up a highly available PostgreSQL Ideally, in a production (or even non-production) setup, the PostgreSQL and ETCD nodes will be within a private subnet without any public connectivity to the Internet, and the HAProxy will be in a different subnet that allows client traffic coming only from a selected IP range. To keep things simple, we have implemented this architecture in a private environment, and each node can access the other by its internal, private IP. -## Preparation + +## Initial setup ### Set up hostnames in the `/etc/hosts` file It's not necessary to have name resolution, but it makes the whole setup more readable and less error prone. Here, instead of configuring a DNS, we use a local name resolution by updating the file `/etc/hosts`. By resolving their hostnames to their IP addresses, we make the nodes aware of each other's names and allow their seamless communication. -Modify the `/etc/hosts` file of each PostgreSQL node to include the hostnames and IP addresses of the remaining nodes. Add the following at the end of the `/etc/hosts` file on all nodes: - -=== "node1" +1. 
Run the following command on each node. Change the node name to `node1`, `node2` and `node3` respectively:
 
-    ```text hl_lines="3 4"
-    # Cluster IP and names
-    10.104.0.1 node1
-    10.104.0.2 node2
-    10.104.0.3 node3
+    ```{.bash data-prompt="$"}
+    $ sudo hostnamectl set-hostname node1
    ```
 
-=== "node2"
+2. Modify the `/etc/hosts` file of each PostgreSQL node to include the hostnames and IP addresses of the remaining nodes. Add the following at the end of the `/etc/hosts` file on all nodes:
 
-    ```text hl_lines="2 4"
-    # Cluster IP and names
-    10.104.0.1 node1
-    10.104.0.2 node2
-    10.104.0.3 node3
-    ```
+    === "node1"
 
-=== "node3"
+        ```text hl_lines="3 4"
+        # Cluster IP and names
+        10.104.0.1 node1
+        10.104.0.2 node2
+        10.104.0.3 node3
+        ```
 
-    ```text hl_lines="2 3"
-    # Cluster IP and names
-    10.104.0.1 node1
-    10.104.0.2 node2
-    10.104.0.3 node3
-    ```
+    === "node2"
 
-=== "HAproxy-demo"
+        ```text hl_lines="2 4"
+        # Cluster IP and names
+        10.104.0.1 node1
+        10.104.0.2 node2
+        10.104.0.3 node3
+        ```
 
-    The HAProxy instance should have the name resolution for all the three nodes in its `/etc/hosts` file. Add the following lines at the end of the file:
+    === "node3"
 
-    ```text hl_lines="4 5 6"
-    # Cluster IP and names
-    10.104.0.6 HAProxy-demo
-    10.104.0.1 node1
-    10.104.0.2 node2
-    10.104.0.3 node3
-    ```
+        ```text hl_lines="2 3"
+        # Cluster IP and names
+        10.104.0.1 node1
+        10.104.0.2 node2
+        10.104.0.3 node3
+        ```
+
+    === "HAproxy-demo"
+
+        The HAProxy instance should have the name resolution for all the three nodes in its `/etc/hosts` file. Add the following lines at the end of the file:
+
+        ```text hl_lines="4 5 6"
+        # Cluster IP and names
+        10.104.0.6 HAProxy-demo
+        10.104.0.1 node1
+        10.104.0.2 node2
+        10.104.0.3 node3
+        ```
+
+### Install the software
+
+1. Install Percona Distribution for PostgreSQL on `node1`, `node2` and `node3` from Percona repository:
 
-## Install Percona Distribution for PostgreSQL
+    * [Install `percona-release`](https://www.percona.com/doc/percona-repo-config/installing.html).
+    * Enable the repository:
 
-Install Percona Distribution for PostgreSQL on `node1`, `node2` and `node3` from Percona repository:
+        ```{.bash data-prompt="$"}
+        $ sudo percona-release setup ppg14
+        ```
+
+    * [Install Percona Distribution for PostgreSQL packages](../installing.md#on-red-hat-enterprise-linux-and-centos-using-yum).
+
+    !!! important
 
-1. [Install `percona-release`](https://www.percona.com/doc/percona-repo-config/installing.html).
-2. Enable the repository:
+        **Don't** initialize the cluster and start the `postgresql` service. The cluster initialization and setup are handled by Patroni during the bootstrapping stage.
 
+2. Install the Python and auxiliary packages that support Patroni and ETCD:
+
    ```{.bash data-prompt="$"}
-    $ sudo percona-release setup ppg14
+    $ sudo yum install python3-pip python3-dev binutils
    ```
 
-3. [Install Percona Distribution for PostgreSQL packages](../yum.md).
+3. Install the ETCD, Patroni, and pgBackRest packages:
 
-!!! important
+    ```{.bash data-prompt="$"}
+    $ sudo yum install percona-patroni \
+    etcd python3-python-etcd \
+    percona-pgbackrest
+    ```
 
-    **Don't** initialize the cluster and start the `postgresql` service. The cluster initialization and setup are handled by Patroni during the bootsrapping stage.
+4. 
Stop and disable all installed services:
+
+    ```{.bash data-prompt="$"}
+    $ sudo systemctl stop {etcd,patroni,postgresql}
+    $ sudo systemctl disable {etcd,patroni,postgresql}
+    ```
 
 ## Configure ETCD distributed store
 
-The distributed configuration store provides a reliable way to store data that needs to be accessed by large scale distributed systems. The most popular implementation of the distributed configuration store is ETCD. ETCD is deployed as a cluster for fault-tolerance and requires an odd number of members (n/2+1) to agree on updates to the cluster state. An ETCD cluster helps establish a consensus among nodes during a failover and manages the configuration for the three PostgreSQL instances.
+The distributed configuration store helps establish a consensus among nodes during a failover and will manage the configuration for the three PostgreSQL instances. Although Patroni can work with other distributed consensus stores (e.g., ZooKeeper, Consul), the most commonly used one is `etcd`.
 
-The `etcd` cluster is first started in one node and then the subsequent nodes are added to the first node using the `add` command. The configuration is stored in the `/etc/etcd/etcd.conf` configuration file.
+In this setup we'll install and configure ETCD on each database node.
 
-1. Install `etcd` on every PostgreSQL node. For CentOS 8, the `etcd` packages are available from Percona repository:
+### Configure `node1`
 
-    - [Install `percona-release`](https://www.percona.com/doc/percona-repo-config/installing.html).
-    - Enable the repository:
+1. Back up the `etcd.conf` file:
+
+    ```{.bash data-prompt="$"}
+    $ sudo mv /etc/etcd/etcd.conf /etc/etcd/etcd.conf.orig
+    ```
 
-    ```{.bash data-prompt="$"}
-    $ sudo percona-release setup ppg14
-    ```
-
-    - Install the etcd packages using the following command:
+2. Export environment variables to simplify the config file creation:
 
-    ```{.bash data-prompt="$"}
-    $ sudo yum install etcd python3-python-etcd
-    ```
+    * Node name:
 
-2. Configure ETCD on `node1`.
+    ```{.bash data-prompt="$"}
+    $ export NODE_NAME=`hostname -f`
+    ```
 
-    Backup the `etcd.conf` file:
+    * Node IP:
+
+    ```{.bash data-prompt="$"}
+    $ export NODE_IP=`hostname -i | awk '{print $1}'`
+    ```
 
-    ```{.bash data-promp="$"}
-    sudo mv /etc/etcd/etcd.conf /etc/etcd/etcd.conf.orig
-    ```
+    * Initial cluster token for the ETCD cluster during bootstrap:
+
+    ```{.bash data-prompt="$"}
+    $ export ETCD_TOKEN='PostgreSQL_HA_Cluster_1'
+    ```
 
-    Modify the `/etc/etcd/etcd.conf` configuration file:
+    * ETCD data directory:
+
+    ```{.bash data-prompt="$"}
+    $ export ETCD_DATA_DIR='/var/lib/etcd/postgresql'
+    ```
+
+3. 
Modify the `/etc/etcd/etcd.conf` configuration file:
 
    ```text
-    [Member]
-    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
-    ETCD_LISTEN_PEER_URLS="http://10.104.0.1:2380,http://localhost:2380"
-    ETCD_LISTEN_CLIENT_URLS="http://10.104.0.1:2379,http://localhost:2379"
-
-    ETCD_NAME="node1"
-    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.104.0.1:2380"
-    ETCD_ADVERTISE_CLIENT_URLS="http://10.104.0.1:2379"
-    ETCD_INITIAL_CLUSTER="node1=http://10.104.0.1:2380"
-    ETCD_INITIAL_CLUSTER_TOKEN="percona-etcd-cluster"
+    ETCD_NAME=${NODE_NAME}
+    ETCD_INITIAL_CLUSTER="${NODE_NAME}=http://${NODE_IP}:2380"
     ETCD_INITIAL_CLUSTER_STATE="new"
+    ETCD_INITIAL_CLUSTER_TOKEN="${ETCD_TOKEN}"
+    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${NODE_IP}:2380"
+    ETCD_DATA_DIR="${ETCD_DATA_DIR}"
+    ETCD_LISTEN_PEER_URLS="http://${NODE_IP}:2380"
+    ETCD_LISTEN_CLIENT_URLS="http://${NODE_IP}:2379,http://localhost:2379"
+    ETCD_ADVERTISE_CLIENT_URLS="http://${NODE_IP}:2379"
+    …
    ```
 
-3. Start the `etcd` to apply the changes on `node1`:
+4. Start the `etcd` service to apply the changes on `node1`:
 
-    ```{.bash data-prompt="$"}
-    $ sudo systemctl enable etcd
-    $ sudo systemctl start etcd
-    $ sudo systemctl status etcd
-    ```
+    ```{.bash data-prompt="$"}
+    $ sudo systemctl enable --now etcd
+    $ sudo systemctl start etcd
+    $ sudo systemctl status etcd
+    ```
 
-5. Check the etcd cluster members on `node1`.
+5. Check the etcd cluster members on `node1`:
 
    ```{.bash data-prompt="$"}
    $ sudo etcdctl member list
    ```
 
@@ -165,60 +203,72 @@ The `etcd` cluster is first started in one node and then the subsequent nodes ar
    ETCD_INITIAL_CLUSTER_STATE="existing"
    ```
 
-7. Edit the `/etc/etcd/etcd.conf` configuration file on `node2` and add the output from step 6:
+### Configure `node2`
+
+1. Back up the configuration file and export environment variables as described in steps 1-2 of the [`node1` configuration](#configure-node1).
+2. Edit the `/etc/etcd/etcd.conf` configuration file on `node2` and add the output from the `add` command:
 
    ```text
    [Member]
-    ETCD_NAME="node2"
-    ETCD_INITIAL_CLUSTER="node2=http://10.104.0.2:2380,node1=http://10.104.0.1:2380"
-    ETCD_INITIAL_CLUSTER_STATE="existing"
-    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
-    ETCD_INITIAL_CLUSTER_TOKEN="percona-etcd-cluster"
-    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.104.0.2:2380"
-    ETCD_LISTEN_PEER_URLS="http://10.104.0.2:2380"
-    ETCD_LISTEN_CLIENT_URLS="http://10.104.0.2:2379,http://localhost:2379"
-    ETCD_ADVERTISE_CLIENT_URLS="http://10.104.0.2:2379"
+
+    ETCD_NAME=${NODE_NAME}
+    ETCD_INITIAL_CLUSTER="node1=http://10.104.0.1:2380,node2=http://10.104.0.2:2380"
+    ETCD_INITIAL_CLUSTER_STATE="existing"
+    ETCD_INITIAL_CLUSTER_TOKEN="${ETCD_TOKEN}"
+    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${NODE_IP}:2380"
+    ETCD_DATA_DIR="${ETCD_DATA_DIR}"
+    ETCD_LISTEN_PEER_URLS="http://${NODE_IP}:2380"
+    ETCD_LISTEN_CLIENT_URLS="http://${NODE_IP}:2379,http://localhost:2379"
+    ETCD_ADVERTISE_CLIENT_URLS="http://${NODE_IP}:2379"
    ```
 
-8. Start the `etcd` to apply the changes on `node2`:
+3. Start the `etcd` service to apply the changes on `node2`:
 
    ```{.bash data-prompt="$"}
-    $ sudo systemctl enable etcd
+    $ sudo systemctl enable --now etcd
    $ sudo systemctl start etcd
    $ sudo systemctl status etcd
    ```
 
-9. Add `node3` to the cluster. Run the following command on `node1`:
+### Configure `node3`
+
+1. Add `node3` to the cluster. **Run the following command on `node1`**:
 
    ```{.bash data-prompt="$"}
    $ sudo etcdctl member add node3 http://10.104.0.3:2380
    ```
 
-10. Configure `etcd` on `node3`. 
Edit the `/etc/etcd/etcd.conf` configuration file on `node3` and add the IP addresses of all three nodes to the `ETCD_INITIAL_CLUSTER` parameter:
+2. On `node3`, back up the configuration file and export environment variables as described in steps 1-2 of the [`node1` configuration](#configure-node1).
+3. Modify the `/etc/etcd/etcd.conf` configuration file on `node3` and add the output from the `add` command as follows:
 
    ```text
-    ETCD_NAME=node3
+    ETCD_NAME=${NODE_NAME}
    ETCD_INITIAL_CLUSTER="node1=http://10.104.0.1:2380,node2=http://10.104.0.2:2380,node3=http://10.104.0.3:2380"
    ETCD_INITIAL_CLUSTER_STATE="existing"
-    ETCD_INITIAL_CLUSTER_TOKEN="percona-etcd-cluster"
-    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.104.0.3:2380"
-    ETCD_DATA_DIR="/var/lib/etcd/postgresql"
-    ETCD_LISTEN_PEER_URLS="http://10.104.0.3:2380"
-    ETCD_LISTEN_CLIENT_URLS="http://10.104.0.3:2379,http://localhost:2379"
-    ETCD_ADVERTISE_CLIENT_URLS="http://10.104.0.3:2379"
+    ETCD_INITIAL_CLUSTER_TOKEN="${ETCD_TOKEN}"
+    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${NODE_IP}:2380"
+    ETCD_DATA_DIR="${ETCD_DATA_DIR}"
+    ETCD_LISTEN_PEER_URLS="http://${NODE_IP}:2380"
+    ETCD_LISTEN_CLIENT_URLS="http://${NODE_IP}:2379,http://localhost:2379"
+    ETCD_ADVERTISE_CLIENT_URLS="http://${NODE_IP}:2379"
    …
    ```
 
-11. Start the `etcd` service on `node3`:
+4. Start the `etcd` service on `node3`:
 
    ```{.bash data-prompt="$"}
-    $ sudo systemctl enable etcd
+    $ sudo systemctl enable --now etcd
    $ sudo systemctl start etcd
    $ sudo systemctl status etcd
    ```
 
-12. Check the etcd cluster members.
+5. Check the etcd cluster members:
 
    ```{.bash data-prompt="$"}
    $ sudo etcdctl member list
    ```
 
@@ -234,19 +284,39 @@ The `etcd` cluster is first started in one node and then the subsequent nodes ar
 
 ## Configure Patroni
 
-1. Install Patroni on every PostgreSQL node:
+Run the following commands on all nodes. You can do this in parallel:
 
-    ```{.bash data-prompt="$"}
-    $ sudo yum install percona-patroni
-    ```
+1. Export and create environment variables to simplify the config file creation:
 
-2. Install the Python module that enables Patroni to communicate with ETCD.
+    * Node name:
 
-    ```{.bash data-prompt="$"}
-    $ sudo python3 -m pip install patroni[etcd]
-    ```
+    ```{.bash data-prompt="$"}
+    $ export NODE_NAME=`hostname -f`
+    ```
+
+    * Node IP:
+
+    ```{.bash data-prompt="$"}
+    $ export NODE_IP=`hostname -i | awk '{print $1}'`
+    ```
+
+    * Create variables to store the paths to the PostgreSQL data and binaries:
+
+    ```bash
+    DATA_DIR="/var/lib/pgsql/data/"
+    PG_BIN_DIR="/usr/pgsql-14/bin"
+    ```
+
+    **NOTE**: Check the path to the data and bin folders on your operating system and change the variables accordingly.
+
+    * Patroni information:
 
-3. Create the directories required by Patroni
+    ```bash
+    NAMESPACE="percona_lab"
+    SCOPE="cluster_1"
+    ```
+
+2. Create the directories required by Patroni:
 
    * Create the directory to store the configuration file and make it owned by the `postgres` user.
 
@@ -263,83 +333,88 @@ The `etcd` cluster is first started in one node and then the subsequent nodes ar
    $ sudo chmod 700 /data/pgsql
    ```
 
-4. Create the `/etc/patroni/patroni.yml` with the following configuration:
+3. 
Create the `/etc/patroni/patroni.yml` configuration file with the following configuration: - ```yaml - namespace: percona_lab - scope: cluster_1 - name: node1 + ```yaml title="/etc/patroni/patroni.yml" + namespace: ${NAMESPACE} + scope: ${SCOPE} + name: ${NODE_NAME} restapi: - listen: 0.0.0.0:8008 - connect_address: 10.104.0.1:8008 + listen: 0.0.0.0:8008 + connect_address: ${NODE_IP}:8008 etcd: - host: 10.104.0.1:2379 # ETCD node IP address + host: ${NODE_IP}:2379 bootstrap: # this section will be written into Etcd:///config after initializing new cluster dcs: - ttl: 30 - loop_wait: 10 - retry_timeout: 10 - maximum_lag_on_failover: 1048576 - slots: - percona_cluster_1: - type: physical - postgresql: - use_pg_rewind: true - use_slots: true - parameters: - wal_level: replica - hot_standby: "on" - wal_keep_segments: 10 - max_wal_senders: 5 - max_replication_slots: 10 - wal_log_hints: "on" - logging_collector: 'on' - # some desired options for 'initdb' - initdb: # Note: It needs to be a list (some options need values, others are switches) - - encoding: UTF8 - - data-checksums - pg_hba: # Add following lines to pg_hba.conf after running 'initdb' - - host replication replicator 127.0.0.1/32 trust - - host replication replicator 0.0.0.0/0 md5 - - host all all 0.0.0.0/0 md5 - - host all all ::0/0 md5 - # Some additional users which needs to be created after initializing new cluster - users: - admin: - password: qaz123 - options: - - createrole - - createdb - percona: - password: qaz123 - options: - - createrole - - createdb - - postgresql: - cluster_name: cluster_1 - listen: 0.0.0.0:5432 - connect_address: 10.104.0.1:5432 - data_dir: /data/pgsql - bin_dir: /usr/pgsql-14/bin - pgpass: /tmp/pgpass - authentication: - replication: - username: replicator - password: replPasswd - superuser: - username: postgres + ttl: 30 + loop_wait: 10 + retry_timeout: 10 + maximum_lag_on_failover: 1048576 + slots: + percona_cluster_1: + type: physical + + postgresql: + use_pg_rewind: true + use_slots: true + parameters: + wal_level: replica + hot_standby: "on" + wal_keep_segments: 10 + max_wal_senders: 5 + max_replication_slots: 10 + wal_log_hints: "on" + logging_collector: 'on' + + # some desired options for 'initdb' + initdb: # Note: It needs to be a list (some options need values, others are switches) + - encoding: UTF8 + - data-checksums + + pg_hba: # Add following lines to pg_hba.conf after running 'initdb' + - host replication replicator 127.0.0.1/32 trust + - host replication replicator 0.0.0.0/0 md5 + - host all all 0.0.0.0/0 md5 + - host all all ::0/0 md5 + + # Some additional users which needs to be created after initializing new cluster + users: + admin: password: qaz123 - parameters: - unix_socket_directories: "/var/run/postgresql/" - create_replica_methods: - - basebackup - basebackup: - checkpoint: 'fast' + options: + - createrole + - createdb + percona: + password: qaz123 + options: + - createrole + - createdb + + postgresql: + cluster_name: cluster_1 + listen: 0.0.0.0:5432 + connect_address: ${NODE_IP}:5432 + data_dir: ${DATADIR} + bin_dir: ${PG_BIN_DIR} + pgpass: /tmp/pgpass + authentication: + replication: + username: replicator + password: replPasswd + superuser: + username: postgres + password: qaz123 + + parameters: + unix_socket_directories: "/var/run/postgresql/" + create_replica_methods: + - basebackup + basebackup: + checkpoint: 'fast' tags: nofailover: false @@ -348,141 +423,102 @@ The `etcd` cluster is first started in one node and then the subsequent nodes ar nosync: false ``` -5. 
Create the configuration files for `node2` and `node3`. Replace the **node name and IP address** of `node1` to those of `node2` and `node3`, respectively. +4. Check that the systemd unit file `patroni.service` is created in `/etc/systemd/system`. If it is created, skip this step. -6. Create the systemd unit file `patroni.service` in `/etc/systemd/system`. - - ```{.bash data-prompt="$"} - $ sudo vim /etc/systemd/system/patroni.service - ``` - - Add the following contents in the file: - - ```ini - [Unit] - Description=Runners to orchestrate a high-availability PostgreSQL - After=syslog.target network.target + If it's **not** created, create it manually and specify the following contents within: + + ```ini title="/etc/systemd/system/patroni.service" + [Unit] + Description=Runners to orchestrate a high-availability PostgreSQL + After=syslog.target network.target - [Service] - Type=simple + [Service] + Type=simple - User=postgres - Group=postgres + User=postgres + Group=postgres - # Start the patroni process - ExecStart=/bin/patroni /etc/patroni/patroni.yml + # Start the patroni process + ExecStart=/bin/patroni /etc/patroni/patroni.yml - # Send HUP to reload from patroni.yml - ExecReload=/bin/kill -s HUP $MAINPID + # Send HUP to reload from patroni.yml + ExecReload=/bin/kill -s HUP $MAINPID - # only kill the patroni process, not its children, so it will gracefully stop postgres - KillMode=process + # only kill the patroni process, not its children, so it will gracefully stop postgres + KillMode=process - # Give a reasonable amount of time for the server to start up/shut down - TimeoutSec=30 + # Give a reasonable amount of time for the server to start up/shut down + TimeoutSec=30 - # Do not restart the service if it crashes, we want to manually inspect database on failure - Restart=no + # Do not restart the service if it crashes, we want to manually inspect database on failure + Restart=no - [Install] - WantedBy=multi-user.target - ``` + [Install] + WantedBy=multi-user.target + ``` -7. Make systemd aware of the new service: +5. Make `systemd` aware of the new service: ```{.bash data-prompt="$"} $ sudo systemctl daemon-reload - $ sudo systemctl enable patroni - $ sudo systemctl start patroni ``` - !!! admonition "Troubleshooting Patroni" - - To ensure that Patroni has started properly, check the logs using the following command: +6. Now it's time to start Patroni. You need the following commands on all nodes but not in parallel. Start with the `node1` first, wait for the service to come to live, and then proceed with the other nodes one-by-one, always waiting for them to sync with the primary node: - ```{.bash data-prompt="$"} - $ sudo journalctl -u patroni.service -n 100 -f - ``` + ```{.bash data-prompt="$"} + $ sudo systemctl enable --now patroni + $ sudo systemctl restart patroni + ``` - The output shouldn't show any errors: + When Patroni starts, it initializes PostgreSQL (because the service is not currently running and the data directory is empty) following the directives in the bootstrap section of the configuration file. - ``` - … - - Sep 23 12:50:21 node01 systemd[1]: Started PostgreSQL high-availability manager. - Sep 23 12:50:22 node01 patroni[10119]: 2021-09-23 12:50:22,022 INFO: Selected new etcd server http://10.104.0.2:2379 - Sep 23 12:50:22 node01 patroni[10119]: 2021-09-23 12:50:22,029 INFO: No PostgreSQL configuration items changed, nothing to reload. 
- Sep 23 12:50:22 node01 patroni[10119]: 2021-09-23 12:50:22,168 INFO: Lock owner: None; I am node1 - Sep 23 12:50:22 node01 patroni[10119]: 2021-09-23 12:50:22,177 INFO: trying to bootstrap a new cluster - Sep 23 12:50:22 node01 patroni[10140]: The files belonging to this database system will be owned by user "postgres". - Sep 23 12:50:22 node01 patroni[10140]: This user must also own the server process. - Sep 23 12:50:22 node01 patroni[10140]: The database cluster will be initialized with locale "C.UTF-8". - Sep 23 12:50:22 node01 patroni[10140]: The default text search configuration will be set to "english". - Sep 23 12:50:22 node01 patroni[10140]: Data page checksums are enabled. - Sep 23 12:50:22 node01 patroni[10140]: creating directory /var/lib/postgresql/12/main ... ok - Sep 23 12:50:22 node01 patroni[10140]: creating subdirectories ... ok - Sep 23 12:50:22 node01 patroni[10140]: selecting dynamic shared memory implementation ... posix - Sep 23 12:50:22 node01 patroni[10140]: selecting default max_connections ... 100 - Sep 23 12:50:22 node01 patroni[10140]: selecting default shared_buffers ... 128MB - Sep 23 12:50:22 node01 patroni[10140]: selecting default time zone ... Etc/UTC - Sep 23 12:50:22 node01 patroni[10140]: creating configuration files ... ok - Sep 23 12:50:22 node01 patroni[10140]: running bootstrap script ... ok - Sep 23 12:50:23 node01 patroni[10140]: performing post-bootstrap initialization ... ok - Sep 23 12:50:23 node01 patroni[10140]: syncing data to disk ... ok - Sep 23 12:50:23 node01 patroni[10140]: initdb: warning: enabling "trust" authentication for local connections - Sep 23 12:50:23 node01 patroni[10140]: You can change this by editing pg_hba.conf or using the option -A, or - Sep 23 12:50:23 node01 patroni[10140]: --auth-local and --auth-host, the next time you run initdb. - Sep 23 12:50:23 node01 patroni[10140]: Success. You can now start the database server using: - Sep 23 12:50:23 node01 patroni[10140]: /usr/lib/postgresql/14/bin/pg_ctl -D /var/lib/postgresql/14/main -l logfile start - Sep 23 12:50:23 node01 patroni[10156]: 2021-09-23 12:50:23.672 UTC [10156] LOG: redirecting log output to logging collector process - Sep 23 12:50:23 node01 patroni[10156]: 2021-09-23 12:50:23.672 UTC [10156] HINT: Future log output will appear in directory "log". - Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,694 INFO: postprimary pid=10156 - Sep 23 12:50:23 node01 patroni[10165]: localhost:5432 - accepting connections - Sep 23 12:50:23 node01 patroni[10167]: localhost:5432 - accepting connections - Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,743 INFO: establishing a new patroni connection to the postgres cluster - Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,757 INFO: running post_bootstrap - Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,767 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds - Sep 23 12:50:23 node01 patroni[10119]: 2021-09-23 12:50:23,793 INFO: initialized a new cluster - Sep 23 12:50:33 node01 patroni[10119]: 2021-09-23 12:50:33,810 INFO: no action. I am (node1) the leader with the lock - Sep 23 12:50:33 node01 patroni[10119]: 2021-09-23 12:50:33,899 INFO: no action. I am (node1) the leader with the lock - Sep 23 12:50:43 node01 patroni[10119]: 2021-09-23 12:50:43,898 INFO: no action. I am (node1) the leader with the lock - Sep 23 12:50:53 node01 patroni[10119]: 2021-09-23 12:50:53,894 INFO: no action. I am (node1) the leader with the - ``` +7. 
Check the service to see if there are errors:
+
+    ```{.bash data-prompt="$"}
+    $ sudo journalctl -fu patroni
+    ```
+
+    A common error is Patroni complaining about the lack of proper entries in the pg_hba.conf file. If you see such errors, you must manually add or fix the entries in that file and then restart the service.
+
+    Changing the patroni.yml file and restarting the service will not have any effect here because the bootstrap section specifies the configuration to apply when PostgreSQL is first started in the node. It will not repeat the process even if the Patroni configuration file is modified and the service is restarted.
+
+    If Patroni has started properly, you should be able to locally connect to a PostgreSQL node using the following command:
+
+    ```{.bash data-prompt="$"}
+    $ sudo psql -U postgres
+
+    psql (14.9)
+    Type "help" for help.
+
+    postgres=#
+    ```
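+
+    Once connected, you can also verify that streaming replication is working. On the node that currently holds the primary role, every replica should appear in the `pg_stat_replication` view. This is a quick sketch; the exact columns available vary slightly between PostgreSQL versions:
+
+    ```{.bash data-prompt="$"}
+    $ sudo psql -U postgres -c 'SELECT client_addr, state FROM pg_stat_replication;'
+    ```
+
+    One row per connected standby with `state` set to `streaming` means the replicas are receiving WAL from the primary.
+
+8. 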
When all nodes are up and running, you can check the cluster status using the following command: - ```{.bash data-prompt="$"} - $ sudo patronictl -c /etc/patroni/patroni.yml list + ```{.bash data-prompt="$"} + $ sudo patronictl -c /etc/patroni/patroni.yml list + ``` + The output on `node1` resembles the following: + + ```{.text .no-copy} + + Cluster: cluster_1 --+---------+---------+----+-----------+ + | Member | Host | Role | State | TL | Lag in MB | + +--------+-------------+---------+---------+----+-----------+ + | node-1 | 10.0.100.1 | Leader | running | 1 | | + +--------+-------------+---------+---------+----+-----------+ + ``` - + Cluster: postgres (7011110722654005156) -----------+ - | Member | Host | Role | State | TL | Lag in MB | - +--------+-------+---------+---------+----+-----------+ - | node1 | node1 | Leader | running | 1 | | - | node2 | node2 | Replica | running | 1 | 0 | - | node3 | node3 | Replica | running | 1 | 0 | - +--------+-------+---------+---------+----+-----------+ - ``` + On the remaining nodes: + + ```{.text .no-copy} + + Cluster: cluster_1 --+---------+---------+----+-----------+ + | Member | Host | Role | State | TL | Lag in MB | + +--------+-------------+---------+---------+----+-----------+ + | node-1 | 10.0.100.1 | Leader | running | 1 | | + | node-2 | 10.0.100.2 | Replica | running | 1 | 0 | + +--------+-------------+---------+---------+----+-----------+ + ``` ## Configure HAProxy @@ -556,4 +592,8 @@ This way, a client application doesn’t know what node in the underlying cluste ```{.bash data-prompt="$"} $ sudo journalctl -u haproxy.service -n 100 -f - ``` \ No newline at end of file + ``` + +## Next steps + +[Configure pgBackRest](pgbackrest.md){.md-button} \ No newline at end of file diff --git a/docs/solutions/high-availability.md b/docs/solutions/high-availability.md index 016cc1178..1510e8f30 100644 --- a/docs/solutions/high-availability.md +++ b/docs/solutions/high-availability.md @@ -1,18 +1,12 @@ # High Availability in PostgreSQL with Patroni -!!! summary - - - Solution overview - - Cluster deployment - - Testing the cluster - PostgreSQL has been widely adopted as a modern, high-performance transactional database. A highly available PostgreSQL cluster can withstand failures caused by network outages, resource saturation, hardware failures, operating system crashes or unexpected reboots. Such cluster is often a critical component of the enterprise application landscape, where [four nines of availability](https://en.wikipedia.org/wiki/High_availability#Percentage_calculation) is a minimum requirement. -There are several methods to achieve high availability in PostgreSQL. In this description we use [Patroni](#patroni) - the open-source extension to facilitate and manage the deployment of high availability in PostgreSQL. +There are several methods to achieve high availability in PostgreSQL. This solution document provides [Patroni](#patroni) - the open-source extension to facilitate and manage the deployment of high availability in PostgreSQL. -!!! admonition "High availability methods" +??? admonition "High availability methods" - There are a few methods for achieving high availability with PostgreSQL: + There are several native methods for achieving high availability with PostgreSQL: - shared disk failover, - file system replication, @@ -44,7 +38,7 @@ There are several methods to achieve high availability in PostgreSQL. 
In this de

 ## Patroni

-[Patroni](https://patroni.readthedocs.io/en/latest/) provides a template-based approach to create highly available PostgreSQL clusters. Running atop the PostgreSQL streaming replication process, it integrates with watchdog functionality to detect failed primary nodes and take corrective actions to prevent outages. Patroni also relies on a pluggable configuration store to manage distributed, multi-node cluster configuration and store the information about the cluster health there. Patroni comes with REST APIs to monitor and manage the cluster and has a command-line utility called _patronictl_ that helps manage switchovers and failure scenarios.
+[Patroni](https://patroni.readthedocs.io/en/latest/) is a template for you to create your own customized, high-availability solution using Python and - for maximum accessibility - a distributed configuration store like ZooKeeper, etcd, Consul or Kubernetes.

 ### Key benefits of Patroni:

@@ -67,13 +61,15 @@ The following diagram shows the architecture of a three-node PostgreSQL cluster

 The components in this architecture are:

 - PostgreSQL nodes
-- Patroni provides a template for configuring a highly available PostgreSQL cluster.
+- Patroni - a template for configuring a highly available PostgreSQL cluster.

-- ETCD is a Distributed Configuration store that stores the state of the PostgreSQL cluster.
+- ETCD - a distributed configuration store that keeps the state of the PostgreSQL cluster.

-- HAProxy is the load balancer for the cluster and is the single point of entry to client applications.
+- HAProxy - the load balancer for the cluster and the single point of entry for client applications.

-- Softdog - a watchdog utility which is used by Patroni to check the nodes' health. Watchdog resets the whole system when it doesn't receive a keepalive heartbeat within a specified time.
+- pgBackRest - the backup and restore solution for PostgreSQL.
+
+- Percona Monitoring and Management (PMM) - the solution to monitor the health of your cluster.

 ### How components work together

@@ -83,13 +79,9 @@ Patroni periodically sends heartbeat requests with the cluster status to ETCD. E

 The connections to the cluster do not happen directly to the database nodes but are routed via a connection proxy like HAProxy. This proxy determines the active node by querying the Patroni REST API.

-## Deployment
-
-Use the following links to navigate to the setup instructions relevant to your operating system:

-- [Deploy on Debian or Ubuntu](ha-setup-apt.md)
-- [Deploy on Red Hat Enterprise Linux or CentOS](ha-setup-yum.md)
+## Next steps

-## Testing
+[Deploy on Debian or Ubuntu](ha-setup-apt.md){.md-button}
+[Deploy on RHEL or derivatives](ha-setup-yum.md){.md-button}

-See the [Testing PostgreSQL cluster](ha-test.md) for the guidelines on how to test your PostgreSQL cluster for replication, failure, switchover.
\ No newline at end of file
diff --git a/docs/solutions/pgbackrest.md b/docs/solutions/pgbackrest.md
new file mode 100644
index 000000000..b7d59a932
--- /dev/null
+++ b/docs/solutions/pgbackrest.md
@@ -0,0 +1,341 @@
+# pgBackRest setup
+
+pgBackRest is the backup tool used to perform PostgreSQL database backups, restores, and point-in-time recovery. It is a client-server application, where the server runs on a dedicated host and a client runs on every PostgreSQL node.
+
+You also need storage to keep the backups. It can be either a remote storage, such as AWS S3, an S3-compatible storage, or Azure Blob Storage, or a filesystem-based one.
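+
+For example, a remote repository for a hypothetical S3 bucket named `pg-backups` could be defined in the pgBackRest configuration as in the following minimal sketch. The bucket, endpoint, region, and key values are placeholders to replace with your own; this guide itself uses a filesystem-based repository:
+
+```ini
+# Minimal sketch of an S3-backed repository (placeholder values)
+repo1-type=s3
+repo1-path=/pgbackrest
+repo1-s3-bucket=pg-backups
+repo1-s3-endpoint=s3.amazonaws.com
+repo1-s3-region=us-east-1
+repo1-s3-key=<YOUR_ACCESS_KEY>
+repo1-s3-key-secret=<YOUR_SECRET_KEY>
+```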
+
+## Configure backup server
+
+### Install pgBackRest
+
+1. Enable the repository with [percona-release](https://www.percona.com/doc/percona-repo-config/index.html):
+
+    ```{.bash data-prompt="$"}
+    $ sudo percona-release setup ppg14
+    ```
+
+2. Install the pgBackRest package:
+
+    === "Debian/Ubuntu"
+
+        ```{.bash data-prompt="$"}
+        $ sudo apt install percona-pgbackrest
+        ```
+
+    === "RHEL/derivatives"
+
+        ```{.bash data-prompt="$"}
+        $ sudo yum install percona-pgbackrest
+        ```
+
+### Create the configuration file
+
+1. Create environment variables to simplify the config file creation:
+
+    ```bash
+    export SRV_NAME="bkp-srv"
+    export NODE1_NAME="node-1"
+    export NODE2_NAME="node-2"
+    export NODE3_NAME="node-3"
+    ```
+
+2. Create the `pgBackRest` repository
+
+    A repository is where `pgBackRest` stores backups. In this example, the backups are saved to `/var/lib/pgbackrest`:
+
+    ```{.bash data-prompt="$"}
+    $ sudo mkdir -p /var/lib/pgbackrest
+    $ sudo chmod 750 /var/lib/pgbackrest
+    $ sudo chown postgres:postgres /var/lib/pgbackrest
+    ```
+
+3. The default pgBackRest configuration file location is `/etc/pgbackrest/pgbackrest.conf`. If it does not exist, then `/etc/pgbackrest.conf` is used next. Edit the `pgbackrest.conf` file to include the following configuration. Note that pgBackRest does not expand environment variables in its configuration file: replace the `${SRV_NAME}` and `${NODE*_NAME}` references with the values you exported earlier, or create the file, for example, with a shell heredoc so that the variables are expanded:
+
+    ```ini
+    [global]
+
+    # Server repo details
+    repo1-path=/var/lib/pgbackrest
+
+    ### Retention ###
+    # repo1-retention-archive-type
+    # - If set to full, pgBackRest will keep archive logs for the number of full backups defined by repo-retention-archive
+    repo1-retention-archive-type=full
+
+    # repo1-retention-archive
+    # - Number of backups worth of continuous WAL to retain
+    # - NOTE: WAL segments required to make a backup consistent are always retained until the backup is expired regardless of how this option is configured
+    # - If this value is not set and repo-retention-full-type is count (default), then the archive to expire will default to the repo-retention-full
+    # repo1-retention-archive=2
+
+    # repo1-retention-full
+    # - Full backup retention count/time.
+    # - When a full backup expires, all differential and incremental backups associated with the full backup will also expire.
+    # - When the option is not defined a warning will be issued.
+    # - If indefinite retention is desired then set the option to the max value.
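+    # repo1-retention-full-type
+    # - Determines whether repo1-retention-full counts backups (count, the default) or days (time).
+    # - Hypothetical time-based alternative to the count-based setting below:
+    #   repo1-retention-full-type=time
+    #   repo1-retention-full=14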
+    repo1-retention-full=4
+
+    # Server general options
+    process-max=12
+    log-level-console=info
+    #log-level-file=debug
+    log-level-file=info
+    start-fast=y
+    delta=y
+    backup-standby=y
+
+    ########## Server TLS options ##########
+    tls-server-address=*
+    tls-server-cert-file=/pg_ha/certs/${SRV_NAME}.crt
+    tls-server-key-file=/pg_ha/certs/${SRV_NAME}.key
+    tls-server-ca-file=/pg_ha/certs/ca.crt
+
+    ### Auth entry ###
+    tls-server-auth=${NODE1_NAME}=cluster_1
+    tls-server-auth=${NODE2_NAME}=cluster_1
+    tls-server-auth=${NODE3_NAME}=cluster_1
+
+    ### Clusters and nodes ###
+    [cluster_1]
+    pg1-host=${NODE1_NAME}
+    pg1-host-port=8432
+    pg1-port=5432
+    pg1-path=/var/lib/postgresql/14/main
+    pg1-host-type=tls
+    pg1-host-cert-file=/pg_ha/certs/${SRV_NAME}.crt
+    pg1-host-key-file=/pg_ha/certs/${SRV_NAME}.key
+    pg1-host-ca-file=/pg_ha/certs/ca.crt
+    pg1-socket-path=/var/run/postgresql
+
+    pg2-host=${NODE2_NAME}
+    pg2-host-port=8432
+    pg2-port=5432
+    pg2-path=/var/lib/postgresql/14/main
+    pg2-host-type=tls
+    pg2-host-cert-file=/pg_ha/certs/${SRV_NAME}.crt
+    pg2-host-key-file=/pg_ha/certs/${SRV_NAME}.key
+    pg2-host-ca-file=/pg_ha/certs/ca.crt
+    pg2-socket-path=/var/run/postgresql
+
+    pg3-host=${NODE3_NAME}
+    pg3-host-port=8432
+    pg3-port=5432
+    pg3-path=/var/lib/postgresql/14/main
+    pg3-host-type=tls
+    pg3-host-cert-file=/pg_ha/certs/${SRV_NAME}.crt
+    pg3-host-key-file=/pg_ha/certs/${SRV_NAME}.key
+    pg3-host-ca-file=/pg_ha/certs/ca.crt
+    pg3-socket-path=/var/run/postgresql
+    ```
+
+4. Create the `systemd` unit file at the path `/etc/systemd/system/pgbackrest.service`:
+
+    ```ini title="/etc/systemd/system/pgbackrest.service"
+    [Unit]
+    Description=pgBackRest Server
+    After=network.target
+    StartLimitIntervalSec=0
+
+    [Service]
+    Type=simple
+    User=postgres
+    Restart=always
+    RestartSec=1
+    ExecStart=/usr/bin/pgbackrest server
+    #ExecStartPost=/bin/sleep 3
+    #ExecStartPost=/bin/bash -c "[ ! -z $MAINPID ]"
+    ExecReload=/bin/kill -HUP $MAINPID
+
+    [Install]
+    WantedBy=multi-user.target
+    ```
+
+### Create the certificate files
+
+1. Create the folder to store the certificates, for example, `/pg_ha/certs`. Make sure the folder is writable by the `postgres` user, because the following `openssl` commands run on its behalf.
+
+2. Define the variable for the certificates path:
+
+    ```bash
+    export CA_PATH="/pg_ha/certs"
+    ```
+
+3. Create the self-signed certificate authority (CA) certificate and key:
+
+    ```{.bash data-prompt="$"}
+    $ sudo -iu postgres openssl req -new -x509 -days 365 -nodes -out ${CA_PATH}/ca.crt -keyout ${CA_PATH}/ca.key -subj "/CN=root-ca"
+    ```
+
+4. Create the certificate signing request and key for the backup server:
+
+    ```{.bash data-prompt="$"}
+    $ sudo -iu postgres openssl req -new -nodes -out ${CA_PATH}/${SRV_NAME}.csr -keyout ${CA_PATH}/${SRV_NAME}.key -subj "/CN=${SRV_NAME}"
+    ```
+
+5. Create the certificate signing requests and keys for the nodes `node-1`, `node-2` and `node-3`:
+
+    ```{.bash data-prompt="$"}
+    $ sudo -iu postgres openssl req -new -nodes -out ${CA_PATH}/${NODE1_NAME}.csr -keyout ${CA_PATH}/${NODE1_NAME}.key -subj "/CN=${NODE1_NAME}"
+    $ sudo -iu postgres openssl req -new -nodes -out ${CA_PATH}/${NODE2_NAME}.csr -keyout ${CA_PATH}/${NODE2_NAME}.key -subj "/CN=${NODE2_NAME}"
+    $ sudo -iu postgres openssl req -new -nodes -out ${CA_PATH}/${NODE3_NAME}.csr -keyout ${CA_PATH}/${NODE3_NAME}.key -subj "/CN=${NODE3_NAME}"
+    ```
+
+6. 
Sign the certificates with the `root-ca` key:
+
+    ```{.bash data-prompt="$"}
+    $ sudo -iu postgres openssl x509 -req -in ${CA_PATH}/${SRV_NAME}.csr -days 365 -CA ${CA_PATH}/ca.crt -CAkey ${CA_PATH}/ca.key -CAcreateserial -out ${CA_PATH}/${SRV_NAME}.crt
+    $ sudo -iu postgres openssl x509 -req -in ${CA_PATH}/${NODE1_NAME}.csr -days 365 -CA ${CA_PATH}/ca.crt -CAkey ${CA_PATH}/ca.key -CAcreateserial -out ${CA_PATH}/${NODE1_NAME}.crt
+    $ sudo -iu postgres openssl x509 -req -in ${CA_PATH}/${NODE2_NAME}.csr -days 365 -CA ${CA_PATH}/ca.crt -CAkey ${CA_PATH}/ca.key -CAcreateserial -out ${CA_PATH}/${NODE2_NAME}.crt
+    $ sudo -iu postgres openssl x509 -req -in ${CA_PATH}/${NODE3_NAME}.csr -days 365 -CA ${CA_PATH}/ca.crt -CAkey ${CA_PATH}/ca.key -CAcreateserial -out ${CA_PATH}/${NODE3_NAME}.crt
+    ```
+
+7. Remove the temporary files:
+
+    ```{.bash data-prompt="$"}
+    $ sudo rm ${CA_PATH}/*.csr
+    ```
+
+8. Reload, enable, and start the service:
+
+    ```{.bash data-prompt="$"}
+    $ sudo systemctl daemon-reload
+    $ sudo systemctl enable --now pgbackrest
+    ```
+
+## Configure database servers
+
+Run the following commands on `node-1`, `node-2` and `node-3`.
+
+1. Create the certificates folder, for example, `/pg_ha/certs`:
+
+    ```{.bash data-prompt="$"}
+    $ sudo mkdir -p /pg_ha/certs
+    ```
+
+    Copy the CA certificate and the certificate and key that you created for this node on the backup server into this folder, for example, with `scp`. The paths referenced in the configuration below must exist on the node.
+
+2. Export environment variables to simplify the config file creation:
+
+    ```bash
+    export NODE_NAME=`hostname -f`
+    ```
+
+3. Create the configuration file. The default path is `/etc/pgbackrest.conf`. Note that pgBackRest does not expand environment variables in its configuration file: replace `${NODE_NAME}` with the actual node name, or create the file so that the shell substitutes the variable:
+
+    ```ini title="/etc/pgbackrest.conf"
+    [global]
+    repo1-host=bkp-srv
+    repo1-host-user=postgres
+    repo1-host-type=tls
+    repo1-host-cert-file=/pg_ha/certs/${NODE_NAME}.crt
+    repo1-host-key-file=/pg_ha/certs/${NODE_NAME}.key
+    repo1-host-ca-file=/pg_ha/certs/ca.crt
+
+    # general options
+    process-max=16
+    log-level-console=info
+    log-level-file=debug
+
+    # tls server options
+    tls-server-address=*
+    tls-server-cert-file=/pg_ha/certs/${NODE_NAME}.crt
+    tls-server-key-file=/pg_ha/certs/${NODE_NAME}.key
+    tls-server-ca-file=/pg_ha/certs/ca.crt
+    tls-server-auth=bkp-srv=cluster_1
+
+    [cluster_1]
+    pg1-path=/var/lib/postgresql/14/main
+    ```
+
+4. Create the `systemd` unit file at the path `/etc/systemd/system/pgbackrest.service`:
+
+    ```ini title="/etc/systemd/system/pgbackrest.service"
+    [Unit]
+    Description=pgBackRest Server
+    After=network.target
+    StartLimitIntervalSec=0
+
+    [Service]
+    Type=simple
+    User=postgres
+    Restart=always
+    RestartSec=1
+    ExecStart=/usr/bin/pgbackrest server
+    #ExecStartPost=/bin/sleep 3
+    #ExecStartPost=/bin/bash -c "[ ! -z $MAINPID ]"
+    ExecReload=/bin/kill -HUP $MAINPID
+
+    [Install]
+    WantedBy=multi-user.target
+    ```
+
+5. Reload, enable, and start the service:
+
+    ```{.bash data-prompt="$"}
+    $ sudo systemctl daemon-reload
+    $ sudo systemctl enable --now pgbackrest
+    ```
+
+6. Change the Patroni configuration to use pgBackRest. Make this change on one node only, for example, on `node-1`. 
Edit the `/etc/patroni/patroni.yml` file to include the following configuration:
+
+    ```yaml title="/etc/patroni/patroni.yml"
+    loop_wait: 10
+    maximum_lag_on_failover: 1048576
+    postgresql:
+      parameters:
+        archive_command: pgbackrest --stanza=cluster_1 archive-push "/var/lib/postgresql/14/main/pg_wal/%f"
+        archive_mode: true
+        archive_timeout: 1800s
+        hot_standby: true
+        logging_collector: 'on'
+        max_replication_slots: 10
+        max_wal_senders: 5
+        wal_keep_size: 4096
+        wal_level: logical
+        wal_log_hints: true
+      recovery_conf:
+        recovery_target_timeline: latest
+        restore_command: pgbackrest --config=/etc/pgbackrest.conf --stanza=cluster_1 archive-get %f "%p"
+      use_pg_rewind: true
+      use_slots: true
+    retry_timeout: 10
+    slots:
+      percona_cluster_1:
+        type: physical
+    ttl: 30
+    ```
+
+## Create backups
+
+Run the following commands on the **backup server**:
+
+1. Create the stanza. A stanza is the configuration for a PostgreSQL database cluster that defines where it is located, how it is backed up, archiving options, and so on:
+
+    ```{.bash data-prompt="$"}
+    $ sudo -iu postgres pgbackrest --stanza=cluster_1 stanza-create
+    ```
+
+2. Create a full backup:
+
+    ```{.bash data-prompt="$"}
+    $ sudo -iu postgres pgbackrest --stanza=cluster_1 --type=full backup
+    ```
+
+3. Create an incremental backup:
+
+    ```{.bash data-prompt="$"}
+    $ sudo -iu postgres pgbackrest --stanza=cluster_1 --type=incr backup
+    ```
+
+4. Check the backup information:
+
+    ```{.bash data-prompt="$"}
+    $ sudo -iu postgres pgbackrest --stanza=cluster_1 info
+    ```
+
+5. Expire (remove) a backup. Be careful with removal, because removing a full backup also removes the dependent differential and incremental backups:
+
+    ```{.bash data-prompt="$"}
+    $ sudo -iu postgres pgbackrest --stanza=cluster_1 expire --set=20230617-021338F
+    ```
+
+[Test PostgreSQL cluster](ha-test.md){.md-button}
\ No newline at end of file
diff --git a/mkdocs-base.yml b/mkdocs-base.yml
index 35edc4ae8..c40f5649c 100644
--- a/mkdocs-base.yml
+++ b/mkdocs-base.yml
@@ -162,7 +162,8 @@ nav:
     - High availability:
         - "High availability": "solutions/high-availability.md"
        - 'Deploying on Debian or Ubuntu': 'solutions/ha-setup-apt.md'
-        - 'Deploying on RHEL or CentOS': 'solutions/ha-setup-yum.md'
+        - 'Deploying on RHEL or derivatives': 'solutions/ha-setup-yum.md'
+        - solutions/pgbackrest.md
        - solutions/ha-test.md
     - Backup and disaster recovery:
         - "Backup and disaster recovery": "solutions/backup-recovery.md"