Getting Started with PostgreSQL Streaming Replication

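In this article, we take a close look at the specifics of setting up streaming replication (SR) in PostgreSQL. Streaming replication is the basic building block for achieving high availability in PostgreSQL hosting, and it is achieved by running a master-slave configuration.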

Master-Slave Terminology

Master/Primary Server

The server that can accept writes.
Also called the read/write server.

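Slave/Standby Server

A server whose data is kept continuously in sync with the master.
Also called a backup server or replica.
A warm standby server is one that cannot be connected to until it is promoted to master.
In contrast, a hot standby server can accept connections and serve read-only queries. For the rest of this discussion, we focus only on hot standby servers.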

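Data is written to the master server and propagated to the slave servers. If the current master fails, one of the slave servers takes over and continues to accept writes, keeping the system available.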

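WAL Shipping-Based Replication

What Is WAL?

WAL stands for Write-Ahead Logging.
It is a log to which every modification to the database is written before it is applied/written to the data files.
WAL is used for recovery after a database crash, ensuring data integrity.
Database systems use WAL to achieve atomicity and durability.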

How Is WAL Used for Replication?

Write-ahead log records are used to keep data in sync between database servers. This is achieved in two ways:

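File-Based Log Shipping

WAL log files are shipped from the master to the standby server to keep the data in sync.
The master can copy the logs directly to the standby server's storage, or share storage with the standby server.
A WAL log file can hold up to 16MB of data (the default segment size).
A WAL file is shipped only after it reaches that threshold.
This causes replication delay, and it also increases the chance of losing data if the master crashes before the log is archived.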
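
To get a feel for the delay that file-based shipping introduces, the arithmetic can be sketched as follows. This is a minimal illustration that assumes the default 16MB segment size and a steady WAL generation rate; the function name and the example rate are hypothetical and not part of PostgreSQL.

```python
# Rough estimate of how long a WAL segment takes to fill, which bounds how
# far a file-shipping standby can trail the master under a steady write load.
# Assumption: default 16MB segment size and a constant WAL generation rate.

SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size


def seconds_to_fill_segment(wal_bytes_per_second: float) -> float:
    """Return the time needed to fill one WAL segment at the given rate."""
    if wal_bytes_per_second <= 0:
        raise ValueError("WAL rate must be positive")
    return SEGMENT_BYTES / wal_bytes_per_second


if __name__ == "__main__":
    # Hypothetical low-traffic master producing 64 KiB of WAL per second.
    print(seconds_to_fill_segment(64 * 1024))
```

At 64 KiB of WAL per second, a segment takes 256 seconds to fill, so the standby could trail the master by more than four minutes before the file is even shipped.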

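Streaming WAL Records

Chunks of WAL records are streamed between database servers to keep the data in sync.
The standby server connects to the master to receive the WAL chunks.
WAL records are streamed as they are generated.
Streaming WAL records does not need to wait for the WAL file to fill up.
This lets the standby server stay more up to date than file-based log shipping does.
Streaming replication is asynchronous by default, although it also supports synchronous replication.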

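Both approaches have their pros and cons. File-based shipping enables point-in-time recovery and continuous archiving, while streaming makes data available on the standby server almost immediately. However, you can configure PostgreSQL to use both methods at the same time and get the benefits of each. In this article, we focus mainly on streaming replication for PostgreSQL high availability.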

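How to Set Up Streaming Replication

Setting up streaming replication in PostgreSQL is straightforward. Assuming PostgreSQL is already installed on all servers, you can follow these steps to get started: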

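Configuration on the Master Node

Initialize the database on the master node using the initdb utility.
Create a role/user with replication privileges by running the command below. After running it, you can verify the result by running \du in psql to list the roles.

CREATE USER <user_name> REPLICATION LOGIN ENCRYPTED PASSWORD '<password>';

Configure the streaming replication properties in the master's PostgreSQL configuration file (postgresql.conf):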

# Possible values are replica|minimal|logical
wal_level = replica
# required for pg_rewind capability when standby goes out of sync with master
wal_log_hints = on
# sets the maximum number of concurrent connections from the standby servers.
max_wal_senders = 3
# The below parameter is used to tell the master to keep the minimum number of
# segments of WAL logs so that they are not deleted before standby consumes them.
# each segment is 16MB
wal_keep_segments = 8
# The below parameter enables read only connection on the node when it is in
# standby role. This is ignored when the server is running as master.
hot_standby = on
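
As a rough sanity check on the wal_keep_segments value above, the retained WAL volume and the outage it can absorb can be estimated as below. This is only a sketch: it assumes 16MB segments and a constant WAL rate, and the function names are hypothetical.

```python
# Estimate how much WAL the master retains for standbys and how long a
# standby could stay disconnected before the WAL it needs is recycled.
# Assumption: 16MB segments and a constant WAL generation rate.

SEGMENT_BYTES = 16 * 1024 * 1024  # size of one WAL segment


def retained_wal_bytes(wal_keep_segments: int) -> int:
    """Minimum amount of past WAL kept on the master, in bytes."""
    if wal_keep_segments < 0:
        raise ValueError("wal_keep_segments cannot be negative")
    return wal_keep_segments * SEGMENT_BYTES


def outage_window_seconds(wal_keep_segments: int, wal_bytes_per_second: float) -> float:
    """Approximate time a standby can be offline and still catch up by streaming."""
    if wal_bytes_per_second <= 0:
        raise ValueError("WAL rate must be positive")
    return retained_wal_bytes(wal_keep_segments) / wal_bytes_per_second


if __name__ == "__main__":
    print(retained_wal_bytes(8))                   # bytes kept with the setting above
    print(outage_window_seconds(8, 1024 * 1024))   # seconds at a hypothetical 1 MiB/s
```

With wal_keep_segments = 8, the master keeps at least 128MB of WAL; at a hypothetical 1 MiB/s of WAL, that covers roughly two minutes of standby downtime.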

Add a replication entry to the pg_hba.conf file to allow replication connections between the servers:

# Allow replication connections from the standby's address,
# by a user with the replication privilege.
# TYPE DATABASE USER ADDRESS METHOD
host replication repl_user IPaddress(CIDR) md5
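
If you generate such entries from a script, a small helper along these lines can catch malformed addresses before they reach pg_hba.conf. It is a sketch only: the function name and the example address are hypothetical, and md5 is used simply because the entry above uses it.

```python
# Build a pg_hba.conf replication entry and validate the CIDR address first.
import ipaddress


def replication_hba_entry(user: str, cidr: str, method: str = "md5") -> str:
    """Return a 'host replication ...' line for the given user and network."""
    # strict=True rejects addresses with host bits set, e.g. 10.0.0.5/24.
    network = ipaddress.ip_network(cidr, strict=True)
    return f"host replication {user} {network} {method}"


if __name__ == "__main__":
    # Hypothetical standby address; replace with your own.
    print(replication_hba_entry("repl_user", "192.168.1.10/32"))
```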

Restart the PostgreSQL service on the master node for the changes to take effect.

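Configuration on the Standby Node

Use the pg_basebackup utility to create a base backup of the master node and use it as the starting point for the standby node.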

# Explaining a few options used for pg_basebackup utility
# -X option is used to include the required transaction log files (WAL files) in the
# backup. When you specify stream, this will open a second connection to the server
# and start streaming the transaction log at the same time as running the backup.
# -c is the checkpoint option. Setting it to fast will force the checkpoint to be
# created soon.
# -W forces pg_basebackup to prompt for a password before connecting
# to a database.
pg_basebackup -D <data_directory> -h <master_host> -X stream -c fast -U repl_user -W

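Create the replication configuration file (recovery.conf) if it does not exist; it is created automatically if the -R option is passed to pg_basebackup. Note that this file applies to PostgreSQL 11 and earlier: from version 12 on, these settings live in postgresql.conf and an empty standby.signal file replaces standby_mode.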

# Specifies whether to start the server as a standby. In streaming replication,
# this parameter must be set to on.
standby_mode = 'on'
# Specifies a connection string which is used for the standby server to connect
# with the primary/master.
primary_conninfo = 'host=<master_host> port=<postgres_port> user=<replication_user> password=<password> application_name=<host_name>'
# Specifies recovering to a particular timeline. The default is to recover along the
# same timeline that was current when the base backup was taken.
# Setting this to latest recovers to the latest timeline found
# in the archive, which is useful in a standby server.
recovery_target_timeline = 'latest'

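Start the standby.

The standby configuration must be done on every standby server. Once the configuration is complete and the standby server is started, it connects to the master and begins streaming logs. This sets up replication, which you can verify by running the SQL statement SELECT * FROM pg_stat_replication; on the master.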
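
The positions reported by pg_stat_replication are log sequence numbers (LSNs), printed as two hexadecimal numbers separated by a slash. As a minimal sketch, the lag between two such positions can be computed in bytes as shown below. The column names sent_lsn and replay_lsn are those of PostgreSQL 10 and later (an assumption about your version), and the sample values are made up.

```python
# Convert textual LSNs such as '0/3000060' to integers and compute the
# byte distance between what the master has sent and what the standby
# has replayed.


def lsn_to_int(lsn: str) -> int:
    """Parse an LSN of the form 'hi/lo' (both hexadecimal) into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL sent by the master but not yet replayed on the standby."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)


if __name__ == "__main__":
    # Hypothetical values as they might appear in pg_stat_replication.
    print(replay_lag_bytes("0/3000060", "0/3000000"))
```

A lag that stays near zero suggests the standby is keeping up; a steadily growing value suggests it is falling behind.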

Streaming replication is asynchronous by default. If you want to make it synchronous, you can configure it with the following parameter:

# num_sync is the number of synchronous standbys from which transactions
# need to wait for replies.
# standby_name is same as application_name value in recovery.conf
# If all standby servers have to be considered for synchronous then set value '*'
# If only specific standby servers needs to be considered, then specify them as
# comma-separated list of standby_name.
# The name of a standby server for this purpose is the application_name setting of the
# standby, as set in the primary_conninfo of the
# standby's WAL receiver.
synchronous_standby_names = 'num_sync ( standby_name [, ...] )'
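
To make the template above concrete, here is a small sketch that fills it in. The helper name and the standby names are hypothetical; it only handles plain names and '*', and does not attempt the quoting that unusual names would need.

```python
# Render a value for synchronous_standby_names in the
# 'num_sync ( standby_name [, ...] )' form shown above.
from typing import Sequence


def synchronous_standby_names(num_sync: int, standby_names: Sequence[str]) -> str:
    """Return the setting value, e.g. '2 (standby1, standby2)'."""
    if num_sync < 1:
        raise ValueError("num_sync must be at least 1")
    if not standby_names:
        raise ValueError("at least one standby name (or '*') is required")
    return f"{num_sync} ({', '.join(standby_names)})"


if __name__ == "__main__":
    value = synchronous_standby_names(2, ["standby1", "standby2", "standby3"])
    # Line to place in postgresql.conf on the master.
    print(f"synchronous_standby_names = '{value}'")
```

Each name must match the application_name that the corresponding standby sets in its primary_conninfo.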

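synchronous_commit must be enabled for synchronous replication, which is the default. PostgreSQL provides very flexible options for synchronous commit, and it can be configured at the user/database level. The valid values are as follows: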

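off - The transaction commit is acknowledged to the client even before the transaction record is actually flushed to the WAL log file on this node.
local - The transaction commit is acknowledged to the client only after the transaction record is flushed to the WAL log file on this node.
remote_write - The transaction commit is acknowledged to the client only after the servers specified by synchronous_standby_names confirm that the transaction record has been written to the operating system's disk cache, but not necessarily flushed to the WAL log file.
on - The transaction commit is acknowledged to the client only after the servers specified by synchronous_standby_names confirm that the transaction record has been flushed to the WAL log file.
remote_apply - The transaction commit is acknowledged to the client only after the servers specified by synchronous_standby_names confirm that the transaction record has been flushed to the WAL log file and applied to the database.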

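Setting synchronous_commit to off or local in synchronous replication mode makes it behave like asynchronous replication and can give you better write performance. However, this increases the risk of data loss and of read lag on the standby servers. Setting it to remote_apply ensures that data is immediately available on the standby servers, but write performance may drop because each commit must be applied on all/the listed standby servers.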
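
The trade-off can be summarized by ordering the five values from least to most strict. The sketch below simply encodes the descriptions given above; the helper names are hypothetical and nothing here talks to a database.

```python
# Order the synchronous_commit values by how much must happen before a
# commit is acknowledged to the client, following the descriptions above.

SYNC_COMMIT_LEVELS = ("off", "local", "remote_write", "on", "remote_apply")


def strictness(level: str) -> int:
    """Position of a level in the ordering; raises ValueError if unknown."""
    return SYNC_COMMIT_LEVELS.index(level)


def waits_for_standby(level: str) -> bool:
    """True if the commit waits for a reply from the synchronous standbys."""
    return strictness(level) >= strictness("remote_write")


def visible_on_standby_at_ack(level: str) -> bool:
    """True if the change is already applied on the standby when the client is told."""
    return level == "remote_apply"


if __name__ == "__main__":
    for level in SYNC_COMMIT_LEVELS:
        print(level, waits_for_standby(level), visible_on_standby_at_ack(level))
```

Only the last three levels involve the standbys at all, which is why off and local behave like asynchronous replication.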

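If you plan to use continuous archiving and point-in-time recovery, you can enable archive mode. Although archiving is not mandatory for streaming replication, enabling archive mode has additional benefits. If archive mode is not turned on, you need to either use the replication slots feature or make sure wal_keep_segments (replaced by wal_keep_size as of PostgreSQL 13) is set high enough for your load.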