您现在的位置是：网站首页> 编程资料编程资料

postgres主备切换之文件触发方式详解_PostgreSQL_

2023-05-27 706人已围观

简介 postgres主备切换之文件触发方式详解_PostgreSQL_

本文测试参考PostgresSQL实战一书。

本文档测试环境：

主库IP：192.168.40.130 主机名：postgres 端口：5442

备库IP: 192.168.40.131 主机名：postgreshot 端口：5442

PostgreSQL9.0版本流复制主备切换只能通过创建触发文件方式进行，这一小节将介绍这种主备切换方式，测试环境为一主一备异步流复制环境，postgres上的数据库为主库，postgreshot上的数据库为备库，文件触发方式的手工主备切换主要步骤如下：

1）配置备库recovery.conf文件trigger_file参数，设置激活备库的触发文件路径和名称。

2）关闭主库，建议使用-m fast模式关闭。

3）在备库上创建触发文件激活备库，如果recovery.conf变成recovery.done表示备库已经切换成主库。

4）这时需要将老的主库切换成备库，在老的主库的$PGDATA目录下创建recovery.conf文件（如果此目录下不存在recovery.conf文件，可以根据$PGHOME/share/recovery.conf.sample模板文件复制一个，如果此目录下存在recovery.done文件，需将recovery.done文件重命名为recovery.conf），配置和老的从库一样，只是primary_conninfo参数中的IP换成对端IP。

5）启动老的主库，这时观察主、备进程是否正常，如果正常表示主备切换成功。

1、首先在备库上配置recovery.conf，如下所示：

 [postgres@postgreshot pg11]$ cat recovery.conf | grep -v '^#' recovery_target_timeline = 'latest' standby_mode = on primary_conninfo = 'host=192.168.40.130 port=5442 user=replica application_name=pg1' # e.g. 'host=localhost port=5432' trigger_file = '/home/postgres/pg11/trigger' [postgres@postgreshot pg11]$

trigger_file可以配置成普通文件或隐藏文件，调整以上参数后需重启备库使配置参数生效。

2、关闭主库，如下所示：

 [postgres@postgres pg11]$ pg_ctl stop -m fast waiting for server to shut down.... done server stopped [postgres@postgres pg11]$

3、在备库上创建触发文件激活备库，如下所示：

 [postgres@postgreshot pg11]$ ll recovery.conf -rwx------ 1 postgres postgres 5.9K Mar 26 18:47 recovery.conf [postgres@postgreshot pg11]$ [postgres@postgreshot pg11]$ touch /home/postgres/pg11/trigger [postgres@postgreshot pg11]$ ll recovery* -rwx------ 1 postgres postgres 5.9K Mar 26 18:47 recovery.done [postgres@postgreshot pg11]$

触发器文件名称和路径需和recovery.conf配置文件trigger_file保持一致，再次查看recovery文件时，发现后辍由原来的.conf变成了.done

查看备库数据库日志，如下所示：

 2019-03-26 23:30:19.399 EDT [93162] LOG: replication terminated by primary server 2019-03-26 23:30:19.399 EDT [93162] DETAIL: End of WAL reached on timeline 3 at 0/50003D0. 2019-03-26 23:30:19.399 EDT [93162] FATAL: could not send end-of-streaming message to primary: no COPY in progress 2019-03-26 23:30:19.399 EDT [93158] LOG: invalid record length at 0/50003D0: wanted 24, got 0 2019-03-26 23:30:19.405 EDT [125172] FATAL: could not connect to the primary server: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 2019-03-26 23:30:24.410 EDT [125179] FATAL: could not connect to the primary server: could not connect to server: Connection refused Is the server running on host "192.168.40.130" and accepting TCP/IP connections on port 5442? 2019-03-26 23:31:49.505 EDT [93158] LOG: trigger file found: /home/postgres/pg11/trigger 2019-03-26 23:31:49.506 EDT [93158] LOG: redo done at 0/5000360 2019-03-26 23:31:49.506 EDT [93158] LOG: last completed transaction was at log time 2019-03-26 19:03:11.202845-04 2019-03-26 23:31:49.516 EDT [93158] LOG: selected new timeline ID: 4 2019-03-26 23:31:50.063 EDT [93158] LOG: archive recovery complete 2019-03-26 23:31:50.083 EDT [93157] LOG: database system is ready to accept connections

根据备库以上信息，由于关闭了主库，首先日志显示连接不上主库，接着显示发现了触发文件，之后显示恢复成功，数据库切换成读写模式。

这时根据pg_controldata输出进行验证，如下所示：

 [postgres@postgreshot ~]$ pg_controldata | grep cluster Database cluster state: in production [postgres@postgreshot ~]$

以上显示数据库角色已经是主库角色，在postgreshot上创建一张名为test_alived的表并插入数据，如下所示：

 postgres=# CREATE TABLE test_alived2(id int4); CREATE TABLE postgres=# INSERT INTO test_alived2 VALUES(1); INSERT 0 1 postgres=#

4、准备将老的主库切换成备库角色，在老的主库上配置recovery.conf，如下所示：

 [postgres@postgres pg11]$ cat recovery.conf | grep -v '^#' recovery_target_timeline = 'latest' standby_mode = on primary_conninfo = 'host=192.168.40.131 port=5442 user=replica application_name=pg2' # e.g. 'host=localhost port=5432' trigger_file = '/home/postgres/pg11/trigger' [postgres@postgres pg11]$

以上配置和postgreshot上的recovery.done配置文件基本一致，只是primary_conninfo参数的host选项配置成对端主机IP。

之后在postgres主机用户家目录创建~/.pgpass文件，如下所示：

 [postgres@pghost1 ~]$ touch ~/.pgpass [postgres@pghost1 ~]$ chmod 600 ~/.pgpass

并在~/.pgpass文件中插入以下内容：

 [postgres@postgres ~]$ cat .pgpass 192.168.40.130:5442:replication:replica:replica 192.168.40.131:5442:replication:replica:replica [postgres@postgres ~]

之后启动postgres上的数据库，如下所示：

 [postgres@postgres ~]$ pg_ctl start waiting for server to start....2019-03-26 23:38:50.424 EDT [55380] LOG: listening on IPv4 address "0.0.0.0", port 5442 2019-03-26 23:38:50.424 EDT [55380] LOG: listening on IPv6 address "::", port 5442 2019-03-26 23:38:50.443 EDT [55380] LOG: listening on Unix socket "/tmp/.s.PGSQL.5442" 2019-03-26 23:38:50.465 EDT [55381] LOG: database system was shut down in recovery at 2019-03-26 23:38:20 EDT 2019-03-26 23:38:50.465 EDT [55381] LOG: entering standby mode 2019-03-26 23:38:50.483 EDT [55381] LOG: consistent recovery state reached at 0/50003D0 2019-03-26 23:38:50.483 EDT [55381] LOG: invalid record length at 0/50003D0: wanted 24, got 0 2019-03-26 23:38:50.483 EDT [55380] LOG: database system is ready to accept read only connections done server started [postgres@postgres ~]$ 2019-03-26 23:38:50.565 EDT [55385] LOG: fetching timeline history file for timeline 4 from primary server 2019-03-26 23:38:50.588 EDT [55385] LOG: started streaming WAL from primary at 0/5000000 on timeline 3 2019-03-26 23:38:50.589 EDT [55385] LOG: replication terminated by primary server 2019-03-26 23:38:50.589 EDT [55385] DETAIL: End of WAL reached on timeline 3 at 0/50003D0. 2019-03-26 23:38:50.592 EDT [55381] LOG: new target timeline is 4 2019-03-26 23:38:50.594 EDT [55385] LOG: restarted WAL streaming at 0/5000000 on timeline 4 2019-03-26 23:38:50.717 EDT [55381] LOG: redo starts at 0/50003D0 [postgres@postgres ~]$ pg_controldata | grep cluster Database cluster state: in archive recovery [postgres@postgres ~]$ postgres=# select * from test_alived2; id ---- 1 (1 row) postgres=#

同时，postgres上已经有了WAL接收进程，postgreshot上有了WAL发送进程，说明老的主库已经成功切换成备库，以上是主备切换的所有步骤。

为什么在步骤2中需要干净地关闭主库？数据库关闭时首先做一次checkpoint，完成之后通知WAL发送进程要关闭了，WAL发送进程会将截止此次checkpoint的WAL日志流发送给备库的WAL接收进程，备节点接收到主库最后发送来的WAL日志流后应用WAL，从而达到了和主库一致的状态。

另一个需要注意的问题是假如主库主机异常宕机了，如果激活备库，备库的数据完全和主库一致吗？此环境为一主一备异步流复制环境，备库和主库是异步同步方式，存在延时，这时主库上已提交事务的WAL有可能还没来得及发送给备库，主库主机就已经宕机了，因此异步流复制备库可能存在事务丢失的风险。

主备切换之pg_ctl promote方式

上面介绍了以文件触发方式进行主备切换，PostgreSQL9.1版本开始支持pg_ctl promote触发方式，相比文件触发方式操作更方便，promote命令语法如下：

 pg_ctl promote [-D datadir]

-D是指数据目录，如果不指定会使用环境变量$PGDATA设置的值。promote命令发出后，运行中的备库将停止恢复模式并切换成读写模式的主库。

pg_ctl promote主备切换步骤和文件触发方式大体相同，只是步骤1中不需要配置recovery.conf配置文件中的trigger_file参数，并且步骤3中换成以pg_ctl promote方式进行主备切换，如下：

1）关闭主库，建议使用-m fast模式关闭。

2）在备库上执行pg_ctl promote命令激活备库，如果recovery.conf变成recovery.done表示备库已切换成为主库。

3）这时需要将老的主库切换成备库，在老的主库的$PGDATA目录下创建recovery.conf文件（如果此目录下不存在recovery.conf文件，可以根据$PGHOME/share/recovery.conf.sample模板文件复制一个，如果此目录下存在recovery.done文件，需将recovery.done文件重命名为recovery.conf），配置和老的从库一样，只是primary_conninfo参数中的IP换成对端IP。

4）启动老的主库，这时观察主、备进程是否正常，如果正常表示主备切换成功。以上是pg_ctl promote主备切换的主要步骤，这一小节不进行演示了，下一小节介绍pg_rewind工具时会给出使用pg_ctl promote进行主备切换的示例

pg_rewind

pg_rewind是流复制维护时一个非常好的数据同步工具，在上一节介绍流复制主备切换内容中讲到了主要有五个步骤进行主备切换，其中步骤2是在激活备库前先关闭主库，如果不做步骤2会出现什么样的情况？下面我们举例进行演示，测试环境为一主一备异步流复制环境，postgres上的数据库为主库，postgreshot上的数据库为备库。

主备切换

--备节点 recovery.conf 配置: postgreshot 上操作

备库recovery.conf配置如下所示：

 [postgres@postgreshot pg11]$ cat recovery.conf | grep -v '^#' recovery_target_timeline = 'latest' standby_mode = on primary_conninfo = 'host=192.168.40.130 port=5442 user=replica application_name=pg1' # e.g. 'host=localhost port=5432' trigger_file = '/home/postgres/pg11/trigger' [postgres@postgreshot pg11]$

--激活备节点: postgreshot 上操作

检查流复制状态，确保正常后在备库主机上执行以下命令激活备库，如下所示

 [postgres@postgreshot pg11]$ pg_ctl promote -D $PGDATA waiting for server to promote.... done server promoted [postgres@postgreshot pg11]$ [postgres@postgreshot pg11]$

查看备库数据库日志，能够看到数据库正常打开接收外部连接的信息，这说明激活成功，检查postgreshot上的数据库角色，如下所示:

 [postgres@postgreshot pg11]$ pg_controldata | grep cluster Database cluster state: in production [postgres@postgreshot pg11]$

从pg_controldata输出也可以看到postgreshot上的数据库已成为主库，说明postgreshot上的数据库已经切换成主库，这时老的主库（postgres上的数据库）依然还在运行中，我们计划将postgres上的角色转换成备库，先查看postgres上的数据库角色，如下所示

 [postgres@postgres pg11]$ pg_controldata | grep cluster Database cluster state: in production [postgres@postgres pg11]$

--备节点激活后，创建一张测试表并插入数据

 postgres=# create table test_1(id int4); CREATE TABLE postgres=# insert into test_1(id) select n from generate_series(1,10) n; INSERT 0 10 postgres=#

--停原来主节点: postgres 上操作

[postgres@postgres pg11]$ pg_controldata | grep cluster Database cluster state: in production [postgres@postgres pg11]$ [postgres@postgres pg11]$ pg_ctl stop -m fast -D $PGDATA 2019-03-27 01:10:46.714 EDT [64858] LOG: received fast shutdown request waiting for server to shut down....2019-03-27 01:10:46.716 EDT [64858] LOG: aborting any active transactions 2019-03-27 01:10:46.717 EDT [64858] LOG: background worker "logical replication launcher" (PI

上一篇：Postgresql创建新增、删除与修改触发器的方法_PostgreSQL_

下一篇：基于postgresql数据库锁表问题的解决_PostgreSQL_

您现在的位置是：网站首页> 编程资料编程资料

postgres主备切换之文件触发方式详解_PostgreSQL_

本文档测试环境：

主备切换之pg_ctl promote方式

主备切换

相关内容

点击排行

本栏推荐

猜你喜欢