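The previous chapter introduced Redis Sentinel mode, but with only a single Sentinel. If that Sentinel goes down, no failover can take place when the Master fails. This chapter shows how to start multiple Sentinels so that failover of the Master still works even if one Sentinel goes down, as shown in the figure below:
As the figure shows, we add three Sentinels. Each Sentinel monitors the Master and the Slaves, and the Sentinels also monitor and communicate with one another.
Note: before configuring multi-Sentinel mode, a Master-Slave replication setup must already be in place. If you are not sure how to configure it, see "Redis Master-Slave Replication".
The multi-Sentinel setup described below starts six Redis processes in total: three Redis data servers and three Sentinels, as follows: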
    Master     127.0.0.1:6379
    Slave1     127.0.0.1:6378
    Slave2     127.0.0.1:6377
    Sentinel1  127.0.0.1:26379
    Sentinel2  127.0.0.1:26378
    Sentinel3  127.0.0.1:26377
The directory layout for the multi-Sentinel setup is shown in the figure below:
The content of start-redis-6377.bat is as follows (start-redis-6378.bat and start-redis-6379.bat are similar):
    @echo off
    title redis5-slave2-6377
    cd %~dp0\redis5-slave2-6377
    redis-server.exe redis.windows.conf
    cd %~dp0
    pause
The content of start-redis-sentinel-26377.bat is as follows (start-redis-sentinel-26378.bat and start-redis-sentinel-26379.bat are similar):
    @echo off
    title sentinel-26377
    cd %~dp0\redis5-sentinel-26377
    redis-server.exe sentinel.conf --sentinel
    cd %~dp0
    pause
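The following shows how to configure multiple Sentinels. Before configuring, note the following points:
(1) Each Sentinel must define a unique ID with sentinel myid.
(2) Each Sentinel must define a different port with port.
Apart from these two points, the configuration is the same as for a single Sentinel. The main directives to set are sentinel monitor, sentinel down-after-milliseconds, and so on.
Below is part of the configuration of Sentinel 1: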
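    protected-mode no
    port 26379
    # Unique ID of this Sentinel
    sentinel myid 19e18f6594d97e540e075ca27bd73708911ad101
    # Tell Sentinel to monitor a master Redis instance named mymaster,
    # whose IP address is 127.0.0.1 and whose port is 6379.
    # At least 2 Sentinel processes must agree before this master is judged to have failed.
    # As long as fewer Sentinels than that agree, automatic failover is not performed.
    sentinel monitor mymaster 127.0.0.1 6379 2
    # Number of milliseconds after which Sentinel considers a Redis instance to be down.
    # If the instance does not reply to PING within this time, or replies with an error,
    # Sentinel marks it as subjectively down (sdown).
    # A single Sentinel marking an instance as subjectively down does not necessarily trigger automatic failover.
    # Only after enough Sentinels have marked the instance as subjectively down
    # is it marked as objectively down (odown), and only then is automatic failover performed.
    sentinel down-after-milliseconds mymaster 5000
    # Password used when connecting to the master and the slaves. Sentinel cannot use different
    # passwords for the master and the slaves, so they should all share the same password.
    sentinel auth-pass mymaster aaaaaa
    # Configuration epoch of mymaster. Sentinel maintains and rewrites this value itself,
    # so it normally does not need to be set by hand.
    # Note: the number of slaves that may resync with the new master at the same time during a
    # failover is controlled by a different directive, sentinel parallel-syncs. With many slaves,
    # the smaller that number is, the longer the failover takes to complete.
    sentinel config-epoch mymaster 5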
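The quorum comments above can be made concrete with a small Python sketch. It is not Sentinel's actual implementation, and the function names `is_objectively_down` and `can_authorize_failover` are hypothetical. It assumes, in line with the Redis Sentinel documentation, that marking a master as objectively down needs `quorum` agreeing Sentinels, while actually starting a failover additionally needs a leader elected by a majority of all known Sentinels.

```python
def is_objectively_down(sdown_votes: int, quorum: int) -> bool:
    """The master is objectively down (odown) once at least `quorum`
    Sentinels report it as subjectively down (sdown)."""
    return sdown_votes >= quorum


def can_authorize_failover(reachable_sentinels: int,
                           total_sentinels: int,
                           quorum: int) -> bool:
    """A failover needs a leader voted for by a majority of all Sentinels,
    and by at least `quorum` of them (assumption stated in the lead-in)."""
    majority = total_sentinels // 2 + 1
    return reachable_sentinels >= max(quorum, majority)


if __name__ == "__main__":
    # The setup in this article: three Sentinels, quorum 2, one Sentinel down.
    print(can_authorize_failover(reachable_sentinels=2, total_sentinels=3, quorum=2))
    # Only one Sentinel left: the master cannot be failed over.
    print(can_authorize_failover(reachable_sentinels=1, total_sentinels=3, quorum=2))
```

Under these assumptions, three Sentinels with a quorum of 2 tolerate the loss of exactly one Sentinel, which is the scenario this chapter sets out to cover.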
The configuration file of Sentinel 2 is as follows:
    protected-mode no
    port 26378
    # Unique ID of this Sentinel
    sentinel myid 19e18f6594d97e540e075ca27bd73708911ad102
    sentinel monitor mymaster 127.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 5000
    sentinel auth-pass mymaster aaaaaa
    sentinel config-epoch mymaster 5
The configuration file of Sentinel 3 is as follows:
    protected-mode no
    port 26377
    # Unique ID of this Sentinel
    sentinel myid 19e18f6594d97e540e075ca27bd73708911ad103
    sentinel monitor mymaster 127.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 5000
    sentinel auth-pass mymaster aaaaaa
    sentinel config-epoch mymaster 5
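Since the three files differ only in port and sentinel myid, they can be rendered from one template. The following Python sketch illustrates this; the names `SENTINELS`, `TEMPLATE`, `render_sentinel_conf` and `render_all` are hypothetical, and the sketch deliberately leaves out sentinel config-epoch on the assumption that Sentinel writes that line itself.

```python
# Port -> sentinel myid, taken from the three configuration files above.
SENTINELS = {
    26379: "19e18f6594d97e540e075ca27bd73708911ad101",
    26378: "19e18f6594d97e540e075ca27bd73708911ad102",
    26377: "19e18f6594d97e540e075ca27bd73708911ad103",
}

# Everything except port and myid is identical for all Sentinels.
TEMPLATE = """\
protected-mode no
port {port}
sentinel myid {myid}
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel auth-pass mymaster aaaaaa
"""


def render_sentinel_conf(port: int, myid: str) -> str:
    """Return the text of one sentinel.conf."""
    return TEMPLATE.format(port=port, myid=myid)


def render_all(sentinels: dict) -> dict:
    """Render one config per Sentinel, refusing duplicate IDs."""
    ids = list(sentinels.values())
    if len(set(ids)) != len(ids):
        raise ValueError("every Sentinel needs a unique myid")
    return {port: render_sentinel_conf(port, myid)
            for port, myid in sentinels.items()}


if __name__ == "__main__":
    for port, text in render_all(SENTINELS).items():
        print(f"--- sentinel.conf for port {port} ---")
        print(text)
```

The script only prints the text; writing each result to the matching redis5-sentinel-<port> directory is left out here.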
Start each Sentinel with the command redis-server.exe sentinel.conf --sentinel. The startup logs are as follows:
    D:\redis5\redis5-sentinel-26379> redis-server.exe sentinel.conf --sentinel
    [45316] 10 Apr 12:42:28.886 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    [45316] 10 Apr 12:42:28.886 # Redis version=5.0.14.1, bits=64, commit=ec77f72d, modified=0, pid=45316, just started
    [45316] 10 Apr 12:42:28.886 # Configuration loaded
    (Redis ASCII-art banner: Redis 5.0.14.1 (ec77f72d/0) 64 bit, Running in sentinel mode, Port: 26379, PID: 45316, http://redis.io)
    [45316] 10 Apr 12:42:28.891 # Sentinel ID is 19e18f6594d97e540e075ca27bd73708911ad101
    [45316] 10 Apr 12:42:28.891 # +monitor master mymaster 127.0.0.1 6379 quorum 2
    [45316] 10 Apr 12:42:29.832 * +slave slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:42:29.834 * +slave slave 127.0.0.1:6377 127.0.0.1 6377 @ mymaster 127.0.0.1 6379

    D:\redis5\redis5-sentinel-26378> redis-server.exe sentinel.conf --sentinel
    [14524] 10 Apr 12:42:46.902 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    [14524] 10 Apr 12:42:46.902 # Redis version=5.0.14.1, bits=64, commit=ec77f72d, modified=0, pid=14524, just started
    [14524] 10 Apr 12:42:46.902 # Configuration loaded
    (Redis ASCII-art banner: Redis 5.0.14.1 (ec77f72d/0) 64 bit, Running in sentinel mode, Port: 26378, PID: 14524, http://redis.io)
    [14524] 10 Apr 12:42:46.907 # Sentinel ID is 19e18f6594d97e540e075ca27bd73708911ad102
    [14524] 10 Apr 12:42:46.907 # +monitor master mymaster 127.0.0.1 6379 quorum 2
    [14524] 10 Apr 12:42:47.847 * +slave slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6379
    [14524] 10 Apr 12:42:47.848 * +slave slave 127.0.0.1:6377 127.0.0.1 6377 @ mymaster 127.0.0.1 6379
    [14524] 10 Apr 12:42:48.167 * +sentinel sentinel 19e18f6594d97e540e075ca27bd73708911ad101 127.0.0.1 26379 @ mymaster 127.0.0.1 6379

    D:\redis5\redis5-sentinel-26377> redis-server.exe sentinel.conf --sentinel
    [55580] 10 Apr 12:43:01.259 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    [55580] 10 Apr 12:43:01.260 # Redis version=5.0.14.1, bits=64, commit=ec77f72d, modified=0, pid=55580, just started
    [55580] 10 Apr 12:43:01.260 # Configuration loaded
    (Redis ASCII-art banner: Redis 5.0.14.1 (ec77f72d/0) 64 bit, Running in sentinel mode, Port: 26377, PID: 55580, http://redis.io)
    [55580] 10 Apr 12:43:01.265 # Sentinel ID is 19e18f6594d97e540e075ca27bd73708911ad103
    [55580] 10 Apr 12:43:01.265 # +monitor master mymaster 127.0.0.1 6379 quorum 2
    [55580] 10 Apr 12:43:02.195 * +slave slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6379
    [55580] 10 Apr 12:43:02.196 * +slave slave 127.0.0.1:6377 127.0.0.1 6377 @ mymaster 127.0.0.1 6379
    [55580] 10 Apr 12:43:02.278 * +sentinel sentinel 19e18f6594d97e540e075ca27bd73708911ad102 127.0.0.1 26378 @ mymaster 127.0.0.1 6379
    [55580] 10 Apr 12:43:02.341 * +sentinel sentinel 19e18f6594d97e540e075ca27bd73708911ad101 127.0.0.1 26379 @ mymaster 127.0.0.1 6379
From the logs above we can see that Sentinel 3 is monitoring one master and two slaves, and has discovered the two other Sentinels.
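The same reading of the log can be done mechanically. The Python sketch below counts the +monitor, +slave and +sentinel events in a Sentinel log; the names `EVENT_RE`, `count_discovered` and `SAMPLE_LOG` are hypothetical, and the regular expression is only assumed to fit the log format shown in this article (Redis 5.0.14.1 on Windows).

```python
import re
from collections import Counter

# Matches the event name that follows the "#" or "*" level marker.
EVENT_RE = re.compile(r"[#*] \+(monitor|slave|sentinel) ")

# Event lines copied from the Sentinel 3 startup log above.
SAMPLE_LOG = """\
[55580] 10 Apr 12:43:01.265 # Sentinel ID is 19e18f6594d97e540e075ca27bd73708911ad103
[55580] 10 Apr 12:43:01.265 # +monitor master mymaster 127.0.0.1 6379 quorum 2
[55580] 10 Apr 12:43:02.195 * +slave slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6379
[55580] 10 Apr 12:43:02.196 * +slave slave 127.0.0.1:6377 127.0.0.1 6377 @ mymaster 127.0.0.1 6379
[55580] 10 Apr 12:43:02.278 * +sentinel sentinel 19e18f6594d97e540e075ca27bd73708911ad102 127.0.0.1 26378 @ mymaster 127.0.0.1 6379
[55580] 10 Apr 12:43:02.341 * +sentinel sentinel 19e18f6594d97e540e075ca27bd73708911ad101 127.0.0.1 26379 @ mymaster 127.0.0.1 6379
"""


def count_discovered(log_text: str) -> Counter:
    """Count monitored masters, discovered slaves and discovered Sentinels."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    counts = count_discovered(SAMPLE_LOG)
    print(f"masters={counts['monitor']} slaves={counts['slave']} "
          f"other sentinels={counts['sentinel']}")
```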
(1) To test failover, first check the current replication information:
    127.0.0.1:6379> info replication
    # Replication
    role:master
    connected_slaves:2
    slave0:ip=127.0.0.1,port=6378,state=online,offset=48785,lag=1
    slave1:ip=127.0.0.1,port=6377,state=online,offset=48785,lag=1
    master_replid:9bbd7de80825d3a703b61dfc8409fa7ff8dc264b
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:48918
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:1
    repl_backlog_histlen:48918
As shown above, 6379 is the Master, and 6378 and 6377 are Slaves.
(2) Stop the Master service.
(3) Observe the log output of the Sentinels:
    # Sentinel 1
    [45316] 10 Apr 12:48:57.502 # +elected-leader master mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:57.502 # +failover-state-select-slave master mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:57.581 # +selected-slave slave 127.0.0.1:6377 127.0.0.1 6377 @ mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:57.581 * +failover-state-send-slaveof-noone slave 127.0.0.1:6377 127.0.0.1 6377 @ mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:57.674 * +failover-state-wait-promotion slave 127.0.0.1:6377 127.0.0.1 6377 @ mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:58.474 # +promoted-slave slave 127.0.0.1:6377 127.0.0.1 6377 @ mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:58.474 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:58.548 * +slave-reconf-sent slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:59.485 * +slave-reconf-inprog slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:59.485 * +slave-reconf-done slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:59.561 # -odown master mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:59.561 # +failover-end master mymaster 127.0.0.1 6379
    [45316] 10 Apr 12:48:59.562 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6377
    [45316] 10 Apr 12:48:59.563 * +slave slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6377
    [45316] 10 Apr 12:48:59.565 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6377
    [45316] 10 Apr 12:49:04.635 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6377

    # Sentinel 2
    [14524] 10 Apr 12:48:57.344 # +sdown master mymaster 127.0.0.1 6379
    [14524] 10 Apr 12:48:57.428 # +new-epoch 6
    [14524] 10 Apr 12:48:57.429 # +vote-for-leader 19e18f6594d97e540e075ca27bd73708911ad101 6
    [14524] 10 Apr 12:48:57.430 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
    [14524] 10 Apr 12:48:57.430 # Next failover delay: I will not start a failover before Mon Apr 10 12:54:57 2023
    [14524] 10 Apr 12:48:58.548 # +config-update-from sentinel 19e18f6594d97e540e075ca27bd73708911ad101 127.0.0.1 26379 @ mymaster 127.0.0.1 6379
    [14524] 10 Apr 12:48:58.549 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6377
    [14524] 10 Apr 12:48:58.553 * +slave slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6377
    [14524] 10 Apr 12:48:58.555 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6377
    [14524] 10 Apr 12:49:03.575 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6377

    # Sentinel 3
    [55580] 10 Apr 12:48:57.328 # +sdown master mymaster 127.0.0.1 6379
    [55580] 10 Apr 12:48:57.428 # +new-epoch 6
    [55580] 10 Apr 12:48:57.430 # +vote-for-leader 19e18f6594d97e540e075ca27bd73708911ad101 6
    [55580] 10 Apr 12:48:58.440 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
    [55580] 10 Apr 12:48:58.440 # Next failover delay: I will not start a failover before Mon Apr 10 12:54:57 2023
    [55580] 10 Apr 12:48:58.548 # +config-update-from sentinel 19e18f6594d97e540e075ca27bd73708911ad101 127.0.0.1 26379 @ mymaster 127.0.0.1 6379
    [55580] 10 Apr 12:48:58.548 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6377
    [55580] 10 Apr 12:48:58.550 * +slave slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6377
    [55580] 10 Apr 12:48:58.553 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6377
    [55580] 10 Apr 12:49:03.559 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6377
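The line that matters most in these logs is +switch-master, which names the old and the new Master. As a minimal sketch, the following Python code extracts it from a Sentinel log; `SwitchMaster`, `SWITCH_RE`, `find_switch_master` and `SAMPLE_LOG` are hypothetical names, and the pattern is only assumed to fit the log format shown above.

```python
import re
from typing import NamedTuple, Optional, Tuple


class SwitchMaster(NamedTuple):
    name: str               # master name, e.g. "mymaster"
    old: Tuple[str, int]    # (ip, port) of the failed master
    new: Tuple[str, int]    # (ip, port) of the promoted slave


# +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_RE = re.compile(r"\+switch-master (\S+) (\S+) (\d+) (\S+) (\d+)")

# Lines copied from the Sentinel 1 failover log above.
SAMPLE_LOG = """\
[45316] 10 Apr 12:48:59.561 # -odown master mymaster 127.0.0.1 6379
[45316] 10 Apr 12:48:59.561 # +failover-end master mymaster 127.0.0.1 6379
[45316] 10 Apr 12:48:59.562 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6377
[45316] 10 Apr 12:48:59.563 * +slave slave 127.0.0.1:6378 127.0.0.1 6378 @ mymaster 127.0.0.1 6377
"""


def find_switch_master(log_text: str) -> Optional[SwitchMaster]:
    """Return the first +switch-master event in the log, or None."""
    for line in log_text.splitlines():
        match = SWITCH_RE.search(line)
        if match:
            name, old_ip, old_port, new_ip, new_port = match.groups()
            return SwitchMaster(name, (old_ip, int(old_port)),
                                (new_ip, int(new_port)))
    return None


if __name__ == "__main__":
    print(find_switch_master(SAMPLE_LOG))
```

For the sample lines, the event reports that mymaster moved from 127.0.0.1:6379 to 127.0.0.1:6377, which matches the slave selected by Sentinel 1 in the +selected-slave line.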
(4) Run the info replication command again to check the replication information:
    C:\Users\Administrator> redis-cli -p 6377
    127.0.0.1:6377> info replication
    # Replication
    role:master
    connected_slaves:1
    slave0:ip=127.0.0.1,port=6378,state=online,offset=94189,lag=1
    master_replid:c8421a0d848b549f4902e6258a30933c5cb51a67
    master_replid2:9bbd7de80825d3a703b61dfc8409fa7ff8dc264b
    master_repl_offset:94455
    second_repl_offset:71816
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:1
    repl_backlog_histlen:94455
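From the output above, 127.0.0.1:6377 is the newly elected Master and 127.0.0.1:6378 is its only connected Slave. The old Master 127.0.0.1:6379 is still stopped, so the Sentinels list it as a Slave of 6377 in the subjectively down state (the +sdown lines in the logs).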
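The same check can be scripted. The sketch below parses the text returned by info replication and lists the connected Slaves; `parse_info_replication`, `list_slaves` and `SAMPLE_OUTPUT` are hypothetical names, and the parser only assumes the key:value layout visible in the output above.

```python
from typing import Dict, List, Tuple

# Shortened copy of the output shown above.
SAMPLE_OUTPUT = """\
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6378,state=online,offset=94189,lag=1
master_replid:c8421a0d848b549f4902e6258a30933c5cb51a67
master_repl_offset:94455
"""


def parse_info_replication(text: str) -> Dict[str, str]:
    """Turn 'key:value' lines into a dict, skipping blanks and '#' headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def list_slaves(info: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (ip, port, state) for every slaveN entry."""
    slaves = []
    for key, value in info.items():
        # Only keys such as slave0, slave1, ... describe connected slaves.
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            slaves.append((fields["ip"], int(fields["port"]), fields["state"]))
    return slaves


if __name__ == "__main__":
    info = parse_info_replication(SAMPLE_OUTPUT)
    print(info["role"], list_slaves(info))
```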