Using keepalived to implement high availability (HA)
2010-12-09 10:24:11 阿炯

Keepalived serves high availability and load balancing in the open-source world and is a solid software implementation of VRRP (Virtual Router Redundancy Protocol). It provides high availability at the IP level: as soon as the network of one machine fails, another server takes over the failed server's IP and continues working within a few seconds or less.


Keepalived is a piece of software that works like layer 3, 4 and 5 switching. Its job is to check the state of servers such as web servers: if a web server dies or stops working properly, Keepalived detects this and removes the faulty server from the pool; once the server is healthy again (according to the configured rules), Keepalived automatically adds it back. All of this happens without human intervention; the only manual work is repairing the failed web server.

keepalived determines that its peer is unavailable by sending and receiving multicast packets among the members that share the same virtual_router_id; as soon as it detects that the peer is gone, it switches its own role from backup to master.

In other words: when keepalived on the real server 192.168.2.110 detects that keepalived on the real server 192.168.2.116 is unavailable, 192.168.2.110 starts serving on the VIP 192.168.2.103 and changes its role from backup to master.

On its own, keepalived only provides high availability at the network level: a failover only happens when the keepalived process on one host can no longer detect the keepalived process on the other host.
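If you want to watch these advertisements on the wire, one way (assuming the instance runs on eth0 and uses the default VRRP multicast group) is to capture IP protocol 112 with tcpdump:

# VRRP advertisements are IP protocol 112, sent to the multicast group 224.0.0.18.
tcpdump -ni eth0 'ip proto 112'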

When building high availability we can use scripts to do what would otherwise require heartbeat, such as releasing/taking over resources and stopping/starting service processes; the IP address switchover itself is already handled by keepalived. How is this done? By hooking into its own state transitions: master, backup and fault. Whenever the state changes, a script is executed to perform the desired action, for example:
notify_master /root/notify_master.sh
notify_backup /root/notify_backup.sh
notify_fault /root/notify_fault.sh

These state changes are recorded in the system log:
Dec  8 14:02:07 lvsm Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Dec  8 14:02:07 lvsm Keepalived_vrrp: VRRP_Instance(VI_2) Transition to MASTER STATE
Dec  8 14:02:08 lvsm Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Dec  8 14:02:08 lvsm Keepalived_vrrp: Netlink: skipping nl_cmd msg...
Dec  8 14:02:08 lvsm Keepalived_vrrp: VRRP_Instance(VI_2) Entering MASTER STATE
Dec  8 14:02:08 lvsm Keepalived_vrrp: Netlink: skipping nl_cmd msg...
...
Dec  8 14:42:13 phaster Keepalived_healthcheckers: Configuration is using : 15119 Bytes
Dec  8 14:42:13 phaster Keepalived_vrrp: Configuration is using : 40547 Bytes
Dec  8 14:42:13 phaster Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Dec  8 14:42:13 phaster Keepalived_vrrp: VRRP_Instance(VI_2) Entering BACKUP STATE

They mean: 'notify_master' is the script to execute when entering the master state; the other two work the same way for the backup and fault states.
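A notify script can do anything you like. A minimal sketch (illustration only; the logger tag and the choice of service are assumptions, not taken from this article) might be:

#!/bin/bash
# /root/notify_master.sh -- minimal sketch: record the transition and bring a service up.
# The service used here (nfs-kernel-server) is only an example.
logger -t keepalived "$(hostname): transition to MASTER"
/etc/init.d/nfs-kernel-server start
exit 0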

Application-level checks can also be implemented in keepalived through vrrp_script:
vrrp_script chk_httpd {
script "/home/bin/httpd_check.sh"                
interval 2
weight -2
}

Then reference it from the vrrp_instance:
track_script {
chk_httpd 10
}
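The article does not show the body of httpd_check.sh; a minimal sketch, assuming the web server runs as a process named httpd, could be:

#!/bin/bash
# /home/bin/httpd_check.sh -- minimal sketch; exit 0 means healthy,
# a non-zero exit makes keepalived apply the weight (-2) to the priority.
if pgrep -x httpd > /dev/null 2>&1; then
    exit 0
else
    exit 1
fi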

--------------------------------------
one master server and one slave server.
* The master and slave run in parallel
* Only one of the systems is "live"
* If the master fails, the slave will automatically take over
* If the slave fails, well... the master is still doing its job

The four points listed above are the most basic functional requirements for HA. keepalived can meet all of them; it is somewhat more convenient than heartbeat and more customizable.


Master/backup high-availability configuration

Once the Keepalived program is installed, it needs to be configured. Assume the desired result is as follows:
Physical hosts A and B have the IP addresses 192.168.1.100 and 192.168.1.101, and the virtual IP address we plan to use is 192.168.1.200;
We plan to use 1.100 (host A) as the primary server and 1.101 (host B) as the backup server. Once everything is configured, accessing 192.168.1.200 goes straight to host A;
When something goes wrong on A, keepalived automatically reassigns the virtual IP, so 1.200 floats over to B. keepalived can also be configured to decide whether the virtual IP floats back to A once A recovers.

The keepalived configuration file

Let's start with the simplest possible configuration file.
The configuration file lives at /etc/keepalived/keepalived.conf.

A is the primary: while the service is healthy the virtual address sits on this machine and A's state is master; when A has a problem, the virtual address floats over to B and B becomes master. The configuration files of host A (master) and host B (backup) are almost identical, with only minor differences.

First, the configuration of host A (master):
global_defs {
    router_id master
}

vrrp_script check_network {
    script "/etc/keepalived/monitor.sh"
    interval 2
    weight 2
    rise 2
    fall 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 1
    priority 100
    advert_int 1
    unicast_src_ip  192.168.1.100
    unicast_peer {
        192.168.1.101
    }
    authentication {
        auth_type PASS
        auth_pass yourpassword
    }
    track_script {
        check_network
    }
    virtual_ipaddress {
        192.168.1.200
    }
}

What each parameter does

router_id: a user-defined identifier; it only has to differ between master and slave.

The vrrp_script check_network section defines the script used to check whether the current service is healthy. check_network is the name of this block; you can see it referenced by that name in the track_script section at the end of the configuration.
script "/etc/keepalived/monitor.sh": the path of the script; its contents are covered later.
interval: the interval between checks, in seconds.
weight: how the priority changes according to the script result; it can be positive or negative. A positive value is added to the base priority when the script succeeds, and nothing happens when it fails; a negative value is subtracted from the base priority when the script fails, and nothing happens when it succeeds.
rise: the number of consecutive successes required before the check is considered successful. In the example above, only after the script succeeds twice in a row is the check treated as passed.
fall: the opposite of rise; in the example the check is only treated as failed after the script fails twice in a row.

The vrrp_instance VI_1 section holds the configuration related to the virtual IP address.
state: whether this server's default state is MASTER or BACKUP. Note that this only decides which machine is master when the two machines have the same priority (the priority value below).
interface: the NIC to bind the virtual IP to; it may differ between servers.
virtual_router_id: any value, but it must be the same on master and backup.
priority: the priority, the most important parameter in the configuration. Whether the virtual address sits on A or on B is decided by this value. The configured number is only an initial value; it is raised or lowered according to the script result, and the final priority of the two machines decides which one becomes master. This value is discussed in more detail below.
advert_int: just keep it identical on master and backup.
unicast_src_ip and unicast_peer: when the network the machines sit in forbids multicast/broadcast, these two parameters specify the peer addresses explicitly. To be safe, it is usually best to configure them by hand.
authentication: choose whatever password you like.
track_script: the check script introduced in the previous section.
virtual_ipaddress: the virtual IP address.

That is basically all there is to a minimal configuration file. Now let's look at the backup configuration:
global_defs {
    router_id backup
}

vrrp_script check_network {
    script "/etc/keepalived/monitor.sh"
    interval 2
    weight 2
    rise 2
    fall 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 1
    priority 99
    advert_int 1
    unicast_src_ip  192.168.1.101
    unicast_peer {
        192.168.1.100
    }
    authentication {
        auth_type PASS
        auth_pass yourpassword
    }
    track_script {
        check_network
    }
    virtual_ipaddress {
        192.168.1.200
    }
}

The differences from the master configuration are:
router_id
state
priority: note that the backup's priority must be lower than the master's, but the difference between the two must not exceed the weight value; this is explained in detail below.
unicast_src_ip and unicast_peer: exactly reversed between master and backup.

The check script

Again, let's look at the simplest possible check script:
#!/bin/bash
# check if network is ok
ping -c 1 192.168.1.1 > /dev/null 2>&1
if [ $? -eq 0 ]; then
   exit 0
else
   exit 1
fi

This script simply pings the gateway address: if the ping succeeds, the check is considered successful (exit 0); otherwise it is considered failed. Pinging the gateway is the simplest way of telling whether the network is reachable; in practice a common approach is to check whether nginx is healthy and use that result to decide whether to fail over, as in the sketch below.
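For instance, an nginx-oriented check might look like the following (a sketch only; the local URL and the 2-second timeout are assumptions):

#!/bin/bash
# Consider the node healthy only if nginx answers an HTTP request locally.
if curl -sf -o /dev/null --max-time 2 http://127.0.0.1/; then
    exit 0
else
    exit 1
fi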

About priority

The priority field is what really decides the state of each device: the higher priority becomes master, the lower becomes backup. Only when the priorities are equal does the state field decide which machine is master. A positive weight is added to the priority when the script succeeds; a negative weight is subtracted when the script fails. The difference between the master's and backup's priorities must be smaller than the absolute value of weight, otherwise the failover will not happen.

Let's analyse the possible situations with a table, using the configuration above: A's priority is 100, B's is 99, and weight is 2 on both.
Host         A,B both OK           A fails, B OK         A OK, B fails         A,B both fail
A priority   100+2=102 (master)    100 (backup)          100+2=102 (master)    100 (master)
B priority   99+2=101 (backup)     99+2=101 (master)     99 (backup)           99 (backup)

This is also why weight must be larger than the difference between A's and B's priorities; otherwise, even if A fails, the service cannot switch over to B.


We have already implemented this system for our own purposes. In our data centers, we use LVS not only for high availability, but also for load balancing. We have put the proverbial pen to paper to tell you how to do it!

Installation
apt-get update
apt-get install keepalived
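After installation it is worth confirming the version and starting the daemon; on a sysvinit-era Debian that would be something like:

keepalived -v                   # print the installed version
/etc/init.d/keepalived start    # start the daemon
tail -f /var/log/syslog         # watch the VRRP state transitions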

Configuration
The first step is to create a configuration file. Keepalived will look for keepalived.conf in the /etc/keepalived directory. There are slight differences between the MASTER and BACKUP configurations. Below is an annotated configuration example:

This is a common global definition block. The lvs_id keyword should be set to a unique name per LVS host. The remainder of the global_defs block contains optional SMTP/email settings. This is used to send email alerts when an LVS host transitions from one state to another.

! Comments begin with a bang '!'

global_defs {
lvs_id lvs1
notification_email {
admin@yourdomain.com
}

notification_email_from noreply@yourdomain.com
smtp_server 127.0.0.1
smtp_connect_timeout 30
}

The next section defines a vrrp_sync_group. This is basically a group of interfaces that Keepalived will manage. The group can also define a script to be called during state transitions (ex: from MASTER to BACKUP). This is done by adding an appropriate notify_<state> keyword, where <state> is one of: master, backup, or fault. A fault state means that there was a fatal error when LVS attempted to transition from master to backup or vice versa. The vrrp_sync_group is given the name VG1 in this example but can be anything you choose.

vrrp_sync_group VG1 {
group {
EXTIF
INTIF
}

! Optional notification scripts
notify_master "/root/bin/notify_master.sh"
notify_backup "/root/bin/notify_backup.sh"
notify_fault "/root/bin/notify_fault.sh"

}

Next we define vrrp_instance blocks for each interface defined in the above vrrp_sync_group. In our example, we have defined an external (EXTIF) and internal (INTIF) interface. For each of those, we need to configure certain properties such as the physical interface name, ex: eth0.

Note the state and priority keywords. Their values depend on which LVS host we are writing a configuration for. The state keyword can be set to MASTER or BACKUP. The priority keyword is used to sort out which LVS host will be promoted to MASTER first. If the current MASTER LVS host were to fail, the next host with the highest priority will be promoted to take over. The virtual_router_id keyword must contain a unique number per vrrp_instance. The interface keyword is set to the physical interface name that corresponds to the name given to the vrrp_instance block (ex: EXTIF = external interface).

Keepalived will broadcast vrrp packets so that all other LVS hosts can keep in sync with each other. The lvs_sync_daemon_interface keyword tells keepalived which interface it should send these broadcasts on. Generally, it's a good idea to keep these broadcasts within your network - on a local LAN segment. In the example, we are using eth1, which is internal. The advert_int keyword is set to the number of seconds between each "advertisement" broadcast. If one of the LVS hosts were to fail, the remaining hosts would be notified within this amount of time.

The authentication block defines the password/key used to protect vrrp packets. It is recommended to stick with the AH auth_type as it is more secure than a plain-text password. The virtual_ipaddress block contains the IP address for this interface. The MASTER LVS host will assign this IP address as a virtual address to the configured interface. Optionally, as with the vrrp_sync_group block, you can define an external program to be run when an LVS host transitions from one state to another.

vrrp_instance EXTIF {
state MASTER
priority 150
virtual_router_id 1

interface eth0
lvs_sync_daemon_interface eth1

advert_int 5

authentication {
auth_type AH
auth_pass cb7a9e8df183f71d
}

virtual_ipaddress {
192.168.2.103
}

notify_master "/root/bin/notify_master.sh"
notify_backup "/root/bin/notify_backup.sh"
notify_fault "/root/bin/notify_fault.sh"

}

The INTIF vrrp_instance follows next (which is mostly the same as above):

vrrp_instance INTIF {
state MASTER
priority 150
virtual_router_id 2
interface eth1
lvs_sync_daemon_interface eth1
advert_int 1

authentication {
auth_type AH
auth_pass aa8317630e7e0afc
}

virtual_ipaddress {
192.168.1.1
}
}

Now let's put it into practice: using keepalived + drbd + nfs to build a highly available network file system service. The setup in this article was tested successfully on Debian 5.

Environment:
Hostname    IP address       Role
lvsm        192.168.2.116    master nfs-server
phaster     192.168.2.110    backup nfs-server
sdb         192.168.2.102    nfs-client

1. Install the related software (keepalived and drbd, on both the master and the backup)
# apt-get install keepalived nfs-kernel-server drbd8-utils drbd8-modules-2.6.26-2-686
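Before going any further it is worth checking that the drbd kernel module actually loads on the running kernel:

modprobe drbd        # load the module shipped in drbd8-modules-2.6.26-2-686
lsmod | grep drbd    # confirm the module is loaded
cat /proc/drbd       # the drbd version banner should now appear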

Note: when using drbd, prepare at least one clean partition on each machine, preferably of the same size. This article is not meant to cover drbd in depth, so its setup is only described briefly here.

Add both hostnames and their IP addresses to '/etc/hosts':
192.168.2.116    lvsm.freeoa.net    lvsm
192.168.2.110    phaster.freeoa.net    phaster

The configuration file (/etc/drbd.conf) must be identical on both sides as well:
global {
usage-count yes;
}

common {
syncer { rate 10M; }
}

resource r0 {
protocol C;
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
}

startup {
degr-wfc-timeout 60;    # 1 minute.
}

disk {
on-io-error   detach;
}

net {
}

syncer {
rate 10M;
al-extents 257;
}

on lvsm {
device /dev/drbd0;
disk /dev/sdb1;
address 192.168.2.116:7788;
flexible-meta-disk  internal;
}

on phaster {
device /dev/drbd0;
disk /dev/sdb1;
address 192.168.2.110:7788;
meta-disk internal;
}
}
---------------------------------
Starting DRBD
Before starting DRBD, you need to create the metadata blocks DRBD uses for bookkeeping on the sdb1 partition of each host. Run on both hosts:
[root@lvsm /]# drbdadm create-md r0
[root@phaster /]# drbdadm create-md r0

"r0" is the resource name defined in drbd.conf. Now DRBD can be started; run on both hosts:
[root@lvsm /]# /etc/init.d/drbd start
[root@phaster /]# /etc/init.d/drbd start

Now we can check DRBD's status; on the lvsm host run:
[root@lvsm /]# cat /proc/drbd
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
0: cs:StandAlone st:Secondary/Secondary ds:UpToDate/DUnknown   r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:20 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

/proc/drbd shows drbd's current state. The st field in the first line shows the state of the two hosts: both are in the Secondary (standby) state. ds is the disk state, which is not yet consistent, because DRBD cannot tell which side should be the primary, i.e. whose disk data should be taken as the reference copy. So we have to promote one host; on lvsm run:
[root@lvsm /]# drbdsetup /dev/drbd0 primary -o

Now look at the DRBD status on lvsm again:
[root@lvsm /]# cat /proc/drbd
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:20 lo:0 pe:0 ua:0 ap:0
[==>.................] sync'ed: 14.7% (262464/305152)K
finish: 0:02:58 speed: 1,440 (1,292) K/sec
resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

The two hosts are now in the Primary/Secondary states; the primary's disk is UpToDate while the backup's is still Inconsistent. The third line shows the data being synchronized, i.e. the primary is transferring the data on its disk to the backup, currently at 14.7%. The DRBD status shown on phaster looks much the same as on lvsm.

Wait a while for the synchronization to finish (how long it takes depends on the partition size and the transfer speed), then check lvsm's DRBD status again:
[root@lvsm /]# cat /proc/drbd
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:0 nr:274432 dw:274432 dr:0 al:0 bm:44 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:17128 misses:24 starving:0 dirty:0 changed:24
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

Both disk states are now UpToDate, which means the synchronization is complete.

--------------------------------------
Using DRBD
You can now mount the DRBD device on the primary to a directory and use it. The DRBD device on the secondary cannot be mounted, because it is used to receive data from the primary and is handled by DRBD itself. On lvsm:
[root@lvsm /]# mount /dev/drbd0 /mnt/drbd
[root@lvsm /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/drbd0            289M   11M  264M   4% /mnt/drbd

Now create a 200 MB file in the mounted directory:
[root@lvsm /]# dd if=/dev/zero of=/mnt/drbd/tempfile1.tmp bs=104857600 count=2
After that, switch to phaster (the secondary).
First stop DRBD:
[root@phaster /]# /etc/init.d/drbd stop

Now we can mount sdb1:
[root@phaster /]# mount /dev/sdb1 /mnt/drbd
[root@phaster /]# ls /mnt/drbd -hl
total 201M
drwx------  2 root root  12K Jul 28 23:44 lost+found
-rw-r--r--  1 root root 200M Jul 29 00:20 tempfile1.tmp
[root@phaster /]# umount /mnt/drbd

As you can see, the file tempfile1.tmp created on the primary lvsm is also stored intact on the DRBD partition of the secondary phaster. This is DRBD's network RAID-1 feature: any operation on the primary is replicated to the corresponding disk partition of the secondary, giving you a data backup.

Switching DRBD primary and secondary
Sometimes you need to swap the DRBD primary and secondary. On the primary, first unmount the DRBD device:
[root@lvsm /]# umount /mnt/drbd

Demote the primary to secondary:
[root@lvsm /]# drbdadm secondary r0
[root@lvsm /]# cat /proc/drbd
version: 8.0.4 (api:86/proto:86)
SVN Revision: 2947 build by root@lvsm, 2007-07-28 07:13:14
1: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
ns:0 nr:5 dw:5 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0

Now both hosts are secondaries. On phaster, promote it to primary:
[root@phaster /]# drbdadm primary r0
[root@phaster /]# cat /proc/drbd
version: 8.0.4 (api:86/proto:86)
SVN Revision: 2947 build by root@phaster, 2007-07-28 07:13:14
1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:0 nr:5 dw:5 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0

Now phaster has become the primary and you can mount and use its /dev/drbd0. Likewise, data will be synchronized back to lvsm. With that, drbd is configured.

Note: one more thing to watch out for: the clocks on the two servers must be kept in sync, otherwise some hard-to-explain problems will show up.

NFS server configuration:
phaster:~# more /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
#        to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
#
/mnt/drbd/nfs   192.168.2.96/27(rw,sync,no_subtree_check)
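After editing /etc/exports, the export table has to be (re)loaded for the change to take effect, for example:

exportfs -ra              # re-read /etc/exports
showmount -e localhost    # list what is actually being exported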

Next, configure keepalived on both hosts to provide the HA function.
lvsm:~# more /etc/keepalived/keepalived.conf
global_defs {
notification_email {
lvsmon@freeoa.net
}
notification_email_from sbody@freeoa.net
smtp_server smtp.freeoa.net
smtp_connect_timeout 2
router_id freeoa_site
}

vrrp_instance VI_2 {
state MASTER    # on the backup host this is "BACKUP"
interface eth0
virtual_router_id 52    # must differ from other instances, but be identical within this pair
priority 100    # the backup's priority must be lower than this value
advert_int 1

authentication {
auth_type PASS
auth_pass 3at_ab
}
notify_master "/root/bin/hamaster.sh"
notify_backup "/root/bin/hafault.sh"
notify_fault  "/root/bin/hafault.sh"

virtual_ipaddress {
192.168.2.103
}
}

The logic is: when keepalived on the master starts, it promotes drbd to primary, mounts the partition on the designated directory and starts the related services (of course, the preparatory work before bringing the whole system up is indispensable). A simple script was written for this:
lvsm:~/bin# more hamaster.sh
#!/bin/bash
#set current drbd pri
drpri=$(/sbin/drbdadm primary r0)
echo $?

#mount the /mnt/drbd
mnt=$(/bin/mount /dev/drbd0 /mnt/drbd/)
echo $?

#start the nfs
ngstart=$(/etc/init.d/nfs-kernel-server start)
echo $?

#stop keepalived
#stkepa=$(/etc/init.d/keepalived stop)
#echo $?
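After the script has run on the new master, a quick sanity check could be:

cat /proc/drbd            # the local role should now be Primary
df -h /mnt/drbd           # the DRBD device should be mounted here
showmount -e localhost    # the NFS export should be visible again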

When the master fails (fault), we have to stop the services, unmount the partition, demote drbd, and, depending on the situation, stop the keepalived service. When the machine reboots or its NIC drops out (say you were unlucky enough to buy a Dell R610), the IP and the resources move over to the backup; but when the server recovers, if the keepalived process is still running, the IP is handed back to the master even though the drbd data on it is no longer the latest, which leads to a split-brain that is best resolved by hand. So it is safest to stop keepalived as well and start it manually when needed, to avoid surprises.

lvsm:~/bin# more hafault.sh
#!/bin/bash

#stop the nfs
ngstart=$(/etc/init.d/nfs-kernel-server stop)
echo $?

#umount the /mnt/drbd
mnt=$(/bin/umount /mnt/drbd/)
echo $?

#set current drbd sec
drsec=$(/sbin/drbdadm secondary r0)
echo $?

#stop keepalived
#stkepa=$(/etc/init.d/keepalived stop)
#echo $?

These two scripts can be run by hand to switch the resources over. In a real deployment you may also need to stop related services first: if your nginx has its document root under /mnt/drbd and is in use, unmounting the partition may fail.

First run hafault.sh on the master so that it gives up the resources, then run hamaster.sh on the backup to set the backup's environment up. To switch back, just run the scripts on each machine again. Below we focus on the case where the master loses its network and then recovers: how to move the data on the backup back to the master and promote it so that it becomes the primary server again.

Note: keepalived has been removed from boot-time startup on both machines.

On phaster you may see log entries like these:
Dec  8 16:41:48 phaster Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Dec  8 16:41:48 phaster Keepalived_vrrp: VRRP_Instance(VI_2) Transition to MASTER STATE
Dec  8 16:41:49 phaster kernel: [10605.496577] drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Dec  8 16:41:49 phaster kernel: [10605.496787] drbd0: asender terminated
Dec  8 16:41:49 phaster kernel: [10605.496800] drbd0: Terminating asender thread
Dec  8 16:41:49 phaster kernel: [10605.526393] drbd0: Connection closed
Dec  8 16:41:49 phaster kernel: [10605.526393] drbd0: conn( NetworkFailure -> Unconnected )
Dec  8 16:41:49 phaster kernel: [10605.526393] drbd0: receiver terminated
Dec  8 16:41:49 phaster kernel: [10605.526393] drbd0: Restarting receiver thread
Dec  8 16:41:49 phaster kernel: [10605.526393] drbd0: receiver (re)started
Dec  8 16:41:49 phaster kernel: [10605.526393] drbd0: conn( Unconnected -> WFConnection )
Dec  8 16:41:49 phaster Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Dec  8 16:41:49 phaster Keepalived_vrrp: Netlink: skipping nl_cmd msg...
Dec  8 16:41:49 phaster Keepalived_vrrp: VRRP_Instance(VI_2) Entering MASTER STATE
Dec  8 16:41:49 phaster Keepalived_vrrp: Netlink: skipping nl_cmd msg...
Dec  8 16:41:49 phaster kernel: [10606.319019] drbd0: role( Secondary -> Primary )
Dec  8 16:41:49 phaster kernel: [10606.320561] drbd0: Creating new current UUID
Dec  8 16:41:50 phaster kernel: [10606.430792] kjournald starting.  Commit interval 5 seconds
Dec  8 16:41:50 phaster kernel: [10606.430792] EXT3 FS on drbd0, internal journal
Dec  8 16:41:50 phaster kernel: [10606.430792] EXT3-fs: recovery complete.
Dec  8 16:41:50 phaster kernel: [10606.433638] EXT3-fs: mounted filesystem with ordered data mode.
Dec  8 16:41:50 phaster kernel: [10606.547865] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Dec  8 16:41:50 phaster kernel: [10606.548236] NFSD: starting 90-second grace period

List all IP addresses on this host:
phaster:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:81:07:f1 brd ff:ff:ff:ff:ff:ff
inet 192.168.2.110/27 brd 192.168.2.127 scope global eth0
inet 192.168.2.101/32 scope global eth0
inet 192.168.2.103/32 scope global eth0
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 08:00:27:6f:8d:d5 brd ff:ff:ff:ff:ff:ff

With 'df -h' you can see that the partition is mounted. We can write some data to it and later, when lvsm becomes the primary again, check whether that data is still there.
phaster:/mnt/drbd/nfs# cp /root/software/ext-2.0.2.zip ./

Now bring the NIC on lvsm back up (the machine itself was not rebooted): # ifconfig eth0 up

The log shows that the IP addresses have been handed back to lvsm.
lvsm:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:b8:75:8d brd ff:ff:ff:ff:ff:ff
inet 192.168.2.116/27 brd 192.168.2.127 scope global eth0
inet 192.168.2.101/32 scope global eth0
inet 192.168.2.103/32 scope global eth0
inet6 fe80::a00:27ff:feb8:758d/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 08:00:27:1c:be:7b brd ff:ff:ff:ff:ff:ff

keepalived on phaster has entered the 'BACKUP' state and the nfs service has been stopped. But note: the drbd state has not changed, and the partition has not been unmounted! Its status is:
phaster:~# !cat
cat /proc/drbd
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---
ns:204 nr:274644 dw:281068 dr:108234 al:10 bm:49 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/257 hits:1596 misses:13 starving:0 dirty:3 changed:10

The log contains records like these:
Dec  8 16:47:52 phaster kernel: [10969.175764] drbd0: helper command: /sbin/drbdadm split-brain minor-0
Dec  8 16:47:52 phaster kernel: [10969.186671] drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Dec  8 16:47:52 phaster kernel: [10969.186695] drbd0: conn( WFReportParams -> Disconnecting )
Dec  8 16:47:52 phaster kernel: [10969.186718] drbd0: error receiving ReportState, l: 4!
Dec  8 16:47:52 phaster kernel: [10969.187061] drbd0: asender terminated
Dec  8 16:47:52 phaster kernel: [10969.187076] drbd0: Terminating asender thread
Dec  8 16:47:52 phaster kernel: [10969.187325] drbd0: Connection closed
Dec  8 16:47:52 phaster kernel: [10969.187346] drbd0: conn( Disconnecting -> StandAlone )
Dec  8 16:47:52 phaster kernel: [10969.187444] drbd0: receiver terminated
Dec  8 16:47:52 phaster kernel: [10969.187454] drbd0: Terminating receiver thread

Meanwhile on lvsm, the drbd status is:
lvsm:~# !cat
cat /proc/drbd
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---
ns:274648 nr:204 dw:436 dr:274855 al:7 bm:60 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/257 hits:51 misses:7 starving:0 dirty:0 changed:7

At the same time, the partition has been mounted normally and the nfs service has been started as well.

A split-brain has occurred: each side is now going its own way.

And here is the problem: clients are being served the data from before the network outage, while the latest data sits on phaster, not on the current lvsm. We have to stop the related services and recover manually (upstream drbd also advises against automating this), so stopping keepalived at the end of the hafault.sh script is the safe thing to do.

Run the hafault.sh script manually on lvsm to release the related resources.

On the lvsm machine, run:
------------------
drbdadm secondary r0
drbdadm disconnect r0
drbdadm -- --discard-my-data connect r0

# For the details of each step see the official manual: secondary demotes the node, disconnect drops the replication link, and '-- --discard-my-data connect' reconnects while discarding the local changes.
------------------

On the phaster machine, run:
drbdadm connect r0

Once done, check the drbd status on both sides again:
phaster:~# !cat
cat /proc/drbd
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
0: cs:SyncTarget st:Secondary/Primary ds:Inconsistent/UpToDate C r---
ns:0 nr:43936 dw:43936 dr:0 al:0 bm:26 lo:0 pe:32 ua:0 ap:0
[==>.................] sync'ed: 17.7% (230496/274432)K
finish: 0:00:20 speed: 10,984 (10,984) K/sec
resync: used:1/61 hits:2771 misses:7 starving:0 dirty:0 changed:7
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

The data is synchronizing and will soon be consistent; the data on lvsm will then match the data on phaster.

When the sync has finished, run hafault.sh on phaster, then run hamaster.sh on lvsm to promote it back to master. At the same time, check whether the newly added file is still there and whether its checksum matches the source file.
lvsm:~# ls /mnt/drbd/nfs/
ext-2.0.2.zip  kernel  mfs-1.6.16.tar.gz  rhythmbox-0.12.8.tar.gz
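To compare the checksum mentioned above, running md5sum on both copies is enough, for example:

lvsm:~# md5sum /mnt/drbd/nfs/ext-2.0.2.zip
phaster:~# md5sum /root/software/ext-2.0.2.zip    # the original copy it was made from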

The test passed.

Summary: compared with heartbeat, keepalived is simpler, more lightweight and more customizable. Compared with its use for managing LVS clusters, fewer people use it purely for HA.

This article was last updated by 阿炯 on 2021-03-19 11:07:01 and is currently at revision 2.