After hours and hours of research, study and reading, I finally decided to get into action and configure my first Vyatta cluster. The problem to solve was simple: two redundant uplinks, the same address space, and a 100% availability commitment between the two links.
At first glance it looks like if you set up a redundant server, with two CPUs, two power supplies, redundant memory and RAID, you should be fine, and Vyatta load balancing will do the rest. The thing is, after a recent regular update reboot we faced a damaged main board, so it was really clear to me that a motherboard can break, and one should be able to survive that situation.
Cluster vs VRRP
Even before writing my first lines of configuration, I faced this decision. Originally I had decided to forget about Vyatta's cluster capabilities and focus on Heartbeat: using regular scripting I would wipe the configuration on the damaged router and load it on the good one. But after trying Cluster and VRRP in a test lab setup, I had second thoughts. VRRP and Clustering do pretty much the same job: they fail over "virtual" IP addresses between two routers, so you can have your network up and running in seconds after a router failure. Both are based on monitoring the other peer, and failing over if this "heartbeat" from the other side stops. After reading about both for a while, I decided to test clustering because in theory it would allow me to set my own resources (what they call "OCF resource agents"), which is a generic name for a script that supports start, stop and monitor parameters.
The test went wonderfully, and after a while I realized this approach might work: it would allow me to fail over not only IP addresses but also generic resources. For me, a script that loads and unloads the part of the configuration I want on the active router was the "resource" to fail over. VRRP, in contrast, doesn't offer an easy way to achieve this, and the more the solution is integrated with Vyatta, the better your chances of getting it to work and of being able to update the router without much fuss, which is a big deal.
Router 1 already working
One characteristic of this setup was that one of the routers in this cluster was already working, doing the job the future cluster would have to handle. So besides the time constraints on the job (so as not to interrupt the service), I had a template of the configuration file I needed to transform into an OCF resource.
Diagram
As with all complex solutions, everything starts with a model or diagram, and this is what I got:
The router labeled GW1 was the one on duty; GW2 was the newcomer.
Goals:
- Support an uplink failure on either of the two circuits (A and B)
- Support a major failure in either of the two routers
- Fail over NAT, static routes, and site-to-site VPNs between the two routers.
Out of scope for this first attempt (should come in the next job):
- Support for fancy STONITH capabilities such as iDRAC remote shutdown
- Support for LAN interface failures, including the heartbeat interface
- Support for any kind of notification besides regular SNMP monitoring.
The Job:
So, here I am in front of the keyboard; what to do first? The clustering test went OK, but only failing over IP addresses, including more than one virtual IP per interface:
show cluster
dead-interval 10000
group backbone {
    auto-failback false
    monitor 4.2.2.1
    monitor 199.7.83.42
    primary denvgw
    secondary denvgw2
    service XX.XX.XX.1/28/eth0
    service XX.XX.XX.2/28/eth0
    service XX.XX.XX.3/28/eth0
    service 172.22.0.1/24/eth1
    service 172.18.1.1/24/eth2
}
interface eth3
keepalive-interval 2001
monitor-dead-interval 20000
pre-shared-secret XXXX
What I needed was to test another kind of resource. Even though there are no examples around, the Vyatta documentation states that the service branch of the cluster configuration accepts a script as a parameter.
So, clearly, you can add a service which is in fact a script. Even though no one around has posted this as working, I decided to test it. Using the well-known Vyatta config wrapper I created a script that changes the description of one interface as the "resource" goes from primary to secondary and back. This first script supports only start and stop, and it was based on the generic OCF resource agent you can find in:
/usr/lib/ocf/resource.d/heartbeat/anything
There is a lot of information in this file, but I found that start, stop and monitor are the most important parts.
According to the official resource:
http://www.linux-ha.org/wiki/Resource_Agents
an agent should support:
- start
- stop
- monitor
- validate-all
- meta-data
But in practice, I found that start, stop and monitor will do the trick for Vyatta.
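For reference, this is the shape of an agent that covers the full action set; a minimal sketch based on the linux-ha description above, with the meta-data output omitted:
#!/bin/sh
# Minimal OCF-style resource agent skeleton (sketch only)
case "$1" in
start)
    # bring the resource up; must also succeed if already started
    exit 0
    ;;
stop)
    # bring the resource down; must also succeed if already stopped
    exit 0
    ;;
monitor)
    # exit 0 if the resource is running, 7 if it is cleanly stopped
    exit 7
    ;;
validate-all)
    # check that the agent's configuration is sane
    exit 0
    ;;
meta-data)
    # print the XML description of the agent (omitted in this sketch)
    exit 0
    ;;
*)
    exit 1
    ;;
esac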
From my point of view, if Vyatta accepts scripts in "/etc/init.d/" as resources, start and stop alone should work; but in practice, without the "monitor" parameter the command "show cluster status" will not work properly, creating a mess. I'm not sure at this point whether the "status" parameter would work as well.
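Before the cluster can use it, the script has to live where heartbeat looks for resources. Assuming Vyatta's heartbeat backend behaves like stock heartbeat (which searches /etc/init.d and /etc/ha.d/resource.d for resource scripts), installing the script (I'll call it "testres", the name used in the final config below) is just:
sudo cp testres /etc/init.d/testres
sudo chmod +x /etc/init.d/testres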
All the communication is done via exit codes, and you should pay attention to the return codes to avoid problems.
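For reference, these are the return codes defined by the OCF specification; the two that matter here are 0 and 7:
# OCF return codes (from the linux-ha resource agent API)
OCF_SUCCESS=0             # action completed / resource is running
OCF_ERR_GENERIC=1         # generic error
OCF_ERR_ARGS=2            # invalid arguments
OCF_ERR_UNIMPLEMENTED=3   # action not implemented
OCF_ERR_PERM=4            # insufficient permissions
OCF_ERR_INSTALLED=5       # required component not installed
OCF_ERR_CONFIGURED=6      # resource is misconfigured
OCF_NOT_RUNNING=7         # resource is cleanly stopped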
My first test script:
#!/bin/sh
#
anything_start() {
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper begin
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper set interfaces ethernet eth0 description arriba
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper commit
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper end
}

anything_stop() {
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper begin
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper set interfaces ethernet eth0 description stop
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper commit
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper end
}

case "$1" in
start)
    anything_start
    exit 0
    ;;
stop)
    anything_stop
    exit 0
    ;;
*)
    exit 1
    ;;
esac
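A quick way to check it by hand before handing it to the cluster (assuming it was installed as /etc/init.d/testres as above):
/etc/init.d/testres start
echo $?    # expect 0; "show interfaces" should now list eth0 with description "arriba"
/etc/init.d/testres stop
echo $?    # expect 0; the description changes to "stop"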
It works without the monitor, but I found after a while that the monitor parameter is really needed.
So, with the test script working, it was time to create the real one. The main differences from the test were the amount of config lines and the addition of the monitor option. But how do you implement the monitor option for a resource that is not a service and doesn't have a running process? I created a marker file to track whether the configuration was loaded on the server or not. This is a summary of the script without the tons of config lines:
#!/bin/sh
#
anything_start() {
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper begin
    # ... tons of "vyatta-cfg-cmd-wrapper set ..." config lines go here ...
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper commit
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper end
    # mark the configuration as loaded
    /bin/echo 1 > /var/lib/monitor
}

anything_stop() {
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper begin
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper delete interfaces openvpn vtun0
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper delete interfaces openvpn vtun1
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper delete interfaces openvpn vtun2
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper delete interfaces openvpn vtun3
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper delete nat destination
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper delete nat source
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper delete protocols static interface-route
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper delete vpn ipsec
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper delete vpn l2tp
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper commit
    /opt/vyatta/sbin/vyatta-cfg-cmd-wrapper end
    # mark the configuration as unloaded
    /bin/echo 0 > /var/lib/monitor
}

anything_monitor() {
    juan=`cat /var/lib/monitor`
    if [ "$juan" = 0 ]; then
        exit 7    # OCF_NOT_RUNNING: configuration not loaded
    else
        exit 0    # OCF_SUCCESS: configuration loaded
    fi
}

pepe=`cat /var/lib/monitor`
case "$1" in
start)
    # only load the configuration if it is not already loaded
    if [ "$pepe" = 0 ]; then
        anything_start
    fi
    exit 0
    ;;
stop)
    # only unload the configuration if it is currently loaded
    if [ "$pepe" = 1 ]; then
        anything_stop
    fi
    exit 0
    ;;
monitor)
    anything_monitor
    ;;
*)
    exit 1
    ;;
esac
So, if /var/lib/monitor contains 0 the configuration is not loaded, and if it contains 1 it is loaded. That drives the output of the script when called with the monitor option: if the marker is 0 it returns 7, and if it is 1 it returns 0. Also, if the configuration is already loaded, the script doesn't load it again: this is a specific requirement for an OCF resource, since the start and stop operations must not cause problems even if the resource is already started or stopped.
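The idempotency is easy to verify by hand; assuming the real script is the one installed as /etc/init.d/testres, a test run should look like this:
/etc/init.d/testres monitor ; echo $?   # 7: configuration not loaded
/etc/init.d/testres start   ; echo $?   # 0: configuration loaded
/etc/init.d/testres start   ; echo $?   # 0 again: already loaded, nothing happens
/etc/init.d/testres monitor ; echo $?   # 0: running
/etc/init.d/testres stop    ; echo $?   # 0: configuration unloaded
/etc/init.d/testres stop    ; echo $?   # 0 again: already stopped, nothing happens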
The final step is to create a config file that instructs Vyatta to fail over the resources:
show cluster
dead-interval 10000
group backbone {
    auto-failback false
    monitor 4.2.2.1
    monitor 199.7.83.42
    primary denvgw
    secondary denvgw2
    service XX.XX.XX.1/28/eth0
    service XX.XX.XX.2/28/eth0
    service XX.XX.XX.3/28/eth0
    service 172.22.0.1/24/eth1
    service 172.18.1.1/24/eth2
    service testres
}
interface eth3
keepalive-interval 2001
monitor-dead-interval 20000
pre-shared-secret XXXX
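The script is added as just another service under the cluster group. From Vyatta configuration mode this should be a matter of (a sketch, assuming the usual set/commit/save workflow; "testres" is simply the file name of the script in /etc/init.d):
configure
set cluster group backbone service testres
commit
save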
Notes:
- If you want to create a RAID 1 Vyatta server with version 6.6, it won't work, especially if you want an image installation. There are a lot of workarounds for the classic installation, but none for the image installation. What I did was install an old version of Vyatta (6.4), which I knew worked well with RAID 1 and image installation, and upgrade to 6.6 afterwards.
- The Vyatta cluster facility takes care of two important things in a cluster of routers. First, the health of the members of the cluster, which is checked over the standard heartbeat interface. Second, the health of the uplinks: to check this, Vyatta uses a "monitor" to ping, and if the ping works, Vyatta assumes the cluster group is healthy. I also tested that if you configure many monitors and only one of them is reachable, Vyatta still considers the cluster group on that server OK. So the monitor ping results are combined with OR, not AND.
- The iDRAC interfaces (Dell servers only) are on a different circuit than the main interfaces, so if one circuit is lost, I can always connect to the console via the other circuit.
- English is not my mother tongue, so there are surely grammatical errors. The scripts need to be polished, and the split-brain problem needs to be solved. My main goal here is to show the results of my tests; I couldn't find anything similar on the web or in the Vyatta documentation, and for some reason, even after being a member of the Vyatta forum for 4 years, I wasn't able to post a new topic.