QMT Failover replication Setup
Craig Smith - 26th October 2006 - craig@doc-net.com
Thanks to Jake for taking the time to review this for me before posting. It always helps to have a sounding board and Jake was kind enough to be that board for me.
This page gives you a procedure to configure a backup QMT server that will be available for failover in the event of primary server failure. The backup server will only ever be about 10 minutes behind the primary (depending on cron job timing).
Please note that the initial replication (the first run) will take some time, so schedule it for off-peak hours. Once the first run has finished and unison has a database of what it is working with, subsequent runs are pretty quick. So enable the cron job at a time when you can manage the traffic for the initial replication.
Also, this setup is based on 2 servers where the port used is internal and not publicly visible. The socket transport is not secure and should not be used on open networks. If you cannot do this on a private network, read up on using ssh for replication instead.
This was set up and tested on Fedora Core 5 on both servers, and it works without any hiccups.
The details are pretty much cut and paste.
I would test this installation on a similar setup first, as I only have my system to compare with, but given the ease of installation and use of unison, I don't foresee any major problems.
This setup assumes that QMT is installed and configured on both servers. The backup server's version should match the primary's.
I have also set up a script that tests your server's availability by pinging it. It is posted at the bottom of this page, but still needs work. The mail commands are purely for testing: if the primary server really were off, the emails wouldn't go out until you switch to the backup, which doesn't help. Replacing these mail commands with SMS text messages would certainly help. It is a quick and easy monitoring system, which can be added to cron to run every x minutes.
This is a first draft and these are also my first proper linux/unix scripts, so please forgive any errors and let me know if you have any problems. If the primary server fails and the switch to the backup is made in a short time, clients will probably not even notice the downtime.
- Unison is a great replication utility. http://www.cis.upenn.edu/~bcpierce/unison/
- We use the command line version available from http://www.cis.upenn.edu/~bcpierce/unison/
- This is based on unison-2.13.16-linux-text.bz2 which is the current stable release as of 10/10/2006
- To change which files/folders are replicated or ignored, edit the unison profile (qmail.prf).
- This setup is for servers that are in a LAN environment with the unison port not visible to the public world. Therefore the security overhead of ssh is not needed. However, unison can be run with ssh if needed.
- To configure unison and ssh see http://www.cis.upenn.edu/~bcpierce/unison/download/releases/stable/unison-manual.html#remote. The procedure stays largely the same, but the socket server is not needed.
Comments and Notes
Please feel free to leave comments about your experience with this procedure.
Craig Smith - 26th October 2006 On our setup of 2 x Fedora Core 5 boxes, whenever I make a change on the primary I check on the backup to make sure it has replicated, and so far it has worked like a charm every time. I've not run into anything strange.
Also, I'm not sure how people prefer the logs. I initially had it set to log each run, but that was changed to keep logging until the file reaches 20MB, which is roughly 24 hours' worth of logging. Times are logged, so finding a specific run should be fairly easy. However, it's quite easy for me to change back to logging each run in a separate file. I will go with preference really, so please let me know.
- Note about non-FC versions.
One of the variables in the script obtains the log file size by cutting the relevant field from an ls listing. On some distributions the size is not in the same field as on Fedora, so if the script runs into errors, the problem more than likely lies here. To fix it, run ls -l $LOG|cut -d ' ' -f 5 on the command line (with $LOG replaced by the log file path) and change the -f number until it correctly displays the size field, then change the number in the script to match.
It is the following variable.
size=`ls -l $LOG|cut -d ' ' -f 5`
I believe for CentOS it is field 6, so that variable would be changed to
size=`ls -l $LOG|cut -d ' ' -f 6`
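As an alternative to adjusting the field number per distribution, the size can be read with wc -c, which does not depend on the column layout of ls. This is only a sketch and not part of the original script; the helper names file_size and log_too_big are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): print the size of a
# file in bytes without parsing ls output.
file_size() {
    # Reading from stdin means wc prints no filename; tr strips the
    # padding that some wc implementations add.
    wc -c < "$1" | tr -d ' '
}

# Hypothetical helper: succeed if the file exists and is larger than
# the given limit in bytes.
log_too_big() {
    [ -f "$1" ] && [ "$(file_size "$1")" -gt "$2" ]
}
```

In the client script this could replace the size variable, e.g. size=$(file_size "$LOG"), or the 20MB test could become: if log_too_big "$LOG" 20000000; then ...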
Primary Server (Client) setup
- Primary server is the main server being replicated. This has the unison program and the client side script to replicate data based on the profile.
- All scripts assume a default path of /unison.
- The script calls unison to run as follows
/unison/unison -force / -batch qmail
- -force / is very important, as it specifies this root as the primary and resolves all conflicts in favour of this root. NB IF THIS IS NOT INCLUDED ANY CHANGES MADE ON THE BACKUP SERVER WILL REPLICATE TO PRIMARY.
- -batch tells unison to run without prompts and qmail refers to the qmail.prf file. (see below)
- To configure unison on the client side, take the following steps as root/su:
mkdir /unison
cd /unison
wget https://svn.cis.upenn.edu/svnroot/unison-contributed-binaries/linux/unison-2.13.16-linux-text.bz2
bzip2 -d unison-2.13.16-linux-text.bz2
mv unison-2.13.16-linux-text unison
chmod 755 unison
./unison
*This will create the initial /root/.unison database folder
vi qmail-replicatec
*Once in the editor, copy and paste the qmail-replicatec script from below
:wq
chmod 755 qmail-replicatec
vi /root/.unison/qmail.prf
*Once in the editor, copy and paste the qmail.prf details from below
:wq
Client script gets added to cron to run every 10 minutes.
crontab -e
*/10 * * * * /unison/qmail-replicatec
*Below is the full script for the client side, script name qmail-replicatec
#!/bin/sh
#Version 1.0 - Oct 11th 2006
#Script created by Craig Smith - craig@doc-net.com
#This script is the client (Primary mail server) side script to replicate based on the
#qmail.prf file located under /root/.unison. The Mysql variables and dump were taken from
#qmailbackup script written initially by Nate Davis.
#To add to or change files paths edit /root/.unison/qmail.prf accordingly.
#This script assumes unison and scripts are placed in /unison
#To keep logs from building too excessively as this script will run fairly regularly
#the full log is moved to the /unison/oldlogs folder once it passes 20MB. When more than
#15 saved logs exist they are moved to /unison/oldlogs/previous, replacing the previous
#batch. You can increase this if needed. See comment below.
#Add email address below for notifications of log rotation.

#starting with mysql so the dumped file can replicate
#Checking for mysql.dump and oldlogs folder
folder1=/unison/mysql.dump
folder2=/unison/oldlogs
lock=/unison/replicate.lock
email="(email address)"

if [ ! -d $folder1 ]; then
    mkdir -p $folder1
fi
if [ ! -d $folder2 ]; then
    mkdir -p $folder2
fi
if [ ! -f $lock ]; then
    touch $lock
else
    echo "Lock file still in place, investigate."|mail -s "unison script problems please investigate server" $email
    exit 1
fi

# MYSQL variables
mysqlfile="/home/vpopmail/etc/vpopmail.mysql"
mysqlhost=`cut -d\| -f1 < $mysqlfile`
mysqlport=`cut -d\| -f2 < $mysqlfile`
mysqluser=`cut -d\| -f3 < $mysqlfile`
mysqlpswd=`cut -d\| -f4 < $mysqlfile`
mysqldb=`cut -d\| -f5 < $mysqlfile`
#use = rather than == so the test also works in a plain /bin/sh
if [ "$mysqlport" = "0" ]; then
    mysqlport=""
else
    mysqlport="-P$mysqlport"
fi
#echo "Backing up MYSQL Data"
mysqldump -u$mysqluser -p$mysqlpswd -h$mysqlhost $mysqlport $mysqldb > /unison/mysql.dump/vpopmail

#script for log clearup
#The full log will be replaced every 20MB and moved to /unison/oldlogs.
#Only the last 15 saved logs are kept in /unison/oldlogs.
LOG=/unison/unisonlog.full
#make sure the log exists so the size check works on the first run
touch $LOG
LOGCOUNT=`ls $folder2 |wc -l`
size=`ls -l $LOG|cut -d ' ' -f 5`
LOGSAVE=/unison/oldlogs/unisonlog.`date +%Y%m%d%H%M`
#echo $size
if [ $size -gt 20000000 ]; then
    echo "this is bigger than 20MB, moving">>$LOG
    mv $LOG $LOGSAVE
    echo "" >$LOG
fi
#LOG COMMENT : if you want to increase the saved logs change the 15 below.
if [ $LOGCOUNT -gt 15 ]; then
    echo "more than 15 logs exist, moving to folder /unison/oldlogs/previous"
    #keep previous batch in previous
    mkdir -p /unison/oldlogs/previous
    rm -f /unison/oldlogs/previous/*
    mv /unison/oldlogs/unisonlog.* /unison/oldlogs/previous
    clear
    echo "`date` : log files have been moved to oldlogs" >>$LOG
    echo "Unison Log files moved, please check folders on server"|mail -s "Unison Logfile Rotation : `date`" $email
fi
echo "" >>$LOG
echo "`date` ***STARTING REPLICATION RUN**** " >>$LOG
#Begin unison replication using /root/.unison/qmail.prf file for folder locations
#Log file location and format
/unison/unison -force / -batch qmail >> $LOG 2>&1
echo "Deleting lock file" >> $LOG 2>&1
rm -f $lock
echo "Done `date`" >> $LOG 2>&1
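The lock file in the client script is only removed at the very end, so it stays behind if a run is interrupted and every later cron run then stops with the "Lock file still in place" mail. One possible way around this, sketched below under assumptions, is to take the lock with mkdir (which fails if the directory already exists) and to release it from a trap. The function name acquire_lock and the LOCKDIR path are hypothetical and not part of the original script.

```shell
#!/bin/sh
# Hypothetical lock helper; LOCKDIR is an assumed path, not part of the
# original script.
LOCKDIR=${LOCKDIR:-/tmp/replicate.lock.d}

acquire_lock() {
    # mkdir is atomic: it fails if the directory already exists, so two
    # overlapping runs cannot both take the lock.
    if mkdir "$LOCKDIR" 2>/dev/null; then
        # Record our PID so a leftover lock can be investigated.
        echo $$ > "$LOCKDIR/pid"
        # Remove the lock when the script exits, however it exits.
        trap 'rm -rf "$LOCKDIR"' EXIT
        # Turn common signals into a normal exit so the EXIT trap runs.
        trap 'exit 1' INT TERM
        return 0
    fi
    return 1
}
```

A script would call it near the top, e.g. acquire_lock || exit 1, and no longer needs an explicit rm of the lock at the end.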
Below is the contents of the qmail.prf file. This file points to the backup server's IP address and port and specifies the folders to be replicated. Only what is specified by path is replicated. If you don't include any path lines, the whole root is replicated. To add more, e.g. SquirrelMail prefs, add a line of the form path = path to folder. Paths are relative to the root, as in the entries below. Note no trailing /. Add the correct IP address and port number when you paste this into the qmail.prf file, e.g. root = socket://10.0.0.1:1234//
#Root and path setup for qmail backup
root = /
root = socket://xxx.xxx.xxx.xxx:port no.//
path = home/vpopmail
path = var/qmail/control
path = var/qmail/users
path = etc/mail/spamassassin
path = etc/tcprules.d
path = unison/mysql.dump
ignore = Name *.lock
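If you would rather generate the profile than paste and edit it by hand, a small helper along these lines could write it with the remote root filled in. This is only a sketch: the function name write_profile is made up, and the address and port in the usage line are the example values from the text above, not real settings.

```shell
#!/bin/sh
# Hypothetical helper: write the qmail profile with the given remote root.
# Arguments: remote root URL, output file.
write_profile() {
    remote=$1
    out=$2
    # Unquoted heredoc: $remote is expanded, while *.lock stays literal
    # because no filename globbing happens inside a heredoc.
    cat > "$out" <<EOF
#Root and path setup for qmail backup
root = /
root = $remote
path = home/vpopmail
path = var/qmail/control
path = var/qmail/users
path = etc/mail/spamassassin
path = etc/tcprules.d
path = unison/mysql.dump
ignore = Name *.lock
EOF
}
```

For example: write_profile 'socket://10.0.0.1:1234//' /root/.unison/qmail.prf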
Backup Server (Server) Setup
- The backup server runs the unison program with the -socket option. This creates a listening socket that the client will try to connect to. There is also a script that deals with qmail cleanup and configuration, as unison does not replicate ownerships.
*To configure Unison on the Server side take the following steps. The same unison application as above is used.
mkdir /unison
cd /unison
wget https://svn.cis.upenn.edu/svnroot/unison-contributed-binaries/linux/unison-2.13.16-linux-text.bz2
bzip2 -d unison-2.13.16-linux-text.bz2
mv unison-2.13.16-linux-text unison
chmod 755 unison
./unison
*This will create the initial /root/.unison database folder
vi qmail-replicateb
*Once in the editor, copy and paste the script from below
:wq
chmod 755 qmail-replicateb
vi unison-run
*Once in the editor, copy and paste the unison-run script from below
:wq
chmod 755 unison-run
vi qmail-switch
*Once in the editor, copy and paste the qmail-switch script from below.
:wq
chmod 755 qmail-switch
cp unison-run /etc/init.d
ln -s /etc/init.d/unison-run /etc/rc3.d/S50unison-run
*This will configure unison to start on boot up.
- To manually start or stop the socket do the following
/unison/unison-run start
or
/unison/unison-run stop
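After starting the socket it can be useful to confirm that the process recorded in the PID file is still alive. The sketch below is one way to do that. The function name pid_alive is made up, and /var/run/unison.pid is the PID file written by the unison-run script further down this page. Note that kill -0 only tests whether a signal could be sent, so it should be run as root (or as the user that started unison).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PID stored in the given pid file
# belongs to a running process.
pid_alive() {
    pidfile=$1
    # No pid file means the service was never started (or was stopped).
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

For example: pid_alive /var/run/unison.pid && echo "unison socket server is running"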
- The server side script uses a qmail queue repair tool. This needs to be configured before running the script
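cd /unison
wget http://pyropus.ca/software/queue-repair/queue-repair-0.9.0.tar.gz
tar -xzf queue-repair-0.9.0.tar.gz
mv queue-repair-0.9.0 queue-repair
*Do not create the queue-repair folder beforehand, otherwise mv places the extracted folder inside it and the qmail-switch script will not find queue_repair.py under /unison/queue-repair.
- Run the qmail-switch script if you need to switch to the backup server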
Below is the script for the server called qmail-replicateb
#!/bin/sh
#Version 1.0 - Oct 11th 2006
#Script created by Craig Smith - craig@doc-net.com
#This script is the server (backup mail server) side script to monitor
#changes and import the changes and fix qmail accordingly.
#It also sets the correct ownership on the vpopmail folders as unison doesn't
#replicate ownership.
#This script assumes unison and scripts are placed in /unison
#The mysql commands,qmail-newu and queue repair were taken from Jake Vickers' qmail
#restore scripts.
#Put this script in cron job to finalize changes based on the main server
#Check the qmail-replicate.log file for each run to see if changes were made.
#Log file will be adjusted to similarly match client side script.
#Currently qmail is left off to prevent unnecessary direct mailings to the
#backup server. This can be uncommented to leave it running.

#Please add your mysql root password below
mysqlrootpass="mysql password"

#This portion compares the vpopmail files between replication for changes and
#acts accordingly. If you want this to run with every cron job, comment out
#the next 4 script lines as well as the last 3 script lines.

#Checking for mysql.dump and oldlogs folder
folder1=/unison/mysql.dump
folder2=/unison/oldlogs
if [ ! -d $folder1 ]; then
    mkdir -p $folder1
fi
if [ ! -d $folder2 ]; then
    mkdir -p $folder2
fi

#script for log clearup
#Only last 10 run log files are kept in /unison. The rest are moved to
#/unison/oldlogs and cleared out each time 10 are exceeded to prevent excessive
#buildup.
#Log file location and format
LOG=/unison/unisonlog.`date +%Y%m%d%H%M`
#2>/dev/null keeps ls quiet when no log files exist yet
LOGCOUNT=`ls /unison/unisonlog.* 2>/dev/null |wc -l`
#mysql file to be imported
FILE=/unison/mysql.dump/vpopmail

#LOG COMMENT : if you want to increase the saved logs change the 10 below.
if [ $LOGCOUNT -gt 10 ]; then
    # echo "more than 10 logs exist, moving to folder /unison/oldlogs"
    rm -f /unison/oldlogs/*
    mv /unison/unisonlog.* /unison/oldlogs
    clear
    echo "`date` : log files have been moved to oldlogs" >$LOG
else
    echo "`date` : Log count is fine, nothing to do" >>$LOG
fi

mysqladmin -f -uroot -p$mysqlrootpass drop vpopmail >>$LOG
mysqladmin -uroot -p$mysqlrootpass flush-tables >>$LOG
mysqladmin -uroot -p$mysqlrootpass refresh >>$LOG
mysqladmin -uroot -p$mysqlrootpass reload >>$LOG
mysqladmin create vpopmail -uroot -p$mysqlrootpass >>$LOG
mysql -uroot -p$mysqlrootpass vpopmail < $FILE

#Set the file permission for vpopmail.vchkpw on vpopmail folder
cd /home
chown -R vpopmail:vchkpw vpopmail
#cd /home/vpopmail/
#chown -R vpopmail:vchkpw *

#Run qmail-newu
echo "Running qmail-newu for any other loose ends....">>$LOG
/var/qmail/bin/qmail-newu >>$LOG

#Reload mysql data and restart httpd
echo "Reloading (and refreshing) MySQL and Apache...." >>$LOG
mysqladmin -uroot -p$mysqlrootpass reload >>$LOG
mysqladmin -uroot -p$mysqlrootpass refresh >>$LOG
/sbin/service httpd reload >>$LOG

#Rebuild qmail cdb's to accommodate any replicated changes
echo "Rebuilding QMail's CDB and starting the mail server...." >>$LOG
echo "If qmail is not running the svc errors are normal and can be ignored." >>$LOG
qmailctl stop >>$LOG 2>&1
sleep 3
qmailctl cdb >>$LOG
#Leaving qmail stopped to prevent unnecessary direct mailing etc.
#qmailctl start
echo "`date` The database has been imported into MYSQL. Verify user details." >>$LOG
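The server script drops the vpopmail database before importing the dump, so a missing or empty dump file would leave the backup with no user database at all. A guard along the following lines could be run before the drop. It is only a sketch: the function name dump_usable is made up, and the check assumes the dump contains at least one CREATE TABLE statement, as a normal mysqldump of a populated database does.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the dump file exists, is not
# empty, and contains at least one CREATE TABLE statement.
dump_usable() {
    # -s is true for a file that exists and has a size greater than zero.
    [ -s "$1" ] && grep -q 'CREATE TABLE' "$1"
}
```

In qmail-replicateb (and qmail-switch) this could be called just before the mysqladmin drop line, e.g. dump_usable "$FILE" || { echo "`date` : dump missing or empty, skipping import" >>$LOG; exit 1; }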
Below is the script for the unison control file called unison-run
#!/bin/sh
#Script created by Craig Smith - craig@doc-net.com
#Script for Unison Start up shutdown
#Set the socket number in the /unison/unison -socket
# line.
case "$1" in
start)
    # have a silent kill in case someone tries to start the service when it
    # is already running
    /unison/unison-run stop >/dev/null 2>&1
    echo -n "Starting Unison Socket Server"
    /unison/unison -socket xxxx &
    echo $! > /var/run/unison.pid
    echo "."
    ;;
stop)
    echo -n "Stopping Unison Socket Server"
    kill `cat /var/run/unison.pid`
    rm -rf /var/run/unison.pid
    echo "."
    ;;
help)
    echo "Usage: unison-run {start|stop}"
    ;;
*)
    echo "Usage: unison-run {start|stop}"
    exit 1
    ;;
esac
Below is the script to use when you need to switch to your backup server. Change the backup server's IP address to match the primary's and then run /unison/qmail-switch. Changing the IP address means you don't have to update the DNS host record. The script does a final cleanup and then starts qmail.
Below is the qmail-switch script
#!/bin/sh
#Version 1.0 - Oct 11th 2006
#Script created by Craig Smith - craig@doc-net.com

#Please add your mysql root password below
mysqlrootpass="mysql rootpass"
LOG=/unison/qmailswitch.log
FILE=/unison/mysql.dump/vpopmail

mysqladmin -f -uroot -p$mysqlrootpass drop vpopmail >>$LOG
mysqladmin -uroot -p$mysqlrootpass flush-tables >>$LOG
mysqladmin -uroot -p$mysqlrootpass refresh >>$LOG
mysqladmin -uroot -p$mysqlrootpass reload >>$LOG
mysqladmin create vpopmail -uroot -p$mysqlrootpass >>$LOG
mysql -uroot -p$mysqlrootpass vpopmail < $FILE

#Set the file permission for vpopmail.vchkpw on vpopmail folder
cd /home
chown -R vpopmail:vchkpw vpopmail

#Run queue repair
echo "Running Charles Cazabon's queue_repair python utility to fix any loose ends...." >>$LOG
sleep 3
cd /unison/queue-repair
#755 is enough to run the utility; it does not need to be world writable
chmod 755 queue_repair.py
./queue_repair.py -r >>$LOG
echo
echo "Running qmail-newu for any other loose ends....">>$LOG
/var/qmail/bin/qmail-newu >>$LOG

#Reload mysql data and restart httpd
echo "Reloading (and refreshing) MySQL and Apache...." >>$LOG
mysqladmin -uroot -p$mysqlrootpass reload >>$LOG
mysqladmin -uroot -p$mysqlrootpass refresh >>$LOG
/sbin/service httpd reload >>$LOG

#Rebuild qmail cdb's to accommodate any replicated changes
echo "Rebuilding QMail's CDB and starting the mail server...." >>$LOG
echo "If qmail is not running the svc errors are normal and can be ignored." >>$LOG
qmailctl stop >>$LOG 2>&1
sleep 3
qmailctl cdb >>$LOG
qmailctl start
echo "`date` The database has been imported into MYSQL. Verify user details." >>$LOG
echo "QMAIL has been fixed and started. It should now be live and working as primary mail. Please verify." >>$LOG
Base State Check script
This script pings your host to check that it is up, and creates a fail file each time the ping fails. To use it, vi filename (e.g. statecheck) and paste the contents from below, then chmod 755 filename. At this time it's not really that helpful without a method of notifying you. If you have an email address that you monitor regularly and that is not on your primary server, uncomment all the email options below and you will be notified at the various levels of failure.
#!/bin/sh
#Created by Craig Smith 16/10/2006
#Script file to check for up time on hosts. Will ping at x min intervals, and email after each consecutive failure.
#To increase failure numbers, add more variables i.e. fourth, fifth etc. Add to cron to specify check rate.
#There are probably other ways of doing this, but for now, this is what came about.
HOST="host to be pinged"
#EMAIL=mail for testing
FOLDER=/unison/uptime
first=/unison/uptime/fail1
second=/unison/uptime/fail2
third=/unison/uptime/fail3
fine=/unison/uptime/fine
if [ ! -d $FOLDER ]; then
    mkdir -p $FOLDER
fi
#If increasing error count, extend these counts.
if [ -f $first ]; then
    file=$second
elif [ -f $second ]; then
    file=$third
else
    file=$first
fi
rm -f $FOLDER/*
#test ping directly so any non-zero exit status counts as a failure
if ping -c 4 $HOST >/dev/null; then
    echo "$HOST is up and visible" > $fine
else
    echo "failure `date`" >$file
fi
#used for testing. Should be replaced with sms based commands.
#if [ -f $first ];then
#echo "$HOST has failed to respond for the 1st time on `date`"| mail -s "Please perform first time $HOST state check" $EMAIL
#elif [ -f $second ];then
#echo "$HOST has failed to respond for the 2nd time on `date`"| mail -s "Please perform second time $HOST state check!!" $EMAIL
#elif [ -f $third ];then
# echo "$HOST has failed to respond for the 3rd time on `date`"| mail -s "$HOST state!!! OFFLINE 3RD FAILURE" $EMAIL
#elif [ -f $fine ]; then
#echo "$HOST is alive and well"| mail -s "No problems were detected `date`." $EMAIL;
#fi
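Instead of adding a new variable for every extra failure level, the consecutive failures could be kept as a number in a single file. The sketch below shows the idea. It is not the script above, the function name record_check and the state file path in the usage line are made up, and the probe command is passed in as arguments so that ping can be swapped for something else.

```shell
#!/bin/sh
# Hypothetical variant of the state check: keep one counter of
# consecutive failures instead of fail1/fail2/fail3 files.
# Usage: record_check STATEFILE COMMAND [ARGS...]
record_check() {
    state=$1
    shift
    # The remaining arguments are the probe command, e.g. ping -c 4 host.
    if "$@" >/dev/null 2>&1; then
        # Host answered: reset the counter.
        echo 0 > "$state"
    else
        # Host did not answer: read the old count (0 if the file is
        # missing or empty) and add one.
        count=$(cat "$state" 2>/dev/null)
        count=${count:-0}
        echo $((count + 1)) > "$state"
    fi
    # Print the current number of consecutive failures.
    cat "$state"
}
```

For example, fails=$(record_check /unison/uptime/failcount ping -c 4 "$HOST") gives a number that a notification step can compare against whatever threshold you choose.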