<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on: Setting up fully redundant failover nagios servers	</title>
	<atom:link href="https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/</link>
	<description>...are belong to the internet.</description>
	<lastBuildDate>Mon, 05 Jun 2017 20:23:11 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>
		By: Jorge		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-273920</link>

		<dc:creator><![CDATA[Jorge]]></dc:creator>
		<pubDate>Mon, 05 Jun 2017 20:23:11 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-273920</guid>

					<description><![CDATA[Worked great!!

Thank you =)]]></description>
			<content:encoded><![CDATA[<p>Worked great!!</p>
<p>Thank you =)</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: bish		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-271884</link>

		<dc:creator><![CDATA[bish]]></dc:creator>
		<pubDate>Wed, 01 Feb 2017 20:27:48 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-271884</guid>

					<description><![CDATA[&#039;cat &#124; sed &#039; ?]]></description>
			<content:encoded><![CDATA[<p>&#039;cat | sed &#039; ?</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: ilie dumitru		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-271499</link>

		<dc:creator><![CDATA[ilie dumitru]]></dc:creator>
		<pubDate>Mon, 09 Jan 2017 08:36:12 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-271499</guid>

					<description><![CDATA[I did this some years ago, in the following way:

Nagios depends on httpd and mysql, so that is the key to the solution.

A virtual IP address is assigned to the master.
The MySQL slave receives data every 2 seconds from the master’s bin-log (MySQL replication).
A Perl script watches the master to check the state of its services (IP up, httpd, mysqld, etc.). If there are problems, it connects to the slave, restarts the services, brings up the virtual IP address, changes my.cnf, and restarts the MySQL server as master. The Nagios services are restarted afterwards.
The Perl script, located on a third server, connects using a public key declared in .ssh/authorized_keys.

The NRPE config files on all monitored machines must accept the two addresses of the Nagios servers (master and slave).

It works fine.]]></description>
			<content:encoded><![CDATA[<p>I did this some years ago, in the following way:</p>
<p>Nagios depends on httpd and mysql, so that is the key to the solution.</p>
<p>A virtual IP address is assigned to the master.<br />
The MySQL slave receives data every 2 seconds from the master’s bin-log (MySQL replication).<br />
A Perl script watches the master to check the state of its services (IP up, httpd, mysqld, etc.). If there are problems, it connects to the slave, restarts the services, brings up the virtual IP address, changes my.cnf, and restarts the MySQL server as master. The Nagios services are restarted afterwards.<br />
The Perl script, located on a third server, connects using a public key declared in .ssh/authorized_keys.</p>
<p>The NRPE config files on all monitored machines must accept the two addresses of the Nagios servers (master and slave).</p>
<p>It works fine.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Praveen Diwakar		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-269374</link>

		<dc:creator><![CDATA[Praveen Diwakar]]></dc:creator>
		<pubDate>Wed, 24 Aug 2016 09:07:43 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-269374</guid>

					<description><![CDATA[Hi,
I have been trying this HA solution for quite a while but keep getting the following error.

I have configured the two Nagios servers as instructed above, and I have been able to sync the retention .dat file. My problem is that I am using a private network, that is, private IPs [192.168.x.x], and when I run the nagios.watchdog.sh.erb script it gives me a &quot;host key verification error&quot;.

I have removed the known_hosts files but still get the same error. I am able to ssh to both servers without a password.

Please give me a solution for this!

Thanks]]></description>
			<content:encoded><![CDATA[<p>Hi,<br />
I have been trying this HA solution for quite a while but keep getting the following error.</p>
<p>I have configured the two Nagios servers as instructed above, and I have been able to sync the retention .dat file. My problem is that I am using a private network, that is, private IPs [192.168.x.x], and when I run the nagios.watchdog.sh.erb script it gives me a &#8220;host key verification error&#8221;.</p>
<p>I have removed the known_hosts files but still get the same error. I am able to ssh to both servers without a password.</p>
<p>Please give me a solution for this!</p>
<p>Thanks</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Pedro Albuquerque		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-9204</link>

		<dc:creator><![CDATA[Pedro Albuquerque]]></dc:creator>
		<pubDate>Fri, 14 Oct 2011 16:52:35 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-9204</guid>

					<description><![CDATA[Hi,

I think &quot;retain_state_information=1&quot; only makes Nagios retain state information across a shutdown; it does not make it sync every minute.
In Nagios Core 3.2.3, there is the variable &quot;retention_update_interval&quot;, which determines how often (in minutes) Nagios will automatically save retention data during normal operation.
Has anyone already figured out which variable should be configured to retain information every minute?

Cheers.]]></description>
			<content:encoded><![CDATA[<p>Hi,</p>
<p>I think &#8220;retain_state_information=1&#8221; only makes Nagios retain state information across a shutdown; it does not make it sync every minute.<br />
In Nagios Core 3.2.3, there is the variable &#8220;retention_update_interval&#8221;, which determines how often (in minutes) Nagios will automatically save retention data during normal operation.<br />
Has anyone already figured out which variable should be configured to retain information every minute?</p>
<p>Cheers.</p>
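<p>A possible answer, as a hedged sketch: going by the description of retention_update_interval quoted above, setting it to 1 in the main config file (with retain_state_information=1 left enabled) should make Nagios write its retention data once a minute while it is running. The helper below only prints which of the two directives are active in a given config file. The function name and the config path in the usage note are assumptions, not part of the original post.</p>

```shell
#!/bin/bash
# Hypothetical helper: print the retention directives that are active
# (not commented out) in a Nagios main config file.
show_retention_settings() {
    local cfg="$1"
    grep -E '^[[:space:]]*(retain_state_information|retention_update_interval)[[:space:]]*=' "$cfg"
}
```

<p>For example, show_retention_settings /usr/local/nagios/etc/nagios.cfg (an assumed path) would be expected to list retain_state_information=1 and retention_update_interval=1 once both are set.</p>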
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Michael Edwards		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-8255</link>

		<dc:creator><![CDATA[Michael Edwards]]></dc:creator>
		<pubDate>Thu, 28 Jul 2011 17:05:49 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-8255</guid>

					<description><![CDATA[As of 3.2.3 there is a separate option called &quot;retention_update_interval&quot; in addition to the retain_state_information option mentioned in this blog posting.

This is what I ended up with after making some changes for greater compatibility with my somewhat default nagios install as well as RHEL3/5 compatibility.

#!/bin/bash

# Executable variables. Useful.
RM=&quot;/bin/rm -f&quot;
MV=&quot;/bin/mv&quot;
ECHO=&quot;/bin/echo -e&quot;
FQDN=&quot;/bin/hostname --fqdn&quot;
FIXFILES=&quot;/sbin/fixfiles&quot;
MAILER=/usr/sbin/sendmail
SUBJECT=&quot;URGENT: nagios master process switch has taken place.&quot;
RECIPIENT=&quot;isadmin@vtls.com&quot;
SERVICE=/etc/init.d/nagios
RETENTIONFILE=/usr/local/nagios/var/retention.dat

# This is where we point the servers at each-other (configure this properly in your deployment!)
#This should be for the other server of the pair
MASTERHOST=10.0.0.2

# Ensure only one copy can run at a time
PIDFILE=/var/run/nagios-watchdog.pid
if [ -e ${PIDFILE} ]; then
    exit 1;
else
    touch ${PIDFILE};
fi

# Checks the actual daemon status on the other host
#echo &quot;su - nagios -c \&quot;ssh ${MASTERHOST} &#039;/etc/init.d/nagios status&#039;\&quot;&quot;
su - nagios -c &quot;ssh ${MASTERHOST} /etc/init.d/nagios status&quot;
#&#062;/dev/null 2&#062;&#038;1

# Is the other host doing all the work? 
if [ $? -eq 0 ]; then
    # Service running on MASTERHOST.  Stop my service so there is only one.
#echo &quot;Nagios running on MASTERHOST&quot;
#echo &quot; ${SERVICE} stop &quot;
    ${SERVICE} stop &#062;/dev/null 2&#062;&#038;1

    # Copy the retention data from the other nagios process
#echo &quot;su nagios -c \&quot;scp ${MASTERHOST}:${RETENTIONFILE} /tmp/\&quot;&quot;
    su - nagios -c &quot;scp ${MASTERHOST}:${RETENTIONFILE} /tmp/&quot;;

    # Verify that we didnt get a corrupted copy
    if [ `grep &quot;{&quot; /tmp/retention.dat &#124; wc -l` -eq `grep &quot;}&quot; /tmp/retention.dat &#124; wc -l` ]; then
        ${MV} /tmp/retention.dat ${RETENTIONFILE};
    else
        ${RM} /tmp/retention.dat;
    fi
    #${FIXFILES} restore /var/log/nagios
else
#   echo &quot;Service not running on MASTERHOST&quot;
    ${SERVICE} status &#062;/dev/null 2&#062;&#038;1
    if [ $? -ne 0 ]; then
#       echo &quot;Service not running here either.  Sending notification.&quot;
        ${ECHO} &quot;From: nagios-watchdog@`hostname`\nSubject: ${SUBJECT}\nTo: ${RECIPIENT}\nNow running on host: `hostname`&quot; &#124; ${MAILER} ${RECIPIENT};
#       echo &quot;Starting nagios on localhost.&quot;
        ${SERVICE} start &#062;/dev/null 2&#062;&#038;1;
    fi
fi

${RM} ${PIDFILE}

exit 0;]]></description>
			<content:encoded><![CDATA[<p>As of 3.2.3 there is a separate option called &#8220;retention_update_interval&#8221; in addition to the retain_state_information option mentioned in this blog posting.</p>
<p>This is what I ended up with after making some changes for greater compatibility with my somewhat default nagios install as well as RHEL3/5 compatibility.</p>
<p>#!/bin/bash</p>
<p># Executable variables. Useful.<br />
RM=&quot;/bin/rm -f&quot;<br />
MV=&quot;/bin/mv&quot;<br />
ECHO=&quot;/bin/echo -e&quot;<br />
FQDN=&quot;/bin/hostname --fqdn&quot;<br />
FIXFILES=&quot;/sbin/fixfiles&quot;<br />
MAILER=/usr/sbin/sendmail<br />
SUBJECT=&quot;URGENT: nagios master process switch has taken place.&quot;<br />
RECIPIENT=&quot;isadmin@vtls.com&quot;<br />
SERVICE=/etc/init.d/nagios<br />
RETENTIONFILE=/usr/local/nagios/var/retention.dat</p>
<p># This is where we point the servers at each-other (configure this properly in your deployment!)<br />
#This should be for the other server of the pair<br />
MASTERHOST=10.0.0.2</p>
<p># Ensure only one copy can run at a time<br />
PIDFILE=/var/run/nagios-watchdog.pid<br />
if [ -e ${PIDFILE} ]; then<br />
    exit 1;<br />
else<br />
    touch ${PIDFILE};<br />
fi</p>
<p># Checks the actual daemon status on the other host<br />
#echo &quot;su - nagios -c \&quot;ssh ${MASTERHOST} &#039;/etc/init.d/nagios status&#039;\&quot;&quot;<br />
su - nagios -c &quot;ssh ${MASTERHOST} /etc/init.d/nagios status&quot;<br />
#&gt;/dev/null 2&gt;&amp;1</p>
<p># Is the other host doing all the work?<br />
if [ $? -eq 0 ]; then<br />
    # Service running on MASTERHOST.  Stop my service so there is only one.<br />
#echo &quot;Nagios running on MASTERHOST&quot;<br />
#echo &quot; ${SERVICE} stop &quot;<br />
    ${SERVICE} stop &gt;/dev/null 2&gt;&amp;1</p>
<p>    # Copy the retention data from the other nagios process<br />
#echo &quot;su nagios -c \&quot;scp ${MASTERHOST}:${RETENTIONFILE} /tmp/\&quot;&quot;<br />
    su - nagios -c &quot;scp ${MASTERHOST}:${RETENTIONFILE} /tmp/&quot;;</p>
<p>    # Verify that we didnt get a corrupted copy<br />
    if [ `grep &quot;{&quot; /tmp/retention.dat | wc -l` -eq `grep &quot;}&quot; /tmp/retention.dat | wc -l` ]; then<br />
        ${MV} /tmp/retention.dat ${RETENTIONFILE};<br />
    else<br />
        ${RM} /tmp/retention.dat;<br />
    fi<br />
    #${FIXFILES} restore /var/log/nagios<br />
else<br />
#   echo &quot;Service not running on MASTERHOST&quot;<br />
    ${SERVICE} status &gt;/dev/null 2&gt;&amp;1<br />
    if [ $? -ne 0 ]; then<br />
#       echo &quot;Service not running here either.  Sending notification.&quot;<br />
        ${ECHO} &quot;From: nagios-watchdog@`hostname`\nSubject: ${SUBJECT}\nTo: ${RECIPIENT}\nNow running on host: `hostname`&quot; | ${MAILER} ${RECIPIENT};<br />
#       echo &quot;Starting nagios on localhost.&quot;<br />
        ${SERVICE} start &gt;/dev/null 2&gt;&amp;1;<br />
    fi<br />
fi</p>
<p>${RM} ${PIDFILE}</p>
<p>exit 0;</p>
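<p>The corruption check in the script above compares the number of lines containing an opening brace with the number containing a closing brace. As a rough sketch, the same idea can be pulled into a small function. The function name and the extra test for an empty file are assumptions added here for illustration; they are not part of the script as posted.</p>

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if FILE is non-empty and has as many
# lines containing "{" as lines containing "}". This is the same heuristic
# as the grep/wc comparison in the watchdog script, not a real parser.
retention_looks_complete() {
    local file="$1"
    local opens closes
    # Assumption: an empty or missing file counts as a failed copy.
    [ -s "$file" ] || return 1
    opens=$(grep -c '{' "$file")
    closes=$(grep -c '}' "$file")
    [ "$opens" -eq "$closes" ]
}
```

<p>In the watchdog it could stand in for the inline test, for example: if retention_looks_complete /tmp/retention.dat; then move the file into place; otherwise delete it. A truncated copy that happens to end right after a closing brace would still pass, so this remains a heuristic.</p>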
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: brose		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-7126</link>

		<dc:creator><![CDATA[brose]]></dc:creator>
		<pubDate>Tue, 03 May 2011 13:32:49 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-7126</guid>

					<description><![CDATA[Aaron,
Thanks! Glad it was of use. I ended up going with the curl route as it did not require me to compile any custom SELinux modules. If you do the ps method, you will need to somehow enable httpd access to read the process table. You are correct, however - both ways will work well.]]></description>
			<content:encoded><![CDATA[<p>Aaron,<br />
Thanks! Glad it was of use. I ended up going with the curl route as it did not require me to compile any custom SELinux modules. If you do the ps method, you will need to somehow enable httpd access to read the process table. You are correct, however &#8211; both ways will work well.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Aaron		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-7120</link>

		<dc:creator><![CDATA[Aaron]]></dc:creator>
		<pubDate>Mon, 02 May 2011 21:57:55 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-7120</guid>

					<description><![CDATA[This guide was very helpful!

One addition I made was to check for running nagios by using exec ps -C nagios3 instead of the curl operation in your script. Both ways seem to work well.]]></description>
			<content:encoded><![CDATA[<p>This guide was very helpful!</p>
<p>One addition I made was to check for running nagios by using exec ps -C nagios3 instead of the curl operation in your script. Both ways seem to work well.</p>
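<p>A minimal sketch of that process check as a shell function, assuming a procps-style ps that supports -C. The function name is hypothetical, and nagios3 is simply the process name mentioned above; other installs may use a different name.</p>

```shell
#!/bin/bash
# Hypothetical helper: succeeds if at least one process with exactly this
# command name exists. Relies on procps "ps -C", which exits non-zero when
# no process matches.
process_is_running() {
    ps -C "$1" -o pid= > /dev/null
}
```

<p>Usage would look like: if process_is_running nagios3; then echo running; else echo stopped; fi</p>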
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: brose		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-6954</link>

		<dc:creator><![CDATA[brose]]></dc:creator>
		<pubDate>Thu, 14 Apr 2011 20:50:47 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-6954</guid>

					<description><![CDATA[Jason,

All of my testing and work has been with Nagios Core, the free open-source version available from rhel/epel and the like. However, as long as Nagios XI still uses the retention.dat mechanism for restoring its exit state on startup, this should work. In fact, it may be even easier, as it sounds like you can point them both to the same database for config, and don&#039;t need to worry about syncing with puppet or some such. I cannot provide any information on NDOutils or any plugins; it all depends on whether or not they store their retention information in retention.dat. There is no reason why my script cannot be modified to scp multiple files, though. For example, I have extended it in our environment to also copy over the logs/ directory. This syncs historical data, making end-of-year reports reliable.]]></description>
			<content:encoded><![CDATA[<p>Jason,</p>
<p>All of my testing and work has been with Nagios Core, the free open-source version available from rhel/epel and the like. However, as long as Nagios XI still uses the retention.dat mechanism for restoring its exit state on startup, this should work. In fact, it may be even easier, as it sounds like you can point them both to the same database for config, and don&#8217;t need to worry about syncing with puppet or some such. I cannot provide any information on NDOutils or any plugins; it all depends on whether or not they store their retention information in retention.dat. There is no reason why my script cannot be modified to scp multiple files, though. For example, I have extended it in our environment to also copy over the logs/ directory. This syncs historical data, making end-of-year reports reliable.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Jason		</title>
		<link>https://blog.allmybase.com/2010/10/04/setting-up-fully-redundant-failover-nagios-servers/#comment-6953</link>

		<dc:creator><![CDATA[Jason]]></dc:creator>
		<pubDate>Thu, 14 Apr 2011 20:32:37 +0000</pubDate>
		<guid isPermaLink="false">http://allmybase.com/?p=133#comment-6953</guid>

					<description><![CDATA[Is this just for Core, or will this work for Nagios XI using databases for configs and NDOutils?]]></description>
			<content:encoded><![CDATA[<p>Is this just for Core, or will this work for Nagios XI using databases for configs and NDOutils?</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
