replication monitoring

Thursday, 20 August 2015

Hello,

I have deployed an MMR cluster using a recent (roughly April) version of 389 from the
CentOS 6 repository.

Following example 2 of this document, I have tried to set up a monitoring script on each
node to verify that replication is succeeding:
http://directory.fedoraproject.org/docs/389ds/howto/howto-replicationmoni...

The monitoring command-line search usually works, but while replication is in progress it
reports a false positive for replication errors because some of the replicas are busy.

Rather than simply filtering out results containing the word “busy”, which could hide the
case where every replica is failing because every replica is busy, I thought I should ask
for recommendations on handling this.

My best idea is to run the command several times over several seconds and issue an alert
if it fails more than X times in a row. Of course, that wouldn’t work if a
longer-than-usual replication were underway. Is there a better way to do this?
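
To make the idea concrete, here is a minimal Python sketch of that retry logic. The names
should_alert, check, attempts, max_failures and delay_seconds are placeholders of my own,
and check stands in for whatever wraps the existing monitoring search and returns True
when it reports no errors. Nothing here is part of 389 itself.

```python
import time
from typing import Callable


def should_alert(
    check: Callable[[], bool],
    attempts: int = 5,
    max_failures: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if check() fails more than max_failures times in a row.

    check is a hypothetical callable wrapping the monitoring search;
    it should return True when replication looks healthy.
    """
    consecutive = 0
    for attempt in range(attempts):
        if check():
            # A single success resets the streak, so a briefly busy
            # replica does not trigger an alert.
            consecutive = 0
        else:
            consecutive += 1
            if consecutive > max_failures:
                return True
        # Wait between attempts, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

With the defaults this alerts only after four consecutive failures spread over roughly
six seconds, so it still has the weakness described above: a replication run that keeps
replicas busy for longer than that window would be reported as an error.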

Thank you,
Russ.
