
Monitor a Pacemaker Cluster with ocf:pacemaker:ClusterMon and/or external-agent

I'm trying to configure Pacemaker cluster event notifications via an external agent, so that I receive a notification whenever a failover happens.
I found the links below:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/s1-eventnotification-HAAR.html

http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/

But I don't understand how to actually do this.
Could you please give a step-by-step explanation?

Thank You,
Ranjan.


Answer

The RedHat doc is terse, but Florian’s blog entry is pretty detailed and the references at the end are helpful.

The question as asked is sort of vague, so I’m answering what I think you’re asking.

Briefly, summarizing part of Florian’s post, ClusterMon is a resource agent (ocf:pacemaker:ClusterMon) that runs crm_mon behind the scenes.

My (SLES 11 SP3) resource’s documentation says:

# crm ra info ocf:pacemaker:ClusterMon
Runs crm_mon in the background, recording the cluster status to an HTML file (ocf:pacemaker:ClusterMon)

This is a ClusterMon Resource Agent.
It outputs current cluster status to the html.

Parameters (* denotes required, [] the default):

user (string, [root]): 
    The user we want to run crm_mon as

update (integer, [15]): Update interval
    How frequently should we update the cluster status

extra_options (string): Extra options
    Additional options to pass to crm_mon.  Eg. -n -r

pidfile (string, [/tmp/ClusterMon_undef.pid]): PID file
    PID file location to ensure only one instance is running

htmlfile (string, [/tmp/ClusterMon_undef.html]): HTML output
    Location to write HTML output to.

Operations' defaults (advisory minimum):

    start         timeout=20
    stop          timeout=20
    monitor       timeout=20 interval=10

The real power is in extra_options, because it lets the resource agent tell crm_mon what to do with the results. Specifically, the value of extra_options is passed verbatim to crm_mon as command-line options.

As Florian mentioned, more recent vintages of crm_mon (what’s actually doing the work) don’t come with SMTP (email) or SNMP support built in. However, it still supports the external agent (through the -E switch).

So, to understand what a given extra_options value does, you should consult man crm_mon.

From the RedHat documentation you linked to, the first “extra_options” value of -T pacemaker@example.com -F pacemaker@nodeX.example.com -P PACEMAKER -H mail.example.com tells crm_mon to send an email To pacemaker@example.com, From pacemaker@nodeX.example.com, with subject Prefix PACEMAKER, through the mail Host (SMTP server) mail.example.com.

The second “extra_options” example in the RedHat doc you reference had value -S snmphost.example.com -C public which tells crm_mon to send SNMP traps to snmphost.example.com using the Community named public.

The third “extra_options” example has value -E /usr/local/bin/example.sh -e 192.168.12.1. This tells crm_mon to run the External program /usr/local/bin/example.sh. It also specifies the ‘external recipient’, which simply gets placed in the environment variable CRM_notify_recipient, exported before the script is spawned.
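
On a crm-shell system like the SLES one above, the third example might be set up roughly as sketched below. This is only an illustration: the resource ids ClusterMon-External and ClusterMon-External-clone are made-up names, update="30" is an arbitrary interval, and the monitor values are just the advisory minimums from the agent's documentation. The small script prints a snippet that can be typed at the crm configure prompt.

```shell
#!/bin/bash
# Path and recipient are taken from the RedHat example; adjust for your site.
AGENT=/usr/local/bin/example.sh   # external script that crm_mon runs via -E
RECIPIENT=192.168.12.1            # passed via -e, exported as CRM_notify_recipient

# Print a crm configure snippet for the ClusterMon primitive and its clone.
# The resource ids are hypothetical; pick whatever fits your naming scheme.
clustermon_snippet() {
    cat <<EOF
primitive ClusterMon-External ocf:pacemaker:ClusterMon \\
    params user="root" update="30" extra_options="-E ${AGENT} -e ${RECIPIENT}" \\
    op monitor interval="10" timeout="20"
clone ClusterMon-External-clone ClusterMon-External
EOF
}

clustermon_snippet
```

The clone line follows Florian's advice to run the resource on every node; leave it out if you want a single instance.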

When running an external agent, crm_mon calls the script provided for every cluster event (including successful monitoring operations!). This script inherits a bunch of environment variables that tell you what’s going on.

From: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-notification-external.html the environment variables that are set are:

CRM_notify_recipient  The static external-recipient from the resource definition.
CRM_notify_node       The node on which the status change happened.
CRM_notify_rsc        The name of the resource that changed the status.
CRM_notify_task       The operation that caused the status change.
CRM_notify_desc       The textual output relevant error code of the operation (if any) that caused the status change.
CRM_notify_rc         The return code of the operation.
CRM_notify_target_rc  The expected return code of the operation.
CRM_notify_status     The numerical representation of the status of the operation.

The job of the script is to consume these environment variables and do something reasonable with them. What is “reasonable” depends on your environment.
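
A simple first step, before deciding what "reasonable" means for you, is to log every event and look at what actually arrives. The sketch below does only that. The log path /tmp/clustermon-events.log and the function name format_event are made up for illustration, and unset variables are shown as "-".

```shell
#!/bin/bash
# Hypothetical log location; override by setting LOGFILE in the environment.
LOGFILE="${LOGFILE:-/tmp/clustermon-events.log}"

# Build a one-line summary from the CRM_notify_* variables.
# ${VAR:--} substitutes "-" when a variable is unset or empty.
format_event() {
    printf 'recipient=%s node=%s rsc=%s task=%s rc=%s target_rc=%s status=%s desc=%s\n' \
        "${CRM_notify_recipient:--}" "${CRM_notify_node:--}" \
        "${CRM_notify_rsc:--}" "${CRM_notify_task:--}" \
        "${CRM_notify_rc:--}" "${CRM_notify_target_rc:--}" \
        "${CRM_notify_status:--}" "${CRM_notify_desc:--}"
}

# Append a timestamped line for this event to the log.
echo "$(date '+%Y-%m-%d %H:%M:%S') $(format_event)" >> "$LOGFILE"
```

Pointing -E at a script like this for a while shows which tasks and return codes your resources really produce, which makes it easier to write the filtering logic afterwards.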

The example with SNMP traps in Florian’s blog assumes you are familiar with SNMP traps. If not, then that’s an entirely different question and beyond the scope of the resource agent.

The example with SNMP traps gives a good conditional statement to identify events that are either unsuccessful monitor events, or events that are not monitor events.

The scaffolding of a monitoring script to do whatever you would like with the available information is really a stripped down version of the snmp trap shell script referenced in Florian’s blog post. It looks like:

#!/bin/bash

# Act only on unsuccessful monitor operations and on any non-monitor operation.
if [[ ${CRM_notify_rc} != 0 && ${CRM_notify_task} == "monitor" ]] ||
   [[ ${CRM_notify_task} != "monitor" ]] ; then

    # Do whatever you want with the information available in the
    # environment variables mentioned above that will do something
    # meaningful for you.

    # EG: Fire off an email attempting to be human readable
    # SUBJ="${CRM_notify_task} ${CRM_notify_desc} for ${CRM_notify_rsc} "
    # SUBJ="$SUBJ on ${CRM_notify_node}"
    # MSG="The ${CRM_notify_task} operation for ${CRM_notify_rsc} on "
    # MSG="$MSG ${CRM_notify_node} exited with status ${CRM_notify_rc} "
    # MSG="$MSG (${CRM_notify_desc}) and we expected ${CRM_notify_target_rc}"
    # echo "$MSG" | mail -s "$SUBJ" you@host.com

    # No-op placeholder: bash rejects an if body that contains only comments.
    :
fi
exit 0
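
To check the filter without waiting for a real failover, the same condition can be pulled into a function and fed hand-picked values. This is a sketch: should_notify is a hypothetical helper name, and the condition is the script's test reduced to its logical equivalent (notify unless the event is a successful monitor).

```shell
#!/bin/bash
# Succeeds (returns 0) when an event is worth reporting.
# $1 = task name (as in CRM_notify_task), $2 = return code (as in CRM_notify_rc)
should_notify() {
    [ "$1" != "monitor" ] || [ "$2" != "0" ]
}

# Try a few sample events: task and return code separated by a space.
for event in "monitor 0" "monitor 7" "start 0" "stop 1"; do
    set -- $event   # split into $1 (task) and $2 (rc)
    if should_notify "$1" "$2"; then
        echo "$1 rc=$2: notify"
    else
        echo "$1 rc=$2: ignore"
    fi
done
```

Only the successful monitor ("monitor 0") is ignored; the other three sample events are reported.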

Note that if you follow Florian’s advice and clone the resource, the script will run on every node. For SNMP traps that’s perfectly fine. If the script does something like send an email, though, you would likely get a copy from each node, so you may not want the resource cloned.
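
If you want to keep the clone but avoid duplicate emails, one possible approach (an idea of mine, not something from Florian's post or the RedHat doc) is to have each node act only on events that happened on itself. The sketch below assumes the Pacemaker node name in CRM_notify_node matches the output of uname -n, which may not hold in your setup. is_local_event is a hypothetical helper name.

```shell
#!/bin/bash
# Succeeds when the event's node is the node this script is running on.
# $1 = event node (defaults to CRM_notify_node)
# $2 = local node name (defaults to `uname -n`; assumed to match Pacemaker's name)
is_local_event() {
    event_node="${1:-$CRM_notify_node}"
    this_node="${2:-$(uname -n)}"
    [ "$event_node" = "$this_node" ]
}

if is_local_event; then
    echo "local event: this node would send the notification"
else
    echo "event for another node (or no event data): skipping"
fi
```

The trade-off is that a node that is down cannot report on itself, so you would rely on the events generated on the surviving nodes when resources start there.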

User contributions licensed under: CC BY-SA