I'm trying to configure Pacemaker cluster event notifications via an external agent, so that I receive a notification when a failover happens.
I found the links below,
but I don't understand how to actually do this.
Could you please give a step-by-step explanation?
Thank You,
Ranjan.
Answer
The RedHat doc is terse, but Florian’s blog entry is pretty detailed and the references at the end are helpful.
The question as asked is sort of vague, so I’m answering what I think you’re asking.
Briefly, summarizing part of Florian’s post, ClusterMon is a resource agent (ocf:pacemaker:ClusterMon) that runs crm_mon behind the scenes.
My (SLES 11 SP3) resource’s documentation says:
    # crm ra info ocf:pacemaker:ClusterMon
    Runs crm_mon in the background, recording the cluster status to an HTML file (ocf:pacemaker:ClusterMon)

    This is a ClusterMon Resource Agent.
    It outputs current cluster status to the html.

    Parameters (* denotes required, [] the default):

    user (string, [root]):
        The user we want to run crm_mon as

    update (integer, [15]): Update interval
        How frequently should we update the cluster status

    extra_options (string): Extra options
        Additional options to pass to crm_mon. Eg. -n -r

    pidfile (string, [/tmp/ClusterMon_undef.pid]): PID file
        PID file location to ensure only one instance is running

    htmlfile (string, [/tmp/ClusterMon_undef.html]): HTML output
        Location to write HTML output to.

    Operations' defaults (advisory minimum):

        start         timeout=20
        stop          timeout=20
        monitor       timeout=20 interval=10
But the real power is the extra_options parameter, because it lets you have the resource agent tell crm_mon what to do with the results. Specifically, the extra_options are passed verbatim as command line options to crm_mon.
As Florian mentioned, more recent versions of crm_mon (which is what actually does the work) don’t come with SMTP (email) or SNMP support built in. However, they still support the external agent (through the -E switch).
So, to understand what the extra_options do, you should consult man crm_mon.
From the RedHat documentation you linked to, the first “extra_options” value of -T pacemaker@example.com -F pacemaker@nodeX.example.com -P PACEMAKER -H mail.example.com tells crm_mon to send an email To pacemaker@example.com, From pacemaker@nodeX.example.com, with subject Prefix PACEMAKER, through the mail Host (SMTP server) mail.example.com.
The second “extra_options” example in the RedHat doc you reference has the value -S snmphost.example.com -C public, which tells crm_mon to send SNMP traps to snmphost.example.com using the Community named public.
The third “extra_options” example has the value -E /usr/local/bin/example.sh -e 192.168.12.1. This tells crm_mon to run the External program /usr/local/bin/example.sh, and it also specifies the ‘external recipient’, which actually just gets put into an environment variable, CRM_notify_recipient, that’s exported before spawning the script.
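To make that third form concrete, here is a rough sketch of how it could be written as a crm shell resource definition. This is my own illustration rather than something copied from the linked docs: the resource name p_clustermon, the file location, and the update, interval, and timeout values are placeholders I picked. The script only writes the definition to a file so you can review it; nothing touches the cluster until you load it yourself.

```shell
#!/bin/bash
# Write a ClusterMon resource definition to a file for review.
# All names and values below are illustrative assumptions, not required values.
cfg="${TMPDIR:-/tmp}/clustermon.crm"

# Quoted heredoc: the text is written to the file exactly as shown.
cat > "$cfg" <<'EOF'
primitive p_clustermon ocf:pacemaker:ClusterMon \
    params user="root" update="30" \
        extra_options="-E /usr/local/bin/example.sh -e 192.168.12.1" \
    op monitor interval="10" timeout="20"
EOF

echo "Wrote $cfg"
# After reviewing the file, load it on one cluster node with:
#   crm configure load update "$cfg"
```

The only part that matters for notifications is the extra_options string; everything else is ordinary resource boilerplate that you should adapt to your cluster.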
When running an external agent, crm_mon calls the script provided for every cluster event (including successful monitoring operations!). This script inherits a bunch of environment variables that tell you what’s going on.
From: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-notification-external.html the environment variables that are set are:
CRM_notify_recipient: The static external-recipient from the resource definition.
CRM_notify_node: The node on which the status change happened.
CRM_notify_rsc: The name of the resource that changed the status.
CRM_notify_task: The operation that caused the status change.
CRM_notify_desc: The textual output relevant error code of the operation (if any) that caused the status change.
CRM_notify_rc: The return code of the operation.
CRM_notify_target_rc: The expected return code of the operation.
CRM_notify_status: The numerical representation of the status of the operation.
The job of the script is to consume these environment variables and do something reasonable with them. What is “reasonable” depends on your environment.
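Since the script only sees environment variables, you can try it out by hand without waiting for a real failover. The helper below is a small sketch of that idea: it sets the CRM_notify_* variables to made-up values (node1, p_example, and so on are invented for illustration) and then runs whatever script you point it at. It approximates what crm_mon does; it is not how crm_mon is implemented.

```shell
#!/bin/bash
# Run an external-agent script by hand with the variables crm_mon would export.
# Every value below is made up for illustration.
simulate_event() {
    local script="$1" task="$2" rc="$3"
    # The assignments apply only to the environment of the script being run.
    CRM_notify_recipient="192.168.12.1" \
    CRM_notify_node="node1" \
    CRM_notify_rsc="p_example" \
    CRM_notify_task="$task" \
    CRM_notify_desc="simulated event" \
    CRM_notify_rc="$rc" \
    CRM_notify_target_rc="0" \
    CRM_notify_status="0" \
    "$script"
}

# Example: pretend a monitor operation failed with return code 7.
#   simulate_event /usr/local/bin/example.sh monitor 7
```

Running it once with a failed monitor and once with a successful one is a quick way to check that your script reacts to the first and stays quiet for the second.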
The example with SNMP traps in Florian’s blog assumes you are familiar with SNMP traps. If not, then that’s an entirely different question and beyond the scope of the resource agent.
The example with SNMP traps gives a good conditional statement to identify events that are either unsuccessful monitor events, or events that are not monitor events.
The scaffolding of a monitoring script to do whatever you would like with the available information is really a stripped down version of the snmp trap shell script referenced in Florian’s blog post. It looks like:
    #!/bin/bash
    # if [[ unsuccessful monitor operation ]] or [[ not monitor op ]]
    if [[ ${CRM_notify_rc} != 0 && ${CRM_notify_task} == "monitor" ]] || [[ ${CRM_notify_task} != "monitor" ]] ; then
        # Do whatever you want with the information available in the
        # environment variables mentioned above that will do something
        # meaningful for you.
        # EG: Fire off an email attempting to be human readable
        # SUBJ="${CRM_notify_task} ${CRM_notify_desc} for ${CRM_notify_rsc} "
        # SUBJ="$SUBJ on ${CRM_notify_node}"
        # MSG="The ${CRM_notify_task} operation for ${CRM_notify_rsc} on "
        # MSG="$MSG ${CRM_notify_node} exited with status ${CRM_notify_rc} "
        # MSG="$MSG (${CRM_notify_desc}) and we expected ${CRM_notify_target_rc}"
        # echo "$MSG" | mail -s "$SUBJ" you@host.com
        :   # no-op so the if block is valid bash while everything above is commented out
    fi
    exit 0
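If email is more than you need, the same scaffold can simply append a line to a log file. The variant below is a sketch along those lines; the log path and the function names is_interesting and log_event are my own placeholders, and the condition is just the one above written in a shorter, equivalent form.

```shell
#!/bin/bash
# Variant of the scaffold: append one line per interesting event to a log file.
# LOGFILE is a hypothetical path; override it via the environment if needed.
LOGFILE="${LOGFILE:-/tmp/cluster-events.log}"

# True for failed monitor operations and for any non-monitor operation.
is_interesting() {
    [[ "${CRM_notify_task}" != "monitor" || "${CRM_notify_rc}" != "0" ]]
}

# Write a single timestamped line describing the event, if it is interesting.
log_event() {
    if is_interesting; then
        printf '%s node=%s rsc=%s task=%s rc=%s expected=%s desc=%s\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" \
            "${CRM_notify_node}" "${CRM_notify_rsc}" "${CRM_notify_task}" \
            "${CRM_notify_rc}" "${CRM_notify_target_rc}" "${CRM_notify_desc}" \
            >> "$LOGFILE"
    fi
}

log_event
```

Make sure the user that crm_mon runs as (the user parameter of the resource) can write to whatever path you choose.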
However, if you’re following Florian’s advice and you clone the resource, the script will run on every node. For SNMP traps that’s perfectly fine. However, if you’re doing something like sending an email from the script, you may not want to actually have it cloned.