Scheduled outages applied on latency thresholds are ignored by Pollerd.
Description
Tim Miller Dyck reported this problem on the mailing list.
He was testing ICMP latency thresholds. They were working fine (he could see alerts triggering at the expected latency values).
However, he was not able to get threshold alerts to stop during scheduled outages in which the ICMP latency threshold is marked to be ignored. The alerts continue throughout the outage.
Other types of thresholds stop during scheduled outages as expected; only the ICMP latency thresholds are not working for him.
Checking the source code, I found that LatencyStoringServiceMonitorAdaptor does not check whether the node is on outage before applying the thresholds, which is the cause of the problem.
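To illustrate the missing guard, here is a minimal sketch of the kind of check that has to run before latency thresholds are evaluated. It is not the actual OpenNMS code: the class OutageAwareThresholdSketch, the DailyOutage type, and the methods shouldApplyThresholds and latencyAlert are all hypothetical names made up for this example, and the outage model is reduced to a daily time window that applies to every node.

```java
import java.time.LocalTime;
import java.util.List;

// Hypothetical, simplified model. None of these types are OpenNMS classes.
class OutageAwareThresholdSketch {

    /** A daily outage window [begins, ends) that applies to every node. */
    static final class DailyOutage {
        final String name;
        final LocalTime begins;
        final LocalTime ends;

        DailyOutage(String name, LocalTime begins, LocalTime ends) {
            this.name = name;
            this.begins = begins;
            this.ends = ends;
        }

        boolean isActiveAt(LocalTime now) {
            return !now.isBefore(begins) && now.isBefore(ends);
        }
    }

    /** Returns true only when no configured outage covers the given time. */
    static boolean shouldApplyThresholds(List<DailyOutage> outages, LocalTime now) {
        for (DailyOutage outage : outages) {
            if (outage.isActiveAt(now)) {
                return false; // node is on outage: skip threshold evaluation
            }
        }
        return true;
    }

    /** Returns true if a high-latency alert would be raised for this sample. */
    static boolean latencyAlert(double latencyMs, double thresholdMs,
                                List<DailyOutage> outages, LocalTime now) {
        // The guard that was missing: check outages before applying thresholds.
        if (!shouldApplyThresholds(outages, now)) {
            return false;
        }
        return latencyMs > thresholdMs;
    }
}
```

With a single outage from 01:00 to 03:00, a 500 ms sample against a 100 ms threshold raises no alert at 02:00 but does raise one at 04:00.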
Acceptance / Success Criteria
None
Activity
Alejandro Galue May 14, 2012 at 4:07 PM
Fixed on revision e460cc85c9824d64062216ca88a0b3cd616a12f0 for 1.10.
I've added a JUnit test to validate that when a node is on outage, the thresholds are not applied to that node.
Alejandro Galue May 14, 2012 at 4:04 PM
Some details about how to reproduce the problem, based on Tim's post to the mailing list:
threshd-configuration.xml:
thresholds.xml:
poller-configuration.xml:
poll-outages.xml:
This outage applies to all nodes and all interfaces and has the threshold group "icmp-latency-thresholds-slownetworklink" enabled in the "Threshold Checking" section.
Below is a debug log from pollerd. The testing was done during a time when the outage 'No ICMP threshold monitoring during the daily server backup widow due to high network load' was active: