Latency thresholding fails for StrafePing, and perhaps others, when nulls exist in PollStatus properties

Description

Steps to reproduce:

1. Configure the "strafer" package in poller-configuration.xml to include some part of your network
2. In that package's "StrafePing" service definition, add a "thresholding-enabled" parameter with value "true"
3. In threshd-configuration.xml, add a package like the following:

<package name="strafeping">
<filter>IPADDR != '0.0.0.0'</filter>
<include-range begin="1.1.1.1" end="254.254.254.254"/>
<include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" />

<service name="StrafePing" interval="300000" user-defined="false" status="on">
<parameter key="thresholding-group" value="strafeping"/>
</service>
</package>

4. In thresholds.xml, add a corresponding threshold group with two thresholds: one on "response-time" with its value set so low that it will surely trigger (note that response-time is in microseconds), and one on "loss" with a value of 1.0:

<group name="strafeping" rrdRepository="/opt/opennms/share/rrd/response/">
<threshold type="high" ds-type="if" value="10.0" rearm="5.0" trigger="1" filterOperator="or" ds-name="response-time"/>
<threshold type="high" ds-type="if" value="1.0" rearm="0.0" trigger="1" filterOperator="or" ds-name="loss"/>
</group>
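
For readers unfamiliar with how the value, rearm and trigger attributes interact, here is a simplified Java model of a "high" threshold. It is a sketch, not OpenNMS code: the class name HighThresholdSketch and the comparison operators (>= to exceed, <= to rearm) are assumptions made for illustration.

```java
/**
 * Simplified, hypothetical model of a "high" threshold with value, rearm and
 * trigger settings. Not taken from the OpenNMS source; the exact comparison
 * operators used here are assumptions.
 */
final class HighThresholdSketch {
    enum Result { NONE, TRIGGERED, REARMED }

    private final double value;   // level at or above which a sample counts as exceeded
    private final double rearm;   // level at or below which the threshold rearms
    private final int trigger;    // consecutive exceeded samples needed to fire
    private int exceededCount = 0;
    private boolean armed = true;

    HighThresholdSketch(double value, double rearm, int trigger) {
        this.value = value;
        this.rearm = rearm;
        this.trigger = trigger;
    }

    /** Feeds one sample (for response-time, a value in microseconds). */
    Result evaluate(double sample) {
        if (armed) {
            if (sample >= value) {
                exceededCount++;
                if (exceededCount >= trigger) {
                    armed = false;
                    exceededCount = 0;
                    return Result.TRIGGERED;
                }
            } else {
                // A sample below the threshold breaks the consecutive run.
                exceededCount = 0;
            }
        } else if (sample <= rearm) {
            armed = true;
            return Result.REARMED;
        }
        return Result.NONE;
    }

    public static void main(String[] args) {
        // Mirrors the response-time threshold above: value=10.0, rearm=5.0, trigger=1.
        HighThresholdSketch t = new HighThresholdSketch(10.0, 5.0, 1);
        System.out.println(t.evaluate(250.0)); // a 250 microsecond response exceeds 10.0
        System.out.println(t.evaluate(4.0));   // falls to or below the rearm level
    }
}
```

With trigger="1" and a value of 10.0 microseconds, practically any real ping response should exceed the threshold on the first poll, which is why step 7 expects an event after a single cycle.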

5. Reload both threshold configuration files via reloadDaemonConfig events, or just restart OpenNMS.

6. Provision a node with at least one interface that has the StrafePing service and matches the "strafer" package filter from poller-configuration.xml.

7. Wait for one poll cycle to pass (at which point a highThresholdExceeded event for DS 'response-time' should appear).

8. Configure some packet loss using your SmartBITS or iptables (INPUT ... -p ICMP -m statistic --mode nth --every 6 -j DROP works for me)

9. Wait for the next poll cycle to come around.

Expected result: highThresholdExceeded events for both "response-time" and "loss" DSes

Actual result: No event for the "loss" DS, and the following in poller.log:

2012-10-29 14:52:13,222 ERROR [PollerScheduler-30 Pool-fiber2] LatencyStoringServiceMonitorAdaptor: Failed to threshold on 1:192.168.34.251:StrafePing for response-time because of an exception
java.lang.NullPointerException
at org.opennms.netmgt.poller.pollables.LatencyStoringServiceMonitorAdaptor.applyThresholds(LatencyStoringServiceMonitorAdaptor.java:154)
at org.opennms.netmgt.poller.pollables.LatencyStoringServiceMonitorAdaptor.storeResponseTime(LatencyStoringServiceMonitorAdaptor.java:132)
at org.opennms.netmgt.poller.pollables.LatencyStoringServiceMonitorAdaptor.poll(LatencyStoringServiceMonitorAdaptor.java:107)
at org.opennms.netmgt.poller.pollables.PollableServiceConfig.poll(PollableServiceConfig.java:109)
at org.opennms.netmgt.poller.pollables.PollableService.poll(PollableService.java:178)
at org.opennms.netmgt.poller.pollables.PollableElement.poll(PollableElement.java:292)
at org.opennms.netmgt.poller.pollables.PollableContainer$5.run(PollableContainer.java:305)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:263)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:249)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:227)
at org.opennms.netmgt.poller.pollables.PollableContainer.poll(PollableContainer.java:312)
at org.opennms.netmgt.poller.pollables.PollableInterface.poll(PollableInterface.java:205)
at org.opennms.netmgt.poller.pollables.PollableContainer$5.run(PollableContainer.java:305)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:263)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:249)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:227)
at org.opennms.netmgt.poller.pollables.PollableContainer.poll(PollableContainer.java:312)
at org.opennms.netmgt.poller.pollables.PollableNode$3.run(PollableNode.java:303)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:263)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:249)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:227)
at org.opennms.netmgt.poller.pollables.PollableNode.doPoll(PollableNode.java:306)
at org.opennms.netmgt.poller.pollables.PollableElement.doPoll(PollableElement.java:183)
at org.opennms.netmgt.poller.pollables.PollableService.doPoll(PollableService.java:211)
at org.opennms.netmgt.poller.pollables.PollableService$PollRunner.run(PollableService.java:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:263)
at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:249)
at org.opennms.netmgt.poller.pollables.PollableService.doRun(PollableService.java:383)
at org.opennms.netmgt.poller.pollables.PollableService.run(PollableService.java:364)
at org.opennms.netmgt.scheduler.Schedule.run(Schedule.java:135)
at org.opennms.netmgt.scheduler.Schedule$ScheduleEntry.run(Schedule.java:80)
at org.opennms.netmgt.scheduler.LegacyScheduler$1.run(LegacyScheduler.java:287)
at org.opennms.core.concurrent.RunnableConsumerThreadPool$FiberThreadImpl.run(RunnableConsumerThreadPool.java:419)
at java.lang.Thread.run(Thread.java:662)

Environment

Any environment using StrafePingMonitor and seeing any loss of ping packets.

Acceptance / Success Criteria

None

Activity

Jeff Gehlbach October 29, 2012 at 5:04 PM

Fix pushed to 1.10

Jeff Gehlbach October 29, 2012 at 4:59 PM

The "perhaps others" referred to in the subject of this issue includes poller monitor classes that return a multi-valued result set in the form of a Properties object on the PollStatus returned from the "poll" method. In 1.10 this includes StrafePingMonitor, CiscoPingMibMonitor, PageSequenceMonitor, MemcachedMonitor, and TrivialTimeMonitor as well as BSFMonitor in certain configurations. Any of these monitors could potentially suffer from the same bug under certain conditions.
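
To illustrate the failure mode for anyone hitting this on another monitor, here is a minimal Java sketch of null-safe handling of a multi-valued result before thresholding. It is not the committed fix, and it does not use the real PollStatus API: the class NullSafeThresholdValues, the method usableValues, and the choice of which property is null are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: drop null-valued entries from a monitor's multi-valued
 * result before handing the numbers to threshold evaluation. This is an
 * illustration of the general technique, not the actual OpenNMS change.
 */
final class NullSafeThresholdValues {

    /** Returns only the entries whose value is non-null, converted to Double. */
    public static Map<String, Double> usableValues(Map<String, Number> properties) {
        Map<String, Double> result = new LinkedHashMap<>();
        if (properties == null) {
            return result;
        }
        for (Map.Entry<String, Number> entry : properties.entrySet()) {
            Number v = entry.getValue();
            if (v == null) {
                // Calling v.doubleValue() here would throw a NullPointerException
                // and, if uncaught, abort thresholding for every other data source.
                continue;
            }
            result.put(entry.getKey(), v.doubleValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Number> props = new LinkedHashMap<>();
        props.put("response-time", null); // assumed for illustration: one property is null
        props.put("loss", 3);
        System.out.println(usableValues(props)); // prints {loss=3.0}
    }
}
```

Skipping only the null entries lets the remaining data sources (here "loss") still be evaluated, which matches the expected result in the description.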

Jeff Gehlbach October 29, 2012 at 3:23 PM

I have a fix for this queued up to commit after servicing another interrupt. If you don't see it by the end of today, please remind me :)

Fixed

Details

Created October 29, 2012 at 3:21 PM
Updated January 27, 2017 at 4:20 PM
Resolved October 29, 2012 at 5:04 PM
