GpMonitor in Poller does not change timeout to seconds from ms

Description

In capsd-configuration.xml, I have the following:

<protocol-plugin protocol="NFS" class-name="org.opennms.netmgt.capsd.plugins.GpPlugin" scan="on" user-defined="true">
<property key="script" value="/opt/opennms/bin/gp-chknfs.pl"/>
<property key="banner" value="SUCCESS" />
<property key="timeout" value="10000" />
<property key="retry" value="1" />
<protocol-configuration scan="on" user-defined="true"/>
</protocol-plugin>

In poller-configuration.xml, I have the following:

<service name="NFS" interval="270000" user-defined="false" status="on">
<parameter key="script" value="/opt/opennms/bin/gp-chknfs.pl"/>
<parameter key="banner" value="SUCCESS" />
<parameter key="timeout" value="3000" />
<parameter key="retry" value="1" />
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="rrd-base-name" value="nfs"/>
<parameter key="ds-name" value="nfs"/>
</service>

<monitor service="NFS" class-name="org.opennms.netmgt.poller.monitors.GpMonitor"/>

As described in http://www.opennms.org/wiki/GeneralPurposePoller, the timeout should be converted from milliseconds to seconds before it is passed to the script.

When run by capsd, the timeout (as logged to syslog) is correctly converted from 10000ms to 10s:

Oct 21 21:47:22 brie gp-chknfs[25022]: gp-chknfs timeout set to '10'.

When run by poller, the timeout is not converted from 3000ms to 3s:

Oct 21 22:02:40 brie gp-chknfs[27337]: gp-chknfs timeout set to '3000'.

This is leading to lock contention in the poller:

[PollerScheduler-30 Pool-fiber23] PollableService: Postponing poll for 539:10.142.2.90:NFS because org.opennms.netmgt.poller.pollables.LockUnavailable: Unable to obtain lock for 539 before timeout

Environment

Operating System: Linux
Platform: PC

Acceptance / Success Criteria

None

Activity

Benjamin Reed March 8, 2010 at 4:18 PM

merged to 1.6:
[1.6 f921afa] Fixed by fixing time unit conversion. http://bugzilla.opennms.org/show_bug.cgi?id=3401
...and 1.6.10:
[rc/stable/1.6.10 de4fa44] Fixed by fixing time unit conversion. http://bugzilla.opennms.org/show_bug.cgi?id=3401

Seth Leger (community account) March 1, 2010 at 6:01 PM

Har, found the offending line that is causing this:

From org.opennms.core.utils.TimeoutTracker():
m_timeoutInSeconds = Math.max(1L, TimeUnit.SECONDS.convert(m_timeoutInMillis, TimeUnit.SECONDS));

So it converts a value in seconds to a value in seconds, which doesn't do anything! The original value doesn't change, and since it's actually in milliseconds, you get the problem in this bug. Great catch!
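
To make the difference concrete, here is a minimal standalone sketch. The class and method names are hypothetical and this is not the OpenNMS source; it only assumes that the corrected conversion names TimeUnit.MILLISECONDS as the source unit, which is what the commit message ("fixing time unit conversion") suggests but does not spell out.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of OpenNMS.
final class TimeoutConversionDemo {

    // Mirrors the line quoted above: the source unit is SECONDS,
    // so the millisecond value passes through unchanged.
    static long buggySeconds(long timeoutInMillis) {
        return Math.max(1L, TimeUnit.SECONDS.convert(timeoutInMillis, TimeUnit.SECONDS));
    }

    // Assumed correction: declare the source unit as MILLISECONDS.
    static long fixedSeconds(long timeoutInMillis) {
        return Math.max(1L, TimeUnit.SECONDS.convert(timeoutInMillis, TimeUnit.MILLISECONDS));
    }

    public static void main(String[] args) {
        System.out.println(buggySeconds(3000));  // 3000, what the poller passed to the script
        System.out.println(fixedSeconds(3000));  // 3
        System.out.println(fixedSeconds(10000)); // 10, matches the capsd log line
        System.out.println(fixedSeconds(500));   // 1, Math.max keeps sub-second timeouts at one second
    }
}
```

The Math.max(1L, ...) guard matters because TimeUnit.convert truncates toward zero, so any timeout under 1000 ms would otherwise become 0 seconds.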

The fix for this should make it into the next 1.7 release.

Fixed

Created October 22, 2009 at 1:18 AM
Updated January 27, 2017 at 4:25 PM
Resolved March 8, 2010 at 4:18 PM