When run by capsd, the timeout (as logged to syslog) is correctly converted from 10000ms to 10s:
Oct 21 21:47:22 brie gp-chknfs[25022]: gp-chknfs timeout set to '10'.
When run by poller, the timeout is not converted from 3000ms to 3s:
Oct 21 22:02:40 brie gp-chknfs[27337]: gp-chknfs timeout set to '3000'.
This is leading to lock contention in the poller:
[PollerScheduler-30 Pool-fiber23] PollableService: Postponing poll for 539:10.142.2.90:NFS because org.opennms.netmgt.poller.pollables.LockUnavailable: Unable to obtain lock for 539 before timeout
Seth Leger (community account) March 1, 2010 at 6:01 PM
Har, found the offending line that is causing this:
From org.opennms.core.utils.TimeoutTracker(): m_timeoutInSeconds = Math.max(1L, TimeUnit.SECONDS.convert(m_timeoutInMillis, TimeUnit.SECONDS));
So it converts a value in seconds to a value in seconds, which does nothing. The original value is left unchanged, and since it is actually in milliseconds, you get the problem in this bug. Great catch!
The fix for this should make it into the next 1.7 release.
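To illustrate the difference, here is a minimal sketch that contrasts the quoted conversion with what the intended one presumably looks like. The class and method names (TimeoutConversionSketch, buggySeconds, fixedSeconds) are hypothetical and exist only for this example; the actual change committed to TimeoutTracker may differ.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone sketch; not the actual OpenNMS TimeoutTracker code.
class TimeoutConversionSketch {

    // Mirrors the quoted line: the source unit is SECONDS, so the
    // millisecond value passes through unchanged.
    public static long buggySeconds(long timeoutInMillis) {
        return Math.max(1L, TimeUnit.SECONDS.convert(timeoutInMillis, TimeUnit.SECONDS));
    }

    // Assumed fix: declare the source unit as MILLISECONDS so the value
    // is really converted to seconds (with a floor of 1 second).
    public static long fixedSeconds(long timeoutInMillis) {
        return Math.max(1L, TimeUnit.SECONDS.convert(timeoutInMillis, TimeUnit.MILLISECONDS));
    }

    public static void main(String[] args) {
        System.out.println(buggySeconds(3000L)); // prints 3000
        System.out.println(fixedSeconds(3000L)); // prints 3
        System.out.println(fixedSeconds(500L));  // prints 1 (truncated to 0, then clamped by Math.max)
    }
}
```

With the poller's configured timeout of 3000, the first method yields the '3000' seen in syslog, while the second yields the expected 3.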
In capsd-configuration.xml, I have the following:
<protocol-plugin protocol="NFS" class-name="org.opennms.netmgt.capsd.plugins.GpPlugin" scan="on" user-defined="true">
  <property key="script" value="/opt/opennms/bin/gp-chknfs.pl"/>
  <property key="banner" value="SUCCESS" />
  <property key="timeout" value="10000" />
  <property key="retry" value="1" />
  <protocol-configuration scan="on" user-defined="true"/>
</protocol-plugin>
In poller-configuration.xml, I have the following:
<service name="NFS" interval="270000" user-defined="false" status="on">
  <parameter key="script" value="/opt/opennms/bin/gp-chknfs.pl"/>
  <parameter key="banner" value="SUCCESS" />
  <parameter key="timeout" value="3000" />
  <parameter key="retry" value="1" />
  <parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
  <parameter key="rrd-base-name" value="nfs"/>
  <parameter key="ds-name" value="nfs"/>
</service>
<monitor service="NFS" class-name="org.opennms.netmgt.poller.monitors.GpMonitor"/>
As described in http://www.opennms.org/wiki/GeneralPurposePoller, the timeout should be converted from milliseconds to seconds when passed to the script.