Collectd does not Unschedule Deleted Nodes

Description

After a node is deleted from OpenNMS through its provisioning requisition, the collectd service keeps collecting data from the deleted node. Data collection only stops once we stop the services, for example SNMP, and even then OpenNMS still tries to run collections against the node.

Below is sample output from our collectd.log, with the IP addresses masked:

2014-06-30 13:30:05,300 WARN [Collectd-Thread-30-of-50] CollectableService: run: failed collection for xx.xx.xx.xx/SNMP/example1
2014-06-30 13:30:05,300 ERROR [Collectd-Thread-30-of-50] CollectableService: Timeout retrieving SnmpCollectors for xx.xx.xx.xx for /xx.xx.xx.xx: SnmpCollectors for xx.xx.xx.xx: snmpTimeoutError for: /xx.xx.xx.xx
org.opennms.netmgt.collectd.CollectionTimedOut: Timeout retrieving SnmpCollectors for xx.xx.xx.xx for /xx.xx.xx.xx: SnmpCollectors for xx.xx.xx.xx: snmpTimeoutError for: /xx.xx.xx.xx
at org.opennms.netmgt.collectd.SnmpCollectionSet.verifySuccessfulWalk(SnmpCollectionSet.java:369)
at org.opennms.netmgt.collectd.SnmpCollectionSet.collect(SnmpCollectionSet.java:394)
at org.opennms.netmgt.collectd.SnmpCollector.collect(SnmpCollector.java:349)
at org.opennms.netmgt.collectd.CollectionSpecification.collect(CollectionSpecification.java:264)
at org.opennms.netmgt.collectd.CollectableService.doCollection(CollectableService.java:364)
at org.opennms.netmgt.collectd.CollectableService.run(CollectableService.java:298)
at org.opennms.netmgt.scheduler.LegacyScheduler$1.run(LegacyScheduler.java:201)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.opennms.core.concurrent.LogPreservingThreadFactory$3.run(LogPreservingThreadFactory.java:107)
at java.lang.Thread.run(Thread.java:724)
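
To check whether collectd is still scheduling a node after it was deleted, the WARN lines in the format above can be tallied per address and service. The following is a minimal Python sketch, not part of OpenNMS; the function name `failed_collections` is hypothetical, and the pattern assumes the log line layout shown in this ticket.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... WARN [Collectd-Thread-30-of-50] CollectableService: run: failed collection for <ip>/<service>/<package>
# The layout is assumed from the sample in this ticket.
FAILED = re.compile(
    r"WARN \[[^\]]+\] CollectableService: run: failed collection for (\S+?)/(\w+)/"
)


def failed_collections(lines):
    """Count failed collection attempts per (address, service) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Path is an assumption; adjust it to wherever collectd.log lives.
    with open("collectd.log", encoding="utf-8", errors="replace") as handle:
        for (address, service), total in failed_collections(handle).most_common():
            print(address, service, total)
```

If the address of a deleted node keeps appearing in this tally after the deletion time, collectd has not unscheduled it.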

Acceptance / Success Criteria

None

Activity

Seth Leger April 8, 2015 at 4:12 PM

I suspect that this is a duplicate of bug NMS-6226 (https://opennms.atlassian.net/browse/NMS-6226), which was resolved in version 14.0.0. Corey, can you retest this problem in version 14.0.0 or higher? If it still occurs, please reopen this bug. Thanks!

Corey Hammerton July 2, 2014 at 2:18 PM

Would it be possible to handle deletions with a flow like the following?

  • When a nodeDeleted event is received, wait for all of the node's interfaces (IP and/or SNMP) to be deleted before deleting the node.

  • When an interfaceDeleted event is received, wait for all of the interface's monitored services to be deleted before deleting the interface.

  • When a serviceDeleted event is received, process it as normal and also remove the service from the pollerd and collectd schedulers.
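
The flow proposed above can be sketched as a small model. This is a simplified Python illustration, not OpenNMS code: the `Scheduler` and `Inventory` classes and their method names are hypothetical, and only the three event names come from the comment.

```python
class Scheduler:
    """Hypothetical stand-in for the pollerd or collectd schedule."""

    def __init__(self, name):
        self.name = name
        self.scheduled = set()  # (node_id, ip, service) tuples

    def schedule(self, key):
        self.scheduled.add(key)

    def unschedule(self, key):
        self.scheduled.discard(key)


class Inventory:
    """Hypothetical model: nodes own interfaces, interfaces own services."""

    def __init__(self, schedulers):
        self.schedulers = schedulers
        self.nodes = {}                  # node_id -> {ip -> set of service names}
        self.pending_nodes = set()       # nodeDeleted seen, interfaces remain
        self.pending_interfaces = set()  # interfaceDeleted seen, services remain

    def add_service(self, node_id, ip, service):
        self.nodes.setdefault(node_id, {}).setdefault(ip, set()).add(service)
        for scheduler in self.schedulers:
            scheduler.schedule((node_id, ip, service))

    def on_node_deleted(self, node_id):
        # Defer the actual removal until every interface is gone.
        self.pending_nodes.add(node_id)
        self._finish_node(node_id)

    def on_interface_deleted(self, node_id, ip):
        # Defer the actual removal until every monitored service is gone.
        self.pending_interfaces.add((node_id, ip))
        self._finish_interface(node_id, ip)

    def on_service_deleted(self, node_id, ip, service):
        # Normal handling, plus removal from both schedulers.
        self.nodes.get(node_id, {}).get(ip, set()).discard(service)
        for scheduler in self.schedulers:
            scheduler.unschedule((node_id, ip, service))
        self._finish_interface(node_id, ip)

    def _finish_interface(self, node_id, ip):
        interfaces = self.nodes.get(node_id, {})
        if (node_id, ip) in self.pending_interfaces and not interfaces.get(ip):
            interfaces.pop(ip, None)
            self.pending_interfaces.discard((node_id, ip))
            self._finish_node(node_id)

    def _finish_node(self, node_id):
        if node_id in self.pending_nodes and not self.nodes.get(node_id):
            self.nodes.pop(node_id, None)
            self.pending_nodes.discard(node_id)
```

In this model a node disappears only after its last service has been unscheduled, so no scheduler can be left holding an entry for a node that no longer exists.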

Corey Hammerton June 30, 2014 at 2:00 PM

The same would apply to removing services configured for collectd using provisioning requisitions.

Corey Hammerton June 30, 2014 at 1:33 PM

A ticket was opened previously for the same issue, but it has been marked as resolved.

Fixed

Created June 30, 2014 at 1:31 PM
Updated April 8, 2015 at 4:12 PM
Resolved April 8, 2015 at 4:12 PM