Collectd Not Scheduling SNMP Data-collection for some devices

Description

Issue: Collectd is not scheduling SNMP data collection for some devices.

Context:
We have a client who is facing an issue where some devices in their environment have suddenly stopped collecting SNMP data.

We've verified the following:
1. Validated the default data collection configs.
2. The checks below all indicate that OpenNMS is not initiating SNMP collection for certain "problem devices":
a. Ran a packet capture analysis of the outgoing traffic that OpenNMS would initiate from the Minions, and saw no "get-bulks" or similar requests going outbound from the Minions to the devices. [validated the location etc.]
b. Repeated the capture while running "opennms:collect" from Karaf. This time the capture showed outbound traffic from the Minions to the devices, and data collection happened as expected during the manual collect.
c. Also ran "opennms:collect" with the "-p" flag, which added a few data points to the OpenNMS graph.
d. Modified log4j2.xml to send Collectd logs to a dedicated file, filtered by node label.
    1. With the nodelabel filter set to the "problem devices", the dedicated Collectd log was empty.
    2. With the nodelabel filter set to the "working devices", the dedicated Collectd log was populated.
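
For anyone repeating check 2a across many nodes, a rough sketch of the comparison is below. It assumes the capture has already been exported to (destination IP, destination port) pairs by whatever tool was used; the function name, the IP addresses, and the sample rows are hypothetical and only illustrate the idea, they are not taken from the customer environment.

```python
from collections import Counter

SNMP_PORT = 161  # standard SNMP agent port


def devices_without_snmp_traffic(expected_ips, capture_rows):
    """Return the expected device IPs that received no outbound SNMP packets.

    capture_rows is an iterable of (dst_ip, dst_port) tuples exported from a
    packet capture taken on the Minion (export format is an assumption).
    """
    seen = Counter(ip for ip, port in capture_rows if port == SNMP_PORT)
    return sorted(ip for ip in expected_ips if seen[ip] == 0)


# Hypothetical sample data (documentation-range addresses)
expected = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
rows = [
    ("192.0.2.10", 161),
    ("192.0.2.10", 161),
    ("192.0.2.12", 161),
    ("192.0.2.11", 443),  # non-SNMP traffic does not count
]

silent = devices_without_snmp_traffic(expected, rows)
print(silent)
```

Devices listed in the result are candidates for "not being scheduled", provided the capture window was at least one full collection interval long.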

Other notes:
Collectd threads are set to 1500. The customer has global TTLs set to 10 minutes in snmp-config.xml.
The problem device is in the default_collect package (10 minutes), and a manual collect from Karaf takes about 1 min 40 s to finish for it.

Their collection packages

Acceptance / Success Criteria

None

Activity

Jeff Gehlbach August 23, 2022 at 9:09 PM

Still troubleshooting. If we have more ideas or hotfix JARs to try, please pass them along to reporter. Reporter is writing a tool to help track down the affected node in this particular haystack: https://github.com/Naicisum/opennms-data-scanner

Barring more ideas or hotfix JARs to pass along, hold open with low priority until reporter has more time to invest in troubleshooting.

Created May 12, 2022 at 1:42 PM
Updated July 26, 2023 at 2:12 PM