NullPointerException during nodeScan on devices with broken IP-MIB::ipAddressIfIndex

Description

OpenNMS is unable to finish scanning nodes with broken / missing IP-MIB::ipAddressIfIndex entries, and will retry forever without ever marking the node as scanned. This has been confirmed on OpenNMS 19.x and 20.0, but it's probably been the case in older versions as well.

The underlying cause - in my situation, at least - is a presumably broken SNMP implementation on specific equipment / versions. I'm only interested in the primary SNMP interface's IP address, which I provision through a foreign requisition, so the problem with the missing ipAddressIfIndex data is entirely harmless in my environment.

Unfortunately, IPAddressTableTracker fails to check whether it received any data from ipAddressIfIndex before attempting to parse the result, and because of this, the node scan is aborted due to a NullPointerException:
"The Node with Id: 1; ForeignSource: Dev; ForeignId:testdevice has aborted for the following reason: Aborting node scan : Agent failed while scanning the IP address tables : java.lang.NullPointerException"

I've attached a sample of the relevant SNMP data... or rather, the data that is missing, along with the relevant debug logs from provisiond.log.

I'm also attaching a tiny patch with a quick and easy fix, primarily to pinpoint the problem and offer a solution along with my bug report. If the patch is used as-is, that's perfectly fine by me, since it solves the problem in my dev environment, by adding an extra warning before returning null:

2017-06-09 20:24:12,919 WARN [DefaultUDPTransportMapping_0.0.0.0/0] o.o.n.p.s.IPAddressTableTracker: BAD AGENT: Device is missing IP-MIB::ipAddressIfIndex. Skipping.
2017-06-09 20:24:12,919 INFO [DefaultUDPTransportMapping_0.0.0.0/0] o.o.n.p.s.NodeScan: Processing IPAddress table row with ipAddr null

This way, OpenNMS is able to successfully detect all services and SNMP interfaces, and finally update the node's lastcapsdpoll timestamp.
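
To illustrate the kind of guard described above, here is a minimal, self-contained sketch. It is not the attached patch and not taken from the OpenNMS source: the class name, the map-based row and the column key are hypothetical stand-ins, and only the warning text is borrowed from the log excerpt above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for the row handling in IPAddressTableTracker.
 * None of these names are taken from the OpenNMS source.
 */
public class IfIndexGuardSketch {

    // Hypothetical column key; the real tracker presumably addresses columns by OID.
    public static final String IF_INDEX_COLUMN = "ipAddressIfIndex";

    /**
     * Returns the interface index for a row, or null when the agent
     * did not return an ipAddressIfIndex value for it.
     */
    public static Integer getIfIndex(Map<String, String> row) {
        String raw = row.get(IF_INDEX_COLUMN);
        if (raw == null) {
            // Guard: warn and skip instead of dereferencing a missing value.
            System.err.println("BAD AGENT: Device is missing IP-MIB::ipAddressIfIndex. Skipping.");
            return null;
        }
        return Integer.valueOf(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> healthy = new HashMap<>();
        healthy.put(IF_INDEX_COLUMN, "2");

        // Simulates the broken agent: the column was never returned.
        Map<String, String> broken = new HashMap<>();

        System.out.println(getIfIndex(healthy));
        System.out.println(getIfIndex(broken));
    }
}
```

The point of the sketch is only that a missing value is checked before parsing, so one broken row yields a warning and a null result rather than an exception that aborts the whole scan.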

Acceptance / Success Criteria

None

Attachments

3

Activity

Jeff Gehlbach June 19, 2017 at 10:51 AM

I would like to see this fix go as far back as possible. If it can be cleanly applied to the foundation branch, let's do that. Otherwise foundation-2016 or foundation-2017.

Brynjar Eide June 19, 2017 at 6:54 AM

I did a recommit, based on the latest commit in release-20.0.1 before changing the base branch from develop to release-20.0.1. It looks alright to me, but I'm happy to change anything if my solution turns out to have any drawbacks.
Thanks again for your quick responses and helpful feedback!

Brynjar Eide June 19, 2017 at 6:20 AM

I see. I would definitely like to see this in release-20.0.1, so thank you for your explanation. Unfortunately, I couldn't just change the merge base to release-20.0.1, as that would include the bump from v20 to v21, but I'll see if I can figure out a way to solve this.

(In the meantime, I'd appreciate any suggestions anyone may have for how to best solve this, as there seem to be hundreds of ways to do anything with Git... and only a few of them would be appropriate in most circumstances.)

Ronny Trommer June 19, 2017 at 2:39 AM

The target version is not correct. The problem is that the PR is made against develop, which means it will end up in Horizon 21.0.0. If you want this change in 20.0.1, as set in "Fix Versions", the PR needs to be retargeted to the "release-20.0.1" branch, or somebody has to cherry-pick it to release-20.0.1 once it is merged to develop.

Brynjar Eide June 16, 2017 at 5:19 PM

I just realised that I only linked to JIRA from GitHub, and never added a link to the original pull request here in JIRA: https://github.com/OpenNMS/opennms/pull/1533
My apologies! I believe we may have duplicate pull requests because of this, and that's perhaps the reason why I need to do an extra sign-off?

In either case, please let me know how to proceed, and I'll try to avoid making even more of a mess with the pull requests.

Fixed

Created June 9, 2017 at 3:29 PM
Updated September 20, 2017 at 3:32 PM
Resolved June 20, 2017 at 7:24 AM