Unable to update provisioned node when SNMP Agent is down

Description

Here is the scenario. I have a DSL line at my house using DHCP. About three or four times a year the IP address changes. In order to maintain the node in OpenNMS I use a provisioning group.

When the IP address changes, I go into the provisioning group and modify the existing interface IP. I save my changes and re-import the node. However, the old interface is not removed.

The problem is that I always forget to add the proper SNMP community string for the new address, so SNMP fails. Once the SNMP community string is in place, provisiond works properly.

The error message:

2009-11-07 09:35:35,730 INFO [pool-2-thread-4] NodeScan: Aborting Scan of node 85 for the following reason: Aborting node scan : Agent timedout while scanning the system table
2009-11-07 09:35:35,758 DEBUG [pool-4-thread-2] NodeScan: Finished scanning node (SortovaFarm/1238625352367)
2009-11-07 09:35:35,854 INFO [pool-2-thread-5] NodeScan: Aborting Scan of node 80 for the following reason: Aborting node scan : Agent timedout while scanning the system table
2009-11-07 09:35:35,863 DEBUG [pool-4-thread-8] NodeScan: Finished scanning node (SortovaFarm/1196974970537)

Is this something that we should fix? If there is a way to know that an IP address was added manually via provisiond, I think we should remove it regardless of SNMP. However, if there is no way to know that an address was added manually, then the lack of SNMP support screws us, since we can't know that the address is no longer there. In that case there isn't really anything we can do except maybe throw a warning event saying that SNMP failed, so the IP address table may be incorrect.
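
The decision rule proposed here can be sketched as follows. This is a minimal illustration in Python, not OpenNMS code: `Origin`, `Interface`, and `reconcile` are hypothetical names, and the sketch assumes each stored interface remembers whether it came from a requisition or from an SNMP scan, which is exactly the open question above.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    REQUISITION = "requisition"  # hypothetical tag: entered by hand in a provisioning group
    SCAN = "scan"                # hypothetical tag: found by walking the agent's IP table


@dataclass(frozen=True)
class Interface:
    ip: str
    origin: Origin


def reconcile(existing, requisitioned_ips, agent_ips=None):
    """Split existing interfaces into (kept, removed) and collect warnings.

    agent_ips is the set of addresses the SNMP agent reported, or None
    when the agent could not be reached.
    """
    kept, removed, warnings = [], [], []
    for iface in existing:
        if iface.ip in requisitioned_ips:
            # Still listed in the requisition: always keep.
            kept.append(iface)
        elif iface.origin is Origin.REQUISITION:
            # Added by hand and no longer listed: remove regardless of SNMP.
            removed.append(iface)
        elif agent_ips is None:
            # Scan-discovered and the agent is silent: keep it, but warn.
            kept.append(iface)
            warnings.append(f"SNMP failed; {iface.ip} may be stale")
        elif iface.ip in agent_ips:
            kept.append(iface)
        else:
            removed.append(iface)
    return kept, removed, warnings
```

With the agent unreachable (`agent_ips=None`), a hand-entered address that has left the requisition is removed, while a scan-discovered address is kept and produces a warning instead of aborting the whole operation.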

Environment

Operating System: All
Platform: PC

Acceptance / Success Criteria

None

Activity

Matt Brozowski September 19, 2011 at 6:03 PM

I don't really understand Test Case #2 well enough to determine the expected behavior.

I am going to resolve this issue, but please feel free to reopen it or open another issue with an improved description for Test Case #2 if you'd like to.

Matt Brozowski September 19, 2011 at 6:01 PM

I think maybe more than one issue has been reported against the same bug here, but I will try to address each of these.

The last one, being unable to delete a node by removing it from the provisioning group, appears to be working for me. I think there may have been a problem with this for a short time over the winter, but it is certainly working now (in fact it is now covered in the smoke tests).

In Test Case #1 above, I think the correct answer is to use a different 'best practice': rather than requisitioning the SNMP service, you should let provisiond detect it. That way, if SNMP never becomes reachable on that interface, then it is never added. Switching to a different address will then result in the original address being removed. By provisioning SNMP you are in effect saying, "I want SNMP to be the definitive list of addresses on this device." And so it will struggle until it can actually get that list.

In the DHCP scenario mentioned above, if you don't provision the SNMP service, then switching the IP address will initially add the new address during the import phase and then delete the other addresses during the scan phase.
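
The difference between the two setups can be illustrated with a toy model. This is only a sketch of the behavior described in this comment, assuming the agent is unreachable at the new address; the function name and its return shape are hypothetical and do not correspond to any provisiond API.

```python
def sync_with_unreachable_agent(node_ips, requisition_ips, snmp_requisitioned):
    """Model one synchronize against an agent that does not answer.

    Returns (resulting_ips, scan_completed).
    """
    # Import phase: addresses from the requisition are added to the node.
    after_import = set(node_ips) | set(requisition_ips)

    if snmp_requisitioned:
        # SNMP was declared the definitive source of addresses, so the scan
        # aborts on the timeout and nothing is removed.
        return after_import, False

    # SNMP was left to detection and never detected: the scan phase finishes
    # and drops every address that is no longer in the requisition.
    return set(requisition_ips), True
```

Changing a node from 192.0.2.10 to 192.0.2.20 leaves both addresses on the node when SNMP is requisitioned, and only 192.0.2.20 when it is not.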

Tarus Balog December 1, 2010 at 3:56 PM

Hit this bug at a client site. A device was taken out of service, which marked it as down. It had the ICMP and SNMP services. The device was removed from the requisition. When the group was synchronized, the node was not deleted, and the following error event appeared:

The Node with Id: 5778; ForeignSource: _Tech 2; ForeignId:13962 has aborted for the following reason: Aborting node scan : Agent timedout while scanning the system table

My belief is that the node should be deletable even if it is down, since that will be a common occurrence.

Tarus Balog August 11, 2010 at 12:09 PM

Okay, this is still an issue, this time at a client site.

Here is the scenario:

The client often moves IP addresses onto new gear, but they want to keep the old nodes around for historical purposes. We discovered a couple of issues with this.

Test Case #1:

Provision a node with the IP address 2.2.2.2. Add ICMP and SNMP services.
Once the node is created, change the IP address to 3.3.3.3. Synchronize.
The provisioning will fail with the error:

Aborting node scan : Agent timedout while scanning the system table

since this whole node is made up and not reachable. I would like to catch this exception and, if the scan fails, just use the interfaces in the requisition.

This could cause problems if a real SNMP agent is down when this is attempted, but since I don't believe we remove collected data when deleting an interface, once the agent is back up a resync should fix it.
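
A minimal sketch of that fallback, in Python rather than the actual provisiond code: `AgentTimeout`, `resolve_interfaces`, and the `walk_agent` callable are hypothetical names standing in for whatever the node scan really uses.

```python
class AgentTimeout(Exception):
    """Hypothetical stand-in for the 'Agent timedout' scan failure."""


def resolve_interfaces(walk_agent, requisition_ips):
    """Return (ips, authoritative).

    walk_agent is a callable that returns the agent's IP addresses or
    raises AgentTimeout. On a timeout the requisition is used as-is and
    the result is flagged as not authoritative, so that a later resync
    can correct it once the agent is back up.
    """
    try:
        return set(walk_agent()), True
    except AgentTimeout:
        # Fall back to the requisition instead of aborting the node scan.
        return set(requisition_ips), False
```

The `authoritative` flag is one way to address the concern about a real agent being temporarily down: the caller can tell a fallback result apart from a completed scan.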

Test Case #2:

In the provisioned node above, remove the interface information. This should still keep the node around, just without services. Since the client is mainly interested in node-level data, this will allow them to keep the node around.

When deleting a primary interface from a node, the same exception as above will occur. While it is arguable that in the first case we shouldn't delete interfaces, in the second case, when we are explicitly trying to delete the interface, a scan failure shouldn't matter.

The second scenario is more important to the client than the first at the moment, but this whole bug is something we should address.

Fixed

Created November 7, 2009 at 10:11 AM
Updated January 27, 2017 at 4:26 PM
Resolved September 19, 2011 at 6:03 PM