Snmp Monitor reports Down when the Agent is not responding

Description

This issue comes from an interesting discussion we have had privately and on the discussion list.

We do not have a service hierarchy in the Poller.

The only one is the one established by what we call the critical service.

We should address some use cases that arise when we try to use the really awesome SnmpMonitor to, for example, verify that a process is running, or similar checks:
The question is: what should we do when the agent is down?

Clearly, there is a dependency on the availability of the SNMP agent on the remote host. If, for example, I'm monitoring a process using the process table or a Net-SNMP MIB extension, I can report that the service is down while it is actually running! It is only the agent that is not responding.

When you use the SnmpMonitor to get a specific OID and check it against a given value, you are implicitly asserting that the monitored service depends on the availability of the SNMP agent.
If the agent is down, you do not know the status of the service.
Currently, when the SnmpMonitor gets a null SnmpValue, which means there was no response from the agent, it creates an Unavailable status. That is correct if you are monitoring
the SNMP protocol itself, but it is not correct when you are monitoring something else using the SNMP protocol.

It seems to me that the status should be "unresponsive" in this case.
Of course, that won't have any effect unless the poller config has
serviceUnresponsiveEnabled="true" set.

We suggest adding a new parameter, "ignore-unresponsive-agent", with a default value of
"false".
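To make the proposal concrete, here is a minimal Java sketch of how a monitor could map a missing agent response to a status. It assumes a hypothetical `decide` helper and a simplified `Status` enum; neither is part of the OpenNMS API, and the real SnmpMonitor code is structured differently. Only the parameter name `ignore-unresponsive-agent` and its default of "false" come from the proposal above.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the proposed "ignore-unresponsive-agent" behavior.
 * This is NOT the OpenNMS SnmpMonitor API; the class, enum and method names
 * are illustrative assumptions only.
 */
public class SnmpStatusSketch {

    /** Simplified stand-in for the poller's status values (assumption). */
    enum Status { AVAILABLE, UNAVAILABLE, UNRESPONSIVE }

    /**
     * Maps the value returned by the agent to a poll status.
     *
     * @param value    the retrieved OID value, or null if the agent did not respond
     * @param expected the value the monitored OID should have
     * @param params   monitor parameters; only "ignore-unresponsive-agent" is read here
     */
    static Status decide(String value, String expected, Map<String, String> params) {
        boolean ignoreUnresponsiveAgent =
                Boolean.parseBoolean(params.getOrDefault("ignore-unresponsive-agent", "false"));

        if (value == null) {
            // No answer from the agent: the real state of the service is unknown.
            // With the default ("false") the current behavior is kept: UNAVAILABLE.
            return ignoreUnresponsiveAgent ? Status.UNRESPONSIVE : Status.UNAVAILABLE;
        }
        // The agent answered, so the comparison really describes the monitored service.
        return value.equals(expected) ? Status.AVAILABLE : Status.UNAVAILABLE;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of();
        Map<String, String> ignoring = Map.of("ignore-unresponsive-agent", "true");

        System.out.println(decide(null, "1", defaults)); // UNAVAILABLE (current behavior)
        System.out.println(decide(null, "1", ignoring)); // UNRESPONSIVE (proposed)
        System.out.println(decide("1", "1", ignoring));  // AVAILABLE
        System.out.println(decide("0", "1", ignoring));  // UNAVAILABLE
    }
}
```

Keeping "false" as the default means existing configurations see no change. If an "unknown" status turns out to be preferable to "unresponsive", only the null branch would need to return a different value.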

Maybe "unknown" is better than "unresponsive"!

This is a dependency problem.

Regards

Antonio

Environment

all

Acceptance / Success Criteria

None


Activity


Antonio Russo July 9, 2012 at 9:22 AM

Martin Lärcher June 27, 2012 at 9:05 AM

This also happens in OpenNMS version 1.10.3.
We get a lot of service outages for services monitored with the SnmpMonitor when nodes are reachable but SNMP is down (firewall, congestion, etc.).
This event logmsg appears:
<Service> outage identified on interface x.x.x.x with reason code: Unknown.

So "reason code: Unknown" should not be detected as an outage.

Is this a duplicate of NMS-3508?

Duplicate

Details

Assignee

Reporter

Fix versions

Affects versions

Priority


Created April 26, 2011 at 3:02 PM
Updated January 27, 2017 at 4:20 PM
Resolved July 9, 2012 at 9:22 AM
