Node Outage Model Broken in 1.6.5

Description

OpenNMS behaves incorrectly when the node outage model is set to "off".

With this set to off, no interfaceDown or nodeDown events should be sent. Instead, individual nodeLostService events should be sent.
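
The expected behaviour can be written as a small rule. This is a minimal Python sketch, not OpenNMS code: the function name `is_expected_event` and the boolean flag are assumptions made for illustration, and only the UEI strings come from OpenNMS.

```python
# Illustrative only: models the rule described above, not the OpenNMS poller.
NODE_DOWN = "uei.opennms.org/nodes/nodeDown"
INTERFACE_DOWN = "uei.opennms.org/nodes/interfaceDown"
NODE_LOST_SERVICE = "uei.opennms.org/nodes/nodeLostService"

# Events that roll several service outages up into a single event.
ROLLUP_UEIS = {NODE_DOWN, INTERFACE_DOWN}


def is_expected_event(uei: str, node_outage_enabled: bool) -> bool:
    """Return False for events that should not appear given the setting."""
    if node_outage_enabled:
        # With the model on, roll-up events are legitimate.
        return True
    # With the model off, only per-service events are expected.
    return uei not in ROLLUP_UEIS
```

Under this rule, `is_expected_event(NODE_DOWN, False)` is false, which is the case this report is about.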

However, even with it set to "off", nodeDown events are still being sent. From the xmlrpc log:

2009-09-25 19:31:00,778 DEBUG [EventQueueProcessor] XmlRpcNotifier: Start to set up communication to XMLRPC server: http://onms2core.core.example.com:8000
2009-09-25 19:31:00,778 DEBUG [EventQueueProcessor] XmlRpcNotifier: Setting timeout value to: 60000
2009-09-25 19:31:00,869 DEBUG [EventQueueProcessor] XmlRpcNotifier: Response from XMLRPC server: http://onms2core.core.example.com:8000
notifyReceivedEvent: (message: "test connection", uei: "uei.opennms.org/internal/capsd/xmlrpcNotification", txNo: "0")
2009-09-25 19:31:00,869 DEBUG [EventQueueProcessor] EventQueueProcessor: About to process event: uei.opennms.org/nodes/nodeDown
2009-09-25 19:31:00,869 DEBUG [EventQueueProcessor] EventQueueProcessor: Event
uei uei.opennms.org/nodes/nodeDown
eventid 13957619
nodeid 24452
ipaddr null
service null
eventtime Friday, September 25, 2009 8:36:24 PM GMT
2009-09-25 19:31:00,869 DEBUG [EventQueueProcessor] XmlRpcNotifier: getNodeLabel: retrieve node label for: 24452
2009-09-25 19:31:00,870 DEBUG [EventQueueProcessor] XmlRpcNotifier: getNodeLabel: retrieved node label '247578' for: 24452
2009-09-25 19:31:00,954 WARN [EventQueueProcessor] XmlRpcNotifier: Failed to send message to XMLRPC server http://onms2core.core.example.com:8000: org.apache.xmlrpc.XmlRpcException: exceptions.AttributeError:'Legacy' object has no attribute 'sendNodeDownEvent'
org.apache.xmlrpc.XmlRpcException: exceptions.AttributeError:'Legacy' object has no attribute 'sendNodeDownEvent'
at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:194)
at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:185)
at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:178)
at org.opennms.netmgt.xmlrpcd.XmlRpcNotifier.sendXmlrpcRequest(XmlRpcNotifier.java:594)
at org.opennms.netmgt.xmlrpcd.XmlRpcNotifier.sendNodeDownEvent(XmlRpcNotifier.java:397)
at org.opennms.netmgt.xmlrpcd.EventQueueProcessor.processEvent(EventQueueProcessor.java:160)
at org.opennms.netmgt.xmlrpcd.EventQueueProcessor.run(EventQueueProcessor.java:464)
at java.lang.Thread.run(Thread.java:595)

The exception is raised because the remote XML-RPC server has no sendNodeDownEvent method, i.e. it does not recognize nodeDown events.
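
The `exceptions.AttributeError` text suggests the remote server is written in Python and looks up the requested method by name on a handler object. The following is a hedged sketch of how such a lookup fails. The `Legacy` class name is taken from the log, but its `sendServiceDownEvent` method and the `dispatch` helper are hypothetical and are not the real server's code.

```python
class Legacy:
    """Stand-in for the remote handler; its real methods are unknown."""

    def sendServiceDownEvent(self, *args):
        # Hypothetical method representing an event the server does handle.
        return True


def dispatch(handler, method_name, params):
    # Look the method up by name; getattr raises AttributeError if it is missing.
    method = getattr(handler, method_name)
    return method(*params)


error = None
try:
    # '247578' is the node label shown in the log above.
    dispatch(Legacy(), "sendNodeDownEvent", ("247578",))
except AttributeError as exc:
    error = str(exc)

print(error)
```

A server built this way rejects any event type it has no method for, so the failure is a symptom of the unexpected nodeDown rather than a separate fault.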

Here are the events from the DB:

    select eventid, eventtime, eventuei, nodeid, ipaddr, serviceid from events where nodeid=24452;
     eventid  |       eventtime        |                 eventuei                  | nodeid |    ipaddr     | serviceid
    ----------+------------------------+-------------------------------------------+--------+---------------+-----------
     13766171 | 2009-09-18 09:45:30-04 | uei.opennms.org/nodes/nodeAdded           |  24452 |               |
     13766172 | 2009-09-18 09:45:30-04 | uei.opennms.org/nodes/nodeGainedInterface |  24452 | 10.232.150.50 |
     13957024 | 2009-09-25 16:21:21-04 | uei.opennms.org/nodes/nodeGainedService   |  24452 | 10.232.150.50 |         1
     13957068 | 2009-09-25 16:21:43-04 | uei.opennms.org/nodes/nodeLostService     |  24452 | 10.232.150.50 |         1
     13957609 | 2009-09-25 16:36:22-04 | uei.opennms.org/nodes/nodeGainedService   |  24452 | 10.232.150.50 |         8
     13957619 | 2009-09-25 16:36:24-04 | uei.opennms.org/nodes/nodeDown            |  24452 |               |
     13958498 | 2009-09-25 17:30:32-04 | uei.opennms.org/nodes/nodeRegainedService |  24452 | 10.232.150.50 |         1
     13960519 | 2009-09-25 19:37:39-04 | uei.opennms.org/nodes/nodeRegainedService |  24452 | 10.232.150.50 |         8
     13960520 | 2009-09-25 19:37:39-04 | uei.opennms.org/nodes/nodeRegainedService |  24452 | 10.232.150.50 |         1

My guess is that the nodeDown should have been a nodeLostService for serviceid 1 (ICMP).
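
The rows above can be checked mechanically for events that should not exist while the node outage model is off. This is a small Python sketch: the rows are copied from the query output, and the helper name `unexpected_events` is made up for this example.

```python
# Rows copied from the query output: (eventid, last part of eventuei, serviceid).
EVENTS = [
    (13766171, "nodeAdded", None),
    (13766172, "nodeGainedInterface", None),
    (13957024, "nodeGainedService", 1),
    (13957068, "nodeLostService", 1),
    (13957609, "nodeGainedService", 8),
    (13957619, "nodeDown", None),
    (13958498, "nodeRegainedService", 1),
    (13960519, "nodeRegainedService", 8),
    (13960520, "nodeRegainedService", 1),
]

# Roll-up events that should not be generated with the node outage model off.
ROLLUP = {"nodeDown", "interfaceDown"}


def unexpected_events(events):
    """Return the ids of events that violate the 'off' setting."""
    return [eventid for eventid, uei, _serviceid in events if uei in ROLLUP]


print(unexpected_events(EVENTS))  # [13957619]
```

Only event 13957619 is flagged, the same nodeDown that appears in the xmlrpc log.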

-T

Environment

Operating System: All
Platform: PC

Activity

Seth Leger (community account) March 22, 2010 at 2:43 PM

Matt thinks the problem is probably in the code that runs at startup, when it examines the database contents and sends events based on the open outages.
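
The timestamps in the description can be compared directly, which may help when checking this theory. This is a minimal Python sketch. It assumes the xmlrpc log timestamps are in the same -04 zone as the database output, which the log itself does not state.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server's local zone is UTC-04, as shown in the database output.
LOCAL = timezone(timedelta(hours=-4))

# "eventtime Friday, September 25, 2009 8:36:24 PM GMT" from the xmlrpc log.
event_time = datetime(2009, 9, 25, 20, 36, 24, tzinfo=timezone.utc)
# Database row for eventid 13957619.
db_time = datetime(2009, 9, 25, 16, 36, 24, tzinfo=LOCAL)
# Log line "About to process event" (zone assumed, not shown in the log).
processed_at = datetime(2009, 9, 25, 19, 31, 0, tzinfo=LOCAL)

# The log entry and the database row describe the same instant.
same_event = event_time == db_time
# Time between the event being created and xmlrpcd processing it.
delay = processed_at - db_time

print(same_event)  # True
print(delay)       # 2:54:36
```

If the assumption holds, the nodeDown was handed to xmlrpcd almost three hours after it was created. That is consistent with events being re-sent at startup, but it does not prove it.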

Details

Created September 25, 2009 at 11:58 PM
Updated September 21, 2021 at 6:23 PM