define the order in which notifications are sent

Description

A customer reported notifications being processed in an incorrect order.

Basically, he was seeing "up" notifications before he received the corresponding "down" ones.

Here is the final result of the ticket:

"Howdy, I'm working with John on this issue. Basically, this is a tough problem to solve. Here's why:

Notification tasks are put on a queue, as AG described, and every 20s we iterate over the queue looking for tasks that need to be run. When we do that, a thread is created for each task in the order that the tasks were put on the queue. The problem is that there is no guarantee that the threads will execute in that same order, because we depend on how fairly the kernel schedules them.

My suggestion would be to a) see if there is a way to configure notifd to correlate the ups with the downs when sending out Up notifications, and to not configure specific notifications for each up event, as we already do for many of the polling notifications such as nodeUp/nodeDown:

<auto-acknowledge resolution-prefix="RESOLVED: " uei="uei.opennms.org/nodes/nodeUp" acknowledge="uei.opennms.org/nodes/nodeDown">
    <match>nodeid</match>
</auto-acknowledge>
<auto-acknowledge resolution-prefix="RESOLVED: " uei="uei.opennms.org/correlation/remote/wideSpreadOutageResolved" acknowledge="uei.opennms.org/correlation/remote/wideSpreadOutage">
    <match>nodeid</match>
    <match>interfaceid</match>
    <match>serviceid</match>
</auto-acknowledge>

b) one other thing we "could" try would be to change the granularity of the queue processing thread to, say, 5 seconds:

<queue>
    <queue-id>default</queue-id>
    <interval>5s</interval>
    <handler-class>
        <name>org.opennms.netmgt.notifd.DefaultQueueHandler</name>
    </handler-class>
</queue>

We "could" also make the queue processing single-threaded, which would guarantee that notifications are sent in the correct order. The problem is that if a task got held up or was long-running, it would block all other notifications.

I agree with AG that we need to improve the queue handling in Notifd; it's really bad. But we will always be fighting the fairness of threads getting time from the kernel when many notifications need to be sent at the same time or very nearly the same time."
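
The trade-off described in the comment above can be illustrated with a small Java sketch. It is not Notifd code: the class and method names (`NotificationOrderSketch`, `threadPerTask`, `singleWorker`) are hypothetical, and the latch and sleep are stand-ins, chosen for reproducibility, for a thread being scheduled late and for a slow delivery. The first method starts one thread per task in queue order and shows that completion order can still differ; the second uses a single worker, which preserves order but makes every task wait behind a slow one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; none of these names come from Notifd.
final class NotificationOrderSketch {

    // Thread per task: threads are started in queue order, but the order in
    // which they finish depends on scheduling. The latch forces the
    // "scheduled late" case so the effect is reproducible.
    static List<String> threadPerTask() {
        List<String> sent = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch upSent = new CountDownLatch(1);
        Thread down = new Thread(() -> {
            try {
                upSent.await(); // stand-in for the "down" thread getting CPU time late
                sent.add("nodeDown");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread up = new Thread(() -> {
            sent.add("nodeUp");
            upSent.countDown();
        });
        down.start(); // queued first, started first
        up.start();
        join(down);
        join(up);
        return new ArrayList<>(sent);
    }

    // Single worker: tasks run strictly in submission order, but a slow first
    // task delays everything queued behind it.
    static List<String> singleWorker(List<String> tasks, long firstTaskDelayMillis) {
        List<String> sent = Collections.synchronizedList(new ArrayList<>());
        ExecutorService worker = Executors.newSingleThreadExecutor();
        for (int i = 0; i < tasks.size(); i++) {
            String task = tasks.get(i);
            long delay = (i == 0) ? firstTaskDelayMillis : 0L;
            worker.submit(() -> {
                try {
                    Thread.sleep(delay); // simulated slow delivery
                    sent.add(task);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        worker.shutdown();
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                worker.shutdownNow();
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(sent);
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `NotificationOrderSketch.threadPerTask()` returns the tasks in the order `nodeUp`, `nodeDown` even though the "down" thread was started first, while `NotificationOrderSketch.singleWorker(List.of("nodeDown", "nodeUp"), 50L)` returns them in submission order, at the cost of the second task waiting for the first to finish.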


Environment

https://mynms.opennms.com/Ticket/Display.html?id=4562

Acceptance / Success Criteria

None

Created December 4, 2017 at 8:11 PM
Updated September 21, 2021 at 9:15 PM
