Details
Assignee
Unassigned
Reporter
John Blake
Components
Priority
Minor
Created December 4, 2017 at 8:11 PM
Updated September 21, 2021 at 9:15 PM
A customer reported that notifications were being processed in the wrong order.
Specifically, he was seeing "up" notifications before he received the corresponding "down" ones.
Here is the final result of the ticket:
"Howdy, I'm working with John on this issue. Basically, this is a tough problem to solve. Here's why:
Notification tasks are put on a queue, as AG described, and every 20s we iterate over the queue looking for tasks that need to be run. When we do that, a thread is created for each task in the order that the tasks were put on the queue. The problem is that there is no guarantee that the threads will execute in that same order, since we depend on how fairly the kernel schedules them.
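The ordering hazard described above can be reproduced outside Notifd. What follows is a minimal sketch, not Notifd's actual code: the class `ThreadPerTaskDemo` and the method `dispatch` are hypothetical names, and the tasks are plain strings. It starts one thread per queued task, in queue order, and records the order in which the tasks actually ran. Nothing in the code forces the recorded order to match the queue order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ThreadPerTaskDemo {

    // Hypothetical helper (not part of Notifd): starts one thread per task,
    // in queue order, and returns the order in which the tasks actually ran.
    static List<String> dispatch(List<String> queued) {
        List<String> executed = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (String task : queued) {
            Thread t = new Thread(() -> executed.add(task));
            threads.add(t);
            t.start(); // start order is fixed; run order is up to the scheduler
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until every task has run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new ArrayList<>(executed);
    }

    public static void main(String[] args) {
        List<String> queued = List.of("nodeDown", "nodeUp");
        System.out.println("queued:   " + queued);
        // May print the tasks in either order, depending on thread scheduling.
        System.out.println("executed: " + dispatch(queued));
    }
}
```

Every task runs exactly once, but only the set of executed tasks is guaranteed, not their sequence.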
My suggestion would be to a) see if there is a way to configure Notifd to correlate the ups with the downs when sending out "up" notifications, rather than configuring a specific notification for each up event, as we do for many of the polling notifications such as nodeUp/nodeDown:
<auto-acknowledge resolution-prefix="RESOLVED: " uei="uei.opennms.org/nodes/nodeUp" acknowledge="uei.opennms.org/nodes/nodeDown">
    <match>nodeid</match>
</auto-acknowledge>
<auto-acknowledge resolution-prefix="RESOLVED: " uei="uei.opennms.org/correlation/remote/wideSpreadOutageResolved" acknowledge="uei.opennms.org/correlation/remote/wideSpreadOutage">
    <match>nodeid</match>
    <match>interfaceid</match>
    <match>serviceid</match>
</auto-acknowledge>
b) One other thing you "could" try would be to change the interval of the queue processing thread to, say, 5 seconds:
<queue>
    <queue-id>default</queue-id>
    <interval>5s</interval>
    <handler-class>
        <name>org.opennms.netmgt.notifd.DefaultQueueHandler</name>
    </handler-class>
</queue>
We "could" make the queue processing single-threaded, which would guarantee that notifications are sent in the correct order, but if a task got held up or was long-running, it would block all other notifications.
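The single-threaded trade-off can be illustrated with the standard `java.util.concurrent` executor. This is a sketch under assumptions, not a proposed Notifd patch: `SingleThreadDemo` and `dispatchInOrder` are hypothetical names, and the tasks are plain strings. Tasks handed to `Executors.newSingleThreadExecutor()` run sequentially in submission order, so a "down" is always handled before the "up" queued after it. The cost is that a slow task delays every task behind it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadDemo {

    // Hypothetical helper (not part of Notifd): runs every task on one worker
    // thread and returns the order in which the tasks actually ran.
    static List<String> dispatchInOrder(List<String> queued) {
        List<String> executed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService worker = Executors.newSingleThreadExecutor();
        for (String task : queued) {
            // A long-running task here would hold up everything queued after it.
            worker.execute(() -> executed.add(task));
        }
        worker.shutdown(); // accept no new tasks; already-submitted ones still run
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(executed);
    }

    public static void main(String[] args) {
        List<String> queued = List.of("nodeDown", "nodeUp");
        System.out.println("executed: " + dispatchInOrder(queued));
    }
}
```

With a single worker, the executed order equals the submission order, which is the guarantee the thread-per-task approach lacks.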
I agree with AG that we need to improve the queue handling in Notifd; it's really bad. But we will always be fighting the fairness with which threads get time from the kernel when many notifications need to be sent at the same time or very nearly the same time."
.....