The %dpname% breaks the alarm life-cycle when having multiple minions per location
Activity

Jesse White June 22, 2021 at 5:14 PM
PR for develop/28.0.0: https://github.com/OpenNMS/opennms/pull/3369

Christian Pape April 1, 2021 at 1:46 PM
Merged.

Christian Pape March 26, 2021 at 11:02 AM
Please review:
The plan is as follows:
- Add a flag to clear the %dpname% in foundation-2020; the default is false.
- Remove the flag in develop and always clear %dpname%.
So, when this one is merged I'll add another PR to make the changes in develop.

Alejandro Galue March 22, 2021 at 2:50 PM
I like the idea of using the location name. Certainly, that'd be a breaking change for existing installations and might need an "upgrade task" to fix the existing alarms in the database, but it's definitely something to consider.

Jesse White March 22, 2021 at 2:43 PM
Here's where the value of the attribute is resolved: https://github.com/OpenNMS/opennms/blob/opennms-27.1.0-1/features/events/daemon/src/main/java/org/opennms/netmgt/eventd/StandardExpandableParameterResolvers.java#L104

When using Minions, it is common to have more than one per location, either to guarantee that requests to a given location will always be processed even if one Minion fails, or to split the load when processing lots of Syslog messages, Traps, or Flows.
Historically, a placeholder called %dpname% is part of the reduction-key and clear-key for all the event definitions with alarm-data.
When there are Minions involved, that placeholder is replaced at runtime with the Minion ID, which can lead to problems with the alarm life cycle, breaking the logic to decide whether or not an alarm should be cleared.
For instance, one Minion can receive a trigger event, and its sibling can receive the corresponding rearm event. The two events then carry different values for %dpname%, so the alarm created by the trigger event can never be cleared: the clear-key of the rearm event will never match the reduction-key of the trigger event.
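The mismatch described above can be illustrated with a minimal sketch. This is a simplified model of placeholder expansion, not OpenNMS code: the key templates, the UEIs, the Minion IDs, and the `expand` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical, simplified model of key expansion (not the OpenNMS implementation).
# Templates, UEIs, and Minion IDs below are made up for illustration.
REDUCTION_KEY = "%uei%:%dpname%:%nodeid%"
CLEAR_KEY = "uei.example/trigger:%dpname%:%nodeid%"


def expand(template, values):
    """Replace each %name% placeholder with its value from the event."""
    for name, value in values.items():
        template = template.replace("%" + name + "%", value)
    return template


# The trigger event arrives through one Minion, the rearm event through its sibling.
trigger = {"uei": "uei.example/trigger", "dpname": "minion-01", "nodeid": "42"}
rearm = {"uei": "uei.example/rearm", "dpname": "minion-02", "nodeid": "42"}

trigger_reduction_key = expand(REDUCTION_KEY, trigger)
rearm_clear_key = expand(CLEAR_KEY, rearm)

# The keys differ only in the %dpname% segment, so the alarm is never cleared.
keys_match = trigger_reduction_key == rearm_clear_key
print(trigger_reduction_key)
print(rearm_clear_key)
print(keys_match)
```

In this sketch the two keys are identical except for the Minion ID, which is enough to prevent the clear from ever matching.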
A solution could be to remove that placeholder from all the event definitions with alarm-data, but that would complicate upgrades, as essentially all the event definition files would appear as changed.
Also, there might be scenarios where having %dpname% makes sense.
For this reason, I believe it would be useful to have a flag to ignore the %dpname%, so that users affected by the broken alarm life cycle can have OpenNMS behave the way it does when Minions are not involved, while keeping compatibility with the current behavior in case there is a justification for it.
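As a rough sketch of the proposed flag, the expansion step could blank out %dpname% when the flag is enabled and leave it untouched otherwise. The `clear_dpname` parameter, the templates, and the event values are assumptions made for this example; the actual flag name and where it is applied in OpenNMS are not specified here.

```python
# Hypothetical sketch of a "clear %dpname%" flag (not the OpenNMS implementation).
def expand(template, values, clear_dpname=False):
    """Expand %name% placeholders; optionally resolve %dpname% to an empty string."""
    resolved = dict(values)
    if clear_dpname:
        # Assumed behavior: with the flag set, the Minion ID is ignored.
        resolved["dpname"] = ""
    for name, value in resolved.items():
        template = template.replace("%" + name + "%", value)
    return template


REDUCTION_KEY = "%uei%:%dpname%:%nodeid%"
CLEAR_KEY = "uei.example/trigger:%dpname%:%nodeid%"

trigger = {"uei": "uei.example/trigger", "dpname": "minion-01", "nodeid": "42"}
rearm = {"uei": "uei.example/rearm", "dpname": "minion-02", "nodeid": "42"}

# Flag disabled (default): current behavior is kept and the keys do not match.
default_match = expand(REDUCTION_KEY, trigger) == expand(CLEAR_KEY, rearm)

# Flag enabled: both keys lose the Minion ID, so the rearm event can clear the alarm.
flagged_match = (
    expand(REDUCTION_KEY, trigger, clear_dpname=True)
    == expand(CLEAR_KEY, rearm, clear_dpname=True)
)
print(default_match, flagged_match)
```

Keeping the default disabled preserves existing keys for installations that rely on %dpname%, which is the compatibility concern raised above.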