NodeCategorySettingPolicy hit momentarily resolves open outages

Description

This issue was created from a report in support ticket https://mynms.opennms.com/Ticket/Display.html?id=3239

Steps to reproduce:

1. Create a requisition called "test". Create a node in that requisition called "test-sw-1". Add an interface to this node with an unreachable IP address and snmp-primary=N. Add the ICMP service to that interface. Save the requisition.

2. Edit the foreign-source definition for "test". Add a policy called "Categorize", class "Set Node Category", key "category" => value "Switches". Match behavior = ALL_PARAMETERS. Add parameter key "label" => value "~.sw.".

3. Synchronize the "test" requisition. Wait for the test-sw-1 node to be created, populated, and to go into a nodeDown outage. Wait another moment for good measure.

4. Synchronize the "test" requisition again.

Expected result: no change, other than possibly a nodeUpdated event on the test-sw-1 node.

Actual result: a new nodeDown outage is created for the test-sw-1 node, and the previous outage is closed with no nodeUp event.

Additional remarks:

If I add a second node whose node label does not match the regex in the category-setting policy, that node's outages do not get summarily closed and re-created.

Acceptance / Success Criteria

None

Activity

Benjamin Reed September 25, 2014 at 7:50 AM

OK, this is fixed now. The event-handling for category and asset events was unconditionally removing outages, expecting them to be recreated on the next scan, I guess?

I modified the code to check each service individually to see whether it still matches the active filters, and to add it to or remove it from scanning as necessary.
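
As a rough illustration of the per-service check described above, here is a minimal Python sketch. It is not the actual Pollerd code: `Service`, `Poller`, `reconcile`, and `matches_active_filter` are hypothetical names, and the poller's state is reduced to two sets. The point it shows is that a service which still matches the active filters is left alone, so its open outage survives a category or asset event.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    # Hypothetical identifier for a monitored service on an interface.
    node_label: str
    ip_address: str
    name: str


@dataclass
class Poller:
    # Toy stand-in for the poller's state; not the real Pollerd data model.
    scheduled: set = field(default_factory=set)
    open_outages: set = field(default_factory=set)

    def reconcile(self, services, matches_active_filter):
        """Re-check each service after a category or asset event."""
        for svc in services:
            wanted = matches_active_filter(svc)
            if wanted and svc not in self.scheduled:
                self.scheduled.add(svc)      # newly matching: start scanning
            elif not wanted and svc in self.scheduled:
                self.scheduled.discard(svc)  # no longer matching: stop scanning
            # Still matching and already scheduled: nothing is touched,
            # so any open outage for this service stays open.


# Example: a category event arrives for a node that is already down.
svc = Service("test-sw-1", "192.0.2.10", "ICMP")
poller = Poller(scheduled={svc}, open_outages={svc})
poller.reconcile([svc], lambda s: True)
outage_kept = svc in poller.open_outages

# Example: the service stops matching any active filter.
poller.reconcile([svc], lambda s: False)
still_scheduled = svc in poller.scheduled
```

What should happen to an open outage when a service leaves scanning is deliberately not modeled here, since the comment above does not say.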

Benjamin Reed September 23, 2014 at 5:46 PM

OK, I had been doing this using the 'catinc' stuff and it appeared to be behaving properly. However, I just did it with a default poller configuration and I can confirm that this is still an issue. Provisiond is doing the right thing, but Pollerd or the outage service are not.

Benjamin Reed September 22, 2014 at 10:07 PM

Provisiond has been fixed to not delete and add categories in phases, which repairs the issues with outages being cleared.

For details on the new design, see:

https://github.com/OpenNMS/opennms/blob/rc/stable/1.14.0/opennms-provision/opennms-provisiond/design.markdown#category-lifecycle
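
One way to avoid a delete-then-add cycle is to compute only the difference between the node's current and desired categories. The sketch below is an assumption about the general approach, not a reproduction of the linked design; `plan_category_changes` is a hypothetical helper.

```python
def plan_category_changes(current, desired):
    """Return (to_add, to_remove) so unchanged categories are never touched."""
    current, desired = set(current), set(desired)
    return desired - current, current - desired


# Re-synchronizing a node that already has the "Switches" category:
# nothing is added or removed.
to_add, to_remove = plan_category_changes({"Switches"}, {"Switches"})

# A node gaining one category and losing another.
add2, remove2 = plan_category_changes({"Routers"}, {"Switches"})
```

With this shape, a second synchronization of an unchanged requisition produces two empty sets, so no category membership is touched.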

Jeff Gehlbach September 12, 2014 at 12:38 PM

Very important note: when I've discussed this problem previously in conversation, I've assumed that reproducing it would require having a poller package whose filter keys on node category memberships, like:

<filter>catincSwitches</filter>

This turns out not to be the case. I was able to reproduce the problem using the stock poller-configuration.xml.

Fixed

Details

Created September 12, 2014 at 12:15 PM
Updated September 25, 2014 at 11:43 AM
Resolved September 25, 2014 at 7:50 AM