When using a custom prefix, the Elasticsearch Forwarder for events and situation-feedback creates a wrong template.

Description

A customer heavily relies on index prefixes for all the integrations with Elasticsearch because their cluster is shared across multiple different OpenNMS environments.

When this is the case, the template matching is incorrect, leading to something like this:

All the Elasticsearch features in OpenNMS were configured with this:

This confuses the system, and the actual indexes could end up with the wrong template.

The following is the only evidence found in the customer environment proving that the events forwarder is not working:

The karaf.log* files are full of messages like this, as the environment in question processes on average over 300 events per second.

Of the templates in the initial list, only the alarms template is properly defined. However, depending on race conditions, the alarms indexes could end up with the events template and vice versa, so all of them must be fixed.

Here is what I would expect to see on a healthy system using a prefix:

Acceptance / Success Criteria

None

Activity

Alejandro Galue January 19, 2021 at 2:20 PM

Are there plans for having the fix ported to Meridian 2020?

Alejandro Galue December 1, 2020 at 6:10 PM
Edited

Let's say you're running H26.2.2.

To fix the JAR files, first, you need to locate them:
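One way to script the search, as a minimal sketch: instead of relying on JAR file names, it looks inside every JAR under a directory for the template file itself. The function name `find_jars_containing` is hypothetical, and the directory you pass in is an assumption about where your OpenNMS installation lives.

```python
import zipfile
from pathlib import Path


def find_jars_containing(root, entry_suffix):
    """Return the JARs under `root` that contain an entry ending with `entry_suffix`."""
    matches = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if any(name.endswith(entry_suffix) for name in zf.namelist()):
                    matches.append(jar)
        except zipfile.BadZipFile:
            # Skip files that have a .jar extension but are not valid archives.
            continue
    return matches
```

For example, `find_jars_containing(opennms_home, "eventsIndexTemplate.es7.json")` and the same call with `"feedback-template.json"` should list the bundles to patch, where `opennms_home` is your installation directory.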

You can then open the JAR with vim, select the JSON file, and manually apply the fix so that it uses an index_patterns array.

For the Events Forwarder, eventsIndexTemplate.es7.json, remove "template" and add:

For the Situations Feedback, feedback-template.json, remove "template" and add:

The above two changes are part of the PR.
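If you prefer not to edit the JAR by hand, the same change can be scripted. The following is only a sketch: `use_index_patterns` and `patch_jar` are hypothetical helper names, and the `patterns` argument must be the values from the PR, which are not reproduced here. Back up the JAR before running anything like this.

```python
import json
import shutil
import zipfile


def use_index_patterns(template, patterns):
    """Return a copy of `template` with "template" dropped and "index_patterns" set."""
    fixed = {k: v for k, v in template.items() if k != "template"}
    fixed["index_patterns"] = list(patterns)
    return fixed


def patch_jar(jar_path, entry_suffix, patterns):
    """Rewrite the JSON entry ending with `entry_suffix` inside the JAR at `jar_path`.

    All other entries are copied unchanged. Returns the number of entries patched.
    """
    tmp_path = str(jar_path) + ".tmp"
    patched = 0
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.endswith(entry_suffix):
                template = json.loads(data)
                fixed = use_index_patterns(template, patterns)
                data = json.dumps(fixed, indent=2).encode("utf-8")
                patched += 1
            dst.writestr(info, data)
    # Replace the original JAR only after the new archive is fully written.
    shutil.move(tmp_path, jar_path)
    return patched
```

A return value of 0 means the entry was not found and the JAR content is unchanged.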

Finally, use the sha1sum command to generate a new hash and update the .sha1 files.
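A scripted equivalent of the checksum step, as a sketch. It assumes the .sha1 file sits next to the JAR, is named after the JAR with a .sha1 suffix, and contains only the hex digest; check the format of the existing file before overwriting it. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=65536):
    """Return the SHA-1 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_sha1_file(jar_path):
    """Write the JAR's SHA-1 to `<jar>.sha1` (assumed layout) and return the digest."""
    jar_path = Path(jar_path)
    checksum = sha1_of(jar_path)
    sha1_path = jar_path.with_name(jar_path.name + ".sha1")
    sha1_path.write_text(checksum)
    return checksum
```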

Alejandro Galue December 1, 2020 at 5:59 PM

As the problem lies in Elasticsearch as a result of pushing a wrong template, here is how to fix it:

Check the current status:

As you can see, the prefix is "prod_", and the "index_patterns" shows the wrong content.

First, extract the generated template, and fix the index_patterns:

Then, push the updated template to override the wrong one:
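The extract-and-push steps could be scripted roughly as follows. This sketch assumes the legacy `_template` endpoint of ES7, whose GET response is keyed by template name; the base URL, the template name, and the corrected patterns are placeholders you have to supply. The function names are hypothetical.

```python
import json
import urllib.request


def extract_template(get_response, name):
    """Unwrap the parsed body of GET /_template/<name>, which is keyed by template name."""
    return dict(get_response[name])


def build_put_request(base_url, name, template, patterns):
    """Build (but do not send) a PUT /_template/<name> request with corrected patterns."""
    body = dict(template)
    body["index_patterns"] = list(patterns)
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. Building the request separately lets you inspect the URL and body before overriding anything in the cluster.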

Verify the current state again:

Finally, if you currently have an index with the wrong template, delete the index after fixing the template:

A new index will be recreated properly as soon as the next event is pushed to Elasticsearch.

PROBLEM

If OpenNMS is restarted, the template will be overridden with the wrong one from the bundle JAR, and you are back to the original situation. For this reason, you have to patch the JAR until you're able to upgrade OpenNMS. Unfortunately, Meridian users would have to patch the JAR manually until the next point release.

Alejandro Galue December 1, 2020 at 12:54 PM

Take a look at this:

https://discuss.elastic.co/t/difference-between-index-patterns-and-template/184877

When using a prefix, our client will apply a merge between the JSON file, the Elastic settings from the .cfg files, and the prefix itself when persisting the template to Elastic (to merge mappings and configuration):

https://github.com/OpenNMS/opennms/blob/develop/features/jest/client/src/main/java/org/opennms/features/jest/client/template/DefaultTemplateInitializer.java#L124

Based on my research and tests, on ES7 this merge only works properly when the JSON file uses the index_patterns field, not the template field. Without the fix, the prefix is not applied correctly, and we already know the consequence.
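To make the idea of the merge concrete, here is a purely illustrative sketch. It is not a port of DefaultTemplateInitializer; the function name, the argument shapes, and the way the prefix is prepended are all assumptions made for the example.

```python
import copy


def merge_template(template, settings, prefix):
    """Illustrative merge: prefix every index pattern and overlay the settings.

    Assumption: `template` already uses "index_patterns"; a file that only has
    the older "template" field gets an empty pattern list here.
    """
    merged = copy.deepcopy(template)
    merged["index_patterns"] = [prefix + p for p in merged.get("index_patterns", [])]
    merged.setdefault("settings", {}).update(settings)
    return merged
```

In this toy version, a file that still carries only the template field ends up with an empty pattern list, which hints at why the field name matters for the merge.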

In other words, there is no justifiable reason for not having this on M2020. That assumes these features didn't exist in M2019; if they did, we should fix it there as well (unless ES6 is used there).

Alejandro Galue December 1, 2020 at 12:45 PM

Not having it in foundation-2020 means that Meridian users are severely restricted from using this feature unless they are willing to fix the template in Elasticsearch manually. I can't find a justifiable reason for not applying the fix to M2020, but if there is one, I'd like to know about it to keep that in mind when a customer asks for it.

Fixed

Created November 23, 2020 at 7:38 PM
Updated January 19, 2021 at 2:20 PM
Resolved November 30, 2020 at 8:54 PM