OpenNMS Loses Events if Elasticsearch is down

Description

Currently, if the opennms-es-rest event forwarder loses connectivity to our Elasticsearch server, it never tries to re-establish the connection (or, at least, it stops sending HTTP requests to Elasticsearch) until either OpenNMS is restarted or the feature is reinstalled from the Karaf console. While it is in this failed state, it continues to consume events from Kafka and commit the offsets.

This can easily be reproduced by enabling a firewall on the OpenNMS server that blocks outgoing traffic to the Elasticsearch port. Once the opennms-es-rest forwarder has failed, stop the firewall to allow outgoing traffic to Elasticsearch again; the forwarder still does not resume sending.
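Because the offsets are committed even though nothing reaches Elasticsearch, the events consumed during the outage are never delivered. One way to avoid this is to advance the committed Kafka offset only after the corresponding event has actually been sent. A minimal sketch, assuming hypothetical names (`EventSink`, `ConsumedEvent`, `CommitAfterSend`) that are not OpenNMS or Jest APIs; the returned value follows Kafka's convention that the committed offset is the next offset to read.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for the Elasticsearch forwarder; not an OpenNMS or Jest API.
interface EventSink {
    void send(String event) throws IOException;
}

// Hypothetical consumed record: a Kafka-style offset plus the event payload.
record ConsumedEvent(long offset, String payload) {}

final class CommitAfterSend {
    /**
     * Forwards events in order and returns the offset that is safe to commit:
     * one past the last event that was sent successfully. On the first failure
     * it stops, so the failed event and everything after it stay uncommitted
     * and would be redelivered by the consumer.
     */
    static long forward(List<ConsumedEvent> batch, long committed, EventSink sink) {
        long next = committed;
        for (ConsumedEvent e : batch) {
            try {
                sink.send(e.payload());
            } catch (IOException ex) {
                return next; // do not advance past the failed event
            }
            next = e.offset() + 1; // Kafka convention: commit the next offset to read
        }
        return next;
    }
}
```

This trades the current "at most once" behavior for "at least once": an event may be sent twice after a partial failure, but it is not silently dropped.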

Acceptance / Success Criteria

None

Activity

Jesse White March 29, 2017 at 8:35 AM

Seth Leger March 13, 2017 at 10:42 AM

This commit will fix the problem where further attempts to send to Elasticsearch fail until the feature (or OpenNMS) is restarted:

https://github.com/OpenNMS/opennms/commit/9fb5b7b91c6660d14a7c35a61e130f5499c537b6

It sounds like we also need to add configurable retries to the send operation so that transient outages don't result in dropped messages.
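Configurable retries of this kind are commonly implemented with exponential backoff. A minimal sketch, assuming a hypothetical `SendOperation` that wraps the forwarder's HTTP call and a hypothetical `RetryPolicy` whose three settings would come from configuration; none of these names exist in OpenNMS or Jest.

```java
import java.io.IOException;
import java.time.Duration;

// Hypothetical send operation; stands in for the forwarder's HTTP call to Elasticsearch.
@FunctionalInterface
interface SendOperation {
    void send() throws IOException;
}

final class RetryPolicy {
    private final int maxAttempts;       // hypothetical configurable setting
    private final Duration initialDelay; // delay before the first retry
    private final double multiplier;     // backoff factor applied after each failure

    RetryPolicy(int maxAttempts, Duration initialDelay, double multiplier) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
        this.initialDelay = initialDelay;
        this.multiplier = multiplier;
    }

    /** Runs op, retrying on IOException with exponential backoff; rethrows the last failure. */
    void run(SendOperation op) throws IOException, InterruptedException {
        long delayMs = initialDelay.toMillis();
        for (int attempt = 1; ; attempt++) {
            try {
                op.send();
                return;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts; the caller should not treat the event as delivered
                }
                Thread.sleep(delayMs);
                delayMs = (long) (delayMs * multiplier);
            }
        }
    }
}
```

Retries alone only cover short outages; once `maxAttempts` is exhausted, the caller still has to decide what to do with the undelivered event.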

As for round-robin sends to different Elasticsearch URIs, that doesn't appear to be a feature of the Jest library we use, so we'll have to write support for it ourselves. I'll open a separate issue for that.
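Since Jest does not appear to provide this, round-robin selection over a comma-delimited list of URLs (as suggested for `elasticsearchUrl`) could be handled outside the client. A minimal sketch, assuming a hypothetical `RoundRobinUrls` helper; it only picks the next URL and does not track node health.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinUrls {
    private final List<String> urls;
    private final AtomicInteger next = new AtomicInteger();

    /** Parses a comma-delimited value such as "http://es1:9200, http://es2:9200". */
    RoundRobinUrls(String commaDelimited) {
        this.urls = Arrays.stream(commaDelimited.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toList();
        if (urls.isEmpty()) throw new IllegalArgumentException("no Elasticsearch URLs configured");
    }

    /** Returns the next URL in rotation; safe to call from multiple threads. */
    String nextUrl() {
        // floorMod keeps the index non-negative even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), urls.size());
        return urls.get(i);
    }
}
```

On a failed send, a caller could retry against the next URL in rotation, so a single unreachable node does not stall forwarding.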

Tim Fite March 11, 2017 at 1:30 PM

Do you have an ETA on when this might be fixed?

Tim Fite March 11, 2017 at 12:38 PM

Yep, that is the exception we have been seeing. As a possible future enhancement, it might help if elasticsearchUrl could take a comma-delimited list of Elasticsearch nodes across which the HTTP calls could be distributed.

Seth Leger March 10, 2017 at 11:02 AM

It appears that a single exception is thrown and then processing stops:

Fixed

Details

Created March 7, 2017 at 7:23 PM
Updated March 29, 2017 at 10:39 AM
Resolved March 29, 2017 at 8:35 AM