Currently, if the opennms-es-rest event forwarder loses connectivity to our Elasticsearch server, it stops sending HTTP requests to Elasticsearch and never tries to re-establish the connection until either OpenNMS is restarted or we re-install the feature from the Karaf console. While it is in this failed state, it continues to consume from Kafka and commit the offsets.
This can easily be reproduced by enabling a firewall on the OpenNMS server and blocking the outgoing Elasticsearch port. Once the opennms-es-rest forwarder has failed, stop the firewall and allow outgoing traffic to Elasticsearch; the forwarder will not recover.
It sounds like we also need to add configurable retries to the send operation so that transient outages don't result in dropped messages.
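A configurable retry for the send operation could look roughly like the sketch below. This is a minimal illustration of retry-with-delay around a send call; the `withRetries` helper and its parameter names are hypothetical, not part of the existing forwarder code.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper: retries an operation a configurable number of
// times with a fixed delay between attempts, so a transient Elasticsearch
// outage does not immediately drop the message. Names are illustrative only.
public class RetrySketch {

    static <T> T withRetries(Callable<T> op, int maxRetries, long delayMillis) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(delayMillis); // back off before retrying
                }
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulate a transient outage: the send fails twice, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("connection refused");
            }
            return "indexed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In the real forwarder the retry count and delay would presumably come from the feature's configuration rather than being hard-coded.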
As for round-robin sends to different Elasticsearch URIs, that doesn't appear to be a feature of the Jest library we use, so we'll have to write support for it ourselves. I'll open a separate issue for that.
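The distribution logic itself is simple; a sketch of a round-robin selector over a list of node URLs might look like this. This is illustrative only and does not use any Jest API; the class and the example URLs are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector over a list of Elasticsearch node URLs.
// Each call to next() returns the following URL in the list, wrapping around.
public class RoundRobinSketch {
    private final List<String> urls;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(List<String> urls) {
        this.urls = urls;
    }

    String next() {
        // floorMod keeps the index non-negative even if the counter overflows
        int i = Math.floorMod(counter.getAndIncrement(), urls.size());
        return urls.get(i);
    }

    public static void main(String[] args) {
        RoundRobinSketch rr = new RoundRobinSketch(
                List.of("http://es1:9200", "http://es2:9200", "http://es3:9200"));
        for (int n = 0; n < 4; n++) {
            System.out.println(rr.next());
        }
    }
}
```

An `AtomicInteger` keeps the selection thread-safe if multiple sender threads share the same selector.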
Tim Fite March 11, 2017 at 1:30 PM
Do you have an ETA on when this might be fixed?
Tim Fite March 11, 2017 at 12:38 PM
Yep, that is the exception we have been seeing. As a possible future enhancement, it might help if the elasticsearchUrl could take a comma delimited list of elasticsearch nodes that it could distribute the HTTP calls across.
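Parsing such a comma-delimited `elasticsearchUrl` value would be straightforward; a sketch, assuming the property name from the comment above and a hypothetical parsing helper:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical parser that splits a comma-delimited elasticsearchUrl setting
// into individual node URLs, trimming whitespace and dropping empty entries.
public class UrlListSketch {
    static List<String> parseUrls(String elasticsearchUrl) {
        return Arrays.stream(elasticsearchUrl.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseUrls("http://es1:9200, http://es2:9200"));
    }
}
```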
Seth Leger March 10, 2017 at 11:02 AM
It appears that a single exception is thrown and then processing stops: