Minion stops sending flow data into Kafka

Description

After some unknown interval, the Minion fails to send flow data into Kafka.
Below are excerpts from the logs, which fill up almost instantly (see zgrep below).

Environment

opennms-minion-container.noarch 25.0.0-0.20190228.onms.develop.1644

Acceptance / Success Criteria

None

Activity

Chandra Gorantla April 15, 2019 at 6:53 PM

Closing this, as it requires a new feature; https://issues.opennms.org/browse/HZN-1531 should resolve this.

Chandra Gorantla April 15, 2019 at 6:30 PM

Created https://issues.opennms.org/browse/HZN-1531 for handling large buffers.

For the case of RecordTooLargeException, or any other exception that is not a TimeoutException, we should drop the message, as this is not recoverable.

Handled this in PR: https://github.com/OpenNMS/opennms/pull/2451
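The retry-versus-drop decision described above can be sketched as follows. This is a minimal illustration, not the actual OpenNMS code from the PR; the exception classes here are local stand-ins for Kafka's `TimeoutException` and `RecordTooLargeException`, and `shouldRetry` is a hypothetical helper name.

```java
// Minimal sketch: retry only transient failures (timeouts); drop records
// that fail with any other producer exception, since they can never succeed.
public class SendPolicy {
    // Local stand-ins for the Kafka producer exceptions.
    static class TimeoutException extends Exception {}
    static class RecordTooLargeException extends Exception {}

    // A timeout is transient, so the send may be retried; an oversized
    // record (or any other error) will fail every time, so drop it.
    static boolean shouldRetry(Exception e) {
        return e instanceof TimeoutException;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(new TimeoutException()));        // retry
        System.out.println(shouldRetry(new RecordTooLargeException())); // drop
    }
}
```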


Sean Torres April 3, 2019 at 12:40 AM

Looks like there was a similar issue internally for Kafka around this.

Sean Torres April 3, 2019 at 12:34 AM

How about catching this "RecordTooLargeException" in its own catch block?

Count the number of individual messages being bundled, for logging purposes (log at warn/debug).

If it is not a single message, split the batch in half, submit the two new batch messages, and break from the loop.

If it is a single message, it will never send, so log a FAIL and break instead of looping endlessly and holding resources.

Recursion should handle it well enough while keeping the batch size large, since in this instance it's not happening all the time. The count metric in the logs would help with tuning batch.size per parser.
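The split-in-half proposal above can be sketched roughly as below. This is a self-contained illustration under stated assumptions: `sendBatch`, `sendWithSplit`, and `MAX_BATCH_BYTES` are hypothetical names, and the local `RecordTooLargeException` class and size check simulate the Kafka producer's rejection of an oversized batch.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    static final int MAX_BATCH_BYTES = 10; // tiny limit, for demonstration only

    // Local stand-in for Kafka's RecordTooLargeException.
    static class RecordTooLargeException extends Exception {}

    static final List<List<String>> sent = new ArrayList<>();

    // Pretend producer: rejects batches whose combined size exceeds the limit.
    static void sendBatch(List<String> batch) throws RecordTooLargeException {
        int size = batch.stream().mapToInt(String::length).sum();
        if (size > MAX_BATCH_BYTES) throw new RecordTooLargeException();
        sent.add(batch);
    }

    // Recursively halve the batch until it fits; a single message that can
    // never fit on its own is logged as a failure and dropped rather than
    // retried endlessly.
    static void sendWithSplit(List<String> batch) {
        try {
            sendBatch(batch);
        } catch (RecordTooLargeException e) {
            if (batch.size() == 1) {
                System.err.println("FAIL: single message too large, dropping");
                return;
            }
            System.out.println("Batch of " + batch.size() + " too large, splitting");
            int mid = batch.size() / 2;
            sendWithSplit(batch.subList(0, mid));
            sendWithSplit(batch.subList(mid, batch.size()));
        }
    }

    public static void main(String[] args) {
        // 16 bytes total exceeds the limit, so the batch splits into two halves.
        sendWithSplit(new ArrayList<>(List.of("aaaa", "bbbb", "cccc", "dddd")));
        System.out.println("batches sent: " + sent.size());
    }
}
```

Splitting keeps throughput high in the common case (batches usually fit) while degrading gracefully on the rare oversized bundle, which matches the observation that the failure is intermittent.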

Sean Torres April 2, 2019 at 11:39 PM

Issue occurred again, connected to the debug port and evaluated the exception:

Fixed

Details

Created April 2, 2019 at 1:11 AM
Updated October 8, 2019 at 5:23 PM
Resolved April 15, 2019 at 6:53 PM