Minion stops sending flow data into Kafka
Description
Environment
Acceptance / Success Criteria
Activity

Chandra Gorantla April 15, 2019 at 6:53 PM
Closing this as it requires a new feature; https://issues.opennms.org/browse/HZN-1531 should resolve it.

Chandra Gorantla April 15, 2019 at 6:30 PM
Created https://issues.opennms.org/browse/HZN-1531 for handling large buffers.
In the case of a RecordTooLargeException, or any other exception that is not a TimeoutException, we should drop the message, since the failure is not recoverable by retrying (see the sketch below).
Handled this in PR: https://github.com/OpenNMS/opennms/pull/2451
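
For illustration only (not the code from the PR above), a minimal sketch of this retry-or-drop logic might look like the following, assuming a KafkaProducer<String, byte[]> and a hypothetical FlowMessageSender wrapper:

{code:java}
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.TimeoutException;

public class FlowMessageSender {
    private final KafkaProducer<String, byte[]> producer;

    public FlowMessageSender(Properties props) {
        this.producer = new KafkaProducer<>(props);
    }

    // Retry only on TimeoutException (the broker may come back); treat everything
    // else, including RecordTooLargeException, as non-recoverable and drop the
    // message so the dispatcher does not loop on it forever.
    public void send(String topic, byte[] payload) throws InterruptedException {
        ProducerRecord<String, byte[]> record = new ProducerRecord<>(topic, payload);
        while (true) {
            try {
                producer.send(record).get();
                return;
            } catch (ExecutionException e) {
                if (e.getCause() instanceof TimeoutException) {
                    Thread.sleep(1000); // back off, then retry the same record
                    continue;
                }
                System.err.println("Dropping message that cannot be sent: " + e.getCause());
                return;
            }
        }
    }
}
{code}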

Sean Torres April 3, 2019 at 12:40 AM
It looks like there was a similar issue reported internally for Kafka around this.

Sean Torres April 3, 2019 at 12:34 AM
How about catching this RecordTooLargeException in its own catch block?
Count the number of individual messages being bundled, for logging purposes (log at warn/debug).
If it is not a single message, split the batch in half, submit the two new batch messages, and break from the loop.
If it is a single message, it will never send, so log a FAIL and break instead of looping endlessly and holding resources.
Recursion should handle it well enough while keeping the batch size large, since in this instance it is not happening all the time. The count metric in the logs would help with tuning batch.size per parser. (A sketch of this approach follows below.)
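
A rough sketch of that split-in-half idea (not OpenNMS code; the BatchSplitter class and its bundle() helper are hypothetical names introduced here for illustration) could look like:

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;

public class BatchSplitter {
    private final KafkaProducer<String, byte[]> producer;
    private final String topic;

    public BatchSplitter(KafkaProducer<String, byte[]> producer, String topic) {
        this.producer = producer;
        this.topic = topic;
    }

    // Sends a bundled batch; if the broker rejects it as too large, halves the
    // batch and retries each half recursively. A single oversized message is
    // logged as a failure and dropped so the loop cannot spin forever.
    public void sendBatch(List<byte[]> messages) throws InterruptedException {
        try {
            producer.send(new ProducerRecord<>(topic, bundle(messages))).get();
        } catch (ExecutionException e) {
            if (!(e.getCause() instanceof RecordTooLargeException)) {
                throw new RuntimeException(e.getCause());
            }
            // Log the count of bundled messages to help tune batch.size per parser.
            System.err.printf("Batch of %d messages was too large%n", messages.size());
            if (messages.size() == 1) {
                System.err.println("FAIL: dropping single message that exceeds the broker limit");
                return;
            }
            int mid = messages.size() / 2;
            sendBatch(messages.subList(0, mid));
            sendBatch(messages.subList(mid, messages.size()));
        }
    }

    // Hypothetical helper: concatenates the individual messages into one payload.
    private byte[] bundle(List<byte[]> messages) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] m : messages) {
            out.write(m, 0, m.length);
        }
        return out.toByteArray();
    }
}
{code}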

Sean Torres April 2, 2019 at 11:39 PM
Issue occurred again; connected to the debug port and evaluated the exception:
After some unknown interval, the Minion fails to send data into Kafka.
Below are excerpts from the logs, which fill up almost instantly (see zgrep below).