Limit RPC threads on Minion using bulkhead pattern

Description

During a Provisiond node scan, a Minion can receive too many RPC requests at once. Although scanThreads is limited to 10 by default in Provisiond, requests from OpenNMS to Minion are asynchronous, so the number of RPC requests that a single node scan can have in flight on the Minion is unbounded.

To avoid this, we could cap the number of RPC threads on Minion at a configurable limit, with a default of 1000.

To start with, we could implement this for Kafka RPC.
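For illustration only, the idea can be sketched with a plain counting semaphore. This is a minimal sketch, not the actual Minion code: the class `RpcBulkhead`, its method names, and the `maxWaitMs` parameter are hypothetical, and a real implementation might use a bulkhead library instead. A request runs only once a permit is free; otherwise it waits, and it is rejected if no permit frees up in time. The two accessors show the kind of values that usage metrics could report.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical bulkhead: caps how many RPC requests execute at the same time. */
final class RpcBulkhead {
    private final int maxConcurrent;
    private final Semaphore permits;

    RpcBulkhead(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        // Fair semaphore: waiting requests get permits in arrival order.
        this.permits = new Semaphore(maxConcurrent, true);
    }

    /**
     * Runs the task once a permit is available. Waits up to maxWaitMs for one,
     * then throws IllegalStateException if the bulkhead is still full.
     */
    <T> T execute(Supplier<T> task, long maxWaitMs) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for bulkhead", e);
        }
        if (!acquired) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    /** Number of requests currently executing. */
    int inFlight() {
        return maxConcurrent - permits.availablePermits();
    }

    /** Estimate of the number of requests waiting for a permit. */
    int waiting() {
        return permits.getQueueLength();
    }
}
```

With the proposed default, the server side would create one `new RpcBulkhead(1000)` and wrap the handling of each incoming request in `execute(...)`. A bounded wait is used here so that a backlog shows up as explicit rejections; waiting indefinitely is the other option, at the cost of callers timing out on their own.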

Acceptance / Success Criteria

None

Activity

Chandra Gorantla May 13, 2020 at 1:14 PM

Sean Torres May 12, 2020 at 5:48 PM

This backpressure could cause requests to time out at the source if the backlog is large enough. Metrics for bulkhead usage would be useful for troubleshooting.

Chandra Gorantla January 27, 2020 at 4:43 PM

This is not a thread pool for all Minion operations. The plan is to implement this for incoming RPC requests.
The module already creates a separate thread, so at the RPC server level we could limit incoming requests to a threshold.
Requests beyond that limit would wait for execution.

David Hustace January 27, 2020 at 12:28 PM

Is there one thread pool for all Minion operations, or is this 1000 setting for a specific thread pool?

Fixed

Details

Created November 7, 2019 at 9:01 PM
Updated June 1, 2020 at 12:02 PM
Resolved June 1, 2020 at 12:02 PM