Check for Protocol Error Cause and Resolve

Description

Acceptance / Success Criteria

None

Attachments

2

Activity

Chandra Gorantla 4 days ago

There is no need to alter grpc_read_timeout settings on nginx after the Server adds gRPC keepAlives.

Tahir Abbasi 4 days ago
Edited

I increased grpc_read_timeout, grpc_send_timeout, and client_body_timeout step by step, adding 60 seconds for every test, from the default 60 seconds to 120s, 180s, 240s, 300s, 360s, 420s, 480s, 540s, 600s, and so on. The frequency of the error decreased after 600 seconds, but the error was still showing for me. However, the protocol error did not reproduce at 600 seconds.
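
For anyone repeating the test, here is a minimal sketch of the timeout ladder described above. It only renders the three nginx directives for each step; the helper names (timeout_ladder, render_directives) are hypothetical, and how the rendered lines reach the ingress (snippet, annotation, or template) is not covered by this ticket.

```python
# Directives named in the comment above; all three default to 60s in nginx.
DIRECTIVES = ("grpc_read_timeout", "grpc_send_timeout", "client_body_timeout")


def timeout_ladder(start_s: int = 60, step_s: int = 60, stop_s: int = 600) -> list[int]:
    """Return the tested values: one step above the default, up to stop_s inclusive."""
    return list(range(start_s + step_s, stop_s + 1, step_s))


def render_directives(timeout_s: int) -> str:
    """Render the three directives with the same timeout, one per line."""
    return "\n".join(f"{name} {timeout_s}s;" for name in DIRECTIVES)


# Show the configuration for the last step of the ladder.
print(render_directives(timeout_ladder()[-1]))
```

Using one value for all three directives mirrors the test as described; nothing here says the three need to move together.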

CPU and Memory Usage (Max)

Service                  CPU    Memory
opennms-bsm              54m    915Mi
opennms-nms-inventory    43m    905Mi
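
To make the units explicit, here is a small sketch that converts the figures above, assuming they use Kubernetes quantity notation ("m" for millicores, "Mi" for mebibytes). The helper names are hypothetical and only the suffixes that appear in this ticket (plus Ki and Gi) are handled.

```python
def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '54m' (millicores) or '2' (cores) to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '915Mi' to bytes (binary suffixes only)."""
    units = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a bare number is already in bytes


for service, cpu, mem in [("opennms-bsm", "54m", "915Mi"),
                          ("opennms-nms-inventory", "43m", "905Mi")]:
    print(service, cpu_cores(cpu), "cores,", memory_bytes(mem), "bytes")
```

Under that assumption, both services peak at roughly 5% of one core and a little under 1 GiB of memory.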

Chandra Gorantla 4 days ago
Edited

Should be resolved by

  1. Tried grpc_read_timeout set to 300s. The connection still times out after 5 minutes, even though data flows on the Heartbeat stream every minute.

  2. After some digging, found that gRPC keepalives send PING frames to keep proxy connections alive. Although these keepalive frames can be sent from either the server or the client, in our case they are only needed on the server.
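
As a rough illustration of point 2, the sketch below builds server-side keepalive options so that PING frames go out well inside the proxy's read timeout. It is written against the Python grpcio channel-argument names and is not taken from the fix itself; the services in this ticket may use a different gRPC implementation, where equivalent settings have different names. The function name and the "half the proxy timeout" margin are assumptions chosen for illustration.

```python
def server_keepalive_options(proxy_read_timeout_s: int = 60,
                             ping_timeout_s: int = 10) -> list[tuple[str, int]]:
    """Build gRPC server options that send keepalive PINGs inside the proxy timeout."""
    if proxy_read_timeout_s < 2:
        raise ValueError("proxy_read_timeout_s must be at least 2 seconds")
    # Assumption: pinging at half the proxy timeout leaves a comfortable margin.
    keepalive_time_s = proxy_read_timeout_s // 2
    return [
        # Interval between keepalive PING frames sent by the server.
        ("grpc.keepalive_time_ms", keepalive_time_s * 1000),
        # How long to wait for the PING acknowledgement before closing the connection.
        ("grpc.keepalive_timeout_ms", ping_timeout_s * 1000),
        # Keep pinging even when no call is active on the connection.
        ("grpc.keepalive_permit_without_calls", 1),
        # Do not cap the number of PINGs sent while no data frames are flowing.
        ("grpc.http2.max_pings_without_data", 0),
    ]


OPTIONS = server_keepalive_options()
# Hypothetical usage with the grpcio package (not imported here):
#   server = grpc.server(futures.ThreadPoolExecutor(), options=OPTIONS)
print(OPTIONS)
```

With nginx left at its default 60s grpc_read_timeout, this yields a 30-second ping interval, which matches the idea that the nginx timeouts no longer need to be raised once the server sends keepalives.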

Morteza last week
Edited

  • Do we need all three of these solutions applied to resolve the issue?

  • How much CPU and memory is used by the SPOG and BSM services after we removed the limits? Does this change with the number of Meridians connected to the services? Does the number of nodes monitored by the Meridians impact resource utilization?

Tahir Abbasi last week
Edited

To fix the "PROTOCOL_ERROR", I did the following:

  1. Updated the ingress configuration to increase the timeouts for reading and sending data.

  2. Removed the CPU and memory resource limits for both BSM and SPOG.

  3. Removed the JAVA_TOOL_OPTIONS environment variable from both the BSM and SPOG deployments.

After this, the error is no longer showing for me.
Can you please test it for around 30 minutes on branch “ta/jira/LOK-3209”?

Details

Created March 4, 2025 at 10:44 AM
Updated 4 days ago