Check for Protocol Error Cause and Resolve
Description
Acceptance / Success Criteria
Attachments
Activity
Chandra Gorantla 4 days ago
There is no need to alter the grpc_read_timeout setting on nginx after the server adds gRPC keepAlives.
Tahir Abbasi 4 days ago (Edited)
I increased grpc_read_timeout, grpc_send_timeout, and client_body_timeout step by step, adding 60 seconds for every test: from the default 60 seconds to 120, 180, 240, 300, 360, 420, 480, 540, and 600 seconds, and so on. The frequency of the error decreased after 600 seconds, but the error was still showing for me. However, the protocol error did not reproduce at 600 seconds.
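For anyone repeating the test above, here is a minimal sketch of the step schedule. It only generates the timeout values (60s to 600s in 60s increments) and renders the three nginx directives for each run. The helper names timeout_steps and render_directives are hypothetical, and how the directives reach the ingress (annotation, ConfigMap, snippet) is not covered here.

```python
# Hypothetical helpers for the step-by-step timeout test described above.
# The directive names are the nginx directives mentioned in the comment.
DIRECTIVES = ("grpc_read_timeout", "grpc_send_timeout", "client_body_timeout")


def timeout_steps(start=60, stop=600, step=60):
    """Return the timeout values (in seconds) to test, inclusive of stop."""
    return list(range(start, stop + 1, step))


def render_directives(seconds):
    """Render the three nginx timeout directives for one test run."""
    return "\n".join(f"{name} {seconds}s;" for name in DIRECTIVES)


if __name__ == "__main__":
    for seconds in timeout_steps():
        print(f"# test run with {seconds}s")
        print(render_directives(seconds))
```

Each printed group is one test configuration. Note how long the connection survives in each run before the error shows up.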
CPU and Memory Usage (Max)
Service                 CPU   Memory
opennms-bsm             54m   915Mi
opennms-nms-inventory   43m   905Mi
Chandra Gorantla 4 days ago (Edited)
Should be resolved by
Tried with grpc_read_timeout set to 300s: the connection still times out after 5 minutes, even though data is flowing on the Heartbeat stream every minute. After some digging, I found that gRPC keepAlives send PING frames to keep proxy connections alive. Although these keepAlive frames can be sent from either the server or the client, in our case they are only needed on the server.
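As an illustration of server-side keepAlives, here is a minimal sketch using the Python gRPC channel arguments. It is not the actual change in our services. The 30s ping interval and 10s ping timeout are assumed values, chosen only to stay below nginx's default 60s read timeout. The function names keepalive_server_options and build_server are hypothetical, and build_server requires the grpcio package.

```python
from concurrent import futures


def keepalive_server_options(ping_interval_s=30, ping_timeout_s=10):
    """Build gRPC server options that make the server send HTTP/2 PING frames.

    The interval and timeout are assumptions for illustration; the interval
    should be shorter than the proxy's idle/read timeout.
    """
    return [
        # Send a keepalive PING after this much time without activity.
        ("grpc.keepalive_time_ms", ping_interval_s * 1000),
        # Consider the connection dead if the PING is not acknowledged in time.
        ("grpc.keepalive_timeout_ms", ping_timeout_s * 1000),
        # Allow PINGs even when no RPC is currently active.
        ("grpc.keepalive_permit_without_calls", 1),
        # Do not cap the number of PINGs sent without data frames in between.
        ("grpc.http2.max_pings_without_data", 0),
    ]


def build_server(max_workers=4):
    """Create a gRPC server with keepalives enabled (requires grpcio)."""
    import grpc  # imported lazily so the options helper works without grpcio

    return grpc.server(
        futures.ThreadPoolExecutor(max_workers=max_workers),
        options=keepalive_server_options(),
    )
```

With PING frames arriving more often than the proxy's read timeout, the proxy should see traffic on the connection, so grpc_read_timeout can stay at its default.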
Morteza last week (Edited)
Do we need all three of these solutions applied to resolve the issue?
How much CPU and memory are used by the SPOG and BSM services after we removed the limits? Does this change with the number of Meridians connected to the services? Does the number of nodes monitored by the Meridians impact resource utilization?
Tahir Abbasi last week (Edited)
To fix the "PROTOCOL_ERROR", I have done a few things:
Updated the ingress configuration to increase the timeouts for reading and sending data.
Removed the CPU and memory resource limits for both BSM and SPOG.
Removed the JAVA_TOOL_OPTIONS environment variable from both the BSM and SPOG deployments.
After this, the error is no longer showing for me.
Can you please test it for around 30 minutes on branch "ta/jira/LOK-3209"?
Details
Assignee
Tahir Abbasi
Reporter
Naeem Afzal
Sprint
Fix versions
Priority
High
