Issues
- NMS-16345: Hikari CP leaking threads (Resolved; Christian Pape)
- NMS-16339: Unable to find any persister factory from OSGi registry, use nullPersister - OpenConfig GNMI (Resolved; Chandra Gorantla)
- NMS-16327: Unprivileged Minion fails on liveness probe (Resolved; Morteza)
- NMS-16310: use-address-from-varbind not honored via Minion (Resolved; Christian Pape)
- NMS-16305: Minion java-opts not loading correctly (Resolved; Mark Mahacek)
- NMS-16275: Grafana reports endpoint failure
Hikari CP leaking threads
Fixed
Acceptance / Success Criteria
Setting idleTimeout should remove old idle connections accordingly.
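For reference, this is roughly how that setting behaves in plain HikariCP; a minimal sketch assuming a standalone pool rather than the OpenNMS datasource factory, with illustrative connection details:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class IdleTimeoutSketch {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/opennms"); // illustrative
        config.setUsername("opennms");                                 // illustrative
        config.setPassword("opennms");                                 // illustrative
        config.setMaximumPoolSize(50);
        config.setMinimumIdle(10);
        // Connections sitting idle longer than idleTimeout (milliseconds) are
        // retired by HikariCP's housekeeper, but only down to minimumIdle and
        // only when minimumIdle is lower than maximumPoolSize.
        config.setIdleTimeout(600_000); // 600 s, matching the default mentioned in the description
        try (HikariDataSource ds = new HikariDataSource(config)) {
            // ds.getConnection() ... after use, the pool should shrink back toward
            // minimumIdle once connections have been idle for longer than idleTimeout.
        }
    }
}

Note that idleTimeout only retires connections the pool considers idle, and only while the pool holds more than minimumIdle connections; connections that are checked out and never returned are not affected by it.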
Attachments: 4
Details
Assignee: Christian Pape
Reporter: JianYet
HB Grooming Date: Feb 13, 2024
HB Backlog Status: Refined Backlog
Components: None
Sprint: None
Fix versions: None
Affects versions: None
Priority: Major
Created February 8, 2024 at 2:31 PM
Updated May 20, 2024 at 6:34 AM
Resolved March 1, 2024 at 12:44 PM
Activity
Christian Pape, March 1, 2024 at 12:44 PM:
Merged.
Christian Pape, February 28, 2024 at 10:35 AM:
Please review:
JianYet, February 23, 2024 at 4:50 PM:
@Christian Pape DM'd you the log files collected when OpenNMS went unresponsive.
Christian Pape, February 23, 2024 at 6:52 AM:
@JianYet The mentioned query is part of the audit phase in provisioning. We added the async reverse lookups there in NMS-15776; I want to check whether that change introduced this problem.
JianYet, February 22, 2024 at 5:16 PM:
@Christian Pape Will get the logs for you. What would be the effect w.r.t. Hikari CP when setting it to false? I need to consult the customer first and explain it to them.
Description
The resource graph shows that the Hikari connection pool is leaking roughly one thread per day. Eventually it hits the ceiling and stays there when OpenNMS is left running continuously for long enough. At that point the OpenNMS GUI becomes unresponsive and performance is degraded to the extent that the system is no longer functional; a full service restart is required to restore it. This is seen in both Meridian and Horizon and has been reported by several users.
Horizon 32.0.6
The sawtooth-like trend corresponds to restarts, after which the thread count recovers.
Meridian 2023.1.9
Here it was left running long enough after a restart; the active threads kept climbing until they hit the ceiling.
They have one thing in common, although I cannot attest that it is the root cause: they all have the JDBC collector configured to collect PostgreSQL stats.
In the case of Meridian, when it is completely degraded, the connection state retrieved from pg_stat_activity showed exactly 50 lingering idle sessions, all with the identical last query. See the attached xlsx file for a full dump of the table. Most of them are old connections, more than a day old. I would expect the idleTimeout setting in the datasource config to remove the old connections, but it doesn't; both instances are configured with the default value of 600 s. This deserves investigation as well. The identical query is:

select onmsnode0_.nodeSysOID as col_0_0_, count(*) as col_1_0_
from node onmsnode0_
left outer join pathOutage onmsnode0_1_ on onmsnode0_.nodeId = onmsnode0_1_.nodeId
where onmsnode0_.nodeSysOID is not null
group by onmsnode0_.nodeSysOID
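The manual check above could be reproduced with a small JDBC sketch along these lines, dumping each idle session, how long it has been idle, and its last query from pg_stat_activity (connection details are illustrative, not taken from this ticket):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IdleSessionDump {
    public static void main(String[] args) throws Exception {
        // Illustrative connection settings; point this at the affected PostgreSQL instance.
        String url = "jdbc:postgresql://localhost:5432/opennms";
        String sql = "SELECT pid, now() - state_change AS idle_for, query "
                   + "FROM pg_stat_activity "
                   + "WHERE state = 'idle' "
                   + "ORDER BY idle_for DESC";
        try (Connection conn = DriverManager.getConnection(url, "opennms", "opennms");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // idle_for is a PostgreSQL interval; reading it as a string is enough here.
                System.out.printf("pid=%d idle_for=%s query=%s%n",
                        rs.getInt("pid"), rs.getString("idle_for"), rs.getString("query"));
            }
        }
    }
}

If the 50 identical sessions show idle_for values well beyond the configured idleTimeout, that would confirm the pool is not retiring them as expected.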