Issues
- NMS-16345: Hikari CP leaking threads (Resolved; Christian Pape)
- NMS-16339: Unable to find any persister factory from OSGi registry, use nullPersister - OpenConfig GNMI (Resolved; Chandra Gorantla)
- NMS-16327: Unprivileged Minion fails on liveness probe (Resolved; Morteza)
- NMS-16310: use-address-from-varbind not honored via Minion (Resolved; Christian Pape)
- NMS-16305: Minion java-opts not loading correctly (Resolved; Mark Mahacek)
- NMS-16275: Grafana reports endpoint failure
Hikari CP leaking threads
Fixed
Acceptance / Success Criteria
Setting idleTimeout should remove old idle connections accordingly.
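For reference, this is roughly how that setting behaves in plain HikariCP; a minimal sketch assuming a standalone pool rather than the OpenNMS datasource factory, with illustrative connection details:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class IdleTimeoutSketch {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/opennms"); // illustrative
        config.setUsername("opennms");                                 // illustrative
        config.setPassword("opennms");                                 // illustrative
        config.setMaximumPoolSize(50);
        config.setMinimumIdle(10);
        // Connections sitting idle longer than idleTimeout (milliseconds) are
        // retired by HikariCP's housekeeper, but only down to minimumIdle and
        // only when minimumIdle is lower than maximumPoolSize.
        config.setIdleTimeout(600_000); // 600 s, matching the default mentioned in the description
        try (HikariDataSource ds = new HikariDataSource(config)) {
            // ds.getConnection() ... after use, the pool should shrink back toward
            // minimumIdle once connections have been idle for longer than idleTimeout.
        }
    }
}

Note that idleTimeout only retires connections the pool considers idle, and only while the pool holds more than minimumIdle connections; connections that are checked out and never returned are not affected by it.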
Attachments: 4
Details
Assignee: Christian Pape
Reporter: JianYet
HB Grooming Date: Feb 13, 2024
HB Backlog Status: Refined Backlog
Components: None
Sprint: None
Fix versions: None
Affects versions: None
Priority: Major
Created February 8, 2024 at 2:31 PM
Updated May 20, 2024 at 6:34 AM
Resolved March 1, 2024 at 12:44 PM
Activity
Christian Pape, March 1, 2024 at 12:44 PM:
Merged.
Christian Pape, February 28, 2024 at 10:35 AM:
Please review:
JianYet, February 23, 2024 at 4:50 PM:
@Christian Pape DM'd you the log files collected when OpenNMS went unresponsive.
Christian Pape, February 23, 2024 at 6:52 AM:
@JianYet The mentioned query is part of the audit phase in provisioning. We added the async reverse lookups there in NMS-15776; I want to check whether that change introduced this problem.
JianYet, February 22, 2024 at 5:16 PM:
@Christian Pape Will get the logs for you. What would be the effect w.r.t. Hikari CP when setting it to false? I need to consult the customer first and explain it to them.
Description
The resource graph shows that the Hikari connection pool is leaking roughly one thread per day. Eventually it hits the ceiling and stays there when OpenNMS is left running continuously for long enough. At that point the OpenNMS GUI becomes unresponsive and performance is degraded to the extent that the system is no longer functional; a full service restart is required to restore it. This is seen in both Meridian and Horizon and has been reported by several users.
Horizon 32.0.6
The sawtooth-like trend corresponds to restarts, after which the thread count recovers.
Meridian 2023.1.9
Here it was left running long enough after a restart; the active threads kept climbing until they hit the ceiling.
They have one thing in common, although I cannot attest that it is the root cause: they all have the JDBC collector configured to collect PostgreSQL stats.
In the case of Meridian, when it is completely degraded, the connection state retrieved from pg_stat_activity showed exactly 50 lingering idle sessions, all with the identical last query. See the attached xlsx file for a full dump of the table. Most of them are old connections, more than a day old. I would expect the idleTimeout setting in the datasource config to remove the old connections, but it doesn't; both instances are configured with the default value of 600 s. This deserves investigation as well. The identical query is:

select onmsnode0_.nodeSysOID as col_0_0_, count(*) as col_1_0_
from node onmsnode0_
left outer join pathOutage onmsnode0_1_ on onmsnode0_.nodeId = onmsnode0_1_.nodeId
where onmsnode0_.nodeSysOID is not null
group by onmsnode0_.nodeSysOID
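The manual check above could be reproduced with a small JDBC sketch along these lines, dumping each idle session, how long it has been idle, and its last query from pg_stat_activity (connection details are illustrative, not taken from this ticket):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IdleSessionDump {
    public static void main(String[] args) throws Exception {
        // Illustrative connection settings; point this at the affected PostgreSQL instance.
        String url = "jdbc:postgresql://localhost:5432/opennms";
        String sql = "SELECT pid, now() - state_change AS idle_for, query "
                   + "FROM pg_stat_activity "
                   + "WHERE state = 'idle' "
                   + "ORDER BY idle_for DESC";
        try (Connection conn = DriverManager.getConnection(url, "opennms", "opennms");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // idle_for is a PostgreSQL interval; reading it as a string is enough here.
                System.out.printf("pid=%d idle_for=%s query=%s%n",
                        rs.getInt("pid"), rs.getString("idle_for"), rs.getString("query"));
            }
        }
    }
}

If the 50 identical sessions show idle_for values well beyond the configured idleTimeout, that would confirm the pool is not retiring them as expected.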