Alarmd gets stuck in a deadlock and stops processing events

Description

I've seen a few cases now where alarmd gets stuck with a stack similar to:

The query is stuck waiting on Postgres.

Further inspection shows that the query is actually stuck waiting for a lock, which is held by another transaction that is not yet committed.
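For reference, this kind of inspection can be sketched with a query like the one below. It is a minimal sketch, not necessarily the query used in this investigation. It assumes PostgreSQL 9.6 or later (for pg_blocking_pids() and the wait_event_type column), and the column aliases are illustrative only.

```sql
-- Sketch: list sessions waiting on a lock, alongside the sessions blocking them.
-- Assumes PostgreSQL 9.6+; the aliases are illustrative, not from the ticket.
SELECT blocked.pid         AS blocked_pid,
       blocked.query       AS blocked_query,
       blocking.pid        AS blocking_pid,
       blocking.state      AS blocking_state,       -- e.g. 'idle in transaction'
       blocking.xact_start AS blocking_xact_start,  -- when the open transaction began
       blocking.query      AS blocking_last_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
```

A blocking session that shows state 'idle in transaction' with an old xact_start would be consistent with the uncommitted transaction described here.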

Looking back at the thread dump, we can find another thread holding an open transaction, which will remain open until the blocked thread unblocks - hence the deadlock.

Acceptance / Success Criteria

None

Attachments

1

Activity

Jesse White March 20, 2019 at 3:47 PM

Cherry-picked to release-23.0.4.

Jesse White March 4, 2019 at 5:47 PM

Jesse White March 1, 2019 at 3:41 PM

Patch for 23.0.3 is attached, to install:

and restart OpenNMS.

Jesse White March 1, 2019 at 1:19 PM

Current workarounds are to restart OpenNMS, or to kill the affected queries in Postgres with: SELECT pg_terminate_backend(<pid of the process>).
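If several queries are affected, the same call can be applied to each matching row of pg_stat_activity. The following is a rough sketch only: the database name 'opennms' and the 10-minute threshold are assumptions, not values from this ticket, and the filter should be checked with a plain SELECT before terminating anything.

```sql
-- Sketch: terminate backends that have been waiting on a lock for a long time.
-- 'opennms' and the 10-minute threshold are assumed values; adjust before use.
SELECT pid,
       query,
       pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE datname = 'opennms'
  AND wait_event_type = 'Lock'
  AND query_start < now() - interval '10 minutes'
  AND pid <> pg_backend_pid();  -- never terminate the session running this query
```

Running it requires superuser rights or membership in a role that is allowed to signal the target backends.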

Fixed

Created March 1, 2019 at 1:18 PM
Updated November 21, 2023 at 2:25 PM
Resolved March 6, 2019 at 1:24 AM