Deadlock in ALEC causes OpenNMS to hang
Description
Activity

Jesse White March 11, 2020 at 1:36 PM

Jesse White March 5, 2020 at 7:31 PM
Still happening. Threads as follows:
###
Thread 480 alarmd-Thread-2-of-4 WAITING
Stacktrace:
sun.misc.Unsafe.park line: -2
java.util.concurrent.locks.LockSupport.park line: 175
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt line: 836
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly line: 997
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly line: 1304
java.util.concurrent.CountDownLatch.await line: 231
org.opennms.alec.engine.cluster.AbstractClusterEngine.onAlarmCreatedOrUpdated line: 574
org.opennms.alec.datasource.opennms.jvm.DirectAlarmDatasource.lambda$handleNewOrUpdatedAlarmNoLock$4 line: 152
org.opennms.alec.datasource.opennms.jvm.DirectAlarmDatasource$$Lambda$1386/1944608698.accept line: -1
org.opennms.alec.datasource.common.HandlerRegistry.lambda$forEach$0 line: 56
org.opennms.alec.datasource.common.HandlerRegistry$$Lambda$1387/2054192579.accept line: -1
java.lang.Iterable.forEach line: 75
org.opennms.alec.datasource.common.HandlerRegistry.forEach line: 54
org.opennms.alec.datasource.opennms.jvm.DirectAlarmDatasource.handleNewOrUpdatedAlarmNoLock line: 152
org.opennms.alec.datasource.opennms.jvm.DirectAlarmDatasource.handleNewOrUpdatedAlarm line: 135
Proxy19dd579a_30be_4719_99d8_339d6c2e89c1.handleNewOrUpdatedAlarm line: -1
org.opennms.features.apilayer.alarms.AlarmLifecycleListenerManager$1.handleNewOrUpdatedAlarm line: 67
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.lambda$onNewOrUpdatedAlarm$2 line: 135
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager$$Lambda$1378/327301083.accept line: -1
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.forEachListener line: 206
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.onNewOrUpdatedAlarm line: 135
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.onAlarmUpdatedWithReducedEvent line: 155
org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl.lambda$didUpdateAlarmWithReducedEvent$1 line: 60
org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl$$Lambda$1377/1368398186.accept line: -1
org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl.forEachListener line: 121
org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl.didUpdateAlarmWithReducedEvent line: 60
org.opennms.netmgt.alarmd.AlarmPersisterImpl.addOrReduceEventAsAlarm line: 204
org.opennms.netmgt.alarmd.AlarmPersisterImpl.lambda$persist$0 line: 122
org.opennms.netmgt.alarmd.AlarmPersisterImpl$$Lambda$1357/1667485614.doInTransaction line: -1
org.springframework.transaction.support.TransactionTemplate.execute line: 133
org.opennms.netmgt.alarmd.AlarmPersisterImpl.persist line: 122
org.opennms.netmgt.alarmd.Alarmd.onEvent line: 87
sun.reflect.GeneratedMethodAccessor641.invoke line: -1
sun.reflect.DelegatingMethodAccessorImpl.invoke line: 43
java.lang.reflect.Method.invoke line: 498
#####
admin@opennms> threads 307
Thread 307 DroolsSession-alarmd-DroolsAlarmContext WAITING
Stacktrace:
sun.misc.Unsafe.park line: -2
java.util.concurrent.locks.LockSupport.park line: 175
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt line: 836
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued line: 870
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire line: 1199
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock line: 943
org.opennms.alec.datasource.opennms.jvm.DirectAlarmDatasource.handleDeletedAlarm line: 167
Proxy19dd579a_30be_4719_99d8_339d6c2e89c1.handleDeletedAlarm line: -1
org.opennms.features.apilayer.alarms.AlarmLifecycleListenerManager$1.handleDeletedAlarm line: 72
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.lambda$onAlarmDeleted$3 line: 145
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager$$Lambda$853/1947530633.accept line: -1
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.forEachListener line: 206
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.onAlarmDeleted line: 145
org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl.lambda$didDeleteAlarm$6 line: 85
org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl$$Lambda$852/644361367.accept line: -1
org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl.forEachListener line: 121
org.opennms.netmgt.dao.support.AlarmEntityNotifierImpl.didDeleteAlarm line: 85
org.opennms.netmgt.alarmd.drools.DefaultAlarmService.deleteAlarm line: 106
sun.reflect.NativeMethodAccessorImpl.invoke0 line: -2
#####
admin@opennms> threads 243
Thread 243 AlarmLifecycleListenerManager WAITING
Stacktrace:
sun.misc.Unsafe.park line: -2
java.util.concurrent.locks.LockSupport.park line: 175
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt line: 836
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued line: 870
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire line: 1199
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock line: 943
org.opennms.alec.datasource.opennms.jvm.DirectAlarmDatasource.handleAlarmSnapshot line: 102
Proxy19dd579a_30be_4719_99d8_339d6c2e89c1.handleAlarmSnapshot line: -1
org.opennms.features.apilayer.alarms.AlarmLifecycleListenerManager$1.handleAlarmSnapshot line: 51
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.lambda$null$0 line: 116
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager$$Lambda$1199/1829088878.accept line: -1
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.forEachListener line: 206
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.lambda$doSnapshot$1 line: 114
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager$$Lambda$1134/451565701.get line: -1
org.opennms.netmgt.dao.hibernate.DefaultSessionUtils.lambda$withTransaction$0 line: 68
org.opennms.netmgt.dao.hibernate.DefaultSessionUtils$$Lambda$1011/918616663.doInTransaction line: -1
org.springframework.transaction.support.TransactionTemplate.execute line: 133
org.opennms.netmgt.dao.hibernate.DefaultSessionUtils.withTransaction line: 68
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager.doSnapshot line: 107
org.opennms.netmgt.alarmd.AlarmLifecycleListenerManager$1.run line: 82
java.util.TimerThread.mainLoop line: 555
java.util.TimerThread.run line: 505
####
"ALEC Driver Startup [dbscan]" #1862 daemon prio=5 os_prio=0 tid=0x00007ff62400e000 nid=0x1d3a8 waiting on condition [0x00007ff4f3cfd000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
parking to wait for <0x0000000690800000> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at org.opennms.alec.datasource.opennms.jvm.DirectAlarmDatasource.getSituations(DirectAlarmDatasource.java:237)
at Proxy3efaee1e_bbfa_4cf7_b4c6_b7da817e5a07.getSituations(Unknown Source)
at org.opennms.alec.driver.main.Driver.lambda$initAsync$0(Driver.java:169)
at org.opennms.alec.driver.main.Driver$$Lambda$1284/1860172802.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
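
One plausible reading of the four dumps above, sketched as a standalone reproduction. It is an interpretation, not something confirmed by the ALEC source: thread 480 appears to wait on a CountDownLatch while still holding the DirectAlarmDatasource lock, threads 307 and 243 queue for the write lock, and the "ALEC Driver Startup" thread then cannot take the read lock because a nonfair ReentrantReadWriteLock makes new readers wait behind a queued writer. If the latch is only released once startup completes, nothing can make progress. The class name `ReadWriteLockStall`, the method `newReaderGetsLock`, and the latch names are hypothetical, and a timed `tryLock` is used so the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadWriteLockStall {

    /**
     * Returns true if a new reader obtains the read lock while another reader
     * holds it and a writer is already queued. Hypothetical helper for illustration.
     */
    static boolean newReaderGetsLock() {
        // Default constructor gives a nonfair lock, matching the NonfairSync in the dump.
        ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
        CountDownLatch initDone = new CountDownLatch(1); // assumed stand-in for the latch awaited by the engine
        CountDownLatch readHeld = new CountDownLatch(1); // signals that the holder owns the read lock

        // Role of thread 480: holds the read lock (assumption) while awaiting the latch.
        Thread holder = new Thread(() -> {
            rwLock.readLock().lock();
            try {
                readHeld.countDown();
                initDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                rwLock.readLock().unlock();
            }
        });

        // Role of threads 307 and 243: parked in WriteLock.lock().
        Thread writer = new Thread(() -> {
            rwLock.writeLock().lock();
            rwLock.writeLock().unlock();
        });

        try {
            holder.start();
            readHeld.await();
            writer.start();
            // Wait until the writer is queued and parked behind the reader.
            while (!rwLock.hasQueuedThread(writer)
                    || writer.getState() != Thread.State.WAITING) {
                Thread.sleep(10);
            }

            // Role of "ALEC Driver Startup": needs the read lock before startup can finish.
            boolean acquired = rwLock.readLock().tryLock(500, TimeUnit.MILLISECONDS);
            if (acquired) {
                rwLock.readLock().unlock();
            }

            // Unlike the hung system, release the latch so the demo can exit.
            initDone.countDown();
            holder.join();
            writer.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            initDone.countDown(); // idempotent; frees the holder on any exit path
        }
    }

    public static void main(String[] args) {
        System.out.println("new reader acquired read lock: " + newReaderGetsLock());
    }
}
```

The timed `tryLock` is expected to time out here, because the timed variant honors the queue while the untimed `tryLock()` would barge past the waiting writer. If this reading is right, the usual remedies are to avoid blocking on a latch while holding the lock, or to bound the wait with a timeout.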

Jesse White February 23, 2020 at 11:32 PM
Fixed in release-1.x with https://github.com/OpenNMS/alec/commit/bf66b27621105f2d1ab839f075b82f849e551104
My OpenNMS instance often hangs, and I was finally able to capture Java stack information when the issue occurred. See attachment.
OpenNMS 25.1.2
ALEC: 1.0.2-1
This may be related to the issue I posted at https://opennms.discourse.group/t/problems-with-hanging-opennms/.