IllegalMonitorStateException in Poller ReentrantLock causes polling to stop

Description

I've configured a pseudo node on my OpenNMS instance to check internet connectivity and latency. Its interface addresses are the IPv4 and IPv6 addresses of a public DNS server. After upgrading to 20.0.0, the collection of IPv4 response times stops working after several hours (~9h on my server). Because the node and its interfaces stay up and no event is sent, I think the problem is limited to the collection of response times. Curiously, the IPv6 response times are still being collected and persisted.

I've not configured anything related to ICMP in my opennms.properties.

Also, poller.log is full of exceptions like the one below, so this may be related to issue https://opennms.atlassian.net/browse/NMS-9439#icft=NMS-9439.

Exception in thread "Poller-Thread-18-of-30" java.lang.IllegalMonitorStateException
    at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
    at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
    at org.opennms.netmgt.poller.pollables.PollableNode.releaseTreeLock(PollableNode.java:263)
    at org.opennms.netmgt.poller.pollables.PollableElement.releaseTreeLock(PollableElement.java:210)
    at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:261)
    at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:241)
    at org.opennms.netmgt.poller.pollables.PollableService.doRun(PollableService.java:404)
    at org.opennms.netmgt.poller.pollables.PollableService.run(PollableService.java:379)
    at org.opennms.netmgt.scheduler.Schedule.run(Schedule.java:142)
    at org.opennms.netmgt.scheduler.Schedule$ScheduleEntry.run(Schedule.java:86)
    at org.opennms.netmgt.scheduler.LegacyScheduler$1.run(LegacyScheduler.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.opennms.core.concurrent.LogPreservingThreadFactory$3.run(LogPreservingThreadFactory.java:124)
    at java.lang.Thread.run(Thread.java:748)
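The top frames of this trace are in ReentrantLock$Sync.tryRelease, which throws IllegalMonitorStateException whenever unlock() is called by a thread that does not own the lock. A minimal standalone sketch of that JDK behavior (the class and method names are illustrative only, not OpenNMS code):

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnlockWithoutHolding {
    // Returns true if calling unlock() on a ReentrantLock that the current
    // thread never acquired throws IllegalMonitorStateException.
    static boolean unlockWithoutHoldingThrows() {
        ReentrantLock lock = new ReentrantLock();
        try {
            lock.unlock(); // this thread is not the owner
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unlockWithoutHoldingThrows()); // prints "true"
    }
}
```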

Environment

Ubuntu 16.04, 97 Nodes, 140 Interfaces, IPv4 and IPv6

Acceptance / Success Criteria

None

Attachments

1 attachment (05 Jul 2017, 12:16 AM)

Activity

Jesse White July 5, 2017 at 12:17 AM

I can no longer reproduce the problem with the patch. I've also attached a thread dump from an existing system running 20.0.0 that experiences the reported problem.

Seth Leger June 29, 2017 at 4:51 PM

I've created a PR that fixes the IllegalMonitorStateException problems:

https://github.com/OpenNMS/opennms/pull/1576

If somebody can attach a thread dump of a 20.0.0 system experiencing this problem so that I can verify that this is the only issue, that would be appreciated.
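A thread dump is normally taken from outside the JVM (for example with jstack), but the same lock information is available in-process through java.lang.management. The sketch below is an illustration under that assumption, not an OpenNMS tool: it lists each thread that holds an ownable synchronizer such as a ReentrantLock, or that is waiting on a lock, together with the owner it is waiting for. This is the kind of detail a dump needs to show in order to tell whether poller threads are stuck on a tree lock. The class and method names are hypothetical.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class LockDumpSketch {
    // Returns one summary line per live thread that either holds an ownable
    // synchronizer (e.g. a ReentrantLock) or is blocked/waiting on a lock.
    static List<String> summarizeLocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers)
        for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
            LockInfo[] held = ti.getLockedSynchronizers();
            String waitingOn = ti.getLockName(); // null if not blocked or waiting
            if (held.length == 0 && waitingOn == null) {
                continue;
            }
            StringBuilder sb = new StringBuilder(ti.getThreadName())
                    .append(" state=").append(ti.getThreadState());
            for (LockInfo li : held) {
                sb.append(" holds=").append(li);
            }
            if (waitingOn != null) {
                sb.append(" waitingOn=").append(waitingOn)
                  .append(" owner=").append(ti.getLockOwnerName());
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // the main thread now owns a ReentrantLock
        try {
            summarizeLocks().forEach(System.out::println);
        } finally {
            lock.unlock();
        }
    }
}
```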

Seth Leger June 28, 2017 at 4:50 PM

I think I see what's going on here: the semantics of the locks are the same, but the PollableElement.withTreeLock(Callable<T>, long) method can try to release a lock (in the finally block) that it failed to obtain. That release throws an IllegalMonitorStateException instead of the expected LockUnavailable exception. This explains why the exception is being thrown; however, it might not explain why all polls stop for the service.
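The failure mode described above can be reproduced outside OpenNMS. The sketch below assumes the long argument is a timeout in milliseconds passed to ReentrantLock.tryLock, and uses a local LockUnavailable class as a stand-in for the exception named in the comment; everything else (class and method names) is hypothetical. The buggy shape unlocks in finally unconditionally, so when tryLock times out the LockUnavailable is replaced by an IllegalMonitorStateException. The fixed shape is the common guarded-unlock pattern; it is not necessarily what PR #1576 does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TreeLockSketch {
    // Hypothetical stand-in for the LockUnavailable exception mentioned above.
    static class LockUnavailable extends RuntimeException {
        LockUnavailable(String msg) { super(msg); }
    }

    private final ReentrantLock treeLock = new ReentrantLock();

    // Buggy shape: unlock() runs in finally even when tryLock() timed out,
    // so LockUnavailable is replaced by IllegalMonitorStateException.
    <T> T withTreeLockBuggy(Callable<T> op, long timeoutMs) throws Exception {
        try {
            if (!treeLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new LockUnavailable("timed out after " + timeoutMs + " ms");
            }
            return op.call();
        } finally {
            treeLock.unlock(); // throws if this thread does not hold the lock
        }
    }

    // Guarded shape: only enter the try/finally once the lock is held.
    <T> T withTreeLockFixed(Callable<T> op, long timeoutMs) throws Exception {
        if (!treeLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new LockUnavailable("timed out after " + timeoutMs + " ms");
        }
        try {
            return op.call();
        } finally {
            treeLock.unlock(); // reached only when this thread holds the lock
        }
    }

    // Runs `attempt` while another thread holds treeLock and returns the
    // class of the exception it throws, or null if it completes normally.
    Class<?> exceptionWhileContended(Callable<?> attempt) throws Exception {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            treeLock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                treeLock.unlock();
            }
        });
        holder.start();
        held.await(); // the holder thread now owns treeLock
        try {
            attempt.call();
            return null;
        } catch (Exception e) {
            return e.getClass();
        } finally {
            release.countDown();
            holder.join();
        }
    }

    public static void main(String[] args) throws Exception {
        TreeLockSketch s = new TreeLockSketch();
        System.out.println(s.exceptionWhileContended(
                () -> s.withTreeLockBuggy(() -> "polled", 50)).getSimpleName());
        // prints "IllegalMonitorStateException"
        System.out.println(s.exceptionWhileContended(
                () -> s.withTreeLockFixed(() -> "polled", 50)).getSimpleName());
        // prints "LockUnavailable"
    }
}
```

With the guarded shape, a caller that times out sees the intended LockUnavailable and can decide how to handle the missed poll, rather than receiving an unexpected IllegalMonitorStateException from the scheduler thread.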

Fixed

Details

Created June 23, 2017 at 3:08 AM
Updated July 5, 2017 at 12:17 AM
Resolved July 5, 2017 at 12:17 AM