PollableServiceConfig.poll() can hang indefinitely, blocking all future polling of services on a node.

Description

I was notified of an outage on an LDAP service. I checked, and the LDAP service is responsive from the command line. After looking at poller.log, I discovered that new poll attempts are being made; however, the original thread is stuck in a waiting state while holding the lock, which blocks subsequent threads from polling the service.

2014-08-26 20:11:51,125 INFO [Poller-Thread-92-of-100] PollableService: Postponing poll for 4394:172.16.88.18:LDAP-LB-PROXY because org.opennms.netmgt.poller.pollables.LockUnavailable: Unable to obtain lock for 4394 before timeout (45000ms), m_owner:Poller-Thread-78-of-100

The owner thread is waiting on a lock, and here is the thread dump:

Working branch: https://github.com/OpenNMS/opennms/tree/jira/NMS-6801

Acceptance / Success Criteria

None

Activity

Vitor Moreira November 15, 2016 at 6:05 AM

Maybe this could help you. We have some code that enforces a timeout when trying to connect via JMX:

Seth Leger June 28, 2016 at 10:30 PM

The fact that JMX was blocking polling could probably be improved if we figure out a way to enforce timeouts on JMX connections.
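One generic way to bound a blocking connect is to run it on a worker thread and wait on the resulting Future with a deadline. The following is a minimal sketch of that idea, not the code referred to in this ticket; the class name `TimedCall` and the method `callWithTimeout` are hypothetical. In practice the Callable would wrap the real connect call, for example `JMXConnectorFactory.connect(url, env)`. Note that cancelling only delivers an interrupt, so a call that ignores interrupts may keep its worker thread occupied even though the caller is released.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long a caller waits for a blocking call.
public class TimedCall {
    // Daemon threads, so a stuck call cannot keep the JVM alive on shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-call");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task and gives up after timeoutMs, interrupting the worker. */
    public static <T> T callWithTimeout(Callable<T> task, long timeoutMs)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // delivers an interrupt to the worker thread
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast call completes normally.
        System.out.println(callWithTimeout(() -> "connected", 1000));
        try {
            // The sleep stands in for a connect that never returns.
            callWithTimeout(() -> { Thread.sleep(60_000); return "never"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

The poller thread regains control when the deadline passes, so it can release its lock and report the poll as failed instead of waiting forever.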

Seth Leger April 9, 2015 at 11:57 AM

Needs more investigation.

Seth Leger January 22, 2015 at 2:41 PM

The last version of jldap in the public maven repo is '2009-10-07'. Another option would be to move to another library (as Ron suggested). We're already using spring-ldap so maybe that's an option. It should be fairly heavily tested since it is the LDAP implementation that is used inside Spring Security.

Seth Leger January 22, 2015 at 2:33 PM

In this proposed fix, you need to interrupt the zombie thread so that it has a chance to fail the stuck poll and exit.

The original problem appears to be a thread safety issue inside the jldap library. It looks like we're using version 4.3 and the final version was 4.6, so we should see whether we can upgrade.
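As a rough illustration of the interrupt idea, here is a minimal sketch of a lock that interrupts its current owner when another thread times out waiting for it. This is not the code on the working branch: `InterruptingLock` and `lockOrInterruptOwner` are hypothetical names, and the sketch builds on `java.util.concurrent.locks.ReentrantLock` rather than the poller's own lock class. It only helps if the hung call responds to interrupts.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock that interrupts a stuck owner when a waiter times out.
public class InterruptingLock extends ReentrantLock {
    /**
     * Tries to take the lock. On timeout, interrupts the current owner
     * (if any) and returns false so the caller can postpone its poll.
     */
    public boolean lockOrInterruptOwner(long timeoutMs) throws InterruptedException {
        if (tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return true;
        }
        Thread owner = getOwner(); // protected in ReentrantLock; best-effort snapshot
        if (owner != null) {
            owner.interrupt();
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        InterruptingLock lock = new InterruptingLock();
        CountDownLatch held = new CountDownLatch(1);
        Thread zombie = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                Thread.sleep(60_000); // stands in for a hung poll
            } catch (InterruptedException e) {
                // Interrupted: fall through and release the lock.
            } finally {
                lock.unlock();
            }
        }, "zombie");
        zombie.start();
        held.await();

        // The first attempt times out and interrupts the owner.
        System.out.println(lock.lockOrInterruptOwner(100));
        zombie.join(5000);
        // The owner has released the lock, so this attempt can acquire it.
        System.out.println(lock.lockOrInterruptOwner(100));
    }
}
```

The first caller still has to postpone its poll, but the interrupted owner gets a chance to unwind and release the lock, so a later attempt can proceed.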

Details

Created August 26, 2014 at 9:26 PM
Updated July 26, 2023 at 2:16 PM