PollableServiceConfig.poll() can hang indefinitely, blocking all future polling of services on a node.
Description
Activity

Vitor Moreira November 15, 2016 at 6:05 AM
Maybe this could help you. We have some code that enforces a timeout when trying to connect via JMX:

Seth Leger June 28, 2016 at 10:30 PM
The fact that JMX was blocking polling could probably be improved if we figure out a way to enforce timeouts on JMX connections.
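
For illustration only, here is a minimal sketch of one generic way to put a deadline on a blocking connect call. It is not the code referenced in the later comment and not what OpenNMS ships. The class name `TimedConnect` and the helper `callWithTimeout` are hypothetical, and the lambdas in `main` are stand-ins for a real call such as `JMXConnectorFactory.connect(url, env)`. The idea is to run the blocking call on a worker thread, wait on the `Future` with a timeout, and cancel (interrupt) the worker if the deadline passes.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedConnect {

    /**
     * Hypothetical helper: runs a blocking call on a worker thread and waits
     * at most timeoutMs for it. Returns Optional.empty() if the call times
     * out, throws, or the caller is interrupted while waiting.
     */
    static <T> Optional<T> callWithTimeout(Callable<T> blockingCall, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-connect");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(blockingCall);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller is already unblocked
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the call itself failed
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a JMX connect; a real caller would wrap the connect call here.
        Optional<String> fast = TimedConnect.<String>callWithTimeout(() -> "connected", 1000);
        Optional<String> slow = TimedConnect.<String>callWithTimeout(() -> {
            Thread.sleep(5000); // simulates a connect that hangs
            return "connected";
        }, 200);
        System.out.println("fast=" + fast + " slow=" + slow);
    }
}
```

One caveat with this approach: the caller is released at the deadline, but the worker only stops if the underlying call reacts to interruption. A call stuck in a non-interruptible socket read may linger until the socket itself times out or is closed.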

Seth Leger April 9, 2015 at 11:57 AM
Needs more investigation.

Seth Leger January 22, 2015 at 2:41 PM
The last version of jldap in the public Maven repo is '2009-10-07'. Another option would be to move to a different library (as Ron suggested). We're already using spring-ldap, so maybe that's an option. It should be fairly heavily tested, since it is the LDAP implementation used inside Spring Security.

Seth Leger January 22, 2015 at 2:33 PM
In this proposed fix, you need to interrupt the zombie thread so that it has a chance to die on the failed poll.
The original problem appears to be a thread-safety issue inside the jldap library. It looks like we're using version 4.3 and the terminal version was 4.6, so we should see whether we can upgrade.
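
To make the interrupt idea concrete, here is a small self-contained sketch of the failure mode and the recovery. It is an assumption-laden toy, not the code on the working branch: `ZombieLockDemo`, `nodeLock`, and the `Thread.sleep` that stands in for a hung poll are all hypothetical. One thread holds a lock while stuck, a second attempt times out on `tryLock` (analogous to the LockUnavailable message in poller.log), and interrupting the owner lets its `finally` block release the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ZombieLockDemo {

    /**
     * Returns a two-element array:
     *   [0] whether the lock could be taken while the owner was stuck,
     *   [1] whether it could be taken after the owner was interrupted.
     */
    static boolean[] run() {
        ReentrantLock nodeLock = new ReentrantLock();
        CountDownLatch ownerHasLock = new CountDownLatch(1);

        Thread owner = new Thread(() -> {
            nodeLock.lock();
            try {
                ownerHasLock.countDown();
                Thread.sleep(60_000); // stand-in for a poll that never returns
            } catch (InterruptedException e) {
                // Interrupted: fall through so the finally block releases the lock.
            } finally {
                nodeLock.unlock();
            }
        }, "stuck-poller");
        owner.setDaemon(true);
        owner.start();

        boolean before = false;
        boolean after = false;
        try {
            ownerHasLock.await();

            // A later poll attempt gives up after a timeout instead of blocking forever.
            before = nodeLock.tryLock(200, TimeUnit.MILLISECONDS);
            if (before) {
                nodeLock.unlock();
            }

            // Interrupt the zombie owner and give it a moment to exit.
            owner.interrupt();
            owner.join(2000);

            after = nodeLock.tryLock(200, TimeUnit.MILLISECONDS);
            if (after) {
                nodeLock.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("acquired while owner stuck: " + result[0]);
        System.out.println("acquired after interrupt:   " + result[1]);
    }
}
```

This only works because the stand-in blocks in an interruptible call. If the real owner is waiting inside library code that ignores interrupts, the interrupt alone may not free the lock, which is presumably why the fix can only "maybe" kill the thread.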
I was notified of an outage on an LDAP service. I checked, and the LDAP service is responsive from the command line. After looking at poller.log, I discovered that new poll attempts are being made; however, the original thread is in a waiting state, and this is blocking subsequent threads from polling the service.
2014-08-26 20:11:51,125 INFO [Poller-Thread-92-of-100] PollableService: Postponing poll for 4394:172.16.88.18:LDAP-LB-PROXY because org.opennms.netmgt.poller.pollables.LockUnavailable: Unable to obtain lock for 4394 before timeout (45000ms), m_owner:Poller-Thread-78-of-100
The owner thread is waiting on a lock, and here is the thread dump:
Working branch: https://github.com/OpenNMS/opennms/tree/jira/NMS-6801