Too many open files (reported by provisiond)

Description

Starting with version 15.x there seems to be a file descriptor leak.
With an unmodified provisiond-configuration.xml, messages like the
following appear continuously in provisiond.log:

Caused by: java.io.IOException: Too many open files
at sun.nio.ch.IOUtil.makePipe(Native Method) ~[?:1.7.0_75]
at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:65) ~[?:1.7.0_75]
at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36) ~[?:1.7.0_75]
at java.nio.channels.Selector.open(Selector.java:227) ~[?:1.7.0_75]
at org.apache.mina.transport.socket.nio.NioProcessor.<init>(NioProcessor.java:59) ~[mina-core-2.0.7.jar:?]
... 23 more

A closer look at the currently open file handles of the system, which polls about 100 nodes, shows:
about 20 open sockets,
about 300 open files under /opt/opennms/data,
about 800 open files under /opt/opennms/lib,
and more than 2,100 file handles on FIFOs.

The last point seems to be very strange to me.
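
A breakdown like the one above can be gathered on Linux by reading the symlinks in /proc/<pid>/fd. The following is only a minimal sketch, not the method used for this report: the function names `classify` and `count_fds` are hypothetical, the two directory prefixes are taken from the list above, and it assumes that anonymous pipes (which tools such as lsof list as FIFO) appear as `pipe:[inode]` link targets.

```python
import os
from collections import Counter

# Directory prefixes taken from the report; adjust for other installations.
PREFIXES = ("/opt/opennms/data", "/opt/opennms/lib")


def classify(target):
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        # Anonymous pipes only; a named FIFO shows up under its path instead.
        return "fifo"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    for prefix in PREFIXES:
        if target == prefix or target.startswith(prefix + "/"):
            return prefix
    return "other"


def count_fds(pid):
    """Count the open descriptors of one process by category (Linux only)."""
    fd_dir = "/proc/%d/fd" % pid
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were scanning
        counts[classify(target)] += 1
    return counts
```

Calling `count_fds(pid)` with the PID of the OpenNMS JVM (as the same user or as root) returns a `Counter` whose `most_common()` output shows which category dominates.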

BTW: Yes, I know the system has a limit on the number of open files.
Because it caused real pain when the database ran into "too many open files", I contained the symptoms by applying a ulimit to the OpenNMS processes:
=== OpenNMS Complimentary Thread Dump ===
begin ulimit settings:
...
open files (-n) 5000
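
To put the limit next to the observed handle counts, the soft limit can be compared with the number of open descriptors. This is a rough sketch under stated assumptions: `nofile_limits` and `headroom` are hypothetical helper names, and `resource.getrlimit` reports the limit of the calling Python process, not of the OpenNMS JVM (the limits of another process are listed in /proc/<pid>/limits on Linux).

```python
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def headroom(open_count, soft_limit):
    """Descriptors left before the soft limit is reached (None if unlimited)."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - open_count
```

With the approximate figures from this report (20 + 300 + 800 + 2,100 = 3,220 handles against a limit of 5000), `headroom(3220, 5000)` leaves 1780 descriptors, which a further burst of FIFO handles could plausibly exhaust.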

Environment

OpenNMS Configuration
OpenNMS Version: 15.0.1
Home Directory: /opt/opennms
RRD store by group enabled? false
RRD store by foreign source enabled? false
Web-Application Logfiles: /opt/opennms/logs
Reports directory: /opt/opennms/share/reports
Jetty HTTP host: null
Jetty HTTP port: 8980
Jetty HTTPS host: null
Jetty HTTPS port: null

System Configuration
Server Time: Mon Mar 02 23:35:47 CET 2015
Client Time: Mon Mar 02 2015 23:35:51 GMT+0100 (CET)
Java Version: 1.7.0_75 Oracle Corporation
Java Virtual Machine: 24.75-b04 Oracle Corporation
Operating System: Linux 2.6.18-028stab099.3 (amd64)
Servlet Container: jetty/8.1.10.v20130312 (Servlet Spec 3.0)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:35.0) Gecko/20100101 Firefox/35.0

Acceptance / Success Criteria

None

Activity

Seth Leger September 9, 2016 at 10:35 AM

Hi Guenther,

I just wanted to bump this issue again. OpenNMS Horizon 19 is approaching feature-complete status so I would recommend that you try out the latest snapshots to see if the issue persists for you. Thanks!

Seth Leger June 20, 2016 at 12:26 AM

Hi Guenther,

We're making several improvements to Provisiond in our latest snapshots for OpenNMS 19.0.0 that may improve this issue:

  • Better exception handling

  • Better logging of exceptions

  • Upgrade of the Apache Mina library that is probably responsible for this issue

If you get a chance, can you try out the latest snapshot code and see if the behavior has improved on your system? Thank you.

guenther.schreiner June 23, 2015 at 5:51 AM

Review of this issue after upgrading to 16.0.2:

  • the number of open files has dropped significantly, by 200, without any changes to the configuration files

Installed Packages:

opennms.noarch 16.0.2-1 @opennms-stable-common
opennms-core.noarch 16.0.2-1 @opennms-stable-common
opennms-docs.noarch 16.0.2-1 @opennms-stable-common
opennms-plugin-provisioning-link.noarch 16.0.2-1 @opennms-stable-common
opennms-plugin-provisioning-map.noarch 16.0.2-1 @opennms-stable-common
opennms-remote-poller.noarch 16.0.2-1 @opennms-stable-common
opennms-webapp-jetty.noarch 16.0.2-1 @opennms-stable-common

Seth Leger June 11, 2015 at 11:57 AM

Thank you for your excellent feedback on this issue. If you reduce the number of threads available to provisiond, it will potentially increase the amount of time that you need to perform the provisioning scan which occurs (by default) every 24 hours.

We'll try and take a look at this issue and see why increasing the number of threads would have a negative impact on the number of open file handles.

guenther.schreiner June 11, 2015 at 11:52 AM

My first attempt was to reduce the number of concurrent sessions in opennms.properties by setting
< org.opennms.netmgt.provision.maxConcurrentConnections=20
but in vain: the system still ran out of descriptors.

My second attempt was to lower the number of threads in provisiond-configuration.xml, changing from
> importThreads="8" scanThreads="10" rescanThreads="10" writeThreads="8"
to
< importThreads="2" scanThreads="2" rescanThreads="2" writeThreads="2"

AND THAT'S IT: the system now keeps fewer than 1,500 handles open. No more out-of-descriptors or out-of-memory errors!

Fixed

Details

Created March 2, 2015 at 5:57 PM
Updated August 9, 2017 at 2:51 PM
Resolved March 2, 2017 at 11:30 AM