DB installer tries to class-load every file under OPENNMS_HOME as a JAR
Description
Environment
Any system with RRDTool or JRobin persisting and many (many, many) RRD files in {{OPENNMS_HOME/share/rrd}}, especially on rotational disks
Acceptance / Success Criteria
None
Lucidchart Diagrams
Activity
Jesse White February 24, 2017 at 3:06 PMEdited
I was able to reproduce this by create a number of small files in share/rrd/snmp
using:
for x in `seq 1 500000`; do
fallocate -l 40K file$x.dat
done
And by running strace
on the installer:
sudo strace -f java -XX:+UseG1GC -Xmx2g -Dopennms.bootstrap.debug=true -Dopennms.home=/home/jesse/git/opennms/target/opennms-19.0.0-SNAPSHOT -Dlog4j.configurationFile=/home/jesse/git/opennms/target/opennms-19.0.0-SNAPSHOT/etc/log4j2-tools.xml -cp /home/jesse/git/opennms/target/opennms-19.0.0-SNAPSHOT/lib/opennms_bootstrap.jar org.opennms.bootstrap.InstallerBootstrap -dis
I was able to confirm the stat
calls:
[pid 28128] stat("/home/jesse/git/opennms/target/opennms-19.0.0-SNAPSHOT/share/rrd/snmp/file6861.dat", {st_mode=S_IFREG|0664, st_size=40960, ...}) = 0
[pid 28128] stat("/home/jesse/git/opennms/target/opennms-19.0.0-SNAPSHOT/share/rrd/snmp/file50905.dat", {st_mode=S_IFREG|0664, st_size=40960, ...}) = 0
[pid 28128] stat("/home/jesse/git/opennms/target/opennms-19.0.0-SNAPSHOT/share/rrd/snmp/file79827.dat", {st_mode=S_IFREG|0664, st_size=40960, ...}) = 0
I was also able to grab a stack-trace of the JVM during that same run.
Converting the 28128
to HEX gives us 0x6de0
, which can be used to find the associated thread in the stack-trace (searching for nid=0x6de0
):
"Main" #11 prio=5 os_prio=0 tid=0x00007f57354a0800 nid=0x6de0 runnable [0x00007f56d03c1000]
java.lang.Thread.State: RUNNABLE
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
at java.io.File.isDirectory(File.java:849)
at org.opennms.bootstrap.Bootstrap$1.accept(Bootstrap.java:80)
at java.io.File.listFiles(File.java:1291)
at org.opennms.bootstrap.Bootstrap.loadClasses(Bootstrap.java:194)
at org.opennms.bootstrap.Bootstrap.loadClasses(Bootstrap.java:198)
at org.opennms.bootstrap.Bootstrap.loadClasses(Bootstrap.java:198)
at org.opennms.bootstrap.Bootstrap.loadClasses(Bootstrap.java:198)
at org.opennms.bootstrap.Bootstrap.loadClasses(Bootstrap.java:158)
at org.opennms.install.Installer.install(Installer.java:264)
at org.opennms.install.Installer.main(Installer.java:989)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.opennms.bootstrap.Bootstrap$4.run(Bootstrap.java:464)
at java.lang.Thread.run(Thread.java:745)
So, our own Bootstrap code is the culprit.
Fixed
Details
Assignee
Jesse WhiteJesse WhiteReporter
Jeff GehlbachJeff GehlbachLabels
Components
Sprint
NoneFix versions
Affects versions
Priority
Critical
Details
Details
Assignee
Jesse White
Jesse WhiteReporter
Jeff Gehlbach
Jeff GehlbachLabels
Components
Sprint
None
Fix versions
Affects versions
Priority
PagerDuty
PagerDuty Incident
PagerDuty
PagerDuty Incident
PagerDuty

PagerDuty Incident
Created October 31, 2016 at 11:24 AM
Updated March 1, 2017 at 9:47 AM
Resolved March 1, 2017 at 9:47 AM
To reproduce this issue without a ton of actual RRD files, just make a bazillion ~40KB garbage files beneath
share/rrd/snmp
. Half a million files ought to be a good start. I'm not exaggerating – this kind of scale exists in the wild.Run the installer. You'll see it get through most of its work, then apparently hang after printing:
- checking if database "opennms" is unicode... ALREADY UNICODE - Checking for old import files in /opt/opennms/etc... DONE
(The last messages printed may vary from one release to the next)
You'll see the logs go silent except for gc.log (if you've got that one configured) and maybe output.log. If you run
lsof
during this period, you'll see that the JVM is opening each of those files. I've seen this go on for hours, which is a big deal on production systems.This behavior is apparently down to the way Spring does class loading, and it's scary difficult to fix. Fortunately there is a workaround: temporarily remove the symbolic link named
share
fromOPENNMS_HOME
, and restore it after the installer completes but before starting OpenNMS again. Assuming you installed from packages, this trick is available. Just be sure to note the path that the symlink pointed to before removing it (it may not point where you think it does, especially on RPM-based installs where the path is relocatable).