JRE5 causes problems with graphing/datacollection on 1.8.15
Attachments
- 28 Nov 2011, 01:29 PM
- 28 Nov 2011, 01:29 PM
Activity
Seth Leger July 31, 2012 at 2:40 PM
We have customers running on a variety of JVMs, and we haven't seen a specific problem with Java 5 with regard to gaps in the collections.
Also, the current stable version (1.10) now requires Java 6, so this is no longer an issue. Marking as cannot reproduce.
Robert Drake December 3, 2011 at 11:56 AM
Even with 1.8.15 it was still showing graphs like the one pictured. In fact, that graph was from 1.8.15. This was from our production machine, which has 64G of RAM. I was never able to pin down a good number for JAVA_HEAP_SIZE. Everyone always said more is better, but I have never heard of anyone trying such large numbers. I thought that could be the cause, so I fiddled with it.
I should also mention that I have this in opennms.conf. I believe I may have added it while trying to fix this. I think I found it on one of the opennms wiki pages.
ADDITIONAL_MANAGER_OPTIONS="-Xmx"$JAVA_HEAP_SIZE"m -Xms"$JAVA_HEAP_SIZE"m -XX:+UseParallelGC"
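The quoting in that line closes and reopens the double quotes around each variable, and the shell concatenates the adjacent pieces into a single string. A minimal sketch of how it expands, next to an equivalent form that keeps the variable inside the quotes. The 4096 value is an assumption for illustration only; the ticket does not say what JAVA_HEAP_SIZE was set to when this line was in use.

```shell
# Hypothetical heap size in megabytes (assumption, not taken from the ticket).
JAVA_HEAP_SIZE=4096

# Form used in the comment above: quoted and unquoted pieces are concatenated.
ORIGINAL="-Xmx"$JAVA_HEAP_SIZE"m -Xms"$JAVA_HEAP_SIZE"m -XX:+UseParallelGC"

# Equivalent form with the variable expanded inside a single quoted string.
BRACED="-Xmx${JAVA_HEAP_SIZE}m -Xms${JAVA_HEAP_SIZE}m -XX:+UseParallelGC"

echo "$ORIGINAL"
echo "$BRACED"
```

Both assignments produce the same string, so the original line is valid shell; the braced form is just easier to read.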
Seth Leger December 2, 2011 at 3:17 PM
So you are getting data collection gaps with Java5 and not with Java6 on a single node? There was a major memory leak in versions of 1.8 before 1.8.13, so that could explain the problems that you had with older versions. A 10G heap size sounds large; how much physical RAM is on the system? You shouldn't need more than 2G maximum for a small test system.
Robert Drake November 28, 2011 at 1:38 PM
Some extra information about troubleshooting:
- Wiped out the /etc/opennms configs and started from scratch to see if changes I had made caused it.
- Took it down to 1 node to see if it was resource contention.
- Changed collection to write to a ramdisk to make sure it wasn't I/O.
- Bumped up the thread limits to see if I was not polling in time, then lowered the thread limits to see if somehow it was spawning too many threads and getting angry about it.
- Raised the JAVA_HEAP_SIZE to 10G and lowered it to 4G to see if it made a difference.
- Set MAXIMUM_FILE_DESCRIPTORS=20480 to see if I was running out of those.
- Ran ulimit -n 99999 for similar reasons.
Even after bumping up the logging levels and watching the logs, I never saw anything report an issue that would cause the problem.
I did a diff of the changes between 1.8.5 and each successive release to see if anything changed that would affect collection, and couldn't find anything that might cause the problem.
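For the file descriptor checks in particular, a rough sketch of how one might compare the limit with actual usage. It assumes a Linux system with /proc mounted, and count_open_fds is a hypothetical helper name, not part of OpenNMS.

```shell
# Print the soft open-file limit that applies to this shell and its children.
ulimit -n

# Hypothetical helper: count the open descriptors of a process via /proc (Linux only).
# Prints 0 if the process does not exist or /proc is unavailable.
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Example: descriptors held by the current shell. Pass the OpenNMS JVM's PID instead
# to see how close the daemon is to its limit.
count_open_fds "$$"
```

If the count for the JVM stays far below the limit, descriptor exhaustion is unlikely to be the cause of the gaps.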
Details
Assignee: Matt Brozowski
Reporter: Robert Drake
Priority: Major

I upgraded from 1.8.5 to 1.8.9 a long time ago and had data collection problems. I troubleshot for a long time and never found a solution, so I downgraded back to 1.8.5. Recently I upgraded to 1.8.15 and tried again with the same results. After troubleshooting for a week or so, I discovered I was still running Java 5. Upgrading to Java 6 solved the collection problem. I'm not sure if this affects everyone using Java 5 or if it varies depending on your server settings.
I'm attaching a patch that would deprecate Java 5 in the Debian and Red Hat packages, and issue a warning from runjava if it detects you're using Java 5.
I'm also attaching a graph to show the problem.
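The attached patch itself is not reproduced here. As a rough illustration of the kind of check described, the sketch below parses the first line of `java -version` output and warns on Java 5. The function names java_major_version and warn_if_java5 are hypothetical and are not taken from runjava or from the patch.

```shell
# Extract the major Java version from `java -version` style output.
# Handles the legacy "1.x.y" scheme (1.5.0_22 -> 5) and the newer "x.y.z" scheme (11.0.2 -> 11).
java_major_version() {
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$version" in
        1.*) version=${version#1.}; echo "${version%%.*}" ;;
        *)   echo "${version%%[._-]*}" ;;
    esac
}

# Print a warning to stderr and return non-zero when the output reports Java 5.
warn_if_java5() {
    major=$(java_major_version "$1")
    if [ "$major" = "5" ]; then
        echo "WARNING: Java 5 detected; data collection gaps were reported with it. Consider upgrading to Java 6." >&2
        return 1
    fi
    return 0
}

# Demonstration with a literal version string rather than a live JVM.
java_major_version 'java version "1.5.0_22"'
```

Against an installed JVM this could be called as `warn_if_java5 "$(java -version 2>&1)"`, since `java -version` writes to stderr.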