Spotted thread leak in Syslogd localhost name lookups
Activity

Seth Leger August 5, 2013 at 10:59 AM
Since this sounds like a JVM issue that we cannot code around, I'm going to resolve this. If you encounter problems with host name lookup threads as a user, please try upgrading to Java 7 to fix the issue. Java 6 is near (or maybe past) end-of-life support at this point anyway.

jcat August 2, 2013 at 11:23 AM
Well, since the Java update over a month ago, we've not seen this at all.
I even managed to go a good 10 days without a restart for any reason, and still no issues.
Great news, and from my point of view, I can call it resolved.
Thanks for the assistance from all.
Cheers,
Just

jcat June 24, 2013 at 11:49 AM
Small update.
Since I last posted, we've seen this 3 times.
Today I deployed Oracle Java 1.7 (1.7.0_21) in production, and as previously indicated, we'll just have to see how it goes from here.
Cheers,
Just

jcat June 7, 2013 at 6:01 AM
I can confirm there's no bad /etc/hosts entry on the server, so I guess that points to something in JVM land.
I've been trying to reproduce this in our test environment, and so far no luck (same OS, JVM, config, etc.).
So far I've fired 50,032,578 syslog messages (and counting) at it in a 24-hour period. No dice.
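A load test like the one described above could be scripted roughly as follows. This is a minimal sketch, not the tool used in the thread: the class name, the target address, port 514 (the standard syslog UDP port), the host name "testhost", the tag "loadgen" and the message count are all placeholder assumptions. Point it at whatever port your Syslogd is actually configured to listen on. Messages are loosely modeled on the RFC 3164 layout `<PRI>TIMESTAMP HOST TAG: TEXT`.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical UDP syslog load generator for a test box (not part of OpenNMS).
public class SyslogFlood {

    // RFC 3164-style timestamp, e.g. "Jun  7 06:01:00" (day padded with a space).
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("MMM ppd HH:mm:ss", Locale.ENGLISH);

    // Builds "<PRI>TIMESTAMP HOST TAG: TEXT", where PRI = facility * 8 + severity.
    static String formatMessage(int facility, int severity, LocalDateTime ts,
                                String host, String tag, String text) {
        int pri = facility * 8 + severity;
        return "<" + pri + ">" + TS.format(ts) + " " + host + " " + tag + ": " + text;
    }

    // Sends `count` messages back to back with no rate limiting
    // (facility 1 = user, severity 6 = info). UDP gives no delivery guarantee.
    static void send(String target, int port, long count) throws Exception {
        InetAddress addr = InetAddress.getByName(target);
        try (DatagramSocket socket = new DatagramSocket()) {
            for (long i = 0; i < count; i++) {
                byte[] data = formatMessage(1, 6, LocalDateTime.now(), "testhost", "loadgen", "message " + i)
                        .getBytes(StandardCharsets.US_ASCII);
                socket.send(new DatagramPacket(data, data.length, addr, port));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String target = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 514; // standard syslog UDP port
        long count = args.length > 2 ? Long.parseLong(args[2]) : 100_000L;
        send(target, port, count);
    }
}
```

Usage would be along the lines of `java SyslogFlood 192.0.2.10 514 1000000`, run in a loop or from several hosts to approach the volumes quoted above.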
On the test server, I've also tried invalidating the localhost entry in /etc/hosts; still no luck!
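Since the leak is in localhost name lookups, one way to check whether resolution on a given box is slow or failing (for example after editing /etc/hosts) is to time the lookup from a JVM directly. This is a hedged sketch; the class and method names are hypothetical. Note that later attempts may be served from the JVM's own lookup caches, so the first attempt is usually the most telling.

```java
import java.net.InetAddress;
import java.util.concurrent.Callable;

// Hypothetical diagnostic: time how long local host name resolution takes in this JVM.
public class LocalHostLookupTimer {

    // Returns elapsed milliseconds for one call, or -1 if the call threw.
    static long timeMillis(Callable<?> lookup) {
        long start = System.nanoTime();
        try {
            lookup.call();
        } catch (Exception e) {
            System.out.println("lookup failed: " + e);
            return -1;
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            // getLocalHost() resolves the machine's host name;
            // getCanonicalHostName() additionally does a reverse lookup.
            long ms = timeMillis(() -> InetAddress.getLocalHost().getCanonicalHostName());
            System.out.println("attempt " + i + ": " + ms + " ms");
        }
    }
}
```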
So if I can't reproduce it, that really only leaves me with one course of action: upgrade the JVM in production and see how it goes.
I'll need to run the new JVM in test for a while first (maybe a week or so) before upgrading prod.
I've created some Oracle Java 1.7 packages from the latest JDK on the Oracle website, as I do so hate doing things outside the package manager.
So if all goes well in test, I'll have it running in prod in a week. At that point we'll just have to see how it goes.
Previously, the thread leak occurred three times in the space of a month, so if it survives a month without an incident, I'd say it was a likely fix.
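One rough way to put a number on the "one clean month" test is a Poisson model. This is an assumption not made in the thread: it treats leaks as independent events at a steady rate of about three per month. Under that model, the chance of seeing no leak in a month purely by luck, with nothing fixed, is:

```latex
P(\text{no leak in one month} \mid \lambda = 3) = e^{-\lambda} = e^{-3} \approx 0.05
```

So a clean month would happen only about one time in twenty if the underlying problem were still present, which supports treating it as reasonable evidence of a fix.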
I'll keep you all informed.
Thanks for your input so far.
Cheers,
Just

Jeff Gehlbach June 6, 2013 at 5:01 PM
Thanks for the analysis, Ben.
Among systems where we've seen many threads stuck in that getLocalHostName() method, running with a Java 6 JVM has been a common thread. Therefore, upgrading those systems to Oracle Java 7 has been part of the solution. Given this, I very strongly suggest trying it. It may mean that you have to install the JDK outside the package system, which sucks, but if it's at all possible to try it, please do.
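To check how many threads are sitting in getLocalHostName(), a thread dump is the usual tool: running `jstack <pid>` against the OpenNMS JVM and counting matching frames works from outside the process. The sketch below does the same count in-process via `Thread.getAllStackTraces()`, so it only sees the JVM it runs in. The class name is hypothetical, and it matches on the method name alone because the thread does not say which class's getLocalHostName() was involved.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical in-process check: how many live threads are currently inside a given method?
public class StuckLookupCounter {

    // Counts threads whose stack contains at least one frame with the given method name.
    static long countThreadsIn(Map<Thread, StackTraceElement[]> dump, String methodName) {
        return dump.values().stream()
                .filter(frames -> Arrays.stream(frames)
                        .anyMatch(f -> methodName.equals(f.getMethodName())))
                .count();
    }

    public static void main(String[] args) {
        // Snapshot of all live threads in this JVM only.
        long stuck = countThreadsIn(Thread.getAllStackTraces(), "getLocalHostName");
        System.out.println("threads in getLocalHostName(): " + stuck);
    }
}
```

Sampling this periodically and watching the count climb would distinguish a genuine leak from a momentary burst of lookups.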
We have seen a large number of threads from OpenNMS: 32,318 of them. After stopping OpenNMS, the thread count returned to normal. The problem showed up when simple bash commands exited with the error message
bash: fork: Cannot allocate memory
and when yum update failed with the error
thread.error: can't start new thread
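Those symptoms (fork failing and new threads refusing to start) fit with a process holding tens of thousands of threads. On Linux, a hedged way to watch for this from outside the JVM is to read the `Threads:` field of `/proc/<pid>/status`. The class name, the 5,000-thread default threshold and the idea of polling it from cron or a monitor are illustrative assumptions, not anything OpenNMS provides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical Linux-only monitor: report a process's thread count from /proc.
public class ProcThreadCount {

    // Extracts the value of the "Threads:" field from /proc/<pid>/status lines; -1 if absent.
    static int parseThreads(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("Threads:")) {
                return Integer.parseInt(line.substring("Threads:".length()).trim());
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // PID of the JVM to inspect; defaults to this process via /proc/self.
        String pid = args.length > 0 ? args[0] : "self";
        int threshold = args.length > 1 ? Integer.parseInt(args[1]) : 5_000; // assumed alert level
        List<String> lines = Files.readAllLines(Paths.get("/proc", pid, "status"));
        int threads = parseThreads(lines);
        System.out.println("threads=" + threads);
        if (threads > threshold) {
            System.out.println("WARNING: thread count above " + threshold);
        }
    }
}
```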