notifications hang on /bin/mail

Description

None

Environment

Operating System: MacOS X
Platform: Macintosh

Acceptance / Success Criteria

None

Activity

David Hustace September 17, 2004 at 5:20 PM

Changed default mailer from /bin/mail to JavaMail API

DJ Gregor August 22, 2004 at 11:02 AM

See FAQ 141 for some discussion of this problem, which may sometimes be triggered (more
severely?) by certain JVM versions:

http://faq.opennms.org/faq/fom-serve/cache/141.html

Former user June 25, 2004 at 7:51 AM

I'm assigning this to you DJ since you requested it.

I have added some comments below:

I haven't seen this happen on Fedora Core 1, so I haven't been able to fix it. I have done a lot of Java
development on Solaris launching processes. I agree that the Sun bug you referenced is a problem here,
but the usual symptom of that bug is that the execute command throws a Too Many Open Files exception
and fails to run. I do think a process.destroy() should be added here, though.

I'm not sure what the situation is here, but one potential problem I see in this code is a deadlock.
(This was possible in early versions of the JVM, but I don't know about the rewrite
mentioned in that bug.)

The deadlock could occur in a few different ways, but the basic idea is that the pipes between the process
and the JVM fill up.

The JVM thread would be writing to the process's input but would block because the input buffer would
fill up.

At the same time the process would be writing to its output but would also block because its output
buffer would fill up.

Neither the JVM's thread nor the process can make progress because they are each waiting for the other
to empty the buffers.

In the code in org.opennms.web.notification.bobject.Command the output buffer for the process is
never processed. If more than, say, 8K of data is created then there could be a deadlock here. (Note that the
output could also go to the error stream.)

However, I don't believe that the /bin/mail program creates a great deal of output so I'm unsure that
this is really the problem.
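
As an illustration of the deadlock described above, here is a minimal sketch of one way to avoid it: drain the
child's output and error streams on background threads while the calling thread writes the input. The class
name DrainedCommand and its methods are hypothetical and are not taken from the OpenNMS code; the sketch uses
current Java syntax rather than what a 2004 JVM supported.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of OpenNMS: runs a command without letting
// either pipe between the JVM and the child process fill up.
final class DrainedCommand {

    // Start a daemon thread that copies everything from 'in' into 'sink'.
    private static Thread drain(final InputStream in, final ByteArrayOutputStream sink) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[4096];
            try {
                int n;
                while ((n = in.read(buf)) != -1) {
                    sink.write(buf, 0, n);
                }
            } catch (IOException e) {
                // The stream was closed underneath us; nothing more to read.
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Feed 'input' to the command's standard input and return its standard output.
    static String run(List<String> command, String input)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayOutputStream err = new ByteArrayOutputStream();

        // Readers start before any input is written, so the child can never
        // block on a full output or error pipe while we block on its input.
        Thread outThread = drain(p.getInputStream(), out);
        Thread errThread = drain(p.getErrorStream(), err);

        // Closing the stream signals end-of-input to the child.
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(input.getBytes(StandardCharsets.UTF_8));
        }

        p.waitFor();
        outThread.join();
        errThread.join();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For example, DrainedCommand.run(List.of("cat"), text) should return text unchanged on a Unix-like system even
when text is far larger than a pipe buffer, because the output is read while the input is still being written.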

The solution I'd really like to see is for us to remove the use of /bin/mail or other mail processes by
using javax.mail calls. This is pretty straightforward to code, doesn't launch any OS-dependent
processes, and is very portable. Plus it has built-in MIME encoding, so we won't need to depend on
metamail either.

I have used it on Solaris, Linux, Windows, HPUX and AIX and have been successful on all of them.

Let me know your thoughts,
Matt

DJ Gregor June 24, 2004 at 5:32 AM

See Sun Bug 4784692, "Process.waitFor does not release resources":
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4784692

Of particular note, the workaround section says:
"... In order to release all resources, user code must either invoke Process.destroy or manually close the
three subprocess streams. ..."

If org.opennms.web.notification.bobject.Command.execute() is what creates notifications, this could be
the culprit because it does neither of the actions mentioned above.
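
A minimal sketch of the workaround quoted above, assuming a command that needs no input: wait for the child,
then close all three subprocess streams and call Process.destroy() in a finally block so the resources are
released even if an exception is thrown. The class ReleasedCommand is hypothetical and is not the OpenNMS
Command class.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical helper, not part of OpenNMS: runs a command and always
// releases the pipes attached to the child process.
final class ReleasedCommand {

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if close fails.
        }
    }

    // Run the command, discard its output, and return its exit code.
    static int runAndRelease(List<String> command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        try {
            // No input to send, so close the child's stdin right away.
            closeQuietly(p.getOutputStream());

            // Read and discard the output so the child cannot block on a full pipe.
            InputStream in = p.getInputStream();
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard
            }
            return p.waitFor();
        } finally {
            // Close the three subprocess streams and destroy the process,
            // as the quoted workaround recommends.
            closeQuietly(p.getInputStream());
            closeQuietly(p.getErrorStream());
            closeQuietly(p.getOutputStream());
            p.destroy();
        }
    }
}
```

Calling destroy() on a process that has already exited is harmless, so the finally block can run
unconditionally.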

This might also be what causes some odd errors with RRD graphing, because
org.opennms.web.graph.RRDGraphServlet.doGet() doesn't call waitFor().

Here's some information on the RRD graphing issue that one person mentioned (on Solaris):

http://opennms.org/pipermail/bugs/2004-April/001305.html

You can see the deInit() function below for how I've fixed this (I think) in another application:

http://cvs.sourceforge.net/viewcvs.py/mailping/taglibs/rrd/src/com/gregor/rrd/RRDPipe.java?view=markup

Former user May 10, 2004 at 7:51 AM

This bug seems to happen more on Solaris. I have seen it there. Need to attempt to reproduce it.

Matt Brozowski

Fixed

Details

Assignee

Reporter

Components

Fix versions

Priority

Created May 7, 2004 at 2:12 PM
Updated January 27, 2017 at 4:31 PM
Resolved September 17, 2004 at 6:20 PM