Debian init script not LSB compatible
Activity
Benjamin Reed May 2, 2011 at 3:53 PM
An indeterminate timeout is a potential failure, because we don't know one way or the other. That's how the current code treats it, which is why the best solution, if you don't care that it times out because it's in the middle of failing, is to set the config by hand.
Michael Schwartzkopff May 2, 2011 at 3:11 PM
Hi,
<quote>
What's the return value for "we don't know if it started successfully or not"?
</quote>
DO NOT return at all. The script should only return if the requested action definitely succeeded or definitely failed. If you don't know: wait until a timeout. The timeout also counts as a failure.
All other wanna-be solutions to tackle the problem will NOT work in any cluster environment. That will lead to confusion, non-standard home-brew start scripts for clusters or even worse, patches for clustered environments that have to be applied every upgrade.
Please stick to the LSB conventions. Thanks.
Michael.
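The approach described in this comment (return only on a definite result, and treat a timeout as failure) can be sketched as follows. This is a minimal illustration, not the code in the OpenNMS init script: `check_started`, `wait_for_start`, and the `PROBE_RESULT` variable are hypothetical names, and the probe is a stub so that the sketch can run on its own.

```shell
#!/bin/sh
# Hypothetical probe (not part of the OpenNMS script). Assumed contract:
#   0 = service is definitely up
#   1 = service has definitely failed
#   2 = service is still starting (state unknown)
# The stub just returns PROBE_RESULT so the sketch is runnable.
check_started() {
    return "${PROBE_RESULT:-2}"
}

# Poll once per second until the probe gives a definite answer or
# the timeout (in seconds, first argument) expires.
wait_for_start() {
    timeout=$1
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        probe_rc=0
        check_started || probe_rc=$?
        case "$probe_rc" in
            0) return 0 ;;   # definitely started: success
            1) return 1 ;;   # definitely failed: generic error
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Still unknown after the timeout: report failure, never success.
    return 1
}
```

A cluster manager calling `wait_for_start 300` would then see 0 only for a confirmed start, and 1 for both a confirmed failure and an expired timeout.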
Benjamin Reed May 2, 2011 at 2:45 PM
(and to be clear, you can make an $OPENNMS_HOME/etc/opennms.conf that sets START_TIMEOUT=0 if you wish the behavior you're asking for)
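For reference, the override mentioned in this comment would be a single assignment in `$OPENNMS_HOME/etc/opennms.conf`. The fragment below assumes the file is read as shell-style variable assignments. Going by the thread, setting the value to 0 makes the start action return immediately instead of waiting.

```shell
# $OPENNMS_HOME/etc/opennms.conf
# Do not wait for OpenNMS to finish starting up; return immediately.
START_TIMEOUT=0
```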
Benjamin Reed May 2, 2011 at 2:43 PM
Well, that's the question. What's the return value for "we don't know if it started successfully or not"? I'd rather err on the side of saying it failed rather than it passed, if your OpenNMS is still configured to wait rather than just return immediately. Unless there's an LSB return val that's "indeterminate," I'd rather leave it as-is.
Sven Wick May 2, 2011 at 2:20 PM
Hi,
I have been running OpenNMS within a Pacemaker setup since the early 1.8 versions.
After every upgrade I had to patch another line of the INIT script
because sometimes after a failover my OpenNMS resource was not running:
Failed actions:
opennms_start_0 (node=monitoring-node-01, call=263, rc=1, status=complete): unknown error
It seems OpenNMS sometimes takes very long to start on my machine,
even with these settings in Pacemaker:
primitive opennms lsb:opennms \
    op start interval="0" timeout="300s" \
    op stop interval="0" timeout="300s"
and then Pacemaker throws the error mentioned above.
After an upgrade I always get rid of this problem
with this patch and failover/failback works perfectly again:
--- /usr/share/opennms/bin/opennms 2011-04-11 15:53:07.000000000 +0000
+++ /usr/share/opennms/bin/opennms.patch 2011-05-02 18:02:13.000000000 +0000
@@ -314,7 +314,7 @@
done
echo "Started OpenNMS, but it has not finished starting up" >&2
- return 1
+ return 0
}
doPause(){
Description
I installed opennms 1.8.10 from the opennms.org repository. It seems that the
init script provided is not LSB compliant.
Stopping an already stopped opennms results in a return code of 7 instead of
the correct 0.
See also:
http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
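The behavior the LSB page asks for can be sketched as below. This is a minimal illustration under assumptions, not the OpenNMS init script: `is_running`, `do_stop`, `do_status`, and the `SERVICE_STATE` flag are hypothetical names, and the running check is a stub so the sketch can run on its own. The point is that `stop` on a service that is already stopped counts as success (0), while the "not running" condition is reported by `status` (3).

```shell
#!/bin/sh
# Exit codes for the "status" action as defined by the LSB.
LSB_STATUS_RUNNING=0
LSB_STATUS_NOT_RUNNING=3

# Hypothetical stand-in for a real process check; it reads a flag
# variable so that the sketch is runnable.
is_running() {
    [ "${SERVICE_STATE:-stopped}" = "running" ]
}

# "stop" on an already stopped service is considered successful,
# so it returns 0 rather than 7.
do_stop() {
    if ! is_running; then
        return 0
    fi
    SERVICE_STATE=stopped   # placeholder for the real shutdown
    return 0
}

# "status" is the action that reports whether the service is running.
do_status() {
    if is_running; then
        return "$LSB_STATUS_RUNNING"
    fi
    return "$LSB_STATUS_NOT_RUNNING"
}
```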