Availability Miscalculated in NodeAvailabilityReport.jrxml (and others)

Description

In the SQL query in NodeAvailabilityReport.jrxml:
node_outages.avail_total is calculated as the total number of seconds in the reporting period (e.g., 604800 for a 7-day period).
outage_seconds is calculated as the sum of the individual outages, without taking into account the number of interfaces on each node.
As a result, for a node with multiple monitored interfaces, the outage time (and the outage percentage) is effectively multiplied by the number of interfaces.

I saw this on 1.8.11, but I checked FishEye and the query is still the same there.
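
To make the effect concrete, here is a small arithmetic sketch. It assumes availability is derived as (avail_total - outage_seconds) / avail_total, which is an assumption about the report rather than something quoted from the .jrxml; the helper name and the sample outage length are hypothetical.

```python
# Illustrative arithmetic only; the helper name and sample numbers are hypothetical.
PERIOD_SECONDS = 7 * 24 * 60 * 60  # 604800, the avail_total for a 7-day period


def availability_percent(outage_seconds: int, interfaces: int, dedupe: bool) -> float:
    """Return node availability in percent for one node-wide outage.

    Without de-duplication the outage is summed once per monitored interface,
    which is the behaviour described in this report.
    """
    counted = outage_seconds if dedupe else outage_seconds * interfaces
    return 100.0 * (PERIOD_SECONDS - counted) / PERIOD_SECONDS


outage = 4 * 24 * 60 * 60  # a single 4-day node outage (345600 s)
buggy = availability_percent(outage, interfaces=2, dedupe=False)
fixed = availability_percent(outage, interfaces=2, dedupe=True)
print(f"summed per interface: {buggy:.3f}%")
print(f"counted once:         {fixed:.3f}%")
```

With two interfaces, any outage longer than half the period would push the computed availability below zero under this assumption.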

Environment

Linux, probably all

Acceptance / Success Criteria

None

Activity

Faisal M.A January 14, 2013 at 7:10 PM

Actually, I was wrong when I said that nodes under the default surveillance categories such as Production or Switches generate correct availability reports. They show the same problem.

Surveillance Category: Switches

                Outage   MTTR      Outage   Outage    Availability
                Count    (hours)   Hours    Percent   Percent

test_node.com   1        158.08    158.08   94.092    5.908
Average         1.00     158.08    158.08   94.092    5.908
Maximum         1.00     158.08    158.08   94.092    5.908
Minimum         1.00     158.08    158.08   94.092    5.908

Faisal M.A January 14, 2013 at 7:02 PM

Hi, I'm very new to OpenNMS, so please forgive my ignorance. I've installed version 1.10.7 on CentOS 6.3 and recently had issues with customized surveillance categories not showing the right information (http://opennms.530661.n2.nabble.com/SLM-categories-not-updating-correctly-td6497803.html#a7582131).

At the moment, I see negative values when I try to generate a Node Availability report for the nodes that belong to those customized surveillance categories. I also have other nodes that belong to default surveillance categories such as Production or Switches, and those produce availability reports in the correct format.

Any thoughts ?

Donald Desloge September 16, 2011 at 3:48 PM

Great, I'll drop it in and test it out. Thanks!

Tomás Heredia September 16, 2011 at 3:40 PM

Aha! here it is:
A single event with eventuei = 'uei.opennms.org/nodes/nodeDown' generates one outage (each with its own outageid) for every service on the node.
If the node has two interfaces with ICMP, the query returns two outages instead of one.
Take a look:
opennms=# select outageid, svclosteventid from outages where svclosteventid = 1089265 and outages.serviceid=1;
 outageid | svclosteventid
----------+----------------
    74896 |        1089265
    74893 |        1089265
(2 rows)
....

And finally I've got a solution:
@@ -101,8 +101,8 @@
     END AS
         outage_counter
 FROM
-    (SELECT
-        outages.nodeid,
+    (SELECT DISTINCT
+        outages.nodeid, outages.svclosteventid,
         least(('$P!{DS_START_TIME}'::TIMESTAMP + '$P!{DS_TIME_RANGE}'::INTERVAL), outages.ifregainedservice) as ifregainedservice,
         greatest('$P!{DS_START_TIME}'::TIMESTAMP,outages.iflostservice) as iflostservice
     FROM

Hope this helps!

I haven't seen a bug for that, but this also solves a negative availability issue in node availability reports.
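
The effect of the SELECT DISTINCT change can be sketched outside the report. The example below is a rough illustration, not the report's actual query: in-memory SQLite stands in for PostgreSQL, the table is a cut-down, hypothetical version of outages with times stored as epoch seconds, and the sample rows are only loosely modelled on the output quoted in this thread.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here; the table is a cut-down,
# hypothetical version of the outages table with times as epoch seconds.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outages ("
    " outageid INTEGER PRIMARY KEY, nodeid INTEGER, svclosteventid INTEGER,"
    " iflostservice INTEGER, ifregainedservice INTEGER)"
)
# One nodeDown event recorded as two outages, one per ICMP interface.
# Both rows cover the same 129-second window.
conn.executemany(
    "INSERT INTO outages VALUES (?, ?, ?, ?, ?)",
    [(74893, 10, 1089265, 1000, 1129), (74896, 10, 1089265, 1000, 1129)],
)

# Subquery shape before the patch: duplicate rows survive.
PLAIN = "SELECT nodeid, ifregainedservice, iflostservice FROM outages"
# Subquery shape after the patch: rows sharing node, event and times collapse.
DISTINCT = (
    "SELECT DISTINCT nodeid, svclosteventid, ifregainedservice, iflostservice"
    " FROM outages"
)


def outage_seconds(subquery: str) -> int:
    """Sum outage durations over the rows produced by the given subquery."""
    sql = f"SELECT SUM(ifregainedservice - iflostservice) FROM ({subquery})"
    return conn.execute(sql).fetchone()[0]


plain_total = outage_seconds(PLAIN)
distinct_total = outage_seconds(DISTINCT)
print(plain_total, distinct_total)
```

Including svclosteventid in the DISTINCT list keeps two genuinely separate outages apart even if their clamped start and end times happen to coincide.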

Thanks!

Tomás Heredia September 16, 2011 at 2:00 PM

You're right, I overlooked "events.eventuei = 'uei.opennms.org/nodes/nodeDown'".

I'm getting duplicated records anyway for the "outages_scope" subquery:

 nodeid |   ifregainedservice    |     iflostservice
--------+------------------------+------------------------
     10 | 2011-08-12 15:42:37-03 | 2011-08-12 15:40:28-03
     10 | 2011-08-12 15:42:37-03 | 2011-08-12 15:40:28-03

I'm investigating a little, and will update soon.
Thanks!

Fixed

Details

Created September 15, 2011 at 1:23 PM
Updated January 27, 2017 at 4:21 PM
Resolved September 16, 2011 at 4:48 PM