Improve handling of counter wraps/reset when using Newts

Description

When tracking a counter, its value is typically reset to 0 when the device is restarted. When these values are aggregated in Newts, the reset is detected as a counter wrap, which produces extremely large values (in the petabyte range) and causes spikes in graphs. The spikes distort the graphs and make it difficult to see the other values.
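
To illustrate why a reset turns into a spike, here is a minimal sketch of conventional wrap handling. It is not the Newts implementation: the class name, the method name, the 64-bit counter width and the 300 second collection step are all assumptions chosen for illustration.

```java
import java.math.BigInteger;

// Hypothetical illustration, not code from Newts.
class CounterWrapSketch {
    // Assumed counter width: 64 bits, so values wrap modulo 2^64.
    static final BigInteger MOD_64 = BigInteger.ONE.shiftLeft(64);

    // Conventional handling: any decrease is assumed to be a wrap,
    // so the modulus is added back to obtain a non-negative delta.
    static BigInteger deltaAssumingWrap(BigInteger previous, BigInteger current) {
        BigInteger diff = current.subtract(previous);
        return diff.signum() >= 0 ? diff : diff.add(MOD_64);
    }

    public static void main(String[] args) {
        BigInteger beforeRestart = BigInteger.valueOf(5_000_000_000L); // last sample before the restart
        BigInteger afterRestart = BigInteger.valueOf(1_200L);          // first sample after the reset to 0

        BigInteger delta = deltaAssumingWrap(beforeRestart, afterRestart);
        double ratePerSecond = delta.doubleValue() / 300.0; // assumed 300 s step

        System.out.println("delta = " + delta);
        System.out.println("rate  = " + ratePerSecond + " per second");
    }
}
```

Because the counter actually went back to 0 rather than past its maximum, the computed delta is close to 2^64, and the resulting rate dwarfs every legitimate value on the graph.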

To help remedy this problem with other persistence strategies, we have created tools like the JRobin Spike Hunter (https://wiki.opennms.org/wiki/JRobin_Spike_Hunter).

Since Newts performs late aggregation, we could look at enhancing the "counter wrap" logic to help prevent these spikes.

Acceptance / Success Criteria

None

Attachments

Activity

Alejandro Galue April 4, 2017 at 9:51 AM
Edited

I verified on a VM with Meridian 2016.1.4 that the spikes are not rendered and won't affect the data returned by the Measurements API when using org.opennms.newts.nan_on_counter_wrap=true.

Considering how easy it is to introduce a spike in OpenNMS, I would set that flag to true by default for Meridian 2017 and the next Horizon release.

After chatting with , we think this attribute should be true by default:

Jesse White April 4, 2017 at 8:02 AM

PR: https://github.com/OpenNMS/opennms/pull/1411

Jesse White April 3, 2017 at 1:36 PM

In Newts 1.3.4, I've added the ability to disable counter wraps globally using the org.opennms.newts.nan_on_counter_wrap system property. The attached screenshots show a graph with and without the flag enabled.

Jesse White March 30, 2017 at 8:24 AM

Assuming that most counter wraps we encounter are due to resets (devices/services restarting) and not actually due to the counter growing so large that it wraps around, it could be sufficient to simply return a NaN when a wrap is detected. We could allow this behavior to be controlled by a system property.
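
The idea above can be sketched roughly as follows. This is a hedged illustration, not the code that ended up in Newts: the class and method names are hypothetical, the 64-bit counter width is assumed, and only the property name org.opennms.newts.nan_on_counter_wrap comes from this issue.

```java
// Hypothetical illustration, not code from Newts.
class NanOnWrapSketch {
    // Property name as given in this issue; how Newts reads it is an assumption.
    static final String PROPERTY = "org.opennms.newts.nan_on_counter_wrap";

    // Assumed counter width: 64 bits.
    static final double TWO_POW_64 = Math.pow(2, 64);

    // Returns the increase between two counter samples. When the counter
    // decreased, either report NaN (treat it as a reset) or fall back to
    // the conventional wrap correction.
    static double counterDelta(double previous, double current, boolean nanOnWrap) {
        if (current >= previous) {
            return current - previous;
        }
        return nanOnWrap ? Double.NaN : current - previous + TWO_POW_64;
    }

    public static void main(String[] args) {
        // Boolean.getBoolean is true only if the system property is set to "true".
        boolean nanOnWrap = Boolean.getBoolean(PROPERTY);
        System.out.println(counterDelta(5_000_000_000.0, 1_200.0, nanOnWrap));
    }
}
```

Starting the JVM with -Dorg.opennms.newts.nan_on_counter_wrap=true makes the sketch print NaN for the reset sample. A NaN leaves a gap in the graph instead of a spike, at the cost of also dropping the one interval in which a genuine wrap occurs.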

Fixed

Details
Assignee
Jesse White
Reporter
Jesse White
Labels
support
Components
Sprint
None
Fix versions
19.1.0
Meridian-2016.1.5
Affects versions
Meridian-2016.1.4
19.0.1
Priority
Major

Created March 30, 2017 at 8:23 AM

Updated April 4, 2017 at 1:49 PM

Resolved April 4, 2017 at 9:51 AM
