Improve handling of counter wraps/resets when using Newts

Description

When tracking a counter, its value is typically reset to 0 when the device is restarted. When Newts aggregates these values, it interprets the drop as a counter wrap, which produces extremely large values (in the petabyte range) and causes spikes in graphs. These spikes distort the graphs and make it difficult to see the other values.
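
To illustrate the arithmetic, here is a minimal sketch of a generic wrap-aware delta. It is not the Newts implementation; the class and method names are hypothetical, and the counter widths (32 and 64 bits) are assumptions. It shows how a reset from 5,000,000 to 1,000 turns into a huge delta once the drop is treated as a wrap.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration only; not part of Newts.
class CounterWrapSpike {

    // Delta between two samples. A decrease is treated as a wrap of a
    // counter that is `bits` wide (assumed width, e.g. 32 or 64).
    static BigInteger wrapAwareDelta(long previous, long current, int bits) {
        BigInteger prev = BigInteger.valueOf(previous);
        BigInteger cur = BigInteger.valueOf(current);
        if (cur.compareTo(prev) >= 0) {
            // Normal case: the counter only grew.
            return cur.subtract(prev);
        }
        // Assumed wrap: distance to the maximum plus the new value.
        BigInteger modulus = BigInteger.ONE.shiftLeft(bits);
        return modulus.subtract(prev).add(cur);
    }

    public static void main(String[] args) {
        // Device restarted: the counter dropped from 5,000,000 to 1,000.
        System.out.println(wrapAwareDelta(5_000_000L, 1_000L, 32)); // 4289968296
        System.out.println(wrapAwareDelta(5_000_000L, 1_000L, 64)); // 18446744073704552616
    }
}
```

The real change was only about 1,000 units, but the wrap assumption reports billions (32-bit) or roughly 1.8e19 (64-bit), which is what shows up as a spike.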

To help remedy this problem with other persistence strategies, we have created tools like the JRobin Spike Hunter: https://wiki.opennms.org/wiki/JRobin_Spike_Hunter

Since Newts performs late aggregation, we could look at enhancing the "counter wrap" logic to help prevent these spikes.

Acceptance / Success Criteria

None

Attachments

2

Activity

Alejandro Galue April 4, 2017 at 9:51 AM
Edited

I verified on a VM with Meridian 2016.1.4 that the spikes are not rendered and won't affect the data returned by the Measurements API when using org.opennms.newts.nan_on_counter_wrap=true.

Considering how easy it is to introduce a spike in OpenNMS, I would set that flag to true by default for Meridian 2017 and the next Horizon release.

After chatting with , we think this attribute should be true by default:

Jesse White April 4, 2017 at 8:02 AM

Jesse White April 3, 2017 at 1:36 PM

In Newts 1.3.4, I've added the ability to disable counter wraps globally using the org.opennms.newts.nan_on_counter_wrap system property. The attached screenshots show a graph with and without the flag enabled.

Jesse White March 30, 2017 at 8:24 AM

Assuming that most counter wraps we encounter are due to resets (devices/services restarting) and not actually due to the counter growing so large that it wraps around, it could be sufficient to simply return a NaN when a wrap is detected. We could allow this behavior to be controlled by a system property.
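
A rough sketch of that idea follows. This is not the actual Newts code: the class and method names are hypothetical, and the 32-bit fallback is an assumption made to keep the example small. The property name is the one used elsewhere in this issue, and it is read with the standard `Boolean.getBoolean` call.

```java
// Hypothetical sketch for illustration only; not the Newts implementation.
class CounterRate {

    // System property named in this issue.
    static final String NAN_ON_WRAP_PROPERTY = "org.opennms.newts.nan_on_counter_wrap";

    // Per-second rate between two counter samples.
    static double rate(long previous, long current, double elapsedSeconds, boolean nanOnWrap) {
        if (current >= previous) {
            // Normal case: the counter only grew.
            return (current - previous) / elapsedSeconds;
        }
        if (nanOnWrap) {
            // Treat the decrease as a reset and report "no data" instead of a spike.
            return Double.NaN;
        }
        // Fallback (assumption): the counter is 32 bits wide and really wrapped.
        long wrappedDelta = (1L << 32) - previous + current;
        return wrappedDelta / elapsedSeconds;
    }

    // Same computation, with the behavior controlled by the system property.
    static double rate(long previous, long current, double elapsedSeconds) {
        return rate(previous, current, elapsedSeconds, Boolean.getBoolean(NAN_ON_WRAP_PROPERTY));
    }

    public static void main(String[] args) {
        // Counter dropped from 5,000,000 to 1,000 over a 300 second step.
        System.out.println(rate(5_000_000L, 1_000L, 300.0, true));  // NaN
        System.out.println(rate(5_000_000L, 1_000L, 300.0, false)); // a spike in the millions per second
    }
}
```

The trade-off is that a genuine wrap also yields a NaN, so one sample is lost in that case, which is acceptable if resets are far more common than true wraps.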

Fixed

Created March 30, 2017 at 8:23 AM
Updated April 4, 2017 at 1:49 PM
Resolved April 4, 2017 at 9:51 AM