Inconsistent Data Spike for In/Out Utilization
Description
Activity

Jesse White September 13, 2019 at 2:50 PM
As a follow-up to this, we'll look at adding some additional utilities to help make debugging issues like this easier.

Jesse White September 13, 2019 at 2:50 PM
Also, the spikes are inconsistent because the values are averaged differently depending on the time range that is selected.
To mitigate this, you could try limiting the max/min values to 100/-100 by editing the RPN expressions for the graph, or switch to Grafana to view the graph and apply the "Outlier" filter to help remove the spikes.
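To illustrate why the spike height depends on the selected time range, here is a minimal Python sketch. It is not the actual consolidation code used by the graphing backend; the function names, the bucket sizes, and the utilization values are all hypothetical. It averages the same series at two resolutions, then applies a 100/-100 clamp similar in spirit to the RPN change suggested above.

```python
def consolidate_average(values, bucket_size):
    """Average consecutive values in fixed-size buckets (assumed consolidation)."""
    buckets = [values[i:i + bucket_size] for i in range(0, len(values), bucket_size)]
    return [sum(bucket) / len(bucket) for bucket in buckets]


def clamp(value, low=-100.0, high=100.0):
    """Limit a percentage to the [low, high] range."""
    return max(low, min(high, value))


# Hypothetical utilization percentages: eleven normal samples and one bad one.
utilization = [5.0] * 11 + [10_000_000.0]

# Fine resolution: the bad sample is shown at full height.
peak_fine = max(consolidate_average(utilization, 1))

# Coarse resolution: the bad sample is averaged with its neighbors,
# so the displayed spike is much lower.
peak_coarse = max(consolidate_average(utilization, 12))

# Clamping each value before display removes the spike entirely.
peak_clamped = max(clamp(v) for v in utilization)
```

With these made-up numbers, the same bad sample shows up as a 10 million percent spike at fine resolution and as a spike under 1 million percent at coarse resolution, which is consistent with the spike changing size across time ranges.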

Jesse White September 13, 2019 at 2:44 PM
After analyzing the data for these graphs on the affected system, I found that the spikes are not related to counter wraps, but are related to what I suspect are values of "0" being returned by the agent.
The samples look like:
Here we see the values go from > 6 trillion to 0 and back up to > 6 trillion again. Going from 6 trillion to 0 is treated as a counter wrap, and that sample is replaced with a NaN, but going from 0 back up to > 6 trillion triggers the spike.
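The behavior described above can be sketched in Python as follows. This is a simplified model, not the actual collection code: the function name, the 300-second step, and the sample values are assumptions made for illustration, and the real samples from the affected system are not reproduced here.

```python
import math


def counter_rates(samples, step_seconds):
    """Per-second rates between consecutive counter samples.

    A decrease is treated as a counter wrap and reported as NaN
    (an assumption mirroring the behavior described in this comment).
    """
    rates = []
    for prev, curr in zip(samples, samples[1:]):
        delta = curr - prev
        if delta < 0:
            rates.append(math.nan)  # looks like a wrap, so the value is dropped
        else:
            rates.append(delta / step_seconds)
    return rates


# Hypothetical octet-counter readings with a single bogus 0 in the middle.
samples = [6_000_000_000_000, 6_000_000_300_000, 0, 6_000_000_900_000]
rates = counter_rates(samples, step_seconds=300)

normal_rate = rates[0]   # ordinary traffic
wrap_rate = rates[1]     # 6 trillion -> 0: NaN
spike_rate = rates[2]    # 0 -> 6 trillion: enormous apparent rate
```

In this sketch the drop to 0 is hidden as a NaN, but the recovery from 0 looks like a legitimate increase of about 6 trillion within one step, which is what produces the spike.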

Berk Amons September 10, 2019 at 7:38 PM
David,
There are a few small gaps for that node, but there are also other nodes with large gaps that span weeks at a time. See screenshots attached. The one titled "smallGaps" shows gaps in the same node that has the spike. The one titled "largeGap" shows a different node with a large gap in data collection.

David Hustace August 29, 2019 at 8:38 PM
We are also seeing some other nodes with gaps in data collection. Not sure if these issues could be related.
Berk, does "other nodes" imply that this node with the spike has gaps in graphs, too?
We are seeing an inconsistent data spike on a graph for in/out utilization for a specific device. The spike is reaching various levels ranging from 1 million to 10 million percent. The spike appears to be due to a device reboot, but it only appears in certain scenarios. The data spike appears when certain time intervals are selected for the graph, but the data looks normal when other time intervals (still including the time of reboot) are selected. I opened a case for this and had a discussion with Alejandro Galue, but after determining the spike was inconsistent, he recommended I open a bug here. I will include some screenshots to help explain the issue.