Prometheus Collector attempting to persist non-integer values to counters
Description
RRD files are initially created correctly, but we only get NaNs (please see also the attached rrdtool dump file total_idle.xml). This behaviour is only seen with Prometheus data that has two labels, for example:
windows_cpu_time_total{core="0,0",mode="idle"}
Exporter data with at most one label stores values correctly.
ls -al /var/lib/opennms/rrd/snmp/fs/01-Test/1605513515011/winExporterCPU/0_0/total_idle.rrd
-rw-r--r-- 1 root root 38232 Nov 16 09:34 /var/lib/opennms/rrd/snmp/fs/01-Test/1605513515011/winExporterCPU/0_0/total_idle.rrd
and in graph data from Resource Graphs:
Mon Nov 16 09:40:00 2020 NaN
Mon Nov 16 09:35:00 2020 NaN
Mon Nov 16 09:30:00 2020 NaN
Mon Nov 16 09:25:00 2020 NaN
Mon Nov 16 09:20:00 2020 NaN
Mon Nov 16 09:15:00 2020 NaN
Mon Nov 16 09:10:00 2020 NaN
Mon Nov 16 09:05:00 2020 NaN
please see also attached rrdtool dump
Acceptance / Success Criteria
None
Attachments
4
Activity
Dino Yancey December 3, 2020 at 9:31 PM
Merged
Dino Yancey November 21, 2020 at 4:07 AM
The collector is trying to persist non-integer values to counter datasources. Apparently jrobin (and newts, I'm sure) are more tolerant of this than rrd-based strategies.
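As a rough illustration of the mismatch described above, the following Python sketch checks whether a sample value is a whole number and so could be written to a counter-type datasource that only accepts integers. The helper name `is_counter_safe` is hypothetical and not part of OpenNMS or rrdtool. The sample values are taken from the windows_exporter output in this issue.

```python
def is_counter_safe(value: float) -> bool:
    """Return True if value is a non-negative whole number.

    NaN and infinity are rejected: NaN fails the >= comparison and
    float.is_integer() is False for infinity.
    """
    return value >= 0 and float(value).is_integer()


# Values copied from the windows_cpu_time_total samples for core "0,0".
samples = {
    "privileged": 2093.0,
    "idle": 248335.359375,
    "dpc": 5.984375,
}

for mode, value in samples.items():
    verdict = "whole number" if is_counter_safe(value) else "non-integer"
    print(f"{mode}: {value} -> {verdict}")
```

Only the `privileged` sample passes this check; the fractional CPU-time values are the kind of input a strict counter datasource may refuse.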
Curiouser and curiouser. Something with the rrd strategy seems to be silently failing when trying to write the sample data, for a majority of the samples. I left a collection running for a few days on a 30 second interval and when I revisited this issue it contained data, but with very inconsistent timestamps:
Dino Yancey November 18, 2020 at 7:53 PM
* edited after further research *
Your collection works for me on 27.0.0 if I use JRobinRrdStrategy / jrobin, but nothing persists if I use MultithreadedJniRrdStrategy / rrd.
Not sure why this is, but it's a useful data point.
Martin Lärcher November 16, 2020 at 10:13 AM
The same behaviour with node-exporter and samples from here:
RRD files are initially created correctly, but we only get NaNs (please see also the attached rrdtool dump file total_idle.xml).
This behaviour is only seen with Prometheus data that has two labels, for example:
windows_cpu_time_total{core="0,0",mode="idle"}
Exporter data with at most one label stores values correctly.
tested with windows_exporter (https://github.com/prometheus-community/windows_exporter) installed on Windows Server 2016
to reproduce:
install windows_exporter on a Windows node
use the attached datacollection configs and foreign-source configs to add the Windows node with the installed Prometheus exporter to a requisition, then synchronize
wait some time, open the node's Resource Graphs and take a look at the graph data
example data from windows_exporter:
windows_terminal_services_local_session_count{session="active"} 0
windows_terminal_services_local_session_count{session="inactive"} 3
windows_terminal_services_local_session_count{session="total"} 3
windows_logical_disk_free_bytes{volume="C:"} 9.302966272e+09
windows_logical_disk_free_bytes{volume="HarddiskVolume1"} 1.54140672e+08
windows_logical_disk_size_bytes{volume="C:"} 4.2422239232e+10
windows_logical_disk_size_bytes{volume="HarddiskVolume1"} 5.23239424e+08
windows_cpu_time_total{core="0,0",mode="dpc"} 5.984375
windows_cpu_time_total{core="0,0",mode="idle"} 248335.359375
windows_cpu_time_total{core="0,0",mode="interrupt"} 15.25
windows_cpu_time_total{core="0,0",mode="privileged"} 2093
windows_cpu_time_total{core="0,0",mode="user"} 7767.921875
windows_cpu_time_total{core="0,1",mode="dpc"} 46.03125
windows_cpu_time_total{core="0,1",mode="idle"} 250622.234375
windows_cpu_time_total{core="0,1",mode="interrupt"} 119.84375
windows_cpu_time_total{core="0,1",mode="privileged"} 2224.484375
windows_cpu_time_total{core="0,1",mode="user"} 5349.40625
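To make it easier to see which of the samples above carry fractional values, here is a minimal Python sketch that parses lines in this text format into a name, a label dictionary, and a float. It is a simplification for illustration only: `parse_sample` is a hypothetical helper, it ignores the optional trailing timestamp and does not unescape label values, and it is not how the OpenNMS Prometheus collector parses its input. Note that the label value `core="0,0"` contains a comma, so labels cannot simply be split on commas.

```python
import re

# metric name, optional {label block}, then a single value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
# key="value" pairs; the quoted value may contain commas or escaped characters
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Split one sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


lines = [
    'windows_logical_disk_free_bytes{volume="C:"} 9.302966272e+09',
    'windows_cpu_time_total{core="0,0",mode="idle"} 248335.359375',
    'windows_cpu_time_total{core="0,0",mode="privileged"} 2093',
]

for line in lines:
    name, labels, value = parse_sample(line)
    kind = "whole" if value.is_integer() else "fractional"
    print(name, labels, value, kind)
```

In this sample set the byte counts written in scientific notation are still whole numbers, while most of the `windows_cpu_time_total` values are fractional.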
no errors found in the collectd debug logs:
2020-11-16 09:34:08,475 DEBUG [Collectd-Thread-9-of-150] o.o.n.c.s.AbstractCollectionAttribute: Visiting attribute Attribute[total_idle:250299.53125]
2020-11-16 09:34:08,476 DEBUG [Collectd-Thread-9-of-150] o.o.n.c.a.AbstractPersister: Persisting Attribute[total_idle:250299.53125]
2020-11-16 09:34:08,476 DEBUG [Collectd-Thread-9-of-150] o.o.n.c.a.AbstractPersister: Storing attribute Attribute[total_idle:250299.53125]
2020-11-16 09:34:08,476 INFO [Collectd-Thread-9-of-150] o.o.n.r.RrdMetaDataUtils: createMetaDataFile: creating meta data file /var/lib/opennms/rrd/snmp/fs/01-Test/1605513515011/winExporterCPU/0_0/total_idle.meta with values '{GROUP=windows_exporter_cpu_time, total_idle=total_idle}'
2020-11-16 09:34:08,476 DEBUG [Collectd-Thread-9-of-150] o.o.n.r.r.MultithreadedJniRrdStrategy: createDefinition: filename [/var/lib/opennms/rrd/snmp/fs/01-Test/1605513515011/winExporterCPU/0_0/total_idle.rrd] already exists returning null as definition
2020-11-16 09:34:08,476 DEBUG [Collectd-Thread-9-of-150] o.o.n.r.r.MultithreadedJniRrdStrategy: createRRD: skipping RRD file
2020-11-16 09:34:08,476 INFO [Collectd-Thread-9-of-150] o.o.n.c.p.r.RrdPersistOperationBuilder: updateRRD: updating RRD file /var/lib/opennms/rrd/snmp/fs/01-Test/1605513515011/winExporterCPU/0_0/total_idle.rrd with values '1605515648:250299.53125'
2020-11-16 09:34:08,476 DEBUG [Collectd-Thread-9-of-150] o.o.n.c.p.r.RrdPersistOperationBuilder: updateRRD: RRD update command completed.
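Because the log reports the update as completed even though nothing usable ends up in the file, it can help to scan the log for updates whose value is not a whole number. The Python sketch below does that for `updateRRD` lines in the format shown above. The function name `find_fractional_updates` is hypothetical, and the pattern assumes a single `timestamp:value` pair per update, as in this log excerpt.

```python
import re

# Matches the "updateRRD: updating RRD file <path> with values '<ts>:<value>'" lines.
UPDATE_RE = re.compile(
    r"updateRRD: updating RRD file (?P<path>\S+) "
    r"with values '(?P<ts>\d+):(?P<value>[^':]+)'"
)


def find_fractional_updates(log_lines):
    """Return (path, timestamp, value) for every update with a non-integer value."""
    hits = []
    for line in log_lines:
        match = UPDATE_RE.search(line)
        if match is None:
            continue
        try:
            value = float(match.group("value"))
        except ValueError:
            # Skip anything that is not a plain number.
            continue
        if not value.is_integer():
            hits.append((match.group("path"), int(match.group("ts")), value))
    return hits


log = [
    "2020-11-16 09:34:08,476 INFO [Collectd-Thread-9-of-150] "
    "o.o.n.c.p.r.RrdPersistOperationBuilder: updateRRD: updating RRD file "
    "/var/lib/opennms/rrd/snmp/fs/01-Test/1605513515011/winExporterCPU/0_0/total_idle.rrd "
    "with values '1605515648:250299.53125'",
]

for path, ts, value in find_fractional_updates(log):
    print(ts, value, path)
```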
the rrd file exists and gets updated:
ls -al /var/lib/opennms/rrd/snmp/fs/01-Test/1605513515011/winExporterCPU/0_0/total_idle.rrd
-rw-r--r-- 1 root root 38232 Nov 16 09:34 /var/lib/opennms/rrd/snmp/fs/01-Test/1605513515011/winExporterCPU/0_0/total_idle.rrd
and in graph data from Resource Graphs:
Mon Nov 16 09:40:00 2020 NaN
Mon Nov 16 09:35:00 2020 NaN
Mon Nov 16 09:30:00 2020 NaN
Mon Nov 16 09:25:00 2020 NaN
Mon Nov 16 09:20:00 2020 NaN
Mon Nov 16 09:15:00 2020 NaN
Mon Nov 16 09:10:00 2020 NaN
Mon Nov 16 09:05:00 2020 NaN
please see also attached rrdtool dump