DnsResolutionMonitor should not use cache for lookups

Description

I followed the docs for DnsResolutionMonitor and ran into an issue.

The service is defined like this:

<service name="DNS-Resolution-v4" interval="300000" user-defined="true" status="on">
  <parameter key="retry" value="2"/>
  <parameter key="timeout" value="2000"/>
  <parameter key="resolution-type" value="v4"/>
  <parameter key="rrd-repository" value="/usr/share/opennms/share/rrd/response"/>
  <parameter key="rrd-base-name" value="dns-res-v4"/>
  <parameter key="ds-name" value="dns-res-v4"/>
  <parameter key="nameserver" value="8.8.8.8"/>
</service>
<monitor service="DNS-Resolution-v4" class-name="org.opennms.netmgt.poller.monitors.DNSResolutionMonitor" />

Scenario:

I added www.google.com and test.mydomain.com to ONMS as nodes (nodeLabels).
Each node was assigned an IP address from the local network and the DNS-Resolution-v4 service.
I added the poller config and restarted ONMS; both nodes came up.

Then I deleted the A record for test.mydomain.com. A test with nslookup immediately showed:

nslookup test.mydomain.com 8.8.8.8
Server:  8.8.8.8
Address: 8.8.8.8#53

** server can't find test.mydomain.com: NXDOMAIN

However, the DnsResolutionMonitor still had not reported the service as down after 10-15 minutes. Only after I restart ONMS does the poller recognize the outage.

The www.google.com node doesn't get any outages (used as reference).

Acceptance / Success Criteria

None

Activity

Seth Leger April 5, 2017 at 10:31 AM

commit 66d29d2f580053d7ee6fe6f498142180c3e21a29

Jesse White April 5, 2017 at 9:47 AM

Marcel Fuhrmann April 5, 2017 at 2:34 AM

Thank you very much!

Seth Leger April 4, 2017 at 5:05 PM

The TTL would be set on the DNS server, which would return the value inside the DNS records. dnsjava would then use this value as the expiration time for its internal cache.

I'll work on this issue; it looks like a one-line fix. I don't see any need for it to be configurable: we should always do an uncached lookup straight to the server inside the DnsResolutionMonitor.
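
To illustrate why a TTL-honoring cache hides a deleted record until the entry expires, here is a minimal sketch in plain Java. It is not dnsjava's implementation and not the code from the fix. The class `TtlCacheSketch`, the `FakeServer` stand-in, the address 192.0.2.10, and the one-hour TTL are all hypothetical, since the actual TTL of the record was not reported. In dnsjava itself, passing `null` to `Lookup.setCache` is one documented way to avoid permanent caching for a single lookup; whether the fix uses that call is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of a TTL-honoring resolver cache; not dnsjava's code. */
class TtlCacheSketch {

    /** Stand-in for the DNS server: host name -> address. */
    static final class FakeServer {
        private final Map<String, String> records = new HashMap<>();

        void put(String name, String address) { records.put(name, address); }

        void delete(String name) { records.remove(name); }

        Optional<String> query(String name) {
            return Optional.ofNullable(records.get(name));
        }
    }

    /** A cached answer together with its absolute expiry time. */
    private static final class Entry {
        final String address;
        final long expiresAtMillis;

        Entry(String address, long expiresAtMillis) {
            this.address = address;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final FakeServer server;
    private final long ttlMillis; // assumed TTL returned with each record
    private final Map<String, Entry> cache = new HashMap<>();

    TtlCacheSketch(FakeServer server, long ttlMillis) {
        this.server = server;
        this.ttlMillis = ttlMillis;
    }

    /** Cached lookup: serves the stored answer until its TTL has passed. */
    Optional<String> cachedLookup(String name, long nowMillis) {
        Entry entry = cache.get(name);
        if (entry != null && nowMillis < entry.expiresAtMillis) {
            return Optional.of(entry.address);
        }
        cache.remove(name); // expired or never cached
        Optional<String> answer = server.query(name);
        answer.ifPresent(a -> cache.put(name, new Entry(a, nowMillis + ttlMillis)));
        return answer;
    }

    /** Uncached lookup: always asks the server, so deletions show up at once. */
    Optional<String> uncachedLookup(String name) {
        return server.query(name);
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        server.put("test.mydomain.com", "192.0.2.10");
        TtlCacheSketch resolver = new TtlCacheSketch(server, 3_600_000L);

        resolver.cachedLookup("test.mydomain.com", 0L); // first poll fills the cache
        server.delete("test.mydomain.com");             // A record is removed

        // 15 minutes later the cached path still resolves; the uncached one does not.
        System.out.println(resolver.cachedLookup("test.mydomain.com", 900_000L).isPresent()); // true
        System.out.println(resolver.uncachedLookup("test.mydomain.com").isPresent());         // false
    }
}
```

In this model the cached path would only notice the deletion once the TTL has passed, or after a restart empties the cache, which matches the behavior described in the report.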

Marcel Fuhrmann April 3, 2017 at 3:26 PM

Thank you for analyzing this issue. Does this call have to be made in the source code, or can it be set in the configuration?

I'm not sure how the TTL was set. I can find out in the next few days.

Fixed

Details

Created March 22, 2017 at 10:34 AM
Updated April 5, 2017 at 1:47 PM
Resolved April 5, 2017 at 9:47 AM
