SNMP4J high thread churn

Description

Over the course of 5 minutes, more than 54,000 new threads were created:

More than 43,600 new DefaultUDPTransportMapping_0.0.0.0/0 threads were created and exited.
More than 10,000 new Timer threads were created and exited.
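To measure this kind of churn from inside a JVM rather than from thread dumps, the standard java.lang.management API keeps a running total of started threads. The sketch below samples ThreadMXBean.getTotalStartedThreadCount() before and after a piece of work, and the difference is the number of threads created in between. The class name ThreadChurnProbe is hypothetical, and the simulated workload only stands in for the SNMP traffic. Any other JVM threads started during the window are also counted, so the result can slightly overcount the workload itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: counts threads started in this JVM while some work runs. */
public class ThreadChurnProbe {

    /** Returns how many threads this JVM started while 'work' was running. */
    public static long threadsStartedDuring(Runnable work) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long before = mx.getTotalStartedThreadCount();
        work.run();
        return mx.getTotalStartedThreadCount() - before;
    }

    public static void main(String[] args) {
        // Simulated churn: start and join 100 short-lived threads, one per "request".
        long started = threadsStartedDuring(() -> {
            for (int i = 0; i < 100; i++) {
                Thread t = new Thread(() -> { });
                t.start();
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        System.out.println("threads started: " + started);
    }
}
```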

I have 3342 nodes defined in the database.

This is a pretty big performance issue for me, hence the blocker priority.

Environment

JDK 8u102. RHEL 6.

Acceptance / Success Criteria

None

Attachments

1 attachment, 19 Oct 2016, 09:05 PM



Activity

Ron Roskens June 16, 2017 at 1:42 PM

Probably. Closing this issue as a duplicate of https://opennms.atlassian.net/browse/NMS-9233#icft=NMS-9233.

Seth Leger June 14, 2017 at 12:43 PM

Is this related to NMS-9233?

Ron Roskens October 21, 2016 at 5:53 PM

As an exercise, I wrote a simple SNMP4J client that queries the system object (sysDescr.0) for a range of addresses on a network. It creates only a single Snmp session, shared by all queries, instead of a new Snmp session for each target, and it uses a single transport instance with a multithreaded message dispatcher pool. So I would think it is possible for org.opennms.netmgt.snmp.snmp4j.Snmp4JStrategy to keep a pool of transport threads and reuse them over time, instead of creating more than 43,600 threads over 5 minutes.

package net.elfin;

import java.io.IOException;
import org.snmp4j.CommunityTarget;
import org.snmp4j.MessageDispatcher;
import org.snmp4j.MessageDispatcherImpl;
import org.snmp4j.PDU;
import org.snmp4j.Snmp;
import org.snmp4j.event.ResponseEvent;
import org.snmp4j.mp.MPv2c;
import org.snmp4j.mp.SnmpConstants;
import org.snmp4j.smi.Address;
import org.snmp4j.smi.OID;
import org.snmp4j.smi.OctetString;
import org.snmp4j.smi.SMIConstants;
import org.snmp4j.smi.UdpAddress;
import org.snmp4j.smi.VariableBinding;
import org.snmp4j.transport.AbstractTransportMapping;
import org.snmp4j.transport.DefaultUdpTransportMapping;
import org.snmp4j.util.MultiThreadedMessageDispatcher;
import org.snmp4j.util.ThreadPool;

/**
 *
 * @author roskens
 */
public class Snmp4J {

    public static void main(String[] args) throws IOException {
        OctetString community = new OctetString(System.getProperty("snmp4j.community", "public"));
        final String network = System.getProperty("network", "127.0.0.");
        final int range = Integer.getInteger("range", 1);

        // One transport, one Snmp session and one dispatcher pool, shared by every target.
        AbstractTransportMapping<? extends Address> transport = new DefaultUdpTransportMapping();
        ThreadPool threadPool = ThreadPool.create("DispatcherPool", 10);
        MessageDispatcher mtDispatcher = new MultiThreadedMessageDispatcher(threadPool, new MessageDispatcherImpl());
        mtDispatcher.addMessageProcessingModel(new MPv2c());
        Snmp snmp = new Snmp(mtDispatcher, transport);
        transport.listen();

        CommunityTarget comtarget = new CommunityTarget();
        comtarget.setCommunity(community);
        comtarget.setVersion(SnmpConstants.version2c);
        comtarget.setRetries(2);
        comtarget.setTimeout(1000);

        // GET sysDescr.0
        PDU pdu = new PDU();
        OID oid = new OID(".1.3.6.1.2.1.1.1.0");
        pdu.add(new VariableBinding(oid));

        // Query <network>1 through <network><range>, inclusive.
        for (int i = 1; i <= range; i++) {
            comtarget.setAddress(new UdpAddress(network + i + "/" + 161));
            System.out.printf("Working on %s%d\n", network, i);
            ResponseEvent responseEvent = snmp.send(pdu, comtarget);
            // Check for an error first: when an error occurs the response is null as well.
            if (responseEvent.getError() != null) {
                System.out.printf("processResponse: Error during get operation. Error: %s, requestID=%s\n",
                        responseEvent.getError().getLocalizedMessage(), responseEvent.getRequest().getRequestID());
            } else if (responseEvent.getResponse() == null) {
                System.out.printf("processResponse: Timeout.\n");
            } else if (responseEvent.getResponse().getType() == PDU.REPORT) {
                System.out.printf("processResponse: Error during get operation. Report returned with varbinds: %s, requestID=%s\n",
                        responseEvent.getResponse().getVariableBindings(), responseEvent.getRequest().getRequestID());
            } else if (responseEvent.getResponse().getVariableBindings().size() < 1) {
                System.out.printf("processResponse: Received PDU with 0 varbinds. requestID=%s\n",
                        responseEvent.getRequest().getRequestID());
            } else if (responseEvent.getResponse().get(0).getSyntax() == SMIConstants.SYNTAX_NULL) {
                System.out.printf("processResponse: Null value returned in varbind: %s. requestID=%s\n",
                        responseEvent.getResponse().get(0), responseEvent.getRequest().getRequestID());
            } else {
                System.out.printf("processResponse: SNMP operation successful, value: %s\n",
                        responseEvent.getResponse().get(0).getVariable());
            }
        }
        snmp.close();
        transport.close();
        System.exit(0);
    }
}
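The idea in this comment is to stop paying for a new thread per target and instead reuse a bounded set of workers. The stdlib-only sketch below does not use SNMP4J or OpenNMS: fakeRequest is a hypothetical stand-in for one SNMP query, and CountingThreadFactory is a hypothetical helper that counts thread creations. It contrasts a per-request model, where every target gets a brand-new thread (the pattern the 43,600 DefaultUDPTransportMapping threads suggest), with a fixed pool of 10 workers, similar in spirit to the 10-thread DispatcherPool in the client above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PooledVsPerRequest {

    /** Hypothetical ThreadFactory that counts how many threads it has created. */
    static final class CountingThreadFactory implements ThreadFactory {
        final AtomicInteger created = new AtomicInteger();

        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "worker-" + created.incrementAndGet());
            t.setDaemon(true);
            return t;
        }
    }

    /** Hypothetical stand-in for one SNMP request; performs no I/O. */
    static void fakeRequest(int target) {
    }

    /** Per-request model: each target runs on a brand-new thread. */
    static int perRequest(int targets) throws InterruptedException {
        CountingThreadFactory factory = new CountingThreadFactory();
        for (int i = 0; i < targets; i++) {
            final int target = i;
            Thread t = factory.newThread(() -> fakeRequest(target));
            t.start();
            t.join();
        }
        return factory.created.get();
    }

    /** Pooled model: all targets share a fixed pool of reusable worker threads. */
    static int pooled(int targets, int poolSize) throws InterruptedException {
        CountingThreadFactory factory = new CountingThreadFactory();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize, factory);
        for (int i = 0; i < targets; i++) {
            final int target = i;
            pool.execute(() -> fakeRequest(target));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return factory.created.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int targets = 3342; // node count reported in this issue
        System.out.println("per-request threads: " + perRequest(targets));
        System.out.println("pooled threads:      " + pooled(targets, 10));
    }
}
```

With one request per node, the per-request model creates one thread per target, while the fixed pool never creates more than its 10 workers, no matter how many targets are queried.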

Ron Roskens October 19, 2016 at 9:19 PM

provisiond is set for 8 import threads, 10 scan threads, 10 rescan threads, and 8 write threads.
collectd is set for 100 threads.
pollerd is set for 100 threads.

There are 2074 active SNMP services across all nodes.

Ron Roskens October 19, 2016 at 9:09 PM

I used this command to list all threads created over the 5-minute period. A thread line with a count of 10 is assumed to have existed for the whole period, so the awk filter prints only lines with a count below 10.

egrep 'tid=' nms-8825.txt | perl -ple 's/\s*\[[0-9a-fx]+\]//g; s/ in Object.wait\(\)//; s/ runnable//; s/ waiting on condition//; s/ waiting for monitor entry//; s/ sleeping//;' |sort |uniq -c | awk '{if ($1 < 10) { print; }}'
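The same pipeline can be written in Java, which may be easier to extend (for example, to group threads by name prefix). This sketch applies the perl substitutions to every line containing "tid=", counts identical results, and keeps those seen fewer than a given number of times. The class name ShortLivedThreads and the dumps parameter (10 below, matching the awk threshold) are hypothetical, chosen to mirror the command; reading all dumps from one file also mirrors the command's single input file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ShortLivedThreads {

    /** Strips stack addresses and thread states, like the perl substitutions. */
    static String normalize(String line) {
        return line.replaceAll("\\s*\\[[0-9a-fx]+\\]", "")
                .replace(" in Object.wait()", "")
                .replace(" runnable", "")
                .replace(" waiting on condition", "")
                .replace(" waiting for monitor entry", "")
                .replace(" sleeping", "");
    }

    /**
     * Counts normalized "tid=" header lines across all dumps and keeps only
     * those seen fewer than 'dumps' times, i.e. threads that did not live
     * through the whole capture window (sort | uniq -c | awk '$1 < N').
     */
    static Map<String, Long> shortLived(List<String> lines, int dumps) {
        Map<String, Long> counts = lines.stream()
                .filter(l -> l.contains("tid="))
                .map(ShortLivedThreads::normalize)
                .collect(Collectors.groupingBy(s -> s, TreeMap::new, Collectors.counting()));
        counts.values().removeIf(c -> c >= dumps);
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 accepts any byte sequence, so odd bytes in a dump cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        shortLived(lines, 10).forEach((thread, count) -> System.out.println(count + " " + thread));
    }
}
```

Run it as java ShortLivedThreads nms-8825.txt. Like the shell version, it keys on the whole normalized header line (including tid and nid), so threads that share a name, such as the DefaultUDPTransportMapping threads, are still counted separately.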
Status: Fixed

Details

Created October 19, 2016 at 9:01 PM
Updated June 17, 2017 at 2:15 PM
Resolved June 16, 2017 at 1:42 PM