SNMP4J high thread churn
Activity
Ron Roskens June 16, 2017 at 1:42 PM
Probably. Closing this issue as a duplicate of NMS-9233 (https://opennms.atlassian.net/browse/NMS-9233).
Seth Leger June 14, 2017 at 12:43 PM
Is this related to NMS-9233?
Ron Roskens October 21, 2016 at 5:53 PM
As an exercise, I wrote a simple SNMP4J client that queries the system object for a range of addresses on a network. It creates a single Snmp session that is used for all queries, instead of a new Snmp session for each target, and it uses a single transport instance with a multithreaded message dispatcher pool. So I would think that org.opennms.netmgt.snmp.snmp4j.Snmp4JStrategy could keep a pool of transport threads and reuse them over time, instead of creating 43,600+ threads over 5 minutes.
package net.elfin;

import java.io.IOException;

import org.snmp4j.CommunityTarget;
import org.snmp4j.MessageDispatcher;
import org.snmp4j.MessageDispatcherImpl;
import org.snmp4j.PDU;
import org.snmp4j.Snmp;
import org.snmp4j.event.ResponseEvent;
import org.snmp4j.mp.MPv2c;
import org.snmp4j.mp.SnmpConstants;
import org.snmp4j.smi.Address;
import org.snmp4j.smi.OID;
import org.snmp4j.smi.OctetString;
import org.snmp4j.smi.SMIConstants;
import org.snmp4j.smi.UdpAddress;
import org.snmp4j.smi.VariableBinding;
import org.snmp4j.transport.AbstractTransportMapping;
import org.snmp4j.transport.DefaultUdpTransportMapping;
import org.snmp4j.util.MultiThreadedMessageDispatcher;
import org.snmp4j.util.ThreadPool;

/**
 * Queries sysDescr.0 on a range of addresses through one shared Snmp session.
 *
 * @author roskens
 */
public class Snmp4J {
    public static void main(String[] args) throws IOException {
        OctetString community = new OctetString(System.getProperty("snmp4j.community", "public"));
        final String network = System.getProperty("network", "127.0.0.");
        final int range = Integer.getInteger("range", 1);

        // One transport and one dispatcher thread pool, shared by every request.
        AbstractTransportMapping<? extends Address> transport = new DefaultUdpTransportMapping();
        ThreadPool threadPool = ThreadPool.create("DispatcherPool", 10);
        MessageDispatcher mtDispatcher = new MultiThreadedMessageDispatcher(threadPool, new MessageDispatcherImpl());
        mtDispatcher.addMessageProcessingModel(new MPv2c());

        // A single Snmp session is reused for all targets.
        Snmp snmp = new Snmp(mtDispatcher, transport);
        transport.listen();

        CommunityTarget comtarget = new CommunityTarget();
        comtarget.setCommunity(community);
        comtarget.setVersion(SnmpConstants.version2c);
        comtarget.setRetries(2);
        comtarget.setTimeout(1000);

        // GET request for sysDescr.0.
        PDU pdu = new PDU();
        pdu.setType(PDU.GET);
        OID oid = new OID(".1.3.6.1.2.1.1.1.0");
        pdu.add(new VariableBinding(oid));

        // Inclusive upper bound, so the default range of 1 still queries one host.
        for (int i = 1; i <= range; i++) {
            comtarget.setAddress(new UdpAddress(network + i + "/" + 161));
            System.out.printf("Working on %s%d\n", network, i);

            // Synchronous send: blocks until a response arrives or the retries time out.
            ResponseEvent responseEvent = snmp.send(pdu, comtarget);
            PDU response = responseEvent.getResponse();
            // Integer32 is not a java.lang.Number, so unwrap it before formatting with %d.
            int requestId = responseEvent.getRequest().getRequestID().getValue();

            // Check for an error first; a failed request also has a null response.
            if (responseEvent.getError() != null) {
                System.out.printf("processResponse: Error during get operation. Error: %s, requestID=%d\n", responseEvent.getError().getLocalizedMessage(), requestId);
            } else if (response == null) {
                System.out.printf("processResponse: Timeout.\n");
            } else if (response.getType() == PDU.REPORT) {
                System.out.printf("processResponse: Error during get operation. Report returned with varbinds: %s, requestID=%d\n", response.getVariableBindings(), requestId);
            } else if (response.getVariableBindings().size() < 1) {
                System.out.printf("processResponse: Received PDU with 0 varbinds. requestID=%d\n", requestId);
            } else if (response.get(0).getSyntax() == SMIConstants.SYNTAX_NULL) {
                System.out.printf("processResponse: Null value returned in varbind: %s. requestID=%d\n", response.get(0), requestId);
            } else {
                System.out.printf("processResponse: SNMP operation successful, value: %s\n", response.get(0).getVariable());
            }
        }

        snmp.close();
        transport.close();
        System.exit(0);
    }
}
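The effect of reusing threads can be shown without SNMP4J at all. The following is a minimal sketch that uses only java.util.concurrent; the class name ThreadChurnDemo, the counting factory, and the empty tasks are illustrative assumptions and are not taken from OpenNMS or SNMP4J. It compares one new thread per task (similar in spirit to one new transport per target) with a fixed pool that is shared by all tasks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadChurnDemo {

    /** Hypothetical helper: counts how many threads it is asked to create. */
    static final class CountingFactory implements ThreadFactory {
        final AtomicInteger created = new AtomicInteger();

        @Override
        public Thread newThread(Runnable r) {
            created.incrementAndGet();
            return new Thread(r);
        }
    }

    /** Runs each task on its own short-lived thread and returns the number of threads created. */
    public static int threadPerTask(int tasks) {
        CountingFactory factory = new CountingFactory();
        try {
            for (int i = 0; i < tasks; i++) {
                Thread t = factory.newThread(() -> { });
                t.start();
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return factory.created.get();
    }

    /** Runs all tasks on one fixed-size pool and returns the number of threads created. */
    public static int pooled(int tasks, int poolSize) {
        CountingFactory factory = new CountingFactory();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize, factory);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> { });
            }
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return factory.created.get();
    }

    public static void main(String[] args) {
        int tasks = 2074; // illustrative: one task per active SNMP service
        System.out.printf("thread per task: %d threads created%n", threadPerTask(tasks));
        System.out.printf("pool of 10:      %d threads created%n", pooled(tasks, 10));
    }
}
```

The first variant creates as many threads as there are tasks, while the pooled variant never creates more threads than the pool size, no matter how many tasks are submitted.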
Ron Roskens October 19, 2016 at 9:19 PM
Provisiond is configured with 8 import threads, 10 scan threads, 10 rescan threads, and 8 write threads.
Collectd is configured with 100 threads.
Pollerd is configured with 100 threads.
There are 2,074 active SNMP services across all nodes.
Ron Roskens October 19, 2016 at 9:09 PM
I used this command to list all threads that were created during the 5-minute period. Any line with a count of 10 is assumed to belong to a thread that existed for the whole period, so only lines with a lower count are printed.
egrep 'tid=' nms-8825.txt | perl -ple 's/\s*\[[0-9a-fx]+\]//g; s/ in Object.wait\(\)//; s/ runnable//; s/ waiting on condition//; s/ waiting for monitor entry//; s/ sleeping//;' |sort |uniq -c | awk '{if ($1 < 10) { print; }}'
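The same counting idea can be sketched in Java. This is an approximation of the pipeline above, not a drop-in replacement: it assumes jstack-style thread header lines of the form "name" ... tid=0x... and keys each thread by its name and tid rather than by the whole normalized line. The class and method names are hypothetical, and the threshold of 10 mirrors the awk filter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortLivedThreads {

    // Assumed header format: "thread name" ... tid=0x... nid=0x... state
    private static final Pattern HEADER = Pattern.compile("^\"(.*)\".*?\\btid=(\\S+)");

    /** Counts how often each thread (name plus tid) appears in the given dump lines. */
    public static Map<String, Integer> countByThread(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + " tid=" + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps only the threads that appear in fewer than the given number of dumps. */
    public static Map<String, Integer> shortLived(List<String> lines, int dumps) {
        Map<String, Integer> result = new LinkedHashMap<>(countByThread(lines));
        result.values().removeIf(count -> count >= dumps);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the file that holds the concatenated thread dumps.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        shortLived(lines, 10).forEach((thread, count) -> System.out.printf("%7d %s%n", count, thread));
    }
}
```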
Details
Assignee: Unassigned
Reporter: Ron Roskens
Priority: Critical

Description
Over the course of 5 minutes, more than 54,000 new threads were created, an average of about 180 per second.
More than 43,600 new DefaultUDPTransportMapping_0.0.0.0/0 threads were created and exited.
More than 10,000 new Timer threads were created and exited.
I have 3,342 nodes defined in the database.
This is a significant performance problem for me, hence the blocker priority.