New Focal Point Topology UI (STUI-2) very slow
Description
Environment
Acceptance / Success Criteria
Lucidchart Diagrams
Activity
Jesse White January 27, 2015 at 4:56 PM
Fixed in b99e564f801a2c.
Seth Leger January 22, 2015 at 3:16 PM
Jesse was working on this issue this week.
David Hustace September 22, 2014 at 12:24 PM
Donald,
I did as you recommended, inspected the code, but only after looking in the logs to try and narrow down where the slownesses might be. It appears to me, based on the following 2 synchronous log entries,:
2014-09-19 13:34:13,283 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding nodes without links: true
2014-09-19 13:34:45,397 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding link-less node: Node-XYZ
that it takes about 32 seconds for the getAllNodesNoACL() call on line 358 to return the "all nodes list":
354 LOG.debug("loadtopology: adding nodes without links: " + isAddNodeWithoutLink());
355 if (isAddNodeWithoutLink()) {
356
357 List<OnmsNode> allNodes;
358 allNodes = getAllNodesNoACL();
359
360 for (OnmsNode onmsnode: allNodes) {
361 String nodeId = onmsnode.getNodeId();
362 if (getVertex(getVertexNamespace(), nodeId) == null) {
363 LOG.debug("loadtopology: adding link-less node: " + onmsnode.getLabel());
364 addVertices(getVertex(onmsnode));
365 }
366 }
367
368 }
and based on the next 2 synchronous log entries and the previous log messages above where adding link-less nodes begins that it takes 18 seconds to “add link-less nodes"
2014-09-19 13:35:03,118 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding link-less node: Node-123
2014-09-19 13:35:03,120 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: could not load topology configFile:/opt/opennms/etc/saved-linkd-graph.xml
The next most lengthy process seems to be from the very first log entry after clicking the topology map menu item in the Web-UI:
2014-09-19 13:34:11,558 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopolog
y: parsing link: org.opennms.netmgt.model.LldpLink@52a08562[NodeId=10112,lldpLocalPortNum=1,lldpPortIdSubType=interfaceAlias,lldpPortId=�,lldpPortDescr=NuDesign,lldpPortIfindex=<null>,lldpRemChassisId=�,lldpRemChassisSubType=chassisComponent,lldpRemSysname=NuD
esign,lldpRemPortIdSubType=interfaceAlias,lldpRemPortId=�,lldpRemPortDescr=NuDesign,createTime=2014-09-17 12:12:59.532,lastPollTime=2014-09-17 12:12:59.532]
to the first 2 synchronous log entries I posted above:
2014-09-19 13:34:13,283 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding nodes without links: true
2014-09-19 13:34:45,397 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding link-less node: Node-ABC
which is about another 2 seconds. The math:
Total time to begin animated display of vertices based on the first and last log message times:
13:35:03,124 - 13:34:11,558 ~= 52 seconds
Which is equals the time I found in the slow places in the log. You mentioned that it was the new topology provider that was making it slow the bulk of the slowness on this seems to be in the 32 seconds it takes in the DAO. I was pretty much able to dispel that the problem is in the DAO layer by manually running the following test against the DB:
[dhustace@nms01 ~]$ time psql -U opennms -c "select n.nodeid, n.nodelabel, snmp.id, snmp.snmpifdescr, ip.id, ip.ipaddr from node n join snmpinterface snmp on snmp.nodeid = n.nodeid join ipinterface ip on ip.nodeid = n.nodeid;" > /dev/null
real 0m27.330s
user 0m23.000s
sys 0m1.090s
Here you can see that it takes 27 seconds for the PSQL query to get all the nodes joined with all the snmp and ip interfaces. So, need not blame the DAO layer for there must be some overhead in instantiating all those objects to make up the other 5 seconds or so. I’m thinking that we need to find another way to handle this in the topology code and not get all the nodes and load them into memory.
I hope this helps figure out where the slowness is happening.
Details
Assignee
Jesse WhiteJesse WhiteReporter
David HustaceDavid HustaceLabels
Due date
Sep 30, 2014Components
Fix versions
Affects versions
Priority
Blocker
Details
Details
Assignee
Reporter
Labels
Due date
Components
Fix versions
Affects versions
Priority
PagerDuty
PagerDuty Incident
PagerDuty
PagerDuty Incident
PagerDuty

OpenNMS with 8k nodes and only a handful of links takes approximately 60 seconds to load. Most of the time, 32 seconds, (based on logs) is spent in the method getAllNodesNoACL() of the EnhancedLinkdTopologyProvider class. Users think something is wrong and either click again repeatedly or simply move on.