New Focal Point Topology UI (STUI-2) very slow

Description

OpenNMS with 8k nodes and only a handful of links takes approximately 60 seconds to load. Most of the time, 32 seconds, (based on logs) is spent in the method getAllNodesNoACL() of the EnhancedLinkdTopologyProvider class. Users think something is wrong and either click again repeatedly or simply move on.

Environment

OpenNMS Web Console Version: 1.13.5-0.20140915.1.12.465 Server Time: Mon Sep 22 12:16:10 EDT 2014 Client Time: Mon Sep 22 2014 12:16:10 GMT-0400 (EDT) Java Version: 1.7.0_45 Oracle Corporation Java Virtual Machine: 24.45-b08 Oracle Corporation Operating System: Linux 2.6.32-358.el6.x86_64 (amd64) Servlet Container: jetty/7.6.15.v20140411 (Servlet Spec 2.5) User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36 Database Type: PostgreSQL Database Version: 9.3.5

Acceptance / Success Criteria

None

Lucidchart Diagrams

Activity

Show:

Jesse White January 27, 2015 at 4:56 PM

Fixed in b99e564f801a2c.

Seth Leger January 22, 2015 at 3:16 PM

Jesse was working on this issue this week.

David Hustace September 22, 2014 at 12:24 PM

Donald,

I did as you recommended, inspected the code, but only after looking in the logs to try and narrow down where the slownesses might be. It appears to me, based on the following 2 synchronous log entries,:

2014-09-19 13:34:13,283 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding nodes without links: true
2014-09-19 13:34:45,397 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding link-less node: Node-XYZ

that it takes about 32 seconds for the getAllNodesNoACL() call on line 358 to return the "all nodes list":

354 LOG.debug("loadtopology: adding nodes without links: " + isAddNodeWithoutLink());
355 if (isAddNodeWithoutLink()) {
356
357 List<OnmsNode> allNodes;
358 allNodes = getAllNodesNoACL();
359
360 for (OnmsNode onmsnode: allNodes) {
361 String nodeId = onmsnode.getNodeId();
362 if (getVertex(getVertexNamespace(), nodeId) == null) {
363 LOG.debug("loadtopology: adding link-less node: " + onmsnode.getLabel());
364 addVertices(getVertex(onmsnode));
365 }
366 }
367
368 }

and based on the next 2 synchronous log entries and the previous log messages above where adding link-less nodes begins that it takes 18 seconds to “add link-less nodes"

2014-09-19 13:35:03,118 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding link-less node: Node-123
2014-09-19 13:35:03,120 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: could not load topology configFile:/opt/opennms/etc/saved-linkd-graph.xml

The next most lengthy process seems to be from the very first log entry after clicking the topology map menu item in the Web-UI:

2014-09-19 13:34:11,558 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopolog
y: parsing link: org.opennms.netmgt.model.LldpLink@52a08562[NodeId=10112,lldpLocalPortNum=1,lldpPortIdSubType=interfaceAlias,lldpPortId=�,lldpPortDescr=NuDesign,lldpPortIfindex=<null>,lldpRemChassisId=�,lldpRemChassisSubType=chassisComponent,lldpRemSysname=NuD
esign,lldpRemPortIdSubType=interfaceAlias,lldpRemPortId=�,lldpRemPortDescr=NuDesign,createTime=2014-09-17 12:12:59.532,lastPollTime=2014-09-17 12:12:59.532]

to the first 2 synchronous log entries I posted above:

2014-09-19 13:34:13,283 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding nodes without links: true
2014-09-19 13:34:45,397 DEBUG org.opennms.features.topology.plugins.topo.linkd:1.13.5.SNAPSHOT(102) [qtp1572104652-2943402 - /opennms/topology?focusNodes=1411] org.opennms.features.topology.plugins.topo.linkd.internal.EnhancedLinkdTopologyProvider: loadtopology: adding link-less node: Node-ABC

which is about another 2 seconds. The math:

Total time to begin animated display of vertices based on the first and last log message times:
13:35:03,124 - 13:34:11,558 ~= 52 seconds

Which is equals the time I found in the slow places in the log. You mentioned that it was the new topology provider that was making it slow the bulk of the slowness on this seems to be in the 32 seconds it takes in the DAO. I was pretty much able to dispel that the problem is in the DAO layer by manually running the following test against the DB:

[dhustace@nms01 ~]$ time psql -U opennms -c "select n.nodeid, n.nodelabel, snmp.id, snmp.snmpifdescr, ip.id, ip.ipaddr from node n join snmpinterface snmp on snmp.nodeid = n.nodeid join ipinterface ip on ip.nodeid = n.nodeid;" > /dev/null

real 0m27.330s
user 0m23.000s
sys 0m1.090s

Here you can see that it takes 27 seconds for the PSQL query to get all the nodes joined with all the snmp and ip interfaces. So, need not blame the DAO layer for there must be some overhead in instantiating all those objects to make up the other 5 seconds or so. I’m thinking that we need to find another way to handle this in the topology code and not get all the nodes and load them into memory.

I hope this helps figure out where the slowness is happening.

Fixed

Details

Assignee

Reporter

Labels

Due date

Affects versions

Priority

PagerDuty

Created September 22, 2014 at 12:20 PM
Updated April 7, 2015 at 6:24 PM
Resolved January 27, 2015 at 4:56 PM

Flag notifications