Apparent memory leak in JMX collector, possibly restricted to "weird" JMX transports

Description

A newly upgraded Horizon installation, whose only job is to do JMX data collection from many JBoss (and perhaps other enterprisey) application containers, is exhibiting leaky behavior. With -Xmx12G and nightly restarts already in place for unrelated reasons, system-wide performance slows to a crawl daily, and the JMX memory utilization resource graphs exhibit the classic monotonic-increase curve associated with a memory leak.

An Eclipse MAT leak suspects report (PDF) is attached.

Sensitive assets available for internal retrieval:

Customer can produce logs, thread dumps, and any other necessary supporting materials upon request.

Environment

See https://mynms.opennms.com/Ticket/Display.html?id=6060

Acceptance / Success Criteria

None

Attachments

4 attachments:
  • 16 May 2019, 07:56 PM
  • 03 May 2019, 04:16 PM
  • 03 May 2019, 03:08 PM
  • 03 May 2019, 02:25 PM

Activity

Jeff Gehlbach June 22, 2019 at 2:14 PM

Customer upgraded to 24.1.0 and cleaned out some defunct nodes. Results are promising; ticket resolved by customer.

Jeff Gehlbach June 12, 2019 at 7:11 PM

Customer says:

I just upgraded ems-6 to H24.1.0, let's see if that has any impact on the server first. Also I removed a bunch of nodes that were not collecting and that seems to help quite a bit.

Maybe a clue in there, so I thought I'd convey it from the ticket history.

Jeff Gehlbach June 12, 2019 at 5:47 PM

Thanks. I've asked the customer for a workday's worth of heap histograms, for the same JVM PID, spaced two hours apart. Does that sound about right?

Jesse White May 22, 2019 at 1:40 PM

It's hard to tell from a single histogram; we would need several of these to understand whether the usage is actually growing over time.
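
For illustration only, here is a rough sketch of how several histograms could be compared once they are collected. It assumes each snapshot is the saved text output of jmap -histo:live for the same JVM PID, in the usual "num: #instances #bytes class name" row layout. The function names parse_histogram and growing_classes are hypothetical and not part of any OpenNMS or JDK tooling. A class is flagged only if its byte count never shrinks between consecutive snapshots and ends higher than it started, which is a heuristic for a leak candidate, not proof of one.

```python
import re
from typing import Dict, List, Tuple

# Matches data rows such as "   1:   150   1500  [C"; the header,
# separator, and "Total" lines do not match and are skipped.
_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text: str) -> Dict[str, int]:
    """Return a mapping of class name to total bytes for one snapshot."""
    sizes: Dict[str, int] = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            name = match.group(3)
            # The same class name can appear more than once (for example,
            # loaded by different class loaders), so sum the byte counts.
            sizes[name] = sizes.get(name, 0) + int(match.group(2))
    return sizes


def growing_classes(snapshots: List[str],
                    min_growth_bytes: int = 0) -> List[Tuple[str, int]]:
    """List (class name, byte growth) for classes that never shrink.

    Snapshots must be given oldest first. Results are sorted with the
    largest growth first.
    """
    parsed = [parse_histogram(s) for s in snapshots]
    if len(parsed) < 2:
        return []
    result = []
    for name in parsed[-1]:
        # A class missing from an earlier snapshot counts as 0 bytes.
        series = [p.get(name, 0) for p in parsed]
        never_shrinks = all(b >= a for a, b in zip(series, series[1:]))
        growth = series[-1] - series[0]
        if never_shrinks and growth > min_growth_bytes:
            result.append((name, growth))
    return sorted(result, key=lambda item: item[1], reverse=True)
```

To use it, read each saved histogram file into a string, pass the strings oldest first to growing_classes, and look at the top few entries. Raising min_growth_bytes filters out classes whose growth is too small to matter against a 12G heap.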

Jeff Gehlbach May 16, 2019 at 7:56 PM

Customer reports performance still degrades with patch JAR in place. I requested and am attaching a heap histogram generated via jmap -histo:live $ONMS_PID. It looks okay-ish to me; should I request another full heap dump?

Fixed

Details

Created May 3, 2019 at 2:26 PM
Updated June 22, 2019 at 2:14 PM
Resolved May 8, 2019 at 8:07 AM