Scriptd consumes CPU even when it does nothing

Description

The default configuration for Scriptd is empty, meaning it should do nothing.

However, the CPU usage of the Scriptd threads increases in proportion to the event injection rate (fluctuating around some average). On a busy system that is processing thousands of events per second, the CPU taken by Scriptd can degrade the overall performance of OpenNMS and prevent other features from working properly.

I think it would be useful for Scriptd to analyze its configuration and refrain from listening to events when nothing in the configuration requires it. When a listener is needed, Scriptd should make sure it won't overwhelm the rest of the JVM.
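To illustrate the idea, here is a minimal sketch of a daemon that only registers an event listener when its configuration contains event scripts. Everything in it is hypothetical: EventBusSketch and ScriptdSketch are stand-ins invented for this example and are not the real OpenNMS classes or APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the event bus; not the OpenNMS event API.
final class EventBusSketch {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    int listenerCount() {
        return listeners.size();
    }

    // Delivers an event (identified here only by its UEI) to every listener.
    void publish(String uei) {
        for (Consumer<String> listener : listeners) {
            listener.accept(uei);
        }
    }
}

// Hypothetical stand-in for Scriptd; the configuration is reduced to a list of event scripts.
final class ScriptdSketch {
    private final List<String> eventScripts;
    private int handled = 0;

    ScriptdSketch(List<String> eventScripts) {
        this.eventScripts = new ArrayList<>(eventScripts);
    }

    // Registers a listener only when at least one event script is configured.
    // Returns true if the daemon subscribed to events, false if it stayed idle.
    boolean start(EventBusSketch bus) {
        if (eventScripts.isEmpty()) {
            return false; // nothing to run, so never touch the event stream
        }
        bus.addListener(uei -> handled++);
        return true;
    }

    int handledCount() {
        return handled;
    }
}
```

With an empty script list, start() returns false and the bus has no listener, so publishing events costs the daemon nothing. This sketch does not cover the second half of the request (keeping a needed listener from overwhelming the JVM).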

On the system where I first observed this, Actiond was behaving similarly. I've never seen a customer use Actiond, whereas Scriptd is more widely used, which is why this issue focuses on it.

I'm targeting M2020 and the latest H27 because, before the refactoring to use Immutable Events, the CPU impact was not that high, which makes me believe that code change might be related.

I used jvm-tools to analyze a clean system running 27.1.0:

sudo java -jar sjk-plus-0.17.jar ttop --pid $(cat /var/log/opennms/opennms.pid) --filter '*Scriptd*' --verbose

With stress-events running from the Karaf shell to generate 2000 events per second, I see:

2021-03-23T10:51:45.525-0400 Process summary
  process cpu=96.07%
  application cpu=94.14% (user=74.36% sys=19.77%)
  other: cpu=1.93%
  thread count: 2
  GC time=0.79% (young=0.79%, old=0.00%)
  heap allocation rate 173mb/s
  safe point rate: 1.1 (events/s) avg. safe point pause: 7.68ms
  safe point sync time: 0.01% processing time: 0.84% (wallclock time)
[000356] user= 9.35% sys= 2.74% alloc= 44mb/s - Scriptd-Executor-Thread
[000355] user= 2.31% sys= 0.56% alloc= 5847kb/s - Scriptd:BroadcastEventProcessor-Thread

I believe that's excessive for something that is not in use.

Acceptance / Success Criteria

None

Activity

Zoë Knox March 16, 2022 at 6:26 PM

Zoë Knox March 15, 2022 at 5:56 PM

It is simple enough to disable Scriptd and Actiond when there are no scripts or actions configured. It saves a small amount of CPU, and may help under high event loads. Before the changes, at 2000 ev/s:

2022-03-15T12:42:06.970-0400 Process summary
  process cpu=183.72%
  application cpu=159.19% (user=79.22% sys=79.97%)
  other: cpu=24.53%
  thread count: 871
  GC time=0.92% (young=0.92%, old=0.00%)
  heap allocation rate 74mb/s
  safe point rate: 0.9 (events/s) avg. safe point pause: 11.79ms
  safe point sync time: 0.03% processing time: 1.03% (wallclock time)
[000476] user= 4.64% sys= 5.22% alloc= 4086kb/s - Scriptd:BroadcastEventProcessor-Thread
[000475] user= 3.47% sys= 4.06% alloc= 3429kb/s - Actiond:BroadcastEventProcessor-Thread
[000479] user= 2.07% sys= 2.38% alloc= 2003kb/s - Scriptd-Executor-Thread

and with Scriptd auto-disabled for having no scripts configured:

2022-03-15T13:52:01.574-0400 Process summary
  process cpu=175.57%
  application cpu=145.14% (user=74.02% sys=71.12%)
  other: cpu=30.43%
  thread count: 877
  GC time=0.69% (young=0.69%, old=0.00%)
  heap allocation rate 77mb/s
  safe point rate: 1.0 (events/s) avg. safe point pause: 8.25ms
  safe point sync time: 0.03% processing time: 0.81% (wallclock time)
[000479] user= 3.57% sys= 2.74% alloc= 4151kb/s - Actiond:BroadcastEventProcessor-Thread

So is it worth disabling Scriptd when it is not configured? (Detecting whether Actiond has a configuration is harder and possibly not a "quick win".)
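As a rough way to weigh that question, the per-thread figures quoted in this comment can be added up. The sketch below hardcodes the values from the two ttop samples above; the class and method names are made up for illustration, and since each sample is a single snapshot the result is only indicative.

```java
// Hypothetical helper that sums the ttop figures quoted in this comment.
final class ScriptdCpuShare {

    // Adds up a list of CPU percentages.
    static double total(double... percentages) {
        double sum = 0.0;
        for (double p : percentages) {
            sum += p;
        }
        return sum;
    }

    // user + sys of Scriptd:BroadcastEventProcessor-Thread and
    // Scriptd-Executor-Thread in the "before" sample.
    static double scriptdBefore() {
        return total(4.64, 5.22, 2.07, 2.38);
    }

    // Difference in "application cpu" between the two samples.
    static double applicationCpuDrop() {
        return 159.19 - 145.14;
    }
}
```

The two Scriptd threads add up to about 14.3 percentage points in the first sample, and application CPU is about 14.05 points lower in the second one. The Actiond thread also differs between the samples, so some of the change is noise.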

Alberto November 18, 2021 at 12:40 AM
Edited

I'm new to OpenNMS and tried to follow the same steps.

  • Started a clean instance 27.1.2

  • Started monitoring Scriptd

  • Started stress-events for 2000 events/s

I couldn't replicate the CPU usage problem.

Running the command

sudo java -jar sjk-plus-0.17.jar ttop --pid $(cat /var/log/opennms/opennms.pid) --filter 'Scriptd' --verbose

The highest values found were:

2021-11-17T19:29:30.184-0500 Process summary
  process cpu=47.82%
  application cpu=34.97% (user=26.89% sys=8.07%)
  other: cpu=12.86%
  thread count: 2
  GC time=0.25% (young=0.25%, old=0.00%)
  heap allocation rate 16mb/s
  safe point rate: 7.0 (events/s) avg. safe point pause: 9.95ms
  safe point sync time: 0.54% processing time: 6.40% (wallclock time)
[000386] user= 0.15% sys= 0.06% alloc=   56kb/s - Scriptd:BroadcastEventProcessor-Thread
[000389] user= 0.11% sys= 0.03% alloc=   27kb/s - Scriptd-Executor-Th

Maybe there are other steps I should have followed to replicate it?

Fixed

Created March 23, 2021 at 2:54 PM
Updated March 29, 2023 at 1:27 PM
Resolved March 29, 2023 at 1:25 PM
