External Task Client hogs CPU resources on startup

When running my external task worker, which uses the Camunda External Task Client 1.2.2, I found that it hogs the CPU for a substantial amount of time by “busy waiting”. After some reverse engineering of the source code, I assume that this is a bug in the external task client implementation.
It is likely not an issue on a multicore machine, but with limited CPU resources in the cloud, it may massively delay application startup.
In my code, I build the external task client on startup, and later (after startup) I add subscriptions and “open” the external task client.

On startup (simplified):

ExternalTaskClient externalTaskClient = new ExternalTaskClientBuilderImpl().build();

Later on (simplified):

externalTaskClient.subscribe(topicName).handler(taskHandler).open();

The ExternalTaskClientBuilderImpl in its build method initializes the TopicSubscriptionManager, and if auto-fetching is enabled, which it is by default, it starts the topic subscription manager right away.

The TopicSubscriptionManager, in turn, is a Runnable which, once started and while not stopped, repeatedly calls its acquire method. The issue is that, as long as there are no subscriptions yet, the acquire method runs synchronously and returns right away, so the loop spins without pause. This causes the “busy waiting” behaviour, which consumes an unreasonably large share of CPU.
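To illustrate the effect, here is a minimal, self-contained sketch. It assumes nothing about the real client: AcquisitionLoopDemo and countIterations are hypothetical names, and the loop only mimics the shape described above (a running flag and an acquire step that has nothing to fetch). It counts how often such a loop iterates in a fixed time window, with and without an idle pause.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an acquisition loop; not the actual Camunda class.
class AcquisitionLoopDemo {

  // Runs an acquisition-style loop for roughly runMillis and returns the iteration count.
  // idleMillis == 0 mimics the busy-waiting loop; idleMillis > 0 pauses while idle.
  static long countIterations(long runMillis, long idleMillis) {
    AtomicBoolean isRunning = new AtomicBoolean(true);
    AtomicLong iterations = new AtomicLong();

    Thread worker = new Thread(() -> {
      while (isRunning.get()) {
        iterations.incrementAndGet();
        // There are no subscriptions in this sketch, so there is nothing to fetch.
        if (idleMillis > 0) {
          try {
            Thread.sleep(idleMillis);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
        }
      }
    });
    worker.start();

    try {
      Thread.sleep(runMillis);
      isRunning.set(false);
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      isRunning.set(false);
    }
    return iterations.get();
  }

  public static void main(String[] args) {
    System.out.println("busy loop iterations: " + countIterations(200, 0));
    System.out.println("idle loop iterations: " + countIterations(200, 50));
  }
}
```

On a typical machine the first count should be several orders of magnitude larger than the second, which is the CPU share that is burnt while nothing is subscribed.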

From my point of view, the auto-fetching feature should be opt-in rather than opt-out (disabling it actually resolved my problem). Apart from this, the TopicSubscriptionManager implementation should tolerate running without subscriptions (for example, by letting the thread sleep for a short time).
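As an alternative to a fixed sleep, the loop could block until the first subscription arrives. The following is only a sketch under that assumption: SubscriptionGate, subscribe and awaitSubscriptions are hypothetical names and not part of the Camunda client; the sketch uses a plain lock and condition from java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of the Camunda client.
class SubscriptionGate {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition notEmpty = lock.newCondition();
  private final List<String> topics = new ArrayList<>();

  // Registers a topic and wakes up any thread waiting for subscriptions.
  void subscribe(String topicName) {
    lock.lock();
    try {
      topics.add(topicName);
      notEmpty.signalAll();
    } finally {
      lock.unlock();
    }
  }

  // Blocks without spinning until a topic exists or the timeout elapses.
  // Returns true if at least one topic is subscribed, false on timeout or interrupt.
  boolean awaitSubscriptions(long timeoutMillis) {
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    lock.lock();
    try {
      while (topics.isEmpty()) {
        if (remainingNanos <= 0) {
          return false;
        }
        remainingNanos = notEmpty.awaitNanos(remainingNanos);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      lock.unlock();
    }
  }
}
```

An acquisition loop could call awaitSubscriptions with a modest timeout at the top of each iteration and only fetch when it returns true. The timeout keeps the running flag checked regularly, so stopping the loop still works.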

What do you think?

P.S. This is another report of a slow External Task Client, but it sounds unrelated: External Task Client

Hi Andre,

Thanks for pointing this out, this makes a lot of sense.
Would you be open to creating a PR for your suggestion?

Hi Niall,

yes, I will provide a PR, but since this will be my first code contribution to Camunda, I need to set up the dev environment and read through the contribution guidelines first. I hope to find some time for it within the next week.

That's great, thanks a lot. Feel free to contact me if you have any questions.

Hi Niall, actually I failed to build the camunda-external-task-client-java module after forking it. Looks like I need to configure some Camunda Maven repositories. Is there some documentation on how to get it running?

Hey Andre,

it should be possible to build the Client without configuring any specific Maven repos.

Could you please provide the error message?

Cheers,
Tassilo

Hi Tassilo,

if I just run mvn clean package without any modifications to the forked GitHub repo camunda-external-task-client-java, I get this Maven error message:

[ERROR]   The project org.camunda.bpm:camunda-external-task-client-root:1.4.0-SNAPSHOT (D:\ah\camunda\camunda-external-task-client-java\pom.xml) has 1 error
[ERROR]     Non-resolvable import POM: Could not transfer artifact org.camunda.bpm:camunda-bom:pom:7.12.0 from/to fix (me): Cannot access me with type default using the available connector factories: BasicRepositoryConnectorFactory @ line 44, column 19: Cannot access me using the registered transporter factories: WagonTransporterFactory: Unsupported transport protocol -> [Help 2]

This is obviously because of this repository definition in the root pom:

  <repositories>
      <repository><id>fix</id><url>me</url></repository>
  </repositories>

If I comment out this repository definition, I get a different Maven error:

[ERROR]   The project org.camunda.bpm:camunda-external-task-client-root:1.4.0-SNAPSHOT (D:\ah\camunda\camunda-external-task-client-java\pom.xml) has 1 error
[ERROR]     Non-resolvable import POM: Failure to find org.camunda.bpm:camunda-bom:pom:7.12.0 in https://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced @ line 44, column 19 -> [Help 2]

Version 7.12.0 has not yet been released to Maven Central, or am I wrong?

At that point, I gave up. Please let me know how to get it running.
Thanks, Andre

Hey Andre,

we need this definition in the root pom to work around an issue we face in CI.

The second problem is temporary. We are currently in the middle of releasing Camunda BPM 7.12, which is scheduled for the 30th of November. There is currently no SNAPSHOT of 7.13.0 available on Maven Central. This will be fixed as soon as the job that publishes this artifact has run for the first time.

In the meantime, you can just set the version in the root pom.xml as well as in client/pom.xml to 7.12.0-SNAPSHOT.

Does this answer your questions?

Cheers,
Tassilo

Building the artifact camunda-external-task-client-java now works for me. Thanks, Tassilo!

I am not sure how to implement the fix, though. My first guess was to replace this in class TopicSubscriptionManager

    if (!taskTopicRequests.isEmpty()) {
      List<ExternalTask> externalTasks = fetchAndLock(taskTopicRequests);

with

    if (taskTopicRequests.isEmpty()) {
      // if there are no topics to fetch tasks for, be idle for some time to avoid busy waiting
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    } else {
      List<ExternalTask> externalTasks = fetchAndLock(taskTopicRequests);

But the RuntimeException that is thrown if the thread is interrupted will be swallowed by the “catch Throwable” in the run method:

  public void run() {
    while (isRunning.get()) {
      try {
        acquire();
      }
      catch (Throwable e) {
        LOG.exceptionWhileAcquiringTasks(e);
      }
    }
  }

Catching any Throwable here is, in my opinion, bad design: the thread would even try to fetch tasks after an OutOfMemoryError or something similarly serious. Changing this is, however, beyond the scope of the bug that I discovered.
One could argue that the thread is not interruptible in the current solution either, so my fix would not make things worse, but I do not feel comfortable providing code changes that are not bullet-proof…
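One possible way around the swallowed exception, sketched here with a hypothetical stand-in class rather than the real TopicSubscriptionManager, is to let acquire propagate the InterruptedException and handle it in a dedicated catch clause before the catch-all. The names InterruptibleLoop and stopsOnInterrupt are made up for this example, and the logging call is replaced by System.err.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the acquisition loop; not the actual Camunda class.
class InterruptibleLoop implements Runnable {
  final AtomicBoolean isRunning = new AtomicBoolean(true);

  @Override
  public void run() {
    while (isRunning.get()) {
      try {
        acquire();
      } catch (InterruptedException e) {
        // Restore the interrupt flag and leave the loop instead of logging and retrying.
        Thread.currentThread().interrupt();
        isRunning.set(false);
      } catch (Throwable e) {
        // Catch-all kept as in the loop shown above; only the interrupt case is new.
        System.err.println("Exception while acquiring tasks: " + e);
      }
    }
  }

  // Placeholder: idles for a short time because this sketch has no subscriptions.
  void acquire() throws InterruptedException {
    Thread.sleep(100);
  }

  // Starts the loop, interrupts it, and reports whether the thread terminated.
  static boolean stopsOnInterrupt() {
    InterruptibleLoop loop = new InterruptibleLoop();
    Thread worker = new Thread(loop);
    worker.start();
    worker.interrupt();
    try {
      worker.join(2000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("loop stops on interrupt: " + stopsOnInterrupt());
  }
}
```

With this shape, no RuntimeException wrapper is needed for the sleep, and an interrupt ends the loop rather than being logged as an acquisition failure.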

@tasso94 @andre.hegerath did this issue get fixed in v7.12?

@tasso94 @andre.hegerath is this issue fixed? I am still having this problem.

Hi @giuseppe.p,

If you think this is a bug, feel free to raise a bug issue in our ticket tracker. We will then have a closer look at it.

Best,
Tassilo

In my case, too, I see high CPU utilization once all instances of the workflow have finished. As andre.hegerath mentioned, I think busy waiting might be the cause of the high CPU usage here as well. Below is an example of my external task client. Could you please tell me what is wrong with my code?

ExternalTaskClient client = ExternalTaskClient.create().baseUrl("http://10.118.12.18:30000/engine-rest").build();
client.subscribe("getMyTopic").lockDuration(60000).handler((externalTask, externalTaskService) -> {
    // my handler logic here… mostly REST calls
    externalTaskService.complete(externalTask);
}).open();

I am using CAMUNDA_VERSION=7.14.0