Camunda 8.8 – Java SDK searchUserTasks() takes 35–40 seconds after process start (too slow?)

Hi everyone,

I’m using Camunda 8.8.2 with the official SDK:
implementation 'io.camunda:camunda-spring-boot-starter:8.8.2'

After starting a process instance, I run a user task search like this:

SearchResponse<UserTask> userTasks = camundaClient
    .newUserTaskSearchRequest()
    .filter(f -> f
        .processInstanceKey(processInstanceKey)
        .state(UserTaskState.CREATED))
    .sort(s -> s.creationDate().asc())
    .page(p -> p.limit(1))
    .send().join();

However, this search takes 35–40 seconds to return results after starting a process instance.
I expected the tasks to be searchable almost instantly (within 1–2 seconds).

Is this delay normal in Camunda 8.8?
Are there any known performance issues or configuration tweaks that could help reduce this delay?

Also, I’d like to know:
:backhand_index_pointing_right: Would it be better to call the REST API directly (/v2/user-tasks/search), for example via a Feign client, instead of the SDK’s camundaClient.newUserTaskSearchRequest() for faster results?
Or is the SDK still the recommended and most efficient approach?

Environment:

  • Camunda Platform 8.8.2 (Self-managed)

Hello @Rabeb_Abid ,

as both the Spring Boot starter and the underlying Camunda client use the REST API under the hood, I would not expect calling the API directly or using an alternative client to change anything.
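For illustration, here is a minimal sketch of the same search issued directly against the REST endpoint the client uses internally. The base URL, the key value, and the exact JSON filter shape are assumptions on my side, so please check the Orchestration Cluster REST API reference for the authoritative request schema:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DirectUserTaskSearch {

  public static void main(String[] args) throws Exception {
    long processInstanceKey = 2251799813685249L; // hypothetical key returned by newCreateInstanceCommand()

    // Depending on the API version, keys may need to be sent as JSON strings instead of numbers.
    String body = """
        {
          "filter": {
            "processInstanceKey": %d,
            "state": "CREATED"
          }
        }
        """.formatted(processInstanceKey);

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/v2/user-tasks/search")) // assumed local REST address
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());

    // The latency measured here should match what the Camunda client reports,
    // because the client issues the same HTTP call internally.
    System.out.println(response.statusCode() + ": " + response.body());
  }
}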

I gave it a try myself and got a result within 2–3 seconds. For this, I used the following code:

@PostMapping("/process-instance")
public UserTask createProcessInstance() {
  ProcessInstanceEvent processInstanceEvent =
      camundaClient
          .newCreateInstanceCommand()
          .bpmnProcessId("userTaskProcess")
          .latestVersion()
          .execute();
  return awaitUserTask(processInstanceEvent, "Activity_0nhuyco");
}

private UserTask awaitUserTask(ProcessInstanceEvent processInstanceEvent, String userTaskId) {
  SearchResponse<UserTask> execute =
      camundaClient
          .newUserTaskSearchRequest()
          .filter(
              f ->
                  f.processInstanceKey(processInstanceEvent.getProcessInstanceKey())
                      .elementId(userTaskId))
          .execute();
  if (execute.items().isEmpty()) {
    try {
      Thread.sleep(100L);
    } catch (InterruptedException e) {
      throw new RuntimeException("Interrupted while waiting for user task", e);
    }
    return awaitUserTask(processInstanceEvent, userTaskId);
  }
  return execute.items().get(0);
}

Note: The recursion might not be the most elegant approach here; a resilience4j wrapper would be better, but I wanted to make it work quickly :slight_smile:
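If you want to avoid the recursion without pulling in an extra dependency, an iterative variant with a deadline works as well. This is only a minimal sketch: the 10-second timeout and the 100 ms interval are arbitrary values, and it needs an import of java.time.Instant.

private UserTask awaitUserTask(ProcessInstanceEvent processInstanceEvent, String userTaskId) {
  Instant deadline = Instant.now().plusSeconds(10);
  while (Instant.now().isBefore(deadline)) {
    SearchResponse<UserTask> response =
        camundaClient
            .newUserTaskSearchRequest()
            .filter(f -> f
                .processInstanceKey(processInstanceEvent.getProcessInstanceKey())
                .elementId(userTaskId))
            .execute();
    if (!response.items().isEmpty()) {
      return response.items().get(0);
    }
    try {
      Thread.sleep(100L); // give the exporter time to make the task searchable
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException("Interrupted while waiting for user task", e);
    }
  }
  throw new IllegalStateException("User task " + userTaskId + " did not become searchable within 10 seconds");
}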

So I believe there could be an issue with:

  • the exporter
  • the underlying elasticsearch

Are there any suspicious logs indicating this?

I also see that you are adding a sort and a page, which are not required in this situation. Would it speed things up if you removed them?

Jonathan

Thanks Jonathan for the detailed example :folded_hands:

I tested your exact code, and unfortunately I still get a huge delay.
The calls to newCreateInstanceCommand() and newUserTaskSearchRequest() take around 1 min 50 s before returning a result.

Here’s what I see in the logs:

o.s.b.a.health.HealthEndpointSupport : Health contributor io.camunda.client.spring.actuator.CamundaClientHealthIndicator (camundaClient) took 40327ms to respond

So even the health indicator of the camundaClient takes ~40s to respond.

It seems the delay happens before the actual search result — maybe at the client connection or exporter synchronization level?

        ProcessInstanceEvent processInstanceEvent =
                camundaClient
                        .newCreateInstanceCommand()
                        .bpmnProcessId("test")
                        .latestVersion()
                        .execute();
        awaitUserTask(processInstanceEvent);

    private UserTask awaitUserTask(ProcessInstanceEvent processInstanceEvent) {
        SearchResponse<UserTask> execute =
                camundaClient
                        .newUserTaskSearchRequest()
                        .filter(
                                f ->
                                        f.processInstanceKey(processInstanceEvent.getProcessInstanceKey())
                                                .state(UserTaskState.CREATED))
                        .execute();
        if (execute.items().isEmpty()) {
            try {
                Thread.sleep(100L);
            } catch (InterruptedException e) {
                throw new RuntimeException("Interrupted while waiting for user task", e);
            }
            return awaitUserTask(processInstanceEvent);
        }
        return execute.items().getFirst();
    }

Any other ideas on what could cause such a long initial response time?

Thanks again for your help! :folded_hands:
— Rabeb

Hello @Rabeb_Abid ,

this is really interesting…

From here, it looks like ALL responses are delayed by at least 40 seconds (the health indicator also pings Camunda).

What is the network that is used to connect to Camunda?

Jonathan

Thanks Jonathan :folded_hands:

Yes, that’s what surprised me too — even the health check is delayed by ~40 seconds.

Here’s my setup:

  • Camunda Platform 8.8.2 (self-managed, running with Docker Compose)

  • Spring Boot app running on the same host machine as the Camunda containers

  • All components are on localhost

So there’s no external network between the Spring Boot app and Camunda.
The delay happens even when everything runs locally (no cloud, no VPN).

Do you think the Docker network configuration could still cause this?

services:
  orchestration: # Consolidated Zeebe + Operate + Tasklist - https://docs.camunda.io/docs/self-managed/setup/deploy/other/docker/#zeebe
    image: camunda/camunda:${CAMUNDA_VERSION}
    container_name: orchestration
    ports:
      - "26500:26500"
      - "9600:9600"
      - "5088:8080"
    restart: always
    healthcheck:
      test: ["CMD-SHELL", "bash -c 'exec 3<>/dev/tcp/127.0.0.1/9600 && echo -e \"GET /actuator/health/status HTTP/1.1\r\nHost: localhost\r\n\r\n\" >&3 && head -n 1 <&3'"]
      interval: 1s
      retries: 30
      start_period: 30s
    volumes:
      - zeebe:/usr/local/zeebe/data
    configs:
      - source: orchestration-config
        target: /usr/local/camunda/config/application.yaml
    networks:
      - camunda
    depends_on:
      elasticsearch:
        condition: service_healthy

  # Camunda Connectors - executes outbound and inbound connector logic
  # Docs: https://docs.camunda.io/docs/self-managed/connectors-deployment/connectors-configuration/
  connectors:
    image: camunda/connectors-bundle:${CAMUNDA_CONNECTORS_VERSION}
    container_name: connectors
    ports:
      - "5086:8080"
    environment:
      - management.endpoints.web.exposure.include=health,configprops
      - management.endpoint.health.probes.enabled=true
    env_file: connector-secrets.txt
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8080/actuator/health/readiness"]
      interval: 30s
      timeout: 1s
      retries: 5
      start_period: 30s
    configs:
      - source: connectors-config
        target: application.yaml
    networks:
      - camunda
    depends_on:
      orchestration:
        # Wait for orchestration service to be healthy, otherwise we get a lot of noisy logs from connection errors
        condition: service_healthy

  elasticsearch: # https://hub.docker.com/_/elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTIC_VERSION}
    container_name: elasticsearch
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      # allow running with low disk space
      - cluster.routing.allocation.disk.threshold_enabled=false
      # Disable noisy deprecation logs, see https://github.com/camunda/camunda/issues/26285
      - logger.org.elasticsearch.deprecation="OFF"
    restart: unless-stopped
    healthcheck:
      test: [ "CMD-SHELL", "curl -f http://localhost:9200/_cat/health | grep -q green" ]
      interval: 1s
      retries: 30
      start_period: 30s
      timeout: 1s
    volumes:
      - elastic:/usr/share/elasticsearch/data
    networks:
      - camunda

volumes:
  zeebe:
  elastic:

networks:
  camunda:

configs:
  connectors-config:
    content: |
      camunda:
        client:
          mode: self-managed
          grpc-address: http://orchestration:26500
          rest-address: http://orchestration:8080
        connectors:
          secretprovider:
            environment:
              prefix: "CONNECTORS_SECRET"

  orchestration-config:
    content: |
      management.endpoints.configprops.show-values: always
      camunda:
        security:
          authentication:
            method: "basic"
            unprotectedApi: true
          authorizations:
            enabled: false
          initialization:
            users:
              - username: "demo"
                password: "demo"
                name: "Demo User"
                email: "demo@demo.com"
            defaultRoles.admin.users:
              - "demo"
        database.index.numberOfReplicas: 0 # Single node elasticsearch so we disable replication
        data:
          secondary-storage:
            type: elasticsearch
            elasticsearch:
              cluster-name: elasticsearch
              url: "http://elasticsearch:9200"

— Rabeb

Hello @Rabeb_Abid ,

thank you for sending over the setup details. Here, too, I cannot find anything suspicious.

In fact, I am using the exact same setup.

Does the log of “orchestration” contain something suspicious?

Jonathan

Hi Jonathan,

I found the cause of the latency :bullseye:

I had around 20 job workers running in my Spring Boot app (for different task types).
After I commented them out, the start + awaitUserTask execution dropped from ~1m50s to 3s.

In the logs, I noticed this for each worker:

[pool-2-thread-1] io.camunda.client.job.poller : Polling at max 32 jobs for worker createReservationListener#CreateReservation and job type CreateReservationListener

So it looks like all these workers were polling simultaneously and causing load on the Zeebe broker — even though I wasn’t actively using them during the test.

In my real BPMN process, these workers are required since each one handles a specific service task.

Would you recommend:

  • increasing the polling interval or maxJobsActive,

  • or using a different setup (for example, starting job workers only on-demand)?

I do need these workers in my real scenario, but I’d like to avoid this kind of latency.

Thanks a lot for your help,
— Rabeb

Hi @Rabeb_Abid

Enable streaming mode for your workers. This will reduce activation latency and load on the broker. https://camunda.com/blog/2024/03/reducing-job-activation-delay-zeebe/
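As a rough sketch, a single worker could opt into streaming via the annotation. The streamEnabled attribute on @JobWorker and the import paths are assumptions based on the 8.8 starter and may differ slightly between versions; the job type is taken from the worker names mentioned earlier in this thread:

import io.camunda.client.api.response.ActivatedJob;
import io.camunda.spring.client.annotation.JobWorker;
import org.springframework.stereotype.Component;

@Component
public class CreateReservationWorker {

  @JobWorker(type = "CreateReservationListener", streamEnabled = true)
  public void handle(ActivatedJob job) {
    // With streaming enabled, jobs are pushed by the broker instead of being fetched
    // by polling, which removes the activation delay and the per-worker polling load.
  }
}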

If you continue with polling, tune your worker config:

  • Increase poll-interval to avoid aggressive polling.

  • Increase request-timeout (long polling) to avoid frequent requests when idle.

  • Lower max-jobs-active so each poll fetches fewer jobs.

Try using the following configuration values:

camunda:
  client:
    worker:
      defaults:
        max-jobs-active: 5
        poll-interval: PT5S
        request-timeout: PT30S

https://docs.camunda.io/docs/apis-tools/camunda-spring-boot-starter/properties-reference/#camundaclientworkerdefaults

2 Likes

Hello @Rabeb_Abid ,

I agree with @hassang .

Thank you for jumping in!

Jonathan

Hi Camunda Team :waving_hand:

I’m still experiencing a noticeable delay (sometimes several seconds) between starting a process instance and being able to retrieve the first user task — even after applying all the recommended worker configuration optimizations.

camunda:
  client:
    worker:
      defaults:
        max-jobs-active: 5
        poll-interval: PT5S
        request-timeout: PT30S

And I can confirm that my worker is correctly initialized:

Starting job worker: JobWorkerValue{
  type='CancelReservationListener',
  name='cancelReservationListener#CancelReservation',
  timeout=PT5M,
  maxJobsActive=5,
  requestTimeout=PT30S,
  pollInterval=PT1S,
  streamEnabled=false
}

Is there a recommended way to get the first user task faster after starting a process?

Any insights or best practices would be very helpful :folded_hands:

Thanks,
Rabeb

Hello @Rabeb_Abid ,

what would definitely help is to use a user task listener. That way, you could remove the polling part and rely on the callback from the listener.
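As a rough sketch of that idea: assuming a task listener with event type "creating" and job type "user-task-created-listener" is declared on the user task in the BPMN model (both names are hypothetical), the listener job can signal a future that the start-process code waits on, instead of polling the search endpoint.

import io.camunda.client.api.response.ActivatedJob;
import io.camunda.spring.client.annotation.JobWorker;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.stereotype.Component;

@Component
public class UserTaskCreatedListener {

  // one future per process instance, completed as soon as the user task is created
  private final Map<Long, CompletableFuture<Long>> pending = new ConcurrentHashMap<>();

  public CompletableFuture<Long> awaitUserTask(long processInstanceKey) {
    return pending.computeIfAbsent(processInstanceKey, k -> new CompletableFuture<>());
  }

  @JobWorker(type = "user-task-created-listener")
  public void onUserTaskCreated(ActivatedJob job) {
    // The listener job is created the moment the user task is created, so there is
    // no need to wait for the task to become visible in secondary storage.
    pending.computeIfAbsent(job.getProcessInstanceKey(), k -> new CompletableFuture<>())
        .complete(job.getElementInstanceKey());
  }
}

The controller could then call awaitUserTask(processInstanceKey).get(...) right after newCreateInstanceCommand() instead of polling newUserTaskSearchRequest().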

I hope this helps
Jonathan

@Rabeb_Abid

Have you tried enabling streaming instead of using polling?

https://docs.camunda.io/docs/apis-tools/camunda-spring-boot-starter/configuration/#enable-job-streaming
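For completeness, a minimal sketch of enabling streaming when opening a worker programmatically, assuming the streamEnabled(...) option on the worker builder; the job type and handler are placeholders:

JobWorker worker =
    camundaClient
        .newWorker()
        .jobType("CancelReservationListener")
        .handler((client, job) -> client.newCompleteCommand(job).send().join())
        .streamEnabled(true) // the broker pushes jobs instead of waiting for the next poll
        .open();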

1 Like

@jonathan.lukas

Aren’t listeners implemented as job workers? How could the polling part be removed when listeners are used? Thanks in advance :smiling_face:

1 Like

Hello @hassang ,

I mean the polling of the tasklist endpoint here, not the job polling.

Sorry, that was not clear, thank you for checking back.

Jonathan

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.