Is it safe to scale multiple instances of a microservice using both Camunda 8.8 SDK and Orchestration Cluster API?

Hi everyone :waving_hand:

I have a custom microservice that serves as an integration layer between my applications and Camunda 8.8 (Self-Managed).
This service uses both:

  • the Camunda 8 Java SDK → to handle process-related operations (start process instances, publish messages, and run job workers), and

  • the Orchestration Cluster REST API → to call some management and task-related endpoints.

Currently, I’m running only one instance of this microservice.
I’m planning to scale it horizontally (run multiple instances) for performance and redundancy.

Before doing that, I want to confirm:

  1. Could there be any side effects — such as duplicated message publishing, race conditions, or job locking conflicts between workers?

  2. Are there any recommended best practices for scaling Zeebe clients or job workers (like unique worker names, connection handling, or idempotent logic)?

Thanks in advance for your help and insights :folded_hands:
— Rabeb

Hi @Rabeb_Abid! :waving_hand:

Great question about scaling your microservice horizontally! Based on your setup using both the Camunda 8 Java SDK and Orchestration Cluster REST API, here’s what you need to know:

:white_check_mark: Good News: It’s Generally Safe to Scale

Camunda 8 is designed to handle multiple client instances well, but there are important considerations and best practices to follow.

:magnifying_glass_tilted_left: Potential Side Effects & How to Avoid Them

1. Job Worker Conflicts

  • Built-in Protection: Zeebe handles job locking automatically; each activated job is assigned to exactly one worker and stays locked until its timeout expires
  • Risk: If a worker fails to complete a job within that timeout, the job is re-activated and may be handed to another worker instance
  • Solution: Design your job handlers to be idempotent so they tolerate the same logical work being executed more than once (see the sketch below)
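For example, a handler can guard its side effect with a check on a business key before completing the job. This is just a minimal sketch assuming the Camunda Java client (io.camunda:camunda-client-java); the class name, the orderId variable, and the in-memory key set are placeholders for your own logic and a shared store:

```java
// Minimal idempotent job handler sketch. Class names, the "orderId" variable,
// and the in-memory key set are illustrative; in production you would back the
// idempotency check with a shared store (database table, cache, ...).
import io.camunda.client.api.response.ActivatedJob;
import io.camunda.client.api.worker.JobClient;
import io.camunda.client.api.worker.JobHandler;

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PaymentJobHandler implements JobHandler {

  // Already-processed business keys (per instance only; use a shared store in production).
  private static final Set<String> processedKeys = ConcurrentHashMap.newKeySet();

  @Override
  public void handle(JobClient client, ActivatedJob job) {
    // Key the check on business data, not on the job key: after a timeout the
    // same logical work is re-activated as a *new* job with a new key.
    String orderId = (String) job.getVariablesAsMap().get("orderId");

    if (processedKeys.add(orderId)) {
      chargeCustomer(orderId); // side effect runs at most once per orderId
    }

    // Completing again for an already-processed orderId is harmless because
    // the side effect above is guarded.
    client.newCompleteCommand(job.getKey()).send().join();
  }

  private void chargeCustomer(String orderId) {
    // Call your payment provider / downstream system here.
  }
}
```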

2. Message Publishing

  • No Duplication Risk from Zeebe itself: Publishing messages or starting process instances from multiple service instances won’t create duplicates unless your business logic sends the command twice
  • Best Practice: Use stable correlation keys and set a message ID; Zeebe rejects a second publish with the same message ID while the first is still buffered (within its time to live), which gives you deduplication on retries (see the sketch below)
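As a minimal sketch (again assuming the Camunda Java client), publishing with a message ID looks roughly like this; the address, message name, correlation key, ID, and variables are all placeholders for your setup:

```java
// Idempotent message publishing sketch, assuming the Camunda Java client
// (io.camunda:camunda-client-java). Address, message name, correlation key,
// message ID, and variables are placeholders.
import io.camunda.client.CamundaClient;

import java.net.URI;
import java.time.Duration;
import java.util.Map;

public class OrderMessagePublisher {

  public static void main(String[] args) {
    try (CamundaClient client = CamundaClient.newClientBuilder()
        .grpcAddress(URI.create("http://localhost:26500")) // local, unsecured Self-Managed setup
        .usePlaintext()
        .build()) {

      client.newPublishMessageCommand()
          .messageName("order-received")
          .correlationKey("order-4711")        // routes the message to the waiting instance
          .messageId("order-received-4711")    // a second publish with the same ID within
                                               // the TTL is rejected, so a retry cannot
                                               // correlate the message twice
          .timeToLive(Duration.ofMinutes(5))
          .variables(Map.of("status", "PAID"))
          .send()
          .join();
    }
  }
}
```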

3. Race Conditions

  • Risk: Multiple worker instances touching the same business data at the same time
  • Solution: Keep the data each worker reads and writes small and isolated, let the process model coordinate ordering where possible, and add explicit safeguards (e.g., optimistic locking in your database) only where they are really needed

:rocket: Scaling Best Practices

For Job Workers:

  1. Configure maxJobsActive per instance - it caps how many jobs one instance holds at once, so tune it to that instance’s real capacity
  2. Use job streaming - the gateway pushes jobs to workers instead of waiting for polls, which is more efficient for high-throughput job types (see the worker sketch after this list)
  3. Keep long polling enabled - a higher requestTimeout (e.g., 30 seconds) reduces empty activation requests
  4. Let the client batch activations - the worker already activates jobs in batches up to maxJobsActive; maxJobsToActivate only matters if you call the activate-jobs endpoint directly
  5. Monitor backpressure metrics - scale based on actual load, not assumptions
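Putting those settings together, a worker registration could look roughly like this. Again a sketch assuming the Camunda Java client; the job type, handler, and all numbers are illustrative and should be tuned to measured load:

```java
// Worker configuration sketch for horizontal scaling, assuming the Camunda
// Java client. Job type, handler class, and all values are illustrative.
import io.camunda.client.CamundaClient;
import io.camunda.client.api.worker.JobWorker;

import java.net.URI;
import java.time.Duration;

public class PaymentWorkerApp {

  public static void main(String[] args) {
    CamundaClient client = CamundaClient.newClientBuilder()
        .grpcAddress(URI.create("http://localhost:26500")) // local, unsecured setup
        .usePlaintext()
        .build();

    JobWorker worker = client.newWorker()
        .jobType("charge-payment")
        .handler(new PaymentJobHandler())        // idempotent handler (see sketch above)
        .name("payment-worker")                  // same name on every instance is fine
        .maxJobsActive(32)                       // per-instance cap, match real capacity
        .timeout(Duration.ofSeconds(60))         // must exceed worst-case handling time
        .requestTimeout(Duration.ofSeconds(30))  // long-polling window for activation
        .streamEnabled(true)                     // gateway pushes jobs instead of polling
        .open();

    // Release the worker and its connections on shutdown.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      worker.close();
      client.close();
    }));
  }
}
```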

For REST API Usage:

  1. Use connection pooling - reuse one HTTP client per instance instead of opening a new connection for every request (see the sketch after this list)
  2. Monitor gateway load - watch CPU and request rates so extra instances don’t overload a single gateway
  3. Prefer the official SDKs where they cover your use case - they already handle connection management, retries, and backoff
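For the calls you make outside the SDK, reusing a single HTTP client per instance gives you connection pooling for free with the JDK’s java.net.http.HttpClient. The base URL and the /v2/topology path below are just placeholders; check the Orchestration Cluster API reference for the endpoints you actually use:

```java
// Sketch of reusing one HTTP client (and its connection pool) across requests,
// using only the JDK's java.net.http.HttpClient. Base URL and endpoint path
// are placeholders for your Orchestration Cluster REST API.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class OrchestrationApiClient {

  // One client per application instance; it keeps connections alive and
  // reuses them instead of opening a new one per request.
  private static final HttpClient HTTP = HttpClient.newBuilder()
      .connectTimeout(Duration.ofSeconds(5))
      .build();

  private final String baseUrl;

  public OrchestrationApiClient(String baseUrl) {
    this.baseUrl = baseUrl; // e.g. "http://localhost:8080" for Self-Managed
  }

  public String getTopology() throws Exception {
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(baseUrl + "/v2/topology")) // illustrative endpoint
        .header("Accept", "application/json")
        .GET()
        .build();
    return HTTP.send(request, HttpResponse.BodyHandlers.ofString()).body();
  }
}
```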

General Scaling Guidelines:

  • Unique worker names: Not strictly required, but can help with monitoring and debugging
  • Idempotent logic: Essential - always assume jobs might be processed more than once
  • Resource monitoring: Use metrics to determine when scaling is actually needed

:bar_chart: Monitoring & Metrics

Watch these key indicators:

  • Job activation rates vs maxJobsActive settings
  • Gateway CPU and memory usage
  • Backpressure signals
  • Job completion times and timeouts

Your architecture sounds solid! Just make sure to implement proper idempotency and monitor your system as you scale. Start with a small number of additional instances and gradually increase based on your metrics.

Let me know if you need clarification on any of these points! :folded_hands: