Hi everyone 
I have a custom microservice that serves as an integration layer between my applications and Camunda 8.8 (Self-Managed).
This service uses both:
- the Camunda 8 Java SDK → to handle process-related operations (start process instances, publish messages, and run job workers), and
- the Orchestration Cluster REST API → to call some management and task-related endpoints.
Currently, I’m running only one instance of this microservice.
I’m planning to scale it horizontally (run multiple instances) for performance and redundancy.
Before doing that, I want to confirm:
- Could there be any side effects, such as duplicated message publishing, race conditions, or job locking conflicts between workers?
- Are there any recommended best practices for scaling Zeebe clients or job workers (like unique worker names, connection handling, or idempotent logic)?
Thanks in advance for your help and insights 
— Rabeb
Hi @Rabeb_Abid! 
Great question about scaling your microservice horizontally! Based on your setup using both the Camunda 8 Java SDK and Orchestration Cluster REST API, here’s what you need to know:
Good News: It’s Generally Safe to Scale
Camunda 8 is designed to handle multiple client instances well, but there are important considerations and best practices to follow.
Potential Side Effects & How to Avoid Them
1. Job Worker Conflicts
- Built-in Protection: Zeebe handles job locking automatically - each activated job is handed to exactly one worker for the duration of the job timeout
- Risk: If a worker fails to complete a job before that timeout expires, the job becomes available again and may be picked up by another instance
- Solution: Design your workers to be idempotent - they should handle the same job being executed more than once gracefully (see the sketch below)
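To make the idempotency point concrete, here is a minimal sketch of an idempotent handler using the Zeebe Java client (`io.camunda.zeebe.client`). The job type, worker name, gateway address, and the in-memory dedupe set are illustrative placeholders, not part of your actual setup; in production the check would live in a store all instances share (for example a database unique constraint). If you are on the newer Camunda client, the builder methods should look very similar.

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.ActivatedJob;
import io.camunda.zeebe.client.api.worker.JobClient;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentPaymentWorker {

  // Illustrative in-memory dedupe store; replace with a shared store
  // (e.g. a DB unique constraint) so the check works across instances.
  private static final Set<Long> processedJobKeys = ConcurrentHashMap.newKeySet();

  public static void main(String[] args) throws Exception {
    try (ZeebeClient client = ZeebeClient.newClientBuilder()
        .gatewayAddress("localhost:26500")   // assumed self-managed gateway address
        .usePlaintext()
        .build()) {

      client.newWorker()
          .jobType("charge-payment")          // hypothetical job type
          .handler(IdempotentPaymentWorker::handle)
          .name("payment-worker")             // the same name can be shared by all instances
          .open();

      Thread.currentThread().join();          // keep the JVM alive while the worker runs
    }
  }

  private static void handle(JobClient jobClient, ActivatedJob job) {
    // If this job was already processed here (e.g. re-delivered after a timeout),
    // skip the side effect and just complete it again.
    if (!processedJobKeys.add(job.getKey())) {
      jobClient.newCompleteCommand(job.getKey()).send().join();
      return;
    }

    // ... perform the actual business call here, ideally itself idempotent ...

    jobClient.newCompleteCommand(job.getKey())
        .variables(Map.of("charged", true))
        .send()
        .join();
  }
}
```

With this pattern, a job that is re-delivered after a timeout is simply completed again without repeating the side effect; once the dedupe check is backed by shared storage, the same guarantee holds across all of your instances.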
2. Message Publishing
- No Duplication Risk: Publishing messages or starting processes from multiple instances won't create duplicates by itself - duplicates only appear if your business logic sends the same request twice
- Best Practice: Use unique correlation keys, set a message ID so the broker can reject a repeated publish of the same message, and implement idempotency checks in your business logic (see the sketch below)
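Here is a hedged sketch of that message-ID idea (the message name, correlation key, and variables are made up for illustration). Zeebe rejects a second publish with the same message name and message ID while the first is still buffered within its time-to-live, so the duplicate call fails on the client side and your code can treat that as "already delivered".

```java
import io.camunda.zeebe.client.ZeebeClient;
import java.time.Duration;
import java.util.Map;

public class OrderMessagePublisher {

  private final ZeebeClient client;

  public OrderMessagePublisher(ZeebeClient client) {
    this.client = client;
  }

  public void publishOrderPaid(String orderId) {
    client.newPublishMessageCommand()
        .messageName("order-paid")            // hypothetical message name
        .correlationKey(orderId)              // correlates to the waiting process instance
        .messageId("order-paid-" + orderId)   // stable ID: a repeat publish is rejected while buffered
        .timeToLive(Duration.ofMinutes(5))
        .variables(Map.of("orderId", orderId))
        .send()
        .join();
  }
}
```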
3. Race Conditions
- Risk: Multiple worker instances processing related data (for example, the same business entity) at the same time
- Solution: Minimize the data each worker reads and writes, and use proper synchronization (database transactions, optimistic locking) where shared state is unavoidable (see the sketch below)
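If two instances can touch the same business record, one common pattern is optimistic locking with a version column. This is a generic sketch in plain JDBC, not something Camunda provides; the table and column names are invented for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticUpdate {

  // Compare-and-swap style update: only succeeds if nobody else changed the row
  // since this instance read it.
  public static boolean updateOrderStatus(Connection conn, long orderId,
                                          String newStatus, int expectedVersion) throws SQLException {
    String sql = "UPDATE orders SET status = ?, version = version + 1 "
               + "WHERE id = ? AND version = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, newStatus);
      ps.setLong(2, orderId);
      ps.setInt(3, expectedVersion);
      return ps.executeUpdate() == 1;   // false: another instance won the race; re-read and retry or skip
    }
  }
}
```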
Scaling Best Practices
For Job Workers:
- Configure maxJobsActive appropriately - tune it to match each worker instance's capacity
- Use job streaming - more efficient than polling for high-throughput scenarios
- Enable long polling - set a higher requestTimeout (e.g., 30 seconds) to reduce empty polling requests
- Use batch activation - activate multiple jobs per request using maxJobsToActivate
- Monitor backpressure metrics - scale based on actual load, not just assumptions (a configuration sketch follows this list)
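As a starting point, this is what those settings could look like on the Zeebe Java client's worker builder. The job type and the concrete values are illustrative assumptions to tune against your own measurements, not recommendations for your workload.

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.worker.JobWorker;
import java.time.Duration;

public class WorkerConfigExample {

  public static JobWorker openWorker(ZeebeClient client) {
    return client.newWorker()
        .jobType("charge-payment")              // hypothetical job type
        .handler((jobClient, job) ->
            jobClient.newCompleteCommand(job.getKey()).send().join())
        .name("payment-worker")                  // shared across instances; handy for monitoring
        .maxJobsActive(32)                       // cap in-flight jobs per instance to match its capacity
        .timeout(Duration.ofSeconds(60))         // job lock: must exceed the worst-case handler time
        .requestTimeout(Duration.ofSeconds(30))  // long polling: keep activation requests open longer
        .pollInterval(Duration.ofMillis(100))    // how often to poll when no jobs were returned
        .streamEnabled(true)                     // job streaming: jobs are pushed to the worker
        .open();
  }
}
```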
For REST API Usage:
- Use connection pooling - reuse a single HTTP client per instance instead of creating a new connection for every request (see the sketch after this list)
- Monitor gateway load - watch CPU and request rates to avoid overloading
- Prefer official SDKs - they handle connection management, backoff, and retries efficiently
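If you do call the Orchestration Cluster REST API directly rather than through the SDK, here is a sketch of the shared-client idea using Java's built-in java.net.http.HttpClient. The base URL, the endpoint path, and the bearer-token handling are assumptions about your self-managed setup; adjust them to how your cluster is exposed and secured.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class OrchestrationRestClient {

  // One HttpClient per service instance: it pools and reuses connections internally.
  private static final HttpClient HTTP = HttpClient.newBuilder()
      .connectTimeout(Duration.ofSeconds(5))
      .build();

  private final String baseUrl;   // e.g. "http://zeebe-gateway:8080" (assumption about your setup)
  private final String token;     // bearer token if your cluster requires authentication (assumption)

  public OrchestrationRestClient(String baseUrl, String token) {
    this.baseUrl = baseUrl;
    this.token = token;
  }

  public String getTopology() throws Exception {
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(baseUrl + "/v2/topology"))   // illustrative endpoint path
        .header("Authorization", "Bearer " + token)
        .timeout(Duration.ofSeconds(10))
        .GET()
        .build();

    HttpResponse<String> response = HTTP.send(request, HttpResponse.BodyHandlers.ofString());
    return response.body();
  }
}
```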
General Scaling Guidelines:
- Unique worker names: Not strictly required, but can help with monitoring and debugging
- Idempotent logic: Essential - always assume jobs might be processed more than once
- Resource monitoring: Use metrics to determine when scaling is actually needed
Monitoring & Metrics
Watch these key indicators:
- Job activation rates vs. maxJobsActive settings
- Gateway CPU and memory usage
- Backpressure signals
- Job completion times and timeouts
Your architecture sounds solid! Just make sure to implement proper idempotency and monitor your system as you scale. Start with a small number of additional instances and gradually increase based on your metrics.
Let me know if you need clarification on any of these points! 