Best Practices for High Availability & Scalability of Custom Pub/Sub Connector

Shivansh_pandey · March 25, 2025, 6:42am

Hello Camunda Community,

I have developed a custom Google Pub/Sub connector for my organization, which will be a critical component in handling messages for most of our workflows. To ensure high availability, scalability, and resilience, I am looking for best practices to follow when designing the solution.

Current Setup:

Camunda Platform: Camunda 8 (Self-Managed)
Connector Type: Custom-built Pub/Sub Connector
Runtime: Spring Boot
Deployment: Kubernetes

Key Considerations:

High Availability: Ensure zero/minimal downtime.
Scalability: Handle dynamic workloads efficiently.
Resilience: Prevent failures from affecting workflow execution.

Current Plan:

Kubernetes Horizontal Pod Autoscaling (HPA) to scale based on workload.
Retry & Dead Letter Topic (DLT) for handling failed messages.
Distributed Tracing & Logging via OpenTelemetry and centralized logging (e.g., ELK, Loki).
Pub/Sub Subscription Design: Decide between push and pull subscriptions based on throughput and latency requirements.
Idempotency Handling: Avoid duplicate message processing.
Load Testing: Use Gatling or k6 for performance benchmarking.
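
As a rough illustration of the idempotency item, here is a minimal Java sketch of deduplicating by message ID. Pub/Sub delivers at least once by default, so the same message can arrive more than once. The class name `IdempotentMessageHandler` and its `handle` method are hypothetical and not part of the Camunda or Pub/Sub SDKs. The in-memory set is only a stand-in: with several pods behind an HPA, the processed IDs would presumably need to live in a shared store.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; not part of any Camunda or Pub/Sub SDK.
class IdempotentMessageHandler {

    // In-memory stand-in for a shared deduplication store (assumption:
    // a real multi-pod deployment would use a database or cache instead).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    /**
     * Runs the action at most once per message ID.
     *
     * @return true if the action ran, false if the ID was already handled
     */
    boolean handle(String messageId, Runnable action) {
        // add() returns false when the ID is already present, so only one
        // caller per ID gets past this check, even under concurrency.
        if (!processedIds.add(messageId)) {
            return false;
        }
        try {
            action.run();
            return true;
        } catch (RuntimeException e) {
            // Release the ID so a redelivery of the message can retry.
            processedIds.remove(messageId);
            throw e;
        }
    }
}
```

In this sketch a duplicate delivery returns `false` and can simply be acknowledged, while a failed action rethrows so the message is not acknowledged and the retry or dead-letter policy can take over.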

@jonathan.lukas @sbuettner