Best Practices for High Availability & Scalability of Custom Pub/Sub Connector

Hello Camunda Community,

I have developed a custom Google Pub/Sub connector for my organization. It will be a critical component, handling messages for most of our workflows. To make it highly available, scalable, and resilient, I am looking for best practices to apply while designing the solution.

Current Setup:

  • Camunda Platform: Camunda 8 (Self-Managed)
  • Connector Type: Custom-built Pub/Sub Connector
  • Runtime: Spring Boot
  • Deployment: Kubernetes

Key Considerations:

  • High Availability: Ensure zero/minimal downtime.
  • Scalability: Handle dynamic workloads efficiently.
  • Resilience: Prevent failures from affecting workflow execution.

Current Plan:

:white_check_mark: Kubernetes Horizontal Pod Autoscaling (HPA) to scale based on workload.
:white_check_mark: Retry & Dead Letter Topic (DLT) for handling failed messages.
:white_check_mark: Distributed Tracing & Logging via OpenTelemetry and centralized logging (e.g., ELK, Loki).
:white_check_mark: Pub/Sub Subscription Design: Decide between Push and Pull subscriptions based on which performs better for our workload.
:white_check_mark: Idempotency Handling: Avoid duplicate message processing.
:white_check_mark: Load Testing: Planning to use Gatling/K6 for performance benchmarking.
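
To make the idempotency item concrete, here is a minimal sketch in plain Java. It assumes every message carries a stable unique ID (for example the Pub/Sub message ID or a business key) and that redelivery can happen, since Pub/Sub delivers at least once by default. The class `IdempotentHandler` and its `handle` method are hypothetical names used for illustration only; they are not part of Camunda or the Pub/Sub client library. The in-memory map only deduplicates within a single pod, so with several replicas behind HPA a shared store (a database or cache) would have to take its place.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical helper: skips messages whose ID was already processed recently. */
final class IdempotentHandler {
    // messageId -> time (epoch millis) at which the ID was first claimed
    private final Map<String, Long> seen = new ConcurrentHashMap<>();
    private final long ttlMillis;

    IdempotentHandler(Duration ttl) {
        this.ttlMillis = ttl.toMillis();
    }

    /**
     * Runs the processor at most once per messageId within the TTL window.
     * Returns true if the message was processed, false if it was a duplicate.
     */
    boolean handle(String messageId, String payload, Consumer<String> processor) {
        long now = System.currentTimeMillis();
        // Drop expired entries so the map does not grow without bound.
        // (Full scan per call: fine for a sketch, too slow for high throughput.)
        seen.entrySet().removeIf(e -> now - e.getValue() > ttlMillis);

        // putIfAbsent is atomic, so two threads cannot both claim the same ID.
        if (seen.putIfAbsent(messageId, now) != null) {
            return false; // duplicate: the caller can simply acknowledge it
        }
        try {
            processor.accept(payload);
            return true;
        } catch (RuntimeException ex) {
            // Release the claim so a redelivered copy can be retried.
            seen.remove(messageId);
            throw ex;
        }
    }
}
```

In a subscriber callback, a `false` return would typically lead to acknowledging the message without side effects, while an exception would lead to a negative acknowledgement so the retry policy and dead letter topic can take over.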

@jonathan.lukas @sbuettner
