ActivateJobs request returns 504 status code (Camunda SaaS and .NET Zeebe Client)

Dzmitry · September 29, 2025, 5:03pm

Hi! I’m experiencing a gateway issue with Camunda 8 SaaS. ActivateJobs POST requests frequently come back with status code 504 (Gateway Timeout). The error was first noticed on the cloud instance where our app is hosted, but I can reproduce it locally too.

It’s a .NET Core service, and we use the zeebe-client-csharp package to communicate with Camunda 8 SaaS.

Below is the request that fails with a 504 status code:

POST https://clusterId.regionId.zeebe.camunda.io:443/gateway_protocol.Gateway/ActivateJobs

The local dev environment doesn’t have any gateways or proxies between the app and Camunda, so I assume that the only gateway is on the Camunda side.

Here is the setup of workers:

client.NewWorker()
    .JobType(jobType)
    .Handler(jobHandler)
    .MaxJobsActive(maxJobsActive)
    .Name(jobType)
    .PollingTimeout(requestTimeout)
    .Timeout(timeout)
    .Open();

I assume that with this configuration the Zeebe client uses long polling. I have tried different configurations, but nothing helped. I’ve seen several topics about the same error, but they weren’t helpful.

nathan.loding · September 29, 2025, 5:22pm

Hi @Dzmitry, welcome to the forums! What values do you have set for requestTimeout and timeout?

Dzmitry · September 30, 2025, 6:05am

Hi @nathan.loding, thanks! The requestTimeout is set to 10 minutes, and timeout varies from worker to worker, but to isolate the issue I used just one worker implementation with a 5-minute timeout.

nathan.loding · September 30, 2025, 5:55pm

@Dzmitry - what version of Camunda is deployed in your SaaS cluster? And when you say “reproduce it locally too”, do you mean that running the .NET code locally gets the same error, or that you are testing with a local copy of Camunda? And what version of the C# client are you using?

How often does the error occur? Does the error occur when there are jobs waiting to be picked up by the job worker, if there are no jobs, or a mix?

Dzmitry · September 30, 2025, 6:26pm

@nathan.loding - it’s actually on cloud.camunda.io. The version is 8.7.87. Regarding testing, I’m running our own application on my laptop; it uses Zeebe client v2.7 and connects to cloud.camunda.io. We also have a dev environment in the cloud (Azure), where we originally spotted the issue.

The error occurs every single minute, regardless of whether there are jobs available to be picked up.

nathan.loding · September 30, 2025, 6:31pm

@Dzmitry - have you tried upgrading the client to 2.9? Looking at the changes, I suspect it won’t fix the issue, but that’s the first thing I would try.

Dzmitry · September 30, 2025, 6:34pm

@nathan.loding - I did try. But as you suspected, it didn’t have any effect.

nathan.loding · September 30, 2025, 6:51pm

@Dzmitry - always worth trying! The next thing I would try is to remove the client from the equation. I would use grpcurl, Postman, or a similar tool to call the ActivateJobs gRPC endpoint directly. Do you still run into issues? If so, then I would look at networking (though that seems odd because you’re experiencing it from an Azure deployment as well as locally). If not, then we know there’s something in the .NET client that isn’t working properly.

Dzmitry · September 30, 2025, 6:55pm

@nathan.loding - thanks, I’ll try it and let you know if there are any differences.

Dzmitry · October 1, 2025, 9:18am

Hi @nathan.loding. I’ve tried calling the ActivateJobs gRPC endpoint directly and found an interesting correlation between failed and successful requests. When the requestTimeout parameter is >= 60 seconds, the server response is always GRPC_STATUS_RESOURCE_EXHAUSTED. If it’s kept below 60 seconds, the request succeeds.
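
One way to act on this observation while the root cause is unknown is to cap the polling timeout below the threshold. The following is only a minimal sketch: the `PollingTimeouts` class and its 5-second safety margin are hypothetical, and the 60-second limit is simply what was observed in this thread, not a documented gateway setting.

```csharp
using System;

// Hypothetical helper for illustration; not part of zeebe-client-csharp.
public static class PollingTimeouts
{
    // Assumption: 60 seconds is the threshold observed in this thread,
    // not a documented limit of the Camunda SaaS gateway.
    public static readonly TimeSpan ObservedLimit = TimeSpan.FromSeconds(60);

    // Assumption: an arbitrary margin to stay clearly below the threshold.
    public static readonly TimeSpan SafetyMargin = TimeSpan.FromSeconds(5);

    // Returns the requested timeout, capped at ObservedLimit - SafetyMargin.
    public static TimeSpan Clamp(TimeSpan requested)
    {
        TimeSpan ceiling = ObservedLimit - SafetyMargin;
        return requested < ceiling ? requested : ceiling;
    }
}
```

With the worker setup from the first post, this would be passed as `.PollingTimeout(PollingTimeouts.Clamp(requestTimeout))`, so a 10-minute requestTimeout becomes 55 seconds.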

nathan.loding · October 1, 2025, 3:26pm

@Dzmitry - I would recommend opening a support ticket at this point. Something seems off, and I suspect it isn’t with the C# client if you are getting errors while calling the endpoint without it. One other thing you could try - though it might not be worth the effort - is to build a simple job worker implementation with our Java client and see if you still get errors. But as I said, I suspect you will because you’re getting them without the C# client.

Dzmitry · October 7, 2025, 9:55am

We decided to file a support ticket and are now investigating the issue. Thanks for your support @nathan.loding.

nathan.loding · October 7, 2025, 2:41pm

Hi @Dzmitry - I looked at the support ticket internally, and wanted to share that there is currently a pull request to add the backoff feature to the C# client:

github.com/camunda-community-hub/zeebe-client-csharp

feat: added backoff for job worker

main ← LennartKleymann:813-added-backoff-to-job-worker

opened 12:49PM - 23 Sep 25 UTC

LennartKleymann

+292 -4

closes #813

### Feature: Configurable Exponential Backoff for Job Worker

This PR adds a configurable exponential backoff strategy for job polling retries in the C# client. This new feature helps prevent overwhelming the gateway and aligns the C# client with the behavior of the Java client.

#### Motivation

When the gateway is under heavy load and returns a **`RESOURCE_EXHAUSTED`** gRPC error, the current worker's aggressive retry behavior can create a **"thundering herd"** problem. By adding an exponential backoff, the worker can space out its retries, giving the gateway time to recover and improving overall system resilience.

#### Changes & Usage

* **Adds `IBackoffSupplier` and `IExponentialBackoffBuilder`:** These new APIs provide a fluent builder to configure the backoff policy.
* **New Worker Builder API:** Use the new `BackoffSupplier()` method on the job worker builder to supply a custom backoff strategy.
* **Example:**

  ```csharp
  var backoffSupplier = new ExponentialBackoffBuilder()
      .MinDelay(TimeSpan.FromMilliseconds(50))
      .MaxDelay(TimeSpan.FromSeconds(5))
      .BackoffFactor(1.6)
      .JitterFactor(0.1)
      .Build();

  client.NewWorker()
      .JobType("payment")
      .Handler(handler)
      .BackoffSupplier(backoffSupplier)
      .Open();
  ```

* **Backoff Logic:** The backoff is applied only on `RESOURCE_EXHAUSTED` errors and is reset to the initial polling interval after a successful job activation.

#### Backwards Compatibility

Existing worker configurations will continue to function without errors. However, the default polling behavior for these workers is now an exponential backoff with standard values, which differs from the previous fixed retry mechanism. The API remains backward compatible.

#### Testing

Unit tests have been added to verify the backoff behavior, including the handling of jitter and the monotonic increase of the delay.
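
To make the backoff logic described in the pull request more concrete, here is a minimal, self-contained sketch of exponential backoff with jitter. It is not the code from the PR: the `ExponentialBackoffSketch` class and its members are hypothetical, and the exact jitter formula is an assumption. It only illustrates the idea of growing the delay by a factor after each `RESOURCE_EXHAUSTED` response, capping it at a maximum, and resetting it after a successful activation.

```csharp
using System;

// Hypothetical helper for illustration only; it is NOT the IBackoffSupplier
// implementation from the pull request.
public sealed class ExponentialBackoffSketch
{
    private readonly double minDelayMs;
    private readonly double maxDelayMs;
    private readonly double backoffFactor;
    private readonly double jitterFactor;
    private readonly Random random;
    private double currentDelayMs;

    public ExponentialBackoffSketch(TimeSpan minDelay, TimeSpan maxDelay,
        double backoffFactor, double jitterFactor, Random random)
    {
        minDelayMs = minDelay.TotalMilliseconds;
        maxDelayMs = maxDelay.TotalMilliseconds;
        this.backoffFactor = backoffFactor;
        this.jitterFactor = jitterFactor;
        this.random = random;
        currentDelayMs = minDelayMs;
    }

    // Call after a RESOURCE_EXHAUSTED response: returns how long to wait
    // before the next poll, then grows the base delay for the following call.
    public TimeSpan NextDelay()
    {
        // Assumed jitter model: a random offset in
        // [-jitterFactor, +jitterFactor) of the current base delay.
        double jitter = currentDelayMs * jitterFactor * (random.NextDouble() * 2.0 - 1.0);
        double delayMs = Math.Max(0.0, currentDelayMs + jitter);

        // Grow the base delay for the next call, capped at the maximum.
        currentDelayMs = Math.Min(currentDelayMs * backoffFactor, maxDelayMs);
        return TimeSpan.FromMilliseconds(delayMs);
    }

    // Call after a successful activation: start again from the minimum delay.
    public void Reset()
    {
        currentDelayMs = minDelayMs;
    }
}
```

With the values from the PR example (50 ms minimum, 5 s maximum, factor 1.6) and jitter disabled, the base delays grow as 50 ms, 80 ms, 128 ms and so on until they reach the 5-second cap. The jitter spreads retries from many workers so they do not all hit the gateway at the same instant.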

I help maintain the client, but due to CamundaCon NYC I haven’t been able to finish reviewing the PR yet. I will make reviewing this a priority after CamundaCon is done!

Dzmitry · October 8, 2025, 12:16pm

Hi @nathan.loding - Thanks a lot! I do appreciate it.

Dzmitry · October 22, 2025, 8:56am

Hi @nathan.loding - thanks for reviewing the aforementioned pull request! I see there’s one more reviewer required to finalize it. Perhaps you could speak to Christopher Kujawa? Thanks in advance!