ActivateJobs request returns 504 status code (Camunda SaaS and .NET Zeebe Client)

Hi! I’m experiencing a gateway issue using Camunda 8 SaaS. When the ActivateJobs POST request is sent, the response code 504 (Gateway Timeout) is returned. This happens quite often. The error was originally noticed on the cloud instance where our app is hosted, but I can reproduce it locally too.

It’s a .NET Core service and we use the zeebe-client-csharp package to communicate with Camunda 8 SaaS.

Below is the request that fails with a 504 status code:

POST https://clusterId.regionId.zeebe.camunda.io:443/gateway_protocol.Gateway/ActivateJobs

The local dev environment doesn’t have any gateways or proxies between the app and Camunda, so I assume that the only gateway is on the Camunda side.

Here is the setup of workers:

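client.NewWorker()
    .JobType(jobType)
    .Handler(jobHandler)
    .MaxJobsActive(maxJobsActive)
    .Name(jobType)
    .PollingTimeout(requestTimeout) // request timeout of each ActivateJobs call
    .Timeout(timeout)               // how long an activated job stays assigned to this worker
    .Open();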

I assume that with this configuration the Zeebe client uses long polling. I have played with different configurations, but nothing helped. I’ve seen several topics with the same error, but they weren’t helpful.

Hi @Dzmitry, welcome to the forums! What values do you have set for requestTimeout and timeout?

Hi @nathan.loding, thanks! The requestTimeout is set to 10 minutes and the timeout varies from worker to worker, but to isolate the issue I used just one worker implementation with a 5-minute timeout.

@Dzmitry - what version of Camunda is deployed in your SaaS cluster? And when you say “reproduce it locally too” do you mean that running the .NET code locally gets the same error, or you are testing with a local copy of Camunda? And what version of the C# client are you using?

How often does the error occur? Does the error occur when there are jobs waiting to be picked up by the job worker, if there are no jobs, or a mix?

@nathan.loding - it’s actually on cloud.camunda.io. The version is 8.7.87. Regarding testing, I’m running our own application, which uses Zeebe client v2.7, on my laptop, and it connects to cloud.camunda.io. We also have a dev environment in the cloud (Azure) where we originally spotted the issue.

The error occurs every single minute, regardless of whether there are jobs available to be picked up.

@Dzmitry - have you tried upgrading the client to 2.9? Looking at the changes, I suspect it won’t fix the issue, but that’s the first thing I would try.

@nathan.loding - I did try. But as you suspected, it didn’t have any effect.

@Dzmitry - always worth trying! The next thing I would try is to remove the client from the equation. I would make an API request with grpcurl, Postman, or a similar tool, and call the ActivateJobs gRPC endpoint directly. Do you still run into issues? If so, then I would look at networking (though that seems odd because you’re experiencing it from an Azure deployment as well as locally). If not, then we know there’s something with the .NET client that isn’t working properly.

@nathan.loding - thanks, I’ll try it and let you know if there’re any differences.

Hi @nathan.loding. I’ve tried calling the ActivateJobs gRPC endpoint directly and found an interesting correlation between failed and successful requests. When the requestTimeout parameter is >= 60 seconds, the server response is always GRPC_STATUS_RESOURCE_EXHAUSTED. If it’s kept below 60 seconds, the request succeeds.
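
For anyone who wants to test the same threshold from the .NET client, here is a minimal sketch of a worker whose long-polling request timeout stays below 60 seconds. It assumes the zb-client NuGet package and its CamundaCloudClientBuilder. The environment variable names, the job type, the worker name, the MaxJobsActive value, and the 30-second polling timeout are all placeholders and not values from this thread. Whether this removes the 504 responses was not confirmed here, so treat it as an experiment rather than a fix.

```csharp
using System;
using Zeebe.Client.Impl.Builder;

public static class Program
{
    public static void Main()
    {
        // Hypothetical environment variable names; substitute your own cluster credentials.
        var clientId = Environment.GetEnvironmentVariable("ZEEBE_CLIENT_ID");
        var clientSecret = Environment.GetEnvironmentVariable("ZEEBE_CLIENT_SECRET");
        // Expected form: clusterId.regionId.zeebe.camunda.io:443
        var contactPoint = Environment.GetEnvironmentVariable("ZEEBE_ADDRESS");

        var client = CamundaCloudClientBuilder.Builder()
            .UseClientId(clientId)
            .UseClientSecret(clientSecret)
            .UseContactPoint(contactPoint)
            .Build();

        using (client.NewWorker()
            .JobType("example-job")                    // placeholder job type
            .Handler((jobClient, job) => Console.WriteLine($"Handling job {job.Key}"))
            .MaxJobsActive(5)                          // placeholder value
            .Name("example-worker")                    // placeholder worker name
            .AutoCompletion()                          // complete the job once the handler returns
            .PollingTimeout(TimeSpan.FromSeconds(30))  // assumption: keep the request timeout under 60 s
            .Timeout(TimeSpan.FromMinutes(5))          // job timeout, as in the single-worker test above
            .Open())
        {
            Console.WriteLine("Worker opened. Press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```

If the 504 responses stop with a value below 60 seconds and come back at 60 seconds or more, that would match the behaviour seen with the direct gRPC call and is worth including in the support ticket.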

@Dzmitry - I would recommend opening a support ticket at this point. Something seems off, and I suspect it isn’t with the C# client if you are getting errors while calling the endpoint without it. One other thing you could try - though it might not be worth the effort - is to build a simple job worker implementation with our Java client and see if you still get errors. But as I said, I suspect you will because you’re getting them without the C# client.