Multiple Machines High FetchAndLock Traffic

Hello,

I keep getting flooded with the following error:

06-Jul-2020 00:41:38.034 WARNING [http-nio-8089-exec-453] org.camunda.bpm.engine.rest.exception.RestExceptionHandler.toResponse org.camunda.bpm.engine.rest.exception.InvalidRequestException: At the moment the server has to handle too many requests at the same time. Please try again later.
	at org.camunda.bpm.engine.rest.impl.FetchAndLockHandlerImpl.errorTooManyRequests(FetchAndLockHandlerImpl.java:238)
	at org.camunda.bpm.engine.rest.impl.FetchAndLockHandlerImpl.addRequest(FetchAndLockHandlerImpl.java:191)
	at org.camunda.bpm.engine.rest.impl.FetchAndLockHandlerImpl.addPendingRequest(FetchAndLockHandlerImpl.java:292)
	at org.camunda.bpm.engine.rest.impl.FetchAndLockRestServiceImpl.fetchAndLock(FetchAndLockRestServiceImpl.java:37)
	at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:137)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:296)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:250)
	at org.jboss.resteasy.core.ResourceLocatorInvoker.invokeOnTargetObject(ResourceLocatorInvoker.java:140)
	at org.jboss.resteasy.core.ResourceLocatorInvoker.invoke(ResourceLocatorInvoker.java:103)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:377)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:200)
	at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:220)
	at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
	at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.camunda.bpm.engine.rest.filter.CacheControlFilter.doFilter(CacheControlFilter.java:44)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.camunda.bpm.engine.rest.filter.EmptyBodyFilter.doFilter(EmptyBodyFilter.java:98)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:770)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1415)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Unknown Source)

I was wondering whether this is related to how I use the FetchAndLock REST API: my own C# service implementations call it with the asyncResponseTimeout parameter.
I have a few questions:

  1. Is there a limit to the number of requests that Camunda can hold with this mechanism?
  2. Can I increase this limit?
  3. Can this limit be the reason for the error?
  4. If not then what might be the cause for such an error?

A bit more context:

We are trying to scale out our servers.
We have C# services that use the Camunda FetchAndLock API to fetch external tasks from different BPMNs.
For example, until now we had 1 machine/server with 3 services constantly sending FetchAndLock requests to our Tomcat/Camunda machine.
Now we are scaling out so that instead of 1 machine with 3 services we have 3 machines with 3 services each. That gives a total of 9 services constantly sending FetchAndLock requests to the Camunda engine (with some pausing through the asyncResponseTimeout property).

Thank you in advance!

Hi Ivgi,

I am checking the engine code in Camunda 7.13.

Looking at your stack trace and the Java code, I found that each deferred request is added to a blocking queue. This queue has a fixed capacity of 200 (class org.camunda.bpm.engine.rest.impl.FetchAndLockHandlerImpl:62):

protected BlockingQueue<FetchAndLockRequest> queue = new ArrayBlockingQueue<>(200);

A request is deferred and added to the queue when asyncResponseTimeout is not null and there are no tasks to return (class org.camunda.bpm.engine.rest.impl.FetchAndLockHandlerImpl:320-331):

 if (result.wasSuccessful()) {
  List<LockedExternalTaskDto> lockedTasks = result.getTasks();
  if (!lockedTasks.isEmpty() || dto.getAsyncResponseTimeout() == null) { // response immediately if tasks available
    asyncResponse.resume(lockedTasks);

    LOG.log(Level.FINEST, "Resuming request with {0}", lockedTasks);
  } else {
    addRequest(incomingRequest);

    LOG.log(Level.FINEST, "Deferred request");
  }
}

If the queue is full, you get the exception you saw (class org.camunda.bpm.engine.rest.impl.FetchAndLockHandlerImpl:229-236):

protected void addRequest(FetchAndLockRequest request) {
    if (!this.queue.offer(request)) {
        AsyncResponse asyncResponse = request.getAsyncResponse();
        this.errorTooManyRequests(asyncResponse);
    }

    this.condition.signal();
}
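
To see that rejection path in isolation, here is a minimal sketch using only the JDK's ArrayBlockingQueue. The class and method names (BoundedQueueDemo, countRejected, acceptsAfterDrain) and the figure of 250 requests are made up for illustration and are not part of Camunda. It only shows that offer() returns false once a bounded queue is full, and that the queue accepts entries again after something empties it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class, not Camunda code.
public class BoundedQueueDemo {

    // Offers `attempts` items to a queue of the given capacity and returns
    // how many were rejected because the queue was already full.
    public static int countRejected(int capacity, int attempts) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() never blocks: it returns false when no slot is free
            if (!queue.offer(i)) {
                rejected++;
            }
        }
        return rejected;
    }

    // Fills the queue, checks that one more offer is rejected, empties the
    // queue into a list, and checks that an offer is accepted again.
    public static boolean acceptsAfterDrain(int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.offer(i);
        }
        boolean rejectedWhenFull = !queue.offer(-1);

        List<Integer> drained = new ArrayList<>();
        queue.drainTo(drained); // moves every queued element into the list

        boolean acceptedAfterDrain = queue.offer(-1);
        return rejectedWhenFull && acceptedAfterDrain;
    }

    public static void main(String[] args) {
        // 200 mirrors the capacity quoted above; 250 is an arbitrary count
        System.out.println(countRejected(200, 250)); // prints 50
        System.out.println(acceptsAfterDrain(200));  // prints true
    }
}
```

In this sketch every offer beyond the capacity is rejected until the queue is emptied, which would match a burst of errors whenever more than 200 long-polling requests are waiting at the same time.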

It seems that your requests are filling up that queue, so the server can no longer accept further deferred requests from your workers. The queue should be emptied on the next run of the acquire() method, which empties the queue into another list and iterates over that list to respond to each caller. I would also investigate how your Camunda database handles that load. Enabling some additional logging could help you understand what is going on (https://docs.camunda.org/manual/latest/user-guide/logging/), for example for these categories:

org.camunda.bpm.engine.cmd
org.camunda.bpm.engine.externaltask
org.camunda.bpm.engine.impl.persistence.entity.ExternalTaskEntity
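
As a rough sketch, and assuming that your installation routes engine logging to java.util.logging (check the logging page linked above for your distribution), the levels of these categories could be raised programmatically as shown below. The class name CamundaLogLevels and the choice of Level.FINE are my own assumptions. A logging configuration file is the more usual place for this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Camunda.
public class CamundaLogLevels {

    // Logger categories quoted in the post above
    static final String[] CATEGORIES = {
        "org.camunda.bpm.engine.cmd",
        "org.camunda.bpm.engine.externaltask",
        "org.camunda.bpm.engine.impl.persistence.entity.ExternalTaskEntity"
    };

    // java.util.logging references loggers weakly, so keep strong references
    // to prevent the configured levels from being lost to garbage collection.
    private static final List<Logger> HELD = new ArrayList<>();

    // Sets the given level on every category listed above.
    public static void setLevel(Level level) {
        for (String name : CATEGORIES) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(level);
            HELD.add(logger);
        }
    }

    public static void main(String[] args) {
        // Level.FINE is an assumption; pick the level your setup needs
        setLevel(Level.FINE);
    }
}
```

Note that java.util.logging handlers have their own levels, so the handler that writes your log file must also allow the chosen level for the extra messages to appear.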

I couldn't find a way to override this limit. Your best options are to check whether something is wrong with your Camunda instance, or to scale your Camunda nodes horizontally so they can serve more workers.

Regards.
