REST API poor performance for starting Process Instances

Hello everyone,

I am currently using Camunda 7.10.0 in a Docker environment.
For the past few days, we’ve been experiencing very poor performance (long response times of more than 30-40s, or even timeouts) when starting process instances by definition key via the REST API.

The performance issues occur with all of our process definitions, but they are most pronounced with large BPMN models (one example is around 300KB of XML).

Here is some background information that I figured might help. Maybe somebody here notices something odd that could be causing the performance issues:

  • The server runs a MySQL container as the database alongside the Camunda container
  • The server is a DigitalOcean standard droplet with 6 vCores & 16GB of memory
  • Camunda’s maximum heap size is configured to be 8GB (of which Camunda uses all 8GB)
  • History level is set to ACTIVITY
  • Currently running 1,000 active process instances waiting at intermediate timer events
  • Approx. 40,000 variables in the ACT_RU_VARIABLE table
  • Approx. 7,500 deployments
  • Most of the tasks in our process definitions are external tasks or service tasks with expressions (simple calculations)
  • We run 9 external task workers (on other servers), each of which polls for tasks at an interval of 300-500ms via REST
  • Oftentimes, the first start of a process definition (after not starting that particular definition for a while) takes a long time, and directly after that, starts are blazingly fast. Maybe there is some caching of the process definition happening in Camunda on process instance creation?

I hope that somebody here has experienced similar issues and can help out with some tips & tricks.

Cheers!
Max

Hi Max, that external task worker polling interval seems very short, and it could be your issue. You could increase the polling interval (e.g. to several seconds or minutes) and see what happens, or take advantage of long polling to see if that helps. If you require such short intervals, you may want to consider changing your external workers to something synchronous like Java delegates or scripts.
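
For reference, here is a rough sketch of what a long-polling request body for POST /external-task/fetchAndLock could look like. The asyncResponseTimeout parameter is what turns the call into a long poll; the class name, worker id, topic name and durations are placeholders I made up, and the string building assumes names that need no JSON escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative helper only (hypothetical class, not part of Camunda).
final class FetchAndLockBody {

    // Builds the JSON body for POST /external-task/fetchAndLock.
    // Assumes workerId and topic names contain no characters that need JSON escaping.
    static String build(String workerId, int maxTasks, long asyncResponseTimeoutMs,
                        long lockDurationMs, List<String> topics) {
        String topicJson = topics.stream()
            .map(t -> "{\"topicName\":\"" + t + "\",\"lockDuration\":" + lockDurationMs + "}")
            .collect(Collectors.joining(","));
        return "{\"workerId\":\"" + workerId + "\","
            + "\"maxTasks\":" + maxTasks + ","
            // With this set, the server holds the request open until a task
            // is available or the timeout elapses, instead of returning at once.
            + "\"asyncResponseTimeout\":" + asyncResponseTimeoutMs + ","
            + "\"topics\":[" + topicJson + "]}";
    }

    // Requests per second produced by plain polling at a fixed interval.
    static double requestsPerSecond(int workers, long pollIntervalMs) {
        return workers * 1000.0 / pollIntervalMs;
    }
}
```

For comparison, requestsPerSecond(9, 300) and requestsPerSecond(9, 500) put your current setup at roughly 18 to 30 fetchAndLock requests per second, each of which needs a database connection.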

Joe

Thanks @Beagler! I have implemented long polling now, which reduced the number of requests from the external workers. My problem persists though.

This morning I got several process engine persistence exceptions like this one:

09:58:09.462 WARNING [http-nio-8080-exec-126] org.camunda.bpm.engine.rest.exception.ProcessEngineExceptionHandler.toResponse org.camunda.bpm.engine.ProcessEngineException: Process engine persistence exception
        at org.camunda.bpm.engine.impl.interceptor.CommandInvocationContext.rethrow(CommandInvocationContext.java:150)
        at org.camunda.bpm.engine.impl.interceptor.CommandContext.close(CommandContext.java:177)
        at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:115)
        at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:69)
        at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:32)
        at org.camunda.bpm.engine.impl.externaltask.ExternalTaskQueryTopicBuilderImpl.execute(ExternalTaskQueryTopicBuilderImpl.java:59)
        at org.camunda.bpm.engine.rest.impl.FetchAndLockHandlerImpl.executeFetchAndLock(FetchAndLockHandlerImpl.java:227)
        at org.camunda.bpm.engine.rest.impl.FetchAndLockHandlerImpl.tryFetchAndLock(FetchAndLockHandlerImpl.java:210)
        at org.camunda.bpm.engine.rest.impl.FetchAndLockHandlerImpl.addPendingRequest(FetchAndLockHandlerImpl.java:281)
        at org.camunda.bpm.engine.rest.impl.FetchAndLockRestServiceImpl.fetchAndLock(FetchAndLockRestServiceImpl.java:37)
        at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:137)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:296)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:250)
        at org.jboss.resteasy.core.ResourceLocatorInvoker.invokeOnTargetObject(ResourceLocatorInvoker.java:140)
        at org.jboss.resteasy.core.ResourceLocatorInvoker.invoke(ResourceLocatorInvoker.java:103)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:377)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:200)
        at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:220)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.camunda.bpm.engine.rest.filter.CacheControlFilter.doFilter(CacheControlFilter.java:44)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.camunda.bpm.engine.rest.filter.EmptyBodyFilter.doFilter(EmptyBodyFilter.java:98)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
        at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:668)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:770)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1415)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ibatis.exceptions.PersistenceException:
### Error querying database.  Cause: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-nio-8080-exec-126] Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:20; busy:20; idle:0; lastwait:30000].
### The error may exist in org/camunda/bpm/engine/impl/mapping/entity/ExternalTask.xml
### The error may involve org.camunda.bpm.engine.impl.persistence.entity.ExternalTaskEntity.selectExternalTasksForTopics
### The error occurred while executing a query
### Cause: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-nio-8080-exec-126] Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:20; busy:20; idle:0; lastwait:30000].
        at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:150)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:141)
        at org.camunda.bpm.engine.impl.db.sql.DbSqlSession.selectList(DbSqlSession.java:97)
        at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.selectListWithRawParameter(DbEntityManager.java:183)
        at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.selectList(DbEntityManager.java:175)
        at org.camunda.bpm.engine.impl.persistence.entity.ExternalTaskManager.selectExternalTasksForTopics(ExternalTaskManager.java:88)
        at org.camunda.bpm.engine.impl.cmd.FetchExternalTasksCmd.execute(FetchExternalTasksCmd.java:69)
        at org.camunda.bpm.engine.impl.cmd.FetchExternalTasksCmd.execute(FetchExternalTasksCmd.java:39)
        at org.camunda.bpm.engine.impl.interceptor.CommandExecutorImpl.execute(CommandExecutorImpl.java:27)
        at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:106)
        ... 49 more
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-nio-8080-exec-126] Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:20; busy:20; idle:0; lastwait:30000].
        at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:712)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:198)
        at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:132)
        at org.apache.ibatis.transaction.jdbc.JdbcTransaction.openConnection(JdbcTransaction.java:138)
        at org.apache.ibatis.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:60)
        at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:336)
        at org.apache.ibatis.executor.BatchExecutor.doQuery(BatchExecutor.java:90)
        at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:324)
        at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
        at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
        at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:83)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:148)
        ... 58 more

After seeing that error, I enabled the test-on-borrow feature for my database connections, as suggested here in the forum. That led to some weird behaviour:

It seems like the Process Engine needs to “warm up” before it can run at proper speed. What does that mean? Whenever I start a process after a few minutes of doing nothing with Camunda, it takes 30-60s to start that process. If I then start another process with the same definition right after that, it starts very fast.

Do you have any idea what that might be?

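import java.util.List;

import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.impl.RepositoryServiceImpl;
import org.camunda.bpm.engine.impl.cfg.AbstractProcessEnginePlugin;
import org.camunda.bpm.engine.impl.cfg.ProcessEngineConfigurationImpl;
import org.camunda.bpm.engine.impl.interceptor.Command;
import org.camunda.bpm.engine.impl.interceptor.CommandContext;
import org.camunda.bpm.engine.impl.interceptor.CommandExecutor;
// Package of DeploymentCache may differ between engine versions; adjust if it does not resolve.
import org.camunda.bpm.engine.impl.persistence.deploy.cache.DeploymentCache;
import org.camunda.bpm.engine.impl.persistence.entity.DeploymentEntity;
import org.camunda.bpm.engine.repository.Deployment;

// Pre-loads every deployment into the engine's deployment cache once the engine is built.
public class PreLoadingDefinitionsPlugin extends AbstractProcessEnginePlugin {

	@Override
	public void postProcessEngineBuild(ProcessEngine processEngine) {
		RepositoryServiceImpl repositoryService = (RepositoryServiceImpl) processEngine.getRepositoryService();

		ProcessEngineConfigurationImpl config = (ProcessEngineConfigurationImpl) processEngine.getProcessEngineConfiguration();
		DeploymentCache cache = config.getDeploymentCache();
		// Run the warm-up in its own transaction.
		CommandExecutor executor = config.getCommandExecutorTxRequiresNew();
		executor.execute(new PreLoadingCmd(cache, repositoryService));
	}

	private static class PreLoadingCmd implements Command<Boolean> {
		private final DeploymentCache deploymentCache;
		private final RepositoryServiceImpl repositoryService;

		public PreLoadingCmd(DeploymentCache deploymentCache, RepositoryServiceImpl repositoryService) {
			this.deploymentCache = deploymentCache;
			this.repositoryService = repositoryService;
		}

		@Override
		public Boolean execute(CommandContext commandContext) {
			// If unlimitedList() is not available in your engine version, list() should be the equivalent.
			List<Deployment> deploymentList = repositoryService.createDeploymentQuery().unlimitedList();
			for (Deployment deployment : deploymentList) {
				try {
					// Runs the deployers for this deployment, which puts its definitions into the cache.
					deploymentCache.deploy((DeploymentEntity) deployment);
				} catch (Exception ignored) {
					// Deliberately skip deployments that fail to load, so one bad deployment does not abort the warm-up.
				}
			}
			return Boolean.TRUE;
		}
	}

}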
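
To illustrate why pre-loading can help, and where it may stop helping, here is a minimal stand-alone sketch of a bounded least-recently-used cache. It is not Camunda's actual cache implementation: DefinitionCacheSketch, loadCount and the loader function are made up for illustration. It also rests on the assumption that the deployment cache is bounded (I believe the capacity is configurable via cacheCapacity, but please check that for your version).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a bounded definition cache; not Camunda code.
final class DefinitionCacheSketch<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader;
    private int loadCount = 0;

    DefinitionCacheSketch(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the capacity is exceeded.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached value, or runs the (expensive) loader on a miss.
    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key); // stands in for fetching and parsing the BPMN XML
            loadCount++;
            entries.put(key, value);
        }
        return value;
    }

    // Number of times the loader had to run, i.e. the number of slow starts.
    int loadCount() {
        return loadCount;
    }
}
```

In this sketch the second get for the same key skips the loader, which matches the "slow first start, fast afterwards" pattern. If there are more definitions than the cache can hold, the oldest entries are evicted again, so with around 7,500 deployments a warm-up at startup may only keep the most recently loaded ones cached.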