I have to check Camunda performance in a “production” environment. I have deployed the REST API application on WildFly 10. In this tutorial https://docs.camunda.org/manual/7.6/installation/full/jboss/manual/ some properties are set in the standalone.xml file.
I’d like to know the best values for “core-threads”, “max-threads”, “queue-length”, etc. Of course I can write in any numbers and run my tests, but the question is: how do I calculate them?
Threads, pools, and sessions.
It depends on how you need to balance available resources (CPU, RAM) against your BPM capacity plans.
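For a first-cut number rather than a guess, the classic pool-sizing heuristic from the Java concurrency literature is threads ≈ cores × target utilization × (1 + wait time / compute time). A sketch (the formula is a rule of thumb, and all the input numbers below are made up for illustration):

```java
public class PoolSizing {
    // Heuristic: threads = cores * targetUtilization * (1 + waitTime / computeTime).
    // Tasks that spend most of their time waiting on I/O (DB, REST calls)
    // justify far more threads than CPU-bound ones.
    static int suggestedThreads(int cores, double utilization,
                                double waitMs, double computeMs) {
        return (int) Math.ceil(cores * utilization * (1 + waitMs / computeMs));
    }

    public static void main(String[] args) {
        // Hypothetical: 4 cores, 80% target utilization, tasks wait ~90 ms
        // on the DB for every ~10 ms of CPU work.
        System.out.println(suggestedThreads(4, 0.8, 90, 10)); // prints 32
    }
}
```

Measure the wait/compute ratio from your own load tests; the formula only turns those measurements into a starting point for "core-threads"/"max-threads", not a final answer.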
The documentation overlaps with JBoss. Also, given that WildFly has moved ahead, the instructions need some adapting to the most current WildFly version.
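For orientation, the properties you named sit in the Camunda subsystem’s job-executor block of standalone.xml. A sketch from memory of the 7.6 manual (verify the element names against the linked docs; the values are just the shipped defaults, not recommendations):

```xml
<subsystem xmlns="urn:org.camunda.bpm.jboss:1.1">
  <job-executor>
    <!-- Illustrative values only; tune against your own load tests. -->
    <core-threads>3</core-threads>
    <max-threads>5</max-threads>
    <queue-length>3</queue-length>
    <job-acquisitions>
      <job-acquisition name="default" />
    </job-acquisitions>
  </job-executor>
</subsystem>
```

Note these govern the job executor (async continuations, timers); the Undertow worker handling REST requests is sized separately in its own subsystem.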
Here’s what I’ve found:
- I wrote my own test harness using Apache Camel’s JUnit extensions.
- In running sets of 1000 requests, I was able to reduce response times by about 13 ms via increasing the availability of sessions (for Undertow - REST) and threads. Roughly, doubling and tripling them (in broad sweeps) while tracking RAM requirements.
- Increasing RAM is done in a separate config file (I’ll follow up with exact details).
- Performance heavily depends on task implementations… Busy tasks (or task implementations that eat up and lock out resources) require some oversight in review/testing.
- An initially well-performing BPM container will soon get bogged down with too many objects; this depends on your architecture/approach to performance requirements.
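On the RAM point above: for WildFly, heap settings live in the launcher config, typically bin/standalone.conf (standalone.conf.bat on Windows). A sketch with illustrative sizes only:

```shell
# bin/standalone.conf -- picked up by standalone.sh at startup.
# Sizes below are placeholders; set them from your own load-test results.
JAVA_OPTS="-Xms512m -Xmx2048m -XX:MetaspaceSize=96m -XX:MaxMetaspaceSize=256m"
```

Keep -Xms and -Xmx in view while you widen the thread pools, since every extra thread carries its own stack on top of the heap.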
Hawtio and various JMX monitoring tools help. But if you’re familiar with WebSphere… it’s a slightly different approach.
Here are a couple of links. They don’t precisely line up with WildFly v10… but the config changes are generally the same. The big difference is the use of Undertow.
Wildfly performance tuning
Entering Undertow Web server
I’m comparing 3 BPM engines and I have a few performance tests. For now, Camunda looks nice.
There is only one case where the Camunda engine is much slower than the others: signal events. There are 2 process definitions: the first one throws a signal that starts the second.
I’m running 30k process instances of the first definition. As a consequence, I get 30k instances of the second definition.
My test takes about 7min. When running exactly the same test (same definitions, same machine etc.) on any other engine - the test takes about 3min.
I don’t know why, but Camunda uses only about 30% of CPU resources while the other engines use about 90%.
How can I speed the Camunda engine up?
Hmmm. My test consumes 100% of the database machine’s CPU. Missing index?
Comparing CPU utilization, or consumption, between BPM engines may be like comparing apples to oranges… and depends on the BPM model (BPMN diagram) under test/load.
For example, one engine may begin immediate execution of service tasks while another may not. So, latent task-service invocation may require some extra monitoring time depending on configuration.
Additional allowances are also required for the number and type of executable BPMN elements in each model.
Noted above is the use of a signal event to start, or launch, a new process. However, you could also refine your model by removing the signal start event and replacing it with a simple “none” start event (empty circle). The sending side can then simply start the new process via the Java API, for example. I’m guessing here on CPU costs, though making some assumptions based on the high DB service overhead.
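As a sketch of that API-start alternative: a service task in the first process could be wired to a delegate like the one below instead of a throw-signal event. This is untested illustration only; “secondProcess” is a hypothetical process definition key, and your second process would need the signal start event swapped for a none start event.

```java
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

// Replaces the throw-signal event: starts the second process directly
// via the engine API instead of broadcasting a signal.
public class StartSecondProcessDelegate implements JavaDelegate {

    @Override
    public void execute(DelegateExecution execution) {
        // "secondProcess" is a hypothetical key for the second definition.
        execution.getProcessEngineServices()
                 .getRuntimeService()
                 .startProcessInstanceByKey("secondProcess",
                         execution.getProcessBusinessKey());
    }
}
```

A targeted start like this avoids the event-subscription lookup a broadcast signal implies, which is where I’d suspect the extra DB load is going.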
Regarding databases… some just eat up all available CPU while others are more conservative. That point aside (it’s an assumption), it sounds like your model is aggressive with its DB requirements. I recommend isolating your task implementations and services and testing the more suspicious ones individually. This may require revisiting the overall architecture if task implementations can’t be “unglued” from the BPM model for discrete execution/testing.
I also recommend taking a close look at process variables. I tend to size and add up all process variables, the point being to get a measure of their cost to overall performance. And there’s also (always) some service task in the mix blocking threads rather than following a more resource-friendly, event-driven approach to systems interaction.
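One crude way to do that sizing: serialize each variable value and add up the bytes, as a rough proxy for what the engine persists per process instance. A self-contained sketch; the helper and the sample variables are hypothetical, not Camunda API:

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

public class VariableFootprint {

    // Rough serialized size of one value -- a hypothetical helper,
    // only a proxy for the engine's actual storage format.
    static int serializedBytes(Serializable value) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws Exception {
        // Made-up variables standing in for a real process instance's set.
        Map<String, Serializable> variables = new LinkedHashMap<>();
        variables.put("orderId", 42L);
        variables.put("customerName", "Alice");
        variables.put("payload", new byte[10_000]); // large blobs dominate

        int total = 0;
        for (Map.Entry<String, Serializable> e : variables.entrySet()) {
            int size = serializedBytes(e.getValue());
            System.out.println(e.getKey() + ": ~" + size + " bytes");
            total += size;
        }
        System.out.println("total: ~" + total + " bytes per process instance");
    }
}
```

Multiply that total by your instance count and the DB write pressure becomes visible quickly; large blob-like variables are usually the first candidates to move out of the process state.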
I’ve recreated the DB. The time is now about 3 min.