1 out of 3 pods getting most job executions

Jean_Robert_Alves · January 15, 2021, 11:25am

We have a Camunda Spring Boot application running in Kubernetes with 3 pods.
In our application.yml we have this configuration:

camunda:
  bpm:
    job-execution:
      queue-capacity: 10
      core-pool-size: 4
      max-pool-size: 5
      lock-time-in-millis: 1800000
      wait-time-in-millis: 5000
      max-jobs-per-acquisition: 4
    history-level: none
    generic-properties:
      properties:
        jobExecutorAcquireByDueDate: true

I have an index on the DUEDATE_ column of the MySQL table ACT_RU_JOB.

But today I started to see that one of our 3 pods is working much harder than the others. CPU usage went well above the requested amount, and only on that single pod.

Running a GROUP BY on LOCK_OWNER_ in ACT_RU_JOB, I could see that one single LOCK_OWNER_ is holding most of the jobs too. The others hold 1 or 2 simultaneous jobs, sometimes 10, but that one pod holds more than 100.
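
For reference, the lock-count check described above can be sketched as follows. This is only an illustration: it uses an in-memory SQLite table as a stand-in for the MySQL database, models just two columns of ACT_RU_JOB, and the pod names and row counts are made up. The SELECT itself should carry over to MySQL unchanged.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL engine database (assumption);
# only the two columns needed for the query are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACT_RU_JOB (ID_ TEXT PRIMARY KEY, LOCK_OWNER_ TEXT)")

# Hypothetical sample data: pod-a holds most locks, job-103 is not locked.
rows = [(f"job-{i}", "pod-a") for i in range(100)]
rows += [("job-100", "pod-b"), ("job-101", "pod-b"),
         ("job-102", "pod-c"), ("job-103", None)]
conn.executemany("INSERT INTO ACT_RU_JOB VALUES (?, ?)", rows)

# Count currently locked jobs per lock owner, busiest owner first.
LOCKS_PER_OWNER = """
    SELECT LOCK_OWNER_, COUNT(*) AS locked_jobs
    FROM ACT_RU_JOB
    WHERE LOCK_OWNER_ IS NOT NULL
    GROUP BY LOCK_OWNER_
    ORDER BY locked_jobs DESC
"""

locks = conn.execute(LOCKS_PER_OWNER).fetchall()
for owner, count in locks:
    print(owner, count)
```

Filtering out NULL owners keeps unlocked jobs from showing up as a separate group.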

As my configuration sets queue-capacity: 10, I was expecting that one pod would never get more than 10 jobs… What am I doing wrong? Is there any way to know whether Camunda is reading this configuration?

Niall · January 15, 2021, 11:35am

How much throughput are you sending - requests per second?

Jean_Robert_Alves · January 15, 2021, 11:45am

Our throughput is really low, but I can see that it is happening on timer jobs.

Niall · January 15, 2021, 11:50am

The problem could be with:
max-jobs-per-acquisition: 4
If you have low throughput what could be happening is that the first node grabs all the available jobs and starts working on them - when the other nodes query for jobs they don’t find any.
If exponential back-off is in effect, each time a node doesn’t get work it will wait longer before asking again.

This might mean that one node polls more often and keeps taking all the jobs - so maybe lowering this number will help.
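
To make the back-off effect concrete, here is a minimal sketch of how an idle node's wait could grow after consecutive empty polls. The base wait of 5000 ms is the wait-time-in-millis from the configuration above; the doubling factor and the 60000 ms cap are assumptions chosen for illustration, and `idle_waits` is a hypothetical helper, not engine code.

```python
def idle_waits(base_ms, factor, max_ms, empty_polls):
    """Return the wait applied after each consecutive empty acquisition."""
    waits = []
    wait = base_ms
    for _ in range(empty_polls):
        waits.append(wait)
        # Grow the wait geometrically, but never beyond the cap.
        wait = min(wait * factor, max_ms)
    return waits


# base_ms comes from the thread's config; factor and max_ms are assumed values.
waits = idle_waits(base_ms=5000, factor=2, max_ms=60000, empty_polls=6)
print(waits)  # [5000, 10000, 20000, 40000, 60000, 60000]
```

Under these assumptions a node that keeps finding work polls again quickly, while an idle node ends up checking only once a minute, so the busy node sees newly due timers first.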

Jean_Robert_Alves · January 15, 2021, 11:55am

hmm, I will try with lower max-jobs-per-acquisition, but… if my queue-capacity is set at 10, how can a single node get more than 100 jobs locked?

I even redeployed, deleted and recreated all 3 pods and it always happens again, a single pod takes almost all the Timer jobs.

Is there any specific configuration I should have set for timer events?

Jean_Robert_Alves · January 15, 2021, 12:09pm

I am now running with max-jobs-per-acquisition: 1 and it is still happening on my new pods.

thorben · January 15, 2021, 12:40pm

Which version of the Spring Boot starter do you use?

Jean_Robert_Alves · January 15, 2021, 1:14pm

Currently this project is using these versions:
org.springframework.boot:spring-boot-starter:2.0.5.RELEASE
org.camunda.bpm.springboot:camunda-bpm-spring-boot-starter:3.0.0
org.camunda.bpm:camunda-engine-plugin-spin:7.9.0

thorben · January 15, 2021, 1:17pm

Okay, please update to 3.1.9 or higher (and Camunda along with it to stay within the compatibility matrix). The queue capacity parameter doesn’t take effect in your version, as per https://jira.camunda.com/browse/CAM-11368.

Cheers,
Thorben

Jean_Robert_Alves · January 15, 2021, 1:34pm

We will try to update our projects to this version and see the effect.

Thanks a lot!

Jean_Robert_Alves · January 19, 2021, 6:11pm

We are updating Camunda from 7.9 to 7.13 (starter 3.1.9) and, as we expected, we got errors when deploying our definitions, saying that the database schema is different:

Cause: org.apache.ibatis.executor.BatchExecutorException: org.camunda.bpm.engine.impl.persistence.entity.ResourceEntity.insertResource (batch index #3) failed. 2 prior sub executor(s) completed successfully, but will be rolled back. Cause: java.sql.BatchUpdateException: Unknown column 'TYPE_' in 'field list'

Is there any guide on how to upgrade with many process instances already in production?
I’m following the update guides, but is it safe to do this in a production environment with jobs already in the database?

Ingo_Richtsmeier · January 20, 2021, 8:58am

Hi @Jean_Robert_Alves,

You can find a link to the database upgrade scripts here: https://docs.camunda.org/manual/7.14/update/minor/712-to-713/. For the latest version, you can find the scripts in our Nexus repository here: https://app.camunda.com/nexus/repository/public/org/camunda/bpm/distro/camunda-sql-scripts/7.14.0/camunda-sql-scripts-7.14.0.zip.

Check the upgrade folder in the zip file for the required scripts.

Each upgrade script contains CREATE TABLE or ALTER TABLE ... ADD COLUMN statements, and how long a table stays locked during the upgrade depends on the database.
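
As a rough sketch of what going from 7.9 to 7.13 involves, the snippet below lists one script per minor version step, to be run in order. The file name pattern (`mysql_engine_7.9_to_7.10.sql` and so on) is an assumption about how the upgrade folder is laid out, and `upgrade_steps` is a hypothetical helper; check the actual names in the zip file, and look for any patch-level scripts that apply to your starting version.

```python
def upgrade_steps(start_minor, target_minor, db="mysql", major=7):
    """List one upgrade script per minor version step, in execution order.

    The naming pattern is an assumption; verify it against the upgrade folder.
    """
    return [
        f"{db}_engine_{major}.{minor}_to_{major}.{minor + 1}.sql"
        for minor in range(start_minor, target_minor)
    ]


# Versions from this thread: Camunda 7.9 to 7.13 on MySQL.
steps = upgrade_steps(9, 13)
for name in steps:
    print(name)
```

Running the steps one minor version at a time, after taking a backup, keeps each schema change small enough to check before moving on.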

Yes, the upgrade is safe to run against the production database. But a backup is always good to have…

Hope this helps, Ingo

Jean_Robert_Alves · January 20, 2021, 11:18am

Thanks. We made this upgrade in our development environment and will be testing it today.

Jean_Robert_Alves · February 18, 2021, 7:09pm

Just so you know guys, the upgrade worked perfectly, thanks!