We are using the fetchAndLock API to fetch External Tasks with an asyncResponseTimeout of 15 seconds. Sometimes the lock is acquired after the timeout has already expired, which leaves the External Task locked for 20 minutes (the lock duration, which is configurable).
Question:
Does anyone have any suggestions on how to prevent locking if the timeout has already expired?
Version Information: Camunda: 7.18.0
The problem appears to be in:
FetchAndLockHandlerImpl
protected void acquire() {
  ..
  FetchAndLockResult result = tryFetchAndLock(pendingRequest);
  ...
  if (result.wasSuccessful()) {
    List<LockedExternalTaskDto> lockedTasks = result.getTasks();
    if (!lockedTasks.isEmpty() || isExpired(pendingRequest)) {
      AsyncResponse asyncResponse = pendingRequest.getAsyncResponse();
      asyncResponse.resume(lockedTasks);
      ...
  ...
}
Thanks for your valuable input. We made some changes and followed your suggestions. The problem is resolved to some extent, but I still suspect that a code change is needed somewhere.
We are using MS-SQL.
We start each ProcessInstance with around 7 variables.
FetchAndLock: asyncResponseTimeout of 18000 ms (18 s), because our load balancer has a 30 s timeout.
We reduced the External Task lockDuration from 20 min to 10 min and passed it in the API request.
We still ended up with requests timing out and a few External Tasks locked in Camunda for 10 min. Looking at it in detail, as mentioned earlier, the fetchAndLock API runs multiple DB queries to load variables and other details, and only at the end locks the External Task. If these queries take a long time, that seems to cause the problem (we were fetching 15 External Tasks in one request, and for each External Task Camunda executed a set of queries and fetched 7 variables).
When the request reached FetchAndLockHandlerImpl.java, line 116: FetchAndLockResult result = tryFetchAndLock(pendingRequest);
this call itself took longer than the remaining time. It locked the External Task, but by the time the response was dispatched the original request had already timed out and the stream was closed.
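To make the timing concrete, here is a minimal sketch of the arithmetic only. It assumes the connection is closed by the 30 s load balancer timeout and that the fetch starts shortly before the 18 s async timeout; the class name, method name, and the sample durations are made up for illustration and are not part of Camunda.

```java
final class LongPollBudget {

    /**
     * Returns true if a fetch that starts fetchStartMs after the request
     * arrived and runs for fetchDurationMs finishes before the client-side
     * timeout (clientTimeoutMs) closes the connection.
     */
    static boolean responseArrivesInTime(long fetchStartMs, long fetchDurationMs, long clientTimeoutMs) {
        return fetchStartMs + fetchDurationMs <= clientTimeoutMs;
    }

    public static void main(String[] args) {
        long clientTimeoutMs = 30_000L; // load balancer timeout mentioned in the post
        long fetchStartMs = 17_500L;    // assumed: fetch begins just before the 18 s async timeout

        // A 5 s fetch still fits into the remaining 12.5 s.
        System.out.println(responseArrivesInTime(fetchStartMs, 5_000L, clientTimeoutMs));
        // A 13 s fetch locks the tasks, but the response arrives after the stream is closed.
        System.out.println(responseArrivesInTime(fetchStartMs, 13_000L, clientTimeoutMs));
    }
}
```

In the second case the tasks stay locked for the full lockDuration although no worker ever received them.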
Hence, we reduced the number of variables from 7 to 2 and the number of External Tasks fetched per request from 15 to 3. This resulted in far fewer failures than before, but did not eliminate them.
We are still exploring options such as moving to PostgreSQL. If you have any further input, please share it.
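For reference, a request body with these reduced settings could be built roughly like this. It is only a sketch: the field names follow the fetchAndLock REST request body, while the worker ID, topic name, and variable names are placeholders, and the string concatenation assumes plain values that need no JSON escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

final class FetchAndLockPayload {

    /** Builds the JSON body for POST /external-task/fetchAndLock (single topic). */
    static String build(String workerId, int maxTasks, long asyncResponseTimeoutMs,
                        String topicName, long lockDurationMs, List<String> variables) {
        // Only the listed variables are fetched with each task.
        String vars = variables.stream()
                .map(v -> "\"" + v + "\"")
                .collect(Collectors.joining(","));
        return "{"
                + "\"workerId\":\"" + workerId + "\","
                + "\"maxTasks\":" + maxTasks + ","
                + "\"asyncResponseTimeout\":" + asyncResponseTimeoutMs + ","
                + "\"topics\":[{"
                + "\"topicName\":\"" + topicName + "\","
                + "\"lockDuration\":" + lockDurationMs + ","
                + "\"variables\":[" + vars + "]"
                + "}]}";
    }

    public static void main(String[] args) {
        // 3 tasks per request, 18 s long poll, 10 min lock, 2 variables (placeholder names).
        System.out.println(build("worker-1", 3, 18_000L, "some-topic", 600_000L,
                List.of("orderId", "customerId")));
    }
}
```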
@joshi.shirin I had this type of performance problem when working with complex variables such as JSON objects, because they are stored as byte arrays in the largest SQL table and referenced from the variables table.
The amount of work Camunda has to do to serialize and deserialize them is the root of performance problems in many scenarios.
After years of experience, we learned that it is best to separate process metadata from business metadata. It is very common for us to have only string variables holding IDs and to let our workers look up the complex object by ID in our business MongoDB. This reduces the work the Camunda engine has to do on each poll and distributes it among the workers, which are easier to scale out.
I also prefer to fetch a small number of tasks, as this prevents a single worker from taking all the tasks while other workers stay hungry for work.
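A rough sketch of that pattern, with an in-memory map standing in for the business database. Everything here (the Order class, OrderStore, the "orderId" variable name) is hypothetical and only illustrates the idea of keeping a small ID in the process and resolving the heavy object in the worker.

```java
import java.util.Map;
import java.util.Optional;

final class IdOnlyWorkerSketch {

    /** Hypothetical business object that would otherwise be a serialized process variable. */
    static final class Order {
        final String id;
        final String customer;
        Order(String id, String customer) { this.id = id; this.customer = customer; }
    }

    /** Hypothetical stand-in for the business store (e.g. a MongoDB collection). */
    static final class OrderStore {
        private final Map<String, Order> byId;
        OrderStore(Map<String, Order> byId) { this.byId = byId; }
        Optional<Order> findById(String id) { return Optional.ofNullable(byId.get(id)); }
    }

    /** The worker gets only a small string variable and loads the rest itself. */
    static String handle(Map<String, String> taskVariables, OrderStore store) {
        String orderId = taskVariables.get("orderId");
        return store.findById(orderId)
                .map(o -> "processing order " + o.id + " for " + o.customer)
                .orElse("order not found: " + orderId);
    }

    public static void main(String[] args) {
        OrderStore store = new OrderStore(Map.of("42", new Order("42", "ACME")));
        System.out.println(handle(Map.of("orderId", "42"), store));
    }
}
```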
It would also be good to see the cost of your database queries. If your database exposes it, for example through something like AWS Performance Insights, take the query and look at its execution plan. You could paste the execution plan here so we can see whether some indexes would help.
Fetching and locking a higher number of tasks results in more database IOPS, so make sure your database handles that well.