Handling Transaction Timing Issues (Distributed Transactions)

Hi everyone :wave:

I’ve run into a distributed-transaction challenge in our system that I suspect some of you might have encountered as well in the context of Zeebe. I’m reaching out to see if anyone has insights, best practices, or alternative approaches for solving this.

The Challenge

We’re using Zeebe to orchestrate processes across our microservices. During testing, I noticed that when a new process is started, there’s a chance that the first task gets executed by Zeebe before the originating transaction has fully committed. This issue applies not only to the process start but also to all cases in which a message is sent to Zeebe.

This can lead to inconsistencies: Zeebe starts the process and executes a task before the database changes are available, causing potential errors when a second worker tries to access uncommitted data.

To clarify, assume the following example, which is also visually represented in the accompanying image. Imagine we have a process that manages vacation requests. This process is deployed to and orchestrated by Zeebe. It includes a start event and an initial worker.

The process is triggered whenever a user submits a vacation request. This is implemented by a CreateVacationRequestService that saves the vacation request to the database and then calls Zeebe to start the process. The service looks something like this and is annotated with @Transactional:

import org.springframework.transaction.annotation.Transactional

@Transactional
class CreateVacationRequestService(
    val repository: VacationRequestRepository,
    val processPort: ProcessPort
) : CreateVacationRequestUseCase {

    override fun createVacationRequest(command: CreateVacationRequestCommand) {
        val request = VacationRequest(command)
        // both calls share one transaction: the save is not yet committed
        // when Zeebe is asked to start the process
        repository.save(request)
        processPort.vacationRequestCreated(request)
    }
}

The issue arises because the transaction completes only after both the database save and the Zeebe call are done. This means Zeebe might start executing the process while the database commit is still pending, which can result in errors if other tasks need to access the newly created vacation request before it’s available. This issue becomes more likely if additional operations are executed after the process call, extending the transaction duration and increasing the likelihood of inconsistency.

Possible Solutions I’ve Considered

  1. Rely on Zeebe’s Retries: Ensure that the Zeebe start-process call is always the last action in the service to minimize the risk. Then I could ignore the issue and rely on Zeebe’s built-in retry mechanism: after a few retries, the transaction should be committed, allowing the task to execute successfully. However, with messages, for example, there could still be a risk of data records overwriting each other. Thus, I don’t think this is the best idea.
  2. Zeebe Call After the Transaction: Move the Zeebe API call outside the transactional scope, ensuring the process starts only after the database commit (the same applies to messages); see the sketch below. However, this approach introduces its own challenges, such as handling errors during the Zeebe call, which could lead to scenarios where a vacation request exists without an associated process instance.
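
To illustrate option 2, a minimal sketch assuming Spring: the service above is refactored so that saveVacationRequest (a hypothetical @Transactional method that only saves and returns the request) commits first, and a thin, non-transactional facade (the class name is made up) calls Zeebe afterwards:

import org.springframework.stereotype.Service

@Service
class VacationRequestFacade(
    val service: CreateVacationRequestService,
    val processPort: ProcessPort
) {
    // deliberately not @Transactional: the save commits inside saveVacationRequest
    fun createVacationRequest(command: CreateVacationRequestCommand) {
        val request = service.saveVacationRequest(command)
        // runs after the commit; a failure here leaves a vacation request
        // without a process instance - the drawback described in option 2
        processPort.vacationRequestCreated(request)
    }
}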

Both options have their challenges, and I haven’t found much discussion about this issue yet. Before committing to one of these solutions, I wanted to ask the community:

  • Is this a common problem others have faced?
  • Are there any best practices or patterns for handling this kind of distributed transaction issue in Zeebe?
  • Is there a better approach that I may have overlooked?

Looking Forward to Your Insights

I’d really appreciate any feedback, especially from those who have dealt with similar transactional issues in Zeebe. Perhaps there’s an established best practice I missed, or a simpler solution that makes more sense in this context.

Thanks a lot in advance for your help!

Cheers, Marco

Hi,

You’re dealing with a classic distributed-transaction issue… There are a few established patterns in this space; see Saga and Outbox, to name a few…

I would not advocate option 1. This is ultimately a race condition which can always fail…

I’m more inclined to go with option 2, with some further considerations…

From a UX perspective, both operations need to succeed, so if either fails you need to throw an error back to the user and ask them to please try again…

However, with this approach you could end up with orphaned requests in the DB. To address this, I would store the process instance key as an attribute on the DB request so you know which requests are owned. You can then use a periodic reaper process to purge orphaned requests…
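
A rough sketch of such a reaper, assuming Spring scheduling and a made-up Spring Data derived-query method:

import java.time.Instant
import java.time.temporal.ChronoUnit
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component

@Component
class OrphanedRequestReaper(val repository: VacationRequestRepository) {

    // requests whose Zeebe call never succeeded have no processInstanceKey;
    // only purge them once they are old enough that no commit is still in flight
    @Scheduled(fixedDelay = 60_000)
    fun purgeOrphans() {
        val cutoff = Instant.now().minus(10, ChronoUnit.MINUTES)
        repository.deleteByProcessInstanceKeyIsNullAndCreatedAtBefore(cutoff) // made-up query method
    }
}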

The above is a simplistic approach, but it is forward-compatible with a transition to the outbox pattern if required…

Regards

Rob


Hi @emaarco,

from my point of view, you have a similar issue with the worker, too.
Consider the following diagram:
Consider the following diagram:


If something goes wrong with completing the job, you will have to deal with it.

You have the same challenge with systems like Kafka. So, how do you handle this? You cannot put the output port in the same transaction. Things can also go wrong when committing the DB transaction, e.g. optimistic locking; you then cannot roll back the remote changes. As you have described well, it also leads to problems when the outgoing events / commands are already processed before the state is actually committed.

I’m actually not so sure about the solution here. If we suppose that the engine is a highly available system, we can assume that the commands are accepted by the engine for Process Start and Send Message. Therefore, in my opinion, you can skip the outbox pattern.

If you already have a system like Kafka in use, you could also send commands (after the transaction) to yourself and process them asynchronously. But what would be the difference between sending them to the engine straight away? Any external system that you call after the transaction carries the risk that the action will fail. This also applies to the database in which the outbox pattern is implemented.

Thank you, @Webcyberrob, for calling out the issue directly and mentioning established patterns like Saga and Outbox. I really appreciate your recommendation! :slight_smile:

@dominikh, I agree with your point. The problem applies to all kinds of interactions with Zeebe, as well as with external systems in general.

When working with Kafka, we’ve addressed similar issues by publishing messages to the internal event bus during the transaction and ensuring they are actually sent after the transaction using mechanisms like TransactionalEventListeners. This is a simple solution, but it comes with a trade-off: as you pointed out, it still leaves room for potential inconsistencies, such as errors that prevent the message from being published after the transaction. To sum it up: it’s a challenge that isn’t fully resolved even in these scenarios, but it’s a trade-off we have consciously accepted for now, with the intention of fully solving it later.
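
For reference, a minimal sketch of that mechanism (the event type is a placeholder, and I assume the port can start the process from the request id):

import org.springframework.stereotype.Component
import org.springframework.transaction.event.TransactionPhase
import org.springframework.transaction.event.TransactionalEventListener

// placeholder domain event, published via ApplicationEventPublisher
// from inside the @Transactional service method
data class VacationRequestCreated(val requestId: String)

@Component
class VacationRequestProcessStarter(val processPort: ProcessPort) {

    // invoked only after the surrounding transaction has committed;
    // on rollback the listener never fires
    @TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
    fun onVacationRequestCreated(event: VacationRequestCreated) {
        processPort.vacationRequestCreated(event.requestId)
    }
}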

All in all, I wanted to keep my example simple by focusing specifically on explicit interactions with the Zeebe API, such as starting processes or sending messages to the engine. Nevertheless, I acknowledge that this broader issue extends across many integration points (such as job workers).

Ultimately, I am searching for a lightweight solution—one that avoids the overhead of managing dedicated database tables or complex synchronization mechanisms. Ideally, I want to keep things simple while still ensuring reliable integration.

If you want a fully reliable solution, consider storing the outbox entry within your transaction. However, I’m not a fan of the added complexity and would love to find a simpler approach. This is probably not necessary for jobs. It may be necessary for messages and process starts.

Thank you for your response @dominikh. My concern is not just the added complexity; it’s also the time-critical nature of my processes.

I rely on these processes starting in near real-time, with the first tasks being available almost immediately. Often, the first task is a user task, and it’s crucial that the user doesn’t experience a significant delay. The user initiates the process and expects to proceed with the next task promptly, so responsiveness is key to maintaining a positive user experience.

Regarding the outbox pattern, I have some thoughts based on this conversation and your suggestion (maybe they will help us find a proper solution):

If we introduce a scheduler that periodically reads data from an outbox table (potentially in batches), it inherently introduces a delay before the follow-up task becomes available. This delay could be even longer if errors occur during processing, meaning a message is not sent in the first scheduler run.

An alternative could be using a transactional event listener (or something similar) to extract the record from the outbox table immediately after the transaction completes and then send the corresponding message to Zeebe. This approach would achieve near real-time execution, unless a failure occurs, in which case retry logic would be required to ensure reliability. However, using a synchronous event listener could extend the response time significantly, potentially blocking the user’s request to start the process, which is unacceptable. Thus, we would need to execute the logic asynchronously or limit the number of retries. The latter, however, introduces the risk that the process may never be created, necessitating a second mechanism, such as a scheduler, to handle messages that couldn’t be sent initially.
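
To make that second mechanism concrete, a rough sketch assuming a hypothetical OutboxRepository (findUnsent and markSent are invented): the happy path sends right after the commit via the event listener, and this poller retries only the entries where that failed:

import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component

@Component
class OutboxFallbackPoller(
    val outbox: OutboxRepository,  // hypothetical: findUnsent(), markSent(id)
    val processPort: ProcessPort
) {
    // picks up only entries whose immediate after-commit send failed,
    // trading latency for reliability in the failure case only
    @Scheduled(fixedDelay = 30_000)
    fun resendPending() {
        outbox.findUnsent().forEach { entry ->
            processPort.sendMessage(entry)  // hypothetical port method
            outbox.markSent(entry.id)
        }
    }
}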

Ultimately, we reach the same conclusion: solving this issue results in either increased complexity or an accepted risk of inconsistency in the system. I’m still trying to find a balance that ensures satisfying reliability without compromising simplicity or the responsiveness that users expect. I hope we can find a good solution together :rocket:.

Hi all,

we built an abstraction layer for hexagonal business processing in Java called VanillaBP (https://www.vanillabp.io/) and in the adapter for Camunda 8 we implemented a near-zero-gap solution:

  1. Actions to Zeebe (like completing a task or starting a process) are postponed until the transaction has completed successfully, using a transaction listener.
  2. Additionally, within the transaction, immediately before committing the TX, we probe Camunda (in the case of a task, we try to set its timeout): if there is an error, the transaction is rolled back (Camunda is not available, the task is gone due to a boundary event, etc.).

This minimizes the chance of getting out of sync between your business data and the Camunda process state, with minimal overhead. I know this is not bullet-proof, but it is good enough for most use cases.
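
To make the two steps concrete, a rough illustration under assumptions (this is not the actual VanillaBP code): a Spring transaction synchronization probes Zeebe just before the commit and performs the actual call only after it. The process id and variables are invented:

import io.camunda.zeebe.client.ZeebeClient
import org.springframework.transaction.support.TransactionSynchronization
import org.springframework.transaction.support.TransactionSynchronizationManager

// must be called from inside an active transaction
fun startProcessAfterCommit(client: ZeebeClient, requestId: String) {
    TransactionSynchronizationManager.registerSynchronization(
        object : TransactionSynchronization {
            override fun beforeCommit(readOnly: Boolean) {
                // step 2: probe - an exception here rolls the transaction back
                client.newTopologyRequest().send().join()
            }
            override fun afterCommit() {
                // step 1: the actual Zeebe action, deferred until after the commit
                client.newCreateInstanceCommand()
                    .bpmnProcessId("vacation-request") // invented process id
                    .latestVersion()
                    .variables(mapOf("requestId" to requestId))
                    .send()
                    .join()
            }
        }
    )
}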

Might this solution also be good enough for your use case?


Hi @stephanpelikan,

thank you for your reply and for linking your solution. I find your approach very interesting; it broadens my perspective on the issue.

It’s actually quite similar to the approach I currently favor when looking for a simple solution: solving the problem by sending the message after the transaction, for example using the TransactionSynchronizationManager or, as you do, a TransactionalEventListener. We already use both for other use cases in our application.

What I find particularly interesting in your approach is the idea of checking whether interaction with Zeebe is even possible beforehand. Extending the timeout is an interesting angle to consider.

However, I find it challenging to apply this idea to my specific use case. My primary focus is on sending message events to either start processes (Message Start Event) or continue them (Intermediate Message Receive Event). In this context, there is nothing comparable to a timeout that you have extended - unless I am missing an obvious option here.

The closest thing I could find in the ZeebeClient Spring implementation was checking whether a broker is healthy. However, it is not clear to me whether that query makes sense, nor is its result as reliable as your approach: it tells me that Zeebe can receive requests, but it doesn’t confirm whether a process can actually be started, or whether a message has already been received (meaning the process has already continued).

I have just read the release notes for Camunda 8.6 and would like to add the following: updates to the Zeebe client are announced there, which may help me find similar options (e.g. by querying the process instance or definition).

I also wonder to what extent you work with retries in your approach. Assuming the request to Zeebe fails despite the previous check, is a retry executed in any way? From a quick look, I couldn’t find anything like that in the code.

Have you had any experiences that speak in favor of not using such a pattern (retries) and solving failed requests in a different way? Perhaps because failures hardly ever occur? Or because it’s a never-ending story: even with all the retries, we need to intervene manually at some point. I’m still not sure what a good balance might be here.

So far, I could only find an error logger that records that the request has failed. You could then take care of it manually if the worst comes to the worst.

I’d be excited to hear your thoughts and discuss this further :slight_smile:

A lot of things to answer :smiley:

Regarding starting by message and getting a confirmation: as I heard, this is a feature provided by the new REST API introduced with 8.6.

Checking availability of Zeebe: yes, getting the topology is fine. Additionally, you could check the result for at least 1 (or n) healthy brokers (Zeebe API RPCs | Camunda 8 Docs). And yes, as with the task timeout, this just minimizes the gap; it does not close it :wink:
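
Sketched, such a probe could look like this (the per-partition health accessors are as I remember them from the Java client; double-check them against your client version):

import io.camunda.zeebe.client.ZeebeClient
import io.camunda.zeebe.client.api.response.PartitionBrokerHealth

// probe: require at least `minHealthyBrokers` brokers whose partitions
// all report HEALTHY before interacting with Zeebe
fun isZeebeAvailable(client: ZeebeClient, minHealthyBrokers: Int = 1): Boolean {
    val topology = client.newTopologyRequest().send().join()
    val healthyBrokers = topology.brokers.count { broker ->
        broker.partitions.all { it.health == PartitionBrokerHealth.HEALTHY }
    }
    return healthyBrokers >= minHealthyBrokers
}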

Retries: if the probe fails due to unavailability of Camunda (not because the task is already gone) and the TX is rolled back, then Camunda should rerun the job worker once the configured job timeout has passed, since the job was not completed successfully. But do not mix up this kind of retry with the retries of a failed job (Zeebe API RPCs | Camunda 8 Docs)!

How bullet-proof the implementation has to be depends on the use case. Since handling all edge cases is really hard and often brings performance problems, it is always a trade-off between how likely it is to hit the remaining gap and what its business impact would be. For example: Zeebe runs job workers with an “at least once” strategy (Dealing with problems and exceptions | Camunda 8 Docs), so one actually has to provide some sort of idempotency. Who does this for every worker :wink: ?
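
As a tiny example of that idempotency requirement (the domain methods and the String id are invented):

import io.camunda.zeebe.client.ZeebeClient
import io.camunda.zeebe.client.api.response.ActivatedJob

// "at least once" means the worker may see the same job twice; guard the
// state change so a repeated execution has no additional effect
fun handleApproveJob(client: ZeebeClient, job: ActivatedJob, repository: VacationRequestRepository) {
    val requestId = job.variablesAsMap["requestId"] as String
    val request = repository.findById(requestId).orElseThrow()
    if (!request.approved) {   // idempotency guard: skip if already applied
        request.approve()      // invented domain method
        repository.save(request)
    }
    client.newCompleteCommand(job.key).send().join()
}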


Hi @stephanpelikan.

Thank you for your detailed answer :slight_smile:
The above example should already perform the mentioned check.

I will think about it further and possibly create a repository with example cases and potential solutions, also in the context of Zeebe 8.6.

However, if anyone still has suggestions or ideas, I would be happy to have further discussions.

I think it would be great to discuss this with someone from Camunda and work out some bulletproof best practices. I think many users are not really aware of this challenge.
