Hello Zeebe team,
I have read all the documentation, tried your demos, and am trying to stay up to date, simply to be ready to use Zeebe once it is released. I’m thinking about Camunda-related situations in our software that we solved in the past and how to solve them with Zeebe. While doing so, I ran into the following:
We’ve implemented our own user task list and even our own process list for a couple of reasons. One main reason is to make user tasks and processes searchable by custom data that is not stored in process variables. This data can be completely different for each process definition or user task, so it depends on the context. We use Camunda’s event stream to keep our lists up to date and load the custom data during event processing.
In Zeebe there is no historic data available for random access. One could implement an event log subscription that fills a database, as Camunda’s DbHistoryEventHandler does. This would be similar to what we do to build our own process and task lists, so from this perspective our current approach fits Zeebe well.
But to be as scalable as Zeebe, it is necessary to avoid an RDBMS, as you do. There are several ways to do so (NoSQL, search indexes, etc.), but whatever we choose, we end up with two competing data stores that might get out of sync: for processes and tasks, Zeebe’s log is the primary data store, but what about the custom data? I would prefer to have only one primary data store and be able to recreate any derived copy (for searching) whenever necessary.
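To make the "one primary store, rebuildable copies" idea concrete, here is a minimal sketch, assuming the custom data travels inside the log events themselves. The `Event` shape, the event kinds `TASK_CREATED`/`TASK_COMPLETED`, `TaskListProjection` and `rebuild` are all hypothetical names for illustration, not Zeebe's API or record format. The derived task list is filled incrementally by applying events, and the same code can recreate it from scratch by replaying the log.

```python
from dataclasses import dataclass, field


# Hypothetical event shape for illustration only; not Zeebe's record format.
@dataclass(frozen=True)
class Event:
    position: int  # position in the log, strictly increasing
    kind: str      # "TASK_CREATED" or "TASK_COMPLETED" in this sketch
    task_id: str
    data: dict = field(default_factory=dict)  # custom data carried in the log


class TaskListProjection:
    """Searchable, derived copy of open tasks, built only from log events."""

    def __init__(self):
        self.tasks = {}          # task_id -> custom data of open tasks
        self.last_position = -1  # highest log position applied so far

    def apply(self, event):
        # Skip events already applied, so redelivery after a restart is harmless.
        if event.position <= self.last_position:
            return
        if event.kind == "TASK_CREATED":
            self.tasks[event.task_id] = dict(event.data)
        elif event.kind == "TASK_COMPLETED":
            self.tasks.pop(event.task_id, None)
        self.last_position = event.position

    def search(self, key, value):
        # Sorted IDs of open tasks whose custom data has key == value.
        return sorted(t for t, d in self.tasks.items() if d.get(key) == value)


def rebuild(log):
    """Recreate the derived copy from scratch by replaying the whole log."""
    projection = TaskListProjection()
    for event in log:
        projection.apply(event)
    return projection
```

Because the projection depends only on the log, a corrupted or outdated search index can simply be dropped and rebuilt, so the two stores cannot permanently diverge.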
A possible solution would be to set or update the payload of a process instance to store the custom data. This might be in the context of a certain task, the process instance, or any other context within the process. But to store current data for an upcoming task, an execution listener would be needed to set the payload. Otherwise the new payload could only be set when completing the task, which might be far in the future, and any new task would receive outdated data.
But there are also disadvantages to storing custom data in the payload:
- For data related to certain tasks, you would either have to build a shadow structure in the payload that mirrors the process’s structure, or Zeebe would need to introduce the concept of local payloads, similar to local variables in Camunda.
- A lot of redundant data: I don’t know if this is true, but is it necessary to update the entire payload when completing a task? What about partial updates? What about concurrent updates? Is there an OptimisticLockingException for payload updates from parallel executions?
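To illustrate what partial updates combined with optimistic locking could look like, here is a minimal sketch under stated assumptions. `PayloadStore`, `patch` and `ConcurrentUpdateError` are hypothetical names, not anything Zeebe provides; the sketch only shows the general technique of a version check before merging changed top-level keys, with the losing writer re-reading and retrying.

```python
import copy


class ConcurrentUpdateError(Exception):
    """Hypothetical: raised when the caller's expected version is stale."""


class PayloadStore:
    """Hypothetical version-checked payload of one process instance."""

    def __init__(self, payload=None):
        self.payload = copy.deepcopy(payload or {})
        self.version = 0

    def read(self):
        # Return a private copy plus the version it was read at.
        return copy.deepcopy(self.payload), self.version

    def patch(self, expected_version, changes):
        """Merge only the given top-level keys if nobody wrote in between."""
        if expected_version != self.version:
            raise ConcurrentUpdateError(
                f"expected version {expected_version}, found {self.version}"
            )
        self.payload.update(copy.deepcopy(changes))
        self.version += 1
        return self.version
```

For example, two parallel branches both read version 0; the first `patch` succeeds and bumps the version to 1, the second fails with `ConcurrentUpdateError`, re-reads, and retries with version 1. A finer-grained variant could track versions per key, so that parallel branches touching disjoint keys would not conflict at all.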
I was also thinking about emitting “custom” events. These events could be used to store custom data in Zeebe as part of the event’s payload. This would reduce redundancy but requires merging those events to get the complete view of the custom data. Merging happens when updating the secondary storage, so this should not be a problem. But I think this is also something Zeebe will need to support once local or partial payloads are possible.
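The merge step for such custom events can be sketched as a fold over the events in log order. This is a minimal sketch, assuming each custom event carries a scope (for example `("process", "pi-1")` or `("task", "t-7")`) and only the changed fields; `deep_merge`, `merged_view` and the convention that `None` deletes a key are hypothetical choices for illustration, not anything Zeebe defines.

```python
from typing import Any, Iterable

# Hypothetical custom event: (scope, partial_data). The scope identifies the
# context (process instance or task); partial_data holds only changed fields.
CustomEvent = tuple[tuple[str, str], dict[str, Any]]


def deep_merge(base: dict, partial: dict) -> dict:
    """Return base updated with partial; nested dicts merge, None deletes a key."""
    merged = dict(base)
    for key, value in partial.items():
        if value is None:
            merged.pop(key, None)
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merged_view(events: Iterable[CustomEvent]) -> dict:
    """Fold custom events, in log order, into one complete view per scope."""
    view: dict = {}
    for scope, partial in events:
        view[scope] = deep_merge(view.get(scope, {}), partial)
    return view
```

Since the fold only depends on the order of events in the log, running it again during a rebuild of the secondary storage yields the same view, which keeps the log as the single primary store.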
What do you think about this, and what are your thoughts on payload maintenance? Do you have any concepts ready but not yet implemented? It would be nice to start a discussion to get an idea of where the journey is heading.
Cheers,
Stephan