Change default behavior of output mapping of tasks

BerndRuecker · March 21, 2018, 12:26pm

Hey guys.

We should change the default behavior of data flow on output mappings. The typical usage is that a task provides additional data, so the data handed in should be merged with the existing data. This is the behavior we also know from the Map in Camunda BPM 7. This is pretty intuitive and lets you work without data mappings for most use cases.

The current default (0.7.0) is that the payload is overwritten if you hand in data on completion of a task. From my current understanding of the use cases, this is seldom what you want.
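To make the difference concrete, here is a minimal sketch of the two behaviors using plain Python dicts as a stand-in for the JSON payload. The function names and sample attributes (`orderId`, `amount`) are hypothetical and not part of any Zeebe API; only `pickId` comes from this thread.

```python
# Hypothetical helpers for illustration only; not Zeebe client or broker code.

def complete_overwrite(workflow_payload: dict, task_output: dict) -> dict:
    """Overwrite behavior as described for 0.7.0: the task output replaces the payload."""
    return dict(task_output)


def complete_merge(workflow_payload: dict, task_output: dict) -> dict:
    """Proposed default: the task output is added on top of the existing payload."""
    return {**workflow_payload, **task_output}


payload = {"orderId": "o-1", "amount": 30}  # assumed existing workflow data
output = {"pickId": "p-7"}                  # what the task hands in on completion

overwritten = complete_overwrite(payload, output)  # only pickId survives
merged = complete_merge(payload, output)           # orderId, amount and pickId
```

With overwrite, `orderId` and `amount` are gone unless every task maps them explicitly or sends them back. With merge they are kept without any mapping.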

In the example I am currently building, this requires a lot of dumb mappings, as every task adds at least one attribute to the payload. So I have to write a lot of these:

<zeebe:output source="$.pickId" target="$.pickId" />

It is especially annoying as I have to know the names of the attributes now in the BPMN.xml - I shouldn’t (in my case).

When I checked an example Ryan is currently doing I saw the same for basically each and every service task:

So my vote is clearly to change this behavior asap - or at least to provide a configuration attribute that makes a simple JSON merge much easier (and check it by default in the modeler :-)).

WDYT?

Cheers
Bernd

thorben · March 21, 2018, 4:47pm

Hi Bernd,

The way we intended this case to work (i.e. default input mapping and updating some parts of the payload) is that you would submit the entire payload on task completion, not just the parts you change. Then the broker doesn’t have to do any merging, which gives the best performance.

Whenever you send an update to the broker that is not self-contained (i.e. only a delta of changes), then the broker has to go back in the log stream and find the latest version of the event, merge it with the changes and write that to the end of the stream. When you have a self-contained update, then all the broker has to do is append it to the stream. So merging is rather inefficient and throws away some of the benefits of the stream processing model. That’s why we try to keep these situations to the minimum and want to have default behavior that avoids those situations. That’s also why the task completion by ID topic is not an easy one.
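A toy model of that difference, as a rough sketch only: the `LogStream` class below is a hypothetical in-memory list and does not reflect Zeebe's actual log stream implementation. It only shows why a delta update needs a backward lookup while a self-contained update is a plain append.

```python
from typing import List, Tuple

Event = Tuple[str, dict]  # (workflow instance key, full payload)


class LogStream:
    """Toy append-only log; an assumption for illustration, not Zeebe internals."""

    def __init__(self) -> None:
        self.events: List[Event] = []
        self.lookups = 0  # counts how often we had to scan backwards

    def append_full(self, key: str, payload: dict) -> None:
        # Self-contained update: nothing to look up, just append.
        self.events.append((key, dict(payload)))

    def append_delta(self, key: str, delta: dict) -> None:
        # Delta update: find the latest event for this key, merge, then append.
        self.lookups += 1
        latest: dict = {}
        for event_key, payload in reversed(self.events):
            if event_key == key:
                latest = payload
                break
        self.events.append((key, {**latest, **delta}))


log = LogStream()
log.append_full("wf-1", {"orderId": "o-1"})
log.append_delta("wf-1", {"pickId": "p-7"})  # requires one backward scan
log.append_full("wf-1", {"orderId": "o-1", "pickId": "p-7", "shipmentId": "s-3"})
```

In this sketch only the delta call touches older events; both full updates are pure appends.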

Cheers,
Thorben

ryan · March 21, 2018, 7:51pm

Thorben’s comments notwithstanding, I do agree with Bernd that it would be nice to be able to easily specify that we simply want to merge the new data with the data that already exists in the workflow instance. Moreover, I would tend to agree that a simple checkbox labeled “Merge” (or something like that) would be intuitive.

Now, given Thorben’s comments above, maybe the checkbox should be deselected by default, and we should put a comment in the documentation indicating that the use of the merge feature will slow down the performance of Zeebe slightly (or perhaps more than slightly - that would be a question for Thorben and the core development team).

-Ryan

Philipp_Ossler · March 22, 2018, 7:21am

Hi,

There is also a simple way to merge the complete task output with the workflow payload. If you define the output mapping

$ -> $.result

then it adds the complete task payload to the workflow payload under the key “result”.
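As a rough sketch of what such a mapping does to the data (plain Python, not the broker's mapping code; `map_root_to_key` is a hypothetical helper that only handles targets of the form `$.<key>`):

```python
def map_root_to_key(workflow_payload: dict, task_payload: dict, target: str) -> dict:
    """Mimic a '$ -> $.<key>' output mapping: nest the whole task payload under <key>."""
    if not target.startswith("$.") or len(target) == 2:
        raise ValueError("this sketch only handles targets of the form '$.<key>'")
    key = target[2:]
    # Existing top-level attributes are kept; the task payload lands under <key>.
    return {**workflow_payload, key: task_payload}


workflow = {"orderId": "o-1"}   # assumed existing workflow payload
task = {"pickId": "p-7"}        # assumed task output

result_payload = map_root_to_key(workflow, task, "$.result")
pick_id = result_payload["result"]["pickId"]  # note the extra nesting level
```

The existing attributes stay in place, but everything the task returned is one level deeper, so later steps have to read `result.pickId` rather than `pickId`.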

Maybe this helps

Best regards,
Philipp

BerndRuecker · March 22, 2018, 7:47am

Hey guys.

Thanks for the quick feedback!

I think that a performance gain which will be leveraged by probably 5% of our users (or less) should not result in the other 95% of the users suffering from avoidable complexity when designing their workflows (which also steepens the learning curve and increases questions in the forum, support cases, consulting effort, …). Personally I would prefer to have merging everything as the default, with a checkbox to suppress it. In the docs we can make a clear statement that if you want to optimize for performance you should do the data flow like Thorben just described. This way we make it possible, but only the 5% of performance users have to learn about it; all others can simply ignore this. Maybe that can be highlighted boldly in the docs - probably also in other places, as I can imagine there are some more design decisions which differ for “normal folks” and high performance scenarios.

Philipp wrote: “then it adds the complete task payload to the workflow payload under the key ‘result’.”

I saw this, but it would actually mean that my resulting map gets cluttered a lot, as every attribute ends up nested inside some other attribute (result1.pickId, result2.shipmentId, …).

For me the underlying question (which is also raised in the Camunda Forum thread “Output mapping on start event”) boils down to usage patterns. Take e.g. the architecture alternatives in https://github.com/flowing/flowing-retail/tree/zeebe/zeebe (just started with that example, readme is still a todo). You could use Zeebe for work distribution (as it is currently advocated on the zeebe.io homepage) or you can use it as a workflow engine within one service (as a typical microservice architecture would do). How you approach data flow and mappings is very different in these scenarios. In the “Zeebe as work distribution” scenario the services must not know any details of the workflow definition but act as if they were sending and receiving messages. And here it gets more tricky.

Probably worth a F2F discussion? And with F2F I of course mean Skype or Goto or the like. Next week would work perfectly for me. Anybody interested?

Cheers
Bernd

BerndRuecker · March 26, 2018, 11:41am

One additional remark on “submit the entire payload on task completion”: This only works if you have a synchronous worker. If you do something longer running in between, you would have to persist the payload within the worker, which is normally exactly what you don’t want.

menski · March 31, 2018, 9:24pm

Thanks for all your feedback and input. We will work on this topic in the next quarter.

menski · May 30, 2018, 5:36am

A small update on this topic. If no output mapping is specified, we plan to change the default behavior to a non-nested top level merge, see issue https://github.com/zeebe-io/zeebe/issues/907. We also plan to provide an XML attribute to override that behavior, so that the provided payload replaces the existing one for performance reasons, see https://github.com/zeebe-io/zeebe/issues/908.
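One possible reading of “non-nested top level merge”, sketched in plain Python: top-level keys from the task payload are added or replaced, and nested objects are replaced as a whole rather than merged recursively. This interpretation, the `complete_task` function and its `overwrite` flag are assumptions for illustration; the linked issues define the actual behavior and attribute name.

```python
def complete_task(workflow_payload: dict, task_payload: dict, overwrite: bool = False) -> dict:
    """Sketch of the planned default (top-level merge) with an opt-in overwrite."""
    if overwrite:
        # Hypothetical stand-in for the planned XML attribute: replace everything.
        return dict(task_payload)
    merged = dict(workflow_payload)
    # Shallow update: only top-level keys are considered, nested objects are not merged.
    merged.update(task_payload)
    return merged


workflow = {"orderId": "o-1", "customer": {"name": "Ada", "city": "Berlin"}}
task = {"pickId": "p-7", "customer": {"name": "Ada L."}}

merged = complete_task(workflow, task)
replaced = complete_task(workflow, task, overwrite=True)
```

Under this reading, `customer.city` is lost after the merge because the whole `customer` object is replaced, while `orderId` is kept. With `overwrite=True` only the task payload remains.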