Camunda 8: Best practice to iterate over a large collection in zeebe

Hello, I'm looking for a best practice for iterating over a large collection (a Python dictionary with 1000+ entries) in BPMN.

Currently, the workflow has a task where the dictionary is created and returned, and the rest of the workflow runs in a loop iterating over the dictionary items. The dictionary is over 3 MB, which I understand is close to Zeebe's maximum payload size, so it's not advisable to store such large content in the task's output variable.

What is the recommended approach in this case if I wish to retain the loop in the BPMN?

Hi @camundaenthu - what needs to happen with each value in the dict during the loop? Usually I recommend passing in just the ID and using another worker/connector to do something with the data, but how that would work depends on what is happening in your process.
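To illustrate the "pass in just the ID" idea, here is a minimal sketch. The helper names and the in-memory `EXTERNAL_STORE` are assumptions for illustration only; in practice the store would be a database, cache, or document store that all workers can reach, and these functions would be job handlers registered with a Zeebe client.

```python
# Hypothetical sketch: keep the large dict outside the process payload
# and pass only its keys through Zeebe variables.

# Stand-in for an external store (e.g. a database or cache) shared by workers.
EXTERNAL_STORE = {}

def build_dictionary_task():
    """First task: create the large dict, persist it externally,
    and return only the list of keys as a process variable."""
    large_dict = {f"record-{i}": {"payload": i} for i in range(1000)}
    EXTERNAL_STORE.update(large_dict)
    # Only this small key list enters the Zeebe payload, not the ~3 MB dict.
    return {"recordKeys": list(large_dict.keys())}

def fetch_record_task(record_key):
    """Worker inside the loop: given one key from the input collection,
    fetch the full entry from the external store."""
    return {"record": EXTERNAL_STORE[record_key]}
```

In the BPMN, a multi-instance subprocess could then use `recordKeys` as its input collection and hand each element (here, `record_key`) to the worker that fetches the full record.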


Thanks @nathan.loding.

Each entry is a JSON object whose values are used as inputs to the workflow (and sub-workflows) that run in the loop.

Hi @camundaenthu - I would look into referencing the data by keys, to avoid flooding the process with extra data. The number of variables inside the process also increases the amount of storage needed in the cold data store (e.g., Elasticsearch). If using keys isn't an option, I would try batching the data instead: can you call the service that fetches the data repeatedly, fetching 250 records at a time?
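The batching alternative could look something like the sketch below. The batch size of 250 comes from the suggestion above; `fetch_page` and the cursor variables (`offset`, `hasMore`) are hypothetical names, not Camunda APIs. The idea is that a task in the BPMN loop calls this worker once per iteration and a gateway checks `hasMore` to decide whether to loop again.

```python
# Hypothetical sketch: fetch the collection in pages of 250 so that
# only one small batch at a time lives in the process variables.

BATCH_SIZE = 250

def fetch_page(all_keys, offset, limit):
    """Stand-in for the service call that returns one page of records."""
    return all_keys[offset:offset + limit]

def next_batch_task(all_keys, offset=0):
    """Worker called once per BPMN loop iteration: returns one batch,
    the cursor for the next call, and a flag for the loop gateway."""
    batch = fetch_page(all_keys, offset, BATCH_SIZE)
    return {
        "batch": batch,
        "offset": offset + len(batch),
        "hasMore": offset + len(batch) < len(all_keys),
    }
```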
