Using Camunda in an Apache Spark big data distributed environment

Hi
We have a lot of Apache Spark jobs running on a Hadoop cluster that read data from HDFS, and many of the rules inside those jobs need to be moved into workflow management so that business owners can operate them.
I am considering a couple of approaches for fitting Camunda into this scenario:

  1. Camunda is installed on all Hadoop nodes. When a Spark job executes a task on partitioned data from HDFS, the task starts a process instance with its batch of partitioned data; the process calls a subprocess (to loop through the batch) and runs its own workflow (see the first sketch below the list).
  2. A Camunda cluster is installed independently, outside the Hadoop cluster. How the Spark job calls the Camunda cluster remains the same as in option 1.
  3. A Camunda cluster is installed independently, outside the Hadoop cluster. At a given time interval, a service task calls a Spark job (through a REST API) to fetch the initial data set, then another service task calls the next Spark job, and so on (see the second sketch below the list).
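
For options 1 and 2, here is a minimal sketch of how a Spark job could start one process instance per partition through the Camunda 7 REST API (`POST /process-definition/key/{key}/start`). The engine URL, the process definition key `batch-rules`, and the HDFS input path are placeholders, and the whole partition is passed as a single JSON string variable only to keep the example short:

```java
// Hypothetical Spark driver: reads partitioned data from HDFS and starts one
// Camunda process instance per partition over the engine's REST API.
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

public class StartProcessPerPartition {

    // Placeholder engine URL and process definition key
    private static final String START_URL =
            "http://camunda-host:8080/engine-rest/process-definition/key/batch-rules/start";

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("camunda-batch").getOrCreate();
        Dataset<Row> input = spark.read().parquet("hdfs:///data/input"); // placeholder path

        input.toJSON().foreachPartition((ForeachPartitionFunction<String>) rows -> {
            // Collect the partition into one JSON array (only viable for small batches)
            List<String> batch = new ArrayList<>();
            rows.forEachRemaining(batch::add);
            if (batch.isEmpty()) {
                return;
            }

            // Camunda expects variables as { "name": { "value": ..., "type": ... } }
            String payload = "{ \"variables\": { \"batch\": { \"value\": "
                    + quote("[" + String.join(",", batch) + "]")
                    + ", \"type\": \"String\" } } }";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(START_URL))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(payload))
                    .build();

            // One REST call per partition; the process model then loops over
            // the batch in a multi-instance subprocess.
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() >= 300) {
                throw new IllegalStateException("Start failed: " + response.body());
            }
        });

        spark.stop();
    }

    // Escape the batch so it can be embedded as a JSON string value
    private static String quote(String raw) {
        return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

In practice it would probably be better to pass only IDs or HDFS paths as process variables rather than the row data itself, since the engine database is not meant to hold partition-sized payloads.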
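For option 3, here is a rough sketch of a service task delegate that submits a Spark job from Camunda, assuming the Hadoop cluster is fronted by something like Apache Livy (its `POST /batches` endpoint). The Livy URL, jar path, main class, and the delegate class name are placeholders; a timer start event plus a chain of such service tasks would give the "call one Spark job, then the next" flow described above:

```java
// Hypothetical Camunda service task delegate that submits a Spark batch job
// through Apache Livy's REST API. Wired to a service task in the BPMN model.
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubmitSparkJobDelegate implements JavaDelegate {

    private static final String LIVY_BATCHES_URL = "http://livy-host:8998/batches";

    @Override
    public void execute(DelegateExecution execution) throws Exception {
        // Which job to run could also come from a process variable set by an
        // earlier service task in the chain.
        String body = "{"
                + "\"file\": \"hdfs:///jobs/rules-job.jar\","
                + "\"className\": \"com.example.RulesJob\","
                + "\"args\": [\"" + execution.getProcessInstanceId() + "\"]"
                + "}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(LIVY_BATCHES_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 300) {
            throw new IllegalStateException("Spark job submission failed: " + response.body());
        }

        // Store the submission result so a later task can poll the batch
        // status before the process moves on to the next Spark job.
        execution.setVariable("livyBatchResponse", response.body());
    }
}
```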
Any input on how Camunda can fit with big data jobs would be helpful.
Thanks
JG