I am a beginner to the Camunda platform and doing a PoC.
I have this scenario:
Service task to send a promotion email to purchase an SKU → delay of 2 days (timer intermediate catch event) → send discount email (service task)
Couple of questions:
Will the Camunda engine use up resources like CPU and memory during the delay state? We have millions of emails sitting in the delay state every day. My understanding is that it won't use resources, and that it is the job executor's responsibility to keep polling the database to determine when to resume the event. Please clarify my understanding.
If the user purchases the SKU while the process is in the delay state, we need to cancel the timer and not send the discount email. We have data coming in via Kafka indicating whether the user has purchased. Should I use an external Kafka consumer to call the Camunda REST API to cancel the process instance? How the consumer knows which process instance to cancel is a challenge. Any thoughts on using a Camunda conditional event here?
Your assumption on the first point is correct. The job for the timer is inserted into the database and will only be picked up by the job executor once it’s due for execution.
Regarding your second point, multiple solutions come to mind.
Use an event-based gateway where one flow goes to your timer and another to a message catch event. Then you could correlate a message once you receive the Kafka event. For correlation you could e.g. use the businessKey (if that's unique).
You could also use an event subprocess with a message start event and make it interrupting.
But as you can see, I would definitely correlate a message to your running process once you receive the kafka event.
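For the Kafka consumer side, a minimal sketch of correlating the message via the Camunda 7 REST API (`POST /message`) could look like this. The message name `SkuPurchased`, the base URL, and using the businessKey as the user/order id are all assumptions for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PurchaseEventCorrelator {

    // Camunda 7 REST message-correlation endpoint; base URL is an assumption.
    static final String CAMUNDA_MESSAGE_URL = "http://localhost:8080/engine-rest/message";

    // Builds the JSON body for POST /message. "SkuPurchased" is a hypothetical
    // message name; the businessKey is assumed to uniquely identify the instance.
    static String correlationBody(String messageName, String businessKey) {
        return "{\"messageName\": \"" + messageName + "\","
             + " \"businessKey\": \"" + businessKey + "\"}";
    }

    // Called by the Kafka consumer for each purchase event it receives.
    static void correlate(String businessKey) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(CAMUNDA_MESSAGE_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        correlationBody("SkuPurchased", businessKey)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Camunda returns 204 No Content on a successful correlation
        if (response.statusCode() != 204) {
            throw new IllegalStateException("Correlation failed: " + response.body());
        }
    }
}
```

The correlation interrupts whichever construct is waiting for the message (event-based gateway branch, event subprocess, or boundary event), so the consumer itself stays completely outside the BPMN model.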
Thanks a lot, that helps.
For 1) We have millions of process instances (15M to 20M of volume) sitting in the delay every day.
Do you see any concerns, such as delays in processing by the job executor, or anything else to note? We will definitely scale the cluster based on the number of process instances triggered.
The only thing that might be worth looking at is the amount of history produced, since history is kept for running process instances. Depending on the number of activities and variables you have, that could amount to significant storage.
You should also take care of history cleanup if you have a high volume of process instances. You can control that by setting the history level and the history time to live, as well as the history cleanup jobs.
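If you're on the Camunda 7 Spring Boot starter, those history settings can be configured roughly like this (the cleanup window times are just example values):

```yaml
camunda:
  bpm:
    history-level: activity          # reduce history detail if you don't need a full audit trail
    generic-properties:
      properties:
        # run the history cleanup batch outside business hours (example window)
        historyCleanupBatchWindowStartTime: "22:00"
        historyCleanupBatchWindowEndTime: "06:00"
```

The time to live itself is set per process definition in the BPMN XML, e.g. `camunda:historyTimeToLive="P7D"` on the process element.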
So, if I understand correctly: I will add an interrupting message boundary event on the subprocess and give it a unique message name. I will have an external Kafka consumer (not part of the workflow), and the consumer will make a REST call to the engine to correlate that same message and interrupt the activity. Is my understanding right?
In continuation: if I want to call another workflow (say workflow 2) from the message boundary event of workflow 1, I think I need to use a call activity. In that case, how do I end workflow 1 when the boundary event is triggered? In other words, I don't want to come back to workflow 1. Is this possible?
The name of the message is only unique per process definition; all process instances will listen for events with this message name.
But when correlating the message you can define additional conditions for the correlation (e.g. process instance id, business key, or some variable value). That way you can uniquely identify the process instance that should receive the message.
If the boundary message event is interrupting, no further activity will be triggered inside the scope it is attached to, as the token is removed there.
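For reference, a `POST /engine-rest/message` request body with such an extra condition might look like this (the `correlationKeys` field is from the Camunda 7 REST API; the variable name `userId` is made up for this example):

```json
{
  "messageName": "SkuPurchased",
  "correlationKeys": {
    "userId": { "value": "user-42", "type": "String" }
  }
}
```

The engine will then only correlate to the instance whose `userId` variable matches, instead of relying on the business key alone.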
Hey @JG2023, I think it's important to say that you will have a big impact on CPU, memory, and even more so on IOPS at your database if many of these timers trigger at the same time (or close together).
When you reach the timer step it will stop consuming CPU, as you said, but when the timer fires it will be treated like any other job: you'll be using job executor threads to continue each one of them, and you'll have lots of delete/update/insert operations hitting the database.
So you need to be prepared for processes started two days ago impacting the performance of new processes starting today.
Try to schedule these timers to expire outside your most important business hours, and be prepared to scale your database, as I think the biggest impact will be on database operations.
Another problem I had in a similar scenario with millions of timers was that the job acquisition query was scanning many rows. Even with the best index possible, we couldn't eliminate the weight of those millions of timers, even with DUE_DATE_ on the index.
So in these scenarios I had to change my timers to a message receive event ("Expired") and build a single scheduler outside the process, executing my business logic to find the processes that should expire and firing the message correlation to them. As the message event subscription lives in a different table, outside the job acquisition query's target, everything got back to full speed.
I don't think that trading my timer for an expiration message made my process design any worse for business people, and it also helped me keep my expiration logic decoupled instead of hard-coding a fixed 2 days in the BPMN.
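A rough sketch of such an external scheduler, again going through the REST API so it stays outside the engine. The message name `Expired`, the base URL, and the idea that your own business logic supplies the due business keys are all assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class ExpirationScheduler {

    // Builds the POST /message body that correlates "Expired" to one instance.
    // Correlating by businessKey is an assumption; any unique key works.
    static String expiredBody(String businessKey) {
        return "{\"messageName\": \"Expired\", \"businessKey\": \"" + businessKey + "\"}";
    }

    // Runs periodically (e.g. from a cron job). The caller supplies the keys of
    // instances that are due; that lookup is your own business logic, not Camunda's.
    static void run(List<String> dueBusinessKeys) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (String key : dueBusinessKeys) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8080/engine-rest/message")) // assumed base URL
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(expiredBody(key)))
                    .build();
            client.send(request, HttpResponse.BodyHandlers.ofString());
        }
    }
}
```

Because the expiry window now lives in the scheduler rather than in the BPMN, changing "2 days" to something else no longer requires redeploying the process definition.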
Ah, I was thinking about Camunda 7, as Camunda 8 doesn't use a relational database for the engine.
In Camunda 8 you won't have those worries about the job acquisition query and database indexes.
But even in Camunda 8 you'll end up with many jobs in the queue for the process instances from two days ago while receiving new jobs from new process instances. In Camunda 8, though, you should be able to scale your cluster nodes to handle this, as you won't have a database as a single point of failure like in Camunda 7.