Hi Guys,
We need your urgent help, as we are stuck on a big issue which might be a simple one for you.
In our design we have a user task followed by a polling service task which retries a few times (as per the Modeler configuration, let's say PT1S, PT2S and so on). We have set Asynchronous Before on this task, and once the poll is successful the process continues. All of this works well and there is no issue.
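To make the setup concrete, here is a rough sketch of how the polling task is wired in the BPMN XML (the delegate class name and the exact retry values are placeholders, not our real ones):

<bpmn:serviceTask id="pollForStatus" name="Poll for status"
                  camunda:class="com.example.PollStatusDelegate"
                  camunda:asyncBefore="true">
  <bpmn:extensionElements>
    <!-- retry the job with an increasing back-off before an incident is raised -->
    <camunda:failedJobRetryTimeCycle>PT1S,PT2S,PT4S</camunda:failedJobRetryTimeCycle>
  </bpmn:extensionElements>
</bpmn:serviceTask>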
Problem Description:
The screenshot is above; please ignore the incidents and look at the instances that are stuck.
The problem comes when we re-deploy our service, which includes all the BPMN files along with our Java code for all the functionality. Suddenly everything stops working and all processes stop at the service task (Poll for status). We keep triggering new instances and they all get stuck there. We tried re-deploying again, but with no success.
Only once we delete the whole Kubernetes pod and re-deploy does it start working again for new instances.
Please let me know if there is a solution for this, and feel free to contact me if you need any more information.
@vineet_saxena check for incidents in Cockpit.
@aravindhrs The incidents are fine, as those are supposed to be there; the issue is with the instance that is stuck. If you look at the image, there are 16 instances and 15 incidents. That one instance will stay stuck there forever. Not only that: if I trigger any other user tasks after that, they also get stuck at Poll for status.
In the service task, 16 instances were in incident, which is highlighted in red. You need to check the incidents.
OK, let me explain again. Those 16 incidents are correct, as they are supposed to be there: for the previous 16 instances the retries were done and the result was unsuccessful, so there are incidents for those instances. Now I have triggered a 17th instance. I expect this one to fail as well and create an incident, but the problem is that it never gets executed and stays stuck at the service task forever. In fact, the service task is never triggered at all. All 16 earlier instances were started before we re-deployed our service, and this 17th instance was triggered after the re-deployment.
@vineet_saxena, can you upload your BPMN model?
Here it is: ocr.bpmn (5.5 KB)
In the service task you configured the retry cycle as PT1S,PT2S,PT4S,PT8S,PT16S,PT32S,PT1M,PT2M,PT4M,PT8M,PT16M,PT32M, which adds up to roughly an hour.
An incident is only created after all the retry attempts are completed (i.e. the whole retry time has elapsed); until then, no incident will be created.
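As a quick illustration (just a sketch; the activity id "pollForStatus" is a placeholder), you can see this relationship through the Java API: a job only turns into an incident once its retries reach zero:

import java.util.List;
import org.camunda.bpm.engine.ManagementService;
import org.camunda.bpm.engine.RuntimeService;
import org.camunda.bpm.engine.runtime.Incident;
import org.camunda.bpm.engine.runtime.Job;

public class RetryAndIncidentCheck {

  // Jobs at the polling task with no retries left should each have a
  // corresponding "failedJob" incident; jobs that still have retries do not.
  public static void print(ManagementService managementService, RuntimeService runtimeService) {
    List<Job> exhausted = managementService.createJobQuery()
        .activityId("pollForStatus")   // placeholder activity id
        .noRetriesLeft()
        .list();
    List<Incident> incidents = runtimeService.createIncidentQuery()
        .incidentType("failedJob")
        .list();
    System.out.println(exhausted.size() + " jobs with retries exhausted, "
        + incidents.size() + " failedJob incidents");
  }
}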
Hi Aravind, yes, you are correct; that is the expected behavior.
My problem is with the instance for which this service task is never even triggered.
I can see that instance as the 17th instance in the picture above, but it never gets triggered, which means it neither fails nor succeeds.
How long has the 17th instance been stuck? More than an hour?
It stays there forever and is never triggered; we then have to delete our Kubernetes pod, and only after that do other instances work.
In any of the process applications, has this property been set as below?
<property name="jobExecutorDeploymentAware">true</property>
or
camunda:
  bpm:
    enabled: true
    job-execution:
      enabled: false
      deployment-aware: true
Is the process engine configuration the same for all the nodes?
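If it helps, one way to double-check on each node at runtime whether the engine is actually running deployment-aware (wherever the value might come from) is to read it off the engine configuration; this is only a sketch using the engine's internal configuration class:

import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.impl.cfg.ProcessEngineConfigurationImpl;

public class DeploymentAwareCheck {

  // When deployment-aware is true, the job executor of a node only acquires
  // jobs belonging to deployments registered with that node, which can leave
  // jobs from other deployments sitting in the job table forever.
  public static void print(ProcessEngine engine) {
    ProcessEngineConfigurationImpl config =
        (ProcessEngineConfigurationImpl) engine.getProcessEngineConfiguration();
    System.out.println("jobExecutorDeploymentAware = " + config.isJobExecutorDeploymentAware());
  }
}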
We do not have this configuration anywhere.
Yes, all the nodes have the same process engine configuration.
Hi Aravind, is there any entry for this in any database table or somewhere else? I could check there as well.
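For example (just a sketch of what I could run; the instance id is a placeholder), I could look at the job of the stuck instance, since as far as I understand the async-before continuation should sit as a row in ACT_RU_JOB:

import java.util.List;
import org.camunda.bpm.engine.ManagementService;
import org.camunda.bpm.engine.runtime.Job;

public class StuckJobCheck {

  // If the job row exists, still has retries, is not suspended and its due
  // date is not in the future, the job executor should normally pick it up.
  public static void print(ManagementService managementService, String stuckProcessInstanceId) {
    List<Job> jobs = managementService.createJobQuery()
        .processInstanceId(stuckProcessInstanceId)   // id of the stuck (17th) instance
        .list();
    for (Job job : jobs) {
      System.out.println("job=" + job.getId()
          + " retries=" + job.getRetries()
          + " duedate=" + job.getDuedate()
          + " suspended=" + job.isSuspended());
    }
  }
}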