Showing which step really failed

fml2 · May 24, 2019, 10:51am

Hello,

we have a process model with three sequential activities: Activity1 → Activity2 → Activity3. Of these activities only the first one has the attribute “async before” set to “true”; all other activities have both “async before” and “async after” set to “false”.

If a process instance fails while executing Activity2, the instance is rolled back to just before Activity1 – that’s OK.

Is it possible to see in Cockpit that the instance actually failed in Activity2?

It’s clear to me that it’s only possible to “restart” the process at Activity1. But I’d like to clearly see what went wrong and where.

Thank you for any hints.

patozgg · May 24, 2019, 4:05pm

Hello fml2,

Unfortunately, Cockpit doesn’t show this information. However, you could do one of the following:

1. Mark your tasks as asyncBefore = true
2. View what went wrong by taking a look at your logs
3. Use an external log query and alerting application to get notified when a specific type of stack trace appears

Best practice is to always use number 1. The main reason is to prevent application logic from being executed twice if an activity in the process fails for some reason. This is what we recommend to all of our users.
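To illustrate why number 1 makes the failing step visible, here is a minimal toy model in plain Java. It is not Camunda engine code and uses no Camunda API; the class `AsyncBoundarySketch` and the method `incidentActivity` are hypothetical names made up for this sketch. It only assumes, as described in this thread, that a failure is reported at the nearest preceding activity that has asyncBefore set.

```java
import java.util.Optional;

// Toy model only, NOT Camunda engine code. It mimics how a failure in a
// sequential process is attributed to the last asyncBefore boundary.
final class AsyncBoundarySketch {

    // Returns the id of the activity where the failure would be reported when
    // the activity at failedIndex throws: the nearest activity at or before
    // failedIndex that has asyncBefore = true. Empty if there is none.
    static Optional<String> incidentActivity(String[] ids, boolean[] asyncBefore, int failedIndex) {
        for (int i = failedIndex; i >= 0; i--) {
            if (asyncBefore[i]) {
                return Optional.of(ids[i]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] ids = {"Activity1", "Activity2", "Activity3"};
        boolean[] onlyFirst = {true, false, false};   // the setup from the question
        boolean[] everywhere = {true, true, true};    // number 1: asyncBefore on every task

        // Activity2 (index 1) fails in both cases.
        System.out.println(incidentActivity(ids, onlyFirst, 1).orElse("none"));  // Activity1
        System.out.println(incidentActivity(ids, everywhere, 1).orElse("none")); // Activity2
    }
}
```

With asyncBefore only on Activity1, the failure surfaces at Activity1; with asyncBefore on every task, it surfaces at the activity that actually failed, at the cost of one extra transaction per task.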

Does this make sense?

Regards,

fml2 · May 24, 2019, 10:49pm

Hello potozgg,

thank you for your advice. However, I don’t think your proposals are quite good.

Doing number one (asyncBefore everywhere) would degrade performance. I understand that this is not a concern in every project, but the feature exists for a reason.

Number two is how I plan to handle it. Number three (log analysis) is too complicated IMO for such a basic task in a BPM system.

Frankly, I don’t understand the reason why the failed activity is not saved somewhere. The status of the failed job is saved (this is how we see the incident), hence a commit is made to the DB. Why not remember the failed activity as well (as one of the attributes of the failed job)? This is of course not always the place we can restart the process from (failed activity != possible restart activity). But seeing clearly (also in the graphical notation) what activity (not what job!) failed would help to understand the problem faster. After all, we design processes out of activities, not out of jobs!

This would also have some other advantages, e.g. having correct statistics about which activities fail.

fml2 · May 29, 2019, 6:47am

I’ve created a feature request: https://app.camunda.com/jira/browse/CAM-10378

patozgg · July 9, 2019, 5:55pm

Hey @fml2,

Just saw your reply and realized that in your second-to-last post you tagged me as potozgg (my username is patozgg). Because of this, I did not get a notification of your reply. In any case, you did the right thing in creating a feature request. BTW, are you an enterprise customer? If you are, you should be using our internal enterprise support ticket system.

Regards,

patozgg · July 9, 2019, 6:08pm

@fml2

Regarding number 1: yes, it is true that marking your tasks as async before might cause some performance degradation due to the additional transactions. However, if Camunda were to track the failed activities, this would also cause a performance degradation. In any case, with async before you can control the performance degradation to some degree. If it helps, to maximize performance, Camunda also recommends tuning the job executor to make full use of the resources where you are running the engine.

Regards,

McAlm · April 27, 2020, 2:29pm

Hi there, if you take a look at this announcement (https://blog.camunda.com/post/2020/04/camunda-bpm-7130-alpha3-released/) you will see that 7.13 will address this. With the newest version, Cockpit will give you a clear indication of which activity actually caused the incident. 7.13 will be published at the end of May.
Best, Stefan