Process instances could not be canceled

We are currently evaluating a Camunda 8 cluster in our private cloud as part of a modernization initiative. We unexpectedly discovered that we cannot cancel five running instances of a process that are still shown in the Camunda Operate UI.
Please find below the stack trace provided by our cluster admin (I do not have any more logs than that).

Before the problem showed up, the file containing the BPMN was renamed and redeployed with the same IDs (ProcessId, Activity, etc.).

My Questions:

  • Could the redeployment under a new file name (same IDs) be the reason for the issue?
  • How can we stop, cancel, delete the instances or the whole process which is not needed anymore?
  • If we have to delete it manually how can we do it if possible?

Stacktrace:
2023-08-14 07:22:48.821 DEBUG 7 --- [tion_executor_2] i.c.o.w.z.o.AbstractOperationHandler : Operation 35c653b2-5e9b-46d2-8e1b-2a9889fb96da failed with message: Unable to process operation: Command 'CANCEL' rejected with code 'NOT_FOUND': Expected to cancel a process instance with key '2251799814458669', but no such process was found
2023-08-14 07:22:48.821 ERROR 7 --- [tion_executor_2] i.c.o.w.z.o.AbstractOperationHandler : Unable to process operation with id 35c653b2-5e9b-46d2-8e1b-2a9889fb96da. Reason: Command 'CANCEL' rejected with code 'NOT_FOUND': Expected to cancel a process instance with key '2251799814458669', but no such process was found. Will NOT be retried.
io.camunda.zeebe.client.api.command.ClientStatusException: Command 'CANCEL' rejected with code 'NOT_FOUND': Expected to cancel a process instance with key '2251799814458669', but no such process was found
at io.camunda.zeebe.client.impl.ZeebeClientFutureImpl.transformExecutionException(ZeebeClientFutureImpl.java:93) ~[zeebe-client-java-8.2.5.jar!/:8.2.5]
at io.camunda.zeebe.client.impl.ZeebeClientFutureImpl.join(ZeebeClientFutureImpl.java:50) ~[zeebe-client-java-8.2.5.jar!/:8.2.5]
at io.camunda.operate.webapp.zeebe.operation.CancelProcessInstanceHandler.handleWithException(CancelProcessInstanceHandler.java:43) ~[classes!/:?]
at io.camunda.operate.webapp.zeebe.operation.AbstractOperationHandler.handle(AbstractOperationHandler.java:55) ~[classes!/:?]
at io.camunda.operate.webapp.zeebe.operation.OperationCommand.run(OperationCommand.java:24) ~[classes!/:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: NOT_FOUND: Command 'CANCEL' rejected with code 'NOT_FOUND': Expected to cancel a process instance with key '2251799814458669', but no such process was found
at java.util.concurrent.CompletableFuture.reportGet(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.get(Unknown Source) ~[?:?]
at io.camunda.zeebe.client.impl.ZeebeClientFutureImpl.join(ZeebeClientFutureImpl.java:48) ~[zeebe-client-java-8.2.5.jar!/:8.2.5]
… 8 more
Caused by: io.grpc.StatusRuntimeException: NOT_FOUND: Command 'CANCEL' rejected with code 'NOT_FOUND': Expected to cancel a process instance with key '2251799814458669', but no such process was found
at io.grpc.Status.asRuntimeException(Status.java:539) ~[grpc-api-1.54.1.jar!/:1.54.1]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487) ~[grpc-stub-1.54.1.jar!/:1.54.1]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:576) ~[grpc-core-1.54.1.jar!/:1.54.1]
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70) ~[grpc-core-1.54.1.jar!/:1.54.1]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:757) ~[grpc-core-1.54.1.jar!/:1.54.1]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:736) ~[grpc-core-1.54.1.jar!/:1.54.1]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.54.1.jar!/:1.54.1]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.54.1.jar!/:1.54.1]
… 3 more

Hi @Avoodoo :wave: and welcome to our forums!

It’s pretty interesting that Zeebe cannot find the process instance with key 2251799814458669 that you’ve tried to cancel from Operate.

Typically, this should work, but perhaps one of the following is true:

  • The process instance no longer exists in Zeebe, but that information has not yet been exported to Elasticsearch, or Operate has not yet imported it from Elasticsearch.
  • The process instance is called from a parent process instance using a Call Activity. Child process instances cannot be canceled directly (at the time of writing).
  • The process instance encountered an error while terminating, raising an incident. This should be visible in Operate. If the process instance is stuck while terminating, new cancel commands would be rejected with NOT_FOUND. If that is what you see, I think you might have encountered a bug, so please share a screenshot of Operate showing the process instance with the incident (can be sent to me via DM if you want to avoid sharing this publicly).
  • The process instance encountered an unrecoverable unexpected error and was banned as a result. This is a safety mechanism protecting the rest of your process instances. Sadly, Operate does not show whether an instance is banned. If you have access to the metrics (Monitoring) you’ll be able to find this under zeebe_banned_instances_total.
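If you can get a text dump of the broker metrics, the last check can be done with a few lines of code. Below is a minimal sketch, assuming the metrics come in the usual Prometheus text format (`name{labels} value`); the `partition` label and the sample values are made up for illustration, and how you fetch the text from your monitoring setup is left out on purpose.

```java
import java.util.List;

public class BannedInstances {
    // Metric name taken from the answer above.
    static final String METRIC = "zeebe_banned_instances_total";

    // Sums every sample of METRIC found in Prometheus-style text lines.
    static double sumBanned(List<String> lines) {
        double total = 0.0;
        for (String raw : lines) {
            String line = raw.trim();
            // Skips blank lines, "# HELP"/"# TYPE" comments, and other metrics.
            if (!line.startsWith(METRIC)) {
                continue;
            }
            String rest = line.substring(METRIC.length());
            if (rest.startsWith("{")) {
                // Drop the label set; the value follows the closing brace.
                int close = rest.lastIndexOf('}');
                if (close < 0) {
                    continue;
                }
                rest = rest.substring(close + 1);
            } else if (rest.isEmpty() || !Character.isWhitespace(rest.charAt(0))) {
                // A different metric that merely shares the prefix.
                continue;
            }
            // First token is the value; an optional timestamp may follow.
            String[] tokens = rest.trim().split("\\s+");
            if (tokens[0].isEmpty()) {
                continue;
            }
            total += Double.parseDouble(tokens[0]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; label names and values are assumptions.
        List<String> sample = List.of(
            "# TYPE zeebe_banned_instances_total counter",
            "zeebe_banned_instances_total{partition=\"1\"} 2.0",
            "zeebe_banned_instances_total{partition=\"2\"} 1.0");
        System.out.println("Banned instances: " + sumBanned(sample));
    }
}
```

A total above zero only tells you that some instances were banned, not which ones, so treat it as a hint rather than proof that your five instances are affected.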

Could the redeployment under a new file name (same IDs) be the reason for the issue?

No. When deploying the model under a new file name with the same IDs, the process is either considered a duplicate, or it’s considered a new version of that process. In any case, the existing process instance is left untouched, because it is an instance of the previously deployed process version. Zeebe keeps track of all versions of the process that you’ve deployed, and you can even create instances of older versions. See CreateProcessInstance.
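To make the versioning point concrete, here is a toy model in plain Java. It is not Zeebe code. The duplicate rule (same resource name and same content checksum as the latest version) is my assumption for the sketch, and the process ID, file names, and checksums are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeploymentModel {
    // One deployed version of a process definition.
    private static final class Deployed {
        final String resourceName;
        final String checksum;
        final int version;

        Deployed(String resourceName, String checksum, int version) {
            this.resourceName = resourceName;
            this.checksum = checksum;
            this.version = version;
        }
    }

    private final Map<String, List<Deployed>> versionsByProcessId = new HashMap<>();

    // Returns the version number this deployment resolves to.
    int deploy(String processId, String resourceName, String checksum) {
        List<Deployed> versions =
            versionsByProcessId.computeIfAbsent(processId, k -> new ArrayList<>());
        if (!versions.isEmpty()) {
            Deployed latest = versions.get(versions.size() - 1);
            // ASSUMPTION: a duplicate has the same name and content as the latest version.
            if (latest.resourceName.equals(resourceName) && latest.checksum.equals(checksum)) {
                return latest.version;
            }
        }
        // Otherwise a new version is appended; older versions are kept.
        Deployed next = new Deployed(resourceName, checksum, versions.size() + 1);
        versions.add(next);
        return next.version;
    }

    int versionCount(String processId) {
        return versionsByProcessId.getOrDefault(processId, List.of()).size();
    }

    public static void main(String[] args) {
        DeploymentModel model = new DeploymentModel();
        int v1 = model.deploy("order-process", "order.bpmn", "abc");
        int instanceVersion = v1; // a running instance stays bound to this version
        int v2 = model.deploy("order-process", "order-renamed.bpmn", "abc");
        System.out.println("Instance runs on version " + instanceVersion
            + ", latest deployed version is " + v2);
    }
}
```

Whichever branch is taken, nothing in the model touches instances that were started on an earlier version, which is the reason the rename should not explain the failed cancel.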

How can we stop, cancel, delete the instances or the whole process which is not needed anymore?

Typically, your approach (canceling via Operate) should work, so let's first rule out the possible causes listed above. Deleting an entire process is a feature that has been available since 8.3.0-alpha5. Note that it requires all instances of that process to be completed or terminated.

If we have to delete it manually how can we do it if possible?

As a last resort, you could ban the instances manually to fully disable them. Afterward, you can mark them as canceled in Operate by updating the indices in Elasticsearch. I don’t know the details of how to achieve this, but it should not be too difficult.
