How can I “create a process instance with result” *and* get information about exceptions in the process instance?

JGeek: Create Process Instance With Results

We are exploring the “Create a process instance with results” approach from the Camunda Platform 8 docs for creating process instances, but we are blocked by an issue. When runtime exceptions occur while executing the workflow, we get neither the process instance key of the workflow nor the error stack trace. Are there any tips that I am missing?

Thanks

Josh Wulf: Do you mean that your worker code and the code that starts this process are running in the same application?

JGeek: No. I didn’t mean that.
I am trying to figure out a way to learn the status of the process (whether it completed) and the values of the process instance variables defined in the BPMN file.

There is one approach mentioned in the docs but it doesn’t work when exceptions occur at runtime.

korthout: @JGeek Querying process instance status is best done via <https://docs.camunda.io/docs/apis-clients/operate-api/|Operate’s API> or <https://github.com/camunda-community-hub/zeeqs/|Zeeqs> (if you don’t have a license for Operate).

JGeek: @korthout - thanks for the input. We don’t have a license for Operate (and we have seen in our development environment that even Operate doesn’t scale well under load), so we tried Zeeqs. But Zeeqs doesn’t scale well either and isn’t designed for production use, so we started exploring other approaches to find out the workflow status.

I understand one way is to write a custom exporter or even an app that can read from existing exporters (hazelcast/elasticsearch). This requires quite some development time and effort.

korthout: The problem for us is that we cannot know whether or not the process instance has reached a state in which it won’t still complete, because:
• the process could be modeled to contain other ways to complete, even if an incident occurs in another part of the process
• this could also be the case via interactions with messages/signals
• the incident could be automatically resolved
• and there are probably other cases I couldn’t come up with just now
So in the end, all we can do is inform the client when the process instance has completed (or when it’s cancelled :thinking_face:)

One potential idea is to depend on the request timeout if you know how long your process typically takes.
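
A minimal sketch of that timeout idea, using only the JDK. The `CompletableFuture` stands in for the pending “with result” call, and `BoundedResultWaiter` is a hypothetical helper name, not part of the Zeebe client. With the real Java client, the wait is normally bounded by the request timeout set on the command. Note that an empty result here only means “not completed in time”; it does not tell you that the instance failed.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: waits a bounded time for a "with result" call.
final class BoundedResultWaiter {

    // Returns the result variables, or empty if the call did not finish within
    // the expected duration. Empty means "not completed yet": the instance may
    // still be running, or it may be waiting on an unresolved incident.
    static Optional<Map<String, Object>> await(
            CompletableFuture<Map<String, Object>> pendingResult, Duration expected) {
        try {
            // orTimeout completes the future exceptionally once the time is up.
            return Optional.of(
                    pendingResult.orTimeout(expected.toMillis(), TimeUnit.MILLISECONDS).join());
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return Optional.empty();
            }
            throw e; // any other failure is passed on to the caller
        }
    }
}
```

Picking the expected duration is the weak point: too short and slow but healthy instances are reported as missing, too long and the caller waits needlessly when an incident has occurred.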

korthout: Another idea is to model the failures you described explicitly with Error Events leading to alternative End Events

JGeek: > • the incident could be automatically resolved
What configuration do I need to automatically resolve an incident? I am thinking if I can do it for runtime exceptions so that I can use “CreateProcessInstanceWithResult” api.

Josh Wulf: Something very complex and distributed. You would need to listen to the exported events from the cluster to detect the incident, but you couldn’t correlate it back to the calling code, because it didn’t get a process id.

So you would call it with createProcess, get an id, then….

No wait, I’m about to design a very complex distributed architecture that will solve your problem by introducing a 1000% increase in system complexity.

korthout: > • the incident could be automatically resolved
> What configuration do I need to automatically resolve an incident?
@JGeek, my comment was not about you automatically resolving the incident, but about the engine resolving the incident. For example, when the flow in which the incident exists gets interrupted by another flow or a triggering event. My point is not about resolving the incident, but that the Zeebe engine cannot know whether a process instance has reached a state in which it won’t still complete. This information is necessary to respond with a “result failed” response to the CreateProcessInstanceWithResult. Does that make sense?

JGeek: > My point is not about resolving the incident, but that the Zeebe engine cannot know whether a process instance has reached a state in which it won’t still complete
Right, and that’s where I am trying to see if we can configure the engine in some way to fail the process when a runtime exception occurs. From @Josh Wulf’s comments I understand that the only way to do this is to write your own exporter: listen to workflow events and send a fail gRPC command when a runtime exception occurs.

Josh Wulf: Yeah, this combination is currently not possible.

@korthout - could it be possible to make CreateProcessInstanceWithResult return a stream, and stream back first the process instance id, then the outcome?

Is there some technical limitation in the gateway / broker architecture that precludes this?

korthout: @Josh Wulf that would be a breaking change. I don’t see any way to achieve that without introducing an entirely new RPC. I don’t see us making such a change any time soon.

Coming back to the original question:
> When runtime exceptions occur while executing the workflow we don’t get to know the process instance key of the workflow
I’m still struggling to understand this. If an incident occurs and you consume the exported Incident record, then that record (zeebe/IncidentRecord.java at 8.1.6 · camunda/zeebe · GitHub) contains the process instance key.

What am I missing? :smile:

JGeek: > If an incident occurs and you consume the exported Incident record, then that record (zeebe/IncidentRecord.java at 8.1.6 · camunda/zeebe · GitHub) contains the process instance key.
@korthout - Our code to start a process instance is what you see <https://docs.camunda.io/docs/apis-clients/java-client-examples/process-instance-create-with-result/#processinstancewithresultcreatorjava|here>. When an incident occurs, an exception is thrown by the API, and all we get is an io.camunda.zeebe.client.api.command.ClientException. This doesn’t have anything except a message string. How do I get the Incident record?

korthout: I’m sorry @JGeek but I don’t think you can use the WithResult RPC like that. It’s clearly a flaw in our design, but currently Zeebe just doesn’t have any way to figure out that something went wrong. The ClientException is caused by a Timeout of the request from the client to the gateway. From Zeebe’s perspective the process instance is still running (even if an incident exists on it). It can only respond to the Create…WithResult RPC once the created process instance has fully completed.

What you want to achieve will require consuming the exported log. I see no other solution. @Josh Wulf Do you see any other solutions?

Josh Wulf: Yes, I do actually, because I am a hacker.

Josh Wulf: Here is how you do it. You add your own uuid to the variable payload. Now you have a known correlation key.

Then you have workers post their stack traces with the correlation keys to a central datastore.

If your withResult call times out, you can query the datastore for the correlation key and get the corresponding failed worker’s stack trace.

JGeek: @Josh Wulf
> Then you have workers post their stack traces with the correlation keys to a central datastore.
Does this hack require us to write a Zeebe exporter? If so, then this is something that we discussed earlier in this thread, and it doesn’t look like an easy solution.

Josh Wulf: 1. Add your own uuid to the variable payload. Now you have a known correlation key.
2. Have workers post their stack traces with the correlation keys to a central datastore.
3. If your withResult call times out, query the datastore for the correlation key and get the corresponding failed worker’s stack trace.
Those are the only three steps to this pattern.
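
As a rough illustration of these three steps, here is a JDK-only sketch. Everything in it is an assumption made for the example: `FailureStore` is an in-memory stand-in for the central datastore (a real setup would use a database that both the workers and the starting application can reach), and the variable name `correlationKey` is arbitrary.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the central datastore.
final class FailureStore {
    private final Map<String, String> stackTraces = new ConcurrentHashMap<>();

    // Step 2: called by a worker when its handler throws.
    void recordFailure(String correlationKey, Throwable error) {
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace, true));
        stackTraces.put(correlationKey, trace.toString());
    }

    // Step 3: called by the starter after its withResult call timed out.
    Optional<String> findFailure(String correlationKey) {
        return Optional.ofNullable(stackTraces.get(correlationKey));
    }
}

// Hypothetical helper for the application that starts the process instance.
final class CorrelatedStart {
    static String newCorrelationKey() {
        return UUID.randomUUID().toString();
    }

    // Step 1: add the caller-generated key to the variables sent with the
    // start command, so workers can read it back from the job variables.
    static Map<String, Object> withCorrelationKey(Map<String, Object> variables, String key) {
        Map<String, Object> payload = new HashMap<>(variables);
        payload.put("correlationKey", key);
        return payload;
    }
}
```

The starter keeps the key it generated, so it can query the store even though it never received a process instance key.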

Josh Wulf: It requires an external database accessible by your workers and your application, no exporter.

JGeek: > Have workers post their stack traces with the correlation keys to a central datastore.
@Josh Wulf - our workflows are designed by the admin of our application. Also most of the tasks in the workflows are service tasks that call REST APIs of different microservices. There may not always be a need to use a service task that calls our worker. For this solution it seems we will have to modify the <https://github.com/camunda-community-hub/zeebe-http-worker|zeebe-http-worker>.

Also, this solution requires all workers used in the workflow to post their stack traces. This means we have to wrap all our worker code in a try-catch block to catch any exceptions that appear during execution (not a recommended coding practice). And if we did that, the process would complete anyway and we would get the result.

Josh Wulf: The worker library already has “a catch all unhandled exceptions” wrapper that calls the FailJob API. So this is another “aspect-oriented” concern that would belong in a library layer.

It’s “not a recommended code practice” for an application coder, but you wouldn’t let them do that. You would bake it into a library which is your own wrapper around the client library.

And yes, you would need your own custom version of the zeebe-http-worker.

Because you have a custom requirement.

JGeek: > The worker library already has “a catch all unhandled exceptions” wrapper that calls the FailJob API.
Are you referring to some java zeebe library? If so, can you please point me to it?
> And yes, you would need your own custom version of the zeebe-http-worker
I am unable to follow the solution - if there is a worker library which I assume is a core java library extended/used by all workers, then why would I need to modify zeebe-http-worker code too.

Josh Wulf: Because you would need to rebuild the worker using the new library.

JGeek: > Because you would need to rebuild the worker using the new library.
can you please share the link to this core worker library?

Josh Wulf: It’s the Spring Worker that does this

Josh Wulf: https://github.com/camunda-community-hub/spring-zeebe/

Josh Wulf: Sorry, no it’s the standard Zeebe Java client (which is in the Zeebe code base).

The Spring Zeebe client wraps that.

Here is the code from the Java client:

/**
   * Handles a job. Implements the work to be done whenever a job of a certain type is received.
   *
   * <p>In case the job handler throws an exception the job is failed and the job retries are
   * automatically decremented by one. The failed job will contain the exception stacktrace as error
   * message.
   *
   * <p>If the retries reaches zero an incident will be created, which has to be resolved before the
   * job is available again (see {@link ZeebeClient#newResolveIncidentCommand(long)}
   */
  void handle(JobClient client, ActivatedJob job) throws Exception;

Josh Wulf: https://github.com/camunda/zeebe/blob/1edfe4f762ef700885859bf87e1370963d3448a1/clients/java/src/main/java/io/camunda/zeebe/client/api/worker/JobHandler.java#L37-L36

JGeek: @Josh Wulf - are you coming to a conclusion that there is no core worker library that every job worker extends from? Hence we would need to modify every job worker code used in the workflow for the hack to work

Josh Wulf: My point is that you provide a library with the functionality baked into it.

It could be in the form of an annotation, or a function call. Let me write it as pseudocode, so you get an idea.

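function createWorker (String taskType, Function jobHandler) {
    // Wrap the handler so every exception is recorded before the job fails
    const wrappedJobHandler = function _ (ZeebeJob job) {
        try {
            jobHandler(job)
        } catch (Error e) {
            postStackTrace(job.processKey, e)
            throw e // rethrow so the client library still fails the job
        }
    }
    // Register the wrapped handler, not the original one
    Zeebe.JavaWorker.createWorker(taskType, wrappedJobHandler)
}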

Now, instead of using the standard createWorker from the Zeebe client library, your applications use the custom createWorker with the wrapper. It’s aspect-oriented programming using a facade.

Josh Wulf: https://en.wikipedia.org/wiki/Aspect-oriented_programming

Josh Wulf: https://en.wikipedia.org/wiki/Decorator_pattern

Josh Wulf: The decorator pattern also can augment the <https://en.wikipedia.org/wiki/Facade_pattern|Facade pattern>. A facade is designed to simply interface with the complex system it encapsulates, but it does not add functionality to the system. However, the wrapping of a complex system provides a space that may be used to introduce new functionality based on the coordination of subcomponents in the system. For example, a facade pattern may unify many different languages dictionaries under one multi-language dictionary interface. The new interface may also provide new functions for translating words between languages. This is a hybrid pattern - the unified interface provides a space for augmentation. Think of decorators as not being limited to wrapping individual objects, but capable of wrapping clusters of objects in this hybrid approach as well.

Josh Wulf: I know that the Java client uses a builder pattern, but I would create a class that provides a createWorker method, and in that method you wrap the job handler that was passed in, and call the builder.
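
A hedged Java sketch of that wrapper, written against the JDK only so it compiles on its own. `KeyedJobHandler` is a hypothetical, simplified stand-in for the client’s `JobHandler` interface (it receives just a correlation key instead of the client and the job), and `ReportingWorkers` is an invented class name. In a version built on the Zeebe Java client, a `createWorker` method would presumably pass the wrapped handler to the builder, along the lines of `client.newWorker().jobType(taskType).handler(wrapped).open()`.

```java
import java.util.function.BiConsumer;

// Hypothetical stand-in for the client's job handler, reduced to the
// correlation key so the sketch does not depend on the Zeebe client.
@FunctionalInterface
interface KeyedJobHandler {
    void handle(String correlationKey) throws Exception;
}

// Hypothetical facade that applications use instead of registering handlers directly.
final class ReportingWorkers {
    // Receives the correlation key and the exception, e.g. to write them to a datastore.
    private final BiConsumer<String, Exception> reporter;

    ReportingWorkers(BiConsumer<String, Exception> reporter) {
        this.reporter = reporter;
    }

    // Wraps a handler so that any exception is reported before it is rethrown.
    // Rethrowing preserves the client's documented behaviour of failing the job.
    KeyedJobHandler wrap(KeyedJobHandler delegate) {
        return correlationKey -> {
            try {
                delegate.handle(correlationKey);
            } catch (Exception e) {
                reporter.accept(correlationKey, e);
                throw e;
            }
        };
    }
}
```

Because the reporting lives in the wrapper, individual handlers stay free of try-catch blocks; the cost is that every worker, including a custom build of zeebe-http-worker, has to be created through the facade.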

Note: This post was generated by Slack Archivist from a conversation in the Camunda Platform 8 Slack, a source of valuable discussions on Camunda 8 (get an invite). Someone in the Slack thought this was worth sharing!