Problem Correlating with Dynamic Message Names

I am using an externally generated id for message correlation. The id is called requestId, and the messages are dynamically named as MsgSomeName-${requestId}. Every service task obtains a unique requestId and then immediately sets it as a process variable. The next step after nearly every service task is a message wait on the dynamically named message. I am very careful with scope so that a requestId is always unique within an execution.

As described, this works most of the time - but not always. Correlation failures occur because the requestId is apparently not available at the time the message name is derived. I’ve verified this by sleeping my thread for several seconds and then retrying. Sometimes the correlation succeeds after this. Notably, most of the correlation failures occur during multi-instance iterations.

The typical scenario is:

  1. Receive a request in an external controller; the request contains the requestId.
  2. The controller uses the Camunda runtime service to send a message:
     String messageName = controllerProperties.getMessagePrefix() + baseName + "-" + request.getId();
     runtimeService.createMessageCorrelation(messageName).correlate();
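
The sleep-and-retry check mentioned earlier can be sketched roughly as follows. This is a minimal illustration, not the actual controller code: the class name CorrelationRetry, the retry helper, and the attempt count and delay are all hypothetical. It assumes a failed correlation surfaces as a RuntimeException, which is the case for Camunda's MismatchingMessageCorrelationException.

```java
public class CorrelationRetry {

    /**
     * Runs the given action, retrying with a fixed delay whenever it throws
     * a RuntimeException. Returns the number of attempts that were needed.
     * (Hypothetical helper for illustration only.)
     */
    public static int retry(Runnable action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt; // the action succeeded on this attempt
            } catch (RuntimeException e) {
                last = e; // remember the failure; rethrown if all attempts fail
                if (attempt < maxAttempts) {
                    sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

With such a helper, the correlation call in step 2 would look something like `CorrelationRetry.retry(() -> runtimeService.createMessageCorrelation(messageName).correlate(), 5, 2000L)`. A retry like this only works around the timing issue; it does not explain why the variable is not yet visible when the first attempt runs.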

Here is the BPMN. Note the start script on the “Parallel Panel Instance” where I create a local variable “requestId” and initialize it to null. In the “Match Input File Pair” service task, the delegate sets the requestId to its actual value in its execution. The message wait should then use this value in constructing its name. Also note that I set async before and after on the service task. I can’t see any reason why this process variable would not be available at the time the message name is being constructed.

ancova-analysis-match.bpmn (15.4 KB)

Here is the actual history data for a process instance that failed due to message correlation failures. There were 34 multi-instance items. You can see the loopCounter variable for each. You can also see that at the time of failure some of the requestIds had values while others were null, as if the values had not yet been written to the database, even though every instance had already been assigned one.

query.csv (35.9 KB)

I have seen this model and code work flawlessly with 200 iterations, and I have seen it fail with far fewer. I have been able to resubmit jobs that failed with message correlation exceptions and have them work on the next try. I really need to know if I am doing something wrong, or if there is a bug here. Our application just went to production, so your time is greatly appreciated.

Camunda Version 7.13