Hi @gumang, we have tried a few things and overall I think matters have improved. It's hard to tell for sure, but I don't think we have seen this issue for a while now.
One of the things we did was to always make call activities asynchronous after (asyncAfter).
The other was to override the AsyncContinuationJobHandler:
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.camunda.bpm.engine.impl.interceptor.CommandContext;
import org.camunda.bpm.engine.impl.jobexecutor.AsyncContinuationJobHandler;
import org.camunda.bpm.engine.impl.persistence.entity.ExecutionEntity;
import org.camunda.bpm.engine.impl.pvm.runtime.LegacyBehavior;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomAsyncContinuationJobHandler extends AsyncContinuationJobHandler {

    private static final Logger LOG = LoggerFactory.getLogger(CustomAsyncContinuationJobHandler.class);

    // Use a cache bounded in size to avoid memory leaks.
    private final Cache<String, Boolean> retriedExecutionId = CacheBuilder.newBuilder().maximumSize(50).build();

    @Override
    public void execute(AsyncContinuationConfiguration configuration, ExecutionEntity execution, CommandContext commandContext, String tenantId) {
        String retryKey = execution.getId() + "_" + execution.getActivityId();
        try {
            super.execute(configuration, execution, commandContext, tenantId);
            // If this run was a retry, clear the marker and log the success.
            if (retriedExecutionId.getIfPresent(retryKey) != null) {
                retriedExecutionId.invalidate(retryKey);
                LOG.warn("Successful retry of Legacy Behaviour activity '{}' and execution '{}'",
                        execution.getActivityId(),
                        execution.getId());
            }
        } catch (NullPointerException npe) {
            // If the null pointer is a legacy async exception we want to log it and retry.
            // The JVM can omit the stack trace of frequently thrown NPEs, so guard against an empty array.
            StackTraceElement[] trace = npe.getStackTrace();
            if (trace.length > 0 && trace[0].getClassName().equals(LegacyBehavior.class.getName())) {
                if (retriedExecutionId.getIfPresent(retryKey) == null) {
                    LOG.warn("Legacy Behaviour Exception caught, retrying. Running Async Job on activity '{}' and execution '{}'",
                            execution.getActivityId(),
                            execution.getId());
                    retriedExecutionId.put(retryKey, Boolean.TRUE);
                    throw new RetriableJobException(npe.getMessage());
                } else {
                    LOG.warn("Legacy Behaviour Exception caught after retry. Running Async Job on activity '{}' and execution '{}'",
                            execution.getActivityId(),
                            execution.getId());
                }
            }
            // Any other NPE, or a repeat failure, goes through normal failed-job handling.
            throw npe;
        }
    }
}
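To make the retry-once bookkeeping easier to see on its own, here is a minimal sketch without any Camunda or Guava dependencies. `RetryOnceTracker` and its method names are hypothetical, and the `LinkedHashMap`-based eviction is only a stand-in for the Guava cache used above (it evicts strictly in insertion order, which Guava does not promise).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: remembers which keys have already been given one retry. */
class RetryOnceTracker {

    private final Set<String> retried;

    RetryOnceTracker(final int maxSize) {
        // Insertion-ordered map that drops its eldest entry once maxSize is exceeded,
        // so the set of remembered keys stays bounded.
        Map<String, Boolean> bounded = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxSize;
            }
        };
        // Job executor threads may run concurrently, so wrap the set for thread safety.
        retried = Collections.synchronizedSet(Collections.newSetFromMap(bounded));
    }

    /** Returns true the first time a key fails (retry it), false on a repeat failure. */
    boolean shouldRetry(String key) {
        return retried.add(key);
    }

    /** Call after a successful run; returns true if that run was a retry. */
    boolean clearIfRetried(String key) {
        return retried.remove(key);
    }
}
```

As in the handler, a key is only forgotten after a successful run or when it is evicted, so a job that keeps failing gets exactly one free retry and then falls back to the normal failure path.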
And a custom job retry command:
import org.camunda.bpm.engine.impl.cmd.DefaultJobRetryCmd;

public class CustomJobRetryCmd extends DefaultJobRetryCmd {

    public CustomJobRetryCmd(String jobId, Throwable exception) {
        super(jobId, exception);
    }

    // Do not use up one of the job's retries when the failure is our own RetriableJobException.
    @Override
    protected boolean shouldDecrementRetriesFor(Throwable t) {
        return super.shouldDecrementRetriesFor(t) && !(t instanceof RetriableJobException);
    }
}
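The effect of that override on the retry counter can be sketched in plain Java. Everything here is a stand-in: `RetryDecision` is hypothetical, the nested `RetriableJobException` only mimics our own exception class, and `engineWouldDecrement` represents whatever `super.shouldDecrementRetriesFor(t)` would return.

```java
/** Hypothetical model of the retry-counter decision; not engine code. */
class RetryDecision {

    /** Stand-in for the custom exception thrown by the job handler. */
    static class RetriableJobException extends RuntimeException {
        RetriableJobException(String message) {
            super(message);
        }
    }

    /** Mirrors the override: decrement only if the engine would and the failure is not retriable. */
    static boolean shouldDecrement(Throwable t, boolean engineWouldDecrement) {
        return engineWouldDecrement && !(t instanceof RetriableJobException);
    }

    /** Number of retries left after a failed run. */
    static int retriesAfterFailure(int retries, Throwable t, boolean engineWouldDecrement) {
        return shouldDecrement(t, engineWouldDecrement) ? retries - 1 : retries;
    }
}
```

So a job with 3 retries left that fails with the retriable exception still has 3 afterwards, while any other failure the engine counts leaves it with 2.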
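And a custom FailedJobCommandFactory:
import org.camunda.bpm.engine.impl.interceptor.Command;
import org.camunda.bpm.engine.impl.jobexecutor.FailedJobCommandFactory;

public class CustomFailedJobCommandFactory implements FailedJobCommandFactory {

    // Hand every failed job to the custom retry command.
    @Override
    public Command<Object> getCommand(String jobId, Throwable exception) {
        return new CustomJobRetryCmd(jobId, exception);
    }
}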
And to set it up on the process engine configuration:
config.setFailedJobCommandFactory(new CustomFailedJobCommandFactory());
config.getCustomJobHandlers().add(new CustomAsyncContinuationJobHandler());
RetriableJobException is an exception class we defined ourselves.
Hope that helps.