Operate 8.7.3 - Error occurred while archiving data. Will be retried

igormsk · June 3, 2025, 9:27pm

Hello, for the last few days (maybe since upgrading from 8.6.x to 8.7.3, maybe later, I'm not sure) I have been constantly seeing the following error in the Operate log. I run Camunda Self-Managed in Kubernetes with the latest provided Helm chart. What does it mean, and how can I resolve it?

2025-06-03 21:19:04.609 ERROR 7 --- [     archiver_1] i.c.o.a.AbstractArchiverJob              : Error occurred while archiving data. Will be retried.

java.util.concurrent.CompletionException: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-11 [ACTIVE]
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:708) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194) ~[?:?]
        at io.camunda.operate.util.ElasticsearchUtil$DelegatingActionListener.lambda$onFailure$1(ElasticsearchUtil.java:767) ~[operate-schema-8.7.3.jar:8.7.3]
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-6.2.6.jar:6.2.6]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-11 [ACTIVE]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387) ~[httpcore-nio-4.4.16.jar:4.4.16]
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:98) ~[httpasyncclient-4.1.5.jar:4.1.5]
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:40) ~[httpasyncclient-4.1.5.jar:4.1.5]
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175) ~[httpcore-nio-4.4.16.jar:4.4.16]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261) ~[httpcore-nio-4.4.16.jar:4.4.16]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:506) ~[httpcore-nio-4.4.16.jar:4.4.16]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211) ~[httpcore-nio-4.4.16.jar:4.4.16]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280) ~[httpcore-nio-4.4.16.jar:4.4.16]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.16.jar:4.4.16]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.16.jar:4.4.16]
        ... 1 more

nathan.loding · June 4, 2025, 3:13pm

Hi @igormsk - I admit I am not 100% certain from just the information provided, but the archive task within Operate is trying to move data to a dated index within Elasticsearch (or OpenSearch) for archival (docs reference). It seems it is timing out after 30 seconds during that operation.

Can you share your Helm values (with secrets redacted), or at least the global, elastic, and operate sections?

igormsk · June 23, 2025, 4:06pm

values.yaml (3.3 KB)
@nathan.loding Sorry for the long delay with my reply. Here are the parts of the values.yaml file you asked for.
Is there a way to increase the 30 sec timeout for this operation?

I'm still having this problem in my prod and test environments, even with the latest Operate 8.7.6.

nathan.loding · June 30, 2025, 5:26pm

Hi @igormsk - nothing jumps out at me in your values file, and no one else has jumped in on this thread. Have you opened a support ticket yet? That might be the best next step.

igormsk · July 4, 2025, 7:27am

Probably all I need to do is increase the 30-second timeout for this request from Operate to Elasticsearch. How can I do that?

I found this thread: "How to avoid 30,000ms timeout during reindexing" - Elasticsearch - Discuss the Elastic Stack
But how can I change the socketTimeout that Operate sets?

nathan.loding · July 7, 2025, 2:22pm

@igormsk - I don’t know if you can; it would be best to open a support ticket for this, I think.

igormsk · October 14, 2025, 8:27pm

I increased the timeout for requests from Operate to Elasticsearch by adding the following line
camunda.operate.elasticsearch.socket-timeout=1800000
to operate-version.properties, which is inside lib/operate-common-8.7.15.jar in the Operate Docker container. The value is in milliseconds, so this is 30 minutes.
The same method also works for Tasklist.
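
For anyone who wants to script this manual patch, here is a minimal sketch of what it could look like. It is only an illustration under assumptions, not an official Camunda procedure: the function name add_property_to_jar is made up, it assumes operate-version.properties sits at the root of the jar, and it assumes you run it against a copy of the jar (for example while building a custom image), since changes made inside a running container are lost when the pod is recreated.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def add_property_to_jar(jar_path, entry_name, property_line):
    """Rewrite the jar so that entry_name ends with property_line.

    Hypothetical helper. A jar is a zip archive, and zipfile cannot edit
    an entry in place, so every entry is copied into a temporary archive
    that then replaces the original file.
    """
    jar_path = Path(jar_path)
    # Create the temporary file next to the jar so os.replace stays on
    # the same filesystem.
    with tempfile.NamedTemporaryFile(
        dir=jar_path.parent, suffix=".jar", delete=False
    ) as tmp:
        tmp_path = Path(tmp.name)

    found = False
    with zipfile.ZipFile(jar_path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == entry_name:
                found = True
                # latin-1 maps every byte to one character, so the
                # existing content round-trips unchanged.
                text = data.decode("latin-1")
                # Skip the append if the exact line is already present.
                if property_line not in text.splitlines():
                    if text and not text.endswith("\n"):
                        text += "\n"
                    text += property_line + "\n"
                data = text.encode("latin-1")
            # Passing the original ZipInfo keeps name and compression type.
            dst.writestr(info, data)

    if not found:
        tmp_path.unlink()
        raise KeyError(f"{entry_name} not found in {jar_path}")
    os.replace(tmp_path, jar_path)


# Example call with the names from this post (paths are assumptions):
# add_property_to_jar(
#     "lib/operate-common-8.7.15.jar",
#     "operate-version.properties",
#     "camunda.operate.elasticsearch.socket-timeout=1800000",
# )
```

Running it a second time with the same line leaves the file unchanged, so it should be safe to use in a repeatable build step.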

But it would be VERY nice to have this timeout configurable from values.yaml or through environment variables.