Camunda Operate 8.1.6 java.lang.OutOfMemoryError: Java heap space

camundaenthu · April 26, 2023, 7:36pm

Hello, we have deployed the chart camunda-platform-8.1.6. The camunda-operate pod keeps restarting due to:

java.lang.OutOfMemoryError: Java heap space

Appreciate any help addressing this.

satyaram413 · April 27, 2023, 3:45am

Try increasing the memory for that particular pod. If you are using a Helm chart, increase the memory in the values YAML file.

camundaenthu · April 27, 2023, 5:28am

@satyaram413 thanks, tried it but didn’t help. This is the configuration we use:

        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: 600m
            memory: 400Mi

I increased the memory to 800Mi, but Operate still runs out of memory with "Java heap space".

How can I configure the java heap min and max for Operate? Is there an env variable to specify Xms and Xmx?

For example, the Elasticsearch Java options can be configured using

esJavaOpts: "-Xmx1g -Xms1g"

Is there an equivalent for Operate?
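
One possible approach, not confirmed anywhere in this thread: the JVM itself reads the standard JAVA_TOOL_OPTIONS environment variable at startup, so heap flags can be passed through it. The sketch below assumes the chart exposes an `operate.env` list for extra container environment variables (check the values.yaml of your chart version), and the heap sizes are illustrative only.

```yaml
# values.yaml fragment (sketch; `operate.env` is an assumption about the chart)
operate:
  env:
    # JAVA_TOOL_OPTIONS is read by the JVM itself, independent of the start script.
    - name: JAVA_TOOL_OPTIONS
      # Illustrative sizes; keep -Xmx well below the container memory limit (2Gi above).
      value: "-Xms512m -Xmx1g"
```

A container-aware JVM derives its default maximum heap from the container memory limit (a quarter of it by default), not from the request, so if the 800Mi change was made to the request only, the heap size would not have changed.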

satyaram413 · April 27, 2023, 5:33am

This is my configuration for a local Kubernetes cluster using the Helm chart:

global:
  identity:
    auth:
      # Disable the Identity authentication for local development
      # it will fall back to basic-auth: demo/demo as default user
      enabled: false

# Disable identity as part of the camunda platform core
identity:
  enabled: false

optimize:
  enabled: false

# Reduce for Zeebe and Gateway the configured replicas and with that the required resources
# to get it running locally
zeebe:
  clusterSize: 1
  partitionCount: 1
  replicationFactor: 1
  pvcSize: 10Gi

zeebe-gateway:
  replicas: 1

connectors:
  enabled: true
  inbound:
    mode: disabled

# Configure elastic search to make it running for local development
elasticsearch:
  imageTag: 7.16.2
  replicas: 1
  minimumMasterNodes: 1
  # Allow no backup for single node setups
  clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"

  # Allocate smaller chunks of memory per pod.
  resources:
    requests:
      cpu: "500m"
      memory: "2048M"
    limits:
      cpu: "1000m"
      memory: "2048M"

  # Request smaller persistent volumes.
  volumeClaimTemplate:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "standard"
    resources:
      requests:
        storage: 15Gi

camundaenthu · April 27, 2023, 5:59am

@satyaram413 what is your resource configuration for Operate? Do you use the defaults that come with the Helm chart? What I mentioned in my post is the resource configuration for the Operate deployment. We use the default values provided in the Helm chart for the resources of Elasticsearch, Operate and Zeebe.

camundaenthu · April 27, 2023, 6:08am

I noticed the following in the Operate logs. Operate seems to be stuck processing the same flow node instance and incident over and over. Could this be a maximum recursion depth situation?

'Expected to throw an error event with the code 'exception' with message 'Failed job. Recoverable error: local path does not exist: /data/temp/new/docs/doc4', but it was not caught. No error events are available in the scope.', errorMessageHash=-972287963, state=ACTIVE, flowNodeId='NO_CATCH_EVENT_FOUND', flowNodeInstanceKey=4503599650952071, jobKey=4503599650952075, processInstanceKey=4503599650952064, creationTime=2023-03-27T22:51:15.230Z, processDefinitionKey=2251799813685251, treePath='null', pending=true}]
2023-04-27 05:18:25.394 DEBUG 8 --- [   postimport_1] i.c.o.z.p.IncidentPostImportAction       : Finished processing
2023-04-27 05:18:25.422 DEBUG 8 --- [   postimport_1] i.c.o.z.p.IncidentPostImportAction       : Processing flow node instances: [4503599650952071] and incidents: [4503599650952091]
2023-04-27 05:18:25.428 DEBUG 8 --- [   postimport_1] i.c.o.z.p.IncidentPostImportAction       : Processing pending incidents: [IncidentEntity{key=4503599650952091, errorType=UNHANDLED_ERROR_EVENT, errorMessage='Expected to throw an error event with the code 'exception' with message 'Failed job. Recoverable error: local path does not exist: /data/temp/new/docs/doc4', but it was not caught. No error events are available in the scope.', errorMessageHash=-972287963, state=ACTIVE, flowNodeId='NO_CATCH_EVENT_FOUND', flowNodeInstanceKey=4503599650952071, jobKey=4503599650952075, processInstanceKey=4503599650952064, creationTime=2023-03-27T22:51:15.230Z, processDefinitionKey=2251799813685251, treePath='null', pending=true}]
2023-04-27 05:18:25.448 DEBUG 8 --- [   postimport_1] i.c.o.z.p.IncidentPostImportAction       : Finished processing
2023-04-27 05:18:25.476 DEBUG 8 --- [   postimport_1] i.c.o.z.p.IncidentPostImportAction       : Processing flow node instances: [4503599650952071] and incidents: [4503599650952091]
2023-04-27 05:18:25.487 DEBUG 8 --- [   postimport_1] i.c.o.z.p.IncidentPostImportAction       : Processing pending incidents: [IncidentEntity{key=4503599650952091, errorType=UNHANDLED_ERROR_EVENT, errorMessage='Expected to throw an error event with the code 'exception' with message 'Failed job. Recoverable error: local path does not exist: /data/temp/new/docs/doc4', but it was not caught. No error events are available in the scope.', errorMessageHash=-972287963, state=ACTIVE, flowNodeId='NO_CATCH_EVENT_FOUND', flowNodeInstanceKey=4503599650952071, jobKey=4503599650952075, processInstanceKey=4503599650952064, creationTime=2023-03-27T22:51:15.230Z, processDefinitionKey=2251799813685251, treePath='null', pending=true}]
2023-04-27 05:18:25.512 DEBUG 8 --- [   postimport_1] i.c.o.z.p.IncidentPostImportAction       : Finished processing
Terminating due to java.lang.OutOfMemoryError: Java heap space
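
To see what is actually filling the heap while this loop runs, one option is to have the JVM write a heap dump when the OutOfMemoryError occurs. This is a sketch under assumptions: `operate.env` is assumed to be supported by the chart, and the dump path is hypothetical and would need to point at a mounted volume that survives the container restart.

```yaml
# values.yaml fragment (sketch; `operate.env` and the dump path are assumptions)
operate:
  env:
    - name: JAVA_TOOL_OPTIONS
      # Standard HotSpot flags: write an .hprof file when the heap is exhausted.
      value: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/operate-heap.hprof"
```

The resulting .hprof file can then be opened in a heap analyzer to check whether the repeated incident processing is what retains the memory.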

Divyansh_Garg · November 7, 2023, 4:27pm

@camundaenthu I am facing the same error in Operate. Were you able to resolve your issue? How did you change the heap memory configuration for Operate?