Camunda OIDC Entra ID

Hi,

I am trying to set up OIDC with Entra ID. It is Camunda 8.5 deployed via the Helm chart, and I’m looking for some clarity on the following document:
Connect to an OpenID Connect provider | Camunda 8 Docs
We have a working instance until we try to deploy OIDC, but are struggling to troubleshoot why it fails once enabled.

My current values.yaml is below, with placeholder values in place of the real ones:

global:
  ingress:
    enabled: true
    host: "[ourcamundaurl].com"
    tls:
      enabled: true
      secretName: "[cert]"
  identity:
    auth:
      issuer: https://login.microsoftonline.com/[tenantID]/v2.0
      issuerBackendUrl: https://login.microsoftonline.com/[tenantID]/v2.0
      tokenUrl: https://login.microsoftonline.com/[tenantID]/oauth2/v2.0/token
      jwksUrl: https://login.microsoftonline.com/[tenantID]/discovery/v2.0/keys
      type: "MICROSOFT"
      publicIssuerUrl: https://login.microsoftonline.com/[tenantID]/v2.0

      identity:
        clientId: [camunda_app_clientId]
        existingSecret: [camunda-oidc-secret]
        audience: [camunda_app_clientId]
        initialClaimName: "oid"
        initialClaimValue: [unsure?]
        redirectUrl: https://[ourcamundaurl].com/auth/login-callback

      operate:
        clientId: [camunda_app_clientId]
        audience: [camunda-oidc-secret]
        existingSecret: [camunda-oidc-secret]
        redirectUrl: https://[ourcamundaurl].com/identity-callback

      tasklist:
        clientId: [camunda_app_clientId]
        audience: [camunda-oidc-secret]
        existingSecret: [camunda-oidc-secret]
        redirectUrl: https://[ourcamundaurl].com/identity-callback

      zeebe:
        clientId: [camunda_app_clientId]
        audience: [camunda_app_clientId]
        existingSecret: [camunda-oidc-secret]
        tokenScope: "[camunda_app_clientId]/.default"

      connectors:
        clientId: [camunda_app_clientId]
        existingSecret: [camunda-oidc-secret]

identity:
  contextPath: "/identity"
  fullURL: "https://[ourcamundaurl].com/identity"

optimize:
  enabled: false

operate:
  contextPath: "/operate"


tasklist:
  contextPath: "/tasklist"

zeebe:
  clusterSize: 1
  partitionCount: 1
  replicationFactor: 1
  pvcSize: 10Gi

zeebe-gateway:
  replicas: 1
  ingress:
    grpc:
      enabled: true
      host: "zeebe.$[ourcamundaurl].com"
      tls:
        enabled: true
        secretName: "[cert]"
    rest:
      enabled: true
      annotations:
      host: "zeebe.$[ourcamundaurl].com"
      tls:
        enabled: true
        secretName: "[cert]"

connectors:
  enabled: true

elasticsearch:
  master:
    replicaCount: 1
    persistence:
      size: 15Gi

The redirect URIs in my Entra app registration match what I have above.
Access tokens and ID tokens are enabled, and admin consent has been given for the following API permissions: email, offline_access, openid and profile.

The documentation is confusing. Am I also meant to add all the env vars to each container/service, or does the Helm chart do that for me?

What is the initialClaimValue meant to be if using OIDC? The OID string of the first user?

Any help much appreciated…

Hi @anonymousbadger, welcome to the forums! When using Helm, I believe all you need are the values under the “Helm values” tab. The component-specific configurations are needed only if you are manually configuring components (for instance, using Docker and individually deploying images).

The initial claim name and value are needed to define the initial admin user who can then log into Identity and begin configuring the rest of the claim mappings (for roles, groups, etc.). Without the initial claim information, Camunda has no way to know which user should have any rights to administer the system.
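
As a rough sketch of how that maps onto the Helm values, assuming the oid claim from the config above is kept and that the value is the Entra ID object ID of the user who should become the first admin (the placeholder name below is illustrative, not taken from the docs):

```yaml
global:
  identity:
    auth:
      identity:
        # Claim in the token that identifies the initial admin user
        initialClaimName: "oid"
        # Hypothetical placeholder: object ID of the first admin user
        initialClaimValue: "[initial-admin-user-object-id]"
```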

You say it fails when it’s enabled: can you elaborate? What fails? What are the errors? What do the logs show?

Thanks for that; the claim value is as I expected. Apologies, I should’ve elaborated, but I was wondering if I was missing something obvious.

The following pods are all failing to reach ready, with various errors:
Connectors, Identity, Operate, Tasklist.

Identity: Complains that it can’t connect to query JDBC metadata, then shuts down
Operate: Can’t reach Zeebe and eventually shuts down
Tasklist: Can’t reach Zeebe and eventually restarts the pod
Connectors: Logs repeated errors about being unable to connect to Operate or process any jobs for various workers

Most pods are spitting out the following

The request's security level does not guarantee that the credentials will be confidential

Zeebe gateway

SEVERE: Exception while executing runnable io.grpc.internal.ServerImpl$ServerTransportListenerImpl$1HandleServerCall@dae4d8a
java.lang.IllegalStateException: java.lang.IllegalArgumentException: URI with undefined scheme
        at io.grpc.internal.ServerImpl$ServerTransportListenerImpl$1HandleServerCall.runInternal(ServerImpl.java:617)
        at io.grpc.internal.ServerImpl$ServerTransportListenerImpl$1HandleServerCall.runInContext(ServerImpl.java:603)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: URI with undefined scheme

Postgres, Keycloak and Elasticsearch look fine on the surface, with pods running and only a couple of warnings.

Happy to provide more logs if needed, but there’s so much in there that I’m not sure where to start, and none of it screams Identity directly. I assume Connectors is where we should be looking, as the OIDC client?

@anonymousbadger - I would start with troubleshooting Zeebe. None of the other components matter if Zeebe and Zeebe Gateway aren’t running (and this is why Operate and Tasklist and Connectors are failing). But first, two things:

  • The "The request's security level does not guarantee..." message is often expected; for instance, traffic between services inside Kubernetes is often HTTP rather than HTTPS. You can adjust the log level to ignore these messages if you want.
  • I didn’t notice it originally, but the redirectUrl values for the components are incorrect in your Helm values. The table on the docs page lists the redirect URL for each component. For instance, using your config, tasklist.redirectUrl should be https://[ourcamundaurl].com/tasklist/identity-callback
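
A minimal sketch of what the corrected values might look like, assuming the context paths /identity, /operate and /tasklist from the values above and the same placeholder host. Only the redirectUrl keys are shown, and the exact paths should be checked against the table in the docs:

```yaml
global:
  identity:
    auth:
      identity:
        # Context path /identity is prefixed to the callback path
        redirectUrl: https://[ourcamundaurl].com/identity/auth/login-callback
      operate:
        # Context path /operate is prefixed to the callback path
        redirectUrl: https://[ourcamundaurl].com/operate/identity-callback
      tasklist:
        # Context path /tasklist is prefixed to the callback path
        redirectUrl: https://[ourcamundaurl].com/tasklist/identity-callback
```

The same URLs would also need to be registered as redirect URIs in the Entra app registration.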

I would fix that redirect configuration first. For the Zeebe Gateway error, it looks like there’s an issue with the identity.baseUrl property, which seems odd. Can you add this environment variable to the zeebeGateway config and let me know if that works (or if not, what the Zeebe Gateway logs show)? (Replace the [] with your value.)

zeebeGateway:
  env:
    - name: CAMUNDA_IDENTITY_BASEURL
      value: "http://[identity-pod-name]:80/auth"

Thanks for this. It’s interesting you say that, as I did get somewhat more promising errors when setting some env vars as a test earlier.

Good spot on the tasklist, I assume that’s because we have set a context path?

Zeebe Gateway is still not happy. I assume by [identity-pod-name] you mean the service FQDN, as pods are named uniquely per deployment? Or am I missing a way to do this?

Setting it to use our identity service endpoint shown below

(http://camunda-identity.camunda.svc.cluster.local:80/auth) 

gives the following error as far back in the logs as I can see.

Aug 08, 2024 5:17:16 PM io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable io.grpc.internal.ServerImpl$ServerTransportListenerImpl$1HandleServerCall@2f79c1b5
java.lang.IllegalStateException: java.util.concurrent.RejectedExecutionException: Thread limit exceeded replacing blocked worker
        at io.grpc.internal.ServerImpl$ServerTransportListenerImpl$1HandleServerCall.runInternal(ServerImpl.java:617)
        at io.grpc.internal.ServerImpl$ServerTransportListenerImpl$1HandleServerCall.runInContext(ServerImpl.java:603)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.util.concurrent.RejectedExecutionException: Thread limit exceeded replacing blocked worker
        at java.base/java.util.concurrent.ForkJoinPool.tryCompensate(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.compensatedBlock(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.waitingGet(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
        at java.net.http/jdk.internal.net.http.HttpClientImpl.send(Unknown Source)
        at java.net.http/jdk.internal.net.http.HttpClientFacade.send(Unknown Source)
        at io.camunda.identity.sdk.impl.rest.RestClient.send(RestClient.java:118)
        at io.camunda.identity.sdk.impl.rest.RestClient.request(RestClient.java:105)
        at io.camunda.identity.sdk.impl.generic.GenericAuthentication.getPermissions(GenericAuthentication.java:139)
        at io.camunda.identity.sdk.authentication.AbstractAuthentication.verifyToken(AbstractAuthentication.java:215)
        at io.camunda.identity.sdk.authentication.AbstractAuthentication.verifyToken(AbstractAuthentication.java:164)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at io.camunda.identity.sdk.annotation.AnnotationProcessor.lambda$apply$0(AnnotationProcessor.java:33)
        at jdk.proxy2/jdk.proxy2.$Proxy108.verifyToken(Unknown Source)
        at io.camunda.zeebe.gateway.interceptors.impl.IdentityInterceptor.interceptCall(IdentityInterceptor.java:79)
        at io.grpc.ServerInterceptors$InterceptCallHandler.startCall(ServerInterceptors.java:269)
        at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.startWrappedCall(ServerImpl.java:701)
        at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.access$2200(ServerImpl.java:408)
        at io.grpc.internal.ServerImpl$ServerTransportListenerImpl$1HandleServerCall.runInternal(ServerImpl.java:613)
        ... 9 more

@anonymousbadger - the redirect URL was wrong for all the components with a context path, not just Tasklist. For the pod name, I think it should just be camunda-identity, based on what you’ve shared. (Note: that may not be the fix, it’s the first thing I thought of to try.)

Understood, I’ve added those paths to the redirectUrl of all 3 components.
I’ve also added the base URL env var.

The errors are currently identical to those from the last configuration.

Should we move to loading env vars into containers rather than using helm to attempt to do the same thing? Or is this a replicable bug that can be fixed?

Hi,

Any suggestions to progress this one? Should we abandon the helm chart?

Thanks

@anonymousbadger - can you share the latest version of your values.yaml file, with the changes discussed above?

This is what we currently have, with the proposed changes:

global:
  ingress:
    enabled: true
    host: "[ourcamundaurl].com"
    tls:
      enabled: true
      secretName: "[cert]"
  identity:
    auth:
      issuer: https://login.microsoftonline.com/[tenantID]/v2.0
      issuerBackendUrl: https://login.microsoftonline.com/[tenantID]/v2.0
      tokenUrl: https://login.microsoftonline.com/[tenantID]/oauth2/v2.0/token
      jwksUrl: https://login.microsoftonline.com/[tenantID]/discovery/v2.0/keys
      type: "MICROSOFT"
      publicIssuerUrl: https://login.microsoftonline.com/[tenantID]/v2.0

      identity:
        clientId: [camunda_app_clientId]
        existingSecret: [camunda-oidc-secret]
        audience: [camunda_app_clientId]
        initialClaimName: "oid"
        initialClaimValue: "[user-oid-guid]"
        redirectUrl: https://[ourcamundaurl].com/identity/auth/login-callback

      operate:
        clientId: [camunda_app_clientId]
        audience: [camunda-oidc-secret]
        existingSecret: [camunda-oidc-secret]
        redirectUrl: https://[ourcamundaurl].com/operate/identity-callback

      tasklist:
        clientId: [camunda_app_clientId]
        audience: [camunda-oidc-secret]
        existingSecret: [camunda-oidc-secret]
        redirectUrl: https://[ourcamundaurl].com/tasklist/identity-callback

      zeebe:
        clientId: [camunda_app_clientId]
        audience: [camunda_app_clientId]
        existingSecret: [camunda-oidc-secret]
        tokenScope: "[camunda_app_clientId]/.default"

      connectors:
        clientId: [camunda_app_clientId]
        existingSecret: [camunda-oidc-secret]

identity:
  contextPath: "/identity"
  fullURL: "https://[ourcamundaurl].com/identity"

optimize:
  enabled: false

operate:
  contextPath: "/operate"


tasklist:
  contextPath: "/tasklist"

zeebe:
  clusterSize: 1
  partitionCount: 1
  replicationFactor: 1
  pvcSize: 10Gi

zeebe-gateway:
  replicas: 1
  env:
    - name: CAMUNDA_IDENTITY_BASEURL
      value: http://camunda-identity:80/auth
  ingress:
    grpc:
      enabled: true
      host: "zeebe.$[ourcamundaurl].com"
      tls:
        enabled: true
        secretName: "[cert]"
    rest:
      enabled: true
      annotations:
      host: "zeebe.$[ourcamundaurl].com"
      tls:
        enabled: true
        secretName: "[cert]"

connectors:
  enabled: true

elasticsearch:
  master:
    replicaCount: 1
    persistence:
      size: 15Gi

@anonymousbadger - try adding this environment variable to the ZeebeGateway also and redeploying:

zeebe-gateway:
  env:
    - name: ZEEBE_GATEWAY_THREADS_GRPCMAXTHREADS
      value: 10

Adding this causes the Zeebe Gateway pod not to be created at all.
Removing it lets the pod be created again, which is strange.

@anonymousbadger - that’s quite strange … there are no errors in the deployment logs?

Because it was being deployed through CI, and the deployment was failing anyway due to other pods not starting, I couldn’t see them at first. After digging in, the cause was that the value was set to an int where a string was expected.
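
For anyone hitting the same thing: Kubernetes expects the value of a container env entry to be a string, so an unquoted number in the Helm values gets rejected. A minimal sketch of the quoted form, assuming the same zeebe-gateway key and variable name suggested earlier in this thread:

```yaml
zeebe-gateway:
  env:
    - name: ZEEBE_GATEWAY_THREADS_GRPCMAXTHREADS
      # Quoted so that YAML parses it as a string rather than an integer
      value: "10"
```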

Deploying with the value as a string gives the same issue and the same error:

Caused by: java.util.concurrent.RejectedExecutionException: Thread limit exceeded replacing blocked worker

Hi @anonymousbadger - unfortunately this has gone beyond my knowledge, so I’ve asked our engineers to take a peek; no guarantee on a response time though. In the meantime, when you’re doing a deployment, are you tearing down the previous environment (including PVCs)? If not, could you try this?

Yes, everything is being removed; the entire namespace is deleted as well.

Thanks, if they need any direct contact they can reach me on my email tied to this forum account.

Any updates on this one?

Hi @anonymousbadger - unfortunately not. I’ve asked again to see if I can get some additional assistance.

Are you, or your company, currently looking into an enterprise license for Camunda? Another option would be to engage our support team if you already have a license.

@anonymousbadger - I think I might have the answer! Our engineers suggest upgrading to Zeebe 8.5.6. See this GitHub issue: gRPC thread pool saturated by blocked Identity calls · Issue #18697 · camunda/camunda · GitHub

To upgrade to Zeebe 8.5.6, you will need to use the v10.3.2 Helm charts (docs reference).

Try that, let me know if you have the same error, new errors, or if it works!!