Job status on zeebe and operate not equal

I’ve met the problem that when I fail job via go client

client.NewFailJobCommand().JobKey(job.GetKey()).Retries(job.Retries - 1).ErrorMessage(errorInfo).Send(ctx)

this code will successfully fail the job on zeebe side, but in the Operate this instance always shows as “Active”


I tried manually fail this job via CLI client

$ zbctl fail job 2251799827014829 --errorMessage 'manual fail job' --retries=-1

Error: rpc error: code = FailedPrecondition desc = Command 'FAIL' rejected with code 'INVALID_STATE': Expected to fail job with key '2251799827014829', but it is in state 'FAILED'

zeebe version: 8.0.0
Operate version: 8.0.0
Operate config:

camunda.operate:
  # ELS instance to store Operate data
  elasticsearch:
    numberOfShards: 3
    numberOfReplicas: 1
    # Cluster name
    clusterName: ce-elasticsearch
    # URL
    url: http://es.example.com:9200

So how do I diagnose this problem? thank you.

Hi @Lethe_Yi ,

Are you able to upload Operate log messages, maybe filtered by the flow node instance id, or process instance id? I can imagine that the import/processing of the Zeebe events into Operate has failed.

Thanks and best regards,

Ralf

The flow node instance Id in the Operate screenshot does not match the job Id in the fail command.

You know that when you fail a job, it will be immediately available for reactivation on the broker if it has >0 retries? This means that any worker serving the task type of that job will reactivate it straight away with a new job id.

Could this be what you are seeing?

Josh

I have several process instances that always keep “Active” on Operate until I manually click “cancel” button to force cancel them. And workers are healthy during this time.

And how to get the job Id from Operate? I just get the job Id from my worker’s log.
BTW I set all service task node’s retries=0

I cannot find operate logs that contains instance/node id.
In the logs excludes heart beat records such as

2022-07-18 18:18:52.303 [Broker-0-HealthCheckService] [Broker-0-zb-actors-1] INFO
      io.camunda.zeebe.broker.system - Partition-1 recovered, marking it as healthy

I see some WARN logs like

2022-07-18 18:23:37.720 [] [grpc-default-worker-ELG-5-2] WARN
      io.grpc.netty.NettyServerHandler - Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Stream closed before write could take place
        at io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:172) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:481) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:357) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:1007) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:963) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:515) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:521) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:613) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:444) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:509) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:259) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) ~[netty-codec-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) ~[netty-codec-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) ~[netty-codec-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
        at java.lang.Thread.run(Unknown Source) ~[?:?]
2022-07-18 18:23:37.721 [] [grpc-default-worker-ELG-5-2] WARN
2022-07-18 03:32:47.010 [] [grpc-default-worker-ELG-5-3] INFO
      io.grpc.netty.NettyServerTransport.connections - Transport failed
io.netty.handler.codec.http2.Http2Exception: HTTP/2 client preface string missing or corrupt. Hex dump for received bytes: 16030100f3010000ef03039dd5e0b87f25619d3e29dd7d50
        at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:108) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.readClientPrefaceString(Http2ConnectionHandler.java:306) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:239) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[netty-codec-http2-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) ~[netty-codec-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) ~[netty-codec-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) ~[netty-codec-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
        at java.lang.Thread.run(Unknown Source) ~[?:?]

And I’ve found that, all pending “Active” process instances shows on Operate is actually failed by “FailJobCommand” by my workers.
I can directly access the incident data from ES shows that

"incidents": [
      {
        "errorMessage": "Failed to send activated jobs to client",
        "errorType": "No more retries left",
        "flowNodeId": "90iel4msulo5jjb2nvg",
        "uid": "90iel4msulo5jjb2nvg",
        "creationTime": "2022-07-04T07:56:37.538+0000"
      }
    ]