Expected to handle gRPC request, but received an internal error from broker: BrokerError{code=INTERNAL_ERROR, message='Failed to write client request to partition '1', because the writer is full.'}

Hello, how should I solve this situation?

ERROR [io.camunda.zeebe.gateway.grpc.GrpcErrorMapper.mapBrokerErrorToStatus(GrpcErrorMapper.java:147)] Expected to handle gRPC request, but received an internal error from broker: BrokerError{code=INTERNAL_ERROR, message='Failed to write client request to partition '1', because the writer is full.'}
io.camunda.zeebe.gateway.cmd.BrokerErrorException: Received error from broker (INTERNAL_ERROR): Failed to write client request to partition '1', because the writer is full.
at io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager.handleResponse(BrokerRequestManager.java:194) ~[zeebe-gateway-8.2.11.jar:8.2.11]
at io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequestInternal$2(BrokerRequestManager.java:143) ~[zeebe-gateway-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.future.FutureContinuationRunnable.run(FutureContinuationRunnable.java:28) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:94) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:45) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:119) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:109) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:87) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:205) [zeebe-scheduler-8.2.11.jar:8.2.11]
ERROR [io.camunda.zeebe.gateway.grpc.GrpcErrorMapper.mapBrokerErrorToStatus(GrpcErrorMapper.java:147)] Expected to handle gRPC request, but received an internal error from broker: BrokerError{code=INTERNAL_ERROR, message='Failed to write client request to partition '4', because the writer is full.'}
io.camunda.zeebe.gateway.cmd.BrokerErrorException: Received error from broker (INTERNAL_ERROR): Failed to write client request to partition '4', because the writer is full.
at io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager.handleResponse(BrokerRequestManager.java:194) ~[zeebe-gateway-8.2.11.jar:8.2.11]
at io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequestInternal$2(BrokerRequestManager.java:143) ~[zeebe-gateway-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.future.FutureContinuationRunnable.run(FutureContinuationRunnable.java:28) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:94) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:45) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:119) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:109) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:87) [zeebe-scheduler-8.2.11.jar:8.2.11]
at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:205) [zeebe-scheduler-8.2.11.jar:8.2.11]
Is the broker's concurrency capacity insufficient, or is there a problem with my configuration?
This problem occurs under high concurrency.

@wjw314520 The error message you're encountering, "Failed to write client request to partition '1', because the writer is full", indicates that the Zeebe broker's internal writer buffer for that partition is saturated. This typically happens under high load, when the broker cannot process incoming requests quickly enough, leading to backpressure and failed requests.

To address this issue, consider the following strategies:


1. Implement Client-Side Retry Mechanisms

Zeebe's gRPC API communicates backpressure through specific error codes. When the broker is overwhelmed, it may return a RESOURCE_EXHAUSTED status. In such cases, implement retry logic on the client side, using exponential backoff with jitter so the retries do not overwhelm the broker further.
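
Here is a minimal sketch of such a retry wrapper using the Zeebe Java client. The gateway address, process ID, attempt limit, and the set of retryable codes are illustrative assumptions, not values taken from your setup; note that in your logs the "writer is full" condition is surfaced as INTERNAL rather than RESOURCE_EXHAUSTED, so you may want to treat that code as transient as well.

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.ProcessInstanceEvent;
import io.grpc.Status;
import io.grpc.Status.Code;

public class CreateInstanceWithRetry {

  public static void main(String[] args) throws InterruptedException {
    // Assumption: a local, plaintext gateway; adjust to your environment.
    try (ZeebeClient client =
        ZeebeClient.newClientBuilder().gatewayAddress("localhost:26500").usePlaintext().build()) {
      ProcessInstanceEvent instance = createWithBackoff(client, "order-process", 5);
      System.out.println("Created instance " + instance.getProcessInstanceKey());
    }
  }

  // Retries transient broker errors with exponential backoff plus jitter.
  static ProcessInstanceEvent createWithBackoff(
      ZeebeClient client, String processId, int maxAttempts) throws InterruptedException {
    long backoffMillis = 100;
    for (int attempt = 1; ; attempt++) {
      try {
        return client
            .newCreateInstanceCommand()
            .bpmnProcessId(processId)
            .latestVersion()
            .send()
            .join();
      } catch (Exception e) {
        // Extract the gRPC status code from the client exception's cause chain.
        Code code = Status.fromThrowable(e).getCode();
        // RESOURCE_EXHAUSTED signals backpressure; INTERNAL is how "writer is full" shows up here.
        boolean retryable =
            code == Code.RESOURCE_EXHAUSTED || code == Code.UNAVAILABLE || code == Code.INTERNAL;
        if (!retryable || attempt >= maxAttempts) {
          throw e;
        }
        long jitter = (long) (Math.random() * backoffMillis);
        Thread.sleep(backoffMillis + jitter);
        backoffMillis = Math.min(backoffMillis * 2, 5_000); // cap the delay at 5 seconds
      }
    }
  }
}
```

The same pattern applies to any command (create instance, publish message, complete job); the important part is that retries back off instead of hammering an already saturated partition.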


2. Optimize Broker Configuration

Adjusting certain broker settings can help manage load more effectively:

  • maxCommandsInBatch: This setting determines the maximum number of commands processed in a single batch. Reducing this value can prevent large batches from overwhelming the system, especially if you're experiencing frequent rollbacks due to oversized batches (see the configuration sketch after this list).

  • Partitioning Strategy: If you’re using a fixed partitioning scheme, ensure that partitions are evenly distributed across brokers to balance the load effectively.
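
For the maxCommandsInBatch setting mentioned above, a minimal broker application.yaml sketch might look like this; the value 50 is an illustrative assumption, not a recommendation tuned to your workload, and you should check the defaults of your Zeebe version:

```yaml
zeebe:
  broker:
    processing:
      # Maximum number of commands processed in a single batch (default is 100 in recent versions).
      # Lowering it can reduce rollbacks caused by oversized batches under heavy load.
      maxCommandsInBatch: 50
```

If you configure the broker through environment variables instead, the equivalent key is typically ZEEBE_BROKER_PROCESSING_MAXCOMMANDSINBATCH.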


3. Scale Resources Appropriately

High CPU utilization or throttling can exacerbate backpressure issues. Consider the following:

  • Increase Partition Count: Adding more partitions can distribute the load more evenly, especially if your current setup has brokers leading multiple partitions (see the configuration sketch after this list).

  • Scale Out Brokers: If brokers are consistently under high load, adding more brokers (scaling out) can help distribute the processing load.

  • Allocate Sufficient CPU Resources: Ensure that brokers and the gateway have adequate CPU allocations to handle the processing demands.
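
Partition count, cluster size, and replication factor live in the broker's cluster configuration; here is a sketch for a hypothetical three-broker cluster. The numbers are illustrative only, and as far as I know these values are fixed when a Zeebe 8.2 cluster is first created, so changing them means provisioning a new cluster rather than editing a running one:

```yaml
zeebe:
  broker:
    cluster:
      nodeId: 0              # unique per broker: 0, 1, 2
      clusterSize: 3         # total number of brokers in the cluster
      partitionsCount: 6     # more partitions spread leadership and load
      replicationFactor: 3   # copies of each partition across brokers
```

CPU requests and limits are set wherever the brokers are deployed (for example in your Kubernetes manifests or Helm values), not in this file.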


4. Monitor and Address Scheduled Task Overloads

Scheduled tasks, such as timers, can sometimes lead to sudden spikes in load, overwhelming the broker’s writer buffer. Ensure that scheduled tasks are managed efficiently and do not introduce unexpected load surges.


5. Ensure Idempotent Worker Implementations

Given Zeebe's at-least-once delivery semantics, it's crucial to design workers that handle duplicate job activations gracefully. This prevents inconsistencies and ensures reliable processing, even when requests are retried.
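
A minimal sketch of an idempotent worker with the Zeebe Java client is shown below; the job type, variable name, and in-memory deduplication set are illustrative assumptions, and a real worker would track processed keys in a durable store shared across worker instances:

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.ActivatedJob;
import io.camunda.zeebe.client.api.worker.JobClient;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentWorker {

  // Illustration only: in production this would be a durable store (database, cache)
  // keyed by a business identifier and shared by all worker instances.
  private static final Set<String> processedOrders = ConcurrentHashMap.newKeySet();

  public static void main(String[] args) throws InterruptedException {
    // Assumption: a local, plaintext gateway; adjust to your environment.
    ZeebeClient client =
        ZeebeClient.newClientBuilder().gatewayAddress("localhost:26500").usePlaintext().build();

    client
        .newWorker()
        .jobType("charge-payment") // hypothetical job type
        .handler(IdempotentWorker::handle)
        .open();

    // Keep the process alive so the worker keeps polling for jobs.
    Thread.currentThread().join();
  }

  private static void handle(JobClient jobClient, ActivatedJob job) {
    // Use a stable business key from the variables so a redelivered job is recognised.
    String orderId = (String) job.getVariablesAsMap().get("orderId");

    // Perform the side effect at most once per orderId, even if the job is delivered again.
    if (processedOrders.add(orderId)) {
      chargePayment(orderId);
    }

    jobClient.newCompleteCommand(job.getKey()).send().join();
  }

  private static void chargePayment(String orderId) {
    // Placeholder for the real side effect, e.g. calling a payment service.
    System.out.println("Charging payment for order " + orderId);
  }
}
```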


By implementing these strategies, you can mitigate the “writer is full” errors and enhance the resilience and performance of your Zeebe deployment.
