Why do I want to use exporters?
Camunda 8 is based on a loosely coupled architecture, and I have on-premises projects where I'm interested in fine-grained events from the engine (incidents, human-task jobs, etc.) and where I do not want to poll any API but rather get notified immediately. To summarize: workers and connectors are great, but there are times when I want more.
Why Redis?
Redis (https://redis.io/) offers us a lot of options. I can use it as a fast in-memory technology and a simple transport layer. I can optionally add persistence, but it is not required. Redis streams are at hand, which are great when I want multiple consumers and scaling on the event-receiver side. In Java, the Lettuce client (https://lettuce.io/) offers great connectivity features out of the box, so the exporter itself is easy to maintain. …
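For illustration, a minimal Lettuce sketch of the consumer-group idea - the stream and group names are assumptions for this example, not the exporter's actual key layout:

```java
import io.lettuce.core.Consumer;
import io.lettuce.core.RedisClient;
import io.lettuce.core.StreamMessage;
import io.lettuce.core.XGroupCreateArgs;
import io.lettuce.core.XReadArgs;
import io.lettuce.core.api.sync.RedisCommands;

import java.util.List;

public class StreamConsumerSketch {

    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        RedisCommands<String, String> redis = client.connect().sync();

        // Hypothetical stream and group names for illustration only
        String stream = "zeebe:INCIDENT";
        String group = "incident-handler";

        // Create the consumer group once; MKSTREAM creates the stream if it does not exist yet
        try {
            redis.xgroupCreate(XReadArgs.StreamOffset.from(stream, "0"), group,
                    XGroupCreateArgs.Builder.mkstream());
        } catch (Exception e) {
            // group already exists - fine
        }

        // Each scaled consumer instance uses its own consumer name within the same group,
        // so the stream entries are load-balanced between them.
        List<StreamMessage<String, String>> messages = redis.xreadgroup(
                Consumer.from(group, "consumer-1"),
                XReadArgs.StreamOffset.lastConsumed(stream));

        messages.forEach(m -> System.out.println(m.getId() + " " + m.getBody()));

        client.shutdown();
    }
}
```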
If neither the Hazelcast nor the Kafka exporter fits your exact needs, this could be the one you’re looking for.
Please be aware: the project is still in its incubation phase and might change in some parts. If you like the project, give it a star on GitHub. If you have any suggestions or improvements, GitHub issues and pull requests are at hand. If you have something much better for connecting Zeebe to Redis, go for it and let us know. We're happy to get you involved and to hear from you.
@VonDerBeck Thanks. This is cool! I am dreaming of a “minimal” Zeebe setup for mostly headless automation setups, and this would be perfect.
I tried this out with a Python consumer populating a SQLite database, and it was fun.
What kind of Redis setup are you using in production? Have you compared AOF to RDB for persistence? Have you tried letting the consumer do an XTRIM once it has consumed the data? And any thoughts about TCP ports vs. Unix sockets with the Redis server?
Some of your questions are quick to answer (at least for me):
Production setup: it will be some time before we are in production; we are dealing with something bigger. Persistence will likely be a topic. As for AOF vs. RDB: not evaluated yet. So if you want to share some experience, you're welcome.
TCP vs. Unix sockets: not a topic for us. We're running in a distributed cloud environment, so TCP will be the only option.
The question currently on my mind is the most efficient way to keep the streams as small as possible and to delete processed data in a timely manner - always keeping in mind that we have scaled consumers with consumer groups and only want to really delete data once it has been consumed by all participants.
The current quick win is an option within the Java connector that can optionally call XDEL after a specific message has been acknowledged. This is not yet what I really want - it does not consider multiple consumer groups, and in terms of performance I do not yet know whether this is really the way to go.
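For clarity, a minimal sketch of what such an acknowledge-then-delete step could look like with Lettuce - the `deleteAfterAck` flag is a hypothetical stand-in for the connector's option, not its actual code:

```java
import io.lettuce.core.StreamMessage;
import io.lettuce.core.api.sync.RedisCommands;

public class AckAndDelete {

    // Hypothetical stand-in for the connector's "delete after acknowledge" option
    private final boolean deleteAfterAck = true;

    public void acknowledge(RedisCommands<String, String> redis, String stream, String group,
                            StreamMessage<String, String> message) {
        // Acknowledge the message for this consumer group only
        redis.xack(stream, group, message.getId());

        if (deleteAfterAck) {
            // Remove the entry from the stream right away.
            // Caveat from the post above: this ignores other consumer groups
            // that may not have read the entry yet.
            redis.xdel(stream, message.getId());
        }
    }
}
```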
The exporter itself is currently able to XTRIM entries based on time (thus deleting entries that are too old). As stated in the project documentation, we're not done here yet.
But there might be a way to achieve the real thing…
We can use XINFO GROUPS to get all consumer groups, including their “last-delivered-id”; the minimum of these tells us up to where we can delete. Hence we need to run XTRIM with MINID on a regular basis, using the XINFO GROUPS result as input. I still have to try that out.
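A rough sketch of how that cleanup could look with Lettuce - the stream name is an assumption, and the parsing of the XINFO GROUPS reply is simplified:

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.XTrimArgs;
import io.lettuce.core.api.sync.RedisCommands;

import java.util.List;

public class TrimByLastDeliveredId {

    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        RedisCommands<String, String> redis = client.connect().sync();

        String stream = "zeebe:INCIDENT"; // hypothetical stream name

        // XINFO GROUPS returns one flat key/value list per consumer group
        List<Object> groups = redis.xinfoGroups(stream);

        String minLastDeliveredId = null;
        for (Object groupInfo : groups) {
            List<Object> fields = (List<Object>) groupInfo;
            for (int i = 0; i < fields.size() - 1; i += 2) {
                if ("last-delivered-id".equals(fields.get(i))) {
                    String id = (String) fields.get(i + 1);
                    if (minLastDeliveredId == null || compareIds(id, minLastDeliveredId) < 0) {
                        minLastDeliveredId = id;
                    }
                }
            }
        }

        if (minLastDeliveredId != null) {
            // Remove all entries with an id lower than the slowest group's last-delivered-id
            redis.xtrim(stream, XTrimArgs.Builder.minId(minLastDeliveredId));
        }

        client.shutdown();
    }

    // Stream ids look like "<millis>-<seq>"; compare both parts numerically
    private static int compareIds(String a, String b) {
        String[] pa = a.split("-");
        String[] pb = b.split("-");
        int cmp = Long.compare(Long.parseLong(pa[0]), Long.parseLong(pb[0]));
        return cmp != 0 ? cmp : Long.compare(Long.parseLong(pa[1]), Long.parseLong(pb[1]));
    }
}
```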
What do you think? Am I missing something? The topic of “delete after acknowledge” is widely discussed in all sorts of Redis-related threads - in very different flavours.
Personally, I'd like to try AOF, because I have 20 years of good experience with the AOF object database of a particular brand, and therefore it sounds like an approach I could trust. That said, I have no experience with it in Redis, and I wonder how it manages to rewrite the files under high load. (According to the docs, probably fine after version 7.0.0, which splits it into multiple files.)
That sounds like something you could eventually support directly in the plugin? It sounds superior to the current time-based XTRIM. But how would you initialize the groups to wait for? Should that be in the plugin configuration or preconfigured on the Redis instance before starting Zeebe?
@datakurre:
That sounds like something you could eventually support directly in the plugin? It sounds superior to the current time-based XTRIM. But how would you initialize the groups to wait for? Should that be in the plugin configuration or preconfigured on the Redis instance before starting Zeebe?
XINFO GROUPS returns a list of all consumer groups that have consumed data, together with their “last-delivered-id”. So it does not seem to be a good idea to delete data when no consumer group at all has consumed anything yet. In this case I could optionally fall back to time-based cleanup configured with a bigger time-to-live value. That's easy. As long as this is an optional feature and everyone knows what to expect, it could be a way to go.
An edge case is an unknown consumer group which has never connected to Redis, but where I want to start up this consumer at some time in the future and receive historic entries. Or a consumer that was once connected but has now been taken down forever.
In order to manage that, I would need a complete list of potential consumer groups to consider during cleanup. The downside of such a consumer-list parameter is that I would need to configure this before starting Zeebe, and the configuration would need to change every time I want to add or remove a consumer group. Not sure if I like this. Rather not…!
A good compromise could be to combine the simple “delete-after-acknowledge” algorithm described above with some time-to-live values, e.g. minTimeToLive and maxTimeToLive. This should leave enough room for all sorts of scenarios.
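A rough sketch of how such a combined cleanup could decide the trim position - minTimeToLive and maxTimeToLive are the hypothetical parameters from the idea above, not the exporter's actual implementation:

```java
import io.lettuce.core.XTrimArgs;
import io.lettuce.core.api.sync.RedisCommands;

import java.time.Duration;

public class CombinedCleanup {

    // Hypothetical parameters from the idea above
    private final Duration minTimeToLive = Duration.ofMinutes(1);
    private final Duration maxTimeToLive = Duration.ofDays(1);

    /**
     * Trims the given stream: entries older than maxTimeToLive are always removed,
     * entries younger than minTimeToLive are always kept, and in between we delete
     * only what every consumer group has already seen (minLastDeliveredId).
     */
    public void cleanup(RedisCommands<String, String> redis, String stream, String minLastDeliveredId) {
        long now = System.currentTimeMillis();
        long maxTtlBound = now - maxTimeToLive.toMillis(); // delete below this in any case
        long minTtlBound = now - minTimeToLive.toMillis(); // never delete above this

        // Stream ids start with the entry's timestamp in milliseconds
        long ackBound = minLastDeliveredId != null
                ? Long.parseLong(minLastDeliveredId.split("-")[0])
                : maxTtlBound;

        long trimTimestamp = Math.min(Math.max(ackBound, maxTtlBound), minTtlBound);
        redis.xtrim(stream, XTrimArgs.Builder.minId(trimTimestamp + "-0"));
    }
}
```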
With Zeebe, it is a tempting idea to store the complete event history, to be able to deploy new applications with their own “replayed” databases later. But most probably Redis is not the right tool for that. (I know that Kafka is, and personally I’d like to try out RabbitMQ Streams for that in the future.)
Also, I don't have experience with Redis AOF with a lot of data yet, but if it really has to replay the whole AOF when it starts, maxTimeToLive really sounds mandatory to keep the AOF size reasonable. (Of course, AOF also needs BGREWRITEAOF to actually rewrite a new version of the log without expired data.)
Thanks. Exactly, Redis is not the right tool for that. It's intended as a high-performance cache distributing events to other parties. Of course this needs some flexibility - but it's not intended to replace e.g. Kafka. For other scenarios, other exporters are available. Personally, I'd like to keep the footprint as small as possible.