Hi,
I am using Zeebe 0.20.1 and I am now running into an out-of-memory issue. It is also taking more disk space; the previous version, Zeebe 0.18.0, used less space than this one.
It is also using more RAM, up to 6.5 GB.
How can I reduce the data? Are there any changes I have to make in the config?
Log:
10/25/2019 7:08:18 PM ... 27 more
10/25/2019 7:08:18 PMCaused by: java.lang.OutOfMemoryError: Java heap space
10/25/2019 7:18:12 PM13:48:12.001 [] [raft-server-raft-atomix-partition-1] WARN io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Caught OutOfDiskSpace error! Force compacting logs...
10/25/2019 7:27:35 PM13:57:35.716 [] [raft-server-raft-atomix-partition-1] WARN io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Caught OutOfDiskSpace error! Force compacting logs...
10/25/2019 7:36:38 PM14:06:38.906 [] [raft-server-raft-atomix-partition-1] WARN io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Caught OutOfDiskSpace error! Force compacting logs...
10/25/2019 7:45:22 PM14:15:22.968 [] [raft-server-raft-atomix-partition-1] WARN io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Caught OutOfDiskSpace error! Force compacting logs...
10/25/2019 7:54:20 PM14:23:19.473 [] [raft-server-raft-atomix-partition-1] WARN io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Caught OutOfDiskSpace error! Force compacting logs...
10/25/2019 8:06:37 PM14:35:57.215 [] [raft-server-raft-atomix-partition-1] ERROR io.atomix.utils.concurrent.SingleThreadContext - An uncaught exception occurred
10/25/2019 8:06:37 PMjava.lang.IllegalArgumentException: Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer" for class: io.atomix.protocols.raft.storage.system.Configuration
10/25/2019 8:06:37 PM at com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:65) ~[kryo-4.0.2.jar:?]
10/25/2019 8:06:37 PM at com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:43) ~[kryo-4.0.2.jar:?]
10/25/2019 8:06:37 PM at com.esotericsoftware.kryo.Kryo.newDefaultSerializer(Kryo.java:396) ~[kryo-4.0.2.jar:?]
10/25/2019 8:06:37 PM at com.esotericsoftware.kryo.Kryo.getDefaultSerializer(Kryo.java:380) ~[kryo-4.0.2.jar:?]
10/25/2019 8:06:37 PM at com.esotericsoftware.kryo.Kryo.register(Kryo.java:423) ~[kryo-4.0.2.jar:?]
10/25/2019 8:06:37 PM at io.atomix.utils.serializer.Namespace.register(Namespace.java:550) ~[atomix-utils-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.utils.serializer.Namespace.create(Namespace.java:509) ~[atomix-utils-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) ~[kryo-4.0.2.jar:?]
10/25/2019 8:06:37 PM at io.atomix.utils.serializer.Namespace.borrow(Namespace.java:568) ~[atomix-utils-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.utils.serializer.Namespace.serialize(Namespace.java:362) ~[atomix-utils-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.storage.journal.FileChannelJournalSegmentWriter.append(FileChannelJournalSegmentWriter.java:235) ~[atomix-storage-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.storage.journal.MappableJournalSegmentWriter.append(MappableJournalSegmentWriter.java:130) ~[atomix-storage-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.storage.journal.SegmentedJournalWriter.append(SegmentedJournalWriter.java:76) ~[atomix-storage-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.storage.journal.DelegatingJournalWriter.append(DelegatingJournalWriter.java:47) ~[atomix-storage-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.protocols.raft.roles.LeaderRole.appendAndCompact(LeaderRole.java:1073) ~[atomix-raft-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.protocols.raft.roles.LeaderRole.appendAndCompact(LeaderRole.java:1057) ~[atomix-raft-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.protocols.raft.roles.LeaderRole.commitCommand(LeaderRole.java:704) ~[atomix-raft-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.protocols.raft.roles.LeaderRole.onCommand(LeaderRole.java:659) ~[atomix-raft-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.protocols.raft.impl.RaftContext.lambda$null$30(RaftContext.java:712) ~[atomix-raft-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.protocols.raft.impl.RaftContext.lambda$runOnContext$35(RaftContext.java:731) ~[atomix-raft-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at io.atomix.utils.concurrent.SingleThreadContext$1.lambda$execute$0(SingleThreadContext.java:53) ~[atomix-utils-3.2.0-alpha5.jar:?]
10/25/2019 8:06:37 PM at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:?]
10/25/2019 8:06:37 PM at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:?]
10/25/2019 8:06:37 PM at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:?]
10/25/2019 8:06:37 PM at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:?]
10/25/2019 8:06:37 PM at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
10/25/2019 8:06:37 PM at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
10/25/2019 8:06:37 PM at java.lang.Thread.run(Thread.java:748) [?:?]
10/25/2019 8:06:37 PMCaused by: java.lang.reflect.InvocationTargetException
10/25/2019 8:06:37 PM at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
10/25/2019 8:06:37 PM at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:?]
10/25/2019 8:06:37 PM at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
10/25/2019 8:06:37 PM at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:?]
10/25/2019 8:06:37 PM at com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:51) ~[kryo-4.0.2.jar:?]
10/25/2019 8:06:37 PM ... 27 more
10/25/2019 8:06:37 PMCaused by: java.lang.OutOfMemoryError: Java heap space
10/25/2019 8:17:02 PM14:47:02.815 [] [raft-server-raft-atomix-partition-1] WARN io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Caught OutOfDiskSpace error! Force compacting logs...
10/25/2019 8:26:00 PM14:56:00.655 [] [raft-server-raft-atomix-partition-1] WARN io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Caught OutOfDiskSpace error! Force compacting logs...
10/25/2019 8:34:44 PM15:04:44.349 [] [raft-server-raft-atomix-partition-1] WARN io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Caught OutOfDiskSpace error! Force compacting logs...
10/25/2019 8:49:28 PM15:19:07.717 [] [raft-server-raft-atomix-partition-1] WARN io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Caught OutOfDiskSpace error! Force compacting logs...
1. Do you run out of memory when starting?
No. At startup it takes around 650 MB of RAM, but after a couple of days it takes more (around 2 GB after 5 days, and even more after 2 weeks).
Or after running for some time?
After running for 2 weeks.
What load are you putting on it?
I deployed 20 to 25 workflows, and 15 job types are being listened to by one Java client with Spring Boot.
Are you using any exporters?
The default Elasticsearch exporter.
Did you start a completely new instance, or run the 0.20.1 broker on the 0.18 event log?
Actually I was using 0.21.1; after 1 week it hit a long-polling exception (I created an issue on GitHub), so I downgraded to 0.20.1.
Are you really using 0.20.1, or do you mean 0.21.1?
Yes, 0.20.1.
I also have another instance of 0.20.1 running separately. It is a fresh instance with 1 Java client and 15 job workers, installed 10 days ago, and it is now taking 3.5 GB of RAM and 4 GB of disk space. It has 14 deployed workflows.
@jwulf Yes, you are right: 40 instances max per day. The instances also complete within 10 minutes; the only thing is that there are many workflows. Is that a problem?
I assume this issue of old segments not being deleted happened after downgrading from 0.21.1 to 0.20.1.
40 instances / day with 10 minute end-to-end execution time per instance is not a lot. @Zelldon, do we have any way to profile the memory usage of a broker?
It depends on the setup. On Kubernetes you could use Prometheus and Grafana to see the memory usage over time on a dashboard, or just connect to the pod and check the heap. It is similar if you run it on Docker: connect to the container and check the current usage.
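For a quick manual check (a rough sketch, not an official tool): the broker exposes a monitoring endpoint with Prometheus metrics, by default on port 9600 as far as I know. Something like the snippet below scrapes that endpoint and prints the memory-related lines; the host, port, path and metric-name filter are assumptions, so adjust them to your setup. Standard JDK tools such as `jcmd` or `jmap` inside the container work as well.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Minimal sketch: scrape the broker's Prometheus metrics endpoint and print
 * memory-related lines. Host, port (9600) and the /metrics path are assumed
 * defaults for a local broker -- adjust them to your setup.
 */
public class BrokerMemoryCheck {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:9600/metrics")) // assumption: default monitoring port
        .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

    // Print only metric lines that look memory-related; the exact metric names
    // depend on the broker version, so this simply filters for "memory".
    response.body().lines()
        .filter(line -> line.contains("memory"))
        .forEach(System.out::println);
  }
}
```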
1. What will happen if some instances are still not completed after 2 weeks and are in a waiting state? Will the data and RAM usage grow?
The state has to be stored somewhere, but it will not grow if nothing else happens.
2. Is there any method to get the list of running instances, or any queries for that?
You could use the exporter API for that. You could also check this with Operate, for example.
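A rough sketch of going through the exporter data directly: the default Elasticsearch exporter writes records into indices (prefix `zeebe-record` by default, if I remember correctly), which you can query yourself. The index pattern, field names and intents below are assumptions, so check your own index mappings first; also note that an ELEMENT_ACTIVATED record alone only tells you an instance was started, so you would additionally have to verify that no ELEMENT_COMPLETED/ELEMENT_TERMINATED record exists for the same instance key.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Rough sketch: query the Elasticsearch indices written by the default exporter
 * for workflow-instance records. The index pattern ("zeebe-record*"), field names
 * and intents are assumptions based on the default exporter setup -- verify them
 * against your own indices (GET _cat/indices) before relying on this.
 */
public class RunningInstancesQuery {
  public static void main(String[] args) throws Exception {
    // Find records where a process-scoped element was activated. To decide whether an
    // instance is still running, also check that no completion/termination record
    // exists for the same workflow instance key.
    String query = "{\n"
        + "  \"size\": 100,\n"
        + "  \"query\": {\n"
        + "    \"bool\": {\n"
        + "      \"must\": [\n"
        + "        { \"term\": { \"intent\": \"ELEMENT_ACTIVATED\" } },\n"
        + "        { \"term\": { \"value.bpmnElementType\": \"PROCESS\" } }\n"
        + "      ]\n"
        + "    }\n"
        + "  }\n"
        + "}";

    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:9200/zeebe-record*/_search")) // assumption: local ES, default index prefix
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(query))
        .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```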
If the data and RAM usage is high, is there any tool to monitor and fix this issue? This will be a pain point for the support team.
Questions on that: how many partitions do you use? Is clustering involved, or just a single broker?
If you see any problems with usage, you could report them; it might be a bug.
Note: if you use 15 workers with version 0.20.1, disk usage will increase because the workers periodically send new commands to activate jobs, even if there are none. With version 0.21.0 we introduced long polling, which reduces this problem.
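Until you can move back to a version with long polling, you can at least make each worker poll less aggressively. A minimal sketch with the 0.20.x Java client using a longer poll interval (the job type and contact point are placeholders):

```java
import io.zeebe.client.ZeebeClient;
import io.zeebe.client.api.worker.JobWorker;
import java.time.Duration;

/**
 * Sketch for 0.20.x: every open worker polls the broker for jobs on a fixed
 * interval, and every poll is appended to the log. A longer poll interval
 * (and closing workers you don't need) slows the log growth.
 */
public class SlowPollingWorker {
  public static void main(String[] args) throws InterruptedException {
    ZeebeClient client = ZeebeClient.newClientBuilder()
        .brokerContactPoint("localhost:26500") // assumption: default gateway address
        .build();

    JobWorker worker = client.newWorker()
        .jobType("payment-service") // placeholder job type
        .handler((jobClient, job) -> jobClient.newCompleteCommand(job.getKey()).send().join())
        .pollInterval(Duration.ofSeconds(30)) // poll less often -> fewer ActivateJobs commands in the log
        .open();

    // Keep polling until the process is stopped, then clean up.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      worker.close();
      client.close();
    }));
    Thread.currentThread().join();
  }
}
```

Fewer open workers and a longer poll interval both directly reduce the number of activate-job commands that end up in the log.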
4. I am using Zeebe in production now. Is there a tool to monitor it, given that Operate is only for non-commercial use?
If you want to use it in production, it is not free.
It's a dumb question… can I delete the data manually?
No, the broker should handle this by itself. It will eventually compact the logs.
Right now I am using the Java client in Spring Boot (not spring-zeebe). Is there any performance issue if I use this?
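For reference, a minimal way to wire the plain Java client into Spring Boot without spring-zeebe is a single shared ZeebeClient bean (the contact point below is a placeholder):

```java
import io.zeebe.client.ZeebeClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Minimal sketch: build one ZeebeClient and reuse it everywhere instead of
 * creating a new client per request. The contact point is a placeholder;
 * read it from application properties in a real setup.
 */
@Configuration
public class ZeebeClientConfig {

  @Bean(destroyMethod = "close") // Spring closes the client (and its workers) on shutdown
  public ZeebeClient zeebeClient() {
    return ZeebeClient.newClientBuilder()
        .brokerContactPoint("localhost:26500") // assumption: default gateway address
        .build();
  }
}
```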
3 days ago I created a fresh Zeebe 0.20.1 instance with no data and changed the snapshot period to 12h.
I deployed 4 workflows into it and, after half an hour, created 12 instances. 7 of them completed within 5 minutes; the other instances are in a waiting state (1. they loop on a timer event every 4h; 2. they still have not completed).
Then I didn't deploy any workflows or create instances for 2 days, so it was idle except for the 5 instances above. Today is the 3rd day. I read the logs and there is no trace of segments being deleted; the data keeps growing (screenshot and logs attached below).
But I have another instance that I keep working with, and it is able to delete the older segments.
10/25/2019 1:34:48 AM20:04:48.940 [stream-processor-snapshot-director] [10.42.49.195:26501-zb-fs-workers-0] INFO io.zeebe.logstreams - Deleted 5 segments from log storage (64 to 69).
10/25/2019 2:48:57 AM21:18:57.712 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 63505387492360
10/25/2019 6:48:58 AM01:18:58.388 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 64763813404640
10/25/2019 10:12:39 AM04:42:39.064 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 65833259466440
10/25/2019 10:59:09 AM05:29:09.256 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 66073779035080
10/25/2019 1:13:59 PM07:43:59.603 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 66782448189280
10/25/2019 1:34:48 PM08:04:48.783 [stream-processor-snapshot-director] [10.42.49.195:26501-zb-fs-workers-0] INFO io.zeebe.logstreams - Deleted 4 segments from log storage (69 to 73).
10/25/2019 3:30:09 PM10:00:09.980 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 67491116351072
10/25/2019 5:44:30 PM12:14:30.358 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 68195491495616
10/25/2019 7:59:30 PM14:29:30.714 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 68899867440984
10/25/2019 10:14:30 PM16:44:30.994 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 69608537072232
10/26/2019 1:34:49 AM20:04:49.448 [stream-processor-snapshot-director] [10.42.49.195:26501-zb-fs-workers-0] INFO io.zeebe.logstreams - Deleted 5 segments from log storage (73 to 78).
10/26/2019 2:49:21 AM21:19:21.592 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 71051644744528
10/26/2019 6:49:22 AM01:19:22.252 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 72310070520816
10/26/2019 10:49:32 AM05:19:32.885 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 73568496351328
10/26/2019 1:34:48 PM08:04:48.296 [stream-processor-snapshot-director] [10.42.49.195:26501-zb-fs-workers-0] INFO io.zeebe.logstreams - Deleted 4 segments from log storage (78 to 82).
10/26/2019 2:49:33 PM09:19:33.566 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 74826922097552
10/26/2019 6:31:04 PM13:01:04.160 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 75990856915304
10/26/2019 8:45:34 PM15:15:34.678 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 76695231957944
10/26/2019 10:59:25 PM17:29:25.023 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 77395312532808
10/27/2019 1:14:25 AM19:44:25.368 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 78103982078736
10/27/2019 1:34:48 AM20:04:48.473 [stream-processor-snapshot-director] [10.42.49.195:26501-zb-fs-workers-0] INFO io.zeebe.logstreams - Deleted 4 segments from log storage (82 to 86).
10/27/2019 3:29:05 AM21:59:05.716 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 78808357640608
10/27/2019 5:44:16 AM00:14:16.084 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 79521319998152
10/27/2019 7:58:56 AM02:28:56.454 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 80225695625536
10/27/2019 10:51:26 AM05:21:26.872 [] [raft-server-raft-atomix-partition-1-state] INFO io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Backup log raft-atomix-partition-1 at position 81131932825344
10/27/2019 1:34:48 PM08:04:48.939 [stream-processor-snapshot-director] [10.42.49.195:26501-zb-fs-workers-0] INFO io.zeebe.logstreams - Deleted 5 segments from log storage (86 to 91).
Working instance storage:
Is there any issue if the broker/client is in idle mode?
If you want a recipe to explode your disk space usage, here are a few ways to do it:
Create a high number of snapshots with a long period between them.