Operate migration doesn't seem to complete or takes too long to complete

Milan Lesnek: Hi :wave: we are trying to upgrade our self-hosted Camunda Cloud from 1.1.1 to 1.2.6,
but Operate is not working, and in the logs I can see that some migration is running:

2022-02-22 10:59:26.232  INFO 7 --- [migration_2] i.c.o.u.RetryOperation: GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{739044} - Waiting 2SECONDS. 400/2147483647

If 400/2147483647 is the migration progress, we would have to wait a couple of weeks. Is that normal?

We tried the new version with an empty Elasticsearch and it started like a charm, immediately. We need to migrate to 1.2.6 because the company is pushing Elastic Cloud with a minimum version of 7.16.x.

svetlana.dorokhova: This log message may be misleading: 400/2147483647 should not be read as migration progress. It is just the retry counter of the logic that waits and checks the migration status; the migration can finish much earlier.
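
To illustrate why such a counter is not a progress bar, here is a minimal sketch of a poll-with-fixed-delay loop. It is not Operate's actual code: `poll_until_done`, `check_done`, and `max_attempts` are hypothetical names. The only detail taken from the log is the upper bound, 2147483647, which equals the largest 32-bit signed integer and so looks like an effectively unlimited retry budget rather than an amount of work.

```python
import time

MAX_INT32 = 2**31 - 1  # 2147483647, the upper bound printed in the log


def poll_until_done(check_done, delay_seconds=2.0, max_attempts=MAX_INT32,
                    sleep=time.sleep):
    """Call check_done() until it returns True; return the attempts used.

    Hypothetical helper: the attempt counter only counts status checks,
    it says nothing about how much work remains.
    """
    for attempt in range(1, max_attempts + 1):
        if check_done():
            return attempt
        print(f"Waiting {delay_seconds} seconds. {attempt}/{max_attempts}")
        sleep(delay_seconds)
    raise TimeoutError(f"not done after {max_attempts} attempts")
```

The loop ends as soon as the check succeeds, however small the counter is at that point.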

svetlana.dorokhova: How much data do you have in your indices? If you provide a bigger piece of your log, we can find out which specific index is currently being migrated, and then check the migration progress by comparing the size of the “old” and “new” index with the help of the cat indices API.
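
A minimal sketch of that comparison, assuming Elasticsearch is reachable without authentication at a base URL you supply. The helper names `fetch_cat_indices` and `doc_counts` are made up for this example; the `format=json` and `s=index` parameters and the `index` and `docs.count` fields come from the cat indices API.

```python
import json
import urllib.request


def fetch_cat_indices(base_url):
    """Return the rows of GET /_cat/indices as a list of dicts, sorted by name."""
    url = f"{base_url}/_cat/indices?format=json&s=index"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def doc_counts(rows, prefix):
    """Map index name -> document count for indices starting with prefix."""
    counts = {}
    for row in rows:
        if row["index"].startswith(prefix):
            # docs.count arrives as a string; it can be null (e.g. closed index)
            counts[row["index"]] = int(row["docs.count"] or 0)
    return counts
```

For example, `doc_counts(fetch_cat_indices("http://localhost:9200"), "operate-list-view-")` would list the old and new versions of that index side by side. Running it a few minutes apart shows whether the new index is growing.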

Milan Lesnek: The Operate log starts with this, and after 24h I am at 8960/2147483647

svetlana.dorokhova: Are all the “waiting” logs that you mention coming from thread [migration_2]?

Milan Lesnek: No, there are a lot of them from migration_2, migration_3 and migration_4:

2022-02-24 09:31:02.630  INFO 7 --- [    migration_2] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557373} - Waiting 2 SECONDS. 9100/2147483647
2022-02-24 09:31:02.630  INFO 7 --- [    migration_3] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557314} - Waiting 2 SECONDS. 9100/2147483647
2022-02-24 09:31:42.696  INFO 7 --- [    migration_3] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557314} - Waiting 2 SECONDS. 9120/2147483647
2022-02-24 09:31:42.696  INFO 7 --- [    migration_2] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557373} - Waiting 2 SECONDS. 9120/2147483647
2022-02-24 09:31:42.697  INFO 7 --- [    migration_4] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557383} - Waiting 2 SECONDS. 9120/2147483647
2022-02-24 09:32:22.768  INFO 7 --- [    migration_2] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557373} - Waiting 2 SECONDS. 9140/2147483647
2022-02-24 09:32:22.768  INFO 7 --- [    migration_4] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557383} - Waiting 2 SECONDS. 9140/2147483647
2022-02-24 09:32:22.769  INFO 7 --- [    migration_3] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557314} - Waiting 2 SECONDS. 9140/2147483647
2022-02-24 09:33:02.836  INFO 7 --- [    migration_4] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557383} - Waiting 2 SECONDS. 9160/2147483647
2022-02-24 09:33:02.837  INFO 7 --- [    migration_3] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557314} - Waiting 2 SECONDS. 9160/2147483647
2022-02-24 09:33:02.837  INFO 7 --- [    migration_2] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557373} - Waiting 2 SECONDS. 9160/2147483647
2022-02-24 09:33:42.906  INFO 7 --- [    migration_3] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557314} - Waiting 2 SECONDS. 9180/2147483647
2022-02-24 09:33:42.906  INFO 7 --- [    migration_4] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557383} - Waiting 2 SECONDS. 9180/2147483647
2022-02-24 09:33:42.906  INFO 7 --- [    migration_2] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557373} - Waiting 2 SECONDS. 9180/2147483647
2022-02-24 09:34:22.974  INFO 7 --- [    migration_2] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557373} - Waiting 2 SECONDS. 9200/2147483647
2022-02-24 09:34:22.975  INFO 7 --- [    migration_4] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557383} - Waiting 2 SECONDS. 9200/2147483647
2022-02-24 09:34:22.975  INFO 7 --- [    migration_3] i.c.o.u.RetryOperation                   : GetTaskInfo{KS7ukY7GRKqvXI1f7MGoog},{557314} - Waiting 2 SECONDS. 9200/2147483647
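
The log lines above may also point at the Elasticsearch tasks being waited on. The sketch below rests on an assumption that the thread does not confirm: that the two braced values after `GetTaskInfo` are an Elasticsearch node ID and a task number. If so, they can be joined into the `node_id:task_number` form that the task management API (`GET /_tasks/<task_id>`) accepts. `task_ids` and `task_url` are hypothetical helper names.

```python
import re

# Assumed layout (not confirmed): GetTaskInfo{<node id>},{<task number>}
TASK_PATTERN = re.compile(r"GetTaskInfo\{([^}]+)\},\{(\d+)\}")


def task_ids(log_lines):
    """Return the distinct '<node id>:<task number>' pairs found in log lines."""
    found = []
    for line in log_lines:
        match = TASK_PATTERN.search(line)
        if match:
            task_id = f"{match.group(1)}:{match.group(2)}"
            if task_id not in found:
                found.append(task_id)
    return found


def task_url(base_url, task_id):
    """Build the Elasticsearch task API URL for one task."""
    return f"{base_url}/_tasks/{task_id}"
```

Requesting the resulting URL should show whether the task still exists and what it reports about its status.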

svetlana.dorokhova: Could you also paste the response from GET http://localhost:9200/_cat/indices?v&s=index ?

Milan Lesnek: I am sorry, I have only very basic Elasticsearch skills, but it says that GET is not allowed in this case:

curl -XGET "http://our-elastic:9200/_cat/indeces?v&s=index"

response:

{"error":"Incorrect HTTP method for uri [/_cat/indeces?v&s=index] and method [GET], allowed: [POST]","status":405}

svetlana.dorokhova: Your URL has a typo in the word “indices”.

svetlana.dorokhova: indeces -> indices

Milan Lesnek: I am dumb, sorry :smile: Here is the output of indices

svetlana.dorokhova: From what I see, the migration is not happening for some reason. I see that the new indices were created (operate-incident-1.2.0_, operate-list-view-1.2.0_), but they are empty. I would check the Elasticsearch logs at this point.

Milan Lesnek: OK, I can see that Elasticsearch has some errors; we may be low on available shards.

svetlana.dorokhova: “There are no ingest nodes in this cluster, unable to forward request to an ingest node.”
This may also be the reason. We use reindex queries with pipelines, and I suspect that they need ingest nodes to execute.

Milan Lesnek: Yes, that was it: the ingest role was disabled, since it was set to false by default on our cluster. Thank you very much for your cooperation.
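
For anyone hitting the same symptom, a quick check for ingest nodes can be sketched as follows. It assumes an unauthenticated cluster at a base URL you supply. `fetch_cat_nodes` and `ingest_nodes` are hypothetical helper names, while the `node.role` column of the cat nodes API, where the letter `i` marks the ingest role, is standard Elasticsearch.

```python
import json
import urllib.request


def fetch_cat_nodes(base_url):
    """Return name and role letters for every node via GET /_cat/nodes."""
    url = f"{base_url}/_cat/nodes?format=json&h=name,node.role"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def ingest_nodes(rows):
    """Return names of nodes whose node.role string contains 'i' (ingest)."""
    return [row["name"] for row in rows if "i" in row.get("node.role", "")]
```

An empty result from `ingest_nodes(fetch_cat_nodes("http://localhost:9200"))` would match the error quoted above. How the ingest role is enabled depends on how the cluster is deployed, so that step is left out here.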

svetlana.dorokhova: :+1:

svetlana.dorokhova: I will create an issue to improve Operate logging for such cases.

Note: This post was generated by Slack Archivist from a conversation in the Zeebe Slack, a source of valuable discussions on Zeebe (get an invite). Someone in the Slack thought this was worth sharing!
