Reading the documentation on the migration APIs and how they work, I wondered what is considered good practice for the triggering of the migrations.
In my case I’m considering performing migrations through the API on startup of a service, of which there might be multiple running instances.
Do people perform migrations after the engine has been started and is already running, or do they refrain from starting the JobExecutor until the migration has been performed? And if the former, how does the mechanism deal with process instances that have progressed between the query and the migration itself?
Also, assuming a service was running in a scaled fashion and, for a deployment, all instances were brought down and brought back up together, each having the same trigger logic to start a migration batch: would there be automatic prevention of multiple equivalent batches, or not? I understand the actual created jobs would be picked up by any instance that has the required deployments, but the creation of the work might potentially be done redundantly, which seems rather inefficient.
I would look at suspending the definitions being migrated: perform the migration, and then un-suspend.
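For reference, a rough sketch of that suspend → migrate → re-activate flow using the Camunda Java API. The definition IDs, the helper name, and the use of `mapEqualActivities()` are illustrative assumptions, not a prescribed setup:

```java
import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.migration.MigrationPlan;

public class SuspendAndMigrate {

    // Hypothetical helper: sourceDefId/targetDefId would come from your own deployment logic.
    static void migrateSuspended(ProcessEngine engine, String sourceDefId, String targetDefId) {
        // Suspend the source definition and its running instances so nothing progresses
        engine.getRepositoryService()
              .suspendProcessDefinitionById(sourceDefId, true, null);

        MigrationPlan plan = engine.getRuntimeService()
              .createMigrationPlan(sourceDefId, targetDefId)
              .mapEqualActivities()
              .build();

        engine.getRuntimeService()
              .newMigration(plan)
              .processInstanceQuery(engine.getRuntimeService()
                      .createProcessInstanceQuery()
                      .processDefinitionId(sourceDefId))
              .execute(); // synchronous here; executeAsync() would return a Batch instead

        // Re-activate on the target definition once migration is done
        engine.getRepositoryService()
              .activateProcessDefinitionById(targetDefId, true, null);
    }
}
```

Because the instances are suspended before the query runs, the "instances progressing between query and migration" concern from the original question goes away.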
Thanks, that’s a good tip to prevent progression during migration. How about the duplicate batch creation?
Can you explain more? Not sure I understand the concern.
What we currently do is to create a migration plan and have it executed asynchronously. This is triggered by a startup action in the container. However, there may be multiple containers started at the same time, that will all create the migration plan and schedule it. Is there a way to detect these duplicates? You can’t set an ID for the migration or something like it, from what I can find.
I would look at using the migration batch/async tooling. And then use the process instance query feature to set the specific groups of distributed migrations. But would be curious if it is really beneficial to do this, or just spin up a single migration server/instance rather than go to the effort of clustered migrations. Your migration server could have all the logic to pause definitions and perform the migration.
What do you mean by the migration batch/async tooling? The UI that’s in Cockpit? We’re going for automated upgrades.
Spinning up a separate instance might work too, but complicates the setup somewhat.
When you set up your batch you can set a processInstanceQuery. You can use this to control which instances you specifically want to target (so you can distribute the list). When you use the async batch it returns a Batch ID, so you can use that to track which lists were distributed to which nodes/engines.
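A minimal sketch of that idea, assuming each node is handed an explicit slice of instance IDs (the partitioning itself is left out; the helper name is made up for illustration):

```java
import java.util.List;
import org.camunda.bpm.engine.RuntimeService;
import org.camunda.bpm.engine.batch.Batch;
import org.camunda.bpm.engine.migration.MigrationPlan;

public class PartitionedMigration {

    // Hypothetical helper: migrate one slice of instances and return the batch ID
    static String migrateSlice(RuntimeService runtimeService, MigrationPlan plan,
                               List<String> instanceIds) {
        Batch batch = runtimeService.newMigration(plan)
                .processInstanceIds(instanceIds) // this node's slice of the work
                .executeAsync();                 // batch jobs run on any available node
        return batch.getId();                    // record which slice went where
    }
}
```

Instead of `processInstanceIds`, a `processInstanceQuery` can be set on the same builder if the slices are expressible as queries.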
Ah, now I see what you mean, that could work. Not quite what I was aiming for, but it’s an option to keep in mind. Thanks for the help 
I am still very sceptical that distributing the migration job is really going to be that valuable compared to the initial effort to build and maintain it.
Yes, I know. Really, I’m not interested in actively distributing the migration. All I want to do is create the migration jobs just once and then have as many nodes as needed executing them in parallel. That part we get for free by creating an async batch execution for the migration. The tricky part is making sure we only create the batch once when we start up multiple nodes at the same time.
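One pragmatic way to approximate "create the batch only once" is to check for an already-running migration batch before scheduling a new one. This is only a sketch: it is still racy if two nodes check at exactly the same moment, and it skips on *any* open migration batch, not just one for this particular plan. A database-level lock or a single designated migration node would be needed for a real guarantee:

```java
import org.camunda.bpm.engine.ManagementService;
import org.camunda.bpm.engine.RuntimeService;
import org.camunda.bpm.engine.batch.Batch;
import org.camunda.bpm.engine.migration.MigrationPlan;

public class OnceOnlyBatch {

    // Hypothetical startup hook: schedule the batch only if no migration batch exists yet
    static void migrateIfNotAlreadyScheduled(ManagementService managementService,
                                             RuntimeService runtimeService,
                                             MigrationPlan plan) {
        long running = managementService.createBatchQuery()
                .type(Batch.TYPE_PROCESS_INSTANCE_MIGRATION)
                .count();
        if (running > 0) {
            return; // assume another node already created the batch
        }
        runtimeService.newMigration(plan)
                .processInstanceQuery(runtimeService.createProcessInstanceQuery()
                        .processDefinitionId(plan.getSourceProcessDefinitionId()))
                .executeAsync();
    }
}
```

Even with the race, the worst case is a second batch whose query matches instances that were already migrated, which the engine would then have little or nothing to do for; whether that residual risk is acceptable depends on the migration plan.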