Ghost workflow triggered even after removal!

Hello,

I have set up a Zeebe stack in a Docker Swarm.

The only persistent data are Zeebe data and Elasticsearch data.

I ran a few tests with a BPMN file whose process ID is transcode_video.

Then I have:

  • Stopped all stack containers.
  • Removed all Elasticsearch data.
  • Renamed the workflow and workflow id to transcode_video_example.
  • Restarted the stack.
  • Deployed the new bpmn file to Zeebe.
  • Published the watchfolder_message to trigger a workflow instance.

But the message then triggers and executes two workflows simultaneously: the removed one and the new one…

I have written a Python script to inspect the Zeebe state exported to the Elasticsearch indices.

Here are the Zeebe indices:

------------------------------------------------------------------------------------------------------------------------
health    status    index          
------------------------------------------------------------------------------------------------------------------------
green     open      zeebe-record_deployment_0.23.1_2020-07-24
green     open      zeebe-record_job_0.23.1_2020-07-24
green     open      zeebe-record_variable_0.23.1_2020-07-24
green     open      zeebe-record_workflow-instance_0.23.1_2020-07-24

Here are the deployed workflows:

------------------------------------------------------------------------------------------------------------------------
workflowKey              bpmnProcessId                      version        resourceName   
------------------------------------------------------------------------------------------------------------------------
2251799813696759         transcode_video_example            2              transcode_video_example.bpmn

Here are the instances triggered by the message event:


bpmnProcessId            version   workflowKey         flowScopeKey        bpmnElementType     parentWIK           parentEIK           WIK                 elementId           
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         -1        2251799813696703    -1                  START_EVENT         -1                  -1                  2251799813742658    watchfolder         
                         -1        2251799813696759    -1                  START_EVENT         -1                  -1                  2251799813742661    watchfolder         
transcode_video          2         2251799813696703    -1                  PROCESS             -1                  -1                  2251799813742658    transcode_video     
transcode_video_example  2         2251799813696759    -1                  PROCESS             -1                  -1                  2251799813742661    transcode_video_example
transcode_video          2         2251799813696703    -1                  PROCESS             -1                  -1                  2251799813742658    transcode_video     
transcode_video_example  2         2251799813696759    -1                  PROCESS             -1                  -1                  2251799813742661    transcode_video_example
transcode_video          2         2251799813696703    2251799813742658    START_EVENT         -1                  -1                  2251799813742658    watchfolder         
transcode_video_example  2         2251799813696759    2251799813742661    START_EVENT         -1                  -1                  2251799813742661    watchfolder         
transcode_video          2         2251799813696703    2251799813742658    START_EVENT         -1                  -1                  2251799813742658    watchfolder         
transcode_video_example  2         2251799813696759    2251799813742661    START_EVENT         -1                  -1                  2251799813742661    watchfolder         

As you can see, the old workflow transcode_video is executed, but it neither exists in Elasticsearch nor appears in Operate.
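
The comparison above can be automated. The following is a minimal sketch, assuming the exported workflow-instance records have already been fetched into a list of dicts that carry the `bpmnProcessId` and `workflowKey` columns shown in the table. `find_ghost_workflows` is a hypothetical helper, not part of Zeebe or Elasticsearch.

```python
def find_ghost_workflows(instance_records, deployed_keys):
    """Return {workflowKey: bpmnProcessId} for workflows that produced
    instance records but are missing from the exported deployments."""
    ghosts = {}
    for record in instance_records:
        key = record["workflowKey"]
        if key in deployed_keys:
            continue
        # Some records carry an empty bpmnProcessId; keep the first non-empty one.
        name = record.get("bpmnProcessId") or ""
        if not ghosts.get(key):
            ghosts[key] = name
    return ghosts


# Values taken from the tables above.
records = [
    {"bpmnProcessId": "", "workflowKey": 2251799813696703},
    {"bpmnProcessId": "", "workflowKey": 2251799813696759},
    {"bpmnProcessId": "transcode_video", "workflowKey": 2251799813696703},
    {"bpmnProcessId": "transcode_video_example", "workflowKey": 2251799813696759},
]
deployed = {2251799813696759}

ghosts = find_ghost_workflows(records, deployed)
print(ghosts)
```

Any key reported here belongs to a workflow that Zeebe still executes although the exported deployment records no longer mention it.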

I don’t know what Zeebe keeps in its data, but maybe a remnant of the old workflow is still in there somewhere…

How do I really get rid of a removed workflow?

Hi @vtexier.

I like the title of this topic :laughing:

Elasticsearch is only the data sink for Zeebe. It exports its data to ES and Operate reads from ES.

The data of Zeebe itself (i.e. its internal state) is stored on disk. So you also need to remove the Zeebe data directories.

Best regards,
Philipp


As I understand it:

  • Zeebe keeps track of its state in RocksDB.
  • Zeebe passes events to the exporters as they arrive, here to Elasticsearch for Operate.

So modifying the exporters' data is a very bad idea, as it breaks the synchronization between Zeebe and the exporters' DB.

As the RocksDB store seems to work like a log of events, it is hard to remove a single item such as an instance or a workflow. That would require removing the entire history in the Zeebe DB and sending a full update to the exporters. …

A better approach would be to add an event that disables a workflow ID from running.
On GitHub, there is an open issue to remove/disable a workflow:

Things should be removable, with cascading, inside the Zeebe DB.
For the exporters, maybe just send an event to inform the other DBs and clients that an element has been deleted.
Then every client can purge the exporters DB from obsolete data…

For the time being, I will remove all Zeebe AND Elasticsearch data in sync.
Then I will remove “message event” elements from my workflows if I need to “disable” them.
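
Wiping both stores in sync could look like the following minimal sketch. It assumes each service keeps its data in a host directory that can be emptied while the whole stack is stopped. `reset_state` and the paths in the usage comment are hypothetical and must be adapted to the actual volume locations.

```python
import shutil
from pathlib import Path


def reset_state(zeebe_data_dir, elasticsearch_data_dir):
    """Empty both data directories so that broker state and exported data
    start over together. Run this only while the whole stack is stopped."""
    directories = [Path(zeebe_data_dir), Path(elasticsearch_data_dir)]

    # Check both directories first, so nothing is deleted if one is missing.
    for directory in directories:
        if not directory.is_dir():
            raise FileNotFoundError(f"not a directory: {directory}")

    for directory in directories:
        for entry in directory.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()


# Hypothetical host paths, shown for illustration only:
# reset_state("/srv/zeebe/data", "/srv/elasticsearch/data")
```

Checking both paths before deleting anything avoids the half-cleaned situation described above, where one store is emptied and the other still holds the old workflow.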


The solution I used:

  • I kept all Zeebe data!
  • In the modeler, I created a new workflow with the old ID, containing only an empty, disconnected Start Event.
  • I deployed it as a new version of the old workflow.
  • I published the same message, and now only the new workflow creates an instance (thanks to the disconnected Start Event in the new version of the old workflow).

I hope this helps people who want to “delete/disable” workflows.
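
Why this works can be illustrated with a toy model. It is a sketch, not Zeebe's implementation: it assumes that only the latest version of each process ID keeps its message start event subscriptions, which matches the behaviour observed in this thread. `ToyBroker` and its methods are hypothetical names.

```python
class ToyBroker:
    """Toy model: only the latest version of each process ID keeps
    its message start event subscriptions."""

    def __init__(self):
        # process_id -> (version, set of message names that start it)
        self.latest = {}

    def deploy(self, process_id, start_messages):
        version = self.latest.get(process_id, (0, set()))[0] + 1
        # A new version replaces the previous version's subscriptions.
        self.latest[process_id] = (version, set(start_messages))
        return version

    def publish(self, message_name):
        # Every process whose latest version subscribes to the message starts.
        return sorted(
            process_id
            for process_id, (_, messages) in self.latest.items()
            if message_name in messages
        )


broker = ToyBroker()
broker.deploy("transcode_video", ["watchfolder_message"])
broker.deploy("transcode_video_example", ["watchfolder_message"])
before = broker.publish("watchfolder_message")

# Workaround: new version of the old ID with a plain Start Event (no message).
broker.deploy("transcode_video", [])
after = broker.publish("watchfolder_message")

print(before)  # ['transcode_video', 'transcode_video_example']
print(after)   # ['transcode_video_example']
```

Renaming the process creates a second entry instead of replacing the first one, which is why the old workflow kept firing until a new version was deployed under its old ID.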
