Handling of business data


Our goal is to store the business data (including business keys) in a separate database. The link between the business data and the process instance is the processInstanceId. Sometimes, however, we need business data (for example, the state) to search for tasks or process instances.
For example: list all process instances with state ‘X’.

Option A: Process Engine is the ‘master’
Select from processEngine (runtime table)
Join: Select from businessData where processInstanceId = runtime.processInstanceId and state = ‘X’

Option B: Business DB is the ‘master’
Select from businessData where state = ‘X’
Join: Select from processEngine where processInstanceId = businessData.processInstanceId

Option C: Business data as a redundant process variable (which we would really like to avoid)
Select from processEngine where variable("state") = ‘X’
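
To make the options concrete, here is a minimal sketch of Option B as an application-side join, since the two databases are separate and cannot be joined in one SQL statement. Everything in it is an assumption for illustration: the two in-memory SQLite databases stand in for the engine DB and the business DB, and the table and column names (`runtime_instance`, `business_data`, `definition_key`) are hypothetical, not the engine's real schema.

```python
import sqlite3

# Two separate in-memory databases stand in for the engine DB and the business DB.
engine_db = sqlite3.connect(":memory:")
business_db = sqlite3.connect(":memory:")

# Hypothetical, simplified tables; a real engine schema looks different.
engine_db.execute(
    "CREATE TABLE runtime_instance (process_instance_id TEXT PRIMARY KEY, definition_key TEXT)"
)
engine_db.executemany(
    "INSERT INTO runtime_instance VALUES (?, ?)",
    [("pi-1", "order"), ("pi-2", "order"), ("pi-3", "invoice")],
)

business_db.execute(
    "CREATE TABLE business_data (process_instance_id TEXT PRIMARY KEY, state TEXT)"
)
business_db.executemany(
    "INSERT INTO business_data VALUES (?, ?)",
    # pi-9 has business data but no running instance (for example, already finished).
    [("pi-1", "X"), ("pi-2", "Y"), ("pi-3", "X"), ("pi-9", "X")],
)


def instances_in_state(state):
    """Option B: filter in the business DB first, then look up the engine by id."""
    # Step 1: the business DB is the master for the 'state' filter.
    ids = [
        row[0]
        for row in business_db.execute(
            "SELECT process_instance_id FROM business_data WHERE state = ?", (state,)
        )
    ]
    if not ids:
        return []
    # Step 2: fetch only those instances from the engine's runtime data.
    placeholders = ",".join("?" for _ in ids)
    rows = engine_db.execute(
        "SELECT process_instance_id, definition_key FROM runtime_instance "
        f"WHERE process_instance_id IN ({placeholders}) "
        "ORDER BY process_instance_id",
        ids,
    )
    return rows.fetchall()


result = instances_in_state("X")
```

Option A would run the same two steps in the opposite order: read the running instances from the engine first, then filter them by state in the business DB. Which direction is cheaper depends on which side narrows the result set more; with a large id list, the second query would also need to be sent in batches.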

Does anyone have experience with this? Which option would you recommend, or are there other possibilities?

I’m leaning more towards a hybrid approach in which your business data ontology, or semantics, initially lives in the model during early process discovery and evolution. Then, as the ontology (i.e. the message schema) matures, it migrates into the SOA layer (business services).

The reason for this approach is to avoid delays caused by premature schema definition. The business language must be tested in its actual context before formal implementation in SOA services and/or a DB schema. In other words, the language is defined while the team figures out how a model’s tasks communicate (inter-task and collaborating conversations).

Camunda is an ideal fit for this approach because it generally guides task implementation (my preference) towards an initially data-free model, with the process “executable” option left unchecked. The designer may still add data-object references between tasks and model collaboration (see the BPMN reference for data objects and data stores). Additionally, the BPMN 2 standard doesn’t officially define a schema (note: this was the case the last time I checked; I think the run-time may have something, but there is no annotation for message schema).

Here’s my recommended approach:

  1. Allow the model a reasonably unconstrained early genesis and evolution during the discovery process.

  2. Ensure that an information (data) specialist attends discovery sessions. This person’s primary responsibility is careful observation of the business language (terms, subject, object) that describes the various process scenarios and views. Note that a process model has the following general perspectives: process, information, transaction, and system or integration.

  3. Make a deliberate effort to segment the data, or ontology, into business information and process data. They sometimes overlap, but entities should be flagged as “process” or “business” data. For example, a process entity is generally used to guide the token’s progression through model execution (model instance state). An example of business data is “customer address”. The two overlap when “customer address” is used as input to direct customer sales efforts (process) towards the appropriate regional office. That overlap is temporary, though: the address is only used to generate the office assignment, which is then linked by reference to record regional office responsibility. The “regional office” is then process data, because it is now required to direct the token at, for example, a “regional office” exclusive gateway.

  4. After the model has gone through a few iterations past initial implementation (that is, after the “executable” option is selected and the model is deployed into the run-time), business data is migrated into the SOA layer (business DB). It helps to have a few SOA specialists available to guide service requirements towards proper domain management (i.e. business data managed via business services).
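
The “customer address” versus “regional office” split from step 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Customer` class, the `REGIONAL_OFFICES` mapping, and the variable names `customerId` and `regionalOffice` are all hypothetical, and the country-based lookup is a stand-in for whatever assignment rule the business actually uses.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Business data: owned by a business service, not by the process engine."""
    customer_id: str
    address_country: str


# Hypothetical assignment rule: country code -> regional office reference.
REGIONAL_OFFICES = {"CH": "office-zurich", "DE": "office-berlin"}


def assign_regional_office(customer):
    """Temporary overlap: the address is read once to derive the office."""
    return REGIONAL_OFFICES.get(customer.address_country, "office-default")


def start_variables(customer):
    """Process data: only a reference and the routing value enter the process.

    The address itself stays in the business DB; the process keeps just what
    a "regional office" exclusive gateway needs to direct the token.
    """
    return {
        "customerId": customer.customer_id,  # reference back to business data
        "regionalOffice": assign_regional_office(customer),
    }


variables = start_variables(Customer("c-42", "CH"))
```

If the address later changes, only the business service needs updating; the process instance keeps routing on the office reference it was started with, unless the model explicitly re-derives it.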

Generally speaking, BPM execution history isn’t a good home for business-information services. In my experience, using a BPM system DB for things such as “customer status” or “sales history” causes headaches later in the BPM life cycle (iterations) and as the system grows in both capacity (size) and use (transactions and volatility).


Many thanks for your opinion.

Any other ideas?

A few additional references for this discussion. It would be interesting to hear additional voices on this topic.

References/further reading:

[1] Rui Henriques, Antonio Silva. Object-Centered Process Modeling: Principles to Model Data-Intensive Systems. Business Process Management Workshops (2010)

[2] Wil van der Aalst, Mathias Weske, Dolf Grunbauer. Case handling: A new paradigm for business process support. Data and Knowledge Eng. 53 (2005)

[3] Vera Künzle, Manfred Reichert. Towards object-aware process management systems: Issues, challenges, benefits. In: Enterprise, BP and IS Modeling. LNBIP, vol. 29, pp. 197–210. Springer, Heidelberg (2008)

[4] Vera Künzle, Barbara Weber, Manfred Reichert. Object-aware Business Processes: Properties, Requirements, Existing Approaches (2010)

And, a link to my interpretation…