Separating Oracle BPM business processes from their underlying data is a topic that comes up regularly with customers. When first starting out, most customers create a large set of process data objects that carry the payload of each work item instance throughout the life of a process and its subprocesses. While it is tempting to begin this way, each work item instance's process data should be kept light by storing an ID field that can be used to look up information from a relational database when it is needed.
What Information Should be Stored in the Process Instance’s Payload
Although you should always strive to keep the process instance's payload small, it is possible to go too far. Each work item instance should contain variables that provide:
- A unique way to identify a single instance of the process (e.g., the order id). This is used by the user interface forms to retrieve information from a database and for process instance correlation.
- The variables used in the conditional sequence flow logic of the process's Exclusive Gateways. These conditions should be driven by easily accessible process data object variables in the instance's payload.
- The information used to populate public or protected flex fields should be stored in the instance's payload.
- The information used to populate Business Indicator measures and dimensions for Business Activity Monitoring (BAM) should be stored in the instance's payload.
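A minimal payload following these guidelines might look like the sketch below. The class and field names are illustrative, not Oracle BPM API types: the payload carries only the correlation key, a flag that drives a gateway condition, and a value destined for a flex field or BAM dimension.

```java
// Illustrative sketch of a lightweight process payload: only the
// correlation key plus the few values the process itself needs.
// All names here are hypothetical, not Oracle BPM API types.
public class OrderPayload {
    private final String orderId;      // unique key used for lookups and correlation
    private final boolean expedited;   // drives an Exclusive Gateway condition
    private final String region;       // populates a flex field / BAM dimension

    public OrderPayload(String orderId, boolean expedited, String region) {
        this.orderId = orderId;
        this.expedited = expedited;
        this.region = region;
    }

    public String getOrderId()   { return orderId; }
    public boolean isExpedited() { return expedited; }
    public String getRegion()    { return region; }
}
```

Everything else about the order (line items, customer details, addresses) stays in the database and is fetched by `orderId` only when a step actually needs it.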
Reasons to Keep the Process Instance Payload Small
This has long been a best practice recommended by Oracle. In Oracle’s Performance Tuning for Oracle Business Process Management Suite 11g document, on page 17 it states:
"Minimize the amount of data stored in the process instance. Obviously, there is a tradeoff between the cost of storing data locally compared to storing keys and retrieving data for use within the process, which needs to be considered.
A reasonable starting point would be to model the process state to hold only values that are needed to control the process flow and keys to get any other (external) data on an ‘as needed’ basis. If retrieval is too frequent/slow, or the systems holding that data are not always available, then move more data into the process."
AVIO Consulting strongly recommends decoupling the process payload and the underlying data. Recently, Carlo Arteaga, Mark Peterson, Kris Nelson, Bhaskar Rayavaram, Suyash Khot, Adam DesJardin and I discussed this with a customer and went over these reasons for process data decoupling:
1. Differing Lifecycles - The underlying data and the processes typically have different lifecycles and need to be independent of one another. There is often a need to maintain each at different times, and they are sometimes modified by different parts of the organization. The data stored in a database is typically the "source of truth" that must sometimes be accessible and easily manipulated by applications outside of Oracle BPM. If it were stored as process instance data instead of in database tables, outside applications would need to access it through Oracle BPM APIs with which they are not familiar.
2. Runtime Performance - Lightweight process data persistence improves performance. The message contract between the process instance and the engine that persists the payload should leverage key values where possible (think primary keys / relational keys from classic DBMS design patterns) rather than defining instance variables for every data element. Because the process instance carries only essential information in its payload, the Oracle BPM engine performs better and the data for the instances renders faster. At each step in the process, the payload is hydrated and then dehydrated (read from the engine's underlying database tables and then written back). If the business data is instead stored in an external database, this overhead of hydrating and dehydrating large amounts of data disappears: at each step, only the data actually required is read or updated, and only when it is needed.
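To illustrate the key-based retrieval pattern, here is a sketch in which a `HashMap` stands in for the external order table (all names are hypothetical; in a real system the lookup would be a JDBC query or a service call). The process instance carries only the key, and the full record is fetched only at the step that needs it.

```java
import java.util.HashMap;
import java.util.Map;

public class OrderLookup {
    // Stand-in for the external database table keyed by order id.
    private static final Map<String, String> ORDER_TABLE = new HashMap<>();

    static {
        ORDER_TABLE.put("ORD-1001", "40 widgets for ACME Corp");
    }

    // Called only at the step that actually needs the data, so the
    // engine never hydrates and dehydrates the full record at every step.
    public static String fetchOrderDetails(String orderId) {
        return ORDER_TABLE.get(orderId);
    }
}
```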
3. Development Speed - Decoupling helps speed development. Oracle BPM was built with the decoupled Model View Controller (MVC) pattern in mind. One of its strengths is the architecture‘s business services layer that can make the source of the data transparent. Given a single key value stored in the process instance payload, services can be invoked from the process and Interactive activities in the process that represent the real source of truth that the business needs. The MVC pattern’s model layer assumes that given the process’s key value, it is then possible to easily access underlying business data from a variety of sources including databases, EJBs and web services. Once exposed, the business services can be reused by any business process needing the information. Although storing all of the information inside the process payload can be considered one of the model’s business service sources, because of its overhead, using this in production systems is not recommended.
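The business-services idea can be sketched as follows (interface and class names are illustrative, not part of Oracle BPM or ADF): the process and its forms only see the interface, so the data could come from a database, an EJB, or a web service without the process changing.

```java
// Sketch of a reusable business service: given the key value from the
// process payload, it returns the underlying business data. The names
// here are hypothetical.
public interface OrderService {
    String findCustomerName(String orderId);
}

// One possible implementation; a JDBC- or web-service-backed version
// could be swapped in without touching the process or the forms.
class InMemoryOrderService implements OrderService {
    public String findCustomerName(String orderId) {
        return "ORD-1001".equals(orderId) ? "ACME Corp" : null;
    }
}
```

Once a service like this is exposed, any process that stores the same key can reuse it, which is what makes the source of the data transparent to the model layer.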
User interfaces created with Oracle’s Application Development Framework (ADF) have out-of-the-box components and operations that take advantage of this MVC pattern. Some examples of these out-of-the-box patterns that do not have to be programmed include:
- Database table information that is easily displayed using Next and Previous functionality that automatically retrieve the next or previous sets of rows
- Similarly, scrolling up and down in a table with many rows renders data automatically
- Both server and client side validations and rules
- Database dropdowns and cascading dropdowns
- Forms automatically created with Master / Detail patterns
4. Reduced Complexity - Decoupling reduces the complexities arising from data synchronization. When orchestrating various external systems into a process, care must be given to account for "Systems of Record" and the purview these systems have over data values. Decoupling process instance data so that only key values are in the payload allows the Systems of Record to continually update the subservient element values without fear of stagnant data in the process. Participants in the process receive the most current data values when dealing with process instances. When data objects span several process instances, finding and updating data is easier if it is stored in a database.
A common example is when a process instance is based on an order. Several process instances may involve an order for a single customer. When order information changes, you might need to synchronize any number of process instances across various processes. Placing the data in a database makes this simple: update the order table(s), and all of the affected process instances have the latest information. There is no need to find the related process instances and update them individually.
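That update-once behavior can be sketched as follows, again with a `HashMap` standing in for the order table and hypothetical names throughout. Two process instances hold only the order id, so a single update by the System of Record is immediately visible to both.

```java
import java.util.HashMap;
import java.util.Map;

public class OrderSync {
    // Stand-in for the shared order table (the System of Record's data).
    static final Map<String, String> ORDER_TABLE = new HashMap<>();

    // Each process instance stores only the key, never a copy of the data.
    static String readForInstance(String orderId) {
        return ORDER_TABLE.get(orderId);
    }

    public static void main(String[] args) {
        ORDER_TABLE.put("ORD-1001", "quantity=40");
        String instanceAKey = "ORD-1001";   // payload of process instance A
        String instanceBKey = "ORD-1001";   // payload of process instance B

        // The System of Record updates the order exactly once...
        ORDER_TABLE.put("ORD-1001", "quantity=45");

        // ...and both instances see the new value on their next read.
        System.out.println(readForInstance(instanceAKey)); // quantity=45
        System.out.println(readForInstance(instanceBKey)); // quantity=45
    }
}
```

Had each instance carried its own copy of the quantity in its payload, every affected instance would have to be found and updated individually.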
5. Reporting and Archiving - Decoupling facilitates data capture for reporting and archiving. Keeping data only in the BPM payload takes away the option of custom reporting outside of BAM and of archiving business data. Storing data in a database makes it possible to create custom reports outside of BAM, which would be difficult or impossible if all the data existed only in the process instance's payload. To capture custom data changes as work items progress through a BPM or BPEL process, database tables have a clear advantage over payload information: the process payload does not store how the data progressively changes over time. Archiving and data accessibility are very limited if data is stored only in the BPM payload.
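The kind of custom report that becomes trivial once the business data lives in its own tables can be sketched like this: a count of orders per status, computed here over an in-memory stand-in for the order table (all names illustrative). Against payload-only data, answering the same question would require walking every instance through the BPM APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OrderReport {
    // Each entry stands in for a row in the order table: "orderId:status".
    public static Map<String, Long> countByStatus(List<String> rows) {
        return rows.stream()
                   .map(r -> r.split(":")[1])              // extract the status column
                   .collect(Collectors.groupingBy(s -> s,  // group identical statuses
                                                  Collectors.counting()));
    }
}
```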
Most customers struggle with this question within days of beginning their first Oracle BPM projects. While I am a fan of keeping things simple, keeping all of your data in the process’s payload will quickly impact your project's success.
Join the Conversation
Nice article to read before you start any OBPM projects.
Great tip, Great article. Thanks
Great article Dan! Quite agree with you. Would be very interesting to see how these concepts get combined with REST/JSON and HATEOAS approaches that let you link resources and resolve dependencies quickly.
Excellent post Dan. This should be actually one of the many patterns to be followed for BPM designs.
And indeed in any BPM product I see this challenge of the extra overhead of retrieving or logging the payload data in an external database, either using some event-driven approach such as Oracle EDN, which brings in more points of failure as far as the overall architecture is concerned. Would be interested to explore more on how the BPMN diagram would look if, before every human task, an extra service task were needed to retrieve the payload information and another service task were needed to update it back to the external DB. Please throw some light on this as well.
I see what you are saying, but it's not as bad as you might think. Instead of cluttering up the diagram with the additional events before every User activity in the process, we would instead read the information from the database as the form loads and then store the information back into the database when the form is submitted.
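That load/submit approach might be sketched like this (hypothetical names, with a `HashMap` standing in for the database): the form reads by the key in the payload when it loads and writes back only on submit, so no extra service tasks appear around the User activity in the BPMN diagram.

```java
import java.util.HashMap;
import java.util.Map;

public class OrderForm {
    // Stand-in for the external database table.
    static final Map<String, String> ORDER_TABLE = new HashMap<>();

    // Invoked when the task form loads: read by the key in the payload.
    static String onFormLoad(String orderId) {
        return ORDER_TABLE.get(orderId);
    }

    // Invoked when the user submits: write the edited values back.
    static void onFormSubmit(String orderId, String editedValue) {
        ORDER_TABLE.put(orderId, editedValue);
    }
}
```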
Hope this helps,