There are many situations when building composite SOA services where we need to invoke several backend services in parallel for performance reasons. Your first thought may be to use a flow activity in your BPEL process, and this is correct, but a few more steps are required before the services are truly invoked in parallel. A BPEL process is executed using a single thread, even when a flow is reached, so each branch of the flow is executed sequentially. The solution is to enable the nonBlockingInvoke property on your partner links. This results in a dehydration point at each invoke activity and a new thread being created to perform the invoke. You can think of this as equivalent to an invoke activity followed by a receive activity when calling an asynchronous service. The invoke activities can then run in parallel, starting immediately one after another, but each one now also incurs a dehydration point, which needs to be taken into account.

One important point observed while building this sample project: when all 3 branches of the flow called the same partner link/service reference, only 2 of them ran in parallel. Once a second partner link/service reference to the same service was added, all 3 branches executed successfully in parallel.

First, implement your flow with as many branches as required, each containing a service invocation. You should then have a process that looks similar to the one below.

flow with branches

If you deploy your process as it is now, each branch will be executed sequentially, resulting in a response time equal to the sum of all of the service invocation response times plus anything else your process is doing. You can see this response time on the test page as well as in the audit trail below.
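As a rough model of this default behaviour, write t_1, ..., t_n for the response times of the n services invoked in the flow and t_other for everything else the process does. The total response time is then approximately:

$$T_{\text{sequential}} \approx \sum_{i=1}^{n} t_i + t_{\text{other}}$$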

request log

To execute these invocations in parallel, double-click each partner link on the right side of the BPEL process. Go to Properties and add a new property. Select nonBlockingInvoke from the list and set its value to true.

edit partner link

This property causes a new thread to be created for each invocation of the partner link, which then calls back to the main thread once it has completed, resulting in true parallelism.
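To make the threading change concrete, here is a plain Java analogy. It is not Oracle BPEL engine code. `invokeService` is a hypothetical stand-in that sleeps to mimic a backend call, `runSequential` mimics the default single-threaded flow, and `runParallel` mimics nonBlockingInvoke by handing each invoke to its own thread while the main thread waits until every call has reported back. The class name, method names and durations are all assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FlowDemo {

    // Hypothetical stand-in for a backend service: it simply sleeps for the given time.
    static String invokeService(String name, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return name + " done";
    }

    // Default behaviour: a single thread walks through the branches one after another.
    // Returns the elapsed wall-clock time in milliseconds.
    static long runSequential(List<Long> durations) {
        long start = System.nanoTime();
        for (int i = 0; i < durations.size(); i++) {
            invokeService("service" + i, durations.get(i));
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // nonBlockingInvoke analogue: each invoke runs on its own worker thread and the
    // main thread blocks until every invoke has completed.
    // Returns the elapsed wall-clock time in milliseconds.
    static long runParallel(List<Long> durations) {
        ExecutorService pool = Executors.newFixedThreadPool(durations.size());
        try {
            long start = System.nanoTime();
            CompletableFuture<?>[] calls = new CompletableFuture<?>[durations.size()];
            for (int i = 0; i < durations.size(); i++) {
                final int idx = i;
                calls[i] = CompletableFuture.supplyAsync(
                        () -> invokeService("service" + idx, durations.get(idx)), pool);
            }
            // Wait for all branches, like the flow waiting for every callback.
            CompletableFuture.allOf(calls).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical service response times in milliseconds.
        List<Long> durations = List.of(2000L, 5000L, 10000L);
        System.out.println("sequential: " + runSequential(durations) + " ms");
        System.out.println("parallel:   " + runParallel(durations) + " ms");
    }
}
```

With the hypothetical durations of 2, 5 and 10 seconds in `main`, the sequential run should take a little over 17 seconds and the parallel run a little over 10 seconds, mirroring the sum-versus-slowest behaviour described in this post. The fixed pool is simply sized to the number of branches; how the BPEL engine itself manages its threads is not modelled here.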

If you now run your service again, you will see each of the invokes start at the same time. Your response time should now be close to the response time of the slowest service you are invoking, plus anything else your process is doing. In this run the process took 10.9 sec, and the longest invocation was set to take 10 sec. The audit trail also shows that all three invocations started concurrently at 6:05:34.
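As a rough model, write t_i for the response time of each of the n invoked services and t_other for everything else the process does. With nonBlockingInvoke enabled, the total response time is approximately:

$$T_{\text{parallel}} \approx \max_{1 \le i \le n} t_i + t_{\text{other}}$$

Applied to this run, with the slowest invocation at 10 sec and a measured total of 10.9 sec, roughly 0.9 sec is left for the rest of the process, which presumably also absorbs the extra dehydration points introduced by the non-blocking invokes.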

response log