Sunday, February 28, 2016

Asynchronous interaction in Oracle BPEL and BPM. WS-Addressing and Correlation sets

There are different ways to achieve asynchronous interaction in Oracle SOA Suite. In this blog article, I'll explain some differences between WS-Addressing and using correlation sets (in BPEL but also mostly valid for BPM). I'll cover topics like how to put the Service Bus between calls, possible integration patterns and technical challenges.

I will also shortly describe recovery options. You can of course depend on the fault management framework. This framework however does not catch for example a BPEL Assign activity gone wrong or a failed transformation. Developer defined error handling can sometimes leave holes if not thoroughly checked. If a process which should have performed a callback, terminates because of unexpected reasons, you might be able to manually perform recovery actions to achieve the same result as when the process was successful. This usually implies manually executing a callback to a calling service. Depending on your choice of implementation for asynchronous interaction, this callback can be easy or hard.


WS-Addressing

The below part describes a WS-Addressing implementation based on BPEL templates. There are alternatives possible (requiring more manual work) such as using the OWSM WS-Addressing policy and explicitly defining a callback port. This has slightly different characteristics (benefits, drawbacks) which can be abstracted from the below description. BPM has similar characteristics but also slightly different templates.

When creating a BPEL process, you get several choices for templates to base a new process on. The Synchronous BPEL template creates a port which contains a reply (output message) in the WSDL. When you want to reply, you can use the 'Reply' activity in your BPEL process. The activity is present when opening your BPEL process after generation by the template, but you can use it in other locations, such as for example in exception handlers to reply with a SOAP fault. If you want to call a synchronous service, you only need a single 'Invoke' activity.

The output message is not created in the WSDL when using the One Way or Asynchronous templates. Also when sending an asynchronous 'reply', you have to use the Invoke activity in your BPEL process instead of the 'Reply' activity. One Way BPEL process and Asynchronous BPEL process templates are quite similar. The Asynchronous template creates a callback port and message. The 'Invoke' activity to actually do the asynchronous callback is already present in the BPEL process after it has been generated based on the template. The One Way template does not create a callback port in the WSDL and callback invoke in the BPEL process. If you want to call an Asynchronous service and want to do something with an asynchronous callback, you should first use an 'Invoke' activity to call the service and then wait with a 'Receive' activity for the callback.

For all templates, the bindings and service entry in the WSDL (to make it a concrete WSDL for an exposed service) are generated upon deployment of the process based on information from the composite.xml file (<binding. tags).

The callback port of an asynchronous BPEL process is visible in the composite.xml file as followed;


This callback tag however does not expose a service in the normal WSDL, but a WSDL with the callback URL can be obtained. When a request is send to a service, the WS-Addressing headers contain a callback URL in the wsa:ReplyTo/wsa:Address field. This URL can be appended by ?WSDL to obtain a WSDL which contains the actual exposed services.


The SOA infrastructure uses WS-Addressing headers to match the 'Invoke' message from the called service with the Receive activity which should be present in the calling service. These headers are not visible by default. If you want to see them, you can use the JDeveloper HTTP analyzer as a proxy server (the calling process can configure a reference to use a proxy server). You have to mind though to disable local optimization for the calls, else you won't see requests coming through the analyzer. You can do that by adding the below two properties to the reference (see here).

<property name="oracle.webservices.local.optimization">false</property>
<property name="oracle.soa.local.optimization.force">false</property>

Even though WS-Addressing is used, the WS-Addressing OWSM policy is not attached to the service. The WS-Addressing headers can look like for example


Since the callback message send from the called service is not a reference/service, you cannot tell the SOA Infrastructure to send the message through a proxy server (HTTP analyzer). You can however specify in JDeveloper to attach the log_policy to the callback port. This gives you the request and reply of the called service. The messages are stored in DOMAIN_HOME/servers/server_name/logs/owsm/msglogging. Using this log policy also allows you to recover messages later.

Recovery in case the callback does not come

Manual recovery is not straightforward. You need the request headers in order to construct a callback message and it helps if you have a successful callback message as a sample on how to construct the callback headers. Obtaining the request headers can be done in several ways. You can use the log OWSM policy or a Service Bus Log or Alert action. From within BPEL or BPM it is more difficult to obtain the complete set of headers. You can obtain some headers from properties on the Receive (see here for example something to mind when using the wsa.replytoaddress property) but these properties are header specific (not complete) and you can save specific SOAP header fields to variables at a Receive, but these are also specific. The method described by Peter van Nes here seems to allow you control the outgoing WS-Addressing headers and (a similar method) might allow you to also gather the incoming WS-Addressing headers. However, again, this method is per header element. You might be allowed to leave some headers out in order for the callback to still arrive at the correct instance and determine some of the headers from the Enterprise Manager or soainfra database tables but I haven't investigated this further. Once you've obtained the required headers and constructed a suitable reply message, you can use your favorite tool to fire the request at the server.

Service Bus between calls

In case your company uses an architecture pattern which requires requests to go through the Service Bus, you might have a challenge with WS-Addressing. The callback is a direct call based on the callback address in the request. If you want to go through the Service Bus, you have to do some tricks. You can use the SOA-Direct transport (see some tips from Oracle on this here) and use a workaround like described here. Here the callback data is stored somewhere else in the WS-Addressing headers and later overwritten and used by the Service Bus on the way back to perform a proper callback. SOA-Direct however also has some drawbacks. SOA-Direct are blocking RMI calls which do not allow setting a timeout on the request (and thus for example can cause stuck threads). SOA-Direct also provides some challenges when working on a cluster and it causes a direct dependency between caller and called service requiring measures to be taken to achieve a deployment order of your composites (which can cause dependency loops, etc). It can also cause server start-up issues. I have not looked at recovery options for using SOA-Direct. Since it is based on RMI calls, you most likely will have to write Java code to resume a failed process. For plain WS-Addressing correlation I have not found an easy way without using SOA direct to have the callback go through the Service Bus thus you will most likely end up with the second pattern in the below image.


WS addressing in summary

WS Addressing benefits
  • Works out of the box. little implementation effort required.
  • Works in the JDeveloper integrated WebLogic server.
  • Is a generally accepted and widely acknowledged standard (see here). Also allows interoperability with several other web service providers.
WS Addressing drawbacks
  • Possible integration patterns are limited. WS-Addressing is SOAP specific and always one caller and one called service. Multiple callbacks to the same process are not possible and integrating calls from other sources (e.g. a file which has been put on a share) is also not straightforward.
  • It can be hard to propagate WS-Addressing headers through a chain of (e.g. composite) services.
  • The callback when using SOAP over HTTP is always direct (the caller sends a callback port with the request in the WS-A headers). difficult to put a Service Bus in between unless using SOA-Direct (which has other drawbacks).
  • Manual recovery in case a callback does not come, is possible, but requires determining the request headers. With SOA-Direct this becomes harder.
  • The callback cannot arrive before the process which needs to do the callback has been called since the WS-Addressing headers have not been determined yet. if for example you want to respond to events which have arrived before your process has been started, WS-Addressing is not the way to go.
  • Correlation is technical and cannot directly be based on functional data. It is not straightforward (or usual) to let the callback arrive at a different process.
Correlation

Instead of using the out of the box WS-Addressing feature, Oracle SOA Suite offers another mechanism to link messages together in an asynchronous flow, namely correlation. Correlation can be used in BPM and in BPEL. Correlation sets are not so hard and very powerful! You can for example read the following article to gain a better understanding. I borrowed the below image from that blog because it nicely illustrates how the different parts you need to define for correlation to work, link together.


You first create a correlation set. The correlation set contains properties. These properties have property aliases. These aliases usually are XPath expressions on different messages which allow the SOA infrastructure to determine the value for the property for that specific message. If a message arrives which has the same property as the initialized correlation set in a process instance, the message is delivered at that process instance.

For example, the process gets started with a certain id. This id is defined as a property alias for the id property field in a correlation set. When receiving a message, this same correlation set is used but another alias for the property is used, namely an alias describing how the property can be determined from the received message. Because the property value of the process instance (based on the property alias of the message which started the process) and the property value of the received message (based on the property alias of the received message) evaluate to the same value, the message is delivered to that specific instance.

The power in this mechanism becomes more obvious when multiple events from diverse sources need to be orchestrated in a single process. You can think about making the receiving of a file part of your process or do asynchronous processing of items in a set and monitor/combine the results.

Recovery in case the callback does not come

Recovery in case a callback does not come is relatively easy. Correlation usually works based on a functional id and not on message headers so it is usually easier to construct a callback message. The callback message can be constructed based on the original request which could be determined from the audit logs. You can choose a construction such as with WS-Addressing with the implicit callback port defined in the composite, but it is easier to explicitly expose a callback port. This way you can even from the Enterprise Manager see where the callback should go to and use for example the test console to fire a request (this will most likely have preference with the operations department compared to using a separate tool).

Service Bus between calls

The scope of this part is a pattern in which Composite A calls Service Bus B, calls Composite B, calls Service Bus A, calls Composite A (all using SOAP calls).

There is one challenge when implementing such a pattern. Where should the callback go? At first, the callback URL can be present in the headers but when a Composite B calls Service Bus A, it overwrites the WS-Addressing headers and you loose this information unless you forward it in another way. If you hard-code the callback URL in Service Bus A, Composite B can only do callbacks to Composite A and you loose re-usability. If you are implementing a thin layer Service Bus in which the interface of the Service Bus is the same as the interface of the composite, you are not allowed to add extra fields to headers or message (to forward the callback URL in). A solution for this can be to provide a callback URL in the request message and use that callback URL to override the endpoint in the business service in the Service Bus. You can of course also use a custom SOAP header for this but having it in the message is a bit easier to implement and more visible in audit logging.

Obtaining the callback URL can easily be automated. You can use an X-Path expression like for example: concat(substring-before(ora:getCompositeURL(),'!'),'/ServiceCallback'). ServiceCallback of course depends on your specific callback URL. I'll explain a bit more about the ora:getCompositeURL function later.

ora:getCompositeURL obtains (as you can most likely guess) the composite URL. The substring part strips the version part so the callback goes to the default version and you are not dependent on a specific version. Correlation is process based and does not depend on process version for correlation to determine to which instance to go. The below image explains the process on how such a pattern can work.


By using the default version, you are not dependent on a specific version. You can remove them and still have working correlation as long as you have a valid default version which is capable of receiving the callback.

Correlation in summary

Drawbacks
  • Implementation requires some thought. Where do you correlate and how do you correlate?
  • Tight coupling between caller and called service requires effort to avoid. The called service should not explicitly call back to the caller with an hardcoded endpoint in order to allow re-use of the called service. A solution for this can be requiring the calling service to send a callback URL with the request or do integration not with HTTP calls but for example with JMS.
  • The request and callback are required to contain data which allows correlation
  • Correlation does not work on the JDeveloper embedded WebLogic server. You'll receive the below error
    [2015-12-17T07:04:40.500+01:00] [DefaultServer] [ERROR] [] [oracle.integration.platform.blocks.event.jms2.EdnBus12c] [tid: DaemonWorkThread: '1' of WorkManager: 'wm/SOAWorkManager'] [userId: <anonymous>] [ecid: c370fba3-1fcd-480f-8c27-7c4cd6a7a41e-0000010e,0:19] [APP: soa-infra] [partition-name: DOMAIN] [tenant-name: GLOBAL] Error occurred when saving event to JMS mapping to database: java.lang.ClassCastException: weblogic.jdbc.wrapper.PoolConnection_org_apache_derby_client_net_NetConnection cannot be cast to oracle.jdbc.OracleConnection
  • Correlation sets can be used on the SOA infrastructure but is not a generally acknowledged standard method for doing correlation (vendor specific).
Benefits
  • Very flexible. multiple correlation sets can be used in a single process. external events using diverse technologies can be correlated to running processes. complex integration patterns are possible.
  • Correlation does not depend on SOAP headers (technical data) but (most often) on functional data / message content
  • Recovery can be relatively easy (no need to determine request headers, only expected callback message)
  • The correlating events can arrive before the process is listening; they will be processed once the correlation set is initialized
  • SB between request and response can be used (it does not matter from where the callback comes)
Common

For both WS-Addressing and correlation, messages arrive in the soainfra database schema in the DLV_MESSAGE table (when using oneWayDeliveryPolicy async.persist). You can look at the state to determine if the message has been delivered (see here for a handy blog on states). You can also browse the error hospital in the Enterprise Manager to see and recover these messages. For more information on transaction semantics and asynchronous message processing (WS-A based), you can look here. DLV_SUBSCRIPTION is also an important table storing the open Receive activities. You can do many interesting queries on those tables such as demonstrated here to determine stuck processes.

In 11g undelivered messages are not automatically retried. You can schedule retries though. See here. If this is not scheduled, you are dependant on your operations department for monitoring these messages and taking action. Undelivered messages can go to the Exhausted state. When in this state, they are not automatically retried and you should reset them (to state undelivered) or abort them (what is more suitable for the specific case). See here. In 12c retries are scheduled by default.

Clustering

In a clustered environment you have to mind several environment properties to make sure your callback also is load-balanced. The getCompositeURL function and WS-Addressing headers use several properties to determine the callback URL. Some are based on soainfra settings and some on server settings.
  • First the Server URL configuration property value on the SOA Infrastructure Common Properties page is checked. 
  • If not specified, the FrontendHost and FrontendHTTPPort (or FrontendHTTPSPort if SSL is enabled) configuration property values from the cluster MBeans are checked. 
  • If not specified, the FrontendHost and FrontendHTTPPort (or FrontendHTTPSPort if SSL is enabled) configuration property values from the Oracle WebLogic Server MBeans are checked. 
  • If not specified, the DNS-resolved Inet address of localhost is used
If your operations department has followed the Enterprise Deployment Guide (for 11.1.1.6 see here for 12.1.3 see here), you have nothing to worry about though; these settings should already be correct.

Long running instances

As always, you want to avoid large numbers of long running processes. These cause difficulties in service life-cycle management and can cause performance issues and manual recovery can only be done for small numbers onless you want to automate that. It is recommended to put timers in all long running instances and do not wait forever (manual recovery can be time consuming). Also think about options for restarting your process in a new version if new functionality is added. Making long running processes short-running and have an event based architecture might also be a suitable solution.