Friday, April 27, 2012

Things to mind in a clustered SOA Suite 11g environment


Introduction

When working in a clustered environment, there are several challenges which need to be taken into account. I've encountered and documented several of them in this post.

It is always wise to use the latest version of the software and keep your version up to date with regular patches. Oracle SOA Suite 11g is the first SOA Suite on Weblogic server and Oracle has done a great job on migrating several engines from OC4J to Weblogic. Also they have provided many new and useful features such as the EDN and the MDS. There is still some work to be done though but it will only get better!

Issues

Callbacks do not arrive

When working with SOA Suite 11.1.1.4 in a clustered environment, it is possible asynchronous callbacks do not arrive. The parent process is waiting for a callback, however the child has already send the callback.

When looking at the diagnostics log, the following error is shown;

an unhandled exception has been thrown in the Collaxa Cube systemr; exception reported is: "javax.persistence.PersistenceException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.1.3.v20110304-r9073): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLIntegrityConstraintViolationException: ORA-00001: unique constraint (SOA_SOAINFRA.AT_PK) violated

Error Code: 1
Call: INSERT INTO AUDIT_TRAIL (CIKEY, COUNT_ID, NUM_OF_EVENTS, BLOCK_USIZE, CI_PARTITION_DATE, BLOCK_CSIZE, BLOCK, LOG) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
bind => [851450, 5, 0, 118110, 2012-01-12 16:08:04.063, 5048, 3, [B@69157b01]
Query: InsertObjectQuery(com.collaxa.cube.persistence.dto.AuditTrail@494ca627)
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.flush(EntityManagerImpl.java:744)

The solution to this problem is described on;

There are 3 known workarounds;
- shutting down one managed server in the cluster
- disable audit trail logging for processes using asynchronous callbacks (http://albinoraclesoa.blogspot.com/2012/02/oracle-soa-11g-callback-not-reaching.html)
- set the AuditStorePolicy to "async". Howto: SOA Administration -> BPEL Properties -> More BPEL Configuration Properties. https://support.oracle.com/CSP/main/article?cmd=show&type=BUG&id=9964636

There is one fix;
- install patch; https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=PROBLEM&id=1338478.1


Depending on the specific situation at a customer such as
- development process
- application server maintenance process
- time available
- local legislation

The best option for a specific customer can differ.

Database adapter

For polling issues see: http://javaoraclesoa.blogspot.com/2012/04/polling-with-dbadapter-in-clustered.html

Oracle recommends putting the configuration plan for the JCA adapters (such as the DbAdapter) on a shared storage. If a customer has not done this, the Plan.xml needs to be copied manually to other managed servers in a cluster after every change. If this is not done, the managed servers can have different (mismatched) configurations on the managed servers which can cause unexpected results.

Scheduling using Quartz

Scheduling using a servlet is described in; http://www.oracle.com/technetwork/middleware/soasuite/learnmore/soascheduler-186798.pdf
This can cause issues in a clustered environment depending on the specific implementation

In a clustered environment it is better to use EDN business events and DBMS_SCHEDULER;
See: http://javaoraclesoa.blogspot.com/2012/04/scheduling-edn-business-events-using.html

Test console does not function in Enterprise Manager

In a clustered environment which is installed using Oracle's Enterprise Deployment Guide, when using the Enterprise Manager, the WSDL URL of the test console (available to test a webservice) which is generated can be a managed server specific URL. This URL is in certain setups not directly available. This can be fixed by manually replacing the URL of the managed server with the URL of the loadbalancer and then click the Parse button.

Environment specific issues

It is not known if the below issues are bugs or caused by misconfiguration of the server. They have been observed on one installation at a specific customer (11.1.1.4).

Deployment issues

Deployment (using Ant scripts) does not always (it works most of the time...) roll out a process to both managed servers in a cluster. I have not found the cause of this issue and I'm not sure if this is a known bug. It is however wise to check that a process is deployed to all managed servers in a cluster and if one branch does not contain the process, repeat the deployment till it does.

Partition creation

After creation of a partition, it was only present on one managed server in the cluster. Removing the partition and recreating it solved this issue.

Conclusion

There are some challenges when working on a clustered environment. I will expand this post when I find more issues or things to mind. For a predictable development process it is wise to use a clustered development environment when production is also clustered so these issues can be solved in development and not first encountered on production.