Friday, April 27, 2012

Things to mind in a clustered SOA Suite 11g environment


Introduction

When working in a clustered environment, there are several challenges to take into account. This post documents the ones I have encountered.

It is always wise to use the latest version of the software and keep it up to date with regular patches. Oracle SOA Suite 11g is the first SOA Suite on WebLogic Server, and Oracle has done a great job of migrating several engines from OC4J to WebLogic. Oracle has also provided many useful new features, such as the Event Delivery Network (EDN) and Metadata Services (MDS). There is still some work to be done, but it will only get better!

Issues

Callbacks do not arrive

When working with SOA Suite 11.1.1.4 in a clustered environment, asynchronous callbacks may not arrive: the parent process keeps waiting for a callback even though the child has already sent it.

The diagnostics log shows the following error:

an unhandled exception has been thrown in the Collaxa Cube systemr; exception reported is: "javax.persistence.PersistenceException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.1.3.v20110304-r9073): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLIntegrityConstraintViolationException: ORA-00001: unique constraint (SOA_SOAINFRA.AT_PK) violated

Error Code: 1
Call: INSERT INTO AUDIT_TRAIL (CIKEY, COUNT_ID, NUM_OF_EVENTS, BLOCK_USIZE, CI_PARTITION_DATE, BLOCK_CSIZE, BLOCK, LOG) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
bind => [851450, 5, 0, 118110, 2012-01-12 16:08:04.063, 5048, 3, [B@69157b01]
Query: InsertObjectQuery(com.collaxa.cube.persistence.dto.AuditTrail@494ca627)
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.flush(EntityManagerImpl.java:744)

Workarounds and a fix for this problem are listed below.

There are three known workarounds:
- shut down one managed server in the cluster
- disable audit trail logging for processes that use asynchronous callbacks (http://albinoraclesoa.blogspot.com/2012/02/oracle-soa-11g-callback-not-reaching.html)
- set AuditStorePolicy to "async" (SOA Administration -> BPEL Properties -> More BPEL Configuration Properties); see https://support.oracle.com/CSP/main/article?cmd=show&type=BUG&id=9964636

There is one fix:
- install the patch described in https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=PROBLEM&id=1338478.1


The best option differs per customer, depending on the specific situation, for example:
- the development process
- the application server maintenance process
- the time available
- local legislation

Database adapter

For polling issues, see: http://javaoraclesoa.blogspot.com/2012/04/polling-with-dbadapter-in-clustered.html

Oracle recommends putting the deployment plan for the JCA adapters (such as the DbAdapter) on shared storage. If a customer has not done this, Plan.xml must be copied manually to the other managed servers in the cluster after every change. Otherwise the managed servers can end up with mismatched configurations, which can cause unexpected results.
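To spot mismatched deployment plans, you can compare checksums of each managed server's copy of Plan.xml. Below is a minimal Java sketch, assuming the copies are readable as files from one machine (for example via mounted shares); the class name and the paths you pass in are hypothetical and not part of any Oracle tooling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

final class PlanChecksum {

    /** SHA-256 of a file's content as a lowercase hex string. */
    static String sha256(Path file) throws IOException {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(Files.readAllBytes(file))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** True if all copies of the deployment plan have byte-identical content. */
    static boolean allIdentical(List<Path> copies) throws IOException {
        String reference = null;
        for (Path copy : copies) {
            String checksum = sha256(copy);
            if (reference == null) {
                reference = checksum;
            } else if (!reference.equals(checksum)) {
                return false;
            }
        }
        return true;
    }

    /** Usage: java PlanChecksum /path/on/server1/Plan.xml /path/on/server2/Plan.xml */
    public static void main(String[] args) throws IOException {
        List<Path> copies = new ArrayList<>();
        for (String arg : args) {
            copies.add(Path.of(arg));
        }
        System.out.println(allIdentical(copies) ? "Plan.xml copies match" : "Plan.xml copies DIFFER");
    }
}
```

A byte comparison also flags harmless differences such as whitespace, which is acceptable here because the copies are supposed to be exact duplicates.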

Scheduling using Quartz

Scheduling with Quartz from a servlet is described in: http://www.oracle.com/technetwork/middleware/soasuite/learnmore/soascheduler-186798.pdf
Depending on the specific implementation, this can cause issues in a clustered environment.

In a clustered environment it is better to use EDN business events and DBMS_SCHEDULER.
See: http://javaoraclesoa.blogspot.com/2012/04/scheduling-edn-business-events-using.html

Test console does not function in Enterprise Manager

In a clustered environment installed according to Oracle's Enterprise Deployment Guide, the WSDL URL that the Enterprise Manager test console generates (the console is used to test a web service) can be specific to one managed server. In certain setups this URL is not directly reachable. This can be fixed by manually replacing the managed server part of the URL with the URL of the load balancer and then clicking the Parse button.
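The manual fix amounts to swapping the scheme, host and port of the generated WSDL URL for those of the load balancer while keeping the path and query. A minimal Java sketch of that rewrite using java.net.URI; the class name, host names and ports are hypothetical examples, not values from a real installation.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class WsdlUrlRewriter {

    /**
     * Replaces scheme, host and port of a managed-server-specific WSDL URL with those of
     * the load balancer, keeping the path and the query (such as "WSDL") unchanged.
     */
    static String toLoadBalancerUrl(String wsdlUrl, String loadBalancerBase) {
        try {
            URI original = new URI(wsdlUrl);
            URI balancer = new URI(loadBalancerBase);
            URI rewritten = new URI(
                    balancer.getScheme(),
                    null,                   // no user info
                    balancer.getHost(),
                    balancer.getPort(),     // -1 means "no explicit port"
                    original.getPath(),
                    original.getQuery(),
                    original.getFragment());
            return rewritten.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + e.getInput(), e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical managed server URL and load balancer address
        System.out.println(toLoadBalancerUrl(
                "http://soahost1:8001/soa-infra/services/default/HelloWorld/helloworld_client_ep?WSDL",
                "https://soa.example.com"));
    }
}
```

The resulting URL can then be pasted into the test console before clicking Parse.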

Environment specific issues

It is not known whether the issues below are bugs or the result of a misconfiguration of the server. They have been observed on one installation (11.1.1.4) at a specific customer.

Deployment issues

Deployment using Ant scripts does not always roll out a process to both managed servers in the cluster (it works most of the time). I have not found the cause of this issue and I'm not sure whether it is a known bug. It is, however, wise to check that a process has been deployed to all managed servers in the cluster and, if one of them does not contain the process, to repeat the deployment until it does.
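The check-and-redeploy routine can be automated. A minimal Java sketch, assuming you supply two hypothetical hooks: one that runs the Ant deployment once, and a predicate that tells whether a given managed server has the composite. How that predicate works (for example by requesting the composite's WSDL directly on each managed server) is an assumption about your setup, not something verified here.

```java
import java.util.List;
import java.util.function.Predicate;

final class DeploymentVerifier {

    /** Hypothetical hook that runs the (Ant-based) deployment once. */
    interface Deployer {
        void deploy();
    }

    /**
     * Deploys, then checks every managed server; repeats the deployment until all servers
     * report the composite as deployed or maxAttempts is reached.
     * Returns the number of attempts used, or -1 if the composite is still missing somewhere.
     */
    static int deployUntilEverywhere(Deployer deployer,
                                     List<String> managedServers,
                                     Predicate<String> isDeployedOn,
                                     int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            deployer.deploy();
            if (managedServers.stream().allMatch(isDeployedOn)) {
                return attempt;
            }
        }
        return -1;
    }
}
```

Capping the number of attempts keeps a systematic problem (for example a server that is down) from turning into an endless redeploy loop.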

Partition creation

After a partition was created, it was only present on one managed server in the cluster. Removing and recreating the partition solved this issue.

Conclusion

There are some challenges when working in a clustered environment. I will expand this post when I find more issues or things to mind. For a predictable development process, it is wise to use a clustered development environment when production is clustered, so these issues are found and solved in development instead of first being encountered in production.

Sunday, April 15, 2012

Scheduling EDN Business Events using DBMS_SCHEDULER

Introduction

Scheduling BPEL processes to run automatically at a specific time is a common requirement. Lucas Jellema has written an article describing several options for this: http://technology.amis.nl/2006/10/22/starting-a-bpel-process-instance-according-to-a-timed-schedule-in-oracle-bpel-pm/

Quartz can be used to schedule processes from the application server (see http://javaoraclesoa.blogspot.com/2012/02/scheduling-bpel-processes.html). This poses some challenges in a clustered environment, however: I did not get it to work there, although I have not had any issues in a non-clustered environment. In the Quartz example I used, the schedule and the BPEL process to call were hardcoded in the scheduling process (SOAScheduler). This setup is therefore not very flexible, although it can of course be extended to make it more flexible.

In this post I will describe another way to schedule BPEL processes, which in my opinion is usually the better one. It uses DBMS_SCHEDULER and the Event Delivery Network (EDN) API. DBMS_SCHEDULER is a PL/SQL package for scheduling database jobs. Oracle provides a PL/SQL API for firing EDN business events, and this API can be called from a DBMS_SCHEDULER job. JDeveloper makes it easy to work with EDN business events in BPEL.

Compared to the other options, this solution has several benefits:
- loose coupling through the usage of the Event Delivery Network
- no active polling from the BPEL process on a database table is required for this mechanism to work
- JDeveloper supports the EDN in its GUI, which makes development easy
- DBMS_SCHEDULER jobs can be managed from the Enterprise Manager and from PL/SQL (http://docs.oracle.com/cd/B19306_01/server.102/b14231/scheduse.htm#i1022969), which improves flexibility (also for maintenance)
- this mechanism requires relatively little (and simple) code

Implementation

The database implementation described below is not how you would want to do this in a production environment: for demonstration purposes only, I created the package and jobs in the DEV_SOAINFRA schema. In production you probably want to use a separate user for the database jobs and package, and grant that user the scheduling privileges and the execute privilege on the EDN API procedure in the DEV_SOAINFRA schema (edn_publish_event).

To call the API, I need to know exactly what the message of my business event looks like. It is not easy to reconstruct the exact business event from its definition, so I obtained an example business event as follows:

Enable EDN logging

Execute the following script as the DEV_SOAINFRA user:

DECLARE
  ENABLED NUMBER;
BEGIN
  ENABLED := 1;
  EDN_ENABLE_LOGGING(
    ENABLED => ENABLED
  );
END;
/

This causes EDN messages to be logged in the EDN_LOG_MESSAGES table in the same schema.

Create a simple BPEL process to fire a business event

Create a simple BPEL process with an exposed SOAP binding that fires a business event, and start this process from the Enterprise Manager. I used the event definition at: http://dl.dropbox.com/u/6693935/blog/eventdefinition.zip

Get the business event

Open up your favourite PL/SQL editor, go to the DEV_SOAINFRA schema and get a sample business event from EDN_LOG_MESSAGES.


You need the XML part of the log message, which starts with Body. In this example it is:

<business-event xmlns:ns="http://schemas.oracle.com/events/edl/CommonEvents" xmlns="http://oracle.com/fabric/businessEvent">
   <name>ns:CommonDataEvent</name>
   <id>5d52e455-2d5e-479a-bd20-18e8ca86b50d</id>
   <priority>5</priority>
   <content>
      <itemCollectionArray xmlns:ns1="http://test.ms/itemcollections" xmlns="http://test.ms/itemcollections"/>
   </content>
</business-event>

Remove the information you don't need (here, the id element). This event can be stripped to:

<business-event xmlns:ns="http://schemas.oracle.com/events/edl/CommonEvents" xmlns="http://oracle.com/fabric/businessEvent">
   <name>ns:CommonDataEvent</name>
   <priority>5</priority>
   <content>
      <itemCollectionArray xmlns:ns1="http://test.ms/itemcollections" xmlns="http://test.ms/itemcollections"/>
   </content>
</business-event>
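Stripping the logged event can also be done programmatically. A minimal Java sketch using the JDK's DOM API, assuming the XML part has already been copied out of EDN_LOG_MESSAGES into a string; the class name is hypothetical. The lookup is namespace-aware, so only id elements in the http://oracle.com/fabric/businessEvent namespace are removed, and an id element inside the payload (which uses its own namespace) would be left alone.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BusinessEventTemplate {

    private static final String EVENT_NS = "http://oracle.com/fabric/businessEvent";

    /** Removes the id element(s) of a logged business event so it can be reused as a template. */
    static String stripId(String loggedEvent) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // id lives in the default business-event namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(loggedEvent)));

        NodeList ids = doc.getElementsByTagNameNS(EVENT_NS, "id");
        // Iterate backwards: the NodeList is live and shrinks as elements are removed
        for (int i = ids.getLength() - 1; i >= 0; i--) {
            Node id = ids.item(i);
            id.getParentNode().removeChild(id);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The whitespace around the removed element stays behind as an empty line, which does not matter for the event itself.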

Create a procedure to fire the business event

I created the following package in DEV_SOAINFRA to fire the business event. The actual call to the API is edn_publish_event. The 1 specified after the message is the priority; the default is 5.

create or replace
PACKAGE SOA_EVENT AS
 procedure start_event;
END SOA_EVENT;
/


create or replace
PACKAGE BODY SOA_EVENT AS
  procedure start_event AS
  BEGIN
    -- publish the business event via the EDN PL/SQL API (namespace, event name, XML, priority)
    edn_publish_event(
      'http://schemas.oracle.com/events/edl/CommonEvents',
      'CommonDataEvent',
      '<business-event xmlns:ns="http://schemas.oracle.com/events/edl/CommonEvents" xmlns="http://oracle.com/fabric/businessEvent">
   <name>ns:CommonDataEvent</name>
   <priority>5</priority>
   <content>
      <itemCollectionArray xmlns:ns1="http://test.ms/itemcollections" xmlns="http://test.ms/itemcollections"/>
   </content>
</business-event>
',
      1);
    commit;
  END start_event;
END SOA_EVENT;
/

Grant scheduling privileges

To use DBMS_SCHEDULER, you need some grants. You can use the SYSTEM user to grant them:

grant create job to dev_soainfra;
grant manage scheduler to dev_soainfra;

DBMS_SCHEDULER configuration

The part below is specific to this example. You can find more examples at: http://www.apex-at-work.com/2009/06/dbmsscheduler-examples.html

begin
  -- daily from Monday to Sunday at 22:00 (10:00 p.m.)
  dbms_scheduler.create_schedule
  (schedule_name => 'INTERVAL_DAILY_2200',
   start_date => trunc(sysdate)+18/24, -- start today 18:00 (06:00 p.m.)
   repeat_interval => 'FREQ=DAILY; BYDAY=MON,TUE,WED,THU,FRI,SAT,SUN; BYHOUR=22;',
   comments => 'Runtime: every day (Mon-Sun) at 22:00');
end;
/
begin
   -- Call a procedure of a database package
   dbms_scheduler.create_program
   (program_name=> 'START_EVENT',
    program_type=> 'STORED_PROCEDURE',
    program_action=> 'SOA_EVENT.start_event',
    enabled=>true,
    comments=>'Procedure to trigger start event'
    );
end;
/
begin
   -- Connect both dbms_scheduler parts by creating the final job
   dbms_scheduler.create_job
    (job_name => 'JOB_START_EVENT',
     program_name=> 'START_EVENT',
     schedule_name=>'INTERVAL_DAILY_2200',
     enabled=>true,
     auto_drop=>false,
     comments=>'Job to trigger the Start event');
end;
/
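To sanity-check when the INTERVAL_DAILY_2200 schedule fires: because BYDAY lists all seven days, FREQ=DAILY with BYHOUR=22 simply means every day at 22:00, and with a start date of today 18:00 the first run is today at 22:00. A small Java sketch of that arithmetic; the class name is hypothetical, and it is a simplification that ignores time zones and daylight saving time, not how DBMS_SCHEDULER evaluates calendaring expressions.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

final class DailyScheduleSketch {

    /** First occurrence of 'runAt' strictly after 'moment' (same day if still ahead, else the next day). */
    static LocalDateTime nextRunAfter(LocalDateTime moment, LocalTime runAt) {
        LocalDateTime sameDay = moment.toLocalDate().atTime(runAt);
        return sameDay.isAfter(moment) ? sameDay : sameDay.plusDays(1);
    }

    /** The first 'count' runs of a daily schedule that starts at 'startDate'. */
    static List<LocalDateTime> firstRuns(LocalDateTime startDate, LocalTime runAt, int count) {
        List<LocalDateTime> runs = new ArrayList<>();
        // A run exactly at startDate counts, so start searching just before it
        LocalDateTime cursor = startDate.minusNanos(1);
        for (int i = 0; i < count; i++) {
            cursor = nextRunAfter(cursor, runAt);
            runs.add(cursor);
        }
        return runs;
    }

    public static void main(String[] args) {
        // Start date 18:00 on an example day, runs at 22:00
        System.out.println(firstRuns(LocalDateTime.of(2012, 4, 15, 18, 0), LocalTime.of(22, 0), 3));
    }
}
```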

Result

The business event now fires according to the schedule. You can create a BPEL process that listens to this event, so that it is triggered according to the same schedule.

Friday, April 6, 2012

Polling with the DbAdapter in a clustered environment

Introduction

Most customers use a clustered production environment, while the development environment is often not clustered. Developers need to consider several things when the software they build will eventually run in a clustered environment. It would be a shame if software that has been developed, unit tested, system tested and accepted by the users then breaks on the production system.

In this post I will first discuss the DbAdapter and polling. This is not a complete description of all the settings that can influence this behavior; it covers some things I've tried and problems I've encountered.

I used the active-active cluster setup described in:
http://javaoraclesoa.blogspot.com/2012/03/oracle-soa-suite-cluster-part-1.html
http://javaoraclesoa.blogspot.com/2012/03/oracle-soa-suite-cluster-part-2.html

This article is about the DbAdapter. A related error, when dequeuing with the AqAdapter in a clustered environment, is that a message is enqueued once but dequeued more than once. This can occur with 11.2 databases. See http://www.oracle.com/technetwork/middleware/docs/aiasoarelnotesps5-1455925.html for a description of how to fix this. The following has been copied from that document:

Bug: 13729601
Added: 20-February-2012
Platform: All
The dequeuer returns the same message in multiple threads in high concurrency environments when Oracle database 11.2 is used. This means that some messages are dequeued more than once. For example, in Oracle SOA Suite, if Service 1 suddenly raises a large number of business events that are subscribed to by Service 2, duplicate instances of Service 2 triggered by the same event may be seen in an intermittent fashion. The same behavior is not observed with a 10.2.0.5 database or in an 11.2 database with event10852 level 16384 set to disable the 11.2 dequeue optimizations.

Workaround: Perform the following steps:

    Log in to the 11.2 database:
    CONNECT /AS SYSDBA

    Specify the following SQL command in SQL*Plus to disable the 11.2 dequeue optimizations:
    SQL> alter system set event='10852 trace name context forever, level 16384' scope=spfile;

Polling setup

In an active/active cluster configuration, a deployed process has an instance on each managed server (two in this setup), all polling the same table. It is therefore important to consider whether it is a problem if more than one instance picks up the same entry in the table.

I used the following database setup to simulate and log the test:
http://dl.dropbox.com/u/6693935/blog/cluster_test.sql

The script contains three tables:
POLLING_TEST_CLUSTER
- this table is used by the DbAdapter for polling
POLLING_TEST_LOG
- this table logs status changes in POLLING_TEST_CLUSTER (POLLING_TEST_CLUSTER has a before update trigger)
POLLING_TEST_OUTPUT
- a BPEL process reads from POLLING_TEST_CLUSTER and puts entries in this table. This table has a unique constraint on the ID column and uses the same ID as POLLING_TEST_CLUSTER, so if BPEL picks up the same entry from POLLING_TEST_CLUSTER twice, the second insert into POLLING_TEST_OUTPUT causes a unique constraint violation.

I used 'pragma autonomous_transaction' in the logging procedure. Such a procedure fails (with ORA-06519: active autonomous transaction detected and rolled back) if it does not end with an explicit commit.

Next, configure a datasource and the database adapter in the WebLogic Console so you can use them in BPEL. Don't create an XA datasource! It causes problems with autonomous transactions, such as java.sql.SQLException: Cannot call rollback when using distributed transactions. (XA datasources can also cause problems with database links; see http://javaoraclesoa.blogspot.com/2012/02/exception-occured-when-binding-was.html.)

When configuring the DbAdapter, keep in mind that you have to copy the Plan.xml file (the deployment plan for the DbAdapter) to the other managed server if you have not configured shared storage for this file (as suggested in the Enterprise Deployment Guide, http://docs.oracle.com/cd/E17904_01/core.1111/e12036/extend_soa.htm, paragraph 5.21.1). If you don't, the connection factory will not be available on the other managed server.

Polling test

You can download the processes here:
http://dl.dropbox.com/u/6693935/blog/FilePollingTest.zip

I created a small process that inserts a record with status NEW in the POLLING_TEST_CLUSTER table, so it is picked up immediately. I used soapUI (http://www.soapui.org/) to stress test by calling this process a large number of times.

I was able to reproduce the error (a small number of times, under high load) where two instances of the adapter, running on different servers in the cluster, picked up and processed the same message at the same time. I have also seen this happen at a customer.

In my setup this situation causes a unique constraint violation, as shown below:

<bpelFault><faultType>0</faultType><bindingFault xmlns="http://schemas.oracle.com/bpel/extension"><part name="summary"><summary>Exception occured when binding was invoked. Exception occured during invocation of JCA binding: "JCA Binding execute of Reference operation 'insert' failed due to: DBWriteInteractionSpec Execute Failed Exception. insert failed. Descriptor name: [write_textline_DB.PollingTestOutput]. Caused by java.sql.BatchUpdateException: ORA-00001: unique constraint (TESTUSER.POLLING_TEST_OUTPUT_PK) violated . Please see the logs for the full DBAdapter logging output prior to this exception. This exception is considered not retriable, likely due to a modelling mistake. To classify it as retriable instead add property nonRetriableErrorCodes with value "-1" to your deployment descriptor (i.e. weblogic-ra.xml). To auto retry a retriable fault set these composite.xml properties for this invoke: jca.retry.interval, jca.retry.count, and jca.retry.backoff. All properties are integers. ". The invoked JCA adapter raised a resource exception. Please examine the above error message carefully to determine a resolution. </summary></part><part name="detail"><detail>ORA-00001: unique constraint (TESTUSER.POLLING_TEST_OUTPUT_PK) violated </detail></part><part name="code"><code>1</code></part></bindingFault></bpelFault>

This occurred even with NumberOfThreads set to 1 (the default).


Below I describe two possible solutions for this issue, distributed polling and using a reserved value, and my experience with them.

Distributed Polling

This is described more extensively in: http://www.oracle.com/technetwork/database/features/availability/maa-soa-assesment-194432.pdf

The DbAdapter can be configured to do distributed polling.


Distributed polling means that a record is locked by the instance that reads it, and other instances that want to pick up records skip the locked ones. This can, however, cause problems with locks that originate from sources other than the polling processes: records that require processing could be skipped.
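The skip-locking idea can be illustrated with an in-memory analogy: each row has a lock, an instance that cannot get a row's lock skips the row (comparable to SELECT ... FOR UPDATE SKIP LOCKED), and the status is updated while the lock is held. This is a hypothetical Java simulation, not the DbAdapter's implementation; because the status check and the logical delete both happen under the lock here, every row is processed exactly once. A row locked by some other source would be skipped in the same way, which is the drawback described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class SkipLockedPolling {

    /** One row of a hypothetical polling table: a row lock plus a status column. */
    static final class Row {
        final int id;
        final ReentrantLock lock = new ReentrantLock();
        volatile String status = "NEW";

        Row(int id) {
            this.id = id;
        }
    }

    /** One polling cycle of one instance; locked rows are skipped instead of waited for. */
    static List<Integer> poll(List<Row> table, String instance) {
        List<Integer> processed = new ArrayList<>();
        for (Row row : table) {
            if (!row.lock.tryLock()) {
                continue; // locked elsewhere: skip, like SKIP LOCKED
            }
            try {
                if ("NEW".equals(row.status)) {
                    row.status = "PROCESSED_BY_" + instance; // logical delete while holding the lock
                    processed.add(row.id);
                }
            } finally {
                row.lock.unlock();
            }
        }
        return processed;
    }

    /** Runs two polling instances concurrently and returns the ids each one processed. */
    static List<List<Integer>> runTwoInstances(int rowCount) throws InterruptedException {
        List<Row> table = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            table.add(new Row(i));
        }
        List<List<Integer>> results = Collections.synchronizedList(new ArrayList<>());
        Thread server1 = new Thread(() -> results.add(poll(table, "server1")));
        Thread server2 = new Thread(() -> results.add(poll(table, "server2")));
        server1.start();
        server2.start();
        server1.join();
        server2.join();
        return results;
    }
}
```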

Also, by default the DbAdapter invokes a BPEL process asynchronously.


As a result, the lock is released as soon as the DbAdapter is done with the record and the BPEL process has been started. If you want to use this mechanism, this makes a case for using the logical delete provided by the DbAdapter instead of updating the field later in the BPEL process.

However, Oracle does not recommend using distributed polling in combination with logical delete. From the manual (Help button in the JDeveloper wizard): "A better alternative is to set either NumberOfThreads or MarkReservedValue for logical delete or delete strategies."

I tested the same process with distributed polling enabled. A small number of instances still failed with a unique constraint violation, so this mechanism is not 100% safe, although the results were somewhat better.

Unread and reserved value

Unread Value

Setting the Unread value adds a where clause to the polling select query that matches the field against the Unread value. During my test I found that setting the Unread value in the DbAdapter configuration wizard caused my process not to pick up records with that value. I have, however, seen customers use this value successfully to limit the records being picked up.


The help documentation says the following, which made me doubt the purpose of this field: "Unread Value
(Optional) Enter an explicit value to indicate that the row does not need to be read. During polling, this row is skipped."


The image below showed a setting that did work.


Reserved value

The release notes of SOA Suite 11.1.1.4 (https://supporthtml.oracle.com/epmos/faces/ui/km/SearchDocDisplay.jspx?_afrLoop=3308018508015000&type=DOCUMENT&id=1290512.1&displayIndex=3&_afrWindowMode=0&_adf.ctrl-state=y8cqyff7j_134) document the following:

18.1.5.1 Distributed Polling Using MarkReservedValue Disabled by Default
In this release, Oracle recommends that you use the new distributed polling approach based on skip locking. When editing an Oracle Database Adapter service which has a MarkReservedValue set, that value will be removed to enable the new best practice. To use the old distributed polling approach based on a reserved value, select the value from the drop down menu.

The JDeveloper help documents the following for distributed polling (skip locking, as mentioned above): "Distributed Polling. Select this checkbox if distributed polling is required. However, this implementation uses a SELECT FOR UPDATE command. A better alternative is to set either NumberOfThreads or MarkReservedValue for logical delete or delete strategies."


As the screenshots above show, it is possible to set a reserved value.


This reserved value (MarkReservedValue in the *_db.jca file) causes the instance of the process that picks up a record to mark it with an identifier. Records carrying this identifier are skipped by the other polling instances in the cluster.
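The reserved-value approach boils down to claiming a row with a conditional update, so that only one instance can move a row from the unread state to its own reserved value. A hypothetical in-memory Java analogy, in which ConcurrentMap.replace(key, expectedValue, newValue) plays the role of an UPDATE ... WHERE status = 'NEW' that affects at most one row; the status values and the class name are assumptions for illustration, not what the DbAdapter actually writes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ReservedValuePolling {

    /** Status of a row that has not been picked up yet (hypothetical value). */
    static final String UNREAD = "NEW";

    /**
     * One polling cycle of one instance: each row is claimed by atomically replacing
     * UNREAD with this instance's reserved value. Only the instance whose replace
     * succeeds processes the row; the others skip it.
     */
    static List<Integer> poll(ConcurrentMap<Integer, String> statusById, String reservedValue) {
        List<Integer> claimed = new ArrayList<>();
        for (Integer id : statusById.keySet()) {
            if (statusById.replace(id, UNREAD, reservedValue)) { // atomic: at most one winner per row
                claimed.add(id);
            }
        }
        return claimed;
    }

    /** Two instances poll the same hypothetical table concurrently; returns what each one claimed. */
    static List<List<Integer>> runTwoInstances(int rowCount) throws InterruptedException {
        ConcurrentMap<Integer, String> statusById = new ConcurrentHashMap<>();
        for (int id = 0; id < rowCount; id++) {
            statusById.put(id, UNREAD);
        }
        List<List<Integer>> results = Collections.synchronizedList(new ArrayList<>());
        Thread server1 = new Thread(() -> results.add(poll(statusById, "RESERVED_server1")));
        Thread server2 = new Thread(() -> results.add(poll(statusById, "RESERVED_server2")));
        server1.start();
        server2.start();
        server1.join();
        server2.join();
        return results;
    }
}
```

Unlike skip locking, the claim is recorded in the row itself, so it does not depend on how long a lock is held.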

However, when I tried to use this setting (Reserved Value) in 11.1.1.6 (with the Unread value set to ''), the DbAdapter did not pick up any messages. After emptying the reserved value, changing the Read Value and redeploying, it immediately picked up messages again. I'm not sure why it didn't work in this test; my guess is that an additional setting is required. If I find this setting, I will update this post. Also note that the wizard empties the Reserved Value if you go through it again. For the time being, I'll use the skip locking setting (distributed polling).

Singleton DbAdapter

Based on a suggestion made in the comments on this post, there is also the option of configuring the DbAdapter as a singleton using a JCA property.

See the documentation for more information on this:
http://docs.oracle.com/cd/E23943_01/integration.1111/e10231/life_cycle.htm#BABDAFBH

The behavior of this property is described in the following post; http://ayshaabbas.blogspot.nl/2012/11/db-adapter-singleton-behaviour-in-high.html