Wednesday, May 1, 2013

Cleaning up unused namespaces in Oracle SOA 11g BPEL processes by using a Python script

Composites are often created and after creating, they are changed/expanded to implement functionality or bugfixes. When adding new partnerlinks and variables, removing them, importing new XSD, removing them, etc it often occurs that there are namespace definitions inside for example BPEL processes which are no longer relevant because they are not used anymore. This can adversely effect performance/memory usage and increases the chance of errors when XSD's are changed, removed or added (such as inconsistent duplicate namespace definitions).

I took this issue as a nice opportunity to learn myself a bit of Python. Python is a popular scripting language (http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html) and used by several software vendors such as Oracle for Weblogic server; WLST (http://docs.oracle.com/cd/E14571_01/web.1111/e13813/quick_ref.htm) and ESRI (http://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=224&moduleID=0) for GIS related programming. There is an official Python tutorial available on http://docs.python.org/3.3/tutorial/ and I've also used http://www.vogella.com/articles/Python/article.html to learn some basics.

First impression of the Python language
- I like the usage of indentation compared to the use of brackets or end statements
- code completion is far from perfect with the PyDev plug-in for Eclipse when using Python 3.3 (I needed to Google a lot for API documentation)
- even without background in Python, you can quickly get something working after reading some tutorials (although I have some experiences with other scripting languages like PERL, PHP, JavaScript. I usually use PERL for my regular scripting needs).

Implementation

Purpose

I wanted to create a script which would cleanup unused namespaces in XML files. I started with a BPEL file as an example (the same example as used in; http://javaoraclesoa.blogspot.nl/2013/04/soa-suite-ps6-11117-service-loose.html. It can be downloaded here; https://dl.dropboxusercontent.com/u/6693935/blog/HelloWorldTokens.zip). The BPEL file had the following contents;

<?xml version = "1.0" encoding = "UTF-8" ?>
<!--
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  Oracle JDeveloper BPEL Designer
 
  Created: Thu Apr 11 12:23:10 CEST 2013
  Author:  Maarten
  Type: BPEL 2.0 Process
  Purpose: Synchronous BPEL Process
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
-->
<process name="CallHelloWorld"
               targetNamespace="http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld"
               xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
               xmlns:client="http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld"
               xmlns:ora="http://schemas.oracle.com/xpath/extension"
               xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
         xmlns:bpel="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
         xmlns:ns1="http://xmlns.oracle.com/HelloWorld/HelloWorld/HelloWorld">

    <import namespace="http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld" location="CallHelloWorld.wsdl" importType="http://schemas.xmlsoap.org/wsdl/"/>
    <!--
      ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        PARTNERLINKS                                                     
        List of services participating in this BPEL process              
      ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    -->
  <partnerLinks>
    <!--
      The 'client' role represents the requester of this service. It is
      used for callback. The location and correlation information associated
      with the client role are automatically set using WS-Addressing.
    -->
    <partnerLink name="callhelloworld_client" partnerLinkType="client:CallHelloWorld" myRole="CallHelloWorldProvider"/>
    <partnerLink name="HelloWorld" partnerLinkType="ns1:HelloWorld"
                 partnerRole="HelloWorldProvider"/>
  </partnerLinks>

  <!--
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
      VARIABLES                                                       
      List of messages and XML documents used within this BPEL process
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  -->
  <variables>
    <!-- Reference to the message passed as input during initiation -->
    <variable name="inputVariable" messageType="client:CallHelloWorldRequestMessage"/>

    <!-- Reference to the message that will be returned to the requester-->
    <variable name="outputVariable" messageType="client:CallHelloWorldResponseMessage"/>
    <variable name="InvokeHelloWorld_process_InputVariable"
              messageType="ns1:HelloWorldRequestMessage"/>
    <variable name="InvokeHelloWorld_process_OutputVariable"
              messageType="ns1:HelloWorldResponseMessage"/>
  </variables>

  <!--
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
     ORCHESTRATION LOGIC                                              
     Set of activities coordinating the flow of messages across the   
     services integrated within this business process                 
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  -->
  <sequence name="main">

    <!-- Receive input from requestor. (Note: This maps to operation defined in CallHelloWorld.wsdl) -->
    <receive name="receiveInput" partnerLink="callhelloworld_client" portType="client:CallHelloWorld" operation="process" variable="inputVariable" createInstance="yes"/>
    <assign name="Assign1">
      <copy>
        <from>$inputVariable.payload/client:input</from>
        <to>$InvokeHelloWorld_process_InputVariable.payload/ns1:input</to>
      </copy>
    </assign>
    <invoke name="InvokeHelloWorld"
            partnerLink="HelloWorld" portType="ns1:HelloWorld"
            operation="process"
            inputVariable="InvokeHelloWorld_process_InputVariable"
            outputVariable="InvokeHelloWorld_process_OutputVariable"
            bpelx:invokeAsDetail="no"/>
    <assign name="Assign2">
      <copy>
        <from>$InvokeHelloWorld_process_OutputVariable.payload/ns1:result</from>
        <to>$outputVariable.payload/client:result</to>
      </copy>
    </assign>
    <!-- Generate reply to synchronous request -->
    <reply name="replyOutput" partnerLink="callhelloworld_client" portType="client:CallHelloWorld" operation="process" variable="outputVariable"/>
  </sequence>
</process>


The ora namespace was not used in this process but it is specified in the process tag.

lxml

Based on the following I started with lxml; http://lxml.de/api/lxml.etree-module.html#cleanup_namespaces since it had a function to easily clean unused namespaces and on StackOverflow.com there were many posts on lxml. To install lxml for Windows, I had to first download and install it from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

cleanup_namespaces

My first try was the following;

import lxml.etree as et
filename_in='D:\\dev\\HelloWorld\\CallHelloWorld\\CallHelloWorld.bpel'
filename_out='D:\\dev\\HelloWorld\\CallHelloWorld\\CallHelloWorld.bpel.out'
tree = et.parse(filename_in)
et.cleanup_namespaces(tree)
tree.write(filename_out)

In the output I noticed that although my process definition had been cleaned up from

<process name="CallHelloWorld"
               targetNamespace="http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld"
               xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
               xmlns:client="http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld"
               xmlns:ora="http://schemas.oracle.com/xpath/extension"
               xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
         xmlns:bpel="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
         xmlns:ns1="http://xmlns.oracle.com/HelloWorld/HelloWorld/HelloWorld">

To

<process xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable" xmlns:bpelx="http://schemas.oracle.com/bpel/extension" name="CallHelloWorld" targetNamespace="http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld">

Several namespaces were removed which were in use such as the ns1 namespace which was used in a partnerlink definition as part of an attribute value;

<partnerLink name="HelloWorld" partnerLinkType="ns1:HelloWorld" partnerRole="HelloWorldProvider"/>

My conclusion was that using prebuild functions like the above would most likely not help solve this problem. I did not find a way to limit the functionality to specific namespaces.

Determining used namespaces

I tried a different approach; determine namespaces used in the root element and try to find them on different locations in the BPEL file. Then rewriting the root element.

import lxml.etree as et
import copy
filename_in='D:\\dev\\HelloWorld\\CallHelloWorld\\CallHelloWorld.bpel'
filename_out='D:\\dev\\HelloWorld\\CallHelloWorld\\CallHelloWorld.bpel.out'
tree = et.parse(filename_in)
root=tree.getroot()
nsmap=root.nsmap
nsmapnew= copy.deepcopy(nsmap)

#print (nsmap.values())
namespaces=set(nsmap.values())
print ("Namespaces found: "+str(len(nsmap)))
for nsitem in nsmap:
    found=0
    nscount=0;
    if nsitem != None:
        print ("Processing prefix: "+nsitem+" Namespace: "+nsmap.get(nsitem))
        #processing all elements
        walkAll = tree.getiterator()
        for elt in walkAll:
            #check element
            eltns=elt.xpath('namespace-uri(/*)')
            if eltns==nsmap.get(nsitem):
                found=1
                #print("Found namespace as element namespace")
                break
            if str(elt.text).find(nsitem+":") != -1:
                found=1
                #print("Found prefix as part of element text")
                break
            #check attributes
            for attribute in elt.attrib:
                if attribute.startswith("{"+nsmap.get(nsitem)+"}"):
                    found=1
                    #print ("Found namespace as attribute name namespace")
                    break
                if str(elt.attrib[attribute]).find(nsitem+":") != -1:
                    found=1
                    #print ("Found prefix as part of attribute value")
                    break
    else:
        #default namespace not removing
        found=1
    if found==0:
        print("Not found")
        del nsmapnew[nsitem]
print ("Namespaces remaining: "+str(len(nsmapnew)))

new_root = et.Element(root.tag, attrib=root.attrib,nsmap=nsmapnew)
new_root[:] = root[:]

#to add the top level comment
try:
    firstcomment=root.getprevious()
    new_root.addprevious(firstcomment)
except:
    None

tree = et.ElementTree(new_root)

tree.write(filename_out, xml_declaration=True, encoding='utf-8',pretty_print=True) 


This rewrote my XML like I wanted it to. Even if namespaces were used in XPATH expressions in the BPEL file, they were being recognized as being used. One drawback however was the namespace prefix which was added for all the subelements even though these elements were in the default namespace. I considered using a transformation to fix this. This would however remove the comments from the file so I decided not to. The script output was as followed;

Namespaces found: 6
Processing prefix: ns1 Namespace: http://xmlns.oracle.com/HelloWorld/HelloWorld/HelloWorld
Processing prefix: ora Namespace: http://schemas.oracle.com/xpath/extension
Not found
Processing prefix: client Namespace: http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld
Processing prefix: bpel Namespace: http://docs.oasis-open.org/wsbpel/2.0/process/executable
Processing prefix: bpelx Namespace: http://schemas.oracle.com/bpel/extension
Namespaces remaining: 5


The created output file was as followed;

<?xml version='1.0' encoding='UTF-8'?>
<!--
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  Oracle JDeveloper BPEL Designer
 
  Created: Thu Apr 11 12:23:10 CEST 2013
  Author:  Maarten
  Type: BPEL 2.0 Process
  Purpose: Synchronous BPEL Process
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
-->
<bpel:process xmlns:ns1="http://xmlns.oracle.com/HelloWorld/HelloWorld/HelloWorld" xmlns:client="http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld" xmlns:bpel="http://docs.oasis-open.org/wsbpel/2.0/process/executable" xmlns:bpelx="http://schemas.oracle.com/bpel/extension" xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable" name="CallHelloWorld" targetNamespace="http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld"><bpel:import namespace="http://xmlns.oracle.com/HelloWorld/CallHelloWorld/CallHelloWorld" location="CallHelloWorld.wsdl" importType="http://schemas.xmlsoap.org/wsdl/"/>
    <!--
      ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        PARTNERLINKS                                                     
        List of services participating in this BPEL process              
      ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    -->
  <bpel:partnerLinks>
    <!--
      The 'client' role represents the requester of this service. It is
      used for callback. The location and correlation information associated
      with the client role are automatically set using WS-Addressing.
    -->
    <bpel:partnerLink name="callhelloworld_client" partnerLinkType="client:CallHelloWorld" myRole="CallHelloWorldProvider"/>
    <bpel:partnerLink name="HelloWorld" partnerLinkType="ns1:HelloWorld" partnerRole="HelloWorldProvider"/>
  </bpel:partnerLinks>

  <!--
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
      VARIABLES                                                       
      List of messages and XML documents used within this BPEL process
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  -->
  <bpel:variables>
    <!-- Reference to the message passed as input during initiation -->
    <bpel:variable name="inputVariable" messageType="client:CallHelloWorldRequestMessage"/>

    <!-- Reference to the message that will be returned to the requester-->
    <bpel:variable name="outputVariable" messageType="client:CallHelloWorldResponseMessage"/>
    <bpel:variable name="InvokeHelloWorld_process_InputVariable" messageType="ns1:HelloWorldRequestMessage"/>
    <bpel:variable name="InvokeHelloWorld_process_OutputVariable" messageType="ns1:HelloWorldResponseMessage"/>
  </bpel:variables>

  <!--
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
     ORCHESTRATION LOGIC                                              
     Set of activities coordinating the flow of messages across the   
     services integrated within this business process                 
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  -->
  <bpel:sequence name="main">

    <!-- Receive input from requestor. (Note: This maps to operation defined in CallHelloWorld.wsdl) -->
    <bpel:receive name="receiveInput" partnerLink="callhelloworld_client" portType="client:CallHelloWorld" operation="process" variable="inputVariable" createInstance="yes"/>
    <bpel:assign name="Assign1">
      <bpel:copy>
        <bpel:from>$inputVariable.payload/client:input</bpel:from>
        <bpel:to>$InvokeHelloWorld_process_InputVariable.payload/ns1:input</bpel:to>
      </bpel:copy>
    </bpel:assign>
    <bpel:invoke name="InvokeHelloWorld" partnerLink="HelloWorld" portType="ns1:HelloWorld" operation="process" inputVariable="InvokeHelloWorld_process_InputVariable" outputVariable="InvokeHelloWorld_process_OutputVariable" bpelx:invokeAsDetail="no"/>
    <bpel:assign name="Assign2">
      <bpel:copy>
        <bpel:from>$InvokeHelloWorld_process_OutputVariable.payload/ns1:result</bpel:from>
        <bpel:to>$outputVariable.payload/client:result</bpel:to>
      </bpel:copy>
    </bpel:assign>
    <!-- Generate reply to synchronous request -->
    <bpel:reply name="replyOutput" partnerLink="callhelloworld_client" portType="client:CallHelloWorld" operation="process" variable="outputVariable"/>
  </bpel:sequence>
</bpel:process>


The ora namespace had been removed from the process attribute. The file had also become smaller. The ora namespace is however a default Oracle namespace and to make sure I didn't break anything, I tried to compile and deploy the altered process. This was successful. I also tested it with more complex processes. The resulting BPEL file was still fully functional.

Possible followups could be;

- expand the script and allow recursively processing of multiple files and filetypes
- determine used and unused namespaces for every element, not just the root element
- link the results of unused namespaces within a project to XSD's which could also be removed from a project if they are not used by any file in the project
- if for example problems start to occur with XPATH expressions after removing default Oracle namespaces; exclude specific namespaces from cleaning
- remove the namespace prefix for the default namespace