vrijdag 9 november 2012

SOA Suite Cluster deployments and loadbalancers

When using Ant tasks to deploy to an Oracle SOA Suite cluster certain issues can occur. You deploy usually to one Managed Server and the cluster will propagate the deployment to the other nodes in the cluster. Often before this happens, the Ant script continues with the deployment of a next process (usually deployment is scripted this way. see for example http://biemond.blogspot.nl/2009/09/deploy-soa-suite-11g-composite.html). When resources are accessed using a loadbalancer, the loadbalancer can refer for certain resources to a managed server (node in the cluster) where the process is not deployed yet. This can cause deployment issues. You can not work around this by separately deploying to the nodes of the cluster. See http://www.oracle.com/technetwork/topics/soa/oracle-soa-suite-ha-faq-1-131459.pdf section 3.. Usually re-executing the deployment script is a workaround since the process is then usually already present on both cluster nodes.

Also when starting a managed server, a similar problem can arise. Take for example a cluster which consists of two managed servers. The first server is shutdown and started (changing certain settings requires a restart of the server) and when the first server has state 'Running' the second server is shutdown and started, the first server can get into problems loading composites since the loadbalancer might refer to resources on the second server when those resources are not loaded yet. Inconsistencies can arise between the two nodes; for example processes which are valid on the first node and invalid on the second.

Also see http://docs.oracle.com/cd/E28271_01/admin.1111/e10226/soainfra_config.htm#BHCCIJAE for this behavior; After the SOA Infrastructure is started, it may not be completely initialized to administer incoming requests until all deployed composites are loaded.I have not found a way to change this behavior. What you would want is that the loadbalancer does not refer to a server when it is not completely initialized.

The description below is the path I took to eventually find the best solution to both described issues above. If you're just interested in the solution, look at the bottom of this post and skip the 'Failed experiments' section. If you're interested in a bit of background and would like to avoid certain pitfalls, I'd suggest reading the entire post. The issue is that the loadbalancer should be able to get an accurate reading concerning the availability of a server.

How to deal with the loadbalancer

If you have a smart loadbalancer, you can configure what it uses as probe to determine if a server is running. The Weblogic Server state is insufficient for that when running Oracle SOA Suite as described above.

Creating an efficient probe for the loadbalancer is not as straightforward as it might seem. The loadbalancer looks at individual nodes to determine if it is running. You can create a probe to determine loaded processes. This is however insufficient since you will not be able to tackle the problem of deployment to one node and at a later time to the other node; both nodes will show lists of loaded processes, however one node will show less processes.

The Locator class (http://docs.oracle.com/cd/E21043_01/apirefs.1111/e10659/oracle/soa/management/facade/Locator.html) can be used to obtain a list of composites. A Locator instance can be created from within a deployed composite or by using JNDI connection properties. Both are however server instance specific and you are not able to compare the two nodes.

It is possible to obtain the managed server instances from the admin server by using MBeans. See; http://middlewaremagic.com/weblogic/?p=913. I've adapted this to allow remote connections for local testing and local connections from BPEL. When calling the code from BPEL I've used a Java embedding activity. Required libraries (JDeveloper) for compilation are;

- Weblogic 10.3 remote client
- SOA Runtime
- SOA Designtime
- BPEL Runtime

Failed experiments

'Failed experiments' would suggest I tried to achieve certain functionality but couldn't get the code to perform what I wanted. In this case the code produced does what I wanted to achieve, however my way of thinking was wrong. The below code examples look at the server state based on loaded composites and comparing managed servers. This is not an accurate measure of server availability and is not usuable by a loadbalancer. The code can however be used in other usecases. That's why I included it in this post.

Remote client for local testing

The getServerUpRemote method in the code below returns true or false. It asks the AdminServer for all Managed Servers which are up. It then goes to the managed servers and asks for the Composites running in those servers and compares the lists. True means everything is ok. False means that one or more managed servers in the cluster have differences in loaded composites, states or versions.

In order to get the managed servers I first access the AdminServer by using JMX in order to obtain a MBeanServerConnection. This connection can be used to determine the Managed Servers. Next, I use  JNDI to find the Locator class on the Managed Servers in order to get the list of Composites.

See for the code; https://dl.dropbox.com/u/6693935/blog/remoteconnections.txt

Local connection from deployed composite/servlet

Of course the loadbalancer would not be able to access my local JDeveloper installation so I wanted to deploy the code to run on the server. Also I wanted to avoid having to use server credentials. I first tried wrapping the code in a servlet and deploying the servlet on the AdminServer. This was not succesful since I had difficulties accessing resources on the Managed Servers without providing credentials. I encountered for example the following error;

java.security.PrivilegedActionException: javax.naming.NameNotFoundException: Unable to resolve 'FacadeFinderBean'. Resolved ''; remaining name 'FacadeFinderBean'

When I deployed the code in a BPEL process, there would be difficulties to access AdminServer resources and resources on other Managed Servers in the cluster.

To avoid above issues, I split the code in two parts. A servlet running on the AdminServer and a BPEL process running on the Managed Servers. The servlet would be able to provide the managed servers and the BPEL processes would be able to provide information on their runtime. The servlet could call the processes on the managed servers and compare the results.

The below code also illustrates how to call a webservice from Java and deal with the result without the requirement to generate webservice specific Java proxy classes. I used http://www.coderanch.com/t/206857/sockets/java/Http-Post-XML-example as an example.

See for the code;

https://dl.dropbox.com/u/6693935/blog/localconnections.txt

This however was still insufficient because I would need to ask the AdminServer via the servlet if the Managed Servers are up and all processes are in the same state. A loadbalancer would be unable to determine which node is up, only if the state as a whole is safe. This total state can be used during an installation to check if it is already time to deploy the next process or if a wait is required to propagate the deployment to the other nodes. This does not fix the node start-up issues. If I proxy the AdminServer servlet on the managed servers, during deployment, both managed servers would seem to be down if the responses of loaded processes are not equal. This can cause the loadbalancer to not send requests to a server which is actually up. When I return the Ok status for the managed server which has most processes loaded, I still am not sure that server is fully started.

The right path

The magic bean I couldn't access

What I needed to solve the issues in the introduction is an MBean on the Managed Servers indicating is the Managed Server has loaded all processes or is still 'initializing'.


Eventually I found this MBean; oracle.soa.config:Application=soa-infra,j2eeType=CompositeLifecycleConfig,name=soa-infra

It can be accessed using an MBeanServerConnection found by using JMX looking for weblogic.management.mbeanservers.runtime. I also used http://www.albinsblog.com/2011/10/oracle-soa-11g-getting-default-version.html#.UJvMYoaJXsc for a bit of the JMX part.

This is illustrated in the following code. To get this to compile I used the following libraries;
- Weblogicv 10.3 remote client
- Servlet runtime
- wljmxclient.jar and wlclient.jar (wlserver_10.3/server/lib). these are required for the t3 protocol used when connecting from a client (code in public static void main). Else you will encounter 'java.net.MalformedURLException: Unsupported protocol: t3' (see; http://code.google.com/p/wlsagent/issues/detail?id=3)

The below code illustrates how to get to the MBean you want. I followed the following procedure to determine how to get there;

- browsed the MBeans using Fusion Middleware Control to find the correct MBean
- used the post mentioned before to get to the correct MBeanServerConnection (this differs from the MBeanServerConnection used in the 2 code samples earlier in this post!)
- illustrated in comments at the end of the code; used the following to obtain the correct ObjectName; Set<ObjectName> myObjs = myCon.queryNames(new ObjectName("*:j2eeType=CompositeLifecycleConfig,*"), null);
- determined the class of the attribute (in this case SOAPlatformStatus) I wanted. it appeared to be a javax.management.openmbean.CompositeDataSupport. I did this like; connection.getAttribute(getSOAInfraServiceName(),"SOAPlatformStatus").getClass().toString()
- get the value from the obtained object by using the key ('isReady' in this case)

package ms.testapp.soa.utils;

import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

import java.net.MalformedURLException;

import java.util.Hashtable;

import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DetermineServerStatus extends HttpServlet {
    public DetermineServerStatus() {
        super();
    }

    private static MBeanServerConnection getRemoteConnection(String hostname,
                                                             String portString,
                                                             String username,
                                                             String password) throws IOException,
                                                                                     MalformedURLException {
        JMXConnector connector = null;
        MBeanServerConnection connection = null;
        //System.out.println("ServerDetails---Started in initConnection");
        String protocol = "t3";
        Integer portInteger = Integer.valueOf(portString);
        int port = portInteger.intValue();
        String jndiroot = "/jndi/";

        String mserver = "weblogic.management.mbeanservers.runtime";
        //String mserver = "weblogic.management.mbeanservers.domainruntime";
        JMXServiceURL serviceURL =
            new JMXServiceURL(protocol, hostname, port, jndiroot + mserver);
        Hashtable h = new Hashtable();
        h.put(Context.SECURITY_PRINCIPAL, username);
        h.put(Context.SECURITY_CREDENTIALS, password);
        h.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES,
              "weblogic.management.remote");
        connector = JMXConnectorFactory.connect(serviceURL, h);
        connection = connector.getMBeanServerConnection();
        return connection;
    }

    private static MBeanServerConnection getLocalConnection() throws NamingException {
        InitialContext ctx = new InitialContext();
        MBeanServer server = (MBeanServer)ctx.lookup("java:comp/env/jmx/runtime");
            //(MBeanServer)ctx.lookup("java:comp/env/jmx/domainRuntime");
        return server;
    }

    private static ObjectName getSOAInfraServiceName() {
        ObjectName service = null;
        try {
            service =
                    new ObjectName("oracle.soa.config:Application=soa-infra,j2eeType=CompositeLifecycleConfig,name=soa-infra");
        } catch (MalformedObjectNameException e) {
            throw new AssertionError(e.getMessage());
        }
        return service;
    }

    private static javax.management.openmbean.CompositeDataSupport getSOAPlatformStatusObjects(MBeanServerConnection connection) throws Exception {
        //System.out.println(connection.getAttribute(getSOAInfraServiceName(),
        //                                           "SOAPlatformStatus").getClass().toString());
        return (javax.management.openmbean.CompositeDataSupport)connection.getAttribute(getSOAInfraServiceName(),
                                                                                        "SOAPlatformStatus");
    }

    private static String getSOAPlatformStatus(MBeanServerConnection connection) {
        try {
            return ((Boolean)getSOAPlatformStatusObjects(connection).get("isReady")).toString();
        } catch (Exception e) {
            //in case bean not accessible -> server not ready
            return stackTraceToString(e);
        }
    }

    private static String stackTraceToString(Throwable e) {
        String retValue = null;
        StringWriter sw = null;
        PrintWriter pw = null;
        try {
            sw = new StringWriter();
            pw = new PrintWriter(sw);
            e.printStackTrace(pw);
            retValue = sw.toString();
        } finally {
            try {
                if (pw != null)
                    pw.close();
                if (sw != null)
                    sw.close();
            } catch (IOException ignore) {
                //System.out.println(stackTraceToString(e));
            }
        }
        return retValue;
    }

    public void doPost(HttpServletRequest request,
                       HttpServletResponse response) throws ServletException,
                                                            IOException {
        doGet(request, response);
    }

    public void doGet(HttpServletRequest request,
                      HttpServletResponse response) throws ServletException,
                                                           IOException {
        PrintWriter out = response.getWriter();
        try {
            out.println(getServerStatusLocal());
        } catch (Exception e) {
            out.println(stackTraceToString(e));
        }
    }

    public String getServerStatusLocal() {
        MBeanServerConnection myCon;
        try {
            myCon = getLocalConnection();
        } catch (NamingException e) {
            //no MBean connection; server not ready
            return stackTraceToString(e);
        }
        return getSOAPlatformStatus(myCon);
    }
   
    public static void main(String[] args) {
        String host = "192.168.178.17";
        String port = "7001";
        String user = "weblogic";
        String password = "xxx";

        MBeanServerConnection myCon;
        try {
            myCon = getRemoteConnection(host, port, user, password);
            //Set<ObjectName> myObjs = myCon.queryNames(new ObjectName("*:j2eeType=CompositeLifecycleConfig,*"), null);
            //for (ObjectName myObj : myObjs) {
            //  System.out.println(myObj.getCanonicalName());
            //}
            System.out.println(getSOAPlatformStatus(myCon));
        } catch (Exception e) {
            System.out.println(stackTraceToString(e));
        }
    }
}


This servlet I could deploy to the managed servers to get the status. This however... failed because of;

javax.management.RuntimeMBeanException: java.lang.SecurityException: MBean attribute access denied.
  MBean: oracle.soa.config:name=soa-infra,j2eeType=CompositeLifecycleConfig,Application=soa-infra
  Getter for attribute SOAPlatformStatus
  Detail: Access denied. Required roles: Admin, Operator, Monitor, executing subject: principals=[]
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:856)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:869)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:670)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)


So I thought I could maybe deploy this as part of a BPEL process. When running in another context, these SecurityExceptions might not be raised. When refactoring the code and testing the BPEL process however I still got the same exception.

Accessing the MBean

I found; http://docs.oracle.com/cd/E11035_01/wls100/security/thin_client.html#wp1046345 and got back to my servlet code. Then I added a weblogic.xml deployment descriptor, a role to web.xml and a role mapping to weblogic.xml and it worked! The code can be downloaded from; https://dl.dropbox.com/u/6693935/blog/DetermineServerStatus.zip

The BPEL code is for reference purposes only. I haven't applied the role assignment to the BPEL code. The servlet has an additional benefit that it is more easily tweakable to the response wanted by the loadbalancer


Below a screenshot of the result when the server is fully started.


We performed the following test to confirm the expected behavior.
- first deploy the servlet to the managed servers in a cluster
- confirm the servlet output for the managed servers is true (by accessing them directly, not via a loadbalancer)
- shutdown one node and confirm the servlet cannot be accessed. confirm the other node still replies 'true'
- start the node and wait till it has state Running
- confirm the servlet output for that node is false and for the other node is true
- wait a while (until all processes are loaded)
- confirm that for both nodes the servlet output is true