Friday, February 7, 2014

JAXB and XML generation; missing elements

The Java Architecture for XML Binding, JAXB (https://jaxb.java.net/), is a framework often used in Java to make working with XML easier. Most modern Java IDEs (such as JDeveloper, Eclipse, NetBeans) support direct generation of Java classes from an XML schema definition. JAXB allows Java objects to be marshalled to XML and XML to be unmarshalled to Java objects.




If you generate Java classes from an XML schema definition, create an instance of a generated class and marshal it to XML, the XML will not in all cases be compliant with the schema the classes were generated from. This can cause problems if correctness is assumed just because the framework is used. Validation can help detect such problems, and there are several ways to work around them.

Missing elements

To illustrate this, I have created a simple schema. A person has two elements: a first name and a last name.


 <?xml version="1.0" encoding="windows-1252" ?>  
 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"  
       xmlns="http://www.mstest.ms" targetNamespace="http://www.mstest.ms"  
       elementFormDefault="qualified">  
  <xsd:element name="Person">  
   <xsd:complexType>  
    <xsd:sequence>  
     <xsd:element name="FirstName" type="xsd:string"/>  
     <xsd:element name="LastName" type="xsd:string"/>  
    </xsd:sequence>  
   </xsd:complexType>  
  </xsd:element>  
 </xsd:schema>  

Next I generate Java code from this in JDeveloper, as can be seen in the first screenshot of this post. Just right-click the XSD and tell it to Generate a JAXB 2.0 Content Model.

The generated Java code is relatively simple for this simple schema. It consists of a Person.java class, some supporting classes and a property file.

I created a test class, TestPerson.java. This class creates a Person object and prints the XML generated from this object to the console.

 package ms.mstest.model;  
 import javax.xml.bind.JAXBContext;  
 import javax.xml.bind.Marshaller;  
 public class TestPerson {  
   public static void main(String[] args) {  
     JAXBContext jaxbContext;  
     ObjectFactory myObjFact = new ObjectFactory();  
     Person myPerson = myObjFact.createPerson();  
     try {  
       jaxbContext = JAXBContext.newInstance(Person.class);  
       Marshaller jaxbMarshaller;  
       jaxbMarshaller = jaxbContext.createMarshaller();  
       jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);  
       jaxbMarshaller.marshal(myPerson, System.out);  
     } catch (Exception e) {  
       e.printStackTrace(); // print the stack trace of the caught exception
     }  
   }  
 }  

The output is the following:

 <?xml version="1.0" encoding="UTF-8"?>  
 <Person xmlns="http://www.mstest.ms"/>

This output is not compliant with the schema from which the class was generated: the required FirstName and LastName elements are missing.

Validating

There are several examples available of how to validate XML against a schema, for instance at http://docs.oracle.com/javase/7/docs/api/javax/xml/validation/package-summary.html.

I have used the following code to validate my message (Java can sometimes require quite some code for relatively simple functionality).

 package ms.mstest.model;  
 import java.io.File;  
 import java.io.PrintWriter;  
 import java.io.StringWriter;  
 import java.util.LinkedList;  
 import java.util.List;  
 import javax.xml.XMLConstants;  
 import javax.xml.bind.JAXBContext;  
 import javax.xml.bind.Marshaller;  
 import javax.xml.bind.util.JAXBSource;  
 import javax.xml.validation.Schema;  
 import javax.xml.validation.SchemaFactory;  
 import javax.xml.validation.Validator;  
 import org.xml.sax.ErrorHandler;  
 import org.xml.sax.SAXException;  
 import org.xml.sax.SAXParseException;  
 public class TestPerson {  
   public static void main(String[] args) {  
     JAXBContext jaxbContext;  
     ObjectFactory myObjFact = new ObjectFactory();  
     Person myPerson = myObjFact.createPerson();  
     try {  
       jaxbContext = JAXBContext.newInstance(Person.class);  
       JAXBSource source = new JAXBSource(jaxbContext, myPerson);  
       Schema mySchema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(new File("helloworld.xsd"));  
       Validator validator = mySchema.newValidator();  
       final List<SAXParseException> exceptions = new LinkedList<SAXParseException>();  
       validator.setErrorHandler(new ErrorHandler() {  
           @Override  
           public void warning(SAXParseException exception) throws SAXException {  
             exceptions.add(exception);  
           }  
           @Override  
           public void fatalError(SAXParseException exception) throws SAXException {  
             exceptions.add(exception);  
           }  
           @Override  
           public void error(SAXParseException exception) throws SAXException {  
             exceptions.add(exception);  
           }  
         });  
       validator.validate(source);  
       for (SAXParseException myEx : exceptions) {  
         System.out.println(myEx);  
       }  
       Marshaller jaxbMarshaller;  
       jaxbMarshaller = jaxbContext.createMarshaller();  
       jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);  
       jaxbMarshaller.marshal(myPerson, System.out);  
     } catch (Exception e) {  
       System.out.println(getStackTrace(e));  
     }  
   }  
   static String getStackTrace(Throwable t) {  
     StringWriter sw = new StringWriter();  
     PrintWriter pw = new PrintWriter(sw, true);  
     t.printStackTrace(pw);  
     pw.flush();  
     sw.flush();  
     return sw.toString();  
   }  
 }  

I've implemented the ErrorHandler interface myself since I wanted to see as many errors as possible instead of stopping at the first one (see http://stackoverflow.com/questions/11131662/how-to-validate-xml-against-xsd-and-get-all-errors). Also, since I wanted to see the complete stack trace when something goes wrong, I copied a getStackTrace method (http://stackoverflow.com/questions/1069066/get-current-stack-trace-in-java).

The output was as follows:

 org.xml.sax.SAXParseException: cvc-complex-type.2.4.b: The content of element 'Person' is not complete. One of '{"http://www.mstest.ms":FirstName}' is expected.  
 <?xml version="1.0" encoding="UTF-8"?>  
 <Person xmlns="http://www.mstest.ms"/>  

To find errors during development, I often use Notepad++ with the XML Tools plugin as a quick, free solution.



The minOccurs attribute is 1 by default for elements, so the FirstName and LastName elements should both be present in the output XML.

Solutions

The specific problem illustrated above with the missing elements can be solved in the Java code and by altering the XSD.

Solving the problem in the Java code

The Java code using the JAXB generated classes can be altered in order to produce correct XML. The Java code generation by JAXB cannot easily be customized. The JAXB generated classes can themselves also be altered in order to create required child elements. This however is tricky since this code might be regenerated after XSD changes.

If I add two lines to the code creating the Person object:

 myPerson.setFirstName(null);  
 myPerson.setLastName("");  

The generated XML ends up as follows:

 <?xml version="1.0" encoding="UTF-8"?>  
 <Person xmlns="http://www.mstest.ms">  
   <LastName></LastName>  
 </Person>  

Thus, if a Java object representing an element in an XML hierarchy is null, the element does not get generated and as a result the generated XML is not schema compliant. If I assign an empty string to an element, it does get generated. So if I make sure that all XML fields which would otherwise have been null are set to an empty string, the XML contains every required element. This also holds for complex elements: if the element is null, it is missing; if the element's constructor is called and the result is assigned, a tag is generated.

Automating this by walking through the entire XML tree and setting all null fields to an empty string might not be desirable. Performance can be a consideration here, and the difference between null and "" might be relevant to the receiver of the message.
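The idea of replacing null fields with empty strings can be sketched with plain reflection. This is only an illustration under stated assumptions: the nested Person class is a hypothetical stand-in for the JAXB generated class (same field names, no annotations), fillNullStrings is a name made up for this sketch, and only the String fields declared directly on the object are handled. Nested complex elements would need a recursive walk.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class NullToEmpty {

    // Hypothetical stand-in for the generated Person class (assumption:
    // the real class has protected String fields with these names).
    static class Person {
        protected String firstName;
        protected String lastName;
    }

    // Sets every null, non-static String field declared on the bean's own
    // class to "" and returns how many fields were changed.
    static int fillNullStrings(Object bean) {
        int changed = 0;
        try {
            for (Field field : bean.getClass().getDeclaredFields()) {
                if (field.getType() != String.class
                        || Modifier.isStatic(field.getModifiers())) {
                    continue; // only instance fields of type String
                }
                field.setAccessible(true);
                if (field.get(bean) == null) {
                    field.set(bean, "");
                    changed++;
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("could not access field", e);
        }
        return changed;
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.lastName = "Smith"; // already set, must stay untouched
        int changed = fillNullStrings(person);
        System.out.println(changed + " field(s) set to an empty string");
    }
}
```

Calling such a helper just before marshalling keeps the generated classes untouched, at the cost of a reflective pass over every object and of losing the distinction between null and "".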

Solving the problem by altering the schema definition

We can change the XSD so that the Java classes generated from it produce XML which is schema compliant. It is of course curious to design an XSD around the code generation behavior of a framework, but this way we can make sure that when we regenerate the Java classes from the XSD file, we still get Java code which produces compliant XML.

 <?xml version="1.0" encoding="windows-1252" ?>  
 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"  
       xmlns="http://www.mstest.ms" targetNamespace="http://www.mstest.ms"  
       elementFormDefault="qualified">  
  <xsd:element name="Person">  
   <xsd:complexType>  
    <xsd:sequence>  
     <xsd:element name="FirstName" type="xsd:string" minOccurs="0"/>  
     <xsd:element name="LastName" type="xsd:string" minOccurs="0"/>  
    </xsd:sequence>  
   </xsd:complexType>  
  </xsd:element>  
 </xsd:schema>  

Setting the minOccurs attribute of an element to 0 in the XSD allows the element to be missing. The XML produced by the generated Java classes is then consistent with the XSD, even when fields are null.

This is of course not an option when for example the schema is provided by a third party and cannot easily be altered (when for example using a webservice and the WSDL is provided).
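The effect of minOccurs on the empty Person element can be checked without JAXB, using only javax.xml.validation. The sketch below is an illustration, not the code used for this post: the class and method names (MinOccursCheck, personSchema, validate) are made up, and the schema is rebuilt as a string with minOccurs written out explicitly, which for the value 1 is equivalent to the original schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class MinOccursCheck {

    // The Person schema from this post; minOccurs=1 equals the default.
    static String personSchema(int minOccurs) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
            + " xmlns='http://www.mstest.ms' targetNamespace='http://www.mstest.ms'"
            + " elementFormDefault='qualified'>"
            + "<xsd:element name='Person'><xsd:complexType><xsd:sequence>"
            + "<xsd:element name='FirstName' type='xsd:string' minOccurs='" + minOccurs + "'/>"
            + "<xsd:element name='LastName' type='xsd:string' minOccurs='" + minOccurs + "'/>"
            + "</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>";
    }

    // Returns the validation messages; an empty list means the XML is valid.
    // Malformed XML or a broken schema ends up as an IllegalStateException.
    static List<String> validate(String xsd, String xml) {
        final List<String> messages = new ArrayList<>();
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) { messages.add(e.getMessage()); }
                @Override
                public void error(SAXParseException e) { messages.add(e.getMessage()); }
                @Override
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("validation could not be completed", e);
        }
        return messages;
    }

    public static void main(String[] args) {
        String emptyPerson = "<Person xmlns='http://www.mstest.ms'/>";
        System.out.println("minOccurs=1: " + validate(personSchema(1), emptyPerson));
        System.out.println("minOccurs=0: " + validate(personSchema(0), emptyPerson));
    }
}
```

With minOccurs set to 1 the list should contain a message about incomplete content of Person; with 0 it should be empty.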

Conclusion

JAXB is an often used framework. It can be tempting to think that when using JAXB classes, your produced XML is schema compliant. This is not guaranteed. XML message validations can help detect problems. For performance reasons, it might be useful to validate messages in development and test environments and if the code has been thoroughly tested, to disable the validations in order to gain performance in a production environment. If you could trust the generated Java classes to generate schema compliant XML, the validation of the outgoing/incoming XML would of course not be needed at all.
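Switching validation on in development and test and off in production could look roughly like the sketch below. It is an assumption-laden illustration: OutgoingXmlCheck and firstProblem are hypothetical names, and how the enabled flag is supplied (for example through a system property read with Boolean.getBoolean) is left to the application. The compiled Schema is kept and reused, since Schema objects are thread safe, while a new Validator is created per message.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OutgoingXmlCheck {

    // null means validation is switched off (for example in production)
    private final Schema schema;

    OutgoingXmlCheck(String xsd, boolean enabled) {
        try {
            this.schema = enabled
                ? SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                      .newSchema(new StreamSource(new StringReader(xsd)))
                : null;
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
    }

    // Returns null when the XML is valid or validation is disabled,
    // otherwise the message of the first validation error.
    String firstProblem(String xml) {
        if (schema == null) {
            return null; // validation disabled: no cost, no check
        }
        try {
            // Without a custom ErrorHandler the first error is thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException("XML could not be read", e);
        }
    }
}
```

The trade-off is the one described above: with the check disabled, invalid messages go out unnoticed, so this only makes sense once the code has been thoroughly tested.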

In this blog post a specific problem was examined: missing elements in generated XML. A problem like this can be solved by altering the Java code using the generated classes, by altering the generated classes or by altering the XSD. Which is most suitable depends, among other things, on specific requirements and application architecture.

I chose specifically not to dive into the JAXB generated classes themselves. In my experience, these classes are volatile and will be regenerated often (during every build for example: http://mojo.codehaus.org/jaxb2-maven-plugin/), especially when complex nested schemas are used which change often. Adding code to these classes makes it harder to regenerate them from the source XSD without losing alterations.