Handling Binary Data in SOAP with MTOM
SOAP is an XML-based protocol, which means that all data inside the SOAP envelope must be text-based. If you want to include binary data in a SOAP message, it too must be converted to text. To achieve this, you can convert the binary data to a Base64 encoded string and simply embed the string inside the SOAP message. The diagram below shows a sample SOAP message with binary data embedded as a Base64 string.
While this is a simple approach for dealing with binary data in SOAP, there are a few things to consider. When binary data is Base64 encoded, it increases in size by approximately 33%, because every three bytes of input become four characters of output. For small amounts of binary data this probably won't be an issue, but for larger volumes of data the increased message size can significantly impact performance.
Something else to consider is the overhead for the XML parsers that will consume the SOAP messages. A large binary object results in a huge Base64 encoded string and more CPU-intensive parsing for the consumer.
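To get a feel for the size overhead described above, here is a minimal Java sketch that Base64 encodes a buffer and compares the sizes. The class name Base64Overhead and the 3 MB buffer are purely illustrative and are not part of the sample project.

```java
import java.util.Base64;

public class Base64Overhead {

    /** Number of characters produced by padded Base64 encoding of rawLength bytes. */
    public static long encodedLength(long rawLength) {
        // Every group of up to 3 input bytes becomes 4 output characters.
        return 4 * ((rawLength + 2) / 3);
    }

    public static void main(String[] args) {
        // Stand-in for a binary document of roughly 3 MB (assumed size).
        byte[] raw = new byte[3_000_000];
        String encoded = Base64.getEncoder().encodeToString(raw);

        double overhead = 100.0 * (encoded.length() - raw.length) / raw.length;
        System.out.printf("raw=%d bytes, encoded=%d chars, overhead=%.1f%%%n",
                raw.length, encoded.length(), overhead);
    }
}
```

For the 3,000,000 byte buffer the encoded string is 4,000,000 characters long, which is where the roughly one-third increase comes from.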
Introducing MTOM
Message Transmission Optimization Mechanism, or MTOM for short, can be used to efficiently handle binary data transmission via SOAP. Rather than Base64 encoding binary data and embedding it in the SOAP body, the binary data is sent as a MIME attachment.
As shown in the diagram below, the binary data (a PDF in this case) is sent in the HTTP request/response as a MIME attachment. The SOAP message contains a unique key used to reference the MIME attachment.
The SOAP message is pretty lean, as we've avoided the Base64 encoded bloat that we saw previously. A smaller XML payload also means less resource-intensive parsing by the consumer.
Sample Code
In the next few sections, I'll show you how MTOM can be configured in a CXF service. The source code for this post includes a fully working MTOM-enabled CXF service and integration test. Feel free to pull it from GitHub before reading on.
Schema Definition for Binary Elements
Our sample SOAP service returns simple bank account data, represented by the account XSD type defined below. On line seven, a new statement element of type base64Binary has been added to represent a PDF document.
<xsd:complexType name="account">
    <xsd:sequence>
        <xsd:element name="accountnumber" type="xsd:string"/>
        <xsd:element name="accountname" type="xsd:string"/>
        <xsd:element name="accountbalance" type="xsd:double"/>
        <xsd:element name="accountstatus" type="enumaccountstatus"/>
        <xsd:element name="statement" type="xsd:base64Binary"/>
    </xsd:sequence>
</xsd:complexType>
When the wsdl2java process is run, JAXB generates a POJO with a statement instance variable of type byte array as shown below.
Using a byte array for the statement element means that consumers have to read the entire binary statement into memory in one go. This can be improved by telling JAXB to use a DataHandler instead of a byte array. A DataHandler exposes an InputStream, which allows the client application to stream the binary data if need be. This is particularly useful when dealing with large volumes of binary data.
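As a rough illustration of why streaming helps, the sketch below copies a stream to a file in fixed-size chunks, so only one small buffer is held in memory at any time. It uses only the JDK. In a real client the InputStream would come from the DataHandler's getInputStream() method; here a ByteArrayInputStream stands in for the statement, and the StatementStreamer class and temporary file are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StatementStreamer {

    /** Copies the stream in 8 KB chunks and returns the number of bytes written. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream obtained from the statement DataHandler (assumption).
        InputStream statement = new ByteArrayInputStream(new byte[100_000]);
        Path target = Files.createTempFile("statement", ".pdf");

        try (InputStream in = statement;
             OutputStream out = Files.newOutputStream(target)) {
            System.out.println("bytes written: " + copy(in, out));
        }
    }
}
```

With a byte array, the whole statement has to fit in memory before anything can be written. With the chunked copy, memory use stays at the buffer size regardless of how large the statement is.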
To switch from a byte array to a DataHandler, you need to update the XSD with the expected MIME content types. On line seven below, xmime:expectedContentTypes="application/pdf" indicates that we are expecting the binary data to be of MIME type application/pdf. The xmime:expectedContentTypes attribute can be set to any valid MIME type or a comma-separated list of MIME types. Note that the xmime prefix must be bound to the http://www.w3.org/2005/05/xmlmime namespace in the schema.
<xsd:complexType name="account">
    <xsd:sequence>
        <xsd:element name="accountnumber" type="xsd:string"/>
        <xsd:element name="accountname" type="xsd:string"/>
        <xsd:element name="accountbalance" type="xsd:double"/>
        <xsd:element name="accountstatus" type="enumaccountstatus"/>
        <xsd:element name="statement" type="xsd:base64Binary" xmime:expectedContentTypes="application/pdf"/>
    </xsd:sequence>
</xsd:complexType>
When I rerun the wsdl2java process, JAXB regenerates the domain model and the statement element is now typed as a DataHandler annotated with @XmlMimeType("application/pdf").
Configuring MTOM with CXF
The Spring configuration below defines an EndpointImpl bean using an injected CXF bus. The endpoint is configured with two interceptors for logging, and MTOM is enabled by simply setting the mtom-enabled property on the endpoint to true. The value itself is read from application.properties.
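// Package names below assume Spring Boot 1.4+ and the CXF 3.x core logging interceptors.
import org.apache.cxf.Bus;
import org.apache.cxf.interceptor.LoggingInInterceptor;
import org.apache.cxf.interceptor.LoggingOutInterceptor;
import org.apache.cxf.jaxws.EndpointImpl;
import org.apache.cxf.transport.servlet.CXFServlet;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.web.servlet.ServletRegistrationBean;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.ImportResource;
import org.springframework.context.annotation.PropertySource;

@Configuration
@ImportResource({ "classpath:META-INF/cxf/cxf.xml" })
@PropertySource("classpath:application.properties")
public class Config {

    // Route all incoming requests to the CXF servlet.
    @Bean
    public ServletRegistrationBean servletRegistrationBean(ApplicationContext context) {
        return new ServletRegistrationBean(new CXFServlet(), "/*");
    }

    // Publish the account service with logging interceptors and MTOM support.
    @Bean
    public EndpointImpl serviceEndpoint(Bus cxfBus,
            AccountServiceEndpoint accountServiceEndpoint,
            @Value("${mtom-enabled}") boolean mtomEnabled,
            LoggingInInterceptor inInterceptor,
            LoggingOutInterceptor outInterceptor) {
        EndpointImpl endpoint = new EndpointImpl(cxfBus, accountServiceEndpoint);
        endpoint.getInInterceptors().add(inInterceptor);
        endpoint.getOutInterceptors().add(outInterceptor);
        // Binary elements are sent as MIME attachments when this property is true.
        endpoint.getProperties().put("mtom-enabled", mtomEnabled);
        endpoint.publish("http://localhost:8080/mtom-demo/service");
        return endpoint;
    }
}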
The sample code includes an integration test that calls the service and logs the SOAP response. Below is an extract from the logged response showing the SOAP body and MIME attachment. Note that there is no binary data in the statement element, but an xop:Include element instead. Its href value references the Content-ID value of the MIME attachment that follows the SOAP envelope.
The Content-Type of the MIME attachment is application/pdf. This is consistent with the MIME content type we set in the XSD.
ID: 1
Response-Code: 200
Encoding: UTF-8
Content-Type: multipart/related; type="application/xop+xml"; boundary="uuid:74cee6e6-296e-44f8-9de6-a2d5da7b4f2b"; start="<root.message@cxf.apache.org>"; start-info="text/xml"
Headers: {}
Payload: --uuid:74cee6e6-296e-44f8-9de6-a2d5da7b4f2b
Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"
Content-Transfer-Encoding: binary
Content-ID: <root.message@cxf.apache.org>

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <accountdetailsresponse xmlns="http://com/blog/samples/webservices/accountservice">
            <accountdetails>
                <accountnumber>12345</accountnumber>
                <accountname>joe bloggs</accountname>
                <accountbalance>3400.0</accountbalance>
                <accountstatus>active</accountstatus>
                <statement>
                    <xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include" href="cid:eec6e626-5536-4778-a821-37de8e1f018b-1@com"/>
                </statement>
            </accountdetails>
        </accountdetailsresponse>
    </soap:Body>
</soap:Envelope>
--uuid:74cee6e6-296e-44f8-9de6-a2d5da7b4f2b
Content-Type: application/pdf
Content-Transfer-Encoding: binary
Content-ID: <eec6e626-5536-4778-a821-37de8e1f018b-1@com>

%PDF-1.4
%????
2 0 obj
<</Filter/FlateDecode/Length 565>>stream
x????j?0 e?z
-?e]???-??b? ?l ?,? ????_?h??2 ? ]?|t?????t?< ?=??k? b?? 7?????????a??:?9?~)c?????f$??w
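To make the link between the href and the attachment concrete, here is a small sketch of how a cid: reference maps to a Content-ID header value: the cid: scheme is dropped, any percent-encoding is decoded, and the result is wrapped in angle brackets. This is only an illustration of the mapping. CXF resolves attachments for you, and the ContentIdResolver class is hypothetical rather than part of CXF or the sample project.

```java
import java.net.URI;

public class ContentIdResolver {

    /** Converts an href such as "cid:abc@com" to the header value "<abc@com>". */
    public static String toContentId(String href) {
        URI uri = URI.create(href);
        if (!"cid".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not a cid: reference: " + href);
        }
        // getSchemeSpecificPart() returns the part after "cid:" with percent-escapes decoded.
        return "<" + uri.getSchemeSpecificPart() + ">";
    }

    public static void main(String[] args) {
        // The href taken from the logged response above.
        String href = "cid:eec6e626-5536-4778-a821-37de8e1f018b-1@com";
        System.out.println(toContentId(href));
        // prints <eec6e626-5536-4778-a821-37de8e1f018b-1@com>
    }
}
```

The printed value matches the Content-ID header of the PDF attachment in the log, which is how a consumer locates the right MIME part for the statement element.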
Sample Code
Feel free to grab the source from GitHub and have a play around. If you have any comments or questions, feel free to leave a comment below.