RFC011

TITLE: SOAP as the standard transport encapsulation for DAS/2 messages
Author: Thomas Down
Dependencies: none
Version: 1
Date: 1 September 2001

Introduction
------------

The DAS 1.0 protocol is a successful application of the XML-over-HTTP
(web services) approach to protocol design. It is therefore
unsurprising that a number of people have suggested re-casting DAS as
a SOAP service. Indeed, for some people, SOAPification of DAS is the
driving force behind DAS/2 development. This document outlines several
of the major advantages of adopting SOAP, and also raises a number of
implementation issues which would affect a SOAP-based DAS.

Advantages of SOAP
------------------

SOAP [1,2] is a simple messaging system in which all messages are
encoded as XML documents. It supports a variety of messaging models,
and is independent of the underlying transport protocol, but for DAS
we will presumably be using a standard request-response paradigm. At
least initially, transport will be over HTTP or HTTPS.

  - Unlike in the current DAS model, requests as well as responses
    will be XML-encoded. This gives much more scope for extending the
    request format, and makes it easier to support a powerful query
    language in the requests (indeed, it would be easy to embed
    XQueryX in SOAP messages).

  - Message components must be namespace-qualified, guaranteeing
    extensibility.

  - Basic exception-reporting semantics are defined.

  - There is full support for pipelines of actors processing a given
    message. This makes technologies like smart caching and proxying
    easy to retrofit onto protocols.

  - There is a large (and increasing) number of toolkits which make
    developing SOAP applications easy.

  - SOAP Encoding provides a standard format for marshaling arbitrary
    data structures (but see below for issues with this).

Things to consider
------------------

Use of SOAP Encoding:

The SOAP Encoding is a set of rules for translating between data
structures and a simple subset of XML.
This is definitely a Good Thing for most simple structures, since it
means that a toolkit can handle all your marshaling/unmarshaling.
Unfortunately, no current or proposed feature table format appears to
fit into the SOAP Encoding. There are a number of possible solutions:

  + Design a new feature-table format which fits the SOAP Encoding
    model. However, SOAP Encoding requires that all values are
    represented as element content, making the resulting markup
    potentially very leaf-heavy. This and other limitations mean that
    such a format would probably never gain popularity outside the
    SOAP environment.

  + Embed a non-SOAP-Encoded feature table (e.g. XFF) into the SOAP
    message. This is entirely legal: not every part of the SOAP
    message has to be SOAP Encoded. However, this approach would make
    DAS incompatible with many current SOAP toolkits. Such messages
    could be handled in several ways:

      - Hand-coded marshaling/unmarshaling routines.

      - Toolkits which can automatically marshal/unmarshal data in
        arbitrary XML forms, rather than requiring the SOAP Encoding.
        Almost all the information required to do this is included in
        XML Schemas, so it is quite practical.

    A number of people, including some of the original designers of
    SOAP, believe that SOAP Encoding should go away in the future [1].

  + Use the SOAP Attachments [3] mechanism to associate feature tables
    with messages without actually embedding them. This approach
    would even allow non-XML data to be carried in DAS. However, it is
    likely to increase implementation complexity significantly, and
    there currently doesn't seem to be much toolkit support for SOAP
    Attachments.

Data structures used by toolkits:

Most current SOAP toolkits build DOM or DOM-like data structures
representing the full XML message. This is a reasonable design for
smallish messages, but is likely to cause serious trouble with larger
messages (e.g. biological feature tables), especially when the XML
markup is relatively leaf-heavy.
As well as the significant memory cost of the DOM, there are large
set-up and tear-down costs involved. There has been some interest in
developing toolkits which don't require a full DOM-tree [this was on
the requirements list for the Axis toolkit, but won't make the 1.0
release]. The optimal solution would be a SOAP toolkit which works in
a purely event-driven fashion. Is anyone working on this? [Author's
note: I've developed some proof-of-concept SOAP code using the StAX
APIs, and this all works out quite nicely].

Conclusions
-----------

Given that DAS is already based around a system of XML-encoded
messages, placing these in SOAP envelopes will be quite easy. However,
before work starts on a SOAP-based DAS protocol, there should be some
discussion on the best way to encapsulate large, complex structures of
biological data (feature tables, in particular) into the SOAP
messages.

Online Bibliography
-------------------

[1] "A brief history of SOAP", by Don Box, one of the primary
    developers (highly recommended):
    http://www.develop.com/dbox/postsoap.html

[2] The SOAP 1.1 specification
    http://www.w3.org/TR/SOAP/

[3] SOAP messages with attachments
    http://www.w3.org/TR/SOAP-attachments

[4] The SOAP 1.2 working draft
    http://www.w3.org/TR/soap12/
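
Appendix: event-driven handling of an embedded feature table
------------------------------------------------------------

To make the event-driven option discussed under "Data structures used
by toolkits" more concrete, here is a minimal sketch of how a client
might consume a non-SOAP-Encoded feature table embedded in a SOAP 1.1
envelope without ever building a DOM. It is an illustration only, not
the author's proof-of-concept code: it uses Python's standard xml.sax
module as a stand-in for whichever event-driven API a real toolkit
would use, and the feature namespace (urn:example:das2-features), the
featureTable/feature element names and their attributes are all
hypothetical, invented for this example. Only the SOAP 1.1 envelope
namespace is taken from the specification [2].

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# SOAP 1.1 envelope namespace, as defined by the SOAP 1.1 specification.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an embedded feature table (assumption).
FEATURE_NS = "urn:example:das2-features"

# Hypothetical response: the feature table is plain XML in the SOAP
# Body, not SOAP Encoded.
SAMPLE_MESSAGE = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <f:featureTable xmlns:f="urn:example:das2-features">
      <f:feature id="exon1" type="exon" start="100" end="250"/>
      <f:feature id="exon2" type="exon" start="400" end="520"/>
      <f:feature id="utr1" type="utr" start="521" end="600"/>
    </f:featureTable>
  </soap:Body>
</soap:Envelope>
"""


class FeatureTableHandler(ContentHandler):
    """Passes each feature to a callback as soon as it is parsed."""

    def __init__(self, on_feature):
        super().__init__()
        self._on_feature = on_feature
        self._in_body = False

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is a (uri, localname) pair.
        if name == (SOAP_ENV_NS, "Body"):
            self._in_body = True
        elif self._in_body and name == (FEATURE_NS, "feature"):
            # Attribute keys are (uri, localname) pairs; keep localname.
            feature = {local: value for (_uri, local), value in attrs.items()}
            self._on_feature(feature)

    def endElementNS(self, name, qname):
        if name == (SOAP_ENV_NS, "Body"):
            self._in_body = False


def count_features_by_type(message: bytes) -> dict:
    """Tally features by type without keeping the features in memory."""
    counts = {}

    def tally(feature):
        ftype = feature["type"]
        counts[ftype] = counts.get(ftype, 0) + 1

    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(FeatureTableHandler(tally))
    parser.parse(io.BytesIO(message))
    return counts


if __name__ == "__main__":
    print(count_features_by_type(SAMPLE_MESSAGE.encode("utf-8")))
```

Because each feature is handed to the callback and then discarded,
memory use in this sketch does not grow with the number of features in
the table, which is the property that matters for large feature
tables. The cost is that the envelope handling is hand-written rather
than supplied by a SOAP toolkit.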