Caucho maker of Resin Server | Application Server (Java EE Certified) and Web Server


 

Resin Documentation


metaprotocol taxonomy


Choosing a metaprotocol framework is a major architectural decision, affecting performance, reliability, maintainability, and development effort. For example, Hessian may be a reliable and clean fit for a Streaming or RPC application, while SOAP may be required to match an existing WSDL specification. This article examines the common metaprotocols, categorizes them into a taxonomy, and provides a framework for choosing the appropriate metaprotocol for a given application.

Distributed computing has become more complicated with an alphabet soup of metaprotocols (SOAP, CORBA, RMI, JSON, POX, Hessian) and a proliferation of communication patterns (REST, RPC, Messaging and Streaming). This article is an attempt to sort through the morass by looking at the commonalities between the metaprotocols, i.e. to create a metaprotocol taxonomy. The core defining feature will center on the description language and its relationship to the protocol, i.e. whether it defines object-oriented language types, or merely defines a syntax. For some, the IDL is carried in the wire protocol for validation and flexibility; others require an out-of-band agreement, or even require the IDL to be shared at runtime.

Because each application has different communication requirements, there is no simple "best" metaprotocol. Still, some general conclusions are possible. Object-oriented communication, i.e. sending an object model from one machine to another, needs an object-oriented specification (and API where appropriate). The typed metaprotocols (Hessian, RMI, CORBA, JSON) best fit this requirement since they are designed around language types. The syntactic metaprotocols (SOAP, POX) are better suited for document-based applications, i.e. applications which retrieve data through syntactic queries like XPath or XQuery.

The common communication patterns: REST, RPC, Messaging and Streaming also influence the choice of metaprotocol. Since RPC is a typed and API-based pattern, the typed metaprotocols fit better. Messaging applications can use the syntactic metaprotocols if they are document or query based, and can use the typed metaprotocols if they are object model based. Like Messaging, REST can use either typed or syntactic metaprotocols, depending on the applications. Streaming applications like AJAX/Comet fit dynamic typed protocols best.

Metaprotocol Taxonomy

Metaprotocols

Metaprotocols, like SOAP, CORBA, JSON or Hessian, are specifications designed to create protocols, like NFS v4 or the Atom Publishing Protocol. Because the metaprotocols require application types and specifications, they are not complete protocols themselves. As an example, NFS version 3 is an application protocol which uses ONC-RPC as its metaprotocol.

Metaprotocols are distinct from envelope protocols like SMTP, HTTP, REST, AMQ or even TCP. The envelope protocols transmit opaque data and can be used as a transport layer for a metaprotocol, but they are not metaprotocols themselves because they impose no real structure on their data.

The communication patterns, e.g. RPC vs REST vs Messaging, are orthogonal to the metaprotocol. In other words, the same metaprotocol can be used in a variety of communication patterns, e.g. JSON can be used for REST and Streaming, and Hessian can be used for RPC, REST, Messaging, and Streaming. Some metaprotocols fit better with some of the patterns, e.g. SOAP is designed primarily for Messaging, while CORBA and RMI are best used for RPC or Streaming.

Dynamic-Typed Metaprotocols

  • Examples: Hessian, JSON, Burlap, XML-RPC
  • Languages include scripting and object-oriented: Flash, JavaScript, Java, C#, Ruby, PHP, etc.
  • Specification IDL defines end-to-end object model communication.
  • Versioning is flexible.
  • Streaming, RPC, REST and Messaging communication patterns.

Static-Typed Metaprotocols

  • Examples: CORBA, RMI
  • Object-oriented languages: Java, C#, C.
  • Specification IDL defines end-to-end object model communication.
  • Versioning is brittle.
  • Streaming and RPC communication patterns.

Syntactic Metaprotocols

  • Examples: SOAP, POX
  • Scripting and object-oriented languages: Flash, Java, C#, C, PHP.
  • Specification IDL (WSDL) defines wire protocol only. Object model/API definition is application-specific.
  • Many SOAP implementations require runtime availability of WSDL.
  • Versioning is flexible.
  • REST and Messaging communication patterns.

Typed Metaprotocols

Typed metaprotocols like Hessian, CORBA, and JSON are built on an abstract type system. The types translate directly to programming language types in C, Java, EcmaScript, PHP, etc. Depending on the power of the metaprotocol's type system and the closeness of the match, the language bindings between the programming language and the metaprotocol are fairly straightforward.

Metaprotocol type systems

The typed metaprotocols can be compared by examining their type systems. Each metaprotocol has a set of primitive types and a set of recursive constructor types to build composite types from primitives. Some of the metaprotocols support only trees (JSON, XML-RPC), while others can create arbitrary object graphs (Hessian, RMI, CORBA).
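The tree versus graph distinction can be made concrete with a small check. The following is a minimal sketch, not part of any metaprotocol library: the class name GraphCheckSketch and the choice of java.util.List and java.util.Map as the only container types are assumptions made for illustration. It reports whether a structure is a pure tree, which a tree-only metaprotocol could carry as-is, or whether it contains shared or cyclic references that need graph support.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: distinguishes tree-shaped data from object graphs.
final class GraphCheckSketch {

    /** Returns true if every list or map under root is reached exactly once. */
    static boolean isTree(Object root) {
        // Identity-based set: two equal but distinct lists are still separate nodes.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return visit(root, seen);
    }

    private static boolean visit(Object node, Set<Object> seen) {
        Iterable<?> children;
        if (node instanceof List) {
            children = (List<?>) node;
        } else if (node instanceof Map) {
            children = ((Map<?, ?>) node).values();
        } else {
            return true; // primitives and strings are leaves
        }
        if (!seen.add(node)) {
            return false; // the same container was reached twice: shared or cyclic
        }
        for (Object child : children) {
            if (!visit(child, seen)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Object> leaf = new ArrayList<>();
        leaf.add("wheel");

        List<Object> tree = new ArrayList<>();
        tree.add(leaf);
        tree.add(new ArrayList<Object>(leaf)); // a copy, not the same object

        List<Object> graph = new ArrayList<>();
        graph.add(leaf);
        graph.add(leaf); // the same object referenced twice

        System.out.println(isTree(tree));  // true
        System.out.println(isTree(graph)); // false
    }
}
```

A tree-only format would have to duplicate the shared list in the second case, so the receiver could no longer tell that both entries were the same object.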

Hessian

Hessian provides a basic type system:

  • primitives: null, boolean, int, long, double, date, string, binary
  • recursive: list, map, object
  • graphs are supported

JSON

JSON provides a restricted type system:

  • primitives: null, boolean, integer, double, string
  • recursive: list, map
  • only trees are supported

CORBA

CORBA/IIOP provides more types:

  • primitive: boolean, octet, char, wchar, short, unsigned short, long (32-bit), unsigned long, long long (64-bit), unsigned long long, float, double, long double, fixed decimal, enum, remote object, null
  • recursive: struct, union, array, sequence, string, wide string, value, any
  • graphs are supported

Language type systems

How closely the type system of a metaprotocol matches the type systems of the target languages matters, because the mismatches are the areas where interoperability issues can arise. In general, the language type systems will not match exactly with the metaprotocol.

C

  • primitive: char, uchar, short, ushort, int, uint, float, double
  • recursive: array, struct

Notice that C does not include the boolean or null types and does not implement a standard map.

PHP

  • primitive: null, boolean, integer, double, binary/string
  • recursive: array/map, object

The PHP array/map value is used for both numerically-indexed arrays and associative maps. The PHP 5 object is a distinct type. The integer type will be 32 bits or 64 bits depending on the underlying hardware. The binary/string value is 8-bit in PHP 4 and 5. PHP 6 introduces a unicode string.

Java

  • primitive: null, boolean, Boolean, byte, Byte, char, Character, short, Short, int, Integer, long, Long, float, Float, double, Double, Date, String, Thread
  • recursive: array, object, Map (pseudo), List (pseudo)

Java distinguishes between primitive types and "boxed" types, e.g. int vs Integer. This distinction can result in distinct serialization data for some metaprotocols like RMI.

In addition, Java has a number of pseudo-types like Date, Map (HashMap) and List (ArrayList). Although these are technically library types, they are used in applications as if they were primitive types. These pseudo-types are important in providing interoperability between Java and scripting languages like JavaScript or PHP.

Dynamic vs Static Typing

The typed metaprotocols divide further into dynamic and static typed. Dynamically typed systems contain all type information within the protocol itself, while statically typed systems share data outside the protocol, e.g. with shared IDL (CORBA) or matching classes (RMI).

Dynamic-Typed Metaprotocols

The dynamic-typed metaprotocols transmit their type data as part of the protocol itself. These protocols are typically designed to support scripting languages directly. Examples include Hessian, Burlap, JSON, and XML-RPC.

Interop and Versioning

Because dynamically-typed metaprotocols include each type, they are more flexible in managing differences between language semantics and supporting changes in the protocol version.

example type encodings

The manner of conveying the type varies with the encoding. In Burlap and XML-RPC, the XML tags convey the type:

Burlap data
<int>37</int>

<list>
  <string>hello, world</string>
  <int>29</int>
</list>

<map>
  <string>color</string><string>green</string>
  <string>year</string><string>2003</string>
</map>

JSON uses JavaScript syntax to convey the type:

JSON
37

["hello, world", 29]

{"color" : "green", "year" : 2003}

Hessian uses a bytecode to encode the type:

Hessian
0xb5                # int values 0-47 are encoded by 0x90 to 0xbf (37 is 0x90 + 0x25)

V
  x0c hello, world  # strings with length 0-31 start with 0x00 to 0x1f
  xad               # int 29

O t x00 x06 qa.Car  # Class definition
  x92               # two fields
  x05 color         # "color" field
  x04 year          # "year" field

o x90               # object instance, 0x90 is def #0
  x05 green         # color : green
  xcf xd3           # year : 2003
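The compact integer forms used in the Hessian example can be sketched in a few lines of Java. This is an illustration under stated assumptions rather than Caucho's implementation: the class CompactIntSketch is hypothetical, the one-byte range is extended down to -16 and the two-byte range is taken as -2048 to 2047 based on the Hessian 2.0 draft, and the longer encodings are deliberately left out.

```java
// Hypothetical sketch of the one-byte and two-byte compact int forms.
final class CompactIntSketch {

    static byte[] encode(int v) {
        if (v >= -16 && v <= 47) {
            // One byte: 0x90 represents 0, so 0..47 map to 0x90..0xbf.
            return new byte[] {(byte) (0x90 + v)};
        }
        if (v >= -2048 && v <= 2047) {
            // Two bytes: 0xc8 plus the high bits, then the low 8 bits.
            return new byte[] {(byte) (0xc8 + (v >> 8)), (byte) v};
        }
        // Assumption: larger values use longer forms that are not sketched here.
        throw new IllegalArgumentException("value needs a longer encoding: " + v);
    }

    /** Formats bytes in the x.. notation used by the examples in this article. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(29)));   // xad
        System.out.println(hex(encode(2003))); // xcf xd3
    }
}
```

The two printed values match the list element 29 and the year field 2003 in the example above. Because the leading byte identifies both the type and the length, a reader needs no external definition to decode them.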

Static-Typed Metaprotocols

The static-typed metaprotocols require an out-of-band definition of the type system. In the case of CORBA, the Interface Definition Language (IDL) file provides the out-of-band definitions. In the case of Java's RMI, the Java classes provide the out-of-band definitions via reflection.

The runtime IDL requirement comes from the history of statically-typed metaprotocols. Because the main languages were C and C++, the protocols required translation tools to generate C code from the IDL for the stubs and skeletons. The IDL's types were already implicit in the generated stubs, so sending the type appeared redundant.

Statically-typed metaprotocols require the runtime IDL because the data packing format is not sent across the wire. The following example of a serialized t_car structure is typical. The field order and types are specified by the IDL. The wire format only includes the data. Changing the order of the fields or changing the "year" type to integer would change the wire data dramatically.

Car structure serialized in IIOP/CORBA
struct t_car {
  char *name;
  short year;
  int mileage;
};

# IIOP/CORBA encoding

  x00 x00 x00 x05      # length, data for name="civic"
  'C' 'i' 'v' 'i' 'c'
  x07 xd3              # year = 2003 stored in short
  x00 x01 x00 x00      # 65,536 miles

In contrast, a dynamically-typed metaprotocol like Hessian will include the relevant types. Notice that the class only defines the field names, not the types, because the field data will include its own type.

Car structure serialized in Hessian
O t x00 x05 t_car    # class definition #0
  x93
  x04 name
  x04 year
  x07 mileage

o x90                # object instance of class #0
  x05 Civic          # name data is a string
  xcf xd3            # year is a 32-bit integer packed into 2 bytes
  xe9 x00 x00        # mileage is a 32-bit integer packed into 3 bytes
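The brittleness of untagged, positional packing can be demonstrated with a short sketch. It mirrors the simplified t_car layout shown in the IIOP/CORBA example above and makes no attempt to follow real CDR rules such as alignment padding or string terminators. The class PositionalPackingSketch and its method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: fields are written in declaration order with no type tags.
final class PositionalPackingSketch {

    static byte[] pack(String name, short year, int mileage) {
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        // ByteBuffer is big-endian by default, like the example layout.
        ByteBuffer out = ByteBuffer.allocate(4 + nameBytes.length + 2 + 4);
        out.putInt(nameBytes.length).put(nameBytes).putShort(year).putInt(mileage);
        return out.array();
    }

    /** A reader that shares the writer's definition: year is a 16-bit short. */
    static short readYear(byte[] wire) {
        ByteBuffer in = ByteBuffer.wrap(wire);
        int nameLength = in.getInt();
        in.position(in.position() + nameLength);
        return in.getShort();
    }

    /** A reader with a mismatched definition: it believes year is a 32-bit int. */
    static int readYearAsInt(byte[] wire) {
        ByteBuffer in = ByteBuffer.wrap(wire);
        int nameLength = in.getInt();
        in.position(in.position() + nameLength);
        return in.getInt();
    }

    public static void main(String[] args) {
        byte[] wire = pack("Civic", (short) 2003, 65536);
        System.out.println(wire.length);         // 15
        System.out.println(readYear(wire));      // 2003
        System.out.println(readYearAsInt(wire)); // 131268609
    }
}
```

The mismatched reader raises no error. It silently consumes two bytes of the mileage as part of the year, which is why both peers must agree on the definition out of band.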

The existence of a formal interface definition in itself is not a distinguishing feature. Dynamically-typed metaprotocols like Hessian may also expect a formal interface definition, but that definition is not required for the operation of the protocol. In a dynamically-typed metaprotocol, a scripting language like PHP can send its dynamically-typed objects without any runtime IDL. If a PHP script were to use CORBA, it would require a runtime IDL.

Hessian and RMI can both use Java as their interface definition language, but Hessian is dynamically-typed while RMI is statically-typed. The distinction means RMI is restricted to Java, while Hessian can be used with scripting languages and C#.

Envelope Protocols

Envelope protocols like HTTP, AMQ(JMS) and SMTP encapsulate raw data and attach routing, quality, and metadata headers to the data. Because they do not impose any structure on the data, they are not true metaprotocols. They are mentioned here to distinguish them from the syntactic metaprotocols and to avoid some confusion.

Syntactic Metaprotocols

Syntactic metaprotocols focus on the syntax of the wire data, essentially defining an untyped document language to be passed from machine to machine. At the coarsest level, this could require only that the payload conform to the XML syntax. More restrictive syntaxes could define schemas or grammars to restrict the valid XML documents. SOAP and POX are the main syntactic metaprotocols.

Applications using syntactic metaprotocols directly use the abstract syntax tree, e.g. the XML DOM in the case of SOAP, or use tools based on the abstract syntax tree like XPath or XQuery.
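As an illustration of the query-based style, the sketch below parses a small XML payload and extracts a value with XPath using only the standard JDK XML APIs. The class name XPathQuerySketch and the sample document are assumptions for this example. Note that the result is a string pulled out of the document, not a typed field of an object.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: the application works on the parsed document, not on objects.
final class XPathQuerySketch {

    /** Parses the XML text into a DOM and evaluates the XPath expression as a string. */
    static String query(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the payload", e);
        }
    }

    public static void main(String[] args) {
        String payload = "<car><color>green</color><year>2003</year></car>";
        System.out.println(query(payload, "/car/year"));  // 2003
        System.out.println(query(payload, "/car/color")); // green
    }
}
```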

The focus on wire-protocol syntax differs sharply from the typed metaprotocols. The typed metaprotocols completely define their wire syntax and encapsulate it from the application developer. They consider the wire syntax an implementation detail that developers should not spend time on. Instead, the typed metaprotocols expose the type system for customization. In other words, applications using typed metaprotocols work with an object model in the native language, while applications using syntactic metaprotocols work with an abstract syntax tree like the XML DOM.

SOAP

Pinning down SOAP as a metaprotocol is a surprisingly difficult task. Since SOAP is primarily an envelope protocol, the bulk of the SOAP specification discusses the envelope. The body, i.e. the metaprotocol part of SOAP, is just an XML document. So there really is no fundamental difference between SOAP and POX (plain old XML), unless the application actually uses the SOAP headers.

The common usage of SOAP goes beyond the actual SOAP specification. For example, WSDL (Web Services Description Language) is not part of the SOAP specification but a layer built on top of SOAP. Nevertheless, it provides a key definition of the syntax of SOAP documents through its incorporation of XSchema. (The WS-* specifications are also built on SOAP, but generally provide extensions for the headers, i.e. the envelope protocol, so are not relevant here.)

In addition, some attempts have been made to define typed metaprotocols on top of SOAP, e.g. Sun's JAXB v1 and JAXB v2, and the SOAP RPC-encoding model. (JAXB v1 and v2 are sufficiently different that they count as entirely different metaprotocol models.) These attempts at extending SOAP have not been particularly successful.

The original design of SOAP intended it as a dynamically-typed metaprotocol using XML as the encoding. The "SOAP encoding" of the original specification defined that metaprotocol. That metaprotocol failed and has now been deprecated.

Java's JAXB is a statically-typed mapping to Java built on XML. It uses WSDL as runtime IDL. Since WSDL is not sufficiently powerful to support several types like maps, JAXB adds additional type information to the Java code. Since JAXB is specific to Java, the peer application must either use the underlying statically-typed XML as a DOM or create another language-specific metaprotocol that hopefully maps properly to the Java semantics.

More importantly, protocols built using SOAP are specified as grammars using WSDL, i.e. they are fundamentally syntactic specifications. In some cases, type information may be layered on top of the syntax, but this does not change the underlying orientation.

Encoding

For the typed metaprotocols, the encoding is an implementation detail. Applications and developers do not need to understand or care about the underlying encoding. The encoding choices only matter in terms of performance, compactness, and reliability of the implementation software, i.e. by externally visible effects.

In contrast, the encoding syntax is fundamental to the syntactic metaprotocols, since the applications use the parsed abstract syntax tree. Although the syntax is presented directly to the application, some lexical details such as the choice of pointy brackets or character encoding can be hidden from the application. So a "Binary XML" encoding is a possible variation on SOAP that does not change its underlying nature.

Interoperability

The metaprotocol interoperability is critical to the success of the application, and affects the testing, maintenance and general annoyance of developing with a metaprotocol. The specification for the application protocol, however, differs strongly between typed and syntactic metaprotocols. Typed metaprotocols define APIs. Syntactic metaprotocols define grammars.

Typed metaprotocols enforce interoperability using a formal definition of the API, either using an IDL document (Interface Definition Language for CORBA), or with an object oriented language capable of reflection, e.g. Java for RMI, and Java or C# for Hessian. The key focus is defining an end-to-end specification between the object model of the client application and the object model of the server application.

Syntactic metaprotocols define the syntax of the transferred document. In the case of SOAP or POX, the document is defined by XML and by any additional XSchema defined by the WSDL (Web Services Description Language).

The syntactic metaprotocols do not define the APIs used by the applications. In essence, the API on both sides is either a DOM (Document Object Model) or a combination of the DOM with query languages like XPath or XQuery.

Some tools define APIs on top of the document grammar, like JAXB (Java XML Binding) for Java. However, these APIs are not end-to-end specifications as used by the typed metaprotocols. They are simply convenience APIs used by the application. Importantly, there is no requirement that both the client and the server share the same API, only that they produce and parse the document following the specified grammar.

Specifications and Validation

Choosing a metaprotocol greatly affects the quality of the specifications possible for the application protocol, which will either help or hurt interoperability, versioning and reliability. Protocol specifications serve two purposes: documentation for application developers and validation that the application follows the protocol, ensuring interoperability with peer servers. A good metaprotocol choice will give a clear and unambiguous specification for both the developers and for the validation tools.

Object-model based protocols map naturally to a set of classes and methods, so they fit best with the typed metaprotocols. The IDL or API becomes the protocol specification. The implementation language tools automatically validate the API and ensure that the application uses the protocol correctly. Since modern languages also generate documentation, like javadoc, the API is automatically readable and validated.

Protocols defined as a file format like RSS are already syntax-based, so naturally fit syntactic metaprotocols like SOAP or POX. The XML grammar definition included in SOAP's WSDL matches the file format defined for RSS. It is important to notice, however, that the syntactic metaprotocols define a file format, not an API. Applications may define an API on top of the file format, but the API is not specified or validated by the syntactic metaprotocol.

SOA (service-oriented architecture)

The service-oriented architecture movement correctly identifies the quality of specifications as a key feature of distributed computing. The SOA focus is independent of specific technologies. Its key requirements are:

  • Service interfaces are specified using a human and computer-readable description language.
  • The interfaces are platform and language-independent.
  • The services are loosely-coupled and coarse-grained.

Hessian: A Hessian protocol can use any reflective, statically-typed object-oriented language as its description language, e.g. Java, C# or Ruby. Because the description language is the application language, the development tools and documentation provide a clear and reliable specification. The specification API can be mechanically converted from language to language using reflection, since Hessian's type system is a common subset of the target languages. Because it is a dynamic-typed metaprotocol, it provides an end-to-end specification of the service. Like other typed metaprotocols, Hessian's wire syntax is mechanically generated from the IDL, so no extra specification or validation is required.

CORBA: The CORBA protocol uses a custom-defined interface-definition language (IDL) for specification. A translation tool generates the API for specific languages, although many CORBA implementations such as JavaEE will use introspection on a Java API to derive the IDL. As a static-typed metaprotocol, it also provides an end-to-end API specification. Again, the wire syntax is defined by the CORBA specification as a mechanical translation from the IDL.

SOAP: The SOAP protocol defines the wire syntax through the WSDL (web-services description language). It does not define an object-model to XML mapping. Many SOAP implementations require the WSDL to be available at runtime, adding additional service requirements to a SOA system. In addition, the WSDL documents are generally difficult for people to read and verify which requires extra tools. SOAP applications that provide object-oriented APIs do so outside the protocol specification, making interoperability difficult.

RMI: Like Hessian, RMI uses standard programming language syntax for API specifications, however RMI is restricted to the Java language. Because of the language restriction, RMI does not meet the requirements for a SOA metaprotocol.

JSON: Although JSON does not use a description language, and so technically does not qualify as a SOA protocol, it would be possible to define a standard JSON binding to a reflective, object-oriented language similar to the Hessian definition. That defined mapping could then allow a SOA application to use JSON as the communication protocol.

Versioning

A flexible metaprotocol can help an application evolve, adding features and fields to successful protocols without breaking existing servers and clients. In contrast, brittle metaprotocols force an all-at-once upgrade of all pieces of the application as new data is added.

The dynamic-typed metaprotocols and syntactic metaprotocols are more flexible than the static-typed metaprotocols. When an application adds a field to a CORBA service, all servers and clients must upgrade to the latest definitions. In contrast, a Hessian service allows new object fields and types, so either client or server can keep working with an older peer. Like the other dynamic-typed metaprotocols, Hessian allows this flexibility by sending enough information in the protocol itself to allow each side to choose the fields it understands. In contrast, CORBA requires an out-of-band protocol agreement in the IDL.
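The version tolerance described above can be sketched as a reader that picks out the fields it knows and skips the rest. This is a simplified illustration, not Hessian's actual deserializer: the map of field names to values stands in for self-describing wire data that has already been decoded, and the names VersionTolerantReaderSketch and CarV1 are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an older peer reading data produced by a newer peer.
final class VersionTolerantReaderSketch {

    /** The older peer's view of a car: it only knows name and year. */
    static final class CarV1 {
        String name;
        int year;
    }

    /** Copies the fields this version understands and ignores everything else. */
    static CarV1 readCar(Map<String, Object> fields) {
        CarV1 car = new CarV1();
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            switch (field.getKey()) {
                case "name":
                    car.name = (String) field.getValue();
                    break;
                case "year":
                    car.year = (Integer) field.getValue();
                    break;
                default:
                    break; // unknown field from a newer peer: skip it
            }
        }
        return car;
    }

    public static void main(String[] args) {
        // A newer sender has added a "mileage" field.
        Map<String, Object> fromNewerPeer = new LinkedHashMap<>();
        fromNewerPeer.put("name", "Civic");
        fromNewerPeer.put("year", 2003);
        fromNewerPeer.put("mileage", 65536);

        CarV1 car = readCar(fromNewerPeer);
        System.out.println(car.name + " " + car.year); // Civic 2003
    }
}
```

Skipping works only because each field arrives with its name and each value with its type. With positional, untagged data the reader could not tell where the unknown field ends.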

Communications Patterns

The metaprotocols can be used in several communication patterns, e.g. REST, RPC, Messaging and Streaming. Any of the kinds of metaprotocols can be used in each pattern, although some kinds are better suited to each pattern.

The communications pattern closely matches the application requirements. If the application needs queuing and remote batching of requests, like processing purchase orders, then a messaging pattern is the proper choice. If the application primarily reads the state of a web service, e.g. a stock quote or RSS feed, then a REST pattern fits best. For continuous updates from the server, the streaming pattern is the right choice. For more flexibility and a well-defined API, RPC works well. Since the pattern matches the application, none of the patterns is better than another, merely more or less appropriate.

Messaging

  • queuing (store and forward)
  • complete request document (e.g. invoices)
  • envelope/headers (e.g. SMTP mail)
  • extra security considerations (since messages stored)
  • syntactic or dynamic typed: SOAP, POX, Hessian

Messaging protocols send fully-formed request documents like purchase orders to a batch server for processing. Messaging builds in the concept of a queue and storing the message for later processing. The pattern is primarily useful when slower batch processing is queued from a front-end process, and when reliability and transactions are required.

The messaging protocols are primarily concerned with the envelopes and headers needed for routing and security, e.g. the main specification of SMTP. Because the message may be stored in the queue and possibly on disk, messaging protocols have extra encryption and signature validation requirements. The payload of a message can be any syntactic or dynamically typed metaprotocol like POX, JSON or Hessian.

The current version of the SOAP protocol is designed primarily for messaging. The SOAP protocol itself is an envelope protocol which defines headers for routing, encryption, etc. The payload is only specified as XML. Other specifications like JAXB or XSchema are required to add structure to the SOAP payload. Thus, technically, SOAP itself only applies to the messaging pattern, although the XML payload (POX) can be used for the other patterns.

Dynamic typed and syntactic metaprotocols are well-suited to messaging applications because their payload is complete, and is also flexible enough to absorb version or implementation differences from the sender and the receiver.

REST

  • mainly read, many clients
  • web page model (HTTP GET)
  • polled
  • cacheable
  • syntactic or dynamic typed: POX, JSON, Hessian

The REST (Representational State Transfer) pattern follows the basic HTTP GET model of the web. The entire state of the resource is polled and downloaded by the client, e.g. a snapshot of a stock quote, or weather report, or RSS feed. The REST model is essentially the same as retrieving a document from the web, but the retrieved resource typically represents an object model, rather than a displayable resource. Because REST resources are quasi-static, they are cacheable and can be efficiently served to large numbers of clients.

The REST model can be a useful development tool even for applications planning to use the RPC pattern. Because REST transfers the entire state at once, it forces developers to carefully consider the essential structure of the object. This complete transfer model avoids the common mistake many RPC developers have made of transferring data in chunks that are too small.

Both dynamic-typed and syntactic metaprotocols can be used as the REST payload, e.g. Hessian, JSON or XML. The choice of metaprotocol depends on the application. If the primary application model is object-based, then a dynamic-typed metaprotocol is the best choice. If the model is document or query based, e.g. using XQuery or XPath to extract data from an XML document, then an XML payload (POX) would fit best, e.g. for an RSS feed. (SOAP itself does not make sense with REST since it is an envelope protocol and its headers are redundant with the HTTP response headers.)

RPC

  • programming API/call model (Remote Procedure Call)
  • flexible, easy to prototype
  • solid specifications (IDL, OO programming API)
  • static typed or dynamic typed: CORBA, RMI, JSON, Hessian

The RPC (Remote Procedure Call) pattern fits most closely with normal programming style since the pattern follows the procedure call. It is well suited to controller operations, e.g. asking an administered object to change state.

RPC protocols are easy to specify and document cleanly, since the protocol is essentially a language API, whether represented in an OO language or a custom IDL (interface definition language). RPC applications should use typed metaprotocols, either static or dynamic typed, since the typed model matches the programming languages exactly.

In contrast, syntactic metaprotocols are a poor choice for RPC applications. To use a syntactic metaprotocol for RPC, the application needs to build a new typed metaprotocol layer on top of the syntactic layer, e.g. JAXB on XML or the old SOAP-RPC on XML. But these layers are language specific or poorly defined: JAXB is only Java and the SOAP-RPC language bindings are implementation-specific. Because the protocol specification is based on the protocol syntax, not the API, using a syntactic metaprotocol essentially means the programming APIs are not specified at all in an interoperable fashion. The attempt to use SOAP as an RPC protocol has been deprecated by the SOAP community itself.

Note, though, that the issue is the syntactic metaprotocol, not the XML encoding itself. XML-RPC is a dynamic typed metaprotocol that successfully uses XML as an encoding.
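The idea that an RPC protocol is essentially a language API can be sketched with Java's reflection proxies. In the sketch below the interface itself is the specification, the client proxy turns each call into a message of method name plus arguments, and the server side looks the method up and invokes it. Everything here is an assumption for illustration: the names RpcProxySketch, TestApi, client and dispatch are hypothetical, the message is passed in-process instead of being serialized over a transport, and this is not how any particular metaprotocol library is implemented.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: an interface used as the end-to-end protocol specification.
final class RpcProxySketch {

    /** The API that both client and server agree on. */
    interface TestApi {
        String combine(String a, String b);
    }

    /** Client side: turns each method call into a [method name, args...] message. */
    static <T> T client(Class<T> api, T server) {
        InvocationHandler handler = (proxy, method, args) -> {
            List<Object> call = new ArrayList<>();
            call.add(method.getName());
            if (args != null) {
                call.addAll(Arrays.asList(args));
            }
            // A real system would serialize the call and send it over a transport here.
            return dispatch(api, server, call);
        };
        return api.cast(
            Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler));
    }

    /** Server side: finds the method by name and argument count, then invokes it. */
    static Object dispatch(Class<?> api, Object server, List<Object> call) throws Exception {
        String name = (String) call.get(0);
        Object[] args = call.subList(1, call.size()).toArray();
        for (Method method : api.getMethods()) {
            if (method.getName().equals(name) && method.getParameterCount() == args.length) {
                return method.invoke(server, args);
            }
        }
        throw new NoSuchMethodException(name);
    }

    public static void main(String[] args) {
        TestApi implementation = (a, b) -> a + b;
        TestApi remote = client(TestApi.class, implementation);
        System.out.println(remote.combine("hello", "world")); // helloworld
    }
}
```

The caller sees only an ordinary typed method call, and the compiler checks it against the interface. This is the sense in which the API, rather than a wire grammar, serves as the protocol specification.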

Streaming

  • continuous stream of packets/objects/messages
  • AJAX, monitoring, flash, multiplayer games, etc.
  • dynamic typed: JSON, Hessian
  • (CORBA and RMI can simulate with bi-directional RPC)

The streaming pattern is used for applications which continually monitor changes in a resource, e.g. air traffic controller displays, administration monitors, AJAX applications, and multiplayer games. In the streaming pattern, the server asynchronously sends packets of data as the state changes. The client-to-server control may use a streaming pattern or may use an RPC style, i.e. a data channel and a control channel. It's essentially an event-driven architecture.

The packets may be represented as event objects which update the display state of the monitoring client. Since the event packets are object-based, a dynamic typed metaprotocol is a good fit, e.g. JSON or Hessian. The static typed metaprotocols RMI and CORBA can support streaming applications as a bidirectional RPC protocol, i.e. the server calls the client to update its state.

Debugging

When things go wrong, it can be handy to check the data going across the wire to help track the source of a bug. For example, a client might not be updated to the latest version.

Debugging is straightforward for dynamic-typed metaprotocols because they contain all the type information needed to reconstruct a printable version of the wire data. In contrast, static-typed metaprotocols like CORBA do not generally make wire-debugging available, because CORBA requires an out-of-band IDL sharing. Since the debugging tool does not have the IDL, it can't decode the CORBA stream.
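To show why no IDL is needed, the sketch below prints a byte stream using only the type information carried by the leading byte of each value. It covers just the short string and compact int forms that appear in the Hessian examples earlier in this article, and treats string data as ASCII. The negative ends of the int ranges are an assumption based on the Hessian 2.0 draft, and the class WireDumpSketch is hypothetical, not an actual Hessian debugging tool.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: decodes self-describing values without any schema.
final class WireDumpSketch {

    static List<String> dump(byte[] wire) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < wire.length) {
            int code = wire[i++] & 0xff;
            if (code <= 0x1f) {
                // Short string: the leading byte is the length.
                String text = new String(wire, i, code, StandardCharsets.US_ASCII);
                lines.add("\"" + text + "\"");
                i += code;
            } else if (code >= 0x80 && code <= 0xbf) {
                // One-byte int: 0x90 represents 0.
                lines.add(Integer.toString(code - 0x90));
            } else if (code >= 0xc0 && code <= 0xcf) {
                // Two-byte int: high bits in the leading byte, low 8 bits follow.
                int low = wire[i++] & 0xff;
                lines.add(Integer.toString(((code - 0xc8) << 8) + low));
            } else {
                lines.add(String.format("unknown x%02x", code));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] wire = {0x05, 'g', 'r', 'e', 'e', 'n', (byte) 0xcf, (byte) 0xd3};
        for (String line : dump(wire)) {
            System.out.println(line); // "green" then 2003
        }
    }
}
```

A positional format gives a tool nothing to branch on: without the shared definition it cannot tell a two-byte short from the first half of a four-byte int.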

The output from a Hessian debugging option might look like the following:

Hessian Debugging
class TestAPI {
  example.Combine combine(String a, String b);
}

[2007/05/08 02:51:31.000] call 2.0
[2007/05/08 02:51:31.000]   method "combine"
[2007/05/08 02:51:31.000]   "hello"
[2007/05/08 02:51:31.000]   "world"
[2007/05/08 02:51:31.000] reply 2.0
[2007/05/08 02:51:31.000]   /* defun example.Combine [a, b] */
[2007/05/08 02:51:31.000]   object example.Combine (#1)
[2007/05/08 02:51:31.000]     a: "hello"
[2007/05/08 02:51:31.000]     b: "world"

Debugging syntactic metaprotocols like SOAP is also generally straightforward. Since the wire protocol itself is already human readable, it can be formatted and inspected easily.

Bandwidth Benchmarks

The bandwidth efficiency of the metaprotocols depends more on the specific encoding choices than on the taxonomy. An XML-based metaprotocol will be much less efficient than a binary metaprotocol even if both are dynamically-typed. For example, both Hessian and Burlap are dynamically-typed metaprotocols, yet the binary Hessian is much more compact than the XML-based Burlap.

Statically-typed and dynamically-typed metaprotocols can have the same efficiency, even though the dynamically-typed metaprotocol sends more data. For example, Hessian 2.0 is generally more efficient than RMI because of a more compact encoding. The CORBA results show the importance of choosing encodings even between two statically-typed metaprotocols. The string encoding and object encoding for CORBA are less efficient than those of RMI. (This example uses RMI/IIOP for CORBA and UTF-16 for its strings. Using a straight IDL for objects and UTF-8 for Strings would show results closer to RMI. However, most Java/EJB applications using CORBA will use RMI/IIOP.)

Since the main syntactic metaprotocols are XML-based, they are less efficient. In theory, a binary-encoded syntactic metaprotocol could be significantly more compact than SOAP or POX.
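
The gap between a binary and an XML encoding of the same data can be illustrated with a rough sketch. The XML layout below (a list element wrapping one int element per value) is a made-up layout for illustration and is not the SOAP or Burlap format; the class name EncodingSize is also an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class EncodingSize {
  // Fixed-width binary: 4 bytes per int, no markup.
  static int binarySize(int[] data) {
    ByteBuffer buf = ByteBuffer.allocate(4 * data.length);
    for (int v : data) {
      buf.putInt(v);
    }
    return buf.position();
  }

  // Hypothetical XML layout: one <int> element per value inside <list>.
  static int xmlSize(int[] data) {
    StringBuilder xml = new StringBuilder("<list>");
    for (int v : data) {
      xml.append("<int>").append(v).append("</int>");
    }
    xml.append("</list>");
    return xml.toString().getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    int[] data = new int[1024];
    for (int i = 0; i < data.length; i++) {
      data[i] = i;
    }
    System.out.println("binary: " + binarySize(data));
    System.out.println("xml:    " + xmlSize(data));
  }
}
```

Even with short tag names, each value pays for an opening and a closing tag, so the markup outweighs the digits it wraps.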

BENCHMARK       HESSIAN  RMI (JAVA.IO)       CORBA  SOAP (JAXB)      BURLAP
INT ARRAY         2,045          4,173       4,200       16,357      14,367
STRING ARRAY     13,277         15,388      41,214       25,579      29,733
MESH ARRAY       15,158         22,731      33,016       56,325     180,315
EVENT ARRAY      26,392         33,930      66,018       89,095     211,028
ELEMENT TREE  3,605,693      4,932,261  12,724,492    6,335,087  22,497,213

The following graph summarizes the serialization size data. The values are normalized to the size of the serialized Hessian data.
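
The normalization itself is simple arithmetic: each size is divided by the Hessian size for the same benchmark. A minimal sketch that recomputes the normalized values from the table follows; the class name NormalizedSizes and the array layout are assumptions, while the numbers are copied from the table.

```java
final class NormalizedSizes {
  static final String[] PROTOCOLS = {"Hessian", "RMI", "CORBA", "SOAP", "Burlap"};
  static final String[] BENCHMARKS = {"int", "string", "mesh", "event", "element"};

  // Serialized sizes from the table, one row per benchmark.
  static final long[][] SIZES = {
    {2045, 4173, 4200, 16357, 14367},
    {13277, 15388, 41214, 25579, 29733},
    {15158, 22731, 33016, 56325, 180315},
    {26392, 33930, 66018, 89095, 211028},
    {3605693, 4932261, 12724492, 6335087, 22497213},
  };

  // Divides every entry of a row by the first (Hessian) entry.
  static double[] normalize(long[] row) {
    double[] ratios = new double[row.length];
    for (int i = 0; i < row.length; i++) {
      ratios[i] = (double) row[i] / row[0];
    }
    return ratios;
  }

  public static void main(String[] args) {
    for (int b = 0; b < SIZES.length; b++) {
      double[] ratios = normalize(SIZES[b]);
      StringBuilder line = new StringBuilder(BENCHMARKS[b]);
      for (int p = 0; p < ratios.length; p++) {
        line.append(String.format("  %s=%.2f", PROTOCOLS[p], ratios[p]));
      }
      System.out.println(line);
    }
  }
}
```

For example, the SOAP int array is roughly 8 times the Hessian size, and the Burlap mesh array is roughly 11.9 times the Hessian size.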

Int

The Int benchmark serializes a single integer array. It is from the Grid Computing SOAP benchmark by Head, Govindaraju, et al.

As expected, the binary typed metaprotocols are significantly more efficient than SOAP. For RMI and CORBA, the array is just 4-byte integers placed end-to-end. Hessian's size is smaller, since it stores many 32-bit integers in fewer than 4 bytes. SOAP and Burlap are significantly larger because the XML overhead dominates the size of the data in this case.

public class IntValue {
  int []data;
}
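
How a 32-bit integer can take fewer than 4 bytes can be sketched as a size function over value ranges. The ranges below are taken from the compact int forms of the Hessian 2.0 draft and should be treated as an assumption to verify against the specification; the class name CompactInt is hypothetical.

```java
final class CompactInt {
  // Encoded size in bytes for one int, assuming the Hessian 2.0 draft ranges:
  // small values fold into the tag byte, larger ones add one or two octets,
  // and everything else falls back to a tag byte plus the full 4-byte value.
  static int encodedSize(int value) {
    if (-16 <= value && value <= 47) {
      return 1;
    }
    if (-2048 <= value && value <= 2047) {
      return 2;
    }
    if (-262144 <= value && value <= 262143) {
      return 3;
    }
    return 5;
  }

  // Sum of the per-element sizes, ignoring any array header.
  static int totalSize(int[] data) {
    int total = 0;
    for (int v : data) {
      total += encodedSize(v);
    }
    return total;
  }

  public static void main(String[] args) {
    int[] data = {0, 47, 48, 2047, 2048, 262144};
    System.out.println("compact: " + totalSize(data));
    System.out.println("fixed:   " + 4 * data.length);
  }
}
```

The trade-off is that large values cost 5 bytes instead of 4, so the encoding only wins when small values are common in the data.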

String

The String benchmark serializes a single array of Strings. It is also from the Grid Computing SOAP benchmark.

In this case, the results are closer since the string data itself is a larger component of the serialized result. Hessian, RMI and SOAP all serialize the strings in UTF-8 encoding, so the string data size is identical. In this case, CORBA is using UTF-16 to serialize the string, resulting in larger data.

public class StringValue {
  String []data;
}
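
The effect of the string encoding choice can be checked directly with the standard Java charsets. This is a small sketch, not the benchmark code; the class name StringEncodingSize is an assumption, and UTF_16BE is used so that no byte-order mark is counted.

```java
import java.nio.charset.StandardCharsets;

final class StringEncodingSize {
  static int utf8Size(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  // UTF_16BE writes no byte-order mark, so only the characters are counted.
  static int utf16Size(String s) {
    return s.getBytes(StandardCharsets.UTF_16BE).length;
  }

  public static void main(String[] args) {
    String sample = "hello world";
    System.out.println("UTF-8:  " + utf8Size(sample));
    System.out.println("UTF-16: " + utf16Size(sample));
  }
}
```

For ASCII text, UTF-16 takes two bytes per character against one for UTF-8, so string-heavy data roughly doubles in size before any protocol overhead is added.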

Mesh

The Mesh benchmark measures serialization of primitive integers, doubles, and structures, i.e. data typical of scientific or accounting messages. It is also from the Grid Computing SOAP benchmark. It consists of an array of MeshInterfaceObject values. In this case, the array size is 1024.

As expected, the binary typed-metaprotocols are more efficient than SOAP.

public class MeshInterfaceObject {
  int x;
  int y;
  double value;
}

public class MeshValue {
  MeshInterfaceObject []data;
}
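
A measurement of this kind can be reproduced for the java.io case with a short sketch. It assumes the classes implement Serializable, which the listing above does not show, and it fills the array with made-up values, so the byte count it prints is not expected to match the table exactly. The class name MeshSize and the helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class MeshSize {
  // Serializable is an assumption; java.io serialization requires it.
  static class MeshInterfaceObject implements Serializable {
    int x;
    int y;
    double value;
  }

  static class MeshValue implements Serializable {
    MeshInterfaceObject[] data;
  }

  // Builds a mesh of n elements with placeholder values.
  static MeshValue build(int n) {
    MeshValue mesh = new MeshValue();
    mesh.data = new MeshInterfaceObject[n];
    for (int i = 0; i < n; i++) {
      MeshInterfaceObject item = new MeshInterfaceObject();
      item.x = i;
      item.y = i;
      item.value = i * 0.5;
      mesh.data[i] = item;
    }
    return mesh;
  }

  // Serializes to memory and returns the number of bytes written.
  static int serializedSize(Serializable object) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(object);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  public static void main(String[] args) {
    System.out.println("raw field data: " + 1024 * (4 + 4 + 8));
    System.out.println("java.io size:   " + serializedSize(build(1024)));
  }
}
```

Comparing the result with the raw field size of 16 bytes per element shows how much of the stream is per-object overhead rather than data.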

Event

The Event benchmark is also from the Grid Computing SOAP benchmark. It consists of an array of SimpleEvent values. Again, the array size is 1024.

public class SimpleEvent {
  int sequenceNumber;
  String message;
  double timestamp;
}

public class EventValue {
  SimpleEvent []data;
}

Element

The Element benchmark is intended to compare graph serialization between the metaprotocols. It serializes a tree structure that's a simplification of the XML DOM, containing only Element and Text nodes. For the test, all attributes are converted to child elements with a single text node. The source data is the test.rdf file from xmlbench, and is 11.9M.

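import java.util.ArrayList;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlElements;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlValue;

// Node, the common base class of Text and Element, is not shown.

@XmlRootElement(name="text")
public class Text extends Node {
  @XmlValue
  private String value;
}

@XmlRootElement(name="element")
public class Element extends Node {
  @XmlAttribute
  private String name;

  // Children are either nested elements or text nodes.
  @XmlElements({@XmlElement(name="element", type=Element.class),
                @XmlElement(name="text", type=Text.class)})
  private ArrayList<Node> children;
}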

Caveats

As always, benchmark results are extremely dependent on the test data. The only accurate benchmarks are based on your own application's data.

Conclusions

The purpose of comparing metaprotocols is to choose an appropriate one for an application. Because applications have different requirements, there is no one-size-fits-all answer to choosing a metaprotocol, but it is possible to give some general advice.

Scripting/RIA clients

For browsers and RIA clients, dynamically-typed metaprotocols like JSON, XML-RPC, or Hessian are generally good fits. Browsers and RIA clients like Adobe Flash typically use scripting languages like JavaScript as the client language. Simplicity and compactness are important, e.g. avoiding extra defined documents. Any complication should be pushed to the server's side.

Syntactic metaprotocols do not fit as well because the clients should avoid extra complications like a WSDL, and should also avoid the complications of using a DOM or XPath to parse the document directly. For scripting/RIA clients, it's better to use APIs appropriate to the client code and match those on the server.

API/Object Model (RPC, REST, Messaging, Streaming)

In cases where the client and server both use an object-oriented model, the typed metaprotocols are good choices. Because the typed metaprotocols define an end-to-end API specification, the applications on either end are carefully defined.

The communication pattern is not particularly important in this choice, whether RPC, REST or message-passing. The critical consideration here is that the applications on both ends use object-oriented models.
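
What an end-to-end API specification means in practice can be sketched in-process: both ends compile against one interface, and the client calls it through a stub. The interface CombineApi, the class names, and the logging stub below are hypothetical and do not use any real metaprotocol library; a real stub would serialize the call onto the wire instead of invoking the target directly.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class TypedApiSketch {
  // The shared contract: client and server both compile against it.
  interface CombineApi {
    String combine(String a, String b);
  }

  // Server-side implementation of the contract.
  static class CombineServer implements CombineApi {
    public String combine(String a, String b) {
      return a + " " + b;
    }
  }

  // Records each call the stub forwards.
  static final List<String> LOG = new ArrayList<>();

  // Client stub: stands in for the network layer by forwarding to the target.
  static CombineApi clientStub(CombineApi target) {
    InvocationHandler handler = (proxy, method, args) -> {
      LOG.add("call " + method.getName());
      return method.invoke(target, args);
    };
    return (CombineApi) Proxy.newProxyInstance(
        CombineApi.class.getClassLoader(),
        new Class<?>[] {CombineApi.class},
        handler);
  }

  public static void main(String[] args) {
    CombineApi api = clientStub(new CombineServer());
    System.out.println(api.combine("hello", "world"));
    System.out.println(LOG);
  }
}
```

The same interface would serve an RPC call, a REST handler, or a message consumer, which is why the communication pattern matters less than the shared object model.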

Document Model (REST, Messaging)

When the focus of the application is the document itself, syntactic metaprotocols fit best, whether SOAP or POX. Document-centered messaging applications are an obvious fit, but some REST applications work as well, provided they are still document-centered. However, if the producer and consumer of a REST call both use API/object models, then a typed metaprotocol will be a better choice.
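
A document-centered consumer typically reads the payload with a syntactic query rather than binding it to objects. The sketch below uses the standard javax.xml.xpath API; the order document, its element names, and the class name DocumentQuery are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class DocumentQuery {
  // Evaluates an XPath expression against an XML string and returns
  // the string value of the result.
  static String query(String xml, String expression) {
    XPath xpath = XPathFactory.newInstance().newXPath();
    try {
      return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical document; no schema or generated class is involved.
    String doc = "<order>"
        + "<item sku=\"a1\"><qty>2</qty></item>"
        + "<item sku=\"b2\"><qty>5</qty></item>"
        + "</order>";
    System.out.println(query(doc, "/order/item[@sku='b2']/qty"));
  }
}
```

The consumer depends only on the parts of the document it queries, so unrelated elements can change without breaking it. That flexibility is what the syntactic metaprotocols offer in exchange for giving up a typed API.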


Copyright © 1998-2012 Caucho Technology, Inc. All rights reserved. Resin® is a registered trademark. Quercus™ and Hessian™ are trademarks of Caucho Technology.
