Thursday, April 20, 2017

Pandora's Xbox

In 2012 Mark Nottingham wrote a blog post entitled "JSON or XML: Just Decide". It argues that instead of trying to support both formats, web APIs should simply use JSON. His advice seems to have been followed widely, as there are very few XML APIs left nowadays. However...

XQuery is a language that was originally developed as a DSL to query and transform XML documents. Since JSON has pushed XML aside as a data format on the web, the authors sought to support other data structures as well: JSON-like arrays and maps. This was realized in XQuery version 3.1, and a Candidate Recommendation was published in 2015. I know what you're thinking: too bad, too late, and against MNot's advice. This would indeed be my TL;DR, but just let me elaborate a bit more...

Together with XPath and XSLT, XQuery is a powerful tool to handle semi-structured data: "documents" based on the XML Infoset data model. However, maps and arrays (JavaScript's native data structures) are not part of the XML Infoset. This means that maps and arrays exist in XQuery on their own, separately from other data types, as so-called "function items". When trying to serialize an XML element containing a map, XQuery will throw an error like "XQTY0105 Enclosed expression contains function item". The W3C XQuery Working Group admits that function item types were tacked on to the language as an afterthought, and has spent a lot of time and resources debating how to fit them into the syntax and data model. The solution that was settled upon seems more of a truce than a sound consensus. It is doubtful that after this struggle XQuery will ever see a version 3.2.

It should be mentioned that due to the stateless (or side-effect free) nature of XQuery, maps and arrays are required to be persistent (AKA immutable). This means that "function items" have no native counterpart in common host languages for XQuery, like Java, nor in JavaScript itself, and in that sense they are not really JSON at all. Implementations are often borrowed from popular functional programming languages like Clojure or Scala, where they have been developed by people like Rich Hickey and Phil Bagwell. As with all non-native types, performance depends heavily on the choice of algorithm, and there's no "general-purpose" immutable data type. Each implementation has its own set of characteristics, for example faster appends at the cost of slower concatenation or iteration. Strangely enough, there's no mention of this in the XQuery specification. Well, not so strange, since XML seems to be all about "don't explain how it works, just how to use it"...
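To make "persistent" a bit more concrete, here is a minimal sketch in Python (not XQuery, and not how any particular XQuery processor does it). It assumes a plain unbalanced binary search tree, and the names Node, put and get are my own. The point is only that an "update" returns a new version that shares most of its structure with the old one, which stays valid and unchanged.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """One immutable tree node; hypothetical name, for illustration only."""
    key: str
    value: object
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def put(node, key, value):
    """Return a new tree with key -> value; the input tree is never modified."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        # Copy only this node; the right subtree is shared with the old version.
        return node._replace(left=put(node.left, key, value))
    if key > node.key:
        # Copy only this node; the left subtree is shared with the old version.
        return node._replace(right=put(node.right, key, value))
    return node._replace(value=value)


def get(node, key, default=None):
    """Look up key, returning default when it is absent."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return default


v1 = put(put(put(None, "m", 1), "c", 2), "x", 3)
v2 = put(v1, "a", 4)  # a new version; v1 still exists as it was
```

Here v2 contains the new key while v1 does not, and the untouched subtree under "x" is the very same object in both versions. A production-grade structure (a hash array mapped trie, for instance) makes different trade-offs, which is exactly why the choice of algorithm matters.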

Although they are not the first inconsistency in XQuery (consider the higher-order function notation, for example), function items are a blemish on the "immaculate" data model and raise performance concerns. But more than anything else, they open Pandora's Box, or rather, a means for XQuery to break out of the DSL and try to do stuff formerly reserved for general-purpose programming languages. Sure enough, smart developers have always found ways to work around the intentional limitations of XML tools, but still... maps and arrays have allowed me to parse, serialize and execute XQuery and XSLT within XQuery! Parsing speed is of course hampered by the implementation's performance characteristics, but this goes to demonstrate that it can be done, and that XQuery has grown out of its DSL boundaries.

So, how to continue? On one hand it's clear that XQuery will not gain more popularity by adding JSON to the stack, since most developers don't require XML at all anymore. But on the other hand, it does have some neat tricks for working with data that we rarely see in other languages. So maybe there's still hope for XQuery? One of the problems of JSON is that there's no way to retrieve data from complex structures of maps and arrays conveniently. This has always been a forte of XML tools, because they all include XPath, the standard syntax for traversing the XML tree. But alas, since function items remain alien to the XML Infoset, they can't be traversed using XPath! There's some "syntactic sugar" (the lookup operator, ?) to select something from a hierarchy of "JSON" data in XQuery, but this is not only alien to XQuery, but also very unlike anything in JavaScript, and won't become popular as such.
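To illustrate what path-style retrieval over maps and arrays could look like, here is a rough Python sketch. It is not XPath and not XQuery's own syntax; the function name select and the "*" wildcard are assumptions of mine. Like an XPath step, each step works on a set of current items and silently drops whatever doesn't match. Note that list indices here are 0-based, whereas XQuery arrays are 1-based.

```python
def select(data, path):
    """Walk nested dicts and lists along path; return all matches as a list.

    A step is a dict key, a list index, or "*" for every child (hypothetical
    convention). Steps that do not match simply yield nothing.
    """
    current = [data]
    for step in path:
        matched = []
        for item in current:
            if step == "*":
                if isinstance(item, dict):
                    matched.extend(item.values())
                elif isinstance(item, list):
                    matched.extend(item)
            elif isinstance(item, dict) and step in item:
                matched.append(item[step])
            elif (isinstance(item, list) and isinstance(step, int)
                  and 0 <= step < len(item)):
                matched.append(item[step])
        current = matched
    return current


doc = {
    "books": [
        {"title": "A", "tags": ["x", "y"]},
        {"title": "B", "tags": []},
    ]
}

titles = select(doc, ["books", "*", "title"])    # every title
second_tag = select(doc, ["books", 0, "tags", 1])  # one specific value
missing = select(doc, ["books", 5, "title"])     # no match, empty result
```

Returning a list for every query, even a single-valued one, mirrors the way XPath always yields a sequence. Whether that is convenient for JSON data is debatable.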

So what about modifying "JSON" data in XQuery? XML fragments are simply recreated, but most implementations can optimize this process somewhat behind the scenes. However, doing this with maps and arrays straightforwardly would be a performance disaster! For XML documents there's also the XQuery Update Facility, but it is again inconsistent with the rest of the language, and it doesn't apply to JSON.
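The "straightforward" approach to modifying immutable nested data can be sketched like this, again in Python rather than XQuery. The name assoc_in is borrowed loosely from Clojure and is otherwise my own assumption. Every container on the path to the change gets copied in full, while everything off the path is shared.

```python
def assoc_in(data, path, value):
    """Return a copy of data with the item at path replaced by value.

    The original is left untouched. Each dict or list along the path is
    shallow-copied, so the cost grows with the size of those containers.
    """
    if not path:
        return value
    head, rest = path[0], path[1:]
    if isinstance(data, dict):
        copy = dict(data)
        copy[head] = assoc_in(data.get(head), rest, value)
        return copy
    if isinstance(data, list):
        copy = list(data)
        copy[head] = assoc_in(data[head], rest, value)
        return copy
    raise TypeError(f"cannot descend into {type(data).__name__} at step {head!r}")


doc = {"a": {"b": [1, 2, 3]}, "c": {"d": 1}}
new = assoc_in(doc, ["a", "b", 1], 20)  # doc itself is not changed
```

After this call new["a"]["b"] is [1, 20, 3], doc["a"]["b"] is still [1, 2, 3], and the untouched branch under "c" is the same object in both. With a large array on the path, each single change copies that whole array, which is where the disaster comes from unless the implementation uses a smarter persistent structure.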

Ironically, modern JavaScript solutions for traversal and modification of JSON data (or HTML DOM) trees shy away from complex traditional APIs and move towards functional-style, stateless techniques. Some are reminiscent of XPath, some of CSS selectors, and others again of XQuery-style for-comprehensions, in the form of LINQ, a query DSL that has its origins in the C# .NET tool set. It seems that we should probably look to proprietary solutions nowadays for doing what we need to do with data.

To conclude, JSON is a latecomer in XQuery, very likely too late. Had the focus been on developing a more intuitive syntax for handling maps and arrays, the W3C XQuery Working Group could have had the answer for treating JSON in the best possible way, pouring years of knowledge about hierarchical data into a solid model. As it now stands, the feature is a half-hearted attempt to save the language from its impending demise.
