Wednesday, February 27, 2013

Abandoning hope... and XForms

Sometimes a decision must be made. I have invested many hours in a standard that is quite complex and tedious to maintain, and that was great fun, but enough is enough.

Why did I invest in XForms when it took so much effort and time? Because I believe standards are good and good standards come from the W3C and the XML community. Also, I always had somewhat of an inferiority complex in IT development that I tried to compensate for by using techniques supposedly invented for common folk, i.e. non-programmers. Both are bad reasons from where I now stand. I took up XForms about three years ago because I was infected by the enthusiasm of a friend (who was even quicker to abandon it). At first I assumed it just used a model from which to infer a form. It turned out it was like a superset of HTML forms with options for displaying and validating data. This was largely taken up by HTML5, as we all know. The Model remains the core of XForms, but the W3C is slow to respond to innovations and the need for a better user experience, while the programming community is evolving rapidly and growing steadily. The needs and expectations for dumbed-down tools are shifting accordingly.

When it comes to the model of XForms, there is a gap with HTML5. The model separates the form elements from their respective types, constraints and representations. But wait, is there really a need to separate these? Why not have all types, constraints and representations in the document proper? Ah, because of re-usability. Well, to be honest, so far I have never encountered a use case where I could actually reuse a model! It never mattered whether I declared binds on the elements or in the model, and I can't quickly think of a use case where it should.

Back to my first intuition of generating a form from a model. I must have been quite stupid back then, because I was obviously thinking about a Schema... As it turned out, people have problems grasping the difference between a model and a schema, and to be honest, so do I. If a model is not generic in any way, then that means it is just another document. I recently found out that this is the core problem that I encounter when working with XForms. Like HTML5, it is document-centric, not data-centric. That was my error.

I first thought of abandoning XForms during a technical debate on the usage of a specific JavaScript toolkit on the betterFORM users mailing list. I had been brooding on something all this while, but couldn't grasp the issue. Now I get it: the problem was not the choice of client toolkit for the job, but the problems that arise when implementing a document-centric solution in a data-centric environment. JavaScript has taken a leap forward since toolkits like Dojo opened the possibility to switch from declarative (document-centric) widgets to programmatic (data-centric) widgets. Since then a lot of patterns have emerged that deal with problems that arise when binding events and methods in a document. To be able to benefit from these solutions, you will have to do it the Dojo way. Clearly, this my-way-or-the-highway approach is not particularly friendly towards other document-centric solutions. HTML5 will allow for a standard to be developed in tandem with dojo/event and dojo/method; XForms probably won't.

At some point I made a decision to only use programmatic widgets. The power of JavaScript engines these days is enormous, and smart design allows for a seamless user experience. At this point the incommensurability with XForms became most apparent. My use case here is localization: a website that needs to have forms in two languages.

At first my idea was document-based: use a different form for each language. Maintenance was of course a killer, but the thought was correct. Much later I decided to translate the forms after all. How to go about it? I read something about putting a static key/value map into the model. Bad idea, right? I also thought about other solutions. I could, for instance, add selectors to my form elements in XForms and map those in the client from a different, more flexible location. Or I could translate the form using XSLT or XQuery, and apply XForms to an already translated document. As per the betterFORM documentation I chose the first, and it's a mess. How does Dojo solve this? Each widget has a language attribute and can be assigned an nls object from its own namespace or anywhere else. The object is of course malleable at runtime, and if it's not available a default language is selected. As an aside, betterFORM does use the Dojo locale, but, alas, incorrectly (in 4.1).
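
To make the data-centric idea concrete, here is a minimal sketch of a lookup with a default-language fallback. It is not Dojo's actual nls API: the `bundles` object, the `label` and `addTranslations` functions and the sample keys are all made up for illustration.

```typescript
// Hypothetical message bundles keyed by locale (NOT Dojo's real nls format).
type Bundle = Record<string, string>;

const bundles: Record<string, Bundle> = {
  en: { name: "Name", submit: "Submit", cancel: "Cancel" },
  nl: { name: "Naam" }, // "submit" and "cancel" are deliberately missing
};

const DEFAULT_LOCALE = "en";

// Look up a label for a locale. Falls back to the default locale, and
// finally to the key itself when no translation exists anywhere.
function label(key: string, locale: string): string {
  const hit = bundles[locale]?.[key] ?? bundles[DEFAULT_LOCALE]?.[key];
  return hit ?? key;
}

// The bundles are plain data, so they can be extended at runtime.
function addTranslations(locale: string, extra: Bundle): void {
  bundles[locale] = { ...(bundles[locale] ?? {}), ...extra };
}

console.log(label("name", "nl"));   // translated
console.log(label("submit", "nl")); // falls back to the default locale
addTranslations("nl", { submit: "Verzenden" });
console.log(label("submit", "nl")); // now translated
```

The point is only that the translations live in ordinary data next to the widget, not in the form document, so they can be swapped or patched without touching the form.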

There is no way I can have the same power in a declarative way when it comes to localization. As it turns out, the publishing company I work for published a standard work on localization. Perhaps I should read it to find my assumptions are wrong. But apart from localization I think the issue remains. Forms are widgets, rarely used in inline text. They should be approached in a data-centric way, at least on the client. When form data needs to be validated or stored on the server it should be done in parallel with processing on the client, which in my opinion should be leading (I'm staying far away from HTML5 forms too). Forms can be easily generated from simple types and a schema. Creating a schema in a GUI is much more user-friendly than writing XML, so this is a MUST. Even if that GUI were to create XForms, forcing its rules on a user is a no-go. Yes, I have ideas for such a GUI, but they are not part of this article. The topic was abandoning hope and XForms. Perhaps now it is only fair to admit that I never had much hope for XForms anyway...
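
As a rough illustration of generating a form from simple types and a schema, here is a small sketch. The schema shape, the type names and the widget names are assumptions made up for this example, not a format used by XForms, betterFORM or Dojo.

```typescript
// Hypothetical minimal schema: field names mapped to simple types.
type SimpleType = "string" | "integer" | "boolean" | "date";

interface FieldSchema {
  type: SimpleType;
  required?: boolean;
}

type Schema = Record<string, FieldSchema>;

interface FieldDescriptor {
  name: string;
  widget: "text" | "number" | "checkbox" | "date";
  required: boolean;
}

// One widget per simple type (an arbitrary mapping for this sketch).
const widgetFor: Record<SimpleType, FieldDescriptor["widget"]> = {
  string: "text",
  integer: "number",
  boolean: "checkbox",
  date: "date",
};

// Derive a list of widget descriptors from the schema.
function generateForm(schema: Schema): FieldDescriptor[] {
  const fields: FieldDescriptor[] = [];
  for (const name of Object.keys(schema)) {
    const field = schema[name];
    if (field === undefined) continue;
    fields.push({
      name,
      widget: widgetFor[field.type],
      required: field.required ?? false,
    });
  }
  return fields;
}

// The same schema can drive validation, on the client and on the server.
function missingRequired(schema: Schema, data: Record<string, unknown>): string[] {
  return Object.keys(schema).filter((name) => {
    const field = schema[name];
    const value = data[name];
    const empty = value === undefined || value === null || value === "";
    return field !== undefined && field.required === true && empty;
  });
}

const personSchema: Schema = {
  name: { type: "string", required: true },
  age: { type: "integer" },
  subscribed: { type: "boolean" },
};

console.log(generateForm(personSchema));
console.log(missingRequired(personSchema, { age: 42 }));
```

Because the schema is plain data, a GUI could produce it without the user ever seeing XML, and the form itself never needs to exist as a document.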

Thursday, February 14, 2013

XML is dead. Long live RDF?

I'd choose concept over implementation any time. I kinda always knew that, but I rediscovered this recently. I want to be able to confide in that and in my intuition. It tells me XML is dead. Really. So here goes.

At XML Prague 2013 it occurred to me that RDF means the death of XML. I was discussing RDF with +Manuel Lautenschlager, and at one point he said: you can just infer XML. I tried to get him to elaborate on this statement, but we didn't seem to agree on the implications. But I thought, if one successfully manages to reason about the format of data, then XML would be one of the possible outcomes. This doesn't just mean that XML could be a subset of RDF, but conceptually: XML, its media type and any knowledge about it could simply become part of an ontology.

I discussed this with +John Snelson, who wasn't impressed. According to him, RDF is too fine-grained to present itself as a tree, the serialization would not be performant, and implementing the concept would be more complicated and time-consuming than just using XML. I'm not sure whether he would rather support the possibility of embedding XML in RDF, as proposed by his MarkLogic colleague +Charles Greer, who held a talk on the subject. John thought the idea might be interesting in theory, but would not make it into practice. But I feel that his approach is still a bit too techie. Data is just data, and if it weren't for concepts developed over the past decades we would still be punching holes in cards. Of course, I wholeheartedly agree that when it comes to computer science, the only way progress can be made or will even occur is when a thought is put into practice, and solves some real-world problem. But in this case, I think Manuel may have had a point, whatever it was.

Yes, for now I see that we shouldn't "infer XML", but the problem of using HTML, XML, RDF and JSON together, and at what moment, remains an issue. Particularly for myself, because I have a lot of room to experiment and choose the best solution at any time. From an eagle-eye perspective, the world of data just doesn't seem so complicated as to need all of them. Personally, I'd rather lose some things along the way and go back to pick them up again, than stay put and juggle formats till the bitter end.

Do I really need HTML? No. I need some way to tell a machine: this is a rectangle, this is a bitmap, this is a font rendered at this size and at that location. This you can click and it screams at you, this is just sitting there and will shift like sand when I try to resize it. Do I need XML? Do I? Sometimes a user wants to see what he's actually doing. He wants to see the under-water-screen and understand it. Why deny that? Anyone can understand and write XML (as long as it has no namespaces). Do I need RDF? We all do. We need to finally understand that the world is about local knowledge and conventions. It's the only way we can improve upon the WWW and fight the googly-eyed monster. Do we need JSON? Probably not. We need a way to transport a construct of every-day datatypes we use in our programming language. We're just very lucky JavaScript looks as it does, and I wouldn't for the life of me go back to PHP.

Since JavaScript took off, a lot of worries have faded to the background. But recent ideas like moving RSS to JSON tend to become a little like using a hammer for everything. Just to recap: data is still just data. More and more I get something like: who cares? We'll keep on blogging about the advantages of this over that, and meanwhile the world keeps turning. The main message is: the concepts are much more important, and they are: relations versus multidimensional arrays. Someone told me some time ago he would represent graphs in trees no matter what, just for the sake of having a user interface that can be navigated in a traditional way. And only now I see that, yes, so should I. So we have come full circle...

Trees are dead. Long live graphs. In the form of trees.
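
For what it's worth, presenting a graph as a tree can be sketched in a few lines. This is only an illustration under my own assumptions: the adjacency-list `Graph` type, the `unfold` and `render` functions and the sample data are hypothetical, and a node that already occurs on the current path is shown as a reference to keep cycles from expanding forever.

```typescript
// A directed graph as an adjacency list (hypothetical example data below).
type Graph = Record<string, string[]>;

interface TreeNode {
  id: string;
  children: TreeNode[];
  // True when the node is already on the current path; it is then shown
  // as a reference and not expanded again.
  reference: boolean;
}

// Unfold the graph into a tree, starting from the given node.
function unfold(graph: Graph, id: string, path: string[] = []): TreeNode {
  if (path.indexOf(id) !== -1) {
    return { id, children: [], reference: true };
  }
  const nextPath = path.concat(id);
  const neighbours = graph[id] ?? [];
  return {
    id,
    children: neighbours.map((n) => unfold(graph, n, nextPath)),
    reference: false,
  };
}

// Render the tree as indented lines, the way a tree widget would show it.
function render(node: TreeNode, indent: string = ""): string[] {
  const line = indent + node.id + (node.reference ? " (ref)" : "");
  let lines = [line];
  for (const child of node.children) {
    lines = lines.concat(render(child, indent + "  "));
  }
  return lines;
}

const graph: Graph = {
  a: ["b", "c"],
  b: ["c"],
  c: ["a"],
};

console.log(render(unfold(graph, "a")).join("\n"));
```

Note that a node reachable along several paths (like `c` here) appears more than once in the tree; that duplication is the price of navigating a graph in the traditional way.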

Tuesday, February 12, 2013

XML Prague 2013 Afterthought

An anniversary is supposed to be a happy occasion, but at some point you also tend to feel sad. You sense that when something reaches a certain age, it's also a step closer to death. Happy birthday dear XML.

However, if MicroXML succeeds XML (as proposed by Uche Ogbuji), then perhaps it means that ENQUIRE is going to replace the WWW. Some weeks ago I watched the unveiling of the Nintendo 64, when we got all the cool games we still play today. Afterwards I wondered: how can it be that Nintendo developed all this stuff back in '96, when I'm still struggling with namespaces? Happy birthday dear me.

I read Michael Kay's blog entry on MicroXML, and his concern for namespaces in XSLT. I understand this concern, but XQuery doesn't seem to share this problem. Defining a module from a URI and mapping it to a local namespace is common practice, and recently found its way into JavaScript in the form of require.js, which sails under the flag of the Dojo Foundation. I don't see any problem with that, but I do see what terrible things can happen when you encode modules into your data. It's a bit like trying to draw Java dependency management into an RDBMS. It would have killed SQL instantly.

To continue musing on JavaScript some more, the require.js pattern was devised to solve the problem of how to download modules from the web asynchronously, and still be able to use them at the proper moment in the application. Although this is a typical requirement for web applications, it does add asynchrony to the stack. I wonder, what's Kay's approach to this in his XSLT for the client implementation? I know that eXist has a function in XQuery that can spawn a new thread, and I discussed the possibility of asynchronous functionality in XQuery with Wolfgang Meier. It seems a lot can be gained from looking at node.js and the way synchronous versus multithreaded programming is handled there.

But perhaps by now the following question has arisen: why asynchronous processing of XML, or perhaps at all? Well, lots of reasons really. Say I want to run batch jobs, and I'm certain my machine can take much more load, but is simply waiting to finish the thread. Or I send an HTTP request and don't want to wait for a response. Sure, but why in XQuery? Back to JavaScript again. The main problem I face with it on a day-to-day basis is that developers don't know or care about functional programming and immutable data. Moreover, they don't feel any need to write semantically sound code. And yes, we now have require.js and Dojo patterns like deferreds and aspects, but JavaScript developers are still relying heavily on delegation, monkey patching and closures.

Another problem with JavaScript that struck me after listening to Juan Zacarias on JSONiq: it doesn't have a good querying interface for its own bloody data model! Oh, that's right, it doesn't even have a data model ;-) how silly of me. It's just an in-memory construct of what was already available. Why not put it in a database and pretend it is a model and... ok I'll stop. The query interface in JavaScript was never properly fixed by jsonquery, but RQL is a much more solid attempt. Too bad the TLA sucks, but it does what it needs to do. One problem solved, but still a few to go. When I look at some code that I have to work with or extend in Dojo, the hairs stand up on the back of my neck. And yet we all know it's the best toolkit out there...

Wouldn't it be more proper to have a client-side implementation of (asynchronous) XQuery? Sorry, master, I mean no disrespect, but XSLT doesn't seem to do it for me. Not in this form anyway. Nor does ClojureScript by the way, with its terseness and steep learning curve. I will leave you with an open question: what should the data model look like?

Mwuhahaha!