Friday, October 30, 2015

Shuffle it

Coming from the muddy world of javascript, I've always desired cleaner code, not only from myself, but from the community as well. I tried to achieve that by creating a small language that parses to javascript, as so many others have done before me. It's based on Joy, developed by Manfred von Thun over a decade ago, although it seems to have been abandoned. A similar language, Forth, was already developed back in the sixties, and is still actively used in embedded computer systems. Like other "purely functional" languages the core mechanic is the mathematical function. Mathematical functions perform operations on a number of values, and return a new value, not touching anything else in the program. Unlike other functional languages, the functions – or rather, "words" – in Joy (and Forth) have no formal parameters. Consider the following function in javascript:

function add(a, b) {
    return a + b;
}

Parameters 'a' and 'b' are needed to create the purely mathematical add function (indeed, I just made a common JS operator into a word...). Instead of using parameters, functions in Joy operate on a stack, which is always available to the program. By simply writing values and words in a left-to-right order, the stack is both appended to and consumed. A simple example of adding two integers:

2 2 add

As can be clearly read from this example, two integers are appended to the stack, and lastly the addition function is executed on those two integers on the stack, which are then discarded. After the operation the integer 4 is left on the stack. Had the program continued, that value could have been used for new operations, for example:

2 2 add 3 multiply

Which obviously leaves 12 on the stack.
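To make the flow above concrete, here is a minimal sketch of such an evaluator in plain javascript. It is not the language I built: the word table and the run function are made up for this example, and only the two words used above are defined.

```javascript
// Hypothetical word table: each word pops its arguments off the stack
// and pushes its result back. Only "add" and "multiply" are defined here.
const words = new Map([
  ["add", (stack) => { const b = stack.pop(); const a = stack.pop(); stack.push(a + b); }],
  ["multiply", (stack) => { const b = stack.pop(); const a = stack.pop(); stack.push(a * b); }],
]);

// Evaluate a program of whitespace-separated tokens, left to right.
function run(program) {
  const stack = [];
  for (const token of program.trim().split(/\s+/).filter(Boolean)) {
    const word = words.get(token);
    if (word) {
      word(stack); // a word consumes values and leaves its result
    } else {
      const value = Number(token);
      if (Number.isNaN(value)) throw new Error(`Unknown word: ${token}`);
      stack.push(value); // a literal is simply appended to the stack
    }
  }
  return stack;
}

console.log(run("2 2 add"));            // [ 4 ]
console.log(run("2 2 add 3 multiply")); // [ 12 ]
```

Note that there is not a single named parameter in the programs themselves: the stack is the only thing the words share.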

The advantages of a stack-based programming style for the web are quite evident, as it's pure, concise and presupposes little knowledge of the syntax. Furthermore it doesn't require knowledge of the more complex subject of function composition, as you can simply read the flow of the program. But there's a catch: there are no temporary placeholders other than the stack. So what do you do when you have some value that you need further down the line? As you now know, words consume values on the stack. How do you retain values until you actually use them?

Let's assume that you need to count the number of items in a list, and that value is needed more than once. Counting items in a list is an operation that you'd only want to do once, not only for performance benefits, but because it makes sense. The following program creates a list and counts the items (the list is created just for the example, normally it would have been put on the stack by another operation):

1 2 3 list count

This would leave 3 on the stack. Next, the result is tested for equality, which puts a boolean on the stack. However, this would consume the value (3) that we need further in the program, and so it needs to be duplicated. The first indented line gets executed when the result is true, the second when it's false:

1 2 3 list count dup 2 eq if
    4 add
    5 add

As you may have already figured out, dup introduces another kind of operation that is not part of the flow of the program, but is simply there to retain a value that would otherwise be lost forever. These kinds of operations are called stack shuffling. You might even argue that a language like Forth didn't become mainstream because of the lack of temporary placeholders. But, since it hasn't got any, stack shuffling is a dire necessity. It also puts a great burden on the programmer's mind, who has to keep track of what is on the stack at all times. Furthermore, the programmer needs to choose what to do first and what to retain for later, making code harder to revise.
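The bookkeeping described above can be traced step by step in plain javascript. This is only a sketch under my own assumptions: the words are written as ordinary functions on a shared array, the list is put on the stack directly instead of being built by a list word, and "if" is reduced to a conditional that picks one of the two branches.

```javascript
// The stack already holds the list, as if an earlier operation put it there.
const stack = [[1, 2, 3]];

// Hypothetical words, written as plain functions on the shared stack.
const count = () => { stack.push(stack.pop().length); };
const dup = () => { stack.push(stack[stack.length - 1]); };
const eq = () => { const b = stack.pop(); const a = stack.pop(); stack.push(a === b); };
const add = () => { const b = stack.pop(); const a = stack.pop(); stack.push(a + b); };

count();        // stack: [3]
dup();          // stack: [3, 3]
stack.push(2);  // stack: [3, 3, 2]
eq();           // stack: [3, false]  (the copy was consumed, the original 3 survives)

// "if" consumes the boolean and selects the first or the second branch.
const condition = stack.pop();
stack.push(condition ? 4 : 5);
add();          // stack: [8], because 3 does not equal 2

console.log(stack); // [ 8 ]
```

Leave out the call to dup and eq eats the only 3 there is, so the final add has nothing left to add to.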

Stack shuffling has become a true skill in itself, with a number of techniques that make it more endurable, but in the end a person like me can only take up so much stack space in his own mind. In the end it wasn't the program, but the programmer who had a 'stack overflow'. My enthusiasm for stack-based programming eventually died and I went back to the good old functions with parameters, and the good old placeholders for temporary values.

Monday, June 22, 2015

A web developer's unrest #2

This is a small series of blog posts where I try to find out how to best build UI components for the web. I build on Dojo, but for its upcoming 2.0 release, a decision is yet to be made on the successor of its widget system, Dijit. In this series I compare Dijit with some newer techniques now available for creating UI components, with the ultimate goal to be able to decide upon a long-term strategy myself.

Part 2: State

State is a bit of a neglected fire hazard in UI tools. The danger is that there are so many changes going on in your components that you tend to lose track. This is especially true when you have a single-page application. Dijit doesn't force you to create single-page applications, but when you run many components it makes sense to work that way, especially once you start handling navigation. That may seem like a bad idea, but it does give you full control of the program, its data flow and persistence, DOM rendering, transitions/animations, even styling. It comes with the ultimate price, though: you, the developer, have to manage state. That is as far removed from traditional web pages as one can get, because the WWW was originally stateless: client state is not retained by the browser.

Back in the day, Javascript libraries used to extend the DOM. Dojo instead chose decoupling, where isolated pieces of code render and control a portion of the DOM. As a consequence, Dijit is entirely stateful: each instance of a Dijit inherits first from the Stateful class, which ensures it has "get", "set" and "watch" methods, and so its own internal state. Updates of the instance's properties may be reflected in its DOM attachment. Although most changes are shielded from the API, it does mean that when you want to create a custom component, you have to master Dojo's way of managing state. It is perhaps mostly due to this aspect that Dojo is said to have a steep learning curve.
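To give an idea of what "get", "set" and "watch" amount to, here is a simplified stand-in written in plain javascript. It is not Dojo's code: the class name StatefulSketch and its internals are my own, and a real Dijit would update its part of the DOM where this sketch merely records the change.

```javascript
// Simplified stand-in for a Stateful-style base class (not Dojo's actual code).
class StatefulSketch {
  constructor() {
    this.values = new Map();   // property name -> current value
    this.watchers = new Map(); // property name -> array of callbacks
  }

  get(name) {
    return this.values.get(name);
  }

  set(name, value) {
    const oldValue = this.values.get(name);
    this.values.set(name, value);
    // Notify everyone watching this property about the change.
    for (const callback of this.watchers.get(name) || []) {
      callback(name, oldValue, value);
    }
    return this;
  }

  watch(name, callback) {
    if (!this.watchers.has(name)) this.watchers.set(name, []);
    this.watchers.get(name).push(callback);
  }
}

// A widget would reflect the change in the DOM; here it is only logged.
const button = new StatefulSketch();
const log = [];
button.watch("label", (name, oldValue, newValue) => {
  log.push(`${name}: ${oldValue} -> ${newValue}`);
});
button.set("label", "Save");
button.set("label", "Saved");
console.log(log); // [ 'label: undefined -> Save', 'label: Save -> Saved' ]
```

Every property that is set this way becomes a piece of state that some watcher may depend on, which is exactly what you have to keep track of when writing a custom component.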

A good reason to introduce state is for example the support of WAI-ARIA for custom components, to make them accessible to people with disabilities. In other areas it's less obvious. For instance, Dijit's layout is managed through a diversity of containers and content panes that can be nested to create desktop-like screens. Any resizing of a container sets off a chain of calculations, each retained in the widgets concerned for optimization. This was introduced to enable a layout system in older browsers, but can and should be avoided nowadays.

How does Polymer deal with state? It is handled firstly by two-way data binding: properties are stored in plain old objects, referred to as the "model". The DOM is hijacked: Javascript is fooled into interacting with the model as if that were the DOM itself. So, when the model changes, the DOM changes, but also vice versa (hence two-way data binding), entirely in sync. What I just called "hijacking" is an alternative way to avoid extending the native DOM, and more sophisticated than keeping pointers. This means there's a bit less state involved, but this doesn't mean that Polymer is entirely stateless: the state is in the model, and managing it can become as opaque as it can become with Dijit.

Finally, while looking at how React.js handles state, the first thing that struck me was that there's actually a property explicitly called "state"! There are also methods that seem to pertain exclusively to handling state. Apparently the React.js developers are coming more from a functional programming paradigm, as in functional languages state is something you have to achieve, rather than it being at the basis, like in javascript. However, at the top of the developer's guide it says: Components are Just State Machines. So much for introducing state where it ought to be, and not throughout the component library, right? Wrong! React explicitly tells us to not use state everywhere, but only where needed:
Most of your components should simply take some data from prop[ertie]s and render it. However, sometimes you need to respond to user input, a server request or the passage of time. For this you use state. Try to keep as many of your components as possible stateless. By doing this you'll isolate the state to its most logical place and minimize redundancy, making it easier to reason about your application.

With the above quote I end the second part of my attempt to find my bearings in UI development in the year 2015.

Saturday, June 20, 2015

A web developer's unrest

Being a long time Dojo Toolkit zealot, I was eager to see what would become of its upcoming 2.0 release. I can't say I'm disappointed, not at this point at least, but I do experience some unrest. This is not only because of the general direction Dojo is headed (TypeScript), but also because of the obvious difficulty the team has in deciding on a UI component solution. To my knowledge it is generally agreed that the current widget facility in Dojo (Dijit) is no longer deemed viable.

I expect that the core Dojo contributors have already made some sort of overview of their options, but I have not. For my own insight I'll attempt to sketch out the current developments in UI techniques, clearly with the ultimate goal to be able to decide upon a long-term strategy myself. Since this will all become very TL;DR I will divide the post into topics. I also promise to make a chart at the end of the series, to show all considerations at a glance. Today I'll start with code reuse, comparing Dijit with some newer techniques now available.

Part 1: code reuse

Dijit is an object-oriented widget library that builds on the core technique of Dojo's class-based approach to code reuse through modularity: the "declare" mechanism. The "declare" function takes as its main parameter a list of classes that are mixed together to form a new class. Inheritance is ordered from left-to-right: each class can call this.inherited(arguments) in any method in order to execute its counterpart in the preceding class. This way a chain of inheritance is formed that is able to reach all the way back up to the first class in the list. This manner of ordering stems from a solution to the problem that arises with multiple inheritance. It is called C3 linearization, and is found in some serious object-oriented languages. Not for the faint of heart, and definitely not something you would expect to find in a web tool.
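The chain described above can be imitated with plain ECMAScript 6 classes. This is a sketch, not Dojo's "declare": the mix helper and the class names are made up, super stands in for this.inherited(arguments), and the list is simply applied left to right without any real C3 linearization.

```javascript
// Each "class" is a function that extends whatever precedes it in the list.
const Widget = (Previous) => class extends Previous {
  describe() { return ["Widget"]; }
};
const Templated = (Previous) => class extends Previous {
  // super.describe() plays the role of this.inherited(arguments).
  describe() { return [...super.describe(), "Templated"]; }
};
const Focusable = (Previous) => class extends Previous {
  describe() { return [...super.describe(), "Focusable"]; }
};

// Hypothetical helper: mixes the list, left to right, into one class.
function mix(...mixins) {
  return mixins.reduce((Previous, mixin) => mixin(Previous), class {});
}

const Button = mix(Widget, Templated, Focusable);
console.log(new Button().describe()); // [ 'Widget', 'Templated', 'Focusable' ]
```

The call starts at the last class in the list and reaches all the way back to the first one, which is the chain of inheritance in miniature.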

As far-fetched as C3 linearization may sound, it is not the reason Dijit is on the way out. On the contrary: the initial replacement of Dijit was to be Delite, a project backed by IBM, which takes the same approach to code reuse, but differs from Dijit on other points, as I will discuss later in this series.

A solution for mixing classes together was also discussed for the next version of Javascript, ECMAScript 6. The proposal has been revised many times on this point, and still remains a topic of debate, but as it now stands the inheritance model will follow that of firmly established object-oriented dinosaurs, such as Java and C#. Although ECMAScript 6 doesn't embrace the Dojo solution, Dojo's approach to object-oriented Javascript did play a role behind the scenes. In addition, the solutions Dojo proposed for reusable components didn't stop with multiple inheritance, and the exploration for new patterns continues to this day. I won't go into detail about those here, because that was not the intended topic. However, it is likely that they will persist in Dojo 2.0.

So how do other tools solve the problem of re-usable code? Some more recent players of major import are Polymer and React.js. To start with Polymer, it's a project aimed at enriching markup, and does so by encapsulating it, together with Javascript and CSS, into a single component. Because the focus is on markup, one would expect the possibility for extension to be limited. However, there is some room for code reuse through mixins, which, instead of mixing together classes, does so with instances. This means there's no chain of inheritance, but it's arguable if that was necessary in the first place. Polymer proposes to simply reuse what behavior is in another piece of code, as if it were a container of functionality, and that's a valid way to make use of Javascript's inherent features.
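Reusing behavior as if it were a container of functionality needs nothing more than plain objects. The sketch below is not Polymer's API: the two containers and the panel object are made up, and Object.assign does the mixing on the instance itself.

```javascript
// Hypothetical containers of functionality: plain objects holding methods.
const togglable = {
  toggle() {
    this.open = !this.open;
    return this.open;
  },
};
const describable = {
  describe() {
    return `${this.name} is ${this.open ? "open" : "closed"}`;
  },
};

// Mix the behaviors into one instance; no classes and no chain of inheritance.
const panel = Object.assign({ name: "panel", open: false }, togglable, describable);

panel.toggle();
console.log(panel.describe()); // panel is open
```

When two containers define the same method, the last one simply wins, so there is nothing like this.inherited to fall back on.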

Although Polymer (or rather, web components) doesn't support Dojo's multiple inheritance model, it's very much related to Dijit, and one could argue that it has its origins in it. The goal of Dijit was also to create a landscape of encapsulated, albeit interdependent, widgets. Although Dijit requires almost half of the stuff in the Dojo tool chain, it does rather well at separating concerns. And although Polymer is presented as the future of the web (because, hey, it's web components) it also needs to download quite a bit of code to get started.

React.js is a radical departure from Dijit, but also from Polymer. It is more concerned with updating the user interface efficiently than with component encapsulation. Perhaps it would be possible to make use of React's fast UI rendering without using its component engine, but it would be quite an undertaking. React's engine is built around an in-memory representation of the document tree: instead of updating the actual document tree that the browser renders, all manipulation is done on the "virtual DOM".

Every time a change occurs due to some outside event, the virtual DOM is reloaded. A batch of changes is compared to the previous snapshot, and the differences are then rendered to the UI. This diffing is what makes React so efficient, but there is some criticism of the component side of React. Because the components need to interact with the virtual DOM, all native parts of the component need to be "virtualized", including its markup and styling. Although these are mere technical limitations that at some point will be overcome, the question remains whether it is a good idea to work with an in-memory copy of the document tree, and to basically overwrite the default behavior of the browser software, at all times.
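The idea of comparing a new snapshot to the previous one can be sketched in a few lines of plain javascript. This is emphatically not React's algorithm, which is far more involved: the node shape (tag, text, children), the patch objects and the diff function are all assumptions made for this example.

```javascript
// Compare two snapshots of a tree and list the changes needed to get
// from the previous one to the next one.
function diff(previous, next, path = "root") {
  if (previous === undefined) return [{ type: "create", path, node: next }];
  if (next === undefined) return [{ type: "remove", path }];
  if (previous.tag !== next.tag) return [{ type: "replace", path, node: next }];

  const patches = [];
  if (previous.text !== next.text) {
    patches.push({ type: "text", path, text: next.text });
  }
  // Walk the children pairwise, by position.
  const previousChildren = previous.children || [];
  const nextChildren = next.children || [];
  const length = Math.max(previousChildren.length, nextChildren.length);
  for (let i = 0; i < length; i++) {
    patches.push(...diff(previousChildren[i], nextChildren[i], `${path}/${i}`));
  }
  return patches;
}

const before = {
  tag: "ul",
  children: [{ tag: "li", text: "one" }, { tag: "li", text: "two" }],
};
const after = {
  tag: "ul",
  children: [{ tag: "li", text: "one" }, { tag: "li", text: "2" }, { tag: "li", text: "three" }],
};

// Only the changed text and the new item are reported, not the whole list.
console.log(diff(before, after));
```

Only the resulting patches would have to touch the document tree that the browser actually renders; everything else happens in memory.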

As to the re-usability problem, React has chosen to completely abandon the idea of multiple inheritance and mixins in favor of "higher-order components", since mixins are not a part of upcoming ECMAScript standards. When googling for React and mixins, I came across this interesting alternative to mixins, proposed by React developer Sebastian Markbåge. All this means is that a container of reusable code is bound to a component. This can be achieved by simply wrapping the component in a function, and calling the function with the container as its argument. The container will remain in the scope of the function, and so accessible to the component. One could argue that this is very much like mixins, with the important difference that the technique is available in plain vanilla Javascript, and doesn't require any boilerplate code for mixins, let alone C3 linearization. Whether the concept will hold for more complex cases remains to be seen.
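The wrapping described above takes only a few lines of vanilla Javascript. Mind that this is a sketch of the idea as I understand it, not React's API: the "component" is a plain function that returns a string of markup, and the names formatting, withHelpers and Greeting are made up.

```javascript
// Hypothetical container of reusable code.
const formatting = {
  shout: (text) => text.toUpperCase() + "!",
};

// The wrapping function: it is called with the container as its argument
// and returns the component. The container stays in the function's scope,
// so the component can keep using it.
function withHelpers(helpers) {
  return function Greeting(props) {
    return `<p>${helpers.shout(props.message)}</p>`;
  };
}

const Greeting = withHelpers(formatting);
console.log(Greeting({ message: "hello" })); // <p>HELLO!</p>
```

Nothing is copied onto the component and no inheritance order has to be resolved: the closure is all there is to it.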

This concludes the first part of my attempt to find my bearings in UI development in the year 2015.

Update 06/22/2015: the assumption that React would move away entirely from mixins seems to be incorrect. The resource I used was authored by a React contributor, and not (yet) part of official documentation. Source:

JQuery critique (dis)continued

Some time ago I posted a not so nice attack on jQuery, which I removed. Before that I posted a critique of jQuery that I myself find constructive and informative, but that was nonetheless a bit random and incomplete. Moreover, it was also a shameless plug for a new technique I created myself: a stack-based, concatenative language that parses to Javascript in the browser. However, I have stopped working on it, as I think that stack-based languages are hard because of the mental bookkeeping you have to do on the stack. Sure, that is a perfect brain trainer, and there are ways to make the exercise less cumbersome, but I don't expect even a fraction of jQuery users will ever warm up to such a niche technique. To cut it short, this introduction serves as an apology to both the reader and the user of jQuery, for whom it has created huge opportunities. But now to end my critique of the bling chain once and for all.

First of all, I'm no user of jQuery and never have been. The reason for my crusade against it has been the isolation I experienced in relation to the community of fellow web developers that created content based solely on jQuery. It would have been beneficial to my work (and perhaps that of others) if a more open technique had been embraced. I've never found a community that completely fitted into my personal view, but the overall enthusiasm for jQuery in the past years has put me off again and again. I know that at the moment the world of web development is going through a major shift, and my critique has become less relevant. However, I would like to put it out there, to try and prevent developers from making the same mistake in upcoming solutions.

JQuery is a catchall. You can just throw anything into $() and you don't have to care what comes out at the other end of the chain. You use .get() to evaluate the result, which can be anything, from a DOM node to an event handler. I suppose this is simply caused by organic growth of the library: first it was just a uniform way to query and transform DOM, but as the library grew, so did the types of data that were processed in this style. The uniformity obviously served a purpose back when you had so many browser incompatibilities, but it was never a good programming practice to begin with, as type systems have mattered since the dawn of computation. Actually, proper typing and other standards have probably been slow to develop in Javascript because of the popularity of jQuery. I think a lot of people underestimate the damage it has done, because it lacks any formalization, mainly regarding but not limited to the following:
  1. what the input is that $ operates on
  2. what types are allowed in that input
  3. what the types are of the throughput
  4. what the semantics are that operate on the throughput
  5. how to consistently extend those semantics
In short, jQuery lacks what constitutes a proper programming language, while it does insist on a uniformity that diverges from its "host language", Javascript. One might say jQuery is just a pattern, and one that will get the job done quickly, but this pattern comes at a price: it is not compliant with any ECMA standard and cannot be transparently ported to one. Moreover, due to the lack of formalization, it never could, and never will, become a de facto standard.

Anecdote: I once came across a widget where an XMLHttpRequestUpload instance was thrown into $() just so bind() could be called. Imagine doing that with document.querySelectorAll... The point here is that typing doesn't necessarily have to be enforced, but a tool that doesn't even adhere to best practices in this regard should not have been allowed to become this successful. It shows what little formal knowledge its large user-base has, and I don't say this to insult anyone. To conclude, I found a video on the internetz from 2012, where a designer makes some valid points about jQuery. Though not from a formal background, the speaker instinctively finds some flaws that I need not repeat here, but that in my opinion the community should have taken note of back then.