Monday, December 31, 2018

Towards a safer JavaScript

Let's face it, JavaScript by itself gives you enough freedom to write terribly bad code. It may not yet be the new Perl, but one can certainly go for the write-only style that still haunts one of the most useful programming languages in history. The recent ES generations (are we really finally about to call it ECMAScript?) gave my favorite language a huge productivity boost, and along came static type checkers, proper linters and decent style guides. But it's not all sunshine and unicorns shitting rainbows. The community seems to thirst for more and more syntactic sugar, yet at the same time needs more and more linter rules and type constraints to rein itself in...

Perhaps at some point ES will stabilize and a new JavaScript (or whatever we will call it) may arise, discarding all the idiosyncrasies that we tried so hard to get rid of. But I don't have the patience to wait. And no matter what the community or business world says, I'm pretty sure we can discard some things straight away. As I'm an advocate of pure functional programming, I like to strife for limiting imperative code to a very confined space. Stateful operations with side-effects, like object-oriented programming, have their use, but not always and everywhere. Since we already constrain developers with linter rules and types, why not take it a step further and disallow some more "dangerous" operations as well?

While the relative safety of a more limited syntax seems obvious, it also makes for a much leaner and more predictable syntax tree, the resulting structure of interpreting this syntax (i.e. with a parser). This gives us the ability to parse and transmit code more efficiently and to manipulate it with less effort, which has already been observed in LISP, for example. This means rewriting programs is easier and unlocks possibilities for the arcane science of code generation. On the more humble side, domain-specific languages can be created on top of the simplified syntax tree, for instance a language to query a specific database. However, the topic outlined in this paragraph lies outside of the scope of this article.

I've worked with pure functional languages enough to see the downsides of having no imperative constructs. The main handicap becomes apparent when producing some low level utility. There one has to resort to some abstract alternative to concrete state, leading to much more complexity. Yet while it can certainly be tedious to have no access to imperative code, it doesn't mean we should always expose it. Most of the time we are on the consuming side of said low level utility, and we just need our code to be clean, productive and safe. We want it to communicate what we mean by it, not the procedure how we came to that meaning. But enough has been said of the advantages of removing the imperative mess, let's take a look at how we can actually achieve it.

I would like to propose splitting up JavaScript into an unconstrained and a constrained version. The first is (mostly) for producing low-level stuff and the second is (mostly) for consuming it. One is imperative in nature, the other is declarative. In addition, the latter is a form of lambda calculus, the mathematical ancestor of all functional programming languages. From this it should be clear that I don't mean to just use current ES functionally, but to create a pure, yet related, syntax.

To design this new language, I've taken inspiration from XQuery, the XML querying language. Although it's a strongly typed language (it seems to map to TypeScript quite well), XQuery resembles JavaScript in some respects. XQuery has many nice features that make it more complete than TypeScript, React, Redux, Immutable and RxJS combined together (in my opinion). However, if I would have to choose one and only one feature it would be that it in XQuery you can't have side-effects. How can we transport this to JavaScript? It might be easier than you think.

Every statement must produce a result

A "block" in JavaScript has its own scope, but must eventually address and change state that lives outside of it, or nothing will happen (the only exception is when return is encountered within a function body). In XQuery a block always returns a value instead, which is provided by the last expression in the block, and return is reserved for disambiguation only. If we would apply this rule to a block included in a JavaScript imperative statement (like if, for, while, try or catch) it means we have eliminated our imperative mood all at once (well, almost)! We may then either return that value from the enclosing block (e.g. a function), or assign it to a variable1.

No more vars

In a pure language, a variable can never be modified, in other words, there is no "state". A variable can only be constant. Yet we may want to reuse a variable name by assigning a new value to a previously bound variable name (although this can lead to confusion about its meaning). In any case, we can keep the ES const keyword as it is, and allow for the reuse of a variable name bound with let. However, we can still modify a variable bound with let outside of its scope, in other words, have side-effects. To prevent this, we introduce a small change: every variable must be bound explicitly. From this it follows that we have no more use for var.

Example


function test(a, b) {
  // if returns a value
  // the variable a is reused, but no state is retained
  let a =
    if(a > 0) {
      // the value of the expression in this block is returned
      a + 1
    } else {
      // re-assigning a value to a here doesn't have side-effects
      let a = 2
      return a
    }
  return a + b
}


Sure enough, these are only minor changes, but with great consequences. Obviously there remain some details to work out. For instance, what should loop-statements like for and while return? To me the answer is straight-forward, because in XQuery for always returns a sequence (as part of the FLWOR expression syntax). A sequence is just an abstract data type that serves as an iterable. The iterable already exist in ES, but sequences allow for more functional operations on this data type.

To conclude, this exploration is the result of my years of struggling to find the best tool for the job, that is, my job. I'm still searching for other functional feats to put in my imaginary ES spin-off, but I hope that what I've described here could be the start of something real an usable. Let me know what you think of these functional fancies!
 



1.The difference in syntax for fat arrow functions with and without return will naturally dissolve because of this, as will the difference between an if and a ternary expression (i.e. ? : ).

Wednesday, April 18, 2018

Trie Demo

https://codepen.io/wshager/full/zWBzMV

It says "directed acyclic word graph", which I think is the correct term, but I just found out on Wikipedia that this is indeed a trie. The difference is illustrated quite precisely, but I think it's really the same thing. I didn't use an existing algorithm, but I just made this up myself. I didn't study computer science (and I'm mostly glad I didn't). If you did, let me know if it's any good.

Tuesday, March 20, 2018

Rich Reducers

Writing parsers is not for humans. Humans define grammars, and feed those to parsers. However, I was determined to try my hand at making a parser for XQuery in XQuery as a human, because I saw that the language is actually quite consistent. The same is true for XML and S-expressions, for example, and for those languages there already exist hand-built parsers (for example in in JavaScript, see https://github.com/isaacs/sax-js and https://github.com/fwg/s-expression respectively).

After all, how hard can it be? Well pretty damn hard, as I've worked on it on and off for two years. Yet only recently I came up with three basic utility functions that help to understand how a parser actually works. These functions are variations of the widely known reduce (or fold) function.

Looking ahead

What you often need to do when parsing a string of text – character by character – is to look at what the next character is. For example, a colon has several different meanings in XQuery, so the parser needs some context to decide if a colon is part of a qualified name, a comment or an assignment. This can only be decided by "peeking" at the next character. A "peek" method is often available under this name in a parser in some object-oriented environment, but not in a strictly functional language like XQuery. So, time to introduce a function that actually does what I need: reduce-ahead.

The function I've created in XQuery could obviously also be written in other languages. Let's start by recapping what the regular reduce function does. Given some list-like structure (e.g. an array, a sequence, or a stream), and a starting value, apply a function to each item in the list, and return the accumulated result. Example:

  1. A list: (1,2,3,4)
  2. reduce with starting value: 0
  3. A function that adds each value to the accumulative result:
    (accumulative,currentItem) => accumulative + currentItem

Now each call will return an intermediate result for each item in the list:


iterationaccumulativecurrentItem
101
212
333
464

After the last function call, the result will of course be 10.

My reduce-ahead function simply adds an extra parameter to the accumulating function, that contains the next item in the list. With this function we could return another value depending on which item comes next in the list. The following function doesn't add when the next item equals 4:
(accumulative,currentItem,nextItem) => nextItem == 4 ? accumulative : accumulative + currentItem


iterationaccumulativecurrentItemnextItem
1012
2123
3334
434

Now the outcome will be 7.

This functionality is easily extrapolated to a parser, where it can look ahead whenever a character should be disambiguated. Other variants in the same spirit are reduce-behind, that provides the previous item in the list, and reduce-around, that provides both the next and previous item.

Especially when reducing streams this is quite useful, because you can't simply (read: efficiently) inspect a stream by using an index. Apart from hand-made parsers other use cases for this powerful principle may pop up in the future.

JavaScript implementation: https://gist.github.com/wshager/df8e0c67281068cd5d01f30452df7287

Friday, March 16, 2018

Pleidooi voor een meer formele benadering van programmeren

Programmeren is zowel een wetenschap als een gereedschap. De "man van de praktijk" loopt gaandeweg tegen problemen aan die formeel allang verkend zijn. Deze zaken zijn veelal uitgewerkt in ingewikkelde wetenschappelijke publicaties. Dat niet iedereen die leest is begrijpelijk, maar veel kennis is ook geïmplementeerd in functionele talen, als Haskell, Scala en recente LISP/ML varianten. Om inzicht te krijgen in fundamentele concepten is het verstandig om contact te houden met de ontwikkelingen in dergelijke talen. Ook in Javascript begint deze kennis door te dringen, al is dit geen zuivere functionele taal...

Kennis van zuiver functioneel programmeren opdoen kan door zelf een taal te leren of door bronnen te raadplegen op het WWW, zoals diverse blogs en tutorials (er zijn ook veel videos) of hulp te vragen van experts op StackOverflow. Het doel is niet om zo'n taal te gebruiken, maar om concepten te doorgronden die eraan ten grondslag liggen. Dit geldt overigens ook voor XQuery, een taal met een veel minder steile leercurve dan bovengenoemde. Maar XQuery heeft wel een aantal beperkingen, die blijven voortbestaan door het isolement waarin de taal zich bevindt, namelijk in de impopulaire wereld van XML. Echter is de taal niet in isolement ontstaan, en er zijn ook hierin verborgen juweeltjes die je kunt vinden door simpelweg met de taal te werken.

De verborgen formele oorsprong van XQuery zit in zaken als:

  • de monadische aard van sequences: alle data is veilig “ingepakt” zonder dat je daarover hoeft na te denken
  • for-comprehensions: ingepakte data zijn intuitief uit te spreiden en zo te verwerken
  • parallelle computatie: sequences lenen zich uitstekend voor impliciete luie evaluatie en het afhandelen van asynchrone processen

Ook Michael Kay van Saxon, de bekendste en meest gebruikte implementatie van XQuery (en XSLT), heeft toegegeven dat een XQuery implementatie zonder enige vorm van luie evaluatie slecht zal performen. Een voorbeeld van een implementatie zonder luie evaluatie is eXist-db: er is geen verschil tussen "complexe-computatie()" en "complexe-computatie()[1]". De functie wordt in zijn geheel uitgevoerd, voordat alleen het eerste item in de sequence wordt uitgevraagd. Kay maakt zelf overigens ook fouten, volgens mij door de theorie niet goed genoeg te bestuderen (hij noemt zichzelf beslist praktisch). Er is wel een formele specificatie van XQuery, maar die is weinig leerzaam in dit opzicht:
https://www.w3.org/TR/xquery-semantics/

De voordelen van XQuery zijn tegelijkertijd ook een nadeel. Het maakt gebruik van formele concepten, die zijn weggestopt voor de "eindgebruiker". Dit kan enerzijds gebruiksgemak vergroten, maar zorgt er anderzijds voor dat je je eigenlijk nauwelijks ontwikkelt als programmeur. Ik denk dat het beter is om zaken op de één of andere manier expliciet te maken of ergens uit te leggen. Maar ik denk nog meer dat het alleen mogelijk is om concepten te doorgronden door ze zelf te bouwen. Dat is niet mogelijk in een taal die is geïmplementeerd als black box. We moeten daar dus af en toe uit zien te komen.

Ik zie in JavaScript een zeer toegankelijke taal, die door velen wordt omarmt en waarin het mogelijk is om allerlei concepten uit te werken, voor zover ze nog niet zijn toegevoegd aan de taal zelf. Ik zou dit graag met anderen samen doen die JavaScript beheersen en openstaan voor een meer formele benadering, maar tegelijk realiseer ik mij dat iedereen hierin zijn eigen ontwikkelingsweg heeft te gaan. Ik denk echter dat het op dit moment voor iedereen die van software zijn beroep heeft gemaakt, noodzakelijk is om de volgende zaken te verkennen:

  • Reactive Extensions for Javascript / Observables / Streams
  • Immutable datastructuren
  • Een vorm van type-aanduiding
  • Code-generatie op basis reeds bekende, overzichtelijke tools, zoals formulieren, spreadsheets of gestructureerde editors, vooral in plaats van de ouderwetse database / CMS / CRM


Ik denk dat we zelf begrip moeten opbrengen voor deze zaken, anders raakt de kennis uit ons blikveld. Niet iedereen zal op hetzelfde moment en met dezelfde controle de kennis aan kunnen wenden, maar waar het nu om draait in software is precies bekend. Net als in zelfverdediging zijn er maar een paar dingen die je moet weten om jezelf enigszins te kunnen verdedigen. Je kunt zeggen: "ik bouw software, maar ik hoef niet alles zelf te begrijpen". Je kunt je uiteraard aan de confrontatie onttrekken... totdat er toch iets op je pad komt. Maar dan heb je in ieder geval de noodzakelijke gereedschappen om daarmee om te gaan.


"RxJS is een hamer"