Possible projects for Boston Haskell Hackathon: smarter evaluation strategies and refactoring combinators

This weekend I'm going to be participating in the Boston Haskell hackathon. I'm very excited about it and I have a couple idea for projects to work on. If any of these sound interesting or you are thinking of something similar, I'm looking for people to collaborate with! Send me an email or come talk to me in person! I think I'm going to get there sometime in the late afternoon to early evening on Friday and I'll be around all weekend.

Evaluation strategies for Haskell that don't leak space

The first project is doing some research / prototyping of an alternate evaluation strategy with the same termination properties as normal order evaluation, but with much easier reasoning about space usage. For lack of a better name, I'm calling it specializing, strictness-propagating evaluation. In this model, calling a function is something like two communicating coroutines. When calling a function, the callee begins evaluating its body, yielding control back to the caller when it needs its first argument, and also indicating whether that argument should be strict or lazily passed, using whatever information is available at runtime. Subsequent arguments work similarly. As a result, functions are automatically specialized as arguments are passed, and we do not construct thunks if they are going to be consumed strictly by a subsequent callee. This can be implemented efficiently using just two call stacks, and there are various optimizations to the scheme. It is intended to augment, not replace, the existing static strictness analysis and argument passing.

Here's an example working through this for the if function, which let's assume has the following implementation:

foldl f z l = case l of 
  [] -> z
  h:t -> foldl t f (f z h)

I'm also going to assume we've done some static strictness analysis to determine that all branches evaluate z and that therefore the h:t branch evaluates f (since all branches evaluate z and in the h:t case, f appears at the head of an expression passed as z). Suppose we call this with foldl (+) 0 [1,2,3,4].

  1. Caller pushes foldl onto the call stack. foldl begins evaluating with no arguments. It gets as far as the case l. It will then request l strictly, since it is about to evaluated it anyway.
  2. To request the argument, foldl pops its currently running frame from the call stack and pushes it onto the save stack. It then resumes the caller now at the top of the call stack with an argument of strict.
  3. The caller passes the argument as requested - if the caller were itself receiving these arguments as function parameters, it would propagate the strictness request of foldl to its caller.
  4. To resume foldl, it pops foldl from the save stack and pushes it onto the call stack, giving it the (strictly evaluated) list it requested.
  5. Now we hit the interesting case: inside the h:t branch, we know that z is strict (this is known statically). We also know that f can now be evaluated, so we request this argument strictly from our caller. With f now evaluated, we can propagate its stricness information. We know we will be evaluating f z h - what we did not know until runtime was that f was plus (let's just say it was + specialized to Int), and therefore static SA has no choice but to pass f z h as a thunk. We now know that f is strict in both its arguments, so the call to f z h means we can fully evaluate z (which we do by requesting z strictly from our caller), h, and then f z h.

Each step of the iteration works similarly and foldl ends up running in constant space. I'm handwaving a lot here, but in general I want an evaluation order that is totally predictable in its space usage - values are immediately forced as soon as their consuming functions are known at runtime. The consuming functions tell us if an argument will ultimately be forced so we find out sooner rather than building up enormous thunks.

This needs some serious whiteboarding, but assuming it is at all sensible, here's what I propose doing:

  • Come up with an instruction set for this evaluation model, and write a simple interpreter for it
  • Write a compiler for a toy functional language to this instruction set, including the basic static analysis needed to kickstart the dynamic analysis
  • Try writing some programs with it

Some other interesting ideas - I wonder if there's some way to embed this evaluation model in GHC itself.

A code database for Haskell and refactoring combinators

The other project I'm interested in working on is a code database for Haskell, and a Datalog interpreter to go with it. Using this database and the datalog query language, I then want to implement a set of refactoring combinators. A "refactoring" is simply a compilation-preserving function from one code database to another. I've started tinkering with a set of combinators that individually preserve compilation and can be composed to allow arbitrary code transformations. I wrote up some ideas for that here:

... Refactoring times in this new model will go from weeks or months to hours, and writing code to transform a codebase will become a separate but critical skill, distinct from the usual act of programming. That is, programmers do not simply conceive of a refactoring (which is often quite simple to express to another programmer), then begin a tedious, manual and error-prone process of text munging to implement it. Instead, the programmer conceives of a refactoring, then conceives of a code transforming program to implement the refactoring, then applies this transformation to the code database, all in the span of a few hours.

... First, I am not advocating for datalog syntax. I don't care about that. The key functionality enabled by datalog over and above the relational algebra is the ability to express transitive closure and mutual recursion guaranteed to terminate. Together these features enable many of the common queries we'd like to express in transforming and querying our codebases. For instance, here is a hypothetical query to find all references to a given function id, fid. Don't worry if the syntax looks alien or doesn't make sense. The key is more that this query is just a few lines of code to express, and it can be reused and built upon.

-- propagate reference to containing apply
refs(Id) :- apps(Id, fid, _). 
refs(Id) :- apps(Id, _, fid).
refs(Id) :- refs(X), apps(Id,X,_).
refs(Id) :- refs(X), apps(Id,_,X).
-- any lambda whose body is or contains fid
-- is considered to reference fid
return(Id) :- lambdas(Id,_,Id1), refs(Id1).
return(Id) :- lambdas(Id,_,fid).

Much of the analysis required to implement refactorings has this sort of "transitive-closure" feel to it - you need to do something to the "direct" callers, then do some transformation for their callers as necessary, and so on.

Here's what I propose for this project:

  • Implement datalog, possibly backed by just in-memory data structures, or maybe tied to something like SQLite. Or if there's an existing free datalog interpreter and backend for it somewhere, let's see if we can use that.
  • Come up with the normalized datalog representation for the Haskell AST and type information - besides just the AST I think you'll need to know all the type information. Is there some way to use the GHC API to get the type of all expressions in the
  • Implement or steal a Haskell parser, and write code to translate this to the normalized datalog representation. As a proof of concept, take some existing Haskell project and "code-database-ify" it.
  • Come up with a good set of refactoring combinators. Implement them using datalog. As a proof of concept, use the combinators to express some nontrivial refactoring (like - make this value monadic rather than pure, and propagate the change in calling convention to all direct and indirect callers as needed - this is exactly the sort of refactoring that is trivial to describe to another Haskell programmer, and is totally mechanical, but is still done via a tedious process of text munging)

If all this is too much, I propose not doing this for Haskell but instead for a toy functional language with a very simple AST and type system.

The future of programming

What will programming look like 10 or even 20 years from now? With another new year almost here, now is the time to wax philosophical about the future of our industry. We are on the cusp of a number of major transformations in programming that will make 2011 programming technology, techniques, and ideas seem primitive by comparison. The transformations will occur in several key areas: tooling and infrastructure, languages and type systems, and runtime systems.

Before I get started, let me preface this all with the disclaimer that this should be taken with a grain of salt, in the spirit of being provocative, and these are not really predictions per se. There are lots of reasons why inferior technologies might continue to dominate the industry longer than expected. I won't talk much about that here, although it's certainly interesting. No, instead I want to give my vision of the future of programming. If there were no accumulated network effects attached to all the mediocre technologies currently in use for programming, if the programming world were given a clean slate and the directive to invent the future, what would I want to result?

An overarching theme to all these transformations is the move away from incidental structure and its close cousin incidental complexity. Manifested in various forms, these factors are major barriers to the building programs of greater complexity, and many of the major transformations will be to enable lifting of these barriers. The resulting programming world will look much different--orders of magnitude more code reuse, ease of deployment, efficiency, and much more.

Where we begin

The first major change is the move away from storing programs as text, split across "files". You can be forgiven for thinking of this as a minor thing. Who cares how programs are stored or represented, right? In fact, this major source of incidental complexity has spillover effects all the way down the programming toolchain, starting with the pollution of programmers mental thought processes, continuing to development environments (which I'll discuss next), and then to build systems and deployment. Like many of the things I discuss here, the full extent of lost productivity due to this incidental complexity is below most programmers radars. Programmers have unconsciously accepted this lost productivity, to the point that many now rationalize their dependence on files and text for programming.

But what about intentional programming, you ask? Or Smalltalk development environments? The idea of moving away from programs as text spread across files is not exactly new, but the devil is in the details. Previous efforts to rid ourselves of this incidental structure have failed in part because they merely substituted one arbitrary structure for another. For instance, intentional programming advocated the good idea of separating code presentation from its storage, substituting instead another format like XML with only slightly less incidental structure. Any advantage conferred by the new format and structure was not exploited to significant benefit, and there was a huge, very real cost in terms of lost network effects from all the text-based tools that could no longer be leveraged. Likewise for Smalltalk development environments. Also, somewhat ironically for Smalltalk, being an OO language, its fundamental premise includes one of the most arbitrary choices of all, in some sense the ultimate generator of incidental complexity, namely, the decision of which class should implement a particular method.

There is a real barrier to entry here for any new technology. It doesn't need to just be better than the antiquated status quo; it needs to be significantly so to overcome the network effects and switching cost advantage that status quo technologies inherently posses. In my opinion, none of the proposed text-in-file alternatives so far have come close to overcoming these handicaps.

But this is nothing fundamental. We should not conclude from these failed experiments that the idea is unsound. It's merely been poorly executed. The first major shift in getting this right is not storing programs as text at all, not even some normalized textual form like XML, and not some normalized binary form which nonetheless contains a huge amount of incidental structure. Likewise, get rid of files. Instead, a codebase is stored relationally, as a database, with some version of datalog being used for querying and update. Careful thought will be given to this query-update language so that many of the large-scale refactorings that are currently difficult or impossible can be fully automated, or guided automated, with just a few lines of this codebase transformation code. As a simple example, we eliminate the notion of where a function or datatype "is located". Instead, each function gets a unique id (perhaps content addressed, to enable a simple, automated form of code deduplication), and a separate names table maps these ids to names for purposes of rendering the code in some fashion to a human (and in principle, there is really no reason why there can't be multiple names tables, with different programmers choosing different names for the same operation). This representation trivially enables the "renaming" refactoring in just a single table cell update.

Let me sketch out a few additional thoughts on how this could work. First, I am not advocating for datalog syntax. I don't care about that. The key functionality enabled by datalog over and above the relational algebra is the ability to express transitive closure and mutual recursion guaranteed to terminate. Together these features enable many of the common queries we'd like to express in transforming and querying our codebases. For instance, here is a hypothetical query to find all references to a given function id, fid. Don't worry if the syntax looks alien or doesn't make sense. The key is more that this query is just a few lines of code to express, and it can be reused and built upon.

-- propagate reference to containing apply
refs(Id) :- apps(Id, fid, _). 
refs(Id) :- apps(Id, _, fid).
refs(Id) :- refs(X), apps(Id,X,_).
refs(Id) :- refs(X), apps(Id,_,X).
-- any lambda whose body is or contains fid
-- is considered to reference fid
return(Id) :- lambdas(Id,_,Id1), refs(Id1).
return(Id) :- lambdas(Id,_,fid).

Current IDEs, with their enormous development staff and reams of special purpose code, support a very limited subset of code transformation/querying operations, poorly, slowly, and with huge overhead. As a result, there is an entire subculture of programmers (myself included) who have largely rejected IDEs in favor of minimal, less feature-rich but more dependable text editors. Future development environments will take the refactorings now supported by the most advanced IDEs as the most insignificant starting point, and build from there.

Large scale refactorings are the inevitable consequence of achieving the reuse needed to build programs of significant complexity that don't die of heat death. Refactoring times in this new model will go from weeks or months to hours, and writing code to transform a codebase will become a separate but critical skill, distinct from the usual act of programming. That is, programmers do not simply conceive of a refactoring (which is often quite simple to express to another programmer), then begin a tedious, manual and error-prone process of text munging to implement it. Instead, the programmer conceives of a refactoring, then conceives of a code transforming program to implement the refactoring, then applies this transformation to the code database, all in the span of a few hours.

The code as database concept has spillover simplification effects in other areas of tooling, in particular version control. The problem of handling merges of ordered sequences of characters spread across files and directories with yet more arbitrary structure is extremely difficult, resulting in a huge amount of complexity in the area of version control software. The difficulties have led many to settle for what are arguably inferior VCS models (Git, Mercurial), where changesets form a total order and cherrypicking just doesn't work. In the future, with code databases, we'll see a resurrection of Darcs' patch theory, only this time, it will work exactly as expected, and will be trivial to implement.

Related to this notion of code databases, we get much more fine-grained dependency management. Dependency management is another serious problem for software development, a huge hindrance to code reuse, and once again, something that is simply below many programmer radars. A library will often introduce some new interface or abstraction and give several instances of that abstraction for particular concrete types. The library now depends on these concrete types being available for compilation. Including this library now means pulling in all these dependencies, even if you just require the one instance. The overhead of doing this in conjunction with the existing complexity of builds (again partially caused by the programs-as-text-in-files paradigm) means code is reused much less than would be possible or easy otherwise.

In the future, we'll be able to conditionally specify code, and not using an ad hoc, 1970s technology like the C preprocessor. The datalog query and update language will factor in here, and allow us to express things like: if a concrete type X is available in the codebase, define some set of functions and new datatypes. Likewise, dependencies will in general not be on "a library" or "a module". A function will depend only on the set of functions and types it references, and a codebase will be a much more fluid thing. We can extract any connected component of functions and datatypes from a codebase, and this is trivial, automated transformation supported by the query language. There are some unknowns here, all solvable, around versioning, and they are related to how we reconceptualize version control as a partial order of changesets, with no extraneous dependencies due to the representation of programs as ordered sequences of characters.

Code editing, IDEs, and type systems

Code editing will be done in structural editors, which will look nothing like the existing batch of IDEs that are little more than glorified text editors (and they are actually rather poor text editors). In a structural editor, the programmer will construct expressions which may have holes in them not yet filled with terms. Importantly, these structural editors will be type-directed, so for any given hole the programmer can be presented with set of values matching the expected type, ordered in some sensible way. The editor will perform local program search to enable autocompleting of multiple expressions. If you've ever seen someone live-code Agda, you'll know how powerful and productive this idea could be. Yeah, the actual interface for programming Agda is still kind of 1970s (a custom Emacs mode), but the idea of type-directed editors is powerful. It makes it clear that types are effective not just at preventing many software errors, but also in guiding development. They are a powerful tool to augment our puny programming brains.

Along these lines, the rise of type-directed development in languages with real, state of the art type systems will mark the beginning of the end for dynamically typed (aka, single-typed) languages. Dynamic typing will come to be perceived as a quaint, bizarre evolutionary dead-end in the history of programming. This is already widely accepted in some programming circles, where the general consensus is that most dynamic typing advocates are not familiar with type-directed development in a language with a real type system and are basically unaware of the state of the art in type systems and programming language theory. With some additional developments in type systems we'll see any last of any advantages to dynamic typing completely evaporate.

A related transition is that types will become even more critical and type systems will grow features to better handle data normalization, the lack of which is a major source of incidental structure and complexity. A big problem in large codebases is data normalization. Lack of data representation normalization is responsible for the significant amounts of plumbing code involved in aligning two pieces of code so they can talk to each other. Most of the mainstream programming world isn't really aware of this issue because code reuse in most imperative codebases is extremely limited (because side-effects don't compose well).

In the functional programming world, there is insane amounts of code reuse happening, but along with this comes a fair bit of plumbing. As a small example, consider a function, f, of type X -> Y -> Int, and a value, xs of type [(Y, X)] and suppose you want to apply f to the list. An experienced functional programmer will bust out map (uncurry f . swap) xs in about 3 seconds and not think twice about it. But this code is total boilerplate, there to convince the compiler this is what you really want to do, and is noise when reading the code. Yes, you are acheiving reuse (an imperative programmer would still be writing out a for loop for the millionth time instead of reusing higher-order functions) but it should be cleaner. If this code needs to exist (and I'm not sure it even does, see below), I'd rather get to this point in the editing process and then tell my editor to map f over xs. The editor will search for a program to make the types align, show me the program for confirmation if I request it, and then not show this subexpression in the main editor view, perhaps just showing map f* xs, where the * can be clicked and expanded to see the full program.

Even better, perhaps we can get away with much less plumbing in the first place. Plumbing can be eliminated from code in two general ways--the first, which I've just discussed, is to have plumbing code written for you automatically, then hidden unless requested. The second is to have more normalized types and type systems so that there is no incidental structure for code to broker between in the first place. To support this we'll see things like row types, unordered tuples, type sets, and so on, and any related type system machinery needed to make all this feasible and convenient. An idea I'm interested is explicit separation between the model underlying a type (which could be something very normalized, with no incidental structure), and views of that type, which may contain some additional structure. Functions will generally be written to operate over the model, from which any number of views can be reconstructed post-transformation. The compiler or runtime is generally smart about choosing runtime representations to avoid unnecessary round trip conversions between different representations.

Language runtimes

Non-strict languages will come to dominate the programming world, due to the increase in code reuse and modularity that comes with pervasive non-strictness. As I've argued before, optional laziness doesn't cut it. As with other issues I've mentioned, the problems with strict as default aren't apparent to most programmers, even those who view themselves as well-versed in functional programming. The problems with strictness only become fully apparent as you get much further along in the development of the FP style, in particular, after the discovery of combinator libraries and the further abstractions that develop to remove duplication across these libraries. This has been a source of tension in the Scalaz library, which supports functional programming in Scala, a strict-by-default language, and also in our Scala codebase at work.

The only real problem with laziness has to do with reasoning about space usage performance, and evaluation stack usage. These problems get a lot of play among people who enjoy rationalizing their ignorance of Haskell and FP in general, but the truth is some additional research and smarter evaluation strategies can address the problems. There's nothing fundamental here suggesting we should throw up our hands and resort to strict evaluation.

The usual normal order evaluation is guaranteed to terminate if any is, but reasoning about is space usage in this evaluation order is problematic. We can do much better. Besides static strictness analysis, which only covers a fraction of the cases we'd like, we can conceive of additional evaluation orders which terminate for the same set of programs as normal-order evaluation, and which can propagate additional strictness information. The one I am particularly interested in I refer to as specializing, strictness-propagating evaluation. I'll elaborate on this in another post, but in this evaluation model, calling a function is something like two communicating coroutines. When calling a function, the callee begins evaluating its body, yielding control back to the caller when it needs its first argument, and also indicating whether that argument should be strict or lazily passed, using whatever information is available at runtime. Subsequent arguments work similarly. As a result, functions are automatically specialized as arguments are passed, and we do not construct thunks if they are going to be consumed strictly by a subsequent callee. This can be implemented efficiently using just two call stacks, and there are various optimizations to the scheme. It is intended to augment, not replace, the existing static strictness analysis and argument passing.

In general, the goal of new evaluation strategies should not be efficiency (though that is a nice goal as well), but simple reasoning about space and evaluation stack usage, so that these things do not depend as they currently do on incidental details of how a function is factored or what the compiler chooses to inline (being forced to reason about these things in Haskell breaks the black box abstraction of functions that FP and higher-order programming depend on).

Language runtimes and VMs will grow to support these new evaluation strategies, and the current crop of "general-purpose" VMs that are actually quite specialized for strict, imperative languages (the JVM, the CLR) will likely die off.

Code distribution and the future of the web

What of the web, javascript, html 5 and beyond? Are we going to keep hobbling along with these technologies indefinitely? I don't think it's too controversial to say that writing applications of any significant complexity as a web application involves far more work than would be required with access to a truly powerful client-side language. Again, I don't think many web programmers realize just how bad the situation is. In comparison to mainstream languages like Java, C#, or Python, Javascript isn't so bad; in some ways it's even a better language. But compared to a programming language with a good type system and real support for FP like Haskell, Scala, and whatever the next generation of languages can bring, Javascript is quite sad.

Of course, proponents of existing web technologies are always finding ways to rationalize the status quo, pointing out how much better things are today than they used to be. That's maybe true, but why settle?

I envision a future where Javascript dies off, as do browser-specific plugins like Flash, and instead we'll see client code written in arbitrary, compiled languages, using something like NaCl. This will be combined with a signed code caching and dependency tracking mechanism, so that for instance, you can distribute an application that downloads the entire Java virtual machine and other dependencies, but only if they aren't already cached on your local machine (and the signatures match, of course). This changes how we think about software. Software won't be something you "download onto our computer and then run". Instead, software exists "out there", and you run it. As an implementation detail of running it, it may choose to download some additional code it needs to do its work.

This will end the monopoly that html+javascript has on client-side web applications and largely eliminate the switching costs of moving to new client-side technologies. A few lower-level protocols and standards will be enough to tie everything together and maintain the good parts of the web today (I'll say more about this next), but nothing so high-level as specifying the client side language for interaction (Javascript, or Dart, or whatever) or the client-side language for layout and display (html + CSS). In the future, client-side code is written in whatever language, compiled to native code if desired.

In place or in addition to native client support, another option would be some sort of in-browser low-level VM designed by people who know what they're doing, who are fluent in the state of the art in programming languages and runtimes. In other words, not the people who designed Dart, Go, Ceylon, or any of the other recent languages that are 30 years or more behind state of the art. We need something that actually supports languages of the future, not something that rearranges the deck chairs on the titanic sinking ship that form the current batch of mainstream languages.

What of the suggestion that we simply use Javascript as the "assembly language" of the web, use it as a compilation target, and avoid programming in it directly? This obviously is not ideal. Compiling a real programming language like Haskell or whatever is next through Javascript or Dart is obviously not how anyone would not how design things today given a clean slate. Even if this were possible to do well without bloating client code size beyond what's acceptable, there is the inevitable efficiency hit of compiling to a language whose performance is as bad as Javascript. When you think about it, it makes no real sense to be reverting to 1/10th or 1/100th the speed of what native code or a well-JIT'd VM could provide, purely for ease of deployment. We don't need to be making this tradeoff, and with the rise of something like NaCl and/or a real browser VM, we won't have to.

There's one wrinkle. There are tremendous network effects on the web. This is part of its power, and its usefulness, and we don't want to lose it. We do still need the moral equivalent of urls and hyperlinking but one thing we don't necessarily need is the DOM. What will take its place? Don't we need the DOM to enable mashups and services like search engines? Actually, no. DOM munging and traversal is not the only way programs could obtain information from applications on the web, and when you think about it, it's actually rather primitive. Why screen scrape when you can call a real API? This transformation is already sort of happening; most modern web applications expose APIs in the form of REST+JSON. It just needs to be taken a little further.

What would this look like? Well, we would need a standard type system for the web. By this I mean a standard way for the applications that live at a url to expose a module of types and functions they support. The underlying type system would be expressive enough to encode richer data than what JSON currently provides (which besides being dynamically typed, cannot even represent sum types effectively) and will support for algebraic data types and some of the type system features alluded to earlier. With this in place, standards will arise for certain function signatures, with standard meanings attached to them. So, for instance, rather than the googlebot crawling the DOM looking for links, it invokes the getAllLinks function of the application at the given url, which returns an actual list of url values. getAllLinks is some separate standard, and new ones can arise on an ad hoc basis, as we grow new modes of interaction with web sites. There will be certain near-universal standards (like getAllLinks) and more specialized ones, specific to the web application in question (for instance, Facebook exports certain functions and datatypes that are specific to Facebook, which are unlikely to be implemented by other sites, though this is certainly not a requirement).

Already, we can see this happening somewhat: there are various ad hoc APIs and mechanisms for basically controlling how web crawlers should interpret the DOM.

With a standard way of interacting with web sites programmatically, there's no longer any need for the DOM and we can see a proliferation of innovation of different display and layout technologies.

Closing thoughts: the rise of functional programming

I've hinted at this throughout: functional programming will absolutely win. Not everyone shares my views, but many of the improvements I've talked about start with the assumption that we are working in a functional, post-imperative world. FP provides a dramatic increase in productivity due to massive increases in code reuse and the ease of reasoning about functional code (not to mention ease of parallelization). The industry is slowly starting to understand this, but it doesn't really matter if many programmers are still for whatever reason resistant to learning FP (which is unfortunately true). Eventually, the selection pressures of a competitive market will weed out less productive and effective techniques, and this largely means the death of imperative programming as we know it, except in certain niche areas. Regardless of initial biases or attitudes, programmers and companies who wish to stay competitive will be forced to employ FP.

As for why FP hasn't gained more prominence already, well, perhaps I'll write more about that in another post. What I will say is FP is currently reaching a tipping point enabling its wider adoption and relevance. There are several factors in play: functional language compiler technology is advanced enough, computers are fast enough, and most importantly, the FP community is rapidly discovering all the necessary techniques to organize large programs that preserve referential transparency. The Haskell community has mostly led this charge, Haskell being the only widely-used language that, due to its non-strict evaluation, had no choice but to fully commit to preserving referential transparency throughout. Ten years ago, admittedly, there was a good chance that expressing some programs purely functionally involved what was basically new research. Today, many of the required techniques are known, and expressing even the most imperative-seeming program functionally is possible, and for an experienced functional programmer, pretty effortless. There are still interesting open questions of how to express certain programs, but these are diminishing rapidly.

That said, I understand why many people claim FP is too hard, or it's unnatural or awkward, etc. Like many worthwhile subjects, attaining fluency in FP is difficult and requires dedication. Once this level of fluency is reached, though, expressing functional programs is quite natural and effortless (of course, software design is still hard, but finding functional designs becomes easy with practice). For people looking in, the techniques of FP seem opaque and unnecessarily difficult. For people on the inside, there's nothing difficult or complex about it and the benefits are enormous. For those who would criticise FP, I think a little humility is in order. To draw an analogy, no one without mathematical background would feel equipped to dismiss or criticise an entire branch of mathematics ("real analysis is a stupid idea"), and yet programmers with barely a cursory understanding of FP regularly (and loudly) criticise it.

I see why some people are frustrated with some of the existing resources for learning the subject. But it's wrong to dismiss the subject on that basis, or on the basis of personalities of people (myself included!) who feel that FP is more productive; objectively, either FP is worth knowing because of the productivity and other benefits it provides, or it isn't. For my part, I am co-authoring a book on FP that I hope is helpful to some who are interested in learning the subject. With the solidification of FP techniques I expect to see more and more resources like this, to the point that fluency and the huge resulting benefits are something that any motivated programmer can work toward, without encountering some of the discouraging hurdles to learning FP that are present today.

Will the future of programming look anything like what I've laid out here? Maybe, maybe not. But I certainly hope it will.

Programmatic translation to iteratees from pull-based code

When working with iteratees, one notices a sort of dual between producer and consumer--for any given stream processing task (which pairs a producer of values with some consumer of these values), operations can often be expressed on either the producer side or the consumer side. For instance, if we want to square each value in a stream of integers, we can map over the stream (the producer), or we can map (contravariantly) over the values as they enter the stream's consumer (the iteratee). Here is an intriguing idea: what if we could programmatically translate stream processing tasks from one mode to the other? That is, write pull-based code, but translate to push-based for purposes of execution. It turns out this is actually possible and I'll show code in this post for doing the conversion.

Why is this desirable? Push-based APIs like iteratees are generally preferable for execution--data is produced and sent to its consumers, then any resources associated with that data can be freed. This model is simple and easy to reason about in terms of memory and resource usage. Pull-based execution is more problematic: resource lifetimes (and therefore space usage) are not deterministic since at any point in time, a function may choose to request "historical" data from the producer. The consequences of this have plagued some FRP implementations (though there are ad hoc workarounds that involve limiting access to history in various ways or only exposing certain "safe" combinators rather than the full pull-based API that forms the underlying model) and led many to reject lazy IO in favor of iteratees.

While iteratees address these problems for streaming IO, programming with them directly requires inverting your thinking, which is arguably somewhat unnatural. Likewise for other push-oriented APIs like arrows, which have used as the basis for FRP implementations that aren't as susceptible to space leaks. Honestly, I'm not sure what I think about the claims that push-based APIs are less "natural". What's most important is that whatever API you use composes, avoids code duplication, and gives you some reasoning tools. With those ingredients, I suspect you can get comfortable with any API. (I remember when I first started writing iteratee code, which we have a lot of at work, my brain nearly exploded when writing even the simplest of iteratees and HOFs for composing them... but after some practice it came much more naturally.)

Moving on, the key to converting from pull-based code to push-based code is to represent streams of data as a ListT, universally quantified over the underlying monad. I'll explain this in a minute. First, a ListT is just a list that yields a monadic effect with each uncons. Here is a ListT implementation:

-- Skip nodes let us do filtering without lookahead
-- uncons is then a simple loop that removes any Skip nodes
data Step a s = Empty | Yield a s | Skip s

newtype ListT f a = 
  ListT { runListT :: f (Step a (ListT f a)) }

instance Monad f => Monad (ListT f) where
  return a = ListT (return (Yield a empty))

  (ListT s) >>= f = ListT $
    s >>= \step -> case step of  
      Yield h t -> return $ 
        (Skip $ f h `mplus` (t >>= f))
      Skip t -> return $ Skip (t >>= f)
      Empty -> return Empty

instance (Monad f) => MonadPlus (ListT f) where
  mzero = ListT (return Empty)

  (ListT s) `mplus` b = ListT $ 
    s >>= \step -> case step of  
      Yield h t -> return $ Yield h (t `mplus` b)
      Skip t -> return $ Skip (t `mplus` b)
      Empty -> return $ Skip b

instance MonadTrans ListT where
  lift fa = ListT $ liftM (\a -> Yield a empty) fa

empty :: Monad f => ListT f a
empty = ListT (return Empty)

cons :: Monad f => a -> ListT f a -> ListT f a
cons h t = ListT . return $ Yield h t

Now, consider a function of type forall f . Monad f => ListT f a -> ListT f a. Parametricity guarantees the function cannot assume anything about f other than that it's a monad. Substitute the identity monad for f and we can see how any function [a] -> [b] could be written as a forall f . Monad f => ListT f a -> ListT f b. In other words, this API is just as expressive as the usual lazy IO style pull-based API.

The trick is that the caller of such a function can substitute another monad in for f, namely the reader monad. This lets us push values into the ListT.

data Input a = Done | Await | Element a 

prompts :: ListT (Reader (Input a)) a 
prompts = ListT . reader $ \input -> case input of
  Done -> Empty
  Await -> Skip prompts
  Element a -> Yield a prompts

prompts is a potentially infinite stream of a, but with each step, we obtain a function from Input a -> ListT (Reader (Input a). On the one hand, we can feed this to a forall f . Monad f => ListT f a -> ListT f b and from its perspective it is pulling values from the stream. On the other hand, at each step we get to push a value into the stream (or signal termination, or whatever "instruction set" we wish to support).

We need one more ingredient, a slightly-reformulated version of iteratees. A well-behaved iteratee will always yield a result (rather than a Cont) when fed EOF. We can make this more explicit in the type:

type IsEOF = Bool

data Moore a b = 
    Feed b (Input a -> Moore a b) 
  | Stop b IsEOF

I've renamed this Moore since it is essentially just a Moore machine supporting early termination (the Stop case just encodes the fact that the state transition function always leads back to the same state from that point). With iteratees, it is really just a convention that intermediate b values are not inspected (and hence not computed in a lazy language) until either EOF is hit or the iteratee signals termination early.

We can now use this to perform the inversion of control programmatically. Note the signature for invert!

isDone :: Input a -> Bool
isDone Done = True 
isDone _ = False

invert :: (forall f . Monad f => ListT f a -> ListT f b) 
       -> Moore a (Maybe b)
invert f = go Nothing (f prompts) where
  go cur res = Feed cur $ \a -> 
    case runReader (runListT res) a of
      Empty -> Stop cur (isDone a)
      Yield h t -> go (Just h) t
      Skip s -> go cur s

We can write very similar code for monadic iteratees:

data MooreT f a b = 
    FeedT b (Input a -> f (MooreT f a b)) 
  | StopT b IsEOF

invertT :: Monad f 
        => (forall t . (MonadTrans t, Monad m)
          => ListT (t f) a -> ListT (t f) b) 
        -> MooreT f a (Maybe b) 
invertT = ...

Here is the full code. I haven't played with it much, but isn't it fascinating that this translation is possible? (I suspect there is some categorical connection here though I'm not quite sure what it is yet) It means we can write code using a pull-based API, where functions have access to the "full history" of the stream they are transforming, but translate to a push-based API for purposes of execution. The translation will discover and retain the exact portion of the history required to express the transformation, and it will retain this history for only as long as needed. I find it interesting to think about how different operations on ListT will get mapped to the equivalent Moore machine. For instance, a function that conses onto a ListT results in a Moore that "delays" the input stream by one step. And so on...

This trick of writing code which is parametric in the choice of monad is not really specific to stream processing and I suspect it has other uses. (Ed Kmett uses a similar technique in his recent searching infinity post and it's probably been used elsewhere) For any given function parametric in the choice of monad, it's fun to consider what sort of interesting structures can be built by substituting different monads.

(Software) patents don't make much economic sense

Patents, in particular software patents, usually don't make sense from an economic standpoint. Patents grant a monopoly to an individual for the invention. Having a single individual control production and use of an invention is obviously inefficient compared to allowing everyone to use and build on it freely - there are a huge number of people in the world who could be doing useful things with the invention that are unable to do so because of the monopoly granted. Thus, granting the monopoly means that during the period the patent is valid, society makes less efficient usage of the invention than if it were in the public domain. How much? Well, let us say for the sake of argument that it's 10x less efficient. This number is completely made up, but bear with me.

Of course, patents act as an incentive to inventors. Without the incentive, perhaps it would have taken longer for the invention to become common knowledge. Let us say for the sake of argument that it took an extra five years for invention X to arise in the absence of patents. In this case, after the five year period, society benefits ten times more each year than it would if a single inventor were given a monopoly. If we assume, say, that society derives one unit of utility each year from the invention, then after a single year in the public domain, society will already have obtained double the benefit from the invention than in the previous five years that the patent holder was granted his monopoly. And this will continue for the lifetime of the patent.

Obviously, these numbers are made up. The point is more that the invention really does have to be something that would not have been discovered for a very long time to make up for the inefficiency of granting a monopoly. If it would have been discovered relatively soon anyway we would have been better off just waiting for that to occur. Software patents rarely make sense because software development requires almost no capital investment, and as a result, it is almost impossible for an individual to develop some software invention that would not be discovered by multiple other people soon in the future. Do you know of any individual or organization that is even capable of creating some software "invention" that would not be rediscovered independently anyway in the next five or ten years? I don't. No one is that far ahead of everyone else in software, precisely because there is no capital investment required and no real barriers to entry.

What I find irritating about all the software patent discussion is that patents are intended to benefit society - that is their purpose, "To promote the Progress of Science and useful Arts". But no one seems to want to reason about whether that is actually happening - that would mean doing things like thinking about how likely the invention was to be independely discovered soon anyway, estimating the multiplier of having the invention be in the public domain, etc. Instead we get regurgitation of this meme about making sure the little guy working in his basement gets compensated for his invention. Aside from the disingenuousness of this example, this is shifting the discussion away from the real issue. The issue is not whether "the little guy" deserves to be compensated for his work, the issue is whether granting "the little guy" or anyone else a monopoly for 17 years is net beneficial to society.

Unrestricted effects and nondeterminism in purely functional code

This post is partially a continuation of Referentially transparent nondeterminism.

In a program's outermost input and output layers, unrestricted effects and nondeterminism do not break referential transparency since there is no enclosing program around to observe the effects. This gives us a lot of freedom to do things we wouldn't ordinarily be able to do without breaking purity. Just how far can we push this?

Consider the mealy machine datatype:

data Mealy a b = Mealy (a -> (b, Mealy a b))

Clearly, we can feed a Mealy a b from an IO a producer. But we can also combine two IO a producers nondeterministically. It is only when we sample the output that we observe the effect of the nondeterminism. Here is a typeclass expressing this:

class (MonadPlus s) => Source s where
  -- <|> has the effect of merging two sources, nondeterministically
  -- perfectly okay, since when we sample later 
  -- (at the end of the universe), we get IO
  (<|>) :: s a -> s a -> s a
  transform  :: s a -> Mealy a b -> s b
  drain :: s a -> (a -> IO b) -> IO b
  -- deterministic merge - explicit interleaving
  zip   :: s a -> s b -> s (a,b)
  filter :: (a -> Bool) -> s a -> s a
  -- etc

This gives us nondeterminism in our input streams, which are nicely first class. Our choice of stream transformers, Mealy, are also first class. Only when we observe the output of a source (in drain) do we pay for the side effects we may have accumulated. Note that our (<|>) does not require us to pick a deterministic interleaving of the two sources - imagine we have a Source for keypresses of the letter a and another Source of keypresses of b. If our source representation were something like data Source a = Source { sample :: IO a }, our implementation of (<|>) would be required to pick a deterministic interleaving (wait for an a, then wait for a b, or vice versa). Our implementation can simply pass through either event when it occurs. Effects are paid for only when the stream is observed.

We can allow for more composition in our output handlers with a type like:

type Sink a b = Iteratee a IO b

-- replace drain signature in Source with:
drain :: s a -> Sink a b -> IO b

We then compose Sink values like we would any monadic Iteratee. What we lack, though, is the ability to pass along the nondeterminism from the merging we did with Source. Meaning, if we call drain (a <|> b) i, we ought to have the option (since this is the end of the universe), to evaluate this like drain a i OR drain b i, meaning we do not specify how the effects produced by the two separate instances of i are interleaved. To acheive this, we can add another constructor to Sink:

data Sink a b = One (Iteratee a IO b) | Many (Sink a b)

Many is used to signal to the Source evaluator than any effects <|>'d together may be fed to the given Sink concurrently. Again, since no one is around to observe it, this nondeterminism does not break RT. I don't know how often this sort of output nondeterminism is really useful, but it's interesting that it is possible.

So, we so far have allowed unrestricted side effects and nondeterminism at both the input to our programs as well as the output of our programs. What else can we do? Well, we can allow commutative effects in the implementation of our stream transformer:

data Mealy m a b = Mealy (a -> m (a, Mealy a b))

Here m must be some commutative monad. This ensures that we do not need to pick a global ordering on our effects. One interesting addition is to explicitly separate zip from the monad interface and require only that zip commute. This lets Mealy pipelines see effects from earlier pipeline stages, but not from parallel pipelines:

class Monad m => Commutative m where 
  -- subject to zip a b == fmap (snd *** fst) (zip b a)
  zip :: m a -> m b -> m (a,b)

As a final thought, I wonder: is it possible to make an instance of Commutative for something like State? Clearly, the implementation of zip should feed the same input s into both rather than sequencing the state as usual, but what do we do with the two different output states? We require either some sort of commutative state merging function s -> s -> s to combine the two new states, or we are forced to propagate both values along somehow, without committing to an order!

Related posts:

Do (side) effects really need a total order?

Monads and applicative functors enforce a total ordering on sequencing of effects - often not the right choice. Sure, you could write your distributed or concurrent program to globally sequence effects, but the result is often unnatural, less efficient, and requires more coordination than truly dictated by the problem domain. At the other end of the spectrum we have commutative monads and applicatives - structures where liftM2 f a b == liftM2 (flip f) b a. These give maximum flexibility in how effects may be sequenced, but the full commutativity requirement is just not possible for most situations. In a commutative monad, there can be no dependencies between the effects - an effect cannot depend in any way on what effects have come before, only on the set of effects that constitute it.

As an aside, it makes more sense to talk about the commutative applicative functor underlying a monad, rather than speaking of "commutative monads". The reason is the usual definition of commutativity of effects for monads - that do { a <- e1; b <- e2; f a b } is equal to do { b <- e2; a <- e1; f a b } - is only even defined if e2 does not reference a. That is, the code cannot be actually monadic - for effects that commute, it must be possible to describe the behavior in terms of some underlying applicative. Applicative lifts the global ordering on the production of the effects (we no longer have to wait until all previous effects have generated to produce the current effect), but not the global ordering on their sequencing (since we cannot rearrange effects without changing meaning).

What I'd like is the ability to specify and reason with a partial order on effects - this would incidentally cover commutativity as the degenerate case where all effects are incomparable. I'm not sure how to express this as an API, though. Here was a first attempt:
class PosetApplicative m where 
  pure :: a -> m a
  lift2 :: (a -> b -> c) -> m a -> m b -> m c
  -- False if m a <= m b, that is m b depends on m a
  -- True if not (m a <= m b), m b does not depend on m a
  commutes :: m a -> m b -> Bool
The idea here is that if commutes a b, then lift2 f a b == lift2 (flip f) b a. This enables a few things - sequence and traverse could now work very differently - they might internally rearrange the effects in some way that is more efficient, then reconstruct the original sequence at the end. Or imagine the stream of effects one might feed to an iteratee - the iteratee could choose to skip a value in the input - the corresponding effect needed to generate this input could then be placed in a buffer and sequenced in later only if a subsequent effect depends on it. Hence the iteratee would not be forced to wait for evaluation of effects it was not interested in!

This interface would need to be implemented a bit differently than most applicatives. Consider the state applicative. The commutes a b translates to the claim that a and b operate on separate portions of the state. E.g., the state is a pair and a only touches the first element of the pair and b only touches the second. We might say there is some lensing going on, and the two foci of the lenses do not overlap. But it should be clear that the regular representation of state - s -> (a,s) - is not going to work, as we have no knowledge of what this opaque function might do with the given s. We need some sort of ADT to track the portion of the state each state action is given access to, so we can explicitly reason about dependencies and how they overlap.

I am only just starting to think about these possibilities. I'd love to hear if this has been looked into before, or if anyone else has experience with this sort of thing.

Referentially transparent nondeterminism

At NEScala, I gave a talk on using mealy machines for deterministic parallelism. I think it's a nice model that works out pretty well. But, the more I've thought about it, the more I think deterministic parallelism isn't enough - nondeterminism is needed in some form, but with a way of preserving referential transparency (RT) by not allowing this nondeterminism to be observable by programs.

Nondeterminism is both useful and important. In the context of stream transforming functions, we need to be able to merge two inputs to a stream transformer without specifying an ordering on the elements. As an example, if we have an event stream for mouse clicks and an event stream for key presses, we need to be able to merge these streams in a nondeterministic way, without saying anything at all about how the streams interleave - that is, I do not want to specify that the output stream is, say, all the mouse clicks followed by all the key presses, or the two interleaved in a completely regular way. No, what I want is more like the result of summing two independent random variables - the actual stream of events for the merged stream will be interleaved in a random way (not truly random, but modeled outside of the system), where there is no interaction between these events. I should be able to remove (via filtering) all the key presses from the merged stream and obtain something equivalent to the original mouse click stream.

In order to keep RT, then, whatever consumes this stream must also end the universe (it can never return). If such a consumer did return, we could then feed the stream to another consumer and observe that we did not have the same results. But this is nothing new, functional programming is all about moving side effects to the edges of the program where they cannot be observed. The only difference is what effects we're hiding in this way. There's something neat about the approach though - instead of eliminating the nondeterminism entirely, which would force us to chose some global, explicit ordering for our effects (unrealistic for a distributed system and many other situations), we keep the nondeterminism and just focus on pushing it to the outside of our programs like we would other effects.

At some point I'll write up these ideas in code to make it clearer what heck I'm talking about.