Wednesday, September 22, 2010

The four stages of (functional) programming

There seems to be a series of stages of abstraction in functional programming - actually, I'd say in all programming, but it's more obvious in FP:
  • Stage 0: no functions, and thus, no abstraction
  • Stage 1: functions, but no higher-order functions
  • Stage 2: higher-order functions, combinator libraries
  • Stage 3: typeclasses, abstracting over combinator libraries
  • Stage 4: abstracting over typeclasses
Each stage lets you remove a class of code duplication. If you were to code in a stage 0 language, your only means of reuse would be copy/paste. Stage 1 introduces the ability to abstract over functionality, but in a limited way, since the structure of the computation is fixed. (I think many programs in a language like Java are stage 1 programs, or close to it.) Stage 2 introduces HOFs, which remove a class of duplication by letting us abstract over control flow (this is where I put most Lisp/Scheme programs). From this we get the usual combinator library style. But at this point, we are still dealing with concrete types. Common patterns across these libraries are not abstracted over.
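
Here's a rough Haskell sketch of the jump from stage 1 to stage 2 (toy code of my own, not from any particular library):

    -- Stage 1: the recursion pattern is spelled out again in every function.
    sumInts :: [Int] -> Int
    sumInts []     = 0
    sumInts (x:xs) = x + sumInts xs

    productInts :: [Int] -> Int
    productInts []     = 1
    productInts (x:xs) = x * productInts xs

    -- Stage 2: a higher-order function captures the control flow once...
    fold :: (a -> b -> b) -> b -> [a] -> b
    fold _ z []     = z
    fold f z (x:xs) = f x (fold f z xs)

    -- ...and the concrete functions become one-liners.
    sumInts' :: [Int] -> Int
    sumInts' = fold (+) 0

    productInts' :: [Int] -> Int
    productInts' = fold (*) 1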

Stage 3 is the generalization of combinator libraries - code that is generic across multiple such libraries - think of the generalized combinator library we obtain from a monad, or a traversable. This removes the duplication due to common patterns in collections of higher-order functions. I wonder - were Haskell's type system and commitment to purity prerequisites for developing this stage of programming? In principle stage 3 is possible in any dynamic language with first-class modules and first-class functions.
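
To make that concrete, here is one (deliberately tiny) combinator written once against the Monad interface rather than against any particular library - it works unchanged for Maybe, lists, IO, parsers, and so on:

    -- Stage 3: one definition, every Monad instance.
    pair :: Monad m => m a -> m b -> m (a, b)
    pair ma mb = do
      a <- ma
      b <- mb
      return (a, b)

    -- pair (Just 1) (Just 'x')  ==  Just (1, 'x')
    -- pair [1,2] "ab"           ==  [(1,'a'),(1,'b'),(2,'a'),(2,'b')]
    -- pair getLine getLine      reads two lines from stdin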

And stage 4... well, we're only starting to see what stage 4 looks like. In stage 3 programming, we can write code polymorphic to any instance of the typeclass, but there could be duplication in the implementations of different typeclass instances - for instance, you might create a data type that is essentially a list of trees, but in giving the Applicative instance for your type, you start from scratch. In stage 4 programming, you'd assemble your type and appropriate typeclass implementations from combinators. This is done a little bit here and there in FP - you might compose Applicatives or take their product, but in stage 4 this style is pervasive and taken to its logical limit.
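
As a small taste of what I mean - and this is only a fragment of the style, not the whole stage 4 picture - here's a type assembled from the Compose combinator in Data.Functor.Compose (from the transformers package), using optional values rather than trees to keep the example self-contained:

    import Control.Applicative
    import Data.Functor.Compose (Compose(..))

    -- Instead of declaring a new "list of optional values" type and writing
    -- its Applicative instance from scratch, assemble the type from existing
    -- pieces; the instance comes for free from the instances of [] and Maybe.
    type ListOfMaybes = Compose [] Maybe

    example :: ListOfMaybes Int
    example = (+) <$> Compose [Just 1, Nothing] <*> Compose [Just 10, Just 20]
    -- getCompose example == [Just 11, Just 21, Nothing, Nothing]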

Stage 4 requires some serious rewiring of your programming brain. Forget concrete types or concrete type implementations. You don't implement a tree datatype. You assemble the type and all its useful functions from a catalog of typeclass combinators. Category theory would become an essential tool for programming in this style ("oh, there's a natural isomorphism between this type and blah, just gotta establish this isomorphism to get all that functionality"). Would we then see category theory and this style of programming taught as part of the standard computer science curriculum? Maybe...

I don't know of any language that supports stage 4 programming adequately. It seems to require, among other things, first-class and higher-order modules. Probably more powerful type systems, too. Ed Kmett has an interesting language he's been working on, called Kata. Based on what he's told me about it, it looks specifically designed to make stage 4 programming easy. Interestingly, I think he's given up on static typing.

What do you think? Is stage 4 the future of (functional) programming?

18 comments:

Eric said...

Hi Paul, a running example of the 4 stages would be very nice, especially for stage 4, which I can't visualize clearly (and that's not to say I've mastered stage 3...)

Thanks.

Paul Chiusano said...

@Eric -

Good idea. :) I will add something or maybe do a follow up post.

Olle K said...

In my world, the big step up from, say, SML to Haskell is purity. But you don't talk about purity in this post; instead, type classes (or the lack of static typing) are the main focus.

Do you consider the absence of side effects less important than level 4 programming?

Paul Chiusano said...

@Olle - I think purity is more of a prerequisite. Side effects destroy composability, so you can't get very far in building up these abstractions.

Joeri van Eekelen said...

If I understand things correctly, stage-4 programming is then something like what's known in the Haskell world as generic programming?

And now I'm wondering if there's a stage-5, and if so, what it'd look like...

Sam Stainsby said...

Stage 5 is where you abstract over the developers and create a general AI to write the code :-)

Tom said...

For static type systems at least, it seems like you're missing a very important stage of abstraction: parametric polymorphism.

metaphysicaldeveloper said...

I might be mistaken, but stage 4 capabilities remind me a lot of Scala's views. Maybe that combined with structural typing...

Anonymous said...

Hi Paul,
Very nice read! It is interesting to see your thoughts take shape, going from "pull libraries" over to "four" levels of programming. I wouldn't have called the post "THE four levels" because, as others suggested, there might (will?!) be more. I'm looking forward to your promised example on level 4!
Regards Andreas S.

Runar said...

So there are not only 4 levels of programming, but 4 levels of programmers. That's handy, to be able to say "he's a level 3 programmer".

Paul Chiusano said...

@Joeri - datatype generic programming is not totally what I have in mind for stage 4, but maybe it is just approaching it from a different angle. Generic programming seems more about deriving typeclass implementations from the internal structure of the datatype. I don't think this is the full story. This is a very vague thought though...

@Tom - I think parametric polymorphism is more of an enabler - it is what makes HOFs so useful. By itself, it is not a new level of abstraction. Likewise, higher-kinded types are an enabler for stage 3 programming - but it's the generalized combinator libraries that form the new level of abstraction.
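
To illustrate the distinction, in plain Haskell:

    -- Parametric polymorphism abstracts over element types:
    len :: [a] -> Int
    len = foldr (\_ n -> n + 1) 0

    -- Higher-kinded types abstract over the type constructor itself, which is
    -- what a generalized combinator library needs in order to even be stated
    -- (this is just the standard Functor class, renamed):
    class Functor' f where
      fmap' :: (a -> b) -> f a -> f b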

@Sam - Lol. :)

Edward Kmett said...

I don't give up static typing completely. I just use it to prove a very different set of properties and guarantees than is provided by Hindley-Milner-style type systems. This lets me avoid the difficulties of "hill climbing" in the solution space near HM, which is important since HM provides such a good price/performance trade-off that few minor variations provide objective improvement in any one area without disproportionate costs in others.

Tom said...

@Paul Chiusano - I disagree, parametric polymorphism is an important level of abstraction independent of HOFs. Java has had it since 1.5, and this alone lets you avoid lots of code duplication type-safely (List<A> instead of IntList, StringList, etc.), even in the absence of HOFs. So I would insert parametric polymorphism at level 2.

Paul Chiusano said...

@Ed - Thanks for the clarification. :) Can you explain what properties you use your type system to establish? Or is that a longer conversation?

Anonymous said...

While the first 3 stages are fairly general, the last 2 are specific to Haskell and languages that borrowed type classes from Haskell.
Module systems and higher-order modules (such as those present in ML variants) are, I would say, more general and better examples of the last 2 stages (given my interpretation of what you were saying, of course).
And dependent types are an even better (and again more general) example, seeing how with dependently typed records you get such a module system for free.

Jonathan O said...

Stage 4 reminds me a lot of how Scala's collections library is built up. Traits are combined to build interfaces and implementations.

Edward Kmett said...

Sadly it is a much longer conversation. =)

In short I use them to check that your definitions are consistent, but I provide little machinery to ask about provenance.

This gives you enough to use OOP as a code reuse mechanism, but the design deliberately removes the ability to ask certain questions that let you violate parametricity (such as "is this object an instance of a given class?"). This makes it harder for code behind the scenes to violate Liskov substitutability, and favors the needs of the library maintainer, allowing them to refactor in many ways without regard to the downstream consumer. When you combine that with a language that is lazy and immutable, where 'new' ties a knot, a lot of the issues with multiple inheritance drop away, and since you can't ask 'isa?' questions, I can include language features that would make that question considerably more expensive.

The net result is a system in which it is easy to express small incremental refinements in laws or in code reuse. This lets you build, say, a deep, algebraically motivated numerical tower without boilerplate, and without the white-box inheritance issues of traditional OOP designs or the 'abstraction sweet spot' issues required by the 'interface'-like design of Haskell class hierarchies, which place a lot of burden on people who instantiate classes that are deep in a class hierarchy.

It isn't perfect, but I think it is an interesting point in the design space.

Anonymous said...

What's the URL to Kata's language definition? Mikael