Sunday, June 6, 2010

The SPJ trick for typesafe ADT annotations

I came across this technique in SPJ's book, Implementing functional languages: a tutorial (see page 217). In retrospect I guess it's pretty obvious, but I figured I'd write it up since it's kind of nifty and I recently had occasion to use it at work.

Here's a simple ADT:
data Expr = 
Primitive Val |
Variable Name |
App Expr Expr |
Lambda Name Expr
Say we're writing a compiler and we want to annotate each node with its type, or information about free variables, etc. We could add an extra constructor to hold these sorts of annotations:
data Expr = 
Primitive Val |
Variable Name |
App Expr Expr |
Lambda Name Expr |
Annotated Annotation Expr

data Annotation = TypeAnnotation Type | FreeVariables [Name] | ...
This is okay, except that functions expecting certain annotation info have no guarantee it is actually present, or might receive annotations of the wrong type. (Also, how do we know what order the annotations are nested in, if multiple are expected to be present? The types don't tell us anything.)

Okay, let's try again:
data Expr a = 
Primitive Val |
Variable Name |
App Expr Expr |
Lambda Name Expr |
Annotation a Expr
Slightly better... now we can at least guarantee that the annotation is of a certain type. But we still don't have any assurance that the annotation is attached to the expected nodes.

Going one step further, we have:
data Expr a = 
Primitive a Val |
Variable a Name |
App a Expr Expr |
Lambda a Name Expr |
This guarantees the annotation is present, and has a definite type, but there's no uniform way to pull out the annotation info from each constructor. We could write a function, annotation that pattern matches on the Expr to retrieve the annotation. But, it turns out we can do one better:
data Expr a = Expr a (ExprImpl a)

data ExprImpl a =
Primitive Val |
Variable Name |
App (Expr a) (Expr a) |
Lambda Name (Expr a)
This is the technique described by SPJ. All nodes in the ADT are guaranteed to have the annotation, and access of the annotation is uniform. Expr can even just be a type alias for (a, ExprImpl a).

10 comments:

liamoc said...

While this technique works for all ADTs, i think you're referring to ASTs (abstract syntax trees), correct?

Sam Martin said...

There's a slight variation on this that has some other pros and cons using Fix. Instead of specifying Expr inside ExprImpl directly you can parametise the recursive points, eg.:

data ExprImpl e =
...
App e e
...

(I'm ignoring the 'a' parameter here for brevity)

Then use Fix to close the loop:

newtype Fix f = In { out::f (Fix f) }

type Expr = Fix ExprImpl

Parametising and using Fix effectively gives you access to the recursive points so you can extend them. However it does make pattern matching and deriving instances harder.

At the moment I'm still unsure whether it's really any better though :) Just thought I'd mention it.

Paul Chiusano said...

@liamoc - Well, I just happened to pick an ADT that is also an AST, but yeah, it works for any ADT. :)

@Sam - neato

augustss said...

While this is used in GHC, I don't think Simon was the first to come up with this idea. But I wouldn't be able to say who it was.

Paul Chiusano said...

@augustss - yeah, probably not. This is probably one of those tricks that has been discovered independently over and over again...

Anonymous said...

I thought the trendy way to do this these days was with a GADT:

data Expr :: * -> * where
Primitive :: a -> Expr a Val

etc.

Yair said...
This comment has been removed by the author.
Yair said...

This can also be implemented using "monadic ADTs":

type Value m a = m (a m)

data ExprImpl m =
Primitive Val |
Variable Name |
App (Value m ExprImpl) (Value m ExprImpl) |
Lambda Name (Value m ExprImpl)

type PlainExpr = Value Identity ExprImpl

data Annotated a b = Annotated a b

type AnnotatedExpr a = Value (Annotated a) ExprImpl

type ExprInDatabase = Value DBRef ExprImpl

Edward Kmett said...

This is just a monomorphic application of the fact that you can use a cofree comonad as an annotation.

data Expr a = Expr a (ExprImpl a)

data ExprImpl a =
Primitive Val |
Variable Name |
App (Expr a) (Expr a) |
Lambda Name (Expr a)

can be rewritten

data f :> a = a :< f (f :> a)

data Expr a =
Primitive Val |
Variable Name |
App a a |
Lambda Name a

And then Expr :> a -- is an 'a'-annotated Expression.

and then you can even have unannotated expressions using

newtype Mu f = Mu (f (Mu f))

where the unannotated expression is Mu Expr.

If you 'roll the fixed point' slightly you can even let these annotations apply over heterogenous data structures like you get out of a parser, but that won't fit in a comment. ;)

Paul Chiusano said...

@Annonymous - is that equivalent? Seems you not assured that all ctors will have an annotation of the same type?

@Ed - this looks like the same thing as Sam's comment above, but you've used infix data ctors. :)