Here's a simple ADT:
data Expr =Say we're writing a compiler and we want to annotate each node with its type, or information about free variables, etc. We could add an extra constructor to hold these sorts of annotations:
Primitive Val |
Variable Name |
App Expr Expr |
Lambda Name Expr
data Expr =This is okay, except that functions expecting certain annotation info have no guarantee it is actually present, or might receive annotations of the wrong type. (Also, how do we know what order the annotations are nested in, if multiple are expected to be present? The types don't tell us anything.)
Primitive Val |
Variable Name |
App Expr Expr |
Lambda Name Expr |
Annotated Annotation Expr
data Annotation = TypeAnnotation Type | FreeVariables [Name] | ...
Okay, let's try again:
data Expr a =Slightly better... now we can at least guarantee that the annotation is of a certain type. But we still don't have any assurance that the annotation is attached to the expected nodes.
Primitive Val |
Variable Name |
App Expr Expr |
Lambda Name Expr |
Annotation a Expr
Going one step further, we have:
data Expr a =This guarantees the annotation is present, and has a definite type, but there's no uniform way to pull out the annotation info from each constructor. We could write a function,
Primitive a Val |
Variable a Name |
App a Expr Expr |
Lambda a Name Expr |
annotation that pattern matches on the Expr to retrieve the annotation. But, it turns out we can do one better:data Expr a = Expr a (ExprImpl a)This is the technique described by SPJ. All nodes in the ADT are guaranteed to have the annotation, and access of the annotation is uniform.
data ExprImpl a =
Primitive Val |
Variable Name |
App (Expr a) (Expr a) |
Lambda Name (Expr a)
Expr can even just be a type alias for (a, ExprImpl a).
10 comments:
While this technique works for all ADTs, i think you're referring to ASTs (abstract syntax trees), correct?
There's a slight variation on this that has some other pros and cons using Fix. Instead of specifying Expr inside ExprImpl directly you can parametise the recursive points, eg.:
data ExprImpl e =
...
App e e
...
(I'm ignoring the 'a' parameter here for brevity)
Then use Fix to close the loop:
newtype Fix f = In { out::f (Fix f) }
type Expr = Fix ExprImpl
Parametising and using Fix effectively gives you access to the recursive points so you can extend them. However it does make pattern matching and deriving instances harder.
At the moment I'm still unsure whether it's really any better though :) Just thought I'd mention it.
@liamoc - Well, I just happened to pick an ADT that is also an AST, but yeah, it works for any ADT. :)
@Sam - neato
While this is used in GHC, I don't think Simon was the first to come up with this idea. But I wouldn't be able to say who it was.
@augustss - yeah, probably not. This is probably one of those tricks that has been discovered independently over and over again...
I thought the trendy way to do this these days was with a GADT:
data Expr :: * -> * where
Primitive :: a -> Expr a Val
etc.
This can also be implemented using "monadic ADTs":
type Value m a = m (a m)
data ExprImpl m =
Primitive Val |
Variable Name |
App (Value m ExprImpl) (Value m ExprImpl) |
Lambda Name (Value m ExprImpl)
type PlainExpr = Value Identity ExprImpl
data Annotated a b = Annotated a b
type AnnotatedExpr a = Value (Annotated a) ExprImpl
type ExprInDatabase = Value DBRef ExprImpl
This is just a monomorphic application of the fact that you can use a cofree comonad as an annotation.
data Expr a = Expr a (ExprImpl a)
data ExprImpl a =
Primitive Val |
Variable Name |
App (Expr a) (Expr a) |
Lambda Name (Expr a)
can be rewritten
data f :> a = a :< f (f :> a)
data Expr a =
Primitive Val |
Variable Name |
App a a |
Lambda Name a
And then Expr :> a -- is an 'a'-annotated Expression.
and then you can even have unannotated expressions using
newtype Mu f = Mu (f (Mu f))
where the unannotated expression is Mu Expr.
If you 'roll the fixed point' slightly you can even let these annotations apply over heterogenous data structures like you get out of a parser, but that won't fit in a comment. ;)
@Annonymous - is that equivalent? Seems you not assured that all ctors will have an annotation of the same type?
@Ed - this looks like the same thing as Sam's comment above, but you've used infix data ctors. :)
Post a Comment