Friday, January 15, 2010

Actors are not a good concurrency model

Update: Also see follow up post and discussion on reddit.

Actors are not a good concurrency model, and neither are Erlang processes.

Wait, what? Isn't Erlang the king of high-uptime distributed systems? If Erlang can do all this using the actor model (Erlang processes are identical to actors in the essential ways, discussed below), isn't there something good about it? Isn't there??

Well, yes. What's good about it is it's better than the dinosaur era alternative of shared-state preemptive multithreading. But so what? Just because the actor model is better than dinosaur era technology doesn't mean we should keep using it. Remember the rise of OO? Before OO, the alternative was programming in languages like C, with pretty much zero support for polymorphism. Any approach to polymorphism was better than nothing at all, so OO's concepts of classes and subtyping were a huge improvement (my opinion is that polymorphism was the real unfilled niche that OO filled). Now that we've learned a bit more, OO has started to seem less appealing to some - parametric polymorphism exists independently of OOP, and bounded polymorphism can be implemented with either typeclasses or a combination of first-class and higher-order modules. It's possible there are some other ideas from OO worth salvaging, but more likely I think OO is just an evolutionary dead-end.

And so it might be with actors. Though I don't know exactly what the replacement looks like yet (I'd like to look at some alternatives and explore ideas in a future post), I do know what's wrong with actors, and that's what I'd like to explore here. But going further, I'd like to use actors as an example to show what's also problematic with side effects and impure functions in general. In doing so I'll try to avoid the usual FP evangelizing and make this a more precise, technical argument.

So what's wrong with actors? The problem which dooms the actor model is that actors are fundamentally not composable. I'm going to leave that term undefined for now, except to say vaguely that entities are composable if we can easily and generally combine their behaviors in some way without having to modify the entities being combined. I think of composability as being the key ingredient necessary for achieving reuse, and for achieving a combinatorial expansion of what is succinctly expressible in a programming model. (Also see this explanation by Tony Morris.) It goes without saying that code reusability is a very desirable property for programs to have, and many of the other virtues of good software engineering end up tying back to reusability - for instance, testability is nothing more than the ability to reuse a component in the context of a test.

So what makes actors not composable? Well, an actor (and an Erlang process) is constructed from a function A => Unit. That this function returns Unit rather than a meaningful type means that it must hardcode some action to take once the result (whose type is not even shown) is available. It is this unfortunate property that destroys any chance we have at combining actors in any sort of general way like we would in a typical combinator library.
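To make the signature problem concrete, here is a small sketch in Scala (the names are illustrative, not from any particular actor library), contrasting an `A => Unit` handler with an ordinary function:

```scala
// A pure function returns its result, so generic composition is possible:
val toUpper: String => String = _.toUpperCase
val exclaim: String => String = _ + "!"
val shout: String => String = toUpper andThen exclaim

// An actor-style handler has type A => Unit: it hardcodes what happens
// to the result, leaving nothing for a combinator to work with.
val printUpper: String => Unit = s => println(s.toUpperCase)
// `printUpper andThen exclaim` does not even typecheck: exclaim needs a
// String, but printUpper has already discarded it, returning Unit.
```

The `Unit` return type is the whole problem: any further processing of the result has to be baked into the handler itself.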

As a simple example, consider the following two actors: 1) an actor that accepts lists of integers and turns the list into a min-heap. 2) An actor that accepts (heap,k) pairs and extracts the top k elements in order from the given heap. Can we write a function that accepts these two actors and returns an actor which accepts (list,k) pairs and pulls out the k minimum elements in sorted order? And generalizing a bit, can we write the more general combining function, the one that doesn't care whether the types are 'heap', 'int' and 'list' or A, B or C?

You just can't do this with actors since you can't make any assumptions about what the actor will do with the result - it might forward it to some other actor, write it to a file, extract from it the missile launch codes and launch the missile, etc. In fact, if you actually try to achieve the composability I'm talking about using actors, what you end up doing is having all actors return their result as a response to their sender, essentially recreating pure functions within the actor model, badly, losing type information in the process. What this should tell you is that actors, if worth anything, might be useful more as a tool for building some other higher-level abstraction. But if that's the case, perhaps we shouldn't even expose actors as a programming model and instead just program directly in the higher-level model.
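By contrast, the pure-function versions of those two components compose via a perfectly generic combinator. A sketch, using a sorted `Vector` as a stand-in for a real min-heap (all names here are hypothetical):

```scala
// Generic combinator: pairs the first function's output with the k
// argument and feeds both to the second. Works for any A, B, C.
def composeWithK[A, B, C](f: A => B, g: ((B, Int)) => C): ((A, Int)) => C = {
  case (a, k) => g((f(a), k))
}

type Heap = Vector[Int] // sorted Vector standing in for a min-heap
val toHeap: List[Int] => Heap = _.sorted.toVector
val topK: ((Heap, Int)) => List[Int] = { case (h, k) => h.take(k).toList }

// The composite asked for above: (list, k) => the k smallest elements, in order
val kSmallest: ((List[Int], Int)) => List[Int] = composeWithK(toHeap, topK)
```

Because `toHeap` and `topK` return values instead of firing off effects, `composeWithK` needs to know nothing about what they do internally.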

In practice, I suspect actors are basically only used at the top level of a program, where the damage to composability is minimized, and pure functions (which are composable and work fine at any level of the program) are used everywhere else to achieve reuse. The problem with this approach is when you find out a few months later that what you thought was going to be the top level of your program actually is going to become a very small embedded component in some larger system, and this top level now contains a large amount of unreusable complex logic that must be gutted to work as an embedded component.

The problem I am talking about here is not in any way specific to actors. It applies to any functions with side effects. I claim the only way to achieve maximum composability is to program with pure functions, and this applies to concurrent programs or to any other programs you'd like to write. (Sometimes, you decide it's worth taking the composability hit and you write functions with side effects - of course I don't object to this in absolutely all cases - but I don't think the fundamental model underlying concurrent and distributed programming should have to take this hit.)

What makes pure functions composable is they are only the logic of the computation. A pure function makes no decisions about what actions to take with the result of the computation, and also makes no decisions about what actions to take before the computation is executed. By keeping these concerns separate, we can reuse the function elsewhere in places where we need to do something different with the result, or where we need to do something different before the computation is executed (for instance, in testing, to generate an instance of the input type rather than obtaining the input from some impure function).

In fact, you can think of any impure function as having three "steps": 1) an "input" side effect, Unit => B, 2) a pure function, (A,B) => C, and 3) an "output" side effect, C => Unit. It makes perfect software engineering sense to decouple these components - and that is exactly what is done in purely functional programming.
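As a sketch (the function names here are made up for illustration), an impure routine that reads a line, transforms it, and prints the result decomposes like this:

```scala
// Impure, all in one step:
//   def greet(prefix: String): Unit = println(prefix + scala.io.StdIn.readLine().trim)

// Decomposed into the three steps:
def readInput(): String = scala.io.StdIn.readLine()   // 1) Unit => B, input effect
def core(prefix: String, line: String): String =      // 2) (A, B) => C, pure
  prefix + line.trim
def writeOutput(c: String): Unit = println(c)         // 3) C => Unit, output effect
```

Only `core` carries the actual logic, and it can now be reused or tested without touching stdin or stdout.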

Let's look at a very simple example. Suppose I write a sort function, which sorts the input list in place. For this function, we have no input side effects. The pure core of this function is (conceptually) a sorting function that returns a new list. What is the output side effect? It is to rebind the name that the input list is bound to in the caller to the sorted version of the list. So if the caller were something like
def foo(list: ArrayList[Int]): Foo = { 
...
sort(list)
...
}
then sort will rebind the name list to the sorted version of that list. Since list is itself a parameter of foo, this rebinding will occur in the caller of foo, and possibly its caller, all the way out to the place where that list was originally declared. The question is, should the sort function really be making the decision about whether to do this? If we let the sort function make this decision, we're allowing it to make an assumption about the caller, namely, that the caller no longer intends to maintain references to the unsorted version of the list (an assumption which, incidentally, is not even tracked by the type system).
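A pure alternative simply returns the sorted list and leaves that decision where it belongs, with the caller; a sketch:

```scala
// Pure sort: makes no assumption that the caller is done with the input
def sortedCopy(xs: List[Int]): List[Int] = xs.sorted

val original = List(3, 1, 2)
val result   = sortedCopy(original)
// The caller chooses: keep both versions, rebind a name, or discard one.
// `original` is untouched, so no hidden rebinding ripples out to *its* callers.
```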

Of course, this example of an in-place sort isn't too terrible. The caller just needs to be aware of this side effect and has to adjust its calling convention accordingly. But the lack of uniformity in chaining and combining logic caused by side effects starts to add up very quickly. In practice, what actually occurs in large systems with impure functions sprinkled about is that many functions end up with so many assumptions about their callers (and these assumptions propagate to their callers, as above) that the call graph becomes very static, with functions often having only a single or small number of callers. This makes the system a lot more rigid, more difficult to test without elaborate mocking and dependency injection frameworks (which bring their own set of problems), and results in a lot of (often hidden) duplication of logic.

Other simple examples show more obvious destruction of composability. If I write a function that reads a bunch of things from a file and then performs some complex logic, we can't easily reuse that logic elsewhere (unless we want to populate a file first, which may not be what we want). If a function performs some complex logic and then launches the missile, we can't easily reuse that complex logic elsewhere in places where we don't want to launch the missile.

Taking a step back from all this bashing of actors and side effects, I do certainly agree that actors and side effects in general have a certain "intuitive" appeal, a rough analog to how the real world works. But that does not justify their usage as a programming model. The technical challenges of engineering large pieces of software mean that a good programming model may have to trade off intuitiveness for attributes like composability. But I suspect even this tradeoff is overplayed - intuitiveness is also often code language for "what system I am familiar with". Once you learn and internalize a different model, your ability to reason within that model (and thus, I believe, its intuitiveness) is more a function of the features and formal structure of that model than of some inherent "intuitiveness" metric.

29 comments:

Anonymous said...

Why do you think actors necessarily have side effects?

Anonymous said...

What actor system are you talking about? Because as I understand actors, your criticism isn't valid. You need to be more specific, I think. As I understand Actors, creating an actor basically spins up a thread (of some kind), but then you interact with it using methods (some asynchronous, some synchronous), which do have return values (either the actual value or a "future"), so composability is no problem.

E.S. said...

I think it's useful to think about these things, but as is typical with theory, you're confusing the map for the territory.

Depending on what constraints you set, Actors certainly *are* composable. They may not be composable in the way you like, but they are. This isn't a theory thing, it's a preference thing.

You provided the perfect example, the in-place sort. Consider that there may be a *very good* reason for an in-place sort other than intuition, and that your unpacked functional alternative is *not* the same thing at all when you escape the boundaries of academic type theory into the real world.

These theories have their utility, but they are models for programming, not programming itself. This is the confusion FP advocates make -- yes, it is a *wonderful* way to abstract things, but it performs like shit and eats gobs of memory compared to the best imperative techniques for anything even moderately complex. You can *not* apply the model universally.

Barry Kelly said...

Object orientated polymorphism doesn't have a strong relationship in practice to parametric polymorphism, while most OO languages have always used bounded polymorphism. OO polymorphism is about dynamicism, making different decisions at runtime. Static typing isn't a good match here; it makes decisions too early, and it ends up too difficult to try and type the entire object graph - the burden of ensuring that types flow along all paths needed in the presence of OO polymorphism and dynamic dispatch is prohibitive.

As to OO being a dead end, it seems unlikely. The languages du jour have been getting more and more dynamic over time, as the benefits of dynamicism are much larger in distributed, loosely coupled systems.

Barry Kelly said...

Further: "I claim the only way to achieve true composability is to program with pure functions"

This smells like the No True Scotsman fallacy. You have to define composability explicitly in order to make such a bold assertion, as people have been successfully "composing" - as they themselves would describe it - applications for many years using component-oriented approaches to software development.

Paul Chiusano said...

@Anonymous1 and @Anonymous2 - Actors necessarily have side effects because they take actions other than simply returning a value. When you send a message to an actor, it just returns Unit, so the actor is either a very boring one that always returns Unit, or it is performing side effects.

Anon2, yes, if you always stick to the discipline that an actor never does anything other than reply to the sender, you can recover some composability with actors. But this is along the lines of what I mentioned, of reinventing functions, badly, sans type information, within the actor model. One thing this hints at though, is that perhaps pure functions A => Future[B] are a more appropriate building block for concurrency!
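A sketch of that last idea: functions of type A => Future[B] do recover generic composition, via flatMap (this uses standard scala.concurrent; the function names are illustrative):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Two asynchronous components expressed as functions returning Futures
def parse(s: String): Future[Int] = Future(s.trim.toInt)
def double(n: Int): Future[Int]   = Future(n * 2)

// Generic composition: needs no knowledge of what either function does
def andThenF[A, B, C](f: A => Future[B], g: B => Future[C]): A => Future[C] =
  a => f(a).flatMap(g)

val parseAndDouble: String => Future[Int] = andThenF(parse, double)
```

Unlike an `A => Unit` actor, the result type is right there in the signature, so the combinator above works for any A, B, and C.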

Paul Chiusano said...

@ES - As I indicated, I do think there may be good reasons to trade off composability for some other desirable attribute like efficiency (although this tradeoff isn't necessary as often as you'd think - purely functional code can be very fast). My point was not to claim that we should never make this tradeoff, only to emphasize it IS a very serious tradeoff and only take it if there are no other win-win options and the other benefits (be they efficiency or whatever) outweigh these costs. And in the case of concurrent programming, I think there are other more composable, convenient, and efficient abstractions out there than actors. But more on that in a later post!

Paul Chiusano said...

@Barry - I'm also not totally sold on static typing in all cases, but probably not for the same reasons as you. In any case, I don't think you need subtyping and classes to get the kind of dynamism you are after. That is a whole other discussion, though. :)

On composability, yes, I should definitely try to come up with a more rigorous, less handwavy definition. :) I hope that intuitively you get what I'm saying - that a pure function is the unit of functionality that makes the fewest assumptions about its environment and is thus easiest to combine in general ways with other units of functionality. Of course people can and do still combine functionality that is not side-effect free (like the sort in place function, for instance) - my point was not to claim that all impure functions are isolated silos that can each only be used in exactly one place, more that there is some (often substantial) loss in composability associated with impure functions.

Also, just to be clear, in general, people still build very impressive things with technology that could be a lot better. But just because people are able to do this doesn't mean we shouldn't try to find better ways of doing these things, or point out areas where existing techniques are lacking. :)

Barry Kelly said...

Paul -

I agree that functional programming is an excellent approach for a certain set of programming problems - but I believe that you ultimately end up writing a virtual machine for a dynamic language in your functional language when you try to target other problems.

Sum types (as typically found in functional languages) combined with pattern matching is isomorphic to subtyping combined with overridden methods with dynamic dispatch. The difference lies in what is optimized for change.

Once you've defined your sum type, you're generally done. You then go off and write as many pure functions as you like, and you can compose them easily because they only depend on the inputs, and the input type is small and well-defined.

With OO, you define your base class with all the relevant methods, and what would be the other members of the sum type become descendants; where you would use pattern matching to make distinctions, you instead use dynamic dispatch. Common OO polymorphic dispatch isn't as flexible as multiple dispatch, and isn't as flexible as generalized pattern matching, but we'll leave that aside.
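For concreteness, the two encodings can be sketched side by side in Scala (an illustrative example; the shape types are made up):

```scala
// Sum type + pattern matching: adding a new operation is one new function;
// adding a new case means revisiting every match.
sealed trait Shape
case class Circle(r: Double) extends Shape
case class Square(s: Double) extends Shape

def area(sh: Shape): Double = sh match {
  case Circle(r) => math.Pi * r * r
  case Square(s) => s * s
}

// Subtyping + dynamic dispatch: adding a new case is one new class;
// adding a new operation means revisiting every class.
trait OoShape { def area: Double }
class OoCircle(r: Double) extends OoShape { def area: Double = math.Pi * r * r }
class OoSquare(s: Double) extends OoShape { def area: Double = s * s }
```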

The point being that when you're building a system, you have to ask yourself what you need to optimize for change, and what you need to pre-define. Functional programming generally gives you flexibility on the code side, but your data types are baked in at compile time; object orientation gives you flexibility over protocol implementation, but you can reimplement the protocol dynamically at runtime.

The question is, which is more valuable in practice? If your problem domain is well defined - a compiler is the ideal example, as it is just a big function - functional programming is an extremely good match. The language being matched doesn't dynamically change, and nor does the language being targeted. All you need is a functional semantic-preserving transformation of encoding, folding strings up into trees, mapping them, and unfolding them.

But if your problem is dealing with a rapidly changing environment, with disparate elements developed independently and at high pace, you want two things - things which have a natural tension: a defined protocol and loose coupling. They're in tension because the more over-defined your protocol, the more tight your coupling. So your protocol needs to be forgiving, flexible and non-strict; it has to take what it can get, and drop the rest; it has to potentially query for what it needs, but struggle on if it can't get it. Things as fundamental as method arity errors due to version mismatches may need to be ignored. Static typing can end up being the enemy because it can't help overspecifying, and thereby introducing massive fragility. Dynamicism and OO viewed as a protocol is a valid way out.

Anonymous said...

Looks like you might want to look into Erlang's OTP; it basically solves this problem to some extent by separating the part of an actor that has side effects from the part that is purely computational. Using OTP, you can give an actor the general "shape" of a gen_server (something like an object), a state machine, etc.
Using pre-defined "behaviours" like this helps composability a lot.

單中杰 said...

Continuation-passing style comes in handy, no?

Chris Quenelle said...

If your program is purely functional, the compiler can assign threads to whichever chunks of calculation it wants to. Hence you don't need actors. Or any other form of explicit parallelism. The purpose of explicit synchronization is to manage the timing of side-effects in the presence of parallelism.

alb said...

"Actors necessarily have side effects because they take actions other than simply returning a value. When you send a message to an actor, it just returns Unit, so the actor is either a very boring one that always returns Unit, or it is performing side effects."

This is not correct. You're confusing effects and side effects. Of course, actors have effects on data they're processing. That's fundamental.

They do not have "side effects" however. This means that the internal operations have no effects on data outside of the actor. This is true in all actor systems and creates a model in which parallel operations can be programmed fairly straightforwardly.

Actors may or may not be your cup of tea, but they do not suffer from the defect you claim.

dohzya said...

I have always seen actors like a great replacement of synchronized concurrency models.
They *are* a good concurrency model (because you don't have to think about the synchronization logic), but not for functional programming (because of side effects).

Maybe functional languages should think about other models, but actors is a nice enhancement for imperative languages !

Razie said...

Yes, actors are indeed a stepping stone. I believe that simplified work-flows (with defined in/out types and sync/async execution) will be the final solution to both concurrent/parallel and distributed programming, in Von Neumann architectures.

OO, however, is an end-all. Yes, there are other conceptual ways to depict knowledge - for Quantum Physicists - but the average Joe is surrounded by objects with properties! It's the end-all there...

Anonymous said...

Paul,
Again, what actor system are you talking about specifically? The ones I've seen do not always return unit, they return a result. It's not a convention, it's right there in the type. It's more like a regular method invocation than sending a message. Sure, some of them will be unit methods if their purpose is to have side effects, but not all of them must be. Anything returning a value would have a return type and either be synchronous or return a future.

I think you're criticising "actors" in general, when you really have a specific, flawed, variation of actors in mind.

martin said...

I believe the post confuses to some degree concurrency with parallelism. Clearly, to go from a list of integers to a min heap you want a function, not an actor. Actors are a good technology for reactive systems that need to process and link many different concurrent outside events. Another interesting technology for that is STM. But to do the things you mention you need neither. What you want instead is a good compiler which parallelizes your min-heap function (maybe after you have annotated it). Until we have such compilers, are actors a good way to implement parallelism? Again, I think no, because they introduce some unavoidable overhead over basic threads.

So in summary, if you need to program inherently concurrent systems, actors and STMs are both interesting technologies. If you want to make your functions run faster on multicores or clouds, and you do not have a compiler which does it for you, pick the lowest level and best performing primitive available. In most systems that would be threads with atomic memory accesses or locks.

Anonymous said...

You have an odd view of actors and how their effects should be directed. I'll echo the other commenters asking "which 'actor' system are you dealing with?" and ask further "where/whom did you learn it from?". I just spent the last semester studying actors with Gul Agha, one of the early exponents of the model. The designs presented nearly always passed a continuation argument to any actor whose effect was not a 'sink' (e.g. write some data to a file). If you wanted the result of some computation to come back to yourself, the continuation passed was your own; i.e. the message send was essentially callcc. This whole "they return unit" business is indeed a poor design, which is why it's not what one should do. As with any tool, it's possible to misuse actors.

Bob Foster said...

@Razie Yes, we're surrounded by objects, but they hardly model the world. Consider three objects: board, nail and carpenter. If you take a pile of boards and a bucket of nails and run them through a carpenter, you get a house. The house object is not a property of board, nail or carpenter. A carpenter is a function. (If it were a pure function, it would produce a house without consuming the wood and nails, but that's another argument.) Classes are useful but rigid and brittle; objects with mutable state make concurrency difficult; OOP is a dead-end-all.

Runar said...

On the money as usual, Paul. A thoughtful and informed post. Your commenters seem pretty confused though. Maybe post a couple of links to material about compositionality and pure vs impure functions.

Ulf Wiger said...

Having spent many years building complex systems with Erlang, I think your post leaves out a very important aspect. I will limit myself to Erlang's concurrency model here. One of the most important problems addressed with Erlang processes is decomposition. There are four very good reasons to do this: domain modeling, error recovery, scalability and reduction of complexity.

In telecoms, the original domain of Erlang, the core function is to coordinate between signalling (message-passing) entities. The actor model fits the domain extremely well.

Furthermore, redundancy is a must, and redundancy implies loose coupling and network communication.

Telecom network elements scale by adding more computers. This is in large part due to what is called "knock-out units" - i.e. how many calls do you drop if one processor crashes? Thus, scaling for capacity and partitioning for fault tolerance are key requirements that are well met with a message-passing (distributed actor) architecture.

While you may argue that proper composition is the way to solve complex tasks, the thing that often kills large projects is dependency management. Imagine having 100 different large customers, each with a long list of requirements on your product. You will need a distributed team of people just to keep up with requirements, and another team of people to try to coordinate them. A third team will try to translate the high-level (usually fuzzy) requirements into something that design teams can make sense of, and wrapping their brain around this will take up much of the designers' time. This is mainly why projects within this domain tend to grow very large. I have seen examples of small excellent teams come up with wonderful 80% solutions but failing to deliver a complete solution simply because they didn't have the manpower to cover all bases.

Large projects tend to effectively smoke out most of the brilliant abstraction gurus - otherwise, they tend to hide away in some corner where they can focus on a very small but highly challenging piece of the puzzle.

Approaches that rely heavily on composition, inheritance, etc. that result in a large degree of dependencies between subsystems, tend to either die due to the constant ripple effects of changing requirements cutting through massive volumes of code, or sink into a performance quagmire of code riddled with "buffer zones" added by developers to insulate themselves from the surrounding chaos. I've seen it many, many times.

By comparison, message-passing architectures tend to lead towards loosely coupled components interacting through a fairly narrow protocol. This offers a good foundation for fault tolerance, scalability and separation of concerns. In addition, the model is similar to that used for the overall network solution, which makes it easier to have constructive discussions with the network engineers.

It is best to view the actor model as separate programs that interact with each other, much like people do in a human organization. Within each program/actor, you may well exploit a raft of powerful abstraction and composition techniques.

Tiark said...

Note that there are other things that might be considered "wrong" about actors.

An important aspect is that actors lack nondeterministic choice of receivers. Once a message is inside an actor's mailbox, this actor is bound to handle the message. There is no such thing as "have this message handled by the first actor out of 3 that becomes available". This makes it extremely hard to implement any notion of fairness. For example, I have yet to see a convincing actor implementation of the Dining Philosophers Problem (not counting solutions involving a global supervisor or those that reduce the available parallelism by imposing a specific order on the forks).

What is more, actors handle their messages one after the other, so there is no parallelism inside an actor. For many uses this is indeed the desired behavior, but there are also lots of cases where this turns out to be a bottleneck and messages could be handled concurrently. In any event, the sequential behavior causes many programmers to build load-balancing schemes on top of actor pools, but due to the missing nondeterministic choice, this is hard to get right (not to mention performance).

Razie said...

@BobFoster well, I read that to be:
class Carpenter {
def seeWhatCanBeDoneWith (stuff:List[_]) : Any
}

- it's a sequence matcher with some sweat :)

And...objects don't have to be mutable...nothing in the example above needs to be mutable.

Just look at http://www.scala-lang.org/docu/files/api/scala/PartialFunction.html

Objects, especially since UML, are the backbone of (almost) any domain model...

Cheers.

Anonymous said...

Your problem is that you're using a type system and a notion of composition which weren't designed for parallel programming. Looking at something like CSP, all programs are parallel compositions of small processes -- the "type system" of CSP takes this into account by placing restrictions on the alphabet which the actors can use to communicate.

For example,

actor1 = ?x:Int -> !(x + 1) -> actor1

actor2 = !5 -> ?x:Int -> print(x)

actor1 || actor2

is an entirely valid composition. actor1 waits until it can read an int and then writes the successor of that int, and actor2 writes an int, waits until it can read an int, and prints the int it just read.

Kresten Krab Thorup said...

You said: I do certainly agree that actors and side effects in general have a certain "intuitive" appeal, a rough analog to how the real world works. But that does not justify their usage as a programming model.

I strongly disagree. The link to our intuition and hence ability to leverage prior experience is paramount in being able to comprehend complex systems. I wrote a blog post on that here: Actor Thinking.

pavan said...

Let's assume Paul is not entirely correct; then why do we lack the use of actors in the common joe's world? Why do none of the grad schools have a course on *actors* for concurrency control, even though the method is about 3 decades old?
I am a grad student (newbie joe).

Paul Chiusano said...

Thanks everyone for the comments. Rather than responding to everyone inline I thought I'd collect some of the comments and my responses into a follow up post. I'll try to get to that soon. :)

Paul Chiusano said...

Hey everyone, follow up post is here: http://pchiusano.blogspot.com/2010/03/follow-up-to-actors-are-not-good.html

Bill la Forge said...

JActor retweaks the concept of actor, adding support for 2-way messages without requiring a wait for replies. And by the definitions of this article, JActor actors are composable, as well as being faster than scala actors by orders of magnitude. http://jactorconsulting.com/product/jactor/