Monday, June 6, 2011

Referentially transparent nondeterminism

At NEScala, I gave a talk on using mealy machines for deterministic parallelism. I think it's a nice model that works out pretty well. But, the more I've thought about it, the more I think deterministic parallelism isn't enough - nondeterminism is needed in some form, but with a way of preserving referential transparency (RT) by not allowing this nondeterminism to be observable by programs.

Nondeterminism is both useful and important. In the context of stream transforming functions, we need to be able to merge two inputs to a stream transformer without specifying an ordering on the elements. As an example, if we have an event stream for mouse clicks and an event stream for key presses, we need to be able to merge these streams in a nondeterministic way, without saying anything at all about how the streams interleave - that is, I do not want to specify that the output stream is, say, all the mouse clicks followed by all the key presses, or the two interleaved in a completely regular way. No, what I want is more like the result of summing two independent random variables - the actual stream of events for the merged stream will be interleaved in a random way (not truly random, but modeled outside of the system), where there is no interaction between these events. I should be able to remove (via filtering) all the key presses from the merged stream and obtain something equivalent to the original mouse click stream.

In order to keep RT, then, whatever consumes this stream must also end the universe (it can never return). If such a consumer did return, we could then feed the stream to another consumer and observe that we did not have the same results. But this is nothing new, functional programming is all about moving side effects to the edges of the program where they cannot be observed. The only difference is what effects we're hiding in this way. There's something neat about the approach though - instead of eliminating the nondeterminism entirely, which would force us to chose some global, explicit ordering for our effects (unrealistic for a distributed system and many other situations), we keep the nondeterminism and just focus on pushing it to the outside of our programs like we would other effects.

At some point I'll write up these ideas in code to make it clearer what heck I'm talking about.


jamiiecb said...

This is close in spirit to the CALM theorem ( Basically, the eventual result of a monotonic logic program is referentially transparent with respect to the ordering of elements in incoming streams. This means that distributed monotonic logic programs are eventually consistent.

I also did some work in the past using transactional mealy machines (with non-deterministic scheduling) to implement a betting exchange. I wrote about it very briefly at . I haven't had a chance to watch your talk yet (still at work) but I'll be interested to see how you reconciled determinism and addressable actors, if at all.

jamiiecb said...


Your approach seems similar to Opis ( Opis comes with some fairly amazing tooling which could be ported to Scala. From the original announcement:

Beyond an EDSL, Opis also provides some useful tools to speed-up the
development and deployment of event functions. Hence, given an event
function and without any modification, we can:
- deploy it on a real network
- simulate a network executing the event function, to test the
behavior of the system "in the large"
- debug a network of nodes running the event function, to inspect
the system "in the small", with forward and backward execution steps,
state inspection, ...
- model-check the distributed system against safety properties,
featuring a dynamic partial-order reduction mode that avoids the usual
combinatorial explosion in most systems
- performance-debug the pure functions, by measuring their
processing time and inferring their (algorithmic) complexity

Paul Chiusano said...

@jamieecb - Thanks for the references. :) I will take a look.