jueves, 21 de enero de 2016

Monads, I/O and Concurrency in Lux

Hello, and welcome to the 5th entry in the Lux Devlog, a blog dedicated to the design and implementation of the Lux Programming Language, an upcoming functional language in the Lisp tradition.

You can check out the Lux repo over here: https://github.com/LuxLang/lux

I am very pleased to announce that Lux 0.3.2 is ready. This release brings several cool features for the language, plus some additions to the standard library.

In this post, I'm going to talk about one such addition: concurrency, through the Async type.

But first, I need to talk about two other things that will be useful for understanding it: how to use monads in Lux, and the IO type.

Monads

I'm going to skip the details of what monads are and just assume that anyone reading this is already familiar with that abstraction.
For those of you who don't, a monad is just a monoid in the category of endofunctors.
Easy-peasy.

Anyway, the Monad signature is defined in the lux/control/monad module as this:
 (defsig #export (Monad m)  
   (: (Functor m)  
      _functor)  
   (: (All [a]  
        (-> a (m a)))  
      wrap)  
   (: (All [a]  
        (-> (m (m a)) (m a)))  
      join))  

And, as you can see, it's defined as depending on the Functor signature, which is defined in the lux/control/functor module as this:

 (defsig #export (Functor f)  
   (: (All [a b]  
        (-> (-> a b) (f a) (f b)))  
      map))  

Whereas Haskell has a return function for monads, Lux has renamed that to wrap.
Also, those of you who are already familiar with monads (perhaps from using Haskell) will notice that there is no mention of the bind (or >>=) function anywhere. The reason is that, for any monad, bind can be implemented as the composition of the monad's join function with its functor's map function, so bind was omitted from the definition.
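Sketched in Python (not Lux) with lists as the monad, the relationship looks like this; all of the list_* names are purely illustrative:

```python
def list_map(f, xs):
    # Functor map: apply f to every element.
    return [f(x) for x in xs]

def list_wrap(x):
    # Monad wrap (Haskell's return): lift a value into the monad.
    return [x]

def list_join(xss):
    # Monad join: flatten one level of nesting.
    return [x for xs in xss for x in xs]

def list_bind(xs, f):
    # bind = join . map f  -- exactly the composition described above.
    return list_join(list_map(f, xs))

# Usage: bind a list to a function that produces lists.
print(list_bind([1, 2, 3], lambda n: [n, n * 10]))
# [1, 10, 2, 20, 3, 30]
```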

To make working with monads as easy as possible, Lux comes with its own implementation of do-notation, through the do macro (located in lux/control/monad).


For example:
 (def #export nat^  
   (Parser Int)  
   (do Parser/Monad  
     [n int^  
      _ (assert (i:>= n 0) "Expected a natural number (>= N 0)")]  
     (wrap n)))  

This is a parser from the lux/meta/syntax module for writing macros. As you can see, the do macro in Lux requires that a monad implementation be explicitly passed to it.
If you need to refer to the monad implementation inside the do body, you can avoid typing its full name and just refer to it as %. This is very useful when working with nested do expressions, or when invoking monadic map (map%) or any other function that needs the monad structure passed to it.
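To see what the do macro is doing under the hood, here is a hypothetical Python sketch (not Lux) of how a do block like the nat^ parser desugars into nested bind calls, using a None-based Maybe-style monad; all names (maybe_bind, parse_int, check) are illustrative, not actual Lux functions:

```python
def maybe_bind(m, f):
    # Failure (None) short-circuits the rest of the computation.
    return None if m is None else f(m)

def maybe_wrap(x):
    # Lift a plain value into the monad.
    return x

def parse_int(token):
    # Analogue of int^: succeed with an int, or fail with None.
    try:
        return int(token)
    except ValueError:
        return None

def check(condition):
    # Analogue of assert: succeed with a dummy value, or fail.
    return True if condition else None

def parse_nat(token):
    # do-notation:                 desugared form below:
    #   [n (parse_int token)
    #    _ (check (n >= 0))]
    #   (wrap n)
    return maybe_bind(
        parse_int(token),
        lambda n: maybe_bind(
            check(n >= 0),
            lambda _: maybe_wrap(n)))

print(parse_nat("42"))   # 42
print(parse_nat("-7"))   # None
```

Each binding line in the do block becomes one call to bind, with the rest of the block as the continuation.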

Here's an example:
 (defsyntax #export (jvm-import [long-name? (tag?^ ["" "long"])]  
                                [class-decl class-decl^]  
                                [members (*^ (import-member-decl^ (second class-decl)))])  
   (do Lux/Monad  
     [kind (class-kind class-decl)  
      =members (map% % (member-import$ long-name? kind class-decl) members)]  
     (wrap (list& (class-import$ long-name? class-decl) (foldL list:++ list:unit =members)))))  

In that example, the call to map% refers to Lux/Monad as just %, which is much easier to write.
That macro, by the way, is one of the many macros for doing easy JVM interop in Lux; a topic we'll cover in a future post.

I/O


Now that you know how to use monads in Lux, it's time to talk about a really important one: the IO type (located in the lux/codata/io module).

Most languages have what could be described as informal I/O operations.
That is, I/O operations that are not tracked by the compiler or regarded as anything special.

One of the problems with this is that you cannot signal when a function has side effects or interacts with the real world.
Another sad consequence is that you cannot easily separate side-effecting computations from pure ones, and this often leads to pervasive I/O being performed all over the program, which can lead to many kinds of subtle bugs.

One mistake that has probably bitten every Clojure programmer at some point: you map a side-effecting function over a list and wonder why nothing happens, only to find out later that you needed to apply dorun or doall to the list, because laziness was interfering with the side effects.

Now... it might sound weird that Lux can have an IO type, considering the JVM does not offer any kind of separation between side-effecting code and pure code.
The way Lux does it is by cheating, and defining the IO type in a very simple way:
 (deftype #export (IO a)  
   (-> Void a))  

Yep. Creatio ex nihilo.
IO is a function from nothing to something.

Very fitting, considering that you'll often get values from the outside world, and not through regular computation.
How that works out is an implementation detail.
Suffice it to say that it works pretty well, and it's often used for doing JVM interop, considering that many methods have side-effects and working with JVM objects should be faithfully modelled by the type-system.
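To make the idea concrete, here is a minimal Python sketch (not Lux, and not Lux's actual implementation): an effect suspended inside a zero-argument function, so that nothing happens until the IO is explicitly run. The IO class and its methods are purely illustrative:

```python
class IO:
    def __init__(self, thunk):
        # thunk: a zero-argument function that performs the effect.
        self.thunk = thunk

    def run(self):
        # Actually perform the effect and obtain the value.
        return self.thunk()

    def map(self, f):
        # Functor map: transform the eventual result, without running.
        return IO(lambda: f(self.run()))

    def bind(self, f):
        # Monad bind: sequence two effectful computations.
        return IO(lambda: f(self.run()).run())

# Usage: building the IO performs no effect; run() does.
log = []
step = IO(lambda: log.append("effect") or 41)
program = step.map(lambda n: n + 1)
assert log == []          # nothing has happened yet
print(program.run())      # 42, and the effect runs only now
```

Because effects stay suspended, side-effecting code is marked by its type and composes like any other value.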

Concurrency


The IO type is really useful, but it does have one problem: it works synchronously.
Concurrency often requires communication between different threads or processes, and that communication is often done asynchronously.

A lot of work has been done on designing a paradigm that can properly encode both the properties of concurrent execution, and the needs of programmers.

While working on this release, I considered many options:
  • Futures & Promises
  • Communicating Sequential Processes (CSP)
  • Functional-Reactive Programming (FRP)
  • The Actor Model
  • Software Transactional Memory (STM)

I eventually settled on what I considered the simplest and most primitive of the options (Futures & Promises), and then simplified even further.

That was how I came up with the (Async a) type (which is housed in the lux/concurrency/async module).

. . .

I'm not going to paste the definition of Async here, since it's fairly complicated: I had to define a new class with (synchronized) methods to do all of the work.
But suffice it to say that Async is just a promise.

The reason why I didn't name it Promise is that promises tend to be write-only, with futures providing the reading end.

Async can be both written to and read, with the caveat that you can only write to an Async once; after that, its value is set in stone.
Every other write you attempt will fail, though rather than throwing an exception, it will just give back false to signal failure.
The beautiful thing about Async is that it has a Monad implementation, which means you can do computations with it and even compose Async values.

It's even possible to implement other paradigms on top of Async, and the standard library already ships with a small module for doing Functional-Reactive Programming, located at lux/concurrency/frp.
FRP channels (Chan) are implemented in terms of Async, which means FRP computations and regular async computations can interact seamlessly.

CSP doesn't really offer much above Async, which is why it wasn't implemented.

The Actor Model offers a greater value proposition, but I felt that it was better to leave that for a full-fledged distributed-programming framework, which is why it wasn't added to the standard library.

You might wonder whether it is possible to combine IO operations with Async, and the answer to that is yes (but there is a caveat).
The road to Async is a one-way street.
You can always treat a synchronous operation asynchronously, but you can't do it the other way around, which is very important to note.

That doesn't mean you cannot access the value of an Async from synchronous code, but it does mean that the only option you have is polling for its value until some thread sets it, as you cannot block on it synchronously.
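As a rough illustration, here is a hedged Python sketch (not Lux's actual implementation) of a write-once cell with this behavior: the first write succeeds, later writes report false instead of throwing, and synchronous readers can only poll. All names are illustrative:

```python
import threading

class Async:
    # Write-once cell; a sketch, not the real Lux class.
    def __init__(self):
        self._lock = threading.Lock()
        self._resolved = False
        self._value = None

    def resolve(self, value):
        # Only the first write wins; later attempts report False
        # instead of raising an exception.
        with self._lock:
            if self._resolved:
                return False
            self._value = value
            self._resolved = True
            return True

    def poll(self):
        # Non-blocking read: (True, value) once set, (False, None) before.
        with self._lock:
            if self._resolved:
                return (True, self._value)
            return (False, None)

# Usage: the second write fails quietly.
a = Async()
print(a.poll())         # (False, None)
print(a.resolve(42))    # True
print(a.resolve(99))    # False -- the value is set in stone
print(a.poll())         # (True, 42)
```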

The trick for turning an IO into an Async is to use the future function:
 (def #export (future computation)  
   (All [a] (-> (IO a) (Async a)))  
   ...)  

The code will be run on a new thread, and its output will be stored in an Async, which you will receive.
Finally, I should mention that when working with Async, Lux uses a thread pool to avoid spawning an inordinate number of threads. Its size corresponds to the number of cores in your machine.

future, however, does not use threads from this pool and always spawns a new one, to avoid the case of some IO operation taking too long and blocking regular Async operations by occupying a pooled thread.
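The idea behind future can be sketched in Python (again, illustrative only, not Lux's actual implementation): pair a write-once cell with a freshly spawned, non-pooled thread that runs the deferred computation and stores its output. All names here are assumptions for the sketch:

```python
import threading

class Async:
    # Minimal write-once cell for this sketch.
    def __init__(self):
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._value = None

    def resolve(self, value):
        # First write wins; later writes report failure with False.
        with self._lock:
            if self._done.is_set():
                return False
            self._value = value
            self._done.set()
            return True

    def wait(self):
        # Blocking read, for the demo only; Lux code would poll or
        # compose monadically instead of blocking.
        self._done.wait()
        return self._value

def future(computation):
    # Spawn a dedicated (non-pooled) thread, run the deferred
    # computation, and store its output in a fresh Async.
    result = Async()
    threading.Thread(target=lambda: result.resolve(computation())).start()
    return result

# Usage: "computation" stands in for an IO action.
a = future(lambda: 6 * 7)
print(a.wait())   # 42
```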

_____________________________________________________________________________

That's it for today, folks.
Hopefully you now understand how to use Lux for practical stuff involving both I/O and concurrency.
I'll be posting info on some more of the awesome things that came in v0.3.2 later on.

Until then, take care and have fun (... and don't forget to check out the documentation for the Standard Library on the repo, for more detailed info about the lux/concurrency/async and lux/concurrency/frp modules).

See you next time :D