jueves, 5 de noviembre de 2015

Lux's values, types, signatures & structures

Hello, and welcome to the 2nd entry in the Lux Devlog, a blog dedicated to the design and implementation of the Lux Programming Language, an upcoming functional language in the Lisp tradition.

Today I'll touch 4 topics that are essential to the functioning & structure of Lux programs. You'll also learn about some of the unconventional language design involved in Lux and how to use these features.

Without further ado, let's begin!


Lux gives you 5 different kinds of base values and 3 different kinds of composite values. This is besides having access to whatever types of objects the host-platform provides.

First, we've got the Bool type, with only two member values: true and false.

Second, we've got the Int type. Right now, it ranges over the values in Java's long type.

Third, we've got the Real type. It ranges over the values offered by Java's double type.

Fourth, we've got the Char type. It's syntax is a little unconventional; they look like these: #"a", #"\n", #"3".

Fifth, we've got the Text type. It's basically just strings, and has "the same conventional syntax everyone is used to".

Regarding the composite values, Lux offers variants, tuples and records (or sums and products, as they're also called; with records being a kind of product).

Tuples always look the same; just a sequence of values encased in brackets, like this one: [1 true "YOLO"].
There's a special tuple called "unit", it's special because all instances are identical. The reason for that is simple: it's empty [].

Variants are structures containing 2 elements: a tag and a value; like this: (#Some 123)
If the value is unit, you can write something like (#None []), or, for convenience, you can just write #None.
If you're trying to associate more than 1 value with a tag, you'll need to use tuples for that. An example is the #Cons tag: (#Cons [head tail])
However (also for convenience), you can omit the brackets and Lux will infer you're using a tuple when it sees more than 1 element following the tag. E.g. (#Cons head tail)

Finally, we have to talk about records. Their syntax looks like this: {#foo 1 #bar true #baz "YOLO"}
The order in which you specify the fields doesn't matter to the compiler.

Records, I must reveal, are special among all of the value-types that I have mentioned, in that they only exist as a syntactic structure. In reality, all records are translated to tuples.

Now that I have talked about these fairly simple & easy concepts, let's complicate things a bit. Let's talk about why the basic data-types have unconventional names, what are tags and how Lux stores variants & tuples.

What's the deal with the names?

You might find it annoying that I'm using the terms Bool, Int, Real, Char and Text for stuff that already has names within the Java platform (Boolean, Integer, Double, Character and String). The reason for me doing so has to do with portability.

Here's the thing: I'm designing Lux so it's easy to port to other platforms. Because of that, I had to make a few choices regarding what can or can't be done in Lux, and how you do it.

One thing you might have noticed if you were careful is that I never mention that Lux has support for floats (only for doubles). The same is true for bytes, shorts & ints (the 32-bit variety). That must look strange, coming from a language with a JVM compiler.

Well... the first thing is that you can use all of those types; Lux just doesn't give you literal syntax for making them, so you have to resort to casting (there are a bunch of casting special forms in Lux, like _jvm_l2i, a.k.a. long to int).

The second thing is that I can't rely on all platforms offering that much variety in terms of datatypes, so Lux was designed to rely on as few data-types as possible.

The funny thing is that, even the amount of types Lux has is too much for some platforms (consider that JavaScript lacks both chars & ints). However, those 5 basic types seemed like a good enough set.

Finally, there's the issue of the names (sorry that it took me so long to actually talk about the subject of this subsection :P )

Since Lux's 5 basic types are meant to shield you from the details of what the host platform provides, there wasn't any reason to base their names on the types of any one platform (remember that, for instance, JS has Number, rather than Int or Float). So, I got creative, I kept the names short and I fixed (in my opinion) some historical mistakes (floats were renamed as reals because a type's name should reflect it's meaning, rather than it's implementation; and strings were renamed as text because... well... nothing else made sense).

What the hell are tags?

This is a very important question, considering that this is a Lisp we're talking about.
First things first: tags are not Lux's equivalent to Lisp keywords. Lux doesn't even have an equivalent to keywords.

Tags are, in reality, nothing more than glorified ints.
That's it. There you have it, folks. Now you know Lux's deep, dark secret.

Wait... ints? What the heck is going on here!?

Ok, ok, ok. Here's what's going on: you can't use tags before you declare them and when you declare them, they get assigned int values. We'll talk more about declaring tags in the Types section.
Suffice it to say that ints make it possible for Lux to do efficient pattern-matching on variants, since it's faster than comparing text tags, and it also makes it possible to reorder records as tuples, since you always know where every field needs to go.

How does Lux store variants & tuples?

Some of you might be familiar with Scala and think that, just like that language, Lux must create classes every time you define new data-types.
If you think that, you're dead wrong.

You shouldn't feel bad about it, though. I almost went down a similar road in the early days of Lux's design. Eventually, though, I came up with a much simpler model: arrays.

It's very simple, really: n-tuples are just n-arrays, while variants are 2-arrays (index 0 is reserved for the tag, index 1 is for the value).


Types are very interesting in Lux, for a few reasons:
  • They are first-class objects
  • They are structural, rather than nominal
  • If you know what you're doing, you can do awesome things with them

First-class types

The trick to do this is to make types using the very same data-types I discussed in the previous section, and to allow the compiler to evaluate those data-structures and incorporate them into the type-checking phase.

To find out how to make types, let's first check out their type:
 (deftype #rec Type  
   (| (#DataT Text (List Type))  
      (#VariantT (List Type))  
      (#TupleT (List Type))  
      (#LambdaT Type Type)  
      (#BoundT Int)  
      (#VarT Int)  
      (#ExT Int)  
      (#UnivQ (List Type) Type)  
      (#ExQ (List Type) Type)  
      (#AppT Type Type)  
      (#NamedT Ident Type)  

#DataT is for representing host-platform types. It includes the name of the type and any type parameters it might have (if it is, for instance, a generic type in Java).

#VariantT and #TupleT require a list of all the member types (you might wonder where tags enter the mix, but I'll explain that soon enough).

All functions in Lux are curried, so they only take a single argument. That's the reason behind #LambdaT only having a type for it's input, and a type for it's output.

#BoundT represents a De Bruijn index, useful when working with both univeral and existential quantification (what #UnivQ and #ExQ are for).

Also, the lists of types in #UnivQ and #ExQ represent their captured context (types with multiple parameters are also curried and require context for that).

#VarT is a type variable. The int it holds serves to look up types inside a context when the compiler is doing type-checking.

#ExT is for existential types. Instances can only match when they have the same number.

#AppT is for applying universally-quantified types to parameters.

Finally, #NamedT is a convenient tool for giving names to types (helpful for debugging and documentation purposes).

Now, you might be puzzled that there's a #rec tag in from of the Type definition. The reason is that types are not recursive by default and #rec allows deftype to perform a minor cosmetic rearrangement on your type to make it recursive.

Another thing that deftype does is to (automatically) wrap your type definitions inside #NamedT so you don't have to worry about it.

A final thing that it does is declare the tags referenced in your type. The int values each would get would depend on the order in which they appear, beginning with index 0 (tags cannot have negative values). The next time Lux sees the #TupleT tag, it will know it actually means 2.

Now, quick trivia for the curious:
Q: Does that mean that I can actually use ints instead of tags when writing variants?

A: Yes. In fact, if you check out the lux.lux (prelude) file, that's exactly what I do at the beginning to define types, as the tags for Type & List aren't defined at the very beginning.

Structural, rather than nominal types

This should be obvious just from looking at the definition of Type.
An interesting consequence of this is that, since types are just Lux data-structures, I can pattern-match against them and get useful information.

Doing awesome stuff with types

The Lux compiler stores a lot of useful information regarding types. For instance, you can know the type of every definition in any module that has been compiled, and you can also ask which tags are associated to which types.

You'll see some examples of what that enables in the next section.

Signatures & Structures

Lux handles polymorphism the same way that ML does, via signatures and structures.
Think of signatures as interfaces that describe what a suitable implementation should provide.
Structures are those implementations.

Here is one example:
 (defsig #export (Functor f)  
   (: (All [a b] (-> (-> a b) (f a) (f b)))  
 (defstruct #export List/Functor (Functor List)  
   (def (map f fa)  
     (case fa  
       #;Nil          #;Nil  
       (#;Cons a fa') (#;Cons (f a) (map f fa')))))  
One difference between ML languages and Lux is that ML separates signatures and structures from the rest of the language, whereas Lux piggybacks on plain-old types and values in order to implement them.

How? By using record/tuple-types as signatures and records as structures.
The conversion is performed by the defsig and defstruct macros, which only serve to provide a more pleasant syntax for working with signatures & structures.

Without the sugar they provide, our previous example would look like this:
 (deftype (Functor f)  
   (& #map (All [a b] (-> (-> a b) (f a) (f b)))  
 (def List/Functor  
  (Functor List)  
  {#map (lambda map [f fa]  
          (case fa  
            #;Nil          #;Nil  
            (#;Cons a fa') (#;Cons (f a) (map f fa'))))  

Lux offers many ways to work with structures in an easy way, depending on what you're trying to do.

If you want to work with a structure inside a local scope, use the using macro:
 (using List/Monoid  
   (foldL ++ unit list-of-lists))  
If you want to work with a structure in several places throughout a module, use the open macro:
 (open List/Fold)  
If you only want to use a specific element in a structure, use the :: macro:
 (:: Text/Eq (= x y))  
 (:: List/Monoid unit)  
Also, remember when I told you that the Lux compiler stores a lot of type information? Those macros access that information to do their magic. They find out what are the names of the necessary tags and create all the code to generate lexical bindings, full-blown definitions, or just simple access code.

Bonus content: get@, set@ and update@

Oh... I almost forgot. There's one last bit I want to share with you.
Remember records? They're really nice and all, but in order to access their contents you have to pattern-match on them.

As you can imagine, pattern-matching on an record with 8 fields is not going to be nice, even if you do it in tuple-form.

For that reason, there are 3 simple macros that take care of a lot of the complexity for you.

get@ allows you to access a single field from a record:
 (get@ #;modules state)  
set@ allows you to set a single field from a record:
 (set@ #pl;name "Lux" lang)  
update@ allows you to transform a single field from a record:
 (update@ #;seed (i:+ 1) state)  
Again, these macros access the type information in the compiler to generate all the pattern-matching and record-building code necessary to perform these operations.


I'm sorry that the post was so long, but I wanted to be thorough.

Next week is going to be pretty awesome, as I'm going to talk about custom-pattern matching & advanced macrology in Lux.

6 comentarios:

  1. Nice write-up, isn't it better to keep this in the github repo as well ?

    1. I'll be linking the blog to the repo, but I like publishing to the blog because that way people can talk back and offer their criticism, ideas and ask questions if anything is unclear.

  2. Any reason for the "unconventional" syntax for Char ?

    1. I was trying to minimize the amount of special characters in Lux.

      Right now, Lux only has 3 special characters: ", # and ;
      That's besides delimiters () [] {} and whitespace.

      If I had added the common syntax for chars, the quote ' would have been claimed. If I had gone with Clojure's syntax, I would have lost the backslash \

      To keep as many free chars as possible, I went with the current char syntax.

      The syntax is still subject to change, depending on the feedback of people, but that's the rationale for it.

    2. Besides, since chars are probably the least used data-type in programming, I didn't think people would mind the new syntax a lot.

    3. Thanks.

      Well, I agree that chars are least used.