Baby Steps

A blog about programming and my daughter, both of which are in their infancy.

Versioning Considered OK

Marijn pointed out to me that our current setup should avoid the worst of the versioning problems I was afraid of. In the snapshot, we package up a copy of the compiler along with its associated libraries, and use this compiler to produce the new compiler. The new compiler can then compilers its own target libraries, thus avoiding the need to interact with libraries produced by the snapshot.

Of course, I should have known this, since I have relied on this so that I can changed the metadata format without worrying about backwards compatibility. That’s what I get for writing blog posts late at night.

Anyhow, the good news is that we are able to serialize and deserialize AST trees faithfully, and I have written (but not tested) the code to serialize the side tables. I am now working on the deserialization code and the pass which will instantiate sources to be inlined.

CCI and Versioning

I’ve been busily implementing the Cross-Crate Inlining stuff, but one area I haven’t looked at much is versioning. In particular, if we are going to be serializing the AST, we need a plan for what to do when the AST changes. Actually, if inlining were only to be used for performance, we wouldn’t really need to have a plan: we could just not inline when the AST appeared to be stored in some form we don’t understand. However, if we fully monomorphize, we will not have that luxury: without type descriptors, the only way to compile cross-crate, generic calls will be by inlining.

This because particularly important because Rust is self-hosting. In particular, the compilation process begins by compiling the standard libraries for use by later stages. But if we change the form of the AST, the snapshot compiler that bootstraps our compilation will still be generating the older AST—so we had better have a way of reading it!

I am not really sure what’s the best way to handle this. I had always assumed that one we reach 1.0, we would just keep a version of that AST module around forever, and convert to the newer AST formats. This is a somewhat painful but acceptable price to pay, so long as the set of versions is not too high. But this scheme looks less attractive if we have to do it for every field that we add to the AST.

In addition, there is another wrinkle I hadn’t really thought about: alongside the AST, we also store the results of various analyses which are used during code gen. For example, there is an analysis that indicates whether a variable is mutated, or whether a particular copy can in fact be implemented with a move. If new analyses are added in the future (and they will be), we won’t have results available for older crates, so we will have to be sure we can always get by without those results. In most cases, though, these results are just used to generate faster code, so we can always generate less efficient code without a problem. But it is something that we nonetheless have to be aware of—and it affects how the side table information is stored. For example, keeping a set of variables that we can optimize better is good, but keeping a set of variables for which we must be conservative is bad. This is because if the set leads to optimization, we can always just use an empty set without affecting correctness. But anyhow this can all be handled with some code.

Anyway, what would be nicest is to have attributes into the AST to indicate what kind of values should be provided for fields that are missing and so forth. This would mean that the serialization code would have to get somewhat smarter, so that it can cope with things like a record with fields that may or may not be present. This is where having automated the serialization process should really pay off, though, since I can make these adjustments once and have all the code be automatically adjusted. Still, I’d have to figure out how to best encode things so that I can figure out what data is present and what is not, and what kinds of changes we should accept.

One note: these kinds of “default-providing” attributes can be dropped once a new snapshot is generated, except in cases where they are required for backwards compatibility to some publicly supported release.

I had rather hoped to avoid these kinds of questions, at least not yet. These seem like detailed questions that are the domain of a specialized library. But I think that they will be hard to avoid so long as we are bootstrapping, as we will always have to deal with executables generated based on the older AST definition.

Returning Refs

One commonly requested feature for regions is the ability to return references to the inside of structures. I did not allow that in the proposal in my previous post because I did not want to have any region annotations beyond a simple &. I think, however, that if you want to allow returning references to the interior of a parameter, you need a way for the user to denote region names explicitly.

The big problem with returning references to the interior of data structures is ensuring the lifetime and validity of that reference. I think it can be supported in some cases but not particularly well in general. It will work ok for structures allocated on the stack but be quite limited for structures allocated in the heap, unless we have strong support from the garbage collector. Another scenario where it could work well would be user-managed memory pools (which we probably do want to support eventually).

Simplest case

Let’s start with the simplest case:

type T = {mut f: uint};
fn get_f(t: r&T) -> r&mut uint { &t.f }

Here we have a function that returns a pointer to a field of its parameter. On the callee side, I think this is fairly straightforward. You name the region in which the parameter is located (r) and say that the return value is a mutable pointer to uint in the same region (r&mut uint). The notation may leave something to be desired, but the concept is hopefully clear enough.

Returning references to the stack

But what about the caller? Again we’ll start with the simplest case, where get_f() is invoked with data that lives on the stack:

fn caller_on_stack() {   // let region `b` refer to the function body
    let t = &{mut f: 3}; // type of t: `b&T`
    let p = get_f(t);    // type of p: `b&mut uint`
    *p = 5;              // legal, `b` region is in scope.
}

In this case, there is no concern about memory management per se. The returned pointer is in the region b. The type checker will enforce the rule that if a function returns a reference type, the reference must be in scope (note: I haven’t thought through what this means for generic types with the ref kind, but I guess they can be handled one way or another). Generally, because the parameter regions must be in scope for the call, the returned region will be in scope too—but we’ll see that this does not always hold.

Returning references to the heap, take 1

Let’s move on to a more complicated case where the data being accessed is in the heap. I’ll first discuss how it could work in some theoretical world and then show how this can cause problems:

fn caller_on_heap() {    // (note: not fully sound)
    let t = @{mut f: 3}; // type of t: `@T`
    let p = get_f(t);    // type of p: `@mut uint`
    *p = 5;              // legal, `@` region is in scope.
}

The only difference is that the variable t is in the heap region @. Now the returned pointer is considered to be in that region as well, and so the assignment is permitted. This seems reasonable at first, but it is of course incompatible with ref counting (p is not a ref-counted entity). If we moved to a garbage collector which could handle interior pointers, this would be reasonably safe. Interior pointer support is needed to accomodate cases where p gets returned, like this one:

fn caller_on_heap_r() -> @mut uint {
    let t = @{mut f: 3};
    ret get_f(t);
}

So is there anything we can do that is compatible with ref. counting? The answer is “sort of”.

Returning references to the heap, take 2

One thought is that we say that @ is not actually a region. It’s never been the best fit, due to ref counting requirements and implicit headers. Instead, we say that regions always refer to some block-scoped slice of the program execution. The most common case would be a block in the program, but in some cases the region might be “the time in which a given expression is evaluated” and so forth (as an aside: this is basically what I called an “interval” in my thesis…minus all the parallel parts).

An @T pointer could then be implicitly coerced into an r&T pointer where the region r is the biggest region for which the type checker can guarantee the validity of the @T pointer. So, we can now revisit the previous examples. The first example works more-or-less the same as before:

fn caller_on_heap() {    // region of the block is `r`
    let t = @{mut f: 3}; // type of t: `@T`
    let p = get_f(t);    // type of p: `r&mut uint`
    *p = 5;              // legal, `r` region is in scope.
}

The differences here lie in the type checker. The pointer p is no longer in the @ region but rather in the region r corresponding to the function body. The reason that the region r could be safely used is that t is an immutable local variable (I am assuming issue #1273 is implemented…working on that right now). This means that the memory will remain valid as long as t is in scope.

EDIT: There is no reason to impose the following restriction. See discussion below. This implies that if t were mutable, the example would not work In that case, the validity of the memory to t could not be guaranteed for the entire block, as t could be overwritten. In other words, a program might do something like this:

fn caller_on_heap() {
    let mut t = @{mut f: 3};
    let p = get_f(t);
    t = @{mut f: 22};    // original memory is now freed
    *p = 5;              // memory error.
}

However, such a program would not type check. The reason is that, because t is mutable, when t was coerced to a region type, a narrow region s would be assigned. The region s would correspond to precisely the call to get_f(). The result of get_f() would therefore have type s&mut uint, but the region s would be out of scope after get_f() returned, and so a type error occurs (this is that rule I mentioned before: when returning a reference, the region must be in scope).

As an aside, coercing a unique pointer ~T into a region would work similarly to the second case: that is, the resulting region is always a very narrow one. It does not matter if the variable storing the unique pointer is immutable or not. The reason is that the local variable is being borrowed for the lifetime of the region. If we assigned a large region, the local variable would be inaccessible after the call, because we would not be able to guarantee the uniqueness invariant, as there might be escaped region-typed pointers into its interior. Anyway, I don’t want to go into details about unique pointers in this post as it’s already plenty long.

EDIT: pcwalton pointed out to me that there is no reason to treat mutable variables specially. Instead, we can basically just increment the ref count whenever we coerce an @T to a r&T. The region r would still be the region of an enclosing block b (probably the innermost one, or perhaps the one where the variable is declared) and we would release the reference upon exiting the block b can still optimize immutable variables to not increase the reference at all because it is unnecessary. I rejected this approach initially because I was thinking that we would want to keep it very predictable when references would be dropped, but that’s not actually an important property. Garbage collection traditionally does not define precisely when dead memory will be reclaimed, after all (and, as graydon correctly points out, RC+CC is garbage collection). Note though that borrowing unique pointers probably still ought to use a narrow region corresponding to the call or alt statement in which the borrow occurs.

But does it scale?

So, I think this system I showed above is reasonable. I think it has a clear story, too, which is important to me, because it helps me believe that it is sound even though I haven’t made any kind of formal proof or argument. The story is basically that

  • a region is always a block-scoped slice of the dynamic execution;
  • when a @T is coerced into a region pointer, the result is the largest region for which the validity of the @T pointer can be guaranteed;
  • when a ~T is coerced into a region, the result is a narrow region corresponding to just the duration of the borrow (I haven’t gone into details on this… perhaps in a later post)

However, in all of these examples I showed only the simplest case, where a pointer was returned directly into one of the parameters. A more realistic scenario is probably returning some interior pointer to a record reachable from a parameter. For example, a common C trick is to have a hashtable lookup not return the value which was found but rather a pointer to the value. This allows the caller to update the value without having to use any further API calls. Let’s look at that example: we will find that the regions don’t scale up.

The prototype for such a function might be:

fn get_value_ptr<K,V>(m: r&map<K,V>, k: &K) -> option<r&V> {
    ...
}

This looks reasonable, but once we start digging into the details things don’t work so well. Let’s assume a hashmap with chains for each key:

enum bucket<K,V> = {k: K, mut v: V, mut next: option<@bucket<K,V>>};
enum map<K,V> = {buckets: [mut option<@bucket<K,V>>], ...};

fn get_value_ptr<K,V>(m: r&map<K,V>, k: &K) -> option<r&mut V> {
    let bkt_idx: uint = find_bucket_index(m, k);
    alt search_bucket_chain(m.buckets[bkt_idx], k) {
        none { none }     // no bucket with key k
        some(bkt) {       // found a bucket with key k
                          // bkt has type @bucket<K,V>
            some(&bkt.v)  // (*) error
        }
    }
}

The example should be straightforward if you’ve ever coded up a hashtable. The tricky part is marked with a (*): once we’ve found the bucket containing the value, we attempt to return a pointer to its interior. But this is not safe, of course! The caller expects a pointer in the same region as the map: that is, with the same lifetime. There is no way for get_value_ptr() to guarantee that bkt is valid as long as the map m is valid. The caller might, for example, remove the key from the hashtable, thus invalidating the bucket. This error manifests itself as a type error, because the type of &bkt.v is a region pointer corresponding to the region of the alt, not the region r which was provided as a parameter.

Could this example be made to work?

I don’t think it can without a lot of work. You could imagine that the caller would consider the map “borrowed” while the returned value is in scope and thus try not to modify it. But there may be aliases to the map, so there are still no guarantees.

I think there are two ways to handle an example like that in a safe fashion:

  • Improve the GC, as described before
  • Re-write the example into a top-down style.

The second “solution” is what we support today, and what is supported by the “Regions Lite” idea I described before. You would write:

fn get_value_ptr<K,V,R>(
    m: r&map<K,V>,
    k: &K,
    f: fn(option<&mut V>) -> R) -> R {
    ...
}

In other words, the get_value_ptr() method will call the closure f with the result of the lookup. In this way, the get_value_ptr() method itself can guarantee that the reference is valid.

Is it all worth it?

It may still be worth having labeled region parameters and supporting a limited form of returning references, but I am not sure. I feel like writing things in a “top-down”, CPS-ish style is perhaps just a better solution—it’s certainly more widely applicable.

Regardless, I do like the idea of formalizing regions as representing a slice of time, and defining coercions from @T and ~T. I think that feels intuitively sensible and I think it can be explained in a reasonable way.

There is however one potentially important future case where the simple form of returning references would be enough. If we had user-defined memory pools, then it would be possible to have large, dynamically allocated, multi-object data structures that all lie within one region (the memory pool). A map, for example, might have an associated arena. This scheme would then work. When you think about it, treating @ as a region and having a GC which can handle interior pointers is really the same thing as this scheme, from an abstract point of view.

Another important thing is that I think there is a path from all of these more limited forms to the more general ones. If we say, for now, that @ is not a region and we do not support labeled parameters, we can add both of those later. Existing programs will continue to type check.

Regions-lite…ish

I was talking to brson today about the possibility of moving Rust to a regions system. He pointed out that the complexity costs may be high. I was trying to make a slimmer version where explicit region names were never required. This is what I came up with. The truth is, it’s not that different from the original: adding back region names wouldn’t change much. But I’m posting it anyway because it includes a description of how to handle regions in types and I think it’s the most complete and correct proposal at the moment.

The summary

You would have four kinds of Rust pointers:

@MT --- pointers to task-local, boxed data
~MT --- pointers to unique data
&MT --- safe references
*MT --- unsafe, C-like pointers

Here the M refers to a mutability qualifier (default, mut, or const) and T refers to a type.

&MT types, called references, are the new addition. A reference is a pointer which always points at memory whose validity is guaranteed by some outer stack frame. The idea is that a caller can give a callee a reference to some memory that the callee may use but which may not escape the callee. This memory may be on the caller’s stack frame or it may be a reference into the task or exchange heaps which the caller is going to keep valid. This guarantee is upheld by the type system.

Reference types may appear anywhere. However, if they are used within another aggregate type such as a record, enum, or class, they “infect” their container so that it too is considered to be a reference. This is done by introducing a new kind into the type system, ref (actually, this is sort of a negative kind: more formally, there is a kind heap which contains all types but for those that may transitively include a reference). This kind may or may not be user visible: see the section on generics for a discussion of the options.

Coercion between pointer types

The type &MT is not a supertype of @MT and ~MT, but it is coercable. In the case of @, we could probably make it a true subtype, but at the moment a box pointer includes a header, ref count, etc and so is not binary compatible with a & pointer, which would be just a pointer to the box body. If we changed our representation so that @ pointers point directly to the box and the header is stored at a negative offset, then we could allow @T to be a subtype of &T.

The type ~MT, however, can never be a subtype. ~ is not a region. Rather, the data at the other end of the pointer logically belongs to a region of its own. So we can allow ~MT to be coerced to &MT, but the region will be a fresh region, and access to the ~MT pointer must be prevented for the scope of that fresh region. This is called “borrowing” a unique pointer. It is only possible for “unique paths”, where a “unique path” is a path of identifiers a.b.c...z that is the only path by which the unique variable can be reached (in practice, this means that a must be a local variable and all of the fields b...z must have unique or interior type). All of the prefixes of the unique path must be considered borrowed as well. I am not going into great detail on the handling of uniques here: it should be quite similar to what we have today in practice.

Tracking validity of references

Although the user never needs to write it explicitly, each instance of a reference type is internally associated with a region. There is one region for every block in the code. In addition, each function/method has a special region called caller. For simplicity I do not consider classes nor impls; it is relatively straightforward to extend the system to such cases.

Regions are arranged into a tree derived from the structure of the blocks in the source code. The region caller is a superregion of all the internal regions to a function.

In the implementation / formal version of the type system, these regions are represented explicitly. So a user-written type &MT expands to a type r&MT where r is the node id of the block or of the function itself (in the case of the caller region). The region r is derived from the position where &MT appears and by inference:

  • if &MT appears within a parameter list, r is the caller region.
  • if &MT appears on the type of a local variable, inference is used.
  • if &MT appears in a type declaration, see section below.

In general, the type a&T is a subtype of b&T if b is a subregion a. The reason is that, because a is a superregion of b, the pointer a&T is always valid whenever the region b is valid.

References in type declarations

The rules for which region is assigned when the user writes &MT omitted one important case: what happens when this type appears in a type declaration? Consider the following example:

type crate_ctxt = {
    mut_map: &map<...>,
    node_map: &map<...>,
    another_map: &map<...>,
    yet_another_map: &map<...>
};

In such cases, the region for the internal references will be assigned when the type is used. For example:

fn trans_foo(ccx: &crate_ctxt) {...}

Here, the type of ccx will be expanded to:

caller&{mut_map: caller&map<...>, node_map: caller&map<...>, ... }

In effect, types which contain references (transitively) are implicitly parameterized by a region parameter. There is only one such parameter. When the type is instantiated in a specific context, the value for that parameter is provided based on the context.

Taking the address of variables and so forth

The unary operator &M can be be used with both lvalues and rvalues. When used with an lvalue, it takes the address of the lvalue. The mutability qualifier provided must agree with the mutability of the lvalue. When used with an rvalue, it creates temporary space on the stack and copies the rvalue into it.

Here is an example of taking the address of lvalues:

fn foo() { // region for this block is "r"
    let x = 3;
    let mut y = 4;
    let px1 = &x;       // OK: yields type r&int
    let px2 = &const x; // OK: yields type r&const int
    let px2 = &mut x;   // Error: x is immutable
    let py1 = &y;       // Error: y is mutable.
    let py1 = &const y; // OK: yields type r&const int
    let py1 = &mut y;   // OK: yields type r&mut int
}

Here is an example of taking the address of rvalues:

fn foo() { // region for this block is "r"
    let p1 = &{x: 3, y: 4}; // OK: yields type r&{x:int,y:int}
    let p2 = &mut {x: 3, y: 4}; // OK: yields type r&mut {x:int,y:int}
}

Limitations on references

In order to guarantee that reference types do not escape the callee, the type system imposes some limitations:

  • Reference types may not be returned.
  • Reference types may not be closed over (copied/moved into a closure or interface instance).
  • Generic type variables cannot be bound to reference types unless the generic type variable is of the ref kind.

I will cover each restriction in turn. First, though, I want to more precisely define what the type checker considers to be a reference type. The definition is inductive:

  • a reference &MT;
  • a type whose definition may contain a reference (e.g., @&T or {x: &T}, or a class with a field of reference type);
  • a generic variable with bound ref.

Reference types may not be returned

The danger here is that the callee may pass back a reference to the caller that is no longer valid. This is relatively straightforward to prevent: do not allow the return type of a function to be a reference type.

Reference types may not be closed over

It is not allowed to copy a reference type into a boxed/unique closure nor is it allowed to cast a reference type to a boxed or unique iface. The reason is that these are the points where the type system loses the ability to track the constituent types and so we cannot distinguish a fn@() that closes over a reference type from other fn@() types.

Generic types

There is of course a concern that the limitations on reference types might be circumvented through the use of generics. This is prevented through the use of a type kind ref. A generic type variable may not be bound to a reference type unless it includes the bound ref. Moreover, any generic type variables bound by ref are considered reference types and therefore must obey the above restrictions.

A note on variance

In general, ptr types like &MT or @MT are covariant in T if M is not mut. This is different from references today which are always covariant in T; the current behavior is what leads to the type hole pointed out in the mailing list.

Using Futures in the Task API

Brian pointed out to me a nice solution to the Task API problem that I have overlooked, though it’s fairly obvious. Basically, I had rejected a “builder” style API for tasks because there is often a need for the child task to be able to send some data back to its parent after it has been spawned, and a builder API cannot easily accommodate this. Brian’s idea was to encapsulate these using futures. It’s still not perfect but it’s better I think and more composable than my first, limited proposal. It still requires that the actor pattern be a separate module.

For those of you who don’t care about the intimate details of Rust task generation, sorry, this is a kind of a “document the idea” sort of post. I’m sure I’m also brushing over some of the Rust-specific context that might be needed to make the examples easy to understand.

Builder-based idea

Our basic task builder data types looks like:

type task_builder = ~{
    stack_size: uint,
    notify_chan: comm::chan<result>,
    ... various options ...
    gen_task_body: fn@(fn~()) -> fn~
};
type task_id = uint;
enum task_result { tr_success, tr_failure };

Basically it’s a struct with a bunch of options. The most interesting part is the gen_task_body() field, which contains a closure that—given the user’s task body—will return the real task body. This allows us to accumulate transformations on the body.

Creating a builder implicitly sets it up with the default options:

fn mk_task_builder() -> task_builder {
     fn identity(f: fn~()) { f }

     ret ~{ ... default_options ..., gen_task_body: identity };
}

Then people can add impl methods to configure the builder. Here is one simple example:

impl task_builder for task_builder {
    fn set_stack_size(ss: uint) {
        self.stack_size = ss;
    }
}

Here is an example of how we could make tasks that send a message to a channel when they fail:

impl task_builder for task_builder {
    fn notify_chan_on_failure(ch: chan<task_result>) {
        self.gen_task_body = fn@(body: fn~()) {
            fn~[copy ch, move body]() {
                let _ = resource send_final_msg {
                    comm::send(
                        ch,
                        if rt::is_failing {tr_failure} else {tr_success})
                }
                body();
                rt::await_all_children();
            }
        };
    }
}

The effect of this code is to replace the gen_task_body closure with one that will wrap the user’s supplied body. The wrapper will await all children created by the body and send a message at the end. The message will indicate whether it failed or not. The final message send is written using an inline resource (no syntax exists for this, but I just didn’t want to write out the full resource declaration that is currently required).

This is a pretty basic mechanism. It could be wrapped up to be a bit more widely applicable using wrappers like so:

impl task_builder for task_builder {
    fn make_joinable() -> future<task_result> {
        let port = comm::port();
        self.notify_port_on_failure(port);
        future::from_port(port)
    }

    fn notify_port_on_failure(p: port<task_result>) {
        notify_chan_on_failure(comm::chan(ch));
    }
 }

Finally, the spawn method looks like this:

fn spawn(-builder: task_builder, body: fn~()) -> task_id {
    let body = builder.gen_task_body(body);
    ret rt::spawn(self, body); // do the *actual* spawn
}

One interesting design choice was to make the task_builder unique and have it be consumed by spawn. The idea was to prevent people from using the same configuration to launch multiple tasks. This is not safe in general though it may be in some cases: people can still explicitly copy the builder if desired.

Wrapping spawn: The actor module

Sometimes there are cases, like the actor module, where you want to spawn the task using a different sort of body than a fn~(). The actor module, for example, expects a body fn~(port<A>) that is provided with a port. The idea is that you will spawn a task that creates a port for itself and then sends a channel to that port back to its creator. This is effectively a wrapper around task::spawn:

mod actor {
    enum actor<A> = { t_id: task::task_id, ch: chan<A> };
    fn spawn<A>(-builder: task_builder, body: fn~(port<A>)) -> actor<A> {
        let tmp_port = comm::port();
        let tmp_chan = comm::chan(port);
        let t_id = task::spawn(builder, fn~[copy tmp_chan; move body]() {
            let port = comm::port();
            comm::send(tmp_chan, port);
            actor_body(port);
            body();
        };
        let ch = comm::recv(tmp_port);
        actor({ch: ch, t_id: t_id})
    }
}

It would actually be possible to move the actor body stuff into the builder, but the resulting module is a little weird and error-prone to use. Basically it ends up being another kind of thing that wraps gen_task_body(), and the danger is that if the user doesn’t invoke the notify wrappers first, you get in a situation where the actor code is executing before the notification wrappers get setup. Probably not what you wanted. So we decided it’s better to separate out spawn functions, which provide the real body of the task, from configuration wrappers. There may still be ordering dependencies between wrappers but it’s no doubt fewer.

The future module

All of this kind of assumes a simple future module which I think looks like:

mod future {
    enum future<A> = {
        mutable v: either<A,port<A>>
    };

    impl future<A> for future<A> {
        fn get() -> A {
            alt self.v {
                either::left(v) { v }
                either::right(p) {
                    let v = comm::recv(p);
                    self.v = either::left(v);
                    ret v;
                }
            }
        }
    }

    fn from_port<A>(p: port<A>) -> future<A> {
        future({v: left(p)})
    }

    fn from_value<A>(v: A) -> future<A> {
        future({v: right(p)})
    }
}

To make futures more convenient to use, a function like

mod future {
    fn spawn<A>(f: fn~() -> A) -> future<A> {
        let port = comm::port();
        let chan = comm::chan(port);
        let builder = task::mk_task_builder();
        task::spawn(builder, fn~[move f, chan]() {
            let v = f();
            comm::send(chan, v);
        });
        from_port(port)
    }
}

would let you write code like:

let f = future::spawn {|| some_expensive_computation() };
...
let r = f.get();

Horray.

Task API

One of the thorny API problems I’ve been thinking about lately is the task API for Rust. I originally had in mind this fancy and very flexible aproach based on bind. When I spelled it out I found it was very powerful and flexible but also completely unworkable in practice.

So here is a more limited proposal. There is a core task API that looks something like this:

enum task = uint; // wrap the task ID or whatever
type opts = { ... };

fn default_opts() -> opts;

fn spawn(opts: opts, body: fn~()) -> task;

The options struct will let you control simple things like stack size and so forth.

On top of this there will be several patterns. These probably live in separate modules. For example, an actor pattern which starts up a task with its own mailbox:

enum actor<A> = {
    task: task::task,
    chan: comm::chan<A>
}

impl actor<A> for actor<A> {
    fn send(msg: A) { comm::send(self.chan, msg) }
}

fn spawn<A,R>(
    opts: options,
    body: fn~(p: comm::port<A>) -> R) -> task<A,R> {

    let pp_tmp = comm::port();
    let ch_tmp = comm::chan(p);
    let t = task::spawn(opts) {||
        let p = comm::port();
        comm::send(ch_tmp, comm::chan(p));
        body(p);
    };
    let ch = comm::recv(pp_tmp);
    actor(t, ch)
}

Or a futures-like pattern that spawns off a task and allows you to invoke a get() method to read its result:

enum future<R> = {
    task: task::task,
    mutable rslt: either<R, comm::port<R>>
}

impl future<A> for future<A> {
    fn get() -> R {
        alt self.rslt {
            either::left(r) { r }
            either::right(p) {
                let r = comm::recv(self.port);
                self.rslt = either::left(r);
                ret r;
            }
        }
    }
}

fn spawn<R>(
    opts: options,
    body: fn~() -> R) -> future<R> {

    let pp = comm::port();
    let ch = comm::chan(p);
    let t = task::spawn(opts) {||
        comm::send(ch, body())
    };
    future(t, either::left(pp))
}

The downside of this approach is that it is not particularly composable (you can’t, for example, combine a future and actor together in one task without writing your own code). However, it’s easy enough to understand and it’s probably good enough.

Hmm, that last sentence makes me wonder if classes and traits could help here. Oh well, a thought for another day.

Auto-serialization in Rust

I’ve been working on implementing Cross-Crate Inlining. The major task here is to serialize the AST. This is conceptually trivial but in practice a major pain. It’s an interesting fact that the more tightly you type your data, the more of a pain it (generally) is to work with in a generic fashion. Of functional-ish languages that I’ve used, Scala actually makes things relatively easy by using a combination of reflection and dynamic typing (interfaces like Product come to mind).

Anyway, Rust does not (yet?) have reflection, but I have been working on a program which will autogenerate the serialization code for our AST based on the type definitions itself. Normally, I would probably do this with some Python program and a bunch of hacky regular expressions. But instead I am taking advantage of one of Rust’s nicer (and somewhat unusual, although becoming less so) features: the fact that the Rust compiler is itself a library. (An aside: I plan to implement this serialization code as a syntax extension or macro once those systems mature.)

To use serializer, you provide it with a crate file and a set of type names. It will then generate Rust code that serializes instances of those types. Internally, it invokes the compiler to parse and type check the crate, using the compile_upto() function, which allows you to compile a given input up until a certain point (in this case, up until the type checking phase has completed).

An aside: This is the point where the beauty of crate files becomes more apparent: a crate is a self-contained specification that not only contains a listing of the source modules and so forth, but also the external crates that are required, default compilation options, etc. Having all of this mess encapsulated in a crate means that it is trivial for a tool like serializer to recreate the compilation environment for your package: just provide it with a crate file. If this were a C program, you’d also have to supply a random smattering of gcc options, which you would in turn have to figure out how to extract from your makefile, not to mention the makefiles from external packages that you are using. Ugh.

Once serializer has parsed and type-checked your source, it is provided with a crate AST and a type context (ty::ctxt). Using these two things, it’s fairly straightforward to locate the definitions for the types we are supposed to serialize and walk over them, generating code as we go.

The actual code works by walking ty::t instances. ty::t is the type used in the Rust compiler to represent types. This is distinct from ast::ty, which is the syntax tree that represents a type. ty::t is modeled after the type system in the abstract, which makes it easier to work with. The other reason to walk ty::t instances and not ast::ty is that there is no AST available for types defined in external crates (such as option::t, defined in libcore).

Basically, for each unique ty::t that we encounter we generate a function of the form:

fn serialize<C: serialization::ctxt>(cx: C, t: T) {
    ...
}

Here T is the type represented by the ty::t. The variable cx is a serialization context. This is defined using an interface serialization::ctxt, which looks like so:

mod serialization {
    iface ctxt {
        fn emit_u64(x: u64);
        fn emit_i64(x: i64);

        fn emit_record(f: fn());
        fn emit_field(f_name: str, f_id: uint, f: fn());

        fn emit_enum(e_name: str, f: fn());
        fn emit_variant(v_name: str, v_id: uint, f: fn());

        ...
    }
}

So, for example, the serialization function for a type {x: uint, y: uint} would look something like:

fn serialize1<C: serialization::ctxt>(cx: C, &&v: {x: uint, y: uint}) {
    cx.emit_record {||
        cx.emit_field("x", 0) {||
            cx.emit_u64(v.x as u64);
        }
        cx.emit_field("y", 1) {||
            cx.emit_u64(v.y as u64);
        }
    }
}

Now, to deserialize, we generate similar code for a deserialization interface:

fn deserialize1<C: deserialization::ctxt>(cx: C) -> {x: uint, y: uint} {
    cx.read_record {||
        let x = cx.read_field("x", 0) {||
            cx.read_u64() as uint
        }
        let y = cx.read_field("y", 1) {||
            cx.read_u64() as uint
        }
        {x: x, y: y}
    }
}

The deserialization interface looks like:

mod deserialization {
    iface ctxt {
        fn read_u64() -> u64;
        fn read_i64() -> i64;

        fn read_record<T>(f: fn() -> T) -> T;
        fn read_field<T>(f_name: str, f_id: uint, f: fn() -> T) -> T;

        fn read_enum<T>(f: fn(uint) -> T);

        ...
    }
}

A somewhat more interesting case concerns enums. Let’s consider the enum option<R> where R is the record type we’ve been working with. It would be serialized as:

type R = {x: uint, y: uint};
fn serialize2<C: serialization::ctxt>(cx: C, &&v: option<R>) {
    cx.emit_enum("std::option::t<R>") {||
        alt v {
            none {
                cx.emit_variant("std::option::none", 0u) {||
                }
            }
            some(r) {
                cx.emit_variant("std::option::some, 1u) {||
                    serialize1(cx, r); // link to the previous code we saw
                }
            }
        }
    }
}

The deserializer meanwhile would look like:

fn deserialize2<C: deserialization::ctxt>(cx: C) -> option<uint> {
    cx.read_enum {|v_id|
        alt v_id {
            0u { // std::option::none
                std::option::none
            }

            1u { // std::option::some
                std::option::some(deserialize1(cx))
            }

            _ {
                fail #fmt["Unexpected discriminant %u for option::option",
                    v_id];
            }
        }
    }
}

Breaking Out Is Hard to Do

One of the things I’d like to do for the iteration library is settle on a convention for breaking and continuing within loops. There is a bug on this issue (#1619) and it seems like the general approach is clear but some of the particulars are less so. So I thought I’d try to enumerate how code will look under the various alternatives and then maybe we can settle on one: they’re all fairly similar. Who knows, maybe just writing things out will settle my mind.

Alternative #1: The loop_ctl type.

This was my original proposal. Basically, there will be a type called iter::loop_ctl defined as so:

enum loop_ctl { lc_break, lc_cont }

I wanted to design something that felt as much like normal loops as possible. So my thought was that, for sugared closures where the return type was loop_ctl, the compiler could insert lc_cont as the tail expression should there not be one already.

The idea then was to change vec::iter() from a function with the signature:

fn iter<T>(v: [T], f: fn(T))

to the following:

fn iter<T>(v: [T], f: fn(T) -> loop_ctl)

In other words, the function supplied to iter would be allowed to break the loop in the middle if it wanted to. Due to the default rules, this is mostly invisible, except that you can say break and cont and things work as you expect.

Unfortunately, as I think more about it, I realize that the default rules aren’t quite subtle enough. It’s only mostly invisible. For example:

vec::iter(v) {
    while cond { }
}

This would fail because while loops have a result type of (), and in this case the while loop occupies the tail expression slot. So you would have to write:

vec::iter(v) {
    while cond { }
    cont;
}

This makes me unhappy. I’m happy with smart rules but only if they really work all the time or have a consistent story. You could extend the rule to say “if the tail expression has unit type, still insert a default cont”, but now it’s starting to sound really magical.

Alternative #2:

We can keep the loop_ctl type but just not make it special. Iterable types define two methods, iter and iter_brk (not sure about those names), with signatures as shown (these are for vectors:

iface iterable<T> {
    fn iter(f: fn(T)) /* as today */
    fn iter_brk(f: fn(T) -> loop_ctl)
}

Now when you want to break, you have to end the loop explicitly with cont.

For most types, you need only define iter_brk: the iter function itself can be defined generically as shown (this assumes traits are implemented):

trait base_iter<T> {
    req fn iter_brk(f: fn(T) -> loop_ctl);

    fn iter(f: fn(A)) {
        self.iter_brk {|e|
            f(e);
            cont; 
        }
    }
}

Alternative #3:

Same as #2, but we replace the loop_ctl type with boolean. This is appealing because it’s so minimalistic. break would effectively return false and cont would return true. This makes iter_brk effectively the same as the predicate test all(), which returns true if the block returns true for all members.

Of course, if we actually used all() instead of iter_brk that’d be ok too, except that the return type of all is bool, so a semicolon would be required:

v.all {|i|
    ...
};

We could of course have both all and iter_brk (as we would in alternative #2).

My preference?

I started out liking alternative #1, but writing this blog post has more-or-less persuaded me that I prefer alternative #2. Less compiler magic is good, and compiler magic that fails is bad. Between alternatives #2 and #3, I tend to slightly prefer an explicit loop_ctl type over a boolean for a couple of reasons:

  • the types more closely reflect the intention. To me, testing whether a predicate holds on all members is not the same as interrupting a loop early.
  • you can’t use break and cont to return out of arbitrary blocks that happen to return boolean.
  • you can always write helpers like

    fn break_if(b: bool) -> loop_ctl { if b { lc_break } else { lc_cont } }
    fn cont_if(b: bool) -> loop_ctl { break_if(!b) }
    

    to convert between bool and loop_ctl when convenient.

But obviously there is no substantive difference between alternatives

2 and #3.

Cross-crate Inlining

Cross-crate inlining (CCI) refers to the ability to inline a function across crate boundaries. In Rust, a “crate” is the unit of compilation, rather than an individual file as in C or C++. A crate basically corresponds to a single library or executable, but it may contain any number of modules and source files internally. CCI is important for performance due to the ubiquitous use of small methods like vec::iter() in our source code. Such methods have proven to be a very scalable way to define iteration abstracts, but performance is currently somewhat lacking.

The major language-level issue associated with CCI is that it interferes with separate compilation. I won’t talk about this at the moment; we may choose to only inline when statically linking, or to give users some way to distinguish what can be inlined and what must not be, etc.

What I do want to look at is the best way to implement CCI in our compiler. Right now the compiler is focused on compiling one crate at a time and so a few things will have to change.

pcwalton forwarded me a partial patch which tries to separate out various parts of the compiler to generalize to multiple crates. I am not sure, though, that this is worth the effort: after all, we are still compiling one main crate, we’re just borrowing code from other crates. Furthermore, we never intend to report errors on the imported crates: they have typechecked etc, so nothing should go wrong. The only reason that we will need to know about them at all is for line number reporting within the compiler. Therefore, I am leaning now towards keeping things mostly the same, but adding files from inlined crates into the existing codemap structure where necessary.

Another question is how to make the AST available within a compiled crate; the inliner will need to reference it to produce an inlined version, after all. There are a couple of dimensions here. The first is how to serialize the AST at all—the easiest way is to use the Rust pretty printer. The best way, I think, is to write the tree out in EBML. The reason for this is that it allows to retain the spans from the original source, which will be lost by the pretty printer. It also allows us to keep the various information we keep in side-tables, such as the type associated with each node. We may nonetheless start with pretty printed source and change later, if that proves expedient.

The second dimension to the question of serializing source is at what granularity to do it. Graydon has pointed out that we should be sensitive to the compile-time impact of inlining and monomorphization, and I agree. However, we are primarily interested in the compile-time impact on debug-mode compilations, meaning that inlining would probably be disabled. Nonetheless, we should be careful about how we package up the source for three reasons: (1) monomorphization, when it occurs, will require access even in debug mode; (2) if we play our cards right, we may be able to consolidate some of the control paths we use in type checking and elsewhere (more on that in a bit); and (3) having faster compile times even with optimizations enabled never hurts.

This suggests to me that we do not want to include the source for an entire crate at a time. It seems like items are the logical level for this. I am wondering if we could encode the module structure using EBML but at each top-level item we stop and encode two things: the signature of the item as well as the source. Currently, we encode the signature using EBML, and perhaps we can just keep that path, though it may be easier to pretty-print the signature and then parse it again.

Why would that be easier, you ask? After all, the current system works, right? Well, the idea (hat tip to pcwalton here) is to consolidate some of the logic in the compiler. Currently, everytime we resolve the type of an item, we must ask “is it in the current crate or not?” If it is, we go through one path, which involves looking up the AST and other internal tables. If it is not, then we look into a crate metadata cache and—if needed—reconstruct the signature by parsing EBML. So my thought is that perhaps we can bring these paths together by filling in the AST and other information lazilly, extracting what is needed out of the crate.

Actually, the question of pretty-printing vs EBML is somewhat orthogonal, I suppose. In fact, it might be better to keep the EBML for the reasons discussed earlier (easier to reconstruct the various side table information that is required).

So, I am starting to see a high-level vision for how this might all be organized, but I don’t know these paths of the code that well, so I might turn out to be rather confused. It’s also a bit unclear to me if consolidating the paths through the compiler is important to CCI or just a nice thing to do. I’d rather get results first and work on refactoring second.

Update

It’s been a while since I wrote anything on the blog! A lot has been going on in the meantime, both in Rust, parallel JavaScript, and personally…I hate to write a big update post but I gotta’ catch up somehow!

Rust

First, we made our 0.1 release, which is great. We are now planning for 0.2. The goal is to make frequent, relatively regular releases. We’re still in such an early phase that it doesn’t seem to make sense to literally release every few months, but at the same time we don’t plan to wait long.

Lately I’ve been working on several issues, some major, some minor. The most exciting to me has been #1493, which vastly improved our performance when working with thread-local boxes. There is still work to be done on box performance, but before taking any next steps more investigation is needed. The problems may be creating type descriptors, they may be garbage collection, etc. This is also a first step to simplifying some of the scarier parts of the Rust runtime.

On a smaller note, I’ve been working on a new iteration library that I’m reasonably excited about, based on the principles from an earlier blog post. Making it nice to use will require a lighter bind syntax, which I’ve almost got complete now. This also (as a side effect) improves our type inference somewhat. One final thing I’d like to do is integrate support for break and cont into the library, though some details remain to be discussed there.

Finally, I’m beginning work now on cross-crate inlining. I plan to write a post later exploring some of the dynamics of this. It’s an interesting problem. It’s clearly important for performance, though: functions like vec::iter() are used ubiquitously and we need to be able to inline their definitions. This is also where Rust’s emphasis on static dispatch can really pay off (not that it is not helpful already at reducing dispatch costs).

Parallel JavaScript

I have been progressing on the parallelism work that occupied the last few posts. First, the parallel JavaScript work now has an official name: the PJs project. That github project is not that useful at the moment, though: I am in the process of moving the code from an older GitHub project into the new one which contains the rest of mozilla-central.

What exists today is a (very) simple multi-threaded infrastructure that will run JavaScript functions in parallel, multiplexing them over a fixed number of workers. Right now though there is no data sharing between threads. I am working on the membrane approach.

I have also had some discussions with the Rivertrail group. The Rivertrail definition of an elemental function dovetails very well with my own ideas, so it seems like the two projects could be fruitfully combined: the PJs API can be used both for task-based parallel tasks and for those instances where vectorization either fails or is not effective.

Finally, I drew up a Java-based prototype of the API as well as the static type system that I described earlier. Having this prototype gives me confidence that the approach will work, as I found it quite easy to parallelize a number of interesting examples.

Whew!

Man, I didn’t realize how much stuff has been going on. No wonder I haven’t had time for blog posts! And this isn’t even a complete list, really. But it’s everything that’s interesting, I suppose.